Agent Skills: Embeddings Visualization in FiftyOne

Visualizes datasets in 2D using embeddings with UMAP or t-SNE dimensionality reduction. Use when exploring dataset structure, finding clusters, identifying outliers, or understanding data distribution.

ID: AdonaiVera/fiftyone-skills/fiftyone-embeddings-visualization

Install this agent skill locally:

pnpm dlx add-skill https://github.com/voxel51/fiftyone-skills/tree/HEAD/skills/fiftyone-embeddings-visualization

Skill Files


skills/fiftyone-embeddings-visualization/SKILL.md


Embeddings Visualization in FiftyOne

Key Directives

ALWAYS follow these rules:

1. Set context first

set_context(dataset_name="my-dataset")

2. Launch FiftyOne App

Brain operators are delegated and require the app:

launch_app()

Wait 5-10 seconds for initialization.

3. Discover operators dynamically

# List all brain operators
list_operators(builtin_only=False)

# Get schema for specific operator
get_operator_schema(operator_uri="@voxel51/brain/compute_visualization")

4. Compute embeddings before visualization

Embeddings are required for dimensionality reduction:

execute_operator(
    operator_uri="@voxel51/brain/compute_similarity",
    params={
        "brain_key": "img_sim",
        "model": "clip-vit-base32-torch",
        "embeddings": "clip_embeddings",
        "backend": "sklearn",
        "metric": "cosine"
    }
)

5. Close app when done

close_app()

Complete Workflow

Step 1: Setup

# Set context
set_context(dataset_name="my-dataset")

# Launch app (required for brain operators)
launch_app()

Step 2: Verify Brain Plugin

# Check if brain plugin is available
list_plugins(enabled=True)

# If not installed:
download_plugin(
    url_or_repo="voxel51/fiftyone-plugins",
    plugin_names=["@voxel51/brain"]
)
enable_plugin(plugin_name="@voxel51/brain")

Step 3: Discover Brain Operators

# List all available operators
list_operators(builtin_only=False)

# Get schema for compute_visualization
get_operator_schema(operator_uri="@voxel51/brain/compute_visualization")

Step 4: Check for Existing Embeddings or Compute New Ones

First, check if the dataset already has embeddings by looking at the operator schema:

get_operator_schema(operator_uri="@voxel51/brain/compute_visualization")
# Look for existing embeddings fields in the "embeddings" choices
# (e.g., "clip_embeddings", "dinov2_embeddings")

If embeddings exist: Skip to Step 5 and use the existing embeddings field.

If no embeddings exist: Compute them:

execute_operator(
    operator_uri="@voxel51/brain/compute_similarity",
    params={
        "brain_key": "img_viz",
        "model": "clip-vit-base32-torch",
        "embeddings": "clip_embeddings",  # Field name to store embeddings
        "backend": "sklearn",
        "metric": "cosine"
    }
)

Required parameters for compute_similarity:

  • brain_key - Unique identifier for this brain run
  • model - Model from FiftyOne Model Zoo to generate embeddings
  • embeddings - Field name where embeddings will be stored
  • backend - Similarity backend (use "sklearn")
  • metric - Distance metric (use "cosine" or "euclidean")
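To make the metric parameter concrete: cosine distance between two embedding vectors is 1 minus their cosine similarity. A minimal NumPy sketch, independent of FiftyOne (the 512-dim random vectors below are stand-ins for real CLIP embeddings):

```python
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine distance = 1 - cosine similarity."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
a = rng.normal(size=512)  # CLIP-sized embedding stand-in
b = rng.normal(size=512)

print(cosine_distance(a, a))  # identical vectors -> ~0.0
print(cosine_distance(a, b))  # unrelated random vectors -> near 1.0
```

With "euclidean", np.linalg.norm(a - b) would be used instead; cosine is usually preferred for normalized semantic embeddings like CLIP's.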

Recommended embedding models:

  • clip-vit-base32-torch - Best for general visual + semantic similarity
  • dinov2-vits14-torch - Best for visual similarity only
  • resnet50-imagenet-torch - Classic CNN features
  • mobilenet-v2-imagenet-torch - Fast, lightweight option

Step 5: Compute 2D Visualization

Use existing embeddings field OR the brain_key from Step 4:

# Option A: Use existing embeddings field (e.g., clip_embeddings)
execute_operator(
    operator_uri="@voxel51/brain/compute_visualization",
    params={
        "brain_key": "img_viz",
        "embeddings": "clip_embeddings",  # Use existing field
        "method": "umap",
        "num_dims": 2
    }
)

# Option B: Use brain_key from compute_similarity
execute_operator(
    operator_uri="@voxel51/brain/compute_visualization",
    params={
        "brain_key": "img_viz",  # Same key used in compute_similarity
        "method": "umap",
        "num_dims": 2
    }
)

Dimensionality reduction methods:

  • umap - (Recommended) Preserves local and global structure, faster. Requires umap-learn package.
  • tsne - Better local structure, slower on large datasets. No extra dependencies.
  • pca - Linear reduction, fastest but less informative. No extra dependencies.
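All three methods do the same job: project high-dimensional embeddings down to 2D coordinates, one point per sample. The linear (pca) case is simple enough to sketch directly with NumPy; the array shapes mirror what compute_visualization produces, but the data here is random and this is not FiftyOne's implementation:

```python
import numpy as np

def pca_2d(embeddings: np.ndarray) -> np.ndarray:
    """Project (n_samples, n_dims) embeddings onto their top 2 principal components."""
    centered = embeddings - embeddings.mean(axis=0)
    # SVD of the centered data; rows of vt are the principal directions
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T  # (n_samples, 2) coordinates

rng = np.random.default_rng(42)
embeddings = rng.normal(size=(1000, 512))  # stand-in for stored clip_embeddings
coords = pca_2d(embeddings)
print(coords.shape)  # (1000, 2)
```

UMAP and t-SNE replace the linear projection with nonlinear neighbor-preserving optimizations, which is why they reveal cluster structure that PCA flattens out.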

Step 6: Direct User to Embeddings Panel

After computing visualization, direct the user to open the FiftyOne App at http://localhost:5151/ and:

  1. Click the Embeddings panel icon (scatter plot icon, looks like a grid of dots) in the top toolbar
  2. Select the brain key (e.g., img_viz) from the dropdown
  3. Points represent samples in 2D embedding space
  4. Use the "Color by" dropdown to color points by a field (e.g., ground_truth, predictions)
  5. Click points to select samples, use lasso tool to select groups

IMPORTANT: Do NOT use set_view(exists=["brain_key"]) - this filters samples and is not needed for visualization. The Embeddings panel automatically shows all samples with computed coordinates.

Step 7: Explore and Filter (Optional)

To filter samples while viewing in the Embeddings panel:

# Filter to specific class
set_view(filters={"ground_truth.label": "dog"})

# Filter by tag
set_view(tags=["validated"])

# Clear filter to show all
clear_view()

These filters will update the Embeddings panel to show only matching samples.

Step 8: Find Outliers

Outliers appear as isolated points far from clusters:

# Compute uniqueness scores (higher = more unique/outlier)
execute_operator(
    operator_uri="@voxel51/brain/compute_uniqueness",
    params={
        "brain_key": "img_viz"
    }
)

# View most unique samples (potential outliers)
set_view(sort_by="uniqueness", reverse=True, limit=50)
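The intuition behind a uniqueness score can be sketched with plain NumPy: rate each point by its distance to its nearest neighbor, so isolated points score highest. This is only an illustration on synthetic data; FiftyOne's actual compute_uniqueness algorithm is more involved:

```python
import numpy as np

def nearest_neighbor_scores(embeddings: np.ndarray) -> np.ndarray:
    """Distance from each point to its nearest neighbor; larger = more isolated."""
    # Full pairwise Euclidean distances (fine for small n; use a KD-tree at scale)
    diffs = embeddings[:, None, :] - embeddings[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    np.fill_diagonal(dists, np.inf)  # ignore self-distance
    return dists.min(axis=1)

rng = np.random.default_rng(7)
cluster = rng.normal(0.0, 0.1, size=(100, 8))  # one tight cluster
outlier = np.full((1, 8), 5.0)                 # one far-away point
scores = nearest_neighbor_scores(np.vstack([cluster, outlier]))
print(int(scores.argmax()))  # index 100: the outlier gets the highest score
```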

Step 9: Find Clusters

Use the App's Embeddings panel to visually identify clusters, then:

Option A: Lasso selection in App

  1. Use lasso tool to select a cluster
  2. Selected samples are highlighted
  3. Tag or export selected samples

Option B: Use similarity to find cluster members

# Sort by similarity to a representative sample
execute_operator(
    operator_uri="@voxel51/brain/sort_by_similarity",
    params={
        "brain_key": "img_viz",
        "query_id": "sample_id_from_cluster",
        "k": 100
    }
)
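Conceptually, what sort_by_similarity does with a cosine sklearn backend reduces to a dot product over normalized embedding vectors. A NumPy sketch on synthetic data (not the actual backend implementation; query_idx plays the role of query_id):

```python
import numpy as np

def top_k_similar(embeddings: np.ndarray, query_idx: int, k: int) -> np.ndarray:
    """Indices of the k samples most cosine-similar to the query sample."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed[query_idx]  # cosine similarity to the query
    order = np.argsort(-sims)          # descending similarity
    return order[order != query_idx][:k]  # exclude the query itself

rng = np.random.default_rng(3)
embeddings = rng.normal(size=(200, 64))
neighbors = top_k_similar(embeddings, query_idx=0, k=5)
print(neighbors.shape)  # (5,)
```

This is why a single representative sample selected from a cluster in the Embeddings panel is enough to pull back the rest of its cluster members.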

Step 10: Clean Up

close_app()

Available Tools

Session View Tools

| Tool | Description |
|------|-------------|
| set_view(filters={...}) | Filter samples by field values |
| set_view(tags=[...]) | Filter samples by tags |
| set_view(sort_by="...", reverse=True) | Sort samples by field |
| set_view(limit=N) | Limit to N samples |
| clear_view() | Clear filters, show all samples |

Brain Operators for Visualization

Use list_operators() to discover and get_operator_schema() to see parameters:

| Operator | Description |
|----------|-------------|
| @voxel51/brain/compute_similarity | Compute embeddings and similarity index |
| @voxel51/brain/compute_visualization | Reduce embeddings to 2D/3D for visualization |
| @voxel51/brain/compute_uniqueness | Score samples by uniqueness (outlier detection) |
| @voxel51/brain/sort_by_similarity | Sort by similarity to a query sample |

Common Use Cases

Use Case 1: Basic Dataset Exploration

Visualize dataset structure and explore clusters:

set_context(dataset_name="my-dataset")
launch_app()

# Check for existing embeddings in schema
get_operator_schema(operator_uri="@voxel51/brain/compute_visualization")

# If embeddings exist (e.g., clip_embeddings), use them directly:
execute_operator(
    operator_uri="@voxel51/brain/compute_visualization",
    params={
        "brain_key": "exploration",
        "embeddings": "clip_embeddings",
        "method": "umap",  # or "tsne" if umap-learn not installed
        "num_dims": 2
    }
)

# Direct user to App Embeddings panel at http://localhost:5151/
# 1. Click Embeddings panel icon
# 2. Select "exploration" from dropdown
# 3. Use "Color by" to color by ground_truth or predictions

Use Case 2: Find Outliers in Dataset

Identify anomalous or mislabeled samples:

set_context(dataset_name="my-dataset")
launch_app()

# Check for existing embeddings in schema
get_operator_schema(operator_uri="@voxel51/brain/compute_visualization")

# If no embeddings exist, compute them:
execute_operator(
    operator_uri="@voxel51/brain/compute_similarity",
    params={
        "brain_key": "outliers",
        "model": "clip-vit-base32-torch",
        "embeddings": "clip_embeddings",
        "backend": "sklearn",
        "metric": "cosine"
    }
)

# Compute uniqueness scores
execute_operator(
    operator_uri="@voxel51/brain/compute_uniqueness",
    params={"brain_key": "outliers"}
)

# Generate visualization (use existing embeddings field or brain_key)
execute_operator(
    operator_uri="@voxel51/brain/compute_visualization",
    params={
        "brain_key": "outliers",
        "embeddings": "clip_embeddings",  # Use existing field if available
        "method": "umap",  # or "tsne" if umap-learn not installed
        "num_dims": 2
    }
)

# Direct user to App at http://localhost:5151/
# 1. Click Embeddings panel icon
# 2. Select "outliers" from dropdown
# 3. Outliers appear as isolated points far from clusters
# 4. Optionally sort by uniqueness field in the App sidebar

Use Case 3: Compare Classes in Embedding Space

See how different classes cluster:

set_context(dataset_name="my-dataset")
launch_app()

# Check for existing embeddings in schema
get_operator_schema(operator_uri="@voxel51/brain/compute_visualization")

# If no embeddings exist, compute them:
execute_operator(
    operator_uri="@voxel51/brain/compute_similarity",
    params={
        "brain_key": "class_viz",
        "model": "clip-vit-base32-torch",
        "embeddings": "clip_embeddings",
        "backend": "sklearn",
        "metric": "cosine"
    }
)

# Generate visualization (use existing embeddings field or brain_key)
execute_operator(
    operator_uri="@voxel51/brain/compute_visualization",
    params={
        "brain_key": "class_viz",
        "embeddings": "clip_embeddings",  # Use existing field if available
        "method": "umap",  # or "tsne" if umap-learn not installed
        "num_dims": 2
    }
)

# Direct user to App at http://localhost:5151/
# 1. Click Embeddings panel icon
# 2. Select "class_viz" from dropdown
# 3. Use "Color by" dropdown to color by ground_truth or predictions
# Look for:
# - Well-separated clusters = good class distinction
# - Overlapping clusters = similar classes or confusion
# - Scattered points = high variance within class

Use Case 4: Analyze Model Predictions

Compare ground truth vs predictions in embedding space:

set_context(dataset_name="my-dataset")
launch_app()

# Check for existing embeddings in schema
get_operator_schema(operator_uri="@voxel51/brain/compute_visualization")

# If no embeddings exist, compute them:
execute_operator(
    operator_uri="@voxel51/brain/compute_similarity",
    params={
        "brain_key": "pred_analysis",
        "model": "clip-vit-base32-torch",
        "embeddings": "clip_embeddings",
        "backend": "sklearn",
        "metric": "cosine"
    }
)

# Generate visualization (use existing embeddings field or brain_key)
execute_operator(
    operator_uri="@voxel51/brain/compute_visualization",
    params={
        "brain_key": "pred_analysis",
        "embeddings": "clip_embeddings",  # Use existing field if available
        "method": "umap",  # or "tsne" if umap-learn not installed
        "num_dims": 2
    }
)

# Direct user to App at http://localhost:5151/
# 1. Click Embeddings panel icon
# 2. Select "pred_analysis" from dropdown
# 3. Color by ground_truth - see true class distribution
# 4. Color by predictions - see model's view
# 5. Look for mismatches to find errors

Use Case 5: t-SNE for Publication-Quality Plots

Use t-SNE for better local structure (no extra dependencies):

set_context(dataset_name="my-dataset")
launch_app()

# Check for existing embeddings in schema
get_operator_schema(operator_uri="@voxel51/brain/compute_visualization")

# If no embeddings exist, compute them (DINOv2 for visual similarity):
execute_operator(
    operator_uri="@voxel51/brain/compute_similarity",
    params={
        "brain_key": "tsne_viz",
        "model": "dinov2-vits14-torch",
        "embeddings": "dinov2_embeddings",
        "backend": "sklearn",
        "metric": "cosine"
    }
)

# Generate t-SNE visualization (no umap-learn dependency needed)
execute_operator(
    operator_uri="@voxel51/brain/compute_visualization",
    params={
        "brain_key": "tsne_viz",
        "embeddings": "dinov2_embeddings",  # Use existing field if available
        "method": "tsne",
        "num_dims": 2
    }
)

# Direct user to App at http://localhost:5151/
# 1. Click Embeddings panel icon
# 2. Select "tsne_viz" from dropdown
# 3. t-SNE provides better local cluster structure than UMAP

Troubleshooting

Error: "No executor available"

  • Cause: Delegated operators require the App executor
  • Solution: Ensure launch_app() was called and wait 5-10 seconds

Error: "Brain key not found"

  • Cause: Embeddings not computed
  • Solution: Run compute_similarity first with a brain_key

Error: "Operator not found"

  • Cause: Brain plugin not installed
  • Solution: Install with download_plugin() and enable_plugin()

Error: "You must install the umap-learn>=0.5 package"

  • Cause: UMAP method requires the umap-learn package
  • Solutions:
    1. Install umap-learn: Ask user if they want to run pip install umap-learn
    2. Use t-SNE instead: Change method to "tsne" (no extra dependencies)
    3. Use PCA instead: Change method to "pca" (fastest, no extra dependencies)
  • After installing umap-learn, restart Claude Code/MCP server and retry

Visualization is slow

  • Use UMAP instead of t-SNE for large datasets
  • Use faster embedding model: mobilenet-v2-imagenet-torch
  • Process subset first: set_view(limit=1000)

Embeddings panel not showing

  • Ensure visualization was computed (not just embeddings)
  • Check brain_key matches in both compute_similarity and compute_visualization
  • Refresh the App page

Points not colored correctly

  • Verify the field exists on samples
  • Check field type is compatible (Classification, Detections, or string)

Best Practices

  1. Discover dynamically - Use list_operators() and get_operator_schema() to get current operator names and parameters
  2. Choose the right model - CLIP for semantic similarity, DINOv2 for visual similarity
  3. Start with UMAP - Faster and often better than t-SNE for exploration
  4. Use uniqueness for outliers - More reliable than visual inspection alone
  5. Store embeddings - Reuse for multiple visualizations via brain_key
  6. Subset large datasets - Compute on subset first, then full dataset

Performance Notes

Embedding computation time:

  • 1,000 images: ~1-2 minutes
  • 10,000 images: ~10-15 minutes
  • 100,000 images: ~1-2 hours

Visualization computation time:

  • UMAP: ~30 seconds for 10,000 samples
  • t-SNE: ~5-10 minutes for 10,000 samples
  • PCA: ~5 seconds for 10,000 samples

Memory requirements:

  • ~2KB per image for embeddings
  • ~16 bytes per image for 2D coordinates
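The per-image figures follow directly from the array sizes involved: a 512-dimensional float32 embedding is 512 × 4 bytes = 2 KB, and a 2D float64 point is 2 × 8 = 16 bytes. A quick sanity check, assuming CLIP's 512-dim output stored as float32:

```python
EMBEDDING_DIM = 512   # clip-vit-base32-torch output size
FLOAT32_BYTES = 4
FLOAT64_BYTES = 8

embedding_bytes = EMBEDDING_DIM * FLOAT32_BYTES  # per-image embedding
coords_bytes = 2 * FLOAT64_BYTES                 # per-image 2D point

print(embedding_bytes)                   # 2048 bytes ~= 2 KB
print(coords_bytes)                      # 16 bytes
print(100_000 * embedding_bytes / 1e6)   # ~205 MB of embeddings for 100k images
```

So even for 100k-image datasets, embeddings comfortably fit in memory; compute time, not storage, is the bottleneck.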

Resources