Agent Skills: Embeddings Visualization in FiftyOne

Visualizes datasets in 2D using embeddings with UMAP or t-SNE dimensionality reduction. Use when exploring dataset structure, finding clusters, identifying outliers, or understanding data distribution.

ID: AdonaiVera/fiftyone-skills/fiftyone-embeddings-visualization

Install this agent skill locally:

pnpm dlx add-skill https://github.com/voxel51/fiftyone-skills/tree/HEAD/skills/fiftyone-embeddings-visualization

Skill Files


skills/fiftyone-embeddings-visualization/SKILL.md


Embeddings Visualization in FiftyOne

Key Directives

ALWAYS follow these rules:

1. Set context first

set_context(dataset_name="my-dataset")

2. Launch FiftyOne App

Brain operators are delegated and require the app:

launch_app()

Wait 5-10 seconds for initialization.

3. Discover operators dynamically

# List all brain operators
list_operators(builtin_only=False)

# Get schema for specific operator
get_operator_schema(operator_uri="@voxel51/brain/compute_visualization")

4. Compute embeddings before visualization

Embeddings are required for dimensionality reduction:

execute_operator(
    operator_uri="@voxel51/brain/compute_similarity",
    params={
        "brain_key": "img_sim",
        "model": "clip-vit-base32-torch",
        "embeddings": "clip_embeddings",
        "backend": "sklearn",
        "metric": "cosine"
    }
)

5. Close app when done

close_app()

Complete Workflow

Step 1: Setup

# Set context
set_context(dataset_name="my-dataset")

# Launch app (required for brain operators)
launch_app()

Step 2: Verify Brain Plugin

# Check if brain plugin is available
list_plugins(enabled=True)

# If not installed:
download_plugin(
    url_or_repo="voxel51/fiftyone-plugins",
    plugin_names=["@voxel51/brain"]
)
enable_plugin(plugin_name="@voxel51/brain")

Step 3: Discover Brain Operators

# List all available operators
list_operators(builtin_only=False)

# Get schema for compute_visualization
get_operator_schema(operator_uri="@voxel51/brain/compute_visualization")

Step 4: Check for Existing Embeddings or Compute New Ones

First, check if the dataset already has embeddings by looking at the operator schema:

get_operator_schema(operator_uri="@voxel51/brain/compute_visualization")
# Look for existing embeddings fields in the "embeddings" choices
# (e.g., "clip_embeddings", "dinov2_embeddings")

If embeddings exist: Skip to Step 5 and use the existing embeddings field.

If no embeddings exist: Compute them:

execute_operator(
    operator_uri="@voxel51/brain/compute_similarity",
    params={
        "brain_key": "img_viz",
        "model": "clip-vit-base32-torch",
        "embeddings": "clip_embeddings",  # Field name to store embeddings
        "backend": "sklearn",
        "metric": "cosine"
    }
)

Required parameters for compute_similarity:

  • brain_key - Unique identifier for this brain run
  • model - Model from FiftyOne Model Zoo to generate embeddings
  • embeddings - Field name where embeddings will be stored
  • backend - Similarity backend (use "sklearn")
  • metric - Distance metric (use "cosine" or "euclidean")
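To make the metric parameter concrete: cosine distance between two embedding vectors is 1 minus their cosine similarity. A minimal NumPy sketch, independent of FiftyOne (the 512-dim random vectors below are stand-ins for real CLIP embeddings):

```python
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine distance = 1 - cosine similarity."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
a = rng.normal(size=512)  # CLIP-sized embedding stand-in
b = rng.normal(size=512)

print(cosine_distance(a, a))  # identical vectors -> ~0.0
print(cosine_distance(a, b))  # unrelated random vectors -> near 1.0
```

With "euclidean", np.linalg.norm(a - b) would be used instead; cosine is usually preferred for normalized semantic embeddings like CLIP's.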

Recommended embedding models:

  • clip-vit-base32-torch - Best for general visual + semantic similarity
  • dinov2-vits14-torch - Best for visual similarity only
  • resnet50-imagenet-torch - Classic CNN features
  • mobilenet-v2-imagenet-torch - Fast, lightweight option

Step 5: Compute 2D Visualization

Use existing embeddings field OR the brain_key from Step 4:

# Option A: Use existing embeddings field (e.g., clip_embeddings)
execute_operator(
    operator_uri="@voxel51/brain/compute_visualization",
    params={
        "brain_key": "img_viz",
        "embeddings": "clip_embeddings",  # Use existing field
        "method": "umap",
        "num_dims": 2
    }
)

# Option B: Use brain_key from compute_similarity
execute_operator(
    operator_uri="@voxel51/brain/compute_visualization",
    params={
        "brain_key": "img_viz",  # Same key used in compute_similarity
        "method": "umap",
        "num_dims": 2
    }
)

Dimensionality reduction methods:

  • umap - (Recommended) Preserves local and global structure, faster. Requires umap-learn package.
  • tsne - Better local structure, slower on large datasets. No extra dependencies.
  • pca - Linear reduction, fastest but less informative. No extra dependencies.
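All three methods do the same job: project high-dimensional embeddings down to 2D coordinates, one point per sample. The linear (pca) case is simple enough to sketch directly with NumPy; the array shapes mirror what compute_visualization produces, but the data here is random and this is not FiftyOne's implementation:

```python
import numpy as np

def pca_2d(embeddings: np.ndarray) -> np.ndarray:
    """Project (n_samples, n_dims) embeddings onto their top 2 principal components."""
    centered = embeddings - embeddings.mean(axis=0)
    # SVD of the centered data; rows of vt are the principal directions
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T  # (n_samples, 2) coordinates

rng = np.random.default_rng(42)
embeddings = rng.normal(size=(1000, 512))  # stand-in for stored clip_embeddings
coords = pca_2d(embeddings)
print(coords.shape)  # (1000, 2)
```

UMAP and t-SNE replace the linear projection with nonlinear neighbor-preserving optimizations, which is why they reveal cluster structure that PCA flattens out.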

Step 6: Direct User to Embeddings Panel

After computing visualization, direct the user to open the FiftyOne App at http://localhost:5151/ and:

  1. Click the Embeddings panel icon (scatter plot icon, looks like a grid of dots) in the top toolbar
  2. Select the brain key (e.g., img_viz) from the dropdown
  3. Points represent samples in 2D embedding space
  4. Use the "Color by" dropdown to color points by a field (e.g., ground_truth, predictions)
  5. Click points to select samples, use lasso tool to select groups

IMPORTANT: Do NOT use set_view(exists=["brain_key"]) - this filters samples and is not needed for visualization. The Embeddings panel automatically shows all samples with computed coordinates.

Step 7: Explore and Filter (Optional)

To filter samples while viewing in the Embeddings panel:

# Filter to specific class
set_view(filters={"ground_truth.label": "dog"})

# Filter by tag
set_view(tags=["validated"])

# Clear filter to show all
clear_view()

These filters will update the Embeddings panel to show only matching samples.

Step 8: Find Outliers

Outliers appear as isolated points far from clusters:

# Compute uniqueness scores (higher = more unique/outlier)
execute_operator(
    operator_uri="@voxel51/brain/compute_uniqueness",
    params={
        "brain_key": "img_viz"
    }
)

# View most unique samples (potential outliers)
set_view(sort_by="uniqueness", reverse=True, limit=50)
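The intuition behind a uniqueness score can be sketched with plain NumPy: rate each point by its distance to its nearest neighbor, so isolated points score highest. This is only an illustration on synthetic data; FiftyOne's actual compute_uniqueness algorithm is more involved:

```python
import numpy as np

def nearest_neighbor_scores(embeddings: np.ndarray) -> np.ndarray:
    """Distance from each point to its nearest neighbor; larger = more isolated."""
    # Full pairwise Euclidean distances (fine for small n; use a KD-tree at scale)
    diffs = embeddings[:, None, :] - embeddings[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    np.fill_diagonal(dists, np.inf)  # ignore self-distance
    return dists.min(axis=1)

rng = np.random.default_rng(7)
cluster = rng.normal(0.0, 0.1, size=(100, 8))  # one tight cluster
outlier = np.full((1, 8), 5.0)                 # one far-away point
scores = nearest_neighbor_scores(np.vstack([cluster, outlier]))
print(int(scores.argmax()))  # index 100: the outlier gets the highest score
```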

Step 9: Find Clusters

Use the App's Embeddings panel to visually identify clusters, then:

Option A: Lasso selection in App

  1. Use lasso tool to select a cluster
  2. Selected samples are highlighted
  3. Tag or export selected samples

Option B: Use similarity to find cluster members

# Sort by similarity to a representative sample
execute_operator(
    operator_uri="@voxel51/brain/sort_by_similarity",
    params={
        "brain_key": "img_viz",
        "query_id": "sample_id_from_cluster",
        "k": 100
    }
)
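Conceptually, what sort_by_similarity does with a cosine sklearn backend reduces to a dot product over normalized embedding vectors. A NumPy sketch on synthetic data (not the actual backend implementation; query_idx plays the role of query_id):

```python
import numpy as np

def top_k_similar(embeddings: np.ndarray, query_idx: int, k: int) -> np.ndarray:
    """Indices of the k samples most cosine-similar to the query sample."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed[query_idx]  # cosine similarity to the query
    order = np.argsort(-sims)          # descending similarity
    return order[order != query_idx][:k]  # exclude the query itself

rng = np.random.default_rng(3)
embeddings = rng.normal(size=(200, 64))
neighbors = top_k_similar(embeddings, query_idx=0, k=5)
print(neighbors.shape)  # (5,)
```

This is why a single representative sample selected from a cluster in the Embeddings panel is enough to pull back the rest of its cluster members.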

Step 10: Clean Up

close_app()

Available Tools

Session View Tools

| Tool | Description |
|------|-------------|
| set_view(filters={...}) | Filter samples by field values |
| set_view(tags=[...]) | Filter samples by tags |
| set_view(sort_by="...", reverse=True) | Sort samples by field |
| set_view(limit=N) | Limit to N samples |
| clear_view() | Clear filters, show all samples |

Brain Operators for Visualization

Use list_operators() to discover and get_operator_schema() to see parameters:

| Operator | Description |
|----------|-------------|
| @voxel51/brain/compute_similarity | Compute embeddings and similarity index |
| @voxel51/brain/compute_visualization | Reduce embeddings to 2D/3D for visualization |
| @voxel51/brain/compute_uniqueness | Score samples by uniqueness (outlier detection) |
| @voxel51/brain/sort_by_similarity | Sort by similarity to a query sample |

Common Use Cases

Use Case 1: Basic Dataset Exploration

Visualize dataset structure and explore clusters:

set_context(dataset_name="my-dataset")
launch_app()

# Check for existing embeddings in schema
get_operator_schema(operator_uri="@voxel51/brain/compute_visualization")

# If embeddings exist (e.g., clip_embeddings), use them directly:
execute_operator(
    operator_uri="@voxel51/brain/compute_visualization",
    params={
        "brain_key": "exploration",
        "embeddings": "clip_embeddings",
        "method": "umap",  # or "tsne" if umap-learn not installed
        "num_dims": 2
    }
)

# Direct user to App Embeddings panel at http://localhost:5151/
# 1. Click Embeddings panel icon
# 2. Select "exploration" from dropdown
# 3. Use "Color by" to color by ground_truth or predictions

Use Case 2: Find Outliers in Dataset

Identify anomalous or mislabeled samples:

set_context(dataset_name="my-dataset")
launch_app()

# Check for existing embeddings in schema
get_operator_schema(operator_uri="@voxel51/brain/compute_visualization")

# If no embeddings exist, compute them:
execute_operator(
    operator_uri="@voxel51/brain/compute_similarity",
    params={
        "brain_key": "outliers",
        "model": "clip-vit-base32-torch",
        "embeddings": "clip_embeddings",
        "backend": "sklearn",
        "metric": "cosine"
    }
)

# Compute uniqueness scores
execute_operator(
    operator_uri="@voxel51/brain/compute_uniqueness",
    params={"brain_key": "outliers"}
)

# Generate visualization (use existing embeddings field or brain_key)
execute_operator(
    operator_uri="@voxel51/brain/compute_visualization",
    params={
        "brain_key": "outliers",
        "embeddings": "clip_embeddings",  # Use existing field if available
        "method": "umap",  # or "tsne" if umap-learn not installed
        "num_dims": 2
    }
)

# Direct user to App at http://localhost:5151/
# 1. Click Embeddings panel icon
# 2. Select "outliers" from dropdown
# 3. Outliers appear as isolated points far from clusters
# 4. Optionally sort by uniqueness field in the App sidebar

Use Case 3: Compare Classes in Embedding Space

See how different classes cluster:

set_context(dataset_name="my-dataset")
launch_app()

# Check for existing embeddings in schema
get_operator_schema(operator_uri="@voxel51/brain/compute_visualization")

# If no embeddings exist, compute them:
execute_operator(
    operator_uri="@voxel51/brain/compute_similarity",
    params={
        "brain_key": "class_viz",
        "model": "clip-vit-base32-torch",
        "embeddings": "clip_embeddings",
        "backend": "sklearn",
        "metric": "cosine"
    }
)

# Generate visualization (use existing embeddings field or brain_key)
execute_operator(
    operator_uri="@voxel51/brain/compute_visualization",
    params={
        "brain_key": "class_viz",
        "embeddings": "clip_embeddings",  # Use existing field if available
        "method": "umap",  # or "tsne" if umap-learn not installed
        "num_dims": 2
    }
)

# Direct user to App at http://localhost:5151/
# 1. Click Embeddings panel icon
# 2. Select "class_viz" from dropdown
# 3. Use "Color by" dropdown to color by ground_truth or predictions
# Look for:
# - Well-separated clusters = good class distinction
# - Overlapping clusters = similar classes or confusion
# - Scattered points = high variance within class

Use Case 4: Analyze Model Predictions

Compare ground truth vs predictions in embedding space:

set_context(dataset_name="my-dataset")
launch_app()

# Check for existing embeddings in schema
get_operator_schema(operator_uri="@voxel51/brain/compute_visualization")

# If no embeddings exist, compute them:
execute_operator(
    operator_uri="@voxel51/brain/compute_similarity",
    params={
        "brain_key": "pred_analysis",
        "model": "clip-vit-base32-torch",
        "embeddings": "clip_embeddings",
        "backend": "sklearn",
        "metric": "cosine"
    }
)

# Generate visualization (use existing embeddings field or brain_key)
execute_operator(
    operator_uri="@voxel51/brain/compute_visualization",
    params={
        "brain_key": "pred_analysis",
        "embeddings": "clip_embeddings",  # Use existing field if available
        "method": "umap",  # or "tsne" if umap-learn not installed
        "num_dims": 2
    }
)

# Direct user to App at http://localhost:5151/
# 1. Click Embeddings panel icon
# 2. Select "pred_analysis" from dropdown
# 3. Color by ground_truth - see true class distribution
# 4. Color by predictions - see model's view
# 5. Look for mismatches to find errors

Use Case 5: t-SNE for Publication-Quality Plots

Use t-SNE for better local structure (no extra dependencies):

set_context(dataset_name="my-dataset")
launch_app()

# Check for existing embeddings in schema
get_operator_schema(operator_uri="@voxel51/brain/compute_visualization")

# If no embeddings exist, compute them (DINOv2 for visual similarity):
execute_operator(
    operator_uri="@voxel51/brain/compute_similarity",
    params={
        "brain_key": "tsne_viz",
        "model": "dinov2-vits14-torch",
        "embeddings": "dinov2_embeddings",
        "backend": "sklearn",
        "metric": "cosine"
    }
)

# Generate t-SNE visualization (no umap-learn dependency needed)
execute_operator(
    operator_uri="@voxel51/brain/compute_visualization",
    params={
        "brain_key": "tsne_viz",
        "embeddings": "dinov2_embeddings",  # Use existing field if available
        "method": "tsne",
        "num_dims": 2
    }
)

# Direct user to App at http://localhost:5151/
# 1. Click Embeddings panel icon
# 2. Select "tsne_viz" from dropdown
# 3. t-SNE provides better local cluster structure than UMAP

Troubleshooting

Error: "No executor available"

  • Cause: Delegated operators require the App executor
  • Solution: Ensure launch_app() was called and wait 5-10 seconds

Error: "Brain key not found"

  • Cause: Embeddings not computed
  • Solution: Run compute_similarity first with a brain_key

Error: "Operator not found"

  • Cause: Brain plugin not installed
  • Solution: Install with download_plugin() and enable_plugin()

Error: "You must install the umap-learn>=0.5 package"

  • Cause: UMAP method requires the umap-learn package
  • Solutions:
    1. Install umap-learn: Ask user if they want to run pip install umap-learn
    2. Use t-SNE instead: Change method to "tsne" (no extra dependencies)
    3. Use PCA instead: Change method to "pca" (fastest, no extra dependencies)
  • After installing umap-learn, restart Claude Code/MCP server and retry

Visualization is slow

  • Use UMAP instead of t-SNE for large datasets
  • Use faster embedding model: mobilenet-v2-imagenet-torch
  • Process subset first: set_view(limit=1000)

Embeddings panel not showing

  • Ensure visualization was computed (not just embeddings)
  • Check brain_key matches in both compute_similarity and compute_visualization
  • Refresh the App page

Points not colored correctly

  • Verify the field exists on samples
  • Check field type is compatible (Classification, Detections, or string)

Best Practices

  1. Discover dynamically - Use list_operators() and get_operator_schema() to get current operator names and parameters
  2. Choose the right model - CLIP for semantic similarity, DINOv2 for visual similarity
  3. Start with UMAP - Faster and often better than t-SNE for exploration
  4. Use uniqueness for outliers - More reliable than visual inspection alone
  5. Store embeddings - Reuse for multiple visualizations via brain_key
  6. Subset large datasets - Compute on subset first, then full dataset

Performance Notes

Embedding computation time:

  • 1,000 images: ~1-2 minutes
  • 10,000 images: ~10-15 minutes
  • 100,000 images: ~1-2 hours

Visualization computation time:

  • UMAP: ~30 seconds for 10,000 samples
  • t-SNE: ~5-10 minutes for 10,000 samples
  • PCA: ~5 seconds for 10,000 samples

Memory requirements:

  • ~2KB per image for embeddings
  • ~16 bytes per image for 2D coordinates
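The per-image figures follow directly from the array sizes involved: a 512-dimensional float32 embedding is 512 × 4 bytes = 2 KB, and a 2D float64 point is 2 × 8 = 16 bytes. A quick sanity check, assuming CLIP's 512-dim output stored as float32:

```python
EMBEDDING_DIM = 512   # clip-vit-base32-torch output size
FLOAT32_BYTES = 4
FLOAT64_BYTES = 8

embedding_bytes = EMBEDDING_DIM * FLOAT32_BYTES  # per-image embedding
coords_bytes = 2 * FLOAT64_BYTES                 # per-image 2D point

print(embedding_bytes)                   # 2048 bytes ~= 2 KB
print(coords_bytes)                      # 16 bytes
print(100_000 * embedding_bytes / 1e6)   # ~205 MB of embeddings for 100k images
```

So even for 100k-image datasets, embeddings comfortably fit in memory; compute time, not storage, is the bottleneck.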

Resources