Embeddings Visualization in FiftyOne
Key Directives
ALWAYS follow these rules:
1. Set context first
set_context(dataset_name="my-dataset")
2. Launch FiftyOne App
Brain operators are delegated and require the app:
launch_app()
Wait 5-10 seconds for initialization.
3. Discover operators dynamically
# List all brain operators
list_operators(builtin_only=False)
# Get schema for specific operator
get_operator_schema(operator_uri="@voxel51/brain/compute_visualization")
4. Compute embeddings before visualization
Embeddings are required for dimensionality reduction:
execute_operator(
    operator_uri="@voxel51/brain/compute_similarity",
    params={
        "brain_key": "img_sim",
        "model": "clip-vit-base32-torch",
        "embeddings": "clip_embeddings",
        "backend": "sklearn",
        "metric": "cosine"
    }
)
5. Close app when done
close_app()
Complete Workflow
Step 1: Setup
# Set context
set_context(dataset_name="my-dataset")
# Launch app (required for brain operators)
launch_app()
Step 2: Verify Brain Plugin
# Check if brain plugin is available
list_plugins(enabled=True)
# If not installed:
download_plugin(
    url_or_repo="voxel51/fiftyone-plugins",
    plugin_names=["@voxel51/brain"]
)
enable_plugin(plugin_name="@voxel51/brain")
Step 3: Discover Brain Operators
# List all available operators
list_operators(builtin_only=False)
# Get schema for compute_visualization
get_operator_schema(operator_uri="@voxel51/brain/compute_visualization")
Step 4: Check for Existing Embeddings or Compute New Ones
First, check if the dataset already has embeddings by looking at the operator schema:
get_operator_schema(operator_uri="@voxel51/brain/compute_visualization")
# Look for existing embeddings fields in the "embeddings" choices
# (e.g., "clip_embeddings", "dinov2_embeddings")
If embeddings exist: Skip to Step 5 and use the existing embeddings field.
If no embeddings exist: Compute them:
execute_operator(
    operator_uri="@voxel51/brain/compute_similarity",
    params={
        "brain_key": "img_viz",
        "model": "clip-vit-base32-torch",
        "embeddings": "clip_embeddings",  # Field name to store embeddings
        "backend": "sklearn",
        "metric": "cosine"
    }
)
Required parameters for compute_similarity:
- brain_key - Unique identifier for this brain run
- model - Model from the FiftyOne Model Zoo used to generate embeddings
- embeddings - Field name where embeddings will be stored
- backend - Similarity backend (use "sklearn")
- metric - Distance metric (use "cosine" or "euclidean")
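To make the metric parameter concrete, here is a minimal numpy sketch of cosine distance (illustrative only; FiftyOne's backend computes this internally):

```python
import numpy as np

# Illustrative only: what metric="cosine" measures between two embeddings.
# Cosine distance ignores vector magnitude and compares direction only.
def cosine_distance(a, b):
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])   # same direction, double the magnitude
c = np.array([-1.0, 0.0, 1.0])  # a different direction

print(round(cosine_distance(a, b), 6))  # ~0.0: parallel vectors match
print(round(cosine_distance(a, c), 6))  # larger: directions differ
```

This is why "cosine" is the usual choice for normalized semantic embeddings like CLIP's, while "euclidean" also penalizes magnitude differences.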
Recommended embedding models:
- clip-vit-base32-torch - Best for general visual + semantic similarity
- dinov2-vits14-torch - Best for visual similarity only
- resnet50-imagenet-torch - Classic CNN features
- mobilenet-v2-imagenet-torch - Fast, lightweight option
Step 5: Compute 2D Visualization
Use existing embeddings field OR the brain_key from Step 4:
# Option A: Use existing embeddings field (e.g., clip_embeddings)
execute_operator(
    operator_uri="@voxel51/brain/compute_visualization",
    params={
        "brain_key": "img_viz",
        "embeddings": "clip_embeddings",  # Use existing field
        "method": "umap",
        "num_dims": 2
    }
)
# Option B: Use brain_key from compute_similarity
execute_operator(
    operator_uri="@voxel51/brain/compute_visualization",
    params={
        "brain_key": "img_viz",  # Same key used in compute_similarity
        "method": "umap",
        "num_dims": 2
    }
)
Dimensionality reduction methods:
- umap - (Recommended) Preserves local and global structure, faster. Requires the umap-learn package.
- tsne - Better local structure, slower on large datasets. No extra dependencies.
- pca - Linear reduction, fastest but less informative.
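All three methods map high-dimensional embeddings to 2D points. As a rough illustration of the idea (not FiftyOne's implementation), a minimal PCA reduction over synthetic "embeddings" with numpy:

```python
import numpy as np

# Illustrative only: PCA-style reduction of fake 512-dim embeddings to 2D,
# mimicking what compute_visualization does with method="pca".
rng = np.random.default_rng(0)
embeddings = np.vstack([
    rng.normal(0.0, 1.0, size=(100, 512)),  # cluster A
    rng.normal(5.0, 1.0, size=(100, 512)),  # cluster B, offset from A
])

# Center the data and project onto the top-2 principal components
centered = embeddings - embeddings.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
points_2d = centered @ vt[:2].T

print(points_2d.shape)  # (200, 2) -- one 2D point per sample
```

The two synthetic clusters separate cleanly along the first component; UMAP and t-SNE solve the same reduction problem but preserve non-linear neighborhood structure.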
Step 6: Direct User to Embeddings Panel
After computing visualization, direct the user to open the FiftyOne App at http://localhost:5151/ and:
- Click the Embeddings panel icon (scatter plot icon, looks like a grid of dots) in the top toolbar
- Select the brain key (e.g., img_viz) from the dropdown
- Points represent samples in 2D embedding space
- Use the "Color by" dropdown to color points by a field (e.g., ground_truth, predictions)
- Click points to select samples, or use the lasso tool to select groups
IMPORTANT: Do NOT use set_view(exists=["brain_key"]) - this filters samples and is not needed for visualization. The Embeddings panel automatically shows all samples with computed coordinates.
Step 7: Explore and Filter (Optional)
To filter samples while viewing in the Embeddings panel:
# Filter to specific class
set_view(filters={"ground_truth.label": "dog"})
# Filter by tag
set_view(tags=["validated"])
# Clear filter to show all
clear_view()
These filters will update the Embeddings panel to show only matching samples.
Step 8: Find Outliers
Outliers appear as isolated points far from clusters:
# Compute uniqueness scores (higher = more unique/outlier)
execute_operator(
    operator_uri="@voxel51/brain/compute_uniqueness",
    params={
        "brain_key": "img_viz"
    }
)
# View most unique samples (potential outliers)
set_view(sort_by="uniqueness", reverse=True, limit=50)
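As intuition for what a uniqueness score captures, here is a toy sketch (not the actual Brain algorithm, which is more sophisticated): score each sample by its distance to its nearest neighbor in embedding space, so isolated points score highest.

```python
import numpy as np

# Illustrative only: "uniqueness" as distance to the nearest neighbor.
rng = np.random.default_rng(1)
embeddings = rng.normal(size=(50, 64))
embeddings[0] += 20.0  # plant one obvious outlier

# Pairwise Euclidean distances, ignoring each point's distance to itself
diffs = embeddings[:, None, :] - embeddings[None, :, :]
dists = np.linalg.norm(diffs, axis=-1)
np.fill_diagonal(dists, np.inf)

uniqueness = dists.min(axis=1)     # distance to nearest neighbor
print(int(np.argmax(uniqueness)))  # 0 -- the planted outlier scores highest
```

Sorting by this score, as set_view(sort_by="uniqueness", reverse=True) does with the real field, surfaces the most isolated samples first.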
Step 9: Find Clusters
Use the App's Embeddings panel to visually identify clusters, then:
Option A: Lasso selection in App
- Use lasso tool to select a cluster
- Selected samples are highlighted
- Tag or export selected samples
Option B: Use similarity to find cluster members
# Sort by similarity to a representative sample
execute_operator(
    operator_uri="@voxel51/brain/sort_by_similarity",
    params={
        "brain_key": "img_viz",
        "query_id": "sample_id_from_cluster",
        "k": 100
    }
)
Step 10: Clean Up
close_app()
Available Tools
Session View Tools
| Tool | Description |
|------|-------------|
| set_view(filters={...}) | Filter samples by field values |
| set_view(tags=[...]) | Filter samples by tags |
| set_view(sort_by="...", reverse=True) | Sort samples by field |
| set_view(limit=N) | Limit to N samples |
| clear_view() | Clear filters, show all samples |
Brain Operators for Visualization
Use list_operators() to discover and get_operator_schema() to see parameters:
| Operator | Description |
|----------|-------------|
| @voxel51/brain/compute_similarity | Compute embeddings and similarity index |
| @voxel51/brain/compute_visualization | Reduce embeddings to 2D/3D for visualization |
| @voxel51/brain/compute_uniqueness | Score samples by uniqueness (outlier detection) |
| @voxel51/brain/sort_by_similarity | Sort by similarity to a query sample |
Common Use Cases
Use Case 1: Basic Dataset Exploration
Visualize dataset structure and explore clusters:
set_context(dataset_name="my-dataset")
launch_app()
# Check for existing embeddings in schema
get_operator_schema(operator_uri="@voxel51/brain/compute_visualization")
# If embeddings exist (e.g., clip_embeddings), use them directly:
execute_operator(
    operator_uri="@voxel51/brain/compute_visualization",
    params={
        "brain_key": "exploration",
        "embeddings": "clip_embeddings",
        "method": "umap",  # or "tsne" if umap-learn not installed
        "num_dims": 2
    }
)
# Direct user to App Embeddings panel at http://localhost:5151/
# 1. Click Embeddings panel icon
# 2. Select "exploration" from dropdown
# 3. Use "Color by" to color by ground_truth or predictions
Use Case 2: Find Outliers in Dataset
Identify anomalous or mislabeled samples:
set_context(dataset_name="my-dataset")
launch_app()
# Check for existing embeddings in schema
get_operator_schema(operator_uri="@voxel51/brain/compute_visualization")
# If no embeddings exist, compute them:
execute_operator(
    operator_uri="@voxel51/brain/compute_similarity",
    params={
        "brain_key": "outliers",
        "model": "clip-vit-base32-torch",
        "embeddings": "clip_embeddings",
        "backend": "sklearn",
        "metric": "cosine"
    }
)
# Compute uniqueness scores
execute_operator(
    operator_uri="@voxel51/brain/compute_uniqueness",
    params={"brain_key": "outliers"}
)
# Generate visualization (use existing embeddings field or brain_key)
execute_operator(
    operator_uri="@voxel51/brain/compute_visualization",
    params={
        "brain_key": "outliers",
        "embeddings": "clip_embeddings",  # Use existing field if available
        "method": "umap",  # or "tsne" if umap-learn not installed
        "num_dims": 2
    }
)
# Direct user to App at http://localhost:5151/
# 1. Click Embeddings panel icon
# 2. Select "outliers" from dropdown
# 3. Outliers appear as isolated points far from clusters
# 4. Optionally sort by uniqueness field in the App sidebar
Use Case 3: Compare Classes in Embedding Space
See how different classes cluster:
set_context(dataset_name="my-dataset")
launch_app()
# Check for existing embeddings in schema
get_operator_schema(operator_uri="@voxel51/brain/compute_visualization")
# If no embeddings exist, compute them:
execute_operator(
    operator_uri="@voxel51/brain/compute_similarity",
    params={
        "brain_key": "class_viz",
        "model": "clip-vit-base32-torch",
        "embeddings": "clip_embeddings",
        "backend": "sklearn",
        "metric": "cosine"
    }
)
# Generate visualization (use existing embeddings field or brain_key)
execute_operator(
    operator_uri="@voxel51/brain/compute_visualization",
    params={
        "brain_key": "class_viz",
        "embeddings": "clip_embeddings",  # Use existing field if available
        "method": "umap",  # or "tsne" if umap-learn not installed
        "num_dims": 2
    }
)
# Direct user to App at http://localhost:5151/
# 1. Click Embeddings panel icon
# 2. Select "class_viz" from dropdown
# 3. Use "Color by" dropdown to color by ground_truth or predictions
# Look for:
# - Well-separated clusters = good class distinction
# - Overlapping clusters = similar classes or confusion
# - Scattered points = high variance within class
Use Case 4: Analyze Model Predictions
Compare ground truth vs predictions in embedding space:
set_context(dataset_name="my-dataset")
launch_app()
# Check for existing embeddings in schema
get_operator_schema(operator_uri="@voxel51/brain/compute_visualization")
# If no embeddings exist, compute them:
execute_operator(
    operator_uri="@voxel51/brain/compute_similarity",
    params={
        "brain_key": "pred_analysis",
        "model": "clip-vit-base32-torch",
        "embeddings": "clip_embeddings",
        "backend": "sklearn",
        "metric": "cosine"
    }
)
# Generate visualization (use existing embeddings field or brain_key)
execute_operator(
    operator_uri="@voxel51/brain/compute_visualization",
    params={
        "brain_key": "pred_analysis",
        "embeddings": "clip_embeddings",  # Use existing field if available
        "method": "umap",  # or "tsne" if umap-learn not installed
        "num_dims": 2
    }
)
# Direct user to App at http://localhost:5151/
# 1. Click Embeddings panel icon
# 2. Select "pred_analysis" from dropdown
# 3. Color by ground_truth - see true class distribution
# 4. Color by predictions - see model's view
# 5. Look for mismatches to find errors
Use Case 5: t-SNE for Publication-Quality Plots
Use t-SNE for better local structure (no extra dependencies):
set_context(dataset_name="my-dataset")
launch_app()
# Check for existing embeddings in schema
get_operator_schema(operator_uri="@voxel51/brain/compute_visualization")
# If no embeddings exist, compute them (DINOv2 for visual similarity):
execute_operator(
    operator_uri="@voxel51/brain/compute_similarity",
    params={
        "brain_key": "tsne_viz",
        "model": "dinov2-vits14-torch",
        "embeddings": "dinov2_embeddings",
        "backend": "sklearn",
        "metric": "cosine"
    }
)
# Generate t-SNE visualization (no umap-learn dependency needed)
execute_operator(
    operator_uri="@voxel51/brain/compute_visualization",
    params={
        "brain_key": "tsne_viz",
        "embeddings": "dinov2_embeddings",  # Use existing field if available
        "method": "tsne",
        "num_dims": 2
    }
)
# Direct user to App at http://localhost:5151/
# 1. Click Embeddings panel icon
# 2. Select "tsne_viz" from dropdown
# 3. t-SNE provides better local cluster structure than UMAP
Troubleshooting
Error: "No executor available"
- Cause: Delegated operators require the App executor
- Solution: Ensure launch_app() was called and wait 5-10 seconds
Error: "Brain key not found"
- Cause: Embeddings not computed
- Solution: Run compute_similarity first with a brain_key
Error: "Operator not found"
- Cause: Brain plugin not installed
- Solution: Install with download_plugin() and enable_plugin()
Error: "You must install the umap-learn>=0.5 package"
- Cause: The UMAP method requires the umap-learn package
- Solutions:
  - Install umap-learn: Ask the user if they want to run pip install umap-learn
  - Use t-SNE instead: Change method to "tsne" (no extra dependencies)
  - Use PCA instead: Change method to "pca" (fastest, no extra dependencies)
- After installing umap-learn, restart Claude Code/MCP server and retry
Visualization is slow
- Use UMAP instead of t-SNE for large datasets
- Use a faster embedding model: mobilenet-v2-imagenet-torch
- Process a subset first: set_view(limit=1000)
Embeddings panel not showing
- Ensure visualization was computed (not just embeddings)
- Check brain_key matches in both compute_similarity and compute_visualization
- Refresh the App page
Points not colored correctly
- Verify the field exists on samples
- Check field type is compatible (Classification, Detections, or string)
Best Practices
- Discover dynamically - Use list_operators() and get_operator_schema() to get current operator names and parameters
- Choose the right model - CLIP for semantic similarity, DINOv2 for visual similarity
- Start with UMAP - Faster and often better than t-SNE for exploration
- Use uniqueness for outliers - More reliable than visual inspection alone
- Store embeddings - Reuse for multiple visualizations via brain_key
- Subset large datasets - Compute on a subset first, then the full dataset
Performance Notes
Embedding computation time:
- 1,000 images: ~1-2 minutes
- 10,000 images: ~10-15 minutes
- 100,000 images: ~1-2 hours
Visualization computation time:
- UMAP: ~30 seconds for 10,000 samples
- t-SNE: ~5-10 minutes for 10,000 samples
- PCA: ~5 seconds for 10,000 samples
Memory requirements:
- ~2KB per image for embeddings
- ~16 bytes per image for 2D coordinates
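These per-image figures combine into a quick back-of-envelope storage estimate (assuming 2 KB means 2048 bytes per embedding, e.g. 512 float32 dimensions, and two 8-byte floats per 2D point):

```python
# Back-of-envelope storage estimate using the figures above
num_images = 100_000
embedding_bytes = 2 * 1024  # ~2 KB per image (assumed: 512 float32 dims)
coords_bytes = 16           # two float64 values for 2D coordinates

total_mb = num_images * (embedding_bytes + coords_bytes) / 1024**2
print(round(total_mb, 1))  # 196.8 -- embeddings dominate; coordinates are negligible
```

Even at 100,000 images, embedding storage stays well under a gigabyte, so keeping embeddings around for reuse across multiple visualizations is cheap.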