Export FiftyOne Datasets
Key Directives
ALWAYS follow these rules:
1. Load and understand the dataset first
set_context(dataset_name="my-dataset")
dataset_summary(name="my-dataset")
2. Confirm export settings with user
Before exporting, present:
- Dataset name and sample count
- Available label fields and their types
- Proposed export format
- Export directory path
3. Match format to label types
Different formats support different label types:
| Format | Label Types |
|--------|-------------|
| COCO | detections, segmentations, keypoints |
| YOLO (v4, v5) | detections |
| VOC | detections |
| CVAT | classifications, detections, polylines, keypoints |
| CSV | all (custom fields) |
| Image Classification Directory Tree | classification |
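As a minimal sketch, the table can be turned into a lookup that suggests candidate formats for a label type. The names `FORMAT_LABEL_TYPES` and `formats_for` are hypothetical helpers for illustration and are not part of FiftyOne; the entries only mirror the table.

```python
# Hypothetical lookup that mirrors the format table; not a FiftyOne API.
FORMAT_LABEL_TYPES = {
    "COCO": {"detections", "segmentations", "keypoints"},
    "YOLO (v4, v5)": {"detections"},
    "VOC": {"detections"},
    "CVAT": {"classifications", "detections", "polylines", "keypoints"},
    "Image Classification Directory Tree": {"classification"},
}


def formats_for(label_type):
    """Return the formats from the table that list the given label type."""
    matches = [
        name for name, types in FORMAT_LABEL_TYPES.items() if label_type in types
    ]
    # CSV exports custom fields, so the table lists it as supporting all types.
    return matches + ["CSV"]
```

For example, `formats_for("keypoints")` yields COCO, CVAT, and CSV, which can then be offered to the user as the proposed export formats.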
4. Use absolute paths
Always use absolute paths for export directories:
params={
"export_dir": {"absolute_path": "/path/to/export"}
}
5. Warn about overwriting
Check whether the export directory exists before exporting. If it does, ask the user whether to overwrite it.
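One way to implement rules 4 and 5 together is a small pre-flight check, sketched here with the standard library only. The function name `needs_overwrite_confirmation` is a hypothetical helper, not a FiftyOne or MCP call.

```python
from pathlib import Path


def needs_overwrite_confirmation(export_dir):
    """Return True if export_dir already exists, so the user should be asked first."""
    path = Path(export_dir)
    # Rule 4: reject relative paths before doing anything else.
    if not path.is_absolute():
        raise ValueError(f"export_dir must be an absolute path: {export_dir}")
    # Rule 5: an existing path means the export could overwrite something.
    return path.exists()
```

If the function returns True, pause and ask the user before running the export.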
Complete Workflow
Step 1: Load Dataset and Understand Content
# Set context
set_context(dataset_name="my-dataset")
# Get dataset summary to see fields and label types
dataset_summary(name="my-dataset")
Identify:
- Total sample count
- Media type (images, videos, point clouds)
- Available label fields and their types (Detections, Classifications, etc.)
Step 2: Get Export Operator Schema
# Discover export parameters dynamically
get_operator_schema(operator_uri="@voxel51/io/export_samples")
Step 3: Present Export Options to User
Before exporting, confirm with the user:
Dataset: my-dataset (5,000 samples)
Media type: image
Available label fields:
- ground_truth (Detections)
- predictions (Detections)
Export options:
- Format: COCO (recommended for detections)
- Export directory: /path/to/export
- Label field: ground_truth
Proceed with export?
Step 4: Execute Export
Export media and labels:
execute_operator(
operator_uri="@voxel51/io/export_samples",
params={
"export_type": "MEDIA_AND_LABELS",
"dataset_type": "COCO",
"export_dir": {"absolute_path": "/path/to/export"},
"label_field": "ground_truth"
}
)
Export labels only (no media copy):
execute_operator(
operator_uri="@voxel51/io/export_samples",
params={
"export_type": "LABELS_ONLY",
"dataset_type": "COCO",
"labels_path": {"absolute_path": "/path/to/labels.json"},
"label_field": "ground_truth"
}
)
Export media only (no labels):
execute_operator(
operator_uri="@voxel51/io/export_samples",
params={
"export_type": "MEDIA_ONLY",
"export_dir": {"absolute_path": "/path/to/media"}
}
)
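The three calls above differ mainly in which path parameter they use. The sketch below assembles a `params` dict in the same shape; `build_export_params` is a hypothetical helper, and it deliberately rejects `FILEPATHS_ONLY` because this document does not show which path parameter that export type takes.

```python
def build_export_params(export_type, path, dataset_type=None, label_field=None):
    """Assemble a params dict shaped like the examples above (hypothetical helper)."""
    if export_type == "LABELS_ONLY":
        # Labels-only exports point at a labels file, not a directory.
        params = {"export_type": export_type, "labels_path": {"absolute_path": path}}
    elif export_type in ("MEDIA_AND_LABELS", "MEDIA_ONLY"):
        params = {"export_type": export_type, "export_dir": {"absolute_path": path}}
    else:
        raise ValueError(f"export type not covered by this sketch: {export_type}")
    if export_type != "MEDIA_ONLY":
        # Exports that include labels also carry a format and, usually, a label field.
        params["dataset_type"] = dataset_type
        if label_field is not None:
            params["label_field"] = label_field
    return params
```

The returned dict can be passed as `params` to `execute_operator` with `operator_uri="@voxel51/io/export_samples"`.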
Step 5: Verify Export
After export, verify the output:
ls -la /path/to/export
Report exported file count and structure to user.
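Where a shell is not available, the same check can be sketched in Python. `summarize_export` is a hypothetical helper that counts files by extension, which is usually enough to report the file count and structure.

```python
from collections import Counter
from pathlib import Path


def summarize_export(export_dir):
    """Count exported files by extension, walking subdirectories."""
    files = [p for p in Path(export_dir).rglob("*") if p.is_file()]
    # Lowercase the suffix so .JPG and .jpg are counted together.
    by_suffix = Counter(p.suffix.lower() or "(none)" for p in files)
    return len(files), dict(by_suffix)
```

Comparing the media count against the sample count from `dataset_summary()` gives a quick signal that nothing was skipped.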
Supported Export Formats
Detection Formats
| Format | dataset_type Value | Label Types | Labels-Only |
|--------|----------------------|-------------|-------------|
| COCO | "COCO" | detections, segmentations, keypoints | Yes |
| YOLOv4 | "YOLOv4" | detections | Yes |
| YOLOv5 | "YOLOv5" | detections | No |
| VOC | "VOC" | detections | Yes |
| KITTI | "KITTI" | detections | Yes |
| CVAT Image | "CVAT Image" | classifications, detections, polylines, keypoints | Yes |
| CVAT Video | "CVAT Video" | frame labels | Yes |
| TF Object Detection | "TF Object Detection" | detections | No |
Classification Formats
| Format | dataset_type Value | Media Type | Labels-Only |
|--------|----------------------|------------|-------------|
| Image Classification Directory Tree | "Image Classification Directory Tree" | image | No |
| Video Classification Directory Tree | "Video Classification Directory Tree" | video | No |
| TF Image Classification | "TF Image Classification" | image | No |
Segmentation Formats
| Format | dataset_type Value | Label Types | Labels-Only |
|--------|----------------------|-------------|-------------|
| Image Segmentation | "Image Segmentation" | segmentation | Yes |
General Formats
| Format | dataset_type Value | Best For | Labels-Only |
|--------|----------------------|----------|-------------|
| CSV | "CSV" | Custom fields, spreadsheet analysis | Yes |
| GeoJSON | "GeoJSON" | Geolocation data | Yes |
| FiftyOne Dataset | "FiftyOne Dataset" | Full dataset backup with all metadata | Yes |
Note: Formats with "Labels-Only: No" require export_type: "MEDIA_AND_LABELS" (cannot export labels without media).
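This constraint can be checked before calling the operator. The sketch below hard-codes the formats that the tables above mark as "Labels-Only: No"; `MEDIA_REQUIRED_FORMATS` and `check_export_type` are hypothetical names, and the set should be updated if the tables change.

```python
# Formats the tables above mark as "Labels-Only: No".
MEDIA_REQUIRED_FORMATS = {
    "YOLOv5",
    "TF Object Detection",
    "Image Classification Directory Tree",
    "Video Classification Directory Tree",
    "TF Image Classification",
}


def check_export_type(dataset_type, export_type):
    """Raise ValueError if a labels-only export is requested for a format that needs media."""
    if export_type == "LABELS_ONLY" and dataset_type in MEDIA_REQUIRED_FORMATS:
        raise ValueError(f'{dataset_type} requires export_type "MEDIA_AND_LABELS"')
    return export_type
```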
Export Type Options
| export_type Value | Description |
|---------------------|-------------|
| "MEDIA_AND_LABELS" | Export both media files and labels |
| "LABELS_ONLY" | Export labels only (use labels_path instead of export_dir) |
| "MEDIA_ONLY" | Export media files only (no labels) |
| "FILEPATHS_ONLY" | Export CSV with filepaths only |
Target Options
Export from different sources:
| target Value | Description |
|----------------|-------------|
| "DATASET" | Export entire dataset (default) |
| "CURRENT_VIEW" | Export current filtered view |
| "SELECTED_SAMPLES" | Export selected samples only |
Common Use Cases
Use Case 1: Export to COCO Format
For training with frameworks that use COCO format:
set_context(dataset_name="my-dataset")
execute_operator(
operator_uri="@voxel51/io/export_samples",
params={
"export_type": "MEDIA_AND_LABELS",
"dataset_type": "COCO",
"export_dir": {"absolute_path": "/path/to/coco_export"},
"label_field": "ground_truth"
}
)
Output structure:
coco_export/
├── data/
│ ├── image1.jpg
│ └── image2.jpg
└── labels.json
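To verify an export with this layout, one option is to compare the images listed in `labels.json` against the files in `data/`. This is a sketch that assumes the standard COCO keys `images`, `annotations`, and `file_name`; `check_coco_export` is a hypothetical helper.

```python
import json
from pathlib import Path


def check_coco_export(export_dir):
    """Compare the images listed in labels.json with the files in data/."""
    root = Path(export_dir)
    coco = json.loads((root / "labels.json").read_text())
    # Compare by base name in case file_name carries a relative directory.
    listed = {Path(image["file_name"]).name for image in coco.get("images", [])}
    on_disk = {p.name for p in (root / "data").iterdir() if p.is_file()}
    return {
        "images": len(listed),
        "annotations": len(coco.get("annotations", [])),
        "missing_files": sorted(listed - on_disk),
    }
```

An empty `missing_files` list means every image referenced by the labels was copied.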
Use Case 2: Export to YOLO Format
For training YOLOv5/v8 models:
set_context(dataset_name="my-dataset")
execute_operator(
operator_uri="@voxel51/io/export_samples",
params={
"export_type": "MEDIA_AND_LABELS",
"dataset_type": "YOLOv5",
"export_dir": {"absolute_path": "/path/to/yolo_export"},
"label_field": "ground_truth"
}
)
Output structure:
yolo_export/
├── images/
│ └── train/
│ └── image1.jpg
├── labels/
│ └── train/
│ └── image1.txt
└── dataset.yaml
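Each `.txt` file under `labels/` holds one object per line. The sketch below parses a single row, assuming the common five-column YOLO detection layout (class index, then center x, center y, width, and height, all normalized to the image size). Rows with extra columns, such as a confidence value or polygon points, are outside this sketch, and `parse_yolo_line` is a hypothetical helper.

```python
def parse_yolo_line(line):
    """Parse one 'class x_center y_center width height' row with normalized coordinates."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 columns, got {len(parts)}: {line!r}")
    class_index = int(parts[0])
    x_center, y_center, width, height = (float(v) for v in parts[1:])
    return class_index, (x_center, y_center, width, height)
```

Spot-checking a few rows this way confirms that the class indices line up with the names in `dataset.yaml`.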
Use Case 3: Export Filtered View
Export only a subset of samples:
# Set context
set_context(dataset_name="my-dataset")
# Filter samples in the App
set_view(tags=["validated"])
# Export the filtered view
execute_operator(
operator_uri="@voxel51/io/export_samples",
params={
"target": "CURRENT_VIEW",
"export_type": "MEDIA_AND_LABELS",
"dataset_type": "COCO",
"export_dir": {"absolute_path": "/path/to/validated_export"},
"label_field": "ground_truth"
}
)
Use Case 4: Export Labels Only
When media should stay in place:
set_context(dataset_name="my-dataset")
execute_operator(
operator_uri="@voxel51/io/export_samples",
params={
"export_type": "LABELS_ONLY",
"dataset_type": "COCO",
"labels_path": {"absolute_path": "/path/to/annotations.json"},
"label_field": "ground_truth"
}
)
Use Case 5: Export for Classification Training
For image classification datasets:
set_context(dataset_name="my-classification-dataset")
execute_operator(
operator_uri="@voxel51/io/export_samples",
params={
"export_type": "MEDIA_AND_LABELS",
"dataset_type": "Image Classification Directory Tree",
"export_dir": {"absolute_path": "/path/to/classification_export"},
"label_field": "ground_truth"
}
)
Output structure:
classification_export/
├── cat/
│ ├── cat1.jpg
│ └── cat2.jpg
└── dog/
├── dog1.jpg
└── dog2.jpg
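Because each class is a folder, the class balance of the export can be read straight off the directory tree. A minimal sketch with a hypothetical `count_images_per_class` helper:

```python
from pathlib import Path


def count_images_per_class(export_dir):
    """Map each class folder name to the number of files it contains."""
    counts = {}
    for class_dir in sorted(Path(export_dir).iterdir()):
        # Ignore stray files at the top level; only folders are classes.
        if class_dir.is_dir():
            counts[class_dir.name] = sum(1 for p in class_dir.iterdir() if p.is_file())
    return counts
```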
Use Case 6: Export to CSV
For analysis in spreadsheets:
set_context(dataset_name="my-dataset")
execute_operator(
operator_uri="@voxel51/io/export_samples",
params={
"export_type": "LABELS_ONLY",
"dataset_type": "CSV",
"labels_path": {"absolute_path": "/path/to/data.csv"},
"csv_fields": ["filepath", "ground_truth.detections.label"]
}
)
Use Case 7: Export FiftyOne Dataset (Full Backup)
For complete dataset backup including all metadata:
set_context(dataset_name="my-dataset")
execute_operator(
operator_uri="@voxel51/io/export_samples",
params={
"export_type": "MEDIA_AND_LABELS",
"dataset_type": "FiftyOne Dataset",
"export_dir": {"absolute_path": "/path/to/backup"}
}
)
Output structure:
backup/
├── metadata.json
├── samples.json
├── data/
│ └── ...
├── annotations/
├── brain/
└── evaluations/
Python SDK Alternative
For more control, guide users to use the Python SDK directly:
import fiftyone as fo
import fiftyone.types as fot
# Load dataset
dataset = fo.load_dataset("my-dataset")
# Export to COCO format
dataset.export(
export_dir="/path/to/export",
dataset_type=fot.COCODetectionDataset,
label_field="ground_truth",
)
# Export labels only
dataset.export(
labels_path="/path/to/labels.json",
dataset_type=fot.COCODetectionDataset,
label_field="ground_truth",
)
# Export a filtered view
view = dataset.match_tags("validated")
view.export(
export_dir="/path/to/validated",
dataset_type=fot.YOLOv5Dataset,
label_field="ground_truth",
)
Python SDK dataset types:
- fot.COCODetectionDataset - COCO format
- fot.YOLOv4Dataset - YOLOv4 format
- fot.YOLOv5Dataset - YOLOv5 format
- fot.VOCDetectionDataset - Pascal VOC format
- fot.KITTIDetectionDataset - KITTI format
- fot.CVATImageDataset - CVAT image format
- fot.CVATVideoDataset - CVAT video format
- fot.TFObjectDetectionDataset - TensorFlow Object Detection format
- fot.ImageClassificationDirectoryTree - Classification folder structure
- fot.VideoClassificationDirectoryTree - Video classification folders
- fot.TFImageClassificationDataset - TensorFlow classification format
- fot.ImageSegmentationDirectory - Segmentation masks
- fot.CSVDataset - CSV format
- fot.GeoJSONDataset - GeoJSON format
- fot.FiftyOneDataset - Native FiftyOne format
Exporting to Hugging Face Hub
For complete HF Hub export documentation, see HF-HUB-EXPORT.md.
Quick reference:
| Method | Use Case |
|--------|----------|
| push_to_hub() | Personal accounts, simple upload |
| Manual upload | Organizations, private org repos |
Quick start:
from fiftyone.utils.huggingface import push_to_hub
# Personal account
push_to_hub(dataset, repo_name="my-dataset", private=False)
# With options
push_to_hub(
dataset,
repo_name="my-dataset",
description="My dataset description",
license="apache-2.0",
private=True,
)
IMPORTANT: Always generate a dataset card and get user approval for it before uploading. See HF-HUB-EXPORT.md for complete documentation, including authentication setup, dataset card workflow, parameters reference, use cases, and troubleshooting.
Troubleshooting
Error: "Export directory already exists"
- Add "overwrite": true to params
- Or specify a different export directory
Error: "Label field not found"
- Use dataset_summary() to see available label fields
- Verify the field name spelling
Error: "Unsupported label type for format"
- Check that the export format supports your label type
- COCO: detections, segmentations, keypoints
- YOLO: detections only
- Classification formats: classification labels only
Error: "Permission denied"
- Verify write permissions for the export directory
- Check that the parent directory exists
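Both checks can be run up front instead of waiting for the export to fail. The sketch below uses `os.access`, which reports permissions for the current user; `can_write_export_dir` is a hypothetical helper.

```python
import os
from pathlib import Path


def can_write_export_dir(export_dir):
    """Return True if export_dir can be created or written to by the current user."""
    path = Path(export_dir)
    if path.exists():
        return path.is_dir() and os.access(path, os.W_OK)
    # The directory does not exist yet, so its parent must exist and be writable.
    parent = path.parent
    return parent.is_dir() and os.access(parent, os.W_OK | os.X_OK)
```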
Export is slow
- Large datasets take time; consider exporting a view first
- Export to local disk rather than network drives
- For labels only, use the LABELS_ONLY export type
Best Practices
- Understand your data first - Use dataset_summary() to know what fields and label types exist
- Match format to purpose - Use COCO/YOLO for training, CSV for analysis, FiftyOne Dataset for backups
- Confirm with user - Present export settings before executing
- Export filtered views - Only export what's needed rather than entire datasets
- Verify after export - Check exported file counts match expectations
- Use labels_path for LABELS_ONLY - When exporting labels only, use labels_path, not export_dir