Export FiftyOne Datasets Skill

Key Directives

ALWAYS follow these rules:

1. Load and understand the dataset first

set_context(dataset_name="my-dataset")
dataset_summary(name="my-dataset")

2. Confirm export settings with user

Before exporting, present:

Dataset name and sample count
Available label fields and their types
Proposed export format
Export directory path

3. Match format to label types

Different formats support different label types:

| Format | Label Types |
|--------|-------------|
| COCO | detections, segmentations, keypoints |
| YOLO (v4, v5) | detections |
| VOC | detections |
| CVAT | classifications, detections, polylines, keypoints |
| CSV | all (custom fields) |
| Image Classification Directory Tree | classification |
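The compatibility table can be turned into a quick pre-flight check. The sketch below is only an illustration: `SUPPORTED_LABEL_TYPES` and `format_supports` are hypothetical names, not part of FiftyOne or the export operator, and the keys assume the `dataset_type` values listed under Supported Export Formats.

```python
# Hypothetical lookup table transcribed from the format table;
# it is not part of FiftyOne or the export operator.
SUPPORTED_LABEL_TYPES = {
    "COCO": {"detections", "segmentations", "keypoints"},
    "YOLOv4": {"detections"},
    "YOLOv5": {"detections"},
    "VOC": {"detections"},
    "CVAT Image": {"classifications", "detections", "polylines", "keypoints"},
    "Image Classification Directory Tree": {"classification"},
}


def format_supports(dataset_type: str, label_type: str) -> bool:
    """Return True if the table lists label_type for dataset_type."""
    if dataset_type == "CSV":
        # The table lists CSV as supporting all (custom) fields.
        return True
    supported = SUPPORTED_LABEL_TYPES.get(dataset_type, set())
    return label_type.lower() in supported
```

For example, `format_supports("YOLOv5", "keypoints")` returns `False`, which is a cue to propose COCO or CVAT instead before running the export.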

4. Use absolute paths

Always use absolute paths for export directories:

params={
    "export_dir": {"absolute_path": "/path/to/export"}
}
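If the user supplies a relative or `~`-prefixed path, it can be normalized before it goes into `params`. This is a minimal sketch using only the standard library; `export_dir_param` is a hypothetical helper name, not an operator API.

```python
from pathlib import Path


def export_dir_param(path: str) -> dict:
    """Build the export_dir parameter from a possibly relative path."""
    # expanduser() handles "~"; resolve() makes the path absolute
    # (the directory does not need to exist yet).
    absolute = Path(path).expanduser().resolve()
    return {"export_dir": {"absolute_path": str(absolute)}}
```

The returned dictionary can be merged into the rest of the export parameters.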

5. Warn about overwriting

Check whether the export directory exists before exporting. If it does, ask the user whether to overwrite it.
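One way to run this check from Python is sketched below. It assumes that an empty directory is safe to export into without asking; `needs_overwrite_confirmation` is a hypothetical helper name.

```python
from pathlib import Path


def needs_overwrite_confirmation(path: str) -> bool:
    """Return True if exporting to path could clobber existing content."""
    target = Path(path)
    if not target.exists():
        return False
    if target.is_dir():
        # Assumption: an empty directory does not need confirmation.
        return any(target.iterdir())
    # The path exists but is a file (for example a previous labels file).
    return True
```

When this returns `True`, ask the user before proceeding.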

Complete Workflow

Step 1: Load Dataset and Understand Content

# Set context
set_context(dataset_name="my-dataset")

# Get dataset summary to see fields and label types
dataset_summary(name="my-dataset")

Identify:

Total sample count
Media type (images, videos, point clouds)
Available label fields and their types (Detections, Classifications, etc.)

Step 2: Get Export Operator Schema

# Discover export parameters dynamically
get_operator_schema(operator_uri="@voxel51/io/export_samples")

Step 3: Present Export Options to User

Before exporting, confirm with the user:

Dataset: my-dataset (5,000 samples)
Media type: image

Available label fields:
  - ground_truth (Detections)
  - predictions (Detections)

Export options:
  - Format: COCO (recommended for detections)
  - Export directory: /path/to/export
  - Label field: ground_truth

Proceed with export?

Step 4: Execute Export

Export media and labels:

execute_operator(
    operator_uri="@voxel51/io/export_samples",
    params={
        "export_type": "MEDIA_AND_LABELS",
        "dataset_type": "COCO",
        "export_dir": {"absolute_path": "/path/to/export"},
        "label_field": "ground_truth"
    }
)

Export labels only (no media copy):

execute_operator(
    operator_uri="@voxel51/io/export_samples",
    params={
        "export_type": "LABELS_ONLY",
        "dataset_type": "COCO",
        "labels_path": {"absolute_path": "/path/to/labels.json"},
        "label_field": "ground_truth"
    }
)

Export media only (no labels):

execute_operator(
    operator_uri="@voxel51/io/export_samples",
    params={
        "export_type": "MEDIA_ONLY",
        "export_dir": {"absolute_path": "/path/to/media"}
    }
)

Step 5: Verify Export

After export, verify the output:

ls -la /path/to/export

Report exported file count and structure to user.
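Where a shell is not available, the same verification can be done from Python. The sketch below counts exported files by extension so the totals can be compared with the dataset's sample count; `summarize_export` is a hypothetical helper name.

```python
from collections import Counter
from pathlib import Path


def summarize_export(export_dir: str) -> Counter:
    """Count files under export_dir by lowercased extension, recursively."""
    root = Path(export_dir)
    return Counter(
        p.suffix.lower() or "<none>"
        for p in root.rglob("*")
        if p.is_file()
    )
```

For a COCO export, the `.jpg` (or other image) count would be expected to match the number of exported samples, alongside a single `.json` labels file.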

Supported Export Formats

Detection Formats

| Format | dataset_type Value | Label Types | Labels-Only |
|--------|----------------------|-------------|-------------|
| COCO | "COCO" | detections, segmentations, keypoints | Yes |
| YOLOv4 | "YOLOv4" | detections | Yes |
| YOLOv5 | "YOLOv5" | detections | No |
| VOC | "VOC" | detections | Yes |
| KITTI | "KITTI" | detections | Yes |
| CVAT Image | "CVAT Image" | classifications, detections, polylines, keypoints | Yes |
| CVAT Video | "CVAT Video" | frame labels | Yes |
| TF Object Detection | "TF Object Detection" | detections | No |

Classification Formats

| Format | dataset_type Value | Media Type | Labels-Only |
|--------|----------------------|------------|-------------|
| Image Classification Directory Tree | "Image Classification Directory Tree" | image | No |
| Video Classification Directory Tree | "Video Classification Directory Tree" | video | No |
| TF Image Classification | "TF Image Classification" | image | No |

Segmentation Formats

| Format | dataset_type Value | Label Types | Labels-Only |
|--------|----------------------|-------------|-------------|
| Image Segmentation | "Image Segmentation" | segmentation | Yes |

General Formats

| Format | dataset_type Value | Best For | Labels-Only |
|--------|----------------------|----------|-------------|
| CSV | "CSV" | Custom fields, spreadsheet analysis | Yes |
| GeoJSON | "GeoJSON" | Geolocation data | Yes |
| FiftyOne Dataset | "FiftyOne Dataset" | Full dataset backup with all metadata | Yes |

Note: Formats with "Labels-Only: No" require export_type: "MEDIA_AND_LABELS" (cannot export labels without media).

Export Type Options

| export_type Value | Description |
|---------------------|-------------|
| "MEDIA_AND_LABELS" | Export both media files and labels |
| "LABELS_ONLY" | Export labels only (use labels_path instead of export_dir) |
| "MEDIA_ONLY" | Export media files only (no labels) |
| "FILEPATHS_ONLY" | Export CSV with filepaths only |
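The path-key rule and the "Labels-Only: No" restriction can be enforced before calling the operator. The sketch below covers only the three export types shown in Step 4; `build_export_params` and `MEDIA_REQUIRED` are hypothetical names, and the parameters for "FILEPATHS_ONLY" should be taken from the operator schema rather than guessed.

```python
from typing import Optional

# Formats marked "Labels-Only: No" in the format tables.
MEDIA_REQUIRED = {
    "YOLOv5",
    "TF Object Detection",
    "Image Classification Directory Tree",
    "Video Classification Directory Tree",
    "TF Image Classification",
}


def build_export_params(
    export_type: str,
    path: str,
    dataset_type: Optional[str] = None,
    label_field: Optional[str] = None,
) -> dict:
    """Assemble operator params for the export types shown in Step 4."""
    if export_type not in {"MEDIA_AND_LABELS", "LABELS_ONLY", "MEDIA_ONLY"}:
        raise ValueError(f"export_type not covered by this sketch: {export_type}")
    if export_type == "LABELS_ONLY" and dataset_type in MEDIA_REQUIRED:
        raise ValueError(f"{dataset_type} cannot be exported without media")

    # LABELS_ONLY writes to labels_path; the other types write to export_dir.
    path_key = "labels_path" if export_type == "LABELS_ONLY" else "export_dir"
    params = {"export_type": export_type, path_key: {"absolute_path": path}}

    if export_type != "MEDIA_ONLY":
        params["dataset_type"] = dataset_type
        if label_field is not None:
            params["label_field"] = label_field
    return params
```

The result is intended to be passed as `params` to `execute_operator`; raising early gives the agent a chance to suggest "MEDIA_AND_LABELS" instead.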

Target Options

Export from different sources:

| target Value | Description |
|----------------|-------------|
| "DATASET" | Export entire dataset (default) |
| "CURRENT_VIEW" | Export current filtered view |
| "SELECTED_SAMPLES" | Export selected samples only |

Common Use Cases

Use Case 1: Export to COCO Format

For training with frameworks that use COCO format:

set_context(dataset_name="my-dataset")

execute_operator(
    operator_uri="@voxel51/io/export_samples",
    params={
        "export_type": "MEDIA_AND_LABELS",
        "dataset_type": "COCO",
        "export_dir": {"absolute_path": "/path/to/coco_export"},
        "label_field": "ground_truth"
    }
)

Output structure:

coco_export/
├── data/
│   ├── image1.jpg
│   └── image2.jpg
└── labels.json
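To sanity-check a COCO export, the labels file can be loaded and its top-level lists counted. This sketch assumes the standard COCO layout with `images`, `annotations`, and `categories` keys; `coco_counts` is a hypothetical helper name.

```python
import json
from pathlib import Path


def coco_counts(export_dir: str) -> dict:
    """Count images, annotations, and categories in labels.json."""
    labels = json.loads((Path(export_dir) / "labels.json").read_text())
    return {
        # Missing keys are treated as empty lists rather than errors.
        "images": len(labels.get("images", [])),
        "annotations": len(labels.get("annotations", [])),
        "categories": len(labels.get("categories", [])),
    }
```

The `images` count should normally equal the number of files under `data/`.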

Use Case 2: Export to YOLO Format

For training YOLOv5/v8 models:

set_context(dataset_name="my-dataset")

execute_operator(
    operator_uri="@voxel51/io/export_samples",
    params={
        "export_type": "MEDIA_AND_LABELS",
        "dataset_type": "YOLOv5",
        "export_dir": {"absolute_path": "/path/to/yolo_export"},
        "label_field": "ground_truth"
    }
)

Output structure:

yolo_export/
├── images/
│   └── train/
│       └── image1.jpg
├── labels/
│   └── train/
│       └── image1.txt
└── dataset.yaml
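A quick consistency check for this layout is to look for images that have no matching label file. The sketch below assumes the `images/<split>` and `labels/<split>` structure shown above; `images_missing_labels` is a hypothetical helper name, and a missing label file is not necessarily an error, since it may simply mean the image has no objects.

```python
from pathlib import Path


def images_missing_labels(export_dir: str, split: str = "train") -> list:
    """Return names of images in a split that have no matching .txt file."""
    root = Path(export_dir)
    label_stems = {p.stem for p in (root / "labels" / split).glob("*.txt")}
    return sorted(
        p.name
        for p in (root / "images" / split).iterdir()
        if p.is_file() and p.stem not in label_stems
    )
```

An empty list means every image in the split has a label file.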

Use Case 3: Export Filtered View

Export only a subset of samples:

# Set context
set_context(dataset_name="my-dataset")

# Filter samples in the App
set_view(tags=["validated"])

# Export the filtered view
execute_operator(
    operator_uri="@voxel51/io/export_samples",
    params={
        "target": "CURRENT_VIEW",
        "export_type": "MEDIA_AND_LABELS",
        "dataset_type": "COCO",
        "export_dir": {"absolute_path": "/path/to/validated_export"},
        "label_field": "ground_truth"
    }
)

Use Case 4: Export Labels Only

When media should stay in place:

set_context(dataset_name="my-dataset")

execute_operator(
    operator_uri="@voxel51/io/export_samples",
    params={
        "export_type": "LABELS_ONLY",
        "dataset_type": "COCO",
        "labels_path": {"absolute_path": "/path/to/annotations.json"},
        "label_field": "ground_truth"
    }
)

Use Case 5: Export for Classification Training

For image classification datasets:

set_context(dataset_name="my-classification-dataset")

execute_operator(
    operator_uri="@voxel51/io/export_samples",
    params={
        "export_type": "MEDIA_AND_LABELS",
        "dataset_type": "Image Classification Directory Tree",
        "export_dir": {"absolute_path": "/path/to/classification_export"},
        "label_field": "ground_truth"
    }
)

Output structure:

classification_export/
├── cat/
│   ├── cat1.jpg
│   └── cat2.jpg
└── dog/
    ├── dog1.jpg
    └── dog2.jpg
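Because each class is a folder, the class distribution of the export can be read straight from the directory tree. A minimal sketch, assuming the one-folder-per-class layout shown above; `class_counts` is a hypothetical helper name.

```python
from pathlib import Path


def class_counts(export_dir: str) -> dict:
    """Map each class folder name to the number of files it contains."""
    return {
        d.name: sum(1 for f in d.iterdir() if f.is_file())
        for d in sorted(Path(export_dir).iterdir())
        if d.is_dir()
    }
```

Reporting these counts to the user makes class imbalance or a missing class visible immediately.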

Use Case 6: Export to CSV

For analysis in spreadsheets:

set_context(dataset_name="my-dataset")

execute_operator(
    operator_uri="@voxel51/io/export_samples",
    params={
        "export_type": "LABELS_ONLY",
        "dataset_type": "CSV",
        "labels_path": {"absolute_path": "/path/to/data.csv"},
        "csv_fields": ["filepath", "ground_truth.detections.label"]
    }
)
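Before handing the CSV to the user, it can help to preview its header and first rows. The sketch below uses only the standard library and makes no assumption about the column names the export writes; `preview_csv` is a hypothetical helper name.

```python
import csv
from itertools import islice


def preview_csv(path: str, limit: int = 5):
    """Return the column names and up to `limit` rows of a CSV file."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        rows = list(islice(reader, limit))
        return reader.fieldnames, rows
```

Comparing the returned column names with the requested `csv_fields` confirms that the expected fields were exported.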

Use Case 7: Export FiftyOne Dataset (Full Backup)

For complete dataset backup including all metadata:

set_context(dataset_name="my-dataset")

execute_operator(
    operator_uri="@voxel51/io/export_samples",
    params={
        "export_type": "MEDIA_AND_LABELS",
        "dataset_type": "FiftyOne Dataset",
        "export_dir": {"absolute_path": "/path/to/backup"}
    }
)

Output structure:

backup/
├── metadata.json
├── samples.json
├── data/
│   └── ...
├── annotations/
├── brain/
└── evaluations/

Python SDK Alternative

For more control, guide users to use the Python SDK directly:

import fiftyone as fo
import fiftyone.types as fot

# Load dataset
dataset = fo.load_dataset("my-dataset")

# Export to COCO format
dataset.export(
    export_dir="/path/to/export",
    dataset_type=fot.COCODetectionDataset,
    label_field="ground_truth",
)

# Export labels only
dataset.export(
    labels_path="/path/to/labels.json",
    dataset_type=fot.COCODetectionDataset,
    label_field="ground_truth",
)

# Export a filtered view
view = dataset.match_tags("validated")
view.export(
    export_dir="/path/to/validated",
    dataset_type=fot.YOLOv5Dataset,
    label_field="ground_truth",
)

Python SDK dataset types:

fot.COCODetectionDataset - COCO format
fot.YOLOv4Dataset - YOLOv4 format
fot.YOLOv5Dataset - YOLOv5 format
fot.VOCDetectionDataset - Pascal VOC format
fot.KITTIDetectionDataset - KITTI format
fot.CVATImageDataset - CVAT image format
fot.CVATVideoDataset - CVAT video format
fot.TFObjectDetectionDataset - TensorFlow Object Detection format
fot.ImageClassificationDirectoryTree - Classification folder structure
fot.VideoClassificationDirectoryTree - Video classification folders
fot.TFImageClassificationDataset - TensorFlow classification format
fot.ImageSegmentationDirectory - Segmentation masks
fot.CSVDataset - CSV format
fot.GeoJSONDataset - GeoJSON format
fot.FiftyOneDataset - Native FiftyOne format

Exporting to Hugging Face Hub

For complete HF Hub export documentation, see HF-HUB-EXPORT.md.

Quick reference:

| Method | Use Case |
|--------|----------|
| push_to_hub() | Personal accounts, simple upload |
| Manual upload | Organizations, private org repos |

Quick start:

from fiftyone.utils.huggingface import push_to_hub

# Personal account
push_to_hub(dataset, repo_name="my-dataset", private=False)

# With options
push_to_hub(
    dataset,
    repo_name="my-dataset",
    description="My dataset description",
    license="apache-2.0",
    private=True,
)

IMPORTANT: Always generate a dataset card and get the user's approval for it before uploading. See HF-HUB-EXPORT.md for complete documentation, including authentication setup, the dataset card workflow, a parameters reference, use cases, and troubleshooting.

Troubleshooting

Error: "Export directory already exists"

Add "overwrite": true to params
Or specify a different export directory

Error: "Label field not found"

Use dataset_summary() to see available label fields
Verify the field name spelling

Error: "Unsupported label type for format"

Check that the export format supports your label type
COCO: detections, segmentations, keypoints
YOLO: detections only
Classification formats: classification labels only

Error: "Permission denied"

Verify write permissions for the export directory
Check that the parent directory exists

Export is slow

Large datasets take time; consider exporting a view first
Export to local disk rather than network drives
For labels only, use LABELS_ONLY export type

Best Practices

Understand your data first - Use dataset_summary() to know what fields and label types exist
Match format to purpose - Use COCO/YOLO for training, CSV for analysis, FiftyOne Dataset for backups
Confirm with user - Present export settings before executing
Export filtered views - Only export what's needed rather than entire datasets
Verify after export - Check exported file counts match expectations
Use labels_path for LABELS_ONLY - When exporting labels only, use labels_path not export_dir
