Agent Skills: scvi-tools Deep Learning Skill

Deep learning for single-cell analysis using scvi-tools. This skill should be used when users need (1) data integration and batch correction with scVI/scANVI, (2) ATAC-seq analysis with PeakVI, (3) CITE-seq multi-modal analysis with totalVI, (4) multiome RNA+ATAC analysis with MultiVI, (5) spatial transcriptomics deconvolution with DestVI, (6) label transfer and reference mapping with scANVI/scArches, (7) RNA velocity with veloVI, or (8) any deep learning-based single-cell method. Triggers include mentions of scVI, scANVI, totalVI, PeakVI, MultiVI, DestVI, veloVI, sysVI, scArches, variational autoencoder, VAE, batch correction, data integration, multi-modal, CITE-seq, multiome, reference mapping, latent space.

ID: anthropics/life-sciences/scvi-tools

Install this agent skill locally:

pnpm dlx add-skill https://github.com/anthropics/life-sciences/tree/HEAD/scvi-tools

scvi-tools/SKILL.md

scvi-tools Deep Learning Skill

This skill provides guidance for deep learning-based single-cell analysis using scvi-tools, a framework for probabilistic models in single-cell genomics.

How to Use This Skill

  1. Identify the appropriate workflow from the model/workflow tables below
  2. Read the corresponding reference file for detailed steps and code
  3. Use scripts in scripts/ to avoid rewriting common code
  4. For installation or GPU issues, consult references/environment_setup.md
  5. For debugging, consult references/troubleshooting.md

When to Use This Skill

  • When scvi-tools, scVI, scANVI, or related models are mentioned
  • When deep learning-based batch correction or integration is needed
  • When working with multi-modal data (CITE-seq, multiome)
  • When reference mapping or label transfer is required
  • When analyzing ATAC-seq or spatial transcriptomics data
  • When learning latent representations of single-cell data

Model Selection Guide

| Data Type | Model | Primary Use Case |
|-----------|-------|------------------|
| scRNA-seq | scVI | Unsupervised integration, DE, imputation |
| scRNA-seq + labels | scANVI | Label transfer, semi-supervised integration |
| CITE-seq (RNA+protein) | totalVI | Multi-modal integration, protein denoising |
| scATAC-seq | PeakVI | Chromatin accessibility analysis |
| Multiome (RNA+ATAC) | MultiVI | Joint modality analysis |
| Spatial + scRNA reference | DestVI | Cell type deconvolution |
| RNA velocity | veloVI | Transcriptional dynamics |
| Cross-technology | sysVI | System-level batch correction |

Workflow Reference Files

| Workflow | Reference File | Description |
|----------|----------------|-------------|
| Environment Setup | references/environment_setup.md | Installation, GPU, version info |
| Data Preparation | references/data_preparation.md | Formatting data for any model |
| scRNA Integration | references/scrna_integration.md | scVI/scANVI batch correction |
| ATAC-seq Analysis | references/atac_peakvi.md | PeakVI for accessibility |
| CITE-seq Analysis | references/citeseq_totalvi.md | totalVI for protein+RNA |
| Multiome Analysis | references/multiome_multivi.md | MultiVI for RNA+ATAC |
| Spatial Deconvolution | references/spatial_deconvolution.md | DestVI spatial analysis |
| Label Transfer | references/label_transfer.md | scANVI reference mapping |
| scArches Mapping | references/scarches_mapping.md | Query-to-reference mapping |
| Batch Correction | references/batch_correction_sysvi.md | Advanced batch methods |
| RNA Velocity | references/rna_velocity_velovi.md | veloVI dynamics |
| Troubleshooting | references/troubleshooting.md | Common issues and solutions |

CLI Scripts

Modular scripts for common workflows. Chain them together or modify them as needed.

Pipeline Scripts

| Script | Purpose | Usage |
|--------|---------|-------|
| prepare_data.py | QC, filter, HVG selection | python scripts/prepare_data.py raw.h5ad prepared.h5ad --batch-key batch |
| train_model.py | Train any scvi-tools model | python scripts/train_model.py prepared.h5ad results/ --model scvi |
| cluster_embed.py | Neighbors, UMAP, Leiden | python scripts/cluster_embed.py adata.h5ad results/ |
| differential_expression.py | DE analysis | python scripts/differential_expression.py model/ adata.h5ad de.csv --groupby leiden |
| transfer_labels.py | Label transfer with scANVI | python scripts/transfer_labels.py ref_model/ query.h5ad results/ |
| integrate_datasets.py | Multi-dataset integration | python scripts/integrate_datasets.py results/ data1.h5ad data2.h5ad |
| validate_adata.py | Check data compatibility | python scripts/validate_adata.py data.h5ad --batch-key batch |

Example Workflow

# 1. Validate input data
python scripts/validate_adata.py raw.h5ad --batch-key batch --suggest

# 2. Prepare data (QC, HVG selection)
python scripts/prepare_data.py raw.h5ad prepared.h5ad --batch-key batch --n-hvgs 2000

# 3. Train model
python scripts/train_model.py prepared.h5ad results/ --model scvi --batch-key batch

# 4. Cluster and visualize
python scripts/cluster_embed.py results/adata_trained.h5ad results/ --resolution 0.8

# 5. Differential expression
python scripts/differential_expression.py results/model results/adata_clustered.h5ad results/de.csv --groupby leiden
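
The five steps above can also be chained from Python. The following is a minimal sketch, not part of the skill itself: `build_pipeline` and `run_pipeline` are hypothetical helper names, and the intermediate file names (`results/adata_trained.h5ad`, `results/adata_clustered.h5ad`, `results/model`) are assumed to match the example workflow. It only reuses the script names and flags shown above, and defaults to a dry run that prints each command.

```python
import shlex
import subprocess


def build_pipeline(raw, out_dir, batch_key="batch", n_hvgs=2000, resolution=0.8):
    """Return the five example-workflow commands as argument lists, in run order."""
    out = str(out_dir).rstrip("/")
    prepared = "prepared.h5ad"  # assumed intermediate name, as in the example
    return [
        ["python", "scripts/validate_adata.py", raw,
         "--batch-key", batch_key, "--suggest"],
        ["python", "scripts/prepare_data.py", raw, prepared,
         "--batch-key", batch_key, "--n-hvgs", str(n_hvgs)],
        ["python", "scripts/train_model.py", prepared, f"{out}/",
         "--model", "scvi", "--batch-key", batch_key],
        ["python", "scripts/cluster_embed.py", f"{out}/adata_trained.h5ad", f"{out}/",
         "--resolution", str(resolution)],
        ["python", "scripts/differential_expression.py", f"{out}/model",
         f"{out}/adata_clustered.h5ad", f"{out}/de.csv", "--groupby", "leiden"],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the chain at the first failing step
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(build_pipeline("raw.h5ad", "results"))
```

Calling `run_pipeline(..., dry_run=False)` would execute the steps in order and stop at the first non-zero exit code, which keeps a failed validation from feeding bad data into training.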

Python Utilities

The scripts/model_utils.py module provides importable functions for custom workflows:

| Function | Purpose |
|----------|---------|
| prepare_adata() | Data preparation (QC, HVG, layer setup) |
| train_scvi() | Train scVI or scANVI |
| evaluate_integration() | Compute integration metrics |
| get_marker_genes() | Extract DE markers |
| save_results() | Save model, data, plots |
| auto_select_model() | Suggest best model |
| quick_clustering() | Neighbors + UMAP + Leiden |

Critical Requirements

  1. Raw counts required: scvi-tools models require integer count data

    import scanpy as sc
    import scvi
    adata.layers["counts"] = adata.X.copy()  # Store raw counts before normalization
    scvi.model.SCVI.setup_anndata(adata, layer="counts")
    
  2. HVG selection: Use 2000-4000 highly variable genes

    sc.pp.highly_variable_genes(adata, n_top_genes=2000, batch_key="batch", layer="counts", flavor="seurat_v3")
    adata = adata[:, adata.var['highly_variable']].copy()
    
  3. Batch information: Specify batch_key for integration

    scvi.model.SCVI.setup_anndata(adata, layer="counts", batch_key="batch")
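
Because the first requirement is easy to violate by accident (for example, by saving normalized values into the counts layer), a quick sanity check before calling `setup_anndata` can help. The sketch below is an illustration only: `looks_like_raw_counts` is a hypothetical helper, not part of scvi-tools or of this skill's scripts. It accepts any iterable of numbers and inspects only the first `max_check` values, so a pass is a hint rather than a guarantee.

```python
from itertools import islice


def looks_like_raw_counts(values, max_check=100_000):
    """Return True if the first `max_check` values are non-negative integers.

    `values` can be any iterable of numbers. For an AnnData object, one
    assumed usage is `adata.layers["counts"].data` for a sparse matrix or
    `adata.layers["counts"].ravel()` for a dense array.
    """
    for v in islice(values, max_check):
        v = float(v)
        # Negative, fractional, NaN, and infinite values all fail the check
        if v < 0 or not v.is_integer():
            return False
    return True


print(looks_like_raw_counts([0, 3, 1.0, 12]))   # integer-valued counts
print(looks_like_raw_counts([0.0, 1.38, 2.2]))  # log-normalized values
```

The check treats `1.0` as a valid count, since count matrices are often stored as floats; only the values, not the dtype, are tested.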
    

Quick Decision Tree

Need to integrate scRNA-seq data?
├── Have cell type labels? → scANVI (references/label_transfer.md)
└── No labels? → scVI (references/scrna_integration.md)

Have multi-modal data?
├── CITE-seq (RNA + protein)? → totalVI (references/citeseq_totalvi.md)
├── Multiome (RNA + ATAC)? → MultiVI (references/multiome_multivi.md)
└── scATAC-seq only? → PeakVI (references/atac_peakvi.md)

Have spatial data?
└── Need cell type deconvolution? → DestVI (references/spatial_deconvolution.md)

Have pre-trained reference model?
└── Map query to reference? → scArches (references/scarches_mapping.md)

Need RNA velocity?
└── veloVI (references/rna_velocity_velovi.md)

Strong cross-technology batch effects?
└── sysVI (references/batch_correction_sysvi.md)
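
The decision tree above can be written as a small lookup. This is a sketch under stated assumptions: `suggest_model` and the task keywords are hypothetical names chosen for illustration, and the function is not the `auto_select_model()` utility from scripts/model_utils.py, whose behavior is not described here. The model names and reference paths are taken from the tree.

```python
REFERENCE_FILES = {
    "scVI": "references/scrna_integration.md",
    "scANVI": "references/label_transfer.md",
    "totalVI": "references/citeseq_totalvi.md",
    "MultiVI": "references/multiome_multivi.md",
    "PeakVI": "references/atac_peakvi.md",
    "DestVI": "references/spatial_deconvolution.md",
    "scArches": "references/scarches_mapping.md",
    "veloVI": "references/rna_velocity_velovi.md",
    "sysVI": "references/batch_correction_sysvi.md",
}

# Task keywords (hypothetical) mapped to the model named in the decision tree
TASK_TO_MODEL = {
    "cite-seq": "totalVI",
    "multiome": "MultiVI",
    "atac": "PeakVI",
    "spatial": "DestVI",
    "reference-mapping": "scArches",
    "velocity": "veloVI",
    "cross-technology": "sysVI",
}


def suggest_model(task, has_labels=False):
    """Return (model name, reference file) for a task keyword."""
    if task == "integration":
        # scRNA-seq integration branches on whether cell type labels exist
        model = "scANVI" if has_labels else "scVI"
    elif task in TASK_TO_MODEL:
        model = TASK_TO_MODEL[task]
    else:
        raise ValueError(f"Unknown task: {task!r}")
    return model, REFERENCE_FILES[model]


print(suggest_model("integration", has_labels=True))
print(suggest_model("multiome"))
```

Keeping the reference paths in one dictionary means a renamed reference file needs to be updated in a single place.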

Key Resources