RFdiffusion Backbone Generation Skill

RFdiffusion Backbone Generation

Prerequisites

| Requirement | Minimum | Recommended | |-------------|---------|-------------| | Python | 3.9+ | 3.10 | | CUDA | 11.7+ | 12.0+ | | GPU VRAM | 16GB | 24GB (A10G) | | RAM | 16GB | 32GB |

How to run

First time? See Installation Guide to set up Modal and biomodals.

Option 1: Modal (recommended)

# Clone biomodals
git clone https://github.com/hgbrian/biomodals && cd biomodals

# Basic binder design
modal run modal_rfdiffusion.py \
  --pdb target.pdb \
  --contigs "A1-150/0 70-100" \
  --hotspot "A45,A67,A89" \
  --num-designs 100

# With custom GPU/timeout
GPU=A100 TIMEOUT=60 modal run modal_rfdiffusion.py \
  --pdb target.pdb \
  --contigs "A1-150/0 70-100" \
  --num-designs 100

GPU: A10G (24GB) | Timeout: 30min default

Option 2: Local installation

# Clone and install
git clone https://github.com/RosettaCommons/RFdiffusion.git
cd RFdiffusion && pip install -e .

# Download weights
wget http://files.ipd.uw.edu/pub/RFdiffusion/models/Complex_base_ckpt.pt

# Run inference
python run_inference.py \
  inference.input_pdb=target.pdb \
  contigmap.contigs=[A1-150/0 70-100] \
  ppi.hotspot_res=[A45,A67,A89] \
  inference.num_designs=100

Config Schema (Hydra)

Contigmap Syntax

# De novo single chain (50-100 residues)
contigmap.contigs=[50-100]

# Binder + target (A = target chain, fixed with /0)
contigmap.contigs=[A1-150/0 70-100]

# Motif scaffolding (preserve residues, /0 = fixed)
contigmap.contigs=[20-40/0 A10-30/0 20-40]

# Multi-chain binder
contigmap.contigs=[A1-100/0 B1-100/0 60-80]

# Variable length ranges
contigmap.contigs=[A1-150/0 50-100]  # Binder 50-100 AA

Hotspot Specification

# Residues for interface (chain + resnum, no spaces)
ppi.hotspot_res=[A45,A67,A89]

Common mistakes

Contig Syntax

✅ Correct:

contigmap.contigs=[A1-150/0 70-100]  # Target fixed (/0), binder variable

❌ Wrong:

contigmap.contigs=[A1-150 70-100]    # Missing /0 - target will move!
contigmap.contigs="A1-150/0 70-100"  # Quotes break parsing
contigmap.contigs=[A1-150/0, 70-100] # Comma breaks parsing

Hotspot Residues

✅ Correct:

ppi.hotspot_res=[A45,A67,A89]        # Chain letter + residue number

❌ Wrong:

ppi.hotspot_res=[45,67,89]           # Missing chain letter
ppi.hotspot_res=[A45, A67, A89]      # Spaces break parsing
ppi.hotspot_res="A45,A67,A89"        # Quotes break parsing

Complete Parameter Reference

Core Parameters

| Parameter | Default | Range | Description | |-----------|---------|-------|-------------| | inference.num_designs | 10 | 1-10000 | Number of designs to generate | | inference.input_pdb | - | path | Target structure file | | inference.output_prefix | output | string | Output filename prefix | | diffuser.T | 50 | 20-200 | Diffusion timesteps | | denoiser.noise_scale_ca | 1.0 | 0.0-2.0 | CA atom noise (0.5-0.8 = conservative) | | denoiser.noise_scale_frame | 1.0 | 0.0-2.0 | Frame noise | | inference.ckpt_override_path | - | path | Model checkpoint | | potentials.guide_scale | 1.0 | 0.1-10 | Guidance strength | | potentials.guide_decay | constant | string | Decay type |

Advanced Parameters

| Parameter | Default | Description | |-----------|---------|-------------| | diffuser.partial_T | None | Start diffusion from timestep T (partial diffusion) | | contigmap.inpaint_str | None | Sequence positions to inpaint | | scaffoldguided.scaffoldguided | false | Enable scaffold-guided generation | | scaffoldguided.target_pdb | None | Scaffold template PDB | | ppi.binderlen | None | Specify exact binder length |

Symmetry Parameters

| Parameter | Default | Description | |-----------|---------|-------------| | symmetry.symmetry | None | Symmetry type (C2, C3, C4, D2, etc.) | | symmetry.recenter | true | Recenter symmetric assembly | | symmetry.radius | None | Radius constraint for symmetric assembly |

Fold Conditioning

| Parameter | Default | Description | |-----------|---------|-------------| | contigmap.provide_seq | None | Provide sequence for fold conditioning | | contigmap.inpaint_seq | None | Positions for sequence inpainting |

Model Checkpoints

| Checkpoint | Use Case | |------------|----------| | Complex_base_ckpt.pt | Binder design (default) | | Base_ckpt.pt | De novo monomers | | ActiveSite_ckpt.pt | Active site scaffolding | | InpaintSeq_ckpt.pt | Sequence inpainting |

Common workflows

Binder Design

Prepare target PDB (trim to binding region + 10A buffer)
Identify 3-6 hotspot residues (exposed, conserved)
Generate 100-500 backbones
Pass to proteinmpnn for sequence design

Motif Scaffolding

Extract motif coordinates
Use /0 to fix motif in contigmap
Generate surrounding scaffold
Validate motif preservation (RMSD < 1.5A)

Symmetric Oligomers

# C3 symmetric trimer
python run_inference.py \
  symmetry.symmetry=C3 \
  contigmap.contigs=[100-150] \
  inference.num_designs=50

# D2 symmetric tetramer
python run_inference.py \
  symmetry.symmetry=D2 \
  contigmap.contigs=[80-120] \
  symmetry.radius=25

# Supported symmetries: C2, C3, C4, C5, C6, D2, D3, D4, tetrahedral, octahedral

Partial Diffusion (Refinement)

# Start from existing structure, diffuse from timestep 10
python run_inference.py \
  inference.input_pdb=initial.pdb \
  diffuser.partial_T=10 \
  contigmap.contigs=[A1-100]

Output format

output/
├── output_0.pdb       # Generated backbone
├── output_1.pdb
├── ...
└── output_99.pdb

Each PDB contains polyalanine backbone - use proteinmpnn for sequence.

Sample output

Successful run

$ python run_inference.py inference.input_pdb=target.pdb contigmap.contigs=[A1-150/0 70-100] inference.num_designs=100
[INFO] Loading model from Complex_base_ckpt.pt
[INFO] Generating design 1/100...
[INFO] Generating design 50/100...
[INFO] Generating design 100/100...
[INFO] Saved 100 designs to output/

Generated:
output/output_0.pdb (85 residues)
output/output_1.pdb (92 residues)
...

What good output looks like:

File size: 3-8 KB per PDB (backbone only)
Residue count within specified range
Secondary structure visible in PyMOL (helices/sheets, not random coil)

Decision tree

Should I use RFdiffusion?
│
├─ Need to generate protein backbone?
│  ├─ Yes → Continue below
│  └─ No, already have backbone → Use ProteinMPNN
│
├─ What type of design?
│  ├─ Binder for protein target → RFdiffusion ✓
│  ├─ De novo monomer → RFdiffusion ✓
│  ├─ Motif scaffolding → RFdiffusion ✓
│  └─ Symmetric assembly → RFdiffusion ✓
│
└─ Priority?
   ├─ Need highest success rate → Consider BindCraft
   ├─ Need diversity/exploration → RFdiffusion ✓
   └─ Need all-atom precision → Consider BoltzGen

Typical performance

| Campaign Size | Time (A10G) | Cost (Modal) | Notes | |---------------|-------------|--------------|-------| | 100 backbones | 20-30 min | ~$3 | Quick exploration | | 500 backbones | 1.5-2h | ~$12 | Standard campaign | | 1000 backbones | 3-4h | ~$25 | Large campaign |

Expected downstream yield: ~10-15% of backbones pass full QC after sequence design + validation.

Verify

ls output/*.pdb | wc -l  # Should match num_designs

Troubleshooting

Designs lack secondary structure: Decrease noise_scale to 0.5-0.8 Binder not contacting hotspots: Verify residue numbering, increase num_designs OOM errors: Reduce batch size or use A100 GPU Slow generation: Reduce diffuser.T to 25-35

Error interpretation

| Error | Cause | Fix | |-------|-------|-----| | RuntimeError: CUDA out of memory | GPU VRAM exceeded | Use A100 or reduce designs per batch | | KeyError: 'A' | Chain not found in PDB | Check chain IDs with grep ^ATOM target.pdb \| cut -c22 \| sort -u | | ValueError: invalid contig | Syntax error in contigs | Check for spaces, quotes, commas (see Common Mistakes) | | FileNotFoundError: ckpt | Missing model weights | Download from IPD website |

Next: proteinmpnn for sequence design → structure prediction for validation → protein-qc for filtering.

Agent Skills: RFdiffusion Backbone Generation

Install this agent skill to your local

Skill Files