Agent Skills: vertex-protein-bisimulation Skill

Protein folding as compositional game on Vertex AI. GameOpt combinatorial Bayesian optimization over residue positions, bisimulation on conformational trajectories, monad-bayes posterior over folding pathways.

ID: plurigrid/asi/vertex-protein-bisimulation

Install this agent skill locally:

pnpm dlx add-skill https://github.com/plurigrid/asi/tree/HEAD/skills/vertex-protein-bisimulation

Skill Files

Browse the full folder contents for vertex-protein-bisimulation.


skills/vertex-protein-bisimulation/SKILL.md

Skill Metadata

Name
vertex-protein-bisimulation
Description
Protein folding as compositional game on Vertex AI. GameOpt combinatorial Bayesian optimization over residue positions, bisimulation on conformational trajectories, monad-bayes posterior over folding pathways.

vertex-protein-bisimulation Skill

description: > Protein folding as compositional game on Vertex AI. GameOpt combinatorial Bayesian optimization over residue positions, bisimulation on conformational trajectories, monad-bayes posterior over folding pathways. Use when applying game-theoretic optimization to protein design, checking bisimulation equivalence of folding trajectories, or running AlphaFold/ESMFold batch prediction on Vertex AI.

Folding funnel = payoff landscape. Minimal frustration = Nash equilibrium.

Architecture

Basin-Hedges (ParaLens 6-wire)
  |
  +-- GameOpt layer (Bal, Sessa, Mutny, Krause 2024)
  |     Residue positions = players
  |     Amino acid identities = strategies
  |     Upper confidence bound equilibria guide search
  |     Counterfactual gating prunes combinatorial space
  |
  +-- Vertex AI Pipeline (compute backend)
  |     AlphaFold v2 batch: KFP pipeline, 3 phases
  |       CPU (MSA) -> GPU (predict) -> GPU (relax)
  |     ESMFold: single-seq, no MSA, 10-30x faster
  |       HuggingFace: facebook/esmfold_v1
  |     Batch prediction: 50% cost discount, 24hr
  |
  +-- Bisimulation on Folding
        Two trajectories bisimilar iff same native state
        Despite different intermediate conformations
        Stochastic process algebra on Markov state models
        CellValue lattice: Nothing=unfolded, Value=native,
          Contradiction=misfolded aggregate
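
The bisimulation layer above ("same native state despite different intermediate conformations") can be made concrete as partition refinement on a Markov state model. A minimal sketch, assuming a toy 4-state MSM whose states, labels, and transition probabilities are purely illustrative, not derived from any real trajectory:

```python
import numpy as np

def probabilistic_bisimulation(P, init_partition):
    """Coarsest probabilistic bisimulation refining init_partition.

    P: (n, n) row-stochastic transition matrix of a Markov state model.
    init_partition: per-state block labels (e.g. by native-contact count).
    Returns per-state block ids; states sharing an id are bisimilar.
    """
    n = P.shape[0]
    blocks = list(init_partition)
    while True:
        sigs = {}
        new_blocks = [None] * n
        for s in range(n):
            # Signature of a state: its label plus total probability into each block
            mass = {}
            for t in range(n):
                mass[blocks[t]] = mass.get(blocks[t], 0.0) + P[s, t]
            sig = (blocks[s], tuple(sorted((b, round(p, 9)) for b, p in mass.items())))
            new_blocks[s] = sigs.setdefault(sig, len(sigs))
        if new_blocks == blocks:
            return blocks
        blocks = new_blocks

# Toy MSM: unfolded, two distinct intermediates, native (absorbing)
P = np.array([
    [0.5, 0.25, 0.25, 0.0],
    [0.0, 0.5,  0.0,  0.5],
    [0.0, 0.0,  0.5,  0.5],
    [0.0, 0.0,  0.0,  1.0],
])
labels = ["U", "I", "I", "N"]  # both intermediates start in the same block
print(probabilistic_bisimulation(P, labels))
```

Here the two intermediates end up in the same block: they reach the native state with the same probability mass, so pathways through either are bisimilar, which is exactly the "different intermediates, same native state" condition.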

monad-bayes Integration

-- Posterior over folding pathways
foldingPathway :: MonadMeasure m => Sequence -> m Structure
foldingPathway seq = do
  -- Prior: Ramachandran angles per residue
  angles <- replicateM (length seq) $ do
    phi <- uniform (-pi) pi
    psi <- uniform (-pi) pi
    return (phi, psi)
  -- Energy function as likelihood
  let energy = forceField seq angles
  factor (Exp (negate energy / kT))
  -- Return structure
  return (buildStructure seq angles)

-- GameOpt: combinatorial optimization as open game
proteinGame :: OpenGame Stochastic [AminoAcid] Energy
proteinGame = sequentialCompose residueGames
  where residueGames = map residueChoice [1..nPositions]
        residueChoice i = decision ("residue_" ++ show i) aminoAcids ucbPayoff
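
The monad-bayes fragment above is conceptual Haskell; the same posterior can be sketched in Python with self-normalized importance sampling. Here `toy_energy` is an illustrative quadratic stand-in for `forceField` (the helix-center angles and the kT value are assumptions, not from the source): sample Ramachandran angles uniformly, weight by the Boltzmann factor exp(-E/kT).

```python
import numpy as np

rng = np.random.default_rng(0)
kT = 0.6  # roughly kcal/mol at ~300 K; illustrative value

def toy_energy(angles):
    """Stand-in force field favoring alpha-helical (phi, psi) ~ (-1.05, -0.79) rad."""
    phi, psi = angles[:, 0], angles[:, 1]
    return np.sum((phi + 1.05) ** 2 + (psi + 0.79) ** 2)

def pathway_posterior(n_residues, n_samples=5000):
    """Self-normalized importance sampling over per-residue (phi, psi) angles."""
    # Prior: uniform Ramachandran angles (mirrors the `uniform (-pi) pi` draws)
    samples = rng.uniform(-np.pi, np.pi, size=(n_samples, n_residues, 2))
    energies = np.array([toy_energy(s) for s in samples])
    # Likelihood: Boltzmann factor, computed in log space for stability
    log_w = -energies / kT
    log_w -= log_w.max()
    w = np.exp(log_w)
    w /= w.sum()
    # Posterior mean conformation: weights concentrate on low-energy pathways
    return np.tensordot(w, samples, axes=1)

mean_angles = pathway_posterior(n_residues=3)
print(mean_angles)
```

The posterior mean angles drift toward the helical basin, mirroring what `factor (Exp (negate energy / kT))` does in the Haskell version.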


Concrete Affordances

AlphaFold Batch Workflow on Vertex AI

Run the 3-phase AlphaFold pipeline (MSA on CPU, prediction on GPU, relaxation on GPU) via Vertex AI Pipelines. See also: vertex-ai-protein-interleave skill for full gcloud project setup.

# Prerequisites:
#   gcloud auth login
#   gcloud config set project YOUR_PROJECT_ID
#   gcloud services enable aiplatform.googleapis.com lifesciences.googleapis.com

# 1. Build the AlphaFold container (one-time)
gcloud builds submit \
  --tag gcr.io/${GOOGLE_CLOUD_PROJECT}/alphafold-batch:latest \
  --timeout=3600s \
  skills/vertex-ai-protein-interleave/docker/

# 2. Submit a batch prediction pipeline
#    Input: FASTA file in GCS; Output: PDB structures in GCS
gcloud ai custom-jobs create \
  --region=us-central1 \
  --display-name="alphafold-batch-$(date +%Y%m%d-%H%M%S)" \
  --worker-pool-spec="\
machine-type=a2-highgpu-1g,\
accelerator-type=NVIDIA_TESLA_A100,\
accelerator-count=1,\
replica-count=1,\
container-image-uri=gcr.io/${GOOGLE_CLOUD_PROJECT}/alphafold-batch:latest" \
  --args="--fasta_paths=gs://${GOOGLE_CLOUD_PROJECT}-alphafold/input/sequences.fasta,\
--output_dir=gs://${GOOGLE_CLOUD_PROJECT}-alphafold/output/,\
--model_preset=monomer,\
--db_preset=reduced_dbs,\
--max_template_date=2026-03-01"

# 3. Monitor the job
gcloud ai custom-jobs list --region=us-central1 --filter="displayName~alphafold-batch" --limit=5
gcloud ai custom-jobs describe JOB_ID --region=us-central1

ESMFold Single-Sequence Prediction (Python)

Faster alternative when MSA is unnecessary:

# pip install torch transformers
from transformers import EsmForProteinFolding, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("facebook/esmfold_v1")
model = EsmForProteinFolding.from_pretrained("facebook/esmfold_v1")
model = model.eval()

sequence = "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG"
inputs = tokenizer([sequence], return_tensors="pt", add_special_tokens=False)

with torch.no_grad():
    output = model(**inputs)

# Extract pLDDT confidence and PDB string
plddt = output["plddt"].mean().item()
pdb_str = model.output_to_pdb(output)[0]
print(f"Mean pLDDT: {plddt:.1f}")
with open("/tmp/predicted.pdb", "w") as f:
    f.write(pdb_str)
print("Structure written to /tmp/predicted.pdb")

GameOpt over Residue Positions (Python)

Combinatorial Bayesian optimization treating residue positions as players in a game. Each position iterates a discrete best response over the 20 amino acids under an upper confidence bound acquisition, holding the other positions fixed:

# pip install numpy
import numpy as np

# 20 standard amino acids, N mutable positions
AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")
N_POSITIONS = 5

# Simulated energy function (replace with AlphaFold/ESMFold pLDDT call)
def energy_fn(encoding: np.ndarray) -> float:
    """Negative pLDDT proxy. encoding: (N_POSITIONS,) floats in [0, 19]."""
    rounded = np.round(encoding).astype(int) % 20
    # Placeholder: pairwise contact energy
    return sum(abs(rounded[i] - rounded[i+1]) * 0.1 for i in range(len(rounded)-1))

# UCB acquisition: f(x) - kappa * sigma(x)
# In GameOpt each position is a "player" choosing an amino acid
def ucb_acquisition(x, kappa=2.0, gp_mean=None, gp_std=None):
    """Upper confidence bound for game-theoretic BO."""
    mu = gp_mean(x) if gp_mean else energy_fn(x)
    sigma = gp_std(x) if gp_std else 0.5  # prior uncertainty
    return mu - kappa * sigma

# Per-player (per-residue) best response via scipy
def gameopt_step(current: np.ndarray, position: int) -> int:
    """Find best amino acid at `position` holding others fixed (Nash best response)."""
    best_aa, best_val = 0, float('inf')
    for aa_idx in range(20):
        candidate = current.copy()
        candidate[position] = aa_idx
        val = ucb_acquisition(candidate)
        if val < best_val:
            best_val = val
            best_aa = aa_idx
    return best_aa

# Iterative best-response loop (if it converges, the fixed point is a pure Nash equilibrium)
state = np.random.randint(0, 20, size=N_POSITIONS).astype(float)
for iteration in range(50):
    for pos in range(N_POSITIONS):
        state[pos] = gameopt_step(state, pos)
    e = energy_fn(state)
    if iteration % 10 == 0:
        seq = ''.join(AMINO_ACIDS[int(s) % 20] for s in state)
        print(f"Iter {iteration}: sequence={seq}  energy={e:.4f}")

final_seq = ''.join(AMINO_ACIDS[int(s) % 20] for s in state)
print(f"GameOpt result: {final_seq}  energy={energy_fn(state):.4f}")

Bisimulation Check on Folding Trajectories (Python)

Check whether two MD folding trajectories reach bisimilar native states:

# pip install mdtraj numpy
import mdtraj as md
import numpy as np

def rmsd_bisimulation(traj_a_path: str, traj_b_path: str, threshold_nm: float = 0.3, top=None) -> dict:
    """
    Two trajectories are bisimilar iff their final frames have RMSD < threshold.
    This implements the stochastic process algebra check from the architecture doc:
      bisimilar <=> same native state despite different intermediates.
    Pass `top` (a topology file, e.g. a PDB) for formats like .xtc that carry none.
    """
    load_kw = {"top": top} if top else {}
    traj_a = md.load(traj_a_path, **load_kw)
    traj_b = md.load(traj_b_path, **load_kw)

    # Align final frames
    final_a = traj_a[-1]
    final_b = traj_b[-1]
    rmsd = md.rmsd(final_b, final_a)[0]

    return {
        "bisimilar": bool(rmsd < threshold_nm),
        "rmsd_nm": float(rmsd),
        "threshold_nm": threshold_nm,
        "traj_a_frames": traj_a.n_frames,
        "traj_b_frames": traj_b.n_frames,
    }

# Usage (.xtc carries no topology, so a reference PDB is required):
# result = rmsd_bisimulation("fold_run1.xtc", "fold_run2.xtc", threshold_nm=0.3, top="native.pdb")
# print(result)

Key Papers

  • GameOpt (2024): arxiv.org/abs/2409.18582
  • Bayesian Open Games (Bolt, Hedges, Zahn 2019): arxiv.org/abs/1910.03656
  • MELD Bayesian protein (PNAS): doi.org/10.1073/pnas.1506788112
  • AMix-1 Bayesian Flow Networks (2025): protein foundation model

GF(3) Trit Classification

| Component | Trit | Role |
|-----------|------|------|
| ESMFold/AlphaFold prediction | +1 | Generation |
| GameOpt equilibrium search | 0 | Coordination |
| Bisimulation equivalence check | -1 | Validation |

Conservation: +1 + 0 + (-1) = 0
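
The conservation rule can be checked mechanically. A trivial sketch (the dictionary keys are illustrative names for the table rows above):

```python
# Trit assignments from the GF(3) classification table
trits = {
    "esmfold_alphafold_prediction": +1,  # generation
    "gameopt_equilibrium_search": 0,     # coordination
    "bisimulation_check": -1,            # validation
}
total = sum(trits.values())
print(total % 3)  # conserved iff 0 in GF(3)
```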

Edges in Interactome TUI

  • -> monad-bayes (w=0.65, Bayesian structure posterior)
  • -> geomstats (w=0.60, protein manifold geometry)
  • -> bisimulation-game (w=0.90, conformational bisimulation)
  • -> zubyul/Nikolova_lab (w=0.70, transcription factor bridge)

Trit: 0 (ERGODIC)
