
Torch Compile

Use torch.compile to JIT-compile PyTorch code into optimized kernels, then validate speedups with warmups and graph-break audits.

Use this skill only when the frontmatter triggers apply; otherwise keep eager mode.

Do you need to reduce Python overhead in hot paths?
- Yes: compile and benchmark.
Are first runs much slower than eager?
- Yes: warm up first, then re-measure once compilation has finished.
Are graph breaks frequent?
- Yes: audit with torch._dynamo.explain or graph-break logging, and reduce non-tensor Python logic in the compiled code.
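Here is a minimal sketch of the graph-break audit. It assumes PyTorch 2.1 or later, where `torch._dynamo.explain(fn)(*args)` returns an `ExplainOutput` that has `graph_count`, `graph_break_count`, and `break_reasons`. `torch._dynamo` is a private module, so this interface may change between releases. The helper names `audit_graph_breaks` and `summarize_explain` are hypothetical. To use logging instead, set the environment variable `TORCH_LOGS="graph_breaks"`, which reports each break as it happens.

```python
from typing import Any, Callable, Dict, List


def summarize_explain(explanation: Any) -> Dict[str, Any]:
    """Extract graph-break fields from an ExplainOutput-like object."""
    # Each break reason is expected to carry a human-readable `.reason` string;
    # fall back to str() for anything that does not.
    reasons: List[str] = [
        getattr(r, "reason", str(r))
        for r in getattr(explanation, "break_reasons", [])
    ]
    return {
        "graph_count": explanation.graph_count,
        "graph_break_count": explanation.graph_break_count,
        "reasons": reasons,
    }


def audit_graph_breaks(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Dict[str, Any]:
    """Run torch._dynamo.explain on fn(*args, **kwargs) and summarize the breaks.

    Assumes PyTorch >= 2.1, where explain(fn) returns a callable that yields
    an ExplainOutput when invoked with example inputs.
    """
    import torch._dynamo  # lazy import: summarize_explain() works without torch

    explanation = torch._dynamo.explain(fn)(*args, **kwargs)
    return summarize_explain(explanation)
```

Consider a function that calls `print()` or branches on `tensor.item()`. Running `audit_graph_breaks` on it would typically report at least one break along with its reason. That tells you which Python logic to move out of the hot path.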

Compilation overhead shows up on the first few executions, so warmup is required before benchmarking.
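Speedups come mainly from reducing Python overhead and GPU reads/writes, so they vary with factors such as model architecture and batch size.
A graph break gives up optimization opportunities in exchange for correctness. The unsupported code runs as ordinary Python instead of producing silently wrong results or crashing.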
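The warmup and benchmarking advice above can be sketched as follows, assuming a PyTorch 2.x install. `benchmark` times the first call separately from the steady-state calls, because for a compiled function that first call includes compilation. `compare_eager_vs_compiled` reports the eager/compiled median-time ratio for each batch size, since the speedup is expected to vary with batch size. Both function names and the `make_input` callback are hypothetical.

```python
import statistics
import time
from typing import Any, Callable, Dict, Iterable, Optional


def benchmark(
    fn: Callable[..., Any],
    *args: Any,
    warmup: int = 3,
    iters: int = 10,
    sync: Optional[Callable[[], None]] = None,
) -> Dict[str, Any]:
    """Time fn(*args), keeping the first (possibly compiling) call separate.

    `sync`, e.g. torch.cuda.synchronize, is called around timed regions so
    queued GPU work finishes before the clock stops.
    """
    if iters < 1:
        raise ValueError("iters must be >= 1")

    def _sync() -> None:
        if sync is not None:
            sync()

    # First call: for a torch.compile'd function this includes compilation.
    _sync()
    start = time.perf_counter()
    fn(*args)
    _sync()
    first_call_s = time.perf_counter() - start

    # Untimed warmup calls so steady-state timing excludes one-off costs.
    for _ in range(warmup):
        fn(*args)
    _sync()

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn(*args)
        _sync()
        samples.append(time.perf_counter() - start)
    return {
        "first_call_s": first_call_s,
        "median_s": statistics.median(samples),
        "samples": samples,
    }


def compare_eager_vs_compiled(
    model: Callable[..., Any],
    make_input: Callable[[int], Any],
    batch_sizes: Iterable[int],
) -> Dict[int, float]:
    """Eager/compiled median-time ratio per batch size (> 1 means compiled is faster)."""
    import torch  # lazy import: benchmark() above works without torch

    sync = torch.cuda.synchronize if torch.cuda.is_available() else None
    compiled = torch.compile(model)
    ratios: Dict[int, float] = {}
    with torch.no_grad():
        for bs in batch_sizes:
            x = make_input(bs)
            eager = benchmark(model, x, sync=sync)
            comp = benchmark(compiled, x, sync=sync)
            ratios[bs] = eager["median_s"] / comp["median_s"]
    return ratios
```

For a model that takes 512-feature inputs, a call might look like `compare_eager_vs_compiled(model, lambda bs: torch.randn(bs, 512), [1, 32, 256])`. A new batch size can trigger recompilation. `benchmark` keeps the first call and the warmup calls out of the steady-state samples, so that cost does not inflate `median_s`.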

"torch.compile makes PyTorch code run faster by JIT-compiling PyTorch code into optimized kernels, while requiring minimal code changes." - PyTorch
"torch.compile takes extra time to compile the model on the first few executions." - PyTorch
"reducing Python overhead and GPU read/writes, and so the observed speedup may vary on factors such as model architecture and batch size." - PyTorch
"Graph breaks result in lost optimization opportunities, which may still be undesirable, but this is better than silent incorrectness or a hard crash." - PyTorch

scripts/torch-compile_tool.py: CLI for probing torch.compile availability, benchmarking, and explain output.
scripts/torch-compile_tool.js: Node.js wrapper for the same CLI.
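The sketch below shows one way to probe availability, which is the first job of the tool above. It is not the contents of `scripts/torch-compile_tool.py`, and `probe_torch_compile` is a hypothetical name. It only checks three things: that `torch` is importable, that `torch.compile` exists (it was added in PyTorch 2.0), and whether CUDA is visible.

```python
import importlib.util
import json
from typing import Any, Dict


def probe_torch_compile() -> Dict[str, Any]:
    """Report whether torch.compile looks usable in this environment."""
    info: Dict[str, Any] = {
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_version": None,
        "has_compile": False,
        "cuda_available": False,
    }
    if not info["torch_installed"]:
        return info

    import torch  # only imported once we know it is installed

    info["torch_version"] = str(torch.__version__)
    info["has_compile"] = hasattr(torch, "compile")  # present from PyTorch 2.0
    info["cuda_available"] = bool(torch.cuda.is_available())
    return info


if __name__ == "__main__":
    # Print a JSON report, e.g. for a wrapper script to parse.
    print(json.dumps(probe_torch_compile(), indent=2))
```

Emitting JSON keeps the report easy to consume from another process, such as a Node.js wrapper. The exact output format of the real tool is not specified here.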