Model Routing
Claude model choice is the biggest cost lever in Claude Code. Match the model to the work.
Decision matrix
| Task type | Model | Why | |---|---|---| | Architecture decision | Opus | Multi-step reasoning; hidden-cost detection | | Root-cause debugging (hard) | Opus | Hypothesis trees, multi-source evidence | | Security review | Opus | Risk sensitivity; knowledge of OWASP/CWE | | Feature implementation | Sonnet | Standard generation; good reasoning | | Code review (routine PR) | Sonnet | Fast; catches most issues | | Test writing | Sonnet | Pattern-based | | Research / docs lookup | Haiku | Fast; cheap; sufficient for retrieval | | Bulk file edits (rename, reformat) | Haiku | Mechanical work | | Dependency audit | Haiku | Running commands, parsing output | | Simple Q&A | Haiku | One-shot factual answers |
Cost table (approximate, check cc_docs_model_recommend for current)
| Model | Input $/M | Output $/M | Relative | |---|---|---|---| | Opus 4.7 | ~$15 | ~$75 | 5× | | Sonnet 4.6 | ~$3 | ~$15 | 1× | | Haiku 4.5 | ~$0.80 | ~$4 | 0.3× |
Output tokens are the dominant cost in most Claude Code sessions. Opus is ~5× the cost but ~2× the capability on hard tasks — use it where the capability matters.
Model cascading
The high-leverage pattern: start with a cheap model for planning, delegate implementation to cheap, reserve Opus for review gates.
| Phase | Model | |---|---| | Plan mode (Shift+Tab) | Opus | | Implementation | Sonnet | | Subagent research | Haiku | | Code review gate | Opus | | Final sign-off | Opus |
Net effect: most tokens are on Sonnet/Haiku; Opus tokens are where they matter most.
Budget planning
For a task estimated at N turns:
- Rough floor: 2k input + 2k output per turn = 4k tokens.
- Sonnet cost: 4k × $3/M = $0.012 per turn.
- 20-turn session on Sonnet: ~$0.24.
- Add 3 Opus review passes: +$0.45.
- Total: ~$0.70.
Use cc_docs_model_recommend(task, budget) to get a specific recommendation with cost projection.
Downgrade/upgrade triggers
Downgrade to Haiku when:
- Doing pure retrieval (grep results, file reads).
- Running a known command and parsing output.
- Rate-limited on Sonnet budget.
Upgrade to Opus when:
- Sonnet gets it wrong twice on the same subtask.
- Task is security-critical.
- Stakeholder cost of error is ≥ days of engineer time.
- You're designing something new (vs. implementing something known).
/plan mode
Shift+Tab toggles plan mode — uses Opus to think deeper without producing code. Use for:
- New feature scoping
- Debugging a tough bug before trying fixes
- Architecture choice before committing
Don't use plan mode for: known patterns, mechanical work, small tweaks.
MCP delegation
| Need | Tool |
|---|---|
| Model recommendation for a task | cc_docs_model_recommend(task, budget?) |
| Compare two model choices | cc_docs_compare(["opus", "sonnet"]) |
| Check cost of an autonomy profile | cc_kb_autonomy_profile(profile) |
Anti-patterns
- Defaulting to Opus everywhere → 5× cost, rarely 5× value.
- Haiku on hard tasks → gets it wrong, then you re-run on Opus = 6× cost.
- Ignoring
/planon new work → code-first on unfamiliar problems wastes tokens. - Not estimating budget → costs creep; you notice on the monthly bill.
Reference
- cost-table.md — detailed cost breakdown by task type