Agent Skills: Model Routing

Intelligent model selection for Claude Code — decision matrices, cost tables, budget planning, and subagent model assignment for optimal cost/quality tradeoffs

UncategorizedID: lobbi-docs/claude/model-routing

Install this agent skill to your local

pnpm dlx add-skill https://github.com/markus41/claude/tree/HEAD/plugins/claude-code-expert/skills/model-routing

Skill Files

Browse the full folder contents for model-routing.

Download Skill

Loading file tree…

plugins/claude-code-expert/skills/model-routing/SKILL.md

Skill Metadata

Name
model-routing
Description
Pick the right Claude model (Opus, Sonnet, Haiku) for a task and manage cost — decision matrix, cost tables, budget planning, cascading strategy. Use this skill whenever choosing a model, setting a token budget, optimizing session cost, or deciding whether to upgrade/downgrade mid-task. Triggers on: "which model", "cost", "budget", "haiku vs sonnet", "opus for this", "save tokens", "model cascading", "/cc-budget".

Model Routing

Claude model choice is the biggest cost lever in Claude Code. Match the model to the work.

Decision matrix

| Task type | Model | Why | |---|---|---| | Architecture decision | Opus | Multi-step reasoning; hidden-cost detection | | Root-cause debugging (hard) | Opus | Hypothesis trees, multi-source evidence | | Security review | Opus | Risk sensitivity; knowledge of OWASP/CWE | | Feature implementation | Sonnet | Standard generation; good reasoning | | Code review (routine PR) | Sonnet | Fast; catches most issues | | Test writing | Sonnet | Pattern-based | | Research / docs lookup | Haiku | Fast; cheap; sufficient for retrieval | | Bulk file edits (rename, reformat) | Haiku | Mechanical work | | Dependency audit | Haiku | Running commands, parsing output | | Simple Q&A | Haiku | One-shot factual answers |

Cost table (approximate, check cc_docs_model_recommend for current)

| Model | Input $/M | Output $/M | Relative | |---|---|---|---| | Opus 4.7 | ~$15 | ~$75 | 5× | | Sonnet 4.6 | ~$3 | ~$15 | 1× | | Haiku 4.5 | ~$0.80 | ~$4 | 0.3× |

Output tokens are the dominant cost in most Claude Code sessions. Opus is ~5× the cost but ~2× the capability on hard tasks — use it where the capability matters.

Model cascading

The high-leverage pattern: start with a cheap model for planning, delegate implementation to cheap, reserve Opus for review gates.

| Phase | Model | |---|---| | Plan mode (Shift+Tab) | Opus | | Implementation | Sonnet | | Subagent research | Haiku | | Code review gate | Opus | | Final sign-off | Opus |

Net effect: most tokens are on Sonnet/Haiku; Opus tokens are where they matter most.

Budget planning

For a task estimated at N turns:

  • Rough floor: 2k input + 2k output per turn = 4k tokens.
  • Sonnet cost: 4k × $3/M = $0.012 per turn.
  • 20-turn session on Sonnet: ~$0.24.
  • Add 3 Opus review passes: +$0.45.
  • Total: ~$0.70.

Use cc_docs_model_recommend(task, budget) to get a specific recommendation with cost projection.

Downgrade/upgrade triggers

Downgrade to Haiku when:

  • Doing pure retrieval (grep results, file reads).
  • Running a known command and parsing output.
  • Rate-limited on Sonnet budget.

Upgrade to Opus when:

  • Sonnet gets it wrong twice on the same subtask.
  • Task is security-critical.
  • Stakeholder cost of error is ≥ days of engineer time.
  • You're designing something new (vs. implementing something known).

/plan mode

Shift+Tab toggles plan mode — uses Opus to think deeper without producing code. Use for:

  • New feature scoping
  • Debugging a tough bug before trying fixes
  • Architecture choice before committing

Don't use plan mode for: known patterns, mechanical work, small tweaks.

MCP delegation

| Need | Tool | |---|---| | Model recommendation for a task | cc_docs_model_recommend(task, budget?) | | Compare two model choices | cc_docs_compare(["opus", "sonnet"]) | | Check cost of an autonomy profile | cc_kb_autonomy_profile(profile) |

Anti-patterns

  • Defaulting to Opus everywhere → 5× cost, rarely 5× value.
  • Haiku on hard tasks → gets it wrong, then you re-run on Opus = 6× cost.
  • Ignoring /plan on new work → code-first on unfamiliar problems wastes tokens.
  • Not estimating budget → costs creep; you notice on the monthly bill.

Reference