Cost Optimization
Reduce Claude Code API costs while maintaining quality through smart token management, model selection, and monitoring.
Quick Reference
| Strategy | Impact | Effort |
|----------|--------|--------|
| Use Haiku for simple tasks | High | Low |
| Batch related operations | Medium | Low |
| Use /compact strategically | Medium | Low |
| Reduce context size | High | Medium |
| Efficient prompting | Medium | Medium |
Understanding Costs
How API Costs Work
Claude Code costs are based on tokens:
- Input tokens: Everything Claude reads (prompts, files, context)
- Output tokens: Everything Claude generates (responses, code)
- Cached tokens: Reduced rate for repeated context
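The three token classes above can be combined into a rough per-request estimate. The sketch below is illustrative only: the `Rates` class, the `estimate_cost` function, and every price in it are hypothetical placeholders, not published pricing, and `input_tokens` is assumed to mean input that was not served from cache.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Price per million tokens. Hypothetical placeholders, not real pricing."""
    input_per_mtok: float
    output_per_mtok: float
    cached_per_mtok: float


def estimate_cost(input_tokens: int, output_tokens: int,
                  cached_tokens: int, rates: Rates) -> float:
    """Bill each token class at its own rate and return the total in dollars."""
    return (
        input_tokens * rates.input_per_mtok
        + output_tokens * rates.output_per_mtok
        + cached_tokens * rates.cached_per_mtok
    ) / 1_000_000


# Made-up rates for illustration; substitute the current published figures.
example_rates = Rates(input_per_mtok=2.0, output_per_mtok=10.0, cached_per_mtok=0.2)

cost = estimate_cost(input_tokens=20_000, output_tokens=2_000,
                     cached_tokens=50_000, rates=example_rates)
print(f"estimated cost: ${cost:.4f}")
```

Even with placeholder numbers, the shape of the formula shows why repeated context is cheaper when it is cached and why verbose output is disproportionately expensive.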
Model Pricing (Relative)
| Model | Input Cost | Output Cost | Best For |
|-------|------------|-------------|----------|
| Haiku | $ | $ | Simple tasks, exploration |
| Sonnet | $$ | $$ | General development (default) |
| Opus | $$$$$ | $$$$$ | Complex reasoning, architecture |
Rule of thumb: Opus is roughly 15x more expensive than Haiku for the same tokens. The exact ratio depends on the model versions, so check current pricing.
What Consumes Tokens
| Activity | Token Impact | Optimization |
|----------|--------------|--------------|
| Reading files | High | Read selectively, use grep |
| Long conversations | Cumulative | Use /compact regularly |
| Tool outputs | Variable | Request summaries |
| Code generation | Medium | Be specific in requests |
| Error messages | Low | N/A |
The /cost Command
Basic Usage
> /cost
Shows:
- Session token usage (input/output)
- Estimated cost for current session
- Context window usage percentage
When to Check
- Before starting large tasks
- After reading multiple files
- When responses slow down
- Every 15-20 exchanges
- Before deciding to /compact vs /clear
Interpreting Results
| Metric | Good | Concern | Action |
|--------|------|---------|--------|
| Context usage | <50% | >70% | Consider /compact |
| Session cost | Varies | Unexpected spike | Review recent operations |
| Output ratio | Balanced | Output >> Input | Responses too verbose |
The /stats Command
View usage statistics over time:
> /stats
Date Range Filtering (2.1.6+): Press r to cycle between:
- Last 7 days
- Last 30 days
- All time
Shows:
- Total tokens used (input/output)
- Number of sessions
- Cost breakdown by period
- Model usage distribution
Token Reduction Strategies
1. Selective File Reading
Expensive:
> Read the entire src/ directory to understand the codebase
Efficient:
> @src/api/users.ts @src/types/user.ts - I need to modify the user API
2. Use Grep Before Read
Expensive:
> Find all files that use the AuthService class
[Claude reads many files to find them]
Efficient:
> grep for "AuthService" in src/, then I'll look at the most relevant ones
3. Targeted @ Mentions
| Pattern | Token Cost | Use Case |
|---------|------------|----------|
| @src/ | Very High | Avoid unless necessary |
| @src/api/ | High | When exploring a module |
| @src/api/users.ts | Low | Specific file work |
| @src/api/users.ts:50-100 | Very Low | Specific section |
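To get a feel for why a line range costs so much less than a whole file, here is a minimal sketch. It assumes a rough heuristic of about 4 characters per token; real tokenizers vary with content, so treat the numbers as order-of-magnitude only. `estimate_tokens`, `slice_lines`, and the synthetic file are hypothetical and exist only for this illustration.

```python
CHARS_PER_TOKEN = 4  # rough assumption; actual tokenization varies by content


def estimate_tokens(text: str) -> int:
    """Very rough token estimate derived from character count."""
    return len(text) // CHARS_PER_TOKEN


def slice_lines(text: str, start: int, end: int) -> str:
    """Return lines start..end (1-indexed, inclusive), like a file:50-100 mention."""
    return "\n".join(text.splitlines()[start - 1:end])


# Synthetic stand-in for a 400-line source file.
source = "\n".join(f"const value{i} = compute({i});" for i in range(1, 401))

whole = estimate_tokens(source)
section = estimate_tokens(slice_lines(source, 50, 100))
print(f"whole file: ~{whole} tokens, lines 50-100: ~{section} tokens")
```

The same estimate applied to a directory simply sums over every file in it, which is why broad mentions sit at the top of the cost table.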
4. Limit Output Verbosity
> Analyze this file and give me a brief summary of the key functions
vs
> Explain every line of this file
5. Batch Related Operations
Expensive (multiple turns):
> Read file A
> Now modify line 10
> Now read file B
> Modify line 20
Efficient (single turn):
> In file A, update the getUserById function to handle null.
> In file B, add the new UserNotFound error type.
> Run the tests after both changes.
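One way to see why fewer turns cost less: each new request reads the conversation so far as input. The sketch below is a simplified model under stated assumptions. It ignores caching discounts and the extra requests that tool calls generate, and `total_input_tokens` and all token counts are hypothetical.

```python
def total_input_tokens(base_context: int, turns: list) -> int:
    """Sum input tokens over a conversation in which each turn re-reads history.

    base_context: tokens already in context before the first turn.
    turns: (prompt_tokens, response_tokens) pairs, one per turn.
    """
    total = 0
    history = base_context
    for prompt_tokens, response_tokens in turns:
        total += history + prompt_tokens            # input read by this request
        history += prompt_tokens + response_tokens  # both join the history
    return total


# Four small turns versus one batched turn producing the same amount of work.
multi_turn = total_input_tokens(5_000, [(50, 500)] * 4)
single_turn = total_input_tokens(5_000, [(200, 2_000)])
print(f"four turns: {multi_turn} input tokens, one turn: {single_turn} input tokens")
```

In this model the base context is paid for once per turn, so the gap widens as the context grows.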
/compact vs /clear
When to /compact
Use /compact when:
- Context is 70%+ full
- You want to continue the same task
- You need to preserve decisions and progress
- Responses are slowing down
Cost impact: Can reduce ongoing costs by roughly 50-80%, because later requests carry a summary instead of the full history.
When to /clear
Use /clear when:
- Switching to an unrelated task
- Previous context is irrelevant
- Starting a fresh approach
- Maximum cost savings needed
Cost impact: Context resets to zero, but all prior context is lost.
Decision Matrix
| Situation | Command | Reasoning |
|-----------|---------|-----------|
| Same task, full context | /compact | Preserve progress |
| Different project | /clear | Irrelevant context |
| Stuck on approach | /clear | Fresh perspective |
| After major milestone | /compact | Keep decisions |
| Testing something new | /clear | Clean state |
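The matrix can be reduced to two questions: is the next step part of the same task, and how full is the context? The sketch below encodes that reading. `choose_command` is a hypothetical helper, the 70% threshold comes from this guide, and the `"continue"` result for a same-task session with room to spare is an assumption the matrix does not state.

```python
def choose_command(same_task: bool, context_percent: float) -> str:
    """Pick /clear, /compact, or neither, following the decision matrix."""
    if not 0 <= context_percent <= 100:
        raise ValueError("context_percent must be between 0 and 100")
    if not same_task:
        return "/clear"    # unrelated or restarted work: old context is irrelevant
    if context_percent >= 70:
        return "/compact"  # same task, context filling up: keep decisions
    return "continue"      # assumed default: same task, plenty of room left


print(choose_command(same_task=True, context_percent=82))
print(choose_command(same_task=False, context_percent=30))
```

Cases such as "stuck on approach" count as a new task here, since the goal is to discard the reasoning that led to the dead end.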
Model Selection
Quick Guide
| Task Type | Recommended Model | Why |
|-----------|-------------------|-----|
| File exploration | Haiku | Fast, cheap, sufficient |
| Simple edits | Haiku | Straightforward |
| General coding | Sonnet | Balanced (default) |
| Bug fixing | Sonnet | Needs reasoning |
| Architecture design | Opus | Deep analysis |
| Security review | Opus | Critical thinking |
| Complex refactoring | Opus | Multi-file reasoning |
Switching Models
Set the model in skill frontmatter:
---
model: haiku
---
Or request a model in the prompt:
> Using Haiku, list all TypeScript files in src/
Cost Comparison Example
Task: Review 10 files for security issues
| Approach | Estimated Cost |
|----------|---------------|
| Opus reviews all | $$$$$ |
| Haiku scans, Opus reviews flagged | $$ |
| Sonnet reviews all | $$$ |
Best strategy: Use Haiku for initial scan, escalate to Opus for detailed review of potential issues.
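The comparison can be made concrete with relative cost units. This is a sketch under stated assumptions: the Opus weight of 15 follows this guide's rule of thumb, the Sonnet weight of 5 is an illustrative guess, each file is assumed to cost the same to review, and `review_all` and `scan_then_escalate` are hypothetical helpers.

```python
# Relative cost of reviewing one file. Illustrative weights, not real pricing.
RELATIVE_COST = {"haiku": 1, "sonnet": 5, "opus": 15}


def review_all(model: str, files: int) -> int:
    """Cost of reviewing every file with a single model."""
    return RELATIVE_COST[model] * files


def scan_then_escalate(files: int, flagged: int) -> int:
    """Cost of a Haiku scan of every file plus an Opus review of flagged files."""
    if not 0 <= flagged <= files:
        raise ValueError("flagged must be between 0 and files")
    return RELATIVE_COST["haiku"] * files + RELATIVE_COST["opus"] * flagged


print("Opus reviews all:", review_all("opus", 10))
print("Haiku scans, Opus reviews 2 flagged:", scan_then_escalate(10, 2))
print("Sonnet reviews all:", review_all("sonnet", 10))
```

The two-stage approach only wins while few files are flagged. With these weights, once three of the ten files are escalated, reviewing everything with Sonnet becomes the cheaper option.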
Efficient Prompting
Reduce Token Count
| Verbose | Concise | Savings |
|---------|---------|---------|
| "Could you please" | [Just ask] | 3-4 tokens |
| "I want you to" | [State task] | 4-5 tokens |
| Long explanations | Bullet points | 20-50% |
| Repeated context | @ mentions | Significant |
Be Specific
Token-heavy:
> I have this function that gets users from the database and I want
> to add some caching because it's being called too often and making
> the app slow. Can you help me figure out a good caching strategy?
Efficient:
> Add Redis caching to getUserById in @src/api/users.ts.
> TTL: 5 minutes. Invalidate on user update.
Use Checklists
> Implement user search:
> - [ ] Add search endpoint
> - [ ] Add debounced input
> - [ ] Handle empty results
> Run tests when done.
A checklist is clearer than a long paragraph description.
Batching Strategies
Batch Similar Operations
Instead of multiple turns:
> Add logging to function A
[response]
> Add logging to function B
[response]
> Add logging to function C
Single turn:
> Add consistent logging to functions A, B, and C in @src/utils.ts
> Use format: logger.info("[FunctionName] action", { params })
Batch Read-Modify Cycles
> Review @src/api/*.ts for missing error handling.
> Add try-catch with proper logging to any functions that need it.
> Summarize changes made.
When NOT to Batch
- Complex, interdependent changes
- When you need to verify each step
- Exploratory work
- Learning a new codebase
Budget Management
Setting Expectations
| Session Type | Typical Cost Range |
|--------------|-------------------|
| Quick fix | $ |
| Feature implementation | $$-$$$ |
| Large refactor | $$$-$$$$ |
| Architecture session (Opus) | $$$$$ |
Cost Controls
- Monitor actively: Check /cost regularly
- Set mental limits: "I'll compact at $X"
- Use appropriate models: Haiku for exploration
- Plan sessions: Know scope before starting
Daily/Weekly Tracking
> /cost
[Note the total]
Track across sessions to understand your patterns.
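If you note the /cost total at the end of each session, a few lines of code are enough to roll those notes up by week. The sketch below assumes you record the figures by hand; `weekly_totals` and the sample entries are hypothetical, and nothing here reads data from Claude Code itself.

```python
from collections import defaultdict
from datetime import date


def weekly_totals(entries: list) -> dict:
    """Sum manually recorded session costs per ISO (year, week).

    entries: (session_date, cost_in_dollars) pairs noted from /cost.
    """
    totals = defaultdict(float)
    for day, cost in entries:
        iso = day.isocalendar()          # (ISO year, ISO week, weekday)
        totals[(iso[0], iso[1])] += cost
    return dict(totals)


# Made-up session totals for illustration.
sessions = [
    (date(2025, 1, 6), 1.25),
    (date(2025, 1, 8), 0.75),
    (date(2025, 1, 13), 2.50),
]

for (year, week), total in sorted(weekly_totals(sessions).items()):
    print(f"{year}-W{week:02d}: ${total:.2f}")
```

Comparing weeks side by side makes it easier to spot which kinds of sessions drive your spending.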
Subagent Cost Efficiency
Why Subagents Help
Subagents have isolated context:
- Main context stays lean
- Exploratory work doesn't pollute the main context
- Can use cheaper models
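The effect of isolated context can be sketched with simple arithmetic. In this simplified model, exploration done inline adds everything it reads to the main context, while a subagent adds only its returned summary. `main_context_tokens` and all token counts are hypothetical, and the subagent's own token usage is still billed, just at its own model's rate.

```python
def main_context_tokens(start: int, exploration: int, summary: int,
                        use_subagent: bool) -> int:
    """Tokens in the main context once an exploration task has finished.

    Without a subagent, everything read during exploration lands in the main
    context. With one, only the returned summary does.
    """
    return start + (summary if use_subagent else exploration)


inline = main_context_tokens(start=10_000, exploration=40_000,
                             summary=800, use_subagent=False)
delegated = main_context_tokens(start=10_000, exploration=40_000,
                                summary=800, use_subagent=True)
print(f"inline: {inline} tokens, delegated: {delegated} tokens")
```

Because every later request re-reads the main context, the smaller figure keeps paying off for the rest of the session.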
Cost-Efficient Agent Pattern
---
name: explorer
model: haiku
tools: Read, Glob, Grep
---
Explore and summarize. Return only key findings.
Delegation Examples
| Task | Agent Model | Return |
|------|-------------|--------|
| Find all API routes | Haiku | Route list |
| Analyze dependencies | Haiku | Summary |
| Review for patterns | Sonnet | Findings |
| Deep security review | Opus | Detailed report |
Common Wasteful Patterns
| Pattern | Why Wasteful | Better Approach |
|---------|--------------|-----------------|
| Reading entire directories | Massive token cost | Grep first, read specific |
| Verbose explanations | Unnecessary output | Request concise |
| Repeating context | Already in history | Use @ mentions |
| Not using /compact | Growing costs | Compact at 70% |
| Opus for everything | Expensive overkill | Match model to task |
| Long debugging sessions | Cumulative cost | Clear and restart |
Reference Files
| File | Contents |
|------|----------|
| TOKEN-STRATEGIES.md | Detailed token reduction techniques |
| MODEL-SELECTION.md | Model comparison and selection guide |
| MONITORING.md | Cost tracking and budget management |
Quick Decisions
| Situation | Action |
|-----------|--------|
| Context at 70% | /compact |
| Simple file exploration | Use Haiku |
| Need deep analysis | Use Opus (worth the cost) |
| Unexpected high cost | Check recent operations |
| Switching tasks | /clear to save costs |
| Debugging loop | Clear and try fresh approach |