Anthropic Rate Limits
Overview
Anthropic enforces three types of limits: requests per minute (RPM), input tokens per minute (input TPM), and output tokens per minute (output TPM). Limits depend on your spend tier.
Rate Limit Tiers
| Tier | Qualification | RPM | Input TPM | Output TPM |
|------|---------------|-----|-----------|------------|
| Tier 1 | Free | 50 | 40,000 | 8,000 |
| Tier 2 | $40+ spend | 1,000 | 80,000 | 16,000 |
| Tier 3 | $200+ spend | 2,000 | 160,000 | 32,000 |
| Tier 4 | $400+ spend | 4,000 | 400,000 | 80,000 |
| Scale | Custom | Custom | Custom | Custom |
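The tier table can be expressed as a small lookup helper. This is an illustrative sketch only — the spend thresholds and limits mirror the table above and may drift from Anthropic's current published numbers; `limitsForSpend` is a hypothetical helper, not part of the SDK.

```typescript
// Hypothetical helper mirroring the tier table above.
// Thresholds are illustrative and may not match current published limits.
interface TierLimits {
  rpm: number;
  inputTpm: number;
  outputTpm: number;
}

const TIERS: Array<{ minSpend: number; limits: TierLimits }> = [
  { minSpend: 400, limits: { rpm: 4_000, inputTpm: 400_000, outputTpm: 80_000 } },
  { minSpend: 200, limits: { rpm: 2_000, inputTpm: 160_000, outputTpm: 32_000 } },
  { minSpend: 40,  limits: { rpm: 1_000, inputTpm: 80_000,  outputTpm: 16_000 } },
  { minSpend: 0,   limits: { rpm: 50,    inputTpm: 40_000,  outputTpm: 8_000 } },
];

// Return the limits for a given cumulative spend (USD).
function limitsForSpend(spendUsd: number): TierLimits {
  const tier = TIERS.find(t => spendUsd >= t.minSpend);
  return (tier ?? TIERS[TIERS.length - 1]).limits;
}
```

A lookup like this is handy for sizing client-side throttles (see Throughput Optimization below) to your tier.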
Check your tier: console.anthropic.com → Settings → Limits
Response Headers
Every API response includes rate limit headers:
```
anthropic-ratelimit-requests-limit: 1000
anthropic-ratelimit-requests-remaining: 998
anthropic-ratelimit-requests-reset: 2025-01-01T00:01:00Z
anthropic-ratelimit-tokens-limit: 80000
anthropic-ratelimit-tokens-remaining: 79500
anthropic-ratelimit-tokens-reset: 2025-01-01T00:01:00Z
retry-after: 5
```
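These headers can drive proactive throttling: honor `retry-after` if present, and otherwise pause until the reset time once the request budget is exhausted. A minimal sketch, assuming the headers have been collected into a plain string map (`computeDelayMs` is a hypothetical helper, not an SDK function):

```typescript
// Sketch: how long to pause before the next request, based on the
// anthropic-ratelimit-* headers. Assumes headers were copied into a
// plain Record<string, string>.
function computeDelayMs(
  headers: Record<string, string>,
  now: Date = new Date(),
): number {
  // An explicit retry-after (seconds) wins.
  const retryAfter = headers["retry-after"];
  if (retryAfter !== undefined) return Number(retryAfter) * 1000;

  // Otherwise, wait until the reset time once the budget is exhausted.
  const remaining = Number(headers["anthropic-ratelimit-requests-remaining"] ?? 1);
  if (remaining > 0) return 0;
  const reset = headers["anthropic-ratelimit-requests-reset"];
  return reset ? Math.max(0, new Date(reset).getTime() - now.getTime()) : 1000;
}
```

The same pattern applies to the token headers if your workload is token-bound rather than request-bound.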
Built-In SDK Retries
The SDK automatically retries 429 and 529 errors with exponential backoff:
```typescript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({
  maxRetries: 3, // default: 2. Set to 0 to disable.
});
```
Custom Backoff
```typescript
async function callWithBackoff(params: Anthropic.MessageCreateParams, maxRetries = 5) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await client.messages.create(params);
    } catch (err) {
      if (err instanceof Anthropic.RateLimitError) {
        const retryAfter = Number(err.headers?.['retry-after'] || 2 ** attempt);
        const jitter = Math.random() * 1000;
        console.log(`Rate limited. Retry in ${retryAfter}s (attempt ${attempt + 1})`);
        await new Promise(r => setTimeout(r, retryAfter * 1000 + jitter));
      } else {
        throw err;
      }
    }
  }
  throw new Error('Exceeded max retries');
}
```
Throughput Optimization
| Strategy | Impact |
|----------|--------|
| Use Message Batches API | Doesn't consume standard RPM/TPM limits (async; 24h processing window) |
| Use prompt caching | Cache read tokens generally don't count toward input TPM |
| Use smaller models for simple tasks | Lower token counts = more requests per minute |
| Pre-count tokens with countTokens | Avoid wasted requests that will fail |
| Queue and batch requests | Smooth out bursts |
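The last strategy — queueing to smooth out bursts — is commonly implemented as a client-side token bucket. A minimal sketch (the `TokenBucket` class is illustrative; size its capacity and refill rate to your tier's RPM):

```typescript
// Minimal client-side token bucket to smooth request bursts.
// Capacity and refill rate are illustrative; match refillPerSec to RPM / 60.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,     // max burst size
    private refillPerSec: number, // sustained rate
  ) {
    this.tokens = capacity;
    this.lastRefill = Date.now();
  }

  // Try to consume one token; returns false if the caller should wait.
  tryAcquire(now: number = Date.now()): boolean {
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerSec);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

Callers that get `false` back can sleep briefly and retry, which converts bursts into a steady stream the server-side limiter will accept.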
Token Counting
```typescript
// Count before sending — avoid burning RPM on requests that'll fail
const count = await client.messages.countTokens({
  model: 'claude-sonnet-4-20250514',
  messages,
  system: systemPrompt,
});
console.log(`This request will use ${count.input_tokens} input tokens`);
```
Python
```python
import anthropic
import time

client = anthropic.Anthropic(max_retries=5)

# Or manual handling:
try:
    message = client.messages.create(...)
except anthropic.RateLimitError as e:
    retry_after = float(e.response.headers.get("retry-after", 5))
    time.sleep(retry_after)
```
Output
- Rate limit tier identified from response headers
- SDK configured with appropriate `maxRetries` setting
- Custom backoff implemented with jitter for high-throughput use cases
- Throughput optimized using batches, caching, or model selection
Error Handling
| Error | Cause | Solution |
|-------|-------|----------|
| API error | Varies by error type and status code | See clade-common-errors |
Examples
See Rate Limit Tiers table, Response Headers section, Built-In SDK Retries, Custom Backoff implementation, and Throughput Optimization strategies above.
Resources
- Rate Limits Docs
- Message Batches — runs outside standard rate limits
- Token Counting
Next Steps
See clade-cost-tuning for cost optimization strategies.
Prerequisites
- Completed `clade-install-auth`
- Understanding of HTTP response headers
- Familiarity with exponential backoff patterns
Instructions
Step 1: Review the patterns below
Each section contains production-ready code examples. Copy and adapt them to your use case.
Step 2: Apply to your codebase
Integrate the patterns that match your requirements. Test each change individually.
Step 3: Verify
Run your test suite to confirm the integration works correctly.