Agent Skills: Anthropic Rate Limits

ID: jeremylongshore/claude-code-plugins-plus-skills/anth-rate-limits

Install this agent skill locally:

```shell
pnpm dlx add-skill https://github.com/jeremylongshore/claude-code-plugins-plus-skills/tree/HEAD/plugins/saas-packs/anthropic-pack/skills/anth-rate-limits
```

Skill Files

plugins/saas-packs/anthropic-pack/skills/anth-rate-limits/SKILL.md

Skill Metadata

Name: anth-rate-limits

Anthropic Rate Limits

Overview

The Claude API uses token-bucket rate limiting measured in three dimensions: requests per minute (RPM), input tokens per minute (ITPM), and output tokens per minute (OTPM). Limits increase automatically as you move through usage tiers.
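The token-bucket model can be sketched in a few lines. This is a simplified illustration of the concept (capacity refills continuously over the window), not Anthropic's actual implementation:

```python
import time

class TokenBucket:
    """Simplified token bucket: capacity refills continuously over the window."""

    def __init__(self, capacity: int, window_seconds: float = 60.0):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill_rate = capacity / window_seconds  # tokens per second
        self.last = time.monotonic()

    def try_consume(self, amount: int = 1) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= amount:
            self.tokens -= amount
            return True
        return False

# An RPM limit of 50 behaves like a bucket of 50 request tokens refilling per minute
rpm_bucket = TokenBucket(capacity=50)
```

The same structure models ITPM and OTPM; only the capacity and the amount consumed per request change.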

Rate Limit Dimensions

| Dimension | Header | Description |
|-----------|--------|-------------|
| RPM | anthropic-ratelimit-requests-limit | Requests per minute |
| ITPM | anthropic-ratelimit-input-tokens-limit | Input tokens per minute |
| OTPM | anthropic-ratelimit-output-tokens-limit | Output tokens per minute |

Limits are per-organization and per-model-class. Cached input tokens do NOT count toward ITPM limits.
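Because cached input tokens are exempt from ITPM, marking a large stable context block as cacheable directly reduces rate-limit pressure. A minimal sketch of the request shape (the helper name is ours; `cache_control` with type `ephemeral` is the API's prompt-caching marker):

```python
def build_cached_system(reference_doc: str) -> list[dict]:
    """Wrap a large, stable document as a cacheable system block.

    On cache hits, these input tokens are read from the cache and
    do not count toward the ITPM limit.
    """
    return [
        {
            "type": "text",
            "text": reference_doc,
            "cache_control": {"type": "ephemeral"},
        }
    ]

# Passed as: client.messages.create(model=..., system=build_cached_system(doc), ...)
```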

Usage Tiers (Auto-Upgrade)

| Tier | Monthly Spend | Key Benefit |
|------|---------------|-------------|
| Tier 1 (Free) | $0 | Evaluation access |
| Tier 2 | $40+ | Higher RPM |
| Tier 3 | $200+ | Production-grade limits |
| Tier 4 | $2,000+ | High-throughput access |
| Scale | Custom | Custom limits via sales |

Check your current tier and limits at console.anthropic.com.

SDK Built-In Retry

```python
import anthropic

# The SDK retries 429 and 5xx errors automatically (2 retries by default)
client = anthropic.Anthropic(max_retries=5)  # Increase for high-traffic apps

# Disable auto-retry for manual control
client = anthropic.Anthropic(max_retries=0)
```

The TypeScript SDK exposes the equivalent option:

```typescript
const client = new Anthropic({ maxRetries: 5 });
```

Custom Rate Limiter with Header Awareness

```python
import time
from datetime import datetime

import anthropic

class RateLimitedClient:
    def __init__(self):
        self.client = anthropic.Anthropic(max_retries=0)  # We handle retries
        self.remaining_requests = 100
        self.remaining_tokens = 100_000
        self.reset_at = 0.0

    def create_message(self, **kwargs):
        # Pre-check: wait if near the request limit
        if self.remaining_requests < 3 and time.time() < self.reset_at:
            wait = self.reset_at - time.time()
            print(f"Pre-throttle: waiting {wait:.1f}s")
            time.sleep(wait)

        for attempt in range(5):
            try:
                # with_raw_response exposes the HTTP headers alongside the parsed body
                raw = self.client.messages.with_raw_response.create(**kwargs)
                headers = raw.headers

                self.remaining_requests = int(headers.get("anthropic-ratelimit-requests-remaining", 100))
                self.remaining_tokens = int(headers.get("anthropic-ratelimit-tokens-remaining", 100_000))
                reset = headers.get("anthropic-ratelimit-requests-reset")
                if reset:
                    self.reset_at = datetime.fromisoformat(reset.replace("Z", "+00:00")).timestamp()

                return raw.parse()
            except anthropic.RateLimitError as e:
                retry_after = float(e.response.headers.get("retry-after", 2 ** attempt))
                print(f"429: retrying in {retry_after}s (attempt {attempt + 1})")
                time.sleep(retry_after)

        raise RuntimeError("Exhausted rate limit retries")
```
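The `2 ** attempt` fallback above is plain exponential backoff. Adding full jitter avoids synchronized retry stampedes when many workers hit 429 at the same moment (a standard technique, not specific to the Anthropic SDK):

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Full-jitter exponential backoff: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))

# Delays grow roughly as 0-1s, 0-2s, 0-4s, ..., capped at 60s
```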

Queue-Based Throughput Control

```typescript
import PQueue from 'p-queue';
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

// Enforce 50 RPM with concurrency limit
const queue = new PQueue({
  concurrency: 10,
  interval: 60_000,
  intervalCap: 50,
});

async function rateLimitedCall(prompt: string) {
  return queue.add(() =>
    client.messages.create({
      model: 'claude-sonnet-4-20250514',
      max_tokens: 1024,
      messages: [{ role: 'user', content: prompt }],
    })
  );
}

// Process 200 prompts without hitting limits
const results = await Promise.all(
  prompts.map(p => rateLimitedCall(p))
);
```
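A rough Python counterpart to the p-queue pattern, using an asyncio semaphore for the concurrency cap plus staggered start times for RPM pacing (a sketch; tune `concurrency` and `rpm` to your actual limits, and pass coroutine factories that call the API):

```python
import asyncio

async def run_throttled(factories, concurrency: int = 10, rpm: int = 50):
    """Run coroutine factories with bounded concurrency and start-rate pacing."""
    sem = asyncio.Semaphore(concurrency)
    interval = 60.0 / rpm  # minimum gap between task starts

    async def worker(factory, delay):
        await asyncio.sleep(delay)  # stagger starts to stay under RPM
        async with sem:             # cap in-flight requests
            return await factory()

    return await asyncio.gather(
        *(worker(f, i * interval) for i, f in enumerate(factories))
    )
```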

Cost-Saving: Use Batches for Bulk Work

```python
import anthropic

client = anthropic.Anthropic()

# Message Batches API: 50% cheaper, no rate limit pressure on real-time quota
batch = client.messages.batches.create(
    requests=[
        {"custom_id": f"req-{i}", "params": {
            "model": "claude-sonnet-4-20250514",
            "max_tokens": 1024,
            "messages": [{"role": "user", "content": prompt}]
        }}
        for i, prompt in enumerate(prompts)
    ]
)
```
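Batches complete asynchronously, so a caller typically polls until processing ends. A polling sketch, assuming the batch's `processing_status` field reads `in_progress` until the batch finishes and `ended` after:

```python
import time

def wait_for_batch(client, batch_id: str, poll_seconds: float = 30.0):
    """Poll the Batches API until the batch finishes, then return the batch object."""
    while True:
        batch = client.messages.batches.retrieve(batch_id)
        if batch.processing_status == "ended":
            return batch
        time.sleep(poll_seconds)

# Then iterate per-request outcomes:
# for entry in client.messages.batches.results(batch.id): ...
```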

Error Handling

| Header | Description | Action |
|--------|-------------|--------|
| retry-after | Seconds until next request allowed | Sleep this duration exactly |
| anthropic-ratelimit-requests-remaining | Requests left in window | Throttle if < 5 |
| anthropic-ratelimit-tokens-remaining | Tokens left in window | Reduce max_tokens if low |
| anthropic-ratelimit-requests-reset | ISO timestamp of window reset | Schedule retry after this time |
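The table translates directly into a throttle decision. A small pure helper (header names as documented above; the numeric thresholds are our illustrative choices):

```python
def throttle_action(headers: dict) -> str:
    """Map rate-limit response headers to a suggested next action."""
    if "retry-after" in headers:
        return f"sleep {float(headers['retry-after']):.0f}s"
    if int(headers.get("anthropic-ratelimit-requests-remaining", 1_000_000)) < 5:
        return "throttle requests"
    if int(headers.get("anthropic-ratelimit-tokens-remaining", 1_000_000)) < 2_000:
        return "reduce max_tokens"
    return "proceed"
```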

Next Steps

For security configuration, see anth-security-basics.