Agent Skills: Anthropic Cost Tuning

ID: jeremylongshore/claude-code-plugins-plus-skills/anth-cost-tuning

Install this agent skill locally:

pnpm dlx add-skill https://github.com/jeremylongshore/claude-code-plugins-plus-skills/tree/HEAD/plugins/saas-packs/anthropic-pack/skills/anth-cost-tuning

Skill Files


plugins/saas-packs/anthropic-pack/skills/anth-cost-tuning/SKILL.md

Skill Metadata

Name
anth-cost-tuning
Description

Anthropic Cost Tuning

Overview

Optimize Claude API spend through model routing, prompt caching, the Message Batches API, and real-time cost tracking. The four biggest levers: model selection (roughly 4-19x between tiers), prompt caching (cached input reads at one tenth the normal input rate), batching (2x), and max_tokens discipline.

Pricing Reference (per million tokens)

| Model | Input | Output | Cache Read | Cache Write |
|-------|-------|--------|------------|-------------|
| Claude Haiku | $0.80 | $4.00 | $0.08 | $1.00 |
| Claude Sonnet | $3.00 | $15.00 | $0.30 | $3.75 |
| Claude Opus | $15.00 | $75.00 | $1.50 | $18.75 |

Message Batches: 50% off all model pricing for async processing.

Cost Calculator

def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    model: str = "claude-sonnet-4-20250514",
    cached_input: int = 0,
    use_batch: bool = False
) -> float:
    pricing = {
        "claude-haiku-4-20250514": {"input": 0.80, "output": 4.00, "cache_read": 0.08},
        "claude-sonnet-4-20250514": {"input": 3.00, "output": 15.00, "cache_read": 0.30},
        "claude-opus-4-20250514": {"input": 15.00, "output": 75.00, "cache_read": 1.50},
    }
    rates = pricing[model]
    uncached_input = input_tokens - cached_input

    cost = (
        uncached_input * rates["input"] +
        cached_input * rates["cache_read"] +
        output_tokens * rates["output"]
    ) / 1_000_000

    if use_batch:
        cost *= 0.5

    return cost

# Example: 10K requests/day, 500 input + 200 output tokens each
daily = estimate_cost(500, 200, "claude-sonnet-4-20250514") * 10_000
print(f"Daily: ${daily:.2f}")      # ~$0.0045/request * 10K = $45/day
print(f"Monthly: ${daily * 30:.2f}")  # ~$1,350/month

# Same with Haiku + batching
daily_optimized = estimate_cost(500, 200, "claude-haiku-4-20250514", use_batch=True) * 10_000
print(f"Optimized: ${daily_optimized:.2f}/day")  # ~$6/day (7.5x cheaper)

Strategy 1: Model Routing

def route_to_model(task: str, complexity: str) -> str:
    """Route tasks to cheapest adequate model."""
    # Haiku: classification, extraction, yes/no, routing ($0.80/$4)
    if task in ("classify", "extract", "route", "validate"):
        return "claude-haiku-4-20250514"

    # Sonnet: general tasks, code, tool use ($3/$15)
    if complexity in ("low", "medium"):
        return "claude-sonnet-4-20250514"

    # Opus: only for complex reasoning, research ($15/$75)
    return "claude-opus-4-20250514"
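
To see what routing buys, here is a rough blended-cost sketch. The 70/30 traffic split is an assumption for illustration, not part of the skill; the per-token rates come from the pricing table above.

```python
# Blended per-request cost under a hypothetical 70% Haiku / 30% Sonnet split,
# at 500 input + 200 output tokens per request (rates per million tokens).
def request_cost(input_rate: float, output_rate: float,
                 input_tokens: int = 500, output_tokens: int = 200) -> float:
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

sonnet = request_cost(3.00, 15.00)   # $0.0045 per request
haiku = request_cost(0.80, 4.00)     # $0.0012 per request
blended = 0.7 * haiku + 0.3 * sonnet
savings = 1 - blended / sonnet
print(f"blended ${blended:.5f}/request, {savings:.0%} cheaper than all-Sonnet")
```

Even a moderate split roughly halves spend, because most of the cost sits in the small fraction of requests that truly need the larger model.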

Strategy 2: Prompt Caching

# Cache system prompts and reference documents (90% savings on cached input)
# Break-even: 2 requests with the same cached content
# Assumes client = anthropic.Anthropic() and large_reference_document are defined
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=256,
    system=[{
        "type": "text",
        "text": large_reference_document,  # 10K+ tokens
        "cache_control": {"type": "ephemeral"}
    }],
    messages=[{"role": "user", "content": user_question}]
)
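
The break-even claim in the comment above can be checked with quick arithmetic; the 10K-token prompt size is illustrative, and the rates come from the pricing table.

```python
# Cache economics for a 10K-token Sonnet system prompt, per the pricing table:
# uncached input $3.00/M, cache write $3.75/M, cache read $0.30/M.
PROMPT_TOKENS = 10_000
uncached = PROMPT_TOKENS * 3.00 / 1_000_000   # $0.0300 per uncached request
write = PROMPT_TOKENS * 3.75 / 1_000_000      # $0.0375 for the first (caching) request
read = PROMPT_TOKENS * 0.30 / 1_000_000       # $0.0030 per cache hit

def cached_total(n: int) -> float:
    """Cost of n requests: one cache write, then n-1 cache reads."""
    return write + (n - 1) * read

assert cached_total(1) > 1 * uncached   # a one-off request loses slightly
assert cached_total(2) < 2 * uncached   # break-even by the second request
```

From the third request onward, each cache hit costs a tenth of the uncached price, so savings compound quickly on hot prompts.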

Strategy 3: Batches for Non-Real-Time

# 50% cost reduction for anything that doesn't need immediate response
# Ideal for: summarization pipelines, data extraction, content generation
batch = client.messages.batches.create(requests=[...])  # Up to 100K requests
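
A sketch of what those requests look like. The document texts and `custom_id` values are illustrative; the `custom_id`/`params` shape follows the Message Batches API, and the actual submission call is shown only as a comment.

```python
# Build batch request payloads locally; submitting would be
# client.messages.batches.create(requests=requests).
docs = ["Q3 report text...", "Q4 report text..."]
requests = [
    {
        "custom_id": f"summary-{i}",  # your key for matching results later
        "params": {
            "model": "claude-haiku-4-20250514",
            "max_tokens": 300,
            "messages": [{"role": "user", "content": f"Summarize:\n\n{doc}"}],
        },
    }
    for i, doc in enumerate(docs)
]
```

Results arrive asynchronously and are matched back to inputs by `custom_id`, so the IDs should be stable and unique.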

Strategy 4: Spend Tracking

from dataclasses import dataclass

@dataclass
class SpendTracker:
    budget_usd: float = 100.0
    spent_usd: float = 0.0
    requests: int = 0

    def track(self, response):
        cost = estimate_cost(
            response.usage.input_tokens,
            response.usage.output_tokens,
            response.model,
            getattr(response.usage, "cache_read_input_tokens", 0)
        )
        self.spent_usd += cost
        self.requests += 1

        if self.spent_usd > self.budget_usd * 0.8:
            print(f"WARNING: 80% budget used (${self.spent_usd:.2f}/${self.budget_usd})")
        if self.spent_usd > self.budget_usd:
            raise RuntimeError(f"Budget exceeded: ${self.spent_usd:.2f}")

tracker = SpendTracker(budget_usd=50.0)

Cost Reduction Checklist

- [ ] Use Haiku for classification/extraction/routing tasks
- [ ] Enable prompt caching for repeated system prompts
- [ ] Use Message Batches for non-real-time processing
- [ ] Set max_tokens to realistic values (not maximum)
- [ ] Use prefill to reduce output preamble tokens
- [ ] Implement spend tracking and budget alerts
- [ ] Monitor via Usage API
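
The prefill item in the checklist can be sketched as follows. This shows only the message shape (no API call); the user prompt is illustrative.

```python
# Prefill: end the messages list with a partial assistant turn so the model
# continues from "{" instead of spending output tokens on preamble like
# "Sure! Here is the JSON you asked for:".
messages = [
    {"role": "user", "content": "Extract the product name and price as JSON."},
    {"role": "assistant", "content": "{"},
]
# The response continues the JSON object; prepend "{" when parsing the result.
```

This trims only a handful of tokens per call, but across high-volume pipelines it adds up and also makes outputs easier to parse.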

Next Steps

For architecture patterns, see anth-reference-architecture.