Agent Skills: Auto-Claude Optimization

Auto-Claude performance optimization and cost management. Use when optimizing token usage, reducing API costs, improving build speed, or tuning agent performance.

ID: adaptationio/skrillz/auto-claude-optimization

Install this agent skill locally:

pnpm dlx add-skill https://github.com/adaptationio/Skrillz/tree/HEAD/.claude/skills/auto-claude-optimization

Skill Files



.claude/skills/auto-claude-optimization/SKILL.md

Skill Metadata

Name
auto-claude-optimization
Description
Auto-Claude performance optimization and cost management. Use when optimizing token usage, reducing API costs, improving build speed, or tuning agent performance.

Auto-Claude Optimization

Performance tuning, cost reduction, and efficiency improvements.

Performance Overview

Key Metrics

| Metric | Impact | Optimization |
|--------|--------|--------------|
| API latency | Build speed | Model selection, caching |
| Token usage | Cost | Prompt efficiency, context limits |
| Memory queries | Speed | Embedding model, index tuning |
| Build iterations | Time | Spec quality, QA settings |

Model Optimization

Model Selection

| Model | Speed | Cost | Quality | Use Case |
|-------|-------|------|---------|----------|
| claude-opus-4-5-20251101 | Slow | High | Best | Complex features |
| claude-sonnet-4-5-20250929 | Fast | Medium | Good | Standard features |

# Override model in .env
AUTO_BUILD_MODEL=claude-sonnet-4-5-20250929

Extended Thinking Tokens

Configure thinking budget per agent:

| Agent | Default | Recommended |
|-------|---------|-------------|
| Spec creation | 16000 | Keep default for quality |
| Planning | 5000 | Reduce to 3000 for speed |
| Coding | 0 | Keep disabled |
| QA Review | 10000 | Reduce to 5000 for speed |

# In agent configuration
max_thinking_tokens=5000  # or None to disable

Token Optimization

Reduce Context Size

  1. Smaller spec files

    # Keep specs concise
    # Bad: 5000 word spec
    # Good: 500 word spec with clear criteria
    
  2. Limit codebase scanning

    # In context/builder.py
    MAX_CONTEXT_FILES = 50  # Reduce from 100
    
  3. Use targeted searches

    # Instead of full codebase scan
    # Focus on relevant directories
    

Efficient Prompts

Optimize system prompts in apps/backend/prompts/:

<!-- Bad: Verbose -->
You are an expert software developer who specializes in building
high-quality, production-ready applications. You have extensive
experience with many programming languages and frameworks...

<!-- Good: Concise -->
Expert full-stack developer. Build production-quality code.
Follow existing patterns. Test thoroughly.
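As a quick sanity check while trimming prompts, a chars-per-token heuristic can be compared before and after. The ~4 characters-per-token ratio below is a common approximation for English text, not the model's actual tokenizer:

```python
# Rough prompt-size check (assumption: ~4 characters per token;
# real tokenizer counts will differ somewhat).
def estimate_tokens(prompt: str) -> int:
    return max(1, len(prompt) // 4)

verbose = ("You are an expert software developer who specializes in building "
           "high-quality, production-ready applications. You have extensive "
           "experience with many programming languages and frameworks.")
concise = ("Expert full-stack developer. Build production-quality code. "
           "Follow existing patterns. Test thoroughly.")

print(estimate_tokens(verbose), estimate_tokens(concise))
```

Since the system prompt is sent on every API call, even a 50-token reduction compounds across a long build.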

Memory Optimization

# Use efficient embedding model
OPENAI_EMBEDDING_MODEL=text-embedding-3-small

# Or offline with smaller model
OLLAMA_EMBEDDING_MODEL=all-minilm
OLLAMA_EMBEDDING_DIM=384

Speed Optimization

Parallel Execution

# Enable more parallel agents (default: 4)
MAX_PARALLEL_AGENTS=8

Reduce QA Iterations

# Limit QA loop iterations
MAX_QA_ITERATIONS=10  # Default: 50

# Skip QA for quick iterations
python run.py --spec 001 --skip-qa

Faster Spec Creation

# Force simple complexity for quick tasks
python spec_runner.py --task "Fix typo" --complexity simple

# Skip research phase
SKIP_RESEARCH_PHASE=true python spec_runner.py --task "..."

API Timeout Tuning

# Reduce timeout for faster failure detection
API_TIMEOUT_MS=120000  # 2 minutes (default: 10 minutes)

Cost Management

Monitor Token Usage

# Enable cost tracking
ENABLE_COST_TRACKING=true

# View usage report
python usage_report.py --spec 001

Cost Reduction Strategies

  1. Use cheaper models for simple tasks

    # For simple specs
    AUTO_BUILD_MODEL=claude-sonnet-4-5-20250929 python spec_runner.py --task "..."
    
  2. Limit context window

    MAX_CONTEXT_TOKENS=50000  # Reduce from 100000
    
  3. Batch similar tasks

    # Create specs together, run together
    python spec_runner.py --task "Add feature A"
    python spec_runner.py --task "Add feature B"
    python run.py --spec 001
    python run.py --spec 002
    
  4. Use local models for memory

    # Ollama for memory (free)
    GRAPHITI_LLM_PROVIDER=ollama
    GRAPHITI_EMBEDDER_PROVIDER=ollama
    

Cost Estimation

| Operation | Estimated Tokens | Cost (Opus) | Cost (Sonnet) |
|-----------|------------------|-------------|---------------|
| Simple spec | 10k | ~$0.30 | ~$0.06 |
| Standard spec | 50k | ~$1.50 | ~$0.30 |
| Complex spec | 200k | ~$6.00 | ~$1.20 |
| Build (simple) | 50k | ~$1.50 | ~$0.30 |
| Build (standard) | 200k | ~$6.00 | ~$1.20 |
| Build (complex) | 500k | ~$15.00 | ~$3.00 |
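The per-operation figures above imply blended rates of roughly $30 per million tokens for Opus and $6 for Sonnet. A back-of-the-envelope estimator under that assumption (these are derived rates, not official pricing):

```python
# Blended per-million-token rates inferred from the table above (assumptions).
RATES_PER_MTOK = {"opus": 30.0, "sonnet": 6.0}

def estimate_cost(tokens: int, model: str) -> float:
    """Rough dollar cost for a run, given total token usage."""
    return round(tokens / 1_000_000 * RATES_PER_MTOK[model], 2)

print(estimate_cost(200_000, "opus"))    # complex spec on Opus
print(estimate_cost(200_000, "sonnet"))  # same spec on Sonnet
```

Useful for deciding up front whether a task justifies Opus or should default to Sonnet.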

Memory System Optimization

Embedding Performance

# Faster embeddings
OPENAI_EMBEDDING_MODEL=text-embedding-3-small  # 1536 dim, fast

# Higher quality (slower)
OPENAI_EMBEDDING_MODEL=text-embedding-3-large  # 3072 dim

# Offline (fastest, free)
OLLAMA_EMBEDDING_MODEL=all-minilm
OLLAMA_EMBEDDING_DIM=384

Query Optimization

# Limit search results
memory.search("query", limit=10)  # Instead of 100

# Use semantic caching
ENABLE_MEMORY_CACHE=true
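The cache behind `ENABLE_MEMORY_CACHE` is internal to Auto-Claude. As an illustration only, a hypothetical wrapper that memoizes repeated searches might look like this (`CachedMemory` is not a real Auto-Claude class):

```python
# Hypothetical sketch: memoize exact-match repeated queries in front of
# memory.search(). Not Auto-Claude's actual caching implementation.
from functools import lru_cache

class CachedMemory:
    def __init__(self, memory):
        self._memory = memory
        # Per-instance LRU cache keyed on (query, limit).
        self._search = lru_cache(maxsize=256)(self._uncached_search)

    def _uncached_search(self, query: str, limit: int):
        # Tuples are hashable/immutable, so results can be cached safely.
        return tuple(self._memory.search(query, limit=limit))

    def search(self, query: str, limit: int = 10):
        return list(self._search(query, limit))
```

A real semantic cache would also match near-duplicate queries via embeddings; exact-match memoization is the simplest win.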

Database Maintenance

# Compact database periodically
python -c "from integrations.graphiti.memory import compact_database; compact_database()"

# Clear old episodes
python query_memory.py --cleanup --older-than 30d

Build Efficiency

Spec Quality = Build Speed

High-quality specs reduce iterations:

# Good spec (fewer iterations)
## Acceptance Criteria
- [ ] User can log in with email/password
- [ ] Invalid credentials show error message
- [ ] Successful login redirects to /dashboard
- [ ] Session persists for 24 hours

# Bad spec (more iterations)
## Acceptance Criteria
- [ ] Login works

Subtask Granularity

Optimal subtask size:

  • Too large: Agent gets stuck, needs recovery
  • Too small: Overhead per subtask
  • Optimal: 30-60 minutes of work each

Parallel Work

Let agents spawn subagents for parallel execution:

Main Coder
├── Subagent 1: Frontend (parallel)
├── Subagent 2: Backend (parallel)
└── Subagent 3: Tests (parallel)
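If each subagent call is an I/O-bound API request, the fan-out above can be sketched with a thread pool. This is a hypothetical illustration; `run_subagent` stands in for the real agent invocation:

```python
# Sketch of fanning out independent subtasks to parallel subagents,
# assuming each call is an I/O-bound API request (threads suffice).
from concurrent.futures import ThreadPoolExecutor

def run_subagent(name: str) -> str:
    # Placeholder for a real subagent invocation.
    return f"{name}: done"

subtasks = ["frontend", "backend", "tests"]
with ThreadPoolExecutor(max_workers=3) as pool:
    # map() preserves input order regardless of completion order.
    results = list(pool.map(run_subagent, subtasks))
print(results)  # → ['frontend: done', 'backend: done', 'tests: done']
```

This only pays off when subtasks are genuinely independent; shared files or schemas force sequential work.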

Environment Tuning

Optimal .env Configuration

# Performance-focused configuration
AUTO_BUILD_MODEL=claude-sonnet-4-5-20250929
API_TIMEOUT_MS=180000
MAX_PARALLEL_AGENTS=6

# Memory optimization
GRAPHITI_LLM_PROVIDER=ollama
GRAPHITI_EMBEDDER_PROVIDER=ollama
OLLAMA_LLM_MODEL=llama3.2:3b
OLLAMA_EMBEDDING_MODEL=all-minilm
OLLAMA_EMBEDDING_DIM=384

# Reduce verbosity
DEBUG=false
ENABLE_FANCY_UI=false

Resource Limits

# Use the system malloc instead of pymalloc (can reduce fragmentation)
export PYTHONMALLOC=malloc

# Set max file descriptors
ulimit -n 4096

Benchmarking

Measure Build Time

# Time a build
time python run.py --spec 001

# Compare models
time AUTO_BUILD_MODEL=claude-opus-4-5-20251101 python run.py --spec 001
time AUTO_BUILD_MODEL=claude-sonnet-4-5-20250929 python run.py --spec 001

Profile Memory Usage

# Monitor memory
watch -n 1 'ps aux | grep python | head -5'

# Profile script
python -m cProfile -o profile.stats run.py --spec 001
python -c "import pstats; p = pstats.Stats('profile.stats'); p.sort_stats('cumulative').print_stats(20)"

Quick Wins

Immediate Optimizations

  1. Switch to Sonnet for most tasks

    AUTO_BUILD_MODEL=claude-sonnet-4-5-20250929
    
  2. Use Ollama for memory

    GRAPHITI_LLM_PROVIDER=ollama
    GRAPHITI_EMBEDDER_PROVIDER=ollama
    
  3. Skip QA for prototypes

    python run.py --spec 001 --skip-qa
    
  4. Force simple complexity for small tasks

    python spec_runner.py --task "..." --complexity simple
    

Medium-Term Improvements

  1. Optimize prompts in apps/backend/prompts/
  2. Configure project-specific security allowlist
  3. Set up memory caching
  4. Tune parallel agent count

Long-Term Strategies

  1. Self-hosted LLM for memory (Ollama)
  2. Caching layer for common operations
  3. Incremental context building
  4. Project-specific prompt optimization

Related Skills

  • auto-claude-memory: Memory configuration
  • auto-claude-build: Build process
  • auto-claude-troubleshooting: Debugging