Context Optimization Techniques Skill

Context Optimization Techniques

Extend effective context capacity through compression, masking, caching, and partitioning. Effective optimization can 2-3x effective context capacity without larger models.

Optimization Strategies

| Strategy | Token Reduction | Use Case | |----------|-----------------|----------| | Compaction | 50-70% | Message history dominates | | Observation Masking | 60-80% | Tool outputs dominate | | KV-Cache Optimization | 70%+ cache hits | Stable workloads | | Context Partitioning | Variable | Complex multi-task |

Compaction

Summarize context when approaching limits:

if context_tokens / context_limit > 0.8:
    context = compact_context(context)

Priority for compression:

Tool outputs → replace with summaries
Old turns → summarize early conversation
Retrieved docs → summarize if recent versions exist
Never compress system prompt

Summary generation by type:

Tool outputs: Preserve findings, metrics, conclusions
Conversational: Preserve decisions, commitments, context shifts
Documents: Preserve key facts, remove supporting evidence

Observation Masking

Tool outputs can be 80%+ of tokens. Replace verbose outputs with references:

if len(observation) > max_length:
    ref_id = store_observation(observation)
    return f"[Obs:{ref_id} elided. Key: {extract_key(observation)}]"

Masking rules:

Never mask: Current task critical, most recent turn, active reasoning
Consider: 3+ turns old, key points extractable, purpose served
Always mask: Repeated outputs, boilerplate, already summarized

KV-Cache Optimization

Cache Key/Value tensors for requests with identical prefixes:

# Cache-friendly ordering: stable content first
context = [
    system_prompt,      # Cacheable
    tool_definitions,   # Cacheable
    reused_templates,   # Reusable
    unique_content      # Unique
]

Design for cache stability:

Avoid dynamic content (timestamps)
Use consistent formatting
Keep structure stable across sessions

Context Partitioning

Split work across sub-agents with isolated contexts:

# Each sub-agent has clean, focused context
results = await gather(
    research_agent.search("topic A"),
    research_agent.search("topic B"),
    research_agent.search("topic C")
)
# Coordinator synthesizes without carrying full context
synthesized = await coordinator.synthesize(results)

Budget Management

context_budget = {
    "system_prompt": 2000,
    "tool_definitions": 3000,
    "retrieved_docs": 10000,
    "message_history": 15000,
    "reserved_buffer": 2000
}
# Monitor and trigger optimization at 70-80%

When to Optimize

| Signal | Action | |--------|--------| | Utilization >70% | Start monitoring | | Utilization >80% | Apply compaction | | Quality degradation | Investigate cause | | Tool outputs dominate | Observation masking | | Docs dominate | Summarization/partitioning |

Performance Targets

Compaction: 50-70% reduction, <5% quality loss
Masking: 60-80% reduction in masked observations
Cache: 70%+ hit rate for stable workloads

Best Practices

Measure before optimizing
Apply compaction before masking
Design for cache stability
Partition before context becomes problematic
Monitor effectiveness over time
Balance token savings vs quality
Test at production scale
Implement graceful degradation

Agent Skills: Context Optimization Techniques

Install this agent skill to your local

Skill Files