# Context Degradation Patterns
Language models exhibit predictable degradation as context grows. Understanding these patterns is essential for diagnosing failures and designing resilient systems.
## Degradation Patterns

| Pattern | Cause | Symptoms |
|---------|-------|----------|
| Lost-in-Middle | Attention mechanics | 10-40% lower recall for middle content |
| Context Poisoning | Errors compound | Tool misalignment, persistent hallucinations |
| Context Distraction | Irrelevant info | Uses wrong information for decisions |
| Context Confusion | Mixed tasks | Responses address wrong aspects |
| Context Clash | Conflicting info | Contradictory guidance derails reasoning |
## Lost-in-Middle

Information at the beginning and end of the context window receives reliable attention; middle content suffers dramatically reduced recall.
Mitigation:
[CURRENT TASK] # At start (high attention)
- Goal: Generate quarterly report
- Deadline: End of week
[DETAILED CONTEXT] # Middle (less attention)
- 50 pages of data
- Supporting evidence
[KEY FINDINGS] # At end (high attention)
- Revenue up 15%
- Growth in Region A
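This layout can be assembled programmatically. A minimal sketch, assuming bracketed section markers as a prompt convention (the `build_prompt` helper and its section labels are illustrative, not a standard format):

```python
def build_prompt(task_lines, detail_lines, findings_lines):
    """Assemble a prompt so high-priority content sits at the edges.

    Start and end of the context receive the most reliable attention,
    so the task statement and key findings bracket the bulk details.
    """
    sections = [
        "[CURRENT TASK]",           # start: high attention
        *("- " + line for line in task_lines),
        "[DETAILED CONTEXT]",       # middle: lower attention
        *("- " + line for line in detail_lines),
        "[KEY FINDINGS]",           # end: high attention
        *("- " + line for line in findings_lines),
    ]
    return "\n".join(sections)

prompt = build_prompt(
    ["Goal: Generate quarterly report", "Deadline: End of week"],
    ["50 pages of data", "Supporting evidence"],
    ["Revenue up 15%", "Growth in Region A"],
)
```

The point of the helper is that edge placement becomes a property of the assembly code rather than something each caller has to remember.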
## Context Poisoning
Once errors enter context, they compound through repeated reference.
Entry pathways:
- Tool outputs with errors
- Retrieved docs with incorrect info
- Model-generated summaries with hallucinations
Symptoms:
- Tool calls with wrong parameters
- Strategies that take effort to undo
- Hallucinations that persist despite correction
Recovery:
- Truncate to before poisoning point
- Explicitly note poisoning and re-evaluate
- Restart with clean context
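The first two recovery options can be sketched as a history rollback; the `rollback_context` helper, message-dict shape, and sample conversation are all hypothetical:

```python
def rollback_context(messages, poisoned_index, note=None):
    """Truncate history to just before the poisoned turn.

    Optionally append an explicit note so the model re-evaluates
    rather than silently losing state.
    """
    clean = list(messages[:poisoned_index])
    if note:
        clean.append({"role": "user", "content": note})
    return clean

history = [
    {"role": "user", "content": "Summarize the Q3 report"},
    {"role": "assistant", "content": "Q3 revenue was $12B"},  # hallucinated figure
    {"role": "user", "content": "Break that down by region"},
]
# Roll back to before the hallucination and flag it explicitly.
recovered = rollback_context(
    history, 1,
    note="The earlier revenue figure was wrong; re-check the source data.",
)
```

Truncation removes the poison; the note implements the "explicitly note poisoning and re-evaluate" step so subsequent turns do not reintroduce it from memory of the conversation.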
## Context Distraction

Even a single irrelevant document reduces performance. Models must attend to everything in context; they cannot "skip" irrelevant content.
Mitigation:
- Filter for relevance before loading
- Use namespacing for organization
- Access via tools instead of context
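Relevance filtering before loading can be sketched with a crude lexical-overlap score. A real system would use embedding similarity; the scoring function and the 0.3 threshold here are illustrative assumptions:

```python
def relevance_score(query, document):
    """Crude lexical overlap: fraction of query terms found in the document."""
    q_terms = set(query.lower().split())
    d_terms = set(document.lower().split())
    return len(q_terms & d_terms) / max(len(q_terms), 1)

def filter_relevant(query, documents, threshold=0.3):
    """Keep only documents scoring above the (arbitrary) threshold,
    so irrelevant material never enters the context window."""
    return [d for d in documents if relevance_score(query, d) >= threshold]

docs = [
    "quarterly revenue grew fifteen percent in region a",
    "the office coffee machine was repaired on tuesday",
]
kept = filter_relevant("quarterly revenue by region", docs)
```

The key design choice is filtering *before* loading: once a distractor is in context the model must attend to it, so the filter is the only point where "skipping" is possible.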
## Degradation Thresholds

| Model | Degradation Onset | Severe Degradation |
|-------|-------------------|--------------------|
| GPT-5.2 | ~64K tokens | ~200K tokens |
| Claude Opus 4.5 | ~100K tokens | ~180K tokens |
| Claude Sonnet 4.5 | ~80K tokens | ~150K tokens |
| Gemini 3 Pro | ~500K tokens | ~800K tokens |
## The Four-Bucket Approach

| Strategy | Purpose |
|----------|---------|
| Write | Save context outside window |
| Select | Pull relevant context in |
| Compress | Reduce tokens, preserve info |
| Isolate | Split across sub-agents |
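The Write and Select buckets can be sketched with a dict-backed scratchpad; a production system would persist to files or a vector store, and the `Scratchpad` class here is purely illustrative:

```python
class Scratchpad:
    """Minimal external memory.

    Write saves content outside the context window; Select pulls it
    back in only when the current step needs it.
    """

    def __init__(self):
        self._notes = {}

    def write(self, key, content):
        self._notes[key] = content  # stored outside the model's context

    def select(self, key):
        return self._notes.get(key, "")  # loaded on demand

pad = Scratchpad()
pad.write("q3_findings", "Revenue up 15%; growth concentrated in Region A")
# Later turn: pull only what the current step needs into context.
snippet = pad.select("q3_findings")
```

Write and Select together keep the window small; Compress and Isolate apply when even the selected material is too large for one context.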
## Counterintuitive Findings

- **Shuffled haystacks outperform coherent ones**: coherent context creates false associations
- **Single distractors have outsized impact**: degradation is a step function, not proportional to distractor count
- **Needle-question similarity matters**: recall degrades faster when target content is dissimilar to the question
## When Larger Contexts Hurt

- Performance degrades non-linearly once a threshold is crossed
- Compute cost grows quadratically with context length (self-attention is O(n²) in sequence length)
- The model's cognitive bottleneck remains regardless of window size
## Best Practices
- Monitor context length and performance correlation
- Place critical information at beginning or end
- Implement compaction triggers before degradation
- Validate retrieved documents for accuracy
- Use versioning to prevent clashes with outdated information
- Segment tasks to prevent confusion
- Design for graceful degradation
- Test with progressively larger contexts
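Two of these practices (monitoring context length and triggering compaction before degradation) can be sketched as a token-budget check. The 4-characters-per-token heuristic, the 64K soft limit, and the `summarize` hook are all rough assumptions, not measured constants:

```python
def estimate_tokens(text):
    """Rough heuristic: ~4 characters per token for English text."""
    return len(text) // 4

def maybe_compact(context, soft_limit=64_000, summarize=None):
    """Trigger compaction *before* degradation onset, not after.

    `summarize` is a hypothetical callable (e.g. an LLM summarization
    call). Without one, fall back to keeping the head and tail, which
    also respects the lost-in-middle pattern: the dropped middle is
    the region models recall worst anyway.
    """
    if estimate_tokens(context) <= soft_limit:
        return context
    if summarize is not None:
        return summarize(context)
    keep = soft_limit * 4 // 2  # characters to keep at each edge
    return context[:keep] + "\n[...compacted...]\n" + context[-keep:]

compacted = maybe_compact("x" * 400_000)  # ~100K tokens, over the soft limit
```

Choosing the soft limit below a model's measured degradation-onset threshold is what makes the trigger proactive rather than reactive.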