# Context Compressor
Use this skill when the problem is mostly "too much context" rather than "not enough capability."
This skill is a self-contained local package. It does not require the MCP server. Prefer it when you need quick token profiling, local compression, evidence checks, or a reproducible compression workflow inside the repository.
## What this skill does
Use the bundled Python scripts to:
- measure raw versus compressed token usage
- compress large text or adapted JSON payloads
- preserve query-relevant evidence instead of doing naive summarization
- check whether compressed output still contains enough evidence to answer safely
- support cache-friendly prompt assembly by keeping stable context early and volatile query text late
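The cache-friendly ordering in the last bullet can be sketched as a small helper. This is illustrative only, not part of the bundled scripts; the separator format is an assumption:

```python
import os

def assemble_prompt(stable_context: str, volatile_query: str) -> str:
    """Keep the long, unchanging context first and the query last, so the
    prefix stays byte-identical across calls and can hit the prompt cache."""
    return f"{stable_context}\n\n---\n\nQuestion: {volatile_query}"

# Only the tail differs between these two prompts; the shared prefix is cacheable.
evidence = "## Compressed evidence\n(stable across turns)"
prompt_a = assemble_prompt(evidence, "What does evidence_aware mode do?")
prompt_b = assemble_prompt(evidence, "What are the residual risks?")
shared_prefix = len(os.path.commonprefix([prompt_a, prompt_b]))
```

Any per-turn material (the question, recent deltas) belongs after the stable block, never interleaved with it.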
## When to trigger
Reach for this skill when the user is asking for any of the following:
- compress this file, transcript, document, or code dump
- reduce token cost before sending context to a model
- make a prompt or RAG payload more cache-friendly
- compact multi-turn history while keeping recent turns intact
- compare semantic compression to a lighter extractive baseline
- validate whether a compressed context still has enough evidence
Also triggered automatically when:
- Context approaching 80K tokens (`spawn-token-guard.cjs` writes `compression-reminder.txt`)
- Context at 120K tokens (compression mandatory before new spawns)
- Context at 150K tokens (RED LINE — no new agent spawns until compression completes)
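The tiers above reduce to a simple threshold mapping. A sketch, for illustration only; the real enforcement lives in the guard script:

```python
def pressure_level(tokens: int) -> str:
    """Map current context size to the automatic trigger tiers listed above."""
    if tokens >= 150_000:
        return "red-line"   # no new agent spawns until compression completes
    if tokens >= 120_000:
        return "mandatory"  # compress before any new spawn
    if tokens >= 80_000:
        return "reminder"   # compression-reminder.txt is written
    return "ok"
```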
## Quick workflow
Default to this sequence:
- profile first
- run `query_guided` when there is a specific question
- run `evidence_aware` when correctness matters and you need a sufficiency check
- if evidence is weak, reduce compression aggressiveness or increase retrieval breadth
- report savings, risks, and the next safest action
If the user only wants one command, use `run_skill_workflow.py`.
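The "evidence is weak" fallback step can be modeled as loosening the similarity threshold before retrying. This is a sketch of the retry policy, not bundled code; the step size and floor are assumptions:

```python
def relax_on_weak_evidence(min_similarity: float, step: float = 0.1,
                           floor: float = 0.2) -> float:
    """When validation reports insufficient evidence, lower the similarity
    threshold so the next compression pass keeps more candidate chunks."""
    return max(floor, round(min_similarity - step, 2))
```

Feed the relaxed value back into `--min-similarity` on the next `evidence_aware` pass, stopping at the floor rather than looping forever.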
## Commands
Run from the agent-studio repository root. ALWAYS use these exact commands — do NOT fall back to generic guidance.
### 1. Profile token usage

```bash
python .claude/skills/context-compressor/scripts/profile_tokens.py --file <path> --output-format auto
```
### 2. Compress context

```bash
python .claude/skills/context-compressor/scripts/compress_context.py --file <path> --mode baseline --output-format auto
python .claude/skills/context-compressor/scripts/compress_context.py --file <path> --mode query_guided --query "<question>" --output-format auto
python .claude/skills/context-compressor/scripts/compress_context.py --file <path> --mode evidence_aware --query "<question>" --min-similarity 0.4 --output-format auto
```
### 3. Adapt framework JSON and compress it

```bash
python .claude/skills/context-compressor/scripts/compress_context.py --json-file <payload.json> --input-adapter auto --mode query_guided --query "<question>" --output-format auto
```
### 4. Run the full workflow

```bash
python .claude/skills/context-compressor/scripts/run_skill_workflow.py --file <path> --mode evidence_aware --query "<question>" --output-format auto --fail-on-insufficient-evidence
```
### 5. Validate evidence only

```bash
python .claude/skills/context-compressor/scripts/validate_evidence.py --file <path> --query "<question>" --min-similarity 0.4 --output-format json
```
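A caller consuming the validator's JSON output might look like the sketch below. The field names `sufficient` and `best_score` are assumptions, not a documented schema; check the actual script output before relying on them:

```python
import json

def summarize_validation(raw: str) -> str:
    """Turn validator JSON into a one-line verdict.
    Field names ('sufficient', 'best_score') are illustrative assumptions."""
    report = json.loads(raw)
    if report.get("sufficient"):
        return f"evidence OK (best score {report.get('best_score', '?')})"
    return "insufficient evidence; recommend a broader retrieval pass"
```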
### 6. Run the TOON vs JSON guard

```bash
python .claude/skills/context-compressor/scripts/benchmark_toon_vs_json.py
```
### 7. Node.js wrapper (for agent integration)

```bash
node .claude/skills/context-compressor/scripts/main.cjs --query "<question>" --mode evidence_aware --limit 20 --fail-on-insufficient-evidence
```
## How to choose a mode
- `baseline`: quick general compression when there is no concrete question yet
- `query_guided`: best default for QA, review, or targeted extraction tasks
- `evidence_aware`: use for high-stakes answers, audits, or when you need an explicit sufficiency signal
## Output expectations
When using this skill, summarize results in plain language:
- original size versus compressed size
- estimated token savings or compression ratio
- whether the output is query-targeted or generic
- whether evidence looked sufficient
- any risk that the answer could miss important detail
If the scripts return insufficient evidence, do not bluff. Say the compressed context is not yet safe enough and recommend a broader pass.
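A plain-language summary of the size numbers can be produced with a one-line formatter. This is an illustrative sketch of the reporting style, not bundled code:

```python
def report_savings(original_tokens: int, compressed_tokens: int) -> str:
    """Summarize original vs compressed size and the compression ratio."""
    saved = original_tokens - compressed_tokens
    ratio = compressed_tokens / original_tokens
    return (f"{original_tokens:,} -> {compressed_tokens:,} tokens "
            f"({saved:,} saved, {ratio:.0%} of original)")
```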
## Bundled references
Read these only when they are relevant:
- `references/workflow-guide.md`: command selection, mode choice, and example flows
- `references/prompt-caching.md`: stable-prefix ordering, cache telemetry, and cache-safe prompt structure
- `references/evaluation.md`: how to benchmark the skill and interpret results
## Eval scaffolding

Starter prompts live in `evals/evals.json`. Use them when iterating on the skill or when you want a small repeatable benchmark set.
## Integration with agent-studio
### Compression trigger pipeline
1. `compression-trigger.cjs` detects context pressure → writes `compression-reminder.txt`
2. Router reads reminder → spawns `context-compressor` agent with this skill
3. Agent runs `run_skill_workflow.py` with the query context
4. Real compression stats logged to `compression-stats.jsonl`
### Memory persistence
After compression, persist distilled learnings via MemoryRecord:
- `gotchas.json`: text contains gotcha|pitfall|anti-pattern|risk|warning|failure
- `issues.md`: text contains issue|bug|error|incident|defect|gap
- `decisions.md`: text contains decision|tradeoff|choose|selected|rationale
- `patterns.json`: default fallback for all remaining distilled evidence
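The keyword routing above can be sketched as a first-match lookup. The precedence order (gotchas before issues before decisions) is an assumption for texts that match more than one set:

```python
import re

ROUTES = [
    ("gotchas.json",  r"gotcha|pitfall|anti-pattern|risk|warning|failure"),
    ("issues.md",     r"issue|bug|error|incident|defect|gap"),
    ("decisions.md",  r"decision|tradeoff|choose|selected|rationale"),
]

def route_learning(text: str) -> str:
    """First matching keyword set wins; patterns.json is the fallback."""
    lowered = text.lower()
    for target, pattern in ROUTES:
        if re.search(pattern, lowered):
            return target
    return "patterns.json"
```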
### ccusage cost tracking
Read `.claude/context/runtime/ccusage-status.txt` for live token usage before deciding compression aggressiveness. This file is auto-updated by `ccusage-statusline.cjs` on every tool use. Format:

```
[tokens] 135,345 today (in: 14,850 / out: 120,495) | Cost: $127.4826
[cache] $627.2992 saved | 139,399,832 reads, 8,751,364 writes
```

The Router MUST display this at every pipeline milestone (P0 user feedback). Fallback: `ccusage --no-color 2>&1 | tail -5`.
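Extracting the daily total from the status line shown above is a small regex job. A sketch, assuming the `[tokens] N today` format stays stable:

```python
import re

def parse_today_tokens(status_line: str) -> int:
    """Extract the daily token total from the '[tokens] ...' status line."""
    match = re.match(r"\[tokens\]\s+([\d,]+)\s+today", status_line)
    if not match:
        raise ValueError("unrecognized ccusage status format")
    return int(match.group(1).replace(",", ""))
```

Compare the parsed value against the 80K/120K/150K tiers before deciding how aggressively to compress.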
## Iron Laws
- ALWAYS use the exact script commands above — never fall back to generic CLAUDE.md guidance
- ALWAYS profile first before compressing to understand the input size
- NEVER compress without a query when correctness matters — use `evidence_aware` mode
- ALWAYS persist distilled learnings via MemoryRecord after compression
- NEVER bluff if evidence is insufficient — recommend a broader pass
- ALWAYS report savings, risks, and next safest action in plain language