Prompt Compression Skill
Capabilities
- Implement token-efficient prompt compression
- Design context pruning strategies
- Configure selective context inclusion
- Implement LLMLingua-style compression
- Design summary-based compression
- Create compression quality metrics
Target Processes
- cost-optimization-llm
- agent-performance-optimization
Implementation Details
Compression Techniques
- LLMLingua: Model-based token-level compression that uses a small language model to score and drop low-importance tokens
- Summary Compression: LLM-based summarization that rewrites long context as a shorter paraphrase
- Selective Context: Extraction of only the sections relevant to the current query
- Token Pruning: Removal of low-importance tokens (stopwords, filler, redundancy)
- Document Filtering: Pre-retrieval filtering so irrelevant documents never enter the prompt
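The token-pruning idea above can be sketched with no model dependency. The heuristic importance score and stopword list here are illustrative stand-ins for the small-LM perplexity scores that LLMLingua-style compressors actually use:

```python
import re

# Illustrative stand-in for a learned importance model: common function
# words score low, longer content words score higher. Real LLMLingua-style
# compressors score tokens with a small LM instead of this heuristic.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "or", "that",
             "is", "are", "was", "were", "be", "it", "this", "for"}

def importance(token: str) -> float:
    """Toy importance score in [0, 1]."""
    if token.lower() in STOPWORDS:
        return 0.1
    return min(1.0, 0.3 + 0.1 * len(token))

def prune_tokens(text: str, target_ratio: float = 0.6) -> str:
    """Keep the highest-importance tokens until the target ratio is met,
    preserving the original word order."""
    tokens = re.findall(r"\S+", text)
    budget = max(1, int(len(tokens) * target_ratio))
    # Rank token positions by importance, keep the top `budget` of them.
    ranked = sorted(range(len(tokens)), key=lambda i: -importance(tokens[i]))
    keep = sorted(ranked[:budget])
    return " ".join(tokens[i] for i in keep)

compressed = prune_tokens(
    "The quick brown fox jumps over the lazy dog in the garden", 0.6)
```

The same skeleton works for any scorer: swap `importance` for a model-derived score and the pruning loop is unchanged.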
Configuration Options
- Compression ratio targets
- Quality threshold settings
- Token budget constraints
- Compression model selection
- Evaluation metrics
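One way to carry the options above is a single validated config object. This is a minimal sketch; the field names and defaults are assumptions, not a published schema:

```python
from dataclasses import dataclass

@dataclass
class CompressionConfig:
    """Illustrative configuration surface for the options listed above."""
    target_ratio: float = 0.5        # fraction of tokens to keep
    quality_threshold: float = 0.85  # minimum acceptable quality score (0-1)
    token_budget: int = 4096         # hard cap on compressed prompt size
    compression_model: str = "small-lm"  # hypothetical model identifier
    metrics: tuple = ("compression_ratio", "key_term_retention")

    def validate(self) -> None:
        if not 0.0 < self.target_ratio <= 1.0:
            raise ValueError("target_ratio must be in (0, 1]")
        if self.token_budget <= 0:
            raise ValueError("token_budget must be positive")

config = CompressionConfig(target_ratio=0.4, token_budget=2048)
config.validate()
```

Validating at construction time keeps impossible settings (a zero ratio, a negative budget) from silently producing empty prompts downstream.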
Best Practices
- Monitor the tradeoff between output quality and compression ratio
- Test with representative prompts
- Set appropriate compression ratios
- Validate compressed prompt quality
- Track cost savings
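Quality validation and cost tracking can share one small metrics helper. The sketch below reports a token-count ratio plus key-term retention; both are crude proxies (real validation would compare task accuracy on compressed versus uncompressed prompts), and the example inputs are invented:

```python
def compression_metrics(original: str, compressed: str,
                        key_terms: list[str]) -> dict:
    """Crude quality metrics: token-count ratio plus the fraction of
    key terms that survive compression."""
    orig_tokens = original.split()
    comp_tokens = compressed.split()
    retained = sum(1 for term in key_terms if term in comp_tokens)
    return {
        # < 1.0 means the prompt shrank; cost savings scale with (1 - ratio)
        "compression_ratio": len(comp_tokens) / max(1, len(orig_tokens)),
        # 1.0 means every key term survived compression
        "key_term_retention": retained / max(1, len(key_terms)),
    }

m = compression_metrics(
    "the invoice total for order 1042 is due on friday",
    "invoice total order 1042 due friday",
    key_terms=["invoice", "1042", "due"])
```

Setting a floor on `key_term_retention` (or a task-level score) before accepting a compressed prompt is one way to enforce the quality threshold from the configuration section.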
Dependencies
- llmlingua (optional)
- tiktoken
- transformers