Prompt Injection Detector Skill
Capabilities
- Detect prompt injection attempts
- Implement input sanitization
- Configure detection classifiers
- Design defense layers
- Implement canary token detection
- Create injection logging and alerting
Target Processes
- prompt-injection-defense
- tool-safety-validation
Implementation Details
Detection Methods
- Pattern Matching: Known injection patterns
- ML Classifiers: Trained injection detectors
- Canary Tokens: Detect instruction override
- LLM-Based: Use LLM to detect manipulation
- Perplexity Analysis: Unusual input patterns
Defense Strategies
- Input preprocessing
- Prompt structure design
- Output validation
- Sandboxed execution
- Multi-layer defense
Configuration Options
- Detection threshold
- Pattern rules
- Classifier model
- Action policies
- Alerting settings
Best Practices
- Defense in depth
- Regular pattern updates
- Monitor false positives
- Test with red-team inputs
Dependencies
- rebuff (optional)
- transformers
- Custom classifiers