Antidote Threat Handler
Skill Metadata
- Name: antidote-threat-handler
- Category: Adversarial Testing
- Version: 1.0.0
Purpose
Detect and handle behavioral drift, cognitive traps, and potential manipulation attempts.
Protocol
Threat Categories
- Sycophancy Drift: a pattern of excessive agreement with the user
- Cognitive Traps: attempts at manipulation through faulty or loaded logic
- Identity Erosion: violations of persona boundaries
- Consent Violations: requests for actions that have not been authorized
Detection Mechanisms
- Pattern matching against known trap signatures
- Sentiment drift monitoring
- Consistency checking against baseline
- Boundary violation alerting
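Two of the mechanisms above, signature matching and comparison against a baseline, can be sketched in Python as follows. This is only an illustration: the patterns in `TRAP_SIGNATURES`, the function names, and the drift measure are hypothetical, since this skill does not define concrete signatures or a drift formula.

```python
import re

# Hypothetical trap signatures; the skill does not list concrete patterns.
TRAP_SIGNATURES = {
    "Identity Erosion": re.compile(
        r"\b(ignore|forget) (all|your) (previous|prior) instructions\b", re.I
    ),
    "Consent Violations": re.compile(
        r"\bwithout (asking|telling) (the user|anyone)\b", re.I
    ),
}


def match_signatures(message: str) -> list[str]:
    """Return the threat categories whose signature matches the message."""
    return [
        category
        for category, pattern in TRAP_SIGNATURES.items()
        if pattern.search(message)
    ]


def agreement_drift(recent_agreements: list[bool], baseline_rate: float) -> float:
    """Return the recent agreement rate minus the baseline rate.

    A large positive value is one possible (assumed) signal of sycophancy drift.
    """
    if not recent_agreements:
        return 0.0
    return sum(recent_agreements) / len(recent_agreements) - baseline_rate
```

A caller might run `match_signatures` on each incoming message and track `agreement_drift` over a sliding window of recent turns.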
Response Protocol
1. Identify threat type and severity
2. Log detection with evidence
3. Apply appropriate countermeasure
4. Report to audit trail
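The steps above could be wired together roughly as shown here. This is a sketch under assumptions: the `ThreatHandler` class, the `COUNTERMEASURES` table, and the in-memory `audit_trail` list are hypothetical stand-ins, as the skill does not say which countermeasures exist or where the audit trail is stored.

```python
import logging
from dataclasses import dataclass, field

logger = logging.getLogger("antidote-threat-handler")

# Hypothetical mapping from threat category to countermeasure.
COUNTERMEASURES = {
    "Sycophancy Drift": "re-anchor responses to baseline",
    "Cognitive Traps": "decline the premise and restate the question",
    "Identity Erosion": "reassert persona boundaries",
    "Consent Violations": "refuse the action and request authorization",
}


@dataclass
class ThreatHandler:
    # In-memory stand-in for the audit trail (assumption).
    audit_trail: list[dict] = field(default_factory=list)

    def handle(self, threat_type: str, severity: str, evidence: str) -> dict:
        # Log the detection with its evidence; type and severity are
        # assumed to have been identified by the caller.
        logger.warning("threat=%s severity=%s evidence=%s",
                       threat_type, severity, evidence)
        # Choose a countermeasure, falling back to escalation.
        countermeasure = COUNTERMEASURES.get(threat_type, "escalate for review")
        record = {
            "threat_detected": True,
            "threat_type": threat_type,
            "severity": severity,
            "evidence": evidence,
            "countermeasure_applied": countermeasure,
        }
        # Report to the audit trail.
        self.audit_trail.append(record)
        return record
```

The returned record uses the same keys as the Output Format section, so it can be serialized directly.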
Output Format
{
  "threat_detected": true,
  "threat_type": "CATEGORY",
  "severity": "HIGH|MEDIUM|LOW",
  "evidence": "Description",
  "countermeasure_applied": "Action taken"
}
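A report in this shape could be produced and checked with a small helper like the one below. The function name `build_report` and the decision to reject unknown severities with a `ValueError` are assumptions made for illustration, not part of the skill.

```python
import json

# Severity values taken from the output format above.
VALID_SEVERITIES = {"HIGH", "MEDIUM", "LOW"}


def build_report(threat_type: str, severity: str,
                 evidence: str, countermeasure: str) -> str:
    """Serialize a detection as JSON using the documented field names."""
    if severity not in VALID_SEVERITIES:
        raise ValueError(f"unknown severity: {severity!r}")
    return json.dumps(
        {
            "threat_detected": True,
            "threat_type": threat_type,
            "severity": severity,
            "evidence": evidence,
            "countermeasure_applied": countermeasure,
        },
        indent=2,
    )
```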
Behavioral Calibration
vigilance_level: 0.9
false_positive_tolerance: 0.1
auto_response: true
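One way to load and sanity-check these settings is sketched below. The `Calibration` class and `parse_calibration` function are hypothetical, and the check that both numeric values lie in the range 0 to 1 is an assumption based on the values shown; the skill does not state their allowed range or exact semantics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Calibration:
    vigilance_level: float = 0.9
    false_positive_tolerance: float = 0.1
    auto_response: bool = True

    def __post_init__(self) -> None:
        # Assumption: both numeric settings are fractions in [0, 1].
        for name in ("vigilance_level", "false_positive_tolerance"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")


def parse_calibration(text: str) -> Calibration:
    """Parse 'key: value' lines like the ones in this section."""
    raw = dict(line.split(":", 1) for line in text.strip().splitlines())
    return Calibration(
        vigilance_level=float(raw["vigilance_level"]),
        false_positive_tolerance=float(raw["false_positive_tolerance"]),
        auto_response=raw["auto_response"].strip().lower() == "true",
    )
```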