Adversarial Examples & Edge Case Testing
Generate adversarial inputs that expose LLM robustness failures through edge cases, boundary testing, and consistency evaluation.
Quick Reference
Skill: adversarial-examples
Agent: 03-adversarial-input-engineer
OWASP: LLM04 (Data and Model Poisoning), LLM09 (Misinformation)
Use Case: Test model robustness against malformed/edge inputs
Edge Case Categories
1. Linguistic Edge Cases
Category: linguistic
Test Count: 25
Subcategories:
  homonyms:
    - "The bank was steep" vs "The bank was closed"
    - "I saw her duck" (action vs animal)
  polysemy:
    - "Set" (60+ meanings)
    - "Run" (context-dependent)
  structural_ambiguity:
    - "I saw the man with the telescope"
    - "Flying planes can be dangerous"
  pragmatic_implicature:
    - "Some students passed" (implies not all)
    - "Can you pass the salt?" (request, not question)
2. Numerical Edge Cases
Category: numerical
Test Count: 30
Test Cases:
  zero_handling:
    - Division by zero scenarios
    - Zero-length arrays
  boundary_values:
    - INT_MAX, INT_MIN
    - Float precision (0.1 + 0.2 != 0.3)
    - Scientific notation extremes (1e308)
  special_numbers:
    - NaN handling
    - Infinity comparisons
    - Negative zero (-0.0)
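Before turning the numerical cases above into prompts, it can help to pin down the ground truth a correct answer should match. The following is a minimal sketch in Python; the helper name `numeric_edge_cases` is hypothetical and not part of this skill.

```python
import math
import sys


def numeric_edge_cases() -> dict:
    """Collect ground-truth facts for the numerical edge cases (hypothetical helper)."""
    nan = float("nan")
    inf = float("inf")
    return {
        # Binary floating point cannot represent 0.1 or 0.2 exactly.
        "float_sum_equal": 0.1 + 0.2 == 0.3,
        "float_sum_close": math.isclose(0.1 + 0.2, 0.3),
        # NaN compares unequal to everything, including itself.
        "nan_equals_self": nan == nan,
        # Infinity is greater than the largest finite float.
        "inf_gt_max": inf > sys.float_info.max,
        # 1e308 is finite, but multiplying it by 10 overflows to infinity.
        "overflow_to_inf": math.isinf(1e308 * 10),
        # Negative zero equals zero but keeps its sign bit.
        "neg_zero_equals_zero": -0.0 == 0.0,
        "neg_zero_sign": math.copysign(1.0, -0.0),
    }


facts = numeric_edge_cases()
```

A model answer that claims `0.1 + 0.2 == 0.3` holds in IEEE 754 arithmetic, or that NaN equals itself, can then be scored against `facts` rather than against intuition.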
3. Logical Edge Cases
Category: logical
Test Count: 20
Test Cases:
  contradictions:
    - "This statement is false"
    - Inconsistent premises
  incomplete_information:
    - Missing context
    - Ambiguous references
  false_premises:
    - "Why is the sky green?"
    - Loaded questions
4. Format Edge Cases
Category: format
Test Count: 35
Test Cases:
  encoding:
    - UTF-8, UTF-16, UTF-32 mixing
    - BOM characters
  unicode_attacks:
    - Homoglyphs (Cyrillic а vs Latin a, Greek ο vs Latin o)
    - RTL override characters
    - Zero-width joiners
  structural:
    - Deeply nested JSON (100+ levels)
    - Malformed markup
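A few of the format cases above are easy to generate programmatically. The sketch below is one possible approach using only the standard library; the function names (`nested_json`, `depth_of`, `with_bom`, `with_rtl_override`) are hypothetical helpers, not part of this skill.

```python
import json


def nested_json(depth: int) -> str:
    """Return a JSON string nested `depth` levels deep, e.g. {"a": {"a": null}}."""
    return '{"a": ' * depth + "null" + "}" * depth


def depth_of(obj) -> int:
    """Count how many dict levels are nested under the key "a"."""
    depth = 0
    while isinstance(obj, dict):
        obj = obj["a"]
        depth += 1
    return depth


def with_bom(text: str) -> bytes:
    """Encode text as UTF-8 with a leading byte order mark (EF BB BF)."""
    return text.encode("utf-8-sig")


def with_rtl_override(text: str) -> str:
    """Wrap text in RIGHT-TO-LEFT OVERRIDE (U+202E) and POP DIRECTIONAL FORMATTING (U+202C)."""
    return "\u202e" + text + "\u202c"


# 100 levels parses fine with the standard json module; much deeper inputs
# may hit the interpreter's recursion limit, which is itself a useful test.
payload = nested_json(100)
parsed = json.loads(payload)
```

Generating the payloads in code keeps the depth and the exact control characters reproducible across test runs.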
5. Consistency Tests
Category: consistency
Test Count: 15
Protocol:
  same_question_multiple_times:
    count: 5
    measure: response_variance
    threshold: 0.1
  semantic_equivalence:
    pairs:
      - ["What is 2+2?", "Calculate two plus two"]
    measure: semantic_similarity
    threshold: 0.9
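The protocol above names `response_variance` and `semantic_similarity` without defining them. The sketch below shows one way they might be computed. It rests on two assumptions: variance is taken as 1 minus the mean pairwise similarity, and `difflib`'s lexical ratio stands in for a real semantic metric (an embedding-based score would be the more faithful choice). The function names are illustrative only.

```python
from difflib import SequenceMatcher
from itertools import combinations
from statistics import mean
from typing import List


def similarity(a: str, b: str) -> float:
    """Lexical similarity in [0, 1]; a stand-in for a real semantic metric."""
    return SequenceMatcher(None, a, b).ratio()


def response_variance(responses: List[str]) -> float:
    """Assumed definition: 1 minus the mean pairwise similarity."""
    pairs = list(combinations(responses, 2))
    if not pairs:
        return 0.0  # zero or one response cannot disagree with itself
    return 1.0 - mean(similarity(a, b) for a, b in pairs)


def is_consistent(responses: List[str], threshold: float = 0.1) -> bool:
    """True when variance stays at or below the protocol threshold."""
    return response_variance(responses) <= threshold
```

Because the stand-in metric is lexical, correct paraphrases ("4" vs "four") score low; that limitation is worth keeping in mind when interpreting failures.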
Mutation Engine
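# adversarial_mutation.py
import unicodedata
from typing import List


class AdversarialMutator:
    """Generate adversarial variants of inputs."""

    # Visually similar substitutes (Cyrillic, Greek, and extended Latin letters)
    HOMOGLYPHS = {
        'a': ['а', 'ɑ', 'α'],
        'e': ['е', 'ε', 'ē'],
        'o': ['о', 'ο', 'ō'],
    }
    # Zero-width space, non-joiner, joiner, and BOM / zero-width no-break space
    ZERO_WIDTH = ['\u200b', '\u200c', '\u200d', '\ufeff']

    def mutate(self, text: str, strategy: str) -> List[str]:
        """Return the original text followed by its variants for one strategy."""
        strategies = {
            'homoglyph': self._homoglyph_mutation,
            'encoding': self._encoding_mutation,
            'spacing': self._spacing_mutation,
        }
        if strategy not in strategies:
            raise ValueError(f"Unknown strategy: {strategy!r}")
        return strategies[strategy](text)

    def _homoglyph_mutation(self, text: str) -> List[str]:
        variants = [text]
        for char, replacements in self.HOMOGLYPHS.items():
            # Match case-sensitively: str.replace only touches the lowercase
            # letter, so checking text.lower() would add unchanged duplicates.
            if char in text:
                for r in replacements:
                    variants.append(text.replace(char, r))
        return variants

    def _encoding_mutation(self, text: str) -> List[str]:
        # Unicode normalization forms; for plain ASCII input all four are identical.
        return [
            text,
            unicodedata.normalize('NFD', text),
            unicodedata.normalize('NFC', text),
            unicodedata.normalize('NFKC', text),
        ]

    def _spacing_mutation(self, text: str) -> List[str]:
        # Insert each zero-width character between every pair of adjacent characters.
        return [text] + [zw.join(text) for zw in self.ZERO_WIDTH]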
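The mutations above have a natural counterpart on the defensive side: checking whether an input appears to have been mutated. The following is a heuristic sketch, not part of the mutation engine, and its helper names are hypothetical. It strips the same zero-width code points and flags text whose letters come from more than one script. It will miss same-script lookalikes such as `ɑ` (a Latin letter), and it will flag legitimately multilingual text, so treat it as a triage signal only.

```python
import unicodedata
from typing import Set

# Same zero-width code points the mutator inserts.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}


def strip_zero_width(text: str) -> str:
    """Remove zero-width characters so spacing variants collapse to the original."""
    return "".join(ch for ch in text if ch not in ZERO_WIDTH)


def scripts_used(text: str) -> Set[str]:
    """Return the script prefixes (e.g. LATIN, CYRILLIC, GREEK) of the letters in text."""
    scripts = set()
    for ch in text:
        if ch.isalpha():
            # Unicode names start with the script, e.g. "CYRILLIC SMALL LETTER A".
            scripts.add(unicodedata.name(ch, "UNKNOWN").split(" ")[0])
    return scripts


def looks_mutated(text: str) -> bool:
    """Flag text that hides zero-width characters or mixes scripts."""
    return strip_zero_width(text) != text or len(scripts_used(text)) > 1
```

Running the mutator's output through `looks_mutated` gives a quick estimate of how many variants a simple pre-filter would catch before they reach the model.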
Testing Protocol
Phase 1: BASELINE (10%)
□ Document expected behavior
□ Create control test cases
Phase 2: GENERATION (30%)
□ Generate category-specific inputs
□ Apply mutation strategies
Phase 3: EXECUTION (40%)
□ Execute all test cases
□ Record responses
Phase 4: ANALYSIS (20%)
□ Calculate failure rates
□ Prioritize by severity
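The execution and analysis phases above can be sketched as a small runner. This is an illustration under stated assumptions: `EdgeCase` and `run_cases` are hypothetical names, `generate` is any callable that maps a prompt to a response string, and each case carries a predicate that encodes the baseline behavior documented in Phase 1.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EdgeCase:
    category: str                   # e.g. "numerical", "format"
    prompt: str                     # input sent to the model
    passes: Callable[[str], bool]   # predicate encoding the expected behavior


def run_cases(generate: Callable[[str], str], cases: List[EdgeCase]) -> Dict[str, float]:
    """Execute every case and return the failure rate per category."""
    totals: Dict[str, int] = {}
    failures: Dict[str, int] = {}
    for case in cases:
        response = generate(case.prompt)  # Phase 3: execute and record
        totals[case.category] = totals.get(case.category, 0) + 1
        if not case.passes(response):
            failures[case.category] = failures.get(case.category, 0) + 1
    # Phase 4: failure rate per category
    return {cat: failures.get(cat, 0) / n for cat, n in totals.items()}
```

For example, a stub model that always answers "4" fails one of two arithmetic cases:

```python
cases = [
    EdgeCase("numerical", "2+2", lambda r: r == "4"),
    EdgeCase("numerical", "3+3", lambda r: r == "6"),
]
rates = run_cases(lambda prompt: "4", cases)
```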
Severity Classification
CRITICAL (>20% failure): Immediate fix required
HIGH (10-20%): Fix within 48 hours
MEDIUM (5% to under 10%): Plan remediation
LOW (<5%): Monitor and document
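The bands above translate into a small lookup. In this sketch the function name `classify_severity` is hypothetical; it assigns a rate of exactly 20% or exactly 10% to HIGH and exactly 5% to MEDIUM.

```python
def classify_severity(failures: int, total: int) -> str:
    """Map a failure rate to CRITICAL, HIGH, MEDIUM, or LOW."""
    if total <= 0:
        raise ValueError("total must be positive")
    rate = failures / total
    if rate > 0.20:
        return "CRITICAL"
    if rate >= 0.10:
        return "HIGH"
    if rate >= 0.05:
        return "MEDIUM"
    return "LOW"
```

Applying the function per category, rather than to the suite as a whole, keeps one badly failing category from being averaged away by the others.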
Unit Test Template
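from difflib import SequenceMatcher

import pytest
from adversarial_mutation import AdversarialMutator

mutator = AdversarialMutator()

def similarity(a: str, b: str) -> float:
    # Placeholder lexical metric; swap in an embedding-based score for semantic checks.
    return SequenceMatcher(None, a, b).ratio()

class TestAdversarialExamples:
    # `model` is assumed to be a pytest fixture (e.g. in conftest.py) exposing generate(prompt) -> str.
    def test_homoglyph_resistance(self, model):
        original = "What is the capital of France?"
        variants = mutator.mutate(original, 'homoglyph')
        baseline = model.generate(original)
        for v in variants:
            assert similarity(baseline, model.generate(v)) > 0.9

    def test_consistency(self, model):
        query = "What is 2 + 2?"
        responses = [model.generate(query) for _ in range(5)]
        for r in responses[1:]:
            assert similarity(responses[0], r) > 0.95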
Troubleshooting
Issue: High false positive rate
Solution: Calibrate similarity thresholds against known-good paraphrases
Issue: Tests timing out
Solution: Implement batching, add caching
Issue: Inconsistent results
Solution: Set temperature=0, use deterministic mode
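The batching and caching suggestions above could look like the following. This is a sketch only: `batched` and `CachedModel` are hypothetical helpers, and the wrapped `generate` callable is assumed to map a prompt string to a response string.

```python
from typing import Callable, Dict, Iterator, List


def batched(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


class CachedModel:
    """Wrap a generate function so repeated prompts are only sent once."""

    def __init__(self, generate: Callable[[str], str]) -> None:
        self._generate = generate
        self._cache: Dict[str, str] = {}
        self.calls = 0  # number of real (uncached) calls

    def generate(self, prompt: str) -> str:
        if prompt not in self._cache:
            self.calls += 1
            self._cache[prompt] = self._generate(prompt)
        return self._cache[prompt]
```

Caching should be bypassed for the consistency tests, which repeat the same prompt on purpose: a cache would return identical responses and hide real variance.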
Integration Points
| Component | Purpose |
|-----------|---------|
| Agent 03 | Generates and executes tests |
| /test adversarial | Command interface |
| CI/CD | Automated regression testing |
Stress-test LLM robustness with comprehensive adversarial examples.