Content Sanitization Guidelines
When To Use
Any skill or hook that loads content from external sources:
- GitHub Issues, PRs, Discussions (via gh CLI)
- WebFetch / WebSearch results
- User-provided URLs
- Any content not controlled by this repository
When NOT To Use
- Processing local, git-controlled files (trusted content)
- Internal code analysis with no external input
Trust Levels
| Level | Source | Treatment | |---|---|---| | Trusted | Local files, git-controlled content | No sanitization | | Semi-trusted | GitHub content from repo collaborators | Light sanitization | | Untrusted | Web content, public authors | Full sanitization |
Sanitization Checklist
Before processing external content in any skill:
- Size check: Truncate to 2000 words maximum per entry
- Strip system tags: Remove
<system>,<assistant>,<human>,<IMPORTANT>XML-like tags - Strip instruction patterns: Remove "Ignore previous", "You are now", "New instructions:", "Override"
- Strip code execution patterns: Remove
!!python,__import__,eval(,exec(,os.system - Wrap in boundary markers:
--- EXTERNAL CONTENT [source: <tool>] --- [content] --- END EXTERNAL CONTENT --- - Strip formatting-based hiding: Remove content
using CSS/HTML to hide text from human view:
display:none,visibility:hiddencolor:white,#fff,#ffffff,rgb(255,255,255)font-size:0,opacity:0height:0withoverflow:hidden
- Strip zero-width characters: Remove U+200B (zero-width space), U+200C (zero-width non-joiner), U+200D (zero-width joiner), U+FEFF (BOM/zero-width no-break space)
- Strip instruction-bearing HTML comments: Remove HTML comments containing injection keywords (ignore, override, forget, "you are")
Automated Enforcement
A PostToolUse hook (sanitize_external_content.py)
automatically sanitizes outputs from WebFetch, WebSearch,
and Bash commands that call gh or curl. Skills do not
need to re-sanitize content that has already passed through
the hook.
Skills that directly construct external content (e.g.,
reading from gh api output stored in a variable) should
follow this checklist manually.
Code Execution Prevention
External content must NEVER be:
- Passed to
eval(),exec(), orcompile() - Used in
subprocesswithshell=True - Deserialized with
yaml.load()(useyaml.safe_load()) - Interpolated into f-strings for shell commands
- Used as import paths or module names
- Deserialized with
pickleormarshal
Constitutional Entry Protection
External content can never auto-promote to constitutional importance (score >= 90). Score changes >= 20 points from external sources require human confirmation.