Documentation Claim Validator
Verify that what documentation says is actually true by extracting testable claims
and checking them against the codebase. Complements doc-maintenance (which handles
structural health) by handling semantic accuracy.
When to Use
- After significant code changes (refactors, renames, API changes)
- Before releases — catch docs that describe removed or changed behavior
- When onboarding devs report "the docs are wrong"
- As a periodic trust audit on project documentation
- After running doc-maintenance to go deeper than structural checks
Quick Reference
| Resource | Purpose | Load when |
|----------|---------|-----------|
| scripts/extract_claims.py | Deterministic claim extraction from markdown | Always (Phase 1) |
| scripts/verify_claims.py | Automated verification against codebase | Always (Phase 2) |
| references/claim-taxonomy.md | Full taxonomy of claim types with examples | Triaging unclear claims |
Workflow Overview
Phase 1: Extract → Pull verifiable claims from docs (deterministic script)
Phase 2: Verify → Check claims against codebase (automated + AI)
Phase 3: Report → Classify failures by severity and type
Phase 4: Remediate → Fix or flag broken claims
Phase 1: Extract Claims
Run the extraction script to parse all markdown files and pull out verifiable assertions:
python3 skills/doc-claim-validator/scripts/extract_claims.py [--json] [--root PATH] [--scope docs|manual|all]
The script extracts these claim types from markdown:
| Type | What it captures | Example in docs |
|------|-----------------|-----------------|
| file_path | Inline code matching file path patterns | `src/auth/login.ts` |
| command | Code blocks or inline code with shell commands | `npm run build` |
| code_ref | Function, class, method references in inline code | `authenticate()` |
| import | Import/require statements in code blocks | import { Router } from 'express' |
| config | Configuration keys, env vars, settings | `MAX_RETRIES=3` |
| url | External links (http/https) | [docs](https://example.com) |
| architectural | Verb-anchored prose claims about technology, integrations, or architectural patterns | "Uses Redis for caching", "follows the actor model", "delegated to Auth0" |
| dependency | Package/library name claims | "Uses Redis for caching" |
| behavioral | Assertions about what code does | "The system retries 3 times" |
The first 7 types are extracted deterministically. For architectural, the script
uses verb-anchored regexes (rules such as uses X, built with X, follows the X pattern, delegated to X, via X, depends on X), which catch
prose claims that lack an inline-code anchor and previously slipped through.
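To make the verb-anchored approach concrete, here is a minimal sketch of how such rules could be written. The pattern set, the `ARCH_FRAMES` name, and the output keys are illustrative assumptions, not the actual contents of extract_claims.py, and the sketch covers only some of the frames listed above.

```python
import re

# Hypothetical verb frames. The verb is matched case-insensitively, but the
# target must start with a capital letter so ordinary prose is less likely to match.
ARCH_FRAMES = {
    "uses": re.compile(r"\b(?i:uses)\s+(?P<target>[A-Z]\w*)"),
    "built": re.compile(r"\b(?i:built\s+with)\s+(?P<target>[A-Z]\w*)"),
    "follows": re.compile(
        r"\b(?i:follows\s+the)\s+(?P<target>\w[\w-]*)\s+(?:pattern|model)\b"
    ),
    "delegated": re.compile(r"\b(?i:delegated\s+to)\s+(?P<target>[A-Z]\w*)"),
    "depends": re.compile(r"\b(?i:depends\s+on)\s+(?P<target>[A-Z]\w*)"),
}


def extract_architectural(text: str, source_file: str) -> list[dict]:
    """Return one claim per verb-frame match, tagged with file and line number."""
    claims = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for frame, pattern in ARCH_FRAMES.items():
            for match in pattern.finditer(line):
                claims.append(
                    {
                        "source_file": source_file,
                        "line": lineno,
                        "claim_type": "architectural",
                        "frame": frame,
                        "target": match.group("target"),
                        "text": match.group(0),
                    }
                )
    return claims
```

Anchoring on the verb keeps extraction deterministic: the same document always yields the same claims, and whether a claim is true is left to Phase 2.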
The last 2 (dependency, behavioral) require AI analysis and are handled in
Phase 2. behavioral in particular is not regex-extracted because behavioral
claims are free-form prose ("the cache invalidates when the user logs out")
that doesn't pattern-match cleanly — the behavioral verifier discovers and
verifies them in one pass.
Output: A structured list of claims with source file, line number, claim type, and the literal text of the claim.
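As a rough illustration of that output shape, the record below carries the four fields named above. The class name, field names, and JSON layout are assumptions made for this sketch; the real --json output may be organized differently.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Claim:
    """One extracted claim (field names are hypothetical)."""

    source_file: str  # markdown file the claim came from
    line: int         # 1-based line number in that file
    claim_type: str   # e.g. "file_path", "command", "architectural"
    text: str         # literal text of the claim


claims = [
    Claim("README.md", 12, "file_path", "src/auth/login.ts"),
    Claim("README.md", 30, "command", "npm run build"),
]

# Serialize the way a --json mode plausibly would.
payload = json.dumps([asdict(c) for c in claims], indent=2)
```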
Phase 2: Verify Claims
Step 2a — Automated verification
Run the verification script on the extracted claims:
python3 skills/doc-claim-validator/scripts/verify_claims.py [--json] [--root PATH] [--claims-file PATH] [--check-staleness] [--check-urls]
Pass --check-staleness to enable git-based drift analysis (see below).
The script checks each claim type differently:
| Claim type | Verification method | Pass condition |
|------------|-------------------|---------------|
| file_path | os.path.exists() | File exists at referenced path |
| command | shutil.which() + script check | Binary exists or script file exists |
| code_ref | grep -r for function/class name | Symbol found in codebase |
| import | Check module exists in project or deps | Module resolvable |
| config | Grep for config key in source | Key found in config files or code |
| url | HTTP HEAD request (optional, off by default) | Returns 2xx/3xx |
Pass --check-urls to enable URL verification (slow, requires network).
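The first two rows of the table can be sketched as follows. This is a simplified illustration using the standard-library calls the table names; the function names and the choice to inspect only the first token of a command are assumptions, not a description of verify_claims.py internals.

```python
import os
import shlex
import shutil


def verify_file_path(root: str, rel_path: str) -> bool:
    """Pass if the referenced path exists under the project root."""
    return os.path.exists(os.path.join(root, rel_path))


def verify_command(root: str, command: str) -> bool:
    """Pass if the first token is a binary on PATH or a script file in the project."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes in the documented command: treat as a failed claim.
        return False
    if not tokens:
        return False
    head = tokens[0]
    if shutil.which(head) is not None:
        return True
    return os.path.isfile(os.path.join(root, head))
```

Both checks only establish existence. A command that resolves can still fail at runtime, which is why passing claims may still warrant staleness scoring or AI review.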
Step 2b — AI-assisted verification
After the automated pass, dispatch agents to verify claims the script cannot.
Three of the four verifiers run on general-purpose + sonnet: behavioral,
architectural, and code-example verification all require multi-file reasoning,
which strains haiku's excerpt-read pattern. The dependency verifier stays
on Explore + haiku because it is pure pattern matching against manifest
files.
Dispatch strategy: per-docfile batching
For behavioral and architectural verifiers, dispatch one sonnet call per markdown file containing claims of that type, with all claims from that file batched into a single prompt. This focuses each call's context budget on a small set of related claims (cross-referencing within the doc improves verification) and ties the total call count to the number of docs rather than the number of claims. For a project with ~50 docs and ~150 architectural claims, expect ~10–20 sonnet calls (only docs with claims trigger calls), not 150.
For release audits where precision matters more than cost, run with per-claim dispatch — one sonnet call per claim, each with the full doc as context. Higher cost, higher precision.
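The difference between the two dispatch modes comes down to how claims are grouped before agents are launched. A minimal sketch, assuming claims are dicts with hypothetical `source_file` and `claim_type` keys:

```python
from collections import defaultdict


def batch_by_docfile(claims: list[dict], claim_type: str) -> dict[str, list[dict]]:
    """Group claims of one type by source file: one agent call per key."""
    batches: dict[str, list[dict]] = defaultdict(list)
    for claim in claims:
        if claim["claim_type"] == claim_type:
            batches[claim["source_file"]].append(claim)
    return dict(batches)


def per_claim_dispatch(claims: list[dict], claim_type: str) -> list[dict]:
    """Release-audit mode: one agent call per claim."""
    return [claim for claim in claims if claim["claim_type"] == claim_type]
```

With per-docfile batching the number of calls equals the number of keys in the returned dict, so files with no claims of that type never trigger a call.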
Verifiers
Verifier 1 — Dependency claim verifier (subagent_type: "Explore",
model: "haiku"):
Read package.json, requirements.txt, go.mod, Cargo.toml, or equivalent
dependency manifests. Cross-reference any doc claims about libraries,
frameworks, or services used. Report claims that reference dependencies not in
the project. Stays on haiku because pattern-matching against manifests doesn't
benefit from sonnet's reasoning.
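The cross-reference this verifier performs is mechanical enough to sketch in a few lines. The example below handles only package.json and requirements.txt and uses hypothetical function names; it illustrates the matching idea and is not the agent's actual procedure.

```python
import json
import re
from pathlib import Path


def declared_dependencies(root: str) -> set[str]:
    """Collect lower-cased dependency names from the manifests that exist."""
    names: set[str] = set()
    pkg = Path(root) / "package.json"
    if pkg.is_file():
        data = json.loads(pkg.read_text(encoding="utf-8"))
        for section in ("dependencies", "devDependencies"):
            names.update(name.lower() for name in data.get(section, {}))
    req = Path(root) / "requirements.txt"
    if req.is_file():
        for line in req.read_text(encoding="utf-8").splitlines():
            line = line.split("#", 1)[0].strip()  # drop comments
            if not line or line.startswith("-"):  # skip blanks and pip options
                continue
            match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
            if match:
                names.add(match.group(0).lower())
    return names


def missing_dependencies(claimed: list[str], root: str) -> list[str]:
    """Return claimed names that no manifest declares (dead_dependency candidates)."""
    declared = declared_dependencies(root)
    return [name for name in claimed if name.lower() not in declared]
```

Exact name matching will miss aliases (a doc that says "Postgres" when the manifest lists a driver package), so misses are best treated as candidates for review rather than confirmed failures.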
Verifier 2 — Behavioral claim verifier (subagent_type: "general-purpose",
model: "sonnet", per-docfile):
For each markdown file in scope, dispatch a sonnet agent with the file content.
The agent (a) discovers behavioral claims in the file ("retries 3 times",
"caches for 5 minutes", "validates input before processing", "the cache
invalidates when the user logs out"), (b) finds the relevant code via grep /
codanna / Read, (c) verifies whether the claim matches the implementation.
Report each claim with confirmed / contradicted / unverifiable / conditional
status. Sonnet is needed because behavioral verification often requires
tracing across multiple files (handler → middleware → config) and
distinguishing happy-path from error-path behavior.
Verifier 3 — Architectural claim verifier (subagent_type: "general-purpose", model: "sonnet", per-docfile):
For each markdown file with extracted architectural claims, dispatch a
sonnet agent with the file content and the list of pre-extracted claims. The
agent verifies each claim by:
- For uses/built/depends/via frames: check the named technology in dependency manifests, config files, and source imports.
- For delegated frames: check for SDK imports or HTTP integrations matching the named service.
- For follows/uses_pattern frames: check directory structure, class names, and code organization for the named architectural pattern (e.g., CQRS: separate command/query handlers + event store; hexagonal: adapters/ports dirs; saga: orchestrator class with named transitions).
Report each claim with confirmed / contradicted / unverifiable / conditional status. Sonnet is needed because architectural patterns aren't 1:1 with any single file — verification requires reading enough of the codebase to recognize the pattern.
Verifier 4 — Code example verifier (subagent_type: "general-purpose",
model: "sonnet", per-docfile):
For code blocks in docs that show usage examples, verify the function
signatures, parameter names, return types, and import paths match the current
codebase. Report examples that would fail if copy-pasted. Sonnet is needed
because signature checking requires reading the current implementation and
comparing — haiku's excerpt reads aren't sufficient.
Launch verifiers 1, 2, 3, 4 in parallel. Within verifiers 2/3/4, the per-docfile dispatches run sequentially (or in small parallel batches if cost permits).
Step 2c — Git staleness scoring
For claims that pass existence checks, compute a drift score to surface likely-stale claims:
python3 skills/doc-claim-validator/scripts/verify_claims.py --check-staleness
For each passing claim, the script:
- Gets the doc file's last git modification timestamp
- Gets each target file's last git modification timestamp
- Counts how many commits touched the target after the doc was last edited
- Assigns a drift score: low (1-3 commits), medium (4-9), high (10+)
High-drift claims are the best candidates for AI review — the target changed heavily but the doc didn't, so the doc is probably describing outdated behavior.
The staleness report is appended as a ranked table, sorted by score descending.
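The scoring steps can be approximated as below. Note two assumptions: the sketch counts commits with a commit range (`<doc commit>..HEAD`) instead of comparing timestamps, which is a substitute technique and not necessarily what the script does, and the "none" label for zero commits is invented here because the thresholds above start at 1.

```python
import subprocess


def drift_level(commits_since_doc_edit: int) -> str:
    """Map a commit count onto the low / medium / high buckets."""
    if commits_since_doc_edit >= 10:
        return "high"
    if commits_since_doc_edit >= 4:
        return "medium"
    if commits_since_doc_edit >= 1:
        return "low"
    return "none"  # assumption: the zero-commit case has no documented label


def _git(root: str, *args: str) -> str:
    """Run a git command inside the repository at root and return its stdout."""
    result = subprocess.run(
        ["git", "-C", root, *args], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def commits_since_doc_edit(root: str, doc_path: str, target_path: str) -> int:
    """Count commits touching target_path after the doc's most recent commit."""
    doc_commit = _git(root, "log", "-1", "--format=%H", "--", doc_path)
    if not doc_commit:
        return 0  # doc has no history yet, so there is nothing to compare against
    count = _git(root, "rev-list", "--count", f"{doc_commit}..HEAD", "--", target_path)
    return int(count)
```

Sorting claims by the raw commit count, descending, gives the ranked table; the bucket label is a convenience for triage.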
Phase 3: Report
Merge automated and AI findings into a single report. Classify each failed claim:
Severity
| Level | Meaning | Example |
|-------|---------|---------|
| P0 | User-facing doc claims something that would break if followed | Tutorial shows deleted API endpoint |
| P1 | Dev doc references nonexistent code construct | README references auth.validate() which was renamed |
| P2 | Behavioral claim no longer accurate | "Retries 3 times" but retry logic was removed |
| P3 | Dependency/import claim outdated | "Uses Express" but migrated to Fastify |
| P4 | Minor inaccuracy, cosmetic | Config key renamed but behavior unchanged |
Failure Categories
| Category | Description |
|----------|-------------|
| missing_target | Referenced file, function, or symbol doesn't exist |
| wrong_signature | Function exists but signature differs from doc |
| stale_behavior | Behavioral claim doesn't match implementation |
| dead_dependency | Doc references a dependency not in the project |
| phantom_pattern | Architectural claim ("uses CQRS", "follows actor model") not evidenced in the codebase |
| wrong_integration | Doc names a service/SDK ("delegated to Auth0") that isn't actually integrated |
| broken_example | Code example would fail if executed |
| dead_url | External link returns 4xx/5xx |
| phantom_config | Config option referenced in docs doesn't exist in code |
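A merged report needs a single record shape that both the automated pass and the AI verifiers can emit. The sketch below is one possible shape, using the severity levels and category names from the tables above; the `Finding` class and `merge_findings` function are hypothetical names.

```python
from dataclasses import dataclass

SEVERITIES = ("P0", "P1", "P2", "P3", "P4")
CATEGORIES = frozenset(
    {
        "missing_target", "wrong_signature", "stale_behavior",
        "dead_dependency", "phantom_pattern", "wrong_integration",
        "broken_example", "dead_url", "phantom_config",
    }
)


@dataclass(frozen=True)
class Finding:
    """One failed claim in the merged report."""

    claim_text: str
    severity: str
    category: str

    def __post_init__(self) -> None:
        # Reject values outside the documented scales early.
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity}")
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")


def merge_findings(automated: list[Finding], ai: list[Finding]) -> list[Finding]:
    """Combine both sources and order them most severe first (stable sort)."""
    return sorted([*automated, *ai], key=lambda f: SEVERITIES.index(f.severity))
```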
Phase 4: Remediate
For each failed claim, decide the action:
| Action | When | How |
|--------|------|-----|
| Update doc | Code is correct, doc is stale | Edit doc to match code |
| Flag for review | Unclear if code or doc is wrong | Create issue for human review |
| Remove claim | Referenced feature was deleted | Remove or rewrite section |
| Update example | Code example is outdated | Rewrite example against current code |
Route remediation to the appropriate agent per doc-maintenance conventions:
- reference-builder for API/CLI reference docs
- technical-writer for architecture and developer docs
- learning-guide for user-facing tutorials and guides
Integration with doc-maintenance
This skill is designed to run after doc-maintenance:
doc-maintenance → Structural health (links, orphans, folders, staleness)
doc-claim-validator → Semantic accuracy (do claims match reality?)
The two skills share the same severity scale and remediation agent routing. Results from both can be combined into a single documentation health report.
Anti-Patterns
- Do not auto-fix behavioral claims; they require human judgment about intent
- Do not treat every inline code reference as a file path (`true` is not a file)
- Do not validate claims in archived docs (docs/archive/); they're historical
- Do not fail on optional/conditional features; mark them as "conditional" instead
- Do not check URLs by default; it's slow and flaky, so keep it opt-in
- Do not validate code blocks marked with a <!-- no-verify --> comment
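Several of these rules amount to filters applied before or during extraction. The sketch below shows one way they could look. The helper names, the token denylist, the file-path heuristic, and the assumption that the no-verify comment sits on the line directly above the fence are all illustrative guesses, not documented behavior.

```python
import re
from pathlib import PurePosixPath

FENCE = "`" * 3
NON_PATH_TOKENS = {"true", "false", "null", "none"}  # hypothetical denylist


def is_archived(doc_path: str) -> bool:
    """Archived docs are historical and skipped entirely."""
    return PurePosixPath(doc_path).parts[:2] == ("docs", "archive")


def looks_like_file_path(token: str) -> bool:
    """Heuristic: a path has a slash or a short file extension."""
    if token.lower() in NON_PATH_TOKENS:
        return False
    return "/" in token or re.search(r"\.[A-Za-z0-9]{1,5}$", token) is not None


def verifiable_code_blocks(markdown: str) -> list[str]:
    """Return fenced block bodies, skipping blocks preceded by a no-verify comment."""
    blocks, lines = [], markdown.splitlines()
    skip_next, i = False, 0
    while i < len(lines):
        stripped = lines[i].strip()
        if stripped == "<!-- no-verify -->":
            skip_next = True
        elif stripped.startswith(FENCE):
            body = []
            i += 1
            while i < len(lines) and not lines[i].strip().startswith(FENCE):
                body.append(lines[i])
                i += 1
            if not skip_next:
                blocks.append("\n".join(body))
            skip_next = False
        elif stripped:
            skip_next = False  # other content cancels a dangling marker
        i += 1
    return blocks
```

The file-path heuristic will still produce false positives (a version string such as `v1.2` ends in something extension-like), so a failed file_path check on an odd-looking token deserves a second look before it is reported.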
Bundled Resources
Scripts
- scripts/extract_claims.py: Deterministic claim extraction from markdown files
- scripts/verify_claims.py: Automated verification of extracted claims against codebase
References
- references/claim-taxonomy.md: Full taxonomy of claim types with extraction patterns and examples