Verification Techniques
<metadata>
- Scope: Hypothesis testing, root cause analysis, and verification
- Load if: Bug reported, test failure, proving correctness, root cause analysis
- Prerequisites: @smith-guidance/SKILL.md
</metadata>
Foundation: Based on PDSA's Study phase (Deming) and Popper's falsification principle - understanding WHY something works or doesn't, not just IF it works.
When to use: Debugging, testing hypotheses, validating solutions, proving correctness.
Hypothesis Testing
<context>Strong Inference
Rapid progress through multiple competing hypotheses:
- Devise multiple hypotheses - Not just one, but several alternatives
- Design crucial experiments - Tests that exclude one or more hypotheses
- Execute experiments - Run tests to eliminate hypotheses
- Iterate - Refine remaining hypotheses, repeat
Key insight: Science advances fastest when we actively try to disprove hypotheses, not confirm them.
For debugging:
- Bug: "Login fails intermittently"
- H1: Session storage full
- H2: Race condition in token refresh
- H3: Network timeout on auth server
- Crucial test: Check if failures correlate with session count (tests H1)
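The elimination loop above can be sketched in a few lines; the hypotheses, their predictions, and the observed outcome below are illustrative, not real measurements:

```python
def eliminate(hypotheses, observation):
    """Keep only hypotheses whose prediction matches the crucial experiment."""
    return [name for name, predicts in hypotheses if predicts == observation]

# Each hypothesis predicts whether failures correlate with session count.
hypotheses = [
    ("H1: session storage full", True),    # full storage -> correlation expected
    ("H2: race in token refresh", False),  # timing bug -> no correlation
    ("H3: auth server timeout", False),    # network issue -> no correlation
]

observed_correlation = False  # hypothetical result of the crucial test
remaining = eliminate(hypotheses, observed_correlation)
# H1 is excluded; the next iteration designs a crucial test separating H2 from H3.
```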
Falsification Principle (Popper)
A theory is scientific only if it can be proven false:
- Design tests that could disprove your hypothesis
- Seek evidence that contradicts, not confirms
- One counterexample disproves a universal claim
Anti-pattern: Only running tests you expect to pass.
Good practice: Actively try to break your own code.
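A minimal sketch of falsification in test form: search for a counterexample instead of confirming expected cases. The two claims below are illustrative:

```python
def find_counterexample(claim, candidates):
    """Return the first input that falsifies a universal claim, or None if none is found."""
    for x in candidates:
        if not claim(x):
            return x
    return None

# Survives falsification (for these inputs): reversing a string twice is the identity.
assert find_counterexample(lambda s: s[::-1][::-1] == s, ["", "ab", "race"]) is None

# One counterexample disproves the universal claim "upper() never changes length":
# 'ß'.upper() is 'SS'.
assert find_counterexample(lambda s: len(s.upper()) == len(s), ["a", "ß", "ok"]) == "ß"
```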
</context>
Anti-Workaround Policy
<forbidden>
- NEVER add # noqa, // NOLINT, or similar inline suppressions without meeting the exception criteria below
- NEVER increase timeouts without diagnosing the root cause
- NEVER use a _ prefix to suppress unused-variable warnings without removing the actual dead code
- NEVER disable warnings without documented justification
When lint or test failures occur:
- Apply 5 Whys to find root cause first
- Fix the underlying issue, not the symptom
- Suppressions allowed ONLY when all criteria are met:
  - Reason (at least one):
    - External library false positive (document which)
    - Verified false positive (document why)
    - Explicit user approval (cite the approval)
  - Mechanism: prefer tool config (ruff.toml, .flake8) for repo-wide patterns; inline comments only for isolated cases, with the reason on the same line
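A sketch of the preferred mechanism: a repo-wide suppression in tool config with the justification recorded next to it. The file glob and rule code below are hypothetical examples, not a recommendation for this repository:

```toml
# ruff.toml
[lint.per-file-ignores]
# Generated code: the upstream template controls line length (verified false positive)
"generated/*.py" = ["E501"]
```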
Timeout changes require:
- Profiling evidence showing the actual duration
- A diagnosis of why the operation is slow
- User approval before increasing
</forbidden>
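Gathering the profiling evidence can be as simple as timing the operation repeatedly before touching any timeout; a minimal sketch (the measured operation here is a placeholder):

```python
import time

def measure(op, runs=20):
    """Collect duration evidence for an operation before changing its timeout."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        op()
        durations.append(time.perf_counter() - start)
    durations.sort()
    return {"p50": durations[len(durations) // 2], "max": durations[-1]}

stats = measure(lambda: sum(range(1000)))
# If "max" is close to the current timeout, diagnose why before raising anything.
```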
Root Cause Analysis
<context>5 Whys (Toyota)
Root cause analysis through iterative questioning:
- State the problem
- Ask "Why did this happen?"
- Repeat for each answer (typically 5 times)
- Stop when you reach an actionable root cause
Example:
- Bug: Users logged out unexpectedly
- Why? Session expired
- Why? Token refresh failed
- Why? Refresh endpoint returned 401
- Why? Clock skew between servers
- Root cause: NTP not configured on auth server
Caution: Don't stop at symptoms. "Why?" should reach systemic causes.
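Recording the chain keeps the analysis honest about where it stopped; a trivial sketch using the example above:

```python
def why_chain(problem, answers):
    """Record a 5 Whys chain; the final answer is the candidate root cause."""
    chain = [f"Problem: {problem}"]
    chain += [f"Why? {a}" for a in answers]
    chain.append(f"Root cause: {answers[-1]}")
    return chain

chain = why_chain(
    "Users logged out unexpectedly",
    ["Session expired", "Token refresh failed", "Refresh endpoint returned 401",
     "Clock skew between servers", "NTP not configured on auth server"],
)
```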
</context>
Explanation Techniques
<context>Rubber Duck Debugging
Explain code line-by-line aloud; when explanation doesn't match code, you've found the bug.
For AI agents: When stuck, explain the problem step-by-step before proposing solutions.
Feynman Technique
Explain simply to reveal gaps: Choose concept → Explain to child → Identify gaps → Review.
If you can't explain it simply, you don't understand it well enough.
</context>
Systematic Isolation
<context>Delta Debugging
Minimize failing input: split in half, test each, recurse on failing half until minimal.
Use when: Large input crashes, many files break tests, config changes fail.
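A simplified sketch of the halving step. Real delta debugging (ddmin) also tries complements and finer-grained splits; this keeps only the recursion on a failing half:

```python
def minimize(failing_input, fails):
    """Shrink a failing input by repeated halving while it still fails."""
    assert fails(failing_input), "start from a known-failing input"
    current = failing_input
    while len(current) > 1:
        mid = len(current) // 2
        for half in (current[:mid], current[mid:]):
            if fails(half):
                current = half  # recurse on the failing half
                break
        else:
            break  # neither half fails on its own; stop at this size
    return current

# Example: the failure is triggered by the character 'X' anywhere in the input.
assert minimize("aaaXbbb", lambda s: "X" in s) == "X"
```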
Scientific Debugging (TRAFFIC)
Track → Reproduce → Automate → Find origins → Focus → Isolate → Correct
Work backward: Failure → Propagation → Infection → Defect.
</context>
Version Control Debugging
<context>Git Bisect
Binary search through commit history:
Usage:
git bisect start
git bisect bad
git bisect good abc1234
git bisect good
git bisect reset
Mark the current commit as bad and a known-good commit as good, then test each checkout git presents, marking it good or bad, until the culprit is found.
Automated:
git bisect run ./test.sh
Exit codes: 0 = good, 1-127 = bad, 125 = skip
Complexity: O(log n) - tests ~7 commits for a 100-commit range
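A `git bisect run` script only needs to map each checkout's outcome onto that exit-code protocol. The decision logic can be sketched as follows; the wiring around it (which build and test commands to run) is project-specific:

```python
def bisect_exit_code(build_ok: bool, tests_pass: bool) -> int:
    """Map a checkout's outcome to the git bisect run exit-code protocol."""
    if not build_ok:
        return 125  # skip: this commit cannot be evaluated at all
    return 0 if tests_pass else 1  # 0 = good, 1 = bad

# In a real script: raise SystemExit(bisect_exit_code(build_ok, tests_pass))
```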
When to use:
- Regression appeared, unknown when
- Automated test can detect the bug
- Need to find exact commit that broke something
</context>
Coverage-Based Localization
<context>Spectrum-Based Fault Localization (SBFL)
Use test coverage data to locate bugs:
Concept: Statements executed by failing tests but not passing tests are more suspicious.
Ochiai Formula (most effective):
suspiciousness(s) = failed(s) / sqrt(total_failed * (failed(s) + passed(s)))
Practical application:
- Run test suite with coverage
- Note which tests fail
- Rank statements by how often they appear in failing vs passing tests
- Inspect highest-ranked statements first
For AI agents: When multiple tests fail, identify code paths common to failures but not successes.
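The Ochiai formula above translates directly into code; the coverage counts in the example are illustrative:

```python
import math

def ochiai(failed_s, passed_s, total_failed):
    """Suspiciousness of statement s from failing vs passing test coverage."""
    denom = math.sqrt(total_failed * (failed_s + passed_s))
    return failed_s / denom if denom else 0.0

# 2 failing tests total; statement A is covered by both failures and no passes.
assert ochiai(failed_s=2, passed_s=0, total_failed=2) == 1.0
# Statement B is covered by 1 failure and 3 passes: less suspicious, inspect later.
assert ochiai(1, 3, 2) < ochiai(2, 0, 2)
```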
</context>
ACTION (Recency Zone)
<required>When debugging or validating:
- Use Strong Inference: devise multiple hypotheses before testing
- Apply 5 Whys to find root cause, not symptoms
- Use Git Bisect for regressions (binary search ~7 commits for 100-commit range)
- Run tests with coverage; inspect code paths common to failures
</required>
Claude Code Plugin Integration
<context>When pr-review-toolkit is available:
- silent-failure-hunter agent: Detects silent failures and inadequate error handling
  - Analyzes catch blocks, fallback behavior, missing logging
  - Trigger: "Check for silent failures" or use the Task tool
</context>
Ralph Loop Integration
<context>Debugging = Ralph iteration: hypothesis → test → eliminate → iterate until <promise>ROOT CAUSE FOUND</promise>.
See @smith-ralph/SKILL.md for full patterns.
- @smith-guidance/SKILL.md - Anti-sycophancy, HHH framework, exploration workflow
- @smith-analysis/SKILL.md - Reasoning patterns, problem decomposition
- @smith-clarity/SKILL.md - Cognitive guards, logic fallacies
</context>