Agent Skills: Verification Techniques

Hypothesis testing, root cause analysis, and debugging techniques. Use when debugging, testing hypotheses, validating solutions, proving correctness, or performing root cause analysis on failures.

Skill ID: tianjianjiang/smith/smith-validation

Install this agent skill to your local machine:

pnpm dlx add-skill https://github.com/tianjianjiang/smith/tree/HEAD/smith-validation


smith-validation/SKILL.md

Skill Metadata

Name: smith-validation
Description: Hypothesis testing, root cause analysis, and debugging techniques. Use when debugging, testing hypotheses, validating solutions, proving correctness, or performing root cause analysis on failures.

Verification Techniques

<metadata>
  • Scope: Hypothesis testing, root cause analysis, and verification
  • Load if: Bug reported, test failure, proving correctness, root cause analysis
  • Prerequisites: @smith-guidance/SKILL.md
</metadata>

<context>

Foundation: Based on the Study phase of Deming's PDSA cycle and Popper's falsification principle. The goal is to understand WHY something works or doesn't, not just IF it works.

When to use: Debugging, testing hypotheses, validating solutions, proving correctness.

</context>

Hypothesis Testing

<context>

Strong Inference

Rapid progress through multiple competing hypotheses:

  1. Devise multiple hypotheses - Not just one, but several alternatives
  2. Design crucial experiments - Tests that exclude one or more hypotheses
  3. Execute experiments - Run tests to eliminate hypotheses
  4. Iterate - Refine remaining hypotheses, repeat

Key insight: Science advances fastest when we actively try to disprove hypotheses, not confirm them.

For debugging:

  • Bug: "Login fails intermittently"
  • H1: Session storage full
  • H2: Race condition in token refresh
  • H3: Network timeout on auth server
  • Crucial test: Check if failures correlate with session count (tests H1)
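
If failures do not correlate with session count, H1 is eliminated and attention shifts to H2 and H3. A minimal sketch of this crucial experiment, where get_session_count() and attempt_login() are hypothetical stand-ins for the system under investigation:

# Crucial experiment for H1 (session storage full).
# get_session_count() and attempt_login() are hypothetical stand-ins.
def crucial_test_h1(trials: int = 100) -> None:
    observations = [(get_session_count(), attempt_login("test-user"))
                    for _ in range(trials)]
    failed = [count for count, ok in observations if not ok]
    passed = [count for count, ok in observations if ok]
    # Failures clustering at high session counts keep H1 alive;
    # no correlation eliminates H1 and shifts attention to H2/H3.
    print("mean sessions on failure:", sum(failed) / max(len(failed), 1))
    print("mean sessions on success:", sum(passed) / max(len(passed), 1))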

Falsification Principle (Popper)

A theory is scientific only if it can be proven false:

  • Design tests that could disprove your hypothesis
  • Seek evidence that contradicts, not confirms
  • One counterexample disproves a universal claim

Anti-pattern: Only running tests you expect to pass.
Good practice: Actively try to break your own code.
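
Property-based testing operationalizes falsification: state a universal claim and let the tool hunt for a counterexample. A minimal sketch using the Python hypothesis library, with slugify as a hypothetical function under test:

from hypothesis import given, strategies as st

def slugify(title: str) -> str:
    # Hypothetical function under test.
    return title.lower().strip().replace(" ", "-")

@given(st.text())
def test_slug_is_never_empty(title: str) -> None:
    # Universal claim: every title yields a non-empty slug.
    # hypothesis immediately finds the counterexample "" and falsifies it.
    assert slugify(title) != ""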

</context>

Anti-Workaround Policy

<forbidden>
  • NEVER add # noqa, // NOLINT, or similar inline suppressions without meeting the exception criteria below
  • NEVER increase timeouts without diagnosing root cause
  • NEVER use _ prefix to suppress unused-variable warnings without removing the actual dead code
  • NEVER disable warnings without documented justification
</forbidden>

<required>

When lint or test failures occur:

  1. Apply 5 Whys to find root cause first
  2. Fix the underlying issue, not the symptom
  3. Suppressions allowed ONLY when all criteria are met:
    • Reason (at least one):
      • External library false positive (document which)
      • Verified false positive (document why)
      • Explicit user approval (cite the approval)
    • Mechanism: prefer tool config (ruff.toml, .flake8) for repo-wide patterns; inline comments only for isolated cases with reason on the same line
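
Example of an inline suppression that meets the criteria (verified false positive, reason on the same line; the module and name are hypothetical):

# Re-export for the public API; the linter cannot see external usage.
from ._internal import helper  # noqa: F401 -- verified false positive: re-exported name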

Timeout changes require:

  • Profiling evidence showing actual duration
  • Diagnosis of why the operation is slow
  • User approval before increasing
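
A minimal sketch for collecting that profiling evidence, with the operation under suspicion passed in as a callable:

import time

def sample_duration(operation, repeats: int = 10):
    """Collect wall-clock samples before touching any timeout."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        operation()
        samples.append(time.perf_counter() - start)
    # min/mean/max reveal whether the operation is uniformly slow
    # or only occasionally degraded (a different root cause).
    return min(samples), sum(samples) / len(samples), max(samples)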
</required>

Root Cause Analysis

<context>

5 Whys (Toyota)

Root cause analysis through iterative questioning:

  1. State the problem
  2. Ask "Why did this happen?"
  3. Repeat for each answer (typically 5 times)
  4. Stop when you reach an actionable root cause

Example:

  • Bug: Users logged out unexpectedly
  • Why? Session expired
  • Why? Token refresh failed
  • Why? Refresh endpoint returned 401
  • Why? Clock skew between servers
  • Root cause: NTP not configured on auth server

Caution: Don't stop at symptoms. "Why?" should reach systemic causes.

</context>

Explanation Techniques

<context>

Rubber Duck Debugging

Explain code line-by-line aloud; when explanation doesn't match code, you've found the bug.

For AI agents: When stuck, explain the problem step-by-step before proposing solutions.

Feynman Technique

Explain simply to reveal gaps: Choose concept → Explain to child → Identify gaps → Review.

If you can't explain it simply, you don't understand it well enough.

</context>

Systematic Isolation

<context>

Delta Debugging

Minimize failing input: split in half, test each, recurse on failing half until minimal.

Use when: Large input crashes, many files break tests, config changes fail.
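
A minimal sketch of the halving step, assuming a fails(candidate) predicate that reruns the test; full ddmin also tries finer-grained subsets when neither half fails alone:

def minimize(failing_input, fails):
    """Shrink a failing input while it still reproduces the failure."""
    while len(failing_input) > 1:
        half = len(failing_input) // 2
        left, right = failing_input[:half], failing_input[half:]
        if fails(left):
            failing_input = left    # failure lives in the first half
        elif fails(right):
            failing_input = right   # failure lives in the second half
        else:
            break  # both halves pass alone; full ddmin would try finer subsets
    return failing_input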

Scientific Debugging (TRAFFIC)

Track → Reproduce → Automate → Find origins → Focus → Isolate → Correct

Work backward: Failure → Propagation → Infection → Defect.

</context>

Version Control Debugging

<context>

Git Bisect

Binary search through commit history:

Usage:

git bisect start
git bisect bad                # mark the current checkout as exhibiting the bug
git bisect good abc1234       # mark the last known-good commit
git bisect good               # (or bad) after testing each checkout git proposes
git bisect reset              # return to the original HEAD when finished

Mark the current checkout as bad and a known-good commit as good, then test each checkout git proposes, marking it good or bad, until the first bad commit is identified.

Automated:

git bisect run ./test.sh

Exit codes: 0 = good, 125 = skip this commit, any other value in 1-127 = bad
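
A hypothetical wrapper script mapping pytest outcomes onto these exit codes (pytest exits with 5 when no tests are collected, e.g. in commits where the test file does not yet exist; the test path is an example):

# test_bug.py -- invoked as: git bisect run python test_bug.py
import subprocess
import sys

result = subprocess.run([sys.executable, "-m", "pytest", "tests/test_login.py", "-q"])
if result.returncode == 5:  # no tests collected: test file absent in this commit
    sys.exit(125)           # tell bisect to skip this commit
sys.exit(0 if result.returncode == 0 else 1)  # 0 = good, 1 = bad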

Complexity: O(log n) - tests ~7 commits for a 100-commit range

When to use:

  • Regression appeared, unknown when
  • Automated test can detect the bug
  • Need to find exact commit that broke something
</context>

Coverage-Based Localization

<context>

Spectrum-Based Fault Localization (SBFL)

Use test coverage data to locate bugs:

Concept: Statements executed by failing tests but not by passing tests are more suspicious.

Ochiai Formula (most effective):

suspiciousness(s) = failed(s) / sqrt(total_failed * (failed(s) + passed(s)))

Practical application:

  1. Run test suite with coverage
  2. Note which tests fail
  3. Rank statements by how often they appear in failing vs passing tests
  4. Inspect highest-ranked statements first

For AI agents: When multiple tests fail, identify code paths common to failures but not successes.
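
A minimal sketch of the ranking, assuming per-test coverage is available as sets of executed statement IDs:

from math import sqrt

def ochiai_ranking(coverage: dict, failing: set) -> list:
    """coverage maps test name -> set of executed statement IDs."""
    total_failed = len(failing)
    statements = set().union(*coverage.values())
    scores = {}
    for s in statements:
        failed_s = sum(1 for t in failing if s in coverage[t])
        passed_s = sum(1 for t in coverage if t not in failing and s in coverage[t])
        denom = sqrt(total_failed * (failed_s + passed_s))
        scores[s] = failed_s / denom if denom else 0.0
    # Inspect the most suspicious statements first.
    return sorted(scores, key=scores.get, reverse=True)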

</context>

ACTION (Recency Zone)

<required>

When debugging or validating:

  1. Use Strong Inference: devise multiple hypotheses before testing
  2. Apply 5 Whys to find root cause, not symptoms
  3. Use Git Bisect for regressions (binary search tests ~7 commits for a 100-commit range)
  4. Run tests with coverage; inspect code paths common to failures
</required>

Claude Code Plugin Integration

<context>

When pr-review-toolkit is available:

  • silent-failure-hunter agent: Detects silent failures, inadequate error handling
  • Analyzes catch blocks, fallback behavior, missing logging
  • Trigger: "Check for silent failures" or use Task tool
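
The kind of pattern such an agent flags, in minimal hypothetical form:

try:
    sync_user_profile(user)  # hypothetical operation that can fail
except Exception:
    pass  # silent failure: error swallowed, nothing logged, caller assumes success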
</context>

Ralph Loop Integration

<context>

Debugging = Ralph iteration: hypothesis → test → eliminate → iterate until <promise>ROOT CAUSE FOUND</promise>.

See @smith-ralph/SKILL.md for full patterns.

</context>

<related>
  • @smith-guidance/SKILL.md - Anti-sycophancy, HHH framework, exploration workflow
  • @smith-analysis/SKILL.md - Reasoning patterns, problem decomposition
  • @smith-clarity/SKILL.md - Cognitive guards, logic fallacies
</related>