Agent Skills: Hypothesis-Driven Debugging

Structured code debugging through hypothesis formation and falsification planning. Use when diagnosing bugs, unexpected behaviour, or system failures where the root cause is unclear. Produces a hypothesis document for execution by another agent rather than performing the investigation directly. Triggers on requests to debug issues, diagnose problems, investigate failures, or create debugging plans.

ID: leynos/agent-helper-scripts/hypothesis-debugging

Install this agent skill to your local environment:

pnpm dlx add-skill https://github.com/leynos/agent-helper-scripts/tree/HEAD/skills/hypothesis-debugging

Skill Files

skills/hypothesis-debugging/SKILL.md

Hypothesis-Driven Debugging

Generate a structured debugging document that identifies candidate root causes and provides falsification plans for each. The output document instructs a separate execution agent; do not perform the investigation yourself.

Philosophical Foundation

Apply Popperian falsificationism: hypotheses cannot be proven true, only disproven. Design tests that could definitively rule out each hypothesis rather than confirm it. A good falsification test produces a clear negative result if the hypothesis is wrong.
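For instance, a single hypothesis-falsification pair might read as follows (the scenario is invented for illustration):

```markdown
**Hypothesis:** The session cache serves stale entries after a deploy.

**Falsification test:** Flush the cache, then replay the failing request.
If the failure still occurs against a cold cache, staleness is ruled out.
```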

Process

1. Gather Context

Before forming hypotheses, collect:

  • Symptom description: What behaviour is observed vs expected?
  • Reproduction conditions: When does it occur? Intermittent or consistent?
  • Recent changes: Deployments, configuration changes, dependency updates
  • Error artefacts: Stack traces, logs, error messages, screenshots
  • Environmental factors: OS, runtime versions, network conditions

If information is missing, note gaps in the output document.
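A context section in the output document might record these items, gaps included, like this (all field values are invented for illustration):

```markdown
## Context

- **Symptom:** Checkout returns HTTP 500; a 303 redirect is expected.
- **Reproduction:** Intermittent; roughly 1 in 20 requests under load.
- **Recent changes:** payments-service deployed two hours before the first report.
- **Error artefacts:** Stack trace terminates in `OrderSerializer`.
- **Environment:** Production only; staging does not reproduce.
- **Gaps:** Runtime version unconfirmed; no access to load-balancer logs.
```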

2. Form Hypotheses

Generate 1–5 hypotheses ranked by plausibility. Each hypothesis must be:

  • Specific: Name the component, function, or interaction suspected
  • Falsifiable: A concrete test could disprove it
  • Independent: Falsifying one should not automatically falsify others

Common hypothesis categories:

| Category | Examples |
|----------|----------|
| State | Race condition, stale cache, corrupted data |
| Input | Malformed payload, encoding issue, boundary case |
| Environment | Missing dependency, version mismatch, resource exhaustion |
| Logic | Off-by-one, incorrect predicate, missing null check |
| Integration | API contract violation, timeout, auth failure |

Avoid vague hypotheses ("something wrong with the database"). Pin down the specific failure mode.
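As a contrast, compare a vague hypothesis with a pinned-down one (both invented for illustration):

```markdown
- Vague: "Something is wrong with the database."
- Specific: "The `orders` read replica lags behind the primary, so
  `get_order()` returns rows that predate the most recent write."
```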

3. Design Falsification Plans

For each hypothesis, specify the following (a worked entry follows the two lists below):

  1. Prediction: If this hypothesis is correct, what observable outcome follows?
  2. Falsification test: What action would produce a contradicting observation?
  3. Expected negative result: What outcome would disprove the hypothesis?
  4. Tooling required: Commands, scripts, or instrumentation needed
  5. Confidence impact: How decisively would a negative result rule this out?

Prefer tests that are:

  • Quick to execute
  • Minimally invasive
  • Deterministic rather than probabilistic
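Putting the five fields together, a single plan entry might read as follows (the scenario, numbers, and tooling are invented for illustration):

```markdown
### Hypothesis 2: Connection pool exhaustion under load

1. **Prediction:** Failures cluster when concurrent requests exceed the
   pool size (assumed to be 10 here).
2. **Falsification test:** Replay the failing traffic with the pool size
   doubled.
3. **Expected negative result:** Failures persist at the same rate despite
   the larger pool.
4. **Tooling required:** Load-replay script; access to pool configuration.
5. **Confidence impact:** High; an unchanged failure rate with a doubled
   pool would decisively rule this out.
```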

4. Output Document

Generate a Markdown document following the template in `assets/debugging-plan.md`. Save it to the working directory as `debugging-plan-{timestamp}.md`.
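The template shipped with the skill is authoritative; purely as orientation, a generated plan might be laid out along these lines (the headings below are an assumption, not the canonical template):

```markdown
# Debugging Plan: {symptom summary}

## Context
<!-- symptoms, reproduction, recent changes, artefacts, environment, gaps -->

## Hypotheses

### Hypothesis 1: {specific failure mode}
<!-- prediction, falsification test, expected negative result,
     tooling required, confidence impact -->

## Execution Notes
<!-- test ordering, stop criteria -->
```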

Quality Criteria

A well-formed debugging plan exhibits:

  • Mutual exclusivity: Hypotheses are distinct enough that falsifying some leaves the others standing
  • Collective exhaustiveness: Hypotheses cover the likely failure space
  • Ordered efficiency: Cheapest decisive tests appear first
  • Clear success criteria: The executing agent knows when to stop

Anti-Patterns

  • Confirmation bias: Designing tests that can only succeed, not fail
  • Hypothesis creep: Adding new hypotheses mid-execution instead of returning to the plan for revision
  • Coupling: Tests that cannot isolate individual hypotheses
  • Vagueness: "Check the logs" without specifying which pattern would falsify the hypothesis (see the contrast below)
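For the last point, contrast an unfalsifiable instruction with one that names the disconfirming pattern (the log message and window are invented for illustration):

```markdown
- Vague: "Check the logs."
- Falsifiable: "Search the gateway logs for `upstream timed out` within
  five minutes of each failure; zero matches disproves the timeout
  hypothesis."
```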

References

  • `references/examples.md`: Worked examples of hypothesis-falsification pairs across common debugging scenarios (API timeouts, flaky tests, memory leaks)