Hypothesis-Driven Debugging
Generate a structured debugging document that identifies candidate root causes and provides falsification plans for each. The output document instructs a separate execution agent; do not perform the investigation yourself.
Philosophical Foundation
Apply Popperian falsificationism: hypotheses cannot be proven true, only disproven. Design tests that could definitively rule out each hypothesis rather than confirm it. A good falsification test produces a clear negative result if the hypothesis is wrong.
Process
1. Gather Context
Before forming hypotheses, collect:
- Symptom description: What behaviour is observed vs expected?
- Reproduction conditions: When does it occur? Intermittent or consistent?
- Recent changes: Deployments, configuration changes, dependency updates
- Error artefacts: Stack traces, logs, error messages, screenshots
- Environmental factors: OS, runtime versions, network conditions
If information is missing, note gaps in the output document.
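One way to make the gap-noting step mechanical is to hold the five context items in a record and list whichever are still empty. The sketch below is illustrative only. The class name `DebugContext`, its field names, and the sample values are hypothetical, and each field is a free-text string.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DebugContext:
    """Context collected before forming hypotheses (hypothetical field names)."""
    symptom: Optional[str] = None          # observed vs expected behaviour
    reproduction: Optional[str] = None     # when it occurs; intermittent or consistent
    recent_changes: Optional[str] = None   # deployments, config changes, dependency updates
    error_artefacts: Optional[str] = None  # stack traces, logs, error messages, screenshots
    environment: Optional[str] = None      # OS, runtime versions, network conditions

    def gaps(self) -> list[str]:
        """Return names of fields that are missing or blank, in declaration order."""
        return [
            f.name
            for f in fields(self)
            if not (getattr(self, f.name) or "").strip()
        ]


ctx = DebugContext(
    symptom="Checkout returns HTTP 500; expected 200",
    reproduction="Intermittent, roughly 1 in 20 requests",
)
print(ctx.gaps())  # → ['recent_changes', 'error_artefacts', 'environment']
```

The names returned by `gaps()` can be copied directly into the output document's list of missing information.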
2. Form Hypotheses
Generate 1–5 hypotheses ranked by plausibility. Each hypothesis must be:
- Specific: Name the component, function, or interaction suspected
- Falsifiable: A concrete test could disprove it
- Independent: Falsifying one should not automatically falsify others
Common hypothesis categories:
| Category | Examples |
|----------|----------|
| State | Race condition, stale cache, corrupted data |
| Input | Malformed payload, encoding issue, boundary case |
| Environment | Missing dependency, version mismatch, resource exhaustion |
| Logic | Off-by-one, incorrect predicate, missing null check |
| Integration | API contract violation, timeout, auth failure |
Avoid vague hypotheses ("something wrong with the database"). Pin down the specific failure mode.
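The structural rules for a hypothesis set can be checked automatically: the 1–5 count, the plausibility ranking, a named component, and a category from the table. The sketch below rests on a few assumptions. Plausibility is a subjective score from 0.0 to 1.0, and the names `Hypothesis`, `Category`, and `validate` are hypothetical. Falsifiability and independence still require human judgement, so the code does not attempt to check them.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    # Mirrors the category table above.
    STATE = "State"
    INPUT = "Input"
    ENVIRONMENT = "Environment"
    LOGIC = "Logic"
    INTEGRATION = "Integration"


@dataclass(frozen=True)
class Hypothesis:
    statement: str       # the specific failure mode
    component: str       # component, function, or interaction suspected
    category: Category
    plausibility: float  # assumed subjective score, 0.0 to 1.0


def validate(hypotheses: list[Hypothesis]) -> list[str]:
    """Return the problems found; an empty list means these checks pass."""
    problems = []
    if not 1 <= len(hypotheses) <= 5:
        problems.append(f"expected 1-5 hypotheses, got {len(hypotheses)}")
    scores = [h.plausibility for h in hypotheses]
    if scores != sorted(scores, reverse=True):
        problems.append("hypotheses are not ranked by descending plausibility")
    for h in hypotheses:
        if not h.component.strip():
            # Vague hypotheses usually fail here: they name no component.
            problems.append(f"no component named for: {h.statement!r}")
    return problems


hs = [
    Hypothesis("Session cache serves stale tokens after rotation",
               "SessionCache.get", Category.STATE, 0.6),
    Hypothesis("Database", "", Category.ENVIRONMENT, 0.3),
]
print(validate(hs))  # → ["no component named for: 'Database'"]
```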
3. Design Falsification Plans
For each hypothesis, specify:
- Prediction: If this hypothesis is correct, what observable outcome follows?
- Falsification test: What action would produce a contradicting observation?
- Expected negative result: What outcome would disprove the hypothesis?
- Tooling required: Commands, scripts, or instrumentation needed
- Confidence impact: How decisively would a negative result rule this out?
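The five plan fields can be written as a record in which the falsification test is executable. The sketch below makes an encoding choice: the test is a callable that returns an observation, and a predicate decides whether that observation contradicts the prediction. The Popperian asymmetry appears in the return value. A run either falsifies the hypothesis or the hypothesis "survived"; a run never proves it. `FalsificationPlan`, the "decisive" label, and the function under suspicion, `page_slice`, are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class FalsificationPlan:
    hypothesis: str
    prediction: str                     # observable outcome if the hypothesis is correct
    test: Callable[[], Any]             # falsification test; returns an observation
    is_negative: Callable[[Any], bool]  # True if the observation contradicts the prediction
    tooling: list[str]                  # commands, scripts, or instrumentation needed
    confidence_impact: str              # assumed labels, e.g. "decisive" or "partial"

    def run(self) -> str:
        observation = self.test()
        # A negative result rules the hypothesis out; otherwise it merely survives.
        return "falsified" if self.is_negative(observation) else "survived"


def page_slice(items: list, page: int, size: int) -> list:
    # Hypothetical function under suspicion, standing in for real code.
    return items[page * size:(page + 1) * size]


plan = FalsificationPlan(
    hypothesis="page_slice drops the last item of each page (off-by-one)",
    prediction="a full page of size 3 returns only 2 items",
    test=lambda: len(page_slice(list(range(10)), page=0, size=3)),
    is_negative=lambda n: n == 3,
    tooling=["python REPL"],
    confidence_impact="decisive",
)
print(plan.run())  # → falsified
```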
Prefer tests that are:
- Quick to execute
- Minimally invasive
- Deterministic rather than probabilistic
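A race-condition hypothesis shows the preference for deterministic tests clearly. Running an intermittent reproduction many times and hoping for a failure is probabilistic. A more decisive approach forces the suspected interleaving with synchronisation primitives. The sketch below assumes a hypothetical `Counter` with a non-atomic read-modify-write, plus a test-only `pause` hook. Neither comes from any real codebase. If the hypothesis is correct, one update is lost (the result is 1). A result of 2 under the forced interleaving would falsify it.

```python
import threading
from typing import Callable, Optional


class Counter:
    """Hypothetical component under suspicion: non-atomic read-modify-write."""

    def __init__(self) -> None:
        self.value = 0

    def increment(self, pause: Optional[Callable[[], None]] = None) -> None:
        current = self.value       # read
        if pause is not None:
            pause()                # test-only hook used to force an interleaving
        self.value = current + 1   # write


def forced_interleaving() -> int:
    """Run two increments in a fixed order: A reads, B completes, A writes."""
    counter = Counter()
    a_has_read = threading.Event()
    release_a = threading.Event()

    def pause_a() -> None:
        a_has_read.set()           # signal that A has read the old value
        release_a.wait(timeout=5)  # hold A until B has finished

    thread_a = threading.Thread(target=counter.increment, kwargs={"pause": pause_a})
    thread_a.start()
    a_has_read.wait(timeout=5)
    counter.increment()            # B runs start to finish while A is paused
    release_a.set()
    thread_a.join()
    return counter.value


result = forced_interleaving()
# Prediction (lost update): 1. Negative result that would falsify it: 2.
print("survived" if result == 1 else "falsified")  # → survived
```

The outcome is the same on every run, so one execution is enough to settle the test.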
4. Output Document
Generate a Markdown document following the template in assets/debugging-plan.md. Save to the working directory as debugging-plan-{timestamp}.md.
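The spec does not define the `{timestamp}` format. The sketch below assumes a UTC timestamp in compact ISO 8601 basic form. This form contains no `:`, so the filename is valid on Windows, and it sorts chronologically as plain text. The helper name `plan_path` is hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def plan_path(workdir: Path, now: Optional[datetime] = None) -> Path:
    """Return workdir / debugging-plan-{timestamp}.md (timestamp format assumed)."""
    now = now or datetime.now(timezone.utc)
    # Normalise to UTC so the trailing 'Z' is accurate; astimezone treats
    # naive datetimes as local time.
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return workdir / f"debugging-plan-{stamp}.md"


fixed = datetime(2024, 5, 1, 14, 30, 0, tzinfo=timezone.utc)
print(plan_path(Path("."), fixed).name)  # → debugging-plan-20240501T143000Z.md
```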
Quality Criteria
A well-formed debugging plan exhibits:
- Mutual exclusivity: Hypotheses describe distinct failure modes, so falsifying some of them still leaves at least one standing
- Collective exhaustiveness: Hypotheses cover the likely failure space
- Ordered efficiency: Cheapest decisive tests appear first
- Clear success criteria: The executing agent knows when to stop
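The "ordered efficiency" criterion can be expressed as a sort key. The sketch below puts decisive tests first and orders them by estimated cost. It assumes that cost is estimated in minutes and that decisiveness is a yes/no judgement; `PlannedTest` and `order_tests` are hypothetical names. Python's `sorted` is stable, so tests with equal keys keep the plausibility order they arrived in.

```python
from dataclasses import dataclass


@dataclass
class PlannedTest:
    hypothesis: str
    est_minutes: float  # assumed cost estimate, in minutes
    decisive: bool      # would a negative result rule the hypothesis out outright?


def order_tests(tests: list[PlannedTest]) -> list[PlannedTest]:
    # Decisive tests first (False sorts before True), then cheapest first.
    return sorted(tests, key=lambda t: (not t.decisive, t.est_minutes))


tests = [
    PlannedTest("H1 race condition in cache refresh", 30, True),
    PlannedTest("H2 malformed UTF-8 payload", 2, True),
    PlannedTest("H3 connection pool exhaustion", 5, False),
]
print([t.hypothesis[:2] for t in order_tests(tests)])  # → ['H2', 'H1', 'H3']
```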
Anti-Patterns
- Confirmation bias: Designing tests that can only succeed, not fail
- Hypothesis creep: Adding new hypotheses mid-execution instead of through a deliberate revision of the plan
- Coupling: Tests that cannot isolate individual hypotheses
- Vagueness: "Check the logs" without specifying what pattern would falsify
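To avoid the vagueness anti-pattern, "check the logs" should become a concrete pattern whose presence or absence contradicts the hypothesis. Take the hypothesis "every HTTP 500 is caused by an upstream timeout". A single failed request with no timeout entry falsifies it. The log format, both regular expressions, and the function name below are hypothetical and would need to be adapted to the real logs. An empty result does not prove the hypothesis; it only means the hypothesis survived this test.

```python
import re

# Hypothetical log format; adapt both patterns to the real logs.
FAILED = re.compile(r"request_id=(\w+) status=500")
TIMEOUT = re.compile(r"request_id=(\w+) upstream_timeout")


def falsify_timeout_hypothesis(lines: list[str]) -> list[str]:
    """Return IDs of failed requests that have no timeout entry.

    Any non-empty result is a contradicting observation.
    """
    failed = {m.group(1) for line in lines if (m := FAILED.search(line))}
    timed_out = {m.group(1) for line in lines if (m := TIMEOUT.search(line))}
    return sorted(failed - timed_out)


logs = [
    "2024-05-01T10:00:01 request_id=a1 upstream_timeout after 30s",
    "2024-05-01T10:00:01 request_id=a1 status=500",
    "2024-05-01T10:00:07 request_id=b2 status=500",
]
print(falsify_timeout_hypothesis(logs))  # → ['b2']
```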
References
references/examples.md: Worked examples of hypothesis-falsification pairs across common debugging scenarios (API timeouts, flaky tests, memory leaks)