Hypothesis-Driven Debugging
Generate a structured debugging document that identifies candidate root causes and provides falsification plans for each. The output document instructs a separate execution agent; do not perform the investigation yourself.
Hypothesis falsification must be delegated to a sub-agent. Use the alchemist
agent type when it is available; otherwise use the nearest available
investigation-oriented sub-agent and record the fallback in the plan.
Philosophical Foundation
Apply Popperian falsificationism: hypotheses cannot be proven true, only disproven. Design tests that could definitively rule out each hypothesis rather than confirm it. A good falsification test produces a clear negative result if the hypothesis is wrong.
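A minimal sketch of the asymmetry described above, assuming a test outcome can be reduced to comparing a predicted observation with an actual one. The names `Verdict` and `judge` are hypothetical and exist only for illustration. Note that the non-failing outcome is labelled "not falsified", never "confirmed".

```python
from enum import Enum


class Verdict(Enum):
    FALSIFIED = "falsified"
    NOT_FALSIFIED = "not falsified"  # deliberately not "confirmed"


def judge(predicted: object, observed: object) -> Verdict:
    """Compare what the hypothesis predicted with what was actually observed."""
    if observed != predicted:
        # A contradicting observation rules the hypothesis out.
        return Verdict.FALSIFIED
    # A matching observation only means the hypothesis survives this test.
    return Verdict.NOT_FALSIFIED


# Hypothetical example: "a stale cache entry causes the wrong price" predicts
# that bypassing the cache yields the correct price. If the price is still
# wrong with the cache bypassed, the hypothesis is falsified.
print(judge(predicted="correct price", observed="wrong price").value)
```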
Process
1. Gather Context
Before forming hypotheses, collect:
- Symptom description: What behaviour is observed vs expected?
- Reproduction conditions: When does it occur? Intermittent or consistent?
- Recent changes: Deployments, configuration changes, dependency updates
- Error artefacts: Stack traces, logs, error messages, screenshots
- Environmental factors: OS, runtime versions, network conditions
If information is missing, note gaps in the output document.
2. Form Hypotheses
Generate 1–5 hypotheses ranked by plausibility. Each hypothesis must be:
- Specific: Name the component, function, or interaction suspected
- Falsifiable: A concrete test could disprove it
- Independent: Falsifying one should not automatically falsify others
Common hypothesis categories:
| Category    | Examples                                                  |
| ----------- | --------------------------------------------------------- |
| State       | Race condition, stale cache, corrupted data               |
| Input       | Malformed payload, encoding issue, boundary case          |
| Environment | Missing dependency, version mismatch, resource exhaustion |
| Logic       | Off-by-one, incorrect predicate, missing null check       |
| Integration | API contract violation, timeout, auth failure             |
Avoid vague hypotheses ("something wrong with the database"). Pin down the specific failure mode.
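One way to keep the 1–5 limit and the plausibility ranking explicit is to model hypotheses as records. This is a sketch only: the `Hypothesis` class, the `plausibility` score on a 0 to 1 scale, and `rank_hypotheses` are assumptions made for illustration, not part of the template.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hypothesis:
    category: str        # e.g. "State", "Input", "Environment", "Logic", "Integration"
    statement: str       # names the specific component and failure mode
    plausibility: float  # hypothetical 0..1 estimate used only for ranking


def rank_hypotheses(hypotheses: List[Hypothesis]) -> List[Hypothesis]:
    """Return hypotheses ordered from most to least plausible."""
    if not 1 <= len(hypotheses) <= 5:
        raise ValueError("a plan should contain between 1 and 5 hypotheses")
    return sorted(hypotheses, key=lambda h: h.plausibility, reverse=True)


ranked = rank_hypotheses([
    Hypothesis("Integration", "Upstream auth token expires mid-request", 0.3),
    Hypothesis("State", "Session cache serves entries past their TTL", 0.6),
])
for position, h in enumerate(ranked, start=1):
    print(position, h.category, h.statement)
```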
3. Design Falsification Plans
For each hypothesis, specify:
- Prediction: If this hypothesis is correct, what observable outcome follows?
- Falsification test: What action would produce a contradicting observation?
- Expected negative result: What outcome would disprove the hypothesis?
- Tooling required: Commands, scripts, or instrumentation needed
- Confidence impact: How decisively would a negative result rule this out?
Prefer tests that are:
- Quick to execute
- Minimally invasive
- Deterministic rather than probabilistic
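The five fields listed above can be captured in a small record so that incomplete plans are easy to spot before the document is written. The following is a sketch under assumptions: `FalsificationPlan` and `missing_fields` are hypothetical names, and every field is treated as free text.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class FalsificationPlan:
    prediction: str = ""
    falsification_test: str = ""
    expected_negative_result: str = ""
    tooling_required: str = ""
    confidence_impact: str = ""


def missing_fields(plan: FalsificationPlan) -> List[str]:
    """List the names of fields that are empty or whitespace-only."""
    return [f.name for f in fields(plan) if not getattr(plan, f.name).strip()]


draft = FalsificationPlan(
    prediction="Requests served with the cache bypassed return the correct value",
    falsification_test="Replay the failing request with the cache disabled",
)
print(missing_fields(draft))
```

An empty list from `missing_fields` indicates that all five fields have been filled in; it says nothing about whether the test itself is well designed.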
4. Output Document
Generate a Markdown document following the template in
assets/debugging-plan.md. Save it under docs/debugging/ as
debugging-plan-{timestamp}.md, creating docs/debugging/ first if it does
not already exist. The document must name the sub-agent type that will execute
falsification, preferring alchemist when available, and must state that the
planning agent is not the execution agent.
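The save step could be sketched as follows. The timestamp format is an assumption, since the instructions above do not specify one; a compact UTC form is used here. The function name `debugging_plan_path` and the `repo_root` parameter are likewise hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def debugging_plan_path(repo_root: Path, now: Optional[datetime] = None) -> Path:
    """Return docs/debugging/debugging-plan-{timestamp}.md under repo_root.

    Creates docs/debugging/ first if it does not already exist.
    """
    now = now or datetime.now(timezone.utc)
    # Assumed timestamp format: compact UTC, e.g. 20240102T030405Z.
    timestamp = now.strftime("%Y%m%dT%H%M%SZ")
    out_dir = repo_root / "docs" / "debugging"
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir / f"debugging-plan-{timestamp}.md"


if __name__ == "__main__":
    print(debugging_plan_path(Path.cwd()))
```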
Quality Criteria
A well-formed debugging plan exhibits:
- Mutual exclusivity: Hypotheses do not overlap, so falsifying one leaves the others standing
- Collective exhaustiveness: Hypotheses cover the likely failure space
- Ordered efficiency: Cheapest decisive tests appear first
- Clear success criteria: The executing agent knows when to stop
- Delegated falsification: The plan is ready for a sub-agent, preferably
alchemist, to execute without relying on hidden context from the planning agent
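The "ordered efficiency" criterion can be read as a sort over the planned tests. The sketch below is one possible interpretation, not a prescribed rule: it assumes each test carries a rough cost estimate and a decisiveness score, both hypothetical fields, and orders by lowest cost first with higher decisiveness breaking ties.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PlannedTest:
    hypothesis: str
    cost_minutes: float  # hypothetical effort estimate
    decisiveness: float  # hypothetical 0..1 score; 1 means a negative result fully rules it out


def order_tests(tests: List[PlannedTest]) -> List[PlannedTest]:
    """Cheapest tests first; among equally cheap tests, the more decisive one leads."""
    return sorted(tests, key=lambda t: (t.cost_minutes, -t.decisiveness))


plan = order_tests([
    PlannedTest("H1: stale cache", cost_minutes=30, decisiveness=0.9),
    PlannedTest("H2: version mismatch", cost_minutes=5, decisiveness=0.5),
    PlannedTest("H3: malformed payload", cost_minutes=5, decisiveness=0.8),
])
for t in plan:
    print(t.hypothesis)
```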
Anti-Patterns
- Confirmation bias: Designing tests that can only support a hypothesis and could never contradict it
- Hypothesis creep: Adding new hypotheses ad hoc during execution instead of returning to planning and revising the document
- Coupling: Tests that cannot isolate individual hypotheses
- Vagueness: "Check the logs" without specifying what pattern would falsify the hypothesis
- Self-execution: The planning agent performs the falsification work instead of delegating it to a sub-agent
References
references/examples.md: Worked examples of hypothesis-falsification pairs across common debugging scenarios (API timeouts, flaky tests, memory leaks)