Agent Skills: Hypothesis-Driven Debugging

Structured code debugging through hypothesis formation and falsification planning. Use when diagnosing bugs, unexpected behaviour, or system failures where the root cause is unclear. Produces a hypothesis document for execution by another agent rather than performing the investigation directly. Triggers on requests to debug issues, diagnose problems, investigate failures, or create debugging plans.

ID: leynos/agent-helper-scripts/hypothesis-debugging

Install this agent skill to your local environment:

pnpm dlx add-skill https://github.com/leynos/agent-helper-scripts/tree/HEAD/skills/hypothesis-debugging

Skill Files

skills/hypothesis-debugging/SKILL.md

Hypothesis-Driven Debugging

Generate a structured debugging document that identifies candidate root causes and provides falsification plans for each. The output document instructs a separate execution agent; do not perform the investigation yourself.

Philosophical Foundation

Apply Popperian falsificationism: hypotheses cannot be proven true, only disproven. Design tests that could definitively rule out each hypothesis rather than confirm it. A good falsification test produces a clear negative result if the hypothesis is wrong.
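For instance, a single hypothesis-falsification pair might read as follows (the scenario is invented for illustration):

```markdown
**Hypothesis:** The session cache serves stale entries after a deploy.

**Falsification test:** Flush the cache, then replay the failing request.
If the failure still occurs against a cold cache, staleness is ruled out.
```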

Process

1. Gather Context

Before forming hypotheses, collect:

  • Symptom description: What behaviour is observed vs expected?
  • Reproduction conditions: When does it occur? Intermittent or consistent?
  • Recent changes: Deployments, configuration changes, dependency updates
  • Error artefacts: Stack traces, logs, error messages, screenshots
  • Environmental factors: OS, runtime versions, network conditions

If information is missing, note gaps in the output document.
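A context section in the output document might record these items, gaps included, like this (all field values are invented for illustration):

```markdown
## Context

- **Symptom:** Checkout returns HTTP 500; a 303 redirect is expected.
- **Reproduction:** Intermittent; roughly 1 in 20 requests under load.
- **Recent changes:** payments-service deployed two hours before the first report.
- **Error artefacts:** Stack trace terminates in `OrderSerializer`.
- **Environment:** Production only; staging does not reproduce.
- **Gaps:** Runtime version unconfirmed; no access to load-balancer logs.
```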

2. Form Hypotheses

Generate 1–5 hypotheses ranked by plausibility. Each hypothesis must be:

  • Specific: Name the component, function, or interaction suspected
  • Falsifiable: A concrete test could disprove it
  • Independent: Falsifying one should not automatically falsify others

Common hypothesis categories:

| Category | Examples |
|----------|----------|
| State | Race condition, stale cache, corrupted data |
| Input | Malformed payload, encoding issue, boundary case |
| Environment | Missing dependency, version mismatch, resource exhaustion |
| Logic | Off-by-one, incorrect predicate, missing null check |
| Integration | API contract violation, timeout, auth failure |

Avoid vague hypotheses ("something wrong with the database"). Pin down the specific failure mode.
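As a contrast, compare a vague hypothesis with a pinned-down one (both invented for illustration):

```markdown
- Vague: "Something is wrong with the database."
- Specific: "The `orders` read replica lags behind the primary, so
  `get_order()` returns rows that predate the most recent write."
```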

3. Design Falsification Plans

For each hypothesis, specify the following (a worked entry follows the two lists below):

  1. Prediction: If this hypothesis is correct, what observable outcome follows?
  2. Falsification test: What action would produce a contradicting observation?
  3. Expected negative result: What outcome would disprove the hypothesis?
  4. Tooling required: Commands, scripts, or instrumentation needed
  5. Confidence impact: How decisively would a negative result rule this out?

Prefer tests that are:

  • Quick to execute
  • Minimally invasive
  • Deterministic rather than probabilistic
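Putting the five fields together, a single plan entry might read as follows (the scenario, numbers, and tooling are invented for illustration):

```markdown
### Hypothesis 2: Connection pool exhaustion under load

1. **Prediction:** Failures cluster when concurrent requests exceed the
   pool size (assumed to be 10 here).
2. **Falsification test:** Replay the failing traffic with the pool size
   doubled.
3. **Expected negative result:** Failures persist at the same rate despite
   the larger pool.
4. **Tooling required:** Load-replay script; access to pool configuration.
5. **Confidence impact:** High; an unchanged failure rate with a doubled
   pool would decisively rule this out.
```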

4. Output Document

Generate a Markdown document following the template in `assets/debugging-plan.md`. Save it to the working directory as `debugging-plan-{timestamp}.md`.
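The template shipped with the skill is authoritative; purely as orientation, a generated plan might be laid out along these lines (the headings below are an assumption, not the canonical template):

```markdown
# Debugging Plan: {symptom summary}

## Context
<!-- symptoms, reproduction, recent changes, artefacts, environment, gaps -->

## Hypotheses

### Hypothesis 1: {specific failure mode}
<!-- prediction, falsification test, expected negative result,
     tooling required, confidence impact -->

## Execution Notes
<!-- test ordering, stop criteria -->
```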

Quality Criteria

A well-formed debugging plan exhibits:

  • Mutual exclusivity: Hypotheses are distinct enough that falsifying some leaves the others standing
  • Collective exhaustiveness: Hypotheses cover the likely failure space
  • Ordered efficiency: Cheapest decisive tests appear first
  • Clear success criteria: The executing agent knows when to stop

Anti-Patterns

  • Confirmation bias: Designing tests that can only succeed, not fail
  • Hypothesis creep: Adding new hypotheses mid-execution instead of returning to the plan for revision
  • Coupling: Tests that cannot isolate individual hypotheses
  • Vagueness: "Check the logs" without specifying which pattern would falsify the hypothesis (see the contrast below)
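For the last point, contrast an unfalsifiable instruction with one that names the disconfirming pattern (the log message and window are invented for illustration):

```markdown
- Vague: "Check the logs."
- Falsifiable: "Search the gateway logs for `upstream timed out` within
  five minutes of each failure; zero matches disproves the timeout
  hypothesis."
```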

References

  • `references/examples.md`: Worked examples of hypothesis-falsification pairs across common debugging scenarios (API timeouts, flaky tests, memory leaks)