Hypothesis-Elimination Reasoning (HE) Skill

Hypothesis-Elimination Reasoning (HE)

Purpose: Systematic identification of root causes through evidence-based elimination. Unlike exploration methodologies (BoT, ToT), HE is designed to NARROW possibilities, not expand them.

When to Use Hypothesis-Elimination

✅ Use HE when:

Problem has ONE correct answer among many possibilities
Evidence can discriminate between hypotheses
Time-critical diagnosis (production incidents, debugging)
"What caused X?" questions
Differential diagnosis scenarios

❌ Don't use HE when:

Multiple solutions are equally valid (use BoT)
Need to optimize among options (use ToT)
No discriminating evidence available
Creative/generative problems

Examples:

"Why is the API returning 500 errors?" ✅
"What's causing memory leaks in production?" ✅
"Which database should we use?" ❌ (use ToT)
"What features should we build?" ❌ (use BoT)

Core Methodology: 5-Phase HEDAM Process

Phase 1: Hypothesis Generation (Diverge)

Goal: Generate ALL plausible hypotheses without filtering

Process:

State the observable symptom precisely
Generate hypotheses across ALL relevant categories:
- Recent changes (code, config, infrastructure)
- External dependencies (APIs, services, network)
- Resource exhaustion (memory, CPU, disk, connections)
- Data issues (corruption, volume, format)
- Timing/race conditions
- Security incidents
- Human error
- Unknown/novel causes
For each hypothesis, note:
- Mechanism: How would this cause the symptom?
- Prior probability: Based on frequency in similar situations
- Discriminating evidence: What would prove/disprove this?

Template:

## Hypothesis [N]: [Name]
- **Mechanism**: [How this causes the symptom]
- **Prior Probability**: [Low/Medium/High] - [Justification]
- **Supporting Evidence Needed**: [What would increase probability]
- **Eliminating Evidence Needed**: [What would rule this out]

Quantity: Generate 8-15 hypotheses. If fewer than 8, challenge assumptions.

Phase 2: Evidence Hierarchy Design

Goal: Design the most efficient evidence-gathering sequence

Principle: Gather DISCRIMINATING evidence first (evidence that eliminates multiple hypotheses)

Process:

List all potential evidence sources:
- Logs (application, system, network)
- Metrics (CPU, memory, latency, error rates)
- Recent changes (git log, deployment history)
- Reproduction attempts
- User reports / patterns
- External status pages
Score each evidence source:
- Discrimination Power: How many hypotheses does this affect? (1-10)
- Acquisition Cost: How long/difficult to obtain? (1-10, lower = easier)
- Priority Score: Discrimination / Cost
Rank evidence sources by priority score
Design evidence-gathering sequence (highest priority first)

Example:

| Evidence Source | Discriminates | Cost | Priority |
|-----------------|---------------|------|----------|
| Error logs (last hour) | 8 hypotheses | 2 | 4.0 ⬅️ First |
| Recent deployments | 5 hypotheses | 1 | 5.0 ⬅️ First |
| Memory metrics | 3 hypotheses | 2 | 1.5 |
| Network trace | 4 hypotheses | 6 | 0.67 |
| Full reproduction | 10 hypotheses | 8 | 1.25 |

Phase 3: Systematic Elimination

Goal: Eliminate hypotheses through evidence, not intuition

Process: For each evidence source (in priority order):

Gather evidence (read logs, check metrics, etc.)

Update ALL hypotheses:

### Evidence: [What was found]

| Hypothesis | Impact | New Status |
|------------|--------|------------|
| H1: Memory leak | No memory growth seen | ELIMINATED |
| H2: DB connection pool | Connection count normal | ELIMINATED |
| H3: Slow external API | Latency spike at 14:32 | STRENGTHENED |
| H4: Recent deployment | Deploy at 14:30 | STRENGTHENED |

Track elimination count: Stop when 1-2 hypotheses remain
Avoid confirmation bias: Actively seek evidence AGAINST remaining hypotheses

Elimination Criteria:

ELIMINATED: Evidence directly contradicts mechanism
WEAKENED: Evidence reduces probability but doesn't eliminate
UNCHANGED: Evidence doesn't affect this hypothesis
STRENGTHENED: Evidence increases probability

Phase 4: Confirmation Testing

Goal: Confirm the remaining hypothesis through targeted testing

Process:

For the leading hypothesis, identify:
- Prediction: If this is the cause, what else should we observe?
- Test: How can we verify this prediction?
- Expected result: What confirms the hypothesis?
Execute confirmation test
Evaluate:
- CONFIRMED: Prediction matched, mechanism verified
- PARTIAL: Some predictions matched, uncertainty remains
- REFUTED: Prediction failed, reopen eliminated hypotheses

Confirmation Checklist:

[ ] Can we reproduce the issue with the identified cause?
[ ] Does fixing the cause resolve the symptom?
[ ] Does the timeline match (cause preceded symptom)?
[ ] Is the mechanism physically/logically possible?

Phase 5: Root Cause Documentation

Goal: Document findings for future reference and prevention

Template:

## Root Cause Analysis: [Issue Title]

### Summary
- **Symptom**: [What was observed]
- **Root Cause**: [Confirmed cause]
- **Mechanism**: [How the cause produced the symptom]
- **Timeline**: [When cause occurred, when symptom appeared]

### Elimination Path
1. Started with [N] hypotheses
2. [Evidence 1] eliminated [X] hypotheses
3. [Evidence 2] eliminated [Y] hypotheses
4. Confirmed via [Test]

### Hypotheses Considered and Eliminated
| Hypothesis | Eliminated By | Key Evidence |
|------------|---------------|--------------|
| H1 | Evidence 1 | [Specific finding] |
| H2 | Evidence 2 | [Specific finding] |

### Prevention
- [ ] [Action to prevent recurrence]
- [ ] [Monitoring to detect earlier]

### Confidence: [X]%
- [Justification for confidence level]

Time-Critical Mode (Incident Response)

When time is critical, use accelerated HE:

5-Minute Triage:

Check last 3 deployments (30 sec)
Check external dependency status pages (30 sec)
Check error rate spike timing (1 min)
Check resource exhaustion (CPU, mem, disk) (1 min)
Check for similar recent incidents (1 min)
Form top-2 hypotheses (1 min)

Parallel Elimination:

Assign different team members to different evidence sources
Use chat/war room for real-time hypothesis updates
Timebox each investigation track (10 min max)

Common Mistakes

Premature Convergence: Latching onto first plausible hypothesis
- Fix: Force generation of 8+ hypotheses before investigating
Confirmation Bias: Seeking evidence FOR favorite hypothesis
- Fix: Actively try to DISPROVE remaining hypotheses
Ignoring Low-Probability Causes: Novel causes get eliminated by assumption
- Fix: Keep "Unknown/Novel" as permanent hypothesis until confirmed
Evidence Tunnel Vision: Only looking at familiar evidence sources
- Fix: Use the Evidence Hierarchy Design phase systematically
Incomplete Elimination: Declaring victory with 3+ hypotheses remaining
- Fix: Require 1-2 remaining before confirmation phase

Integration with Other Patterns

HE → SRC: After identifying root cause, use Self-Reflecting Chain to trace the exact failure path

BoT → HE: If problem is "what could go wrong?", use BoT first to generate failure modes, then HE when a failure occurs

HE → ToT: After finding root cause, use ToT to evaluate fix options

Confidence Calibration

| Remaining Hypotheses | Max Confidence | |---------------------|----------------| | 1 (confirmed) | 90-95% | | 2 (one leading) | 70-80% | | 3+ | <60% - need more evidence | | All eliminated | 0% - missing hypothesis, restart |

Quick Reference

HEDAM Process:
H - Hypothesis Generation (8-15 possibilities)
E - Evidence Hierarchy (prioritize discriminating evidence)
D - Discrimination/Elimination (update all hypotheses per evidence)
A - Assertion/Confirmation (test leading hypothesis)
M - Memorialize (document for future)

Agent Skills: Hypothesis-Elimination Reasoning (HE)

Install this agent skill to your local

Skill Files