Systematic Debugging
Evidence-based investigation -> root cause -> verified fix.
Steps
- Load the
outfitter:maintain-tasksskill for stage tracking - Collect evidence (reproduce, gather symptoms)
- Isolate variables (narrow scope)
- Formulate and test hypotheses
- Implement fix with failing test first
- Verify fix resolves the issue
For formal incident investigation requiring RCA documentation, use find-root-causes skill instead (it loads this skill and adds formal RCA methodology).
<when_to_use>
- Bugs, errors, exceptions, crashes
- Unexpected behavior or wrong results
- Failing tests (unit, integration, e2e)
- Intermittent or timing-dependent failures
- Performance issues (slow, memory leaks, high CPU)
- Integration failures (API, database, external services)
NOT for: obvious fixes, feature requests, architecture planning
</when_to_use>
<iron_law>
NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST
Never propose solutions or "try this" without understanding root cause through systematic investigation.
</iron_law>
<stages>See Steps section for skill dependencies. Stages advance forward only.
| Stage | Trigger | activeForm | |-------|---------|------------| | Collect Evidence | Session start | "Collecting evidence" | | Isolate Variables | Evidence gathered | "Isolating variables" | | Formulate Hypotheses | Problem isolated | "Formulating hypotheses" | | Test Hypothesis | Hypothesis formed | "Testing hypothesis" | | Verify Fix | Fix identified | "Verifying fix" |
Situational (insert when triggered):
- Iterate -> Hypothesis disproven, loops back with new hypothesis
Workflow:
- Start: "Collect Evidence" as
in_progress - Transition: Mark current
completed, add nextin_progress - Failed hypothesis: Add "Iterate" task
- Quick fixes: If root cause obvious from error, skip to "Verify Fix" (still create failing test)
- Need more evidence: Add new evidence task (don't regress stages)
- Circuit breaker: After 3 failed hypotheses -> escalate
<quick_start>
- Create "Collect Evidence" todo as
in_progress - Reproduce - exact steps to trigger consistently
- Investigate - gather evidence about what's happening
- Analyze - compare working vs broken, find differences
- Test hypothesis - single specific hypothesis, minimal test
- Implement - failing test first, then fix
- Update todos on stage transitions
</quick_start>
<stage_1_root_cause>
Goal: Understand what's actually happening.
Transition: Mark complete when you have reproduction steps and initial evidence.
Read error messages completely
- Stack traces top to bottom
- Note file paths, line numbers, variable names
- Look for "caused by" chains
Reproduce consistently
- Document exact trigger steps
- Note inputs that cause vs don't cause
- Check if intermittent (timing, race conditions)
- Verify in clean environment
Check recent changes
git diff- what changed?git log --since="yesterday"- recent commits- Dependency updates
- Config/environment changes
Gather evidence
- Add logging at key points
- Print variable values at transformations
- Log function entry/exit with parameters
- Capture timestamps for timing issues
Trace data flow backward
- Where does bad value come from?
- Track through transformations
- Find first place it becomes wrong
Red flags (return to evidence gathering):
- "I think maybe X is the problem"
- "Let's try changing Y"
- "It might be related to Z"
- Starting to write code before understanding
</stage_1_root_cause>
<stage_2_pattern_analysis>
Goal: Learn from working code to understand broken code.
Transition: Mark complete when key differences identified.
Find working examples
- Search for similar functionality that works
rg "pattern"for similar patterns- Look for passing vs failing tests
- Check git history for when it worked
Read references completely
- Every line, not skimming
- Full context
- All dependencies/imports
- Configuration and setup
Identify every difference
- Line by line working vs broken
- Different imports?
- Different function signatures?
- Different error handling?
- Different data flow?
- Different configuration?
Understand dependencies
- Libraries/packages involved
- Versions in use
- External services
- Shared state
- Assumptions made
Questions to answer:
- Why does working version work?
- What's fundamentally different?
- Edge cases working version handles?
- Invariants working version maintains?
</stage_2_pattern_analysis>
<stage_3_hypothesis_testing>
Goal: Test one specific idea with minimal change.
Transition: Mark complete when specific, evidence-based hypothesis formed.
Form single hypothesis
- Template: "X is root cause because Y"
- Must explain all symptoms
- Must be testable with small change
- Must be based on evidence from stages 1-2
Design minimal test
- Smallest change to test hypothesis
- Change ONE variable
- Preserve everything else
- Make reversible
Execute and verify
- Apply change
- Run reproduction steps
- Observe carefully
- Document results
Outcomes:
- Fixed: Confirm across all cases, proceed to Verify Fix
- Not fixed: Mark complete, add "Iterate", form NEW hypothesis
- Partially fixed: Add "Iterate" for remaining issues
- Never: Random variations hoping one works
Bad hypotheses (too vague):
- "Maybe it's a race condition"
- "Could be caching or permissions"
- "Probably something with the database"
Good hypotheses (specific, testable):
- "Fails because expects number but receives string when API returns empty"
- "Race condition: fetchData() called before initializeClient() completes"
- "Memory leak: event listeners in useEffect never removed in cleanup"
</stage_3_hypothesis_testing>
<stage_4_implementation>
Goal: Fix root cause permanently with verification.
Transition: Root cause confirmed, ready for permanent fix.
Create failing test
- Write test reproducing bug
- Verify fails before fix
- Should pass after fix
- Captures exact broken scenario
Implement single fix
- Address identified root cause
- No additional "improvements"
- No refactoring "while you're there"
- Just fix the problem
Verify fix
- Failing test now passes
- Existing tests still pass
- Manual reproduction no longer triggers bug
- No new errors/warnings
Circuit breaker If 3+ fixes tried without success: STOP
- Problem isn't hypothesis - problem is architecture
- May be using wrong pattern entirely
- Escalate or redesign
After fixing:
- Mark "Verify Fix" completed
- Add defensive validation
- Document root cause
- Consider similar bugs elsewhere
</stage_4_implementation>
<red_flags>
STOP and return to Stage 1 if you catch yourself:
- "Quick fix for now, investigate later"
- "Just try changing X and see"
- "I don't fully understand but this might work"
- "One more fix attempt" (already tried 2+)
- "Let me try a few different things"
- Proposing solutions before gathering evidence
- Skipping failing test case
- Fixing symptoms instead of root cause
ALL mean: STOP. Add new "Collect Evidence" task.
</red_flags>
<escalation>When to escalate:
- After 3 failed fix attempts - architecture may be wrong
- No clear reproduction - need more context/access
- External system issues - need vendor/team involvement
- Security implications - need security expertise
- Data corruption risks - need backup/recovery planning
Before claiming "fixed":
- [ ] Root cause identified with evidence
- [ ] Failing test case created
- [ ] Fix addresses root cause only
- [ ] Test now passes
- [ ] All existing tests pass
- [ ] Manual reproduction no longer triggers bug
- [ ] No new warnings/errors
- [ ] Root cause documented
- [ ] Prevention measures considered
- [ ] "Verify Fix" marked completed
Understanding the bug is more valuable than fixing it quickly.
</completion> <rules>ALWAYS:
- Create "Collect Evidence" todo at session start
- Follow four-stage framework
- Update todos on stage transitions
- Create failing test before fix
- Test single hypothesis at a time
- Document root cause after fix
- Mark "Verify Fix" complete only after tests pass
NEVER:
- Propose fixes without understanding root cause
- Skip evidence gathering
- Test multiple hypotheses simultaneously
- Skip failing test case
- Fix symptoms instead of root cause
- Continue after 3 failed fixes without escalation
- Regress stages - add new tasks if needed
- playbooks.md - bug-type specific investigations
- evidence-patterns.md - diagnostic techniques
- reproduction.md - reproduction techniques
- integration.md - workflow integration, anti-patterns