Systematic Debugging Skill

Find the root cause before attempting fixes. Symptom fixes are failure.

Purpose

Debugging without methodology wastes time and creates new bugs. Random fixes address symptoms, not causes. This skill enforces systematic investigation to find root causes before any fix is attempted.

The Iron Law

ALWAYS find root cause before attempting fixes.

Quick patches mask underlying issues. If you're trying fixes without understanding why they might work, you're guessing - stop and investigate.

The Four Phases

Complete each phase in order. Do not skip to implementation.

Phase 1: Root Cause Investigation

Understand what's happening before theorizing why.

1. Read the error carefully
   - Full error message, not just the first line
   - Stack trace - where did it originate?
   - Error codes or types
2. Reproduce consistently
   - Can you trigger the failure reliably?
   - What are the exact steps?
   - Does it fail the same way every time?
3. Review recent changes
   - What changed since it last worked?
   - Check git log for recent commits
   - Any new dependencies or configuration?
4. Gather diagnostic evidence
   - Add logging at key points
   - Check system state (memory, disk, network)
   - Inspect input data
5. Trace backwards
   - Start from the error
   - Work backwards through the call stack
   - Find where correct behavior diverges
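
As one illustration of reading the full error rather than its first line, the sketch below prints a complete traceback including the chained cause. The `load_port` helper and its config shape are hypothetical, invented only for this example.

```python
import traceback


def load_port(config: dict) -> int:
    """Hypothetical helper: read a TCP port from a config mapping."""
    try:
        return int(config["port"])
    except ValueError as exc:
        # Re-raise with context so the original cause stays in the chain.
        raise RuntimeError("invalid server configuration") from exc


def full_error_report(exc: BaseException) -> str:
    """Return the complete traceback text, including chained causes."""
    return "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))


try:
    load_port({"port": "80 80"})
except RuntimeError as err:
    report = full_error_report(err)
    print(report)
```

The last line of the report names only the `RuntimeError`; the `ValueError` that actually explains the failure appears earlier in the chain, which is why the whole message matters.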

Phase 2: Pattern Analysis

Compare against working code.

1. Find similar working code
   - How does a working version do this?
   - What's different about this case?
2. Compare against references
   - Documentation examples
   - Library/framework conventions
   - Previous implementations
3. Identify the difference
   - What's unique about the failing case?
   - What assumption is being violated?
4. Understand dependencies
   - What does this code depend on?
   - Could a dependency have changed?
   - Are versions correct?
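
For the "are versions correct?" question, a minimal sketch might compare installed package versions against the ones you expect. The function name and the expected-version mapping are assumptions for illustration; only `importlib.metadata` is standard library.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple


def version_mismatches(expected: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {package: (expected, installed)} for every package that differs.

    An installed value of None means the package is not installed at all.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


# Made-up package name, used only to show the "not installed" case.
print(version_mismatches({"no-such-package-for-demo": "1.0"}))
```

Running the same check in a working and a failing environment gives a quick answer to whether a dependency changed underneath the code.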

Phase 3: Hypothesis and Testing

Scientific method for debugging.

1. Form a single hypothesis
   - "The error occurs because X"
   - Be specific and testable
   - Only one hypothesis at a time
2. Design a test
   - How would you prove/disprove this hypothesis?
   - What would you expect to see if correct?
   - What would you expect to see if wrong?
3. Make minimal changes
   - Change ONE thing to test the hypothesis
   - Don't bundle multiple changes
   - Keep changes reversible
4. Observe results
   - Did the change affect the behavior?
   - Does it match your prediction?
   - Record what happened
5. Iterate or proceed
   - Hypothesis confirmed: proceed to Phase 4
   - Hypothesis rejected: form new hypothesis, repeat Phase 3
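
The loop above can be sketched as a small experiment that states one hypothesis, writes down both predictions, and changes exactly one thing. The `parse_amount` function and the sample input are hypothetical stand-ins for whatever code is under investigation.

```python
def parse_amount(text: str) -> float:
    """Hypothetical function under investigation."""
    return float(text)


def fails(text: str) -> bool:
    """Return True if parse_amount raises ValueError for this input."""
    try:
        parse_amount(text)
    except ValueError:
        return True
    return False


# Hypothesis: "parse_amount fails because the input contains a thousands separator."
# Prediction if correct: the input parses once ONLY the comma is removed.
# Prediction if wrong: it still fails, so something else is the cause.
failing_input = "1,250.00"
control_input = failing_input.replace(",", "")  # the single change under test

observations = {
    "original fails": fails(failing_input),
    "control fails": fails(control_input),
}
hypothesis_confirmed = observations["original fails"] and not observations["control fails"]
print(observations, "confirmed:", hypothesis_confirmed)
```

Because only the comma differs between the two inputs, the result can be attributed to that one variable; bundling a second change would make the observation ambiguous.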

Phase 4: Implementation

Fix only after understanding.

1. Write a failing test
   - Capture the bug in a test
   - Test should fail before fix
   - Test should pass after fix
2. Implement single fix
   - Address the root cause
   - Not symptoms or side effects
   - Minimal change required
3. Verify the fix
   - Original test passes
   - All other tests pass
   - Manual verification if applicable
4. Check for related issues
   - Could this bug exist elsewhere?
   - Should you search for similar patterns?
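
A minimal sketch of the "fail before, pass after" rule, using `unittest` and a made-up chunking bug. Both `chunk_buggy` and `chunk_fixed` are hypothetical; in a real codebase there would be one function, edited in place between the two test runs.

```python
import unittest


def chunk_buggy(items, size):
    """Hypothetical buggy version: silently drops a trailing partial chunk."""
    return [items[i:i + size] for i in range(0, len(items) - size + 1, size)]


def chunk_fixed(items, size):
    """Root-cause fix: iterate over every start index, not only full chunks."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def make_regression_test(chunk):
    """Build the same regression test against a given implementation."""
    class ChunkRegressionTest(unittest.TestCase):
        def test_keeps_trailing_partial_chunk(self):
            self.assertEqual(chunk([1, 2, 3, 4, 5], 2), [[1, 2], [3, 4], [5]])
    return ChunkRegressionTest


def passes(chunk) -> bool:
    """Run the regression test and report whether it succeeded."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(make_regression_test(chunk))
    result = unittest.TestResult()
    suite.run(result)
    return result.wasSuccessful()


print("before fix:", passes(chunk_buggy))
print("after fix:", passes(chunk_fixed))
```

The fix changes the loop bound that caused the data loss. Appending the leftover items after the loop would also make the test pass, but it would patch the symptom and leave the faulty bound in place.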

Red Flags: You're Skipping Investigation

| Thought | Reality |
|---------|---------|
| "Let me try this quick fix" | You don't understand the cause |
| "I'll just restart the service" | Masking the symptom |
| "Maybe if I add a null check here" | Guessing, not debugging |
| "Let me try a few things" | Random changes waste time |
| "This usually fixes it" | Past fixes don't explain current bugs |
| "I don't have time to investigate" | You don't have time NOT to |

Integration with RPI Workflow

During Research Phase

When investigating an issue:

- Use this skill to understand the problem
- Document root cause in research findings
- Root cause informs the implementation plan

During Implement Phase

When a step fails verification:

1. Stop the step (do not proceed)
2. Apply this debugging methodology
3. Find root cause before attempting fixes
4. Update plan if fix requires changes
5. Resume implementation after fix verified

Escalation: When to Question Architecture

If you've attempted three or more fixes and the problem persists:

Stop fixing. Question the architecture.

- Is the design fundamentally flawed?
- Are we fighting the framework?
- Should this be reimplemented differently?

Use AskUserQuestion to discuss architectural concerns before continuing.

Evidence Gathering Techniques

Logging

Add temporary logging:
- Entry/exit of functions
- Variable values at key points
- Timestamps for timing issues
- Request/response payloads
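
One way to add temporary entry/exit logging without editing function bodies is a decorator, sketched below with the standard `logging` module. The `traced` decorator and the `apply_discount` function are hypothetical names for illustration.

```python
import functools
import logging
import time

logger = logging.getLogger("debug_trace")


def traced(func):
    """Log entry, arguments, result, and elapsed time of each call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.debug("enter %s args=%r kwargs=%r", func.__name__, args, kwargs)
        start = time.perf_counter()
        try:
            result = func(*args, **kwargs)
        except Exception:
            # logger.exception records the message plus the full traceback.
            logger.exception("error in %s", func.__name__)
            raise
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.debug("exit %s result=%r elapsed=%.2fms", func.__name__, result, elapsed_ms)
        return result
    return wrapper


@traced
def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function being investigated."""
    return price * (1 - percent / 100)


if __name__ == "__main__":
    # %(asctime)s adds the timestamps needed for timing issues.
    logging.basicConfig(level=logging.DEBUG, format="%(asctime)s %(levelname)s %(message)s")
    apply_discount(200.0, 15)
```

Since the logging lives in one decorator, removing it after the investigation is a one-line change per function.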

Bisection

When did it break?
- git bisect to find breaking commit
- Binary search through code changes
- Narrow down to specific change
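
The idea behind bisection is an ordinary binary search over an ordered history. The sketch below shows the logic on made-up commit ids with a hypothetical `is_good` check; with a real repository, `git bisect` performs the same search, and `git bisect run` can drive it with a test script.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_good: Callable[[str], bool]) -> str:
    """Binary-search an ordered history for the first failing commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit after the first bad one is also bad.
    """
    low, high = 0, len(commits) - 1  # invariant: low is good, high is bad
    while high - low > 1:
        mid = (low + high) // 2
        if is_good(commits[mid]):
            low = mid
        else:
            high = mid
    return commits[high]


history = ["a1", "b2", "c3", "d4", "e5", "f6", "g7"]  # made-up commit ids
broken_from = history.index("e5")  # pretend the bug was introduced here
culprit = first_bad(history, lambda c: history.index(c) < broken_from)
print(culprit)  # e5
```

Each check halves the remaining range, so the seven-commit history here needs two checks and a thousand-commit history needs about ten.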

Isolation

Simplify to reproduce:
- Remove unrelated code
- Use minimal test case
- Eliminate variables
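
Shrinking a failing input can be partly automated. The sketch below is a simple greedy reducer, not full delta debugging: it drops one item at a time and keeps the removal whenever the failure still reproduces. The `crashes` predicate is a hypothetical stand-in for running the real failing code.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def minimize(items: List[T], still_fails: Callable[[List[T]], bool]) -> List[T]:
    """Greedily drop items one at a time while the failure still reproduces."""
    current = list(items)
    index = 0
    while index < len(current):
        candidate = current[:index] + current[index + 1:]
        if still_fails(candidate):
            current = candidate  # item was irrelevant; keep it removed
        else:
            index += 1  # item is needed to reproduce; keep it
    return current


def crashes(records: List[int]) -> bool:
    """Hypothetical failure: triggered when a negative value and a zero coexist."""
    return any(r < 0 for r in records) and 0 in records


minimal = minimize([5, 0, 12, -3, 7, 0, 9], crashes)
print(minimal)  # [-3, 0]
```

The reduced input contains only the elements that matter, which often points directly at the violated assumption.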

Comparison

What's different?
- Working vs broken environment
- Working vs broken input
- Working vs broken configuration
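
For the configuration case, a small diff of two settings mappings makes the differences explicit. The function name, the `MISSING` marker, and the sample settings are all invented for this sketch.

```python
from typing import Any, Dict, Tuple

MISSING = "<missing>"  # marker for a key present on only one side


def diff_settings(working: Dict[str, Any], broken: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {key: (working_value, broken_value)} for every key that differs."""
    differences = {}
    for key in sorted(set(working) | set(broken)):
        left = working.get(key, MISSING)
        right = broken.get(key, MISSING)
        if left != right:
            differences[key] = (left, right)
    return differences


# Made-up settings from a working and a broken environment.
working_env = {"TIMEOUT": "30", "DB_HOST": "db.internal", "CACHE": "on"}
broken_env = {"TIMEOUT": "3", "DB_HOST": "db.internal"}
delta = diff_settings(working_env, broken_env)
print(delta)
```

Each entry in the result is a candidate hypothesis for Phase 3; test them one at a time rather than aligning every setting at once.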

Anti-Patterns

Shotgun Debugging

Wrong: Try random changes hoping one works
Right: Form hypothesis, test it, iterate

Fix and Pray

Wrong: Apply fix, hope it works, move on
Right: Verify fix addresses root cause

Debugging by Deletion

Wrong: Remove code until error goes away
Right: Understand why code causes error

Copy-Paste Fixes

Wrong: Find similar fix online, apply blindly
Right: Understand why the fix works for your case

Skipping to Phase 4

Wrong: "I think I know the fix, let me just try it"
Right: Complete investigation phases first

Checklist Before Fixing

[ ] Error message read completely
[ ] Failure reproduced consistently
[ ] Recent changes reviewed
[ ] Diagnostic evidence gathered
[ ] Root cause identified (not just symptoms)
[ ] Hypothesis formed and tested
[ ] Fix addresses root cause
[ ] Failing test written before fix
