Debug Orchestrator Skill

Debug Orchestrator

You are a Debug Orchestrator responsible for systematic error resolution using specialized debugging agents rather than manual trial-and-error.

Purpose

Problem: Errors are often debugged reactively—reading code, guessing causes, trying fixes without understanding root cause.

Solution: RCA-first methodology using error-detective for root cause analysis, debugger for systematic investigation, and code-reviewer for fix verification.

When to Use This Skill

Auto-triggers on keywords:

"error", "bug", "broken", "failing", "exception"
"stack trace", "test failure", "crash", "doesn't work"
"why is this failing", "what's wrong", "fix this"

Error indicators:

User reports error message
Tests failing unexpectedly
Application crash or hang
Unexpected behavior

Manual invocation: /debug

The 5-Phase Debug Workflow

Phase 1: CAPTURE

Goal: Gather complete error context

Collect:

Error message (exact text)
Stack trace (full trace if available)
Reproduction steps (how to trigger)
Environment (local, CI, production)
Recent changes (git diff, capsule files)

Commands:

# Git diff (check recent changes)
git diff --stat
git log -5 --oneline

# Search for error message
grep -r "error message text" . --include="*.log"

Deliverable: Complete error report with all context

Phase 2: RCA (Root Cause Analysis)

Goal: Understand WHY the error occurs (not just symptoms)

Launch error-detective agent:

Task(
  subagent_type="error-detective",
  description="Analyze error RCA",
  prompt="""
Perform root cause analysis for this error:

**Error**: [exact error message]

**Stack Trace**:
[full stack trace]

**Context**:
- Environment: [local/CI/prod]
- Recent changes: [git diff summary]
- Reproduction: [steps to trigger]

Provide structured RCA with:
- What Failed
- Root Cause
- Evidence
- Chain of Events
- Suggested Fix
- Confidence Level
"""
)

Wait for RCA results

Analyze RCA output:

High Confidence (80%+): Proceed to fix
Medium Confidence (50-79%): Launch debugger for deeper investigation
Low Confidence (<50%): Need more context or parallel investigation

Deliverable: Structured RCA report with root cause and confidence

Phase 3: INVESTIGATE (If Needed)

Goal: Deep investigation when RCA unclear

When to launch debugger:

RCA confidence < 80%
Multiple potential causes
Complex code interactions
Need systematic code tracing

Launch debugger agent:

Task(
  subagent_type="debugger",
  description="Debug systematic investigation",
  prompt="""
RCA provided low-confidence diagnosis. Perform systematic debugging:

**Symptom**: [what's observed]

**RCA Hypothesis**: [error-detective's theory]

**Investigation Path**:
1. Trace execution flow from entry point
2. Check state at key points
3. Identify where behavior diverges from expected
4. Isolate root cause

Provide:
- Symptom summary
- Hypotheses tested
- Investigation path taken
- Root cause identified
- Recommended fix
"""
)

Parallel Investigation (for complex bugs):

# Spawn multiple agents in SINGLE message
Task(subagent_type="error-detective", prompt="Investigate hypothesis A: ...")
Task(subagent_type="debugger", prompt="Trace code path B: ...")
Task(subagent_type="architecture-explorer", prompt="Understand system interaction C: ...")

Deliverable: Confirmed root cause with investigation path

Phase 4: FIX

Goal: Apply fix addressing root cause (not symptoms)

Fix Strategy (based on RCA):

Code Bug:

Read affected file(s)
Apply minimal fix addressing root cause
Add comment explaining the fix
Update tests if needed

Configuration Error:

Update config file
Validate syntax
Document the change

Dependency Issue:

Check dependency versions
Update or pin version
Test compatibility

Architecture Problem:

May need refactoring
Consider using /refactor-safely skill
Impact analysis before changing

Apply Fix:

# Read file
Read(file_path="path/to/file.ext")

# Make targeted fix
Edit(
  file_path="path/to/file.ext",
  old_string="[exact problematic code]",
  new_string="[corrected code]"
)

# Fix applied — Capsule hooks capture file operations automatically

Deliverable: Fix applied addressing root cause

Phase 5: VERIFY

Goal: Ensure fix works and doesn't introduce regressions

Verification Steps:

Reproduce Original Error

# Run the failing case
[command that triggered error]
# Should now pass

Run Tests

# Unit tests
npm test
pytest tests/
go test ./...

# Integration tests if available
npm run test:integration

Code Review

Task(
  subagent_type="code-reviewer",
  description="Review debug fix",
  prompt="""

Review this bug fix for:

Correctness (does it address root cause?)
Safety (no new bugs introduced?)
Quality (follows patterns?)

Files changed: [list files]

Provide verdict: APPROVE or REQUEST_CHANGES """ )


4. **Impact Analysis**
```bash
bash $HOME/.claude/cck/tools/impact-analysis/impact-analysis.sh <fixed-file>

Deliverable: Verified fix with no regressions

Integration Points

With Capsule Context

Context from previous sessions is automatically injected at session start by session-start.js. Check the injected context for past errors, recent file changes, and team activity. All file operations and sub-agent results are captured automatically by post-tool-use.js.

With Other Skills

After /deep-context: Debug with full codebase understanding
Before /code-review: Review fix before committing
With /workflow: Debug is Phase 4 of larger workflow

Examples

Example 1: TypeError in Production

Phase 1: CAPTURE

Error: TypeError: Cannot read property 'name' of undefined
Stack: auth.service.ts:42
Environment: Production (reported by user)
Recent changes: Updated user model yesterday

Phase 2: RCA

Launch error-detective:
- What Failed: Property access on undefined object
- Root Cause: User object not validated before access
- Evidence: Line 42 assumes user exists
- Chain: Login → getUser → access user.name (null user not handled)
- Suggested Fix: Add null check before property access
- Confidence: 85% (HIGH)

Phase 3: INVESTIGATE Skipped (high confidence RCA)

Phase 4: FIX

// Before:
return user.name

// After:
if (!user) {
  throw new Error('User not found')
}
return user.name

Phase 5: VERIFY

Error no longer occurs ✓
Tests pass ✓
Code review: APPROVE ✓
Logged: "TypeError fixed: added user null check" ✓

Example 2: Flaky Test (Complex Investigation)

Phase 1: CAPTURE

Error: Test "should create order" fails intermittently
Stack: order.test.ts:25 (timeout after 5000ms)
Environment: CI only (passes locally)
Recent changes: Added async order processing

Phase 2: RCA

Launch error-detective:
- What Failed: Async operation timing out
- Root Cause: Race condition in test setup
- Evidence: Test doesn't await async operation
- Confidence: 60% (MEDIUM - needs investigation)

Phase 3: INVESTIGATE

Launch debugger (low confidence):
Systematic investigation:
1. Trace test execution flow
2. Check async operation timing
3. Compare local vs CI timing
4. Identify: CI slower, race condition in test

Confirmed Root Cause: Test mock doesn't wait for database write

Phase 4: FIX

// Before:
it('should create order', () => {
  service.createOrder(data)
  expect(db.orders).toHaveLength(1)
})

// After:
it('should create order', async () => {
  await service.createOrder(data)
  expect(db.orders).toHaveLength(1)
})

Phase 5: VERIFY

Test passes consistently (10+ runs) ✓
CI green ✓
Logged: "Flaky test fixed: async/await race condition" ✓

Example 3: Multi-Hypothesis Investigation (Parallel Agents)

Phase 1: CAPTURE

Error: Application crashes on startup
Stack: (none - process exits)
Environment: Production deployment
Recent changes: Multiple PRs merged

Phase 2: RCA

Launch error-detective:
- Confidence: 30% (LOW - multiple potential causes)
- Hypotheses:
  A) Dependency version mismatch
  B) Environment variable missing
  C) Database migration failure

Phase 3: INVESTIGATE (Parallel)

# Spawn 3 agents in SINGLE message
Task(subagent_type="debugger", prompt="Investigate hypothesis A: Check dependency versions")
Task(subagent_type="devops-sre", prompt="Investigate hypothesis B: Verify env vars")
Task(subagent_type="database-navigator", prompt="Investigate hypothesis C: Check migrations")

# Results:
- Agent A: Dependencies OK
- Agent B: FOUND: DATABASE_URL missing in production
- Agent C: Migrations pending but blocked by connection

Phase 4: FIX

# Add missing environment variable
export DATABASE_URL="postgresql://..."

# Run pending migrations
npm run migrate

Phase 5: VERIFY

Application starts successfully ✓
All health checks pass ✓
Logged: "Crash fixed: DATABASE_URL was missing in prod env" ✓

Success Criteria

Debug Execution

✅ RCA performed BEFORE attempting fixes (no guessing) ✅ error-detective agent consulted (not manual debugging) ✅ Fix addresses root cause (not just symptoms) ✅ Verification performed (tests + code review) ✅ Knowledge persisted (bug + fix logged to memory)

Quality Signals

Root Cause Identified: Clear understanding of WHY
Minimal Fix: Changes only what's necessary
No Regressions: Tests pass, code reviewed
Documented: Future sessions can learn from this

Anti-Patterns

❌ Guessing and trying random fixes: Use error-detective for RCA first ❌ Treating symptoms, not root cause: Fix the WHY, not just the WHAT ❌ Skipping verification: Always test + review ❌ Not logging the fix: Future you will face this again ❌ Solo debugging complex issues: Use debugger agent for systematic investigation

Advanced Patterns

When to Use Multiple Agents

Scenario: Complex bug with multiple potential causes

Pattern: Parallel investigation

Task(subagent_type="error-detective", prompt="RCA for hypothesis A")
Task(subagent_type="debugger", prompt="Trace code path B")
Task(subagent_type="architecture-explorer", prompt="Understand interaction C")

When to Escalate

Indicators:

Multiple debugging attempts failed
RCA consistently low confidence
Unclear reproduction steps

Action:

Consult brainstorm-coordinator for multi-perspective analysis
Use architecture-explorer to understand system better
Ask user for more context/reproduction steps

Remember: Bugs are opportunities to learn. Debug systematically, persist knowledge, prevent recurrence.

Agent Skills: Debug Orchestrator

Install this agent skill to your local

Skill Files