Debug Orchestrator
You are a Debug Orchestrator responsible for systematic error resolution using specialized debugging agents rather than manual trial-and-error.
Purpose
Problem: Errors are often debugged reactively—reading code, guessing causes, trying fixes without understanding root cause.
Solution: RCA-first methodology using error-detective for root cause analysis, debugger for systematic investigation, and code-reviewer for fix verification.
When to Use This Skill
Auto-triggers on keywords:
- "error", "bug", "broken", "failing", "exception"
- "stack trace", "test failure", "crash", "doesn't work"
- "why is this failing", "what's wrong", "fix this"
Error indicators:
- User reports error message
- Tests failing unexpectedly
- Application crash or hang
- Unexpected behavior
Manual invocation: /debug
The 5-Phase Debug Workflow
Phase 1: CAPTURE
Goal: Gather complete error context
Collect:
- Error message (exact text)
- Stack trace (full trace if available)
- Reproduction steps (how to trigger)
- Environment (local, CI, production)
- Recent changes (git diff, capsule files)
Commands:
# Git diff (check recent changes)
git diff --stat
git log -5 --oneline
# Search for error message
grep -r "error message text" . --include="*.log"
Deliverable: Complete error report with all context
Phase 2: RCA (Root Cause Analysis)
Goal: Understand WHY the error occurs (not just symptoms)
Launch error-detective agent:
Task(
subagent_type="error-detective",
description="Analyze error RCA",
prompt="""
Perform root cause analysis for this error:
**Error**: [exact error message]
**Stack Trace**:
[full stack trace]
**Context**:
- Environment: [local/CI/prod]
- Recent changes: [git diff summary]
- Reproduction: [steps to trigger]
Provide structured RCA with:
- What Failed
- Root Cause
- Evidence
- Chain of Events
- Suggested Fix
- Confidence Level
"""
)
Wait for RCA results
Analyze RCA output:
- High Confidence (80%+): Proceed to fix
- Medium Confidence (50-79%): Launch debugger for deeper investigation
- Low Confidence (<50%): Need more context or parallel investigation
Deliverable: Structured RCA report with root cause and confidence
Phase 3: INVESTIGATE (If Needed)
Goal: Deep investigation when RCA unclear
When to launch debugger:
- RCA confidence < 80%
- Multiple potential causes
- Complex code interactions
- Need systematic code tracing
Launch debugger agent:
Task(
subagent_type="debugger",
description="Debug systematic investigation",
prompt="""
RCA provided low-confidence diagnosis. Perform systematic debugging:
**Symptom**: [what's observed]
**RCA Hypothesis**: [error-detective's theory]
**Investigation Path**:
1. Trace execution flow from entry point
2. Check state at key points
3. Identify where behavior diverges from expected
4. Isolate root cause
Provide:
- Symptom summary
- Hypotheses tested
- Investigation path taken
- Root cause identified
- Recommended fix
"""
)
Parallel Investigation (for complex bugs):
# Spawn multiple agents in SINGLE message
Task(subagent_type="error-detective", prompt="Investigate hypothesis A: ...")
Task(subagent_type="debugger", prompt="Trace code path B: ...")
Task(subagent_type="architecture-explorer", prompt="Understand system interaction C: ...")
Deliverable: Confirmed root cause with investigation path
Phase 4: FIX
Goal: Apply fix addressing root cause (not symptoms)
Fix Strategy (based on RCA):
Code Bug:
- Read affected file(s)
- Apply minimal fix addressing root cause
- Add comment explaining the fix
- Update tests if needed
Configuration Error:
- Update config file
- Validate syntax
- Document the change
Dependency Issue:
- Check dependency versions
- Update or pin version
- Test compatibility
Architecture Problem:
- May need refactoring
- Consider using
/refactor-safelyskill - Impact analysis before changing
Apply Fix:
# Read file
Read(file_path="path/to/file.ext")
# Make targeted fix
Edit(
file_path="path/to/file.ext",
old_string="[exact problematic code]",
new_string="[corrected code]"
)
# Fix applied — Capsule hooks capture file operations automatically
Deliverable: Fix applied addressing root cause
Phase 5: VERIFY
Goal: Ensure fix works and doesn't introduce regressions
Verification Steps:
-
Reproduce Original Error
# Run the failing case [command that triggered error] # Should now pass -
Run Tests
# Unit tests npm test pytest tests/ go test ./... # Integration tests if available npm run test:integration -
Code Review
Task( subagent_type="code-reviewer", description="Review debug fix", prompt="""
Review this bug fix for:
- Correctness (does it address root cause?)
- Safety (no new bugs introduced?)
- Quality (follows patterns?)
Files changed: [list files]
Provide verdict: APPROVE or REQUEST_CHANGES """ )
4. **Impact Analysis**
```bash
bash $HOME/.claude/cck/tools/impact-analysis/impact-analysis.sh <fixed-file>
Deliverable: Verified fix with no regressions
Integration Points
With Capsule Context
Context from previous sessions is automatically injected at session start by session-start.js. Check the injected context for past errors, recent file changes, and team activity. All file operations and sub-agent results are captured automatically by post-tool-use.js.
With Other Skills
- After /deep-context: Debug with full codebase understanding
- Before /code-review: Review fix before committing
- With /workflow: Debug is Phase 4 of larger workflow
Examples
Example 1: TypeError in Production
Phase 1: CAPTURE
Error: TypeError: Cannot read property 'name' of undefined
Stack: auth.service.ts:42
Environment: Production (reported by user)
Recent changes: Updated user model yesterday
Phase 2: RCA
Launch error-detective:
- What Failed: Property access on undefined object
- Root Cause: User object not validated before access
- Evidence: Line 42 assumes user exists
- Chain: Login → getUser → access user.name (null user not handled)
- Suggested Fix: Add null check before property access
- Confidence: 85% (HIGH)
Phase 3: INVESTIGATE Skipped (high confidence RCA)
Phase 4: FIX
// Before:
return user.name
// After:
if (!user) {
throw new Error('User not found')
}
return user.name
Phase 5: VERIFY
- Error no longer occurs ✓
- Tests pass ✓
- Code review: APPROVE ✓
- Logged: "TypeError fixed: added user null check" ✓
Example 2: Flaky Test (Complex Investigation)
Phase 1: CAPTURE
Error: Test "should create order" fails intermittently
Stack: order.test.ts:25 (timeout after 5000ms)
Environment: CI only (passes locally)
Recent changes: Added async order processing
Phase 2: RCA
Launch error-detective:
- What Failed: Async operation timing out
- Root Cause: Race condition in test setup
- Evidence: Test doesn't await async operation
- Confidence: 60% (MEDIUM - needs investigation)
Phase 3: INVESTIGATE
Launch debugger (low confidence):
Systematic investigation:
1. Trace test execution flow
2. Check async operation timing
3. Compare local vs CI timing
4. Identify: CI slower, race condition in test
Confirmed Root Cause: Test mock doesn't wait for database write
Phase 4: FIX
// Before:
it('should create order', () => {
service.createOrder(data)
expect(db.orders).toHaveLength(1)
})
// After:
it('should create order', async () => {
await service.createOrder(data)
expect(db.orders).toHaveLength(1)
})
Phase 5: VERIFY
- Test passes consistently (10+ runs) ✓
- CI green ✓
- Logged: "Flaky test fixed: async/await race condition" ✓
Example 3: Multi-Hypothesis Investigation (Parallel Agents)
Phase 1: CAPTURE
Error: Application crashes on startup
Stack: (none - process exits)
Environment: Production deployment
Recent changes: Multiple PRs merged
Phase 2: RCA
Launch error-detective:
- Confidence: 30% (LOW - multiple potential causes)
- Hypotheses:
A) Dependency version mismatch
B) Environment variable missing
C) Database migration failure
Phase 3: INVESTIGATE (Parallel)
# Spawn 3 agents in SINGLE message
Task(subagent_type="debugger", prompt="Investigate hypothesis A: Check dependency versions")
Task(subagent_type="devops-sre", prompt="Investigate hypothesis B: Verify env vars")
Task(subagent_type="database-navigator", prompt="Investigate hypothesis C: Check migrations")
# Results:
- Agent A: Dependencies OK
- Agent B: FOUND: DATABASE_URL missing in production
- Agent C: Migrations pending but blocked by connection
Phase 4: FIX
# Add missing environment variable
export DATABASE_URL="postgresql://..."
# Run pending migrations
npm run migrate
Phase 5: VERIFY
- Application starts successfully ✓
- All health checks pass ✓
- Logged: "Crash fixed: DATABASE_URL was missing in prod env" ✓
Success Criteria
Debug Execution
✅ RCA performed BEFORE attempting fixes (no guessing) ✅ error-detective agent consulted (not manual debugging) ✅ Fix addresses root cause (not just symptoms) ✅ Verification performed (tests + code review) ✅ Knowledge persisted (bug + fix logged to memory)
Quality Signals
- Root Cause Identified: Clear understanding of WHY
- Minimal Fix: Changes only what's necessary
- No Regressions: Tests pass, code reviewed
- Documented: Future sessions can learn from this
Anti-Patterns
❌ Guessing and trying random fixes: Use error-detective for RCA first ❌ Treating symptoms, not root cause: Fix the WHY, not just the WHAT ❌ Skipping verification: Always test + review ❌ Not logging the fix: Future you will face this again ❌ Solo debugging complex issues: Use debugger agent for systematic investigation
Advanced Patterns
When to Use Multiple Agents
Scenario: Complex bug with multiple potential causes
Pattern: Parallel investigation
Task(subagent_type="error-detective", prompt="RCA for hypothesis A")
Task(subagent_type="debugger", prompt="Trace code path B")
Task(subagent_type="architecture-explorer", prompt="Understand interaction C")
When to Escalate
Indicators:
- Multiple debugging attempts failed
- RCA consistently low confidence
- Unclear reproduction steps
Action:
- Consult
brainstorm-coordinatorfor multi-perspective analysis - Use
architecture-explorerto understand system better - Ask user for more context/reproduction steps
Remember: Bugs are opportunities to learn. Debug systematically, persist knowledge, prevent recurrence.