Log-Fix: Iterative Bug Fixing from Log Analysis Skill

Log-Fix: Iterative Bug Fixing from Log Analysis

Analyze logs from multiple sources, identify errors, perform root cause analysis, and interactively fix issues with pattern learning.

Quick Start

# Analyze recent errors from all log sources
/sc:log-fix

# Filter by time and severity
/sc:log-fix --time 1h --level ERROR

# Target specific log source
/sc:log-fix --source backend --component app.services

# Trace a specific request
/sc:log-fix --request-id abc-123-def

# Dry run - analyze without fixing
/sc:log-fix --dry-run

Behavioral Flow

Discover - Detect available log sources (files, docker, systemd)
Load Knowledge - Check for learned patterns from previous fixes
Collect - Parse and normalize logs from detected sources
Triage - Cluster errors, rank by severity, match known patterns
Analyze - Root cause analysis with stack traces and code inspection
Fix - Interactive fix loop with user approval
Learn - Save successful fix patterns to knowledge base
Report - Summary of session with remaining issues

Flags

| Flag | Type | Default | Description | |------|------|---------|-------------| | --time | string | 1h | Time range: 30m, 1h, 1d, 7d | | --source | string | all | Log source: backend, frontend, docker, all | | --level | string | ERROR | Minimum severity: ERROR, WARNING, all | | --component | string | - | Filter by logger/module name | | --request-id | string | - | Trace specific request across logs | | --dry-run | bool | false | Analyze and report without applying fixes | | --fix | bool | true | Enable interactive fix loop |

Phase 0: Log Source Discovery

Auto-detect all available log sources before analysis.

Detection Strategy

| Source | Detection Method | Common Locations | |--------|-----------------|------------------| | File logs | Glob for logs/**/*.log, *.log | logs/, var/log/ | | Docker | Check docker compose ps | docker-compose.yml | | Systemd | Check journalctl availability | System services | | PM2 | Check pm2 list | Node.js apps |

Log Format Detection

Auto-detect log format:

JSON (structured) - Parse with jq
Text (unstructured) - Parse with regex patterns
Combined (nginx/apache style) - Parse with format-specific regex

# Detect format by reading first line
head -1 logs/app.log | python3 -c "import sys,json; json.loads(sys.stdin.read()); print('json')" 2>/dev/null || echo "text"

Phase 1: Knowledge Loading

Load historical fix knowledge for enhanced analysis.

Knowledge File Structure

{
  "version": "1.0",
  "error_patterns": [
    {
      "id": "pattern-001",
      "signature": {
        "message_pattern": "Connection refused.*port \\d+",
        "component": "app.services.database",
        "exception_type": "ConnectionError"
      },
      "fixes": [{
        "description": "Check database is running, verify connection string",
        "file_changed": "src/config.py",
        "validation_command": "pytest tests/test_db.py -v",
        "success_count": 3,
        "confidence": 0.9
      }]
    }
  ],
  "component_knowledge": {},
  "metadata": {
    "total_fixes": 0,
    "updated_at": null
  }
}

Location: .claude/knowledge/log-fix-knowledge.json

When knowledge is loaded:

Triage: Show "Known issue" badge for previously seen errors
Root Cause: Suggest fixes with historical success rates
Fix Loop: Offer one-click application of proven fixes

Phase 2: Triage and Clustering

Incident Clustering

Group by:

Same error type (identical/similar messages)
Same component (logger/service name)
Same request (request_id correlation)
Same time window (errors within 5s)

Severity Ranking

| Priority | Criteria | |----------|----------| | P0 | Stack trace, service crash, database errors | | P1 | ERROR level, API failures, auth issues | | P2 | WARNING level, deprecation, performance | | P3 | Informational, cleanup suggestions |

Triage Summary Output

## Error Triage Summary

**Time Range**: [start] to [end]
**Total Errors**: N | **Unique Types**: M | **Known Patterns**: K

### Top Issues by Frequency

| # | Count | Component | Error Pattern | Priority | Historical |
|---|-------|-----------|---------------|----------|------------|
| 1 | 15 | app.services.api | Connection timeout | P1 | Known: 3 fixes, 90% |
| 2 | 8 | app.models | Validation failed | P1 | New pattern |

Phase 3: Root Cause Analysis

For each incident cluster:

Stack Trace Analysis - Extract file paths, identify failing line, map to components
Log Correlation - Follow request_id across services
Pattern Matching - Check knowledge base, then use code-level analysis
Code Inspection - Read relevant source files, identify the bug

Root Cause Output

## Root Cause: [Error Type]

**Occurrences**: N | **Component**: [logger]

### Error Details
[Full error and stack trace]

### Relevant Code
**File**: `src/services/api.py:145`
[Code context with highlighted line]

### Probable Cause
[Analysis explaining why the error occurs]

### Suggested Fix
[Specific code change with diff preview]

Phase 4: Interactive Fix Loop

Fix Workflow

Present - Show error, root cause, affected files
Show diff - Display exact code changes proposed
Ask permission - User approves, skips, or modifies
Apply if approved - Use Edit tool
Validate - Run relevant tests
Learn - Offer to save pattern to knowledge base

User Options

| Action | Description | |--------|-------------| | Apply | Apply the proposed fix | | Skip | Skip this issue, move to next | | Edit | Modify the proposed fix before applying | | Test first | Write a test before fixing (invoke /sc:tdd) | | Details | Show more context about the error | | Quit | Exit fix loop |

Validation After Fix

| Fix Type | Validation | |----------|------------| | Python code | pytest -k "test_function" -v | | JavaScript | npm test -- --testPathPattern=file | | Config | Restart and verify health | | Database | Run migrations |

Phase 5: Knowledge Learning

After a successful fix, offer to save the pattern:

Extract error signature - Convert specific values to regex patterns
Record fix details - What changed, where, validation command
Update knowledge file - Append or update existing pattern
Report confidence - Initial confidence 1.0, adjusted over time

Learning Prompt

Fix applied and validated!

Save this pattern for future similar errors?
  [Y] Yes, save pattern and fix
  [n] No, this was a one-off fix
  [e] Yes, but let me edit the description first

Phase 6: Session Report

## Log-Fix Session Summary

**Logs Analyzed**: [sources] | **Time Range**: [range]
**Errors Found**: N | **Unique Issues**: M

### Issues Addressed

| # | Component | Issue | Status | Action |
|---|-----------|-------|--------|--------|
| 1 | app.services | Connection timeout | FIXED | Added retry config |
| 2 | app.models | Validation error | SKIPPED | User deferred |

### Files Modified
- src/services/api.py
- src/config.py

### Patterns Learned
- 1 new pattern saved to knowledge base

### Next Steps
- Run `/sc:pr-check` to validate all changes
- Consider `/sc:tdd` for issues without test coverage

MCP Integration

PAL MCP (Debugging & Analysis)

| Tool | When to Use | Purpose | |------|-------------|---------| | mcp__pal__debug | Complex bugs | Multi-stage root cause analysis | | mcp__pal__thinkdeep | Unclear patterns | Deep investigation of recurring issues | | mcp__pal__codereview | Fix validation | Review proposed fix quality | | mcp__pal__apilookup | Dependency errors | Get current docs for version issues |

PAL Usage Patterns

# Debug complex error pattern
mcp__pal__debug(
    step="Investigating recurring connection timeout in API layer",
    hypothesis="Connection pool exhaustion under load",
    confidence="medium",
    relevant_files=["src/services/api.py", "src/config.py"]
)

# Deep analysis of unclear pattern
mcp__pal__thinkdeep(
    step="Why do these errors only occur during peak hours?",
    hypothesis="Race condition in connection pooling",
    confidence="low"
)

Rube MCP (Notifications)

| Tool | When to Use | Purpose | |------|-------------|---------| | mcp__rube__RUBE_SEARCH_TOOLS | External logging | Find logging service tools | | mcp__rube__RUBE_MULTI_EXECUTE_TOOL | Notifications | Post fix summary to Slack/Jira |

Tool Coordination

Bash - Log parsing (jq, grep), test execution, docker commands
Glob - Log file discovery
Grep - Error pattern search, request ID tracing
Read - Source code inspection, log file reading
Edit - Apply code fixes
Write - Knowledge base updates, session reports

Related Skills

/sc:tdd - Write tests before fixing (TDD approach)
/sc:pr-check - Validate all changes before PR
/sc:analyze - Deeper code analysis when root cause is unclear

Agent Skills: Log-Fix: Iterative Bug Fixing from Log Analysis

Install this agent skill to your local

Skill Files