Agent Skills: Session Replay Skill

Category: Uncategorized
ID: rysweet/amplihack/session-replay

Install this agent skill locally:

pnpm dlx add-skill https://github.com/rysweet/amplihack/tree/HEAD/.claude/skills/session-replay

.claude/skills/session-replay/SKILL.md

Session Replay Skill

Purpose

This skill analyzes claude-trace JSONL files to provide insights into Claude Code session health, token usage patterns, error frequencies, and agent effectiveness. It complements the /transcripts command by focusing on API-level trace data rather than conversation transcripts.

When to Use This Skill

  • Session debugging: Diagnose why a session was slow or failed
  • Token analysis: Understand token consumption patterns
  • Error patterns: Identify recurring failures across sessions
  • Performance optimization: Find bottlenecks in tool usage
  • Agent effectiveness: Measure which agents/tools are most productive

Quick Start

Analyze Latest Session

User: Analyze my latest session health

I'll analyze the most recent trace file:

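from pathlib import Path

# Read the latest trace file from .claude-trace/ (sorted oldest first, so the newest is last)
trace_dir = Path(".claude-trace")
trace_files = sorted(trace_dir.glob("*.jsonl"), key=lambda f: f.stat().st_mtime)
latest = trace_files[-1] if trace_files else None

# Parse and analyze. analyze_trace_file and format_session_report are not
# defined in this document; they stand for the parsing and reporting steps
# described under Implementation Notes.
if latest:
    analysis = analyze_trace_file(latest)
    print(format_session_report(analysis))
else:
    print("No trace files in .claude-trace/")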

Compare Multiple Sessions

User: Compare token usage across my last 5 sessions

I'll aggregate metrics across sessions:

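# Sort by modification time so the slice is the five most recent sessions
trace_files = sorted(Path(".claude-trace").glob("*.jsonl"), key=lambda f: f.stat().st_mtime)[-5:]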
comparison = compare_sessions(trace_files)
print(format_comparison_table(comparison))

Actions

Action: health

Analyze session health metrics from a trace file.

What to do:

  1. Read the trace file (JSONL format)
  2. Extract API requests and responses
  3. Calculate metrics:
    • Total tokens (input/output)
    • Request count and timing
    • Error rate
    • Tool usage distribution
  4. Generate health report

Metrics to extract:

# From each JSONL line containing a request/response pair:
{
    "timestamp": "...",
    "request": {
        "method": "POST",
        "url": "https://api.anthropic.com/v1/messages",
        "body": {
            "model": "claude-...",
            "messages": [...],
            "tools": [...]
        }
    },
    "response": {
        "usage": {
            "input_tokens": N,
            "output_tokens": N
        },
        "content": [...],
        "stop_reason": "..."
    }
}

Output format:

Session Health Report
=====================
File: log-2025-11-23-19-32-36.jsonl
Duration: 45 minutes

Token Usage:
- Input: 125,432 tokens
- Output: 34,521 tokens
- Total: 159,953 tokens
- Efficiency: 27.5% output ratio

Request Stats:
- Total requests: 23
- Average latency: 2.3s
- Errors: 2 (8.7%)

Tool Usage:
- Read: 45 calls
- Edit: 12 calls
- Bash: 8 calls
- Grep: 15 calls

Health Score: 82/100 (Good)
- Minor issue: 2 errors detected
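
The percentages in the sample report follow from the raw counts. A minimal sketch of that arithmetic, assuming the "Efficiency" figure is output tokens divided by input tokens (the ratio that reproduces 27.5% here). The function name `summarize_usage` is illustrative and not part of the skill:

```python
def summarize_usage(input_tokens: int, output_tokens: int,
                    requests: int, errors: int) -> dict:
    """Derive the report's summary figures from raw counts."""
    total = input_tokens + output_tokens
    # Assumption: "efficiency" means output tokens relative to input tokens.
    efficiency = output_tokens / input_tokens if input_tokens else 0.0
    error_rate = errors / requests if requests else 0.0
    return {
        "total_tokens": total,
        "efficiency_pct": round(efficiency * 100, 1),
        "error_rate_pct": round(error_rate * 100, 1),
    }


# Figures taken from the sample report above.
summary = summarize_usage(125_432, 34_521, requests=23, errors=2)
print(summary)
```

Guarding both divisions keeps the helper safe for an empty trace, where the request count and input tokens are both zero.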

Action: errors

Identify error patterns across sessions.

What to do:

  1. Scan trace files for error responses
  2. Categorize errors by type
  3. Identify recurring patterns
  4. Suggest fixes

Error categories to detect:

  • Rate limit errors (429)
  • Token limit exceeded
  • Tool execution failures
  • Timeout errors
  • API errors
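
The categories above could be assigned with a small classifier. This is a sketch under stated assumptions: the field names `status_code` and `error` (with a `message`) are hypothetical and may not match what a given claude-trace version records, and the substring checks are heuristics rather than an official error taxonomy. It only looks at response-level errors.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def classify_error(response: Dict[str, Any]) -> str:
    """Map an error response to one of the categories listed above.

    Assumption: the response carries an HTTP `status_code` and an `error`
    that is either a dict with a `message` or a plain string.
    """
    status = response.get("status_code")
    error = response.get("error")
    if isinstance(error, dict):
        message = str(error.get("message", "")).lower()
    else:
        message = str(error or "").lower()

    if status == 429:
        return "rate_limit"
    if "too long" in message or "token" in message:
        return "token_limit"
    if "timeout" in message or "timed out" in message:
        return "timeout"
    if "tool" in message:
        return "tool_failure"
    return "api_error"


def count_errors(responses: Iterable[Dict[str, Any]]) -> Counter:
    """Count categories over responses that look like errors."""
    return Counter(
        classify_error(r)
        for r in responses
        if r.get("error") or (r.get("status_code") or 0) >= 400
    )
```

A `Counter` is convenient here because `most_common()` yields the categories already ordered for the report.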

Output format:

Error Analysis
==============
Sessions analyzed: 5
Total errors: 12

Error Categories:
1. Rate limit (429): 5 occurrences
   - Recommendation: Add delays between requests

2. Token limit: 3 occurrences
   - Recommendation: Use context management skill

3. Tool failures: 4 occurrences
   - Bash timeout: 2
   - File not found: 2
   - Recommendation: Check paths before operations

Action: compare

Compare metrics across multiple sessions.

What to do:

  1. Load multiple trace files
  2. Extract comparable metrics
  3. Calculate trends
  4. Identify anomalies

Output format:

Session Comparison
==================
                    Session 1   Session 2   Session 3   Trend
Tokens (total)      150K        180K        120K        -20%
Requests            25          30          18          -28%
Errors              2           0           1           stable
Duration (min)      45          60          30          -33%
Efficiency          0.27        0.32        0.35        +30%
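
The document does not define how the Trend column is computed. One plausible reading, which reproduces the Requests and Duration rows, is the percent change from the first session to the last. The sketch below assumes that definition; the helper name `trend` is illustrative, and rows such as Errors evidently rely on a separate judgment that is not modeled here.

```python
from typing import Sequence


def trend(values: Sequence[float]) -> str:
    """Percent change from the first value to the last, as a signed string."""
    first, last = values[0], values[-1]
    if first == last:
        return "stable"
    if first == 0:
        return "n/a"  # percent change is undefined from a zero baseline
    change = (last - first) / first * 100
    return f"{change:+.0f}%"


print(trend([25, 30, 18]))  # Requests row: -28%
print(trend([45, 60, 30]))  # Duration row: -33%
```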

Action: tools

Analyze tool usage patterns.

What to do:

  1. Extract tool calls from traces
  2. Calculate frequency and timing
  3. Identify inefficient patterns
  4. Suggest optimizations

Patterns to detect:

  • Sequential calls that could be parallel
  • Repeated reads of same file
  • Excessive grep/glob calls
  • Unused tool results
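
As an illustration of one of these checks, repeated reads of the same file can be found by counting paths. This is a sketch, assuming each extracted tool call is a dict shaped like `{"name": "Read", "input": {"file_path": ...}}`; that shape, the function name, and the default threshold are assumptions, not something the trace format is documented to guarantee here.

```python
from collections import Counter
from typing import Any, Dict, List


def find_repeated_reads(tool_calls: List[Dict[str, Any]],
                        threshold: int = 3) -> Dict[str, int]:
    """Return files that were read at least `threshold` times."""
    counts = Counter(
        call["input"]["file_path"]
        for call in tool_calls
        # Only Read calls that actually carry a file path are counted.
        if call.get("name") == "Read"
        and "file_path" in (call.get("input") or {})
    )
    return {path: n for path, n in counts.items() if n >= threshold}


calls = [
    {"name": "Read", "input": {"file_path": "src/app.py"}},
    {"name": "Grep", "input": {"pattern": "TODO"}},
    {"name": "Read", "input": {"file_path": "src/app.py"}},
    {"name": "Read", "input": {"file_path": "README.md"}},
    {"name": "Read", "input": {"file_path": "src/app.py"}},
]
print(find_repeated_reads(calls))
```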

Output format:

Tool Usage Analysis
===================
Tool          Calls   Avg Time   Success Rate
Read          45      0.1s       100%
Edit          12      0.3s       92%
Bash          8       1.2s       75%
Grep          15      0.2s       100%
Task          3       45s        100%

Optimization Opportunities:
1. 5 Read calls to same file within 2 minutes
   - Consider caching strategy

2. 3 sequential Bash calls could be parallelized
   - Use multiple Bash calls in single message

Implementation Notes

Parsing JSONL Traces

claude-trace files are in JSONL format, with one request/response pair per line:

import json
from pathlib import Path
from typing import Any, Dict, List, Tuple

def parse_trace_file(path: Path) -> List[Dict[str, Any]]:
    """Parse a claude-trace JSONL file."""
    entries = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                try:
                    entry = json.loads(line)
                    entries.append(entry)
                except json.JSONDecodeError:
                    continue
    return entries

def extract_metrics(entries: List[Dict]) -> Dict[str, Any]:
    """Extract session metrics from trace entries."""
    metrics = {
        "total_input_tokens": 0,
        "total_output_tokens": 0,
        "request_count": 0,
        "error_count": 0,
        "tool_usage": {},
        "timestamps": [],
    }

    for entry in entries:
        if "request" in entry:
            metrics["request_count"] += 1
            metrics["timestamps"].append(entry.get("timestamp", 0))

        if "response" in entry:
            usage = entry["response"].get("usage", {})
            metrics["total_input_tokens"] += usage.get("input_tokens", 0)
            metrics["total_output_tokens"] += usage.get("output_tokens", 0)

            # Check for errors
            if entry["response"].get("error"):
                metrics["error_count"] += 1

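        # Count actual tool calls: "tool_use" blocks in the response content.
        # (request.body.tools only lists the tools offered, not the ones called.)
        response = entry.get("response")
        content = response.get("content") if isinstance(response, dict) else None
        if isinstance(content, list):
            for block in content:
                if isinstance(block, dict) and block.get("type") == "tool_use":
                    name = block.get("name", "unknown")
                    metrics["tool_usage"][name] = metrics["tool_usage"].get(name, 0) + 1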

    return metrics

Locating Trace Files

def find_trace_files(trace_dir: str = ".claude-trace") -> List[Path]:
    """Find all trace files, sorted by modification time."""
    trace_path = Path(trace_dir)
    if not trace_path.exists():
        return []
    return sorted(
        trace_path.glob("*.jsonl"),
        key=lambda f: f.stat().st_mtime,
        reverse=True  # Most recent first
    )

Error Handling

Handle common error scenarios gracefully:

def safe_parse_trace_file(path: Path) -> Tuple[List[Dict], List[str]]:
    """Parse trace file with error collection for malformed lines.

    Returns:
        Tuple of (valid_entries, error_messages)
    """
    entries = []
    errors = []

    if not path.exists():
        return [], [f"Trace file not found: {path}"]

    try:
        with open(path, encoding="utf-8") as f:
            for line_num, line in enumerate(f, 1):
                if not line.strip():
                    continue
                try:
                    entry = json.loads(line)
                    entries.append(entry)
                except json.JSONDecodeError as e:
                    errors.append(f"Line {line_num}: Invalid JSON - {e}")
    except PermissionError:
        return [], [f"Permission denied: {path}"]
    except UnicodeDecodeError:
        return [], [f"Encoding error: {path} (expected UTF-8)"]

    return entries, errors


def format_error_report(errors: List[str], path: Path) -> str:
    """Format error report for user display."""
    if not errors:
        return ""

    report = f"""
Trace File Issues
=================
File: {path.name}
Issues found: {len(errors)}

"""
    for error in errors[:10]:  # Limit to first 10
        report += f"- {error}\n"

    if len(errors) > 10:
        report += f"\n... and {len(errors) - 10} more issues"

    return report

Common error scenarios:

| Scenario          | Cause                                | Handling                           |
| ----------------- | ------------------------------------ | ---------------------------------- |
| Empty file        | Session had no API calls             | Report "No data to analyze"        |
| Malformed JSON    | Corrupted trace or interrupted write | Skip line, count in error report   |
| Missing fields    | Older trace format                   | Use .get() with defaults           |
| Permission denied | File locked by another process       | Clear error message, suggest retry |
| Encoding error    | Non-UTF-8 characters                 | Report encoding issue              |

Integration with Existing Tools

Tool Selection Matrix

| Need                               | Use This                            | Why                           |
| ---------------------------------- | ----------------------------------- | ----------------------------- |
| "Why was my session slow?"         | session-replay                      | API latency and token metrics |
| "What did I discuss last session?" | /transcripts                        | Conversation content          |
| "Extract learnings from sessions"  | CodexTranscriptsBuilder             | Knowledge extraction          |
| "Reduce my token usage"            | session-replay + context_management | Metrics + optimization        |
| "Resume interrupted work"          | /transcripts                        | Context restoration           |

vs. /transcripts Command

/transcripts (conversation management):

  • Focuses on conversation content
  • Restores session context
  • Used for context preservation
  • Trigger: "restore session", "continue work", "what was I doing"

session-replay skill (API-level analysis):

  • Focuses on API metrics
  • Analyzes performance and errors
  • Used for debugging and optimization
  • Trigger: "session health", "token usage", "why slow", "debug session"

vs. CodexTranscriptsBuilder

CodexTranscriptsBuilder (knowledge extraction):

  • Extracts patterns from conversations
  • Builds learning corpus
  • Knowledge-focused
  • Trigger: "extract patterns", "build knowledge base", "learn from sessions"

session-replay skill (metrics analysis):

  • Extracts performance metrics
  • Identifies technical issues
  • Operations-focused
  • Trigger: "performance metrics", "error patterns", "tool efficiency"

Combined Workflows

Workflow 1: Diagnose and Fix Token Issues

1. session-replay: Analyze token usage patterns (health action)
2. Identify high-token operations
3. context_management skill: Apply proactive trimming
4. session-replay: Compare before/after sessions (compare action)

Workflow 2: Post-Incident Analysis

1. session-replay: Identify error patterns (errors action)
2. /transcripts: Review conversation context around errors
3. session-replay: Check tool usage around failures (tools action)
4. Document findings in DISCOVERIES.md

Workflow 3: Performance Baseline

1. session-replay: Analyze 5-10 recent sessions (compare action)
2. Establish baseline metrics (tokens, latency, errors)
3. Track deviations from baseline over time
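
Step 3 could be implemented as a simple z-score check against the baseline. This is a sketch only: the function name `flag_deviation` and the threshold of two standard deviations are illustrative choices, not part of the skill.

```python
from statistics import mean, stdev
from typing import List


def flag_deviation(history: List[float], current: float,
                   z_threshold: float = 2.0) -> bool:
    """True if `current` lies more than `z_threshold` sample standard
    deviations away from the mean of the baseline sessions."""
    if len(history) < 2:
        return False  # not enough sessions to estimate a spread
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        return current != baseline
    return abs(current - baseline) / spread > z_threshold


# Total tokens (in thousands) for five baseline sessions.
baseline_tokens = [150, 160, 155, 145, 150]
print(flag_deviation(baseline_tokens, 200))  # far above the baseline
print(flag_deviation(baseline_tokens, 155))  # within normal variation
```

The same check can be applied per metric (tokens, latency, errors), each with its own history list.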

Storage Locations

  • Trace files: .claude-trace/*.jsonl
  • Session logs: ~/.amplihack/.claude/runtime/logs/<session_id>/
  • Generated reports: Output directly (no persistent storage needed)

Philosophy Alignment

Ruthless Simplicity

  • Single-purpose: Analyze trace files only - no session management, no transcript editing
  • No external dependencies: Uses only Python standard library (json, pathlib, datetime)
  • Direct file parsing: No ORM, no database, no complex abstractions
  • Present-moment focus: Analyzes what exists now, no future-proofing

Zero-BS Implementation

  • All functions work completely: Every code example in this skill runs without modification
  • Real parsing, real metrics: No mocked data, no placeholder calculations
  • No stubs or placeholders: If a feature is documented, it works
  • Fail fast on errors: Clear error messages, no silent failures

Brick Philosophy

  • Self-contained analysis: All functionality in this single skill
  • Clear inputs (trace files) and outputs (reports): No hidden state or side effects
  • Regeneratable from this specification: This SKILL.md is the complete source of truth
  • Isolated responsibility: Session analysis only - doesn't modify files or trigger actions

Limitations

This skill CANNOT:

  • Modify trace files: Read-only analysis, no editing or deletion
  • Generate traces: Use the claude-trace npm package to create trace files
  • Restore sessions: Use the /transcripts command for session restoration
  • Monitor in real time: Analyzes completed sessions, not live tracking
  • Analyze across projects: Analyzes traces in the current project only
  • Parse non-JSONL formats: Only the claude-trace JSONL format is supported
  • Access remote traces: Local filesystem only, no cloud storage

Tips for Effective Analysis

  1. Start with health check: Run health action first
  2. Look for patterns: Use errors to find recurring issues
  3. Optimize hot spots: Use tools to find inefficiencies
  4. Track trends: Use compare across sessions
  5. Combine with transcripts: Use /transcripts for context

Common Patterns

Pattern 1: Debug Slow Session

User: My last session was really slow, analyze it

1. Run health action on latest trace
2. Check request latencies
3. Identify tool bottlenecks
4. Report findings with recommendations

Pattern 2: Reduce Token Usage

User: I'm hitting token limits, help me understand usage

1. Compare token usage across sessions
2. Identify high-token operations
3. Suggest context management strategies
4. Recommend workflow optimizations

Pattern 3: Fix Recurring Errors

User: I keep getting errors, find the pattern

1. Run errors action across last 10 sessions
2. Categorize and count error types
3. Identify root causes
4. Provide targeted fixes

Resources

  • Trace directory: .claude-trace/
  • Transcripts command: /transcripts
  • Context management skill: context-management
  • Philosophy: ~/.amplihack/.claude/context/PHILOSOPHY.md

Troubleshooting

No trace files found

Symptom: "No trace files in .claude-trace/"

Causes and fixes:

  1. claude-trace not enabled: Set AMPLIHACK_USE_TRACE=1 before starting the session
  2. Wrong directory: Check that you are in the project root, where the .claude-trace/ directory lives
  3. Fresh project: Run a session with tracing enabled first

Incomplete metrics

Symptom: Missing token counts or zero values

Causes and fixes:

  1. Interrupted session: Trace may be incomplete if session crashed
  2. Streaming responses: Some streaming modes don't capture full metrics
  3. Older trace format: Upgrade claude-trace to latest version

Health score seems wrong

Symptom: Score doesn't match session experience

Understanding the score:

  • 90-100: Excellent - low errors, good efficiency
  • 70-89: Good - minor issues detected
  • 50-69: Fair - significant issues worth investigating
  • Below 50: Poor - likely errors or inefficiencies

Factors in health score:

  • Error rate (40% weight)
  • Token efficiency ratio (30% weight)
  • Request success rate (20% weight)
  • Tool success rate (10% weight)
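
The weights above combine into a single score roughly as follows. The document gives the weights and the rating bands but not how each factor is normalized, so this sketch assumes every input has already been scaled to a fraction between 0 and 1 (with the error rate inverted so that fewer errors score higher). The function names are illustrative.

```python
def health_score(error_rate: float, token_efficiency: float,
                 request_success_rate: float, tool_success_rate: float) -> float:
    """Weighted 0-100 score from four factors given as fractions in [0, 1]."""
    def clamp(x: float) -> float:
        return max(0.0, min(1.0, x))

    return (
        40 * (1 - clamp(error_rate))        # error rate: 40% weight
        + 30 * clamp(token_efficiency)      # token efficiency: 30% weight
        + 20 * clamp(request_success_rate)  # request success: 20% weight
        + 10 * clamp(tool_success_rate)     # tool success: 10% weight
    )


def rating(score: float) -> str:
    """Map a score to the bands listed above."""
    if score >= 90:
        return "Excellent"
    if score >= 70:
        return "Good"
    if score >= 50:
        return "Fair"
    return "Poor"


print(rating(health_score(0.0, 1.0, 1.0, 1.0)))
```

Because the normalization is an assumption, this sketch is not expected to reproduce the 82/100 shown in the sample health report.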

Large trace files

Symptom: Analysis is slow or memory-intensive

Solutions:

  1. Analyze specific time range instead of full file
  2. Use tools action for targeted analysis
  3. Archive old traces: mkdir -p .claude-trace/archive && mv .claude-trace/old-*.jsonl .claude-trace/archive/
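
For very large files, memory use can also be kept flat by streaming entries instead of loading the whole file into a list. A minimal sketch, assuming the record layout shown under the health action (`response.usage.input_tokens` and `output_tokens`); the helper names are illustrative. Filtering by time range would be added inside the loop once the timestamp format of your traces is known.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator


def iter_trace_entries(path: Path) -> Iterator[Dict[str, Any]]:
    """Yield entries one at a time so the file is never fully in memory."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip malformed lines
            if isinstance(entry, dict):
                yield entry


def total_tokens(path: Path) -> int:
    """Sum input and output tokens across all entries in a trace file."""
    total = 0
    for entry in iter_trace_entries(path):
        response = entry.get("response")
        usage = response.get("usage") if isinstance(response, dict) else None
        if isinstance(usage, dict):
            total += usage.get("input_tokens", 0) + usage.get("output_tokens", 0)
    return total
```

Usage: `total_tokens(Path(".claude-trace") / "log-2025-11-23-19-32-36.jsonl")` returns the combined token count for that session.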

Remember

This skill provides session-level debugging and optimization insights. It complements transcript management with API-level visibility. Use it to diagnose issues, optimize workflows, and understand Claude Code behavior patterns.

Key Takeaway: Trace files contain the raw truth about session performance. This skill extracts actionable insights from that data.