debug_agent_traces
Debug props agent traces. Read LLM request/response history, parse tool calls, and "speak with dead" — resurrect a past agent conversation to ask follow-up questions about its decisions.
narrow_matchability
Propose narrowing of grader match_file_restriction for unrestricted TP/FP occurrences in a specimen. Produces verifiable, link-rich output that lets the user confirm each restriction is correct.
props_evaluator
Operate the live props cluster as evaluator — fetch credentials from k8s, call the API at props.allegedly.works, trigger critic/grader runs, and inspect results.
reverse_engineer
Systematic binary reverse engineering toolkit. Extract source code, understand functions, document protocols, compare versions. Uses strings, symbols, disassembly, and differential verification.
test_props
Manual live props deployment testing — sets up Podman infrastructure (postgres, registry, backend) and runs real agent containers. NOT for standard Bazel tests (use `bazel test //props/...` for those).
startup-hook-skill
Create and develop startup hooks for Claude Code on the web. Use when the user wants to set up a repository for Claude Code on the web or create a SessionStart hook so that their project can run tests and linters during web sessions.
backtrace
Show the current task stack and context. Use when user says "bt", "backtrace", "stack", "where are we", or asks about current progress on a multi-step task.
branch_splitter
Split a large branch with many changes into independent, reviewable PRs. Use when preparing a messy development branch for code review, when asked to "split this into PRs", "make this reviewable", "break this up", or when a branch does too many unrelated things. Produces a DAG of branches/PRs that can be reviewed and merged independently.
forensic_surgeon
Deep forensic debugging that never stops until root cause is found or visibility limit is proven. Use when user wants to understand exactly why something is broken, not work around it. Activates on "why is this happening", "dig deeper", "don't work around it", "I want to understand", "find the root cause", "this seems suspicious", or when a problem suggests deeper breakage.
hetzner_vnc_screenshot
Take and view screenshots of Hetzner Cloud servers via the WebSocket VNC console.
proxmox_vm
Interact with Proxmox VMs: take screenshots, send keystrokes, and read network info (user).
session_logs
Discover and analyze Claude Code session logs from ~/.claude/projects, including finding the current session and extracting tool calls, user messages, and conversation history.
superforecaster
Make well-calibrated probability estimates using superforecasting methodology. Use when user asks about probability, likelihood, chance, odds, "will X happen", "when will X happen", "how much will X cost", "what could go wrong", failure modes, risk assessment, forecasting, or any question involving uncertainty and estimation.
forensic-surgeon
Deep forensic debugging that never stops until root cause is found or visibility limit is proven. Use when user wants to understand exactly why something is broken, not work around it. Activates on "why is this happening", "dig deeper", "don't work around it", "I want to understand", "find the root cause", "this seems suspicious", or when a problem suggests deeper breakage.
hetzner-vnc-screenshot
Take and view screenshots of Hetzner Cloud servers via the WebSocket VNC console.
proxmox-vm
Interact with Proxmox VMs: take screenshots, send keystrokes, and read network info (user).
session-logs
Discover and analyze Claude Code session logs from ~/.claude/projects, including finding the current session and extracting tool calls, user messages, and conversation history.