bug-hunt
A correctness sweep that only reports bugs it failed to refute. Finders generate candidates; verifiers try to kill them; survivors make the report. The single biggest failure mode of agent bug-hunting is plausible-but-wrong findings, so verification is not optional.
Read-only. Finding bugs and fixing them are separate engagements.
Lenses
Sweep with each lens. With parallel subagents available, one finder per lens; otherwise sequential passes.
| Lens | Hunting for | |------|-------------| | Logic | Inverted conditions, off-by-one, wrong operator, unreachable branches, broken invariants | | Error handling | Swallowed exceptions, missing error paths, errors that corrupt state before propagating, misleading messages | | Edge cases | Empty/nil/zero inputs, unicode, huge inputs, boundary values, first/last iteration | | Concurrency | Races, missing locks, shared mutable state, TOCTOU, async ordering assumptions | | API misuse | Contract violations against libraries and the project's own interfaces, ignored return values, resource leaks, lifecycle errors |
Focus finders on code that is reachable and load-bearing: entry points, hot paths, recently changed files (git log --since is a good prior). A bug in dead code is info, not a finding.
Verification (mandatory)
Every candidate gets an adversarial pass before it may appear in the report. The verifier's job is to REFUTE the finding, default skeptical:
- Read the actual code path end to end, including callers and guards the finder may have missed.
- Trace a concrete input that triggers the bug. No trigger, no bug.
- Check whether a test, type system, or runtime check already prevents it.
- Verdict: confirmed (with the triggering scenario), refuted (drop silently), or unverifiable (report downgraded one severity, marked
(unverified)).
When tests can be run safely (no external dependencies, sandboxed), a failing reproduction test is the gold standard for confirmation and should be included in the finding as a sketch, not committed.
Report contract
Same spine as line-check so findings compose. Severity: critical (data loss, corruption, security-adjacent) / high (wrong results on common inputs, crashes) / medium (wrong on edge cases) / low (latent, needs unlikely conditions) / info. Effort is the fix cost: S / M / L.
# bug-hunt report: <repo> (<date>)
## Verdict
Paragraph: overall correctness posture, the scariest confirmed bug.
## Scorecard
| Lens | Score (0-5) | Summary |
## Findings
### [SEVERITY] Short imperative title
- **Lens:** which lens found it
- **Where:** file:line
- **What:** the defect, concretely
- **Trigger:** the concrete input or sequence that hits it
- **Why it matters:** consequence
- **Fix:** specific action
- **Effort:** S / M / L
## Backlog
Numbered, leverage-sorted: `N. [SEVERITY/EFFORT] title (lens)`
## Not checked
Lenses or areas skipped and why; candidates that were refuted (count only).
Common mistakes
- Reporting finder output without verification. Half of plausible candidates die under a skeptical read.
- "This could be a problem if..." findings with no trigger. A bug without a triggering input is a hypothesis.
- Treating style issues as bugs. Wrong formatting never corrupted data.
- Stopping at the first confirmed bug in a file. Bugs cluster; finish the file.