Code Review: Current Or Branch Diff Skill

Codex compatibility note:

Invoke repository skills with $skill-name in Codex; this mirrored copy rewrites legacy Claude /skill-name references.

Prefer the plan-hard skill for planning guidance in this Codex mirror.

Task tracker mandate: BEFORE executing any workflow or skill step, create/update task tracking for all steps and keep it synchronized as progress changes.

User-question prompts mean to ask the user directly in Codex.

Ignore Claude-specific mode-switch instructions when they appear.

Strict execution contract: when a user explicitly invokes a skill, execute that skill protocol as written.

Subagent authorization: when a skill is user-invoked or AI-detected and its protocol requires subagents, that skill activation authorizes use of the required spawn_agent subagent(s) for that task.

Do not skip, reorder, or merge protocol steps unless the user explicitly approves the deviation first.

For workflow skills, execute each listed child-skill step explicitly and report step-by-step evidence.

If a required step/tool cannot run in this environment, stop and ask the user before adapting.

Codex Project-Reference Loading (No Hooks)

Codex does not receive Claude hook-based doc injection. When coding, planning, debugging, testing, or reviewing, open project docs explicitly using this routing.

Always read:

docs/project-config.json (project-specific paths, commands, modules, and workflow/test settings)
docs/project-reference/docs-index-reference.md (routes to the full docs/project-reference/* catalog)
docs/project-reference/lessons.md (always-on guardrails and anti-patterns)

Situation-based docs:

Backend/CQRS/API/domain/entity changes: backend-patterns-reference.md, domain-entities-reference.md, project-structure-reference.md
Frontend/UI/styling/design-system: frontend-patterns-reference.md, scss-styling-guide.md, design-system/README.md
Spec/test-case planning or TC mapping: feature-docs-reference.md
Integration test implementation/review: integration-test-reference.md
E2E test implementation/review: e2e-test-reference.md
Code review/audit work: code-review-rules.md plus domain docs above based on changed files

Do not read all docs blindly. Start from docs-index-reference.md, then open only relevant files for the task.

[FINAL PURPOSE REMINDER — MUST ATTENTION CRITICAL]

Ensure the changes are reasonable and free of potential bugs or flaws. Think critically and thoroughly.

[BLOCKING] Execute skill steps in declared order. NEVER skip, reorder, or merge steps without explicit user approval.
[BLOCKING] Before each step or sub-skill call, update task tracking: set in_progress when step starts, set completed when step ends.
[BLOCKING] Every completed/skipped step MUST include brief evidence or explicit skip reason.
[BLOCKING] If Task tools are unavailable, create and maintain an equivalent step-by-step plan tracker with the same status transitions.

Quick Summary

Goal: Comprehensive review of current diffs following project standards. No flaws, no bugs, no missing updates, no stale content. Applies to uncommitted work, staged changes, branch-to-branch diffs, and any project type — code, docs, config, infrastructure, or non-coding artifacts.

Workflow:

Phase 0: Blast Radius — Call $graph-blast-radius skill FIRST (if .code-graph/graph.db exists)
Phase 0.3: Change Types — Detect high-risk change types; create risk tasks
Phase 0.5: Plan Compliance — Verify against active plan (conditional)
Phase 0.7: Surface Detection — AI categorizes changed files; creates dimension tasks
Phase 1: Collect — Run git status/diff, create report file
Phase 2: File Review — Review each changed file, update report incrementally
Phase 3: Holistic — Spawn fresh-context sub-agent for unbiased holistic assessment
Phase 4: Finalize — Generate critical issues, recommendations, suggested commit message
Phase 5: Docs Triage — Invoke $docs-update if staleness detected

Key Rules:

Report-driven: ALWAYS write findings to plans/reports/code-review-{date}-{slug}.md
MUST ATTENTION create todo tasks for ALL phases before starting
Skeptical: every claim needs file:line proof
Verify convention by grepping 3+ existing examples before flagging violations
Actively check DRY violations, YAGNI/KISS over-engineering, correctness bugs
Cross-reference changed files against related docs — flag stale docs, test specs, READMEs

MANDATORY IMPORTANT MUST ATTENTION Plan ToDo Task to discover and READ project-specific reference docs:

Search for code standards docs: *code-review*, *patterns*, *conventions*, *style-guide* — read any found

Search for architecture docs: *architecture*, *adr-*, README.md at service/module roots

Look for docs referencing changed technology areas (backend, frontend, infra, etc.)

Read docs most relevant to the categories of files changed

Prerequisites: MUST ATTENTION READ before executing:

Critical Purpose: Ensure quality — no flaws, no bugs, no missing updates, no stale content. Verify both artifacts AND documentation.

External Memory: For complex or lengthy work (research, analysis, scan, review), write intermediate findings and final results to a report file in plans/reports/ — prevents context loss and serves as deliverable.

Evidence Gate: MANDATORY IMPORTANT MUST ATTENTION — every claim, finding, and recommendation requires file:line proof or traced evidence with confidence percentage (>80% to act, <80% must verify first).

OOP & DRY Enforcement: MANDATORY IMPORTANT MUST ATTENTION — flag duplicated patterns that should be extracted to a base class, generic, or helper. Classes that belong to the same group or share a naming suffix MUST ATTENTION inherit a common base (even if it is empty now, since it enables future shared logic and child overrides). Verify the project has a code linter/analyzer configured for the stack.

Code Review: Current Or Branch Diff

Comprehensive review of current changes or explicit branch/commit diffs following project standards.

Review Scope

Target: Current working-tree changes by default, or an explicit branch/tag/commit diff when the user asks to review a branch comparison.

Use these sources:

Current changes: git status, git diff, and git diff --cached
Branch diff: git diff <base>...<head> plus git diff --name-only <base>...<head>
Commit range: git diff <base>..<head> plus git diff --name-only <base>..<head>
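As a rough sketch of how the three sources above map to commands, the helper below returns the git invocations for a given scope. The function name `diff_commands` and the scope labels `current`, `branch`, and `range` are illustrative assumptions, not part of this skill; only the git commands themselves come from the list above.

```python
from typing import List, Optional


def diff_commands(scope: str, base: Optional[str] = None,
                  head: Optional[str] = None) -> List[List[str]]:
    """Return the git commands to run for a review scope (hypothetical helper)."""
    if scope == "current":
        return [["git", "status"], ["git", "diff"], ["git", "diff", "--cached"]]
    if scope not in ("branch", "range"):
        raise ValueError(f"unknown scope: {scope}")
    if not base or not head:
        raise ValueError("branch and range scopes need both base and head")
    # Three dots diff head against the merge base of base and head;
    # two dots diff the two endpoints directly.
    dots = "..." if scope == "branch" else ".."
    spec = f"{base}{dots}{head}"
    return [["git", "diff", spec], ["git", "diff", "--name-only", spec]]
```

The three-dot form is usually what a branch review wants, because it hides changes that landed on the base branch after the feature branch was cut.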

Review Mindset (NON-NEGOTIABLE)

Be skeptical. Apply critical thinking, sequential thinking. Every claim needs traced proof, confidence >80%.

Do NOT accept correctness at face value — verify by reading actual implementations
Every finding MUST include file:line evidence (grep results, read confirmations)
Cannot prove claim with trace → do NOT include in report
Question assumptions: "Does this actually work?" → trace call path to confirm
Challenge completeness: "Is this all?" → grep related usages
Verify side effects: "What else does this change break?" → check consumers and dependents
No "looks fine" without proof — state what was verified and how
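One way to picture the evidence rule is as a gate applied to every finding before it reaches the report. The sketch below is an assumption-laden illustration: the `Finding` class, the `file:line` regular expression, and the three disposition labels are hypothetical, and treating exactly 80% as "verify first" is a choice made here because the rule only says ">80%".

```python
import re
from dataclasses import dataclass

# Hypothetical shape check for evidence such as "src/app.py:42".
EVIDENCE_PATTERN = re.compile(r"^[^:\s]+:\d+$")
ACT_THRESHOLD = 80  # percent; findings must be strictly above this to be reported


@dataclass
class Finding:
    claim: str
    evidence: str    # expected in file:line form
    confidence: int  # percent, 0-100

    def has_proof(self) -> bool:
        return bool(EVIDENCE_PATTERN.match(self.evidence))

    def disposition(self) -> str:
        if not self.has_proof():
            return "drop"          # no traceable proof, keep it out of the report
        if self.confidence > ACT_THRESHOLD:
            return "report"
        return "verify-first"      # proof exists but confidence is too low to act
```

The pattern is deliberately strict and would reject paths containing spaces or drive letters; a real project would adapt it to its own path conventions.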

Core Principles (ENFORCE ALL)

YAGNI — Flag code solving hypothetical future problems (unused parameters, speculative interfaces, premature abstractions)
KISS — Flag unnecessarily complex solutions. "Is there a simpler way meeting same requirement?"
DRY — Actively grep for similar/duplicate code before accepting new code. 3+ similar patterns → flag for extraction.
Clean Code — Readable > clever. Names reveal intent. Functions do one thing. No deep nesting.
Follow Convention — Before flagging ANY pattern violation, grep for 3+ existing examples. Codebase convention wins over textbook rules.
No Flaws/No Bugs — Trace logic paths. Verify edge cases (null, empty, boundary values). Check error handling covers failure modes.
Proof Required — Every claim backed by file:line evidence or grep results. Speculation FORBIDDEN.
Doc Staleness — Cross-reference changed files against related docs (feature docs, test specs, READMEs). Flag stale or missing updates.

Run python .claude/scripts/code_graph batch-query <f1> <f2> --json on changed files for test coverage and caller impact.

Blast Radius Pre-Analysis (MANDATORY FIRST STEP)

IMPORTANT MANDATORY MUST ATTENTION: FIRST action in every review. Call $graph-blast-radius BEFORE any other review work.

If .code-graph/graph.db exists, run graph-blast-radius analysis before reviewing changes:

Call $graph-blast-radius skill (runs python .claude/scripts/code_graph blast-radius --json)
Include in review: impacted files count, untested changes, risk level based on blast radius size
Use results to prioritize file review order (highest-impact files first)

Graph-Assisted Change Review

For each changed file, trace full impact:

python .claude/scripts/code_graph trace <changed-file> --direction downstream --json — all files affected by changes
Flag any affected file NOT covered by tests
Catches cross-service impact simple diff review misses

Review Approach (Report-Driven Two-Phase — CRITICAL)

MANDATORY FIRST: Create Todo Tasks for Review Phases Before starting, call task tracking with:

[ ] [Review Phase 0] Run $graph-blast-radius to analyze change impact - in_progress (MUST ATTENTION BE FIRST)
[ ] [Review Phase 0.3] Detect high-risk change types, create risk tasks - pending
[ ] [Review Phase 0.7] Categorize changed files, create dimension review tasks - pending
[ ] [Review Phase 0.5] Plan compliance check (skip if no active plan) - pending
[ ] [Review Phase 1] Get changes and create report file - pending
[ ] [Review Phase 2] Review file-by-file and update report - pending
[ ] [Review Phase 3] Spawn fresh-context sub-agent for holistic assessment - pending
[ ] [Review Phase 4] Generate final review findings - pending
[ ] [Review Phase 5] Run $docs-update if staleness detected - pending

Update todo status as each phase completes.
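When Task tools are unavailable, the same status transitions can be kept in a small in-memory tracker. The sketch below is only an illustration of that fallback: the `Tracker` class, the `skipped` status name, and the transition table are assumptions layered on the pending / in_progress / completed states used above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical transition table: a step starts pending and must pass
# through in_progress before it can be completed.
ALLOWED = {
    "pending": {"in_progress", "skipped"},
    "in_progress": {"completed", "skipped"},
}


@dataclass
class Tracker:
    status: Dict[str, str] = field(default_factory=dict)
    evidence: Dict[str, str] = field(default_factory=dict)

    def add(self, step: str) -> None:
        self.status[step] = "pending"

    def move(self, step: str, new_status: str, evidence: Optional[str] = None) -> None:
        current = self.status[step]
        if new_status not in ALLOWED.get(current, set()):
            raise ValueError(f"{step}: {current} -> {new_status} is not allowed")
        # Completed and skipped steps must carry evidence or a skip reason.
        if new_status in ("completed", "skipped") and not evidence:
            raise ValueError(f"{step}: evidence or skip reason required")
        self.status[step] = new_status
        if evidence:
            self.evidence[step] = evidence
```

Rejecting a direct pending-to-completed jump is what keeps the tracker honest about steps that were never actually started.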

Note: If Phase 1 reveals 10+ changed files, replace Phase 2-4 tasks with Systematic Review Protocol tasks: [Review Phase 2] Categorize and fire parallel sub-agents, [Review Phase 3] Synchronize and cross-reference, [Review Phase 4] Generate consolidated report

Phase 0: Run Graph Blast Radius Analysis (MANDATORY FIRST STEP)

IMPORTANT MANDATORY MUST ATTENTION: FIRST action before ANY other review work.

MUST ATTENTION Call $graph-blast-radius skill
MUST ATTENTION Record in report: changed files count, impacted files count, untested changes, risk level
MUST ATTENTION Use blast radius output to prioritize which files to review most carefully in Phase 2
If .code-graph/graph.db does not exist, note "Graph not available — skipping blast radius" and proceed to Phase 0.3
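The conditional in Phase 0 can be sketched as a simple existence check. The function name `blast_radius_command` is hypothetical; the database path and the command line are the ones quoted in this document, and the sketch assumes the command is run from the repository root.

```python
from pathlib import Path
from typing import List, Optional

GRAPH_DB = Path(".code-graph") / "graph.db"


def blast_radius_command(repo_root: Path) -> Optional[List[str]]:
    """Return the blast-radius command, or None when the graph database is missing."""
    if not (repo_root / GRAPH_DB).exists():
        # Caller notes that the graph is not available and proceeds to Phase 0.3.
        return None
    return ["python", ".claude/scripts/code_graph", "blast-radius", "--json"]
```

Returning `None` rather than raising keeps the missing-graph case a normal, logged skip instead of an error.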

Phase 0.3: Change Type Detection + Risk Tasks (MANDATORY)

Purpose: Identify HIGH-RISK change types in this diff before dimensional review. Each detected type creates a focused risk task. Change types are ORTHOGONAL to file category: the same file can be both a migration AND a security change — detect all independently.

Step 1: Detect change types

git diff --name-only HEAD       # staged + unstaged changes relative to HEAD
git diff --cached --name-only   # staged
# For branch or commit-range review, use the user-provided diff source:
git diff --name-only <base>...<head>

Evaluate each change type for this diff:

| Change Type | Detection Signal (adapt to project's actual conventions) | TRUE if... |
| --- | --- | --- |
| DepUpgrade | Dependency manifest changed (package.json, *.csproj, Gemfile, go.mod, requirements.txt, Cargo.toml, pom.xml, etc.) | A version number changed in any dependency manifest |
| Migration | File path or name suggests schema change (contains migration, schema, alter_table, or matches project's migration convention) | Any migration-convention file appears in the diff |
| BusEvent | New or modified event/message definition or consumer (infer from project conventions: consumer naming, message type directories) | A consumer or event class is new or its contract changed |
| ApiContract | API definition file changed (controller, route handler, OpenAPI/GraphQL schema) with route or field differences | Diff shows route/action/field additions or removals |
| SecurityChange | Auth/permission definition changed — infer from project conventions (auth middleware, permission constants, policy definitions) | Any auth or permission gate is added, removed, or changed |
| ConfigChange | Configuration files changed (e.g., *.json, *.yaml, *.env*, *Config*, *Options*, *Settings*, *.toml) | Any config-convention file appears |
| InfraChange | Infrastructure definition changed (Dockerfile, docker-compose*.yml, CI/CD pipelines, k8s manifests, IaC files) | Any infra-convention file appears |

Record in report:

## Change Type Analysis
DepUpgrade: [YES/NO] | Migration: [YES/NO] | BusEvent: [YES/NO]
ApiContract: [YES/NO] | SecurityChange: [YES/NO] | ConfigChange: [YES/NO] | InfraChange: [YES/NO]
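Some of the signals in the detection table are purely name-based and can be pre-screened mechanically. The sketch below covers only those: the function name `candidate_change_types` is hypothetical, the patterns are copied from the table, and the result is a list of candidates. BusEvent, ApiContract, and SecurityChange depend on project conventions and diff content, and DepUpgrade still requires confirming that a version number actually changed.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import Dict, Iterable, List

# Name-based signals only; patterns are taken from the detection table.
NAME_SIGNALS: Dict[str, List[str]] = {
    "DepUpgrade": ["package.json", "*.csproj", "Gemfile", "go.mod",
                   "requirements.txt", "Cargo.toml", "pom.xml"],
    "InfraChange": ["Dockerfile", "docker-compose*.yml"],
    "ConfigChange": ["*.json", "*.yaml", "*.env*", "*Config*", "*Options*",
                     "*Settings*", "*.toml"],
}
MIGRATION_HINTS = ("migration", "schema", "alter_table")


def candidate_change_types(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Map each candidate change type to the files that triggered it."""
    hits: Dict[str, List[str]] = {}
    for path in paths:
        name = PurePosixPath(path).name
        # Types are orthogonal, so one file may trigger several of them.
        for change_type, patterns in NAME_SIGNALS.items():
            if any(fnmatchcase(name, pattern) for pattern in patterns):
                hits.setdefault(change_type, []).append(path)
        if any(hint in path.lower() for hint in MIGRATION_HINTS):
            hits.setdefault("Migration", []).append(path)
    return hits
```

Note that `package.json` lands in both DepUpgrade and ConfigChange here, which matches the rule that change types are detected independently.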

Step 2: Create change-type risk tasks (ALWAYS before any review work)

MANDATORY: Call task tracking for each TRUE signal. Do NOT create tasks for FALSE signals. The concerns listed are starting points — apply domain knowledge beyond them.

| Condition | task tracking subject | Key concerns to investigate (starting points — expand with domain knowledge) |
| --- | --- | --- |
| DepUpgrade TRUE | [Review-DepUpgrade] Dependency upgrade — semver, breaking changes, security advisories | Major/minor/patch? Read upstream CHANGELOG for breaking API changes. Grep deprecated API usage. Check transitive dependency changes. Known security advisories for new version? Peer dependency compatibility? Tests still passing? |
| Migration TRUE | [Review-Migration] DB migration — rollback path, volume impact, zero-downtime | Rollback/Down script exists? Table size estimate — large tables need lock analysis. NOT NULL column without default on non-empty table? Indexes created with no-lock option? Deployment ordering (before/after service deploy)? Backfill idempotent if run twice? |
| BusEvent TRUE | [Review-BusEvent] Cross-service event/message — consumer, idempotency, retry, poison pill | Consumer exists for new event? Retry strategy: prerequisite data not synced → wait-retry vs silent skip? Handler safe to run twice (idempotency)? Malformed message handling / dead-letter configured? Ordering assumptions vs broker guarantees? |
| ApiContract TRUE | [Review-ApiContract] API contract change — backward compat, client alignment, auth | Additive or breaking? Breaking → versioning or coordinated deploy required. All callers (UI, other services, tests) still compatible? New endpoint protected appropriately? No required response fields added without client update? |
| SecurityChange TRUE | [Review-SecurityChange] Security/permission change — all paths covered, no privilege escalation | All code paths reaching the gate covered? Negative test verifying unauthorized access DENIED? Privilege escalation possible? BOTH enforcement AND display control updated? Permission definition in single authoritative place (no duplicated strings risking drift)? |
| ConfigChange TRUE | [Review-ConfigChange] Config/env change — all environments, no secrets committed | New config key present in ALL environment configs? Hardcoded default masking missing production config? Any secret value in the diff? → CRITICAL if yes. Documented in setup guide? App fails fast if config missing? |
| InfraChange TRUE | [Review-InfraChange] Infrastructure change — env parity, no dev values in prod, reproducible build | Change affects all environments consistently? Hardcoded dev values (localhost, debug flags, dev credentials)? Pinned image/dependency versions? Local dev impact documented? CI/CD secret/permission requirements documented? |

Step 3: Work through change-type tasks before dimensional review

For each created change-type task:

Set task to in_progress
Work through ALL applicable concerns — the table above is a starting point, not a ceiling
For each concern: cite file:line for PASS or describe finding for FAIL/WARN
Write findings under ## {Task Subject} Findings in report
Set task to completed

IMPORTANT: Complete ALL change-type tasks FIRST, then proceed to Phase 0.7. If no change-type signals detected, log "No high-risk change types detected" and proceed.

Phase 0.7: Change Surface Detection + Dynamic Review Tasks (MANDATORY)

Purpose: Let AI categorize the changes by nature and create review tasks accordingly. Do NOT assume fixed categories — derive them from what the project's actual changed files are. Think, don't classify into a preset grid. The AI owns this step entirely.

Step 1: Derive categories from the diff

git diff --name-only HEAD        # staged + unstaged changes relative to HEAD
git diff --cached --name-only    # staged
# For branch or commit-range review, use the user-provided diff source:
git diff --name-only <base>...<head>

For each changed file, infer its category by examining:

Language/extension: What technology or domain does this file belong to?
Directory semantics: What layer, module, or concern does this path represent in the project?
Change nature: Is this logic, data schema, configuration, documentation, infrastructure, styling, testing, or tooling?

Do NOT map to fixed buckets. Derive categories that fit THIS project's actual structure and vocabulary.

Common category types to consider as starting points (not exhaustive — derive what fits):

Server-side logic — business rules, API handlers, services, consumers, event processors
Client-side logic — UI components, state management, API integration
Data/Schema — migrations, schemas, seed data, domain models
Styles/Assets — CSS/SCSS, design tokens, images, fonts
Configuration — app settings, env vars, feature flags
Infrastructure — Docker, CI/CD, pipelines, cloud manifests
Documentation/Specs — markdown docs, ADRs, feature specs, test specs
Tests — unit, integration, E2E test files
Build/Tooling — build scripts, linters, formatters, bundlers, agent scripts
Security — auth config, permission definitions, certificates

Record in report:

## Change Surface
{Category name} ({category type}): {N} files
{Category name} ({category type}): {M} files
...
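Deriving the categories is a judgment step, but once each file has been assigned, rendering the section is mechanical. The sketch below assumes a hypothetical mapping from file path to a `(category name, category type)` pair and simply counts and prints it in the template's shape; the function name and the sample category names in the usage note are made up for illustration.

```python
from collections import Counter
from typing import Dict, Tuple


def render_change_surface(assignments: Dict[str, Tuple[str, str]]) -> str:
    """Render the Change Surface section from file -> (category name, category type)."""
    counts = Counter(assignments.values())
    lines = ["## Change Surface"]
    # Sort for a stable report; the template does not prescribe an order.
    for (name, kind), n in sorted(counts.items()):
        lines.append(f"{name} ({kind}): {n} files")
    return "\n".join(lines)
```

For example, two files assigned to `("Orders API", "Server-side logic")` and one to `("Root docs", "Documentation/Specs")` produce one line per category with counts 2 and 1.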

Step 2: For each category, enumerate concerns and create a task

This is where you THINK, not fill in blanks. Apply SYNC:category-review-thinking for each category.

For EACH identified category:

Understand the domain: What is this category's purpose? What invariants govern it? Who depends on it?
Read project conventions: Grep for style guides, patterns docs, READMEs specific to this area
Derive concerns from first principles — DO NOT limit to any fixed list; trust your domain knowledge
Create a task tracking task named [Review-{Category}] {brief concern summary} listing derived concerns
Select the appropriate sub-agent type (see Sub-Agent Type Selection)

ALWAYS create: [Review-General] — universal quality: correctness, YAGNI/KISS/DRY, doc staleness, test coverage. Runs across ALL changed files regardless of other categories.

Sub-Agent Type Selection:

| Category Nature | subagent_type |
| --- | --- |
| Code logic (any stack) | code-reviewer |
| Security, auth, permissions | security-auditor |
| Performance, query efficiency, latency | performance-optimizer |
| Documentation, plans, specs, ADRs | general-purpose |
| Infrastructure, CI/CD, config | general-purpose |
| Mixed or default | code-reviewer |
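The selection table reduces to a lookup with a default. In the sketch below the short keys (`code`, `security`, and so on) are hypothetical labels for the category natures; the `subagent_type` values are the ones listed in the table.

```python
# Hypothetical short labels mapped to the subagent_type values from the table.
SUBAGENT_BY_NATURE = {
    "code": "code-reviewer",
    "security": "security-auditor",
    "performance": "performance-optimizer",
    "documentation": "general-purpose",
    "infrastructure": "general-purpose",
}


def select_subagent(nature: str) -> str:
    """Return the subagent_type; mixed or unrecognized natures fall back to code-reviewer."""
    return SUBAGENT_BY_NATURE.get(nature.lower(), "code-reviewer")
```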

Step 3: Work through tasks in order

For each created task:

Set task to in_progress before starting
Review ONLY files in that category's scope
Apply SYNC:category-review-thinking — trust your domain knowledge beyond the examples there
Write findings to report under ## {Task Subject} Findings section
Set task to completed before starting next task

NEVER mark a dimension task completed by scanning. Work through each relevant file explicitly. For large categories (10+ files): escalate to a parallel sub-agent using the Systematic Review Protocol.

Phase 0.5: Plan Compliance Check (CONDITIONAL — only when active plan exists)

Check ## Plan Context in injected context:

If "Plan: none" → skip, log "No active plan — skipping plan compliance"
If "Plan: {path}" → load plan and verify:

Read {plan-path}/plan.md — get phase list and scope
Read relevant phase files — extract files to modify, test specifications, success criteria
Verify:
- MUST ATTENTION verify Scope match — changed files listed in plan phases (warn on unplanned files)
- MUST ATTENTION verify Test evidence — tests mapped to completed phases have evidence (file:line), not "TBD"
- MUST ATTENTION verify Success criteria met — phase success criteria satisfied by changes
Add "Plan Compliance" section to review report

Phase 1: Get Changes and Create Report File

MUST ATTENTION Identify diff source: current working tree, staged changes, branch comparison, or commit range
MUST ATTENTION Run git status for current changes, or git diff --name-only <base>...<head> for branch comparisons
MUST ATTENTION Run git diff or git diff <base>...<head> to see actual changes
MUST ATTENTION Create plans/reports/code-review-{date}-{slug}.md
MUST ATTENTION Initialize with Scope, Files to Review, Blast Radius Summary sections
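A minimal sketch of the report bootstrap follows. The helper names are hypothetical, and two details are assumptions because the document leaves them open: `{date}` is rendered as an ISO date, and `{slug}` is derived by lowercasing a title and replacing runs of non-alphanumeric characters with hyphens.

```python
import re
from datetime import date
from pathlib import Path
from typing import List


def report_path(title: str, today: date) -> Path:
    """Build plans/reports/code-review-{date}-{slug}.md (date and slug formats assumed)."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return Path("plans/reports") / f"code-review-{today.isoformat()}-{slug}.md"


def initial_report(scope: str, files: List[str]) -> str:
    """Return the starting report body with the three required sections."""
    lines = ["## Scope", scope, "", "## Files to Review"]
    lines += [f"- {path}" for path in files]
    lines += ["", "## Blast Radius Summary", "(filled in from Phase 0)"]
    return "\n".join(lines)
```

Creating the file this early means later phases only ever append findings, which is what makes the review resumable after context loss.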

Phase 2: File-by-File Review (Build Report Incrementally)

For EACH changed file, read and immediately update report with:

File path and change type (added/modified/deleted)
Change Summary: what was modified or added
Purpose: why the change exists
Convention check: Grep 3+ similar patterns — does new code follow existing convention?
Correctness check: Trace logic paths — handles null, empty, boundary values, error cases?
DRY check: Grep similar/duplicate code — does this logic already exist elsewhere?
Intention check: Does change serve stated purpose? Flag unrelated modifications
Logic trace: Trace one happy path + one error path. Logic matches requirements?
Semantic correctness: Does the artifact DO what it's supposed to?
Issues Found: naming, typing, responsibility, patterns, bugs, over-engineering, logic errors
Continue to next file, repeat

Phase 3: Second-Round Review (Conditional Protocol — branch on Phase 0.7 surface)

Protocol: SYNC:double-round-trip-review + SYNC:fresh-context-review + SYNC:review-protocol-injection (all inlined above). INVARIANT: Phase 3 fires a fresh sub-agent ONLY after a fix cycle. If Phase 2 finds zero issues, the review ENDS — no Phase 3 needed. If Phase 2 finds issues, fix them, then Phase 3 fresh sub-agent re-review is mandatory.

Check categories from Phase 0.7 — if multiple distinct domains changed (e.g., server-side + client-side), run Synthesis Mode. Otherwise run Holistic Mode.

[SYNTHESIS MODE — when multiple distinct domains changed]

Spawn a Synthesis Agent as Round 2. Purpose: catch cross-boundary issues individual dimensional tasks cannot see.

When constructing Agent call prompt:

Copy Agent call shape from SYNC:review-protocol-injection template verbatim, agent_type: "code-reviewer"
Embed all 10 universal SYNC blocks verbatim

Set Task as:

Synthesis review — cross-boundary concerns ONLY across the changed domains in this diff.
You have these dimensional findings as context: {summary from each dimensional task}.
Re-read ALL changed files from scratch via your own tool calls.

Focus ONLY on cross-boundary concerns — do NOT re-review each domain's internals:
1. Contract Alignment: Do callers match what callees expose? (routes, parameters, field names, types)
2. Data Consistency: Are field names/types consistent across layer boundaries?
3. Security Boundary: Is auth enforced on BOTH sides (enforcement AND display control)?
4. Cross-Layer Naming: Same concept named differently across layers?
5. Missing Wiring: New producer with no consumer? New consumer with no producer? New feature with no doc?
6. Documentation: Docs reflect changes in BOTH domains together?

Set Target Files as "use the selected diff source from Phase 1"
Set report path as plans/reports/synthesis-review-{date}.md

After sub-agent returns:

Read synthesis report
Integrate findings as ## Synthesis Round Findings in main report — DO NOT filter or override
If FAIL: fix issues, spawn NEW synthesis sub-agent (new Agent call)
Max 3 fresh rounds — escalate via a direct user question if still failing

[HOLISTIC MODE — when single domain changed]

No cross-boundary synthesis needed. Spawn standard holistic Round 2.

When constructing Agent call prompt:

Copy Agent call shape from SYNC:review-protocol-injection template verbatim
Select subagent_type based on domain's dominant concern (see Sub-Agent Type Selection)
Set Task as: "Review the selected diff holistically. Focus on big picture — overall technical approach coherence, architecture layers, logic placement (lowest layer), DRY violations, YAGNI/KISS, function complexity. Domain: {category from Phase 0.7} — apply domain knowledge for this category accordingly."
Set Target Files as "use the selected diff source from Phase 1"
Set report path as plans/reports/code-review-changes-round{N}-{date}.md

After sub-agent returns:

Read sub-agent's report
Integrate findings as ## Round {N} Findings (Fresh Sub-Agent) in main report — DO NOT filter or override
If FAIL: fix issues, spawn NEW Round N+1 fresh sub-agent (new Agent call — never reuse Round 2's agent)
Max 3 fresh rounds — escalate to user via a direct user question if still failing
Final verdict must incorporate findings from ALL rounds
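The round limit in both modes can be pictured as a bounded loop. This is a sketch under stated assumptions: `spawn_review` and `apply_fixes` are hypothetical callables standing in for a new sub-agent call and the fix cycle, and round numbering starts at 2 because the main review counts as Round 1.

```python
from typing import Callable

MAX_FRESH_ROUNDS = 3


def run_fresh_rounds(spawn_review: Callable[[int], bool],
                     apply_fixes: Callable[[int], None]) -> str:
    """Run up to MAX_FRESH_ROUNDS fresh reviews; return 'pass' or 'escalate'."""
    for i in range(MAX_FRESH_ROUNDS):
        round_number = 2 + i
        # Each call stands for a NEW sub-agent; an earlier agent is never reused.
        if spawn_review(round_number):
            return "pass"
        if i < MAX_FRESH_ROUNDS - 1:
            apply_fixes(round_number)
    # Still failing after the last allowed round: ask the user directly.
    return "escalate"
```

Passing the round number into each callable makes it easy to write per-round report files such as the round{N} path described above.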

The following checks are handled by sub-agent but can be verified in Phase 4:

Clean Code & Over-engineering Checks:

MUST ATTENTION YAGNI: Code solving hypothetical future problems? Unused params, speculative interfaces?
MUST ATTENTION KISS: Unnecessarily complex solution? Could this be simpler while meeting the same requirement?
MUST ATTENTION Function complexity: Methods too long? Nesting too deep? Multiple responsibilities?
MUST ATTENTION Readability: Would a new team member understand without reading the full implementation?

Documentation Staleness Check (REQUIRED):

For each changed file, identify related documentation:

Search for feature docs, architecture references, READMEs at module/service roots, API docs, test specs, setup guides
Flag any doc where content no longer matches the changed artifact
Flag missing docs for new features or components that should be documented
Do NOT auto-fix — flag in report with specific stale section and what changed

Correctness & Bug Detection: Apply SYNC:bug-detection — null safety, boundaries, error handling, resource cleanup, concurrency.

Test Spec Verification: Apply SYNC:test-spec-verification — locate specs, verify coverage, flag gaps.

Integration Test Sync: Apply SYNC:integration-test-sync-check — surface missing tests via a direct user question.

Translation Sync: Apply SYNC:translation-sync-check — for multilingual UI text changes, require translation updates or explicit user risk acceptance.

Phase 4: Generate Final Review Result

Update report with final sections:

MUST ATTENTION Overall Assessment (big picture summary)
MUST ATTENTION Critical Issues (must fix before merge)
MUST ATTENTION High Priority (should fix)
MUST ATTENTION Architecture Recommendations
MUST ATTENTION Documentation Staleness (list stale docs with what changed, or "No doc updates needed")
MUST ATTENTION Positive Observations
MUST ATTENTION Suggested commit message (based on changes)

Phase 5: Docs-Update Triage (CONDITIONAL)

If Documentation Staleness Check in Phase 4 identified stale docs:

Invoke $docs-update skill to update impacted documentation
If $docs-update produces changes, include in review summary
If no staleness detected, skip: "No doc updates needed — staleness check was clean"

Readability Checklist (MUST ATTENTION evaluate)

Before approving, verify artifacts are easy to read, maintain, and understand:

Schema visibility — Function computes data structure? Comment shows output shape so readers don't trace code
Non-obvious data flows — Data transforms through multiple steps? Brief comment explains pipeline
Self-documenting signatures — Params explain their role; flag unused params
Magic values — Unexplained numbers/strings → named constants or inline rationale
Naming clarity — Variables/functions reveal intent without reading implementation

Review Checklist

1. Architecture Compliance

MUST ATTENTION Follows project's layer/module boundaries (read docs/project-config.json or equivalent)
MUST ATTENTION No cross-module/service direct data access where boundaries exist
MUST ATTENTION Logic placed in lowest responsible layer (not in orchestrators/top-layer classes)

2. Code Quality & Clean Code

MUST ATTENTION Single Responsibility Principle — each function/class does ONE thing
MUST ATTENTION No code duplication (DRY) — grep for similar code, extract if 3+ occurrences
MUST ATTENTION Appropriate error handling following project patterns
MUST ATTENTION No magic numbers/strings (extract to named constants)
MUST ATTENTION Type annotations on all functions (where language requires)
MUST ATTENTION Early returns/guard clauses used
MUST ATTENTION YAGNI — no speculative features, unused parameters, premature abstractions
MUST ATTENTION KISS — simplest solution meeting requirement
MUST ATTENTION Follows existing codebase conventions (verify with grep for 3+ examples)

2.5. Naming Conventions

MUST ATTENTION Names reveal intent (WHAT not HOW)
MUST ATTENTION Specific names, not generic (employeeRecords not data)
MUST ATTENTION Booleans: prefix with state-indicating verb (isActive, hasPermission, canEdit)
MUST ATTENTION No cryptic abbreviations

3. Project-Specific Patterns

MUST ATTENTION Read project's patterns/conventions reference docs BEFORE flagging violations
MUST ATTENTION Verify 3+ existing examples before concluding a pattern is a violation
MUST ATTENTION Flag deviation from project patterns with evidence (file:line showing existing pattern)

4. Security

MUST ATTENTION No hardcoded credentials, tokens, or secrets
MUST ATTENTION Proper authorization checks at all entry points
MUST ATTENTION Input validation at system boundaries (user input, external APIs, message payloads)
MUST ATTENTION No injection risks (SQL, command, template, etc.)

5. Performance

MUST ATTENTION No O(n²) complexity where O(n) or O(1) is possible (use lookup structures)
MUST ATTENTION No N+1 query patterns (batch load related data before iterating)
MUST ATTENTION Pagination for all list queries (never fetch unbounded result sets)
MUST ATTENTION Parallel operations where independent (not forced sequential)
MUST ATTENTION Async/await used correctly (no blocking in async context)
MUST ATTENTION Query patterns have appropriate indexes
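To make the first two items concrete, here is an illustrative before/after pair. The data shapes (`tasks` with an `owner_id`, `users` with `id` and `name`) are invented for the example, and the two functions agree only under the assumption that user ids are unique.

```python
from typing import Dict, List


def owner_names_quadratic(tasks: List[dict], users: List[dict]) -> List[str]:
    """O(n*m): scans the whole user list for every task."""
    names = []
    for task in tasks:
        for user in users:
            if user["id"] == task["owner_id"]:
                names.append(user["name"])
                break
    return names


def owner_names_indexed(tasks: List[dict], users: List[dict]) -> List[str]:
    """O(n+m): builds one lookup, then resolves each task in constant time."""
    name_by_id: Dict[int, str] = {user["id"]: user["name"] for user in users}
    return [name_by_id[task["owner_id"]]
            for task in tasks if task["owner_id"] in name_by_id]
```

The same shape applies to N+1 queries: load the related rows once in a batch, index them by key, and then iterate, instead of issuing one query per item inside the loop.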

6. Common Issues

MUST ATTENTION Unused imports or variables
MUST ATTENTION Debug/logging statements left in that should not be in production
MUST ATTENTION Hardcoded values that should be configuration
MUST ATTENTION Missing async/await or promise handling
MUST ATTENTION Incorrect or absent exception handling
MUST ATTENTION Missing validation at boundaries

7. Documentation Staleness

MUST ATTENTION For each changed file: identify related docs (feature docs, architecture references, READMEs)
MUST ATTENTION Changed logic → verify relevant feature/module docs still accurate
MUST ATTENTION Changed tooling (scripts, configs, CI) → verify setup/getting-started docs still accurate
MUST ATTENTION New feature/component added → flag if corresponding doc missing
MUST ATTENTION Test specs reflect current behavior after changes
MUST ATTENTION API changes reflected in relevant API docs or specs

Output Format

Provide feedback in this format:

Summary: Brief overall assessment

Critical Issues: (Must fix before commit)

Issue 1: Description and suggested fix

High Priority: (Should fix)

Issue 1: Description

Suggestions: (Nice to have)

Suggestion 1

Documentation Staleness: (Docs that may need updating)

Doc 1: What is stale and why
No doc updates needed — if no changed file maps to a doc

Positive Notes:

What was done well

Suggested Commit Message:

type(scope): description

- Detail 1
- Detail 2

Systematic Review Protocol (for 10+ changed files)

NON-NEGOTIABLE: When changeset is large (10+ files), MUST ATTENTION use this systematic protocol instead of reviewing files one-by-one sequentially.

Principle: Review carefully and systematically — break into groups, fire multiple specialized agents to review in parallel. Ensure no flaws, no bugs, no stale info, and best practices in every aspect.

Auto-Activation

In Phase 0, after running git status, count changed files. If 10 or more files changed:

STOP sequential Phase 1-3 approach
SWITCH to Systematic Review Protocol automatically
ANNOUNCE to user: "Detected {N} changed files. Switching to systematic parallel review protocol."
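The auto-activation check can be sketched as below, assuming the changed files come from `git status --porcelain` (short format: two status characters, a space, then the path). The helper names are hypothetical, and quoted paths (names with spaces) are not unquoted in this sketch.

```python
import subprocess

SYSTEMATIC_THRESHOLD = 10  # "10 or more files changed" from the protocol


def changed_files(porcelain: str) -> list[str]:
    """Extract paths from `git status --porcelain` (v1) output."""
    files = []
    for line in porcelain.splitlines():
        if len(line) < 4:
            continue
        path = line[3:]  # skip the two status characters and the separator
        if " -> " in path:  # renames are reported as "old -> new"
            path = path.split(" -> ", 1)[1]
        files.append(path)
    return files


def choose_protocol(porcelain: str):
    """Return the announcement if the systematic protocol applies, else None."""
    count = len(changed_files(porcelain))
    if count >= SYSTEMATIC_THRESHOLD:
        return f"Detected {count} changed files. Switching to systematic parallel review protocol."
    return None


def read_git_status() -> str:
    # Runs git in the current working directory; raises if git fails.
    result = subprocess.run(
        ["git", "status", "--porcelain"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Note that git lists an untracked directory as a single entry by default, so the count may understate the number of new files unless untracked files are expanded.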

Step 1: Categorize Changes

Group all changed files into logical categories derived from the project's actual structure (see Phase 0.7). Example groupings to orient thinking (derive what fits the project):

| Category Type       | Example Groupings                                                     |
| ------------------- | --------------------------------------------------------------------- |
| Agent/Tooling       | AI scripts, hooks, skill definitions, workflow configs, linting rules |
| Root config/docs    | Root README, project config, CI/CD pipeline configs                   |
| Reference docs      | Architecture docs, patterns references, setup guides                  |
| Feature/domain docs | Business feature documentation, spec files, ADRs                      |
| Backend logic       | Service/handler/controller source (infer from project structure)      |
| Frontend logic      | UI component/state/API source (infer from project structure)          |
| Data/Schema         | Migrations, schema files, seed data                                   |
| Tests               | Unit, integration, E2E test files                                     |
| Infrastructure      | Docker, k8s, CI/CD, cloud manifests                                   |

Derive the actual groupings from what THIS project contains — do not force files into categories that don't fit.
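One way to picture the categorization step is a first-match-wins glob table. The patterns below are purely hypothetical placeholders; a real table would be derived from the project's actual layout, and anything that matches no pattern is kept visible rather than forced into a category.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns for illustration only; order matters (first match wins).
CATEGORY_PATTERNS = {
    "Tests": ["tests/*", "*.spec.ts", "*_test.py"],
    "Reference docs": ["docs/*"],
    "Infrastructure": ["Dockerfile", ".github/*", "k8s/*"],
    "Backend logic": ["src/services/*"],
}


def categorize(files, patterns=CATEGORY_PATTERNS):
    """Group changed files by the first category whose glob matches."""
    groups = {}
    for path in files:
        category = next(
            (
                name
                for name, globs in patterns.items()
                if any(fnmatchcase(path, g) for g in globs)
            ),
            "Uncategorized",  # do not force a fit
        )
        groups.setdefault(category, []).append(path)
    return groups
```

Files that land in "Uncategorized" are a signal that the grouping needs another look before sub-agents are launched.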

Step 2: Fire Parallel Specialized Sub-Agents

Launch one sub-agent per category via spawn_agent tool with run_in_background: true.

Sub-agent type selection per category:

Code logic (any stack) → code-reviewer
Security-sensitive changes → security-auditor
Performance-critical paths → performance-optimizer
Docs, plans, specs, configs, infra → general-purpose

Each sub-agent receives:

Full list of files in its category
The SYNC:category-review-thinking framework as its primary thinking model
Project reference docs relevant to its category (discovered by searching *patterns*, *conventions*, *style-guide*)
Cross-reference verification instructions (counts, tables, links where applicable)

All sub-agents run in parallel to maximize speed and coverage.
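The per-category hand-off can be sketched as plain data prepared before any spawn call. The agent type names come from the selection list above; the `concern_of` mapping, the brief's dictionary shape, and the function name are assumptions for illustration, not the spawn_agent tool's actual schema.

```python
AGENT_TYPE_BY_CONCERN = {
    "code": "code-reviewer",
    "security": "security-auditor",
    "performance": "performance-optimizer",
    "other": "general-purpose",  # docs, plans, specs, configs, infra
}


def build_briefs(groups, concern_of):
    """Build one brief per category.

    groups: {category: [files]}; concern_of: {category: concern key}.
    Raises KeyError for a category with no declared concern, so nothing
    is routed to a sub-agent by accident.
    """
    briefs = []
    for category, files in groups.items():
        briefs.append(
            {
                "category": category,
                "agent_type": AGENT_TYPE_BY_CONCERN[concern_of[category]],
                "files": sorted(files),  # full list of files in the category
                "run_in_background": True,
            }
        )
    return briefs
```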

Step 3: Synchronize & Cross-Reference

After all sub-agents complete:

Collect findings from each agent's report
Cross-reference — verify counts, tables, references consistent ACROSS categories
Detect gaps — issues only visible when looking across categories (e.g., new feature added in code but missing from docs; new API endpoint with no client call)
Consolidate into single holistic report with categorized findings

Step 4: Holistic Big-Picture Assessment

With all category findings combined, assess:

Overall coherence of changes as a unified intent
Cross-category synchronization (do docs match code? do contracts match callers?)
Risk areas where categories interact
Missing documentation updates for changed artifacts

Workflow Recommendation

MANDATORY IMPORTANT MUST ATTENTION — NO EXCEPTIONS: If NOT already in a workflow, MUST ask the user directly which option to run. Do NOT judge task complexity or decide "simple enough to skip" — the user decides, not you:

Activate review-changes workflow (Recommended) — review-changes → [parallel: review-architecture + review-domain-entities + performance + integration-test-review + security] → code-simplifier → code-review → integration-test-verify → why-review (synthesis) → plan → why-review → plan-validate → why-review → cook → workflow-review-changes (fresh-subagent re-review gate) → docs-update → watzup → workflow-end

Execute $review-changes directly — run this skill standalone

Architecture Boundary Check

For each changed file, verify no import from forbidden layer:

Read rules from docs/project-config.json → architectureRules.layerBoundaries
Determine layer — For each changed file, match path against each rule's paths glob patterns
Scan imports — Grep file for import statements
Check violations — If any import path contains layer name listed in cannotImportFrom, it is a violation
Exclude framework — Skip files matching any pattern in architectureRules.excludePatterns
BLOCK on violation — Report as critical: "BLOCKED: {layer} layer file {filePath} imports from {forbiddenLayer} layer ({importStatement})"

If architectureRules not present in project-config.json, skip silently.
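The check above can be sketched as follows. The keys `architectureRules`, `layerBoundaries`, `paths`, `cannotImportFrom`, and `excludePatterns` come from the steps above; the `layer` key holding each rule's name, the import regex, and the case-sensitive substring match are assumptions of this sketch.

```python
import re
from fnmatch import fnmatchcase

# Matches import-like statements in several languages (Python, JS/TS, C#).
IMPORT_RE = re.compile(r"^[ \t]*(?:import|from|using)\b.*$", re.MULTILINE)


def check_boundaries(file_path, source, config):
    """Return BLOCKED messages for imports that cross a forbidden layer."""
    rules = config.get("architectureRules")
    if not rules:
        return []  # rules absent: skip silently
    if any(fnmatchcase(file_path, p) for p in rules.get("excludePatterns", [])):
        return []  # framework or otherwise excluded file
    violations = []
    for rule in rules.get("layerBoundaries", []):
        if not any(fnmatchcase(file_path, p) for p in rule["paths"]):
            continue  # file is not in this rule's layer
        for statement in IMPORT_RE.findall(source):
            for forbidden in rule["cannotImportFrom"]:
                if forbidden in statement:
                    violations.append(
                        f"BLOCKED: {rule['layer']} layer file {file_path} "
                        f"imports from {forbidden} layer ({statement.strip()})"
                    )
    return violations
```

A substring match can produce false positives (for example a layer name that also appears in an unrelated package name), so each reported violation still needs the file:line evidence the rest of this skill requires.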

Phase 6: Why-Review Self-Validation Gate (MANDATORY when findings exist)

Purpose: Adversarial validation of own findings BEFORE handoff. Catches over-flagged Highs, false positives, and severity inflation at the source rather than letting them propagate downstream.

Trigger: Any finding produced (Critical, High, Medium, OR Low). Skip ONLY when the report's verdict is unconditional PASS with literally zero findings.

Protocol:

Read own finalized report from plans/reports/{skill}-{date}-{slug}.md
Invoke $why-review skill with arg: validate findings in plans/reports/{skill}-{date}-{slug}.md — verify each finding has file:line proof, steel-man each rejected interpretation, and stress-test severity classifications
Read why-review output from plans/reports/why-review-{date}.md
If why-review demotes/removes any finding: UPDATE own finalized report with revised severities, remove false positives, and add a ## Why-Review Validation Notes section citing what changed and why
If why-review confirms all findings: Append ## Why-Review Validation line to own report stating "All N findings re-validated against actual code; no severity changes."

Skip conditions (record explicit reason if skipping):

Verdict is unconditional PASS with zero findings → log "Skipped — no findings to validate"
Why-review skill itself is the active context (avoid recursion)

Why this exists: AI sub-agent reports inherit confirmation bias — the orchestrator absorbs severity claims as ground truth. The 2026-05-09 review incident produced 5 Highs; adversarial validation demoted 3 of them. Codify this as standard practice.

Next Steps

MANDATORY IMPORTANT MUST ATTENTION — NO EXCEPTIONS after completing this skill, MUST use a direct user question to present options. Do NOT skip because task seems "simple" or "obvious" — user decides:

"$code-review (Recommended)" — Deeper code quality review
"$watzup" — Wrap up session and review all changes
"Skip, continue manually" — user decides

AI Agent Integrity Gate (NON-NEGOTIABLE)

Completion ≠ Correctness. Before reporting ANY work done, prove it:

Grep every removed name. Extraction/rename/delete touched N files? Grep confirms 0 dangling refs across ALL file types.

Ask WHY before changing. Existing values are intentional until proven otherwise. No "fix" without traced rationale.

Verify ALL outputs. One build passing ≠ all builds passing. Check every affected stack.

Evaluate pattern fit. Copying nearby code? Verify preconditions match — same scope, lifetime, base class, constraints.

New artifact = wired artifact. Created something? Prove it's registered, imported, reachable by all consumers.

Related Skills

| Skill                    | Relationship                                                          | When to Call                                                                  |
| ------------------------ | --------------------------------------------------------------------- | ----------------------------------------------------------------------------- |
| $docs-update             | Primary downstream — called when staleness detected                   | Triggered by Documentation Staleness findings                                 |
| $spec-discovery [update] | Spec updater — called when artifact behavior differs from spec bundle | Call BEFORE docs-update if spec-was-wrong scenario detected                   |
| $feature-docs [update]   | Feature doc updater — called for feature doc section changes          | Called internally by docs-update; call directly for targeted update           |
| $tdd-spec [update]       | Test spec updater — called when test cases may be stale               | Called internally by docs-update; call directly for targeted test case update |
| $integration-test-review | Test quality gate — detects test/spec mismatches                      | Call when changes touch areas covered by integration tests                    |
| $code-review             | Code quality — deeper review of changed code                          | Always follows review-changes quality pass                                    |

Standalone Chain

When called outside a workflow (i.e., user ran $review-changes directly):

review-changes (you are here)
  │
  ├─ Code quality checks (code-simplifier → review-architecture → code-review → performance)
  │
  ├─ Phase 5: Documentation Staleness Triage
  │    → If stale docs detected: [REQUIRED] → $docs-update
  │
  ├─ Integration test check (SYNC:integration-test-sync-check):
  │    → If logic changes touch tested areas: [REQUIRED] → $integration-test [from-changes]
  │    → Then: $integration-test-review → $integration-test-verify
  │
  ├─ Translation sync check (SYNC:translation-sync-check):
  │    → If multilingual UI text changes lack locale updates: [REQUIRED] ask the user directly + explicit decision
  │
  ├─ Bugfix-specific: "Was spec wrong?" check:
  │    If this review is post-bugfix AND spec describes the bug as expected behavior:
  │    → [REQUIRED] Flag to user: "The spec may document the bug as correct behavior."
  │    → If spec bug confirmed → [REQUIRED]: $spec-discovery [update] FIRST → $feature-docs [update relevant sections]
  │    → Do NOT let $docs-update update test cases to document broken behavior.
  │
  └─ [RECOMMENDED] → $watzup
        Summary of all review findings, doc changes, and test coverage status.

[CRITICAL — TOP 3 RULES]

MUST ATTENTION Phase 0 graph blast-radius FIRST — NEVER skip; informs entire review order

Clean Round 1 ENDS the review. When issues found, fresh sub-agent re-review mandatory after fixing.

MUST ATTENTION task tracking ALL phases before starting; missing tests MUST surface via a direct user question — NOT silently logged

[IMPORTANT] Use task tracking to break ALL work into small tasks BEFORE starting — including tasks for each file read. Prevents context loss from long files. For simple tasks, AI MUST ATTENTION ask user whether to skip.

Critical Thinking Mindset — Apply critical thinking, sequential thinking. Every claim needs traced proof, confidence >80% to act. Anti-hallucination: Never present a guess as fact. Cite sources for every claim, admit uncertainty freely, self-check output for errors, cross-reference independently, and stay skeptical of your own confidence. Certainty without evidence is the root of all hallucination.

Sequential Thinking Protocol — Structured multi-step reasoning for complex/ambiguous work. Use when planning, reviewing, debugging, or refining ideas where one-shot reasoning is unsafe.

Trigger when: complex problem decomposition · adaptive plans needing revision · analysis with course correction · unclear/emerging scope · multi-step solutions · hypothesis-driven debugging · cross-cutting trade-off evaluation.

Format (explicit mode — visible thought trail):

Thought N/M: [aspect] — one aspect per thought, state assumptions/uncertainty

Thought N/M [REVISION of Thought K]: ... — when prior reasoning invalidated; state Original / Why revised / Impact

Thought N/M [BRANCH A from Thought K]: ... — explore alternative; converge with decision rationale

Thought N/M [HYPOTHESIS]: ... then [VERIFICATION]: ... — test before acting

Thought N/N [FINAL] — only when verified, all critical aspects addressed, confidence >80%

Mandatory closers: Confidence % stated · Assumptions listed · Open questions surfaced · Next action concrete.

Stop conditions: confidence <80% on any critical decision → escalate by asking the user directly · ≥3 revisions on same thought → re-frame the problem · branch count >3 → split into sub-task.
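The stop conditions can be read as a small decision function. This is a sketch only: the function name and the order in which the conditions are checked are assumptions, since the text lists them without a priority.

```python
def stop_action(confidence_pct, revisions_on_thought, branch_count, critical=True):
    """Return the stop action that applies, or None to keep reasoning."""
    if critical and confidence_pct < 80:
        return "escalate: ask the user directly"
    if revisions_on_thought >= 3:
        return "re-frame the problem"
    if branch_count > 3:
        return "split into sub-task"
    return None
```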

Implicit mode: apply methodology internally without visible markers when adding markers would clutter the response (routine work where reasoning aids accuracy).

Deep-dive: see $sequential-thinking skill (.claude/skills/sequential-thinking/SKILL.md) for worked examples (api-design, debug, architecture), advanced techniques (spiral refinement, hypothesis testing, convergence), and meta-strategies (uncertainty handling, revision cascades).

Understand Code First — HARD-GATE: Do NOT write, plan, or fix until you READ existing code.

Search 3+ similar patterns (grep/glob) — cite file:line evidence

Read existing files in target area — understand structure, base classes, conventions

Run python .claude/scripts/code_graph trace <file> --direction both --json when .code-graph/graph.db exists

Map dependencies via connections or callers_of — know what depends on your target

Write investigation to .ai/workspace/analysis/ for non-trivial tasks (3+ files)

Re-read analysis file before implementing — never work from memory alone

NEVER invent new patterns when existing ones work — match exactly or document deviation

BLOCKED until:

- [ ] Read target files
- [ ] Grep 3+ patterns
- [ ] Graph trace (if graph.db exists)
- [ ] Assumptions verified with evidence

Design Patterns Quality — Priority checks for every code change:

DRY via OOP: Identify classes/modules with the same purpose, naming pattern, or lifecycle. Apply your knowledge of the project's language/framework to determine the idiomatic abstraction (base class, mixin, trait, protocol, decorator). 3+ similar patterns → extract to shared abstraction.

Right Responsibility: Logic in LOWEST layer (Entity > Domain Service > Application Service > Controller). Never business logic in controllers.

SOLID: Single responsibility (one reason to change). Open-closed (extend, don't modify). Liskov (subtypes substitutable). Interface segregation (small interfaces). Dependency inversion (depend on abstractions).

After extraction/move/rename: Grep ENTIRE scope for dangling references. Zero tolerance.

YAGNI gate: NEVER recommend patterns unless 3+ occurrences exist. Don't extract for hypothetical future use.

Anti-patterns to flag: God Object, Copy-Paste inheritance, Circular Dependency, Leaky Abstraction.

Serial Attention for Design Quality — DO NOT scan all quality concerns simultaneously. Split attention misses violations that focused passes catch.

Identify applicable dimensions — Based on the code's language, domain, and patterns, determine which quality dimensions apply: DRY, SOLID principles (SRP/OCP/LSP/ISP/DIP), OOP idioms, cohesion/coupling, GRASP, Law of Demeter, CQRS invariants, etc. Your list is NOT fixed — derive from what the code actually does.

One focused pass per dimension — Dedicate single-focus attention to EACH dimension in sequence. Do NOT mix concerns across passes.

Threshold: 3+ similar patterns = MANDATORY extraction — Not optional suggestion. Flag as mandatory structural fix requiring action.

2+ violations of same kind = structural finding — Report as "pattern problem" needing architectural resolution, not a list of individual instances.

Complexity Prevention (Ousterhout) — MANDATORY. Measure code by cost of change: one business change should map to one code change. Flag ALL of the following in review:

Change amplification — small business change forces edits in >3 places → structural flaw. Count edit sites for a plausible future change (add variant, add field, add authorization). >3 = reject.

Cognitive load — reader must hold too much context to safely modify. Flag deep inheritance, long parameter lists, boolean traps, implicit ordering dependencies.

Cross-cutting duplication at entry points — logging, error handling, validation, auth, transactions reimplemented per controller/handler/route. Lift to middleware / interceptor / filter / decorator / aspect.

Leaked implementation technology — repos returning IQueryable/QuerySet/Criteria/raw cursors/ORM entities to callers. Return finished results + intent-revealing methods (GetActiveVipUsers() not Query()).

Type-switch scattering — switch/if-chains on enum/discriminator in >1 place. New variant = new file, not N edits. One factory/registry switch at the boundary OK; scattered switches = reject.

Anemic models — domain objects with only getters/setters, logic floats in services. Move invariants/behavior onto the object (order.Checkout(), not order.Status = ...).

Primitive obsession — raw string/int/decimal for account numbers, emails, money, percentages, date ranges, with re-validation at every entry. Wrap in value objects / records / structs that validate once at construction.

Inline cross-cutting concerns — authorization/tenant isolation/audit/sanitization hand-written at top of every handler. Flag intent with declarative markers (@RequirePermission("Order.Delete")), enforce once centrally.

Shallow modules — tiny class, big interface (many public methods, many flags, many ctor params) wrapping little logic. A module is deep when a small interface hides a lot of implementation. If interface ≈ implementation cost to learn → inline.

Missing base class for repeated component/handler lifecycle — 3+ forms/CRUD handlers/list views reimplementing loading/dirty/submit/pagination → extract to base class / hook / composable / mixin / trait.

Premature vs delayed abstraction — rule-of-three. First occurrence: write it. Second: notice duplication. Third: extract. Don't build generic frameworks before real variation; don't copy-paste for the 4th time.

Embedded utility logic not extracted to helpers — inline paging loops (while (hasMore) { skip += take; ... }), ad-hoc datetime math, string parsing/formatting, collection partitioning, retry/backoff loops, URL/query-string building. If the algorithm is non-trivial AND stack-generic (not business-specific), extract to util/helper/extensions and let consumers call one line. Inline duplicates → duplicated bug surface.

Logic in wrong (higher) layer — downshift to callee — business/derivation logic written in the caller when the callee owns the data. Defaults: Controller code that should be App Service. App Service code that should be Domain Service or Entity. Component code that should be ViewModel/Store/Service. Caller reaching into callee's data shape to compute something → move the computation behind an intent-revealing method on the callee. Lowest responsible layer wins (Entity > Domain Service > App Service > Controller · Model/VM > Store > Component). Higher-layer placement = duplicated logic when a sibling caller needs the same thing.

Owner owns the rule — extract on first write — if a caller inlines logic that derives, normalizes, validates, or computes from another type's data, MOVE it to the owning type. Single use is sufficient — the trigger is wrong responsibility, not duplication. Sibling callers always arrive; inline copies drift silently with no compile error and no name to grep. Common offenders: Backend — inlined rules in application-layer handlers / commands / queries / services / controllers that belong on the domain entity / value object / domain service. Frontend — inlined derivations / formatting / validation in components that belong on the model / store / view-model / API service. Fix: name the rule once as a method (static or instance) on the owning type; callers invoke by name. Future variant → SECOND named method on the owner, never an inline near-duplicate. Right responsibility first; reuse is the consequence.

Extraction target — where the named rule lives:

| Shape of the rule                             | Goes to                       |
| --------------------------------------------- | ----------------------------- |
| Pure function over an entity's own data       | static method on the entity   |
| Behavior that mutates / guards entity state   | instance method on the entity |
| Always-true invariant on a primitive value    | value object constructor      |
| Needs DI (repo / settings / clock)            | helper class registered in DI |
| Domain-agnostic algorithm reused across types | util / extension method       |
| Pure shape / projection conversion            | DTO mapping                   |
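For the "always-true invariant on a primitive value" row, a minimal sketch of a value object that validates once at construction. The `Percentage` type and its `of` method are hypothetical examples, not types from any particular project.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Percentage:
    """Hypothetical value object: a percentage guaranteed to be in 0..100."""

    value: Decimal

    def __post_init__(self):
        # The invariant is checked once here, so no caller re-validates.
        if not (Decimal(0) <= self.value <= Decimal(100)):
            raise ValueError(f"percentage out of range: {self.value}")

    def of(self, amount: Decimal) -> Decimal:
        # Behavior lives next to the data it derives from.
        return amount * self.value / Decimal(100)
```

Callers that accept a `Percentage` instead of a raw number cannot receive an out-of-range value, which is what removes the per-entry-point re-validation flagged under primitive obsession.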

Pre-commit edit-site test (reject if answer is "many"):

| Change Scenario                                 | Should touch              |
| ----------------------------------------------- | ------------------------- |
| Add new variant (customer type, payment method) | 1 new file                |
| Change HTTP error response format               | 1 middleware/filter       |
| Add timestamp field to every persisted entity   | 1 base entity/interceptor |
| Add authorization to a new endpoint             | 1 declarative marker      |
| Swap database/ORM                               | Data layer only           |
| Change business calculation rule                | 1 method on owning entity |
| Add loading indicator pattern to forms          | 1 base component/hook     |
| Add validation rule to a domain primitive       | 1 value-object ctor       |
| Change paging/retry/datetime algorithm          | 1 helper/util function    |
| Change a derivation of entity data              | 1 method on the entity    |
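The first row ("add new variant, 1 new file") is what a single registry at the boundary buys, in contrast to type-switch scattering. The sketch below uses hypothetical payment methods and fee rules; in a real codebase each registered function would typically live in its own module.

```python
from decimal import Decimal
from typing import Callable

# The only place that knows the full set of variants.
_FEE_RULES: dict[str, Callable[[Decimal], Decimal]] = {}


def fee_rule(method: str):
    """Decorator that registers a fee rule for one payment method."""
    def register(fn):
        _FEE_RULES[method] = fn
        return fn
    return register


@fee_rule("card")
def card_fee(amount: Decimal) -> Decimal:
    return amount * Decimal("0.03")  # hypothetical 3% fee


@fee_rule("bank_transfer")
def bank_transfer_fee(amount: Decimal) -> Decimal:
    return Decimal("1.00")  # hypothetical flat fee


def fee_for(method: str, amount: Decimal) -> Decimal:
    try:
        rule = _FEE_RULES[method]
    except KeyError:
        raise ValueError(f"unknown payment method: {method}") from None
    return rule(amount)
```

Adding a third payment method means adding one decorated function; `fee_for` and every caller stay untouched.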

Operating heuristics:

Write the call site first.

Count edit sites for plausible future change.

Prefer removing code over adding it.

Surface assumptions at boundaries, hide details inside.

Pre-reuse scan — before writing a non-trivial block, grep for similar algorithms (while.*skip, DateTime.*Add, split/join chains, paging loops, retry loops). Match existing helper → call it. None exists but pattern is stack-generic → extract to util before second caller appears.

Layer placement test — ask "if a sibling caller needed this tomorrow, would they re-derive it?" If yes, the logic is in the wrong layer. Move it down.

Open-case-for-future-reuse — if reviewer spots a block that is likely to appear in another feature (domain-agnostic algorithm, shared lifecycle, recurring derivation), do NOT rationalize with pure YAGNI. Either extract now (if cheap) or create a tracked TODO with the exact extraction target so the second caller does not duplicate silently. Silent duplication is the default failure mode.

When in doubt ask: "What would need to change if the requirement shifts?"

The measure of good code is the cost of change. Not shortest. Not cleverest. Not most abstracted. Cheapest to safely modify having read a small local portion.
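As an example of the "embedded utility logic" item, an inline `while (hasMore) { skip += take; ... }` loop can be lifted into one stack-generic helper. This is a sketch: `iter_pages` and the `fetch(skip, take)` callback signature are hypothetical names chosen for illustration.

```python
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def iter_pages(fetch: Callable[[int, int], Sequence[T]], take: int = 100) -> Iterator[T]:
    """Yield every item from a skip/take paged source.

    fetch(skip, take) must return at most `take` items; a short (or empty)
    page signals the end of the data.
    """
    if take <= 0:
        raise ValueError("take must be positive")
    skip = 0
    while True:
        page = fetch(skip, take)
        yield from page
        if len(page) < take:
            return
        skip += take
```

Consumers then call one line, for example `for row in iter_pages(load_rows): ...`, and a paging bug is fixed in exactly one place.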

Fix-Triggered Re-Review Loop — Re-review is triggered by a FIX CYCLE, not by a round number. Review purpose: review → if issues → fix → re-review until a round finds no issues. A clean review ENDS the loop — no further rounds required.

Round 1: Main-session review. Read target files, build understanding, note issues. Output findings + verdict (PASS / FAIL).

Decision after Round 1:

No issues found (PASS, zero findings) → review ENDS. Do NOT spawn a fresh sub-agent for confirmation.

Issues found (FAIL, or any non-zero findings) → fix the issues, then spawn a fresh sub-agent for Round 2 re-review.

Fresh sub-agent re-review (after every fix cycle): Spawn a NEW spawn_agent tool call — never reuse a prior agent. Sub-agent re-reads ALL files from scratch with ZERO memory of prior rounds. See SYNC:fresh-context-review for the spawn mechanism and SYNC:review-protocol-injection for the canonical Agent prompt template. Each fresh round must catch:

Cross-cutting concerns missed in the prior round

Interaction bugs between changed files

Convention drift (new code vs existing patterns)

Missing pieces that should exist but don't

Subtle edge cases the prior round rationalized away

Regressions introduced by the fixes themselves

Loop termination: After each fresh round, repeat the same decision: clean → END; issues → fix → next fresh round. Continue until a round finds zero issues, or 3 fresh-subagent rounds max, then escalate to user via a direct user question.

Rules:

A clean Round 1 ENDS the review — no mandatory Round 2

NEVER skip the fresh sub-agent re-review after a fix cycle (every fix invalidates the prior verdict)

NEVER reuse a sub-agent across rounds — every iteration spawns a NEW spawn_agent call

Main agent READS sub-agent reports but MUST NOT filter, reinterpret, or override findings

Max 3 fresh-subagent rounds per review — if still FAIL, escalate via a direct user question (do NOT silently loop)

Track round count in conversation context (session-scoped)

Final verdict must incorporate ALL rounds executed

Report must include ## Round N Findings (Fresh Sub-Agent) for every round N≥2 that was executed.
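The loop's control flow, as described by the rules above, can be sketched like this. The three callables are hypothetical stand-ins for the main-session review, the fresh sub-agent spawn, and the fix step; each review returns a list of findings, and an empty list means a clean round.

```python
from dataclasses import dataclass, field

MAX_FRESH_ROUNDS = 3  # "Max 3 fresh-subagent rounds per review"


@dataclass
class ReviewOutcome:
    verdict: str  # "PASS" or "ESCALATE"
    rounds: list = field(default_factory=list)  # findings per executed round


def run_review(main_review, fresh_review, fix) -> ReviewOutcome:
    findings = main_review()  # Round 1: main-session review
    rounds = [findings]
    fresh_rounds = 0
    while findings:
        if fresh_rounds == MAX_FRESH_ROUNDS:
            # Still failing after the cap: hand over to the user, never loop silently.
            return ReviewOutcome("ESCALATE", rounds)
        fix(findings)  # every fix invalidates the prior verdict...
        fresh_rounds += 1
        findings = fresh_review(fresh_rounds + 1)  # ...so a NEW sub-agent re-reviews
        rounds.append(findings)
    return ReviewOutcome("PASS", rounds)  # a clean round ends the loop
```

A clean Round 1 returns immediately without calling `fresh_review`, and the final outcome carries the findings of every round that ran.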

Fresh Sub-Agent Review — Eliminate orchestrator confirmation bias via isolated sub-agents.

Why: The main agent knows what it (or $cook) just fixed and rationalizes findings accordingly. A fresh sub-agent has ZERO memory, re-reads from scratch, and catches what the main agent dismissed. Sub-agent bias is mitigated by (1) fresh context, (2) verbatim protocol injection, (3) main agent not filtering the report.

When: ONLY after a fix cycle. A review round that finds zero issues ENDS the loop — do NOT spawn a confirmation sub-agent. A review round that finds issues triggers: fix → fresh sub-agent re-review.

How:

Spawn a NEW spawn_agent tool call — use code-reviewer subagent_type for code reviews, general-purpose for plan/doc/artifact reviews

Inject ALL required review protocols VERBATIM into the prompt — see SYNC:review-protocol-injection for the full list and template. Never reference protocols by file path; AI compliance drops behind file-read indirection (see SYNC:shared-protocol-duplication-policy)

Sub-agent re-reads ALL target files from scratch via its own tool calls — never pass file contents inline in the prompt

Sub-agent writes structured report to plans/reports/{review-type}-round{N}-{date}.md

Main agent reads the report, integrates findings into its own report, DOES NOT override or filter

Rules:

SKIP fresh sub-agent when the prior round found zero issues (no fixes = nothing new to verify)

NEVER skip fresh sub-agent after a fix cycle — every fix invalidates the prior verdict

NEVER reuse a sub-agent across rounds — every fresh round spawns a NEW spawn_agent call

Max 3 fresh-subagent rounds per review — escalate via a direct user question if still failing; do NOT silently loop or fall back to any prior protocol

Track iteration count in conversation context (session-scoped, no persistent files)

Review Protocol Injection — Every fresh sub-agent review prompt MUST embed 10 protocol blocks VERBATIM. The template below has ALL 10 bodies already expanded inline. Copy the template wholesale into the Agent call's prompt field at runtime, replacing only the {placeholders} in Task / Round / Reference Docs / Target Files / Output sections with context-specific values. Do NOT touch the embedded protocol sections.

Why inline expansion: Placeholder markers would force file-read indirection at runtime. AI compliance drops significantly behind indirection (see SYNC:shared-protocol-duplication-policy). Therefore the template carries all 10 protocol bodies pre-embedded.

Sub-Agent Type Selection

Choose subagent_type based on the dominant concern of the review:

| Dominant Concern                               | subagent_type         |
| ---------------------------------------------- | --------------------- |
| Code logic, architecture, correctness          | code-reviewer         |
| Security, auth, permissions, vulnerabilities   | security-auditor      |
| Performance, latency, query efficiency, memory | performance-optimizer |
| Documentation, plans, specs, ADRs, configs     | general-purpose       |
| Infrastructure, CI/CD, build tooling           | general-purpose       |
| Mixed concerns (default fallback)              | code-reviewer         |

For large changesets with multiple distinct dominant concerns — spawn ONE sub-agent per concern type in parallel.

Canonical Agent Call Template (Copy Verbatim)

spawn_agent({
description: "Fresh Round {N} review",
agent_type: "{code-reviewer | security-auditor | performance-optimizer | general-purpose}",
prompt: `
## Task
{review-specific task — e.g., "Review all uncommitted changes for code quality" | "Security review of auth changes" | "Review plan files under {plan-dir}" | "Performance review of data access layer changes"}

## Round
Round {N}. You have ZERO memory of prior rounds. Re-read all target files from scratch via your own tool calls. Do NOT trust anything from the main agent beyond this prompt.

## Protocols (follow VERBATIM — these are non-negotiable)

### Evidence-Based Reasoning
Speculation is FORBIDDEN. Every claim needs proof.
1. Cite file:line, grep results, or framework docs for EVERY claim
2. Declare confidence: >80% act freely, 60-80% verify first, <60% DO NOT recommend
3. Cross-boundary validation required for architectural changes
4. "I don't have enough evidence" is valid and expected output
BLOCKED until: Evidence file path (file:line) provided; Grep search performed; 3+ similar patterns found; Confidence level stated.
Forbidden without proof: "obviously", "I think", "should be", "probably", "this is because".
If incomplete → output: "Insufficient evidence. Verified: [...]. Not verified: [...]."

### Bug Detection
MUST check categories 1-4 for EVERY review. Never skip.
1. Null Safety: Can params/returns be null/undefined? Are they guarded? .find()/.get() returns checked before use?
2. Boundary Conditions: Off-by-one (< vs <=)? Empty collections handled? Zero/negative values? Max limits?
3. Error Handling: Try-catch scope correct? Silent swallowed exceptions? Error types specific? Cleanup in finally/defer?
4. Resource Management: Connections/streams closed? Long-lived resources released? Memory bounded?
5. Concurrency (if async): Missing await/promise handling? Race conditions on shared state? Retry storms?
6. Language/Stack-Specific: Apply known failure modes for the language/runtime in this project — use your domain knowledge of the stack.
Classify: CRITICAL (crash/corrupt) → FAIL | HIGH (incorrect behavior) → FAIL | MEDIUM (edge case) → WARN | LOW (defensive) → INFO.

### Design Patterns Quality
Priority checks for every code change:
1. DRY via OOP: Same-suffix classes MUST share base class. 3+ similar patterns → extract to shared abstraction.
2. Right Responsibility: Logic in LOWEST layer. Never business logic in top-layer orchestrators.
3. SOLID: Single responsibility (one reason to change). Open-closed (extend, don't modify). Liskov (subtypes substitutable). Interface segregation (small interfaces). Dependency inversion (depend on abstractions).
4. After extraction/move/rename: Grep ENTIRE scope for dangling references. Zero tolerance.
5. YAGNI gate: NEVER recommend patterns unless 3+ occurrences exist. Don't extract for hypothetical future use.
Anti-patterns to flag: God Object, Copy-Paste inheritance, Circular Dependency, Leaky Abstraction.

### Logic & Intention Review
Verify WHAT code does matches WHY it was changed.
1. Change Intention Check: Every changed file MUST serve the stated purpose. Flag unrelated changes as scope creep.
2. Happy Path Trace: Walk through one complete success scenario through changed code.
3. Error Path Trace: Walk through one failure/edge case scenario through changed code.
4. Acceptance Mapping: If plan context available, map every acceptance criterion to a code change.
NEVER mark review PASS without completing both traces (happy + error path).

### Test Spec Verification
Map changed code to test specifications.
1. Identify the project's test spec format — grep for test case files (e.g., docs/**/test-*, docs/specs/**, *.feature, *.spec.md, test-cases/).
2. For each changed code path, locate the corresponding test case — or flag as "needs test case".
3. New functions/endpoints/handlers → flag for test spec creation.
4. If test spec evidence fields exist in the project, verify they point to actual code (file:line, not stale).
5. If no specs exist for a changed path → log gap and recommend $tdd-spec.
NEVER skip test mapping. Untested code paths are the #1 source of production bugs.

### Fix-Layer Accountability
NEVER fix at the crash site. Trace the full flow, fix at the owning layer. The crash site is a SYMPTOM, not the cause.
MANDATORY before ANY fix:
1. Trace full data flow — Map the complete path from data origin to crash site across ALL layers. Identify where bad state ENTERS, not where it CRASHES.
2. Identify the invariant owner — Which layer's contract guarantees this value is valid? Fix at the LOWEST layer that owns the invariant, not the highest layer that consumes it.
3. One fix, maximum protection — If fix requires touching 3+ files with defensive checks, you are at the wrong layer — go lower.
4. Verify no bypass paths — Confirm all data flows through the fix point.
BLOCKED until: Full data flow traced (origin → crash); Invariant owner identified with file:line evidence; All access sites audited (grep count); Fix layer justified (lowest layer that protects most consumers).
Anti-patterns (REJECT): "Fix it where it crashes" (crash site ≠ cause site, trace upstream); "Add defensive checks at every consumer" (scattered defense = wrong layer); "Both fix is safer" (pick ONE authoritative layer).

### Rationalization Prevention
AI skips steps via these evasions. Recognize and reject:
- "Too simple for a plan" → Simple + wrong assumptions = wasted time. Plan anyway.
- "I'll test after" → RED before GREEN. Write/verify test first.
- "Already searched" → Show grep evidence with file:line. No proof = no search.
- "Just do it" → Still need task tracking. Skip depth, never skip tracking.
- "Just a small fix" → Small fix in wrong location cascades. Verify file:line first.
- "Code is self-explanatory" → Future readers need evidence trail. Document anyway.
- "Combine steps to save time" → Combined steps dilute focus. Each step has distinct purpose.

### Graph-Assisted Investigation
MANDATORY when .code-graph/graph.db exists.
HARD-GATE: MUST run at least ONE graph command on key files before concluding any investigation.
Pattern: Grep finds files → trace --direction both reveals full system flow → Grep verifies details.
- Investigation/Scout: trace --direction both on 2-3 entry files
- Fix/Debug: callers_of on buggy function + tests_for
- Feature/Enhancement: connections on files to be modified
- Code Review: tests_for on changed functions
- Blast Radius: trace --direction downstream
CLI: python .claude/scripts/code_graph {command} --json. Use --node-mode file first (10-30x less noise), then --node-mode function for detail.

### Understand Code First
HARD-GATE: Do NOT write, plan, or fix until you READ existing code.
1. Search 3+ similar patterns (grep/glob) — cite file:line evidence.
2. Read existing files in target area — understand structure, base classes, conventions.
3. Run python .claude/scripts/code_graph trace <file> --direction both --json when .code-graph/graph.db exists.
4. Map dependencies via connections or callers_of — know what depends on your target.
5. Write investigation to .ai/workspace/analysis/ for non-trivial tasks (3+ files).
6. Re-read analysis file before implementing — never work from memory alone.
7. NEVER invent new patterns when existing ones work — match exactly or document deviation.
BLOCKED until: Read target files; Grep 3+ patterns; Graph trace (if graph.db exists); Assumptions verified with evidence.

### Category Review Thinking
For EACH category of changed files — THINK, do not fill in a checklist. DO NOT limit to the examples below.
Step 1 — Understand the category's role: What is its purpose? What invariants govern it? Who consumes it and what do they expect?
Step 2 — Read project conventions: grep for reference docs, style guides, READMEs for this area. Examine 3+ existing similar files to surface established patterns.
Step 3 — Derive concerns from first principles. Apply ALL that are relevant — expand based on domain knowledge:
- Correctness: logic matches intent? happy path AND error path traced?
- Contracts: interfaces/APIs/events/protocols honored? no implicit coupling introduced?
- Project conventions: follows patterns found in Step 2? evidence-confirmed, not assumed?
- Security: auth enforced? input validated at boundaries? no secrets in diff?
- Performance: unbounded operations? N+1? blocking in async context? unindexed queries?
- Maintainability: DRY? single responsibility? complexity reasonable? names reveal intent?
- Test coverage: changed paths covered? existing tests still valid after the change?
- Documentation: related docs/specs reflect the changes?
Step 4 — For each concern identified: verify with file:line evidence or flag as finding.
Examples only — your knowledge exceeds this list:
- Logic files (any stack): handler/service structure, validation placement, side effect isolation, cross-boundary coupling, data access layer separation
- Data/Schema: rollback path, lock impact on table volume, backfill idempotency, index coverage for query patterns, deployment ordering
- Config files: all environments covered? no secrets committed? app fails fast if missing?
- Infrastructure: dev/prod parity? no hardcoded dev values? pinned versions? CI impact documented?
- Styles/Assets: naming conventions? design variables/tokens used (no magic values)? scope correct?
- Documentation: accurate? links valid? examples match current code/behavior?
- Tests: assertions verify specific outcomes (not just no-exception)? idempotent (repeatable N times)? edge cases covered?
- Security artifacts: all code paths reach the gate? negative tests exist? both enforcement AND display control updated?
- Build/Tooling: rule changes apply consistently? violations not silently swallowed? CI runtime impact?

## Reference Docs (READ before reviewing)
{Discover by searching *patterns*, *conventions*, *style-guide*, *architecture*, README at service/module roots — list what you find}

## Target Files
{explicit file list OR "run git diff to see uncommitted changes" OR "read all files under {plan-dir}"}

## Output
Write a structured report to plans/reports/{review-type}-round{N}-{date}.md with sections:
- Status: PASS | FAIL
- Issue Count: {number}
- Critical Issues (with file:line evidence)
- High Priority Issues (with file:line evidence)
- Medium / Low Issues
- Cross-cutting findings

Return the report path and status to the main agent.
Every finding MUST have file:line evidence. Speculation is forbidden.
`
})

Rules

DO copy the template wholesale — including all 10 embedded protocol sections
DO replace only the {placeholders} in Task / Round / Reference Docs / Target Files / Output sections with context-specific content
DO choose subagent_type based on the dominant concern (see Sub-Agent Type Selection above)
DO NOT paraphrase, summarize, or skip any protocol section
DO NOT pass file contents inline — the sub-agent reads via its own tool calls so it has a fresh context
DO NOT reference protocols by file path or tag name — the bodies are already embedded above
DO NOT introduce placeholder markers for the protocols — they must stay literally expanded

Logic & Intention Review — Verify WHAT code does matches WHY it was changed.

Change Intention Check: Every changed file MUST ATTENTION serve the stated purpose. Flag unrelated changes as scope creep.

Happy Path Trace: Walk through one complete success scenario through changed code

Error Path Trace: Walk through one failure/edge case scenario through changed code

Acceptance Mapping: If plan context available, map every acceptance criterion to a code change

NEVER mark review PASS without completing both traces (happy + error path).

Bug Detection — MUST ATTENTION check categories 1-4 for EVERY review. Never skip.

Null Safety: Can params/returns be null/undefined? Are they guarded? .find()/.get() returns checked before use?

Boundary Conditions: Off-by-one (< vs <=)? Empty collections handled? Zero/negative values? Max limits?

Error Handling: Try-catch scope correct? Silent swallowed exceptions? Error types specific? Cleanup in finally/defer?

Resource Management: Connections/streams closed? Long-lived resources released? Memory bounded?

Concurrency (if async): Missing await/promise handling? Race conditions on shared state? Retry storms?

Language/Stack-Specific: Apply known failure modes for the language/runtime in this project — use your domain knowledge of the stack.

Classify: CRITICAL (crash/corrupt) → FAIL | HIGH (incorrect behavior) → FAIL | MEDIUM (edge case) → WARN | LOW (defensive) → INFO
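
A minimal sketch of the classification rule above, showing how per-finding severities could roll up into one review verdict. The names `Severity`, `VERDICT`, and `review_verdict` are hypothetical, and returning PASS when there are no findings is an assumption, not something this protocol states.

```python
from enum import Enum


class Severity(Enum):
    CRITICAL = "critical"  # crash/corrupt
    HIGH = "high"          # incorrect behavior
    MEDIUM = "medium"      # edge case
    LOW = "low"            # defensive


# Severity -> verdict, copied from the Classify line.
VERDICT = {
    Severity.CRITICAL: "FAIL",
    Severity.HIGH: "FAIL",
    Severity.MEDIUM: "WARN",
    Severity.LOW: "INFO",
}

# Ordering used to pick the worst verdict (PASS for "no findings" is assumed).
_RANK = {"PASS": 0, "INFO": 1, "WARN": 2, "FAIL": 3}


def review_verdict(findings):
    """Return the worst verdict across all findings."""
    worst = "PASS"
    for severity in findings:
        verdict = VERDICT[severity]
        if _RANK[verdict] > _RANK[worst]:
            worst = verdict
    return worst
```

A single CRITICAL or HIGH finding is enough to fail the review regardless of how many lower-severity findings accompany it.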

Test Spec Verification — Map changed code to test specifications.

Identify the project's test spec format — grep for test case files (e.g., docs/**/test-*, docs/specs/**, *.feature, *.spec.md, test-cases/)

For each changed code path, locate the corresponding test case — or flag as "needs test case"

New functions/endpoints/handlers → flag for test spec creation

If test spec evidence fields exist in the project, verify they point to actual code (file:line, not stale references)

If no specs exist for a changed path → log gap and recommend $tdd-spec

NEVER skip test mapping. Untested code paths are the #1 source of production bugs.

Integration Test Sync Check — Verify changed business logic files have corresponding tests.

From changed files → identify business logic files: handlers, commands, queries, services, controllers, resolvers, event processors. Naming varies by stack — infer from project conventions (e.g., *Service.*, *Handler.*, *Controller.*, *Command.*, *Query.*, *Resolver.*).

For each identified file → search for a corresponding test file. Infer test naming from existing tests in the project (e.g., *.test.ts, *Tests.java, *_test.py, *.spec.js, *Tests.cs). Check standard test directories (tests/, spec/, __tests__/, or adjacent test projects/packages).

If test EXISTS → check if test methods cover changed behavior (new methods/parameters/logic paths)

If test MISSING → MANDATORY: use a direct user question: "Business logic file {file} has no integration tests — run $integration-test before proceeding, or confirm tests already written?" Options: "Run $integration-test first" (Recommended) | "Tests already written/updated — proceed"

Severity: HIGH — missing tests for changed business logic MUST be surfaced to the user; do NOT silently flag and continue

Do NOT silently skip. Business logic changes without test coverage require an explicit user decision via a direct user question.
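
The file-matching part of this check could be sketched as follows. The pattern lists reuse the examples given above; the function names and the "test file name starts with the logic file's base name" heuristic are assumptions for illustration, and a real project should infer naming from its own existing tests.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Example patterns from the check above; adjust to project conventions.
LOGIC_PATTERNS = ["*Service.*", "*Handler.*", "*Controller.*",
                  "*Command.*", "*Query.*", "*Resolver.*"]
TEST_PATTERNS = ["*.test.ts", "*Tests.java", "*_test.py", "*.spec.js", "*Tests.cs"]


def is_test_file(path):
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, p) for p in TEST_PATTERNS)


def is_logic_file(path):
    name = PurePosixPath(path).name
    return not is_test_file(path) and any(fnmatchcase(name, p) for p in LOGIC_PATTERNS)


def logic_files_missing_tests(changed_files, repo_files):
    """Return changed business logic files with no test file sharing their base name."""
    test_names = [PurePosixPath(f).name for f in repo_files if is_test_file(f)]
    missing = []
    for path in changed_files:
        if not is_logic_file(path):
            continue
        base = PurePosixPath(path).name.split(".")[0]  # e.g. "OrderService"
        # Assumed heuristic: a matching test file starts with the same base name.
        if not any(name.startswith(base) for name in test_names):
            missing.append(path)
    return missing
```

Every path returned by `logic_files_missing_tests` would trigger the direct user question; files that do have a test still need the manual check that the test methods cover the changed behavior.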

Translation Sync Check — Verify multilingual UI changes include translation updates.

Determine multilingual mode from project config: localization.enabled === true and supportedLocales.length > 1

Detect UI-facing file changes via extensions/path patterns (.ts, .tsx, .html, .css, .scss plus localization.uiPathPatterns when configured)

For multilingual UI changes, verify translation resource diffs exist (localization.translationFilePatterns when configured)

If translation updates are missing → MANDATORY: use a direct user question: "UI text changed in a multilingual project, but translation updates were not detected. Run translation sync now or proceed with explicit risk acceptance?" Options: "Run translation sync first" (Recommended) | "Proceed with explicit risk acceptance"

Severity: HIGH — no silent pass for multilingual UI text changes without explicit translation-sync decision

Do NOT silently skip. Multilingual UI text changes require explicit translation-sync confirmation.
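
A rough sketch of the detection logic, assuming the project config has already been loaded into a dict. It assumes `supportedLocales` lives under the `localization` key, which the text above does not confirm, and it detects UI file changes rather than actual text changes. The function names are hypothetical.

```python
from fnmatch import fnmatchcase

# UI-facing extensions listed in the check above.
UI_EXTENSIONS = (".ts", ".tsx", ".html", ".css", ".scss")


def is_multilingual(config):
    loc = config.get("localization", {})
    # Assumption: supportedLocales is nested under localization.
    return loc.get("enabled") is True and len(loc.get("supportedLocales", [])) > 1


def _matches(path, patterns):
    return any(fnmatchcase(path, p) for p in patterns)


def needs_translation_decision(config, changed_files):
    """True when UI files changed in a multilingual project but no translation file did."""
    if not is_multilingual(config):
        return False
    loc = config["localization"]
    ui_patterns = loc.get("uiPathPatterns", [])
    translation_patterns = loc.get("translationFilePatterns", [])
    translation_changed = any(_matches(f, translation_patterns) for f in changed_files)
    ui_changed = any(
        (f.endswith(UI_EXTENSIONS) or _matches(f, ui_patterns))
        and not _matches(f, translation_patterns)
        for f in changed_files
    )
    return ui_changed and not translation_changed
```

A `True` result means the direct user question is required; it never means the review can pass or fail on its own.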

Category Review Thinking — A thinking framework for reviewing any category of changed files. This is NOT a fixed checklist. Derive concerns from domain knowledge — the examples are starting points only. Your knowledge of the category exceeds any list here. Trust it.

Step 1: Understand the category's role

What is this category's responsibility in the overall system?
What invariants must it uphold?
What are its consumer contracts (who depends on it, what do they expect)?

Step 2: Read project conventions for this category

Search for reference docs, style guides, ADRs, or READMEs specific to this area
Grep 3+ existing similar files — extract naming conventions, structural patterns, shared base classes
If no docs exist, derive conventions empirically from existing code

Step 3: Derive concerns from first principles

Apply all that are relevant — expand beyond this list based on the actual category:

Correctness: Does the logic match the intent? Trace happy path AND error path.
Boundary contracts: Are interfaces/APIs/events/protocols honored? No implicit coupling introduced?
Project conventions: Does new code follow patterns found in Step 2? Evidence-confirmed, not assumed.
Security: Auth enforced at every entry point? Input validated at boundaries? No secrets in diff?
Performance: Unbounded operations? N+1 patterns? Blocking calls in async context? Unindexed queries?
Maintainability: DRY? Single responsibility? Complexity within reason? Names reveal intent?
Test coverage: Are the changed paths covered by tests? Are existing tests still valid after the change?
Documentation: Do related docs, specs, or READMEs reflect the changes?

Step 4: Create sub-tasks and execute

For each identified concern: create a task tracking sub-task, work through it with file:line evidence, mark done.

Illustrative concern examples by category type (not exhaustive — trust your knowledge beyond this):

Server-side logic: Handler/service structure conventions, validation layer placement, side effect isolation, cross-service boundary enforcement, data access layer separation, error propagation strategy

Client-side logic: Component lifecycle management, resource cleanup (subscriptions, listeners, timers), state management patterns, API integration layer separation, reactive stream composition

Data/Schema: Migration reversibility (rollback script), lock impact on table volume, backfill idempotency, index coverage for query patterns, deployment ordering

Configuration: Present in ALL environments? No secrets in diff? App fails fast if config missing (not silently null)? Documented in setup guide?

Infrastructure: Dev/prod parity? No hardcoded dev values (localhost, debug flags)? Pinned image/dependency versions? CI/CD secret requirements documented?

Styles/Assets: Follows project naming conventions? Uses design variables/tokens (no hardcoded magic values)? Correct scope (no global side effects from component styles)?

Documentation: Accurate? Links valid? Examples still match current code/behavior? Covers new scenarios?

Tests: Assertions verify specific outcomes (not just "no exception")? Idempotent (repeatable N times)? Covers edge cases, not just happy path?

Security artifacts: All code paths reach the gate? Negative tests exist (unauthorized denied)? Both enforcement AND display control updated?

Build/Tooling: Rule changes apply consistently? No exceptions that silently swallow violations? Impact on CI runtime documented?

Graph-Assisted Investigation — MANDATORY when .code-graph/graph.db exists.

HARD-GATE: MUST ATTENTION run at least ONE graph command on key files before concluding any investigation.

Pattern: Grep finds files → trace --direction both reveals full system flow → Grep verifies details

| Task                | Minimum Graph Action                      |
| ------------------- | ----------------------------------------- |
| Investigation/Scout | trace --direction both on 2-3 entry files |
| Fix/Debug           | callers_of on buggy function + tests_for  |
| Feature/Enhancement | connections on files to be modified       |
| Code Review         | tests_for on changed functions            |
| Blast Radius        | trace --direction downstream              |

CLI: python .claude/scripts/code_graph {command} --json. Use --node-mode file first (10-30x less noise), then --node-mode function for detail.
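
As an illustration only, the trace invocation could be assembled like this. The helper names are hypothetical, the placement of `--node-mode` relative to the other flags is an assumption, and the script's actual argument handling should be checked before relying on it.

```python
from pathlib import Path


def graph_required(repo_root):
    """The graph step is mandatory only when .code-graph/graph.db exists."""
    return (Path(repo_root) / ".code-graph" / "graph.db").exists()


def graph_trace_command(entry_file, direction="both", node_mode="file", python="python"):
    """Build the argv list for a trace call (flag order is assumed, not verified)."""
    return [
        python, ".claude/scripts/code_graph",
        "trace", entry_file,
        "--direction", direction,
        "--node-mode", node_mode,  # start with "file", switch to "function" for detail
        "--json",
    ]
```

The returned list can be passed to `subprocess.run(cmd, capture_output=True, text=True)`; pass a different `python` value when that name does not resolve in the current shell.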

Nested Task Expansion Contract — For workflow-step invocation, the [Workflow] ... row is only a parent container; the child skill still creates visible phase tasks.

Check the current task list first. If a matching active parent workflow row exists, set nested=true and record parentTaskId; otherwise run standalone.

Create one task per declared phase before phase work. When nested, prefix subjects [N.M] $skill-name — phase.

When nested, link the parent with TaskUpdate(parentTaskId, addBlockedBy: [childIds]).

Orchestrators must pre-expand a child skill's phase list and link the workflow row before invoking that child skill or sub-agent.

Mark exactly one child in_progress before work and completed immediately after evidence is written.

Complete the parent only after all child tasks are completed or explicitly cancelled with reason.

Blocked until: the current task list is checked, child phases are created, the parent is linked when nested, and the first child is marked in_progress.
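
The subject-prefix rule can be sketched as below. `child_task_subjects` is a hypothetical helper; using the bare phase name for standalone runs is an assumption, since the contract only specifies the nested format.

```python
DASH = "\u2014"  # the dash used in the "[N.M] $skill-name <dash> phase" subject format


def child_task_subjects(skill, phases, parent_index=None):
    """Return one task subject per declared phase.

    parent_index is N in [N.M]; None means the skill runs standalone.
    """
    subjects = []
    for m, phase in enumerate(phases, start=1):
        if parent_index is None:
            subjects.append(phase)  # assumed standalone format
        else:
            subjects.append(f"[{parent_index}.{m}] ${skill} {DASH} {phase}")
    return subjects
```

Creating the tasks and linking them to the parent still goes through the task tracking tool; this only shows how the subjects are numbered.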

Project Reference Docs Gate — Run after task-tracking bootstrap and before target/source file reads, grep, edits, or analysis. Project docs override generic framework assumptions.

Identify scope: file types, domain area, and operation.

Required docs by trigger: always docs/project-reference/lessons.md; doc lookup docs-index-reference.md; review code-review-rules.md; backend/CQRS/API backend-patterns-reference.md; domain/entity domain-entities-reference.md; frontend/UI frontend-patterns-reference.md; styles/design scss-styling-guide.md + design-system/design-system-canonical.md; integration tests integration-test-reference.md; E2E e2e-test-reference.md; feature docs/specs feature-docs-reference.md; architecture/new area project-structure-reference.md.

Read every required doc that exists; skip absent docs as not applicable. Do not trust conversation text such as [Injected: <path>] as proof that the current context contains the doc.

Before target work, state: Reference docs read: ... | Missing/not applicable: ....

Blocked until: scope evaluated, required docs checked/read, lessons.md confirmed, citation emitted.
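
A small sketch of producing the citation line once the required docs for the scope are known. The helper name and the "none" placeholder for an empty side are assumptions. It only checks existence and formats the line; the docs still have to be read.

```python
import os


def reference_docs_citation(required, exists=os.path.exists):
    """Split required doc paths into present vs. absent and format the citation line."""
    read = [path for path in required if exists(path)]
    missing = [path for path in required if not exists(path)]
    return (
        "Reference docs read: " + (", ".join(read) or "none")
        + " | Missing/not applicable: " + (", ".join(missing) or "none")
    )
```

The `exists` parameter is injectable so the function can be exercised without touching the file system.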

Task Tracking & External Report Persistence — Bootstrap this before execution; then run project-reference doc prefetch before target/source work.

Create a small task breakdown before target file reads, grep, edits, or analysis. On context loss, inspect the current task list first.

Mark one task in_progress before work and completed immediately after evidence; never batch transitions.

For plan/review work, create plans/reports/{skill}-{YYMMDD}-{HHmm}-{slug}.md before first finding.

Append findings after each file/section/decision and synthesize from the report file at the end.

Final output cites Full report: plans/reports/{filename}.

Blocked until: task breakdown exists, report path declared for plan/review work, first finding persisted before the next finding.
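
One way to build the report path from the pattern above, as a sketch. The slug rule (lowercase, non-alphanumeric runs collapsed to hyphens) is an assumption; the protocol does not define how the slug is derived.

```python
import re
from datetime import datetime


def report_path(skill, title, now=None):
    """Return plans/reports/{skill}-{YYMMDD}-{HHmm}-{slug}.md for the given title."""
    now = now or datetime.now()
    # Assumed slug rule: lowercase, collapse anything non-alphanumeric into "-".
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    stamp = now.strftime("%y%m%d-%H%M")
    return f"plans/reports/{skill}-{stamp}-{slug}.md"
```

Declaring this path before the first finding gives every later append a fixed target, which is what makes synthesis from disk possible after context loss.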

AI Mistake Prevention — Failure modes to avoid on every task:

Check downstream references before deleting. Deleting components causes documentation and code staleness cascades. Map all referencing files before removal.
Verify AI-generated content against actual code. AI hallucinates APIs, class names, and method signatures. Always grep to confirm existence before documenting or referencing.
Trace full dependency chain after edits. Changing a definition misses downstream variables and consumers derived from it. Always trace the full chain.
Trace ALL code paths when verifying correctness. Confirming code exists is not confirming it executes. Always trace early exits, error branches, and conditional skips, not just happy path.
When debugging, ask "whose responsibility?" before fixing. Trace whether bug is in caller (wrong data) or callee (wrong handling). Fix at responsible layer; never patch symptom site.
Assume existing values are intentional: ask WHY before changing. Before changing any constant, limit, flag, or pattern: read comments, check git blame, examine surrounding code.
Verify ALL affected outputs, not just the first. Changes touching multiple stacks require verifying EVERY output. One green check is not all green checks.
Holistic-first debugging: resist nearest-attention trap. When investigating any failure, list EVERY precondition first (config, env vars, DB names, endpoints, DI registrations, data preconditions), then verify each against evidence before forming any code-layer hypothesis.
Surgical changes: apply the diff test. Bug fix: every changed line must trace directly to the bug. Don't restyle or improve adjacent code. Enhancement task: implement improvements AND announce them explicitly.
Surface ambiguity before coding: don't pick silently. If request has multiple interpretations, present each with effort estimate and ask. Never assume all-records, file-based, or more complex path.
Business terminology in Application/Domain layers. Comments and naming in Application/Domain must stay business-oriented and technical-agnostic; avoid implementation terms (say background job, not Hangfire background job).

IMPORTANT MUST ATTENTION search 3+ existing patterns and read code BEFORE any modification. Run graph trace when graph.db exists.

IMPORTANT MUST ATTENTION check DRY via OOP, right responsibility layer, SOLID. Grep for dangling refs after moves.

IMPORTANT MUST ATTENTION apply complexity prevention — one business change = one code change. Flag change amplification (>3 edit sites for future change), scattered type-switches, anemic models, primitive obsession, leaked technology through abstractions, shallow modules, un-extracted utility logic (paging/datetime/string/retry → helpers), and logic in the wrong higher layer (downshift to callee/entity/VM). Don't rationalize silent duplication with pure YAGNI.

IMPORTANT MUST ATTENTION run at least ONE graph command on key files when graph.db exists. Pattern: grep → trace → verify.

IMPORTANT MUST ATTENTION verify WHAT code does matches WHY it changed. Trace happy + error paths.

IMPORTANT MUST ATTENTION check null safety, boundaries, error handling, resource management for every review.

IMPORTANT MUST ATTENTION map changed code paths to test cases. Flag untested paths.

IMPORTANT MUST ATTENTION check changed logic files for matching tests. Surface missing tests via a direct user question — mandatory, not advisory.

IMPORTANT MUST ATTENTION for multilingual UI text changes, verify translation updates. If missing, require explicit user decision via a direct user question.

MUST ATTENTION apply critical thinking — every claim needs traced proof, confidence >80% to act. Anti-hallucination: never present guess as fact.

MUST ATTENTION apply sequential-thinking — multi-step Thought N/M, REVISION/BRANCH/HYPOTHESIS markers, confidence % closer; see $sequential-thinking skill.

MUST ATTENTION apply AI mistake prevention — holistic-first debugging, fix at responsible layer, surface ambiguity before coding, re-read files after compaction.

MANDATORY Bootstrap task tracking before target work; transition one task at a time.
MANDATORY Persist plan/review findings to plans/reports/ incrementally and synthesize from disk.

MANDATORY After task-tracking bootstrap and before target/source work, read required project-reference docs and cite Reference docs read: ....
MANDATORY Always include lessons.md; project conventions override generic defaults.

MANDATORY Parent workflow rows do not replace child phase tracking; expand phases and link the parent when nested.
MANDATORY Orchestrators pre-expand child skill phases before invocation; use [N.M] $skill-name — phase prefixes and one-in_progress discipline.

Prompt-Enhance Closing Anchors

IMPORTANT MUST ATTENTION follow declared step order for this skill; NEVER skip, reorder, or merge steps without explicit user approval
IMPORTANT MUST ATTENTION for every step/sub-skill call: set in_progress before execution, set completed after execution
IMPORTANT MUST ATTENTION every skipped step MUST include explicit reason; every completed step MUST include concise evidence
IMPORTANT MUST ATTENTION if Task tools unavailable, maintain an equivalent step-by-step plan tracker with synchronized statuses

Closing Reminders

[CRITICAL — TOP 3 RULES REPEATED]

MUST ATTENTION Phase 0 graph blast-radius FIRST — NEVER skip; informs entire review priority order

Clean Round 1 ENDS the review. When issues found, fresh sub-agent re-review mandatory after fixing.

MUST ATTENTION task tracking ALL phases before starting; missing tests MUST surface via a direct user question

MANDATORY IMPORTANT MUST ATTENTION Nested Task Expansion Contract — when invoked inside a workflow, STILL expand internal phases via task tracking with [N.M] $review-changes — phase prefix and TaskUpdate(parentTaskId, addBlockedBy: [childIds]) linkage. Workflow row is container, not substitute.
MANDATORY IMPORTANT MUST ATTENTION break work into small todo tasks using task tracking BEFORE starting
MANDATORY IMPORTANT MUST ATTENTION validate decisions with user via a direct user question — NEVER auto-decide
MANDATORY IMPORTANT MUST ATTENTION add final review todo task to verify work quality
MANDATORY IMPORTANT MUST ATTENTION discover and READ project-specific reference docs before starting
MANDATORY IMPORTANT MUST ATTENTION Phase 0 graph blast-radius is FIRST step — NEVER skip it
MANDATORY IMPORTANT MUST ATTENTION fresh sub-agent re-review is mandatory ONLY after a fix cycle. Clean Round 1 ENDS the review.
MANDATORY IMPORTANT MUST ATTENTION documentation staleness check is REQUIRED in every review — flag stale docs even if not auto-fixing
MANDATORY IMPORTANT MUST ATTENTION missing tests for changed business logic MUST surface to user via a direct user question — NOT silently logged
MANDATORY IMPORTANT MUST ATTENTION run $why-review after completing this review to validate design rationale, alternatives considered, and risk assessment

[TASK-PLANNING] Before acting, analyze task scope and systematically break into small todo tasks and sub-tasks using task tracking.

[IMPORTANT] Analyze task size and break into many small todo tasks systematically before starting — critical for context preservation.

[FINAL PURPOSE REMINDER — MUST ATTENTION CRITICAL]

Ensure the changes are reasonable and free of potential bugs or flaws. Think critically and think hard.

Hookless Prompt Protocol Mirror (Auto-Synced)

Source: .claude/hooks/lib/prompt-injections.cjs + .claude/.ck.json

[WORKFLOW-EXECUTION-PROTOCOL] [BLOCKING] Workflow Execution Protocol — MANDATORY IMPORTANT MUST CRITICAL. Do not skip for any reason.

DETECT: Match prompt against workflow catalog
ANALYZE: Find best-match workflow AND evaluate if a custom step combination would fit better
ASK (REQUIRED FORMAT): Use a direct user question with this structure:
- Question: "Which workflow do you want to activate?"
- Option 1: "Activate [BestMatch Workflow] (Recommended)"
- Option 2: "Activate custom workflow: [step1 → step2 → ...]" (include one-line rationale)
ACTIVATE (if confirmed): Call $workflow-start <workflowId> for standard; sequence custom steps manually
CREATE TASKS: task tracking for ALL workflow steps
EXECUTE: Follow each step in sequence

[CRITICAL-THINKING-MINDSET] Apply critical thinking and sequential thinking. Every claim needs traced proof, confidence >80% to act. Anti-hallucination principle: never present a guess as fact; cite sources for every claim, admit uncertainty freely, self-check output for errors, cross-reference independently, and stay skeptical of your own confidence, because certainty without evidence is the root of all hallucination. AI Attention principle (Primacy-Recency): put the 3 most critical rules at both top and bottom of long prompts/protocols so instruction adherence survives long context windows.

Learned Lessons

Lessons Learned

[CRITICAL] Hard-won project debugging/architecture rules. MUST ATTENTION apply BEFORE forming hypothesis or writing code.

Quick Summary

Goal: Prevent recurrence of known failure patterns — debugging, architecture, naming, AI orchestration, environment.

Top Rules (apply always):

MUST ATTENTION verify ALL preconditions (config, env, DB names, DI regs) BEFORE code-layer hypothesis
MUST ATTENTION fix responsible layer — NEVER patch symptom sites with caller-specific defensive code
MUST ATTENTION use ExecuteInjectScopedAsync for parallel async + repo/UoW — NEVER ExecuteUowTask
MUST ATTENTION name by PURPOSE not CONTENT — adding member forces rename = abstraction broken
MUST ATTENTION persist sub-agent findings incrementally after each file — NEVER batch at end
MUST ATTENTION Windows bash: verify Python alias (where python/where py) — NEVER assume python/python3 resolves

Debugging & Root Cause Reasoning

[2026-04-11] Holistic-first: verify environment before code. Failure → list ALL preconditions (config, env vars, DB names, endpoints, DI regs, credentials, permissions, data prerequisites) → verify each via evidence (grep/cat/query) BEFORE code-layer hypothesis. Worst rabbit holes: diving nearest layer while bug sits elsewhere — e.g., hours debugging "sync timeout", real cause: test appsettings pointing wrong DB. ALWAYS cheapest check first.
[2026-04-01] Ask "whose responsibility?" before fixing. Trace: bug caller (wrong data) or callee (wrong handling)? Fix responsible layer — NEVER patch symptom site masking real issue.
[2026-04-01] Trace data lifecycle, not error site. Follow data: creation → transformation → consumption. Bug usually where data created wrong, not consumed.
[2026-04-01] Code caller-agnostic. Functions/handlers/consumers don't know who invokes them. Comments/guards/messages describe business intent — NEVER reference specific callers (tests, seeders, scripts).

Architecture Invariants

[2026-05-09] User name materialization MUST ATTENTION go through User.UpdateName(firstName, middleName, lastName). Domain method (src/Services/bravoTALENTS/Employee.Domain/AggregatesModel/User.cs:202-209) recomputes FullName as single source of truth. Three sites still manually patch user.FullName = user.GetFullName() after assigning name fields — src/Services/bravoTALENTS/Employee.Application/Factories/UserFactory.cs:50, src/Services/bravoSURVEYS/LearningPlatform.Application/ApplyPlatform/MessageBus/Consumers/AccountUserDeletedEventBusConsumer.cs:102, src/Services/bravoINSIGHTS/Analyze/Analyze.Application/MessageBus/Consumers/AccountUserDeletedEventBusConsumer.cs:66. Next time touching any: replace manual patch with user.UpdateName(...) to maintain invariant.
[2026-03-31] ParallelAsync + repo/UoW MUST ATTENTION use ExecuteInjectScopedAsync, NEVER ExecuteUowTask. ExecuteUowTask creates new UoW but reuses outer DI scope (same DbContext) — parallel iterations sharing non-thread-safe DbContext silently corrupt data. ExecuteInjectScopedAsync creates new UoW + new DI scope (fresh repo per iteration).
[2026-03-31] Bus message naming MUST ATTENTION include service name prefix — core services NEVER consume feature events. Prefix declares schema ownership (AccountUserEntityEventBusMessage = Accounts owns). Core services (Accounts, Communication) leaders. Feature services (Growth, Talents) sending to core MUST ATTENTION use {CoreServiceName}...RequestBusMessage — NEVER define own event for core to consume.

Naming & Abstraction

[2026-04-12] Name PURPOSE not CONTENT — "OrXxx" anti-pattern. HrManagerOrHrOrPayrollHrOperationsPolicy names set members, not what guards. Add role → rename = broken abstraction. Rule: names express DOES/GUARDS, not CONTAINS. Test: adding/removing member forces rename? YES = content-driven = bad → rename to purpose (e.g., HrOperationsAccessPolicy). Nuance: "Or" fine behavioral idioms (FirstOrDefault, SuccessOrThrow) — expresses HAPPENS, not membership.

Environment & Tooling

[2026-04-20] Windows bash: NEVER assume python/python3 resolves — verify alias first. Python may not be on the bash PATH under those names. Check: where python / where py. ALWAYS prefer py (Windows Python Launcher) one-liners, or node if a JS alternative exists.
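
When a script has to shell out to Python, the same check can be done programmatically. This is a sketch; `resolve_python` is a hypothetical helper and the candidate order simply follows the preference stated above (`py` first).

```python
import shutil


def resolve_python(candidates=("py", "python", "python3"), which=shutil.which):
    """Return (name, full_path) for the first candidate found on PATH, else None."""
    for name in candidates:
        path = which(name)
        if path:
            return name, path
    return None
```

A `None` result means no candidate resolved, which is the cue to stop and ask rather than guess at an interpreter name.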

Test-specific lessons → docs/project-reference/integration-test-reference.md Lessons Learned section. Production-code anti-patterns → docs/project-reference/backend-patterns-reference.md Anti-Patterns section. Generic debugging/refactoring reminders → System Lessons .claude/hooks/lib/prompt-injections.cjs.

Closing Reminders

IMPORTANT MUST ATTENTION holistic-first: verify ALL preconditions (config, env, DB names, endpoints, DI regs) BEFORE code-layer hypothesis — cheapest check first
IMPORTANT MUST ATTENTION fix responsible layer — NEVER patch symptom site; trace caller (wrong data) vs callee (wrong handling), fix root owner
IMPORTANT MUST ATTENTION parallel async + repo/UoW → ALWAYS ExecuteInjectScopedAsync, NEVER ExecuteUowTask (shared DbContext = silent data corruption)
IMPORTANT MUST ATTENTION bus message prefix = schema ownership; feature services NEVER define events for core services — use {CoreServiceName}...RequestBusMessage
IMPORTANT MUST ATTENTION name by PURPOSE — adding/removing member forces rename = broken abstraction
IMPORTANT MUST ATTENTION sub-agents MUST write findings after each file/section — NEVER batch all findings into one final write
IMPORTANT MUST ATTENTION Windows bash: NEVER assume python/python3 resolves — run where python/where py first, use py launcher or node
IMPORTANT MUST ATTENTION every claim needs file:line evidence — confidence >80% to act, NEVER speculate

[LESSON-LEARNED-REMINDER] [BLOCKING] Task Planning & Continuous Improvement — MANDATORY. Do not skip.

Break work into small tasks (task tracking) before starting. Add final task: "Analyze AI mistakes & lessons learned".

Extract lessons — ROOT CAUSE ONLY, not symptom fixes:

Name the FAILURE MODE (reasoning/assumption failure), not symptom — "assumed API existed without reading source" not "used wrong enum value".
Generality test: does this failure mode apply to ≥3 contexts/codebases? If not, abstract one level up.
Write as a universal rule — strip project-specific names/paths/classes. Useful on any codebase.
Consolidate: multiple mistakes sharing one failure mode → ONE lesson.
Recurrence gate: "Would this recur in future session WITHOUT this reminder?" — No → skip $learn.
Auto-fix gate: "Could $code-review/$code-simplifier/$security/$lint catch this?" — Yes → improve review skill instead.
BOTH gates pass → ask user to run $learn.

[TASK-PLANNING] [MANDATORY] BEFORE executing any workflow or skill step, create/update task tracking for all planned steps, then keep it synchronized as each step starts/completes.
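
The recurrence gate and auto-fix gate from the lesson-extraction list can be read as a small decision function. The sketch below is one possible reading in Python; the function name, parameter names, and returned labels are hypothetical and not part of any skill.

```python
def lesson_gate_outcome(would_recur_without_reminder: bool,
                        catchable_by_review_skill: bool) -> str:
    """Decide what to do with a candidate lesson (illustrative labels only)."""
    # Recurrence gate: if the mistake would not recur without the reminder, skip $learn.
    if not would_recur_without_reminder:
        return "skip"
    # Auto-fix gate: if a review-type skill could catch it, improve that skill instead.
    if catchable_by_review_skill:
        return "improve-review-skill"
    # Both gates pass: ask the user to run $learn.
    return "ask-user-to-run-learn"


if __name__ == "__main__":
    print(lesson_gate_outcome(True, False))
```

Checking the recurrence gate first follows the order in which the gates are listed; the source does not say what happens when both gates fail, so this sketch simply skips in that case.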
