Dev Workflow

Usage

/dev-workflow --init                                     # Project setup (detect check/test commands)
/dev-workflow [-i N | --iterations N] [--fast] <task>    # Execute workflow (default)
/dev-workflow --resume <state-file> [-i N] [--fast]      # Resume next subtask from a decomposition state file

Prerequisites

Reviewer skill (reviewer setting, default: ask-peer): Required for plan/code review. Supported: ask-peer, ask-claude, ask-codex, ask-gemini, ask-copilot, ask-agy. If a Skill() call for the configured reviewer fails, attempt it once more before declaring the reviewer unavailable. If it is still unavailable, present the user with three explicit fallback options, each with its own resume semantics:
- (a) Switch to another supported reviewer from the list: re-invoke the current review step with the new reviewer immediately (the original reviewer is not retried).
- (b) Self-review: perform the review inline and advance past the current step (no later retry of the original reviewer).
- (c) Pause at the current gate until the skill is installed: name the specific step where the original reviewer call will be retried once the skill is available.
- If the user picks option (a) and the substitute reviewer's Skill() call also fails, the same retry-once-then-three-options protocol re-applies to the substitute; it is keyed on whichever reviewer is currently selected, not on the original reviewer setting.
- Do not silently advance past a review pass without the user knowing their options.
rules-review skill: Required for rules compliance review (Step 7.5). If a Skill(rules-review) call fails, attempt once more before declaring unavailable. If still unavailable: skip Step 7.5 with a message that names the fallback (Step 8 reviewer as a lightweight backup) and the resume point (re-run rules-review manually after the session or re-run the workflow once the skill is installed), and append rules-review unavailable (Step 7.5) to bundle_skills_unavailable (§ Step 1 sub-step 3's "Initialize the bundle-unavailability ledger here" bullet).
extract-rules skill: Required for rule update. If a Skill(extract-rules) call fails, attempt once more before declaring unavailable. If still unavailable, Step 11 skips its extraction work and proceeds without it — no rule updates run this session. The fallback (saving reusable patterns to a candidates file for later manual re-extraction) and the bundle_skills_unavailable append are defined inline at Step 11 sub-step 4; this bullet does not restate them.
Cleanup skill (Step 6 Tidy): The Step 6 cleanup pass prefers the built-in simplify skill. Invoke Skill(simplify); if the call fails (skill-not-found or equivalent — a Claude Code version that lacks the built-in simplify), attempt once more, then emit a one-line note naming the fallback (e.g. simplify unavailable — falling back to in-house tidy) and fall back to the bundled Skill(tidy). simplify's absence is not recorded to bundle_skills_unavailable — it is a Claude Code version issue unrelated to dev-workflow-bundle installation. If Skill(tidy) also fails (attempt once more): this second failure is a bundle-completeness signal (tidy is itself a dev-workflow-bundle sibling skill) — skip the Step 6 cleanup dispatch entirely with a one-line note (both simplify and tidy unavailable this run) and append tidy unavailable (Step 6 cleanup fallback) to bundle_skills_unavailable (§ Step 1 sub-step 3's "Initialize the bundle-unavailability ledger here" bullet); see Step 6 sub-step 2's "Fallback — Skill(tidy)" bullet for where this folds into Step 6's existing completion path. "Available" is defined by the observable call outcome (a successful call), not by introspecting the in-context skill list — this mirrors the reviewer / rules-review / extract-rules bullets above so the orchestrator follows it deterministically. This bullet is the single source of truth for the simplify→tidy resolution (and the tidy-also-fails case above); Step 6 references it rather than restating the definition. The fallback proceeds without a user gate (unlike the reviewer bullet's three-option prompt): tidy is a functionally-equivalent cleanup pass, so swapping it for simplify does not change outcomes materially enough to warrant a user decision — whereas a reviewer swap changes review quality and so warrants one. After simplify (or the tidy fallback, or the both-unavailable skip) returns, judge the result semantically and proceed per § No-Stall Principle.
prose-polish skill (Step 4 plan-body polish + Step 6.5 Polish Prose): Used to refine resolved-language prose in the plan body (Step 4) and in changed files (Step 6.5). Both call sites are gated by the polish_prose setting (default true); when it is not true, neither pass runs (see § Configuration's polish_prose bullet). If a Skill(prose-polish) call fails (no verdict returned, or the skill unavailable), the workflow retries once, then proceeds without it — Step 4 presents the un-polished plan and Step 6.5 leaves the prose un-polished; both skip-and-continue without a user gate (polish is correctness-neutral, like the simplify→tidy fallback above). The fallback contract is defined inline at Step 4's plan-body prose-polish paragraph and Step 6.5's verdict-handling sub-step; this bullet does not restate it.
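
The simplify→tidy resolution described in the Cleanup skill bullet above can be sketched in Python as follows. This is only an illustration under stated assumptions: `call_skill` is a hypothetical callable standing in for a `Skill(<name>)` invocation that returns True on an observed successful call, and `ledger` stands in for `bundle_skills_unavailable`. The note strings are placeholders, not the workflow's actual wording.

```python
from typing import Callable, List, Optional, Tuple


def call_with_retry(call_skill: Callable[[str], bool], name: str) -> bool:
    """Try the skill once, then once more; availability is judged by call outcome."""
    return call_skill(name) or call_skill(name)


def resolve_cleanup(
    call_skill: Callable[[str], bool], ledger: List[str]
) -> Tuple[Optional[str], List[str]]:
    """Return (skill that ran, one-line notes). `call_skill` is a hypothetical stand-in."""
    notes: List[str] = []
    if call_with_retry(call_skill, "simplify"):
        return "simplify", notes
    # A missing simplify is a Claude Code version issue, so the ledger is untouched.
    notes.append("simplify unavailable; falling back to in-house tidy")
    if call_with_retry(call_skill, "tidy"):
        return "tidy", notes
    # tidy is a bundle sibling, so its absence is a bundle-completeness signal.
    notes.append("both simplify and tidy unavailable this run")
    ledger.append("tidy unavailable (Step 6 cleanup fallback)")
    return None, notes
```

Neither branch asks the user anything, which mirrors the bullet's point that the cleanup swap proceeds without a user gate, unlike the reviewer fallback.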

Configuration

Settings files (YAML frontmatter only, merged across layers):

~/.claude/dev-workflow.local.md — User global defaults (lowest priority)
.claude/dev-workflow.md — Project shared settings (git tracked, team-shared)
.claude/dev-workflow.local.md — Personal overrides (gitignored, highest priority)

Merge strategy per key type (summary — the canonical operational definition, including the null/empty-clears and absent-inherits rules, is the Step 1: Load Settings "Overlay" procedure; keep the two in sync):

Scalar (reviewer, review_iterations, subagent_model, interactive_commits, compact_rules, plan_review_gate (+ its deprecated predecessor visual_plan_review, still resolved as a Scalar via the compat mapping described in the plan_review_gate bullet), polish_prose, confirm_remaining_steps, custom_instructions, language): higher layer wins (replaces) when the key is present; a key absent from a higher layer inherits from lower layers (see the inherit note below). When review_iterations carries a map value ({plan, code}) it is still a scalar key here — a higher layer's value replaces the lower layer's wholesale, with no per-key cross-layer merge (an absent map key is not back-filled from a lower layer; it falls to default 3 at resolution time). The subagent_model map ({<tier>: <model>}) is the same scalar/map class — a higher layer's map replaces the lower layer's wholesale (no per-key cross-layer merge), and an absent tier key falls to its built-in per-tier default at resolution time (sonnet for trivial / simple, inherit for moderate / complex)
List (check_commands): append — lower-layer items first, then higher-layer items, duplicates removed (keep first occurrence)
List-replace (test_commands): higher layer's list replaces lower layer's list as a whole (no item-level merge or dedup). Defaults to ["Skill(run-tests)"] when unset
hooks: deep-merge at the hooks level — each sub-key (on_complete) is merged as a list (append, deduplicated)

Keys absent from a higher layer inherit from lower layers. Only specify keys you want to override or extend.
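
As a rough illustration of the merge strategy summarized above, the following Python sketch merges already-parsed frontmatter dictionaries ordered from lowest to highest priority. The function names are hypothetical, and the sketch assumes every present value is well-typed and non-null: the null/empty-clears rule lives in the Step 1 "Overlay" procedure and is deliberately not modeled here.

```python
from typing import Any, Dict, List

# Keys merged by appending (lower-layer items first, duplicates dropped).
LIST_APPEND_KEYS = {"check_commands"}


def _append_dedup(lower: List[Any], higher: List[Any]) -> List[Any]:
    """Concatenate two lists, keeping the first occurrence of each item."""
    merged: List[Any] = []
    for item in list(lower) + list(higher):
        if item not in merged:
            merged.append(item)
    return merged


def merge_layers(layers: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge settings layers; `layers` is ordered lowest priority first."""
    result: Dict[str, Any] = {}
    for layer in layers:
        for key, value in layer.items():
            if key in LIST_APPEND_KEYS:
                result[key] = _append_dedup(result.get(key, []), value)
            elif key == "hooks":
                # Deep-merge at the hooks level: each sub-key is an appended list.
                hooks = dict(result.get("hooks", {}))
                for sub_key, entries in value.items():
                    hooks[sub_key] = _append_dedup(hooks.get(sub_key, []), entries)
                result["hooks"] = hooks
            else:
                # Scalars, map-valued scalars (review_iterations, subagent_model)
                # and the list-replace key test_commands: higher layer wins wholesale.
                result[key] = value
    # test_commands falls back to its documented default when no layer sets it.
    result.setdefault("test_commands", ["Skill(run-tests)"])
    return result
```

A key that a higher layer omits is never visited in that layer's loop, so the lower layer's value survives, which is the inherit behavior described above.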

---
reviewer: "ask-peer"
review_iterations: 3
subagent_model:
  trivial: sonnet
  simple: sonnet
interactive_commits: true
compact_rules: false
plan_review_gate: "visual"   # plan-mode | visual | crit
polish_prose: true
confirm_remaining_steps: false
custom_instructions: "Always use TDD. Write tests before implementation."
language: "ja"
check_commands:
  - "pnpm run lint:fix"
  - "pnpm run format"
  - "pnpm run typecheck"
test_commands:
  - "Skill(run-tests)"
hooks:
  on_complete:
    - "Skill(work-complete)"
self_retrospective:
  feedback: "owner/repo"        # or "/abs/path", "~/rel", "./rel"
workability_retrospective:
  enabled: false                # opt-in (experimental); Step 11.6 project-tooling retrospective
  backlog_dir: ".claude/improvements"
---

reviewer: Reviewer skill name (default: ask-peer). Choose from: ask-peer, ask-claude, ask-codex, ask-gemini, ask-copilot, ask-agy. Unsupported values fall back to ask-peer
review_iterations: Max iterations for Plan Review (Step 3) and Code Review (Step 8) (default: 3). Two accepted forms: (i) a scalar positive integer applies the same cap to both phases (e.g. review_iterations: 2); (ii) a map {plan: N, code: M} sets the Plan Review cap (Step 3) and the Code Review cap (Step 8) independently (e.g. {plan: 1, code: 3}). Each map key must be a positive integer; an absent or invalid key falls back to default 3 for that phase only (per-key validation — see Step 1 sub-step 4). Adopting the map form is the user's explicit opt-in; the scalar form and an absent key are fully backward-compatible (both phases default to 3). Can be overridden per invocation with -i N / --iterations N, which overrides both phases with the same value regardless of the config form
subagent_model: Optional. A map from difficulty tier (trivial / simple / moderate / complex) to the model the workflow's Agent-tool subagent dispatches run on: one of the model ids the current Agent tool's model parameter accepts (sonnet / opus / haiku / fable as of this writing; check the tool's live schema loaded in the current session rather than this fixed list, since Anthropic adds new model families over time), or inherit (use the session model).
- Governs (i) the workflow's direct Agent dispatches: Step 7's two background launches; the shared session scan (one dispatch performed at Step 11 / Step 11.5 / Step 11.6, see references/session-scan.md); and the conditional Step 5 delegation when it fires (see Step 5's sub-step 2 delegation guidance).
- Governs (ii) the model propagated via the Model: argument to the named callees the workflow dispatches: Step 7.5 rules-review; Step 6's tidy fallback dispatch (the bundled Skill(tidy) cleanup callee); and the Step 3 / Step 8 inline reviewer when the resolved reviewer is Claude-family (see Step 1's reviewer-family classification; external-CLI reviewers are not affected).
- Callees that receive no propagated model: the built-in simplify primary path takes no model and is unaffected. Step 6.5's and Step 4's prose-polish file-mode callees likewise receive no propagated model, for a different reason from simplify's no-argument contract: prose-polish keeps its own deliberate sonnet default (its SKILL.md notes sonnet produces more concise, natural prose than larger models), which propagating subagent_model would override and defeat.
- Excluded from this governance: the conditional Step 2 research delegation (no-Plan-Mode path only, visual or crit; see Step 2 sub-step 3's "Codebase-research delegation" guidance). It is a direct Agent dispatch, but it fires before Adjust N resolves the tier, so subagent_model is still its inherit init there and it always runs on the session model.
- Resolved once in Step 2 from the assessed tier (see Step 2's Adjust N for the resolution chain and the -i-path handling).
- Built-in default = {trivial: sonnet, simple: sonnet} (moderate / complex inherit). Behavior change: under this default, Trivial and Simple tasks run their subagent dispatches on sonnet instead of the session model. To opt out (restore all-inherit on the low tiers), set subagent_model: {trivial: inherit, simple: inherit} in .claude/dev-workflow.md or ~/.claude/dev-workflow.local.md.
- Invalid values / unknown tier keys warn and fall back to the built-in per-tier default.
- hooks.on_complete skill entries' models are independent of this key: the workflow never passes Model: to hooks.on_complete callee skills; each callee's model is set skill-side and is unaffected by subagent_model.
- Per-subagent effort is out of scope: the Agent tool exposes only model.
interactive_commits: Whether Step 10 (Interactive Commits) runs after hooks.on_complete (default: true). When true, after Step 9 (Completion Hooks) the workflow proposes commit groupings and messages, then iterates per-commit with the user; it additionally proposes committing the updates that Step 11 (Update Rules) writes across extract-rules' output directories (rule files under output_dir, .examples.md files under examples_output_dir, and staged candidates under staging_output_dir) — see Step 11's "Commit rule updates" sub-step. Behavior change: previously those rule changes were always left for the user to commit manually (the Completion rule-update reminder); under interactive_commits: true the workflow now proposes the commit as an additional gate, and the manual-commit reminders fire only for whatever changes remain uncommitted after that gate (decline the gate to keep the manual-commit behavior). When false, Step 10 is omitted from the task list and never executes, and the Step 11 rule-update commit is likewise never proposed — the workflow ends with an uncommitted tree (the manual-commit reminder fires as before). Non-boolean values fall back to true with a warning. To opt out, set interactive_commits: false in .claude/dev-workflow.md or ~/.claude/dev-workflow.local.md
compact_rules: Whether Step 11 sub-step 3 (Char-count compaction gate) runs (default: false). The compaction mode added in v1.38.0 is currently experimental — when false (the default), sub-step 3 is skipped entirely: Skill(extract-rules) --compact is never invoked, the gate is never opened, and compaction_applied_count / below_threshold_failed_files stay at their initial values so § Completion's compaction reminder is automatically omitted. When true, the workflow invokes Skill(extract-rules) --compact and may enter the Step 11 compaction approval gate (USER APPROVAL GATE). Non-boolean values fall back to false with a warning. To opt in for a specific project, set compact_rules: true in .claude/dev-workflow.md or .claude/dev-workflow.local.md
plan_review_gate: Which surface Step 4 (Finalize Plan) uses to gate plan approval — plan-mode / visual / crit (default: visual). plan-mode enters Claude Code's built-in Plan Mode (plan_mode_active = true); Step 4 then uses the text approval path (chat-rendered condensed plan → ExitPlanMode). visual and crit both keep Step 2 out of Plan Mode (plan_mode_active = false, see Step 2 sub-step 2's resolution) so their gates can perform the non-read-only operations Plan Mode forbids.
- visual (default, no behavior change from the legacy visual_plan_review: true) runs the bundled browser-based review gate via references/visual-plan-review.md: summary header, collapsible sections, Decision cards, per-element comments; localized comments are applied to the plan and the gate re-runs internally. Returns one of three outcomes: approve (proceed to implementation), rewrite-approach (an approach-level material change), or fallback (the local browser is unreachable, via CLAUDE_CODE_REMOTE, or the launch fails) — fallback routes to a no-Plan-Mode chat approval.
- crit (opt-in, experimental) runs the external crit CLI (https://github.com/tomasz-tomczyk/crit — a separately-installed local browser review tool, not bundled with this skill) via references/crit-plan-review.md, which returns the same three-value contract (approve / rewrite-approach / fallback) as the visual gate. Availability is detected via the crit --version exit code (command -v crit is avoided — it would trip a permission dialog). When crit is unavailable, its launch fails, or the local browser is unreachable, Step 4 routes the fallback to the visual gate above rather than straight to chat (crit → visual → chat) — the bundled visual gate is always available locally, so this preserves a browser-review experience for crit-selecting users. See references/crit-plan-review.md § Fallback contract.
- On Claude Code on the Web, visual and crit both always fall back to the no-Plan-Mode chat approval (no rich browser surface there), but still take Step 2 out of Plan Mode.
- Invalid values fall back to visual with a warning.
- Legacy key: visual_plan_review (boolean) is deprecated but still read for backward compatibility — true maps to visual, false maps to plan-mode. When both keys are set, plan_review_gate wins — "set" means present, not valid: a plan_review_gate holding an invalid value still wins over the legacy key and resolves via the "Invalid values fall back to visual" rule above; it never falls through to the legacy mapping. The legacy key is a temporary backward-compatibility shim; it will be removed in a future release following a standard experimental → graduate → deprecation-notice → removal lifecycle, with a deprecation notice added here first.
- To opt out of the browser-based gates back to the Plan Mode flow, set plan_review_gate: "plan-mode" in .claude/dev-workflow.md or .claude/dev-workflow.local.md.
- (History: v1.70.0 introduced the visual gate as an experimental opt-in; v1.72.0 made it skip Plan Mode entirely; v1.81.0 removed the experimental marker. Behavior change from v1.88.0: visual_plan_review (boolean) is renamed to the plan_review_gate enum and the opt-in crit gate is added; the legacy key keeps working via the compat mapping above.)
polish_prose: Whether the workflow's two prose-polish passes — Step 6.5 (Polish Prose, file-mode polish of the changed files) and the Step 4 plan-body polish (file-mode polish of the plan document before it is presented) — run (default: true). When true (the default), both passes run: the Step 4 plan-body polish runs on every difficulty tier, and Step 6.5 is still subject to the difficulty-skip matrix, so it runs only on Moderate / Complex. When explicitly set to false, both passes are skipped: Step 6.5 marks itself completed and proceeds to Step 7 (emitting a one-line note on Moderate / Complex; on Trivial / Simple the difficulty-skip matrix already owns the skip, so no polish_prose note fires), and the Step 4 plan-body polish is skipped silently (the un-polished plan is presented). Non-boolean values fall back to true with a warning. To opt out for a specific project, set polish_prose: false in .claude/dev-workflow.md or .claude/dev-workflow.local.md. Behavior change from v1.78.0: the default flipped from false (opt-in) to true — projects that left polish_prose unset now run both prose-polish passes by default; set polish_prose: false to opt out. (History: the prose-polish wiring was added in v1.77.0 running unconditionally; v1.78.0 gated it behind polish_prose: true defaulting to opt-in.)
confirm_remaining_steps: Whether the workflow asks, at the entry to Step 11 (Update Rules) — i.e. after the commit phase (after Step 10 Interactive Commits, or after Step 9 Completion Hooks when interactive_commits: false and Step 10 is omitted) — whether to run the remaining rule-maintenance and retrospective steps (Step 11 Update Rules, Step 11.5 Self-Retrospective, Step 11.6 Workability Retrospective — whichever are registered) or skip them and proceed straight to Completion (default: false). The gate is experimental (recently added); it also fires under --fast regardless of this setting — see Step 11's "Confirm remaining steps" gate for the exact firing condition and behavior. Non-boolean values fall back to false with a warning. To opt in for a specific project, set confirm_remaining_steps: true in .claude/dev-workflow.md or .claude/dev-workflow.local.md
custom_instructions: Free-form development instructions applied as guiding principles across planning, implementation, review, and tidy phases (e.g., "Always use TDD", "Prefer functional style"). Optional. .claude/rules/ and explicit user requests take precedence if they conflict
language: Optional. Output language code (e.g. ja, en) for user-facing prose produced by this skill — Step 4 plan body (Overview / Decisions / Design / Test plan / Risks / Unknowns content), user-gate preambles (Step 4 / Step 7.5 / Step 8 / Step 11.6), Step 2 difficulty-assessment log, Step 10 commit-plan / per-commit gate output and the Step 11 "Commit rule updates" gate output (subjects, body, diff blocks framed in the resolved language; verbatim git output and file paths remain English), Completion summary, and Step 11.5 finding Description / Suggested fix direction paragraphs. Resolution: merged skill config → Claude Code settings (~/.claude/settings.json → language field) → default ja. null / empty string / non-string values fall through to the next resolution step. For the localization boundary between translated concepts and verbatim identifiers, see references/plan-format.md § Localization granularity. See references/self-retrospective.md §2.1 Language handling / §5 Contract note for the Step 11.5 scope contract. No Step 11.5 output unless self_retrospective.feedback is also configured
check_commands: Static checks (lint, format, typecheck, etc.). Always run all in order
test_commands: Defaults to ["Skill(run-tests)"]. Each entry must be a Skill(<name>) string (no shell commands). Entries run sequentially during Step 7. Run --init to generate or update run-tests; additional structural-check skills can be appended in project config (e.g. for bundle-sync drift detection, custom marketplace structure validators, or other repository-specific checks)
hooks: Execute skills/commands at specific workflow timing points
- on_complete: Runs as Step 9 (immediately after Step 8 Code Review). Entry format: Skill(<name>) or shell command string
- Entries not covered by allowed-tools require user approval
self_retrospective: Optional. Emits a sanitized improvement signal for the dev-workflow-bundle skills (dev-workflow, ask-peer, extract-rules, rules-review) at Step 11.5 (between Step 11 and Completion). The raw conversation stays in-session; only abstracted text leaves the session
- feedback: Destination string. Auto-detected:
  - Starts with /, ~/, ./, or ../ → local directory path → retrospective written as a markdown file under that directory
  - Matches ^[\w.-]+/[\w.-]+$ → GitHub owner/repo → retrospective submitted via gh api POST to /repos/<feedback>/issues
  - Any other string (including empty) → warn and skip Step 11.5
- If feedback is unset, Step 11.5 is not registered as a task and never executes — the workflow behaves as before
- Step 11.5 runs whenever self_retrospective.feedback is configured, regardless of the Step 2 difficulty assessment — difficulty gates the review-iteration counts N_plan / N_code (Step 3 / Step 8) and the difficulty-skip matrix (Step 6 Tidy / Step 6.5 Polish Prose / Step 7.5 Rules Compliance on Trivial / Simple), but not the self-retrospective. Even Simple / Trivial tasks emit a retrospective when feedback is set; when nothing notable surfaced, the retrospective is simply short
- Agent tool usage: Direct Agent-tool subagent spawns happen at three fixed infrastructure dispatch sites per run (across two steps):
  - Step 7's two concurrent background launches: the per-pass rules-review launch and the per-pass code review (run_in_background dispatches for test-phase overlap; see Step 7's "Concurrent rules-review launch" and "Concurrent code review launch" paragraphs for why Agent rather than Skill()).
  - The shared session scan: one dispatch covering the rule-extraction, self-retrospective, and / or workability axes, performed once by whichever of Step 11 / Step 11.5 / Step 11.6 is the first participating step to dispatch, so the scan's host step varies across runs but the site count does not (see references/session-scan.md).
  - Beyond these three fixed sites, two conditional delegations may fire. Step 2 MAY delegate read-only codebase research to a subagent when plan_review_gate is not plan-mode (fires only on the no-Plan-Mode path, visual or crit, when the task benefits from non-trivial research; see Step 2's sub-step 3 "Codebase-research delegation" guidance). Step 5 MAY conditionally delegate a well-specified implementation unit to a task-effective subagent (fires only when Step 5 delegates such a unit, e.g. a settled bulk-mechanical subtask; rare on a typical run, zero when no unit benefits; see Step 5's sub-step 2 delegation guidance).
  - Model selection: each of the three fixed sites, and the conditional Step 5 delegation when it fires, passes the Step 2-resolved subagent_model as the Agent model parameter (omitted when the resolution is inherit). The conditional Step 2 research delegation is the exception: it dispatches before Adjust N resolves the tier, so it always runs on the session model (subagent_model is still its inherit init) and is excluded from subagent_model governance.
  - All other steps delegate to named skills (Skill(ask-peer), Skill(run-tests), Skill(rules-review), Skill(simplify) / Skill(tidy), Skill(prose-polish), etc.) and must not invoke Agent directly (the Step 2 research delegation and the Step 5 implementation delegation above are the sanctioned exceptions, each scoped by its own guards).
  - Named-skill propagation is not a direct Agent spawn: the Step 3 / Step 8 inline reviewer's subagent_model propagation rides the named Skill(<reviewer>) call's Model: argument, and Step 6's tidy fallback propagation rides the named Skill(tidy) call's Model: argument; neither counts against the three fixed dispatch sites above. Step 6.5's prose-polish file-mode callee and Step 4's plan-body prose-polish callee are likewise named-skill dispatches, not direct Agent spawns, and receive no propagated model per the subagent_model bullet.
  - The Step 4 plan-review gates launch their external process via background Bash (run_in_background), not the Agent tool; neither is a subagent dispatch, and neither counts against the three fixed dispatch sites here. With plan_review_gate: visual, Step 4 always runs the gate, which itself owns the browser-reachability determination and only launches serve.mjs when the local browser is reachable (see references/visual-plan-review.md). With plan_review_gate: crit, the gate likewise owns its own availability/reachability determination and only launches the crit CLI when reachable (see references/crit-plan-review.md).
workability_retrospective: Optional. Detects this session's project-tooling workability improvements at Step 11.6 (between Step 11.5 and Completion) and offers a per-candidate disposition gate. Like self_retrospective, it is a nested map merged as a scalar/map key (a higher layer replaces the lower layer's map wholesale; no per-key cross-layer merge)
- enabled: Whether Step 11.6 runs (default: false). The detection + 4-way disposition feature is experimental (recently added) — when false (the default), Step 11.6 is not registered as a task and never executes. To opt in for a specific project, set workability_retrospective.enabled: true in .claude/dev-workflow.md or .claude/dev-workflow.local.md. Non-boolean values fall back to false with a warning
- backlog_dir: Directory for the "save to backlog" disposition's markdown files (default: .claude/improvements). A project that enables this feature should add its backlog_dir to .gitignore (the backlog is kept; only commit inclusion is blocked). Non-string / empty values fall back to the default with a warning
- Step 11.6 runs whenever workability_retrospective.enabled is true, regardless of the Step 2 difficulty assessment (same as self_retrospective — difficulty gates the review-iteration counts and the difficulty-skip matrix, not the retrospective steps). The Step 11.6 detection is served by the shared session scan (see references/session-scan.md): Step 11.6 dispatches that scan when no earlier participating step did (Step 11 abstained because rule-extraction-active was false, and Step 11.5 was unregistered or pre-flight-aborted), otherwise it consumes the workability block from the already-dispatched scan — see the Agent tool usage bullet above
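
Three of the per-key rules in the settings list above (the review_iterations forms, the plan_review_gate / visual_plan_review precedence, and the self_retrospective.feedback destination detection) can be sketched in Python as below. All function names are hypothetical. The sketch assumes it receives the already-merged config, that `-i N` has been validated before it is passed in, and that an invalid scalar review_iterations or a non-boolean legacy key falls to the default (the text above does not spell those two cases out). The Step 2 difficulty adjustment of N is not modeled.

```python
import re
from typing import Any, Dict, Optional, Tuple

DEFAULT_ITERATIONS = 3
VALID_GATES = {"plan-mode", "visual", "crit"}
OWNER_REPO = re.compile(r"^[\w.-]+/[\w.-]+$")


def _is_positive_int(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


def resolve_review_iterations(
    setting: Any, cli_override: Optional[int] = None
) -> Tuple[int, int]:
    """Return the configured caps as (plan, code)."""
    if cli_override is not None:  # -i N overrides both phases, whatever the config form
        return cli_override, cli_override
    if _is_positive_int(setting):  # scalar form: same cap for both phases
        return setting, setting
    if isinstance(setting, dict):  # map form: validate each key independently
        plan, code = setting.get("plan"), setting.get("code")
        return (
            plan if _is_positive_int(plan) else DEFAULT_ITERATIONS,
            code if _is_positive_int(code) else DEFAULT_ITERATIONS,
        )
    return DEFAULT_ITERATIONS, DEFAULT_ITERATIONS


def resolve_plan_review_gate(config: Dict[str, Any]) -> str:
    """Resolve the gate; a present-but-invalid plan_review_gate still beats the legacy key."""
    if "plan_review_gate" in config:  # "set" means present, not valid
        value = config["plan_review_gate"]
        return value if isinstance(value, str) and value in VALID_GATES else "visual"
    legacy = config.get("visual_plan_review")
    if legacy is False:
        return "plan-mode"
    return "visual"  # legacy true, or neither key set


def classify_feedback(feedback: Any) -> str:
    """Classify the destination as 'local', 'github', 'skip', or 'unregistered'."""
    if feedback is None:
        return "unregistered"  # Step 11.5 is never registered
    if not isinstance(feedback, str):
        return "skip"
    # Path prefixes are tested first: "./rel" would otherwise match the owner/repo regex.
    if feedback.startswith(("/", "~/", "./", "../")):
        return "local"
    if OWNER_REPO.fullmatch(feedback):
        return "github"
    return "skip"  # any other string, including empty: warn and skip
```

The ordering inside `classify_feedback` matters: `.` is a member of the owner/repo character class, so a relative path such as `./rel` has to be caught by the prefix check before the regex is consulted.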

Mode Detection

--init → Init Mode (-i / --iterations is ignored)
--resume <state-file> → Execution Mode (Resume sub-mode; see Step 1.5)
Otherwise → Execution Mode (Normal sub-mode)

--fast is an Execution Mode modifier, not a fourth branch here — it combines with either Normal or Resume sub-mode (see Step 2's Adjust N by difficulty). Like -i, it is ignored under --init.
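
The branching above can be pictured with a small Python sketch. It is illustrative only: `detect_mode` and the shape of its return value are hypothetical, and it assumes the invocation has already been split into tokens and that the free-form task text contains no flag-like tokens.

```python
from typing import Any, Dict, List, Optional


def _iterations(args: List[str]) -> Optional[int]:
    """Return N from `-i N` / `--iterations N`, or None when absent or malformed."""
    for flag in ("-i", "--iterations"):
        if flag in args:
            idx = args.index(flag)
            if idx + 1 < len(args) and args[idx + 1].isdigit():
                return int(args[idx + 1])
    return None


def detect_mode(args: List[str]) -> Dict[str, Any]:
    """Pick Init Mode or an Execution sub-mode from pre-tokenized arguments."""
    if "--init" in args:
        # -i / --iterations and --fast are both ignored under --init.
        return {"mode": "init"}
    result: Dict[str, Any] = {
        "mode": "execution",
        "sub_mode": "normal",
        "fast": "--fast" in args,  # a modifier, not a branch of its own
        "iterations": _iterations(args),
    }
    if "--resume" in args:
        idx = args.index("--resume")
        result["sub_mode"] = "resume"
        result["state_file"] = args[idx + 1] if idx + 1 < len(args) else None
    return result
```

Treating `--fast` as a field on the result rather than a separate return path reflects that it combines with either sub-mode.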

Init Mode

Read references/init-mode.md and follow the procedure.

Note: Skills generated by --init (e.g. run-tests) are recognized from the next session onward. Do not run /dev-workflow <task> in the same session as --init.

Execution Mode

No-Stall Principle

Once the workflow has started (after Step 1.5 resolves the effective task), it must run to Completion without pausing, except at the explicit user-gate points enumerated below. Every other step — including every skill invocation, every no-op outcome, every "nothing to report" result — must be judged semantically by the agent and passed through automatically. Do not rely on exact-phrase matching; if the skill result reads as a successful completion (fixes applied, no changes needed, no violations, no new rules, or any equivalent "success / no-op" outcome regardless of wording), treat it as success and proceed to the next step.

Explicit user-gates (the only permissible pause points):

Each bullet names the gate and points to the authoritative definition site. When editing either the enumeration or the definition, update both together.

Step 1.5 task-decomposition proposal dialogue — yes / adjust / no confirmation (Normal sub-mode; defined in Step 1.5 dispatch and references/task-decomposition.md § B. Normal sub-mode)
Step 1.5 leftover-subtask picker dialogue — selecting which subtask to run when more than one leftover in_progress subtask is runnable (Resume sub-mode; defined in references/task-decomposition.md § A. Resume sub-mode)
Step 4 plan approval (defined in Step 4: Finalize Plan)
Step 4 plan-review gate (visual / crit) — when plan_review_gate is visual or crit, a browser-based structured review gate replaces the text approval presentation (each gate, per references/visual-plan-review.md / references/crit-plan-review.md respectively, owns its own availability/reachability determination — an unreachable browser, or an unavailable crit binary, is not an up-front condition tested here but surfaces as the fallback covered below). The workflow waits for the browser submit, which is a harness-tracked background boundary (proceed on the background process's exit notification — not a "type continue" pause); crit has no finite-timeout equivalent of its own, so an abandoned crit review simply waits longer for that notification (see references/crit-plan-review.md § Procedure step 6 for the resulting abandonment behavior). Each gate returns approve / rewrite-approach / fallback; both run outside Plan Mode (Step 2 skips EnterPlanMode whenever plan_review_gate is not plan-mode), so on approve they proceed directly to implementation without calling ExitPlanMode. On fallback, crit routes to the visual gate (per references/crit-plan-review.md § Fallback contract and Step 4's routing table), and visual routes to Step 4's no-Plan-Mode chat-approval path (sub-step 2 path (b)'s fallback branch), not the ExitPlanMode modal. Neither gate emits a § User-gate summary preamble — each renders its review surface in the browser, so the preamble's "Applies to" list is intentionally not extended. Defined in references/visual-plan-review.md and references/crit-plan-review.md
Step 5 probe → real-implementation user-observation gate — when the Plan explicitly stages a probe / intermediate-artifact step before its real-implementation replacement: hold the workflow at the boundary until the user signals observation completion (defined in Step 5's "User-observable artifact protection gate" paragraph). Fires conditionally per the Plan's content — non-probe-staged plans never enter this gate
Step 7 pre-execution scope-narrowing stop — when a check_commands entry is assessed as a repo-wide auto-fix tool, the working tree has unrelated existing changes, and scope narrowing is not feasible given the tool's interface: stop and ask the user for direction (options: run accepting full-width effect, skip, or provide an alternative scoped invocation) (defined in Step 7: Check / Test)
Step 7 scope-drift stop — when check_commands writes non-trivial changes outside the task-scope snapshot (trivial = whitespace-or-comment-only formatting on ≤ 5 lines attributable to the formatter/linter that just ran — those proceed automatically with a one-line note): warn and wait for user direction (defined in Step 7: Check / Test)
Step 7 check/test fail-stop — failure after 3 retries: report the error and stop (defined in Step 7: Check / Test). Note: this is an error-stop, not a pause for user decision
Step 7.5 persistent-violations decision — rule violations still present after the 2nd review cycle (defined in Step 7.5: Rules Compliance Review)
Step 8 unresolved-findings decision — reviewer-reported actionable findings still unresolved after the N_code-th iteration (defined in Step 8: Code Review)
Step 10 commit-plan approval gate — accept the proposed commit grouping (subjects + file lists) for the working-tree changes; fires once on the initial plan and re-fires whenever a Mid-loop adjust file-regrouping / split-adding branch rebuilds the un-landed portion of the plan (defined in references/interactive-commits.md § Propose commit plan)
Step 10 per-commit accept gate — accept each individual commit (subject / body / files / diff) before it lands; repeats N times where N is the approved commit count (defined in references/interactive-commits.md § Per-commit loop, judged per § Approval token closed list inside Step 10)
Step 10 fold-or-defer gate — after a pre-commit hook auto-modifies the working tree following a zero-exit commit, ask the user whether to amend the just-landed commit (fold) or leave the changes uncommitted for a later iteration (defer); judged per the dedicated 5-branch → fold / defer / cancel / re-present-as-adjust classifier in references/interactive-commits.md § Post-commit auto-modify cycle bound (the 5 input branches extend § Approval token closed list's 4 buckets with an additional defer-direction branch; this gate is not the per-commit-accept-gate enum — cancel routes via Mid-loop cancel and ambiguous adjust responses re-enter the gate via § Mid-loop adjust branch f, both in the same reference)
Step 10 ambiguous-adjust clarifier — when a Mid-loop adjust request cannot be classified into branches a–e, ask the user a clarifying question and re-enter the gate that issued the request — this gate is itself the disposition for branch f of Mid-loop adjust — closed-list branches (in references/interactive-commits.md; categorization vocabulary depends on which gate originated the request)
Step 11 compaction approval gate — when Skill(extract-rules) --compact returns top-level status: "compacted", present per-file diff (chars_before / chars_after / iterations_used / applied_edits_count / structural_notes / per_file_status / below_threshold) per § User-gate summary preamble and wait for accept/reject/adjust/cancel per the Step 11 local closed list (defined in references/update-rules.md § Char-count compaction gate). cancel aligns with Step 10's Mid-loop cancel semantic (no revert; see references/interactive-commits.md § Mid-loop cancel); adjust uses Step 11's own three-case closed list (per-file disposition / clarification / other), not Step 10's branch f
Step 11 confirm-remaining-steps entry gate — ask at the entry to Step 11 (Update Rules) — i.e. after the commit phase — whether to run the remaining rule-maintenance / retrospective steps (Step 11 / 11.5 / 11.6, whichever are registered) or skip them and proceed straight to Completion. A binary proceed / skip judgment; on skip the gate marks those steps completed without running them. Fires conditionally — see Step 11's "Confirm remaining steps" gate for the exact firing condition (confirm_remaining_steps: true, or fast_mode_active), which is the single source of truth for this bullet
Step 11 rule-update commit gate — when interactive_commits: true and Step 11's Skill(extract-rules) wrote uncommitted changes under any of its three output directories (output_dir / examples_output_dir / staging_output_dir), propose committing them as a single commit (reusing the per-commit Present / accept / adjust / cancel mechanics from references/interactive-commits.md, judged per § Approval token closed list); adjust narrows the file set by pathspec omission and cancel leaves the changes uncommitted. Because it reuses references/interactive-commits.md § Post-commit auto-modify cycle bound, the same fold-or-defer sub-gate the Step 10 fold-or-defer gate above describes can fire here if a pre-commit hook auto-modifies the tree after the rule commit lands (judged per that reference's 5-branch classifier). Defined in Step 11's "Commit rule updates" sub-step. Like the Step 10 commit gates, it renders the commit data verbatim via git-shaped output and emits no § User-gate summary preamble
Step 11.6 workability-candidate disposition gate — per-candidate 4-way disposition (act now / make a subtask / save to backlog / reject; ambiguous responses re-present the affected candidate) over the workability candidates detected this run, presented once with a § User-gate summary preamble (defined in references/workability-retrospective.md § 4. Disposition gate). Fires only when workability_retrospective.enabled is true and the detection subagent returned ≥ 1 candidate. The backlog-write / state-file-create failure classes are non-fatal (recorded and continued, per references/workability-retrospective.md § 6)
Completion execution-time deferral/exclusion gate — when executing a decomposed subtask, if in-scope work items were excluded / deferred / discovered-unassigned during implementation or testing, ask the user to promote each uncovered item to a tracked subtask entry: (a) add as a new pending subtask (with depends_on if sequencing matters), (b) fold into an existing pending subtask's scope, or (c) explicitly accept as permanently out of parent-task scope (defined in Completion's "Execution-time deferral/exclusion gate" paragraph). Fires conditionally — only on decomposed-subtask runs that surfaced uncovered items
Completion subtask PR URL prompt — when executing a decomposed subtask, ask for optional PR URL before resuming (defined in Completion)

Fatal errors are out of scope for this principle: configuration-file absence, malformed state file, irrecoverable skill / tool failures, and similar infrastructure-level errors halt the workflow with a diagnostic regardless of whether they appear in the list above. The No-Stall Principle governs successful step outcomes (including no-op successes); it does not force the agent to push through genuine errors.

At any point not listed above — including after Skill(simplify) / Skill(tidy), Skill(prose-polish), Skill(rules-review), Skill(extract-rules), Skill(run-tests), and reviewer skills return, and including collecting the background rules-review result (Step 7.5 sub-step 1) and the background code-review result (Step 8 sub-step 1), both launched in Step 7 — the agent must never wait for the user to say "continue" / "続けて". Semantic judgment of the returned result is sufficient. Likewise, when the Step 4 visual or crit plan-review gate is active, its wait for the browser submit is a harness-tracked background boundary — proceed on the background serve.mjs / crit process's exit notification without asking the user to "continue".

No standalone waiting turns at async dispatch boundaries. After handing work to a host-tracked background process (a run_in_background Agent dispatch, the Step 4 visual/crit gate's serve.mjs/crit wait, or any other completion-notified worker), yield immediately and wait for the completion signal — do not emit a content-free "waiting for the background task — I'll continue when it reports back" turn, and do not repeat such a turn while the result is still pending. At most one brief acknowledgement immediately after dispatch is permitted (the § Progress Visibility pre-call status line already covers this), and only when it carries new information; every turn after that until the completion notification arrives must carry a concrete tool call. When the harness periodically re-invokes the agent before the completion notification arrives (e.g. a scheduled keep-alive restart) and no new information has arrived since the last turn, still issue the required tool call (e.g. reissue the same wait/monitor action) but omit any acknowledgment or status prose — a no-new-signal restart is not a decision moment, and prose around it is exactly the waiting-turn pattern this paragraph forbids.

No-summary turn at review-return boundaries. When a reviewer or sub-skill returns a result that is semantically "nothing actionable" (no findings, no violations, no changes needed — regardless of the exact wording or the length of the response), the immediately next turn must begin with a tool call (a TaskUpdate to mark the iteration as completed, or the next step's invocation), not with a prose summary of the review outcome. Category-by-category verdict lists, conclusion paragraphs, and "shall I proceed?" sentences are the stall pattern — emit them only in the Completion summary (the ### Completion section that runs after the retrospective steps, Step 11.5 / Step 11.6), never at review-return transition boundaries. This applies to: Skill(ask-peer) / Skill(ask-claude) / Skill(ask-codex) / Skill(ask-gemini) / Skill(ask-copilot) / Skill(ask-agy) returning no actionable findings at Step 3 or Step 8 (at Step 8, whether returned inline or collected from the Step 7 background launch), Skill(simplify) / Skill(tidy) / Skill(prose-polish) returning no changes, Skill(rules-review) returning no violations (whether returned inline or collected from the Step 7 background launch), Skill(extract-rules) returning no new rules at Step 11, and any other sub-skill whose result is treated as success.

Callee verdict transcription is not a turn boundary. When a sub-skill (Skill(simplify) / Skill(tidy) / Skill(rules-review) / Skill(extract-rules) / Skill(run-tests) / reviewer skills / any other callee) returns a fenced JSON verdict, status token, or structured summary, and the orchestrator's response re-transcribes that block (verbatim or paraphrased) in its own output, the transcribed block does not end the orchestrator's turn. The same agent must immediately issue the next tool call in the same turn — the next sub-step's invocation, the next iteration's dispatch, the next phase's transition, the next Step's first tool call. Specifically forbidden: inserting a "shall I proceed?" sentence after the transcribed verdict; emitting "ここまでで一区切り" / "ここまでで完了です" prose summaries between the verdict and the next action; ending the response on the verdict block and waiting for the user to say "continue" / "続けて". This rule extends the "no-summary turn" rule above to the case where the sub-skill returned an actionable result and the orchestrator's response carries the verdict's content forward — the verdict transcription itself is informational, not terminal. (For skill development this covers Pattern A iteration loop verdict returns where the orchestrator re-renders the JSON before re-dispatching, orchestrator multi-callee chains where one callee's verdict feeds the next callee dispatch, sequential sub-step completion marking, and hook-chain continuations.) Sub-step completion prose ("Step N complete", "(d) verify-diff returned converged") follows the same rule: completion reports in prose are not turn-end signals; the next sub-step's first tool call must follow in the same turn.

Progress Visibility

Before any subagent-backed skill call (Skill(<name>) invocations including run-tests, ask-peer, simplify, tidy, prose-polish, rules-review, extract-rules) or any shell command expected to take ≥ 30 seconds, emit a brief status message naming what is starting — e.g. "Starting test run via run-tests…" or "Calling ask-peer for plan review (iteration 1 of N)…". Emit the message as prose in the same assistant turn that issues the tool call, not as a separate preceding turn. This lets the user distinguish an agent in active progress from one that has stalled. After the step returns, proceed immediately to the next step per the No-Stall Principle — do not emit a separate acknowledgment turn. When Step 7 launches the background rules-review and/or the code review as background Agents, each dispatch is a subagent-backed call and emits its status line here; collecting the background results later (rules-review at Step 7.5 sub-step 1, code review at Step 8 sub-step 1) is a non-stalling return-boundary — proceed without a separate acknowledgment turn.

Mid-chain visibility (chained sub-skill calls or extended interpretation between tool calls). When a workflow phase issues sub-skill calls in a chain or spans extended internal interpretation / preparation across multiple tool calls (for skill development this includes a pre-implementation feasibility-check phase that fires several sub-skill dispatches in sequence, or a routine skill that issues several sub-skill dispatches back-to-back), the single pre-call status message rule above does not fully cover the user-visibility window. Extend the rule with a "current-location" line emitted at semantic checkpoints between dispatches — one short sentence naming the current phase and the next action ("Finished verify-diff for Finding 1; next: skill-review polish on the same file"). To keep the addition from re-introducing stall, three constraints bind its shape: (a) emit the current-location line as prose in the same turn as the next tool call, never as a standalone turn that waits for user input; (b) restrict the content to current phase name and next action only — review-result summaries, decision rationales, and "shall I proceed?" sentences stay out; (c) the rule does not apply to short same-turn chains of a few tool calls that complete inside a single turn — only to phases where the gap between user-visible signals would otherwise span multiple turns. Intent: in chained sub-skill phases (feasibility checks, routine dispatch loops, multi-call interpretation work) the user keeps seeing "this is alive and moving", while the No-Stall Principle's confirmation-prohibition stays intact.

Workflow artifacts (cross-step fixed exclusion)

Files this workflow itself creates and maintains as in-session state — plan documents under .claude/plans/, decomposition state files written by Step 1.5, Step 10, or Step 11.6, backlog files written by Step 11.6 under workability_retrospective.backlog_dir, the Step 4 visual-gate served plan file, its comments file, and its prev snapshot (.claude/plans/<slug>.plan-review.md / .claude/plans/<slug>.plan-review.comments.json / .claude/plans/<slug>.plan-review.prev.md), the Step 11 rule-extraction candidate file passed to extract-rules --apply-conversation-candidates (.claude/plans/<slug>.rule-candidates.md), and other workflow-internal staging artifacts placed under .claude/ by this skill — are cross-step fixed exclusions from any per-step changed-file enumeration (Step 6 Tidy scope, Step 6.5 Polish Prose scope, Step 7.5 rules-review diff input, Step 10 Interactive Commits' commit grouping, sub-skill dispatch payloads, scope checks). The exclusion is structural — the workflow owns these files as its own operational substrate — and is not gated on whether the path appears in .gitignore, whether a formatter ignore-file aligns, or whether the user happens to be touching them in this run. Steps that build a changed-file set, a diff-scope set, or a commit grouping must apply this single shared exclusion rather than re-deriving the rationale per step against ad-hoc justifications. If a future change adds another in-session-state path, extend this canonical list once rather than threading the exclusion through per-step prose (for skill development this is the canonical workflow-artifact set; sub-skills the workflow dispatches that maintain their own in-session state under .claude/ follow the same convention).

Step 1: Load Settings

Read settings from up to three layers and merge (type-aware):
```
merged = {}
if ~/.claude/dev-workflow.local.md exists:  overlay its frontmatter onto merged
if .claude/dev-workflow.md exists:          overlay its frontmatter onto merged
if .claude/dev-workflow.local.md exists:    overlay its frontmatter onto merged
```
"Overlay" = for each key present in the file:
- Scalar keys: merged[key] = file[key] (replace) — this includes review_iterations when its value is a map ({plan, code}): the whole map replaces the lower layer's value with no per-key cross-layer merge (a map key absent from the higher layer is not back-filled from the lower layer)
- List keys (check_commands): append file[key] items after merged[key], then deduplicate (keep first occurrence)
- List-replace keys (test_commands): merged[key] = file[key] — the higher layer's whole list replaces the lower layer's (no item-level merge or dedup)
- hooks: deep-merge — for each sub-key (e.g. on_complete), append and deduplicate the list
- null or empty ([], {}) explicitly clears the key — lower-layer value is discarded, not inherited
- Key absent from the file: left untouched (inherit from lower layers)
If a file's YAML frontmatter is malformed (parse error), warn the user naming the file, skip that layer, and continue with the remaining layers.
If none of the three files exist, prompt user to run /dev-workflow --init and stop
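The overlay rules above can be sketched as a small function. This is a minimal illustration under stated assumptions, not the skill's actual implementation: the function names are hypothetical, "clears the key" is modeled as removing it from the merged result, and a null or empty hooks sub-key is assumed to clear only that sub-key.

```python
from typing import Any

# Key classification taken from the overlay rules above.
LIST_APPEND_KEYS = {"check_commands"}


def _append_dedup(base: list, extra: list) -> list:
    """Append extra after base, keeping the first occurrence of each item."""
    out: list = []
    for item in list(base) + list(extra):
        if item not in out:
            out.append(item)
    return out


def _is_clear(value: Any) -> bool:
    """null, [] and {} explicitly clear a key."""
    return value is None or value == [] or value == {}


def overlay(merged: dict, layer: dict) -> dict:
    """Overlay one layer's frontmatter onto merged and return a new dict."""
    result = dict(merged)
    for key, value in layer.items():
        if _is_clear(value):
            # Assumption: clearing is modeled as dropping the key entirely.
            result.pop(key, None)
        elif key in LIST_APPEND_KEYS:
            result[key] = _append_dedup(result.get(key, []), value)
        elif key == "hooks" and isinstance(value, dict):
            hooks = dict(result.get("hooks", {}))
            for sub_key, items in value.items():
                if _is_clear(items):
                    hooks.pop(sub_key, None)  # assumption: clears this sub-key only
                else:
                    hooks[sub_key] = _append_dedup(hooks.get(sub_key, []), items)
            result["hooks"] = hooks
        else:
            # Scalars, test_commands, and a map-valued review_iterations:
            # the whole value replaces the lower layer's value.
            result[key] = value
    return result
```

Applying `overlay` once per existing file, in the order user-global, project, project-local, reproduces the three-layer merge; keys absent from a layer are never visited, so they inherit from the layers below.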
Resolve reviewer from config. If not specified or not in the supported list (ask-peer, ask-claude, ask-codex, ask-gemini, ask-copilot, ask-agy), use ask-peer. Reviewer-family classification (the single definition referenced by the Step 3 / Step 8 inline-reviewer subagent_model propagation): Claude-family = ask-peer / ask-claude — model-controllable (ask-peer via its Model: argument applied to its internal Agent dispatch; ask-claude via the claude -p --model flag); external-CLI = ask-codex / ask-gemini / ask-copilot / ask-agy — these run their own non-Claude models and are not subagent_model-controllable (never receive a propagated model). Initialize the bundle-unavailability ledger here: set bundle_skills_unavailable = [] — same cross-step-ledger mechanism as difficulty_skipped_steps (§ Completion's difficulty-skip reminder; declared here rather than there because this sub-step is the earliest site that may need to append to it). Records are short human-readable strings, e.g. <skill> unavailable (<context>); § Completion's bundle-skill availability reminder renders the list verbatim. Each of the five dev-workflow-bundle sibling skills this workflow depends on (ask-peer, rules-review, extract-rules, tidy, prose-polish) appends at most one record per call site per run, the first time that site's unavailability is declared — most of the five have exactly one call site, but prose-polish has two independent ones (Step 4 and Step 6.5) and may append one record for each if both fail; neither site is recorded twice. 
Probe the resolved reviewer's availability immediately, regardless of whether it is the default ask-peer or a configured alternative — ask-peer is a dev-workflow-bundle sibling skill, independently installable and installed separately from dev-workflow itself (per .claude-plugin/marketplace.json, each is its own plugin entry in addition to the bundle), so it is not guaranteed present just because dev-workflow is: attempt Skill(<reviewer>) with a one-word probe request (e.g., ping); if the call fails, retry once. If still failing, and the resolved reviewer is ask-peer specifically (the bundle-member reviewer — see above), append ask-peer unavailable (reviewer, Step 3/8) to bundle_skills_unavailable. Either way, emit the three-option prompt defined in § Prerequisites' "Reviewer skill" bullet — do not block the run, present the options and let the user decide before the first review step begins.
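The reviewer resolution, family classification, and retry-once probe described above can be sketched as follows. This is an illustrative sketch only: `call_skill` is a hypothetical stand-in for a `Skill(<reviewer>)` call that returns `True` on success, and the family labels are names chosen for this example.

```python
from typing import Callable, List, Optional

SUPPORTED_REVIEWERS = (
    "ask-peer", "ask-claude", "ask-codex", "ask-gemini", "ask-copilot", "ask-agy",
)
CLAUDE_FAMILY = {"ask-peer", "ask-claude"}


def resolve_reviewer(configured: Optional[str]) -> str:
    """Fall back to ask-peer when the setting is missing or unsupported."""
    return configured if configured in SUPPORTED_REVIEWERS else "ask-peer"


def reviewer_family(reviewer: str) -> str:
    """Claude-family reviewers are model-controllable; external CLIs are not."""
    return "claude-family" if reviewer in CLAUDE_FAMILY else "external-cli"


def probe_reviewer(
    reviewer: str,
    call_skill: Callable[[str, str], bool],  # hypothetical Skill() stand-in
    bundle_skills_unavailable: List[str],
) -> bool:
    """Probe once, retry once, and record ask-peer in the ledger on failure."""
    for _attempt in range(2):  # first attempt plus exactly one retry
        if call_skill(reviewer, "ping"):
            return True
    if reviewer == "ask-peer":
        bundle_skills_unavailable.append("ask-peer unavailable (reviewer, Step 3/8)")
    # The caller then presents the three-option prompt from Prerequisites.
    return False
```

A `False` return does not stop the run; it only signals that the three-option prompt should be shown before the first review step.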
Resolve the review iteration counts — N_plan (Plan Review, Step 3) and N_code (Code Review, Step 8). A scalar config, the -i option, and the default all set both values equally; only the map config form makes them differ:
1. If -i / --iterations option is present and is a positive integer, set both N_plan and N_code to it (the option overrides both phases)
2. Else if config review_iterations is present:
  - scalar positive integer → set both N_plan and N_code to it
  - map ({plan, code}) → N_plan = plan if it is a positive integer else default 3; N_code = code if it is a positive integer else default 3 (per-key validation, independent per phase; warn on each absent/invalid key)
  - any other value (non-positive or non-integer scalar, list, string, or any non-map / non-scalar type) → warn and set both to default 3 (any map takes the map branch above — an empty map, or a map with no valid plan / code key, already resolves to default 3 per phase there)
3. Else use default 3 for both
Wherever a later step says "N" without a phase qualifier, Step 3 references resolve to N_plan and Step 8 references to N_code.
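The resolution order above (option, then config, then default) can be sketched as a function. The function name, the warning wording, and the choice to treat YAML booleans as non-integers are assumptions made for this example.

```python
from typing import Any, List, Optional, Tuple

DEFAULT_N = 3


def _is_pos_int(value: Any) -> bool:
    # Assumption: booleans are rejected even though bool subclasses int in Python.
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


def resolve_iterations(cli_iterations: Optional[Any], merged: dict) -> Tuple[int, int, List[str]]:
    """Return (N_plan, N_code, warnings) from the -i option and merged config."""
    warnings: List[str] = []
    # 1. A valid -i / --iterations option overrides both phases.
    if _is_pos_int(cli_iterations):
        return cli_iterations, cli_iterations, warnings
    # 2. Config value: scalar, map, or anything else.
    if "review_iterations" in merged:
        value = merged["review_iterations"]
        if _is_pos_int(value):
            return value, value, warnings
        if isinstance(value, dict):
            resolved = []
            for key in ("plan", "code"):  # per-key validation, independent per phase
                if _is_pos_int(value.get(key)):
                    resolved.append(value[key])
                else:
                    warnings.append(f"review_iterations.{key} absent or invalid; using {DEFAULT_N}")
                    resolved.append(DEFAULT_N)
            return resolved[0], resolved[1], warnings
        warnings.append(f"review_iterations invalid; using {DEFAULT_N} for both phases")
        return DEFAULT_N, DEFAULT_N, warnings
    # 3. Nothing configured: default for both phases.
    return DEFAULT_N, DEFAULT_N, warnings
```

For instance, `resolve_iterations(None, {"review_iterations": {"plan": 1}})` yields `N_plan = 1`, `N_code = 3`, and one warning for the absent `code` key.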
Parse hooks from config. Warn and ignore if hooks.on_complete has invalid format.
For review_iterations, emit the invalid-value / invalid-map-key warnings as sub-step 4's resolution defines (sub-step 4 owns the case analysis and the default-3 fallback per phase).
Parse custom_instructions from config (optional, string). Warn and ignore if not a string.
Parse interactive_commits, compact_rules, plan_review_gate, polish_prose, confirm_remaining_steps, and subagent_model from config. Each key's default value and its warn-and-fall-back-on-invalid-value behavior are exactly as documented in its own § Configuration bullet, which is the canonical source (do not restate it here); subagent_model's validated merged map is consumed by Step 2's subagent_model resolution. plan_review_gate's parse additionally reads the deprecated visual_plan_review boolean per § Configuration's plan_review_gate bullet's "Legacy key" paragraph (do not restate the mapping here).
Parse language from config per the Configuration bullet above. For ~/.claude/settings.json, silently accept missing file / absent key / null value; warn once per Step 1 settings-load pass on malformed JSON, non-string, or empty string. The resolved language is available to Step 11.5. Language checkpoint: immediately after resolving language here, emit a one-line informational note surfacing the resolved value (e.g. Output language: ja). This makes the language setting visible before the first user gate, so all subsequent steps render in the correct language from the start.
Parse self_retrospective.feedback from config (optional, string). Warn and ignore if not a string or if empty string "". When feedback matches the owner/repo pattern (^[\w.-]+/[\w.-]+$), additionally run gh auth status as an early warning only: if auth fails, warn but do not block the run.
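As a small illustration of the owner/repo check on self_retrospective.feedback, the pattern quoted above can be applied as shown below. The helper name and the sample values are hypothetical, and whether the skill evaluates the pattern with these exact regex semantics is an assumption.

```python
import re
from typing import Any

# Pattern copied from the text above.
OWNER_REPO = re.compile(r"^[\w.-]+/[\w.-]+$")


def needs_gh_auth_check(feedback: Any) -> bool:
    """True when feedback is a non-empty string shaped like owner/repo."""
    if not isinstance(feedback, str) or feedback == "":
        return False  # warn-and-ignore cases never reach the auth check
    return OWNER_REPO.match(feedback) is not None
```

Only values for which this returns `True` would trigger the early `gh auth status` warning; a failing auth check still does not block the run.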
Parse workability_retrospective from config (optional, nested map): workability_retrospective.enabled (optional, boolean, default false; warn and fall back to false if present but not a boolean) and workability_retrospective.backlog_dir (optional, string, default .claude/improvements; warn and fall back to the default if present but not a non-empty string). When enabled is not true, Step 11.6 is not registered (sub-step 7) and never executes
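The validation and fallback rules for workability_retrospective can be sketched like this. The function name and warning texts are hypothetical, and treating a non-map value as if the key were absent is an assumption the text above does not spell out.

```python
from typing import List, Tuple

DEFAULT_BACKLOG_DIR = ".claude/improvements"


def parse_workability(merged: dict) -> Tuple[dict, List[str]]:
    """Return the validated workability_retrospective settings plus warnings."""
    warnings: List[str] = []
    raw = merged.get("workability_retrospective")
    # Assumption: a non-map value is handled like an absent key.
    section = raw if isinstance(raw, dict) else {}

    enabled = False
    if "enabled" in section:
        if isinstance(section["enabled"], bool):
            enabled = section["enabled"]
        else:
            warnings.append("workability_retrospective.enabled is not a boolean; using false")

    backlog_dir = DEFAULT_BACKLOG_DIR
    if "backlog_dir" in section:
        value = section["backlog_dir"]
        if isinstance(value, str) and value != "":
            backlog_dir = value
        else:
            warnings.append("workability_retrospective.backlog_dir is not a non-empty string; using default")

    return {"enabled": enabled, "backlog_dir": backlog_dir}, warnings
```

Step 11.6 would be registered only when the returned `enabled` value is `True`.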
Determine execution sub-mode: Resume if --resume <state-file> was provided, otherwise Normal. Step 1.5 branches on this. Resolve fast_mode_active (boolean) from whether --fast was passed on this invocation — invocation-only, no config key (see Step 2's Adjust N by difficulty for its effect on N_plan/N_code, and § No-Stall Principle's confirm_remaining_steps entry gate bullet for its effect on Step 11)
Register all workflow phases with the Task tools, including review iterations — issue one TaskCreate per phase below (each returns an auto-numbered taskId). Do NOT skip any phase:
- Step 1.5: Task Decomposition (Normal sub-mode only — omit this entry entirely in Resume sub-mode, since the step has nothing to do at registration time there)
- Step 2: Create Plan
- Step 3: Plan Review
- Step 3-1 through Step 3-N_plan: Plan Review - iteration 1 through N_plan (generate N_plan items based on resolved N_plan)
- Step 4: Finalize Plan
- Step 5: Implement
- Step 6: Tidy
- Step 6.5: Polish Prose
- Step 7: Check / Test [check: {check_commands} | test: {test_commands}]
- Step 7.5: Rules Compliance Review
- Step 8: Code Review
- Step 8-1 through Step 8-N_code: Code Review - iteration 1 through N_code (generate N_code items based on resolved N_code)
- Step 9: Completion Hooks (only if hooks.on_complete is configured)
- Step 10: Interactive Commits (only if interactive_commits is true; single row — per-commit iteration is handled inline within Step 10 because the commit count is not known until the proposal phase)
- Step 11: Update Rules
- Step 11.5: Self-Retrospective (only if self_retrospective.feedback is set and parses as a valid destination — see Step 11.5 for detection rules; if unset/invalid, omit this entry)
- Step 11.6: Workability Retrospective (only if workability_retrospective.enabled is true; if unset/false, omit this entry. Registered regardless of the Step 2 difficulty assessment; see Step 11.6)
Tool availability (Task tools vs TodoWrite): these steps name the Task tools (TaskCreate / TaskUpdate / TaskList), the default since Claude Code v2.1.142. Where the Task tools are unavailable (e.g. the VSCode extension, or Claude Code before v2.1.142), use the equivalent TodoWrite operations instead: the status values (pending / in_progress / completed) and the register-all-upfront semantics are identical, and a TaskList-by-subject status read becomes a read of the TodoWrite list. allowed-tools grants both, so use whichever the environment exposes.
Registration mechanics (Task tools): issue every TaskCreate in a single upfront burst (one tool-call batch) so all phases are registered before Step 2 begins. Two conditional cases: (i) conditionally-omitted phases (the list items above carrying a condition) are omitted by not issuing their TaskCreate; (ii) N-reduced excess iteration tasks (Step 3-x beyond resolved N_plan / Step 8-x beyond resolved N_code) are still TaskCreated here at the resolved ceiling (Step 3 ceiling = N_plan, Step 8 ceiling = N_code), then marked completed via TaskUpdate by Step 2's Adjust N by difficulty. Mark each task in_progress (via TaskUpdate {taskId, status}) when starting and completed when done.
Task-handle resolution convention: every later "mark Step N as in_progress / completed" instruction in this skill is shorthand for: resolve that Step's task (by its registration-time captured taskId, or by subject via TaskList), then TaskUpdate {taskId, status}; the per-step lines name tasks by their human-readable subject and do not restate this resolution path. Registering all phases upfront gives the user visibility into overall progress and prevents steps from being accidentally dropped.
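The registration list can be sketched as a function that produces the task subjects to create, in order. This is an illustration under assumptions: the function and parameter names are hypothetical, the per-iteration subject format and the comma-joined rendering of the Step 7 command lists are guesses at how the templates above expand, and the real registration issues one TaskCreate per returned subject.

```python
from typing import List, Sequence


def phase_subjects(
    n_plan: int,
    n_code: int,
    *,
    sub_mode: str = "normal",            # "normal" or "resume"
    check_commands: Sequence[str] = (),
    test_commands: Sequence[str] = (),
    hooks_on_complete: bool = False,
    interactive_commits: bool = False,
    self_retro_valid: bool = False,      # feedback set and a valid destination
    workability_enabled: bool = False,
) -> List[str]:
    """Build the ordered list of task subjects to register upfront."""
    subjects: List[str] = []
    if sub_mode == "normal":
        subjects.append("Step 1.5: Task Decomposition")
    subjects += ["Step 2: Create Plan", "Step 3: Plan Review"]
    subjects += [f"Step 3-{i}: Plan Review - iteration {i}" for i in range(1, n_plan + 1)]
    subjects += ["Step 4: Finalize Plan", "Step 5: Implement", "Step 6: Tidy", "Step 6.5: Polish Prose"]
    subjects.append(
        f"Step 7: Check / Test [check: {', '.join(check_commands)} | test: {', '.join(test_commands)}]"
    )
    subjects += ["Step 7.5: Rules Compliance Review", "Step 8: Code Review"]
    subjects += [f"Step 8-{i}: Code Review - iteration {i}" for i in range(1, n_code + 1)]
    if hooks_on_complete:
        subjects.append("Step 9: Completion Hooks")
    if interactive_commits:
        subjects.append("Step 10: Interactive Commits")  # single row by design
    subjects.append("Step 11: Update Rules")
    if self_retro_valid:
        subjects.append("Step 11.5: Self-Retrospective")
    if workability_enabled:
        subjects.append("Step 11.6: Workability Retrospective")
    return subjects
```

Conditionally omitted phases simply never appear in the returned list, which mirrors case (i) of the registration mechanics.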
Phase-boundary self-audit: at every top-level Step transition (not the iteration sub-rows Step 3-i / Step 8-i, which are governed by the Return-point no-stall reminders below), before issuing the first tool call that advances into a new Step's procedure, name the Step number you are entering, resolve the prior Step's task by subject via TaskList, and verify it is completed — if it is still pending or in_progress, return to the unfinished Step first instead of advancing. This guards against silent phase-skipping (e.g. jumping from Step 5 Implement to Step 7 Check / Test without running Step 6 Tidy, only to discover the gap during a later phase) that the task registration alone cannot prevent. Implementation sub-tasks in Step 5 are additions, not replacements. Note: Unless -i / --iterations was explicitly specified, Step 2 may reduce N_plan / N_code based on task difficulty.
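The self-audit check can be pictured with a tiny helper. This is a sketch only: `tasks` is a hypothetical mapping from task subject to status standing in for a TaskList read, and the caller is assumed to pass the nearest registered prior Step.

```python
from typing import Dict


def next_step_after_audit(tasks: Dict[str, str], prior_subject: str, entering_subject: str) -> str:
    """Return the Step to work on: the new Step, or the unfinished prior one."""
    if tasks[prior_subject] == "completed":
        return entering_subject
    # pending or in_progress: go back and finish the prior Step first.
    return prior_subject
```

With Step 5 completed and Step 6 still pending, an attempt to enter Step 7 is redirected back to Step 6 instead of advancing.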
Context-compaction recovery: if the session context was compacted (prior turns summarized) before reaching this step in the current turn, re-read the configuration files from disk rather than relying on the summary — verify each step's skip conditions (e.g. whether self_retrospective.feedback is set, whether workability_retrospective.enabled is true, whether hooks.on_complete is configured, whether interactive_commits is true, whether compact_rules is true, whether polish_prose is true, whether confirm_remaining_steps is true) from the actual merged config, not from compacted context. fast_mode_active, like -i, cannot be recovered from disk this way (it is invocation-only) — but it is inferable after the fact from the pre-completed Step 3 / Step 6.5 rows, the fast_mode_skipped_steps ledger, and the difficulty log line's fast-mode annotation.
Interruption re-anchoring: if this invocation is a user-prompted continuation of a prior session that was interrupted (connection error, browser refresh, or similar — distinguished from compaction by the user explicitly asking to resume or continue rather than context being summarized in the same session), and --resume <state-file> was not provided (that path is handled by Step 1.5 Resume sub-mode), re-establish the run's current position before proceeding: (i) read the task list (TaskList / TodoWrite read) to identify the step currently marked in_progress, (ii) re-read the configuration files from disk (same as the "Context-compaction recovery" sub-step's procedure), and (iii) announce the resumption point to the user ("Resuming from Step N — <step description>") and proceed immediately from the in_progress step — do not wait for user confirmation. Do not re-execute already-completed steps.

Step 1.5: Task Decomposition

This step decides whether the user's request should be split into multiple smaller subtasks (each delivered as its own PR), or — in Resume sub-mode — picks the next subtask from an existing state file under .claude/plans/dev-workflow.<slug>.md.

State-file semantics are critical (a malformed or mis-routed file silently corrupts subtask boundaries), so the full procedure lives in a dedicated reference. Dispatch:

Resume sub-mode (--resume <state-file> was provided): read references/task-decomposition.md and follow section A. Resume sub-mode from top to bottom.
Normal sub-mode: read references/task-decomposition.md and follow section B. Normal sub-mode.

EnterPlanMode is reserved for Step 2 (and only when plan_mode_active is true — i.e. plan_review_gate: "plan-mode"; on the visual / crit paths Step 2 skips Plan Mode, see § Configuration) — any decomposition proposal in Step 1.5 is a plain yes/no dialogue, not a plan.

After section A or B completes, the "effective task" is set for Step 2 onward: the selected subtask when decomposed, otherwise the original request.

Step 2: Create Plan

Record the current commit as base-commit (git rev-parse HEAD) for later diff comparison. Initialize the difficulty-skip ledger here: set difficulty_skipped_steps = [] (a cross-step list of human-readable records — <step> skipped (<tier> tier) — that § Completion's difficulty-skip reminder renders). This init lives at Step 2 entry, outside the -i-gated Adjust N sub-step below, so the variable is well-defined on every path: when Adjust N is skipped (because -i / --iterations was given) or no tier qualifies for a skip, the ledger simply stays empty and § Completion omits the reminder. Same purpose as the compaction_applied_count State-variable contract (well-defined on the skipped path) but a different technique: compaction initializes inside its conditionally-skipped sub-step and relies on a prose contract; this ledger is physically hoisted out of the skippable sub-step instead — do not relocate it into Adjust N expecting prose to cover the -i path, since the init statement would then not run under -i. Also initialize the fast-mode-skip ledger here, same hoist rationale as difficulty_skipped_steps above (same rendering mechanism, but a distinct sub-condition per the warning-string differentiation rule — see § Completion's fast-mode-skip reminder): set fast_mode_skipped_steps = []. Also initialize the subagent_model cross-step variable here (same hoist rationale): set subagent_model = inherit (no model override). The built-in tier → model map (§ Configuration) is consulted only after a tier is assessed in Adjust N; the pre-assessment value — and the value on the -i path, where Adjust N is skipped and no tier is assessed — is always inherit, so every downstream Agent dispatch / Model: propagation omits the model (current behavior). Read sites: see § Configuration's subagent_model bullet for the full enumeration. 
The conditional Step 2 research delegation (no-Plan-Mode path only) is the one exception excluded from that list — it consumes this inherit init value directly instead (see Step 2 sub-step 3's "Codebase-research delegation" guidance for why). Also initialize the shared session-scan cross-step state here (same hoist rationale): set session_scan_dispatched = false and session_scan_result = null, so the state is well-defined on every path — including paths where the earliest participating steps abstain or are unregistered (e.g. rule-extraction inactive and self_retrospective.feedback unset, leaving only Step 11.6 to dispatch). Lifecycle (per references/session-scan.md § Dispatch-once contract): init = Step 2 entry (here); set = whichever of Step 11 / Step 11.5 / Step 11.6 performs the shared session-scan dispatch (it sets session_scan_dispatched = true and stores the subagent's raw return in session_scan_result); read = the participating step(s) that consume their axis block from session_scan_result. Unlike subagent_model, this state's set and read sites are confined to Step 11 / Step 11.5 / Step 11.6.
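The dispatch-once lifecycle of the shared session-scan state can be sketched as follows. This is a minimal illustration, not the workflow's actual mechanism: the `SessionScanState` class, the `get_session_scan` helper, and the `dispatch` callable (a stand-in for the real subagent dispatch) are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SessionScanState:
    """Hypothetical container for the two variables initialized at Step 2 entry."""
    session_scan_dispatched: bool = False
    session_scan_result: Optional[str] = None


def get_session_scan(state: SessionScanState, dispatch: Callable[[], str]) -> str:
    """Return the shared scan result, dispatching only on the first call.

    `dispatch` is an assumed stand-in for the subagent dispatch; whichever
    participating step calls this first performs the dispatch, and later
    steps read the stored raw return.
    """
    if not state.session_scan_dispatched:
        state.session_scan_result = dispatch()
        state.session_scan_dispatched = True
    return state.session_scan_result


# Usage: two participating steps share one dispatch.
calls = []
state = SessionScanState()
first = get_session_scan(state, lambda: calls.append("scan") or "raw-report")
second = get_session_scan(state, lambda: calls.append("scan") or "other-report")
```

Because the state is initialized before any participating step runs, the first caller does not need to know whether earlier steps abstained.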
Resolve plan_mode_active, then conditionally enter Plan Mode: set plan_mode_active = (plan_review_gate == "plan-mode") — a derived alias of the run-invariant plan_review_gate setting (read once, never reassigned — including on a Step 4 rewrite-approach re-entry). When plan_mode_active is true (plan_review_gate: "plan-mode"), call EnterPlanMode. When plan_mode_active is false (the default — plan_review_gate: "visual", or the opt-in "crit"), do not enter Plan Mode: Step 4's visual / crit gates and their no-Plan-Mode chat fallback perform non-read-only operations (writing the served plan file, launching node serve.mjs or the crit CLI) that Plan Mode's read-only restriction forbids, so these gates can only fire outside Plan Mode (see § Configuration's plan_review_gate bullet). In this no-Plan-Mode case the sub-step 6 "No code changes in this phase" rule is enforced by agent discipline alone, not by Plan Mode's read-only lock — hold to it through the Step 4 approval gate.
Analyze the task and the codebase, then create an implementation plan. Apply custom_instructions to shape plan priorities and structure. Follow the structure defined in references/plan-format.md: Overview / Decisions / Design / Test plan are required; Risks / Unknowns are optional. When the work is sequential, Design defaults to an ordered, numbered list of implementation steps (see references/plan-format.md § Template, the source of truth). Section-level content rules live in the reference file; do not re-derive them here.
- If a state file exists (this run is executing one subtask of a decomposed parent): the "effective task" = the current in_progress subtask. Frame the plan around just this subtask while keeping the full parent task and other subtasks as background context so the plan stays consistent with the overall direction. Do not plan work belonging to other subtasks. See references/plan-format.md § Subtask / Resume handling for how Decisions is scoped in this case
- TDD-conflict resolution: if custom_instructions includes a TDD-style requirement (e.g. "Always use TDD", "write tests before implementation") AND the current task is adding tests for existing behavior (characterization tests, coverage tests, or relocating existing tests; keywords: "add tests for", "characterize behavior", "test coverage", "move tests", "固定する" ("pin down"), "追加する" ("add")) rather than driving new implementation, declare explicitly in Plan Overview or Risks that this subtask is TDD-loop-external: its tests describe and pin down already-implemented behavior; they do not specify new behavior. This resolves the apparent conflict: the TDD guideline governs feature-implementation subtasks; characterization and coverage subtasks are outside the TDD loop by design.
- Version/identifier string replacement tasks: if the core operation is replacing a specific version string, identifier, or constant across the project (e.g. version bump, rename, migration), grep the entire repository for the old value before drafting the plan. Include the complete list of affected files in the Design section — missing even one location is the primary regression source for this task class
- Task-relevant skill annotation (applies to every plan — unconditional, unlike the two conditional bullets above): scan the skills available in this session's context (their descriptions are loaded at session start, which is the signal Claude Code uses to decide when to invoke a skill) for any that materially help accomplish this task's own work, and annotate the Design step(s) where each would be used — per references/plan-format.md § Template, a Design step's grammar carries the invoked skill as an optional element. This makes skill use explicit in the plan the workflow executes (Step 5) rather than relying on the model to spontaneously recall the skill mid-implementation. Constraints: (i) target task-domain skills only — exclude any skill the workflow itself already fires at a fixed step (the reviewer, the cleanup pass, rules-review, the test skills, extract-rules are illustrative); judge exclusion by "is this already a fixed-step callee?", not by matching a skill's name against that list — a task-domain skill that merely shares surface vocabulary with a callee still qualifies; (ii) annotate only skills actually present in the available-skills context — never invent a skill name; (iii) the annotation is a recommendation, not a contract — Step 5 may drop a suggested skill that turns out not to fit once implementation starts; (iv) annotate only the step(s) where a skill genuinely applies — most plans annotate few or no steps, and over-annotating bloats the plan
Codebase-research delegation (optional, no-Plan-Mode context isolation). The default for Step 2 is main-thread analysis — the main thread reads the codebase and authors the plan directly. As an exception you MAY delegate the codebase-research portion only to a read-only subagent via the Agent tool, only when all three guards hold: (a) plan_mode_active == false (plan_review_gate is visual or crit) — on the plan_review_gate: "plan-mode" path Plan Mode already isolates exploration, so this delegation does not fire; (b) the session exposes a subagent type effective for read-only codebase research — an Explore-class read-only researcher, or a general-purpose agent constrained to read-only by its dispatch prompt per (ii); (c) the task benefits from non-trivial codebase research — research meaning exploration to discover unknown files, patterns, or constraints; locating a known file or applying a settled mechanical change (a typo fix, a known-location edit, a routine bundle-copy sync) is not research and needs no delegation — plan a self-evident change of that kind with inline main-thread analysis. 
When delegating: (i) dispatch the read-only research subagent to return a condensed findings report (relevant files, patterns, constraints) — plan authoring stays in the main thread (the plan-format template, the Simplicity self-audit, the Decisions (a)+(b) criterion, and the Task-relevant skill annotation all remain main-thread work); this reproduces the research-context isolation that Plan Mode's built-in read-only Plan subagent provides on the plan_review_gate: "plan-mode" path; (ii) select the subagent type by capability — inverted from Step 5's rule: Step 5 excludes read-only Explore / Plan-class agents because it needs Edit / Write, whereas this delegation is research-only, so it prefers a read-only research-capable type (Explore by default), falling back to a general-purpose subagent instructed to do read-only research only (its read-only scope is set by the dispatch prompt — the Agent tool exposes no per-dispatch capability toggle, so a general-purpose agent's Edit capability cannot be withheld at dispatch and must instead be constrained by instruction) — state this inversion so it does not read as a contradiction of Step 5; (iii) it runs on the session model — it fires here, before Adjust N (sub-step 7) resolves subagent_model from the assessed tier, so subagent_model is still its sub-step-1 inherit init and no tier model is available; it is therefore excluded from subagent_model governance (unlike the Step 5 delegation, which passes the resolved value); (iv) if guard (b) does not hold — the Agent tool is unavailable, or no read-only-research-capable type is exposed — skip the delegation and do the research inline in the main thread (current behavior); both unavailability modes resolve to this same inline-research disposition, an intentional divergence from Step 5, which folds unavailability into its guard (b); (v) the returned findings report is research, not a user gate — judge it and continue per § No-Stall Principle (no new gate is introduced), and 
because the research subagent is read-only it cannot write, so it does not breach the sub-step 6 "No code changes in this phase" rule. This delegation is one of the two sanctioned Agent exceptions named in § Configuration's Agent tool usage bullet (the other is the Step 5 implementation delegation). Permissive guidance — no config flag beyond the plan_review_gate gate.
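The three delegation guards can be read as a single predicate. The sketch below is only an illustration under stated assumptions: the function name and its boolean parameters are hypothetical, and judging guards (b) and (c) is left to the caller because the text defines them qualitatively.

```python
def may_delegate_research(
    plan_review_gate: str,
    read_only_researcher_available: bool,
    needs_nontrivial_research: bool,
) -> bool:
    """Return True only when guards (a), (b), and (c) all hold.

    (a) Plan Mode is not active (the gate is "visual" or "crit").
    (b) A read-only-research-capable subagent type is exposed.
    (c) The task needs exploration, not a known-location edit.
    """
    plan_mode_active = plan_review_gate == "plan-mode"
    return (
        not plan_mode_active
        and read_only_researcher_available
        and needs_nontrivial_research
    )
```

When the predicate is false, the research stays inline in the main thread, which is the default behavior in any case.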
Simplicity self-audit: Before proceeding to Step 3, read references/simplicity-self-audit.md and audit the plan against its checklist.
Plan self-check: Run the checklist in references/plan-format.md § Step 2 self-check against the plan. This is the author's first-pass judgment on Decisions content; fix any failures before Step 3.
No code changes in this phase
Adjust N by difficulty (the tier assessment and N-derivation below are skipped if -i / --iterations was explicitly specified — except the --fast Step 6.5-only skip paragraph near the end of this item, which always runs regardless of -i): A typo fix doesn't need 3 rounds of review. Based on the plan just created, assess task difficulty and reduce the iteration counts to avoid unnecessary iterations — the configured value is a ceiling, not a target. The same difficulty cap is applied independently to N_plan and N_code (the two values that may differ only when review_iterations is a map; otherwise they are already equal):
- Trivial (a self-evident, low-risk change with one obviously correct fix — a typo fix, a one-line edit, a config value change, or a mechanical multi-site edit that applies the same clearly-correct replacement everywhere, e.g. a version bump or a rename with one unambiguous target): N_plan = N_code = 0 — Step 3 (Plan Review) and Step 8 (Code Review) are skipped entirely. Difficulty-skip matrix (Trivial): additionally skip Step 6 Tidy, Step 6.5 Polish Prose, and Step 7.5 Rules Compliance Review — at this tier the cleanup pass, the prose-polish pass, and the rules-compliance walk are low-yield, and the Step 4 plan-approval gate plus Step 7 check_commands / test_commands remain the safety net. Tie-break: escalate to Simple or above only when the change requires an actual judgment call — more than one plausible approach, behavior-affecting logic where a subtle mistake could slip through unnoticed, or genuine ambiguity about the correct fix. Do not escalate merely because the change spans several lines, files, or modules when the fix itself is mechanical and identical across every site (e.g. a version bump touching manifests in several modules stays Trivial). The same external-library major-bump exception described under Simple applies here too (such a change is never Trivial)
- Simple (a straightforward bug fix or small feature addition with an obvious, pattern-following solution and no new design decisions — spanning one or several files or call sites within a single module; more than the mechanical, uniform edit that qualifies as Trivial, so a lone typo or one clearly-correct multi-site replacement belongs in Trivial, not here): N_plan = N_code = 1 — unless the change touches an external library's config file or type-level API AND that library had a recent major-version bump (primary check: git diff <base-commit> of the package manifest; if absent in this run, judge from other context since the bump may predate this run); then classify as at least Moderate. Similar qualitative risks (external config-DSL rewrites, etc.) follow the same rule. Purely cosmetic edits (comments, whitespace, auto-formatting) do not trigger the exception — use judgment. Difficulty-skip matrix (Simple): additionally skip Step 6 Tidy, Step 6.5 Polish Prose, and Step 7.5 Rules Compliance Review — the cleanup and prose passes stay correctness-neutral at this tier, and Step 8's single review iteration takes over as the run's primary rules-compliance defense in Step 7.5's place (see Step 7.5's "Responsibility scope" paragraph for the rationale); Step 4's plan-approval gate and Step 7's check_commands / test_commands remain the safety net
- Moderate (a change that requires at least one genuine design decision — even if it otherwise follows existing patterns — or spans multiple modules; a multi-file change that applies one uniform, pattern-following edit with no new design decisions stays in Simple regardless of file count — the exception is scoped to file count within a single module; the same edit applied across multiple modules still escalates via the "or spans multiple modules" clause above when it would otherwise have qualified as Simple — this clause governs the Simple↔Moderate boundary only and does not reach back to narrow Trivial's own mechanical-uniform-edit tie-break above, which already tolerates multi-module spread): N_plan = min(2, N_plan), N_code = min(2, N_code) — no step is skipped (difficulty-skip matrix applies to Trivial / Simple only)
- Complex (cross-module, new patterns, API changes, significant refactoring): keep N_plan and N_code — no step is skipped
--fast N-forcing (applies after the tier-based N_plan/N_code above, only when Adjust N actually runs — see the -i precedence note below): when fast_mode_active and the assessed tier is not Trivial, force N_plan = 0 and N_code = 1 directly — bypassing the tier table above rather than deriving from it. When the assessed tier is Trivial, fast mode changes nothing (Trivial's own N_plan = N_code = 0 already stands). subagent_model and the difficulty-skip matrix below both keep reading the assessed tier unmodified — fast mode never touches either (Step 3 Plan Review is what fast mode skips; Step 6 Tidy and Step 7.5 Rules Compliance Review are unaffected). -i / --iterations precedence: this whole paragraph is skipped whenever -i was given (Adjust N does not run), so an explicit -i always overrides fast mode's N-forcing — this differs from the Step 6.5-only fast-mode skip below, which is mode-driven rather than an N-count and therefore applies regardless of -i.

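The tier table and the --fast N-forcing above can be sketched as one function. This is a hedged illustration: the name `adjust_iterations` and its parameters are hypothetical, and the Simple tier and the fast-mode forcing follow the text literally (the text does not say whether a configured value below 1 should still act as a ceiling there).

```python
from typing import Tuple


def adjust_iterations(
    tier: str,
    n_plan: int,
    n_code: int,
    fast_mode_active: bool = False,
    iterations_flag_given: bool = False,
) -> Tuple[int, int]:
    """Return the effective (N_plan, N_code) for an assessed tier."""
    # An explicit -i / --iterations skips Adjust N entirely,
    # so it also overrides fast mode's N-forcing.
    if iterations_flag_given:
        return n_plan, n_code

    tier = tier.lower()
    if tier == "trivial":
        n_plan, n_code = 0, 0
    elif tier == "simple":
        n_plan, n_code = 1, 1
    elif tier == "moderate":
        n_plan, n_code = min(2, n_plan), min(2, n_code)
    elif tier != "complex":
        raise ValueError(f"unknown tier: {tier}")

    # Fast mode bypasses the tier table on every non-Trivial tier.
    if fast_mode_active and tier != "trivial":
        n_plan, n_code = 0, 1
    return n_plan, n_code
```

For example, a Moderate task configured with three iterations each resolves to two and two, and the same task under --fast resolves to zero plan reviews and one code review.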
Step 9 (Completion Hooks) is never skipped by the difficulty-skip matrix at any tier — hooks.on_complete is a project-configured open list whose callee set varies per project, so difficulty-gating it would make behavior project-dependent; the matrix covers only the whole-step-skippable Step 6 / Step 6.5 / Step 7.5.

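As a rough sketch of the difficulty-skip matrix, the function below produces the ledger records for a given tier in the `<step> skipped (<tier> tier)` form. The constant and function names are hypothetical; only the three step names and the record format come from the text.

```python
from typing import List

# The only whole-step-skippable quality steps; Step 9 is deliberately absent.
SKIPPABLE_STEPS = (
    "Step 6 Tidy",
    "Step 6.5 Polish Prose",
    "Step 7.5 Rules Compliance Review",
)


def difficulty_skip_records(tier: str) -> List[str]:
    """Return the records to append to difficulty_skipped_steps."""
    if tier.lower() in ("trivial", "simple"):
        label = tier.capitalize()
        return [f"{step} skipped ({label} tier)" for step in SKIPPABLE_STEPS]
    # Moderate and Complex skip nothing.
    return []
```

An empty result means the ledger stays empty and the completion reminder is omitted.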
File count is a hint, not the sole criterion. If adjusted, mark excess task iteration items (Step 3-x beyond N_plan, Step 8-x beyond N_code) completed via TaskUpdate. When Trivial reduces both to 0, mark every Step 3-x / Step 8-x iteration item AND the top-level Step 3: Plan Review / Step 8: Code Review rows as completed — both steps are skipped entirely (their entry-point guards in Step 3 / Step 8 recognize this pre-completed state and pass straight through; Trivial is the only source of N_code = 0, and it always zeroes both counts together, so the "Trivial → both steps skipped" coupling holds for N_code. N_plan = 0 alone has a second source — the --fast N-forcing above — which does not zero N_code; see the marking rule immediately below). When --fast forces N_plan = 0 on a non-Trivial tier, mark every Step 3-x iteration item AND the top-level Step 3: Plan Review row completed the same way, but leave the top-level Step 8: Code Review row and Step 8-1 untouched (N_code is forced to 1, so Step 8 still runs; any Step 8-x beyond Step 8-1 is still excess and gets marked completed per the generic excess-marking sentence above, same as any other N_code reduction) — and append Step 3 Plan Review skipped (fast mode) to fast_mode_skipped_steps (the fast-mode counterpart of difficulty_skipped_steps, rendered by § Completion's fast-mode-skip reminder). Difficulty-skip matrix marking (Step 6 / Step 6.5 / Step 7.5): apply the same pre-completed-mark + entry-point-guard mechanism to the whole-step-skippable quality steps. Keyed on the assessed tier alone (no config flag): for Trivial and Simple alike, mark the top-level Step 6: Tidy, Step 6.5: Polish Prose, and Step 7.5: Rules Compliance Review rows completed (both tiers now skip the same three steps); for Moderate / Complex, mark neither. 
For each row marked completed here, append one record to difficulty_skipped_steps (e.g., for a Simple-tier run: Step 6 Tidy skipped (Simple tier) / Step 6.5 Polish Prose skipped (Simple tier) / Step 7.5 Rules Compliance Review skipped (Simple tier), substituting the actual assessed tier) so § Completion's difficulty-skip reminder can render it (the skip is never silent). Step 9 (Completion Hooks) is never marked here (see the Step 9 note above). Resolve subagent_model here (after the tier is assessed, in the same Adjust N pass): set subagent_model = the merged-config subagent_model map entry for the lowercased assessed tier name (trivial / simple / moderate / complex) when that key is present and valid, else the built-in default for that tier (sonnet for Trivial / Simple, inherit for Moderate / Complex), else inherit. A resolved value of inherit means downstream dispatches omit the model (current behavior). This resolution is skipped under -i / --iterations (Adjust N does not run), leaving subagent_model at its sub-step 1 inherit init. Log the assessed difficulty and effective N_plan / N_code in the resolved language (see §Configuration; default ja). When --fast forced N_plan/N_code (non-Trivial tier), the log line names both the assessed tier and the forced counts (language: ja: 難易度: Moderate — fast モードにより Step 3 を skip し Step 8 は 1 回に設定します; language: en: Difficulty: Moderate — fast mode skips Step 3 and caps Step 8 at 1 iteration) — this keeps a major-version-bump escalation to Moderate/Complex visible even though fast mode's N-forcing still overrides the resulting N_plan/N_code (the escalation still governs subagent_model and the difficulty-skip matrix, which fast mode does not touch). The Step 11.5 and Step 11.6 task rows are not affected by the difficulty assessment — they stay pending regardless, since the self-retrospective is gated only on self_retrospective.feedback and the workability retrospective only on workability_retrospective.enabled.

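The subagent_model resolution order described above (config map entry, then built-in tier default, then inherit) might look like this. The function name and parameters are hypothetical, and "valid" is assumed here to mean a non-empty string, since the text does not define it at this point.

```python
from typing import Mapping, Optional

# Built-in tier defaults as stated in the text.
BUILTIN_TIER_MODEL = {
    "trivial": "sonnet",
    "simple": "sonnet",
    "moderate": "inherit",
    "complex": "inherit",
}


def resolve_subagent_model(
    tier: str,
    config_map: Optional[Mapping[str, object]] = None,
    iterations_flag_given: bool = False,
) -> str:
    """Resolve subagent_model for the assessed tier."""
    # Under -i / --iterations Adjust N does not run, so the init value stays.
    if iterations_flag_given:
        return "inherit"
    key = tier.lower()
    value = (config_map or {}).get(key)
    # Assumption: a valid entry is a non-empty string.
    if isinstance(value, str) and value.strip():
        return value
    return BUILTIN_TIER_MODEL.get(key, "inherit")
```

A result of `"inherit"` corresponds to omitting the model on downstream dispatches.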
--fast Step 6.5-only skip (runs regardless of -i; this is the one exception to the skipped-under -i rule in this item's header, unlike the N-forcing above): when fast_mode_active is set and Step 6.5's row has not already been completed by the difficulty-skip matrix above (which happens only on a genuine Trivial/Simple assessment), mark the top-level Step 6.5: Polish Prose row completed here and append Step 6.5 Polish Prose skipped (fast mode) to fast_mode_skipped_steps. When the matrix has already completed the row, do nothing further (this avoids a double record). When -i is given, Adjust N's tier-based matrix marking above never runs, so this paragraph is the only path that skips Step 6.5 under fast mode.
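The double-record guard in the Step 6.5-only skip can be sketched as below. The function name and the use of a set of completed row labels are assumptions made for illustration; the real workflow tracks rows through its task list.

```python
from typing import List, Set

STEP_65_ROW = "Step 6.5: Polish Prose"


def apply_fast_step65_skip(
    fast_mode_active: bool,
    completed_rows: Set[str],
    fast_mode_skipped_steps: List[str],
) -> None:
    """Mark Step 6.5 completed under fast mode unless the matrix already did."""
    if fast_mode_active and STEP_65_ROW not in completed_rows:
        completed_rows.add(STEP_65_ROW)
        fast_mode_skipped_steps.append("Step 6.5 Polish Prose skipped (fast mode)")
    # If the difficulty-skip matrix already completed the row, nothing is
    # recorded here, so the skip appears in exactly one ledger.
```

This runs on every path, including under -i, where no tier-based marking has happened and the row is therefore never pre-completed.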
Do not present the plan to the user or ask for approval/confirmation — presenting an unreviewed plan wastes user time and risks approval of a suboptimal approach. This prohibition extends to confirmation-seeking transition sentences such as "if this design looks good, I'll proceed to Step 3 (Plan Review)", "shall I move on to Plan Review?", or any equivalent ask-for-go-ahead phrasing — these read as natural conversation but constitute the same approval-gate that wastes user attention on an unreviewed plan. The moment Step 2 ends, advance directly to Step 3 without emitting any user-facing message about the plan or the transition. The user will see the plan in Step 4 (internally reviewed in Step 3, unless N_plan=0 — either the task was assessed Trivial, or --fast forced N_plan=0 on a non-Trivial tier — in which case Step 3 is skipped and the plan reaches Step 4 unreviewed).

Step 3: Plan Review

This step is an internal review — the reviewer refines the plan before the user sees it, so the user receives a higher-quality plan in Step 4. Do not present the plan to the user or ask for feedback during this step.

Difficulty exception (Trivial or --fast / N_plan=0). When N_plan = 0 — either a Trivial task (Trivial zeroes both N_plan and N_code) or --fast forcing N_plan=0 on a non-Trivial tier (N_code stays ≥ 1 in that case) — this entire step is skipped: its task rows (top-level Step 3: Plan Review and every Step 3-x) were already marked completed by Step 2's Adjust N by difficulty. This skip is gated on N_plan itself, not on the presence of user-provided analysis — the analysis-substitution prohibition below still applies in full to every task with N_plan ≥ 1.

Always run (for N_plan ≥ 1). Step 3 is not skippable on the grounds that the user's task prompt contained design analysis, prior-session handoff material, or review-like commentary. User-provided analysis is upstream planning content the user wrote — it is not an independent bias-free peer review pass and does not substitute for the reviewer dispatch. Handling rules (closed list):

(i) The Step 3 reviewer skill is always invoked.
(ii) User-provided analysis (long task descriptions that themselves argue for the approach, embedded justification in handoff docs, etc.) is fed into the reviewer skill's dispatch payload as additional context so the reviewer can build on it rather than re-derive it.
(iii) An explicit user override in the task prompt ("you may skip Step 3 for this run", or equivalent) is the only analysis-driven path to skipping (distinct from the difficulty exception above). When this fires, record a warning in the Completion summary so the user has a visible signal that the bias-free review pass was bypassed.

The existing per-iteration "No actionable findings" semantic-judgment skip continues to work — that is a reviewer-side decision (the reviewer ran and returned no actionable feedback), not a Step-skip.

If N_plan = 0, skip this step entirely (Trivial, or --fast forcing N_plan=0 — see the Difficulty exception above) — its rows are already completed, so do not re-mark them in_progress and proceed directly to Step 4. The following in_progress marking and per-iteration processing apply only when N_plan ≥ 1.

Mark Step 3: Plan Review as in_progress. Process each pending iteration item (Step 3-1 through 3-N_plan) in order:

Mark the iteration item as in_progress. Call the reviewer skill resolved in Step 1 (e.g. Skill(ask-peer)): Review the plan. subagent_model propagation (inline reviewer): when the resolved reviewer is Claude-family (per Step 1's reviewer-family classification) and subagent_model is a model id, propagate it — pass Model: <subagent_model> to ask-peer, or include --model <subagent_model> in the dispatch instruction to ask-claude. External-CLI reviewers and an inherit resolution carry no model (current behavior). Step 3 is always inline, so there is no background-launch path to double-apply against here. Pre-dispatch dispatch-boundary reminder: Issue the Skill(<reviewer>) call in the same turn as any accompanying status prose — never produce a standalone status turn before the Skill() call, as that creates a stall point. Reading the reviewer's SKILL.md is preparation, not dispatch; the Skill() call is the dispatch.
- Instruct reviewer to read all files under .claude/rules/ for project conventions, references/plan-format.md for the Decisions (a)+(b) criterion and § Step 3 (f) content-quality rubric, references/simplicity-self-audit.md for the Step 2 audit checklist that category (a) below verifies, and references/review-categories.md § Plan review categories for the full per-category rubric of the six categories below (resolve these references/*.md links to concrete readable paths when composing the request — the reviewer lacks the skill-directory context)
- Request feedback organized into six categories (labels only — full rubric per the read-instruction above): a. Scope & feasibility b. Approach & alternatives c. Completeness d. Incrementality e. External library primary-source verification f. Presentation & attention allocation (content quality)
- If custom_instructions is configured, include the instructions text in the review request and have the reviewer verify alignment and report conflicts
- If a state file is active (executing a subtask from a decomposition), include the current subtask's scope in the reviewer request: list the subtask's title and description, then list what the other subtasks cover (to define out-of-scope). Instruct the reviewer that missing functionality belonging to other subtasks is not an actionable finding for this review — only findings scoped to the current subtask qualify. Omit this when no state file is active (single-task execution has no defined out-of-scope boundary).
- Reviewer should only report actionable findings. If none, explicitly state "No actionable findings"
Judge the reviewer's response semantically: if the reviewer reports nothing actionable — no actionable findings, no improvements to apply, no review points raised, or any other "nothing to report" outcome regardless of exact wording — mark this and remaining iteration items as completed (skip). Mark Step 3: Plan Review as completed and proceed to Step 4 automatically. Per the No-Stall Principle, do not wait for user input and do not rely on exact-phrase matching; trust semantic judgment since the reviewer skill's phrasing varies (especially Skill(ask-peer) and other free-form-prose reviewers whose verdicts are natural-language Markdown rather than a fixed token).
Otherwise: autonomously apply improvements or reject inapplicable points with reason — do not ask the user for judgment on individual review findings. Mark this iteration item as completed.
- Approach-reconsideration self-audit on high findings count (iter 1 only): at the iter 1 → iter 2 boundary, before applying findings individually for iter 1's output, count the reviewer's findings by severity. If either threshold trips — Critical ≥ 3 OR (Critical + Major) ≥ 10 — additionally scan the findings list for any item that surfaces an approach-level alternative (typical phrasings: "X の方が筋がよい", "existing X と統合できる", "switch to <sibling>", "use <existing-mechanism> instead", or any equivalent "the plan should adopt a different overall approach" framing). If at least one such approach-alternative finding is present, do not proceed with mechanical apply-and-iterate — instead, treat the findings cluster as a signal that the plan's Approach selection itself is the root cause. Rewrite the plan with the approach-alternative finding's direction promoted into the Decisions section (Recommendation / Alternative swap or insertion-direction new Decision item, per the rewrite class), add a new review iteration item Step 3-(N_plan+1), and return to Step 3 to re-review the rewritten plan. The remaining iter-1 findings are carried forward as context for the next reviewer. When the threshold trips but no approach-alternative finding is present (mechanical-fix-level findings only), proceed with the usual per-finding apply-and-iterate path. This audit applies only at the iter 1 → iter 2 boundary; later iterations have already exercised one or more apply cycles and approach-reconsideration after that point is the Step 4 user-gate's responsibility (general principle: high finding density paired with an approach-level alternative finding is a structural signal, not a quality signal — keep applying mechanical fixes and the plan still fails at Step 4 user gate).
- Prose-integrity self-check (post-fix): apply the same procedure as Step 8 sub-step 3's "Prose-integrity self-check (post-fix)" paragraph — re-read the surrounding paragraph as a single unit, verify no sentence is cut mid-word and no logical connective is broken — to the plan prose being edited (Decisions / Design / Test plan / Risks / Unknowns paragraphs) in place of code / doc content.
- If the plan was modified: continue to the next pending iteration item (back to step 1). Plan modifications often introduce new gaps or ripple effects that the previous reviewer had no chance to see — the re-review round-trip is cheap compared to shipping a plan that looks fine to the author but has an unvetted section. Don't short-circuit even when the fixes feel airtight
- If all points were rejected (no modifications): mark remaining iteration items as completed (skip, since there is nothing new for the next reviewer to look at)
Continue to the next pending iteration item with:
- the updated plan
- a summary of changes made and rejections with reasons
- an iteration-scope instruction: from iteration 2 onward, the reviewer's primary verification scope is the plan changes applied since the previous iteration (conveyed by the summary of changes above — no separate diff artifact is provided) plus landing confirmation of the previous iteration's findings — the full-coverage pass (re-verifying every plan section, decision, and cited reference from scratch) belongs to iteration 1 only. The reviewer must still escalate back to full re-verification when content outside that primary scope raises a new concern, so coverage is reordered, not reduced
- the same six-category structure (a–f), .claude/rules/ reference, and "No actionable findings" requirement
Return-point no-stall reminder: At each iteration boundary (regardless of reviewer outcome — findings reported, "No actionable findings", any non-error result), the next action — the next iteration's reviewer dispatch when more iteration items remain, or the Step 4 transition when this was the last iteration or "No actionable findings" was returned — must be issued in the next tool call. Do not insert an interstitial summary or acknowledgment turn between iterations; the abstract enumeration in § No-Stall Principle is intentionally duplicated here so the rule fires at the decision moment.
If all N_plan iteration items are completed and actionable feedback still remains, carry the unresolved points forward to Step 4.

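The approach-reconsideration trigger from the iteration list above reduces to a small predicate. This is a sketch with hypothetical names; detecting whether a finding proposes an approach-level alternative is a semantic judgment and is passed in as a boolean.

```python
def needs_approach_reconsideration(
    iteration: int,
    critical: int,
    major: int,
    has_approach_alternative: bool,
) -> bool:
    """Return True when the plan should be rewritten instead of patched."""
    # The audit applies only at the iter 1 -> iter 2 boundary.
    if iteration != 1:
        return False
    threshold_tripped = critical >= 3 or (critical + major) >= 10
    # A tripped threshold alone is not enough; an approach-level
    # alternative must be among the findings.
    return threshold_tripped and has_approach_alternative
```

When this returns False, the usual per-finding apply-and-iterate path continues.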
Mark Step 3: Plan Review as completed.

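The control flow of the Step 3 iteration loop can be approximated as follows. It is a simplified sketch: the outcome labels and the function name are hypothetical, each outcome stands for the semantic judgment of one reviewer response, and the approach-reconsideration rewrite is left out.

```python
from typing import List, Sequence


def run_plan_review(n_plan: int, outcomes: Sequence[str]) -> List[str]:
    """Return the iteration items that actually dispatched a reviewer.

    Each entry of `outcomes` is one of "no_actionable_findings",
    "all_rejected", or "modified".
    """
    dispatched: List[str] = []
    # N_plan = 0: rows are pre-completed, so the loop body never runs.
    for i, outcome in zip(range(1, n_plan + 1), outcomes):
        dispatched.append(f"Step 3-{i}")
        if outcome in ("no_actionable_findings", "all_rejected"):
            # Nothing new for the next reviewer; remaining items are skipped.
            break
        # "modified": the changed plan gets another review round.
    return dispatched
```

For instance, `run_plan_review(3, ["modified", "no_actionable_findings"])` dispatches the reviewer twice and skips the third item.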
Step 4: Finalize Plan (USER APPROVAL GATE)

Before presenting, verify via TaskList that Step 3: Plan Review and every Step 3-x iteration item are completed. Presenting the plan for approval (the surface chosen in sub-step 2) is the effective end of internal review, so reaching it while any Step 3 item is still pending or in_progress skips the review entirely (on path (a) that presentation is ExitPlanMode, which is also the Plan Mode exit). If any Step 3 item is not completed, emit a one-line inline note to the user naming all incomplete items (e.g., "Plan review found incomplete (Step 3-2 still pending); running the remaining review pass before presenting the plan.", substituting the actual incomplete iteration item label(s)), then return to Step 3 to process them (do not flip a row to completed without doing the review work). Exception: when N_plan=0 (a Trivial task, or --fast forcing N_plan=0 on a non-Trivial tier), Step 2's Adjust N has already pre-marked all Step 3 rows completed; that completed state is the intended skip, not an unrun-review bug, so proceed to the approval presentation normally.
1.5. Prose-language self-audit: Before presenting the plan for approval (the surface chosen in sub-step 2), verify that explanation prose in the plan body (Overview narrative, Decisions rationale, Design descriptions, Test plan steps, Risks/Unknowns paragraphs) is written in the resolved language. Schema tokens (Overview / Decisions / Design / Test plan / Risks / Unknowns), step labels, enum values, identifiers, and quoted code strings stay in their original form regardless of language. Audit both directions: (a) whether any explanation sentences are in a language other than the resolved language, and (b) whether concept words outside the verbatim-preserve scope (ordinary nouns, adjectives, conjunctions, verb phrases) are over-preserved in the source language rather than rendered in the resolved language (per references/plan-format.md § Localization granularity's Negative-direction rule). Revise any failures now per references/plan-format.md § Localization granularity before proceeding to sub-step 2. Re-entry coverage: this audit must re-run on every entry into Step 4, both the initial entry and any re-entry triggered by sub-step 1's "return to Step 3" path or sub-step 3's material-change path, since revisions during Step 3 iteration may introduce prose in a language different from the resolved language.
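The Step 3 completion check at the top of Step 4 can be pictured as a small gate function. The sketch below is illustrative only: `TaskRow` and `step4_entry_check` are hypothetical names, and the real check reads the rows through TaskList rather than a Python list.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TaskRow:
    label: str   # e.g. "Step 3: Plan Review" or "Step 3-2" (hypothetical shape)
    status: str  # "pending", "in_progress", or "completed"


def step4_entry_check(rows: List[TaskRow]) -> Tuple[bool, Optional[str]]:
    """Return (may_present, note). The note names every incomplete Step 3 row."""
    incomplete = [
        row for row in rows
        if row.label.startswith("Step 3") and row.status != "completed"
    ]
    if not incomplete:
        # Also covers N_plan=0: Adjust N pre-marked the rows completed on purpose.
        return True, None
    labels = ", ".join(f"{row.label} still {row.status}" for row in incomplete)
    note = (
        f"Plan review found incomplete ({labels}); "
        "running the remaining review pass before presenting the plan."
    )
    return False, note
```

When the check fails, the caller returns to Step 3 and does the review work; it never flips the row to completed just to pass the gate.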
Plan presentation — branch on plan_mode_active (resolved at Step 2 sub-step 2's conditional Plan-Mode entry). Sub-steps 1, 1.5, and 3 apply to every path unchanged.

Plan-body prose polish (runs on every path when polish_prose is true and fast_mode_active is not true, after the plan document is written and before it is presented). Skip when polish_prose is not true, or when fast_mode_active (the polish_prose case fires only when explicitly set to false; the default true and a non-boolean fall-back-to-true both run — see § Configuration's polish_prose bullet; the fast_mode_active case follows the same silent-skip discipline as Step 6.5's fast-mode skip — plan-body prose-polish is purely cosmetic, unlike Step 6 Tidy / Step 7.5, which fast mode keeps running): skip this prose-polish call silently and present the un-polished plan — emit no note, because the user gate immediately follows and a skip note would only clutter the attention-sensitive presentation (contrast Step 6.5's impl-file prose-polish call, whose note is its only run-signal). When both conditions allow it, within the path branches below, once the full plan body has been written to its plan document — the Plan Mode plan file on path (a), .claude/plans/<slug>.md on path (b) — and after sub-step 1.5's Prose-language self-audit has brought the explanation prose into the resolved language (so this polish refines language-correct prose, never prose still pending translation), and before presenting it (path (a): before the condensed chat view and ExitPlanMode; path (b): before delegating to the visual or crit gate, either of which copies the served file from .claude/plans/<slug>.md), invoke Skill(prose-polish) in file mode on that just-written plan document: File: = the plan document path, Language: = the resolved language, and no Model: (see § Configuration's subagent_model bullet). 
Anti-skip guard: do not skip this dispatch when polish_prose is true merely because sub-step 1.5's Prose-language self-audit returned clean or because the prose appears concise; the self-audit corrects language-level translation errors, but prose-polish refines naturalness and fluency and may still improve prose even when the self-audit passes. It rewrites only the resolved-language explanation prose in place via surgical Edits, leaving headings, identifiers, code blocks, and git-shaped tokens verbatim; it runs on every difficulty tier (a plan is presented on every tier). § Workflow artifacts non-conflict: this deliberately targets the plan document — a workflow artifact that Step 6.5's impl-file scope excludes — so the two prose-polish call sites never overlap. Judge the verdict semantically and proceed per § No-Stall Principle: on a done / no-change verdict present the (possibly-polished) plan; on error emit a one-line note and present the un-polished plan rather than blocking — prose-polish ran and returned this verdict, so nothing is appended to bundle_skills_unavailable (an error status is not an availability signal). A Skill() call failure after one retry is the distinct availability signal — the skill itself is unreachable — and additionally appends prose-polish unavailable (Step 4) to bundle_skills_unavailable (§ Step 1 sub-step 3's "Initialize the bundle-unavailability ledger here" bullet). The verdict is a return-point covered by § No-Stall Principle — do not pause after it.

(a) plan_mode_active == true (plan_review_gate: "plan-mode"): Plan-Mode text path. This is the first time the user sees the plan. Write the full plan body to the Plan Mode plan file with the Write tool (the ExitPlanMode approval modal renders that file's contents), and present a condensed view in chat per the two-tier protocol in references/plan-format.md § Step 4 presentation order — internally reviewed in Step 3 for N_plan ≥ 1 (include any unresolved review points from Step 3); when N_plan=0 (a Trivial task, or --fast forcing N_plan=0 on a non-Trivial tier) Step 3 was skipped, so present the plan as unreviewed and rely on this user-approval gate as the sole review. Render the chat view in this order: a. ## Plan header as a visual boundary. b. The > Review guide line (per references/plan-format.md § Review guide line) followed by the condensed plan body, following references/plan-format.md § Localization granularity in the resolved language (see §Configuration; default ja): Overview in full (including Highlights when present), Decisions in full, and Design as a file-list only (files to change + one line of what-changes each). Test plan and Risks / Unknowns are not rendered in chat — they live in the full plan file and surface via the preamble's verification approach / known risks slots. Section headings render at ### level (one below the ## Plan container); sub-sections (Title, Goal, Scope, Decision N, Implementation, etc.) at ####. c. Horizontal rule (---) separator. d. Summary preamble per references/plan-format.md § User-gate summary preamble. e. Guidance line per references/plan-format.md § Step 4 guidance lines (verbatim, no paraphrasing, no concatenation). f. Call ExitPlanMode in the same turn, immediately after the guidance line. ExitPlanMode triggers the approval modal (which renders the full plan file) — if it is not called, the user sees the plan text but has no way to approve. 
Delaying ExitPlanMode to a subsequent turn is the primary cause of Step 4 appearing stalled.

(b) plan_mode_active == false (plan_review_gate is visual — the default — or the opt-in crit): no Plan Mode was entered, so there is no Plan Mode plan file and ExitPlanMode is never called on this path (it is invalid outside Plan Mode; the item-f ExitPlanMode above is scoped to path (a)). First establish the no-Plan-Mode plan document: resolve <slug> once per run, then reuse it verbatim on any Step 4 re-entry (same stability as plan_mode_active — a rewrite-approach re-entry must not re-resolve it). Reuse the active decomposition state file's slug when a state file is in play, otherwise derive a kebab-case slug from the effective task (transliterate non-ASCII where reasonable, strip punctuation, lowercase; on collision with an existing .claude/plans/<slug>.md from a prior run, suffix -2, -3, … — this collision check applies only to first resolution, never on re-entry against this run's own <slug>.md), mirroring references/task-decomposition.md § B. Normal sub-mode's slug algorithm — then write the full plan body to .claude/plans/<slug>.md (the canonical plan document that work-complete archives and that § Completion's cleanup preserves; on a decomposed parent, successive subtasks reuse the parent slug and overwrite this <slug>.md, which is harmless because the plan document is per-run and the separate dev-workflow.<slug>.md state file carries cross-subtask state). Then run the plan-review gate for the resolved plan_review_gate value — the approval surface on this path — per this routing table (crit → visual → chat is a fallback chain, not a flat choice: a crit fallback lands on the Visual gate bullet below, which may itself fall back again to chat):
- plan_review_gate: crit: read references/crit-plan-review.md and follow it top to bottom unconditionally. The reference owns its own availability/reachability determination (the crit --version exit-code check and the local-browser check) and self-detects whether to launch the gate or return fallback. Returns one of three outcomes: approve → proceed directly to implementation (no ExitPlanMode); rewrite-approach → handle it exactly per sub-step 3's rewrite-approach bucket; fallback (crit unavailable, its launch failed, or the local browser is unreachable) → Step 4's routing table: continue directly with the Visual gate bullet below, running it against the same plan document (crit's own reference never routes to chat directly — see references/crit-plan-review.md § Fallback contract). Like the visual gate, the crit path does not emit the chat condensed view / preamble / guidance line — the browser is the review surface.
- Visual gate (the action for plan_review_gate: visual, and also the fallback destination when the crit gate above returned fallback): read references/visual-plan-review.md and follow it top to bottom unconditionally. The reference owns the browser-reachability determination (its step 2 printenv CLAUDE_CODE_REMOTE check) and self-detects whether to launch the gate or return fallback. Anti-skip guard: skipping the reference and jumping straight to the chat path below — including rationalizing under § No-Stall Principle that chat is faster — silently disables the gate (the reported "plan never displayed" failure) and is a defect, since the reference's own fallback return, not an up-front guess, is what routes an unreachable browser to chat. Following it writes the served file from .claude/plans/<slug>.md, launches the browser gate via background Bash, and loops internally on localized revise, returning one of three outcomes: approve → the gate has written the latest plan back to .claude/plans/<slug>.md, and the browser submit is itself the final confirmation, so proceed directly to implementation (no ExitPlanMode); rewrite-approach → a revise comment requested an approach-level material change; handle it exactly per sub-step 3's rewrite-approach bucket (the gate has written the latest plan to .claude/plans/<slug>.md); fallback → continue with the chat-approval path below (fallback triggers per the reference's § Fallback contract). The visual path does not emit the chat condensed view / preamble / guidance line — the browser is the review surface.
- No-Plan-Mode chat-approval path (entered only when the visual gate returns fallback — whether visual was the directly-resolved value, or reached via the crit → visual fallback above): a fallback return covers both an unreachable browser and a launch failure, so reachability is not a separate entry condition you test here. Present the condensed chat view exactly as items a–e above (## Plan header, > Review guide + condensed body, --- separator, preamble, guidance line), then — instead of item f's ExitPlanMode — append a one-line pointer to the full plan document at .claude/plans/<slug>.md (already written above) and wait for the user's chat reply (there is no approval modal). Classify the reply per sub-step 3's four buckets. Do not call ExitPlanMode.
CLAUDE_CODE_REMOTE detection is owned entirely by each reference's own Fallback contract (references/crit-plan-review.md for crit, references/visual-plan-review.md for visual) — this sub-step does not test it up front.
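Path (b) above can be sketched in two parts: resolving the slug once, then walking the crit → visual → chat fallback chain. This is a minimal sketch under stated assumptions, not the skill's implementation: `derive_slug`, `run_plan_gate`, the `"plan"` default for an empty slug, and the `"await-reply"` marker are all hypothetical, and the NFKD-based transliteration only strips accents (it drops characters with no ASCII decomposition).

```python
import re
import unicodedata
from typing import Callable, Dict, List, Set, Tuple


def derive_slug(task: str, existing: Set[str]) -> str:
    """Kebab-case slug with -2, -3, ... collision suffixes (first resolution only)."""
    ascii_text = (
        unicodedata.normalize("NFKD", task)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Lowercase, then collapse every run of non-alphanumerics into one hyphen.
    base = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
    base = base or "plan"  # assumption: fallback when nothing survives
    slug, suffix = base, 2
    while slug in existing:
        slug = f"{base}-{suffix}"
        suffix += 1
    return slug  # the caller stores this and reuses it verbatim on re-entry


# A fallback return moves one step down the chain; nothing is guessed up front.
FALLBACK_TO = {"crit": "visual", "visual": "chat"}


def run_plan_gate(
    start: str, gates: Dict[str, Callable[[], str]]
) -> Tuple[str, str, List[str]]:
    """Each gate returns 'approve', 'rewrite-approach', or 'fallback'."""
    surface = start
    visited: List[str] = []
    while True:
        visited.append(surface)
        if surface == "chat":
            # Chat has no modal: present the condensed view and wait for a reply.
            return surface, "await-reply", visited
        outcome = gates[surface]()
        if outcome != "fallback":
            return surface, outcome, visited
        surface = FALLBACK_TO[surface]
```

The point the sketch tries to capture is that only a gate's own `fallback` return moves the run to the next surface, so the chat path is never chosen by an up-front reachability guess.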

Section headings (Overview / Decisions / Design / Test plan / Risks / Unknowns) and the Step 4 guidance line stay English on every path.
Collaborate with the user to refine the plan as needed (normal Plan Mode interaction on path (a); normal chat / visual-gate interaction on path (b) — a swap-decisions / rewrite-approach re-presentation re-enters whichever surface this run uses: the ExitPlanMode modal on path (a), the relaunched visual/crit gate or the chat re-present on path (b)). Categorize each user response into one of the four buckets below via semantic judgment (per § No-Stall Principle's "do not rely on exact-phrase matching" rule — example phrasings are illustrative, not literal discriminators):
- accept: explicit affirmative — "OK" / "approve" / "looks good" / "進めて" / any semantic equivalent. Begin implementation.
- swap-decisions (Decisions Recommendation/Alternative swap on one or more specific items — "Decision 1 を Alternative に", "swap the recommendation on the language flag", "use the alternative for Decision N", "Decision N と M は Alternative で残りはそのまま"): re-render the plan with the specified Recommendation / Alternative pairs swapped on the named Decisions items, leave other items unchanged, run the read-back sub-step below, then re-present the plan (re-enter the gate). When the user names multiple Decisions in one message, list every affected item on the read-back line so partial-coverage misses cannot slip through.
- rewrite-approach (Approach / Design / Scope-level material change — "switch from independent skill to extending sibling mode", "split this into two subtasks", "scope down to only the canonical site", or any change that does not fit a clean Decisions swap): add a new review iteration item (Step 3-(N_plan+1)), run the read-back sub-step below, return to Step 3 to re-review the modified plan, then re-enter Step 4 from sub-step 1 (so sub-step 1's task completion check on the new Step 3-(N_plan+1) item and sub-step 1.5's prose-language re-entry-coverage audit both run before re-presenting at sub-step 2). Exception — --fast-forced N_plan=0 (non-Trivial tier, N_code ≥ 1): skip the add-iteration-item and return-to-Step-3 legs above entirely — apply the revision to the plan document, run the read-back sub-step below to confirm it, then re-enter Step 4 directly from sub-step 1, since N_plan stays 0 through the rewrite (see the discriminator note at the end of this bullet for why). This exception does not apply to genuine Trivial (N_code=0 too); that case follows the reactivation below unchanged. Trivial (N_plan=0) re-activation: if the task had been assessed Trivial (N_plan=N_code=0) so Step 3 was skipped, an Approach-level material change means the task is no longer trivially self-evident — re-run Step 2's Adjust N by difficulty against the rewritten plan to re-derive the difficulty assessment itself (it will no longer be Trivial) and the effective N_plan / N_code (re-running the independent-cap logic on both values, not a single value). 
Updating the difficulty assessment — not just the counts — is required because every downstream gate keys on the assessment, not on a bare count: subagent_model and the Step 6/6.5/7.5 difficulty-skip matrix both key on the actual assessed tier, and references/plan-format.md's "N_plan=0 conditional (Trivial or --fast)" picks its replacement sentence based on whether the task is genuinely Trivial; leaving the stale Trivial label in place would keep all three reading a tier that no longer applies. Then re-mark the task rows for the re-derived difficulty: register Step 3-1 … Step 3-N_plan (and the Step 8-1 … Step 8-N_code rows) as fresh pending, and clear the previously-skip-completed top-level Step 8: Code Review row back to pending (this reactivation only reaches this point for a re-derived non-Trivial tier, so N_code ≥ 1 always holds here, whether tier-derived or --fast-forced). Clear the top-level Step 3: Plan Review row back to pending too — unless fast_mode_active and the "re-run Step 2's Adjust N" above already re-forced N_plan=0 for the newly re-derived non-Trivial tier (which re-marks Step 3: Plan Review completed and re-appends its fast_mode_skipped_steps entry as part of that same re-run — see the --fast N-forcing paragraph in Step 2's Adjust N): in that case leave Step 3's re-forced completed state in place instead of clearing it, so the two instructions in this bullet don't fight over the same row. The difficulty-skip matrix is re-derived in the same pass: reset difficulty_skipped_steps = [] and fast_mode_skipped_steps = [] (this reactivation only fires when N_code was 0 too, i.e. genuine Trivial, so fast_mode_skipped_steps is always empty here anyway — reset it for symmetry with difficulty_skipped_steps regardless), and re-run Adjust N, which recomputes which of Step 6: Tidy / Step 6.5: Polish Prose / Step 7.5: Rules Compliance Review the new tier skips and re-populates the ledger from scratch (no find-and-remove of individual records). 
Clear any previously-skip-completed Step 6 / Step 6.5 / Step 7.5 row back to pending when the higher tier no longer skips it — the same re-pending treatment applied to the Step 3 / Step 8 rows. Without this re-derivation the Step 3 entry-point guard would skip the new review item (it skips whenever N_plan=0), Step 4's completion check would loop on the unprocessed item, and a Step 6 / Step 6.5 / Step 7.5 row left stale-completed would silently skip a quality step the higher tier now requires. Why the exception above is safe: this reactivation's own trigger condition requires N_code=0 too (genuine Trivial), so it is already false whenever --fast forced N_plan=0 on a non-Trivial tier (N_code stayed ≥ 1) — fast mode's Step 3 skip is a deliberate mode-level choice, not a possibly-mistaken difficulty judgment, so an approach rewrite does not reopen it (this discriminator relies on the N_code=0 coupling stated in Step 2's Adjust N marking rule and the non-positive-value rejection in Step 1 sub-step 4's N resolution — no other path produces N_code=0). The --fast Step 6.5-only skip paragraph above is re-evaluated only when Adjust N actually re-runs (the genuine-Trivial reactivation case) — it is not re-evaluated when the exception above fires instead.
- withdraw: explicit halt — "stop" / "cancel" / "abort" / "やめる" / "取り下げ". Exit the workflow with no further steps; do not proceed to implementation.
Read-back sub-step (mandatory before applying any swap-decisions / rewrite-approach interpretation): emit a one-line summary of the interpreted change in the resolved language (e.g. Decision 1 を Alternative に切り替え、Decisions 2 と 3 は Recommendation のまま保持します — このまま反映してよろしいですか？) and wait for the user to confirm before re-rendering. The read-back is the gate-of-origin's own resolution branch; do not nest a separate ExitPlanMode call inside it. If the user's confirmation response itself reads as another swap-decisions / rewrite-approach / withdraw instruction, treat the read-back as un-confirmed and re-classify under the four buckets above. The read-back catches multi-Decisions instructions with partial coverage and Approach-level instructions that masquerade as Decisions swaps — both are common failure modes that silently lose user-specified scope when interpreted without read-back.
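The three rewrite-approach routes described in the bullet above differ only in the current `N_plan` / `N_code` values, which a short sketch can make explicit. The function name and the returned route labels are hypothetical shorthand for the prose instructions, and the read-back sub-step still runs on every route.

```python
def rewrite_approach_route(n_plan: int, n_code: int) -> str:
    """Pick the rewrite-approach handling from the effective review counts."""
    if n_plan >= 1:
        # Normal case: add Step 3-(N_plan+1), re-review, then re-enter Step 4.
        return "add-iteration-and-return-to-step-3"
    if n_code >= 1:
        # --fast forced N_plan=0 on a non-Trivial tier: Step 3 stays skipped.
        return "revise-plan-and-reenter-step-4"
    # Genuine Trivial (N_plan = N_code = 0): the tier itself is re-derived.
    return "rerun-adjust-n-and-reactivate"
```

Keying the second branch on `N_code` mirrors the discriminator in the text: only a genuine Trivial assessment yields `N_code = 0`, so a fast-mode skip is never reopened by an approach rewrite.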

NOT approval (interrogative or non-committal — "look good?" / "どう？" / "これでいい？"): treat as ambiguous — ask the user to confirm whether they intended an affirmative or to surface a change request, then re-classify the response under the four buckets above. Do not silently advance.

After the user accepts (accept bucket), begin implementation.

Step 5: Implement

Plan entry self-check — user-side manual action extraction: before issuing the first implementation tool call, scan the approved Plan body (Overview / Decisions / Design / Test plan / Risks / Unknowns) for embedded user-side manual actions — environment-prerequisite probes the user must run themselves, configuration values the user must add to a config file outside the agent's write surface, external authentications / API keys / hook installations / OS-level changes the user must perform manually, manual verification steps the user must execute against external systems (for general software development this includes "run <command> and confirm output", "edit ~/<config> to set X=Y", "log in to <external dashboard> and authorize Z"; for skill development this includes ~/.claude/settings.json edits, hook installation, external CLI installation, or workspace-level config the user must place outside the repo). When at least one such manual action is present, emit a short independent block at the top of Step 5 — separately from any other implementation prose — listing each manual action verbatim with the Plan section it came from, before proceeding with the first implementation tool call. The block ensures the user sees the manual action items distinctly rather than discovering them buried inside long-running plan execution. When no manual actions are present (purely agent-executable plan), skip this block. The block is informational — Step 5 continues without waiting for user input on the manual actions themselves (the user-side observation gate at the probe → real-implementation boundary handles cases where the workflow needs to wait).
Follow the plan and track progress with the Task tools (TaskUpdate). When the Design is an ordered, numbered list of implementation steps (per references/plan-format.md § Template), you MAY register each step as an implementation sub-task and execute them in order, marking each completed as it lands (recommended for long ordered plans, optional for short ones). This is consistent with Step 1's "Implementation sub-tasks in Step 5 are additions, not replacements" rule and does not change the Phase-boundary self-audit (which governs only top-level Step transitions). Apply custom_instructions throughout implementation.

Subagent delegation of a settled work unit (optional, guard-gated). The default for Step 5 is main-thread implementation, because the main thread holds the full task context. As an exception you MAY delegate a unit of the implementation work to a subagent (via the Agent tool) only when all three guards hold:

- (a) spec-completeness: the unit's shape is settled enough to hand a context-less executor a complete spec; its transformation rule / decisions / expected outcome are fixed, not still being worked out;
- (b) the session exposes a subagent type or skill effective for that unit;
- (c) the unit is not judgment-heavy, context-dependent, or small (those stay in the main thread).

When delegating:

- (i) hand the subagent the settled rule, the already-decided facts, and the verification means;
- (ii) treat Step 7 (Check / Test) as the verification gate;
- (iii) select the subagent type by capability first: exclude any agent type that cannot perform the required Edit / Write (read-only Explore / Plan-class agents), then prefer the most task-effective of the remaining edit-capable types, falling back to general-purpose (tool set *) when none fits better. Do not hardcode general-purpose as the sole target. (This mirrors the spirit of Step 2's Task-relevant skill annotation, which prefers a task-fit specialist and otherwise falls back, although the discovery surface differs: Step 2 scans skill descriptions, whereas this selects an Agent subagent_type from the session-exposed agent types.)
- (iv) this delegation is one of the two sanctioned exceptions named in § Configuration's Agent tool usage bullet (the other is the Step 2 research delegation), and it propagates subagent_model like the fixed dispatch sites (pass model: <subagent_model> when it resolves to a model id, omit when inherit).

The clearest, highest-value case is a large-scale mechanical subtask whose transformation rule is settled (e.g. applying one settled pattern across dozens of files, a bulk test-suite migration), which is the signal that motivated this guidance. This is permissive guidance (no config flag), mirroring the adjacent "you MAY register each step as an implementation sub-task" form above.
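The capability-first selection and the subagent_model propagation described above might look roughly like the following. Everything here is an assumption made for illustration: the `AgentType` record, the numeric `fit` score standing in for "most task-effective", and the helper names are not part of the skill or of any Agent tool API.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional

REQUIRED_TOOLS = frozenset({"Edit", "Write"})


@dataclass
class AgentType:
    name: str
    tools: FrozenSet[str]  # {"*"} means the full tool set
    fit: int = 0           # hypothetical task-fit score; 0 means "no special fit"


def can_edit(agent: AgentType) -> bool:
    return "*" in agent.tools or REQUIRED_TOOLS <= agent.tools


def select_subagent_type(agent_types: List[AgentType]) -> Optional[str]:
    """Exclude read-only types, prefer the best fit, else fall back."""
    editable = [agent for agent in agent_types if can_edit(agent)]
    if not editable:
        return None  # guard (b) fails: keep the unit in the main thread
    best = max(editable, key=lambda agent: agent.fit)
    if best.fit > 0:
        return best.name
    for agent in editable:
        if agent.name == "general-purpose":
            return agent.name
    return best.name


def dispatch_kwargs(subagent_model: str) -> Dict[str, str]:
    """Pass model only when it resolves to a model id; omit it on 'inherit'."""
    return {} if subagent_model == "inherit" else {"model": subagent_model}
```

Returning `None` when no edit-capable type exists keeps the decision aligned with guard (b): with no effective subagent type, the unit is not delegated at all.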
Respect prior in-session edits: content the user explicitly removed earlier in this session (comments, guards, logs) must not reappear. Treat deletion as authoritative, not as a gap to fill. This discipline applies when applying plan steps, when applying Step 6 tidy output, and when applying Step 8 review fixes, because the reviewer/tidy subagents only see the diff and cannot enforce this themselves.
Late-stage scaffolding self-audit: when implementation introduces a structural element that was not present in the Step 2 plan (a new sub-step, an additional enum value, a new branch arm, an additional call site that invokes the same callee at a new location, a new recovery / fall-through path; for skill development this includes a new SKILL.md sub-step, an additional status enum in a return contract, a new error class, a new disposition mapping row), re-apply the same Step 2 § Simplicity self-audit rigor to the newly introduced element before moving on. Sub-checks (i)–(iv) and (ix) fire only when a new structural element is introduced; sub-checks (v), (vi), (vii), and (viii) fire unconditionally for every diff edit:

- (i) sibling symmetry: when the new element parallels existing sibling elements, verify same fields / same disposition values / same error-class coverage;
- (ii) error-path symmetry: for any success path introduced, trace its corresponding failure path explicitly (counter increment vs. non-increment, success-only vs. failure-included);
- (iii) boundary-value coverage: for any predicate, threshold, or count introduced, trace the boundary cases (empty input, all-same-classification, mixed-classification) and verify the predicate truth value matches design intent;
- (iv) reference-site sweep: if the new element is referenced from prose elsewhere in the file, verify those references use stable phrase anchors (not raw sub-step numbers / branch letters); when a newly introduced or changed reference points at a heading or bold-prose label in a different file, verify the referenced label actually exists in that file's current content via a live Grep or Read before landing. Do not rely on a secondary description (a rules document's paraphrase, a prior review comment's quote) of what the target file contains, since secondary descriptions can be stale; if the label is not found, fix the reference (correct the target label, or update the label and sweep other referrers) before landing rather than leaving a dangling cross-reference;
- (v) Markdown block-element structural integrity: for each edited .md file (unconditional: applies regardless of whether a new structural element was introduced), scan newly added or changed content for adjacent block elements (list items, paragraphs, code fences, headings, blockquotes) that lack a separating blank line; when a gap is found, insert the missing blank line immediately (in the same Edit call) before moving on; missing blank lines cause the following element to be parsed as a lazy continuation of the preceding block and rendered merged, and the gap typically surfaces in skill-review as a mechanical edit (for skill development this includes SKILL.md, references/*.md, and README.md edits);
- (vi) comment conciseness: for every inline comment introduced or changed in the diff, apply the rule-of-need test: a comment is justified only when the why is non-obvious (a hidden constraint, a subtle invariant, a workaround for a specific bug, behavior that would surprise a reader); a comment that restates what the code does is a deletion candidate; multi-line background explanations belong in rule or design documents, not inline. When a verbose or redundant comment is found, delete or condense it before moving on;
- (vii) newly-introduced cross-reference stable-anchor form: for every cross-reference (Step / sub-step pointer) the diff newly introduces, verify it carries the stable descriptor alongside the number (the number-plus-stable-descriptor pair form required by project rules) rather than a bare number alone; when a non-conforming form is found, add the stable descriptor before moving on (for skill development this includes new Step N / sub-step pointers in SKILL.md and references/*.md);
- (viii) closed-list / sibling-set stale-value sweep: when the diff edits one or more entries of a closed list or a set of parallel sibling references (a decision-mapping table row, a numbered list with sibling rows, mirrored cross-references), re-read the entire list / sibling set and confirm no entry was left with a stale value inconsistent with its updated neighbors; grep the remaining entries for the old token before moving on (for skill development this includes decision-mapping rows, return-contract enum value lists, and mirrored step-number references where one token has been renamed or updated);
- (ix) block placement hierarchy: when adding a new block (test case, list item, section, describe/context block, or any other structural element) to an existing file, verify whether its placement nests it inside a parent element's scope; if it is nested, confirm that the parent element's setup / preconditions / before-hooks are intentionally applicable to the new block; if they are not (e.g. the new test exercises a different fixture context, the new section belongs at the top level), relocate the block outside the unintended parent before proceeding (for general software development this includes test suites where a new it or test block inside a describe or context block inherits setup that changes its preconditions; for skill development this includes a new SKILL.md section inadvertently nested under a subsection heading that scopes it differently than intended).

The reviewer / tidy subagents see only the diff and cannot enforce this self-audit, so it must run in the main thread at Step 5; late-stage scaffolding correctness gaps surfacing first at Step 8 Code Review iter 1 indicate this audit was skipped.
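Sub-check (v), the Markdown blank-line scan, lends itself to a mechanical first pass. The following is a deliberately conservative heuristic, not a Markdown parser: `classify` and `missing_blank_lines` are hypothetical helpers, indented lines are assumed to be continuations of the preceding block, and every flagged line still needs a human look.

```python
import re
from typing import List, Optional

_LIST_ITEM = re.compile(r"^\s*(?:[-*+]|\d+[.)])\s+")
_HEADING = re.compile(r"^#{1,6}\s")


def classify(line: str) -> str:
    stripped = line.lstrip()
    if stripped.startswith("```"):
        return "fence"
    if _HEADING.match(stripped):
        return "heading"
    if stripped.startswith(">"):
        return "quote"
    if _LIST_ITEM.match(line):
        return "list"
    return "para"


def missing_blank_lines(text: str) -> List[int]:
    """Return 1-based numbers of lines that start a new block kind with no
    blank line separating them from the previous block."""
    gaps: List[int] = []
    prev_kind: Optional[str] = None  # None at the start or after a blank line
    in_fence = False
    for number, line in enumerate(text.splitlines(), start=1):
        if in_fence:
            if line.lstrip().startswith("```"):
                in_fence = False
                prev_kind = "fence"
            continue  # fence contents are never inspected
        if not line.strip():
            prev_kind = None
            continue
        if line[0].isspace() and not _LIST_ITEM.match(line):
            continue  # assumed continuation of the previous block
        kind = classify(line)
        if prev_kind is not None and kind != prev_kind:
            gaps.append(number)
        if kind == "fence":
            in_fence = True
        prev_kind = kind
    return gaps
```

Run over only the newly added or changed region of an edited `.md` file, each reported line number marks where a blank line would be inserted in the same Edit call.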
Final-pass literal-value full-repo grep: at Step 5 completion (after all planned edits are applied and before advancing to Step 6 Tidy), for each literal value the plan replaced or introduced — numeric constants (threshold values, version numbers, magic numbers), token strings (status enum values, config keys, identifiers), file path fragments, or any other literal whose semantics are tied to a specific value — grep the entire repository (not just the plan's enumerated sweep targets) for the old value and confirm zero hits. The Plan's Test plan typically enumerates known sweep sites, but narrative examples embedded in prose (illustrative numbers in descriptive text, story-style usage examples in SKILL.md / references/*.md / README files) routinely sit outside the enumerated list and silently retain the old value through mechanical search-and-replace passes. Multi-stage structure: (i) enumerated sites — the explicit list from the Plan's Test plan, verified one by one; (ii) final-pass full-repo grep — git ls-files | xargs grep -l <old-value> (or equivalent) for any residual hits outside the enumerated list, with each hit reviewed in context and either updated (if it carries the old value's semantics) or marked as out-of-scope (e.g. a different concept that coincidentally uses the same literal); (iii) alias and derived-form sweep — for rename and migration tasks, additionally grep for mechanically-derivable aliases and derived forms of the old value (abbreviated forms, synonyms, or alternate identifiers the codebase uses interchangeably to refer to the same concept); when the derived-form set can be enumerated upfront, list and grep each form before declaring the sweep complete; grepping only the exact old value misses same-concept usages expressed under an alternate spelling (for skill development this includes aliased import names, short-form identifiers referenced in SKILL.md narrative examples, and config-key abbreviations). 
The reviewer / tidy subagents see only the diff and cannot enforce the full-repo sweep, so it runs as a Step 5 completion gate (for skill development this includes literal numeric thresholds cited in references/*.md narrative examples, version strings in README usage snippets, and example values in compaction / extract-rules-style descriptive prose). Authoritative-tool cross-check for load-bearing enumeration claims: when the completeness of the enumeration is mechanically verifiable by a downstream authoritative tool — a compiler or type checker reporting all affected call sites for a changed type or interface, a language server returning all references for a renamed symbol (for skill development, when no compiler or language server is available, the authoritative check is a structured manifest audit: e.g. jq against marketplace.json to enumerate all skills array entries affected by a renamed skill, or a targeted Grep scoped to SKILL.md / plugin.json / marketplace.json for all hook-firing paths or all subagent dispatch routes that a configuration change affects — when no such structured verification is available, the grep-only pass is sufficient) — cross-check the grep results against that tool's output; if the tool reports additional sites that grep missed, treat those as unresolved hits and apply the same two-option disposition defined above for the full-repo grep pass (update if it carries the old value's semantics; mark out-of-scope if it coincidentally uses the same literal for a different concept) — and complete this resolution before presenting the enumeration as confirmed. Search filter prefix/anchor errors can silently drop matches with no error signal; authoritative-tool verification catches these missed cases. When both grep and an authoritative tool are available, treat the tool output as the oracle. Non-literal-replacement tasks skip items (i)–(iii). 
(iv) Coordinated prose-invariant multi-surface sweep: when the planned change updates a design rule, behavioral constraint, or documented invariant expressed textually across multiple documentation surfaces rather than as a single swappable literal, enumerate all known surfaces where the invariant appears — SKILL.md sub-step paragraphs, references/*.md table rows, references/*.md prose body, README examples — as a closed list before starting edits; after all edits land, grep for the old description's key phrases and synonyms across the enumerated surfaces; for each hit, apply the same two-option disposition as the full-repo grep pass — update the surface if it still carries the old invariant's semantics, or mark it out-of-scope with a one-line rationale if it coincidentally shares a phrase but describes a different concept. Single-file grep misses derivative forms in reference tables and parallel sub-step descriptions (for skill development this includes design-rule updates that propagate across SKILL.md sub-steps, references/*.md table rows, and prose paragraphs — a pattern that surfaces repeatedly as cross-iteration missed-site findings in Step 8 Code Review).
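Stages (ii) and (iii) of the sweep above can be sketched as a single scan over the old value and its enumerated derived forms. This is a minimal illustration under stated assumptions, not the workflow's own tooling: `find_residual_hits`, the in-memory `files` mapping (standing in for `git ls-files` plus file reads), and the sample rename are all hypothetical.

```python
from typing import Dict, List, Tuple


def find_residual_hits(
    files: Dict[str, str], old_forms: List[str]
) -> List[Tuple[str, int, str]]:
    """Return (path, line_number, matched_form) for every residual hit.

    `files` maps a repo-relative path to its text. In a real run it would be
    built from `git ls-files` (assumption: this sketch skips that step).
    `old_forms` holds the exact old value plus any enumerated derived forms.
    """
    hits = []
    for path in sorted(files):
        for lineno, line in enumerate(files[path].splitlines(), start=1):
            for form in old_forms:
                if form in line:
                    hits.append((path, lineno, form))
    return hits


# Hypothetical repo state after renaming `max_retries` to `retry_limit`.
repo = {
    "SKILL.md": "Set retry_limit to 3.\n",
    "README.md": "Example: max_retries: 5\nSee also maxRetries in the API.\n",
}
residual = find_residual_hits(repo, ["max_retries", "maxRetries"])
for path, lineno, form in residual:
    print(f"{path}:{lineno}: still mentions {form}")
```

Each reported hit still needs the in-context review described above: update it if it carries the old value's semantics, or mark it out-of-scope if the literal is coincidental.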
Pre-write path scope check (Write / Edit / new-file path safety): before every Write / Edit / similar file-creation tool call whose file_path argument does not match a path that already exists in git ls-files output (typically: new files generated by Step 5 / Plan rewrite / staging document creation / new test fixtures / new CHANGELOG entries — file paths that the tool will create rather than modify-in-place), run a two-stage path verification before issuing the tool call: (i) repo-root containment — verify the absolute resolved path sits under git rev-parse --show-toplevel (no ../ escape from the working directory, no absolute path leading outside the repo); (ii) prefix sanity — verify the path's leading directory matches an expected location for its content class (.claude/plans/ for plan documents, skills/<name>/ for skill content, src/ or tests/ or equivalent for code, .triage/ or tmp/ for staging, etc.). If either check fails, abort the tool call with a fail-loud diagnostic naming the resolved path and the expected prefix set, rather than silently creating the file. The allowed-tools permission grant alone does not prevent parent-directory landing (Write accepts any string file_path), so a procedural pre-check is the only structural defense against typo-induced orphaned files (for general software development this includes accidental migration / config / test-fixture writes landing one directory up; for skill development this includes .claude/plans/<slug>.md typos depositing files at ../<slug>.md, marketplace.json paired-bump operations writing to the wrong manifest, or staging documents landing outside .triage/ / .claude/). If a tool call has already created a file in the wrong location, instruct the user to delete it manually — the workflow's auto-mode classifier cannot reach files outside the project scope, so manual cleanup is the only path.
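The two-stage verification above can be sketched as a pure path check. This is a hedged illustration: `check_write_path` and its arguments are hypothetical names, the repo root would normally come from `git rev-parse --show-toplevel`, and the check is lexical only (it does not resolve symlinks, which a stricter version would).

```python
import os
from typing import Sequence


def check_write_path(
    repo_root: str, candidate: str, expected_prefixes: Sequence[str]
) -> str:
    """Return the repo-relative path if both checks pass, else raise ValueError."""
    root = os.path.normpath(repo_root)
    # An absolute candidate replaces the root in join(); normpath collapses "..".
    resolved = os.path.normpath(os.path.join(root, candidate))

    # (i) repo-root containment: the resolved path must sit under the root.
    if os.path.commonpath([root, resolved]) != root:
        raise ValueError(
            f"path escapes repo root: {resolved} (expected under {root})"
        )

    # (ii) prefix sanity: the leading directory must match the content class.
    rel = os.path.relpath(resolved, root).replace(os.sep, "/")
    if not any(rel.startswith(prefix) for prefix in expected_prefixes):
        raise ValueError(
            f"unexpected location: {rel} (expected one of {list(expected_prefixes)})"
        )
    return rel


# Hypothetical prefix set for a plan document.
print(check_write_path("/repo", ".claude/plans/my-task.md", [".claude/plans/"]))
```

A typo such as `../my-task.md` fails stage (i), and a path like `notes/my-task.md` fails stage (ii); both raise with a diagnostic naming the resolved path and the expected location.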
User-observable artifact protection gate at probe → real-implementation boundary: when the Plan explicitly stages an implementation as probe / intermediate-artifact → real-implementation replacement (e.g. "first emit a debug-instrumented version for user to observe, then replace with the production implementation", "scaffold a placeholder file the user will manually inspect, then overwrite with the final content", "log expected probe output as a verification step, then remove the logging"), do not advance to the real-implementation step until the user has signaled observation completion. The probe-output observation gate is the only user-side wait state permitted inside Step 5 — every other Step 5 sub-step proceeds autonomously per § No-Stall Principle. When the probe is committed to disk and the user has not yet acknowledged observation, hold the workflow at this boundary and emit a one-line wait prompt in the resolved language (e.g. Probe artifact deployed at <path> — please observe its output before the workflow replaces it with the final implementation. Reply when ready to proceed.). Resume the real-implementation step on any non-empty user reply. When no probe → real sequence is in the Plan (typical case — purely incremental implementation), this gate does not fire (for general software development this includes debug-log-instrumented scaffolds replaced by clean production versions, mock-data fixtures replaced by real-data fetches; for skill development this includes verbose-tracing skill versions replaced by streamlined final versions). The gate exists to prevent the probe artifact from being silently overwritten before the user has had a chance to inspect it — a failure mode the No-Stall Principle's autonomy guarantee otherwise creates.

AskUserQuestion option design (applies to the probe gate above and any future user-state-query call in this workflow): when the workflow uses AskUserQuestion (or any equivalent multi-option user-query tool) to query the user about a plan-derived state — probe-execution outcome, manual-verification result, environment-prerequisite check, or any equivalent state confirmation — the options list MUST include a meta-confusion branch alongside the result enumeration. Concretely, do not present only outcome categories (e.g. success / failure / skipped); also include an option phrased as "the procedure / expected outcome is not yet understood (please re-explain)" in the resolved language (e.g. language: ja: 手順 / 期待結果がまだ把握できていない（要再説明）; language: en: the procedure / expected outcome is not yet clear (please re-explain)). The meta-confusion branch absorbs the "I cannot answer the question as posed" state — without it, the user is forced into Other free-text and the workflow consumes an extra clarification turn re-explaining what was already in the Plan. General principle: user-state queries enumerate outcomes AND leave a fallback for the premise-not-conveyed case, never outcomes alone (for general software development this includes deployment-readiness queries, migration-completion confirmations, external-system-state checks; for skill development this includes probe-result queries inside this Step 5 gate, callee-execution-outcome confirmations, manual-config-applied verifications).
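The option-design rule above amounts to always appending one fallback entry to the outcome list. A minimal sketch, assuming a hypothetical `build_options` helper; the two option strings are the ones quoted above, and falling back to English for an unlisted language is an assumption of this sketch, not something the workflow specifies.

```python
from typing import List

# Meta-confusion option text per resolved language (strings quoted from the rule above).
META_CONFUSION = {
    "en": "the procedure / expected outcome is not yet clear (please re-explain)",
    "ja": "手順 / 期待結果がまだ把握できていない（要再説明）",
}


def build_options(outcomes: List[str], language: str) -> List[str]:
    """Return the outcome options plus the meta-confusion branch.

    Assumption: an unlisted language falls back to the English text.
    """
    fallback = META_CONFUSION.get(language, META_CONFUSION["en"])
    return list(outcomes) + [fallback]


options = build_options(["success", "failure", "skipped"], "en")
for option in options:
    print("-", option)
```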
Derived-value claim deferral: when deliverable prose embeds a value derived from content that later phases can still change — a size claim about a generated artifact, an item or step count, or any other body-derived figure (for skill development this includes char-count claims about SKILL.md / references/*.md and step-count mentions in CHANGELOG entries or descriptive prose) — do not finalize that value during Step 5. Keep it as a clearly-marked provisional value (e.g. render the figure as <provisional — finalized at Step 10 entry> so the placeholder is grep-able at the application point) and compute + write the final figure exactly once at the last gate where the source content is settled — the plan-deferred bookkeeping application point at Step 10 entry (the deferred-bookkeeping paragraph at the top of references/interactive-commits.md, applied before its § Collect changes step collects the working tree), after Step 6 Tidy, Step 7.5 fixes, Step 8 review fixes, and any Step 9 hooks.on_complete working-tree modifications have all landed. Re-verifying and re-correcting the figure after every downstream phase that touches the body is the anti-pattern this item forbids — each chase is an avoidable rework turn. When interactive_commits: false (Step 10 is omitted and execution proceeds directly from Step 9 to Step 11 — see § Step 10: Interactive Commits), the Step 10 entry gate never occurs: finalize the figure at the same settledness point — immediately after Step 9 completes or is skipped, before proceeding to Step 11 — so the provisional marker never survives into the final tree.
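The deferral above can be sketched as a grep-able placeholder that is replaced exactly once, after the body is settled. The helper names below are hypothetical, and the pattern matches any marker that starts with `<provisional`, so it tolerates a longer marker text inside the angle brackets.

```python
import re

# Matches "<provisional>" and longer variants such as "<provisional: ...>".
PROVISIONAL = re.compile(r"<provisional\b[^>]*>")


def has_provisional(text: str) -> bool:
    """True while any provisional marker survives in the text."""
    return PROVISIONAL.search(text) is not None


def finalize(text: str, value: str) -> str:
    """Replace every provisional marker with the final figure."""
    # A function replacement avoids backslash interpretation in `value`.
    return PROVISIONAL.sub(lambda _match: value, text)


body = "x" * 120  # stands in for the settled deliverable body
draft = "The body is <provisional> characters long."
final = finalize(draft, str(len(body)))
print(final)
```

Running `has_provisional` over the final tree before Step 11 is one way to confirm that no marker survived.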
Implementation diff snapshot: at the conclusion of Step 5 (after all planned edits are applied and the derived-value deferral sub-step above completes), run git diff <base-commit> --name-only and store the result as implementation_diff_paths — the set of tracked paths changed by this task's implementation, recorded before any post-implementation review hook or automated fix tool runs. This snapshot is consumed by § Step 10's "Post-hook attribution check" paragraph to identify on-disk changes introduced during the review-hook phase (Steps 6–9) that no review hook claimed responsibility for.
Side-effecting external-tool launch warning: when a Step 5 tool call launches an external process or tool whose execution has a side effect observable outside the sandbox (opening an actual browser window, sending an external network request, starting a long-lived background process — for skill development this includes launching an external CLI in the background to verify its actual behavior against the plan), warn the user in the resolved language before the first such launch and assign each launch a distinguishable identifier (e.g. a timestamp or sequence tag) reported alongside the warning, so a real-world artifact the launch produces can be attributed to the correct launch afterward without guessing. Ordinary sandboxed tool calls with no external-facing effect (Read / Edit / local Bash commands, or an Agent subagent dispatch that itself has no external-facing side effect) are unaffected.

Step 6: Tidy

Implementation often introduces unnecessary complexity that's easier to spot in a dedicated pass after the code works.

Difficulty exception (difficulty-skip matrix). When Step 2 marked Step 6: Tidy completed under the difficulty-skip matrix (Trivial or Simple tier — see Step 2's Adjust N by difficulty), the row is already completed: do not re-mark it in_progress; proceed directly to Step 6.5. The Phase-boundary self-audit (§ Step 1 registration mechanics) treats this pre-completed row as the intended skip exactly as it does the Trivial Step 3 / Step 8 skips, not an unrun-step bug.

The Step 6 cleanup callee is resolved per the Cleanup skill bullet in § Prerequisites (built-in simplify preferred, bundled tidy as fallback). The phase is named "Tidy" after that in-house fallback skill; when simplify is available it — not tidy — is the primary callee.

Cross-layer review handoff ledger. Step 6 (cleanup), Step 6.5 (prose-only cleanup), Step 7.5 (rules-review), Step 8 (code review), and any review-class hooks.on_complete entries (an entry is review-class when it is a Skill(<name>) entry whose skill reviews or inspects the change and reports findings — judge semantically from the skill's name and purpose; plain shell-command entries are never review-class and receive no ledger) run sequentially against the same deliverable but share no state by default — without a handoff, the same structural concern is re-raised and re-judged independently by each layer, and a finding one layer deferred or applied only partially resurfaces later as scattered per-site fixes. From this step onward, keep a lightweight in-memory ledger of each review layer's dispositions: findings deferred (with the reason), findings applied (with the sites covered), and known leftover sites or residual concerns. Include the ledger as a short context item in each subsequent review layer's dispatch payload (the rules-review dispatch, whether Step 7's background launch or its Step 7.5 sequential fallback; the Step 8 review payload, where it complements that payload's same-layer continuation item; and review-class hooks.on_complete callees). When the ledger has no recorded dispositions yet (no prior layer deferred, applied, or left anything over), omit the ledger item from that payload entirely — do not render an empty placeholder. When a later layer re-surfaces a concern the ledger records as deferred or partially applied, resolve it once: sweep all remaining sibling sites in one pass when they are enumerable and within this task's scope; otherwise (sites outside this task's scope, or a sweep too large for this run) record the leftover explicitly in the plan's Risks — do not let each layer independently re-apply the same structural fix to a different subset of sites.
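One possible shape for the in-memory ledger described above, shown as a hedged sketch: the `ReviewLedger` class, its field names, and the rendered text layout are illustrative assumptions. The one behavior taken directly from the paragraph is that an empty ledger renders to nothing, so the payload item is omitted rather than sent as an empty placeholder.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ReviewLedger:
    deferred: List[str] = field(default_factory=list)  # finding + reason
    applied: List[str] = field(default_factory=list)   # finding + sites covered
    leftover: List[str] = field(default_factory=list)  # known residual sites

    def render(self) -> Optional[str]:
        """Return the payload context item, or None when nothing is recorded."""
        sections = [
            ("Deferred", self.deferred),
            ("Applied", self.applied),
            ("Leftover", self.leftover),
        ]
        lines: List[str] = []
        for title, items in sections:
            if items:
                lines.append(f"{title}:")
                lines.extend(f"- {item}" for item in items)
        return "\n".join(lines) if lines else None


ledger = ReviewLedger()
print(ledger.render())  # None: omit the ledger item from the payload
ledger.deferred.append("Step 6: duplicate helper in utils (outside task scope)")
print(ledger.render())
```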

Pre-dispatch rename-sweep self-audit: if the Implement diff (since <base-commit> recorded in Step 2) includes a term-rename operation — a search-and-replace across the project that swapped a step name, callee name, config key, identifier, or domain concept for a new one — sweep the changed-path SKILL.md / references/*.md / README prose for synonyms and derived forms of the rename target before dispatching the Step 6 cleanup skill (the resolved simplify or tidy), and fix any residue inline. General principle: mechanical search-and-replace leaves synonym / derived-form residue that the substitution alone cannot catch — gerund forms when a verb is renamed, nominalizations and related-noun forms when an action is renamed, conceptual paraphrases of the original term in surrounding description text when a step or concept is renamed (for skill development this includes renaming a procedural verb leaving its -ing form in description prose, renaming a step leaving the prior step-concept paraphrase in cross-section reference text, or renaming a callee leaving the old concept noun in doc-comment / SKILL.md narrative). Detect at this Step 6 so the Completion-time integrity check (Step 8 reviewer / hooks.on_complete) remains a backstop rather than the primary detection point. Non-rename diffs skip this audit.
Dispatch the cleanup skill (resolved per § Prerequisites' Cleanup skill bullet): review changed code for reuse, quality, and efficiency, then apply cleanup edits.
- Primary — Skill(simplify): invoke Skill(simplify). Its argument interface is unverified (built-in skill, no on-disk SKILL.md), so do not assume any tidy-specific field — pass no scope argument (Base ref / --base-commit; simplify auto-scopes to the changed working-tree code), and when custom_instructions is set, pass it only as a short best-effort natural-language hint (simplify may ignore it; do not name a Custom instructions field it may not expose). Omit it entirely when custom_instructions is unset or empty (per § Step 2's Sub-skill natural-language argument minimalism note).
- Fallback — Skill(tidy): Do not pass Base ref / --base-commit <sha> — tidy's default working-tree mode is the intended scope here (covers tracked + staged + untracked changes per tidy's § Invocation contract); passing Base ref would switch tidy to committed-history mode and silently drop untracked files from the cleanup scope, even though sibling Steps (Step 7's test_commands, Step 7.5's Skill(rules-review)) invoke their callees with --base-commit <sha>. This Base ref asymmetry rationale is scoped to the tidy path only — the simplify path above passes no scope argument regardless. Pass the workflow's custom_instructions config value through tidy's natural-language Custom instructions field (omit the field entirely when custom_instructions is unset or empty — do not render (none) / empty string / fabricated default). General principles: (i) when a caller-skill dispatch field is driven by an optional config key, state the absent-key behavior inline on the dispatch line rather than relying on cross-reference to the config-parse step; (ii) when a caller depends on a callee's default-mode behavior for scope correctness and sibling steps use a different argument convention, name the asymmetry on the dispatch line as load-bearing rather than implicit — the executor cannot rely on a default-by-omission when sibling steps create an extrapolation pull toward the explicit form. subagent_model propagation: pass Model: <subagent_model> when the Step 2-resolved subagent_model is a model id (omit when inherit), which tidy applies to its per-iteration reviewer Agent dispatch (see tidy's § Invocation contract Model field). The built-in simplify primary path takes no model — built-in skills expose no argument contract — so the propagation is effective only on this tidy fallback path; the simplify→tidy resolution itself (§ Prerequisites' Cleanup skill bullet, the single source of truth) is unchanged. 
If Skill(tidy) also fails: see § Prerequisites' Cleanup skill bullet (single source of truth) — skip cleanup entirely and fold into sub-step 3 below.
Regardless of the outcome — whether the cleanup skill (simplify, or the tidy fallback) applied fixes, reported no actionable findings, returned any other non-error result, or (per the bullet above) was skipped entirely because both simplify and tidy were unavailable — mark Step 6: Tidy as completed and proceed to Step 6.5 automatically. Per the No-Stall Principle, do not wait for user input.
If the cleanup skill result is not observable (e.g. context compaction occurred during or immediately after the call): inspect git diff <base-commit>. If the diff contains changes clearly attributable to a cleanup pass, treat Step 6 as completed and proceed to Step 6.5. Otherwise (no cleanup-attributed changes visible, or ambiguous), re-execute the Step 6 cleanup skill once, then proceed to Step 6.5. Re-execution is safe because inspection-and-fix-class skills are idempotent. The callee is resolved as on the first attempt: the resolved simplify, or tidy if simplify is unavailable, or skip entirely per § Prerequisites' Cleanup skill bullet if both are unavailable. The re-execution is governed by the same per-call-site invariant as the first attempt (§ Step 1 sub-step 3's "Initialize the bundle-unavailability ledger here" bullet), so it does not append a duplicate record.
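The dispatch asymmetry between the two cleanup callees can be summarized in a small sketch. Everything here is illustrative: `build_cleanup_dispatch`, the `available` set, and the dictionary representation of a call are hypothetical, and the `hint` key merely stands for the best-effort natural-language hint passed to simplify. What the sketch does take from the text: neither path passes a Base ref, and only the tidy path carries a Custom instructions field or a Model.

```python
from typing import Dict, Optional, Set


def build_cleanup_dispatch(
    available: Set[str], custom_instructions: Optional[str], subagent_model: str
) -> Optional[Dict[str, str]]:
    """Return the call description for Step 6, or None when cleanup is skipped."""
    hint = (custom_instructions or "").strip()
    if "simplify" in available:
        call = {"skill": "simplify"}  # no scope argument, no model
        if hint:
            call["hint"] = hint  # best-effort only; simplify may ignore it
        return call
    if "tidy" in available:
        call = {"skill": "tidy"}  # default working-tree mode: no Base ref
        if hint:
            call["Custom instructions"] = hint  # omitted entirely when unset
        if subagent_model != "inherit":
            call["Model"] = subagent_model
        return call
    return None  # both unavailable: skip cleanup


print(build_cleanup_dispatch({"simplify", "tidy"}, None, "some-model-id"))
print(build_cleanup_dispatch({"tidy"}, "keep comments", "some-model-id"))
```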

Step 6.5: Polish Prose

A dedicated pass that refines the resolved-language explanation prose (comments, test / example descriptions, docstrings, user-facing strings) in the changed files into concise, natural native text via Skill(prose-polish) in file mode. It runs after Step 6 Tidy (Tidy's comment deletions land first, so prose-polish refines only the survivors) and before Step 7 (so Step 7 validates the polished result).

Difficulty exception (pre-completed row, two independent causes). When Step 6.5's row is already completed on entry, it is for one of two reasons — do not re-mark it in_progress; proceed directly to Step 7 either way: (a) the difficulty-skip matrix (Trivial or Simple tier, coupled with Step 6 Tidy and Step 7.5 in that case; see Step 2's Adjust N by difficulty), or (b) --fast's independent Step 6.5-only skip (Step 6 Tidy still runs in this case — see Step 2's Adjust N by difficulty). The Phase-boundary self-audit (§ Step 1 registration mechanics) treats this pre-completed row as the intended skip under either cause exactly as it does the Trivial Step 3 / Step 8 skips, not an unrun-step bug.

polish_prose gate. When polish_prose is explicitly set to false, Step 6.5 does not run: mark Step 6.5: Polish Prose completed, emit the one-line note below in the resolved language, and proceed to Step 7. Only an explicit false opts out; the default (true) and a non-boolean value (which falls back to true) both run the step (see § Configuration's polish_prose bullet). This guard is a no-op when the row is already completed: on Trivial / Simple the difficulty exception above already owns the skip and has proceeded to Step 7, so the polish_prose note is not emitted there; the same applies when --fast's Step 6.5-only skip pre-completed the row on a Moderate / Complex tier. The gate's own note fires only when the row is still in_progress on arrival. Otherwise (polish_prose resolves to true), run sub-steps 1–4 below.

language: ja: Step 6.5（Polish Prose）を skip しました — `polish_prose: false`（opt-out）が設定されています
language: en: Step 6.5 (Polish Prose) skipped — `polish_prose: false` (opt-out) is set
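The gate's decision logic is small enough to sketch directly. The config dictionary, the row-status strings, and the function names below are hypothetical representations chosen for illustration; the behavior mirrors the gate as described, where only an explicit boolean false opts out and a pre-completed row makes the gate a no-op.

```python
def polish_prose_enabled(config: dict) -> bool:
    """Only an explicit boolean False opts out; anything else runs the step."""
    return config.get("polish_prose", True) is not False


def gate_action(config: dict, row_status: str) -> str:
    """Return what the gate does on arrival at Step 6.5."""
    if row_status == "completed":
        return "noop"            # a prior skip already owns this row; no note
    if not polish_prose_enabled(config):
        return "skip_with_note"  # mark completed, emit the one-line note
    return "run"                 # proceed to sub-steps 1 to 4


print(gate_action({"polish_prose": False}, "in_progress"))
print(gate_action({"polish_prose": "no"}, "in_progress"))
print(gate_action({"polish_prose": False}, "completed"))
```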

Collect the changed-file set. Tracked: git diff <base-commit> --name-only (base-commit from Step 2). Untracked new files: git status --porcelain=v1 --untracked-files=all -z (the =v1 format pin and -z NUL-separation suppress C-quoting). Union the two, then subtract the § Workflow artifacts (cross-step fixed exclusion) set. Compute this fresh rather than reusing Step 5's implementation_diff_paths — that snapshot is tracked-only and would miss the untracked new files this pass should polish. This is the same in-scope changed-file set Step 6 Tidy / Step 7.5 operate on. Empty-set guard: if the set is empty after the exclusion (every change is a § Workflow artifact, or nothing changed), skip the sub-step 2 dispatch — mark Step 6.5: Polish Prose completed and proceed to Step 7 (nothing to polish; prose-polish is never invoked with an empty File:, so the incomplete args error cannot arise). Scope-awareness filter: after the empty-set guard, apply a per-file scope judgment — exclude any file where the current-run's changed lines are a small fraction of the file's total lines. Operationalize with git diff <base-commit> --stat (sum of inserted + deleted lines for the file, from the --stat output) and wc -l <file> (total lines): exclude the file when changed lines ÷ total lines < 10% AND total lines > 100. Principle: prose-polish refines prose introduced by this change, not entire pre-existing handwritten documents where this run contributed only a minor portion. Large pre-existing documents with minor current-run additions are the primary case to exclude. If the narrowed set is empty, the empty-set guard above applies (no dispatch).
Dispatch Skill(prose-polish) in file mode: pass File: = the collected paths (relative, one per line or comma-separated), Language: = the resolved language (only prose in that language is rewritten), and no Model: (see § Configuration's subagent_model bullet for why subagent_model is deliberately not propagated). prose-polish applies its edits in place and returns a single fenced JSON verdict; the main thread does not re-apply.
Judge the verdict and proceed (§ No-Stall Principle): parse status. done / no-change are both success (no-change — no resolved-language prose needed refining — is a normal outcome, no note) → mark Step 6.5: Polish Prose completed and proceed to Step 7. status: "error" (any reason; see prose-polish's ## Return contract) → emit a one-line note, leave the prose un-polished, mark completed, and proceed to Step 7 without retry (a returned error verdict is deterministic; prose-polish ran, so nothing is appended to bundle_skills_unavailable — an error status is not an availability signal). A Skill() call failure (no verdict returned, or the skill unavailable) is the distinct availability signal — the skill itself is unreachable: retry once, then emit a one-line note, skip, append prose-polish unavailable (Step 6.5) to bundle_skills_unavailable (§ Step 1 sub-step 3's "Initialize the bundle-unavailability ledger here" bullet), and proceed to Step 7.
Runs once. Like Step 6 Tidy, Step 6.5 runs a single time before Step 7; the Step 8 fix-rerun loop re-runs Step 7 and Step 7.5 only, not Step 6.5. Prose added during later fix phases is covered by Step 8's "Natural-language quality self-check (post-fix)".
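The scope-awareness filter in the collect sub-step reduces to one comparison per file. A minimal sketch, assuming the per-file counts have already been gathered (changed lines from `git diff <base-commit> --stat`, total lines from `wc -l`); the function names and the sample figures are hypothetical. Integer arithmetic is used so the 10% boundary is exact.

```python
from typing import Dict, List, Tuple


def should_polish(changed_lines: int, total_lines: int) -> bool:
    """Exclude only when changed/total < 10% AND total > 100."""
    is_minor_share = changed_lines * 10 < total_lines
    is_large_file = total_lines > 100
    return not (is_minor_share and is_large_file)


def narrow(files: Dict[str, Tuple[int, int]]) -> List[str]:
    """Keep the paths whose (changed, total) counts pass the filter."""
    return [
        path
        for path, (changed, total) in sorted(files.items())
        if should_polish(changed, total)
    ]


# Hypothetical counts: path -> (changed lines, total lines)
counts = {
    "README.md": (5, 400),      # 1.25% of a large file: excluded
    "src/new_module.py": (60, 60),
    "docs/notes.md": (5, 40),   # small file: kept despite a minor share
}
print(narrow(counts))
```

If `narrow` returns an empty list, the empty-set guard applies and no dispatch is made.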

Return-point no-stall reminder: after Skill(prose-polish) returns, regardless of outcome (done / no-change / an error verdict / a call-failure skip, i.e. any non-error-stop result), issue Step 7's first tool call immediately as the next tool call. Do not insert an interstitial summary or acknowledgment turn; the abstract enumeration in § No-Stall Principle is intentionally duplicated here so the rule fires at the decision moment.
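The verdict handling for Step 6.5 distinguishes a returned error verdict from a call failure, and every branch except the single retry ends by proceeding to Step 7. The sketch below is illustrative only: the function name, the action strings, and the use of `None` to represent a failed `Skill()` call are assumptions, and raising on an unrecognized status is this sketch's choice because the text above does not define that case.

```python
from typing import Optional


def judge_prose_polish(verdict: Optional[dict], already_retried: bool) -> str:
    """Map a prose-polish outcome to the next action.

    `verdict` is the parsed JSON verdict, or None when the Skill() call
    itself failed (no verdict returned, or the skill is unavailable).
    """
    if verdict is None:
        # Availability signal: retry once, then skip and record in the ledger.
        return "skip_and_record_unavailable" if already_retried else "retry"
    status = verdict.get("status")
    if status in ("done", "no-change"):
        return "proceed"            # success, no note
    if status == "error":
        return "proceed_with_note"  # deterministic error: no retry, no ledger entry
    raise ValueError(f"unrecognized status: {status!r}")


print(judge_prose_polish({"status": "no-change"}, already_retried=False))
print(judge_prose_polish(None, already_retried=False))
print(judge_prose_polish(None, already_retried=True))
```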

Step 7: Check / Test (max 3 retries)

Run check_commands in order (always run all)
- On failure, fix and retry (do not proceed to test execution)
- Pre-execution scope-narrowing: before running each check command, assess whether it is a repo-wide auto-fix tool — a command that writes to files across the repository regardless of which files are in the task scope (e.g. a project-wide formatter, linter with --fix / --write, or bulk document transformer). If the command is a repo-wide auto-fix tool and the working tree contains files changed outside the task-scope snapshot (unrelated existing changes), narrow the command's scope to the task-scope snapshot files before running (e.g. pass the snapshot file list as explicit path arguments if the tool supports it). If scope narrowing is not feasible given the tool's interface, stop and ask the user for direction before running the command — options: run the command accepting the full-width effect, skip the command, or provide an alternative scoped invocation. The Scope-drift guard below is the second safety net for cases where pre-execution assessment is not feasible.
- Scope-drift guard: before each command, record git diff --name-only <base-commit> as the task-scope snapshot (the file set scoped to this task at the start of Step 7). After the command, re-check: any file newly appearing outside that snapshot was written by the command (auto-fix/write behavior sweeping unrelated drift). If scope drift is detected, classify the out-of-scope changes before acting. Proceed automatically, without a user-direction stop, only when all of the following hold: (i) the out-of-scope diff is whitespace or comment changes only (no code-skeleton changes: no non-blank, non-comment lines added or removed), (ii) the total changed line count across all out-of-scope files is ≤ 5, and (iii) the changes are attributable to the formatter or linter that just ran (the command is a known formatter/linter, e.g. lint:fix, format, prettier, black). In that case, emit a one-line note (e.g. Scope-drift note: <file>(s) received whitespace-only formatting from <command> — proceeding) and continue to the next command. Otherwise (non-trivial drift): warn the user (list both the in-scope files and the newly-appeared out-of-scope files), do not auto-revert / git checkout / delete the out-of-scope changes (leave the working tree as the command left it for user inspection), leave Step 7: Check / Test as in_progress, and wait for user direction. This is a step-internal stop directive, one of two allowed non-completing exits from the check_commands phase (the other being the pre-execution scope-narrowing infeasibility stop above), and is consistent with the No-Stall Principle, which permits explicit step-defined stops.
- Pre-existing vs regression discrimination (check_commands): before applying the bullet above's fix-and-retry action on a check-command failure, discriminate whether the failure is pre-existing or a regression — a check command (e.g. a linter) that reports against the full working tree rather than only the task-scope snapshot will also surface violations that predate this run. Apply the same regression/pre-existing discrimination as the test_commands loop's Pre-existing vs regression discrimination bullet below, including its git stash / git worktree add ../base-commit-check <base-commit> re-run technique for comparing against <base-commit> — with one check_commands-specific variant: when the command supports scoping to specific paths, apply that scoping to the base-commit re-run itself (invoke the command against <base-commit> restricted to the task-scope snapshot files) rather than running it against the entire base-commit worktree — the base-commit comparison is still required to discriminate pre-existing from regression, only the invocation surface is narrowed. Record pre-existing violations as an informational warning (pre-existing violation: <check-command> / <case> — out-of-scope for this PR) and do not fix or retry on their account.
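The scope-drift guard's classification in the list above can be sketched as a set difference followed by three checks. This is a hedged illustration: `FileDrift`, `classify_drift`, and the result strings are hypothetical, and the per-file figures (changed line count, whether the change is whitespace or comment only) are assumed to have been derived from the diff beforehand.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class FileDrift:
    changed_lines: int
    cosmetic_only: bool  # whitespace or comment changes only


def classify_drift(
    snapshot: Set[str],
    after: Set[str],
    drift: Dict[str, FileDrift],
    command_is_formatter: bool,
) -> str:
    """Classify files that appeared outside the task-scope snapshot."""
    new_paths = after - snapshot
    if not new_paths:
        return "no_drift"
    total = sum(drift[path].changed_lines for path in new_paths)
    cosmetic = all(drift[path].cosmetic_only for path in new_paths)
    # Auto-proceed only when all three conditions hold.
    if cosmetic and total <= 5 and command_is_formatter:
        return "proceed_with_note"
    return "stop_and_ask"  # leave the tree untouched, wait for the user


snapshot = {"src/app.py"}
after = {"src/app.py", "src/legacy.py"}
print(classify_drift(snapshot, after, {"src/legacy.py": FileDrift(2, True)}, True))
print(classify_drift(snapshot, after, {"src/legacy.py": FileDrift(12, True)}, True))
```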

Two read-only analyses can be launched concurrently here: the Step 7.5 Skill(rules-review) (below) and the Step 8 code review (see the Concurrent code review launch paragraph that follows). Both only return findings — the main thread applies any fixes later (rules-review fixes in Step 7.5, reviewer fixes in Step 8) — so overlapping their analysis with the test phase is a pure wall-clock optimization. test_commands is never backgrounded: a backgrounded callee must have an inline fallback for the nested-Agent-unavailable case, which rules-review (its SKILL.md § 5) and the default reviewer ask-peer (its SKILL.md § Process 1) both have, but run-tests does not. This extends to long-running test entries: do not attempt to offload a Skill(<name>) test command to a one-off background subagent to avoid context accumulation — a subagent that receives a minutes-long command will typically background it internally and return an empty verdict; the main-thread synchronous Skill() invocation is the reliable dispatch path regardless of test duration.

Concurrent rules-review launch (per pass). After check_commands pass and before running test_commands, optionally launch the Step 7.5 Skill(rules-review) concurrently so its read-only analysis overlaps the test phase. A pass here is a Step 7 entry that a Step 7.5 sub-step 1 collect will follow: the initial Step 7 entry, and each full Step 7 → Step 7.5 re-entry triggered by Step 8's post-fix re-run (the "Always re-run Step 7 and Step 7.5" bullet). The Step 7-only re-run inside Step 7.5's fix flow is not a pass: the rules-review call that follows it is the fix flow's direct 2nd-cycle invocation, which has no collect branch — a launch there would be an orphan dispatch with no collector (overlapping that path stays out of scope). This paragraph's pass definition is the single definition the bullets below, the Concurrent code review launch paragraph, the Step 7.5 collect, the Step 8 sub-step 1 collect, and Step 8's "Always re-run Step 7 and Step 7.5" bullet refer to.

Initialize tracking (at every Step 7 entry — pass or not — unconditionally before the availability branch, so the unavailable / skip / re-run paths never read an uninitialized variable): rules_review_launched = false and rules_review_stale = false. Re-initializing on a non-pass entry (the Step 7-only re-run inside Step 7.5's fix flow) is harmless: that entry occurs only after the current pass's Step 7.5 sub-step 1 collect has already consumed the result. Lifecycle: this bullet is the only init site; the only set sites are the If available bullet (rules_review_launched) and the discard bullet below (rules_review_stale); the skip / unavailable paths set neither; Step 7.5 sub-step 1 is the only read site.
Availability detection: inspect the current tool list the same way rules-review SKILL.md § 5 detects Agent availability — do not make a speculative call. The capability gated here is specifically background dispatch (Agent with run_in_background), not a bare foreground Agent. Positive criterion: background dispatch is available when the Agent tool is exposed (top-level or via ToolSearch) AND the session offers an Agent run_in_background parameter or equivalent async-dispatch mechanism — the common case in a standard interactive session, so default to parallel. Treat it as unavailable only when one of these two signals holds (closed list — if neither holds, choose parallel): (a) Agent is absent (this also covers this skill running inside a non-recursing subagent, surfacing as nested Agent being unavailable); (b) Agent is exposed but the session offers no background/detached dispatch capability (e.g. an older Claude Code). This two-item list is the single definition of "unavailable" the If unavailable branches below refer to. Do not treat "unsure" as "unavailable": if a background-dispatch capability is present, choose parallel.
If available (and this Step 7 entry is a pass per the definition above — note that on a Trivial or Simple task Step 2 pre-completes Step 7.5 under the difficulty-skip matrix, so no Step 7.5 sub-step 1 collect follows and the entry is not a pass; skip the launch, same orphan-avoidance as the code-review launch's N_code=0 handling): dispatch a background subagent (Agent with run_in_background: true, subagent_type: general-purpose, plus model: <subagent_model> when the Step 2-resolved subagent_model is a model id — omit model when it is inherit) instructed to run Skill(rules-review) --base-commit <sha> — including the cross-layer review handoff ledger as a short context item (per § Step 6's Cross-layer review handoff ledger paragraph; omit when the ledger has no recorded dispositions) — and return the findings report verbatim, applying no edits (the main thread applies fixes in Step 7.5). On a successful dispatch set rules_review_launched = true. Emit a Progress Visibility status line in the same turn. Include in the dispatch payload a note that nested Agent is unavailable in this subagent context — execute Skill(rules-review) directly without probing for sub-subagent availability (the § 5 inline fallback applies automatically; no runtime discovery is needed). Collect the report from the background Agent's completion notification (no extra tool needed).
If unavailable (per the Availability detection criterion above): skip the launch (rules_review_launched stays false) — Step 7.5 invokes Skill(rules-review) sequentially as before (fully backward-compatible).
If test_commands then fail and you fix them (the diff changes): discard both background results — the rules-review result (set rules_review_stale = true; Step 7.5 sub-step 1 then falls back to a fresh sequential Skill(rules-review) dispatch) and the code-review result (set code_review_stale = true; see the next paragraph) — since the prior analyses are now stale. No-op on the no-launch path: setting rules_review_stale = true when rules_review_launched == false has no effect — Step 7.5 sub-step 1's collect branch is gated on rules_review_launched == true, so the unconditional set is safe on every path. Handle disposition: a stale (or never-collected) background rules-review result is simply ignored at the Step 7.5 collect point — no explicit cancellation of the background subagent is owed.
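The availability check above reduces to a two-signal closed list. The following is a minimal sketch in Python, assuming a hypothetical SessionTools snapshot of the tool list; the class, its field names, and both function names are illustrative and are not part of the skill.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionTools:
    """Hypothetical snapshot of the current tool list (illustrative only)."""
    agent_exposed: bool        # Agent visible top-level or via ToolSearch
    background_dispatch: bool  # run_in_background or an equivalent async mechanism


def background_dispatch_available(tools: SessionTools) -> bool:
    # Signal (a): Agent is absent (also covers a non-recursing subagent context).
    if not tools.agent_exposed:
        return False
    # Signal (b): Agent is exposed but offers no background/detached dispatch.
    if not tools.background_dispatch:
        return False
    # Neither signal holds, so the closed list says: choose parallel.
    return True


def should_launch_rules_review(tools: SessionTools, is_pass: bool) -> bool:
    # The launch fires only on a pass; a non-pass entry would have no collector.
    return is_pass and background_dispatch_available(tools)
```

Because the list is closed, the sketch has no separate "unsure" outcome: any session exposing Agent with a background-dispatch capability resolves to parallel.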

Concurrent code review launch (per pass). In the same window — after check_commands pass and before running test_commands — optionally launch the Step 8 reviewer (the reviewer skill resolved in Step 1, e.g. Skill(ask-peer)) as a background subagent so its read-only analysis also overlaps the test phase. This is a sibling of the rules-review launch above and shares its mechanics — availability detection, dispatch shape, staleness vocabulary, and the pass definition (the rules-review paragraph above is the single home of that definition).

Initialize tracking (at every Step 7 entry — pass or not — unconditionally before the availability branch, so the unavailable / skip / re-run paths never read an uninitialized variable): code_review_launched = false and code_review_stale = false. Re-initializing on a non-pass entry (the Step 7-only re-run inside Step 7.5's fix flow) is harmless: it rewinds the code_review_stale = true that Step 7.5 sub-step 3.a just set, but code_review_launched is also reset to false, and the Step 8 collect branch requires code_review_launched == true — both states route Step 8 sub-step 1 to a fresh sequential dispatch, so the routing is equivalent. Lifecycle: this bullet is the only init site; the only set sites are the If available bullet (code_review_launched) and the staleness set-sites named in the Staleness bullet below (code_review_stale); the skip / unavailable paths set neither; Step 8 sub-step 1 is the only read site. Because every continuation path to a next Step 8 iteration item passes through a Step 7 re-entry (and therefore through this re-initialization), each pass's launch is collected at most once (the Step 8 sub-step 1 collect bullet names which iteration collects).
Availability detection: use the rules-review launch's detection above verbatim — its positive criterion (default to parallel in the common interactive case), its two-item closed list defining "unavailable", and its "do not treat 'unsure' as 'unavailable'" directive all apply here unchanged. The gated capability is background dispatch (Agent with run_in_background), not a bare foreground Agent.
If available (and this Step 7 entry is a pass per the shared definition): launch only when a pending Step 8 iteration item remains to collect the result after this pass — on the initial pass this means N_code ≥ 1 (iteration 1 collects; when N_code = 0 / Trivial, Step 8 is skipped entirely and no iteration item exists); on a re-run pass it holds only when the fix-applying iteration k satisfies k < N_code (a re-run triggered from the final iteration k = N_code leaves no pending iteration item, so a launch there would be an orphan dispatch with no collector — skip it; same orphan-avoidance vocabulary as the rules-review paragraph's non-pass rationale). Dispatch a background subagent (Agent with run_in_background: true, subagent_type: general-purpose, plus model: <subagent_model> when the Step 2-resolved subagent_model is a model id — omit model when it is inherit) instructed to run Skill(<reviewer>) with the same payload Step 8 sub-step 1 would compose for the next pending iteration item — sub-step 1's review-payload definition (including its rubric-link resolution note and, on a re-run pass, the continuation item and the iteration-scope instruction) is the single parametric source; do not restate its list here. Omitting the continuation item on a re-run pass would hand the reviewer a context-free diff and re-surface already-rejected findings. The reviewer returns its report verbatim, applying no edits. On a successful dispatch set code_review_launched = true. Emit a Progress Visibility status line in the same turn. Include in the dispatch payload a note that nested Agent is unavailable in this subagent context — execute Skill(<reviewer>) directly without probing for sub-subagent availability (the reviewer's inline fallback applies automatically; no runtime discovery is needed). The result is collected at Step 8 sub-step 1, not here. 
The Step 7-only re-run inside Step 7.5's fix flow is not a pass and does not re-fire this launch; the rules-review paragraph's orphan rationale does not transfer here (the code-review collect point is Step 8, where a collector exists even for that path) — overlapping that path simply stays out of scope.
If unavailable (per the Availability detection criterion above): skip the launch (code_review_launched stays false) — Step 8 dispatches the reviewer sequentially as before (fully backward-compatible).
Staleness — discard owned by Step 8 sub-step 1: this background result is speculative. The discard decision lives at Step 8 sub-step 1 (it reads code_review_stale); this paragraph only names the set sites. code_review_stale is set true whenever an edit lands between this pass's launch and the Step 8 collect point that changes the diff the reviewer analyzed: (i) a test_commands failure fix during Step 7 (the discard bullet above), or (ii) any fix Step 7.5 applies (see Step 7.5). Both set-site descriptions are pass-independent and apply to every pass unchanged. The condition is broader than the rules-review launch's (whose collect point is Step 7.5, before Step 7.5's own fixes land); the code-review launch's collect point is Step 8, after Step 7.5's fixes land, so Step 7.5 fixes also count. No-op on the no-launch path: when code_review_launched == false (the launch was skipped or unavailable), setting code_review_stale has no effect — Step 8 sub-step 1's collect branch is already gated on code_review_launched == true, so the no-launch path dispatches the reviewer fresh regardless of the flag's value; the unconditional code_review_stale = true set-sites are therefore safe to execute on every path. Handle disposition: a stale (or never-collected) background result is simply ignored at the Step 8 collect point — no explicit cancellation of the background subagent is owed.
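The launch gate and the two tracking flags described above can be sketched as follows. This is an illustrative Python model under stated assumptions, not the skill's implementation: CodeReviewFlags and every function name are hypothetical, and fix_iteration=None is an assumed encoding for "initial pass".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CodeReviewFlags:
    """Hypothetical holder for code_review_launched / code_review_stale."""
    launched: bool = False
    stale: bool = False


def enter_step7(flags: CodeReviewFlags) -> None:
    # Unconditional re-initialization at every Step 7 entry, pass or not.
    flags.launched = False
    flags.stale = False


def should_launch_code_review(available: bool, is_pass: bool, n_code: int,
                              fix_iteration: Optional[int] = None) -> bool:
    if not (available and is_pass):
        return False
    if fix_iteration is None:
        # Initial pass: iteration 1 collects, so at least one iteration must exist.
        return n_code >= 1
    # Re-run pass triggered from iteration k: launch only if iteration k+1 exists.
    return fix_iteration < n_code


def mark_diff_changed(flags: CodeReviewFlags) -> None:
    # Safe on every path: a no-op in effect when nothing was launched.
    flags.stale = True


def step8_collects_background(flags: CodeReviewFlags) -> bool:
    return flags.launched and not flags.stale
```

The sketch also shows why re-initializing on a non-pass entry is harmless: whether the flags read (launched, stale) after a Step 7.5 fix or (not launched, not stale) after the reset, step8_collects_background returns False and Step 8 dispatches fresh.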

After launching (or skipping) both, run test_commands in the main thread per sub-step 2 below; the background rules-review and (when launched) the code review proceed concurrently.

Iterate over test_commands in order. For each entry (which must be of the form Skill(<name>)), invoke that skill with --base-commit <sha> (from Step 2) via $ARGUMENTS. Each invocation must return a structured summary with one of three statuses (SUCCESS / TEST_FAILED / EXECUTION_ERROR); a TEST_FAILED or EXECUTION_ERROR from any entry halts the loop immediately and triggers the retry path in sub-step 3 — subsequent entries do not run on the failing pass.
- Each test skill handles scope decision and test execution internally via subagent (when applicable)
- Returns structured summary: SUCCESS / TEST_FAILED / EXECUTION_ERROR
- Skill() call failure (no structured summary returned): distinct from a returned EXECUTION_ERROR verdict — the invocation itself errored, timed out, or the test skill's own internal subagent dispatch never completed (e.g. the skill dispatches a nested Agent internally and that nested dispatch stalls). Retry once. If the retry also fails to return a structured summary, fall back to running the check directly in the main thread (bypassing the skill's own internal dispatch mechanism) as a substitute pass, and emit a one-line note in the resolved language naming the fallback (e.g. Step 7: <skill> dispatch failed twice — falling back to direct main-thread execution). This follows the same retry-once discipline used elsewhere for sub-skill call failures (e.g. Step 6.5's prose-polish Skill() call failure handling) — though Step 6.5 skips rather than substitutes a main-thread fallback, since prose-polish has no direct-execution equivalent.
- Environment sanity check: before running tests, confirm the actual resolved toolchain versions match the project's expected versions — print the effective runtime and dependency-manager versions to surface version-manager-not-activated cases where the shell resolves an unintended runtime version and the build tool raises errors on version-incompatible code. Also confirm that compiled or bundled artifacts are rebuilt before tests that consume them — stale artifacts from a prior build produce test results that do not reflect current source state (for general software development this includes client-side JavaScript bundles that must be rebuilt after source changes before browser or SSR tests run; for skill development this includes re-running any code-generation step that produces an artifact a verification step reads, rather than testing against a cached artifact from a previous run that predates the current change).
- Bulk-vs-split execution: when the change is cross-cutting (shared components, mirrored services, or parallel handlers) and the test suite includes long-duration categories (E2E, integration tests with external dependencies), prefer passing scoped or split arguments rather than requesting a single bulk run. A single command bundling long-running jobs makes intermediate progress opaque and failure recovery harder — scope-targeted execution lets each category succeed or fail independently.
- Shared-path re-run scope: when a fix touches a shared path — a utility, helper, or function invoked by multiple distinct test suites (for skill development this includes subagent dispatch shared forms, hook wiring, state-file processing, or any cross-suite path) — include all suites that exercise that path in the re-run scope, not just a representative suite. A green representative suite proves only the paths it exercises; when the changed code is on a shared path, every suite that routes through it is a potential regression surface. When running all affected suites is impractical, record the excluded suites explicitly in the Completion summary as uncovered risk rather than treating the representative re-run as sufficient verification.
- Cheap-diagnostic first pass: when a test or check command first fails, read the raw error output as a diagnostic pass before taking any edit action — identify the failure class (missing symbol, type error, assertion, configuration issue) and locate the error source (file, line, test case) from the output alone. This pass consumes no retry budget and modifies no files; its purpose is to avoid misdirected first-fix attempts that burn a retry slot without addressing the root cause. Apply an edit only after the diagnostic confirms the error source and failure class: when the cause is immediately apparent from the error text (typo, missing bracket, misnamed identifier), apply the fix and enter the retry loop; when the failure class is ambiguous or the error points to unfamiliar code paths, read the relevant source files before editing.
- Pre-existing vs regression discrimination: before entering the retry path on TEST_FAILED / EXECUTION_ERROR, discriminate each reported failure as regression (introduced by this run's changes) or pre-existing (already failing at <base-commit> from Step 2). Two paths: (i) if the invoked test skill's structured summary already classifies failures as pre-existing / regression (recommended return-contract extension for any verification-class skill — lint, test runners, structural checkers, marketplace validators), trust that classification. (ii) Otherwise, re-run the same test skill against <base-commit>: stash the working changes (git stash --include-untracked), check out <base-commit> into a scratch worktree (git worktree add ../base-commit-check <base-commit>) or rely on the test skill's own --base-commit argument if it supports re-evaluating at that ref without working-tree manipulation; compare the failures. Failures reproducing at <base-commit> are pre-existing — record as an informational warning in the summary (pre-existing failure: <skill> / <case> — out-of-scope for this PR) and do not count toward the 3-retry budget and do not auto-fix. Only failures that do not reproduce at <base-commit> are regressions — proceed with the existing retry / fix path. General principle: regression-vs-pre-existing discrimination via base-commit comparison applies to any verification step running a checker against a working tree (lint, test, structural validator — for skill development this includes marketplace structure validation and plugin integrity checks where docs and implementation can disconnect independently of the current change).
- Self-contamination discrimination: when tests pass on one Step 7 entry but fail on a subsequent re-entry after the workflow itself applied changes (a Step 6 tidy fix, a Step 7.5 rules-review fix, or a Step 8 code-review fix landed between the two entries), check whether the workflow's own applied changes — not the original implementation — caused the new failure. Before applying another fix, inspect the diff between the last passing Step 7 entry and the current state and verify whether the test-failing path was modified by a workflow fix. When the workflow-applied change is the regression source, revert or correct that specific workflow fix rather than adding a new edit to the user's implementation code — treating a workflow-self-contaminated failure as an implementation regression burns retry budget on the wrong target and may cascade into additional incorrect fixes (for skill development this includes a Step 7.5 rules-review edit that inadvertently rewrote a tested behavior, or a Step 6 tidy pass that removed an expression the test suite asserts on).
- EXECUTION_ERROR + pre-declared degraded procedure: when a test invocation returns EXECUTION_ERROR AND the approved Plan explicitly pre-declared a degraded procedure for this failure mode (e.g. a Risks entry naming the environmental constraint and a fallback verification path), apply the degraded procedure automatically — execute the fallback, emit a one-line note in the resolved language (e.g. Step 7: EXECUTION_ERROR — applying pre-declared degraded procedure: <procedure-summary>), and continue without consuming a retry. Pre-declared degraded procedures are user-approved accommodations for predictable environmental constraints; routing them through the retry-and-stop path contradicts the plan's prior approval and violates § No-Stall Principle. When no degraded procedure is pre-declared, treat EXECUTION_ERROR as before (trigger the existing retry / fix path).
- Mock/replay-only coverage self-check: when a test invocation returns SUCCESS, check whether the passing tests exercise actual I/O against the external boundary — real network calls, real serialization into the target system's format, real file writes reaching an external service — or only replay pre-recorded responses (cassette fixtures, mock stubs, HTTP interceptors, seeded response files; for general software development this includes external-API integration tests that replay recorded HTTP interactions, message-bus consumers that replay fixture events, file-transfer tests that stub the remote endpoint). If external boundary behavior is covered exclusively by replay-only tests, do not treat the pass as integration-verified for that boundary. Record such gaps explicitly in the Completion summary as replay-only coverage: <boundary description> — not integration-verified and do not declare the highest-risk behaviors for that boundary resolved on the basis of a replay pass alone.
- Downstream-artifact invalidation self-check: when tests pass, ask whether the change invalidates downstream artifacts that were not directly edited — snapshots, golden images, generated code, lock files, schema artifacts, or other outputs derived from the changed sources. Scope-based test selectors key on directly-changed paths and are structurally silent about derived artifacts; a pass from such selectors does not confirm downstream artifacts remain consistent. Verify each potentially-invalidated downstream artifact class explicitly rather than treating a selector-pass as sufficient (for general software development this includes UI changes invalidating visual-regression baselines, schema changes invalidating generated client code or snapshot fixtures, dependency changes invalidating lock files; for skill development this includes structural changes to references/*.md invalidating cited examples or templates in the same SKILL.md or bundle sibling skill files). Record any unverified downstream artifact classes in the Completion summary as uncovered risk.
- Red-before / Green-after verification: when tests pass after applying a fix, verify the same test would have failed without the fix — a regression test that stays green both with and without the fix is a strong signal the bug was not reproduced and the wrong code path is being exercised. Run the relevant test without the fix applied (re-run against <base-commit> from Step 2's initial-state capture, using the same worktree approach as the Pre-existing vs regression discrimination bullet above) and confirm the test now fails; a persistent green result means the test does not exercise the fixed code path. For behaviors sensitive to entry method (state-restoration, client-side routing, lazy initialization, or any mechanism that behaves differently based on how a boundary is crossed), confirm the test traverses the same code path the user traverses rather than a setup shortcut that bypasses the real entry point — seeding state directly, loading a page via a full-page reload rather than in-app navigation, or calling a handler without the dispatch chain the user triggers are common shortcut traps that skip the exact code a fix touches (for general software development this includes integration tests where fixtures pre-populate state the real user action builds incrementally; for skill development this includes smoke tests that call an internal helper directly rather than driving the full Skill() dispatch path).
- Environmental verification fidelity: before treating an automated SUCCESS as definitive evidence, confirm the test automation's execution path reproduces the actual environmental conditions the behavior depends on — platform-specific input filters, OS-level accessibility layers, system event dispatch paths, and similar runtime factors. Automation that injects synthetic events or runs headless may bypass these layers and reach a green state through a path the user never traverses; the PASS result is then structurally unable to rule out the user-visible failure. When a discrepancy exists between an automated PASS and real-world user observation, treat the real-world observation as ground truth — the automation's coverage gap is the explanatory factor. Record such gaps in the Completion summary as automated-pass / real-world-fail: <boundary description> (e.g. automated-pass / real-world-fail: OS gesture recognizer bypassed by synthetic pointer injection) so they remain visible for a follow-up fix (for general software development this includes UI automation that dispatches synthetic pointer events bypassing OS-level gesture recognizers, headless browser tests that skip platform accessibility pipelines, and mobile test harnesses that inject touch events below the OS input filter layer; for skill development this includes smoke tests that invoke a tool helper directly via an internal call rather than the full Skill() dispatch chain, bypassing hook wiring, allowed-tools enforcement, and subagent context injection that the real invocation path exercises).
After 3 retries, report to the user and stop.
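The test loop, the retry-once handling for a failed Skill() call, and the base-commit comparison can be sketched together. This is a hedged Python model: invoke and run_directly are hypothetical stand-ins for a Skill() call and for direct main-thread execution, and failures are assumed to be identified by plain case names.

```python
from enum import Enum
from typing import Callable, Optional, Sequence, Set, Tuple


class Status(Enum):
    SUCCESS = "SUCCESS"
    TEST_FAILED = "TEST_FAILED"
    EXECUTION_ERROR = "EXECUTION_ERROR"


# Hypothetical callable types: `invoke` returns None when the call itself
# produced no structured summary (errored, timed out, or stalled).
Invoke = Callable[[str], Optional[Status]]
RunDirectly = Callable[[str], Status]


def invoke_with_fallback(skill: str, invoke: Invoke,
                         run_directly: RunDirectly) -> Status:
    for _attempt in range(2):       # first attempt plus one retry
        status = invoke(skill)
        if status is not None:
            return status
    return run_directly(skill)      # substitute pass in the main thread


def run_test_commands(test_commands: Sequence[str], invoke: Invoke,
                      run_directly: RunDirectly) -> Tuple[Status, Optional[str]]:
    for skill in test_commands:
        status = invoke_with_fallback(skill, invoke, run_directly)
        if status is not Status.SUCCESS:
            return status, skill    # halt: later entries do not run on this pass
    return Status.SUCCESS, None


def split_failures(current: Set[str],
                   at_base: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (regressions, pre_existing) from failing case names."""
    pre_existing = current & at_base  # also fails at <base-commit>
    regressions = current - at_base   # introduced by this run's changes
    return regressions, pre_existing
```

Only the regressions set would feed the retry path; the pre_existing set would be reported as informational warnings without consuming retry budget.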

Coverage note (TypeScript multi-tsconfig): For projects with Project References or multiple tsconfig*.json files, a single tsc --noEmit may miss changed files that belong to other tsconfigs. --init auto-registers a per-tsconfig tsc -p <path> --noEmit in this case (see references/init-mode.md for detection rules). If coverage still looks incomplete, re-run --init or append the missing command manually.
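As a rough illustration of the per-tsconfig coverage idea, the sketch below builds one tsc command per config file. It assumes the config files sit at the project root and match tsconfig*.json; the actual detection rules live in references/init-mode.md and may differ, and per_tsconfig_commands is a hypothetical helper name.

```python
from pathlib import Path
from typing import List


def per_tsconfig_commands(project_root: str) -> List[str]:
    # Assumption: only top-level tsconfig*.json files are considered.
    names = sorted(p.name for p in Path(project_root).glob("tsconfig*.json"))
    # One type-check command per config, so files owned by other configs are covered.
    return [f"tsc -p {name} --noEmit" for name in names]
```

Comparing this list against the registered check commands is one way to spot a config that a single tsc --noEmit would miss.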

GATE: Verify Steps 2-7 are completed (check task status via TaskList; if status is inconsistent, verify actual completion by reviewing work done). Mark Step 7.5 as in_progress unless Step 2 pre-completed it under the difficulty-skip matrix (Trivial or Simple tier) — in that case the row is already completed; do not re-mark it in_progress, skip straight to Step 8 (same already-completed-row handling as the Step 8 GATE's N_code=0 case). (If Step 7 launched a background rules-review, it may still be in flight — Step 7 is "complete" once the test phase passes; Step 7.5 sub-step 1 collects the rules-review result.)

Step 7.5: Rules Compliance Review

Dedicated rules compliance check, separate from code review (Step 8). This ensures rule enforcement gets focused attention rather than competing with correctness and design concerns.

Difficulty exception (difficulty-skip matrix). When Step 2 marked Step 7.5: Rules Compliance Review completed under the difficulty-skip matrix (Trivial or Simple tier — see Step 2's Adjust N by difficulty), the row is already completed: do not re-mark it in_progress; proceed directly to Step 8. The Phase-boundary self-audit (§ Step 1 registration mechanics) treats this pre-completed row as the intended skip exactly as it does the Trivial Step 3 / Step 8 skips, not an unrun-step bug.

Responsibility scope (so the same rule class is not double-reviewed across passes and no class slips through every pass):

Step 7.5 owns the mechanical walk of every matched .claude/rules/ rule against the diff — hard rules (explicit prohibitions, naming, reference form, import paths, placement, file structure) are evaluated strictly; intent-style rules (judgment-based principles, prose conventions) are evaluated best-effort with low-confidence markers per rules-review SKILL.md.
Step 6 Tidy covers reuse, prose quality, dead code, and redundancy; rule compliance is not its primary responsibility — if the Step 6 cleanup skill (Skill(simplify) or the Skill(tidy) fallback) surfaces rule findings as a side effect, treat them as bonus and do not extend its reviewer prompt to take on .claude/rules/ walks.
Step 8 Code Review covers correctness, edge cases, a light pass on conventions / consistency (a safety-net pass for files modified after Step 7.5 when Step 7.5 ran), and simplicity / maintainability; the thorough rules check stays at Step 7.5 wherever it runs. On Simple tier, where Step 7.5 is skipped under the difficulty-skip matrix, this safety-net flagging is the run's primary rules-compliance defense instead of a lightweight backstop.
Step 11 Update Rules owns the rule-doc-drift class: findings where the code under review is internally consistent with itself (and with the broader file's existing pattern across 3+ call sites per rules-review SKILL.md's drift detection criteria) but the rule document describes different behavior — i.e. the rule text has gone stale relative to the code. Step 7.5 surfaces this class via the reviewer's Classification: rule-doc-drift finding and does not apply a code fix; the disposition is to route the rule-text update to Step 11 (Skill(extract-rules)) rather than rewriting the code to match a stale rule. When rules-review returns a finding tagged Classification: rule-doc-drift, treat it as out-of-scope for Step 7.5's fix loop (no Skill(rules-review) re-run is required to clear it, since the code is the source of truth), record the routing intent so Step 11 picks it up, and continue.

When a rule violation is reported in both passes (Step 7.5 and Step 8), treat Step 7.5 as authoritative and skip the duplicate fix attempt in Step 8 to avoid double-counting in the iteration budget.
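The routing implied by this responsibility split can be sketched as two small filters. The finding shape used here (a dict with "id" and "classification" keys) is an assumption made for illustration; only the rule-doc-drift classification value comes from the text above.

```python
from typing import Dict, List, Set, Tuple

Finding = Dict[str, str]  # hypothetical shape: {"id": ..., "classification": ...}


def route_step_7_5_findings(findings: List[Finding]) -> Tuple[List[Finding], List[Finding]]:
    """Split findings into (fix in Step 7.5, route to Step 11)."""
    fix_now: List[Finding] = []
    route_to_step11: List[Finding] = []
    for finding in findings:
        if finding.get("classification") == "rule-doc-drift":
            # The code is the source of truth; the rule text gets updated later.
            route_to_step11.append(finding)
        else:
            fix_now.append(finding)
    return fix_now, route_to_step11


def step8_rule_findings_to_fix(step8_findings: List[Finding],
                               step_7_5_ids: Set[str]) -> List[Finding]:
    # Step 7.5 is authoritative: skip anything it already reported.
    return [f for f in step8_findings if f["id"] not in step_7_5_ids]
```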

Obtain the rules-review report — collect the Step 7 background launch, or invoke directly. If Step 7's "Concurrent rules-review launch" dispatched a background rules-review this pass and it is still fresh (rules_review_launched == true and rules_review_stale == false), collect that background result now (it ran concurrently with the test phase). If the background subagent has not yet reported when you reach this point (the test phase finished first), wait for its completion notification before judging the report — this wait is a non-stalling return boundary (harness-tracked background work), not a user gate, and a not-yet-arrived notification must never be read as "no findings". If the collected background result is itself an error completion — the subagent died or returned something unusable as a rules-review findings report — treat it the same as not-launched and fall back to the fresh sequential dispatch below; this route only redirects (it does not mutate the launch/stale flags, so the lifecycle closed list is unchanged). Otherwise — rules_review_launched == false (background dispatch was unavailable, or the dispatch attempt did not succeed) or rules_review_stale == true (the background result was discarded after a test failure) — invoke Skill(rules-review) with --base-commit <sha> (base-commit recorded in Step 2) — and Model: <subagent_model> when subagent_model is a model id (omit when inherit), which rules-review applies to its internal per-category Agent dispatch — via $ARGUMENTS, including the cross-layer review handoff ledger as a short context item in the dispatch (per § Step 6's Cross-layer review handoff ledger paragraph; omit when the ledger has no recorded dispositions). Either way the report comes from the external rules-review skill: do not substitute an inline rules-walk based on perceived scope, change size, or any other self-judgment of the diff's complexity — small / "obvious" / single-file changes still go through the external skill. 
The skip-to-fallback path is documented in Prerequisites and fires only on objective skill unavailability (the Skill(rules-review) call itself fails after one retry), never on subjective judgment that an inline equivalent would suffice. The external skill enforces consistent coverage across runs; inline substitution silently degrades that coverage and the user has no visible signal that it happened.
Judge the result semantically: if the skill reports that there is nothing to act on — no actionable violations, no changed files, no applicable rules, no rule files found, or any other "nothing to report" outcome regardless of exact wording — mark Step 7.5: Rules Compliance Review as completed and proceed to Step 8 automatically. Per the No-Stall Principle, do not wait for user input and do not rely on exact-phrase matching; trust semantic judgment since the skill's phrasing may evolve across versions.
If violations are found:
a. Fix all reported violations. Applying these fixes changes the diff that the Step 8 code-review background launch (if any) analyzed, so set code_review_stale = true; Step 8 sub-step 1 then discards the now-stale background result and dispatches the reviewer fresh against the post-fix diff.
b. Re-run Step 7 (Check / Test) to ensure the fixes did not break anything (this sequential re-run does not re-fire Step 7's concurrent rules-review or code-review launches).
c. Re-run Skill(rules-review) with --base-commit <sha> for verification (2nd cycle). Apply the same semantic judgment as sub-step 2 (Judge the result semantically): if the re-run reports nothing actionable, mark Step 7.5: Rules Compliance Review as completed and proceed to Step 8 automatically (per the No-Stall Principle). When a 2nd-cycle verdict differs from the 1st on a specific location (a previously-flagged item now passes, or a previously-clean location is now flagged), record the reason inline in the Step 7.5 summary presented to the user (1–2 lines per drifted location: which location, 1st-cycle verdict, 2nd-cycle verdict, why) before completing. Judgment drift between cycles is acceptable but must be explained; otherwise repeat-cycle stability cannot be assessed.
d. If violations still persist after the 2nd review cycle, present the remaining violations to the user for decision. Above the violations list, emit a summary preamble per references/plan-format.md § User-gate summary preamble. Render the violations following references/plan-format.md § Localization granularity in the resolved language. Wait for the user's response before marking completed. (This is one of the explicit user-gates enumerated in the No-Stall Principle.)

--fast 1-pass cap: when fast_mode_active, skip sub-steps (c) and (d) — after (a) applying fixes and (b) re-running Step 7, mark Step 7.5: Rules Compliance Review completed directly and proceed to Step 8 without the 2nd-cycle re-verification dispatch or the persistent-violations user gate, and append Step 7.5 re-verification skipped (fast mode) to fast_mode_skipped_steps so the unverified fix is never silent (§ Completion's fast-mode-skip reminder renders it). The fix is trusted unverified in this case; Step 8's code review is the remaining (partial) safety net — Step 8 sub-step 1's dispatch payload reads this same ledger entry to adjust its own framing accordingly (see that sub-step).

Mark Step 7.5: Rules Compliance Review as completed only after all violations are resolved (or, under --fast, after sub-steps (a)/(b) land) or the user has decided on remaining violations.
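The Step 7.5 control flow above can be summarized in a short sketch. It is a simplified Python model under stated assumptions: a report is represented as a list of actionable violations (an empty list stands for any "nothing to report" outcome, with the semantic judgment abstracted away), and all function names and callback parameters are hypothetical.

```python
from typing import Callable, List


def rules_review_source(launched: bool, stale: bool,
                        background_errored: bool = False) -> str:
    # Collect only a fresh, usable background result; otherwise dispatch
    # sequentially. The error route only redirects and mutates no flags.
    if launched and not stale and not background_errored:
        return "collect-background"
    return "dispatch-sequential"


def step_7_5(first_report: List[str],
             apply_fixes: Callable[[List[str]], None],
             rerun_step7: Callable[[], None],
             rerun_rules_review: Callable[[], List[str]],
             fast_mode: bool,
             fast_mode_skipped_steps: List[str]) -> str:
    """Return "completed" or "user-gate"."""
    if not first_report:
        return "completed"            # nothing actionable: proceed to Step 8
    apply_fixes(first_report)         # (a) also the point to set code_review_stale
    rerun_step7()                     # (b) sequential re-run, no concurrent launches
    if fast_mode:
        # --fast 1-pass cap: skip (c) and (d), but record the skip in the ledger.
        fast_mode_skipped_steps.append("Step 7.5 re-verification skipped (fast mode)")
        return "completed"
    if not rerun_rules_review():      # (c) 2nd cycle
        return "completed"
    return "user-gate"                # (d) remaining violations go to the user
```

The ledger append is what keeps the fast-mode path from trusting a fix silently: the skipped re-verification stays visible at completion.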

GATE: Verify Steps 2-7.5 are completed (check task status via TaskList; if status is inconsistent, verify actual completion by reviewing work done). Mark Step 8 as in_progress only when N_code ≥ 1; if Step 2 set N_code=0 (Trivial), Step 8 is already completed — do not re-mark it in_progress, skip straight to Step 9. (If Step 7 launched a background code review, it may still be in flight — Step 8 sub-step 1 collects it.)

Step 8: Code Review

Code review catches bugs, convention violations, and design issues that tests alone miss — skipping it risks shipping preventable defects. Always run this step even when tests pass cleanly.

Difficulty exception (Trivial / N_code=0). When Step 2's difficulty assessment set N_code = 0 (a Trivial task — Trivial zeroes both N_plan and N_code), this entire step is skipped: its task rows (top-level Step 8: Code Review and every Step 8-x) were already marked completed by Step 2's Adjust N by difficulty. As with Step 3, this skip is gated on task difficulty; for any Simple / Moderate / Complex task (N_code ≥ 1) Step 8 always runs.

If N_code = 0 (Trivial), skip this step entirely — its rows are already completed, so do not re-mark them in_progress and proceed directly to Step 9 (Completion Hooks). The following in_progress marking and per-iteration processing apply only when N_code ≥ 1.

Mark Step 8: Code Review as in_progress. Process each pending iteration item (Step 8-1 through 8-N_code) in order:

Mark the iteration item as in_progress. Obtain this iteration's reviewer report — collect the Step 7 background launch when it is fresh, otherwise dispatch the reviewer skill resolved in Step 1 (e.g. Skill(ask-peer)):
- Collect the Step 7 background launch when fresh: if code_review_launched == true and code_review_stale == false, the Step 7 "Concurrent code review launch" ran the reviewer in a background subagent concurrently with this pass's test phase — collect that result now as this iteration's reviewer report. Each pass's launch is collected at most once — by the first iteration item processed after that pass (iteration 1 on the initial pass; iteration k+1 on a re-run pass triggered from iteration k; derivation at the Initialize tracking bullet of Step 7's "Concurrent code review launch" paragraph). If the background subagent has not yet reported, wait for its completion notification before judging it, per the same non-stalling wait-boundary rule as Step 7.5 sub-step 1's background collect (a not-yet-arrived notification must never be read as "No actionable findings"). If the collected background result is itself an error completion, apply the same error-completion route as Step 7.5 sub-step 1's background collect — treat it as not-launched and dispatch fresh per the next bullet (the route only redirects; it does not mutate the launch/stale flags).
- Otherwise — dispatch fresh: when code_review_launched == false (background dispatch unavailable, the launch was skipped for this pass, or the dispatch attempt did not succeed), or code_review_stale == true (the diff changed since this pass's launch so the background result is stale), or when redirected here by the collect bullet's error-completion route, call the reviewer skill (e.g. Skill(ask-peer)) to review the code changes now. subagent_model propagation (inline fresh-dispatch only): propagate subagent_model exactly as in Step 3 (Claude-family reviewers only). This applies only to this inline fresh-dispatch path — the Step 8 background-launch path already carries subagent_model via the Step 7 launch's Agent model, so the two paths never double-apply. Pre-dispatch dispatch-boundary reminder: Issue the Skill(<reviewer>) call in the same turn as any accompanying status prose — never produce a standalone status turn before the Skill() call, as that creates a stall point. Reading the reviewer's SKILL.md is preparation, not dispatch; the Skill() call is the dispatch. In both paths the collecting iteration is an ordinary iteration (it judges and applies findings per sub-steps 2–3 below); the collect path only substitutes the report's source. The reviewer report addresses the following — this list is the single parametric source for both paths (it is the fresh-dispatch request, and the same payload the Step 7 background launch bakes; the continuation item and the iteration-scope instruction below each apply per their own conditions):
- Include git diff <base-commit> (base-commit recorded in Step 2) to capture all changes since workflow start; before composing this payload, also run git status --porcelain=v1 --untracked-files=all and Read the contents of any untracked new files (lines with ?? prefix), then include those file contents in the review payload labeled as untracked new files — these are task-generated artifacts absent from git diff <base-commit> that would otherwise be invisible to the reviewer
- Instruct the reviewer to flag any obvious .claude/rules/ violations as a safety net. On Moderate / Complex, thorough rules compliance was already verified in Step 7.5, so this is a lightweight check focused on code modified after Step 7.5 — unless fast_mode_skipped_steps contains Step 7.5 re-verification skipped (fast mode) (the Step 7.5 1-pass cap fired this run), in which case Step 7.5's fixes were never re-verified: instruct the reviewer to weight the rules check on the fixed locations accordingly, the same as the Simple-tier case below. On Simple, Step 7.5 is skipped under the difficulty-skip matrix (see Step 7.5's "Responsibility scope" paragraph), so this is the run's only rules-compliance check — instruct the reviewer to weight it accordingly
- Request feedback organized into three categories (labels only — the full per-category rubric lives in references/review-categories.md § Code review categories; instruct the reviewer to read that section, resolving the link to a concrete readable path when composing the request — the reviewer lacks the skill-directory context): a. Correctness & edge cases b. Conventions & consistency c. Simplicity & maintainability
- If custom_instructions is configured, include the instructions text in the review request and have the reviewer verify compliance and report conflicts
- If a state file is active (executing a subtask from a decomposition), include the current subtask's scope in the reviewer request: list the subtask's title and description, then list what the other subtasks cover (to define out-of-scope). Instruct the reviewer that missing functionality belonging to other subtasks is not an actionable finding for this code review — only findings scoped to the current subtask qualify. Omit this when no state file is active (single-task execution has no defined out-of-scope boundary).
- Include the cross-layer review handoff ledger as a short context item (per § Step 6's Cross-layer review handoff ledger paragraph; omit when the ledger has no recorded dispositions); it complements the same-layer continuation item below.
- If a prior Step 8 iteration completed this run (a re-run pass): include the continuation item — the summary of fixes made and rejections with reasons from the completed iterations, including any class-level sweep record — so the reviewer builds on prior rounds rather than re-raising already-rejected findings (the latest git diff <base-commit> is already the first item above). Omit this item on the initial pass (no prior iteration).
- On a re-run pass, also include an iteration-scope instruction: the reviewer's primary verification scope is the changes applied since the prior iteration (identified via the continuation item's summary of fixes, located within the latest git diff <base-commit>) plus landing confirmation of the prior iteration's findings — the full-coverage pass (re-verifying every target file in the full git diff <base-commit> from scratch) belongs to the initial pass only. The reviewer must still escalate back to full re-verification when content outside that primary scope raises a new concern, so coverage is reordered, not reduced. Omit this item on the initial pass (the initial pass is the full-coverage pass).
- Reviewer should only report actionable findings. If none, explicitly state "No actionable findings"
Judge the reviewer's response semantically: if the reviewer reports nothing actionable — no actionable findings, no bugs / convention violations / design issues raised, or any other "nothing to report" outcome regardless of exact wording — mark this and remaining iteration items as completed (skip). Mark Step 8: Code Review as completed and proceed to Step 9 (Completion Hooks) automatically. Per the No-Stall Principle, do not wait for user input and do not rely on exact-phrase matching; trust semantic judgment since the reviewer skill's phrasing varies (especially Skill(ask-peer) and other free-form-prose reviewers whose verdicts are natural-language Markdown rather than a fixed token).
Otherwise: autonomously fix genuine issues or reject inapplicable points with reason — do not ask the user for judgment on individual review findings. Mark this iteration item as completed.
- Rejection self-question (severity-label override): before rejecting any finding solely because the reviewer labeled it Minor (or any other low-severity bucket), ask "if I rejected this and presented the resulting code to the user, would the user re-raise the same point themselves?" — judging by which areas the user has historically commented on (intent expression, reader-comprehension, placement consistency for test fixtures / helper functions / dependency locality, and other readability concerns where runtime correctness is unaffected but a reader's interpretation is). If the answer is yes or ambiguous, apply the fix instead of rejecting on the Minor label alone; reject on Minor only when you are confident the user would not surface the same point.
- Class-level extension audit (post-Critical/Major-fix): immediately after applying a fix for a Critical-severity finding, or a Major-severity finding whose fix addresses a structural pattern (external I/O boundary conditions, closed enum / form-set networks, shared helper / safety-rail callers, parallel route handlers — for skill development: subagent return-value schemas, shared handler fallback paths, mirrored form-set network audits), and before the modified-vs-rejected branches below, scan the rest of the diff for other instances of the same defect class — same operation, same broken assumption, same side-effect pattern (e.g. shared-resource-destroying API call sequences, direct processing of unverified input, race conditions). Reviewer feedback typically names one instance; the underlying class often spans the diff (cross-construct propagation, shared safety-rail callers, parallel route handlers, etc.). Apply the same fix direction to additional matches found here, then record the sweep outcome (e.g. class-level sweep for <defect-class>: N additional instances found and fixed or no additional instances found) in the summary passed to the next iteration so the next reviewer does not re-trigger the same audit on already-swept ground, then continue to the modified-vs-rejected branch.
- Prose-integrity self-check (post-fix): after applying a fix that edits prose adjacent to its target line (comments, docstrings, paragraph-level documentation — for skill development this includes SKILL.md and references/*.md content), re-read the surrounding paragraph as a single unit before continuing — verify no sentence is cut mid-word, no logical connective is broken (the connectives however / therefore / because / but / etc. still anchor real clauses), and the paragraph's overall logic still holds. Mechanical fix patches see only the line-level diff and routinely leave the surrounding prose semantically broken in ways the next iteration's reviewer flags as a Major finding, costing an extra iter.
- Natural-language quality self-check (post-fix): when a fix adds new natural-language content that mechanical lint / test cannot verify (code comments, config-file annotations, error messages, UI copy, documentation fragments — for skill development this includes SKILL.md / references/*.md prose additions, frontmatter description text, log messages), re-read each added fragment as a standalone unit in the resolved language. Judge it on four axes: concise (no padding or runaway sentences), phrasing natural for the target reader, vocabulary consistent with surrounding text, register and sentence structure not awkward. Revise any fragment that fails. This self-check is the only gate before natural-language content reaches the user-visible commit gate — Step 7 (check_commands / test_commands) and Step 7.5 (rules-review) cannot evaluate natural-language quality.
- Phrase-duplication sweep (post single-site fix): when a fix corrects a factual or descriptive claim in one location (a comment, a config-key description, a CHANGELOG entry, a frontmatter description, or any other prose surface), search the rest of the diff and the surrounding unchanged file content for the same claim restated elsewhere — exact wording and close paraphrases alike, since a reviewer's strict-match grep catches only the former. Apply the same correction to every match found in this pass; closing every duplicate now is what a 2nd review iteration would otherwise have to catch one site at a time.
- Comment-verbosity self-check (post-fix): scan all inline code comments visible in the diff — both newly-added (+ lines) and pre-existing comments in context lines — for over-explanation. Remove any comment that (a) describes what the code does when identifiers already express that (e.g. // increment counter next to count++), (b) narrates obvious control flow, or (c) restates a variable or function name in prose. Retain only comments that explain why: a hidden constraint, a subtle invariant, a workaround for a specific external issue, or behavior a reader would not expect from the identifiers alone. A pre-existing verbose comment brought into the diff's context window is as legitimate a removal target as a newly-added one. This check is distinct from the Natural-language quality self-check above, which covers documentation prose — this check covers inline source code annotations specifically (for skill development: inline rationale phrases within code blocks in references/*.md, or comments inside bash fenced blocks in SKILL.md).
- If code was modified: re-run Step 7 and Step 7.5 (with same base-commit from Step 2) — this full re-entry is a new pass (per Step 7's "Concurrent rules-review launch" pass definition): both the rules-review launch and the code-review launch re-fire (the code-review launch bakes its payload per sub-step 1's definition, continuation item and iteration-scope instruction included, and only when a pending iteration item remains — see Step 7's "Concurrent code review launch") — then continue to the next pending iteration item (back to step 1). Code fixes routinely introduce fresh bugs, tighten one place while loosening another, or miss a caller the author didn't know about — the next review round is how those leaks get caught. Always re-run Step 7 and Step 7.5 — no exceptions. Do not short-circuit on any rationalization: not on confidence in the fix, not because the diff is small, not because the modified files appear out of scope for the configured check_commands / test_commands (e.g. edits land entirely under a local-skill directory or a docs-only path), not because re-running "would be a no-op". If a re-run is genuinely a no-op, the no-op outcome is the audit trail; skipping the re-run removes the trail. The only permissible skip is when no code was modified in this iteration (handled by the next bullet).
- If all points were rejected (no modifications): mark remaining iteration items as completed (skip; there is nothing new for the next reviewer to look at). Continue to the next pending iteration item; the next pass's reviewer dispatch (the Step 7 background bake and the fresh-dispatch path alike) composes its payload per sub-step 1's definition, continuation item and iteration-scope instruction included (not restated here).
Return-point no-stall reminder: At each iteration boundary (regardless of reviewer outcome — findings reported, "No actionable findings", any non-error result), the next action — the next iteration's reviewer dispatch when more iteration items remain, or the Step 9 (Completion Hooks) transition when this was the last iteration or "No actionable findings" was returned, or the Step 7 / Step 7.5 re-run when code was modified — must be issued in the next tool call. Do not insert an interstitial summary or acknowledgment turn between iterations; the abstract enumeration in § No-Stall Principle is intentionally duplicated here so the rule fires at the decision moment.
If all N_code iteration items are completed and actionable feedback still remains, present the unresolved points to the user for decision. Above the unresolved points, emit a summary preamble per references/plan-format.md § User-gate summary preamble. Render the findings following references/plan-format.md § Localization granularity in the resolved language.
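The collect-versus-dispatch choice in sub-step 1 can be sketched as a small decision function. This is a minimal illustration, not the skill's implementation: `ReportSource` and the `background_errored` parameter are hypothetical names standing in for the error-completion route, while `code_review_launched` and `code_review_stale` are the flags named above.

```python
from enum import Enum


class ReportSource(Enum):
    COLLECT_BACKGROUND = "collect the Step 7 background launch"
    DISPATCH_FRESH = "dispatch the reviewer skill now"


def choose_report_source(code_review_launched: bool,
                         code_review_stale: bool,
                         background_errored: bool = False) -> ReportSource:
    """Pick where this iteration's reviewer report comes from.

    background_errored is a hypothetical flag meaning "the collected
    background result was an error completion". It only redirects the
    route; it does not change the launch/stale flags.
    """
    fresh_launch = code_review_launched and not code_review_stale
    if fresh_launch and not background_errored:
        return ReportSource.COLLECT_BACKGROUND
    # Not launched, stale, or redirected by an error completion.
    return ReportSource.DISPATCH_FRESH
```

Either way the iteration is an ordinary one: only the report's source differs, and the findings are judged and applied the same way.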

Mark Step 8: Code Review as completed.
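For the Step 8 review payload, the untracked new files are the `??` lines of `git status --porcelain=v1 --untracked-files=all`. A minimal sketch of picking those paths out of the command's output follows; `untracked_paths` is a hypothetical helper name, and the sketch assumes plain paths (git may quote paths that contain unusual characters, which this does not undo).

```python
def untracked_paths(porcelain_output: str) -> list[str]:
    """Return the paths on lines that carry the `??` untracked marker.

    Input is the text printed by
    `git status --porcelain=v1 --untracked-files=all`.
    Assumption: paths are unquoted; quoted paths are returned as printed.
    """
    paths = []
    for line in porcelain_output.splitlines():
        # Porcelain v1 lines are "XY <path>"; untracked entries use "??".
        if line.startswith("?? "):
            paths.append(line[3:])
    return paths
```

Each returned path would then be read and attached to the payload under the untracked-new-files label, since none of them appear in `git diff <base-commit>`.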

Step 9: Completion Hooks

Skip this step if hooks.on_complete is not configured. Mark Step 9: Completion Hooks as in_progress.

Task-derived-change gate: before executing any entry, check whether the tracked diff since <base-commit> (recorded in Step 2) contains changes produced by this task. When it does not — the tracked diff is empty or every changed path in it is pre-existing work unrelated to this task, and git status --porcelain=v1 --untracked-files=all shows no task-derived untracked files (gitignored paths never appear in that output, so the typical case — the task's only deliverables living under a gitignored directory — still skips) — skip the whole hooks.on_complete list, mark this step completed, and emit one line in the Completion summary naming the skip reason (e.g. hooks.on_complete skipped: no task-derived changes), then proceed to Step 10 (or directly to Step 11 when interactive_commits: false — per § Step 10). When an unrelated pre-existing diff exists, also add a warning line surfacing those paths (e.g. hooks.on_complete skip warning: pre-existing unrelated diff in <path>, <path>) so the user can notice unintended pre-run changes. Review-class hooks dispatched against an unrelated diff bind their findings to content the task never touched — a misleading record rather than a safety net; skipping the non-review entries along with them is likewise intended — with no task-derived changes there is no task output for any hook entry to act on. On any doubt about whether a changed path is task-derived, run the hooks as usual (the gate skips only when the absence of task-derived changes (tracked or untracked) is clear).
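The gate above reduces to a set check once the changed paths have been judged. The sketch below is illustrative only: `completion_hook_gate` is a hypothetical name, and the `task_derived` and `uncertain` inputs are assumptions that stand in for the judgment of which paths this task produced (that judgment itself is not mechanical).

```python
def completion_hook_gate(tracked_changed: set[str],
                         untracked: set[str],
                         task_derived: set[str],
                         uncertain: set[str] = frozenset()) -> tuple[bool, list[str]]:
    """Return (skip_hooks, completion_summary_lines).

    Hooks run as usual when any changed path is task-derived or when
    there is any doubt; they are skipped only when the absence of
    task-derived changes is clear.
    """
    changed = set(tracked_changed) | set(untracked)
    if changed & (set(task_derived) | set(uncertain)):
        return False, []
    lines = ["hooks.on_complete skipped: no task-derived changes"]
    unrelated = sorted(tracked_changed)
    if unrelated:
        # Surface pre-existing unrelated tracked changes to the user.
        lines.append(
            "hooks.on_complete skip warning: pre-existing unrelated diff in "
            + ", ".join(unrelated)
        )
    return True, lines
```

Putting the doubtful paths on the "run" side keeps the gate conservative: a wrong skip loses hook output, while a wrong run only costs time.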

Classify each entry in hooks.on_complete as review-class (skill invocations that only return findings for the main thread to apply later — no working-tree writes, no ordering dependencies on other entries; the review-class definition from § Step 6's Cross-layer review handoff ledger paragraph applies) or non-review-class (entries that write to the working tree, run state-mutating commands, or require ordering guarantees). When background dispatch is available (same criterion as Step 7's Concurrent rules-review launch paragraph: Agent with run_in_background exposed), dispatch all review-class entries concurrently — mirroring Step 7's read-only analysis overlap — then run non-review-class entries sequentially after all concurrent dispatches return. When background dispatch is unavailable, run all entries sequentially in order (fully backward-compatible fallback):
- Skill(<name>) pattern: invoke the skill — for review-class entries, include the cross-layer review handoff ledger as a short context item (per § Step 6's Cross-layer review handoff ledger paragraph; omit when the ledger has no recorded dispositions)
- Other strings: execute as a Bash command (always non-review-class — Bash commands cannot return findings to the ledger)
Review-class write reconciliation (non-fatal): a review-class entry returns findings for the main thread to apply — by its classification it must not write to the working tree itself. After each review-class entry returns, reconcile its reported applied-edit count (the applied_edits_count in its JSON verdict; treat a findings-only reviewer that reports none as 0) against the actual working-tree change attributable to that entry: if the entry reports zero applied edits yet its dispatch left an on-disk change (a path now in git diff <base-commit> that no prior step and no other entry introduced), record a non-fatal warning <hook> reported <n> edits but changed <path> (review-class write divergence) in the Completion summary. This surfaces a findings-only reviewer that silently self-applied an edit as a first-class signal at the hook boundary, rather than leaving it for Step 10's Post-hook attribution check to catch only at the commit gate. Do not revert here — Step 10's Post-hook attribution check owns resolution; this warning is the early-surfacing signal that points the user at the divergence before commit grouping begins.
Decomposed-run state-file guard: Before executing each non-review-class entry, if the current run is a decomposed partial-goal run (a subtask is active — the decomposition state file tracking remaining subtasks still exists), verify the entry's operations would not archive, move, or delete the decomposition state file. If they would, suppress that hook entry and log the suppression with reason (e.g. hooks.on_complete suppressed: <hook> would archive decomposition state <path>); continue to the next entry.
If a hook fails, report the error but continue executing the remaining hooks. Include the failures as warnings in the Completion summary.
After all hooks complete (or are skipped), mark Step 9: Completion Hooks as completed and proceed to Step 10.
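The classification and ordering of hooks.on_complete entries can be sketched as follows. This is a rough illustration under stated assumptions: the `review_skills` set is a hypothetical, caller-supplied list of skills known to be findings-only (the text defines review-class by behavior, not by name), and `security-review` and `apply-edits` in the usage note are made-up skill names.

```python
import re

# Entries that begin with Skill(<name>) are skill invocations;
# every other string is treated as a Bash command.
SKILL_PATTERN = re.compile(r"^Skill\((?P<name>[^)]+)\)")


def classify_entry(entry: str, review_skills: set[str]) -> str:
    """Return "review" for a findings-only skill entry, else "non-review"."""
    match = SKILL_PATTERN.match(entry.strip())
    if match and match.group("name") in review_skills:
        return "review"
    return "non-review"  # Bash commands are always non-review-class.


def plan_dispatch(entries: list[str],
                  review_skills: set[str],
                  background_available: bool) -> list[list[str]]:
    """Group entries into waves; entries within one wave run concurrently."""
    if not background_available:
        # Fallback: strictly sequential, in configured order.
        return [[entry] for entry in entries]
    review = [e for e in entries if classify_entry(e, review_skills) == "review"]
    rest = [e for e in entries if classify_entry(e, review_skills) != "review"]
    waves = [review] if review else []
    waves.extend([entry] for entry in rest)
    return waves
```

With entries `["Skill(rules-review)", "npm run format", "Skill(apply-edits)", "Skill(security-review)"]` and `review_skills = {"rules-review", "security-review"}`, the two review entries form the first concurrent wave and the other two run one at a time afterwards.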

GATE: Verify Steps 2-9 are completed (check task status via TaskList; if status is inconsistent, verify actual completion by reviewing work done). Mark Step 10 as in_progress.

Step 10: Interactive Commits

After hooks.on_complete (which may itself modify the working tree, e.g. via auto-formatter or apply-edit hook entries), group the working-tree changes into commits and iterate with the user one commit at a time. Step 10 runs only when interactive_commits: true — Step 1's task registration omits the row otherwise and execution proceeds directly from Step 9 to Step 11. The git push is never performed by this step (or any other step): pushing commits to a remote is the user's responsibility.

Unexpected current branch is not itself a signal to investigate: if the current branch differs from whatever branch was active earlier in the session, this is expected whenever the execution environment pre-creates or switches to a dedicated working branch before the first tool call. As long as the current branch's history contains <base-commit> (recorded in Step 2 — verify with git merge-base --is-ancestor <base-commit> HEAD; zero exit means it does), treat the difference as normal and skip any root-cause investigation into why the branch changed. If that check instead exits non-zero, the current branch does not descend from <base-commit> — this is not the expected pre-created-branch case: stop and surface the discrepancy to the user for direction before proceeding (consistent with the No-Stall Principle's allowance for explicit step-defined stops), rather than investigating or switching branches unilaterally.
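The ancestry check above can be sketched in a few lines. `interpret_is_ancestor` and `base_commit_check` are hypothetical helper names; the only real command used is `git merge-base --is-ancestor`, whose exit status is 0 when the first commit is an ancestor of the second. Any non-zero status is treated as a stop, as the text requires.

```python
import subprocess


def interpret_is_ancestor(returncode: int) -> str:
    """Map the exit status of `git merge-base --is-ancestor` to an action."""
    if returncode == 0:
        # HEAD descends from <base-commit>: a changed branch name is expected.
        return "proceed"
    # Not an ancestor (or the check itself failed): surface to the user.
    return "stop-and-ask-user"


def base_commit_check(base_commit: str, repo_dir: str = ".") -> str:
    """Run the ancestry check in repo_dir and return the resulting action."""
    result = subprocess.run(
        ["git", "merge-base", "--is-ancestor", base_commit, "HEAD"],
        cwd=repo_dir,
        capture_output=True,
    )
    return interpret_is_ancestor(result.returncode)
```

Keeping the interpretation separate from the subprocess call makes the decision rule easy to check without a repository.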

On entry to Step 10, initialize landed_count = 0 before running the procedure — so the value is well-defined for the Completion section even when the empty-output skip path in references/interactive-commits.md § Collect changes fires before its § Per-commit loop ever starts.

Post-hook attribution check: Run git diff <base-commit> --name-only to list all currently changed paths (step10_diff_paths). Compute hook_introduced_paths = step10_diff_paths − implementation_diff_paths (captured at § Step 5's "Implementation diff snapshot" paragraph): these are paths that appeared in the working tree during the review-hook phase (Steps 6–9) rather than during implementation. First subtract the § Workflow artifacts (cross-step fixed exclusion) set so workflow-owned in-session state (e.g. a plan file the ledger wrote a leftover into) is never flagged as unattributed. Cross-reference the remaining hook_introduced_paths against the cross-layer review handoff ledger's applied sites (from § Step 6's Cross-layer review handoff ledger paragraph): any path in hook_introduced_paths that is NOT covered by a ledger applied-site is unattributed — a change no review hook claimed responsibility for. When unattributed paths exist, surface each with git diff <base-commit> -- <path> and require explicit resolution before proceeding to commit grouping: (i) confirm as an expected side-effect (an auto-formatter the plan authorized, a file generated by a hook's non-review behavior) and continue, or (ii) revert with git checkout HEAD -- <path>. When no unattributed paths exist (either hook_introduced_paths is empty or every entry is covered by the ledger), proceed directly to references/interactive-commits.md.
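The attribution check is set arithmetic over path lists. A minimal sketch, assuming the ledger's applied sites can be reduced to file paths (the ledger's real granularity is not specified here) and using the hypothetical function name `unattributed_paths`:

```python
def unattributed_paths(step10_diff_paths: list[str],
                       implementation_diff_paths: list[str],
                       workflow_artifacts: list[str],
                       ledger_applied_sites: list[str]) -> list[str]:
    """Return paths changed during Steps 6-9 that no review hook claimed."""
    # Paths that appeared after the Step 5 implementation snapshot.
    hook_introduced = set(step10_diff_paths) - set(implementation_diff_paths)
    # Workflow-owned in-session state is never flagged.
    hook_introduced -= set(workflow_artifacts)
    # Whatever the ledger does not cover is unattributed.
    return sorted(hook_introduced - set(ledger_applied_sites))
```

An empty result means commit grouping can start; each path in a non-empty result needs an explicit confirm-or-revert decision first.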

Read references/interactive-commits.md and follow the procedure from top to bottom — it is the single canonical home for Step 10's procedure body. The Approval token closed list and Localized summary tokens below stay defined in this file and are referenced from both that procedure and other Steps.

Approval token closed list (per § No-Stall Principle's "do not rely on exact-phrase matching" rule). The example phrases below are illustrative, not literal discriminators — categorize each user response into one of the four buckets via semantic judgment. When presenting an approval gate, include at least one short-form token from the accept bucket (e.g., "OK", "LGTM", "next") so users know brief responses are valid.

accept: explicit affirmative — "OK" / "approve" / "next" / "LGTM" / "コミットして" / "進めて" / "いいよ" or any semantic equivalent
adjust: specific revision request — "subject を ... に" / "this file should be in commit 2" / "split this commit" / any other concrete change demand
cancel / stop: explicit halt — "stop" / "abort" / "やめる" / "中断"
NOT approval: interrogative or non-committal — "look good?" / "どう？" / "これでいい？" / "OK ？". Treat as adjust and re-present (do not silently advance)

Localized summary tokens (per references/plan-format.md § Localization granularity). These tokens are defined here as the single source of truth — § Completion below references the same paired form rather than re-rendering it:

language: ja: Step 10 部分完了: <N>/<total> コミット適用済み
language: en: Step 10 partial completion: <N>/<total> commits landed

§ Completion below emits the localized token whenever Step 10 ended via Mid-loop cancel (see references/interactive-commits.md § Mid-loop cancel). On a normal completion path (every commit landed, or the Mid-loop adjust un-landed-drops-to-zero / merge-absorbs-into-landed branches — see references/interactive-commits.md § Mid-loop adjust — closed-list branches), no partial-state line is needed.
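As a small illustration of the paired token above, the partial-completion line could be rendered like this. The function name and the `ended_via_cancel` flag are hypothetical; the two template strings are the tokens defined in this section.

```python
from typing import Optional

PARTIAL_COMPLETION = {
    "ja": "Step 10 部分完了: {n}/{total} コミット適用済み",
    "en": "Step 10 partial completion: {n}/{total} commits landed",
}


def partial_completion_line(language: str,
                            landed_count: int,
                            total: int,
                            ended_via_cancel: bool) -> Optional[str]:
    """Return the localized line, or None on a normal completion path."""
    if not ended_via_cancel:
        return None  # No partial-state line is needed.
    return PARTIAL_COMPLETION[language].format(n=landed_count, total=total)
```

Because landed_count is initialized to 0 on entry to Step 10, the line is well-defined even when the cancel happens before any commit lands.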

Step 11: Update Rules

Confirm remaining steps (USER APPROVAL GATE — when confirm_remaining_steps: true, or when fast_mode_active). This gate is the single source of truth for the § No-Stall Principle's "Step 11 confirm-remaining-steps entry gate" bullet and for § Configuration's confirm_remaining_steps bullet. Present this gate when confirm_remaining_steps: true or fast_mode_active; otherwise (the default false, or a non-boolean that fell back to false, and --fast was not passed) skip this gate entirely and proceed to sub-step 1 — Step 11 / 11.5 / 11.6 run unconditionally as before. When presented, before running sub-step 1, ask in the resolved language whether to run the remaining rule-maintenance and retrospective steps — Step 11 (Update Rules), Step 11.5 (Self-Retrospective), Step 11.6 (Workability Retrospective), listing only the steps actually registered this run — or skip them and go straight to Completion. Paired bilingual sample (runtime rendering demonstration):

language: ja: 残りのステップ（<登録済みステップを列挙>）を実施しますか？実施するなら「進める」、スキップして完了処理（Completion）へ移るなら「スキップ」と返してください。
language: en: Run the remaining steps (<list the registered steps>)? Reply "proceed" to run them, or "skip" to go straight to Completion.

Classify the reply by semantic judgment per § No-Stall Principle's "do not rely on exact-phrase matching" rule (the example tokens are illustrative, not literal discriminators):

proceed — affirmative ("proceed" / "進める" / "yes" / "実施" or any equivalent): the gate touches no state and falls through to sub-step 1 (rule extraction — the rule-extraction-active gate / shared session scan entry), exactly as the un-gated default.
skip — decline ("skip" / "スキップ" / "不要" or any equivalent): mark the closed set {Step 11 (this step)} ∪ {Step 11.5 if registered} ∪ {Step 11.6 if registered} completed without running their procedures (Step 11 is always registered; Step 11.5 only when self_retrospective.feedback is set; Step 11.6 only when workability_retrospective.enabled: true). This is an intended skip, not an unrun-step bug — the Phase-boundary self-audit (§ Step 1 registration mechanics) treats these gate-marked rows exactly as it treats the difficulty-skip matrix's pre-completed rows (see Step 6 / 6.5 / 7.5's difficulty exception). Because sub-step 3's entry is bypassed, establish its cross-step variables at their initial values — compaction_applied_count = 0, below_threshold_failed_files = [] — so § Completion's compaction reminder reads them well-defined and is omitted. landed_count is not this gate's concern (Step 10 owns its lifecycle; the gate never touches it). The shared session scan (references/session-scan.md) is not dispatched on skip — with Step 11 / 11.5 / 11.6 all skipped, no participating step reaches its dispatch point, so session_scan_dispatched / session_scan_result stay at their Step 2-entry init and Completion (which does not read them) is unaffected. Emit a one-line skip note in the resolved language at this gate (immediately before Completion), so the skip is never silent (language: ja: confirm_remaining_steps: ユーザー選択により <登録済みステップ> を skip しました; language: en: confirm_remaining_steps: skipped <registered steps> per user choice — listing only the registered steps), then proceed to Completion.
ambiguous (interrogative or non-committal — "どっち？" / "which?" / any non-decision): re-present the gate; do not silently pick a branch.

This gate emits no § User-gate summary preamble — it is a binary proceed / skip prompt with no structured content (see references/plan-format.md § User-gate summary preamble).
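The gate's presentation rule and its skip branch can be sketched as below. The function names are hypothetical, and the two boolean inputs to `apply_skip` are assumptions that stand for "self_retrospective.feedback is set" and "workability_retrospective.enabled: true". Classifying the user's reply stays a semantic judgment and is not modeled here.

```python
def gate_is_presented(confirm_remaining_steps: object,
                      fast_mode_active: bool) -> bool:
    """Present the gate only for a literal True setting or in fast mode.

    A non-boolean confirm_remaining_steps falls back to false.
    """
    return (confirm_remaining_steps is True) or bool(fast_mode_active)


def apply_skip(self_retro_registered: bool,
               workability_registered: bool) -> tuple[list[str], dict, str]:
    """Return (steps marked completed, initialized state, skip note)."""
    steps = ["Step 11"]  # Step 11 is always registered.
    if self_retro_registered:
        steps.append("Step 11.5")
    if workability_registered:
        steps.append("Step 11.6")
    # Sub-step 3 is bypassed, so its variables take their initial values.
    state = {"compaction_applied_count": 0, "below_threshold_failed_files": []}
    note = ("confirm_remaining_steps: skipped "
            + ", ".join(steps) + " per user choice")
    return steps, state, note
```

Note that landed_count is deliberately absent from the initialized state: Step 10 owns it and the gate never touches it.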

Rule extraction via the shared session scan → Skill(extract-rules) --apply-conversation-candidates. The prose coding-rule axis (.claude/rules/) is the rule-extraction axis of the shared conversation scan (references/session-scan.md): the shared scan produces a --- RULE-CANDIDATES --- block (the C4-equivalent candidates per references/rule-extraction-axis.md), and this sub-step hands that block to extract-rules Conversation Candidate Apply Mode (Step C5 only — dedup / route / write / promote / .examples.md / Security Self-Check; no jsonl re-parse). This is the apply half of a scan/apply split (the shared scan ingests the large session text once for all axes).
- rule-extraction-active gate: rule-extraction is inactive if (a) any entry in hooks.on_complete (as resolved in Step 1) contains the string extract-rules (direct invocation), OR (b) Step 9 executed at least one hook and the output produced by Step 9's hook invocations (visible in this session's context) contains evidence that extract-rules --from-conversation ran this session (sufficient signal: output contains staged_count or promoted_count). When inactive, skip the rule-extraction work entirely — do not dispatch the shared scan on rule-extraction's behalf and do not call extract-rules. This single gate suppresses both the conversation-derived extraction paths (the apply-only path below and its standalone fallback), so the staging double-count defense is preserved: running a conversation-derived extraction twice against one session (once via a hook's --from-conversation, once via this sub-step) would make the staged-promotion mechanism (1st-observation → 2nd-observation escalation) miscount one session as two independent observations and prematurely promote staging candidates. When inactive but a retrospective axis is registered, Step 11 abstains from dispatching and Step 11.5 / 11.6 dispatch the shared scan for their own axes (references/session-scan.md § Dispatch-once contract).
- When rule-extraction-active (Step 11 is the earliest participating step, so this is the shared scan's first dispatch point): resolve the session jsonl via the shared resolution procedure (references/self-retrospective.md §1.4 / references/workability-retrospective.md §1.3 — identical), then follow references/session-scan.md § Dispatch-once contract — if session_scan_dispatched == false, dispatch the shared scan for the still-active axes (rule-extraction ∪ any registered retrospective axes), set session_scan_dispatched = true, store the raw return in session_scan_result; then consume the rule-extraction block per § Consuming a block. Thread the Step 2-resolved subagent_model and the resolved language into the scan (references/session-scan.md § Inputs sets the scan subagent's Agent model from subagent_model, omitted when inherit). Disposition of the consumed block:
  - Well-formed block → write it verbatim to .claude/plans/<slug>.rule-candidates.md (<slug> = the run's plan slug, the same value § Completion's cleanup resolves; a workflow artifact — see § Workflow artifacts) and invoke Skill(extract-rules) --apply-conversation-candidates .claude/plans/<slug>.rule-candidates.md.
  - Whole-scan Status: ERROR (jsonl unreadable / unparseable) → treat rule-extraction as skipped — no fallback, because a standalone re-scan would fail the same parse.
  - Per-axis malformed / missing block (the jsonl parsed fine — other axes' blocks are present — but the --- RULE-CANDIDATES --- block is absent or malformed) → fall back to standalone Skill(extract-rules) --from-conversation (re-extract against the known-readable jsonl; the standalone C1–C5 path is untouched by this change). This per-axis fallback is the one asymmetry vs the retrospective axes' plain skipped (rule-extraction has a standalone fallback worker — references/session-scan.md § Consuming a block).
  - Zero candidates (Candidates: 0) → nothing to apply; proceed (success no-op).
Return-point no-stall reminder: after Skill(extract-rules) --apply-conversation-candidates (or the standalone --from-conversation fallback) returns — regardless of outcome (rules applied, nothing to apply, promoted / staged counts, any non-error result) — issue the next action (sub-step 2) in the next tool call. Do not insert an interstitial summary or "shall I proceed?" turn. See § No-Stall Principle.
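The rule-extraction-active gate and the disposition of the consumed block in sub-step 1 can be sketched as two pure functions. This is an illustrative reading, not the skill's code: the function names, the plain-string `step9_output`, and the returned action labels are all hypothetical.

```python
def rule_extraction_active(on_complete_entries: list[str],
                           step9_ran_hook: bool,
                           step9_output: str) -> bool:
    """Return False when a hook already covers conversation extraction."""
    # (a) a configured hook invokes extract-rules directly.
    if any("extract-rules" in entry for entry in on_complete_entries):
        return False
    # (b) Step 9 hook output shows extract-rules --from-conversation ran.
    if step9_ran_hook and ("staged_count" in step9_output
                           or "promoted_count" in step9_output):
        return False
    return True


def block_disposition(scan_errored: bool,
                      block_well_formed: bool,
                      candidate_count: int = 0) -> str:
    """Map the consumed rule-candidates block to the next action."""
    if scan_errored:
        return "skip"  # A standalone re-scan would fail the same parse.
    if not block_well_formed:
        return "fallback-from-conversation"
    if candidate_count == 0:
        return "no-op"
    return "apply-candidates"
```

The gate runs first; `block_disposition` only matters when rule extraction is active and the shared scan has been dispatched.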
Skill(extract-rules) with --update. Skip if any conversation-derived rule extraction ran this session: if such an extraction happened at any point this session (sub-step 1's apply-only path applied a block, its standalone --from-conversation fallback fired, or a hook ran extract-rules --from-conversation, i.e. sub-step 1's rule-extraction-active inactive case (b)), skip this sub-step. Running --update immediately after a conversation-derived extraction risks prematurely promoting staging candidates that were just created, before they accumulate a second observation. Trigger (when not skipped): significant structural/pattern changes to application code occurred, such as new frameworks, libraries, architectural patterns, or API conventions introduced in the diff; prose-only changes to SKILL.md, agent definitions, references, or rule files do not qualify. A dependency major-version bump alone (no implementation code changes in the diff) does not trigger --update; the major-bump signal (detected via git diff <base-commit> of the package manifest, the same signal used in the Step 2 difficulty assessment) instead triggers the extract-rules Update Mode operational note: surface it to the user as a reminder to run --update after the session and to manually review .examples.md samples that may have gone stale after the bump.
Char-count compaction gate:

Skip condition: If compact_rules is not true (i.e. the default false, or any non-boolean value that fell back to false), skip this entire sub-step — do not invoke Skill(extract-rules) --compact, do not open the Step 11 compaction approval gate, and proceed directly to sub-step 4 (variable initialization is not skipped — it is governed by the State-variable contract below, which covers the skipped case). Emit a one-line informational note in the resolved language so the user has a visible signal that compaction is intentionally not running:
- language: ja: Step 11 sub-step 3（圧縮）を skip しました — `compact_rules: true` が設定されていません（実験的機能 / デフォルト無効）
- language: en: Step 11 sub-step 3 (compaction) skipped — `compact_rules: true` is not set (experimental feature / disabled by default)
State-variable contract (cross-step declaration — § Completion reads both variables; the full 4-point lifecycle is specified in references/update-rules.md § Char-count compaction gate): at sub-step 3 entry, initialize compaction_applied_count = 0 and below_threshold_failed_files = []. When the skip condition above fired, both variables simply stay at these initial values (no advance ever runs), so § Completion's reads are well-defined and its compaction reminder is omitted.

When not skipped (compact_rules: true): read references/update-rules.md and follow § Char-count compaction gate from top to bottom — it is the single canonical home for this sub-step's procedure body, including the Step 11 compaction approval gate (USER APPROVAL GATE).
If extract-rules is unavailable: before skipping, save any reusable patterns or insights that surfaced during the workflow to .claude/plans/rules-candidates-<YYYY-MM-DD>.md (append if the file already exists) so the knowledge is not silently lost and can be handed off to a later manual Skill(extract-rules) run. Inform the user that extract-rules is unavailable and point to the saved candidates file, and append extract-rules unavailable (Step 11) to bundle_skills_unavailable (§ Step 1 sub-step 3's "Initialize the bundle-unavailability ledger here" bullet).
Commit rule updates (USER APPROVAL GATE): run this only when interactive_commits is true and there are uncommitted changes under any of extract-rules' three output directories — output_dir (default .claude/rules/), examples_output_dir (default .claude/rules-extras/), and staging_output_dir (default .claude/rules-staging/) — each resolved from .claude/extract-rules.local.md's frontmatter when that field is set, else its default. These are typically the rule files, .examples.md files, and staged 1st-observation candidates Skill(extract-rules) just wrote in sub-steps 1–3, including any accepted compaction edits from sub-step 3. Detect via git status --porcelain=v1 --untracked-files=all -z filtered to the union of the three resolved directories, with the § Workflow artifacts set subtracted (the default dirs are disjoint from the workflow-artifact paths under .claude/plans/ and the backlog dir, but subtract explicitly so a non-default config that points an output dir at a workflow-artifact location stays safe; a project that gitignores staging_output_dir / examples_output_dir naturally excludes those, since gitignored paths never appear in the porcelain output). If interactive_commits is false or no such changes exist, skip this sub-step entirely (no-op — the Completion rule-update / examples-dir / staging-dir reminders cover any remaining uncommitted changes). Otherwise propose a single commit of the uncommitted changes across the three directories:
- Present / Stage / Commit / retry / post-commit auto-modify follow references/interactive-commits.md's Per-commit loop a. Present / c. Stage / d. Commit / e. Non-zero-exit retry steps and its Post-commit auto-modify cycle bound, with two scope differences from Step 10's loop: (i) the diff base is HEAD (Step 10 already committed the production code), so render each tracked rule file's diff via git diff HEAD -- <path> and each untracked rule file via Read; and (ii) this commit does not increment landed_count — that counter tracks Step 10 production commits only, and § Completion's decomposition-resume routing reads it; bumping it here would mis-route a decomposition-subtask run whose Step 10 landed zero production commits into the "already committed by Step 10" branch. Because the pathspec is restricted to the three extract-rules output directories, any production-code changes the user declined at Step 10 stay in the working tree and are not swept into this commit. The gate's user-facing framing (the commit presentation and the accept / adjust / cancel prompt) is rendered in the resolved language — same as the Step 10 commit gates (see § Configuration's language bullet); the commit subject / body and git diff output stay verbatim.
- Output-class labeling in Present: label each proposed file by its extract-rules output class — confirmed rule change, example, or unreviewed 1st-observation candidate — classifying each file by the output-class filename suffix defined canonically at § Completion's "Step 11 extract-rules output reminders" paragraph (so a rule file under a collapsed examples_output_dir == output_dir is labeled a rule change, not an example). Staging candidates additionally carry the note that extract-rules stages them on first observation and normally promotes them to output_dir on a later re-observation rather than committing them as-is. The label lets the user make an informed accept / adjust(exclude) / cancel choice and is the safeguard — together with the USER APPROVAL GATE itself — against silently committing unreviewed staging candidates.
- Draft the subject in the project's conventional commit style (reuse the style deduced at Step 10's Deduce commit style; if Step 10 was skipped this run, run git log -n 10 --format=%s to deduce it).
- Judge the user response per § Approval token closed list: accept → stage with an explicit pathspec spanning the changed paths across the three resolved directories and commit; adjust → revise the subject/body or narrow the committed file set by omitting paths from the pathspec (e.g. drop the staging_output_dir files to keep them reminder-only) — the git checkout HEAD -- <path> exclusion remains forbidden across the whole widened pathspec, not just output_dir, because it destroys uncommitted changes (per references/interactive-commits.md § Propose commit plan's exclusion rule); then re-present; cancel → leave the changes uncommitted and proceed.
- This is a new Step 11 sub-step rather than part of Step 10's commit loop because Skill(extract-rules) runs after Step 10, so the rule files do not yet exist when Step 10's loop runs.
Return-point no-stall reminder: at the gate decision (accept / adjust resolution / cancel — any non-error result), the next action (sub-step 6) must be issued in the next tool call. Do not insert an interstitial summary or acknowledgment turn. See § No-Stall Principle.
After the rule-update commit gate above resolves (or was skipped) — regardless of whether new rules were added, the report indicated nothing changed, or extract-rules was unavailable — mark Step 11: Update Rules as completed and proceed automatically. Per the No-Stall Principle, do not wait for user input.

Step 11.5: Self-Retrospective

Emit a sanitized improvement signal for the dev-workflow-bundle skills (dev-workflow, ask-peer, extract-rules, rules-review) to a user-configured destination. Raw conversation jsonl stays in-session; only abstracted, project-agnostic text leaves.

Skip this step if self_retrospective.feedback is unset/invalid (Step 1 did not register the row). Otherwise read references/self-retrospective.md and follow the procedure from top to bottom — the difficulty assessment does not gate this step. The jsonl scan is performed by the shared session scan: read references/session-scan.md and follow its § Dispatch-once contract. Reaching this dispatch point presupposes Step 11.5's §1 pre-flight (gh-auth / repo accessibility — a runtime gate, distinct from the Step-1 feedback registration gate) already passed; on pre-flight failure Step 11.5 aborts here and Step 11.6 becomes the dispatcher (references/session-scan.md § Dispatch-once contract). At Step 11.5's dispatch point: if session_scan_dispatched is already true (an earlier participating step — Step 11 when rule-extraction-active — already dispatched), consume the self-retrospective block from session_scan_result (§ Consuming a block) — the block Step 11 may have included speculatively before this pre-flight, now validated; if it is false (Step 11 abstained because rule-extraction was inactive), Step 11.5 is the dispatcher — dispatch the shared scan for the still-active axes (self-retrospective ∪ workability when workability_retrospective.enabled), store the return in session_scan_result, and consume the self-retrospective block from it (§ Consuming a block). Thread the Step 2-resolved subagent_model and the resolved language into the shared scan — references/session-scan.md § Inputs sets the scan subagent's Agent model from subagent_model (omitted when inherit).

Step 11.6: Workability Retrospective

Detect this session's project-tooling workability improvements — reusable manual procedures that could become a .claude/skills/<name>/ skill (skill-candidate) and mechanically-enforceable conventions that could become a linter-config / check_commands addition (lint-rule-candidate) — and offer a per-candidate 4-way disposition gate (act now / make a subtask / save to backlog / reject). The detection runs via the shared session scan (references/session-scan.md) — the same single jsonl parse that serves Step 11 (rule-extraction) and Step 11.5; raw conversation stays in-session. This is the project-tooling retrospective axis, distinct from Step 11's prose-rule axis (extract-rules) and Step 11.5's bundle-skill axis (self-retrospective).

Skip this step if workability_retrospective.enabled is not true (Step 1 did not register the row). Otherwise read references/workability-retrospective.md and follow the procedure from top to bottom — the difficulty assessment does not gate this step (mirrors Step 11.5). The jsonl scan is the shared session scan: read references/session-scan.md and follow its § Dispatch-once contract. At this step's dispatch point, if session_scan_dispatched is already true (an earlier participating step — Step 11 and / or Step 11.5 — dispatched the shared scan with the workability axis active), consume the workability block from session_scan_result (§ Consuming a block); if it is false (no earlier participating step dispatched — Step 11 abstained because rule-extraction was inactive, and Step 11.5 was unregistered or pre-flight-aborted), dispatch the shared scan for the workability axis and consume its block. When dispatching, thread the Step 2-resolved subagent_model and the resolved language into the shared scan — references/session-scan.md § Inputs sets the scan subagent's Agent model from subagent_model (omitted when inherit).
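The dispatch-once behaviour shared by Steps 11, 11.5, and 11.6 can be sketched as follows. This is a minimal illustration, not the contract itself: the `SharedScan` class, the `run_scan` callable, and the dict-of-blocks return shape are hypothetical, and pre-flight gating and subagent_model / language threading are left out.

```python
from typing import Callable, Dict, Optional, Set


class SharedScan:
    """Hypothetical holder for session_scan_dispatched / session_scan_result."""

    def __init__(self, run_scan: Callable[[Set[str]], Dict[str, str]]) -> None:
        self._run_scan = run_scan  # stand-in for dispatching the scan subagent
        self.session_scan_dispatched = False
        self.session_scan_result: Dict[str, str] = {}

    def block_for(self, axis: str, still_active_axes: Set[str]) -> Optional[str]:
        """Dispatch on first use; later steps only consume the stored result."""
        if not self.session_scan_dispatched:
            # The first participating step scans for every still-active axis.
            self.session_scan_result = self._run_scan(set(still_active_axes))
            self.session_scan_dispatched = True
        # A missing key models an absent per-axis block.
        return self.session_scan_result.get(axis)
```

Whichever step reaches its dispatch point first pays for the single jsonl parse; every later step reads its own block from the stored result, which is why the set of axes passed by later callers has no effect.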

Completion

Derived staging artifact cleanup: before reporting summary, delete any per-agent staging documents under .claude/plans/ that dispatched review subagents generated this run (files matching <slug>-agent-*.md, where <slug> is the run's plan slug — on the visual / crit paths, the slug established at Step 4 sub-step 2 path (b); on the plan_review_gate: "plan-mode" path, the Plan Mode plan file's basename), plus the Step 4 visual-gate served plan file, its comments file, and its prev snapshot (<slug>.plan-review.md / <slug>.plan-review.comments.json / <slug>.plan-review.prev.md) when the visual gate ran this session, plus the Step 11 rule-extraction candidate file (<slug>.rule-candidates.md) when rule-extraction ran this session — all are commit-excluded but, unlike the Web-routine staging artifacts whose workspace is torn down, the visual gate is local-only (persistent working tree) so they would otherwise accumulate as untracked noise. Delete the staging files in two separate rm -f commands — fixed-name files first, the agent-staging glob last: rm -f .claude/plans/<slug>.plan-review.md .claude/plans/<slug>.plan-review.comments.json .claude/plans/<slug>.plan-review.prev.md .claude/plans/<slug>.rule-candidates.md then rm -f .claude/plans/<slug>-agent-*.md || true. Isolating the glob in its own trailing command is load-bearing: under zsh's nomatch an unmatched glob aborts the command it sits in — and -f suppresses only rm's own missing-file error, not the shell's expansion failure — so a single combined rm -f <glob> <fixed-names> would skip every fixed-name deletion whenever no agent file matches. The || true keeps the exit clean when nothing matches (zsh still prints a harmless no matches found; bash passes the unmatched glob through literally). Both commands stay covered by the existing Bash(rm -f .claude/plans/*) permission, so no new tool grant is needed. 
Do not delete the main plan document (<slug>.md) or any decomposition state file — those are canonical workflow artifacts that Step 1.5 / --resume depend on.

Report summary: tasks completed, files modified, test results, review outcomes, rules updated. Output in the resolved language following references/plan-format.md § Localization granularity.

Difficulty-skip reminder (per references/plan-format.md § Localization granularity): when difficulty_skipped_steps (initialized at Step 2 entry, populated by Step 2's Adjust N by difficulty) is non-empty, surface a line in the resolved language naming the steps the difficulty-skip matrix skipped, so the skip is never silent. Render the recorded steps with their tier; the example below pairs the two language values:

language: ja: 難易度判定（<tier> tier）により <steps> を skip しました — 例: 難易度判定（Trivial tier）により Step 6 Tidy / Step 6.5 Polish Prose / Step 7.5 Rules Compliance Review を skip しました
language: en: Skipped <steps> per the difficulty-skip matrix (<tier> tier) — e.g. Skipped Step 6 Tidy / Step 6.5 Polish Prose / Step 7.5 Rules Compliance Review per the difficulty-skip matrix (Trivial tier)

The reminder is omitted when difficulty_skipped_steps is empty (Moderate / Complex tasks, or -i-skipped Adjust N runs — see the Step 2-entry init). The step names (Step 6 Tidy / Step 6.5 Polish Prose / Step 7.5 Rules Compliance Review) stay verbatim regardless of language. Trivial and Simple skip the same three steps (see Step 2's Adjust N by difficulty) — the example above applies to both, substituting the assessed tier.

Fast-mode-skip reminder (paired with the difficulty-skip reminder above, per the warning-string differentiation rule — a separate ledger keeps a fast-mode-caused skip from being misread as a difficulty-driven one): when fast_mode_skipped_steps (initialized at Step 2 entry, populated by --fast's N-forcing and Step 6.5-only skip paragraphs) is non-empty, surface a line in the resolved language naming the steps --fast skipped:

language: ja: fast モードにより <steps> を skip しました — 例: fast モードにより Step 3 Plan Review / Step 6.5 Polish Prose を skip しました
language: en: Skipped <steps> per fast mode — e.g. Skipped Step 3 Plan Review / Step 6.5 Polish Prose per fast mode

The reminder is omitted when fast_mode_skipped_steps is empty (--fast not passed, or a Trivial-tier run where fast mode had nothing left to force). The step names stay verbatim regardless of language.

Bundle-skill availability reminder (per references/plan-format.md § Localization granularity): when bundle_skills_unavailable (declared at Step 1 sub-step 3's "Initialize the bundle-unavailability ledger here" bullet, appended at the sites named there) is non-empty, surface a line in the resolved language naming which dev-workflow-bundle sibling skills were unavailable this run, so a partially-installed bundle is never silently missed run after run:

language: ja: dev-workflow-bundle の一部スキルが今回の実行で利用できませんでした: <list>。`dev-workflow-bundle` プラグインが完全にインストールされているか確認してください。
language: en: Some dev-workflow-bundle sibling skills were unavailable this run: <list>. Check whether the `dev-workflow-bundle` plugin is fully installed.

Render <list> as the ledger's recorded entries verbatim, comma-separated (the skill names and step labels stay verbatim per § Localization granularity's "file-internal identifiers" rule; only the surrounding connective sentence is localized). The reminder is omitted entirely when bundle_skills_unavailable is empty — the common case where the bundle is fully installed.

Step 10 partial-state line: if Step 10 ended via its Mid-loop cancel branch (see references/interactive-commits.md § Mid-loop cancel), emit the localized partial-completion token defined at § Step 10's "Localized summary tokens" paragraph. On a normal completion path, omit this line.

Step 11 extract-rules output reminders (division of labor): Step 11's "Commit rule updates" gate proposes committing changes across all three extract-rules output directories (output_dir / examples_output_dir / staging_output_dir). Resolve those three directories once (per the gate's resolution) and run a single git status --porcelain=v1 --untracked-files=all -z scan at Completion, partitioning its output by directory into uncommitted_rule_changes (output_dir, default .claude/rules/) / uncommitted_examples_changes (examples_output_dir, default .claude/rules-extras/) / uncommitted_staging_changes (staging_output_dir, default .claude/rules-staging/). The scan's scope is the three-dir union (a coarse filter — paths under none of the three resolved dirs are ignored); within that scope, assign each changed path to exactly one set in two stages. (1) By directory membership — a path under exactly one of the three resolved dirs goes to that dir's set; the default (disjoint) config resolves entirely here, identical to prior behavior. (2) Filename-class tie-break, applied only when a path matches more than one resolved dir — possible when two dirs resolve to the same path or one nests under another (extract-rules permits examples_output_dir / staging_output_dir to be set to output_dir to opt into auto-load) — classify by extract-rules' output-class filename suffix, not by directory order: a basename ending in .examples.md → uncommitted_examples_changes; the staging file project.staging.local.md (.staging.local.md suffix — test this before the general .md rule fallback, since it also ends in .local.md) → uncommitted_staging_changes; every other .md → uncommitted_rule_changes. 
Filename-class decides the tie because once two dirs collapse to one path the directory can no longer tell a rule file from an example / staging file — a directory-order tie-break would misroute a rule file under a collapsed examples_output_dir == output_dir into the examples set and wrongly suppress the rule-update reminder. (extract-rules' examples files always end .examples.md with no .local.md variant, so the three suffixes stay mutually exclusive under this ordering.) Either stage lands every path in exactly one set (no double-count, no doubled reminder). The three reminders below read these partitioned sets, none re-resolves or re-scans. Each fires only for residue in its own directory that the gate proposed but the user left uncommitted (adjust-excluded, cancel-ed, or interactive_commits: false so the gate never ran); since the sets reflect the current working-tree state, an accepted gate commit naturally clears the corresponding reminder.
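The two-stage assignment described above can be sketched as a small function. This is an illustration under assumptions: the function and key names are hypothetical, the input is assumed to be the already-parsed list of changed repo-relative paths, and routing a non-.md file that hits the tie-break to the rules set is a guess, since the text only specifies .md files.

```python
from pathlib import PurePosixPath
from typing import Dict, List


def _is_under(path: str, directory: str) -> bool:
    """Component-wise containment, so .claude/rules-extras/ is not under .claude/rules/."""
    p, d = PurePosixPath(path), PurePosixPath(directory)
    return d == p or d in p.parents


def classify_by_suffix(path: str) -> str:
    """Stage 2: extract-rules' output-class filename suffix decides."""
    name = PurePosixPath(path).name
    if name.endswith(".examples.md"):
        return "examples"
    if name.endswith(".staging.local.md"):  # tested before the general fallback
        return "staging"
    return "rules"  # every other .md (non-.md handling is an assumption)


def partition(
    changed_paths: List[str],
    output_dir: str = ".claude/rules/",
    examples_output_dir: str = ".claude/rules-extras/",
    staging_output_dir: str = ".claude/rules-staging/",
) -> Dict[str, List[str]]:
    dirs = {"rules": output_dir, "examples": examples_output_dir, "staging": staging_output_dir}
    sets: Dict[str, List[str]] = {"rules": [], "examples": [], "staging": []}
    for path in changed_paths:
        matches = [key for key, d in dirs.items() if _is_under(path, d)]
        if not matches:
            continue  # outside the three-dir union: ignored
        if len(matches) == 1:
            sets[matches[0]].append(path)  # stage 1: directory membership
        else:
            sets[classify_by_suffix(path)].append(path)  # stage 2: tie-break
    return sets
```

Containment is checked on path components rather than by string prefix, because the default directory names share the `.claude/rules` prefix and a naive `startswith` would collapse all three defaults into one.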

Step 11 rule-update reminder (per references/plan-format.md § Localization granularity): uncommitted_rule_changes is the partitioned set for output_dir (default .claude/rules/); it is also read by the compaction reminder and the decomposition-resume note below. When uncommitted_rule_changes is non-empty, surface a manual-commit reminder in the resolved language (<N> = number of uncommitted rule files):

language: ja: `.claude/rules/` に未コミットの変更が <N> 件あります — PR を開く前に手動で commit してください
language: en: <N> uncommitted change(s) under `.claude/rules/` remain — please commit manually before opening a PR

The reminder is omitted when uncommitted_rule_changes is empty — including the case where Step 11's "Commit rule updates" gate already committed the rule changes (interactive_commits: true, gate accepted). When interactive_commits: false the gate never ran, so the rule changes stay uncommitted and the reminder fires as before (backward-compatible).

Step 11 examples-dir reminder: when uncommitted_examples_changes (the partitioned set for examples_output_dir, default .claude/rules-extras/) is non-empty, surface a reminder in the resolved language (<N> = number of uncommitted example files, <examples_dir> = the resolved directory):

language: ja: `<examples_dir>` に未コミットの extract-rules examples が <N> 件あります — PR を開く前に手動で commit してください
language: en: <N> uncommitted extract-rules example file(s) under `<examples_dir>` remain — please commit manually before opening a PR

The reminder is omitted when uncommitted_examples_changes is empty.

Step 11 staging-dir reminder: when uncommitted_staging_changes (the partitioned set for staging_output_dir, default .claude/rules-staging/) is non-empty, surface a reminder in the resolved language (<N> = number of uncommitted staging files, <staging_dir> = the resolved directory). The message keeps the promote-review framing — staged entries are 1st-observation candidates normally promoted to .claude/rules/ on a later re-observation rather than adopted as-is, and the localized suffix notes they were also committable at the gate:

language: ja: `<staging_dir>` に未レビューの extract-rules 候補が <N> 件あります — 手動で確認し、採用するものを `.claude/rules/` へ promote してください（またはゲートで commit 可能でした）
language: en: <N> extract-rules candidate(s) under `<staging_dir>` await review — inspect and promote accepted files to `.claude/rules/` manually (or commit them at the Step 11 gate)

The reminder is omitted when uncommitted_staging_changes is empty.

Step 11 compaction reminder (per references/plan-format.md § Localization granularity): this block has two independent clauses.

(i) Commit clause — when compaction_applied_count > 0 (the Step 11 sub-step 3 char-count compaction gate landed user-accepted edits) and uncommitted_rule_changes (the output_dir partition from § Step 11 extract-rules output reminders) is non-empty — i.e. the compaction edits were not committed by Step 11's "Commit rule updates" gate (compaction edits live under .claude/rules/, so an accepted rule-update commit stages them along with the other rule changes) — surface a separate manual-commit reminder in the resolved language (rendered in file-unit count, distinct from the rule-update reminder above which counts uncommitted rule files):

language: ja: Step 11 で <N> 件のルールファイルを圧縮しました — PR を開く前に手動で commit してください
language: en: Step 11 compacted <N> rule files — please commit manually before opening a PR

This commit clause is omitted when compaction_applied_count == 0 OR uncommitted_rule_changes is empty (the latter means the compaction edits are already committed). The uncommitted_rule_changes-non-empty test is a coarse proxy for "the compaction edits are still uncommitted": a partial adjust at the rule-update gate that committed the compacted files but left other rule files uncommitted can over-fire this clause — a harmless redundant nudge, not a data-loss path.

(ii) Below-threshold follow-up — unconditional on commit state (it concerns re-running compaction, not committing): when below_threshold_failed_files is non-empty, surface a follow-up reminder naming the files that remain over threshold. <files> always renders at the sentence tail so the block-level list never appears mid-sentence:

language: ja: <M> 件のファイルが閾値を超えています。手動で再度 `Skill(extract-rules) --compact` を実行するか、当該ファイルを直接編集してください: followed by <files> on the next line
language: en: <M> files still exceed the threshold. Re-run `Skill(extract-rules) --compact` manually or edit the files directly: followed by <files> on the next line

Render <files> as one path per line — verbatim from files_processed[].path (repo-root-relative, e.g. .claude/rules/project.rules.local.md; never rewritten to user-absolute /Users/... form) — each prefixed with - (hyphen + space, no leading indent) directly below the reminder sentence as a top-level markdown bullet list. This applies for any M ≥ 1 — single-element lists render as a one-bullet list, not inline, so the layout is identical across runs and the trailing prose clause never floats after the bullet list.

The compaction reminder block is omitted entirely when both clauses are omitted — i.e. when the commit clause does not fire (compaction_applied_count == 0 OR uncommitted_rule_changes empty) AND below_threshold_failed_files is empty.
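The firing conditions of the two clauses, and the bullet rendering of <files>, can be written out as a short sketch. The function names and clause labels are hypothetical; only the three state variables come from the text above.

```python
from typing import List


def compaction_reminder_clauses(
    compaction_applied_count: int,
    uncommitted_rule_changes: List[str],
    below_threshold_failed_files: List[str],
) -> List[str]:
    """Return the clauses that fire; an empty list means the block is omitted."""
    clauses = []
    # (i) Commit clause: compaction landed edits AND rule changes are still uncommitted.
    if compaction_applied_count > 0 and uncommitted_rule_changes:
        clauses.append("commit")
    # (ii) Below-threshold follow-up: independent of commit state.
    if below_threshold_failed_files:
        clauses.append("below-threshold")
    return clauses


def render_failed_files(below_threshold_failed_files: List[str]) -> str:
    """Render <files> as a top-level bullet list, one verbatim path per line."""
    return "\n".join(f"- {path}" for path in below_threshold_failed_files)
```

A single failed file still renders as a one-bullet list, matching the rule that the layout stays identical for any M ≥ 1.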

If this run was executing a subtask from a decomposition state file, also do the following (all reads/writes target the canonical state-file path recorded in Step 1.5):

Execution-time deferral/exclusion gate: before marking the subtask as completed, check whether any in-scope work items were excluded, deferred, or discovered as unassigned during implementation or testing. Items recorded only in prose (Risks entries, inline notes) are invisible to --resume and will be silently skipped — each such item must be promoted to a tracked subtask entry in the state file before completion is declared. For each uncovered item, get user approval on one of: (a) add as a new pending subtask with a depends_on link if sequencing matters, (b) fold into an existing pending subtask's scope, or (c) explicitly accept as permanently out of parent-task scope. The completion report must confirm that no goal-required items remain in untracked prose form.

Mark the current subtask's status as completed in the canonical state file and write back
Ask the user for an optional PR URL for this subtask. On a non-empty answer, set the subtask's pr field and write back; otherwise leave it null
Refresh the parent-task progress row's <done>/<total> count
Find the next runnable subtask (smallest-id pending with all depends_on completed)
If a next subtask exists: branch on whether Step 10 actually landed any commits this run (use the landed_count from Step 10 — taking the config flag alone would mis-route the case where interactive_commits: true met the Step 10 skip conditions and exited at zero commits):
- landed_count > 0: tell the user the current subtask's changes have already been committed by Step 10 — open a PR for those commits, then start a new session with /dev-workflow --resume <slug> once the PR is up
- landed_count == 0 (either because interactive_commits: false or because Step 10 was skipped): tell the user to commit the current subtask's changes and open a PR before resuming, then start a new session with /dev-workflow --resume <slug>. Explain why this matters: the next run records a fresh base-commit from HEAD, so uncommitted changes would leak into the next subtask's diff.
In both branches, if any of extract-rules' three output directories (output_dir / examples_output_dir / staging_output_dir) have uncommitted residue (i.e., any of the Step 11 rule-update / examples-dir / staging-dir reminders above fired), tell the user to commit those writes manually before resuming; otherwise they leak into the next subtask's diff the same way uncommitted feature changes would. When Step 11's "Commit rule updates" gate already committed all of them (no residue in any of the three dirs), omit this instruction, since the changes are committed and will not leak. (This warning overlaps the per-dir reminders above on the same residue by design: both are prose nudges, no double-count.) The "no push" invariant for both branches is stated at § Step 10's preamble.
If no next subtask exists (all subtasks completed): delete the canonical state file via rm -f <canonical-path>, remove the parent-task progress row, and include every subtask's title and recorded pr (if any) in the parent-task completion summary
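The "next runnable subtask" selection used above (smallest-id pending with all depends_on completed) can be sketched like this. The dict shape is an assumption for illustration: the field names id, status, and depends_on mirror the terms used in this section, but the actual state-file schema is not shown here.

```python
from typing import Dict, List, Optional


def next_runnable_subtask(subtasks: List[Dict]) -> Optional[Dict]:
    """Return the smallest-id pending subtask whose dependencies are all completed."""
    completed_ids = {s["id"] for s in subtasks if s["status"] == "completed"}
    runnable = [
        s
        for s in subtasks
        if s["status"] == "pending"
        and all(dep in completed_ids for dep in s.get("depends_on", []))
    ]
    # None signals that no subtask is currently runnable.
    return min(runnable, key=lambda s: s["id"], default=None)
```

Note that a None result only means nothing is runnable right now; the "all subtasks completed" branch is a separate check on the statuses, since a pending subtask blocked by an unfinished dependency also yields None.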
