Dev Workflow
Usage
/dev-workflow --init # Project setup (detect check/test commands)
/dev-workflow [-i N | --iterations N] <task> # Execute workflow (default)
/dev-workflow --resume <state-file> [-i N] # Resume next subtask from a decomposition state file
Prerequisites
- Reviewer skill (reviewer setting, default: ask-peer): Required for plan/code review. Supported: ask-peer, ask-claude, ask-codex, ask-gemini, ask-copilot, ask-agy. If a Skill() call for the configured reviewer fails, attempt once more before declaring unavailable. If still unavailable, present the user with three explicit fallback options, each with its own resume semantics: (a) switch to another supported reviewer from the list: re-invoke the current review step with the new reviewer immediately (the original reviewer is not retried); (b) self-review: perform the review inline and advance past the current step (no later retry of the original reviewer); (c) pause at the current gate until the skill is installed: name the specific step where the original reviewer call will be retried once the skill is available. Do not silently advance past a review pass without the user knowing their options.
- rules-review skill: Required for rules compliance review (Step 7.5). If a Skill(rules-review) call fails, attempt once more before declaring unavailable. If still unavailable: skip Step 7.5 with a message that names the fallback (Step 8 reviewer as a lightweight backup) and the resume point (re-run rules-review manually after the session or re-run the workflow once the skill is installed).
- extract-rules skill: Required for rule update. If a Skill(extract-rules) call fails, attempt once more before declaring unavailable. If still unavailable: skip Step 11 with a message that names the fallback (no rule updates this run) and the resume point (invoke extract-rules manually after the session to capture rule changes).
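The "attempt once more before declaring unavailable" rule shared by the three prerequisites can be sketched as follows. This is a minimal illustration only: `SkillUnavailable` and `call_with_one_retry` are hypothetical names introduced here, and the real skill invocation is performed by the agent rather than by Python code.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class SkillUnavailable(Exception):
    """Hypothetical error raised when a skill invocation fails."""


def call_with_one_retry(invoke: Callable[[], T]) -> Optional[T]:
    """Run a skill call, retry exactly once, and return None if both attempts fail."""
    for _attempt in range(2):  # first attempt plus a single retry
        try:
            return invoke()
        except SkillUnavailable:
            continue
    # Both attempts failed: the caller should present the documented
    # fallback options instead of silently advancing.
    return None
```

A `None` result corresponds to the "still unavailable" branch, where the workflow names the fallback and the resume point.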
Configuration
Settings files (YAML frontmatter only, merged across layers):
- ~/.claude/dev-workflow.local.md: User global defaults (lowest priority)
- .claude/dev-workflow.md: Project shared settings (git tracked, team-shared)
- .claude/dev-workflow.local.md: Personal overrides (gitignored, highest priority)
Merge strategy per key type:
- Scalar (reviewer, review_iterations, task_decomposition, interactive_commits, compact_rules, custom_instructions, language): higher layer wins (replaces)
- List (check_commands): append; lower-layer items first, then higher-layer items, duplicates removed (keep first occurrence)
- List-replace (test_commands): higher layer's list replaces lower layer's list as a whole (no item-level merge or dedup). Defaults to ["Skill(run-tests)"] when unset
- hooks: deep-merge at the hooks level; each sub-key (on_complete) is merged as a list (append, deduplicated)
Keys absent from a higher layer inherit from lower layers. Only specify keys you want to override or extend.
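The merge strategy above can be sketched in Python. This is an illustrative sketch, not the skill's actual implementation: the function names are hypothetical, frontmatter is assumed to be already parsed into dicts, and the handling of null / empty values follows the clearing rule stated in Step 1.

```python
APPEND_LIST_KEYS = {"check_commands"}  # append, then deduplicate


def _append_dedupe(lower, higher):
    """Lower-layer items first, then higher-layer items, keeping first occurrences."""
    out = []
    for item in list(lower) + list(higher):
        if item not in out:
            out.append(item)
    return out


def overlay(merged, layer):
    """Overlay one parsed frontmatter dict onto the accumulated result."""
    result = dict(merged)
    for key, value in layer.items():
        if value is None or value == [] or value == {}:
            result.pop(key, None)  # null / empty explicitly clears the key
        elif key in APPEND_LIST_KEYS and isinstance(value, list):
            result[key] = _append_dedupe(result.get(key, []), value)
        elif key == "hooks" and isinstance(value, dict):
            hooks = dict(result.get("hooks", {}))
            for sub_key, entries in value.items():
                # Assumption: an empty sub-key list simply appends nothing.
                hooks[sub_key] = _append_dedupe(hooks.get(sub_key, []), entries or [])
            result["hooks"] = hooks
        else:
            # Scalars and list-replace keys (test_commands): higher layer wins.
            result[key] = value
    return result


def merge_layers(*layers):
    """Merge layers given in priority order, lowest first."""
    merged = {}
    for layer in layers:
        merged = overlay(merged, layer)
    merged.setdefault("test_commands", ["Skill(run-tests)"])
    return merged
```

Calling `merge_layers(user_global, project_shared, personal)` in that order reproduces the lowest-to-highest priority of the three settings files.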
---
reviewer: "ask-peer"
review_iterations: 3
task_decomposition: true
interactive_commits: true
compact_rules: false
custom_instructions: "Always use TDD. Write tests before implementation."
language: "ja"
check_commands:
- "pnpm run lint:fix"
- "pnpm run format"
- "pnpm run typecheck"
test_commands:
- "Skill(run-tests)"
hooks:
  on_complete:
    - "Skill(work-complete)"
self_retrospective:
  feedback: "owner/repo" # or "/abs/path", "~/rel", "./rel"
---
- reviewer: Reviewer skill name (default: ask-peer). Choose from: ask-peer, ask-claude, ask-codex, ask-gemini, ask-copilot, ask-agy. Unsupported values fall back to ask-peer
- review_iterations: Max iterations for Plan Review (Step 3) and Code Review (Step 8) (default: 3, must be a positive integer). Can be overridden per invocation with -i N / --iterations N
- task_decomposition: Whether Step 1.5 runs the auto-decomposition check in Normal sub-mode (default: true). Set to false to treat Normal sub-mode requests (/dev-workflow <task>) as single tasks: Step 1.5 is omitted from TodoWrite and the decomposition judgment is skipped entirely. --resume <state-file> is unaffected and still executes existing state files. Non-boolean values fall back to true with a warning
- interactive_commits: Whether Step 10 (Interactive Commits) runs after hooks.on_complete (default: true). When true, after Step 9 (Completion Hooks) the workflow proposes commit groupings and messages, then iterates per-commit with the user. When false, Step 10 is omitted from TodoWrite and never executes; the workflow ends with an uncommitted tree as before. Non-boolean values fall back to true with a warning. To opt out, set interactive_commits: false in .claude/dev-workflow.md or ~/.claude/dev-workflow.local.md
- compact_rules: Whether Step 11 sub-step 3 (Char-count compaction gate) runs (default: false). The compaction mode added in v1.38.0 is currently experimental. When false (the default), sub-step 3 is skipped entirely: Skill(extract-rules) --compact is never invoked, the gate is never opened, and compaction_applied_count / below_threshold_failed_files stay at their initial values so § Completion's compaction reminder is automatically omitted. When true, the workflow invokes Skill(extract-rules) --compact and may enter the Step 11 compaction approval gate (USER APPROVAL GATE). Non-boolean values fall back to false with a warning. To opt in for a specific project, set compact_rules: true in .claude/dev-workflow.md or .claude/dev-workflow.local.md
- custom_instructions: Free-form development instructions applied as guiding principles across planning, implementation, review, and tidy phases (e.g., "Always use TDD", "Prefer functional style"). Optional. .claude/rules/ and explicit user requests take precedence if they conflict
- language: Optional. Output language code (e.g. ja, en) for user-facing prose produced by this skill: Step 4 plan body (Overview / Decisions / Design / Test plan / Risks / Unknowns content), user-gate preambles (Step 4 / Step 7.5 / Step 8), Step 2 difficulty-assessment log, Step 10 commit-plan / per-commit gate output (subjects, body, diff blocks framed in the resolved language; verbatim git output and file paths remain English), Completion summary, and Step 11.5 finding Description / Suggested fix direction paragraphs. Resolution: merged skill config → Claude Code settings (~/.claude/settings.json → language field) → default ja. null / empty string / non-string values fall through to the next resolution step. For the localization boundary between translated concepts and verbatim identifiers, see references/plan-format.md § Localization granularity. See references/self-retrospective.md §2.1 Language handling / §5 Contract note for the Step 11.5 scope contract. No Step 11.5 output unless self_retrospective.feedback is also configured AND the task is not assessed as Simple or Trivial difficulty
- check_commands: Static checks (lint, format, typecheck, etc.). Always run all in order
- test_commands: Defaults to ["Skill(run-tests)"]. Each entry must be a Skill(<name>) string (no shell commands). Entries run sequentially during Step 7. Run --init to generate or update run-tests; additional structural-check skills can be appended in project config (e.g. for bundle-sync drift detection, custom marketplace structure validators, or other repository-specific checks)
- hooks: Execute skills/commands at specific workflow timing points
  - on_complete: Runs as Step 9 (immediately after Step 8 Code Review). Entry format: Skill(<name>) or shell command string
    - Entries not covered by allowed-tools require user approval
- self_retrospective: Optional. Emits sanitized improvement signal for the dev-workflow-bundle skills (dev-workflow, ask-peer, extract-rules, rules-review) at Step 11.5 (between Step 11 and Completion). Raw conversation stays in-session; only abstracted text leaves
  - feedback: Destination string. Auto-detected:
    - Starts with /, ~/, ./, or ../ → local directory path → retrospective written as a markdown file under that directory
    - Matches ^[\w.-]+/[\w.-]+$ → GitHub owner/repo → retrospective submitted via gh api POST to /repos/<feedback>/issues
    - Any other string (including empty) → warn and skip Step 11.5
  - If feedback is unset, Step 11.5 is not registered in TodoWrite and never executes; the workflow behaves as before
  - Step 11.5 is also hard-skipped when Step 2 assesses the task as Simple or Trivial difficulty (typo fix, config tweak, obvious bug fix), regardless of config, because Simple/Trivial tasks rarely produce meaningful bundle-skill signal
  - Agent tool usage: Step 11.5 is the only step in this skill that directly spawns a subagent via the Agent tool (for jsonl scan + sanitization). Other steps delegate to named skills (Skill(ask-peer), Skill(run-tests), Skill(rules-review), Skill(tidy), etc.), never to Agent directly. Do not invoke Agent from any other step.
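The feedback auto-detection rules can be sketched as a small classifier. The function name and the returned labels are hypothetical; only the prefixes and the owner/repo pattern come from the rules above. The path-prefix check runs first because a value such as ./rel would otherwise also satisfy the owner/repo pattern.

```python
import re

# Pattern and prefixes as documented for self_retrospective.feedback.
OWNER_REPO = re.compile(r"^[\w.-]+/[\w.-]+$")
LOCAL_PREFIXES = ("/", "~/", "./", "../")


def classify_feedback(value):
    """Return 'local', 'github', or 'skip' for a feedback destination value."""
    if not isinstance(value, str) or value == "":
        return "skip"  # non-string or empty: warn and skip Step 11.5
    if value.startswith(LOCAL_PREFIXES):
        return "local"  # retrospective written as a markdown file
    if OWNER_REPO.fullmatch(value):
        return "github"  # retrospective submitted as a GitHub issue
    return "skip"  # any other string: warn and skip Step 11.5
```

For example, `classify_feedback("owner/repo")` yields `"github"`, while `classify_feedback("a/b/c")` yields `"skip"` because the pattern allows exactly one slash.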
Mode Detection
- --init → Init Mode (-i / --iterations is ignored)
- --resume <state-file> → Execution Mode (Resume sub-mode; see Step 1.5)
- Otherwise → Execution Mode (Normal sub-mode)
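A rough sketch of this dispatch, assuming the invocation arrives as a list of argument tokens. `parse_invocation` and the dictionary shape it returns are hypothetical; the skill itself performs this detection in prose rather than in code.

```python
def parse_invocation(args):
    """Classify an argument list into a mode, a sub-mode, and an optional iteration count."""
    iterations = None
    rest = []
    i = 0
    while i < len(args):
        if args[i] in ("-i", "--iterations") and i + 1 < len(args):
            raw = args[i + 1]
            # Only a positive integer counts; anything else is ignored here.
            if raw.isdigit() and int(raw) > 0:
                iterations = int(raw)
            i += 2
            continue
        rest.append(args[i])
        i += 1

    if "--init" in rest:
        # Init Mode ignores -i / --iterations entirely.
        return {"mode": "init", "sub_mode": None, "iterations": None}
    if "--resume" in rest:
        return {"mode": "execution", "sub_mode": "resume", "iterations": iterations}
    return {"mode": "execution", "sub_mode": "normal", "iterations": iterations}
```

When `iterations` comes back as `None`, the count is resolved later from config or the default, as described in Step 1.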
Init Mode
Read references/init-mode.md and follow the procedure.
Note: Skills generated by --init (e.g. run-tests) are recognized from the next session onward. Do not run /dev-workflow <task> in the same session as --init.
Execution Mode
No-Stall Principle
Once the workflow has started (after Step 1.5 resolves the effective task), it must run to Completion without pausing, except at the explicit user-gate points enumerated below. Every other step — including every skill invocation, every no-op outcome, every "nothing to report" result — must be judged semantically by the agent and passed through automatically. Do not rely on exact-phrase matching; if the skill result reads as a successful completion (fixes applied, no changes needed, no violations, no new rules, or any equivalent "success / no-op" outcome regardless of wording), treat it as success and proceed to the next step.
Explicit user-gates (the only permissible pause points):
Each bullet names the gate and points to the authoritative definition site. When editing either the enumeration or the definition, update both together.
- Step 1.5 task-decomposition proposal dialogue: yes / adjust / no confirmation (Normal sub-mode; defined in Step 1.5 dispatch and references/task-decomposition.md § B. Normal sub-mode)
- Step 1.5 leftover-subtask picker dialogue: selecting which subtask to run when more than one leftover in_progress subtask is runnable (Resume sub-mode; defined in references/task-decomposition.md § A. Resume sub-mode)
- Step 4 plan approval (defined in Step 4: Finalize Plan)
- Step 5 probe → real-implementation user-observation gate: when the Plan explicitly stages a probe / intermediate-artifact step before its real-implementation replacement, hold the workflow at the boundary until the user signals observation completion (defined in Step 5's "User-observable artifact protection gate" paragraph). Fires conditionally per the Plan's content; non-probe-staged plans never enter this gate
- Step 7 scope-drift stop: when check_commands writes non-trivial changes outside the task-scope snapshot (trivial = whitespace-or-comment-only formatting on ≤ 5 lines attributable to the formatter/linter that just ran; those proceed automatically with a one-line note), warn and wait for user direction (defined in Step 7: Check / Test)
- Step 7 check/test fail-stop: on failure after 3 retries, report the error and stop (defined in Step 7: Check / Test). Note: this is an error-stop, not a pause for user decision
- Step 7.5 persistent-violations decision: rule violations still present after the 2nd review cycle (defined in Step 7.5: Rules Compliance Review)
- Step 8 unresolved-findings decision: reviewer-reported actionable findings still unresolved after the N-th iteration (defined in Step 8: Code Review)
- Step 10 commit-plan approval gate: accept the proposed commit grouping (subjects + file lists) for the working-tree changes; fires once on the initial plan and re-fires whenever a Mid-loop adjust file-regrouping / split-adding branch rebuilds the un-landed portion of the plan (defined in Step 10: Interactive Commits)
- Step 10 per-commit accept gate: accept each individual commit (subject / body / files / diff) before it lands; repeats N times where N is the approved commit count (defined in Step 10: Interactive Commits, judged per § Approval token closed list inside Step 10)
- Step 10 fold-or-defer gate: after a pre-commit hook auto-modifies the working tree following a zero-exit commit, ask the user whether to amend the just-landed commit (fold) or leave the changes uncommitted for a later iteration (defer); judged per the dedicated 5-branch → fold / defer / cancel / re-present-as-adjust classifier in Step 10's Post-commit auto-modify cycle bound paragraph (the 5 input branches extend § Approval token closed list's 4 buckets with an additional defer-direction branch; this gate is not the per-commit-accept-gate enum: cancel routes via Mid-loop cancel and ambiguous adjust responses re-enter the gate via § Mid-loop adjust branch f)
- Step 10 ambiguous-adjust clarifier: when a Mid-loop adjust request cannot be classified into branches a–e, ask the user a clarifying question and re-enter the gate that issued the request. This gate is itself the disposition for branch f of the Mid-loop adjust closed-list branches (categorization vocabulary depends on which gate originated the request)
- Step 11 compaction approval gate: when Skill(extract-rules) --compact returns top-level status: "compacted", present per-file diff (chars_before / chars_after / iterations_used / applied_edits_count / structural_notes / per_file_status / below_threshold) per § User-gate summary preamble and wait for accept/reject/adjust/cancel per the Step 11 local closed list (defined in Step 11's "Char-count compaction gate" paragraph). cancel aligns with Step 10's Mid-loop cancel semantic (no revert); adjust uses Step 11's own three-case closed list (per-file disposition / clarification / other), not Step 10's branch f
- Completion subtask PR URL prompt: when executing a decomposed subtask, ask for optional PR URL before resuming (defined in Completion)
Fatal errors are out of scope for this principle: configuration-file absence, malformed state file, irrecoverable skill / tool failures, and similar infrastructure-level errors halt the workflow with a diagnostic regardless of whether they appear in the list above. The No-Stall Principle governs successful step outcomes (including no-op successes); it does not force the agent to push through genuine errors.
At any point not listed above — including after Skill(tidy), Skill(rules-review), Skill(extract-rules), Skill(run-tests), and reviewer skills return — the agent must never wait for the user to say "continue" / "続けて". Semantic judgment of the returned result is sufficient.
No-summary turn at review-return boundaries. When a reviewer or sub-skill returns a result that is semantically "nothing actionable" (no findings, no violations, no changes needed — regardless of the exact wording or the length of the response), the immediately next turn must begin with a tool call (TodoWrite to mark the iteration as completed, or the next step's invocation), not with a prose summary of the review outcome. Category-by-category verdict lists, conclusion paragraphs, and "shall I proceed?" sentences are the stall pattern — emit them only in the Completion summary (the ### Completion section that runs after Step 11.5), never at review-return transition boundaries. This applies to: Skill(ask-peer) / Skill(ask-claude) / Skill(ask-codex) / Skill(ask-gemini) / Skill(ask-copilot) / Skill(ask-agy) returning no actionable findings at Step 3 or Step 8, Skill(tidy) returning no changes, Skill(rules-review) returning no violations, Skill(extract-rules) returning no new rules at Step 11, and any other sub-skill whose result is treated as success.
Callee verdict transcription is not a turn boundary. When a sub-skill (Skill(tidy) / Skill(rules-review) / Skill(extract-rules) / Skill(run-tests) / reviewer skills / any other callee) returns a fenced JSON verdict, status token, or structured summary, and the orchestrator's response re-transcribes that block (verbatim or paraphrased) in its own output, the transcribed block does not end the orchestrator's turn. The same agent must immediately issue the next tool call in the same turn — the next sub-step's invocation, the next iteration's dispatch, the next phase's transition, the next Step's first tool call. Specifically forbidden: inserting a "shall I proceed?" sentence after the transcribed verdict; emitting "ここまでで一区切り" / "ここまでで完了です" prose summaries between the verdict and the next action; ending the response on the verdict block and waiting for the user to say "continue" / "続けて". This rule extends the "no-summary turn" rule above to the case where the sub-skill returned an actionable result and the orchestrator's response carries the verdict's content forward — the verdict transcription itself is informational, not terminal. (For skill development this covers Pattern A iteration loop verdict returns where the orchestrator re-renders the JSON before re-dispatching, orchestrator multi-callee chains where one callee's verdict feeds the next callee dispatch, sequential sub-step completion marking, and hook-chain continuations.) Sub-step completion prose ("Step N complete", "(d) verify-diff returned converged") follows the same rule: completion reports in prose are not turn-end signals; the next sub-step's first tool call must follow in the same turn.
Progress Visibility
Before any subagent-backed skill call (Skill(<name>) invocations including run-tests, ask-peer, tidy, rules-review, extract-rules) or any shell command expected to take ≥ 30 seconds, emit a brief status message naming what is starting — e.g. "Starting test run via run-tests…" or "Calling ask-peer for plan review (iteration 1 of N)…". Emit the message as prose in the same assistant turn that issues the tool call, not as a separate preceding turn. This lets the user distinguish an agent in active progress from one that has stalled. After the step returns, proceed immediately to the next step per the No-Stall Principle — do not emit a separate acknowledgment turn.
Mid-chain visibility (chained sub-skill calls or extended interpretation between tool calls). When a workflow phase issues sub-skill calls in a chain or spans extended internal interpretation / preparation across multiple tool calls (for skill development this includes a pre-implementation feasibility-check phase that fires several sub-skill dispatches in sequence, or a routine skill that issues several sub-skill dispatches back-to-back), the single pre-call status message rule above does not fully cover the user-visibility window. Extend the rule with a "current-location" line emitted at semantic checkpoints between dispatches — one short sentence naming the current phase and the next action ("Finished verify-diff for Finding 1; next: skill-review polish on the same file"). To keep the addition from re-introducing stall, three constraints bind its shape: (a) emit the current-location line as prose in the same turn as the next tool call, never as a standalone turn that waits for user input; (b) restrict the content to current phase name and next action only — review-result summaries, decision rationales, and "shall I proceed?" sentences stay out; (c) the rule does not apply to short same-turn chains of a few tool calls that complete inside a single turn — only to phases where the gap between user-visible signals would otherwise span multiple turns. Intent: in chained sub-skill phases (feasibility checks, routine dispatch loops, multi-call interpretation work) the user keeps seeing "this is alive and moving", while the No-Stall Principle's confirmation-prohibition stays intact.
Workflow artifacts (cross-step fixed exclusion)
Files this workflow itself creates and maintains as in-session state — plan documents under .claude/plans/, decomposition state files written by Step 1.5 or Step 10, and other workflow-internal staging artifacts placed under .claude/ by this skill — are cross-step fixed exclusions from any per-step changed-file enumeration (Step 6 Tidy scope, Step 7.5 rules-review diff input, Step 10 Interactive Commits' commit grouping, sub-skill dispatch payloads, scope checks). The exclusion is structural — the workflow owns these files as its own operational substrate — and is not gated on whether the path appears in .gitignore, whether a formatter ignore-file aligns, or whether the user happens to be touching them in this run. Steps that build a changed-file set, a diff-scope set, or a commit grouping must apply this single shared exclusion rather than re-deriving the rationale per step against ad-hoc justifications. If a future change adds another in-session-state path, extend this canonical list once rather than threading the exclusion through per-step prose (for skill development this is the canonical workflow-artifact set; sub-skills the workflow dispatches that maintain their own in-session state under .claude/ follow the same convention).
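A single shared exclusion of this kind might look like the sketch below. The directory list is an assumption for illustration: only .claude/plans/ is named explicitly above, and any other workflow-internal staging paths would need to be added to the one canonical tuple. `exclude_workflow_artifacts` is a hypothetical helper name.

```python
from pathlib import PurePosixPath

# Assumed canonical list; extend it here once rather than per step.
ARTIFACT_DIRS = (PurePosixPath(".claude/plans"),)


def exclude_workflow_artifacts(changed_files):
    """Drop workflow-owned files from a changed-file list, preserving order."""
    kept = []
    for name in changed_files:
        path = PurePosixPath(name)
        if any(d == path or d in path.parents for d in ARTIFACT_DIRS):
            continue  # workflow-owned state, excluded regardless of .gitignore
        kept.append(name)
    return kept
```

Every step that builds a changed-file set, a diff scope, or a commit grouping would call the same helper, which keeps the exclusion rationale in one place.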
Step 1: Load Settings
- Read settings from up to three layers and merge (type-aware):
  merged = {}
  if ~/.claude/dev-workflow.local.md exists: overlay its frontmatter onto merged
  if .claude/dev-workflow.md exists: overlay its frontmatter onto merged
  if .claude/dev-workflow.local.md exists: overlay its frontmatter onto merged
  "Overlay" = for each key present in the file:
  - Scalar keys: merged[key] = file[key] (replace)
  - List keys (check_commands): append file[key] items after merged[key], then deduplicate (keep first occurrence)
  - List-replace keys (test_commands): merged[key] = file[key] (whole-list replace, as described under Configuration)
  - hooks: deep-merge; for each sub-key (e.g. on_complete), append and deduplicate the list
  - null or empty ([], {}) explicitly clears the key; the lower-layer value is discarded, not inherited
  - Key absent from the file: left untouched (inherit from lower layers)
  If a file's YAML frontmatter is malformed (parse error), warn the user naming the file, skip that layer, and continue with remaining layers.
- If none of the three files exist, prompt user to run /dev-workflow --init and stop
- Resolve reviewer from config. If not specified or not in the supported list (ask-peer, ask-claude, ask-codex, ask-gemini, ask-copilot, ask-agy), use ask-peer
- Resolve N (review iteration count):
  - If -i / --iterations option is present and is a positive integer, use it
  - Else if config review_iterations is present and is a positive integer, use it
  - Else use default 3
- Parse the remaining keys from config:
  - hooks: warn and ignore if hooks.on_complete has invalid format
  - custom_instructions (optional, string): warn and ignore if not a string
  - task_decomposition (optional, boolean, default true): warn and fall back to true if present but not a boolean
  - interactive_commits (optional, boolean, default true): warn and fall back to true if present but not a boolean
  - compact_rules (optional, boolean, default false): warn and fall back to false if present but not a boolean
  - language: parse per the Configuration bullet above. For ~/.claude/settings.json, silently accept missing file / absent key / null value; warn once per Step 1 settings-load pass on malformed JSON, non-string, or empty string. The resolved language is available to Step 11.5
  - self_retrospective.feedback (optional, string): warn and ignore if not a string or if empty string "". When feedback matches the owner/repo pattern (^[\w.-]+/[\w.-]+$), additionally run gh auth status as an early warning only; if auth fails, warn but do not block the run
- Determine execution sub-mode: Resume if --resume <state-file> was provided, otherwise Normal. Step 1.5 branches on this
- Register all workflow phases with TodoWrite, including review iterations. Do NOT skip any phase:
  - Step 1.5: Task Decomposition (Normal sub-mode only, AND only when task_decomposition is true; omit this entry entirely in Resume sub-mode or when task_decomposition is false, since in either case the step has nothing to do at registration time)
  - Step 2: Create Plan
  - Step 3: Plan Review
  - Step 3-1 through Step 3-N: Plan Review - iteration 1 through N (generate N items based on resolved N)
  - Step 4: Finalize Plan
  - Step 5: Implement
  - Step 6: Tidy
  - Step 7: Check / Test [check: {check_commands} | test: {test_commands}]
  - Step 7.5: Rules Compliance Review
  - Step 8: Code Review
  - Step 8-1 through Step 8-N: Code Review - iteration 1 through N (generate N items based on resolved N)
  - Step 9: Completion Hooks (only if hooks.on_complete is configured)
  - Step 10: Interactive Commits (only if interactive_commits is true; single row, because per-commit iteration is handled inline within Step 10: the commit count is not known until the proposal phase)
  - Step 11: Update Rules
  - Step 11.5: Self-Retrospective (only if self_retrospective.feedback is set and parses as a valid destination; see Step 11.5 for detection rules. If unset/invalid, omit this entry)
  Mark each item in_progress when starting and completed when done. Registering all phases upfront gives the user visibility into overall progress and prevents steps from being accidentally dropped.
  Phase-boundary self-audit: at every top-level Step transition (not the iteration sub-rows Step 3-i / Step 8-i, which are governed by the Return-point no-stall reminders below), before issuing the first tool call that advances into a new Step's procedure, name the Step number you are entering and verify the prior Step's TodoWrite row is completed. If it is still pending or in_progress, return to the unfinished Step first instead of advancing. This guards against silent phase-skipping (e.g. jumping from Step 5 Implement to Step 7 Check / Test without running Step 6 Tidy, only to discover the gap during a later phase) that the TodoWrite registration alone cannot prevent.
  Implementation sub-tasks in Step 5 are additions, not replacements.
  Note: Unless -i / --iterations was explicitly specified, Step 2 may reduce N based on task difficulty.
- Context-compaction recovery: if the session context was compacted (prior turns summarized) before reaching this step in the current turn, re-read the configuration files from disk rather than relying on the summary. Verify each step's skip conditions (e.g. whether self_retrospective.feedback is set, whether hooks.on_complete is configured, whether interactive_commits is true, whether compact_rules is true) from the actual merged config, not from compacted context.
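The iteration-count resolution and the conditional phase registration in Step 1 can be sketched as follows. Both function names are hypothetical, the config is assumed to be already merged and validated (non-boolean flags already replaced by their defaults), and the Step 7 row omits the check/test annotation for brevity.

```python
def resolve_iterations(cli_value, config_value, default=3):
    """Pick N: CLI option first, then config, then the default of 3."""
    for candidate in (cli_value, config_value):
        is_int = isinstance(candidate, int) and not isinstance(candidate, bool)
        if is_int and candidate > 0:
            return candidate
    return default


def build_phases(config, sub_mode, n, feedback_valid):
    """Return the ordered TodoWrite rows for one workflow run."""
    phases = []
    if sub_mode == "normal" and config.get("task_decomposition", True):
        phases.append("Step 1.5: Task Decomposition")
    phases += ["Step 2: Create Plan", "Step 3: Plan Review"]
    phases += [f"Step 3-{i}: Plan Review - iteration {i}" for i in range(1, n + 1)]
    phases += [
        "Step 4: Finalize Plan",
        "Step 5: Implement",
        "Step 6: Tidy",
        "Step 7: Check / Test",
        "Step 7.5: Rules Compliance Review",
        "Step 8: Code Review",
    ]
    phases += [f"Step 8-{i}: Code Review - iteration {i}" for i in range(1, n + 1)]
    if config.get("hooks", {}).get("on_complete"):
        phases.append("Step 9: Completion Hooks")
    if config.get("interactive_commits", True):
        phases.append("Step 10: Interactive Commits")
    phases.append("Step 11: Update Rules")
    if feedback_valid:
        phases.append("Step 11.5: Self-Retrospective")
    return phases
```

Building the list from the merged config on every run, rather than from memory of an earlier turn, matches the context-compaction recovery rule above.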
Step 1.5: Task Decomposition
This step decides whether the user's request should be split into multiple smaller subtasks (each delivered as its own PR), or — in Resume sub-mode — picks the next subtask from an existing state file under .claude/plans/dev-workflow.<slug>.md.
State-file semantics are critical (a malformed or mis-routed file silently corrupts subtask boundaries), so the full procedure lives in a dedicated reference. Dispatch:
- Resume sub-mode (--resume <state-file> was provided): read references/task-decomposition.md and follow section A. Resume sub-mode from top to bottom.
- Normal sub-mode + task_decomposition: true (the default): read references/task-decomposition.md and follow section B. Normal sub-mode.
- Normal sub-mode + task_decomposition: false: no decomposition work. Set the "effective task" to the original request and proceed to Step 2 without creating a state file. Step 1.5 is not in TodoWrite in this case (see Step 1), so there is nothing to mark completed. You do not need to read the reference file.
EnterPlanMode is reserved for Step 2 — any decomposition proposal in Step 1.5 is a plain yes/no dialogue, not a plan.
After section A or B completes, the "effective task" is set for Step 2 onward: the selected subtask when decomposed, otherwise the original request.
Step 2: Create Plan
- Record the current commit as base-commit (git rev-parse HEAD) for later diff comparison
- EnterPlanMode
- Analyze the task and codebase, create implementation plan. Apply custom_instructions to shape plan priorities and structure. Follow the structure defined in references/plan-format.md: Overview / Decisions / Design / Test plan required; Risks / Unknowns optional. Section-level content rules live in the reference file; do not re-derive them here.
  - If a state file exists (this run is executing one subtask of a decomposed parent): the "effective task" = the current in_progress subtask. Frame the plan around just this subtask while keeping the full parent task and other subtasks as background context so the plan stays consistent with the overall direction. Do not plan work belonging to other subtasks. See references/plan-format.md § Subtask / Resume handling for how Decisions is scoped in this case
  - TDD-conflict resolution: if custom_instructions includes a TDD-style requirement (e.g. "Always use TDD", "write tests before implementation") AND the current task is adding tests for existing behavior (characterization tests, coverage tests, or relocating existing tests; keywords: "add tests for", "characterize behavior", "test coverage", "move tests", "固定する", "追加する") rather than driving new implementation, declare explicitly in Plan Overview or Risks that this subtask is TDD-loop-external: tests describe and fix already-implemented behavior, not specification of new behavior. This resolves the apparent conflict: the TDD guideline governs feature-implementation subtasks; characterization and coverage subtasks are outside the TDD loop by design.
  - Version/identifier string replacement tasks: if the core operation is replacing a specific version string, identifier, or constant across the project (e.g. version bump, rename, migration), grep the entire repository for the old value before drafting the plan. Include the complete list of affected files in the Design section, because missing even one location is the primary regression source for this task class
- Simplicity self-audit: Before proceeding to Step 3, audit the plan:
- Each plan element must be traceable to one of: (a) an explicit user requirement, (b) a known bug or constraint, or (c) a documented project rule under .claude/rules/. "Future-proofing", "UX polish", "consistency with other projects", or "might be useful later" are not sufficient triggers on their own.
- Inherited spec files (.claude/plans/*.md: full-spec drafts, archived plans, or AI-authored details within a task-decomposition state file): treat the content as a prior-session draft, not as confirmed user requirements. Cross-check each inherited design decision against the user's original ask surfaced by Step 1.5 / the user message; prior-session elaboration is the most common source of scope creep.
  - Exception (task-decomposition state files): subtask boundaries, order, depends_on, and purposes were user-approved in a prior Step 1.5 and must be honored as-is. Only AI-authored descriptions, verification hints, and design elaborations within each subtask are draft.
- Root-cause provenance check: if the plan leans on a root-cause claim from an AI-authored prior-session artifact, re-derive the root cause from the user's original ask before treating it as load-bearing. User-confirmed root causes are exempt.
- Plan-level incrementality: check whether the plan splits into independently verifiable units (e.g. hotfix vs refactor). If yes, propose the split now rather than deferring to Step 3 plan review — PR-level splits restart via Step 1.5, intra-PR splits become staged commits (same dispatch as Step 3's Incrementality review category).
- Cross-component pattern alignment: when the plan touches a structural pattern shared across sibling components (shared base classes, cross-cutting middleware, return/API contracts, mirrored services, parallel route handlers — for skill development this includes subagent dispatch shape, hook wiring, state-file handling, return-contract design), audit three alignment directions. (i) Propagating a fix outward — if the plan fixes a structural defect in one component, check whether siblings sharing the structure carry the same defect; either expand the plan's scope to cover them or add an explicit Risks entry deferring them with a one-line rationale (silently scoping a structural fix to one component leaves the same defect in the others). (ii) Aligning a new component inward — if the plan adds a new component alongside an existing sibling group, decide explicitly in Decisions whether to follow the siblings' shared shape (iteration-loop form, responsibility-split unit, return-contract form — for skill development this includes the iteration-loop vs. single-dispatch choice and the detection-vs-apply split) or intentionally diverge. (iii) Intra-patch self-duplication — if the plan itself lands the same processing pattern (shared validators, common error handling, mirrored formatting / serialization logic) at multiple call sites within this single change, treat those sites as siblings under (i): a defect at one site is likely present at the others, and a fix to one site must be applied to the rest of the same-pattern sites within the patch (for skill development this includes producer / consumer applying the same JSON parse / fallback pattern, or rolling out the same return-contract shape across multiple callees). Any of the three directions missing from Decisions tends to surface late as Step 3 reviewer pushback or Step 4 user pushback.
- Consistency-with-siblings as primary rationale: when a plan element's primary rationale is "align with existing sibling implementations / for consistency" alone (i.e. not directly traceable to (a) explicit user requirement, (b) known bug or constraint, or (c) documented project rule), surface lighter alternatives alongside the consistency choice in Decisions — for example a scope-narrowed single-pass implementation, a hybrid that gates auto-application on specific conditions, or a detection-only design that delegates application to the caller (for skill development this includes single-dispatch detection-only, category-gated auto-apply, or the iteration-loop choice). When
`consistency-with-siblings` is chosen, additionally record in one line the cost of taking a different shape from the siblings, so Step 3 reviewers and Step 4 user approval see the trade-off explicitly.
- Experimental feature gating overrides sibling-consistency default: when introducing a new config flag whose gated behavior is experimental (the behavior was added in a recent release and the CHANGELOG / SKILL.md prose still marks it as "experimental", "recently added", "still in trial", or any equivalent immaturity signal; for skill development this includes new compaction / auto-fix gates, new dispatch-loop modes, or recently-shipped phase reorderings whose side effects have not yet had time to surface), prefer opt-in (default disabled) as the Recommendation even when adjacent / sibling config flags default to enabled. Sibling-consistency is a real concern (handled by the bullet above) but yields here because experimental behavior carries unobserved side effects that a default-enabled rollout would broadcast to every user on the next workflow run. Set Recommendation
`<flag>: false`, place the sibling-direction `<flag>: true` as Alternative in Decisions, and cite the experimental marker (specific CHANGELOG bullet wording or SKILL.md prose) in the rationale. The override applies until the feature graduates (a later release removes the experimental marker, or no further fixes target it); revisit the default in a follow-up plan at that point.
- Upstream-handoff agreement override: when the plan overturns a design decision explicitly stated in a prior-session-agreed upstream document (handoff materials, archived design records, decision logs; for skill development this includes
`.claude/plans/*-handoff.md` files or similar prior-session artifacts that record user-approved design outcomes), enumerate the diff (current plan's placement / value / shape vs. the upstream-agreed one) explicitly in Decisions. Set the Recommendation / Alternative pair to "uphold upstream agreement" and "overwrite upstream agreement", and annotate the chosen item with an explicit override marker (`upstream-override` for `language: en`, `先行合意上書き` for `language: ja`) so the Step 4 user gate surfaces the divergence and the user can affirm or reject the override. Distinct from the "Inherited spec files" bullet above in this Simplicity self-audit: that bullet covers AI-authored draft elaborations from prior sessions, whereas this bullet covers user-agreed design decisions from prior sessions.
- Temporary-workaround minimal coupling: when an added plan element is explicitly declared as a temporary workaround awaiting a known removal trigger (upstream bug fix, feature flag rollout, deprecation window expiry, or any condition the user or upstream document names; for skill development this includes project-local skills introduced to work around an upstream Claude Code bug, or rule bullets staged in
`*.local.md` until a planned promotion lands), set minimal coupling (local placement, a single dedicated hook point, and no intervention with the caller's other logic / state machine / counter set) as the first-class Recommendation in Decisions. Place deep integration (state-machine weaving, callee chains entered from multiple hook points, per-iteration record schema extension, sibling-file synchronization) as the Alternative, with an explicit one-line rationale on removal cost: how many files / sections / counter sites / cross-references must change when the removal trigger fires. Removability (deletion blast radius) must also appear as a required Risks evaluation axis for any plan element whose temporary nature is declared up-front.
- Domain-state composition explicit decomposition: when a plan element's feature requirement depends on a composition of multiple independent state values, i.e. boolean AND / OR over per-source state flags (e.g. an OS-level permission state AND a server-side user-preference flag, a build-time feature flag AND a runtime config value, a multi-source authorization state combining session / role / resource ownership; for skill development this includes a callee's enabled state AND a per-Finding counter, or a hook activation gated by both a
`language` setting AND a `compact_rules` flag), enumerate the constituent state values explicitly in Decisions: name each, identify its source (OS / server / config layer / etc.), and document how they compose into the derived predicate (AND / OR / N-value truth table). Plans that hide the composition behind a single derived predicate routinely gate on one constituent and miss the others, surfacing only at integration time. Required Test plan companion: include a state-space combination table enumerating every combination of constituent values (or every equivalence-class boundary combination when the space is large) and the expected outcome per cell, so implementation cannot pass with a one-constituent gate-check.
- Self-application live validation in Test plan: when the change targets the currently running skill itself, meaning the modified skill is the one driving this very workflow run, OR a callee skill the workflow will invoke later in the same run (for skill development this includes a
`dev-workflow` change exercising at Step 11 sub-step 3 of the same workflow run, an `extract-rules` change exercising at the upcoming Step 11 sub-step 1 / 2 invocation, or a `rules-review` change exercising at the upcoming Step 7.5 invocation), identify the immediate-exercise path inside this same run and add a live validation item to the Test plan citing the specific Step / sub-step / hook where the new behavior is naturally exercised. Live validation lets the workflow itself act as the regression check, avoiding the manual-verification round-trip that an external-target change would otherwise require. When the target is not the running skill or a same-run callee (external skill, CLI tool, application code unrelated to this run) or the change has no in-run exercise path (e.g. a session-start-only path the current run already passed), this audit does not apply.
- Plan-body technical concept names match current target file vocabulary: when the plan body references technical concept names that already exist in the files the plan will touch (step / section / heading names, file or directory paths, config keys, callee names, identifiers),
`grep` the cited names against the current target files before finalizing the plan, and reconcile any mismatch (renamed step, renamed config key, deprecated callee name, stale file path) in this same pass rather than letting it surface at Step 8 as a cross-file inconsistency finding. Distinct from the Step 3 reviewer's "Internal convention citation verification", which judges claims of convention adherence: this audit is a pure name-currency check that catches plans drafted against an older mental snapshot of the target files (for skill development this includes step / heading references in `.claude/plans/*.md` that must match the SKILL.md's current section names, config key references that must match the current `.claude/<skill>.md` schema, and callee skill names that must match current bundle membership). When the task is itself a rename / version-bump operation, the "Version/identifier string replacement tasks" bullet above already enforces a repo-wide grep and this audit folds into that one; this bullet covers the non-rename case where stale vocabulary creeps in inadvertently.
- Sub-skill natural-language argument minimalism: when the plan procedure specifies passing a natural-language argument to a sub-skill (a free-text scope hint, an instruction prefix, a per-invocation directive; for skill development this includes plan-body sub-skill invocations with author-written prose, routine skills handing a scope expression to a downstream sub-skill, or workflow steps that pass through user-provided
`custom_instructions`), default to a short scope-only sentence rather than a long contextualized preamble. Natural-language input has prompt-injection weight over the callee's own procedural instructions (its SKILL.md body), so a long preamble piled on top of the callee's procedural logic can override its built-in fallbacks and produce empty-input early-termination or other unintended branches; the simple short form usually parses correctly because the callee's procedural code is allowed to drive. Three rules: (a) the natural-language argument states the minimum scope only: context, history, and surrounding information are not added by default; (b) extra context is added only when strictly required (it can collide with the callee's own judgments, so the default is omission); (c) when the short form does not select the expected scope, the fallback is state preparation before the call (switching to the target branch, staging the working tree, pre-resolving a config value) rather than appending preamble text. This audit applies symmetrically across all sub-skill call sites in the plan body: plans that specify long defensive preambles "to be safe" should be flagged and trimmed.
- Project-convention tension surfacing as binary Decisions item: when the plan's default behavior is potentially in tension with a project convention that the same repo enforces (distribution rules in
`CLAUDE.md` / `.claude/rules/`, auto-load scope conventions, naming or placement conventions, paired-bump or release-bookkeeping rules; for skill development this includes `.claude/rules/project.rules.md` distribution-aware fix-direction rules, `.claude/rules/**` auto-load directory conventions, marketplace.json bundle-membership conventions, CHANGELOG paired-bump conventions, or any equivalent cross-cutting policy the plan must reconcile against), enumerate the tension as a binary Decisions item with (i) keep-default: default behavior unchanged + opt-in config to enable convention-aligned behavior, and (ii) change-default: default flipped to convention-aligned behavior, with CHANGELOG opt-out note and explicit behavior-change signal. The Recommendation can be either side, but the Alternative field MUST be the other; never leave the convention-tension implicit in the plan's prose. Without this binary item the convention-tension sinks into the plan's invisible premise and only resurfaces at the Step 4 user gate as a material-change request, forcing a plan rewrite (insertion-direction) and a Step 3 review iteration re-run. General principle: existing-behavior-preserving and existing-behavior-changing choices are both legitimate candidates when project convention tension exists; surface them symmetrically rather than hiding the change-default option (for skill development this includes default-flip on a config flag that aligns with `.claude/rules/` auto-load scope, output-directory default that aligns with distribution-aware conventions, or any default whose change would warrant a CHANGELOG entry signaling behavior change to existing users).
- Symptom-mitigation vs root-cause-fix discrimination: when the task is a bug fix (the plan is shaped around an observed failure mode), explicitly classify each proposed change as either (a) symptom-mitigation — the change suppresses the firing condition of the observed symptom without correcting the underlying state-machine asymmetry / lifecycle gap that produces it (e.g., "discard stale entries on read" for a save-on-completion gap, "reset the counter at session start" for a missing-decrement path, "skip the malformed record" for a missing-validation-on-write gap), or (b) root-cause-fix — the change corrects the state transition / lifecycle / invariant that produces the symptom (e.g., add the missing save-on-completion event, add the missing decrement, add the validation-on-write step). When the plan is composed entirely of (a)-class changes, surface this explicitly in Decisions as a recorded judgment (Recommendation: "ship mitigation only" vs Alternative: "extend the plan to include a root-cause fix"), with a one-line rationale, and add the structural cause to Risks even when not fixed in this scope — so a plan satisfied with mitigation only is a deliberate decision, not an accident. Failing to make this distinction tends to surface in later review iterations as "but why does the symptom happen in the first place?" pushback that retroactively expands scope.
- Existing-contract preservation alternative: when the plan's default approach would change an established subagent / callee / API contract (return schema, dispatch shape, side-effect set,
`allowed-tools` surface, sub-step boundaries; any interface multiple consumers already rely on), enumerate contract preservation + outer-layer adaptation as a parallel Alternative alongside the contract-change Recommendation. The Alternative names where the same target objective can be reached without touching the contract: main-thread synthesis from the callee's existing detection-only output, a caller-side mapper / translation layer, or a wrapper that adapts the new outcome into the existing contract shape. Compare on three axes: (i) sweep cost (how many reference sites must change under the contract change), (ii) contract preservation (downstream callers' invariants stay intact), (iii) blast radius (how many consumers are affected). The contract-preserving option should be the first one evaluated before adopting a contract-change default, because contract changes propagate across all reference sites and tend to outweigh the in-skill structural simplification benefit. Surface the choice as a binary Decisions item; the Recommendation may be either side, but the Alternative MUST be the contract-preserving option when contract change is the proposed default (for skill development this is the Pattern A 2-layer discipline, i.e. subagent analysis-only + main-thread synthesis of `Edit` calls, chosen over a single-layer "subagent emits `Edit` directly" contract change).
- Sibling-mechanism extension alternative enumeration: when the plan adds a new feature to a target module that already exposes N ≥ 2 sibling mechanisms for adjacent purposes (existing
`--<flag>` modes, plugin / strategy / variant classes, config-branch families, named sub-skills; any pre-existing closed set of N siblings that share a common dispatch surface), the Decisions section MUST enumerate two sibling-extension Alternatives alongside the "independent new addition" Recommendation: (i) extend the responsibility of the most-closely-aligned existing sibling to cover the new behavior (single-sibling scope expansion), and (ii) merge the new behavior into the existing sibling group's shared dispatch surface as a new variant of the same shape (full sibling-pattern inheritance). Enumeration may be skipped only when N = 0 / 1, or when an upstream task / handoff document explicitly mandates separation; record the rationale in a one-line note when skipping. Without this enumeration, the default "add as independent component" Recommendation hides the sibling-extension cost ceiling: the user gate at Step 4 surfaces the alternatives belatedly and a Plan rewrite is forced. The Step 3 reviewer's "Cross-component sibling coverage" sub-check covers this verification (for general software development this includes new endpoint handlers added alongside existing N route handlers, new strategy classes added alongside existing N strategies; for skill development this includes new `--<flag>` modes added alongside existing `--update` / `--restructure` / `--compact` modes, new bundle skills added alongside existing N bundle members).
- For each element that fails the audit, either (i) drop it from the plan, or (ii) add an explicit one-line rationale tying it to a concrete trigger (user requirement / bug / rule).
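The state-space combination table required by the Domain-state composition bullet above can be generated mechanically. The following is a minimal sketch, assuming two hypothetical boolean constituents (`os_permission_granted`, `server_pref_enabled`) composed with AND; the names and the predicate are illustrative only and do not come from any real plan.

```python
from itertools import product

# Hypothetical constituents for illustration: name -> possible values.
CONSTITUENTS = {
    "os_permission_granted": [False, True],  # source: OS
    "server_pref_enabled": [False, True],    # source: server-side preference
}


def notifications_enabled(state):
    """Derived predicate (assumed): AND over both constituents."""
    return state["os_permission_granted"] and state["server_pref_enabled"]


def one_constituent_gate(state):
    """A deliberately incomplete gate that checks only the OS permission."""
    return state["os_permission_granted"]


def state_space_table(constituents, predicate):
    """Enumerate every combination of constituent values with its expected outcome."""
    names = list(constituents)
    table = []
    for values in product(*(constituents[name] for name in names)):
        state = dict(zip(names, values))
        table.append((state, predicate(state)))
    return table


TABLE = state_space_table(CONSTITUENTS, notifications_enabled)

# Cells where the incomplete gate disagrees with the expected outcome.
MISMATCHES = [state for state, expected in TABLE
              if one_constituent_gate(state) != expected]

for state, expected in TABLE:
    print(state, "->", expected)
print("one-constituent gate fails on:", MISMATCHES)
```

Enumerating all four cells is what exposes the incomplete gate: it agrees with the expected outcome everywhere except the cell where the OS permission is granted but the server preference is off.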
- Each plan element must be traceable to one of: (a) an explicit user requirement, (b) a known bug or constraint, or (c) a documented project rule under
- Plan self-check: Run the checklist in `references/plan-format.md` § Step 2 self-check against the plan. This is the author's first-pass judgment on Decisions content; fix any failures before Step 3.
- No code changes in this phase
- Adjust N by difficulty (skip if `-i` / `--iterations` was explicitly specified): A typo fix doesn't need 3 rounds of review. Based on the plan just created, assess task difficulty and reduce N to avoid unnecessary iterations; the configured value is a ceiling, not a target:
  - Trivial (a genuinely self-evident change with a single unambiguous solution: a typo fix, a one-line edit, a config value change): N = 0. Step 3 (Plan Review) and Step 8 (Code Review) are skipped entirely. Conservative tie-break: classify as Trivial only when the solution is truly unique and obvious; if the change spans more than a trivial edit, the solution is not uniquely determined, or there is any doubt at all, fall to Simple or above so internal review is retained. The same external-library major-bump exception described under Simple applies here too (such a change is never Trivial)
  - Simple (typo fix, config tweak, straightforward bug fix with obvious solution): N = 1, unless the change touches an external library's config file or type-level API AND that library had a recent major-version bump (primary check: `git diff <base-commit>` of the package manifest; if absent in this run, judge from other context since the bump may predate this run); then classify as at least Moderate. Similar qualitative risks (external config-DSL rewrites, etc.) follow the same rule. Purely cosmetic edits (comments, whitespace, auto-formatting) do not trigger the exception; use judgment
  - Moderate (multi-file within one module, feature following existing patterns): N = min(2, N)
  - Complex (cross-module, new patterns, API changes, significant refactoring): keep N
File count is a hint, not the sole criterion. If adjusted, mark excess TodoWrite iteration items (Step 3-x and Step 8-x) as `completed`. When N is reduced to 0 (Trivial), mark every Step 3-x / Step 8-x iteration item AND the top-level `Step 3: Plan Review` / `Step 8: Code Review` rows as `completed`; both steps are skipped entirely (their entry-point guards in Step 3 / Step 8 recognize this pre-completed state and pass straight through). Log the assessed difficulty and effective N in the resolved `language` (see §Configuration; default `ja`). If the difficulty is Simple or Trivial and the Step 11.5 TodoWrite row exists (i.e. `self_retrospective.feedback` is configured), additionally mark that row as `completed` with note `skipped: Simple task` (or `skipped: Trivial task`).
- Do not present the plan to the user or ask for approval/confirmation — presenting an unreviewed plan wastes user time and risks approval of a suboptimal approach. This prohibition extends to confirmation-seeking transition sentences such as "if this design looks good, I'll proceed to Step 3 (Plan Review)", "shall I move on to Plan Review?", or any equivalent ask-for-go-ahead phrasing — these read as natural conversation but constitute the same approval-gate that wastes user attention on an unreviewed plan. The moment Step 2 ends, advance directly to Step 3 without emitting any user-facing message about the plan or the transition. The user will see the plan in Step 4 (internally reviewed in Step 3, unless the task was assessed Trivial — N=0 — in which case Step 3 is skipped and the plan reaches Step 4 unreviewed).
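The difficulty adjustment and the TodoWrite bookkeeping described above can be summarized in a few lines. This is a minimal sketch under stated assumptions: the names `Difficulty`, `effective_iterations`, and `rows_to_complete` are hypothetical, and capping Simple at `min(1, N)` is one reading of "the configured value is a ceiling, not a target" rather than something the text spells out.

```python
from enum import Enum


class Difficulty(Enum):
    TRIVIAL = "trivial"
    SIMPLE = "simple"
    MODERATE = "moderate"
    COMPLEX = "complex"


def effective_iterations(configured_n, difficulty, explicit=False):
    """Return the review iteration count after the difficulty adjustment."""
    if explicit:
        # -i / --iterations was given explicitly: the adjustment is skipped.
        return configured_n
    if difficulty is Difficulty.TRIVIAL:
        return 0
    if difficulty is Difficulty.SIMPLE:
        # Assumption: the configured value stays a ceiling here as well.
        return min(1, configured_n)
    if difficulty is Difficulty.MODERATE:
        return min(2, configured_n)
    return configured_n  # Complex: keep N


def rows_to_complete(configured_n, effective_n):
    """List the TodoWrite rows to pre-mark as completed after the adjustment."""
    rows = [
        f"Step {step}-{i}"
        for step in (3, 8)
        for i in range(effective_n + 1, configured_n + 1)
    ]
    if effective_n == 0:
        # Trivial: both review steps are skipped entirely.
        rows += ["Step 3: Plan Review", "Step 8: Code Review"]
    return rows


n = effective_iterations(3, Difficulty.SIMPLE)
print(n, rows_to_complete(3, n))
```

With a configured N of 3 and a Simple task, only the first iteration of each review step stays pending; the excess `Step 3-2`, `Step 3-3`, `Step 8-2`, and `Step 8-3` rows are pre-completed.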
Step 3: Plan Review
This step is an internal review — the reviewer refines the plan before the user sees it, so the user receives a higher-quality plan in Step 4. Do not present the plan to the user or ask for feedback during this step.
Difficulty exception (Trivial / N=0). When Step 2's difficulty assessment set N = 0 (a Trivial task), this entire step is skipped: its TodoWrite rows (top-level Step 3: Plan Review and every Step 3-x) were already marked completed by Step 2's Adjust N by difficulty. This skip is gated on task difficulty, not on the presence of user-provided analysis — the analysis-substitution prohibition below still applies in full to every Simple / Moderate / Complex task (N ≥ 1).
Always run (for N ≥ 1). Step 3 is not skippable on the grounds that the user's task prompt contained design analysis, prior-session handoff material, or review-like commentary. User-provided analysis is upstream planning content the user wrote — it is not an independent bias-free peer review pass and does not substitute for the reviewer dispatch. Handling rules (closed list):
- (i) The Step 3 reviewer skill is always invoked.
- (ii) User-provided analysis (long task descriptions that themselves argue for the approach, embedded justification in handoff docs, etc.) is fed into the reviewer skill's dispatch payload as additional context so the reviewer can build on it rather than re-derive it.
- (iii) An explicit user override in the task prompt ("you may skip Step 3 for this run", or equivalent) is the only analysis-driven path to skipping (distinct from the difficulty exception above). When this fires, record a warning in the Completion summary so the user has a visible signal that the bias-free review pass was bypassed.
The existing per-iteration "No actionable findings" semantic-judgment skip continues to work — that is a reviewer-side decision (the reviewer ran and returned no actionable feedback), not a Step-skip.
If N = 0 (Trivial), skip this step entirely — its rows are already completed (see the Difficulty exception above), so do not re-mark them in_progress and proceed directly to Step 4. The following in_progress marking and per-iteration processing apply only when N ≥ 1.
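The skip rules above amount to a small decision function. The sketch below is illustrative only: `Step3Decision` and `decide_step3` are hypothetical names, and the warning text is a placeholder rather than wording taken from the skill.

```python
from dataclasses import dataclass, field


@dataclass
class Step3Decision:
    run_review: bool
    extra_context: list = field(default_factory=list)
    warnings: list = field(default_factory=list)


def decide_step3(effective_n, user_analysis=None, explicit_skip_override=False):
    """Decide whether the Step 3 reviewer is dispatched."""
    if effective_n == 0:
        # Difficulty exception: Trivial tasks skip Step 3 entirely.
        return Step3Decision(run_review=False)
    if explicit_skip_override:
        # Only an explicit user override skips the review for N >= 1,
        # and the bypass is recorded for the Completion summary.
        return Step3Decision(
            run_review=False,
            warnings=["Step 3 skipped by explicit user override"],
        )
    # User-provided analysis never replaces the review; it is passed along
    # as additional context for the reviewer dispatch.
    context = [user_analysis] if user_analysis else []
    return Step3Decision(run_review=True, extra_context=context)


print(decide_step3(2, user_analysis="long design rationale from the task prompt"))
```

Note that the presence of `user_analysis` never flips `run_review` to false; it only enriches the payload.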
Mark Step 3: Plan Review as in_progress. Process each pending iteration item (Step 3-1 through 3-N) in order:
- Mark the iteration item as `in_progress`. Call the reviewer skill resolved in Step 1 (e.g. `Skill(ask-peer)`): Review the plan.
  - Instruct reviewer to read all files under `.claude/rules/` for project conventions, and `references/plan-format.md` for the Decisions (a)+(b) criterion and § Step 3 (f) content-quality rubric
  - Request feedback organized into six categories:
a. Scope & feasibility: verify the author's Step 2 simplicity self-audit (each plan element should already tie back to an explicit user requirement, known bug, or documented rule; flag only elements where the rationale is weak, missing, or looks like speculative "future-proofing" or unnecessary abstraction), dependencies, risks, `.claude/rules/` compliance.
Premise challenge: for each constraint, scope boundary, or strictness level the Recommendation adopts, verify it is genuinely required by an external requirement, a known bug or constraint, or an existing project rule. Any constraint whose origin cannot be identified must be reported as a finding, and the finding must include at least one relaxed or eliminated alternative so the plan author can populate the `Alternative` field of the relevant Decisions item (see `references/plan-format.md`).
Runtime/language major version upgrades: if the plan proposes upgrading the base runtime or language major version, verify that all pinned dependencies (runtime and dev) explicitly cover the new version. Any dependency whose supported range does not include the new version must be flagged, and the plan must adopt the most conservative version all pinned dependencies safely support rather than leaving compatibility gaps for the user to catch at Step 4.
Shell content portability: if the plan contains shell commands or shell snippets (build scripts, CI commands, cross-OS scripts; for skill development this includes `Bash` examples in SKILL.md, `allowed-tools` glob patterns, and hook scripts), verify portability across runtime environments: quoting, expansion of variables and globs, special-character handling, and shell-flavor differences (e.g. zsh `nomatch` failing on unquoted globs that match nothing, bash vs. POSIX feature drift). Local shells may pass while CI or a different distribution fails; surface the compatibility check at plan time rather than catching it at Step 7.
Cross-component sibling coverage: if the plan changes a structural pattern shared across sibling components (shared base classes, cross-cutting middleware, return/API contracts, mirrored services, parallel route handlers; for skill development this includes subagent dispatch shape, hook wiring, state-file handling, return-contract design), verify the plan addresses all affected siblings, not just the primary target. Three directions to check: (i) a structural fix applied to one component likely needs the same fix in siblings sharing that structure; (ii) a new component added alongside an existing sibling group should have an explicit Decisions item on whether to follow the siblings' shared shape or intentionally diverge; (iii) a processing pattern landing at multiple sites within the same patch should be applied uniformly. Flag any direction where the plan scopes the change to a single component without documenting the rationale for leaving siblings unchanged.
Cross-file closed-list extension audit: if the plan introduces or extends a closed list (a gate enumeration, an enum value set, a disposition value set, a branch label set, a status token set) that is mirrored or referenced in sibling files (typically the canonical site plus one or more `references/*.md` mirror copies; for skill development this includes a user-gate enumeration in SKILL.md mirrored to a `references/plan-format.md` user-gate preamble, a localization-scope list mirrored to a `references/<variant>.md` table, or sibling enum fields in a multi-callee record schema), verify the plan's Test plan explicitly enumerates every reference site as a sweep target. Mode of failure: the canonical list in the primary file is updated but mirror copies drift, leading to user-facing inconsistencies (a preamble enumerates a wrong set; a localization-scope list omits a token).
General principle: closed-list expansion or shrinkage requires a class-level extension audit across all reference sites; the canonical change is necessary but not sufficient.
Internal convention citation verification: when the plan claims "consistent with existing convention", "following existing pattern", "matches sibling structure", or any equivalent appeal to an internal repository convention (for skill development this includes claims about existing SKILL.md marker formats, `allowed-tools` glob granularity, `references/*.md` cross-ref form, fence-label conventions, or sibling-bullet structural patterns), the reviewer must verify the cited pattern exists in the repository via primary source (file `grep` / `Read`, not via recalled-from-memory similarity). If the pattern is not found in the repository, treat the plan element as introducing a new convention rather than following an existing one, and re-validate the introduction under the standard "explicit user requirement / known bug or constraint / documented project rule" criterion. General principle: internal-convention citations are verified the same way as external library API citations (category (e) External library primary-source verification), extended to in-repo conventions.
Internal cross-reference stability: when the plan introduces new sub-steps, branch arms, phase markers, or named state machine states, and references them in prose elsewhere in the file (or in `references/*.md` or sibling skill files), verify the references use stable phrase anchors (section headings, bold-prose paragraph labels, or quoted phrase fragments) rather than raw sub-step numbers (`step 5`, `7g`, `branch 3.4 (c)`), which churn whenever a sub-step is inserted, renumbered, or reorganized. The reviewer can grep for raw sub-step-number patterns inside cross-ref contexts to surface candidates.
General principle: internal cross-references should survive future sub-step insertion / renumbering / reorganization, i.e. refactor-resilient anchoring (for skill development this includes references of the `§ <Heading>` form, the `§ <Heading>'s "<bold-prose label>" paragraph` form, or a quoted-phrase form like `§ "Single source of truth" rule`).
External CLI behavior verification: extend the Shell content portability check above to CLI sub-command semantics. Quoting, empty-set handling, sub-command state dependencies, and output format conventions are CLI-specific behaviors that compile-time portability checks do not surface (for skill development, where CLI invocations show up in SKILL.md procedural prose and `allowed-tools` globs, examples include: `git diff <ref>` omits untracked files by design so an "all-changes" presentation needs explicit untracked enumeration; `git status --porcelain` C-quotes filenames containing spaces or non-ASCII unless `-z` is used; `git commit --amend -F -` does not auto-stage modified files so pre-staging is required; `gh issue list --limit N` truncates at N with no count signal, so the caller must compare returned length against limit to detect overflow). When the plan or diff contains snippets dependent on such behavior, the reviewer must verify the assumption via primary source (run the command in a scratch environment, or cite official documentation / installed manpage), not via remembered behavior. Items that cannot be verified in the current environment should be lifted to Risks as a `stale-CLI-assumption` concern. General principle: external CLI / shell / OS behavior assumptions are verified the same way as external library API assumptions (category (e)).
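Two of the CLI behaviors listed above lend themselves to a short illustration. The sketch below assumes the documented `git status --porcelain -z` layout (a two-character status, a space, the path, a NUL terminator, and for renames or copies a second NUL-terminated field holding the original path); `parse_porcelain_z` and `may_be_truncated` are hypothetical helper names, and the sample string is hand-written rather than captured from a real repository.

```python
def parse_porcelain_z(output):
    """Parse `git status --porcelain -z` text into (status, path, original_path) tuples.

    With -z, paths are not C-quoted, so spaces and non-ASCII names survive as-is.
    """
    fields = output.split("\0")
    entries = []
    i = 0
    while i < len(fields):
        entry = fields[i]
        i += 1
        if not entry:
            continue  # trailing empty field after the final NUL
        status, path = entry[:2], entry[3:]
        original = None
        if "R" in status or "C" in status:
            # Rename / copy entries carry the source path as the next field.
            original = fields[i]
            i += 1
        entries.append((status, path, original))
    return entries


def may_be_truncated(items, limit):
    """A `--limit N` listing that returns N items may have been cut off."""
    return len(items) >= limit


sample = " M my file.txt\0?? new.txt\0R  new name.txt\0old name.txt\0"
for status, path, original in parse_porcelain_z(sample):
    print(repr(status), path, original)
print(may_be_truncated(["issue"] * 30, 30))
```

The overflow check is deliberately conservative: a listing that returns exactly the limit is treated as possibly truncated, since the output alone cannot distinguish "exactly N results" from "more than N".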
Plan-vs-allowed-tools 1:1 alignment: when the plan body (Design / Test plan / Implementation note / Procedure sections) cites concrete external commands (subprocess invocations, shell commands, `git` sub-commands, `gh` API calls, file-system operations, or any other tool invocation that requires a permission grant), verify that every cited command has a matching entry in the plan's `allowed-tools` enumeration. Commands cited in plan prose but absent from the enumeration are Critical-severity Plan findings that will otherwise surface at Step 7.5 as `rules-review` high-confidence violations and force a mid-implementation `allowed-tools` rewrite. General principle: tool grants are part of the plan contract, not an implementation detail; a command path mentioned anywhere in the plan body must be reflected in the enumeration of grants required to execute it (for skill development this includes `Bash(git checkout HEAD -- *)` / `Bash(gh issue *)` / `Bash(jq *)` / `Bash(cp -R *)` patterns the plan body mentions for safety rails, error recovery, or routine queries).
b. Approach & alternatives: simpler methods, architectural fit with existing code
c. Completeness: edge cases, error handling, test plan adequacy (verify specific test files are identified and existing related tests are covered for update).
e2e coverage check: if the change affects user-visible interactions, role-based authorization flows, or multi-step operation sequences (for application development: new user-facing features, permission checks, workflow steps, or role introduction; for skill development: new caller-contract shapes, multi-callee orchestration chains), verify the test plan includes e2e or integration coverage for those paths, or explicitly states why e2e is not needed (e.g. backend-only change with no user-visible behavior).
Collection-predicate boundary cases: when the plan designs a flow that classifies elements per-element and then applies an all/every or any/some predicate over the result set, trace the predicate for three boundary cases, (i) empty set (no elements classified), (ii) all elements share the same classification, (iii) elements have differing classifications, and verify the predicate truth value matches the design intent in each case.
State-variable lifecycle completeness: when the plan introduces a new state variable such as a counter, accumulator, boolean flag, or state-machine state (for skill development this includes counters that gate cleanup decisions, flags that route subtask transitions, or status tokens that drive disposition mapping), verify the Design section specifies all four lifecycle points symmetrically: (i) initialization: where the value is set on entry, with the explicit default; (ii) advance / transition conditions: the success path that triggers the value to increment or transition; (iii) non-advance / non-transition conditions: the failure path with explicit `do not increment` / `do not transition` clauses on retry / abort / amend / recovery branches; (iv) reference sites: which downstream branches consume the value (cleanup gates, routing decisions, mapping table inputs). Symmetric specification of success and failure paths is the general principle; a missing failure-path clause is the primary source of bistability bugs that surface only at Step 8 Code Review.
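The collection-predicate boundary cases above are easy to trace mechanically. The sketch below uses Python's built-in `all` and `any` over hypothetical classification labels (`"ok"`, `"fail"`); the helper names are illustrative. The point it demonstrates is that an all/every predicate is vacuously true on the empty set, which often does not match the design intent.

```python
def trace_predicates(classifications):
    """Evaluate all/any predicates over a list of per-element classifications."""
    return {
        "all_ok": all(c == "ok" for c in classifications),
        "any_ok": any(c == "ok" for c in classifications),
    }


def all_ok_non_empty(classifications):
    """Variant that rejects the empty set instead of passing vacuously."""
    return bool(classifications) and all(c == "ok" for c in classifications)


BOUNDARY_CASES = {
    "empty": [],
    "uniform": ["ok", "ok"],
    "mixed": ["ok", "fail"],
}

for name, items in BOUNDARY_CASES.items():
    print(name, trace_predicates(items), "guarded:", all_ok_non_empty(items))
```

If the design intends "every element passed and there was at least one element", the guarded variant is the one that matches; whether the empty set should pass is a design decision the plan should state explicitly.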
  - CHANGELOG signal placement for compatibility-affecting changes: when the plan modifies a distributed artifact's default behavior such that the next workflow run will behave differently for existing users (a default-value flip on a published config field, a gate addition or removal that fires by default, or a post-completion phase reorder that changes the visible end-state; for skill development this includes default flips of `dev-workflow.md` config booleans, gate reordering in `§ No-Stall Principle`, or removal of a previously-default phase), verify the CHANGELOG / release-notes signal placement is adequate on three axes: (i) first-line visibility: the behavior-change signal appears in the first / topmost bullet of the entry, not buried in a later sub-bullet that skim-readers and CHANGELOG-parsing automation will miss; (ii) opt-out colocation: when an opt-out exists, the opt-out method is named on the same bullet as the signal (not in a later note); (iii) bump strength: the version bump (patch / minor / major) matches the signal strength expected for the compatibility scope. General principle: user-visible default-behavior changes need first-line visibility because skim-reading users and CHANGELOG-parsing automation both pick up the topmost entries.
  - Closed-list reference sweep: when the plan modifies a closed list (adding or removing an enum value, extending a branch set, changing a gate count, retiring a status token), sweep the entire distribution surface for references to that list beyond the canonical definition: the canonical primary file (e.g. SKILL.md), in-tree reference documents (`references/*.md`), user-facing documentation (README and equivalent guides), mirrored copy directories (bundled or duplicated reproductions of the canonical files), manifest / schema declarations (package manifests, plugin registries, configuration schemas), and test / config fixtures that hard-code the enumeration. Reference patterns to check: (i) count claims in adjacent prose (e.g. "fires N times per run", "the four-skill bundle", "1 gate per run"; count claims become wrong the moment the list size changes); (ii) sibling enum fields that share the same disposition vocabulary and should be symmetrically extended with the new case; (iii) disposition mapping tables (status → action / status → record-write token) that need the new row when a new enum value lands; (iv) render rules that switch on the enum value and need a new arm. General principle: closed-set expansion or shrink triggers a class-level extension audit across all distribution surfaces; the canonical change is necessary but not sufficient (for skill development this includes `§ No-Stall Principle` gate count claims, sibling record-schema enum fields like `record.verify_diff` / `record.skill_review` / `record.publicity_review`, mapping tables between status tokens and per-Finding record fields, README.md feature lists / supported-reviewer enumerations, bundle copy directories that mirror canonical skill files (`plugins/<bundle>/skills/<name>/`), and `.claude-plugin/marketplace.json` plugin / skill entries).
d. Incrementality: can this plan be split into smaller, independently verifiable units (e.g. hotfix vs refactor)? Step 1.5 checked at request level; this is the plan-level check, since concrete plans often bundle independent work even when the task looks single-concern. If splittable, propose the split and order. For PR-level splits (distinct verification / rollback / regression attribution), recommend restarting via Step 1.5; for intra-PR splits, recommend staged commits
e. External library primary-source verification: if the plan adopts or adjusts usage of an external library's API, configuration DSL, configuration file, enabled options, or type-level behavior, treat in-project references (`.examples.md`, `.local.md`, existing implementations) as secondary since they can go stale after a dependency upgrade. Interpret the scope broadly: plugin activation and option tweaks count, not just direct API calls. Project-internal configuration decisions that happen to sit in a library's config file (e.g. toggling a style preference like tsconfig `strict`) are out of scope for this category, since they carry no library-version-compatibility risk. Require the plan to cite at least one primary source: the language's most authoritative installed source (installed type definitions, package source, official reference docs). If the primary source cannot be consulted in this environment (missing installed deps, no web access), flag the item explicitly as a stale-API concern in the plan rather than silently trusting secondary material
f. Presentation & attention allocation (content quality): external re-check of the Decisions section's content. Verify each item genuinely passes the (a)+(b) criterion, surface any judgment call buried in Design that should have been in Decisions, and check cross-section consistency (Overview Scope ↔ Design, Test plan ↔ Design, Decisions ↔ Design). Full rubric in `references/plan-format.md` § Step 3 (f) content-quality rubric
- If `custom_instructions` is configured, include the instructions text in the review request and have the reviewer verify alignment and report conflicts
- If a state file is active (executing a subtask from a decomposition), include the current subtask's scope in the reviewer request: list the subtask's `title` and `description`, then list what the other subtasks cover (to define out-of-scope). Instruct the reviewer that missing functionality belonging to other subtasks is not an actionable finding for this review; only findings scoped to the current subtask qualify. Omit this when no state file is active (single-task execution has no defined out-of-scope boundary).
- Reviewer should only report actionable findings. If none, explicitly state "No actionable findings"
- Instruct reviewer to read all files under
- Judge the reviewer's response semantically: if the reviewer reports nothing actionable (no actionable findings, no improvements to apply, no review points raised, or any other "nothing to report" outcome regardless of exact wording), mark this and remaining iteration items as `completed` (skip). Mark `Step 3: Plan Review` as `completed` and proceed to Step 4 automatically. Per the No-Stall Principle, do not wait for user input and do not rely on exact-phrase matching; trust semantic judgment since the reviewer skill's phrasing varies (especially `Skill(ask-peer)` and other free-form-prose reviewers whose verdicts are natural-language Markdown rather than a fixed token).
- Otherwise: autonomously apply improvements or reject inapplicable points with reason; do not ask the user for judgment on individual review findings. Mark this iteration item as `completed`.
  - Approach-reconsideration self-audit on high findings count (iter 1 only): at the iter 1 → iter 2 boundary, before applying findings individually for iter 1's output, count the reviewer's findings by severity. If either threshold trips (Critical ≥ 3 OR (Critical + Major) ≥ 10), additionally scan the findings list for any item that surfaces an approach-level alternative (typical phrasings: "X の方が筋がよい", "existing X と統合できる", "switch to <sibling>", "use <existing-mechanism> instead", or any equivalent "the plan should adopt a different overall approach" framing). If at least one such approach-alternative finding is present, do not proceed with mechanical apply-and-iterate; instead, treat the findings cluster as a signal that the plan's Approach selection itself is the root cause. Rewrite the plan with the approach-alternative finding's direction promoted into the Decisions section (Recommendation / Alternative swap or insertion-direction new Decision item, per the rewrite class), add a new review iteration item Step 3-(N+1), and return to Step 3 to re-review the rewritten plan. The remaining iter-1 findings are carried forward as context for the next reviewer. When the threshold trips but no approach-alternative finding is present (mechanical-fix-level findings only), proceed with the usual per-finding apply-and-iterate path. This audit applies only at the iter 1 → iter 2 boundary; later iterations have already exercised one or more apply cycles, and approach-reconsideration after that point is the Step 4 user-gate's responsibility (general principle: high finding density paired with an approach-level alternative finding is a structural signal, not a quality signal; keep applying mechanical fixes and the plan still fails at the Step 4 user gate).
  - Prose-integrity self-check (post-fix): after applying a fix that edits plan prose adjacent to its target line (Decisions / Design / Test plan / Risks / Unknowns paragraphs), re-read the surrounding paragraph as a single unit before continuing. Verify no sentence is cut mid-word, no logical connective is broken (the connectives `however` / `therefore` / `because` / `but` / etc. still anchor real clauses), and the paragraph's overall logic still holds. Mechanical fix patches see only the line-level diff and routinely leave the surrounding prose semantically broken in ways the next iteration's reviewer flags as a Major finding, costing an extra iter.
  - If the plan was modified: continue to the next pending iteration item (back to step 1). Plan modifications often introduce new gaps or ripple effects that the previous reviewer had no chance to see; the re-review round-trip is cheap compared to shipping a plan that looks fine to the author but has an unvetted section. Don't short-circuit even when the fixes feel airtight
  - If all points were rejected (no modifications): mark remaining iteration items as `completed` (skip, since there is nothing new for the next reviewer to look at)

  Continue to the next pending iteration item with:
  - the updated plan
  - a summary of changes made and rejections with reasons
  - the same six-category structure (a–f), `.claude/rules/` reference, and "No actionable findings" requirement

  Return-point no-stall reminder: At each iteration boundary (regardless of reviewer outcome: findings reported, "No actionable findings", any non-error result), the next action must be issued in the next tool call. The next action is the next iteration's reviewer dispatch when more iteration items remain, or the Step 4 transition when this was the last iteration or "No actionable findings" was returned. Do not insert an interstitial summary or acknowledgment turn between iterations; the abstract enumeration in `§ No-Stall Principle` is intentionally duplicated here so the rule fires at the decision moment.
- If all N iteration items are completed and actionable feedback still remains, carry the unresolved points forward to Step 4.

Mark Step 3: Plan Review as completed.
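
The approach-reconsideration self-audit above reduces to a small decision rule, sketched here in Python as an illustration rather than part of the skill. The `Finding` type, the severity labels as strings, and the `approach_alternative` flag are assumptions introduced for this example; in the workflow that flag comes from semantic judgment of the reviewer's prose, not from string matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical record for one reviewer finding."""
    severity: str               # "Critical", "Major", or anything else
    approach_alternative: bool  # True if it proposes a different overall approach


def next_action(findings: list[Finding], iteration: int) -> str:
    """Decide between rewriting the plan and the usual per-finding path."""
    critical = sum(f.severity == "Critical" for f in findings)
    major = sum(f.severity == "Major" for f in findings)
    # Either threshold trips: Critical >= 3 OR (Critical + Major) >= 10.
    tripped = critical >= 3 or (critical + major) >= 10
    has_alternative = any(f.approach_alternative for f in findings)
    # The audit applies only at the iter 1 -> iter 2 boundary.
    if iteration == 1 and tripped and has_alternative:
        return "rewrite-plan"
    return "apply-and-iterate"
```

A tripped threshold without an approach-alternative finding still returns `"apply-and-iterate"`, which mirrors the mechanical-fix-only branch described above.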
Step 4: Finalize Plan (USER APPROVAL GATE)
1. Before presenting, verify via `TodoWrite` that `Step 3: Plan Review` and every Step 3-x iteration item are `completed`. `ExitPlanMode` is the effective exit from Plan Mode, so issuing it while any Step 3 item is still `pending` or `in_progress` skips the internal review entirely. If any Step 3 item is not `completed`, emit a one-line inline note to the user naming all incomplete items (e.g., `Plan review found incomplete (Step 3-2 still pending) — running the remaining review pass before presenting the plan.`, substituting the actual incomplete iteration item label(s)), then return to Step 3 to process it (do not flip the row to `completed` without doing the review work). Exception: when Step 2's difficulty assessment set N=0 (a Trivial task) and therefore pre-marked all Step 3 rows `completed`, that completed state is the intended skip, not an unrun-review bug, so proceed to `ExitPlanMode` normally.

1.5. Prose-language self-audit: Before calling `ExitPlanMode`, verify that explanation prose in the plan body (Overview narrative, Decisions rationale, Design descriptions, Test plan steps, Risks/Unknowns paragraphs) is written in the resolved `language`. Schema tokens (`Overview` / `Decisions` / `Design` / `Test plan` / `Risks/Unknowns`), step labels, enum values, identifiers, and quoted code strings stay in their original form regardless of `language`. If any explanation sentences are in a different language than the resolved `language`, revise them now per `references/plan-format.md` § Localization granularity before proceeding to sub-step 2. Re-entry coverage: this audit must re-run on every entry into Step 4, both the initial entry and any re-entry triggered by sub-step 1's "return to Step 3" path or sub-step 3's material-change path, since revisions during Step 3 iteration may introduce prose in a language different from the resolved `language`.

2. This is the first time the user sees the plan. Present the plan to the user. For N ≥ 1 the plan was internally reviewed in Step 3 (include any unresolved review points from Step 3); for a Trivial task (N=0) Step 3 was skipped, so present the plan as unreviewed and rely on this user-approval gate as the sole review. Follow the presentation order in `references/plan-format.md` § Step 4 presentation order and render in this order:
   a. `## Plan` header as a visual boundary.
   b. Full plan body in template order (Overview, Decisions, Design, Test plan, Risks/Unknowns if present), rendered in full following `references/plan-format.md` § Localization granularity in the resolved `language` (see § Configuration; default `ja`). Section headings render at `###` level (one below the `## Plan` container); sub-sections (Title, Goal, Scope, Decision N, Implementation, etc.) at `####`.
   c. Horizontal rule (`---`) separator.
   d. Summary preamble per `references/plan-format.md` § User-gate summary preamble.
   e. Guidance line per `references/plan-format.md` § Step 4 guidance lines (verbatim, no paraphrasing, no concatenation).
   f. Call `ExitPlanMode` in the same turn, immediately after the guidance line. `ExitPlanMode` triggers the approval modal; if it is not called, the user sees the plan text but has no way to approve. Delaying `ExitPlanMode` to a subsequent turn is the primary cause of Step 4 appearing stalled.

   Section headings (`Overview` / `Decisions` / `Design` / `Test plan` / `Risks/Unknowns`) and the Step 4 guidance line stay English.

3. Collaborate with the user to refine the plan as needed (normal Plan Mode interaction). Categorize each user response into one of the four buckets below via semantic judgment (per § No-Stall Principle's "do not rely on exact-phrase matching" rule; example phrasings are illustrative, not literal discriminators):
   - accept: explicit affirmative ("OK" / "approve" / "looks good" / "進めて" / any semantic equivalent). Begin implementation.
   - swap-decisions (Decisions Recommendation/Alternative swap on one or more specific items, e.g. "Decision 1 を Alternative に", "swap the recommendation on the language flag", "use the alternative for Decision N", "Decision N と M は Alternative で残りはそのまま"): re-render the plan with the specified Recommendation / Alternative pairs swapped on the named Decisions items, leave other items unchanged, run the read-back sub-step below, then re-present the plan (re-enter the gate). When the user names multiple Decisions in one message, list every affected item on the read-back line so partial-coverage misses cannot slip through.
   - rewrite-approach (Approach / Design / Scope-level material change, e.g. "switch from independent skill to extending sibling mode", "split this into two subtasks", "scope down to only the canonical site", or any change that does not fit a clean Decisions swap): add a new review iteration item (Step 3-(N+1)), run the read-back sub-step below, return to Step 3 to re-review the modified plan, then re-enter Step 4 from sub-step 1 (so sub-step 1's TodoWrite completion check on the new Step 3-(N+1) item and sub-step 1.5's prose-language re-entry-coverage audit both run before re-presenting at sub-step 2). Trivial (N=0) re-activation: if the task had been assessed Trivial (N=0) so Step 3 was skipped, an Approach-level material change means the task is no longer trivially self-evident. Re-run Step 2's Adjust N by difficulty against the rewritten plan to re-derive the difficulty assessment itself (it will no longer be Trivial) and the effective N. Updating the difficulty assessment, not just N, is required because every downstream gate keys on the assessment, not on a bare N: Step 4's unreviewed-plan presentation, the `references/plan-format.md` Trivial conditional, and Step 11.5's Simple/Trivial hard-skip all read "the task was assessed Trivial"; leaving the stale Trivial label in place would keep announcing "sole review" and keep Step 11.5 skipped even after Step 3 runs. Then re-mark the TodoWrite rows for the re-derived difficulty: register Step 3-1 … Step 3-N (and the Step 8 rows) as fresh `pending`, clear the previously-skip-completed top-level `Step 3: Plan Review` / `Step 8: Code Review` rows back to `pending`, and, when the re-derived difficulty is neither Simple nor Trivial, restore the Step 11.5 row (skip-completed under the prior Trivial assessment) to `pending`. Without this re-derivation the Step 3 entry-point guard would skip the new review item (it skips whenever N=0) and Step 4's completion check would loop on the unprocessed item.
   - withdraw: explicit halt ("stop" / "cancel" / "abort" / "やめる" / "取り下げ"). Exit the workflow with no further steps; do not proceed to implementation.

   Read-back sub-step (mandatory before applying any `swap-decisions` / `rewrite-approach` interpretation): emit a one-line summary of the interpreted change in the resolved `language` (e.g. `Decision 1 を Alternative に切り替え、Decisions 2 と 3 は Recommendation のまま保持します — このまま反映してよろしいですか?`) and wait for the user to confirm before re-rendering. The read-back is the gate-of-origin's own resolution branch; do not nest a separate ExitPlanMode call inside it. If the user's confirmation response itself reads as another `swap-decisions` / `rewrite-approach` / `withdraw` instruction, treat the read-back as un-confirmed and re-classify under the four buckets above. The read-back catches multi-Decisions instructions with partial coverage and Approach-level instructions that masquerade as Decisions swaps; both are common failure modes that silently lose user-specified scope when interpreted without read-back.

   NOT approval (interrogative or non-committal: "look good?" / "どう?" / "これでいい?"): treat as ambiguous. Ask the user to confirm whether they intended an affirmative or to surface a change request, then re-classify the response under the four buckets above. Do not silently advance.

After the user accepts (`accept` bucket), begin implementation.
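
The completion check in sub-step 1 can be pictured as a filter over the todo rows. The sketch below is only an illustration: it assumes the rows are available as a plain mapping from label to status, which is a hypothetical representation and not the actual `TodoWrite` data shape.

```python
def incomplete_step3_items(todos: dict[str, str]) -> list[str]:
    """Return the labels of Step 3 rows that are not yet completed.

    `todos` maps a row label to its status; this shape is assumed
    for the example only.
    """
    return [
        label
        for label, status in todos.items()
        if label.startswith("Step 3") and status != "completed"
    ]


def may_call_exit_plan_mode(todos: dict[str, str]) -> bool:
    """The plan may be presented only when no Step 3 row is left open."""
    return not incomplete_step3_items(todos)
```

Under this reading the Trivial (N=0) exception needs no special case: the Step 3 rows were pre-marked `completed` in Step 2, so the filter returns an empty list and the gate opens.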
Step 5: Implement
- Plan entry self-check (user-side manual action extraction): before issuing the first implementation tool call, scan the approved Plan body (Overview / Decisions / Design / Test plan / Risks / Unknowns) for embedded user-side manual actions: environment-prerequisite probes the user must run themselves, configuration values the user must add to a config file outside the agent's write surface, external authentications / API keys / hook installations / OS-level changes the user must perform manually, and manual verification steps the user must execute against external systems (for general software development this includes "run `<command>` and confirm output", "edit `~/<config>` to set X=Y", "log in to <external dashboard> and authorize Z"; for skill development this includes `~/.claude/settings.json` edits, hook installation, external CLI installation, or workspace-level config the user must place outside the repo). When at least one such manual action is present, emit a short independent block at the top of Step 5, separately from any other implementation prose, listing each manual action verbatim with the Plan section it came from, before proceeding with the first implementation tool call. The block ensures the user sees the manual action items distinctly rather than discovering them buried inside long-running plan execution. When no manual actions are present (purely agent-executable plan), skip this block. The block is informational: Step 5 continues without waiting for user input on the manual actions themselves (the user-side observation gate at the probe → real-implementation boundary handles cases where the workflow needs to wait).
- Follow the plan, track progress with `TodoWrite`. Apply `custom_instructions` throughout implementation
- Respect prior in-session edits: content the user explicitly removed earlier in this session (comments, guards, logs) must not reappear. Treat deletion as authoritative, not as a gap to fill. This discipline applies when applying plan steps, when applying Step 6 tidy output, and when applying Step 8 review fixes; the reviewer/tidy subagents only see the diff and cannot enforce this themselves
- Late-stage scaffolding self-audit: when implementation introduces a structural element that was not present in the Step 2 plan (a new sub-step, an additional enum value, a new branch arm, an additional call site that invokes the same callee at a new location, a new recovery / fall-through path; for skill development this includes a new SKILL.md sub-step, an additional `status` enum in a return contract, a new error class, a new disposition mapping row), re-apply the same Step 2 § Simplicity self-audit rigor to the newly introduced element before moving on: (i) sibling symmetry: when the new element parallels existing sibling elements, verify same fields / same disposition values / same error-class coverage; (ii) error-path symmetry: for any success path introduced, trace its corresponding failure path explicitly (counter increment vs. non-increment, success-only vs. failure-included); (iii) boundary-value coverage: for any predicate, threshold, or count introduced, trace the boundary cases (empty input, all-same-classification, mixed-classification) and verify the predicate truth value matches design intent; (iv) reference-site sweep: if the new element is referenced from prose elsewhere in the file, verify those references use stable phrase anchors (not raw sub-step numbers / branch letters). The reviewer / tidy subagents see only the diff and cannot enforce this self-audit, so it must run in the main thread at Step 5. Late-stage scaffolding correctness gaps that first surface at Step 8 Code Review iter 1 indicate this audit was skipped.
- Final-pass literal-value full-repo grep: at Step 5 completion (after all planned edits are applied and before advancing to Step 6 Tidy), for each literal value the plan replaced or introduced (numeric constants such as threshold values, version numbers, and magic numbers; token strings such as status enum values, config keys, and identifiers; file path fragments; or any other literal whose semantics are tied to a specific value), `grep` the entire repository (not just the plan's enumerated sweep targets) for the old value and confirm zero hits. The Plan's Test plan typically enumerates known sweep sites, but narrative examples embedded in prose (illustrative numbers in descriptive text, story-style usage examples in SKILL.md / `references/*.md` / README files) routinely sit outside the enumerated list and silently retain the old value through mechanical search-and-replace passes. Two-stage structure: (i) enumerated sites: the explicit list from the Plan's Test plan, verified one by one; (ii) final-pass full-repo grep: `git ls-files | xargs grep -l <old-value>` (or equivalent) for any residual hits outside the enumerated list, with each hit reviewed in context and either updated (if it carries the old value's semantics) or marked as out-of-scope (e.g. a different concept that coincidentally uses the same literal). The reviewer / tidy subagents see only the diff and cannot enforce the full-repo sweep, so it runs as a Step 5 completion gate (for skill development this includes literal numeric thresholds cited in `references/*.md` narrative examples, version strings in README usage snippets, and example values in compaction / extract-rules-style descriptive prose). Non-literal-replacement tasks skip this audit.
- Pre-write path scope check (Write / Edit / new-file path safety): before every `Write` / `Edit` / similar file-creation tool call whose `file_path` argument does not match a path that already exists in `git ls-files` output (typically new files generated by Step 5 / Plan rewrite / staging document creation / new test fixtures / new CHANGELOG entries, i.e. file paths that the tool will create rather than modify in place), run a two-stage path verification before issuing the tool call: (i) repo-root containment: verify the absolute resolved path sits under `git rev-parse --show-toplevel` (no `../` escape from the working directory, no absolute path leading outside the repo); (ii) prefix sanity: verify the path's leading directory matches an expected location for its content class (`.claude/plans/` for plan documents, `skills/<name>/` for skill content, `src/` or `tests/` or equivalent for code, `.triage/` or `tmp/` for staging, etc.). If either check fails, abort the tool call with a fail-loud diagnostic naming the resolved path and the expected prefix set, rather than silently creating the file. The `allowed-tools` permission grant alone does not prevent parent-directory landing (`Write` accepts any string `file_path`), so a procedural pre-check is the only structural defense against typo-induced orphaned files (for general software development this includes accidental migration / config / test-fixture writes landing one directory up; for skill development this includes `.claude/plans/<slug>.md` typos depositing files at `../<slug>.md`, marketplace.json paired-bump operations writing to the wrong manifest, or staging documents landing outside `.triage/` / `.claude/`). If a tool call has already created a file in the wrong location, instruct the user to delete it manually; the workflow's auto-mode classifier cannot reach files outside the project scope, so manual cleanup is the only path.
- User-observable artifact protection gate at probe → real-implementation boundary: when the Plan explicitly stages an implementation as probe / intermediate-artifact → real-implementation replacement (e.g. "first emit a debug-instrumented version for user to observe, then replace with the production implementation", "scaffold a placeholder file the user will manually inspect, then overwrite with the final content", "log expected probe output as a verification step, then remove the logging"), do not advance to the real-implementation step until the user has signaled observation completion. The probe-output observation gate is the only user-side wait state permitted inside Step 5; every other Step 5 sub-step proceeds autonomously per § No-Stall Principle. When the probe is committed to disk and the user has not yet acknowledged observation, hold the workflow at this boundary and emit a one-line wait prompt in the resolved `language` (e.g. `Probe artifact deployed at <path> — please observe its output before the workflow replaces it with the final implementation. Reply when ready to proceed.`). Resume the real-implementation step on any non-empty user reply. When no probe → real sequence is in the Plan (the typical case of purely incremental implementation), this gate does not fire (for general software development this includes debug-log-instrumented scaffolds replaced by clean production versions, mock-data fixtures replaced by real-data fetches; for skill development this includes verbose-tracing skill versions replaced by streamlined final versions). The gate exists to prevent the probe artifact from being silently overwritten before the user has had a chance to inspect it, a failure mode the No-Stall Principle's autonomy guarantee otherwise creates.

  AskUserQuestion option design (applies to the probe gate above and any future user-state-query call in this workflow): when the workflow uses `AskUserQuestion` (or any equivalent multi-option user-query tool) to query the user about a plan-derived state (probe-execution outcome, manual-verification result, environment-prerequisite check, or any equivalent state confirmation), the options list MUST include a meta-confusion branch alongside the result enumeration. Concretely, do not present only outcome categories (e.g. `success / failure / skipped`); also include an option phrased as "the procedure / expected outcome is not yet understood (please re-explain)" in the resolved `language` (e.g. `language: ja`: `手順 / 期待結果がまだ把握できていない(要再説明)`; `language: en`: `the procedure / expected outcome is not yet clear (please re-explain)`). The meta-confusion branch absorbs the "I cannot answer the question as posed" state; without it, the user is forced into `Other` free-text and the workflow consumes an extra clarification turn re-explaining what was already in the Plan. General principle: user-state queries enumerate outcomes AND leave a fallback for the premise-not-conveyed case, never outcomes alone (for general software development this includes deployment-readiness queries, migration-completion confirmations, external-system-state checks; for skill development this includes probe-result queries inside this Step 5 gate, callee-execution-outcome confirmations, manual-config-applied verifications).
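
The two-stage pre-write path check described above (repo-root containment, then prefix sanity) could look roughly like the following Python sketch. It is a minimal illustration under stated assumptions, not the skill's implementation: the function name, the `EXPECTED_PREFIXES` tuple, and the error wording are hypothetical, and the caller is assumed to pass the repo root it obtained from `git rev-parse --show-toplevel`.

```python
from pathlib import Path
from typing import Optional

# Assumed prefix set for illustration; a real project would derive its own.
EXPECTED_PREFIXES = (".claude/plans", "skills", "src", "tests", ".triage", "tmp")


def check_new_file_path(file_path: str, repo_root: str,
                        cwd: Optional[str] = None) -> Path:
    """Resolve file_path and fail loudly if it is unsafe to create."""
    root = Path(repo_root).resolve()
    base = Path(cwd).resolve() if cwd else root
    # Joining with an absolute file_path discards `base`, so absolute
    # paths are checked against the repo root as well.
    resolved = (base / file_path).resolve()

    # Stage (i): repo-root containment.
    try:
        relative = resolved.relative_to(root)
    except ValueError:
        raise ValueError(
            f"path escapes repo root: {resolved} (root: {root})"
        ) from None

    # Stage (ii): prefix sanity against the expected locations.
    rel = relative.as_posix()
    if not any(rel == p or rel.startswith(p + "/") for p in EXPECTED_PREFIXES):
        raise ValueError(
            f"unexpected location: {resolved} (expected one of {EXPECTED_PREFIXES})"
        )
    return resolved
```

Raising instead of returning a boolean keeps the failure loud: the caller cannot forget to inspect the result before issuing the write.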
Step 6: Tidy
Implementation often introduces unnecessary complexity that's easier to spot in a dedicated pass after the code works.
- Pre-dispatch rename-sweep self-audit: if the Implement diff (since `<base-commit>` recorded in Step 2) includes a term-rename operation (a search-and-replace across the project that swapped a step name, callee name, config key, identifier, or domain concept for a new one), sweep the changed-path SKILL.md / `references/*.md` / README prose for synonyms and derived forms of the rename target before dispatching `Skill(tidy)`, and fix any residue inline. General principle: mechanical search-and-replace leaves synonym / derived-form residue that the substitution alone cannot catch: gerund forms when a verb is renamed, nominalizations and related-noun forms when an action is renamed, conceptual paraphrases of the original term in surrounding description text when a step or concept is renamed (for skill development this includes renaming a procedural verb leaving its `-ing` form in description prose, renaming a step leaving the prior step-concept paraphrase in cross-section reference text, or renaming a callee leaving the old concept noun in doc-comment / SKILL.md narrative). Detect the residue here at Step 6 so the Completion-time integrity check (Step 8 reviewer / `hooks.on_complete`) remains a backstop rather than the primary detection point. Non-rename diffs skip this audit.
- `Skill(tidy)`: Review changed code for reuse, quality, and efficiency, then apply cleanup edits. Do not pass `Base ref` / `--base-commit <sha>`: tidy's default working-tree mode is the intended scope here (covers tracked + staged + untracked changes per tidy's `§ Invocation contract`); passing `Base ref` would switch tidy to committed-history mode and silently drop untracked files from the cleanup scope, even though sibling Steps (Step 7's `test_commands`, Step 7.5's `Skill(rules-review)`) invoke their callees with `--base-commit <sha>`. Pass the workflow's `custom_instructions` config value through tidy's natural-language `Custom instructions` field (omit the field entirely when `custom_instructions` is unset or empty; do not render `(none)` / empty string / fabricated default). General principles: (i) when a caller-skill dispatch field is driven by an optional config key, state the absent-key behavior inline on the dispatch line rather than relying on cross-reference to the config-parse step; (ii) when a caller depends on a callee's default-mode behavior for scope correctness and sibling steps use a different argument convention, name the asymmetry on the dispatch line as load-bearing rather than implicit, because the executor cannot rely on a default-by-omission when sibling steps create an extrapolation pull toward the explicit form.
  - Regardless of the outcome (whether `tidy` applied fixes, reported no actionable findings, or returned any other non-error result), mark `Step 6: Tidy` as `completed` and proceed to Step 7 automatically. Per the No-Stall Principle, do not wait for user input.
  - If the `Skill(tidy)` result is not observable (e.g. context compaction occurred during or immediately after the call): inspect `git diff <base-commit>`. If the diff contains changes clearly attributable to a cleanup pass, treat tidy as completed and proceed to Step 7. Otherwise (no tidy-attributed changes visible, or ambiguous), re-execute `Skill(tidy)` once (inspection-and-fix-class skills are idempotent), then proceed to Step 7.
Step 7: Check / Test (max 3 retries)
1. Run `check_commands` in order (always run all)
   - On failure, fix and retry (do not proceed to test execution)
   - Scope-drift guard: before each command, record `git diff --name-only <base-commit>` as the task-scope snapshot (the file set scoped to this task at the start of Step 7). After the command, re-check: any file newly appearing outside that snapshot was written by the command (auto-fix/write behavior sweeping unrelated drift). If scope drift is detected, classify the out-of-scope changes before acting. If all of the following hold: (i) the out-of-scope diff is whitespace or comment changes only (no code-skeleton changes: no non-blank, non-comment lines added or removed), (ii) the total changed line count across all out-of-scope files is ≤ 5, and (iii) the changes are attributable to the formatter or linter that just ran (the command is a known formatter/linter, e.g. `lint:fix`, `format`, `prettier`, `black`), then proceed automatically without a user-direction stop: emit a one-line note (e.g. `Scope-drift note: <file>(s) received whitespace-only formatting from <command> — proceeding`) and continue to the next command. Otherwise (non-trivial drift): warn the user (list both the in-scope files and the newly-appeared out-of-scope files), do not auto-revert / `git checkout` / delete the out-of-scope changes (leave the working tree as the command left it for user inspection), leave `Step 7: Check / Test` as `in_progress`, and wait for user direction. This is a step-internal stop directive (the only allowed non-completing exit from the check_commands phase) and is consistent with the No-Stall Principle, which permits explicit step-defined stops
2. Iterate over `test_commands` in order. For each entry (which must be of the form `Skill(<name>)`), invoke that skill with `--base-commit <sha>` (from Step 2) via `$ARGUMENTS`. Each invocation must return a structured summary with one of three statuses (SUCCESS / TEST_FAILED / EXECUTION_ERROR); a TEST_FAILED or EXECUTION_ERROR from any entry halts the loop immediately and triggers the retry path in sub-step 3, so subsequent entries do not run on the failing pass.
   - Each test skill handles scope decision and test execution internally via subagent (when applicable)
   - Returns structured summary: SUCCESS / TEST_FAILED / EXECUTION_ERROR
   - Bulk-vs-split execution: when the change is cross-cutting (shared components, mirrored services, or parallel handlers) and the test suite includes long-duration categories (E2E, integration tests with external dependencies), prefer passing scoped or split arguments rather than requesting a single bulk run. A single command bundling long-running jobs makes intermediate progress opaque and failure recovery harder; scope-targeted execution lets each category succeed or fail independently.
   - Pre-existing vs regression discrimination: before entering the retry path on `TEST_FAILED` / `EXECUTION_ERROR`, discriminate each reported failure as regression (introduced by this run's changes) or pre-existing (already failing at `<base-commit>` from Step 2). Two paths: (i) if the invoked test skill's structured summary already classifies failures as `pre-existing` / `regression` (recommended return-contract extension for any verification-class skill: lint, test runners, structural checkers, marketplace validators), trust that classification. (ii) Otherwise, re-run the same test skill against `<base-commit>`: stash the working changes (`git stash --include-untracked`) and check out `<base-commit>` into a scratch worktree (`git worktree add ../base-commit-check <base-commit>`), or rely on the test skill's own `--base-commit` argument if it supports re-evaluating at that ref without working-tree manipulation; then compare the failures. Failures reproducing at `<base-commit>` are pre-existing: record each as an informational warning in the summary (`pre-existing failure: <skill> / <case>`, out-of-scope for this PR), do not count it toward the 3-retry budget, and do not auto-fix it. Only failures that do not reproduce at `<base-commit>` are regressions; proceed with the existing retry / fix path for those. General principle: regression-vs-pre-existing discrimination via base-commit comparison applies to any verification step running a checker against a working tree (lint, test, structural validator; for skill development this includes marketplace structure validation and plugin integrity checks where docs and implementation can disconnect independently of the current change).
3. After 3 retries, report to user and stop
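The three-condition scope-drift classification in the guard above can be sketched as a small function. This is an illustrative sketch, not the workflow's implementation: `classify_scope_drift`, the `changed_lines` input shape, the crude comment-prefix test, and counting changed lines as removed plus added are all assumptions made for the example. It interprets "whitespace or comment changes only" as "the non-blank, non-comment lines are identical once whitespace is removed".

```python
from collections import Counter

# Example formatter/linter names taken from the text; a real list is project-specific.
KNOWN_FORMATTERS = ("lint:fix", "format", "prettier", "black")
# Assumption: a crude, language-agnostic way to recognise comment lines.
COMMENT_PREFIXES = ("#", "//", "/*", "*")


def _skeleton(lines):
    """Multiset of code lines with whitespace removed; blank and comment lines dropped."""
    kept = []
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith(COMMENT_PREFIXES):
            kept.append("".join(stripped.split()))
    return Counter(kept)


def classify_scope_drift(snapshot, after, changed_lines, command):
    """Return "none", "trivial" (proceed with a note) or "non-trivial" (stop for the user).

    snapshot / after: file sets from `git diff --name-only <base-commit>` before and
    after the command. changed_lines maps each out-of-scope path to a
    (removed_lines, added_lines) pair supplied by the caller.
    """
    out_of_scope = sorted(set(after) - set(snapshot))
    if not out_of_scope:
        return "none"
    total = 0
    for path in out_of_scope:
        removed, added = changed_lines[path]
        total += len(removed) + len(added)
        if _skeleton(removed) != _skeleton(added):  # condition (i): code skeleton changed
            return "non-trivial"
    if total > 5:  # condition (ii): more than 5 changed lines
        return "non-trivial"
    if not any(name in command for name in KNOWN_FORMATTERS):  # condition (iii)
        return "non-trivial"
    return "trivial"


drift = {"src/b.py": (["x = 1  "], ["x = 1"])}
print(classify_scope_drift({"src/a.py"}, {"src/a.py", "src/b.py"}, drift, "npm run format"))
```

Returning a label rather than acting keeps the stop decision (leave the step `in_progress` and wait) in the caller, where the text places it.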
Coverage note (TypeScript multi-tsconfig): For projects with Project References or multiple `tsconfig*.json` files, a single `tsc --noEmit` may miss changed files that belong to other tsconfigs. `--init` auto-registers a per-tsconfig `tsc -p <path> --noEmit` in this case (see `references/init-mode.md` for detection rules). If coverage still looks incomplete, re-run `--init` or append the missing command manually.
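For the pre-existing vs regression discrimination in Step 7, the base-commit comparison reduces to a set comparison once both failure lists are available. The sketch below assumes failures are identified by a test-case name; `split_failures`, `pre_existing_warnings`, and the example skill name are hypothetical and exist only for illustration.

```python
def split_failures(current_failures, base_failures):
    """Split failures from the working tree into (regressions, pre_existing).

    A failure that also reproduces at <base-commit> is pre-existing;
    anything else was introduced by this run's changes.
    """
    current, base = set(current_failures), set(base_failures)
    regressions = sorted(current - base)
    pre_existing = sorted(current & base)
    return regressions, pre_existing


def pre_existing_warnings(skill, pre_existing):
    """Render the informational warning lines in the format the text gives."""
    return [f"pre-existing failure: {skill} / {case}" for case in pre_existing]


regressions, pre_existing = split_failures(
    ["test_login", "test_export"],   # failing in the working tree
    ["test_export", "test_legacy"],  # failing at <base-commit>
)
print(regressions)
print(pre_existing_warnings("Skill(run-tests)", pre_existing))
```

Only the `regressions` list would feed the retry / fix path and the 3-retry budget; the warning lines go into the summary as-is.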
GATE: Verify Steps 2-7 are completed (check TodoWrite status; if status is inconsistent, verify actual completion by reviewing work done). Mark Step 7.5 as `in_progress`.
Step 7.5: Rules Compliance Review
Dedicated rules compliance check, separate from code review (Step 8). This ensures rule enforcement gets focused attention rather than competing with correctness and design concerns.
Responsibility scope (so the same rule class is not double-reviewed across passes and no class slips through every pass):
- Step 7.5 owns the mechanical walk of every matched `.claude/rules/` rule against the diff: hard rules (explicit prohibitions, naming, reference form, import paths, placement, file structure) are evaluated strictly; intent-style rules (judgment-based principles, prose conventions) are evaluated best-effort with low-confidence markers per `rules-review` SKILL.md.
- Step 6 Tidy covers reuse, prose quality, dead code, and redundancy; rule compliance is not its primary responsibility. If `Skill(tidy)` surfaces rule findings as a side effect, treat them as bonus and do not extend its reviewer prompt to take on `.claude/rules/` walks.
- Step 8 Code Review covers correctness, edge cases, conventions / consistency lightly (a safety-net pass for files modified after Step 7.5), and simplicity / maintainability; the thorough rules check stays at Step 7.5.
- Step 11 Update Rules owns the rule-doc-drift class: findings where the code under review is internally consistent with itself (and with the broader file's existing pattern across 3+ call sites per `rules-review` SKILL.md's drift detection criteria) but the rule document describes different behavior, i.e. the rule text has gone stale relative to the code. Step 7.5 surfaces this class via the reviewer's `Classification: rule-doc-drift` finding and does not apply a code fix; the disposition is to route the rule-text update to Step 11 (`Skill(extract-rules)`) rather than rewriting the code to match a stale rule. When `rules-review` returns a finding tagged `Classification: rule-doc-drift`, treat it as out-of-scope for Step 7.5's fix loop (no `Skill(rules-review)` re-run is required to clear it, since the code is the source of truth), record the routing intent so Step 11 picks it up, and continue.
When a rule violation is reported in both passes (Step 7.5 and Step 8), treat Step 7.5 as authoritative and skip the duplicate fix attempt in Step 8 to avoid double-counting in the iteration budget.
1. Always invoke `Skill(rules-review)` with `--base-commit <sha>` (base-commit recorded in Step 2) via `$ARGUMENTS`. Do not substitute an inline rules-walk based on perceived scope, change size, or any other self-judgment of the diff's complexity: small / "obvious" / single-file changes still go through the external skill. The skip-to-fallback path is documented in Prerequisites and fires only on objective skill unavailability (the `Skill(rules-review)` call itself fails after one retry), never on subjective judgment that an inline equivalent would suffice. The external skill enforces consistent coverage across runs; inline substitution silently degrades that coverage and the user has no visible signal that it happened.
2. Judge the result semantically: if the skill reports that there is nothing to act on (no actionable violations, no changed files, no applicable rules, no rule files found, or any other "nothing to report" outcome regardless of exact wording), mark `Step 7.5: Rules Compliance Review` as `completed` and proceed to Step 8 automatically. Per the No-Stall Principle, do not wait for user input and do not rely on exact-phrase matching; trust semantic judgment since the skill's phrasing may evolve across versions.
3. If violations found:
   a. Fix all reported violations
   b. Re-run Step 7 (Check / Test) to ensure fixes did not break anything
   c. Re-run `Skill(rules-review)` with `--base-commit <sha>` for verification (2nd cycle). Apply the same semantic judgment as step 2: if the re-run reports nothing actionable, mark `Step 7.5: Rules Compliance Review` as `completed` and proceed to Step 8 automatically (per the No-Stall Principle). When a 2nd-cycle verdict differs from the 1st on a specific location (a previously-flagged item now passes, or a previously-clean location is now flagged), record the reason inline in the Step 7.5 user-facing summary (1–2 lines per drifted location: which location, 1st-cycle verdict, 2nd-cycle verdict, why) before completing. Judgment drift between cycles is acceptable but must be explained; otherwise repeat-cycle stability cannot be assessed.
   d. If violations still persist after the 2nd review cycle, present remaining violations to user for decision. Above the violations list, emit a summary preamble per `references/plan-format.md` § User-gate summary preamble. Render the violations following `references/plan-format.md` § Localization granularity in the resolved `language`. Wait for user response before marking completed. (This is one of the explicit user-gates enumerated in the No-Stall Principle.)
Mark Step 7.5: Rules Compliance Review as completed only after all violations are resolved or user has decided on remaining violations.
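The cycle-drift record required in sub-step c (which location, 1st-cycle verdict, 2nd-cycle verdict, why) can be sketched as a comparison of two verdict maps. The function `cycle_drift`, the `"flagged"` / `"clean"` labels, and the `reasons` input are assumptions for this example; the actual `rules-review` output format is not specified here.

```python
def cycle_drift(first, second, reasons):
    """List one summary line per location whose verdict changed between cycles.

    first / second map a location (e.g. "path:line") to "flagged"; a location
    missing from a map is treated as "clean" for that cycle.
    """
    lines = []
    for location in sorted(set(first) | set(second)):
        before = first.get(location, "clean")
        after = second.get(location, "clean")
        if before != after:
            why = reasons.get(location, "reason not recorded")
            lines.append(f"{location}: 1st cycle {before}, 2nd cycle {after} ({why})")
    return lines


first = {"src/a.py:10": "flagged", "src/b.py:3": "flagged"}
second = {"src/b.py:3": "flagged", "src/c.py:7": "flagged"}
for line in cycle_drift(first, second, {"src/a.py:10": "renamed per rule"}):
    print(line)
```

A location with no recorded reason is surfaced rather than dropped, since an unexplained drift is exactly what the step asks to avoid.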
GATE: Verify Steps 2-7.5 are completed (check TodoWrite status; if status is inconsistent, verify actual completion by reviewing work done). Mark Step 8 as `in_progress` only when N ≥ 1; if Step 2 set N=0 (Trivial), Step 8 is already `completed`: do not re-mark it `in_progress`, skip straight to Step 9.
Step 8: Code Review
Code review catches bugs, convention violations, and design issues that tests alone miss — skipping it risks shipping preventable defects. Always run this step even when tests pass cleanly.
Difficulty exception (Trivial / N=0). When Step 2's difficulty assessment set N = 0 (a Trivial task), this entire step is skipped: its TodoWrite rows (top-level Step 8: Code Review and every Step 8-x) were already marked completed by Step 2's Adjust N by difficulty. As with Step 3, this skip is gated on task difficulty; for any Simple / Moderate / Complex task (N ≥ 1) Step 8 always runs.
If N = 0 (Trivial), skip this step entirely — its rows are already completed, so do not re-mark them in_progress and proceed directly to Step 9 (Completion Hooks). The following in_progress marking and per-iteration processing apply only when N ≥ 1.
Mark Step 8: Code Review as in_progress. Process each pending iteration item (Step 8-1 through 8-N) in order:
1. Mark the iteration item as `in_progress`. Call the reviewer skill resolved in Step 1 (e.g. `Skill(ask-peer)`): Review code changes.
   - Include `git diff <base-commit>` (base-commit recorded in Step 2) to capture all changes since workflow start
   - Thorough rules compliance has been verified in Step 7.5, but instruct reviewer to also flag any obvious `.claude/rules/` violations as a safety net, especially for code modified after Step 7.5
   - Request feedback organized into three categories:
     a. Correctness & edge cases: bugs, error handling gaps, race conditions, missing validations, missing or insufficient tests for changes (verify planned test files from Step 2 are present in the diff)
     b. Conventions & consistency: naming, file structure, patterns, `.claude/rules/` compliance (lightweight check; Step 7.5 handles the thorough review). Comments: treat narration (line-by-line paraphrase) and preamble (restating surrounding context) as delete-candidates, not as gaps to expand with more rationale
     c. Simplicity & maintainability: unnecessary complexity, duplication, unclear abstractions, speculative features without explicit trigger (functionality beyond what the stated requirement needs; flag for removal). Specifically: defensive hardening of already-safe paths, future-proofing for hypothetical double-calls, double-coverage over paths already protected elsewhere. Tidy-revival check (iter k ≥ 1 when Step 6 ran earlier in this session; iter k ≥ 2 otherwise): also verify that fixes applied in the previous iteration have not re-introduced narration, preamble, or redundant prose that an earlier Step 6 Tidy pass deliberately removed, because fix patches see only the current diff and can silently undo Tidy's deletions across iterations
   - If `custom_instructions` is configured, include the instructions text in the review request and have the reviewer verify compliance and report conflicts
   - If a state file is active (executing a subtask from a decomposition), include the current subtask's scope in the reviewer request: list the subtask's `title` and `description`, then list what the other subtasks cover (to define out-of-scope). Instruct the reviewer that missing functionality belonging to other subtasks is not an actionable finding for this code review; only findings scoped to the current subtask qualify. Omit this when no state file is active (single-task execution has no defined out-of-scope boundary).
   - Reviewer should only report actionable findings. If none, explicitly state "No actionable findings"
2. Judge the reviewer's response semantically: if the reviewer reports nothing actionable (no actionable findings, no bugs / convention violations / design issues raised, or any other "nothing to report" outcome regardless of exact wording), mark this and remaining iteration items as `completed` (skip). Mark `Step 8: Code Review` as `completed` and proceed to Step 9 (Completion Hooks) automatically. Per the No-Stall Principle, do not wait for user input and do not rely on exact-phrase matching; trust semantic judgment since the reviewer skill's phrasing varies (especially `Skill(ask-peer)` and other free-form-prose reviewers whose verdicts are natural-language Markdown rather than a fixed token).
3. Otherwise: autonomously fix genuine issues or reject inapplicable points with reason; do not ask the user for judgment on individual review findings. Mark this iteration item as `completed`.
   - Rejection self-question (severity-label override): before rejecting any finding solely because the reviewer labeled it Minor (or any other low-severity bucket), ask "if I rejected this and presented the resulting code to the user, would the user re-raise the same point themselves?", judging by which areas the user has historically commented on (intent expression, reader-comprehension, placement consistency for test fixtures / helper functions / dependency locality, and other readability concerns where runtime correctness is unaffected but a reader's interpretation is). If the answer is yes or ambiguous, apply the fix instead of rejecting on the Minor label alone; reject on Minor only when you are confident the user would not surface the same point.
   - Class-level extension audit (post-Critical/Major-fix): immediately after applying a fix for a Critical-severity finding, or a Major-severity finding whose fix addresses a structural pattern (external I/O boundary conditions, closed enum / form-set networks, shared helper / safety-rail callers, parallel route handlers; for skill development: subagent return-value schemas, shared handler fallback paths, mirrored form-set network audits), and before the modified-vs-rejected branches below, scan the rest of the diff for other instances of the same defect class: same operation, same broken assumption, same side-effect pattern (e.g. shared-resource-destroying API call sequences, direct processing of unverified input, race conditions). Reviewer feedback typically names one instance; the underlying class often spans the diff (cross-construct propagation, shared safety-rail callers, parallel route handlers, etc.). Apply the same fix direction to additional matches found here, then record the sweep outcome (e.g. `class-level sweep for <defect-class>: N additional instances found and fixed` or `no additional instances found`) in the summary passed to the next iteration so the next reviewer does not re-trigger the same audit on already-swept ground, then continue to the modified-vs-rejected branch.
   - Prose-integrity self-check (post-fix): after applying a fix that edits prose adjacent to its target line (comments, docstrings, paragraph-level documentation; for skill development this includes SKILL.md and `references/*.md` content), re-read the surrounding paragraph as a single unit before continuing. Verify no sentence is cut mid-word, no logical connective is broken (the connectives `however` / `therefore` / `because` / `but` / etc. still anchor real clauses), and the paragraph's overall logic still holds. Mechanical fix patches see only the line-level diff and routinely leave the surrounding prose semantically broken in ways the next iteration's reviewer flags as a Major finding, costing an extra iteration.
   - Natural-language quality self-check (post-fix): when a fix adds new natural-language content that mechanical lint / test cannot verify (code comments, config-file annotations, error messages, UI copy, documentation fragments; for skill development this includes SKILL.md / `references/*.md` prose additions, frontmatter `description` text, log messages), re-read each added fragment as a standalone unit in the resolved `language`. Judge it on four axes: concise (no padding or runaway sentences), phrasing natural for the target reader, vocabulary consistent with surrounding text, register and sentence structure not awkward. Revise any fragment that fails. This self-check is the only gate before natural-language content reaches the user-visible commit gate: Step 7 (`check_commands` / `test_commands`) and Step 7.5 (`rules-review`) cannot evaluate natural-language quality.
   - If code was modified: re-run Step 7 and Step 7.5 (with same base-commit from Step 2), then continue to the next pending iteration item (back to step 1). Code fixes routinely introduce fresh bugs, tighten one place while loosening another, or miss a caller the author didn't know about; the next review round is how those leaks get caught. Always re-run Step 7 and Step 7.5, no exceptions. Do not short-circuit on any rationalization: not on confidence in the fix, not because the diff is small, not because the modified files appear out of scope for the configured `check_commands` / `test_commands` (e.g. edits land entirely under a local-skill directory or a docs-only path), not because re-running "would be a no-op". If a re-run is genuinely a no-op, the no-op outcome is the audit trail; skipping the re-run removes the trail. The only permissible skip is when no code was modified in this iteration (handled by the next bullet).
   - If all points were rejected (no modifications): mark remaining iteration items as `completed` (skip: there is nothing new for the next reviewer to look at)

   Continue to the next pending iteration item with:
   - the latest `git diff <base-commit>`
   - a summary of fixes made and rejections with reasons
   - the same three-category structure, `.claude/rules/` reference, and "No actionable findings" requirement

   Return-point no-stall reminder: At each iteration boundary (regardless of reviewer outcome: findings reported, "No actionable findings", any non-error result), the next action must be issued in the next tool call. That action is the next iteration's reviewer dispatch when more iteration items remain, the Step 9 (Completion Hooks) transition when this was the last iteration or "No actionable findings" was returned, or the Step 7 / Step 7.5 re-run when code was modified. Do not insert an interstitial summary or acknowledgment turn between iterations; the abstract enumeration in `§ No-Stall Principle` is intentionally duplicated here so the rule fires at the decision moment.
4. If all N iteration items are completed and actionable feedback still remains, present the unresolved points to user for decision. Above the unresolved points, emit a summary preamble per `references/plan-format.md` § User-gate summary preamble. Render the findings following `references/plan-format.md` § Localization granularity in the resolved `language`.
Mark Step 8: Code Review as completed.
Step 9: Completion Hooks
Skip this step if hooks.on_complete is not configured. Mark Step 9: Completion Hooks as in_progress.
- Execute each entry in `hooks.on_complete` in order:
  - `Skill(<name>)` pattern: invoke the skill
  - Other strings: execute as a Bash command
- If a hook fails, report the error but continue executing remaining hooks. Include the failures as warnings in the Completion summary
- After all hooks complete (or are skipped), mark `Step 9: Completion Hooks` as `completed` and proceed to Step 10
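The dispatch rule above (skill pattern vs. Bash string, continue on failure, collect warnings) can be sketched as follows. This is a minimal sketch under stated assumptions: `run_hooks` is a hypothetical name, and the two callables `invoke_skill` and `run_bash` stand in for whatever actually invokes a skill or runs a shell command; both are assumed to raise an exception on failure.

```python
import re

# Matches entries of the form Skill(<name>) and captures <name>.
SKILL_ENTRY = re.compile(r"^Skill\(([^()]+)\)$")


def run_hooks(entries, invoke_skill, run_bash):
    """Run every hook in order and return warning strings for the ones that failed."""
    warnings = []
    for entry in entries:
        match = SKILL_ENTRY.match(entry.strip())
        try:
            if match:
                invoke_skill(match.group(1))
            else:
                run_bash(entry)
        except Exception as exc:
            # A failed hook is reported but must not stop the remaining hooks.
            warnings.append(f"{entry}: {exc}")
    return warnings


def demo_bash(command):
    # Stand-in runner for the demo: pretend that "exit 1" fails.
    if command == "exit 1":
        raise RuntimeError("exit status 1")


print(run_hooks(["Skill(notify)", "exit 1", "echo done"], lambda name: None, demo_bash))
```

Passing the runners in as parameters keeps the sketch free of any real skill or subprocess API, which the text does not specify.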
GATE: Verify Steps 2-9 are completed (check TodoWrite status; if status is inconsistent, verify actual completion by reviewing work done). Mark Step 10 as `in_progress`.
Step 10: Interactive Commits
After hooks.on_complete (which may itself modify the working tree, e.g. via auto-formatter or apply-edit hook entries), group the working-tree changes into commits and iterate with the user one commit at a time. Step 10 runs only when interactive_commits: true — Step 1's TodoWrite registration omits the row otherwise and execution proceeds directly from Step 9 to Step 11. The git push is never performed by this step (or any other step): pushing commits to a remote is the user's responsibility.
On entry to Step 10, initialize landed_count = 0 before running Procedure 1 — so the value is well-defined for the Completion section even when Procedure 1's empty-output skip path fires before the Per-commit loop ever starts.
Procedure:
1. Collect changes: `<base-commit>` referenced throughout this Procedure is the value recorded by Step 2's opening sub-step (`git rev-parse HEAD` at workflow start; see § Step 2: Create Plan's first numbered item); Step 10 reuses that captured value verbatim, never re-resolves it. Run `git status --porcelain=v1 --untracked-files=all -z` once (the `=v1` form pins format; `--untracked-files=all` overrides any user-side `status.showUntrackedFiles=no` config; `-z` emits NUL-separated entries with no C-style quoting so filenames containing spaces, quotes, or non-ASCII characters are recoverable verbatim). Parse the NUL-separated output: entries prefixed `??` are untracked, others are tracked changes. If the output is empty, mark `Step 10: Interactive Commits` as `completed` and proceed to Step 11 (nothing to commit since base-commit). Otherwise keep the parsed file list in memory as the canonical roster. Then run `git diff <base-commit>` once to capture the per-file diff for tracked files (the single `git diff` output is consumed per-file at `Per-commit loop`'s `Present` step by slicing on `diff --git a/<path> b/<path>` boundary markers; do not issue separate per-file `git diff <base-commit> -- "<path>"` invocations); for untracked files the diff is empty, so for each untracked path also `Read` its file contents and treat that as the "diff" to present at `Per-commit loop`'s `Present` step (since `git diff` omits untracked paths by design)
2. Deduce commit style: run `git log -n 10 --format=%s` and infer the project's conventional commit style (presence of type prefix, imperative mood, etc.); this is used as a runtime hint when drafting subjects, never hard-coded
3. Propose commit plan: group changes into 1+ logical commits, each `{subject, body (optional, may be empty), files}`. Group by cohesion (concerns that differ → separate commits); typical case is a single commit. Present the plan to the user as a numbered list (subject + file list per commit) and wait at the commit-plan approval gate. Approval is judged per § Approval token closed list below
4. Per-commit loop (`landed_count` was initialized to `0` at Step 10 entry above; no re-initialization is needed here, and the counter is incremented only at sub-step `d. Commit` on a zero-exit `git commit`): process each commit in order:
   a. Present: show the current commit's four presentation elements as a closed list (Subject / Body / Files / Diff), each rendered in its own block as the user's approval basis:
      - Subject: render in a dedicated fenced code block (single line, no surrounding prose).
      - Body: render in a dedicated fenced code block. When the commit has no body, render the placeholder `(no body)` inside the block. A prose statement like "body included" / "no body needed" without an actual rendered block is forbidden, because the user's approval must be anchored to the material content itself, not a promise of it.
      - Files: render as an explicit pathspec list (one path per line, prefixed with `-`): the same paths that will be passed to `c. Stage`'s `git add -- <paths>` invocation.
      - Diff: render the actual per-file hunks. Tracked files use the per-file portion of the diff captured at Procedure 1; untracked files use the full file contents (also captured at Procedure 1) shown as a "new file" hunk. Make the untracked status explicit in the per-file label so the user can distinguish "added file" from "modified file" before approving.

      Prose-only summaries such as "body 含め、diff full preview" / "subject and body included" without actually rendering the corresponding blocks are forbidden: they leave the user without the material content needed to judge approve / adjust / cancel and trigger an immediate re-render request
   b. Per-commit accept gate: wait for the user. Categorize the response per § Approval token closed list
   c. Stage: run `git add -- "<file-1>" "<file-2>" ...` in a single invocation with one explicit pathspec per file (verbatim filenames recovered from Procedure 1's `-z` output, never the C-quoted form). The `--` separator + double-quoting handle spaces, quotes, and non-ASCII characters; `-A` and other bulk forms are forbidden because they may stage unrelated drift
   d. Commit: prefer a HEREDOC even for one-line subjects so multi-line bodies share the same form. `git commit -m "<subject>"` for subject-only; for subject + body use:

      ```bash
      git commit -F - <<'EOF'
      <subject>

      <body>
      EOF
      ```

      On zero exit, increment `landed_count` by 1.
   e. Non-zero-exit retry (commit-attempt failure): if `git commit` exits non-zero, retry once after a 1–2 second sleep. On the second non-zero exit, render `commit-failed (<reason>)` where `<reason>` is the last non-empty line of stderr truncated to ≤ 80 characters; if stderr is empty or whitespace-only, render `commit-failed (no stderr)`. Then stop Step 10. Do not increment `landed_count`. Do not auto-recover (no force, no reset, no rebase, no amend). For pre-commit hook rejections, instruct the user to fix the hook out-of-band and restart `/dev-workflow` in a new session
5. Post-commit auto-modify cycle bound: after a zero-exit commit, re-run `git status --porcelain=v1 --untracked-files=all -z`. If new uncommitted changes appeared (a pre-commit hook auto-fixed the tree), ask the user a fold-or-defer question with explicit semantics: `fold` (= staple the hook's edits into the just-landed commit) or `defer` (= leave them in the working tree for a later commit-plan iteration). Map the response per the dedicated 5-branch → `fold` / `defer` / `cancel` / re-present-as-`adjust` classifier below. The 5 input branches extend § Approval token closed list's 4 buckets (`accept` / `adjust` / `cancel` / NOT approval) with a 5th `defer`-direction branch specific to this gate, mapping to 4 distinct dispositions (explicit per-branch mapping with no `everything else` catch-all, so ambiguous / interrogative responses cannot silently land in `defer`):
   - accept (affirmation of folding: "fold" / "amend" / "現コミットに含める" or any semantic equivalent for incorporating the hook's edits) → `fold`
   - adjust (specific revision request: "defer just foo.ts but fold bar.ts" / "rename the subject before folding" / any concrete change demand that is neither a clean fold nor a clean defer) → re-present the gate via § Mid-loop adjust branch f `Ambiguous / un-classifiable adjust request` (clarifier); do NOT silently route to `defer`
   - cancel / stop → `Mid-loop cancel` per its canonical route
   - NOT approval (interrogative or non-committal: "look good?" / "どう?" / "これでいい?") → treat as `adjust` per § Approval token closed list and re-present the gate; do NOT silently route to `defer`
   - Otherwise, an unambiguous defer-direction response ("defer" / "後で" / "leave it" / "skip" / "later" or any semantic equivalent for postponing the hook's edits) → `defer`

   On `fold`: run `git add -- "<file-1>" "<file-2>" ...` with exactly the files reported by this just-issued post-commit porcelain re-run (the hook-modified set, not the original commit's file list and not the Procedure 1 roster); without this stage step `--amend` would silently keep the pre-hook tree. Then run `git commit --amend --no-edit` (or `git commit --amend -F -` with the original subject + body) to incorporate them. The amend re-commits the same logical commit, so `landed_count` is not re-incremented. On `defer`: leave the working tree as the hook left it; the next iteration of `Per-commit loop` (or a subsequent plan re-render via the `Mid-loop adjust` file-regrouping branch) will surface them. Re-commit is allowed at most once per logical commit; if a second commit attempt on the same logical commit also triggers auto-modify, abort Step 10 with `pre-commit hook unstable (commit <#k>: hook re-modified <files>)` where `<files>` is the porcelain-reported file list at the time of abort. Emit the partial-completion summary (same retry-once + stop policy as the `Per-commit loop`'s `Non-zero-exit retry (commit-attempt failure)` step). Surfacing the file list gives the user actionable context: they typically need to disable / fix the offending hook out-of-band before resuming
6. Landed-commits invariant: once a commit has landed with a zero exit, it is immutable. The `adjust` flows below apply only to the un-landed commits starting from `#landed_count + 1`. Landed commits are shown in the numbered list as history for context but cannot be re-edited from inside Step 10
7. Mid-loop adjust, closed-list branches (treat any user response categorized as `adjust` per § Approval token closed list).

   Gate-of-origin tracking: every time Step 10 dispatches one of its three gates, namely the commit-plan approval gate (Procedure 3), the per-commit accept gate (Procedure 4 sub-step `b`), or the fold-or-defer gate (Procedure 5), set `gate_origin` to that gate's identifier before waiting for the user response. Branches e and f below read `gate_origin` to know which gate to re-enter; without this tracking, an adjust-request raised at the fold-or-defer gate could silently fall back to the per-commit accept gate (or vice versa), losing the gate-specific framing (e.g. the in-flight just-landed commit's identity for `fold` decisions).
   - a. Subject / body change only on the in-progress un-landed commit: update the field, then jump back to `Per-commit loop`'s `Present` step for the same commit (do not move the per-commit pointer)
   - b. File regrouping (merge / split / reorder / add / remove, including splits or merges that change the un-landed commit count, since both operations rebuild the un-landed portion and re-render): rebuild the un-landed portion of the plan, re-render the numbered list in full (with the landed prefix shown as history), and re-enter the commit-plan approval gate. On re-approval, the per-commit pointer resets to `#landed_count + 1` and execution resumes at `Per-commit loop`'s `Present` step for that pointer
   - c. Un-landed portion drops to 0 commits (the user's adjust removed everything that was still pending): mark `Step 10: Interactive Commits` as `completed` and proceed to Step 11. This is a normal completion, not a cancel
   - d. Merge absorbs the un-landed remainder into an already-landed commit → un-landed = 0: same disposition as the Un-landed portion drops to 0 commits branch above (normal completion)
   - e. Adjust request targets an already-landed commit (e.g., "change the subject of commit #1 (already landed)", "move file X from commit #1 to commit #3"): reject the request, surface the landed-commits invariant explanation (`landed commits are immutable inside Step 10 — exit and use git rebase -i / commit --amend outside the workflow if needed`), and re-enter the gate identified by `gate_origin`
   - f. Ambiguous / un-classifiable adjust request: ask the user a clarifying question and re-enter the gate identified by `gate_origin` (do not fall through silently)
- Mid-loop cancel: a response categorized as `cancel` per § Approval token closed list stops Step 10 with all landed commits preserved and all un-landed changes left in the working tree as-is. The workflow does not revert; the user is free to `git reset` / `git stash` / continue manually. Record the partial state per § Localized summary tokens below
- After all commits land, or the user cancels (see `Mid-loop cancel`), or the `Mid-loop adjust` un-landed-drops-to-zero / merge-absorbs-into-landed branches complete the un-landed work, mark `Step 10: Interactive Commits` as `completed` and proceed to Step 11
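The gate-of-origin tracking and adjust branches a to f can be sketched as a small routing function. This is a minimal illustration under stated assumptions, not the workflow's implementation: `Gate`, `AdjustKind`, `NextAction`, and `route_adjust` are hypothetical names, and sorting a user request into an `AdjustKind` is assumed to have happened upstream by semantic judgment.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Gate(Enum):
    COMMIT_PLAN_APPROVAL = auto()  # Procedure 3
    PER_COMMIT_ACCEPT = auto()     # Procedure 4 sub-step b
    FOLD_OR_DEFER = auto()         # Procedure 5


class AdjustKind(Enum):            # hypothetical labels for branches a to f
    SUBJECT_BODY = auto()          # branch a
    REGROUP = auto()               # branches b, c, d
    TARGETS_LANDED = auto()        # branch e
    AMBIGUOUS = auto()             # branch f


@dataclass(frozen=True)
class NextAction:
    name: str
    gate: Optional[Gate] = None
    pointer: Optional[int] = None


def route_adjust(kind: AdjustKind, gate_origin: Gate, landed_count: int,
                 unlanded_after: int, pointer: int) -> NextAction:
    """Map a classified adjust request to the next action."""
    if kind is AdjustKind.SUBJECT_BODY:
        # Branch a: re-present the same commit, pointer unchanged.
        return NextAction("present", pointer=pointer)
    if kind is AdjustKind.REGROUP:
        if unlanded_after == 0:
            # Branches c / d: nothing left to land, normal completion.
            return NextAction("complete_step_10")
        # Branch b: re-approve the plan; pointer resets to landed_count + 1.
        return NextAction("reenter_gate", Gate.COMMIT_PLAN_APPROVAL,
                          landed_count + 1)
    if kind is AdjustKind.TARGETS_LANDED:
        # Branch e: landed commits are immutable; go back to the origin gate.
        return NextAction("reject_and_reenter", gate_origin)
    # Branch f: ask a clarifying question, then return to the origin gate.
    return NextAction("clarify_and_reenter", gate_origin)
```

Passing `gate_origin` explicitly is what keeps branches e and f from falling back to the wrong gate: the caller records it before every wait, and the router never guesses.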
Approval token closed list (per § No-Stall Principle's "do not rely on exact-phrase matching" rule). The example phrases below are illustrative, not literal discriminators — categorize each user response into one of the four buckets via semantic judgment:
- accept: explicit affirmative — "OK" / "approve" / "next" / "LGTM" / "コミットして" / "進めて" / "いいよ" or any semantic equivalent
- adjust: specific revision request — "subject を ... に" / "this file should be in commit 2" / "split this commit" / any other concrete change demand
- cancel / stop: explicit halt — "stop" / "abort" / "やめる" / "中断"
- NOT approval: interrogative or non-committal, e.g. "look good?" / "どう?" / "これでいい?" / "OK ?". Treat as `adjust` and re-present (do not silently advance)
Localized summary tokens (per references/plan-format.md § Localization granularity). These tokens are defined here as the single source of truth — § Completion below references the same paired form rather than re-rendering it:
- `language: ja`: `Step 10 部分完了: <N>/<total> コミット適用済み`
- `language: en`: `Step 10 partial completion: <N>/<total> commits landed`
§ Completion below emits the localized token whenever Step 10 ended via Mid-loop cancel. On a normal completion path (every commit landed, or the Mid-loop adjust un-landed-drops-to-zero / merge-absorbs-into-landed branches), no partial-state line is needed.
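As a rough sketch of how the paired token might be emitted, the snippet below returns the localized line only on the cancel path. The function name `partial_state_line` and the fallback to English for an unrecognized `language` value are assumptions made for illustration; the token strings themselves are the ones defined above.

```python
from typing import Optional

# Token templates copied from the Localized summary tokens pair above.
PARTIAL_TOKENS = {
    "ja": "Step 10 部分完了: {n}/{total} コミット適用済み",
    "en": "Step 10 partial completion: {n}/{total} commits landed",
}


def partial_state_line(language: str, landed: int, total: int,
                       ended_via_cancel: bool) -> Optional[str]:
    """Return the partial-state line, or None on a normal completion path."""
    if not ended_via_cancel:
        return None
    # Assumption: unknown languages fall back to the English token.
    template = PARTIAL_TOKENS.get(language, PARTIAL_TOKENS["en"])
    return template.format(n=landed, total=total)
```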
Step 11: Update Rules
1. `Skill(extract-rules)` with `--from-conversation` (always)
2. `Skill(extract-rules)` with `--update`, triggered when either of the following holds:
   - Significant structural/pattern changes to application code occurred: new frameworks, libraries, architectural patterns, or API conventions introduced in the diff. Prose-only changes to SKILL.md, agent definitions, references, or rule files do not qualify
   - A dependency had a recent major-version bump (i.e. the semver major digit increased in the manifest, not a minor / patch), detected via `git diff <base-commit>` of the package manifest. This is the same signal used in the Step 2 difficulty assessment. The major-bump trigger opens the extract-rules Update Mode operational note, which prompts manual review of `.examples.md` samples that may have gone stale after the bump
3. Char-count compaction gate:

   Skip condition: If `compact_rules` is not `true` (i.e. the default `false`, or any non-boolean value that fell back to `false`), skip this entire sub-step: do not invoke `Skill(extract-rules) --compact`, do not initialize `compaction_applied_count` / `below_threshold_failed_files`, do not open the Step 11 compaction approval gate, and proceed directly to sub-step 4. Emit a one-line informational note in the resolved `language` so the user has a visible signal that compaction is intentionally not running:
   - `language: ja`: ``Step 11 sub-step 3(圧縮)を skip しました — `compact_rules: true` が設定されていません(実験的機能 / デフォルト無効)``
   - `language: en`: ``Step 11 sub-step 3 (compaction) skipped — `compact_rules: true` is not set (experimental feature / disabled by default)``
   State-variable lifecycle (each variable specifies 4 points: initialization / advance / non-advance / reference):
   - `compaction_applied_count`:
     - initialization: `0` at sub-step 3 entry
     - advance: increment by 1 in 3.c on the accept-path for each file with `applied_edits_count > 0` (also covers the per-file-accepted subset of `adjust` case 1 per-file disposition request; that branch applies the accept sub-rule per file)
     - non-advance: do not increment on reject / `adjust` case 2 (clarification, state unchanged across re-entry) / `adjust` case 3 (other, routed to case 2) / cancel / no-actionable / error / verdict-parse-failure / schema-violation paths. The user-approval requirement gates all increments: paths that do not result in user-accepted edits leave the counter untouched
     - reference: § Completion's compaction-specific reminder line
   - `below_threshold_failed_files`:
     - initialization: `[]` at sub-step 3 entry
     - advance: append in 3.c on the accept-path for files where `below_threshold: false` (per-file status in {`partial`, `unresolved`}); also covers the per-file-accepted subset of `adjust` case 1
     - non-advance: do not append on reject / `adjust` case 2 (clarification) / `adjust` case 3 (other, routed to case 2) / cancel / no-actionable / error / verdict-parse-failure / schema-violation paths. Same reasoning as `compaction_applied_count`: only user-accepted state is propagated to § Completion
     - reference: § Completion's compaction-specific reminder line
   Note: target file selection / char-count check / threshold filter are owned by `extract-rules` (Step CP1). This step does not Glob the rules directory or measure char counts itself. Those concerns stay inside `extract-rules` so this skill's prose contains no repository-specific layout assumption.

   Pre-invocation reminder: the next tool call is `Skill(extract-rules) --compact` (no file arguments). Treat the fenced JSON verdict as a return value (parse → branch → next dispatch), not a turn boundary. See `§ No-Stall Principle`.

   a. Invoke `Skill(extract-rules) --compact` (no file arguments; extract-rules resolves `output_dir` from its own config, globs `<output_dir>/**/*.md`, applies the `compaction_threshold` filter, and selects the target set internally)

   b. Parse the fenced JSON return contract (first-match-wins, evaluated in order):
   - Verdict missing / malformed → record `compaction-error: verdict parse failure` warning, do not run user-gate, proceed to sub-step 4
   - Schema violation (required keys missing, or `files_processed` items lacking `path` / `chars_before` / `chars_after` / `iterations_used` / `applied_edits_count` / `per_file_status` / `below_threshold` / `structural_notes`) → record `compaction-error: verdict schema violation` warning, do not run user-gate, proceed to sub-step 4
   - Top-level `status: "no-actionable"` → record informational message (`compaction not actionable: <reason>`), do not run user-gate, proceed to sub-step 4
   - Top-level `status: "error"` → record `compaction-error: <reason>` warning, do not run user-gate, proceed to sub-step 4 (no partial-state user-gate; top-level error means no per-file state to surface)
   - Top-level `status: "compacted"` → proceed to user-gate (c)
   c. Step 11 compaction approval gate (USER APPROVAL GATE): present per-file summary preamble (per `references/plan-format.md` § User-gate summary preamble) followed by per-file detail to the user; wait for the user response and categorize per § Approval token closed list (Step 10 defines the 4-bucket classifier, accept / adjust / cancel / NOT approval, that this gate reuses for response categorization; disposition semantics are specified locally below and do not borrow Step 10's Mid-loop adjust / Mid-loop cancel implementations).
   - Per-file detail render rule: render each file as `<path>: <chars_before> → <chars_after> chars (<below_threshold_label>, <per_file_status> in <iterations_used> iters[, <structural_notes_count> notes])` (omit the bracketed `, <structural_notes_count> notes` segment entirely when `structural_notes` is empty / count is 0) where `<below_threshold_label>` is `under threshold` when `below_threshold: true` and `over threshold` when `below_threshold: false`. The `<below_threshold_label>` strings (`under threshold` / `over threshold`) and the JSON field tokens (`per_file_status`, `iterations_used`, etc.) are preserved verbatim across resolved `language` values per `references/plan-format.md` § Localization granularity's "file-internal identifiers" rule. The unit names `iters` and `notes` always render in plural form regardless of count (e.g. `in 1 iters`, `, 1 notes`). The singular/plural disagreement is an accepted trade-off: verbatim plural keeps the display label aligned with the underlying JSON field names `iterations_used` / `structural_notes`, so users mapping the rendered line back to the JSON verdict do not have to mentally de-pluralize. Placing `<below_threshold_label>` near the front lets the user read threshold-state at a glance
   - accept disposition: keep working-tree changes. For each file with `applied_edits_count > 0`, increment `compaction_applied_count`. For each file with `below_threshold: false`, append the path to `below_threshold_failed_files`
   - reject disposition: for each file in `files_processed` with `applied_edits_count > 0`, run `git checkout HEAD -- <file>` to revert the working-tree change. `structural_notes` are caller-judgment notes (already not applied). Do not increment `compaction_applied_count`; do not append to `below_threshold_failed_files` (per the state-variable lifecycle above)
   - adjust disposition: Step 11's own closed list (Step 10's Mid-loop adjust branches a–f are not reused; this gate has its own three cases):
     1. Per-file disposition request (e.g. "accept file A, reject file B"): apply the disposition per-file using the accept/reject sub-rules above, then exit the gate
     2. Clarification request (ambiguous / un-classifiable): ask the user a clarifying question and re-enter the gate. State variables are unchanged across the re-entry
     3. Other adjust (e.g. "apply only structural_note X"): out of scope for this gate. Handle as case 2 (clarifier) and await a request that fits cases 1 or 2
   - cancel disposition: leave the working tree as-is, do not record state changes (same `no revert` semantic as Step 10's `Mid-loop cancel`: the workflow does not revert; the user may run `git checkout` / `git stash` / `git reset` manually if desired). Unlike Step 10's cancel, this gate's cancel does not terminate the workflow; proceed to sub-step 4 per the Return-point no-stall reminder below. Do not increment `compaction_applied_count`; do not append to `below_threshold_failed_files`. The next `/dev-workflow` run with the same working-tree state will see the same files still over threshold and re-fire the gate
   Return-point no-stall reminder: At gate decision (accept / reject / adjust resolution / cancel) or on a path that skips the gate (no-actionable / error / verdict parse failure / schema violation), i.e. on any result, the next action (sub-step 4) must be issued in the next tool call. See `§ No-Stall Principle`.

   Step 10 / Step 11 ordering note: Step 11 runs after Step 10 (Interactive Commits), so any compaction edits applied here are not committed by the workflow; they remain in the working tree. § Completion's compaction-specific reminder instructs the user to commit the rule files manually before opening a PR.
   Self-application warning: When the current `/dev-workflow` run is itself modifying `extract-rules` or `dev-workflow`, sub-step 1 (`--from-conversation`) may have just appended entries to a `.local.md` rule file that the immediately-following compaction may merge or drop. The user-gate at (c) provides the reject path; users can also opt out for the whole run via the `compaction_threshold` config (set to a high value in `extract-rules.local.md`).
4. If extract-rules is unavailable, skip this step and inform user
5. After the applicable invocations above return, or after the step was skipped because extract-rules is unavailable (regardless of whether new rules were added or the report indicates nothing changed), mark `Step 11: Update Rules` as `completed` and proceed automatically. Per the No-Stall Principle, do not wait for user input.
6. If extract-rules wrote any changes to `.claude/rules/` during sub-steps 1, 2, or 3, record the count so § Completion can surface the manual-commit reminder. The compaction-specific count (file-unit `compaction_applied_count`) is rendered separately by § Completion's "Step 11 compaction reminder"; see § Completion below
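The compaction sub-step's verdict handling, accept-path state updates, and per-file render rule can be sketched together as below. This is an illustrative sketch under assumptions, not the skill's implementation: the function names are hypothetical, the input is assumed to be the JSON text already extracted from the fence, `status` is assumed to be the only required top-level key, `reason` is an assumed key name for the `<reason>` placeholder, an unrecognized `status` is treated as a schema violation, and `structural_notes` is assumed to be a list.

```python
import json

REQUIRED_FILE_KEYS = {
    "path", "chars_before", "chars_after", "iterations_used",
    "applied_edits_count", "per_file_status", "below_threshold",
    "structural_notes",
}
KNOWN_STATUSES = ("no-actionable", "error", "compacted")


def evaluate_verdict(raw):
    """Return (outcome, payload); rules are checked in order, first match wins."""
    try:
        verdict = json.loads(raw)
    except (TypeError, ValueError):
        return ("warn", "compaction-error: verdict parse failure")
    if not isinstance(verdict, dict):
        return ("warn", "compaction-error: verdict parse failure")
    files = verdict.get("files_processed", [])
    schema_ok = (
        verdict.get("status") in KNOWN_STATUSES  # assumption: unknown status fails
        and isinstance(files, list)
        and all(isinstance(f, dict) and REQUIRED_FILE_KEYS.issubset(f)
                for f in files)
    )
    if not schema_ok:
        return ("warn", "compaction-error: verdict schema violation")
    if verdict["status"] == "no-actionable":
        return ("info", f"compaction not actionable: {verdict.get('reason')}")
    if verdict["status"] == "error":
        return ("warn", f"compaction-error: {verdict.get('reason')}")
    return ("gate", files)  # "compacted": open the user approval gate


def apply_accept(files, state):
    """Accept path only: the sole place where the two state variables advance."""
    for f in files:
        if f["applied_edits_count"] > 0:
            state["compaction_applied_count"] += 1
        if f["below_threshold"] is False:
            state["below_threshold_failed_files"].append(f["path"])


def render_file_line(f):
    """Render one per-file detail line; unit names stay plural by design."""
    label = "under threshold" if f["below_threshold"] else "over threshold"
    notes = len(f["structural_notes"])
    notes_part = f", {notes} notes" if notes else ""
    return (f"{f['path']}: {f['chars_before']} → {f['chars_after']} chars "
            f"({label}, {f['per_file_status']} in {f['iterations_used']} "
            f"iters{notes_part})")
```

Every outcome other than `"gate"` goes straight to sub-step 4 without touching the state variables, which mirrors the non-advance rows of the lifecycle table.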
Step 11.5: Self-Retrospective
Emit a sanitized improvement signal for the dev-workflow-bundle skills (dev-workflow, ask-peer, extract-rules, rules-review) to a user-configured destination. Raw conversation jsonl stays in-session; only abstracted, project-agnostic text leaves.
Skip this step if self_retrospective.feedback is unset/invalid (Step 1 did not register the row) or the task was assessed Simple or Trivial (Step 2 difficulty assessment pre-marked the row completed). Otherwise read references/self-retrospective.md and follow the procedure from top to bottom.
Manual re-run (same-session only): If the task was assessed Simple or Trivial and Step 11.5 was auto-skipped, and the user in the same session explicitly requests Step 11.5 execution (e.g. "run the retrospective for this run anyway"), bypass the Simple/Trivial hard-skip and follow references/self-retrospective.md from §1. Do not update the already-completed TodoWrite row. At §1.4, confirm with the user that the auto-detected session jsonl matches the intended run (multi-instance setups may pick the wrong one). Cross-session re-runs are unsupported: once the session ends, the Step 2 difficulty assessment and in-memory context are lost. The override covers only the Simple/Trivial hard-skip — an unset or invalid self_retrospective.feedback still blocks Step 11.5 (those gates are about missing destination, not difficulty).
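The skip conditions and the manual re-run override combine into a short decision, sketched here for illustration. The function and parameter names are hypothetical; only the `Simple` / `Trivial` labels and the rule that a missing or invalid `self_retrospective.feedback` always blocks the step come from the text above.

```python
def should_run_retrospective(feedback_configured: bool, difficulty: str,
                             manual_request_same_session: bool = False) -> bool:
    """Decide whether Step 11.5 runs for this session."""
    if not feedback_configured:
        # No valid destination: the manual override cannot bypass this gate.
        return False
    if difficulty in ("Simple", "Trivial"):
        # Hard-skip by default; an explicit same-session request overrides it.
        return manual_request_same_session
    return True
```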
Completion
Report summary: tasks completed, files modified, test results, review outcomes, rules updated. Output in the resolved language following references/plan-format.md § Localization granularity.
Step 10 partial-state line: if Step 10 ended via its Mid-loop cancel branch, emit the localized partial-completion token defined at § Step 10's "Localized summary tokens" paragraph. On a normal completion path, omit this line.
Step 11 rule-update reminder (per references/plan-format.md § Localization granularity): if Skill(extract-rules) wrote any changes to .claude/rules/ during Step 11, surface a manual-commit reminder in the resolved language:
- `language: ja`: ``extract-rules が `.claude/rules/` に <N> 件の変更を加えました — PR を開く前に手動で commit してください``
- `language: en`: ``extract-rules made <N> changes to `.claude/rules/` — please commit manually before opening a PR``
The reminder is omitted when the rule-change count is zero.
Step 11 compaction reminder (per references/plan-format.md § Localization granularity): when compaction_applied_count > 0 (the Step 11 sub-step 3 char-count compaction gate landed user-accepted edits), surface a separate manual-commit reminder in the resolved language (rendered in file-unit count, distinct from the rule-update reminder above which counts entry-level writes):
- `language: ja`: `Step 11 で <N> 件のルールファイルを圧縮しました — PR を開く前に手動で commit してください`
- `language: en`: `Step 11 compacted <N> rule files — please commit manually before opening a PR`
When below_threshold_failed_files is non-empty, additionally surface a follow-up reminder naming the files that remain over threshold. <files> always renders at the sentence tail so the block-level list never appears mid-sentence:
- `language: ja`: ``<M> 件のファイルが閾値を超えています。手動で再度 `Skill(extract-rules) --compact` を実行するか、当該ファイルを直接編集してください:`` followed by `<files>` on the next line
- `language: en`: ``<M> files still exceed the threshold. Re-run `Skill(extract-rules) --compact` manually or edit the files directly:`` followed by `<files>` on the next line
Render <files> as one path per line — verbatim from files_processed[].path (repo-root-relative, e.g. .claude/rules/project.rules.local.md; never rewritten to user-absolute /Users/... form) — each prefixed with - (hyphen + space, no leading indent) directly below the reminder sentence as a top-level markdown bullet list. This applies for any M ≥ 1 — single-element lists render as a one-bullet list, not inline, so the layout is identical across runs and the trailing prose clause never floats after the bullet list.
The compaction reminder is omitted when compaction_applied_count == 0 AND below_threshold_failed_files is empty.
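The `<files>` layout rule for the follow-up reminder can be sketched as follows, using the English token only. The function name `render_followup_reminder` is hypothetical, and `<M>` is taken to be the number of paths in `below_threshold_failed_files`.

```python
def render_followup_reminder(failed_files):
    """Return the follow-up reminder as lines, or [] when nothing remains over threshold."""
    if not failed_files:
        return []
    header = (f"{len(failed_files)} files still exceed the threshold. "
              "Re-run `Skill(extract-rules) --compact` manually "
              "or edit the files directly:")
    # One top-level bullet per path, verbatim, even for a single file.
    return [header] + [f"- {path}" for path in failed_files]
```

Returning a list of lines keeps the sentence and the bullet list separate, so the path list always ends the reminder regardless of how many files it names.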
If this run was executing a subtask from a decomposition state file, also do the following (all reads/writes target the canonical state-file path recorded in Step 1.5):
- Mark the current subtask's `status` as `completed` in the canonical state file and write back
- Ask the user for an optional PR URL for this subtask. On a non-empty answer, set the subtask's `pr` field and write back; otherwise leave it `null`
- Refresh the parent-task TodoWrite row's `<done>/<total>` count
- Find the next runnable subtask (smallest-id `pending` with all `depends_on` completed)
- If a next subtask exists: branch on whether Step 10 actually landed any commits this run (use the `landed_count` from Step 10; taking the config flag alone would mis-route the case where `interactive_commits: true` met the Step 10 skip conditions and exited at zero commits):
  - `landed_count > 0`: tell the user the current subtask's changes have already been committed by Step 10. They should open a PR for those commits, then start a new session with `/dev-workflow --resume <slug>` once the PR is up
  - `landed_count == 0` (either because `interactive_commits: false` or because Step 10 was skipped): tell the user to commit the current subtask's changes and open a PR before resuming, then start a new session with `/dev-workflow --resume <slug>`. Explain why this matters: the next run records a fresh base-commit from HEAD, so uncommitted changes would leak into the next subtask's diff

  In both branches, if Step 11 also wrote rule updates (i.e., the Step 11 rule-update reminder above fired with `<N> > 0`), tell the user to commit those `.claude/rules/` writes manually before resuming; otherwise they leak into the next subtask's diff the same way uncommitted feature changes would. The "no push" invariant for both branches is stated at § Step 10's preamble
- If no next subtask exists (all subtasks completed): delete the canonical state file via `rm <canonical-path>`, remove the parent-task TodoWrite row, and include every subtask's title and recorded `pr` (if any) in the parent-task completion summary
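The "next runnable subtask" rule above can be sketched as a short selection function. The state-file layout is an assumption made for illustration: subtasks are taken to be a list of dicts with a comparable integer `id` plus the `status` and `depends_on` fields named in the text, and `next_runnable` is a hypothetical name.

```python
from typing import Optional


def next_runnable(subtasks: list) -> Optional[dict]:
    """Pick the smallest-id pending subtask whose dependencies are all completed."""
    completed = {s["id"] for s in subtasks if s["status"] == "completed"}
    candidates = [
        s for s in subtasks
        if s["status"] == "pending"
        and all(dep in completed for dep in s.get("depends_on", []))
    ]
    # None means nothing is runnable; if no subtask is pending at all,
    # the state file can be deleted.
    return min(candidates, key=lambda s: s["id"], default=None)
```

A `None` result while some subtasks are still `pending` would indicate unmet dependencies rather than completion, so a caller would need to check both conditions before removing the state file.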