MANDATORY RULES: VIOLATION IS FORBIDDEN
- Response language follows
languagesetting in.agents/oma-config.yamlif configured. - NEVER skip phases. Execute from Phase 0 in order. Explicitly report completion of each phase to the user before proceeding to the next.
- You MUST use MCP tools throughout the entire workflow. This is NOT optional.
- Use code analysis tools (
get_symbols_overview,find_symbol,find_referencing_symbols,search_for_pattern) for code exploration. - Use memory tools (read/write/edit) for progress tracking.
- Memory path: configurable via
memoryConfig.basePath(default:.serena/memories) - Tool names: configurable via
memoryConfig.toolsin.agents/mcp.json - Do NOT use raw file reads or grep as substitutes. MCP tools are the primary interface for code and memory operations.
- Use code analysis tools (
- This workflow does NOT stop until all completion criteria pass or safeguards trigger.
- Follow the context-loading guide. Read
.agents/skills/_shared/core/context-loading.mdand load only task-relevant resources.
Vendor Detection
Before starting, determine your runtime environment by following .agents/skills/_shared/core/vendor-detection.md.
The detected vendor determines how ultrawork spawns agents internally.
Phase 0: INIT (DO NOT SKIP)
Step 0.1: Load Prerequisites
- Read
.agents/skills/_shared/core/context-loading.mdfor resource loading strategy. - Read
.agents/skills/_shared/runtime/memory-protocol.mdfor memory protocol. - Read
.agents/workflows/ralph/resources/judge-protocol.mdfor JUDGE rules. - Read
.agents/skills/_shared/runtime/event-spec.mdfor the L1 event protocol and theoma_emithelper (used by the EXEC checkpoint in Step 1.2).
Step 0.2: Define Completion Criteria
Analyze the user's request and define verifiable completion criteria. Each criterion MUST have:
criteria:
- id: C{N}
description: "<what to achieve>"
verification: "<how to verify — test result, build output, file existence, command output>"
status: PENDING
fail_count: 0 # consecutive failures only — resets to 0 on PASS
previous_status: null # last non-null status from prior iteration
regressed_at_iteration: null # iteration number when PASS → FAIL transition was detected
affected_paths: [] # optional glob list — only set when verification takes >30s
# used by judge-protocol's cache rules; see judge-protocol.md § "Caching for Heavy Verification"
Rules:
- Every criterion must be mechanically verifiable (test pass, build success, file exists, command output)
- Reject subjective criteria ("looks good", "feels right"). Ask the user to rephrase.
- Present criteria to the user for confirmation before proceeding
Step 0.3: Initialize Session
- Generate a
sessionId({YYYYMMDD-HHmmss}timestamp). All ralph memory files for this run are session-scoped with this suffix, per memory-protocol session-scoped naming. Never write to an unsuffixedsession-ralph.md— consecutive ralph runs must not overwrite each other. - Set
max_iterations: 5(default safeguard) - Set
current_iteration: 0 - Load prior-session context (cross-session memory):
- Use the memory list tool to find previous
session-ralph-*.mdfiles. If any exist, read the most recent one and extract: final criteria statuses, BLOCKED items with their failure evidences, and any safeguard trigger. - If
lessons-learned.mdexists in the memory base path, read it. - If any current criterion overlaps a previously BLOCKED item, re-confirm with the user before proceeding: present the prior failure evidence and ask whether to retry it (carrying that evidence as context for EXEC) or pre-mark it BLOCKED for this session.
- Use the memory list tool to find previous
- Record session start using memory write tool:
- Create
session-ralph-{sessionId}.mdin the memory base path - Include: session start time, user request summary, completion criteria, max_iterations, and prior-session findings loaded in step 4 (or
none)
- Create
Phase 1: EXEC
// turbo
Step 1.1: Prepare Ultrawork Input
Compose the ultrawork input based on current iteration:
- Iteration 1: Full user request with all PENDING criteria
- Iteration 2+: REMAINING (FAIL + REGRESSED) criteria from previous JUDGE result, with:
- Previous JUDGE results as context (what failed and why)
- Suggested actions from JUDGE
- Already-PASSED criteria excluded from implementation scope (do not re-implement), but they remain in JUDGE scope (will be re-verified to detect regressions)
Step 1.2: Execute Ultrawork
EXEC-entry checkpoint (MANDATORY — emit before delegating). This records, in the auditable L1 event log, that this iteration delegates to the full ultrawork workflow. A run without this event is a non-compliant run.
oma_emit "decision.made" '{"subject":"ralph.exec-delegated","decision":"Delegate this iteration to the full ultrawork 5-phase workflow.","rationale":"Ralph EXEC must run ultrawork in full; abridging, substituting, or skipping phases for cost/stability/time reasons is forbidden without explicit user approval."}'
oma state:verify --workflow ralph --checkpoint exec-delegated
Delegate to the ultrawork workflow:
- Read and follow
.agents/workflows/ultrawork.mdstep by step. - Pass the prepared input as the task description.
- Ultrawork handles all vendor-specific agent spawning internally.
- Wait for ultrawork to complete all 5 phases (PLAN, IMPL, VERIFY, REFINE, SHIP).
- Do NOT abridge ultrawork. If you believe the environment (subagent instability, cost, time) warrants reducing fan-out or collapsing phases, STOP and ask the user first. Single-judgment substitution of ultrawork's structure is forbidden — see the Anti-Circumvention gate in Step 1.3.
Step 1.3: Verify EXEC Artifacts (Anti-Circumvention Gate)
Prose instructions ("run ultrawork in full") are advisory and can be rationalized away. This gate verifies the work mechanically — by its artifacts, not by your own narration. Ultrawork's 5 phases each leave a durable trace; a single-agent shortcut cannot produce them without actually doing the work.
Run the deterministic verifier from the repo root:
oma ralph:verify --json --session {sessionId} --newer-than {iteration_start_iso}
--sessionscopes the plan artifact to this iteration's session id;--newer-than(this iteration's EXEC start time, ISO-8601) excludes stale artifacts from earlier iterations. Omit either when unknown.- The command checks the artifact table below, prints a structured result (
ok,checks,missing,remediation), and exits non-zero on failure. On failure it also appends agate.failedL1 event automatically. - The JSON verdict IS the gate result. Do NOT substitute your own narration for it, and do NOT proceed on a non-zero exit.
- Manual fallback (only when the
omaCLI is unavailable): check, using memory read / file existence tools, that the just-completed iteration produced ALL of the artifacts below. Resolve{memBase}frommemoryConfig.basePath(default.serena/memories).
| # | Artifact | Proves phase ran |
|---|----------|------------------|
| A1 | {memBase}/session-ultrawork.md with this iteration's phase-completion records | PLAN + gate progression |
| A2 | .agents/results/plan-{sessionId}.json | PLAN produced a real task breakdown |
| A3 | {memBase}/result-qa*.md or .agents/results/result-qa*.md (VERIFY) | a distinct QA agent ran — absent if IMPL was the only spawn. CLI fallback writes result-qa-agent* to {memBase}; Claude-native qa-reviewer writes result-qa* to .agents/results/ |
| A4 | {memBase}/result-debug*.md or .agents/results/result-debug*.md (REFINE) | a distinct Debug agent ran — same naming split (debug-investigator on the native path) |
Decision:
ok: true(exit 0) → ultrawork ran in full. Proceed to Step 1.4.ok: false(exit 1,missingnon-empty) → treat EXEC as NOT performed (the iteration was abridged to implementation-only, regardless of what the EXEC narration claims). Do NOT advance to JUDGE as if work completed. Instead:- Record the violation in
session-ralph-{sessionId}.md:exec-circumvention detected at iteration {N}: missing {artifact}. - Emit the audit event:
oma_emit "decision.made" '{"subject":"ralph.exec-circumvention","decision":"EXEC artifacts incomplete — ultrawork did not run in full.","rationale":"Required VERIFY/REFINE agent result files are absent; the iteration was abridged."}' - STOP and report to the user that ultrawork was not executed in full, citing the missing artifact. Ask whether to re-run the iteration in full or to explicitly authorize a reduced-scope run. Do NOT silently retry with the same abridged approach.
- Record the violation in
REFINE skip exception: ultrawork permits skipping REFINE for trivial tasks (< 50 lines, see ultrawork
REFINE_GATEskip conditions). If REFINE was legitimately skipped, A4 may be absent — butsession-ultrawork.mdMUST record the documented skip reason. "No A4 and no recorded skip reason" is a circumvention, not a skip.oma ralph:verifyimplements this rule: a recorded skip reason reports A4 asskip-recorded(passing), an unrecorded absence reportsmissing(failing).
Step 1.4: Record EXEC Completion
- Increment
current_iteration - Use memory edit tool to record iteration start in
session-ralph-{sessionId}.md
Phase 2: JUDGE
Step 2.1: Independent Verification (Spawned Judge)
The judge is a separate agent with fresh context — not a role the orchestrator plays. The orchestrator that drove EXEC shares context with the implementation and cannot self-judge without rationalization risk. Spawning is the default; inline judging is a recorded exception.
- Compose the judge brief. It contains ONLY:
- The criteria table: id, description, verification method, previous_status, fail_count, affected_paths
- The verification cache records from
session-ralph-{sessionId}.md(if any) - The required output format (Step 2.2) and a pointer to
.agents/workflows/ralph/resources/judge-protocol.md - Do NOT include EXEC narration, implementation summaries, or any claim about what was fixed. The judge verifies what IS, not what was intended.
- Spawn the judge via Per-Agent Dispatch (see Vendor Detection):
- If Claude Code and target vendor is Claude:
Agent(subagent_type="qa-reviewer", prompt="<judge brief>. Follow .agents/workflows/ralph/resources/judge-protocol.md. Execute every verification command and write the JUDGE result to memory as result-judge-{sessionId}-iter{N}.md.") - Otherwise, or when native dispatch is unavailable:
oma agent:spawn qa-agent "<judge brief>" {sessionId} - Verification is mechanical (run command, check exit code/output) — a lower-cost model tier is acceptable where the runtime supports per-agent model selection.
- If Claude Code and target vendor is Claude:
- Wait for
result-judge-{sessionId}-iter{N}.md, then read it as the JUDGE result. - Inline fallback (exception): only if subagent spawning is unavailable in the current runtime, perform the verification inline. Record
judge-inline-fallback at iteration {N}insession-ralph-{sessionId}.mdand emit:oma_emit "decision.made" '{"subject":"ralph.judge-inline-fallback","decision":"Run JUDGE inline in the orchestrator context.","rationale":"Subagent spawning unavailable in this runtime; judge independence is downgraded for this iteration."}'
For EVERY criterion regardless of current status (including PASS from prior iterations), the judge executes the verification method defined in Phase 0:
- Run tests, then check pass/fail count
- Run build, then check exit code
- Check file existence and verify path
- Run specific commands, then check output
Why re-verify PASS criteria: ultrawork modifies shared code (utils, configs, migrations, dependencies). A PASS in iteration N may regress in iteration N+1 when fixing other criteria. Without re-verification, "DONE" can ship silent regressions.
Heavy verification caching: For verifications that take >30 seconds (e2e tests, integration suites), apply the caching rules in judge-protocol.md § "Caching for Heavy Verification" to skip re-runs when no relevant files changed.
Follow .agents/workflows/ralph/resources/judge-protocol.md for the full protocol.
Step 2.2: Produce JUDGE Result
Output the JUDGE result in this exact format:
## JUDGE Result — Iteration {N}
| Criterion | Status | Evidence |
|-----------|-----------|---------------------------------------------------------|
| C1 | PASS | <concrete evidence> |
| C2 | FAIL | <concrete evidence of failure> |
| C3 | BLOCKED | <failed 3x: reason> |
| C4 | REGRESSED | previously PASS at iter N — now FAIL: <evidence + diff> |
verdict: PASS | FAIL
If verdict is FAIL, also output:
remaining:
- id: C{N}
reason: "<why it failed>"
suggested_action: "<what to try next>"
fail_count: {N}
regression: true | false # true if status is REGRESSED
previous_pass_iteration: {N} # only when regression: true
Step 2.3: Apply JUDGE Result
Before updating any criterion, capture the current status into previous_status. Then apply the transition rules in order:
- Verification passed →
PASS. Resetfail_countto 0 andregressed_at_iterationto null (fail_counttracks consecutive failures only; a pass breaks the streak). - Verification failed AND
previous_status == PASS→REGRESSED. Setregressed_at_iteration: {current_iteration}. Do NOT incrementfail_counton the first regression; regression is treated as a distinct first-class signal, not a normal failure streak. Subsequent consecutive failures of the same criterion follow rules 3-4. - Verification failed AND not a regression AND
fail_count < 3→FAIL. Incrementfail_count. - Verification failed AND
fail_count >= 3→BLOCKED.
Decision Gate impact:
REGRESSEDis treated asFAILfor verdict computation (verdict becomes FAIL, REPLAN triggers).REGRESSEDis NOT counted toward "DONE"; onlyPASSandBLOCKEDcount.
Phase 2 → Decision Gate
Evaluate the JUDGE result:
→ DONE (All criteria PASS or BLOCKED)
If all criteria are either PASS or BLOCKED:
- If any BLOCKED exists: Report partial completion with BLOCKED items listed
- If all PASS: Report full completion
- Use memory edit tool to record final results in
session-ralph-{sessionId}.md - Output completion summary:
## Ralph Complete — Iteration {N}/{max} PASSED: C1, C2, ... BLOCKED: C3 (if any) Total iterations: {N} - Workflow ends.
→ REPLAN (Any criterion is FAIL or REGRESSED)
If any criterion has status FAIL or REGRESSED, proceed to Phase 3.
→ SAFEGUARD (max_iterations reached)
If current_iteration >= max_iterations:
- Force stop regardless of FAIL criteria
- Report partial completion:
## Ralph Safeguard — Max Iterations Reached ({max}) PASSED: C1, ... FAILED: C2, ... (still unresolved) BLOCKED: C3, ... (if any) Recommendation: Review FAILED criteria manually or increase max_iterations. - Use memory edit tool to record safeguard trigger in
session-ralph-{sessionId}.md - Workflow ends.
Phase 3: REPLAN
// turbo
Step 3.1: Extract Remaining Work
From the JUDGE result, collect criteria with status FAIL or REGRESSED. Treat the two classes separately:
- FAIL (first-time or persistent failures): list each with its reason and suggested_action
- REGRESSED (previously PASS, now FAIL): list each with previous-pass iteration, the inter-iteration diff that likely caused the regression, and a regression-specific suggested_action.
- Include previous iteration's JUDGE evidence as context
- Explicitly state which criteria are PASS (do not re-implement, but do not exclude from next JUDGE either)
- Explicitly state which criteria are BLOCKED (do not retry)
Step 3.2: Narrow Scope
Compose a focused task description containing the remaining work, separating regressions from first-fail items so ultrawork's reasoning differs:
## Ralph Iteration {N+1} — Remaining Work
### Already Complete (DO NOT re-implement; will be re-verified by JUDGE)
- C1: <description> PASS
### Blocked (DO NOT retry)
- C3: <description> BLOCKED (failed 3x)
### Regressed (was passing — diagnose what broke it; minimal fix that preserves recent changes)
- C4: <description>
- Last passed at: iteration {N}
- Failed at: iteration {current}
- Files changed since last pass: <list of modified paths>
- Failure evidence: <evidence>
- Suggested action: diff-aware diagnosis — identify which change in the listed files broke C4, fix that specifically without reverting the criterion that change was made for
### To Fix (first-time or persistent failures)
- C2: <description>
- Previous failure: <evidence>
- Suggested action: <action>
Why separate Regressed from To Fix: ultrawork prompts that frame work as "fix from scratch" vs "diagnose a regression" produce different reasoning paths. Regressed items should trigger diff-based investigation, not greenfield re-implementation.
Step 3.3: Loop Back
- Use memory edit tool to record REPLAN in
session-ralph-{sessionId}.md - Return to Phase 1: EXEC with the narrowed scope
Summary
Phase 0: INIT → Define criteria, load prior sessions, initialize session
↓
Phase 1: EXEC → Run ultrawork (full or narrowed scope)
↓
Phase 2: JUDGE → Spawned fresh-context judge verifies each criterion
↓
Decision: DONE? → End
SAFEGUARD? → Force end
FAIL? → Phase 3
↓
Phase 3: REPLAN → Extract remaining, narrow scope
↓
└──→ Phase 1 (loop)
| Phase | Purpose | Key Action | |---------|----------------------------|-----------------------------------| | INIT | Define success criteria | Verifiable criteria + prior-session load + session init | | EXEC | Implementation | Delegate to ultrawork | | JUDGE | Independent verification | Spawned judge; evidence-based pass/fail per criterion | | REPLAN | Scope narrowing | Extract FAIL + REGRESSED items, separated by class |