# Reduce Orchestrator

## Overview
Run a constrained, domain-agnostic MapReduce(+Verify) loop that is suitable for code/debugging/analysis/research/docs, while keeping scope strict, evidence explicit, and run artifacts reproducible and concurrency-safe.
## Non-Negotiables (must enforce)

### Mandatory dependency: map-worker

- Run all Map and Verify work via the `map-worker` skill.
- Every spawned worker prompt must include this exact line: “Use the map-worker skill.”
- Also include `$map-worker` in the prompt to reliably trigger the installed hyphen-case skill.
### Mandatory orchestration style (must keep orchestrator context small)
Goal: the orchestrator must be able to scale parallelism without its own context exploding.
- The orchestrator must operate primarily on paths + small state files, not by inlining large worker outputs into its own context.
- Treat `report_path` markdown files as the sole worker source of truth. Prefer referencing paths over quoting content.
- Never read worker logs into the orchestrator context. Do not open/ingest:
  - `.rlm/runs/<run_id>/artifacts/logs/*.log` (map-worker/codex logs)
  - any stdout/stderr capture files
  - any other large, non-contract outputs
- Use file freshness/presence as the primary stall signal (via `report_path`, `inflight.json`, and `rlm_watch_reports.py`) rather than reading verbose logs.
- If a worker appears stalled or failed, do not “debug by reading logs”; instead:
  - treat a missing/empty `report_path` as a stall signal,
  - retry using a new `worker_id` + `report_path` (and/or reduce granularity / upgrade compute_tier), and
  - if human debugging is required, ask the user to inspect logs out-of-band and summarize (do not paste logs into chat).
### Work delegation (orchestrator does not “do” the work)
- Do not perform substantive domain work directly in the orchestrator (no manual repo edits, no ad-hoc analysis outside workers).
- Treat the orchestrator as: planner + dispatcher + integrator + lifecycle manager.
- The only direct orchestrator actions should be:
  - write/update `.rlm/runs/<run_id>/plan.json` and final outputs,
  - run lifecycle/admin steps (locks, archiving, TTL cleanup) via the skill-bundled `rlm_admin.py` (typically `$CODEX_HOME/skills/reduce-orchestrator/scripts/rlm_admin.py`), and
  - spawn workers to do everything else (via the mandatory scripts below).
- If an implementation drifts out-of-scope (scope creep), schedule a follow-up worker to trim/revert the drift (e.g., `impl-0001c` “scope-trim”) instead of applying manual edits in the orchestrator.
- If you (the orchestrator) accidentally performed substantive work directly, immediately:
  - record the deviation in the run’s final/reduce output, and
  - schedule a repair/scope-trim worker to reconcile changes back to contract scope (or justify and re-scope via an updated contract).
- Do not silently “normalize” drift; preserve provenance and make the correction workflow explicit.
## Worker sizing: granularity + compute_tier

Define every worker along two axes:

- `granularity` (how big the task unit is; drives parallelism and scope control)
- `compute_tier` (how strong the model/effort is; drives reasoning capacity)

### granularity (recommended vocabulary)

- `micro`: one narrow outcome; ideally 1 file / 1 function / 1 command; should be trivially reviewable.
- `meso`: a small cohesive change across a few files or a small subsystem; may require a follow-up join worker.
- `macro`: cross-cutting work (multi-subsystem, large-context synthesis, refactor/architecture); avoid assigning macro directly; prefer splitting into micro/meso workers + a join worker.

Default planning rule: if you think you need `macro`, you probably need 3–8 micro/meso workers + 1 join.

### compute_tier (model requirement for all spawned workers)

Spawn workers only via Codex CLI using one of these compute mappings (pick per worker):

| compute_tier | model | model_reasoning_effort |
|---|---|---|
| standard | gpt-5.1-codex-mini | medium |
| standard-plus | gpt-5.2-codex | medium |
| heavy | gpt-5.2-codex | xhigh |

Use `heavy` for any task that requires high-level synthesis, large-context processing, or deep reasoning (e.g., multi-file edits, core-logic changes, or abstraction/re-architecture). When in doubt, upgrade `compute_tier` to `heavy` or reduce `granularity` (split the work). The sketch below shows how a tier translates into spawn flags.
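A sketch of spawning a `heavy` worker with the mapping above (flags taken from the MAP step later in this document; paths and the goal are placeholders):

```bash
# Sketch: "heavy" tier = gpt-5.2-codex at xhigh effort (per the table above).
python "$CODEX_HOME/skills/reduce-orchestrator/scripts/rlm_spawn_worker.py" \
  --cd . \
  --model gpt-5.2-codex \
  --reasoning-effort xhigh \
  --run-id <run_id> \
  --worker-id map-0002 \
  --mode map \
  --goal "… (include desired outputs + constraints)" \
  --report-path .rlm/runs/<run_id>/artifacts/reports/map-0002.md \
  --contract-path .rlm/runs/<run_id>/artifacts/context/run_contract.md \
  --background
```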
### Model escalation (ask user to change models when needed)

If you are using the tiered worker models appropriately (standard/standard-plus/heavy) and the outcome is still not satisfactory (e.g., repeated low-quality reports, missed dependencies, persistent misalignment with the run contract, or inability to complete the task within budget), explicitly ask the user to approve a model change.

- Be concrete about the failure mode (what was expected vs. what happened).
- Propose the smallest model change that plausibly fixes it (e.g., promote specific tasks to `heavy`, or switch to a stronger model for the orchestrator/workers).
- If the user declines, proceed with scope reduction and/or additional verification as a fallback.
## Run contract (workers must comply)

- For every run, write a single authoritative run contract document at: `.rlm/runs/<run_id>/artifacts/context/run_contract.md`
- Every worker prompt must:
  - include the `run_contract.md` path in `hint_paths` (or otherwise provide the path explicitly), and
  - instruct the worker to read it first and follow it.
- Every worker narrative report must include a “Contract Compliance” section stating:
- whether the contract was read,
- which constraints were applied,
- any deviations (with justification), and
- any requested contract updates / open risks.
Minimum required contents of `run_contract.md` (keep concise; narrative is fine; a sketch follows this list):
- Goal + non-goals
- Scope boundaries (directories/files to avoid)
- Invariants (must-not-change behaviors, API contracts, data formats)
- Dependency/sequence rules (e.g., “interfaces → call sites → cleanup”)
- Verification requirements (what must pass; which environment to use; repo-root CWD assumption if applicable)
- Failure policy (rollback / retry / scope reduction triggers)
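A minimal `run_contract.md` skeleton covering the required contents above (illustrative only; the exact section wording is an assumption, not a fixed schema):

```markdown
# Run Contract — <run_id>

## Goal + non-goals
Goal: …
Non-goals: …

## Scope boundaries
Avoid: <directories/files workers must not touch>

## Invariants
- Must-not-change behaviors, API contracts, data formats: …

## Dependency/sequence rules
- e.g., interfaces → call sites → cleanup

## Verification requirements
- What must pass: …
- Environment baseline (interpreter, env vars; CWD = repo root unless overridden): …

## Failure policy
- Rollback / retry / scope-reduction triggers: …
```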
## Environment baseline (ask user if not specified)

- If the repo’s applicable `AGENTS.md` does not specify a verification/runtime environment baseline, you must request it from the user before treating Verify results as authoritative.
- Default assumption (unless explicitly overridden): workers execute commands from the repo root (CWD = repo root, i.e. `-C .`). If that assumption is unsafe for the repo, ask the user to confirm the expected CWD.
- Ask for the minimum needed baseline (keep it short):
  - interpreter choice (e.g., `python` vs `.venv/bin/python`)
  - any required env vars (e.g., `PYTHONPATH`, `NAUTILUS_*`)
  - expected working directory assumptions (repo root vs anywhere)
- Record the agreed baseline in `run_contract.md` under “Verification requirements”.
- In Verify tasks, instruct workers to explicitly state which interpreter/env they used.
## Parallel safety (avoid file conflicts)

- Treat file-level collisions as a first-class risk in parallel Map phases.
- Planning guidance:
  - Prefer partitioning work by disjoint file ownership (workers edit non-overlapping file sets).
  - If two tasks might touch the same files/symbols, either:
    - refactor the plan to make ownership disjoint, or
    - serialize those tasks via `depends_on` (see the sketch after this list).
  - For large refactors, schedule a final “join” worker to integrate and resolve any cross-file issues.
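A sketch of serializing two overlapping tasks via `depends_on` (task fields as defined in the PLAN step below; the `tasks` key name and all values are illustrative assumptions):

```json
{
  "tasks": [
    {
      "worker_id": "map-0001",
      "goal": "Refactor the shared interface in <file>",
      "report_path": ".rlm/runs/<run_id>/artifacts/reports/map-0001.md"
    },
    {
      "worker_id": "map-0002",
      "goal": "Update call sites touching the same file",
      "depends_on": ["map-0001"],
      "report_path": ".rlm/runs/<run_id>/artifacts/reports/map-0002.md"
    }
  ]
}
```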
## Output contract
Consume narrative report files only. Treat chat text as non-authoritative.
## Mandatory scripts (do not hand-roll commands)

To reduce prompt corruption, shell interpolation bugs (e.g., `$map-worker` expansion), and orchestration context growth:

- Spawn workers: use `$CODEX_HOME/skills/reduce-orchestrator/scripts/rlm_spawn_worker.py` (never raw heredocs for worker prompts).
- Spawn only READY tasks (recommended default): use `$CODEX_HOME/skills/reduce-orchestrator/scripts/rlm_run_ready.py` to enforce `depends_on` + inflight caps.
- Monitor stalls: use `$CODEX_HOME/skills/reduce-orchestrator/scripts/rlm_watch_reports.py` (report freshness).
- Write a deterministic inventory: use `$CODEX_HOME/skills/reduce-orchestrator/scripts/rlm_reduce.py` to emit missing/stale report lists without reading reports into context.
- Collect deferred opportunities: use `$CODEX_HOME/skills/reduce-orchestrator/scripts/rlm_collect_deferred.py` to write `.rlm/runs/<run_id>/deferred.json` from worker reports.

If `CODEX_HOME` is not set, it is typically `~/.codex`.
If you cannot use these scripts for some reason, stop and ask the user to fix the environment rather than falling back to ad-hoc shell templates.
## State rehydration (mandatory, always)
The orchestrator must never rely on conversational memory for run state. Always rehydrate deterministically from on-disk sources of truth before making any plan/spawn/retry/finish decisions.
Mandatory rehydration procedure:

1. Identify the active `run_id` (from the user, or from `.rlm/runs/` if explicitly instructed).
2. Re-load state from these SoT paths (prefer paths over inlined content):
   - `.rlm/runs/<run_id>/plan.json`
   - `.rlm/runs/<run_id>/artifacts/context/run_contract.md`
   - `.rlm/runs/<run_id>/artifacts/scheduler/inflight.json` (if present)
   - `.rlm/runs/<run_id>/reduce_state.json` (rebuild via `rlm_reduce.py` if missing/stale)
   - `.rlm/runs/<run_id>/deferred.json` (rebuild via `rlm_collect_deferred.py` if missing/stale)
3. Recompute inventory artifacts (no waiting / no sleep loops):
   - `python "$CODEX_HOME/skills/reduce-orchestrator/scripts/rlm_reduce.py" --root . --run-id <run_id> --stale-seconds 1200`
   - `python "$CODEX_HOME/skills/reduce-orchestrator/scripts/rlm_collect_deferred.py" --root . --run-id <run_id>`
4. Only after rehydration, proceed with `rlm_run_ready.py` to spawn READY tasks.
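A consolidated sketch of that sequence (commands from the procedure above; `<run_id>` is a placeholder):

```bash
# Rehydrate run state deterministically before any plan/spawn/retry/finish decision.
RUN_ID=<run_id>
python "$CODEX_HOME/skills/reduce-orchestrator/scripts/rlm_reduce.py" \
  --root . --run-id "$RUN_ID" --stale-seconds 1200
python "$CODEX_HOME/skills/reduce-orchestrator/scripts/rlm_collect_deferred.py" \
  --root . --run-id "$RUN_ID"
# Then consult these SoT files by path (do not inline their contents):
#   .rlm/runs/$RUN_ID/plan.json
#   .rlm/runs/$RUN_ID/artifacts/context/run_contract.md
#   .rlm/runs/$RUN_ID/artifacts/scheduler/inflight.json   (if present)
#   .rlm/runs/$RUN_ID/reduce_state.json
#   .rlm/runs/$RUN_ID/deferred.json
# Only after this, spawn READY tasks via rlm_run_ready.py.
```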
Rehydration output rule:
- In the final user-facing message, explicitly state that rehydration was performed and list the key state paths used.
## Deferred opportunities (recommended)
When workers surface “good ideas” that are out of scope for the current run (e.g., performance refactors, cleanup, indexing, abstraction), do not silently discard them.
- Keep the current run scoped (do not mix axes), but capture these items as Deferred opportunities with:
  - what was proposed (1 line),
  - why it was deferred (scope / risk / verification cost),
  - how to validate it in a dedicated follow-up run (success criteria + minimal verify).
- Include a short Deferred opportunities section in:
  - the run’s `final.md`/`final.json`, and
  - your final user-facing message (even if empty: “none identified”).
- Mandatory publishing rule: before posting the final user-facing message, explicitly scan the run’s `report_path` markdown files for deferred items and include them (with evidence paths).
  - If nothing is found, still include: “Deferred opportunities: none identified”.
Recommended artifact (SoT, improves reuse): write `.rlm/runs/<run_id>/deferred.json` and reference it from `final.md`/`final.json`. For real runs this is mandatory: write `deferred.json` before archiving.
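For illustration, a `deferred.json` entry might look like the sketch below. The actual schema is whatever `rlm_collect_deferred.py` emits; these field names are assumptions mirroring the capture requirements above:

```json
[
  {
    "proposal": "Add an index to speed up symbol lookup (1-line summary)",
    "deferred_because": "out of scope for this run; verification cost too high",
    "follow_up_validation": "dedicated run: benchmark lookup before/after; minimal verify of unchanged behavior",
    "evidence_paths": [".rlm/runs/<run_id>/artifacts/reports/map-0003.md"]
  }
]
```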
## Required directory layout and lifecycle

Use this deterministic layout:

```
.rlm/
  runs/<run_id>/
    plan.json
    artifacts/reports/*.md
    reduce_state.json
    deferred.json          # aggregated from report_path markdown (required for real runs)
    final.json             # or final.md
    artifacts/             # optional large outputs
    archived_to.json       # written after archiving
  archive/<archive_id>/    # immutable snapshot
    run/                   # snapshot of runs/<run_id>/
    meta.json
    in_use_by/<run_id>     # marker(s) (optional)
  locks/
    <run_id>.lock
    cleanup.lock
  cleanup.log
```
### Archive trigger (must do)

When you reach a terminal state (`termination_decision.should_finish == true` or budget exhausted):

- Write `final.json` (or `final.md`) that references evidence/artifacts by path (at minimum: worker report paths + any file paths cited).
- Create an immutable archive snapshot under `.rlm/archive/<archive_id>/`, where `<archive_id>` includes `<run_id>` + timestamp.
- Write `.rlm/archive/<archive_id>/meta.json` with (sketch below):
  - `goal_summary`
  - `start_timestamp`, `end_timestamp`
  - `termination_reason`
  - retention: `ttl_days` (default), `keep_forever` (optional), `size_bytes` (optional)
- Post-archive cleanup of the active run dir:
  - remove or compress heavy transient artifacts (e.g., `artifacts/`)
  - keep only minimal pointers + final outputs
- Run TTL cleanup after archiving.

Use the skill-bundled `rlm_admin.py` (typically `$CODEX_HOME/skills/reduce-orchestrator/scripts/rlm_admin.py`) to do this safely and deterministically.
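A `meta.json` sketch using the fields above (illustrative; the timestamp format is an assumption):

```json
{
  "goal_summary": "<goal>",
  "start_timestamp": "2025-01-01T00:00:00Z",
  "end_timestamp": "2025-01-01T02:30:00Z",
  "termination_reason": "should_finish: success criteria met",
  "ttl_days": 14,
  "keep_forever": false,
  "size_bytes": 1048576
}
```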
### TTL cleanup (must do)

Run cleanup:

- at the start of a new run
- after archiving

Delete archives older than `ttl_days` unless `keep_forever == true`, and never delete archives with any `in_use_by/*` markers. Log deletions to `.rlm/cleanup.log`. Acquire `.rlm/locks/cleanup.lock` before cleanup; skip cleanup if it already exists. A conceptual sketch of these rules follows.
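The sketch below only illustrates the rules; in real runs, perform cleanup via `rlm_admin.py`, never hand-rolled shell. It assumes `meta.json` exposes `ttl_days`/`keep_forever` as sketched above:

```bash
#!/usr/bin/env bash
# Conceptual illustration of the TTL rules; do NOT hand-roll this in real runs.
set -euo pipefail

# Skip entirely if another cleanup holds the lock (atomic create via noclobber).
if ! (set -o noclobber; : > .rlm/locks/cleanup.lock) 2>/dev/null; then
  echo "cleanup already running; skipping" && exit 0
fi
trap 'rm -f .rlm/locks/cleanup.lock' EXIT

for dir in .rlm/archive/*/; do
  meta="$dir/meta.json"
  [ -f "$meta" ] || continue
  # Never delete pinned archives or archives referenced by a live run.
  python -c "import json,sys; sys.exit(0 if json.load(open('$meta')).get('keep_forever') else 1)" && continue
  ls "$dir/in_use_by" 2>/dev/null | grep -q . && continue
  ttl=$(python -c "import json; print(json.load(open('$meta')).get('ttl_days', 14))")
  # Delete only if older than ttl_days (directory mtime as a stand-in for end_timestamp).
  if [ -n "$(find "$dir" -maxdepth 0 -mtime +"$ttl")" ]; then
    echo "$(date -u +%FT%TZ) deleted $dir" >> .rlm/cleanup.log
    rm -rf "$dir"
  fi
done
```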
### Concurrency safety (must enforce)
Multiple orchestrators may run concurrently. Enforce:
- Unique `run_id` per run (generate if not provided).
- Per-run lock file: `.rlm/locks/<run_id>.lock`
  - Acquire atomically; fail fast if it exists.
  - If it appears stale (older than threshold, default 24h), warn and require explicit override behavior (do not silently break).
  - Release on exit.
- Cleanup lock: `.rlm/locks/cleanup.lock` (cleanup must skip if locked).
- In-use archive marker: if you reference or depend on an archive, write `.rlm/archive/<archive_id>/in_use_by/<run_id>`.

Use the skill-bundled `rlm_admin.py` for locks, cleanup, archiving, and in-use markers. A conceptual sketch of atomic acquisition follows.
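For illustration only (real runs must use `rlm_admin.py`), atomic acquire / fail-fast / release might look like:

```bash
# Conceptual sketch: atomic per-run lock. <run_id> is a placeholder.
RUN_ID=<run_id>
LOCK=".rlm/locks/$RUN_ID.lock"

# Atomic create: with noclobber, '>' fails if the lock already exists.
if ! (set -o noclobber; echo "$$" > "$LOCK") 2>/dev/null; then
  # Stale-lock heuristic: warn if older than 24h, but never silently break it.
  if [ -n "$(find "$LOCK" -mtime +1 2>/dev/null)" ]; then
    echo "WARN: $LOCK is older than 24h; explicit override required" >&2
  fi
  echo "lock held; failing fast" >&2
  exit 1
fi
trap 'rm -f "$LOCK"' EXIT   # release on exit
```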
## Loop algorithm (PLAN → MAP → REDUCE → VERIFY → DECIDE)

### 0) Initialize a run (locks + cleanup)

Pick a unique `run_id` and initialize:

```bash
python "$HOME/.codex/skills/reduce-orchestrator/scripts/rlm_admin.py" init-run \
  --root . \
  --run-id <run_id> \
  --ttl-days 14
```
### 1) PLAN (strict scope; deterministic worker IDs)

Write `.rlm/runs/<run_id>/plan.json` with:

- goal summary + success criteria + budget (`max_iterations`, `max_workers`, etc.)
- a list of Map tasks, each with:
  - deterministic `worker_id` (e.g., `map-0001`, `map-0002`, …)
  - `granularity` (micro|meso|macro) (recommended)
  - `compute_tier` (standard|standard-plus|heavy)
  - `goal` or `intent` (may be broad; define the desired outcomes)
  - `depends_on` (optional; list of prior `worker_id`s this task relies on)
  - `inputs` (optional; list of artifact/result paths required from dependencies)
  - `context_paths` (optional; any additional context paths to include in scope)
  - `report_path` (required; e.g., `.rlm/runs/<run_id>/artifacts/reports/<worker_id>.md`)
- `hint_paths` (optional; include `.rlm/runs/<run_id>/` if you want a starting scope)

Do not allow two workers to write the same `report_path`.
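A minimal `plan.json` sketch using the fields above (illustrative; top-level key names such as `tasks` are assumptions, not a fixed schema):

```json
{
  "goal": "…",
  "success_criteria": ["…"],
  "budget": { "max_iterations": 2, "max_workers": 8 },
  "hint_paths": [".rlm/runs/<run_id>/"],
  "tasks": [
    {
      "worker_id": "map-0001",
      "granularity": "micro",
      "compute_tier": "standard",
      "goal": "… (desired outputs + constraints)",
      "report_path": ".rlm/runs/<run_id>/artifacts/reports/map-0001.md"
    },
    {
      "worker_id": "join-0001",
      "granularity": "meso",
      "compute_tier": "heavy",
      "goal": "Integrate map results; resolve cross-file issues",
      "depends_on": ["map-0001"],
      "inputs": [".rlm/runs/<run_id>/artifacts/reports/map-0001.md"],
      "report_path": ".rlm/runs/<run_id>/artifacts/reports/join-0001.md"
    }
  ]
}
```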
Adaptive planning (recommended for exploratory tasks):

- Set `max_iterations >= 2` and treat iteration 1 as a standard pass:
  - Let workers explore within reason, guided by the run contract + `hint_paths` (scope hints).
  - If a worker needs more scope, have them request it explicitly in their narrative report.
- Update `plan.json` (and/or schedule the next iteration’s tasks) based on what you learn.
- If you rewrite `plan.json` between iterations, preserve provenance by copying the previous version to: `.rlm/runs/<run_id>/artifacts/plan.iter-<n>.json`
### 2) MAP (parallel-first via map-worker)

Spawn Map workers in parallel. For each worker, invoke Codex CLI using the model/effort for its `compute_tier`, with an explicit envelope that provides the fields you want the worker to follow (at minimum: `report_path`).

Mandatory: use the skill-bundled launcher script to avoid shell interpolation issues (e.g., `$map-worker` expansion) when generating prompts.
python "$HOME/.codex/skills/reduce-orchestrator/scripts/rlm_spawn_worker.py" \
--cd . \
--model gpt-5.2-codex \
--reasoning-effort high \
--dangerously-bypass-approvals-and-sandbox \
--run-id <run_id> \
--worker-id map-0001 \
--mode map \
--goal "… (include desired outputs + constraints)" \
--report-path .rlm/runs/<run_id>/artifacts/reports/map-0001.md \
--contract-path .rlm/runs/<run_id>/artifacts/context/run_contract.md \
--hint-path .rlm/runs/<run_id>/ \
--hint-path <project paths…> \
--progress-required \
--log-file .rlm/runs/<run_id>/artifacts/logs/map-0001.log \
--background
To monitor stalls without expanding orchestrator context, check report freshness (mandatory; no polling):

```bash
python "$HOME/.codex/skills/reduce-orchestrator/scripts/rlm_watch_reports.py" \
  --root . \
  --run-id <run_id>
```

Write a deterministic inventory of `report_path` artifacts (missing/stale) to `.rlm/runs/<run_id>/reduce_state.json` (mandatory):

```bash
python "$HOME/.codex/skills/reduce-orchestrator/scripts/rlm_reduce.py" \
  --root . \
  --run-id <run_id> \
  --stale-seconds 1200
```
Do not use sleep loops for waiting. Prefer explicit state:

- Spawn, then exit/return control.
- Re-run `rlm_watch_reports.py` / `rlm_reduce.py` when you want an updated view.

Before writing the final output, collect deferred items into `.rlm/runs/<run_id>/deferred.json`:

```bash
python "$HOME/.codex/skills/reduce-orchestrator/scripts/rlm_collect_deferred.py" \
  --root . \
  --run-id <run_id>
```
Mandatory default (safer; avoids pre-queue auto-run): spawn only READY tasks (`depends_on` satisfied) with an inflight cap. This lets you keep future work in `plan.json` without actually running it until you re-invoke the runner after reviewing results.

```bash
python "$HOME/.codex/skills/reduce-orchestrator/scripts/rlm_run_ready.py" \
  --root . \
  --run-id <run_id> \
  --stage map \
  --max-inflight 4 \
  --progress-required \
  --dangerously-bypass-approvals-and-sandbox
```
Safe-first defaults (built in):

- Does not spawn additional workers while any are inflight (prevents “auto-refill”).
- Does not spawn join workers (`worker_id` starting with `join-`) unless explicitly allowed.
To override (only when you intentionally want it):
python "$HOME/.codex/skills/reduce-orchestrator/scripts/rlm_run_ready.py" \
--root . \
--run-id <run_id> \
--stage map \
--allow-join \
--allow-refill-while-inflight \
--dangerously-bypass-approvals-and-sandbox
Conceptual reference only (do not use this raw heredoc pattern for real runs; use `rlm_spawn_worker.py`):

```bash
codex -m <compute_tier-model> -c model_reasoning_effort="<compute_tier-effort>" exec \
  -C . \
  - <<'PROMPT' >/dev/null 2>&1
Use the map-worker skill.
$map-worker

Return minimal chat output; write your narrative report to:
.rlm/runs/<run_id>/artifacts/reports/<worker_id>.md

Read and comply with the run contract first:
.rlm/runs/<run_id>/artifacts/context/run_contract.md

{
  "run_id": "<run_id>",
  "worker_id": "map-0001",
  "mode": "map",
  "granularity": "micro",
  "compute_tier": "standard",
  "goal": "… (include desired outputs + constraints)",
  "report_path": ".rlm/runs/<run_id>/artifacts/reports/<worker_id>.md",
  "hint_paths": [
    "<project paths…>",
    ".rlm/runs/<run_id>/",
    ".rlm/runs/<run_id>/artifacts/context/run_contract.md",
    "<context_paths…>"
  ]
}
PROMPT
```

If a worker requests more scope in their narrative report, treat that request as input for updating hint scope and re-running targeted workers.
Context handoff (optional): if a worker writes a short context note (recommended path: `.rlm/runs/<run_id>/artifacts/context/<worker_id>.md`), include it in downstream tasks via `context_paths` / `hint_paths` when helpful.

Note: treat `report_path` as the source of truth. The `-o .rlm/.../<worker_id>.last_message.md` file is best-effort and may be absent.
Parallelization guidance (recommended):

- Keep `depends_on` minimal; only encode true data dependencies.
- Split large tasks into independent subtasks that can run in parallel, then add a small number of join tasks for synthesis.
- Prefer passing compact context notes instead of broad file scopes to avoid serializing work.
- Use `standard` to map scope in parallel when appropriate; schedule `standard-plus`/`heavy` where needed (granularity is independent of compute_tier; e.g., `micro` can be `heavy`).
Progress & stalls (recommended for long tasks):
- Default behavior is to wait patiently; do not cancel long-running workers unless there is strong evidence of a stall.
- If you need visibility, ask workers to append a short "Progress" note to the report or write a small heartbeat file.
- Only intervene when there is no new output for an extended period (e.g., 20-30 minutes) and the task is blocking the run.
- If `report_path` is missing (no report produced), treat it as a stall signal and retry using a new `worker_id` + `report_path`:
  - Either reduce `granularity` (split into 2–6 micro/meso workers + optional join) or retry with a higher `compute_tier` (standard → standard-plus → heavy).
  - Granularity and compute_tier are orthogonal; prefer splitting for broad/ambiguous tasks, and prefer compute_tier upgrades when the task is already narrow but needs deeper reasoning.
- Design for worker self-termination (the orchestrator cannot reliably terminate workers from the outside): for long/ambiguous tasks, instruct workers to self-terminate (write a report + next plan) if completion is not feasible within budget.
### 3) REDUCE (canonicalize topics; provenance-first)

Reduce by reading worker narrative reports (`report_path`) and synthesizing:
- consolidated findings
- open questions/risks
- a short verify target list (what needs confirmation)
Do not fully trust worker reports. Apply your own judgment:
- Cross-check claims across multiple reports when possible.
- Treat missing evidence or vague assertions as lower confidence.
- Prefer conservative conclusions when reports conflict.
If the results are not satisfactory, respond with a clear escalation path:
- Identify what is missing or unclear.
- Propose a targeted follow-up worker task (or a small set) to resolve it.
- If necessary, widen hint scope, upgrade `compute_tier`/model, or reduce `granularity`.
### 4) VERIFY (parallel; top-K + contradictions + weak evidence)
Schedule Verify workers (mode "verify") in parallel for:
- Top-K narrative topics by impact (K default 5; cap at 5)
- Unresolved contradictions found in narrative synthesis
- High-importance statements needing confirmation
Construct each verify worker’s `hint_paths` broadly, and write results to `.rlm/runs/<run_id>/artifacts/reports/<worker_id>.md`. Prefer relying on the verify goal + `hint_paths` + run contract.
As in MAP, spawn verify workers via the mandatory scripts; the raw pattern below is a conceptual reference only:

```bash
codex -m <compute_tier-model> -c model_reasoning_effort="<compute_tier-effort>" exec \
  -C . \
  - <<'PROMPT' >/dev/null 2>&1
Use the map-worker skill.
$map-worker

Return minimal chat output; write your narrative report to:
.rlm/runs/<run_id>/artifacts/reports/<worker_id>.md

{
  "run_id": "<run_id>",
  "worker_id": "verify-0001",
  "mode": "verify",
  "goal": "…",
  "report_path": ".rlm/runs/<run_id>/artifacts/reports/<worker_id>.md",
  "hint_paths": ["<minimum needed…>", ".rlm/runs/<run_id>/"]
}
PROMPT
```
### 5) DECIDE (iterate or finish)

Decide whether to stop based on:
- high-priority open questions remaining
- contradiction status (unresolved pro/con)
- evidence thresholds (e.g., strong-evidence coverage on high-impact topics)
- budget exhaustion (iterations/workers)
If finishing, write `final.json` with:

- `goal_summary`
- `termination_decision` (`should_finish`, `reason`, `budget_used`)
- key narrative conclusions
- explicit `evidence_paths` (narrative report paths, plus any cited files)
- `deferred_opportunities` (optional but recommended when discovered)
Then archive + cleanup:

```bash
python "$HOME/.codex/skills/reduce-orchestrator/scripts/rlm_admin.py" archive-run \
  --root . \
  --run-id <run_id> \
  --goal-summary "<goal>" \
  --termination-reason "<reason>" \
  --ttl-days 14

python "$HOME/.codex/skills/reduce-orchestrator/scripts/rlm_admin.py" release-run-lock --root . --run-id <run_id>
```
Shell safety note (recommended): when writing Markdown via heredocs (run contracts, reports, `final.md`, etc.), use a quoted heredoc delimiter (e.g., `cat <<'EOF' > <file>.md`) to prevent shell command substitution/backtick expansion from corrupting the Markdown content.
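A small illustration of why the quoted delimiter matters (the target path is a placeholder):

```bash
# Quoted delimiter: $(...), `backticks`, and $VARS in the body stay literal.
cat <<'EOF' > .rlm/runs/<run_id>/artifacts/context/run_contract.md
Verification: run `pytest` from the repo root; $PYTHONPATH must stay unset.
EOF

# With an unquoted delimiter (<<EOF), the shell would command-substitute
# `pytest` and expand $PYTHONPATH, corrupting the Markdown before it is written.
```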