# Reduce Orchestrator

## Overview
Run a constrained, domain-agnostic MapReduce(+Verify) loop that is suitable for code/debugging/analysis/research/docs, while keeping scope strict, evidence explicit, and run artifacts reproducible and concurrency-safe.
## Non-Negotiables (must enforce)

### Mandatory dependency: map-worker

- Run all Map and Verify work via the `map-worker` skill.
- Every spawned worker prompt must include this exact line: “Use the map-worker skill.”
- Also include `$map-worker` in the prompt to reliably trigger the installed hyphen-case skill.
### Mandatory orchestration style (must keep orchestrator context small)
Goal: the orchestrator must be able to scale parallelism without its own context exploding.
- The orchestrator must operate primarily on paths + small state files, not by inlining large worker outputs into its own context.
- Treat `report_path` markdown files as the sole worker source of truth. Prefer referencing paths over quoting content.
- Never read worker logs into the orchestrator context. Do not open/ingest:
  - `.rlm/runs/<run_id>/artifacts/logs/*.log` (map-worker/codex logs)
  - any stdout/stderr capture files
  - any other large, non-contract outputs
- Use file freshness/presence as the primary stall signal (via `report_path`, `inflight.json`, and `rlm_watch_reports.py`) rather than reading verbose logs.
- If a worker appears stalled or failed, do not “debug by reading logs”; instead:
  - treat a missing/empty `report_path` as a stall signal,
  - retry using a new `worker_id` + `report_path` (and/or reduce granularity / upgrade compute_tier), and
  - if human debugging is required, ask the user to inspect logs out-of-band and summarize (do not paste logs into chat).
### Work delegation (orchestrator does not “do” the work)
- Do not perform substantive domain work directly in the orchestrator (no manual repo edits, no ad-hoc analysis outside workers).
- Treat the orchestrator as: planner + dispatcher + integrator + lifecycle manager.
- The only direct orchestrator actions should be:
  - write/update `.rlm/runs/<run_id>/plan.json` and final outputs,
  - run lifecycle/admin steps (locks, archiving, TTL cleanup) via the skill-bundled `rlm_admin.py` (typically `$CODEX_HOME/skills/reduce-orchestrator/scripts/rlm_admin.py`), and
  - spawn workers to do everything else (via the mandatory scripts below).
- If an implementation drifts out-of-scope (scope creep), schedule a follow-up worker to trim/revert the drift (e.g., `impl-0001c` “scope-trim”) instead of applying manual edits in the orchestrator.
- If you (the orchestrator) accidentally performed substantive work directly, immediately:
  - record the deviation in the run’s final/reduce output, and
  - schedule a repair/scope-trim worker to reconcile changes back to contract scope (or justify and re-scope via an updated contract).
- Do not silently “normalize” drift; preserve provenance and make the correction workflow explicit.
## Worker sizing: granularity + compute_tier

Define every worker along two axes:

- `granularity` (how big the task unit is; drives parallelism and scope control)
- `compute_tier` (how strong the model/effort is; drives reasoning capacity)

### granularity (recommended vocabulary)

- `micro`: one narrow outcome; ideally 1 file / 1 function / 1 command; should be trivially reviewable.
- `meso`: a small cohesive change across a few files or a small subsystem; may require a follow-up join worker.
- `macro`: cross-cutting work (multi-subsystem, large-context synthesis, refactor/architecture); avoid assigning macro directly; prefer splitting into micro/meso workers + a join worker.

Default planning rule: if you think you need `macro`, you probably need 3–8 micro/meso workers + 1 join.

### compute_tier (model requirement for all spawned workers)

Spawn workers only via Codex CLI using one of these compute mappings (pick per worker):

| compute_tier | model | model_reasoning_effort |
|---|---|---|
| standard | gpt-5.1-codex-mini | medium |
| standard-plus | gpt-5.2-codex | medium |
| heavy | gpt-5.2-codex | xhigh |

Use `heavy` for any task that requires high-level synthesis, large-context processing, or deep reasoning (e.g., multi-file edits, core-logic changes, or abstraction/re-architecture). When in doubt, upgrade `compute_tier` to `heavy` or reduce `granularity` (split the work). The sketch below shows how a tier translates into spawn flags.
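A sketch of spawning a `heavy` worker with the mapping above (flags taken from the MAP step later in this document; paths and the goal are placeholders):

```bash
# Sketch: "heavy" tier = gpt-5.2-codex at xhigh effort (per the table above).
python "$CODEX_HOME/skills/reduce-orchestrator/scripts/rlm_spawn_worker.py" \
  --cd . \
  --model gpt-5.2-codex \
  --reasoning-effort xhigh \
  --run-id <run_id> \
  --worker-id map-0002 \
  --mode map \
  --goal "… (include desired outputs + constraints)" \
  --report-path .rlm/runs/<run_id>/artifacts/reports/map-0002.md \
  --contract-path .rlm/runs/<run_id>/artifacts/context/run_contract.md \
  --background
```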
### Model escalation (ask user to change models when needed)

If you are using the tiered worker models appropriately (standard/standard-plus/heavy) and the outcome is still not satisfactory (e.g., repeated low-quality reports, missed dependencies, persistent misalignment with the run contract, or inability to complete the task within budget), explicitly ask the user to approve a model change.

- Be concrete about the failure mode (what was expected vs. what happened).
- Propose the smallest model change that plausibly fixes it (e.g., promote specific tasks to `heavy`, or switch to a stronger model for the orchestrator/workers).
- If the user declines, proceed with scope reduction and/or additional verification as a fallback.
## Run contract (workers must comply)

- For every run, write a single authoritative run contract document at: `.rlm/runs/<run_id>/artifacts/context/run_contract.md`
- Every worker prompt must:
  - include the `run_contract.md` path in `hint_paths` (or otherwise provide the path explicitly), and
  - instruct the worker to read it first and follow it.
- Every worker narrative report must include a “Contract Compliance” section stating:
- whether the contract was read,
- which constraints were applied,
- any deviations (with justification), and
- any requested contract updates / open risks.
Minimum required contents of `run_contract.md` (keep concise; narrative is fine; a sketch follows this list):
- Goal + non-goals
- Scope boundaries (directories/files to avoid)
- Invariants (must-not-change behaviors, API contracts, data formats)
- Dependency/sequence rules (e.g., “interfaces → call sites → cleanup”)
- Verification requirements (what must pass; which environment to use; repo-root CWD assumption if applicable)
- Failure policy (rollback / retry / scope reduction triggers)
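A minimal `run_contract.md` skeleton covering the required contents above (illustrative only; the exact section wording is an assumption, not a fixed schema):

```markdown
# Run Contract — <run_id>

## Goal + non-goals
Goal: …
Non-goals: …

## Scope boundaries
Avoid: <directories/files workers must not touch>

## Invariants
- Must-not-change behaviors, API contracts, data formats: …

## Dependency/sequence rules
- e.g., interfaces → call sites → cleanup

## Verification requirements
- What must pass: …
- Environment baseline (interpreter, env vars; CWD = repo root unless overridden): …

## Failure policy
- Rollback / retry / scope-reduction triggers: …
```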
## Environment baseline (ask user if not specified)

- If the repo’s applicable `AGENTS.md` does not specify a verification/runtime environment baseline, you must request it from the user before treating Verify results as authoritative.
- Default assumption (unless explicitly overridden): workers execute commands from the repo root (CWD = repo root, i.e. `-C .`). If that assumption is unsafe for the repo, ask the user to confirm the expected CWD.
- Ask for the minimum needed baseline (keep it short):
  - interpreter choice (e.g., `python` vs `.venv/bin/python`)
  - any required env vars (e.g., `PYTHONPATH`, `NAUTILUS_*`)
  - expected working directory assumptions (repo root vs anywhere)
- Record the agreed baseline in `run_contract.md` under “Verification requirements”.
- In Verify tasks, instruct workers to explicitly state which interpreter/env they used.
## Parallel safety (avoid file conflicts)

- Treat file-level collisions as a first-class risk in parallel Map phases.
- Planning guidance:
  - Prefer partitioning work by disjoint file ownership (workers edit non-overlapping file sets).
  - If two tasks might touch the same files/symbols, either:
    - refactor the plan to make ownership disjoint, or
    - serialize those tasks via `depends_on` (see the sketch after this list).
  - For large refactors, schedule a final “join” worker to integrate and resolve any cross-file issues.
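A sketch of serializing two overlapping tasks via `depends_on` (task fields as defined in the PLAN step below; the `tasks` key name and all values are illustrative assumptions):

```json
{
  "tasks": [
    {
      "worker_id": "map-0001",
      "goal": "Refactor the shared interface in <file>",
      "report_path": ".rlm/runs/<run_id>/artifacts/reports/map-0001.md"
    },
    {
      "worker_id": "map-0002",
      "goal": "Update call sites touching the same file",
      "depends_on": ["map-0001"],
      "report_path": ".rlm/runs/<run_id>/artifacts/reports/map-0002.md"
    }
  ]
}
```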
## Output contract
Consume narrative report files only. Treat chat text as non-authoritative.
## Mandatory scripts (do not hand-roll commands)

To reduce prompt corruption, shell interpolation bugs (e.g., `$map-worker` expansion), and orchestration context growth:

- Spawn workers: use `$CODEX_HOME/skills/reduce-orchestrator/scripts/rlm_spawn_worker.py` (never raw heredocs for worker prompts).
- Spawn only READY tasks (recommended default): use `$CODEX_HOME/skills/reduce-orchestrator/scripts/rlm_run_ready.py` to enforce `depends_on` + inflight caps.
- Monitor stalls: use `$CODEX_HOME/skills/reduce-orchestrator/scripts/rlm_watch_reports.py` (report freshness).
- Write a deterministic inventory: use `$CODEX_HOME/skills/reduce-orchestrator/scripts/rlm_reduce.py` to emit missing/stale report lists without reading reports into context.
- Collect deferred opportunities: use `$CODEX_HOME/skills/reduce-orchestrator/scripts/rlm_collect_deferred.py` to write `.rlm/runs/<run_id>/deferred.json` from worker reports.

If `CODEX_HOME` is not set, it is typically `~/.codex`.
If you cannot use these scripts for some reason, stop and ask the user to fix the environment rather than falling back to ad-hoc shell templates.
## State rehydration (mandatory, always)
The orchestrator must never rely on conversational memory for run state. Always rehydrate deterministically from on-disk sources of truth before making any plan/spawn/retry/finish decisions.
Mandatory rehydration procedure:

1. Identify the active `run_id` (from the user, or from `.rlm/runs/` if explicitly instructed).
2. Re-load state from these SoT paths (prefer paths over inlined content):
   - `.rlm/runs/<run_id>/plan.json`
   - `.rlm/runs/<run_id>/artifacts/context/run_contract.md`
   - `.rlm/runs/<run_id>/artifacts/scheduler/inflight.json` (if present)
   - `.rlm/runs/<run_id>/reduce_state.json` (rebuild via `rlm_reduce.py` if missing/stale)
   - `.rlm/runs/<run_id>/deferred.json` (rebuild via `rlm_collect_deferred.py` if missing/stale)
3. Recompute inventory artifacts (no waiting / no sleep loops):
   - `python "$CODEX_HOME/skills/reduce-orchestrator/scripts/rlm_reduce.py" --root . --run-id <run_id> --stale-seconds 1200`
   - `python "$CODEX_HOME/skills/reduce-orchestrator/scripts/rlm_collect_deferred.py" --root . --run-id <run_id>`
4. Only after rehydration, proceed with `rlm_run_ready.py` to spawn READY tasks.
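A consolidated sketch of that sequence (commands from the procedure above; `<run_id>` is a placeholder):

```bash
# Rehydrate run state deterministically before any plan/spawn/retry/finish decision.
RUN_ID=<run_id>
python "$CODEX_HOME/skills/reduce-orchestrator/scripts/rlm_reduce.py" \
  --root . --run-id "$RUN_ID" --stale-seconds 1200
python "$CODEX_HOME/skills/reduce-orchestrator/scripts/rlm_collect_deferred.py" \
  --root . --run-id "$RUN_ID"
# Then consult these SoT files by path (do not inline their contents):
#   .rlm/runs/$RUN_ID/plan.json
#   .rlm/runs/$RUN_ID/artifacts/context/run_contract.md
#   .rlm/runs/$RUN_ID/artifacts/scheduler/inflight.json   (if present)
#   .rlm/runs/$RUN_ID/reduce_state.json
#   .rlm/runs/$RUN_ID/deferred.json
# Only after this, spawn READY tasks via rlm_run_ready.py.
```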
Rehydration output rule:
- In the final user-facing message, explicitly state that rehydration was performed and list the key state paths used.
## Deferred opportunities (recommended)
When workers surface “good ideas” that are out of scope for the current run (e.g., performance refactors, cleanup, indexing, abstraction), do not silently discard them.
- Keep the current run scoped (do not mix axes), but capture these items as Deferred opportunities with:
  - what was proposed (1 line),
  - why it was deferred (scope / risk / verification cost),
  - how to validate it in a dedicated follow-up run (success criteria + minimal verify).
- Include a short Deferred opportunities section in:
  - the run’s `final.md`/`final.json`, and
  - your final user-facing message (even if empty: “none identified”).
- Mandatory publishing rule: before posting the final user-facing message, explicitly scan the run’s `report_path` markdown files for deferred items and include them (with evidence paths).
  - If nothing is found, still include: “Deferred opportunities: none identified”.
Recommended artifact (SoT, improves reuse): write `.rlm/runs/<run_id>/deferred.json` and reference it from `final.md`/`final.json`. For real runs this is mandatory: write `deferred.json` before archiving.
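For illustration, a `deferred.json` entry might look like the sketch below. The actual schema is whatever `rlm_collect_deferred.py` emits; these field names are assumptions mirroring the capture requirements above:

```json
[
  {
    "proposal": "Add an index to speed up symbol lookup (1-line summary)",
    "deferred_because": "out of scope for this run; verification cost too high",
    "follow_up_validation": "dedicated run: benchmark lookup before/after; minimal verify of unchanged behavior",
    "evidence_paths": [".rlm/runs/<run_id>/artifacts/reports/map-0003.md"]
  }
]
```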
## Required directory layout and lifecycle

Use this deterministic layout:

```
.rlm/
  runs/<run_id>/
    plan.json
    artifacts/reports/*.md
    reduce_state.json
    deferred.json          # aggregated from report_path markdown (required for real runs)
    final.json             # or final.md
    artifacts/             # optional large outputs
    archived_to.json       # written after archiving
  archive/<archive_id>/    # immutable snapshot
    run/                   # snapshot of runs/<run_id>/
    meta.json
    in_use_by/<run_id>     # marker(s) (optional)
  locks/
    <run_id>.lock
    cleanup.lock
  cleanup.log
```
### Archive trigger (must do)

When you reach a terminal state (`termination_decision.should_finish == true` or budget exhausted):

- Write `final.json` (or `final.md`) that references evidence/artifacts by path (at minimum: worker report paths + any file paths cited).
- Create an immutable archive snapshot under `.rlm/archive/<archive_id>/`, where `<archive_id>` includes `<run_id>` + timestamp.
- Write `.rlm/archive/<archive_id>/meta.json` with (sketch below):
  - `goal_summary`
  - `start_timestamp`, `end_timestamp`
  - `termination_reason`
  - retention: `ttl_days` (default), `keep_forever` (optional), `size_bytes` (optional)
- Post-archive cleanup of the active run dir:
  - remove or compress heavy transient artifacts (e.g., `artifacts/`)
  - keep only minimal pointers + final outputs
- Run TTL cleanup after archiving.

Use the skill-bundled `rlm_admin.py` (typically `$CODEX_HOME/skills/reduce-orchestrator/scripts/rlm_admin.py`) to do this safely and deterministically.
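A `meta.json` sketch using the fields above (illustrative; the timestamp format is an assumption):

```json
{
  "goal_summary": "<goal>",
  "start_timestamp": "2025-01-01T00:00:00Z",
  "end_timestamp": "2025-01-01T02:30:00Z",
  "termination_reason": "should_finish: success criteria met",
  "ttl_days": 14,
  "keep_forever": false,
  "size_bytes": 1048576
}
```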
### TTL cleanup (must do)

Run cleanup:

- at the start of a new run
- after archiving

Delete archives older than `ttl_days` unless `keep_forever == true`, and never delete archives with any `in_use_by/*` markers. Log deletions to `.rlm/cleanup.log`. Acquire `.rlm/locks/cleanup.lock` before cleanup; skip cleanup if it already exists. A conceptual sketch of these rules follows.
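The sketch below only illustrates the rules; in real runs, perform cleanup via `rlm_admin.py`, never hand-rolled shell. It assumes `meta.json` exposes `ttl_days`/`keep_forever` as sketched above:

```bash
#!/usr/bin/env bash
# Conceptual illustration of the TTL rules; do NOT hand-roll this in real runs.
set -euo pipefail

# Skip entirely if another cleanup holds the lock (atomic create via noclobber).
if ! (set -o noclobber; : > .rlm/locks/cleanup.lock) 2>/dev/null; then
  echo "cleanup already running; skipping" && exit 0
fi
trap 'rm -f .rlm/locks/cleanup.lock' EXIT

for dir in .rlm/archive/*/; do
  meta="$dir/meta.json"
  [ -f "$meta" ] || continue
  # Never delete pinned archives or archives referenced by a live run.
  python -c "import json,sys; sys.exit(0 if json.load(open('$meta')).get('keep_forever') else 1)" && continue
  ls "$dir/in_use_by" 2>/dev/null | grep -q . && continue
  ttl=$(python -c "import json; print(json.load(open('$meta')).get('ttl_days', 14))")
  # Delete only if older than ttl_days (directory mtime as a stand-in for end_timestamp).
  if [ -n "$(find "$dir" -maxdepth 0 -mtime +"$ttl")" ]; then
    echo "$(date -u +%FT%TZ) deleted $dir" >> .rlm/cleanup.log
    rm -rf "$dir"
  fi
done
```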
### Concurrency safety (must enforce)
Multiple orchestrators may run concurrently. Enforce:
- Unique `run_id` per run (generate if not provided).
- Per-run lock file: `.rlm/locks/<run_id>.lock`
  - Acquire atomically; fail fast if it exists.
  - If it appears stale (older than threshold, default 24h), warn and require explicit override behavior (do not silently break).
  - Release on exit.
- Cleanup lock: `.rlm/locks/cleanup.lock` (cleanup must skip if locked).
- In-use archive marker: if you reference or depend on an archive, write `.rlm/archive/<archive_id>/in_use_by/<run_id>`.

Use the skill-bundled `rlm_admin.py` for locks, cleanup, archiving, and in-use markers. A conceptual sketch of atomic acquisition follows.
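For illustration only (real runs must use `rlm_admin.py`), atomic acquire / fail-fast / release might look like:

```bash
# Conceptual sketch: atomic per-run lock. <run_id> is a placeholder.
RUN_ID=<run_id>
LOCK=".rlm/locks/$RUN_ID.lock"

# Atomic create: with noclobber, '>' fails if the lock already exists.
if ! (set -o noclobber; echo "$$" > "$LOCK") 2>/dev/null; then
  # Stale-lock heuristic: warn if older than 24h, but never silently break it.
  if [ -n "$(find "$LOCK" -mtime +1 2>/dev/null)" ]; then
    echo "WARN: $LOCK is older than 24h; explicit override required" >&2
  fi
  echo "lock held; failing fast" >&2
  exit 1
fi
trap 'rm -f "$LOCK"' EXIT   # release on exit
```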
## Loop algorithm (PLAN → MAP → REDUCE → VERIFY → DECIDE)

### 0) Initialize a run (locks + cleanup)

Pick a unique `run_id` and initialize:

```bash
python "$HOME/.codex/skills/reduce-orchestrator/scripts/rlm_admin.py" init-run \
  --root . \
  --run-id <run_id> \
  --ttl-days 14
```
### 1) PLAN (strict scope; deterministic worker IDs)

Write `.rlm/runs/<run_id>/plan.json` with:

- goal summary + success criteria + budget (`max_iterations`, `max_workers`, etc.)
- a list of Map tasks, each with:
  - deterministic `worker_id` (e.g., `map-0001`, `map-0002`, …)
  - `granularity` (micro|meso|macro) (recommended)
  - `compute_tier` (standard|standard-plus|heavy)
  - `goal` or `intent` (may be broad; define the desired outcomes)
  - `depends_on` (optional; list of prior `worker_id`s this task relies on)
  - `inputs` (optional; list of artifact/result paths required from dependencies)
  - `context_paths` (optional; any additional context paths to include in scope)
  - `report_path` (required; e.g., `.rlm/runs/<run_id>/artifacts/reports/<worker_id>.md`)
- `hint_paths` (optional; include `.rlm/runs/<run_id>/` if you want a starting scope)

Do not allow two workers to write the same `report_path`.
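A minimal `plan.json` sketch using the fields above (illustrative; top-level key names such as `tasks` are assumptions, not a fixed schema):

```json
{
  "goal": "…",
  "success_criteria": ["…"],
  "budget": { "max_iterations": 2, "max_workers": 8 },
  "hint_paths": [".rlm/runs/<run_id>/"],
  "tasks": [
    {
      "worker_id": "map-0001",
      "granularity": "micro",
      "compute_tier": "standard",
      "goal": "… (desired outputs + constraints)",
      "report_path": ".rlm/runs/<run_id>/artifacts/reports/map-0001.md"
    },
    {
      "worker_id": "join-0001",
      "granularity": "meso",
      "compute_tier": "heavy",
      "goal": "Integrate map results; resolve cross-file issues",
      "depends_on": ["map-0001"],
      "inputs": [".rlm/runs/<run_id>/artifacts/reports/map-0001.md"],
      "report_path": ".rlm/runs/<run_id>/artifacts/reports/join-0001.md"
    }
  ]
}
```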
Adaptive planning (recommended for exploratory tasks):

- Set `max_iterations >= 2` and treat iteration 1 as a standard pass:
  - Let workers explore within reason, guided by the run contract + `hint_paths` (scope hints).
  - If a worker needs more scope, have them request it explicitly in their narrative report.
- Update `plan.json` (and/or schedule the next iteration’s tasks) based on what you learn.
- If you rewrite `plan.json` between iterations, preserve provenance by copying the previous version to: `.rlm/runs/<run_id>/artifacts/plan.iter-<n>.json`
### 2) MAP (parallel-first via map-worker)

Spawn Map workers in parallel. For each worker, invoke Codex CLI using the model/effort for its `compute_tier`, with an explicit envelope that provides the fields you want the worker to follow (at minimum: `report_path`).

Mandatory: use the skill-bundled launcher script to avoid shell interpolation issues (e.g., `$map-worker` expansion) when generating prompts.
python "$HOME/.codex/skills/reduce-orchestrator/scripts/rlm_spawn_worker.py" \
--cd . \
--model gpt-5.2-codex \
--reasoning-effort high \
--dangerously-bypass-approvals-and-sandbox \
--run-id <run_id> \
--worker-id map-0001 \
--mode map \
--goal "… (include desired outputs + constraints)" \
--report-path .rlm/runs/<run_id>/artifacts/reports/map-0001.md \
--contract-path .rlm/runs/<run_id>/artifacts/context/run_contract.md \
--hint-path .rlm/runs/<run_id>/ \
--hint-path <project paths…> \
--progress-required \
--log-file .rlm/runs/<run_id>/artifacts/logs/map-0001.log \
--background
To monitor stalls without expanding orchestrator context, check report freshness (mandatory; no polling):

```bash
python "$HOME/.codex/skills/reduce-orchestrator/scripts/rlm_watch_reports.py" \
  --root . \
  --run-id <run_id>
```

Write a deterministic inventory of `report_path` artifacts (missing/stale) to `.rlm/runs/<run_id>/reduce_state.json` (mandatory):

```bash
python "$HOME/.codex/skills/reduce-orchestrator/scripts/rlm_reduce.py" \
  --root . \
  --run-id <run_id> \
  --stale-seconds 1200
```
Do not use sleep loops for waiting. Prefer explicit state:

- Spawn, then exit/return control.
- Re-run `rlm_watch_reports.py` / `rlm_reduce.py` when you want an updated view.

Before writing the final output, collect deferred items into `.rlm/runs/<run_id>/deferred.json`:

```bash
python "$HOME/.codex/skills/reduce-orchestrator/scripts/rlm_collect_deferred.py" \
  --root . \
  --run-id <run_id>
```
Mandatory default (safer; avoids pre-queue auto-run): spawn only READY tasks (`depends_on` satisfied) with an inflight cap. This lets you keep future work in `plan.json` without actually running it until you re-invoke the runner after reviewing results.

```bash
python "$HOME/.codex/skills/reduce-orchestrator/scripts/rlm_run_ready.py" \
  --root . \
  --run-id <run_id> \
  --stage map \
  --max-inflight 4 \
  --progress-required \
  --dangerously-bypass-approvals-and-sandbox
```
Safe-first defaults (built in):

- Does not spawn additional workers while any are inflight (prevents “auto-refill”).
- Does not spawn join workers (`worker_id` starting with `join-`) unless explicitly allowed.
To override (only when you intentionally want it):
python "$HOME/.codex/skills/reduce-orchestrator/scripts/rlm_run_ready.py" \
--root . \
--run-id <run_id> \
--stage map \
--allow-join \
--allow-refill-while-inflight \
--dangerously-bypass-approvals-and-sandbox
Conceptual reference only (do not use this raw heredoc pattern for real runs; use `rlm_spawn_worker.py`):

```bash
codex -m <compute_tier-model> -c model_reasoning_effort="<compute_tier-effort>" exec \
  -C . \
  - <<'PROMPT' >/dev/null 2>&1
Use the map-worker skill.
$map-worker

Return minimal chat output; write your narrative report to:
.rlm/runs/<run_id>/artifacts/reports/<worker_id>.md

Read and comply with the run contract first:
.rlm/runs/<run_id>/artifacts/context/run_contract.md

{
  "run_id": "<run_id>",
  "worker_id": "map-0001",
  "mode": "map",
  "granularity": "micro",
  "compute_tier": "standard",
  "goal": "… (include desired outputs + constraints)",
  "report_path": ".rlm/runs/<run_id>/artifacts/reports/<worker_id>.md",
  "hint_paths": [
    "<project paths…>",
    ".rlm/runs/<run_id>/",
    ".rlm/runs/<run_id>/artifacts/context/run_contract.md",
    "<context_paths…>"
  ]
}
PROMPT
```

If a worker requests more scope in their narrative report, treat that request as input for updating hint scope and re-running targeted workers.
Context handoff (optional): if a worker writes a short context note (recommended path: `.rlm/runs/<run_id>/artifacts/context/<worker_id>.md`), include it in downstream tasks via `context_paths` / `hint_paths` when helpful.

Note: treat `report_path` as the source of truth. The `-o .rlm/.../<worker_id>.last_message.md` file is best-effort and may be absent.
Parallelization guidance (recommended):

- Keep `depends_on` minimal; only encode true data dependencies.
- Split large tasks into independent subtasks that can run in parallel, then add a small number of join tasks for synthesis.
- Prefer passing compact context notes instead of broad file scopes to avoid serializing work.
- Use `standard` to map scope in parallel when appropriate; schedule `standard-plus`/`heavy` where needed (granularity is independent of compute_tier; e.g., `micro` can be `heavy`).
Progress & stalls (recommended for long tasks):
- Default behavior is to wait patiently; do not cancel long-running workers unless there is strong evidence of a stall.
- If you need visibility, ask workers to append a short "Progress" note to the report or write a small heartbeat file.
- Only intervene when there is no new output for an extended period (e.g., 20-30 minutes) and the task is blocking the run.
- If `report_path` is missing (no report produced), treat it as a stall signal and retry using a new `worker_id` + `report_path`:
  - Either reduce `granularity` (split into 2–6 micro/meso workers + optional join) or retry with a higher `compute_tier` (standard → standard-plus → heavy).
  - Granularity and compute_tier are orthogonal; prefer splitting for broad/ambiguous tasks, and prefer compute_tier upgrades when the task is already narrow but needs deeper reasoning.
- Design for worker self-termination (the orchestrator cannot reliably terminate workers from the outside): for long/ambiguous tasks, instruct workers to self-terminate (write a report + next plan) if completion is not feasible within budget.
### 3) REDUCE (canonicalize topics; provenance-first)

Reduce by reading worker narrative reports (`report_path`) and synthesizing:
- consolidated findings
- open questions/risks
- a short verify target list (what needs confirmation)
Do not fully trust worker reports. Apply your own judgment:
- Cross-check claims across multiple reports when possible.
- Treat missing evidence or vague assertions as lower confidence.
- Prefer conservative conclusions when reports conflict.
If the results are not satisfactory, respond with a clear escalation path:
- Identify what is missing or unclear.
- Propose a targeted follow-up worker task (or a small set) to resolve it.
- If necessary, widen hint scope, upgrade `compute_tier`/model, or reduce `granularity`.
### 4) VERIFY (parallel; top-K + contradictions + weak evidence)
Schedule Verify workers (mode "verify") in parallel for:
- Top-K narrative topics by impact (K default 5; cap at 5)
- Unresolved contradictions found in narrative synthesis
- High-importance statements needing confirmation
Construct each verify worker’s `hint_paths` broadly, and write results to `.rlm/runs/<run_id>/artifacts/reports/<worker_id>.md`. Prefer relying on the verify goal + `hint_paths` + run contract.
As in MAP, spawn verify workers via the mandatory scripts; the raw pattern below is a conceptual reference only:

```bash
codex -m <compute_tier-model> -c model_reasoning_effort="<compute_tier-effort>" exec \
  -C . \
  - <<'PROMPT' >/dev/null 2>&1
Use the map-worker skill.
$map-worker

Return minimal chat output; write your narrative report to:
.rlm/runs/<run_id>/artifacts/reports/<worker_id>.md

{
  "run_id": "<run_id>",
  "worker_id": "verify-0001",
  "mode": "verify",
  "goal": "…",
  "report_path": ".rlm/runs/<run_id>/artifacts/reports/<worker_id>.md",
  "hint_paths": ["<minimum needed…>", ".rlm/runs/<run_id>/"]
}
PROMPT
```
### 5) DECIDE (iterate or finish)

Decide whether to stop based on:
- high-priority open questions remaining
- contradiction status (unresolved pro/con)
- evidence thresholds (e.g., strong-evidence coverage on high-impact topics)
- budget exhaustion (iterations/workers)
If finishing, write `final.json` with:

- `goal_summary`
- `termination_decision` (`should_finish`, `reason`, `budget_used`)
- key narrative conclusions
- explicit `evidence_paths` (narrative report paths, plus any cited files)
- `deferred_opportunities` (optional but recommended when discovered)
Then archive + cleanup:

```bash
python "$HOME/.codex/skills/reduce-orchestrator/scripts/rlm_admin.py" archive-run \
  --root . \
  --run-id <run_id> \
  --goal-summary "<goal>" \
  --termination-reason "<reason>" \
  --ttl-days 14

python "$HOME/.codex/skills/reduce-orchestrator/scripts/rlm_admin.py" release-run-lock --root . --run-id <run_id>
```
Shell safety note (recommended): when writing Markdown via heredocs (run contracts, reports, `final.md`, etc.), use a quoted heredoc delimiter (e.g., `cat <<'EOF' > <file>.md`) to prevent shell command substitution/backtick expansion from corrupting the Markdown content.
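A small illustration of why the quoted delimiter matters (the target path is a placeholder):

```bash
# Quoted delimiter: $(...), `backticks`, and $VARS in the body stay literal.
cat <<'EOF' > .rlm/runs/<run_id>/artifacts/context/run_contract.md
Verification: run `pytest` from the repo root; $PYTHONPATH must stay unset.
EOF

# With an unquoted delimiter (<<EOF), the shell would command-substitute
# `pytest` and expand $PYTHONPATH, corrupting the Markdown before it is written.
```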