Khala Fleet Management
A "fleet" is the owner's linked coding accounts (Codex and/or Claude), each
running as an ISOLATED local worker reached through Khala -> Pylon ->
assignment, coordinated from one surface (Khala Code Desktop, the khala
CLI, or the khala_fleet MCP tools). This skill encodes the operating
procedure and the non-negotiable guardrails. It is a launcher, not the law:
the canonical runbooks below win whenever they disagree with this summary.
Canonical sources (read before non-trivial fleet work)
In the OpenAgentsInc/openagents repo (or https://openagents.com/AGENTS.md
when working outside the repo):
- `AGENTS.md` / `CLAUDE.md`: "Help a user connect their Codex fleet to Khala" and the "Khala -> Pylon -> Codex Coding Delegation Runbook" (the request/proof contract, SQL verification, failure signatures).
- `docs/ops/2026-06-27-khala-codex-own-capacity-burn-runbook.md`: running the engine 24/7 (standing pylon, supervisor, identity/token footguns, stall diagnosis).
- `docs/ops/2026-06-29-khala-codex-fleet-manager-runbook.md`: multi-account fan-out, refill loops, merge waves, token proof at scale.
- `docs/khala-code/2026-06-30-khala-code-fleet-management-spec.md`: the product capability map (Inbox, fleet board, worker cards, claims).
- `docs/fable/EXECUTION.md`: issue/PR/worktree/review discipline for work the fleet produces.
The fleet model in one minute
- Accounts are workers. Each connected account lives in an isolated home (`<pylon home>/accounts/codex/<ref>`, or `.claude-*` via `CLAUDE_CONFIG_DIR` for Claude accounts). Distinct provider accounts have distinct rate budgets: more accounts means real added concurrency.
- Pylon is the executor. A local Pylon advertises capacity via presence heartbeats (`capacity.coding.codex.available=N`, `busy`, `queued`, `ready`); the server dispatch gate admits at most the advertised free slots.
- Assignments are the work unit. A typed request (`codex_agent_task` / `claude_agent_task`) produces an assignment with a lease, lifecycle events, a closeout checklist, and exact token rows. Fleet runs are supervised loops that keep N assignments in flight and refill as slots free up.
- Claims prevent duplicate work. At most one live claim per work unit; claim before dispatch, release on closeout. Skips are typed (`already_claimed`, `pr_exists`, `merged`, `closed`, `needs_owner`).
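The claim rule can be modeled as a tiny registry. This is a minimal in-memory sketch, assuming a single process holds all claims; `ClaimRegistry` and its methods are hypothetical names, not the real Khala claim API. It shows the two invariants: at most one live claim per unit, and a second claimant gets a typed skip instead of silently duplicating work.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Skip(Enum):
    # Typed skip reasons listed in the fleet model.
    ALREADY_CLAIMED = "already_claimed"
    PR_EXISTS = "pr_exists"
    MERGED = "merged"
    CLOSED = "closed"
    NEEDS_OWNER = "needs_owner"


@dataclass
class ClaimRegistry:
    """Hypothetical in-memory model: at most one live claim per work unit."""

    live: dict[str, str] = field(default_factory=dict)  # work unit -> holder

    def claim(self, unit: str, holder: str) -> Optional[Skip]:
        # Claim before dispatch; a second claimant gets a typed skip.
        if unit in self.live:
            return Skip.ALREADY_CLAIMED
        self.live[unit] = holder
        return None

    def release(self, unit: str, holder: str) -> None:
        # Release on closeout; only the current holder can release.
        if self.live.get(unit) == holder:
            del self.live[unit]
```

A real registry would live server-side with atomic writes; the in-memory dict only illustrates the contract.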
Connect and inventory
```bash
npm install -g @openagentsinc/khala   # Node 20+ or Bun
khala fleet connect                   # paste-free device login; isolated home
khala fleet status                    # table: ref, readiness, email
```
- Each `khala fleet connect` auto-assigns the next ref (`codex`, `codex-2`, ...); `--account <ref>` names one explicitly; `--harness claude` connects a Claude account the same isolated way.
- Requires the provider CLI (`npm install -g @openai/codex`); the connect flow prints an install hint if it is missing.
- Inventory before routing work: every account you intend to use must show `ready`. From a Pylon checkout: `$PYLON codex accounts list --json`.
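The inventory rule ("every account you intend to use must show ready") is easy to check mechanically against the JSON listing. This sketch assumes the output is a JSON array of objects with `ref` and `readiness` keys; that schema is a guess, so adjust the key names to whatever `$PYLON codex accounts list --json` actually emits. `unready_accounts` is a hypothetical helper.

```python
from __future__ import annotations

import json


def unready_accounts(accounts_json: str, wanted: set[str]) -> list[str]:
    """Return the refs in `wanted` that are missing or not ready.

    ASSUMPTION: the listing is a JSON array of {"ref": ..., "readiness": ...}
    objects. The real schema may differ.
    """
    accounts = json.loads(accounts_json)
    readiness = {a["ref"]: a.get("readiness") for a in accounts}
    # A ref absent from the listing counts as not ready.
    return sorted(ref for ref in wanted if readiness.get(ref) != "ready")
```

An empty result means every intended account can take work; anything else should block routing until it is fixed.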
Dispatch ladder (smallest sufficient rung first)
- One bounded task: `codex_spawn` (MCP), or:

  ```bash
  $PYLON khala request \
    --prompt "<public-safe bounded objective>" \
    --workflow codex_agent_task \
    --pylon-ref "<owner pylon ref>" \
    --repo <org>/<repo> --branch main --commit "<pinned sha>" \
    --verify "<pinned verification command>" \
    --json
  ```

  Use `--fixture` for a no-spend proof run before real work. If the response has no delegation frame (it fell through to normal model routing), STOP and fix preconditions; do not run spendful work.
- A parallel wave: publish capacity first (`OPENAGENTS_PYLON_CODEX_CONCURRENCY=N ... $PYLON presence heartbeat`), then send one request per work unit, each with its own claim, pinned refs, and verify command. Never exceed advertised availability.
- A sustained fleet run: `fleet_run_start` (MCP or the Fleet panel) with objective, work source, target concurrency, and `workerKind` (`codex` | `claude` | `auto`). Monitor with `fleet_run_status`; steer with `fleet_run_control` (`pause` | `resume` | `drain` | `stop`). One supervisor per Pylon; refill takes the next unclaimed unit as slots free up.
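The refill behavior of a sustained run can be sketched as a loop that tops up to the target concurrency whenever a slot frees. This is an illustration only: `run_fleet`, `dispatch`, and `wait_for_completion` are hypothetical stand-ins for the real `fleet_run_*` machinery, and `target_concurrency` is assumed never to exceed the advertised availability.

```python
from __future__ import annotations

from collections import deque
from typing import Callable, Iterable


def run_fleet(
    units: Iterable[str],
    target_concurrency: int,
    dispatch: Callable[[str], None],
    wait_for_completion: Callable[[set[str]], str],
) -> list[str]:
    """Hypothetical supervisor: keep up to target_concurrency units in flight.

    `dispatch` starts an assignment for a unit; `wait_for_completion` blocks
    until one in-flight unit closes out and returns that unit.
    """
    pending = deque(units)
    in_flight: set[str] = set()
    finished: list[str] = []
    while pending or in_flight:
        # Refill: take the next unclaimed unit whenever a slot is free.
        while pending and len(in_flight) < target_concurrency:
            unit = pending.popleft()
            dispatch(unit)
            in_flight.add(unit)
        done = wait_for_completion(in_flight)
        in_flight.remove(done)
        finished.append(done)
    return finished
```

Keeping the cap inside the supervisor, rather than relying on the server to refuse, matches the "one supervisor per Pylon" rule: the gate stays the admission authority, and the loop simply never asks for more than it was given.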
Preconditions for every rung: `pylon_ensure` (or `$PYLON provider go-online` / `presence heartbeat`) succeeded, the heartbeat is fresh, capacity refs are published, and `codex_fleet_status` shows ready accounts with free slots. The June-2026 "0/1 available" dead-end class is almost always capacity that was never advertised: heartbeat first, then dispatch.
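The preconditions can be expressed as one pre-dispatch check. In this sketch `PylonState`, its field names, and the 120-second freshness window are all hypothetical placeholders, not the real presence schema. It also assumes `available` is the advertised slot count and subtracts live assignment rows, mirroring the "available=N vs active assignment rows" comparison in the diagnosis list.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Optional

MAX_HEARTBEAT_AGE_S = 120.0  # ASSUMPTION: placeholder freshness window


@dataclass
class PylonState:
    # Hypothetical snapshot; field names are illustrative only.
    last_heartbeat_at: float   # unix seconds of the last presence heartbeat
    advertised_available: int  # capacity.coding.<kind>.available
    active_assignments: int    # live assignment rows on this Pylon
    ready_accounts: int        # accounts reported ready by codex_fleet_status


def precondition_failures(s: PylonState, now: Optional[float] = None) -> list[str]:
    now = time.time() if now is None else now
    failures = []
    if now - s.last_heartbeat_at > MAX_HEARTBEAT_AGE_S:
        failures.append("heartbeat stale: re-run presence heartbeat")
    if s.advertised_available <= 0:
        failures.append("no capacity advertised: heartbeat first, then dispatch")
    if s.ready_accounts == 0:
        failures.append("no ready accounts")
    return failures


def free_slots(s: PylonState) -> int:
    # ASSUMPTION: free = advertised slots minus live assignment rows.
    return max(0, s.advertised_available - s.active_assignments)


def admissible(s: PylonState, requested: int, now: Optional[float] = None) -> int:
    # Dispatch nothing while any precondition fails; never exceed free slots.
    if precondition_failures(s, now):
        return 0
    return min(requested, free_slots(s))
```

Returning a list of reasons rather than a boolean keeps the "fix preconditions, then dispatch" step actionable.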
Work-unit hygiene
- One work unit = one claim = one issue = one PR. Search open AND recently-closed issues/PRs before claiming; when duplicate issues race, the lower issue number wins.
- Every real-work dispatch pins repo, commit, branch, and verify command, and cites the issue + claim in the worker prompt.
- Worker prompts are public-safe and bounded: public issue numbers, public paths, public verification commands. Never include raw transcripts, secrets, local paths, provider payloads, or wallet material.
- Branch work is in-progress evidence, not completion. An issue closes only after the PR merges to the owning repo's default branch and required verification ran from the integrated state.
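The pre-claim search can be sketched as a function that turns search results into either an issue to claim or a typed skip. `Candidate`, `pick_issue`, and the precedence order (merged, then open PR, then closed) are assumptions made for illustration; only the skip names and the lower-number rule come from the text above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    # Hypothetical view of one issue found by the open + recently-closed search.
    number: int
    state: str               # "open" or "closed"
    pr_state: Optional[str]  # None, "open", or "merged"


def pick_issue(candidates: list[Candidate]) -> tuple[Optional[int], Optional[str]]:
    """Return (issue_number, None) to claim, or (None, skip_reason)."""
    if not candidates:
        raise ValueError("no issue tracks this work unit")
    # ASSUMPTION: a merged PR outranks an open PR, which outranks a closed issue.
    if any(c.pr_state == "merged" for c in candidates):
        return None, "merged"
    if any(c.pr_state == "open" for c in candidates):
        return None, "pr_exists"
    open_issues = [c.number for c in candidates if c.state == "open"]
    if not open_issues:
        return None, "closed"
    # Duplicate issues for the same unit: the lower issue number wins.
    return min(open_issues), None
```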
Verification (what "done" means)
- Closeout checklist first: `$PYLON khala closeout "<assignmentRef>" --json` must report `closeoutChecklist.ok: true`, meaning trace/proof projections agree, exact own-capacity token rows exist, and no-spend runs prove `paymentMode: "no-spend"` and `payoutClaimAllowed: false`.
- Exact token rows are the accounting truth: one `token_usage_events` row per completed SDK turn (`usage_truth = 'exact'`, `demand_source = 'khala_coding_delegation'`, `task_ref = assignmentRef`). Public counters are projections of those rows.
- Counter movement alone is NEVER completion evidence. Reconcile the public counter delta against the exact rows; other agents may be running.
- A failed or missing token-ingest row is not acceptable proof; rerun or debug until the exact row exists. Interrupted local runs submit a typed stale closeout before claiming new work.
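The two verification gates above can be checked together. This sketch assumes `paymentMode` and `payoutClaimAllowed` sit at the top level of the closeout JSON (the real report may nest them), and it uses an in-memory SQLite table only to demonstrate the exact-row filter; the production store and its full schema are not described here. Function names are hypothetical.

```python
from __future__ import annotations

import sqlite3


def closeout_problems(report: dict, no_spend: bool) -> list[str]:
    """List reasons a closeout report is not acceptable proof."""
    problems = []
    if report.get("closeoutChecklist", {}).get("ok") is not True:
        problems.append("closeoutChecklist.ok is not true")
    if no_spend:
        # ASSUMPTION: these keys are top-level in the closeout JSON.
        if report.get("paymentMode") != "no-spend":
            problems.append('paymentMode is not "no-spend"')
        if report.get("payoutClaimAllowed") is not False:
            problems.append("payoutClaimAllowed is not false")
    return problems


# The filter from the text: exact rows attributed to this assignment.
EXACT_ROWS_SQL = """
SELECT COUNT(*) FROM token_usage_events
WHERE usage_truth = 'exact'
  AND demand_source = 'khala_coding_delegation'
  AND task_ref = ?
"""


def exact_row_count(conn: sqlite3.Connection, assignment_ref: str) -> int:
    # Expect one row per completed SDK turn; zero rows is never proof.
    return conn.execute(EXACT_ROWS_SQL, (assignment_ref,)).fetchone()[0]
```

Comparing `exact_row_count` with the number of completed turns, rather than watching a public counter move, is what keeps other agents' traffic from being mistaken for this assignment's proof.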
Diagnosis: common failure signatures
- `target_pylon_not_authorized`: the token does not own/link that Pylon.
- `target_pylon_unavailable`: Pylon not active, heartbeat stale, not capable, or no free advertised capacity. Re-run the heartbeat; compare `capacity.coding.<kind>.available=N` against active assignment rows.
- Provider error about unexpected `openagents` inputs: delegation did not happen; the request fell through to normal routing. Re-check `--workflow`, target freshness, and caller ownership. Stop before spend.
- Second dispatch refused below advertised capacity: inspect assignment rows for non-expired stale leases; check whether their local process is alive before creating more requests.
- Long silent run: inspect lifecycle events / raw event chunks before assuming progress; do not stack more dispatches on a wedged worker.
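The stale-lease check ("non-expired lease, but is the local process alive?") can be sketched like this. `AssignmentRow` and its fields are hypothetical; the real assignment rows may not record a local PID at all. The liveness probe uses the POSIX convention that `os.kill(pid, 0)` checks existence without delivering a signal, and it refuses to run elsewhere because on Windows that call terminates the process.

```python
from __future__ import annotations

import os
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AssignmentRow:
    # Hypothetical projection of an assignment row; not the real schema.
    ref: str
    lease_expires_at: float   # unix seconds
    local_pid: Optional[int]  # worker process on this machine, if known


def pid_alive(pid: int) -> bool:
    if os.name != "posix":
        raise NotImplementedError("signal-0 probe is POSIX-only")
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def stale_leases(
    rows: list[AssignmentRow],
    now: Optional[float] = None,
    is_alive: Callable[[int], bool] = pid_alive,
) -> list[str]:
    """Refs whose lease is unexpired but whose local worker cannot be found."""
    now = time.time() if now is None else now
    return [
        r.ref
        for r in rows
        if r.lease_expires_at > now
        # No recorded PID means liveness cannot be confirmed: flag for review.
        and (r.local_pid is None or not is_alive(r.local_pid))
    ]
```

Anything this returns should be closed out as stale before new requests are created, not papered over with extra dispatches.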
Hard guardrails (never violate, even under time pressure)
- NEVER run `codex login` (or any auth flow) against the default `~/.codex` home, and never touch the owner's live `~/.claude`. Login flows clear live credentials at flow start. Worker auth always uses isolated per-account homes; to inspect accounts, list them, and never re-login to "check".
- Exact-only token accounting: no synthesized counters, no progress-frame counting; rates are reported as `pending`/`not_measured` rather than fabricated.
- MCP delegation keeps its approval prompt; a sustained run gets one approval at run start, never silent standing authority.
- Respect advertised capacity; one fan-out supervisor per Pylon; the dispatch gate is the admission authority.
- Fixture/no-spend tiers never spend or claim real work; live smokes are env-armed and skip-safe by default.
- Never weaken a gate, test, or policy to make dispatch or closeout pass.
- Public-safe projections everywhere; raw worker events stay in owner-scoped private storage.
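The first guardrail can be enforced in tooling rather than left to memory. A minimal sketch: refuse any path that is, or sits inside, the owner's live `~/.codex` or `~/.claude`, and only then build a worker environment. It assumes the Codex CLI honors the `CODEX_HOME` environment variable for its home directory; `assert_isolated_home` and `worker_env` are hypothetical helpers, and sibling homes such as `~/.claude-2` are deliberately allowed.

```python
from __future__ import annotations

from pathlib import Path
from typing import Optional

LIVE_HOMES = (".codex", ".claude")  # the owner's live credential homes


def assert_isolated_home(candidate: str | Path, user_home: Optional[str | Path] = None) -> Path:
    """Raise if `candidate` is, or is inside, a live ~/.codex or ~/.claude."""
    home = Path(user_home if user_home is not None else Path.home()).resolve()
    path = Path(candidate).expanduser().resolve()
    for name in LIVE_HOMES:
        live = home / name
        if path == live or live in path.parents:
            raise PermissionError(f"refusing to use live home {live}")
    return path


def worker_env(pylon_home: str | Path, ref: str) -> dict[str, str]:
    # Per-account home under the Pylon home, checked before any auth flow.
    isolated = assert_isolated_home(Path(pylon_home) / "accounts" / "codex" / ref)
    # ASSUMPTION: the Codex CLI reads its home directory from CODEX_HOME.
    return {"CODEX_HOME": str(isolated)}
```

Running the check before every login or worker launch turns "never touch the live home" into a hard failure instead of a convention.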