Agent Usage Optimizer
Agent Routing via GitHub Labels (Preferred Method)
Deterministic agent routing using agent: labels on GitHub issues — no separate queue file needed:
# Route tasks to agents via labels
gh issue edit <issue-number> --add-label "agent:gemini"
gh issue edit <issue-number> --add-label "agent:Codex"
gh issue edit <issue-number> --add-label "agent:codex"
View agent queues:
gh issue list --label "agent:gemini,priority:high"
gh issue list --label "agent:Codex,priority:high"
gh issue list --label "agent:codex,priority:high"
Reassign tasks:
gh issue edit <issue-number> --remove-label "agent:gemini" --add-label "agent:claude"
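To find issues that still need routing, filter out anything already carrying an agent:* label. A minimal sketch, assuming an authenticated gh CLI and jq installed:
# List open issues that have no agent:* label yet (routing candidates)
gh issue list --state open --limit 200 --json number,title,labels \
  | jq -r '.[] | select([.labels[].name | startswith("agent:")] | any | not) | "#\(.number)\t\(.title)"'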
Gemini Batched Session Pattern (Maximize $20/mo Quota)
Group 5-6 related research/planning tasks into ONE Gemini session. Each task produces a file + commit.
Working Methods
Option A — OpenRouter (recommended for non-interactive/overnight):
hermes chat --provider openrouter --model google/gemini-2.5-pro --quiet -q "
You are the ACE Engineer advance scout. Working directory: /mnt/local-analysis/workspace-hub.
<task description>
"
This works reliably for one-shot/overnight execution. Costs OpenRouter credits but avoids 403 errors.
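One way to honor the one-session-per-batch rule with this path is to fold several self-contained task prompts into a single -q call. A sketch, where the tasks/overnight/*.md files are hypothetical:
# Fold 5-6 task files into ONE Gemini session (file layout is illustrative)
batch="$(cat tasks/overnight/*.md)"
hermes chat --provider openrouter --model google/gemini-2.5-pro --quiet -q "
You are the ACE Engineer advance scout. Working directory: /mnt/local-analysis/workspace-hub.
Complete each task below in order; each task must produce a file + commit.
$batch
"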
Option B — Interactive session (Copilot provider):
hermes chat --provider copilot --model gemini-2.5-pro -q "task"
Works only in interactive mode; add the --yolo flag for unattended runs.
BROKEN: Do NOT Use
- h-router-gemini -q — alias does not work for one-shot
- hermes chat --provider copilot --model gemini-2.5-pro --quiet -q — returns HTTP 403
- hermes chat --provider copilot --model gemini-2.5-pro -q (interactive) — returns HTTP 403
- Copilot/GitHub's Gemini API blocks non-interactive CLI calls entirely
Verified Working Gemini Providers
| Provider | Model | Interactive | One-shot (-q) | Notes |
|----------|-------|-------------|---------------|-------|
| openrouter | google/gemini-2.5-pro | Yes | Yes | Recommended for batches |
| copilot | gemini-2.5-pro | Yes (with --yolo) | No (403) | Only for interactive sessions |
Overnight Gemini Pattern
For overnight batches, use openrouter provider or delegate to subagents (which run on current model):
# Per-task Gemini execution:
hermes chat --provider openrouter --model google/gemini-2.5-pro --quiet -q "<self-contained-prompt>"
# Or use subagent (runs on current model, NOT Gemini):
# delegate_task(goal="research task", toolsets=["terminal", "file"])
Key parameters:
- --quiet — suppresses banners for programmatic use
- --provider openrouter --model google/gemini-2.5-pro — working Gemini path
- One session per batch, ~2 min per session
- Gemini handles web_search, file reads, file writes, git commits natively
Claude/Codex Implementation Pattern
For heavy coding tasks, use:
# Complex implementation (Claude Opus)
hermes chat --provider anthropic -m claude-opus-4-6 -q "<task>"
# Bounded tests + review (Codex via OpenAI)
hermes chat --provider openai-codex -q "<task>"
When to Use
- Before starting a work session with 3+ queued WRK items
- When Claude quota is approaching a constraint (< 50% remaining)
- When routing a task and unsure which provider fits best
- After /session-start to set provider allocation for the session
Telemetry Sufficiency Check (do this before claiming "optimize to 100% weekly")
Do not assume session logs alone are enough to optimize quota burn. First verify all three layers (a freshness-check sketch follows this list):

- Live quota snapshot
  - Run: bash scripts/ai/assessment/query-quota.sh --refresh --json
  - Inspect config/ai-tools/agent-quota-latest.json
  - Treat these states as insufficient for hard utilization targets:
    - Claude source: unavailable
    - Gemini source: estimated
    - missing/null week_pct, pct_remaining, hours_to_reset
- Historical quota ledger freshness
  - Check ~/.agent-usage/weekly-log.jsonl
  - If the file exists but has not been updated recently, you do NOT have enough telemetry for weekly pacing even if session logs are rich.
  - A stale quota ledger means you can still do routing guidance, but not reliable week-to-target burn-down.
- Session coverage freshness
  - Export native sessions before analysis:
    bash scripts/cron/hermes-session-export.sh
    bash scripts/cron/codex-session-export.sh
    bash scripts/cron/gemini-session-export.sh
    bash scripts/cron/provider-session-ecosystem-audit.sh
  - Then use analysis/provider-session-ecosystem-audit.json and docs/reports/provider-session-ecosystem-audit.md for actual usage patterns.
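The freshness-check sketch for the three layers above; the 24-hour staleness threshold is illustrative, not a repo standard:
# Flag missing or stale telemetry inputs before trusting weekly pacing
for f in config/ai-tools/agent-quota-latest.json \
         "$HOME/.agent-usage/weekly-log.jsonl" \
         analysis/provider-session-ecosystem-audit.json; do
  if [ -z "$(find "$f" -mmin -1440 2>/dev/null)" ]; then
    echo "telemetry gap: $f is missing or older than 24h"
  fi
done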
Practical Interpretation Rules
- If quota telemetry is weak but session telemetry is strong:
- You have enough data to strengthen the repo ecosystem now.
- You do NOT have enough data to guarantee near-100% weekly credit utilization.
- If Codex shows real quota data and low migration debt, push more bounded implementation/test/refactor work to Codex.
- If Gemini recent session volume is tiny and quota is only estimated, treat Gemini as underused research capacity and batch reconnaissance/risk-analysis work there.
- If Claude quota is unavailable, avoid promising precise Claude weekly pacing; use Claude primarily for high-value long-context planning/review until telemetry is fixed.
Scheduling Gap Check
Before trusting the optimizer, verify quota logging is actually scheduled. In practice, it is easy to have:
- scripts/ai/assessment/query-quota.sh
- config/ai-tools/agent-quota-latest.json
- ~/.agent-usage/weekly-log.jsonl
but no scheduled task keeping them fresh.
Check config/scheduled-tasks/schedule-tasks.yaml for explicit quota-refresh / usage-log jobs. If missing, record that as a telemetry gap and do not overstate optimization confidence.
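A quick way to detect that gap; the grep patterns assume the job entries mention the script or task names used elsewhere in this doc:
# Silence from this grep means the quota jobs are not scheduled
grep -nE 'query-quota|provider-utilization-refresh' \
  config/scheduled-tasks/schedule-tasks.yaml \
  || echo "telemetry gap: no scheduled quota-refresh / usage-log job found"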
Operationalized control-plane artifacts
A reusable provider-utilization control plane now exists in workspace-hub. Prefer these generated artifacts over ad hoc interpretation when deciding where to route work:
- config/ai-tools/provider-utilization-weekly.json
- docs/reports/provider-utilization-weekly.md
- config/ai-tools/provider-routing-scorecard.json
- docs/reports/provider-routing-scorecard.md
- config/ai-tools/provider-work-queue.json
- docs/reports/provider-work-queue.md
- config/ai-tools/provider-autolabel-candidates.json
- docs/reports/provider-autolabel-candidates.md
- handoff/reference snapshot: docs/reports/provider-routing-system-handoff-YYYY-MM-DD.md
Supporting scripts:
- scripts/ai/credit-utilization-tracker.py
- scripts/ai/provider-routing-scorecard.py
- scripts/ai/provider-work-queue.py
- scripts/ai/provider-autolabel.py
- wrapper: scripts/cron/provider-utilization-refresh.sh
The scheduled task is:
provider-utilization-refresh in config/scheduled-tasks/schedule-tasks.yaml
Recommended operational loop
Use this order (a query sketch follows the list):
- Refresh telemetry and derived routing artifacts:
bash scripts/cron/provider-utilization-refresh.sh
- Read the routing scorecard to decide provider order.
- Read the provider work queue to see issue candidates by provider.
- Review the autolabel candidate report.
- Only then consider applying labels.
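The query sketch for the scorecard and work-queue reads above. The .ranked_providers and .queues keys are assumptions about the artifact schemas, not confirmed fields; check the generated JSON first:
# Provider order from the scorecard (schema assumed)
jq -r '.ranked_providers[]?' config/ai-tools/provider-routing-scorecard.json
# Candidate counts per provider from the work queue (schema assumed)
jq -r '.queues | to_entries[] | "\(.key): \(.value | length) candidates"' config/ai-tools/provider-work-queue.json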
Conservative auto-labeling rule
Auto-labeling should remain conservative. The current reusable pattern is:
- only consider issues with no existing agent:* label
- only label high-confidence candidates
- prefer execution-ready issues (status:plan-approved) first
- require strong provider-specific routing reasons, not just generic keyword matches
- apply only a small bounded batch per run
Current command pattern:
# Dry run
uv run --no-project python scripts/ai/provider-autolabel.py
# Conservative live apply
uv run --no-project python scripts/ai/provider-autolabel.py --apply --limit 3
Confidence threshold lessons from live use (a filter sketch follows the list):
- >= 0.90 is reasonable for safe automatic labeling
- around 0.60 is still useful for reporting, but not for automatic label application
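The filter sketch for the 0.90 gate. The .candidates[] shape with .issue, .provider, and .confidence fields is an assumption about the artifact layout:
# Surface only candidates safe for automatic labeling (>= 0.90)
jq -r '.candidates[]? | select(.confidence >= 0.90) | "#\(.issue)\t\(.provider)\t\(.confidence)"' \
  config/ai-tools/provider-autolabel-candidates.json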
Provider-specific practical guidance from live telemetry
- Codex: best lane for bounded implementation, tests, repair, cleanup, refactors. If underused and quota is visible, push more execution here first.
- Gemini: best lane for batched research/recon/risk-analysis packets. Do not auto-label aggressively until Gemini confidence logic is stronger than simple keyword matching.
- Claude: keep for adversarial review, planning, long-context synthesis, architecture, and governance-heavy work. Avoid burning Claude on mechanical loops Codex can absorb.
Follow-on improvement areas
If the control plane is working but still imperfect, the next high-value upgrades are:
- add explanatory GitHub comments when high-confidence auto-labels are applied
- strengthen Gemini-specific routing confidence using research-readiness signals, not just broad research keywords
- improve Claude/Gemini quota observability so utilization can be exact rather than heuristic
Sub-Skills
When the goal is not just analysis but active weekly credit utilization, use this artifact chain:
- Refresh quota + utilization artifacts
bash scripts/cron/provider-utilization-refresh.sh
This should regenerate:
  - config/ai-tools/provider-utilization-weekly.json
  - docs/reports/provider-utilization-weekly.md
  - config/ai-tools/provider-routing-scorecard.json
  - docs/reports/provider-routing-scorecard.md
  - config/ai-tools/provider-work-queue.json
  - docs/reports/provider-work-queue.md
  - config/ai-tools/provider-autolabel-candidates.json
  - docs/reports/provider-autolabel-candidates.md
- Read the routing scorecard for provider-level guidance
  - provider-routing-scorecard.json combines current-week utilization with provider session audit hygiene
  - it should answer:
    - who is underused now
    - what work each provider should receive next
    - which providers have hygiene debt that would waste credits
- Read the provider work queue for live issue routing
  - provider-work-queue.json combines the scorecard with live gh issue list data
  - prefer status:plan-approved issues first (see the query sketch after this list)
  - respect existing agent:* labels as authoritative when present
  - treat the generated per-provider issue lists as the primary dispatch surface
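The query sketch referenced above, pulling execution-ready items for one provider out of the queue artifact. The .queues.codex[] shape with .number, .title, and .labels fields is assumed, not confirmed:
# Execution-ready Codex items first (schema assumed)
jq -r '.queues.codex[]? | select(.labels | index("status:plan-approved")) | "#\(.number) \(.title)"' \
  config/ai-tools/provider-work-queue.json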
Confidence-weighted auto-labeling rule
Auto-labeling GitHub issues is useful, but only if conservative.
Use this pattern:
- generate candidates in dry-run mode first:
uv run --no-project python scripts/ai/provider-autolabel.py
- only apply labels for issues with strong confidence:
uv run --no-project python scripts/ai/provider-autolabel.py --apply --limit 3
Recommended guardrails:
- never touch issues that already have an agent:* label
- require high confidence (>= 0.90 worked well in practice)
- prefer issues that are already status:plan-approved
- require a strong routing reason, not a weak heuristic match
- cap live application to a small number per run (--limit 3) until confidence is proven over multiple cycles
High-confidence pattern observed in practice:
- execution-ready issue
- strong language match
  - Codex: implementation/test/fix
  - Claude: strategy/workflow/architecture
  - Gemini: research/triage/audit
- provider currently underused according to the scorecard
- no pre-existing agent label
Do NOT auto-label broad or ambiguous items just because the provider is underused.
Practical dispatch rules from the scorecard
For current workspace-hub-style ecosystems, these rules proved reusable:
- Refresh AI provider usage/capacity about every 6 hours and use that telemetry to plan the next work wave; avoid hard-and-fast provider exclusion rules when reliable capacity exists.
- Codex: push bounded implementation, tests, refactors, and crisp execution-ready issues first when quota is visible.
- Claude: use for frontier execution, adversarial review, governance, orchestration, long-context strategy, and control-plane synthesis.
- Gemini: because the account is lower-budget, default it to cross-reviews, adversarial reviews, batched research/recon, and risk-analysis packets rather than main work; if fresh telemetry shows reliable capacity, delegate suitable bounded review/recon work instead of rigidly excluding it. Do not rely on Gemini telemetry as exact weekly headroom if the quota source is only estimated.
Sub-Skills
When the repo already has provider session exports and a provider audit, do not stop at a narrative recommendation. Build a 3-layer control loop:
- Utilization layer
  - Generate weekly utilization artifacts from:
    - config/ai-tools/agent-quota-latest.json
    - ~/.agent-usage/weekly-log.jsonl
    - logs/orchestrator/*/session_*.jsonl
  - Canonical outputs:
    - config/ai-tools/provider-utilization-weekly.json
    - docs/reports/provider-utilization-weekly.md
  - Prefer real quota-based utilization when available (week_messages/weekly_limit, week_pct)
  - Fall back to activity_vs_recent_peak when quota telemetry is weak
- Routing-scorecard layer
  - Combine utilization outputs with analysis/provider-session-ecosystem-audit.json
  - Canonical outputs:
    - config/ai-tools/provider-routing-scorecard.json
    - docs/reports/provider-routing-scorecard.md
  - Include per provider:
    - current reported utilization
    - quota basis / source
    - missing repo reads
    - python3-per-1k density
    - migration-debt density
    - preferred work types
    - avoid-work types
    - recommended actions
  - Use this to produce a ranked provider order (for example: gemini, codex, claude)
- Live issue-queue layer
  - Read open GitHub issues with gh issue list --state open --limit 200 --json ... (a concrete field set is sketched after this list)
  - Combine live issues with the routing scorecard
  - Canonical outputs:
    - config/ai-tools/provider-work-queue.json
    - docs/reports/provider-work-queue.md
  - Group issues by recommended provider
  - Respect existing agent:* labels first
  - Only use heuristics when no explicit agent label exists
  - Sort execution-ready items first (status:plan-approved or explicit agent ownership)
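The concrete field set for the gh call above is elided in this doc; one plausible combination that supports grouping and readiness sorting (treat it as an assumption):
gh issue list --state open --limit 200 --json number,title,labels,assignees,updatedAt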
Recommended heuristics for provider-specific work routing
- Claude
  - best for: adversarial plan review, adversarial implementation review, long-context synthesis, repo strategy, architecture, workflow/governance-heavy work
  - avoid: bounded test-fix loops, mechanical refactors, commodity grep/read sweeps
  - if stale-path drift / migration debt is high, reduce wasted reads before increasing Claude load
- Codex
  - best for: bounded implementation, test writing/repair, mechanical cleanup/refactors, crisp issue execution
  - if quota telemetry is real and utilization is low, this should become the default overflow execution lane
- Gemini
  - best for: batched research/recon, risk enumeration, competitor/standards scans, issue expansion and scouting
  - if telemetry is only estimated, treat utilization as directional, but still use Gemini as the underused research lane
  - batch 5-6 related recon tasks into a single Gemini session when possible
Safe mutation rule for GitHub labels
Do NOT mass-apply agent: labels just because the scorecard exists.
Preferred sequence:
- generate utilization artifacts
- generate routing scorecard
- generate provider work queue
- manually inspect the top routed issues per provider
- only then apply agent: labels to the clearest cases
Reason:
- routing heuristics are useful earlier than they are trustworthy for broad backlog mutation
- existing explicit labels should always override heuristic routing
- reporting/queueing is low risk; mass relabeling is not
Refresh pipeline pattern
A good recurring wrapper should:
- run bash scripts/ai/assessment/query-quota.sh --refresh --log
- run the routing scorecard generator
- run the provider work queue generator
- verify all expected JSON/Markdown outputs exist
- log to logs/quality/provider-utilization-refresh-YYYYMMDD.log
A practical schedule is every 4 hours; a wrapper sketch follows.
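A sketch matching the checklist above. The script paths and tracker flags are the ones this doc already names; everything else is an assumption to verify before scheduling:
#!/usr/bin/env bash
# Recurring provider-utilization refresh (sketch; verify flags before use)
set -euo pipefail
log="logs/quality/provider-utilization-refresh-$(date +%Y%m%d).log"
{
  bash scripts/ai/assessment/query-quota.sh --refresh --log
  uv run --no-project python scripts/ai/credit-utilization-tracker.py \
    --weeks 8 \
    --output-json config/ai-tools/provider-utilization-weekly.json \
    --output-md docs/reports/provider-utilization-weekly.md
  uv run --no-project python scripts/ai/provider-routing-scorecard.py
  uv run --no-project python scripts/ai/provider-work-queue.py
  # Verify all expected outputs exist and are non-empty
  for f in config/ai-tools/provider-utilization-weekly.json \
           docs/reports/provider-utilization-weekly.md \
           config/ai-tools/provider-routing-scorecard.json \
           config/ai-tools/provider-work-queue.json; do
    [ -s "$f" ] || { echo "missing artifact: $f"; exit 1; }
  done
} >>"$log" 2>&1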
Implementation gotcha learned in practice
When aggregating provider activity from exported session_*.jsonl logs, older exports may not include reliable runtime session_id values. If you fall back to per-record keys like tool + ts, you will massively overcount sessions.
Safer fallback:
- if session_id exists, use it
- otherwise, fall back to the session_YYYYMMDD file identity rather than the individual record identity
This keeps session counts directionally sane even when older exported logs are coarse.
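A counting sketch using that fallback, assuming exported records are one JSON object per line under the logs/orchestrator/ layout named elsewhere in this doc:
# Unique session count: session_id when present, else file identity
for f in logs/orchestrator/*/session_*.jsonl; do
  jq -r --arg file "$(basename "$f")" '.session_id // $file' "$f"
done | sort -u | wc -l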
Sub-Skills
When the repo already contains session exports plus provider audit artifacts, the most reusable pattern is:
- Refresh quota snapshots and append the weekly quota ledger:
bash scripts/ai/assessment/query-quota.sh --refresh --log
- Refresh exported provider-session artifacts first when needed:
bash scripts/cron/hermes-session-export.sh
bash scripts/cron/codex-session-export.sh
bash scripts/cron/gemini-session-export.sh
bash scripts/cron/provider-session-ecosystem-audit.sh
- Build weekly utilization artifacts:
uv run --no-project python scripts/ai/credit-utilization-tracker.py \
--weeks 8 \
--output-json config/ai-tools/provider-utilization-weekly.json \
--output-md docs/reports/provider-utilization-weekly.md
- Build routing guidance from utilization + audit hygiene:
uv run --no-project python scripts/ai/provider-routing-scorecard.py
Canonical outputs:
- config/ai-tools/provider-utilization-weekly.json
- docs/reports/provider-utilization-weekly.md
- config/ai-tools/provider-routing-scorecard.json
- docs/reports/provider-routing-scorecard.md
Interpretation Rules for the New Scorecard
Use the routing scorecard to decide where the next work packets go:
- codex underused + quota visible + low migration debt -> route bounded implementation, tests, cleanup, crisp issue execution there first
- gemini underused + weak/estimated telemetry -> route batched research/recon/risk-analysis packets there, but treat capacity as directional rather than exact
- claude underused + high stale-read debt -> reserve for adversarial review, plan review, and long-context synthesis; reduce stale-path drift before trying to scale load there
Recommended practical ordering in workspace-hub is not purely "lowest utilization first". Combine:
- underutilization
- telemetry confidence
- migration debt / stale-read density
- work-type fit
That is why Gemini and Codex may both rank ahead of Claude even when Claude appears idle.
Recurring Automation Pattern
In workspace-hub this is now best run via:
- wrapper: scripts/cron/provider-utilization-refresh.sh
- schedule task id: provider-utilization-refresh
- cron log: logs/quality/provider-utilization-refresh-*.log
The wrapper should always verify that all four artifacts exist after generation, not just the quota snapshot and utilization report.
Tracker Implementation Gotchas
Lessons learned while operationalizing this:
- prefer quota-based utilization only when the basis is real weekly quota (week_pct or week_messages/weekly_limit); a basis-selection sketch follows this list
- do not treat pct_remaining from an unavailable source as trustworthy weekly utilization
- Gemini today_messages/daily_limit from an estimated source is useful only as a weak hint; keep the activity fallback active
- when exported orchestrator logs lack runtime session_id, do NOT derive session counts from per-record timestamps/tool names or you will massively overcount sessions; fall back to file identity instead
- for routing, activity alone is not enough; combine utilization with audit hygiene (missing_repo_reads, migration-debt hints, python3 density)
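The basis-selection sketch for the first gotcha. The .providers[] layout and .name field are assumptions about the snapshot schema; the source values (unavailable, estimated) and the week_pct field come from this doc:
# Decide per provider whether quota-based utilization is trustworthy
jq -r '.providers[]? | if (.week_pct != null) and (.source != "unavailable") and (.source != "estimated")
  then "\(.name // "?"): quota basis, week_pct=\(.week_pct)"
  else "\(.name // "?"): activity fallback (source=\(.source // "missing"))"
  end' config/ai-tools/agent-quota-latest.json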