context-surfing
>
intent-framed-agent
Frames coding-agent work sessions with explicit intent capture and drift monitoring. Use when a session transitions from planning/Q&A to implementation for coding tasks, refactors, feature builds, bug fixes, or other multi-step execution where scope drift is a risk.
mcp-builder
Guide for creating high-quality MCP (Model Context Protocol) servers that enable LLMs to interact with external services through well-designed tools. Use when building MCP servers to integrate external APIs or services, whether in Python (FastMCP) or Node/TypeScript (MCP SDK).
plan-interview
|
self-improvement
Captures learnings, errors, and corrections to enable continuous improvement. Use when: (1) A command or operation fails unexpectedly, (2) User corrects Codex ('No, that's wrong...', 'Actually...'), (3) User requests a capability that doesn't exist, (4) An external API or tool fails, (5) Codex realizes its knowledge is outdated or incorrect, (6) A better approach is discovered for a recurring task. Also review learnings before major tasks.
simplify-and-harden
Post-completion self-review for coding agents that runs simplify, harden, and micro-documentation passes on non-trivial code changes. Use when: a coding task is complete in a general agent session and you want a bounded quality and security sweep before signaling done. For CI pipeline execution, use simplify-and-harden-ci.
skill-creator
Guide for creating effective skills. This skill should be used when users want to create a new skill (or update an existing skill) that extends Codex's capabilities with specialized knowledge, workflows, or tool integrations.
agent-teams-simplify-and-harden
Implementation + audit loop using parallel agent teams with structured simplify, harden, and document passes. Spawns implementation agents to do the work, then audit agents to find complexity, security gaps, and spec deviations, then loops until code compiles cleanly, all tests pass, and auditors find zero issues or the loop cap is reached. Use when: implementing features from a spec or plan, hardening existing code, fixing a batch of issues, or any multi-file task that benefits from a build-verify-fix cycle.
eval-creator-ci
[Beta] CI-only eval regression runner using gh-aw (GitHub Agentic Workflows). Runs all eval cases in .evals/ on a schedule or per-PR, reports pass/fail results, and can block merges on regressions. Also creates new eval cases from promoted patterns flagged by learning-aggregator-ci. Use when: you want automated regression testing of promoted rules in CI/headless pipelines. For interactive eval creation and runs, use eval-creator.
eval-creator
[Beta] Creates permanent eval cases from promoted learnings and runs regression checks against them. Turns failures into test cases that prevent silent regression. This is the outer loop's regress-test step. Use when a learning is promoted and has a clear pass/fail condition, or on cadence to verify promoted rules still hold.
learning-aggregator-ci
[Beta] CI-only learning aggregation workflow using gh-aw (GitHub Agentic Workflows). Scans .learnings/ files on a schedule, groups entries by pattern_key, identifies promotion-ready patterns, and posts a gap report as a PR or issue comment. Use when: you want automated cross-session pattern detection in CI/headless pipelines without interactive prompts. For interactive use, use learning-aggregator.
learning-aggregator
[Beta] Cross-session analysis of accumulated .learnings/ files. Reads all entries, groups by pattern_key, computes recurrence across sessions, and outputs ranked promotion candidates. This is the outer loop's inspect step — it turns raw learning data into actionable gap reports. Use on a regular cadence (weekly, before major tasks, or at session start for critical projects). Can be invoked manually or scheduled.
pre-flight-check
[Beta] Session-start scan that surfaces relevant learnings, recent errors, and eval status before work begins. Bridges the outer loop back into the inner loop by making accumulated knowledge visible at task start. Activated via SessionStart hook or manually before major tasks.
verify-gate
Runs project compile, test, and lint commands between implementation and quality review. Gates simplify-and-harden behind machine verification. If checks fail, routes back to implementation with diagnostics for a fix loop. If checks pass, signals ready for the quality pass. Use after any implementation work completes and before simplify-and-harden. Essential for the inner loop's verify step.
dx-data-navigator
Query Developer Experience (DX) data via the DX Data MCP server PostgreSQL database. Use this skill when analyzing developer productivity metrics, team performance, PR/code review metrics, deployment frequency, incident data, AI tool adoption, survey responses, DORA metrics, or any engineering analytics. Triggers on questions about DX scores, team comparisons, cycle times, code quality, developer sentiment, AI coding assistant adoption, sprint velocity, or engineering KPIs.
self-improvement-ci
CI-only self-improvement workflow using gh-aw (GitHub Agentic Workflows). Captures recurring failure patterns and quality signals from pull request checks, emits structured learning candidates, and proposes durable prevention rules without interactive prompts. Use when: you want automated learning capture in CI/headless pipelines.
simplify-and-harden-ci
CI-only Simplify & Harden workflow for pull requests using gh-aw (GitHub Agentic Workflows). Runs headless scan-and-report checks for simplify/harden/document, posts structured findings, and can block merges on critical or advisory classes. Use when: you want automated quality/security review in CI without interactive approvals.
skill-pipeline
>
skill-tester-ci
Validates all CI skills in this repo. Checks Agent Skills spec compliance, gh-aw workflow compilation, permission correctness, and structural conventions. Use when CI skills have been added or modified and you want to verify they compile and conform before committing.
skill-tester
Validates all interactive skills in this repo against the Agent Skills spec, project conventions, and structural requirements. Runs quick_validate.py, checks line limits, verifies cross-references, and tests hook scripts. Use when skills have been added or modified and you want to verify everything passes before committing or submitting.