Prompt Optimizer
Optimize prompts with evals. Keep every instruction, example, and external context reference causal.
Load Only What You Need
| Need | Read |
|------|------|
| New prompt | references/core-patterns.md, references/model-family-notes.md, references/transformed-examples.md |
| Existing prompt | references/meta-optimization-loop.md, references/core-patterns.md, references/model-family-notes.md |
| Model-family port | references/model-family-notes.md, references/core-patterns.md |
| Repeated failures | references/meta-optimization-loop.md, references/core-patterns.md |
| Weak or ambiguous draft | references/transformed-examples.md |
| Provenance | SOURCES.md |
Step 1: Capture Contract
Record before editing:
- task type: new, refine, port, or debug
- target model family and snapshot, if known
- prompt surface:
system,developer,user, tool descriptions, examples, schemas - layer owners: platform, deployer/persona, retrieved context, user payload
- objective and non-goals
- inputs, tools, and external files available
- required output shape
- success criteria and failure cases
- hard constraints: latency, verbosity, safety, budget, tool use, style
If success criteria or examples are missing, create a small eval set first. If the bottleneck is model choice, retrieval, tool schema, or missing evals, say so before rewriting.
Step 2: Inventory External Context
For repo or agent prompts, list stable context by exact path:
| Context type | Examples |
|--------------|----------|
| Agent rules | AGENTS.md, CLAUDE.md |
| Specs | specs/*.md, docs/api.md |
| Policies | SECURITY.md, docs/releasing.md |
| Examples | examples/, tests/fixtures/ |
Rules:
- Reference stable files by repo-relative path instead of copying them.
- Paste only excerpts needed for the prompt or eval case.
- Mark whether a file is
loaded,referenced, orout of scope. - Avoid vague context pointers such as "read the docs".
Step 3: Choose Model Strategy
Read references/model-family-notes.md.
- Known family: optimize for that family.
- Unknown family: write a portable base plus short adapter notes.
- Snapshot changes: rerun evals.
- Cross-family divergence: specialize only the failing layer.
Step 4: Shape Prompt
Read references/core-patterns.md.
- Put stable policy in
systemordeveloper. - Put task-local facts, retrieved context, and variables in user-facing sections.
- Keep one owner per behavior rule.
- Use headings or tags only to separate content types.
- Put tool policy in prompt text; keep schemas in provider-native tools.
- Keep persona light unless it changes behavior.
- Use the shortest wording that preserves the constraint.
- Cut filler, repeated reminders, dead examples, and rationale that does not affect evals.
Step 5: Optimize
Read references/meta-optimization-loop.md for refinements.
- Baseline the current prompt on the same eval slice.
- Cluster failures by root cause.
- Write concrete edit criticisms.
- Generate two to four candidates:
- minimal-diff repair
- structure-first rewrite
- examples-first or tool-rule variant
- provider adapter when needed
- Compare candidates on the same cases.
- Keep a short optimization log.
- Validate the winner on holdout cases.
- Stop on plateau, oscillation, overfit, excessive cost, or non-prompt bottleneck.
Step 6: Return Package
Return:
TargetSuccess CriteriaExternal ContextOptimized PromptAdapter NotesEval SetOptimization LogResidual Risks
For existing prompts, include a concise diff-style note of the main behavioral changes.
Failure Modes
- editing before defining the eval target
- mixing policy, examples, and raw context without boundaries
- duplicating rules across layers
- putting durable policy in user payloads
- asking for chain-of-thought
- keeping contradictory legacy instructions
- overfitting to one or two examples
- retaining examples that no longer improve evals
- fixing tool-use failures only in prompt text when tool descriptions or schemas are weak
- adding markup that does not reduce ambiguity
- using persona as a substitute for behavior rules