GEPA optimize_anything
Goal
Optimize any artifact representable as text — code, prompts, agent architectures, vector graphics, configurations — using a single declarative API powered by GEPA's reflective evolutionary search.
When to Use
- Beyond prompt optimization — optimizing code, configs, SVGs, scheduling policies, etc.
- Single hard problems — circle packing, kernel generation, algorithm discovery
- Batch related problems — CUDA kernels, code generation tasks with cross-transfer
- Generalization — agent skills, policies, or prompts that must transfer to unseen inputs
- When you can express quality as a score and provide diagnostic feedback (Actionable Side Information, ASI)
Inputs
| Input | Type | Description |
|-------|------|-------------|
| seed_candidate | str \| dict[str, str] \| None | Starting artifact text, or None for seedless mode |
| evaluator | Callable | Returns score (higher=better), optionally with ASI dict |
| dataset | list \| None | Training examples (for multi-task and generalization modes) |
| valset | list \| None | Validation set (for generalization mode) |
| objective | str \| None | Natural language description of what to optimize for |
| background | str \| None | Domain knowledge and constraints |
| config | GEPAConfig \| None | Engine, reflection, and tracking settings |
Outputs
| Output | Type | Description |
|--------|------|-------------|
| result.best_candidate | str \| dict | Best optimized artifact |
Workflow
Phase 1: Install
pip install gepa
Phase 2: Define Evaluator with ASI
The evaluator scores a candidate and returns Actionable Side Information (ASI) — diagnostic feedback that guides the LLM proposer during reflection.
Simple evaluator (score only):
import gepa.optimize_anything as oa
def evaluate(candidate: str) -> float:
score, diagnostic = run_my_system(candidate)
oa.log(f"Error: {diagnostic}") # captured as ASI
return score
Rich evaluator (score + structured ASI):
def evaluate(candidate: str) -> tuple[float, dict]:
result = execute_code(candidate)
return result.score, {
"Error": result.stderr,
"Output": result.stdout,
"Runtime": f"{result.time_ms:.1f}ms",
}
ASI can include open-ended text, structured data, multiple objectives (reported as separate scores), or images (via `gepa.Image`) for vision-capable LLMs.
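For example, a multi-objective evaluator can collapse several sub-scores into the single scalar that drives the search while reporting each component through ASI. A minimal sketch; `run_benchmark` and the key names are illustrative, not part of the gepa API:
def evaluate(candidate: str) -> tuple[float, dict]:
    # Hypothetical helper returning (accuracy in [0, 1], latency in ms).
    accuracy, latency_ms = run_benchmark(candidate)
    penalty = 0.001 * latency_ms
    # The scalar drives the search; per-objective values travel as ASI.
    return accuracy - penalty, {
        "Accuracy": accuracy,
        "LatencyPenalty": penalty,
        "Note": "Accuracy dominates; latency is a soft constraint.",
    }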
Phase 3: Choose Optimization Mode
Mode 1 — Single-Task Search: Solve one hard problem. No dataset needed.
result = oa.optimize_anything(
seed_candidate="<your initial artifact>",
evaluator=evaluate,
)
Mode 2 — Multi-Task Search: Solve a batch of related problems with cross-transfer.
result = oa.optimize_anything(
seed_candidate="<your initial artifact>",
evaluator=evaluate,
dataset=tasks,
)
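In multi-task mode the evaluator is called with a `(candidate, example)` pair, one call per dataset entry, so each entry should carry whatever the evaluator needs. A minimal sketch; the field names and the `run_candidate` helper are illustrative:
tasks = [
    {"input": "abc", "expected": "cba"},
    {"input": "hello", "expected": "olleh"},
]

def evaluate(candidate: str, example: dict) -> tuple[float, dict]:
    prediction = run_candidate(candidate, example["input"])  # hypothetical runner
    score = 1.0 if prediction == example["expected"] else 0.0
    return score, {"Prediction": prediction}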
Mode 3 — Generalization: Build a skill/prompt/policy that transfers to unseen problems.
result = oa.optimize_anything(
seed_candidate="<your initial artifact>",
evaluator=evaluate,
dataset=train,
valset=val,
)
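The held-out `valset` is what guards against overfitting to the training tasks, so keep it disjoint from `dataset`. A simple split, assuming `examples` is a list of task dicts and the sizes are arbitrary:
import random

random.seed(0)
random.shuffle(examples)
train, val = examples[:80], examples[80:100]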
Seedless mode: Describe what you need instead of providing a seed.
result = oa.optimize_anything(
evaluator=evaluate,
objective="Generate a Python function `reverse()` that reverses a string.",
)
Phase 4: Use Results
print(result.best_candidate)
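If the seed was a dict of named components, `best_candidate` is a dict with the same keys (see the SVG example below):
best_svg = result.best_candidate["svg_code"]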
Production Example
import gepa.optimize_anything as oa
from gepa import Image
import logging
logger = logging.getLogger(__name__)
# ---------- SVG optimization with VLM feedback ----------
GOAL = "a pelican riding a bicycle"
VLM = "vertex_ai/gemini-3-flash-preview"
VISUAL_ASPECTS = [
{"id": "overall", "criteria": f"Rate overall quality of this SVG ({GOAL}). SCORE: X/10"},
{"id": "anatomy", "criteria": "Rate pelican accuracy: beak, pouch, plumage. SCORE: X/10"},
{"id": "bicycle", "criteria": "Rate bicycle: wheels, frame, handlebars, pedals. SCORE: X/10"},
{"id": "composition", "criteria": "Rate how convincingly the pelican rides the bicycle. SCORE: X/10"},
]
def evaluate(candidate, example):
"""Render SVG, score with a VLM, return (score, ASI)."""
image = render_image(candidate["svg_code"]) # via cairosvg
score, feedback = get_vlm_score_feedback(VLM, image, example["criteria"])
return score, {
"RenderedSVG": Image(base64_data=image, media_type="image/png"),
"Feedback": feedback,
}
result = oa.optimize_anything(
seed_candidate={"svg_code": "<svg>...</svg>"},
evaluator=evaluate,
dataset=VISUAL_ASPECTS,
background=f"Optimize SVG source code depicting '{GOAL}'. "
"Improve anatomy, composition, and visual quality.",
)
logger.info(f"Best SVG:\n{result.best_candidate['svg_code']}")
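The two helpers are left undefined above; the inline comment points at cairosvg for rasterization, and a plausible sketch of the VLM call uses litellm (an assumption, not part of gepa):
import base64
import re

import cairosvg
import litellm

def render_image(svg_code: str) -> str:
    """Rasterize SVG source to PNG and return it base64-encoded."""
    png_bytes = cairosvg.svg2png(bytestring=svg_code.encode())
    return base64.b64encode(png_bytes).decode()

def get_vlm_score_feedback(model: str, image_b64: str, criteria: str) -> tuple[float, str]:
    """Ask the VLM to critique the render and parse 'SCORE: X/10' from its reply."""
    response = litellm.completion(
        model=model,
        messages=[{"role": "user", "content": [
            {"type": "text", "text": criteria},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ]}],
    )
    feedback = response.choices[0].message.content
    match = re.search(r"SCORE:\s*(\d+(?:\.\d+)?)\s*/\s*10", feedback)
    score = float(match.group(1)) / 10 if match else 0.0  # normalize to [0, 1]
    return score, feedback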
# ---------- Code optimization (single-task) ----------
def evaluate_solver(candidate: str) -> tuple[float, dict]:
"""Evaluate a Python solver for a mathematical optimization problem."""
    import subprocess, json
    try:
        proc = subprocess.run(
            ["python", "-c", candidate],
            capture_output=True, text=True, timeout=30,
        )
    except subprocess.TimeoutExpired:
        oa.log("Runtime error: timed out after 30s")
        return 0.0, {"Error": "Timed out after 30s"}
    if proc.returncode != 0:
        oa.log(f"Runtime error: {proc.stderr}")
        return 0.0, {"Error": proc.stderr}
try:
output = json.loads(proc.stdout)
return output["score"], {
"Output": output.get("solution"),
"Runtime": f"{output.get('time_ms', 0):.1f}ms",
}
except (json.JSONDecodeError, KeyError) as e:
oa.log(f"Parse error: {e}")
return 0.0, {"Error": str(e), "Stdout": proc.stdout}
result = oa.optimize_anything(
evaluator=evaluate_solver,
objective="Write a Python solver for the bin packing problem that "
"minimizes the number of bins. Output JSON with 'score' and 'solution'.",
background="Use first-fit-decreasing as a starting heuristic. "
"Higher score = fewer bins used.",
)
print(result.best_candidate)
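For reference, a candidate that satisfies this contract is a self-contained script printing the required JSON. A toy first-fit-decreasing sketch with a hardcoded instance, for illustration only:
import json
import time

items, capacity = [4, 8, 1, 4, 2, 1], 10  # toy instance
start = time.time()
bins = []
for item in sorted(items, reverse=True):  # first-fit-decreasing
    for b in bins:
        if sum(b) + item <= capacity:
            b.append(item)
            break
    else:
        bins.append([item])
print(json.dumps({
    "score": 1.0 / len(bins),  # higher score = fewer bins
    "solution": bins,
    "time_ms": (time.time() - start) * 1000,
}))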
# ---------- Agent architecture generalization ----------
def evaluate_agent(candidate: str, example: dict) -> tuple[float, dict]:
"""Run an agent architecture on a task and score it."""
    exec_globals = {}
    try:
        exec(candidate, exec_globals)
    except Exception as e:
        return 0.0, {"Error": f"Candidate failed to execute: {e}"}
    agent_fn = exec_globals.get("solve")
    if agent_fn is None:
        return 0.0, {"Error": "No `solve` function defined"}
try:
prediction = agent_fn(example["input"])
correct = prediction == example["expected"]
score = 1.0 if correct else 0.0
feedback = "Correct" if correct else (
f"Expected '{example['expected']}', got '{prediction}'"
)
return score, {"Prediction": prediction, "Feedback": feedback}
except Exception as e:
return 0.0, {"Error": str(e)}
result = oa.optimize_anything(
seed_candidate="def solve(input):\n return input",
evaluator=evaluate_agent,
dataset=train_tasks,
valset=val_tasks,
background="Discover a Python agent function `solve(input)` that "
"generalizes across unseen reasoning tasks.",
)
print(result.best_candidate)
Integration with DSPy
optimize_anything complements DSPy's built-in optimizers. Use DSPy optimizers (GEPA, MIPROv2, BootstrapFewShot) for DSPy programs, and optimize_anything for arbitrary text artifacts outside DSPy:
import dspy
import gepa.optimize_anything as oa
# DSPy program optimization (use dspy.GEPA)
optimizer = dspy.GEPA(
metric=gepa_metric,
reflection_lm=dspy.LM("openai/gpt-4o"),
auto="medium",
)
compiled = optimizer.compile(agent, trainset=trainset)
# Non-DSPy artifact optimization (use optimize_anything)
result = oa.optimize_anything(
seed_candidate=my_config_yaml,
evaluator=eval_config,
background="Optimize Kubernetes scheduling policy for cost.",
)
Best Practices
- Rich ASI — The more diagnostic feedback you provide, the better the proposer can reason about improvements
- Use `oa.log()` — Route prints to the proposer as ASI instead of stdout
- Structured returns — Return `(score, dict)` tuples for multi-faceted diagnostics
- Seedless for exploration — Use `objective=` when the solution space is large and unfamiliar
- Background context — Provide domain knowledge via `background=` to constrain the search
- Generalization mode — Always provide `valset` when the artifact must transfer to unseen inputs
- Images as ASI — Use `gepa.Image` to pass rendered outputs to vision-capable LLMs
Limitations
- Requires the `gepa` package (`pip install gepa`)
- Evaluator must be deterministic or low-variance for stable optimization
- Compute cost scales with the number of candidates explored
- Single-task mode does not generalize; use Mode 3 with `valset` for transfer
- Currently powered by the GEPA backend; the API is backend-agnostic for future strategies