Agent Skills: eval-driven-dev

Add instrumentation, build golden datasets, write eval-based tests, run them, root-cause failures, and iterate to ensure your Python LLM application works correctly. Use this skill whenever a user is developing, testing, QA-ing, evaluating, or benchmarking a Python project that calls an LLM: for example, to catch regressions after prompt changes, fix unexpected behavior, or validate output quality before shipping.
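As a rough illustration of what an eval-based test over a golden dataset can look like, here is a minimal sketch. It is not taken from the skill itself: `GoldenCase`, `fake_app`, `run_evals`, and the substring check are all hypothetical names and choices, and `fake_app` stands in for a real LLM call so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class GoldenCase:
    """One golden-dataset entry: an input and a fact the output must mention."""
    prompt: str
    must_contain: str


def fake_app(prompt: str) -> str:
    """Hypothetical stand-in for the application under test (no real LLM call)."""
    canned = {
        "What is the capital of France?": "The capital of France is Paris.",
        "What is 2 + 2?": "2 + 2 equals 4.",
    }
    return canned.get(prompt, "I don't know.")


# A tiny, hand-written golden dataset (illustrative only).
GOLDEN: List[GoldenCase] = [
    GoldenCase("What is the capital of France?", "Paris"),
    GoldenCase("What is 2 + 2?", "4"),
]


def run_evals(
    app: Callable[[str], str], cases: List[GoldenCase]
) -> List[Tuple[GoldenCase, str]]:
    """Run every case through the app and return the failing (case, output) pairs."""
    failures = []
    for case in cases:
        output = app(case.prompt)
        # Simple case-insensitive substring check; real evals are often richer.
        if case.must_contain.lower() not in output.lower():
            failures.append((case, output))
    return failures


def test_golden_dataset() -> None:
    """Eval-based test: fails if any golden case regresses."""
    failures = run_evals(fake_app, GOLDEN)
    assert not failures, f"{len(failures)} golden case(s) failed: {failures}"
```

The returned failure pairs keep the offending output next to its case, which is the starting point for root-causing a regression after a prompt change.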

Uncategorized
ID: github/awesome-copilot/eval-driven-dev

Install this agent skill locally:

pnpm dlx add-skill https://github.com/github/awesome-copilot/eval-driven-dev
