Agent Skills: grey-haven-evaluation

Evaluate LLM outputs with multi-dimensional rubrics, handle non-determinism, and implement LLM-as-judge patterns. Essential for production LLM systems. Use when testing prompts, validating outputs, comparing models, or when user mentions 'evaluation', 'testing LLM', 'rubric', 'LLM-as-judge', 'output quality', 'prompt testing', or 'model comparison'.

Category: Uncategorized
ID: greyhaven-ai/claude-code-config/grey-haven-evaluation
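To make the pattern concrete, here is a minimal LLM-as-judge sketch in TypeScript: it scores one output against a small multi-dimensional rubric and samples the judge several times to smooth over non-determinism. The `Judge` callback, the rubric dimensions, and the JSON reply shape are assumptions for illustration only; they are not taken from the skill's files.

```typescript
// Minimal LLM-as-judge sketch (illustrative; names and rubric are assumptions,
// not part of the grey-haven-evaluation skill itself).

type Rubric = Record<string, string>;             // dimension -> scoring guidance
type Judge = (prompt: string) => Promise<string>; // wraps whatever LLM client you use

const rubric: Rubric = {
  accuracy: "Facts are correct and verifiable (score 1-5).",
  completeness: "Covers every part of the user's request (score 1-5).",
  tone: "Matches the requested style and audience (score 1-5).",
};

// One judge call: ask for JSON scores against every rubric dimension.
async function scoreOnce(judge: Judge, task: string, output: string): Promise<Record<string, number>> {
  const criteria = Object.entries(rubric)
    .map(([name, desc]) => `- ${name}: ${desc}`)
    .join("\n");

  const prompt = [
    "You are grading an LLM output against a rubric.",
    `Task:\n${task}`,
    `Output to grade:\n${output}`,
    `Rubric:\n${criteria}`,
    'Reply with JSON only, e.g. {"accuracy": 4, "completeness": 5, "tone": 3}.',
  ].join("\n\n");

  return JSON.parse(await judge(prompt));
}

// Judges are non-deterministic, so sample several times and take the
// per-dimension median rather than trusting a single run.
async function scoreWithSamples(judge: Judge, task: string, output: string, samples = 5) {
  const runs = await Promise.all(
    Array.from({ length: samples }, () => scoreOnce(judge, task, output)),
  );
  const median = (xs: number[]) =>
    [...xs].sort((a, b) => a - b)[Math.floor(xs.length / 2)];
  return Object.fromEntries(
    Object.keys(rubric).map((dim) => [dim, median(runs.map((r) => r[dim]))]),
  );
}
```

In practice you would pass in your own model client as `judge`, add error handling for unparseable replies, and compare the per-dimension medians across prompt variants or models; taking the median rather than the mean keeps a single outlier judgment from skewing the score.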

Install this agent skill locally:

pnpm dlx add-skill https://github.com/greyhaven-ai/claude-code-config/grey-haven-evaluation

Skill Files

Browse the full folder contents for grey-haven-evaluation in the claude-code-config repository.
