Agent Skills: benchmark-skill

Aggregate benchmark results from skill evaluation runs into statistics. Use when comparing evaluation runs or analyzing skill performance trends.

Uncategorized
ID: bluewaves-creations/bluewaves-skills/benchmark-skill

Install this agent skill locally:

pnpm dlx add-skill https://github.com/bluewaves-creations/bluewaves-skills/tree/HEAD/plugins/skills-factory/skills/benchmark-skill

Skill Files

plugins/skills-factory/skills/benchmark-skill/SKILL.md

Skill Metadata

Name: benchmark-skill
Description: Aggregate benchmark results from skill evaluation runs into statistics. Use when comparing evaluation runs or analyzing skill performance trends.

Aggregate benchmark statistics from skill evaluation runs.

$ARGUMENTS

If $ARGUMENTS is provided, use it as the skill path. Otherwise, ask the user which skill to benchmark.

Steps

  1. Locate the skill directory and its .skill-eval/ workspace. Resolve $ARGUMENTS to an absolute path if provided.

  2. Find the latest benchmark run directory inside .skill-eval/.

  3. Run aggregate_benchmark.py:

    python3 ${SKILL_ROOT}/../skill-shaper/scripts/aggregate_benchmark.py <benchmark-dir> --skill-name <name>
    
  4. Display the generated benchmark.md summary.

  5. Report the pass-rate deltas between the with-skill and without-skill configurations.
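
Steps 1, 2, and 5 can be sketched in Python as follows. This is a minimal illustration, not the logic of aggregate_benchmark.py: the helper names `find_latest_run` and `pass_rate_delta` are hypothetical, "latest" is assumed to mean the most recently modified subdirectory of .skill-eval/, and pass rates are assumed to be plain numbers that have already been read from the benchmark output.

```python
from pathlib import Path
from typing import Optional


def find_latest_run(skill_path: str, workspace: str = ".skill-eval") -> Optional[Path]:
    """Return the newest run directory in the skill's workspace, or None.

    Hypothetical helper. Assumption: each benchmark run is a direct
    subdirectory of the workspace, and "latest" means most recently modified.
    """
    # Step 1: resolve the skill path to an absolute path.
    root = Path(skill_path).expanduser().resolve() / workspace
    if not root.is_dir():
        return None
    # Step 2: pick the subdirectory with the newest modification time.
    runs = [p for p in root.iterdir() if p.is_dir()]
    return max(runs, key=lambda p: p.stat().st_mtime, default=None)


def pass_rate_delta(with_skill: float, without_skill: float) -> float:
    """Step 5: difference in pass rate between the two configurations.

    Hypothetical helper. A positive value means the with-skill
    configuration passed more often than the without-skill one.
    """
    return with_skill - without_skill
```

If run directories carry sortable timestamps in their names, sorting by name may be more reliable than modification time, which changes whenever files inside a directory are added or removed.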