Aggregate benchmark statistics from skill evaluation runs.
$ARGUMENTS
If $ARGUMENTS is provided, use it as the skill path. Otherwise, ask the user which skill to benchmark.
Steps
- Locate the skill directory and its .skill-eval/ workspace. Resolve $ARGUMENTS to an absolute path if provided.
- Find the latest benchmark run directory inside .skill-eval/.
- Run aggregate_benchmark.py:
  python3 ${SKILL_ROOT}/../skill-shaper/scripts/aggregate_benchmark.py <benchmark-dir> --skill-name <name>
- Display the generated benchmark.md summary.
- Report pass rate deltas between with-skill and without-skill configurations.
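
The path resolution, latest-run lookup, and delta reporting in the steps above could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not part of the skill-shaper scripts: the helper names `latest_run_dir` and `pass_rate_delta` are hypothetical, "latest" is assumed to mean the most recently modified subdirectory of .skill-eval/, and pass rates are assumed to be fractions between 0 and 1.

```python
from pathlib import Path


def latest_run_dir(skill_path: str, workspace_name: str = ".skill-eval") -> Path:
    """Return the most recently modified run directory in the skill's workspace.

    Assumption: each benchmark run lives in its own subdirectory of the
    workspace, and modification time is a good enough proxy for "latest".
    """
    # Resolve the argument to an absolute path, expanding "~" if present.
    skill_dir = Path(skill_path).expanduser().resolve()
    workspace = skill_dir / workspace_name

    # iterdir() raises FileNotFoundError if the workspace does not exist.
    runs = [p for p in workspace.iterdir() if p.is_dir()]
    if not runs:
        raise FileNotFoundError(f"no benchmark runs found in {workspace}")
    return max(runs, key=lambda p: p.stat().st_mtime)


def pass_rate_delta(with_skill: float, without_skill: float) -> float:
    """Difference in pass rate: positive means the skill helped."""
    return with_skill - without_skill
```

A delta computed this way can be reported in percentage points, for example `f"{pass_rate_delta(0.8, 0.5) * 100:+.1f} pp"`. If the run directories carry sortable timestamps in their names, sorting by name may be more robust than relying on modification time.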