Agent Skills: /evaluate:plugin

ID: laurigates/claude-plugins/evaluate-plugin-batch

Install this agent skill locally:

pnpm dlx add-skill https://github.com/laurigates/claude-plugins/tree/HEAD/evaluate-plugin/skills/evaluate-plugin-batch


evaluate-plugin/skills/evaluate-plugin-batch/SKILL.md

Skill Metadata

Name
evaluate-plugin-batch
Description

/evaluate:plugin

Batch evaluate all skills in a plugin. Runs /evaluate:skill for each skill, then produces a plugin-level quality report.

When to Use This Skill

| Use this skill when... | Use alternative when... |
|------------------------|-------------------------|
| Auditing all skills in a plugin before release | Evaluating a single skill -> /evaluate:skill |
| Establishing quality baselines across a plugin | Viewing past results -> /evaluate:report |
| Checking overall plugin quality after refactoring | Need structural compliance -> plugin-compliance-check.sh |

Context

  • Plugin skills: !find $1/skills -name "SKILL.md" -maxdepth 3
  • Existing evals: !find $1/skills -name "evals.json" -maxdepth 3

Parameters

Parse these from $ARGUMENTS:

| Parameter | Default | Description |
|-----------|---------|-------------|
| <plugin-name> | required | Name of the plugin to evaluate |
| --create-missing-evals | false | Generate evals for skills that lack them |
| --parallel N | 1 | Max concurrent skill evaluations |
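
A minimal sketch of the parsing step in shell, assuming $ARGUMENTS arrives as a single space-separated string (the sample value below is a made-up example):

```shell
# Hedged sketch: parse the plugin name plus the two flags from $ARGUMENTS.
ARGUMENTS="my-plugin --create-missing-evals --parallel 2"

plugin=""; create_missing=false; parallel=1
set -- $ARGUMENTS   # unquoted on purpose: split the string into words
while [ $# -gt 0 ]; do
  case "$1" in
    --create-missing-evals) create_missing=true ;;
    --parallel) parallel="$2"; shift ;;
    *) plugin="$1" ;;
  esac
  shift
done
echo "plugin=$plugin create_missing=$create_missing parallel=$parallel"
```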

Execution

Step 1: Discover skills

Find all skills in the plugin:

<plugin-name>/skills/*/SKILL.md

List them and count the total.
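
The discovery step can be sketched in shell; the plugin layout below is a throwaway fixture so the commands are runnable:

```shell
# Assumed layout: <plugin-name>/skills/<skill-name>/SKILL.md.
# "demo-plugin" and its two skills are fixtures for illustration.
plugin="demo-plugin"
mkdir -p "$plugin/skills/skill-a" "$plugin/skills/skill-b"
touch "$plugin/skills/skill-a/SKILL.md" "$plugin/skills/skill-b/SKILL.md"

# Discover every skill and count the total.
skills=$(find "$plugin/skills" -maxdepth 2 -name "SKILL.md" | sort)
total=$(printf '%s\n' "$skills" | wc -l)
echo "Found $total skills:"
printf '%s\n' "$skills"
```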

Step 2: Filter and prepare

For each skill, check if evals.json exists:

  • Has evals: include in evaluation
  • No evals + --create-missing-evals: include, will create evals during evaluation
  • No evals, no flag: skip with a note

Report the breakdown:

Found N skills in <plugin-name>:
  - M with eval cases
  - K without eval cases (skipped | will create)
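
The filtering above can be sketched as a shell loop; the fixture data and the hard-coded flag value stand in for a real plugin and the parsed --create-missing-evals setting:

```shell
# Fixture: one skill with evals.json, one without.
plugin="my-plugin"; create_missing=false
mkdir -p "$plugin/skills/skill-a" "$plugin/skills/skill-b"
touch "$plugin/skills/skill-a/SKILL.md" "$plugin/skills/skill-b/SKILL.md"
echo '[]' > "$plugin/skills/skill-a/evals.json"

with=0; without=0; included=""
for f in "$plugin"/skills/*/SKILL.md; do
  dir=$(dirname "$f")
  if [ -f "$dir/evals.json" ]; then
    with=$((with + 1)); included="$included $dir"
  else
    without=$((without + 1))
    if [ "$create_missing" = true ]; then
      included="$included $dir"   # evals will be created during evaluation
    else
      echo "Skipping $(basename "$dir"): no evals.json"
    fi
  fi
done
echo "Found $((with + without)) skills in $plugin: $with with evals, $without without"
```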

Step 3: Run evaluations

For each included skill, invoke /evaluate:skill via the SlashCommand tool:

SlashCommand: /evaluate:skill <plugin-name>/<skill-name> [--create-evals]

If --parallel N is set and N > 1, batch evaluations into groups of N. Otherwise, run sequentially.

Track progress with TodoWrite — mark each skill as it completes.
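
One way to sketch the batching with a plain shell tool is `xargs -P` (a common GNU/BSD extension); the `echo` here is a placeholder for a real `/evaluate:skill` invocation:

```shell
# Evaluate up to $parallel skills at a time; the skill names are fixtures.
parallel=2
printf '%s\n' skill-a skill-b skill-c |
  xargs -n 1 -P "$parallel" sh -c 'echo "evaluated: $0"' > results.txt
wc -l < results.txt   # three skills in, three results out
```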

Step 4: Aggregate plugin report

After all skill evaluations complete, read each skill's benchmark.json and aggregate:

bash evaluate-plugin/scripts/aggregate_benchmark.sh <plugin-name>

Write aggregated results to <plugin-name>/eval-results/plugin-benchmark.json.
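
The aggregation itself is owned by aggregate_benchmark.sh; purely to illustrate the arithmetic, here is a hedged sketch that guesses a `"passed"`/`"total"` schema and a per-skill eval-results/ location for benchmark.json (the real script defines both, and they may differ), extracting the numbers with sed to avoid a jq dependency:

```shell
# Fixture benchmarks; the file location and field names are assumptions.
plugin="sample-plugin"
mkdir -p "$plugin/skills/skill-a/eval-results" "$plugin/skills/skill-b/eval-results"
echo '{"passed": 4, "total": 4}' > "$plugin/skills/skill-a/eval-results/benchmark.json"
echo '{"passed": 2, "total": 3}' > "$plugin/skills/skill-b/eval-results/benchmark.json"

passed=0; total=0
for b in "$plugin"/skills/*/eval-results/benchmark.json; do
  p=$(sed -n 's/.*"passed": *\([0-9]*\).*/\1/p' "$b")
  t=$(sed -n 's/.*"total": *\([0-9]*\).*/\1/p' "$b")
  passed=$((passed + p)); total=$((total + t))
done
rate=$((100 * passed / total))
echo "Overall: ${rate}% pass rate across $total eval cases"
```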

Step 5: Report

Print a plugin-level summary table:

## Plugin Evaluation: <plugin-name>

| Skill | Evals | Pass Rate | Status |
|-------|-------|-----------|--------|
| skill-a | 4 | 100% | PASS |
| skill-b | 3 | 67% | PARTIAL |
| skill-c | 5 | 80% | PASS |

**Overall**: 82% pass rate across N eval cases

Rank skills by pass rate. Flag any below 50% as needing attention.
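
The ranking and flagging can be sketched with sort and awk; the "skill pass-rate" pairs below are illustrative, where real data would come from the aggregated report:

```shell
# Sort descending by pass rate, then flag anything below 50%.
ranked=$(printf '%s\n' "skill-a 100" "skill-b 67" "skill-c 40" | sort -k2 -rn)
flagged=$(printf '%s\n' "$ranked" | awk '$2 < 50 { print $1 }')
printf '%s\n' "$ranked"
echo "Needs attention: $flagged"
```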

Agentic Optimizations

| Context | Command |
|---------|---------|
| List plugin skills | ls -d <plugin>/skills/*/SKILL.md |
| Check for evals | find <plugin>/skills -name evals.json |
| Count skills | ls -d <plugin>/skills/*/SKILL.md \| wc -l |
| Aggregate results | bash evaluate-plugin/scripts/aggregate_benchmark.sh <plugin> |

Quick Reference

| Flag | Description |
|------|-------------|
| --create-missing-evals | Generate eval cases for skills without them |
| --parallel N | Max concurrent evaluations (default: 1) |