Skill Builder Workflow
Create, evaluate, and improve Agent skills to production quality.
Quick Start
| Mode | When to Use | Starting Step |
|------|-------------|---------------|
| Create | Building a new skill from scratch | Step 1 |
| Evaluate | Scoring an existing skill | Step 4 |
| Improve | Upgrading a skill to 100/100 | Step 5 |
Skill Files
| File | Purpose |
|------|---------|
| SKILL.md | This workflow |
| SCORING.md | Structure + Efficacy rubrics (MUST READ before scoring) |
| TEMPLATES.md | Starter templates and patterns (MUST READ before creating) |
| EXAMPLES.md | Before/after improvement examples |
| CHECKLIST.md | 50-point validation checklist |
Mode 1: Create a New Skill
Step 1: Gather Requirements
Ask the user:
- What does the skill do? (core capability)
- When should it activate? (trigger contexts)
- What tools/scripts are needed? (dependencies)
- What's the expected output? (deliverables)
- What input quality issues are common? (see Input Decomposition below)
- What does this assume the user knows? (see User Capability Assumptions below)
Input Decomposition
[!IMPORTANT] Most real-world inputs are messy. If the domain typically has vague, incomplete, or poorly-structured input, the skill MUST include a transformation step.
Ask: "What does bad input look like in this domain?"
| Input Quality | Skill Must Include |
|---------------|-------------------|
| Usually clean and structured | No transformation needed |
| Sometimes vague or incomplete | Validation step that asks for clarification |
| Often messy or ambiguous | Decomposition step with probing questions to transform input |
Decomposition step pattern:
### Step N: Decompose Input
Transform raw input into structured form using these probes:
| Probe | Purpose |
|-------|---------|
| "What specifically happened?" | Extract concrete actions |
| "What was the outcome?" | Capture measurable results |
| "How often does this occur?" | Establish patterns |
User Capability Assumptions
List what the skill assumes the user can do. For each assumption, either:
- (a) Remove it by adding a compensating step, OR
- (b) Document it as a prerequisite
| Assumption | Compensation Strategy |
|------------|----------------------|
| User can provide structured input | Add decomposition step |
| User knows domain terminology | Add glossary or explain inline |
| User can make judgment calls | Add decision logic with explicit criteria |
| User knows quality standards | Add validation checklist |
Step 1.5: Identify the Hardest Parts
[!CRITICAL] State-of-the-art skills solve the hard problems, not just the easy ones. Before designing the workflow, identify where experts struggle and novices get stuck.
Ask: "What are the 2-3 hardest judgment calls in this domain?"
Signs of a hard judgment call:
- Experts disagree on the right answer
- Multiple valid options exist
- Context determines the best choice
- Novices consistently get it wrong
For each hard part, the skill MUST include:
| Hard Part Type | Required Solution |
|----------------|-------------------|
| Ambiguous categorization | Disambiguation logic with explicit criteria |
| Quality/intensity judgment | Calibration guidance with thresholds |
| Context-dependent choice | Decision matrix or if/then rules |
| Subjective evaluation | Rubric with concrete examples |
Example pattern for disambiguation:
| If X could be A or B... | Ask this to disambiguate |
|-------------------------|--------------------------|
| [Ambiguous situation 1] | Was the emphasis on [criterion]? → A. On [other criterion]? → B |
| [Ambiguous situation 2] | Did it primarily [test for A] or [test for B]? |
[!WARNING] A lookup table is not disambiguation. If your skill has a reference table but no logic for handling cases that match multiple entries, it's incomplete.
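The warning above can be made concrete with a small sketch. It is an illustration only: the `Rule` and `Result` types, the `categorize` function, and the bug/feature rules are all hypothetical and not part of any skill runtime. It shows that a bare table lookup returns several matches for an ambiguous input, so the skill needs an explicit branch that turns the tie into disambiguating questions.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    """One row of a hypothetical lookup table, plus its tie-breaking question."""
    category: str
    matches: Callable[[str], bool]
    probe: str


@dataclass
class Result:
    category: Optional[str]  # set only when exactly one rule matched
    probes: list[str]        # questions to ask when the input is ambiguous


def categorize(text: str, rules: list[Rule]) -> Result:
    hits = [rule for rule in rules if rule.matches(text)]
    if len(hits) == 1:
        return Result(hits[0].category, [])
    # Zero or several matches: a bare lookup table stops here.
    # Disambiguation means asking instead of guessing.
    candidates = hits or rules
    return Result(None, [rule.probe for rule in candidates])


# Hypothetical rules, for illustration only.
RULES = [
    Rule("bug", lambda t: "broken" in t or "error" in t,
         "Did it work before and then stop?"),
    Rule("feature", lambda t: "add" in t or "support" in t,
         "Is this behavior the product never had?"),
]
```

With these rules, `categorize("add support, export is broken", RULES)` matches both rows, so it returns no category and both probes instead of silently picking the first match.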
Step 2: Assess Complexity & Choose Structure
[!CAUTION] Default to Simple. Only upgrade complexity if the skill genuinely needs it. Ask: "Would this skill work without this file?" If yes, don't add it.
Complexity Assessment:
| If the skill... | Then it's... |
|-----------------|--------------|
| Does ONE thing, linear flow, no scripts, <5 decision points | Simple |
| Multi-step workflow, needs reference tables, moderate domain knowledge | Standard |
| Many conditionals, requires scripts, extensive domain expertise, high failure modes | Complex |
Structure by Complexity:
| Complexity | Structure |
|------------|-----------|
| Simple | SKILL.md only |
| Standard | SKILL.md + REFERENCE.md or EXAMPLES.md |
| Complex | Above + TESTING.md + scripts/ |
[!TIP] Signs you're over-engineering:
- Adding TESTING.md with obvious scenarios ("it should work")
- Creating REFERENCE.md that repeats the workflow
- Writing EXAMPLES.md when 2 inline examples suffice
Read TEMPLATES.md for starter templates.
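The two Step 2 tables can be read as a small decision function. The sketch below is one possible encoding, not a rule from SCORING.md or TEMPLATES.md: the function name, the parameter names, and the exact way the signals are combined are assumptions. It starts from Simple and upgrades only when a signal from the table is present, which mirrors the "Default to Simple" caution.

```python
# File sets per complexity level, copied from the Structure by Complexity table.
STRUCTURE = {
    "Simple": ["SKILL.md"],
    "Standard": ["SKILL.md", "REFERENCE.md or EXAMPLES.md"],
    "Complex": ["SKILL.md", "REFERENCE.md or EXAMPLES.md", "TESTING.md", "scripts/"],
}


def assess_complexity(
    decision_points: int = 0,
    multi_step: bool = False,
    needs_reference_tables: bool = False,
    needs_scripts: bool = False,
    many_conditionals: bool = False,
) -> str:
    """Return 'Simple', 'Standard', or 'Complex' (assumed heuristic)."""
    if needs_scripts or many_conditionals:
        return "Complex"
    if multi_step or needs_reference_tables or decision_points >= 5:
        return "Standard"
    # Default to Simple: nothing above forced an upgrade.
    return "Simple"


def files_for(level: str) -> list[str]:
    """Look up the file set for a complexity level."""
    return STRUCTURE[level]
```

For example, `files_for(assess_complexity(decision_points=3))` yields only `SKILL.md`, while setting `needs_scripts=True` adds `TESTING.md` and `scripts/`.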
Step 3: Write the SKILL.md
Use templates from TEMPLATES.md. Ensure:
- Frontmatter: valid YAML with `name` (must match folder name) and `description`
- Description: includes BOTH what it does AND when to use it
- "Why?" line: one sentence after title explaining the problem this solves
- Workflow: clear, numbered steps
- Progressive disclosure: link to supporting files (only if needed)
[!TIP] Description is critical for discovery. Include multiple trigger keywords.
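The frontmatter requirements above can be checked mechanically. The sketch below is a minimal illustration that assumes the frontmatter is a block of flat `key: value` lines between two `---` delimiters at the top of SKILL.md. It deliberately avoids a real YAML parser, so it does not handle nested or multi-line values, and the helper names `read_frontmatter` and `check_frontmatter` are hypothetical.

```python
def read_frontmatter(text: str) -> dict[str, str]:
    """Parse flat `key: value` frontmatter; return {} if absent or unterminated."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields  # closing delimiter found
        key, sep, value = line.partition(":")
        # Skip indented continuation lines and comments (not supported here).
        if sep and not line.startswith((" ", "\t", "#")):
            fields[key.strip()] = value.strip().strip("\"'")
    return {}  # no closing delimiter


def check_frontmatter(skill_md_text: str, folder_name: str) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    fields = read_frontmatter(skill_md_text)
    if not fields:
        return ["missing, empty, or unterminated frontmatter"]
    problems = []
    if fields.get("name") != folder_name:
        problems.append(
            f"name {fields.get('name')!r} does not match folder {folder_name!r}"
        )
    if not fields.get("description"):
        problems.append("description is empty")
    return problems
```

This only covers presence and the name match. Whether the description states both what the skill does and when to use it still needs a human or rubric-based read.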
Step 3.a: Register the Skill
[!CRITICAL] Do NOT edit AGENTS.md manually.
- Run the `skills-index-updater` skill or script: `python3 ~/.claude/skills/skills-index-updater/scripts/update_skill_index.py`
- Verify `AGENTS.md` contains your new/updated skill.
After creating, proceed to Step 4 to evaluate.
Mode 2: Evaluate an Existing Skill
Step 4: Score the Skill
[!CRITICAL] Read SCORING.md completely before scoring. It contains both rubrics and scoring worksheets.
Process:
- Read all skill files (SKILL.md + supporting files)
- Score Structure (0-100): 9 categories — documentation completeness
- Score Efficacy (0-100): 6 categories — actual effectiveness
- Use Combined Score Matrix in SCORING.md for verdict
- Identify gaps in both dimensions
Present results using the format in SCORING.md.
If either score < 90, proceed to Step 5.
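The gate at the end of Step 4 can be written down as a short function. This is a sketch of the "either score below 90" rule only; the verdict itself comes from the Combined Score Matrix in SCORING.md, which is not reproduced here. The function name and the configurable `threshold` parameter are assumptions.

```python
def dimensions_needing_work(
    structure: int, efficacy: int, threshold: int = 90
) -> list[str]:
    """Return the score dimensions that fall below the threshold."""
    scores = {"Structure": structure, "Efficacy": efficacy}
    for name, score in scores.items():
        if not 0 <= score <= 100:
            raise ValueError(f"{name} score must be 0-100, got {score}")
    return [name for name, score in scores.items() if score < threshold]
```

A non-empty result means the skill should go on to Step 5, and the returned names indicate which rubric to focus on first.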
Mode 3: Improve to 100/100
Step 5: Plan Improvements
Based on evaluation, prioritize:
| Priority | Fixes | Target |
|----------|-------|--------|
| P1 Critical | Missing frontmatter, invalid YAML, empty description | Required to function |
| P2 Important | Missing triggers, no examples, no progressive disclosure | Required for 95+ |
| P3 Polish | Missing troubleshooting, no quick start, terminology issues | Required for 100 |
Step 6: Execute Improvements
[!CAUTION] Get user approval before making changes. Present the plan and wait for confirmation.
Work systematically:
- Fix frontmatter first (skill won't load without valid YAML)
- Enhance description with trigger keywords
- Add progressive disclosure if SKILL.md > 200 lines
- Create supporting files as needed
- Add quality sections (Troubleshooting, Quick Start)
Step 7: Verify Final Score
- Re-read all skill files
- Re-score against both rubrics
- Confirm scores meet target
- Present final structure and summary
Validation Checklist (Quick)
Before declaring complete:
- [ ] `name` in frontmatter matches folder name
- [ ] `description` includes what AND when
- [ ] "Why?" line present after title
- [ ] SKILL.md under 500 lines
- [ ] Structure matches complexity (not over-engineered)
- [ ] Examples show concrete input/output
- [ ] Consistent terminology throughout
Full checklist: CHECKLIST.md
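The two line-count thresholds in this workflow (progressive disclosure above 200 lines in Step 6, and the 500-line limit in the quick checklist) are easy to automate. The sketch below is one way to do it; the function name and message wording are hypothetical, and it counts every line of SKILL.md, including frontmatter and blank lines, which is an assumption about how the limit is meant to be measured.

```python
def length_findings(skill_md_text: str) -> list[str]:
    """Flag SKILL.md length issues based on the thresholds in this workflow."""
    n = len(skill_md_text.splitlines())
    findings = []
    if n >= 500:
        findings.append(f"SKILL.md has {n} lines; the quick checklist expects under 500")
    if n > 200:
        findings.append(f"SKILL.md has {n} lines; consider progressive disclosure")
    return findings
```

The remaining checklist items, such as consistent terminology and concrete input/output examples, are judgment calls and are not covered by this check.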
Troubleshooting
| Problem | Solution |
|---------|----------|
| Skill not discovered | Check description has trigger keywords |
| Low Structure score | Add missing sections per SCORING.md rubric |
| Low Efficacy score | Simplify; the skill may be doing too many things |
| Frontmatter errors | Validate YAML syntax, check for reserved words |
| User confused by skill | Add Quick Start, improve decision density |
Reference
- SCORING.md — Structure + Efficacy rubrics with worksheets
- TEMPLATES.md — Starter templates and common patterns
- EXAMPLES.md — Before/after improvement examples
- CHECKLIST.md — 50-point validation checklist