Skill Quality Validation
Ensures Claude Code skills follow best practices for discoverability, structure, content quality, and effectiveness. This skill provides checklists, patterns, and validation criteria for creating high-quality skills.
When to Use This Skill
Use this skill when a request matches one of these patterns:
✅ Yes, use this skill for:
- "Create a new skill for [topic]"
- "Review this skill for quality"
- "Why isn't my skill being invoked?"
- "Improve this skill's structure"
- "Prepare this skill for sharing"
- "Debug skill invocation issues"
- "Make this skill more effective"
❌ No, use different skills for:
- Writing skill content (use topic-specific skills)
- Testing specific functionality (use testing skills)
- Code review (use code-review skills)
Quick Reference
Core Principles
Every skill must have:
- ✅ Specific description with trigger keywords (concise, and under the 1024-character limit)
- ✅ Under 500 lines (split into directory if longer)
- ✅ Concrete examples (not abstract)
- ✅ Consistent terminology
- ✅ Progressive disclosure (most important first)
Red flags:
- ❌ Vague description like "Help with Python"
- ❌ Single file over 500 lines
- ❌ Abstract guidance without examples
- ❌ Mixing terminology (e.g., "commit" and "change" without explanation)
- ❌ Time-sensitive info (e.g., "new tool just released")
Quality Checklist Workflow
When creating or reviewing a skill, copy this checklist and follow the steps:
Skill Quality Review Progress:
- [ ] Step 1: Verify description and metadata
- [ ] Step 2: Check structure and organization
- [ ] Step 3: Validate content quality
- [ ] Step 4: Review code and scripts (if applicable)
- [ ] Step 5: Test across models
- [ ] Step 6: Perform real usage testing
Step 1: Verify Description and Metadata
Check the YAML frontmatter:
- [ ] Description includes specific trigger keywords (what users will say)
- [ ] Description explains WHAT the skill does and WHEN to use it
- [ ] Description is in third person ("Validates...", not "Apply...")
- [ ] Description under 1024 characters
- [ ] Priority is set appropriately (5-7 for most skills)
- [ ] Name uses lowercase, hyphens, no reserved words
If checks fail: Update frontmatter before proceeding.
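The mechanical parts of this step can be automated. The following is a minimal sketch, not an official validator: `parse_frontmatter` and `check_frontmatter` are hypothetical helpers, the parser only handles flat `key: value` lines, and the reserved-word list is an assumption you should replace with your platform's actual rules.

```python
import re

MAX_DESCRIPTION_CHARS = 1024  # limit from the Step 1 checklist
RESERVED_WORDS = ("anthropic", "claude")  # assumption: replace with your platform's list
NAME_PATTERN = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")  # lowercase words joined by hyphens


def parse_frontmatter(text):
    """Return the key/value pairs between the leading '---' lines (flat keys only)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip("\"'")
    return {}  # closing '---' was never found


def check_frontmatter(text):
    """Return a list of problems; an empty list means every check passed."""
    fields = parse_frontmatter(text)
    problems = []
    description = fields.get("description", "")
    if not description:
        problems.append("description is missing")
    elif len(description) >= MAX_DESCRIPTION_CHARS:
        problems.append(
            f"description has {len(description)} chars (limit {MAX_DESCRIPTION_CHARS})"
        )
    name = fields.get("name", "")
    if name:
        if not NAME_PATTERN.match(name):
            problems.append(f"name '{name}' must use lowercase letters, digits, and hyphens")
        for word in RESERVED_WORDS:
            if word in name.lower():
                problems.append(f"name '{name}' contains reserved word '{word}'")
    return problems
```

Checks that need judgment, such as whether the description explains WHAT and WHEN, still require a human read.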
Step 2: Check Structure and Organization
Review file organization:
- [ ] SKILL.md is under 500 lines
- [ ] Content beyond 500 lines is split into a directory structure
- [ ] "When to Use This Skill" section exists and is clear
- [ ] Progressive disclosure: most important content first
- [ ] Headers are descriptive and scannable
- [ ] File references are one level deep maximum
If checks fail: Reorganize content or split into supporting files.
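Two of these checks, the line limit and the presence of a "When to Use" section, are easy to script. This is a rough sketch under stated assumptions: `check_structure` is a hypothetical helper, and it treats any line starting with `#` as a header, so comment lines inside code blocks are counted too.

```python
MAX_SKILL_LINES = 500  # limit from the Step 2 checklist


def check_structure(skill_text):
    """Return a list of structural problems found in the text of SKILL.md."""
    problems = []
    lines = skill_text.splitlines()
    if len(lines) >= MAX_SKILL_LINES:
        problems.append(
            f"SKILL.md has {len(lines)} lines; split it into a directory with supporting files"
        )
    # Collect header titles without their leading '#' characters.
    headers = [line.lstrip("#").strip().lower() for line in lines if line.startswith("#")]
    if not any(header.startswith("when to use") for header in headers):
        problems.append('no "When to Use" section header found')
    return problems
```

Header quality and progressive disclosure are judgment calls and are left to the reviewer.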
Step 3: Validate Content Quality
Review the skill content:
- [ ] Examples are concrete and copy-pasteable
- [ ] All code examples are runnable
- [ ] Terminology is consistent throughout
- [ ] No time-sensitive information (or properly isolated)
- [ ] Workflows have clear numbered steps
- [ ] Complex choices have decision trees
- [ ] All placeholders are explained or replaced
If checks fail: Add missing examples or clarify instructions.
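Leftover placeholders are one of the easier content problems to detect automatically. The sketch below assumes three placeholder styles (angle-bracket names, double-brace template variables, and TODO/FIXME markers); `find_placeholders` is a hypothetical helper, and the pattern will also flag lowercase HTML tags, so treat its output as a list to review.

```python
import re

# Assumed placeholder styles: <angle-bracket-names>, {{ template_vars }}, TODO/FIXME markers.
PLACEHOLDER = re.compile(r"<[a-z][a-z0-9_-]*>|\{\{\s*\w+\s*\}\}|\b(?:TODO|FIXME)\b")


def find_placeholders(text):
    """Return (line_number, placeholder) pairs so each one can be explained or replaced."""
    return [
        (number, match.group(0))
        for number, line in enumerate(text.splitlines(), start=1)
        for match in PLACEHOLDER.finditer(line)
    ]
```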
Step 4: Review Code and Scripts
If skill includes executable code:
- [ ] Scripts solve problems (don't punt to Claude)
- [ ] Error handling is explicit with helpful messages
- [ ] All constants are justified (no "voodoo constants")
- [ ] Dependencies are listed with install instructions
- [ ] Paths use forward slashes (not backslashes)
- [ ] Critical operations have validation/feedback loops
If checks fail: Improve error handling and documentation.
Step 5: Test Across Models
Test with all Claude models:
- [ ] Tested with Haiku (simple case works)
- [ ] Tested with Sonnet (moderate complexity works)
- [ ] Tested with Opus (complex case works)
- [ ] Skill invoked correctly in all cases
- [ ] Responses follow skill guidance consistently
If checks fail: Adjust description or add more explicit guidance.
Step 6: Perform Real Usage Testing
Test in actual workflows:
- [ ] Fresh start test (new project, no external docs)
- [ ] Colleague test (someone else uses it)
- [ ] Different project test (verify it's project-agnostic)
- [ ] Error path test (intentionally trigger failures)
If checks fail: Update skill based on observed issues.
File Structure
For skills under 500 lines:
my-skill.md # Single file
For skills over 500 lines:
my-skill/
├── SKILL.md          # Main instructions (< 500 lines)
├── examples.md       # Detailed examples
├── reference.md      # API/command reference (optional)
└── scripts/          # Helper scripts (optional)
    └── validate.py
Key principles:
- SKILL.md always under 500 lines
- Related files use UPPERCASE for visibility (FORMS.md, EXAMPLES.md)
- Scripts live in a subdirectory and are executed, not loaded as context
- Each file has a single, clear purpose
Example from real skill:
pdf/
├── SKILL.md              # Core PDF guidance
├── FORMS.md              # Form-filling specific guidance
├── examples.md           # Extended examples
└── scripts/
    ├── analyze_form.py   # Utility script
    └── fill_form.py      # Form processor
Core Quality Standards
1. Description Quality
Format: YAML frontmatter at the top of SKILL.md
---
description: "Specific action + key terms + when to use"
priority: 5
---
Requirements:
- Include key terms that trigger the skill
- Explain both WHAT and WHEN
- Keep it concise, and always under the 1024-character limit
- Use terms users naturally say
📖 See EXAMPLES.md for good/bad examples
2. Content Structure
SKILL.md must be:
- Under 500 lines total
- Well-organized with clear sections
- Structured with progressive disclosure
- Focused on one coherent topic
If exceeding 500 lines:
- Split into directory structure
- Keep core guidance in SKILL.md
- Move detailed examples to examples.md
- Move reference material to reference.md
- Move scripts to scripts/ subdirectory
Progressive disclosure pattern:
# Skill Name
Brief intro (1-2 sentences)
## When to Use
Quick bullet list
## Quick Reference
Most common cases with examples
## Detailed Guidance
(Or link to examples.md)
## Advanced Patterns
(Or link to patterns.md)
3. Terminology Consistency
Rules:
- Use consistent terms throughout all files
- Establish vocabulary early
- Explain synonyms when first used
- Don't mix related terms without explanation
📖 See EXAMPLES.md for patterns
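One way to spot mixed terminology is to count how often each term in a group of near-synonyms appears. The sketch below is illustrative only: `count_terms` and `mixed_terms` are hypothetical helpers, the synonym groups must be supplied by you, and the plural handling is limited to a trailing "s".

```python
import re


def count_terms(text, synonym_group):
    """Count whole-word, case-insensitive occurrences of each term (optional trailing 's')."""
    counts = {}
    for term in synonym_group:
        pattern = re.compile(rf"\b{re.escape(term)}s?\b", re.IGNORECASE)
        counts[term] = len(pattern.findall(text))
    return counts


def mixed_terms(text, synonym_group):
    """Return the terms that appear in the text, but only when more than one of them does."""
    used = [term for term, count in count_terms(text, synonym_group).items() if count]
    return used if len(used) > 1 else []
```

A non-empty result is a prompt to either pick one term or explain the distinction where the second term first appears.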
4. Concrete Examples
Every pattern needs a real, runnable example.
Examples must:
- Be copy-pasteable
- Show actual code/commands
- Include expected output
- Demonstrate the principle
📖 See EXAMPLES.md for good/bad examples
5. File Reference Depth
Keep references one level deep:
See examples.md for detailed patterns # ✅ Good
See examples.md which references patterns.md
which has code in scripts/ # ❌ Bad - too deep
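Reference depth can be checked by following Markdown file names from SKILL.md one hop and seeing whether those files point anywhere new. This is a sketch under simple assumptions: `find_deep_references` is a hypothetical helper, it detects references by file name only (any `*.md` token in the text), and it takes the file contents as a dictionary so it can be tried without touching the filesystem.

```python
import re

# Matches file names such as examples.md or REFERENCE.md mentioned in prose.
MD_REFERENCE = re.compile(r"\b[\w-]+\.md\b")


def find_deep_references(files, entry="SKILL.md"):
    """Return (file, nested_file) pairs where a file referenced by `entry` references
    a file that `entry` itself does not mention. `files` maps file name to text."""
    first_level = set(MD_REFERENCE.findall(files[entry])) - {entry}
    deep = []
    for name in sorted(first_level):
        nested = set(MD_REFERENCE.findall(files.get(name, ""))) - {name, entry}
        for target in sorted(nested - first_level):
            deep.append((name, target))
    return deep
```

A file that SKILL.md mentions directly is treated as one level deep even if a supporting file mentions it as well.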
6. Time-Sensitive Information
Isolate or avoid time-sensitive content:
## Current Best Practice (as of 2024)
Use ast-grep for syntax-aware searches
## Legacy Patterns
Previously, ripgrep was used...
📖 See EXAMPLES.md for deprecation patterns
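A simple phrase scan can surface wording that is likely to go stale. The phrase list below is a hypothetical starting point, not an established standard, and `find_time_sensitive` is an illustrative helper; matches inside a deliberately dated "Current Best Practice" section are expected and only need a quick confirmation.

```python
import re

# Hypothetical starter list of phrases that tend to go stale; extend it for your domain.
TIME_SENSITIVE = re.compile(
    r"\b(just released|brand[- ]new|recently|currently|latest|as of \d{4}|coming soon)\b",
    re.IGNORECASE,
)


def find_time_sensitive(text):
    """Return (line_number, matched_phrase) pairs for phrases that may not age well."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in TIME_SENSITIVE.finditer(line):
            hits.append((number, match.group(0)))
    return hits
```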
Code and Script Quality
Scripts Should Solve Problems
Don't punt to Claude - solve the problem in the script:
- ✅ Validate and return specific errors
- ✅ Handle edge cases explicitly
- ✅ Provide actionable error messages
- ❌ Leave TODOs for Claude to figure out
- ❌ Generic "check this" functions
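To make the contrast concrete, here is a small sketch of a script function that names the exact problem and the next step instead of leaving diagnosis to Claude. `validate_skill_file` and its specific checks are hypothetical and chosen only for illustration.

```python
from pathlib import Path


def validate_skill_file(path):
    """Return None if `path` is a usable skill file, else a specific, actionable error."""
    skill = Path(path)
    if not skill.exists():
        return f"Error: {skill} does not exist. Check the path or create the file first."
    if skill.is_dir():
        return f"Error: {skill} is a directory. Pass the SKILL.md file inside it instead."
    if skill.suffix.lower() != ".md":
        return f"Error: {skill.name} is not a Markdown file. Rename it with a .md extension."
    if skill.stat().st_size == 0:
        return f"Error: {skill.name} is empty. Add frontmatter and instructions first."
    return None
```

Each branch handles one edge case explicitly and tells the reader how to recover, which is the opposite of a generic "check this" function.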
Error Handling
Every error path needs helpful messages:
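import subprocess, sys
try:
    subprocess.run(["jj", "--version"], check=True)  # raises FileNotFoundError if jj is missing
except FileNotFoundError:
    print("Error: jj not found. Install with: brew install jj")
    sys.exit(1)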
No Voodoo Constants
Justify all magic numbers:
TIMEOUT_SECONDS = 30 # API requests take 5-10s, allow 3x buffer
Package Verification
List all dependencies with install instructions:
## Dependencies
Required:
- `ast-grep` - Install: `brew install ast-grep`
Verify: `which ast-grep`
📖 See EXAMPLES.md for detailed patterns
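A script can run the same verification itself before doing any work. The sketch below uses the standard library's `shutil.which`; the `REQUIRED_TOOLS` table and the `missing_tools` helper are illustrative names, with the single entry mirroring the ast-grep dependency listed above.

```python
import shutil

# Maps each required command to its install hint.
REQUIRED_TOOLS = {
    "ast-grep": "brew install ast-grep",
}


def missing_tools(required=None):
    """Return an error message for every required command that is not on PATH."""
    required = REQUIRED_TOOLS if required is None else required
    return [
        f"Error: {tool} not found. Install with: {hint}"
        for tool, hint in required.items()
        if shutil.which(tool) is None
    ]
```

Calling `missing_tools()` at the start of a script and exiting when the list is non-empty gives users the install command up front instead of a traceback later.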
Workflow Quality
Clear Steps
Use numbered steps with verification:
1. **Create directory:**
   ```bash
   mkdir my-dir
   ```
   Verify: `ls my-dir`
2. **Create file:** ...
Decision Trees
Complex workflows need decision points:
```markdown
**Need X?** → Use tool A
**Need Y?** → Use tool B
**Need both?** → Use A then B
```
📖 See EXAMPLES.md for patterns
Testing
Every skill needs testing across:
- Models: Haiku, Sonnet, Opus
- Scenarios: Simple, edge case, complex
- Real usage: New project, no external help
📖 See TESTING.md for detailed testing guidelines
Troubleshooting
Common issues:
- Skill not being invoked → Check description keywords
- Too broad → Split into focused skills
- Too abstract → Add concrete examples
📖 See TROUBLESHOOTING.md for complete guide
Quality Self-Check
Before considering a skill complete, copy this checklist and verify each item:
Skill Quality Verification:
- [ ] Can someone use this without follow-up questions?
- [ ] Would this work in 6 months?
- [ ] Are examples copy-pasteable and runnable?
- [ ] Can you find guidance in < 30 seconds?
- [ ] Are error messages helpful enough?
- [ ] Does the description include key trigger terms?
- [ ] Is SKILL.md under 500 lines?
- [ ] Are file references one level deep?
- [ ] Is terminology consistent throughout?
If any check fails:
- Can't use without follow-up questions → Add more concrete examples
- Won't work in 6 months → Isolate time-sensitive info in "Current Best Practice" sections
- Examples not copy-pasteable → Complete all placeholders and add setup steps
- Can't find guidance quickly → Improve headers and add table of contents
- Error messages unclear → Add context, hints, and recovery steps
- Description lacks triggers → Add specific terms users naturally say
- SKILL.md too long → Split into directory with reference files
- Deep file references → Consolidate or flatten structure
- Inconsistent terminology → Choose one term and use everywhere
Evaluation Scenarios
Test this skill with these scenarios to ensure it works effectively:
Scenario 1: Simple Case - New Skill Creation
Input: "Help me create a new skill for managing Docker containers"
Expected behavior:
- Skill is invoked and recognized
- Provides description template with trigger keywords
- Suggests file structure (single file vs directory)
- Offers checklist for required sections
- Reminds about concrete examples requirement
Verify:
- Skill invocation happens automatically
- Response includes specific checklist items
- Guidance is actionable and clear
Scenario 2: Edge Case - Skill Not Being Invoked
Input: "My skill exists but Claude never uses it"
Expected behavior:
- Skill is invoked and recognized
- Diagnoses common invocation issues
- Checks description for trigger keywords
- Verifies file location and frontmatter format
- Suggests testing phrases
Verify:
- Troubleshooting steps are provided
- Specific fixes offered for each issue
- Testing methodology explained
Scenario 3: Complex Case - Comprehensive Skill Review
Input: "Review my python-scripts skill for quality and best practices"
Expected behavior:
- Skill is invoked and recognized
- Provides complete quality checklist
- Reviews description, structure, examples, and testing
- Identifies specific gaps or issues
- Suggests prioritized improvements
- References relevant sections of EXAMPLES.md
Verify:
- All quality dimensions covered
- Specific, actionable feedback provided
- Prioritization of issues clear
- References to supporting documentation included
Additional Resources
- EXAMPLES.md - Detailed good/bad examples for all principles
- TESTING.md - Complete testing guidelines
- TROUBLESHOOTING.md - Common issues and fixes