Skill Quality Validation Skill

Skill Quality Validation

Ensures Claude Code skills follow best practices for discoverability, structure, content quality, and effectiveness. This skill provides checklists, patterns, and validation criteria for creating high-quality skills.

When to Use This Skill

Use this skill when you see these patterns:

✅ Yes, use this skill for:

"Create a new skill for [topic]"
"Review this skill for quality"
"Why isn't my skill being invoked?"
"Improve this skill's structure"
"Prepare this skill for sharing"
"Debug skill invocation issues"
"Make this skill more effective"

❌ No, use different skills for:

Writing skill content (use topic-specific skills)
Testing specific functionality (use testing skills)
Code review (use code-review skills)

Quick Reference

Core Principles

Every skill must have:

✅ Specific description with trigger keywords (< 100 chars)
✅ Under 500 lines (split into directory if longer)
✅ Concrete examples (not abstract)
✅ Consistent terminology
✅ Progressive disclosure (most important first)

Red flags:

❌ Vague description like "Help with Python"
❌ Single file over 500 lines
❌ Abstract guidance without examples
❌ Mixing terminology (e.g., "commit" and "change" without explanation)
❌ Time-sensitive info (e.g., "new tool just released")

Quality Checklist Workflow

When creating or reviewing a skill, copy this checklist and follow the steps:

Skill Quality Review Progress:
- [ ] Step 1: Verify description and metadata
- [ ] Step 2: Check structure and organization
- [ ] Step 3: Validate content quality
- [ ] Step 4: Review code and scripts (if applicable)
- [ ] Step 5: Test across models
- [ ] Step 6: Perform real usage testing

Step 1: Verify Description and Metadata

Check the YAML frontmatter:

[ ] Description includes specific trigger keywords (what users will say)
[ ] Description explains WHAT the skill does and WHEN to use it
[ ] Description is in third person ("Validates...", not "Apply...")
[ ] Description under 1024 characters
[ ] Priority is set appropriately (5-7 for most skills)
[ ] Name uses lowercase, hyphens, no reserved words

If checks fail: Update frontmatter before proceeding.

Step 2: Check Structure and Organization

Review file organization:

[ ] SKILL.md is under 500 lines
[ ] Uses directory structure if over 500 lines
[ ] "When to Use This Skill" section exists and is clear
[ ] Progressive disclosure: most important content first
[ ] Headers are descriptive and scannable
[ ] File references are one level deep maximum

If checks fail: Reorganize content or split into supporting files.

Step 3: Validate Content Quality

Review the skill content:

[ ] Examples are concrete and copy-pasteable
[ ] All code examples are runnable
[ ] Terminology is consistent throughout
[ ] No time-sensitive information (or properly isolated)
[ ] Workflows have clear numbered steps
[ ] Decision trees for complex choices
[ ] All placeholders are explained or replaced

If checks fail: Add missing examples or clarify instructions.

Step 4: Review Code and Scripts

If skill includes executable code:

[ ] Scripts solve problems (don't punt to Claude)
[ ] Error handling is explicit with helpful messages
[ ] All constants are justified (no "voodoo constants")
[ ] Dependencies are listed with install instructions
[ ] Paths use forward slashes (not backslashes)
[ ] Validation/feedback loops for critical operations

If checks fail: Improve error handling and documentation.

Step 5: Test Across Models

Test with all Claude models:

[ ] Tested with Haiku (simple case works)
[ ] Tested with Sonnet (moderate complexity works)
[ ] Tested with Opus (complex case works)
[ ] Skill invoked correctly in all cases
[ ] Responses follow skill guidance consistently

If checks fail: Adjust description or add more explicit guidance.

Step 6: Perform Real Usage Testing

Test in actual workflows:

[ ] Fresh start test (new project, no external docs)
[ ] Colleague test (someone else uses it)
[ ] Different project test (verify it's project-agnostic)
[ ] Error path test (intentionally trigger failures)

If checks fail: Update skill based on observed issues.

File Structure

For skills under 500 lines:

my-skill.md                # Single file

For skills over 500 lines:

my-skill/
├── SKILL.md              # Main instructions (< 500 lines)
├── examples.md           # Detailed examples
├── reference.md          # API/command reference (optional)
└── scripts/              # Helper scripts (optional)
    └── validate.py

Key principles:

SKILL.md always under 500 lines
Related files use UPPERCASE for visibility (FORMS.md, EXAMPLES.md)
Scripts in subdirectory, executed not loaded as context
Each file has single, clear purpose

Example from real skill:

pdf/
├── SKILL.md              # Core PDF guidance
├── FORMS.md              # Form-filling specific guidance
├── examples.md           # Extended examples
└── scripts/
    ├── analyze_form.py   # Utility script
    └── fill_form.py      # Form processor

Core Quality Standards

1. Description Quality

Format: Frontmatter YAML at top of SKILL.md

---
description: "Specific action + key terms + when to use"
priority: 5
---

Requirements:

Include key terms that trigger the skill
Explain both WHAT and WHEN
Keep under 100 characters
Use terms users naturally say

📖 See EXAMPLES.md for good/bad examples

2. Content Structure

SKILL.md must be:

Under 500 lines total
Well-organized with clear sections
Using progressive disclosure
Focused on one coherent topic

If exceeding 500 lines:

Split into directory structure
Keep core guidance in SKILL.md
Move detailed examples to examples.md
Move reference material to reference.md
Move scripts to scripts/ subdirectory

Progressive disclosure pattern:

# Skill Name

Brief intro (1-2 sentences)

## When to Use

Quick bullet list

## Quick Reference

Most common cases with examples

## Detailed Guidance

(Or link to examples.md)

## Advanced Patterns

(Or link to patterns.md)

3. Terminology Consistency

Rules:

Use consistent terms throughout all files
Establish vocabulary early
Explain synonyms when first used
Don't mix related terms without explanation

📖 See EXAMPLES.md for patterns

4. Concrete Examples

Every pattern needs a real, runnable example.

Examples must:

Be copy-pasteable
Show actual code/commands
Include expected output
Demonstrate the principle

📖 See EXAMPLES.md for good/bad examples

5. File Reference Depth

Keep references one level deep:

See examples.md for detailed patterns # ✅ Good

See examples.md which references patterns.md
which has code in scripts/ # ❌ Bad - too deep

6. Time-Sensitive Information

Isolate or avoid time-sensitive content:

## Current Best Practice (as of 2024)

Use ast-grep for syntax-aware searches

## Legacy Patterns

Previously, ripgrep was used...

📖 See EXAMPLES.md for deprecation patterns

Code and Script Quality

Scripts Should Solve Problems

Don't punt to Claude - solve the problem in the script:

✅ Validate and return specific errors
✅ Handle edge cases explicitly
✅ Provide actionable error messages
❌ Leave TODOs for Claude to figure out
❌ Generic "check this" functions

Error Handling

Every error path needs helpful messages:

except FileNotFoundError:
    print("Error: jj not found. Install with: brew install jj")
    sys.exit(1)

No Voodoo Constants

Justify all magic numbers:

TIMEOUT_SECONDS = 30  # API requests take 5-10s, allow 3x buffer

Package Verification

List all dependencies with install instructions:

## Dependencies

Required:

- `ast-grep` - Install: `brew install ast-grep`

Verify: `which ast-grep`

📖 See EXAMPLES.md for detailed patterns

Workflow Quality

Clear Steps

Use numbered steps with verification:

1. **Create directory:**
   ```bash
   mkdir my-dir
   ```

Verify: ls my-dir

Create file: ...


### Decision Trees

**Complex workflows need decision points:**

```markdown
**Need X?** → Use tool A
**Need Y?** → Use tool B
**Need both?** → Use A then B

📖 See EXAMPLES.md for patterns

Testing

Every skill needs testing across:

Models: Haiku, Sonnet, Opus
Scenarios: Simple, edge case, complex
Real usage: New project, no external help

📖 See TESTING.md for detailed testing guidelines

Troubleshooting

Common issues:

Skill not being invoked → Check description keywords
Too broad → Split into focused skills
Too abstract → Add concrete examples

📖 See TROUBLESHOOTING.md for complete guide

Quality Self-Check

Before considering a skill complete, copy this checklist and verify each item:

Skill Quality Verification:
- [ ] Can someone use this without follow-up questions?
- [ ] Would this work in 6 months?
- [ ] Are examples copy-pasteable and runnable?
- [ ] Can you find guidance in < 30 seconds?
- [ ] Are error messages helpful enough?
- [ ] Does the description include key trigger terms?
- [ ] Is SKILL.md under 500 lines?
- [ ] Are file references one level deep?
- [ ] Is terminology consistent throughout?

If any check fails:

Can't use without follow-up questions → Add more concrete examples
Won't work in 6 months → Isolate time-sensitive info in "Current Best Practice" sections
Examples not copy-pasteable → Complete all placeholders and add setup steps
Can't find guidance quickly → Improve headers and add table of contents
Error messages unclear → Add context, hints, and recovery steps
Description lacks triggers → Add specific terms users naturally say
SKILL.md too long → Split into directory with reference files
Deep file references → Consolidate or flatten structure
Inconsistent terminology → Choose one term and use everywhere

Evaluation Scenarios

Test this skill with these scenarios to ensure it works effectively:

Scenario 1: Simple Case - New Skill Creation

Input: "Help me create a new skill for managing Docker containers"

Expected behavior:

Skill is invoked and recognized
Provides description template with trigger keywords
Suggests file structure (single file vs directory)
Offers checklist for required sections
Reminds about concrete examples requirement

Verify:

Skill invocation happens automatically
Response includes specific checklist items
Guidance is actionable and clear

Scenario 2: Edge Case - Skill Not Being Invoked

Input: "My skill exists but Claude never uses it"

Expected behavior:

Skill is invoked and recognized
Diagnoses common invocation issues
Checks description for trigger keywords
Verifies file location and frontmatter format
Suggests testing phrases

Verify:

Troubleshooting steps are provided
Specific fixes offered for each issue
Testing methodology explained

Scenario 3: Complex Case - Comprehensive Skill Review

Input: "Review my python-scripts skill for quality and best practices"

Expected behavior:

Skill is invoked and recognized
Provides complete quality checklist
Reviews description, structure, examples, and testing
Identifies specific gaps or issues
Suggests prioritized improvements
References relevant sections of examples.md

Verify:

All quality dimensions covered
Specific, actionable feedback provided
Prioritization of issues clear
References to supporting documentation included

Additional Resources

EXAMPLES.md - Detailed good/bad examples for all principles
TESTING.md - Complete testing guidelines
TROUBLESHOOTING.md - Common issues and fixes

Agent Skills: Skill Quality Validation

Install this agent skill to your local

Skill Files