Skill Validator Skill | Agent Skills

Skill Validator

Validate any skill against production-level quality criteria.

Validation Workflow

Phase 1: Gather Context

Read the skill's SKILL.md completely
Identify skill type from frontmatter description:
- Builder skill (creates artifacts)
- Guide skill (provides instructions)
- Automation skill (executes workflows)
- Analyzer skill (extracts insights)
- Validator skill (enforces quality)
- Hybrid skill (combination of above)
Read all reference files in references/ directory
Check for assets/scripts directories
Note frontmatter fields (name, description, allowed-tools, model)

Phase 2: Apply Criteria

Evaluate against 9 criteria categories. Each criterion scores 0-3:

0: Missing/Absent
1: Present but inadequate
2: Adequate implementation
3: Excellent implementation

Criteria Categories

1. Structure & Anatomy (Weight: 12%)

| Criterion | What to Check | |-----------|---------------| | SKILL.md exists | Root file present | | Line count | <500 lines (context is precious) | | Frontmatter complete | name and description present in YAML | | Name constraints | Lowercase, numbers, hyphens only; ≤64 chars; matches directory | | Description format | [What] + [When] format; ≤1024 chars | | Description style | Third-person: "This skill should be used when..." | | No extraneous files | No README.md, CHANGELOG.md, LICENSE in skill dir | | Progressive disclosure | Details in references/, not bloated SKILL.md | | Asset organization | Templates in assets/, scripts in scripts/ | | Large file guidance | If references >10k words, grep patterns in SKILL.md |

Fail condition: Missing SKILL.md or >800 lines = automatic fail

2. Content Quality (Weight: 15%)

| Criterion | What to Check | |-----------|---------------| | Conciseness | No verbose explanations, context is public good | | Imperative form | Instructions use "Do X" not "You should do X" | | Appropriate freedom | Constraints where needed, flexibility where safe | | Scope clarity | Clear what skill does AND does not do | | No hallucination risk | No instructions that encourage making up info | | Output specification | Clear expected outputs defined |

3. User Interaction (Weight: 12%)

| Criterion | What to Check | |-----------|---------------| | Clarification triggers | Asks questions before acting on ambiguity | | Required vs optional | Distinguishes must-know from nice-to-know | | Graceful handling | What to do when user doesn't answer | | No over-asking | Doesn't ask obvious or inferrable questions | | Question pacing | Avoids too many questions in single message | | Context awareness | Uses available context before asking |

Key pattern to look for:

## Required Clarifications
1. Question about X
2. Question about Y

## Optional Clarifications
3. Question about Z (if relevant)

Note: Avoid asking too many questions in a single message.

4. Documentation & References (Weight: 10%)

| Criterion | What to Check | |-----------|---------------| | Source URLs | Official documentation links provided | | Reference files | Complex details in references/ not main file | | Fetch guidance | Instructions to fetch docs for unlisted patterns | | Version awareness | Notes about checking for latest patterns | | Example coverage | Good/bad examples for key patterns |

Key pattern to look for:

| Resource | URL | Use For |
|----------|-----|---------|
| Official Docs | https://... | Complex cases |

5. Domain Standards (Weight: 10%)

| Criterion | What to Check | |-----------|---------------| | Best practices | Follows domain conventions (e.g., WCAG, OWASP) | | Enforcement mechanism | Checklists, validation steps, must-verify items | | Anti-patterns | Lists what NOT to do | | Quality gates | Output checklist before delivery |

Key pattern to look for:

### Must Follow
- [ ] Requirement 1
- [ ] Requirement 2

### Must Avoid
- Antipattern 1
- Antipattern 2

6. Technical Robustness (Weight: 8%)

| Criterion | What to Check | |-----------|---------------| | Error handling | Guidance for failure scenarios | | Security considerations | Input validation, secrets handling if relevant | | Dependencies | External tools/APIs documented | | Edge cases | Common edge cases addressed | | Testability | Can outputs be verified? |

7. Maintainability (Weight: 8%)

| Criterion | What to Check | |-----------|---------------| | Modularity | References are self-contained topics | | Update path | Easy to update when standards change | | No hardcoded values | Uses placeholders/variables where appropriate | | Clear organization | Logical section ordering |

8. Zero-Shot Implementation (Weight: 12%)

Skills should enable single-interaction implementation with embedded expertise.

| Criterion | What to Check | |-----------|---------------| | Before Implementation section | Context gathering guidance present | | Codebase context | Guidance to scan existing structure/patterns | | Conversation context | Uses discussed requirements/decisions | | Embedded expertise | Domain knowledge in references/, not runtime discovery | | User-only questions | Only asks for USER requirements, not domain knowledge |

Key pattern to look for:

## Before Implementation

Gather context to ensure successful implementation:

| Source | Gather |
|--------|--------|
| **Codebase** | Existing structure, patterns, conventions |
| **Conversation** | User's specific requirements |
| **Skill References** | Domain patterns from `references/` |
| **User Guidelines** | Project-specific conventions |

Red flag: Skill instructs to "research" or "discover" domain knowledge at runtime instead of embedding it.

9. Reusability (Weight: 13%)

Skills should handle variations, not single requirements.

| Criterion | What to Check | |-----------|---------------| | Handles variations | Not hardcoded to single use case | | Variable elements | Clarifications capture what VARIES | | Constant patterns | Domain best practices encoded as constants | | Not requirement-specific | Avoids hardcoded data, tools, configs | | Abstraction level | Appropriate generalization for domain |

Good example:

"Create visualizations - adaptable to data shape, chart type, library"

Bad example (too specific):

"Create bar chart with sales data using Recharts"

Key check: Does the skill work for multiple use cases within its domain?

Type-Specific Validation

After scoring general criteria, verify type-specific requirements:

| Type | Must Have | |------|-----------| | Builder | Clarifications, Output Spec, Domain Standards, Output Checklist | | Guide | Workflow Steps, Examples (Good/Bad), Official Docs links | | Automation | Scripts in scripts/, Dependencies, Error Handling, I/O Spec | | Analyzer | Analysis Scope, Evaluation Criteria, Output Format, Synthesis | | Validator | Quality Criteria, Scoring Rubric, Thresholds, Remediation |

Scoring: Deduct 10 points if type-specific requirements missing for identified type.

Scoring Guide

Category Scores

Calculate each category score:

Category Score = (Sum of criterion scores) / (Max possible) * 100

Overall Score

Overall = Σ(Category Score × Weight)

Rating Thresholds

| Score | Rating | Meaning | |-------|--------|---------| | 90-100 | Production | Ready for wide use | | 75-89 | Good | Minor improvements needed | | 60-74 | Adequate | Functional but needs work | | 40-59 | Developing | Significant gaps | | 0-39 | Incomplete | Major rework required |

Output Format

Generate validation report:

# Skill Validation Report: [skill-name]

**Rating**: [Production/Good/Adequate/Developing/Incomplete]
**Overall Score**: [X]/100

## Summary
[2-3 sentence assessment]

## Category Scores

| Category | Score | Weight | Weighted |
|----------|-------|--------|----------|
| Structure & Anatomy | X/100 | 12% | X |
| Content Quality | X/100 | 15% | X |
| User Interaction | X/100 | 12% | X |
| Documentation | X/100 | 10% | X |
| Domain Standards | X/100 | 10% | X |
| Technical Robustness | X/100 | 8% | X |
| Maintainability | X/100 | 8% | X |
| Zero-Shot Implementation | X/100 | 12% | X |
| Reusability | X/100 | 13% | X |
| **Type-Specific Deduction** | -X | - | -X |

## Critical Issues (if any)
- [Issue requiring immediate fix]

## Improvement Recommendations
1. **High Priority**: [Specific action]
2. **Medium Priority**: [Specific action]
3. **Low Priority**: [Specific action]

## Strengths
- [What skill does well]

Quick Validation Checklist

For rapid assessment, check these critical items:

Structure & Frontmatter

[ ] SKILL.md <500 lines
[ ] Frontmatter: name (≤64 chars, lowercase, hyphens) + description (≤1024 chars)
[ ] Description uses third-person style ("This skill should be used when...")
[ ] No README.md/CHANGELOG.md in skill directory

Content & Interaction

[ ] Has clarification questions (Required vs Optional)
[ ] Has output specification
[ ] Has official documentation links

Zero-Shot & Reusability

[ ] Has "Before Implementation" section (context gathering)
[ ] Domain expertise embedded in references/ (not runtime discovery)
[ ] Handles variations (not requirement-specific)

Type-Specific (check based on skill type)

[ ] Builder: Clarifications + Output Spec + Standards + Checklist
[ ] Guide: Workflow + Examples + Docs
[ ] Automation: Scripts + Dependencies + Error Handling
[ ] Analyzer: Scope + Criteria + Output Format
[ ] Validator: Criteria + Scoring + Thresholds + Remediation

If 10+ checked: Likely Production (90+) If 7-9 checked: Likely Good (75-89) If 5-6 checked: Likely Adequate (60-74) If <5 checked: Needs significant work

Reference Files

| File | When to Read | |------|--------------| | references/detailed-criteria.md | Deep evaluation of specific criterion | | references/scoring-examples.md | Example validations for calibration | | references/improvement-patterns.md | Common fixes for common issues |

Usage Examples

Validate a skill

Validate the creating-chatgpt-widgets skill against production criteria

Quick audit

Quick validation check on mcp-builder skill

Focused review

Check if skill-creator skill has proper user interaction patterns

Agent Skills: Skill Validator

Install this agent skill to your local

Skill Files

Skill Validator

Validation Workflow

Phase 1: Gather Context

Phase 2: Apply Criteria

Criteria Categories

1. Structure & Anatomy (Weight: 12%)

2. Content Quality (Weight: 15%)

3. User Interaction (Weight: 12%)

4. Documentation & References (Weight: 10%)

5. Domain Standards (Weight: 10%)

6. Technical Robustness (Weight: 8%)

7. Maintainability (Weight: 8%)

8. Zero-Shot Implementation (Weight: 12%)

9. Reusability (Weight: 13%)

Type-Specific Validation

Scoring Guide

Category Scores

Overall Score

Rating Thresholds

Output Format

Quick Validation Checklist

Structure & Frontmatter

Content & Interaction

Zero-Shot & Reusability

Type-Specific (check based on skill type)

Reference Files

Usage Examples

Validate a skill

Quick audit

Focused review