Agent Skill Creator Standard Skill

Agent Skill Creator Standard

Priority: P0 — Apply to ALL skills

Maximize Token ROI. Every line in SKILL.md must provide specific procedural value. Activation (how it triggers) and Implementation (how it helps) are the primary quality metrics.

Three-Level Loading System

Level 1 Frontmatter: name + description (Activation Anchor), ≤100 words.
Level 2 SKILL.md body: Core Rules + Workflows (Implementation Core), ≤100 lines.
Level 3 references/: Detailed examples, schemas, and "TESTS.md" (On-demand).
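The Level 1 and Level 2 limits above could be checked mechanically. The following is a minimal sketch, assuming the frontmatter is delimited by `---` lines and the description fits on a single `description:` line; the function names are hypothetical and not part of any published tooling.

```python
import re

MAX_DESCRIPTION_WORDS = 100  # Level 1 limit stated above
MAX_BODY_LINES = 100         # Level 2 limit stated above


def split_frontmatter(text: str) -> tuple[str, str]:
    """Split SKILL.md text into (frontmatter, body).

    Assumption: frontmatter sits between two lines containing only '---'.
    Returns an empty frontmatter if that delimiter pattern is absent.
    """
    match = re.match(r"^---\n(.*?)\n---\n?(.*)$", text, re.DOTALL)
    if not match:
        return "", text
    return match.group(1), match.group(2)


def check_limits(text: str) -> list[str]:
    """Return human-readable violations of the two size limits."""
    frontmatter, body = split_frontmatter(text)
    description = ""
    for line in frontmatter.splitlines():
        if line.startswith("description:"):
            description = line[len("description:"):].strip()

    problems = []
    word_count = len(description.split())
    if word_count > MAX_DESCRIPTION_WORDS:
        problems.append(f"description has {word_count} words (limit {MAX_DESCRIPTION_WORDS})")
    line_count = len(body.splitlines())
    if line_count > MAX_BODY_LINES:
        problems.append(f"body has {line_count} lines (limit {MAX_BODY_LINES})")
    return problems
```

An empty list means both limits hold; anything returned is a candidate for moving content to references/.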

Workflow (New or Existing Skill)

New skill:

1. Research: web-search domain best practices, checklists, and standards; extract key terms → triggers, workflows → guidelines, mistakes → anti-patterns. See Web Search Research.
2. Capture intent: what does it do, when should it trigger, and what is the expected output format?
3. Write SKILL.md: draft using TEMPLATE.md.
4. Test: spawn parallel subagents, one with-skill and one without-skill (baseline).
5. Evaluate: grade assertions, review benchmark (pass rate, tokens, time).
6. Iterate: rewrite based on feedback, rerun into next iteration dir, repeat.
7. Optimize description: run trigger eval queries, target ≥80% accuracy.
8. Pressure-test: for discipline skills, capture agent excuses, red flags, and stop conditions.

Existing skill:

1. Audit: run Quality Checklist below; identify violations.
2. Snapshot: cp -r <skill-dir> <workspace>/skill-snapshot/ before any edits.
3. Improve SKILL.md: fix violations, compress, move oversized content to references/.
4. Test: spawn parallel subagents, one with-new-skill and one with-snapshot (baseline).
5. Evaluate & iterate: same as the Evaluate and Iterate steps (5 and 6) of the new-skill workflow.
6. Optimize description: re-run trigger eval if description changed.
7. Harden: add rationalization counters where agents still fail under pressure.

See Eval Workflow for full testing + iteration details.
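The Evaluate step compares a with-skill run against its baseline on pass rate, tokens, and time. A minimal sketch of that comparison follows; the `RunResult` type and its field names are illustrative assumptions, not a format defined by this standard or by the Eval Workflow reference.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """Outcome of one subagent run. All field names are hypothetical."""
    passed_assertions: int
    total_assertions: int
    tokens: int
    seconds: float

    @property
    def pass_rate(self) -> float:
        # Guard against a run that graded no assertions at all.
        if self.total_assertions == 0:
            return 0.0
        return self.passed_assertions / self.total_assertions


def compare(with_skill: RunResult, baseline: RunResult) -> dict[str, float]:
    """Return with-skill minus baseline for each benchmark dimension."""
    return {
        "pass_rate_delta": with_skill.pass_rate - baseline.pass_rate,
        "token_delta": with_skill.tokens - baseline.tokens,
        "time_delta": with_skill.seconds - baseline.seconds,
    }
```

A positive `pass_rate_delta` with a non-positive `token_delta` would suggest the skill is paying for itself; how to weigh the three numbers against each other is left to the reviewer.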

Description Quality (Activation)

Third-Person Voice: Use Standardizes..., Audits..., Encrypts.... Avoid "I will" or "This skill helps to".
What + When Structure:
What: Define 5–8 specific capabilities (e.g., "Generates JWT tokens, rotates keys").
When: Explicitly define triggers (e.g., "Use when user says 'rotate keys'").
Specificity: Avoid vague verbs like "manage" or "handle". Use "Validate", "Inject", "Refactor", "Sanitize".
Trigger Hint: Include (triggers: *.ext, keyword) suffix for technical skills.
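Several of the description rules above lend themselves to a simple lint pass. The sketch below is one possible reading, assuming the "When" part always contains the phrase "Use when"; the word lists come from the examples in this section, and `lint_description` is a hypothetical helper.

```python
import re

# Vague verbs and first-person phrasings named in the rules above.
VAGUE_VERBS = {"manage", "manages", "handle", "handles"}
DISCOURAGED_PHRASES = ("i will", "this skill helps")
MAX_WORDS = 100


def lint_description(description: str) -> list[str]:
    """Return a list of rule violations found in a skill description."""
    lowered = description.lower()
    words = set(re.findall(r"[a-z]+", lowered))

    issues = []
    for verb in sorted(VAGUE_VERBS & words):
        issues.append(f"vague verb: {verb}")
    for phrase in DISCOURAGED_PHRASES:
        if phrase in lowered:
            issues.append(f"avoid phrase: {phrase!r}")
    # Assumption: a 'When' clause is signalled by the literal words 'use when'.
    if "use when" not in lowered:
        issues.append("missing 'When' trigger")
    if len(description.split()) > MAX_WORDS:
        issues.append(f"description exceeds {MAX_WORDS} words")
    return issues
```

This only catches surface patterns; it cannot judge whether the listed capabilities are specific enough, which still needs the trigger eval.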

Content Quality (Implementation)

No Redundant Knowledge: Do NOT explain concepts AI already knows (e.g., HTTP status codes, standard library docs, basic SOLID principles). Focus strictly on project-specific rules.
Caveman Compression: Use "Caveman Mode" for rules to save tokens. Drop articles (a, an, the), remove filler words ("should", "will"), and use telegraphic snippets.
Standard: "You should ensure that the database connection is closed after every query to prevent leaks." (15 tokens)
Caveman: "Close DB connection after query. Prevent leaks." (7 tokens)
Actionability: Examples must be copy-paste ready and executable.
Workflow Clarity: Use sequential ordered lists for multi-step processes.
Progressive Disclosure: Move code blocks >10 lines to references/.
Pressure Hardening: Discipline skills must name red flags, common excuses, and exact stop/restart conditions.
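The Progressive Disclosure rule (code blocks over 10 lines move to references/) can be checked by scanning for fenced blocks. This is a rough sketch, assuming fences are lines that begin with three backticks and are never nested; `oversized_code_blocks` is a hypothetical name.

```python
FENCE = "`" * 3       # built at runtime so this example holds no literal fence
MAX_CODE_LINES = 10   # threshold from the Progressive Disclosure rule


def oversized_code_blocks(markdown: str) -> list[tuple[int, int]]:
    """Return (opening_fence_line, code_line_count) for each block over the limit.

    Line numbers are 1-indexed. An unclosed final fence is ignored.
    """
    results = []
    start = None
    for number, line in enumerate(markdown.splitlines(), start=1):
        if not line.strip().startswith(FENCE):
            continue
        if start is None:
            start = number                 # opening fence
        else:
            length = number - start - 1    # lines strictly between the fences
            if length > MAX_CODE_LINES:
                results.append((start, length))
            start = None                   # closing fence, reset
    return results
```

Each tuple returned points at a block that is a candidate for extraction to references/.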

Behavior Guardrails

Use pressure tests for discipline skills: TDD, debugging, verification, review, protocol, and workflow skills need baseline failure evidence.
Capture rationalizations: Save the exact excuses agents use when they skip the rule.
Add red flags: Short phrases that tell the agent to stop and restart the protocol.
Encode behavior in evals: Add pressure_scenarios, rationalizations, red_flags, and behavior_assertions when the skill is guardrail-oriented.
Keep it local: Put behavior details in evals or references/; keep SKILL.md compact.
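For a guardrail-oriented skill, an eval entry might carry the four fields named above. The sketch below shows one plausible shape as a Python dict, with a helper that reports missing fields; only the field names come from this standard, while the example values and the `missing_guardrail_fields` helper are invented for illustration.

```python
# Field names taken from the "Encode behavior in evals" rule above.
REQUIRED_GUARDRAIL_FIELDS = (
    "pressure_scenarios",
    "rationalizations",
    "red_flags",
    "behavior_assertions",
)

# Hypothetical entry for a TDD discipline skill; every value is made up.
example_eval = {
    "pressure_scenarios": ["User says the deadline is in ten minutes and asks to skip tests."],
    "rationalizations": ["Change is too small to need a test."],
    "red_flags": ["just this once", "too small to test"],
    "behavior_assertions": ["Agent writes a failing test before editing production code."],
}


def missing_guardrail_fields(eval_entry: dict) -> list[str]:
    """Return required fields that are absent or empty, in declaration order."""
    return [name for name in REQUIRED_GUARDRAIL_FIELDS if not eval_entry.get(name)]
```

Keeping such entries in the eval files rather than in SKILL.md follows the "Keep it local" rule.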

Anti-Patterns

No "AI-splaining": Do not explain why a pattern is good unless it reflects a unique project constraint.
No Vague Triggers: Never use src/** or **/*. Be surgical.
No Description Bloat: If description exceeds 100 words, some capabilities belong in body.
No long code blocks: >10 lines → extract to references/
No redundancy: don't repeat frontmatter content in body
No untested guardrails: Rules that were never pressure-tested are speculation.
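The vague-trigger rule can be checked against the "(triggers: ...)" suffix described under Description Quality. The following sketch assumes that suffix is comma-separated and appears at most once; both function names are hypothetical.

```python
import re

# The two catch-all globs this standard names as anti-patterns.
VAGUE_GLOBS = {"src/**", "**/*"}


def parse_triggers(description: str) -> list[str]:
    """Extract the items of a '(triggers: a, b)' suffix, or [] if none is present."""
    match = re.search(r"\(triggers:\s*([^)]*)\)", description)
    if not match:
        return []
    return [item.strip() for item in match.group(1).split(",") if item.strip()]


def vague_triggers(description: str) -> list[str]:
    """Return the listed triggers that are catch-all globs."""
    return [trigger for trigger in parse_triggers(description) if trigger in VAGUE_GLOBS]
```

Only exact matches are flagged here; other overly broad patterns would need to be added to `VAGUE_GLOBS` by hand.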

Quality Checklist (Tessl-Aligned)

[ ] Activation ≥ 90%: Description covers both capabilities ("What") and triggers ("When").
[ ] Implementation ≥ 90%: No general-purpose explanations; all examples executable.
[ ] Structural Compliance: SKILL.md ≤ 100 lines; code blocks moved to references/.
[ ] Trigger rate ≥80% on should-trigger queries.
[ ] Guardrail skills include rationalizations, red flags, and behavior eval fields.
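The trigger-rate item in the checklist is a simple ratio over should-trigger queries. A minimal sketch, assuming each eval query is recorded as a (should_trigger, did_trigger) pair; that record shape is an assumption, not a format defined by this standard.

```python
TRIGGER_RATE_TARGET = 0.80  # threshold from the checklist above


def trigger_rate(results: list[tuple[bool, bool]]) -> float:
    """Fraction of should-trigger queries on which the skill actually triggered.

    Each item is (should_trigger, did_trigger). Queries that should not
    trigger are ignored here; they measure false positives, not trigger rate.
    """
    relevant = [did for should, did in results if should]
    if not relevant:
        raise ValueError("no should-trigger queries in results")
    return sum(relevant) / len(relevant)


def meets_target(results: list[tuple[bool, bool]]) -> bool:
    """True when the trigger rate reaches the checklist threshold."""
    return trigger_rate(results) >= TRIGGER_RATE_TARGET
```

Tracking false positives on should-not-trigger queries would need a separate metric.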

References

Skill Template — load when starting new skill from scratch
Anti-Patterns Detail — load when fixing or reviewing anti-pattern format
Size & Limits — load when SKILL.md approaches 100 lines
Resource Organization — load when deciding where to place content (scripts/, references/, assets/)
Testing & Trigger Rate — load when writing evals or measuring trigger rate
Eval Workflow — load when running parallel subagent tests
Full Lifecycle — load for complete phase-by-phase creation guide
Web Search Research — load when creating skill for unfamiliar or non-engineering domain
