code-reviewer
Use when explicitly asked to run the code-reviewer subagent or when another skill requires the code-reviewer agent card.
sf-ai-agentforce-testing
>
claude-extensibility
Claude Code extensibility: agents, skills, output styles. Capabilities: create/update/delete agents and skills, YAML frontmatter, system prompts, tool/model selection, resumable agents, CLI-defined agents. Actions: create, edit, delete, optimize, test extensions. Keywords: agent, skill, output-style, SKILL.md, subagent, Task tool, progressive disclosure. Use when: creating agents/skills, editing extensions, configuring tool access, choosing models, testing activation.
spawning-agents-on-the-command-line
Use when subagents need to delegate work but can't use the Task tool, or when skills need testing in an isolated context - spawns Claude instances via CLI backgrounding with JSON responses
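A minimal sketch of the pattern this card describes, assuming the `claude` CLI's print mode (`-p`) and `--output-format json` are available; the prompts and the `"result"` field read from the payload are illustrative assumptions and should be checked against the installed CLI version.

```python
import json
import subprocess

def spawn_claude(prompt: str) -> subprocess.Popen:
    """Launch a background Claude instance that answers one prompt and exits."""
    # -p runs non-interactively; --output-format json returns a single JSON blob on stdout.
    return subprocess.Popen(
        ["claude", "-p", prompt, "--output-format", "json"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )

# Spawn isolated instances in parallel, then collect their JSON responses.
procs = [spawn_claude(p) for p in ["Summarize README.md", "List TODOs in src/"]]
for proc in procs:
    out, _ = proc.communicate(timeout=300)
    payload = json.loads(out)
    # The "result" key is an assumption; inspect the payload for your CLI version.
    print(payload.get("result"))
```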
testing-skills-with-subagents
Use when creating or editing skills, before deployment, to verify they work under pressure and resist rationalization - applies the RED-GREEN-REFACTOR cycle to process documentation by running a baseline without the skill, writing the skill to address failures, and iterating to close loopholes
determinism
Use when verifying outcomes with code instead of LLM judgment, versioning prompts with hashes, or ensuring reproducible agent behavior. Load for any critical verification. Scripts return boolean exit codes, not subjective assessments. Prompts use semantic versioning with SHA256 validation.
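As a rough illustration of the verification style described above (boolean exit codes, SHA256-pinned prompt versions), here is a minimal sketch; the file path and the `PROMPT_SHA256` value are hypothetical placeholders.

```python
import hashlib
import sys
from pathlib import Path

# Hypothetical pinned hash for a semantically versioned prompt file;
# recompute and update this constant whenever the prompt is re-versioned.
PROMPT_PATH = Path("prompts/reviewer_v1.2.0.md")
PROMPT_SHA256 = "0000000000000000000000000000000000000000000000000000000000000000"

def prompt_is_valid() -> bool:
    """Return True only if the prompt on disk matches the pinned hash."""
    digest = hashlib.sha256(PROMPT_PATH.read_bytes()).hexdigest()
    return digest == PROMPT_SHA256

if __name__ == "__main__":
    # The exit code carries the verdict: 0 = verified, 1 = drifted. No subjective judgment.
    sys.exit(0 if prompt_is_valid() else 1)
```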
Convex Agents Debugging
Troubleshoots agent behavior, logs LLM interactions, and inspects database state. Use this when responses are unexpected, to understand the context the LLM receives, or to diagnose data issues.
Convex Agents Playground
Sets up a web UI for testing, debugging, and developing agents without code. Use this to manually test agents, browse conversation history, and verify behavior in real-time.
testing-workflows-with-subagents
Use when creating or editing commands, orchestrator prompts, or workflow documentation before deployment - applies the RED-GREEN-REFACTOR cycle to test instruction clarity by finding real execution failures, creating test scenarios, and verifying fixes with subagents
self-test-skill-invocation
Use when the user asks to "test skill invocation framework" or mentions "canary skill test". This is a self-test skill that verifies the test framework correctly loads and invokes skills.
pydantic-ai-testing
Test PydanticAI agents using TestModel, FunctionModel, VCR cassettes, and inline snapshots. Use when writing unit tests, mocking LLM responses, or recording API interactions.
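A minimal sketch of the TestModel pattern named above, assuming a current pydantic-ai release; the agent definition is hypothetical, and the result attribute name has changed across versions.

```python
from pydantic_ai import Agent
from pydantic_ai.models.test import TestModel

# Hypothetical agent under test.
agent = Agent("openai:gpt-4o", system_prompt="Answer support questions briefly.")

def test_agent_runs_without_real_llm_calls():
    # TestModel generates structurally valid responses locally,
    # so no API key or network access is needed during the test.
    with agent.override(model=TestModel()):
        result = agent.run_sync("Where is my order?")
    # Newer releases expose the text on result.output (older ones used result.data).
    assert result.output is not None
```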
llm-artifacts-detection
Detects common LLM coding agent artifacts in codebases. Identifies test quality issues, dead code, over-abstraction, and verbose LLM-style patterns. Use when cleaning up AI-generated code or reviewing for agent-introduced cruft.
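The skill itself is prose-driven, but a rough sketch of the kind of check it describes might look like this; the patterns, labels, and `src` glob are illustrative assumptions, not the skill's actual rules.

```python
import re
from pathlib import Path

# Illustrative patterns for common agent-introduced cruft; a real pass would be broader.
SUSPECT_PATTERNS = {
    "placeholder test": re.compile(r"assert\s+True\b"),
    "leftover debug print": re.compile(r"^\s*print\(['\"]DEBUG"),
    "chatty AI comment": re.compile(r"#\s*(Note:|Certainly|As an AI)", re.IGNORECASE),
}

def scan(root: str = "src") -> list[tuple[str, int, str]]:
    """Return (file, line number, label) for each suspected artifact."""
    hits = []
    for path in Path(root).rglob("*.py"):
        for lineno, line in enumerate(path.read_text().splitlines(), start=1):
            for label, pattern in SUSPECT_PATTERNS.items():
                if pattern.search(line):
                    hits.append((str(path), lineno, label))
    return hits

if __name__ == "__main__":
    for file, lineno, label in scan():
        print(f"{file}:{lineno}: {label}")
```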
pydantic-ai-common-pitfalls
Avoid common mistakes and debug issues in PydanticAI agents. Use when encountering errors, unexpected behavior, or when reviewing agent implementations.
evaluation
Build evaluation frameworks for agent systems. Use when testing agent performance, validating context engineering choices, or measuring improvements over time.
writing-skills
Use when creating new skills, editing existing skills, or verifying skills work - applies TDD to documentation by testing with subagents before writing
evaluation
This skill should be used when the user asks to "evaluate agent performance", "build test framework", "measure agent quality", "create evaluation rubrics", or mentions LLM-as-judge, multi-dimensional evaluation, agent testing, or quality gates for agent pipelines.
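As a sketch only: a multi-dimensional rubric with a quality gate might be wired up roughly like this; every name here (the rubric dimensions, the `judge` callable, the 0.8 threshold) is a hypothetical stand-in rather than this skill's actual API.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical rubric dimensions, each scored 0.0-1.0 by a judge (LLM-as-judge or scripted).
RUBRIC = ("correctness", "groundedness", "tone")

@dataclass
class EvalCase:
    prompt: str
    expected_facts: list[str]

def evaluate(cases: list[EvalCase],
             run_agent: Callable[[str], str],
             judge: Callable[[str, EvalCase, str], float],
             threshold: float = 0.8) -> bool:
    """Score every case on every rubric dimension; gate on the mean score."""
    scores = []
    for case in cases:
        answer = run_agent(case.prompt)
        scores.extend(judge(dim, case, answer) for dim in RUBRIC)
    mean = sum(scores) / len(scores)
    print(f"mean rubric score: {mean:.2f} over {len(cases)} cases")
    return mean >= threshold  # quality gate: fail the pipeline below threshold
```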