diagram
Generate publication-quality technical diagrams using Nano Banana Pro (Gemini 3 Pro Image) with AI-powered quality review. Smart iteration regenerates only when quality falls below the threshold.
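A minimal sketch of the regenerate-only-below-threshold loop this entry describes; `generate_diagram` and `review_quality` are hypothetical stubs standing in for the real image and review calls, and the threshold value is an assumption.

```python
import random

def generate_diagram(prompt: str) -> bytes:
    return prompt.encode()                      # placeholder for the image-generation call

def review_quality(image: bytes) -> float:
    return random.random()                      # placeholder for the AI quality score in [0, 1]

def generate_with_review(prompt: str, threshold: float = 0.8, max_rounds: int = 3) -> bytes:
    image = generate_diagram(prompt)            # first render
    for _ in range(max_rounds - 1):
        if review_quality(image) >= threshold:  # good enough: stop iterating
            break
        image = generate_diagram(prompt)        # regenerate only when below threshold
    return image

print(len(generate_with_review("sequence diagram of the auth flow")))
```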
evaluation-metrics
LLM evaluation frameworks, benchmarks, and quality metrics for production systems.
llm-evaluation
tech-debt-tracker
Automated technical debt identification, tracking, and prioritization system
model-evaluation
Evaluates machine learning models for performance, fairness, and reliability using appropriate metrics and validation techniques. Trigger keywords: model evaluation, metrics, accuracy, precision, recall, F1, ROC, AUC, cross-validation, ML testing.
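A minimal sketch of the kind of evaluation this entry names, computing precision, recall, F1, ROC AUC, and cross-validated accuracy with scikit-learn on a toy split; this is illustrative, not the skill's own code.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pred = model.predict(X_test)
proba = model.predict_proba(X_test)[:, 1]       # scores for the positive class

print("precision:", precision_score(y_test, pred))
print("recall:   ", recall_score(y_test, pred))
print("F1:       ", f1_score(y_test, pred))
print("ROC AUC:  ", roc_auc_score(y_test, proba))
print("5-fold accuracy:", cross_val_score(model, X, y, cv=5).mean())
```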
course-description-analyzer
Analyzes or creates course descriptions for intelligent textbooks, checking for completeness of required elements (title, audience, prerequisites, topics, Bloom's Taxonomy outcomes) and providing quality scores with improvement suggestions. Use this skill when working with course descriptions in /docs/course-description.md that need validation or creation for learning graph generation.
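A minimal sketch of completeness checking as described above; the element keywords and the scoring rule (fraction of elements present) are assumptions, not the skill's actual rubric.

```python
REQUIRED_ELEMENTS = ["title", "audience", "prerequisites", "topics", "outcomes"]

def score_course_description(text: str) -> tuple[float, list[str]]:
    lower = text.lower()
    missing = [e for e in REQUIRED_ELEMENTS if e not in lower]
    score = 1.0 - len(missing) / len(REQUIRED_ELEMENTS)   # fraction of required elements present
    return score, missing

example = """Title: Intro to Graphs
Audience: undergraduates. Prerequisites: basic Python.
Topics: nodes, edges, traversal. Outcomes: apply BFS (Bloom: Apply).
"""
print(score_course_description(example))
```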
learning-graph-generator
Generates a comprehensive learning graph from a course description, including 200 concepts with dependencies, taxonomy categorization, and quality validation reports. Use this when the user wants to create a structured knowledge graph for educational content.
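A minimal sketch of the data shape a learning graph might take (concepts, dependency edges, taxonomy labels) plus one basic validation pass, a cycle check on prerequisites; the field names are illustrative assumptions, not the generator's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    id: str
    label: str
    taxonomy: str                                   # e.g. "foundational", "applied"
    depends_on: list[str] = field(default_factory=list)

def has_cycle(concepts: dict[str, Concept]) -> bool:
    # Depth-first search with a recursion stack to detect circular prerequisites.
    visiting, done = set(), set()

    def visit(cid: str) -> bool:
        if cid in done:
            return False
        if cid in visiting:
            return True
        visiting.add(cid)
        cyclic = any(visit(d) for d in concepts[cid].depends_on if d in concepts)
        visiting.discard(cid)
        done.add(cid)
        return cyclic

    return any(visit(c) for c in concepts)

graph = {
    "sets": Concept("sets", "Sets", "foundational"),
    "graphs": Concept("graphs", "Graphs", "foundational", ["sets"]),
    "bfs": Concept("bfs", "Breadth-first search", "applied", ["graphs"]),
}
print(has_cycle(graph))   # False for this small example
```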
skill-performance-profiler
Analyzes skill usage patterns across conversations to track token consumption, identify heavy vs. lightweight skills, measure invocation frequency, detect co-occurrence patterns, and suggest consolidation opportunities. Use when the user asks to analyze skill performance, optimize skill usage, identify token-heavy skills, find consolidation opportunities, or review skill metrics.
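A minimal sketch of aggregating invocation counts and token usage per skill; the record fields ("skill", "tokens") are assumptions for illustration, not the profiler's actual log format.

```python
from collections import Counter, defaultdict

def profile(invocations: list[dict]) -> dict[str, dict]:
    tokens = defaultdict(int)
    calls = Counter()
    for rec in invocations:
        calls[rec["skill"]] += 1
        tokens[rec["skill"]] += rec["tokens"]
    # Report call frequency and average token cost per skill.
    return {s: {"calls": calls[s], "avg_tokens": tokens[s] / calls[s]} for s in calls}

print(profile([
    {"skill": "diagram", "tokens": 4200},
    {"skill": "diagram", "tokens": 3900},
    {"skill": "analyzing-code", "tokens": 800},
]))
```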
analyzing-code
Analyzes code statistics by language for project insight, CI/CD metrics, or pre-refactoring assessment. Use this skill when understanding project composition, measuring change impact, or generating CI/CD metrics.
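A minimal sketch of per-language line counting of the sort this skill produces; the extension-to-language map is a small assumed sample, not the skill's full table.

```python
from collections import Counter
from pathlib import Path

EXT_TO_LANG = {".py": "Python", ".ts": "TypeScript", ".go": "Go", ".rs": "Rust"}

def lines_by_language(root: str) -> Counter:
    counts = Counter()
    for path in Path(root).rglob("*"):
        lang = EXT_TO_LANG.get(path.suffix)
        if lang and path.is_file():
            counts[lang] += sum(1 for _ in path.open(errors="ignore"))  # line count per file
    return counts

print(lines_by_language("."))
```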
review-feedback-schema
Schema for tracking code review outcomes to enable feedback-driven skill improvement. Use when logging review results or analyzing review quality.
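A minimal sketch of what a review-outcome record could look like, serialized for an append-only feedback log; the field names are illustrative assumptions, not the schema this skill defines.

```python
from dataclasses import dataclass, asdict
import datetime
import json

@dataclass
class ReviewOutcome:
    skill: str          # which review skill produced the finding
    finding: str        # short description of the flagged issue
    accepted: bool      # did the author act on it?
    reviewed_at: str    # ISO 8601 timestamp

record = ReviewOutcome(
    skill="code-analysis",
    finding="unused import in utils.py",
    accepted=True,
    reviewed_at=datetime.datetime.now(datetime.timezone.utc).isoformat(),
)
print(json.dumps(asdict(record)))   # one JSON line per review outcome
```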
review-skill-improver
Analyzes feedback logs to identify patterns and suggest improvements to review skills. Use when you have accumulated feedback data and want to improve review accuracy.
llm-judge
LLM-as-judge methodology for comparing code implementations across repositories. Scores implementations on functionality, security, test quality, overengineering, and dead code using weighted rubrics. Used by /beagle:llm-judge command.
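A minimal sketch of weighted-rubric aggregation as described above; the dimension names come from the description, but the weights and score ranges are assumptions.

```python
WEIGHTS = {
    "functionality": 0.35,
    "security": 0.25,
    "test_quality": 0.20,
    "overengineering": 0.10,
    "dead_code": 0.10,
}

def weighted_score(rubric_scores: dict[str, float]) -> float:
    # Each dimension is assumed to be scored in [0, 1] by the judge model.
    return sum(WEIGHTS[dim] * rubric_scores[dim] for dim in WEIGHTS)

print(weighted_score({
    "functionality": 0.9, "security": 0.8, "test_quality": 0.7,
    "overengineering": 0.6, "dead_code": 1.0,
}))
```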
validation-standards
Tool usage requirements, failure patterns, consistency checks, and validation methodologies for Claude Code operations
code-analysis
Provides methodologies, metrics, and best practices for analyzing code structure, complexity, and quality
meta-prompt-engineering
Use when prompts produce inconsistent or unreliable outputs, need explicit structure and constraints, require safety guardrails or quality checks, involve multi-step reasoning that needs decomposition, need domain expertise encoding, or when user mentions improving prompts, prompt templates, structured prompts, prompt optimization, reliable AI outputs, or prompt patterns.
context-engineering
Master context engineering for AI features - the skill that separates AI products that work from ones that hallucinate. Use when speccing new AI features, diagnosing underperforming AI features, or doing quality checks before shipping. Helps PMs define what context AI needs, where to get it, and what to do when it fails. Based on the 4D Context Canvas framework.
satisfaction-feedback
Handles user satisfaction feedback. When the user replies "satisfied" or "not satisfied", updates the FAQ usage count or records a BADCASE. Trigger words: satisfied / not satisfied / resolved / unresolved / thanks.
evaluation
Build evaluation frameworks for agent systems. Use when testing agent performance, validating context engineering choices, or measuring improvements over time.