llamaguard
Meta's 7-8B specialized moderation model for LLM input/output filtering. 6 safety categories - violence/hate, sexual content, weapons, substances, self-harm, criminal planning. 94-95% accuracy. Deploy with vLLM, HuggingFace, Sagemaker. Integrates with NeMo Guardrails.
benchmark-datasets
Standard datasets and benchmarks for evaluating AI security, robustness, and safety
meta-prompt-engineering
Use when prompts produce inconsistent or unreliable outputs, need explicit structure and constraints, require safety guardrails or quality checks, involve multi-step reasoning that needs decomposition, need domain expertise encoding, or when user mentions improving prompts, prompt templates, structured prompts, prompt optimization, reliable AI outputs, or prompt patterns.
principles
Provides development principles, guidelines, and VibeCoder guidance. Use when user mentions 原則, principles, ガイドライン, guidelines, VibeCoder, 安全性, safety, 差分編集, diff-aware. Triggers: 原則, principles, ガイドライン, VibeCoder, 安全性, 差分編集. Do not use for actual implementation - use impl skill instead.