Agent Skills: Agent Safety

Ensure agent safety - guardrails, content filtering, monitoring, and compliance

Category: Uncategorized
ID: pluginagentmarketplace/custom-plugin-ai-agents/agent-safety


Skill Metadata

  • Name: agent-safety
  • Description: Ensure agent safety - guardrails, content filtering, monitoring, and compliance

Agent Safety

Implement safety systems for responsible AI agent deployment.

When to Use This Skill

Invoke this skill when:

  • Adding input/output guardrails
  • Implementing content filtering
  • Setting up rate limiting
  • Ensuring compliance (GDPR, SOC2)

Parameter Schema

| Parameter | Type | Required | Description | Default |
|-----------|------|----------|-------------|---------|
| task | string | Yes | Safety goal | - |
| risk_level | enum | No | strict, moderate, permissive | strict |
| filters | list | No | Filter types to enable | ["injection", "pii", "toxicity"] |
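
A hypothetical invocation payload matching this schema; the invocation mechanism itself is not specified by the skill, so the dict below is only a sketch of the parameter shapes.

params = {
    "task": "Add output guardrails to the support agent",  # required: the safety goal
    "risk_level": "strict",                                 # enum: strict, moderate, permissive
    "filters": ["injection", "pii", "toxicity"],            # default filter set
}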

Quick Start

from guardrails import Guard
from guardrails.validators import ToxicLanguage, PIIFilter

# Compose validators into a single guard:
# - ToxicLanguage raises when toxicity scores above the 0.8 threshold
# - PIIFilter rewrites ("fix") the output to redact detected PII
guard = Guard.from_validators([
    ToxicLanguage(threshold=0.8, on_fail="exception"),
    PIIFilter(on_fail="fix")
])

# Validate the raw LLM output before returning it to the user
validated = guard.validate(llm_response)

Guardrail Types

Input Guardrails

# Prompt injection detection
INJECTION_PATTERNS = [
    r"ignore (previous|all) instructions",
    r"you are now",
    r"forget everything"
]
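
A minimal sketch of how these patterns might screen user input before it reaches the agent; check_injection and the surrounding usage are illustrative, not part of a specific library.

import re

# Compile once, match case-insensitively against the raw user message
_INJECTION_RES = [re.compile(p, re.IGNORECASE) for p in INJECTION_PATTERNS]

def check_injection(user_input: str) -> bool:
    """Return True when the input looks like a prompt-injection attempt."""
    return any(pattern.search(user_input) for pattern in _INJECTION_RES)

# Block (or route to human review) before the text reaches the model
if check_injection(user_message):
    raise ValueError("Potential prompt injection detected")

Regex matching alone is easy to bypass, which is why the troubleshooting table below recommends layering an LLM-based detector on top of it.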

Output Guardrails

# Content filtering
filters = [
    ToxicityFilter(),
    PIIRedactor(),
    HallucinationDetector()
]
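
One way to wire such filters together is a simple pipeline in which each filter either rewrites the candidate response or raises to block it. The OutputFilter protocol and run_output_filters below are illustrative placeholders, not a specific library API, and the filter classes above are assumed to expose an apply method.

from typing import Protocol

class OutputFilter(Protocol):
    def apply(self, text: str) -> str: ...

def run_output_filters(text: str, output_filters: list[OutputFilter]) -> str:
    # Each filter may redact or rewrite the text, or raise to block the response
    for f in output_filters:
        text = f.apply(text)
    return text

safe_response = run_output_filters(llm_response, filters)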

Rate Limiting

import time

class RateLimiter:
    def __init__(self, rpm=60, tpm=100000):
        self.rpm = rpm      # request budget per minute
        self.tpm = tpm      # token budget per minute
        self._buckets = {}  # user_id -> (requests_left, tokens_left, last_refill)

    def check(self, user_id, tokens):
        # Token bucket: refill both budgets in proportion to elapsed time, then spend
        now = time.time()
        reqs, toks, last = self._buckets.get(user_id, (self.rpm, self.tpm, now))
        reqs = min(self.rpm, reqs + (now - last) * self.rpm / 60)
        toks = min(self.tpm, toks + (now - last) * self.tpm / 60)
        if reqs < 1 or toks < tokens:
            return False  # over budget: reject or queue the request
        self._buckets[user_id] = (reqs - 1, toks - tokens, now)
        return True
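
Typical use at the request boundary, assuming the caller can estimate the token count of the prompt (user_id and estimated_tokens are placeholders for values supplied by the calling code).

limiter = RateLimiter(rpm=60, tpm=100000)

# Reject or queue the request when the per-user budget is exhausted
if not limiter.check(user_id, estimated_tokens):
    raise RuntimeError("Rate limit exceeded; retry later")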

Troubleshooting

| Issue | Solution |
|-------|----------|
| False positives | Tune thresholds |
| Injection bypass | Add LLM-based detection |
| PII leakage | Add secondary validation |
| Performance hit | Cache filter results |

Best Practices

  • Defense in depth (multiple layers) - see the end-to-end sketch after this list
  • Fail-safe defaults (deny by default)
  • Audit everything
  • Regular red team testing
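
As a concrete illustration of defense in depth, a guarded request path might chain the rate limiter, input guardrails, model call, and output filters from the sections above; call_llm, estimated_tokens, and audit_log are hypothetical helpers named only for this sketch.

def guarded_call(user_id: str, user_message: str) -> str:
    # Layer 1: per-user rate limiting
    if not limiter.check(user_id, estimated_tokens(user_message)):
        raise RuntimeError("Rate limit exceeded")
    # Layer 2: input guardrails (prompt-injection screen)
    if check_injection(user_message):
        raise ValueError("Blocked: possible prompt injection")
    # Layer 3: model call (call_llm is a hypothetical helper)
    response = call_llm(user_message)
    # Layer 4: output guardrails (toxicity, PII, hallucination filters)
    response = run_output_filters(response, filters)
    # Layer 5: record the completed interaction for audit (audit_log is hypothetical)
    audit_log(user_id, user_message, response)
    return response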

Compliance Checklist

  • [ ] Input validation active
  • [ ] Output filtering enabled
  • [ ] Audit logging configured (see the audit-log sketch after this checklist)
  • [ ] Rate limits set
  • [ ] PII handling compliant
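
A minimal append-only audit log might look like the sketch below; the JSON-lines file, field set, and audit_log name are assumptions to adapt to your retention and compliance requirements.

import json
import time

def audit_log(user_id: str, prompt: str, response: str, path: str = "audit.jsonl") -> None:
    # Append one JSON record per interaction; never rewrite past entries
    record = {
        "ts": time.time(),
        "user_id": user_id,
        "prompt": prompt,
        "response": response,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")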

Related Skills

  • tool-calling - Input validation
  • llm-integration - API security
  • multi-agent - Per-agent permissions

References