Agent Skills: guardrails-ai-setup

Guardrails AI validation framework setup for LLM applications. Implement input/output validation, safety checks, and structured output enforcement.

Category: Uncategorized
ID: a5c-ai/babysitter/guardrails-ai-setup

Install this agent skill into your local environment:

pnpm dlx add-skill https://github.com/a5c-ai/babysitter/tree/HEAD/plugins/babysitter/skills/babysit/process/specializations/ai-agents-conversational/skills/guardrails-ai-setup

Skill Files

Browse the full folder contents for guardrails-ai-setup.

plugins/babysitter/skills/babysit/process/specializations/ai-agents-conversational/skills/guardrails-ai-setup/SKILL.md

Skill Metadata

Name: guardrails-ai-setup
Description: Guardrails AI validation framework setup for LLM applications. Implement input/output validation, safety checks, and structured output enforcement.

guardrails-ai-setup

Configure Guardrails AI validation framework to ensure LLM outputs meet quality, safety, and structural requirements. Implement validators for input sanitization, output format enforcement, and safety constraints.

Overview

Guardrails AI provides:

  • Input validation before LLM calls
  • Output validation after LLM responses (see the sketch after this list)
  • Structured output enforcement (JSON, XML, etc.)
  • Pre-built validators from Guardrails Hub
  • Custom validator creation
  • Automatic retry and correction mechanisms
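
The same guard can sit on either side of the model call: it can check text you already have without invoking an LLM, or wrap the LLM call itself (shown under Usage below). A minimal sketch of the standalone path, assuming the Guard.validate entry point from recent releases (older releases expose the equivalent Guard.parse) and that the ToxicLanguage validator has been installed from the Hub:

from guardrails import Guard
from guardrails.hub import ToxicLanguage

guard = Guard().use(ToxicLanguage(on_fail="fix"))

# Validate text produced elsewhere -- no LLM call happens here
outcome = guard.validate("Thanks for reaching out! Happy to help with your order.")

print(outcome.validation_passed)  # True when every validator passed
print(outcome.validated_output)   # possibly-corrected text when on_fail="fix"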

Capabilities

Input Validation

  • Sanitize user inputs
  • Detect prompt injection attempts
  • Validate input formats and lengths
  • Check for PII before processing (see the sketch after this list)
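
Input-side checks run on the raw user message before it reaches the model. A short sketch using two Hub validators, with the PII entity names and length bounds as illustrative assumptions rather than recommended defaults (Guard.validate is assumed as in the Overview sketch):

from guardrails import Guard
from guardrails.hub import DetectPII, ValidLength

# Guard applied to the user's message before any LLM call
input_guard = Guard().use_many(
    ValidLength(min=1, max=2000, on_fail="exception"),
    DetectPII(pii_entities=["EMAIL_ADDRESS", "PHONE_NUMBER"], on_fail="fix")
)

user_message = "My email is jane@example.com, can you help me reset my password?"
outcome = input_guard.validate(user_message)

# With on_fail="fix", DetectPII masks detected entities before the text moves on
print(outcome.validated_output)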

Output Validation

  • Enforce structured output schemas
  • Validate content accuracy
  • Check for harmful content
  • Verify factual consistency

Safety Constraints

  • Content moderation
  • Toxicity detection
  • Bias checking
  • Hallucination detection

Integration Features

  • LangChain integration (sketched below)
  • Streaming support
  • Automatic retries
  • Correction strategies
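
For LangChain, recent releases expose the guard as a runnable that slots into an LCEL chain. A sketch assuming Guard.to_runnable() and the langchain-openai package; treat the method name and chain position as version-dependent:

from guardrails import Guard
from guardrails.hub import ToxicLanguage
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

guard = Guard().use(ToxicLanguage(on_fail="fix"))

prompt = ChatPromptTemplate.from_template("Answer briefly: {question}")
llm = ChatOpenAI(model="gpt-4")

# The guard validates (and, with on_fail="fix", repairs) the model output in-line
chain = prompt | llm | StrOutputParser() | guard.to_runnable()

print(chain.invoke({"question": "What is Guardrails AI used for?"}))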

Usage

Basic Setup

from guardrails import Guard
from guardrails.hub import ValidJson, ToxicLanguage, DetectPII

# Create guard with validators
guard = Guard().use_many(
    ValidJson(),
    ToxicLanguage(on_fail="fix"),
    DetectPII(on_fail="fix")
)

# Use with an LLM: Guardrails wraps the provider's completion callable.
# (OpenAI client shown; the prompt kwarg follows the 0.4.x-style call signature.)
import openai

result = guard(
    openai.chat.completions.create,
    model="gpt-4",
    prompt="Generate a JSON product description for a laptop",
    max_tokens=500
)

print(result.validated_output)
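
The call returns a ValidationOutcome rather than a bare string, so production code usually branches on whether validation succeeded before using the text. A short follow-up, assuming the validation_passed and raw_llm_output fields exposed by recent releases:

if result.validation_passed:
    description = result.validated_output
else:
    # Validation failed and could not be fixed; fall back or surface an error.
    # raw_llm_output keeps the unvalidated model response for logging/debugging.
    print("Validation failed:", result.raw_llm_output)
    description = None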

Schema-Based Validation

import openai
from guardrails import Guard
from pydantic import BaseModel, Field
from typing import List

class ProductReview(BaseModel):
    """Schema for product review output."""
    rating: int = Field(ge=1, le=5, description="Rating from 1-5")
    summary: str = Field(max_length=200, description="Brief summary")
    pros: List[str] = Field(min_items=1, max_items=5)
    cons: List[str] = Field(min_items=1, max_items=5)
    recommendation: bool

# Create guard from schema
guard = Guard.from_pydantic(ProductReview)

result = guard(
    openai.chat.completions.create,
    model="gpt-4",
    prompt="""Analyze this product and provide a structured review:
    Product: Wireless Noise-Canceling Headphones
    Price: $299
    Features: 30hr battery, ANC, Bluetooth 5.3
    """,
)

# validated_output matches the ProductReview schema (most versions return a dict,
# so rehydrate the model explicitly before using attribute access)
review = ProductReview(**result.validated_output)
print(f"Rating: {review.rating}")
print(f"Summary: {review.summary}")

Using Guardrails Hub Validators

from guardrails import Guard
from guardrails.hub import (
    CompetitorCheck,
    ProfanityFree,
    ReadingTime,
    RestrictToTopic,
    SensitiveTopic,
    ToxicLanguage,
    ValidJson,
    ValidLength
)

# Each hub validator must be installed before it can be imported, e.g.:
# guardrails hub install hub://guardrails/toxic_language

# Compose multiple validators
guard = Guard().use_many(
    ValidJson(on_fail="reask"),
    ToxicLanguage(threshold=0.8, on_fail="fix"),
    ProfanityFree(on_fail="fix"),
    ValidLength(min=100, max=1000, on_fail="reask"),
    RestrictToTopic(
        valid_topics=["technology", "software"],
        on_fail="reask"
    )
)

Custom Validators

import re

from guardrails.validators import (
    FailResult,
    PassResult,
    ValidationResult,
    Validator,
    register_validator,
)

@register_validator(name="custom/no-urls", data_type="string")
class NoURLs(Validator):
    """Validator that checks for URLs in text."""

    def validate(self, value: str, metadata: dict) -> ValidationResult:
        url_pattern = r'https?://\S+'

        if re.search(url_pattern, value):
            # FailResult carries both the error and a suggested fix for on_fail="fix"
            return FailResult(
                error_message="Text contains URLs which are not allowed",
                fix_value=re.sub(url_pattern, "[URL REMOVED]", value)
            )

        return PassResult()

# Use custom validator
guard = Guard().use(NoURLs(on_fail="fix"))
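
To exercise the validator without an LLM in the loop, run it against a fixed string (again assuming the validate-only entry point discussed earlier):

outcome = guard.validate("Full details are at https://example.com/docs")

# With on_fail="fix", the guard applies the validator's fix_value
print(outcome.validated_output)  # "Full details are at [URL REMOVED]"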

Prompt Injection Defense

from guardrails import Guard
from guardrails.hub import DetectPromptInjection
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4")

# Create input guard for prompt injection
input_guard = Guard().use(
    DetectPromptInjection(
        on_fail="exception",
        threshold=0.9
    )
)

def safe_chat(user_input: str) -> str:
    # Validate input first
    try:
        input_guard.validate(user_input)
    except Exception as e:
        return "I cannot process that request."

    # Process safe input
    return llm.invoke(user_input).content
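
Input and output guards compose naturally: the pattern above screens what goes into the model, and a second guard screens what comes back. A combined sketch reusing input_guard and llm from above (validator choices are illustrative, not recommended defaults):

from guardrails.hub import DetectPII, ToxicLanguage

# Output guard: scrub PII and toxic phrasing from the model's reply
reply_guard = Guard().use_many(
    DetectPII(on_fail="fix"),
    ToxicLanguage(on_fail="fix")
)

def moderated_chat(user_input: str) -> str:
    try:
        input_guard.validate(user_input)  # reject likely injection attempts
    except Exception:
        return "I cannot process that request."

    reply = llm.invoke(user_input).content
    return reply_guard.validate(reply).validated_output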

Integration with NeMo Guardrails

from guardrails import Guard
from nemoguardrails import LLMRails, RailsConfig

# Combine Guardrails AI with NeMo Guardrails
config = RailsConfig.from_path("./config")
rails = LLMRails(config)

# Use Guardrails AI for structured output (OutputSchema is your Pydantic model,
# e.g. the ProductReview schema defined earlier)
output_guard = Guard.from_pydantic(OutputSchema)

async def guarded_chat(user_input: str) -> dict:
    # NeMo handles dialogue safety
    response = await rails.generate_async(
        messages=[{"role": "user", "content": user_input}]
    )

    # Guardrails AI validates structure
    validated = output_guard.validate(response["content"])

    return validated.validated_output

Task Definition

const guardrailsAISetupTask = defineTask({
  name: 'guardrails-ai-setup',
  description: 'Configure Guardrails AI validation for LLM application',

  inputs: {
    outputSchema: { type: 'object', required: false },
    validators: { type: 'array', required: true },
    onFailStrategy: { type: 'string', default: 'reask' },  // 'reask', 'fix', 'exception', 'filter'
    maxRetries: { type: 'number', default: 3 },
    enableInputValidation: { type: 'boolean', default: true },
    enableOutputValidation: { type: 'boolean', default: true }
  },

  outputs: {
    guardConfigured: { type: 'boolean' },
    validatorsInstalled: { type: 'array' },
    artifacts: { type: 'array' }
  },

  async run(inputs, taskCtx) {
    return {
      kind: 'skill',
      title: 'Configure Guardrails AI validation',
      skill: {
        name: 'guardrails-ai-setup',
        context: {
          outputSchema: inputs.outputSchema,
          validators: inputs.validators,
          onFailStrategy: inputs.onFailStrategy,
          maxRetries: inputs.maxRetries,
          enableInputValidation: inputs.enableInputValidation,
          enableOutputValidation: inputs.enableOutputValidation,
          instructions: [
            'Install Guardrails AI package and hub validators',
            'Define output schema if structured output needed',
            'Configure selected validators with failure strategies',
            'Set up input validation for prompt injection defense',
            'Configure output validation for content safety',
            'Implement retry logic with correction strategies',
            'Test validation pipeline with sample inputs/outputs',
            'Document validation rules and expected behaviors'
          ]
        }
      },
      io: {
        inputJsonPath: `tasks/${taskCtx.effectId}/input.json`,
        outputJsonPath: `tasks/${taskCtx.effectId}/result.json`
      }
    };
  }
});

Applicable Processes

  • system-prompt-guardrails
  • prompt-injection-defense
  • content-moderation-safety
  • chatbot-design-implementation

External Dependencies

  • guardrails-ai Python package
  • Guardrails Hub account (for hub validators)
  • LLM provider (OpenAI, Anthropic, etc.)
  • Optional: NeMo Guardrails for dialogue safety

References

Related Skills

  • SK-SAF-001 content-moderation-api
  • SK-SAF-003 nemo-guardrails
  • SK-SAF-004 prompt-injection-detector
  • SK-SAF-005 pii-redaction

Related Agents

  • AG-SAF-001 safety-auditor
  • AG-SAF-002 prompt-injection-defender
  • AG-PE-001 system-prompt-engineer