Agent Skills: AI Agents Development — Production Skill Hub

Production AI agent patterns covering MCP, RAG, guardrails, observability, and ROI. Use when designing or evaluating agent systems.

ID: vasilyu1983/ai-agents-public/ai-agents

Install this agent skill locally:

pnpm dlx add-skill https://github.com/vasilyu1983/AI-Agents-public/tree/HEAD/frameworks/shared-skills/skills/ai-agents

Skill Files

Browse the full folder contents for ai-agents.


frameworks/shared-skills/skills/ai-agents/SKILL.md

Skill Metadata

Name
ai-agents
Description
Production AI agent patterns covering MCP, RAG, guardrails, observability, and ROI. Use when designing or evaluating agent systems.

AI Agents Development — Production Skill Hub

Modern Best Practices (March 2026): deterministic control flow, bounded tools, auditable state, MCP-based tool integration, handoff-first orchestration, multi-layer guardrails, OpenTelemetry tracing, and human-in-the-loop controls (OWASP LLM Top 10: https://owasp.org/www-project-top-10-for-large-language-model-applications/).

This skill provides production-ready operational patterns for designing, building, evaluating, and deploying AI agents. It centralizes procedures, checklists, decision rules, and templates used across RAG agents, tool-using agents, OS agents, and multi-agent systems.

No theory. No narrative. Only operational steps and templates.


When to Use This Skill

Codex should activate this skill whenever the user asks for:

  • Designing an agent (LLM-based, tool-based, OS-based, or multi-agent).
  • Scoping capability maturity and rollout risk for new agent behaviors.
  • Creating action loops, plans, workflows, or delegation logic.
  • Writing tool definitions, MCP tools, schemas, or validation logic.
  • Generating RAG pipelines, retrieval modules, or context injection.
  • Building memory systems (session, long-term, episodic, task).
  • Creating evaluation harnesses, observability plans, or safety gates.
  • Preparing CI/CD, rollout, deployment, or production operational specs.
  • Producing any template in /references/ or /assets/.
  • Implementing MCP servers or integrating Model Context Protocol.
  • Setting up agent handoffs and orchestration patterns.
  • Configuring multi-layer guardrails and safety controls.
  • Evaluating whether to build an agent (build vs not decision).
  • Calculating agent ROI, token costs, or cost/benefit analysis.
  • Assessing hallucination risk and mitigation strategies.
  • Deciding when to kill an agent project (kill triggers).
  • For prompt scaffolds, retrieval tuning, or security depth, see Scope Boundaries below.

Scope Boundaries (Use These Skills for Depth)

Default Workflow (Production)


Quick Reference

| Agent Type | Core Control Flow | Interfaces | MCP/A2A | When to Use |
|------------|-------------------|------------|---------|-------------|
| Workflow Agent (FSM/DAG) | Explicit state transitions | State store, tool allowlist | MCP | Deterministic, auditable flows |
| Tool-Using Agent | Route → call tool → observe | Tool schemas, retries/timeouts | MCP | External actions (APIs, DB, files) |
| RAG Agent | Retrieve → answer → cite | Retriever, citations, ACLs | MCP | Knowledge-grounded responses |
| Planner/Executor | Plan → execute steps with caps | Planner prompts, step budget | MCP (+A2A) | Multi-step problems with bounded autonomy |
| Multi-Agent (Orchestrated) | Delegate → merge → validate | Handoff contracts, eval gates | A2A | Specialization with explicit handoffs |
| OS Agent | Observe UI → act → verify | Sandbox, UI grounding | MCP | Desktop/browser control under strict guardrails |
| Code/SWE Agent | Branch → edit → test → PR | Repo access, CI gates | MCP | Coding tasks with review/merge controls |

Framework Selection (March 2026)

Tier 1 — Production-Grade

| Framework | Architecture | Best For | Languages | Ease |
|-----------|--------------|----------|-----------|------|
| LangGraph | Graph-based, stateful | Enterprise, compliance, auditability | Python, JS | Medium |
| Claude Agent SDK | Event-driven, tool-centric | Anthropic ecosystem, Computer Use, MCP-native | Python, TS | Easy |
| OpenAI Agents SDK | Tool-centric, lightweight | Fast prototyping, OpenAI ecosystem | Python | Easy |
| Google ADK | Code-first, multi-language | Gemini/Vertex AI, polyglot teams | Python, TS, Go, Java | Medium |
| Pydantic AI | Type-safe, graph FSM | Production Python, type safety, MCP+A2A native | Python | Medium |
| MS Agent Framework | Kernel + multi-agent | Enterprise Azure, .NET/Java teams | Python, .NET, Java | Medium |

Tier 2 — Specialized

| Framework | Architecture | Best For | Languages | Ease |
|-----------|--------------|----------|-----------|------|
| LlamaIndex | Event-driven workflows | RAG-native agents, retrieval-heavy | Python, TS | Medium |
| CrewAI | Role-based crews | Team workflows, content generation | Python | Easiest |
| Mastra | Vercel AI SDK-based | TypeScript/Next.js teams | TypeScript | Easy |
| SmolAgents | Code-first, minimalist | Lightweight, fewer LLM calls | Python | Easy |
| Agno | FastAPI-native runtime | Production Python, 100+ integrations | Python | Easy |
| AWS Bedrock Agents | Managed infrastructure | Enterprise AWS, knowledge bases | Python | Easy |

Tier 3 — Niche

| Framework | Niche |
|-----------|-------|
| Haystack | Enterprise RAG+agents pipeline (Airbus, NVIDIA) |
| DSPy | Declarative optimization — compiles programs into prompts/weights |

See references/modern-best-practices.md for detailed comparison and selection guide.

Framework Deep Dives


Decision Tree: Choosing Agent Architecture

```
What does the agent need to do?
    ├─ Answer questions from knowledge base?
    │   ├─ Simple lookup? → RAG Agent (LangChain/LlamaIndex + vector DB)
    │   └─ Complex multi-step? → Agentic RAG (iterative retrieval + reasoning)
    │
    ├─ Perform external actions (APIs, tools, functions)?
    │   ├─ 1-3 tools, linear flow? → Tool-Using Agent (LangGraph + MCP)
    │   └─ Complex workflows, branching? → Planning Agent (ReAct/Plan-Execute)
    │
    ├─ Write/modify code autonomously?
    │   ├─ Single file edits? → Tool-Using Agent with code tools
    │   └─ Multi-file, issue resolution? → Code/SWE Agent (HyperAgent pattern)
    │
    ├─ Delegate tasks to specialists?
    │   ├─ Fixed workflow? → Multi-Agent Sequential (A → B → C)
    │   ├─ Manager-Worker? → Multi-Agent Hierarchical (Manager + Workers)
    │   └─ Dynamic routing? → Multi-Agent Group Chat (collaborative)
    │
    ├─ Control desktop/browser?
    │   └─ OS Agent (Anthropic Computer Use + MCP for system access)
    │
    └─ Hybrid (combination of above)?
        └─ Planning Agent that coordinates:
            - Tool-using for actions (MCP)
            - RAG for knowledge (MCP)
            - Multi-agent for delegation (A2A)
            - Code agents for implementation
```

Protocol Selection:

  • Use MCP for: Tool access, data retrieval, single-agent integration
  • Use A2A for: Agent-to-agent handoffs, multi-agent coordination, task delegation

Framework Selection (after choosing architecture):

```
Which framework?
    ├─ MVP/Prototyping?
    │   ├─ Python → OpenAI Agents SDK or CrewAI
    │   └─ TypeScript → Mastra or Claude Agent SDK
    │
    ├─ Production →
    │   ├─ Auditability/compliance? → LangGraph
    │   ├─ Type safety + MCP/A2A native? → Pydantic AI
    │   ├─ Anthropic models + Computer Use? → Claude Agent SDK
    │   ├─ Google Cloud / Gemini? → Google ADK
    │   ├─ Azure / .NET / Java? → MS Agent Framework
    │   ├─ AWS managed? → Bedrock Agents
    │   └─ RAG-heavy? → LlamaIndex Workflows
    │
    ├─ Minimalist / Research →
    │   ├─ Fewest LLM calls? → SmolAgents
    │   └─ Optimize prompts automatically? → DSPy
    │
    └─ Enterprise pipeline → Haystack
```

Core Concepts (Vendor-Agnostic)

Control Flow Options

  • Reactive: direct tool routing per user request (fast, brittle if unbounded).
  • Workflow (FSM/DAG): explicit states and transitions (default for deterministic production).
  • Planner/Executor: plan with strict budgets, then execute step-by-step (use when branching is unavoidable).
  • Orchestrated multi-agent: separate roles with validated handoffs (use when specialization is required).
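
The workflow (FSM) option above reduces to a small contract in code: named states, an explicit transition table, a hard step budget, and a serializable run log. This is a minimal stdlib-only sketch; all names (`Run`, `TRANSITIONS`, the state labels) are illustrative, not from any framework:

```python
# Minimal FSM workflow agent: explicit states, bounded steps, replayable log.
from dataclasses import dataclass, field

@dataclass
class Run:
    state: str = "triage"
    steps: int = 0
    log: list = field(default_factory=list)  # (from_state, to_state) audit trail

# Transition table: each state maps context -> next state. No hidden control flow.
TRANSITIONS = {
    "triage": lambda ctx: "retrieve" if ctx.get("needs_docs") else "answer",
    "retrieve": lambda ctx: "answer",
    "answer": lambda ctx: "done",
}
MAX_STEPS = 10  # hard budget: the loop can never run unbounded

def run_workflow(ctx: dict) -> Run:
    run = Run()
    while run.state != "done" and run.steps < MAX_STEPS:
        nxt = TRANSITIONS[run.state](ctx)
        run.log.append((run.state, nxt))
        run.state, run.steps = nxt, run.steps + 1
    return run
```

Because `Run` is a plain serializable record, any run can be replayed or audited from its log alone, which is the property that makes FSM/DAG the default for deterministic production flows.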

Memory Types (Tradeoffs)

  • Short-term (session): cheap, ephemeral; best for conversational continuity.
  • Episodic (task): scoped to a case/ticket; supports audit and replay.
  • Long-term (profile/knowledge): high risk; requires consent, retention limits, and provenance.

Failure Handling (Production Defaults)

  • Classify errors: retriable vs fatal vs needs-human.
  • Bound retries: max attempts, backoff, jitter; avoid retry storms.
  • Fallbacks: degraded mode, smaller model, cached answers, or safe refusal.
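
The retry defaults above can be sketched as a single wrapper: classify the exception, retry only retriable ones under a hard attempt cap, and back off with full jitter so parallel agents don't synchronize into a retry storm (names and the retriable set are illustrative):

```python
# Bounded retries with exponential backoff + full jitter.
import random
import time

RETRIABLE = (TimeoutError, ConnectionError)  # everything else is fatal/needs-human

def call_with_retries(fn, max_attempts: int = 3, base_delay: float = 0.5):
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RETRIABLE:
            if attempt == max_attempts:
                raise  # escalate; caller decides fallback (degraded mode, human)
            # full jitter: sleep a random amount up to the exponential ceiling
            time.sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))
```

A fallback chain (smaller model, cached answer, safe refusal) then wraps this call site rather than living inside it, keeping the retry policy reusable per tool.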

Do / Avoid

Do

  • Do keep state explicit and serializable (replayable runs).
  • Do enforce tool allowlists, scopes, and idempotency for side effects.
  • Do log traces/metrics for model calls and tool calls (OpenTelemetry GenAI semantic conventions: https://opentelemetry.io/docs/specs/semconv/gen-ai/).

Avoid

  • Avoid runaway autonomy (unbounded loops or step counts).
  • Avoid hidden state (implicit memory that cannot be audited).
  • Avoid untrusted tool outputs without validation/sanitization.
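
The allowlist and output-validation rules above amount to a small gate around every tool call. This sketch uses a hand-rolled required-args check and a crude sanitizer for illustration; a production system would validate arguments against each tool's JSON Schema instead:

```python
# Gate every tool call: allowlist first, argument check second,
# and sanitize untrusted tool output before it re-enters the context.
ALLOWED_TOOLS = {"search_docs", "create_ticket"}          # illustrative names
REQUIRED_ARGS = {"search_docs": {"query"}, "create_ticket": {"title", "body"}}

def gate_tool_call(name: str, args: dict) -> None:
    if name not in ALLOWED_TOOLS:
        raise PermissionError(f"tool not allowlisted: {name}")
    missing = REQUIRED_ARGS[name] - set(args)
    if missing:
        raise ValueError(f"missing args for {name}: {sorted(missing)}")

def sanitize_output(text: str, max_len: int = 4000) -> str:
    # Strip control characters and truncate; never trust tool output verbatim.
    clean = "".join(ch for ch in text if ch.isprintable() or ch in "\n\t")
    return clean[:max_len]
```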

Navigation: Economics & Decision Framework

Should You Build an Agent?

  • Build vs Not Decision Framework - references/build-vs-not-decision.md
    • 10-second test (volume, cost, error tolerance)
    • Red flags and immediate disqualifiers
    • Alternatives to agents (usually better)
    • Full decision tree with stage gates
    • Kill triggers during development and post-launch
    • Pre-build validation checklist

Agent ROI & Token Economics

  • Agent Economics - references/agent-economics.md
    • Token pricing by model (January 2026)
    • Cost per task by agent type
    • ROI calculation formula and tiers
    • Hallucination cost framework and mitigation ROI
    • Investment decision matrix
    • Monthly tracking dashboard
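
The ROI formula in that reference reduces to simple arithmetic: token cost per task versus the human time the task displaces. The per-million-token prices, token counts, and hourly rate below are placeholder assumptions, not live pricing from the economics reference:

```python
# Per-task agent ROI: (human cost saved - agent cost) / agent cost.
# All prices, token counts, and rates here are illustrative placeholders.
PRICE_PER_1M_TOKENS = {"input": 3.00, "output": 15.00}  # USD, assumed

def task_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens * PRICE_PER_1M_TOKENS["input"]
            + output_tokens * PRICE_PER_1M_TOKENS["output"]) / 1_000_000

def roi(human_minutes: float, hourly_rate: float, agent_cost: float) -> float:
    saved = human_minutes / 60 * hourly_rate
    return (saved - agent_cost) / agent_cost
```

Example: a task using 20k input / 2k output tokens costs about $0.09; if it replaces 10 minutes of a $60/h human, ROI is roughly 110x, which is why low-volume or high-error-cost tasks (where `saved` shrinks or rework costs appear) flip the decision.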

Navigation: AI Engine Layers

Five-layer architecture for production agent systems. Start with the overview, then drill into layer-specific patterns.

  • Action Graph → covered by references/operational-patterns.md + references/agent-operations-best-practices.md
  • Data Agent → covered by ../ai-rag/SKILL.md + references/rag-patterns.md


Navigation: Core Concepts & Patterns

Governance & Maturity

  • Agent Maturity & Governance - references/agent-maturity-governance.md
    • Capability maturity levels (L0-L4)
    • Identity & policy enforcement
    • Fleet control and registry management
    • Deprecation rules and kill switches

Modern Best Practices

  • Modern Best Practices - references/modern-best-practices.md
    • Model Context Protocol (MCP)
    • Agent-to-Agent Protocol (A2A)
    • Agentic RAG (Dynamic Retrieval)
    • Multi-layer guardrails
    • LangGraph over LangChain
    • OpenTelemetry for agents

Context Management

Core Operational Patterns

  • Operational Patterns - references/operational-patterns.md
    • Agent loop pattern (PLAN → ACT → OBSERVE → UPDATE)
    • OS agent action loop
    • RAG pipeline pattern
    • Tool specification
    • Memory system pattern
    • Multi-agent workflow
    • Safety & guardrails
    • Observability
    • Evaluation patterns
    • Deployment & CI/CD
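
The agent loop pattern listed first (PLAN → ACT → OBSERVE → UPDATE) can be sketched as a bounded cycle over pluggable functions. The stubs here are toys; in a real loop, `plan` and `act` would call an LLM and gated tools, and `observe` would validate raw results:

```python
# Bounded agent loop: PLAN -> ACT -> OBSERVE -> UPDATE, with a hard step cap.
MAX_STEPS = 5

def agent_loop(goal: str, plan, act, observe, update) -> dict:
    state = {"goal": goal, "done": False, "history": []}
    for _ in range(MAX_STEPS):              # never loop unbounded
        action = plan(state)                # PLAN: choose next action from state
        result = act(action)                # ACT: execute (tool call, LLM call)
        obs = observe(result)               # OBSERVE: validate/normalize result
        state = update(state, action, obs)  # UPDATE: fold observation into state
        if state["done"]:
            break
    return state

# Toy stubs: "count" three observations, then stop.
final = agent_loop(
    "count",
    plan=lambda s: "inc",
    act=lambda a: 1,
    observe=lambda r: r,
    update=lambda s, a, o: {**s, "history": s["history"] + [o],
                            "done": len(s["history"]) + 1 >= 3},
)
```

Keeping `state` as a plain dict (no hidden memory) is what makes the run serializable and replayable, matching the Do/Avoid rules above.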

Navigation: Protocol Implementation


Navigation: Agent Capabilities

Skill Packaging & Sharing

Framework-Specific Patterns

  • Pydantic AI Patterns - references/pydantic-ai-patterns.md
    • Type-safe agents, MCP toolsets (Stdio/SSE/StreamableHTTP), A2A via to_a2a(), pydantic-graph FSM, durable execution, TestModel testing

Navigation: Production Operations


Navigation: Templates (Copy-Paste Ready)

Checklists

Core Agent Templates

RAG Templates

Tool Templates

Multi-Agent Templates

Service Layer Templates


External Sources Metadata

  • Curated References - data/sources.json
    • Authoritative sources spanning standards, protocols, and production agent frameworks

Shared Utilities (Centralized patterns — extract, don't duplicate)


Trend Awareness Protocol

IMPORTANT: When users ask for framework recommendations or "what's best for X" questions, use WebSearch to verify the current landscape before answering. If web access is unavailable, fall back to data/sources.json and state what was verified vs assumed.

Trigger: framework comparisons, "best for [use case]", "is X still relevant?", "latest in AI agents", MCP server availability.

Report: current landscape, emerging trends, deprecated patterns, recommendation with rationale.


Related Skills

This skill integrates with complementary skills:

Core Dependencies

  • ../ai-llm/ - LLM patterns, prompt engineering, and model selection for agents
  • ../ai-rag/ - Deep RAG implementation: chunking, embedding, reranking
  • ../ai-prompt-engineering/ - System prompt design, few-shot patterns, reasoning strategies

Production & Operations

Supporting Patterns

  • ../dev-api-design/ - REST/GraphQL design for agent APIs and tool interfaces
  • ../ai-mlops/ - Model deployment, monitoring, drift detection
  • ../qa-debugging/ - Agent debugging, error analysis, root cause investigation
  • ../dev-ai-coding-metrics/ - Team-level AI coding metrics: adoption, DORA/SPACE, ROI, DX surveys (this skill covers per-task agent economics)

Usage pattern: Start here for agent architecture, then reference specialized skills for deep implementation details.


Usage Notes

  • Modern Standards: Default to MCP for tools, agentic RAG for retrieval, handoff-first for multi-agent
  • Lightweight SKILL.md: Use this file for quick reference and navigation
  • Drill-down resources: Reference detailed resources for implementation guidance
  • Copy-paste templates: Use templates when the user asks for structured artifacts
  • External sources: Reference data/sources.json for authoritative documentation links
  • No theory: Never include theoretical explanations; only operational steps

AI-Native SDLC Template

Fact-Checking

  • Use web search/web fetch to verify current external facts, versions, pricing, deadlines, regulations, or platform behavior before final answers.
  • Prefer primary sources; report source links and dates for volatile information.
  • If web access is unavailable, state the limitation and mark guidance as unverified.