Agent Skills: Design RAG Architecture

Design a RAG architecture for a use case

UncategorizedID: melodic-software/claude-code-plugins/rag-design

Install this agent skill to your local

pnpm dlx add-skill https://github.com/melodic-software/claude-code-plugins/tree/HEAD/plugins/systems-design/skills/rag-design

Skill Files

Browse the full folder contents for rag-design.

Download Skill

Loading file tree…

plugins/systems-design/skills/rag-design/SKILL.md

Skill Metadata

Name
rag-design
Description
Design a RAG architecture for a use case

Design RAG Architecture

Design a Retrieval-Augmented Generation system for a given use case.

Arguments

$ARGUMENTS - The RAG use case to design for (e.g., "customer support chatbot", "documentation Q&A", "legal document search", "code assistant")

Workflow

  1. Clarify requirements by understanding:

    • What type of questions will be asked?
    • What is the document corpus size and type?
    • What is the required accuracy/faithfulness?
    • What is the latency budget?
    • Are there multi-turn conversation requirements?
  2. Load relevant skills based on the use case:

    • RAG patterns → rag-architecture
    • Vector store selection → vector-databases
    • LLM serving → llm-serving-patterns
    • Inference optimization → ml-inference-optimization
  3. Spawn the rag-architect agent for comprehensive design:

    • Use Task tool with subagent_type="rag-architect"
    • Provide full use case context and requirements
    • Request end-to-end RAG architecture
  4. Design the ingestion pipeline:

    • Document extraction (PDF, HTML, code)
    • Chunking strategy selection
    • Embedding model selection
    • Vector database configuration
    • Metadata extraction and indexing
  5. Design the retrieval pipeline:

    • Query processing (expansion, HyDE)
    • Retrieval strategy (dense, sparse, hybrid)
    • Reranking approach
    • Context assembly
    • Prompt engineering
  6. Address quality and scale:

    • Retrieval accuracy (recall@k, MRR)
    • Answer faithfulness (grounding)
    • Latency budget allocation
    • Cost optimization
    • Scaling strategy

Example Usage

/sd:rag-design customer support chatbot with 10K FAQ documents
/sd:rag-design internal documentation Q&A for engineering team
/sd:rag-design legal document search for contract review
/sd:rag-design code assistant for enterprise codebase
/sd:rag-design research paper Q&A with 100K papers
/sd:rag-design product catalog search with structured data
/sd:rag-design multi-lingual knowledge base

Use Case Categories

| Category | Key Considerations | | -------- | ------------------ | | Customer Support | FAQ coverage, escalation, tone consistency | | Documentation | Technical accuracy, code examples, versioning | | Legal/Compliance | Citation accuracy, audit trails, access control | | Code Assistance | AST-aware chunking, context relevance, IDE integration | | Research/Academic | Multi-document reasoning, citation, long-form answers | | E-commerce | Product attributes, inventory awareness, personalization |

RAG Pattern Selection Guide

| Complexity | Pattern | When to Use | | ---------- | ------- | ----------- | | Low | Basic RAG | Simple Q&A, small corpus | | Medium | RAG + Reranking | Higher accuracy needed | | Medium | Hybrid Search | Mixed keyword + semantic queries | | High | Query-Transformed | Vague or complex queries | | High | Agentic RAG | Multi-hop reasoning, tool use |

Output

A comprehensive RAG system architecture including:

  • Ingestion pipeline (documents → vectors)
  • Retrieval pipeline (query → context)
  • Technology stack (embedding model, vector DB, LLM)
  • Quality targets (recall, faithfulness, latency)
  • Trade-offs and alternatives
  • Cost estimate (per-query and monthly)