Design RAG Architecture
Design a Retrieval-Augmented Generation system for a given use case.
Arguments
$ARGUMENTS - The RAG use case to design for (e.g., "customer support chatbot", "documentation Q&A", "legal document search", "code assistant")
Workflow
-
Clarify requirements by understanding:
- What type of questions will be asked?
- What is the document corpus size and type?
- What is the required accuracy/faithfulness?
- What is the latency budget?
- Are there multi-turn conversation requirements?
-
Load relevant skills based on the use case:
- RAG patterns →
rag-architecture - Vector store selection →
vector-databases - LLM serving →
llm-serving-patterns - Inference optimization →
ml-inference-optimization
- RAG patterns →
-
Spawn the rag-architect agent for comprehensive design:
- Use Task tool with subagent_type="rag-architect"
- Provide full use case context and requirements
- Request end-to-end RAG architecture
-
Design the ingestion pipeline:
- Document extraction (PDF, HTML, code)
- Chunking strategy selection
- Embedding model selection
- Vector database configuration
- Metadata extraction and indexing
-
Design the retrieval pipeline:
- Query processing (expansion, HyDE)
- Retrieval strategy (dense, sparse, hybrid)
- Reranking approach
- Context assembly
- Prompt engineering
-
Address quality and scale:
- Retrieval accuracy (recall@k, MRR)
- Answer faithfulness (grounding)
- Latency budget allocation
- Cost optimization
- Scaling strategy
Example Usage
/sd:rag-design customer support chatbot with 10K FAQ documents
/sd:rag-design internal documentation Q&A for engineering team
/sd:rag-design legal document search for contract review
/sd:rag-design code assistant for enterprise codebase
/sd:rag-design research paper Q&A with 100K papers
/sd:rag-design product catalog search with structured data
/sd:rag-design multi-lingual knowledge base
Use Case Categories
| Category | Key Considerations | | -------- | ------------------ | | Customer Support | FAQ coverage, escalation, tone consistency | | Documentation | Technical accuracy, code examples, versioning | | Legal/Compliance | Citation accuracy, audit trails, access control | | Code Assistance | AST-aware chunking, context relevance, IDE integration | | Research/Academic | Multi-document reasoning, citation, long-form answers | | E-commerce | Product attributes, inventory awareness, personalization |
RAG Pattern Selection Guide
| Complexity | Pattern | When to Use | | ---------- | ------- | ----------- | | Low | Basic RAG | Simple Q&A, small corpus | | Medium | RAG + Reranking | Higher accuracy needed | | Medium | Hybrid Search | Mixed keyword + semantic queries | | High | Query-Transformed | Vague or complex queries | | High | Agentic RAG | Multi-hop reasoning, tool use |
Output
A comprehensive RAG system architecture including:
- Ingestion pipeline (documents → vectors)
- Retrieval pipeline (query → context)
- Technology stack (embedding model, vector DB, LLM)
- Quality targets (recall, faithfulness, latency)
- Trade-offs and alternatives
- Cost estimate (per-query and monthly)