RAG Retrieval Skill | Agent Skills

RAG Retrieval

Comprehensive patterns for building production RAG systems. Each category has individual rule files in rules/ loaded on-demand.

Quick Reference

| Category | Rules | Impact | When to Use | |----------|-------|--------|-------------| | Core RAG | 4 | CRITICAL | Basic RAG, citations, hybrid search, context management | | Embeddings | 3 | HIGH | Model selection, chunking, batch/cache optimization | | Contextual Retrieval | 3 | HIGH | Context-prepending, hybrid BM25+vector, pipeline | | HyDE | 3 | HIGH | Vocabulary mismatch, hypothetical document generation | | Agentic RAG | 4 | HIGH | Self-RAG, CRAG, knowledge graphs, adaptive routing | | Multimodal RAG | 3 | MEDIUM | Image+text retrieval, PDF chunking, cross-modal search | | Query Decomposition | 3 | MEDIUM | Multi-concept queries, parallel retrieval, RRF fusion | | Reranking | 3 | MEDIUM | Cross-encoder, LLM scoring, combined signals | | PGVector | 4 | HIGH | PostgreSQL hybrid search, HNSW indexes, schema design |

Total: 30 rules across 9 categories

Core RAG

Fundamental patterns for retrieval, generation, and pipeline composition.

| Rule | File | Key Pattern | |------|------|-------------| | Basic RAG | rules/core-basic-rag.md | Retrieve + context + generate with citations | | Hybrid Search | rules/core-hybrid-search.md | RRF fusion (k=60) for semantic + keyword | | Context Management | rules/core-context-management.md | Token budgeting + sufficiency check | | Pipeline Composition | rules/core-pipeline-composition.md | Composable Decompose → HyDE → Retrieve → Rerank |

Embeddings

Embedding models, chunking strategies, and production optimization.

| Rule | File | Key Pattern | |------|------|-------------| | Models & API | rules/embeddings-models.md | Model selection, batch API, similarity | | Chunking | rules/embeddings-chunking.md | Semantic boundary splitting, 512 token sweet spot | | Advanced | rules/embeddings-advanced.md | Redis cache, Matryoshka dims, batch processing |

Contextual Retrieval

Anthropic's context-prepending technique — 67% fewer retrieval failures.

| Rule | File | Key Pattern | |------|------|-------------| | Context Prepending | rules/contextual-prepend.md | LLM-generated context + prompt caching | | Hybrid Search | rules/contextual-hybrid.md | 40% BM25 / 60% vector weight split | | Complete Pipeline | rules/contextual-pipeline.md | End-to-end indexing + hybrid retrieval |

HyDE

Hypothetical Document Embeddings for bridging vocabulary gaps.

| Rule | File | Key Pattern | |------|------|-------------| | Generation | rules/hyde-generation.md | Embed hypothetical doc, not query | | Per-Concept | rules/hyde-per-concept.md | Parallel HyDE for multi-topic queries | | Fallback | rules/hyde-fallback.md | 2-3s timeout → direct embedding fallback |

Agentic RAG

Self-correcting retrieval with LLM-driven decision making.

| Rule | File | Key Pattern | |------|------|-------------| | Self-RAG | rules/agentic-self-rag.md | Binary document grading for relevance | | Corrective RAG | rules/agentic-corrective-rag.md | CRAG workflow with web fallback | | Knowledge Graph | rules/agentic-knowledge-graph.md | KG + vector hybrid for entity-rich domains | | Adaptive Retrieval | rules/agentic-adaptive-retrieval.md | Query routing to optimal strategy |

Multimodal RAG

Image + text retrieval with cross-modal search.

| Rule | File | Key Pattern | |------|------|-------------| | Embeddings | rules/multimodal-embeddings.md | CLIP, SigLIP 2, Voyage multimodal-3 | | Chunking | rules/multimodal-chunking.md | PDF extraction preserving images | | Pipeline | rules/multimodal-pipeline.md | Dedup + hybrid retrieval + generation |

Query Decomposition

Breaking complex queries into concepts for parallel retrieval.

| Rule | File | Key Pattern | |------|------|-------------| | Detection | rules/query-detection.md | Heuristic indicators (<1ms fast path) | | Decompose + RRF | rules/query-decompose.md | LLM concept extraction + parallel retrieval | | HyDE Combo | rules/query-hyde-combo.md | Decompose + HyDE for maximum coverage |

Reranking

Post-retrieval re-scoring for higher precision.

| Rule | File | Key Pattern | |------|------|-------------| | Cross-Encoder | rules/reranking-cross-encoder.md | ms-marco-MiniLM (~50ms, free) | | LLM Reranking | rules/reranking-llm.md | Batch scoring + Cohere API | | Combined | rules/reranking-combined.md | Multi-signal weighted scoring |

PGVector

Production hybrid search with PostgreSQL.

| Rule | File | Key Pattern | |------|------|-------------| | Schema | rules/pgvector-schema.md | HNSW index + pre-computed tsvector | | Hybrid Search | rules/pgvector-hybrid-search.md | SQLAlchemy RRF with FULL OUTER JOIN | | Indexing | rules/pgvector-indexing.md | HNSW (17x faster) vs IVFFlat | | Metadata | rules/pgvector-metadata.md | Filtering, boosting, Redis 8 comparison |

Quick Start Example

from openai import OpenAI

client = OpenAI()

async def rag_query(question: str, top_k: int = 5) -> dict:
    """Basic RAG with citations."""
    docs = await vector_db.search(question, limit=top_k)
    context = "\n\n".join([f"[{i+1}] {doc.text}" for i, doc in enumerate(docs)])

    response = await llm.chat([
        {"role": "system", "content": "Answer with inline citations [1], [2]. Use ONLY provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}
    ])

    return {"answer": response.content, "sources": [d.metadata['source'] for d in docs]}

Key Decisions

| Decision | Recommendation | |----------|----------------| | Embedding model | text-embedding-3-small (general), voyage-3.5 (production) | | Chunk size | 256-1024 tokens (512 typical) | | Hybrid weight | 40% BM25 / 60% vector | | Top-k | 3-10 documents | | Temperature | 0.1-0.3 (factual) | | Context budget | 4K-8K tokens | | Reranking | Retrieve 50, rerank to 10 | | Vector index | HNSW (production), IVFFlat (high-volume) | | HyDE timeout | 2-3 seconds with fallback | | Query decomposition | Heuristic first, LLM only if multi-concept |

Common Mistakes

No citation tracking (unverifiable answers)
Context too large (dilutes relevance)
Single retrieval method (misses keyword matches)
Not chunking long documents (context gets lost)
Embedding queries differently than documents
No fallback path in agentic RAG (workflow hangs)
Infinite rewrite loops (no retry limit)
Using wrong similarity metric (cosine vs euclidean)
Not caching embeddings (recomputing unchanged content)
Missing image captions in multimodal RAG (limits text search)

Evaluations

See test-cases.json for 30 test cases across all categories.

Related Skills

ork:langgraph - LangGraph workflow patterns (for agentic RAG workflows)
caching - Cache RAG responses for repeated queries
ork:golden-dataset - Evaluate retrieval quality
ork:llm-integration - Local embeddings with nomic-embed-text
vision-language-models - Image analysis for multimodal RAG
ork:database-patterns - Schema design for vector search

Capability Details

retrieval-patterns

Keywords: retrieval, context, chunks, relevance, rag Solves:

Retrieve relevant context for LLM
Implement RAG pipeline with citations
Optimize retrieval quality

hybrid-search

Keywords: hybrid, bm25, vector, fusion, rrf Solves:

Combine keyword and semantic search
Implement reciprocal rank fusion
Balance precision and recall

embeddings

Keywords: embedding, text to vector, vectorize, chunk, similarity Solves:

Convert text to vector embeddings
Choose embedding models and dimensions
Implement chunking strategies

contextual-retrieval

Keywords: contextual, anthropic, context-prepend, bm25 Solves:

Prepend context to chunks for better retrieval
Reduce retrieval failures by 67%
Implement hybrid BM25+vector search

hyde

Keywords: hyde, hypothetical, vocabulary mismatch Solves:

Bridge vocabulary gaps in semantic search
Generate hypothetical documents for embedding
Handle abstract or conceptual queries

agentic-rag

Keywords: self-rag, crag, corrective, adaptive, grading Solves:

Build self-correcting RAG workflows
Grade document relevance
Implement web search fallback

multimodal-rag

Keywords: multimodal, image, clip, vision, pdf Solves:

Build RAG with images and text
Cross-modal search (text → image)
Process PDFs with mixed content

query-decomposition

Keywords: decompose, multi-concept, complex query Solves:

Break complex queries into concepts
Parallel retrieval per concept
Improve coverage for compound questions

reranking

Keywords: rerank, cross-encoder, precision, scoring Solves:

Improve search precision post-retrieval
Score relevance with cross-encoder or LLM
Combine multiple scoring signals

pgvector-search

Keywords: pgvector, postgresql, hnsw, tsvector, hybrid Solves:

Production hybrid search with PostgreSQL
HNSW vs IVFFlat index selection
SQL-based RRF fusion

Agent Skills: RAG Retrieval

Install this agent skill to your local

Skill Files