Agent Skills: RAG & Search Engineering — Complete Reference

RAG and search engineering — chunking, hybrid retrieval, reranking, and nDCG evaluation. Use when building retrieval-augmented generation pipelines.

ID: vasilyu1983/ai-agents-public/ai-rag

Install this agent skill to your local environment:

pnpm dlx add-skill https://github.com/vasilyu1983/AI-Agents-public/tree/HEAD/frameworks/shared-skills/skills/ai-rag

Skill Files

Browse the full folder contents for ai-rag.

frameworks/shared-skills/skills/ai-rag/SKILL.md

Skill Metadata

Name: ai-rag
Description: RAG and search engineering — chunking, hybrid retrieval, reranking, and nDCG evaluation. Use when building retrieval-augmented generation pipelines.

RAG & Search Engineering — Complete Reference

Build production-grade retrieval systems with hybrid search, grounded generation, and measurable quality.

This skill covers:

  • RAG: Chunking, contextual retrieval, grounding, adaptive/self-correcting systems
  • Search: BM25, vector search, hybrid fusion, ranking pipelines
  • Evaluation: recall@k, nDCG, MRR, groundedness metrics

Modern Best Practices (Jan 2026):

  • Separate retrieval quality from answer quality; evaluate both (RAG: https://arxiv.org/abs/2005.11401).
  • Default to hybrid retrieval (sparse + dense) with reranking when precision matters (DPR: https://arxiv.org/abs/2004.04906).
  • Use a failure taxonomy to debug systematically (Seven Failure Points in RAG: https://arxiv.org/abs/2401.05856).
  • Treat freshness/invalidation as first-class; staleness is a correctness bug, not a UX issue.
  • Add grounding gates: answerability checks, citation coverage checks, and refusal-on-missing-context defaults.
  • Threat-model RAG: retrieved text is untrusted input (OWASP LLM Top 10: https://owasp.org/www-project-top-10-for-large-language-model-applications/).

Default posture: deterministic pipeline, bounded context, explicit failure handling, and telemetry for every stage.
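
The grounding gates above can be as simple as an answerability check in front of generation. A minimal sketch, assuming a score-bearing evidence structure from the retriever; the thresholds and field names are illustrative, not part of any specific framework:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Evidence:
    doc_id: str
    text: str
    score: float  # retriever/reranker relevance score, higher is better

def answerability_gate(evidence: list[Evidence],
                       min_score: float = 0.35,
                       min_passages: int = 1) -> bool:
    """Pass only if enough sufficiently relevant evidence was retrieved.

    The thresholds are illustrative; calibrate them on a labeled set of
    answerable vs. unanswerable queries.
    """
    strong = [e for e in evidence if e.score >= min_score]
    return len(strong) >= min_passages

def answer_or_refuse(question: str,
                     evidence: list[Evidence],
                     generate: Callable[[str, list[Evidence]], str]) -> dict:
    """Refuse by default on missing context; otherwise generate with citations."""
    if not answerability_gate(evidence):
        return {"answer": None,
                "refusal": "Insufficient supporting evidence retrieved.",
                "citations": []}
    return {"answer": generate(question, evidence),
            "citations": [e.doc_id for e in evidence]}
```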

Scope note: For prompt structure and output contracts used in the generation phase, see ai-prompt-engineering.

Quick Reference

| Task | Tool/Framework | Command/Pattern | When to Use |
|------|----------------|-----------------|-------------|
| Decide RAG vs alternatives | Decision framework | RAG if: freshness + citations + corpus size; else: fine-tune/caching | Avoid unnecessary retrieval latency/complexity |
| Chunking & parsing | Chunker + parser | Start simple; add structure-aware chunking per doc type | Ingestion for docs, code, tables, PDFs |
| Retrieval | Sparse + dense (hybrid) | Fusion (e.g., RRF) + metadata filters + top-k tuning | Mixed query styles; high recall requirements |
| Precision boost | Reranker | Cross-encoder/LLM rerank of top-k candidates | When top-k contains near-misses/noise |
| Grounding | Output contract + citations | Quote/ID citations; answerability gate; refuse on missing evidence | Compliance, trust, and auditability |
| Evaluation | Offline + online eval | Retrieval metrics + answer metrics + regression tests | Prevent silent regressions and staleness failures |
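
The fusion step in the Retrieval row is often reciprocal rank fusion (RRF). A minimal sketch, assuming each retriever returns a best-first list of document IDs; k=60 is the commonly cited default, not a tuned value:

```python
from collections import defaultdict

def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each input list is ordered best-first (e.g., one from BM25, one from a
    dense retriever). k=60 is the widely used default; tune per corpus.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in result_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: fuse BM25 and dense retriever outputs before reranking.
bm25_hits = ["doc_7", "doc_2", "doc_9"]
dense_hits = ["doc_2", "doc_5", "doc_7"]
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])  # doc_2 and doc_7 rise to the top
```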

Decision Tree: RAG Architecture Selection

Building RAG system: [Architecture Path]
    ├─ Document type?
    │   ├─ Page/section-structured? → Structure-aware chunking (pages/sections + metadata)
    │   ├─ Technical docs/code? → Structure-aware + code-aware chunking (symbols, headers)
    │   └─ Simple content? → Fixed-size token chunking with overlap (baseline)
    │
    ├─ Retrieval accuracy low?
    │   ├─ Query ambiguity? → Query rewriting + multi-query expansion + filters
    │   ├─ Noisy results? → Add reranker + better metadata filters
    │   └─ Mixed queries? → Hybrid retrieval (sparse + dense) + reranking
    │
    ├─ Dataset size?
    │   ├─ <100k chunks? → Flat index (exact search)
    │   ├─ 100k-10M? → HNSW (low latency)
    │   └─ >10M? → IVF/ScaNN/DiskANN (scalable)
    │
    └─ Production quality?
        └─ Add: ACLs, freshness/invalidation, eval gates, and telemetry (end-to-end)
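
For the baseline branch of the tree (fixed-size token chunking with overlap), a minimal sketch assuming the tiktoken tokenizer; the chunk size and overlap values are starting points to tune against retrieval metrics, not recommendations:

```python
import tiktoken  # pip install tiktoken

def chunk_by_tokens(text: str, chunk_size: int = 400, overlap: int = 50,
                    encoding_name: str = "cl100k_base") -> list[str]:
    """Split text into fixed-size token windows with overlap."""
    enc = tiktoken.get_encoding(encoding_name)
    tokens = enc.encode(text)
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        if window:
            chunks.append(enc.decode(window))
    return chunks
```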

Core Concepts (Vendor-Agnostic)

  • Pipeline stages: ingest → chunk → embed → index → retrieve → rerank → pack context → generate → verify.
  • Two evaluation planes: retrieval relevance (did we fetch the right evidence?) vs generation fidelity (did we use it correctly?).
  • Freshness model: staleness budget, invalidation triggers, and rebuild strategy (incremental vs full).
  • Trust boundaries: retrieved content is untrusted; apply the same rigor as user input (OWASP LLM Top 10: https://owasp.org/www-project-top-10-for-large-language-model-applications/).
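
For the retrieval-relevance plane, the usual metrics (recall@k, MRR, nDCG@k) are each only a few lines. A minimal sketch, assuming per-query relevance judgments keyed by document ID:

```python
import math

def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)

def mrr(retrieved: list[str], relevant: set[str]) -> float:
    """Reciprocal rank of the first relevant document (0 if none retrieved)."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(retrieved: list[str], gains: dict[str, float], k: int) -> float:
    """nDCG@k with graded relevance judgments in `gains` (doc_id -> gain)."""
    dcg = sum(gains.get(doc_id, 0.0) / math.log2(rank + 1)
              for rank, doc_id in enumerate(retrieved[:k], start=1))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 1) for rank, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0
```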

Implementation Practices (Tooling Examples)

  • Use a retrieval API contract: query, filters, top_k, trace_id, and returned evidence IDs.
  • Instrument each stage with tracing/metrics (OpenTelemetry GenAI semantic conventions: https://opentelemetry.io/docs/specs/semconv/gen-ai/).
  • Add caches deliberately: embeddings cache, retrieval cache (query+filters), and response cache (with invalidation).
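
One way to spell out the retrieval API contract above is as typed request/response objects; this is a sketch with illustrative field names, not a standard schema:

```python
from dataclasses import dataclass, field

@dataclass
class RetrievalRequest:
    query: str
    top_k: int = 10
    filters: dict[str, str] = field(default_factory=dict)  # e.g. {"tenant": "acme", "lang": "en"}
    trace_id: str = ""  # propagate through every stage for telemetry

@dataclass
class EvidenceChunk:
    evidence_id: str  # stable ID, reused later for citations
    doc_id: str
    text: str
    score: float
    metadata: dict[str, str] = field(default_factory=dict)

@dataclass
class RetrievalResponse:
    request: RetrievalRequest
    results: list[EvidenceChunk]
    latency_ms: float = 0.0  # recorded per stage for tracing/metrics
```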

Do / Avoid

Do

  • Do keep retrieval deterministic: fixed top_k, stable ranking, explicit filters.
  • Do enforce document-level ACLs at retrieval time (not only at generation time).
  • Do include citations with stable IDs and verify citation coverage in tests.

Avoid

  • Avoid shipping RAG without a test set and regression gate.
  • Avoid "stuff everything" context packing; it increases cost and can reduce accuracy.
  • Avoid mixing corpora without metadata and tenant isolation.
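
The citation-coverage check from the Do list can run as a regression test against the output contract. A minimal sketch, assuming inline markers of the form [doc_id]; that marker format is an assumption about the contract, not a fixed convention:

```python
import re

def citation_coverage(answer: str, evidence_ids: set[str]) -> float:
    """Share of sentences carrying at least one citation to retrieved evidence."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    if not sentences:
        return 0.0
    cited = 0
    for sentence in sentences:
        refs = set(re.findall(r"\[([^\[\]]+)\]", sentence))
        if refs & evidence_ids:
            cited += 1
    return cited / len(sentences)

# Example regression gate: fail the suite if coverage drops below a floor.
answer = "Latency fell 30% after enabling HNSW [doc_12]. Costs were unchanged [doc_7]."
assert citation_coverage(answer, {"doc_12", "doc_7"}) >= 0.9
```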

When to Use This Skill

Use this skill when the user asks:

  • "Help me design a RAG pipeline."
  • "How should I chunk this document?"
  • "Optimize retrieval for my use case."
  • "My RAG system is hallucinating — fix it."
  • "Choose the right vector database / index type."
  • "Create a RAG evaluation framework."
  • "Debug why retrieval gives irrelevant results."

Tool/Model Recommendation Protocol

When users ask for vendor/model/framework recommendations, validate claims against current primary sources.

Triggers

  • "What's the best vector database for [use case]?"
  • "What should I use for [chunking/embedding/reranking]?"
  • "What's the latest in RAG development?"
  • "Current best practices for [retrieval/grounding/evaluation]?"
  • "Is [Pinecone/Qdrant/Chroma] still relevant in 2026?"
  • "[Vector DB A] vs [Vector DB B]?"
  • "Best embedding model for [use case]?"
  • "What RAG framework should I use?"

Required Checks

  1. Read data/sources.json and start from sources with "add_as_web_search": true.
  2. Verify 1-2 primary docs per recommendation (release notes, benchmarks, docs).
  3. If browsing isn't available, state assumptions and give a verification checklist.

What to Report

After checking, provide:

  • Current landscape: What vector DBs/embeddings are popular NOW (not 6 months ago)
  • Emerging trends: Techniques gaining traction (late interaction, agentic RAG, graph RAG)
  • Deprecated/declining: Approaches or tools losing relevance
  • Recommendation: Based on fresh data, not just static knowledge

Example Topics (verify with current sources)

  • Vector databases (Pinecone, Qdrant, Weaviate, Milvus, pgvector, LanceDB)
  • Embedding models (OpenAI, Cohere, Voyage AI, Jina, Sentence Transformers)
  • Reranking (Cohere Rerank, Jina Reranker, FlashRank, RankGPT)
  • RAG frameworks (LlamaIndex, LangChain, Haystack, txtai)
  • Advanced RAG (contextual retrieval, agentic RAG, graph RAG, CRAG)
  • Evaluation (RAGAS, TruLens, DeepEval, BEIR)

Related Skills

For adjacent topics, reference these skills:

  • ai-llm - Prompting, fine-tuning, instruction datasets
  • ai-agents - Agentic RAG workflows and tool routing
  • ai-llm-inference - Serving performance, quantization, batching
  • ai-mlops - Deployment, monitoring, security, privacy, and governance
  • ai-prompt-engineering - Prompt patterns for RAG generation phase

Templates

  • System Design (Start Here)
  • Chunking & Ingestion
  • Embedding & Indexing
  • Retrieval & Reranking
  • Context Packaging & Grounding
  • Evaluation
  • Search Configuration
  • Query Rewriting

Navigation

  • Resources
  • Templates
  • Data

Use this skill whenever the user needs retrieval-augmented system design or debugging, not prompt work or deployment.

Fact-Checking

  • Use web search/web fetch to verify current external facts, versions, pricing, deadlines, regulations, or platform behavior before final answers.
  • Prefer primary sources; report source links and dates for volatile information.
  • If web access is unavailable, state the limitation and mark guidance as unverified.