Vector Index Tuning
Guide to optimizing vector indexes for production performance.
Use this skill when
- Tuning HNSW parameters
- Implementing quantization
- Optimizing memory usage
- Reducing search latency
- Balancing recall vs speed
- Scaling to billions of vectors
Do not use this skill when
- You only need exact search on small datasets (use a flat index)
- You lack workload metrics or ground truth to validate recall
- You need end-to-end retrieval system design beyond index tuning
Instructions
- Gather workload targets (latency, recall, QPS), data size, and memory budget.
- Choose an index type and establish a baseline with default parameters.
- Benchmark parameter sweeps using real queries and track recall, latency, and memory.
- Validate changes on a staging dataset before rolling out to production.
Refer to resources/implementation-playbook.md for detailed patterns, checklists, and templates.
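The benchmarking step above can be sketched as a small harness. This is a minimal illustration, not a definitive implementation: `search_fn`, `param_values`, and the result field names are hypothetical, and `search_fn` stands in for whatever index library you use (for HNSW, the swept parameter might be the search-time candidate list size). It measures recall@k and p95 latency only; memory would need to be tracked separately.

```python
import time
from typing import Callable, Dict, List, Sequence


def recall_at_k(retrieved: Sequence[int], ground_truth: Sequence[int], k: int) -> float:
    """Fraction of the true top-k neighbors that appear in the retrieved top-k."""
    truth = set(ground_truth[:k])
    if not truth:
        return 0.0
    return len(truth.intersection(retrieved[:k])) / len(truth)


def sweep(
    search_fn: Callable[[object, int, int], Sequence[int]],  # hypothetical: (query, k, param) -> ids
    queries: Sequence[object],
    ground_truth: Sequence[Sequence[int]],
    param_values: Sequence[int],
    k: int = 10,
) -> List[Dict[str, float]]:
    """Run every query at each parameter value and record mean recall and p95 latency."""
    if not queries or len(queries) != len(ground_truth):
        raise ValueError("queries and ground_truth must be non-empty and the same length")

    results = []
    for value in param_values:
        latencies = []
        recalls = []
        for query, truth in zip(queries, ground_truth):
            start = time.perf_counter()
            ids = search_fn(query, k, value)
            latencies.append(time.perf_counter() - start)
            recalls.append(recall_at_k(ids, truth, k))

        latencies.sort()
        # Nearest-rank style p95, clamped to the last sample.
        p95 = latencies[min(len(latencies) - 1, int(0.95 * len(latencies)))]
        results.append(
            {
                "param": value,
                "recall": sum(recalls) / len(recalls),
                "p95_latency_s": p95,
            }
        )
    return results
```

Run it against real production queries with exact (flat index) results as ground truth, then pick the smallest parameter value that meets the recall target within the latency budget.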
Safety
- Avoid reindexing in production without a rollback plan.
- Validate changes under realistic load before applying globally.
- Track recall regressions and revert if quality drops.
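One way to make the last safety check mechanical is a simple gate that compares a candidate configuration against the baseline. The sketch below is illustrative only: the `Metrics` type, the `should_revert` function, and the default thresholds (one point of recall, 10% p95 latency) are assumptions, not recommended values; set them from your own workload targets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    recall: float          # recall@k measured against ground truth, in [0, 1]
    p95_latency_ms: float  # p95 query latency under realistic load


def should_revert(
    baseline: Metrics,
    candidate: Metrics,
    max_recall_drop: float = 0.01,        # hypothetical threshold
    max_latency_increase: float = 0.10,   # hypothetical threshold (fractional)
) -> bool:
    """Return True if the candidate regresses recall or latency beyond the allowed margins."""
    recall_regressed = (baseline.recall - candidate.recall) > max_recall_drop
    latency_regressed = candidate.p95_latency_ms > baseline.p95_latency_ms * (1 + max_latency_increase)
    return recall_regressed or latency_regressed
```

For example, `should_revert(Metrics(0.95, 20.0), Metrics(0.90, 18.0))` flags a revert because recall dropped by more than the allowed margin, even though latency improved.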
Resources
resources/implementation-playbook.md for detailed patterns, checklists, and templates.