Agent Skills: LangChain Embeddings and Vector Search (Python)

'Build and query vector stores with LangChain 1.0 without getting burned

UncategorizedID: jeremylongshore/claude-code-plugins/langchain-embeddings-search

Install this agent skill to your local

pnpm dlx add-skill https://github.com/jeremylongshore/claude-code-plugins-plus-skills/tree/HEAD/plugins/saas-packs/langchain-py-pack/skills/langchain-embeddings-search

Skill Files

Browse the full folder contents for langchain-embeddings-search.

Download Skill

Loading file tree…

plugins/saas-packs/langchain-py-pack/skills/langchain-embeddings-search/SKILL.md

Skill Metadata

Name
langchain-embeddings-search
Description
'Build and query vector stores with LangChain 1.0 without getting burned

LangChain Embeddings and Vector Search (Python)

Overview

FAISS.similarity_search_with_score() returns L2 distance — lower is better. Pinecone.similarity_search_with_score() returns cosine similarity — higher is better. Swap your vector store and your if score > 0.8 filter now keeps the garbage and drops the good results, silently. This is pain-catalog entry P12, and it is the single most common reason a "we migrated from FAISS to Pinecone for scale" project loses retrieval quality overnight.

The sibling gotchas:

  • P13 — RecursiveCharacterTextSplitter default separators break inside code fences, so RAG over Markdown docs truncates code examples mid-function
  • P14 — Embedding-dim mismatch crashes at insert time (after 10 minutes of processing), not at VectorStore.__init__; the failure blames "dim mismatch: 1536 != 3072" and no earlier error
  • P15 — Cohere/Jina reranker scores are within-query relative, so a 0.34 top-1 is not worse than a 0.92 top-1 on a different query; filtering by threshold is the wrong heuristic

This skill walks through embedding model selection, vector store creation with the version-safe dim guard, score normalization, hybrid keyword+vector search, and rerankers with the correct filter-by-rank pattern. Pin: langchain-core 1.0.x, langchain-community 1.0.x, langchain-openai 1.0.x, faiss-cpu, pinecone-client. Pain-catalog anchors: P12, P13, P14, P15, P49, P50.

Prerequisites

  • Python 3.10+
  • langchain-core >= 1.0, < 2.0 and langchain-community >= 1.0, < 2.0
  • Embedding provider: pip install langchain-openai (text-embedding-3-small/large)
  • Vector store: pip install faiss-cpu OR pip install langchain-pinecone
  • Provider API keys: OPENAI_API_KEY, PINECONE_API_KEY

Instructions

Step 1 — Initialize embeddings with an explicit dim

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small",  # 1536 dims
    # For text-embedding-3-large, use 3072 dims — must match index
)

# Assert dim at startup (prevents P14)
assert len(embeddings.embed_query("test")) == 1536, "embedding dim drifted"

Swapping models (-small 1536 → -large 3072) is a migration, not a swap. Plan it — back-fill the index, not just the config.

Step 2 — Choose a vector store

| Store | Score metric | Latency (1M vectors) | When to use | |---|---|---|---| | FAISS | L2 distance (lower = better) | ~5ms | Local dev, < 1M vectors, in-process | | Chroma | Cosine similarity (higher = better) | ~10ms | Small multi-user, persistent local | | PGVector | Cosine by default (higher = better) | ~20ms | Existing Postgres, transactional needs | | PineconeVectorStore | Cosine similarity (higher = better) | ~50ms (hosted) | > 1M vectors, multi-tenant, managed |

from langchain_community.vectorstores import FAISS

store = FAISS.from_documents(docs, embedding=embeddings)
results = store.similarity_search_with_score("query", k=5)
# FAISS: [(doc, 0.31), (doc, 0.42), ...] — LOWER IS MORE SIMILAR

vs.

from langchain_pinecone import PineconeVectorStore

store = PineconeVectorStore(index_name="prod", embedding=embeddings)
results = store.similarity_search_with_score("query", k=5)
# Pinecone: [(doc, 0.91), (doc, 0.87), ...] — HIGHER IS MORE SIMILAR

See Vector Store Comparison for the feature matrix and the migration gotchas.

Step 3 — Normalize scores before any threshold filter

Write a normalizer at the retriever boundary, so downstream code never sees raw store-specific scores:

def normalize(score: float, store_type: str) -> float:
    """Return similarity in [0, 1] where 1 = identical, 0 = unrelated."""
    if store_type == "faiss_l2":
        return 1.0 / (1.0 + score)  # collapse L2 distance into similarity
    if store_type in {"pinecone", "chroma", "pgvector"}:
        return max(0.0, min(1.0, score))  # already similarity, clamp just in case
    raise ValueError(f"Unknown store type: {store_type}")

Now score > 0.7 means the same thing regardless of backend. See Score Semantics for the per-store derivation.

Step 4 — Chunk text with language-aware splitters

from langchain_text_splitters import RecursiveCharacterTextSplitter, Language

# BAD — breaks inside Markdown code fences (P13)
bad = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)

# GOOD — respects Markdown structure
md_splitter = RecursiveCharacterTextSplitter.from_language(
    Language.MARKDOWN, chunk_size=1000, chunk_overlap=100,
)

# For Python source files
py_splitter = RecursiveCharacterTextSplitter.from_language(
    Language.PYTHON, chunk_size=1500, chunk_overlap=150,
)

PDF pipelines have their own pain: PyPDFLoader splits by page, tearing tables in half (P49). Use PyMuPDFLoader or UnstructuredPDFLoader for documents with tables.

Step 5 — Hybrid search (keyword + vector)

Pure vector search misses exact-match keywords (product SKUs, error codes, function names). Combine BM25 + vector:

from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever

bm25 = BM25Retriever.from_documents(docs); bm25.k = 5
vector = store.as_retriever(search_kwargs={"k": 5})

ensemble = EnsembleRetriever(
    retrievers=[bm25, vector],
    weights=[0.4, 0.6],  # tune on your eval set
)

See Hybrid Search for the eval harness and the weight-tuning procedure.

Step 6 — Rerank by rank, not by score

from langchain_cohere import CohereRerank

reranker = CohereRerank(top_n=3, model="rerank-v3.5")
reranked = reranker.compress_documents(
    documents=candidates, query=query,
)
# reranked[0].metadata["relevance_score"] is query-relative — 0.34 may be the best
# WRONG: [d for d in reranked if d.metadata["relevance_score"] > 0.5]
# RIGHT: reranked[:top_n]  — trust the rank order

Filter by rank (keep top-k) not threshold. Calibration per-query is possible but rarely worth the engineering cost.

Output

  • Embeddings initialized with dim assertion at startup
  • Vector store chosen from the comparison matrix with score-semantics awareness
  • Score normalizer applied at retriever boundary (no raw scores downstream)
  • Language-aware text splitter that respects code fences and PDF structure
  • Hybrid retriever combining BM25 and vector with tuned weights
  • Reranker filtering by rank, not threshold

Error Handling

| Error | Cause | Fix | |-------|-------|-----| | PineconeApiException: dim mismatch: 1536 != 3072 | Changed embedding model without reindexing (P14) | Create a new index with the new dim; migrate in a background job | | Retrieval quality drops after FAISS→Pinecone swap | Score semantics flipped (P12) | Apply normalize() at boundary; retune threshold on eval set | | RAG answers misquote tables | PyPDFLoader tore table across pages (P49) | Switch to PyMuPDFLoader or UnstructuredPDFLoader | | RAG retrieval drops code examples mid-function | RecursiveCharacterTextSplitter broke code fence (P13) | Use from_language(Language.MARKDOWN/PYTHON) | | Cohere reranker top-1 score < 0.5 | Scores are per-query relative (P15) | Filter by rank (reranked[:k]), not threshold | | WebBaseLoader returns 403 / Cloudflare interstitial (P50) | Default User-Agent flagged as bot | Pass header_template={"User-Agent": "Mozilla/5.0 ..."}; respect robots.txt | | ValueError: expected str instance, NoneType found on embed | Empty document content | Filter docs = [d for d in docs if d.page_content.strip()] before embedding |

Examples

Building a RAG retriever with hybrid search

End-to-end: load Markdown docs with language-aware chunking, embed with OpenAI text-embedding-3-small, index in FAISS for local dev, wrap in an EnsembleRetriever with BM25 at 0.4 weight and vector at 0.6.

See Hybrid Search for the full builder and the weight-tuning procedure on a golden set.

Migrating from FAISS to Pinecone without quality regression

The three gotchas: (a) score semantics flip (P12), (b) the migration needs a re-embed unless the source embedding is stable, (c) threshold filters must be retuned on the new score scale.

See Vector Store Comparison for the migration checklist.

Per-tenant vector stores without leakage

Use Pinecone namespaces or PGVector row-level security. Construct the retriever per-request with the tenant ID — never bind a retriever at import time (P33).

See the pack's langchain-enterprise-rbac skill for the tenant-isolation pattern.

Resources