Cohere Embeddings Reference
Official Resources
- Docs & Cookbooks: https://github.com/cohere-ai/cohere-developer-experience
- API Reference: https://docs.cohere.com/reference/about
Models Overview
| Model | Context | Dimensions | Features |
|-------|---------|------------|----------|
| embed-v4.0 | 128K tokens | 256/512/1024/1536 | Multimodal (text+image), Matryoshka |
| embed-english-v3.0 | 512 tokens | 1024 | English-only, fast |
| embed-multilingual-v3.0 | 512 tokens | 1024 | 100+ languages |
| embed-english-light-v3.0 | 512 tokens | 384 | Lightweight, fastest |
Input Types (CRITICAL)
Using the wrong input_type will silently degrade search quality. Cohere uses asymmetric embeddings, where documents and queries are embedded differently.
| Input Type | Use Case |
|------------|----------|
| search_document | Documents stored in vector DB for retrieval |
| search_query | User queries searching against documents |
| classification | Text classification tasks |
| clustering | Clustering similar documents |
| image | Image inputs (Embed v4 only) |
Example: Search Pipeline
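import cohere

# ClientV2() reads the API key from the CO_API_KEY environment variable
co = cohere.ClientV2()

# Placeholder data for illustration
documents = ["Cohere offers embedding models.", "Vector databases store embeddings."]
user_query = "Which models create embeddings?"

# INDEXING: Use search_document for docs you're storing
doc_response = co.embed(
    model="embed-english-v3.0",
    texts=documents,
    input_type="search_document",  # MUST use for storage
    embedding_types=["float"],
)

# QUERYING: Use search_query for user queries
query_response = co.embed(
    model="embed-english-v3.0",
    texts=[user_query],
    input_type="search_query",  # MUST use for retrieval
    embedding_types=["float"],
)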
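Once both sides are embedded, retrieval comes down to ranking the stored document vectors by similarity to the query vector. The following is a minimal sketch of that step in plain Python, assuming cosine similarity as the metric; `cosine_similarity`, `rank_documents`, and the toy 3-dimensional vectors are illustrative names and data, not part of the Cohere SDK.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_documents(query_vec, doc_vecs, k=3):
    """Return (doc_index, score) pairs for the k most similar documents."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy vectors standing in for real embeddings (hypothetical data).
toy_docs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
toy_query = [1.0, 0.1, 0.0]
top = rank_documents(toy_query, toy_docs, k=2)
print([index for index, _ in top])
```

With real responses, the query vector would come from the search_query call and the document vectors from the search_document call. A vector database normally performs this ranking for you.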
Native SDK Embeddings
Basic Text Embedding
response = co.embed(
    model="embed-english-v3.0",
    texts=["Hello world", "Machine learning is cool"],
    input_type="search_document",
    embedding_types=["float"],  # request float vectors explicitly
)
embeddings = response.embeddings.float_
print(f"Embedding shape: {len(embeddings)} x {len(embeddings[0])}")
Embed v4 with Matryoshka Dimensions
# High precision (default)
response = co.embed(
    model="embed-v4.0",
    texts=["text"],
    input_type="search_document",
    embedding_types=["float"],
    output_dimension=1536,
)

# Balanced (vectors 3x smaller than 1536 dimensions, roughly 3x faster search)
response = co.embed(
    model="embed-v4.0",
    texts=["text"],
    input_type="search_document",
    embedding_types=["float"],
    output_dimension=512,
)

# Compact (vectors 6x smaller than 1536 dimensions, roughly 6x faster search)
response = co.embed(
    model="embed-v4.0",
    texts=["text"],
    input_type="search_document",
    embedding_types=["float"],
    output_dimension=256,
)
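To make the dimension trade-off concrete, here is a rough back-of-the-envelope sketch of raw index size for each output dimension. It assumes float32 storage (4 bytes per value) and ignores any vector-database overhead; `index_size_bytes` is a hypothetical helper, not an SDK function.

```python
def index_size_bytes(num_vectors, dimension, bytes_per_value=4):
    """Raw size of a flat float32 index, ignoring any database overhead."""
    return num_vectors * dimension * bytes_per_value

one_million = 1_000_000
sizes = {dim: index_size_bytes(one_million, dim) for dim in (256, 512, 1024, 1536)}
for dim, size in sizes.items():
    ratio = sizes[1536] / size
    print(f"{dim:>4} dims: {size / 1e9:.2f} GB ({ratio:.1f}x smaller than 1536 dims)")
```

The 3x and 6x figures in the comments above correspond to these size ratios. Actual search speedups depend on the index type and hardware.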
Different Embedding Types
response = co.embed(
    model="embed-english-v3.0",
    texts=["Hello"],
    input_type="search_document",
    embedding_types=["float", "int8", "uint8", "binary", "ubinary"],
)
float_emb = response.embeddings.float_
int8_emb = response.embeddings.int8
binary_emb = response.embeddings.binary
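Binary embeddings are usually compared with Hamming distance rather than cosine similarity. The sketch below assumes the binary and ubinary vectors are bit-packed (8 dimensions per integer, signed for binary and unsigned for ubinary); check that against the API reference before relying on it. `hamming_distance` is an illustrative helper, not part of the SDK.

```python
def hamming_distance(a, b):
    """Count differing bits between two bit-packed vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Mask to 8 bits so signed (int8) and unsigned (uint8) packings both work.
    return sum(bin((x ^ y) & 0xFF).count("1") for x, y in zip(a, b))

# Hypothetical packed vectors: every bit of the first byte differs.
distance = hamming_distance([0b11110000, 7], [0b00001111, 7])
print(distance)
```

A smaller distance means more similar vectors. A common pattern is to shortlist candidates with binary vectors, then rescore the shortlist with float vectors.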
Multimodal Embeddings (Embed v4)
Image Embeddings
import base64

# Read the image and wrap it in a base64 data URI
with open("image.jpg", "rb") as f:
    image_base64 = base64.b64encode(f.read()).decode()
image_uri = f"data:image/jpeg;base64,{image_base64}"

response = co.embed(
    model="embed-v4.0",
    images=[image_uri],
    input_type="image",
    embedding_types=["float"],
)
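The snippet above hardcodes the JPEG MIME type. For mixed image formats, a small helper can infer the type from the file extension. This is a sketch using only the standard library; `to_data_uri` is a hypothetical name.

```python
import base64
import mimetypes

def to_data_uri(path):
    """Read a local image file and return it as a base64 data URI."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type from {path!r}")
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

The result can be passed in the images list in place of a hand-built image_uri. The helper trusts the file extension and does not inspect the image bytes.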
Mixed Content
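# Each input wraps a list of typed content parts; this shape follows the
# v2 Embed API as understood here, so verify it against the API reference.
response = co.embed(
    model="embed-v4.0",
    inputs=[
        {"content": [{"type": "text", "text": "A description of the product"}]},
        {"content": [{"type": "image_url", "image_url": {"url": image_uri}}]},
        {"content": [{"type": "text", "text": "Another text chunk"}]},
    ],
    input_type="search_document",
    embedding_types=["float"],
)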
Batch Processing
Hard Limit: 96 Items Per Request
def embed_in_batches(texts: list[str], batch_size: int = 96) -> list[list[float]]:
    """Embed texts in batches of at most 96 (Cohere API limit)."""
    all_embeddings = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        response = co.embed(
            model="embed-english-v3.0",
            texts=batch,
            input_type="search_document",
            embedding_types=["float"],
        )
        # Preserve input order by appending each batch's vectors in sequence
        all_embeddings.extend(response.embeddings.float_)
    return all_embeddings
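The slicing logic can be checked offline, without spending API calls, by factoring it into a generator. A minimal sketch, where `iter_batches` is a hypothetical helper name:

```python
def iter_batches(items, batch_size=96):
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

texts = [f"doc {n}" for n in range(200)]
batch_sizes = [len(batch) for batch in iter_batches(texts)]
print(batch_sizes)  # [96, 96, 8]
```

Two hundred texts therefore cost three requests, with the last one only partly filled.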
Embed Jobs API (Large Datasets)
job = co.embed_jobs.create(
    model="embed-english-v3.0",
    dataset_id="your-dataset-id",
    input_type="search_document",
)
status = co.embed_jobs.get(job.job_id)
print(status.status)  # "processing", "complete", "failed"
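Embed jobs run asynchronously, so a caller typically polls until the job leaves the "processing" state. Below is a sketch of such a loop under stated assumptions: `wait_for_job` is a hypothetical helper, it treats only the "complete" and "failed" statuses listed above as terminal, and it takes the status lookup as a callable so it can be exercised with a fake.

```python
import time

TERMINAL_STATUSES = ("complete", "failed")  # taken from the comment above

def wait_for_job(get_status, poll_seconds=5.0, timeout_seconds=3600.0, sleep=time.sleep):
    """Poll get_status() until it returns a terminal status or the timeout expires."""
    waited = 0.0
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"job not finished after {waited:.0f}s (last status: {status})")
        sleep(poll_seconds)
        waited += poll_seconds

# Offline demonstration with a fake status sequence and a no-op sleep.
fake_statuses = iter(["processing", "processing", "complete"])
final_status = wait_for_job(lambda: next(fake_statuses), sleep=lambda seconds: None)
print(final_status)
```

Against the real API, the callable would be something like `lambda: co.embed_jobs.get(job.job_id).status`.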
LangChain Integration
Basic Usage
from langchain_cohere import CohereEmbeddings
embeddings = CohereEmbeddings(model="embed-english-v3.0")
vector = embeddings.embed_query("What is machine learning?")
vectors = embeddings.embed_documents(["Document 1", "Document 2"])
With Vector Store
from langchain_cohere import CohereEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter
embeddings = CohereEmbeddings(model="embed-english-v3.0")
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
docs = text_splitter.split_documents(your_documents)
vectorstore = FAISS.from_documents(docs, embeddings)
results = vectorstore.similarity_search("your query", k=5)
Best Practices
- Match input types: Always use search_document for stored docs and search_query for queries
- Batch efficiently: Hard limit of 96 texts per request
- Choose dimensions wisely: Lower dimensions = faster search but slightly less precision
- Chunk long texts: Consider chunking at ~6000 chars (texts auto-truncate at 8K)
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=6000,
    chunk_overlap=200,
)
chunks = splitter.split_text(long_document)
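To show what chunk_size and chunk_overlap mean, here is a dependency-free sketch of fixed-window character chunking. It is a simplification: RecursiveCharacterTextSplitter also prefers to break on separators such as paragraph and line boundaries, which this hypothetical `chunk_text` helper does not do.

```python
def chunk_text(text, chunk_size=6000, chunk_overlap=200):
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks

sample = "".join(str(i % 10) for i in range(13000))
sample_chunks = chunk_text(sample)
print([len(chunk) for chunk in sample_chunks])  # [6000, 6000, 1400]
```

Each chunk repeats the last 200 characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.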