Perplexity Known Pitfalls
Overview
Real gotchas when integrating Perplexity Sonar API. Perplexity uses an OpenAI-compatible chat endpoint but performs live web searches -- a fundamentally different paradigm from standard LLM completions. These pitfalls come from treating it like a regular chatbot.
Prerequisites
- Perplexity API key configured
- Understanding of OpenAI-compatible chat API format
Pitfalls
1. Using It as a Generic Chatbot
Perplexity searches the web per request. Using it for tasks that don't need web search wastes money.
# BAD: general chatbot (wastes a search query)
response = call_perplexity("Write me a haiku about cats")
# Costs $0.005+ for something any LLM can do offline
# GOOD: leverage web search capability
response = call_perplexity(
"What are the latest Next.js 15 features released this month?",
search_recency_filter="month"
)
2. Ignoring Citations
Perplexity returns [1], [2] markers in text with a separate citations array. Ignoring them loses the key value prop.
data = response.model_dump() # or response.json() for raw HTTP
answer = data["choices"][0]["message"]["content"]
citations = data.get("citations", []) # NOT in choices — top-level field
# BAD: displaying raw markers
print(answer) # "According to [1], Node.js 22 adds..."
# GOOD: replace markers with links
import re
for i, url in enumerate(citations, 1):
answer = answer.replace(f"[{i}]", f"[{i}]({url})")
3. Using Wrong SDK Import
There is no @perplexity/sdk or perplexity Python package. Use the standard OpenAI client.
// BAD — this package doesn't exist
import { PerplexityClient } from "@perplexity/sdk";
// GOOD — use OpenAI client with Perplexity base URL
import OpenAI from "openai";
const client = new OpenAI({
apiKey: process.env.PERPLEXITY_API_KEY,
baseURL: "https://api.perplexity.ai",
});
4. Not Setting max_tokens
Without max_tokens, responses can be arbitrarily long, increasing costs unpredictably.
// BAD: no token limit — output cost can spike
await client.chat.completions.create({
model: "sonar-pro", // $15/M output tokens!
messages: [{ role: "user", content: "Tell me about AI" }],
});
// GOOD: always set max_tokens
await client.chat.completions.create({
model: "sonar-pro",
messages: [{ role: "user", content: "Tell me about AI" }],
max_tokens: 1024,
});
5. No Recency Filter for Time-Sensitive Queries
Without search_recency_filter, Perplexity may cite outdated articles.
# BAD: may return articles from any time period
response = call_perplexity("current Bitcoin price")
# GOOD: constrain to recent results
response = call_perplexity(
"current Bitcoin price",
search_recency_filter="day" # hour | day | week | month
)
6. Sending Full Conversation History
Each message in the conversation may trigger new search queries. Sending 20 turns of history is expensive and slow.
# BAD: 20 turns of history = many search queries
messages = long_history + [{"role": "user", "content": "summarize"}]
# GOOD: summarize context, send focused query
messages = [
{"role": "system", "content": "Answer based on web search."},
{"role": "user", "content": f"Context: {summary}\nQuestion: {question}"}
]
7. Using sonar-pro for Simple Queries
sonar-pro costs 3-15x more than sonar. Using it for simple factual lookups wastes budget.
// BAD: sonar-pro for a trivial question
await client.chat.completions.create({
model: "sonar-pro", // $3 input + $15 output per M tokens
messages: [{ role: "user", content: "What is the capital of France?" }],
});
// GOOD: match model to complexity
const model = isComplexQuery(query) ? "sonar-pro" : "sonar";
8. Mixing Allowlist and Denylist in Domain Filter
search_domain_filter supports either allowlist (include) or denylist (exclude with - prefix), but not both in the same request.
// BAD: mixing modes
search_domain_filter: ["python.org", "-reddit.com"] // ERROR
// GOOD: pick one mode
search_domain_filter: ["python.org", "docs.python.org"] // Allowlist
// OR
search_domain_filter: ["-reddit.com", "-quora.com"] // Denylist
9. Not Caching Search Results
Every uncached call performs a web search. At scale, duplicate queries burn budget.
// BAD: same query hits API every time
app.get("/search", (req, res) => {
const result = await client.chat.completions.create({ ... });
res.json(result);
});
// GOOD: cache by query hash
const cache = new LRUCache({ max: 1000, ttl: 3600_000 });
app.get("/search", (req, res) => {
const key = hash(req.query.q);
if (cache.has(key)) return res.json(cache.get(key));
const result = await client.chat.completions.create({ ... });
cache.set(key, result);
res.json(result);
});
10. Wrong Base URL
The API is at api.perplexity.ai, not api.perplexity.com.
// BAD
baseURL: "https://api.perplexity.com" // Wrong domain
// GOOD
baseURL: "https://api.perplexity.ai" // Correct
Code Review Checklist
- [ ] Uses
openaipackage, not fake@perplexity/sdk - [ ] Base URL is
https://api.perplexity.ai - [ ]
max_tokensset on every request - [ ] Citations parsed from
response.citationsarray - [ ]
search_recency_filterused for time-sensitive queries - [ ] Caching implemented for repeated queries
- [ ] Model routing: sonar for simple, sonar-pro for complex
- [ ] Conversation history trimmed before sending
- [ ] PII sanitized from queries
- [ ] Domain filter uses only allowlist OR denylist, not both
Error Handling
| Pitfall | Impact | Detection |
|---------|--------|-----------|
| No caching | 3-5x cost overrun | Check cache hit rate metric |
| Wrong model | Budget waste | Grep for sonar-pro in simple query paths |
| No max_tokens | Unpredictable costs | Grep for create() calls without max_tokens |
| PII in queries | Privacy violation | Run sanitization check in CI |
Output
- Identified anti-patterns in existing code
- Applied fixes for each pitfall
- Code review checklist for ongoing quality