Perplexity Production Checklist
Overview
Complete checklist for deploying Perplexity Sonar API integrations to production. Perplexity-specific concerns: every API call performs a live web search (variable latency), citations link to third-party sites (must validate), and costs scale per-request plus per-token.
Prerequisites
- Staging environment tested
- Production API key generated (separate from dev/staging)
- Monitoring configured
- Cost budget defined
Production Readiness Checklist
API Configuration
- [ ] Production
PERPLEXITY_API_KEYin secret manager (not env file) - [ ] Key starts with
pplx-and has credits loaded - [ ] Separate API keys for dev/staging/prod
- [ ] Base URL is
https://api.perplexity.ai(not localhost/proxy) - [ ] Model selection configured:
sonarfor fast,sonar-profor deep
Code Quality
- [ ] All search calls wrapped in retry with exponential backoff
- [ ] Rate limiting implemented (50 RPM default)
- [ ] Query sanitization strips PII before sending to Perplexity
- [ ] Citations parsed from response (not extracted from text)
- [ ]
max_tokensset on all requests (prevents runaway costs) - [ ] Timeouts configured: 15s for sonar, 30s for sonar-pro
- [ ] Error handling covers 401, 402, 429, 500+ status codes
- [ ] No hardcoded API keys in source code
Performance
- [ ] Result caching implemented for repeated queries
- [ ] Cache TTL appropriate: 30min for news, 4hrs for research, 24hrs for facts
- [ ] Streaming enabled for user-facing search (reduces perceived latency)
- [ ] Request queue prevents burst overload
- [ ]
search_domain_filterused where appropriate (reduces search time)
Monitoring
- [ ] Latency tracked per model (sonar ~2s, sonar-pro ~5s, deep-research ~30s)
- [ ] Error rate monitored (alert on >5% failure rate)
- [ ] Token usage tracked for cost projection
- [ ] Citation count per response logged (quality signal)
- [ ] 429 rate limit errors tracked with alert
Cost Controls
- [ ] Monthly budget cap set on API key
- [ ] Model routing: simple queries to
sonar, complex tosonar-pro - [ ]
max_tokenscapped per endpoint - [ ] Cache hit rate monitored (target >30%)
- [ ] Cost per query tracked by model
Graceful Degradation
async function searchWithFallback(query: string) {
try {
// Primary: sonar-pro for deep answers
return await perplexity.chat.completions.create({
model: "sonar-pro",
messages: [{ role: "user", content: query }],
max_tokens: 2048,
});
} catch (err: any) {
if (err.status === 429 || err.status >= 500) {
// Fallback: sonar for faster, cheaper response
return await perplexity.chat.completions.create({
model: "sonar",
messages: [{ role: "user", content: query }],
max_tokens: 512,
});
}
throw err;
}
}
Health Check Endpoint
app.get("/health/perplexity", async (req, res) => {
const start = Date.now();
try {
const response = await perplexity.chat.completions.create({
model: "sonar",
messages: [{ role: "user", content: "ping" }],
max_tokens: 5,
});
res.json({
status: "healthy",
latencyMs: Date.now() - start,
model: response.model,
});
} catch (err: any) {
res.status(503).json({
status: "unhealthy",
error: err.status || err.message,
latencyMs: Date.now() - start,
});
}
});
Alerting Rules
| Alert | Condition | Severity | |-------|-----------|----------| | API Unreachable | Health check fails 3x | P1 | | High Error Rate | 429/5xx > 5% over 5min | P2 | | High Latency | p95 > 15s for sonar | P2 | | Budget Exceeded | Monthly cost > 80% cap | P2 | | Auth Failure | Any 401/402 error | P1 |
Error Handling
| Issue | Cause | Solution | |-------|-------|----------| | Variable latency | Web search per request | Set appropriate timeouts per model | | Broken citations | Source pages changed | Validate citation URLs before displaying | | Cost overrun | No model routing | Route simple queries to sonar | | Rate limit spikes | Burst traffic | Queue requests with p-queue |
Output
- Production-ready Perplexity integration with all checks passing
- Health check endpoint for monitoring
- Graceful degradation from sonar-pro to sonar
- Alerting rules configured
Resources
Next Steps
For version upgrades, see perplexity-upgrade-migration.