Perplexity Production Checklist Skill

Perplexity Production Checklist

Overview

Complete checklist for deploying Perplexity Sonar API integrations to production. Perplexity-specific concerns: every API call performs a live web search (variable latency), citations link to third-party sites (must validate), and costs scale per-request plus per-token.

Prerequisites

Staging environment tested
Production API key generated (separate from dev/staging)
Monitoring configured
Cost budget defined

Production Readiness Checklist

API Configuration

[ ] Production PERPLEXITY_API_KEY in secret manager (not env file)
[ ] Key starts with pplx- and has credits loaded
[ ] Separate API keys for dev/staging/prod
[ ] Base URL is https://api.perplexity.ai (not localhost/proxy)
[ ] Model selection configured: sonar for fast, sonar-pro for deep

Code Quality

[ ] All search calls wrapped in retry with exponential backoff
[ ] Rate limiting implemented (50 RPM default)
[ ] Query sanitization strips PII before sending to Perplexity
[ ] Citations parsed from response (not extracted from text)
[ ] max_tokens set on all requests (prevents runaway costs)
[ ] Timeouts configured: 15s for sonar, 30s for sonar-pro
[ ] Error handling covers 401, 402, 429, 500+ status codes
[ ] No hardcoded API keys in source code

Performance

[ ] Result caching implemented for repeated queries
[ ] Cache TTL appropriate: 30min for news, 4hrs for research, 24hrs for facts
[ ] Streaming enabled for user-facing search (reduces perceived latency)
[ ] Request queue prevents burst overload
[ ] search_domain_filter used where appropriate (reduces search time)

Monitoring

[ ] Latency tracked per model (sonar ~2s, sonar-pro ~5s, deep-research ~30s)
[ ] Error rate monitored (alert on >5% failure rate)
[ ] Token usage tracked for cost projection
[ ] Citation count per response logged (quality signal)
[ ] 429 rate limit errors tracked with alert

Cost Controls

[ ] Monthly budget cap set on API key
[ ] Model routing: simple queries to sonar, complex to sonar-pro
[ ] max_tokens capped per endpoint
[ ] Cache hit rate monitored (target >30%)
[ ] Cost per query tracked by model

Graceful Degradation

async function searchWithFallback(query: string) {
  try {
    // Primary: sonar-pro for deep answers
    return await perplexity.chat.completions.create({
      model: "sonar-pro",
      messages: [{ role: "user", content: query }],
      max_tokens: 2048,
    });
  } catch (err: any) {
    if (err.status === 429 || err.status >= 500) {
      // Fallback: sonar for faster, cheaper response
      return await perplexity.chat.completions.create({
        model: "sonar",
        messages: [{ role: "user", content: query }],
        max_tokens: 512,
      });
    }
    throw err;
  }
}

Health Check Endpoint

app.get("/health/perplexity", async (req, res) => {
  const start = Date.now();
  try {
    const response = await perplexity.chat.completions.create({
      model: "sonar",
      messages: [{ role: "user", content: "ping" }],
      max_tokens: 5,
    });
    res.json({
      status: "healthy",
      latencyMs: Date.now() - start,
      model: response.model,
    });
  } catch (err: any) {
    res.status(503).json({
      status: "unhealthy",
      error: err.status || err.message,
      latencyMs: Date.now() - start,
    });
  }
});

Alerting Rules

| Alert | Condition | Severity | |-------|-----------|----------| | API Unreachable | Health check fails 3x | P1 | | High Error Rate | 429/5xx > 5% over 5min | P2 | | High Latency | p95 > 15s for sonar | P2 | | Budget Exceeded | Monthly cost > 80% cap | P2 | | Auth Failure | Any 401/402 error | P1 |

Error Handling

| Issue | Cause | Solution | |-------|-------|----------| | Variable latency | Web search per request | Set appropriate timeouts per model | | Broken citations | Source pages changed | Validate citation URLs before displaying | | Cost overrun | No model routing | Route simple queries to sonar | | Rate limit spikes | Burst traffic | Queue requests with p-queue |

Output

Production-ready Perplexity integration with all checks passing
Health check endpoint for monitoring
Graceful degradation from sonar-pro to sonar
Alerting rules configured

Resources

Next Steps

For version upgrades, see perplexity-upgrade-migration.

Agent Skills: Perplexity Production Checklist

Install this agent skill to your local

Skill Files