Deepgram Production Checklist
Overview
Comprehensive go-live checklist for Deepgram integrations. Covers singleton client, health checks, Prometheus metrics, alert rules, error handling, and a phased go-live timeline.
Production Readiness Matrix
| Category | Item | Status |
|----------|------|--------|
| Auth | Production API key with scoped permissions | [ ] |
| Auth | Key stored in secret manager (not env file) | [ ] |
| Auth | Key rotation schedule (90-day) configured | [ ] |
| Auth | Fallback key provisioned and tested | [ ] |
| Resilience | Retry with exponential backoff on 429/5xx | [ ] |
| Resilience | Circuit breaker for cascade failure prevention | [ ] |
| Resilience | Request timeout set (30s pre-recorded, 10s TTS) | [ ] |
| Resilience | Graceful degradation when API unavailable | [ ] |
| Performance | Singleton client (not creating per-request) | [ ] |
| Performance | Concurrency limited (50-80% of plan limit) | [ ] |
| Performance | Audio preprocessed (16kHz mono for best results) | [ ] |
| Performance | Large files use callback URL (async) | [ ] |
| Monitoring | Health check endpoint testing Deepgram API | [ ] |
| Monitoring | Prometheus metrics: latency, error rate, usage | [ ] |
| Monitoring | Alerts: error rate >5%, latency >10s, circuit open | [ ] |
| Security | PII redaction enabled if handling sensitive audio | [ ] |
| Security | Audio URLs validated (HTTPS, no private IPs) | [ ] |
| Security | Audit logging on all operations | [ ] |
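The Resilience rows ask for retry with exponential backoff on 429/5xx, which none of the steps below implement. The following is a minimal sketch of one way to do it, assuming the thrown error exposes a numeric `status` property. The names `withRetry`, `isRetryable`, and `backoffDelay` are hypothetical helpers, not part of the Deepgram SDK, and the delay values are placeholders to tune.

```typescript
// Assumption: errors carry an HTTP status code on a `status` property.
interface HttpError extends Error {
  status?: number;
}

// Retry only rate limits (429) and server errors (5xx); other failures are permanent.
function isRetryable(err: unknown): boolean {
  const status = (err as HttpError)?.status;
  return typeof status === 'number' && (status === 429 || (status >= 500 && status <= 599));
}

// Exponential backoff ceiling: baseMs * 2^attempt, capped at maxMs.
function backoffDelay(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs fn up to maxAttempts times, sleeping a random fraction of the
// backoff ceiling between attempts (jitter spreads out retrying clients).
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 4, baseMs = 500): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const lastAttempt = attempt >= maxAttempts - 1;
      if (lastAttempt || !isRetryable(err)) throw err;
      await sleep(Math.random() * backoffDelay(attempt, baseMs));
    }
  }
}
```

A transcription call could then be wrapped as `withRetry(() => transcribe(url))`, where `transcribe` stands for whichever function issues the request. Keeping `maxAttempts` small matters because each retry also counts against the plan's rate limit.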
Instructions
Step 1: Production Singleton Client
import { createClient, DeepgramClient } from '@deepgram/sdk';

class ProductionDeepgram {
  private static client: DeepgramClient | null = null;

  // Create the client on first use, then reuse it for every request
  static getClient(): DeepgramClient {
    if (!this.client) {
      const key = process.env.DEEPGRAM_API_KEY;
      if (!key) throw new Error('DEEPGRAM_API_KEY required for production');
      this.client = createClient(key);
    }
    return this.client;
  }

  // Force re-init (for key rotation)
  static reset() { this.client = null; }
}
Step 2: Health Check Endpoint
import express from 'express';
import { createClient } from '@deepgram/sdk';

const app = express();
const deepgram = createClient(process.env.DEEPGRAM_API_KEY!);

app.get('/health', async (req, res) => {
  const start = Date.now();
  try {
    // Test API connectivity by listing projects
    const { error } = await deepgram.manage.getProjects();
    const latency = Date.now() - start;
    if (error) {
      return res.status(503).json({
        status: 'unhealthy',
        deepgram: 'error',
        error: error.message,
        latency_ms: latency,
      });
    }
    res.json({
      status: 'healthy',
      deepgram: 'connected',
      latency_ms: latency,
      timestamp: new Date().toISOString(),
    });
  } catch (err: any) {
    res.status(503).json({
      status: 'unhealthy',
      deepgram: 'unreachable',
      error: err.message,
      latency_ms: Date.now() - start,
    });
  }
});
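The handler above calls the Deepgram API on every /health request, so a load balancer or orchestrator that probes every few seconds adds steady API traffic. One possible mitigation is to reuse a recent probe result for a short time. The sketch below is an assumption about how that could look: `cachedProbe` is a hypothetical helper, and the TTL is a placeholder value.

```typescript
type Probe<T> = () => Promise<T>;

// Wraps a probe so its result is reused for ttlMs milliseconds.
// `now` is injectable so the cache can be tested with a fake clock.
function cachedProbe<T>(probe: Probe<T>, ttlMs: number, now: () => number = Date.now): Probe<T> {
  let cached: { value: Promise<T>; expires: number } | null = null;

  return () => {
    const t = now();
    if (cached && t < cached.expires) return cached.value;

    const value = probe();
    const entry = { value, expires: t + ttlMs };
    cached = entry;
    // Do not keep serving a failed probe: drop it so the next call retries.
    value.catch(() => {
      if (cached === entry) cached = null;
    });
    return value;
  };
}
```

The /health handler could then await something like `cachedProbe(() => deepgram.manage.getProjects(), 15000)` instead of calling the API directly. The trade-off is that an outage can go unreported for up to one TTL.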
Step 3: Prometheus Metrics
import { Counter, Histogram, Gauge, Registry } from 'prom-client';

const registry = new Registry();

const transcriptionRequests = new Counter({
  name: 'deepgram_requests_total',
  help: 'Total Deepgram API requests',
  labelNames: ['method', 'model', 'status'],
  registers: [registry],
});

const transcriptionLatency = new Histogram({
  name: 'deepgram_latency_seconds',
  help: 'Deepgram API request latency',
  labelNames: ['method', 'model'],
  buckets: [0.5, 1, 2, 5, 10, 30],
  registers: [registry],
});

const audioProcessed = new Counter({
  name: 'deepgram_audio_seconds_total',
  help: 'Total audio seconds processed',
  labelNames: ['model'],
  registers: [registry],
});

const activeConnections = new Gauge({
  name: 'deepgram_active_connections',
  help: 'Active WebSocket connections',
  registers: [registry],
});
// Instrumented transcription: records latency, request status, and audio seconds
async function instrumentedTranscribe(url: string, model = 'nova-3') {
  const timer = transcriptionLatency.startTimer({ method: 'prerecorded', model });
  let status = 'error'; // assume failure until the call succeeds
  try {
    const { result, error } = await deepgram.listen.prerecorded.transcribeUrl(
      { url }, { model, smart_format: true }
    );
    if (error) throw error;
    status = 'ok';
    if (result?.metadata?.duration) {
      audioProcessed.inc({ model }, result.metadata.duration);
    }
    return result;
  } finally {
    // Runs exactly once per call, so failures are not timed or counted twice
    timer();
    transcriptionRequests.inc({ method: 'prerecorded', model, status });
  }
}
// Expose metrics endpoint
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', registry.contentType);
  res.send(await registry.metrics());
});
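The readiness matrix asks for concurrency to stay at 50-80% of the plan limit, but nothing in the metrics code enforces a ceiling. A minimal in-process limiter could look like the sketch below. `ConcurrencyLimiter` is a hypothetical class, and the right limit depends on a plan quota this document does not state.

```typescript
// Allows at most `limit` tasks to run at once; extra tasks wait in FIFO order.
class ConcurrencyLimiter {
  private active = 0;
  private readonly waiting: Array<() => void> = [];
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Number of tasks currently holding a slot (could feed a Prometheus gauge).
  get inFlight(): number {
    return this.active;
  }

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.active >= this.limit) {
      // Wait until a finishing task hands its slot over.
      await new Promise<void>((resolve) => this.waiting.push(resolve));
    } else {
      this.active++;
    }
    try {
      return await task();
    } finally {
      const next = this.waiting.shift();
      if (next) {
        next(); // slot passes directly to the next waiter, so `active` is unchanged
      } else {
        this.active--;
      }
    }
  }
}
```

With a plan limit of, say, 100 concurrent requests, `new ConcurrencyLimiter(60)` would keep the service inside the suggested range. Note that the limit applies per process, so it has to be divided across replicas.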
Step 4: Alert Rules (Prometheus/AlertManager)
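groups:
  - name: deepgram
    rules:
      - alert: DeepgramHighErrorRate
        # sum() is required: without it the division matches label sets one-to-one,
        # so error series are only divided by themselves and the ratio is always 1
        expr: sum(rate(deepgram_requests_total{status="error"}[5m])) / sum(rate(deepgram_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Deepgram error rate > 5%"
      - alert: DeepgramHighLatency
        # Aggregate buckets by le to get one P95 across all methods and models
        expr: histogram_quantile(0.95, sum by (le) (rate(deepgram_latency_seconds_bucket[5m]))) > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Deepgram P95 latency > 10s"
      - alert: DeepgramHealthCheckFailed
        # `up` reports whether Prometheus can scrape the service's /metrics target,
        # not the result of the /health endpoint
        expr: up{job="deepgram-service"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Deepgram health check failed for 2+ minutes"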
Step 5: Error Handling Wrapper
async function safeTranscribe(url: string, options: Record<string, any> = {}) {
  const timeoutMs = options.timeout ?? 30000;
  let timeoutId: ReturnType<typeof setTimeout> | undefined;
  // Rejects after timeoutMs. This stops waiting; it does not cancel the in-flight request.
  const timeoutPromise = new Promise<never>((_, reject) => {
    timeoutId = setTimeout(() => reject(new Error('Transcription timeout')), timeoutMs);
  });
  try {
    return await Promise.race([
      instrumentedTranscribe(url, options.model ?? 'nova-3'),
      timeoutPromise,
    ]);
  } catch (err: any) {
    // Log structured error
    console.error(JSON.stringify({
      level: 'error',
      service: 'deepgram',
      message: err.message,
      url: url.substring(0, 100),
      timestamp: new Date().toISOString(),
    }));
    throw err;
  } finally {
    // Always clear the single timer so it cannot fire late or keep the process alive
    clearTimeout(timeoutId);
  }
}
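The wrapper above bounds how long a single call can take, but it still sends every request to an API that may already be failing. The readiness matrix also lists a circuit breaker for that case. The sketch below shows one common shape for it, under stated assumptions: `CircuitBreaker` is a hypothetical class, and the failure threshold and cooldown are placeholder values.

```typescript
type BreakerState = 'closed' | 'open' | 'half-open';

class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;
  private current: BreakerState = 'closed';
  private readonly threshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(threshold = 5, cooldownMs = 30000, now: () => number = Date.now) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  // An open breaker moves to half-open once the cooldown has elapsed.
  get state(): BreakerState {
    if (this.current === 'open' && this.now() - this.openedAt >= this.cooldownMs) {
      this.current = 'half-open';
    }
    return this.current;
  }

  async call<T>(fn: () => Promise<T>): Promise<T> {
    const state = this.state;
    if (state === 'open') throw new Error('Circuit open: call skipped');
    try {
      const result = await fn();
      this.failures = 0; // any success closes the circuit again
      this.current = 'closed';
      return result;
    } catch (err) {
      this.failures++;
      // A failed trial call, or too many consecutive failures, opens the circuit.
      if (state === 'half-open' || this.failures >= this.threshold) {
        this.current = 'open';
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```

Wrapping calls as `breaker.call(() => safeTranscribe(url))` lets callers fail fast and fall back while the circuit is open. For simplicity this sketch does not restrict half-open to a single trial call, so concurrent requests may all be let through at that point.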
Step 6: Go-Live Timeline
| Phase | When | Actions |
|-------|------|---------|
| D-7 | 1 week before | Load test at 2x expected volume, security review |
| D-3 | 3 days before | Smoke test with production key, verify all alerts fire |
| D-1 | Day before | Confirm on-call rotation, validate dashboards |
| D-0 | Launch | Shadow mode (10% traffic), monitoring open |
| D+1 | Day after | Review error rate, latency, verify no anomalies |
| D+7 | 1 week after | Full traffic, tune alert thresholds based on baselines |
Output
- Singleton client with reset capability
- Health check endpoint with latency reporting
- Prometheus metrics (requests, latency, audio, connections)
- AlertManager rules for error rate, latency, availability
- Timeout-safe transcription wrapper
- Phased go-live timeline
Error Handling
| Issue | Cause | Solution |
|-------|-------|----------|
| Health check 503 | API key expired | Rotate key, check secret manager |
| Metrics not scraped | Wrong port/path | Verify Prometheus target config |
| Alert storms | Thresholds too tight | Add or lengthen the for: duration on each rule, tune threshold values |
| Timeout on large files | Sync mode too slow | Switch to callback URL pattern |