AssemblyAI Cost Tuning
Overview
Optimize AssemblyAI costs through model selection, feature-aware billing, and usage monitoring. AssemblyAI charges per audio hour with add-on pricing for intelligence features.
Prerequisites
- Access to AssemblyAI billing dashboard at https://www.assemblyai.com/app
- Understanding of current usage patterns
Actual Pricing (Pay-As-You-Go)
Speech-to-Text (Async)
| Model | Price per Hour | Best For | |-------|---------------|----------| | Best (Universal-3) | $0.37/hr | Highest accuracy, production | | Nano | $0.12/hr | High volume, cost-sensitive |
Streaming Speech-to-Text
| Model | Price per Hour | |-------|---------------| | Universal Streaming | $0.47/hr |
Audio Intelligence Add-Ons
| Feature | Additional Cost per Hour | |---------|-------------------------| | Speaker Diarization | $0.02/hr | | Sentiment Analysis | $0.02/hr | | Entity Detection | $0.08/hr | | Auto Highlights | Included | | Content Safety | $0.02/hr | | IAB Categories | $0.02/hr | | Summarization | Included (uses LeMUR) | | PII Redaction | $0.02/hr | | PII Audio Redaction | +processing time |
LeMUR
| Model | Price per Input Token | Price per Output Token | |-------|----------------------|----------------------| | Default | ~$0.003/1K tokens | ~$0.015/1K tokens |
Instructions
Step 1: Cost Estimation Calculator
interface CostEstimate {
baseTranscriptionCost: number;
featuresCost: number;
totalCost: number;
breakdown: Record<string, number>;
}
function estimateTranscriptionCost(
audioHours: number,
options: {
model?: 'best' | 'nano';
speakerLabels?: boolean;
sentimentAnalysis?: boolean;
entityDetection?: boolean;
contentSafety?: boolean;
iabCategories?: boolean;
piiRedaction?: boolean;
} = {}
): CostEstimate {
const model = options.model ?? 'best';
const baseRate = model === 'best' ? 0.37 : 0.12;
const baseCost = audioHours * baseRate;
const breakdown: Record<string, number> = {
[`transcription (${model})`]: baseCost,
};
let featuresCost = 0;
if (options.speakerLabels) {
const cost = audioHours * 0.02;
breakdown['speaker_labels'] = cost;
featuresCost += cost;
}
if (options.sentimentAnalysis) {
const cost = audioHours * 0.02;
breakdown['sentiment_analysis'] = cost;
featuresCost += cost;
}
if (options.entityDetection) {
const cost = audioHours * 0.08;
breakdown['entity_detection'] = cost;
featuresCost += cost;
}
if (options.contentSafety) {
const cost = audioHours * 0.02;
breakdown['content_safety'] = cost;
featuresCost += cost;
}
if (options.iabCategories) {
const cost = audioHours * 0.02;
breakdown['iab_categories'] = cost;
featuresCost += cost;
}
if (options.piiRedaction) {
const cost = audioHours * 0.02;
breakdown['pii_redaction'] = cost;
featuresCost += cost;
}
return {
baseTranscriptionCost: baseCost,
featuresCost,
totalCost: baseCost + featuresCost,
breakdown,
};
}
// Example: 100 hours with Best model + diarization + sentiment
const estimate = estimateTranscriptionCost(100, {
model: 'best',
speakerLabels: true,
sentimentAnalysis: true,
});
// Result: $37 (transcription) + $2 (speakers) + $2 (sentiment) = $41
Step 2: Model Selection Strategy
import { AssemblyAI } from 'assemblyai';
const client = new AssemblyAI({
apiKey: process.env.ASSEMBLYAI_API_KEY!,
});
// Use Nano for high-volume, cost-sensitive workloads
// - 3x cheaper than Best ($0.12 vs $0.37)
// - Good enough for search indexing, keyword detection
const cheapTranscript = await client.transcripts.transcribe({
audio: audioUrl,
speech_model: 'nano',
});
// Use Best for critical, accuracy-sensitive workloads
// - Medical transcription, legal proceedings, compliance
// - Supports word_boost for domain terminology
const accurateTranscript = await client.transcripts.transcribe({
audio: audioUrl,
speech_model: 'best',
word_boost: ['specialized', 'domain', 'terms'],
boost_param: 'high',
});
Step 3: Feature Budget — Only Enable What You Need
// EXPENSIVE: All features enabled ($0.37 + $0.16 = $0.53/hr)
const expensive = await client.transcripts.transcribe({
audio: audioUrl,
speech_model: 'best', // $0.37/hr
speaker_labels: true, // +$0.02/hr
sentiment_analysis: true, // +$0.02/hr
entity_detection: true, // +$0.08/hr
content_safety: true, // +$0.02/hr
iab_categories: true, // +$0.02/hr
});
// CHEAP: Only what's needed ($0.12 + $0.02 = $0.14/hr)
const cheap = await client.transcripts.transcribe({
audio: audioUrl,
speech_model: 'nano', // $0.12/hr
speaker_labels: true, // +$0.02/hr
// Skip features you don't use
});
Step 4: Usage Tracking
class AssemblyAIUsageTracker {
private totalAudioHours = 0;
private totalCost = 0;
private transcriptionCount = 0;
track(audioDurationSeconds: number, model: 'best' | 'nano', features: string[]) {
const hours = audioDurationSeconds / 3600;
this.totalAudioHours += hours;
this.transcriptionCount++;
const estimate = estimateTranscriptionCost(hours, {
model,
speakerLabels: features.includes('speaker_labels'),
sentimentAnalysis: features.includes('sentiment_analysis'),
entityDetection: features.includes('entity_detection'),
contentSafety: features.includes('content_safety'),
iabCategories: features.includes('iab_categories'),
piiRedaction: features.includes('redact_pii'),
});
this.totalCost += estimate.totalCost;
return estimate;
}
getSummary() {
return {
totalAudioHours: this.totalAudioHours.toFixed(2),
totalCost: `$${this.totalCost.toFixed(2)}`,
transcriptionCount: this.transcriptionCount,
avgCostPerTranscription: `$${(this.totalCost / this.transcriptionCount).toFixed(4)}`,
};
}
}
Step 5: Cost Reduction Strategies
| Strategy | Savings | Trade-off | |----------|---------|-----------| | Use Nano instead of Best | 68% cheaper | Slightly lower accuracy | | Disable unused features | Up to $0.16/hr | Missing insights | | Cache transcript results | Eliminate re-fetch costs | Stale data risk | | Use LeMUR instead of per-feature AI | Often cheaper for summaries | Different output format | | Pre-filter audio (skip silence) | Proportional savings | Requires preprocessing | | Batch with webhooks | No savings, but better throughput | More complex architecture |
Step 6: Budget Alerts
const MONTHLY_BUDGET = 100; // $100
const tracker = new AssemblyAIUsageTracker();
// After each transcription
const estimate = tracker.track(transcript.audio_duration ?? 0, 'best', ['speaker_labels']);
const summary = tracker.getSummary();
if (parseFloat(summary.totalCost.replace('$', '')) > MONTHLY_BUDGET * 0.8) {
console.warn(`Budget warning: ${summary.totalCost} of $${MONTHLY_BUDGET} used`);
// Send alert to Slack, email, etc.
}
Output
- Accurate cost estimation with feature-level breakdown
- Model selection strategy (Best vs. Nano)
- Feature budgeting to eliminate unnecessary costs
- Usage tracking with budget alerts
- Cost reduction strategies ranked by impact
Error Handling
| Issue | Cause | Solution | |-------|-------|----------| | Unexpected high bill | Entity detection enabled everywhere | Audit features per endpoint | | Nano accuracy too low | Wrong model for use case | Switch critical paths to Best | | Budget exceeded | No monitoring | Implement usage tracker + alerts | | Double billing | Re-transcribing same audio | Cache transcript IDs, check before submitting |
Resources
Next Steps
For architecture patterns, see assemblyai-reference-architecture.