AssemblyAI Performance Tuning Skill

AssemblyAI Performance Tuning

Overview

Optimize AssemblyAI transcription performance through model selection, parallel processing, caching, and webhook-based architectures.

Prerequisites

assemblyai package installed
Understanding of async patterns
Redis or in-memory cache available (optional)

Latency Benchmarks (Actual)

Async Transcription

| Audio Duration | Approx. Processing Time | Notes | |----------------|------------------------|-------| | 30 seconds | ~10-15 seconds | Includes queue time | | 5 minutes | ~30-60 seconds | Scales sub-linearly | | 1 hour | ~3-5 minutes | Depends on queue load | | 10 hours | ~15-30 minutes | Max async duration |

Streaming

| Metric | Value | |--------|-------| | First partial transcript | ~300ms (P50) | | Final transcript latency | ~500ms (P50) | | End-of-turn detection | Automatic with endpointing |

Model Speed vs. Accuracy

| Model | Speed | Accuracy | Price/hr | |-------|-------|----------|----------| | nano | Fastest | Good | $0.12 | | best (Universal-3) | Standard | Highest | $0.37 | | nova-3 (streaming) | Real-time | High | $0.47 | | nova-3-pro (streaming) | Real-time | Highest | $0.47 |

Instructions

Step 1: Choose the Right Model

import { AssemblyAI } from 'assemblyai';

const client = new AssemblyAI({
  apiKey: process.env.ASSEMBLYAI_API_KEY!,
});

// For highest accuracy (default)
const accurate = await client.transcripts.transcribe({
  audio: audioUrl,
  speech_model: 'best',
});

// For fastest processing and lowest cost
const fast = await client.transcripts.transcribe({
  audio: audioUrl,
  speech_model: 'nano',
});

Step 2: Parallel Batch Processing

import PQueue from 'p-queue';

const queue = new PQueue({ concurrency: 10 });

async function batchTranscribe(audioUrls: string[]) {
  const results = await Promise.all(
    audioUrls.map(url =>
      queue.add(() =>
        client.transcripts.transcribe({ audio: url, speech_model: 'nano' })
      )
    )
  );

  return results.filter(t => t.status === 'completed');
}

// Process 100 files with 10 concurrent jobs
const urls = Array.from({ length: 100 }, (_, i) => `https://storage.example.com/audio-${i}.mp3`);
const transcripts = await batchTranscribe(urls);
console.log(`Completed: ${transcripts.length}/${urls.length}`);

Step 3: Use Webhooks Instead of Polling

// SLOW: transcribe() polls every 3 seconds until done
const slow = await client.transcripts.transcribe({ audio: audioUrl });

// FAST: submit() returns immediately, webhook notifies on completion
const fast = await client.transcripts.submit({
  audio: audioUrl,
  webhook_url: 'https://your-app.com/webhooks/assemblyai',
});
// Your webhook handler processes the result — no polling overhead

Step 4: Cache Transcript Results

import { LRUCache } from 'lru-cache';
import type { Transcript } from 'assemblyai';

const transcriptCache = new LRUCache<string, Transcript>({
  max: 500,
  ttl: 60 * 60 * 1000, // 1 hour
});

async function getCachedTranscript(transcriptId: string): Promise<Transcript> {
  const cached = transcriptCache.get(transcriptId);
  if (cached) return cached;

  const transcript = await client.transcripts.get(transcriptId);
  if (transcript.status === 'completed') {
    transcriptCache.set(transcriptId, transcript);
  }
  return transcript;
}

Step 5: Redis Cache for Distributed Systems

import Redis from 'ioredis';

const redis = new Redis(process.env.REDIS_URL!);

async function getCachedTranscriptRedis(transcriptId: string): Promise<Transcript> {
  const cached = await redis.get(`transcript:${transcriptId}`);
  if (cached) return JSON.parse(cached);

  const transcript = await client.transcripts.get(transcriptId);
  if (transcript.status === 'completed') {
    await redis.setex(
      `transcript:${transcriptId}`,
      3600, // 1 hour TTL
      JSON.stringify(transcript)
    );
  }
  return transcript;
}

Step 6: Minimize Feature Overhead

// Only enable features you actually need — each adds processing time

// Minimal (fastest)
const minimal = await client.transcripts.transcribe({
  audio: audioUrl,
  speech_model: 'nano',
  punctuate: true,
  format_text: true,
});

// Full intelligence (slower, more expensive)
const full = await client.transcripts.transcribe({
  audio: audioUrl,
  speech_model: 'best',
  speaker_labels: true,
  sentiment_analysis: true,
  entity_detection: true,
  auto_highlights: true,
  content_safety: true,
  iab_categories: true,
  summarization: true,
  summary_type: 'bullets',
});

Step 7: Performance Monitoring

async function timedTranscribe(audioUrl: string, options: Record<string, any> = {}) {
  const start = Date.now();
  const transcript = await client.transcripts.transcribe({
    audio: audioUrl,
    ...options,
  });
  const durationMs = Date.now() - start;

  const stats = {
    transcriptId: transcript.id,
    status: transcript.status,
    audioDuration: transcript.audio_duration,
    processingTimeMs: durationMs,
    ratio: transcript.audio_duration
      ? (durationMs / 1000 / transcript.audio_duration).toFixed(2)
      : 'N/A',
    wordCount: transcript.words?.length ?? 0,
    model: options.speech_model ?? 'best',
  };

  console.log('Transcription stats:', stats);
  return { transcript, stats };
}

Output

Optimal model selection based on speed/accuracy/cost trade-offs
Parallel batch processing with concurrency control
Webhook-based architecture (eliminates polling overhead)
In-memory and Redis caching for transcript retrieval
Performance monitoring with processing time ratios

Error Handling

| Issue | Cause | Solution | |-------|-------|----------| | Slow transcription | Large file + best model | Use nano model or split audio | | Queue backlog | Too many concurrent submissions | Limit concurrency with p-queue | | Cache stale data | Transcript re-processed | Set appropriate TTL, invalidate on webhook | | Polling overhead | Using transcribe() for many files | Switch to submit() + webhooks |

Resources

Next Steps

For cost optimization, see assemblyai-cost-tuning.

Agent Skills: AssemblyAI Performance Tuning

Install this agent skill to your local

Skill Files