ElevenLabs Performance Tuning
Overview
Optimize ElevenLabs TTS latency and throughput through model selection, streaming strategies, audio format tuning, and caching. Latency ranges from ~75ms (Flash) to ~500ms (v3) depending on configuration.
Prerequisites
- ElevenLabs SDK installed
- Understanding of your latency requirements
- Audio playback infrastructure (browser, mobile, server-side)
Instructions
Step 1: Model Selection for Latency
The single biggest performance lever is model choice:
| Model | Avg Latency | Quality | Languages | Use Case |
|-------|-------------|---------|-----------|----------|
| eleven_flash_v2_5 | ~75ms | Good | 32 | Real-time chat, IVR, gaming |
| eleven_turbo_v2_5 | ~150ms | Good | 32 | Balanced speed/quality |
| eleven_multilingual_v2 | ~300ms | High | 29 | Narration, content creation |
| eleven_v3 | ~500ms | Highest | 70+ | Maximum expressiveness |
// Select a model based on use case
function selectModel(useCase: "realtime" | "balanced" | "quality" | "max_quality"): string {
  const models = {
    realtime: "eleven_flash_v2_5",
    balanced: "eleven_turbo_v2_5",
    quality: "eleven_multilingual_v2",
    max_quality: "eleven_v3",
  };
  return models[useCase];
}
Step 2: Output Format Optimization
Smaller formats = faster transfer:
| Format | Size/Second | Quality | Best For |
|--------|-------------|---------|----------|
| mp3_44100_128 | ~16 KB/s | High | Downloads, archival |
| mp3_22050_32 | ~4 KB/s | Medium | Streaming, mobile |
| pcm_16000 | ~32 KB/s | Lossless | Server-side processing |
| pcm_44100 | ~88 KB/s | Lossless | High-quality processing |
| ulaw_8000 | ~8 KB/s | Low | Telephony/IVR |
// Use a smaller format for streaming, a higher-quality one for downloads
const streamingConfig = {
  outputFormat: "mp3_22050_32", // 4 KB/s — fast streaming
  modelId: "eleven_flash_v2_5", // ~75ms first byte
};
const downloadConfig = {
  outputFormat: "mp3_44100_128", // 16 KB/s — high quality
  modelId: "eleven_multilingual_v2",
};
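As a rough sanity check, the per-second sizes above translate directly into payload estimates. The helper below is a sketch; the rates are the approximations from the table, not values returned by the API:
```typescript
// Approximate bytes per second for each output format (rates from the table above)
const BYTES_PER_SECOND: Record<string, number> = {
  mp3_44100_128: 16_000, // 128 kbps MP3
  mp3_22050_32: 4_000,   // 32 kbps MP3
  pcm_16000: 32_000,     // 16 kHz, 16-bit mono
  pcm_44100: 88_200,     // 44.1 kHz, 16-bit mono
  ulaw_8000: 8_000,      // 8 kHz, 8-bit mono
};

// Estimate the payload size for a clip of the given duration
function estimatePayloadBytes(format: string, seconds: number): number {
  const rate = BYTES_PER_SECOND[format];
  if (rate === undefined) throw new Error(`Unknown format: ${format}`);
  return Math.round(rate * seconds);
}
```
A 10-second clip is about 40 KB as mp3_22050_32 but about 160 KB as mp3_44100_128: a 4x difference in transfer time on the same link.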
Step 3: HTTP Streaming for Time-to-First-Byte
Use the streaming endpoint to start playback before full generation completes:
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";
import type { Response } from "express";

const client = new ElevenLabsClient();

async function streamToResponse(text: string, voiceId: string, res: Response) {
  const startTime = performance.now();
  const stream = await client.textToSpeech.stream(voiceId, {
    text,
    modelId: "eleven_flash_v2_5",
    outputFormat: "mp3_22050_32",
    voiceSettings: {
      stability: 0.5,
      similarityBoost: 0.75,
      style: 0.0, // style=0 reduces latency
    },
  });
  let firstChunk = true;
  for await (const chunk of stream) {
    if (firstChunk) {
      const ttfb = performance.now() - startTime;
      console.log(`Time to first byte: ${ttfb.toFixed(0)}ms`);
      firstChunk = false;
    }
    res.write(chunk); // Forward each chunk to the client as soon as it arrives
  }
  res.end();
}
Step 4: WebSocket Streaming for Lowest Latency
For interactive applications where text arrives in chunks (e.g., from an LLM):
import WebSocket from "ws";
interface WSStreamConfig {
  voiceId: string;
  modelId?: string;
  chunkLengthSchedule?: number[];
}
async function createTTSStream(config: WSStreamConfig) {
  const model = config.modelId || "eleven_flash_v2_5";
  const url = `wss://api.elevenlabs.io/v1/text-to-speech/${config.voiceId}/stream-input?model_id=${model}`;
  const ws = new WebSocket(url);
  const audioChunks: Buffer[] = [];
  let firstAudioTime = 0;
  let resolveFinal: ((audio: Buffer) => void) | undefined;
  const startTime = Date.now();

  await new Promise<void>((resolve, reject) => {
    ws.on("open", resolve);
    ws.on("error", reject);
  });

  // Attach the message handler before any text is sent, so audio chunks
  // that arrive before finish() is called are not lost
  ws.on("message", (data) => {
    const msg = JSON.parse(data.toString());
    if (msg.audio) {
      if (!firstAudioTime) {
        firstAudioTime = Date.now();
        console.log(`WebSocket TTFB: ${firstAudioTime - startTime}ms`);
      }
      audioChunks.push(Buffer.from(msg.audio, "base64"));
    }
    if (msg.isFinal) {
      ws.close();
      resolveFinal?.(Buffer.concat(audioChunks));
    }
  });

  // Initialize the stream
  ws.send(JSON.stringify({
    text: " ",
    xi_api_key: process.env.ELEVENLABS_API_KEY,
    voice_settings: { stability: 0.5, similarity_boost: 0.75 },
    // Control buffering: fewer chars = lower latency, more = better prosody
    chunk_length_schedule: config.chunkLengthSchedule || [50, 120, 200],
  }));

  return {
    // Send text chunks as they arrive (e.g., from an LLM stream)
    sendText(text: string) {
      ws.send(JSON.stringify({ text }));
    },
    // Signal end of input and resolve with the concatenated audio
    finish(): Promise<Buffer> {
      return new Promise((resolve) => {
        resolveFinal = resolve;
        ws.send(JSON.stringify({ text: "" })); // EOS signal
      });
    },
  };
}
// Usage with LLM streaming
const stream = await createTTSStream({
  voiceId: "21m00Tcm4TlvDq8ikWAM",
  chunkLengthSchedule: [50, 100, 150], // Smaller chunks = lower latency
});
// As LLM tokens arrive:
stream.sendText("Hello, ");
stream.sendText("how are ");
stream.sendText("you today?");
const audio = await stream.finish();
Step 5: Audio Caching
Cache generated audio for repeated content (greetings, prompts, errors):
import { LRUCache } from "lru-cache";
import crypto from "crypto";

const audioCache = new LRUCache<string, Buffer>({
  max: 500, // Max cached audio files
  maxSize: 100 * 1024 * 1024, // 100MB total
  sizeCalculation: (value) => value.length,
  ttl: 24 * 60 * 60 * 1000, // 24 hours
});

function cacheKey(text: string, voiceId: string, modelId: string): string {
  return crypto.createHash("sha256")
    .update(`${voiceId}:${modelId}:${text}`)
    .digest("hex");
}

async function cachedTTS(
  text: string,
  voiceId: string,
  modelId = "eleven_multilingual_v2"
): Promise<Buffer> {
  const key = cacheKey(text, voiceId, modelId);
  const cached = audioCache.get(key);
  if (cached) {
    console.log("[Cache HIT]", key.substring(0, 8));
    return cached;
  }
  const stream = await client.textToSpeech.convert(voiceId, {
    text,
    modelId,
  });
  const chunks: Buffer[] = [];
  for await (const chunk of stream) {
    chunks.push(Buffer.from(chunk));
  }
  const audio = Buffer.concat(chunks);
  audioCache.set(key, audio);
  console.log("[Cache MISS]", key.substring(0, 8), `${audio.length} bytes`);
  return audio;
}
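Fixed phrases can also be generated before traffic arrives. The pre-warm helper below is a sketch (the name `prewarmCache` is illustrative); it accepts any `(text, voiceId) => Promise<Buffer>` generator, such as the cached function above, and reports the total bytes cached, which helps when sizing the cache's `maxSize`:
```typescript
// Pre-generate fixed phrases (greetings, error prompts) so first users hit the cache.
// `tts` is any (text, voiceId) => Promise<Buffer> generator, e.g. a cached TTS call.
async function prewarmCache(
  phrases: string[],
  voiceId: string,
  tts: (text: string, voiceId: string) => Promise<Buffer>
): Promise<number> {
  let totalBytes = 0;
  for (const phrase of phrases) {
    const audio = await tts(phrase, voiceId);
    totalBytes += audio.length; // Track cache footprint for sizing maxSize
  }
  return totalBytes;
}
```
Run it once at startup with your standard greetings and error messages; every subsequent request for those phrases is a cache hit.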
Step 6: Parallel Generation
Generate multiple audio segments concurrently:
import PQueue from "p-queue";

const queue = new PQueue({ concurrency: 5 }); // Match your plan's concurrency limit

async function generateChapters(
  chapters: { title: string; text: string }[],
  voiceId: string
): Promise<Buffer[]> {
  const results = await Promise.all(
    chapters.map(chapter =>
      queue.add(async () => {
        const start = performance.now();
        const audio = await cachedTTS(chapter.text, voiceId);
        const duration = performance.now() - start;
        console.log(`${chapter.title}: ${duration.toFixed(0)}ms`);
        return audio;
      })
    )
  );
  return results as Buffer[];
}
Performance Optimization Checklist
| Optimization | Latency Impact | Implementation |
|-------------|----------------|----------------|
| Flash model | ~75% lower vs multilingual_v2, ~85% vs v3 | Change modelId |
| Streaming endpoint | -50% time-to-first-byte | Use .stream() instead of .convert() |
| WebSocket streaming | Best for LLM integration | See Step 4 |
| Smaller output format | -30% transfer time | mp3_22050_32 vs mp3_44100_128 |
| Audio caching | -99% for repeated content | LRU cache with SHA-256 keys |
| style: 0 | 10-20% lower latency | Remove style exaggeration |
| Concurrency queue | Maximize throughput | p-queue matching plan limit |
Error Handling
| Issue | Cause | Solution |
|-------|-------|----------|
| High TTFB | Wrong model | Switch to eleven_flash_v2_5 |
| Choppy streaming | Network buffering | Use pcm_16000 for direct playback |
| Cache miss storm | TTL expired for popular content | Use stale-while-revalidate pattern |
| WebSocket drops | Network instability | Reconnect with buffered text |
| Memory pressure | Audio cache too large | Set maxSize limit on LRU cache |
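The stale-while-revalidate pattern mentioned above can be sketched as follows. This is an illustrative in-memory version built on a plain Map rather than the LRU cache from Step 5: an expired entry is served immediately while a single background refresh replaces it, so popular content never pays full generation latency at TTL expiry.
```typescript
interface SwrEntry {
  audio: Buffer;
  expiresAt: number;
  refreshing: boolean;
}

// Minimal stale-while-revalidate cache: stale hits return the old audio instantly
// and trigger one background regeneration instead of blocking the caller.
class SwrAudioCache {
  private entries = new Map<string, SwrEntry>();

  constructor(
    private ttlMs: number,
    private generate: (key: string) => Promise<Buffer>
  ) {}

  async get(key: string): Promise<Buffer> {
    const entry = this.entries.get(key);
    if (!entry) {
      // Cold miss: the caller waits for generation exactly once
      const audio = await this.generate(key);
      this.entries.set(key, { audio, expiresAt: Date.now() + this.ttlMs, refreshing: false });
      return audio;
    }
    if (Date.now() > entry.expiresAt && !entry.refreshing) {
      // Stale: serve the old audio now, refresh in the background
      entry.refreshing = true;
      this.generate(key)
        .then((audio) =>
          this.entries.set(key, { audio, expiresAt: Date.now() + this.ttlMs, refreshing: false })
        )
        .catch(() => { entry.refreshing = false; }); // Keep serving stale audio on failure
    }
    return entry.audio;
  }
}
```
In production you would also bound the Map's size (as the LRU cache in Step 5 does) and key entries by the same voice/model/text hash.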
Next Steps
For cost optimization, see elevenlabs-cost-tuning.