ElevenLabs Core Workflow B — Speech-to-Speech, Sound Effects & Audio Isolation
Overview
Secondary ElevenLabs workflows beyond TTS: (1) Speech-to-Speech voice conversion, (2) Sound Effects generation from text descriptions, (3) Audio Isolation for noise removal, and (4) Speech-to-Text transcription.
Prerequisites
- Completed elevenlabs-install-auth setup
- For STS: source audio file in MP3/WAV/M4A format
- For audio isolation: noisy audio file to clean
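A quick pre-upload check can catch an unsupported source file before any API call is made. The sketch below is illustrative only: `fileExtension` and `isSupportedStsInput` are hypothetical helpers, and they look only at the file extension, not at the actual encoding.

```typescript
// Formats listed in the prerequisites for Speech-to-Speech input.
const STS_INPUT_EXTENSIONS = ["mp3", "wav", "m4a"];

// Hypothetical helper: lowercase extension of a path, or "" if there is none.
function fileExtension(path: string): string {
  const lastSlash = Math.max(path.lastIndexOf("/"), path.lastIndexOf("\\"));
  const name = path.slice(lastSlash + 1);
  const dot = name.lastIndexOf(".");
  // dot <= 0 covers both "no dot" and dotfiles such as ".env"
  return dot <= 0 ? "" : name.slice(dot + 1).toLowerCase();
}

// Hypothetical helper: true when the extension is one of the listed STS formats.
function isSupportedStsInput(path: string): boolean {
  return STS_INPUT_EXTENSIONS.indexOf(fileExtension(path)) !== -1;
}
```

Calling `isSupportedStsInput("my_recording.mp3")` before Step 1 gives an early, local failure instead of a rejected upload.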
Instructions
Step 1: Speech-to-Speech (Voice Changer)
Transform audio from one voice to another using POST /v1/speech-to-speech/{voice_id}:
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";
import { createReadStream, createWriteStream } from "fs";
import { Readable } from "stream";
import { pipeline } from "stream/promises";
const client = new ElevenLabsClient();
async function speechToSpeech(
  sourceAudioPath: string,
  targetVoiceId: string,
  outputPath: string
) {
  const audio = await client.speechToSpeech.convert(targetVoiceId, {
    audio: createReadStream(sourceAudioPath),
    model_id: "eleven_english_sts_v2", // STS-specific model
    voice_settings: JSON.stringify({
      stability: 0.5,
      similarity_boost: 0.8,
      style: 0.0,
    }),
    remove_background_noise: true, // Built-in noise removal
  });
  await pipeline(Readable.fromWeb(audio as any), createWriteStream(outputPath));
  console.log(`Voice-converted audio saved to ${outputPath}`);
}
// Convert your voice recording to sound like "Rachel"
await speechToSpeech(
  "my_recording.mp3",
  "21m00Tcm4TlvDq8ikWAM",
  "converted.mp3"
);
cURL equivalent:
curl -X POST "https://api.elevenlabs.io/v1/speech-to-speech/21m00Tcm4TlvDq8ikWAM" \
  -H "xi-api-key: ${ELEVENLABS_API_KEY}" \
  -F "audio=@my_recording.mp3" \
  -F "model_id=eleven_english_sts_v2" \
  -F 'voice_settings={"stability":0.5,"similarity_boost":0.8}' \
  -F "remove_background_noise=true" \
  --output converted.mp3
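Both the SDK call and the cURL request pass voice_settings as a JSON-encoded string. A small builder keeps that encoding in one place. This is a sketch under assumptions: `buildVoiceSettings` and `clamp01` are hypothetical helpers, and the 0 to 1 range for each setting is assumed here rather than stated in this document.

```typescript
interface VoiceSettings {
  stability: number;
  similarity_boost: number;
  style?: number;
}

// Clamp a setting into [0, 1]; the range is an assumption of this sketch.
function clamp01(value: number): number {
  if (!isFinite(value)) {
    throw new RangeError("voice setting must be a finite number");
  }
  return Math.min(1, Math.max(0, value));
}

// Hypothetical helper: returns the JSON string expected by the voice_settings field.
function buildVoiceSettings(settings: VoiceSettings): string {
  const out: VoiceSettings = {
    stability: clamp01(settings.stability),
    similarity_boost: clamp01(settings.similarity_boost),
  };
  // Only include style when the caller set it, so the API default applies otherwise.
  if (settings.style !== undefined) {
    out.style = clamp01(settings.style);
  }
  return JSON.stringify(out);
}
```

The result can be passed directly as the `voice_settings` value in Step 1, replacing the inline `JSON.stringify` call.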
Step 2: Sound Effects Generation
Generate cinematic sound effects from text descriptions using POST /v1/sound-generation:
async function generateSoundEffect(
  description: string,
  outputPath: string,
  options?: {
    duration?: number; // 0.5-30 seconds (omit to let the API pick a length)
    promptInfluence?: number; // 0-1 (default 0.3, higher = follows prompt more closely)
    loop?: boolean; // Seamless looping (default false)
  }
) {
  const audio = await client.textToSoundEffects.convert({
    text: description,
    duration_seconds: options?.duration,
    prompt_influence: options?.promptInfluence ?? 0.3,
    loop: options?.loop ?? false, // Forward the loop option so it actually takes effect
    // model_id: "eleven_text_to_sound_v2", // default
  });
  await pipeline(Readable.fromWeb(audio as any), createWriteStream(outputPath));
  console.log(`Sound effect saved to ${outputPath}`);
}
// Generate various sound effects
await generateSoundEffect(
  "Heavy rain on a tin roof with distant thunder",
  "rain.mp3",
  { duration: 10, promptInfluence: 0.6 }
);
await generateSoundEffect(
  "Sci-fi laser gun firing three quick bursts",
  "laser.mp3",
  { duration: 3, promptInfluence: 0.8 }
);
await generateSoundEffect(
  "Gentle forest ambiance with birds chirping",
  "forest_loop.mp3",
  { duration: 15, loop: true } // Seamless loop for background audio
);
cURL equivalent:
curl -X POST "https://api.elevenlabs.io/v1/sound-generation" \
  -H "xi-api-key: ${ELEVENLABS_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Heavy rain on a tin roof with distant thunder",
    "duration_seconds": 10,
    "prompt_influence": 0.6
  }' \
  --output rain.mp3
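Validating the options locally avoids spending a request on values the endpoint would reject. The following sketch builds the JSON body used in the cURL example and enforces the ranges given above (0.5 to 30 seconds, prompt influence 0 to 1). `buildSoundEffectRequest` is a hypothetical helper, and the `loop` body field is assumed to match the option of the same name.

```typescript
interface SoundEffectOptions {
  duration?: number;
  promptInfluence?: number;
  loop?: boolean;
}

interface SoundEffectRequest {
  text: string;
  prompt_influence: number;
  duration_seconds?: number;
  loop?: boolean;
}

// Hypothetical helper: validates options and returns the request body.
function buildSoundEffectRequest(
  text: string,
  options: SoundEffectOptions = {}
): SoundEffectRequest {
  const trimmed = text.trim();
  if (trimmed.length === 0) {
    throw new Error("description must not be empty");
  }
  const influence =
    options.promptInfluence === undefined ? 0.3 : options.promptInfluence;
  // The negated range test also rejects NaN.
  if (!(influence >= 0 && influence <= 1)) {
    throw new RangeError("promptInfluence must be between 0 and 1");
  }
  const body: SoundEffectRequest = { text: trimmed, prompt_influence: influence };
  if (options.duration !== undefined) {
    if (!(options.duration >= 0.5 && options.duration <= 30)) {
      throw new RangeError("duration must be between 0.5 and 30 seconds");
    }
    body.duration_seconds = options.duration;
  }
  if (options.loop) {
    body.loop = true; // assumed field name
  }
  return body;
}
```

Leaving `duration` unset omits `duration_seconds` from the body entirely, which lets the API choose the length.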
Step 3: Audio Isolation (Voice Isolator)
Remove background noise from audio using POST /v1/audio-isolation:
async function isolateVoice(
  noisyAudioPath: string,
  cleanOutputPath: string
) {
  const cleanAudio = await client.audioIsolation.audioIsolation({
    audio: createReadStream(noisyAudioPath),
  });
  await pipeline(
    Readable.fromWeb(cleanAudio as any),
    createWriteStream(cleanOutputPath)
  );
  console.log(`Clean audio saved to ${cleanOutputPath}`);
}
// Remove background noise from a recording
await isolateVoice("noisy_interview.mp3", "clean_interview.mp3");
Streaming variant for large files (POST /v1/audio-isolation/stream):
async function isolateVoiceStreaming(
  noisyAudioPath: string,
  cleanOutputPath: string
) {
  const stream = await client.audioIsolation.audioIsolationStream({
    audio: createReadStream(noisyAudioPath),
  });
  // pipeline() respects backpressure and resolves only once the file is fully written
  await pipeline(stream as any, createWriteStream(cleanOutputPath));
}
cURL equivalent:
curl -X POST "https://api.elevenlabs.io/v1/audio-isolation" \
  -H "xi-api-key: ${ELEVENLABS_API_KEY}" \
  -F "audio=@noisy_interview.mp3" \
  --output clean_interview.mp3
Step 4: Speech-to-Text (Transcription)
Transcribe audio, optionally with speaker diarization, using POST /v1/speech-to-text:
async function transcribeAudio(audioPath: string) {
  const result = await client.speechToText.convert({
    file: createReadStream(audioPath), // This endpoint names its upload field "file", not "audio"
    model_id: "scribe_v1", // ElevenLabs' STT model
    // language_code: "en", // Optional: force language
    // diarize: true, // Enable speaker detection
    // timestamps_granularity: "word", // "word" or "character"
  });
  console.log("Transcription:", result.text);
  // Word-level timestamps
  if (result.words) {
    for (const word of result.words) {
      console.log(`[${word.start.toFixed(2)}-${word.end.toFixed(2)}] ${word.text}`);
    }
  }
  return result;
}
await transcribeAudio("podcast_episode.mp3");
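With diarization enabled, word-level output is usually easier to read once it is grouped into per-speaker segments. The sketch below shows one way to do that on a plain array. The `TimedWord` shape (including the `speaker_id` field) is an assumption modeled on the fields used above, and `groupBySpeaker` is a hypothetical helper.

```typescript
// Assumed word shape; speaker_id is only expected when diarization is on.
interface TimedWord {
  text: string;
  start: number;
  end: number;
  speaker_id?: string;
}

interface SpeakerSegment {
  speaker: string;
  start: number;
  end: number;
  text: string;
}

// Hypothetical helper: merges consecutive words from the same speaker.
function groupBySpeaker(words: TimedWord[]): SpeakerSegment[] {
  const segments: SpeakerSegment[] = [];
  for (const word of words) {
    const token = word.text.trim();
    if (token.length === 0) continue; // skip whitespace-only entries
    const speaker = word.speaker_id === undefined ? "unknown" : word.speaker_id;
    const last = segments.length > 0 ? segments[segments.length - 1] : undefined;
    if (last !== undefined && last.speaker === speaker) {
      // Same speaker keeps talking: extend the current segment.
      last.text += " " + token;
      last.end = word.end;
    } else {
      segments.push({ speaker, start: word.start, end: word.end, text: token });
    }
  }
  return segments;
}
```

Passing `result.words` from `transcribeAudio` to `groupBySpeaker` would yield one entry per speaker turn, each with its own start and end time.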
API Endpoint Summary
| Feature | Method | Endpoint | Billing |
|---------|--------|----------|---------|
| Speech-to-Speech | POST | /v1/speech-to-speech/{voice_id} | Per character |
| Sound Effects | POST | /v1/sound-generation | Per generation |
| Audio Isolation | POST | /v1/audio-isolation | 1,000 chars/min of audio |
| Audio Isolation Stream | POST | /v1/audio-isolation/stream | 1,000 chars/min of audio |
| Speech-to-Text | POST | /v1/speech-to-text | Per audio minute |
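The audio isolation rate in the table (1,000 characters per minute of audio) can be turned into a rough quota estimate. This is a sketch only: `estimateIsolationCharacters` is a hypothetical helper, and it assumes usage is prorated by the second, which may differ from how billing is actually rounded.

```typescript
// Rate taken from the billing column above.
const ISOLATION_CHARS_PER_MINUTE = 1000;

// Hypothetical helper: estimated character cost for a clip of the given length.
// Assumes per-second proration, rounded up to a whole character.
function estimateIsolationCharacters(durationSeconds: number): number {
  if (!(durationSeconds >= 0)) {
    throw new RangeError("durationSeconds must be a non-negative number");
  }
  return Math.ceil((durationSeconds / 60) * ISOLATION_CHARS_PER_MINUTE);
}
```

Under that assumption, a 90-second clip costs 1.5 × 1,000 = 1,500 characters, and a 30-minute interview costs 30,000.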
Sound Effect Tips
- Be specific: "wooden door creaking slowly open in a quiet room" beats "door sound"
- Specify quantity: "three quick gunshots" vs "gunshots"
- Set mood: "eerie", "cheerful", "aggressive" changes the output character
- Use prompt_influence: 0.6-0.8 for precise results, 0.2-0.4 for creative variation
- Max duration: 30 seconds per generation
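The tips above can be applied mechanically by composing a prompt from its parts. The sketch below is one possible approach: `composeSfxPrompt` and `suggestedPromptInfluence` are hypothetical helpers, and the suggested values (0.7 and 0.3) are simply points chosen inside the two ranges listed in the tips.

```typescript
interface SfxPromptParts {
  subject: string;   // what makes the sound, as specific as possible
  count?: string;    // e.g. "three quick"
  mood?: string;     // e.g. "eerie"
  setting?: string;  // e.g. "in a quiet room"
}

type SfxStyle = "precise" | "creative";

// Hypothetical helper: joins the non-empty parts in mood, count, subject, setting order.
function composeSfxPrompt(parts: SfxPromptParts): string {
  const ordered = [parts.mood, parts.count, parts.subject, parts.setting];
  const kept: string[] = [];
  for (const part of ordered) {
    if (part !== undefined && part.trim().length > 0) {
      kept.push(part.trim());
    }
  }
  return kept.join(" ");
}

// Hypothetical helper: a value inside 0.6-0.8 (precise) or 0.2-0.4 (creative).
function suggestedPromptInfluence(style: SfxStyle): number {
  return style === "precise" ? 0.7 : 0.3;
}
```

For example, a subject of "wooden door creaking slowly open", a setting of "in a quiet room", and a mood of "eerie" compose into "eerie wooden door creaking slowly open in a quiet room".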
Audio Isolation Limits
| Aspect | Limit |
|--------|-------|
| Max file size | 500 MB |
| Max duration | 1 hour |
| Supported formats | MP3, WAV, M4A, FLAC, OGG, WEBM |
| PCM optimization | Use file_format: "pcm_s16le_16" for lowest latency |
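These limits are cheap to check before uploading. The sketch below returns a list of problems for a candidate file. `checkIsolationLimits` is a hypothetical helper, the caller is assumed to already know the file size and duration, and "500 MB" is interpreted conservatively as 500,000,000 bytes, which is an assumption.

```typescript
// Limits from the table above; the byte interpretation of "500 MB" is assumed.
const ISOLATION_MAX_BYTES = 500 * 1000 * 1000;
const ISOLATION_MAX_SECONDS = 60 * 60;
const ISOLATION_FORMATS = ["mp3", "wav", "m4a", "flac", "ogg", "webm"];

interface AudioFileInfo {
  path: string;
  sizeBytes: number;
  durationSeconds: number;
}

// Hypothetical helper: returns an empty array when the file is within all limits.
function checkIsolationLimits(file: AudioFileInfo): string[] {
  const problems: string[] = [];
  const dot = file.path.lastIndexOf(".");
  const ext = dot === -1 ? "" : file.path.slice(dot + 1).toLowerCase();
  if (ISOLATION_FORMATS.indexOf(ext) === -1) {
    problems.push("unsupported format: " + (ext === "" ? "(none)" : ext));
  }
  if (file.sizeBytes > ISOLATION_MAX_BYTES) {
    problems.push("file exceeds 500 MB");
  }
  if (file.durationSeconds > ISOLATION_MAX_SECONDS) {
    problems.push("audio exceeds 1 hour");
  }
  return problems;
}
```

A non-empty result suggests compressing, splitting, or converting the file before calling the isolation endpoint.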
Error Handling
| Error | HTTP | Cause | Solution |
|-------|------|-------|----------|
| model_can_not_do_voice_conversion | 400 | Wrong model for STS | Use eleven_english_sts_v2 |
| audio_too_short | 400 | STS input under 1 second | Use longer audio clip |
| audio_too_long | 400 | STS input over limit | Trim to under 5 minutes |
| invalid_sound_prompt | 400 | Nonsensical SFX description | Write descriptive, specific prompts |
| file_too_large | 413 | Audio isolation over 500MB | Compress or split the file |
| quota_exceeded | 401 | Character/generation limit hit | Check usage dashboard |
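The table can be encoded as a lookup so that a failed request produces an actionable message. This is a sketch under assumptions: the error body is assumed to carry the error name at `detail.status`, and `extractErrorStatus` and `solutionFor` are hypothetical helpers.

```typescript
// Error names and solutions copied from the table above.
const ERROR_SOLUTIONS: Record<string, string> = {
  model_can_not_do_voice_conversion: "Use eleven_english_sts_v2",
  audio_too_short: "Use longer audio clip",
  audio_too_long: "Trim to under 5 minutes",
  invalid_sound_prompt: "Write descriptive, specific prompts",
  file_too_large: "Compress or split the file",
  quota_exceeded: "Check usage dashboard",
};

// Hypothetical helper: reads detail.status from a parsed error body (assumed shape).
function extractErrorStatus(body: unknown): string | undefined {
  if (typeof body !== "object" || body === null) return undefined;
  const detail = (body as { detail?: unknown }).detail;
  if (typeof detail !== "object" || detail === null) return undefined;
  const status = (detail as { status?: unknown }).status;
  return typeof status === "string" ? status : undefined;
}

// Hypothetical helper: maps an error body to the suggested fix, with a fallback.
function solutionFor(body: unknown): string {
  const status = extractErrorStatus(body);
  if (
    status !== undefined &&
    Object.prototype.hasOwnProperty.call(ERROR_SOLUTIONS, status)
  ) {
    return ERROR_SOLUTIONS[status];
  }
  return "See elevenlabs-common-errors";
}
```

Unrecognized or malformed error bodies fall through to the general error guide rather than throwing.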
Next Steps
For common errors, see elevenlabs-common-errors. For SDK patterns, see elevenlabs-sdk-patterns.