Agent Skills: elevenlabs

ElevenLabs API for text-to-speech and voice. Use when user mentions "ElevenLabs",

Uncategorized

ID: vm0-ai/vm0-skills/elevenlabs

Install this agent skill locally:

pnpm dlx add-skill https://github.com/vm0-ai/vm0-skills/tree/HEAD/elevenlabs

elevenlabs/SKILL.md

Troubleshooting

If requests fail, run either of the following checks:

zero doctor check-connector --env-name ELEVENLABS_TOKEN

zero doctor check-connector --url https://api.elevenlabs.io/v1/voices --method GET

How to Use

All examples below assume you have ELEVENLABS_TOKEN set.
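Because every request depends on that variable, a small guard can catch a missing token before any call is made. The following is a minimal sketch; require_token is a hypothetical helper name, not part of the skill or of the ElevenLabs API.

```shell
# require_token: return 0 if ELEVENLABS_TOKEN is set and non-empty;
# otherwise print a hint to stderr and return 1.
# (Hypothetical helper, shown for illustration only.)
require_token() {
  if [ -z "${ELEVENLABS_TOKEN:-}" ]; then
    echo "ELEVENLABS_TOKEN is not set; export it before calling the API" >&2
    return 1
  fi
}

# Usage: stop early instead of sending an unauthenticated request.
# require_token && curl -s -X GET "https://api.elevenlabs.io/v1/voices" \
#   --header "xi-api-key: $ELEVENLABS_TOKEN"
```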

The base URL for the ElevenLabs API is:

  • https://api.elevenlabs.io/v1

1. List Available Voices

Get all voices available to your account:

curl -s -X GET "https://api.elevenlabs.io/v1/voices" --header "xi-api-key: $ELEVENLABS_TOKEN" | jq '.voices[] | {voice_id, name, category}'

This returns voice IDs needed for text-to-speech. Common voice categories:

  • premade: ElevenLabs default voices
  • cloned: Your cloned voices
  • generated: AI-designed voices
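To narrow the list to a single category, the same response can be filtered with jq. This is a sketch that assumes the response shape used in the command above (a top-level voices array with voice_id, name, and category fields); voices_in_category is a hypothetical helper name.

```shell
# voices_in_category CATEGORY: read a /v1/voices response on stdin and print
# one "voice_id<TAB>name" line for each voice whose category matches.
# (Hypothetical helper; --arg passes CATEGORY to jq as a string.)
voices_in_category() {
  jq -r --arg category "$1" \
    '.voices[] | select(.category == $category) | [.voice_id, .name] | @tsv'
}

# Usage: list only your cloned voices.
# curl -s -X GET "https://api.elevenlabs.io/v1/voices" \
#   --header "xi-api-key: $ELEVENLABS_TOKEN" | voices_in_category cloned
```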

2. Get Voice Details

Get detailed information about a specific voice. Replace <your-voice-id> with an actual voice ID:

curl -s -X GET "https://api.elevenlabs.io/v1/voices/<your-voice-id>" --header "xi-api-key: $ELEVENLABS_TOKEN"

3. List Available Models

Get all available TTS models:

curl -s -X GET "https://api.elevenlabs.io/v1/models" --header "xi-api-key: $ELEVENLABS_TOKEN" | jq '.[] | {model_id, name, can_do_text_to_speech}'

Common models:

  • eleven_multilingual_v2: Best quality, supports 29 languages
  • eleven_flash_v2_5: Low latency, good for real-time
  • eleven_v3: Latest model (alpha)

4. Text to Speech (Save to File)

Convert text to speech and save as MP3. Replace <your-voice-id> with an actual voice ID from the list voices endpoint:

Write to /tmp/elevenlabs_request.json:

{
  "text": "Hello! This is a test of the ElevenLabs text to speech API.",
  "model_id": "eleven_multilingual_v2",
  "voice_settings": {
    "stability": 0.5,
    "similarity_boost": 0.75
  }
}

Then run:

curl -s -X POST "https://api.elevenlabs.io/v1/text-to-speech/<your-voice-id>" --header "xi-api-key: $ELEVENLABS_TOKEN" --header "Content-Type: application/json" --header "Accept: audio/mpeg" -d @/tmp/elevenlabs_request.json --output speech.mp3

Voice settings:

  • stability (0.0-1.0): Higher = more consistent, lower = more expressive
  • similarity_boost (0.0-1.0): Higher = closer to original voice
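When the text comes from user input, writing the JSON by hand risks broken quoting. One option is to let jq build the body. The sketch below assumes the same fields as the request above; build_tts_request and its argument order are hypothetical, and the defaults simply mirror the example values.

```shell
# build_tts_request TEXT [MODEL_ID] [STABILITY] [SIMILARITY_BOOST]
# Print a request body to stdout. --arg passes TEXT as a JSON string, so
# quotes and newlines are escaped; --argjson passes the settings as numbers.
# (Hypothetical helper, shown for illustration only.)
build_tts_request() {
  jq -n \
    --arg text "$1" \
    --arg model_id "${2:-eleven_multilingual_v2}" \
    --argjson stability "${3:-0.5}" \
    --argjson similarity_boost "${4:-0.75}" \
    '{text: $text, model_id: $model_id,
      voice_settings: {stability: $stability, similarity_boost: $similarity_boost}}'
}

# Usage: write the body to the file used by the curl command.
# build_tts_request 'She said "hello"' > /tmp/elevenlabs_request.json
```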

5. Text to Speech with Streaming

Stream audio for real-time playback. Replace <your-voice-id> with an actual voice ID:

Write to /tmp/elevenlabs_request.json:

{
  "text": "This audio is being streamed in real-time.",
  "model_id": "eleven_flash_v2_5"
}

Then run:

curl -s -X POST "https://api.elevenlabs.io/v1/text-to-speech/<your-voice-id>/stream" --header "xi-api-key: $ELEVENLABS_TOKEN" --header "Content-Type: application/json" --header "Accept: audio/mpeg" -d @/tmp/elevenlabs_request.json --output streamed.mp3

6. Get User Subscription Info

Check your usage and character limits:

curl -s -X GET "https://api.elevenlabs.io/v1/user/subscription" --header "xi-api-key: $ELEVENLABS_TOKEN" | jq '{character_count, character_limit, tier}'
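The remaining quota is the difference between those two counters, so it can be computed from the same response. A minimal sketch, assuming character_count and character_limit are numeric as in the jq filter above; both helper names are hypothetical.

```shell
# characters_remaining: read a /v1/user/subscription response on stdin and
# print character_limit minus character_count. (Hypothetical helper.)
characters_remaining() {
  jq -r '.character_limit - .character_count'
}

# has_character_budget N: succeed only if at least N characters remain.
# jq -e sets a non-zero exit status when the filter result is false.
has_character_budget() {
  jq -e --argjson need "$1" \
    '(.character_limit - .character_count) >= $need' >/dev/null
}

# Usage: skip a 500-character request when the quota is too low.
# curl -s -X GET "https://api.elevenlabs.io/v1/user/subscription" \
#   --header "xi-api-key: $ELEVENLABS_TOKEN" | has_character_budget 500 || echo "quota too low"
```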

Output Formats

You can specify different output formats via the output_format query parameter. Replace <your-voice-id> with an actual voice ID:

Write to /tmp/elevenlabs_request.json:

{
  "text": "Hello world",
  "model_id": "eleven_multilingual_v2"
}

Then run:

curl -s -X POST "https://api.elevenlabs.io/v1/text-to-speech/<your-voice-id>?output_format=pcm_16000" --header "xi-api-key: $ELEVENLABS_TOKEN" --header "Content-Type: application/json" -d @/tmp/elevenlabs_request.json --output speech.pcm

Commonly used formats (not exhaustive):

  • mp3_44100_128 (default): MP3 at 44.1kHz, 128kbps
  • mp3_44100_192: MP3 at 44.1kHz, 192kbps (requires Creator tier or above)
  • pcm_16000: PCM at 16kHz
  • pcm_22050: PCM at 22.05kHz
  • pcm_24000: PCM at 24kHz
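PCM output has no header, so a quick sanity check is to derive the clip length from the file size. The sketch below assumes the pcm_* formats are raw 16-bit mono samples (2 bytes per sample); verify that against the API documentation before relying on it. pcm_duration_ms is a hypothetical helper name.

```shell
# pcm_duration_ms FILE SAMPLE_RATE: print the approximate duration of a raw
# PCM file in milliseconds.
# Assumption: 16-bit samples (2 bytes each) and a single channel.
# The function body runs in a subshell so its variables do not leak.
pcm_duration_ms() (
  bytes=$(wc -c < "$1") || exit 1
  echo $(( $bytes * 1000 / ($2 * 2) ))
)

# Usage: check the file produced by the pcm_16000 request.
# pcm_duration_ms speech.pcm 16000
```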

Guidelines

  1. Choose the right model: Use eleven_flash_v2_5 for low latency, eleven_multilingual_v2 for best quality
  2. Monitor usage: Check subscription endpoint to avoid exceeding character limits
  3. Experiment with voice settings: Adjust stability and similarity_boost for different effects
  4. Use streaming for real-time playback: The stream endpoint returns audio as it is generated, which shortens the wait before playback can start, especially for long text
  5. Cache voice IDs: Store frequently used voice IDs to avoid repeated API calls
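Guideline 5 could be implemented with a small local file that maps voice names to IDs. This is a sketch, not part of the skill: the cache path, the VOICE_CACHE variable, and both function names are hypothetical, and the refresh step reuses the list voices request from section 1.

```shell
# Hypothetical cache location; any writable path works.
VOICE_CACHE="${VOICE_CACHE:-${TMPDIR:-/tmp}/elevenlabs_voices.tsv}"

# refresh_voice_cache: fetch the voice list once and store one
# "name<TAB>voice_id" line per voice.
refresh_voice_cache() {
  curl -s -X GET "https://api.elevenlabs.io/v1/voices" \
    --header "xi-api-key: $ELEVENLABS_TOKEN" |
    jq -r '.voices[] | [.name, .voice_id] | @tsv' > "$VOICE_CACHE"
}

# cached_voice_id NAME: print the voice ID stored for NAME, or return 1
# if the name is not in the cache.
cached_voice_id() {
  awk -F '\t' -v name="$1" \
    '$1 == name { print $2; found = 1; exit } END { exit !found }' "$VOICE_CACHE"
}

# Usage: call the API only on a cache miss.
# voice_id=$(cached_voice_id "My Voice") || { refresh_voice_cache; voice_id=$(cached_voice_id "My Voice"); }
```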