Agent Skills: TTS LiveKit Plugin Skill

Build and deploy self-hosted Text-to-Speech API using MeloTTS from Hugging Face and create a LiveKit plugin for voice agents. Use this skill when building TTS systems, LiveKit voice agents, or self-hosted speech synthesis solutions.

UncategorizedID: Okeysir198/P20251122-claude-skills/tts-livekit-plugin

Install this agent skill to your local

pnpm dlx add-skill https://github.com/Okeysir198/P20251122-claude-skills/tree/HEAD/tts-livekit-plugin

Skill Files

Browse the full folder contents for tts-livekit-plugin.

Download Skill

Loading file tree…

tts-livekit-plugin/SKILL.md

Skill Metadata

Name
tts-livekit-plugin
Description
Build and deploy self-hosted Text-to-Speech API using MeloTTS from Hugging Face and create a LiveKit plugin for voice agents. Use this skill when building TTS systems, LiveKit voice agents, or self-hosted speech synthesis solutions.

TTS LiveKit Plugin Skill

This skill provides a complete solution for building self-hosted Text-to-Speech (TTS) systems integrated with LiveKit voice agents.

What This Skill Does

  1. Creates a Self-Hosted TTS API Server

    • FastAPI-based REST API
    • Uses MeloTTS model from Hugging Face
    • Supports streaming audio responses
    • Multi-language and multi-voice support
    • Production-ready with proper error handling
  2. Builds a LiveKit TTS Plugin

    • Fully compatible with LiveKit agents framework
    • Implements standard TTS interface
    • Streaming audio support
    • Proper error handling and retries
    • Drop-in replacement for commercial TTS providers
  3. Provides Complete Testing

    • Comprehensive test suite for API
    • Integration tests for plugin
    • No mocked functions - all real implementations
    • Performance and concurrency tests
  4. Includes Full Documentation

    • API documentation with examples
    • Plugin usage guide
    • Deployment guide for production
    • Multiple usage examples

Components

API Server (api/)

  • server.py: FastAPI server with MeloTTS integration
  • requirements.txt: Python dependencies
  • Endpoints:
    • GET /: Health check
    • GET /voices: List available voices
    • POST /synthesize: Full audio synthesis
    • POST /synthesize/stream: Streaming synthesis

LiveKit Plugin (plugin/)

  • melotts_plugin.py: LiveKit TTS plugin implementation
  • Extends livekit.agents.tts.TTS base class
  • Implements ChunkedStream for audio streaming
  • Uses aiohttp for HTTP requests
  • Proper exception handling (APIConnectionError, APITimeoutError, APIStatusError)

Tests (tests/)

  • test_api.py: API server tests

    • Health checks
    • Voice listing
    • Synthesis (streaming and non-streaming)
    • Multiple languages
    • Error handling
    • Concurrency
  • test_plugin.py: Plugin integration tests

    • Plugin initialization
    • Synthesis with real API
    • Multiple languages
    • Error handling
    • Concurrency
    • Timeout handling

Examples (examples/)

  • test_api_client.py: Standalone API testing script
  • simple_agent.py: Basic LiveKit agent example
  • voice_assistant.py: Complete voice assistant implementation

Documentation (docs/)

  • API.md: Complete API reference
  • PLUGIN.md: Plugin usage guide
  • DEPLOYMENT.md: Production deployment guide

Quick Start

1. Start the TTS API Server

cd api
pip install -r requirements.txt
python -m unidic download
python server.py

Server runs on http://localhost:8000

2. Test the API

cd examples
python test_api_client.py

3. Use in LiveKit Agent

from melotts_plugin import TTS

tts = TTS(
    api_base_url="http://localhost:8000",
    language="EN",
    speaker="EN-US",
    speed=1.0
)

stream = tts.synthesize("Hello from LiveKit!")

Features

  • ✅ Self-hosted (no external API dependencies)
  • ✅ High-quality natural speech (MeloTTS)
  • ✅ 6 languages: English, Spanish, French, Chinese, Japanese, Korean
  • ✅ Multiple voices per language
  • ✅ Streaming audio for low latency
  • ✅ CPU-friendly (optimized for real-time inference)
  • ✅ GPU support (automatic if available)
  • ✅ LiveKit agents framework compatible
  • ✅ Production-ready error handling
  • ✅ Comprehensive test coverage
  • ✅ Full documentation

Architecture

┌─────────────────┐      HTTP POST       ┌──────────────────┐
│  LiveKit Agent  │ ──────────────────►  │   TTS API        │
│                 │                       │   Server         │
│  ┌───────────┐  │                       │                  │
│  │ MeloTTS   │  │   Audio Stream       │  ┌────────────┐  │
│  │ Plugin    │  │ ◄──────────────────  │  │  MeloTTS   │  │
│  └───────────┘  │    (WAV chunks)      │  │  Model     │  │
└─────────────────┘                       │  └────────────┘  │
                                          └──────────────────┘

Why MeloTTS?

  • High Quality: Natural-sounding speech
  • Fast: Optimized for real-time inference
  • CPU-Friendly: Works well even without GPU
  • Multi-lingual: 6 languages supported
  • Low Latency: ~150-200ms TTFB
  • Open Source: Free to use and modify

Performance

  • Latency: 150-200ms time-to-first-byte
  • CPU Usage: Optimized for real-time on CPUs
  • GPU Support: Automatic acceleration if available
  • Streaming: Chunked delivery for low latency
  • Concurrent Requests: Handles multiple simultaneous requests

Supported Languages

| Language | Code | Speakers | |----------|------|----------| | English | EN | EN-US, EN-BR, EN-AU, EN-IN | | Spanish | ES | ES | | French | FR | FR | | Chinese | ZH | ZH | | Japanese | JP | JP | | Korean | KR | KR |

Testing

All tests use real implementations - no mocks:

# Start API server
cd api && python server.py

# Run API tests
cd tests && pytest test_api.py -v

# Run plugin tests
cd tests && pytest test_plugin.py -v

Deployment

Multiple deployment options:

  1. Standalone: Run directly with Python/Uvicorn
  2. Docker: Containerized deployment
  3. Kubernetes: Scalable cloud deployment
  4. Cloud: AWS, GCP, Azure support

See docs/DEPLOYMENT.md for detailed guides.

Integration with LiveKit

The plugin is a drop-in replacement for other TTS providers:

# Instead of:
# from livekit.plugins import openai
# tts = openai.TTS()

# Use:
from melotts_plugin import TTS
tts = TTS(api_base_url="http://localhost:8000")

# Same interface, self-hosted!

Use Cases

  • Voice assistants
  • Interactive voice response (IVR) systems
  • Accessibility tools
  • Educational applications
  • Multilingual customer service bots
  • Real-time voice agents
  • Live streaming with voice synthesis

Requirements

API Server:

  • Python 3.9+
  • 2GB+ RAM
  • FastAPI, MeloTTS, Uvicorn
  • Optional: GPU for faster inference

LiveKit Plugin:

  • Python 3.9+
  • livekit-agents >= 0.8.0
  • aiohttp >= 3.9.0

Security

For production:

  • Add API authentication
  • Enable HTTPS/TLS
  • Implement rate limiting
  • Configure CORS
  • Set up monitoring

See docs/DEPLOYMENT.md#security for details.

When to Use This Skill

Use this skill when you need to:

  1. Build a self-hosted TTS solution
  2. Create LiveKit voice agents with custom TTS
  3. Avoid commercial TTS API costs
  4. Have full control over voice synthesis
  5. Support multiple languages
  6. Deploy TTS in private/air-gapped environments
  7. Build voice assistants
  8. Integrate TTS into existing applications

Troubleshooting

Server won't start:

  • Run python -m unidic download
  • Check port 8000 is available
  • Verify dependencies installed

Plugin connection errors:

  • Ensure API server is running
  • Check api_base_url configuration
  • Verify network connectivity

Audio quality issues:

  • Try different voices/speakers
  • Adjust speed parameter
  • Check sample rate configuration

See documentation for more troubleshooting tips.

Resources

License

Apache 2.0 License

Support

  1. Check the documentation in docs/
  2. Review examples in examples/
  3. Run the test suite to verify setup
  4. Check logs for error messages

This skill provides everything needed for production-ready, self-hosted TTS with LiveKit integration. All code is fully functional with no mocks or placeholders.