TTS LiveKit Plugin Skill
This skill provides a complete solution for building self-hosted Text-to-Speech (TTS) systems integrated with LiveKit voice agents.
What This Skill Does
- Creates a Self-Hosted TTS API Server
  - FastAPI-based REST API
  - Uses MeloTTS model from Hugging Face
  - Supports streaming audio responses
  - Multi-language and multi-voice support
  - Production-ready with proper error handling
- Builds a LiveKit TTS Plugin
  - Fully compatible with LiveKit agents framework
  - Implements standard TTS interface
  - Streaming audio support
  - Proper error handling and retries
  - Drop-in replacement for commercial TTS providers
- Provides Complete Testing
  - Comprehensive test suite for API
  - Integration tests for plugin
  - No mocked functions - all real implementations
  - Performance and concurrency tests
- Includes Full Documentation
  - API documentation with examples
  - Plugin usage guide
  - Deployment guide for production
  - Multiple usage examples
Components
API Server (api/)
- server.py: FastAPI server with MeloTTS integration
- requirements.txt: Python dependencies
- Endpoints:
  - GET /: Health check
  - GET /voices: List available voices
  - POST /synthesize: Full audio synthesis
  - POST /synthesize/stream: Streaming synthesis
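As a rough illustration of how a client might address the two synthesis endpoints, the sketch below builds (but does not send) a POST request using only the standard library. The JSON field names (text, language, speaker, speed) are an assumption mirrored from the plugin's constructor arguments, and build_synthesize_request is a hypothetical helper; docs/API.md remains the authority on the real request schema.

```python
import json
import urllib.request


def build_synthesize_request(base_url, text, language="EN", speaker="EN-US",
                             speed=1.0, stream=False):
    """Build (but do not send) a POST request for one of the synthesis endpoints."""
    path = "/synthesize/stream" if stream else "/synthesize"
    # Assumed JSON body; the real schema is defined by api/server.py.
    payload = {"text": text, "language": language, "speaker": speaker, "speed": speed}
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_synthesize_request("http://localhost:8000", "Hello from LiveKit!")
print(request.get_method(), request.full_url)

# Sending the request needs a running server:
# with urllib.request.urlopen(request, timeout=30) as response:
#     audio_bytes = response.read()
```

Passing stream=True switches the target to /synthesize/stream while leaving the body unchanged.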
LiveKit Plugin (plugin/)
- melotts_plugin.py: LiveKit TTS plugin implementation
- Extends livekit.agents.tts.TTS base class
- Implements ChunkedStream for audio streaming
- Uses aiohttp for HTTP requests
- Proper exception handling (APIConnectionError, APITimeoutError, APIStatusError)
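The retry behaviour mentioned for the plugin could look roughly like the sketch below. It is not taken from melotts_plugin.py: ConnectionFailure and StatusFailure are hypothetical local stand-ins for LiveKit's APIConnectionError and APIStatusError (so the example runs without livekit-agents installed), and call_with_retries, its retry policy, and its backoff values are assumptions.

```python
import asyncio


class ConnectionFailure(Exception):
    """Hypothetical stand-in for livekit.agents.APIConnectionError."""


class StatusFailure(Exception):
    """Hypothetical stand-in for livekit.agents.APIStatusError."""

    def __init__(self, status_code):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


async def call_with_retries(operation, max_attempts=3, base_delay=0.1):
    """Await operation(), retrying connection failures and 5xx responses.

    Client errors (4xx) are raised immediately because repeating the same
    request is unlikely to help.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return await operation()
        except ConnectionFailure:
            if attempt == max_attempts:
                raise
        except StatusFailure as exc:
            if exc.status_code < 500 or attempt == max_attempts:
                raise
        # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
        await asyncio.sleep(base_delay * 2 ** (attempt - 1))
```

Inside the real plugin, operation would be the coroutine that performs the aiohttp request, and the stand-in classes would be replaced by the LiveKit exception types listed above.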
Tests (tests/)
- test_api.py: API server tests
  - Health checks
  - Voice listing
  - Synthesis (streaming and non-streaming)
  - Multiple languages
  - Error handling
  - Concurrency
- test_plugin.py: Plugin integration tests
  - Plugin initialization
  - Synthesis with real API
  - Multiple languages
  - Error handling
  - Concurrency
  - Timeout handling
Examples (examples/)
- test_api_client.py: Standalone API testing script
- simple_agent.py: Basic LiveKit agent example
- voice_assistant.py: Complete voice assistant implementation
Documentation (docs/)
- API.md: Complete API reference
- PLUGIN.md: Plugin usage guide
- DEPLOYMENT.md: Production deployment guide
Quick Start
1. Start the TTS API Server
cd api
pip install -r requirements.txt
python -m unidic download
python server.py
Server runs on http://localhost:8000
2. Test the API
cd examples
python test_api_client.py
3. Use in LiveKit Agent
from melotts_plugin import TTS

# Point the plugin at the local TTS API server
tts = TTS(
    api_base_url="http://localhost:8000",
    language="EN",
    speaker="EN-US",
    speed=1.0,
)

# Synthesize text; the result is an audio stream
stream = tts.synthesize("Hello from LiveKit!")
Features
- ✅ Self-hosted (no external API dependencies)
- ✅ High-quality natural speech (MeloTTS)
- ✅ 6 languages: English, Spanish, French, Chinese, Japanese, Korean
- ✅ Multiple voices per language
- ✅ Streaming audio for low latency
- ✅ CPU-friendly (optimized for real-time inference)
- ✅ GPU support (automatic if available)
- ✅ LiveKit agents framework compatible
- ✅ Production-ready error handling
- ✅ Comprehensive test coverage
- ✅ Full documentation
Architecture
┌─────────────────┐      HTTP POST      ┌──────────────────┐
│  LiveKit Agent  │ ──────────────────► │     TTS API      │
│                 │                     │      Server      │
│  ┌───────────┐  │                     │                  │
│  │  MeloTTS  │  │    Audio Stream     │  ┌────────────┐  │
│  │  Plugin   │  │ ◄────────────────── │  │  MeloTTS   │  │
│  └───────────┘  │    (WAV chunks)     │  │   Model    │  │
└─────────────────┘                     │  └────────────┘  │
                                        └──────────────────┘
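To make the "WAV chunks" arrow more concrete, here is a minimal sketch of how a client could separate the WAV header from the PCM payload as chunks arrive. It assumes the server sends one canonical 44-byte PCM header followed by raw samples, which this document does not confirm; parse_wav_header and split_wav_stream are hypothetical helper names, and in a streamed response the size fields in the header may be placeholders.

```python
import struct

WAV_HEADER_SIZE = 44  # canonical RIFF/WAVE header for uncompressed PCM


def parse_wav_header(header):
    """Return (channels, sample_rate, bits_per_sample) from a 44-byte PCM header."""
    if len(header) < WAV_HEADER_SIZE or header[:4] != b"RIFF" or header[8:12] != b"WAVE":
        raise ValueError("not a canonical WAV header")
    # Channel count sits at byte 22, sample rate at byte 24, bit depth at byte 34.
    channels, sample_rate = struct.unpack_from("<HI", header, 22)
    (bits_per_sample,) = struct.unpack_from("<H", header, 34)
    return channels, sample_rate, bits_per_sample


def split_wav_stream(chunks):
    """Consume byte chunks; return (audio_format, iterator over PCM chunks)."""
    chunks = iter(chunks)
    buffer = b""
    # Chunk boundaries are arbitrary, so buffer until the whole header is present.
    for chunk in chunks:
        buffer += chunk
        if len(buffer) >= WAV_HEADER_SIZE:
            break
    audio_format = parse_wav_header(buffer)
    leftover = buffer[WAV_HEADER_SIZE:]

    def pcm_chunks():
        if leftover:
            yield leftover
        for chunk in chunks:
            if chunk:
                yield chunk

    return audio_format, pcm_chunks()
```

Buffering only the header keeps latency low: every byte after the first 44 can be handed to the audio pipeline as soon as it arrives.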
Why MeloTTS?
- High Quality: Natural-sounding speech
- Fast: Optimized for real-time inference
- CPU-Friendly: Works well even without GPU
- Multi-lingual: 6 languages supported
- Low Latency: ~150-200ms TTFB
- Open Source: Free to use and modify
Performance
- Latency: 150-200ms time-to-first-byte
- CPU Usage: Optimized for real-time on CPUs
- GPU Support: Automatic acceleration if available
- Streaming: Chunked delivery for low latency
- Concurrent Requests: Handles multiple simultaneous requests
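The time-to-first-byte figure above can be checked against your own deployment with a small timing helper. The sketch below is an assumption about how one might measure it, not the benchmark behind the quoted numbers; measure_ttfb is a hypothetical name, and it works on any iterable of byte chunks so the HTTP client is left open.

```python
import time


def measure_ttfb(chunks, clock=time.perf_counter):
    """Return (seconds until the first non-empty chunk, total bytes received).

    The timer starts when this function is called, so pass a lazy iterable
    that only issues the request once iteration begins.
    """
    start = clock()
    ttfb = None
    total = 0
    for chunk in chunks:
        if chunk and ttfb is None:
            ttfb = clock() - start
        total += len(chunk)
    return ttfb, total
```

The result is None for the first element when no audio arrives at all, which distinguishes an empty response from a slow one.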
Supported Languages
| Language | Code | Speakers |
|----------|------|----------|
| English | EN | EN-US, EN-BR, EN-AU, EN-IN |
| Spanish | ES | ES |
| French | FR | FR |
| Chinese | ZH | ZH |
| Japanese | JP | JP |
| Korean | KR | KR |
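The table translates directly into a lookup that rejects unsupported combinations before a request is sent. This is an illustrative sketch: SUPPORTED_VOICES copies the table above, while validate_voice and the choice of the first listed speaker as the default are assumptions rather than documented plugin behaviour.

```python
# Language code -> speaker IDs, copied from the table above.
SUPPORTED_VOICES = {
    "EN": ("EN-US", "EN-BR", "EN-AU", "EN-IN"),
    "ES": ("ES",),
    "FR": ("FR",),
    "ZH": ("ZH",),
    "JP": ("JP",),
    "KR": ("KR",),
}


def validate_voice(language, speaker=None):
    """Return a (language, speaker) pair from the table, or raise ValueError."""
    code = language.upper()
    if code not in SUPPORTED_VOICES:
        raise ValueError(f"unsupported language: {language!r}")
    speakers = SUPPORTED_VOICES[code]
    if speaker is None:
        # Assumption: fall back to the first speaker listed for the language.
        return code, speakers[0]
    if speaker not in speakers:
        raise ValueError(f"speaker {speaker!r} is not available for {code}")
    return code, speaker
```

A running server can also be asked for its actual voices through GET /voices, which is the safer source if the table and the deployment drift apart.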
Testing
All tests use real implementations - no mocks:
# Start API server (leave it running in a separate terminal)
cd api && python server.py
# Run API tests
cd tests && pytest test_api.py -v
# Run plugin tests
cd tests && pytest test_plugin.py -v
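Because the tests talk to a real server, it can help to wait for the health endpoint before starting pytest. The sketch below polls GET / with the standard library; health_check and wait_until_ready are hypothetical helpers, and treating any successful response from GET / as "ready" is an assumption about the server.

```python
import http.client
import time
import urllib.error
import urllib.request


def health_check(base_url="http://localhost:8000", timeout=2.0):
    """Return True if GET / answers successfully, False on any connection problem."""
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + "/", timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False


def wait_until_ready(probe=health_check, attempts=30, interval=1.0, sleep=time.sleep):
    """Call probe() up to `attempts` times, pausing between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(interval)
    return False
```

Calling wait_until_ready() from a session-scoped pytest fixture would let the suite fail fast with a clear message when the server is not running.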
Deployment
Multiple deployment options:
- Standalone: Run directly with Python/Uvicorn
- Docker: Containerized deployment
- Kubernetes: Scalable cloud deployment
- Cloud: AWS, GCP, Azure support
See docs/DEPLOYMENT.md for detailed guides.
Integration with LiveKit
The plugin is a drop-in replacement for other TTS providers:
# Instead of:
# from livekit.plugins import openai
# tts = openai.TTS()
# Use:
from melotts_plugin import TTS
tts = TTS(api_base_url="http://localhost:8000")
# Same interface, self-hosted!
Use Cases
- Voice assistants
- Interactive voice response (IVR) systems
- Accessibility tools
- Educational applications
- Multilingual customer service bots
- Real-time voice agents
- Live streaming with voice synthesis
Requirements
API Server:
- Python 3.9+
- 2GB+ RAM
- FastAPI, MeloTTS, Uvicorn
- Optional: GPU for faster inference
LiveKit Plugin:
- Python 3.9+
- livekit-agents >= 0.8.0
- aiohttp >= 3.9.0
Security
For production:
- Add API authentication
- Enable HTTPS/TLS
- Implement rate limiting
- Configure CORS
- Set up monitoring
See docs/DEPLOYMENT.md#security for details.
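Two of the items above, API authentication and rate limiting, can be sketched with the standard library alone. Everything here is illustrative: the X-API-Key header name, is_authorized, and TokenBucket are assumptions, not part of the shipped server, and a production setup may prefer a reverse proxy or gateway for both concerns.

```python
import hmac
import time


def is_authorized(headers, expected_key, header_name="X-API-Key"):
    """Compare the supplied API key to the expected one in constant time."""
    if not expected_key:
        return False  # refuse everything if no key has been configured
    supplied = headers.get(header_name, "")
    return hmac.compare_digest(supplied.encode("utf-8"), expected_key.encode("utf-8"))


class TokenBucket:
    """Allow roughly `rate` requests per second with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        """Consume one token if available; return whether the request may proceed."""
        now = self.clock()
        # Refill in proportion to the elapsed time, capped at the bucket size.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

In a server, one bucket per client key or IP address would typically be kept in a dictionary, with requests rejected (for example with HTTP 429) when allow() returns False.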
When to Use This Skill
Use this skill when you need to:
- Build a self-hosted TTS solution
- Create LiveKit voice agents with custom TTS
- Avoid commercial TTS API costs
- Have full control over voice synthesis
- Support multiple languages
- Deploy TTS in private/air-gapped environments
- Build voice assistants
- Integrate TTS into existing applications
Troubleshooting
Server won't start:
- Run python -m unidic download
- Check port 8000 is available
- Verify dependencies are installed
Plugin connection errors:
- Ensure API server is running
- Check api_base_url configuration
- Verify network connectivity
Audio quality issues:
- Try different voices/speakers
- Adjust speed parameter
- Check sample rate configuration
See documentation for more troubleshooting tips.
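The "check port 8000 is available" step can be done from Python by attempting to bind the port. This is a small sketch under the assumption that the server listens on the loopback address; port_is_free is a hypothetical helper, and a server bound to 0.0.0.0 or another interface would need the matching host value.

```python
import socket


def port_is_free(port=8000, host="127.0.0.1"):
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "address already in use"; permission errors also land here.
            return False
    return True


if __name__ == "__main__":
    print("port 8000 free:", port_is_free(8000))
```

A False result while the TTS server is stopped usually means another process is holding the port.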
License
Apache 2.0 License
Support
- Check the documentation in docs/
- Review examples in examples/
- Run the test suite to verify setup
- Check logs for error messages
This skill provides everything needed for production-ready, self-hosted TTS with LiveKit integration. All code is fully functional with no mocks or placeholders.