Voice Mode
Voice conversation mode where all responses are spoken aloud via Pocket TTS.
Setup
The tts.sh script lives in this skill's scripts/ directory. Resolve it relative to this SKILL.md:
SKILL_DIR="<absolute path to this skill's directory>"
TTS="${SKILL_DIR}/scripts/tts.sh"
Use ${TTS} for all commands below.
Activation
On activation, ALWAYS run these steps in order before anything else:
- Check that the TTS container is running:
  ${TTS} ensure
  If this fails (exit code 1), tell the user the container is down and stop. Do NOT attempt to start it.
- Confirm that voice mode is active by speaking:
  ${TTS} play "Voice mode activated. I'm listening." -v eponine
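The two activation steps can be written as a single shell function. This is a minimal sketch. It assumes that `tts.sh ensure` exits non-zero when the container is down and that `tts.sh play` takes the text followed by `-v <voice>`, as in the commands above. The function name `activate_voice_mode` is hypothetical.

```bash
# Hypothetical helper: run the activation steps in order.
# $1 = path to tts.sh, $2 = voice (defaults to eponine).
activate_voice_mode() {
  local tts="$1"
  local voice="${2:-eponine}"

  # Step 1: check the container. Never try to start it from here.
  if ! "$tts" ensure; then
    echo "The Pocket TTS container is down; please start it and try again." >&2
    return 1
  fi

  # Step 2: confirm activation out loud.
  "$tts" play "Voice mode activated. I'm listening." -v "$voice"
}
```

Calling `activate_voice_mode "${TTS}" eponine` stops after a failed check and returns 1 without speaking anything.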
Response Rules
While voice mode is active:
- ALWAYS speak every response using tts.sh. Replace eponine with the selected voice if the user chose one (see Voice Selection):
  ${TTS} play "<response text>" -v eponine
- Prefer concise responses: aim for 1-3 sentences when voice mode is used on its own. When it is combined with another skill, match the response length that skill requires.
- Write naturally for speech: avoid markdown, bullet points, code blocks, and URLs. Phrase things the way you would in conversation.
- Also output text: print a brief text version so the conversation is readable in the terminal.
- Handle STT input gracefully: user input arrives wrapped in [STT]...[/STT] tags from the user's whisper script. The transcription may be imperfect, so infer intent from context rather than asking for clarification on every garbled word.
- Split long responses: if you need to say more than about 2 sentences, make multiple tts.sh calls so audio starts playing sooner.
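The "Split long responses" rule can be sketched as a helper that breaks a reply into sentences and sends them to tts.sh two at a time. This sketch rests on some assumptions. The function name `speak_in_chunks` is hypothetical. The sentence splitter is also deliberately naive: it treats every `.`, `!` or `?` as the end of a sentence, so abbreviations and decimals such as "1.2" cause early splits.

```bash
# Hypothetical helper: speak TEXT in chunks of MAX sentences (default 2),
# making one tts.sh call per chunk.
# Usage: speak_in_chunks <path-to-tts.sh> <voice> <text> [max]
speak_in_chunks() {
  local tts="$1" voice="$2" text="$3" max="${4:-2}"
  local IFS=' '                          # join sentences with single spaces
  local re='^([^.!?]*[.!?]+)(.*)$'       # first sentence, then the remainder
  local -a sentences=()
  local rest="$text" s i

  # Peel off one sentence at a time (naive: "e.g." also ends a sentence).
  while [[ $rest =~ $re ]]; do
    s="${BASH_REMATCH[1]}"
    rest="${BASH_REMATCH[2]}"
    s="${s#"${s%%[![:space:]]*}"}"       # trim leading whitespace
    sentences+=("$s")
  done

  # Trailing text with no closing punctuation is still spoken.
  rest="${rest#"${rest%%[![:space:]]*}"}"
  if [[ -n $rest ]]; then
    sentences+=("$rest")
  fi

  # One play call per chunk; stop at the first failure.
  for (( i = 0; i < ${#sentences[@]}; i += max )); do
    "$tts" play "${sentences[*]:i:max}" -v "$voice" || return
  done
}
```

The calls run sequentially, so the chunks are spoken in order. For example, `speak_in_chunks "${TTS}" eponine "One. Two! Three? Four. Five"` makes three calls: "One. Two!", "Three? Four." and "Five".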
Voice Selection
Default voice: eponine
If the user provided an argument (e.g., /voice-mode jean), use that voice instead.
Available voices: alba, marius, javert, jean, fantine, cosette, eponine, azelma
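Voice selection amounts to looking up the argument in the list above and falling back to eponine. This is a minimal sketch. `pick_voice` and `VOICE_MODE_VOICES` are hypothetical names. Rejecting an unknown voice, rather than silently using the default, is an assumption, because this skill does not say what should happen in that case.

```bash
# Voices listed in this skill (hypothetical array; keep in sync with the list above).
VOICE_MODE_VOICES=(alba marius javert jean fantine cosette eponine azelma)

# Hypothetical helper: print the voice to use.
# $1 = optional argument from the /voice-mode invocation.
pick_voice() {
  local requested="$1" v
  if [[ -z $requested ]]; then
    printf '%s\n' eponine            # documented default voice
    return 0
  fi
  for v in "${VOICE_MODE_VOICES[@]}"; do
    if [[ $v == "$requested" ]]; then
      printf '%s\n' "$v"
      return 0
    fi
  done
  echo "Unknown voice '$requested'; available: ${VOICE_MODE_VOICES[*]}" >&2
  return 1
}
```

For `/voice-mode jean`, `pick_voice jean` prints jean. With no argument, it prints eponine.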
Deactivation
Voice mode ends when the user says "stop voice mode", "text mode", or "stop talking". Confirm with a final spoken message: "Voice mode off. Back to text."
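Detecting the stop phrases can be sketched in two steps: strip the [STT] tags, then do a case-insensitive substring match. These are blunt, hypothetical helpers (`stt_text` and `wants_voice_mode_off` are illustrative names). Plain phrase matching also fires on sentences like "don't stop talking", so it supplements inferring intent from context rather than replacing it.

```bash
# Hypothetical helper: print the text between [STT] and [/STT].
# Input without tags is printed unchanged.
stt_text() {
  local input="$1"
  input="${input#*"[STT]"}"      # drop everything up to and including the opening tag
  input="${input%"[/STT]"*}"     # drop the closing tag and anything after it
  printf '%s\n' "$input"
}

# Hypothetical helper: succeed (status 0) if the input contains a stop phrase.
wants_voice_mode_off() {
  local text
  text="$(stt_text "$1" | tr '[:upper:]' '[:lower:]')"
  case "$text" in
    *"stop voice mode"* | *"text mode"* | *"stop talking"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A caller would then speak the confirmation, for example `wants_voice_mode_off "$input" && ${TTS} play "Voice mode off. Back to text." -v eponine`.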
Configuration
All settings are configurable via environment variables:
- POCKET_TTS_PORT: server port (default: 18731)
- POCKET_TTS_VOICE: default voice (default: eponine)
- POCKET_TTS_SPEED: playback speed (default: 1.2)
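A script that honours these variables would typically fall back to the documented defaults using shell parameter expansion. This sketch only illustrates that pattern. `pocket_tts_config` is a hypothetical function, and it is an assumption that tts.sh resolves the variables exactly this way.

```bash
# Hypothetical helper: resolve each setting, falling back to the documented default.
pocket_tts_config() {
  local port="${POCKET_TTS_PORT:-18731}"
  local voice="${POCKET_TTS_VOICE:-eponine}"
  local speed="${POCKET_TTS_SPEED:-1.2}"
  printf 'port=%s voice=%s speed=%s\n' "$port" "$voice" "$speed"
}
```

Because these are ordinary environment variables, a single call can override one of them, for example `POCKET_TTS_SPEED=1.0 ${TTS} play "<response text>" -v eponine`.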
Dependencies
- Docker, with the pocket-tts container running (docker compose up -d from the pocket-tts repo)
- mpv (audio playback)
- curl
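The command-line dependencies can be checked with `command -v`, a POSIX shell builtin. `check_voice_deps` is a hypothetical name. It only confirms that the binaries are on PATH, so it does not replace `${TTS} ensure`, which checks the container itself.

```bash
# Hypothetical preflight: fail and list what is missing if any named command is not on PATH.
check_voice_deps() {
  local cmd
  local -a missing=()
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || missing+=("$cmd")
  done
  if (( ${#missing[@]} > 0 )); then
    echo "Missing dependencies: ${missing[*]}" >&2
    return 1
  fi
  return 0
}
```

For this skill, the call would be `check_voice_deps docker mpv curl`.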