
Voice Mode

Voice conversation mode where all responses are spoken aloud via Pocket TTS.

Setup

The tts.sh script lives in this skill's scripts/ directory. Resolve it relative to this SKILL.md:

SKILL_DIR="<absolute path to this skill's directory>"
TTS="${SKILL_DIR}/scripts/tts.sh"

Use ${TTS} for all commands below.

Activation

On activation, ALWAYS run these steps in order before anything else:

Check the TTS container is running:
```
${TTS} ensure
```
If this fails (exit code 1), tell the user the container is down and stop. Do NOT attempt to start it.

Confirm voice mode is active by speaking:

${TTS} play "Voice mode activated. I'm listening." -v eponine
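The two activation steps can be wrapped in one function so the ordering and the stop-on-failure rule are explicit. This is only a sketch: `activate_voice_mode` is a hypothetical helper name, and it assumes `TTS` already points at scripts/tts.sh as described in Setup.

```shell
# Hypothetical helper; not part of tts.sh.
# $1 = voice to use (optional, defaults to eponine)
activate_voice_mode() {
    # Step 1: check the container. On failure, report and stop.
    # Deliberately no attempt to start the container here.
    if ! "${TTS}" ensure; then
        echo "Pocket TTS container is down; voice mode not activated." >&2
        return 1
    fi

    # Step 2: confirm activation out loud.
    "${TTS}" play "Voice mode activated. I'm listening." -v "${1:-eponine}"
}
```

Calling `activate_voice_mode jean` would run the same two steps with the jean voice; a non-zero return means the spoken confirmation was never attempted.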

Response Rules

While voice mode is active:

ALWAYS speak every response using tts.sh:

${TTS} play "<response text>" -v eponine

Prefer concise responses — aim for 1-3 sentences when used standalone. When combined with another skill, match the response length that skill requires.
Write naturally for speech — avoid markdown, bullet points, code blocks, URLs. Write as you'd speak in conversation.
Also output text — print a brief text version so the conversation is readable in the terminal.
Handle STT input gracefully — user input arrives as [STT]...[/STT] tags from their whisper script. The transcription may be imperfect. Infer intent from context rather than asking for clarification on every garbled word.
Split long responses — if you need to say more than ~2 sentences, make multiple tts.sh calls so audio starts playing sooner.
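The "split long responses" rule can be sketched as one tts.sh call per sentence. The helpers `split_sentences` and `speak_in_chunks` are hypothetical names, and the sentence splitting is a naive punctuation-based assumption that will also break on decimals such as "1.2". `TTS` is assumed to be set as in Setup.

```shell
# Hypothetical helpers; not part of tts.sh.

# Print each sentence of $1 on its own line.
split_sentences() {
    printf '%s\n' "$1" \
        | grep -oE '[^.!?]+[.!?]*' \
        | sed -E -e 's/^[[:space:]]+//' -e '/^$/d'
}

# Speak $1 sentence by sentence. $2 = voice (optional, defaults to eponine).
speak_in_chunks() {
    split_sentences "$1" | while IFS= read -r sentence; do
        # Redirect stdin so the player cannot swallow the remaining sentences.
        "${TTS}" play "$sentence" -v "${2:-eponine}" </dev/null
    done
}
```

With this approach the first sentence can start playing while the later ones are still waiting to be sent.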

Voice Selection

Default voice: eponine

If the user provided an argument (e.g., /voice-mode jean), use that voice instead: pass it to -v in place of eponine in every command above.

Available: alba, marius, javert, jean, fantine, cosette, eponine, azelma
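A minimal sketch of the selection logic follows. `pick_voice` is a hypothetical helper, and falling back to the default when the argument is not in the list is an assumption; the skill does not say how unknown voices should be handled.

```shell
# Hypothetical helper; not part of tts.sh.
AVAILABLE_VOICES="alba marius javert jean fantine cosette eponine azelma"

# $1 = voice requested by the user (may be empty)
pick_voice() {
    for v in $AVAILABLE_VOICES; do
        if [ "$1" = "$v" ]; then
            echo "$v"
            return 0
        fi
    done
    # Assumption: a missing or unknown argument falls back to the default voice.
    echo "${POCKET_TTS_VOICE:-eponine}"
}
```

The result can then be passed along, for example `${TTS} play "Hello" -v "$(pick_voice jean)"`.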

Deactivation

Voice mode ends when the user says "stop voice mode", "text mode", or "stop talking". Confirm with a final spoken message: "Voice mode off. Back to text."
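If the deactivation check were done in a script rather than by the agent itself, it might look like the sketch below. `is_deactivation_phrase` is a hypothetical helper, and the case-insensitive substring match is an assumption chosen so that input wrapped in [STT]...[/STT] tags or followed by punctuation still matches.

```shell
# Hypothetical helper; not part of tts.sh.
# Returns 0 if $1 contains one of the three deactivation phrases.
is_deactivation_phrase() {
    phrase="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
    case "$phrase" in
        *"stop voice mode"*|*"text mode"*|*"stop talking"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

A substring match is permissive: a sentence that merely mentions "text mode" would also trigger it, so a stricter rule may be preferable in practice.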

Configuration

The following settings are configurable via environment variables:

POCKET_TTS_PORT — server port (default: 18731)
POCKET_TTS_VOICE — default voice (default: eponine)
POCKET_TTS_SPEED — playback speed (default: 1.2)
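The variables and their defaults can be inspected with a small function like the one below. `show_tts_config` is a hypothetical helper; it only mirrors the documented defaults and assumes nothing about how tts.sh reads them internally.

```shell
# Hypothetical helper; not part of tts.sh.
# Prints the effective settings, using the documented defaults when unset.
show_tts_config() {
    echo "port=${POCKET_TTS_PORT:-18731}"
    echo "voice=${POCKET_TTS_VOICE:-eponine}"
    echo "speed=${POCKET_TTS_SPEED:-1.2}"
}
```

To override a value for the current shell session, export it before invoking tts.sh, for example `export POCKET_TTS_SPEED=1.0`.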

Dependencies

Docker, with the pocket-tts container running (start it with docker compose up -d from the pocket-tts repo)
mpv (audio playback)
curl
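A quick way to confirm the command-line dependencies are installed is sketched below. `missing_deps` is a hypothetical helper; it checks only that the binaries are on PATH, not that the pocket-tts container is running (that is what `${TTS} ensure` is for).

```shell
# Hypothetical helper; not part of tts.sh.
# Prints each command from the argument list that is not found on PATH.
missing_deps() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}
```

Running `missing_deps docker mpv curl` prints nothing when all three are available.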
