# Livestream Skill: Warehouse Audio Pipeline

Live audio capture, transcription, and narration from the meeting room via the Tailscale network.
## Architecture

```
conversation-logger (10.1.10.107)          Local Mac
3x EMEET OfficeCore M0 Plus USB mics       (fallback: audio-capture-org.py)
Whisper large-v3-turbo, 6-speaker          mlx-whisper-small, no diarization
PostgreSQL → Flask :5000                   audio-capture.org → DuckDB
        │                                          │
        ▼                                          ▼
/api/transcripts?limit=N                   live_history_pipeline.sql
        │
        └──── sshpass via gx10-acee ──────────────┐
                                                  ▼
                                    Say MCP (Samantha Enhanced)
```
## Access Path

### Step 1: SSH to gx10-acee (jump host)

```bash
sshpass -p 'aaaaaa' ssh -o ConnectTimeout=10 -o StrictHostKeyChecking=no a@100.67.53.87
```

- Host: gx10-acee, Tailscale IP: 100.67.53.87
- User: `a`, Password: `aaaaaa`
- NVIDIA HDA audio card, WiFi on `wlP9s9`
### Step 2: SSH to conversation-logger

```bash
sshpass -p 'aaaaaa' ssh -o ConnectTimeout=5 -o StrictHostKeyChecking=no alu@10.1.10.107
```

- Host: conversation-logger, LAN IP: 10.1.10.107
- User: `alu`, Password: `aaaaaa`
- 3x EMEET mics on ALSA cards 1, 2, 3
### Step 3: Query the API

```bash
curl -s 'http://10.1.10.107:5000/api/transcripts?limit=10'
```
### One-liner (from local Mac)

```bash
sshpass -p 'aaaaaa' ssh -o ConnectTimeout=10 -o StrictHostKeyChecking=no a@100.67.53.87 \
  "curl -s 'http://10.1.10.107:5000/api/transcripts?limit=10'"
```
### One-liner (execute command on logger)

```bash
sshpass -p 'aaaaaa' ssh -o ConnectTimeout=10 -o StrictHostKeyChecking=no a@100.67.53.87 \
  'sshpass -p "aaaaaa" ssh -o ConnectTimeout=5 -o StrictHostKeyChecking=no alu@10.1.10.107 "COMMAND"'
```
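The nested quoting in the two-hop one-liner is easy to get wrong. A small helper that builds the argv with `shlex.quote` avoids hand-escaping; this is a sketch using the hosts and credentials documented above (the `two_hop_cmd` helper itself is illustrative, not part of the pipeline):

```python
import shlex

def two_hop_cmd(inner_cmd: str,
                jump="a@100.67.53.87", jump_pw="aaaaaa",
                target="alu@10.1.10.107", target_pw="aaaaaa") -> list:
    """Build argv for: local Mac -> gx10-acee -> conversation-logger."""
    ssh_opts = ["-o", "ConnectTimeout=10", "-o", "StrictHostKeyChecking=no"]
    # The command the jump host runs: a second sshpass/ssh into the logger,
    # with the innermost command safely quoted.
    remote = " ".join(
        ["sshpass", "-p", shlex.quote(target_pw), "ssh",
         "-o", "ConnectTimeout=5", "-o", "StrictHostKeyChecking=no",
         target, shlex.quote(inner_cmd)]
    )
    return ["sshpass", "-p", jump_pw, "ssh", *ssh_opts, jump, remote]

argv = two_hop_cmd("uptime")
# Execute with: subprocess.run(argv, capture_output=True, text=True)
```

Passing a list to `subprocess.run` skips the local shell entirely, so only the jump-host layer of quoting needs care.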
## API Endpoints
| Endpoint | Method | Description |
|----------|--------|-------------|
| /api/transcripts?limit=N | GET | Recent transcripts (JSON: id, speaker_id, transcript, started_at, ended_at, zone_id, confidence, duration_sec) |
| /transcripts | GET | Web UI transcript browser |
| /conversations | GET | Conversation groupings |
| /digests | GET | Digest summaries |
| /speakers | GET | Speaker profiles |
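Consuming `/api/transcripts` programmatically comes down to reading the fields in the table and reversing the newest-first order (the narration script does the same with an inline Python snippet). A minimal sketch, assuming the response wraps rows in a top-level `"transcripts"` key as the script does:

```python
import json

def parse_transcripts(payload: str):
    """Return transcripts oldest-first as (id, speaker_id, text) tuples.

    The API returns newest-first; reversing gives natural reading order.
    """
    rows = json.loads(payload)["transcripts"]
    return [(t["id"], t["speaker_id"], t["transcript"]) for t in reversed(rows)]

# Fabricated sample payload with the documented fields, for illustration.
sample = json.dumps({"transcripts": [
    {"id": 2, "speaker_id": "SPEAKER_01", "transcript": "second",
     "started_at": "2025-01-01T00:00:08", "ended_at": "2025-01-01T00:00:16",
     "zone_id": 1, "confidence": 0.90, "duration_sec": 8.0},
    {"id": 1, "speaker_id": "SPEAKER_00", "transcript": "first",
     "started_at": "2025-01-01T00:00:00", "ended_at": "2025-01-01T00:00:08",
     "zone_id": 1, "confidence": 0.92, "duration_sec": 8.0},
]})
print(parse_transcripts(sample))  # oldest-first: id 1, then id 2
```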
## Infrastructure on conversation-logger

### Systemd Services

- `warehouse-capture-mic1.service` — Mic 1 capture (device 4, Whisper large-v3-turbo)
- `warehouse-capture-mic2.service` — Mic 2 capture (device 5)
- `warehouse-capture-mic3.service` — Mic 3 capture (device 6)
- `warehouse-autogain.service` — Auto-gain controller
- `warehouse-gui.service` — Flask web dashboard (:5000)
- `postgresql@16-main.service` — PostgreSQL 16
### Key Paths

- `/opt/warehouse-logging/scripts/capture_node.py` — Main capture script
- `/opt/warehouse-logging/scripts/auto_gain.py` — Gain controller
- `/opt/warehouse-logging/app.py` — Flask dashboard
- `/opt/warehouse-logging/venv/` — Python virtualenv
### Hardware
- 3x EMEET OfficeCore M0 Plus (USB, Bus 001 Devices 3/5/9)
- ALSA cards: 1 (Plus), 2 (Plus_1), 3 (Plus_2)
- NVIDIA HDA on card 0 (not used for capture)
### Network

- WiFi only (`wlP9s9`): SSID `TP-Link_A7B3`, 2.4GHz Ch2, -47dBm, 94%
- All ethernet ports DOWN (NO-CARRIER) — single point of failure
- Consider connecting ethernet for reliability
## Live Narration Script

Save to `/tmp/live-warehouse-stream.sh`:

```bash
#!/bin/bash
# Speaks ALL new transcripts, batched by speaker, no cutoffs
export PATH="/Users/alice/v/.flox/run/aarch64-darwin.v.dev/bin:$PATH"

ACEE="100.67.53.87"
LOGGER="10.1.10.107"
LAST_ID=""
POLL_INTERVAL=5
LIMIT=50

voice_for_speaker() {
  case "$1" in
    SPEAKER_00|alu) echo "Ava (Premium)" ;;
    SPEAKER_01)     echo "Evan (Enhanced)" ;;
    SPEAKER_02)     echo "Allison (Enhanced)" ;;
    SPEAKER_03)     echo "Nathan (Enhanced)" ;;
    SPEAKER_04)     echo "Noelle (Enhanced)" ;;
    SPEAKER_05)     echo "Nicky (Enhanced)" ;;
    silly-alu)      echo "Samantha (Enhanced)" ;;
    *)              echo "Ava (Premium)" ;;
  esac
}

while true; do
  RESULT=$(sshpass -p 'aaaaaa' ssh -o ConnectTimeout=8 \
    -o StrictHostKeyChecking=no -o BatchMode=no a@$ACEE \
    "curl -s 'http://$LOGGER:5000/api/transcripts?limit=$LIMIT'" 2>/dev/null)
  # Retry on SSH/curl failure or empty response
  [ $? -ne 0 ] || [ -z "$RESULT" ] && { sleep $POLL_INTERVAL; continue; }

  # Parse & reverse to chronological order
  PARSED=$(echo "$RESULT" | python3 -c "
import json, sys
try:
    d = json.load(sys.stdin)
    lines = []
    for t in d['transcripts']:
        lines.append(f\"{t['id']}|{t['speaker_id']}|{t['transcript']}\")
    for line in reversed(lines):
        print(line)
except Exception:
    pass
" 2>/dev/null)
  [ -z "$PARSED" ] && { sleep $POLL_INTERVAL; continue; }

  # First run: initialize without speaking history
  if [ -z "$LAST_ID" ]; then
    LAST_ID=$(echo "$PARSED" | tail -1 | cut -d'|' -f1)
    sleep $POLL_INTERVAL; continue
  fi

  # Collect all new transcripts, batch consecutive same-speaker
  FOUND_LAST=0; CURRENT_SPEAKER=""; CURRENT_TEXT=""; NEW_COUNT=0
  while IFS= read -r line; do
    ID=$(echo "$line" | cut -d'|' -f1)
    SPEAKER=$(echo "$line" | cut -d'|' -f2)
    TEXT=$(echo "$line" | cut -d'|' -f3-)   # -f3- keeps transcripts containing '|'
    if [ "$FOUND_LAST" -eq 0 ]; then
      [ "$ID" = "$LAST_ID" ] && FOUND_LAST=1
      continue
    fi
    NEW_COUNT=$((NEW_COUNT + 1)); LAST_ID="$ID"
    TRIMMED=$(echo "$TEXT" | sed 's/^[[:space:]]*//;s/[[:space:]]*$//')
    [ -z "$TRIMMED" ] && continue
    if [ "$SPEAKER" = "$CURRENT_SPEAKER" ]; then
      CURRENT_TEXT="$CURRENT_TEXT $TRIMMED"
    else
      if [ -n "$CURRENT_TEXT" ] && [ -n "$CURRENT_SPEAKER" ]; then
        VOICE=$(voice_for_speaker "$CURRENT_SPEAKER")
        echo "[$(date +%H:%M:%S)] $CURRENT_SPEAKER: $CURRENT_TEXT"
        say -v "$VOICE" -r 210 "$CURRENT_TEXT" 2>/dev/null
      fi
      CURRENT_SPEAKER="$SPEAKER"; CURRENT_TEXT="$TRIMMED"
    fi
  done <<< "$PARSED"

  # If LAST_ID fell out of the window (more than LIMIT new rows), resync
  # to the newest ID so the loop doesn't stall forever
  [ "$FOUND_LAST" -eq 0 ] && LAST_ID=$(echo "$PARSED" | tail -1 | cut -d'|' -f1)

  # Speak last batch
  if [ -n "$CURRENT_TEXT" ] && [ "$NEW_COUNT" -gt 0 ]; then
    VOICE=$(voice_for_speaker "$CURRENT_SPEAKER")
    echo "[$(date +%H:%M:%S)] $CURRENT_SPEAKER: $CURRENT_TEXT"
    say -v "$VOICE" -r 210 "$CURRENT_TEXT" 2>/dev/null
  fi

  sleep $POLL_INTERVAL
done
```
### Key design choices

- `limit=50`: catches all transcripts between polls (Whisper produces ~1 fragment/second)
- Chronological reversal: the API returns newest-first; we reverse for natural speech order
- Speaker batching: consecutive same-speaker fragments are concatenated into one `say` call
- No "SPEAKER says:" prefix: voice identity conveys the speaker; text is spoken naturally
- First-poll skip: initializes at the current position without blasting history
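The speaker-batching rule can be isolated as a pure function, which makes it easy to test apart from the SSH/TTS plumbing. A minimal sketch mirroring the shell loop above:

```python
def batch_by_speaker(fragments):
    """Merge consecutive same-speaker fragments into one utterance.

    fragments: iterable of (speaker_id, text); blank fragments are dropped,
    matching the TRIMMED check in the narration script.
    """
    batches = []
    for speaker, text in fragments:
        text = text.strip()
        if not text:
            continue
        if batches and batches[-1][0] == speaker:
            # Same speaker as previous fragment: extend the current batch
            batches[-1] = (speaker, batches[-1][1] + " " + text)
        else:
            batches.append((speaker, text))
    return batches

frags = [("SPEAKER_00", "hello"), ("SPEAKER_00", "there"), ("SPEAKER_01", "hi")]
print(batch_by_speaker(frags))
# [('SPEAKER_00', 'hello there'), ('SPEAKER_01', 'hi')]
```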
## Say MCP Voice Selection
Two MCP servers available for TTS:
| Server | Tool | Voice Param | Rate Param |
|--------|------|-------------|------------|
| say | mcp__say__speak | Name string (e.g. "Ava (Premium)") | WPM (1-500, default 175) |
| macos-speech-sdk | mcp__macos-speech-sdk__speak | Name or identifier (e.g. "com.apple.voice.premium.en-US.Ava") | 0.0-1.0 mapped to 80-300 WPM, or direct WPM if >1 |
### High-Quality en-US Voices
| Voice | Quality | Identifier | Gender | Trit |
|-------|---------|------------|--------|------|
| Ava (Premium) | premium | com.apple.voice.premium.en-US.Ava | F | +1 |
| Ava (Enhanced) | enhanced | com.apple.voice.enhanced.en-US.Ava | F | +1 |
| Samantha (Enhanced) | enhanced | com.apple.voice.enhanced.en-US.Samantha | F | 0 |
| Allison (Enhanced) | enhanced | com.apple.voice.enhanced.en-US.Allison | F | -1 |
| Evan (Enhanced) | enhanced | com.apple.voice.enhanced.en-US.Evan | M | +1 |
| Nathan (Enhanced) | enhanced | com.apple.voice.enhanced.en-US.Nathan | M | 0 |
| Nicky (Enhanced) | enhanced | com.apple.ttsbundle.siri_Nicky_en-US_premium | F | -1 |
| Noelle (Enhanced) | enhanced | com.apple.voice.enhanced.en-US.Noelle | F | 0 |
### Per-Speaker Voice Mapping
- SPEAKER_00/alu → Ava (Premium) — primary speaker, highest quality
- SPEAKER_01 → Evan (Enhanced) — male voice for contrast
- SPEAKER_02 → Allison (Enhanced)
- SPEAKER_03 → Nathan (Enhanced)
- SPEAKER_04 → Noelle (Enhanced)
- SPEAKER_05 → Nicky (Enhanced)
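The mapping above is the same table the narration script encodes as a `case` statement; as a Python dict (with the script's Ava-by-default fallback) it looks like:

```python
# Mirrors voice_for_speaker() in the narration script.
VOICE_MAP = {
    "SPEAKER_00": "Ava (Premium)",
    "alu":        "Ava (Premium)",
    "SPEAKER_01": "Evan (Enhanced)",
    "SPEAKER_02": "Allison (Enhanced)",
    "SPEAKER_03": "Nathan (Enhanced)",
    "SPEAKER_04": "Noelle (Enhanced)",
    "SPEAKER_05": "Nicky (Enhanced)",
    "silly-alu":  "Samantha (Enhanced)",
}

def voice_for_speaker(speaker_id: str) -> str:
    # Unknown speakers fall back to the highest-quality voice
    return VOICE_MAP.get(speaker_id, "Ava (Premium)")

# A caller would then shell out, e.g.:
# subprocess.run(["say", "-v", voice_for_speaker(spk), "-r", "210", text])
```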
### MCP vs CLI Usage

- Background script (`/tmp/live-warehouse-stream.sh`): uses CLI `say -v "Voice Name"` — works headless
- In-session narration: use `mcp__macos-speech-sdk__speak` with a voice identifier for full control
- `mcp__say__speak` has a `background: true` param for non-blocking speech
## Local Fallback (SDF Ch8 Degeneracy)

When the remote pipeline is unreachable, use local mic capture:

```bash
/Users/alice/v/.venv-mlx-lm/bin/python /Users/alice/v/scripts/audio-capture-org.py
```

- Captures MacBook Pro Microphone via FFmpeg avfoundation `:1`
- Transcribes with mlx-whisper-small (16kHz, 8s chunks)
- Appends to `/Users/alice/v/audio-capture.org`
## DuckDB Integration

### Ingest history for audio digest

```bash
duckdb -c ".read /Users/alice/v/live_history_pipeline.sql"
```

- Merges claude/preclaude/codex history
- Generates TTS-ready `narration_line` fields
- `audio_digest` view: top 10 sessions formatted for voice
### Audio ACSet database

- `/Users/alice/v/audio_acset.duckdb` — structured audio metadata
- Tables: AudioFile, Transcript, Segment, Speaker, Topic, ACSetSchema
## SDF Analysis

Per Software Design for Flexibility (Hanson & Sussman):

- Ch1 Combinators: pipeline = compose(ssh_tunnel, api_poll, tts_narrate)
- Ch7 Propagators: transcripts flow mic → whisper → postgres → API → say (bidirectional: can query history backwards)
- Ch8 Degeneracy: remote warehouse (primary) vs local mic (fallback) — same generic interface, different implementations
- Ch9 Generic Dispatch: `narrate(source)` dispatches on source type: warehouse API vs local org file
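The Ch9 generic-dispatch idea can be sketched with `functools.singledispatch`. The two source classes here are hypothetical stand-ins for the warehouse API and the local org file; the bodies return placeholder strings instead of actually polling or speaking:

```python
from dataclasses import dataclass
from functools import singledispatch

# Hypothetical source types standing in for the two backends.
@dataclass
class WarehouseAPI:
    url: str

@dataclass
class LocalOrgFile:
    path: str

@singledispatch
def narrate(source):
    raise TypeError(f"no narration backend for {type(source).__name__}")

@narrate.register
def _(source: WarehouseAPI) -> str:
    # Real version would poll source.url and speak new transcripts.
    return f"polling {source.url}"

@narrate.register
def _(source: LocalOrgFile) -> str:
    # Real version would tail source.path and speak appended lines.
    return f"tailing {source.path}"

print(narrate(WarehouseAPI("http://10.1.10.107:5000/api/transcripts")))
print(narrate(LocalOrgFile("/Users/alice/v/audio-capture.org")))
```

Registering a new backend (say, a recorded-session replayer) is then one `@narrate.register` away, without touching existing callers.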
## Dependency Structure

```
[USB Mics] ──USB──→ [conversation-logger]
                            │
                    [ALSA/PulseAudio]
                            │
                  [capture_node.py × 3]
                            │
                [Whisper large-v3-turbo]
                            │
                     [PostgreSQL 16]
                            │
                      [Flask :5000]
                            │
        [WiFi: TP-Link_A7B3] ← SINGLE POINT OF FAILURE
                            │
                    [LAN: 10.1.10.107]
                            │
       [gx10-acee: 100.67.53.87 via Tailscale]
                            │
              [Local Mac: sshpass + curl]
                            │
               [Say MCP / say command]
```
Risk: WiFi is the only network path. All ethernet ports show NO-CARRIER. Mitigation: USB mics and local capture/transcription continue even if WiFi drops — data accumulates locally and can be retrieved when connectivity returns.
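The mitigation above is the Ch8 degeneracy pattern: two sources behind one interface, with the fallback taken when the primary fails. A minimal sketch with stub callables in place of the real remote poll and local org-file tail (both names are illustrative):

```python
def fetch_with_fallback(primary, fallback):
    """Try the remote pipeline first; fall back to local capture when it
    is unreachable. Both callables return a list of transcript strings;
    an OSError or empty result from the primary triggers the fallback."""
    try:
        result = primary()
        if result:
            return result, "remote"
    except OSError:
        pass
    return fallback(), "local"

# Stubs standing in for the real backends.
def remote():
    raise OSError("WiFi down: TP-Link_A7B3 unreachable")

def local():
    return ["captured locally, will sync later"]

print(fetch_with_fallback(remote, local))
# (['captured locally, will sync later'], 'local')
```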