transcript-search
Search transcripts intelligently without loading entire texts into context.
Trit: 0 (ERGODIC - coordination/retrieval)
CRITICAL RULE
NEVER run SELECT text FROM transcripts or load full transcript bodies into context.
Always use sentence-level extraction with regexp_extract_all or string_split + filtering.
Known Databases
| Path | Schema | Content |
|------|--------|---------|
| ~/worlds/a/all_transcripts.duckdb | transcripts(id, source, source_path, audio_path, timestamp, text, duration_seconds, session_id) | 174 voice memos + whisper transcripts |
| ~/worlds/a/audio_transcript.duckdb | recordings, segments, speakers, words | Speaker-diarized audio with GF(3) |
| ~/worlds/a/aqua_transcriptions.duckdb | varies | Aqua Voice transcriptions |
| ~/.topos/duckdb-atlas/audio_transcript.duckdb | same as above | Atlas copy |
Search Patterns
1. Sentence-Level Context Extraction (PRIMARY)
Extract sentences matching keywords with surrounding context:
-- Find sentences about a topic with ±250 char context window
SELECT id, source, timestamp, trim(chunk) as context
FROM (
SELECT id, source, timestamp,
unnest(regexp_extract_all(text, '[^.]{0,250}KEYWORD[^.]{0,250}', 0)) as chunk
FROM transcripts
)
WHERE length(trim(chunk)) > 15
ORDER BY id;
2. Multi-Keyword Intersection
Find sentences where multiple concepts co-occur:
-- Sentences mentioning BOTH term1 AND term2
WITH sentences AS (
SELECT id, source, unnest(string_split(text, '.')) as sentence
FROM transcripts
)
SELECT id, source, trim(sentence) as sentence
FROM sentences
WHERE lower(sentence) LIKE '%term1%'
AND lower(sentence) LIKE '%term2%'
AND length(trim(sentence)) > 20;
3. Quick Count Before Deep Dive
Always count first to avoid surprise data dumps:
-- How many transcripts mention X?
SELECT COUNT(*) as hits,
array_agg(id ORDER BY id) as transcript_ids
FROM transcripts
WHERE lower(text) LIKE '%keyword%';
4. Temporal Search
-- Recent transcripts mentioning X
SELECT id, source, timestamp, left(text, 200) as preview
FROM transcripts
WHERE lower(text) LIKE '%keyword%'
AND timestamp > NOW() - INTERVAL '7 days'
ORDER BY timestamp DESC;
5. Co-occurrence Matrix
-- Which transcripts mention both colors AND tabs?
SELECT id, source, timestamp
FROM transcripts
WHERE lower(text) LIKE '%color%'
AND (lower(text) LIKE '%tab%' OR lower(text) LIKE '%tile%')
ORDER BY id;
Workflow
- Count first: How many transcripts match? Get IDs.
- Extract sentences: Use regex context windows, NOT full text.
- Narrow: Add more keywords to intersect.
- Report: Show relevant sentences with transcript ID + timestamp.
Known Color-Tab Mappings (from transcript #149)
From voice memo session #149, the color system for tabs/tiles:
- Green = Emacs / conventional flow / "zero" baseline / bridging
- Blue = secondary workspace
- Red = active/alert state
- Orange = Barton's aesthetic (shirt, rollers — transcript #168, #171)
- Colors map to styles/environments in tiled terminal sessions
- "Any color, any style, any tab, associated rows" — colors ARE the tab identifiers
Key quote: "And so what colors? Can you talk about color a little bit? Green is for what?" → Green was Emacs. "Currently green and red, there's 4 tiles" → tiled terminal layout.
Anti-Patterns
| ❌ Bad | ✅ Good |
|--------|---------|
| SELECT text FROM transcripts WHERE ... | SELECT id, trim(chunk) FROM (regexp_extract_all(...)) |
| SELECT * FROM transcripts | SELECT id, source, timestamp, left(text, 200) as preview |
| Loading 174 full transcripts | Count → filter IDs → extract sentences |
| Grepping raw text blobs | DuckDB regex with context windows |
Related Skills
| Skill | Relationship |
|-------|-------------|
| yt-playlist-acset | Creates transcript DuckDBs from YouTube playlists |
| live-recording | Captures voice memos via whisper-cpp |
| duckdb-ies | Interactome analytics over transcripts |
| duck-agent | DuckDB file discovery |
| beeper | Transcripts were shared to Barton via Beeper |