transcript-search Skill

Search transcripts intelligently without loading entire texts into context.

Trit: 0 (ERGODIC - coordination/retrieval)

CRITICAL RULE

NEVER run SELECT text FROM transcripts or load full transcript bodies into context. Always use sentence-level extraction with regexp_extract_all or string_split + filtering.

Known Databases

| Path | Schema | Content |
|------|--------|---------|
| ~/worlds/a/all_transcripts.duckdb | transcripts(id, source, source_path, audio_path, timestamp, text, duration_seconds, session_id) | 174 voice memos + whisper transcripts |
| ~/worlds/a/audio_transcript.duckdb | recordings, segments, speakers, words | Speaker-diarized audio with GF(3) |
| ~/worlds/a/aqua_transcriptions.duckdb | varies | Aqua Voice transcriptions |
| ~/.topos/duckdb-atlas/audio_transcript.duckdb | same as above | Atlas copy |
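
Because the schema of aqua_transcriptions.duckdb is listed only as "varies", it can help to inspect table and column names before writing a search. The following is a minimal sketch, assuming the file exists at the path given in the table; the alias `aqua` is an arbitrary name chosen here for illustration. If `~` is not expanded in your environment, substitute the absolute path.

```sql
-- Attach read-only so a search session cannot modify the database.
-- 'aqua' is a hypothetical alias, not a name defined by this skill.
ATTACH '~/worlds/a/aqua_transcriptions.duckdb' AS aqua (READ_ONLY);

-- List tables and columns without reading any transcript text.
SELECT table_name, column_name, data_type
FROM information_schema.columns
WHERE table_catalog = 'aqua'
ORDER BY table_name, ordinal_position;
```

Only metadata comes back, so this stays within the critical rule above.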

Search Patterns

1. Sentence-Level Context Extraction (PRIMARY)

Extract sentences matching keywords with surrounding context:

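-- Find matches for a topic with up to 250 characters of context on each side.
-- [^.] stops each window at the nearest period, so a chunk never crosses a sentence boundary.
-- (?i) makes the match case-insensitive, consistent with the lower(...) LIKE filters below.
SELECT id, source, timestamp, trim(chunk) as context
FROM (
  SELECT id, source, timestamp,
    unnest(regexp_extract_all(text, '(?i)[^.]{0,250}KEYWORD[^.]{0,250}', 0)) as chunk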
  FROM transcripts
)
WHERE length(trim(chunk)) > 15
ORDER BY id;

2. Multi-Keyword Intersection

Find sentences where multiple concepts co-occur:

-- Sentences mentioning BOTH term1 AND term2
WITH sentences AS (
  SELECT id, source, unnest(string_split(text, '.')) as sentence
  FROM transcripts
)
SELECT id, source, trim(sentence) as sentence
FROM sentences
WHERE lower(sentence) LIKE '%term1%'
  AND lower(sentence) LIKE '%term2%'
  AND length(trim(sentence)) > 20;

3. Quick Count Before Deep Dive

Always count first to avoid surprise data dumps:

-- How many transcripts mention X?
SELECT COUNT(*) as hits,
       array_agg(id ORDER BY id) as transcript_ids
FROM transcripts
WHERE lower(text) LIKE '%keyword%';

4. Temporal Search

-- Recent transcripts mentioning X
SELECT id, source, timestamp, left(text, 200) as preview
FROM transcripts
WHERE lower(text) LIKE '%keyword%'
  AND timestamp > NOW() - INTERVAL '7 days'
ORDER BY timestamp DESC;

5. Co-occurrence Matrix

-- Which transcripts mention color AND (tab OR tile)?
-- LIKE matches substrings, so '%tab%' also hits words such as "table".
SELECT id, source, timestamp
FROM transcripts
WHERE lower(text) LIKE '%color%'
  AND (lower(text) LIKE '%tab%' OR lower(text) LIKE '%tile%')
ORDER BY id;
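
The query above filters for one fixed combination. To see how often each pair of terms co-occurs, a pairwise count can be sketched as follows. This is an illustrative sketch rather than part of the original skill: the term list simply reuses the three keywords from the query above, and `n_transcripts` is a column name chosen here.

```sql
-- Pairwise co-occurrence counts; returns only terms and counts, never transcript text.
WITH terms(term) AS (
  VALUES ('color'), ('tab'), ('tile')      -- illustrative term list
),
hits AS (
  -- One row per (transcript, term) where the term appears as a substring.
  SELECT t.id, k.term
  FROM transcripts t
  JOIN terms k ON contains(lower(t.text), k.term)
)
SELECT a.term AS term_a,
       b.term AS term_b,
       COUNT(*) AS n_transcripts
FROM hits a
JOIN hits b
  ON a.id = b.id
 AND a.term < b.term                        -- each unordered pair once
GROUP BY ALL
ORDER BY n_transcripts DESC;
```

Pairs that never co-occur produce no row. As with LIKE, matching is by substring, so 'tab' also counts transcripts that only say "table".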

Workflow

1. Count first: How many transcripts match? Get IDs.
2. Extract sentences: Use regex context windows, NOT full text.
3. Narrow: Add more keywords to intersect.
4. Report: Show relevant sentences with transcript ID + timestamp.
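
The steps above can be chained roughly as follows. This is a sketch under assumptions, not a fixed recipe: `keyword` and `term2` are placeholders, and the `LIMIT 50` cap is an arbitrary safety bound added here.

```sql
-- Step 1: count matches and collect IDs; no transcript text is returned.
SELECT COUNT(*) AS hits,
       array_agg(id ORDER BY id) AS transcript_ids
FROM transcripts
WHERE text ILIKE '%keyword%';

-- Steps 2-4: extract context windows from matching rows only,
-- narrow with a second term, and report ID + timestamp with each chunk.
WITH chunks AS (
  SELECT id, source, timestamp,
         unnest(regexp_extract_all(text, '(?i)[^.]{0,250}keyword[^.]{0,250}')) AS chunk
  FROM transcripts
  WHERE text ILIKE '%keyword%'
)
SELECT id, source, timestamp, trim(chunk) AS context
FROM chunks
WHERE chunk ILIKE '%term2%'
ORDER BY timestamp, id
LIMIT 50;                                   -- arbitrary cap to avoid surprise dumps
```

If the first query reports a large number of hits, adding another term to the final filter is cheaper than raising the limit.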

Known Color-Tab Mappings (from transcript #149)

From voice memo session #149, the color system for tabs/tiles:

Green = Emacs / conventional flow / "zero" baseline / bridging
Blue = secondary workspace
Red = active/alert state
Orange = Barton's aesthetic (shirt, rollers; transcripts #168 and #171)
Colors map to styles/environments in tiled terminal sessions
"Any color, any style, any tab, associated rows" — colors ARE the tab identifiers

Key quote: "And so what colors? Can you talk about color a little bit? Green is for what?" → Green was Emacs. "Currently green and red, there's 4 tiles" → tiled terminal layout.

Anti-Patterns

| ❌ Bad | ✅ Good |
|--------|---------|
| SELECT text FROM transcripts WHERE ... | SELECT id, trim(chunk) FROM (regexp_extract_all(...)) |
| SELECT * FROM transcripts | SELECT id, source, timestamp, left(text, 200) as preview |
| Loading 174 full transcripts | Count → filter IDs → extract sentences |
| Grepping raw text blobs | DuckDB regex with context windows |

Related Skills

| Skill | Relationship |
|-------|-------------|
| yt-playlist-acset | Creates transcript DuckDBs from YouTube playlists |
| live-recording | Captures voice memos via whisper-cpp |
| duckdb-ies | Interactome analytics over transcripts |
| duck-agent | DuckDB file discovery |
| beeper | Transcripts were shared to Barton via Beeper |
