Agent Skills: Podcast Transcription with AssemblyAI


Category: Uncategorized
ID: tdhopper/dotfiles2.0/assemblyai-transcribe

Install this agent skill locally:

pnpm dlx add-skill https://github.com/tdhopper/dotfiles2.0/tree/HEAD/.claude/skills/assemblyai-transcribe

Skill Files


.claude/skills/assemblyai-transcribe/SKILL.md

Skill Metadata

Name: assemblyai-transcribe

Podcast Transcription with AssemblyAI

Transcribe audio files with speaker diarization using scripts/transcribe.py.

Requirements

  • Set the ASSEMBLYAI_API_KEY environment variable
  • Dependencies are installed automatically by uv run
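As a sketch of the first requirement, a script can fail fast when the key is missing. The helper name get_api_key is hypothetical and not taken from scripts/transcribe.py. The header comment is a PEP 723 inline metadata block, the mechanism uv run reads to install a script's dependencies; assuming transcribe.py uses it, its real dependency list would go there (this sketch needs only the standard library, so the list is empty).

```python
# /// script
# requires-python = ">=3.9"
# dependencies = []
# ///
"""Hypothetical preflight check; not the contents of scripts/transcribe.py."""
import os
from typing import Mapping, Optional


def get_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return ASSEMBLYAI_API_KEY from the environment, raising if it is unset or blank."""
    env = os.environ if env is None else env
    key = env.get("ASSEMBLYAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("ASSEMBLYAI_API_KEY is not set")
    return key
```

Calling get_api_key() once at startup turns a missing key into a clear error message instead of a rejected API request later on.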

Supported Formats

WAV, MP3, AIFF, AAC, OGG, FLAC, M4A, WMA, WEBM
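A script could reject unsupported inputs before uploading anything. The check below is a hypothetical pre-flight helper (has_supported_extension is not part of the skill, and the real script may not check at all). It only inspects the file extension, which is a rough hint about the encoding, not proof of it.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions from the "Supported Formats" list, compared case-insensitively.
SUPPORTED_EXTENSIONS = {
    ".wav", ".mp3", ".aiff", ".aac", ".ogg", ".flac", ".m4a", ".wma", ".webm",
}


def has_supported_extension(source: str) -> bool:
    """Return True if a local path or URL ends in a supported audio extension."""
    if source.startswith(("http://", "https://")):
        # For URLs, look only at the path component so query strings are ignored.
        path = urlparse(source).path
    else:
        path = source
    return PurePosixPath(path).suffix.lower() in SUPPORTED_EXTENSIONS
```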

Usage

Transcribe a local file with speaker diarization (default):

uv run scripts/transcribe.py /path/to/podcast.mp3

Transcribe from a URL:

uv run scripts/transcribe.py https://example.com/podcast.mp3

Save the transcript to a file:

uv run scripts/transcribe.py /path/to/podcast.mp3 -o transcript.txt

Specify the expected number of speakers:

uv run scripts/transcribe.py /path/to/podcast.mp3 -n 3

Plain text output (no speaker labels):

uv run scripts/transcribe.py /path/to/podcast.mp3 --no-diarize -f text

SRT subtitle format:

uv run scripts/transcribe.py /path/to/podcast.mp3 -f srt -o subtitles.srt
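To drive the script from Python rather than a shell, the documented flags can be assembled into an argument list and passed to subprocess. This is a sketch: build_command and transcribe_to_string are hypothetical helpers, and the relative path scripts/transcribe.py assumes the working directory is the skill folder.

```python
import subprocess
from typing import List, Optional


def build_command(source: str, output: Optional[str] = None, fmt: str = "diarized",
                  speakers: Optional[int] = None, diarize: bool = True) -> List[str]:
    """Assemble a transcribe.py invocation using only the documented flags."""
    cmd = ["uv", "run", "scripts/transcribe.py", source]
    if output is not None:
        cmd += ["-o", output]
    if fmt != "diarized":  # diarized is the default, so the flag is omitted
        cmd += ["-f", fmt]
    if speakers is not None:
        cmd += ["-n", str(speakers)]
    if not diarize:
        cmd.append("--no-diarize")
    return cmd


def transcribe_to_string(source: str, fmt: str = "diarized",
                         speakers: Optional[int] = None, diarize: bool = True) -> str:
    """Run the script without -o so the transcript is written to stdout, and return it."""
    cmd = build_command(source, fmt=fmt, speakers=speakers, diarize=diarize)
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout
```

Because -o is omitted, the transcript arrives on stdout, which subprocess.run captures; for example, transcribe_to_string("/path/to/podcast.mp3", speakers=3).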

Options

| Flag | Description |
|------|-------------|
| -o, --output | Output file path (default: stdout) |
| -f, --format | Output format: diarized (default), text, srt |
| --no-diarize | Disable speaker diarization |
| -n, --speakers | Expected number of speakers (helps accuracy) |
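The options table maps naturally onto an argparse parser. The following is a reconstruction of the documented interface, not the script's actual source; the positional argument name source and the destination name diarize are assumptions.

```python
import argparse


def make_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options table."""
    parser = argparse.ArgumentParser(description="Transcribe audio with AssemblyAI")
    parser.add_argument("source", help="Local audio file path or public URL")
    parser.add_argument("-o", "--output", help="Output file path (default: stdout)")
    parser.add_argument("-f", "--format", choices=["diarized", "text", "srt"],
                        default="diarized", help="Output format")
    # --no-diarize flips a flag that is True by default.
    parser.add_argument("--no-diarize", dest="diarize", action="store_false",
                        help="Disable speaker diarization")
    parser.add_argument("-n", "--speakers", type=int,
                        help="Expected number of speakers (helps accuracy)")
    return parser
```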

Output Formats

  • diarized (default): each utterance as "[MM:SS] Speaker A: text", with a blank line between utterances
  • text: Plain transcript without speaker labels or timestamps
  • srt: SRT subtitle format with speaker labels
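The diarized and SRT formats can be sketched as small formatting functions. This assumes AssemblyAI-style utterance objects with speaker, start, end (milliseconds), and text fields; how the real script handles audio longer than an hour in the [MM:SS] prefix, or splits long SRT cues, is not stated, so those details are guesses.

```python
from typing import Any, Dict, List

# Assumed shape: {"speaker": "A", "start": 0, "end": 2500, "text": "..."}, times in ms.
Utterance = Dict[str, Any]


def mmss(ms: int) -> str:
    """Format milliseconds as MM:SS; minutes keep counting past 59."""
    total_seconds = ms // 1000
    return f"{total_seconds // 60:02d}:{total_seconds % 60:02d}"


def srt_time(ms: int) -> str:
    """Format milliseconds as an SRT timestamp, HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def format_diarized(utterances: List[Utterance]) -> str:
    """One '[MM:SS] Speaker X: text' line per utterance, separated by blank lines."""
    return "\n\n".join(
        f"[{mmss(u['start'])}] Speaker {u['speaker']}: {u['text']}" for u in utterances
    )


def format_srt(utterances: List[Utterance]) -> str:
    """Numbered SRT cues, each with a start --> end line and a speaker-labelled text line."""
    cues = []
    for index, u in enumerate(utterances, start=1):
        cues.append(
            f"{index}\n{srt_time(u['start'])} --> {srt_time(u['end'])}\n"
            f"Speaker {u['speaker']}: {u['text']}\n"
        )
    # Each cue already ends with a newline; joining on "\n" leaves one blank line between cues.
    return "\n".join(cues)
```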

Notes

  • Local files are first uploaded to AssemblyAI's servers, then transcribed
  • URLs are passed directly to AssemblyAI, so the audio must be publicly accessible
  • The script polls for results every 5 seconds; long audio files may take several minutes to finish
  • By default, AssemblyAI detects up to 10 speakers; if you know the speaker count, pass it with -n
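The notes describe an upload, submit, and poll cycle. A standard-library sketch of that cycle against AssemblyAI's v2 REST endpoints might look like the following. It is not the skill's implementation (which may use the assemblyai Python SDK instead), and the helper names are hypothetical. The polling loop takes an injectable fetch function so it can be exercised without network access.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict, Optional

API_BASE = "https://api.assemblyai.com/v2"  # AssemblyAI v2 REST API
POLL_INTERVAL_SECONDS = 5  # interval stated in the notes above


def _request(url: str, api_key: str, data: Optional[bytes] = None,
             content_type: Optional[str] = None) -> Dict[str, Any]:
    """Send a GET (data=None) or POST (data given) request and decode the JSON reply."""
    headers = {"authorization": api_key}
    if content_type:
        headers["content-type"] = content_type
    req = urllib.request.Request(url, data=data, headers=headers)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


def upload(path: str, api_key: str) -> str:
    """Upload a local file and return the upload_url AssemblyAI assigns to it."""
    with open(path, "rb") as f:
        reply = _request(f"{API_BASE}/upload", api_key, data=f.read(),
                         content_type="application/octet-stream")
    return reply["upload_url"]


def submit(audio_url: str, api_key: str, speakers: Optional[int] = None,
           diarize: bool = True) -> str:
    """Start a transcription job for a public or uploaded URL and return its id."""
    body: Dict[str, Any] = {"audio_url": audio_url, "speaker_labels": diarize}
    if diarize and speakers is not None:
        body["speakers_expected"] = speakers  # the hint that -n provides
    reply = _request(f"{API_BASE}/transcript", api_key,
                     data=json.dumps(body).encode("utf-8"),
                     content_type="application/json")
    return reply["id"]


def get_transcript(transcript_id: str, api_key: str) -> Dict[str, Any]:
    """Fetch the current state of a transcription job."""
    return _request(f"{API_BASE}/transcript/{transcript_id}", api_key)


def poll(fetch: Callable[[], Dict[str, Any]],
         interval: float = POLL_INTERVAL_SECONDS,
         sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Call fetch() until the job completes or errors, sleeping between attempts."""
    while True:
        transcript = fetch()
        status = transcript.get("status")
        if status == "completed":
            return transcript
        if status == "error":
            raise RuntimeError(transcript.get("error") or "transcription failed")
        sleep(interval)  # still "queued" or "processing"
```

For a local file, the flow would be transcript_id = submit(upload(path, key), key, speakers=3) followed by poll(lambda: get_transcript(transcript_id, key)); for a public URL, pass it straight to submit and skip the upload.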