Podcast Transcription with AssemblyAI Skill

Transcribe audio files with speaker diarization using scripts/transcribe.py.

Requirements

Set the ASSEMBLYAI_API_KEY environment variable to your AssemblyAI API key
Dependencies are installed automatically by uv run; no separate install step is needed
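
The API key requirement can be checked up front so that a missing variable fails with a clear message instead of an authentication error later on. The following is a minimal sketch, not the code in scripts/transcribe.py; the helper name get_api_key and its env parameter are hypothetical and exist only for illustration.

```python
import os


def get_api_key(env=None):
    """Return the AssemblyAI API key, or raise a clear error if it is unset.

    `env` is a hypothetical hook for testing; it defaults to os.environ.
    """
    env = os.environ if env is None else env
    key = env.get("ASSEMBLYAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ASSEMBLYAI_API_KEY is not set; export it before running the script"
        )
    return key
```

Calling get_api_key() before any upload or request keeps the failure close to its cause.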

Supported Formats

WAV, MP3, AIFF, AAC, OGG, FLAC, M4A, WMA, WEBM
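
A quick extension check can reject unsupported files before anything is uploaded. This sketch assumes each format above maps to a single lowercase file extension (for example, AIFF to .aiff); the name is_supported_audio is hypothetical, and the actual script may not perform this check at all.

```python
from pathlib import Path

# Assumption: one extension per format listed above.
SUPPORTED_EXTENSIONS = {
    ".wav", ".mp3", ".aiff", ".aac", ".ogg", ".flac", ".m4a", ".wma", ".webm",
}


def is_supported_audio(path):
    """Return True if the file extension matches a supported format."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

The comparison is case-insensitive, so podcast.MP3 and podcast.mp3 are treated the same way.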

Usage

Transcribe a local file with speaker diarization (default):

uv run scripts/transcribe.py /path/to/podcast.mp3

Transcribe from a URL:

uv run scripts/transcribe.py https://example.com/podcast.mp3

Save to file:

uv run scripts/transcribe.py /path/to/podcast.mp3 -o transcript.txt

Specify expected number of speakers:

uv run scripts/transcribe.py /path/to/podcast.mp3 -n 3

Plain text output (no speaker labels):

uv run scripts/transcribe.py /path/to/podcast.mp3 --no-diarize -f text

SRT subtitle format:

uv run scripts/transcribe.py /path/to/podcast.mp3 -f srt -o subtitles.srt

Options

| Flag | Description |
|------|-------------|
| -o, --output | Output file path (default: stdout) |
| -f, --format | Output format: diarized (default), text, srt |
| --no-diarize | Disable speaker diarization |
| -n, --speakers | Expected number of speakers (helps accuracy) |
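
For readers who want to see how these flags fit together, here is one way they could be declared with argparse. It is a sketch under stated assumptions: the positional argument name source, the function name build_parser, and the defaults shown are illustrative and are not taken from scripts/transcribe.py.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="transcribe.py",
        description="Transcribe audio with optional speaker diarization.",
    )
    # Hypothetical name for the positional input: a local path or a public URL.
    parser.add_argument("source", help="local audio file or public URL")
    parser.add_argument("-o", "--output", default=None,
                        help="output file path (default: stdout)")
    parser.add_argument("-f", "--format", default="diarized",
                        choices=["diarized", "text", "srt"],
                        help="output format")
    parser.add_argument("--no-diarize", action="store_true",
                        help="disable speaker diarization")
    parser.add_argument("-n", "--speakers", type=int, default=None,
                        help="expected number of speakers")
    return parser
```

With this parser, build_parser().parse_args(["/path/to/podcast.mp3", "-n", "3"]) yields speakers set to 3 and format left at diarized.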

Output Formats

diarized (default): one "[MM:SS] Speaker A: text" entry per utterance, with a blank line between utterances
text: Plain transcript without speaker labels or timestamps
srt: SRT subtitle format, with a speaker label on each cue
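
The diarized and srt layouts described above can be sketched as follows. The sketch assumes each utterance is a dict with speaker, text, and start/end times in milliseconds; that shape, the function names, and the exact cue layout are assumptions for illustration, not a description of the script's internals.

```python
def format_mmss(ms):
    """Format milliseconds as MM:SS (minutes keep counting past 59)."""
    total_seconds = ms // 1000
    return f"{total_seconds // 60:02d}:{total_seconds % 60:02d}"


def format_srt_time(ms):
    """Format milliseconds as an SRT timestamp, HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def render_diarized(utterances):
    """One '[MM:SS] Speaker X: text' entry per utterance, blank line between."""
    lines = [
        f"[{format_mmss(u['start'])}] Speaker {u['speaker']}: {u['text']}"
        for u in utterances
    ]
    return "\n\n".join(lines)


def render_srt(utterances):
    """Numbered SRT cues with a speaker label at the start of each cue."""
    cues = []
    for index, u in enumerate(utterances, start=1):
        timing = f"{format_srt_time(u['start'])} --> {format_srt_time(u['end'])}"
        cues.append(f"{index}\n{timing}\nSpeaker {u['speaker']}: {u['text']}")
    return "\n\n".join(cues) + "\n"
```

SRT uses a comma before the milliseconds and numbers cues from 1, which is why the two formats need separate timestamp helpers.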

Notes

Local files are first uploaded to AssemblyAI's servers, then transcribed
URLs are passed to AssemblyAI directly, so the audio must be publicly accessible
The script polls for completion every 5 seconds; long audio files may take several minutes
By default, AssemblyAI detects up to 10 speakers; if you know the speaker count, pass it with -n as a hint
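
The flow in these notes (decide between upload and URL, request diarization, then poll) might look roughly like the sketch below. The request fields audio_url, speaker_labels, and speakers_expected and the status values completed and error follow AssemblyAI's public REST API as best understood here; the function names and the fetch_status callable are hypothetical, and the network calls are deliberately left out so the logic can be read and tested on its own.

```python
import time
from urllib.parse import urlparse


def is_url(source):
    """Treat http(s) sources as URLs; anything else is a local path to upload."""
    return urlparse(source).scheme in ("http", "https")


def build_transcript_request(audio_url, diarize=True, speakers=None):
    """Build the JSON body for a transcript request."""
    payload = {"audio_url": audio_url, "speaker_labels": diarize}
    # The speaker-count hint only makes sense when diarization is enabled.
    if diarize and speakers is not None:
        payload["speakers_expected"] = speakers
    return payload


def wait_for_transcript(fetch_status, interval=5.0, sleep=time.sleep):
    """Poll until the transcript is completed or has failed.

    `fetch_status` is a hypothetical callable returning the transcript JSON
    as a dict; `sleep` is injectable so tests do not actually wait.
    """
    while True:
        result = fetch_status()
        status = result.get("status")
        if status == "completed":
            return result
        if status == "error":
            raise RuntimeError(result.get("error", "transcription failed"))
        sleep(interval)
```

Passing the status fetcher in as a callable keeps the polling loop independent of any particular HTTP client.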
