AssemblyAI Core Workflow A — Async Transcription
Overview
Primary workflow: submit audio for async transcription with audio intelligence features. The SDK handles file upload (for local files), queues the transcription job, and polls until completion.
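The queue-and-poll loop that the one-shot transcribe() call performs internally can be sketched as a small generic helper. This is illustrative, not the SDK's actual code; it only assumes the documented transcript statuses (queued, processing, completed, error), and the fetch parameter stands in for a real re-read such as () => client.transcripts.get(id):

```typescript
// Minimal shape of the fields the polling loop needs (a subset of the SDK's Transcript).
type TranscriptLite = { id: string; status: string; text?: string | null; error?: string | null };

// Generic poll loop: fetch() is any function that re-reads the transcript,
// e.g. () => client.transcripts.get(id) with the real SDK client.
async function pollUntilDone(
  fetch: () => Promise<TranscriptLite>,
  intervalMs = 3000,
): Promise<TranscriptLite> {
  for (;;) {
    const t = await fetch();
    // 'completed' and 'error' are terminal; 'queued' and 'processing' mean keep waiting.
    if (t.status === 'completed' || t.status === 'error') return t;
    await new Promise((r) => setTimeout(r, intervalMs));
  }
}
```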
Prerequisites
- assemblyai package installed
- API key configured in ASSEMBLYAI_API_KEY
Instructions
Step 1: Basic Async Transcription
import { AssemblyAI } from 'assemblyai';
const client = new AssemblyAI({
apiKey: process.env.ASSEMBLYAI_API_KEY!,
});
// Remote URL — SDK queues and polls automatically
const transcript = await client.transcripts.transcribe({
audio: 'https://example.com/meeting-recording.mp3',
});
console.log(transcript.text);
console.log(`Duration: ${transcript.audio_duration}s`);
console.log(`Words: ${transcript.words?.length}`);
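Each entry in transcript.words carries start/end offsets in milliseconds, which makes caption generation straightforward. A minimal sketch; the msToSrt helper and the sample words array are illustrative, not part of the SDK:

```typescript
// Convert a millisecond offset into an SRT-style HH:MM:SS,mmm timestamp.
function msToSrt(ms: number): string {
  const h = Math.floor(ms / 3_600_000);
  const m = Math.floor((ms % 3_600_000) / 60_000);
  const s = Math.floor((ms % 60_000) / 1000);
  const frac = ms % 1000;
  const pad = (n: number, w = 2) => String(n).padStart(w, '0');
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(frac, 3)}`;
}

// transcript.words entries look like { text, start, end, confidence }
const words = [
  { text: 'Hello', start: 120, end: 480, confidence: 0.98 },
  { text: 'world', start: 520, end: 900, confidence: 0.99 },
];
for (const w of words) {
  console.log(`${msToSrt(w.start)} --> ${msToSrt(w.end)}  ${w.text}`);
}
```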
Step 2: Local File Upload
// The SDK uploads the file and transcribes in one call
const transcript = await client.transcripts.transcribe({
audio: './recordings/interview.wav',
});
// Or from a buffer/stream
import fs from 'fs';
const buffer = fs.readFileSync('./recordings/interview.wav');
const transcript2 = await client.transcripts.transcribe({
audio: buffer,
});
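The transcribe() call dispatches on the type of the audio value: http(s) strings are passed through as remote URLs, other strings are treated as local paths to upload, and raw bytes/streams are uploaded directly. A rough sketch of that dispatch, for illustration only (not the SDK's actual implementation):

```typescript
type AudioInput = string | Uint8Array;

// Classify an audio input the way the transcribe() call treats it.
function classifyAudio(audio: AudioInput): 'remote-url' | 'local-path' | 'upload-data' {
  if (typeof audio !== 'string') return 'upload-data'; // buffers/streams get uploaded
  return /^https?:\/\//i.test(audio) ? 'remote-url' : 'local-path';
}
```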
Step 3: Speaker Diarization
const transcript = await client.transcripts.transcribe({
audio: audioUrl,
speaker_labels: true,
speakers_expected: 3, // Optional: hint for expected speaker count
});
// Utterances are grouped by speaker
for (const utterance of transcript.utterances ?? []) {
console.log(`Speaker ${utterance.speaker}: ${utterance.text}`);
// Speaker A: Good morning, thanks for joining.
// Speaker B: Happy to be here.
}
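Because each utterance also carries start/end offsets in milliseconds alongside its speaker label, per-speaker talk time falls out directly. A small illustrative helper (not part of the SDK), assuming the utterance shape shown above plus the start/end fields:

```typescript
type Utterance = { speaker: string; text: string; start: number; end: number };

// Total speaking time per speaker, in milliseconds.
function talkTime(utterances: Utterance[]): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const u of utterances) {
    totals[u.speaker] = (totals[u.speaker] ?? 0) + (u.end - u.start);
  }
  return totals;
}

const demo: Utterance[] = [
  { speaker: 'A', text: 'Good morning, thanks for joining.', start: 0, end: 2400 },
  { speaker: 'B', text: 'Happy to be here.', start: 2600, end: 3800 },
  { speaker: 'A', text: 'Let us get started.', start: 4000, end: 5500 },
];
console.log(talkTime(demo)); // { A: 3900, B: 1200 }
```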
Step 4: Full Audio Intelligence Stack
const transcript = await client.transcripts.transcribe({
audio: audioUrl,
// Speaker identification
speaker_labels: true,
// Content analysis
sentiment_analysis: true,
entity_detection: true,
auto_highlights: true,
iab_categories: true, // Topic detection (IAB taxonomy)
content_safety: true, // Flag sensitive content
summarization: true,
summary_model: 'informative',
summary_type: 'bullets',
// Formatting
punctuate: true,
format_text: true,
language_code: 'en',
// Word boost for domain terms
word_boost: ['AssemblyAI', 'LeMUR', 'transcription'],
boost_param: 'high',
});
// --- Access results ---
// Sentiment per sentence
for (const s of transcript.sentiment_analysis_results ?? []) {
console.log(`[${s.sentiment}] ${s.text}`);
// [POSITIVE] I really enjoyed working on this project.
}
// Named entities
for (const e of transcript.entities ?? []) {
console.log(`${e.entity_type}: ${e.text}`);
// person_name: John Smith
// location: San Francisco
}
// Auto-highlighted key phrases
for (const h of transcript.auto_highlights_result?.results ?? []) {
console.log(`"${h.text}" (count: ${h.count}, rank: ${h.rank})`);
}
// IAB content categories
const categories = transcript.iab_categories_result?.summary ?? {};
for (const [category, relevance] of Object.entries(categories)) {
if ((relevance as number) > 0.5) {
console.log(`Topic: ${category} (${((relevance as number) * 100).toFixed(0)}%)`);
}
}
// Content safety labels
for (const result of transcript.content_safety_labels?.results ?? []) {
for (const label of result.labels) {
console.log(`Safety: ${label.label} (${(label.confidence * 100).toFixed(0)}%)`);
}
}
// Summary
console.log('Summary:', transcript.summary);
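Downstream dashboards often want aggregates rather than per-sentence rows. A quick illustrative tally over sentiment_analysis_results; the helper is not part of the SDK, and the POSITIVE/NEUTRAL/NEGATIVE values are the ones the API returns:

```typescript
type SentimentRow = { sentiment: 'POSITIVE' | 'NEUTRAL' | 'NEGATIVE'; text: string };

// Count how many sentences fall into each sentiment bucket.
function sentimentCounts(rows: SentimentRow[]): Record<string, number> {
  const counts: Record<string, number> = { POSITIVE: 0, NEUTRAL: 0, NEGATIVE: 0 };
  for (const r of rows) counts[r.sentiment]++;
  return counts;
}

const rows: SentimentRow[] = [
  { sentiment: 'POSITIVE', text: 'I really enjoyed working on this project.' },
  { sentiment: 'NEUTRAL', text: 'The meeting starts at nine.' },
  { sentiment: 'POSITIVE', text: 'Great results overall.' },
];
console.log(sentimentCounts(rows)); // { POSITIVE: 2, NEUTRAL: 1, NEGATIVE: 0 }
```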
Step 5: PII Redaction
const transcript = await client.transcripts.transcribe({
audio: audioUrl,
redact_pii: true,
redact_pii_policies: [
'email_address',
'phone_number',
'person_name',
'credit_card_number',
'social_security_number',
'date_of_birth',
],
redact_pii_sub: 'hash', // Replace PII with hash. Options: 'hash' | 'entity_name'
redact_pii_audio: true, // Also generate redacted audio file
});
// Text has PII replaced: "My name is ####" or "My name is [PERSON_NAME]"
console.log(transcript.text);
// Fetch the redacted audio URL (redaction takes extra processing time)
const redactedAudio = await client.transcripts.redactedAudio(transcript.id);
if (redactedAudio.status === 'redacted_audio_ready') {
  console.log('Redacted audio URL:', redactedAudio.redacted_audio_url);
}
Step 6: Manage Transcripts
// List recent transcripts
const page = await client.transcripts.list({ limit: 20 });
for (const t of page.transcripts) {
console.log(`${t.id} | ${t.status} | ${t.audio_duration}s`);
}
// Get a specific transcript
const existing = await client.transcripts.get('transcript-id');
// Delete a transcript (GDPR compliance)
await client.transcripts.delete('transcript-id');
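A common companion to delete() is a retention sweep: select transcripts older than some cutoff, then delete each by id. An illustrative helper, assuming listed items carry a created timestamp (the olderThan function is not part of the SDK):

```typescript
type ListedTranscript = { id: string; created: string };

// Pick transcripts created more than maxAgeDays ago (e.g. for a GDPR retention sweep).
function olderThan(items: ListedTranscript[], maxAgeDays: number, now: Date): string[] {
  const cutoff = now.getTime() - maxAgeDays * 24 * 60 * 60 * 1000;
  return items.filter((t) => new Date(t.created).getTime() < cutoff).map((t) => t.id);
}

const now = new Date('2024-06-30T00:00:00Z');
const ids = olderThan(
  [
    { id: 'old1', created: '2024-01-15T12:00:00Z' },
    { id: 'new1', created: '2024-06-29T12:00:00Z' },
  ],
  30,
  now,
);
console.log(ids); // ['old1']
// Each returned id could then be passed to client.transcripts.delete(id).
```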
Supported Audio Formats
MP3, WAV, FLAC, M4A, OGG, WebM, MP4, AAC. Max file size: 5 GB. Max duration: 10 hours (async). The SDK auto-detects format.
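A quick pre-flight check against the list above can catch obviously unsupported files before an upload round-trip. Illustrative only; the extension set mirrors this section, and the helper is not an SDK API (the service itself detects format from content, not extension):

```typescript
const SUPPORTED = new Set(['mp3', 'wav', 'flac', 'm4a', 'ogg', 'webm', 'mp4', 'aac']);

// True when the file extension matches one of the formats listed above.
function looksSupported(path: string): boolean {
  const ext = path.split('.').pop()?.toLowerCase() ?? '';
  return SUPPORTED.has(ext);
}
```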
Output
- Complete transcript with word-level timestamps and confidence scores
- Speaker-labeled utterances (with speaker_labels: true)
- Sentiment analysis, entity detection, key phrases, topic categories
- PII-redacted text and audio
- Content safety labels for moderation
Error Handling
| Error | Cause | Solution |
|-------|-------|----------|
| transcript.status === 'error' | Corrupted audio or unsupported format | Verify audio file plays locally |
| download_url must be accessible | Private/expired URL | Use a publicly accessible URL or upload locally |
| Could not process audio | File too short (<200ms) or silent | Ensure audio has speech content |
| word_boost has no effect | Misspelled terms or wrong model | Check spelling; word boost works with Best model tier |
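The first row of the table above can be handled uniformly with a small guard that surfaces the failure message. Illustrative only, assuming the status/error fields shown throughout this workflow:

```typescript
type MaybeFailed = { id: string; status: string; error?: string | null };

// Throw a descriptive error when a transcript came back in the 'error' state.
function assertCompleted(t: MaybeFailed): void {
  if (t.status === 'error') {
    throw new Error(`Transcription ${t.id} failed: ${t.error ?? 'unknown error'}`);
  }
}
```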
Next Steps
For real-time streaming transcription, see assemblyai-core-workflow-b.
For LLM-powered analysis of transcripts, see assemblyai-sdk-patterns (LeMUR examples).