Azure AI Transcription SDK for Python
Client library for Azure AI Transcription (speech-to-text) with real-time and batch transcription.
Installation
pip install azure-ai-transcription
Environment Variables
TRANSCRIPTION_ENDPOINT=https://<resource>.cognitiveservices.azure.com
TRANSCRIPTION_KEY=<your-key>
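A small helper can check that both variables above are set before any client is constructed, so a missing value fails with a readable message rather than a bare KeyError. This is a minimal sketch: the function name load_transcription_settings is hypothetical and not part of the SDK.

```python
import os


def load_transcription_settings(environ=None):
    """Return (endpoint, key) read from the environment.

    `environ` defaults to os.environ; passing a plain dict makes the
    helper easy to exercise in tests. This helper is illustrative only.
    """
    env = os.environ if environ is None else environ
    required = ("TRANSCRIPTION_ENDPOINT", "TRANSCRIPTION_KEY")

    # Collect every missing or empty variable so the error names them all.
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))

    return env["TRANSCRIPTION_ENDPOINT"], env["TRANSCRIPTION_KEY"]
```

Calling load_transcription_settings() once at startup keeps the later examples free of repeated os.environ lookups.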
Authentication
Use subscription key authentication (DefaultAzureCredential is not supported for this client):
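import os
from azure.core.credentials import AzureKeyCredential
from azure.ai.transcription import TranscriptionClient

# Wrap the subscription key in AzureKeyCredential rather than passing a raw string.
with TranscriptionClient(
    endpoint=os.environ["TRANSCRIPTION_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["TRANSCRIPTION_KEY"]),
) as client:
    # Listing existing transcriptions is a quick way to confirm the credentials work.
    transcriptions = list(client.list_transcriptions())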
Transcription (Batch)
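import os
from azure.core.credentials import AzureKeyCredential
from azure.ai.transcription import TranscriptionClient

with TranscriptionClient(
    endpoint=os.environ["TRANSCRIPTION_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["TRANSCRIPTION_KEY"]),
) as client:
    # Start a batch job for audio that is already reachable by URL.
    job = client.begin_transcription(
        name="meeting-transcription",
        locale="en-US",
        content_urls=["https://<storage>/audio.wav"],
        diarization_enabled=True,  # label speakers in multi-speaker audio
    )
    # Block until the job finishes, then inspect its final status.
    result = job.result()
    print(result.status)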
Transcription (Real-time)
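import os
from azure.core.credentials import AzureKeyCredential
from azure.ai.transcription import TranscriptionClient

with TranscriptionClient(
    endpoint=os.environ["TRANSCRIPTION_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["TRANSCRIPTION_KEY"]),
) as client:
    # Open a streaming session and feed it a local audio file.
    stream = client.begin_stream_transcription(locale="en-US")
    stream.send_audio_file("audio.wav")
    # Each event carries the text recognized so far.
    for event in stream:
        print(event.text)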
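When audio arrives faster than it can be sent, a bounded queue is one common way to apply backpressure to the producer. The sketch below is independent of the SDK: stream_with_backpressure and its send_chunk callback are hypothetical names, and send_chunk stands in for whatever chunk-level send method a streaming session might expose (the example above only shows send_audio_file).

```python
import queue
import threading


def stream_with_backpressure(chunks, send_chunk, max_pending=8):
    """Pass chunks to send_chunk through a bounded queue.

    The producer blocks in put() once max_pending chunks are waiting,
    so a slow sender throttles the reader instead of letting buffered
    audio grow without limit. Returns the number of chunks queued.
    """
    pending = queue.Queue(maxsize=max_pending)
    done = object()  # sentinel marking the end of the stream
    errors = []

    def consumer():
        while True:
            item = pending.get()
            if item is done:
                return
            if errors:
                continue  # keep draining so the producer never blocks forever
            try:
                send_chunk(item)
            except Exception as exc:  # remember the first failure, re-raised below
                errors.append(exc)

    worker = threading.Thread(target=consumer, daemon=True)
    worker.start()

    queued = 0
    for chunk in chunks:
        pending.put(chunk)  # blocks while the queue is full
        queued += 1
    pending.put(done)
    worker.join()

    if errors:
        raise errors[0]
    return queued
```

A smaller max_pending keeps latency and memory low at the cost of stalling the audio source more often; the right value depends on chunk size and network conditions.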
Best Practices
- Pick sync OR async and stay consistent. Do not mix azure.xxx sync clients with azure.xxx.aio async clients in the same call path. Choose one mode per module.
- Always use context managers for clients and async credentials. Wrap every client in with Client(...) as client: (sync) or async with Client(...) as client: (async). For async DefaultAzureCredential from azure.identity.aio, also use async with credential: so tokens and transports are cleaned up.
- Enable diarization when multiple speakers are present
- Use batch transcription for long files stored in blob storage
- Capture timestamps for subtitle generation
- Specify language to improve recognition accuracy
- Handle streaming backpressure for real-time transcription
- Close transcription sessions when complete
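To illustrate the point about timestamps and subtitles, the sketch below turns timed segments into SubRip (SRT) text. It assumes each segment is already available as a (start_seconds, end_seconds, text) tuple. That shape and both function names are hypothetical, and mapping the SDK's result objects into it is left to the caller.

```python
def format_srt_timestamp(seconds):
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render (start_seconds, end_seconds, text) tuples as SRT cues."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_srt_timestamp(start)} --> {format_srt_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(blocks)
```

For example, segments_to_srt([(0.0, 1.25, "Hello"), (1.25, 2.0, "World")]) yields two numbered cues that can be written directly to a .srt file.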