whisper
OpenAI's general-purpose speech recognition model. Handles transcription, translation into English, and language identification across 99 languages. Six model sizes from tiny (39M params) to large (1550M params). Use for speech-to-text, podcast transcription, or multilingual audio processing where robustness matters.
sound-effects-generator
Generate audio tones, noise, DTMF signals, and simple sound effects programmatically. Export to WAV or MP3 format.
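As an illustration of the DTMF case, a minimal sketch using only the Python standard library might look like the following. The function names `dtmf_samples` and `write_wav` are hypothetical and are not part of the skill itself; the frequency pairs are the standard DTMF keypad assignments. The sketch covers WAV output only; MP3 export would need an external encoder.

```python
import io
import math
import struct
import wave

# Standard DTMF keypad frequencies in Hz: (row tone, column tone).
DTMF_FREQS = {
    "1": (697, 1209), "2": (697, 1336), "3": (697, 1477), "A": (697, 1633),
    "4": (770, 1209), "5": (770, 1336), "6": (770, 1477), "B": (770, 1633),
    "7": (852, 1209), "8": (852, 1336), "9": (852, 1477), "C": (852, 1633),
    "*": (941, 1209), "0": (941, 1336), "#": (941, 1477), "D": (941, 1633),
}


def dtmf_samples(key, duration=0.2, sample_rate=8000, amplitude=0.5):
    """Return 16-bit PCM samples for one DTMF key (hypothetical helper).

    A DTMF signal is the sum of one row tone and one column tone.
    """
    low, high = DTMF_FREQS[key]
    count = int(duration * sample_rate)
    samples = []
    for i in range(count):
        t = i / sample_rate
        # Average the two sine waves so the mix stays within [-1.0, 1.0].
        value = 0.5 * (math.sin(2 * math.pi * low * t)
                       + math.sin(2 * math.pi * high * t))
        samples.append(int(amplitude * 32767 * value))
    return samples


def write_wav(target, samples, sample_rate=8000):
    """Write mono 16-bit PCM samples to a path or a binary file-like object."""
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
```

Calling `write_wav("five.wav", dtmf_samples("5"))` would produce a 0.2-second tone for the "5" key; concatenating the sample lists for several keys, with short runs of zeros in between, gives a dialing sequence.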
audio-converter
Convert audio files between formats (MP3, WAV, FLAC, OGG, M4A) with bitrate and sample rate control. Batch processing supported.
audio-trimmer
Cut, trim, and edit audio segments with fade effects, speed control, concatenation, and basic audio manipulations.
podcast-splitter
Split audio files by detecting silence gaps. Auto-segment podcasts into chapters, remove long silences, and export individual clips.
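The silence-gap approach can be sketched roughly as follows. This is a simplified illustration, not the skill's actual implementation: `find_segments` is a hypothetical name, the input is assumed to be a list of float samples in [-1.0, 1.0], and the threshold is applied per sample, whereas practical tools usually measure energy over short windows.

```python
def find_segments(samples, sample_rate, threshold=0.02, min_silence=0.5):
    """Return (start, end) sample-index pairs for the non-silent segments.

    A run of samples with absolute value below `threshold` that lasts at
    least `min_silence` seconds is treated as a gap between segments.
    `end` is exclusive, so samples[start:end] is the segment.
    """
    min_gap = int(min_silence * sample_rate)
    segments = []
    seg_start = None  # index where the current segment began, if any
    quiet_run = 0     # consecutive quiet samples seen inside the segment

    for i, sample in enumerate(samples):
        if abs(sample) >= threshold:
            if seg_start is None:
                seg_start = i
            quiet_run = 0
        elif seg_start is not None:
            quiet_run += 1
            if quiet_run >= min_gap:
                # The gap is long enough: close the segment where it went quiet.
                segments.append((seg_start, i - quiet_run + 1))
                seg_start = None
                quiet_run = 0

    if seg_start is not None:
        # Close the final segment, dropping any trailing quiet samples.
        segments.append((seg_start, len(samples) - quiet_run))
    return segments
```

Each returned pair can then be used to slice the sample list and export one clip per segment. Raising `min_silence` yields fewer, longer chapters; lowering it splits on shorter pauses.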
audio-normalizer
Use when asked to normalize audio volume, match loudness, or apply peak/RMS normalization to audio files.
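To make the difference between peak and RMS normalization concrete, here is a minimal sketch in plain Python. The function names and the default targets are assumptions chosen for illustration, and samples are assumed to be floats in [-1.0, 1.0].

```python
import math


def peak_normalize(samples, target_peak=1.0):
    """Scale samples so the largest absolute value equals `target_peak`."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def rms(samples):
    """Root-mean-square level of the samples (0.0 for an empty list)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def rms_normalize(samples, target_rms=0.1):
    """Scale samples so their RMS level equals `target_rms`.

    Values pushed outside [-1.0, 1.0] by the gain are hard-clipped, which
    can lower the resulting RMS slightly below the target.
    """
    current = rms(samples)
    if current == 0.0:
        return list(samples)
    gain = target_rms / current
    return [max(-1.0, min(1.0, s * gain)) for s in samples]
```

Peak normalization guarantees no clipping but says little about perceived loudness, since one loud transient sets the gain. RMS normalization tracks average level more closely, at the cost of possible clipping on peaks.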
ASR
Implement speech-to-text (ASR, automatic speech recognition) using the z-ai-web-dev-sdk. Use this skill when the user needs to transcribe audio files, convert speech to text, build voice input features, or process audio recordings. Accepts base64-encoded audio files and returns the transcribed text.
audio-analysis
Audio analysis with Tone.js and the Web Audio API, including FFT, frequency data extraction, amplitude measurement, and waveform analysis. Use when extracting audio data for visualizations, beat detection, or other audio-reactive features.