whisper
OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification. Six model sizes from tiny (39M params) to large (1550M params). Use for speech-to-text, podcast transcription, or multilingual audio processing. Best for robust, multilingual ASR.
Azure AI Services
This skill should be used when the user asks about "Azure AI Search", "Cognitive Search", "AI Foundry", "Azure OpenAI", "speech to text", "text to speech", "Azure AI", "vector search", "semantic search", or mentions Azure AI and cognitive services. Provides best practices and MCP tool guidance for Azure AI services.
superwhisper-custom-mode
Guide for creating effective Custom Mode prompts and examples for Superwhisper, an AI dictation app. Use when users want to create, improve, or understand Superwhisper custom mode instructions for processing dictated speech with context-awareness, and when users want to generate examples.
transcribe-and-analyze
Transcribe audio and video from URLs (YouTube, direct media links) using WhisperKit locally. Optionally analyze transcripts with AI when explicitly requested. Use when users provide URLs to media content and request transcription or speech-to-text conversion.
ASR
Implement speech-to-text (ASR/automatic speech recognition) capabilities using the z-ai-web-dev-sdk. Use this skill when the user needs to transcribe audio files, convert speech to text, build voice input features, or process audio recordings. Supports base64 encoded audio files and returns accurate text transcriptions.
voice-transcription
Record and transcribe voice input when user wants to speak instead of type, describe complex issues verbally, provide audio input, or dictate text. Use this when user says "record my voice", "let me speak", "voice input", "transcribe audio", or when verbal description would be clearer than typing.
speech-to-text
Expert skill for implementing speech-to-text with Faster Whisper. Covers audio processing, transcription optimization, privacy protection, and secure handling of voice data for JARVIS voice assistant.
shownotes
Extract transcripts from podcasts and YouTube videos, then create shareable show notes and summaries. Use when the user wants to: (1) Get transcripts from Apple Podcasts or podcast audio files, (2) Extract transcripts from YouTube videos, (3) Create show notes or summaries from audio/video content, (4) Search for podcast episodes or YouTube videos to transcribe, or (5) Turn any audio or video content into structured notes.
youtube-transcript
Download YouTube video transcripts when user provides a YouTube URL or asks to download/get/fetch a transcript from YouTube. Also use when user wants to transcribe or get captions/subtitles from a YouTube video.
openai
OpenAI API via curl. Use this skill for GPT chat completions, DALL-E image generation, Whisper audio transcription, embeddings, and text-to-speech.
assemblyai-streaming
This skill should be used when working with AssemblyAI’s Speech-to-Text and LLM Gateway APIs, especially for streaming/live transcription, meeting notetakers, and voice agents that need low-latency transcripts and audio analysis.
transcribe
Speech-to-text transcription using Groq Whisper API. Supports m4a, mp3, wav, ogg, flac, webm.
transcript-fixer
Corrects speech-to-text transcription errors in meeting notes, lectures, and interviews using dictionary rules and AI. Learns patterns to build personalized correction databases. Use when working with transcripts containing ASR/STT errors, homophones, or Chinese/English mixed content requiring cleanup.
openai-whisper
Local speech-to-text with the Whisper CLI (no API key).