Agent Skills with tag: speech-to-text

14 skills match this tag. Use tags to discover related Agent Skills and explore similar workflows.

whisper

OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification. Six model sizes from tiny (39M params) to large (1550M params). Use for speech-to-text, podcast transcription, or multilingual audio processing. Best for robust, multilingual ASR.

speech-to-text, multilingual, automatic-speech-recognition, transcription
ovachiever
81

Azure AI Services

This skill should be used when the user asks about "Azure AI Search", "Cognitive Search", "AI Foundry", "Azure OpenAI", "speech to text", "text to speech", "Azure AI", "vector search", "semantic search", or mentions Azure AI and cognitive services. Provides best practices and MCP tool guidance for Azure AI services.

Azure, cognitive-services, AI, search
charris-msft
0

superwhisper-custom-mode

Guide for creating effective Custom Mode prompts and examples for Superwhisper, an AI dictation app. Use when users want to create, improve, or understand Superwhisper custom mode instructions for processing dictated speech with context-awareness, and when users want to generate examples.

prompt-engineering, example, speech-to-text, custom-mode
miguelarios
1

transcribe-and-analyze

Transcribe audio and video from URLs (YouTube, direct media links) using WhisperKit locally. Optionally analyze transcripts with AI when explicitly requested. Use when users provide URLs to media content and request transcription or speech-to-text conversion.

speech-to-text, audio-processing, video-processing, whisper
buddyh
1

ASR

Implement speech-to-text (ASR/automatic speech recognition) capabilities using the z-ai-web-dev-sdk. Use this skill when the user needs to transcribe audio files, convert speech to text, build voice input features, or process audio recordings. Supports base64 encoded audio files and returns accurate text transcriptions.

speech-to-text, asr, audio-transcription, audio-input
UholySmokes
1

voice-transcription

Record and transcribe voice input when the user wants to speak instead of type, describe complex issues verbally, provide audio input, or dictate text. Use this when the user says "record my voice", "let me speak", "voice input", or "transcribe audio", or when a verbal description would be clearer than typing.

voice-transcription, speech-to-text, audio-processing, dictation
aldervall
21

speech-to-text

Expert skill for implementing speech-to-text with Faster Whisper. Covers audio processing, transcription optimization, privacy protection, and secure handling of voice data for the JARVIS voice assistant.

natural-language-processing, speech-to-text, audio-processing, faster-whisper
martinholovsky
92

shownotes

Extract transcripts from podcasts and YouTube videos, then create shareable show notes and summaries. Use when the user wants to: (1) Get transcripts from Apple Podcasts or podcast audio files, (2) Extract transcripts from YouTube videos, (3) Create show notes or summaries from audio/video content, (4) Search for podcast episodes or YouTube videos to transcribe, or (5) Turn any audio or video content into structured notes.

speech-to-text, summarization, youtube, audio-extraction
forcequit
5

youtube-transcript

Download YouTube video transcripts when the user provides a YouTube URL or asks to download/get/fetch a transcript from YouTube. Also use when the user wants to transcribe or get captions/subtitles from a YouTube video.

youtube, rest-api, speech-to-text, subtitles
gupsammy
102

openai

OpenAI API via curl. Use this skill for GPT chat completions, DALL-E image generation, Whisper audio transcription, embeddings, and text-to-speech.

openai, curl, llm-integration, image-generation
vm0-ai
12

assemblyai-streaming

This skill should be used when working with AssemblyAI’s Speech-to-Text and LLM Gateway APIs, especially for streaming/live transcription, meeting notetakers, and voice agents that need low-latency transcripts and audio analysis.

api, transcription, speech-to-text, streaming
ratacat
123

transcribe

Speech-to-text transcription using Groq Whisper API. Supports m4a, mp3, wav, ogg, flac, webm.

transcription, api, speech-to-text, groq-whisper
badlogic
15611

transcript-fixer

Corrects speech-to-text transcription errors in meeting notes, lectures, and interviews using dictionary rules and AI. Learns patterns to build personalized correction databases. Use when working with transcripts containing ASR/STT errors, homophones, or Chinese/English mixed content requiring cleanup.

natural-language-processing, document-processing, speech-to-text, transcription-correction
daymade
15713

openai-whisper

Local speech-to-text with the Whisper CLI (no API key).

openai, cli, speech-to-text, tool-use
steipete
2,731 407