Agent Skills with tag: speech-to-text

7 skills match this tag. Use tags to discover related Agent Skills and explore similar workflows.

Azure AI Services

This skill should be used when the user asks about "Azure AI Search", "Cognitive Search", "AI Foundry", "Azure OpenAI", "speech to text", "text to speech", "Azure AI", "vector search", "semantic search", or mentions Azure AI and cognitive services. Provides best practices and MCP tool guidance for Azure AI services.

Azure, cognitive-services, AI, search
charris-msft
0

superwhisper-custom-mode

Guide for creating effective Custom Mode prompts and examples for Superwhisper, an AI dictation app. Use when users want to create, improve, or understand Superwhisper custom mode instructions for processing dictated speech with context-awareness, and when users want to generate examples.

prompt-engineering, example, speech-to-text, custom-mode
miguelarios
1

transcribe-and-analyze

Transcribe audio and video from URLs (YouTube, direct media links) using WhisperKit locally. Optionally analyze transcripts with AI when explicitly requested. Use when users provide URLs to media content and request transcription or speech-to-text conversion.

speech-to-text, audio-processing, video-processing, whisper
buddyh
1

ASR

Implement speech-to-text (ASR, automatic speech recognition) capabilities using the z-ai-web-dev-sdk. Use this skill when the user needs to transcribe audio files, convert speech to text, build voice input features, or process audio recordings. Accepts base64-encoded audio files and returns text transcriptions.

speech-to-text, asr, audio-transcription, audio-input
UholySmokes
1

youtube-transcript

Download YouTube video transcripts when the user provides a YouTube URL or asks to download, get, or fetch a transcript from YouTube. Also use when the user wants to transcribe a YouTube video or get its captions or subtitles.

youtube, rest-api, speech-to-text, subtitles
gupsammy
0

openai

OpenAI API via curl. Use this skill for GPT chat completions, DALL-E image generation, Whisper audio transcription, embeddings, and text-to-speech.

openai, curl, llm-integration, image-generation
vm0-ai
0

openai-whisper

Local speech-to-text with the Whisper CLI (no API key).

openai, cli, speech-to-text, tool-use
steipete
0