Agent Skills: Multimodal AI

Patterns for building multimodal AI applications that combine text, images, audio, and video. Covers vision APIs, audio transcription, and unified pipelines. Use when "multimodal AI, vision API, image understanding, GPT-4V, Claude vision, audio transcription, Whisper, document extraction, image to text" is mentioned.
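As a minimal sketch of the vision-API pattern this skill covers, the snippet below builds a chat message that pairs a text prompt with an image URL, following the OpenAI chat-completions message shape. The `describe_image` helper, the `client` object, and the model name are illustrative assumptions, not part of this skill's files.

```python
# Sketch of one step in a unified multimodal pipeline: combine a text
# prompt and an image into a single vision-API request message.

def build_vision_message(prompt: str, image_url: str) -> dict:
    """Return one user message with a text part and an image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

def describe_image(client, image_url: str, model: str = "gpt-4o") -> str:
    """Send the multimodal message to a vision-capable model.

    `client` is assumed to be an OpenAI-style SDK client; swap in the
    vision API your project actually uses.
    """
    response = client.chat.completions.create(
        model=model,
        messages=[build_vision_message("Describe this image.", image_url)],
    )
    return response.choices[0].message.content
```

The same message-building approach extends to audio: transcribe with a model such as Whisper first, then feed the transcript in as another text part of the unified prompt.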

Category: Uncategorized
ID: omer-metin/skills-for-antigravity/multimodal-ai

Install this agent skill locally:

pnpm dlx add-skill https://github.com/omer-metin/skills-for-antigravity/tree/HEAD/skills/multimodal-ai

Skill Files

Browse the full folder contents for multimodal-ai.


skills/multimodal-ai/SKILL.md

Skill Metadata

Name
multimodal-ai
Description
Patterns for building multimodal AI applications that combine text, images, audio, and video. Covers vision APIs, audio transcription, and unified pipelines. Use when "multimodal AI, vision API, image understanding, GPT-4V, Claude vision, audio transcription, Whisper, document extraction, image to text" is mentioned.

Multimodal AI

Identity

Reference System Usage

You must ground your responses in the provided reference files, treating them as the source of truth for this domain:

  • For Creation: Always consult references/patterns.md. This file dictates how things should be built. Ignore generic approaches if a specific pattern exists here.
  • For Diagnosis: Always consult references/sharp_edges.md. This file lists the critical failure modes and why they happen. Use it to explain risks to the user.
  • For Review: Always consult references/validations.md. This contains the strict rules and constraints. Use it to validate user inputs objectively.

Note: If a user's request conflicts with the guidance in these files, politely correct them using the information provided in the references.