multimodal-learning | Agent Skills

transformers

Loading and using pretrained models with Hugging Face Transformers. Use when working with pretrained models from the Hub, running inference with the Pipeline API, fine-tuning models with Trainer, or handling text, vision, audio, and multimodal tasks.

transformers, hugging-face, pretrained-models, multimodal-learning

itsmostafa

invoking-gemini

Invokes Google Gemini models for structured outputs, multi-modal tasks, and Google-specific features. Use when users request Gemini, structured JSON output, Google API integration, or cost-effective parallel processing.

google-gemini, api, large-language-models, structured-output

oaustegard

251

transformers

This skill should be used when working with pre-trained transformer models for natural language processing, computer vision, audio, or multimodal tasks. Use for text generation, classification, question answering, translation, summarization, image classification, object detection, speech recognition, and fine-tuning models on custom datasets.

transformers, machine-learning, natural-language-processing, computer-vision

K-Dense-AI

3,233 · 360
