
Agent Skills with tag: transformers

17 skills match this tag. Use tags to discover related Agent Skills and explore similar workflows.

nanogpt

Educational GPT implementation in ~300 lines. Reproduces GPT-2 (124M) on OpenWebText. Clean, hackable code for learning transformers. By Andrej Karpathy. Perfect for understanding GPT architecture from scratch. Train on Shakespeare (CPU) or OpenWebText (multi-GPU).

transformers · GPT-2 · deep-learning · model-training
ovachiever
81
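
For orientation, here is a minimal causal self-attention module of the kind nanoGPT builds its GPT-2 block around. This is a sketch in the same spirit, not Karpathy's actual code; the defaults (n_embd=768, n_head=12, block_size=1024) mirror GPT-2 small.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Minimal multi-head causal self-attention in the spirit of nanoGPT's compact design."""
    def __init__(self, n_embd: int = 768, n_head: int = 12, block_size: int = 1024):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.qkv = nn.Linear(n_embd, 3 * n_embd)   # fused query/key/value projection
        self.proj = nn.Linear(n_embd, n_embd)      # output projection
        mask = torch.tril(torch.ones(block_size, block_size))
        self.register_buffer("mask", mask.view(1, 1, block_size, block_size))

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # reshape to (B, n_head, T, head_dim)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))
        att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
        y = F.softmax(att, dim=-1) @ v
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y)
```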

rwkv-architecture

RNN+Transformer hybrid with O(n) inference. Linear time, infinite context, no KV cache. Train like GPT (parallel), infer like RNN (sequential). Linux Foundation AI project. Used in production in Windows, Office, and NeMo. RWKV-7 (March 2025). Models up to 14B parameters.

transformers · rnn · linear-time-inference · model-architecture
ovachiever
81
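
The "train parallel, infer recurrent" idea can be illustrated with a toy linear-attention recurrence. This is not RWKV's actual time-mix formula, only a sketch of why a recurrent formulation keeps O(1) state per step instead of a growing KV cache.

```python
import torch

# Toy recurrence: s_t = decay * s_{t-1} + outer(k_t, v_t), y_t = q_t @ s_t.
# NOT RWKV's time-mix formula, just the shape of the train-parallel / infer-recurrent trick.
def recurrent_inference(q, k, v, decay=0.95):
    T, d = q.shape
    s = torch.zeros(d, d)               # fixed-size state, independent of sequence length
    outputs = []
    for t in range(T):
        s = decay * s + torch.outer(k[t], v[t])
        outputs.append(q[t] @ s)
    return torch.stack(outputs)

def parallel_form(q, k, v, decay=0.95):
    # Same computation unrolled: y_t = sum_{i<=t} decay^(t-i) * (q_t . k_i) * v_i.
    T, _ = q.shape
    idx = torch.arange(T)
    weights = (q @ k.T) * (decay ** (idx[:, None] - idx[None, :]).float()).tril()
    return weights @ v

q, k, v = (torch.randn(8, 4) for _ in range(3))
assert torch.allclose(recurrent_inference(q, k, v), parallel_form(q, k, v), atol=1e-4)
```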

transformers

This skill should be used when working with pre-trained transformer models for natural language processing, computer vision, audio, or multimodal tasks. Use for text generation, classification, question answering, translation, summarization, image classification, object detection, speech recognition, and fine-tuning models on custom datasets.

transformers · pretrained-models · fine-tuning · natural-language-processing
ovachiever
81
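
A minimal use of the Pipeline API this skill targets; the checkpoint name is an illustrative example, not something the skill mandates.

```python
from transformers import pipeline

# Zero-shot use of a pretrained model via the Pipeline API.
classifier = pipeline("text-classification",
                      model="distilbert-base-uncased-finetuned-sst-2-english")
print(classifier("Agent Skills make transformer workflows easy to discover."))
# -> [{'label': 'POSITIVE', 'score': ...}]
```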

huggingface-tokenizers

Fast tokenizers optimized for research and production. Rust-based implementation tokenizes 1GB in <20 seconds. Supports BPE, WordPiece, and Unigram algorithms. Train custom vocabularies, track alignments, handle padding/truncation. Integrates seamlessly with transformers. Use when you need high-performance tokenization or custom tokenizer training.

huggingface · tokenization · nlp · rust
ovachiever
81
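
A short sketch of training a custom BPE vocabulary with the tokenizers library; the corpus path and special tokens are placeholders.

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.trainers import BpeTrainer
from tokenizers.pre_tokenizers import Whitespace

# Train a small BPE vocabulary from plain-text files (paths are placeholders).
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()
trainer = BpeTrainer(vocab_size=8000, special_tokens=["[UNK]", "[PAD]"])
tokenizer.train(files=["corpus.txt"], trainer=trainer)

enc = tokenizer.encode("Fast tokenization with offsets")
print(enc.tokens, enc.offsets)   # offsets give character-level alignment back to the input
```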

optimizing-attention-flash

Optimizes transformer attention with Flash Attention for 2-4x speedup and 10-20x memory reduction. Use when training/running transformers with long sequences (>512 tokens), encountering GPU memory issues with attention, or need faster inference. Supports PyTorch native SDPA, flash-attn library, H100 FP8, and sliding window attention.

transformers · flash-attention · pytorch · gpu-memory-optimization
ovachiever
81
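
A minimal sketch using PyTorch's native SDPA, which dispatches to a FlashAttention-style fused kernel when the device and dtypes allow it (the flash-attn library exposes its own separate API). The shapes and the CUDA device below are assumptions.

```python
import torch
import torch.nn.functional as F

# (batch, heads, seq_len, head_dim); fp16 on GPU is where the fused kernels pay off.
q = torch.randn(1, 8, 4096, 64, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

# Never materializes the full 4096x4096 attention matrix when a fused backend is selected.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```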

grpo-rl-training

Expert guidance on GRPO/RL fine-tuning with TRL for reasoning and task-specific model training

reinforcement-learning · fine-tuning · transformers · trl
ovachiever
81
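
A minimal sketch of a GRPO run with TRL's GRPOTrainer. The reward function, prompts, and model checkpoint are illustrative, and argument names follow recent TRL releases, so check them against the installed version.

```python
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Toy reward: prefer shorter completions. Real setups score correctness of reasoning traces.
def reward_len(completions, **kwargs):
    return [-float(len(c)) for c in completions]

dataset = Dataset.from_dict({"prompt": ["Explain attention in one sentence."] * 64})

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",   # any causal LM checkpoint; name is an example
    reward_funcs=reward_len,
    args=GRPOConfig(output_dir="grpo-demo"),
    train_dataset=dataset,
)
trainer.train()
```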

gptq

Post-training 4-bit quantization for LLMs with minimal accuracy loss. Use for deploying large models (70B, 405B) on consumer GPUs, when you need 4× memory reduction with <2% perplexity degradation, or for faster inference (3-4× speedup) vs FP16. Integrates with transformers and PEFT for QLoRA fine-tuning.

model-compression · quantization · transformers · inference-optimization
ovachiever
81
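
A sketch of post-training 4-bit quantization through transformers' GPTQConfig. It assumes an installed GPTQ backend (optimum with auto-gptq or gptqmodel), and the model id is deliberately small for illustration.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "facebook/opt-125m"   # small model for illustration; the same flow applies to 70B+
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit GPTQ quantization calibrated on a standard dataset.
quant_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", quantization_config=quant_config
)

model.save_pretrained("opt-125m-gptq")   # reload later without re-quantizing
```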

mamba-architecture

State-space model with O(n) complexity vs Transformers' O(n²). 5× faster inference, million-token sequences, no KV cache. Selective SSM with hardware-aware design. Mamba-1 (d_state=16) and Mamba-2 (d_state=128, multi-head). Models 130M-2.8B on HuggingFace.

state-space-model · transformers · model-inference · huggingface
ovachiever
81
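
A minimal sketch of loading one of the HuggingFace-packaged checkpoints; the 130M model is used only to keep the example light.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# The -hf checkpoints are packaged for transformers; sizes range from 130M to 2.8B.
tokenizer = AutoTokenizer.from_pretrained("state-spaces/mamba-130m-hf")
model = AutoModelForCausalLM.from_pretrained("state-spaces/mamba-130m-hf")

inputs = tokenizer("State-space models scale linearly because", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```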

long-context

Extend context windows of transformer models using RoPE, YaRN, ALiBi, and position interpolation techniques. Use when processing long documents (32k-128k+ tokens), extending pre-trained models beyond original context limits, or implementing efficient positional encodings. Covers rotary embeddings, attention biases, interpolation methods, and extrapolation strategies for LLMs.

transformers · long-context · positional-encoding · roformer
ovachiever
81
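
A from-scratch sketch of rotary embeddings with linear position interpolation. The rotation layout and the scale factor are illustrative; YaRN and NTK-aware variants scale frequencies non-uniformly rather than with a single factor.

```python
import torch

def rope(x, positions, base=10000.0):
    """Apply rotary position embeddings to x of shape (..., seq, dim), dim even."""
    dim = x.shape[-1]
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    angles = positions[:, None].float() * inv_freq[None, :]      # (seq, dim/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

seq_len, trained_ctx = 8192, 2048
positions = torch.arange(seq_len)

# Position interpolation: squeeze out-of-range positions back into the trained window
# instead of extrapolating to rotation angles the model never saw during pre-training.
scaled_positions = positions * (trained_ctx / seq_len)

q = torch.randn(seq_len, 64)
q_rot = rope(q, scaled_positions)
```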

llm-basics

LLM architecture, tokenization, transformers, and inference optimization. Use for understanding and working with language models.

llm · transformers · tokenization · inference-optimization
pluginagentmarketplace
1
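
A quick look at how a pretrained vocabulary splits text into subwords, one of the basics the skill covers; the GPT-2 tokenizer is an illustrative choice.

```python
from transformers import AutoTokenizer

# Inspect how a pretrained BPE vocabulary segments text.
tok = AutoTokenizer.from_pretrained("gpt2")
ids = tok("Tokenization breaks rare words into subword pieces.")["input_ids"]
print(tok.convert_ids_to_tokens(ids))
# e.g. ['Token', 'ization', 'Ġbreaks', ...]  (Ġ marks a leading space in GPT-2's vocabulary)
```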

deep-learning

PyTorch, TensorFlow, neural networks, CNNs, transformers, and deep learning for production

pytorch · tensorflow · neural-networks · cnn
pluginagentmarketplace
11
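
A compact PyTorch CNN sketch of the kind this skill covers; layer sizes are illustrative (CIFAR-style 32x32 RGB inputs).

```python
import torch
import torch.nn as nn

# A small CNN classifier for 32x32 RGB images.
model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # -> 32 x 16 x 16
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # -> 64 x 8 x 8
    nn.Flatten(),
    nn.Linear(64 * 8 * 8, 10),
)

x = torch.randn(16, 3, 32, 32)
print(model(x).shape)   # torch.Size([16, 10])
```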

nlp-basics

Process and analyze text using modern NLP techniques - preprocessing, embeddings, and transformers

preprocessing · embeddings · transformers · natural-language-processing
pluginagentmarketplace
11
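
A sketch of deriving sentence embeddings by mean-pooling a transformer's hidden states; the checkpoint name is an example.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

sentences = ["cheap flights to Paris", "low-cost airfare to France"]
batch = tok(sentences, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**batch).last_hidden_state        # (batch, seq, dim)

mask = batch["attention_mask"].unsqueeze(-1)
emb = (hidden * mask).sum(1) / mask.sum(1)           # mean over real (non-padding) tokens
print(torch.cosine_similarity(emb[0], emb[1], dim=0))  # high similarity expected
```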

ai-llm-development


transformers · deep-learning · ai-models · openai
phrazzld
21

transformers

Loading and using pretrained models with Hugging Face Transformers. Use when working with pretrained models from the Hub, running inference with Pipeline API, fine-tuning models with Trainer, or handling text, vision, audio, and multimodal tasks.

transformers · hugging-face · pretrained-models · multimodal-learning
itsmostafa
10
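
A minimal fine-tuning sketch with the Trainer API mentioned above; the dataset, checkpoint, and subsample sizes are illustrative, and recent transformers releases prefer processing_class over the tokenizer argument.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

dataset = load_dataset("imdb")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True)

tokenized = dataset.map(tokenize, batched=True)
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="imdb-distilbert", num_train_epochs=1),
    train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)),  # subsample to keep it quick
    eval_dataset=tokenized["test"].select(range(500)),
    tokenizer=tokenizer,
)
trainer.train()
```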

Neural Network Design

Design and architect neural networks with various architectures including CNNs, RNNs, Transformers, and attention mechanisms using PyTorch and TensorFlow

pytorch · tensorflow · neural-network-architectures · transformers
aj-geddes
301
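
A small encoder stack using PyTorch's built-in Transformer layers; dimensions are illustrative.

```python
import torch
import torch.nn as nn

# A compact Transformer encoder built from PyTorch's stock layers.
encoder_layer = nn.TransformerEncoderLayer(
    d_model=256, nhead=8, dim_feedforward=1024, batch_first=True
)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=4)

x = torch.randn(32, 128, 256)   # (batch, seq_len, d_model)
out = encoder(x)                # same shape; attention mixes information across positions
print(out.shape)                # torch.Size([32, 128, 256])
```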

Natural Language Processing

Build NLP applications using transformers library, BERT, GPT, text classification, named entity recognition, and sentiment analysis

natural-language-processing · transformers · text-classification · named-entity-recognition
aj-geddes
301
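
A named-entity-recognition sketch with the pipeline API; the checkpoint name is an example, not a recommendation from the skill.

```python
from transformers import pipeline

# Token classification with a BERT-style NER checkpoint, grouping subword pieces into entities.
ner = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")
print(ner("Hugging Face was founded in New York."))
# -> entities with labels like ORG and LOC, plus character spans and scores
```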

transformers

This skill should be used when working with pre-trained transformer models for natural language processing, computer vision, audio, or multimodal tasks. Use for text generation, classification, question answering, translation, summarization, image classification, object detection, speech recognition, and fine-tuning models on custom datasets.

transformers · machine-learning · natural-language-processing · computer-vision
K-Dense-AI
3,233 · 360
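
The same Pipeline API extends beyond text; here is a sketch covering an image and an audio task, with checkpoint names and file paths as placeholders.

```python
from transformers import pipeline

# Vision and audio through the same Pipeline API; checkpoints and file paths are placeholders.
vision = pipeline("image-classification", model="google/vit-base-patch16-224")
print(vision("photo.jpg"))        # top ImageNet classes with scores

asr = pipeline("automatic-speech-recognition", model="openai/whisper-tiny")
print(asr("sample.wav"))          # {'text': ...}
```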