Agent Skills: HuggingFace Transformers

Use when "HuggingFace Transformers", "pre-trained models", "pipeline API", or asking about "text generation", "text classification", "question answering", "NER", "fine-tuning transformers", "AutoModel", "Trainer API"

Category: Uncategorized
ID: eyadsibai/ltk/transformers

Install this agent skill locally:

pnpm dlx add-skill https://github.com/eyadsibai/ltk/tree/HEAD/plugins/ltk-data/skills/transformers

Skill Files

Browse the full folder contents for transformers.

plugins/ltk-data/skills/transformers/SKILL.md

Skill Metadata

Name: transformers
Description: Use when "HuggingFace Transformers", "pre-trained models", "pipeline API", or asking about "text generation", "text classification", "question answering", "NER", "fine-tuning transformers", "AutoModel", "Trainer API"

HuggingFace Transformers

Access thousands of pre-trained models for NLP, vision, audio, and multimodal tasks.

When to Use

  • Quick inference with pipelines
  • Text generation, classification, QA, NER
  • Image classification, object detection
  • Fine-tuning on custom datasets
  • Loading pre-trained models from HuggingFace Hub

Pipeline Tasks

NLP Tasks

| Task | Pipeline Name | Output |
|------|---------------|--------|
| Text Generation | text-generation | Completed text |
| Classification | text-classification | Label + confidence |
| Question Answering | question-answering | Answer span |
| Summarization | summarization | Shorter text |
| Translation | translation_en_to_fr | Translated text |
| NER | ner | Entity spans + types |
| Fill Mask | fill-mask | Predicted tokens |
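
A minimal sketch of pipeline inference for two of the NLP tasks above. The default task checkpoints and the gpt2 model name are only examples; the first call downloads weights from the Hub.

```python
from transformers import pipeline

# Text classification with the task's default checkpoint
classifier = pipeline("text-classification")
print(classifier("Transformers makes NLP easy."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]

# Text generation with an explicit (small) model
generator = pipeline("text-generation", model="gpt2")
print(generator("The future of AI is", max_new_tokens=30)[0]["generated_text"])
```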

Vision Tasks

| Task | Pipeline Name | Output |
|------|---------------|--------|
| Image Classification | image-classification | Label + confidence |
| Object Detection | object-detection | Bounding boxes |
| Image Segmentation | image-segmentation | Pixel masks |

Audio Tasks

| Task | Pipeline Name | Output |
|------|---------------|--------|
| Speech Recognition | automatic-speech-recognition | Transcribed text |
| Audio Classification | audio-classification | Label + confidence |
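
The same one-line pattern covers vision and audio. The file names below are placeholders, and the default checkpoints need the usual extras installed (e.g. Pillow for images, ffmpeg for audio decoding).

```python
from transformers import pipeline

# Image classification from a local path or URL (placeholder file name)
vision = pipeline("image-classification")
print(vision("photo.jpg"))  # [{'label': ..., 'score': ...}, ...]

# Speech-to-text on an audio file (placeholder file name)
asr = pipeline("automatic-speech-recognition")
print(asr("speech.wav"))  # {'text': '...'}
```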


Model Loading Patterns

Auto Classes

| Class | Use Case |
|-------|----------|
| AutoModel | Base model (embeddings) |
| AutoModelForCausalLM | Text generation (GPT-style) |
| AutoModelForSeq2SeqLM | Encoder-decoder (T5, BART) |
| AutoModelForSequenceClassification | Classification head |
| AutoModelForTokenClassification | NER, POS tagging |
| AutoModelForQuestionAnswering | Extractive QA |

Key concept: Always use Auto classes unless you need a specific architecture—they handle model detection automatically.
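
A short example of the Auto-class pattern with an explicit checkpoint. The model name is illustrative; any sequence-classification checkpoint on the Hub works the same way.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "distilbert-base-uncased-finetuned-sst-2-english"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

inputs = tokenizer("Great library!", return_tensors="pt")
outputs = model(**inputs)
predicted = outputs.logits.argmax(dim=-1).item()
print(model.config.id2label[predicted])  # e.g. "POSITIVE"
```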


Generation Parameters

| Parameter | Effect | Typical Values |
|-----------|--------|----------------|
| max_new_tokens | Output length | 50-500 |
| temperature | Randomness (0=deterministic) | 0.1-1.0 |
| top_p | Nucleus sampling threshold | 0.9-0.95 |
| top_k | Limit vocabulary per step | 50 |
| num_beams | Beam search (disable sampling) | 4-8 |
| repetition_penalty | Discourage repetition | 1.1-1.3 |

Key concept: Higher temperature = more creative but less coherent. For factual tasks, use low temperature (0.1-0.3).
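
A sketch of passing these parameters to generate(). gpt2 is used only because it is small; note that do_sample=True is required for temperature and top_p to take effect.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,          # enable sampling so temperature/top_p apply
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.2,
    pad_token_id=tokenizer.eos_token_id,  # gpt2 has no pad token by default
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```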


Memory Management

Device Placement Options

| Option | When to Use |
|--------|-------------|
| device_map="auto" | Let library decide GPU allocation |
| device_map="cuda:0" | Specific GPU |
| device_map="cpu" | CPU only |

Quantization Options

| Method | Memory Reduction | Quality Impact |
|--------|------------------|----------------|
| 8-bit | ~50% | Minimal |
| 4-bit | ~75% | Small for most tasks |
| GPTQ | ~75% | Requires calibration |
| AWQ | ~75% | Activation-aware |

Key concept: Use torch_dtype="auto" to automatically use the model's native precision (often bfloat16).
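
A sketch combining device placement with 4-bit quantization, assuming accelerate and bitsandbytes are installed and a CUDA GPU is available. The model ID is a placeholder; any causal LM checkpoint follows the same pattern.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint

# 4-bit quantization config (requires the bitsandbytes package)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",              # spread layers across available devices
    quantization_config=bnb_config,  # drop this to load at native precision
)
```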


Fine-Tuning Concepts

Trainer Arguments

| Argument | Purpose | Typical Value |
|----------|---------|---------------|
| num_train_epochs | Training passes | 3-5 |
| per_device_train_batch_size | Samples per GPU | 8-32 |
| learning_rate | Step size | 2e-5 for fine-tuning |
| weight_decay | Regularization | 0.01 |
| warmup_ratio | LR warmup | 0.1 |
| evaluation_strategy | When to eval | "epoch" or "steps" |
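
A compact fine-tuning sketch wiring these arguments into Trainer. IMDB and distilbert-base-uncased are stand-ins for your own data and checkpoint; newer transformers releases rename evaluation_strategy to eval_strategy.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "distilbert-base-uncased"  # small checkpoint for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Tokenize an example dataset (IMDB is only a stand-in for your data)
dataset = load_dataset("imdb")
def tokenize(batch):
    return tokenizer(batch["text"], padding="max_length", truncation=True)
dataset = dataset.map(tokenize, batched=True)

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    weight_decay=0.01,
    warmup_ratio=0.1,
    evaluation_strategy="epoch",  # eval_strategy in newer releases
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"].shuffle(seed=42).select(range(2000)),
    eval_dataset=dataset["test"].select(range(500)),
)
trainer.train()
```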

Fine-Tuning Strategies

| Strategy | Memory | Quality | Use Case |
|----------|--------|---------|----------|
| Full fine-tuning | High | Best | Small models, enough data |
| LoRA | Low | Good | Large models, limited GPU |
| QLoRA | Very Low | Good | 7B+ models on consumer GPU |
| Prefix tuning | Low | Moderate | When you can't modify weights |
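
For LoRA and QLoRA the usual route is the separate peft library. A minimal sketch, assuming peft is installed; the target_modules value shown is specific to GPT-2 and differs per architecture.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # small model for illustration

lora_config = LoraConfig(
    r=8,                        # low-rank dimension
    lora_alpha=16,              # scaling factor
    target_modules=["c_attn"],  # GPT-2 attention projection; varies by model
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights train
```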


Tokenization Concepts

| Parameter | Purpose |
|-----------|---------|
| padding | Make sequences same length |
| truncation | Cut sequences to max_length |
| max_length | Maximum tokens (model-specific) |
| return_tensors | Output format ("pt", "tf", "np") |

Key concept: Always use the tokenizer that matches the model—different models use different vocabularies.
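
A small example showing these parameters together; bert-base-uncased is only an example checkpoint.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

batch = tokenizer(
    ["Short sentence.", "A slightly longer sentence that needs more tokens."],
    padding=True,          # pad to the longest sequence in the batch
    truncation=True,       # cut anything beyond max_length
    max_length=32,
    return_tensors="pt",   # PyTorch tensors ("tf" / "np" also supported)
)
print(batch["input_ids"].shape)           # (2, sequence_length)
print(tokenizer.decode(batch["input_ids"][0]))
```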


Best Practices

| Practice | Why |
|----------|-----|
| Use pipelines for inference | Handles preprocessing automatically |
| Use device_map="auto" | Optimal GPU memory distribution |
| Batch inputs | Better throughput |
| Use quantization for large models | Run 7B+ on consumer GPUs |
| Match tokenizer to model | Vocabularies differ between models |
| Use Trainer for fine-tuning | Built-in best practices |
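
A sketch of batched pipeline inference for the "Batch inputs" practice above; the batch_size keyword on the call is an assumption to verify against your transformers version.

```python
from transformers import pipeline

classifier = pipeline("text-classification")

texts = ["I love this!", "Not my favorite.", "Absolutely fantastic."]
# Passing a list lets the pipeline batch inputs internally for better throughput
results = classifier(texts, batch_size=8)
for text, result in zip(texts, results):
    print(f"{text!r} -> {result['label']} ({result['score']:.3f})")
```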

Resources