
Agent Skills with tag: fine-tuning

17 skills match this tag. Use tags to discover related Agent Skills and explore similar workflows.

fine-tuning-with-trl

Fine-tune LLMs using reinforcement learning with TRL - SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use when you need RLHF, want to align a model with preferences, or are training from human feedback. Works with HuggingFace Transformers.

fine-tuning, reinforcement-learning, rlhf, huggingface
ovachiever
81
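
A minimal sketch of the SFT path this skill describes, using TRL's SFTTrainer; the model checkpoint and dataset below are illustrative assumptions, not part of the skill:

```python
# Supervised fine-tuning with TRL's SFTTrainer (sketch).
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")  # example dataset (assumption)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",  # any causal LM checkpoint (assumption)
    train_dataset=dataset,
    args=SFTConfig(output_dir="sft-out", per_device_train_batch_size=2),
)
trainer.train()
```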

simpo-training

Simple Preference Optimization for LLM alignment: a reference-free alternative to DPO with better performance (+6.4 points on AlpacaEval 2.0). Because no reference model is needed, it is more efficient than DPO. Use for preference alignment when you want simpler, faster training than DPO/PPO.

preference-alignment, llm-training, optimization, fine-tuning
ovachiever
81
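
For reference, a short sketch of the SimPO objective this skill is built around - length-normalized log-probabilities with a reward margin and no reference model; the beta and gamma values below are typical paper-range settings, not prescribed by the skill:

```python
# SimPO loss (sketch): reference-free, length-normalized preference objective.
import torch
import torch.nn.functional as F

def simpo_loss(chosen_logps, rejected_logps, chosen_lens, rejected_lens,
               beta=2.0, gamma=0.5):
    # Length-normalized average log-probability acts as the implicit reward.
    chosen_reward = beta * chosen_logps / chosen_lens
    rejected_reward = beta * rejected_logps / rejected_lens
    # Margin gamma pushes the chosen response above the rejected one;
    # unlike DPO, no reference-model log-ratio appears anywhere.
    logits = chosen_reward - rejected_reward - gamma
    return -F.logsigmoid(logits).mean()
```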

transformers

This skill should be used when working with pre-trained transformer models for natural language processing, computer vision, audio, or multimodal tasks. Use for text generation, classification, question answering, translation, summarization, image classification, object detection, speech recognition, and fine-tuning models on custom datasets.

transformers, pretrained-models, fine-tuning, natural-language-processing
ovachiever
81
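
A quick sketch of the pipeline API for two of the tasks listed above; the summarization model named below is an example choice, not mandated by the skill:

```python
# transformers pipeline API (sketch) for classification and summarization.
from transformers import pipeline

classifier = pipeline("text-classification")  # downloads a default model
print(classifier("This skill catalog is genuinely useful."))

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")  # example model
print(summarizer("Long article text goes here...", max_length=60))
```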

unsloth

Expert guidance for fast fine-tuning with Unsloth - 2-5x faster training, 50-80% less memory, LoRA/QLoRA optimization

fine-tuning, performance-optimization, memory-efficiency, lora
ovachiever
81
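
A sketch of the Unsloth LoRA/QLoRA setup the skill covers, assuming its documented FastLanguageModel API; the checkpoint name and ranks are illustrative:

```python
# 4-bit LoRA setup with Unsloth (sketch).
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # pre-quantized checkpoint (assumption)
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,            # LoRA rank (example)
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```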

implementing-llms-litgpt

Implements and trains LLMs using Lightning AI's LitGPT with 20+ pretrained architectures (Llama, Gemma, Phi, Qwen, Mistral). Use when you need clean model implementations, want an educational understanding of the architectures, or are doing production fine-tuning with LoRA/QLoRA. Single-file implementations, no abstraction layers.

large-language-models, lightning-ai, lora, qlora
ovachiever
81

grpo-rl-training

Expert guidance for GRPO/RL fine-tuning with TRL for reasoning and task-specific model training

reinforcement-learning, fine-tuning, transformers, trl
ovachiever
81
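
A sketch of GRPO training with TRL as described above, using a toy length-based reward; the model, dataset, and reward function are placeholder assumptions:

```python
# GRPO fine-tuning with TRL (sketch) and a toy reward function.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def reward_len(completions, **kwargs):
    # Toy reward: prefer completions close to 100 characters.
    return [-abs(100 - len(c)) for c in completions]

dataset = load_dataset("trl-lib/tldr", split="train")  # example prompt dataset (assumption)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # example checkpoint (assumption)
    reward_funcs=reward_len,
    args=GRPOConfig(output_dir="grpo-out"),
    train_dataset=dataset,
)
trainer.train()
```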

llama-factory

Expert guidance for fine-tuning LLMs with LLaMA-Factory - no-code WebUI, 100+ models, 2/3/4/5/6/8-bit QLoRA, multimodal support

fine-tuning, llama, no-code, qlora
ovachiever
81

model-merging

Merge multiple fine-tuned models using mergekit to combine capabilities without retraining. Use when creating specialized models by blending domain-specific expertise (math + coding + chat), improving performance beyond single models, or experimenting rapidly with model variants. Covers SLERP, TIES-Merging, DARE, Task Arithmetic, linear merging, and production deployment strategies.

model-merging, fine-tuning, ensemble-learning, deployment-strategies
ovachiever
81
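
As a point of reference for one of the methods listed (Task Arithmetic), a plain-PyTorch sketch of merging state dicts by adding weighted task vectors onto a base model; mergekit automates this and the other methods, and the weights below are illustrative:

```python
# Task arithmetic merge (sketch): merged = base + sum_i(w_i * (finetuned_i - base)).
import torch

def task_arithmetic_merge(base_sd, finetuned_sds, weights):
    merged = {}
    for key, base_param in base_sd.items():
        # Task vector = fine-tuned weights minus base weights, scaled and summed.
        delta = sum(w * (sd[key] - base_param) for sd, w in zip(finetuned_sds, weights))
        merged[key] = base_param + delta
    return merged

# Example (hypothetical models): blend a math and a coding fine-tune equally.
# merged = task_arithmetic_merge(base.state_dict(),
#                                [math_model.state_dict(), code_model.state_dict()],
#                                [0.5, 0.5])
```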

axolotl

Expert guidance for fine-tuning LLMs with Axolotl - YAML configs, 100+ models, LoRA/QLoRA, DPO/KTO/ORPO/GRPO, multimodal support

fine-tuning, lora, multimodal, llm
ovachiever
81

slipstream-finetune

Fine-tune LLMs to speak Slipstream natively - a complete guide with GLM-4-9B

fine-tuning, llm, glm-4, language-model
anthony-maio
1

llm-integration

Integrate LLMs into applications - APIs, prompting, fine-tuning, and context management

llms, api, prompting, fine-tuning
pluginagentmarketplace
1
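
A hedged sketch of the API-integration path this skill mentions, using the OpenAI Python SDK as one example provider; the model name and prompts are placeholders, and the skill itself is not tied to any particular vendor:

```python
# Calling a hosted LLM from an application (sketch).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model (assumption)
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize why context management matters."},
    ],
)
print(response.choices[0].message.content)
```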

fine-tuning

LLM fine-tuning with LoRA, QLoRA, and instruction tuning for domain adaptation.

fine-tuning, lora, q-lora, instruction-tuning
pluginagentmarketplace
1
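
A minimal sketch of the LoRA setup this skill covers, using HuggingFace PEFT; the base model and ranks below are example assumptions:

```python
# Attaching LoRA adapters with PEFT for domain adaptation (sketch).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")  # example base model
config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of the full model
```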

fine-tuning

LLM fine-tuning and prompt-tuning techniques

fine-tuning, llm, prompt-tuning, model-training
pluginagentmarketplace
1
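
For the prompt-tuning side, a sketch assuming PEFT's PromptTuningConfig; the base model, init text, and virtual-token count are illustrative:

```python
# Prompt tuning with PEFT (sketch): learn a small set of virtual prompt tokens.
from transformers import AutoModelForCausalLM
from peft import PromptTuningConfig, PromptTuningInit, TaskType, get_peft_model

model = AutoModelForCausalLM.from_pretrained("bigscience/bloomz-560m")  # example base model
config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    prompt_tuning_init=PromptTuningInit.TEXT,
    prompt_tuning_init_text="Classify the sentiment of this review:",  # example init text
    num_virtual_tokens=8,
    tokenizer_name_or_path="bigscience/bloomz-560m",
)
model = get_peft_model(model, config)
```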

dspy-finetune-bootstrap

Fine-tune LLM weights using DSPy's BootstrapFinetune optimizer

dspy, fine-tuning, llm, optimizer
OmidZamani
131
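
A hedged sketch of what using BootstrapFinetune might look like; the program, metric, and trainset are placeholders, and constructor options vary across DSPy versions, so treat this as an outline rather than the skill's exact recipe:

```python
# DSPy BootstrapFinetune (sketch): distill a program's behavior into fine-tuned weights.
import dspy

lm = dspy.LM("openai/gpt-4o-mini")  # example student LM (assumption)
dspy.configure(lm=lm)

program = dspy.ChainOfThought("question -> answer")

def exact_match(example, prediction, trace=None):
    # Placeholder metric used to filter bootstrapped training traces.
    return example.answer == prediction.answer

trainset = [...]  # list of dspy.Example(question=..., answer=...).with_inputs("question")

optimizer = dspy.BootstrapFinetune(metric=exact_match)
finetuned_program = optimizer.compile(program, trainset=trainset)
```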

funsloth-hfjobs

Training manager for Hugging Face Jobs - launch fine-tuning on HF cloud GPUs with optional WandB monitoring

machine-learning, cloud-computing, hf-jobs, wandb
chrisvoncsefalvay
4

funsloth-train

Generate Unsloth training notebooks and scripts. Use when the user wants to create a training notebook, configure fine-tuning parameters, or set up SFT/DPO/GRPO training.

jupyter-notebook, large-language-models, model-training, fine-tuning
chrisvoncsefalvay
4

qlora

Memory-efficient fine-tuning with 4-bit quantization and LoRA adapters. Use when fine-tuning large models (7B+) on consumer GPUs, when VRAM is limited, or when standard LoRA still exceeds memory. Builds on the lora skill.

large-language-models, quantization, lora, fine-tuning
itsmostafa
10
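
A sketch of the 4-bit + LoRA combination this skill describes, via transformers' BitsAndBytesConfig and a PEFT adapter; the model name and hyperparameters are example assumptions:

```python
# QLoRA-style setup (sketch): NF4 4-bit base model plus a LoRA adapter.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NF4 quantization from the QLoRA paper
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",           # example 7B base model (assumption)
    quantization_config=bnb_config,
    device_map="auto",
)
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))
```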