Agent Skills: unsloth-gguf

Exporting fine-tuned models to GGUF format for deployment in llama.cpp, Ollama, and local serving tools. Triggers: gguf, quantization export, llama.cpp, ollama, save_pretrained_gguf, modelfile.

ID: cuba6112/skillfactory/unsloth-gguf

Install this agent skill to your local machine:

pnpm dlx add-skill https://github.com/cuba6112/skillfactory/tree/HEAD/skills/unsloth-gguf

Skill Files


skills/unsloth-gguf/SKILL.md


Overview

Unsloth provides a streamlined method to export fine-tuned models directly to GGUF format. It features "Dynamic 2.0" quantization, which protects sensitive weights to maintain high accuracy, and automates the merging of LoRA adapters.

When to Use

  • When deploying models to local serving platforms like Ollama, llama.cpp, or LM Studio.
  • When model size needs to be minimized for CPU-based inference or low-VRAM GPUs.
  • When sharing models with the community via GGUF format.

Decision Tree

  1. Is target VRAM very low?
    • Yes: Use quantization_method = 'q4_k_m' or a more aggressive quantization (see the helper sketched below).
    • No: Use q8_0 or f16 for maximum quality.
  2. Deploying to Ollama?
    • Yes: Export to GGUF and then create a Modelfile with a FROM command.
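
A minimal sketch of the VRAM branch of this tree as a Python helper. The function name and the GB thresholds are illustrative assumptions, not Unsloth guidance:

```python
def choose_quantization(vram_gb: float) -> str:
    """Map available VRAM to a GGUF quantization method (illustrative thresholds)."""
    if vram_gb < 8:       # very low VRAM: favor compression
        return "q4_k_m"
    if vram_gb < 24:      # mid-range: strong quality at moderate size
        return "q8_0"
    return "f16"          # ample VRAM: maximum quality, largest file

# choose_quantization(6.0)  -> "q4_k_m"
# choose_quantization(48.0) -> "f16"
```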

Workflows

Exporting Fine-tuned Models to GGUF

  1. After training, call model.save_pretrained_gguf("name", tokenizer, quantization_method="q4_k_m"), as sketched below.
  2. Choose the quantization method (e.g. q4_k_m, q8_0, f16) based on target VRAM and quality needs.
  3. On first use, Unsloth downloads and builds llama.cpp, then performs the conversion automatically; expect the initial export to take longer.
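
A minimal end-to-end sketch of this workflow. The checkpoint path, output name, and max_seq_length are placeholder assumptions; only save_pretrained_gguf and its arguments come from the Unsloth docs quoted under Evidence:

```python
from unsloth import FastLanguageModel

# Load the fine-tuned checkpoint (placeholder path). If you just finished
# training in the same process, reuse the in-memory model and tokenizer instead.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="outputs/checkpoint-final",  # hypothetical local checkpoint
    max_seq_length=2048,
    load_in_4bit=True,
)

# Export to GGUF. Unsloth merges any LoRA adapters into the base weights and,
# on first use, downloads and builds llama.cpp before converting.
model.save_pretrained_gguf(
    "my-model",                    # output directory / file prefix (placeholder)
    tokenizer,
    quantization_method="q4_k_m",  # or "q8_0" / "f16" for higher quality
)
```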

Deploying to Ollama

  1. Export the model to GGUF using the native Unsloth save function.
  2. Create a 'Modelfile' containing: FROM ./model-q4_k_m.gguf.
  3. Run ollama create my-model -f Modelfile to import it, then ollama run my-model to serve it (both steps are sketched below).
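
A Python sketch of steps 2 and 3 (scripts/unsloth-gguf_tool.js automates the Modelfile part); the GGUF filename and the model name my-model are placeholders:

```python
import subprocess
from pathlib import Path

gguf_path = "./model-q4_k_m.gguf"  # placeholder: use the file your export produced

# Step 2: write a minimal Modelfile pointing at the exported GGUF.
Path("Modelfile").write_text(f"FROM {gguf_path}\n")

# Step 3: register the model with Ollama, then start an interactive session.
subprocess.run(["ollama", "create", "my-model", "-f", "Modelfile"], check=True)
subprocess.run(["ollama", "run", "my-model"], check=True)
```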

Non-Obvious Insights

  • Unsloth 'Dynamic 2.0' GGUFs are superior to standard GGUFs because they dynamically identify and protect weights that are sensitive to quantization, leading to higher MMLU scores.
  • The GGUF export process handles the complex task of merging LoRA layers back into the base weights automatically, ensuring the resulting file is a standalone model.
  • Unsloth supports direct Hub uploading for GGUFs, removing the need for local storage in the export-to-share pipeline (sketched below).
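
A sketch of that direct-upload path, assuming Unsloth's push_to_hub_gguf companion to save_pretrained_gguf; the repo id and token are placeholders:

```python
# Quantize and push the GGUF straight to the Hugging Face Hub.
model.push_to_hub_gguf(
    "your-username/my-model-gguf",  # placeholder Hub repo id
    tokenizer,
    quantization_method="q4_k_m",
    token="hf_...",                 # your Hugging Face write token
)
```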

Evidence

  • "model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")" Source
  • "Unsloth Dynamic 4-bit Quantization! We dynamically opt not to quantize certain parameters and this greatly increases accuracy." Source

Scripts

  • scripts/unsloth-gguf_tool.py: Python helper for automated GGUF export.
  • scripts/unsloth-gguf_tool.js: Utility to generate Ollama Modelfiles.

Dependencies

  • unsloth
  • llama-cpp-python (or local llama.cpp binary)
  • huggingface_hub

References

  • [[references/README.md]]