nanogpt
Educational GPT implementation in ~300 lines by Andrej Karpathy that reproduces GPT-2 (124M) on OpenWebText. Clean, hackable code for understanding the GPT architecture from scratch; train on Shakespeare (CPU) or OpenWebText (multi-GPU).
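For orientation, a minimal sketch of the kind of pre-norm decoder block a GPT-2-style model stacks: causal self-attention followed by an MLP, each wrapped in a residual connection. This is illustrative PyTorch with placeholder sizes, not nanogpt's actual code.

```python
# Illustrative GPT-style decoder block (not nanogpt's exact implementation).
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, d_model=384, n_head=6):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_head, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x):
        T = x.size(1)
        # Boolean causal mask: True entries block attention to future positions.
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=causal)
        x = x + attn_out                  # residual around attention
        x = x + self.mlp(self.ln2(x))     # residual around MLP
        return x

x = torch.randn(2, 16, 384)               # (batch, sequence, embedding)
print(Block()(x).shape)                    # torch.Size([2, 16, 384])
```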
pytorch-fsdp
Expert guidance for Fully Sharded Data Parallel training with PyTorch FSDP - parameter sharding, mixed precision, CPU offloading, FSDP2
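A minimal wrapping sketch, assuming the class-based FSDP (FSDP1) API and a torchrun launch; the model below is a placeholder to swap for your own.

```python
# Minimal FSDP sketch; launch with: torchrun --nproc_per_node=N train.py
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import MixedPrecision, CPUOffload

dist.init_process_group("nccl")
local_rank = int(os.environ.get("LOCAL_RANK", 0))   # set by torchrun
torch.cuda.set_device(local_rank)

model = torch.nn.Sequential(                         # placeholder model
    torch.nn.Linear(1024, 4096), torch.nn.GELU(), torch.nn.Linear(4096, 1024)
).cuda()

model = FSDP(
    model,
    mixed_precision=MixedPrecision(param_dtype=torch.bfloat16,
                                   reduce_dtype=torch.bfloat16,
                                   buffer_dtype=torch.bfloat16),
    cpu_offload=CPUOffload(offload_params=True),     # optional: park sharded params on CPU
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # create AFTER wrapping
```

Recent PyTorch releases also expose FSDP2 through the fully_shard API, which replaces this wrapper class while keeping the same sharding, mixed-precision, and offload concepts.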
grpo-rl-training
Expert guidance for GRPO/RL fine-tuning with TRL for reasoning and task-specific model training
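A minimal GRPO sketch, assuming a recent TRL release that ships GRPOTrainer/GRPOConfig; the model id, two-prompt dataset, and length-based reward below are placeholder assumptions (real runs use verifiable task rewards such as exact-match answers or unit tests).

```python
# Minimal GRPO sketch with TRL; dataset, reward, and model id are placeholders.
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Prompt-only dataset; GRPO samples several completions per prompt and
# normalizes rewards within each group.
train_dataset = Dataset.from_dict({"prompt": ["What is 2 + 2?", "Name a prime number."]})

def reward_len(completions, **kwargs):
    # Placeholder reward: prefer shorter completions.
    return [-float(len(c)) for c in completions]

args = GRPOConfig(output_dir="grpo-demo", num_generations=4, max_completion_length=64)
trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",   # assumed small instruct model; swap for your own
    reward_funcs=reward_len,
    args=args,
    train_dataset=train_dataset,
)
trainer.train()
```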
moe-training
Train Mixture of Experts (MoE) models using DeepSpeed or HuggingFace. Use when training large-scale models with limited compute (5× cost reduction vs dense models), implementing sparse architectures like Mixtral 8x7B or DeepSeek-V3, or scaling model capacity without proportional compute increase. Covers MoE architectures, routing mechanisms, load balancing, expert parallelism, and inference optimization.
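The core mechanism is a learned router that sends each token to a small subset of expert MLPs. Below is an illustrative top-2 routing layer with a simplified load-balancing penalty; it is a teaching sketch with assumed dimensions, not the DeepSpeed or HuggingFace implementation.

```python
# Illustrative top-2 MoE layer with a simplified load-balancing auxiliary loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                                   # x: (tokens, d_model)
        probs = F.softmax(self.router(x), dim=-1)           # routing distribution
        topk_p, topk_i = probs.topk(self.top_k, dim=-1)     # top-k experts per token
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = topk_i[:, k] == e                    # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += topk_p[mask, k, None] * expert(x[mask])
        # Simplified stand-in for a Switch-style load-balancing loss:
        # minimized when expert usage is uniform.
        aux = probs.shape[-1] * (probs.mean(0) ** 2).sum()
        return out, aux

x = torch.randn(32, 512)
y, aux_loss = MoE()(x)
print(y.shape, aux_loss.item())
```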
supervised-learning
Build production-ready classification and regression models with hyperparameter tuning
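A minimal sketch of the pattern, assuming scikit-learn: cross-validated grid search over an assumed hyperparameter space for a random-forest classifier on a built-in toy dataset.

```python
# Classification with hyperparameter tuning via cross-validated grid search.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

param_grid = {"n_estimators": [100, 300], "max_depth": [None, 8, 16]}  # assumed search space
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5, n_jobs=-1)
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("test accuracy:", search.best_estimator_.score(X_test, y_test))
```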
fine-tuning
LLM fine-tuning and prompt-tuning techniques
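As one concrete example, a parameter-efficient sketch that attaches LoRA adapters with HuggingFace PEFT; the base model and LoRA settings are assumptions, and PEFT offers analogous configs for prompt-tuning (e.g. PromptTuningConfig).

```python
# Parameter-efficient fine-tuning sketch: attach LoRA adapters with PEFT.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")   # assumed small base model
lora = LoraConfig(task_type=TaskType.CAUSAL_LM, r=8, lora_alpha=16, lora_dropout=0.05)
model = get_peft_model(model, lora)
model.print_trainable_parameters()   # only the low-rank adapter weights are trainable
# From here, train as usual (e.g. with transformers.Trainer or trl.SFTTrainer).
```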
training-pipelines
Master training pipelines - orchestration, distributed training, hyperparameter tuning
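A minimal distributed-training sketch, assuming a torchrun launch and PyTorch DDP; the toy dataset and model are placeholders.

```python
# Minimal DDP training loop; launch with: torchrun --nproc_per_node=N train.py
import os
import torch
import torch.distributed as dist
import torch.nn.functional as F
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

dist.init_process_group("nccl")
local_rank = int(os.environ.get("LOCAL_RANK", 0))    # set by torchrun
torch.cuda.set_device(local_rank)

# Placeholder data and model; substitute your own dataset/architecture.
ds = TensorDataset(torch.randn(1024, 32), torch.randn(1024, 1))
sampler = DistributedSampler(ds)                     # shards the data across ranks
loader = DataLoader(ds, batch_size=64, sampler=sampler)

model = DDP(torch.nn.Linear(32, 1).cuda(), device_ids=[local_rank])
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

for epoch in range(3):
    sampler.set_epoch(epoch)                         # reshuffle shards each epoch
    for xb, yb in loader:
        loss = F.mse_loss(model(xb.cuda()), yb.cuda())
        opt.zero_grad(); loss.backward(); opt.step() # gradients are all-reduced by DDP
    if dist.get_rank() == 0:
        print(f"epoch {epoch}: loss {loss.item():.4f}")
dist.destroy_process_group()
```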
ml-pipeline-workflow
Build end-to-end MLOps pipelines from data preparation through model training, validation, and production deployment. Use when creating ML pipelines, implementing MLOps practices, or automating model training and deployment workflows.
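A minimal sketch of the stage structure such a pipeline automates: prepare, train, validate against a quality gate, and only then "deploy". The stage functions, accuracy threshold, and artifact path are illustrative assumptions; a real pipeline would hand these stages to an orchestrator and a model registry.

```python
# Minimal end-to-end pipeline sketch: prepare -> train -> validate -> (conditionally) deploy.
import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def prepare():
    X, y = load_iris(return_X_y=True)
    return train_test_split(X, y, test_size=0.2, random_state=0)

def train(X_train, y_train):
    return LogisticRegression(max_iter=1000).fit(X_train, y_train)

def validate(model, X_test, y_test, threshold=0.9):
    acc = model.score(X_test, y_test)
    print(f"validation accuracy: {acc:.3f}")
    return acc >= threshold             # quality gate before promotion

def deploy(model, path="model.joblib"):
    joblib.dump(model, path)            # stand-in for pushing to a registry/serving endpoint
    print(f"deployed model artifact to {path}")

X_train, X_test, y_train, y_test = prepare()
model = train(X_train, y_train)
if validate(model, X_test, y_test):
    deploy(model)
```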
dspy-finetune-bootstrap
Fine-tune LLM weights using DSPy's BootstrapFinetune optimizer
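A minimal sketch, assuming a recent DSPy release that exposes dspy.BootstrapFinetune; the program, tiny trainset, metric, and model id are placeholders, and the exact constructor/compile arguments (and any required settings) vary across DSPy versions.

```python
# Sketch of bootstrapped fine-tuning with DSPy; all names below are placeholders.
import dspy

# LM used to run the program and bootstrap traces; it must point at a model/provider
# that supports fine-tuning for the weight-update step to succeed.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

program = dspy.ChainOfThought("question -> answer")

# Tiny placeholder trainset; real runs need substantially more examples.
trainset = [
    dspy.Example(question="What is 2 + 2?", answer="4").with_inputs("question"),
    dspy.Example(question="What color is the sky?", answer="blue").with_inputs("question"),
]

def exact_match(example, pred, trace=None):
    # Placeholder metric used to filter bootstrapped traces before fine-tuning.
    return example.answer.strip().lower() in pred.answer.strip().lower()

optimizer = dspy.BootstrapFinetune(metric=exact_match)
finetuned_program = optimizer.compile(program, trainset=trainset)
print(finetuned_program(question="What is 3 + 3?"))
```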
unsloth-train
Generate Unsloth training notebooks and scripts. Use when the user wants to create a training notebook, configure fine-tuning parameters, or set up SFT/DPO/GRPO training.
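A minimal SFT sketch in the style such a notebook would generate, assuming Unsloth's FastLanguageModel together with TRL's SFTTrainer; the checkpoint, LoRA settings, toy dataset, and exact keyword placement (which shifts across TRL versions) are all assumptions.

```python
# Unsloth SFT sketch; model, LoRA config, and dataset are placeholders.
from unsloth import FastLanguageModel
from datasets import Dataset
from transformers import TrainingArguments
from trl import SFTTrainer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",   # assumed 4-bit base checkpoint
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model, r=16, lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Toy dataset with a single "text" column; real runs format instruction data into this shape.
dataset = Dataset.from_dict({"text": [
    "### Instruction:\nSay hello.\n\n### Response:\nHello!",
    "### Instruction:\nName a color.\n\n### Response:\nBlue.",
]})

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,            # newer TRL versions use `processing_class` instead
    train_dataset=dataset,
    dataset_text_field="text",      # newer TRL versions move this into SFTConfig
    max_seq_length=2048,
    args=TrainingArguments(output_dir="outputs", per_device_train_batch_size=2, max_steps=30),
)
trainer.train()
```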