pytorch-fsdp
Expert guidance for Fully Sharded Data Parallel training with PyTorch FSDP - parameter sharding, mixed precision, CPU offloading, FSDP2 (minimal usage sketch below)
pytorch, distributed-computing, mixed-precision, cpu-offloading
ovachiever
81
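A minimal sketch of the kind of setup this skill covers, assuming a single-node `torchrun` launch; the toy model, layer sizes, and learning rate are hypothetical, and the FSDP1 wrapper API is shown (FSDP2's `fully_shard` follows the same ideas):

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import MixedPrecision, CPUOffload

def main():
    # expects a launch like: torchrun --nproc_per_node=N train.py (single node assumed)
    dist.init_process_group("nccl")
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

    # toy model standing in for a real network (hypothetical sizes)
    model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024))

    # shard parameters/gradients/optimizer state across ranks,
    # run compute in bf16, and keep sharded params on CPU when not in use
    fsdp_model = FSDP(
        model,
        device_id=torch.cuda.current_device(),
        mixed_precision=MixedPrecision(
            param_dtype=torch.bfloat16,
            reduce_dtype=torch.bfloat16,
            buffer_dtype=torch.bfloat16,
        ),
        cpu_offload=CPUOffload(offload_params=True),
    )

    optim = torch.optim.AdamW(fsdp_model.parameters(), lr=1e-4)
    x = torch.randn(8, 1024, device="cuda")
    loss = fsdp_model(x).pow(2).mean()   # dummy objective for illustration
    loss.backward()
    optim.step()

if __name__ == "__main__":
    main()
```

CPU offloading trades step time for memory headroom; dropping the `cpu_offload` argument keeps sharded parameters on the GPU.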
deepspeed
Expert guidance for distributed training with DeepSpeed - ZeRO optimization stages, pipeline parallelism, FP16/BF16/FP8, 1-bit Adam, sparse attention (config sketch below)
distributed-computing, DeepSpeed, model-optimization, parallelism
ovachiever
81
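A rough sketch of a ZeRO stage 2 run with bf16, assuming the script is started with the `deepspeed` launcher; the toy model and config values are illustrative, not tuned:

```python
import torch
import torch.nn as nn
import deepspeed

# ZeRO stage 2 shards optimizer state and gradients; values below are placeholders
ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "gradient_accumulation_steps": 1,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
}

# toy model standing in for a real network
model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024))

# launched with `deepspeed train.py`; initialize() sets up the distributed engine
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

x = torch.randn(8, 1024, device=engine.device)
loss = engine(x).pow(2).mean()   # dummy objective for illustration
engine.backward(loss)            # engine handles scaling and gradient partitioning
engine.step()
```

Switching `"stage"` to 3 additionally shards the parameters themselves; FP16 is enabled via an `"fp16"` block instead of `"bf16"`.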
huggingface-accelerate
The simplest distributed training API: add distributed support to any PyTorch script in about four lines (sketch below). Unified API over DeepSpeed/FSDP/Megatron/DDP. Automatic device placement and mixed precision (FP16/BF16/FP8). Interactive config and a single launch command. The HuggingFace ecosystem standard.
pytorch, distributed-computing, deep-learning, huggingface
ovachiever
81
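A sketch of the advertised minimal change set, assuming the script is run via `accelerate launch train.py` after an interactive `accelerate config`; the model, data, and learning rate are placeholders:

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()          # change 1: create the Accelerator

# toy model and synthetic data standing in for a real training setup
model = torch.nn.Linear(1024, 1024)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
dataloader = torch.utils.data.DataLoader(torch.randn(64, 1024), batch_size=8)

# change 2: prepare() wraps the objects for the backend chosen in `accelerate config`
# (change 3 is implicit: no manual .to(device) calls anywhere)
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for batch in dataloader:
    optimizer.zero_grad()
    loss = model(batch).pow(2).mean()   # dummy objective for illustration
    accelerator.backward(loss)          # change 4: replaces loss.backward()
    optimizer.step()
```

The same script runs unchanged on CPU, a single GPU, multi-GPU DDP, FSDP, or DeepSpeed; only the `accelerate config` answers differ.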