Agent Skills with tag: gpu-acceleration

24 skills match this tag. Use tags to discover related Agent Skills and explore similar workflows.

gpu-ml-trainer

Specialized skill for ML training workflows on cloud GPUs. Fine-tune LLMs with LoRA/QLoRA, train image LoRAs, build classifiers, and run custom training jobs. Generates production-ready training pipelines with checkpointing, logging, and optimal GPU selection.

ml-pipelines · machine-learning · gpu-acceleration · deep-learning
gpu-cli
0
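
A minimal sketch of the kind of LoRA fine-tuning setup the gpu-ml-trainer entry describes, assuming the Hugging Face transformers and peft packages; the model id and LoRA hyperparameters are illustrative, not the skill's actual defaults.

    # Hypothetical LoRA fine-tuning skeleton: wrap a small causal LM with LoRA adapters.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    base = "facebook/opt-350m"                    # placeholder model id
    model = AutoModelForCausalLM.from_pretrained(base)
    tokenizer = AutoTokenizer.from_pretrained(base)

    lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                      target_modules=["q_proj", "v_proj"])   # typical attention projections
    model = get_peft_model(model, lora)
    model.print_trainable_parameters()            # only the adapter weights are trainable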

gpu-workflow-creator

Transform natural language requests into complete GPU CLI workflows. The ultimate skill for Mac users who want to run NVIDIA GPU workloads without configuration complexity. Describe what you want, get a working project.

cuda · gpu-acceleration · command-line · macos
gpu-cli
0

gpu-debugger

Debug failed GPU CLI runs. Analyze error messages, diagnose OOM errors, fix sync issues, troubleshoot connectivity, and resolve common problems. Turn cryptic errors into actionable fixes.

cuda · gpu-acceleration · terminal · error-handling
gpu-cli
0
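
The skill's own commands are not shown on this page, but the OOM case it mentions usually reduces to a generic PyTorch pattern like the sketch below; the step function and tensor batch are placeholders.

    # Generic CUDA OOM backoff: halve the batch until the step fits in GPU memory.
    import torch

    def run_step(model, batch):                   # placeholder training/inference step
        return model(batch).sum()

    def run_with_backoff(model, batch):
        while True:
            try:
                return run_step(model, batch)
            except torch.cuda.OutOfMemoryError:   # PyTorch's CUDA out-of-memory error
                torch.cuda.empty_cache()          # release cached blocks before retrying
                if batch.shape[0] == 1:
                    raise                         # cannot shrink further; surface the error
                batch = batch[: batch.shape[0] // 2]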

gpu-inference-server

Set up AI inference servers on cloud GPUs. Create private LLM APIs (vLLM, TGI), image generation endpoints, embedding services, and more. All with OpenAI-compatible interfaces that work with existing tools.

gpu-acceleration · cloud-infrastructure · api · image-generation
gpu-cli
0
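
Because the endpoints are OpenAI-compatible, a deployed server (vLLM, TGI) can be queried with the standard openai client by pointing base_url at your host; the address and model name below are placeholders.

    # Query a self-hosted, OpenAI-compatible endpoint with the openai Python client.
    from openai import OpenAI

    client = OpenAI(base_url="http://YOUR-GPU-HOST:8000/v1",   # placeholder address
                    api_key="not-needed-for-private-servers")  # many servers ignore the key
    resp = client.chat.completions.create(
        model="served-model-name",                             # whatever model the server loaded
        messages=[{"role": "user", "content": "Hello from my Mac"}],
    )
    print(resp.choices[0].message.content)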

gpu-media-processor

Process audio, video, and media on cloud GPUs. Transcribe with Whisper, clone voices, generate videos, upscale images, and run batch media processing. All results sync back to your Mac.

gpu-acceleration · media-conversion · batch-processing · transcription
gpu-cli
0
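
A minimal sketch of the Whisper transcription step mentioned above, assuming the openai-whisper package; the model size and audio path are placeholders.

    # Transcribe an audio file with openai-whisper (runs on the GPU when available).
    import whisper

    model = whisper.load_model("medium")          # size is a speed/accuracy trade-off
    result = model.transcribe("interview.mp3")    # placeholder path to the uploaded audio
    print(result["text"])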

pytorch-lightning

Deep learning framework (PyTorch Lightning). Organize PyTorch code into LightningModules, configure Trainers for multi-GPU/TPU, implement data pipelines, callbacks, and logging (W&B, TensorBoard), and set up distributed training (DDP, FSDP, DeepSpeed) for scalable neural network training.

deep-learning · pytorch · distributed-training · gpu-acceleration
ovachiever
81
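
A rough illustration of the LightningModule/Trainer split described above; the tiny model and two-GPU DDP settings are placeholders, not a recommended configuration.

    # Minimal LightningModule plus a multi-GPU Trainer using DDP.
    import torch
    from torch import nn
    import lightning as L                          # the pytorch_lightning package also works

    class LitRegressor(L.LightningModule):
        def __init__(self):
            super().__init__()
            self.net = nn.Linear(32, 1)

        def training_step(self, batch, batch_idx):
            x, y = batch
            loss = nn.functional.mse_loss(self.net(x), y)
            self.log("train_loss", loss)           # forwarded to the configured logger
            return loss

        def configure_optimizers(self):
            return torch.optim.Adam(self.parameters(), lr=1e-3)

    trainer = L.Trainer(accelerator="gpu", devices=2, strategy="ddp", max_epochs=5)
    # trainer.fit(LitRegressor(), train_dataloaders=...)  # supply your DataLoader here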

get-available-resources

This skill should be used at the start of any computationally intensive scientific task to detect and report available system resources (CPU cores, GPUs, memory, disk space). It creates a JSON file with resource information and strategic recommendations that inform computational approach decisions such as whether to use parallel processing (joblib, multiprocessing), out-of-core computing (Dask, Zarr), GPU acceleration (PyTorch, JAX), or memory-efficient strategies. Use this skill before running analyses, training models, processing large datasets, or any task where resource constraints matter.

resource-constraints · parallel-processing · gpu-acceleration · memory-management
ovachiever
81
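
A sketch of the kind of resource probe this skill performs, assuming a POSIX system and an optional PyTorch install for GPU detection; the JSON field names are illustrative, not the skill's actual schema.

    # Probe CPU, memory, disk, and GPUs, then dump the findings to a JSON file.
    import json, os, shutil

    info = {
        "cpu_cores": os.cpu_count(),
        "ram_total_gb": round(os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1e9, 1),
        "disk_free_gb": round(shutil.disk_usage("/").free / 1e9, 1),
        "gpus": [],
    }
    try:
        import torch                               # optional; skipped if not installed
        for i in range(torch.cuda.device_count()):
            p = torch.cuda.get_device_properties(i)
            info["gpus"].append({"name": p.name, "mem_gb": round(p.total_memory / 1e9, 1)})
    except ImportError:
        pass

    with open("available_resources.json", "w") as f:
        json.dump(info, f, indent=2)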

openrlhf-training

High-performance RLHF framework with Ray+vLLM acceleration. Use for PPO, GRPO, RLOO, DPO training of large models (7B-70B+). Built on Ray, vLLM, ZeRO-3. 2× faster than DeepSpeedChat with distributed architecture and GPU resource sharing.

reinforcement-learning · distributed-training · gpu-acceleration · ray
ovachiever
81

faiss

Facebook's library for efficient similarity search and clustering of dense vectors. Supports billions of vectors, GPU acceleration, and various index types (Flat, IVF, HNSW). Use for fast k-NN search, large-scale vector retrieval, or when you need pure similarity search without metadata. Best for high-performance applications.

similarity-search · vector-retrieval · gpu-acceleration · knn
ovachiever
81
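
A minimal FAISS k-NN example matching the description above: an exact flat L2 index over random vectors (IVF/HNSW and GPU indexes follow the same add/search pattern).

    # Exact k-nearest-neighbour search with a brute-force L2 index.
    import faiss
    import numpy as np

    d = 128                                        # vector dimensionality
    xb = np.random.random((10_000, d)).astype("float32")   # database vectors
    xq = np.random.random((5, d)).astype("float32")        # query vectors

    index = faiss.IndexFlatL2(d)
    index.add(xb)
    distances, ids = index.search(xq, 4)           # 4 nearest neighbours per query
    print(ids)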

modal

Run Python code in the cloud with serverless containers, GPUs, and autoscaling. Use when deploying ML models, running batch processing jobs, scheduling compute-intensive tasks, or serving APIs that require GPU acceleration or dynamic scaling.

python · serverless · gpu-acceleration · autoscaling
ovachiever
81
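
A rough sketch of the serverless GPU pattern this entry refers to, based on Modal's App/function decorators; the GPU type, image, and function are illustrative, and the current Modal docs should be checked before relying on the exact API.

    # Define a GPU-backed serverless function with Modal and call it from a local entrypoint.
    import modal

    app = modal.App("gpu-demo")                    # older Modal releases used modal.Stub
    image = modal.Image.debian_slim().pip_install("torch")

    @app.function(gpu="A10G", image=image)
    def gpu_matmul(n: int) -> float:
        import torch
        x = torch.rand(n, n, device="cuda")
        return float((x @ x).sum())

    @app.local_entrypoint()
    def main():
        print(gpu_matmul.remote(2048))             # executes remotely in the cloud container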

training-llms-megatron

Trains large language models (2B-462B parameters) using NVIDIA Megatron-Core with advanced parallelism strategies. Use when training models >1B parameters, need maximum GPU efficiency (47% MFU on H100), or require tensor/pipeline/sequence/context/expert parallelism. Production-ready framework used for Nemotron, LLaMA, DeepSeek.

training-orchestration · large-language-models · parallelism · gpu-acceleration
ovachiever
81

graphics-rendering

gpu-acceleration · performance-optimization · rendering-pipeline · computer-graphics
pluginagentmarketplace
2

particle-systems

particle-systems · physics-simulation · gpu-acceleration · real-time-rendering
pluginagentmarketplace
2

shader-techniques

gpu-acceleration · performance-optimization · shader-programming
pluginagentmarketplace
2

webgl-expert

Expert guide for WebGL API development including 3D graphics, shaders (GLSL), rendering pipeline, textures, buffers, performance optimization, and canvas rendering. Use when working with WebGL, 3D graphics, canvas rendering, shaders, GPU programming, or when user mentions WebGL, OpenGL ES, GLSL, vertex shaders, fragment shaders, texture mapping, or 3D web graphics.

WebGL · GLSL · canvas · gpu-acceleration
ronnycoding
6

glsl

GLSL shader programming for JARVIS holographic effects

glsl · shader-programming · holographic-effects · gpu-acceleration
martinholovsky
92

webgl

WebGL shaders and effects for JARVIS 3D HUD

webgl · shaders · canvas · gpu-acceleration
martinholovsky
92

vllm-deployment

vllm · large-language-models · container-orchestration · docker
Stakpak <team@stakpak.dev>
3

Page 1 of 2 · 24 results