deepspeed
Expert guidance for distributed training with DeepSpeed - ZeRO optimization stages, pipeline parallelism, FP16/BF16/FP8, 1-bit Adam, sparse attention
training-llms-megatron
Trains large language models (2B-462B parameters) using NVIDIA Megatron-Core with advanced parallelism strategies. Use when training models >1B parameters, need maximum GPU efficiency (47% MFU on H100), or require tensor/pipeline/sequence/context/expert parallelism. Production-ready framework used for Nemotron, LLaMA, DeepSeek.
dispatching-parallel-agents
Use when facing 3+ logically independent failures (different features, different root causes) that can be investigated concurrently - dispatches multiple agents to investigate in parallel; requires either parallel-safe test infrastructure OR sequential fix implementation
dispatching-parallel-agents
Dispatches one subagent per independent domain to parallelize investigation/fixes. Use when you have 2+ unrelated failures (e.g., separate failing test files, subsystems, bugs) with no shared state or ordering dependencies.
concurrency
>
multithreading
Multithreading and concurrency patterns for game servers including synchronization primitives
batch-operations
Use when performing 3+ similar operations like file edits, searches, or git operations. Triggers on repetitive tasks, "do the same for", loop-like patterns. Enables parallel tool calls and batch-editor delegation.
optimization-performance
|
synchronization-algorithms
|
sub-agent-delegation
Delegate complex tasks to sub-agents for parallel autonomous work. Use when GPU kernel optimization, numerical correctness verification, performance profiling, or long-running validation would benefit from focused independent execution.
dispatching-parallel-agents
Use when facing 3+ independent failures that can be investigated without shared state or dependencies - dispatches multiple Claude agents to investigate and fix independent problems concurrently
phase-task-verification
Shared branch creation and verification logic for sequential and parallel task execution - handles git operations, HEAD verification, and MODE-specific behavior
decomposing-tasks
Use when you have a complete feature spec and need to plan implementation - analyzes task dependencies, groups into sequential/parallel phases, validates task quality (no XL tasks, explicit files), and calculates parallelization time savings
finding-files
Performs fast file discovery with parallel search and smart defaults. Use this skill when searching for files by name, pattern, or type, especially when performance matters or when working with large directories
agency
Use Agency CLI to run parallel AI coding tasks in isolated Git worktrees. Invoke when user mentions "agency", "ag", parallel tasks, worktrees, or wants to run multiple coding agents simultaneously.
dispatching-parallel-agents
Use when facing 2+ independent tasks that can be worked on without shared state or sequential dependencies
gemini-tmux-orchestration
Use when delegating coding tasks to Gemini CLI agent, when you need parallel AI execution, or when tasks benefit from Gemini's 1M+ context window - orchestrates Gemini via tmux since headless mode cannot write files
nav-install-multi-claude
Install Navigator multi-Claude workflow orchestration scripts. Auto-invokes when user says "install multi-Claude workflows", "set up multi-Claude", or "enable parallel execution".
Page 1 of 2 · 22 results