nccl-communication
NVIDIA Collective Communications Library integration for multi-GPU operations. Initialize NCCL communicators, execute collective operations, configure communication topologies, profile collective performance, and support RCCL for AMD compatibility.
nsight-profiler
Expert skill for NVIDIA Nsight Systems and Nsight Compute profiling tools. Configure profiling sessions, analyze kernel reports, and interpret occupancy metrics, roofline model data, memory bandwidth bottlenecks, and warp execution efficiency.
nvenc-nvdec
NVIDIA hardware video encoding/decoding integration. Configure NVENC encoding parameters, set up NVDEC decoding pipelines, handle codec configurations, integrate with CUDA for pre/post processing, and manage video memory surfaces.
opencl-runtime
Cross-vendor OpenCL runtime management and kernel development. Query platforms/devices, generate portable OpenCL C kernel code, handle vendor-specific extensions, manage contexts and command queues, compile and cache programs.
parallel-patterns
GPU parallel algorithm design patterns and implementations. Implement parallel reduction, scan/prefix sum, histogram, parallel sort algorithms, stream compaction, and work-efficient patterns optimized for specific GPU architectures.
stencil-convolution
Expert skill for optimized stencil and convolution pattern implementations on GPU. Design tiled stencil algorithms with halos, implement 2D/3D convolution kernels, optimize boundary condition handling, apply temporal blocking techniques, generate separable filter implementations, and profile stencil memory bandwidth.
tensorrt-optimization
NVIDIA TensorRT model optimization and deployment. Convert models to TensorRT engines, configure optimization profiles and precision modes, apply INT8 calibration, analyze kernel fusion, generate custom plugins, and profile inference performance.
unified-memory
Expert skill for CUDA Unified Memory and memory prefetching optimization. Configure managed memory allocations, implement memory prefetch strategies, analyze page faults, configure memory advice hints, profile unified memory migration, optimize for oversubscription scenarios, and compare managed vs explicit memory.
vulkan-compute
Vulkan compute shader development and pipeline configuration. Generate GLSL/HLSL compute shaders, compile to SPIR-V, configure compute pipelines, manage descriptor sets and resource bindings, implement memory barriers and synchronization.
warp-primitives
Warp-level programming and SIMD optimization. Use warp shuffle instructions, voting functions, cooperative groups, and warp-synchronous algorithms, and minimize warp divergence for optimal GPU performance.
agent-generator
Generate AGENT.md files with proper YAML frontmatter, role definitions, expertise areas, and prompt templates following Babysitter SDK conventions.
process-analyzer
Analyze processes, identify workflows, define boundaries and scope, and map process requirements for specialization creation.
process-generator
Generate process JS files following Babysitter SDK patterns including task definitions, quality gates, breakpoints, and proper io configuration.
process-integrator
Integrate skills and agents into process files by updating task definitions with appropriate skill.name and agent.name references.
process-validator
Validate process JS files for correct SDK patterns, task definitions, syntax, and quality gate implementation.
skill-generator
Generate SKILL.md files with proper YAML frontmatter, capabilities documentation, and usage examples following Babysitter SDK conventions.
specialization-researcher
Research specialization domains, compile references, analyze best practices, and gather comprehensive knowledge for new specialization creation.
specialization-validator
Validate specialization completeness across all 7 phases, score each phase, identify gaps, and generate validation reports.
packet-capture
Expert skill for packet capture and analysis using libpcap/Wireshark. Execute tcpdump/tshark commands, write BPF filter expressions, analyze pcap files, decode protocol layers, calculate statistics, and generate Wireshark dissectors.
protocol-parser
Specialized skill for binary and text protocol parsing and serialization. Design and validate protocol message formats, generate parser code from specifications, implement state machine parsing, and handle endianness and byte alignment.
socket-programming
Deep integration with socket APIs for TCP/UDP programming across platforms. Execute socket operations, analyze socket options and buffer configurations, debug connection states, and generate optimized socket code for different I/O models.