Agent Skills: llm-inference-batching-scheduler
Guidance for optimizing LLM inference request batching and scheduling. This skill applies when designing batch schedulers that minimize cost while meeting latency and padding constraints, trading off batch count, shape selection, and padding ratio. Use it when the task involves grouping requests by sequence length, managing shape-compilation costs, or multi-objective scheduling under hard constraints.
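As a minimal sketch of the grouping-by-sequence-length idea, the snippet below assigns each request to the smallest precompiled shape (bucket) that fits it and then computes the resulting padding ratio. The function names, bucket sizes, and request lengths are illustrative assumptions, not part of the skill itself:

```python
from collections import defaultdict

def bucket_requests(seq_lens, bucket_sizes):
    """Assign each request to the smallest allowed shape (bucket)
    that fits it, minimizing per-request padding. Illustrative only."""
    buckets = defaultdict(list)
    sizes = sorted(bucket_sizes)
    for n in seq_lens:
        # Pick the smallest compiled shape >= request length.
        shape = next((s for s in sizes if s >= n), None)
        if shape is None:
            raise ValueError(f"length {n} exceeds largest shape {sizes[-1]}")
        buckets[shape].append(n)
    return dict(buckets)

def padding_ratio(buckets):
    """Fraction of padded (wasted) tokens over all scheduled tokens."""
    total = sum(shape * len(reqs) for shape, reqs in buckets.items())
    useful = sum(sum(reqs) for reqs in buckets.values())
    return 1.0 - useful / total

# Hypothetical example: five requests, three compiled shapes.
buckets = bucket_requests([5, 7, 12, 30, 31], bucket_sizes=[8, 16, 32])
# lengths 5 and 7 share shape 8; 12 goes to 16; 30 and 31 share 32
```

A real scheduler would also weigh the compilation cost of adding a new shape against the padding saved, and enforce per-request latency deadlines; this sketch covers only the shape-selection and padding-ratio pieces.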
ID: benchflow-ai/skillsbench/llm-inference-batching-scheduler