Agent Skills: llm-inference-batching-scheduler
Guidance for optimizing LLM inference request batching and scheduling. This skill applies when designing batch schedulers that minimize cost while meeting latency and padding constraints, trading off batch count, shape selection, and padding ratio. Use it when the task involves grouping requests by sequence length, managing shape-compilation costs, or multi-objective scheduling under hard constraints.
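As a minimal sketch of the grouping-by-sequence-length idea, the snippet below assigns each request to the smallest precompiled shape (bucket) that fits it and then computes the resulting padding ratio. The function names, bucket sizes, and request lengths are illustrative assumptions, not part of the skill itself:

```python
from collections import defaultdict

def bucket_requests(seq_lens, bucket_sizes):
    """Assign each request to the smallest allowed shape (bucket)
    that fits it, minimizing per-request padding. Illustrative only."""
    buckets = defaultdict(list)
    sizes = sorted(bucket_sizes)
    for n in seq_lens:
        # Pick the smallest compiled shape >= request length.
        shape = next((s for s in sizes if s >= n), None)
        if shape is None:
            raise ValueError(f"length {n} exceeds largest shape {sizes[-1]}")
        buckets[shape].append(n)
    return dict(buckets)

def padding_ratio(buckets):
    """Fraction of padded (wasted) tokens over all scheduled tokens."""
    total = sum(shape * len(reqs) for shape, reqs in buckets.items())
    useful = sum(sum(reqs) for reqs in buckets.values())
    return 1.0 - useful / total

# Hypothetical example: five requests, three compiled shapes.
buckets = bucket_requests([5, 7, 12, 30, 31], bucket_sizes=[8, 16, 32])
# lengths 5 and 7 share shape 8; 12 goes to 16; 30 and 31 share 32
```

A real scheduler would also weigh the compilation cost of adding a new shape against the padding saved, and enforce per-request latency deadlines; this sketch covers only the shape-selection and padding-ratio pieces.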
ID: benchflow-ai/skillsbench/llm-inference-batching-scheduler