Agent Skills: Optimize LLM Command

Get LLM optimization recommendations for reducing serving latency, lowering inference costs, and improving throughput

Uncategorized
ID: melodic-software/claude-code-plugins/optimize-llm

Install this agent skill locally:

pnpm dlx add-skill https://github.com/melodic-software/claude-code-plugins/tree/HEAD/plugins/systems-design/skills/optimize-llm

plugins/systems-design/skills/optimize-llm/SKILL.md

Skill Metadata

Name: optimize-llm
Description: Get LLM optimization recommendations for serving latency, inference costs, and throughput improvements

Optimize LLM Command

Get quick, actionable recommendations for LLM serving optimization.

Usage

/sd:optimize-llm [focus]

Arguments

  • focus (optional): Optimization priority
    • latency - Focus on reducing response time
    • cost - Focus on reducing inference costs
    • throughput - Focus on maximizing requests/second
    • If omitted: Provide balanced recommendations
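
As a minimal sketch of how the optional focus argument could be interpreted, the Python below maps the raw argument to one of the three documented values and falls back to balanced recommendations when it is omitted. The names `Focus` and `parse_focus`, the `balanced` label, and the choice to reject unrecognized values are assumptions made for illustration; the skill does not specify them.

```python
from enum import Enum
from typing import Optional


class Focus(Enum):
    LATENCY = "latency"
    COST = "cost"
    THROUGHPUT = "throughput"
    BALANCED = "balanced"  # hypothetical label for the "argument omitted" case


def parse_focus(raw: Optional[str]) -> Focus:
    """Map the optional [focus] argument to a Focus value."""
    # An omitted or empty argument means balanced recommendations.
    if raw is None or not raw.strip():
        return Focus.BALANCED
    value = raw.strip().lower()
    for focus in (Focus.LATENCY, Focus.COST, Focus.THROUGHPUT):
        if value == focus.value:
            return focus
    # Assumption: unknown values are rejected rather than silently ignored.
    raise ValueError(
        f"unknown focus {raw!r}; expected latency, cost, or throughput"
    )
```

Under these assumptions, `parse_focus(None)` yields `Focus.BALANCED` and `parse_focus("latency")` yields `Focus.LATENCY`.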

Examples

/sd:optimize-llm
/sd:optimize-llm latency
/sd:optimize-llm cost

Workflow

  1. Gather Context

    • Search for LLM-related configuration files
    • Look for: model configs, serving configs, inference scripts
    • Identify current serving stack (vLLM, TGI, TensorRT-LLM, etc.)
  2. Spawn LLM Optimization Advisor Agent

    Use the llm-optimization-advisor agent to analyze the setup and provide recommendations. The agent specializes in:

    • Quantization strategies (INT8, INT4, FP16)
    • Batching optimization (continuous, dynamic)
    • KV cache optimization (PagedAttention)
    • Serving framework selection
    • Cost reduction strategies
  3. Present Recommendations

    Display optimization opportunities organized by:

    • Quick Wins - Low-effort, high-impact changes
    • Medium Effort - Moderate changes with significant benefits
    • Advanced - Architectural changes for maximum performance
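
Two parts of the workflow above lend themselves to a rough Python sketch: detecting the serving stack in step 1, and the arithmetic behind the quantization bullet in step 2. The keyword table `STACK_HINTS` and the function names are hypothetical, and the agent's real detection logic is not documented here. The memory estimate counts weights only and ignores quantization overhead such as scales and zero points.

```python
from typing import Dict, Optional, Tuple

# Hypothetical keyword table: substrings that hint at each serving stack.
STACK_HINTS: Dict[str, Tuple[str, ...]] = {
    "vLLM": ("vllm",),
    "TGI": ("text-generation-inference", "text_generation_server"),
    "TensorRT-LLM": ("tensorrt_llm", "tensorrt-llm", "trtllm"),
}

# Bytes per parameter for the precisions listed in step 2.
BYTES_PER_PARAM: Dict[str, float] = {"FP16": 2.0, "INT8": 1.0, "INT4": 0.5}


def detect_serving_stack(files: Dict[str, str]) -> Optional[str]:
    """Return the first stack whose hint appears in a file name or its text.

    `files` maps a file name to its contents (for example, configs or scripts
    found while gathering context).
    """
    for stack, hints in STACK_HINTS.items():
        for name, text in files.items():
            haystack = (name + "\n" + text).lower()
            if any(hint in haystack for hint in hints):
                return stack
    return None  # corresponds to "unknown" in the report template


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate weight footprint in GB (10^9 bytes), weights only."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9
```

For example, a 7-billion-parameter model comes to about 14 GB of weights at FP16 and about 3.5 GB at INT4 under this estimate, which is why quantization tends to show up among the cost-focused recommendations.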

Output Format

## LLM Optimization Report

### Current Setup
- Model: [detected or ask]
- Framework: [detected or unknown]
- Hardware: [detected or ask]

### Quick Wins
1. [Optimization] - [Expected impact]
2. ...

### Medium Effort Optimizations
1. [Optimization] - [Expected impact]
2. ...

### Advanced Optimizations
1. [Optimization] - [Expected impact]
2. ...

### Estimated Total Impact
- Latency: [X]% improvement
- Cost: [X]% reduction
- Throughput: [X]x increase
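
The template above can be filled in from structured data. The following Python is a minimal sketch, assuming a hypothetical `Recommendation` record and `render_report` helper. It covers the sections from Current Setup through Advanced Optimizations and leaves out Estimated Total Impact, because the skill does not state how individual impacts are combined.

```python
from dataclasses import dataclass
from typing import List

# Section headings taken from the output template, in display order.
TIERS = ("Quick Wins", "Medium Effort Optimizations", "Advanced Optimizations")


@dataclass
class Recommendation:
    tier: str             # one of TIERS
    optimization: str     # what to change
    expected_impact: str  # free-text impact description


def render_report(model: str, framework: str, hardware: str,
                  recs: List[Recommendation]) -> str:
    """Render the report sections as markdown text."""
    lines = [
        "## LLM Optimization Report",
        "",
        "### Current Setup",
        f"- Model: {model}",
        f"- Framework: {framework}",
        f"- Hardware: {hardware}",
    ]
    for tier in TIERS:
        lines += ["", f"### {tier}"]
        # Number the recommendations within each tier, starting at 1.
        tier_recs = [r for r in recs if r.tier == tier]
        for i, rec in enumerate(tier_recs, start=1):
            lines.append(f"{i}. {rec.optimization} - {rec.expected_impact}")
    return "\n".join(lines)


# Illustrative input only; these entries are not measured results.
example = render_report(
    "unknown", "vLLM", "unknown",
    [Recommendation("Quick Wins", "Enable continuous batching",
                    "higher throughput")],
)
```

Keeping the tier names in a single tuple means the section order in the rendered report always matches the template, regardless of the order in which recommendations arrive.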