Agent Skills: CoreWeave Cost Tuning


ID: jeremylongshore/claude-code-plugins-plus-skills/coreweave-cost-tuning

Install this agent skill locally:

pnpm dlx add-skill https://github.com/jeremylongshore/claude-code-plugins-plus-skills/tree/HEAD/plugins/saas-packs/coreweave-pack/skills/coreweave-cost-tuning

Skill Files

Browse the full folder contents for coreweave-cost-tuning.



plugins/saas-packs/coreweave-pack/skills/coreweave-cost-tuning/SKILL.md

Skill Metadata

Name
coreweave-cost-tuning
Description
Optimize CoreWeave GPU cloud costs with right-sizing and scheduling.

CoreWeave Cost Tuning

GPU Pricing Reference (approximate)

| GPU | Per GPU/hour | Best For |
|-----|--------------|----------|
| A100 40GB PCIe | ~$1.50 | Development, smaller models |
| A100 80GB PCIe | ~$2.21 | Production inference |
| H100 80GB PCIe | ~$4.76 | High-throughput inference |
| H100 SXM5 (8x) | ~$6.15/GPU | Training, multi-GPU |
| L40 | ~$1.10 | Image generation, light inference |
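For budgeting, the table above converts readily into rough monthly estimates. A minimal sketch, assuming the approximate rates listed and a 730-hour average month (these are illustrative figures, not billed prices):

```python
# Approximate hourly rates from the pricing table above.
HOURLY_RATES = {
    "A100_PCIE_40GB": 1.50,
    "A100_PCIE_80GB": 2.21,
    "H100_PCIE_80GB": 4.76,
    "H100_SXM5": 6.15,
    "L40": 1.10,
}

def monthly_cost(gpu: str, gpu_count: int = 1, utilization: float = 1.0) -> float:
    """Estimate monthly spend for a GPU pool at a given utilization (0.0-1.0)."""
    hours_per_month = 730  # average hours in a month
    return HOURLY_RATES[gpu] * gpu_count * hours_per_month * utilization

# An always-on 8x H100 SXM5 training pod:
print(round(monthly_cost("H100_SXM5", gpu_count=8)))  # → 35916
# The same pod busy only 40% of the time:
print(round(monthly_cost("H100_SXM5", gpu_count=8, utilization=0.4)))  # → 14366
```

The `utilization` parameter makes the case for the scale-to-zero strategy below: idle hours you do not pay for drop straight out of the estimate.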

Cost Optimization Strategies

Scale-to-Zero for Dev/Staging

Dev and staging services rarely need to run around the clock. These Knative autoscaling annotations let an idle service scale down to zero replicas:

```yaml
autoscaling.knative.dev/minScale: "0"
autoscaling.knative.dev/scaleDownDelay: "5m"
```
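The annotations attach to the revision template of a Knative Service. A minimal sketch (the service name and image are placeholders, not part of the skill):

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: dev-inference                  # hypothetical dev/staging service
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "0"         # allow scale-to-zero
        autoscaling.knative.dev/scaleDownDelay: "5m"  # stay warm 5 min after last request
    spec:
      containers:
        - image: registry.example.com/my-model:latest  # placeholder image
```

The `scaleDownDelay` keeps the pod warm briefly after traffic stops, trading a few minutes of GPU time against cold-start latency on the next request.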

Right-Size GPU Selection

```python
def recommend_gpu(model_size_b: float, inference_only: bool = True) -> str:
    """Recommend a CoreWeave GPU type for a model of the given size
    (in billions of parameters)."""
    if model_size_b <= 7:
        return "L40" if inference_only else "A100_PCIE_80GB"
    elif model_size_b <= 13:
        return "A100_PCIE_80GB"
    elif model_size_b <= 70:
        return "A100_PCIE_80GB (4x tensor parallel)"
    else:
        return "H100_SXM5 (8x tensor parallel)"
```
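The thresholds above track fp16 weight memory. A rule-of-thumb sketch of the reasoning (the 20% overhead factor is an assumption for KV cache and activations; actual needs vary with batch size and context length):

```python
def fp16_memory_gb(model_size_b: float, overhead: float = 1.2) -> float:
    """Approximate GPU memory for fp16 weights plus ~20% runtime overhead.
    A hypothetical rule of thumb, not a CoreWeave-published figure."""
    return model_size_b * 2 * overhead  # 2 bytes per parameter at fp16

print(round(fp16_memory_gb(7)))    # → 17  (fits an L40's 48 GB)
print(round(fp16_memory_gb(13)))   # → 31  (fits a single A100 80GB)
print(round(fp16_memory_gb(70)))   # → 168 (needs 4x A100 80GB tensor parallel)
```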

Quantization to Use Smaller GPUs

Use AWQ or GPTQ quantization to fit larger models on smaller GPUs:

```bash
# A 70B model at 4-bit fits on a single A100-80GB instead of 4x
vllm serve meta-llama/Llama-3.1-70B-Instruct-AWQ --quantization awq
```
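Using the approximate rates from the pricing table, the saving can be sketched as simple arithmetic (the `A100_80GB_HOURLY` rate is the ~$2.21 figure above; actual quantized throughput and quality differ from fp16):

```python
# Hourly cost comparison for serving a 70B model, using approximate rates.
A100_80GB_HOURLY = 2.21  # ~$/GPU/hour from the pricing table

fp16_cost = 4 * A100_80GB_HOURLY  # fp16 weights need ~4x A100 80GB
awq_cost = 1 * A100_80GB_HOURLY   # 4-bit AWQ weights fit on one A100 80GB

print(f"fp16: ${fp16_cost:.2f}/h, AWQ: ${awq_cost:.2f}/h, "
      f"savings: {1 - awq_cost / fp16_cost:.0%}")
# → fp16: $8.84/h, AWQ: $2.21/h, savings: 75%
```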


Next Steps

For architecture patterns, see coreweave-reference-architecture.