Agent Skills: CoreWeave Production Checklist

|

UncategorizedID: jeremylongshore/claude-code-plugins-plus-skills/coreweave-prod-checklist

Install this agent skill to your local

pnpm dlx add-skill https://github.com/jeremylongshore/claude-code-plugins-plus-skills/tree/HEAD/plugins/saas-packs/coreweave-pack/skills/coreweave-prod-checklist

Skill Files

Browse the full folder contents for coreweave-prod-checklist.

Download Skill

Loading file tree…

plugins/saas-packs/coreweave-pack/skills/coreweave-prod-checklist/SKILL.md

Skill Metadata

Name
coreweave-prod-checklist
Description
|

CoreWeave Production Checklist

Inference Services

  • [ ] GPU type and count validated for model size
  • [ ] Autoscaling configured (KServe or HPA)
  • [ ] Health and readiness probes set
  • [ ] Resource requests AND limits specified
  • [ ] Node affinity targeting correct GPU class
  • [ ] minReplicas >= 1 for production (no cold starts)

Storage

  • [ ] Model weights in PVC (not downloaded at startup)
  • [ ] Checkpoints saved to persistent storage
  • [ ] Storage class appropriate (SSD for inference, HDD for archival)

Security

  • [ ] Secrets for model tokens and registry access
  • [ ] Network policies applied
  • [ ] Container images from trusted registries

Monitoring

  • [ ] GPU utilization metrics collected
  • [ ] Inference latency and throughput tracked
  • [ ] Alert on pod restarts and OOM events
  • [ ] Log aggregation configured

Rollback

kubectl rollout undo deployment/my-inference
kubectl rollout status deployment/my-inference

Resources

Next Steps

For upgrades, see coreweave-upgrade-migration.