# CoreWeave Deploy Integration

## Helm Chart for Inference Service
```yaml
# helm/values.yaml
replicaCount: 2

image:
  repository: vllm/vllm-openai
  tag: latest

gpu:
  type: A100_PCIE_80GB
  count: 1
  memory: 48Gi

model:
  name: meta-llama/Llama-3.1-8B-Instruct

autoscaling:
  enabled: true
  minReplicas: 1
  maxReplicas: 5
  targetConcurrency: 2
```
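The install commands below pass a `values-prod.yaml` override file. A minimal sketch of what such an override might contain, assuming the key names above; the replica count, pinned tag, and autoscaling ceiling are illustrative, not prescribed by this chart:

```yaml
# values-prod.yaml (illustrative overrides; keys must match helm/values.yaml)
replicaCount: 3

image:
  tag: v0.6.3        # pin a specific image tag in prod instead of "latest"

autoscaling:
  maxReplicas: 10
```

Only the keys present in the override file are changed; everything else falls back to the defaults in `helm/values.yaml`.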
Install the release, or upgrade it in place after changing values:

```shell
helm install my-inference ./helm -f values-prod.yaml
helm upgrade my-inference ./helm -f values-prod.yaml
```
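One way the chart's templates might consume the `gpu` values is via a resource limit plus node affinity on a GPU-class label. This is a sketch, not the chart's actual template: the `gpu.nvidia.com/class` label key is an assumption — verify the label your CoreWeave cluster uses before relying on it.

```yaml
# helm/templates/deployment.yaml (excerpt, illustrative)
resources:
  limits:
    nvidia.com/gpu: {{ .Values.gpu.count }}
    memory: {{ .Values.gpu.memory }}
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: gpu.nvidia.com/class   # assumed node label; check your cluster
              operator: In
              values:
                - {{ .Values.gpu.type }}
```

With `gpu.type: A100_PCIE_80GB`, pods schedule only onto nodes carrying that class label, and each pod requests one GPU.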
## Kustomize Overlays
```text
k8s/
├── base/
│   ├── deployment.yaml
│   ├── service.yaml
│   └── kustomization.yaml
└── overlays/
    ├── dev/
    │   ├── gpu-patch.yaml        # L40 GPU for dev
    │   └── kustomization.yaml
    └── prod/
        ├── gpu-patch.yaml        # A100/H100 for prod
        ├── replicas-patch.yaml
        └── kustomization.yaml
```
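A sketch of how the prod overlay could be wired up, assuming strategic-merge patches and a Deployment named `inference` in the base (both names are assumptions; match them to your base manifests):

```yaml
# overlays/prod/kustomization.yaml (illustrative)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
patches:
  - path: gpu-patch.yaml
  - path: replicas-patch.yaml

# overlays/prod/replicas-patch.yaml (illustrative; name must match the base Deployment)
# apiVersion: apps/v1
# kind: Deployment
# metadata:
#   name: inference
# spec:
#   replicas: 3
```

Each overlay references the shared base and layers only the environment-specific deltas on top of it.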
```shell
kubectl apply -k k8s/overlays/prod/
```
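Once the rollout is up, the `vllm/vllm-openai` image serves an OpenAI-compatible API under `/v1`. A minimal smoke-test sketch using only the standard library; the host (in-cluster service DNS name and port) is an assumption to adjust for your release:

```python
# Smoke test for a deployed vLLM OpenAI-compatible endpoint.
# The host below is an assumption (service name, namespace, port vary by release).
import json
from urllib import request


def build_chat_request(host: str, model: str, prompt: str) -> tuple[str, bytes]:
    """Return the URL and JSON body for a minimal /v1/chat/completions call."""
    url = f"http://{host}/v1/chat/completions"
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 16,
    }).encode()
    return url, body


def smoke_test(host: str, model: str) -> str:
    """POST a one-line prompt and return the first choice's message text."""
    url, body = build_chat_request(host, model, "Reply with the word ok.")
    req = request.Request(url, data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req, timeout=30) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]
```

Run it from inside the cluster (or through a port-forward) with something like `smoke_test("my-inference.default.svc:8000", "meta-llama/Llama-3.1-8B-Instruct")`.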
## Resources

## Next Steps
For event monitoring, see `coreweave-webhooks-events`.