CoreWeave Hello World
Overview
Deploy your first GPU workload on CoreWeave: a simple inference service using vLLM or a batch CUDA job. CoreWeave runs Kubernetes on bare-metal GPU nodes with A100, H100, and L40 GPUs.
Prerequisites
- Completed
coreweave-install-authsetup - kubectl configured with CoreWeave kubeconfig
- Namespace with GPU quota
Instructions
Step 1: Deploy a vLLM Inference Server
# vllm-inference.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: vllm-server
spec:
replicas: 1
selector:
matchLabels:
app: vllm-server
template:
metadata:
labels:
app: vllm-server
spec:
containers:
- name: vllm
image: vllm/vllm-openai:latest
args:
- "--model"
- "meta-llama/Llama-3.1-8B-Instruct"
- "--port"
- "8000"
ports:
- containerPort: 8000
resources:
limits:
nvidia.com/gpu: 1
memory: 48Gi
cpu: "8"
requests:
nvidia.com/gpu: 1
memory: 32Gi
cpu: "4"
env:
- name: HUGGING_FACE_HUB_TOKEN
valueFrom:
secretKeyRef:
name: hf-token
key: token
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: gpu.nvidia.com/class
operator: In
values: ["A100_PCIE_80GB"]
---
apiVersion: v1
kind: Service
metadata:
name: vllm-server
spec:
selector:
app: vllm-server
ports:
- port: 8000
targetPort: 8000
type: ClusterIP
# Create HuggingFace token secret
kubectl create secret generic hf-token --from-literal=token="${HF_TOKEN}"
# Deploy
kubectl apply -f vllm-inference.yaml
kubectl get pods -w # Wait for Running state
# Port-forward and test
kubectl port-forward svc/vllm-server 8000:8000 &
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "meta-llama/Llama-3.1-8B-Instruct", "messages": [{"role": "user", "content": "Hello!"}]}'
Step 2: Batch GPU Job
# gpu-batch-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
name: gpu-benchmark
spec:
template:
spec:
restartPolicy: Never
containers:
- name: benchmark
image: pytorch/pytorch:2.2.0-cuda12.1-cudnn8-runtime
command: ["python3", "-c"]
args:
- |
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU: {torch.cuda.get_device_name(0)}")
x = torch.randn(10000, 10000, device="cuda")
y = torch.matmul(x, x)
print(f"Matrix multiply result shape: {y.shape}")
print("CoreWeave GPU test passed!")
resources:
limits:
nvidia.com/gpu: 1
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: gpu.nvidia.com/class
operator: In
values: ["A100_PCIE_80GB"]
kubectl apply -f gpu-batch-job.yaml
kubectl logs job/gpu-benchmark --follow
Error Handling
| Error | Cause | Solution |
|-------|-------|----------|
| Pod stuck Pending | No GPU capacity | Try different GPU type or check quota |
| nvidia-smi not found | Wrong base image | Use NVIDIA CUDA images |
| OOMKilled | Insufficient GPU memory | Use larger GPU (80GB A100) |
| Image pull error | Registry auth | Create imagePullSecret |
Resources
Next Steps
Proceed to coreweave-local-dev-loop for development workflow setup.