CoreWeave Reference Architecture Skill

Architecture Diagram

                    ┌─────────────────────┐
                    │   Load Balancer     │
                    │   (Ingress/LB)      │
                    └──────────┬──────────┘
                               │
              ┌────────────────┼────────────────┐
              │                │                │
     ┌────────▼──────┐ ┌──────▼────────┐ ┌─────▼───────┐
     │ Model A       │ │ Model B       │ │ Model C     │
     │ (vLLM, A100)  │ │ (TGI, H100)   │ │ (SD, L40)   │
     │ 2 replicas    │ │ 1 replica     │ │ 3 replicas  │
     └───────────────┘ └───────────────┘ └─────────────┘
              │                │                │
     ┌────────▼────────────────▼────────────────▼───────┐
     │              Shared Storage (PVC)                │
     │         Models / Checkpoints / Data              │
     └──────────────────────────────────────────────────┘
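The topology in the diagram can also be written down as data, which makes capacity questions easy to answer. The following is a minimal sketch, not part of the original architecture: the names `ModelDeployment`, `TOPOLOGY`, and `gpus_by_type` are hypothetical, and the one-GPU-per-replica default is an assumption, since the diagram does not state how many GPUs each replica uses.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelDeployment:
    """One box from the diagram (hypothetical helper type)."""
    name: str
    server: str
    gpu: str
    replicas: int
    gpus_per_replica: int = 1  # assumption: the diagram does not specify this


# Values copied from the diagram: server, GPU type, replica count.
TOPOLOGY = [
    ModelDeployment("model-a", "vLLM", "A100", 2),
    ModelDeployment("model-b", "TGI", "H100", 1),
    ModelDeployment("model-c", "SD", "L40", 3),
]


def gpus_by_type(deployments):
    """Sum the GPUs required per GPU type across all deployments."""
    totals = Counter()
    for d in deployments:
        totals[d.gpu] += d.replicas * d.gpus_per_replica
    return dict(totals)


if __name__ == "__main__":
    print(gpus_by_type(TOPOLOGY))
```

Adjust `gpus_per_replica` for models that need tensor parallelism across several GPUs.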

Project Structure

ml-platform/
├── k8s/
│   ├── base/                    # Shared templates
│   ├── models/
│   │   ├── llama-8b/           # Per-model manifests
│   │   ├── llama-70b/
│   │   └── stable-diffusion/
│   └── infra/
│       ├── storage.yaml         # PVCs
│       ├── secrets.yaml         # Model tokens
│       └── monitoring.yaml      # Prometheus rules
├── containers/
│   ├── vllm/Dockerfile
│   └── custom-server/Dockerfile
├── scripts/
│   ├── deploy.sh
│   └── benchmark.sh
└── monitoring/
    ├── grafana-dashboards/
    └── alert-rules.yaml
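
One way a deploy step might walk this layout is sketched below. The ordering (infra first, then base, then per-model manifests) is an assumption made for illustration, on the reasoning that PVCs and secrets should exist before the model workloads that reference them. The function name `manifest_apply_order` is hypothetical, and the source does not describe what `deploy.sh` actually does.

```python
from pathlib import Path


def manifest_apply_order(root):
    """Return YAML manifests under <root>/k8s in an assumed apply order.

    Assumed order: infra (storage, secrets, monitoring), then base,
    then per-model directories. Paths are relative to root.
    """
    k8s = Path(root) / "k8s"
    ordered = []
    for group in ("infra", "base", "models"):
        group_dir = k8s / group
        if not group_dir.is_dir():
            continue  # tolerate a partially populated tree
        ordered.extend(sorted(group_dir.rglob("*.yaml")))
    return [p.relative_to(root).as_posix() for p in ordered]


if __name__ == "__main__":
    for path in manifest_apply_order("ml-platform"):
        print(path)
```

Each printed path could then be passed to `kubectl apply -f`.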

Key Design Decisions

| Decision | Choice | Rationale |
|----------|--------|-----------|
| Serving framework | vLLM | Continuous batching, PagedAttention |
| GPU type (production) | A100 80GB | Best price/performance for inference |
| Storage | Shared PVC (SSD) | Fast model loading across replicas |
| Autoscaling | KServe + Knative | Native scale-to-zero support |
| Container registry | GHCR | GitHub integration, free for public images |
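
To show how these choices could fit together, here is a minimal sketch that builds a KServe `InferenceService` manifest as a Python dictionary: a vLLM container pulled from GHCR, the shared PVC mounted read-only, and `minReplicas` set to 0 for scale-to-zero. The image name, PVC name, model path, and the `inference_service` helper are all hypothetical placeholders, and the `--model` argument assumes the image's entrypoint is the vLLM server. Treat it as an illustration rather than the platform's actual manifest.

```python
import json


def inference_service(name, image, model_path, pvc_name,
                      gpus=1, min_replicas=0, max_replicas=2):
    """Build a KServe InferenceService with a custom predictor container."""
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {
            "predictor": {
                # minReplicas 0 lets Knative scale the predictor to zero.
                "minReplicas": min_replicas,
                "maxReplicas": max_replicas,
                "containers": [{
                    "name": "kserve-container",
                    "image": image,
                    # Assumption: the image entrypoint accepts --model.
                    "args": ["--model", model_path],
                    "resources": {"limits": {"nvidia.com/gpu": str(gpus)}},
                    "volumeMounts": [{
                        "name": "models",
                        "mountPath": "/models",
                        "readOnly": True,
                    }],
                }],
                # Shared PVC holding model weights (hypothetical claim name).
                "volumes": [{
                    "name": "models",
                    "persistentVolumeClaim": {"claimName": pvc_name},
                }],
            }
        },
    }


if __name__ == "__main__":
    svc = inference_service(
        name="llama-8b",
        image="ghcr.io/example-org/vllm:latest",  # hypothetical image
        model_path="/models/llama-8b",            # hypothetical path
        pvc_name="shared-models",                 # hypothetical PVC
    )
    print(json.dumps(svc, indent=2))
```

The JSON output can be piped to `kubectl apply -f -`. Mounting one PVC into several replicas assumes the volume's access mode allows multiple pods to mount it at once.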

Next Steps

For multi-environment setup, see coreweave-multi-env-setup.
