CAST AI Upgrade & Migration
Overview
Upgrade CAST AI components: the Helm charts for the agent, cluster controller, evictor, and workload autoscaler, plus the Terraform provider. Includes rollback procedures for the Helm releases and the provider.
Prerequisites
- Current CAST AI components installed
- Staging cluster for testing upgrades first
- Helm and kubectl access
- Change management approval for production
Instructions
Step 1: Check Current Versions
# Helm chart versions
helm list -n castai-agent -o json | jq '.[] | {name: .name, chart: .chart, appVersion: .app_version}'
# Available versions
helm repo update
helm search repo castai-helm --versions | head -20
# Terraform provider version (-A 2 also prints the locked version and constraint lines)
grep -A 2 "castai/castai" .terraform.lock.hcl
terraform providers
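To decide whether a release actually needs upgrading, the installed chart version can be compared with the newest one listed by `helm search repo`. A minimal sketch, assuming GNU `sort -V` is available; `needs_upgrade` is a hypothetical helper, and the version numbers in the usage line are placeholders rather than real CAST AI chart versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) when version $1 is older than version $2.
# Relies on GNU sort's -V (version sort) option.
needs_upgrade() {
  local current="$1" latest="$2"
  # Equal versions never need an upgrade.
  [ "$current" = "$latest" ] && return 1
  # If the current version sorts first, it is the older of the two.
  [ "$(printf '%s\n%s\n' "$current" "$latest" | sort -V | head -n1)" = "$current" ]
}
```

Usage: `needs_upgrade 0.9.0 0.10.0 && echo "upgrade available"`.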
Step 2: Review Changelog
# Check CAST AI changelog for breaking changes
# https://docs.cast.ai/changelog/
# Check Terraform provider releases
# https://github.com/castai/terraform-provider-castai/releases
Step 3: Upgrade on Staging First
# Upgrade agent
helm upgrade castai-agent castai-helm/castai-agent \
-n castai-agent --reuse-values
# Upgrade cluster controller
helm upgrade cluster-controller castai-helm/castai-cluster-controller \
-n castai-agent --reuse-values
# Upgrade evictor
helm upgrade castai-evictor castai-helm/castai-evictor \
-n castai-agent --reuse-values
# Upgrade workload autoscaler
helm upgrade castai-workload-autoscaler castai-helm/castai-workload-autoscaler \
-n castai-agent --reuse-values
# Verify all pods restarted cleanly
kubectl get pods -n castai-agent -w
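Without `--version`, `helm upgrade` picks the newest chart in the repo, so staging and production can end up on different versions if they are upgraded on different days. A minimal sketch of a wrapper that pins the chart version and waits for the rollout; `upgrade_release`, `DRY_RUN`, and the 5 minute timeout are assumptions for illustration, while the release and chart names are the ones used above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the upgrade commands above.
# Set DRY_RUN=1 to print the helm command instead of running it.
NAMESPACE="castai-agent"

upgrade_release() {
  local release="$1" chart="$2" version="$3"
  # Build the command once so the dry run prints exactly what would execute.
  set -- helm upgrade "$release" "castai-helm/$chart" \
    -n "$NAMESPACE" --reuse-values --version "$version" \
    --wait --timeout 5m
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

Usage: `DRY_RUN=1 upgrade_release castai-agent castai-agent <chart-version>` prints the command; drop `DRY_RUN=1` to run it. Using the same `<chart-version>` for staging and production keeps the two environments comparable.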
Step 4: Upgrade Terraform Provider
# Update version constraint in versions.tf
terraform {
  required_providers {
    castai = {
      source  = "castai/castai"
      version = "~> 7.5" # Update to target version
    }
  }
}
# Upgrade provider
terraform init -upgrade
terraform plan -var-file=environments/staging.tfvars
# Review plan carefully for resource recreation
# Apply if plan looks safe
terraform apply -var-file=environments/staging.tfvars
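The manual plan review can be backed by a simple gate that refuses to apply when the plan destroys or replaces resources. A minimal sketch that scans the human-readable plan output for the phrases Terraform prints for those actions ("will be destroyed", "must be replaced"); `plan_is_destructive` and the file names `plan.txt` and `staging.tfplan` are hypothetical, and a text match like this is a coarse check, not a substitute for reading the plan.

```shell
#!/usr/bin/env bash
# Hypothetical gate: reads `terraform plan -no-color` output on stdin and
# succeeds (exit 0) if any resource would be destroyed or replaced.
plan_is_destructive() {
  grep -Eq 'will be destroyed|must be replaced'
}

# Example use (commented out so the snippet has no side effects):
# terraform plan -no-color -var-file=environments/staging.tfvars \
#   -out=staging.tfplan | tee plan.txt
# if plan_is_destructive < plan.txt; then
#   echo "Plan destroys or replaces resources; review before applying." >&2
# else
#   terraform apply staging.tfplan
# fi
```

Saving the plan with `-out` and applying that file means the changes applied are the ones that were checked.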
Step 5: Validate After Upgrade
# Verify agent is online
curl -s -H "X-API-Key: ${CASTAI_API_KEY}" \
"https://api.cast.ai/v1/kubernetes/external-clusters/${CASTAI_CLUSTER_ID}" \
| jq '{agentStatus, name}'
# Verify autoscaler policies still applied
curl -s -H "X-API-Key: ${CASTAI_API_KEY}" \
"https://api.cast.ai/v1/kubernetes/clusters/${CASTAI_CLUSTER_ID}/policies" \
| jq '.enabled'
# Check for errors in new agent version
kubectl logs -n castai-agent deployment/castai-agent --tail=50 | grep -i error
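Right after the pods restart, the agent may need some time to reconnect, so a single status check can fail spuriously. A minimal sketch of a polling helper; `retry` and `agent_online` are hypothetical names, and the `"online"` value compared against `agentStatus` in the usage comment is an assumption about the API response that should be confirmed against a healthy cluster first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between attempts; returns 0 as soon as the command succeeds.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    i=$((i + 1))
    # Only sleep if another attempt follows.
    [ "$i" -le "$attempts" ] && sleep "$delay"
  done
  return 1
}

# Example use (commented out; "online" is an assumed status value):
# agent_online() {
#   curl -s -H "X-API-Key: ${CASTAI_API_KEY}" \
#     "https://api.cast.ai/v1/kubernetes/external-clusters/${CASTAI_CLUSTER_ID}" \
#     | jq -e '.agentStatus == "online"' > /dev/null
# }
# retry 10 15 agent_online || echo "Agent did not come online" >&2
```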
Rollback Procedure
# Helm rollback to the previous revision of each release
helm rollback castai-agent -n castai-agent
helm rollback cluster-controller -n castai-agent
helm rollback castai-evictor -n castai-agent
helm rollback castai-workload-autoscaler -n castai-agent
# Terraform rollback: first pin the previous provider version in versions.tf, e.g.
# version = "= 7.4.2"
terraform init -upgrade
terraform plan -var-file=environments/staging.tfvars # Review before applying
terraform apply -var-file=environments/staging.tfvars
Error Handling
| Issue | Cause | Solution |
|-------|-------|----------|
| Agent CrashLoop after upgrade | Breaking chart change | Run helm rollback to return to the previous revision |
| Terraform plan shows destroy | Major provider version jump | Pin an intermediate version and upgrade in stages |
| Policies reset after upgrade | Chart default values changed | Pass --reuse-values |
| Spot handler incompatible | Node format changed | Upgrade all components together |
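The first row of the table can be turned into a quick check. A minimal sketch that looks for `CrashLoopBackOff` in the pod listing; `has_crashloop` is a hypothetical helper, and the automatic rollback in the usage comment is one possible reaction, not a prescribed one.

```shell
#!/usr/bin/env bash
# Hypothetical check: reads `kubectl get pods` output on stdin and succeeds
# (exit 0) if any pod reports CrashLoopBackOff in its STATUS column.
has_crashloop() {
  grep -q 'CrashLoopBackOff'
}

# Example use (commented out so the snippet has no side effects):
# if kubectl get pods -n castai-agent --no-headers | has_crashloop; then
#   helm rollback castai-agent -n castai-agent
# fi
```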
Resources
Next Steps
For CI pipeline integration, see castai-ci-integration.