CAST AI Cost Tuning
Overview
Maximize Kubernetes cost savings through CAST AI: spot instance strategies, workload right-sizing, cluster hibernation, and savings tracking. Typical savings: 50-70% on cloud compute costs.
Prerequisites
- CAST AI Phase 2 enabled with full automation
- Savings report available (requires 24h+ of data)
- Understanding of workload criticality tiers
Instructions
Step 1: Analyze Current Savings
# Get savings breakdown
curl -s -H "X-API-Key: ${CASTAI_API_KEY}" \
"https://api.cast.ai/v1/kubernetes/clusters/${CASTAI_CLUSTER_ID}/savings" \
| jq '{
currentMonthlyCost: .currentMonthlyCost,
optimizedMonthlyCost: .optimizedMonthlyCost,
monthlySavings: .monthlySavings,
savingsPercentage: .savingsPercentage,
spotSavings: .spotSavings,
rightSizingSavings: .rightSizingSavings
}'
Step 2: Maximize Spot Usage
# Enable aggressive spot with diversity and fallbacks
curl -X PUT -H "X-API-Key: ${CASTAI_API_KEY}" \
-H "Content-Type: application/json" \
"https://api.cast.ai/v1/kubernetes/clusters/${CASTAI_CLUSTER_ID}/policies" \
-d '{
"enabled": true,
"spotInstances": {
"enabled": true,
"clouds": ["aws"],
"spotDiversityEnabled": true,
"spotDiversityPriceIncreaseLimitPercent": 20,
"spotBackups": {
"enabled": true,
"spotBackupRestoreRateSeconds": 600
}
}
}'
Spot allocation strategy by workload tier:
| Workload Type | Spot % | Rationale | |---------------|--------|-----------| | Batch jobs, CI runners | 100% spot | Interruptible, restartable | | Stateless APIs (behind LB) | 80% spot | Can handle brief interruptions | | Stateful services, databases | 0% spot | Use on-demand or reserved | | ML training | 80-100% spot | Checkpointing handles interrupts |
Step 3: Workload Right-Sizing
# Get resource waste analysis
curl -s -H "X-API-Key: ${CASTAI_API_KEY}" \
"https://api.cast.ai/v1/workload-autoscaling/clusters/${CASTAI_CLUSTER_ID}/workloads" \
| jq '[.items[] | select(.estimatedSavingsPercent > 20) | {
name: .workloadName,
namespace: .namespace,
wastedCpu: (.currentCpuRequest - .recommendedCpuRequest),
wastedMemory: (.currentMemoryRequest - .recommendedMemoryRequest),
savingsPercent: .estimatedSavingsPercent
}] | sort_by(-.savingsPercent) | .[0:10]'
Step 4: Cluster Hibernation (Dev/Staging)
# Hibernate non-production clusters during off-hours
# Scales nodes to zero, resume on demand
# Enable hibernation
curl -X POST -H "X-API-Key: ${CASTAI_API_KEY}" \
-H "Content-Type: application/json" \
"https://api.cast.ai/v1/kubernetes/clusters/${CASTAI_CLUSTER_ID}/hibernate" \
-d '{
"schedule": {
"enabled": true,
"hibernateAt": "20:00",
"wakeUpAt": "08:00",
"timezone": "America/New_York",
"weekdaysOnly": true
}
}'
Step 5: Cost Tracking Dashboard
interface CostReport {
cluster: string;
period: string;
currentCost: number;
optimizedCost: number;
savings: number;
spotPercent: number;
}
async function generateMonthlyCostReport(
clusterIds: string[]
): Promise<CostReport[]> {
const reports: CostReport[] = [];
for (const clusterId of clusterIds) {
const [cluster, savings, nodes] = await Promise.all([
castaiGet(`/v1/kubernetes/external-clusters/${clusterId}`),
castaiGet(`/v1/kubernetes/clusters/${clusterId}/savings`),
castaiGet(`/v1/kubernetes/external-clusters/${clusterId}/nodes`),
]);
const spotNodes = nodes.items.filter(
(n: { lifecycle: string }) => n.lifecycle === "spot"
).length;
reports.push({
cluster: cluster.name,
period: new Date().toISOString().slice(0, 7),
currentCost: savings.currentMonthlyCost,
optimizedCost: savings.optimizedMonthlyCost,
savings: savings.monthlySavings,
spotPercent:
nodes.items.length > 0
? (spotNodes / nodes.items.length) * 100
: 0,
});
}
return reports;
}
Cost Optimization Checklist
- [ ] Spot instances enabled with diversity
- [ ] Workload autoscaler right-sizing resources
- [ ] Dev/staging clusters hibernated off-hours
- [ ] Empty node downscaler enabled
- [ ] Instance families include latest generation (cheaper)
- [ ] Reserved/savings plan for baseline on-demand nodes
- [ ] Weekly savings report review
Error Handling
| Issue | Cause | Solution | |-------|-------|----------| | Savings lower than expected | Too many on-demand constraints | Relax node template constraints | | Spot interruptions too frequent | Single instance type | Enable spot diversity | | Hibernation not triggering | Schedule timezone wrong | Use IANA timezone format | | Right-sizing too aggressive | Low headroom | Increase memory headroom to 20% |
Resources
Next Steps
For architecture patterns, see castai-reference-architecture.