CAST AI Cost Tuning Skill

Overview

Maximize Kubernetes cost savings through CAST AI: spot instance strategies, workload right-sizing, cluster hibernation, and savings tracking. Typical savings: 50-70% on cloud compute costs.

Prerequisites

CAST AI Phase 2 enabled with full automation
Savings report available (requires 24h+ of data)
Understanding of workload criticality tiers

Instructions

Step 1: Analyze Current Savings

# Get savings breakdown
curl -s -H "X-API-Key: ${CASTAI_API_KEY}" \
  "https://api.cast.ai/v1/kubernetes/clusters/${CASTAI_CLUSTER_ID}/savings" \
  | jq '{
    currentMonthlyCost: .currentMonthlyCost,
    optimizedMonthlyCost: .optimizedMonthlyCost,
    monthlySavings: .monthlySavings,
    savingsPercentage: .savingsPercentage,
    spotSavings: .spotSavings,
    rightSizingSavings: .rightSizingSavings
  }'
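To show how the breakdown can be read, here is a minimal sketch that turns the response into percentages. It assumes the response carries the numeric fields selected by the jq filter above; the `SavingsBreakdown` type, the `summarizeSavings` helper, and the sample numbers are hypothetical.

```typescript
// Field names mirror the jq filter above; their exact types are an assumption.
interface SavingsBreakdown {
  currentMonthlyCost: number;
  optimizedMonthlyCost: number;
  monthlySavings: number;
  spotSavings: number;
  rightSizingSavings: number;
}

interface SavingsSummary {
  savingsPercent: number; // monthly savings as a share of current cost
  spotShare: number; // share of savings attributed to spot
  rightSizingShare: number; // share of savings attributed to right-sizing
}

function summarizeSavings(s: SavingsBreakdown): SavingsSummary {
  // Percentage rounded to one decimal place; 0 when the denominator is not positive.
  const pct = (part: number, whole: number): number =>
    whole > 0 ? Math.round((part / whole) * 1000) / 10 : 0;
  return {
    savingsPercent: pct(s.monthlySavings, s.currentMonthlyCost),
    spotShare: pct(s.spotSavings, s.monthlySavings),
    rightSizingShare: pct(s.rightSizingSavings, s.monthlySavings),
  };
}

// Made-up numbers, for illustration only.
const exampleSummary = summarizeSavings({
  currentMonthlyCost: 10000,
  optimizedMonthlyCost: 4000,
  monthlySavings: 6000,
  spotSavings: 4500,
  rightSizingSavings: 1500,
});
console.log(exampleSummary);
```

Splitting savings by source makes it easier to decide whether Step 2 (spot) or Step 3 (right-sizing) deserves attention first.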

Step 2: Maximize Spot Usage

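# Enable aggressive spot with diversity and fallbacks.
# PUT typically replaces the whole policy object, so consider fetching the
# current policies first and merging these fields into them.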
curl -X PUT -H "X-API-Key: ${CASTAI_API_KEY}" \
  -H "Content-Type: application/json" \
  "https://api.cast.ai/v1/kubernetes/clusters/${CASTAI_CLUSTER_ID}/policies" \
  -d '{
    "enabled": true,
    "spotInstances": {
      "enabled": true,
      "clouds": ["aws"],
      "spotDiversityEnabled": true,
      "spotDiversityPriceIncreaseLimitPercent": 20,
      "spotBackups": {
        "enabled": true,
        "spotBackupRestoreRateSeconds": 600
      }
    }
  }'
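The price-increase limit can be read as a simple bound. The sketch below is a simplified model, not CAST AI's actual selection logic: it assumes a diversified node configuration is acceptable only when its price stays within `spotDiversityPriceIncreaseLimitPercent` of the cheapest option. The function name and the prices are hypothetical.

```typescript
// Hypothetical model of the price-increase limit: accept a diversified
// configuration only if it costs at most `limitPercent` more than the
// cheapest configuration.
function withinDiversityLimit(
  cheapestHourly: number,
  diversifiedHourly: number,
  limitPercent: number
): boolean {
  const maxAllowed = cheapestHourly * (1 + limitPercent / 100);
  return diversifiedHourly <= maxAllowed;
}

// Illustrative prices, not real quotes.
console.log(withinDiversityLimit(1.0, 1.15, 20)); // true: 15% more expensive
console.log(withinDiversityLimit(1.0, 1.25, 20)); // false: 25% more expensive
```

Under this reading, a limit of 20 trades up to a fifth of the spot discount for a lower chance that one interruption wave takes out every node at once.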

Spot allocation strategy by workload tier:

| Workload Type | Spot % | Rationale |
|---------------|--------|-----------|
| Batch jobs, CI runners | 100% spot | Interruptible, restartable |
| Stateless APIs (behind LB) | 80% spot | Can handle brief interruptions |
| Stateful services, databases | 0% spot | Use on-demand or reserved |
| ML training | 80-100% spot | Checkpointing handles interrupts |
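As a worked example of the tiers above, the following sketch splits a replica count into spot and on-demand capacity. The tier names and the `splitReplicas` helper are hypothetical, and ML training is pinned to the lower bound of its 80-100% range.

```typescript
type WorkloadTier = "batch" | "stateless-api" | "stateful" | "ml-training";

// Target spot share per tier, in whole percent, taken from the table above.
// ML training uses the lower bound of its 80-100% range (an assumption).
const SPOT_PERCENT: Record<WorkloadTier, number> = {
  batch: 100,
  "stateless-api": 80,
  stateful: 0,
  "ml-training": 80,
};

function splitReplicas(
  tier: WorkloadTier,
  replicas: number
): { spot: number; onDemand: number } {
  // Round the spot count down so the on-demand share never falls below target.
  const spot = Math.floor((replicas * SPOT_PERCENT[tier]) / 100);
  return { spot, onDemand: replicas - spot };
}

console.log(splitReplicas("stateless-api", 10)); // { spot: 8, onDemand: 2 }
console.log(splitReplicas("stateful", 3)); // { spot: 0, onDemand: 3 }
```

Rounding down is a deliberate choice here: for small replica counts it keeps at least one on-demand replica for any tier below 100% spot.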

Step 3: Workload Right-Sizing

# Get resource waste analysis
curl -s -H "X-API-Key: ${CASTAI_API_KEY}" \
  "https://api.cast.ai/v1/workload-autoscaling/clusters/${CASTAI_CLUSTER_ID}/workloads" \
  | jq '[.items[] | select(.estimatedSavingsPercent > 20) | {
    name: .workloadName,
    namespace: .namespace,
    wastedCpu: (.currentCpuRequest - .recommendedCpuRequest),
    wastedMemory: (.currentMemoryRequest - .recommendedMemoryRequest),
    savingsPercent: .estimatedSavingsPercent
  }] | sort_by(-.savingsPercent) | .[0:10]'
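The same ranking can be done in application code, for instance inside the reporting script from Step 5. This is a sketch that assumes each item carries the fields used in the jq filter above; the `WorkloadItem` type, the `topWasters` helper, the units, and the sample data are hypothetical.

```typescript
// Field names mirror the jq filter above; the real response shape is an assumption.
interface WorkloadItem {
  workloadName: string;
  namespace: string;
  currentCpuRequest: number;
  recommendedCpuRequest: number;
  currentMemoryRequest: number;
  recommendedMemoryRequest: number;
  estimatedSavingsPercent: number;
}

interface WasteEntry {
  name: string;
  namespace: string;
  wastedCpu: number;
  wastedMemory: number;
  savingsPercent: number;
}

function topWasters(
  items: WorkloadItem[],
  minSavingsPercent = 20,
  limit = 10
): WasteEntry[] {
  return items
    .filter((w) => w.estimatedSavingsPercent > minSavingsPercent)
    .map((w) => ({
      name: w.workloadName,
      namespace: w.namespace,
      wastedCpu: w.currentCpuRequest - w.recommendedCpuRequest,
      wastedMemory: w.currentMemoryRequest - w.recommendedMemoryRequest,
      savingsPercent: w.estimatedSavingsPercent,
    }))
    .sort((a, b) => b.savingsPercent - a.savingsPercent) // highest savings first
    .slice(0, limit);
}

// Made-up workloads; CPU in millicores and memory in MiB are assumed units.
const sampleWorkloads: WorkloadItem[] = [
  { workloadName: "api", namespace: "prod", currentCpuRequest: 2000, recommendedCpuRequest: 800,
    currentMemoryRequest: 4096, recommendedMemoryRequest: 2048, estimatedSavingsPercent: 55 },
  { workloadName: "worker", namespace: "prod", currentCpuRequest: 1000, recommendedCpuRequest: 900,
    currentMemoryRequest: 1024, recommendedMemoryRequest: 1000, estimatedSavingsPercent: 8 },
  { workloadName: "cron", namespace: "ops", currentCpuRequest: 500, recommendedCpuRequest: 200,
    currentMemoryRequest: 512, recommendedMemoryRequest: 256, estimatedSavingsPercent: 60 },
];
const wasters = topWasters(sampleWorkloads);
console.log(wasters.map((w) => w.name)); // [ 'cron', 'api' ]
```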

Step 4: Cluster Hibernation (Dev/Staging)

# Hibernate non-production clusters during off-hours
# Scales nodes to zero and resumes on demand

# Enable hibernation
curl -X POST -H "X-API-Key: ${CASTAI_API_KEY}" \
  -H "Content-Type: application/json" \
  "https://api.cast.ai/v1/kubernetes/clusters/${CASTAI_CLUSTER_ID}/hibernate" \
  -d '{
    "schedule": {
      "enabled": true,
      "hibernateAt": "20:00",
      "wakeUpAt": "08:00",
      "timezone": "America/New_York",
      "weekdaysOnly": true
    }
  }'
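To estimate what a schedule like this is worth, the sketch below counts the hours per week the cluster spends hibernated. It assumes that `weekdaysOnly` means the cluster wakes only Monday through Friday and stays hibernated over the weekend, and that wake-up precedes hibernation on the same day; the `weeklyHibernatedHours` helper is hypothetical.

```typescript
interface HibernationSchedule {
  hibernateAt: string; // "HH:MM", local time
  wakeUpAt: string; // "HH:MM", local time
  weekdaysOnly: boolean;
}

// Convert "HH:MM" to fractional hours since midnight.
function toHours(hhmm: string): number {
  const parts = hhmm.split(":");
  return Number(parts[0]) + Number(parts[1] ?? "0") / 60;
}

function weeklyHibernatedHours(s: HibernationSchedule): number {
  // Hours awake on each active day (wake-up assumed earlier than hibernation).
  const awakePerDay = toHours(s.hibernateAt) - toHours(s.wakeUpAt);
  // Assumption: with weekdaysOnly the cluster stays down all weekend.
  const activeDays = s.weekdaysOnly ? 5 : 7;
  return 7 * 24 - awakePerDay * activeDays;
}

const hibernated = weeklyHibernatedHours({
  hibernateAt: "20:00",
  wakeUpAt: "08:00",
  weekdaysOnly: true,
});
console.log(hibernated); // 108 hours, about 64% of a 168-hour week
```

Node-hours avoided are an upper bound on the saving: charges that continue while hibernated, such as storage or a managed control plane, are not covered by this estimate.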

Step 5: Cost Tracking Dashboard

interface CostReport {
  cluster: string;
  period: string;
  currentCost: number;
  optimizedCost: number;
  savings: number;
  spotPercent: number; // share of nodes (by count, not cost) running on spot
}

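// Builds one report per cluster. `castaiGet` is assumed to be a helper defined
// elsewhere that issues an authenticated GET against the CAST AI API and
// returns the parsed JSON body.
async function generateMonthlyCostReport(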
  clusterIds: string[]
): Promise<CostReport[]> {
  const reports: CostReport[] = [];

  for (const clusterId of clusterIds) {
    const [cluster, savings, nodes] = await Promise.all([
      castaiGet(`/v1/kubernetes/external-clusters/${clusterId}`),
      castaiGet(`/v1/kubernetes/clusters/${clusterId}/savings`),
      castaiGet(`/v1/kubernetes/external-clusters/${clusterId}/nodes`),
    ]);

    const spotNodes = nodes.items.filter(
      (n: { lifecycle: string }) => n.lifecycle === "spot"
    ).length;

    reports.push({
      cluster: cluster.name,
      period: new Date().toISOString().slice(0, 7),
      currentCost: savings.currentMonthlyCost,
      optimizedCost: savings.optimizedMonthlyCost,
      savings: savings.monthlySavings,
      spotPercent:
        nodes.items.length > 0
          ? (spotNodes / nodes.items.length) * 100
          : 0,
    });
  }

  return reports;
}
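For a dashboard headline, the per-cluster reports can be rolled up into fleet-wide totals. The following is a minimal sketch; it redeclares `CostReport` so that it stands alone, and the `summarizeReports` helper and the sample figures are hypothetical.

```typescript
// Same shape as the CostReport interface above, repeated so the sketch is self-contained.
interface CostReport {
  cluster: string;
  period: string;
  currentCost: number;
  optimizedCost: number;
  savings: number;
  spotPercent: number;
}

interface FleetSummary {
  totalCurrentCost: number;
  totalSavings: number;
  savingsPercent: number; // total savings as a share of total current cost
}

function summarizeReports(reports: CostReport[]): FleetSummary {
  const totalCurrentCost = reports.reduce((sum, r) => sum + r.currentCost, 0);
  const totalSavings = reports.reduce((sum, r) => sum + r.savings, 0);
  return {
    totalCurrentCost,
    totalSavings,
    // Weight by cost rather than averaging per-cluster percentages.
    savingsPercent:
      totalCurrentCost > 0 ? (totalSavings / totalCurrentCost) * 100 : 0,
  };
}

// Made-up figures, for illustration only.
const fleet = summarizeReports([
  { cluster: "prod", period: "2024-01", currentCost: 8000, optimizedCost: 4000, savings: 4000, spotPercent: 70 },
  { cluster: "staging", period: "2024-01", currentCost: 2000, optimizedCost: 1000, savings: 1000, spotPercent: 100 },
]);
console.log(fleet); // { totalCurrentCost: 10000, totalSavings: 5000, savingsPercent: 50 }
```

Weighting by cost keeps a small, heavily optimized staging cluster from inflating the fleet-wide percentage.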

Cost Optimization Checklist

[ ] Spot instances enabled with diversity
[ ] Workload autoscaler right-sizing resources
[ ] Dev/staging clusters hibernated off-hours
[ ] Empty node downscaler enabled
[ ] Instance families include latest generation (cheaper)
[ ] Reserved/savings plan for baseline on-demand nodes
[ ] Weekly savings report review

Error Handling

| Issue | Cause | Solution |
|-------|-------|----------|
| Savings lower than expected | Too many on-demand constraints | Relax node template constraints |
| Spot interruptions too frequent | Single instance type | Enable spot diversity |
| Hibernation not triggering | Schedule timezone wrong | Use IANA timezone format |
| Right-sizing too aggressive | Low headroom | Increase memory headroom to 20% |
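To make the headroom fix concrete, here is a simplified model of what a headroom percentage does to a memory request. It is an illustration only and not CAST AI's actual recommendation algorithm: it assumes the request is the observed peak plus the headroom percentage. The `requestWithHeadroom` helper and the figures are hypothetical.

```typescript
// Simplified model (an assumption): request = observed peak + headroom percent.
function requestWithHeadroom(peakMiB: number, headroomPercent: number): number {
  // Round up so the request never undershoots the intended headroom.
  return Math.ceil((peakMiB * (100 + headroomPercent)) / 100);
}

// A workload peaking at 500 MiB, with made-up headroom settings.
console.log(requestWithHeadroom(500, 5)); // 525
console.log(requestWithHeadroom(500, 20)); // 600
```

In this model, raising headroom from 5% to 20% gives the workload 75 MiB more room before it reaches its request, at the price of a smaller right-sizing saving.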

Next Steps

For architecture patterns, see castai-reference-architecture.
