GKE Expert Skill | Agent Skills

GKE Expert

Initial Assessment

When a user requests GKE help, determine:

Cluster type: Autopilot or Standard?
Task: Create, Deploy, Scale, Troubleshoot, or Optimize?
Environment: Dev, Staging, or Production?
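For an existing cluster, the cluster-type question can usually be answered from the cluster itself. The following is a minimal sketch, not an official tool: `cluster_mode` is a hypothetical helper name, and it assumes a regional cluster and that the `autopilot.enabled` field of `gcloud container clusters describe` is populated for Autopilot clusters.

```bash
# cluster_mode CLUSTER_NAME REGION
# Prints "autopilot" or "standard" for an existing cluster.
# Hypothetical helper; assumes a regional cluster (use --zone for a zonal one).
cluster_mode() {
  local enabled
  enabled=$(gcloud container clusters describe "$1" \
    --region="$2" \
    --format='value(autopilot.enabled)') || return 1
  case "$enabled" in
    True|true) echo "autopilot" ;;
    *)         echo "standard" ;;   # empty or False means Standard
  esac
}
```

Calling `cluster_mode CLUSTER_NAME REGION` before choosing a workflow avoids suggesting node-pool commands on a cluster where they do not apply.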

Quick Start Workflows

Create Cluster

Autopilot (recommended for most):

```bash
gcloud container clusters create-auto CLUSTER_NAME \
  --region=REGION \
  --release-channel=regular
```

Standard (for specific node requirements):

```bash
gcloud container clusters create CLUSTER_NAME \
  --zone=ZONE \
  --num-nodes=3 \
  --enable-autoscaling \
  --min-nodes=2 \
  --max-nodes=10
```

Always authenticate after creation (for a zonal cluster, pass --zone=ZONE instead of --region):

```bash
gcloud container clusters get-credentials CLUSTER_NAME --region=REGION
```

Deploy Application

Create deployment manifest:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: APP_NAME
spec:
  replicas: 3
  selector:
    matchLabels:
      app: APP_NAME
  template:
    metadata:
      labels:
        app: APP_NAME
    spec:
      containers:
      - name: APP_NAME
        image: gcr.io/PROJECT_ID/IMAGE:TAG
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 512Mi
```

Apply and expose:

```bash
kubectl apply -f deployment.yaml
kubectl expose deployment APP_NAME --type=LoadBalancer --port=80 --target-port=8080
```

Setup Autoscaling

HPA for pods:

```bash
kubectl autoscale deployment APP_NAME --cpu-percent=70 --min=2 --max=100
```

Cluster autoscaling (Standard only):

```bash
gcloud container clusters update CLUSTER_NAME \
  --enable-autoscaling --min-nodes=2 --max-nodes=10 --zone=ZONE
```

Configure Workload Identity

Enable on cluster (Standard clusters only; Autopilot clusters have Workload Identity enabled by default):

```bash
gcloud container clusters update CLUSTER_NAME \
  --workload-pool=PROJECT_ID.svc.id.goog
```

Link service accounts:

```bash
# Create GCP service account
gcloud iam service-accounts create GSA_NAME

# Create K8s service account
kubectl create serviceaccount KSA_NAME

# Bind them
gcloud iam service-accounts add-iam-policy-binding \
  GSA_NAME@PROJECT_ID.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:PROJECT_ID.svc.id.goog[default/KSA_NAME]"

# Annotate K8s SA
kubectl annotate serviceaccount KSA_NAME \
  iam.gke.io/gcp-service-account=GSA_NAME@PROJECT_ID.iam.gserviceaccount.com
```

Troubleshooting Guide

Pod Issues

```bash
# Pod not starting - check events
kubectl describe pod POD_NAME
kubectl get events --field-selector involvedObject.name=POD_NAME
```

Common fixes:

ImagePullBackOff: Check that the image exists and that pull secrets are configured

CrashLoopBackOff: kubectl logs POD_NAME --previous

Pending: kubectl describe nodes (check resources)

OOMKilled: Increase memory limits
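The list above pairs each pod status with a first diagnostic step. As a rough sketch, that mapping can be written as a lookup; `next_step` is a hypothetical helper name, it only prints the suggested command rather than running it, and treating `ErrImagePull` the same as `ImagePullBackOff` is an assumption added here.

```bash
# next_step STATUS POD_NAME
# Prints the first diagnostic command to try for a given pod status.
# Hypothetical helper: it echoes a command, it does not execute anything.
next_step() {
  case "$1" in
    ImagePullBackOff|ErrImagePull)
      echo "kubectl describe pod $2" ;;          # check image name and pull secrets
    CrashLoopBackOff)
      echo "kubectl logs $2 --previous" ;;       # logs from the crashed container
    Pending)
      echo "kubectl describe nodes" ;;           # check allocatable resources
    OOMKilled)
      echo "kubectl get pod $2 -o jsonpath='{.spec.containers[*].resources}'" ;;
    *)
      echo "kubectl describe pod $2" ;;          # default: read the events
  esac
}
```

For example, `next_step CrashLoopBackOff web-1` prints `kubectl logs web-1 --previous`.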

Service Issues

```bash
# No endpoints
kubectl get endpoints SERVICE_NAME
kubectl get pods -l app=APP_NAME  # Check if pods match selector

# Test connectivity
kubectl run test --image=busybox -it --rm -- wget -O- SERVICE_NAME
```

Performance Issues

```bash
# Check resource usage
kubectl top nodes
kubectl top pods --all-namespaces

# Find bottlenecks
kubectl describe resourcequotas
kubectl describe limitranges
```

Production Patterns

Ingress with HTTPS

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: APP_NAME-ingress
  annotations:
    networking.gke.io/managed-certificates: "CERT_NAME"
spec:
  rules:
  - host: example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: APP_NAME
            port:
              number: 80
```

Pod Disruption Budget

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: APP_NAME-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: APP_NAME
```

Security Context

```yaml
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
  containers:
  - name: app
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]
```

Cost Optimization

Use Autopilot for automatic right-sizing
Enable cluster autoscaling with appropriate limits
Use Spot VMs for non-critical workloads:

```bash
gcloud container node-pools create spot-pool \
  --cluster=CLUSTER_NAME \
  --spot \
  --num-nodes=2
```

Set resource requests/limits appropriately
Use VPA for recommendations: kubectl describe vpa APP_NAME-vpa
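Creating a Spot node pool does not by itself move any workload onto it. One way to steer a non-critical Deployment to Spot nodes is a node selector on the `cloud.google.com/gke-spot` label that GKE sets on Spot nodes. The sketch below assumes that approach; `pin_to_spot` is a hypothetical helper name, and a real setup might prefer node affinity or taints and tolerations instead.

```bash
# pin_to_spot DEPLOYMENT
# Adds a nodeSelector so the Deployment's pods schedule only on Spot nodes.
# Hypothetical helper; patching the pod template triggers a new rollout.
pin_to_spot() {
  kubectl patch deployment "$1" --type=strategic \
    -p '{"spec":{"template":{"spec":{"nodeSelector":{"cloud.google.com/gke-spot":"true"}}}}}'
}
```

Because Spot nodes can be reclaimed at any time, this is only appropriate for workloads that tolerate interruption.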

Essential Commands

```bash
# Cluster management
gcloud container clusters list
kubectl config get-contexts
kubectl cluster-info

# Deployments
kubectl rollout status deployment/APP_NAME
kubectl rollout undo deployment/APP_NAME
kubectl scale deployment APP_NAME --replicas=5

# Debugging
kubectl logs -f POD_NAME --tail=50
kubectl exec -it POD_NAME -- /bin/bash
kubectl port-forward pod/POD_NAME 8080:80

# Monitoring
kubectl top nodes
kubectl top pods
kubectl get events --sort-by='.lastTimestamp'
```
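The two rollout commands above are often used together: wait for a rollout, and revert if it does not complete. A minimal sketch of that pattern follows; `rollout_or_undo` is a hypothetical helper name and the 120s default timeout is an arbitrary assumption.

```bash
# rollout_or_undo DEPLOYMENT [TIMEOUT]
# Waits for the rollout to finish; on failure or timeout, reverts to the
# previous revision and returns non-zero. Hypothetical helper.
rollout_or_undo() {
  if kubectl rollout status "deployment/$1" --timeout="${2:-120s}"; then
    echo "rollout of $1 succeeded"
  else
    echo "rollout of $1 failed, reverting" >&2
    kubectl rollout undo "deployment/$1"
    return 1
  fi
}
```

A typical call after `kubectl apply -f deployment.yaml` would be `rollout_or_undo APP_NAME 300s`.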

External Documentation

For detailed documentation beyond this skill:

Official GKE Docs: https://cloud.google.com/kubernetes-engine/docs
kubectl Reference: https://kubernetes.io/docs/reference/kubectl/
GKE Best Practices: https://cloud.google.com/kubernetes-engine/docs/best-practices
Workload Identity: https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity
GKE Pricing Calculator: https://cloud.google.com/products/calculator

Cleanup

```bash
kubectl delete all -l app=APP_NAME
kubectl drain NODE_NAME --ignore-daemonsets
```

Advanced Topics Reference

For complex scenarios, consult:

Stateful workloads: Use StatefulSets with persistent volumes
Batch jobs: Use Jobs/CronJobs with appropriate backoff policies
Multi-region: Use Multi-cluster Ingress or Traffic Director
Service mesh: Install Anthos Service Mesh for advanced networking
GitOps: Implement Config Sync or Flux for declarative management
Monitoring: Integrate with Cloud Monitoring or install Prometheus
