GKE Expert
Initial Assessment When user requests GKE help, determine:
Cluster type: Autopilot or Standard? Task: Create, Deploy, Scale, Troubleshoot, or Optimize? Environment: Dev, Staging, or Production?
Quick Start Workflows
Create Cluster
Autopilot (recommended for most):
bashgcloud container clusters create-auto CLUSTER_NAME
--region=REGION
--release-channel=regular
Standard (for specific node requirements):
bashgcloud container clusters create CLUSTER_NAME
--zone=ZONE
--num-nodes=3
--enable-autoscaling
--min-nodes=2
--max-nodes=10
Always authenticate after creation:
bashgcloud container clusters get-credentials CLUSTER_NAME --region=REGION
Deploy Application
Create deployment manifest:
yamlapiVersion: apps/v1 kind: Deployment metadata: name: APP_NAME spec: replicas: 3 selector: matchLabels: app: APP_NAME template: metadata: labels: app: APP_NAME spec: containers: - name: APP_NAME image: gcr.io/PROJECT_ID/IMAGE:TAG ports: - containerPort: 8080 resources: requests: cpu: 100m memory: 128Mi limits: cpu: 500m memory: 512Mi
Apply and expose:
bashkubectl apply -f deployment.yaml
kubectl expose deployment APP_NAME --type=LoadBalancer --port=80 --target-port=8080
Setup Autoscaling
HPA for pods:
bashkubectl autoscale deployment APP_NAME --cpu-percent=70 --min=2 --max=100
Cluster autoscaling (Standard only):
bashgcloud container clusters update CLUSTER_NAME
--enable-autoscaling --min-nodes=2 --max-nodes=10 --zone=ZONE
Configure Workload Identity
Enable on cluster:
bashgcloud container clusters update CLUSTER_NAME
--workload-pool=PROJECT_ID.svc.id.goog
Link service accounts:
bash# Create GCP service account gcloud iam service-accounts create GSA_NAME
Create K8s service account
kubectl create serviceaccount KSA_NAME
Bind them
gcloud iam service-accounts add-iam-policy-binding
GSA_NAME@PROJECT_ID.iam.gserviceaccount.com
--role roles/iam.workloadIdentityUser
--member "serviceAccount:PROJECT_ID.svc.id.goog[default/KSA_NAME]"
Annotate K8s SA
kubectl annotate serviceaccount KSA_NAME
iam.gke.io/gcp-service-account=GSA_NAME@PROJECT_ID.iam.gserviceaccount.com
Troubleshooting Guide
Pod Issues
bash# Pod not starting - check events
kubectl describe pod POD_NAME
kubectl get events --field-selector involvedObject.name=POD_NAME
Common fixes:
ImagePullBackOff: Check image exists and pull secrets
CrashLoopBackOff: kubectl logs POD_NAME --previous
Pending: kubectl describe nodes (check resources)
OOMKilled: Increase memory limits
Service Issues bash# No endpoints kubectl get endpoints SERVICE_NAME kubectl get pods -l app=APP_NAME # Check if pods match selector
Test connectivity
kubectl run test --image=busybox -it --rm -- wget -O- SERVICE_NAME Performance Issues bash# Check resource usage kubectl top nodes kubectl top pods --all-namespaces
Find bottlenecks
kubectl describe resourcequotas kubectl describe limitranges Production Patterns Ingress with HTTPS yamlapiVersion: networking.k8s.io/v1 kind: Ingress metadata: name: APP_NAME-ingress annotations: networking.gke.io/managed-certificates: "CERT_NAME" spec: rules:
- host: example.com
http:
paths:
- path: / pathType: Prefix backend: service: name: APP_NAME port: number: 80 Pod Disruption Budget yamlapiVersion: policy/v1 kind: PodDisruptionBudget metadata: name: APP_NAME-pdb spec: minAvailable: 1 selector: matchLabels: app: APP_NAME Security Context yamlspec: securityContext: runAsNonRoot: true runAsUser: 1000 containers:
- name: app securityContext: allowPrivilegeEscalation: false readOnlyRootFilesystem: true capabilities: drop: ["ALL"] Cost Optimization
Use Autopilot for automatic right-sizing Enable cluster autoscaling with appropriate limits Use Spot VMs for non-critical workloads:
bashgcloud container node-pools create spot-pool
--cluster=CLUSTER_NAME
--spot
--num-nodes=2
Set resource requests/limits appropriately Use VPA for recommendations: kubectl describe vpa APP_NAME-vpa
Essential Commands bash# Cluster management gcloud container clusters list kubectl config get-contexts kubectl cluster-info
Deployments
kubectl rollout status deployment/APP_NAME kubectl rollout undo deployment/APP_NAME kubectl scale deployment APP_NAME --replicas=5
Debugging
kubectl logs -f POD_NAME --tail=50 kubectl exec -it POD_NAME -- /bin/bash kubectl port-forward pod/POD_NAME 8080:80
Monitoring
kubectl top nodes kubectl top pods kubectl get events --sort-by='.lastTimestamp'
External Documentation
For detailed documentation beyond this skill:
- Official GKE Docs: https://cloud.google.com/kubernetes-engine/docs
- kubectl Reference: https://kubernetes.io/docs/reference/kubectl/
- GKE Best Practices: https://cloud.google.com/kubernetes-engine/docs/best-practices
- Workload Identity: https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity
- GKE Pricing Calculator: https://cloud.google.com/products/calculator
Cleanup
kubectl delete all -l app=APP_NAME kubectl drain NODE_NAME --ignore-daemonsets Advanced Topics Reference
For complex scenarios, consult:
Stateful workloads: Use StatefulSets with persistent volumes Batch jobs: Use Jobs/CronJobs with appropriate backoff policies Multi-region: Use Multi-cluster Ingress or Traffic Director Service mesh: Install Anthos Service Mesh for advanced networking GitOps: Implement Config Sync or Flux for declarative management Monitoring: Integrate with Cloud Monitoring or install Prometheus