Kubernetes Operations Skill

Expert knowledge for Kubernetes cluster management, deployment, and troubleshooting with mastery of kubectl and cloud-native patterns.

When to Use This Skill

| Use this skill when... | Use <sibling> instead when... |
|---|---|
| Working with kubectl against pods, deployments, services, ingress, ConfigMaps, or Secrets | Use kubectl-debugging when you specifically need kubectl debug ephemeral containers or node sessions |
| Applying or inspecting raw Kubernetes manifests and kustomize overlays | Use helm-release-management when the workload is delivered as a Helm chart |
| Diagnosing cluster-level networking, storage, or workload health | Use argocd-login when the issue is authenticating to ArgoCD before any cluster operation |

Core Expertise

Kubernetes Operations

Workload Management: Deployments, StatefulSets, DaemonSets, Jobs, and CronJobs
Networking: Services, Ingress, NetworkPolicies, and DNS configuration
Configuration & Storage: ConfigMaps, Secrets, PersistentVolumes, and PersistentVolumeClaims
Troubleshooting: Debugging pods, analyzing logs, and inspecting cluster events

Cluster Operations Process

Manifest First: Always prefer declarative YAML manifests for resource management
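Validate & Dry-Run: Use kubectl apply --dry-run=client -f <manifest> to validate changes before applying them; --dry-run=server additionally runs the request through the API server without persisting it
Inspect & Verify: After applying changes, verify with kubectl get, kubectl describe, and kubectl logs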
Monitor Health: Continuously check status of nodes, pods, and services
Clean Up: Ensure old or unused resources are properly garbage collected
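
The validate, apply, and verify steps above can be chained in a small helper. This is a minimal sketch, not a definitive implementation: `safe_apply` and its argument order are hypothetical, and it assumes the manifest defines a Deployment whose name you pass in.

```shell
# safe_apply (hypothetical helper): dry-run, apply, then wait for the rollout.
# Usage: safe_apply <context> <namespace> <manifest> <deployment-name>
safe_apply() {
  local ctx="$1" ns="$2" manifest="$3" deploy="$4"
  if [ -z "$ctx" ] || [ -z "$ns" ] || [ -z "$manifest" ] || [ -z "$deploy" ]; then
    echo "usage: safe_apply <context> <namespace> <manifest> <deployment>" >&2
    return 2
  fi
  # 1. Validate the manifest without changing the cluster; stop on failure.
  kubectl --context="$ctx" -n "$ns" apply --dry-run=client -f "$manifest" || return 1
  # 2. Apply the manifest for real.
  kubectl --context="$ctx" -n "$ns" apply -f "$manifest" || return 1
  # 3. Verify: block until the rollout completes or the timeout expires.
  kubectl --context="$ctx" -n "$ns" rollout status "deployment/$deploy" --timeout=120s
}
```

For example, `safe_apply staging-cluster default deployment.yaml web` stops at the first failing step, so a manifest that fails validation is never applied.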

Essential Commands

# Resource management
kubectl apply -f manifest.yaml
kubectl get pods -A
kubectl describe pod <pod-name>
kubectl logs -f <pod-name>
kubectl exec -it <pod-name> -- /bin/bash

# Debugging
kubectl get events --sort-by='.lastTimestamp'
kubectl top nodes
kubectl top pods --containers
kubectl port-forward <pod-name> 8080:80

# Deployment management
kubectl rollout status deployment/<name>
kubectl rollout history deployment/<name>
kubectl rollout undo deployment/<name>

# Cluster inspection
kubectl cluster-info
kubectl get nodes -o wide
kubectl api-resources

Key Debugging Patterns

Pod Debugging

# Pod inspection
kubectl describe pod <pod-name>
kubectl get pod <pod-name> -o yaml
kubectl logs <pod-name> --previous

# Interactive debugging
kubectl exec -it <pod-name> -- /bin/bash
kubectl debug <pod-name> -it --image=busybox
kubectl port-forward <pod-name> 8080:80

Networking Troubleshooting

# Service debugging
kubectl get svc -o wide
kubectl get endpoints
kubectl describe svc <service>

# Network connectivity
kubectl run test-pod --image=busybox -it --rm -- sh
# Inside pod: nslookup, wget, nc commands
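
A Service with no ready endpoints usually means its selector does not match any pod labels, or the matching pods are not passing readiness. The sketch below turns that check into a single call; `svc_has_endpoints` is a hypothetical helper name, and it assumes the cluster still serves the core Endpoints resource used by `kubectl get endpoints` above.

```shell
# svc_has_endpoints (hypothetical helper): print the ready endpoint IPs of a
# Service, or return 1 if there are none.
# Usage: svc_has_endpoints <context> <namespace> <service>
svc_has_endpoints() {
  local ctx="$1" ns="$2" svc="$3" ips
  # jsonpath yields a space-separated list of ready pod IPs (empty if none).
  ips="$(kubectl --context="$ctx" -n "$ns" get endpoints "$svc" \
    -o jsonpath='{.subsets[*].addresses[*].ip}')" || return 2
  if [ -z "$ips" ]; then
    echo "service $svc has no ready endpoints; compare its selector with pod labels" >&2
    return 1
  fi
  echo "$ips"
}
```

If the function returns 1, `kubectl describe svc <service>` and `kubectl get pods --show-labels` are the natural next steps for comparing selector and labels.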

Common Issues

# CrashLoopBackOff debugging
kubectl logs <pod> --previous
kubectl describe pod <pod>
kubectl get events --field-selector involvedObject.name=<pod>

# Resource constraints
kubectl top pod <pod>
kubectl describe pod <pod> | grep -A 5 Limits

# Resource state inspection
kubectl get all -n <namespace>
kubectl get <resource> <name> -o yaml
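
The CrashLoopBackOff commands above can be gathered into one report. This is a sketch under stated assumptions: `crashloop_report` is a hypothetical helper, the 50-line log bound is arbitrary, and the jsonpath query assumes the pod has already reported container statuses.

```shell
# crashloop_report (hypothetical helper): collect the usual CrashLoopBackOff
# evidence for one pod in a single pass.
# Usage: crashloop_report <context> <namespace> <pod>
crashloop_report() {
  local ctx="$1" ns="$2" pod="$3"
  if [ -z "$ctx" ] || [ -z "$ns" ] || [ -z "$pod" ]; then
    echo "usage: crashloop_report <context> <namespace> <pod>" >&2
    return 2
  fi
  echo "== previous container logs (last 50 lines) =="
  kubectl --context="$ctx" -n "$ns" logs "$pod" --previous --tail=50 \
    || echo "(no previous container logs available)"
  echo "== container name, restart count, last exit reason and code =="
  kubectl --context="$ctx" -n "$ns" get pod "$pod" -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\t"}{.lastState.terminated.reason}{"\t"}{.lastState.terminated.exitCode}{"\n"}{end}'
  echo "== events for this pod =="
  kubectl --context="$ctx" -n "$ns" get events \
    --field-selector "involvedObject.name=$pod" --sort-by=.lastTimestamp
}
```

A last exit reason of `OOMKilled` points at memory limits, while an application error with a non-zero exit code usually shows up in the previous container's logs.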

Best Practices

Context Safety (CRITICAL)

Always specify --context explicitly in every kubectl command
Never rely on the current context - it may have been changed by another process
Use kubectl --context=<context-name> get pods format for all operations
This prevents accidental operations on the wrong cluster (e.g., applying a change meant for staging to production)

# CORRECT: Explicit context
kubectl --context=gke_myproject_us-central1_prod get pods
kubectl --context=staging-cluster apply -f deployment.yaml

# WRONG: Relying on current context
kubectl get pods  # Which cluster is this targeting?
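
One way to make the explicit-context rule hard to forget is a thin wrapper that refuses to run without a known context. This is a minimal sketch: `kctx` is a hypothetical function name, and it assumes the context names in your kubeconfig are what `kubectl config get-contexts -o name` prints.

```shell
# kctx (hypothetical wrapper): run kubectl only with an explicit, known context.
# Usage: kctx <context-name> <kubectl args...>
kctx() {
  local ctx="${1:-}"
  if [ -z "$ctx" ]; then
    echo "kctx: missing context name" >&2
    return 2
  fi
  # Reject anything that is not an exact context name from the kubeconfig,
  # which also catches a forgotten context such as `kctx get pods`.
  if ! kubectl config get-contexts -o name | grep -Fxq -- "$ctx"; then
    echo "kctx: unknown context '$ctx'" >&2
    return 2
  fi
  shift
  kubectl --context="$ctx" "$@"
}
```

With this in place, `kctx staging-cluster get pods` runs against the named cluster, while `kctx get pods` fails fast because `get` is not a context in the kubeconfig.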

Resource Definitions

Use declarative YAML manifests
Implement proper labels and selectors
Define resource requests and limits
Configure health checks (liveness/readiness probes)
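
The four points above can be seen together in one manifest. The sketch below prints a Deployment from a shell heredoc so it can be piped straight into a dry run; `render_deployment` is a hypothetical helper, and the replica count, port 80, probe path `/`, and resource figures are placeholder assumptions to be tuned per workload.

```shell
# render_deployment (hypothetical helper): print a Deployment manifest with
# matching labels and selector, resource requests/limits, and both probes.
# Usage: render_deployment <name> <image>
render_deployment() {
  local name="$1" image="$2"
  cat <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${name}
  labels:
    app: ${name}
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ${name}        # must match the pod template labels below
  template:
    metadata:
      labels:
        app: ${name}
    spec:
      containers:
        - name: ${name}
          image: ${image}
          ports:
            - containerPort: 80
          resources:
            requests: {cpu: 100m, memory: 128Mi}   # placeholder values
            limits: {cpu: 500m, memory: 256Mi}     # placeholder values
          readinessProbe:
            httpGet: {path: /, port: 80}
            periodSeconds: 10
          livenessProbe:
            httpGet: {path: /, port: 80}
            initialDelaySeconds: 15
            periodSeconds: 20
EOF
}
```

A typical check would be `render_deployment web nginx:1.27 | kubectl --context=staging-cluster apply --dry-run=client -f -`, where the image tag and context name are only examples.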

Security

Use NetworkPolicies to restrict traffic
Implement RBAC for access control
Store sensitive data in Secrets
Run containers as non-root users

Monitoring

Configure proper logging and metrics
Set up alerts for critical conditions
Use health checks and readiness probes
Monitor resource usage and quotas

Agentic Optimizations

| Context | Command |
|---------|---------|
| Pod status (structured) | kubectl get pods -n <ns> -o json \| jq '.items[] \| {name:.metadata.name, status:.status.phase}' |
| Quick overview | kubectl get pods -n <ns> -o wide |
| Events (compact) | kubectl get events -n <ns> --sort-by='.lastTimestamp' -o json |
| Resource details | kubectl get <resource> -o json |
| Logs (bounded) | kubectl logs <pod> -n <ns> --tail=50 |

For detailed debugging commands, troubleshooting patterns, Helm workflows, and advanced K8s operations, see REFERENCE.md.
