Agent Skills: k8s

Kubernetes ops skill for deploying, operating, and troubleshooting services on Kubernetes. Use for tasks like writing manifests/Helm, configuring deployments/services/ingress, autoscaling, observability, RBAC, secrets/configmaps, rollout/rollback, incident debugging, and production readiness checks.

Category: Uncategorized
ID: muzhicaomingwang/ai-ideas/k8s

Install this agent skill locally:

pnpm dlx add-skill https://github.com/muzhicaomingwang/ai-ideas/tree/HEAD/.project/ai/ops/skills/k8s

.project/ai/ops/skills/k8s/SKILL.md

k8s

Use this skill for Kubernetes operations and release work.

Defaults / assumptions to confirm

  • Cluster type: managed (EKS/GKE/ACK) vs self-hosted
  • Packaging: raw YAML vs Helm vs Kustomize
  • Ingress: NGINX/ALB/APISIX/Istio
  • Observability stack: Prometheus/Grafana, Loki/ELK, tracing

Workflow

  1. Understand service requirements
  • Ports, protocols, health checks, resources (CPU/mem), storage needs.
  • SLOs: latency, availability, RPO/RTO.
  • Dependencies: DB, cache, MQ, external APIs.
  2. Deployment design
  • Use Deployment for stateless; StatefulSet for stable identities/storage.
  • Define readinessProbe and livenessProbe (and startupProbe if needed).
  • Set resources.requests/limits deliberately; how requests relate to limits determines the pod's QoS class (Guaranteed, Burstable, or BestEffort).
  • Use PodDisruptionBudget for availability during maintenance.
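The deployment design points above can be sketched as a small manifest builder. This is a minimal illustration, not a recommended baseline: the `/healthz` path, the probe timings, and the CPU/memory figures are placeholder assumptions, and `build_deployment` / `build_pdb` are hypothetical helper names. Setting requests equal to limits for both CPU and memory gives a single-container pod the Guaranteed QoS class.

```python
import json


def build_deployment(name: str, image: str, port: int, replicas: int = 2) -> dict:
    """Return an apps/v1 Deployment manifest as a plain dict."""
    labels = {"app": name}
    # Assumption: the service answers HTTP GET /healthz on its main port.
    check = {"httpGet": {"path": "/healthz", "port": port}}
    # Placeholder sizing; real values should come from load tests or metrics.
    sizing = {"cpu": "250m", "memory": "256Mi"}
    container = {
        "name": name,
        "image": image,
        "ports": [{"containerPort": port}],
        # startupProbe delays the other probes until the app has booted.
        "startupProbe": {**check, "periodSeconds": 5, "failureThreshold": 30},
        # readinessProbe gates traffic; livenessProbe triggers restarts.
        "readinessProbe": {**check, "periodSeconds": 5, "failureThreshold": 3},
        "livenessProbe": {**check, "periodSeconds": 10, "failureThreshold": 3},
        # requests == limits for CPU and memory yields Guaranteed QoS here.
        "resources": {"requests": dict(sizing), "limits": dict(sizing)},
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }


def build_pdb(name: str, min_available: int = 1) -> dict:
    """Return a policy/v1 PodDisruptionBudget covering the same pods."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name},
        "spec": {
            "minAvailable": min_available,
            "selector": {"matchLabels": {"app": name}},
        },
    }


if __name__ == "__main__":
    manifest = build_deployment("web", "registry.example.com/web:1.0.0", 8080)
    print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed output can be piped to `kubectl apply -f -` for a quick trial in a scratch namespace.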
  3. Config & secrets
  • Config: ConfigMap (non-sensitive), mounted as files or injected as environment variables.
  • Secrets: Secret (sensitive) + external secret manager if available.
  • Never commit plaintext secrets; prefer sealed/external secrets.
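One way to keep plaintext values out of the repository is to read them from the environment at deploy time and build the Secret manifest on the fly. The sketch below assumes that approach; `build_secret` and `secret_from_env` are hypothetical helper names, and `db-credentials` / `DB_PASSWORD` are example identifiers. Note that the base64 encoding in a Secret's `data` field is an encoding, not encryption, so the generated manifest is as sensitive as the plaintext.

```python
import base64
import os


def build_secret(name: str, values: dict) -> dict:
    """Return a v1 Secret manifest; values are base64-encoded as the API expects."""
    encoded = {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in values.items()
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        "data": encoded,
    }


def secret_from_env(name: str, keys: list, environ=None) -> dict:
    """Build a Secret from environment variables, failing loudly if any is unset."""
    env = os.environ if environ is None else environ
    missing = [key for key in keys if key not in env]
    if missing:
        raise KeyError(f"missing environment variables: {missing}")
    return build_secret(name, {key: env[key] for key in keys})
```

Where a sealed-secrets controller or an external secret manager is available, the same inputs would go to that tool instead and this manifest would never be written to disk.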
  4. Networking
  • Service types and DNS.
  • Ingress/Gateway routing, TLS termination, timeouts.
  • NetworkPolicy, if the cluster's network plugin enforces it.
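As a rough illustration of the networking items, the sketch below builds a ClusterIP Service, derives its in-cluster DNS name, and builds a NetworkPolicy that admits ingress only from pods carrying a given label. The helper names and the `app` label convention are assumptions; `cluster.local` is the usual default cluster domain but can differ per cluster.

```python
def build_service(name: str, port: int, target_port: int,
                  service_type: str = "ClusterIP") -> dict:
    """Return a v1 Service that selects pods labelled app=<name>."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": service_type,
            "selector": {"app": name},
            "ports": [{"port": port, "targetPort": target_port, "protocol": "TCP"}],
        },
    }


def service_dns_name(name: str, namespace: str = "default",
                     cluster_domain: str = "cluster.local") -> str:
    """Fully qualified in-cluster DNS name of a Service."""
    return f"{name}.{namespace}.svc.{cluster_domain}"


def build_ingress_policy(name: str, allowed_app: str, port: int) -> dict:
    """NetworkPolicy: only pods labelled app=<allowed_app> may reach <name> on <port>."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": f"{name}-allow-{allowed_app}"},
        "spec": {
            "podSelector": {"matchLabels": {"app": name}},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    "from": [{"podSelector": {"matchLabels": {"app": allowed_app}}}],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }
```

A policy like this has no effect unless the cluster's network plugin implements NetworkPolicy, which is why that item belongs on the list of assumptions to confirm.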
  5. Scaling & resilience
  • HPA based on CPU/memory/custom metrics.
  • Graceful shutdown (preStop, terminationGracePeriodSeconds).
  • Retry/backoff at client; avoid retry storms.
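Two of the items above reduce to small formulas, sketched here under stated assumptions. `hpa_desired_replicas` follows the documented HPA rule, desired = ceil(current × currentMetric / targetMetric), with the default 10% tolerance; the min/max bounds are example values. `backoff_delay` is one common way to avoid retry storms (capped exponential backoff with full jitter); it is a swapped-in illustration, not something the skill prescribes.

```python
import math
import random


def hpa_desired_replicas(current_replicas: int, current_metric: float,
                         target_metric: float, min_replicas: int = 1,
                         max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Approximate the HPA scaling decision for a single metric."""
    ratio = current_metric / target_metric
    # Within tolerance of the target, the autoscaler leaves the count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


def backoff_delay(attempt: int, base: float = 0.1, cap: float = 10.0,
                  rng=random.random) -> float:
    """Seconds to sleep before retry number `attempt` (0-based), with full jitter."""
    ceiling = min(cap, base * (2 ** attempt))
    # A random point in [0, ceiling) spreads clients out so they do not
    # retry in lockstep after a shared failure.
    return rng() * ceiling
```

For example, 4 replicas averaging 90% CPU against a 60% target scale to 6, while 63% against the same target stays at 4 because it falls inside the tolerance band.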
  6. Observability
  • Standard logs with correlation IDs.
  • Metrics: RPS, p95 latency, error rate, saturation.
  • Alerts and dashboards; runbook links.
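A minimal sketch of "standard logs with correlation IDs", using only the Python standard library. The JSON field names (`ts`, `level`, `correlation_id`, and so on) are assumptions rather than a required schema, and `make_logger` is a hypothetical helper; the point is that every line is machine-parseable and carries the ID of the request that produced it.

```python
import contextvars
import json
import logging

# Holds the ID of the request currently being handled; "-" means none.
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "correlation_id": correlation_id.get(),
        })


def make_logger(name: str, stream=None) -> logging.Logger:
    """Return a logger that writes JSON lines to `stream` (stderr if None)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger
    return logger
```

A request handler would call `correlation_id.set(...)` with the incoming request ID (or a freshly generated one) before doing any work, so that a log aggregator such as Loki or ELK can group all lines for one request.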
  7. Release operations
  • Rolling updates, canary/blue-green if needed.
  • Watch kubectl rollout status and have a rollback plan (kubectl rollout undo) ready.
  • Post-deploy verification checks and smoke tests.
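A post-deploy check can be scripted against the Deployment's own status fields. The sketch below approximates what `kubectl rollout status` waits for; `fetch_deployment` and `rollout_complete` are hypothetical helper names, and the sketch assumes `kubectl` is on the PATH with access to the target cluster.

```python
import json
import subprocess


def fetch_deployment(name: str, namespace: str = "default") -> dict:
    """Read a Deployment as JSON via kubectl (requires cluster access)."""
    result = subprocess.run(
        ["kubectl", "get", "deployment", name, "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def rollout_complete(deployment: dict) -> bool:
    """True when the newest spec is fully rolled out and available."""
    wanted = deployment["spec"].get("replicas", 1)
    generation = deployment["metadata"].get("generation", 0)
    status = deployment.get("status", {})
    return (
        # The controller has seen the latest spec.
        status.get("observedGeneration", 0) >= generation
        # Every replica runs the new pod template.
        and status.get("updatedReplicas", 0) == wanted
        # No old replicas are still waiting to terminate.
        and status.get("replicas", 0) == wanted
        # All replicas pass their readiness checks.
        and status.get("availableReplicas", 0) == wanted
    )
```

If the check never turns true within the agreed window, `kubectl rollout undo deployment/<name>` returns to the previous revision; smoke tests against the service endpoint should run only after the check passes.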
  8. Troubleshooting checklist
  • Inspect pods with kubectl get and kubectl describe, then review events and logs (kubectl logs).
  • Check probes, image pull, env/config, DNS, network, and resource throttling.
  • For performance: node pressure, HPA behavior, GC/heap, connection pool limits.
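The first pass of the checklist can be partly automated by scanning the output of `kubectl get pods -o json` for well-known failure reasons. This is a sketch only: `triage_pods` is a hypothetical helper, the hint texts are suggestions, and the restart threshold of 5 is an arbitrary assumption.

```python
# Waiting reasons reported in containerStatuses, mapped to a next step.
WAITING_HINTS = {
    "ImagePullBackOff": "check image name/tag and registry credentials",
    "ErrImagePull": "check image name/tag and registry credentials",
    "CrashLoopBackOff": "inspect `kubectl logs --previous` for the crash cause",
    "CreateContainerConfigError": "check that referenced ConfigMaps/Secrets exist",
}


def triage_pods(pod_list: dict, restart_threshold: int = 5) -> list:
    """Return human-readable findings for a `kubectl get pods -o json` document."""
    findings = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        statuses = status.get("containerStatuses", [])
        # Pending with no container statuses usually means not yet scheduled.
        if status.get("phase") == "Pending" and not statuses:
            findings.append(f"{name}: Pending, check events with kubectl describe pod")
        for cs in statuses:
            waiting = cs.get("state", {}).get("waiting") or {}
            reason = waiting.get("reason")
            if reason in WAITING_HINTS:
                findings.append(f"{name}/{cs['name']}: {reason}, {WAITING_HINTS[reason]}")
            last = cs.get("lastState", {}).get("terminated") or {}
            if last.get("reason") == "OOMKilled":
                findings.append(f"{name}/{cs['name']}: OOMKilled, review memory limits")
            if cs.get("restartCount", 0) >= restart_threshold:
                findings.append(f"{name}/{cs['name']}: {cs['restartCount']} restarts")
    return findings
```

An empty result does not mean the service is healthy; it only rules out the listed failure modes, after which probes, DNS, network paths, and resource throttling still need a manual look.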

Output expectations when making changes

  • Provide manifests (or Helm values/templates) + brief deployment notes.
  • Include resource sizing rationale and probe settings.
  • Include rollback instructions and verification steps.