Agent Skills: Multi-Cluster Kubernetes

Multi-cluster Kubernetes management, federation, and hybrid deployments

kubernetesmulti-clusterfederationhybrid-cloudcontainer-orchestration
cluster-managementID: pluginagentmarketplace/custom-plugin-kubernetes/multi-cluster

Skill Files

Browse the full folder contents for multi-cluster.

Download Skill

Loading file tree…

skills/multi-cluster/SKILL.md

Skill Metadata

Name
multi-cluster
Description
Multi-cluster Kubernetes management, federation, and hybrid deployments

Multi-Cluster Kubernetes

Executive Summary

Production-grade multi-cluster Kubernetes management covering federation, cross-cluster networking, and disaster recovery patterns. This skill provides deep expertise in designing and operating globally distributed Kubernetes infrastructure.

Core Competencies

1. Multi-Cluster Architecture

Topology Patterns

Hub-Spoke:
                    ┌─────────┐
                    │   Hub   │
                    │ Cluster │
                    └────┬────┘
         ┌───────────────┼───────────────┐
         │               │               │
    ┌────▼────┐    ┌────▼────┐    ┌────▼────┐
    │ Spoke 1 │    │ Spoke 2 │    │ Spoke 3 │
    │ (Dev)   │    │ (Stage) │    │ (Prod)  │
    └─────────┘    └─────────┘    └─────────┘

Mesh:
    ┌─────────┐          ┌─────────┐
    │Cluster 1│◄────────►│Cluster 2│
    │ (US)    │          │ (EU)    │
    └────┬────┘          └────┬────┘
         │                    │
         └────────┬───────────┘
              ┌───▼───┐
              │Cluster│
              │3 (AP) │
              └───────┘

2. ArgoCD Multi-Cluster

ApplicationSet Generator

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: api-server
  namespace: argocd
spec:
  generators:
  - clusters:
      selector:
        matchLabels:
          env: production
  template:
    metadata:
      name: 'api-server-{{name}}'
    spec:
      project: default
      source:
        repoURL: https://github.com/org/api-server
        targetRevision: HEAD
        path: k8s/overlays/production
      destination:
        server: '{{server}}'
        namespace: production
      syncPolicy:
        automated:
          prune: true
          selfHeal: true

Register External Cluster

# Add cluster to ArgoCD
argocd cluster add prod-cluster --name prod --kubeconfig ~/.kube/prod.yaml

# List clusters
argocd cluster list

# Verify connectivity
argocd cluster get prod

3. Cross-Cluster Networking

Cilium Cluster Mesh

# Enable cluster mesh on each cluster
cilium clustermesh enable --context cluster1
cilium clustermesh enable --context cluster2

# Connect clusters
cilium clustermesh connect --context cluster1 --destination-context cluster2

# Verify
cilium clustermesh status

Global Service

apiVersion: v1
kind: Service
metadata:
  name: api-server
  annotations:
    service.cilium.io/global: "true"
spec:
  type: ClusterIP
  ports:
  - port: 80
    targetPort: 8080
  selector:
    app: api-server

4. Disaster Recovery

Active-Active Configuration

# External DNS for GSLB
apiVersion: externaldns.k8s.io/v1alpha1
kind: DNSEndpoint
metadata:
  name: api-global
spec:
  endpoints:
  - dnsName: api.example.com
    recordType: A
    targets:
    - 52.1.1.1    # US cluster
    - 35.2.2.2    # EU cluster
    setIdentifier: us-east
    recordTTL: 60
---
# Each cluster has identical deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  replicas: 3
  # ... same configuration in both clusters

Velero Cross-Cluster Backup

# Install Velero in both clusters
velero install \
  --provider aws \
  --bucket velero-backups \
  --backup-location-config region=us-east-1

# Create backup
velero backup create prod-backup \
  --include-namespaces production \
  --snapshot-volumes

# Restore in DR cluster
velero restore create --from-backup prod-backup

5. Fleet Management

Rancher Fleet

apiVersion: fleet.cattle.io/v1alpha1
kind: GitRepo
metadata:
  name: api-server
  namespace: fleet-default
spec:
  repo: https://github.com/org/api-server
  branch: main
  paths:
  - k8s/
  targets:
  - clusterSelector:
      matchLabels:
        env: production
    name: production
  - clusterSelector:
      matchLabels:
        env: staging
    name: staging

Integration Patterns

Uses skill: cluster-admin

  • Cluster provisioning
  • Certificate management

Coordinates with skill: gitops

  • Multi-cluster GitOps
  • ApplicationSets

Works with skill: storage-networking

  • Cross-cluster networking
  • Data replication

Troubleshooting Guide

Decision Tree: Multi-Cluster Issues

Multi-Cluster Issue?
│
├── Cluster unreachable
│   ├── Check network connectivity
│   ├── Verify kubeconfig
│   └── Check cluster health
│
├── Sync failures
│   ├── Check ArgoCD logs
│   ├── Verify RBAC permissions
│   └── Check resource conflicts
│
└── Service discovery fails
    ├── Check mesh connectivity
    ├── Verify DNS configuration
    └── Check NetworkPolicies

Debug Commands

# ArgoCD cluster status
argocd cluster list
argocd app list --dest-server <server>

# Cilium mesh status
cilium clustermesh status
cilium connectivity test

# Cross-cluster DNS
kubectl run debug --rm -it --image=nicolaka/netshoot -- \
  nslookup <service>.default.svc.clusterset.local

Common Challenges & Solutions

| Challenge | Solution | |-----------|----------| | Network latency | Use regional clusters | | State sync | Eventually consistent design | | Failover delay | Health checks, DNS TTL | | Config drift | GitOps, policy enforcement |

Success Criteria

| Metric | Target | |--------|--------| | Cross-cluster latency | <50ms (regional) | | Failover time | <2 minutes | | Config consistency | 100% | | Cluster availability | 99.99% |

Resources