Agent Skills: Docker Swarm Skill

Docker Swarm orchestration, cluster management, and production deployments

Tags: docker, container-orchestration, cluster-management, production-deployment, deploy
ID: pluginagentmarketplace/custom-plugin-docker/docker-swarm


skills/docker-swarm/SKILL.md

Skill Metadata

Name: docker-swarm
Description: Docker Swarm orchestration, cluster management, and production deployments

Docker Swarm Skill

Master Docker Swarm for container orchestration, cluster management, and production deployments.

Purpose

Set up and manage Docker Swarm clusters for high availability, service scaling, and production orchestration.

Parameters

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| managers | number | No | 3 | Number of manager nodes |
| workers | number | No | - | Number of worker nodes |
| encrypted | boolean | No | true | Encrypt overlay networks |

Cluster Setup

Initialize Swarm

# Initialize on first manager
docker swarm init --advertise-addr <MANAGER_IP>

# Get join tokens
docker swarm join-token worker
docker swarm join-token manager

# Join as worker
docker swarm join --token <WORKER_TOKEN> <MANAGER_IP>:2377

# Join as manager
docker swarm join --token <MANAGER_TOKEN> <MANAGER_IP>:2377
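When nodes are provisioned by a script, the join token can be read without the surrounding help text by using the `-q` flag of `docker swarm join-token`. The following is a minimal sketch; the function name `join_command` and its arguments are illustrative and not part of this skill.

```shell
# Print a ready-to-run join command for a role ("worker" or "manager").
# Run on a manager node. Arguments:
#   $1  role
#   $2  manager address, for example <MANAGER_IP>:2377
join_command() {
  jc_role="$1"
  jc_addr="$2"
  # -q prints only the token
  jc_token="$(docker swarm join-token -q "$jc_role")" || return 1
  printf 'docker swarm join --token %s %s\n' "$jc_token" "$jc_addr"
}
```

For example, `join_command worker <MANAGER_IP>:2377` prints the command to run on the new worker. If a token leaks, `docker swarm join-token --rotate worker` issues a new one.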

High Availability (3 or 5 managers)

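# Manager quorum: floor(N/2) + 1 (a strict majority of managers)
# 3 managers = quorum of 2, tolerates 1 failure
# 5 managers = quorum of 3, tolerates 2 failures
# Use an odd number: a 4th manager raises the quorum to 3 without tolerating more failures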
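The quorum arithmetic can be checked with a few lines of shell. This sketch uses integer division; the function names are illustrative.

```shell
# Quorum is a strict majority of managers: floor(N/2) + 1.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Managers that may fail while a quorum survives: floor((N-1)/2).
tolerated_failures() {
  echo $(( ($1 - 1) / 2 ))
}

# Print one line per cluster size passed as an argument.
quorum_table() {
  for qt_n in "$@"; do
    echo "$qt_n managers: quorum $(quorum "$qt_n"), tolerates $(tolerated_failures "$qt_n")"
  done
}
```

Running `quorum_table 3 4 5` shows that 4 managers tolerate the same single failure as 3, which is why odd manager counts are preferred.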

Service Deployment

Basic Service

# Create service
docker service create \
  --name webapp \
  --replicas 3 \
  --publish 80:80 \
  nginx:alpine

# Scale
docker service scale webapp=5

# Update image
docker service update --image nginx:1.25-alpine webapp

# Rollback
docker service rollback webapp

Full Service Configuration

docker service create \
  --name api \
  --replicas 3 \
  --network backend \
  --publish 8080:3000 \
  --mount type=volume,source=data,target=/data \
  --secret db_password \
  --env NODE_ENV=production \
  --limit-cpu 0.5 \
  --limit-memory 512M \
  --update-delay 10s \
  --update-parallelism 1 \
  --update-failure-action rollback \
  --health-cmd "curl -f http://localhost:3000/health" \
  --health-interval 30s \
  myapp:latest
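The command above refers to a network (`backend`) and a secret (`db_password`) that must exist before the service is created. A hedged sketch of the prerequisite checks follows; the helper names are hypothetical, and `--opt encrypted` corresponds to the `encrypted` parameter listed earlier.

```shell
# Create an encrypted overlay network unless one with that name exists.
ensure_overlay_network() {
  eon_name="$1"
  if docker network inspect "$eon_name" >/dev/null 2>&1; then
    echo "network $eon_name already exists"
  else
    docker network create --driver overlay --opt encrypted "$eon_name"
  fi
}

# Fail early if a secret the service depends on has not been created.
require_secret() {
  if ! docker secret inspect "$1" >/dev/null 2>&1; then
    echo "missing secret: $1" >&2
    return 1
  fi
}
```

Calling `ensure_overlay_network backend && require_secret db_password` before `docker service create` surfaces a missing dependency with a clear message.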

Stack Deployment

Production Stack

# stack.yaml
services:
  frontend:
    image: frontend:${VERSION:-latest}
    deploy:
      replicas: 3
      placement:
        constraints:
          - node.role == worker
      update_config:
        parallelism: 1
        delay: 10s
        failure_action: rollback
      resources:
        limits:
          cpus: '0.5'
          memory: 256M
    ports:
      - "80:80"
    networks:
      - frontend
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost/health"]
      interval: 30s

  backend:
    image: backend:${VERSION:-latest}
    deploy:
      replicas: 3
    secrets:
      - db_password
    networks:
      - frontend
      - backend

networks:
  frontend:
    driver: overlay
  backend:
    driver: overlay
    internal: true

secrets:
  db_password:
    external: true

# Deploy stack
docker stack deploy -c stack.yaml myapp

# List services
docker stack services myapp

# Remove stack
docker stack rm myapp
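The stack file reads the image tag from `${VERSION:-latest}`, so the tag can be supplied through the environment at deploy time. The sketch below wraps that and adds a convergence check that compares the running and desired replica counts reported by `docker stack services`. The function names are illustrative.

```shell
# Deploy with an explicit image tag; stack.yaml reads it as ${VERSION}.
#   $1 version tag, $2 stack name, $3 stack file (defaults to stack.yaml)
deploy_stack() {
  ds_version="$1"
  ds_stack="$2"
  ds_file="${3:-stack.yaml}"
  VERSION="$ds_version" docker stack deploy -c "$ds_file" "$ds_stack"
}

# Succeed only if every service in the stack reports running == desired.
# Replicas are printed as "running/desired", for example "3/3".
stack_converged() {
  docker stack services "$1" --format '{{.Name}} {{.Replicas}}' |
    awk 'BEGIN { bad = 0 }
         {
           split($2, c, "/")
           if (c[1] != c[2]) {
             print "not converged: " $1 " (" $2 ")"
             bad = 1
           }
         }
         END { exit bad }'
}
```

A CI job might run `deploy_stack 1.4.2 myapp` and then poll `stack_converged myapp` until it succeeds. Services in a stack are named `<stack>_<service>`, for example `myapp_frontend`.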

Secrets & Configs

Secrets

# Create secret (printf avoids the trailing newline that echo would store in the secret)
printf '%s' "password" | docker secret create db_password -

# Use in service
docker service update --secret-add db_password myservice

# Rotate secret: create a new version, then swap it in under the same target name
printf '%s' "newpassword" | docker secret create db_password_v2 -
docker service update \
  --secret-rm db_password \
  --secret-add source=db_password_v2,target=db_password \
  myservice
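Inside a Linux container, Swarm mounts each secret as a file at `/run/secrets/<target>`. Because the rotation above keeps `target=db_password`, the application continues to read `/run/secrets/db_password`. A minimal sketch of an entrypoint helper follows; `read_secret` and the `SECRETS_DIR` override are assumptions added for illustration and local testing.

```shell
# Print the value of a mounted secret.
# SECRETS_DIR defaults to /run/secrets and can be overridden for local testing.
read_secret() {
  rs_file="${SECRETS_DIR:-/run/secrets}/$1"
  if [ ! -r "$rs_file" ]; then
    echo "secret not available: $1" >&2
    return 1
  fi
  cat "$rs_file"
}
```

An entrypoint script could then use `DB_PASSWORD="$(read_secret db_password)"` before starting the application.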

Configs

# Create config
docker config create nginx_config ./nginx.conf

# Use in service
docker service create \
  --config source=nginx_config,target=/etc/nginx/nginx.conf \
  nginx

Node Management

# List nodes
docker node ls

# Drain node (maintenance)
docker node update --availability drain <node>

# Activate node
docker node update --availability active <node>

# Add label
docker node update --label-add role=database <node>

# Promote to manager
docker node promote <node>

# Demote from manager
docker node demote <node>
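A node label only affects scheduling once a service references it in a placement constraint. The sketch below labels a node and then pins a service to nodes carrying that label; the function name and argument order are illustrative.

```shell
# Label a node, then restrict a service to nodes with that label.
#   $1 node, $2 service, $3 label as key=value (for example role=database)
label_and_pin() {
  lp_node="$1"
  lp_service="$2"
  lp_label="$3"
  lp_key="${lp_label%%=*}"
  lp_value="${lp_label#*=}"
  docker node update --label-add "$lp_label" "$lp_node" &&
    docker service update --constraint-add "node.labels.$lp_key==$lp_value" "$lp_service"
}
```

For example, `label_and_pin <node> <service> role=database`. In a stack file the same constraint would be written under `deploy.placement.constraints` as `node.labels.role == database`.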

Error Handling

Common Errors

| Error | Cause | Solution |
|-------|-------|----------|
| no suitable node | Placement constraints or resource limits not met | Relax constraints or add nodes |
| not converging | Health check failing | Check service logs |
| Raft: no leader | Quorum lost | Restore managers |

Manager Recovery

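# If quorum is lost and the failed managers cannot be brought back,
# run this on one surviving manager. It rebuilds a single-manager cluster
# from that node's state; add managers again afterwards to restore HA.
docker swarm init --force-new-cluster --advertise-addr <IP>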
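Forced recovery is a last resort, so it can help to check the manager margin before taking a manager down for maintenance. The sketch below reads manager statuses on stdin and reports whether a majority is healthy. `check_quorum` is a hypothetical helper, and it only works while the cluster still has a leader, since `docker node ls` fails once quorum is lost.

```shell
# Read one manager status per line (Leader, Reachable, Unreachable) on stdin.
# Exit 0 if healthy managers form a majority, 1 otherwise.
check_quorum() {
  awk '
    NF { total++ }
    $1 == "Leader" || $1 == "Reachable" { healthy++ }
    END {
      needed = int(total / 2) + 1
      printf "%d/%d managers healthy, quorum needs %d\n", healthy, total, needed
      exit ((healthy >= needed) ? 0 : 1)
    }'
}
```

Usage on a manager: `docker node ls --filter role=manager --format '{{.ManagerStatus}}' | check_quorum`.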

Troubleshooting

Debug Checklist

  • [ ] Swarm active? docker info | grep Swarm
  • [ ] Nodes healthy? docker node ls
  • [ ] Service running? docker service ls
  • [ ] Tasks placed? docker service ps <svc>

Diagnostics

# Service status
docker service ls

# Task status
docker service ps <service> --no-trunc

# Service logs
docker service logs -f <service>

# Node issues
docker node inspect <node> --pretty
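When a service is not converging, the error column of `docker service ps` usually names the cause. The following sketch prints only the tasks that carry an error message; the function name and the `|` delimiter are choices made for this example.

```shell
# List tasks of a service that report an error, as "name|state|error".
failed_tasks() {
  docker service ps "$1" --no-trunc \
    --format '{{.Name}}|{{.CurrentState}}|{{.Error}}' |
    awk -F '|' '$3 != ""'
}
```

For example, `failed_tasks <service>` after a failed update shows which tasks were rejected or exited and why.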

Usage

Skill("docker-swarm")

Assets

  • assets/swarm-stack.yaml - Stack template
  • scripts/swarm-init.sh - Init script

Related Skills

  • docker-networking
  • docker-security
  • docker-production