Agent Skills: Docker Production Skill

Deploy Docker containers to production with monitoring, logging, and health checks

dockerproductionmonitoringlogginghealth-checks
deployID: pluginagentmarketplace/custom-plugin-docker/docker-production

Skill Files

Browse the full folder contents for docker-production.

Download Skill

Loading file tree…

skills/docker-production/SKILL.md

Skill Metadata

Name
docker-production
Description
Deploy Docker containers to production with monitoring, logging, and health checks

Docker Production Skill

Master production-grade Docker deployments with monitoring, logging, health checks, and resource management.

Purpose

Configure containers for production with proper observability, resource limits, and deployment strategies.

Parameters

| Parameter | Type | Required | Default | Description | |-----------|------|----------|---------|-------------| | monitoring | enum | No | prometheus | prometheus/datadog | | logging | enum | No | json-file | json-file/loki/elk | | replicas | number | No | 1 | Number of replicas |

Production Configuration

Health Checks

HEALTHCHECK --interval=30s --timeout=3s --retries=3 --start-period=60s \
  CMD curl -f http://localhost:3000/health || exit 1
# Compose health check
services:
  app:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s

Resource Limits

services:
  app:
    deploy:
      resources:
        limits:
          cpus: '1'
          memory: 1G
        reservations:
          cpus: '0.5'
          memory: 512M
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3

Logging Configuration

services:
  app:
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"
        labels: "app,environment"

Monitoring Stack

Prometheus + Grafana

services:
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3001:3000"
    environment:
      GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_PASSWORD}

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    ports:
      - "8080:8080"

Prometheus Config

# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'docker-containers'
    docker_sd_configs:
      - host: unix:///var/run/docker.sock

Deployment Strategies

Rolling Update (Zero Downtime)

deploy:
  update_config:
    parallelism: 1
    delay: 10s
    failure_action: rollback
    order: start-first
  rollback_config:
    parallelism: 1
    delay: 10s

Blue-Green

# Deploy new version
docker compose -p myapp-green up -d

# Switch traffic (update nginx/load balancer)
# Remove old version
docker compose -p myapp-blue down

Error Handling

Common Errors

| Error | Cause | Solution | |-------|-------|----------| | unhealthy | Health check failing | Check endpoint, increase start_period | | OOMKilled | Memory exceeded | Increase limit or optimize | | restart loop | App crash | Check logs, fix application |

Recovery

  1. Check logs: docker logs --tail 100 <container>
  2. Verify health: docker inspect --format='{{.State.Health.Status}}'
  3. Rollback if needed

Troubleshooting

Debug Checklist

  • [ ] Health check passing?
  • [ ] Resources sufficient? docker stats
  • [ ] Logs showing errors?
  • [ ] Metrics collecting?

Diagnostics

# Resource usage
docker stats --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"

# Restart count
docker inspect --format='{{.RestartCount}}' <container>

# Recent events
docker events --filter 'container=<name>' --since 1h

Usage

Skill("docker-production")

Related Skills

  • docker-debugging
  • docker-ci-cd
  • docker-security