Agent Skills: Deploying Airflow

>-

dataID: kilo-org/kilo-marketplace/deploying-airflow

Install this agent skill to your local

pnpm dlx add-skill https://github.com/Kilo-Org/kilo-marketplace/tree/HEAD/skills/deploying-airflow

Skill Files

Browse the full folder contents for deploying-airflow.

Download Skill

Loading file tree…

skills/deploying-airflow/SKILL.md

Skill Metadata

Name
deploying-airflow
Description
>-

Deploying Airflow

This skill covers deploying Airflow DAGs and projects to production, whether using Astro (Astronomer's managed platform) or open-source Airflow on Docker Compose or Kubernetes.

Choosing a path: Astro is a good fit for managed operations and faster CI/CD. For open-source, use Docker Compose for dev and the Helm chart for production.


Astro (Astronomer)

Astro provides CLI commands and GitHub integration for deploying Airflow projects.

Deploy Commands

| Command | What It Does | |---------|--------------| | astro deploy | Full project deploy — builds Docker image and deploys DAGs | | astro deploy --dags | DAG-only deploy — pushes only DAG files (fast, no image build) | | astro deploy --image | Image-only deploy — pushes only the Docker image (for multi-repo CI/CD) | | astro deploy --dbt | dbt project deploy — deploys a dbt project to run alongside Airflow |

Full Project Deploy

Builds a Docker image from your Astro project and deploys everything (DAGs, plugins, requirements, packages):

astro deploy

Use this when you've changed requirements.txt, Dockerfile, packages.txt, plugins, or any non-DAG file.

DAG-Only Deploy

Pushes only files in the dags/ directory without rebuilding the Docker image:

astro deploy --dags

This is significantly faster than a full deploy since it skips the image build. Use this when you've only changed DAG files and haven't modified dependencies or configuration.

Image-Only Deploy

Pushes only the Docker image without updating DAGs:

astro deploy --image

This is useful in multi-repo setups where DAGs are deployed separately from the image, or in CI/CD pipelines that manage image and DAG deploys independently.

dbt Project Deploy

Deploys a dbt project to run with Cosmos on an Astro deployment:

astro deploy --dbt

GitHub Integration

Astro supports branch-to-deployment mapping for automated deploys:

  • Map branches to specific deployments (e.g., main -> production, develop -> staging)
  • Pushes to mapped branches trigger automatic deploys
  • Supports DAG-only deploys on merge for faster iteration

Configure this in the Astro UI under Deployment Settings > CI/CD.

CI/CD Patterns

Common CI/CD strategies on Astro:

  1. DAG-only on feature branches: Use astro deploy --dags for fast iteration during development
  2. Full deploy on main: Use astro deploy on merge to main for production releases
  3. Separate image and DAG pipelines: Use --image and --dags in separate CI jobs for independent release cycles

Deploy Queue

When multiple deploys are triggered in quick succession, Astro processes them sequentially in a deploy queue. Each deploy completes before the next one starts.

Reference


Open-Source: Docker Compose

Deploy Airflow using the official Docker Compose setup. This is recommended for learning and exploration — for production, use Kubernetes with the Helm chart (see below).

Prerequisites

  • Docker and Docker Compose v2.14.0+
  • The official apache/airflow Docker image

Quick Start

Download the official Airflow 3 Docker Compose file:

curl -LfO 'https://airflow.apache.org/docs/apache-airflow/stable/docker-compose.yaml'

This sets up the full Airflow 3 architecture:

| Service | Purpose | |---------|---------| | airflow-apiserver | REST API and UI (port 8080) | | airflow-scheduler | Schedules DAG runs | | airflow-dag-processor | Parses and processes DAG files | | airflow-worker | Executes tasks (CeleryExecutor) | | airflow-triggerer | Handles deferrable/async tasks | | postgres | Metadata database | | redis | Celery message broker |

Minimal Setup

For a simpler setup with LocalExecutor (no Celery/Redis), create a docker-compose.yaml:

x-airflow-common: &airflow-common
  image: ${AIRFLOW_IMAGE_REF:?Set AIRFLOW_IMAGE_REF to apache/airflow@sha256:<reviewed-digest>}
  environment: &airflow-common-env
    AIRFLOW__CORE__EXECUTOR: LocalExecutor
    AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
    AIRFLOW__CORE__DAGS_FOLDER: /opt/airflow/dags
  volumes:
    - ./dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - ./plugins:/opt/airflow/plugins
  depends_on:
    postgres:
      condition: service_healthy

services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: airflow
    volumes:
      - postgres-db-volume:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "airflow"]
      interval: 10s
      retries: 5
      start_period: 5s

  airflow-init:
    <<: *airflow-common
    entrypoint: /bin/bash
    command:
      - -c
      - |
        airflow db migrate
        airflow users create \
          --username admin \
          --firstname Admin \
          --lastname User \
          --role Admin \
          --email admin@example.com \
          --password admin
    depends_on:
      postgres:
        condition: service_healthy

  airflow-apiserver:
    <<: *airflow-common
    command: airflow api-server
    ports:
      - "8080:8080"
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 30s

  airflow-scheduler:
    <<: *airflow-common
    command: airflow scheduler

  airflow-dag-processor:
    <<: *airflow-common
    command: airflow dag-processor

  airflow-triggerer:
    <<: *airflow-common
    command: airflow triggerer

volumes:
  postgres-db-volume:

Airflow 3 architecture note: The webserver has been replaced by the API server (airflow api-server), and the DAG processor now runs as a standalone process separate from the scheduler.

Common Operations

# Start all services
docker compose up -d

# Stop all services
docker compose down

# View logs
docker compose logs -f airflow-scheduler

# Restart after requirements change
docker compose down && docker compose up -d --build

# Run a one-off Airflow CLI command
docker compose exec airflow-apiserver airflow dags list

Installing Python Packages

Add packages to requirements.txt and rebuild:

# Add to requirements.txt, then:
docker compose down
docker compose up -d --build

Or use a custom Dockerfile:

FROM apache/airflow:3  # Pin to a specific version (e.g., 3.1.7) for reproducibility
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

Update docker-compose.yaml to build from the Dockerfile:

x-airflow-common: &airflow-common
  build:
    context: .
    dockerfile: Dockerfile
  # ... rest of config

Environment Variables

Configure Airflow settings via environment variables in docker-compose.yaml:

environment:
  # Core settings
  AIRFLOW__CORE__EXECUTOR: LocalExecutor
  AIRFLOW__CORE__PARALLELISM: 32
  AIRFLOW__CORE__MAX_ACTIVE_TASKS_PER_DAG: 16

  # Email
  AIRFLOW__EMAIL__EMAIL_BACKEND: airflow.utils.email.send_email_smtp
  AIRFLOW__SMTP__SMTP_HOST: smtp.example.com

  # Connections (as URI)
  AIRFLOW_CONN_MY_DB: postgresql://user:pass@host:5432/db

Open-Source: Kubernetes (Helm Chart)

Deploy Airflow on Kubernetes using the official Apache Airflow Helm chart.

Prerequisites

  • A Kubernetes cluster
  • kubectl configured
  • helm installed

Installation

# Add the Airflow Helm repo
helm repo add apache-airflow https://airflow.apache.org
helm repo update

# Install with default values
helm install airflow apache-airflow/airflow \
  --namespace airflow \
  --create-namespace

# Install with custom values
helm install airflow apache-airflow/airflow \
  --namespace airflow \
  --create-namespace \
  -f values.yaml

Key values.yaml Configuration

# Executor type
executor: KubernetesExecutor  # or CeleryExecutor, LocalExecutor

# Airflow image (pin to your desired version)
defaultAirflowRepository: apache/airflow
defaultAirflowTag: "3"  # Or pin: "3.1.7"

# Git-sync for DAGs (recommended for production)
dags:
  gitSync:
    enabled: true
    repo: https://github.com/your-org/your-dags.git
    branch: main
    subPath: dags
    wait: 60  # seconds between syncs

# API server (replaces webserver in Airflow 3)
apiServer:
  resources:
    requests:
      cpu: "250m"
      memory: "512Mi"
    limits:
      cpu: "500m"
      memory: "1Gi"
  replicas: 1

# Scheduler
scheduler:
  resources:
    requests:
      cpu: "500m"
      memory: "1Gi"
    limits:
      cpu: "1000m"
      memory: "2Gi"

# Standalone DAG processor
dagProcessor:
  enabled: true
  resources:
    requests:
      cpu: "250m"
      memory: "512Mi"
    limits:
      cpu: "500m"
      memory: "1Gi"

# Triggerer (for deferrable tasks)
triggerer:
  resources:
    requests:
      cpu: "250m"
      memory: "512Mi"
    limits:
      cpu: "500m"
      memory: "1Gi"

# Worker resources (CeleryExecutor only)
workers:
  resources:
    requests:
      cpu: "500m"
      memory: "1Gi"
    limits:
      cpu: "2000m"
      memory: "4Gi"
  replicas: 2

# Log persistence
logs:
  persistence:
    enabled: true
    size: 10Gi

# PostgreSQL (built-in)
postgresql:
  enabled: true

# Or use an external database
# postgresql:
#   enabled: false
# data:
#   metadataConnection:
#     user: airflow
#     pass: airflow
#     host: your-rds-host.amazonaws.com
#     port: 5432
#     db: airflow

Upgrading

# Upgrade with new values
helm upgrade airflow apache-airflow/airflow \
  --namespace airflow \
  -f values.yaml

# Upgrade to a new Airflow version
helm upgrade airflow apache-airflow/airflow \
  --namespace airflow \
  --set defaultAirflowTag="<version>"

DAG Deployment Strategies on Kubernetes

  1. Git-sync (recommended): DAGs are synced from a Git repository automatically
  2. Persistent Volume: Mount a shared PV containing DAGs
  3. Baked into image: Include DAGs in a custom Docker image

Useful Commands

# Check pod status
kubectl get pods -n airflow

# View scheduler logs
kubectl logs -f deployment/airflow-scheduler -n airflow

# Port-forward the API server
kubectl port-forward svc/airflow-apiserver 8080:8080 -n airflow

# Run a one-off CLI command
kubectl exec -it deployment/airflow-scheduler -n airflow -- airflow dags list

Related Skills

  • setting-up-astro-project: For initializing a new Astro project
  • managing-astro-local-env: For local development with astro dev
  • authoring-dags: For writing DAGs before deployment
  • testing-dags: For testing DAGs before deployment