Agent Skills: ETL Patterns

Production ETL patterns orchestrator. Routes to core reliability patterns and incremental load strategies.

ID: majesticlabs-dev/majestic-marketplace/etl-patterns

Install this agent skill locally:

pnpm dlx add-skill https://github.com/majesticlabs-dev/majestic-marketplace/tree/HEAD/plugins/majestic-data/skills/etl-patterns

Skill Files

Browse the full folder contents for etl-patterns.

plugins/majestic-data/skills/etl-patterns/SKILL.md

Skill Metadata

Name: etl-patterns
Description: Production ETL patterns orchestrator. Routes to core reliability patterns and incremental load strategies.

ETL Patterns

Orchestrator for production-grade Extract-Transform-Load patterns.

Skill Routing

| Need | Skill | Content |
|------|-------|---------|
| Reliability patterns | etl-core-patterns | Idempotency, checkpointing, error handling, chunking, retry, logging |
| Load strategies | etl-incremental-patterns | Backfill, timestamp-based, CDC, pipeline orchestration |

Pattern Selection Guide

By Reliability Need

| Need | Pattern | Skill |
|------|---------|-------|
| Repeatable runs | Idempotency | etl-core-patterns |
| Resume after failure | Checkpointing | etl-core-patterns |
| Handle bad records | Error handling + DLQ | etl-core-patterns |
| Memory management | Chunked processing | etl-core-patterns |
| Network resilience | Retry with backoff | etl-core-patterns |
| Observability | Structured logging | etl-core-patterns |
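
As one example from the core patterns, retry with exponential backoff can be sketched as follows. This is a generic sketch, not the skill's own helper; the function name and signature are illustrative:

```python
import random
import time

def retry_with_backoff(fn, max_attempts=5, base_delay=0.5):
    """Call fn, retrying transient failures with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # 0.5s, 1s, 2s, ... plus up to base_delay of random jitter
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

The jitter spreads out retries from concurrent workers so they do not hammer a recovering service in lockstep.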

By Load Strategy

| Scenario | Pattern | Skill |
|----------|---------|-------|
| Small tables (<100K) | Full refresh | etl-incremental-patterns |
| Large tables | Timestamp incremental | etl-incremental-patterns |
| Real-time sync | CDC events | etl-incremental-patterns |
| Historical migration | Parallel backfill | etl-incremental-patterns |
| Zero-downtime refresh | Swap pattern | etl-incremental-patterns |
| Multi-step pipelines | Pipeline orchestration | etl-incremental-patterns |
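
Timestamp-based incremental extraction, for example, pulls only rows changed since a stored high-water mark. A minimal sketch using stdlib sqlite3; the function name mirrors the pipeline example later in this page, but the signature, table, and columns are illustrative:

```python
import sqlite3

# Illustrative source table with an updated_at column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT, updated_at TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(1, "a", "2024-01-01"), (2, "b", "2024-02-01"), (3, "c", "2024-03-01")],
)

def incremental_by_timestamp(conn, table, ts_column, last_seen):
    """Return only rows with a timestamp strictly after the stored high-water mark."""
    cur = conn.execute(
        f"SELECT * FROM {table} WHERE {ts_column} > ? ORDER BY {ts_column}",
        (last_seen,),
    )
    return cur.fetchall()

rows = incremental_by_timestamp(conn, "events", "updated_at", "2024-01-15")
# rows holds only the two records changed after the high-water mark
```

After a successful load, the new maximum timestamp becomes the next run's high-water mark, typically persisted via the checkpointing pattern.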

Quick Reference

Idempotency Options

# Small datasets: Delete-then-insert
# Large datasets: UPSERT on conflict
# Change detection: Row hash comparison
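The UPSERT option, for instance, can be sketched with stdlib sqlite3. The table, columns, and helper name here are illustrative, not part of the skill itself:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)")

def upsert_records(conn, records):
    """Insert rows; on primary-key conflict, overwrite the existing row in place."""
    conn.executemany(
        """INSERT INTO users (id, name, updated_at)
           VALUES (:id, :name, :updated_at)
           ON CONFLICT(id) DO UPDATE SET
               name = excluded.name,
               updated_at = excluded.updated_at""",
        records,
    )
    conn.commit()

batch = [{"id": 1, "name": "ada", "updated_at": "2024-01-01"}]
upsert_records(conn, batch)
upsert_records(conn, batch)  # safe to re-run: still exactly one row
```

Because re-running the same batch converges to the same final state, a failed job can simply be restarted from the top without deduplication logic.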

Load Strategy Decision

Is table < 100K rows?
  → Full refresh

Has reliable timestamp column?
  → Timestamp incremental

Source supports CDC?
  → CDC event processing

Need zero downtime?
  → Swap pattern (temp table → rename)

One-time historical load?
  → Parallel backfill with date ranges
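
The checklist above can be expressed as a small routing function. This is a sketch in which the first matching branch wins, mirroring the list order; the strategy labels and parameter names are illustrative:

```python
def choose_load_strategy(row_count, has_timestamp=False, supports_cdc=False,
                         zero_downtime=False, one_time_backfill=False):
    """Pick a load strategy by walking the checklist in order; first match wins."""
    if row_count < 100_000:
        return "full_refresh"
    if has_timestamp:
        return "timestamp_incremental"
    if supports_cdc:
        return "cdc_events"
    if zero_downtime:
        return "swap"
    if one_time_backfill:
        return "parallel_backfill"
    return "full_refresh"  # fallback when no other signal applies
```

In practice several answers may be true at once; the ordering encodes a preference for the cheapest strategy that fits.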

Common Pipeline Structure

import pandas as pd  # Checkpoint, ETLProcessor, incremental_by_timestamp,
                     # and upsert_records come from the routed skills

# 1. Setup
checkpoint = Checkpoint('.etl_checkpoint.json')
processor = ETLProcessor()

# 2. Extract (with incremental)
df = incremental_by_timestamp(source_table, 'updated_at')

# 3. Transform (with error handling)
transformed = processor.process_batch(df.to_dict('records'))

# 4. Load (with idempotency)
upsert_records(pd.DataFrame(transformed))

# 5. Checkpoint
checkpoint.set_last_processed('sync', df['updated_at'].max())

# 6. Handle failures
processor.save_failures('failures/')
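
The `Checkpoint` object in step 1 can be as simple as a JSON file with an atomic write. A minimal sketch, assuming the two-method interface used above; the skill's real implementation may differ:

```python
import json
import os
import tempfile

class Checkpoint:
    """File-backed key/value store for resumable pipelines (illustrative sketch)."""
    def __init__(self, path):
        self.path = path
        self._state = {}
        if os.path.exists(path):
            with open(path) as f:
                self._state = json.load(f)

    def get_last_processed(self, key, default=None):
        return self._state.get(key, default)

    def set_last_processed(self, key, value):
        self._state[key] = value
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self._state, f)
        os.replace(tmp, self.path)  # atomic rename: a crash mid-write never corrupts state

path = os.path.join(tempfile.mkdtemp(), ".etl_checkpoint.json")
Checkpoint(path).set_last_processed("sync", "2024-03-01T00:00:00")
resumed = Checkpoint(path).get_last_processed("sync")  # survives a process restart
```

Writing to a temp file and renaming means a reader always sees either the old checkpoint or the new one, never a partially written file.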

Related Skills

  • data-validation - Validate data quality during ETL
  • data-quality - Monitor data quality metrics
  • pandas-coder - DataFrame transformations