SQL Server Table Reconciliation Skill

SQL Server Table Reconciliation

Compare identical tables across two SQL Server instances using Python with mssql-python driver and Apache Arrow. Detect missing rows, column mismatches, schema drift, and produce a reconciliation report.

Workflow

Collect connection details for source and target
Identify primary key / composite key
Detect schema differences
Extract data via Arrow for efficient columnar transfer
Compare rows and columns
Generate reconciliation report

Collect Inputs

| Parameter | Required | Description | |-----------|----------|-------------| | Source server | Yes | Source SQL Server (e.g. prod-server.database.windows.net) | | Source database | Yes | Source database name | | Target server | Yes | Target SQL Server (e.g. staging-server.database.windows.net) | | Target database | Yes | Target database name | | Tables | Yes | Comma-separated schema.table names, or schema.* wildcard (e.g. dbo.Orders,dbo.Items or dbo.*) | | Auth mode | Yes | sql (user/password) or entra (Azure AD/token) | | Primary key | Auto-detect | Column(s) forming the row identity. Auto-detect from metadata if not provided. | | Columns to compare | All | Subset of columns, or all non-PK columns | | Chunk size | 100000 | Rows per batch for large tables | | Output format | console | console, csv, parquet, or json |

Bundled Script

The reconciliation logic is provided as a standalone script at scripts/reconcile.py. Invoke it with the appropriate arguments based on user inputs:

python scripts/reconcile.py \
    --source-server <source_server> \
    --source-database <source_database> \
    --target-server <target_server> \
    --target-database <target_database> \
    --tables "<table_spec>" \
    --auth <sql|entra> \
    --chunk-size <chunk_size> \
    --output <console|csv|json>

Optional arguments

| Argument | Description | |----------|-------------| | --primary-key | Comma-separated PK column(s). Omit to auto-detect. | | --columns | Comma-separated columns to compare. Omit to compare all non-PK columns. |

Example invocations

Single table with SQL auth:

python scripts/reconcile.py \
    --source-server prod-server.database.windows.net \
    --source-database ProdDB \
    --target-server staging-server.database.windows.net \
    --target-database StagingDB \
    --tables "dbo.Orders" \
    --auth sql \
    --output console

Wildcard with Entra auth and CSV output:

python scripts/reconcile.py \
    --source-server prod-server.database.windows.net \
    --source-database ProdDB \
    --target-server staging-server.database.windows.net \
    --target-database StagingDB \
    --tables "dbo.*" \
    --auth entra \
    --output csv

Prerequisites

Install required packages before running:

pip install mssql-python pyarrow pandas

Comparison Rules

Normalize types before comparing: cast decimals to same precision, trim strings, normalize datetime to UTC
NULL handling: NULL == NULL is considered a match (both sides missing = no diff)
Ignore row order: always compare by PK join, never positional
Large tables: chunk extraction with OFFSET/FETCH or ROW_NUMBER() partitioning

Hash-Based Optimization (for large tables)

When table has >1M rows, generate a hash pre-check:

SELECT {pk_cols},
       HASHBYTES('SHA2_256', CONCAT_WS('|', col1, col2, ...)) AS row_hash
FROM {table}

Compare hashes first; only fetch full rows for mismatched hashes. This reduces data transfer significantly.

Report Format

Reconciling dbo.EMPLOYEES...
Reconciling dbo.DEPARTMENTS...
Reconciling dbo.JOBS...

--- dbo.EMPLOYEES ---
  Source: 107  Target: 107
  Missing: 0  Extra: 0  Mismatches: 0
  Result: ✓ IDENTICAL

--- dbo.DEPARTMENTS ---
  Source: 27  Target: 27
  Missing: 0  Extra: 0  Mismatches: 3
  Result: ✗ DIFFERENCES FOUND

--- dbo.JOBS ---
  Source: 19  Target: 19
  Missing: 0  Extra: 0  Mismatches: 0
  Result: ✓ IDENTICAL

=== Summary: 2 passed, 1 failed, 0 skipped / 3 tables ===

When a single table is provided, include full detail (schema drift, sample rows, mismatches). When multiple tables, use the compact per-table format above with full detail only for tables with FAIL status.

Performance Considerations

| Scenario | Strategy | |----------|----------| | < 100K rows | Single Arrow fetch, in-memory pandas compare | | 100K–1M rows | Chunked extraction (100K batches), streaming comparison | | > 1M rows | Hash pre-check → only fetch mismatched rows | | Wide tables (100+ cols) | Compare PK + hash first, drill into specific columns on mismatch | | Network-constrained | Use Arrow columnar format (10-50x smaller than row-by-row) |

Constraints

Always use mssql-python driver (not pyodbc, pymssql)
Always use Apache Arrow via cursor (cursor.arrow()) for data extraction
Connection MUST use connection string format, not keyword arguments (kwargs like encrypt=True throw errors)
Never compare without identifying PK first — ask user if auto-detect fails
Handle connection failures gracefully with retry logic
Never hardcode credentials in generated scripts — use os.environ / getpass (env vars: MSSQL_USER, MSSQL_PASSWORD)
Do not print credentials in output or logs
Use parameterized queries (? placeholders) for metadata lookups — never f-string interpolate user input into SQL

Agent Skills: SQL Server Table Reconciliation

Install this agent skill to your local

Skill Files