Data Quality Test Generator Skill

Data Quality Test Generator

Generate comprehensive dbt test suites for Fantasy Football Analytics models following Kimball patterns, implementation requirements, and dbt 1.10+ syntax.

When to Use This Skill

Use this skill proactively when:

Creating tests for new dbt models (staging, facts, dimensions, marts)
User asks to "add tests for {model}" or "improve test coverage"
Implementing data quality gates per implementation requirements
Standardizing existing tests to dbt 1.10+ syntax
Creating tests alongside models (integration with dbt-model-builder skill)

dbt 1.10+ Syntax Requirements

CRITICAL: FF Analytics uses dbt 1.10+ which requires the new syntax:

Use data_tests: not tests:
Wrap test arguments in arguments: block
Use config: for test configuration (severity, where clauses)

Correct vs Incorrect Syntax

✅ CORRECT (dbt 1.10+):

data_tests:
  - accepted_values:
      arguments:
        values: ['value1', 'value2']
      config:
        severity: error

❌ INCORRECT (old syntax):

tests:
  - accepted_values:
      values: ['value1', 'value2']
      severity: error

Test Generation by Model Type

Task 1: Staging Model Tests

Staging models normalize raw provider data and require schema conformance tests.

Test Priorities:

not_null on all primary key columns
unique on single-column natural keys
accepted_values on categorical/enum columns
Basic data type validation

Example Pattern:

version: 2

models:
  - name: stg_nflverse__weekly
    description: "NFLverse weekly player stats"

    columns:
      - name: player_id
        description: "gsis_id from nflverse"
        data_tests:
          - not_null

      - name: season
        data_tests:
          - not_null
          - dbt_utils.expression_is_true:
              arguments:
                expression: ">= 2020"  # Reasonable season bounds

      - name: week
        data_tests:
          - not_null
          - accepted_values:
              arguments:
                values: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]

      - name: position
        data_tests:
          - not_null
          - accepted_values:
              arguments:
                values: ['QB', 'RB', 'WR', 'TE', 'K', 'DEF']

See: references/example_staging_tests.yml

Task 2: Fact Table Tests

Fact tables capture measurable events and require grain + FK + measure validation.

Test Priorities:

Grain uniqueness using dbt_utils.unique_combination_of_columns
Relationship tests for all foreign keys to dimensions
not_null on required measures
accepted_values on 2×2 model enums (measure_domain, stat_kind, horizon)
Expression tests for business logic (non-negative scores, etc.)

Example Pattern:

version: 2

models:
  - name: fact_player_stats
    description: "Player statistics fact table (grain: player + game + stat)"

    # Grain uniqueness test
    data_tests:
      - dbt_utils.unique_combination_of_columns:
          arguments:
            combination_of_columns:
              - player_id
              - game_id
              - stat_name
              - measure_domain
              - stat_kind
          config:
            severity: error

    columns:
      # Foreign key to dim_player
      - name: player_id
        description: "Foreign key to dim_player (mfl_id)"
        data_tests:
          - not_null
          - relationships:
              arguments:
                to: ref('dim_player')
                field: player_id

      # Foreign key to dim_game
      - name: game_id
        description: "Foreign key to dim_game"
        data_tests:
          - not_null
          - relationships:
              arguments:
                to: ref('dim_game')
                field: game_id

      # 2x2 model enum: measure_domain
      - name: measure_domain
        description: "Real-world vs fantasy scoring dimension"
        data_tests:
          - not_null
          - accepted_values:
              arguments:
                values: ['real_world', 'fantasy']

      # 2x2 model enum: stat_kind
      - name: stat_kind
        description: "Actuals vs projections dimension"
        data_tests:
          - not_null
          - accepted_values:
              arguments:
                values: ['actual', 'projection']

      # Measure validation
      - name: stat_value
        description: "Value of the statistical measure"
        data_tests:
          - not_null

See: references/example_fact_tests.yml

Task 3: Dimension Table Tests

Dimensions provide descriptive context and require uniqueness + SCD validation.

Test Priorities:

unique + not_null on surrogate key (dimension PK)
not_null on natural key columns
SCD Type 2 tests (if applicable):
- valid_from <= valid_to (or valid_to IS NULL for current)
- No overlapping date ranges for same natural key
accepted_values on categorical attributes

Example Pattern:

version: 2

models:
  - name: dim_player
    description: "Player dimension with SCD Type 2 history"

    columns:
      # Surrogate key
      - name: player_sk
        description: "Surrogate key for SCD Type 2"
        data_tests:
          - unique
          - not_null

      # Natural key
      - name: player_id
        description: "Natural key (mfl_id canonical)"
        data_tests:
          - not_null

      # SCD Type 2 validity
      - name: valid_from
        description: "Start of validity period"
        data_tests:
          - not_null

      - name: valid_to
        description: "End of validity period (NULL = current)"

      # SCD Type 2 date logic test
      - name: _scd_validation
        description: "Ensure valid_from <= valid_to when valid_to is not null"
        data_tests:
          - dbt_utils.expression_is_true:
              arguments:
                expression: "valid_from <= COALESCE(valid_to, CURRENT_DATE + INTERVAL '100 years')"

      # Categorical attributes
      - name: position
        description: "Player position"
        data_tests:
          - accepted_values:
              arguments:
                values: ['QB', 'RB', 'WR', 'TE', 'K', 'DEF']

      - name: status
        description: "Active/Inactive/Retired"
        data_tests:
          - accepted_values:
              arguments:
                values: ['active', 'inactive', 'retired', 'unknown']

See: references/example_dimension_tests.yml

Task 4: Mart Tests

Marts are analytics-ready views requiring business logic validation.

Test Priorities:

Row count thresholds (dbt_utils.expression_is_true on table)
Metric validation (non-negative fantasy points, reasonable ranges)
Partition completeness (all weeks/seasons represented)
Aggregate consistency (totals match source facts)

Example Pattern:

version: 2

models:
  - name: mart_fantasy_actuals_weekly
    description: "Weekly fantasy points by player (analytics-ready)"

    data_tests:
      # Minimum row count threshold
      - dbt_utils.expression_is_true:
          arguments:
            expression: "(SELECT COUNT(*) FROM {{ ref('mart_fantasy_actuals_weekly') }}) > 10000"
          config:
            severity: warn

    columns:
      - name: player_name
        data_tests:
          - not_null

      - name: fantasy_points
        description: "Total fantasy points for the week"
        data_tests:
          - not_null
          - dbt_utils.expression_is_true:
              arguments:
                expression: ">= 0"  # Fantasy points can't be negative

      - name: games_played
        data_tests:
          - accepted_values:
              arguments:
                values: [0, 1]  # Weekly grain, max 1 game per week

Task 5: Source Freshness Tests

Implement freshness monitoring for critical data sources.

Example Freshness Policies (always check latest documentation for the most up to date policies)

NFLverse: warn_after 24h, error_after 48h (during season)
Commissioner Sheets: warn_after 12h, error_after 24h
KTC: warn_after 48h, error_after 96h (less time-sensitive)

Example Pattern:

version: 2

sources:
  - name: nflverse
    description: "NFLverse data provider"
    freshness:
      warn_after: {count: 24, period: hour}
      error_after: {count: 48, period: hour}

    tables:
      - name: weekly
        description: "Weekly player statistics"
        identifier: "dt=*/**"  # Partitioned by date

        # Freshness test on max snapshot_date
        freshness:
          warn_after: {count: 24, period: hour}
          error_after: {count: 48, period: hour}

Data Quality Requirements

The following tests are required in general:

Accepted values on enums - All categorical columns must have accepted_values tests
Freshness tests - All source tables must have warn/error thresholds
Grain uniqueness - All fact tables must test grain with unique_combination_of_columns
FK relationships - All foreign keys must have relationship tests to dimensions

Best Practices

Test Coverage Targets

Staging models: >80% column coverage (focus on PKs and enums)
Fact tables: 100% grain + FK coverage
Dimensions: 100% PK/NK coverage
Marts: Business logic validation (thresholds, non-negative, etc.)

Test Naming

Tests are auto-named by dbt. For custom tests, use descriptive names:

- name: _custom_business_rule
  description: "Validate that revenue >= cost for all transactions"

Severity Levels

error: Blocking failures (grain violations, missing FKs)
warn: Non-blocking issues (freshness delays, low row counts)

config:
  severity: warn  # Use for non-critical tests

Where Clauses

Use where clauses to filter test scope:

data_tests:
  - relationships:
      arguments:
        to: ref('dim_player')
        field: player_id
      config:
        where: "player_id IS NOT NULL AND season >= 2020"

Integration with Other Skills

dbt-model-builder: Create tests alongside model creation

Resources

assets/

test_suite_template.yml - Base template with dbt 1.10+ syntax for common test patterns

references/

example_fact_tests.yml - Complete fact table test suite (fact_player_projections)
example_staging_tests.yml - Staging model test patterns
example_dimension_tests.yml - Dimension table test patterns with SCD Type 2 validation

Common Test Patterns

Testing 2×2 Model Enums

The 2×2 stat model uses consistent enums across all facts:

# measure_domain: real_world vs fantasy
- name: measure_domain
  data_tests:
    - accepted_values:
        arguments:
          values: ['real_world', 'fantasy']

# stat_kind: actual vs projection
- name: stat_kind
  data_tests:
    - accepted_values:
        arguments:
          values: ['actual', 'projection']

# horizon: projection timeframe (only for stat_kind='projection')
- name: horizon
  data_tests:
    - accepted_values:
        arguments:
          values: ['weekly', 'full_season', 'rest_of_season']
      config:
        where: "stat_kind = 'projection'"

Testing Date Ranges

- name: game_date
  data_tests:
    - dbt_utils.expression_is_true:
        arguments:
          expression: ">= '2020-01-01'"  # Reasonable lower bound
    - dbt_utils.expression_is_true:
        arguments:
          expression: "<= CURRENT_DATE + INTERVAL '1 year'"  # No future dates beyond 1 year

Testing Aggregations

# Ensure weekly totals match season totals
- name: _season_total_consistency
  data_tests:
    - dbt_utils.expression_is_true:
        arguments:
          expression: |
            (
              SELECT SUM(weekly_points)
              FROM {{ ref('mart_weekly_points') }}
              WHERE season = 2024
            ) = (
              SELECT season_total
              FROM {{ ref('mart_season_totals') }}
              WHERE season = 2024
            )

Agent Skills: Data Quality Test Generator

Install this agent skill to your local

Skill Files