Data Quality Test Generator
Generate comprehensive dbt test suites for Fantasy Football Analytics models following Kimball patterns, implementation requirements, and dbt 1.10+ syntax.
When to Use This Skill
Use this skill proactively when:
- Creating tests for new dbt models (staging, facts, dimensions, marts)
- User asks to "add tests for {model}" or "improve test coverage"
- Implementing data quality gates per implementation requirements
- Standardizing existing tests to dbt 1.10+ syntax
- Creating tests alongside models (integration with dbt-model-builder skill)
dbt 1.10+ Syntax Requirements
CRITICAL: FF Analytics uses dbt 1.10+ which requires the new syntax:
- Use
data_tests:nottests: - Wrap test arguments in
arguments:block - Use
config:for test configuration (severity, where clauses)
Correct vs Incorrect Syntax
✅ CORRECT (dbt 1.10+):
data_tests:
- accepted_values:
arguments:
values: ['value1', 'value2']
config:
severity: error
❌ INCORRECT (old syntax):
tests:
- accepted_values:
values: ['value1', 'value2']
severity: error
Test Generation by Model Type
Task 1: Staging Model Tests
Staging models normalize raw provider data and require schema conformance tests.
Test Priorities:
- not_null on all primary key columns
- unique on single-column natural keys
- accepted_values on categorical/enum columns
- Basic data type validation
Example Pattern:
version: 2
models:
- name: stg_nflverse__weekly
description: "NFLverse weekly player stats"
columns:
- name: player_id
description: "gsis_id from nflverse"
data_tests:
- not_null
- name: season
data_tests:
- not_null
- dbt_utils.expression_is_true:
arguments:
expression: ">= 2020" # Reasonable season bounds
- name: week
data_tests:
- not_null
- accepted_values:
arguments:
values: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
- name: position
data_tests:
- not_null
- accepted_values:
arguments:
values: ['QB', 'RB', 'WR', 'TE', 'K', 'DEF']
See: references/example_staging_tests.yml
Task 2: Fact Table Tests
Fact tables capture measurable events and require grain + FK + measure validation.
Test Priorities:
- Grain uniqueness using
dbt_utils.unique_combination_of_columns - Relationship tests for all foreign keys to dimensions
- not_null on required measures
- accepted_values on 2×2 model enums (measure_domain, stat_kind, horizon)
- Expression tests for business logic (non-negative scores, etc.)
Example Pattern:
version: 2
models:
- name: fact_player_stats
description: "Player statistics fact table (grain: player + game + stat)"
# Grain uniqueness test
data_tests:
- dbt_utils.unique_combination_of_columns:
arguments:
combination_of_columns:
- player_id
- game_id
- stat_name
- measure_domain
- stat_kind
config:
severity: error
columns:
# Foreign key to dim_player
- name: player_id
description: "Foreign key to dim_player (mfl_id)"
data_tests:
- not_null
- relationships:
arguments:
to: ref('dim_player')
field: player_id
# Foreign key to dim_game
- name: game_id
description: "Foreign key to dim_game"
data_tests:
- not_null
- relationships:
arguments:
to: ref('dim_game')
field: game_id
# 2x2 model enum: measure_domain
- name: measure_domain
description: "Real-world vs fantasy scoring dimension"
data_tests:
- not_null
- accepted_values:
arguments:
values: ['real_world', 'fantasy']
# 2x2 model enum: stat_kind
- name: stat_kind
description: "Actuals vs projections dimension"
data_tests:
- not_null
- accepted_values:
arguments:
values: ['actual', 'projection']
# Measure validation
- name: stat_value
description: "Value of the statistical measure"
data_tests:
- not_null
See: references/example_fact_tests.yml
Task 3: Dimension Table Tests
Dimensions provide descriptive context and require uniqueness + SCD validation.
Test Priorities:
- unique + not_null on surrogate key (dimension PK)
- not_null on natural key columns
- SCD Type 2 tests (if applicable):
valid_from <= valid_to(or valid_to IS NULL for current)- No overlapping date ranges for same natural key
- accepted_values on categorical attributes
Example Pattern:
version: 2
models:
- name: dim_player
description: "Player dimension with SCD Type 2 history"
columns:
# Surrogate key
- name: player_sk
description: "Surrogate key for SCD Type 2"
data_tests:
- unique
- not_null
# Natural key
- name: player_id
description: "Natural key (mfl_id canonical)"
data_tests:
- not_null
# SCD Type 2 validity
- name: valid_from
description: "Start of validity period"
data_tests:
- not_null
- name: valid_to
description: "End of validity period (NULL = current)"
# SCD Type 2 date logic test
- name: _scd_validation
description: "Ensure valid_from <= valid_to when valid_to is not null"
data_tests:
- dbt_utils.expression_is_true:
arguments:
expression: "valid_from <= COALESCE(valid_to, CURRENT_DATE + INTERVAL '100 years')"
# Categorical attributes
- name: position
description: "Player position"
data_tests:
- accepted_values:
arguments:
values: ['QB', 'RB', 'WR', 'TE', 'K', 'DEF']
- name: status
description: "Active/Inactive/Retired"
data_tests:
- accepted_values:
arguments:
values: ['active', 'inactive', 'retired', 'unknown']
See: references/example_dimension_tests.yml
Task 4: Mart Tests
Marts are analytics-ready views requiring business logic validation.
Test Priorities:
- Row count thresholds (dbt_utils.expression_is_true on table)
- Metric validation (non-negative fantasy points, reasonable ranges)
- Partition completeness (all weeks/seasons represented)
- Aggregate consistency (totals match source facts)
Example Pattern:
version: 2
models:
- name: mart_fantasy_actuals_weekly
description: "Weekly fantasy points by player (analytics-ready)"
data_tests:
# Minimum row count threshold
- dbt_utils.expression_is_true:
arguments:
expression: "(SELECT COUNT(*) FROM {{ ref('mart_fantasy_actuals_weekly') }}) > 10000"
config:
severity: warn
columns:
- name: player_name
data_tests:
- not_null
- name: fantasy_points
description: "Total fantasy points for the week"
data_tests:
- not_null
- dbt_utils.expression_is_true:
arguments:
expression: ">= 0" # Fantasy points can't be negative
- name: games_played
data_tests:
- accepted_values:
arguments:
values: [0, 1] # Weekly grain, max 1 game per week
Task 5: Source Freshness Tests
Implement freshness monitoring for critical data sources.
Example Freshness Policies (always check latest documentation for the most up to date policies)
- NFLverse: warn_after 24h, error_after 48h (during season)
- Commissioner Sheets: warn_after 12h, error_after 24h
- KTC: warn_after 48h, error_after 96h (less time-sensitive)
Example Pattern:
version: 2
sources:
- name: nflverse
description: "NFLverse data provider"
freshness:
warn_after: {count: 24, period: hour}
error_after: {count: 48, period: hour}
tables:
- name: weekly
description: "Weekly player statistics"
identifier: "dt=*/**" # Partitioned by date
# Freshness test on max snapshot_date
freshness:
warn_after: {count: 24, period: hour}
error_after: {count: 48, period: hour}
Data Quality Requirements
The following tests are required in general:
- Accepted values on enums - All categorical columns must have accepted_values tests
- Freshness tests - All source tables must have warn/error thresholds
- Grain uniqueness - All fact tables must test grain with unique_combination_of_columns
- FK relationships - All foreign keys must have relationship tests to dimensions
Best Practices
Test Coverage Targets
- Staging models: >80% column coverage (focus on PKs and enums)
- Fact tables: 100% grain + FK coverage
- Dimensions: 100% PK/NK coverage
- Marts: Business logic validation (thresholds, non-negative, etc.)
Test Naming
Tests are auto-named by dbt. For custom tests, use descriptive names:
- name: _custom_business_rule
description: "Validate that revenue >= cost for all transactions"
Severity Levels
- error: Blocking failures (grain violations, missing FKs)
- warn: Non-blocking issues (freshness delays, low row counts)
config:
severity: warn # Use for non-critical tests
Where Clauses
Use where clauses to filter test scope:
data_tests:
- relationships:
arguments:
to: ref('dim_player')
field: player_id
config:
where: "player_id IS NOT NULL AND season >= 2020"
Integration with Other Skills
- dbt-model-builder: Create tests alongside model creation
Resources
assets/
test_suite_template.yml- Base template with dbt 1.10+ syntax for common test patterns
references/
example_fact_tests.yml- Complete fact table test suite (fact_player_projections)example_staging_tests.yml- Staging model test patternsexample_dimension_tests.yml- Dimension table test patterns with SCD Type 2 validation
Common Test Patterns
Testing 2×2 Model Enums
The 2×2 stat model uses consistent enums across all facts:
# measure_domain: real_world vs fantasy
- name: measure_domain
data_tests:
- accepted_values:
arguments:
values: ['real_world', 'fantasy']
# stat_kind: actual vs projection
- name: stat_kind
data_tests:
- accepted_values:
arguments:
values: ['actual', 'projection']
# horizon: projection timeframe (only for stat_kind='projection')
- name: horizon
data_tests:
- accepted_values:
arguments:
values: ['weekly', 'full_season', 'rest_of_season']
config:
where: "stat_kind = 'projection'"
Testing Date Ranges
- name: game_date
data_tests:
- dbt_utils.expression_is_true:
arguments:
expression: ">= '2020-01-01'" # Reasonable lower bound
- dbt_utils.expression_is_true:
arguments:
expression: "<= CURRENT_DATE + INTERVAL '1 year'" # No future dates beyond 1 year
Testing Aggregations
# Ensure weekly totals match season totals
- name: _season_total_consistency
data_tests:
- dbt_utils.expression_is_true:
arguments:
expression: |
(
SELECT SUM(weekly_points)
FROM {{ ref('mart_weekly_points') }}
WHERE season = 2024
) = (
SELECT season_total
FROM {{ ref('mart_season_totals') }}
WHERE season = 2024
)