OpenMetadata Data Quality & Observability
Guide for configuring data quality tests, profiling, alerts, and incident management in OpenMetadata.
When to Use This Skill
- Creating and managing data quality tests
- Configuring data profiler workflows
- Setting up observability alerts
- Triaging and resolving data incidents
- Exploring lineage for impact analysis
- Running quality tests programmatically
This Skill Does NOT Cover
- General data discovery and UI navigation (see openmetadata-user)
- Using SDKs for non-quality tasks (see openmetadata-dev)
- Administering users and policies (see openmetadata-ops)
- Contributing quality features to core (see openmetadata-sdk-dev)
Data Quality Overview
OpenMetadata provides comprehensive data quality capabilities:
┌─────────────────────────────────────────────────────────────┐
│ Data Quality Framework │
├─────────────────────────────────────────────────────────────┤
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────┐ │
│ │ Profiler │ │ Tests │ │ Incident Manager │ │
│ │ (Metrics) │→ │ (Assertions)│→ │ (Resolution) │ │
│ └─────────────┘ └─────────────┘ └─────────────────────┘ │
│ ↓ ↓ ↓ │
│ ┌─────────────────────────────────────────────────────────┐│
│ │ Alerts & Notifications ││
│ └─────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────┘
Data Profiler
What the Profiler Does
The profiler captures descriptive statistics to:
- Understand data shape and distribution
- Validate assumptions (nulls, duplicates, ranges)
- Detect anomalies over time
- Power data quality tests
Profiler Metrics
Table-Level Metrics
| Metric | Description |
|--------|-------------|
| Row Count | Total rows in table |
| Column Count | Number of columns |
| Size Bytes | Table size in bytes |
| Create DateTime | When table was created |
Column-Level Metrics
| Metric | Description |
|--------|-------------|
| Null Count | Number of null values |
| Null Ratio | Percentage of nulls |
| Unique Count | Distinct values |
| Unique Ratio | Percentage unique |
| Duplicate Count | Non-unique values |
| Min/Max | Range for numeric/date |
| Mean/Median | Central tendency |
| Std Dev | Value distribution spread |
| Histogram | Value distribution buckets |
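As a quick mental model, the column metrics above can be sketched in plain Python. This is illustrative only; the real profiler computes these in the warehouse via SQL:

```python
from collections import Counter

def column_profile(values):
    """Illustrative computation of the column metrics above (not the profiler's actual code)."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    null_count = total - len(non_null)
    distinct = len(Counter(non_null))
    return {
        "null_count": null_count,
        "null_ratio": null_count / total if total else 0.0,
        "unique_count": distinct,                     # distinct values
        "unique_ratio": distinct / total if total else 0.0,
        "duplicate_count": len(non_null) - distinct,  # non-unique values
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
    }

profile = column_profile([1, 2, 2, 3, None])
# null_count=1, unique_count=3, duplicate_count=1
```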
Configuring Profiler Workflow
Via UI
- Navigate to Settings → Services → Database Services
- Select your database service
- Click Ingestion → Add Ingestion
- Select Profiler as ingestion type
- Configure:
- Schedule (cron expression)
- Sample size percentage
- Tables to include/exclude
- Metrics to compute
Via YAML
```yaml
source:
  type: profiler
  serviceName: my-database
  sourceConfig:
    config:
      type: Profiler
      generateSampleData: true
      profileSampleType: PERCENTAGE
      profileSample: 50  # Sample 50% of rows
      tableFilterPattern:
        includes:
          - "prod.*"
        excludes:
          - ".*_staging"
      columnFilterPattern:
        includes:
          - ".*"
processor:
  type: orm-profiler
  config: {}
sink:
  type: metadata-rest
  config: {}
workflowConfig:
  openMetadataServerConfig:
    hostPort: http://localhost:8585/api
    authProvider: openmetadata
    securityConfig:
      jwtToken: ${OM_JWT_TOKEN}
```
Sample Configuration Options
| Option | Description | Default |
|--------|-------------|---------|
| profileSampleType | PERCENTAGE or ROWS | PERCENTAGE |
| profileSample | Sample size | 50 |
| generateSampleData | Store sample rows | true |
| sampleDataCount | Rows to store | 50 |
| threadCount | Parallel threads | 5 |
| timeoutSeconds | Per-table timeout | 43200 |
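The interplay between profileSampleType and profileSample can be illustrated with a small helper. This is a sketch of the sampling arithmetic, not the profiler's actual logic:

```python
def rows_to_sample(total_rows: int, sample_type: str, sample_value: float) -> int:
    """Illustrative: how many rows a profiler run would scan for a given sample setting."""
    if sample_type == "PERCENTAGE":
        # sample_value is a percentage of the table's rows
        return int(total_rows * sample_value / 100)
    if sample_type == "ROWS":
        # sample_value is an absolute row count, capped at the table size
        return min(int(sample_value), total_rows)
    raise ValueError(f"unknown profileSampleType: {sample_type}")

rows_to_sample(1_000_000, "PERCENTAGE", 50)  # 500000
rows_to_sample(1_000_000, "ROWS", 10_000)    # 10000
```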
Viewing Profiler Results
- Navigate to table's Profiler tab
- View Table Profile:
- Row count over time
- Size trends
- View Column Profile:
- Null percentages
- Unique values
- Distribution histograms
Data Quality Tests
Test Categories
Table-Level Tests
| Test | Description |
|------|-------------|
| tableRowCountToBeBetween | Row count within range |
| tableRowCountToEqual | Row count equals value |
| tableColumnCountToBeBetween | Column count within range |
| tableColumnCountToEqual | Column count equals value |
| tableRowInsertedCountToBeBetween | New rows within range |
| tableCustomSQLQuery | Custom SQL returns expected result |
Column-Level Tests
| Test | Description |
|------|-------------|
| columnValuesToBeNotNull | No null values |
| columnValuesToBeUnique | All values unique |
| columnValuesToBeBetween | Values within range |
| columnValuesToMatchRegex | Values match pattern |
| columnValuesToBeInSet | Values in allowed list |
| columnValueLengthsToBeBetween | String lengths in range |
| columnValuesMissingCount | Missing values below threshold |
| columnValueMaxToBeBetween | Max value in range |
| columnValueMinToBeBetween | Min value in range |
| columnValueMeanToBeBetween | Mean in range |
| columnValueMedianToBeBetween | Median in range |
| columnValueStdDevToBeBetween | Std dev in range |
| columnValuesLengthsToMatch | Exact string length |
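The semantics of the most common column tests are simple assertions over the column's values. A minimal sketch of two of them (illustrative only, not the orm-test-runner implementation):

```python
def column_values_to_be_not_null(values):
    """Passes when no value is null."""
    null_count = sum(1 for v in values if v is None)
    return {"passed": null_count == 0, "null_count": null_count}

def column_values_to_be_between(values, min_value, max_value):
    """Passes when every non-null value falls inside [min_value, max_value]."""
    out_of_range = [
        v for v in values
        if v is not None and not (min_value <= v <= max_value)
    ]
    return {"passed": not out_of_range, "failed_count": len(out_of_range)}

result = column_values_to_be_between([10, 25, 300], min_value=0, max_value=100)
# result["passed"] is False: 300 is out of range
```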
Creating Tests via UI
- Navigate to table's Data Quality tab
- Click + Add Test
- Select test type (Table or Column level)
- Choose test definition
- Configure parameters:
- Column (for column tests)
- Thresholds/values
- Description
- Save test
Creating Tests via YAML
```yaml
source:
  type: TestSuite
  serviceName: my-database
  sourceConfig:
    config:
      type: TestSuite
      entityFullyQualifiedName: my-database.schema.table
processor:
  type: orm-test-runner
  config:
    testCases:
      - name: orders_row_count_check
        testDefinitionName: tableRowCountToBeBetween
        parameterValues:
          - name: minValue
            value: 1000
          - name: maxValue
            value: 1000000
      - name: customer_id_not_null
        testDefinitionName: columnValuesToBeNotNull
        columnName: customer_id
      - name: status_in_valid_set
        testDefinitionName: columnValuesToBeInSet
        columnName: status
        parameterValues:
          - name: allowedValues
            value: "['pending', 'completed', 'cancelled']"
sink:
  type: metadata-rest
  config: {}
workflowConfig:
  openMetadataServerConfig:
    hostPort: http://localhost:8585/api
    authProvider: openmetadata
    securityConfig:
      jwtToken: ${OM_JWT_TOKEN}
```
Test Suites
Group related tests into test suites:
```yaml
testSuites:
  - name: orders_table_suite
    description: Quality tests for orders table
    testCases:
      - orders_row_count_check
      - customer_id_not_null
      - status_in_valid_set
```
Custom SQL Tests
Write custom validation queries:
```yaml
- name: custom_business_rule
  testDefinitionName: tableCustomSQLQuery
  parameterValues:
    - name: sqlExpression
      value: "SELECT COUNT(*) FROM orders WHERE total < 0"
    - name: strategy
      value: "COUNT"
    - name: threshold
      value: 0
```
Strategy Options:
- COUNT - Result should equal threshold
- ROWS - Should return no rows
- VALUE - Single value comparison
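Assuming the semantics described above, the pass/fail decision for each strategy can be sketched as follows (illustrative only, not the server's evaluation code):

```python
def custom_sql_test_passes(strategy, result, threshold=0):
    """result is the query output: a scalar for COUNT/VALUE, a list of rows for ROWS."""
    if strategy == "COUNT":
        return result == threshold  # e.g. expect zero negative-total orders
    if strategy == "ROWS":
        return len(result) == 0     # any returned row is a violation
    if strategy == "VALUE":
        return result == threshold  # single-value comparison
    raise ValueError(f"unknown strategy: {strategy}")

custom_sql_test_passes("COUNT", 0, threshold=0)  # True: no rows matched the bad condition
custom_sql_test_passes("ROWS", [("order-17", -5)])  # False: a violating row came back
```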
Dimensional Testing
Test data quality by business dimensions:
```yaml
- name: quality_by_region
  testDefinitionName: columnValuesToBeBetween
  columnName: revenue
  parameterValues:
    - name: minValue
      value: 0
    - name: partitionColumnName
      value: region
    - name: partitionValues
      value: "['US', 'EU', 'APAC']"
```
Running Tests Programmatically
Python SDK
```python
import os

from metadata.generated.schema.entity.services.connections.metadata.openMetadataConnection import (
    OpenMetadataConnection,
)
from metadata.generated.schema.security.client.openMetadataJWTClientConfig import (
    OpenMetadataJWTClientConfig,
)
from metadata.generated.schema.tests.testCase import TestCase
from metadata.generated.schema.tests.testDefinition import TestDefinition
from metadata.generated.schema.type.entityReference import EntityReference
from metadata.ingestion.ometa.ometa_api import OpenMetadata

# Initialize the client (JWT auth shown; adapt to your auth provider)
server_config = OpenMetadataConnection(
    hostPort="http://localhost:8585/api",
    authProvider="openmetadata",
    securityConfig=OpenMetadataJWTClientConfig(jwtToken=os.environ["OM_JWT_TOKEN"]),
)
metadata = OpenMetadata(server_config)

# Look up the test definition
null_test = metadata.get_by_name(
    entity=TestDefinition,
    fqn="columnValuesToBeNotNull",
)

# Create a test case bound to a specific column via its entity link
test_case = TestCase(
    name="customer_id_not_null",
    testDefinition=EntityReference(
        id=null_test.id,
        type="testDefinition",
    ),
    entityLink="<#E::table::my-db.schema.orders::columns::customer_id>",
    parameterValues=[],
)

created = metadata.create_or_update(test_case)
print(f"Created test: {created.name}")
```
Run from ETL Pipeline
```python
from metadata.workflow.data_quality import TestSuiteWorkflow

config = {
    "source": {
        "type": "TestSuite",
        "serviceName": "my-database",
        "sourceConfig": {
            "config": {
                "type": "TestSuite",
                "entityFullyQualifiedName": "my-database.schema.orders",
            }
        },
    },
    # ... rest of config
}

workflow = TestSuiteWorkflow.create(config)
workflow.execute()
workflow.print_status()
```
Alerts and Notifications
Alert Types
| Trigger | Description |
|---------|-------------|
| Test Failure | When data quality test fails |
| Pipeline Failure | When ingestion pipeline fails |
| Schema Change | When table schema changes |
| Ownership Change | When asset owner changes |
| New Asset | When new asset is created |
Creating Alerts
- Navigate to Settings → Notifications
- Click + Add Alert
- Configure:
- Alert name and description
- Trigger type
- Filter conditions
- Notification destinations
Notification Destinations
| Destination | Setup Required |
|-------------|----------------|
| Email | SMTP configuration |
| Slack | Webhook URL |
| Microsoft Teams | Webhook URL |
| Webhook | Custom endpoint URL |
Alert Filters
Filter which events trigger alerts:
```yaml
filters:
  - field: entityType
    condition: equals
    value: table
  - field: testResult
    condition: equals
    value: Failed
  - field: tier
    condition: in
    values: [Tier1, Tier2]
```
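The evaluation of such filters can be sketched in a few lines, assuming AND semantics (every filter must match for the alert to fire):

```python
def event_matches(event: dict, filters: list) -> bool:
    """Return True only if the event satisfies every filter (AND semantics assumed)."""
    for f in filters:
        actual = event.get(f["field"])
        if f["condition"] == "equals" and actual != f["value"]:
            return False
        if f["condition"] == "in" and actual not in f["values"]:
            return False
    return True

filters = [
    {"field": "entityType", "condition": "equals", "value": "table"},
    {"field": "testResult", "condition": "equals", "value": "Failed"},
    {"field": "tier", "condition": "in", "values": ["Tier1", "Tier2"]},
]
event = {"entityType": "table", "testResult": "Failed", "tier": "Tier1"}
event_matches(event, filters)  # True: all three conditions hold
```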
Slack Integration
```yaml
destinations:
  - type: Slack
    config:
      webhookUrl: https://hooks.slack.com/services/XXX/YYY/ZZZ
      channel: "#data-quality-alerts"
```
Incident Manager
Incident Lifecycle
Test Failure
↓
New Incident Created
↓
Acknowledged (ack)
↓
Assigned to Owner
↓
Investigation
↓
Root Cause Documented
↓
Resolved (with reason)
Incident States
| State | Description |
|-------|-------------|
| New | Incident just created |
| Ack | Acknowledged, under review |
| Assigned | Assigned to specific person/team |
| Resolved | Issue fixed, incident closed |
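The lifecycle above implies a simple state machine. The transition table below is inferred from the diagram, not an exhaustive list of what the server allows:

```python
# Allowed incident state transitions, inferred from the lifecycle diagram above
TRANSITIONS = {
    "New": {"Ack"},
    "Ack": {"Assigned", "Resolved"},
    "Assigned": {"Resolved"},
    "Resolved": set(),  # terminal state
}

def can_transition(current: str, target: str) -> bool:
    return target in TRANSITIONS.get(current, set())

can_transition("New", "Ack")       # True
can_transition("New", "Resolved")  # False: acknowledge first
```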
Managing Incidents
Acknowledge
- Navigate to Incident Manager
- Find new incident
- Click Ack to acknowledge
- Incident moves to acknowledged state
Assign
- Select acknowledged incident
- Click Assign
- Search for user or team
- Add assignment notes
- Task created for assignee
Document Root Cause
- Open incident details
- Click Root Cause
- Document:
- What went wrong
- Why it happened
- How it was discovered
- Save for future reference
Resolve
- Open incident
- Click Resolve
- Select resolution reason:
- Fixed - Issue corrected
- False Positive - Test was wrong
- Duplicate - Same as another incident
- Won't Fix - Accepted as-is
- Add resolution comments
- Confirm resolution
Resolution Workflow
1. Failure Notification → System alerts on test failure
2. Acknowledgment → Team member confirms awareness
3. Assignment → Routes to knowledgeable person
4. Status Updates → Assigned team communicates progress
5. Resolution → All stakeholders notified of fix
Historical Analysis
Past incidents serve as a troubleshooting handbook:
- Review similar scenarios
- Access previous resolutions
- Learn from patterns
- Improve test coverage
Lineage for Impact Analysis
Lineage in Data Quality Context
Use lineage to understand:
- Which downstream tables are affected by issues
- Which upstream sources might be the root cause
- Impact radius of data quality problems
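For a back-of-the-envelope impact analysis outside the UI, a breadth-first walk over a lineage edge list does the job. EDGES here is a hypothetical example graph; in practice the edges would come from the lineage API:

```python
from collections import deque

# Hypothetical lineage: upstream table -> its downstream consumers
EDGES = {
    "raw.orders": ["staging.orders"],
    "staging.orders": ["mart.daily_revenue", "mart.customer_ltv"],
}

def downstream_impact(table: str, edges: dict, max_depth: int = 3) -> set:
    """Breadth-first walk collecting every table affected by a quality issue in `table`."""
    impacted, queue = set(), deque([(table, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth >= max_depth:
            continue
        for child in edges.get(node, []):
            if child not in impacted:
                impacted.add(child)
                queue.append((child, depth + 1))
    return impacted

downstream_impact("raw.orders", EDGES)
# {'staging.orders', 'mart.daily_revenue', 'mart.customer_ltv'}
```

The same walk over reversed edges finds upstream candidates for root-cause analysis.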
Exploring Lineage
- Navigate to table's Lineage tab
- View upstream sources (data origins)
- View downstream targets (data consumers)
- Click nodes for quality status
Lineage Configuration
| Setting | Range | Purpose |
|---------|-------|---------|
| Upstream Depth | 1-3 | How far back to trace |
| Downstream Depth | 1-3 | How far forward to trace |
| Nodes per Layer | 5-50 | Max nodes displayed |
Lineage Layers
| Layer | Quality Use Case |
|-------|------------------|
| Column | Track field transformations |
| Observability | See test results on each node |
| Service | Cross-system impact analysis |
Observability Layer
Enabling the observability layer shows:
- Test pass/fail status on each node
- Failing tests propagate visual indicators
- Quick identification of problem sources
Profiler and Test Scheduling
Cron Expressions
| Expression | Schedule |
|------------|----------|
| 0 0 * * * | Daily at midnight |
| 0 */6 * * * | Every 6 hours |
| 0 0 * * 0 | Weekly on Sunday |
| 0 0 1 * * | Monthly on 1st |
| */30 * * * * | Every 30 minutes |
Recommended Schedules
| Workload Type | Profiler | Tests |
|---------------|----------|-------|
| Batch (Daily) | Daily after load | Daily after load |
| Streaming | Every 6 hours | Every hour |
| Critical | Hourly | Every 15 minutes |
| Archive | Weekly | Weekly |
Managing Schedules
- Navigate to Settings → Services → [Service]
- Go to Ingestion tab
- View/edit scheduled workflows:
- Metadata ingestion
- Profiler
- Test suites
Best Practices
Test Coverage
- Start with critical tables - Tier1 assets first
- Cover basics first:
- Null checks on required columns
- Uniqueness on primary keys
- Range checks on numeric fields
- Add business rules - Custom SQL for domain logic
- Test incrementally - New rows, not full table
Profiler Configuration
- Sample appropriately - 10-50% usually sufficient
- Exclude large columns - Skip LOBs and JSON
- Schedule off-peak - Avoid production impact
- Timeout appropriately - Set realistic limits
Alert Management
- Avoid alert fatigue - Start with critical tests
- Route appropriately - Right team for right issues
- Include context - Link to asset and test details
- Set severity levels - Not all failures are equal
Incident Response
- Acknowledge quickly - Show awareness
- Document thoroughly - Future you will thank you
- Communicate status - Keep stakeholders informed
- Learn from incidents - Improve tests and processes
Troubleshooting
Profiler Not Running
| Symptom | Check |
|---------|-------|
| No metrics | Verify ingestion is scheduled |
| Missing columns | Check column filter patterns |
| Slow execution | Reduce sample size |
| Timeouts | Increase timeout or reduce scope |
Tests Failing Unexpectedly
| Symptom | Check |
|---------|-------|
| False positives | Review test thresholds |
| Intermittent failures | Check for race conditions |
| All tests failing | Verify database connectivity |
| No test results | Check test suite is scheduled |
Alerts Not Sending
| Symptom | Check |
|---------|-------|
| No emails | Verify SMTP configuration |
| No Slack messages | Check webhook URL |
| Wrong recipients | Review alert filters |
| Too many alerts | Tighten filter conditions |
Integration with CI/CD
GitHub Actions Example
```yaml
name: Data Quality Check
on:
  schedule:
    - cron: '0 6 * * *'
  workflow_dispatch:
jobs:
  quality-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Install dependencies
        run: pip install openmetadata-ingestion
      - name: Run quality tests
        env:
          OM_HOST: ${{ secrets.OM_HOST }}
          OM_TOKEN: ${{ secrets.OM_TOKEN }}
        run: |
          metadata test --config quality-tests.yaml
      - name: Check results
        run: |
          # Fail pipeline if critical tests failed
          metadata test-suite status \
            --suite critical-tests \
            --fail-on-failure
```
References
- Data Quality Guide
- Profiler Configuration
- Test Definitions
- Incident Manager
- Data Lineage
Related skills:
- openmetadata-dev - SDK for programmatic quality tests
- openmetadata-user - UI navigation and discovery
- openmetadata-ops - Platform administration