Agent Skills: Pandera Validation

DataFrame schema validation using pandera. Schema definitions, column checks, and decorator-based validation.


Install this agent skill locally:

pnpm dlx add-skill https://github.com/majesticlabs-dev/majestic-marketplace/tree/HEAD/plugins/majestic-data/skills/pandera-validation

plugins/majestic-data/skills/pandera-validation/SKILL.md

Pandera Validation

Audience: Data engineers validating pandas DataFrames.

Goal: Provide pandera patterns for schema validation and type checking.

Scripts

Import the schema helpers from scripts/schemas.py:

from scripts.schemas import (
    create_user_schema,
    create_nullable_schema,
    create_date_range_schema,
    UserSchema,
    validate_with_errors,
    infer_and_export_schema
)

Usage Examples

Basic Schema Validation

from scripts.schemas import create_user_schema

schema = create_user_schema()
validated_df = schema.validate(df)

Collect All Errors

from scripts.schemas import create_user_schema, validate_with_errors

schema = create_user_schema()
validated_df, errors = validate_with_errors(df, schema)

if errors:
    for err in errors:
        print(f"{err['column']}: {err['check']} - {err['failure_case']}")

Class-Based Schema

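import pandas as pd
import pandera as pa
from scripts.schemas import UserSchema

# Validate a DataFrame against the class-based schema
UserSchema.validate(df)

# Use as a function type hint
# (the annotation alone does not validate; pandera enforces it only under @pa.check_types)
def process_users(df: pa.typing.DataFrame[UserSchema]) -> pd.DataFrame:
    return df.query("status == 'active'")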

Infer Schema from DataFrame

from scripts.schemas import infer_and_export_schema

schema_export = infer_and_export_schema(df)
print(schema_export['python_code'])  # Python schema definition
print(schema_export['yaml'])         # YAML schema

Built-in Checks Reference

| Check Type | Example | Description |
|------------|---------|-------------|
| Numeric | Check.gt(0), Check.in_range(0, 100) | Comparisons |
| String | Check.str_matches(r'pattern') | Regex match |
| Set membership | Check.isin(['A', 'B']) | Allowed values |
| Uniqueness | unique=True on Column | No duplicates |
| Nullable | nullable=True on Column | Allow nulls |

Decorator-Based Validation

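import pandas as pd
import pandera as pa

# schema, input_schema, and output_schema are DataFrameSchema objects defined elsewhere

# Validate the returned DataFrame
@pa.check_output(schema)
def load_data(path: str) -> pd.DataFrame:
    return pd.read_csv(path)

# Validate the argument named "df"
@pa.check_input(schema, "df")
def process_data(df: pd.DataFrame) -> pd.DataFrame:
    return df.assign(processed=True)

# Validate both the "df" argument and the return value
@pa.check_io(df=input_schema, out=output_schema)
def transform_data(df: pd.DataFrame) -> pd.DataFrame:
    return df.transform(...)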

When to Use Pandera

| Use Case | Pandera | Alternative |
|----------|---------|-------------|
| DataFrame validation | ✓ | - |
| Type hints for DataFrames | ✓ | - |
| ETL pipeline checks | ✓ | Great Expectations |
| Record-level validation | - | Pydantic |

Dependencies

pandera>=0.18
pandas