Exploratory Data Analysis
Analyze tabular datasets to understand distributions, data quality, and patterns.
When To Use
- Understanding a new dataset before modeling
- Checking for data quality issues such as missing values, outliers, and duplicates
- Analyzing target variable distribution
- Identifying class imbalance
- Generating summary statistics
Analysis Process
- Connect to the data source and inspect its schema.
- Analyze the target variable first.
- Check each relevant column for distribution, missingness, and cardinality.
- Document findings in a compact report.
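The steps above can be sketched in a few lines of standard-library Python. This is only an illustration, not the methodology in references/eda-analysis.md: the function names `profile_column` and `eda_report`, the `MISSING` marker set, and the inline sample data are all hypothetical and chosen for the example.

```python
import csv
import io
from collections import Counter

# Assumed markers for missing values; adjust to the dataset at hand.
MISSING = {"", "NA", "NaN", "null", "None"}


def profile_column(rows, column):
    """Summarize one column: row count, missingness, cardinality, top values."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if v is not None and v.strip() not in MISSING]
    counts = Counter(present)
    total = len(values)
    missing = total - len(present)
    return {
        "column": column,
        "rows": total,
        "missing": missing,
        "missing_pct": round(100.0 * missing / total, 1) if total else 0.0,
        "cardinality": len(counts),
        "top": counts.most_common(3),
    }


def eda_report(rows, target):
    """Profile the target column first, then every other column."""
    if not rows:
        return []
    columns = list(rows[0].keys())  # schema inspection: column names only
    ordered = [target] + [c for c in columns if c != target]
    return [profile_column(rows, c) for c in ordered]


# Tiny inline sample standing in for a real file (hypothetical data).
SAMPLE = """id,status,amount
1,active,10.5
2,active,
3,churned,7.0
4,active,3.2
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
report = eda_report(rows, target="status")
for entry in report:
    print(entry)
```

Putting the target first in the report makes class imbalance visible before any other column is read: here `status` shows 3 `active` rows against 1 `churned`.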
Helper
For local Parquet, CSV, JSON, or JSONL files, use:
scripts/eda-column-dist --source data/sample.parquet --column status
The helper script requires duckdb in the active Python environment.
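The internals of scripts/eda-column-dist are not shown here. As a rough sketch of the kind of query such a helper might run, the following builds a value-distribution query and executes it with duckdb. The function names `quote_ident`, `column_dist_query`, and `run_column_dist` are hypothetical, and the sketch assumes duckdb can read the file directly by path, as it does for Parquet, CSV, and JSON files with standard extensions.

```python
def quote_ident(name: str) -> str:
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def column_dist_query(source: str, column: str, limit: int = 20) -> str:
    """Build a query returning each value, its count, and its share of all rows."""
    col = quote_ident(column)
    src = source.replace("'", "''")  # escape single quotes in the file path
    return (
        f"SELECT {col} AS value, COUNT(*) AS n, "
        f"ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 2) AS pct "
        f"FROM '{src}' GROUP BY {col} ORDER BY n DESC LIMIT {limit}"
    )


def run_column_dist(source: str, column: str, limit: int = 20):
    """Execute the query; duckdb is imported here so the builder works without it."""
    import duckdb  # must be installed in the active Python environment

    return duckdb.sql(column_dist_query(source, column, limit)).fetchall()
```

Calling `run_column_dist("data/sample.parquet", "status")` would return a list of `(value, n, pct)` tuples, most frequent first. The percentage is computed over all rows before the `LIMIT` is applied, so truncating the output does not distort the shares.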
Reference
For detailed analysis methodology and output format, read references/eda-analysis.md.