Agent Skills: Data Science Engineering Suite - Quick Reference

ML and data science workflows — EDA, feature engineering, modelling, evaluation, and production handoff. Use when exploring data or building models.

ID: vasilyu1983/ai-agents-public/ai-ml-data-science

Install this agent skill locally:

pnpm dlx add-skill https://github.com/vasilyu1983/AI-Agents-public/tree/HEAD/frameworks/shared-skills/skills/ai-ml-data-science

Skill Files

Browse the full folder contents for ai-ml-data-science.


frameworks/shared-skills/skills/ai-ml-data-science/SKILL.md

Skill Metadata

Name
ai-ml-data-science
Description
"ML and data science workflows — EDA, feature engineering, modelling, evaluation, and production handoff. Use when exploring data or building models."

Data Science Engineering Suite - Quick Reference

This skill turns raw data and questions into validated, documented models ready for production:

  • EDA workflows: Structured exploration with drift detection
  • Feature engineering: Reproducible feature pipelines with leakage prevention and train/serve parity
  • Model selection: Baselines first; strong tabular defaults; escalate complexity only when justified
  • Evaluation & reporting: Slice analysis, uncertainty, model cards, production metrics
  • SQL transformation: SQLMesh for staging/intermediate/marts layers
  • MLOps: CI/CD, CT (continuous training), CM (continuous monitoring)
  • Production patterns: Data contracts, lineage, feedback loops, streaming features

Modern emphasis (2026): Feature stores, automated retraining, drift monitoring (Evidently), train-serve parity, and agentic ML loops (plan -> execute -> evaluate -> improve). Tools: LightGBM, CatBoost, scikit-learn, PyTorch, Polars (lazy eval for larger-than-RAM datasets), lakeFS for data versioning.
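
For the larger-than-RAM point, a minimal Polars lazy-evaluation sketch (the parquet path and the user_id/amount columns are illustrative assumptions):

```python
import polars as pl

# Lazy scan: nothing is read into memory yet; Polars only builds a query plan.
events = pl.scan_parquet("events.parquet")  # hypothetical file

spend_per_user = (
    events
    .filter(pl.col("amount") > 0)                       # predicate is pushed into the scan
    .group_by("user_id")
    .agg(pl.col("amount").sum().alias("total_spend"))
)

# Work happens only at collect(); only the needed rows and columns are materialized.
result = spend_per_user.collect()
```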


Quick Reference

| Task | Tool/Framework | Command | When to Use |
|------|----------------|---------|-------------|
| EDA & Profiling | Pandas, Great Expectations | df.describe(), ge.validate() | Initial data exploration and quality checks |
| Feature Engineering | Pandas, Polars, Feature Stores | df.transform(), Feast materialization | Creating lag, rolling, categorical features |
| Model Training | Gradient boosting, linear models, scikit-learn | lgb.train(), model.fit() | Strong baselines for tabular ML |
| Hyperparameter Tuning | Optuna, Ray Tune | optuna.create_study(), tune.run() | Optimizing model parameters |
| SQL Transformation | SQLMesh | sqlmesh plan, sqlmesh run | Building staging/intermediate/marts layers |
| Experiment Tracking | MLflow, W&B | mlflow.log_metric(), wandb.log() | Versioning experiments and models |
| Model Evaluation | scikit-learn, custom metrics | metrics.roc_auc_score(), slice analysis | Validating model performance |
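
As a quick illustration of the EDA & Profiling row, a minimal pandas-only sketch (the CSV path, columns, and expected ranges are assumptions; Great Expectations or Pandera can replace the manual asserts with declarative expectations):

```python
import pandas as pd

df = pd.read_csv("transactions.csv")  # hypothetical dataset

# Profile: shape, dtypes, summary statistics, missingness, duplicates
print(df.shape)
print(df.dtypes)
print(df.describe(include="all"))
missing = df.isna().mean().sort_values(ascending=False)
print(missing[missing > 0])
print("duplicate rows:", df.duplicated().sum())

# Lightweight quality gate (assumed expectations for this dataset)
assert df["amount"].ge(0).all(), "negative amounts found"
assert df["user_id"].notna().all(), "missing user ids"
```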


Data Lake & Lakehouse

For comprehensive data lake/lakehouse patterns (beyond SQLMesh transformation), see data-lake-platform:

  • Table formats: Apache Iceberg, Delta Lake, Apache Hudi
  • Query engines: ClickHouse, DuckDB, Apache Doris, StarRocks
  • Alternative transformation: dbt (alternative to SQLMesh)
  • Ingestion: dlt, Airbyte (connectors)
  • Streaming: Apache Kafka patterns
  • Orchestration: Dagster, Airflow

This skill focuses on ML feature engineering and modeling. Use data-lake-platform for general-purpose data infrastructure.


Related Skills

For adjacent topics, reference:

  • data-lake-platform: general-purpose data infrastructure (table formats, query engines, ingestion, orchestration)
  • ai-ml-timeseries: time-series modelling and forecasting
  • ai-llm: LLMs and transformer-based text/multimodal work
  • ai-mlops: data ingestion (dlt templates) and deployment/monitoring pipelines


Decision Tree: Choosing Data Science Approach

User needs ML for: [Problem Type]
  - Tabular data?
    - Small-medium (<1M rows)? -> LightGBM (fast, efficient)
    - Large and complex (>1M rows)? -> LightGBM first, then NN if needed
    - High-dim sparse (text, counts)? -> Linear models, then shallow NN

  - Time series?
    - Seasonality? -> LightGBM, then see ai-ml-timeseries
    - Long-term dependencies? -> Transformers (see ai-ml-timeseries)

  - Text or mixed modalities?
    - LLMs/Transformers -> See ai-llm

  - SQL transformations?
    - SQLMesh (staging/intermediate/marts layers)

Rule of thumb: For tabular data, tree-based gradient boosting is a strong baseline, but must be validated against alternatives and constraints.
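
A minimal sketch of that workflow on synthetic tabular data: fit a simple baseline first, then LightGBM, and escalate only if the gap justifies it (the dataset and hyperparameters here are illustrative):

```python
import lightgbm as lgb
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20_000, n_features=30, random_state=42)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Baseline: regularized logistic regression exposes leakage and data issues cheaply
baseline = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
auc_base = roc_auc_score(y_val, baseline.predict_proba(X_val)[:, 1])

# Strong tabular default: gradient boosting with early stopping
gbm = lgb.LGBMClassifier(n_estimators=500, learning_rate=0.05, random_state=42)
gbm.fit(X_tr, y_tr, eval_set=[(X_val, y_val)], callbacks=[lgb.early_stopping(50)])
auc_gbm = roc_auc_score(y_val, gbm.predict_proba(X_val)[:, 1])

print(f"baseline AUC={auc_base:.3f}  lightgbm AUC={auc_gbm:.3f}")
```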


Core Concepts (Vendor-Agnostic)

  • Problem framing: define success metrics, baselines, and decision thresholds before modeling.
  • Leakage prevention: ensure all features are available at prediction time; split by time/group when appropriate.
  • Uncertainty: report confidence intervals and stability (fold variance, bootstrap) rather than single-point metrics (see the bootstrap sketch after this list).
  • Reproducibility: version code/data/features, fix seeds, and record the environment.
  • Operational handoff: define monitoring, retraining triggers, and rollback criteria with MLOps.
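
For the uncertainty point above, a minimal percentile-bootstrap sketch for a ROC-AUC confidence interval (y_true and y_prob are assumed to come from a held-out evaluation set):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auc_ci(y_true, y_prob, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for ROC-AUC on a held-out set."""
    y_true, y_prob = np.asarray(y_true), np.asarray(y_prob)
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), size=len(y_true))  # resample with replacement
        if len(np.unique(y_true[idx])) < 2:
            continue  # AUC is undefined on single-class resamples
        scores.append(roc_auc_score(y_true[idx], y_prob[idx]))
    lo, hi = np.quantile(scores, [alpha / 2, 1 - alpha / 2])
    return float(np.mean(scores)), (float(lo), float(hi))

# Usage: mean_auc, (lo, hi) = bootstrap_auc_ci(y_val, model.predict_proba(X_val)[:, 1])
```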

Implementation Practices (Tooling Examples)

  • Track experiments and artifacts (run id, commit hash, data version).
  • Add data validation gates in pipelines (schema + distribution + freshness).
  • Prefer reproducible, testable feature code (shared transforms, point-in-time correctness).
  • Use datasheets/model cards and eval reports as deployment prerequisites (Datasheets for Datasets: https://arxiv.org/abs/1803.09010; Model Cards: https://arxiv.org/abs/1810.03993).

Do / Avoid

Do

  • Do start with baselines and a simple model to expose leakage and data issues early.
  • Do run slice analysis and document failure modes before recommending deployment.
  • Do keep an immutable eval set; refresh training data without contaminating evaluation.

Avoid

  • Avoid random splits for temporal or user-correlated data (see the split sketch after this list).
  • Avoid "metric gaming" (optimizing the number without validating business impact).
  • Avoid training on labels created after the prediction timestamp (silent future leakage).
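
A minimal sketch of leak-safe splitting for the temporal and user-correlated cases flagged above (the parquet path and the event_time/user_id columns are assumptions):

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

df = pd.read_parquet("training_data.parquet")  # hypothetical: event_time, user_id, features, label

# Temporal data: split on a time cutoff, never randomly
cutoff = df["event_time"].quantile(0.8)
train_df = df[df["event_time"] <= cutoff]
valid_df = df[df["event_time"] > cutoff]

# User-correlated data: keep every row of a given user on one side of the split
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, valid_idx = next(splitter.split(df, groups=df["user_id"]))
train_df, valid_df = df.iloc[train_idx], df.iloc[valid_idx]
```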

Core Patterns (Overview)

Pattern 1: End-to-End DS Project Lifecycle

Use when: Starting or restructuring any DS/ML project.

Stages:

  1. Problem framing - Business objective, success metrics, baseline
  2. Data & feasibility - Sources, coverage, granularity, label quality
  3. EDA & data quality - Schema, missingness, outliers, leakage checks
  4. Feature engineering - Per data type with feature store integration
  5. Modelling - Baselines first, then LightGBM, then complexity as needed
  6. Evaluation - Offline metrics, slice analysis, error analysis
  7. Reporting - Model evaluation report + model card
  8. MLOps - CI/CD, CT (continuous training), CM (continuous monitoring)

Detailed guide: EDA Best Practices


Pattern 2: Feature Engineering

Use when: Designing features before modelling or during model improvement.

By data type:

  • Numeric: Standardize, handle outliers, transform skew, scale
  • Categorical: One-hot/ordinal (low cardinality), target/frequency/hashing (high cardinality)
    • Feature Store Integration: Store encoders, mappings, statistics centrally
  • Text: Cleaning, TF-IDF, embeddings, simple stats
  • Time: Calendar features, recency, rolling/lag features

Key Modern Practice: Use feature stores (Feast, Tecton, Databricks) for versioning, sharing, and train-serve parity.
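
A minimal sketch of the recency, rolling, and lag features above, computed in a point-in-time-safe way with pandas (the parquet path and column names are assumptions):

```python
import pandas as pd

tx = pd.read_parquet("transactions.parquet")  # hypothetical: user_id, event_time, amount
tx = tx.sort_values(["user_id", "event_time"])
g = tx.groupby("user_id")

# Lag and rolling features are built from past rows only, so they are point-in-time safe
tx["amount_lag_1"] = g["amount"].shift(1)
tx["amount_roll_mean_7"] = g["amount"].transform(
    lambda s: s.shift(1).rolling(7, min_periods=1).mean()
)

# Recency: time since the user's previous event
tx["seconds_since_prev"] = (tx["event_time"] - g["event_time"].shift(1)).dt.total_seconds()

# Calendar features
tx["dow"] = tx["event_time"].dt.dayofweek
tx["hour"] = tx["event_time"].dt.hour
```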

Detailed guide: Feature Engineering Patterns


Pattern 3: Data Contracts & Lineage

Use when: Building production ML systems with data quality requirements.

Components:

  • Contracts: Schema + ranges/nullability + freshness SLAs
  • Lineage: Track source -> feature store -> train -> serve
  • Feature store hygiene: Materialization cadence, backfill/replay, encoder versioning
  • Schema evolution: Backward/forward-compatible migrations with shadow runs

Detailed guide: Data Contracts & Lineage
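
A minimal contract sketch using Pandera (listed under data validation in External Resources); the schema, freshness SLA, and column names are illustrative assumptions:

```python
import pandas as pd
import pandera as pa

# Contract: schema + ranges/nullability; the freshness SLA is checked alongside it
orders_contract = pa.DataFrameSchema(
    {
        "order_id": pa.Column(str, nullable=False, unique=True),
        "user_id": pa.Column(str, nullable=False),
        "amount": pa.Column(float, checks=pa.Check.ge(0), nullable=False),
        "event_time": pa.Column("datetime64[ns, UTC]", nullable=False),
    }
)

def validate_orders(df: pd.DataFrame, max_staleness_hours: int = 6) -> pd.DataFrame:
    df = orders_contract.validate(df)  # raises SchemaError on violations
    staleness = pd.Timestamp.now(tz="UTC") - df["event_time"].max()
    assert staleness <= pd.Timedelta(hours=max_staleness_hours), "freshness SLA breached"
    return df
```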


Pattern 4: Model Selection & Training

Use when: Picking model families and starting experiments.

Decision guide (modern benchmarks):

  • Tabular: Start with a strong baseline (linear/logistic, then gradient boosting) and iterate based on error analysis
  • Baselines: Always implement simple baselines first (majority class, mean, naive forecast)
  • Train/val/test splits: Time-based (forecasting), group-based (user/item leakage), or random (IID)
  • Hyperparameter tuning: Start manual, then Bayesian optimization (Optuna, Ray Tune)
  • Overfitting control: Regularization, early stopping, cross-validation

Detailed guide: Modelling Patterns
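
A minimal Optuna sketch for the tuning step (synthetic data, an illustrative search space, and a small trial budget; in practice the objective would use your own splits and metric):

```python
import lightgbm as lgb
import optuna
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=10_000, n_features=30, random_state=42)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

def objective(trial: optuna.Trial) -> float:
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 100, 1000),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "num_leaves": trial.suggest_int("num_leaves", 15, 255),
        "min_child_samples": trial.suggest_int("min_child_samples", 5, 100),
    }
    model = lgb.LGBMClassifier(**params, random_state=42).fit(X_tr, y_tr)
    return roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params, study.best_value)
```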


Pattern 5: Evaluation & Reporting

Use when: Finalizing a model candidate or handing over to production.

Key components:

  • Metric selection: Primary (ROC-AUC, PR-AUC, RMSE) + guardrails (calibration, fairness)
  • Threshold selection: ROC/PR curves, cost-sensitive, F1 maximization
  • Slice analysis: Performance by geography, user segments, product categories
  • Error analysis: Collect high-error examples, cluster by error type, identify systematic failures
  • Uncertainty: Confidence intervals (bootstrap where appropriate), variance across folds, and stability checks
  • Evaluation report: 8-section report (objective, data, features, models, metrics, slices, risks, recommendation)
  • Model card: Documentation for stakeholders (intended use, data, performance, ethics, operations)

Detailed guide: Evaluation Patterns
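
A minimal slice-analysis sketch that computes the primary metric per segment (the eval_df layout with y_true, y_prob, and a slice column is an assumed convention):

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

def slice_report(eval_df: pd.DataFrame, slice_col: str) -> pd.DataFrame:
    """ROC-AUC and support per slice; negative gaps flag underperforming segments."""
    overall = roc_auc_score(eval_df["y_true"], eval_df["y_prob"])
    rows = []
    for value, grp in eval_df.groupby(slice_col):
        if grp["y_true"].nunique() < 2:
            continue  # AUC is undefined on single-class slices
        auc = roc_auc_score(grp["y_true"], grp["y_prob"])
        rows.append({"slice": value, "n": len(grp), "auc": auc, "gap_vs_overall": auc - overall})
    return pd.DataFrame(rows).sort_values("gap_vs_overall")

# Usage: slice_report(eval_df, "country").head(10)  # worst slices first
```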


Pattern 6: Reproducibility & MLOps

Use when: Ensuring experiments are reproducible and production-ready.

Modern MLOps (CI/CD/CT/CM):

  • CI (Continuous Integration): Automated testing, data validation, code quality
  • CD (Continuous Delivery): Environment-specific promotion (dev -> staging -> prod), canary deployment
  • CT (Continuous Training): Drift-triggered and scheduled retraining
  • CM (Continuous Monitoring): Real-time data drift, performance, system health

Versioning:

  • Code (git commit), data (DVC, lakeFS), features (feature store), models (MLflow Registry)
  • Seeds (reproducibility), hyperparameters (experiment tracker)

Detailed guide: Reproducibility Checklist
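
A minimal tracking sketch that ties one MLflow run to code, data, and feature versions (the tag names, the lakeFS-style data_version string, and the metric value are assumed conventions, not a fixed standard):

```python
import subprocess
import mlflow

commit = subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()

with mlflow.start_run(run_name="churn-lgbm-baseline"):
    # Versioning: code, data, features, seed
    mlflow.set_tag("git_commit", commit)
    mlflow.set_tag("data_version", "lakefs://repo/main@2026-01-15")  # assumed convention
    mlflow.set_tag("feature_view", "user_features_v3")               # assumed convention
    mlflow.log_param("random_seed", 42)
    mlflow.log_params({"n_estimators": 500, "learning_rate": 0.05})

    # ... train and evaluate the model here ...
    mlflow.log_metric("val_auc", 0.87)  # placeholder value
```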


Pattern 7: Feature Freshness & Streaming

Use when: Managing real-time features and streaming pipelines.

Components:

  • Freshness contracts: Define freshness SLAs per feature, monitor lag, alert on breaches
  • Batch + stream parity: Same feature logic across batch/stream, idempotent upserts
  • Schema evolution: Version schemas, add forward/backward-compatible parsers, backfill with rollback
  • Data quality gates: PII/format checks, range checks, distribution drift (KL, KS, PSI)

Detailed guide: Feature Freshness & Streaming
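
A minimal sketch of a freshness check and a PSI drift check for the gates above (the SLA and the PSI thresholds in the final comment are common rules of thumb, not fixed standards):

```python
import numpy as np
import pandas as pd

def is_fresh(feature_df: pd.DataFrame, ts_col: str, sla: pd.Timedelta) -> bool:
    """True when the newest feature row is within the freshness SLA (assumes UTC timestamps)."""
    lag = pd.Timestamp.now(tz="UTC") - feature_df[ts_col].max()
    return lag <= sla

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between a reference sample and a current sample."""
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    e_frac = np.histogram(expected, bins=edges)[0] / len(expected)
    a_frac = np.histogram(actual, bins=edges)[0] / len(actual)
    e_frac = np.clip(e_frac, 1e-6, None)  # avoid log(0) on empty buckets
    a_frac = np.clip(a_frac, 1e-6, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

# Common reading: PSI < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 investigate
```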


Pattern 8: Production Feedback Loops

Use when: Capturing production signals and implementing continuous improvement.

Components:

  • Signal capture: Log predictions + user edits/acceptance/abandonment (scrub PII)
  • Labeling: Route failures/edge cases to human review, create balanced sets
  • Dataset refresh: Periodic refresh (weekly/monthly) with lineage, protect eval set
  • Online eval: Shadow/canary new models, track solve rate, calibration, cost, latency

Detailed guide: Production Feedback Loops
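
A minimal sketch of the signal-capture step (the log schema and the hashing-based PII scrub are assumptions; production systems would write to a warehouse or event stream rather than a local file):

```python
import hashlib
import json
import time

def log_prediction(user_id: str, features: dict, prediction: float,
                   model_version: str, path: str = "predictions.jsonl") -> str:
    """Append one prediction event; user_id is hashed so no raw PII is stored."""
    event = {
        "event_id": hashlib.sha256(f"{user_id}-{time.time()}".encode()).hexdigest()[:16],
        "user_hash": hashlib.sha256(user_id.encode()).hexdigest(),
        "ts": time.time(),
        "model_version": model_version,
        "features": features,  # assumed to be PII-free already
        "prediction": prediction,
    }
    with open(path, "a") as f:
        f.write(json.dumps(event) + "\n")
    return event["event_id"]

# Later: join user edits/acceptance back onto event_id to build labeled refresh sets
```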


Resources (Detailed Guides)

For comprehensive operational patterns and checklists, see the detailed guides referenced in each pattern:

  • EDA Best Practices
  • Feature Engineering Patterns
  • Data Contracts & Lineage
  • Modelling Patterns
  • Evaluation Patterns
  • Reproducibility Checklist
  • Feature Freshness & Streaming
  • Production Feedback Loops


Templates

Use these as copy-paste starting points:

Project & Workflow Templates

  • Standard DS project template: assets/project/template-standard.md
  • Quick DS experiment template: assets/project/template-quick.md

Feature Engineering & EDA

  • Feature engineering template: assets/features/template-feature-engineering.md
  • EDA checklist & notebook template: assets/eda/template-eda.md

Evaluation & Reporting

  • Model evaluation report: assets/evaluation/template-evaluation-report.md
  • Model card: assets/evaluation/template-model-card.md
  • ML experiment review: assets/review/experiment-review-template.md

SQL Transformation (SQLMesh)

For SQL-based data transformation and feature engineering:

  • SQLMesh project setup: ../data-lake-platform/assets/transformation/sqlmesh/template-sqlmesh-project.md
  • SQLMesh model types: ../data-lake-platform/assets/transformation/sqlmesh/template-sqlmesh-model.md (FULL, INCREMENTAL, VIEW)
  • Incremental models: ../data-lake-platform/assets/transformation/sqlmesh/template-sqlmesh-incremental.md
  • DAG and dependencies: ../data-lake-platform/assets/transformation/sqlmesh/template-sqlmesh-dag.md
  • Testing and data quality: ../data-lake-platform/assets/transformation/sqlmesh/template-sqlmesh-testing.md

Use SQLMesh when:

  • Building SQL-based feature pipelines
  • Managing incremental data transformations
  • Creating staging/intermediate/marts layers
  • Testing SQL logic with unit tests and audits

For data ingestion (loading raw data), use:

  • ai-mlops skill (dlt templates for REST APIs, databases, warehouses)



External Resources

See data/sources.json for curated foundational and implementation references:

  • Core ML/DL: scikit-learn, XGBoost, LightGBM, PyTorch, TensorFlow, JAX
  • Data processing: pandas, NumPy, Polars, DuckDB, Spark, Dask
  • SQL transformation: SQLMesh, dbt (staging/marts/incremental patterns)
  • Feature stores: Feast, Tecton, Databricks Feature Store (centralized feature management)
  • Data validation: Pydantic, Great Expectations, Pandera, Evidently (quality + drift)
  • Visualization: Matplotlib, Seaborn, Plotly, Streamlit, Dash
  • MLOps: MLflow, W&B, DVC, Neptune (experiment tracking + model registry)
  • Hyperparameter tuning: Optuna, Ray Tune, Hyperopt
  • Model serving: BentoML, FastAPI, TorchServe, Seldon, Ray Serve
  • Orchestration: Kubeflow, Metaflow, Prefect, Airflow, ZenML
  • Cloud platforms: AWS SageMaker, Google Vertex AI, Azure ML, Databricks, Snowflake

Use this skill to execute data science projects end-to-end: it provides concrete checklists, patterns, and templates rather than theory.

Fact-Checking

  • Use web search/web fetch to verify current external facts, versions, pricing, deadlines, regulations, or platform behavior before final answers.
  • Prefer primary sources; report source links and dates for volatile information.
  • If web access is unavailable, state the limitation and mark guidance as unverified.