MMM Model Builder Skill

Build Marketing Mix Models with PyMC, adstock and saturation transforms, and mandatory out-of-sample validation. Use when user says "build MMM model", "построй MMM модель", "marketing mix model", or asks to train MMM on client data.

ID: tekliner/improvado-agentic-frameworks-and-skills/mmm-model-builder

Install this agent skill locally:

```bash
pnpm dlx add-skill https://github.com/tekliner/improvado-agentic-frameworks-and-skills/tree/HEAD/skills/mmm-model-builder
```

Skill Files


skills/mmm-model-builder/SKILL.md

Skill Metadata

Name: mmm-model-builder

Description: Build Marketing Mix Models with PyMC, adstock and saturation transforms, and mandatory out-of-sample validation. Use when user says "build MMM model", "построй MMM модель", "marketing mix model", or asks to train MMM on client data.

MMM Model Builder Skill

Thesis: Automatically build Marketing Mix Models using pure PyMC (NOT PyMC-Marketing) with geometric adstock, Hill saturation transforms, and mandatory out-of-sample validation, iterating until both convergence metrics and OOS validation pass, documenting the thinking process in Knowledge Framework format.

Trigger Phrases

Use this skill when user says:

  • "построй MMM модель для [клиент/путь]" ("build an MMM model for [client/path]")
  • "build MMM model"
  • "запусти mmm-model-builder" ("run mmm-model-builder")
  • "создай marketing mix model" ("create a marketing mix model")
  • "train MMM on [data]"
  • "run MMM with adstock and saturation"

Overview

This skill orchestrates automated MMM model building with:

  1. Data Analysis - Auto-detect date, target, channels, controls from CSV
  2. Config Generation - Generate PyMC config with channel-type-specific priors
  3. Model Training - Run pure PyMC with adstock + saturation transforms
  4. Diagnostics Check - Validate rhat, ESS, divergences, R²
  5. OOS Validation - ⚠️ MANDATORY - Test predictive power on held-out data
  6. Iteration - Adjust config and retry until ALL metrics achieved
  7. KF Documentation - Generate Knowledge Framework analysis with iteration thinking

```mermaid
graph LR
    A[CSV Data] --> B[Data Analyzer]
    B --> C[Config Generator]
    C --> D[Model Trainer V3]
    D --> E{Diagnostics OK?}
    E -->|No| F[Adjust + Retry]
    F --> D
    E -->|Yes| G{OOS Validation}
    G -->|FAIL| F
    G -->|PASS| H[✅ Full Success]
    H --> I[KF Documentation]

    style D fill:#c8f7dc
    style G fill:#fff4e1
    style H fill:#c8f7dc
    style I fill:#e1f5ff
```

Model Architecture (V3)

The default model (v3) implements proper MMM transformations:

Geometric Adstock

adstock_t = spend_t + α * adstock_{t-1}
  • Captures carryover effects from marketing
  • α ∈ [0,1] controls decay rate
  • Higher α = longer memory (slower decay)
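
The recurrence can be sketched in plain Python (the function name is illustrative; in the skill itself the transform lives inside the PyMC model rather than as a standalone helper):

```python
def geometric_adstock(spend, alpha):
    """Geometric adstock: adstock_t = spend_t + alpha * adstock_{t-1}."""
    adstock = []
    carry = 0.0
    for x in spend:
        carry = x + alpha * carry  # carryover decays by factor alpha each step
        adstock.append(carry)
    return adstock
```

For example, a single spend of 100 with α = 0.5 produces the decaying series 100, 50, 25, …, while α = 0 reduces to the raw spend.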

Hill Saturation

saturated = x / (λ + x)
  • Captures diminishing returns
  • λ = half-saturation point
  • Ensures marketing effect is bounded
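
In code this is a one-liner (note that with an exponent of 1 this Hill form coincides with the Michaelis-Menten curve):

```python
def hill_saturation(x, lam):
    """Hill saturation with shape 1: maps spend into [0, 1); lam is the half-saturation point."""
    return x / (lam + x)
```

At x == lam the output is exactly 0.5, and each further doubling of spend buys progressively less: the curve approaches but never reaches 1.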

Per-Channel Priors

| Channel Type | Adstock α Prior | Decay Speed |
|--------------|-----------------|-------------|
| Search | Beta(4, 2) | Fast |
| Social | Beta(3, 3) | Medium |
| Display | Beta(2, 3) | Slow |
| Video/TV | Beta(2, 4) | Slowest |
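
As a sketch, the table above can be encoded as a lookup of Beta shape parameters (the dictionary name and keys are illustrative; the actual mapping lives in config_generator.py). Since a Beta(a, b) prior has mean a / (a + b), this also gives the expected α per channel type:

```python
# Illustrative lookup of per-channel adstock alpha priors (Beta shape parameters),
# mirroring the table above. Keys are hypothetical channel-type labels.
ADSTOCK_ALPHA_PRIORS = {
    "search":  (4, 2),
    "social":  (3, 3),
    "display": (2, 3),
    "video":   (2, 4),
}

def beta_mean(a, b):
    """Prior mean of a Beta(a, b) distribution."""
    return a / (a + b)
```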

Target Metrics

In-Sample Diagnostics

| Metric | Target | Description |
|--------|--------|-------------|
| worst_rhat | ≤ 1.02 | Convergence indicator |
| min_ess | ≥ 100 | Effective sample size |
| divergences | = 0 | Sampling issues |
| R² | 0.55-0.70 | Model fit |

Out-of-Sample Validation (MANDATORY)

| Metric | Target | Description |
|--------|--------|-------------|
| OOS R² | ≥ 0.40 | Predictive power on unseen data |
| OOS MAPE | ≤ 20% | Average prediction error |
| Overfitting Index | ≤ 0.25 | R² drop from train to test |

⚠️ CRITICAL: A model is NOT production-ready unless OOS validation passes. The OOS validator uses a 20% temporal holdout to test true predictive power.
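
A minimal sketch of that gate, assuming the overfitting index is the drop from in-sample R² to OOS R² (the function name is illustrative; the real check lives in oos_validator.py):

```python
def oos_gate(train_r2, oos_r2, oos_mape):
    """Return True only if all three OOS thresholds from the table above pass."""
    overfitting_index = train_r2 - oos_r2  # R² drop from train to test
    return oos_r2 >= 0.40 and oos_mape <= 20.0 and overfitting_index <= 0.25
```

So a model with train R² 0.70 and OOS R² 0.42 fails even though both R² values look acceptable, because the 0.28 drop exceeds the 0.25 overfitting cap.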

Usage

CLI Usage

```bash
# From skill directory
conda activate pymc_gpu_015
python scripts/iteration_engine.py /path/to/data.csv ./output 10 v3
```

Programmatic Usage

```python
from scripts.iteration_engine import run_iteration_engine

result = run_iteration_engine(
    csv_path="/path/to/data.csv",
    output_dir="./mmm_output",
    max_iterations=10,
    conda_env="pymc_gpu_015",
    model_type="v3"  # Default: adstock + saturation + OOS
)

# Check full success (diagnostics + OOS)
print(f"Status: {result['status']}")  # 'full_success', 'diagnostics_only', or 'budget_exhausted'
print(f"OOS Passed: {result['oos_validation_passed']}")
print(f"KF Documentation: {result['kf_documentation']}")
```

Components

| Component | Purpose |
|-----------|---------|
| data_analyzer.py | Auto-detect columns from CSV |
| config_generator.py | Generate YAML configs with channel-type priors |
| model_trainer.py | Pure PyMC with adstock + Hill saturation |
| diagnostics_checker.py | NEW: Out-of-sample validation — wait, validate convergence metrics |
| oos_validator.py | NEW: Out-of-sample validation |
| iteration_engine.py | Orchestrate iterations + OOS + KF docs |
| orchestrator.py | Main entry point |

Model Types

| Type | Description | Use Case |
|------|-------------|----------|
| v3 (default) | Grid-based adstock + Hill saturation + OOS | Production MMM |
| v3_proper | PyTensor scan-based adstock | Research/accuracy |
| simple | Linear model, no transforms | Quick baseline |
| v1 | Per-channel params, simplified | Testing |
| v2 | Hierarchical structure | Advanced |

Output Artifacts

```text
output_dir/
├── data_profile.json           # Data analysis results
├── config_iter_N.yaml          # Generated configs
├── iteration_log.json          # Training iterations
├── thinking_history.json       # Detailed thinking process
├── ITERATION_ANALYSIS.md       # Knowledge Framework documentation
├── skill_report.json           # Final summary
└── artifacts_iter_N/
    ├── diagnostics/
    │   ├── diagnostics_summary.txt
    │   └── predicted_vs_actual_series.csv
    ├── metrics/
    │   └── summary.json
    ├── oos_validation.json     # ⚠️ NEW: OOS validation results
    ├── oos_predictions.csv     # ⚠️ NEW: Test set predictions
    ├── thinking_log.json       # Per-iteration thinking
    └── trace.nc                # PyMC trace
```

Knowledge Framework Output

Every run generates ITERATION_ANALYSIS.md with:

  • §1.0 Data Profile
  • §2.0 Iteration History (metrics per iteration)
  • §3.0 Thinking Process (reasoning at each step)
  • §4.0 Final Results (In-Sample + OOS Validation)
  • §5.0 Key Learnings

This enables:

  • Understanding why model decisions were made
  • Reproducibility of the modeling process
  • Learning from iteration patterns
  • Verifying model generalizes to unseen data

Requirements

  • Python 3.10+
  • PyMC 5.x
  • ArviZ
  • pandas, numpy, pyyaml
  • Conda environment: pymc_gpu_015

IMPORTANT: Pure PyMC

This skill uses pure PyMC, NOT PyMC-Marketing:

```python
import pymc as pm  # NOT pymc_marketing

with pm.Model():
    # Geometric adstock via grid interpolation
    # Hill saturation: x / (lam + x)
    # Per-channel alpha, lambda, beta estimation
    ...
```

OOS Validation Details

The out-of-sample validator (oos_validator.py) performs:

  1. Temporal Split - Last 20% of data held out as test set
  2. Train Statistics - Normalization uses ONLY training data
  3. Prediction - Apply learned parameters to test data
  4. Metrics - Calculate OOS R², MAPE, RMSE
  5. Overfitting Check - Compare in-sample vs OOS R²

```python
# OOS Validation Thresholds
DEFAULT_THRESHOLDS = {
    'oos_r2_min': 0.40,       # Minimum OOS R²
    'oos_mape_max': 20.0,     # Maximum MAPE in %
    'overfitting_max': 0.25,  # Max R² drop from train to test
    'test_size': 0.20,        # 20% held out for testing
}
```
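
The split and error metric can be sketched as follows (helper names are illustrative; the real validator additionally computes normalization statistics on the training slice only):

```python
def temporal_split(series, test_size=0.20):
    """Hold out the last `test_size` fraction of observations as the test set (no shuffling)."""
    n_test = max(1, int(round(len(series) * test_size)))
    return series[:-n_test], series[-n_test:]

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)
```

Because the split is temporal rather than random, the test set is always the most recent 20% of the series, which is what a forecasting use of the model actually faces.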

Version: 0.3.0
Created: 2025-11-26
Updated: 2025-11-26 (added mandatory OOS validation)
Author: Claude Code (Ralph Loop)