# MMM Model Builder - Release Log

## v0.3.0 (2025-11-26)

**Major Release: Mandatory Out-of-Sample Validation**

### New Features
- **OOS Validator Module** (`oos_validator.py`): Mandatory validation step for all MMM models
  - 20% temporal holdout test set
  - Normalization uses training-set statistics only (no data leakage)
  - OOS R², MAPE, RMSE metrics
  - Overfitting index check (in-sample vs OOS R²)

- **Validation Thresholds**:
  | Metric | Target | Description |
  |--------|--------|-------------|
  | OOS R² | ≥ 0.40 | Predictive power on unseen data |
  | OOS MAPE | ≤ 20% | Average prediction error |
  | Overfitting Index | ≤ 0.25 | Max R² drop from train to test |

- **Pipeline Integration**: OOS validation runs after diagnostics pass
  - Diagnostics PASS + OOS PASS → `full_success`
  - Diagnostics PASS + OOS FAIL → `diagnostics_only`; the pipeline adjusts the config and keeps iterating
  - Model not production-ready until OOS passes

- **New Output Artifacts**:
  - `oos_validation.json` - Full validation results
  - `oos_predictions.csv` - Test set actual vs predicted

- **Updated Knowledge Framework (KF) Documentation**: §4.0 now includes an OOS validation section
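
The split and leakage-free normalization described above can be sketched as follows. This is a minimal illustration, not the code in `oos_validator.py`: the function names `temporal_holdout_split` and `scale_with_train_stats` are hypothetical, the data is synthetic, and z-score standardization is an assumed choice of normalization.

```python
import numpy as np

def temporal_holdout_split(n_rows, test_fraction=0.20):
    """Split row indices chronologically; the most recent rows form the test set."""
    n_test = int(round(n_rows * test_fraction))
    indices = np.arange(n_rows)
    return indices[: n_rows - n_test], indices[n_rows - n_test:]

def scale_with_train_stats(values, train_idx, test_idx):
    """Standardize both splits using mean/std computed on the training rows only."""
    values = np.asarray(values, dtype=float)
    mean = values[train_idx].mean(axis=0)
    std = values[train_idx].std(axis=0)
    std = np.where(std == 0, 1.0, std)  # guard against constant columns
    return (values[train_idx] - mean) / std, (values[test_idx] - mean) / std

# Synthetic stand-in for a 289-day dataset with 3 feature columns (assumption).
rng = np.random.default_rng(0)
features = rng.gamma(shape=2.0, scale=100.0, size=(289, 3))

train_idx, test_idx = temporal_holdout_split(len(features))
train_scaled, test_scaled = scale_with_train_stats(features, train_idx, test_idx)
print(len(train_idx), len(test_idx))  # 231 58
```

Because the test rows never contribute to `mean` or `std`, the scaled test set can drift away from zero mean and unit variance, which is the intended behaviour for a leakage-free holdout.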

### Updated Pipeline Flow
```mermaid
graph LR
    D[Model Training] --> E{Diagnostics}
    E -->|PASS| G{OOS Validation}
    E -->|FAIL| F[Adjust Config]
    G -->|PASS| H[✅ Full Success]
    G -->|FAIL| F
    F --> D
```
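
The loop in the diagram could be expressed roughly as below. This is a sketch under stated assumptions: `run_pipeline` and its callable arguments are hypothetical stand-ins for the real pipeline steps, and the `'diagnostics_failed'` label is invented for illustration (the release only defines `'full_success'` and `'diagnostics_only'`).

```python
def run_pipeline(train_fn, diagnostics_fn, oos_fn, adjust_fn, config, max_iterations=5):
    """Train, check diagnostics, then OOS; adjust the config and retry on any failure."""
    status = "diagnostics_failed"  # hypothetical label, not defined by the release
    for iteration in range(1, max_iterations + 1):
        model = train_fn(config)
        diagnostics_ok = diagnostics_fn(model)
        # OOS validation only runs once diagnostics have passed.
        if diagnostics_ok and oos_fn(model):
            return {"status": "full_success", "iterations": iteration, "config": config}
        status = "diagnostics_only" if diagnostics_ok else "diagnostics_failed"
        config = adjust_fn(config)
    return {"status": status, "iterations": max_iterations, "config": config}

# Toy stand-ins: the "model" is just its config; each adjustment adds regularization.
result = run_pipeline(
    train_fn=lambda cfg: cfg,
    diagnostics_fn=lambda model: model["regularization"] >= 1,
    oos_fn=lambda model: model["regularization"] >= 2,
    adjust_fn=lambda cfg: {"regularization": cfg["regularization"] + 1},
    config={"regularization": 0},
)
print(result["status"], result["iterations"])  # full_success 3
```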

### Breaking Changes
- `result['status']` now returns `'full_success'` only when BOTH diagnostics AND OOS pass
- New status `'diagnostics_only'` for cases where diagnostics pass but OOS fails
- Models that don't pass OOS validation are not considered production-ready
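
For downstream code that reads `result['status']`, a guard along these lines may help with the migration. The helper names are hypothetical; only the two status strings come from this release.

```python
def is_production_ready(result):
    """A model is production-ready only when diagnostics and OOS validation both passed."""
    return result.get("status") == "full_success"

def describe_status(result):
    """Map a pipeline status to a short human-readable explanation."""
    status = result.get("status")
    if status == "full_success":
        return "diagnostics and OOS validation passed"
    if status == "diagnostics_only":
        return "diagnostics passed, OOS validation failed; not production-ready"
    return f"unrecognized or failed status: {status!r}"

print(is_production_ready({"status": "diagnostics_only"}))  # False
```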

### Test Results
- All unit tests pass (see `2025-11-26_oos_validation_test/00_test_oos_validation.py`)
- OOS validator correctly splits Fastg8 data: 231 train / 58 test
- Metrics calculation verified: R², MAPE, RMSE
- Adstock transformation verified: higher α = more carryover

### Why OOS Validation Matters
A model with high in-sample R² may be overfitting to training data noise. OOS validation ensures the model captures true marketing dynamics that generalize to future periods, not just historical patterns.

---

## v0.2.0 (2025-11-26)

**Major Release: Adstock + Saturation Transforms**

### New Features
- **Geometric Adstock**: `adstock_t = spend_t + α * adstock_{t-1}` for carryover effects
- **Hill Saturation**: `x / (λ + x)` for diminishing returns modeling
- **Per-Channel Priors**: Prior on the adstock α, chosen by channel type:
  | Channel Type | Prior | Decay |
  |--------------|-------|-------|
  | Search | Beta(4,2) | Fast |
  | Social | Beta(3,3) | Medium |
  | Display | Beta(2,3) | Slow |
  | Video/TV | Beta(2,4) | Slowest |
- **Knowledge Framework Output**: Auto-generated `ITERATION_ANALYSIS.md` with:
  - Data profile and channel list
  - Iteration history with metrics
  - Thinking process documentation
  - Final results and key learnings
- **Thinking Log**: Tracks reasoning at each pipeline step

### Model Architecture (V3)
```
spend → [Geometric Adstock] → [Hill Saturation] → [Beta Coefficient] → revenue
           α∈[0,1]              λ (half-sat)        β≥0
```
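
The transform chain above can be illustrated in plain NumPy. This is a sketch for intuition only, not the PyMC model code: the function names are hypothetical, and the recursion is assumed to start from zero carryover (so `adstock_0 = spend_0`).

```python
import numpy as np

def geometric_adstock(spend, alpha):
    """adstock_t = spend_t + alpha * adstock_{t-1}, starting from zero carryover."""
    spend = np.asarray(spend, dtype=float)
    adstock = np.zeros_like(spend)
    carry = 0.0
    for t, value in enumerate(spend):
        carry = value + alpha * carry
        adstock[t] = carry
    return adstock

def hill_saturation(x, lam):
    """x / (lam + x): equals 0.5 at x == lam (half-saturation) and approaches 1."""
    x = np.asarray(x, dtype=float)
    return x / (lam + x)

def channel_contribution(spend, alpha, lam, beta):
    """spend -> adstock -> saturation -> scaled by a non-negative coefficient."""
    return beta * hill_saturation(geometric_adstock(spend, alpha), lam)

# A single burst of spend followed by two days of zero spend.
burst = [100.0, 0.0, 0.0]
print(geometric_adstock(burst, alpha=0.5))  # [100.  50.  25.]
```

With a larger α the same burst decays more slowly, so more of its effect carries over into later days.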

### Bug Fixes
- Fixed `pm.math.cast` error (doesn't exist in PyMC)
- Fixed f-string format specifier ValueError
- Fixed numpy bool_ JSON serialization TypeError
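
For context on the last item: `json.dumps` rejects NumPy scalar types such as `np.bool_` and `np.int64`. One common workaround, shown here as an illustration and not necessarily the fix applied in this release, is a `default` handler that converts NumPy values to built-in Python types. The `to_builtin` helper and the `summary` dict are hypothetical.

```python
import json
import numpy as np

def to_builtin(obj):
    """Fallback for json.dumps: convert NumPy scalars and arrays to Python types."""
    if isinstance(obj, np.generic):
        return obj.item()
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

# Made-up summary mixing NumPy scalar types.
summary = {"passed": np.bool_(True), "min_ess": np.int64(2971), "r2": np.float64(0.6457)}

encoded = json.dumps(summary, default=to_builtin)
decoded = json.loads(encoded)
print(decoded["passed"], decoded["min_ess"])  # True 2971
```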

### Test Results: Fastg8 SmartMedia v3 ✅ SUCCESS
- **Date:** 2025-11-26
- **Model:** v3 (adstock + Hill saturation)
- **Iterations:** 1 (converged on first attempt!)
- **Results:**
  | Metric | v0.2.0 (v3) | v0.1.0 (simple) | Improvement |
  |--------|-------------|-----------------|-------------|
  | R² | 0.6457 | 0.6151 | +0.0306 (≈5% relative) ⬆️ |
  | worst_rhat | 1.0021 | 1.0018 | ~same |
  | min_ess | 2971 | 4204 | lower, still well above the ≥ 100 target |
  | divergences | 0 | 0 | same |
- **KF Output:** `algorithms/revenue_div/projects/mmm_agentic_approach/2025-11-26_v3_run2/ITERATION_ANALYSIS.md`

### Key Insight
Adding adstock and saturation transforms raised R² from 0.6151 to 0.6457 (about 5% relative) with zero divergences and worst_rhat still close to 1.00. The model now captures carryover effects and diminishing returns rather than only a linear spend→revenue relationship.

---

## v0.1.0 (2025-11-26)

**Initial Release**

### Features
- Data analyzer with auto-detection of date, target, channels, controls
- Config generator with prior selection based on channel type
- Pure PyMC model trainer (NOT PyMC-Marketing)
- Diagnostics checker for rhat, ESS, divergences, R²
- Iteration engine with automatic config adjustment
- Orchestrator for full pipeline execution

### Prior Selection Methodology
- Source #1: Bayesian theory (weakly-informative priors)
- Source #2: Open MMM practice (Robyn, LightweightMMM)
- Source #3: Client data context

### Target Metrics
- worst_rhat ≤ 1.02
- min_ess ≥ 100
- divergences = 0
- R² ∈ [0.55, 0.70]
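
As a rough illustration, the targets above translate into a check like the following. The `check_diagnostics` name and the metric dictionary layout are assumptions; the example values are the Test Run 1 results reported in Testing History.

```python
def check_diagnostics(metrics):
    """Compare sampler and fit metrics with the release targets."""
    checks = {
        "worst_rhat": metrics["worst_rhat"] <= 1.02,
        "min_ess": metrics["min_ess"] >= 100,
        "divergences": metrics["divergences"] == 0,
        "r2": 0.55 <= metrics["r2"] <= 0.70,
    }
    return all(checks.values()), checks

# Values reported for Test Run 1 (Fastg8 SmartMedia).
passed, checks = check_diagnostics(
    {"worst_rhat": 1.001818, "min_ess": 4204.8, "divergences": 0, "r2": 0.6151}
)
print(passed)  # True
```

Returning the per-metric results alongside the overall flag makes it easier for an iteration step to decide which part of the config to adjust.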

### Dependencies
- PyMC 5.x (pure PyMC, NOT PyMC-Marketing)
- ArviZ for diagnostics
- Conda env: pymc_gpu_015

### Author
Claude Code (Ralph Loop Execution)

---

## Testing History

### Test Run 1: Fastg8 SmartMedia ✅ SUCCESS
- **Date:** 2025-11-26
- **Data:** 289 days, 8 channels, 14 controls
- **Iterations:** 3 (quick test → regularized → balanced)
- **Status:** ✅ COMPLETE - All metrics achieved

**Final Results:**
| Metric | This Run | Reference (231) | Status |
|--------|-----------|-----------------|--------|
| worst_rhat | 1.001818 | 1.012425 | ✅ Better |
| min_ess | 4204.8 | 160.87 | ✅ Much better |
| divergences | 0 | 0 | ✅ Same |
| R² | 0.6151 | 0.6087 | ✅ Within 0.01 |

**Artifacts:**
- `/client_cases/im_XXXX_XXX___ExampleClient/customer_data/Advanced_analytics/MMM/skill_test_final/`
  - `diagnostics/diagnostics_summary.txt`
  - `metrics/summary.json`
  - `components_daily.csv`
  - `trace.nc`

**Key Insight:** Matched the reference R² (0.6151 vs 0.6087) without looking at the reference code. The pure PyMC model with a simplified linear structure also converges better than the reference (worst_rhat 1.0018 vs 1.0124, min_ess 4204.8 vs 160.87).
