Adaptive Walk-Forward Epoch Selection (AWFES) Skill

Machine-readable reference for adaptive epoch selection within Walk-Forward Optimization (WFO). Optimizes training epochs per-fold using Walk-Forward Efficiency (WFE) as the objective.

Self-Evolving Skill: This skill improves through use. If instructions are wrong, parameters drifted, or a workaround was needed — fix this file immediately, don't defer. Only update for real, reproducible issues.

When to Use This Skill

Use this skill when:

Selecting optimal training epochs for ML models in WFO
Avoiding overfitting via Walk-Forward Efficiency metrics
Implementing per-fold adaptive epoch selection
Computing efficient frontiers for epoch-performance trade-offs
Carrying epoch priors across WFO folds

Quick Start

from adaptive_wfo_epoch import AWFESConfig, compute_efficient_frontier

# Generate epoch candidates from search bounds and granularity
config = AWFESConfig.from_search_space(
    min_epoch=100,
    max_epoch=2000,
    granularity=5,  # Number of frontier points
)
# config.epoch_configs → [100, 211, 447, 945, 2000] (log-spaced)

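# Per-fold epoch sweep (wfo_folds and train_and_evaluate are not part of
# adaptive_wfo_epoch; supply your own fold iterator and training routine)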
for fold in wfo_folds:
    epoch_metrics = []
    for epoch in config.epoch_configs:
        is_sharpe, oos_sharpe = train_and_evaluate(fold, epochs=epoch)
        wfe = config.compute_wfe(is_sharpe, oos_sharpe, n_samples=len(fold.train))
        epoch_metrics.append({"epoch": epoch, "wfe": wfe, "is_sharpe": is_sharpe})

    # Select from efficient frontier
    selected_epoch = compute_efficient_frontier(epoch_metrics)

    # Carry forward to next fold as prior
    prior_epoch = selected_epoch

Methodology Overview

What This Is

Per-fold adaptive epoch selection, in which each fold runs these steps:

Train models across a range of epoch counts (e.g., 400, 800, 1000, 2000)
Compute WFE = OOS_Sharpe / IS_Sharpe for each epoch count
Find the "efficient frontier": the epoch counts that offer the best trade-off between WFE and training cost
Select the optimal epoch count from the frontier for OOS evaluation
Carry the selected epoch count forward as the prior for the next fold

What This Is NOT

NOT early stopping: Early stopping monitors validation loss continuously; this evaluates discrete candidates post-hoc
NOT Bayesian optimization: No surrogate model; direct evaluation of all candidates
NOT nested cross-validation: Uses temporal WFO, not shuffled splits

Academic Foundations

| Concept | Citation | Key Insight |
| --- | --- | --- |
| Walk-Forward Efficiency | Pardo (1992, 2008) | WFE = OOS_Return / IS_Return as robustness metric |
| Deflated Sharpe Ratio | Bailey & López de Prado (2014) | Adjusts for multiple testing |
| Pareto-Optimal HP Selection | Bischl et al. (2023) | Multi-objective hyperparameter optimization |
| Warm-Starting | Nomura & Ono (2021) | Transfer knowledge between optimization runs |

See references/academic-foundations.md for full literature review.

Core Formula: Walk-Forward Efficiency

def compute_wfe(
    is_sharpe: float,
    oos_sharpe: float,
    n_samples: int | None = None,
) -> float | None:
    """Walk-Forward Efficiency - measures performance transfer.

    WFE = OOS_Sharpe / IS_Sharpe

    Interpretation (guidelines, not hard thresholds):
    - WFE ≥ 0.70: Excellent transfer (low overfitting)
    - WFE 0.50-0.70: Good transfer
    - WFE 0.30-0.50: Moderate transfer (investigate)
    - WFE < 0.30: Severe overfitting (likely reject)

    The IS_Sharpe minimum is derived from signal-to-noise ratio,
    not a fixed magic number. See compute_is_sharpe_threshold().

    Reference: Pardo (2008) "The Evaluation and Optimization of Trading Strategies"
    """
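    # Data-driven threshold: IS_Sharpe must exceed the 2σ noise floor.
    # When n_samples is missing (None or 0), fall back to a fixed 0.1.
    min_is_sharpe = compute_is_sharpe_threshold(n_samples) if n_samples else 0.1

    # A near-zero denominator makes the ratio unstable, so WFE is reported as undefined.
    if abs(is_sharpe) < min_is_sharpe:
        return None
    return oos_sharpe / is_sharpe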

Principled Configuration Framework

All parameters are derived from first principles or data characteristics. AWFESConfig provides unified configuration with log-spaced epoch generation, Bayesian variance derivation from search space, and market-specific annualization factors.

See references/configuration-framework.md for the full AWFESConfig class and compute_is_sharpe_threshold() implementation.
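The log-spaced epoch generation mentioned above can be sketched as follows. This is a minimal illustration, not the AWFESConfig implementation: `log_spaced_epochs` is a hypothetical helper, and truncating interior points to integers is an assumption (it happens to reproduce the candidates shown in Quick Start).

```python
def log_spaced_epochs(min_epoch: int, max_epoch: int, granularity: int) -> list:
    """Return `granularity` epoch counts spaced evenly in log space.

    Hypothetical helper; the real AWFESConfig.from_search_space may differ.
    """
    if granularity < 2 or min_epoch <= 0 or max_epoch <= min_epoch:
        raise ValueError("need granularity >= 2 and 0 < min_epoch < max_epoch")
    # Constant multiplicative step between neighbouring candidates.
    ratio = (max_epoch / min_epoch) ** (1.0 / (granularity - 1))
    # Endpoints are pinned exactly; interior points are truncated (assumption).
    interior = [int(min_epoch * ratio**i) for i in range(1, granularity - 1)]
    return [min_epoch, *interior, max_epoch]


print(log_spaced_epochs(100, 2000, 5))  # [100, 211, 447, 945, 2000]
```

Log spacing places more candidates at low epoch counts, where an extra epoch typically changes the model more than it does late in training.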

Guardrails (Principled Guidelines)

G1: WFE Thresholds - 0.30 (reject), 0.50 (warning), 0.70 (target) based on practitioner consensus
G2: IS_Sharpe Minimum - Data-driven threshold: 2/sqrt(n) adapts to sample size
G3: Stability Penalty - Adaptive threshold derived from WFE variance prevents epoch churn
G4: DSR Adjustment - Deflated Sharpe corrects for epoch selection multiplicity via Gumbel distribution

See references/guardrails.md for full implementations of all guardrails.
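As a rough illustration of G1 and G2, the sketch below encodes the 2/sqrt(n) noise floor and the WFE bands from the compute_wfe docstring. The function names and band labels are hypothetical; the reference implementations live in references/guardrails.md and may differ.

```python
import math
from typing import Optional


def is_sharpe_threshold(n_samples: int) -> float:
    """G2 sketch: minimum in-sample Sharpe, 2/sqrt(n)."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return 2.0 / math.sqrt(n_samples)


def classify_wfe(wfe: Optional[float]) -> str:
    """G1 sketch: map a WFE value to a guideline band (labels are assumptions)."""
    if wfe is None:
        return "invalid"      # IS Sharpe was below the noise floor
    if wfe < 0.30:
        return "reject"       # severe overfitting
    if wfe < 0.50:
        return "investigate"  # moderate transfer
    if wfe < 0.70:
        return "good"
    return "excellent"


print(is_sharpe_threshold(400))  # 0.1
print(classify_wfe(0.55))        # good
```

With 400 training samples the threshold is 0.1, and it shrinks as the sample grows, so larger folds accept smaller in-sample Sharpe values as meaningful.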

WFE Aggregation Methods

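Under the null hypothesis of no skill, IS and OOS Sharpe are both zero-mean and approximately normal, so their ratio (WFE) follows a Cauchy distribution, which has no defined mean. A plain arithmetic mean of per-fold WFE is therefore unreliable; prefer median or pooled methods: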

Pooled WFE: Precision-weighted by sample size (best for variable fold sizes)
Median WFE: Robust to outliers (best for suspected regime changes)
Weighted Mean: Inverse-variance weighting (best for homogeneous folds)

See references/wfe-aggregation.md for implementations and selection guide.
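The sketch below illustrates two of these aggregators. It is an assumption-laden example: `median_wfe` and `pooled_wfe` are hypothetical names, and the pooled form shown here (ratio of sample-size-weighted Sharpe sums) is one plausible reading of "precision-weighted by sample size", not necessarily the formula in references/wfe-aggregation.md.

```python
import statistics


def median_wfe(wfes):
    """Median of the valid per-fold WFE values; None entries are skipped."""
    valid = [w for w in wfes if w is not None]
    return statistics.median(valid) if valid else None


def pooled_wfe(is_sharpes, oos_sharpes, n_samples):
    """Ratio of sample-size-weighted sums (assumed definition of pooled WFE)."""
    numerator = sum(n * oos for n, oos in zip(n_samples, oos_sharpes))
    denominator = sum(n * is_ for n, is_ in zip(n_samples, is_sharpes))
    return numerator / denominator if denominator != 0 else None


# One fold with a tiny IS Sharpe produces an extreme ratio (5.0);
# the median ignores it, whereas an arithmetic mean would not.
print(median_wfe([0.2, None, 0.6, 5.0]))                  # 0.6
print(pooled_wfe([2.0, 1.0], [1.0, 0.5], [100, 300]))     # 0.5
```

Pooling sums numerators and denominators before dividing, so no single fold with a near-zero IS Sharpe can blow up the aggregate.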

Efficient Frontier Algorithm

Pareto-optimal epoch selection: an epoch is on the frontier if no other epoch dominates it (better WFE AND lower training time). The AdaptiveEpochSelector class maintains state across folds with adaptive stability penalties.

See references/efficient-frontier.md for the full algorithm and carry-forward mechanism.
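A minimal Pareto filter along these lines might look like the sketch below. It is not the AdaptiveEpochSelector: `pareto_frontier` is a hypothetical function, the `cost` field is an assumed stand-in for training time (here simply the epoch count), and dominance uses the standard convention of "no worse on both objectives, strictly better on at least one".

```python
def pareto_frontier(candidates):
    """Keep candidates that no other candidate dominates.

    candidates: list of dicts with 'epoch', 'wfe' (higher is better)
    and 'cost' (lower is better).
    """
    frontier = []
    for c in candidates:
        dominated = any(
            o["wfe"] >= c["wfe"]
            and o["cost"] <= c["cost"]
            and (o["wfe"] > c["wfe"] or o["cost"] < c["cost"])
            for o in candidates
        )
        if not dominated:
            frontier.append(c)
    return frontier


sweep = [
    {"epoch": 100, "wfe": 0.40, "cost": 100},
    {"epoch": 211, "wfe": 0.55, "cost": 211},
    {"epoch": 447, "wfe": 0.52, "cost": 447},   # dominated by epoch 211
    {"epoch": 945, "wfe": 0.60, "cost": 945},
    {"epoch": 2000, "wfe": 0.58, "cost": 2000}, # dominated by epoch 945
]
print([c["epoch"] for c in pareto_frontier(sweep)])  # [100, 211, 945]
```

The filter only narrows the candidates; choosing one point from the frontier (and applying the stability penalty) is a separate step.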

Anti-Patterns

| Anti-Pattern | Symptom | Fix | Severity |
| --- | --- | --- | --- |
| Expanding window (range bars) | Train size grows per fold | Use fixed sliding window | CRITICAL |
| Peak picking | Best epoch always at sweep boundary | Expand range, check for plateau | HIGH |
| Insufficient folds | effective_n < 30 | Increase folds or data span | HIGH |
| Ignoring temporal autocorr | Folds correlated | Use purged CV, gap between folds | HIGH |
| Overfitting to IS | IS >> OOS Sharpe | Reduce epochs, add regularization | HIGH |
| sqrt(252) for crypto | Inflated Sharpe | Use sqrt(365) or sqrt(7) weekly | MEDIUM |
| Single epoch selection | No uncertainty quantification | Report confidence interval | MEDIUM |
| Meta-overfitting | Epoch selection itself overfits | Limit to 3-4 candidates max | HIGH |

CRITICAL: Never use expanding window for range bar ML training. See references/anti-patterns.md for the full analysis (Section 7).

Decision Tree

See references/epoch-selection-decision-tree.md for the full practitioner decision tree.

Start
  │
  ├─ IS_Sharpe > compute_is_sharpe_threshold(n)? ──NO──> Mark WFE invalid, use fallback
  │         │                                            (threshold = 2/√n, adapts to sample size)
  │        YES
  │         │
  ├─ Compute WFE for each epoch
  │         │
  ├─ Any WFE > 0.30? ──NO──> REJECT all epochs (severe overfit)
  │         │                (guideline, not hard threshold)
  │        YES
  │         │
  ├─ Compute efficient frontier
  │         │
  ├─ Apply AdaptiveStabilityPenalty
  │         │ (threshold derived from WFE variance)
  └─> Return selected epoch
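The first two branches of the tree can be sketched in code as below. This is an illustrative simplification: `decide` is a hypothetical function, the IS_Sharpe check is applied per epoch candidate (an assumption), and the efficient-frontier and stability-penalty steps are left out.

```python
import math


def decide(epoch_results, n_samples, wfe_floor=0.30):
    """Apply the validity and rejection branches of the decision tree.

    epoch_results: list of (epoch, is_sharpe, oos_sharpe) tuples.
    Returns (status, survivors), where survivors is a list of (epoch, wfe).
    """
    threshold = 2.0 / math.sqrt(n_samples)  # noise floor, adapts to sample size
    # Branch 1: WFE is only valid when IS_Sharpe clears the threshold.
    wfes = [(e, oos / is_) for e, is_, oos in epoch_results if is_ > threshold]
    if not wfes:
        return "fallback", []
    # Branch 2: reject everything if no candidate clears the WFE guideline.
    survivors = [(e, w) for e, w in wfes if w > wfe_floor]
    if not survivors:
        return "reject", []
    # Remaining steps (frontier, stability penalty) operate on the survivors.
    return "select", survivors


print(decide([(100, 1.0, 0.6), (200, 2.0, 0.4)], n_samples=400))
# ('select', [(100, 0.6)])
```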

Integration with rangebar-eval-metrics

This skill extends rangebar-eval-metrics:

| Metric Source | Used For | Reference |
| --- | --- | --- |
| sharpe_tw | WFE numerator (OOS) and denominator (IS) | range-bar-metrics.md |
| n_bars | Sample size for aggregation weights | metrics-schema.md |
| psr, dsr | Final acceptance criteria | sharpe-formulas.md |
| prediction_autocorr | Validate model isn't collapsed | ml-prediction-quality.md |
| is_collapsed | Model health check | ml-prediction-quality.md |
| Extended risk metrics | Deep risk analysis (optional) | risk-metrics.md |

Recommended Workflow

Compute base metrics using rangebar-eval-metrics:compute_metrics.py
Feed to AWFES for epoch selection with sharpe_tw as primary signal
Validate with psr > 0.85 and dsr > 0.50 before deployment
Monitor is_collapsed and prediction_autocorr for model health
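The validation step of this workflow could be expressed as a simple gate, sketched below. `passes_deployment_gate` is a hypothetical helper, and the flat dictionary of metrics is an assumed shape; the actual output of compute_metrics.py may be structured differently.

```python
def passes_deployment_gate(metrics, psr_min=0.85, dsr_min=0.50):
    """Return True only if the model is healthy and clears both thresholds.

    metrics: dict with 'psr', 'dsr', and optionally 'is_collapsed' (assumed keys).
    """
    if metrics.get("is_collapsed", False):
        return False  # a collapsed model fails regardless of its scores
    # Thresholds are strict inequalities, matching "psr > 0.85 and dsr > 0.50".
    return metrics["psr"] > psr_min and metrics["dsr"] > dsr_min


print(passes_deployment_gate({"psr": 0.91, "dsr": 0.62, "is_collapsed": False}))  # True
print(passes_deployment_gate({"psr": 0.91, "dsr": 0.62, "is_collapsed": True}))   # False
```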

OOS Application Phase

AWFES uses Nested WFO with three data splits per fold (Train 60% / Val 20% / Test 20%) with 6% embargo gaps at each boundary. The per-fold workflow: epoch sweep on train, WFE computation on validation, Bayesian update, final model training on train+val, evaluation on test.

See references/oos-workflow.md for the complete workflow with diagrams, BayesianEpochSelector class, and apply_awfes_to_test() implementation. Also see references/oos-application.md for the extended reference.
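To make the split concrete, here is one possible way to compute the index ranges. This is a sketch under a stated assumption: the embargo bars are dropped from the start of the validation and test segments, so the usable validation and test windows are shorter than 20%. The reference workflow may place the gaps differently; `nested_split` is a hypothetical helper.

```python
def nested_split(n_bars, train_pct=60, val_pct=20, embargo_pct=6):
    """Return (train, val, test) index ranges with embargo gaps between them.

    Percentages are integers to avoid floating-point rounding at the boundaries.
    """
    embargo = n_bars * embargo_pct // 100
    train_end = n_bars * train_pct // 100
    val_end = n_bars * (train_pct + val_pct) // 100
    train = range(0, train_end)
    val = range(train_end + embargo, val_end)   # gap after train (assumption)
    test = range(val_end + embargo, n_bars)     # gap after val (assumption)
    return train, val, test


train, val, test = nested_split(1000)
print(train, val, test)  # range(0, 600) range(660, 800) range(860, 1000)
```

The gaps keep observations that straddle a boundary (for example, labels computed over a forward window) from appearing on both sides of it.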

Epoch Smoothing Methods

Bayesian updating (recommended) provides principled, uncertainty-aware smoothing. Alternatives include EMA and SMA. Initialization via AWFESConfig.from_search_space() derives variances from the epoch range automatically.

See references/epoch-smoothing-methods.md for all methods, formulas, and initialization strategies. See references/epoch-smoothing.md for extended mathematical analysis.
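For intuition, Bayesian smoothing of the epoch estimate can be sketched as a conjugate normal-normal update. This is an assumption about the form of the update, not a copy of BayesianEpochSelector: `bayesian_epoch_update` is a hypothetical name, and the observation variance is supplied by the caller rather than derived from the search space.

```python
def bayesian_epoch_update(prior_mean, prior_var, observed_epoch, obs_var):
    """Combine a prior epoch estimate with a new fold's optimal epoch.

    Returns (posterior_mean, posterior_var) for a normal prior and a
    normal observation with known variance.
    """
    total = prior_var + obs_var
    # Each source is weighted by the other's variance: the more uncertain
    # the prior, the further the estimate moves toward the observation.
    posterior_mean = (prior_mean * obs_var + observed_epoch * prior_var) / total
    posterior_var = prior_var * obs_var / total
    return posterior_mean, posterior_var


mean, var = bayesian_epoch_update(1000.0, 100.0, 500.0, 100.0)
print(mean, var)  # 750.0 50.0
```

Unlike EMA or SMA, the effective smoothing weight is not fixed: the posterior variance shrinks with every fold, so later observations move the estimate less.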

OOS Metrics Specification

Three-tier metric hierarchy for test evaluation:

Tier 1 (Primary): sharpe_tw, hit_rate, cumulative_pnl, positive_sharpe_folds, wfe_test
Tier 2 (Risk): max_drawdown, calmar_ratio, profit_factor, cvar_10pct
Tier 3 (Statistical): psr, dsr, binomial_pvalue, hac_ttest_pvalue

See references/oos-metrics-implementation.md for full metric tables, compute_oos_metrics(), and fold aggregation code. See references/oos-metrics.md for threshold justifications.
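A few of these metrics can be sketched from a per-bar PnL series as below. The definitions are the common textbook ones and are assumptions here; compute_oos_metrics() in the reference file may define them differently (for example, drawdown on returns rather than cumulative PnL).

```python
def hit_rate(pnl):
    """Fraction of periods with strictly positive PnL."""
    return sum(p > 0 for p in pnl) / len(pnl)


def max_drawdown(pnl):
    """Largest peak-to-trough drop of cumulative PnL, starting from zero."""
    peak = cumulative = worst = 0.0
    for p in pnl:
        cumulative += p
        peak = max(peak, cumulative)
        worst = max(worst, peak - cumulative)
    return worst


def profit_factor(pnl):
    """Gross profit divided by gross loss; infinite when there are no losses."""
    gains = sum(p for p in pnl if p > 0)
    losses = -sum(p for p in pnl if p < 0)
    return gains / losses if losses > 0 else float("inf")


pnl = [1.0, -0.5, 2.0, -1.5, 0.5]
print(hit_rate(pnl), max_drawdown(pnl), profit_factor(pnl))  # 0.6 1.5 1.75
```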

Look-Ahead Bias Prevention

CRITICAL (v3 fix): TEST must use prior_bayesian_epoch (from prior folds only), NOT val_optimal_epoch. The Bayesian update happens AFTER test evaluation, ensuring information flows only from past to present.

See references/look-ahead-bias-v3.md for the v3 fix details, embargo requirements, validation checklist, and anti-patterns. See references/look-ahead-bias.md for detailed examples.
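The ordering constraint can be illustrated with a short loop. This is a sketch only: `epochs_used_on_test` is a hypothetical function, and an EMA update stands in for the Bayesian update to keep the example small. What matters is the order of the two statements inside the loop.

```python
def epochs_used_on_test(val_optimal_epochs, initial_prior, smooth=0.5):
    """Return the epoch count applied to each fold's TEST split.

    val_optimal_epochs: the validation-optimal epoch found in each fold.
    """
    prior = initial_prior
    used = []
    for val_optimal in val_optimal_epochs:
        # 1. TEST uses the prior built from earlier folds only.
        used.append(prior)
        # 2. Only AFTER test evaluation does this fold's result update the prior.
        prior = (1 - smooth) * prior + smooth * val_optimal
    return used


print(epochs_used_on_test([400, 800], initial_prior=1000))  # [1000, 700.0]
```

In the first fold the test split is evaluated at 1000 epochs even though validation preferred 400; that preference only influences the second fold.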

References

| Topic | Reference File |
| --- | --- |
| Academic Literature | academic-foundations.md |
| Mathematical Formulation | mathematical-formulation.md |
| Configuration Framework | configuration-framework.md |
| Guardrails | guardrails.md |
| WFE Aggregation | wfe-aggregation.md |
| Efficient Frontier | efficient-frontier.md |
| Decision Tree | epoch-selection-decision-tree.md |
| Anti-Patterns | anti-patterns.md |
| OOS Workflow | oos-workflow.md |
| OOS Application | oos-application.md |
| Epoch Smoothing Methods | epoch-smoothing-methods.md |
| Epoch Smoothing Analysis | epoch-smoothing.md |
| OOS Metrics Impl | oos-metrics-implementation.md |
| OOS Metrics Thresholds | oos-metrics.md |
| Look-Ahead Bias (v3) | look-ahead-bias-v3.md |
| Look-Ahead Bias Examples | look-ahead-bias.md |
| Feature Sets | feature-sets.md |
| xLSTM Implementation | xlstm-implementation.md |
| Range Bar Metrics | range-bar-metrics.md |
| Troubleshooting | troubleshooting.md |

Related Skills

| Skill | Relationship |
| --- | --- |
| sharpe-ratio-non-iid-corrections | Generalized Sharpe variance, DSR for WFE validation |
| opendeviation-eval-metrics | Metric definitions consumed by WFE |

Full Citations

Bailey, D. H., & López de Prado, M. (2014). The deflated Sharpe ratio: Correcting for selection bias, backtest overfitting and non-normality. The Journal of Portfolio Management, 40(5), 94-107.
Bischl, B., et al. (2023). Multi-Objective Hyperparameter Optimization in Machine Learning. ACM Transactions on Evolutionary Learning and Optimization.
López de Prado, M. (2018). Advances in Financial Machine Learning. Wiley. Chapter 7.
Nomura, M., & Ono, I. (2021). Warm Starting CMA-ES for Hyperparameter Optimization. AAAI Conference on Artificial Intelligence.
Pardo, R. E. (2008). The Evaluation and Optimization of Trading Strategies, 2nd Edition. John Wiley & Sons.

Post-Execution Reflection

After this skill completes, check before closing:

Did the command succeed? — If not, fix the instruction or error table that caused the failure.
Did parameters or output change? — If the underlying tool's interface drifted, update Usage examples and Parameters table to match.
Was a workaround needed? — If you had to improvise (different flags, extra steps), update this SKILL.md so the next invocation doesn't need the same workaround.

Only update if the issue is real and reproducible — not speculative.
