Agent Skills: backtesting.py Oracle Validation for Range Bar Patterns

backtesting.py configuration for SQL oracle validation and range bar pattern backtesting. Use when running backtesting.py against ClickHouse SQL results, configuring Backtest() constructor, handling overlapping trades, multi-position mode, rolling quantile NaN handling, trade sorting, or oracle gate validation. TRIGGERS - backtesting.py, Backtest(), hedging, exclusive_orders, multi-position, overlapping trades, oracle validation, SQL vs Python, trade comparison, entry price mismatch, signal count mismatch, rolling quantile NaN, ExitTime sort, stats._trades, gen600_strategy, champion_strategy, gen300_strategy, barrier setup.

UncategorizedID: terrylica/cc-skills/backtesting-py-oracle

Install this agent skill to your local

pnpm dlx add-skill https://github.com/terrylica/cc-skills/tree/HEAD/plugins/quant-research/skills/backtesting-py-oracle

Skill Files

Browse the full folder contents for backtesting-py-oracle.

Download Skill

Loading file tree…

plugins/quant-research/skills/backtesting-py-oracle/SKILL.md

Skill Metadata

Name
backtesting-py-oracle
Description
backtesting.py configuration for SQL oracle validation and range bar pattern backtesting. Use when running backtesting.py against ClickHouse SQL results, configuring Backtest() constructor, handling overlapping trades, multi-position mode, rolling quantile NaN handling, trade sorting, or oracle gate validation. TRIGGERS - backtesting.py, Backtest(), hedging, exclusive_orders, multi-position, overlapping trades, oracle validation, SQL vs Python, trade comparison, entry price mismatch, signal count mismatch, rolling quantile NaN, ExitTime sort, stats._trades, gen600_strategy, champion_strategy, gen300_strategy, barrier setup.

backtesting.py Oracle Validation for Range Bar Patterns

Configuration and anti-patterns for using backtesting.py to validate ClickHouse SQL sweep results. Ensures bit-atomic replicability between SQL and Python trade evaluation.

Companion skills: clickhouse-antipatterns (SQL correctness, AP-16) | sweep-methodology (sweep design) | rangebar-eval-metrics (evaluation metrics)

Validated: Gen600 oracle verification (2026-02-12) — 3 assets, 5 gates, ALL PASS.


Critical Configuration (NEVER omit)

from backtesting import Backtest

bt = Backtest(
    df,
    Strategy,
    cash=100_000,
    commission=0,
    hedging=True,           # REQUIRED: Multiple concurrent positions
    exclusive_orders=False,  # REQUIRED: Don't auto-close on new signal
)

Why: SQL evaluates each signal independently (overlapping trades allowed). Without hedging=True, backtesting.py skips signals while a position is open, producing fewer trades than SQL. This was discovered when SOLUSDT produced 105 Python trades vs 121 SQL trades — 16 signals were silently skipped.


Anti-Patterns (Ordered by Severity)

BP-01: Missing Multi-Position Mode (CRITICAL)

Symptom: Python produces fewer trades than SQL. Gate 1 (signal count) fails.

Root Cause: Default exclusive_orders=True prevents opening new positions while one is active.

Fix: Always use hedging=True, exclusive_orders=False.

BP-02: ExitTime Sort Order (CRITICAL)

Symptom: Entry prices appear mismatched (Gate 3 fails) even though both SQL and Python use the same price source.

Root Cause: stats._trades is sorted by ExitTime, not EntryTime. When overlapping trades exit in a different order than they entered, trade[i] no longer maps to signal[i].

Fix:

trades = stats._trades.sort_values("EntryTime").reset_index(drop=True)

BP-03: NaN Poisoning in Rolling Quantile (CRITICAL)

Symptom: Cross-asset tests fail with far fewer Python trades. Feature quantile becomes NaN and propagates forward indefinitely.

Root Cause: np.percentile with NaN inputs returns NaN. If even one NaN feature value enters the rolling window, all subsequent quantiles become NaN, making all subsequent filter comparisons fail.

Fix: Skip NaN values when building the signal window:

def _rolling_quantile_on_signals(feature_arr, is_signal_arr, quantile_pct, window=1000):
    result = np.full(len(feature_arr), np.nan)
    signal_values = []
    for i in range(len(feature_arr)):
        if is_signal_arr[i]:
            if len(signal_values) > 0:
                window_data = signal_values[-window:]
                result[i] = np.percentile(window_data, quantile_pct * 100)
            # Only append non-NaN values (matches SQL quantileExactExclusive NULL handling)
            if not np.isnan(feature_arr[i]):
                signal_values.append(feature_arr[i])
    return result

BP-04: Data Range Mismatch (MODERATE)

Symptom: Different signal counts between SQL and Python for assets with early data (BNB, XRP).

Root Cause: load_range_bars() defaults to start='2020-01-01' but SQL has no lower bound.

Fix: Always pass start='2017-01-01' to cover all available data.

BP-05: Margin Exhaustion with Overlapping Positions (MODERATE)

Symptom: Orders canceled with insufficient margin. Fewer trades than expected.

Root Cause: With hedging=True and default full-equity sizing, overlapping positions exhaust available margin.

Fix: Use fixed fractional sizing:

self.buy(size=0.01)  # 1% equity per trade

BP-06: Signal Timestamp vs Entry Timestamp (LOW)

Symptom: Gate 2 (timestamp match) fails because SQL uses signal bar timestamps while Python uses entry bar timestamps.

Root Cause: SQL outputs the signal detection bar's timestamp_ms. Python's EntryTime is the fill bar (next bar after signal). These differ by 1 bar.

Fix: Record signal bar timestamps in the strategy's next() method:

# Before calling self.buy()
self._signal_timestamps.append(int(self.data.index[-1].timestamp() * 1000))

5-Gate Oracle Validation Framework

| Gate | Metric | Threshold | What it catches | | ---- | --------------- | --------- | ------------------------------------ | | 1 | Signal Count | <5% diff | Missing signals, filter misalignment | | 2 | Timestamp Match | >95% | Timing offset, warmup differences | | 3 | Entry Price | >95% | Price source mismatch, sort ordering | | 4 | Exit Type | >90% | Barrier logic differences | | 5 | Kelly Fraction | <0.02 | Aggregate outcome alignment |

Expected residual: 1-2 exit type mismatches per asset at TIME barrier boundary (bar 50). SQL uses fwd_closes[max_bars], backtesting.py closes at current bar price. Impact on Kelly < 0.006.


Strategy Architecture: Single vs Multi-Position

| Mode | Constructor | Use Case | Position Sizing | | --------------- | -------------------------------------- | --------------------- | ------------------------------ | | Single-position | hedging=False (default) | Champion 1-bar hold | Full equity | | Multi-position | hedging=True, exclusive_orders=False | SQL oracle validation | Fixed fractional (size=0.01) |

Multi-Position Strategy Template

class Gen600Strategy(Strategy):
    def next(self):
        current_bar = len(self.data) - 1

        # 1. Register newly filled trades and set barriers
        for trade in self.trades:
            tid = id(trade)
            if tid not in self._known_trades:
                self._known_trades.add(tid)
                self._trade_entry_bar[tid] = current_bar
                actual_entry = trade.entry_price
                if self.tp_mult > 0:
                    trade.tp = actual_entry * (1.0 + self.tp_mult * self.threshold_pct)
                if self.sl_mult > 0:
                    trade.sl = actual_entry * (1.0 - self.sl_mult * self.threshold_pct)

        # 2. Check time barrier for each open trade
        for trade in list(self.trades):
            tid = id(trade)
            entry_bar = self._trade_entry_bar.get(tid, current_bar)
            if self.max_bars > 0 and (current_bar - entry_bar) >= self.max_bars:
                trade.close()
                self._trade_entry_bar.pop(tid, None)

        # 3. Check for new signal (no position guard — overlapping allowed)
        if self._is_signal[current_bar]:
            self.buy(size=0.01)

Data Loading

from data_loader import load_range_bars

df = load_range_bars(
    symbol="SOLUSDT",
    threshold=1000,
    start="2017-01-01",      # Cover all available data
    end="2025-02-05",        # Match SQL cutoff
    extra_columns=["volume_per_trade", "lookback_price_range"],  # Gen600 features
)

HTML Equity Plot: Y-Axis Auto-Fit on Zoom (BP-07 to BP-10)

backtesting.py generates Bokeh HTML plots via bt.plot(). By default, the y-axis is fixed — zooming on the x-axis does NOT rescale the y-axis to fit visible data. This makes it impossible to inspect zoomed-in regions of equity curves that span 4+ orders of magnitude.

Reference implementation: scripts/gen800/plotting.py in opendeviationbar-patterns.

BP-07: NEVER Use LogScale for Equity (CRITICAL)

Symptom: Equity panel y-axis doesn't auto-fit on zoom. JS callbacks fail silently.

Root Cause: Bokeh's LogScale breaks CustomJS y-range callbacks in multi-panel linked x_range layouts. The js_on_change callback fires but y_range.start/y_range.end assignments are ignored by the LogScale renderer.

Proven via POC: LogScale + JS callback works in single-panel mode but fails when 5 panels share a linked x_range.

Fix: Transform equity data to log10() in Python, display on linear scale with a CustomJSTickFormatter that shows 10^tick as readable values:

import numpy as np
from bokeh.models import CustomJSTickFormatter, Range1d

# Transform data
raw = np.asarray(src.data["equity"], dtype=float)
raw = np.where(raw > 0, raw, 1e-10)
src.data["equity"] = np.log10(raw).tolist()

# Custom tick formatter (shows 1%, 100%, 10K%, 1M%)
child.yaxis[0].formatter = CustomJSTickFormatter(code="""
    const v = Math.pow(10, tick);
    if (v >= 1e6) return (v/1e6).toFixed(1) + 'M%';
    if (v >= 1e3) return (v/1e3).toFixed(0) + 'K%';
    if (v >= 1) return v.toFixed(0) + '%';
    return v.toFixed(2) + '%';
""")

# MUST set initial y_range explicitly (backtesting.py leaves it as NaN)
valid = eq[np.isfinite(eq)]
pad = (valid.max() - valid.min()) * 0.05
child.y_range = Range1d(start=valid.min() - pad, end=valid.max() + pad)

BP-08: Panel-Aware Column Matching (CRITICAL)

Symptom: Drawdown/P&L panels go blank when zooming. Shows millions of percent.

Root Cause: Multiple panels share the same ColumnDataSource (backtesting.py optimization). The Equity panel's Line renderer and the Drawdown panel's Line renderer both reference a source containing equity, drawdown, High, Low, etc. A naive "find first y column" approach picks equity for the Drawdown panel.

Fix: Match by panel_label FIRST, then by column name:

panel_label = child.yaxis[0].axis_label or ""

# Label-first matching (panels share data sources!)
if "Drawdown" in panel_label and "drawdown" in d:
    hi_col = lo_col = "drawdown"
elif "Equity" in panel_label and "equity" in d:
    hi_col = lo_col = "equity"
elif "High" in d and "Low" in d:  # OHLC
    hi_col, lo_col = "High", "Low"

BP-09: JS Callback Pattern for Y-Axis Auto-Fit (PROVEN)

Attach a CustomJS callback to x_range.js_on_change for each panel. The callback scans visible data and sets y_range directly:

from bokeh.models import CustomJS

cb = CustomJS(
    args={"source": src, "yr": child.y_range,
          "x_col": "index", "y_col": "equity"},
    code="""
    const xs = source.data[x_col];
    const ys = source.data[y_col];
    const x0 = cb_obj.start, x1 = cb_obj.end;
    let lo = Infinity, hi = -Infinity;
    for (let i = 0; i < xs.length; i++) {
        if (xs[i] >= x0 && xs[i] <= x1 && isFinite(ys[i])) {
            if (ys[i] < lo) lo = ys[i];
            if (ys[i] > hi) hi = ys[i];
        }
    }
    if (isFinite(lo) && isFinite(hi) && lo < hi) {
        const pad = (hi - lo) * 0.05 || 0.001;
        yr.start = lo - pad;
        yr.end = hi + pad;
    }
    """,
)
child.x_range.js_on_change("start", cb)
child.x_range.js_on_change("end", cb)

BP-10: DataRange1d(only_visible=True) Is Unreliable

Symptom: DataRange1d(only_visible=True) works for simple cases but fails with:

  • LogScale panels (y-range computed in wrong space)
  • VBar renderers (candlesticks — bounds not tracked correctly)
  • Shared data sources across panels

Fix: Always use the JS callback pattern (BP-09) instead of DataRange1d(only_visible=True).

Panel Data Source Reference (backtesting.py internals)

| Panel | Label | Renderer | Y Column | Source | | -------- | ----------------- | ------------------------------------ | ---------------- | --------------------- | | Equity | "Equity" | Patch (equity_dd), Line (equity) | equity | Shared OHLC source | | Drawdown | "Drawdown" | Line (drawdown), Scatter (peak) | drawdown | Shared OHLC source | | P/L | "Profit / Loss" | Scatter (returns), MultiLine | y or returns | Separate trade source | | OHLC | "" (no label) | Segment, VBar | High/Low | Shared OHLC source | | Volume | "Volume" | VBar | Volume (top) | Shared OHLC source |


Project Artifacts (rangebar-patterns repo)

| Artifact | Path | | --------------------------- | ------------------------------------------------- | | Oracle comparison script | scripts/gen600_oracle_compare.py | | Gen600 strategy (reference) | backtest/backtesting_py/gen600_strategy.py | | SQL oracle query template | sql/gen600_oracle_trades.sql | | Oracle validation findings | findings/2026-02-12-gen600-oracle-validation.md | | Backtest CLAUDE.md | backtest/CLAUDE.md | | ClickHouse AP-16 | .claude/skills/clickhouse-antipatterns/SKILL.md | | Fork source | ~/fork-tools/backtesting.py/ |