---
title: Results May Not Generalize
impact: MEDIUM
tags: external-validity, generalization, context, replication
---

## Results May Not Generalize

An experiment that works in one context may fail in another. Time of year, user population, product maturity, and external events all affect whether results replicate.

**Incorrect (assuming universal applicability):**

> **Experiment during COVID lockdowns:**
> - "Work from home" messaging increased conversions 25%
>
> PM: "This is our new standard messaging! Roll it out permanently."
>
> **Six months later (post-COVID):**
>
> The messaging no longer resonates. Conversion rates dropped below the original baseline. The "win" was context-dependent.

**Another example:**

> **Experiment at Bing in the UK:**
> - Opening Hotmail links in new tab: +15% engagement
>
> Team: "Let's roll this out globally immediately."
>
> **Better approach:**
>
> Team: "Let's replicate in US and other markets before full rollout."
>
> US replication: +12% engagement
> Germany replication: +8% engagement
>
> Now confident it generalizes.

**Correct (testing for generalizability):**

> **Before assuming results generalize:**
>
> 1. **Consider context dependencies:**
>    - Was this run during unusual circumstances (holidays, crises, events)?
>    - Was user behavior different than normal?
>    - Will the effect persist after the context changes?
>
> 2. **Test across segments:**
>    - Does it work for new vs. returning users?
>    - Does it work across geographies?
>    - Does it work on all platforms (web, mobile, app)?
>
> 3. **Replicate after time passes:**
>    - If results were during an unusual period, re-test in normal conditions
>    - For big wins, replicate to confirm stability
>
> PM: "The experiment worked during lockdown. Let's run a holdout for 3 months to see if it still works as behavior normalizes."

**Why it matters:**

External validity asks: "Will these results hold in other contexts?" The answer is often "partially" or "no." Consider what made your experiment context unique, and test generalizability before assuming permanent wins.
