The FWL Theorem: Making Multivariate Regressions Intuitive

Partialling-out a confounder to recover a known +0.2 causal effect

−0.106 naive slope · wrong sign
+0.267 after partialling-out income
+0.200 true causal effect (ATE)

Carlos Mendez

Nagoya University (GSID)

June 11, 2026

The Tension

Act I

The same coupon data says both “coupons hurt sales” and “coupons help sales”

A retail chain hands out discount coupons across 50 stores and asks one question: do the coupons increase sales?

Regress sales on coupons and the slope is negative. Add one control and it flips positive. Which answer is real?

Naive regression: coupons look like they reduce sales

Daily sales vs. coupon usage across 50 stores. The orange fit slopes down — coupons appear to hurt sales.

Where we’re going

  • The retail setup: a known +0.2 effect, confounded by income
  • What “controlling for income” actually does, algebraically
  • The FWL recipe: residualize, residualize, regress
  • See the hidden conditional relationship as a scatter plot
  • The bridge to Double Machine Learning

The Investigation

Act II

Income is a confounder that opens a backdoor from coupons to sales

  • Income → fewer coupons (stores in wealthier areas see less coupon use)
  • Income → more sales (customers in wealthier areas spend more)
  • Coupons → sales: true effect +0.2

The path coupons ← income → sales is non-causal. Leave it open and the negative income→coupon link leaks into the coupon slope.

A simulated lab with a known answer: the true effect is exactly +0.2

import numpy as np
import pandas as pd

def simulate_store_data(n=50, seed=42):
    rng = np.random.default_rng(seed)
    income  = rng.normal(50, 10, n)                     # the confounder
    coupons = 60 - 0.5 * income + rng.normal(0, 5, n)   # income → fewer coupons
    # note: dayofweek (the day-of-week control) is not defined in this excerpt
    sales   = (10 + 0.2 * coupons + 0.3 * income        # true effect = +0.2
               + 0.5 * dayofweek + rng.normal(0, 3, n))
    return pd.DataFrame(...)                            # columns elided on the slide

We plant the answer (+0.2) in the data, then check whether each estimator finds it.
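One way to see the planted answer clearly is to rerun the same data-generating process with a very large sample, where sampling noise mostly disappears. The sketch below is an illustration under stated assumptions, not the code behind the slide's numbers: `dayofweek` is assumed to be an integer from 0 to 6 (the slide does not show how it is drawn), and plain NumPy least squares stands in for statsmodels. In a sample this large the controlled slope should settle near +0.2, while the naive slope should stay negative.

```python
import numpy as np

def simulate_store_data(n, seed=42):
    """Simulate the coupon data-generating process shown on the slide.

    ASSUMPTION: day-of-week is an integer 0..6; the slide does not
    show how `dayofweek` is generated.
    """
    rng = np.random.default_rng(seed)
    income = rng.normal(50, 10, n)                       # the confounder
    dayofweek = rng.integers(0, 7, n)                    # assumed, see docstring
    coupons = 60 - 0.5 * income + rng.normal(0, 5, n)    # income -> fewer coupons
    sales = (10 + 0.2 * coupons + 0.3 * income           # true effect = +0.2
             + 0.5 * dayofweek + rng.normal(0, 3, n))
    return income, coupons, sales

def ols_coefs(y, *regressors):
    """Least-squares coefficients of y on an intercept plus the regressors."""
    X = np.column_stack([np.ones_like(y), *regressors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

income, coupons, sales = simulate_store_data(n=200_000)
naive_slope = ols_coefs(sales, coupons)[1]               # sales ~ coupons
controlled_slope = ols_coefs(sales, coupons, income)[1]  # sales ~ coupons + income
print(f"naive: {naive_slope:+.3f}   controlled: {controlled_slope:+.3f}")
```

With only 50 stores, as in the deck, both estimates carry visible sampling noise, which is why the controlled slope lands at +0.267 rather than exactly +0.2.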

The naive slope is −0.106 — and points the wrong way (p = 0.365)

| Model | Coupons coef. | SE | p |
|---|---|---|---|
| Naive OLS (no controls) | −0.1059 | 0.116 | 0.365 |

Not just imprecise — the sign is backwards from the true +0.2. The confounder is pulling it down.
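How far down the confounder pulls the slope can be worked out from the simulation's own parameters with the standard omitted-variable-bias formula. Here \(x_1\) is coupons, \(x_2\) is income, \(\beta_1 = 0.2\) and \(\beta_2 = 0.3\) are their coefficients in the sales equation, and the day-of-week term is assumed independent of coupons, so it drops out:

\[\mathrm{plim}\ \hat\beta_1^{\text{naive}} = \beta_1 + \beta_2\,\frac{\mathrm{Cov}(x_1, x_2)}{\mathrm{Var}(x_1)} = 0.2 + 0.3\times\frac{-0.5\times 10^2}{0.5^2\times 10^2 + 5^2} = 0.2 + 0.3\times\frac{-50}{50} = -0.1\]

The naive regression is therefore aiming at roughly −0.1, not +0.2, and the sample estimate of −0.106 sits close to that biased target.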

Add income as a control and the slope flips to +0.267 (p = 0.031)

| Model | Coupons coef. | Income coef. | p |
|---|---|---|---|
| Naive OLS | −0.1059 | (not included) | 0.365 |
| Full OLS (+ income) | +0.2673 | +0.3836 | 0.031 |

Conditioning on income blocks the backdoor: the estimate jumps to +0.267, close to the true +0.2.

FWL: any multivariate coefficient is a univariate slope on residuals

\[\hat\beta_1^{FWL}=\frac{\mathrm{Cov}(\tilde y,\ \tilde x_1)}{\mathrm{Var}(\tilde x_1)}\]

where \(\tilde x_1\) is the residual of \(x_1\) (coupons) regressed on \(x_2\) (income), and \(\tilde y\) is the residual of \(y\) (sales) regressed on \(x_2\).

Remove income from coupons, remove income from sales, then regress the leftovers. Same \(\hat\beta_1\).

Three lines of statsmodels reproduce the multivariate coefficient

# residualize each variable with respect to income
df["coupons_tilde"] = smf.ols("coupons ~ income", df).fit().resid
df["sales_tilde"]   = smf.ols("sales ~ income",   df).fit().resid
# regress residual sales on residual coupons (no intercept)
fwl = smf.ols("sales_tilde ~ coupons_tilde - 1", df).fit()

Residuals are mean-zero, so we drop the intercept. The coefficient on coupons_tilde is the controlled effect.
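The identity can also be checked without any regression library. The following is a minimal sketch, assuming a fresh draw from the same data-generating process (arbitrary seed, day-of-week term omitted for brevity), so it will not reproduce the deck's +0.2673. The helper `residualize` is a name introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, not the one behind the slide's numbers
n = 50
income = rng.normal(50, 10, n)
coupons = 60 - 0.5 * income + rng.normal(0, 5, n)
sales = 10 + 0.2 * coupons + 0.3 * income + rng.normal(0, 3, n)  # day-of-week omitted

def residualize(v, control):
    """Residual of v after regressing it on an intercept and one control."""
    Z = np.column_stack([np.ones_like(control), control])
    gamma, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return v - Z @ gamma

# Full regression: sales ~ 1 + coupons + income
X = np.column_stack([np.ones(n), coupons, income])
beta_full, *_ = np.linalg.lstsq(X, sales, rcond=None)

# FWL: residualize both sides on income, then take the no-intercept slope
coupons_tilde = residualize(coupons, income)
sales_tilde = residualize(sales, income)
beta_fwl = (coupons_tilde @ sales_tilde) / (coupons_tilde @ coupons_tilde)

print(beta_full[1], beta_fwl)    # the two numbers agree up to floating-point error
```

Because the first-stage regressions include an intercept, both residual series have mean zero, which is what makes the no-intercept slope in the last step legitimate.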

Residualize-both reproduces +0.2673 exactly — and recovers the SE

| FWL step | Coupons coef. | SE | p |
|---|---|---|---|
| Step 1: residualize \(x_1\) only | +0.2673 | 1.271 | 0.834 |
| Step 2: residualize both | +0.2673 | 0.118 | 0.028 |

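Same coefficient to four decimals. Residualizing the outcome too brings the SE down to 0.118, close to full OLS (0.120); the small remaining gap is consistent with the residual regression not counting the degrees of freedom already used to partial out income.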
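The FWL standard error (0.118) and the full-OLS one (0.120) differ slightly, and the gap is consistent with a degrees-of-freedom effect: the residual regression divides the residual sum of squares by n − 1, while full OLS divides by n − 3, and 0.118 × √(49/47) ≈ 0.120. The sketch below checks that relationship on a fresh simulated sample (arbitrary seed, day-of-week term omitted), using classical homoskedastic standard errors computed by hand; `fit` is a helper introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed; will not reproduce the slide's numbers
n = 50
income = rng.normal(50, 10, n)
coupons = 60 - 0.5 * income + rng.normal(0, 5, n)
sales = 10 + 0.2 * coupons + 0.3 * income + rng.normal(0, 3, n)  # day-of-week omitted

def fit(y, X, dof):
    """OLS coefficients, classical SEs (with `dof` residual d.o.f.), and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = (resid @ resid) / dof
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, se, resid

# Residualize coupons and sales on an intercept and income
Z = np.column_stack([np.ones(n), income])
_, _, coupons_tilde = fit(coupons, Z, n - 2)
_, _, sales_tilde = fit(sales, Z, n - 2)

# Full OLS: three estimated parameters, so n - 3 degrees of freedom
X_full = np.column_stack([np.ones(n), coupons, income])
beta_full, se_full, _ = fit(sales, X_full, n - 3)

# Residual regression as software sees it: one parameter, n - 1 degrees of freedom
beta_fwl, se_fwl, _ = fit(sales_tilde, coupons_tilde[:, None], n - 1)

# Rescaling by the degrees-of-freedom ratio recovers the full-OLS SE
se_fwl_corrected = se_fwl[0] * np.sqrt((n - 1) / (n - 3))
print(se_full[1], se_fwl[0], se_fwl_corrected)
```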

Partialling-out, drawn: the residuals are coupon variation income can’t explain

Coupon usage vs. income; the orange line is the income→coupons fit, dashed lines are the residuals each store keeps.

The hidden positive relationship the table couldn’t show you

Residualized sales vs. residualized coupons. With income removed from both, the slope is the +0.267 conditional effect.

Adding the means back keeps the slope but restores readable units

Same residual scatter shifted by the sample means — axes now read ~34% coupons and ~$33.6K sales, slope still +0.267.
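Shifting both residual series by a constant moves the point cloud without rotating it, so the fitted slope cannot change; only the intercept does. A minimal sketch on a fresh simulated sample (arbitrary seed, day-of-week term omitted, so the printed numbers differ from the deck's):

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, not the slide's data
n = 50
income = rng.normal(50, 10, n)
coupons = 60 - 0.5 * income + rng.normal(0, 5, n)
sales = 10 + 0.2 * coupons + 0.3 * income + rng.normal(0, 3, n)  # day-of-week omitted

def residualize(v, control):
    """Residual of v after regressing it on an intercept and one control."""
    Z = np.column_stack([np.ones_like(control), control])
    gamma, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return v - Z @ gamma

coupons_tilde = residualize(coupons, income)
sales_tilde = residualize(sales, income)
slope_resid = (coupons_tilde @ sales_tilde) / (coupons_tilde @ coupons_tilde)

# Shift each residual by the sample mean of its original variable
coupons_shifted = coupons_tilde + coupons.mean()
sales_shifted = sales_tilde + sales.mean()

# The shifted scatter needs an intercept; np.polyfit returns [slope, intercept]
slope_shifted, intercept_shifted = np.polyfit(coupons_shifted, sales_shifted, deg=1)
print(slope_resid, slope_shifted)
```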

FWL scales: two controls, same identity, +0.2706

| Model | Coupons coef. | SE | p |
|---|---|---|---|
| Full OLS (+ income + day) | +0.2706 | 0.119 | 0.028 |
| FWL (+ income + day) | +0.2706 | 0.116 | 0.023 |

Partial out income and day-of-week from both sides — identical coefficient. The theorem holds for any number of controls.
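With several controls the recipe is unchanged: stack the controls into one matrix and residualize against all of them at once. The sketch below assumes day-of-week is an integer from 0 to 6 entered as a single numeric control (the deck does not say how it is coded), uses an arbitrary seed, and introduces `partial_out` as an illustrative helper name.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, not the slide's data
n = 50
income = rng.normal(50, 10, n)
dayofweek = rng.integers(0, 7, n).astype(float)   # ASSUMPTION: numeric 0..6
coupons = 60 - 0.5 * income + rng.normal(0, 5, n)
sales = (10 + 0.2 * coupons + 0.3 * income
         + 0.5 * dayofweek + rng.normal(0, 3, n))

def partial_out(v, controls):
    """Residual of v after regressing it on an intercept and all control columns."""
    Z = np.column_stack([np.ones(len(v)), controls])
    gamma, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return v - Z @ gamma

controls = np.column_stack([income, dayofweek])

# Full regression: sales ~ 1 + coupons + income + dayofweek
X = np.column_stack([np.ones(n), coupons, controls])
beta_full, *_ = np.linalg.lstsq(X, sales, rcond=None)

# FWL with two controls: partial both out of coupons and sales, then regress
coupons_tilde = partial_out(coupons, controls)
sales_tilde = partial_out(sales, controls)
beta_fwl = (coupons_tilde @ sales_tilde) / (coupons_tilde @ coupons_tilde)
print(beta_full[1], beta_fwl)
```

Nothing in `partial_out` depends on the number of columns in `controls`, which is the practical content of "any number of controls".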

The Resolution

Act III

After partialling-out income, coupons raise sales by +0.267

+0.267

\(\hat\beta_1\) on coupons, identical under full OLS (SE 0.120) and FWL (SE 0.118) · consistent with the true +0.200 given finite-sample noise

Simpson’s paradox, resolved: the slope flips from −0.106 to +0.267

Left: naive negative slope. Right: positive slope after partialling-out income. Same 50 stores.

Six estimators: every FWL coefficient equals its full-OLS counterpart, so FWL is an identity, not an approximation

| Method | Coupons coef. | SE | p |
|---|---|---|---|
| Naive OLS (no controls) | −0.1059 | 0.116 | 0.365 |
| Full OLS (+ income) | +0.2673 | 0.120 | 0.031 |
| FWL residualize \(x\) only | +0.2673 | 1.271 | 0.834 |
| FWL residualize both | +0.2673 | 0.118 | 0.028 |
| Full OLS (+ income + day) | +0.2706 | 0.119 | 0.028 |
| FWL (+ income + day) | +0.2706 | 0.116 | 0.023 |

Does FWL make this causal? No — it visualizes, it does not identify

Objection. Residualizing on income looks like a trick that manufactures a causal effect.

Response. FWL is pure algebra: it only reproduces what OLS already computes. The causal reading rests on an assumption the data cannot confirm: income is the only confounder, and it enters linearly. FWL pictures that adjustment; it cannot certify it.
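To see why the algebra cannot certify the causal reading, consider a hypothetical variant of the simulation with a second confounder that the analyst never observes. Everything about this variant is an assumption made for illustration: the variable `hidden`, its coefficients, and the sample size are not part of the deck. FWL still matches full OLS exactly, yet both miss the true +0.2.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000                                   # large sample, so noise is not the issue
income = rng.normal(50, 10, n)
hidden = rng.normal(0, 1, n)                  # HYPOTHETICAL unobserved confounder
coupons = 60 - 0.5 * income + 2.0 * hidden + rng.normal(0, 5, n)
sales = (10 + 0.2 * coupons + 0.3 * income    # true effect is still +0.2
         + 3.0 * hidden + rng.normal(0, 3, n))

def residualize(v, control):
    """Residual of v after regressing it on an intercept and one control."""
    Z = np.column_stack([np.ones_like(control), control])
    gamma, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return v - Z @ gamma

# Full OLS controlling for income only (hidden is not available to the analyst)
X = np.column_stack([np.ones(n), coupons, income])
beta_full, *_ = np.linalg.lstsq(X, sales, rcond=None)

# FWL with the same single control
coupons_tilde = residualize(coupons, income)
sales_tilde = residualize(sales, income)
beta_fwl = (coupons_tilde @ sales_tilde) / (coupons_tilde @ coupons_tilde)

print(f"full OLS: {beta_full[1]:+.3f}   FWL: {beta_fwl:+.3f}   truth: +0.200")
```

The identity holds whether or not the adjustment set is right, so agreement between the two estimators says nothing about whether either one is causal.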

FWL is Double Machine Learning with a linear mop

FWL (here)

  • residualize \(y\), \(d\) with OLS
  • regress residual \(y\) on residual \(d\)
  • exact for linear confounding

Double ML

  • residualize \(y\), \(d\) with ML (forest, lasso)
  • regress residual \(y\) on residual \(d\)
  • handles non-linear, high-dim controls

Same residualize-then-regress logic; swap OLS for a flexible learner, fit it with cross-fitting, and you get a debiased estimate that is causal under the same no-unobserved-confounding assumption.
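The comparison can be sketched in a few lines. This is a simplified illustration under stated assumptions, not a full Double ML implementation: the confounding is made quadratic on purpose, a cubic polynomial fitted with `np.polyfit` stands in for the forest or lasso, the cross-fitting uses two folds, and no standard errors are computed. The data-generating process here is hypothetical and differs from the retail simulation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4000
income = rng.normal(0, 1, n)                          # standardized confounder (assumed)
d = income**2 + rng.normal(0, 1, n)                   # treatment depends on income non-linearly
y = 0.2 * d + 2.0 * income**2 + rng.normal(0, 1, n)   # true effect = +0.2

def residual_slope(y_res, d_res):
    """No-intercept slope of residual y on residual d."""
    return (d_res @ y_res) / (d_res @ d_res)

def linear_residual(v, x):
    """Plain FWL step: residual of v after a linear fit on x."""
    Z = np.column_stack([np.ones_like(x), x])
    g, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return v - Z @ g

def crossfit_residual(v, x, folds, degree=3):
    """Residual of v using a polynomial fitted on the other fold(s) only."""
    res = np.empty_like(v)
    for k in np.unique(folds):
        held_out = folds == k
        coefs = np.polyfit(x[~held_out], v[~held_out], deg=degree)  # fit on the rest
        res[held_out] = v[held_out] - np.polyval(coefs, x[held_out])  # predict held-out
    return res

# Linear partialling-out misses the income**2 confounding
theta_linear = residual_slope(linear_residual(y, income), linear_residual(d, income))

# Cross-fitted partialling-out with the more flexible learner
folds = rng.permutation(n) % 2                        # two random folds of equal size
theta_dml = residual_slope(crossfit_residual(y, income, folds),
                           crossfit_residual(d, income, folds))

print(f"linear FWL: {theta_linear:+.3f}   cross-fitted: {theta_dml:+.3f}   truth: +0.200")
```

The final regression step is identical in both cases; only the tool used to remove income from each side changes.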

Don’t read the coefficient — read the partialled-out scatter.