Double LASSO Interactive Lab

Why LASSO? Why Double LASSO?

Suppose you want to estimate a causal effect (say, whether the abortion rate affects the crime rate) and you have 284 candidate control variables to choose from. You cannot use all of them (the estimate explodes), you cannot drop them all (you risk omitted-variable bias), and picking by hand feels arbitrary. Double LASSO automates the choice in a way that is designed for causal inference, not just for prediction.

This app lets you turn the dials yourself. In the three tabs that follow you will sweep the LASSO penalty λ and watch coefficients snap to zero in real time; reproduce the post's headline sign flip when cross-validation replaces the theory-driven penalty; and explore the actual forest plot from Fitzgerald et al. (2026).

L1 (LASSO) vs. L2 (Ridge) — why LASSO selects, Ridge does not

Both methods shrink coefficients, but only LASSO drives them exactly to zero. The animation below shows the same coefficient under the two penalties as λ grows: the orange L1 estimate hits zero abruptly, while the steel-blue L2 estimate approaches zero asymptotically but never reaches it. This is why LASSO doubles as a variable-selection device.
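A minimal sketch of the two shrinkage rules for a single coefficient, assuming an orthonormal design so that each penalised problem reduces to shrinking one unpenalised estimate z. The function names are illustrative, and the app's animation may parameterise λ differently.

```python
def lasso_shrink(z: float, lam: float) -> float:
    """L1 solution of min_b 0.5*(b - z)**2 + lam*|b|: soft-thresholding."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # |z| <= lam: the kink of |b| at zero captures the estimate exactly


def ridge_shrink(z: float, lam: float) -> float:
    """L2 solution of min_b 0.5*(b - z)**2 + 0.5*lam*b**2: proportional shrinkage."""
    return z / (1.0 + lam)  # never exactly zero for z != 0


z = 1.0  # unpenalised (OLS) estimate of one coefficient
for lam in [0.0, 0.5, 1.0, 2.0, 4.0]:
    print(f"lambda={lam:3.1f}  lasso={lasso_shrink(z, lam):.3f}  ridge={ridge_shrink(z, lam):.3f}")
```

At λ = 1.0 the L1 rule already returns exactly 0 while the L2 rule still returns 0.5; the corner of the absolute-value penalty at zero is what makes selection possible.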

Tab 2

LASSO Lab

Slide λ and watch which controls survive. Compare the raw LASSO estimate to the post-LASSO OLS refit.

Tab 3

Penalty Showdown

Same data, two penalties. Reproduce the sign flip that Fitzgerald et al. flag in §10, then run 100 simulations to see the bias-variance picture.

Tab 4

Forest Plot

The post's headline figure, interactively. Toggle outcomes and methods. Hover for SEs, CIs, and the number of controls each estimator used.

Glossary (open a card if a term is unfamiliar)

LASSO

L1-penalised least squares. The absolute-value penalty produces exactly-zero coefficients — variable selection comes for free.

Penalty λ

The knob controlling shrinkage. Larger λ pins more coefficients to zero. This is the main slider in Tab 2.

Selection set I_y, I_d

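The indices of controls each LASSO step keeps: I_y from the LASSO of y on the controls, I_d from the LASSO of d on the controls. Their union, together with the treatment d, forms the regressor set of the post-OLS regression.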

Double LASSO (DL)

Two LASSOs, one of y on the controls and one of d on the controls, then OLS of y on d and the union of the selected controls. The causal-inference-safe variant.

Rigorous penalty

A theory-driven choice of λ from Belloni et al. (2012). Tuned to keep selection error small relative to estimation noise, not to minimise prediction MSE.

CV penalty (λ.min)

A data-driven choice of λ that minimises out-of-fold prediction MSE. Because it optimises a different objective, it can flip the sign of α̂.

Post-OLS step

After LASSO selects a support, refit by plain OLS. LASSO is used only for selection; the final α̂ is unshrunk.

p / n ratio

Number of candidate controls divided by sample size. In the post, 284 / 576 ≈ 0.5 — the regime where DL was designed to help.
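The DL recipe in the glossary can be sketched in NumPy as below. This is an illustration under stated assumptions, not the app's or the post's code: the coordinate-descent solver and the names `lasso_cd` and `double_lasso` are hypothetical, the penalties λ_y and λ_d are supplied by hand rather than by the rigorous or CV rule, and the LASSO objective uses the (1/(2n)) scaling of glmnet and scikit-learn's `Lasso`.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=500, tol=1e-8):
    """LASSO via cyclic coordinate descent on (1/(2n))*||y - X b||^2 + lam*||b||_1.

    Assumes centred y and centred, standardised columns of X (no intercept).
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n            # x_j'x_j / n (1.0 for standardised columns)
    r = np.array(y, dtype=float)                 # running residual y - X b, with b = 0 at start
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            old = b[j]
            rho = X[:, j] @ r / n + col_sq[j] * old                  # fit of x_j to partial residual
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-threshold
            if new != old:
                r -= X[:, j] * (new - old)                             # keep the residual in sync
                max_step = max(max_step, abs(new - old))
                b[j] = new
        if max_step < tol:
            break
    return b


def double_lasso(y, d, X, lam_y, lam_d):
    """Double LASSO sketch: LASSO of y on X, LASSO of d on X, then OLS of y on d + union."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)                   # one lambda fits all columns
    I_y = np.flatnonzero(lasso_cd(Xs, y - y.mean(), lam_y))    # controls that predict y
    I_d = np.flatnonzero(lasso_cd(Xs, d - d.mean(), lam_d))    # controls that predict d
    union = np.union1d(I_y, I_d)
    # Post-OLS step: unpenalised regression of y on [1, d, X_union]
    Z = np.column_stack([np.ones_like(d), d, X[:, union]])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef[1], I_y, I_d, union


# Hypothetical simulated data: 6 of 40 controls confound d and y; true alpha = 0.5
rng = np.random.default_rng(0)
n, p, alpha = 200, 40, 0.5
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:6] = 0.6
d = X @ beta + rng.standard_normal(n)
y = alpha * d + X @ beta + rng.standard_normal(n)

alpha_hat, I_y, I_d, union = double_lasso(y, d, X, lam_y=0.1, lam_d=0.1)
print(f"alpha_hat = {alpha_hat:.3f}, |I_y| = {len(I_y)}, |I_d| = {len(I_d)}, |union| = {len(union)}")
```

Taking the union is the design choice that matters: a confounder only weakly related to y but strongly related to d can be missed by the y-LASSO yet still be caught by the d-LASSO.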

LASSO Lab — turn the penalty knob yourself

The simulated data has one treatment variable and many candidate controls. The true treatment coefficient is α = 0.5 (orange curve below). The LASSO chooses how many controls to keep based on a single penalty parameter λ. Drag the λ slider and watch the coefficients shrink to exactly zero, one at a time.

Sliders

Sample size n = 200: more data ⇒ each control's coefficient is estimated more precisely.
Number of controls p = 40: about 15% of these have a true nonzero effect; the rest are noise.
Signal strength = 0.60: magnitude of the truly relevant coefficients relative to noise.
Penalty λ: slide left for less shrinkage (more controls survive), right for more.

Readouts

Controls kept (|I|), out of p candidates.
α̂ from raw LASSO: shrunk toward zero.
α̂ from post-OLS: refit on the selected support.
True α = 0.50: held fixed for comparison.

What to look for

Sparsity grows with λ. Slide right: more controls are pinned to zero. Slide left: more re-enter. At λ ≈ 0 you recover OLS (provided n > p, so that OLS is defined).
The post-OLS α̂ tracks the true α more closely than the raw LASSO α̂. This is the §10 message: LASSO shrinks every coefficient toward zero, including the treatment's, and refitting by OLS on the selected support undoes that shrinkage.
The orange curve (the treatment coefficient) is forced to stay in the model. Try a large p and a large λ: controls disappear, but the orange line keeps a meaningful value.
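The slider behaviour described above can be reproduced offline with a small coordinate-descent LASSO. This is a sketch under assumptions (standardised Gaussian controls, 6 relevant controls at 0.6, the glmnet-style (1/(2n)) objective), and the solver is a hypothetical stand-in for whatever the app uses. It prints how many controls survive at each λ, checks that λ = 0 reproduces OLS, and compares a raw LASSO coefficient with its post-OLS refit. For simplicity it shrinks ordinary controls rather than a forced-in treatment.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=5000, tol=1e-12):
    """LASSO via cyclic coordinate descent on (1/(2n))*||y - X b||^2 + lam*||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = np.array(y, dtype=float)                 # residual y - X b
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            old = b[j]
            rho = X[:, j] @ r / n + col_sq[j] * old
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]   # soft-threshold
            if new != old:
                r -= X[:, j] * (new - old)
                max_step = max(max_step, abs(new - old))
                b[j] = new
        if max_step < tol:
            break
    return b


# Hypothetical design mirroring the Lab's defaults: n = 200, p = 40, 6 relevant controls at 0.6
rng = np.random.default_rng(1)
n, p = 200, 40
X = rng.standard_normal((n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)
beta = np.zeros(p)
beta[:6] = 0.6
y = X @ beta + rng.standard_normal(n)
y = y - y.mean()

lams = [0.0, 0.05, 0.1, 0.2, 0.4, 0.8]
path = {lam: lasso_cd(X, y, lam) for lam in lams}
for lam in lams:
    print(f"lambda = {lam:4.2f}: {np.count_nonzero(path[lam]):2d} of {p} controls kept")

# lambda = 0 reproduces OLS (n > p here)
ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print("max |LASSO(0) - OLS| =", np.abs(path[0.0] - ols).max())

# Post-OLS: refit on the support selected at lambda = 0.2 to undo the shrinkage
S = np.flatnonzero(path[0.2])
post, *_ = np.linalg.lstsq(X[:, S], y, rcond=None)
print("raw LASSO b_0 =", round(path[0.2][0], 3), " post-OLS b_0 =", round(post[0], 3))
```

The gap between the raw and refitted coefficient is roughly λ in this nearly orthogonal design, which is the shrinkage the post-OLS step removes.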

Penalty Showdown — rigorous vs. cross-validated

Same simulated data. Same Double LASSO recipe. The only difference: how λ is chosen. Rigorous uses Belloni et al.'s theory-driven formula; CV uses 3-fold cross-validation to minimise prediction MSE. The two objectives are not the same, and the headline finding in §10 of the post is that they can move the coefficient in opposite directions.

Sliders

Sample size n = 200: capped at 300 so the "Run 100 sims" button finishes quickly.
Number of controls p = 40: capped at 50 for the 100-sim run.
Signal strength = 0.50: common scale for both π and θ, the controls' coefficients in the two equations.
Asymmetry = 0.80: 0 = controls predict y and d equally; 1 = controls predict d well and y barely. This asymmetry is the §9 fingerprint.
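To make the sliders concrete, here is a hypothetical data-generating process consistent with their descriptions: d = Xπ + v and y = αd + Xθ + ε, with about 15% of controls relevant, π scaled by the signal strength and θ by signal × (1 − asymmetry). Assigning π to the d equation and θ to the y equation, and this particular asymmetry mapping, are assumptions; the app's simulator may differ.

```python
import numpy as np


def simulate(n=200, p=40, alpha=0.5, signal=0.5, asymmetry=0.8, share=0.15, seed=None):
    """Hypothetical DGP: d = X @ pi + v,  y = alpha * d + X @ theta + eps."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    k = max(1, round(share * p))                    # number of truly relevant controls
    pi = np.zeros(p)
    pi[:k] = signal                                 # controls -> treatment d
    theta = np.zeros(p)
    theta[:k] = signal * (1.0 - asymmetry)          # controls -> outcome y, weaker as asymmetry -> 1
    v = rng.standard_normal(n)                      # treatment noise
    eps = rng.standard_normal(n)                    # outcome noise
    d = X @ pi + v
    y = alpha * d + X @ theta + eps
    return y, d, X, pi, theta


y, d, X, pi, theta = simulate(seed=0)
print(X.shape, np.count_nonzero(pi), theta[:6])
```

In this sketch, asymmetry = 1 removes the controls from the y equation entirely, whereas the app's description ("y barely") suggests a small but nonzero effect.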

Rigorous Double LASSO

λ from Belloni et al. (2012): 2 · 1.1 · σ̂ / √n · Φ⁻¹(1 − 0.05/(2p))

Reports α̂, SE(α̂), |I_y|, |I_d|, the union size |I_y ∪ I_d|, and the two penalties λ_y, λ_d.

CV Double LASSO

λ from 3-fold cross-validation (lambda.min)

Reports α̂, SE(α̂), |I_y|, |I_d|, the union size |I_y ∪ I_d|, and the two penalties λ_y, λ_d.
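The lambda.min rule can be sketched as a plain k-fold loop over a λ grid. Assumptions: the same hypothetical coordinate-descent solver as a stand-in for R's `cv.glmnet`, a 30-point geometric grid, random balanced folds, and controls standardised once on the full sample (glmnet standardises inside each fit). The rule keeps whichever λ predicts held-out y best; nothing in it refers to α.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=500, tol=1e-8):
    """LASSO via cyclic coordinate descent on (1/(2n))*||y - X b||^2 + lam*||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = np.array(y, dtype=float)
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            old = b[j]
            rho = X[:, j] @ r / n + col_sq[j] * old
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]   # soft-threshold
            if new != old:
                r -= X[:, j] * (new - old)
                max_step = max(max_step, abs(new - old))
                b[j] = new
        if max_step < tol:
            break
    return b


def cv_lambda_min(X, y, lams, k=3, seed=0):
    """Pick lambda.min: the grid value with the lowest k-fold out-of-fold prediction MSE."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % k             # balanced random fold labels 0..k-1
    cv_mse = np.zeros(len(lams))
    for f in range(k):
        train, test = folds != f, folds == f
        for i, lam in enumerate(lams):
            b = lasso_cd(X[train], y[train], lam)
            cv_mse[i] += np.mean((y[test] - X[test] @ b) ** 2) / k
    return lams[int(np.argmin(cv_mse))], cv_mse


# Hypothetical data: standardised controls, centred outcome, 6 relevant controls
rng = np.random.default_rng(2)
n, p = 200, 40
X = rng.standard_normal((n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)
beta = np.zeros(p)
beta[:6] = 0.5
y = X @ beta + rng.standard_normal(n)
y = y - y.mean()

lams = np.geomspace(1.0, 0.005, 30)                 # descending penalty grid
lam_min, cv_mse = cv_lambda_min(X, y, lams)
kept = np.count_nonzero(lasso_cd(X, y, lam_min))
print(f"lambda.min = {lam_min:.4f}, controls kept at lambda.min = {kept}")
```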

Why does this happen?

CV optimises prediction MSE — out-of-sample loss on y alone, or on d alone. That is not the same as estimating the causal α correctly.
CV's λ.min over-selects. Many marginally predictive controls survive. Each soaks up a little bit of the treatment variation, leaving less variation for the post-OLS to identify α.
The rigorous penalty is deliberately conservative. Its Φ⁻¹(1 − γ/(2p)) factor (γ = 0.05 in the formula above) is a Bonferroni-style correction across the p candidate variables, designed to keep selection error small relative to estimation noise.
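The penalty formula from the Rigorous Double LASSO card can be evaluated directly with Python's standard library (`statistics.NormalDist.inv_cdf` supplies Φ⁻¹). This sketch takes σ̂ as given; in practice σ̂ is itself estimated iteratively, and c = 1.1 and γ = 0.05 are simply the values shown in the card. The leading factor 2 corresponds to a (1/n)-scaled squared-error loss; under the (1/(2n)) scaling used by glmnet the same penalty is half as large.

```python
from math import sqrt
from statistics import NormalDist


def rigorous_lambda(n, p, sigma_hat, c=1.1, gamma=0.05):
    """Penalty as displayed in the card: 2 * c * sigma_hat / sqrt(n) * Phi^{-1}(1 - gamma / (2p))."""
    z = NormalDist().inv_cdf(1.0 - gamma / (2.0 * p))   # Bonferroni-style quantile over p tests
    return 2.0 * c * sigma_hat / sqrt(n) * z


# The quantile grows only like sqrt(2 log p), so lambda rises slowly with p
for p in [1, 40, 284, 10_000]:
    z = NormalDist().inv_cdf(1.0 - 0.05 / (2.0 * p))
    lam = rigorous_lambda(576, p, 1.0)
    print(f"p = {p:>6}: quantile = {z:.3f}, lambda (n = 576, sigma_hat = 1) = {lam:.4f}")
```

Unlike lambda.min, this λ does not look at prediction error at all: it depends only on n, p, the noise scale and the chosen error rate γ.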

Bias vs. variance over many simulations

Single runs are noisy. Run the whole pipeline 100 times with fresh draws (same parameters, different ε and v) to see whether the CV bias is systematic.
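"Run the whole pipeline 100 times" is a standard Monte Carlo loop: redraw the noise, re-estimate, then summarise mean bias and spread. The sketch below uses deliberately simple stand-in estimators (OLS of y on d alone, which omits a confounder, and OLS that adjusts for it) instead of the two DL variants, on a hypothetical one-confounder design; any estimator function can be dropped into `one_draw` in their place.

```python
import numpy as np


def one_draw(rng, n=200, alpha=0.5):
    """One fresh draw from a hypothetical design with a single confounder x."""
    x = rng.standard_normal(n)
    d = 0.8 * x + rng.standard_normal(n)                # x drives the treatment (noise v)
    y = alpha * d + 0.5 * x + rng.standard_normal(n)    # ... and the outcome (noise eps)
    ones = np.ones(n)
    naive, *_ = np.linalg.lstsq(np.column_stack([ones, d]), y, rcond=None)
    adjusted, *_ = np.linalg.lstsq(np.column_stack([ones, d, x]), y, rcond=None)
    return naive[1], adjusted[1]                         # the two estimates of alpha


def monte_carlo(reps=100, alpha=0.5, seed=0):
    """Mean bias and standard deviation of each estimator across `reps` fresh draws."""
    rng = np.random.default_rng(seed)
    est = np.array([one_draw(rng, alpha=alpha) for _ in range(reps)])   # shape (reps, 2)
    return est.mean(axis=0) - alpha, est.std(axis=0, ddof=1)


bias, sd = monte_carlo()
print("naive:    bias = %.3f, sd = %.3f" % (bias[0], sd[0]))
print("adjusted: bias = %.3f, sd = %.3f" % (bias[1], sd[1]))
```

The naive estimator's theoretical bias here is 0.5 × 0.8 / 1.64 ≈ 0.24, far larger than its run-to-run spread: the signature of a systematic bias rather than bad luck in a single draw.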

The post's forest plot — interactively

These numbers come straight from results_table2.csv in the post's folder — the same data used to produce Figure 1. Toggle outcomes and methods to compare. Hover a point to see its standard error, CI, and the number of controls the estimator used.

What to look for

Toggle "Murder" off and watch the Kitchen-sink OLS bar disappear: its +2.34 estimate compresses the x-axis for everyone else. This is where the blow-up of (X'X)⁻¹ becomes visible.
Hover any point to see its SE, 95% CI, and the exact number of controls that estimator used. The gap between DL (rigorous) and DL (CV) is most dramatic on Murder (9 controls vs 161).
Compare the lower bar chart: the orange "DL (CV)" bars dwarf the teal "DL (rigorous)" bars across all three outcomes: 109 vs 12 for property crime, 150 vs 8 for violent crime, and 161 vs 9 for murder. That is the §10 over-selection story made visible at a glance.

Why does Kitchen-sink OLS explode for Murder?

With 284 controls and 576 observations, the design matrix X is nearly collinear. OLS computes β̂ = (X'X)⁻¹ X'y, and when the smallest eigenvalues of X'X are close to zero, its inverse blows up, and with it the variance of β̂. The result is the implausible α̂ = +2.34 (a 234% increase in the murder rate per unit increase in the abortion rate). LASSO's contribution is to choose a smaller, well-conditioned submatrix automatically; the teal "DL (rigorous)" bars in the chart above show how few controls actually need to stay in.
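How fast X'X degrades as p approaches n can be checked numerically. This sketch uses independent standard-normal columns with n = 576, an optimistic assumption: correlated controls, as in a real panel, only make the conditioning worse. Because Var(β̂) = σ²(X'X)⁻¹ under homoskedastic errors, a large ratio of largest to smallest eigenvalue means some coefficient combinations are estimated with very high variance.

```python
import numpy as np


def xtx_condition(n, p, seed=0):
    """Condition number (largest / smallest eigenvalue) of X'X for an n x p Gaussian design."""
    X = np.random.default_rng(seed).standard_normal((n, p))
    return np.linalg.cond(X.T @ X)


# n = 576 as in the post; p grows toward n
for p in [20, 100, 284, 500]:
    print(f"p = {p:3d}, p/n = {p / 576:.2f}, cond(X'X) = {xtx_condition(576, p):.1f}")
```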

Connecting back to Tab 3

The DL-rigorous vs DL-CV comparison you just experimented with on simulated data mirrors what happens on the real abortion-crime panel:

Violent crime: rigorous gives α̂ = −0.096; CV's λ.min flips the sign to +0.019.
Murder: rigorous gives α̂ = −0.166; CV's λ.min inflates it to −1.11.
Selection counts: rigorous keeps 8–12 controls; CV keeps 109–161.

The takeaway from the post (§10) is therefore visible twice: once on a controlled simulation where you set the truth, and once on the original 576 × 284 panel that motivates the whole exercise.
