Why LASSO? Why Double LASSO?
Suppose you want to estimate a causal effect — say, does the abortion rate affect the crime rate — and you have 284 candidate control variables to choose from. You cannot use all of them (the estimate explodes), you cannot drop them all (you risk omitted-variable bias), and picking by hand feels arbitrary. Double LASSO automates the choice in a way that is honest about causal inference, not just about prediction.
This app lets you turn the dials yourself. In the three tabs that follow this one you will: sweep the LASSO penalty λ and watch coefficients snap to zero in real time; reproduce the post's headline sign-flip when cross-validation replaces the theory-driven penalty; and explore the actual Fitzgerald et al. (2026) forest plot.
L1 (LASSO) vs. L2 (Ridge) — why LASSO selects, Ridge does not
Both methods shrink coefficients. Only LASSO drives them exactly to zero. The animation below shows the same coefficient under the two penalties as λ grows: the orange L1 estimate hits zero at a finite λ and stays there, while the steel-blue L2 estimate decays toward zero but never reaches it. This is why LASSO doubles as a variable-selection device.
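The contrast can be checked by hand in the simplest possible case. The sketch below assumes a single standardized regressor, so that the L1 solution is soft-thresholding of the OLS estimate and the L2 solution is a proportional rescaling. The function names and the objective scaling (½(b_ols − b)² plus λ|b| or (λ/2)b²) are illustrative assumptions, not the app's code.

```python
def lasso_shrink(b_ols, lam):
    """L1 solution for one standardized regressor: soft-thresholding."""
    magnitude = max(abs(b_ols) - lam, 0.0)
    return magnitude if b_ols >= 0 else -magnitude


def ridge_shrink(b_ols, lam):
    """L2 solution for one standardized regressor: proportional shrinkage."""
    return b_ols / (1.0 + lam)


b_ols = 1.0  # hypothetical unpenalized estimate
for lam in [0.0, 0.5, 1.0, 2.0, 10.0]:
    print(f"lam={lam:5.1f}  lasso={lasso_shrink(b_ols, lam):.3f}  "
          f"ridge={ridge_shrink(b_ols, lam):.3f}")
```

Once λ reaches |b_ols| the L1 estimate is exactly zero, whereas the L2 estimate only gets smaller as λ grows.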
LASSO Lab
Slide λ and watch which controls survive. Compare the raw LASSO estimate to the post-LASSO OLS refit.
Penalty Showdown
Same data, two penalties. Reproduce the sign-flip Fitzgerald et al. flag in §10. Run 100 simulations to see the bias-variance picture.
Forest Plot
The post's headline figure, interactively. Toggle outcomes and methods. Hover for SEs, CIs, and the number of controls each estimator used.
Glossary (open a card if a term is unfamiliar)
LASSO
Penalty λ
Selection set I_y, I_d
Double LASSO (DL)
Rigorous penalty
CV penalty (λ.min)
Post-OLS step
p / n ratio
LASSO Lab — turn the penalty knob yourself
The simulated data has one treatment variable and many candidate controls. The true treatment coefficient is α = 0.5 (orange curve below). The LASSO chooses how many controls to keep based on a single penalty parameter λ. Drag the λ slider and watch the coefficients shrink to exactly zero, one at a time.
What to look for
- Sparsity grows with λ. Slide right: more controls are pinned to zero. Slide left: more re-enter. At λ ≈ 0 you recover OLS.
- The post-OLS α̂ tracks the true α more closely than the raw LASSO α̂. This is the §10 message — LASSO shrinks everything toward zero, including the treatment. Refit on the selected support to undo it.
- The orange-curve coefficient (the treatment) is forced to stay in the model. Try a large p and a large λ: controls disappear, but the orange line keeps a meaningful value.
Penalty Showdown — rigorous vs. cross-validated
Same simulated data. Same Double LASSO recipe. The only difference: how λ is chosen. Rigorous uses Belloni et al.'s theory-driven formula; CV uses 3-fold cross-validation to minimise prediction MSE. The two objectives are not the same, and the headline finding in §10 of the post is that they can move the coefficient in opposite directions.
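For readers who want to see the recipe spelled out, here is a minimal sketch of the standard three-step Double LASSO: select controls that predict y, select controls that predict d, then run OLS of y on d plus the union of both sets. Several things here are assumptions made for illustration rather than the app's implementation: the hand-written coordinate-descent solver, the objective scaling (1/n)·RSS + λ·‖b‖₁, the crude choice of σ̂ as the sample standard deviation of each outcome (the iterative estimate is omitted), and the simulated data with three relevant controls.

```python
import numpy as np
from statistics import NormalDist


def rigorous_lambda(sigma_hat, n, p, c=1.1, gamma=0.05):
    """Theory-driven penalty: 2 * c * sigma / sqrt(n) * Phi^{-1}(1 - gamma/(2p))."""
    return 2 * c * sigma_hat / np.sqrt(n) * NormalDist().inv_cdf(1 - gamma / (2 * p))


def lasso_cd(X, y, lam, n_sweeps=100):
    """Coordinate descent for (1/n)*||y - Xb||^2 + lam*||b||_1 (no intercept)."""
    n, p = X.shape
    b = np.zeros(p)
    col_ms = (X ** 2).mean(axis=0)          # x_j'x_j / n for each column
    resid = y.astype(float).copy()
    for _ in range(n_sweeps):
        for j in range(p):
            resid += X[:, j] * b[j]         # add back column j's current fit
            rho = X[:, j] @ resid / n
            # Soft-threshold at lam/2 because the loss is (1/n)*RSS, not (1/2n)*RSS.
            b[j] = np.sign(rho) * max(abs(rho) - lam / 2, 0.0) / col_ms[j]
            resid -= X[:, j] * b[j]
    return b


def double_lasso(y, d, X):
    n, p = X.shape
    Xc, yc, dc = X - X.mean(axis=0), y - y.mean(), d - d.mean()
    # Steps 1 and 2: controls that predict the outcome, then the treatment.
    # sigma_hat is crudely set to the outcome's sample sd (an assumption).
    I_y = np.flatnonzero(lasso_cd(Xc, yc, rigorous_lambda(yc.std(), n, p)))
    I_d = np.flatnonzero(lasso_cd(Xc, dc, rigorous_lambda(dc.std(), n, p)))
    selected = np.union1d(I_y, I_d)
    # Step 3: post-OLS of y on the treatment plus the union of selected controls.
    Z = np.column_stack([np.ones(n), d, X[:, selected]])
    coef = np.linalg.lstsq(Z, y, rcond=None)[0]
    return coef[1], selected


# Hypothetical simulated data: true alpha = 0.5, three controls matter.
rng = np.random.default_rng(0)
n, p, alpha_true = 200, 50, 0.5
X = rng.standard_normal((n, p))
d = X[:, :3].sum(axis=1) + rng.standard_normal(n)
y = alpha_true * d + X[:, :3].sum(axis=1) + rng.standard_normal(n)
alpha_hat, selected = double_lasso(y, d, X)
print(f"alpha_hat = {alpha_hat:.3f}, controls kept: {selected.tolist()}")
```

Swapping `rigorous_lambda` for a cross-validated choice changes only the two selection steps. The final OLS step is identical, which is why the comparison in this tab isolates the effect of the penalty.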
Rigorous Double LASSO
λ from Belloni et al. (2012): 2 · 1.1 · σ̂ / √n · Φ⁻¹(1 − 0.05/(2p))
CV Double LASSO
λ from 3-fold cross-validation (lambda.min)
Why does this happen?
- CV optimises prediction MSE — out-of-sample loss on y alone, or on d alone. That is not the same as estimating the causal α correctly.
- CV's λ.min over-selects. Many marginally predictive controls survive. Each soaks up a little of the treatment variation, leaving less for the post-OLS step to identify α.
- The rigorous penalty is deliberately conservative. Its Φ⁻¹(1−γ/(2p)) factor, with γ = 0.05 in the formula above, is a Bonferroni-style correction across the p candidate variables, designed to keep selection error small relative to estimation noise.
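To get a feel for the size of the rigorous penalty, the formula quoted above can be evaluated directly. The sketch below uses only the Python standard library. The value σ̂ = 1.0 is a placeholder assumption, not an estimate from the panel, while n = 576 and p = 284 are the dimensions quoted in this app.

```python
from math import sqrt
from statistics import NormalDist


def rigorous_lambda(sigma_hat, n, p, c=1.1, gamma=0.05):
    """lambda = 2 * c * sigma_hat / sqrt(n) * Phi^{-1}(1 - gamma / (2p))."""
    z = NormalDist().inv_cdf(1.0 - gamma / (2.0 * p))  # Bonferroni-style quantile
    return 2.0 * c * sigma_hat / sqrt(n) * z


# sigma_hat = 1.0 is a placeholder; n and p match the abortion-crime panel.
lam_panel = rigorous_lambda(sigma_hat=1.0, n=576, p=284)
lam_more_controls = rigorous_lambda(sigma_hat=1.0, n=576, p=2840)
print(f"p=284:  lambda = {lam_panel:.4f}")
print(f"p=2840: lambda = {lam_more_controls:.4f}")
```

The penalty grows only slowly with p, because p enters through a normal quantile, and shrinks at rate 1/√n as the sample grows.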
Bias vs. variance over many simulations
Single runs are noisy. Run the whole pipeline 100 times with fresh draws (same parameters, different ε and v) to see whether the CV bias is systematic.
The post's forest plot — interactively
These numbers come straight from results_table2.csv in the post's folder, the same data used to produce Figure 1. Toggle outcomes and methods to compare. Hover a point to see its standard error, CI, and the number of controls the estimator used.
What to look for
- Toggle "Murder" off and watch the Kitchen-sink OLS bar disappear: its +2.34 estimate compresses the x-axis for everyone else. The story of why (X'X)⁻¹ blows up becomes visible.
- Hover any point to see its SE, 95% CI, and the exact number of controls that estimator used. The gap between DL (rigorous) and DL (CV) is most dramatic on Murder (9 controls vs 161).
- Compare the lower bar chart: the orange "DL (CV)" bars dwarf the teal "DL (rigorous)" bars across all three outcomes — 109 vs 12 for property crime, 150 vs 8 for violent crime. That is the §10 over-selection story made visible at a glance.
Outcomes
Methods
Why does Kitchen-sink OLS explode for Murder?
With 284 controls and 576 observations, the design matrix X is nearly collinear. OLS computes β̂ = (X'X)⁻¹ X'y, and when the smallest eigenvalues of X'X are close to zero, its inverse blows up. The result is the implausible α̂ = +2.34 (a 234% increase in the murder rate per unit increase in the abortion rate). LASSO's contribution is to choose a smaller, well-conditioned submatrix automatically: the orange and teal bars in the chart above show how few controls actually need to stay in.
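The mechanism is easiest to see in a toy case with just two standardized regressors whose correlation is ρ. Then X'X/n = [[1, ρ], [ρ, 1]], its eigenvalues are 1 − ρ and 1 + ρ, and each diagonal entry of the inverse equals 1/(1 − ρ²). The sketch below is a two-variable illustration under those assumptions, not a computation on the 576 × 284 panel, and the function name is hypothetical.

```python
def gram_diagnostics(rho):
    """For X'X/n = [[1, rho], [rho, 1]]: smallest eigenvalue and inverse diagonal."""
    smallest_eig = 1.0 - abs(rho)     # eigenvalues are 1 - rho and 1 + rho
    det = 1.0 - rho * rho             # determinant of the 2x2 Gram matrix
    inv_diag = 1.0 / det              # each diagonal entry of the inverse
    return smallest_eig, inv_diag


for rho in [0.0, 0.9, 0.99, 0.999]:
    eig, inv_diag = gram_diagnostics(rho)
    print(f"rho={rho:5.3f}  smallest eigenvalue={eig:.3f}  "
          f"variance inflation={inv_diag:.1f}")
```

As ρ approaches 1 the smallest eigenvalue goes to zero and the inverse's diagonal, which scales the variance of each OLS coefficient, grows without bound.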
Connecting back to Tab 3
The DL-rigorous vs DL-CV comparison you just experimented with on simulated data is exactly what happens on the real abortion-crime panel:
- Violent crime: rigorous gives α̂ = −0.096; CV's λ.min flips the sign to +0.019.
- Murder: rigorous gives α̂ = −0.166; CV's λ.min inflates it to −1.11.
- Selection counts: rigorous keeps 8–12 controls; CV keeps 109–161.
The takeaway from the post (§10) is therefore visible twice: once on a controlled simulation where you set the truth, and once on the original 576 × 284 panel that motivates the whole exercise.