Treatment Effects in Stata

Does maternal smoking lower birth weight? Six estimators, one dataset

−275 g naive, unadjusted gap

−230 g five-method consensus

6 estimators compared

Carlos Mendez

Nagoya University (GSID)

June 11, 2026

The Tension

Act I

Smokers’ babies weigh 275 g less — but is that smoking, or who smokes?

A raw comparison says smokers’ newborns are 275 grams lighter. Striking — and almost certainly wrong as a causal number.

Smokers are younger, less educated, less often married, less likely to seek early prenatal care. Each alone predicts a lighter baby.

One unadjusted shift — and we cannot yet say what causes it

Kernel density of infant birth weight. Smokers’ distribution (orange) sits ~250 g left of non-smokers’ (steel blue) — but the shift conflates smoking with confounders.
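A minimal Stata sketch of the density figure. It assumes the births are Stata's `cattaneo2` example file (its variable names match the ones this deck uses) and that `webuse` can reach the internet. The colours `navy` and `orange` only approximate the figure's palette.

```stata
* Load the births data (assumed: Stata's cattaneo2 example file)
webuse cattaneo2, clear

* Overlay kernel densities of birth weight for non-smokers and smokers
twoway (kdensity bweight if mbsmoke == 0, lcolor(navy))   ///
       (kdensity bweight if mbsmoke == 1, lcolor(orange)), ///
       legend(order(1 "Non-smokers" 2 "Smokers"))          ///
       xtitle("Infant birth weight (g)") ytitle("Density")
```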

Where we’re going

The estimand: ATE and ATT under the potential-outcomes framework
What makes adjustment credible: unconfoundedness + overlap
Six estimators on four routes — outcome, treatment, both, or neither
The payoff: do they agree, and what does it mean if they do?

The Investigation

Act II

The lab: 4,642 births, 864 smokers, six pre-treatment confounders

Outcome \(Y\) — infant birth weight in grams (bweight)
Treatment \(D\) — mother smoked during pregnancy (mbsmoke)
Confounders \(X\) — age, education, marital status, prenatal care, parity, father’s age

Smokers are 18.6% of the sample (864 of 4,642). They are a minority whose covariate profile differs from non-smokers', so a raw difference in means compares unlike groups.
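The sample counts and the confounder imbalance can be checked directly. This sketch assumes the data are Stata's `cattaneo2` file, where the six confounders are stored as `mage` (age), `medu` (education), `mmarried` (married), `prenatal1` (first prenatal visit in the first trimester), `fbaby` (first baby) and `fage` (father's age). That mapping is our assumption, not something the deck states.

```stata
* Load the births data (assumed: Stata's cattaneo2 example file)
webuse cattaneo2, clear

* Outcome, treatment and the six confounders
describe bweight mbsmoke mage medu mmarried prenatal1 fbaby fage

* Treatment share: smokers vs non-smokers
tabulate mbsmoke

* Confounder means by smoking status: the imbalance behind the biased raw gap
tabstat mage medu mmarried prenatal1 fbaby fage, by(mbsmoke) ///
    statistics(mean) format(%7.2f)

* Keep the smoker count for a later check
count if mbsmoke == 1
scalar n_smokers = r(N)
```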

We want one of two averages, not one mother’s effect

\[\tau_{ATE}=E[Y(1)-Y(0)] \qquad \tau_{ATT}=E[Y(1)-Y(0)\mid D=1]\]

We never see both potential outcomes for the same mother — causal inference is a missing-data problem in disguise.

ATE asks “how much would birth weight change, on average, if every mother smoked rather than none?” · ATT asks “how much does smoking lower birth weight among the mothers who actually smoke?”

Adjustment is credible only under two assumptions

Unconfoundedness

\[\{Y(0),Y(1)\}\perp D\mid X\]

Among mothers identical on \(X\), smoking is as good as random. Bold, and not directly testable.

Overlap

\[0<e(X)<1\]

For every covariate profile, both smokers and non-smokers exist. Testable — we check it.

Here \(e(X)=\Pr(D=1\mid X)\) is the propensity score. SUTVA — no interference, one version of treatment — rounds out the three.
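A sketch of estimating \(e(X)\) and checking that it stays strictly inside (0, 1) in both groups. The logit covariate list (`mmarried`, a quadratic in `mage`, `fbaby`, `medu`) is an illustrative assumption; the deck does not show its treatment-model specification. The data are assumed to be Stata's `cattaneo2` file.

```stata
webuse cattaneo2, clear

* Treatment model for e(X) = Pr(D = 1 | X); covariates are an assumption
logit mbsmoke mmarried c.mage##c.mage fbaby medu

* Estimated propensity score for every mother
predict double ps, pr

* Overlap check: min and max of e(X) within each arm
bysort mbsmoke: summarize ps
```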

Six estimators, four routes — what does each one model?

Estimator	Outcome?	Treatment?	Core mechanic
RA	✓	—	Predict \(Y(1),Y(0)\), average the gap
IPW	—	✓	Reweight by \(1/\hat e(X)\) (smokers) or \(1/[1-\hat e(X)]\) (non-smokers)
IPWRA	✓	✓	RA with IPW weights
AIPW	✓	✓	RA + residual correction (efficient)
NNM	—	—	Match on covariates (Mahalanobis)
PSM	—	✓	Match on the propensity score

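Doubly robust (IPWRA, AIPW): consistent if either the outcome model or the treatment model is correctly specified. NNM is the only estimator that needs neither an outcome nor a treatment model (its optional bias adjustment adds a linear outcome regression).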

With no controls, OLS just restates the biased −275 g gap

regress bweight mbsmoke, vce(robust)
* mbsmoke coefficient: -275.25 g  (95% CI -316.8, -233.7;  t = -12.97)

A precise estimate of the wrong quantity — it absorbs the causal effect plus every covariate that differs between the groups.
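With one binary regressor, the OLS slope equals the difference between the two group means exactly, which is why the naive regression can only restate the raw gap. A quick check, assuming the data are Stata's `cattaneo2` file:

```stata
webuse cattaneo2, clear

* Naive regression of birth weight on the smoking indicator
quietly regress bweight mbsmoke, vce(robust)
scalar b_ols = _b[mbsmoke]

* Raw group means
summarize bweight if mbsmoke == 1, meanonly
scalar ybar1 = r(mean)
summarize bweight if mbsmoke == 0, meanonly
scalar ybar0 = r(mean)

* The slope and the difference in means are the same number
display "OLS slope: " %8.2f scalar(b_ols) ///
    "   difference in means: " %8.2f scalar(ybar1) - scalar(ybar0)
```

Robust standard errors change the interval, not the point estimate.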

Regression adjustment models the outcome and shrinks −275 g to −240 g

\[\hat\tau_{RA}=\frac{1}{n}\sum_{i=1}^{n}\big[\hat\mu_1(X_i)-\hat\mu_0(X_i)\big]\]

teffects ra (bweight mmarried mage prenatal1 fbaby) (mbsmoke), ate
* ATE = -239.6 g
teffects ra (bweight mmarried mage prenatal1 fbaby) (mbsmoke), atet
* ATT = -223.3 g

Fit one outcome model per arm, predict both potential outcomes for everyone, average the gap. \(\hat\mu_d(X)=E[Y\mid D=d,X]\).
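The RA formula can be reproduced by hand. Fit two arm-specific regressions, predict both potential outcomes for every mother, then average the gap over everyone (ATE) or over smokers only (ATT). This sketch assumes the `cattaneo2` data and the covariates from the `teffects ra` call above. With linear outcome models, the hand-computed and built-in numbers should coincide.

```stata
webuse cattaneo2, clear

* mu_1(X): outcome model fitted on smokers only, predicted for everyone
regress bweight mmarried mage prenatal1 fbaby if mbsmoke == 1
predict double mu1, xb

* mu_0(X): outcome model fitted on non-smokers only, predicted for everyone
regress bweight mmarried mage prenatal1 fbaby if mbsmoke == 0
predict double mu0, xb

* Individual predicted gaps; their mean over all mothers is the RA ATE
generate double gap = mu1 - mu0
summarize gap, meanonly
scalar ate_hand = r(mean)

* The same gaps averaged over smokers only give the RA ATT
summarize gap if mbsmoke == 1, meanonly
scalar att_hand = r(mean)

* Built-in estimates (the effect is the first element of e(b))
quietly teffects ra (bweight mmarried mage prenatal1 fbaby) (mbsmoke), ate
matrix eb = e(b)
scalar ate_te = eb[1,1]
quietly teffects ra (bweight mmarried mage prenatal1 fbaby) (mbsmoke), atet
matrix eb = e(b)
scalar att_te = eb[1,1]

display "ATE: by hand " %7.1f scalar(ate_hand) "   teffects " %7.1f scalar(ate_te)
display "ATT: by hand " %7.1f scalar(att_hand) "   teffects " %7.1f scalar(att_te)
```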

IPW models the treatment instead — and lands at −230.9 g

\[\hat\tau_{IPW}=\frac{1}{n}\sum_i\left[\frac{D_iY_i}{\hat e(X_i)}-\frac{(1-D_i)Y_i}{1-\hat e(X_i)}\right]\]

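Reweight each mother by the inverse probability of the treatment she actually received: \(1/\hat e(X_i)\) for smokers and \(1/[1-\hat e(X_i)]\) for non-smokers. The reweighted sample then mimics a randomized experiment.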

RA models birth weight; IPW models smoking. They agree to within ~9 g — the first strong signal the effect is real, not a one-model artifact.
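A hand computation of the IPW formula. The plain sample mean of the bracketed term is the Horvitz-Thompson form written above. `teffects ipw` instead reports differences of weighted means, which normalize the weights within each arm, so the two can differ slightly. The logit covariates are an illustrative assumption, not necessarily the deck's specification, and the data are assumed to be `cattaneo2`.

```stata
webuse cattaneo2, clear

* Treatment model (assumed covariates): logit of smoking on confounders
logit mbsmoke mmarried c.mage##c.mage fbaby medu
predict double ehat, pr

* Horvitz-Thompson form: sample mean of the bracket in the IPW formula
generate double ht = mbsmoke*bweight/ehat - (1 - mbsmoke)*bweight/(1 - ehat)
summarize ht, meanonly
scalar ipw_ht = r(mean)

* Normalized form: inverse-probability-weighted means within each arm
generate double w = cond(mbsmoke == 1, 1/ehat, 1/(1 - ehat))
summarize bweight if mbsmoke == 1 [aweight = w], meanonly
scalar ybar1 = r(mean)
summarize bweight if mbsmoke == 0 [aweight = w], meanonly
scalar ybar0 = r(mean)
scalar ipw_norm = scalar(ybar1) - scalar(ybar0)

* Built-in IPW with the same (default logit) treatment model
quietly teffects ipw (bweight) (mbsmoke mmarried c.mage##c.mage fbaby medu), ate
matrix eb = e(b)
scalar ipw_te = eb[1,1]

display "IPW: HT " %7.1f scalar(ipw_ht) "   normalized " %7.1f scalar(ipw_norm) ///
    "   teffects " %7.1f scalar(ipw_te)
```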

Both distributions span (0,1): overlap holds, so IPW is stable

Estimated propensity scores by smoking status. Non-smokers (steel blue) cluster low, smokers (orange) cluster high — but both span most of the unit interval. No zone where one group is absent.
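A sketch that redraws the overlap figure with Stata's `teffects overlap` postestimation command. It assumes the `cattaneo2` data and uses an illustrative treatment model, so the curves may differ somewhat from the deck's figure.

```stata
webuse cattaneo2, clear

* Fit IPW (treatment-model covariates are an assumption) ...
teffects ipw (bweight) (mbsmoke mmarried c.mage##c.mage fbaby medu), ate

* ... then plot the estimated propensity-score densities for each arm
teffects overlap
```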

Doubly robust insurance barely moves the estimate: −231.9 g and −232.5 g

IPWRA

RA run with IPW weights
ATE \(=-231.9\) g
consistent if either model is right

AIPW

RA + propensity-weighted residual term
ATE \(=-232.5\) g
attains the semiparametric efficiency bound when both models are correct

Belt and suspenders: only a simultaneous failure of both models breaks them. They differ by 0.6 g.
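Both doubly robust estimators can be run directly. The AIPW estimate can also be assembled by hand as the RA gap plus each arm's propensity-weighted residuals, \(\frac{1}{n}\sum_i\big[\hat\mu_1(X_i)-\hat\mu_0(X_i)+\frac{D_i(Y_i-\hat\mu_1(X_i))}{\hat e(X_i)}-\frac{(1-D_i)(Y_i-\hat\mu_0(X_i))}{1-\hat e(X_i)}\big]\). This sketch assumes the `cattaneo2` data. Both model specifications are illustrative, so the numbers need not match the deck's −231.9 g and −232.5 g.

```stata
webuse cattaneo2, clear

* Outcome models by arm (covariates mirror the RA call; an assumption)
regress bweight mmarried mage prenatal1 fbaby if mbsmoke == 1
predict double mu1, xb
regress bweight mmarried mage prenatal1 fbaby if mbsmoke == 0
predict double mu0, xb

* Treatment model (assumed covariates)
logit mbsmoke mmarried c.mage##c.mage fbaby medu
predict double ehat, pr

* AIPW: RA gap plus propensity-weighted residual corrections
generate double psi = (mu1 - mu0)          ///
    + mbsmoke*(bweight - mu1)/ehat         ///
    - (1 - mbsmoke)*(bweight - mu0)/(1 - ehat)
summarize psi, meanonly
scalar aipw_hand = r(mean)

* Built-in doubly robust estimators with the same two models
quietly teffects aipw (bweight mmarried mage prenatal1 fbaby) ///
    (mbsmoke mmarried c.mage##c.mage fbaby medu), ate
matrix eb = e(b)
scalar aipw_te = eb[1,1]
quietly teffects ipwra (bweight mmarried mage prenatal1 fbaby) ///
    (mbsmoke mmarried c.mage##c.mage fbaby medu), ate
matrix eb = e(b)
scalar ipwra_te = eb[1,1]

display "AIPW by hand " %7.1f scalar(aipw_hand) ///
    "   teffects aipw " %7.1f scalar(aipw_te)   ///
    "   teffects ipwra " %7.1f scalar(ipwra_te)
```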

NNM needs no outcome or treatment model: it finds each mother a statistical twin

\[\hat\tau_{NNM}=\frac{1}{n}\sum_i (2D_i-1)\left[Y_i-\frac{1}{M}\sum_{j\in J_M(i)}Y_j\right]\]

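For every mother, find her closest match(es) in the other group by Mahalanobis distance on the covariates, then compare birth weights. Here biasadj() adds a linear regression correction for the remaining gaps in mage, fage and medu, which are not matched exactly.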

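teffects nnmatch (bweight mmarried mage fage medu prenatal1) (mbsmoke), ///
   ematch(mmarried prenatal1) biasadj(mage fage medu) ate
* ATE = -210.1 g
teffects nnmatch (bweight mmarried mage fage medu prenatal1) (mbsmoke), ///
   ematch(mmarried prenatal1) biasadj(mage fage medu) atet
* ATT = -238.5 g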
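In the matching formula, \(M\) is the number of neighbours averaged into each counterfactual. In `teffects nnmatch` it is set by `nneighbor()`, and `metric(mahalanobis)`, the default, gives the distance. This sketch assumes the `cattaneo2` data and reuses the deck's matching specification with \(M=1\) and, for comparison, \(M=4\).

```stata
webuse cattaneo2, clear

* M = 1 (the default), Mahalanobis metric (also the default),
* exact matching on mmarried and prenatal1, bias correction on the rest
teffects nnmatch (bweight mmarried mage fage medu prenatal1) (mbsmoke), ///
    ematch(mmarried prenatal1) biasadj(mage fage medu)                  ///
    metric(mahalanobis) nneighbor(1) ate

* Same design with M = 4: each counterfactual is the mean of 4 matches
teffects nnmatch (bweight mmarried mage fage medu prenatal1) (mbsmoke), ///
    ematch(mmarried prenatal1) biasadj(mage fage medu)                  ///
    metric(mahalanobis) nneighbor(4) ate
```

Larger \(M\) usually lowers variance at the cost of worse matches.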

PSM collapses six covariates to one score and matches on it

On a 100-mother subsample: each smoker (orange, top row) is matched to the non-smoker(s) with the closest propensity score. Rosenbaum–Rubin: matching on the scalar score balances every covariate that built it.
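A sketch of the PSM step followed by a balance check. It assumes the `cattaneo2` data and an illustrative treatment model. `tebalance summarize` reports standardized differences and variance ratios in the raw and matched samples, which is the practical test of the balancing property. `generate(pm)` stores each mother's matched observation numbers in new variables whose names begin with `pm`.

```stata
webuse cattaneo2, clear

* PSM: match each mother to the nearest opposite-arm mother on the
* estimated propensity score (treatment-model covariates are an assumption)
teffects psmatch (bweight) (mbsmoke mmarried c.mage##c.mage fbaby medu), ///
    ate generate(pm)

* Balance diagnostics: standardized differences and variance ratios,
* raw vs matched sample
tebalance summarize
```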

The Resolution

Act III

Five very different estimators converge on roughly −230 g

−230 g

RA, IPW, IPWRA, AIPW, PSM all land between −229 and −240 g; NNM the lone outlier at −210 g

The forest plot: adjustment rules out the naive −275 g

ATE ± 95% CI across seven specifications. The naive estimate (−275 g) is the most negative. Five of the six adjusted estimators cluster near −230 g, and NNM is the slight outlier at −210 g.
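The forest-plot numbers can be collected in one loop. This sketch assumes the `cattaneo2` data. The naive, RA and NNM lines reuse the deck's commands, and the other specifications are illustrative. The 95% bounds use the normal approximation \(\hat\tau\pm1.96\,\widehat{\text{se}}\), so the naive row differs negligibly from `regress`'s t-based interval.

```stata
webuse cattaneo2, clear

* One row per specification: point estimate and normal-based 95% CI
matrix RES = J(7, 3, .)
matrix rownames RES = naive RA IPW IPWRA AIPW NNM PSM
matrix colnames RES = ATE lo95 hi95

local i = 0
foreach cmd in ///
    "regress bweight mbsmoke, vce(robust)" ///
    "teffects ra (bweight mmarried mage prenatal1 fbaby) (mbsmoke), ate" ///
    "teffects ipw (bweight) (mbsmoke mmarried c.mage##c.mage fbaby medu), ate" ///
    "teffects ipwra (bweight mmarried mage prenatal1 fbaby) (mbsmoke mmarried c.mage##c.mage fbaby medu), ate" ///
    "teffects aipw (bweight mmarried mage prenatal1 fbaby) (mbsmoke mmarried c.mage##c.mage fbaby medu), ate" ///
    "teffects nnmatch (bweight mmarried mage fage medu prenatal1) (mbsmoke), ematch(mmarried prenatal1) biasadj(mage fage medu) ate" ///
    "teffects psmatch (bweight) (mbsmoke mmarried c.mage##c.mage fbaby medu), ate" {
    local ++i
    quietly `cmd'
    * The smoking effect is the first element of e(b) in every case
    matrix eb = e(b)
    matrix eV = e(V)
    matrix RES[`i', 1] = eb[1,1]
    matrix RES[`i', 2] = eb[1,1] - invnormal(0.975)*sqrt(eV[1,1])
    matrix RES[`i', 3] = eb[1,1] + invnormal(0.975)*sqrt(eV[1,1])
}
matrix list RES, format(%8.1f)
```

The resulting matrix can be fed to any plotting routine to redraw the forest plot.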

ATT can flip the story: for NNM the treated lose more, not less

Estimator	ATE (g)	ATT (g)
RA	−239.6	−223.3
IPW	−230.9	−219.6
IPWRA	−231.9	−220.6
NNM	−210.1	−238.5
PSM	−229.4	−224.6

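Four of the five methods: ATT closer to zero than ATE. NNM reverses it: the actual smokers sit where smoking does more damage. AIPW has no row because teffects aipw estimates only the ATE and the potential-outcome means.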
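For RA, the ATE averages the same predicted gaps over all mothers and the ATT averages them over smokers only. Together, the two pin down the average effect on non-smokers, \(\tau_{ATU}=E[Y(1)-Y(0)\mid D=0]\), through the identity below, with \(\pi=\Pr(D=1)\) estimated by 864/4,642 ≈ 0.186. A worked check using the rounded RA figures from the table:

\[\tau_{ATE}=\pi\,\tau_{ATT}+(1-\pi)\,\tau_{ATU}\;\Rightarrow\;\hat\tau_{ATU}=\frac{\hat\tau_{ATE}-\hat\pi\,\hat\tau_{ATT}}{1-\hat\pi}\approx\frac{-239.6-0.186\times(-223.3)}{0.814}\approx-243\ \text{g}\]

An ATT closer to zero than the ATE therefore means that the estimated effect among non-smokers is larger in magnitude than among smokers.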

Does matching make this causal? No — two assumptions still carry it

Objection. Matching on, or reweighting by, observed controls cannot manufacture identification.

Response. Correct. The −230 g is identified only under unconfoundedness given \(X\) and overlap. The six methods discipline how we adjust; none rules out an unmeasured confounder — stress, income, nutrition — that drives both smoking and birth weight. Convergence is reassuring, not proof.