Treatment Effects in Stata

Does maternal smoking lower birth weight? Six estimators, one dataset

−275 g naive, unadjusted gap
−230 g five-method consensus
6 estimators compared

Carlos Mendez

Nagoya University (GSID)

June 11, 2026

The Tension

Act I

Smokers’ babies weigh 275 g less — but is that smoking, or who smokes?

A raw comparison says smokers’ newborns are 275 grams lighter. Striking — and almost certainly wrong as a causal number.

Smokers are younger, less educated, less often married, less likely to seek early prenatal care. Each alone predicts a lighter baby.

One unadjusted shift — and we cannot yet say what causes it

Kernel density of infant birth weight. Smokers’ distribution (orange) sits ~250 g left of non-smokers’ (steel blue) — but the shift conflates smoking with confounders.

Where we’re going

  • The estimand: ATE and ATT under the potential-outcomes framework
  • What makes adjustment credible: unconfoundedness + overlap
  • Six estimators on four routes — outcome, treatment, both, or neither
  • The payoff: do they agree, and what does it mean if they do?

The Investigation

Act II

The lab: 4,642 births, 864 smokers, six pre-treatment confounders

  • Outcome \(Y\) — infant birth weight in grams (bweight)
  • Treatment \(D\) — mother smoked during pregnancy (mbsmoke)
  • Confounders \(X\) — age, education, marital status, prenatal care, parity, father’s age

Smokers are 18.6% of the sample (864 of 4,642). They are a minority that differs systematically from non-smokers on \(X\), which is exactly why a raw difference in means is risky.
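A minimal sketch of how the lab could be set up and the raw imbalance inspected. It assumes the data are Stata's example dataset cattaneo2, whose variable names and sample size appear to match the slides; the six-variable list is our reading of the confounders named above.

```stata
* Assumption: the deck uses Stata's example dataset cattaneo2
webuse cattaneo2, clear

* Share of smokers (the slides report 864 of 4,642)
tabulate mbsmoke

* Raw covariate means by smoking status, before any adjustment
tabstat mage medu mmarried prenatal1 fbaby fage, ///
    by(mbsmoke) statistics(mean) nototal
```

If the group means differ noticeably on these covariates, the unadjusted gap mixes the effect of smoking with the effect of who smokes.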

We want one of two averages, not one mother’s effect

\[\tau_{ATE}=E[Y(1)-Y(0)] \qquad \tau_{ATT}=E[Y(1)-Y(0)\mid D=1]\]

We never see both potential outcomes for the same mother — causal inference is a missing-data problem in disguise.

ATE asks “what if smoking became universal?” · ATT asks “what is happening to those who currently smoke?”
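A short derivation, using only the potential-outcomes notation above, shows what the raw gap between smokers and non-smokers measures. The label "selection bias" is our shorthand, not a term from the slides.

\[E[Y\mid D=1]-E[Y\mid D=0]=\underbrace{E[Y(1)-Y(0)\mid D=1]}_{\tau_{ATT}}+\underbrace{E[Y(0)\mid D=1]-E[Y(0)\mid D=0]}_{\text{selection bias}}\]

The second term is the birth-weight gap that smokers' babies would show even if their mothers had not smoked. The raw difference equals the ATT only when that term is zero.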

Adjustment is credible only under two assumptions

Unconfoundedness

\[\{Y(0),Y(1)\}\perp D\mid X\]

Among mothers identical on \(X\), smoking is as good as random. Bold, and not directly testable.

Overlap

\[0<e(X)<1\]

For every covariate profile, both smokers and non-smokers exist. Testable — we check it.

Here \(e(X)=\Pr(D=1\mid X)\) is the propensity score. SUTVA — no interference, one version of treatment — rounds out the three.

Six estimators, four routes — what does each one model?

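Estimator | Outcome model? | Treatment model? | Core mechanic
RA | Yes | No | Predict \(Y(1),Y(0)\), average the gap
IPW | No | Yes | Reweight by \(1/\hat e(X)\) (smokers) or \(1/(1-\hat e(X))\) (non-smokers)
IPWRA | Yes | Yes | RA with IPW weights
AIPW | Yes | Yes | RA + residual correction (efficient)
NNM | No | No | Match on covariates (Mahalanobis)
PSM | No | Yes | Match on the propensity score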

Doubly robust (IPWRA, AIPW): consistent if either model is right. NNM is the only fully model-free estimator.

With no controls, OLS just restates the biased −275 g gap

regress bweight mbsmoke, vce(robust)
* mbsmoke coefficient: -275.25 g  (95% CI -316.8, -233.7;  t = -12.97)

A precise estimate of the wrong quantity — it absorbs the causal effect plus every covariate that differs between the groups.

Regression adjustment models the outcome and shrinks −275 g to −240 g

\[\hat\tau_{RA}=\frac{1}{n}\sum_{i=1}^{n}\big[\hat\mu_1(X_i)-\hat\mu_0(X_i)\big]\]

teffects ra (bweight mmarried mage prenatal1 fbaby) (mbsmoke), ate
* ATE = -239.6 g; rerunning with atet in place of ate gives ATT = -223.3 g

Fit one outcome model per arm, predict both potential outcomes for everyone, average the gap. \(\hat\mu_d(X)=E[Y\mid D=d,X]\).
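The three steps just described can be sketched by hand. This assumes Stata's cattaneo2 data and the same four covariates as the teffects ra call; the names mu1, mu0, and gap are ours. Because teffects ra fits a separate linear model per arm by default, the means below should match its point estimates, but the standard errors from summarize are not valid for inference.

```stata
* Assumption: the deck uses Stata's example dataset cattaneo2
webuse cattaneo2, clear

* Outcome model fitted on smokers only, then predicted for every mother
quietly regress bweight mmarried mage prenatal1 fbaby if mbsmoke == 1
predict double mu1, xb

* Outcome model fitted on non-smokers only, then predicted for every mother
quietly regress bweight mmarried mage prenatal1 fbaby if mbsmoke == 0
predict double mu0, xb

* Individual predicted gap, then its average
generate double gap = mu1 - mu0
* Mean over all mothers: RA point estimate of the ATE
summarize gap
* Mean over smokers only: RA point estimate of the ATT
summarize gap if mbsmoke == 1
```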

IPW models the treatment instead — and lands at −230.9 g

\[\hat\tau_{IPW}=\frac{1}{n}\sum_i\left[\frac{D_iY_i}{\hat e(X_i)}-\frac{(1-D_i)Y_i}{1-\hat e(X_i)}\right]\]

Reweight every mother by the inverse of her propensity to smoke — the reweighted sample mimics a randomized experiment.

RA models birth weight; IPW models smoking. They agree to within ~9 g — the first strong signal the effect is real, not a one-model artifact.
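A hedged sketch of the IPW route in Stata. The slides do not show the treatment-model specification, so the covariate list below is an assumption and need not reproduce −230.9 g exactly; the names ps and w are ours.

```stata
* Assumption: the deck uses Stata's example dataset cattaneo2
webuse cattaneo2, clear

* Built-in estimator: empty outcome equation, logit treatment model by default
teffects ipw (bweight) (mbsmoke mmarried mage prenatal1 fbaby), ate

* The same point estimate by hand: fit the propensity score, build weights
quietly logit mbsmoke mmarried mage prenatal1 fbaby
predict double ps, pr
generate double w = cond(mbsmoke == 1, 1/ps, 1/(1 - ps))

* Coefficient on mbsmoke = difference in weighted mean birth weight
regress bweight mbsmoke [pweight = w]
```

The weighted regression normalizes the weights within each arm, so it is a close cousin of the formula above rather than a literal transcription. Its standard errors ignore that the propensity score was estimated, which teffects ipw accounts for.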

Both distributions span (0,1): overlap holds, so IPW is stable

Estimated propensity scores by smoking status. Non-smokers (steel blue) cluster low, smokers (orange) cluster high — but both span most of the unit interval. No zone where one group is absent.
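One way the overlap check could be run, assuming cattaneo2 and an illustrative treatment model (the deck's exact specification is not shown). The variable name ps is ours.

```stata
* Assumption: the deck uses Stata's example dataset cattaneo2
webuse cattaneo2, clear

* Fit a treatment model, then plot the estimated propensity densities by group
quietly teffects ipw (bweight) (mbsmoke mmarried mage prenatal1 fbaby), ate
teffects overlap

* Numeric companion to the plot: range and quantiles of the score by group
quietly logit mbsmoke mmarried mage prenatal1 fbaby
predict double ps, pr
tabstat ps, by(mbsmoke) statistics(min p5 p50 p95 max)
```

Scores piling up near 0 or 1 in either group would signal extreme weights and an unstable IPW estimate.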

Doubly robust buys insurance for a gram or two: −231.9 g and −232.5 g

IPWRA

  • RA run with IPW weights
  • ATE \(=-231.9\) g
  • consistent if either model is right

AIPW

  • RA + propensity-weighted residual term
  • ATE \(=-232.5\) g
  • attains the efficiency bound
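Written in the notation of the RA and IPW formulas, one standard form of the AIPW estimator is sketched below. The slides do not display it, so treat the exact form as our reading of "RA + propensity-weighted residual term".

\[\hat\tau_{AIPW}=\frac{1}{n}\sum_{i=1}^{n}\left[\hat\mu_1(X_i)-\hat\mu_0(X_i)+\frac{D_i\,\big(Y_i-\hat\mu_1(X_i)\big)}{\hat e(X_i)}-\frac{(1-D_i)\big(Y_i-\hat\mu_0(X_i)\big)}{1-\hat e(X_i)}\right]\]

The first two terms are the RA estimate; the last two reweight each arm's outcome-model residuals by the inverse propensity, which corrects the RA estimate when the outcome model is off but the propensity model is right.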

Belt and suspenders: only a simultaneous failure of both models breaks them. They differ by 0.6 g.
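A minimal sketch of the two doubly robust calls. The covariate lists are assumptions (the deck does not show them), so the output may differ from −231.9 g and −232.5 g.

```stata
* Assumption: the deck uses Stata's example dataset cattaneo2
webuse cattaneo2, clear

* First parentheses: outcome model. Second parentheses: treatment model.
teffects ipwra (bweight mmarried mage prenatal1 fbaby) ///
               (mbsmoke mmarried mage prenatal1 fbaby), ate

teffects aipw  (bweight mmarried mage prenatal1 fbaby) ///
               (mbsmoke mmarried mage prenatal1 fbaby), ate
```

Both commands take the same two-equation syntax, which makes it easy to vary one model at a time and see how much the estimate moves.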

NNM fits no model at all — it finds each smoker a statistical twin

\[\hat\tau_{NNM}=\frac{1}{n}\sum_i (2D_i-1)\left[Y_i-\frac{1}{M}\sum_{j\in J_M(i)}Y_j\right]\]

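For the ATE, every mother is matched to her \(M\) closest counterparts in the opposite smoking group, by Mahalanobis distance in covariate space; \(J_M(i)\) is that match set, and the factor \(2D_i-1\) signs each difference as smoker minus non-smoker. For the ATT, only smoking mothers are matched.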

teffects nnmatch (bweight mmarried mage fage medu prenatal1) (mbsmoke), ///
   ematch(mmarried prenatal1) biasadj(mage fage medu) ate
* ATE = -210.1 g; rerunning with atet in place of ate gives ATT = -238.5 g

PSM collapses six covariates to one score and matches on it

On a 100-mother subsample: each smoker (orange, top row) is matched to the non-smoker(s) with the closest propensity score. Rosenbaum–Rubin: matching on the scalar score balances every covariate that built it.
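A hedged sketch of the PSM call and a balance check. The six covariates are our reading of the confounders listed earlier, not a specification shown on the slides, so the estimate need not equal −229.4 g.

```stata
* Assumption: the deck uses Stata's example dataset cattaneo2
webuse cattaneo2, clear

* Empty outcome equation; logit propensity model; one nearest neighbor by default
teffects psmatch (bweight) ///
    (mbsmoke mmarried mage fage medu prenatal1 fbaby), ate

* Standardized differences and variance ratios, raw versus matched sample
tebalance summarize
```

Standardized differences near zero and variance ratios near one in the matched sample are what the balancing property of the score predicts.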

The Resolution

Act III

Five very different estimators converge on roughly −230 g

−230 g

RA, IPW, IPWRA, AIPW, PSM all land between −229 and −240 g; NNM the lone outlier at −210 g

The forest plot: adjustment rules out the naive −275 g

ATE ± 95% CI across seven specifications. The naive estimate (−275 g) is the most negative; five of the six adjusted estimators cluster near −230 g, with NNM the slight outlier at −210 g.

ATT can flip the story: for NNM the treated lose more, not less

Estimator ATE (g) ATT (g)
RA −239.6 −223.3
IPW −230.9 −219.6
IPWRA −231.9 −220.6
NNM −210.1 −238.5
PSM −229.4 −224.6

Four methods: ATT closer to zero than ATE. NNM reverses it — the actual smokers sit where smoking does more damage.
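The comparison can be made explicit by computing ATT minus ATE for each row. The snippet only re-enters the table's numbers; the variable names are ours.

```stata
clear
input str5 estimator double(ate att)
"RA"    -239.6 -223.3
"IPW"   -230.9 -219.6
"IPWRA" -231.9 -220.6
"NNM"   -210.1 -238.5
"PSM"   -229.4 -224.6
end

* Positive gap: ATT is closer to zero than ATE. Negative gap: ATT is larger in magnitude.
generate double gap = att - ate
format gap %6.1f
list, noobs
```

By subtraction, the gaps are 16.3, 11.3, 11.3, and 4.8 g for RA, IPW, IPWRA, and PSM, against −28.4 g for NNM.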

Does matching make this causal? No — two assumptions still carry it

Objection. Matching or reweighting on observed controls, however automated, cannot manufacture identification.

Response. Correct. The −230 g is identified only under unconfoundedness given \(X\) and overlap. The six methods discipline how we adjust; none rules out an unmeasured confounder — stress, income, nutrition — that drives both smoking and birth weight. Convergence is reassuring, not proof.

When five honest routes agree, trust the −230 g over the −275 g.