Difference-in-Differences in Stata

Did after-school tutoring raise GPA? A disciplined evaluation

25.32 ATT · GPA points
36.20 naive · overstates by 43%
5 equivalent estimators

Carlos Mendez

Nagoya University (GSID)

June 11, 2026

The Tension

Act I

A program lifts the treated group’s GPA by 36 points — but is that the program?

A fictitious government runs an after-school tutoring program in 10 of 35 high schools to raise the GPA of low-income students.

Look only at the treated schools and GPA jumps from 60 to 96. Spectacular. Or is something else rising too?

The naive before-after answer is 36.20 — the credible answer is much smaller

Interrupted Time Series — treated schools only. GPA leaps across the red treatment line from ~60 to ~96.

Where we’re going

  • The naive ITS trap: before-after overstates the effect
  • The 2×2 DiD design — a comparison group rebuilds the counterfactual
  • Five equivalent Stata estimators land on one number
  • The event study — testing parallel trends with pre-treatment leads

The Investigation

Act II

The lab: 35 schools, 2 periods, a clean simultaneous rollout

  • Outcome: gpa, a school’s mean GPA on a 0–100 scale
  • Treatment: 10 schools get tutoring; 25 are the comparison group
  • Design: every treated school switches on at the same time (no staggering)

A strongly-balanced panel: 35 schools × 2 periods = 70 observations. The estimand is the ATT \(E[Y_i(1)-Y_i(0)\mid D_i=1]\) — the effect for the schools that actually got the program.

All 10 treated schools switch on together — the ideal 2×2 setup

Treatment-timing heatmap. Treated schools (IDs 26–35) flip pre → post simultaneously at time 2; the 25 comparison schools never switch.
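
The layout described above can be mocked up to confirm the panel structure. This is a minimal simulation, not the deck's dataset: the variable names id, time, treated, post, and txp follow the commands used later in the deck, while the seed and the placeholder gpa values are assumptions made only for illustration.

```stata
* Build a hypothetical 35-school, 2-period panel (simulated; not the deck's data)
clear
set seed 12345
set obs 35
generate id = _n
generate byte treated = id >= 26            // schools 26-35 are treated
expand 2                                    // two rows per school
bysort id: generate byte time = _n          // time = 1 (pre), 2 (post)
generate byte post = time == 2
generate byte txp  = treated * post         // 1 only for treated schools after rollout
generate double gpa = 70 + rnormal(0, 2)    // placeholder outcome, assumption only

xtset id time                               // declares the panel and reports its balance
tabulate treated post                       // cell counts for the 2x2 design
```

With every school observed in both periods, xtset should describe the panel as strongly balanced, and the cross-tabulation should show 25 comparison and 10 treated schools in each period.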

DiD rebuilds the counterfactual from the comparison group’s drift

The comparison group (rising gently) supplies the dashed counterfactual: where the treated would have ended up without the program.

The double difference: subtract the comparison group’s trend from the treated group’s

\[DiD = \Big(\bar{Y}_{1}^{T} - \bar{Y}_{0}^{T}\Big) - \Big(\bar{Y}_{1}^{C} - \bar{Y}_{0}^{C}\Big)\]

Treated change \(36.20\) minus comparison change \(10.88\) = the program’s effect, with the common time trend removed.

The means table makes the subtraction explicit: 36.20 − 10.88 = 25.32

| Group | Pre | Post | Change |
|---|---|---|---|
| Comparison (25 schools) | 71.22 | 82.10 | 10.88 |
| Treated (10 schools) | 60.17 | 96.37 | 36.20 |
| DiD estimate | | | 25.32 |

About 30% of the treated group’s raw gain (10.88 of 36.20 points) was natural drift, not the program.
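
Plugging the four cell means from the table into the double-difference formula gives the same figure. This is only the table's arithmetic written out:

\[DiD = (96.37 - 60.17) - (82.10 - 71.22) = 36.20 - 10.88 = 25.32\]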

The formal diff command: ATT = 25.315, SE 0.627, p < 0.001

diff gpa, treated(treated) period(post)

| Contrast | Estimate | SE | Sig.? |
|---|---|---|---|
| Before: Diff (T−C) | −11.049 | 0.443 | yes |
| After: Diff (T−C) | 14.266 | 0.443 | yes |
| Diff-in-Diff | 25.315 | 0.627 | yes |
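
The output can be cross-checked by hand. The DiD is also the post-period gap between the groups minus the pre-period gap, which is the double difference with its terms regrouped:

\[DiD = \Big(\bar{Y}_{1}^{T} - \bar{Y}_{1}^{C}\Big) - \Big(\bar{Y}_{0}^{T} - \bar{Y}_{0}^{C}\Big) = 14.266 - (-11.049) = 25.315\]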

The same number as a regression: the interaction txp IS the DiD

\[Y_{it} = \alpha + \beta_1 \text{Treat}_i + \beta_2 \text{Post}_t + \beta_3 (\text{Treat}_i \times \text{Post}_t) + \varepsilon_{it}\]

reg gpa treated post txp, robust

\(\hat\beta_3 = 25.315\) (SE 0.615): the DiD estimate.

The rest is nuisance: constant 71.22 (comparison pre-mean), \(\hat\beta_1 = -11.05\) (baseline gap), \(\hat\beta_2 = 10.89\) (common trend).
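
The claim that the interaction coefficient is the DiD can be checked numerically. The sketch below runs on simulated data, not the deck's dataset: the data-generating values only loosely echo the reported coefficients, and the seed and noise level are assumptions. It compares the coefficient on txp with the double difference of the four cell means.

```stata
* Simulated 2x2 panel (hypothetical data; values loosely echo the deck)
clear
set seed 12345
set obs 35
generate id = _n
generate byte treated = id >= 26
expand 2
bysort id: generate byte time = _n
generate byte post = time == 2
generate byte txp  = treated * post
generate double gpa = 71.22 - 11.05*treated + 10.88*post + 25.32*txp + rnormal(0, 2)

* Interaction regression: the coefficient on txp is the DiD
regress gpa treated post txp, vce(robust)
scalar b_did = _b[txp]

* Double difference of the four cell means, computed by hand
* m`g'`p' holds the mean for treated == g and post == p
foreach g in 0 1 {
    foreach p in 0 1 {
        quietly summarize gpa if treated == `g' & post == `p'
        scalar m`g'`p' = r(mean)
    }
}
scalar dd = (m11 - m10) - (m01 - m00)
display "regression: " b_did "   cell means: " dd
```

Because the model is saturated (four coefficients for four cells), the two numbers agree up to floating-point error, and the constant equals the comparison group's pre-period mean.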

TWFE absorbs school and time effects; the interaction survives unchanged

\[Y_{it} = \beta_3 (\text{Treat}_i \times \text{Post}_t) + \gamma_i + \vartheta_t + \varepsilon_{it}\]

xtreg  gpa txp i.time, fe vce(cluster id)
reghdfe gpa txp, absorb(id time) cluster(id)

\(\gamma_i\) wipes out permanent school differences.

\(\vartheta_t\) wipes out common shocks. What remains: \(\hat\beta_3 = 25.315\).
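
To see that absorbing school and time effects leaves the interaction untouched in a balanced 2×2 panel, one can compare the two estimators on simulated data. Everything about the data below (seed, school effects, noise) is an assumption for illustration. Only xtreg is used, because reghdfe is a community-contributed command that may not be installed.

```stata
* Simulated 2x2 panel with permanent school differences (hypothetical data)
clear
set seed 12345
set obs 35
generate id = _n
generate byte treated = id >= 26
generate double school_fe = rnormal(0, 3)   // drawn once per school, copied by expand
expand 2
bysort id: generate byte time = _n
generate byte post = time == 2
generate byte txp  = treated * post
generate double gpa = 70 + school_fe + 10.88*post + 25.32*txp + rnormal(0, 2)

* Pooled interaction regression
quietly regress gpa treated post txp, vce(cluster id)
scalar b_ols = _b[txp]

* Two-way fixed effects: fe absorbs schools, i.time absorbs periods
xtset id time
xtreg gpa txp i.time, fe vce(cluster id)
scalar b_twfe = _b[txp]

display "OLS interaction: " b_ols "   TWFE: " b_twfe
```

The point estimates coincide; only the standard errors can differ, since the two commands apply different degrees-of-freedom corrections.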

Five estimators, one answer: 25.31–25.33 across the board

| Method | Estimate | SE | Clustered? |
|---|---|---|---|
| diff (manual) | 25.315 | 0.627 | no |
| reg (interaction) | 25.315 | 0.615 | robust |
| didregress | 25.315 | 0.834 | yes |
| xtreg (TWFE) | 25.315 | 0.585 | yes |
| reghdfe (+ covariate) | 25.328 | 0.605 | yes |

Adding the female-share control moves the estimate by ~0.01 points. The design — not the covariates — does the heavy lifting.
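
The table lists didregress, but the deck does not show its call. A minimal sketch of how it might be invoked follows, assuming Stata 17 or newer (where didregress was introduced) and the same variable names as the other commands. The data are simulated for illustration, and the exact options used for the deck's estimate are not known.

```stata
* Simulated 2x2 panel (hypothetical data); didregress needs Stata 17 or newer
clear
set seed 12345
set obs 35
generate id = _n
generate byte treated = id >= 26
expand 2
bysort id: generate byte time = _n
generate byte post = time == 2
generate byte txp  = treated * post
generate double gpa = 71.22 - 11.05*treated + 10.88*post + 25.32*txp + rnormal(0, 2)

* First parentheses: outcome (plus any covariates)
* Second parentheses: the 0/1 indicator for treated observations
* group() names the unit at which treatment is assigned; time() names the period
didregress (gpa) (txp), group(id) time(time)
```

By default the standard errors are clustered on the group() variable, which matches the "yes" in the table's Clustered? column for this estimator.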

The Resolution

Act III

The credible ATT is 25.32 GPA points — and the naive number overstated it by 43%

25.32

\(\hat\delta\), ATT (SE 0.627) · vs. naive ITS 36.20 — the 10.88-point gap was secular drift
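
The 43% figure is the naive estimate's excess relative to the DiD estimate:

\[\frac{36.20 - 25.32}{25.32} = \frac{10.88}{25.32} \approx 0.43\]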

Leads near zero, lags near 25 — the table behind the picture

| Period | Coefficient | SE | Sig.? |
|---|---|---|---|
| lead 4 | 0.342 | 0.401 | no |
| lead 3 | −0.322 | 0.441 | no |
| lead 2 | 0.593 | 0.423 | no |
| lag 0 | 25.028 | 0.445 | yes |
| lag 1 | 24.705 | 0.559 | yes |
| lag 2 | 24.768 | 0.739 | yes |
| lag 3 | 25.701 | 0.797 | yes |

Lags span < 1 GPA point across four periods: no fade-out, no ramp-up — an immediate, sustained effect.
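
A lead-and-lag table like this one could come from an event-study regression along the following lines. This is a sketch under stated assumptions, not the deck's code: it simulates a longer panel (8 periods, treatment starting in period 5), the variable names rel, lead2 to lead4, and lag0 to lag3 are hypothetical, and lead 1 is taken as the omitted reference period, which is consistent with its absence from the table.

```stata
* Simulated 35-school, 8-period panel (hypothetical; treatment starts at time 5)
clear
set seed 2026
set obs 35
generate id = _n
generate byte treated = id >= 26
generate double school_fe = rnormal(0, 3)
expand 8
bysort id: generate byte time = _n
generate rel = time - 5                     // event time: -4 ... 3

* Event-time dummies for treated schools; lead 1 (rel == -1) is the reference
forvalues k = 2/4 {
    generate byte lead`k' = treated * (rel == -`k')
}
forvalues k = 0/3 {
    generate byte lag`k' = treated * (rel == `k')
}

* Assumed DGP: common linear trend plus a constant 25-point effect after rollout
generate double gpa = 60 + school_fe + 2*time + 25*treated*(rel >= 0) + rnormal(0, 1)

xtset id time
xtreg gpa lead4 lead3 lead2 lag0 lag1 lag2 lag3 i.time, fe vce(cluster id)

* Joint test that all pre-treatment leads are zero (parallel pre-trends)
test lead4 lead3 lead2
```

Leads close to zero with a joint test that fails to reject are what parallel pre-trends would look like; lags clustered around the true effect mirror the pattern in the table.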

Let the comparison group, not the calendar, tell you what the program did.