Interactive data dictionary

Conditional Average Treatment Effects (CATE) with Stata 19

The canonical 401(k) eligibility and household assets sample (assets3), for heterogeneous treatment-effect estimation.

1 dataset · 11 variables · 9,913 households · cross-section structure

Downloads

Each dataset is available as a labeled Stata .dta and its source file.

Download all data (ZIP) · stata_codebook.do

| Dataset | Grain | Rows × columns | Stata | Source |
| --- | --- | --- | --- | --- |
| assets3_raw | household (cross-section) | 9,913 × 11 | assets3_raw.dta | assets3_raw.csv |

Run stata_codebook.do in Stata once to attach long-form per-variable notes to the .dta files.

Load directly in code

Every file loads straight from GitHub (raw URLs). Swap the file name to load any dataset.

Stata

* Stata 14+ : `use` reads an https URL directly
global BASE "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/stata_cate/data/"
use "${BASE}assets3_raw.dta", clear
describe
notes

Python

# Colab/Jupyter only: the leading ! runs pip as a shell command
!pip install -q pyreadstat
import pandas as pd
BASE = "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/stata_cate/data/"
df = pd.read_stata(BASE + "assets3_raw.dta")

# load every dataset at once
files = ["assets3_raw"]
data = {f: pd.read_stata(BASE + f + ".dta") for f in files}

# pyreadstat (richest metadata) reads LOCAL files -> download first
import pyreadstat, urllib.request
urllib.request.urlretrieve(BASE + "assets3_raw.dta", "assets3_raw.dta")
df, meta = pyreadstat.read_dta("assets3_raw.dta")

To run this snippet, paste it into a new Google Colab notebook: https://colab.research.google.com/notebooks/empty.ipynb

R

# R : haven::read_dta auto-downloads an https URL
library(haven)
BASE <- "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/stata_cate/data/"
df <- read_dta(paste0(BASE, "assets3_raw.dta"))

Overview & sources

Companion data for a hands-on Stata tutorial on estimating Conditional Average Treatment Effects (CATE) with Stata 19's new cate command. The dataset is the canonical assets3 excerpt, an extract from Chernozhukov & Hansen (2004) that ships with Stata 19 (webuse assets3) and covers 9,913 U.S. households. The outcome is net total financial assets (US$). The treatment, e401k, is eligibility for an employer-offered 401(k) plan, not actual participation. The remaining columns are demographic and financial covariates along which the treatment effect may vary. The post contrasts partialing-out (PO) and augmented inverse-probability weighting (AIPW) estimators against a parametric teffects aipw benchmark, then probes heterogeneity with estat heterogeneity, estat projection, GATE on prespecified income groups, GATES on data-driven quartiles, estat classification, and a nonparametric estat series fit. In the post, the raw eligible-versus-ineligible gap of $19,557 shrinks to a doubly robust ATE near $8,000, and income emerges as the dominant moderator.

One file, cross-sectional. assets3_raw is a single cross-section — one row per household, no time dimension. It is exported verbatim from Stata 19's built-in assets3 sample by the post's do-file (export delimited); the string-coded columns (e.g. e401k, pension) carry Stata's value-label text rather than the underlying 0/1 codes.
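
Because the CSV stores label text rather than 0/1 codes, most estimators need the binary columns recoded first. The following is a minimal sketch of that step, not the post's own code. The label text (Eligible / Not eligible) comes from the variable dictionary; the three-row sample and the helper name label_to_indicator are hypothetical.

```python
import pandas as pd

# Hypothetical three-row sample that mimics the label text in assets3_raw.csv.
sample = pd.DataFrame({
    "e401k": ["Eligible", "Not eligible", "Eligible"],
    "assets": [12000.0, -500.0, 3000.0],
})

def label_to_indicator(series: pd.Series, positive_label: str) -> pd.Series:
    """Return 1 where the (whitespace-stripped) label equals positive_label, else 0.

    Assumption: the column has no missing values, as the coverage table
    reports; a missing label would be mapped to 0 by this comparison.
    """
    return (series.str.strip() == positive_label).astype(int)

sample["e401k_bin"] = label_to_indicator(sample["e401k"], "Eligible")
print(sample)
```

The same helper would apply to the other label-coded columns (for example pension with "Receives pension"), provided the positive label is spelled exactly as it appears in the file.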

Data sources

| Source | Provides | Reference / URL |
| --- | --- | --- |
| Chernozhukov & Hansen (2004) | Source study for the assets3 sample (401(k) eligibility and the household wealth distribution) | Chernozhukov, V., & Hansen, C. (2004). The effects of 401(k) participation on the wealth distribution: an instrumental quantile regression analysis. Review of Economics and Statistics, 86(3), 735–751. https://doi.org/10.1162/0034653041811734 |
| StataCorp (Stata 19) | Distribution of the analysis dataset (webuse assets3) and the cate estimation command | StataCorp. (2025). Stata 19 Causal Inference and Treatment-Effects Reference Manual: cate. https://www.stata.com/manuals/causal.pdf |
| Method references | CATE / heterogeneous-treatment-effect estimators | Athey, Tibshirani & Wager (2019); Chernozhukov et al. (2018); Robinson (1988). |

Cite this data

Please cite this dataset as follows.

APA

Mendez, C. (2026). Conditional Average Treatment Effects (CATE) with Stata 19 [Data set]. https://carlos-mendez.org/post/stata_cate/

Chernozhukov, V., & Hansen, C. (2004). The effects of 401(k) participation on the wealth distribution: an instrumental quantile regression analysis. Review of Economics and Statistics, 86(3), 735–751. https://doi.org/10.1162/0034653041811734

BibTeX

@misc{mendez2026statacate,
  author       = {Mendez, Carlos},
  title        = {Conditional Average Treatment Effects (CATE) with Stata 19},
  year         = {2026},
  howpublished = {\url{https://carlos-mendez.org/post/stata_cate/}},
  note         = {Data set}
}

@article{chernozhukov2004effects,
  author  = {Chernozhukov, Victor and Hansen, Christian},
  title   = {The effects of 401(k) participation on the wealth distribution: an instrumental quantile regression analysis},
  journal = {Review of Economics and Statistics},
  volume  = {86}, number = {3}, pages = {735--751}, year = {2004},
  doi     = {10.1162/0034653041811734}
}

Variable explorer (all 11 variables)

Each row gives the variable's type and, for continuous variables, a summary of its distribution (minimum, median, maximum).

| Variable | Type | Distribution | Label | Definition | Units | In files | Source |
| --- | --- | --- | --- | --- | --- | --- | --- |
| age | continuous | min 25, median 40, max 64 | Age of household head (years) | Age of the household head in years. | years | assets3_raw | Stata assets3 |
| assets | continuous | min -5.02e+05, median 1.5e+03, max 1.54e+06 | Net total financial assets (US$) | Household net total financial assets; the outcome variable. | US$ | assets3_raw | Chernozhukov & Hansen (2004) / Stata assets3 |
| e401k | identifier | | 401(k) eligibility (treatment) | Whether the household head's employer offers a 401(k) plan (eligibility, not participation); the treatment. | category | assets3_raw | Stata assets3 |
| educ | continuous | min 1, median 12, max 18 | Years of education | Years of completed education of the household head. | years | assets3_raw | Stata assets3 |
| income | continuous | min 0, median 3.15e+04, max 2.42e+05 | Household income (US$) | Annual household income. | US$ | assets3_raw | Stata assets3 |
| incomecat | identifier | | Income category (0–4) | Coarse household-income category used for the prespecified GATE groups. | 0–4 | assets3_raw | Stata assets3 |
| ira | identifier | | IRA participation | Whether the household holds an Individual Retirement Account (IRA). | category | assets3_raw | Stata assets3 |
| married | identifier | | Marital status | Marital status of the household head. | category | assets3_raw | Stata assets3 |
| ownhome | identifier | | Homeowner | Whether the household owns its home. | category | assets3_raw | Stata assets3 |
| pension | identifier | | Pension benefits status | Whether the household receives defined-benefit pension benefits. | category | assets3_raw | Stata assets3 |
| twoearn | identifier | | Two-earner household | Whether the household has two earners. | category | assets3_raw | Stata assets3 |

Cross-file variable index

Which file each variable appears in (● = present).

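| Variable | assets3_raw |
| --- | --- |
| age | ● |
| assets | ● |
| e401k | ● |
| educ | ● |
| income | ● |
| incomecat | ● |
| ira | ● |
| married | ● |
| ownhome | ● |
| pension | ● |
| twoearn | ● |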

Construction & formulas

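The estimand is the Conditional Average Treatment Effect: the average effect of 401(k) eligibility d on net assets y for households with covariate profile x. Writing y(1) and y(0) for a household's potential net assets with and without eligibility:

τ(x) = E[ y(1) − y(0) | x ]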

Stata 19's cate command estimates τ(x) with cross-fitted lasso nuisance models (the learners for the outcome and treatment functions) plus a generalized random forest for the individual-effect function, and an honest-tree bootstrap for inference. The PO (partialing-out, partial-linear) and AIPW (fully interactive) routes make different model assumptions, but both return a per-household τ̂(x_i). Note: the columns in assets3_raw.csv are the raw inputs only; the individualized average treatment effect (IATE) predictions τ̂_i are produced by the do-file and are not part of this dataset.
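
To make the AIPW idea concrete, here is a minimal sketch on simulated data. It is not Stata's cate implementation: it skips cross-fitting, lasso, and the random forest, fits the outcome models by plain least squares, and treats the propensity as known. The data-generating process, the helper fit_line, and all variable names are assumptions made for illustration. The sketch shows how a confounded raw gap can overstate the average effect, and how averaging the AIPW scores within prespecified groups gives GATE-style estimates.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Simulated data (hypothetical, not assets3): one confounder x drives both
# treatment uptake and the outcome, and the true effect is tau(x) = 1 + x.
x = rng.uniform(0.0, 1.0, n)
p = 0.2 + 0.6 * x                      # true propensity P(d = 1 | x), assumed known
d = rng.binomial(1, p)
y = 2.0 * x + d * (1.0 + x) + rng.normal(0.0, 1.0, n)

def fit_line(xs, ys):
    """Least-squares fit of ys on [1, xs]; returns a prediction function."""
    design = np.column_stack([np.ones_like(xs), xs])
    beta, *_ = np.linalg.lstsq(design, ys, rcond=None)
    return lambda z: beta[0] + beta[1] * z

# Nuisance functions: one outcome regression per treatment arm.
mu1 = fit_line(x[d == 1], y[d == 1])
mu0 = fit_line(x[d == 0], y[d == 0])

# AIPW score: regression contrast plus inverse-probability-weighted residuals.
score = (mu1(x) - mu0(x)
         + d * (y - mu1(x)) / p
         - (1 - d) * (y - mu0(x)) / (1 - p))

naive_gap = y[d == 1].mean() - y[d == 0].mean()   # confounded raw difference
ate_aipw = score.mean()                           # doubly robust ATE estimate

# GATE-style group averages over two prespecified bins of x.
low, high = x < 0.5, x >= 0.5
gate_low, gate_high = score[low].mean(), score[high].mean()
print(naive_gap, ate_aipw, gate_low, gate_high)
```

In this simulation the true average effect is 1.5, with true group effects of 1.25 and 1.75, so the raw gap should land above the AIPW estimate for the same reason the dictionary's raw gap exceeds the doubly robust ATE: treated units differ systematically in x.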

The datasets

Each dataset is documented with a full variable dictionary and a statistics table covering distributions and data coverage.


household (cross-section)  9,913 × 11 · single cross-section (no time dimension) · 9,913 U.S. households

Row key: one row per household (no explicit id column) · Use: estimate the CATE of 401(k) eligibility on net assets (PO / AIPW / GATE / GATES).

Variable dictionary

| Variable | Type | Label | Definition | Construction | Units | Source | Coverage |
| --- | --- | --- | --- | --- | --- | --- | --- |
| assets | continuous | Net total financial assets (US$) | Household net total financial assets; the outcome variable. | From the assets3 sample; net of liabilities, so values can be negative. | US$ | Chernozhukov & Hansen (2004) / Stata assets3 | 9,913 households |
| e401k | identifier | 401(k) eligibility (treatment) | Whether the household head's employer offers a 401(k) plan (eligibility, not participation); the treatment. | Value-label text from the assets3 sample (Eligible / Not eligible). | category | Stata assets3 | 3,682 Eligible; 6,231 Not eligible |
| age | continuous | Age of household head (years) | Age of the household head in years. | From the assets3 sample. | years | Stata assets3 | 9,913 households |
| educ | continuous | Years of education | Years of completed education of the household head. | From the assets3 sample. | years | Stata assets3 | 9,913 households |
| income | continuous | Household income (US$) | Annual household income. | From the assets3 sample. | US$ | Stata assets3 | 9,913 households |
| incomecat | identifier | Income category (0–4) | Coarse household-income category used for the prespecified GATE groups. | Discrete income bin from the assets3 sample (0 = lowest … 4 = highest). | 0–4 | Stata assets3 | 9,913 households |
| pension | identifier | Pension benefits status | Whether the household receives defined-benefit pension benefits. | Value-label text from the assets3 sample (Receives pension / No pension). | category | Stata assets3 | 9,913 households |
| married | identifier | Marital status | Marital status of the household head. | Value-label text from the assets3 sample (Married / Not married). | category | Stata assets3 | 9,913 households |
| twoearn | identifier | Two-earner household | Whether the household has two earners. | Value-label text from the assets3 sample (Yes / No). | category | Stata assets3 | 9,913 households |
| ira | identifier | IRA participation | Whether the household holds an Individual Retirement Account (IRA). | Value-label text from the assets3 sample (Yes / No). | category | Stata assets3 | 9,913 households |
| ownhome | identifier | Homeowner | Whether the household owns its home. | Value-label text from the assets3 sample (Yes / No). | category | Stata assets3 | 9,913 households |

Distribution & statistics

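| Variable | Distribution | Coverage | N | Distinct | Min | Mean | Median | Max | SD |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| assets | min -5.02e+05, median 1.5e+03, max 1.54e+06 | 100% | 9,913 | 5,168 | -502,302 | 18,054 | 1,499.0 | 1,536,798 | 63,529 |
| e401k | | 100% | 9,913 | 2 | | | | | |
| age | min 25, median 40, max 64 | 100% | 9,913 | 40 | 25.00 | 41.06 | 40.00 | 64.00 | 10.34 |
| educ | min 1, median 12, max 18 | 100% | 9,913 | 18 | 1.00 | 13.21 | 12.00 | 18.00 | 2.81 |
| income | min 0, median 3.15e+04, max 2.42e+05 | 100% | 9,913 | 7,332 | 0 | 37,208 | 31,488 | 242,124 | 24,771 |
| incomecat | | 100% | 9,913 | 5 | | | | | |
| pension | | 100% | 9,913 | 2 | | | | | |
| married | | 100% | 9,913 | 2 | | | | | |
| twoearn | | 100% | 9,913 | 2 | | | | | |
| ira | | 100% | 9,913 | 2 | | | | | |
| ownhome | | 100% | 9,913 | 2 | | | | | |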
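
A table like the one above can be recomputed from the loaded data. The sketch below is one way to do it with pandas, not the page's actual generator: the helper name summarize and the four-row toy frame are hypothetical, and the SD is the sample standard deviation (n − 1 denominator), which is an assumption about how the published figures were computed.

```python
import pandas as pd

def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column coverage, N, and distinct count, plus moments for numeric columns."""
    rows = []
    for name in df.columns:
        col = df[name]
        row = {
            "variable": name,
            "coverage_pct": 100.0 * col.notna().mean(),
            "n": int(col.notna().sum()),
            "distinct": int(col.nunique()),
        }
        # Moments only make sense for numeric columns; others are left missing.
        if pd.api.types.is_numeric_dtype(col):
            row.update(min=col.min(), mean=col.mean(), median=col.median(),
                       max=col.max(), sd=col.std())  # std() uses ddof=1
        rows.append(row)
    return pd.DataFrame(rows).set_index("variable")

# Hypothetical toy frame; replace with the DataFrame loaded from assets3_raw.
toy = pd.DataFrame({
    "age": [25, 40, 64, 40],
    "e401k": ["Eligible", "Not eligible", "Eligible", "Eligible"],
})
table = summarize(toy)
print(table)
```

Applied to the full file, the label-coded columns would show only coverage, N, and distinct counts, matching the blank cells in the table.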

Known limitations & caveats