Interactive data dictionary

Synthetic Control with Prediction Intervals

The classic German-reunification panel behind a Python SCPI tutorial: GDP per capita for 17 countries, 1960–2003.

17 countries · years 1960–2003 · 748 rows · 11 variables

Downloads

Each dataset is available as a labeled Stata .dta and its source file.

Download all data (ZIP) · stata_codebook.do

| Dataset | Grain | Rows | Stata | Source |
| --- | --- | --- | --- | --- |
| data | country-year | 748 × 11 | data.dta | data.csv |

Run stata_codebook.do in Stata once to attach long-form per-variable notes to the .dta files.

Load directly in code

Every file loads straight from GitHub (raw URLs). Swap the file name to load any dataset.

Stata

* Stata 14+ : `use` reads an https URL directly
global BASE "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/python_scpi/data/"
use "${BASE}data.dta", clear
describe
notes

Python

!pip install -q pyreadstat
import pandas as pd
BASE = "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/python_scpi/data/"
df = pd.read_stata(BASE + "data.dta")

# load every dataset at once
files = ["data"]
data = {f: pd.read_stata(BASE + f + ".dta") for f in files}

# pyreadstat (richest metadata) reads LOCAL files -> download first
import pyreadstat, urllib.request
urllib.request.urlretrieve(BASE + "data.dta", "data.dta")
df, meta = pyreadstat.read_dta("data.dta")

Copy and paste this snippet into a Google Colab notebook: https://colab.research.google.com/notebooks/empty.ipynb

R

# R : haven::read_dta auto-downloads an https URL
library(haven)
BASE <- "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/python_scpi/data/"
df <- read_dta(paste0(BASE, "data.dta"))

Overview & sources

Companion data for a hands-on Python tutorial that applies the synthetic control with prediction intervals (SCPI) framework of Cattaneo, Feng, and Titiunik (2021) to a classic question in political economy: did German reunification in 1990 reduce West Germany's GDP per capita, and how confident can we be in the estimate? The analysis treats West Germany as the treated unit and 16 OECD countries as the donor pool, builds a synthetic West Germany from 31 pre-treatment years (1960–1990) under a simplex constraint, and constructs prediction intervals that decompose uncertainty into in-sample (weight estimation) and out-of-sample (post-treatment) components with finite-sample coverage guarantees. The estimated gap grows to roughly −$3,465 per capita by 2003, and actual GDP falls below the 99% prediction interval in 7 of 13 post-treatment years.

One file. data.csv is an annual country panel (one row per country × year) covering 17 countries over 1960–2003 (748 rows). The tutorial uses only the country, year, and gdp columns (features=None). Of the remaining eight columns, index is a numeric country ID and the other seven are the original Abadie predictor covariates (inflation, trade, schooling, three decade-average investment ratios, industry share), carried verbatim from the source dataset and available for the post's covariate-adjustment exercise.
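The panel dimensions quoted above can be checked with a few lines of arithmetic. This is only a sketch: it assumes the panel is balanced (every country observed in every year) and that 1990 is the last pre-treatment year, as the overview states. The variable names are illustrative and do not come from the tutorial's code.

```python
# Arithmetic check of the panel dimensions described above.
# Assumptions: balanced panel; pre-treatment window is 1960-1990 inclusive.
n_countries = 17
years = range(1960, 2004)                     # 1960..2003 inclusive
pre_years = [y for y in years if y <= 1990]   # used to fit the weights
post_years = [y for y in years if y > 1990]   # used to read off the gap

n_rows = n_countries * len(years)

print(len(years), len(pre_years), len(post_years), n_rows)
# prints: 44 31 13 748
```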

Data sources

| Source | Provides | Reference / URL |
| --- | --- | --- |
| Abadie (2021) | The German-reunification panel: GDP per capita and OECD covariates for 17 countries (1960–2003) | Abadie, A. (2021). Using Synthetic Controls: Feasibility, Data Requirements, and Methodological Aspects. Journal of Economic Literature, 59(2), 391–425. https://doi.org/10.1257/jel.20191450 |
| scpi_pkg illustration data | Distributed form of the panel used here (the scpi Python package illustration scripts) | Cattaneo, M. D., Feng, Y., Palomba, F., & Titiunik, R. scpi_pkg. https://github.com/nppackages/scpi |
| Method references | Estimators and concepts | Abadie, Diamond & Hainmueller (2010, 2015); Cattaneo, Feng & Titiunik (2021). |

Cite this data

Please cite this dataset as follows.

APA

Mendez, C. (2026). Synthetic Control with Prediction Intervals: Quantifying Uncertainty in Germany's Reunification Impact [Data set]. https://carlos-mendez.org/post/python_scpi/

Abadie, A. (2021). Using Synthetic Controls: Feasibility, Data Requirements, and Methodological Aspects. Journal of Economic Literature, 59(2), 391–425.

Cattaneo, M. D., Feng, Y., & Titiunik, R. (2021). Prediction Intervals for Synthetic Control Methods. Journal of the American Statistical Association, 116(536), 1668–1683.

BibTeX

@misc{mendez2026pythonscpi,
  author       = {Mendez, Carlos},
  title        = {Synthetic Control with Prediction Intervals: Quantifying Uncertainty in Germany's Reunification Impact},
  year         = {2026},
  howpublished = {\url{https://carlos-mendez.org/post/python_scpi/}},
  note         = {Data set}
}

@article{abadie2021using,
  author  = {Abadie, Alberto},
  title   = {Using Synthetic Controls: Feasibility, Data Requirements, and Methodological Aspects},
  journal = {Journal of Economic Literature},
  volume  = {59}, number = {2}, pages = {391--425}, year = {2021}
}
@article{cattaneo2021prediction,
  author  = {Cattaneo, Matias D. and Feng, Yingjie and Titiunik, Rocio},
  title   = {Prediction Intervals for Synthetic Control Methods},
  journal = {Journal of the American Statistical Association},
  volume  = {116}, number = {536}, pages = {1668--1683}, year = {2021}
}

Variable explorer (all 11 variables)

| Variable | Type | Distribution | Label | Definition | Units | In files | Source |
| --- | --- | --- | --- | --- | --- | --- | --- |
| country | identifier | | Country name | Country identifier: the treated unit (West Germany) plus 16 OECD donor countries. | string | data | Abadie (2021) |
| gdp | continuous | min 0.707, median 10.3, max 37.5 | GDP per capita (thousand USD) | Real GDP per capita in thousands of US dollars; the outcome variable for the synthetic control. | thousand US$ | data | Abadie (2021) |
| index | identifier | | Country numeric ID | Numeric identifier for the country (from the source Abadie dataset; not sequential). | integer code | data | Abadie (2021) |
| industry | continuous | min 21.6, median 33.1, max 48 | Industry share of GDP (%) | Industry value added as a share of GDP, a structural predictor covariate. | % of GDP | data | Abadie (2021) |
| infrate | continuous | min -0.915, median 4.08, max 28.8 | Inflation rate (%) | Annual inflation rate (consumer prices), a predictor covariate in the original Abadie analysis. | % per year | data | Abadie (2021) |
| invest60 | continuous | min 0.201, median 0.278, max 0.373 | Investment ratio, 1960s average | Average investment-to-output ratio over the 1960s (time-invariant per country). | ratio | data | Abadie (2021) |
| invest70 | continuous | min 0.226, median 0.318, max 0.42 | Investment ratio, 1970s average | Average investment-to-output ratio over the 1970s (time-invariant per country). | ratio | data | Abadie (2021) |
| invest80 | continuous | min 17.6, median 26.5, max 35 | Investment ratio, 1980s average (%) | Average investment-to-output ratio over the 1980s (time-invariant per country). Stored in percent, unlike invest60 and invest70, which are stored as fractions. | % | data | Abadie (2021) |
| schooling | continuous | min 3.5, median 38, max 69.6 | Secondary schooling (%) | Share of the population with secondary schooling, a human-capital predictor covariate. | % of population | data | Abadie (2021) |
| trade | continuous | min 9.43, median 49.5, max 150 | Trade openness (% of GDP) | Trade (exports + imports) as a share of GDP, a predictor covariate. | % of GDP | data | Abadie (2021) |
| year | year | | Calendar year | Annual time index, 1960–2003. | year | data | Abadie (2021) |

Cross-file variable index

Which file each variable appears in: with a single file, all 11 variables appear in data.

Construction & formulas

The synthetic control builds a counterfactual West Germany as a weighted average of donor countries, then reads the treatment effect off the post-treatment gap.
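The construction described above can be sketched in a few lines. This is an illustration under stated assumptions, not the tutorial's estimation code: the donor names, GDP values, and weights below are made up, and in practice the weights are estimated from pre-treatment data rather than fixed by hand. The helper name `synthetic_control` is hypothetical.

```python
# Toy illustration of a synthetic control; all numbers are hypothetical.

def synthetic_control(weights, donors):
    """Weighted average of donor series; weights must lie on the simplex."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to one")
    n_periods = len(next(iter(donors.values())))
    return [
        sum(weights[name] * donors[name][t] for name in weights)
        for t in range(n_periods)
    ]

# Hypothetical post-treatment GDP per capita (thousand USD) for two donors.
donors = {"donor_a": [20.0, 22.0, 24.0], "donor_b": [10.0, 12.0, 14.0]}
weights = {"donor_a": 0.75, "donor_b": 0.25}   # assumed, not estimated
treated = [17.0, 18.5, 20.0]                   # hypothetical treated unit

synthetic = synthetic_control(weights, donors)

# Treatment effect estimate per period: actual minus synthetic.
gap = [y - s for y, s in zip(treated, synthetic)]
print(synthetic, gap)
```

A negative gap means the treated unit sits below its synthetic counterpart, which is the direction of the reunification effect reported in the overview.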

The datasets

Each dataset is listed with its full variable dictionary, followed by a statistics table with data coverage.

country-year  748 × 11 · 1960-2003 · 17 countries (West Germany + 16 OECD donors)

Panel key: country × year · Purpose: build a synthetic West Germany and estimate the reunification effect with prediction intervals.

Variable dictionary

| Variable (type) | Label | Definition | Construction | Units | Source | Coverage |
| --- | --- | --- | --- | --- | --- | --- |
| index (identifier) | Country numeric ID | Numeric identifier for the country (from the source Abadie dataset; not sequential). | Carried from the source dataset; constant within a country across years. | integer code | Abadie (2021) | all rows |
| country (identifier) | Country name | Country identifier: the treated unit (West Germany) plus 16 OECD donor countries. | Carried from the source dataset. | string | Abadie (2021) | 17 countries |
| year (year) | Calendar year | Annual time index, 1960–2003. | Carried from the source dataset. | year | Abadie (2021) | 1960–2003 (44 years) |
| gdp (continuous) | GDP per capita (thousand USD) | Real GDP per capita in thousands of US dollars; the outcome variable for the synthetic control. | Carried from the source dataset; the only outcome used in estimation (outcome_var='gdp'). | thousand US$ | Abadie (2021) | all rows (748) |
| infrate (continuous) | Inflation rate (%) | Annual inflation rate (consumer prices), a predictor covariate in the original Abadie analysis. | Carried from the source dataset; not used by this tutorial's headline estimation. | % per year | Abadie (2021) | 727 of 748 rows |
| trade (continuous) | Trade openness (% of GDP) | Trade (exports + imports) as a share of GDP, a predictor covariate. | Carried from the source dataset; available for the covariate-adjustment exercise. | % of GDP | Abadie (2021) | 646 of 748 rows |
| schooling (continuous) | Secondary schooling (%) | Share of the population with secondary schooling, a human-capital predictor covariate. | Carried from the source dataset; reported only for selected years (sparse). | % of population | Abadie (2021) | 151 of 748 rows (sparse) |
| invest60 (continuous) | Investment ratio, 1960s average | Average investment-to-output ratio over the 1960s (time-invariant per country). | Carried from the source dataset; one value per country (period average). | ratio | Abadie (2021) | 17 rows (one per country) |
| invest70 (continuous) | Investment ratio, 1970s average | Average investment-to-output ratio over the 1970s (time-invariant per country). | Carried from the source dataset; one value per country (period average). | ratio | Abadie (2021) | 17 rows (one per country) |
| invest80 (continuous) | Investment ratio, 1980s average (%) | Average investment-to-output ratio over the 1980s (time-invariant per country). Stored in percent, unlike invest60 and invest70, which are stored as fractions. | Carried from the source dataset; one value per country (period average). | % | Abadie (2021) | 17 rows (one per country) |
| industry (continuous) | Industry share of GDP (%) | Industry value added as a share of GDP, a structural predictor covariate. | Carried from the source dataset; available for the covariate-adjustment exercise. | % of GDP | Abadie (2021) | 541 of 748 rows |
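Two details in the dictionary above matter when using the investment covariates: each is recorded on only one row per country, and the invest80 values (17.6 to 35) appear to be on a percent scale while invest60 and invest70 (about 0.2 to 0.4) look like fractions. A minimal sketch of one way to handle both follows. The rows are made up, which year carries the value is an assumption, and the column name `invest80_frac` is hypothetical.

```python
# Hypothetical rows; None marks a missing value. All values are made up.
rows = [
    {"country": "A", "year": 1960, "invest80": None},
    {"country": "A", "year": 1980, "invest80": 27.0},
    {"country": "B", "year": 1960, "invest80": None},
    {"country": "B", "year": 1980, "invest80": 22.5},
]

# One non-missing value per country, rescaled from percent to a fraction
# so it is comparable with invest60 and invest70.
per_country = {
    r["country"]: r["invest80"] / 100
    for r in rows
    if r["invest80"] is not None
}

# Broadcast the country-level value to every row of that country.
for r in rows:
    r["invest80_frac"] = per_country[r["country"]]

print([r["invest80_frac"] for r in rows])
```

With pandas, the same broadcast is usually written as a group-wise transform; the plain-Python version is used here only to keep the sketch dependency-free.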

Distribution & statistics

| Variable | Coverage | N | Distinct | Min | Mean | Median | Max | SD |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| index | 100% | 748 | 17 | | | | | |
| country | 100% | 748 | 17 | | | | | |
| year | 100% | 748 | 44 | 1960 | 1981.5 | 1981 | 2003 | 12.71 |
| gdp | 100% | 748 | 739 | 0.707 | 12.14 | 10.26 | 37.55 | 8.95 |
| infrate | 97% | 727 | 726 | -0.915 | 5.87 | 4.08 | 28.78 | 5.13 |
| trade | 86% | 646 | 646 | 9.43 | 53.12 | 49.53 | 149.7 | 26.46 |
| schooling | 20% | 151 | 133 | 3.50 | 36.36 | 38.00 | 69.60 | 15.50 |
| invest60 | 2% | 17 | 17 | 0.201 | 0.287 | 0.278 | 0.373 | 0.045 |
| invest70 | 2% | 17 | 17 | 0.226 | 0.317 | 0.318 | 0.420 | 0.044 |
| invest80 | 2% | 17 | 17 | 17.59 | 25.96 | 26.49 | 34.99 | 4.28 |
| industry | 72% | 541 | 540 | 21.59 | 33.24 | 33.07 | 48.00 | 5.16 |
Known limitations & caveats