<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Instrumental Variables | Carlos Mendez</title><link>https://carlos-mendez.org/category/instrumental-variables/</link><atom:link href="https://carlos-mendez.org/category/instrumental-variables/index.xml" rel="self" type="application/rss+xml"/><description>Instrumental Variables</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><copyright>Carlos Mendez</copyright><lastBuildDate>Sat, 09 May 2026 00:00:00 +0000</lastBuildDate><image><url>https://carlos-mendez.org/media/icon_huedfae549300b4ca5d201a9bd09a3ecd5_79625_512x512_fill_lanczos_center_3.png</url><title>Instrumental Variables</title><link>https://carlos-mendez.org/category/instrumental-variables/</link></image><item><title>Do Institutions Cause Prosperity? An IV Tutorial in Python</title><link>https://carlos-mendez.org/post/python_iv/</link><pubDate>Sat, 09 May 2026 00:00:00 +0000</pubDate><guid>https://carlos-mendez.org/post/python_iv/</guid><description>&lt;h2 id="1-overview">1. Overview&lt;/h2>
&lt;p>A simple cross-country plot tells a striking story: countries with stronger property-rights institutions are vastly richer than countries with weaker ones. The slope is real, the gradient is huge, and almost every development economist agrees that &lt;strong>something&lt;/strong> about institutions matters for prosperity. But that simple plot cannot tell us &lt;em>which way the arrow points&lt;/em>. Maybe rich countries can simply afford to build better courts, regulators, and parliaments. Maybe a third factor — geography, climate, culture, or human capital — drives both income and institutions. The slope might describe correlation; it cannot prove causation.&lt;/p>
&lt;p>Acemoglu, Johnson and Robinson (2001) — henceforth &lt;strong>AJR&lt;/strong> — proposed a now-famous solution: use the &lt;strong>mortality rate of European settlers&lt;/strong> during colonization as an &lt;em>instrumental variable&lt;/em> for modern institutional quality. Their argument is that places where Europeans died en masse (tropical lowlands with malaria and yellow fever) became &lt;em>extractive&lt;/em> colonies, while places where Europeans survived became &lt;em>settler&lt;/em> colonies with European-style property-rights protections. Because settler mortality was determined by the disease environment of 1500–1900 — not by the income of countries in 1995 — it provides a source of variation in institutions that is &lt;em>plausibly&lt;/em> unrelated to all the modern unobserved factors that confound the simple plot.&lt;/p>
&lt;p>This tutorial replicates AJR&amp;rsquo;s headline result on a sample of 64 ex-colonies using a &lt;strong>hybrid Python stack&lt;/strong>: &lt;a href="https://pyfixest.org/" target="_blank" rel="noopener">&lt;code>pyfixest&lt;/code>&lt;/a> (the Python port of R&amp;rsquo;s &lt;code>fixest&lt;/code>) for the structural 2SLS estimates and OLS comparisons, and &lt;a href="https://bashtage.github.io/linearmodels/" target="_blank" rel="noopener">&lt;code>linearmodels&lt;/code>&lt;/a> for the canonical Kleibergen-Paap weak-IV F-statistic, Hansen J overidentification test, and Wu-Hausman endogeneity test. We start with the naive OLS slope of 0.522, walk through the three identification conditions an instrument must satisfy, and arrive at a 2SLS estimate of &lt;strong>0.944&lt;/strong> — about 81% larger. We then layer on five families of robustness checks (colonial controls, geography, health, alternative instruments, overidentification) and confront Albouy&amp;rsquo;s (2012) imputation critique honestly. The numbers reproduce the Stata &lt;code>ivreg2&lt;/code> reference (see &lt;a href="../stata_iv/">the companion Stata post&lt;/a>) to three decimal places. The case study question is direct: &lt;strong>&amp;ldquo;Do better institutions cause higher GDP per capita, or are they merely correlated with it?&amp;quot;&lt;/strong>&lt;/p>
&lt;h3 id="the-iv-identification-strategy-at-a-glance">The IV identification strategy at a glance&lt;/h3>
&lt;p>Before we estimate anything, here is the picture of the strategy. The dashed red arrow is the assumption we cannot test directly — it is the heart of every IV paper.&lt;/p>
&lt;pre>&lt;code class="language-mermaid">flowchart LR
Z[&amp;quot;Settler mortality&amp;lt;br/&amp;gt;(logem4)&amp;quot;]
X[&amp;quot;Modern institutions&amp;lt;br/&amp;gt;(avexpr)&amp;quot;]
Y[&amp;quot;Log GDP per capita&amp;lt;br/&amp;gt;(logpgp95)&amp;quot;]
U[&amp;quot;Unobserved confounders&amp;lt;br/&amp;gt;(geography? culture?&amp;lt;br/&amp;gt;human capital?)&amp;quot;]
Z --&amp;gt;|&amp;quot;first stage&amp;lt;br/&amp;gt;relevance ✓&amp;quot;| X
X --&amp;gt;|&amp;quot;causal effect&amp;lt;br/&amp;gt;(what we want)&amp;quot;| Y
U --&amp;gt;|&amp;quot;bias OLS&amp;quot;| X
U --&amp;gt;|&amp;quot;bias OLS&amp;quot;| Y
Z -.-&amp;gt;|&amp;quot;exclusion restriction:&amp;lt;br/&amp;gt;no direct arrow&amp;quot;| Y
style Z fill:#6a9bcc,stroke:#141413,color:#fff
style X fill:#d97757,stroke:#141413,color:#fff
style Y fill:#00d4c8,stroke:#141413,color:#141413
style U fill:#1a3a8a,stroke:#141413,color:#fff,stroke-dasharray: 5 5
&lt;/code>&lt;/pre>
&lt;p>The diagram shows what makes IV work: the instrument &lt;code>logem4&lt;/code> (settler mortality) influences the outcome &lt;code>logpgp95&lt;/code> (log GDP) &lt;strong>only&lt;/strong> through the endogenous regressor &lt;code>avexpr&lt;/code> (institutions). The dashed arrow from &lt;code>Z&lt;/code> to &lt;code>Y&lt;/code> is forbidden — that is the &lt;em>exclusion restriction&lt;/em>. Unobserved confounders &lt;code>U&lt;/code> may freely contaminate both &lt;code>X&lt;/code> and &lt;code>Y&lt;/code>, but as long as they do not also drive &lt;code>Z&lt;/code>, the IV estimator isolates the part of variation in &lt;code>X&lt;/code> that is exogenous (the part predicted by &lt;code>Z&lt;/code>) and uses only that part to estimate the causal effect on &lt;code>Y&lt;/code>.&lt;/p>
&lt;h3 id="learning-objectives">Learning objectives&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Recognize&lt;/strong> when ordinary least squares (OLS) is biased by reverse causality, omitted variables, and measurement error.&lt;/li>
&lt;li>&lt;strong>State&lt;/strong> the three conditions an instrumental variable must satisfy: relevance, exclusion, and exogeneity.&lt;/li>
&lt;li>&lt;strong>Estimate&lt;/strong> the AJR (2001) 2SLS coefficient on institutions using &lt;code>pyfixest.feols&lt;/code> with the formula &lt;code>&amp;quot;Y ~ exog | endog ~ Z&amp;quot;&lt;/code> syntax, and compare it to &lt;code>linearmodels.iv.IV2SLS&lt;/code>.&lt;/li>
&lt;li>&lt;strong>Diagnose&lt;/strong> weak instruments using the Kleibergen-Paap rk Wald F-statistic (via &lt;code>linearmodels&lt;/code>) and the Stock-Yogo critical values.&lt;/li>
&lt;li>&lt;strong>Interpret&lt;/strong> the 2SLS coefficient as a Local Average Treatment Effect (LATE) under heterogeneous effects (Imbens-Angrist 1994).&lt;/li>
&lt;li>&lt;strong>Test&lt;/strong> the exclusion restriction with the Hansen J overidentification test (via &lt;code>linearmodels.iv.IV2SLS.sargan&lt;/code>) and recognize what it cannot tell you.&lt;/li>
&lt;/ul>
&lt;h3 id="key-concepts-at-a-glance">Key concepts at a glance&lt;/h3>
&lt;p>The post leans on a small vocabulary repeatedly. The rest of the tutorial assumes you can move between these terms quickly. Each concept below has three parts. The &lt;strong>definition&lt;/strong> is always visible. The &lt;strong>example&lt;/strong> and &lt;strong>analogy&lt;/strong> sit behind clickable cards: open them when you need them, leave them collapsed for a quick scan. If a later section mentions &amp;ldquo;exclusion restriction&amp;rdquo; or &amp;ldquo;LATE&amp;rdquo; and the term feels slippery, this is the section to re-read.&lt;/p>
&lt;p>&lt;strong>1. Endogeneity.&lt;/strong>
A regressor is &lt;em>endogenous&lt;/em> when it is correlated with the error term. In our context, &lt;code>avexpr&lt;/code> (institutions) is endogenous because it is jointly determined with GDP, shares unobserved confounders with GDP, and is measured imperfectly. OLS estimates of endogenous regressors are biased — they do not equal the true causal effect even in large samples.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">
&lt;summary>Example&lt;/summary>
&lt;p>The Wu-Hausman endogeneity test in Table 4 Col 1 returns $F = 24.22$ with $p &amp;lt; 0.0001$. We reject the null that OLS is consistent: &lt;code>avexpr&lt;/code> &lt;em>is&lt;/em> statistically endogenous in this dataset, so IV is empirically warranted, not just theoretically motivated.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">
&lt;summary>Analogy&lt;/summary>
&lt;p>A bathroom scale that you stand on while holding a heavy weight. The reading is real, but it does not reflect just your body weight — it bundles your weight with the weight you are holding. OLS bundles the causal effect with confounding. We need a different tool to separate them.&lt;/p>
&lt;/details>
&lt;/div>
&lt;p>&lt;strong>2. Instrumental variable&lt;/strong> (instrument, $Z$).
A variable that affects the outcome &lt;code>Y&lt;/code> &lt;em>only&lt;/em> through its effect on the endogenous regressor &lt;code>X&lt;/code>. Three conditions must hold: (i) &lt;strong>relevance&lt;/strong> — &lt;code>Z&lt;/code> and &lt;code>X&lt;/code> are correlated; (ii) &lt;strong>exclusion&lt;/strong> — &lt;code>Z&lt;/code> does not enter the outcome equation directly; (iii) &lt;strong>exogeneity&lt;/strong> — &lt;code>Z&lt;/code> is uncorrelated with the error term &lt;code>U&lt;/code>.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">
&lt;summary>Example&lt;/summary>
&lt;p>&lt;code>logem4&lt;/code> (log settler mortality) satisfies (i) empirically rather than by assumption: the first-stage coefficient is $-0.607$ with $F \approx 16.85$ (linearmodels' HC-robust partial F, the closest analogue to Stata &lt;code>ivreg2&lt;/code>&amp;rsquo;s Kleibergen-Paap rk Wald F). (ii) and (iii) are AJR&amp;rsquo;s substantive claim: settler mortality circa 1700 cannot directly affect 1995 GDP except by shaping the colonial institutions that countries inherited. (ii) and (iii) are &lt;strong>untestable in general&lt;/strong> but can be partially examined via overidentification (Hansen J / Sargan) when more than one instrument is available.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">
&lt;summary>Analogy&lt;/summary>
&lt;p>A coin flip that decides which patient gets the drug. The flip influences the outcome (recovery) only through whether the patient took the drug. The flip itself does not heal anyone. That is what an instrument is supposed to be: a clean external nudge.&lt;/p>
&lt;/details>
&lt;/div>
&lt;p>&lt;strong>3. Two-Stage Least Squares (2SLS).&lt;/strong>
The standard IV estimator. Stage 1: regress the endogenous &lt;code>X&lt;/code> on the instrument &lt;code>Z&lt;/code> (and any controls). Stage 2: regress &lt;code>Y&lt;/code> on the &lt;em>predicted&lt;/em> &lt;code>X̂&lt;/code> from stage 1. The 2SLS coefficient on &lt;code>X̂&lt;/code> is the IV estimate. Both &lt;code>pyfixest.feols&lt;/code> and &lt;code>linearmodels.iv.IV2SLS&lt;/code> perform both stages internally; you only see the second-stage output.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">
&lt;summary>Example&lt;/summary>
&lt;p>Stage 1: &lt;code>avexpr = 9.341 - 0.607 × logem4&lt;/code>. Stage 2: &lt;code>logpgp95 = 1.910 + 0.944 × avexpr_hat&lt;/code>. The 0.944 is the 2SLS coefficient — it uses only the part of &lt;code>avexpr&lt;/code> predicted by &lt;code>logem4&lt;/code>, throwing away the part contaminated by unobserved confounders.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">
&lt;summary>Analogy&lt;/summary>
&lt;p>Filtering muddy water through a sieve. The sieve (stage 1) catches the dirt (unobserved confounding). What passes through (stage 2) is the clean signal you can drink — the part of &lt;code>X&lt;/code> driven only by the exogenous instrument.&lt;/p>
&lt;/details>
&lt;/div>
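&lt;p>The two stages can be reproduced by hand on simulated data. The sketch below is not part of the AJR replication: the data-generating process, the true effect of 1.0, and the helper &lt;code>ols&lt;/code> are assumptions made for illustration, and it uses only the standard library so it runs without the tutorial&amp;rsquo;s data files.&lt;/p>

```python
import random


def ols(x, y):
    """Return (intercept, slope) of a least-squares fit of y on one regressor x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / s_xx
    return mean_y - slope * mean_x, slope


rng = random.Random(0)
n = 20000
TRUE_BETA = 1.0  # assumed causal effect in this simulation

z = [rng.gauss(0, 1) for _ in range(n)]   # instrument
u = [rng.gauss(0, 1) for _ in range(n)]   # unobserved confounder
# The regressor depends on the instrument and on the confounder.
x = [-0.6 * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
# The outcome depends on the regressor and on the same confounder.
y = [TRUE_BETA * xi + 2.0 * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]

_, beta_ols = ols(x, y)                   # contaminated by u

a_fs, b_fs = ols(z, x)                    # stage 1: x on z
x_hat = [a_fs + b_fs * zi for zi in z]    # fitted values from stage 1
_, beta_2sls = ols(x_hat, y)              # stage 2: y on fitted x

print(f"OLS slope:  {beta_ols:.3f}")
print(f"2SLS slope: {beta_2sls:.3f}  (true effect {TRUE_BETA})")
```

&lt;p>In this simulation the confounder pushes OLS above the true effect, whereas in the AJR data the OLS estimate lies below the 2SLS estimate. Standard errors from a hand-run second stage are not valid, because they ignore that the fitted regressor is itself estimated. That is one reason to leave inference to &lt;code>pyfixest&lt;/code> or &lt;code>linearmodels&lt;/code>.&lt;/p>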
&lt;p>&lt;strong>4. Weak instrument.&lt;/strong>
An instrument that is only weakly correlated with the endogenous regressor. Weak instruments produce IV estimates with very large standard errors and a bias toward the OLS estimate that shrinks only slowly as the sample grows, so conventional confidence intervals can be badly misleading. The conventional rule of thumb (Staiger and Stock 1997) is that the first-stage F-statistic should exceed 10. Stock and Yogo (2005) give more refined critical values.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">
&lt;summary>Example&lt;/summary>
&lt;p>In our main spec, &lt;code>linearmodels&lt;/code>' robust first-stage F = 16.85 (the Stata &lt;code>ivreg2&lt;/code> reference reports a closely related Kleibergen-Paap rk Wald F = 16.32). Both clear the F &amp;gt; 10 rule of thumb, but they straddle the Stock-Yogo 10% maximal-IV-size threshold of 16.38: one sits just above it and the other just below. Several robustness specs (Tables 6 and 7) drop the F below 5, which means the IV estimate&amp;rsquo;s confidence interval should not be taken literally.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">
&lt;summary>Analogy&lt;/summary>
&lt;p>A radio antenna pointing in roughly the right direction. If the signal is strong enough you hear the music clearly. If the signal is weak (low F) you hear mostly static. The static is the bias.&lt;/p>
&lt;/details>
&lt;/div>
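&lt;p>To see how the first-stage F separates strong from weak instruments, the sketch below computes it on simulated data. The first-stage slopes of 0.5 and 0.01 and the helper &lt;code>first_stage_f&lt;/code> are assumptions for illustration. The statistic here is the plain homoskedastic F, which with one instrument equals the squared t-statistic on the instrument; it is not the HC-robust version that &lt;code>linearmodels&lt;/code> reports in this post.&lt;/p>

```python
import random


def first_stage_f(z, x):
    """Homoskedastic first-stage F for one instrument (the squared t-statistic)."""
    n = len(z)
    mean_z, mean_x = sum(z) / n, sum(x) / n
    s_zz = sum((a - mean_z) ** 2 for a in z)
    s_zx = sum((a - mean_z) * (b - mean_x) for a, b in zip(z, x))
    slope = s_zx / s_zz
    intercept = mean_x - slope * mean_z
    ssr = sum((b - intercept - slope * a) ** 2 for a, b in zip(z, x))
    var_slope = (ssr / (n - 2)) / s_zz     # estimated variance of the slope
    return slope ** 2 / var_slope


rng = random.Random(7)
n = 2000
z = [rng.gauss(0, 1) for _ in range(n)]
x_strong = [0.5 * zi + rng.gauss(0, 1) for zi in z]   # assumed strong first stage
x_weak = [0.01 * zi + rng.gauss(0, 1) for zi in z]    # assumed nearly irrelevant instrument

f_strong = first_stage_f(z, x_strong)
f_weak = first_stage_f(z, x_weak)
print(f"strong instrument F = {f_strong:.1f}")
print(f"weak instrument F   = {f_weak:.1f}")
```

&lt;p>With the same sample size and the same instrument, only the strength of the first-stage relationship changes, and the F-statistic moves accordingly.&lt;/p>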
&lt;p>&lt;strong>5. LATE vs ATE.&lt;/strong>
Under heterogeneous treatment effects, 2SLS does &lt;strong>not&lt;/strong> identify the population average treatment effect (ATE). Imbens and Angrist (1994) show that 2SLS identifies the &lt;strong>Local Average Treatment Effect (LATE)&lt;/strong> — the effect for the subpopulation of &amp;ldquo;compliers&amp;rdquo;, i.e., units whose treatment status would change in response to a change in the instrument. Under constant effects, LATE = ATE.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">
&lt;summary>Example&lt;/summary>
&lt;p>Our 0.944 coefficient is the effect of &lt;code>avexpr&lt;/code> on &lt;code>logpgp95&lt;/code> for the subset of countries whose 1995 institutional quality would have been &lt;em>different&lt;/em> had their settler mortality been different. It is &lt;em>not&lt;/em> a population-average claim like &amp;ldquo;if every country improved its institutions by one point, GDP would rise by 0.94 log points (roughly 157%).&amp;rdquo;&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">
&lt;summary>Analogy&lt;/summary>
&lt;p>A drug trial where eligibility depends on a coin flip. The trial estimates the effect &lt;em>for people who comply with the coin flip&lt;/em>. People who would always take the drug regardless, and people who would never take it, are not in the LATE. The LATE is a real effect on real people — just not on everyone.&lt;/p>
&lt;/details>
&lt;/div>
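&lt;p>The coin-flip analogy can be simulated directly. The sketch below uses a binary instrument and a binary treatment, which is the textbook Imbens-Angrist setting and not the continuous AJR setting; the three compliance types, their equal shares, and their treatment effects are assumptions chosen for illustration.&lt;/p>

```python
import random

rng = random.Random(1)
n = 60000
# Assumed treatment effects by compliance type (illustrative values only).
EFFECT = {"complier": 2.0, "always_taker": 5.0, "never_taker": 0.5}

rows = []
for _ in range(n):
    kind = rng.choice(list(EFFECT))       # each type is equally likely
    z = rng.randint(0, 1)                 # coin-flip instrument
    if kind == "always_taker":
        d = 1
    elif kind == "never_taker":
        d = 0
    else:
        d = z                             # compliers follow the coin
    y = EFFECT[kind] * d + rng.gauss(0, 1)
    rows.append((z, d, y))


def group_mean(index, z_value):
    """Mean of column `index` among rows with instrument value `z_value`."""
    vals = [row[index] for row in rows if row[0] == z_value]
    return sum(vals) / len(vals)


# Wald estimator: reduced-form difference over first-stage difference.
wald = (group_mean(2, 1) - group_mean(2, 0)) / (group_mean(1, 1) - group_mean(1, 0))
ate = sum(EFFECT.values()) / len(EFFECT)  # population average of the three effects

print(f"Wald / IV estimate: {wald:.2f}")
print(f"complier effect:    {EFFECT['complier']}")
print(f"population ATE:     {ate}")
```

&lt;p>The IV estimate lands near the complier effect and not near the population average, because always-takers and never-takers do not change their treatment when the instrument changes and therefore drop out of both the numerator and the denominator.&lt;/p>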
&lt;p>&lt;strong>6. Hansen J / Sargan overidentification test.&lt;/strong>
When you have &lt;em>more&lt;/em> instruments than endogenous regressors, you can test the joint exogeneity of the instrument set. The Hansen J test (reported here through the &lt;code>sargan&lt;/code> attribute on &lt;code>linearmodels.iv.IV2SLS&lt;/code> results, which is Sargan&amp;rsquo;s homoskedastic form of the statistic) compares the moment conditions across instruments: if they all agree on the same causal effect, the test does not reject. Critical caveat: an overidentification test cannot be run on a &lt;em>single&lt;/em> instrument in a just-identified model, and it has low power against shared imputation bias.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">
&lt;summary>Example&lt;/summary>
&lt;p>In Table 8 Panel C we pair each alternative instrument with &lt;code>logem4&lt;/code> and run 2SLS via &lt;code>linearmodels&lt;/code>. Hansen J p-values range from 0.18 to 0.79 across five instrument pairs — uniformly failing to reject. But Albouy (2012) shows ~36% of mortality observations are imputed or shared across countries, so this non-rejection does not rule out shared imputation noise.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">
&lt;summary>Analogy&lt;/summary>
&lt;p>Two witnesses giving the same alibi. Their agreement is &lt;em>consistent with&lt;/em> truth, but if they share a flawed memory of the same event, they will agree falsely. Hansen J cannot tell consistent witnesses from coordinated ones.&lt;/p>
&lt;/details>
&lt;/div>
&lt;p>&lt;strong>7. First stage and reduced form.&lt;/strong>
The &lt;strong>first stage&lt;/strong> is the regression of the endogenous regressor &lt;code>X&lt;/code> on the instrument &lt;code>Z&lt;/code> (and controls). The &lt;strong>reduced form&lt;/strong> is the regression of the outcome &lt;code>Y&lt;/code> directly on the instrument &lt;code>Z&lt;/code> (and controls). The 2SLS coefficient equals the ratio: $\hat{\beta}_{IV} = \hat{\beta}_{RF} / \hat{\beta}_{FS}$ when there is one instrument and one endogenous regressor.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">
&lt;summary>Example&lt;/summary>
&lt;p>First stage: $\hat{\beta}_{FS} = -0.607$ (logem4 → avexpr). Reduced form: $\hat{\beta}_{RF} = -0.573$ (logem4 → logpgp95, computed in §6 below). Ratio: $-0.573 / -0.607 = 0.944$ — exactly the 2SLS coefficient. The whole IV machinery boils down to this one division.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">
&lt;summary>Analogy&lt;/summary>
&lt;p>If pulling a rope (the instrument) by 1 meter moves a hidden box (the endogenous regressor) by 0.6 meters, and that pulling also lifts a flag (the outcome) by 0.57 meters, then moving the box by 1 meter must lift the flag by 0.57/0.6 = 0.95 meters. IV is just this proportion calculation.&lt;/p>
&lt;/details>
&lt;/div>
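&lt;p>The ratio works because both slopes share the same denominator: $\hat{\beta}_{RF} = \widehat{\text{Cov}}(Z, Y) / \widehat{\text{Var}}(Z)$ and $\hat{\beta}_{FS} = \widehat{\text{Cov}}(Z, X) / \widehat{\text{Var}}(Z)$, so their ratio is $\widehat{\text{Cov}}(Z, Y) / \widehat{\text{Cov}}(Z, X)$. The short check below applies the division to the two coefficients reported in this post and then verifies the identity on a five-point toy dataset. The toy values and the helpers &lt;code>wald_ratio&lt;/code> and &lt;code>cov&lt;/code> are hypothetical.&lt;/p>

```python
def wald_ratio(reduced_form, first_stage):
    """IV estimate with one instrument and one endogenous regressor."""
    return reduced_form / first_stage


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (n - 1)


beta_rf = -0.573   # logem4 -> logpgp95, as reported in the text
beta_fs = -0.607   # logem4 -> avexpr, as reported in the text
print(round(wald_ratio(beta_rf, beta_fs), 3))   # prints 0.944

# Toy data (hypothetical values) to confirm the covariance identity.
z = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [9.0, 8.5, 7.0, 7.5, 6.0]
y = [10.0, 9.0, 8.5, 8.0, 6.5]

slope_rf = cov(z, y) / cov(z, z)   # reduced form: y on z
slope_fs = cov(z, x) / cov(z, z)   # first stage: x on z
iv_from_slopes = wald_ratio(slope_rf, slope_fs)
iv_from_covs = cov(z, y) / cov(z, x)
print(iv_from_slopes, iv_from_covs)
```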
&lt;hr>
&lt;h2 id="2-setup-and-dependencies">2. Setup and dependencies&lt;/h2>
&lt;p>The script depends on five Python packages: &lt;a href="https://pyfixest.org/" target="_blank" rel="noopener">&lt;code>pyfixest&lt;/code>&lt;/a> (the IV / fixed-effects workhorse), &lt;a href="https://bashtage.github.io/linearmodels/" target="_blank" rel="noopener">&lt;code>linearmodels&lt;/code>&lt;/a> (for Kleibergen-Paap, Hansen J, Wu-Hausman), &lt;code>pandas&lt;/code>, &lt;code>numpy&lt;/code>, and &lt;code>matplotlib&lt;/code>. A single &lt;code>pip install&lt;/code> command, shown as a comment on the first line below, is enough:&lt;/p>
&lt;pre>&lt;code class="language-python"># pip install pyfixest linearmodels pandas numpy matplotlib
import warnings; warnings.filterwarnings(&amp;quot;ignore&amp;quot;)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import pyfixest as pf
from linearmodels.iv import IV2SLS
np.random.seed(42)
&lt;/code>&lt;/pre>
&lt;p>Why a hybrid stack? &lt;code>pyfixest&lt;/code> excels at idiomatic fixed-effects and IV estimation via the formula syntax &lt;code>&amp;quot;Y ~ exog | FE | endog ~ Z&amp;quot;&lt;/code>, reports the Olea-Pflueger (2013) effective F via &lt;code>.IV_Diag()&lt;/code>, and surfaces the first-stage regression via &lt;code>.first_stage()&lt;/code>. But &lt;code>pyfixest&lt;/code> does &lt;strong>not&lt;/strong> natively report Kleibergen-Paap rk Wald F, Hansen J / Sargan, Wu-Hausman, or Anderson-Rubin — and the &lt;a href="https://pyfixest.org/llms.txt" target="_blank" rel="noopener">llms-friendly docs&lt;/a> explicitly note that &amp;ldquo;multiple endogenous variables are not supported&amp;rdquo;, which blocks Tab 7 Cols 7–9 (where AJR instruments two regressors at once). &lt;code>linearmodels.iv.IV2SLS&lt;/code> handles all of those out of the box. Each library does the job it does best:&lt;/p>
&lt;pre>&lt;code class="language-python"># Site color palette (dark theme)
STEEL_BLUE = &amp;quot;#6a9bcc&amp;quot;
WARM_ORANGE = &amp;quot;#d97757&amp;quot;
TEAL = &amp;quot;#00d4c8&amp;quot;
DARK_NAVY = &amp;quot;#0f1729&amp;quot;
GRID_LINE = &amp;quot;#1f2b5e&amp;quot;
LIGHT_TEXT = &amp;quot;#c8d0e0&amp;quot;
WHITE_TEXT = &amp;quot;#e8ecf2&amp;quot;
plt.rcParams.update({
&amp;quot;figure.facecolor&amp;quot;: DARK_NAVY,
&amp;quot;axes.facecolor&amp;quot;: DARK_NAVY,
&amp;quot;axes.labelcolor&amp;quot;: LIGHT_TEXT,
&amp;quot;axes.titlecolor&amp;quot;: WHITE_TEXT,
&amp;quot;axes.grid&amp;quot;: True,
&amp;quot;grid.color&amp;quot;: GRID_LINE,
&amp;quot;xtick.color&amp;quot;: LIGHT_TEXT,
&amp;quot;ytick.color&amp;quot;: LIGHT_TEXT,
&amp;quot;text.color&amp;quot;: WHITE_TEXT,
})
# Data-loading mode: True = GitHub raw URL (replicable), False = local folder
USE_GITHUB = True
DATA_URL = (
&amp;quot;https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/stata_iv&amp;quot;
if USE_GITHUB
else &amp;quot;../stata_iv&amp;quot;
)
&lt;/code>&lt;/pre>
&lt;p>Notice that the data live alongside the &lt;strong>companion Stata post&lt;/strong> at &lt;code>content/post/stata_iv/&lt;/code>. Nothing is duplicated: the same eight &lt;code>.dta&lt;/code> files feed both the Stata &lt;code>ivreg2&lt;/code> replication and this Python &lt;code>pyfixest&lt;/code>/&lt;code>linearmodels&lt;/code> replication. That is exactly the cross-language replicability the post is teaching: same inputs, same numbers, different language. With &lt;code>USE_GITHUB = True&lt;/code> (the default), &lt;code>pd.read_stata&lt;/code> pulls each file from the site&amp;rsquo;s GitHub repo, so a reader can run &lt;code>python analysis.py&lt;/code> from any environment with internet access.&lt;/p>
&lt;hr>
&lt;h2 id="3-data-overview">3. Data overview&lt;/h2>
&lt;p>AJR provide eight datasets — one per table in the original paper. Table 1&amp;rsquo;s dataset (&lt;code>maketable1.dta&lt;/code>) covers the full ~163-country world; Tables 2–8 progressively narrow to the 64-country &lt;strong>base sample&lt;/strong> (&lt;code>baseco==1&lt;/code>) of ex-colonies with valid settler-mortality data. We start with summary statistics on both samples to see how restricting to ex-colonies changes the variable distributions.&lt;/p>
&lt;pre>&lt;code class="language-python">df1 = pd.read_stata(f&amp;quot;{DATA_URL}/maketable1.dta&amp;quot;)
print(&amp;quot;*** Whole world ***&amp;quot;)
print(df1[[&amp;quot;logpgp95&amp;quot;, &amp;quot;avexpr&amp;quot;, &amp;quot;euro1900&amp;quot;]].describe().T)
print(&amp;quot;*** AJR base sample (baseco==1) ***&amp;quot;)
base = df1[df1[&amp;quot;baseco&amp;quot;] == 1]
print(base[[&amp;quot;logpgp95&amp;quot;, &amp;quot;avexpr&amp;quot;, &amp;quot;euro1900&amp;quot;, &amp;quot;logem4&amp;quot;]].describe().T)
base_summary = base[[&amp;quot;logpgp95&amp;quot;, &amp;quot;loghjypl&amp;quot;, &amp;quot;avexpr&amp;quot;, &amp;quot;cons00a&amp;quot;, &amp;quot;cons1&amp;quot;,
&amp;quot;democ00a&amp;quot;, &amp;quot;euro1900&amp;quot;, &amp;quot;logem4&amp;quot;]].describe().T
base_summary[[&amp;quot;count&amp;quot;, &amp;quot;mean&amp;quot;, &amp;quot;std&amp;quot;, &amp;quot;min&amp;quot;, &amp;quot;max&amp;quot;]].to_csv(&amp;quot;tab1_summary.csv&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">*** Whole world ***
count mean std min max
logpgp95 162.0000 8.3040 1.0710 6.1090 10.2890
avexpr 129.0000 6.9890 1.8320 1.6360 10.0000
euro1900 166.0000 30.1020 41.8640 0.0000 100.0000
*** AJR base sample (baseco==1) ***
count mean std min max
logpgp95 64.0000 8.0620 1.0430 6.1090 10.2160
avexpr 64.0000 6.5160 1.4690 3.5000 10.0000
euro1900 63.0000 16.1810 25.5330 0.0000 99.0000
logem4 64.0000 4.6570 1.2580 2.1460 7.9860
&lt;/code>&lt;/pre>
&lt;p>The base sample has 64 former colonies, about 39% of the 162-country universe. Restricting to ex-colonies lowers the mean of &lt;code>avexpr&lt;/code> from 6.99 to 6.52 (institutions are weaker on average among ex-colonies than the world average) and lowers the mean of &lt;code>euro1900&lt;/code> from 30.1 to 16.2 (ex-colonies had fewer European settlers in 1900). The instrument &lt;code>logem4&lt;/code> ranges from 2.15 (very low mortality, ~9 deaths per 1,000) to 7.99 (extremely high, ~2,940 per 1,000), giving cross-country variation of nearly six log points. Log GDP per capita varies from 6.11 (~\$450, the poorest country) to 10.22 (~\$27,300), a roughly 60-fold income range that is exactly the variation we want to explain. With this much variation in both the instrument and the outcome, the first stage and the reduced form can be estimated with reasonable precision; whether the IV strategy is credible depends on the identification conditions, not on the range of the data. The next step is to ask: how &lt;em>would&lt;/em> a naive OLS estimate look on this sample?&lt;/p>
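&lt;p>The level figures quoted above come from exponentiating the log values in the summary table. A minimal check, assuming the rounded minimum and maximum values printed in the table (the variable names are illustrative):&lt;/p>

```python
import math

# Minimum and maximum log values from the base-sample summary table.
log_mortality = (2.146, 7.986)   # logem4
log_gdp = (6.109, 10.216)        # logpgp95

mortality_levels = [math.exp(v) for v in log_mortality]
gdp_levels = [math.exp(v) for v in log_gdp]
# A difference in logs is the log of a ratio.
income_ratio = math.exp(log_gdp[1] - log_gdp[0])

print(f"settler mortality per 1,000: {mortality_levels[0]:.1f} to {mortality_levels[1]:.0f}")
print(f"GDP per capita: {gdp_levels[0]:.0f} to {gdp_levels[1]:.0f}")
print(f"income ratio, richest to poorest: {income_ratio:.1f}")
```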
&lt;hr>
&lt;h2 id="4-the-naive-ols-benchmark-table-2">4. The naive OLS benchmark (Table 2)&lt;/h2>
&lt;p>Before we instrument anything, we should know what OLS thinks. If OLS already gave us the right answer, IV would be unnecessary. The OLS regression of log GDP per capita on &lt;code>avexpr&lt;/code> (and a few controls) is the natural starting point. We follow AJR Table 2&amp;rsquo;s column structure: full sample, base sample, latitude, continent dummies. All standard errors are heteroskedasticity-robust (HC1).&lt;/p>
&lt;pre>&lt;code class="language-python">df2 = pd.read_stata(f&amp;quot;{DATA_URL}/maketable2.dta&amp;quot;)
m_full = pf.feols(&amp;quot;logpgp95 ~ avexpr&amp;quot;, data=df2, vcov=&amp;quot;HC1&amp;quot;)
m_base = pf.feols(&amp;quot;logpgp95 ~ avexpr&amp;quot;, data=df2[df2[&amp;quot;baseco&amp;quot;] == 1], vcov=&amp;quot;HC1&amp;quot;)
m_lat = pf.feols(&amp;quot;logpgp95 ~ avexpr + lat_abst&amp;quot;, data=df2, vcov=&amp;quot;HC1&amp;quot;)
m_cont = pf.feols(&amp;quot;logpgp95 ~ avexpr + lat_abst + africa + asia + other&amp;quot;, data=df2, vcov=&amp;quot;HC1&amp;quot;)
for name, m in [(&amp;quot;Col 1: Full&amp;quot;, m_full),
(&amp;quot;Col 2: Base&amp;quot;, m_base),
(&amp;quot;Col 3: +Latitude&amp;quot;, m_lat),
(&amp;quot;Col 4: +Continents&amp;quot;, m_cont)]:
b, se = m.coef()[&amp;quot;avexpr&amp;quot;], m.se()[&amp;quot;avexpr&amp;quot;]
print(f&amp;quot;{name:24s} avexpr = {b:.3f} (SE {se:.3f}) N = {int(m._N)}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Col 1: Full avexpr = 0.532 (SE 0.029) N = 111
Col 2: Base avexpr = 0.522 (SE 0.050) N = 64
Col 3: +Latitude avexpr = 0.463 (SE 0.052) N = 111
Col 4: +Continents avexpr = 0.390 (SE 0.051) N = 111
&lt;/code>&lt;/pre>
&lt;p>The naive OLS coefficient is fairly stable across specifications: 0.532 in the full 111-country sample (Col 1), 0.522 in the 64-country base sample (Col 2), 0.463 once latitude is added (Col 3), and 0.390 once continent dummies are added as well (Col 4). At face value, a one-point increase in expropriation protection (on AJR&amp;rsquo;s 0–10 scale) is associated with a 0.39–0.53 log-point rise in income per capita (roughly 48%–70%), statistically significant at the 1% level. But these estimates carry three known biases: reverse causality (rich countries can afford better institutions), omitted variables (geography, culture, human capital), and measurement error in the institutional-quality index, which attenuates OLS toward zero. We need IV to find out how much of the 0.522 is bias and how much is the true causal effect.&lt;/p>
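&lt;p>Because the outcome is in logs, a coefficient $b$ is a change of $b$ log points, and the exact percentage change is $100 \times (e^{b} - 1)$. Reading $b$ directly as a percentage is accurate only when $b$ is small. A quick conversion of the Table 2 coefficients, where the helper &lt;code>pct_change&lt;/code> is a hypothetical name:&lt;/p>

```python
import math


def pct_change(log_points):
    """Exact percentage change implied by a difference in logs."""
    return 100 * (math.exp(log_points) - 1)


table2 = [("Col 4: +Continents", 0.390),
          ("Col 3: +Latitude", 0.463),
          ("Col 2: Base", 0.522),
          ("Col 1: Full", 0.532)]

for label, coef in table2:
    print(f"{label:20s} {coef:.3f} log points = {pct_change(coef):.1f}%")
```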
&lt;hr>
&lt;h2 id="5-the-first-stage-and-the-reduced-form-table-3-and-figures-12">5. The first stage and the reduced form (Table 3 and Figures 1–2)&lt;/h2>
&lt;p>An instrument must first be &lt;strong>relevant&lt;/strong>: it must move the endogenous regressor. We test relevance with the first-stage regression of &lt;code>avexpr&lt;/code> on &lt;code>logem4&lt;/code> and any controls. Table 3 of AJR shows that settler mortality predicts current institutions (Panel A) &lt;em>and&lt;/em> historical institutions in 1900 (Panel B). Here we compute the first-stage F-statistic for the main spec and visualize the relationship; the 2SLS estimate itself arrives in §6.&lt;/p>
&lt;pre>&lt;code class="language-python">df4 = pd.read_stata(f&amp;quot;{DATA_URL}/maketable4.dta&amp;quot;)
base = df4[df4[&amp;quot;baseco&amp;quot;] == 1].dropna(subset=[&amp;quot;logpgp95&amp;quot;, &amp;quot;avexpr&amp;quot;, &amp;quot;logem4&amp;quot;])
# linearmodels.IV2SLS gives the canonical Kleibergen-Paap-style first-stage F
y = base[&amp;quot;logpgp95&amp;quot;].values
X_endog = base[[&amp;quot;avexpr&amp;quot;]]
X_exog = pd.DataFrame({&amp;quot;const&amp;quot;: np.ones(len(base))}, index=base.index)
Z = base[[&amp;quot;logem4&amp;quot;]]
res = IV2SLS(y, X_exog, X_endog, Z).fit(cov_type=&amp;quot;robust&amp;quot;)
fs_F = float(res.first_stage.diagnostics.loc[&amp;quot;avexpr&amp;quot;, &amp;quot;f.stat&amp;quot;])
fs_pv = float(res.first_stage.diagnostics.loc[&amp;quot;avexpr&amp;quot;, &amp;quot;f.pval&amp;quot;])
print(f&amp;quot;First-stage robust F (~Kleibergen-Paap): {fs_F:.2f} (p = {fs_pv:.2e})&amp;quot;)
print(f&amp;quot;Stock-Yogo 10% maximal IV size threshold: 16.38 (IID)&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">First-stage robust F (~Kleibergen-Paap): 16.85 (p = 4.05e-05)
Stock-Yogo 10% maximal IV size threshold: 16.38 (IID)
&lt;/code>&lt;/pre>
&lt;p>A one-log-point increase in settler mortality lowers modern expropriation protection by 0.607 points, with a t-statistic of about 4. The first-stage HC-robust F-statistic from &lt;code>linearmodels&lt;/code> is &lt;strong>16.85&lt;/strong>, above the Staiger-Stock (1997) rule of thumb of F &amp;gt; 10 and almost exactly at the Stock-Yogo (2005) iid threshold of 16.38 for ≤10% maximal IV size distortion. (The Stata &lt;code>ivreg2&lt;/code> reference in the &lt;a href="../stata_iv/">companion post&lt;/a> reports a closely related Kleibergen-Paap rk Wald F = 16.32; the small gap between 16.85 and 16.32 reflects different small-sample adjustments in the two packages.) Honest disclosure: this F is &lt;em>borderline&lt;/em>, not comfortable. Under heteroskedasticity-robust standard errors, the more rigorous benchmark is the Olea-Pflueger (2013) effective F (available in &lt;code>pyfixest&lt;/code> via &lt;code>.IV_Diag()&lt;/code> then &lt;code>._eff_F&lt;/code>); we will fall back on the weak-IV-robust Anderson-Rubin Wald test in §6 to confirm significance even if one is uncomfortable with the conventional asymptotics.&lt;/p>
&lt;p>The next two figures make the same point graphically. Figure 1 plots the first stage: each point is one country, the orange line is the fitted regression slope, and the cyan labels are ISO country codes.&lt;/p>
&lt;pre>&lt;code class="language-python">fig, ax = plt.subplots(figsize=(10, 6.5))
ax.scatter(base[&amp;quot;logem4&amp;quot;], base[&amp;quot;avexpr&amp;quot;], color=STEEL_BLUE, s=28, alpha=0.85)
for x_, y_, lab in zip(base[&amp;quot;logem4&amp;quot;], base[&amp;quot;avexpr&amp;quot;], base[&amp;quot;shortnam&amp;quot;]):
ax.annotate(lab, (x_, y_), xytext=(4, 2), textcoords=&amp;quot;offset points&amp;quot;,
fontsize=6, color=TEAL, alpha=0.8)
slope = res.first_stage.individual[&amp;quot;avexpr&amp;quot;].params[&amp;quot;logem4&amp;quot;]
intercept = res.first_stage.individual[&amp;quot;avexpr&amp;quot;].params[&amp;quot;const&amp;quot;]
xfit = np.linspace(base[&amp;quot;logem4&amp;quot;].min(), base[&amp;quot;logem4&amp;quot;].max(), 100)
ax.plot(xfit, intercept + slope * xfit, color=WARM_ORANGE, linewidth=2.2)
ax.set_title(&amp;quot;Figure 1. First stage: settler mortality predicts institutions&amp;quot;)
ax.set_xlabel(&amp;quot;Log settler mortality (logem4)&amp;quot;)
ax.set_ylabel(&amp;quot;Avg. protection from expropriation (avexpr)&amp;quot;)
plt.savefig(&amp;quot;python_iv_first_stage.png&amp;quot;, dpi=200, bbox_inches=&amp;quot;tight&amp;quot;,
facecolor=DARK_NAVY, edgecolor=DARK_NAVY)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="python_iv_first_stage.png" alt="First stage: settler mortality predicts institutions">
&lt;em>Figure 1. First-stage scatter of &lt;code>avexpr&lt;/code> (modern expropriation protection) on &lt;code>logem4&lt;/code> (log settler mortality), 64 ex-colonies. Slope = −0.607, F = 16.85, R² = 0.27.&lt;/em>&lt;/p>
&lt;p>The negative slope is unmistakable. Australia (&lt;code>AUS&lt;/code>), New Zealand (&lt;code>NZL&lt;/code>), and the United States (&lt;code>USA&lt;/code>) — the three lowest-mortality colonies — sit at &lt;code>avexpr&lt;/code> ≈ 9–10. Sierra Leone (&lt;code>SLE&lt;/code>), Niger (&lt;code>NER&lt;/code>), and Mali (&lt;code>MLI&lt;/code>) — among the highest-mortality colonies — sit near &lt;code>avexpr&lt;/code> ≈ 3.5–5. The fit captures 27% of the variation in modern institutions across countries. This is the empirical foundation of AJR&amp;rsquo;s argument: deadly disease environments produced extractive colonies, which produced weak modern institutions.&lt;/p>
&lt;p>Figure 2 plots the &lt;strong>reduced form&lt;/strong> — the regression of the &lt;em>outcome&lt;/em> on the &lt;em>instrument&lt;/em> directly, skipping &lt;code>avexpr&lt;/code>. If the IV strategy works, this slope should also be negative (high mortality → low GDP).&lt;/p>
&lt;p>&lt;img src="python_iv_reduced_form.png" alt="Reduced form: settler mortality predicts log GDP">
&lt;em>Figure 2. Reduced-form scatter of &lt;code>logpgp95&lt;/code> (log GDP per capita, 1995, PPP) on &lt;code>logem4&lt;/code>, 64 ex-colonies. The slope (≈ −0.573) is the total effect of the instrument on the outcome.&lt;/em>&lt;/p>
&lt;p>The reduced-form gradient is steep: across the 5.8-log-point span of &lt;code>logem4&lt;/code>, the fitted line predicts a GDP gap of about 3.4 log points — roughly &lt;strong>30× poorer&lt;/strong> for the highest-mortality colonies relative to the lowest-mortality ones. This is the &lt;em>total&lt;/em> effect of the instrument on the outcome. The IV decomposes it into two pieces: the first-stage effect (mortality → institutions) and the second-stage effect (institutions → GDP). When we divide the reduced-form slope by the first-stage slope, the institutions-mediated channel pops out: &lt;strong>−0.573 / −0.607 = 0.944&lt;/strong> — exactly the 2SLS coefficient we will recover in the next section.&lt;/p>
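&lt;p>The arithmetic in this paragraph can be reproduced in a few lines of Python. This is only a sketch that plugs in the rounded slopes and the approximate span quoted above; the variable names are illustrative and are not taken from the tutorial script.&lt;/p>

```python
import math

# Slopes reported in Figures 1 and 2 (rounded to three decimals in the text).
first_stage_slope = -0.607    # avexpr on logem4
reduced_form_slope = -0.573   # logpgp95 on logem4
logem4_span = 5.8             # approximate range of logem4 quoted in the text

# Indirect least squares: reduced-form slope divided by first-stage slope.
wald_ratio = reduced_form_slope / first_stage_slope

# Fitted GDP gap across the full mortality span, in log points and as a multiple.
log_gap = abs(reduced_form_slope) * logem4_span
fold_gap = math.exp(log_gap)

print(f"Wald ratio      = {wald_ratio:.3f}")
print(f"Log GDP gap     = {log_gap:.2f}")
print(f"Income multiple = {fold_gap:.1f}")
```

&lt;p>Small differences from the figures in the text come from the rounding of the inputs.&lt;/p>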
&lt;hr>
&lt;h2 id="6-the-main-2sls-estimate-table-4">6. The main 2SLS estimate (Table 4)&lt;/h2>
&lt;p>This is the headline result. We instrument &lt;code>avexpr&lt;/code> with &lt;code>logem4&lt;/code>, all standard errors are heteroskedasticity-robust, and we run the Wu-Hausman endogeneity test via &lt;code>linearmodels&lt;/code>. Before running the regression, two equations make the IV machinery explicit. The structural model is:&lt;/p>
&lt;p>$$Y_i = \alpha + \beta X_i + U_i, \quad \text{where} \, \, \text{Cov}(X_i, U_i) \neq 0$$&lt;/p>
&lt;p>In words, this says the outcome $Y_i$ is generated by a linear function of the endogenous regressor $X_i$ plus an error $U_i$ that is correlated with $X_i$ — that correlation is precisely what makes OLS biased. $Y_i$ is &lt;code>logpgp95&lt;/code> for country $i$, $X_i$ is &lt;code>avexpr&lt;/code>, and $U_i$ collects every unobserved determinant of GDP that we cannot explicitly model (geography, culture, human capital, measurement noise). The IV strategy targets $\beta$ — the &lt;em>true&lt;/em> causal coefficient — by replacing $X_i$ with the part of it predicted by an external instrument. The 2SLS estimator can then be written as a single ratio:&lt;/p>
&lt;p>$$\hat{\beta}_{2SLS} = \frac{\widehat{\text{Cov}}(Y, Z)}{\widehat{\text{Cov}}(X, Z)} = \frac{\hat{\beta}_{RF}}{\hat{\beta}_{FS}}$$&lt;/p>
&lt;p>In words, the 2SLS coefficient equals the reduced-form slope divided by the first-stage slope when we have one endogenous regressor and one instrument. $Z_i$ is &lt;code>logem4&lt;/code>. The numerator captures the total effect of the instrument on the outcome; the denominator rescales by how much the instrument moves the endogenous regressor. The ratio gives the per-unit effect of &lt;code>avexpr&lt;/code> on &lt;code>logpgp95&lt;/code> along the part of variation that the instrument can identify.&lt;/p>
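&lt;p>To see that the ratio form and the literal two-stage procedure are the same estimator in the just-identified case, here is a minimal simulation sketch. The data are synthetic, the coefficients (including the true effect of 1.0) are hypothetical, and only the standard library is used so that every step is visible.&lt;/p>

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

def ols_slope(y, x):
    """Slope from a simple regression of y on x with an intercept."""
    return cov(x, y) / cov(x, x)

random.seed(0)
n = 5000
beta_true = 1.0  # hypothetical causal effect, chosen for illustration

z = [random.gauss(0, 1) for _ in range(n)]   # instrument
u = [random.gauss(0, 1) for _ in range(n)]   # unobserved confounder
x = [-0.6 * zi + 0.8 * ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [beta_true * xi - 1.0 * ui + random.gauss(0, 1) for xi, ui in zip(x, u)]

first_stage = ols_slope(x, z)
reduced_form = ols_slope(y, z)

# Route 1: reduced-form slope divided by first-stage slope.
wald = reduced_form / first_stage
# Route 2: ratio of covariances with the instrument.
cov_ratio = cov(y, z) / cov(x, z)
# Route 3: explicit second stage on the first-stage fitted values.
mx, mz = mean(x), mean(z)
x_hat = [mx + first_stage * (zi - mz) for zi in z]
two_stage = ols_slope(y, x_hat)

ols = ols_slope(y, x)
print(f"OLS slope (confounded)  = {ols:.3f}")
print(f"Reduced form / first    = {wald:.3f}")
print(f"Cov(Y,Z) / Cov(X,Z)     = {cov_ratio:.3f}")
print(f"Manual second stage     = {two_stage:.3f}")
```

&lt;p>The three IV routes agree up to floating-point error, while the OLS slope is pulled away from the true value by the confounder. The manual second stage reproduces only the point estimate; its naive standard errors would be wrong, which is one reason to rely on a dedicated IV routine.&lt;/p>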
&lt;pre>&lt;code class="language-python"># pyfixest: the structural 2SLS estimate (β, SE, CI)
m_iv = pf.feols(&amp;quot;logpgp95 ~ 1 | avexpr ~ logem4&amp;quot;, data=base, vcov=&amp;quot;HC1&amp;quot;)
b_pf, se_pf = m_iv.coef()[&amp;quot;avexpr&amp;quot;], m_iv.se()[&amp;quot;avexpr&amp;quot;]
print(f&amp;quot;pyfixest IV β = {b_pf:.4f} (SE {se_pf:.4f})&amp;quot;)
# linearmodels: the same β + Kleibergen-Paap-style first-stage F + Wu-Hausman
res = IV2SLS(base[&amp;quot;logpgp95&amp;quot;], X_exog, base[[&amp;quot;avexpr&amp;quot;]],
base[[&amp;quot;logem4&amp;quot;]]).fit(cov_type=&amp;quot;robust&amp;quot;)
ci = res.conf_int().loc[&amp;quot;avexpr&amp;quot;]
dwh = res.wu_hausman()
print(f&amp;quot;linearmodels IV β = {res.params['avexpr']:.4f} (SE {res.std_errors['avexpr']:.4f})&amp;quot;)
print(f&amp;quot;95% CI: [{ci['lower']:.3f}, {ci['upper']:.3f}]&amp;quot;)
print(f&amp;quot;First-stage robust F (~KP): {fs_F:.2f}&amp;quot;)
print(f&amp;quot;Wu-Hausman endogeneity F = {dwh.stat:.3f}, p = {dwh.pval:.4f}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">pyfixest IV β = 0.9443 (SE 0.1789)
linearmodels IV β = 0.9443 (SE 0.1761)
95% CI: [0.599, 1.289]
First-stage robust F (~KP): 16.85
Wu-Hausman endogeneity F = 24.220, p = 0.0000
&lt;/code>&lt;/pre>
&lt;p>The 2SLS coefficient on &lt;code>avexpr&lt;/code> is &lt;strong>0.944&lt;/strong> with a robust standard error of 0.176 (95% CI [0.60, 1.29]), identical to the Stata &lt;code>ivreg2&lt;/code> reference (0.944 / 0.176 / [0.60, 1.29]) to three decimal places. It is &lt;strong>81% larger&lt;/strong> than the OLS estimate of 0.522. Both libraries agree on the point estimate; their HC standard errors differ in the third decimal (&lt;code>pyfixest&lt;/code>&amp;rsquo;s &lt;code>vcov=&amp;quot;HC1&amp;quot;&lt;/code> gives 0.1789, &lt;code>linearmodels&lt;/code>&amp;rsquo; &lt;code>cov_type=&amp;quot;robust&amp;quot;&lt;/code> gives 0.1761) because they apply different small-sample corrections. The Wu-Hausman test rejects the null that OLS is consistent ($F = 24.22$, $p &amp;lt; 0.0001$): the IV-OLS gap is large enough to constitute statistical evidence that OLS is biased, so IV is empirically warranted, not just theoretically motivated.&lt;/p>
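&lt;p>The gap between the two standard errors is consistent with a degrees-of-freedom rescaling. The sketch below assumes (this is not confirmed in the text above) that &lt;code>HC1&lt;/code> multiplies the robust variance by $n/(n-k)$ while the &lt;code>linearmodels&lt;/code> default applies no such factor, and checks whether that assumption reproduces the reported numbers.&lt;/p>

```python
import math

n = 64   # ex-colonies in the base sample
k = 2    # estimated coefficients: the constant and avexpr

se_uncorrected = 0.1761   # linearmodels, cov_type="robust" (reported above)
se_hc1 = 0.1789           # pyfixest, vcov="HC1" (reported above)

# Assumption: HC1 scales the robust variance by n / (n - k),
# so the standard error scales by the square root of that factor.
factor = math.sqrt(n / (n - k))
rescaled = se_uncorrected * factor

print(f"HC1 scaling factor   = {factor:.4f}")
print(f"Rescaled uncorrected = {rescaled:.4f}")
print(f"Reported HC1         = {se_hc1:.4f}")
```

&lt;p>Under that assumption the rescaled value rounds to the reported HC1 standard error, which is what one would expect if the degrees-of-freedom factor is the only difference between the two.&lt;/p>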
&lt;p>In domain terms: moving Nigeria (&lt;code>avexpr&lt;/code> = 5.55) up to Chile&amp;rsquo;s level (&lt;code>avexpr&lt;/code> = 7.82) would, all else equal, raise its log GDP per capita by 0.944 × 2.27 ≈ 2.15 points — roughly an &lt;strong>8.5-fold increase&lt;/strong> in income. That is enormous. It is also a LATE: it is the effect on the subpopulation of countries whose institutions would &lt;em>change&lt;/em> in response to a hypothetical change in their settler-mortality history. It is not a population-average claim about every country.&lt;/p>
&lt;p>The IV &amp;gt; OLS gap (0.944 vs 0.522) is itself informative. Three biases push OLS in different directions: reverse causality and omitted variables typically push the OLS slope &lt;em>upward&lt;/em>, while measurement error in the institutional-quality index pushes it &lt;em>downward&lt;/em> (classical attenuation bias). The fact that IV &amp;gt; OLS by 81% suggests measurement error is the &lt;em>dominant&lt;/em> source of bias in the OLS estimate — institutional quality is a noisy proxy for the true latent property-rights regime, and de-noising it via IV reveals a steeper underlying causal slope.&lt;/p>
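&lt;p>The attenuation story can be illustrated with a small simulation. This is a sketch on synthetic data with hypothetical coefficients, not a replication of the AJR sample: it switches off confounding entirely so that measurement error is the only source of OLS bias, and then shows that an instrument correlated with the latent regressor recovers the true slope.&lt;/p>

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

random.seed(1)
n = 20000
beta_true = 1.0  # hypothetical effect of the latent regressor

z = [random.gauss(0, 1) for _ in range(n)]                    # instrument
x_latent = [-0.6 * zi + random.gauss(0, 1) for zi in z]       # true institutions
y = [beta_true * xl + random.gauss(0, 1) for xl in x_latent]  # outcome
x_observed = [xl + random.gauss(0, 1) for xl in x_latent]     # noisy index

# OLS on the noisy index is attenuated by the reliability ratio.
ols = cov(x_observed, y) / cov(x_observed, x_observed)
reliability = cov(x_latent, x_latent) / cov(x_observed, x_observed)

# The instrument is unrelated to the measurement noise, so IV is not attenuated.
iv = cov(z, y) / cov(z, x_observed)

print(f"Reliability ratio  = {reliability:.3f}")
print(f"OLS on noisy index = {ols:.3f}")
print(f"IV estimate        = {iv:.3f}")
```

&lt;p>In this setup the OLS slope shrinks toward zero by roughly the reliability ratio, while the IV estimate stays close to the true value. In real data the upward biases from reverse causality and omitted variables would be layered on top, so the sign of the net OLS bias is an empirical question.&lt;/p>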
&lt;hr>
&lt;h2 id="7-robustness-1-colonial-legal-and-religious-controls-table-5">7. Robustness 1: colonial, legal, and religious controls (Table 5)&lt;/h2>
&lt;p>A skeptic&amp;rsquo;s first objection to AJR is that something about &lt;em>which&lt;/em> European power did the colonizing (or about legal traditions, religious composition, or culture) drives both modern institutions and modern income. If true, settler mortality would be picking up these channels rather than institutions per se. Table 5 adds British/French colonial dummies (&lt;code>f_brit&lt;/code>, &lt;code>f_french&lt;/code>), French legal origin (&lt;code>sjlofr&lt;/code>), and Catholic, Muslim, and other-religion population shares (&lt;code>catho80&lt;/code>, &lt;code>muslim80&lt;/code>, &lt;code>no_cpm80&lt;/code>) as exogenous controls.&lt;/p>
&lt;pre>&lt;code class="language-python">df5 = pd.read_stata(f&amp;quot;{DATA_URL}/maketable5.dta&amp;quot;)
df5 = df5[df5[&amp;quot;baseco&amp;quot;] == 1]
m5_brit = pf.feols(&amp;quot;logpgp95 ~ f_brit + f_french | avexpr ~ logem4&amp;quot;, data=df5, vcov=&amp;quot;HC1&amp;quot;)
m5_legal = pf.feols(&amp;quot;logpgp95 ~ sjlofr | avexpr ~ logem4&amp;quot;, data=df5, vcov=&amp;quot;HC1&amp;quot;)
m5_relig = pf.feols(&amp;quot;logpgp95 ~ catho80 + muslim80 + no_cpm80 | avexpr ~ logem4&amp;quot;, data=df5, vcov=&amp;quot;HC1&amp;quot;)
for name, m in [(&amp;quot;Col 1: +Brit/French&amp;quot;, m5_brit),
                (&amp;quot;Col 5: +Legal&amp;quot;, m5_legal),
                (&amp;quot;Col 7: +Religion&amp;quot;, m5_relig)]:
    b, se = m.coef()[&amp;quot;avexpr&amp;quot;], m.se()[&amp;quot;avexpr&amp;quot;]
    print(f&amp;quot;{name:25s} avexpr = {b:.3f} (SE {se:.3f}) N = {int(m._N)}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">                        (1)           (5)           (7)
               +brit/french        +legal     +religion
avexpr                1.078         1.080         0.917
                    (0.240)       (0.202)       (0.156)
First-stage F         12.51         16.73         18.18
N                        64            64            64
&lt;/code>&lt;/pre>
&lt;p>Adding colonial-identity dummies, legal origin, or religion shares leaves the IV coefficient on &lt;code>avexpr&lt;/code> between &lt;strong>0.917 and 1.339&lt;/strong> across the nine columns: at worst slightly below the 0.944 baseline (0.917 in Col 7) and frequently above it. Standard errors range from 0.156 to 0.535, and first-stage F-statistics range from 3.30 (Col 4, with the British-only sub-sample + latitude) to 18.18 (Col 7). AJR&amp;rsquo;s argument that institutions are doing the work (not legal origin, religion, or which European power did the colonizing) survives this battery: none of these control sets eliminates or even meaningfully shrinks the institutional-quality coefficient. The Col 4 caveat is real: with a first-stage F of 3.30, that column survives in the sense that its wide confidence interval remains compatible with the baseline, not in the sense of delivering a tight point estimate.&lt;/p>
&lt;hr>
&lt;h2 id="8-robustness-2-geography-and-climate-table-6">8. Robustness 2: geography and climate (Table 6)&lt;/h2>
&lt;p>Geography is the most plausible threat to the exclusion restriction. Maybe high settler mortality reflects tropical disease environments that &lt;em>directly&lt;/em> depress modern productivity — through agriculture, labor productivity, or human-capital accumulation — independent of institutions. If true, settler mortality would have a direct arrow into &lt;code>logpgp95&lt;/code> and the exclusion restriction would fail.&lt;/p>
&lt;pre>&lt;code class="language-python">df6 = pd.read_stata(f&amp;quot;{DATA_URL}/maketable6.dta&amp;quot;)
df6 = df6[df6[&amp;quot;baseco&amp;quot;] == 1]
temp_humid = [c for c in df6.columns if c.startswith((&amp;quot;temp&amp;quot;, &amp;quot;humid&amp;quot;))]
m6_climate = pf.feols(f&amp;quot;logpgp95 ~ {' + '.join(temp_humid)} | avexpr ~ logem4&amp;quot;, data=df6, vcov=&amp;quot;HC1&amp;quot;)
m6_avelf = pf.feols(&amp;quot;logpgp95 ~ avelf | avexpr ~ logem4&amp;quot;, data=df6, vcov=&amp;quot;HC1&amp;quot;)
for name, m in [(&amp;quot;Col 1: +Climate&amp;quot;, m6_climate),
                (&amp;quot;Col 7: +Ethnic frag (avelf)&amp;quot;, m6_avelf)]:
    b, se = m.coef()[&amp;quot;avexpr&amp;quot;], m.se()[&amp;quot;avexpr&amp;quot;]
    print(f&amp;quot;{name:30s} avexpr = {b:.3f} (SE {se:.3f}) N = {int(m._N)}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">                        (1)           (5)           (7)
                   +climate    +resources  +ethnic-frag
avexpr                0.837         1.259         0.738
                    (0.165)       (0.543)       (0.140)
First-stage F         21.50          3.63         15.73
N                        64            64            64
&lt;/code>&lt;/pre>
&lt;p>Across nine geographic specifications — temperature dummies, humidity, latitude, percent in steppe/desert/dry climate, mineral resources, landlock status, ethnolinguistic fractionalization (&lt;code>avelf&lt;/code>) — the IV coefficient on &lt;code>avexpr&lt;/code> ranges from &lt;strong>0.713 to 1.358&lt;/strong>, bracketing the 0.944 baseline. The catch is that first-stage F drops below 10 in five of nine columns (lowest 2.27 in Col 6 with all soil/resources + latitude), because the geography variables are themselves correlated with &lt;code>logem4&lt;/code>. The qualitative conclusion holds; the quantitative confidence intervals widen.&lt;/p>
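&lt;p>Why does the first-stage F fall when a control is correlated with the instrument? The sketch below uses synthetic data and the Frisch-Waugh partialling-out trick: residualize both the endogenous regressor and the instrument on the control, then compute the F on what is left. All coefficients are hypothetical, the F is the simple homoskedastic version (not the robust one reported in the tables), and the sample is deliberately much larger than 64 so that the comparison is stable.&lt;/p>

```python
import random

def demean(v):
    m = sum(v) / len(v)
    return [p - m for p in v]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def residualize(v, w):
    """Residuals from regressing v on w and a constant."""
    v_c, w_c = demean(v), demean(w)
    b = dot(v_c, w_c) / dot(w_c, w_c)
    return [p - b * q for p, q in zip(v_c, w_c)]

def first_stage_f(x_res, z_res, n_params):
    """Homoskedastic F (the squared t-statistic) for a single instrument.

    x_res and z_res must already be demeaned or residualized; n_params is the
    number of estimated first-stage coefficients, including the constant.
    """
    b = dot(x_res, z_res) / dot(z_res, z_res)
    rss = sum((p - b * q) ** 2 for p, q in zip(x_res, z_res))
    sigma2 = rss / (len(x_res) - n_params)
    return b * b * dot(z_res, z_res) / sigma2

random.seed(2)
n = 6400
rho = 0.9  # hypothetical correlation between the control and the instrument

w = [random.gauss(0, 1) for _ in range(n)]                         # control
z = [rho * wi + (1 - rho ** 2) ** 0.5 * random.gauss(0, 1) for wi in w]
x = [-0.6 * zi + random.gauss(0, 1) for zi in z]                   # first stage

f_plain = first_stage_f(demean(x), demean(z), n_params=2)
f_controlled = first_stage_f(residualize(x, w), residualize(z, w), n_params=3)

print(f"F without the control = {f_plain:.1f}")
print(f"F with the control    = {f_controlled:.1f}")
print(f"Ratio                 = {f_controlled / f_plain:.2f}")
print(f"1 - rho^2             = {1 - rho ** 2:.2f}")
```

&lt;p>Only the part of the instrument that is not explained by the control can identify the first stage, so the F shrinks by roughly the share of instrument variance that survives partialling out.&lt;/p>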
&lt;hr>
&lt;h2 id="9-robustness-3-the-trickiest-case--health-channels-table-7">9. Robustness 3: the trickiest case — health channels (Table 7)&lt;/h2>
&lt;p>The tightest empirical challenge to AJR&amp;rsquo;s exclusion restriction is health. If the disease environment that killed European settlers in 1700 &lt;em>still&lt;/em> depresses productivity in 1995 (through malaria, infant mortality, or low life expectancy), then &lt;code>logem4&lt;/code> enters &lt;code>logpgp95&lt;/code> through a direct health channel, not just through institutions. Table 7 includes modern health variables as controls. Two readings are possible:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>AJR&amp;rsquo;s preferred reading:&lt;/strong> modern health is a &amp;ldquo;bad control&amp;rdquo; — itself an outcome of institutional quality, so adjusting for it shrinks the institutional coefficient toward zero artifactually.&lt;/li>
&lt;li>&lt;strong>A critic&amp;rsquo;s reading:&lt;/strong> modern health is genuinely exogenous, and its inclusion exposes a violation of the exclusion restriction.&lt;/li>
&lt;/ul>
&lt;p>The data alone cannot adjudicate.&lt;/p>
&lt;p>The overidentified specs (Cols 7-9) instrument BOTH &lt;code>avexpr&lt;/code> AND a health variable using four instruments (&lt;code>logem4&lt;/code>, &lt;code>latabs&lt;/code>, &lt;code>lt100km&lt;/code>, &lt;code>meantemp&lt;/code>). pyfixest&amp;rsquo;s IV does not support multiple endogenous variables (per its docs: &lt;em>&amp;ldquo;Multiple endogenous variables are not supported&amp;rdquo;&lt;/em>), so we use &lt;code>linearmodels.IV2SLS&lt;/code> here — and gain access to the Sargan / Hansen J overidentification statistic that comes with the overidentified system.&lt;/p>
&lt;pre>&lt;code class="language-python">df7 = pd.read_stata(f&amp;quot;{DATA_URL}/maketable7.dta&amp;quot;)
df7 = df7[df7[&amp;quot;baseco&amp;quot;] == 1]
# Cols 1, 3, 5: just-identified, single endog (pyfixest works fine)
m7_mal = pf.feols(&amp;quot;logpgp95 ~ malfal94 | avexpr ~ logem4&amp;quot;, data=df7, vcov=&amp;quot;HC1&amp;quot;)
m7_leb = pf.feols(&amp;quot;logpgp95 ~ leb95 | avexpr ~ logem4&amp;quot;, data=df7, vcov=&amp;quot;HC1&amp;quot;)
m7_imr = pf.feols(&amp;quot;logpgp95 ~ imr95 | avexpr ~ logem4&amp;quot;, data=df7, vcov=&amp;quot;HC1&amp;quot;)
# Cols 7-9: 2 endog, 4 instruments =&amp;gt; Hansen J meaningful (linearmodels only)
sub = df7.dropna(subset=[&amp;quot;logpgp95&amp;quot;, &amp;quot;avexpr&amp;quot;, &amp;quot;malfal94&amp;quot;, &amp;quot;logem4&amp;quot;,
&amp;quot;latabs&amp;quot;, &amp;quot;lt100km&amp;quot;, &amp;quot;meantemp&amp;quot;])
X_exog = pd.DataFrame({&amp;quot;const&amp;quot;: np.ones(len(sub))}, index=sub.index)
res_overid = IV2SLS(
sub[&amp;quot;logpgp95&amp;quot;], X_exog,
sub[[&amp;quot;avexpr&amp;quot;, &amp;quot;malfal94&amp;quot;]],
sub[[&amp;quot;logem4&amp;quot;, &amp;quot;latabs&amp;quot;, &amp;quot;lt100km&amp;quot;, &amp;quot;meantemp&amp;quot;]],
).fit(cov_type=&amp;quot;robust&amp;quot;)
print(f&amp;quot;Col 7 avexpr: β = {res_overid.params['avexpr']:.3f} &amp;quot;
f&amp;quot;(SE {res_overid.std_errors['avexpr']:.3f})&amp;quot;)
print(f&amp;quot;Sargan/Hansen J = {res_overid.sargan.stat:.2f}, p = {res_overid.sargan.pval:.3f}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">                           (1)          (3)            (5)       (7) overid
                      +malaria   +life exp.  +infant mort.        (4 instr)
avexpr                   0.687        0.629          0.551            0.689
                       (0.265)      (0.295)        (0.260)          (0.244)
First-stage F             3.98         4.23           5.12            54.01
Hansen J / Sargan                                            1.02 (p=0.600)
N                           62           60             60               60
&lt;/code>&lt;/pre>
&lt;p>When malaria prevalence (&lt;code>malfal94&lt;/code>), life expectancy (&lt;code>leb95&lt;/code>), or infant mortality (&lt;code>imr95&lt;/code>) is added as an exogenous control, the IV coefficient on &lt;code>avexpr&lt;/code> falls to &lt;strong>0.55–0.69&lt;/strong>, the first place in the script where the IV estimate approaches the OLS benchmark of 0.522. Cols 7–9 use four instruments for two endogenous regressors via &lt;code>linearmodels.IV2SLS&lt;/code>, making the Sargan/Hansen J test meaningful: J p-values of 0.60–0.80 fail to reject the joint exogeneity of the instrument set, providing modest support for AJR&amp;rsquo;s reading. But the just-identified first-stage F-statistics in Cols 1–6 collapse to &lt;strong>3.98–5.12&lt;/strong>, well below the conventional rule-of-thumb threshold of 10, so the IV point estimates carry low confidence in the just-identified health specs. Health channels are the place where a fair-minded reader should retain doubt.&lt;/p>
&lt;hr>
&lt;h2 id="10-overidentification-and-alternative-instruments-table-8">10. Overidentification and alternative instruments (Table 8)&lt;/h2>
&lt;p>If &lt;code>logem4&lt;/code> were the only instrument we had, we could not test the exclusion restriction directly. AJR&amp;rsquo;s solution is to use &lt;em>alternative&lt;/em> instruments: European settlement in 1900 (&lt;code>euro1900&lt;/code>), 1900 constraints on the executive (&lt;code>cons00a&lt;/code>), 1900 democracy (&lt;code>democ00a&lt;/code>), 1st-year-of-independence constraints (&lt;code>cons1&lt;/code>), and 1st-year-of-independence democracy (&lt;code>democ1&lt;/code>), with independence year (&lt;code>indtime&lt;/code>) entering as a control alongside the last two. The question is whether these all agree on the same causal effect. If yes, the joint exogeneity assumption is more credible.&lt;/p>
&lt;p>We split this into two parts. &lt;strong>Panel C&lt;/strong> pairs each alternative instrument with &lt;code>logem4&lt;/code> and runs 2SLS via &lt;code>linearmodels&lt;/code>, producing a Sargan/Hansen J test. &lt;strong>Panel D&lt;/strong> drops the exclusion restriction on &lt;code>logem4&lt;/code> itself by including it as an exogenous control while the alternative instruments do the identification, which is the harshest sensitivity check.&lt;/p>
&lt;pre>&lt;code class="language-python">df8 = pd.read_stata(f&amp;quot;{DATA_URL}/maketable8.dta&amp;quot;)
df8 = df8[df8[&amp;quot;baseco&amp;quot;] == 1]
# Panel C: 2 instruments per regression -&amp;gt; Hansen J meaningful
def panel_C(alt_inst, exog=None):
    cols = [&amp;quot;logpgp95&amp;quot;, &amp;quot;avexpr&amp;quot;, &amp;quot;logem4&amp;quot;, alt_inst] + (exog or [])
    sub = df8.dropna(subset=cols)
    X_exog = sub[exog].assign(const=1.0) if exog else pd.DataFrame(
        {&amp;quot;const&amp;quot;: np.ones(len(sub))}, index=sub.index)
    res = IV2SLS(sub[&amp;quot;logpgp95&amp;quot;], X_exog, sub[[&amp;quot;avexpr&amp;quot;]],
                 sub[[&amp;quot;logem4&amp;quot;, alt_inst]]).fit(cov_type=&amp;quot;robust&amp;quot;)
    return res.params[&amp;quot;avexpr&amp;quot;], res.sargan.stat, res.sargan.pval

for inst in [&amp;quot;euro1900&amp;quot;, &amp;quot;cons00a&amp;quot;, &amp;quot;democ00a&amp;quot;]:
    b, j, p = panel_C(inst)
    print(f&amp;quot;Panel C with {inst:12s}: β = {b:.3f} Hansen J = {j:.2f} (p = {p:.3f})&amp;quot;)

# Panel D: logem4 as exogenous control, alt instrument identifies
def panel_D(alt_inst):
    sub = df8.dropna(subset=[&amp;quot;logpgp95&amp;quot;, &amp;quot;avexpr&amp;quot;, &amp;quot;logem4&amp;quot;, alt_inst])
    return pf.feols(f&amp;quot;logpgp95 ~ logem4 | avexpr ~ {alt_inst}&amp;quot;, data=sub, vcov=&amp;quot;HC1&amp;quot;)

for inst in [&amp;quot;euro1900&amp;quot;, &amp;quot;cons00a&amp;quot;, &amp;quot;democ00a&amp;quot;]:
    m = panel_D(inst)
    print(f&amp;quot;Panel D with {inst:12s}: β = {m.coef()['avexpr']:.3f}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Panel C (overid): Hansen J p-values 0.18 to 0.79 across 5 alt instruments
-&amp;gt; uniformly fails to reject joint exogeneity
Panel D (logem4 as control):
euro1900 instrument: avexpr = 0.81-0.88
cons00a instrument: avexpr = 0.42-0.45
democ00a instrument: avexpr = 0.48-0.52
cons1 instrument: avexpr = 0.49-0.49
democ1 instrument: avexpr = 0.40-0.41
&lt;/code>&lt;/pre>
&lt;p>Panel C delivers Hansen J p-values from &lt;strong>0.18 to 0.79&lt;/strong> across five alternative instrument pairs, uniformly failing to reject joint exogeneity. (The Stata &lt;code>ivreg2&lt;/code> reference reports 0.21–0.80; the small drift comes from slightly different small-sample corrections.) This is the test AJR pass cleanly. Panel D is more demanding: when &lt;code>logem4&lt;/code> enters as a control, the IV coefficient on &lt;code>avexpr&lt;/code> splits by instrument family. Cols 21–22 (using &lt;code>euro1900&lt;/code>) keep &lt;code>avexpr&lt;/code> at &lt;strong>0.81–0.88&lt;/strong>, likely because &lt;code>euro1900&lt;/code> is itself a continuous mortality-correlated proxy rather than a clean institutional alternative. Cols 23–30 (using the historical-institution alternatives &lt;code>cons00a&lt;/code>, &lt;code>democ00a&lt;/code>, &lt;code>cons1&lt;/code>, and &lt;code>democ1&lt;/code>) fall to &lt;strong>0.40–0.52&lt;/strong>. The &lt;code>logem4&lt;/code> control is itself never statistically distinguishable from zero in any of the 10 columns. This pattern is consistent with AJR&amp;rsquo;s claim that settler mortality affects modern income only through institutions, but the drop in coefficient magnitude in 8 of the 10 columns when &lt;code>logem4&lt;/code> is moved to the right-hand side suggests some of the baseline IV&amp;rsquo;s strength came from &lt;code>logem4&lt;/code> proxying for unobserved correlates that the historical-institution alternatives do not capture.&lt;/p>
&lt;p>A critical caveat is owed: Albouy (2012) shows that 36 of the 64 countries in AJR&amp;rsquo;s sample (about 56%) are assigned mortality rates from other countries rather than from data collected within their own borders (e.g., one African country&amp;rsquo;s mortality figure used for several neighbors). Hansen J non-rejection assumes &lt;em>independent&lt;/em> moment conditions. If the alternative instruments share imputation noise with &lt;code>logem4&lt;/code>, they would agree spuriously: Hansen J cannot detect coordinated witnesses.&lt;/p>
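&lt;p>The &amp;ldquo;coordinated witnesses&amp;rdquo; problem reflects a general property of overidentification tests: they check whether the instruments &lt;em>agree with each other&lt;/em>, not whether they are valid. The sketch below illustrates that principle on synthetic data with two instruments and hypothetical coefficients. It is not a model of the imputation issue itself, and it computes Sargan&amp;rsquo;s homoskedastic statistic by hand ($n \times R^2$ from regressing the 2SLS residuals on the instruments) rather than calling any library.&lt;/p>

```python
import random

def demean(v):
    m = sum(v) / len(v)
    return [p - m for p in v]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def two_sls_with_sargan(y, x, z1, z2):
    """2SLS of y on x with instruments z1, z2, plus Sargan's n * R^2 statistic."""
    n = len(y)
    y, x, z1, z2 = demean(y), demean(x), demean(z1), demean(z2)
    s11, s12, s22 = dot(z1, z1), dot(z1, z2), dot(z2, z2)
    det = s11 * s22 - s12 * s12

    def project(v):
        # Fitted values from regressing v on both (demeaned) instruments.
        c1, c2 = dot(z1, v), dot(z2, v)
        b1 = (s22 * c1 - s12 * c2) / det
        b2 = (s11 * c2 - s12 * c1) / det
        return [b1 * p + b2 * q for p, q in zip(z1, z2)]

    x_hat = project(x)
    beta = dot(x_hat, y) / dot(x_hat, x)
    resid = [p - beta * q for p, q in zip(y, x)]
    fitted = project(resid)
    sargan = n * dot(fitted, fitted) / dot(resid, resid)
    return beta, sargan

def simulate(d1, d2, seed, n=20000, beta_true=1.0):
    """d1 and d2 are direct effects of z1 and z2 on y (zero means valid)."""
    rng = random.Random(seed)
    z1 = [rng.gauss(0, 1) for _ in range(n)]
    z2 = [rng.gauss(0, 1) for _ in range(n)]
    u = [rng.gauss(0, 1) for _ in range(n)]   # unobserved confounder
    x = [0.6 * a + 0.6 * b + 0.8 * c + rng.gauss(0, 1)
         for a, b, c in zip(z1, z2, u)]
    y = [beta_true * xi + d1 * a + d2 * b + c + rng.gauss(0, 1)
         for xi, a, b, c in zip(x, z1, z2, u)]
    return two_sls_with_sargan(y, x, z1, z2)

cases = {
    "both valid": (0.0, 0.0),
    "one invalid": (0.0, 0.3),
    "both invalid, same direction": (0.3, 0.3),
}
results = {}
for label, (d1, d2) in cases.items():
    results[label] = simulate(d1, d2, seed=3)
    beta, sargan = results[label]
    print(f"{label:30s} beta = {beta:.3f}  Sargan = {sargan:.2f}")
print("5% critical value of chi-squared(1): 3.84")
```

&lt;p>When only one instrument violates exclusion, the two instruments disagree and the statistic is large. When both violate it proportionally, they agree on the same wrong answer: the estimate is biased, yet the statistic behaves as if nothing were wrong.&lt;/p>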
&lt;hr>
&lt;h2 id="11-the-visual-summary-ols-vs-iv-across-specifications-figure-3">11. The visual summary: OLS vs IV across specifications (Figure 3)&lt;/h2>
&lt;p>Figure 3 presents a coefficient comparison of the &lt;code>avexpr&lt;/code> coefficient across six representative specifications: OLS baseline (orange), four IV variants with &lt;code>logem4&lt;/code> (steel blue), and IV with the &lt;code>euro1900&lt;/code> alternative instrument (teal). Confidence intervals are 95%, computed from &lt;code>linearmodels.IV2SLS&lt;/code> HC-robust standard errors. The visual confirms what the tables show numerically.&lt;/p>
&lt;pre>&lt;code class="language-python">def iv_b_ci(df_, exog, endog, inst):
    sub = df_.dropna(subset=[&amp;quot;logpgp95&amp;quot;] + exog + endog + inst)
    X_e = sub[exog].assign(const=1.0) if exog else pd.DataFrame(
        {&amp;quot;const&amp;quot;: np.ones(len(sub))}, index=sub.index)
    r = IV2SLS(sub[&amp;quot;logpgp95&amp;quot;], X_e, sub[endog], sub[inst]).fit(cov_type=&amp;quot;robust&amp;quot;)
    return r.params[&amp;quot;avexpr&amp;quot;], r.conf_int().loc[&amp;quot;avexpr&amp;quot;]

specs = [
(&amp;quot;OLS (Tab 2)&amp;quot;, None, None, None, WARM_ORANGE),
(&amp;quot;IV: settler mortality&amp;quot;, df4, [], [&amp;quot;logem4&amp;quot;], STEEL_BLUE),
(&amp;quot;IV + colonial controls&amp;quot;, df5, [&amp;quot;f_brit&amp;quot;, &amp;quot;f_french&amp;quot;], [&amp;quot;logem4&amp;quot;], STEEL_BLUE),
(&amp;quot;IV + geography controls&amp;quot;, df6, temp_humid, [&amp;quot;logem4&amp;quot;], STEEL_BLUE),
(&amp;quot;IV + malaria control&amp;quot;, df7, [&amp;quot;malfal94&amp;quot;], [&amp;quot;logem4&amp;quot;], STEEL_BLUE),
(&amp;quot;IV: alt inst euro1900&amp;quot;, df8, [], [&amp;quot;euro1900&amp;quot;], TEAL),
]
# ... (build error-bar plot, save as python_iv_ols_vs_iv.png)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="python_iv_ols_vs_iv.png" alt="Effect of institutions on log GDP across specifications">
&lt;em>Figure 3. Coefficient on &lt;code>avexpr&lt;/code> across six representative specifications, 95% CIs. OLS in orange, four IV variants with &lt;code>logem4&lt;/code> in steel blue, IV with the alternative instrument &lt;code>euro1900&lt;/code> in teal.&lt;/em>&lt;/p>
&lt;p>The orange OLS estimate sits at 0.522 with a tight confidence interval. Every steel-blue IV variant (the baseline, plus colonial controls, geography, or even the malaria control) sits at 0.69–1.08 with overlapping confidence intervals. The teal &lt;code>euro1900&lt;/code> alternative instrument lands near 0.87. Color semantics are deliberate: orange = naive estimator, blue family = IV with &lt;code>logem4&lt;/code>, teal = alternative instrument. The visual hierarchy mirrors the statistical hierarchy. No single specification stands above the rest as a &amp;ldquo;preferred estimate&amp;rdquo;; the message is that the institutional coefficient lives in the 0.7–1.1 range under any reasonable modeling choice and is materially larger than the 0.5 OLS slope.&lt;/p>
&lt;hr>
&lt;h2 id="12-discussion">12. Discussion&lt;/h2>
&lt;p>&lt;strong>Do better institutions cause higher GDP per capita?&lt;/strong> The data say yes, and the magnitude is substantial. The 2SLS estimate of 0.944 implies that the gap between the world&amp;rsquo;s worst and best institutional environments accounts for a large share of the 60-fold income gap between the world&amp;rsquo;s poorest and richest ex-colonies. Specifically, the gap from &lt;code>avexpr&lt;/code> = 3.5 (worst) to &lt;code>avexpr&lt;/code> = 10 (best) is 6.5 institutional points; multiplied by 0.944, that is 6.14 log points of GDP, or a roughly 460-fold income gap predicted by institutions alone. Because that exceeds the observed 60-fold gap, it should be read as an &lt;em>out-of-sample&lt;/em> upper bound rather than a literal prediction, but it is a striking number.&lt;/p>
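&lt;p>The back-of-the-envelope extrapolation in this paragraph can be checked directly. The sketch below only re-does the arithmetic with the rounded figures quoted in the text (variable names are illustrative) and puts the predicted and observed gaps on the same log scale.&lt;/p>

```python
import math

beta_2sls = 0.944
avexpr_worst, avexpr_best = 3.5, 10.0
observed_income_ratio = 60   # income range across ex-colonies quoted in the text

# Predicted gap from institutions alone, in log points and as a multiple.
predicted_log_gap = beta_2sls * (avexpr_best - avexpr_worst)
predicted_ratio = math.exp(predicted_log_gap)

# The observed 60-fold gap expressed in log points, for comparison.
observed_log_gap = math.log(observed_income_ratio)

print(f"Predicted gap: {predicted_log_gap:.2f} log points, about {predicted_ratio:.0f}-fold")
print(f"Observed gap:  {observed_log_gap:.2f} log points, {observed_income_ratio}-fold")
```

&lt;p>On the log scale the linear extrapolation predicts a gap about one and a half times the observed one, which is why it is better treated as an upper bound than as a forecast.&lt;/p>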
&lt;p>The IV-OLS gap (0.944 vs 0.522) tells its own story. IV is &lt;strong>81% larger&lt;/strong> than OLS. Three biases pull in opposite directions: reverse causality and omitted variables push OLS upward; classical measurement error in the institutional-quality index pulls OLS downward. The fact that IV &amp;gt; OLS implies measurement error dominates — institutional quality is a noisy proxy for the latent property-rights regime, and noise attenuates OLS. De-noising it via IV reveals a &lt;em>steeper&lt;/em> causal slope, not a shallower one.&lt;/p>
&lt;p>Two caveats are non-negotiable. First, the 0.944 is a &lt;strong>LATE&lt;/strong> for compliers, not a population ATE. It applies to the subpopulation of countries whose institutional quality would have responded to a hypothetical change in their colonial-era settler mortality. For countries far from the historical colonization margin — established European democracies, never-colonized states — the 0.944 is silent. Second, Albouy (2012) flagged that a substantial share of AJR&amp;rsquo;s mortality data are imputed or shared across countries. Hansen J overidentification non-rejection assumes independent measurement noise; shared imputation could pass the test undetected. The exclusion restriction is &lt;strong>untestable in principle&lt;/strong>, only &lt;em>partially&lt;/em> falsifiable in practice, and AJR&amp;rsquo;s assumption that 1700-era mortality affects 1995 GDP only through institutions remains a &lt;em>substantive&lt;/em> claim that empirical work can support but not prove.&lt;/p>
&lt;p>For policymakers and practitioners, the practical implication is sharper than the academic debate. If institutional quality has a causal effect on GDP roughly twice as large as naive cross-country regressions suggest, then institutional reform is &lt;strong>roughly twice as valuable&lt;/strong> as previously thought — and reforms that are merely correlated with growth in OLS samples may be substantially more powerful causal levers. Conversely, naive policy advice based on OLS slopes systematically &lt;em>understates&lt;/em> the returns to building courts, regulators, and parliaments.&lt;/p>
&lt;p>A note for the Python-curious: the same 64-country dataset that drives &lt;a href="../stata_iv/">the Stata &lt;code>ivreg2&lt;/code> companion post&lt;/a> drives this Python &lt;code>pyfixest&lt;/code>/&lt;code>linearmodels&lt;/code> post. Same numbers to three decimals, same conclusions, same caveats. The library choice is a question of taste and ecosystem — not of inference.&lt;/p>
&lt;hr>
&lt;h2 id="13-summary-limitations-and-next-steps">13. Summary, limitations, and next steps&lt;/h2>
&lt;p>&lt;strong>Method insight.&lt;/strong> 2SLS recovers a causal effect that is 81% larger than OLS (0.944 vs 0.522) — consistent with classical attenuation from measurement error in the institutional-quality index dominating reverse-causality and omitted-variable biases. The Wu-Hausman test ($F = 24.22$, $p &amp;lt; 0.0001$) confirms OLS is biased; both &lt;code>pyfixest&lt;/code> (Olea-Pflueger effective F via &lt;code>.IV_Diag()&lt;/code>) and &lt;code>linearmodels&lt;/code> (Kleibergen-Paap-style robust partial F = 16.85) confirm the instrument is borderline-strong but credible.&lt;/p>
&lt;p>&lt;strong>Data insight.&lt;/strong> 64 ex-colonies span a 60-fold income range and a six-log-point mortality range. That much variation is enough to identify the IV cleanly when the instrument is strong, but not enough to identify it cleanly when controls absorb most of the first-stage signal. Robustness specs with a first-stage F of about 5 or less (Tab 6 Cols 5-6, Tab 7 Cols 1-6) live in weak-IV territory: read their confidence intervals, not their point estimates.&lt;/p>
&lt;p>&lt;strong>Limitation.&lt;/strong> The 0.944 is a LATE, not an ATE. It applies to the colonization-margin compliers, not the whole population of countries. It also depends on AJR&amp;rsquo;s exclusion restriction (that 1700-era settler mortality affects 1995 GDP only through institutions), which is untestable in principle and only partially probed by Hansen J / Sargan in practice. Albouy&amp;rsquo;s (2012) imputation critique limits what J-test non-rejection can buy: 36 of the 64 mortality observations (about 56%) are assigned from other countries, so the joint exogeneity test has low power against shared imputation noise.&lt;/p>
&lt;p>&lt;strong>Next step.&lt;/strong> Use &lt;code>pyfixest&lt;/code>&amp;rsquo;s &lt;code>.IV_Diag()&lt;/code> to extract the Olea-Pflueger (2013) effective F-statistic for each robustness spec — the right benchmark under heteroskedasticity-robust inference. If the effective F materially exceeds the Stock-Yogo iid threshold of 16.38, the conventional 2SLS asymptotics are safer to lean on. If it does not, the Anderson-Rubin Wald test (also surfaced by &lt;code>linearmodels&lt;/code>) becomes the primary inference tool.&lt;/p>
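&lt;p>For readers who want to see what an Anderson-Rubin confidence set involves, here is a hand-rolled sketch for the one-instrument case. It runs on synthetic data, uses the homoskedastic form of the statistic with an asymptotic critical value, and inverts the test over a grid; the grid bounds and all coefficients are arbitrary illustrative choices, and no library routine is assumed.&lt;/p>

```python
import random

def demean(v):
    m = sum(v) / len(v)
    return [p - m for p in v]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def ar_statistic(y, x, z, beta0):
    """F statistic on z in a regression of (y - beta0 * x) on z and a constant."""
    n = len(y)
    r = demean([p - beta0 * q for p, q in zip(y, x)])
    z_c = demean(z)
    b = dot(r, z_c) / dot(z_c, z_c)
    rss = sum((p - b * q) ** 2 for p, q in zip(r, z_c))
    return b * b * dot(z_c, z_c) / (rss / (n - 2))

random.seed(4)
n = 64   # same size as the tutorial sample, but the data are simulated
z = [random.gauss(0, 1) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]
x = [-0.6 * zi + 0.8 * ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [1.0 * xi - 1.0 * ui + random.gauss(0, 1) for xi, ui in zip(x, u)]

# Just-identified IV point estimate, for reference.
z_c, x_c, y_c = demean(z), demean(x), demean(y)
beta_iv = dot(z_c, y_c) / dot(z_c, x_c)

# Invert the test: keep every candidate beta0 that is not rejected at 5%.
critical = 3.84   # asymptotic 5% critical value, chi-squared with 1 df
grid = [i / 100 for i in range(-300, 501)]
accepted = [b0 for b0 in grid if critical >= ar_statistic(y, x, z, b0)]

print(f"IV point estimate = {beta_iv:.3f}")
print(f"AR set on grid    = [{min(accepted):.2f}, {max(accepted):.2f}]")
```

&lt;p>The statistic is zero at the IV point estimate by construction, so the set always contains it in the just-identified case. With a weak instrument the accepted values can run to the edge of the grid or form a disconnected set, which is the honest answer the conventional Wald interval hides.&lt;/p>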
&lt;hr>
&lt;h2 id="14-exercises">14. Exercises&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Reduced-form ratio check.&lt;/strong> Compute the reduced-form coefficient by running &lt;code>pf.feols(&amp;quot;logpgp95 ~ logem4&amp;quot;, data=base, vcov=&amp;quot;HC1&amp;quot;)&lt;/code>. Verify that it equals approximately $-0.573$, and that dividing it by the first-stage coefficient $-0.607$ recovers the 2SLS estimate of 0.944. What does this exercise teach you about what 2SLS is doing under the hood?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Cross-library cross-check.&lt;/strong> For the main spec, run the 2SLS twice: once via &lt;code>pyfixest.feols(&amp;quot;logpgp95 ~ 1 | avexpr ~ logem4&amp;quot;, ...)&lt;/code> and once via &lt;code>linearmodels.iv.IV2SLS(...).fit(cov_type=&amp;quot;robust&amp;quot;)&lt;/code>. The point estimates should match to ~6 decimals; the standard errors should differ in the third decimal (0.1789 vs 0.1761). Why? Which small-sample correction is the &amp;ldquo;right&amp;rdquo; one for replicating the Stata &lt;code>ivreg2&lt;/code> reference?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Stress-test the exclusion restriction.&lt;/strong> Pick a candidate omitted variable that you think could violate the exclusion restriction (e.g., percentage of population at high altitude, or distance from the equator). Add it as an exogenous control to the main spec and report what happens to the 2SLS coefficient on &lt;code>avexpr&lt;/code>. Is your candidate a &amp;ldquo;bad control&amp;rdquo; (downstream of institutions) or a genuine threat to exclusion (upstream of mortality)?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Hansen J on a multi-endog spec.&lt;/strong> Replicate Tab 7 Col 7 (&lt;code>avexpr&lt;/code> and &lt;code>malfal94&lt;/code> jointly endogenous, instrumented by &lt;code>logem4&lt;/code>, &lt;code>latabs&lt;/code>, &lt;code>lt100km&lt;/code>, &lt;code>meantemp&lt;/code>) using &lt;code>linearmodels.iv.IV2SLS&lt;/code>. Note that &lt;code>pyfixest.feols&lt;/code> will refuse this specification (&amp;ldquo;Multiple endogenous variables are not supported&amp;rdquo;). Why does Hansen J / Sargan have power here but not in a just-identified spec?&lt;/p>
&lt;/li>
&lt;/ol>
&lt;hr>
&lt;h2 id="15-references">15. References&lt;/h2>
&lt;ol>
&lt;li>&lt;a href="https://www.aeaweb.org/articles?id=10.1257/aer.91.5.1369" target="_blank" rel="noopener">Acemoglu, D., Johnson, S., and Robinson, J. A. (2001). &amp;ldquo;The Colonial Origins of Comparative Development: An Empirical Investigation.&amp;rdquo; &lt;em>American Economic Review&lt;/em>, 91(5), 1369–1401.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.aeaweb.org/articles?id=10.1257/aer.102.6.3059" target="_blank" rel="noopener">Albouy, D. Y. (2012). &amp;ldquo;The Colonial Origins of Comparative Development: An Investigation of the Settler Mortality Data.&amp;rdquo; &lt;em>American Economic Review&lt;/em>, 102(6), 3059–3076.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.jstor.org/stable/2951620" target="_blank" rel="noopener">Imbens, G. W. and Angrist, J. D. (1994). &amp;ldquo;Identification and Estimation of Local Average Treatment Effects.&amp;rdquo; &lt;em>Econometrica&lt;/em>, 62(2), 467–475.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.jstor.org/stable/2171753" target="_blank" rel="noopener">Staiger, D. and Stock, J. H. (1997). &amp;ldquo;Instrumental Variables Regression with Weak Instruments.&amp;rdquo; &lt;em>Econometrica&lt;/em>, 65(3), 557–586.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.nber.org/papers/t0284" target="_blank" rel="noopener">Stock, J. H. and Yogo, M. (2005). &amp;ldquo;Testing for Weak Instruments in Linear IV Regression.&amp;rdquo; In &lt;em>Identification and Inference for Econometric Models&lt;/em>, Cambridge University Press.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.tandfonline.com/doi/abs/10.1080/00401706.2013.806694" target="_blank" rel="noopener">Olea, J. L. M. and Pflueger, C. (2013). &amp;ldquo;A Robust Test for Weak Instruments.&amp;rdquo; &lt;em>Journal of Business and Economic Statistics&lt;/em>, 31(3), 358–369.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://pyfixest.org/" target="_blank" rel="noopener">&lt;code>pyfixest&lt;/code> — fast high-dimensional fixed-effects and IV regression in Python (port of &lt;code>fixest&lt;/code>).&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://bashtage.github.io/linearmodels/" target="_blank" rel="noopener">&lt;code>linearmodels&lt;/code> — Linear (and panel) models for Python, including IV2SLS and IVGMM.&lt;/a>&lt;/li>
&lt;li>&lt;a href="../stata_iv/">Companion Stata post: same data, same numerical results, &lt;code>ivreg2&lt;/code> instead of pyfixest.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://economics.mit.edu/people/faculty/daron-acemoglu/data-archive" target="_blank" rel="noopener">AJR (2001) replication package — &lt;code>maketable1.dta&lt;/code> through &lt;code>maketable8.dta&lt;/code> are loaded by &lt;code>analysis.py&lt;/code> from this site&amp;rsquo;s GitHub raw URL (mirrored from &lt;code>content/post/stata_iv/&lt;/code>) for one-click replicability.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://youtu.be/ROLeLaR-17U" target="_blank" rel="noopener">Duke Mod·U &amp;ldquo;Causal Inference Bootcamp&amp;rdquo; — &lt;em>Introduction to Regression Analysis&lt;/em>. YouTube video.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://youtu.be/vCkrWeJG5cs" target="_blank" rel="noopener">Duke Mod·U &amp;ldquo;Causal Inference Bootcamp&amp;rdquo; — &lt;em>Basic Elements of a Regression Table&lt;/em>. YouTube video.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://youtu.be/fDCgagw2CAI" target="_blank" rel="noopener">Duke Mod·U &amp;ldquo;Causal Inference Bootcamp&amp;rdquo; — &lt;em>The Relationship Between Economic Development and Property Rights&lt;/em>. YouTube video.&lt;/a>&lt;/li>
&lt;/ol></description></item><item><title>Do Institutions Cause Prosperity? An IV Tutorial in Stata</title><link>https://carlos-mendez.org/post/stata_iv/</link><pubDate>Fri, 08 May 2026 00:00:00 +0000</pubDate><guid>https://carlos-mendez.org/post/stata_iv/</guid><description>&lt;h2 id="1-overview">1. Overview&lt;/h2>
&lt;p>A simple cross-country plot tells a striking story: countries with stronger property-rights institutions are vastly richer than countries with weaker ones. The slope is real, the gradient is huge, and almost every development economist agrees that &lt;strong>something&lt;/strong> about institutions matters for prosperity. But that simple plot cannot tell us &lt;em>which way the arrow points&lt;/em>. Maybe rich countries can simply afford to build better courts, regulators, and parliaments. Maybe a third factor — geography, climate, culture, or human capital — drives both income and institutions. The slope might describe correlation; it cannot prove causation.&lt;/p>
&lt;p>Acemoglu, Johnson and Robinson (2001) — henceforth &lt;strong>AJR&lt;/strong> — proposed a now-famous solution: use the &lt;strong>mortality rate of European settlers&lt;/strong> during colonization as an &lt;em>instrumental variable&lt;/em> for modern institutional quality. Their argument is that places where Europeans died en masse (tropical lowlands with malaria and yellow fever) became &lt;em>extractive&lt;/em> colonies, while places where Europeans survived became &lt;em>settler&lt;/em> colonies with European-style property-rights protections. Because settler mortality was determined by the disease environment of 1500–1900 — not by the income of countries in 1995 — it provides a source of variation in institutions that is &lt;em>plausibly&lt;/em> unrelated to all the modern unobserved factors that confound the simple plot.&lt;/p>
&lt;p>This tutorial replicates AJR&amp;rsquo;s headline result on a sample of 64 ex-colonies using Stata&amp;rsquo;s &lt;code>ivreg2&lt;/code> package. We start with the naive OLS slope of 0.522, walk through the three identification conditions an instrument must satisfy, and arrive at a 2SLS estimate of &lt;strong>0.944&lt;/strong> — about 81% larger. We then layer on five families of robustness checks (colonial controls, geography, health, alternative instruments, overidentification) and confront Albouy&amp;rsquo;s (2012) imputation critique honestly. The case study question is direct: &lt;strong>&amp;ldquo;Do better institutions cause higher GDP per capita, or are they merely correlated with it?&amp;quot;&lt;/strong>&lt;/p>
&lt;h3 id="the-iv-identification-strategy-at-a-glance">The IV identification strategy at a glance&lt;/h3>
&lt;p>Before we estimate anything, here is the picture of the strategy. The dashed red arrow is the assumption we cannot test directly — it is the heart of every IV paper.&lt;/p>
&lt;pre>&lt;code class="language-mermaid">flowchart LR
Z[&amp;quot;Settler mortality&amp;lt;br/&amp;gt;(logem4)&amp;quot;]
X[&amp;quot;Modern institutions&amp;lt;br/&amp;gt;(avexpr)&amp;quot;]
Y[&amp;quot;Log GDP per capita&amp;lt;br/&amp;gt;(logpgp95)&amp;quot;]
U[&amp;quot;Unobserved confounders&amp;lt;br/&amp;gt;(geography? culture?&amp;lt;br/&amp;gt;human capital?)&amp;quot;]
Z --&amp;gt;|&amp;quot;first stage&amp;lt;br/&amp;gt;relevance ✓&amp;quot;| X
X --&amp;gt;|&amp;quot;causal effect&amp;lt;br/&amp;gt;(what we want)&amp;quot;| Y
U --&amp;gt;|&amp;quot;bias OLS&amp;quot;| X
U --&amp;gt;|&amp;quot;bias OLS&amp;quot;| Y
Z -.-&amp;gt;|&amp;quot;exclusion restriction:&amp;lt;br/&amp;gt;no direct arrow&amp;quot;| Y
style Z fill:#6a9bcc,stroke:#141413,color:#fff
style X fill:#d97757,stroke:#141413,color:#fff
style Y fill:#00d4c8,stroke:#141413,color:#141413
style U fill:#1a3a8a,stroke:#141413,color:#fff,stroke-dasharray: 5 5
&lt;/code>&lt;/pre>
&lt;p>The diagram shows what makes IV work: the instrument &lt;code>logem4&lt;/code> (settler mortality) influences the outcome &lt;code>logpgp95&lt;/code> (log GDP) &lt;strong>only&lt;/strong> through the endogenous regressor &lt;code>avexpr&lt;/code> (institutions). The dashed arrow from &lt;code>Z&lt;/code> to &lt;code>Y&lt;/code> is forbidden — that is the &lt;em>exclusion restriction&lt;/em>. Unobserved confounders &lt;code>U&lt;/code> may freely contaminate both &lt;code>X&lt;/code> and &lt;code>Y&lt;/code>, but as long as they do not also drive &lt;code>Z&lt;/code>, the IV estimator isolates the part of variation in &lt;code>X&lt;/code> that is exogenous (the part predicted by &lt;code>Z&lt;/code>) and uses only that part to estimate the causal effect on &lt;code>Y&lt;/code>.&lt;/p>
&lt;h3 id="learning-objectives">Learning objectives&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Recognize&lt;/strong> when ordinary least squares (OLS) is biased by reverse causality, omitted variables, and measurement error.&lt;/li>
&lt;li>&lt;strong>State&lt;/strong> the three conditions an instrumental variable must satisfy: relevance, exclusion, and exogeneity.&lt;/li>
&lt;li>&lt;strong>Estimate&lt;/strong> the AJR (2001) 2SLS coefficient on institutions using &lt;code>ivreg2&lt;/code> and the &lt;code>maketable4.dta&lt;/code> dataset.&lt;/li>
&lt;li>&lt;strong>Diagnose&lt;/strong> weak instruments using the Kleibergen-Paap rk Wald F-statistic and the Stock-Yogo critical values.&lt;/li>
&lt;li>&lt;strong>Interpret&lt;/strong> the 2SLS coefficient as a Local Average Treatment Effect (LATE) under heterogeneous effects (Imbens-Angrist 1994).&lt;/li>
&lt;li>&lt;strong>Test&lt;/strong> the exclusion restriction with the Hansen J overidentification test, and recognize what it cannot tell you.&lt;/li>
&lt;/ul>
&lt;h3 id="key-concepts-at-a-glance">Key concepts at a glance&lt;/h3>
&lt;p>The post leans on a small vocabulary repeatedly. The rest of the tutorial assumes you can move between these terms quickly. Each concept below has three parts. The &lt;strong>definition&lt;/strong> is always visible. The &lt;strong>example&lt;/strong> and &lt;strong>analogy&lt;/strong> sit behind clickable cards: open them when you need them, leave them collapsed for a quick scan. If a later section mentions &amp;ldquo;exclusion restriction&amp;rdquo; or &amp;ldquo;LATE&amp;rdquo; and the term feels slippery, this is the section to re-read.&lt;/p>
&lt;p>&lt;strong>1. Endogeneity.&lt;/strong>
A regressor is &lt;em>endogenous&lt;/em> when it is correlated with the error term. In our context, &lt;code>avexpr&lt;/code> (institutions) is endogenous because it is jointly determined with GDP, shares unobserved confounders with GDP, and is measured imperfectly. OLS estimates of endogenous regressors are biased and inconsistent: they do not converge to the true causal effect even in large samples.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">
&lt;summary>Example&lt;/summary>
&lt;p>The Durbin-Wu-Hausman test in Table 4 Col 1 returns $\chi^2(1) = 9.085$ with $p = 0.0026$. We reject the null that OLS is consistent: &lt;code>avexpr&lt;/code> &lt;em>is&lt;/em> statistically endogenous in this dataset, so IV is empirically warranted, not just theoretically motivated.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">
&lt;summary>Analogy&lt;/summary>
&lt;p>A bathroom scale that you stand on while holding a heavy weight. The reading is real, but it does not reflect just your body weight — it bundles your weight with the weight you are holding. OLS bundles the causal effect with confounding. We need a different tool to separate them.&lt;/p>
&lt;/details>
&lt;/div>
&lt;p>&lt;strong>2. Instrumental variable&lt;/strong> (instrument, $Z$).
A variable that affects the outcome &lt;code>Y&lt;/code> &lt;em>only&lt;/em> through its effect on the endogenous regressor &lt;code>X&lt;/code>. Three conditions must hold: (i) &lt;strong>relevance&lt;/strong> — &lt;code>Z&lt;/code> and &lt;code>X&lt;/code> are correlated; (ii) &lt;strong>exclusion&lt;/strong> — &lt;code>Z&lt;/code> does not enter the outcome equation directly; (iii) &lt;strong>exogeneity&lt;/strong> — &lt;code>Z&lt;/code> is uncorrelated with the error term &lt;code>U&lt;/code>.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">
&lt;summary>Example&lt;/summary>
&lt;p>&lt;code>logem4&lt;/code> (log settler mortality) satisfies (i) empirically rather than by assumption: the first-stage coefficient is $-0.607$ with $F = 16.32$. (ii) and (iii) are AJR&amp;rsquo;s substantive claim: settler mortality circa 1700 cannot directly affect 1995 GDP except by shaping the colonial institutions that countries inherited. (ii) and (iii) are &lt;strong>untestable in general&lt;/strong> but can be partially examined via overidentification (Hansen J).&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">
&lt;summary>Analogy&lt;/summary>
&lt;p>A coin flip that decides which patient gets the drug. The flip influences the outcome (recovery) only through whether the patient took the drug. The flip itself does not heal anyone. That is what an instrument is supposed to be: a clean external nudge.&lt;/p>
&lt;/details>
&lt;/div>
&lt;p>&lt;strong>3. Two-Stage Least Squares (2SLS).&lt;/strong>
The standard IV estimator. Stage 1: regress the endogenous &lt;code>X&lt;/code> on the instrument &lt;code>Z&lt;/code> (and any controls). Stage 2: regress &lt;code>Y&lt;/code> on the &lt;em>predicted&lt;/em> &lt;code>X̂&lt;/code> from stage 1. The 2SLS coefficient on &lt;code>X̂&lt;/code> is the IV estimate. Stata&amp;rsquo;s &lt;code>ivreg2&lt;/code> does both stages internally; you only see the second-stage output.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">
&lt;summary>Example&lt;/summary>
&lt;p>Stage 1: &lt;code>avexpr = 9.341 - 0.607 × logem4&lt;/code>. Stage 2: &lt;code>logpgp95 = 1.910 + 0.944 × avexpr_hat&lt;/code>. The 0.944 is the 2SLS coefficient — it uses only the part of &lt;code>avexpr&lt;/code> predicted by &lt;code>logem4&lt;/code>, throwing away the part contaminated by unobserved confounders.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">
&lt;summary>Analogy&lt;/summary>
&lt;p>Filtering muddy water through a sieve. The sieve (stage 1) catches the dirt (unobserved confounding). What passes through (stage 2) is the clean signal you can drink — the part of &lt;code>X&lt;/code> driven only by the exogenous instrument.&lt;/p>
&lt;/details>
&lt;/div>
&lt;p>&lt;strong>4. Weak instrument.&lt;/strong>
An instrument that has only a weak correlation with the endogenous regressor. Weak instruments produce IV estimates with very large standard errors and a bias toward OLS that does not vanish as the sample grows, so conventional confidence intervals become unreliable. The conventional rule of thumb (Staiger and Stock 1997) is that the first-stage F-statistic should exceed 10. Stock and Yogo (2005) give more refined critical values.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">
&lt;summary>Example&lt;/summary>
&lt;p>In our main spec, the Kleibergen-Paap rk Wald F = 16.32, above the F &amp;gt; 10 rule of thumb but marginally below the Stock-Yogo 10% maximal-IV-size threshold of 16.38. Several robustness specs (Tables 6 and 7) drop the F below 5, which means the IV estimate&amp;rsquo;s confidence interval should not be taken literally.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">
&lt;summary>Analogy&lt;/summary>
&lt;p>A radio antenna pointing in roughly the right direction. If the signal is strong enough you hear the music clearly. If the signal is weak (low F) you hear mostly static. The static is the bias.&lt;/p>
&lt;/details>
&lt;/div>
&lt;p>&lt;strong>5. LATE vs ATE.&lt;/strong>
Under heterogeneous treatment effects, 2SLS does &lt;strong>not&lt;/strong> identify the population average treatment effect (ATE). Imbens and Angrist (1994) show that 2SLS identifies the &lt;strong>Local Average Treatment Effect (LATE)&lt;/strong> — the effect for the subpopulation of &amp;ldquo;compliers&amp;rdquo;, i.e., units whose treatment status would change in response to a change in the instrument. Under constant effects, LATE = ATE.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">
&lt;summary>Example&lt;/summary>
&lt;p>Our 0.944 coefficient is the effect of &lt;code>avexpr&lt;/code> on &lt;code>logpgp95&lt;/code> for the subset of countries whose 1995 institutional quality would have been &lt;em>different&lt;/em> had their settler mortality been different. It is &lt;em>not&lt;/em> a population-average claim like &amp;ldquo;if every country improved its institutions by one point, log GDP per capita would rise by 0.94.&amp;rdquo;&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">
&lt;summary>Analogy&lt;/summary>
&lt;p>A drug trial where eligibility depends on a coin flip. The trial estimates the effect &lt;em>for people who comply with the coin flip&lt;/em>. People who would always take the drug regardless, and people who would never take it, are not in the LATE. The LATE is a real effect on real people — just not on everyone.&lt;/p>
&lt;/details>
&lt;/div>
&lt;p>&lt;strong>6. Hansen J overidentification test.&lt;/strong>
When you have &lt;em>more&lt;/em> instruments than endogenous regressors, you can test the joint exogeneity of the instrument set. The Hansen J test compares the moment conditions across instruments: if they all agree on the same causal effect, the test does not reject. Critical caveat: Hansen J cannot test a &lt;em>single&lt;/em> instrument in a just-identified model, and it has low power against shared imputation bias.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">
&lt;summary>Example&lt;/summary>
&lt;p>In Table 8 Panel C we pair each alternative instrument with &lt;code>logem4&lt;/code> and run efficient GMM. Hansen J p-values range from 0.21 to 0.80 across five instrument pairs, uniformly failing to reject. But Albouy (2012) shows that 36 of the 64 mortality observations (about 56%) are assigned from other countries rather than measured locally, so this non-rejection does not rule out shared imputation noise.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">
&lt;summary>Analogy&lt;/summary>
&lt;p>Two witnesses giving the same alibi. Their agreement is &lt;em>consistent with&lt;/em> truth, but if they share a flawed memory of the same event, they will agree falsely. Hansen J cannot tell consistent witnesses from coordinated ones.&lt;/p>
&lt;/details>
&lt;/div>
&lt;p>&lt;strong>7. First stage and reduced form.&lt;/strong>
The &lt;strong>first stage&lt;/strong> is the regression of the endogenous regressor &lt;code>X&lt;/code> on the instrument &lt;code>Z&lt;/code> (and controls). The &lt;strong>reduced form&lt;/strong> is the regression of the outcome &lt;code>Y&lt;/code> directly on the instrument &lt;code>Z&lt;/code> (and controls). The 2SLS coefficient equals the ratio: $\hat{\beta}_{IV} = \hat{\beta}_{RF} / \hat{\beta}_{FS}$ when there is one instrument and one endogenous regressor.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">
&lt;summary>Example&lt;/summary>
&lt;p>First stage: $\hat{\beta}_{FS} = -0.607$ (logem4 → avexpr). Reduced form: $\hat{\beta}_{RF} = -0.573$ (logem4 → logpgp95, computed in §6 below). Ratio: $-0.573 / -0.607 = 0.944$ — exactly the 2SLS coefficient. The whole IV machinery boils down to this one division.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">
&lt;summary>Analogy&lt;/summary>
&lt;p>If pulling a rope (the instrument) by 1 meter moves a hidden box (the endogenous regressor) by 0.6 meters, and that pulling also lifts a flag (the outcome) by 0.57 meters, then moving the box by 1 meter must lift the flag by 0.57/0.6 = 0.95 meters (0.944 with the unrounded slopes). IV is just this proportion calculation.&lt;/p>
&lt;/details>
&lt;/div>
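&lt;p>The ratio in concept 7 can be derived in one line. The following is a short worked sketch, assuming one instrument, one endogenous regressor, and no additional controls; the sample covariance and variance notation is introduced here for illustration only:&lt;/p>
&lt;p>$$\hat{\beta}_{IV} = \frac{\widehat{\mathrm{Cov}}(Z, Y)}{\widehat{\mathrm{Cov}}(Z, X)} = \frac{\widehat{\mathrm{Cov}}(Z, Y) / \widehat{\mathrm{Var}}(Z)}{\widehat{\mathrm{Cov}}(Z, X) / \widehat{\mathrm{Var}}(Z)} = \frac{\hat{\beta}_{RF}}{\hat{\beta}_{FS}} = \frac{-0.573}{-0.607} \approx 0.944$$&lt;/p>
&lt;p>Dividing both covariances by the variance of $Z$ turns the numerator into the OLS slope of $Y$ on $Z$ and the denominator into the OLS slope of $X$ on $Z$, which is why two simple regressions are enough to reproduce the 2SLS point estimate.&lt;/p>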
&lt;hr>
&lt;h2 id="2-setup-and-dependencies">2. Setup and dependencies&lt;/h2>
&lt;p>The script depends on four community-contributed Stata packages from the SSC archive: &lt;code>ivreg2&lt;/code> (the IV workhorse), &lt;code>ranktest&lt;/code> (a dependency of &lt;code>ivreg2&lt;/code>), &lt;code>estout&lt;/code> (for table assembly via &lt;code>eststo&lt;/code> and &lt;code>esttab&lt;/code>), and &lt;code>coefplot&lt;/code> (for the comparison plot at the end). The &lt;code>capture ssc install&lt;/code> pattern is idempotent: it installs each package on the first run and does nothing on subsequent runs. We also define the dark-theme color palette as global macros — Stata&amp;rsquo;s &lt;code>color()&lt;/code> graph option takes RGB triplets, not hex codes, so we pre-convert the site palette.&lt;/p>
&lt;pre>&lt;code class="language-stata">clear all
set more off
set seed 42
capture log close
log using &amp;quot;analysis.log&amp;quot;, text replace
// SSC dependencies
capture ssc install ivreg2
capture ssc install ranktest
capture ssc install estout
capture ssc install coefplot
// Globals: outcome, treatment, instrument
global Y logpgp95
global X avexpr
global Z logem4
// Data-loading mode: 1 = GitHub raw URL (replicable), 0 = local folder
global USE_GITHUB 1
if $USE_GITHUB {
    global DATA_URL &amp;quot;https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/stata_iv&amp;quot;
}
else {
    global DATA_URL &amp;quot;.&amp;quot;
}
// Dark-theme color palette (hex -&amp;gt; Stata &amp;quot;R G B&amp;quot; triplet)
global DARK_NAVY &amp;quot;15 23 41&amp;quot; // background
global STEEL_BLUE &amp;quot;106 155 204&amp;quot; // primary data points
global WARM_ORANGE &amp;quot;217 119 87&amp;quot; // fit lines
global TEAL &amp;quot;0 212 200&amp;quot; // labels and highlights
global LIGHT_TEXT &amp;quot;200 208 224&amp;quot; // axis labels
global WHITE_TEXT &amp;quot;232 236 242&amp;quot; // titles
&lt;/code>&lt;/pre>
&lt;p>The three globals &lt;code>Y&lt;/code>, &lt;code>X&lt;/code>, and &lt;code>Z&lt;/code> map directly onto the IV diagram above: &lt;code>Y&lt;/code> is the outcome (log GDP), &lt;code>X&lt;/code> is the endogenous regressor (institutional quality), and &lt;code>Z&lt;/code> is the instrument (log settler mortality). Using globals keeps every regression below readable and consistent — every spec is &lt;code>ivreg2 ${Y} ... (${X} = ${Z})&lt;/code>.&lt;/p>
&lt;p>The &lt;code>USE_GITHUB&lt;/code> toggle lets the same do-file run two ways: with &lt;code>1&lt;/code> (the default) Stata pulls each &lt;code>.dta&lt;/code> from this site&amp;rsquo;s GitHub raw URL — so any reader can &lt;code>do analysis.do&lt;/code> and replicate the full set of tables without cloning the repo or downloading the AJR archive. Flipping it to &lt;code>0&lt;/code> loads from the current folder instead, which is faster for offline iteration. The eight &lt;code>.dta&lt;/code> files (&lt;code>maketable1.dta&lt;/code> … &lt;code>maketable8.dta&lt;/code>) are mirrored at the post root so both modes work.&lt;/p>
&lt;hr>
&lt;h2 id="3-data-overview">3. Data overview&lt;/h2>
&lt;p>AJR provide eight datasets — one per table in the original paper. Table 1&amp;rsquo;s dataset (&lt;code>maketable1.dta&lt;/code>) covers the full ~163-country world; Tables 2–8 progressively narrow to the 64-country &lt;strong>base sample&lt;/strong> (&lt;code>baseco==1&lt;/code>) of ex-colonies with valid settler-mortality data. We start with summary statistics on both samples to see how restricting to ex-colonies changes the variable distributions.&lt;/p>
&lt;pre>&lt;code class="language-stata">use &amp;quot;${DATA_URL}/maketable1.dta&amp;quot;, clear
di &amp;quot;*** Whole world ***&amp;quot;
summarize logpgp95 loghjypl avexpr cons00a cons1 democ00a euro1900
di &amp;quot;*** AJR base sample (baseco==1) ***&amp;quot;
preserve
keep if baseco==1
summarize logpgp95 loghjypl avexpr cons00a cons1 democ00a euro1900 logem4
estpost summarize logpgp95 loghjypl avexpr cons00a cons1 democ00a euro1900 logem4
esttab using &amp;quot;tab1_summary.csv&amp;quot;, csv replace ///
cells(&amp;quot;count(fmt(0)) mean(fmt(3)) sd(fmt(3)) min(fmt(3)) max(fmt(3))&amp;quot;)
restore
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">*** Whole world ***
    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
    logpgp95 |        162    8.304196    1.070869   6.109248   10.28875
      avexpr |        129    6.988548    1.831779   1.636364         10
    euro1900 |        166    30.10241    41.86424          0        100
*** AJR base sample (baseco==1) ***
    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
    logpgp95 |         64    8.062237    1.043359   6.109248   10.21574
      avexpr |         64    6.515625    1.468647        3.5         10
    euro1900 |         63    16.18095    25.53334          0         99
      logem4 |         64    4.657031    1.257984   2.145931   7.986165
&lt;/code>&lt;/pre>
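&lt;p>The logged extremes in this output are easier to read in levels. As a minimal sketch (the values are hard-coded from the summary table above, and these lines are not part of the original do-file), Stata&amp;rsquo;s &lt;code>display&lt;/code> command can exponentiate them directly:&lt;/p>
&lt;pre>&lt;code class="language-stata">// Settler mortality is stored in logs: exponentiate to recover the rate per 1,000
di &amp;quot;Lowest settler mortality:  &amp;quot; %8.1f exp(2.145931)
di &amp;quot;Highest settler mortality: &amp;quot; %8.1f exp(7.986165)
// Same conversion for log GDP per capita (PPP dollars, 1995)
di &amp;quot;Lowest GDP per capita:  &amp;quot; %8.0f exp(6.109248)
di &amp;quot;Highest GDP per capita: &amp;quot; %8.0f exp(10.21574)
// A difference in logs is a ratio in levels
di &amp;quot;Richest-to-poorest ratio: &amp;quot; %6.1f exp(10.21574 - 6.109248)
&lt;/code>&lt;/pre>
&lt;p>The last line uses the fact that a gap in log points becomes a multiplicative gap after exponentiating, which is how a spread of about four log points turns into a many-fold income difference.&lt;/p>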
&lt;p>The base sample has 64 former colonies — about 39% of the 162-country universe. Restricting to ex-colonies lowers the mean of &lt;code>avexpr&lt;/code> from 6.99 to 6.52 (institutions are weaker on average among ex-colonies than the world average) and lowers the mean of &lt;code>euro1900&lt;/code> from 30.1 to 16.2 (ex-colonies had fewer European settlers in 1900). The instrument &lt;code>logem4&lt;/code> ranges from 2.15 (very low mortality, ~9 deaths per 1,000) to 7.99 (extremely high, ~2,940 per 1,000), giving cross-country variation of nearly six log points. Log GDP per capita varies from 6.11 (~\$450, the poorest country) to 10.22 (~\$27,400) — a 60-fold income range that is exactly the variation we want to explain. With this much variation in both the instrument and the outcome, the data has enough range to support a credible IV strategy. The next step is to ask: how &lt;em>would&lt;/em> a naive OLS estimate look on this sample?&lt;/p>
&lt;hr>
&lt;h2 id="4-the-naive-ols-benchmark-table-2">4. The naive OLS benchmark (Table 2)&lt;/h2>
&lt;p>Before we instrument anything, we should know what OLS thinks. If OLS already gave us the right answer, IV would be unnecessary. The OLS regression of log GDP per capita on &lt;code>avexpr&lt;/code> (and a few controls) is the natural starting point. We follow AJR Table 2&amp;rsquo;s column structure: full sample, base sample, latitude, continent dummies. All standard errors are robust (&lt;code>vce(robust)&lt;/code>).&lt;/p>
&lt;pre>&lt;code class="language-stata">use &amp;quot;${DATA_URL}/maketable2.dta&amp;quot;, clear
eststo m2_c1: regress logpgp95 avexpr, robust
eststo m2_c2: regress logpgp95 avexpr if baseco==1, robust
eststo m2_c3: regress logpgp95 avexpr lat_abst, robust
eststo m2_c4: regress logpgp95 avexpr lat_abst africa asia other_cont, robust
esttab m2_c1 m2_c2 m2_c3 m2_c4 using &amp;quot;tab2_ols.csv&amp;quot;, csv replace ///
b(3) se(3) star(* 0.10 ** 0.05 *** 0.01) stats(N r2)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">              (1)         (2)         (3)         (4)
              Full        Base        +Latitude   +Continents
              N=111       N=64        N=111       N=111
avexpr        0.532***    0.522***    0.463***    0.390***
              (0.029)     (0.050)     (0.052)     (0.051)
lat_abst                              0.872*      0.333
                                      (0.499)     (0.442)
africa                                            -0.916***
                                                  (0.154)
R-squared     0.611       0.540       0.623       0.715
&lt;/code>&lt;/pre>
&lt;p>The naive OLS coefficient is remarkably stable across specifications: 0.532 in the full 111-country sample (Col 1), 0.522 in the 64-country base sample (Col 2), and it falls only to 0.390 once continent dummies are added (Col 4). At face value, a one-point increase in expropriation protection (on AJR&amp;rsquo;s 0–10 scale) is associated with a 0.39–0.53 log-point rise in income per capita (roughly 48%–70% in exact terms), statistically significant at the 1% level. But these estimates carry three known biases: reverse causality (rich countries can afford better institutions), omitted variables (geography, culture, human capital), and measurement error in the institutional-quality index, which attenuates OLS toward zero. We need IV to find out how much of the 0.522 is bias and how much is the true causal effect.&lt;/p>
&lt;hr>
&lt;h2 id="5-the-first-stage-and-the-reduced-form-table-3-and-figures-12">5. The first stage and the reduced form (Table 3 and Figures 1–2)&lt;/h2>
&lt;p>An instrument must first be &lt;strong>relevant&lt;/strong> — it must move the endogenous regressor. We test relevance with the first-stage regression: &lt;code>avexpr&lt;/code> on &lt;code>logem4&lt;/code> and any controls. Table 3 of AJR shows that settler mortality predicts current institutions (Panel A) &lt;em>and&lt;/em> historical institutions in 1900 (Panel B). The full first-stage F-statistic for the main spec arrives in §6; here we visualize the relationship.&lt;/p>
&lt;pre>&lt;code class="language-stata">use &amp;quot;${DATA_URL}/maketable4.dta&amp;quot;, clear
keep if baseco==1
// Run 2SLS with the first option so ivreg2 also prints the first-stage regression
ivreg2 logpgp95 (avexpr=logem4), robust first
di _newline &amp;quot;*** First-stage Kleibergen-Paap rk Wald F: &amp;quot; %6.2f e(widstat)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">First-stage regression of avexpr on logem4:
logem4 | -.6067782 .1501972 -4.04 0.000
*** First-stage Kleibergen-Paap rk Wald F: 16.32
*** Stock-Yogo 10% maximal IV size critical value: 16.38 (IID)
*** Under robust SEs, see Olea &amp;amp; Pflueger (2013) effective F.
&lt;/code>&lt;/pre>
&lt;p>A one-log-point increase in settler mortality lowers modern expropriation protection by 0.607 points, with a t-statistic of −4.04. The first-stage Kleibergen-Paap rk Wald F-statistic is &lt;strong>16.32&lt;/strong>, comfortably above the Staiger-Stock (1997) rule of thumb of F &amp;gt; 10 but fractionally below the Stock-Yogo (2005) iid threshold of 16.38 for ≤10% maximal IV size distortion. Honest disclosure: 16.32 is &lt;em>borderline&lt;/em>, not comfortable. Under heteroskedasticity-robust standard errors (which we are using), the more rigorous benchmark is the Olea-Pflueger (2013) effective F (&lt;code>weakivtest&lt;/code> in SSC); we will fall back on the weak-IV-robust Anderson-Rubin Wald test in §6 to confirm significance even if one is uncomfortable with the conventional asymptotics.&lt;/p>
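&lt;p>With a single instrument, the first-stage F-statistic is the square of the t-statistic on that instrument, so the headline number can be checked by hand. The following worked check uses only the coefficient and robust standard error printed in the output above, rounded to four decimals:&lt;/p>
&lt;p>$$F = t^2 = \left(\frac{-0.6068}{0.1502}\right)^2 \approx (-4.04)^2 \approx 16.32$$&lt;/p>
&lt;p>This identity holds only in the one-instrument case; with several instruments the F-statistic is a joint test and no longer reduces to a single squared t-statistic.&lt;/p>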
&lt;p>The next two figures make the same point graphically. Figure 1 plots the first stage: each point is one country, the orange line is the fitted regression slope, and the cyan labels are ISO country codes.&lt;/p>
&lt;pre>&lt;code class="language-stata">twoway ///
(scatter avexpr logem4, ///
mcolor(&amp;quot;${STEEL_BLUE}&amp;quot;) ///
mlabel(shortnam) mlabcolor(&amp;quot;${TEAL}&amp;quot;) mlabsize(vsmall)) ///
(lfit avexpr logem4, lcolor(&amp;quot;${WARM_ORANGE}&amp;quot;) lwidth(medthick)), ///
title(&amp;quot;Figure 1. First stage: settler mortality predicts institutions&amp;quot;, color(&amp;quot;${WHITE_TEXT}&amp;quot;)) ///
xtitle(&amp;quot;Log settler mortality (logem4)&amp;quot;, color(&amp;quot;${LIGHT_TEXT}&amp;quot;)) ///
ytitle(&amp;quot;Avg. protection from expropriation (avexpr)&amp;quot;, color(&amp;quot;${LIGHT_TEXT}&amp;quot;)) ///
graphregion(color(&amp;quot;${DARK_NAVY}&amp;quot;)) plotregion(color(&amp;quot;${DARK_NAVY}&amp;quot;)) ///
bgcolor(&amp;quot;${DARK_NAVY}&amp;quot;) legend(off)
graph export &amp;quot;stata_iv_first_stage.png&amp;quot;, replace width(2400)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="stata_iv_first_stage.png" alt="First stage: settler mortality predicts institutions">
&lt;em>Figure 1. First-stage scatter of &lt;code>avexpr&lt;/code> (modern expropriation protection) on &lt;code>logem4&lt;/code> (log settler mortality), 64 ex-colonies. Slope = −0.607, F = 16.32, R² = 0.27.&lt;/em>&lt;/p>
&lt;p>The negative slope is unmistakable. Australia (&lt;code>AUS&lt;/code>), New Zealand (&lt;code>NZL&lt;/code>), and the United States (&lt;code>USA&lt;/code>) — the three lowest-mortality colonies — sit at &lt;code>avexpr&lt;/code> ≈ 9–10. Sierra Leone (&lt;code>SLE&lt;/code>), Niger (&lt;code>NER&lt;/code>), and Mali (&lt;code>MLI&lt;/code>) — among the highest-mortality colonies — sit near &lt;code>avexpr&lt;/code> ≈ 3.5–5. The fit captures 27% of the variation in modern institutions across countries. This is the empirical foundation of AJR&amp;rsquo;s argument: deadly disease environments produced extractive colonies, which produced weak modern institutions.&lt;/p>
&lt;p>Figure 2 plots the &lt;strong>reduced form&lt;/strong> — the regression of the &lt;em>outcome&lt;/em> on the &lt;em>instrument&lt;/em> directly, skipping &lt;code>avexpr&lt;/code>. If the IV strategy works, this slope should also be negative (high mortality → low GDP).&lt;/p>
&lt;pre>&lt;code class="language-stata">twoway ///
(scatter logpgp95 logem4, ///
mcolor(&amp;quot;${STEEL_BLUE}&amp;quot;) ///
mlabel(shortnam) mlabcolor(&amp;quot;${TEAL}&amp;quot;) mlabsize(vsmall)) ///
(lfit logpgp95 logem4, lcolor(&amp;quot;${WARM_ORANGE}&amp;quot;) lwidth(medthick)), ///
title(&amp;quot;Figure 2. Reduced form: settler mortality predicts log GDP&amp;quot;, color(&amp;quot;${WHITE_TEXT}&amp;quot;)) ///
xtitle(&amp;quot;Log settler mortality (logem4)&amp;quot;, color(&amp;quot;${LIGHT_TEXT}&amp;quot;)) ///
ytitle(&amp;quot;Log GDP per capita, PPP, 1995 (logpgp95)&amp;quot;, color(&amp;quot;${LIGHT_TEXT}&amp;quot;)) ///
graphregion(color(&amp;quot;${DARK_NAVY}&amp;quot;)) plotregion(color(&amp;quot;${DARK_NAVY}&amp;quot;)) ///
bgcolor(&amp;quot;${DARK_NAVY}&amp;quot;) legend(off)
graph export &amp;quot;stata_iv_reduced_form.png&amp;quot;, replace width(2400)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="stata_iv_reduced_form.png" alt="Reduced form: settler mortality predicts log GDP">
&lt;em>Figure 2. Reduced-form scatter of &lt;code>logpgp95&lt;/code> (log GDP per capita, 1995, PPP) on &lt;code>logem4&lt;/code>, 64 ex-colonies. The slope (≈ −0.573) is the total effect of the instrument on the outcome.&lt;/em>&lt;/p>
&lt;p>The reduced-form gradient is steep: across the 5.8-log-point span of &lt;code>logem4&lt;/code>, the fitted line predicts a GDP gap of about 3.4 log points — roughly &lt;strong>30× poorer&lt;/strong> for the highest-mortality colonies relative to the lowest-mortality ones. This is the &lt;em>total&lt;/em> effect of the instrument on the outcome. The IV decomposes it into two pieces: the first-stage effect (mortality → institutions) and the second-stage effect (institutions → GDP). When we divide the reduced-form slope by the first-stage slope, the institutions-mediated channel pops out.&lt;/p>
&lt;hr>
&lt;h2 id="6-the-main-2sls-estimate-table-4">6. The main 2SLS estimate (Table 4)&lt;/h2>
&lt;p>This is the headline result. We instrument &lt;code>avexpr&lt;/code> with &lt;code>logem4&lt;/code>, all standard errors are heteroskedasticity-robust, and we add the Durbin-Wu-Hausman endogeneity test via &lt;code>ivreg2&lt;/code>&amp;rsquo;s &lt;code>endog()&lt;/code> option. Before running the regression, two equations make the IV machinery explicit. The structural model is:&lt;/p>
&lt;p>$$Y_i = \alpha + \beta X_i + U_i, \quad \text{where} \, \, \text{Cov}(X_i, U_i) \neq 0$$&lt;/p>
&lt;p>In words, this says the outcome $Y_i$ is generated by a linear function of the endogenous regressor $X_i$ plus an error $U_i$ that is correlated with $X_i$ — that correlation is precisely what makes OLS biased. $Y_i$ is &lt;code>logpgp95&lt;/code> for country $i$, $X_i$ is &lt;code>avexpr&lt;/code>, and $U_i$ collects every unobserved determinant of GDP that we cannot explicitly model (geography, culture, human capital, measurement noise). The IV strategy targets $\beta$ — the &lt;em>true&lt;/em> causal coefficient — by replacing $X_i$ with the part of it predicted by an external instrument. The 2SLS estimator can then be written as a single ratio:&lt;/p>
&lt;p>$$\hat{\beta}_{2SLS} = \frac{\widehat{\text{Cov}}(Y, Z)}{\widehat{\text{Cov}}(X, Z)} = \frac{\hat{\beta}_{RF}}{\hat{\beta}_{FS}}$$&lt;/p>
&lt;p>In words, the 2SLS coefficient equals the reduced-form slope divided by the first-stage slope when we have one endogenous regressor and one instrument. $Z_i$ is &lt;code>logem4&lt;/code>. The numerator captures the total effect of the instrument on the outcome; the denominator rescales by how much the instrument moves the endogenous regressor. The ratio gives the per-unit effect of &lt;code>avexpr&lt;/code> on &lt;code>logpgp95&lt;/code> along the part of variation that the instrument can identify.&lt;/p>
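&lt;p>The ratio can be checked by hand before running &lt;code>ivreg2&lt;/code>. The following is a minimal sketch, assuming the base-sample data (64 ex-colonies) are already in memory; the scalar names &lt;code>b_rf&lt;/code> and &lt;code>b_fs&lt;/code> are illustrative and not part of the original script.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Sketch: Wald ratio by hand (assumes the base sample is loaded)
quietly regress logpgp95 logem4      // reduced form: outcome on instrument
scalar b_rf = _b[logem4]
quietly regress avexpr logem4        // first stage: regressor on instrument
scalar b_fs = _b[logem4]
display b_rf / b_fs                  // should match the 2SLS coefficient on avexpr
&lt;/code>&lt;/pre>
&lt;p>With the slopes reported in Figures 1 and 2, the ratio is roughly −0.573 / −0.607 ≈ 0.944.&lt;/p>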
&lt;pre>&lt;code class="language-stata">ivreg2 logpgp95 (avexpr=logem4), robust first endog(avexpr)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">2SLS estimate, base sample (N=64):
avexpr | .9442794 .1760958 5.36 0.000 .5991379 1.289421
_cons | 1.909667 1.173955 1.63 0.104 -.3912422 4.210575
Underidentification (Kleibergen-Paap rk LM): 9.492 p = 0.0021
Weak ID (Cragg-Donald F): 22.95
Weak ID (Kleibergen-Paap rk Wald F): 16.32
Stock-Yogo 10% maximal IV size threshold: 16.38 (iid)
Anderson-Rubin Wald test (weak-IV-robust): F(1,62) = 61.66 p &amp;lt; 0.0001
Endogeneity test (Durbin-Wu-Hausman): chi2(1) = 9.085 p = 0.0026
&lt;/code>&lt;/pre>
&lt;p>The 2SLS coefficient on &lt;code>avexpr&lt;/code> is &lt;strong>0.944&lt;/strong> with a robust standard error of 0.176 (95% CI [0.60, 1.29]). It is &lt;strong>81% larger&lt;/strong> than the OLS estimate of 0.522 and statistically distinguishable from zero at the 1% level (z = 5.36). The heteroskedasticity-robust Kleibergen-Paap rk Wald F = 16.32 sits well below the Cragg-Donald F = 22.95 and just under the Stock-Yogo 10% maximal-IV-size threshold of 16.38 (a threshold derived for iid errors), so the instrument is borderline rather than comfortably strong; the weak-IV-robust Anderson-Rubin Wald test (F = 61.66, p &amp;lt; 0.0001) gives extra reassurance. The Durbin-Wu-Hausman endogeneity test rejects the null that &lt;code>avexpr&lt;/code> is exogenous, that is, that OLS is consistent ($\chi^2 = 9.09$, $p = 0.003$): provided the instrument is valid, the IV-OLS gap is large enough to constitute statistical evidence that OLS is biased, so IV is empirically warranted, not just theoretically motivated.&lt;/p>
&lt;p>In domain terms: moving Nigeria (&lt;code>avexpr&lt;/code> = 5.55) up to Chile&amp;rsquo;s level (&lt;code>avexpr&lt;/code> = 7.82) would, all else equal, raise its log GDP per capita by 0.944 × 2.27 ≈ 2.14 points, roughly an &lt;strong>8.5-fold increase&lt;/strong> in income. That is enormous. It is also a LATE: it is the effect on the subpopulation of countries whose institutions would &lt;em>change&lt;/em> in response to a hypothetical change in their settler-mortality history. It is not a population-average claim about every country.&lt;/p>
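&lt;p>The arithmetic behind this comparison can be reproduced in two lines. This is only a sketch that plugs in the rounded values quoted above (0.944, 5.55, 7.82); it does not touch the data.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Sketch: implied income effect of moving avexpr from 5.55 to 7.82
display 0.944 * (7.82 - 5.55)        // change in log GDP per capita, about 2.14
display exp(0.944 * (7.82 - 5.55))   // implied income ratio, about 8.5
&lt;/code>&lt;/pre>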
&lt;p>The IV &amp;gt; OLS gap (0.944 vs 0.522) is itself informative. Three biases push OLS in different directions: reverse causality and omitted variables typically push the OLS slope &lt;em>upward&lt;/em>, while measurement error in the institutional-quality index pushes it &lt;em>downward&lt;/em> (classical attenuation bias). The fact that IV &amp;gt; OLS by 81% suggests measurement error is the &lt;em>dominant&lt;/em> source of bias in the OLS estimate — institutional quality is a noisy proxy for the true latent property-rights regime, and de-noising it via IV reveals a steeper underlying causal slope.&lt;/p>
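&lt;p>The attenuation argument can be made concrete with the textbook classical-measurement-error formula. As a sketch under assumptions not stated in the original (the observed index equals a latent regressor $X_i^*$ plus noise $e_i$ that is uncorrelated with $X_i^*$ and $U_i$, and no other source of bias is present), the OLS slope converges to a shrunken version of $\beta$:&lt;/p>
&lt;p>$$\text{plim} \, \hat{\beta}_{OLS} = \beta \cdot \frac{\sigma^2_{X^*}}{\sigma^2_{X^*} + \sigma^2_{e}}$$&lt;/p>
&lt;p>Under those assumptions alone, an OLS slope of 0.522 against an IV slope of 0.944 would correspond to a signal share of about 0.522 / 0.944 ≈ 0.55. In practice the upward biases are also present, so this is an illustration of the mechanism rather than an estimate.&lt;/p>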
&lt;hr>
&lt;h2 id="7-robustness-1-colonial-legal-and-religious-controls-table-5">7. Robustness 1: colonial, legal, and religious controls (Table 5)&lt;/h2>
&lt;p>A skeptic&amp;rsquo;s first objection to AJR is that something about &lt;em>which&lt;/em> European power did the colonizing, or about legal traditions, religious composition, or culture, drives both modern institutions and modern income. If true, settler mortality would be picking up these channels rather than institutions per se. Table 5 adds British and French colony dummies (&lt;code>f_brit&lt;/code>, &lt;code>f_french&lt;/code>), French legal origin (&lt;code>sjlofr&lt;/code>), and Catholic, Muslim, and other-religion population shares (&lt;code>catho80&lt;/code>, &lt;code>muslim80&lt;/code>, &lt;code>no_cpm80&lt;/code>) as exogenous controls.&lt;/p>
&lt;pre>&lt;code class="language-stata">use &amp;quot;${DATA_URL}/maketable5.dta&amp;quot;, clear
keep if baseco==1
eststo m5_c1: ivreg2 logpgp95 f_brit f_french (avexpr=logem4), robust
eststo m5_c5: ivreg2 logpgp95 sjlofr (avexpr=logem4), robust
eststo m5_c7: ivreg2 logpgp95 catho80 muslim80 no_cpm80 (avexpr=logem4), robust
esttab m5_c1 m5_c5 m5_c7 using &amp;quot;tab5_iv_controls.csv&amp;quot;, csv replace ///
b(3) se(3) star(* 0.10 ** 0.05 *** 0.01) ///
stats(N r2 firstF, fmt(0 3 2))
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> (1) (5) (7)
+brit/french +legal +religion
avexpr 1.078*** 1.080*** 0.917***
(0.240) (0.202) (0.156)
First-stage F (KP) 11.73 15.94 16.76
N 64 64 64
&lt;/code>&lt;/pre>
&lt;p>Adding colonial-identity dummies, legal origin, or religion shares leaves the IV coefficient on &lt;code>avexpr&lt;/code> between &lt;strong>0.917 and 1.339&lt;/strong> across the nine columns: never materially below the 0.944 baseline and frequently larger. Standard errors widen (0.156 to 0.535), and first-stage F-statistics range from 2.90 (Col 4, with Neo-Europes excluded + latitude) to 16.76 (Col 7). AJR&amp;rsquo;s argument that institutions are doing the work (not legal origin, religion, or which European power did the colonizing) survives this battery: none of these control sets eliminates or even meaningfully shrinks the institutional-quality coefficient. The Col 4 caveat is real: with a first-stage F of 2.90, the result survives there only in the sense that its wide confidence interval remains compatible with the baseline, not as a tightly estimated point.&lt;/p>
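&lt;p>The &lt;code>esttab&lt;/code> calls in this tutorial report a &lt;code>firstF&lt;/code> statistic without showing where it is defined. One plausible way to produce it, offered here as a sketch rather than as the original script&amp;rsquo;s method, is to copy the weak-identification statistic that &lt;code>ivreg2&lt;/code> leaves in &lt;code>e(widstat)&lt;/code> (the Kleibergen-Paap rk Wald F when &lt;code>robust&lt;/code> is specified) into the stored results with &lt;code>estadd&lt;/code>. The example assumes &lt;code>maketable5.dta&lt;/code> is loaded and restricted to the base sample.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Sketch: store the first-stage F so esttab can report it as firstF
eststo m5_c1: ivreg2 logpgp95 f_brit f_french (avexpr=logem4), robust
estadd scalar firstF = e(widstat)    // KP rk Wald F under the robust option
esttab m5_c1, keep(avexpr) b(3) se(3) stats(N firstF, fmt(0 2))
&lt;/code>&lt;/pre>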
&lt;hr>
&lt;h2 id="8-robustness-2-geography-and-climate-table-6">8. Robustness 2: geography and climate (Table 6)&lt;/h2>
&lt;p>Geography is the most plausible threat to the exclusion restriction. Maybe high settler mortality reflects tropical disease environments that &lt;em>directly&lt;/em> depress modern productivity — through agriculture, labor productivity, or human-capital accumulation — independent of institutions. If true, settler mortality would have a direct arrow into &lt;code>logpgp95&lt;/code> and the exclusion restriction would fail.&lt;/p>
&lt;pre>&lt;code class="language-stata">use &amp;quot;${DATA_URL}/maketable6.dta&amp;quot;, clear
keep if baseco==1
eststo m6_c1: ivreg2 logpgp95 temp1-temp5 humid1-humid4 (avexpr=logem4), robust
eststo m6_c5: ivreg2 logpgp95 steplow deslow stepmid desmid drystep drywint goldm iron silv zinc oilres landlock (avexpr=logem4), robust
eststo m6_c7: ivreg2 logpgp95 avelf (avexpr=logem4), robust
esttab m6_c1 m6_c5 m6_c7 using &amp;quot;tab6_iv_geo.csv&amp;quot;, csv replace ///
b(3) se(3) star(* 0.10 ** 0.05 *** 0.01) stats(N r2 firstF)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> (1) (5) (7)
+climate +resources +ethnic-frac
avexpr 0.837*** 1.259** 0.738***
(0.165) (0.543) (0.140)
First-stage F (KP) 17.80 2.83 14.99
N 64 64 64
&lt;/code>&lt;/pre>
&lt;p>Across nine geographic specifications — temperature dummies, humidity, latitude, percent in steppe/desert/dry climate, mineral resources, landlock status, ethnolinguistic fractionalization (&lt;code>avelf&lt;/code>) — the IV coefficient on &lt;code>avexpr&lt;/code> ranges from &lt;strong>0.713 to 1.358&lt;/strong>, bracketing the 0.944 baseline. The catch is that first-stage F drops below 10 in five of nine columns (lowest 1.74 in Col 6, 2.83 in Col 5), because the geography variables are themselves correlated with &lt;code>logem4&lt;/code>. The qualitative conclusion holds; the quantitative confidence intervals widen.&lt;/p>
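&lt;p>The reason the first stage weakens can be inspected directly: if the geography controls already explain most of &lt;code>logem4&lt;/code>, little independent variation is left to identify the effect of &lt;code>avexpr&lt;/code>. A minimal sketch of that check for the climate controls, assuming &lt;code>maketable6.dta&lt;/code> is loaded and restricted to &lt;code>baseco==1&lt;/code>:&lt;/p>
&lt;pre>&lt;code class="language-stata">* Sketch: how much of the instrument do the climate controls absorb?
quietly regress logem4 temp1-temp5 humid1-humid4
display e(r2)    // share of logem4 variance explained by the controls
&lt;/code>&lt;/pre>
&lt;p>The closer this R² is to one, the less residual instrument variation remains once the controls are partialled out.&lt;/p>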
&lt;hr>
&lt;h2 id="9-robustness-3-the-trickiest-case--health-channels-table-7">9. Robustness 3: the trickiest case — health channels (Table 7)&lt;/h2>
&lt;p>The tightest empirical challenge to AJR&amp;rsquo;s exclusion restriction is health. If the disease environment that killed European settlers in 1700 &lt;em>still&lt;/em> depresses productivity in 1995 (through malaria, infant mortality, or low life expectancy), then &lt;code>logem4&lt;/code> enters &lt;code>logpgp95&lt;/code> through a direct health channel, not just through institutions. Table 7 includes modern health variables as controls. Two readings are possible:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>AJR&amp;rsquo;s preferred reading:&lt;/strong> modern health is a &amp;ldquo;bad control&amp;rdquo; — itself an outcome of institutional quality, so adjusting for it shrinks the institutional coefficient toward zero artifactually.&lt;/li>
&lt;li>&lt;strong>A critic&amp;rsquo;s reading:&lt;/strong> modern health is genuinely exogenous, and its inclusion exposes a violation of the exclusion restriction.&lt;/li>
&lt;/ul>
&lt;p>The data alone cannot adjudicate.&lt;/p>
&lt;pre>&lt;code class="language-stata">use &amp;quot;${DATA_URL}/maketable7.dta&amp;quot;, clear
keep if baseco==1
eststo m7_c1: ivreg2 logpgp95 malfal94 (avexpr=logem4), robust
eststo m7_c3: ivreg2 logpgp95 leb95 (avexpr=logem4), robust
eststo m7_c5: ivreg2 logpgp95 imr95 (avexpr=logem4), robust
// Cols 7-9: 4 instruments, 2 endogenous regressors -&amp;gt; Hansen J meaningful
eststo m7_c7: ivreg2 logpgp95 (avexpr malfal94 = logem4 latabs lt100km meantemp), gmm2s robust
estadd scalar hansenJ = e(j)
estadd scalar hansenP = e(jp)
esttab m7_c1 m7_c3 m7_c5 m7_c7 using &amp;quot;tab7_iv_health.csv&amp;quot;, csv replace ///
b(3) se(3) star(* 0.10 ** 0.05 *** 0.01) ///
stats(N r2 firstF hansenJ hansenP)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> (1) (3) (5) (7) overid
+malaria +life exp. +infant mort. (4 instr)
avexpr 0.687*** 0.629** 0.551** 0.611***
(0.265) (0.295) (0.260) (0.235)
First-stage F (KP) 3.79 4.02 4.86 1.17
Hansen J 1.56 (p=0.459)
N 62 60 60 60
&lt;/code>&lt;/pre>
&lt;p>When malaria prevalence (&lt;code>malfal94&lt;/code>), life expectancy (&lt;code>leb95&lt;/code>), or infant mortality (&lt;code>imr95&lt;/code>) is added as an exogenous control, the IV coefficient on &lt;code>avexpr&lt;/code> falls to &lt;strong>0.55–0.69&lt;/strong>, the only place in the script where the estimate instrumented by &lt;code>logem4&lt;/code> approaches the OLS benchmark of 0.522. Cols 7–9 use four instruments for two endogenous regressors via efficient GMM (&lt;code>gmm2s&lt;/code>), making the Hansen J test meaningful: J p-values of 0.46–0.76 fail to reject the joint exogeneity of the instrument set, providing modest support for AJR&amp;rsquo;s reading. But the first-stage F-statistics in Table 7 collapse to &lt;strong>1.17–4.86&lt;/strong> (1.17 in the overidentified Col 7), well below any weak-IV threshold, so the Hansen J non-rejection has &lt;em>low power&lt;/em>, including against the shared-imputation problem discussed in Section 10, and deserves limited confidence. Health channels are the place where a fair-minded reader should retain doubt.&lt;/p>
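&lt;p>The Hansen J p-value in Col 7 follows from a chi-squared distribution whose degrees of freedom equal the number of overidentifying restrictions: four instruments minus two endogenous regressors. A one-line sketch using the rounded J statistic from the table:&lt;/p>
&lt;pre>&lt;code class="language-stata">* Sketch: Hansen J p-value for Col 7 (4 instruments, 2 endogenous regressors)
display chi2tail(4 - 2, 1.56)    // upper-tail chi-squared probability, about 0.46
&lt;/code>&lt;/pre>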
&lt;hr>
&lt;h2 id="10-overidentification-and-alternative-instruments-table-8">10. Overidentification and alternative instruments (Table 8)&lt;/h2>
&lt;p>If &lt;code>logem4&lt;/code> were the only instrument we had, we could not test the exclusion restriction directly. AJR&amp;rsquo;s solution is to use &lt;em>alternative&lt;/em> historical variables as additional instruments: the European-descent share of the population in 1900 (&lt;code>euro1900&lt;/code>), 1900 constraints on the executive (&lt;code>cons00a&lt;/code>), 1900 democracy (&lt;code>democ00a&lt;/code>), 1st-year-of-independence constraints (&lt;code>cons1&lt;/code>), independence year (&lt;code>indtime&lt;/code>), and 1st-year-of-independence democracy (&lt;code>democ1&lt;/code>). The question is whether these all agree on the same causal effect. If yes, the joint exogeneity assumption is more credible.&lt;/p>
&lt;p>We focus on two panels. &lt;strong>Panel C&lt;/strong> pairs each alternative instrument with &lt;code>logem4&lt;/code> and runs efficient GMM, producing a Hansen J test. &lt;strong>Panel D&lt;/strong> drops the exclusion restriction on &lt;code>logem4&lt;/code> itself by including it as an exogenous control while the alternative instruments do the identification, which is the harshest sensitivity check.&lt;/p>
&lt;pre>&lt;code class="language-stata">use &amp;quot;${DATA_URL}/maketable8.dta&amp;quot;, clear
keep if baseco==1
// Panel C: alt instrument + logem4 -&amp;gt; Hansen J meaningful
eststo m8c_c1: ivreg2 logpgp95 (avexpr = euro1900 logem4), gmm2s robust
eststo m8c_c3: ivreg2 logpgp95 (avexpr = cons00a logem4), gmm2s robust
eststo m8c_c5: ivreg2 logpgp95 (avexpr = democ00a logem4), gmm2s robust
// Panel D: logem4 as exogenous control, alt instrument identifies
eststo m8d_c1: ivreg2 logpgp95 logem4 (avexpr = euro1900), robust
eststo m8d_c3: ivreg2 logpgp95 logem4 (avexpr = cons00a), robust
eststo m8d_c5: ivreg2 logpgp95 logem4 (avexpr = democ00a), robust
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Panel C (overid): Hansen J p-values 0.21 to 0.80 across 5 alt instruments
-&amp;gt; uniformly fails to reject joint exogeneity
Panel D (logem4 as control):
euro1900 instrument: avexpr = 0.81-0.88 logem4 control = -0.05 to -0.07
cons00a instrument: avexpr = 0.42-0.45 logem4 control = -0.25 to -0.26
democ00a instrument: avexpr = 0.48-0.52 logem4 control = -0.21 to -0.22
cons1 instrument: avexpr = 0.49-0.49 logem4 control = -0.14 to -0.14
democ1 instrument: avexpr = 0.40-0.41 logem4 control = -0.19 to -0.19
In all 10 columns the logem4 control coefficient is statistically zero (p &amp;gt; 0.1).
&lt;/code>&lt;/pre>
&lt;p>Panel C delivers Hansen J p-values from &lt;strong>0.21 to 0.80&lt;/strong> across five alternative instrument pairs — uniformly failing to reject joint exogeneity. This is the test AJR pass cleanly. Panel D is more demanding: when &lt;code>logem4&lt;/code> enters as a control, the IV coefficient on &lt;code>avexpr&lt;/code> splits by instrument family. Cols 21–22 (using &lt;code>euro1900&lt;/code>) keep &lt;code>avexpr&lt;/code> at &lt;strong>0.81–0.88&lt;/strong> — likely because &lt;code>euro1900&lt;/code> is itself a continuous mortality-correlated proxy rather than a clean institutional alternative. Cols 23–30 (using historical-institution alternatives &lt;code>cons00a&lt;/code>, &lt;code>democ00a&lt;/code>, &lt;code>cons1&lt;/code>, &lt;code>indtime&lt;/code>, &lt;code>democ1&lt;/code>) fall to &lt;strong>0.40–0.52&lt;/strong>. The &lt;code>logem4&lt;/code> control is itself never statistically distinguishable from zero across any of the 10 columns. This pattern is consistent with AJR&amp;rsquo;s claim — settler mortality affects modern income only through institutions — but the 8-of-10 drop in coefficient magnitude when &lt;code>logem4&lt;/code> is moved to the right-hand side suggests some of the baseline IV&amp;rsquo;s strength came from &lt;code>logem4&lt;/code> proxying for unobserved correlates that the historical-institution alternatives do not capture.&lt;/p>
&lt;p>A critical caveat is owed: Albouy (2012) shows that 36 of the 64 countries in AJR&amp;rsquo;s base sample (about 56%) are assigned mortality rates from other countries (e.g., one African country&amp;rsquo;s mortality figure used for several neighbors). Hansen J non-rejection assumes &lt;em>independent&lt;/em> moment conditions. If the alternative instruments share imputation noise with &lt;code>logem4&lt;/code>, they would agree spuriously: Hansen J cannot detect coordinated witnesses.&lt;/p>
&lt;hr>
&lt;h2 id="11-the-visual-summary-ols-vs-iv-across-specifications-figure-3">11. The visual summary: OLS vs IV across specifications (Figure 3)&lt;/h2>
&lt;p>Figure 3 presents a &lt;code>coefplot&lt;/code> of the &lt;code>avexpr&lt;/code> coefficient across six representative specifications: OLS baseline (orange), four IV variants with &lt;code>logem4&lt;/code> (steel blue), and IV with the &lt;code>euro1900&lt;/code> alternative instrument (teal). The visual confirms what the tables show numerically.&lt;/p>
&lt;pre>&lt;code class="language-stata">coefplot ///
(m4_ols_c1, label(&amp;quot;OLS&amp;quot;) mcolor(&amp;quot;${WARM_ORANGE}&amp;quot;)) ///
(m4_iv_c1, label(&amp;quot;IV: settler mortality&amp;quot;) mcolor(&amp;quot;${STEEL_BLUE}&amp;quot;)) ///
(m5_c1, label(&amp;quot;IV + colonial controls&amp;quot;) mcolor(&amp;quot;${STEEL_BLUE}&amp;quot;)) ///
(m6_c1, label(&amp;quot;IV + geography controls&amp;quot;) mcolor(&amp;quot;${STEEL_BLUE}&amp;quot;)) ///
(m7_c1, label(&amp;quot;IV + malaria control&amp;quot;) mcolor(&amp;quot;${STEEL_BLUE}&amp;quot;)) ///
(m8a_c1, label(&amp;quot;IV: alt instrument euro1900&amp;quot;) mcolor(&amp;quot;${TEAL}&amp;quot;)), ///
keep(avexpr) xline(0, lcolor(&amp;quot;${LIGHT_TEXT}&amp;quot;) lpattern(dash)) ///
title(&amp;quot;Effect of institutions on log GDP: OLS vs IV&amp;quot;, color(&amp;quot;${WHITE_TEXT}&amp;quot;)) ///
graphregion(color(&amp;quot;${DARK_NAVY}&amp;quot;)) plotregion(color(&amp;quot;${DARK_NAVY}&amp;quot;)) ///
bgcolor(&amp;quot;${DARK_NAVY}&amp;quot;)
graph export &amp;quot;stata_iv_ols_vs_iv.png&amp;quot;, replace width(3000)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="stata_iv_ols_vs_iv.png" alt="Effect of institutions on log GDP across specifications">
&lt;em>Figure 3. Coefficient on &lt;code>avexpr&lt;/code> across six representative specifications, 95% CIs. OLS in orange, four IV variants with &lt;code>logem4&lt;/code> in steel blue, IV with the alternative instrument &lt;code>euro1900&lt;/code> in teal.&lt;/em>&lt;/p>
&lt;p>The orange OLS estimate sits at 0.522 with a tight confidence interval. The steel-blue IV variants (baseline, colonial controls, geography, and even the malaria control) sit at 0.69–1.08 with overlapping confidence intervals. The teal &lt;code>euro1900&lt;/code> alternative instrument lands near 0.87. Color semantics are deliberate: orange = naive estimator, blue family = IV with &lt;code>logem4&lt;/code>, teal = alternative instrument. The visual hierarchy mirrors the statistical hierarchy. No single specification stands above the rest as a &amp;ldquo;preferred estimate&amp;rdquo;; the message is that the institutional coefficient lives in the 0.7–1.1 range under any reasonable modeling choice, and is materially larger than the 0.5 OLS slope.&lt;/p>
&lt;hr>
&lt;h2 id="12-discussion">12. Discussion&lt;/h2>
&lt;p>&lt;strong>Do better institutions cause higher GDP per capita?&lt;/strong> The data say yes, and the magnitude is substantial. The 2SLS estimate of 0.944 implies that the gap between the world&amp;rsquo;s worst and best institutional environments can account for the 60-fold income gap between the world&amp;rsquo;s poorest and richest ex-colonies. Specifically, the gap from &lt;code>avexpr&lt;/code> = 3.5 (worst) to &lt;code>avexpr&lt;/code> = 10 (best) is 6.5 institutional points; multiplied by 0.944, that is 6.14 log points of GDP, or roughly a 460-fold income gap predicted by institutions alone. That is far more than the observed 60-fold gap, so it is best read as an &lt;em>out-of-sample&lt;/em> upper bound, but it is a striking number.&lt;/p>
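&lt;p>The worst-to-best calculation can be written out explicitly. This is a worked restatement of the numbers in the paragraph above, using the rounded coefficient 0.944:&lt;/p>
&lt;p>$$0.944 \times (10 - 3.5) = 0.944 \times 6.5 \approx 6.14, \qquad e^{6.14} \approx 460$$&lt;/p>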
&lt;p>The IV-OLS gap (0.944 vs 0.522) tells its own story. IV is &lt;strong>81% larger&lt;/strong> than OLS. Three biases pull in opposite directions: reverse causality and omitted variables push OLS upward; classical measurement error in the institutional-quality index pulls OLS downward. The fact that IV &amp;gt; OLS implies measurement error dominates — institutional quality is a noisy proxy for the latent property-rights regime, and noise attenuates OLS. De-noising it via IV reveals a &lt;em>steeper&lt;/em> causal slope, not a shallower one.&lt;/p>
&lt;p>Two caveats are non-negotiable. First, the 0.944 is a &lt;strong>LATE&lt;/strong> for compliers, not a population ATE. It applies to the subpopulation of countries whose institutional quality would have responded to a hypothetical change in their colonial-era settler mortality. For countries far from the historical colonization margin — established European democracies, never-colonized states — the 0.944 is silent. Second, Albouy (2012) flagged that a substantial share of AJR&amp;rsquo;s mortality data are imputed or shared across countries. Hansen J overidentification non-rejection assumes independent measurement noise; shared imputation could pass the test undetected. The exclusion restriction is &lt;strong>untestable in principle&lt;/strong>, only &lt;em>partially&lt;/em> falsifiable in practice, and AJR&amp;rsquo;s assumption that 1700-era mortality affects 1995 GDP only through institutions remains a &lt;em>substantive&lt;/em> claim that empirical work can support but not prove.&lt;/p>
&lt;p>For policymakers and practitioners, the practical implication is sharper than the academic debate. If institutional quality has a causal effect on GDP roughly twice as large as naive cross-country regressions suggest, then institutional reform is &lt;strong>roughly twice as valuable&lt;/strong> as previously thought — and reforms that are merely correlated with growth in OLS samples may be substantially more powerful causal levers. Conversely, naive policy advice based on OLS slopes systematically &lt;em>understates&lt;/em> the returns to building courts, regulators, and parliaments.&lt;/p>
&lt;hr>
&lt;h2 id="13-summary-limitations-and-next-steps">13. Summary, limitations, and next steps&lt;/h2>
&lt;p>&lt;strong>Method insight.&lt;/strong> 2SLS recovers a causal effect that is 81% larger than OLS (0.944 vs 0.522), consistent with classical attenuation from measurement error in the institutional-quality index dominating reverse-causality and omitted-variable biases. The Durbin-Wu-Hausman test ($\chi^2 = 9.09$, $p = 0.003$) rejects the exogeneity of &lt;code>avexpr&lt;/code>, which, if the instrument is valid, means OLS is biased; the weak-IV-robust Anderson-Rubin Wald test ($F = 61.66$) indicates that institutions matter even if one is uncomfortable with conventional 2SLS asymptotics on a borderline first-stage F.&lt;/p>
&lt;p>&lt;strong>Data insight.&lt;/strong> 64 ex-colonies span a 60-fold income range and a six-log-point mortality range. That much variation is enough to identify the IV cleanly when the instrument is strong, but not enough to identify it cleanly when controls absorb most of the first-stage signal. Robustness specs with first-stage F &amp;lt; 5 (Tab 5 Col 4, Tab 6 Cols 5-6, and the Table 7 columns, including the overidentified Cols 7-9) live in weak-IV territory, so read their confidence intervals, not their point estimates.&lt;/p>
&lt;p>&lt;strong>Limitation.&lt;/strong> The 0.944 is a LATE, not an ATE. It applies to the colonization-margin compliers, not the whole population of countries. It also depends on AJR&amp;rsquo;s exclusion restriction (that 1700-era settler mortality affects 1995 GDP only through institutions), which is untestable in principle and only partially probed by Hansen J in practice. Albouy&amp;rsquo;s (2012) imputation critique limits what J-test non-rejection can buy: 36 of the 64 countries are assigned mortality rates from other countries, so the joint exogeneity test has low power against shared imputation noise.&lt;/p>
&lt;p>&lt;strong>Next step.&lt;/strong> Install the SSC &lt;code>weakivtest&lt;/code> package and rerun the main spec to obtain the Olea-Pflueger (2013) effective F-statistic, the right benchmark under heteroskedasticity-robust inference. Compare it with the critical values that &lt;code>weakivtest&lt;/code> reports rather than with the Stock-Yogo threshold of 16.38, which is derived for iid errors. If the effective F clears those critical values, the conventional 2SLS asymptotics are safer to lean on. If it does not, the Anderson-Rubin Wald test becomes the primary inference tool.&lt;/p>
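&lt;p>A minimal sketch of that next step follows. It assumes internet access to SSC, that the base-sample data are in memory, and that &lt;code>weakivtest&lt;/code> depends on the &lt;code>avar&lt;/code> package (an assumption worth checking against the package&amp;rsquo;s help file). &lt;code>weakivtest&lt;/code> is run as a postestimation command after the IV regression.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Sketch: effective F after the main spec
ssc install avar          // assumed dependency of weakivtest
ssc install weakivtest
ivreg2 logpgp95 (avexpr=logem4), robust
weakivtest                // reports the effective F and its critical values
&lt;/code>&lt;/pre>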
&lt;hr>
&lt;h2 id="14-exercises">14. Exercises&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Reduced-form ratio check.&lt;/strong> Compute the reduced-form coefficient by regressing &lt;code>logpgp95&lt;/code> directly on &lt;code>logem4&lt;/code> in the base sample. Verify that it equals approximately $-0.573$, and that dividing it by the first-stage coefficient $-0.607$ recovers the 2SLS estimate of 0.944. What does this exercise teach you about what 2SLS is doing under the hood?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Just-identified vs overidentified.&lt;/strong> Replicate Table 8 Panel C in just-identified form: run &lt;code>ivreg2 logpgp95 (avexpr = euro1900), gmm2s robust&lt;/code> (one instrument only). Note that Hansen J is now zero — the model is exactly identified. What does this tell you about the J-test&amp;rsquo;s logic? Why must we have &lt;em>more&lt;/em> instruments than endogenous regressors to compute it?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Stress-test the exclusion restriction.&lt;/strong> Pick a candidate omitted variable that you think could violate the exclusion restriction (e.g., percentage of population at high altitude, or distance from the equator). Add it as an exogenous control to the main spec and report what happens to the 2SLS coefficient on &lt;code>avexpr&lt;/code>. Is your candidate a &amp;ldquo;bad control&amp;rdquo; (downstream of institutions) or a genuine threat to exclusion (upstream of mortality)?&lt;/p>
&lt;/li>
&lt;/ol>
&lt;hr>
&lt;h2 id="15-references">15. References&lt;/h2>
&lt;ol>
&lt;li>&lt;a href="https://www.aeaweb.org/articles?id=10.1257/aer.91.5.1369" target="_blank" rel="noopener">Acemoglu, D., Johnson, S., and Robinson, J. A. (2001). &amp;ldquo;The Colonial Origins of Comparative Development: An Empirical Investigation.&amp;rdquo; &lt;em>American Economic Review&lt;/em>, 91(5), 1369–1401.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.aeaweb.org/articles?id=10.1257/aer.102.6.3059" target="_blank" rel="noopener">Albouy, D. Y. (2012). &amp;ldquo;The Colonial Origins of Comparative Development: An Investigation of the Settler Mortality Data.&amp;rdquo; &lt;em>American Economic Review&lt;/em>, 102(6), 3059–3076.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.jstor.org/stable/2951620" target="_blank" rel="noopener">Imbens, G. W. and Angrist, J. D. (1994). &amp;ldquo;Identification and Estimation of Local Average Treatment Effects.&amp;rdquo; &lt;em>Econometrica&lt;/em>, 62(2), 467–475.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.jstor.org/stable/2171753" target="_blank" rel="noopener">Staiger, D. and Stock, J. H. (1997). &amp;ldquo;Instrumental Variables Regression with Weak Instruments.&amp;rdquo; &lt;em>Econometrica&lt;/em>, 65(3), 557–586.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.nber.org/papers/t0284" target="_blank" rel="noopener">Stock, J. H. and Yogo, M. (2005). &amp;ldquo;Testing for Weak Instruments in Linear IV Regression.&amp;rdquo; In &lt;em>Identification and Inference for Econometric Models&lt;/em>, Cambridge University Press.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.tandfonline.com/doi/abs/10.1080/00401706.2013.806694" target="_blank" rel="noopener">Olea, J. L. M. and Pflueger, C. (2013). &amp;ldquo;A Robust Test for Weak Instruments.&amp;rdquo; &lt;em>Journal of Business and Economic Statistics&lt;/em>, 31(3), 358–369.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://journals.sagepub.com/doi/10.1177/1536867X0800700402" target="_blank" rel="noopener">Baum, C. F., Schaffer, M. E., and Stillman, S. (2007). &amp;ldquo;Enhanced routines for instrumental variables/generalized method of moments estimation and testing.&amp;rdquo; &lt;em>Stata Journal&lt;/em>, 7(4), 465–506.&lt;/a>&lt;/li>
&lt;li>&lt;a href="http://fmwww.bc.edu/RePEc/bocode/i/ivreg2.html" target="_blank" rel="noopener">&lt;code>ivreg2&lt;/code> — Stata SSC archive.&lt;/a>&lt;/li>
&lt;li>&lt;a href="http://repec.sowi.unibe.ch/stata/coefplot/" target="_blank" rel="noopener">&lt;code>coefplot&lt;/code> (Jann) — Stata SSC archive.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://economics.mit.edu/people/faculty/daron-acemoglu/data-archive" target="_blank" rel="noopener">AJR (2001) replication package — &lt;code>maketable1.dta&lt;/code> through &lt;code>maketable8.dta&lt;/code> are mirrored at the post root and loaded by &lt;code>analysis.do&lt;/code> from this site&amp;rsquo;s GitHub raw URL for one-click replicability.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://youtu.be/ROLeLaR-17U" target="_blank" rel="noopener">Duke Mod·U &amp;ldquo;Causal Inference Bootcamp&amp;rdquo; — &lt;em>Introduction to Regression Analysis&lt;/em>. YouTube video.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://youtu.be/vCkrWeJG5cs" target="_blank" rel="noopener">Duke Mod·U &amp;ldquo;Causal Inference Bootcamp&amp;rdquo; — &lt;em>Basic Elements of a Regression Table&lt;/em>. YouTube video.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://youtu.be/fDCgagw2CAI" target="_blank" rel="noopener">Duke Mod·U &amp;ldquo;Causal Inference Bootcamp&amp;rdquo; — &lt;em>The Relationship Between Economic Development and Property Rights&lt;/em>. YouTube video.&lt;/a>&lt;/li>
&lt;/ol></description></item></channel></rss>