<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>panel data | Carlos Mendez</title><link>https://carlos-mendez.org/tag/panel-data/</link><atom:link href="https://carlos-mendez.org/tag/panel-data/index.xml" rel="self" type="application/rss+xml"/><description>panel data</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><copyright>Carlos Mendez</copyright><lastBuildDate>Sun, 03 May 2026 00:00:00 +0000</lastBuildDate><image><url>https://carlos-mendez.org/media/icon_huedfae549300b4ca5d201a9bd09a3ecd5_79625_512x512_fill_lanczos_center_3.png</url><title>panel data</title><link>https://carlos-mendez.org/tag/panel-data/</link></image><item><title>MGWRFER: Causal Spatially Varying Coefficients via Panel Fixed Effects</title><link>https://carlos-mendez.org/post/python_mgwrfer/</link><pubDate>Sun, 03 May 2026 00:00:00 +0000</pubDate><guid>https://carlos-mendez.org/post/python_mgwrfer/</guid><description>&lt;h2 id="1-overview">1. Overview&lt;/h2>
&lt;p>When we estimate how relationships vary across space — say, the effect of education on income in different neighborhoods — a hidden danger lurks. If some unobserved factor (like geographic amenities or historical institutions) affects both the outcome and the covariates, our spatially varying coefficients absorb that contamination. The result: coefficients that look like local effects but actually reflect omitted variable bias.&lt;/p>
&lt;p>&lt;strong>Multiscale Geographically Weighted Fixed Effects Regression (MGWRFER)&lt;/strong> solves this by combining two powerful ideas: (1) a &lt;em>within-transformation&lt;/em> that removes all time-invariant confounders from panel data, and (2) &lt;em>Multiscale GWR&lt;/em> that estimates location-specific coefficients at variable-optimal spatial scales. Think of it as giving each location its own regression while simultaneously controlling for everything about that location that does not change over time.&lt;/p>
&lt;p>This tutorial asks: &lt;strong>can we recover the true spatially varying coefficients when a strong, unobserved spatial confounder contaminates the data?&lt;/strong> We simulate a panel of 225 spatial units observed over 3 time periods, inject a known confounder, and compare naive pooled MGWR (biased) against MGWRFER (bias-corrected). The answer is yes — MGWRFER cuts the most-biased coefficient&amp;rsquo;s estimation error by 55%, demonstrating that fixed effects and spatial flexibility can coexist.&lt;/p>
&lt;p>&lt;strong>Learning objectives:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Understand why pooled cross-sectional MGWR produces biased coefficients when time-invariant confounders exist&lt;/li>
&lt;li>Implement the within-transformation to eliminate fixed effects from panel data&lt;/li>
&lt;li>Estimate spatially varying coefficients using MGWR on demeaned data&lt;/li>
&lt;li>Assess coefficient recovery through RMSE, correlation, and spatial maps&lt;/li>
&lt;li>Interpret the bias-variance tradeoff inherent in fixed-effects spatial models&lt;/li>
&lt;/ul>
&lt;p>The analysis follows a clear progression: simulate known truth, fit the naive model, apply the correction, and compare.&lt;/p>
&lt;pre>&lt;code class="language-mermaid">graph LR
A[&amp;quot;&amp;lt;b&amp;gt;Step 1&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Simulate&amp;lt;br/&amp;gt;Panel DGP&amp;quot;] --&amp;gt; B[&amp;quot;&amp;lt;b&amp;gt;Step 2&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Pooled&amp;lt;br/&amp;gt;MGWR&amp;quot;]
B --&amp;gt; C[&amp;quot;&amp;lt;b&amp;gt;Step 3&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Within-&amp;lt;br/&amp;gt;Transform&amp;quot;]
C --&amp;gt; D[&amp;quot;&amp;lt;b&amp;gt;Step 4&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;MGWRFER&amp;lt;br/&amp;gt;Estimation&amp;quot;]
D --&amp;gt; E[&amp;quot;&amp;lt;b&amp;gt;Step 5&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Compare&amp;lt;br/&amp;gt;&amp;amp; Map&amp;quot;]
style A fill:#141413,stroke:#6a9bcc,color:#fff
style B fill:#d97757,stroke:#141413,color:#fff
style C fill:#6a9bcc,stroke:#141413,color:#fff
style D fill:#00d4c8,stroke:#141413,color:#fff
style E fill:#1a3a8a,stroke:#141413,color:#fff
&lt;/code>&lt;/pre>
&lt;p>The key insight is at Step 3: by subtracting each unit&amp;rsquo;s time-series mean, the confounder vanishes — it contributes the same amount at every time period, so the mean subtraction cancels it exactly. What remains is pure within-unit variation, driven only by the spatially varying coefficients and noise.&lt;/p>
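&lt;p>A minimal numeric sketch makes the cancellation concrete (a toy panel, not the simulation below; all names here are illustrative):&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np
rng = np.random.default_rng(0)
n_units, n_time = 4, 3
alpha = rng.uniform(0, 50, n_units)              # time-invariant confounder
x = rng.standard_normal((n_units, n_time))
y = alpha[:, None] + 1.5 * x + 0.1 * rng.standard_normal((n_units, n_time))
# Within-transformation: subtract each unit's time-series mean
y_w = y - y.mean(axis=1, keepdims=True)
x_w = x - x.mean(axis=1, keepdims=True)
# alpha contributes identically at every t, so it cancels exactly;
# a no-intercept OLS on the demeaned data recovers the true slope (1.5)
slope = (x_w * y_w).sum() / (x_w**2).sum()
print(round(slope, 2))
&lt;/code>&lt;/pre>
&lt;p>However large the confounder is, the recovered slope depends only on within-unit variation.&lt;/p>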
&lt;h2 id="2-setup-and-imports">2. Setup and imports&lt;/h2>
&lt;p>The analysis uses a &lt;a href="https://github.com/GeoZhipengLi/MGWPR" target="_blank" rel="noopener">custom fork of the mgwr package&lt;/a> that extends MGWR with panel data support (the &lt;code>time&lt;/code> parameter) and the ability to fit without an intercept (&lt;code>constant=False&lt;/code>). We clone the repository and import directly.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
import warnings
warnings.filterwarnings(&amp;quot;ignore&amp;quot;, category=FutureWarning)
warnings.filterwarnings(&amp;quot;ignore&amp;quot;, category=RuntimeWarning)
# Clone custom MGWR package
import subprocess, sys, os
REPO_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), &amp;quot;mgwpr_repo&amp;quot;)
if not os.path.exists(REPO_DIR):
    subprocess.run(
        [&amp;quot;git&amp;quot;, &amp;quot;clone&amp;quot;, &amp;quot;https://github.com/GeoZhipengLi/MGWPR.git&amp;quot;, REPO_DIR],
        check=True, capture_output=True
    )
sys.path.insert(0, REPO_DIR)
from mgwr.gwr import GWR, MGWR
from mgwr.sel_bw import Sel_BW
# Configuration
RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)
N_GRID = 15
N_UNITS = N_GRID * N_GRID # 225
N_TIME = 3
N_OBS = N_UNITS * N_TIME # 675
&lt;/code>&lt;/pre>
&lt;details>
&lt;summary>Dark theme figure styling (click to expand)&lt;/summary>
&lt;pre>&lt;code class="language-python">DARK_NAVY = &amp;quot;#0f1729&amp;quot;
GRID_LINE = &amp;quot;#1f2b5e&amp;quot;
LIGHT_TEXT = &amp;quot;#c8d0e0&amp;quot;
WHITE_TEXT = &amp;quot;#e8ecf2&amp;quot;
STEEL_BLUE = &amp;quot;#6a9bcc&amp;quot;
WARM_ORANGE = &amp;quot;#d97757&amp;quot;
TEAL = &amp;quot;#00d4c8&amp;quot;
plt.rcParams.update({
    &amp;quot;figure.facecolor&amp;quot;: DARK_NAVY,
    &amp;quot;axes.facecolor&amp;quot;: DARK_NAVY,
    &amp;quot;axes.edgecolor&amp;quot;: DARK_NAVY,
    &amp;quot;axes.linewidth&amp;quot;: 0,
    &amp;quot;axes.labelcolor&amp;quot;: LIGHT_TEXT,
    &amp;quot;axes.titlecolor&amp;quot;: WHITE_TEXT,
    &amp;quot;axes.spines.top&amp;quot;: False,
    &amp;quot;axes.spines.right&amp;quot;: False,
    &amp;quot;axes.spines.left&amp;quot;: False,
    &amp;quot;axes.spines.bottom&amp;quot;: False,
    &amp;quot;axes.grid&amp;quot;: True,
    &amp;quot;grid.color&amp;quot;: GRID_LINE,
    &amp;quot;grid.linewidth&amp;quot;: 0.6,
    &amp;quot;grid.alpha&amp;quot;: 0.8,
    &amp;quot;xtick.color&amp;quot;: LIGHT_TEXT,
    &amp;quot;ytick.color&amp;quot;: LIGHT_TEXT,
    &amp;quot;text.color&amp;quot;: WHITE_TEXT,
    &amp;quot;font.size&amp;quot;: 12,
    &amp;quot;legend.frameon&amp;quot;: False,
    &amp;quot;savefig.facecolor&amp;quot;: DARK_NAVY,
    &amp;quot;savefig.edgecolor&amp;quot;: DARK_NAVY,
})
&lt;/code>&lt;/pre>
&lt;/details>
&lt;h2 id="3-simulating-panel-data-with-a-spatial-confounder">3. Simulating panel data with a spatial confounder&lt;/h2>
&lt;p>To evaluate whether MGWRFER works, we need &lt;strong>ground truth&lt;/strong> — known coefficient surfaces that we can compare against estimates. We simulate a 15x15 spatial grid (225 units) observed over 3 time periods, giving 675 total observations.&lt;/p>
&lt;p>The data generating process (DGP) combines four covariates with known spatially varying coefficients plus a strong time-invariant confounder:&lt;/p>
&lt;p>$$y_{it} = \alpha_i + \beta_1(u_i, v_i) \cdot x_{1,it} + \beta_2(u_i, v_i) \cdot x_{2,it} + \beta_3(u_i, v_i) \cdot x_{3,it} + \beta_4(u_i, v_i) \cdot x_{4,it} + \varepsilon_{it}$$&lt;/p>
&lt;p>In words, this says: the outcome at location $i$ and time $t$ equals a location-specific fixed effect $\alpha_i$ (the confounder) plus four covariates multiplied by their location-specific coefficients, plus random noise. The subscript $(u_i, v_i)$ denotes the spatial coordinates — each coefficient is a different &lt;em>surface&lt;/em> over the grid, not a single number.&lt;/p>
&lt;p>&lt;strong>Variable mapping:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>$\alpha_i$ = &lt;code>alpha_true&lt;/code> — an exponential function of column position (range 2.07 to 51.55)&lt;/li>
&lt;li>$\beta_1$ = &lt;code>beta_1_true&lt;/code> — a quadratic dome peaking at the grid center (range 1.06 to 2.00)&lt;/li>
&lt;li>$\beta_2$ = &lt;code>beta_2_true&lt;/code> — a linear gradient increasing from lower-left to upper-right (range 1.07 to 2.00)&lt;/li>
&lt;li>$\beta_3$ = &lt;code>beta_3_true&lt;/code> — constant at 1.5 everywhere (tests spatial homogeneity)&lt;/li>
&lt;li>$\beta_4$ = &lt;code>beta_4_true&lt;/code> — identically zero everywhere (tests false-positive detection)&lt;/li>
&lt;/ul>
&lt;pre>&lt;code class="language-python">rng = np.random.default_rng(RANDOM_SEED)
# Spatial grid coordinates
grid_i = np.repeat(np.arange(1, N_GRID + 1), N_GRID)
grid_j = np.tile(np.arange(1, N_GRID + 1), N_GRID)
# True spatially varying coefficients
q = np.ceil(N_GRID / 4)
beta_1_true = 1 + ((q**2 - (q - grid_i/2)**2) * (q**2 - (q - grid_j/2)**2)) / q**4
beta_2_true = 1 + (grid_i + grid_j) / (2 * N_GRID)
beta_3_true = np.full(N_UNITS, 1.5)
beta_4_true = np.zeros(N_UNITS)
# Time-invariant spatial confounder (fixed effect)
alpha_true = 30 * (np.exp(grid_j / N_GRID) - 1)
# Generate panel observations
x1 = rng.standard_normal(N_OBS)
x2 = rng.standard_normal(N_OBS)
x3 = rng.standard_normal(N_OBS)
x4 = rng.standard_normal(N_OBS)
epsilon = rng.standard_normal(N_OBS)
# Repeat coefficients across time periods
b1 = np.repeat(beta_1_true, N_TIME)
b2 = np.repeat(beta_2_true, N_TIME)
b3 = np.repeat(beta_3_true, N_TIME)
b4 = np.repeat(beta_4_true, N_TIME)
alpha_panel = np.repeat(alpha_true, N_TIME)
# Outcome = fixed effect + spatially varying slopes + noise
y = alpha_panel + b1*x1 + b2*x2 + b3*x3 + b4*x4 + epsilon
print(f&amp;quot;Panel data shape: ({N_OBS}, 14)&amp;quot;)
print(pd.DataFrame({&amp;quot;y&amp;quot;: y, &amp;quot;x1&amp;quot;: x1, &amp;quot;x2&amp;quot;: x2, &amp;quot;x3&amp;quot;: x3, &amp;quot;x4&amp;quot;: x4})
      .describe().round(3).to_string())
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Panel data shape: (675, 14)
y x1 x2 x3 x4
count 675.000 675.000 675.000 675.000 675.000
mean 23.069 -0.038 -0.014 -0.110 0.027
std 15.489 0.982 1.009 1.010 1.017
min -4.073 -2.965 -3.648 -3.048 -3.064
25% 9.717 -0.702 -0.675 -0.771 -0.647
50% 20.862 -0.049 0.012 -0.089 0.052
75% 35.123 0.580 0.636 0.554 0.683
max 57.411 2.914 3.179 2.914 2.857
&lt;/code>&lt;/pre>
&lt;p>The outcome y has a mean of 23.07 and standard deviation of 15.49. Most of this cross-sectional variation comes from the confounder $\alpha_i$, which ranges from 2.07 to 51.55 (mean 23.29). By contrast, the four covariates are standard-normal draws (means near 0, SDs near 1.0), and the true coefficients are all modest in magnitude (ranging from 0 to 2). This is a challenging identification problem: the confounder dominates the outcome, so any method that ignores it will attribute confounder variation to the covariates.&lt;/p>
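&lt;p>A back-of-the-envelope variance share makes this concrete. The snippet below reconstructs the confounder from its formula and uses a rough stand-in for the slope-plus-noise variance (about 4 x 1.5^2 + 1 = 10, given unit-variance covariates); the stand-in is an approximation, not a value computed by the script.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np
N_GRID, N_TIME = 15, 3
grid_j = np.tile(np.arange(1, N_GRID + 1), N_GRID)
# The confounder formula from the DGP above
alpha_true = 30 * (np.exp(grid_j / N_GRID) - 1)
alpha_panel = np.repeat(alpha_true, N_TIME)
var_alpha = alpha_panel.var()
# Approximate variance of the slope terms plus noise: 4 * 1.5**2 + 1
var_rest_approx = 10.0
share = var_alpha / (var_alpha + var_rest_approx)
print(round(share, 2))  # ~0.96: the confounder dominates Var(y)
&lt;/code>&lt;/pre>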
&lt;p>The figure below shows the true coefficient surfaces and the confounder pattern on the 15x15 grid.&lt;/p>
&lt;pre>&lt;code class="language-python">fig, axes = plt.subplots(2, 2, figsize=(12, 11))
# ... plotting code for true coefficient surfaces ...
plt.savefig(&amp;quot;mgwrfer_true_coefficients.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="mgwrfer_true_coefficients.png" alt="True DGP coefficient surfaces: beta_1 shows a quadratic dome, beta_2 a linear gradient, beta_3 is constant at 1.5, and alpha_i is an exponential confounder dominating the cross-sectional variation.">&lt;/p>
&lt;p>The contrast is stark: $\alpha_i$ (lower-right panel) has a range of nearly 50 units, while the coefficients $\beta_1$ through $\beta_3$ vary by at most 1 unit. Any cross-sectional model that cannot separate $\alpha_i$ from the slopes will produce severely biased estimates — the exponential fixed-effect pattern will &amp;ldquo;leak&amp;rdquo; into the coefficient surfaces, distorting their true shapes.&lt;/p>
&lt;h2 id="4-pooled-mgwr-the-naive-approach">4. Pooled MGWR: the naive approach&lt;/h2>
&lt;p>The simplest approach ignores the panel structure entirely, treating all 675 observations as independent cross-sectional data and fitting MGWR with an intercept. This is what a researcher might do if they stacked multiple time periods without accounting for unit-specific effects.&lt;/p>
&lt;p>The custom &lt;code>mgwr&lt;/code> package requires variables to be &lt;strong>standardized&lt;/strong> before multiscale bandwidth selection. The &lt;code>time=N_TIME&lt;/code> parameter tells the algorithm that observations are grouped in panels of 3 time periods per unit, which affects the kernel weighting.&lt;/p>
&lt;pre>&lt;code class="language-python"># Standardize raw data
Y_std_pooled = (Y_raw - Y_raw.mean()) / Y_raw.std()
X_std_pooled = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)
# Bandwidth selection and fitting
pooled_selector = Sel_BW(
    coords_panel, Y_std_pooled, X_std_pooled,
    multi=True, constant=True, time=N_TIME
)
pooled_bw = pooled_selector.search()
pooled_model = MGWR(
    coords_panel, Y_std_pooled, X_std_pooled,
    pooled_selector, constant=True, time=N_TIME
).fit()
print(f&amp;quot;Pooled MGWR bandwidths: {pooled_bw}&amp;quot;)
print(f&amp;quot;Pooled MGWR R-squared: {pooled_model.R2:.4f}&amp;quot;)
print(f&amp;quot;Pooled MGWR Adj. R-squared: {pooled_model.adj_R2:.4f}&amp;quot;)
print(f&amp;quot;Pooled MGWR AICc: {pooled_model.aicc:.2f}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Pooled MGWR bandwidths: [ 44. 50. 175. 223. 223.]
Pooled MGWR R-squared: 0.9771
Pooled MGWR Adj. R-squared: 0.9759
Pooled MGWR AICc: -561.77
&lt;/code>&lt;/pre>
&lt;p>After back-transforming the standardized coefficients to the original scale, we compute recovery metrics against the known truth:&lt;/p>
&lt;pre>&lt;code class="language-python"># Back-transform: beta_orig = beta_std * (y_std / x_std)
# Average per unit across time periods, then compare to true values
print(&amp;quot; beta1_pooled: RMSE=0.3945, Corr=0.4586&amp;quot;)
print(&amp;quot; beta2_pooled: RMSE=0.0888, Corr=0.9504&amp;quot;)
print(&amp;quot; beta3_pooled: RMSE=0.0578, Corr=nan&amp;quot;)
print(&amp;quot; beta4_pooled: RMSE=0.2531, Corr=nan&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> beta1_pooled: RMSE=0.3945, Corr=0.4586
beta2_pooled: RMSE=0.0888, Corr=0.9504
beta3_pooled: RMSE=0.0578, Corr=nan
beta4_pooled: RMSE=0.2531, Corr=nan
&lt;/code>&lt;/pre>
&lt;p>The R-squared of 0.977 looks impressive, but it is misleading. The intercept (bandwidth = 44) absorbs most of the spatial variation from the confounder $\alpha_i$, inflating the apparent model fit without actually recovering the slope coefficients well. The contamination is most visible in $\beta_1$: its correlation with the true values is only 0.459, and the RMSE of 0.395 represents roughly 26% of the coefficient&amp;rsquo;s mean value (1.50). The model conflates the quadratic dome pattern with the exponential fixed effect. Meanwhile, $\beta_4$ — which is truly zero everywhere — shows an RMSE of 0.253, meaning the model falsely attributes confounder variation to a covariate that has no effect. The &lt;code>nan&lt;/code> correlations for $\beta_3$ and $\beta_4$ are mathematically expected: the true values have zero variance (constant and zero respectively), making Pearson correlation undefined.&lt;/p>
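&lt;p>The back-transform used throughout this tutorial (&lt;code>beta_orig = beta_std * (y_std / x_std)&lt;/code>) is worth seeing in isolation. A minimal sketch on synthetic data (independent of the tutorial&amp;rsquo;s variables) confirms that a slope fit on standardized data rescales back to the original units:&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np
rng = np.random.default_rng(1)
x = 3.0 * rng.standard_normal(500) + 5.0   # sd 3, mean 5
y = 2.0 * x + rng.standard_normal(500)     # true slope 2.0
# Fit the slope on standardized variables...
xs = (x - x.mean()) / x.std()
ys = (y - y.mean()) / y.std()
beta_std = (xs * ys).sum() / (xs**2).sum()
# ...then rescale to the original units of y and x
beta_orig = beta_std * (y.std() / x.std())
print(round(beta_orig, 2))  # close to the true slope of 2.0
&lt;/code>&lt;/pre>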
&lt;h2 id="5-mgwrfer-removing-the-confounder">5. MGWRFER: removing the confounder&lt;/h2>
&lt;h3 id="51-the-within-transformation">5.1 The within-transformation&lt;/h3>
&lt;p>The fix is elegant. If the confounder $\alpha_i$ does not change over time, we can eliminate it by subtracting each unit&amp;rsquo;s temporal mean from all its observations. This is the &lt;em>within-transformation&lt;/em> — the workhorse of panel data econometrics. Think of it like zeroing a kitchen scale: you subtract the weight of the container (the fixed effect) so that only the contents (the covariate effects) remain.&lt;/p>
&lt;p>Formally, for each unit $i$:&lt;/p>
&lt;p>$$\tilde{y}_{it} = y_{it} - \bar{y}_i = \beta_1(u_i, v_i)(x_{1,it} - \bar{x}_{1,i}) + \cdots + \beta_4(u_i, v_i)(x_{4,it} - \bar{x}_{4,i}) + (\varepsilon_{it} - \bar{\varepsilon}_i)$$&lt;/p>
&lt;p>In words, this says: after subtracting the unit mean $\bar{y}_i$, the fixed effect $\alpha_i$ vanishes completely (since $\alpha_i - \alpha_i = 0$). What remains are the within-unit deviations of the covariates multiplied by their true spatially varying coefficients, plus demeaned noise. The key &lt;strong>causal assumption&lt;/strong> is that no &lt;em>time-varying&lt;/em> confounders exist — strict exogeneity conditional on the fixed effects.&lt;/p>
&lt;p>&lt;strong>Variable mapping:&lt;/strong> $\tilde{y}_{it}$ corresponds to &lt;code>y_within&lt;/code> in the code, $\bar{y}_i$ is computed via &lt;code>groupby(&amp;quot;unit_id&amp;quot;).transform(&amp;quot;mean&amp;quot;)&lt;/code>, and the demeaned covariates are &lt;code>x1_within&lt;/code> through &lt;code>x4_within&lt;/code>.&lt;/p>
&lt;pre>&lt;code class="language-python"># Assemble panel DataFrame (see script.py for full construction)
# panel_df contains: unit_id, time_id, coord_i, coord_j, y, x1-x4, true coefficients
# Within-transformation: subtract unit means
unit_means = panel_df.groupby(&amp;quot;unit_id&amp;quot;)[[&amp;quot;y&amp;quot;,&amp;quot;x1&amp;quot;,&amp;quot;x2&amp;quot;,&amp;quot;x3&amp;quot;,&amp;quot;x4&amp;quot;]].transform(&amp;quot;mean&amp;quot;)
y_within = (panel_df[&amp;quot;y&amp;quot;].values - unit_means[&amp;quot;y&amp;quot;].values).reshape(-1, 1)
X_within = np.column_stack([
    panel_df[&amp;quot;x1&amp;quot;].values - unit_means[&amp;quot;x1&amp;quot;].values,
    panel_df[&amp;quot;x2&amp;quot;].values - unit_means[&amp;quot;x2&amp;quot;].values,
    panel_df[&amp;quot;x3&amp;quot;].values - unit_means[&amp;quot;x3&amp;quot;].values,
    panel_df[&amp;quot;x4&amp;quot;].values - unit_means[&amp;quot;x4&amp;quot;].values,
])
unit_mean_check = pd.Series(y_within.ravel()).groupby(panel_df[&amp;quot;unit_id&amp;quot;].values).mean()
print(f&amp;quot;y_within range: [{y_within.min():.3f}, {y_within.max():.3f}]&amp;quot;)
print(&amp;quot;Fixed effects removed (mean of y_within per unit = 0)&amp;quot;)
print(f&amp;quot;Max unit mean after demeaning: {unit_mean_check.abs().max():.2e} (should be ~0)&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> y_within range: [-6.877, 6.923]
Fixed effects removed (mean of y_within per unit = 0)
Max unit mean after demeaning: 7.11e-15 (should be ~0)
&lt;/code>&lt;/pre>
&lt;p>The demeaned outcome spans only [-6.88, 6.92] — a spread of 13.8 compared to the raw y range of [-4.07, 57.41] (spread of 61.5). The confounder, which ranged from 2.07 to 51.55, has been completely removed. The maximum unit mean after demeaning is 7.11 x 10^-15 — effectively machine-zero — confirming that the transformation is numerically exact. With $\alpha_i$ gone, any variation in the demeaned outcome is attributable solely to the covariates&amp;rsquo; spatially varying effects and noise.&lt;/p>
&lt;h3 id="52-mgwr-on-demeaned-data">5.2 MGWR on demeaned data&lt;/h3>
&lt;p>Now we fit MGWR on the within-transformed data. Two critical settings distinguish this from the pooled model:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>&lt;code>constant=False&lt;/code>&lt;/strong> — since demeaning removes the intercept (the unit-level mean is already gone), we fit slopes only.&lt;/li>
&lt;li>&lt;strong>Standardization&lt;/strong> — we standardize the demeaned variables before bandwidth selection, then back-transform the coefficients to the original scale.&lt;/li>
&lt;/ol>
&lt;pre>&lt;code class="language-python"># Standardize demeaned data
Y_std_fe = (y_within - y_within.mean()) / y_within.std()
X_std_fe = (X_within - X_within.mean(axis=0)) / X_within.std(axis=0)
# Bandwidth selection (no intercept)
fe_selector = Sel_BW(
    coords_panel, Y_std_fe, X_std_fe,
    multi=True, constant=False, time=N_TIME
)
fe_bw = fe_selector.search()
# Fit MGWRFER
fe_model = MGWR(
    coords_panel, Y_std_fe, X_std_fe,
    fe_selector, constant=False, time=N_TIME
).fit()
print(f&amp;quot;MGWRFER bandwidths: {fe_bw}&amp;quot;)
print(f&amp;quot;MGWRFER R-squared: {fe_model.R2:.4f}&amp;quot;)
print(f&amp;quot;MGWRFER Adj. R-squared: {fe_model.adj_R2:.4f}&amp;quot;)
print(f&amp;quot;MGWRFER AICc: {fe_model.aicc:.2f}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> MGWRFER bandwidths: [ 50. 91. 116. 62.]
MGWRFER R-squared: 0.8900
MGWRFER Adj. R-squared: 0.8844
MGWRFER AICc: 496.09
&lt;/code>&lt;/pre>
&lt;p>The R-squared of 0.890 reflects explanatory power over the &lt;em>demeaned&lt;/em> outcome — it is not directly comparable to the pooled model&amp;rsquo;s 0.977, which operates on raw y dominated by the confounder. A fairer interpretation: 89% of the within-unit temporal variation is explained by the spatially varying slopes.&lt;/p>
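&lt;p>This non-comparability is easy to demonstrate on a toy panel (an assumed setup, not the tutorial&amp;rsquo;s data): a &amp;ldquo;model&amp;rdquo; that predicts each observation with its unit&amp;rsquo;s fixed effect alone scores a near-perfect R-squared on raw y while explaining none of the within-unit variation.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np
rng = np.random.default_rng(3)
n_units, n_time = 100, 3
alpha = rng.uniform(0, 50, n_units)
x = rng.standard_normal((n_units, n_time))
y = alpha[:, None] + 1.5 * x + rng.standard_normal((n_units, n_time))
# Predict each observation with its unit fixed effect only (ignore x)
y_hat = np.repeat(alpha[:, None], n_time, axis=1)
r2_raw = 1 - ((y - y_hat)**2).sum() / ((y - y.mean())**2).sum()
# After demeaning, the same predictor forecasts zero and explains nothing
y_w = y - y.mean(axis=1, keepdims=True)
r2_within = 1 - (y_w**2).sum() / ((y_w - y_w.mean())**2).sum()
print(round(r2_raw, 2), round(r2_within, 2))
&lt;/code>&lt;/pre>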
&lt;p>After back-transforming the coefficients (&lt;code>beta_orig = beta_std * (y_std / x_std)&lt;/code>) and averaging per unit:&lt;/p>
&lt;pre>&lt;code class="language-python">print(&amp;quot; beta1_mgwrfer: RMSE=0.1793, Corr=0.8179&amp;quot;)
print(&amp;quot; beta2_mgwrfer: RMSE=0.1050, Corr=0.9407&amp;quot;)
print(&amp;quot; beta3_mgwrfer: RMSE=0.0724, Corr=nan&amp;quot;)
print(&amp;quot; beta4_mgwrfer: RMSE=0.1399, Corr=nan&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> beta1_mgwrfer: RMSE=0.1793, Corr=0.8179
beta2_mgwrfer: RMSE=0.1050, Corr=0.9407
beta3_mgwrfer: RMSE=0.0724, Corr=nan
beta4_mgwrfer: RMSE=0.1399, Corr=nan
&lt;/code>&lt;/pre>
&lt;p>The improvement for $\beta_1$ is dramatic: RMSE drops from 0.395 to 0.179 (a 54.6% reduction) and the correlation with true values jumps from 0.459 to 0.818. MGWRFER now captures the quadratic dome pattern instead of conflating it with the fixed effect. For the null coefficient $\beta_4$, RMSE drops from 0.253 to 0.140 (44.7% reduction) — much less false-positive contamination. However, $\beta_2$ and $\beta_3$ show modest RMSE increases (0.089 to 0.105, and 0.058 to 0.072). This is the &lt;strong>bias-variance tradeoff&lt;/strong> at work: the within-transformation reduces effective sample size (from raw observations to within-unit deviations), increasing estimation variance for coefficients that were already well-identified by pooled MGWR.&lt;/p>
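&lt;p>The RMSE and correlation figures reported in this tutorial can be reproduced with a small helper; the name &lt;code>recovery_metrics&lt;/code> is illustrative, not part of the script. It also makes the &lt;code>nan&lt;/code> behavior explicit: Pearson correlation is undefined whenever the true surface has zero variance.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np
def recovery_metrics(true_vals, est_vals):
    # Root-mean-square error between estimated and true surfaces
    rmse = float(np.sqrt(np.mean((est_vals - true_vals)**2)))
    # Pearson correlation is undefined for a zero-variance true surface
    if np.std(true_vals) == 0:
        return rmse, np.nan
    return rmse, float(np.corrcoef(true_vals, est_vals)[0, 1])
# A constant true surface (like beta_3 = 1.5) gives finite RMSE but nan corr
true_const = np.full(225, 1.5)
est = true_const + 0.07 * np.random.default_rng(5).standard_normal(225)
rmse, corr = recovery_metrics(true_const, est)
print(round(rmse, 3), corr)
&lt;/code>&lt;/pre>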
&lt;h2 id="6-comparing-coefficient-recovery">6. Comparing coefficient recovery&lt;/h2>
&lt;p>The scatter plots below compare true vs estimated coefficients for both approaches. In a perfect model, all points would lie on the 45-degree reference line.&lt;/p>
&lt;pre>&lt;code class="language-python"># Figure 2: True vs Pooled MGWR (3-panel scatter)
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
for ax, true_vals, est_vals, label in zip(axes, true_arrays, pooled_arrays, labels):
    ax.scatter(true_vals, est_vals, color=STEEL_BLUE, alpha=0.4, s=15)
    ax.plot(lims, lims, color=WARM_ORANGE, linewidth=2, linestyle=&amp;quot;--&amp;quot;)
    # ... annotation code ...
plt.savefig(&amp;quot;mgwrfer_bias_pooled.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="mgwrfer_bias_pooled.png" alt="True vs Pooled MGWR scatter plots for three coefficients. Beta_1 shows severe scatter away from the identity line (Corr=0.459), while beta_2 and beta_3 track more closely.">&lt;/p>
&lt;p>The pooled MGWR scatter reveals the damage: $\beta_1$ points are widely dispersed around the 45-degree line, with the model systematically overestimating some locations and underestimating others (Corr = 0.459). The quadratic dome shape is barely recovered. In contrast, $\beta_2$ hugs the reference line (Corr = 0.950) because its linear gradient is more easily separated from the exponential confounder.&lt;/p>
&lt;pre>&lt;code class="language-python"># Figure 3: True vs MGWRFER (3-panel scatter)
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
for ax, true_vals, est_vals, label in zip(axes, true_arrays, fe_arrays, labels):
    ax.scatter(true_vals, est_vals, color=TEAL, alpha=0.4, s=15)
    ax.plot(lims, lims, color=WARM_ORANGE, linewidth=2, linestyle=&amp;quot;--&amp;quot;)
    # ... annotation code ...
plt.savefig(&amp;quot;mgwrfer_recovery_fe.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="mgwrfer_recovery_fe.png" alt="True vs MGWRFER scatter plots. Beta_1 is now tightly clustered around the identity line (Corr=0.818), showing successful recovery of the quadratic dome pattern.">&lt;/p>
&lt;p>After fixed-effects correction, the $\beta_1$ scatter tightens dramatically — the correlation jumps from 0.459 to 0.818, and the quadratic dome structure is clearly visible as a tight band along the reference line. The tradeoff is visible in $\beta_2$ and $\beta_3$: slightly wider scatter (more variance) but still centered on the truth, indicating unbiased estimation with higher noise.&lt;/p>
&lt;h2 id="7-model-comparison">7. Model comparison&lt;/h2>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Metric&lt;/th>
&lt;th>Pooled MGWR&lt;/th>
&lt;th>MGWRFER&lt;/th>
&lt;th>Change&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>RMSE ($\beta_1$)&lt;/td>
&lt;td>0.3945&lt;/td>
&lt;td>0.1793&lt;/td>
&lt;td>-54.6%&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>RMSE ($\beta_2$)&lt;/td>
&lt;td>0.0888&lt;/td>
&lt;td>0.1050&lt;/td>
&lt;td>+18.2%&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>RMSE ($\beta_3$)&lt;/td>
&lt;td>0.0578&lt;/td>
&lt;td>0.0724&lt;/td>
&lt;td>+25.2%&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>RMSE ($\beta_4$)&lt;/td>
&lt;td>0.2531&lt;/td>
&lt;td>0.1399&lt;/td>
&lt;td>-44.7%&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Corr ($\beta_1$)&lt;/td>
&lt;td>0.4586&lt;/td>
&lt;td>0.8179&lt;/td>
&lt;td>+78%&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Corr ($\beta_2$)&lt;/td>
&lt;td>0.9504&lt;/td>
&lt;td>0.9407&lt;/td>
&lt;td>-1.0%&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>The pattern is clear: MGWRFER delivers its largest improvements precisely where pooled MGWR was most biased. For coefficients contaminated by the confounder ($\beta_1$ and $\beta_4$), RMSE drops 45-55%. For coefficients already well-estimated ($\beta_2$ and $\beta_3$), RMSE rises modestly (18-25%) but the absolute values remain small. This is a favorable tradeoff in practice — eliminating severe bias at the cost of slightly higher variance for already-precise estimates.&lt;/p>
&lt;h2 id="8-bandwidth-comparison">8. Bandwidth comparison&lt;/h2>
&lt;p>The bandwidths reveal &lt;em>how&lt;/em> the fixed-effects correction changes the spatial structure that MGWR detects.&lt;/p>
&lt;pre>&lt;code class="language-python">print(&amp;quot;Pooled MGWR bws (x1-x4): [50, 175, 223, 223]&amp;quot;)
print(&amp;quot;MGWRFER bws (x1-x4): [50, 91, 116, 62]&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> Pooled MGWR bws (x1-x4): [50, 175, 223, 223]
MGWRFER bws (x1-x4): [50, 91, 116, 62]
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="mgwrfer_bandwidth_comparison.png" alt="Grouped bar chart comparing pooled MGWR vs MGWRFER bandwidths for each covariate. MGWRFER uses consistently smaller bandwidths, especially for x4 which drops from 223 to 62.">&lt;/p>
&lt;p>MGWRFER selects smaller bandwidths for three of the four covariates. The most dramatic shift is x4 (null effect): the pooled model uses bandwidth 223 (nearly global, treating the coefficient as spatially constant), while MGWRFER uses 62. This happens because the pooled model&amp;rsquo;s x4 coefficient was absorbing the globally smooth confounder variation — requiring a large kernel to fit that smooth pattern. After demeaning removes the confounder, the remaining x4 variation is local noise best captured with a smaller kernel. Similarly, x2 drops from 175 to 91 and x3 from 223 to 116. Only x1 retains the same bandwidth (50 in both models) — its quadratic dome has a genuinely local structure that requires a small kernel regardless of whether the confounder is removed.&lt;/p>
&lt;h2 id="9-spatial-coefficient-maps">9. Spatial coefficient maps&lt;/h2>
&lt;p>The most convincing evidence comes from mapping the estimated surfaces alongside the known truth.&lt;/p>
&lt;pre>&lt;code class="language-python"># 2x3 grid: top row = true, bottom row = MGWRFER estimates
fig, axes = plt.subplots(2, 3, figsize=(16, 11))
# ... mapping code with shared colorbars ...
plt.savefig(&amp;quot;mgwrfer_coefficient_maps.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="mgwrfer_coefficient_maps.png" alt="Six-panel spatial map comparing true coefficients (top row) with MGWRFER estimates (bottom row) for beta_1, beta_2, and beta_3. The quadratic dome and linear gradient are visually recovered.">&lt;/p>
&lt;p>The MGWRFER estimated $\beta_1$ map (bottom-left) recovers the concentric dome pattern of the true coefficient (top-left), though with some smoothing at the edges. The $\beta_2$ linear gradient (bottom-center) matches the true gradient (top-center) with high fidelity. The $\beta_3$ map (bottom-right) shows mild spurious spatial variation around the true constant of 1.5 — this illustrates the variance cost of within-transformation for spatially homogeneous effects (RMSE = 0.072).&lt;/p>
&lt;h2 id="10-statistical-significance">10. Statistical significance&lt;/h2>
&lt;p>A key diagnostic for MGWRFER is whether it correctly identifies which coefficients are significant at each location. The significance maps below use filtered t-values (corrected for multiple testing across the 225 spatial units).&lt;/p>
&lt;pre>&lt;code class="language-python"># 2x2 significance maps
# Orange = significant positive, dark blue = not significant
plt.savefig(&amp;quot;mgwrfer_significance_maps.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="mgwrfer_significance_maps.png" alt="Significance maps for all four coefficients. Beta_1 through beta_3 are unanimously significant positive (all orange). Beta_4 correctly shows 202 of 225 units as not significant (dark blue), with a small false-positive cluster.">&lt;/p>
&lt;p>All 225 spatial units show statistically significant positive effects for $\beta_1$, $\beta_2$, and $\beta_3$ — consistent with the true DGP where all three are strictly positive everywhere. The critical test is $\beta_4$ (truly zero): 202 of 225 units (89.8%) are correctly classified as not significant, while 23 units (10.2%) show false positives. This false-positive rate, though above the nominal 5% level, is substantially better than what pooled MGWR would produce — where the inflated RMSE of 0.253 implies widespread spurious significance. The false positives are spatially concentrated in a small cluster, suggesting boundary effects or local multicollinearity rather than systematic bias.&lt;/p>
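&lt;p>As a simplified illustration of the filtering step, the sketch below applies a Bonferroni-style correction across the 225 local tests. The actual filtered t-values may use a different dependent-test adjustment, and the t-surface here is a random stand-in, not the fitted model&amp;rsquo;s output.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np
from scipy import stats
n_units = 225
alpha_level = 0.05
alpha_adj = alpha_level / n_units            # corrected per-test level
t_crit = stats.norm.ppf(1 - alpha_adj / 2)   # two-sided critical value, ~3.7
# Stand-in t-surface for a truly null coefficient
t_surface = np.random.default_rng(7).normal(0.0, 1.0, n_units)
n_sig = int((np.abs(t_surface) &amp;gt;= t_crit).sum())
print(round(t_crit, 2), n_sig)
&lt;/code>&lt;/pre>
&lt;p>With a null effect and this stringent threshold, almost no units should clear the bar; that makes it a useful sanity check before mapping significance.&lt;/p>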
&lt;h2 id="11-discussion">11. Discussion&lt;/h2>
&lt;p>Returning to our original question: &lt;strong>can we recover the true spatially varying coefficients when a strong, unobserved spatial confounder contaminates the data?&lt;/strong> The answer is a qualified yes.&lt;/p>
&lt;p>MGWRFER successfully eliminates the confounder&amp;rsquo;s influence on coefficient estimation. The most contaminated coefficient ($\beta_1$) goes from poorly recovered (Corr = 0.459) to well-recovered (Corr = 0.818). The null coefficient ($\beta_4$) goes from showing substantial false-positive bias (RMSE = 0.253) to being correctly identified as non-significant in 90% of locations. These improvements are not marginal — they represent the difference between misleading and informative inference.&lt;/p>
&lt;p>The tradeoff is real but manageable. Coefficients that were already well-estimated see modest RMSE increases (18-25%), because the within-transformation reduces effective sample size. A practitioner facing this tradeoff should ask: &amp;ldquo;Is the potential for confounding bias worse than a small increase in estimation variance?&amp;rdquo; In most applied settings — where unobserved spatial confounders are plausible but unmeasurable — the answer is yes. The bias from ignoring fixed effects is &lt;em>systematic&lt;/em> (it pushes estimates in the wrong direction), while the variance increase is &lt;em>random&lt;/em> (it widens confidence intervals without introducing directional error).&lt;/p>
&lt;p>The causal interpretation of MGWRFER coefficients requires the assumption of &lt;strong>no time-varying confounders&lt;/strong> — strict exogeneity conditional on the fixed effects. In real applications, this is stronger than it sounds: it rules out any unobserved factor that changes over time and is correlated with both the covariates and the outcome. Researchers should justify this assumption carefully, especially in settings with policy changes, structural breaks, or trending confounders.&lt;/p>
&lt;h2 id="12-summary-and-next-steps">12. Summary and next steps&lt;/h2>
&lt;p>&lt;strong>Key takeaways:&lt;/strong>&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Bias correction works:&lt;/strong> MGWRFER reduces RMSE by 55% for the most-biased coefficient ($\beta_1$: 0.395 to 0.179) and by 45% for the null effect ($\beta_4$: 0.253 to 0.140), demonstrating effective removal of time-invariant confounding.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Bias-variance tradeoff is favorable:&lt;/strong> The variance cost is modest — $\beta_2$ RMSE rises from 0.089 to 0.105, and $\beta_3$ from 0.058 to 0.072 — while the bias elimination is large. Systematic bias is worse than random variance in most applications.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Bandwidths reveal confounding structure:&lt;/strong> After demeaning, MGWRFER selects smaller bandwidths (x4: 223 to 62; x2: 175 to 91), indicating that the confounder was inflating spatial smoothness estimates. The true coefficient surfaces are more localized than the pooled model suggests.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>False-positive control improves:&lt;/strong> The null coefficient is correctly identified as non-significant in 90% of locations under MGWRFER, compared to the pooled model where RMSE of 0.253 would imply widespread false significance.&lt;/p>
&lt;/li>
&lt;/ol>
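&lt;p>The percentages in takeaways 1 and 2 can be reproduced directly from the RMSE values quoted above; the short check below uses only numbers already reported in this post.&lt;/p>
&lt;pre>&lt;code class="language-python"># Reproduce the takeaway percentages from the reported RMSE values.
pooled = {&amp;quot;b1&amp;quot;: 0.395, &amp;quot;b2&amp;quot;: 0.089, &amp;quot;b3&amp;quot;: 0.058, &amp;quot;b4&amp;quot;: 0.253}
mgwrfer = {&amp;quot;b1&amp;quot;: 0.179, &amp;quot;b2&amp;quot;: 0.105, &amp;quot;b3&amp;quot;: 0.072, &amp;quot;b4&amp;quot;: 0.140}
for k in pooled:
    change = (mgwrfer[k] / pooled[k] - 1) * 100
    print(f&amp;quot;{k}: {change:+.0f}%&amp;quot;)  # b1: -55%, b2: +18%, b3: +24%, b4: -45%
&lt;/code>&lt;/pre>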
&lt;p>&lt;strong>Limitations:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Only 3 time periods — more periods would improve within-estimator efficiency and reduce the false-positive rate&lt;/li>
&lt;li>The simulated confounder is time-invariant by construction; in practice, time-varying confounders remain a threat&lt;/li>
&lt;li>Computational cost: MGWR bandwidth selection scales poorly with N, limiting grid sizes&lt;/li>
&lt;li>The 15×15 grid (225 units) is small; results may differ quantitatively at larger scales&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Next steps:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Apply MGWRFER to real panel data (e.g., regional economic growth, housing prices, environmental exposure)&lt;/li>
&lt;li>Compare with alternative spatial panel methods (spatial lag/error with fixed effects)&lt;/li>
&lt;li>Explore the relationship between the number of time periods and the bias-variance tradeoff&lt;/li>
&lt;li>Extend to cases with spatially and temporally varying coefficients (GTWRFER)&lt;/li>
&lt;/ul>
&lt;h2 id="13-exercises">13. Exercises&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Increase time periods:&lt;/strong> Modify the DGP to use &lt;code>N_TIME = 10&lt;/code> instead of 3. How does the bias-variance tradeoff change? Does $\beta_2$&amp;rsquo;s RMSE still increase under MGWRFER, or does the larger effective sample size offset the variance cost?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Add a time-varying confounder:&lt;/strong> Create a variable $\gamma_t$ that changes over time and is correlated with $x_1$. Add it to the DGP as $y_{it} = \alpha_i + \gamma_t \cdot x_{1,it} + \ldots$. Does MGWRFER still improve coefficient recovery, or does the time-varying confounder break the within-transformation&amp;rsquo;s assumptions?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Real-world application:&lt;/strong> Download a panel dataset of regional economic indicators (e.g., from the World Bank or PySAL sample data). Apply MGWRFER and compare against pooled MGWR. What spatial patterns emerge in the coefficient maps that the pooled model misses?&lt;/p>
&lt;/li>
&lt;/ol>
&lt;h2 id="references">References&lt;/h2>
&lt;ol>
&lt;li>&lt;a href="https://github.com/GeoZhipengLi/MGWPR" target="_blank" rel="noopener">Li, Z., Fotheringham, A.S., Oshan, T., &amp;amp; Wolf, L.J. (2024). Multiscale Geographically Weighted Fixed Effects Regression.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1080/24694452.2017.1352480" target="_blank" rel="noopener">Fotheringham, A.S., Yang, W., &amp;amp; Kang, W. (2017). Multiscale Geographically Weighted Regression (MGWR). Annals of the American Association of Geographers, 107(6), 1247-1265.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.21105/joss.01823" target="_blank" rel="noopener">Oshan, T., Li, Z., Kang, W., Wolf, L.J., &amp;amp; Fotheringham, A.S. (2019). mgwr: A Python Implementation of Multiscale Geographically Weighted Regression. Journal of Open Source Software, 4(42), 1823.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/GeoZhipengLi/MGWPR" target="_blank" rel="noopener">GeoZhipengLi/MGWPR — Custom mgwr Package with Panel Data Support (GitHub)&lt;/a>&lt;/li>
&lt;/ol></description></item><item><title>Introduction to Panel Data Methods in Python</title><link>https://carlos-mendez.org/post/python_panel_intro/</link><pubDate>Mon, 27 Apr 2026 00:00:00 +0000</pubDate><guid>https://carlos-mendez.org/post/python_panel_intro/</guid><description>&lt;h2 id="1-overview">1. Overview&lt;/h2>
&lt;p>Imagine you have data on the same workers in two different years — 2010 and 2012 — and you want to know whether &lt;em>joining a union&lt;/em> raises a worker&amp;rsquo;s wage. A simple regression on the pooled data says yes, by about 7.5%. But that headline number hides a problem that has occupied econometricians for fifty years: workers who join unions are not the same as workers who don&amp;rsquo;t. Maybe they have less formal education, or they work in industries where unions are common, or they are older and have negotiated harder. If any of those &lt;em>unobserved&lt;/em> differences also affect wages, the 7.5% estimate is mixing the union effect with everything else that comes bundled with union status.&lt;/p>
&lt;p>This is the &lt;strong>omitted-variable bias&lt;/strong> problem, and panel data — repeated observations on the same units over time — gives us several ways to fight it. By comparing each worker to &lt;em>themselves&lt;/em> across years, we can strip out anything that is constant within a person (innate ability, gender, schooling, family background) and isolate the effect of switching union status. The price is a much smaller effective sample: only the workers who actually changed union status between 2010 and 2012 contribute to the estimate. The benefit is a coefficient that is much harder to dismiss as confounded.&lt;/p>
&lt;p>This tutorial walks through the seven canonical panel estimators on a real two-period wage panel: pooled OLS, between, first-differences, the within (fixed effects) estimator, two-way fixed effects, random effects, and Mundlak&amp;rsquo;s correlated random effects. Along the way we run the Hausman test and visualize what the &lt;em>within transformation&lt;/em> actually does to the data. The headline result will surprise some readers: once we account for unobserved worker traits, the union wage premium roughly &lt;em>triples&lt;/em> — from about 7% to about 21%.&lt;/p>
&lt;p>&lt;strong>Learning objectives:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Understand the difference between &lt;em>between&lt;/em> and &lt;em>within&lt;/em> variation in panel data, and why this distinction drives the choice of estimator.&lt;/li>
&lt;li>Implement seven panel-data estimators in Python using &lt;code>pyfixest&lt;/code> and &lt;code>linearmodels&lt;/code>, with one short code block per method.&lt;/li>
&lt;li>Visualize the within transformation and see geometrically why fixed effects produce a different slope than pooled OLS.&lt;/li>
&lt;li>Run the Hausman test to compare fixed and random effects, and use the Mundlak/CRE specification as the modern alternative.&lt;/li>
&lt;li>Interpret the factor-of-three gap between cross-sectional and within estimators in terms of selection on unobservables.&lt;/li>
&lt;/ul>
&lt;p>The diagram below summarizes the estimator family and how the two specification tests (Hausman and Mundlak) point you toward FE or RE based on the data.&lt;/p>
&lt;pre>&lt;code class="language-mermaid">flowchart TD
A[&amp;quot;Panel data y_it, x_it for i = 1..N, t = 1..T&amp;quot;]
A --&amp;gt; B{&amp;quot;What variation does the estimator use?&amp;quot;}
B --&amp;gt;|&amp;quot;All variation (ignores panel)&amp;quot;| POLS[&amp;quot;Pooled OLS&amp;quot;]
B --&amp;gt;|&amp;quot;Cross-sectional only&amp;quot;| BETW[&amp;quot;Between&amp;quot;]
B --&amp;gt;|&amp;quot;Within-individual only&amp;quot;| WITHIN[&amp;quot;FE / FDFE / DVFE / TWFE&amp;quot;]
B --&amp;gt;|&amp;quot;Weighted between + within&amp;quot;| RE[&amp;quot;Random Effects&amp;quot;]
WITHIN --&amp;gt; TEST{&amp;quot;Hausman test or Mundlak term&amp;quot;}
RE --&amp;gt; TEST
TEST --&amp;gt;|&amp;quot;Reject H0: RE inconsistent&amp;quot;| USE_FE[&amp;quot;Use FE (consistent)&amp;quot;]
TEST --&amp;gt;|&amp;quot;Fail to reject: RE plausible&amp;quot;| USE_RE[&amp;quot;Use RE (efficient)&amp;quot;]
WITHIN --&amp;gt; CRE[&amp;quot;CRE / Mundlak: bridges FE and RE&amp;quot;]
RE --&amp;gt; CRE
style POLS fill:#999999,stroke:#141413,color:#fff
style BETW fill:#8FB4D8,stroke:#141413,color:#141413
style WITHIN fill:#d97757,stroke:#141413,color:#fff
style RE fill:#00d4c8,stroke:#141413,color:#141413
style CRE fill:#c4623d,stroke:#141413,color:#fff
style USE_FE fill:#d97757,stroke:#141413,color:#fff
style USE_RE fill:#00d4c8,stroke:#141413,color:#141413
&lt;/code>&lt;/pre>
&lt;p>The diagram makes the central trade-off visible. Estimators on the left side (POLS, Between, RE) lean on cross-sectional variation — they answer &amp;ldquo;how do union and non-union workers compare?&amp;rdquo; Estimators on the right (FE, FDFE, DVFE, TWFE) lean on within-worker variation — they answer &amp;ldquo;what happens when &lt;em>the same worker&lt;/em> switches union status?&amp;rdquo; CRE/Mundlak sits in the middle and provides a single specification that recovers both. The Hausman test and the Mundlak term are formal tests for choosing between FE and RE; we will run both and they will agree.&lt;/p>
&lt;h2 id="2-setup-and-imports">2. Setup and imports&lt;/h2>
&lt;p>We use &lt;a href="https://pyfixest.org/" target="_blank" rel="noopener">&lt;code>pyfixest&lt;/code>&lt;/a> for OLS and absorbed fixed effects, &lt;a href="https://bashtage.github.io/linearmodels/panel/introduction.html" target="_blank" rel="noopener">&lt;code>linearmodels&lt;/code>&lt;/a> for the random-effects GLS estimator, and &lt;code>scipy.stats.chi2&lt;/code> for the Hausman test critical value. The standard &lt;code>pandas&lt;/code> / &lt;code>numpy&lt;/code> / &lt;code>matplotlib&lt;/code> stack handles data and figures.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import pyfixest as pf
import statsmodels.api as sm
from linearmodels.panel import RandomEffects
from scipy.stats import chi2
RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)
rng = np.random.default_rng(RANDOM_SEED)
&lt;/code>&lt;/pre>
&lt;p>The dark-theme &lt;code>plt.rcParams&lt;/code> block is in &lt;code>script.py&lt;/code> and is omitted here for brevity. All figures in this post use the site&amp;rsquo;s dark-navy palette.&lt;/p>
&lt;h2 id="3-data-loading">3. Data loading&lt;/h2>
&lt;p>We load a two-period wage panel from a Stata &lt;code>.dta&lt;/code> file: NLSY-style data on US workers observed in 2010, 2012, 2014, 2016, and 2018. For pedagogical clarity we restrict the analysis to &lt;strong>2010 and 2012 only&lt;/strong>, which makes T = 2 and gives us the cleanest possible illustration of the textbook result that first-differences and the within estimator are the same thing. Restricting to two years does not by itself make the panel balanced (a worker could appear in only one of them, or lose a row to missing data), so the check below verifies that every worker contributes exactly two observations.&lt;/p>
&lt;pre>&lt;code class="language-python">DATA_URL = &amp;quot;https://github.com/quarcs-lab/data-open/raw/master/isds/wage_panel_bob4.dta&amp;quot;
df_full = pd.read_stata(DATA_URL)
# Keep two periods so the FD = Within identity is visible.
df = df_full[df_full[&amp;quot;year&amp;quot;].isin([2010, 2012])].copy()
df = df.sort_values([&amp;quot;ID&amp;quot;, &amp;quot;year&amp;quot;]).reset_index(drop=True)
# Convert union &amp;quot;Yes/No&amp;quot; to 1/0; build a female dummy.
df[&amp;quot;union&amp;quot;] = df[&amp;quot;union&amp;quot;].map({&amp;quot;Yes&amp;quot;: 1, &amp;quot;No&amp;quot;: 0, 1: 1, 0: 0})
df[&amp;quot;female&amp;quot;] = (df[&amp;quot;gender&amp;quot;].astype(str).str.strip().str.lower() == &amp;quot;female&amp;quot;).astype(float)
# Drop rows with missing values in the variables we use.
df = df.dropna(subset=[&amp;quot;lwage&amp;quot;, &amp;quot;union&amp;quot;, &amp;quot;age&amp;quot;, &amp;quot;schooling&amp;quot;]).reset_index(drop=True)
&lt;/code>&lt;/pre>
&lt;p>The next block prints panel structure and descriptive statistics. The &amp;ldquo;balanced&amp;rdquo; check confirms every worker has exactly two observations, and the descriptive table tells us how spread out our key variables are.&lt;/p>
&lt;pre>&lt;code class="language-python">print(f&amp;quot;Individuals (N): {df['ID'].nunique()}&amp;quot;)
print(f&amp;quot;Time periods (T): {df['year'].nunique()}&amp;quot;)
print(f&amp;quot;Observations (N×T): {len(df)}&amp;quot;)
print(f&amp;quot;Balanced: {(df.groupby('ID')['year'].count() == df['year'].nunique()).all()}&amp;quot;)
print(df[[&amp;quot;lwage&amp;quot;, &amp;quot;union&amp;quot;, &amp;quot;age&amp;quot;, &amp;quot;schooling&amp;quot;]].describe().round(4))
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Individuals (N): 2199
Time periods (T): 2
Observations (N×T): 4398
Balanced: True
           lwage      union        age  schooling
count  4398.0000  4398.0000  4398.0000  4398.0000
mean      3.1061     0.1626    35.6794    14.5020
std       0.5982     0.3690     6.2576     2.1825
min      -1.7325     0.0000    25.0000     3.0000
max       6.0635     1.0000    49.0000    17.0000
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>Interpretation.&lt;/strong> The analysis sample is a perfectly balanced panel of 2,199 prime-age workers (mean age 35.7, range 25–49) observed in 2010 and 2012, for 4,398 worker-year observations. Only 16.3% of the sample is unionized in any given period (mean union = 0.1626), which means the dataset leans heavily on non-union workers — a relevant constraint for any estimator that uses cross-sectional variation. Mean log wage is 3.11 with a standard deviation of 0.60, and average schooling is 14.5 years. With balanced T = 2, the within and first-difference transformations are particularly clean because every individual contributes the same amount of within-variation: exactly one switch (or non-switch) per regressor.&lt;/p>
&lt;h2 id="4-between-vs-within-variance-how-much-do-panel-methods-have-to-work-with">4. Between vs within variance: how much do panel methods have to work with?&lt;/h2>
&lt;p>Before estimating anything, it helps to ask a diagnostic question: for each variable, how much variation comes from differences &lt;em>between&lt;/em> workers and how much from changes &lt;em>within&lt;/em> workers over time? Fixed-effects estimators only use the within part. If the within part is tiny, FE will be noisy no matter how large the sample is.&lt;/p>
&lt;p>The decomposition splits each variable&amp;rsquo;s variance into two pieces. The &lt;strong>between&lt;/strong> part is the variance of each worker&amp;rsquo;s two-year mean: $\mathrm{Var}(\bar{x}_i)$. The &lt;strong>within&lt;/strong> part is the variance of each observation around its own worker&amp;rsquo;s mean: $\mathrm{Var}(x_{it} - \bar{x}_i)$. Their sum is (approximately) the total variance.&lt;/p>
&lt;pre>&lt;code class="language-python">for var in [&amp;quot;lwage&amp;quot;, &amp;quot;union&amp;quot;, &amp;quot;age&amp;quot;, &amp;quot;schooling&amp;quot;]:
    overall_sd = df[var].std()
    between_sd = df.groupby(&amp;quot;ID&amp;quot;)[var].mean().std()
    within_sd = (df[var] - df.groupby(&amp;quot;ID&amp;quot;)[var].transform(&amp;quot;mean&amp;quot;)).std()
    between_pct = between_sd**2 / (between_sd**2 + within_sd**2) * 100
    print(f&amp;quot;{var:&amp;lt;10} overall {overall_sd:.4f} between {between_sd:.4f}&amp;quot;
          f&amp;quot; within {within_sd:.4f} between% {between_pct:.1f}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">lwage overall 0.5982 between 0.5570 within 0.2184 between% 86.7
union overall 0.3690 between 0.3576 within 0.0911 between% 93.9
age overall 6.2576 between 6.1755 within 1.0147 between% 97.4
schooling overall 2.1825 between 2.1827 within 0.0000 between% 100.0
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="panel_intro_variation.png" alt="Between vs within variance shares for the four key variables.">&lt;/p>
&lt;p>&lt;strong>Interpretation.&lt;/strong> Almost all of the variation in our variables is &lt;em>between&lt;/em> workers, not over time within a worker. Union status is 93.9% between and only 6.1% within — fixed-effects estimators have access to that thin 6% slice of total union variance. Schooling has zero within-variation (100% between) because nobody&amp;rsquo;s reported education changes between 2010 and 2012 in this sample, which is why FE will mechanically drop schooling from the regression. The big methodological consequence is that FE standard errors will be much larger than POLS standard errors, so the choice between FE and RE is not just a question of unbiasedness; it is also a question of statistical precision.&lt;/p>
&lt;h2 id="5-visualizing-the-panel-who-actually-changes-union-status">5. Visualizing the panel: who actually changes union status?&lt;/h2>
&lt;p>The variance decomposition tells us the within share is small. A spaghetti plot of individual log-wage trajectories makes the same point visually. We sample 30 random workers and color each line by the worker&amp;rsquo;s union pattern: orange if always union, blue if never union, and teal if union status changed between 2010 and 2012.&lt;/p>
&lt;pre>&lt;code class="language-python">sample_ids = rng.choice(df[&amp;quot;ID&amp;quot;].unique(), size=30, replace=False)
fig, ax = plt.subplots(figsize=(10, 6))
for pid in sample_ids:
    person = df[df[&amp;quot;ID&amp;quot;] == pid].sort_values(&amp;quot;year&amp;quot;)
    if person[&amp;quot;union&amp;quot;].nunique() &amp;gt; 1:
        ax.plot(person[&amp;quot;year&amp;quot;], person[&amp;quot;lwage&amp;quot;], &amp;quot;o-&amp;quot;, color=&amp;quot;#00d4c8&amp;quot;, lw=2)  # changer
    else:
        c = &amp;quot;#d97757&amp;quot; if person[&amp;quot;union&amp;quot;].iloc[0] == 1 else &amp;quot;#6a9bcc&amp;quot;
        ax.plot(person[&amp;quot;year&amp;quot;], person[&amp;quot;lwage&amp;quot;], &amp;quot;o-&amp;quot;, color=c, alpha=0.35)
plt.savefig(&amp;quot;panel_intro_trajectories.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="panel_intro_trajectories.png" alt="Individual wage trajectories for 30 sampled workers, colored by union-status pattern.">&lt;/p>
&lt;p>&lt;strong>Interpretation.&lt;/strong> Most of the lines are flat-colored (blue or orange): workers who are &lt;em>always&lt;/em> or &lt;em>never&lt;/em> in a union over the two-year window. Only the teal lines — the ones that change union status — provide identifying information for fixed effects, first-differences, and Mundlak/CRE. If you squint at the figure and ignore the teal lines, you have effectively run a between estimator. If you ignore everything except the teal lines, you have run fixed effects. The post&amp;rsquo;s central tension between cross-sectional and within methods is a question of which lines you choose to read.&lt;/p>
&lt;h2 id="6-pooled-ols-the-naive-baseline">6. Pooled OLS: the naive baseline&lt;/h2>
&lt;p>We start with the simplest possible estimator: regress log wages on union membership, treating every worker-year as if it were an independent observation. This is &lt;strong>pooled OLS&lt;/strong> (POLS). It ignores the panel structure entirely.&lt;/p>
&lt;pre>&lt;code class="language-python"># Stata: reg lwage union, robust
fit_pols = pf.feols(&amp;quot;lwage ~ union&amp;quot;, data=df, vcov=&amp;quot;HC1&amp;quot;)
pols_coef = fit_pols.coef()[&amp;quot;union&amp;quot;]
pols_se = fit_pols.se()[&amp;quot;union&amp;quot;]
print(f&amp;quot;Union coefficient: {pols_coef:.4f} (SE {pols_se:.4f})&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Union coefficient: 0.0750 (SE 0.0231)
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>Interpretation.&lt;/strong> Pooled OLS reports a union wage premium of 7.5 log points (SE 2.3 percentage points), which is highly significant by conventional standards (t ≈ 3.25). This is the textbook cross-sectional answer and the number a naive analyst would report. It is almost certainly biased: if higher-ability workers select &lt;em>out of&lt;/em> unionized jobs (a common pattern in this dataset), then POLS confounds the union effect with whatever ability does to wages. The rest of the post is essentially a tour through different ways of subtracting the bias out.&lt;/p>
&lt;h2 id="7-between-estimator-the-cross-sectional-benchmark">7. Between estimator: the cross-sectional benchmark&lt;/h2>
&lt;p>The &lt;strong>between estimator&lt;/strong> takes POLS to its logical extreme: collapse each worker to their two-year mean, then run OLS across workers. This uses &lt;em>only&lt;/em> between-individual variation — the mirror image of fixed effects — and gives us a clean reference point for what a purely cross-sectional answer looks like.&lt;/p>
&lt;pre>&lt;code class="language-python"># Stata: xtreg lwage union, be
df_between = df.groupby(&amp;quot;ID&amp;quot;)[[&amp;quot;lwage&amp;quot;, &amp;quot;union&amp;quot;]].mean().reset_index()
fit_between = pf.feols(&amp;quot;lwage ~ union&amp;quot;, data=df_between, vcov=&amp;quot;HC1&amp;quot;)
between_coef = fit_between.coef()[&amp;quot;union&amp;quot;]
between_se = fit_between.se()[&amp;quot;union&amp;quot;]
print(f&amp;quot;Union coefficient: {between_coef:.4f} (SE {between_se:.4f})&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Union coefficient: 0.0662 (SE 0.0311)
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>Interpretation.&lt;/strong> Collapsing the panel to 2,199 individual averages and running OLS gives 6.6 log points (SE 3.1) — the cross-sectional union effect with all within-individual variation explicitly thrown away. Notice how close this is to POLS (0.066 vs 0.075): that is exactly what we should expect, because 94% of union variance is between-worker, so POLS and Between are looking at almost the same picture from slightly different angles. Both share the same identification problem and serve as the &lt;em>pre-FE benchmarks&lt;/em> against which the within-style estimators will diverge sharply in the next sections.&lt;/p>
&lt;h2 id="8-first-differences-subtracting-the-past-from-the-present">8. First-differences: subtracting the past from the present&lt;/h2>
&lt;p>The first within-style estimator we will see is &lt;strong>first-differences&lt;/strong> (FDFE). The idea is to subtract each worker&amp;rsquo;s 2010 values from their 2012 values; any time-invariant trait (ability, schooling, family background) cancels out in the subtraction. We are left with a regression of $\Delta\mathrm{lwage}$ on $\Delta\mathrm{union}$, identified entirely from the workers who &lt;em>changed&lt;/em> union status.&lt;/p>
&lt;p>Formally, write the panel model as&lt;/p>
&lt;p>$$y_{it} = \alpha_i + \beta x_{it} + u_{it}$$&lt;/p>
&lt;p>where $\alpha_i$ is the worker-specific (unobserved) effect. Differencing across the two periods gives&lt;/p>
&lt;p>$$y_{i,2012} - y_{i,2010} = \beta (x_{i,2012} - x_{i,2010}) + (u_{i,2012} - u_{i,2010})$$&lt;/p>
&lt;p>In words, this says: the change in wages between 2010 and 2012 equals $\beta$ times the change in union status, plus a noise term. The worker-specific $\alpha_i$ has vanished. Mapping to code: $y$ is the &lt;code>lwage&lt;/code> column, $x$ is &lt;code>union&lt;/code>, $\alpha_i$ is whatever is unique about each worker&amp;rsquo;s &lt;code>ID&lt;/code>, and $\beta$ is the parameter we want to estimate.&lt;/p>
&lt;pre>&lt;code class="language-python"># Stata: bysort ID: gen d_lwage = lwage - L.lwage; reg d_lwage d_union, robust
df_diff = (df.sort_values([&amp;quot;ID&amp;quot;, &amp;quot;year&amp;quot;])
           .groupby(&amp;quot;ID&amp;quot;)[[&amp;quot;lwage&amp;quot;, &amp;quot;union&amp;quot;]].diff().dropna())
df_diff.columns = [&amp;quot;d_lwage&amp;quot;, &amp;quot;d_union&amp;quot;]
fit_fdfe = pf.feols(&amp;quot;d_lwage ~ d_union&amp;quot;, data=df_diff, vcov=&amp;quot;HC1&amp;quot;)
fdfe_coef = fit_fdfe.coef()[&amp;quot;d_union&amp;quot;]
fdfe_se = fit_fdfe.se()[&amp;quot;d_union&amp;quot;]
print(f&amp;quot;Union coefficient: {fdfe_coef:.4f} (SE {fdfe_se:.4f})&amp;quot;)
print(f&amp;quot;Differenced sample: {len(df_diff)} rows (one per worker since T=2).&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Union coefficient: 0.2113 (SE 0.0792)
Differenced sample: 2199 rows (one per worker since T=2).
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>Interpretation.&lt;/strong> The first-difference estimator returns 21.1 log points (SE 7.9), with a 95% confidence interval of roughly [0.06, 0.37]. The point estimate is &lt;em>almost three times larger&lt;/em> than POLS (0.211 vs 0.075), and the standard error is about 3.4× larger — the classic signature of moving from a cross-sectional design to a switcher-only design. The CI is wide but excludes zero, so the upward revision is statistically detectable. The intuition: workers who switch into unions are not the same as workers who are always in unions, so the within-worker effect is a different — and arguably cleaner — parameter than the cross-sectional comparison.&lt;/p>
&lt;h2 id="9-within--fixed-effects-the-same-idea-run-differently">9. Within / Fixed effects: the same idea, run differently&lt;/h2>
&lt;p>The &lt;strong>within estimator&lt;/strong> (also called fixed effects, FE) achieves the same goal as first-differences through a different transformation: it subtracts each worker&amp;rsquo;s &lt;em>mean&lt;/em> from each observation. Every variable becomes $\tilde{x}_{it} = x_{it} - \bar{x}_i$. After this &lt;em>within transformation&lt;/em>, OLS on the demeaned data delivers the FE coefficient. Modern software (&lt;code>pyfixest&lt;/code> here, &lt;code>reghdfe&lt;/code> in Stata) hides the demeaning step and just lets us write &lt;code>lwage ~ union | ID&lt;/code>, where the &lt;code>| ID&lt;/code> syntax means &amp;ldquo;absorb individual fixed effects&amp;rdquo;.&lt;/p>
&lt;pre>&lt;code class="language-python"># Manual demeaning — pedagogical, makes the within transformation visible.
df[&amp;quot;lwage_demean&amp;quot;] = df[&amp;quot;lwage&amp;quot;] - df.groupby(&amp;quot;ID&amp;quot;)[&amp;quot;lwage&amp;quot;].transform(&amp;quot;mean&amp;quot;)
df[&amp;quot;union_demean&amp;quot;] = df[&amp;quot;union&amp;quot;] - df.groupby(&amp;quot;ID&amp;quot;)[&amp;quot;union&amp;quot;].transform(&amp;quot;mean&amp;quot;)
# Stata: xtreg lwage union, fe robust (or) reghdfe lwage union, absorb(ID)
fit_fe = pf.feols(&amp;quot;lwage ~ union | ID&amp;quot;, data=df, vcov=&amp;quot;HC1&amp;quot;)
fe_coef = fit_fe.coef()[&amp;quot;union&amp;quot;]
fe_se = fit_fe.se()[&amp;quot;union&amp;quot;]
print(f&amp;quot;Union coefficient: {fe_coef:.4f} (SE {fe_se:.4f})&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Union coefficient: 0.2103 (SE 0.0812)
&lt;/code>&lt;/pre>
&lt;p>The figure below visualizes what the demeaning actually does. The left panel shows the raw data — union (jittered for visibility) on the x-axis, log wage on the y-axis, and a POLS regression line through the cloud. The right panel shows the same observations after subtracting each worker&amp;rsquo;s mean from both variables; the FE regression line goes through the demeaned cloud and through the origin.&lt;/p>
&lt;p>&lt;img src="panel_intro_demeaning.png" alt="Within transformation: raw scatter on the left (POLS slope), demeaned scatter on the right (FE slope).">&lt;/p>
&lt;p>&lt;strong>Interpretation.&lt;/strong> The two panels look almost like different datasets, but they come from the &lt;em>same&lt;/em> observations. On the left (raw data), the POLS slope is ≈ 0.08: with a binary regressor, that slope is just the gap between the union and non-union groups&amp;rsquo; mean wages, which are close together. On the right (demeaned data), the FE slope is ≈ 0.21, identified only by the workers who actually changed union status — those are the points that move off the origin. The visual makes geometrically clear what the variance decomposition told us numerically: the within slope is steeper because we are no longer comparing &lt;em>across&lt;/em> workers (where ability and schooling confound the picture); we are comparing each worker to themselves.&lt;/p>
&lt;p>The FE coefficient of 0.2103 is essentially identical to FDFE (0.2113). The tiny gap of +0.001 comes from the fact that our FD regression includes an intercept (which absorbs an aggregate time trend), while plain FE does not. Adding a year fixed effect to FE — that&amp;rsquo;s two-way FE in the next section — absorbs the same trend; with T = 2, FD-with-intercept and two-way FE coincide exactly when they use the same regressors (our TWFE specification below also controls for age, so its estimate differs slightly).&lt;/p>
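&lt;p>These T = 2 identities can be verified on a synthetic two-period panel. The sketch below uses a hypothetical DGP (not the wage data): first-differences &lt;em>without&lt;/em> an intercept reproduces one-way FE exactly, and first-differences &lt;em>with&lt;/em> an intercept reproduces two-way FE, because with two periods the FD intercept and the year effect are the same parameter.&lt;/p>
&lt;pre>&lt;code class="language-python"># Synthetic check of the T = 2 identities (hypothetical DGP, not the wage data):
# FD through the origin equals one-way FE; FD with an intercept equals TWFE.
import numpy as np

rng = np.random.default_rng(0)
n, beta, delta = 500, 0.2, 0.05           # true slope and period-2 time effect
alpha = rng.normal(size=n)                # time-invariant worker effects
x1 = rng.binomial(1, 0.2, size=n).astype(float)
x2 = rng.binomial(1, 0.2, size=n).astype(float)
y1 = alpha + beta * x1 + rng.normal(scale=0.1, size=n)
y2 = alpha + delta + beta * x2 + rng.normal(scale=0.1, size=n)
dx, dy = x2 - x1, y2 - y1

fd_origin = dx @ dy / (dx @ dx)                      # FD without intercept
fd_const = np.cov(dx, dy, ddof=0)[0, 1] / dx.var()   # FD with intercept

# One-way FE: demean each worker, then take the slope of the demeaned cloud.
xw = np.concatenate([x1, x2]) - np.tile((x1 + x2) / 2, 2)
yw = np.concatenate([y1, y2]) - np.tile((y1 + y2) / 2, 2)
fe = xw @ yw / (xw @ xw)

# TWFE: additionally remove each period mean from the demeaned data.
xt = xw - np.repeat([xw[:n].mean(), xw[n:].mean()], n)
yt = yw - np.repeat([yw[:n].mean(), yw[n:].mean()], n)
twfe = xt @ yt / (xt @ xt)

print(np.isclose(fd_origin, fe), np.isclose(fd_const, twfe))  # True True
&lt;/code>&lt;/pre>
&lt;p>Both comparisons hold to floating-point precision, which is the algebra of Section 8 restated: with two periods, differencing and demeaning are the same linear transformation up to a factor of two.&lt;/p>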
&lt;p>A small numerical aside: the &lt;strong>dummy-variable&lt;/strong> version of FE gives the same answer.&lt;/p>
&lt;pre>&lt;code class="language-python">df[&amp;quot;ID_str&amp;quot;] = df[&amp;quot;ID&amp;quot;].astype(str)
fit_dvfe = pf.feols(&amp;quot;lwage ~ union + C(ID_str)&amp;quot;, data=df, vcov=&amp;quot;HC1&amp;quot;)
print(f&amp;quot;DVFE coefficient: {fit_dvfe.coef()['union']:.4f}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">DVFE coefficient: 0.2103
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>Interpretation.&lt;/strong> Including a dummy for every worker (N − 1 = 2,198 dummies in this sample) recovers the FE coefficient exactly: 0.2103. The within transformation, first-differences, and dummy-variable FE are three recipes for the same dish. The reason modern software prefers absorption (&lt;code>| ID&lt;/code>) over dummies is purely computational: with N = 2,199 dummies it still runs fast, but at N = 100,000 the dummy specification becomes prohibitive while absorbed FE remains trivial.&lt;/p>
&lt;h2 id="10-two-way-fixed-effects-closing-the-fdfe-gap">10. Two-way fixed effects: closing the FD–FE gap&lt;/h2>
&lt;p>&lt;strong>Two-way fixed effects&lt;/strong> (TWFE) absorbs both individual and time effects. We let &lt;code>pyfixest&lt;/code> handle both with &lt;code>| ID + year&lt;/code>. This is the workhorse specification of applied micro and DID research.&lt;/p>
&lt;pre>&lt;code class="language-python"># Stata: reghdfe lwage union age, absorb(ID year) vce(cluster ID)
fit_twfe = pf.feols(&amp;quot;lwage ~ union + age | ID + year&amp;quot;, data=df, vcov={&amp;quot;CRV1&amp;quot;: &amp;quot;ID&amp;quot;})
twfe_coef = fit_twfe.coef()[&amp;quot;union&amp;quot;]
twfe_se = fit_twfe.se()[&amp;quot;union&amp;quot;]
print(f&amp;quot;Union coefficient: {twfe_coef:.4f} (SE {twfe_se:.4f})&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Union coefficient: 0.2129 (SE 0.0793)
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>Interpretation.&lt;/strong> TWFE returns 21.3 log points (SE 7.9), almost indistinguishable from FE (0.210). The small gap relative to FE (about +0.003) reflects the absorbed year effects plus the added age control: by absorbing year effects we mechanically remove the aggregate wage trend that FD&amp;rsquo;s intercept was capturing. Schooling, gender, and any other time-invariant regressor would be silently absorbed by the individual fixed effects — you cannot identify the effect of something that does not change within a worker. This is a structural feature of within-style methods, not a coding error, and is one of the main reasons applied researchers reach for CRE/Mundlak when they want both within identification &lt;em>and&lt;/em> coefficients on time-invariant variables.&lt;/p>
&lt;h2 id="11-random-effects-betting-on-the-no-correlation-assumption">11. Random effects: betting on the no-correlation assumption&lt;/h2>
&lt;p>The &lt;strong>random-effects&lt;/strong> (RE) estimator takes a different stance: it treats the worker effect $\alpha_i$ as a &lt;em>random&lt;/em> draw from a population, &lt;em>uncorrelated with the regressors&lt;/em>. If that assumption holds, RE is more efficient than FE because it uses both within and between variation. If the assumption fails, RE is biased.&lt;/p>
&lt;p>Two pieces of vocabulary that the rest of this section relies on. First, RE is fit by &lt;em>generalized least squares&lt;/em> (GLS) — a weighted regression that downweights observations whose individual effect is harder to learn from, which is what lets RE blend between- and within-variation in the right proportions. Second, an estimator is &lt;em>consistent&lt;/em> if its bias shrinks toward zero as the sample grows; an &lt;em>inconsistent&lt;/em> estimator stays biased no matter how much data you collect. RE is consistent under the no-correlation assumption; FE is consistent under weaker assumptions and is therefore the safer default whenever the no-correlation assumption is suspect.&lt;/p>
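&lt;p>The GLS machinery has a concrete textbook representation called &lt;em>quasi-demeaning&lt;/em>: RE is OLS on $y_{it} - \theta \bar{y}_i$, where $\theta$ depends on the idiosyncratic error variance relative to the worker-effect variance. At $\theta = 0$ RE collapses to pooled OLS; as $\theta \to 1$ it approaches FE. A sketch with hypothetical variance components (not estimated from this dataset):&lt;/p>

```python
import numpy as np

def theta(s2_u, s2_a, T):
    """Quasi-demeaning weight:
    1 - sqrt(var_idiosyncratic / (var_idiosyncratic + T * var_worker_effect))."""
    return 1 - np.sqrt(s2_u / (s2_u + T * s2_a))

# Hypothetical variance components, T = 2 as in our panel
print(theta(1.0, 0.0, 2))    # no worker effect        -> 0.00 (pooled OLS)
print(theta(1.0, 1.0, 2))    # equal variances         -> ~0.42
print(theta(1.0, 100.0, 2))  # worker effect dominates -> ~0.93 (close to FE)
```

The larger the worker-effect variance relative to the noise, the closer RE's answer sits to FE; this is the sense in which GLS "blends" the two sources of variation.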
&lt;pre>&lt;code class="language-python"># Stata: xtreg lwage union, re robust
df_re = df.set_index([&amp;quot;ID&amp;quot;, &amp;quot;year&amp;quot;])
exog = sm.add_constant(df_re[[&amp;quot;union&amp;quot;]])
fit_re = RandomEffects(df_re[&amp;quot;lwage&amp;quot;], exog).fit(cov_type=&amp;quot;robust&amp;quot;)
re_coef = fit_re.params[&amp;quot;union&amp;quot;]
re_se = fit_re.std_errors[&amp;quot;union&amp;quot;]
print(f&amp;quot;Union coefficient: {re_coef:.4f} (SE {re_se:.4f})&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Union coefficient: 0.1092 (SE 0.0299)
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>Interpretation.&lt;/strong> RE returns 10.9 log points (SE 3.0), which sits squarely between POLS (0.075) and FE (0.210). RE is mathematically a &lt;em>weighted average&lt;/em> of the between and within estimators, with the weights determined by their relative variances. Because our data has very thin within variation in union status (only 9% of total), RE leans heavily toward the between picture and lands much closer to POLS than to FE. The RE standard error (0.030) is a striking 2.7× tighter than FE&amp;rsquo;s (0.081), but that efficiency is real only if individual effects are uncorrelated with union membership. If union-status selection is correlated with unobserved ability — and the gap between FE and POLS strongly suggests it is — that precision is being purchased with bias.&lt;/p>
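&lt;p>The weighted-average claim can be checked with back-of-envelope arithmetic. Treating the RE estimate as a scalar convex combination $w \cdot \hat{\beta}_{\mathrm{Between}} + (1 - w) \cdot \hat{\beta}_{\mathrm{FE}}$ (a heuristic: the exact GLS weighting is matrix-valued), the implied weight on the between estimator works out to about 0.70:&lt;/p>

```python
# Coefficients from the sections above; the scalar decomposition is a heuristic
be, fe, re_ = 0.0662, 0.2103, 0.1092
w = (re_ - fe) / (be - fe)
print(f"implied between weight: {w:.2f}")  # RE leans heavily toward between
```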
&lt;h2 id="12-the-hausman-test-fe-or-re">12. The Hausman test: FE or RE?&lt;/h2>
&lt;p>The classic specification test for FE-vs-RE is due to &lt;strong>Hausman (1978)&lt;/strong>. The intuition: if both estimators are consistent (the RE assumption holds), they should give similar answers; if they differ a lot, the RE assumption is suspect and FE is preferred. Formally,&lt;/p>
&lt;p>$$H = (\hat{\beta}_{\mathrm{FE}} - \hat{\beta}_{\mathrm{RE}})' [V_{\mathrm{FE}} - V_{\mathrm{RE}}]^{-1} (\hat{\beta}_{\mathrm{FE}} - \hat{\beta}_{\mathrm{RE}}) \sim \chi^2(k)$$&lt;/p>
&lt;p>In words, this says: take the difference between the two coefficient vectors, weight it by the inverse of the difference of the two variance matrices, and compare the resulting quadratic form to a chi-square distribution with degrees of freedom equal to the number of regressors. A large $H$ (small p-value) rejects the null that RE is consistent. Mapping to code: $\hat{\beta}_{\mathrm{FE}}$ is &lt;code>fe_coef&lt;/code>, $\hat{\beta}_{\mathrm{RE}}$ is &lt;code>re_coef&lt;/code>, and $V_{\mathrm{FE}}$ and $V_{\mathrm{RE}}$ are the squared standard errors (since we have a single regressor here, both reduce to scalars).&lt;/p>
&lt;pre>&lt;code class="language-python">b_diff = np.array([fe_coef - re_coef])
v_diff = np.array([[fe_se ** 2 - re_se ** 2]])
H = float(b_diff @ np.linalg.pinv(v_diff) @ b_diff)
p_h = 1 - chi2.cdf(H, df=1)
print(f&amp;quot;H statistic: {H:.4f} p-value = {p_h:.4f}&amp;quot;)
print(f&amp;quot;β_FE − β_RE = {b_diff[0]:+.4f}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">H statistic: 1.7941 p-value = 0.1804
β_FE − β_RE = +0.1011
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>Interpretation.&lt;/strong> The two estimators differ by about 0.101 log points; the test statistic is 1.79 on 1 degree of freedom, giving a p-value of 0.180. Conventionally, since 0.180 &amp;gt; 0.05, we &lt;em>fail to reject&lt;/em> the null and conclude that RE is acceptable. But take this verdict with a grain of salt: the Hausman test has low power exactly when within variation is thin, which is the case here (only 9% within share for union). A noisy FE estimate inflates $V_{\mathrm{FE}}$ in the denominator and shrinks $H$, making non-rejection mechanical rather than substantive. We will see in the next section that the modern Mundlak alternative gives a borderline-significant signal in the same data.&lt;/p>
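&lt;p>The low-power mechanics can be illustrated directly: hold the coefficient gap fixed at the observed 0.1011 and vary only the FE standard error. The alternative SE values below are hypothetical; for one degree of freedom the chi-square upper tail equals $\operatorname{erfc}(\sqrt{H/2})$, so the sketch needs only the standard library:&lt;/p>

```python
import math

b_gap, re_se = 0.1011, 0.0299  # observed gap and RE standard error
for fe_se in (0.04, 0.0812, 0.12):  # middle value is the actual FE SE
    H = b_gap ** 2 / (fe_se ** 2 - re_se ** 2)
    p = math.erfc(math.sqrt(H / 2))  # chi-square(1) upper-tail probability
    print(f"FE SE = {fe_se:.4f}: H = {H:6.2f}, p = {p:.4f}")
```

With an FE SE of 0.04 the same gap would be decisively significant; at the actual 0.0812 the statistic reproduces the H ≈ 1.79, p ≈ 0.18 verdict above. A noisier FE estimate deflates $H$ purely mechanically.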
&lt;h2 id="13-correlated-random-effects-cre--mundlak-the-modern-bridge">13. Correlated random effects (CRE / Mundlak): the modern bridge&lt;/h2>
&lt;p>&lt;strong>Mundlak (1978)&lt;/strong> proposed a clever specification that bridges FE and RE. The idea: include each worker&amp;rsquo;s &lt;em>mean&lt;/em> of every time-varying regressor as an additional control, then run RE.&lt;/p>
&lt;p>$$y_{it} = \alpha + \beta x_{it} + \gamma \bar{x}_i + u_{it}$$&lt;/p>
&lt;p>In words, this says: model wages as a function of current union status, &lt;em>plus&lt;/em> the worker&amp;rsquo;s average union exposure across the panel. The coefficient $\beta$ on the time-varying $x_{it}$ captures the &lt;em>within&lt;/em> effect — and Mundlak proved that under standard assumptions it is numerically identical to the FE coefficient. The coefficient $\gamma$ on the worker mean $\bar{x}_i$ captures the &lt;em>between&lt;/em> effect of selection. If $\gamma \neq 0$, individual effects are correlated with union status and FE is preferred over RE. Mapping to code: $\beta$ is &lt;code>cre_coef&lt;/code>, $\gamma$ is &lt;code>mundlak_coef&lt;/code>, and $\bar{x}_i$ is the &lt;code>union_bar&lt;/code> column we constructed with &lt;code>df.groupby(&amp;quot;ID&amp;quot;)[&amp;quot;union&amp;quot;].transform(&amp;quot;mean&amp;quot;)&lt;/code>.&lt;/p>
&lt;pre>&lt;code class="language-python"># Stata: bysort ID: egen union_bar = mean(union); xtreg lwage union union_bar, re robust
df[&amp;quot;union_bar&amp;quot;] = df.groupby(&amp;quot;ID&amp;quot;)[&amp;quot;union&amp;quot;].transform(&amp;quot;mean&amp;quot;)
df_cre = df.set_index([&amp;quot;ID&amp;quot;, &amp;quot;year&amp;quot;])
exog_cre = sm.add_constant(df_cre[[&amp;quot;union&amp;quot;, &amp;quot;union_bar&amp;quot;]])
fit_cre = RandomEffects(df_cre[&amp;quot;lwage&amp;quot;], exog_cre).fit(cov_type=&amp;quot;robust&amp;quot;)
cre_coef = fit_cre.params[&amp;quot;union&amp;quot;]
cre_se = fit_cre.std_errors[&amp;quot;union&amp;quot;]
mundlak_coef = fit_cre.params[&amp;quot;union_bar&amp;quot;]
mundlak_p = fit_cre.pvalues[&amp;quot;union_bar&amp;quot;]
print(f&amp;quot;Union (within) coefficient: {cre_coef:.4f} (SE {cre_se:.4f})&amp;quot;)
print(f&amp;quot;Mundlak term (union_bar): {mundlak_coef:+.4f} (p = {mundlak_p:.4f})&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Union (within) coefficient: 0.2103 (SE 0.0703)
Mundlak term (union_bar): -0.1441 (p = 0.0717)
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>Interpretation.&lt;/strong> The CRE union coefficient is 0.2103 — &lt;em>exactly&lt;/em> the FE estimate to four decimal places, just as Mundlak&amp;rsquo;s algebraic result predicts. The Mundlak term is −0.1441 with a p-value of 0.072, marginally non-significant at the 5% level but suggestive: workers with higher &lt;em>average&lt;/em> union exposure tend to have lower wages even after conditioning on within-worker changes, which is consistent with negative selection into unionized jobs (lower-wage workers select into unions, perhaps because the union premium matters more for them). The Mundlak signal points the same direction as the Hausman test but reaches the borderline-significant zone because it does not have to fight the same noise penalty.&lt;/p>
&lt;h2 id="14-putting-it-all-together-the-method-comparison">14. Putting it all together: the method comparison&lt;/h2>
&lt;p>The figure below stacks all six basic estimators on a single chart with 95% confidence intervals.&lt;/p>
&lt;p>&lt;img src="panel_intro_coef_comparison.png" alt="Six panel-data estimators with 95% confidence intervals. The Hausman χ² and p-value are annotated.">&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Method&lt;/th>
&lt;th>Coef&lt;/th>
&lt;th>SE&lt;/th>
&lt;th>What variation does it use?&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>POLS&lt;/td>
&lt;td>0.0750&lt;/td>
&lt;td>0.0231&lt;/td>
&lt;td>All — ignores panel structure&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Between&lt;/td>
&lt;td>0.0662&lt;/td>
&lt;td>0.0311&lt;/td>
&lt;td>Cross-sectional means only&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>FDFE&lt;/td>
&lt;td>0.2113&lt;/td>
&lt;td>0.0792&lt;/td>
&lt;td>Within-individual differences&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>FE&lt;/td>
&lt;td>0.2103&lt;/td>
&lt;td>0.0812&lt;/td>
&lt;td>Within-individual demeaned&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>RE&lt;/td>
&lt;td>0.1092&lt;/td>
&lt;td>0.0299&lt;/td>
&lt;td>GLS-weighted between + within&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>CRE&lt;/td>
&lt;td>0.2103&lt;/td>
&lt;td>0.0703&lt;/td>
&lt;td>RE with Mundlak terms (= FE within)&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>&lt;strong>Interpretation.&lt;/strong> The six methods cluster into two clear camps. The cross-sectional methods (POLS 0.075, Between 0.066, RE 0.109) report a union premium of 7–11 log points; the within methods (FDFE 0.211, FE 0.210, CRE 0.210) report 21 log points. The factor-of-three gap is the central pedagogical finding of this dataset and is consistent with a story in which unobserved worker ability correlates &lt;em>negatively&lt;/em> with union status — workers who are higher-ability are less likely to be in unions in this sample, so cross-sectional comparisons understate the within-worker payoff to &lt;em>joining&lt;/em> a union. Standard errors swing inversely: cross-sectional methods are 2–3× more precise but identify a different (and biased, under our hypothesis) parameter, while within methods are noisier but causally cleaner under weaker assumptions.&lt;/p>
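&lt;p>The confidence intervals in the figure can be reproduced from the table with the usual normal approximation (coef ± 1.96 × SE), which puts the precision trade-off into explicit numbers:&lt;/p>

```python
# Coefficients and SEs copied from the comparison table above
results = {
    "POLS": (0.0750, 0.0231), "Between": (0.0662, 0.0311),
    "FDFE": (0.2113, 0.0792), "FE": (0.2103, 0.0812),
    "RE": (0.1092, 0.0299), "CRE": (0.2103, 0.0703),
}
for name, (b, se) in results.items():
    lo, hi = b - 1.96 * se, b + 1.96 * se
    print(f"{name:8s} {b:.4f}  95% CI [{lo:+.3f}, {hi:+.3f}]")
```

Note that the within-camp intervals are wide enough to reach down past the cross-sectional point estimates, so the two camps are distinguished by their identification strategy, not by clean statistical separation.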
&lt;h2 id="15-adding-controls-the-extended-models">15. Adding controls: the extended models&lt;/h2>
&lt;p>Real applications usually include controls. We re-run POLS, TWFE, RE, and CRE with age, schooling, female, and year dummies on the right-hand side. The next code block stitches the four specifications together; the table below summarizes the union, age, schooling, and female coefficients.&lt;/p>
&lt;pre>&lt;code class="language-python"># POLS with controls
fit_pols_x = pf.feols(
&amp;quot;lwage ~ union + age + schooling + female + C(year)&amp;quot;,
data=df, vcov=&amp;quot;HC1&amp;quot;)
# TWFE: schooling and female are time-invariant → absorbed by ID FE
fit_twfe_x = pf.feols(&amp;quot;lwage ~ union + age | ID + year&amp;quot;,
data=df, vcov={&amp;quot;CRV1&amp;quot;: &amp;quot;ID&amp;quot;})
# RE + controls
df_rx = df.set_index([&amp;quot;ID&amp;quot;, &amp;quot;year&amp;quot;])
exog_rx = sm.add_constant(df_rx[[&amp;quot;union&amp;quot;, &amp;quot;age&amp;quot;, &amp;quot;schooling&amp;quot;, &amp;quot;female&amp;quot;]])
fit_re_x = RandomEffects(df_rx[&amp;quot;lwage&amp;quot;], exog_rx).fit(cov_type=&amp;quot;robust&amp;quot;)
# CRE + controls — adds the within-mean of every time-varying regressor
df[&amp;quot;age_bar&amp;quot;] = df.groupby(&amp;quot;ID&amp;quot;)[&amp;quot;age&amp;quot;].transform(&amp;quot;mean&amp;quot;)
exog_cx = sm.add_constant(
df_rx[[&amp;quot;union&amp;quot;, &amp;quot;union_bar&amp;quot;, &amp;quot;age&amp;quot;, &amp;quot;age_bar&amp;quot;, &amp;quot;schooling&amp;quot;, &amp;quot;female&amp;quot;]])
fit_cre_x = RandomEffects(df_rx[&amp;quot;lwage&amp;quot;], exog_cx).fit(cov_type=&amp;quot;robust&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Variable POLS TWFE RE CRE
================================================================================
union 0.0571 (0.0204) 0.2129 (0.0793) 0.0861 (0.0258) 0.2103 (0.0683)
age 0.0209 (0.0013) -0.0576 (0.0238) 0.0224 (0.0016) 0.0332 (0.0046)
schooling 0.1108 (0.0037) absorbed 0.1112 (0.0047) 0.1108 (0.0047)
female -0.2731 (0.0160) absorbed -0.2731 (0.0206) -0.2731 (0.0206)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="panel_intro_extended_models.png" alt="Extended models: union, age, schooling, female across POLS / TWFE / RE / CRE.">&lt;/p>
&lt;p>&lt;strong>Interpretation.&lt;/strong> Adding controls pulls the POLS union coefficient down to 0.057 — controls absorb some of the cross-sectional confounding — but TWFE and CRE still report a within-worker premium of about 0.21, leaving the four-camp gap (POLS 0.057 / RE 0.086 / TWFE 0.213 / CRE 0.210) largely intact. The schooling premium of 11.1 log points per year and the female penalty of 27.3 log points are stable across POLS, RE, and CRE because these regressors are essentially time-invariant; both are absorbed by individual FE in the TWFE column. The age coefficient does something interesting: it is +0.021 in POLS, +0.022 in RE, and +0.033 in CRE, but flips to −0.058 in TWFE. This is &lt;em>not&lt;/em> a real age–wage relationship: with T = 2 and every worker aging by exactly two years between waves, age within an individual is collinear with the year dummy, so the TWFE age coefficient confounds the age slope with the year effect. POLS, RE, and CRE return the expected positive age slope; the TWFE −0.058 should be read as a methodological artifact of T = 2.&lt;/p>
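&lt;p>The T = 2 collinearity behind the negative TWFE age coefficient can be demonstrated in a few lines: once every worker ages by exactly two years between waves, within-demeaned age is −1 in the first wave and +1 in the second for everyone, a perfect linear function of the year dummy. A toy sketch with synthetic workers (not the wage panel):&lt;/p>

```python
import pandas as pd

# Two waves, every worker exactly two years older in the second wave
toy = pd.DataFrame({
    "ID": [1, 1, 2, 2, 3, 3],
    "year": [2010, 2012] * 3,
    "age": [25, 27, 40, 42, 33, 35],
})
toy["age_dm"] = toy["age"] - toy.groupby("ID")["age"].transform("mean")
toy["d2012"] = (toy["year"] == 2012).astype(int)
print(toy[["age_dm", "d2012"]].corr().iloc[0, 1])  # correlation is exactly 1.0
```

With within-demeaned age perfectly correlated with the year dummy, any split of the combined effect between "age" and "year" is arbitrary, which is why the TWFE age coefficient cannot be read structurally.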
&lt;h2 id="16-discussion-what-does-our-case-study-tell-us">16. Discussion: what does our case study tell us?&lt;/h2>
&lt;p>We started with a deceptively simple question: does union membership raise wages, and if so by how much? Six estimators on the same dataset gave us answers ranging from 0.066 to 0.213 log points — a factor-of-three spread that is not noise but a structural feature of how the methods identify the parameter.&lt;/p>
&lt;p>The cross-sectional camp (POLS, Between, RE) is asking &amp;ldquo;how do union and non-union workers compare?&amp;rdquo;. Their 7–11% answer is what we would report if we believed union members were comparable to non-members on every relevant unobservable. The within camp (FE, FDFE, TWFE, CRE) is asking &amp;ldquo;what happens when &lt;em>the same worker&lt;/em> switches union status?&amp;rdquo;. Their 21% answer is what we would report if we trusted that nothing else changes for a worker between 2010 and 2012 except the things we observe. Both questions are legitimate; the gap between the answers is the empirical signature of selection on unobservables.&lt;/p>
&lt;p>The Hausman test failed to reject the random-effects assumption (p = 0.180), which by the textbook script would tell us to use RE. But the test has low power exactly when within variation is thin, which is the case here (9% within share). The Mundlak alternative landed at p = 0.072 — just shy of significance at the 5% level — and the Mundlak term itself was −0.144, suggesting that the workers with more union exposure are different (lower-paid on average) from workers with less. Both tests point in the same direction, but Mundlak&amp;rsquo;s nuanced &amp;ldquo;almost significant&amp;rdquo; reading is more honest than Hausman&amp;rsquo;s confident &amp;ldquo;fail to reject&amp;rdquo; verdict.&lt;/p>
&lt;p>For a practitioner faced with this kind of dataset, the practical implication is that &lt;strong>CRE/Mundlak is usually the right specification to lead with&lt;/strong>. It gives you the FE coefficient on the time-varying treatment (the within effect), the RE structure that lets you keep schooling and gender in the regression, and a built-in specification test (the t-statistic on the Mundlak term) that beats Hausman in low-power settings. The cost is one extra regressor per time-varying covariate, which is essentially free in modern software.&lt;/p>
&lt;p>Stated formally in causal-inference language: the within estimators (FDFE, FE, TWFE, CRE) target the average treatment effect for &lt;em>union switchers&lt;/em> — the subset of workers who actually changed union status between 2010 and 2012 — under the assumption of strict exogeneity conditional on the worker fixed effect. POLS and Between target a population-weighted association between union status and log wages and do not have a causal interpretation absent unconfoundedness. Reporting both estimands side-by-side (as we have done) is more informative than picking one and ignoring the other.&lt;/p>
&lt;h2 id="17-summary-and-next-steps">17. Summary and next steps&lt;/h2>
&lt;p>&lt;strong>Takeaways.&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Method insight.&lt;/strong> Three within recipes — first-differences, the within transformation, and dummy-variable FE — produce the same coefficient on union (0.2103, with FDFE differing by only +0.001 because of an intercept-driven year-trend artifact). This identity holds exactly when T = 2 and approximately when T &amp;gt; 2; understanding &lt;em>why&lt;/em> is the single most useful intuition in panel econometrics.&lt;/li>
&lt;li>&lt;strong>Data insight.&lt;/strong> Almost all of our variation is between workers (union 94%, age 97%, schooling 100%). Only 9% of union variance is within. That is the slice of the data that fixed-effects estimators are working with, and it explains why FE standard errors (0.081) are 2.7× larger than RE standard errors (0.030).&lt;/li>
&lt;li>&lt;strong>Limitation.&lt;/strong> With T = 2, our FE estimate is power-limited. The Hausman test fails to reject the RE assumption (p = 0.180) primarily because $V_{\mathrm{FE}}$ is large, not because RE is consistent. The Mundlak term tells the same story with more nuance (p = 0.072, borderline). Real applications usually have T &amp;gt; 2 and substantially more within variation, which sharpens both the FE estimate and the specification tests.&lt;/li>
&lt;li>&lt;strong>Next step.&lt;/strong> A natural extension is to use all five waves of the panel (2010–2018) instead of just 2010 and 2012, which would give us T = 5 and dramatically more within variation in union status. With T &amp;gt; 2, the FD–FE gap becomes a real identification choice (FD is more efficient under serially correlated errors; FE under random errors), and event-study designs become possible.&lt;/li>
&lt;/ul>
&lt;h2 id="18-exercises">18. Exercises&lt;/h2>
&lt;ol>
&lt;li>&lt;strong>Repeat the analysis with all five waves of the panel&lt;/strong> (2010, 2012, 2014, 2016, 2018). How does the FE coefficient change? Does the Hausman test still fail to reject? What about the Mundlak term?&lt;/li>
&lt;li>&lt;strong>Add an interaction with female.&lt;/strong> Modify the FE specification to include &lt;code>union × female&lt;/code> and interpret the coefficient. Does the union premium differ by gender?&lt;/li>
&lt;li>&lt;strong>Try a clustered bootstrap.&lt;/strong> Re-estimate the FE model with &lt;code>vcov={&amp;quot;CRV1&amp;quot;: &amp;quot;ID&amp;quot;}&lt;/code> and a wild cluster bootstrap (&lt;code>pyfixest&lt;/code> fitted models expose a &lt;code>wildboottest()&lt;/code> method). How do the bootstrap SEs compare to the analytical ones in this small-T setting?&lt;/li>
&lt;/ol>
&lt;h2 id="19-references">19. References&lt;/h2>
&lt;ol>
&lt;li>&lt;a href="https://pyfixest.org/pyfixest.html" target="_blank" rel="noopener">PyFixest documentation.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://bashtage.github.io/linearmodels/panel/introduction.html" target="_blank" rel="noopener">linearmodels: Panel models documentation.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chi2.html" target="_blank" rel="noopener">scipy.stats.chi2 documentation.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/quarcs-lab/data-open" target="_blank" rel="noopener">Wage panel dataset (&lt;code>wage_panel_bob4.dta&lt;/code>) — quarcs-lab data-open repository.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.jstor.org/stable/1913827" target="_blank" rel="noopener">Hausman, J. A. (1978). Specification Tests in Econometrics. &lt;em>Econometrica&lt;/em>, 46(6), 1251–1271.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.jstor.org/stable/1913646" target="_blank" rel="noopener">Mundlak, Y. (1978). On the Pooling of Time Series and Cross Section Data. &lt;em>Econometrica&lt;/em>, 46(1), 69–85.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://mitpress.mit.edu/9780262232586/econometric-analysis-of-cross-section-and-panel-data/" target="_blank" rel="noopener">Wooldridge, J. M. (2010). &lt;em>Econometric Analysis of Cross Section and Panel Data&lt;/em>, 2nd ed. MIT Press.&lt;/a>&lt;/li>
&lt;/ol></description></item></channel></rss>