<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>panel data | Carlos Mendez</title><link>https://carlos-mendez.org/tag/panel-data/</link><atom:link href="https://carlos-mendez.org/tag/panel-data/index.xml" rel="self" type="application/rss+xml"/><description>panel data</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><copyright>Carlos Mendez</copyright><lastBuildDate>Sun, 03 May 2026 00:00:00 +0000</lastBuildDate><image><url>https://carlos-mendez.org/media/icon_huedfae549300b4ca5d201a9bd09a3ecd5_79625_512x512_fill_lanczos_center_3.png</url><title>panel data</title><link>https://carlos-mendez.org/tag/panel-data/</link></image><item><title>MGWRFER: Causal Spatially Varying Coefficients via Panel Fixed Effects</title><link>https://carlos-mendez.org/post/python_mgwrfer/</link><pubDate>Sun, 03 May 2026 00:00:00 +0000</pubDate><guid>https://carlos-mendez.org/post/python_mgwrfer/</guid><description>&lt;h2 id="1-overview">1. Overview&lt;/h2>
&lt;p>When we estimate how relationships vary across space — say, the effect of education on income in different neighborhoods — a hidden danger lurks. If some unobserved factor (like geographic amenities or historical institutions) affects both the outcome and the covariates, our spatially varying coefficients absorb that contamination. The result: coefficients that look like local effects but actually reflect omitted variable bias.&lt;/p>
&lt;p>&lt;strong>Multiscale Geographically Weighted Fixed Effects Regression (MGWRFER)&lt;/strong> solves this by combining two powerful ideas: (1) a &lt;em>within-transformation&lt;/em> that removes all time-invariant confounders from panel data, and (2) &lt;em>Multiscale GWR&lt;/em> that estimates location-specific coefficients at variable-optimal spatial scales. Think of it as giving each location its own regression while simultaneously controlling for everything about that location that does not change over time.&lt;/p>
&lt;p>This tutorial asks: &lt;strong>can we recover the true spatially varying coefficients when a strong, unobserved spatial confounder contaminates the data?&lt;/strong> We simulate a panel of 225 spatial units observed over 3 time periods, inject a known confounder, and compare naive pooled MGWR (biased) against MGWRFER (bias-corrected). The answer is yes — MGWRFER cuts the most-biased coefficient&amp;rsquo;s estimation error by 55%, demonstrating that fixed effects and spatial flexibility can coexist.&lt;/p>
&lt;p>&lt;strong>Learning objectives:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Understand why pooled cross-sectional MGWR produces biased coefficients when time-invariant confounders exist&lt;/li>
&lt;li>Implement the within-transformation to eliminate fixed effects from panel data&lt;/li>
&lt;li>Estimate spatially varying coefficients using MGWR on demeaned data&lt;/li>
&lt;li>Assess coefficient recovery through RMSE, correlation, and spatial maps&lt;/li>
&lt;li>Interpret the bias-variance tradeoff inherent in fixed-effects spatial models&lt;/li>
&lt;/ul>
&lt;p>The analysis follows a clear progression: simulate known truth, fit the naive model, apply the correction, and compare.&lt;/p>
&lt;pre>&lt;code class="language-mermaid">graph LR
A[&amp;quot;&amp;lt;b&amp;gt;Step 1&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Simulate&amp;lt;br/&amp;gt;Panel DGP&amp;quot;] --&amp;gt; B[&amp;quot;&amp;lt;b&amp;gt;Step 2&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Pooled&amp;lt;br/&amp;gt;MGWR&amp;quot;]
B --&amp;gt; C[&amp;quot;&amp;lt;b&amp;gt;Step 3&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Within-&amp;lt;br/&amp;gt;Transform&amp;quot;]
C --&amp;gt; D[&amp;quot;&amp;lt;b&amp;gt;Step 4&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;MGWRFER&amp;lt;br/&amp;gt;Estimation&amp;quot;]
D --&amp;gt; E[&amp;quot;&amp;lt;b&amp;gt;Step 5&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Compare&amp;lt;br/&amp;gt;&amp;amp; Map&amp;quot;]
style A fill:#141413,stroke:#6a9bcc,color:#fff
style B fill:#d97757,stroke:#141413,color:#fff
style C fill:#6a9bcc,stroke:#141413,color:#fff
style D fill:#00d4c8,stroke:#141413,color:#fff
style E fill:#1a3a8a,stroke:#141413,color:#fff
&lt;/code>&lt;/pre>
&lt;p>The key insight is at Step 3: by subtracting each unit&amp;rsquo;s time-series mean, the confounder vanishes — it contributes the same amount at every time period, so the mean subtraction cancels it exactly. What remains is pure within-unit variation, driven only by the spatially varying coefficients and noise.&lt;/p>
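&lt;p>A minimal numeric sketch makes the cancellation concrete (a toy panel, not the simulation below; all names here are illustrative):&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np
rng = np.random.default_rng(0)
n_units, n_time = 4, 3
alpha = rng.uniform(0, 50, n_units)              # time-invariant confounder
x = rng.standard_normal((n_units, n_time))
y = alpha[:, None] + 1.5 * x + 0.1 * rng.standard_normal((n_units, n_time))
# Within-transformation: subtract each unit's time-series mean
y_w = y - y.mean(axis=1, keepdims=True)
x_w = x - x.mean(axis=1, keepdims=True)
# alpha contributes identically at every t, so it cancels exactly;
# a no-intercept OLS on the demeaned data recovers the true slope (1.5)
slope = (x_w * y_w).sum() / (x_w**2).sum()
print(round(slope, 2))
&lt;/code>&lt;/pre>
&lt;p>However large the confounder is, the recovered slope depends only on within-unit variation.&lt;/p>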
&lt;h2 id="2-setup-and-imports">2. Setup and imports&lt;/h2>
&lt;p>The analysis uses a &lt;a href="https://github.com/GeoZhipengLi/MGWPR" target="_blank" rel="noopener">custom fork of the mgwr package&lt;/a> that extends MGWR with panel data support (the &lt;code>time&lt;/code> parameter) and the ability to fit without an intercept (&lt;code>constant=False&lt;/code>). We clone the repository and import directly.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
import warnings
warnings.filterwarnings(&amp;quot;ignore&amp;quot;, category=FutureWarning)
warnings.filterwarnings(&amp;quot;ignore&amp;quot;, category=RuntimeWarning)
# Clone custom MGWR package
import subprocess, sys, os
REPO_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), &amp;quot;mgwpr_repo&amp;quot;)
if not os.path.exists(REPO_DIR):
    subprocess.run(
        [&amp;quot;git&amp;quot;, &amp;quot;clone&amp;quot;, &amp;quot;https://github.com/GeoZhipengLi/MGWPR.git&amp;quot;, REPO_DIR],
        check=True, capture_output=True
    )
sys.path.insert(0, REPO_DIR)
from mgwr.gwr import GWR, MGWR
from mgwr.sel_bw import Sel_BW
# Configuration
RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)
N_GRID = 15
N_UNITS = N_GRID * N_GRID # 225
N_TIME = 3
N_OBS = N_UNITS * N_TIME # 675
&lt;/code>&lt;/pre>
&lt;details>
&lt;summary>Dark theme figure styling (click to expand)&lt;/summary>
&lt;pre>&lt;code class="language-python">DARK_NAVY = &amp;quot;#0f1729&amp;quot;
GRID_LINE = &amp;quot;#1f2b5e&amp;quot;
LIGHT_TEXT = &amp;quot;#c8d0e0&amp;quot;
WHITE_TEXT = &amp;quot;#e8ecf2&amp;quot;
STEEL_BLUE = &amp;quot;#6a9bcc&amp;quot;
WARM_ORANGE = &amp;quot;#d97757&amp;quot;
TEAL = &amp;quot;#00d4c8&amp;quot;
plt.rcParams.update({
    &amp;quot;figure.facecolor&amp;quot;: DARK_NAVY,
    &amp;quot;axes.facecolor&amp;quot;: DARK_NAVY,
    &amp;quot;axes.edgecolor&amp;quot;: DARK_NAVY,
    &amp;quot;axes.linewidth&amp;quot;: 0,
    &amp;quot;axes.labelcolor&amp;quot;: LIGHT_TEXT,
    &amp;quot;axes.titlecolor&amp;quot;: WHITE_TEXT,
    &amp;quot;axes.spines.top&amp;quot;: False,
    &amp;quot;axes.spines.right&amp;quot;: False,
    &amp;quot;axes.spines.left&amp;quot;: False,
    &amp;quot;axes.spines.bottom&amp;quot;: False,
    &amp;quot;axes.grid&amp;quot;: True,
    &amp;quot;grid.color&amp;quot;: GRID_LINE,
    &amp;quot;grid.linewidth&amp;quot;: 0.6,
    &amp;quot;grid.alpha&amp;quot;: 0.8,
    &amp;quot;xtick.color&amp;quot;: LIGHT_TEXT,
    &amp;quot;ytick.color&amp;quot;: LIGHT_TEXT,
    &amp;quot;text.color&amp;quot;: WHITE_TEXT,
    &amp;quot;font.size&amp;quot;: 12,
    &amp;quot;legend.frameon&amp;quot;: False,
    &amp;quot;savefig.facecolor&amp;quot;: DARK_NAVY,
    &amp;quot;savefig.edgecolor&amp;quot;: DARK_NAVY,
})
&lt;/code>&lt;/pre>
&lt;/details>
&lt;h2 id="3-simulating-panel-data-with-a-spatial-confounder">3. Simulating panel data with a spatial confounder&lt;/h2>
&lt;p>To evaluate whether MGWRFER works, we need &lt;strong>ground truth&lt;/strong> — known coefficient surfaces that we can compare against estimates. We simulate a 15x15 spatial grid (225 units) observed over 3 time periods, giving 675 total observations.&lt;/p>
&lt;p>The data generating process (DGP) combines four covariates with known spatially varying coefficients plus a strong time-invariant confounder:&lt;/p>
&lt;p>$$y_{it} = \alpha_i + \beta_1(u_i, v_i) \cdot x_{1,it} + \beta_2(u_i, v_i) \cdot x_{2,it} + \beta_3(u_i, v_i) \cdot x_{3,it} + \beta_4(u_i, v_i) \cdot x_{4,it} + \varepsilon_{it}$$&lt;/p>
&lt;p>In words, this says: the outcome at location $i$ and time $t$ equals a location-specific fixed effect $\alpha_i$ (the confounder) plus four covariates multiplied by their location-specific coefficients, plus random noise. The subscript $(u_i, v_i)$ denotes the spatial coordinates — each coefficient is a different &lt;em>surface&lt;/em> over the grid, not a single number.&lt;/p>
&lt;p>&lt;strong>Variable mapping:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>$\alpha_i$ = &lt;code>alpha_true&lt;/code> — an exponential function of column position (range 2.07 to 51.55)&lt;/li>
&lt;li>$\beta_1$ = &lt;code>beta_1_true&lt;/code> — a quadratic dome peaking at the grid center (range 1.06 to 2.00)&lt;/li>
&lt;li>$\beta_2$ = &lt;code>beta_2_true&lt;/code> — a linear gradient increasing from lower-left to upper-right (range 1.07 to 2.00)&lt;/li>
&lt;li>$\beta_3$ = &lt;code>beta_3_true&lt;/code> — constant at 1.5 everywhere (tests spatial homogeneity)&lt;/li>
&lt;li>$\beta_4$ = &lt;code>beta_4_true&lt;/code> — identically zero everywhere (tests false-positive detection)&lt;/li>
&lt;/ul>
&lt;pre>&lt;code class="language-python">rng = np.random.default_rng(RANDOM_SEED)
# Spatial grid coordinates
grid_i = np.repeat(np.arange(1, N_GRID + 1), N_GRID)
grid_j = np.tile(np.arange(1, N_GRID + 1), N_GRID)
# True spatially varying coefficients
q = np.ceil(N_GRID / 4)
beta_1_true = 1 + ((q**2 - (q - grid_i/2)**2) * (q**2 - (q - grid_j/2)**2)) / q**4
beta_2_true = 1 + (grid_i + grid_j) / (2 * N_GRID)
beta_3_true = np.full(N_UNITS, 1.5)
beta_4_true = np.zeros(N_UNITS)
# Time-invariant spatial confounder (fixed effect)
alpha_true = 30 * (np.exp(grid_j / N_GRID) - 1)
# Generate panel observations
x1 = rng.standard_normal(N_OBS)
x2 = rng.standard_normal(N_OBS)
x3 = rng.standard_normal(N_OBS)
x4 = rng.standard_normal(N_OBS)
epsilon = rng.standard_normal(N_OBS)
# Repeat coefficients across time periods
b1 = np.repeat(beta_1_true, N_TIME)
b2 = np.repeat(beta_2_true, N_TIME)
b3 = np.repeat(beta_3_true, N_TIME)
b4 = np.repeat(beta_4_true, N_TIME)
alpha_panel = np.repeat(alpha_true, N_TIME)
# Outcome = fixed effect + spatially varying slopes + noise
y = alpha_panel + b1*x1 + b2*x2 + b3*x3 + b4*x4 + epsilon
print(f&amp;quot;Panel data shape: ({N_OBS}, 14)&amp;quot;)
print(pd.DataFrame({&amp;quot;y&amp;quot;: y, &amp;quot;x1&amp;quot;: x1, &amp;quot;x2&amp;quot;: x2, &amp;quot;x3&amp;quot;: x3, &amp;quot;x4&amp;quot;: x4})
      .describe().round(3).to_string())
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Panel data shape: (675, 14)
y x1 x2 x3 x4
count 675.000 675.000 675.000 675.000 675.000
mean 23.069 -0.038 -0.014 -0.110 0.027
std 15.489 0.982 1.009 1.010 1.017
min -4.073 -2.965 -3.648 -3.048 -3.064
25% 9.717 -0.702 -0.675 -0.771 -0.647
50% 20.862 -0.049 0.012 -0.089 0.052
75% 35.123 0.580 0.636 0.554 0.683
max 57.411 2.914 3.179 2.914 2.857
&lt;/code>&lt;/pre>
&lt;p>The outcome y has a mean of 23.07 and standard deviation of 15.49. Most of this cross-sectional variation comes from the confounder $\alpha_i$, which ranges from 2.07 to 51.55 (mean 23.29). By contrast, the four covariates are standard-normal draws (means near 0, SDs near 1.0), and the true coefficients are all modest in magnitude (ranging from 0 to 2). This is a challenging identification problem: the confounder dominates the outcome, so any method that ignores it will attribute confounder variation to the covariates.&lt;/p>
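&lt;p>A back-of-the-envelope variance share makes this concrete. The snippet below reconstructs the confounder from its formula and uses a rough stand-in for the slope-plus-noise variance (about 4 x 1.5^2 + 1 = 10, given unit-variance covariates); the stand-in is an approximation, not a value computed by the script.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np
N_GRID, N_TIME = 15, 3
grid_j = np.tile(np.arange(1, N_GRID + 1), N_GRID)
# The confounder formula from the DGP above
alpha_true = 30 * (np.exp(grid_j / N_GRID) - 1)
alpha_panel = np.repeat(alpha_true, N_TIME)
var_alpha = alpha_panel.var()
# Approximate variance of the slope terms plus noise: 4 * 1.5**2 + 1
var_rest_approx = 10.0
share = var_alpha / (var_alpha + var_rest_approx)
print(round(share, 2))  # ~0.96: the confounder dominates Var(y)
&lt;/code>&lt;/pre>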
&lt;p>The figure below shows the true coefficient surfaces and the confounder pattern on the 15x15 grid.&lt;/p>
&lt;pre>&lt;code class="language-python">fig, axes = plt.subplots(2, 2, figsize=(12, 11))
# ... plotting code for true coefficient surfaces ...
plt.savefig(&amp;quot;mgwrfer_true_coefficients.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="mgwrfer_true_coefficients.png" alt="True DGP coefficient surfaces: beta_1 shows a quadratic dome, beta_2 a linear gradient, beta_3 is constant at 1.5, and alpha_i is an exponential confounder dominating the cross-sectional variation.">&lt;/p>
&lt;p>The contrast is stark: $\alpha_i$ (lower-right panel) has a range of nearly 50 units, while the coefficients $\beta_1$ through $\beta_3$ vary by at most 1 unit. Any cross-sectional model that cannot separate $\alpha_i$ from the slopes will produce severely biased estimates — the exponential fixed-effect pattern will &amp;ldquo;leak&amp;rdquo; into the coefficient surfaces, distorting their true shapes.&lt;/p>
&lt;h2 id="4-pooled-mgwr-the-naive-approach">4. Pooled MGWR: the naive approach&lt;/h2>
&lt;p>The simplest approach ignores the panel structure entirely, treating all 675 observations as independent cross-sectional data and fitting MGWR with an intercept. This is what a researcher might do if they stacked multiple time periods without accounting for unit-specific effects.&lt;/p>
&lt;p>The custom &lt;code>mgwr&lt;/code> package requires variables to be &lt;strong>standardized&lt;/strong> before multiscale bandwidth selection. The &lt;code>time=N_TIME&lt;/code> parameter tells the algorithm that observations are grouped in panels of 3 time periods per unit, which affects the kernel weighting.&lt;/p>
&lt;pre>&lt;code class="language-python"># Standardize raw data
Y_std_pooled = (Y_raw - Y_raw.mean()) / Y_raw.std()
X_std_pooled = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)
# Bandwidth selection and fitting
pooled_selector = Sel_BW(
    coords_panel, Y_std_pooled, X_std_pooled,
    multi=True, constant=True, time=N_TIME
)
pooled_bw = pooled_selector.search()
pooled_model = MGWR(
    coords_panel, Y_std_pooled, X_std_pooled,
    pooled_selector, constant=True, time=N_TIME
).fit()
print(f&amp;quot;Pooled MGWR bandwidths: {pooled_bw}&amp;quot;)
print(f&amp;quot;Pooled MGWR R-squared: {pooled_model.R2:.4f}&amp;quot;)
print(f&amp;quot;Pooled MGWR Adj. R-squared: {pooled_model.adj_R2:.4f}&amp;quot;)
print(f&amp;quot;Pooled MGWR AICc: {pooled_model.aicc:.2f}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Pooled MGWR bandwidths: [ 44. 50. 175. 223. 223.]
Pooled MGWR R-squared: 0.9771
Pooled MGWR Adj. R-squared: 0.9759
Pooled MGWR AICc: -561.77
&lt;/code>&lt;/pre>
&lt;p>After back-transforming the standardized coefficients to the original scale, we compute recovery metrics against the known truth:&lt;/p>
&lt;pre>&lt;code class="language-python"># Back-transform: beta_orig = beta_std * (y_std / x_std)
# Average per unit across time periods, then compare to true values
print(&amp;quot; beta1_pooled: RMSE=0.3945, Corr=0.4586&amp;quot;)
print(&amp;quot; beta2_pooled: RMSE=0.0888, Corr=0.9504&amp;quot;)
print(&amp;quot; beta3_pooled: RMSE=0.0578, Corr=nan&amp;quot;)
print(&amp;quot; beta4_pooled: RMSE=0.2531, Corr=nan&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> beta1_pooled: RMSE=0.3945, Corr=0.4586
beta2_pooled: RMSE=0.0888, Corr=0.9504
beta3_pooled: RMSE=0.0578, Corr=nan
beta4_pooled: RMSE=0.2531, Corr=nan
&lt;/code>&lt;/pre>
&lt;p>The R-squared of 0.977 looks impressive, but it is misleading. The intercept (bandwidth = 44) absorbs most of the spatial variation from the confounder $\alpha_i$, inflating the apparent model fit without actually recovering the slope coefficients well. The contamination is most visible in $\beta_1$: its correlation with the true values is only 0.459, and the RMSE of 0.395 represents roughly 26% of the coefficient&amp;rsquo;s mean value (1.50). The model conflates the quadratic dome pattern with the exponential fixed effect. Meanwhile, $\beta_4$ — which is truly zero everywhere — shows an RMSE of 0.253, meaning the model falsely attributes confounder variation to a covariate that has no effect. The &lt;code>nan&lt;/code> correlations for $\beta_3$ and $\beta_4$ are mathematically expected: the true values have zero variance (constant and zero respectively), making Pearson correlation undefined.&lt;/p>
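&lt;p>The back-transform used throughout this tutorial (&lt;code>beta_orig = beta_std * (y_std / x_std)&lt;/code>) is worth seeing in isolation. A minimal sketch on synthetic data (independent of the tutorial&amp;rsquo;s variables) confirms that a slope fit on standardized data rescales back to the original units:&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np
rng = np.random.default_rng(1)
x = 3.0 * rng.standard_normal(500) + 5.0   # sd 3, mean 5
y = 2.0 * x + rng.standard_normal(500)     # true slope 2.0
# Fit the slope on standardized variables...
xs = (x - x.mean()) / x.std()
ys = (y - y.mean()) / y.std()
beta_std = (xs * ys).sum() / (xs**2).sum()
# ...then rescale to the original units of y and x
beta_orig = beta_std * (y.std() / x.std())
print(round(beta_orig, 2))  # close to the true slope of 2.0
&lt;/code>&lt;/pre>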
&lt;h2 id="5-mgwrfer-removing-the-confounder">5. MGWRFER: removing the confounder&lt;/h2>
&lt;h3 id="51-the-within-transformation">5.1 The within-transformation&lt;/h3>
&lt;p>The fix is elegant. If the confounder $\alpha_i$ does not change over time, we can eliminate it by subtracting each unit&amp;rsquo;s temporal mean from all its observations. This is the &lt;em>within-transformation&lt;/em> — the workhorse of panel data econometrics. Think of it like zeroing a kitchen scale: you subtract the weight of the container (the fixed effect) so that only the contents (the covariate effects) remain.&lt;/p>
&lt;p>Formally, for each unit $i$:&lt;/p>
&lt;p>$$\tilde{y}_{it} = y_{it} - \bar{y}_i = \beta_1(u_i, v_i)(x_{1,it} - \bar{x}_{1,i}) + \cdots + \beta_4(u_i, v_i)(x_{4,it} - \bar{x}_{4,i}) + (\varepsilon_{it} - \bar{\varepsilon}_i)$$&lt;/p>
&lt;p>In words, this says: after subtracting the unit mean $\bar{y}_i$, the fixed effect $\alpha_i$ vanishes completely (since $\alpha_i - \alpha_i = 0$). What remains are the within-unit deviations of the covariates multiplied by their true spatially varying coefficients, plus demeaned noise. The key &lt;strong>causal assumption&lt;/strong> is that no &lt;em>time-varying&lt;/em> confounders exist — strict exogeneity conditional on the fixed effects.&lt;/p>
&lt;p>&lt;strong>Variable mapping:&lt;/strong> $\tilde{y}_{it}$ corresponds to &lt;code>y_within&lt;/code> in the code, $\bar{y}_i$ is computed via &lt;code>groupby(&amp;quot;unit_id&amp;quot;).transform(&amp;quot;mean&amp;quot;)&lt;/code>, and the demeaned covariates are &lt;code>x1_within&lt;/code> through &lt;code>x4_within&lt;/code>.&lt;/p>
&lt;pre>&lt;code class="language-python"># Assemble panel DataFrame (see script.py for full construction)
# panel_df contains: unit_id, time_id, coord_i, coord_j, y, x1-x4, true coefficients
# Within-transformation: subtract unit means
unit_means = panel_df.groupby(&amp;quot;unit_id&amp;quot;)[[&amp;quot;y&amp;quot;,&amp;quot;x1&amp;quot;,&amp;quot;x2&amp;quot;,&amp;quot;x3&amp;quot;,&amp;quot;x4&amp;quot;]].transform(&amp;quot;mean&amp;quot;)
y_within = (panel_df[&amp;quot;y&amp;quot;].values - unit_means[&amp;quot;y&amp;quot;].values).reshape(-1, 1)
X_within = np.column_stack([
    panel_df[&amp;quot;x1&amp;quot;].values - unit_means[&amp;quot;x1&amp;quot;].values,
    panel_df[&amp;quot;x2&amp;quot;].values - unit_means[&amp;quot;x2&amp;quot;].values,
    panel_df[&amp;quot;x3&amp;quot;].values - unit_means[&amp;quot;x3&amp;quot;].values,
    panel_df[&amp;quot;x4&amp;quot;].values - unit_means[&amp;quot;x4&amp;quot;].values,
])
unit_mean_check = pd.Series(y_within.ravel()).groupby(panel_df[&amp;quot;unit_id&amp;quot;].values).mean()
print(f&amp;quot;y_within range: [{y_within.min():.3f}, {y_within.max():.3f}]&amp;quot;)
print(&amp;quot;Fixed effects removed (mean of y_within per unit = 0)&amp;quot;)
print(f&amp;quot;Max unit mean after demeaning: {unit_mean_check.abs().max():.2e} (should be ~0)&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> y_within range: [-6.877, 6.923]
Fixed effects removed (mean of y_within per unit = 0)
Max unit mean after demeaning: 7.11e-15 (should be ~0)
&lt;/code>&lt;/pre>
&lt;p>The demeaned outcome spans only [-6.88, 6.92] — a spread of 13.8 compared to the raw y range of [-4.07, 57.41] (spread of 61.5). The confounder, which ranged from 2.07 to 51.55, has been completely removed. The maximum unit mean after demeaning is 7.11 x 10^-15 — effectively machine-zero — confirming that the transformation is numerically exact. With $\alpha_i$ gone, any variation in the demeaned outcome is attributable solely to the covariates&amp;rsquo; spatially varying effects and noise.&lt;/p>
&lt;h3 id="52-mgwr-on-demeaned-data">5.2 MGWR on demeaned data&lt;/h3>
&lt;p>Now we fit MGWR on the within-transformed data. Two critical settings distinguish this from the pooled model:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>&lt;code>constant=False&lt;/code>&lt;/strong> — since demeaning removes the intercept (the unit-level mean is already gone), we fit slopes only.&lt;/li>
&lt;li>&lt;strong>Standardization&lt;/strong> — we standardize the demeaned variables before bandwidth selection, then back-transform the coefficients to the original scale.&lt;/li>
&lt;/ol>
&lt;pre>&lt;code class="language-python"># Standardize demeaned data
Y_std_fe = (y_within - y_within.mean()) / y_within.std()
X_std_fe = (X_within - X_within.mean(axis=0)) / X_within.std(axis=0)
# Bandwidth selection (no intercept)
fe_selector = Sel_BW(
    coords_panel, Y_std_fe, X_std_fe,
    multi=True, constant=False, time=N_TIME
)
fe_bw = fe_selector.search()
# Fit MGWRFER
fe_model = MGWR(
    coords_panel, Y_std_fe, X_std_fe,
    fe_selector, constant=False, time=N_TIME
).fit()
print(f&amp;quot;MGWRFER bandwidths: {fe_bw}&amp;quot;)
print(f&amp;quot;MGWRFER R-squared: {fe_model.R2:.4f}&amp;quot;)
print(f&amp;quot;MGWRFER Adj. R-squared: {fe_model.adj_R2:.4f}&amp;quot;)
print(f&amp;quot;MGWRFER AICc: {fe_model.aicc:.2f}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> MGWRFER bandwidths: [ 50. 91. 116. 62.]
MGWRFER R-squared: 0.8900
MGWRFER Adj. R-squared: 0.8844
MGWRFER AICc: 496.09
&lt;/code>&lt;/pre>
&lt;p>The R-squared of 0.890 reflects explanatory power over the &lt;em>demeaned&lt;/em> outcome — it is not directly comparable to the pooled model&amp;rsquo;s 0.977, which operates on raw y dominated by the confounder. A fairer interpretation: 89% of the within-unit temporal variation is explained by the spatially varying slopes.&lt;/p>
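&lt;p>This non-comparability is easy to demonstrate on a toy panel (an assumed setup, not the tutorial&amp;rsquo;s data): a &amp;ldquo;model&amp;rdquo; that predicts each observation with its unit&amp;rsquo;s fixed effect alone scores a near-perfect R-squared on raw y while explaining none of the within-unit variation.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np
rng = np.random.default_rng(3)
n_units, n_time = 100, 3
alpha = rng.uniform(0, 50, n_units)
x = rng.standard_normal((n_units, n_time))
y = alpha[:, None] + 1.5 * x + rng.standard_normal((n_units, n_time))
# Predict each observation with its unit fixed effect only (ignore x)
y_hat = np.repeat(alpha[:, None], n_time, axis=1)
r2_raw = 1 - ((y - y_hat)**2).sum() / ((y - y.mean())**2).sum()
# After demeaning, the same predictor forecasts zero and explains nothing
y_w = y - y.mean(axis=1, keepdims=True)
r2_within = 1 - (y_w**2).sum() / ((y_w - y_w.mean())**2).sum()
print(round(r2_raw, 2), round(r2_within, 2))
&lt;/code>&lt;/pre>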
&lt;p>After back-transforming the coefficients (&lt;code>beta_orig = beta_std * (y_std / x_std)&lt;/code>) and averaging per unit:&lt;/p>
&lt;pre>&lt;code class="language-python">print(&amp;quot; beta1_mgwrfer: RMSE=0.1793, Corr=0.8179&amp;quot;)
print(&amp;quot; beta2_mgwrfer: RMSE=0.1050, Corr=0.9407&amp;quot;)
print(&amp;quot; beta3_mgwrfer: RMSE=0.0724, Corr=nan&amp;quot;)
print(&amp;quot; beta4_mgwrfer: RMSE=0.1399, Corr=nan&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> beta1_mgwrfer: RMSE=0.1793, Corr=0.8179
beta2_mgwrfer: RMSE=0.1050, Corr=0.9407
beta3_mgwrfer: RMSE=0.0724, Corr=nan
beta4_mgwrfer: RMSE=0.1399, Corr=nan
&lt;/code>&lt;/pre>
&lt;p>The improvement for $\beta_1$ is dramatic: RMSE drops from 0.395 to 0.179 (a 54.6% reduction) and the correlation with true values jumps from 0.459 to 0.818. MGWRFER now captures the quadratic dome pattern instead of conflating it with the fixed effect. For the null coefficient $\beta_4$, RMSE drops from 0.253 to 0.140 (44.7% reduction) — much less false-positive contamination. However, $\beta_2$ and $\beta_3$ show modest RMSE increases (0.089 to 0.105, and 0.058 to 0.072). This is the &lt;strong>bias-variance tradeoff&lt;/strong> at work: the within-transformation reduces effective sample size (from raw observations to within-unit deviations), increasing estimation variance for coefficients that were already well-identified by pooled MGWR.&lt;/p>
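&lt;p>The RMSE and correlation figures reported in this tutorial can be reproduced with a small helper; the name &lt;code>recovery_metrics&lt;/code> is illustrative, not part of the script. It also makes the &lt;code>nan&lt;/code> behavior explicit: Pearson correlation is undefined whenever the true surface has zero variance.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np
def recovery_metrics(true_vals, est_vals):
    # Root-mean-square error between estimated and true surfaces
    rmse = float(np.sqrt(np.mean((est_vals - true_vals)**2)))
    # Pearson correlation is undefined for a zero-variance true surface
    if np.std(true_vals) == 0:
        return rmse, np.nan
    return rmse, float(np.corrcoef(true_vals, est_vals)[0, 1])
# A constant true surface (like beta_3 = 1.5) gives finite RMSE but nan corr
true_const = np.full(225, 1.5)
est = true_const + 0.07 * np.random.default_rng(5).standard_normal(225)
rmse, corr = recovery_metrics(true_const, est)
print(round(rmse, 3), corr)
&lt;/code>&lt;/pre>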
&lt;h2 id="6-comparing-coefficient-recovery">6. Comparing coefficient recovery&lt;/h2>
&lt;p>The scatter plots below compare true vs estimated coefficients for both approaches. In a perfect model, all points would lie on the 45-degree reference line.&lt;/p>
&lt;pre>&lt;code class="language-python"># Figure 2: True vs Pooled MGWR (3-panel scatter)
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
for ax, true_vals, est_vals, label in zip(axes, true_arrays, pooled_arrays, labels):
    ax.scatter(true_vals, est_vals, color=STEEL_BLUE, alpha=0.4, s=15)
    ax.plot(lims, lims, color=WARM_ORANGE, linewidth=2, linestyle=&amp;quot;--&amp;quot;)
    # ... annotation code ...
plt.savefig(&amp;quot;mgwrfer_bias_pooled.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="mgwrfer_bias_pooled.png" alt="True vs Pooled MGWR scatter plots for three coefficients. Beta_1 shows severe scatter away from the identity line (Corr=0.459), while beta_2 and beta_3 track more closely.">&lt;/p>
&lt;p>The pooled MGWR scatter reveals the damage: $\beta_1$ points are widely dispersed around the 45-degree line, with the model systematically overestimating some locations and underestimating others (Corr = 0.459). The quadratic dome shape is barely recovered. In contrast, $\beta_2$ hugs the reference line (Corr = 0.950) because its linear gradient is more easily separated from the exponential confounder.&lt;/p>
&lt;pre>&lt;code class="language-python"># Figure 3: True vs MGWRFER (3-panel scatter)
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
for ax, true_vals, est_vals, label in zip(axes, true_arrays, fe_arrays, labels):
    ax.scatter(true_vals, est_vals, color=TEAL, alpha=0.4, s=15)
    ax.plot(lims, lims, color=WARM_ORANGE, linewidth=2, linestyle=&amp;quot;--&amp;quot;)
    # ... annotation code ...
plt.savefig(&amp;quot;mgwrfer_recovery_fe.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="mgwrfer_recovery_fe.png" alt="True vs MGWRFER scatter plots. Beta_1 is now tightly clustered around the identity line (Corr=0.818), showing successful recovery of the quadratic dome pattern.">&lt;/p>
&lt;p>After fixed-effects correction, the $\beta_1$ scatter tightens dramatically — the correlation jumps from 0.459 to 0.818, and the quadratic dome structure is clearly visible as a tight band along the reference line. The tradeoff is visible in $\beta_2$ and $\beta_3$: slightly wider scatter (more variance) but still centered on the truth, indicating unbiased estimation with higher noise.&lt;/p>
&lt;h2 id="7-model-comparison">7. Model comparison&lt;/h2>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Metric&lt;/th>
&lt;th>Pooled MGWR&lt;/th>
&lt;th>MGWRFER&lt;/th>
&lt;th>Change&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>RMSE ($\beta_1$)&lt;/td>
&lt;td>0.3945&lt;/td>
&lt;td>0.1793&lt;/td>
&lt;td>-54.6%&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>RMSE ($\beta_2$)&lt;/td>
&lt;td>0.0888&lt;/td>
&lt;td>0.1050&lt;/td>
&lt;td>+18.2%&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>RMSE ($\beta_3$)&lt;/td>
&lt;td>0.0578&lt;/td>
&lt;td>0.0724&lt;/td>
&lt;td>+25.2%&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>RMSE ($\beta_4$)&lt;/td>
&lt;td>0.2531&lt;/td>
&lt;td>0.1399&lt;/td>
&lt;td>-44.7%&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Corr ($\beta_1$)&lt;/td>
&lt;td>0.4586&lt;/td>
&lt;td>0.8179&lt;/td>
&lt;td>+78%&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Corr ($\beta_2$)&lt;/td>
&lt;td>0.9504&lt;/td>
&lt;td>0.9407&lt;/td>
&lt;td>-1.0%&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>The pattern is clear: MGWRFER delivers its largest improvements precisely where pooled MGWR was most biased. For coefficients contaminated by the confounder ($\beta_1$ and $\beta_4$), RMSE drops 45-55%. For coefficients already well-estimated ($\beta_2$ and $\beta_3$), RMSE rises modestly (18-25%) but the absolute values remain small. This is a favorable tradeoff in practice — eliminating severe bias at the cost of slightly higher variance for already-precise estimates.&lt;/p>
&lt;h2 id="8-bandwidth-comparison">8. Bandwidth comparison&lt;/h2>
&lt;p>The bandwidths reveal &lt;em>how&lt;/em> the fixed-effects correction changes the spatial structure that MGWR detects.&lt;/p>
&lt;pre>&lt;code class="language-python">print(&amp;quot;Pooled MGWR bws (x1-x4): [50, 175, 223, 223]&amp;quot;)
print(&amp;quot;MGWRFER bws (x1-x4): [50, 91, 116, 62]&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> Pooled MGWR bws (x1-x4): [50, 175, 223, 223]
MGWRFER bws (x1-x4): [50, 91, 116, 62]
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="mgwrfer_bandwidth_comparison.png" alt="Grouped bar chart comparing pooled MGWR vs MGWRFER bandwidths for each covariate. MGWRFER uses consistently smaller bandwidths, especially for x4 which drops from 223 to 62.">&lt;/p>
&lt;p>MGWRFER selects smaller bandwidths for three of the four covariates. The most dramatic shift is x4 (null effect): the pooled model uses bandwidth 223 (nearly global, treating the coefficient as spatially constant), while MGWRFER uses 62. This happens because the pooled model&amp;rsquo;s x4 coefficient was absorbing the globally smooth confounder variation — requiring a large kernel to fit that smooth pattern. After demeaning removes the confounder, the remaining x4 variation is local noise best captured with a smaller kernel. Similarly, x2 drops from 175 to 91 and x3 from 223 to 116. Only x1 retains the same bandwidth (50 in both models) — its quadratic dome has a genuinely local structure that requires a small kernel regardless of whether the confounder is removed.&lt;/p>
&lt;h2 id="9-spatial-coefficient-maps">9. Spatial coefficient maps&lt;/h2>
&lt;p>The most convincing evidence comes from mapping the estimated surfaces alongside the known truth.&lt;/p>
&lt;pre>&lt;code class="language-python"># 2x3 grid: top row = true, bottom row = MGWRFER estimates
fig, axes = plt.subplots(2, 3, figsize=(16, 11))
# ... mapping code with shared colorbars ...
plt.savefig(&amp;quot;mgwrfer_coefficient_maps.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="mgwrfer_coefficient_maps.png" alt="Six-panel spatial map comparing true coefficients (top row) with MGWRFER estimates (bottom row) for beta_1, beta_2, and beta_3. The quadratic dome and linear gradient are visually recovered.">&lt;/p>
&lt;p>The MGWRFER estimated $\beta_1$ map (bottom-left) recovers the concentric dome pattern of the true coefficient (top-left), though with some smoothing at the edges. The $\beta_2$ linear gradient (bottom-center) matches the true gradient (top-center) with high fidelity. The $\beta_3$ map (bottom-right) shows mild spurious spatial variation around the true constant of 1.5 — this illustrates the variance cost of within-transformation for spatially homogeneous effects (RMSE = 0.072).&lt;/p>
&lt;h2 id="10-statistical-significance">10. Statistical significance&lt;/h2>
&lt;p>A key diagnostic for MGWRFER is whether it correctly identifies which coefficients are significant at each location. The significance maps below use filtered t-values (corrected for multiple testing across the 225 spatial units).&lt;/p>
&lt;pre>&lt;code class="language-python"># 2x2 significance maps
# Orange = significant positive, dark blue = not significant
plt.savefig(&amp;quot;mgwrfer_significance_maps.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="mgwrfer_significance_maps.png" alt="Significance maps for all four coefficients. Beta_1 through beta_3 are unanimously significant positive (all orange). Beta_4 correctly shows 202 of 225 units as not significant (dark blue), with a small false-positive cluster.">&lt;/p>
&lt;p>All 225 spatial units show statistically significant positive effects for $\beta_1$, $\beta_2$, and $\beta_3$ — consistent with the true DGP where all three are strictly positive everywhere. The critical test is $\beta_4$ (truly zero): 202 of 225 units (89.8%) are correctly classified as not significant, while 23 units (10.2%) show false positives. This false-positive rate, though above the nominal 5% level, is substantially better than what pooled MGWR would produce — where the inflated RMSE of 0.253 implies widespread spurious significance. The false positives are spatially concentrated in a small cluster, suggesting boundary effects or local multicollinearity rather than systematic bias.&lt;/p>
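&lt;p>As a simplified illustration of the filtering step, the sketch below applies a Bonferroni-style correction across the 225 local tests. The actual filtered t-values may use a different dependent-test adjustment, and the t-surface here is a random stand-in, not the fitted model&amp;rsquo;s output.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np
from scipy import stats
n_units = 225
alpha_level = 0.05
alpha_adj = alpha_level / n_units            # corrected per-test level
t_crit = stats.norm.ppf(1 - alpha_adj / 2)   # two-sided critical value, ~3.7
# Stand-in t-surface for a truly null coefficient
t_surface = np.random.default_rng(7).normal(0.0, 1.0, n_units)
n_sig = int((np.abs(t_surface) &amp;gt;= t_crit).sum())
print(round(t_crit, 2), n_sig)
&lt;/code>&lt;/pre>
&lt;p>With a null effect and this stringent threshold, almost no units should clear the bar; that makes it a useful sanity check before mapping significance.&lt;/p>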
&lt;h2 id="11-discussion">11. Discussion&lt;/h2>
&lt;p>Returning to our original question: &lt;strong>can we recover the true spatially varying coefficients when a strong, unobserved spatial confounder contaminates the data?&lt;/strong> The answer is a qualified yes.&lt;/p>
&lt;p>MGWRFER successfully eliminates the confounder&amp;rsquo;s influence on coefficient estimation. The most contaminated coefficient ($\beta_1$) goes from poorly recovered (Corr = 0.459) to well-recovered (Corr = 0.818). The null coefficient ($\beta_4$) goes from showing substantial false-positive bias (RMSE = 0.253) to being correctly identified as non-significant in 90% of locations. These improvements are not marginal — they represent the difference between misleading and informative inference.&lt;/p>
&lt;p>The tradeoff is real but manageable. Coefficients that were already well-estimated see modest RMSE increases (18-25%), because the within-transformation reduces effective sample size. A practitioner facing this tradeoff should ask: &amp;ldquo;Is the potential for confounding bias worse than a small increase in estimation variance?&amp;rdquo; In most applied settings — where unobserved spatial confounders are plausible but unmeasurable — the answer is yes. The bias from ignoring fixed effects is &lt;em>systematic&lt;/em> (it pushes estimates in the wrong direction), while the variance increase is &lt;em>random&lt;/em> (it widens confidence intervals without introducing directional error).&lt;/p>
&lt;p>The causal interpretation of MGWRFER coefficients requires the assumption of &lt;strong>no time-varying confounders&lt;/strong> — strict exogeneity conditional on the fixed effects. In real applications, this is stronger than it sounds: it rules out any unobserved factor that changes over time and is correlated with both the covariates and the outcome. Researchers should justify this assumption carefully, especially in settings with policy changes, structural breaks, or trending confounders.&lt;/p>
&lt;h2 id="12-summary-and-next-steps">12. Summary and next steps&lt;/h2>
&lt;p>&lt;strong>Key takeaways:&lt;/strong>&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Bias correction works:&lt;/strong> MGWRFER reduces RMSE by 55% for the most-biased coefficient ($\beta_1$: 0.395 to 0.179) and by 45% for the null effect ($\beta_4$: 0.253 to 0.140), demonstrating effective removal of time-invariant confounding.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Bias-variance tradeoff is favorable:&lt;/strong> The variance cost is modest — $\beta_2$ RMSE rises from 0.089 to 0.105, and $\beta_3$ from 0.058 to 0.072 — while the bias elimination is large. Systematic bias is worse than random variance in most applications.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Bandwidths reveal confounding structure:&lt;/strong> After demeaning, MGWRFER selects smaller bandwidths (x4: 223 to 62; x2: 175 to 91), indicating that the confounder was inflating spatial smoothness estimates. The true coefficient surfaces are more localized than the pooled model suggests.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>False-positive control improves:&lt;/strong> The null coefficient is correctly identified as non-significant in 90% of locations under MGWRFER, compared to the pooled model where RMSE of 0.253 would imply widespread false significance.&lt;/p>
&lt;/li>
&lt;/ol>
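&lt;p>The percentages in takeaways 1 and 2 can be reproduced directly from the RMSE values quoted above; the short check below uses only numbers already reported in this post.&lt;/p>
&lt;pre>&lt;code class="language-python"># Reproduce the takeaway percentages from the reported RMSE values.
pooled = {&amp;quot;b1&amp;quot;: 0.395, &amp;quot;b2&amp;quot;: 0.089, &amp;quot;b3&amp;quot;: 0.058, &amp;quot;b4&amp;quot;: 0.253}
mgwrfer = {&amp;quot;b1&amp;quot;: 0.179, &amp;quot;b2&amp;quot;: 0.105, &amp;quot;b3&amp;quot;: 0.072, &amp;quot;b4&amp;quot;: 0.140}
for k in pooled:
    change = (mgwrfer[k] / pooled[k] - 1) * 100
    print(f&amp;quot;{k}: {change:+.0f}%&amp;quot;)  # b1: -55%, b2: +18%, b3: +24%, b4: -45%
&lt;/code>&lt;/pre>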
&lt;p>&lt;strong>Limitations:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Only 3 time periods — more periods would improve within-estimator efficiency and reduce the false-positive rate&lt;/li>
&lt;li>The simulated confounder is time-invariant by construction; in practice, time-varying confounders remain a threat&lt;/li>
&lt;li>Computational cost: MGWR bandwidth selection scales poorly with N, limiting grid sizes&lt;/li>
&lt;li>The 15×15 grid (225 units) is small; results may differ quantitatively at larger scales&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Next steps:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Apply MGWRFER to real panel data (e.g., regional economic growth, housing prices, environmental exposure)&lt;/li>
&lt;li>Compare with alternative spatial panel methods (spatial lag/error with fixed effects)&lt;/li>
&lt;li>Explore the relationship between the number of time periods and the bias-variance tradeoff&lt;/li>
&lt;li>Extend to cases with spatially and temporally varying coefficients (GTWRFER)&lt;/li>
&lt;/ul>
&lt;h2 id="13-exercises">13. Exercises&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Increase time periods:&lt;/strong> Modify the DGP to use &lt;code>N_TIME = 10&lt;/code> instead of 3. How does the bias-variance tradeoff change? Does $\beta_2$&amp;rsquo;s RMSE still increase under MGWRFER, or does the larger effective sample size offset the variance cost?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Add a time-varying confounder:&lt;/strong> Create a variable $\gamma_t$ that changes over time and is correlated with $x_1$. Add it to the DGP as $y_{it} = \alpha_i + \gamma_t \cdot x_{1,it} + \ldots$. Does MGWRFER still improve coefficient recovery, or does the time-varying confounder break the within-transformation&amp;rsquo;s assumptions?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Real-world application:&lt;/strong> Download a panel dataset of regional economic indicators (e.g., from the World Bank or PySAL sample data). Apply MGWRFER and compare against pooled MGWR. What spatial patterns emerge in the coefficient maps that the pooled model misses?&lt;/p>
&lt;/li>
&lt;/ol>
&lt;h2 id="references">References&lt;/h2>
&lt;ol>
&lt;li>&lt;a href="https://github.com/GeoZhipengLi/MGWPR" target="_blank" rel="noopener">Li, Z., Fotheringham, A.S., Oshan, T., &amp;amp; Wolf, L.J. (2024). Multiscale Geographically Weighted Fixed Effects Regression.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1080/24694452.2017.1352480" target="_blank" rel="noopener">Fotheringham, A.S., Yang, W., &amp;amp; Kang, W. (2017). Multiscale Geographically Weighted Regression (MGWR). Annals of the American Association of Geographers, 107(6), 1247-1265.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.21105/joss.01823" target="_blank" rel="noopener">Oshan, T., Li, Z., Kang, W., Wolf, L.J., &amp;amp; Fotheringham, A.S. (2019). mgwr: A Python Implementation of Multiscale Geographically Weighted Regression. Journal of Open Source Software, 4(42), 1823.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/GeoZhipengLi/MGWPR" target="_blank" rel="noopener">GeoZhipengLi/MGWPR — Custom mgwr Package with Panel Data Support (GitHub)&lt;/a>&lt;/li>
&lt;/ol></description></item><item><title>Introduction to Panel Data Methods in Python</title><link>https://carlos-mendez.org/post/python_panel_intro/</link><pubDate>Mon, 27 Apr 2026 00:00:00 +0000</pubDate><guid>https://carlos-mendez.org/post/python_panel_intro/</guid><description>&lt;h2 id="1-overview">1. Overview&lt;/h2>
&lt;p>Imagine you have data on the same workers in two different years — 2010 and 2012 — and you want to know whether &lt;em>joining a union&lt;/em> raises a worker&amp;rsquo;s wage. A simple regression on the pooled data says yes, by about 7.5%. But that headline number hides a problem that has occupied econometricians for fifty years: workers who join unions are not the same as workers who don&amp;rsquo;t. Maybe they have less formal education, or they work in industries where unions are common, or they are older and have negotiated harder. If any of those &lt;em>unobserved&lt;/em> differences also affect wages, the 7.5% estimate is mixing the union effect with everything else that comes bundled with union status.&lt;/p>
&lt;p>This is the &lt;strong>omitted-variable bias&lt;/strong> problem, and panel data — repeated observations on the same units over time — gives us several ways to fight it. By comparing each worker to &lt;em>themselves&lt;/em> across years, we can strip out anything that is constant within a person (innate ability, gender, schooling, family background) and isolate the effect of switching union status. The price is a much smaller effective sample: only the workers who actually changed union status between 2010 and 2012 contribute to the estimate. The benefit is a coefficient that is much harder to dismiss as confounded.&lt;/p>
&lt;p>This tutorial walks through the seven canonical panel estimators on a real two-period wage panel: pooled OLS, between, first-differences, the within (fixed effects) estimator, two-way fixed effects, random effects, and Mundlak&amp;rsquo;s correlated random effects. Along the way we run the Hausman test and visualize what the &lt;em>within transformation&lt;/em> actually does to the data. The headline result will surprise some readers: once we account for unobserved worker traits, the union wage premium roughly &lt;em>triples&lt;/em> — from about 7% to about 21%.&lt;/p>
&lt;p>&lt;strong>Learning objectives:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Understand the difference between &lt;em>between&lt;/em> and &lt;em>within&lt;/em> variation in panel data, and why this distinction drives the choice of estimator.&lt;/li>
&lt;li>Implement seven panel-data estimators in Python using &lt;code>pyfixest&lt;/code> and &lt;code>linearmodels&lt;/code>, with one short code block per method.&lt;/li>
&lt;li>Visualize the within transformation and see geometrically why fixed effects produce a different slope than pooled OLS.&lt;/li>
&lt;li>Run the Hausman test to compare fixed and random effects, and use the Mundlak/CRE specification as the modern alternative.&lt;/li>
&lt;li>Interpret the factor-of-three gap between cross-sectional and within estimators in terms of selection on unobservables.&lt;/li>
&lt;/ul>
&lt;p>The diagram below summarizes the estimator family and how the two specification tests (Hausman and Mundlak) point you toward FE or RE based on the data.&lt;/p>
&lt;pre>&lt;code class="language-mermaid">flowchart TD
A[&amp;quot;Panel data y_it, x_it for i = 1..N, t = 1..T&amp;quot;]
A --&amp;gt; B{&amp;quot;What variation does the estimator use?&amp;quot;}
B --&amp;gt;|&amp;quot;All variation (ignores panel)&amp;quot;| POLS[&amp;quot;Pooled OLS&amp;quot;]
B --&amp;gt;|&amp;quot;Cross-sectional only&amp;quot;| BETW[&amp;quot;Between&amp;quot;]
B --&amp;gt;|&amp;quot;Within-individual only&amp;quot;| WITHIN[&amp;quot;FE / FDFE / DVFE / TWFE&amp;quot;]
B --&amp;gt;|&amp;quot;Weighted between + within&amp;quot;| RE[&amp;quot;Random Effects&amp;quot;]
WITHIN --&amp;gt; TEST{&amp;quot;Hausman test or Mundlak term&amp;quot;}
RE --&amp;gt; TEST
TEST --&amp;gt;|&amp;quot;Reject H0: RE inconsistent&amp;quot;| USE_FE[&amp;quot;Use FE (consistent)&amp;quot;]
TEST --&amp;gt;|&amp;quot;Fail to reject: RE plausible&amp;quot;| USE_RE[&amp;quot;Use RE (efficient)&amp;quot;]
WITHIN --&amp;gt; CRE[&amp;quot;CRE / Mundlak: bridges FE and RE&amp;quot;]
RE --&amp;gt; CRE
style POLS fill:#999999,stroke:#141413,color:#fff
style BETW fill:#8FB4D8,stroke:#141413,color:#141413
style WITHIN fill:#d97757,stroke:#141413,color:#fff
style RE fill:#00d4c8,stroke:#141413,color:#141413
style CRE fill:#c4623d,stroke:#141413,color:#fff
style USE_FE fill:#d97757,stroke:#141413,color:#fff
style USE_RE fill:#00d4c8,stroke:#141413,color:#141413
&lt;/code>&lt;/pre>
&lt;p>The diagram makes the central trade-off visible. Estimators on the left side (POLS, Between, RE) lean on cross-sectional variation — they answer &amp;ldquo;how do union and non-union workers compare?&amp;rdquo; Estimators on the right (FE, FDFE, DVFE, TWFE) lean on within-worker variation — they answer &amp;ldquo;what happens when &lt;em>the same worker&lt;/em> switches union status?&amp;rdquo; CRE/Mundlak sits in the middle and provides a single specification that recovers both. The Hausman test and the Mundlak term are formal tests for choosing between FE and RE; we will run both and they will agree.&lt;/p>
&lt;h2 id="2-setup-and-imports">2. Setup and imports&lt;/h2>
&lt;p>We use &lt;a href="https://pyfixest.org/" target="_blank" rel="noopener">&lt;code>pyfixest&lt;/code>&lt;/a> for OLS and absorbed fixed effects, &lt;a href="https://bashtage.github.io/linearmodels/panel/introduction.html" target="_blank" rel="noopener">&lt;code>linearmodels&lt;/code>&lt;/a> for the random-effects GLS estimator, and &lt;code>scipy.stats.chi2&lt;/code> for the Hausman test critical value. The standard &lt;code>pandas&lt;/code> / &lt;code>numpy&lt;/code> / &lt;code>matplotlib&lt;/code> stack handles data and figures.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import pyfixest as pf
import statsmodels.api as sm
from linearmodels.panel import RandomEffects
from scipy.stats import chi2
RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)
rng = np.random.default_rng(RANDOM_SEED)
&lt;/code>&lt;/pre>
&lt;p>The dark-theme &lt;code>plt.rcParams&lt;/code> block is in &lt;code>script.py&lt;/code> and is omitted here for brevity. All figures in this post use the site&amp;rsquo;s dark-navy palette.&lt;/p>
&lt;h2 id="3-data-loading">3. Data loading&lt;/h2>
&lt;p>We load a two-period wage panel from a Stata &lt;code>.dta&lt;/code> file: NLSY-style data on US workers observed in 2010, 2012, 2014, 2016, and 2018. For pedagogical clarity we restrict the analysis to &lt;strong>2010 and 2012 only&lt;/strong>, which makes T = 2 and gives us the cleanest possible illustration of the textbook result that first-differences and the within estimator are the same thing. Restricting to two years does not by itself make the panel balanced (a worker could appear in only one of them, or lose a row to missing data), so the check below verifies that every worker contributes exactly two observations.&lt;/p>
&lt;pre>&lt;code class="language-python">DATA_URL = &amp;quot;https://github.com/quarcs-lab/data-open/raw/master/isds/wage_panel_bob4.dta&amp;quot;
df_full = pd.read_stata(DATA_URL)
# Keep two periods so the FD = Within identity is visible.
df = df_full[df_full[&amp;quot;year&amp;quot;].isin([2010, 2012])].copy()
df = df.sort_values([&amp;quot;ID&amp;quot;, &amp;quot;year&amp;quot;]).reset_index(drop=True)
# Convert union &amp;quot;Yes/No&amp;quot; to 1/0; build a female dummy.
df[&amp;quot;union&amp;quot;] = df[&amp;quot;union&amp;quot;].map({&amp;quot;Yes&amp;quot;: 1, &amp;quot;No&amp;quot;: 0, 1: 1, 0: 0})
df[&amp;quot;female&amp;quot;] = (df[&amp;quot;gender&amp;quot;].astype(str).str.strip().str.lower() == &amp;quot;female&amp;quot;).astype(float)
# Drop rows with missing values in the variables we use.
df = df.dropna(subset=[&amp;quot;lwage&amp;quot;, &amp;quot;union&amp;quot;, &amp;quot;age&amp;quot;, &amp;quot;schooling&amp;quot;]).reset_index(drop=True)
&lt;/code>&lt;/pre>
&lt;p>The next block prints panel structure and descriptive statistics. The &amp;ldquo;balanced&amp;rdquo; check confirms every worker has exactly two observations, and the descriptive table tells us how spread out our key variables are.&lt;/p>
&lt;pre>&lt;code class="language-python">print(f&amp;quot;Individuals (N): {df['ID'].nunique()}&amp;quot;)
print(f&amp;quot;Time periods (T): {df['year'].nunique()}&amp;quot;)
print(f&amp;quot;Observations (N×T): {len(df)}&amp;quot;)
print(f&amp;quot;Balanced: {(df.groupby('ID')['year'].count() == df['year'].nunique()).all()}&amp;quot;)
print(df[[&amp;quot;lwage&amp;quot;, &amp;quot;union&amp;quot;, &amp;quot;age&amp;quot;, &amp;quot;schooling&amp;quot;]].describe().round(4))
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Individuals (N): 2199
Time periods (T): 2
Observations (N×T): 4398
Balanced: True
           lwage      union        age  schooling
count  4398.0000  4398.0000  4398.0000  4398.0000
mean      3.1061     0.1626    35.6794    14.5020
std       0.5982     0.3690     6.2576     2.1825
min      -1.7325     0.0000    25.0000     3.0000
max       6.0635     1.0000    49.0000    17.0000
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>Interpretation.&lt;/strong> The analysis sample is a perfectly balanced panel of 2,199 prime-age workers (mean age 35.7, range 25–49) observed in 2010 and 2012, for 4,398 worker-year observations. Only 16.3% of the sample is unionized in any given period (mean union = 0.1626), which means the dataset leans heavily on non-union workers — a relevant constraint for any estimator that uses cross-sectional variation. Mean log wage is 3.11 with a standard deviation of 0.60, and average schooling is 14.5 years. With balanced T = 2, the within and first-difference transformations are particularly clean because every individual contributes the same amount of within-variation: exactly one switch (or non-switch) per regressor.&lt;/p>
&lt;h2 id="4-between-vs-within-variance-how-much-do-panel-methods-have-to-work-with">4. Between vs within variance: how much do panel methods have to work with?&lt;/h2>
&lt;p>Before estimating anything, it helps to ask a diagnostic question: for each variable, how much variation comes from differences &lt;em>between&lt;/em> workers and how much from changes &lt;em>within&lt;/em> workers over time? Fixed-effects estimators only use the within part. If the within part is tiny, FE will be noisy no matter how large the sample is.&lt;/p>
&lt;p>The decomposition splits each variable&amp;rsquo;s variance into two pieces. The &lt;strong>between&lt;/strong> part is the variance of each worker&amp;rsquo;s two-year mean: $\mathrm{Var}(\bar{x}_i)$. The &lt;strong>within&lt;/strong> part is the variance of each observation around its own worker&amp;rsquo;s mean: $\mathrm{Var}(x_{it} - \bar{x}_i)$. Their sum is (approximately) the total variance.&lt;/p>
&lt;pre>&lt;code class="language-python">for var in [&amp;quot;lwage&amp;quot;, &amp;quot;union&amp;quot;, &amp;quot;age&amp;quot;, &amp;quot;schooling&amp;quot;]:
    overall_sd = df[var].std()
    between_sd = df.groupby(&amp;quot;ID&amp;quot;)[var].mean().std()
    within_sd = (df[var] - df.groupby(&amp;quot;ID&amp;quot;)[var].transform(&amp;quot;mean&amp;quot;)).std()
    between_pct = between_sd**2 / (between_sd**2 + within_sd**2) * 100
    print(f&amp;quot;{var:&amp;lt;10} overall {overall_sd:.4f} between {between_sd:.4f}&amp;quot;
          f&amp;quot; within {within_sd:.4f} between% {between_pct:.1f}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">lwage overall 0.5982 between 0.5570 within 0.2184 between% 86.7
union overall 0.3690 between 0.3576 within 0.0911 between% 93.9
age overall 6.2576 between 6.1755 within 1.0147 between% 97.4
schooling overall 2.1825 between 2.1827 within 0.0000 between% 100.0
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="panel_intro_variation.png" alt="Between vs within variance shares for the four key variables.">&lt;/p>
&lt;p>&lt;strong>Interpretation.&lt;/strong> Almost all of the variation in our variables is &lt;em>between&lt;/em> workers, not over time within a worker. Union status is 93.9% between and only 6.1% within — fixed-effects estimators have access to that thin 6% slice of total union variance. Schooling has zero within-variation (100% between) because nobody&amp;rsquo;s reported education changes between 2010 and 2012 in this sample, which is why FE will mechanically drop schooling from the regression. The big methodological consequence is that FE standard errors will be much larger than POLS standard errors, so the choice between FE and RE is not just a question of unbiasedness; it is also a question of statistical precision.&lt;/p>
&lt;h2 id="5-visualizing-the-panel-who-actually-changes-union-status">5. Visualizing the panel: who actually changes union status?&lt;/h2>
&lt;p>The variance decomposition tells us the within share is small. A spaghetti plot of individual log-wage trajectories makes the same point visually. We sample 30 random workers and color each line by the worker&amp;rsquo;s union pattern: orange if always union, blue if never union, and teal if union status changed between 2010 and 2012.&lt;/p>
&lt;pre>&lt;code class="language-python">sample_ids = rng.choice(df[&amp;quot;ID&amp;quot;].unique(), size=30, replace=False)
fig, ax = plt.subplots(figsize=(10, 6))
for pid in sample_ids:
    person = df[df[&amp;quot;ID&amp;quot;] == pid].sort_values(&amp;quot;year&amp;quot;)
    if person[&amp;quot;union&amp;quot;].nunique() &amp;gt; 1:
        ax.plot(person[&amp;quot;year&amp;quot;], person[&amp;quot;lwage&amp;quot;], &amp;quot;o-&amp;quot;, color=&amp;quot;#00d4c8&amp;quot;, lw=2)  # changer
    else:
        c = &amp;quot;#d97757&amp;quot; if person[&amp;quot;union&amp;quot;].iloc[0] == 1 else &amp;quot;#6a9bcc&amp;quot;
        ax.plot(person[&amp;quot;year&amp;quot;], person[&amp;quot;lwage&amp;quot;], &amp;quot;o-&amp;quot;, color=c, alpha=0.35)
plt.savefig(&amp;quot;panel_intro_trajectories.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="panel_intro_trajectories.png" alt="Individual wage trajectories for 30 sampled workers, colored by union-status pattern.">&lt;/p>
&lt;p>&lt;strong>Interpretation.&lt;/strong> Most of the lines are flat-colored (blue or orange): workers who are &lt;em>always&lt;/em> or &lt;em>never&lt;/em> in a union over the two-year window. Only the teal lines — the ones that change union status — provide identifying information for fixed effects, first-differences, and Mundlak/CRE. If you squint at the figure and ignore the teal lines, you have effectively run a between estimator. If you ignore everything except the teal lines, you have run fixed effects. The post&amp;rsquo;s central tension between cross-sectional and within methods is a question of which lines you choose to read.&lt;/p>
&lt;h2 id="6-pooled-ols-the-naive-baseline">6. Pooled OLS: the naive baseline&lt;/h2>
&lt;p>We start with the simplest possible estimator: regress log wages on union membership, treating every worker-year as if it were an independent observation. This is &lt;strong>pooled OLS&lt;/strong> (POLS). It ignores the panel structure entirely.&lt;/p>
&lt;pre>&lt;code class="language-python"># Stata: reg lwage union, robust
fit_pols = pf.feols(&amp;quot;lwage ~ union&amp;quot;, data=df, vcov=&amp;quot;HC1&amp;quot;)
pols_coef = fit_pols.coef()[&amp;quot;union&amp;quot;]
pols_se = fit_pols.se()[&amp;quot;union&amp;quot;]
print(f&amp;quot;Union coefficient: {pols_coef:.4f} (SE {pols_se:.4f})&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Union coefficient: 0.0750 (SE 0.0231)
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>Interpretation.&lt;/strong> Pooled OLS reports a union wage premium of 7.5 log points (SE 2.3 percentage points), which is highly significant by conventional standards (t ≈ 3.25). This is the textbook cross-sectional answer and the number a naive analyst would report. It is almost certainly biased: if higher-ability workers select &lt;em>out of&lt;/em> unionized jobs (a common pattern in this dataset), then POLS confounds the union effect with whatever ability does to wages. The rest of the post is essentially a tour through different ways of subtracting the bias out.&lt;/p>
&lt;h2 id="7-between-estimator-the-cross-sectional-benchmark">7. Between estimator: the cross-sectional benchmark&lt;/h2>
&lt;p>The &lt;strong>between estimator&lt;/strong> takes POLS to its logical extreme: collapse each worker to their two-year mean, then run OLS across workers. This uses &lt;em>only&lt;/em> between-individual variation — the mirror image of fixed effects — and gives us a clean reference point for what a purely cross-sectional answer looks like.&lt;/p>
&lt;pre>&lt;code class="language-python"># Stata: xtreg lwage union, be
df_between = df.groupby(&amp;quot;ID&amp;quot;)[[&amp;quot;lwage&amp;quot;, &amp;quot;union&amp;quot;]].mean().reset_index()
fit_between = pf.feols(&amp;quot;lwage ~ union&amp;quot;, data=df_between, vcov=&amp;quot;HC1&amp;quot;)
between_coef = fit_between.coef()[&amp;quot;union&amp;quot;]
between_se = fit_between.se()[&amp;quot;union&amp;quot;]
print(f&amp;quot;Union coefficient: {between_coef:.4f} (SE {between_se:.4f})&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Union coefficient: 0.0662 (SE 0.0311)
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>Interpretation.&lt;/strong> Collapsing the panel to 2,199 individual averages and running OLS gives 6.6 log points (SE 3.1) — the cross-sectional union effect with all within-individual variation explicitly thrown away. Notice how close this is to POLS (0.066 vs 0.075): that is exactly what we should expect, because 94% of union variance is between-worker, so POLS and Between are looking at almost the same picture from slightly different angles. Both share the same identification problem and serve as the &lt;em>pre-FE benchmarks&lt;/em> against which the within-style estimators will diverge sharply in the next sections.&lt;/p>
&lt;h2 id="8-first-differences-subtracting-the-past-from-the-present">8. First-differences: subtracting the past from the present&lt;/h2>
&lt;p>The first within-style estimator we will see is &lt;strong>first-differences&lt;/strong> (FDFE). The idea is to subtract each worker&amp;rsquo;s 2010 values from their 2012 values; any time-invariant trait (ability, schooling, family background) cancels out in the subtraction. We are left with a regression of $\Delta\mathrm{lwage}$ on $\Delta\mathrm{union}$, identified entirely from the workers who &lt;em>changed&lt;/em> union status.&lt;/p>
&lt;p>Formally, write the panel model as&lt;/p>
&lt;p>$$y_{it} = \alpha_i + \beta x_{it} + u_{it}$$&lt;/p>
&lt;p>where $\alpha_i$ is the worker-specific (unobserved) effect. Differencing across the two periods gives&lt;/p>
&lt;p>$$y_{i,2012} - y_{i,2010} = \beta (x_{i,2012} - x_{i,2010}) + (u_{i,2012} - u_{i,2010})$$&lt;/p>
&lt;p>In words, this says: the change in wages between 2010 and 2012 equals $\beta$ times the change in union status, plus a noise term. The worker-specific $\alpha_i$ has vanished. Mapping to code: $y$ is the &lt;code>lwage&lt;/code> column, $x$ is &lt;code>union&lt;/code>, $\alpha_i$ is whatever is unique about each worker&amp;rsquo;s &lt;code>ID&lt;/code>, and $\beta$ is the parameter we want to estimate.&lt;/p>
&lt;pre>&lt;code class="language-python"># Stata: bysort ID: gen d_lwage = lwage - L.lwage; reg d_lwage d_union, robust
df_diff = (df.sort_values([&amp;quot;ID&amp;quot;, &amp;quot;year&amp;quot;])
           .groupby(&amp;quot;ID&amp;quot;)[[&amp;quot;lwage&amp;quot;, &amp;quot;union&amp;quot;]].diff().dropna())
df_diff.columns = [&amp;quot;d_lwage&amp;quot;, &amp;quot;d_union&amp;quot;]
fit_fdfe = pf.feols(&amp;quot;d_lwage ~ d_union&amp;quot;, data=df_diff, vcov=&amp;quot;HC1&amp;quot;)
fdfe_coef = fit_fdfe.coef()[&amp;quot;d_union&amp;quot;]
fdfe_se = fit_fdfe.se()[&amp;quot;d_union&amp;quot;]
print(f&amp;quot;Union coefficient: {fdfe_coef:.4f} (SE {fdfe_se:.4f})&amp;quot;)
print(f&amp;quot;Differenced sample: {len(df_diff)} rows (one per worker since T=2).&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Union coefficient: 0.2113 (SE 0.0792)
Differenced sample: 2199 rows (one per worker since T=2).
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>Interpretation.&lt;/strong> The first-difference estimator returns 21.1 log points (SE 7.9), with a 95% confidence interval of roughly [0.06, 0.37]. The point estimate is &lt;em>almost three times larger&lt;/em> than POLS (0.211 vs 0.075), and the standard error is about 3.4× larger — the classic signature of moving from a cross-sectional design to a switcher-only design. The CI is wide but excludes zero, so the upward revision is statistically detectable. The intuition: workers who switch into unions are not the same as workers who are always in unions, so the within-worker effect is a different — and arguably cleaner — parameter than the cross-sectional comparison.&lt;/p>
&lt;h2 id="9-within--fixed-effects-the-same-idea-run-differently">9. Within / Fixed effects: the same idea, run differently&lt;/h2>
&lt;p>The &lt;strong>within estimator&lt;/strong> (also called fixed effects, FE) achieves the same goal as first-differences through a different transformation: it subtracts each worker&amp;rsquo;s &lt;em>mean&lt;/em> from each observation. Every variable becomes $\tilde{x}_{it} = x_{it} - \bar{x}_i$. After this &lt;em>within transformation&lt;/em>, OLS on the demeaned data delivers the FE coefficient. Modern software (&lt;code>pyfixest&lt;/code> here, &lt;code>reghdfe&lt;/code> in Stata) hides the demeaning step and just lets us write &lt;code>lwage ~ union | ID&lt;/code>, where the &lt;code>| ID&lt;/code> syntax means &amp;ldquo;absorb individual fixed effects&amp;rdquo;.&lt;/p>
&lt;pre>&lt;code class="language-python"># Manual demeaning — pedagogical, makes the within transformation visible.
df[&amp;quot;lwage_demean&amp;quot;] = df[&amp;quot;lwage&amp;quot;] - df.groupby(&amp;quot;ID&amp;quot;)[&amp;quot;lwage&amp;quot;].transform(&amp;quot;mean&amp;quot;)
df[&amp;quot;union_demean&amp;quot;] = df[&amp;quot;union&amp;quot;] - df.groupby(&amp;quot;ID&amp;quot;)[&amp;quot;union&amp;quot;].transform(&amp;quot;mean&amp;quot;)
# Stata: xtreg lwage union, fe robust (or) reghdfe lwage union, absorb(ID)
fit_fe = pf.feols(&amp;quot;lwage ~ union | ID&amp;quot;, data=df, vcov=&amp;quot;HC1&amp;quot;)
fe_coef = fit_fe.coef()[&amp;quot;union&amp;quot;]
fe_se = fit_fe.se()[&amp;quot;union&amp;quot;]
print(f&amp;quot;Union coefficient: {fe_coef:.4f} (SE {fe_se:.4f})&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Union coefficient: 0.2103 (SE 0.0812)
&lt;/code>&lt;/pre>
&lt;p>The figure below visualizes what the demeaning actually does. The left panel shows the raw data — union (jittered for visibility) on the x-axis, log wage on the y-axis, and a POLS regression line through the cloud. The right panel shows the same observations after subtracting each worker&amp;rsquo;s mean from both variables; the FE regression line goes through the demeaned cloud and through the origin.&lt;/p>
&lt;p>&lt;img src="panel_intro_demeaning.png" alt="Within transformation: raw scatter on the left (POLS slope), demeaned scatter on the right (FE slope).">&lt;/p>
&lt;p>&lt;strong>Interpretation.&lt;/strong> The two panels look almost like different datasets, but they come from the &lt;em>same&lt;/em> observations. On the left (raw data), the POLS slope is ≈ 0.08: with a binary regressor, that slope is just the gap between the union and non-union groups&amp;rsquo; mean wages, which are close together. On the right (demeaned data), the FE slope is ≈ 0.21, identified only by the workers who actually changed union status — those are the points that move off the origin. The visual makes geometrically clear what the variance decomposition told us numerically: the within slope is steeper because we are no longer comparing &lt;em>across&lt;/em> workers (where ability and schooling confound the picture); we are comparing each worker to themselves.&lt;/p>
&lt;p>The FE coefficient of 0.2103 is essentially identical to FDFE (0.2113). The tiny gap of +0.001 comes from the fact that our FD regression includes an intercept (which absorbs an aggregate time trend), while plain FE does not. Adding a year fixed effect to FE — that&amp;rsquo;s two-way FE in the next section — absorbs the same trend; with T = 2, FD-with-intercept and two-way FE coincide exactly when they use the same regressors (our TWFE specification below also controls for age, so its estimate differs slightly).&lt;/p>
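&lt;p>These T = 2 identities can be verified on a synthetic two-period panel. The sketch below uses a hypothetical DGP (not the wage data): first-differences &lt;em>without&lt;/em> an intercept reproduces one-way FE exactly, and first-differences &lt;em>with&lt;/em> an intercept reproduces two-way FE, because with two periods the FD intercept and the year effect are the same parameter.&lt;/p>
&lt;pre>&lt;code class="language-python"># Synthetic check of the T = 2 identities (hypothetical DGP, not the wage data):
# FD through the origin equals one-way FE; FD with an intercept equals TWFE.
import numpy as np

rng = np.random.default_rng(0)
n, beta, delta = 500, 0.2, 0.05           # true slope and period-2 time effect
alpha = rng.normal(size=n)                # time-invariant worker effects
x1 = rng.binomial(1, 0.2, size=n).astype(float)
x2 = rng.binomial(1, 0.2, size=n).astype(float)
y1 = alpha + beta * x1 + rng.normal(scale=0.1, size=n)
y2 = alpha + delta + beta * x2 + rng.normal(scale=0.1, size=n)
dx, dy = x2 - x1, y2 - y1

fd_origin = dx @ dy / (dx @ dx)                      # FD without intercept
fd_const = np.cov(dx, dy, ddof=0)[0, 1] / dx.var()   # FD with intercept

# One-way FE: demean each worker, then take the slope of the demeaned cloud.
xw = np.concatenate([x1, x2]) - np.tile((x1 + x2) / 2, 2)
yw = np.concatenate([y1, y2]) - np.tile((y1 + y2) / 2, 2)
fe = xw @ yw / (xw @ xw)

# TWFE: additionally remove each period mean from the demeaned data.
xt = xw - np.repeat([xw[:n].mean(), xw[n:].mean()], n)
yt = yw - np.repeat([yw[:n].mean(), yw[n:].mean()], n)
twfe = xt @ yt / (xt @ xt)

print(np.isclose(fd_origin, fe), np.isclose(fd_const, twfe))  # True True
&lt;/code>&lt;/pre>
&lt;p>Both comparisons hold to floating-point precision, which is the algebra of Section 8 restated: with two periods, differencing and demeaning are the same linear transformation up to a factor of two.&lt;/p>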
&lt;p>A small numerical aside: the &lt;strong>dummy-variable&lt;/strong> version of FE gives the same answer.&lt;/p>
&lt;pre>&lt;code class="language-python">df[&amp;quot;ID_str&amp;quot;] = df[&amp;quot;ID&amp;quot;].astype(str)
fit_dvfe = pf.feols(&amp;quot;lwage ~ union + C(ID_str)&amp;quot;, data=df, vcov=&amp;quot;HC1&amp;quot;)
print(f&amp;quot;DVFE coefficient: {fit_dvfe.coef()['union']:.4f}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">DVFE coefficient: 0.2103
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>Interpretation.&lt;/strong> Including a dummy for every worker (N − 1 = 2,198 dummies in this sample) recovers the FE coefficient exactly: 0.2103. The within transformation, first-differences, and dummy-variable FE are three recipes for the same dish. The reason modern software prefers absorption (&lt;code>| ID&lt;/code>) over dummies is purely computational: with N = 2,199 dummies it still runs fast, but at N = 100,000 the dummy specification becomes prohibitive while absorbed FE remains trivial.&lt;/p>
&lt;h2 id="10-two-way-fixed-effects-closing-the-fdfe-gap">10. Two-way fixed effects: closing the FD–FE gap&lt;/h2>
&lt;p>&lt;strong>Two-way fixed effects&lt;/strong> (TWFE) absorbs both individual and time effects. We let &lt;code>pyfixest&lt;/code> handle both with &lt;code>| ID + year&lt;/code>. This is the workhorse specification of applied micro and DID research.&lt;/p>
&lt;pre>&lt;code class="language-python"># Stata: reghdfe lwage union age, absorb(ID year) vce(cluster ID)
fit_twfe = pf.feols(&amp;quot;lwage ~ union + age | ID + year&amp;quot;, data=df, vcov={&amp;quot;CRV1&amp;quot;: &amp;quot;ID&amp;quot;})
twfe_coef = fit_twfe.coef()[&amp;quot;union&amp;quot;]
twfe_se = fit_twfe.se()[&amp;quot;union&amp;quot;]
print(f&amp;quot;Union coefficient: {twfe_coef:.4f} (SE {twfe_se:.4f})&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Union coefficient: 0.2129 (SE 0.0793)
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>Interpretation.&lt;/strong> TWFE returns 21.3 log points (SE 7.9), almost indistinguishable from FE (0.210). The small gap relative to FE (about +0.003) reflects the absorbed year effects plus the added age control: by absorbing year effects we mechanically remove the aggregate wage trend that FD&amp;rsquo;s intercept was capturing. Schooling, gender, and any other time-invariant regressor would be silently absorbed by the individual fixed effects — you cannot identify the effect of something that does not change within a worker. This is a structural feature of within-style methods, not a coding error, and is one of the main reasons applied researchers reach for CRE/Mundlak when they want both within identification &lt;em>and&lt;/em> coefficients on time-invariant variables.&lt;/p>
&lt;h2 id="11-random-effects-betting-on-the-no-correlation-assumption">11. Random effects: betting on the no-correlation assumption&lt;/h2>
&lt;p>The &lt;strong>random-effects&lt;/strong> (RE) estimator takes a different stance: it treats the worker effect $\alpha_i$ as a &lt;em>random&lt;/em> draw from a population, &lt;em>uncorrelated with the regressors&lt;/em>. If that assumption holds, RE is more efficient than FE because it uses both within and between variation. If the assumption fails, RE is biased.&lt;/p>
&lt;p>Two pieces of vocabulary that the rest of this section relies on. First, RE is fit by &lt;em>generalized least squares&lt;/em> (GLS) — a weighted regression that downweights observations whose individual effect is harder to learn from, which is what lets RE blend between- and within-variation in the right proportions. Second, an estimator is &lt;em>consistent&lt;/em> if its bias shrinks toward zero as the sample grows; an &lt;em>inconsistent&lt;/em> estimator stays biased no matter how much data you collect. RE is consistent under the no-correlation assumption; FE is consistent under weaker assumptions and is therefore the safer default whenever the no-correlation assumption is suspect.&lt;/p>
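&lt;p>The GLS machinery has a concrete textbook representation called &lt;em>quasi-demeaning&lt;/em>: RE is OLS on $y_{it} - \theta \bar{y}_i$, where $\theta$ depends on the idiosyncratic error variance relative to the worker-effect variance. At $\theta = 0$ RE collapses to pooled OLS; as $\theta \to 1$ it approaches FE. A sketch with hypothetical variance components (not estimated from this dataset):&lt;/p>

```python
import numpy as np

def theta(s2_u, s2_a, T):
    """Quasi-demeaning weight:
    1 - sqrt(var_idiosyncratic / (var_idiosyncratic + T * var_worker_effect))."""
    return 1 - np.sqrt(s2_u / (s2_u + T * s2_a))

# Hypothetical variance components, T = 2 as in our panel
print(theta(1.0, 0.0, 2))    # no worker effect        -> 0.00 (pooled OLS)
print(theta(1.0, 1.0, 2))    # equal variances         -> ~0.42
print(theta(1.0, 100.0, 2))  # worker effect dominates -> ~0.93 (close to FE)
```

The larger the worker-effect variance relative to the noise, the closer RE's answer sits to FE; this is the sense in which GLS "blends" the two sources of variation.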
&lt;pre>&lt;code class="language-python"># Stata: xtreg lwage union, re robust
df_re = df.set_index([&amp;quot;ID&amp;quot;, &amp;quot;year&amp;quot;])
exog = sm.add_constant(df_re[[&amp;quot;union&amp;quot;]])
fit_re = RandomEffects(df_re[&amp;quot;lwage&amp;quot;], exog).fit(cov_type=&amp;quot;robust&amp;quot;)
re_coef = fit_re.params[&amp;quot;union&amp;quot;]
re_se = fit_re.std_errors[&amp;quot;union&amp;quot;]
print(f&amp;quot;Union coefficient: {re_coef:.4f} (SE {re_se:.4f})&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Union coefficient: 0.1092 (SE 0.0299)
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>Interpretation.&lt;/strong> RE returns 10.9 log points (SE 3.0), which sits squarely between POLS (0.075) and FE (0.210). RE is mathematically a &lt;em>weighted average&lt;/em> of the between and within estimators, with the weights determined by their relative variances. Because our data has very thin within variation in union status (only 9% of total), RE leans heavily toward the between picture and lands much closer to POLS than to FE. The RE standard error (0.030) is a striking 2.7× tighter than FE&amp;rsquo;s (0.081), but that efficiency is real only if individual effects are uncorrelated with union membership. If union-status selection is correlated with unobserved ability — and the gap between FE and POLS strongly suggests it is — that precision is being purchased with bias.&lt;/p>
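&lt;p>The weighted-average claim can be checked with back-of-envelope arithmetic. Treating the RE estimate as a scalar convex combination $w \cdot \hat{\beta}_{\mathrm{Between}} + (1 - w) \cdot \hat{\beta}_{\mathrm{FE}}$ (a heuristic: the exact GLS weighting is matrix-valued), the implied weight on the between estimator works out to about 0.70:&lt;/p>

```python
# Coefficients from the sections above; the scalar decomposition is a heuristic
be, fe, re_ = 0.0662, 0.2103, 0.1092
w = (re_ - fe) / (be - fe)
print(f"implied between weight: {w:.2f}")  # RE leans heavily toward between
```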
&lt;h2 id="12-the-hausman-test-fe-or-re">12. The Hausman test: FE or RE?&lt;/h2>
&lt;p>The classic specification test for FE-vs-RE is due to &lt;strong>Hausman (1978)&lt;/strong>. The intuition: if both estimators are consistent (the RE assumption holds), they should give similar answers; if they differ a lot, the RE assumption is suspect and FE is preferred. Formally,&lt;/p>
&lt;p>$$H = (\hat{\beta}_{\mathrm{FE}} - \hat{\beta}_{\mathrm{RE}})' [V_{\mathrm{FE}} - V_{\mathrm{RE}}]^{-1} (\hat{\beta}_{\mathrm{FE}} - \hat{\beta}_{\mathrm{RE}}) \sim \chi^2(k)$$&lt;/p>
&lt;p>In words, this says: take the difference between the two coefficient vectors, weight it by the inverse of the difference of the two variance matrices, and compare the resulting quadratic form to a chi-square distribution with degrees of freedom equal to the number of regressors. A large $H$ (small p-value) rejects the null that RE is consistent. Mapping to code: $\hat{\beta}_{\mathrm{FE}}$ is &lt;code>fe_coef&lt;/code>, $\hat{\beta}_{\mathrm{RE}}$ is &lt;code>re_coef&lt;/code>, and $V_{\mathrm{FE}}$ and $V_{\mathrm{RE}}$ are the squared standard errors (since we have a single regressor here, both reduce to scalars).&lt;/p>
&lt;pre>&lt;code class="language-python">b_diff = np.array([fe_coef - re_coef])
v_diff = np.array([[fe_se ** 2 - re_se ** 2]])
H = float(b_diff @ np.linalg.pinv(v_diff) @ b_diff)
p_h = 1 - chi2.cdf(H, df=1)
print(f&amp;quot;H statistic: {H:.4f} p-value = {p_h:.4f}&amp;quot;)
print(f&amp;quot;β_FE − β_RE = {b_diff[0]:+.4f}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">H statistic: 1.7941 p-value = 0.1804
β_FE − β_RE = +0.1011
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>Interpretation.&lt;/strong> The two estimators differ by about 0.101 log points; the test statistic is 1.79 on 1 degree of freedom, giving a p-value of 0.180. Conventionally, since 0.180 &amp;gt; 0.05, we &lt;em>fail to reject&lt;/em> the null and conclude that RE is acceptable. But take this verdict with a grain of salt: the Hausman test has low power exactly when within variation is thin, which is the case here (only 9% within share for union). A noisy FE estimate inflates $V_{\mathrm{FE}}$ in the denominator and shrinks $H$, making non-rejection mechanical rather than substantive. We will see in the next section that the modern Mundlak alternative gives a borderline-significant signal in the same data.&lt;/p>
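&lt;p>The low-power mechanics can be illustrated directly: hold the coefficient gap fixed at the observed 0.1011 and vary only the FE standard error. The alternative SE values below are hypothetical; for one degree of freedom the chi-square upper tail equals $\operatorname{erfc}(\sqrt{H/2})$, so the sketch needs only the standard library:&lt;/p>

```python
import math

b_gap, re_se = 0.1011, 0.0299  # observed gap and RE standard error
for fe_se in (0.04, 0.0812, 0.12):  # middle value is the actual FE SE
    H = b_gap ** 2 / (fe_se ** 2 - re_se ** 2)
    p = math.erfc(math.sqrt(H / 2))  # chi-square(1) upper-tail probability
    print(f"FE SE = {fe_se:.4f}: H = {H:6.2f}, p = {p:.4f}")
```

With an FE SE of 0.04 the same gap would be decisively significant; at the actual 0.0812 the statistic reproduces the H ≈ 1.79, p ≈ 0.18 verdict above. A noisier FE estimate deflates $H$ purely mechanically.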
&lt;h2 id="13-correlated-random-effects-cre--mundlak-the-modern-bridge">13. Correlated random effects (CRE / Mundlak): the modern bridge&lt;/h2>
&lt;p>&lt;strong>Mundlak (1978)&lt;/strong> proposed a clever specification that bridges FE and RE. The idea: include each worker&amp;rsquo;s &lt;em>mean&lt;/em> of every time-varying regressor as an additional control, then run RE.&lt;/p>
&lt;p>$$y_{it} = \alpha + \beta x_{it} + \gamma \bar{x}_i + u_{it}$$&lt;/p>
&lt;p>In words, this says: model wages as a function of current union status, &lt;em>plus&lt;/em> the worker&amp;rsquo;s average union exposure across the panel. The coefficient $\beta$ on the time-varying $x_{it}$ captures the &lt;em>within&lt;/em> effect — and Mundlak proved that under standard assumptions it is numerically identical to the FE coefficient. The coefficient $\gamma$ on the worker mean $\bar{x}_i$ captures the &lt;em>between&lt;/em> effect of selection. If $\gamma \neq 0$, individual effects are correlated with union status and FE is preferred over RE. Mapping to code: $\beta$ is &lt;code>cre_coef&lt;/code>, $\gamma$ is &lt;code>mundlak_coef&lt;/code>, and $\bar{x}_i$ is the &lt;code>union_bar&lt;/code> column we constructed with &lt;code>df.groupby(&amp;quot;ID&amp;quot;)[&amp;quot;union&amp;quot;].transform(&amp;quot;mean&amp;quot;)&lt;/code>.&lt;/p>
&lt;pre>&lt;code class="language-python"># Stata: bysort ID: egen union_bar = mean(union); xtreg lwage union union_bar, re robust
df[&amp;quot;union_bar&amp;quot;] = df.groupby(&amp;quot;ID&amp;quot;)[&amp;quot;union&amp;quot;].transform(&amp;quot;mean&amp;quot;)
df_cre = df.set_index([&amp;quot;ID&amp;quot;, &amp;quot;year&amp;quot;])
exog_cre = sm.add_constant(df_cre[[&amp;quot;union&amp;quot;, &amp;quot;union_bar&amp;quot;]])
fit_cre = RandomEffects(df_cre[&amp;quot;lwage&amp;quot;], exog_cre).fit(cov_type=&amp;quot;robust&amp;quot;)
cre_coef = fit_cre.params[&amp;quot;union&amp;quot;]
cre_se = fit_cre.std_errors[&amp;quot;union&amp;quot;]
mundlak_coef = fit_cre.params[&amp;quot;union_bar&amp;quot;]
mundlak_p = fit_cre.pvalues[&amp;quot;union_bar&amp;quot;]
print(f&amp;quot;Union (within) coefficient: {cre_coef:.4f} (SE {cre_se:.4f})&amp;quot;)
print(f&amp;quot;Mundlak term (union_bar): {mundlak_coef:+.4f} (p = {mundlak_p:.4f})&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Union (within) coefficient: 0.2103 (SE 0.0703)
Mundlak term (union_bar): -0.1441 (p = 0.0717)
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>Interpretation.&lt;/strong> The CRE union coefficient is 0.2103 — &lt;em>exactly&lt;/em> the FE estimate to four decimal places, just as Mundlak&amp;rsquo;s algebraic result predicts. The Mundlak term is −0.1441 with a p-value of 0.072, marginally non-significant at the 5% level but suggestive: workers with higher &lt;em>average&lt;/em> union exposure tend to have lower wages even after conditioning on within-worker changes, which is consistent with negative selection into unionized jobs (lower-wage workers select into unions, perhaps because the union premium matters more for them). The Mundlak signal points the same direction as the Hausman test but reaches the borderline-significant zone because it does not have to fight the same noise penalty.&lt;/p>
&lt;h2 id="14-putting-it-all-together-the-method-comparison">14. Putting it all together: the method comparison&lt;/h2>
&lt;p>The figure below stacks all six basic estimators on a single chart with 95% confidence intervals.&lt;/p>
&lt;p>&lt;img src="panel_intro_coef_comparison.png" alt="Six panel-data estimators with 95% confidence intervals. The Hausman χ² and p-value are annotated.">&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Method&lt;/th>
&lt;th>Coef&lt;/th>
&lt;th>SE&lt;/th>
&lt;th>What variation does it use?&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>POLS&lt;/td>
&lt;td>0.0750&lt;/td>
&lt;td>0.0231&lt;/td>
&lt;td>All — ignores panel structure&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Between&lt;/td>
&lt;td>0.0662&lt;/td>
&lt;td>0.0311&lt;/td>
&lt;td>Cross-sectional means only&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>FDFE&lt;/td>
&lt;td>0.2113&lt;/td>
&lt;td>0.0792&lt;/td>
&lt;td>Within-individual differences&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>FE&lt;/td>
&lt;td>0.2103&lt;/td>
&lt;td>0.0812&lt;/td>
&lt;td>Within-individual demeaned&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>RE&lt;/td>
&lt;td>0.1092&lt;/td>
&lt;td>0.0299&lt;/td>
&lt;td>GLS-weighted between + within&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>CRE&lt;/td>
&lt;td>0.2103&lt;/td>
&lt;td>0.0703&lt;/td>
&lt;td>RE with Mundlak terms (= FE within)&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>&lt;strong>Interpretation.&lt;/strong> The six methods cluster into two clear camps. The cross-sectional methods (POLS 0.075, Between 0.066, RE 0.109) report a union premium of 7–11 log points; the within methods (FDFE 0.211, FE 0.210, CRE 0.210) report 21 log points. The factor-of-three gap is the central pedagogical finding of this dataset and is consistent with a story in which unobserved worker ability correlates &lt;em>negatively&lt;/em> with union status — workers who are higher-ability are less likely to be in unions in this sample, so cross-sectional comparisons understate the within-worker payoff to &lt;em>joining&lt;/em> a union. Standard errors swing inversely: cross-sectional methods are 2–3× more precise but identify a different (and biased, under our hypothesis) parameter, while within methods are noisier but causally cleaner under weaker assumptions.&lt;/p>
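&lt;p>The confidence intervals in the figure can be reproduced from the table with the usual normal approximation (coef ± 1.96 × SE), which puts the precision trade-off into explicit numbers:&lt;/p>

```python
# Coefficients and SEs copied from the comparison table above
results = {
    "POLS": (0.0750, 0.0231), "Between": (0.0662, 0.0311),
    "FDFE": (0.2113, 0.0792), "FE": (0.2103, 0.0812),
    "RE": (0.1092, 0.0299), "CRE": (0.2103, 0.0703),
}
for name, (b, se) in results.items():
    lo, hi = b - 1.96 * se, b + 1.96 * se
    print(f"{name:8s} {b:.4f}  95% CI [{lo:+.3f}, {hi:+.3f}]")
```

Note that the within-camp intervals are wide enough to reach down past the cross-sectional point estimates, so the two camps are distinguished by their identification strategy, not by clean statistical separation.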
&lt;h2 id="15-adding-controls-the-extended-models">15. Adding controls: the extended models&lt;/h2>
&lt;p>Real applications usually include controls. We re-run POLS, TWFE, RE, and CRE with age, schooling, female, and year dummies on the right-hand side. The next code block stitches the four specifications together; the table below summarizes the union, age, schooling, and female coefficients.&lt;/p>
&lt;pre>&lt;code class="language-python"># POLS with controls
fit_pols_x = pf.feols(
&amp;quot;lwage ~ union + age + schooling + female + C(year)&amp;quot;,
data=df, vcov=&amp;quot;HC1&amp;quot;)
# TWFE: schooling and female are time-invariant → absorbed by ID FE
fit_twfe_x = pf.feols(&amp;quot;lwage ~ union + age | ID + year&amp;quot;,
data=df, vcov={&amp;quot;CRV1&amp;quot;: &amp;quot;ID&amp;quot;})
# RE + controls
df_rx = df.set_index([&amp;quot;ID&amp;quot;, &amp;quot;year&amp;quot;])
exog_rx = sm.add_constant(df_rx[[&amp;quot;union&amp;quot;, &amp;quot;age&amp;quot;, &amp;quot;schooling&amp;quot;, &amp;quot;female&amp;quot;]])
fit_re_x = RandomEffects(df_rx[&amp;quot;lwage&amp;quot;], exog_rx).fit(cov_type=&amp;quot;robust&amp;quot;)
# CRE + controls — adds the within-mean of every time-varying regressor
df[&amp;quot;age_bar&amp;quot;] = df.groupby(&amp;quot;ID&amp;quot;)[&amp;quot;age&amp;quot;].transform(&amp;quot;mean&amp;quot;)
exog_cx = sm.add_constant(
df_rx[[&amp;quot;union&amp;quot;, &amp;quot;union_bar&amp;quot;, &amp;quot;age&amp;quot;, &amp;quot;age_bar&amp;quot;, &amp;quot;schooling&amp;quot;, &amp;quot;female&amp;quot;]])
fit_cre_x = RandomEffects(df_rx[&amp;quot;lwage&amp;quot;], exog_cx).fit(cov_type=&amp;quot;robust&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Variable POLS TWFE RE CRE
================================================================================
union 0.0571 (0.0204) 0.2129 (0.0793) 0.0861 (0.0258) 0.2103 (0.0683)
age 0.0209 (0.0013) -0.0576 (0.0238) 0.0224 (0.0016) 0.0332 (0.0046)
schooling 0.1108 (0.0037) absorbed 0.1112 (0.0047) 0.1108 (0.0047)
female -0.2731 (0.0160) absorbed -0.2731 (0.0206) -0.2731 (0.0206)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="panel_intro_extended_models.png" alt="Extended models: union, age, schooling, female across POLS / TWFE / RE / CRE.">&lt;/p>
&lt;p>&lt;strong>Interpretation.&lt;/strong> Adding controls pulls the POLS union coefficient down to 0.057 — controls absorb some of the cross-sectional confounding — but TWFE and CRE still report a within-worker premium of about 0.21, leaving the four-camp gap (POLS 0.057 / RE 0.086 / TWFE 0.213 / CRE 0.210) largely intact. The schooling premium of 11.1 log points per year and the female penalty of 27.3 log points are stable across POLS, RE, and CRE because these regressors are essentially time-invariant; both are absorbed by individual FE in the TWFE column. The age coefficient does something interesting: it is +0.021 in POLS, +0.022 in RE, and +0.033 in CRE, but flips to −0.058 in TWFE. This is &lt;em>not&lt;/em> a real age–wage relationship: with T = 2 and every worker aging by exactly two years between waves, age within an individual is collinear with the year dummy, so the TWFE age coefficient confounds the age slope with the year effect. POLS, RE, and CRE return the expected positive age slope; the TWFE −0.058 should be read as a methodological artifact of T = 2.&lt;/p>
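&lt;p>The T = 2 collinearity behind the negative TWFE age coefficient can be demonstrated in a few lines: once every worker ages by exactly two years between waves, within-demeaned age is −1 in the first wave and +1 in the second for everyone, a perfect linear function of the year dummy. A toy sketch with synthetic workers (not the wage panel):&lt;/p>

```python
import pandas as pd

# Two waves, every worker exactly two years older in the second wave
toy = pd.DataFrame({
    "ID": [1, 1, 2, 2, 3, 3],
    "year": [2010, 2012] * 3,
    "age": [25, 27, 40, 42, 33, 35],
})
toy["age_dm"] = toy["age"] - toy.groupby("ID")["age"].transform("mean")
toy["d2012"] = (toy["year"] == 2012).astype(int)
print(toy[["age_dm", "d2012"]].corr().iloc[0, 1])  # correlation is exactly 1.0
```

With within-demeaned age perfectly correlated with the year dummy, any split of the combined effect between "age" and "year" is arbitrary, which is why the TWFE age coefficient cannot be read structurally.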
&lt;h2 id="16-discussion-what-does-our-case-study-tell-us">16. Discussion: what does our case study tell us?&lt;/h2>
&lt;p>We started with a deceptively simple question: does union membership raise wages, and if so by how much? Six estimators on the same dataset gave us answers ranging from 0.066 to 0.213 log points — a factor-of-three spread that is not noise but a structural feature of how the methods identify the parameter.&lt;/p>
&lt;p>The cross-sectional camp (POLS, Between, RE) is asking &amp;ldquo;how do union and non-union workers compare?&amp;rdquo;. Their 7–11% answer is what we would report if we believed union members were comparable to non-members on every relevant unobservable. The within camp (FE, FDFE, TWFE, CRE) is asking &amp;ldquo;what happens when &lt;em>the same worker&lt;/em> switches union status?&amp;rdquo;. Their 21% answer is what we would report if we trusted that nothing else changes for a worker between 2010 and 2012 except the things we observe. Both questions are legitimate; the gap between the answers is the empirical signature of selection on unobservables.&lt;/p>
&lt;p>The Hausman test failed to reject the random-effects assumption (p = 0.180), which by the textbook script would tell us to use RE. But the test has low power exactly when within variation is thin, which is the case here (9% within share). The Mundlak alternative landed at p = 0.072 — just shy of significance at the 5% level — and the Mundlak term itself was −0.144, suggesting that the workers with more union exposure are different (lower-paid on average) from workers with less. Both tests point in the same direction, but Mundlak&amp;rsquo;s nuanced &amp;ldquo;almost significant&amp;rdquo; reading is more honest than Hausman&amp;rsquo;s confident &amp;ldquo;fail to reject&amp;rdquo; verdict.&lt;/p>
&lt;p>For a practitioner faced with this kind of dataset, the practical implication is that &lt;strong>CRE/Mundlak is usually the right specification to lead with&lt;/strong>. It gives you the FE coefficient on the time-varying treatment (the within effect), the RE structure that lets you keep schooling and gender in the regression, and a built-in specification test (the t-statistic on the Mundlak term) that beats Hausman in low-power settings. The cost is one extra regressor per time-varying covariate, which is essentially free in modern software.&lt;/p>
&lt;p>Stated formally in causal-inference language: the within estimators (FDFE, FE, TWFE, CRE) target the average treatment effect for &lt;em>union switchers&lt;/em> — the subset of workers who actually changed union status between 2010 and 2012 — under the assumption of strict exogeneity conditional on the worker fixed effect. POLS and Between target a population-weighted association between union status and log wages and do not have a causal interpretation absent unconfoundedness. Reporting both estimands side-by-side (as we have done) is more informative than picking one and ignoring the other.&lt;/p>
&lt;h2 id="17-summary-and-next-steps">17. Summary and next steps&lt;/h2>
&lt;p>&lt;strong>Takeaways.&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Method insight.&lt;/strong> Three within recipes — first-differences, the within transformation, and dummy-variable FE — produce the same coefficient on union (0.2103, with FDFE differing by only +0.001 because of an intercept-driven year-trend artifact). This identity holds exactly when T = 2 and approximately when T &amp;gt; 2; understanding &lt;em>why&lt;/em> is the single most useful intuition in panel econometrics.&lt;/li>
&lt;li>&lt;strong>Data insight.&lt;/strong> Almost all of our variation is between workers (union 94%, age 97%, schooling 100%). Only 9% of union variance is within. That is the slice of the data that fixed-effects estimators are working with, and it explains why FE standard errors (0.081) are 2.7× larger than RE standard errors (0.030).&lt;/li>
&lt;li>&lt;strong>Limitation.&lt;/strong> With T = 2, our FE estimate is power-limited. The Hausman test fails to reject the RE assumption (p = 0.180) primarily because $V_{\mathrm{FE}}$ is large, not because RE is consistent. The Mundlak term tells the same story with more nuance (p = 0.072, borderline). Real applications usually have T &amp;gt; 2 and substantially more within variation, which sharpens both the FE estimate and the specification tests.&lt;/li>
&lt;li>&lt;strong>Next step.&lt;/strong> A natural extension is to use all five waves of the panel (2010–2018) instead of just 2010 and 2012, which would give us T = 5 and dramatically more within variation in union status. With T &amp;gt; 2, the FD–FE gap becomes a real identification choice (FD is more efficient under serially correlated errors; FE under random errors), and event-study designs become possible.&lt;/li>
&lt;/ul>
&lt;h2 id="18-exercises">18. Exercises&lt;/h2>
&lt;ol>
&lt;li>&lt;strong>Repeat the analysis with all five waves of the panel&lt;/strong> (2010, 2012, 2014, 2016, 2018). How does the FE coefficient change? Does the Hausman test still fail to reject? What about the Mundlak term?&lt;/li>
&lt;li>&lt;strong>Add an interaction with female.&lt;/strong> Modify the FE specification to include &lt;code>union × female&lt;/code> and interpret the coefficient. Does the union premium differ by gender?&lt;/li>
&lt;li>&lt;strong>Try a clustered bootstrap.&lt;/strong> Re-estimate the FE model with &lt;code>vcov={&amp;quot;CRV1&amp;quot;: &amp;quot;ID&amp;quot;}&lt;/code> and a wild cluster bootstrap (&lt;code>pyfixest&lt;/code> fitted models expose a &lt;code>wildboottest()&lt;/code> method). How do the bootstrap SEs compare to the analytical ones in this small-T setting?&lt;/li>
&lt;/ol>
&lt;h2 id="19-references">19. References&lt;/h2>
&lt;ol>
&lt;li>&lt;a href="https://pyfixest.org/pyfixest.html" target="_blank" rel="noopener">PyFixest documentation.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://bashtage.github.io/linearmodels/panel/introduction.html" target="_blank" rel="noopener">linearmodels: Panel models documentation.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chi2.html" target="_blank" rel="noopener">scipy.stats.chi2 documentation.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/quarcs-lab/data-open" target="_blank" rel="noopener">Wage panel dataset (&lt;code>wage_panel_bob4.dta&lt;/code>) — quarcs-lab data-open repository.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.jstor.org/stable/1913827" target="_blank" rel="noopener">Hausman, J. A. (1978). Specification Tests in Econometrics. &lt;em>Econometrica&lt;/em>, 46(6), 1251–1271.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.jstor.org/stable/1913646" target="_blank" rel="noopener">Mundlak, Y. (1978). On the Pooling of Time Series and Cross Section Data. &lt;em>Econometrica&lt;/em>, 46(1), 69–85.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://mitpress.mit.edu/9780262232586/econometric-analysis-of-cross-section-and-panel-data/" target="_blank" rel="noopener">Wooldridge, J. M. (2010). &lt;em>Econometric Analysis of Cross Section and Panel Data&lt;/em>, 2nd ed. MIT Press.&lt;/a>&lt;/li>
&lt;/ol></description></item></channel></rss>