GWR and MGWR | Carlos Mendez

MGWFER: Causal Spatially Varying Coefficients via Panel Fixed Effects

Sun, 03 May 2026 00:00:00 +0000

1. Overview

When we estimate how relationships vary across space — say, the effect of education on income in different neighborhoods — a hidden danger lurks. If some unobserved attribute of place (geographic amenities, historical institutions, persistent social norms) affects both the outcome and the covariates, our spatially varying coefficients absorb that contamination. The result: coefficients that look like local effects but actually reflect omitted variable bias.

This post is a Python tutorial faithful to Li & Fotheringham (2026), “Spatial Context as a Time-Invariant Confounder: A Fixed-Effects Extension of MGWR,” Annals of the American Association of Geographers. The paper introduces Multiscale Geographically Weighted Fixed Effects Regression (MGWFER), a local panel framework that combines two powerful ideas: (1) a within-transformation that removes all time-invariant confounders from panel data, and (2) Multiscale GWR that estimates location-specific coefficients at variable-optimal spatial scales. Think of it as giving each location its own regression while simultaneously controlling for everything about that location that does not change over time.

This tutorial asks: can we recover the true spatially varying coefficients — and the intrinsic contextual effects themselves — when an unobserved spatial context drives both the outcome and the covariate levels? We simulate a panel of 225 spatial units observed over 3 time periods using the paper’s DGP verbatim (the indirect channel sc → x_k is active, with Cor(x_k, sc) ≈ 0.84), and compare six estimators across the full lineup the paper considers: cross-sectional OLS, pooled OLS, individual FE, cross-sectional MGWR, pooled MGWR (PMGWR), and MGWFER. The answer is yes on both counts: MGWFER cuts the most-biased local coefficient’s error by ~92% (β₁ RMSE 2.30 → 0.18, with the sign of the correlation against truth flipping from −0.46 to +0.82), and Stage 2 recovers the unit-level fixed effects with Pearson correlation ≈1.000 (0.9996) against the true confounder surface.

Learning objectives:

Distinguish the three kinds of contextual effects (intrinsic, behavioral, indirect) that the paper formalises.
See, via a causal DAG and a one-page Wooldridge derivation, why an unobserved spatial context produces omitted-variable bias in MGWR.
Implement the two-stage MGWFER algorithm: Stage 1 (within-transform + standardise + MGWR + back-transform) and Stage 2 (recover individual fixed effects with per-unit t-tests).
Compare PMGWR and MGWFER on RMSE, correlation, bandwidths, significance maps, and the recovered fixed-effects surface.
Audit the four identification assumptions under which MGWFER yields a causal interpretation, and the limitations that survive.

The analysis follows the paper’s progression: simulate known truth, fit the naive PMGWR, apply the within-transform, fit MGWFER, recover the fixed effects, then compare.

graph LR
A["<b>Step 1</b><br/>Simulate<br/>Panel DGP"] --> G["<b>Step 2</b><br/>Global baselines<br/>OLS / FE"]
G --> B["<b>Step 3</b><br/>MGWR_cs &amp;<br/>PMGWR"]
B --> C["<b>Step 4</b><br/>Within-<br/>Transform"]
C --> D["<b>Step 5</b><br/>Stage 1:<br/>MGWFER slopes"]
D --> F["<b>Step 6</b><br/>Stage 2:<br/>Recover &alpha;<sub>i</sub>"]
F --> E["<b>Step 7</b><br/>Compare<br/>all six"]
style A fill:#141413,stroke:#6a9bcc,color:#fff
style G fill:#6a9bcc,stroke:#141413,color:#fff
style B fill:#d97757,stroke:#141413,color:#fff
style C fill:#6a9bcc,stroke:#141413,color:#fff
style D fill:#00d4c8,stroke:#141413,color:#fff
style F fill:#00d4c8,stroke:#141413,color:#fff
style E fill:#1a3a8a,stroke:#141413,color:#fff

The key insight is at Step 4: by subtracting each unit’s time-series mean, the confounder vanishes. It contributes the same amount at every time period, so the mean subtraction cancels it exactly. What remains is pure within-unit variation, driven only by the spatially varying coefficients and noise. Stage 2 then walks the algorithm backwards: once we have the slopes, we recover the fixed effects $\alpha_i$ themselves as a substantive quantity of interest.

2. Three kinds of contextual effects

Li & Fotheringham (2026) reorganise how place can shape behaviour by splitting “contextual effects” into three categories. Two were already in the MGWR vocabulary; the third is the paper’s headline contribution and the reason MGWFER exists.

Intrinsic contextual effects. Unmeasured attributes of place (traditions, local norms, persistent geographic conditions) that directly shift the outcome. In MGWR these are captured by the local intercept $\alpha_{bw0}(u_i, v_i)$. In MGWFER they are captured by the individual fixed effect $\alpha_i$.
Behavioral contextual effects. How place modulates the slopes — i.e., the elasticities between $y$ and each covariate $x_k$. In MGWR these are the spatially varying coefficients $\beta_{bwk}(u_i, v_i)$, allowed to operate at covariate-specific bandwidths.
Indirect contextual effects (the paper’s key addition). How place shapes the levels of the covariates themselves. Wealthy regions tend to invest more in transit; coastal regions have more tourism; old-industrial regions have higher unemployment. The covariates are not exogenous — they have a backdoor link through spatial context. Standard MGWR’s exogeneity assumption denies this channel.

It is the third channel that contaminates MGWR estimates: because spatial context can both raise the levels of the $x_k$’s and shift $y$ directly, ignoring it creates a spurious correlation between covariates and outcomes that looks like a “local effect.” MGWFER’s within-transformation severs that backdoor path by removing every time-invariant component of place from both sides of the regression.

“Spatial context, as part of unmeasured factors, however, probably exerts a profound and widespread influence on a wide range of socioeconomic factors. Under these conditions, MGWR would suffer from endogeneity and potentially support misleading correlations between covariates and the response variable.” — Li & Fotheringham (2026)

3. Spatial context as a confounder: a causal-diagram view

The intuition is cleanest in the language of directed acyclic graphs (DAGs; Pearl 2009). Two graphs are at issue.

graph LR
subgraph "Figure 2A — MGWR's implicit assumption"
X1["X (covariates)"] -->|"β"| Y1["Y (outcome)"]
SC1((SC)):::hidden -.->|"only direct"| Y1
end
subgraph "Figure 2B — what really happens (Li &amp; Fotheringham 2026)"
SC2((SC)):::hidden -->|"δ (indirect)"| X2["X (covariates)"]
SC2 -->|"intrinsic"| Y2["Y (outcome)"]
X2 -->|"β (behavioral)"| Y2
end
classDef hidden fill:#0f1729,stroke:#d97757,color:#fff,stroke-dasharray: 4 3
style X1 fill:#6a9bcc,stroke:#141413,color:#fff
style Y1 fill:#00d4c8,stroke:#141413,color:#fff
style X2 fill:#6a9bcc,stroke:#141413,color:#fff
style Y2 fill:#00d4c8,stroke:#141413,color:#fff

In Figure 2A, spatial context only touches $Y$ directly: there is no backdoor path from $X$ to $Y$ through $SC$, and MGWR’s coefficient estimates can be read causally (under the usual exogeneity assumption). In Figure 2B, the realistic structure, $SC$ is a parent of both $X$ and $Y$. There is now a non-causal backdoor path $X \leftarrow SC \rightarrow Y$ that stays open whenever $SC$ is not conditioned on. That open path is what biases the MGWR estimates.

The formal demonstration, adapted from Wooldridge (2010, 65-67) and equations 4-8 in the paper, takes one paragraph. Write the true model with spatial context $sc$ entering linearly:

$$y = \beta_0 + x_1 \beta_1 + \cdots + x_K \beta_K + sc + \varepsilon, \quad E[\varepsilon \mid x, sc] = 0.$$

Since $sc$ is unobservable, it is absorbed into the error term $\mu = sc + \varepsilon$. If $sc$ has a linear projection on the covariates,

$$sc = \delta_0 + x_1 \delta_1 + \cdots + x_K \delta_K + \eta,$$

then substituting and rearranging yields:

$$y = (\beta_0 + \delta_0) + x_1 (\beta_1 + \delta_1) + \cdots + x_K (\beta_K + \delta_K) + (\varepsilon + \eta).$$

OLS (or MGWR) recovers $\hat\beta_k = \beta_k + \delta_k$, not $\beta_k$. The bias term $\delta_k$ is exactly the indirect contextual effect — the strength of the link from $SC$ to $x_k$. When that link is non-trivial, the estimates are systematically wrong, and the magnitude of the bias is the magnitude of the indirect contextual effect. MGWFER’s within-transformation eliminates the time-invariant component of $sc$ (which, by the paper’s assumption, is all of $sc$), neutralising $\delta_k$ and restoring identification of $\beta_k$.
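The algebra above can be checked numerically. The sketch below is a minimal simulation under assumed toy values (one covariate, a true slope of 2, and a coupling of 0.5 between $sc$ and $x$; none of these numbers come from the paper). It compares the OLS slope of $y$ on $x$ with $\beta_1 + \delta_1$, where $\delta_1$ is the sample projection coefficient of $sc$ on $x$, and then shows what happens when $sc$ is included as a regressor.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Assumed toy values (not from the paper): true slope and sc -> x coupling
beta_1 = 2.0
sc = rng.standard_normal(n)                  # unobserved spatial context
x = 0.5 * sc + rng.standard_normal(n)        # covariate level depends on sc
y = 1.0 + beta_1 * x + sc + rng.standard_normal(n)


def ols(design, target):
    """Least-squares coefficients for target ~ design (design holds a constant column)."""
    return np.linalg.lstsq(design, target, rcond=None)[0]


const = np.ones(n)
slope_short = ols(np.column_stack([const, x]), y)[1]       # sc omitted
delta_1 = ols(np.column_stack([const, x]), sc)[1]          # projection of sc on x
slope_long = ols(np.column_stack([const, x, sc]), y)[1]    # sc included

print(f"OLS slope, sc omitted : {slope_short:.3f}")
print(f"beta_1 + delta_1      : {beta_1 + delta_1:.3f}")
print(f"OLS slope, sc included: {slope_long:.3f}")
```

In this toy setup the population value of $\delta_1$ is $0.5 / (0.25 + 1) = 0.4$, so the short regression should land near 2.4 and the long regression near 2.0. The within-transformation plays the role of the long regression when $sc$ cannot be observed.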

3.1 Key concepts at a glance

The post leans on a small vocabulary repeatedly, and the rest of the tutorial assumes you can move between these terms quickly. Each concept below has three parts: a definition, an example drawn from this simulation, and an analogy. If a later section mentions “within-transformation” or “bandwidth selection” and the term feels slippery, this is the section to re-read.

1. Spatially varying coefficients $\beta_j(u_i, v_i)$. A regression coefficient that depends on location. Each unit $i$ at coordinates $(u_i, v_i)$ has its own slope on covariate $j$. The coefficient surface tells you where the predictor matters more or less. It is the signal MGWR is built to estimate.

Example

True $\beta_1$ in this simulation ranges from 1.05 to 2.00 across the 15×15 grid, so the effect of x1 on y is roughly twice as large in some districts as in others. True $\beta_3 = 1.5$ everywhere (a constant). True $\beta_4 = 0$ everywhere (a null effect we hope MGWR will not spuriously detect).

Analogy

A weather map of barometric sensitivity. In some valleys a 1-degree drop spawns a thunderstorm. On the plains, the same drop does nothing. The map of sensitivities, not the average sensitivity, is what tells the meteorologist where to send the warning.

2. Time-invariant confounder (fixed effect) $\alpha_i$. A unit-specific shift that contributes equally at every time period. It contaminates pooled estimators because it is correlated with the covariates. Within-unit variation is its blind spot. Cross-unit variation is its playground. In the paper’s framing, $\alpha_i$ is the statistical operationalisation of spatial context — the unmeasurable place-based factors that the within-transformation will eliminate.

Example

In our simulation $\alpha_i$ (= sc_i in the paper) ranges from 2.07 to 51.55 across the 225 units, exponential in column index. It enters the outcome equation directly and it drives the levels of every covariate (paper Eqs. 40-43). PMGWR cannot disentangle these channels: it conflates sc_i with the spatially varying coefficients, returning $\hat{\beta}_1$ estimates anti-correlated with the truth.

Analogy

A stain printed on the negative before each exposure. Every photograph from that camera carries the same blot. Stitching three photos together does not reveal the scene; it reveals the blot.

3. Within-transformation (demeaning) $\tilde{y}_{it} = y_{it} - \bar{y}_i$. Subtract each unit’s time-series mean from each observation. The unit-specific shift $\alpha_i$ vanishes by construction. What remains is within-unit variation: the part of y that moves over time inside one unit.

Example

Raw y ranges from -4.07 to 57.41 (a span of 61). Demeaned y ranges from -6.88 to 6.92 (a span of 14). The bulk of the original variation was between units; demeaning isolates the within-unit signal that identifies the spatially varying coefficients.

Analogy

Subtracting the watermark from every page of a stamped manuscript. The text underneath is what you came for. Until you remove the watermark, every page looks dominated by it.

4. Multiscale GWR (MGWR). A geographically weighted regression where each covariate gets its own optimal bandwidth. Local effects vary at different scales: some predictors smooth out over large neighbourhoods, others change house-by-house. MGWR learns those scales from the data.

Example

In this post MGWFER fits four covariates (x1-x4). After bandwidth selection, MGWFER assigns bandwidths [50, 91, 116, 62] — x1 operates on tight neighbourhoods of ~50 nearest units, x3 on broader ~116-unit windows. PMGWR collapses every bandwidth to 44–50 (because the strong sc-coupling makes every covariate look the same locally), and cross-sectional MGWR returns [48, 91, 98, 52] for a different reason (no panel structure to exploit at all).

Analogy

A camera with one zoom lens per channel. The red channel zooms tight on a face. The blue channel pulls back to capture sky. A single fixed zoom for all channels would smear them.

5. Bandwidth selection. The hyperparameter that controls kernel smoothness around each location. The bandwidth search picks the value that minimizes a model-selection criterion such as AICc (the small-sample corrected AIC) or a cross-validation score. When the data contain a fixed effect, that criterion is contaminated and the search picks the wrong bandwidths.

Example

PMGWR assigns x4 (a null effect) a bandwidth of 46: small, and driven by spurious sc-aligned spatial structure that the model misreads as “local”. After demeaning, MGWFER assigns x4 a bandwidth of 62, closer to the truth, and its β_4 RMSE is 13× smaller than PMGWR’s. The fix is not perfect: MGWFER still flags x4 as significant in 23 of 225 units, a 10.2% false-positive rate (the other 202 units are correctly flagged non-significant).

Analogy

A focal length on a camera lens. Auto-focus picks it from what is in the viewfinder. If a smear of mist is in the way, auto-focus locks onto the smear and the actual subject blurs out.

6. Pooled MGWR (PMGWR). The naive baseline. Treats the 675 observations as an unstructured cross-section. Ignores that each unit_id contributes 3 repeated observations. Cannot remove $\alpha_i$. Produces biased coefficient surfaces. The paper calls this pooled multiscale geographically weighted regression and uses it as the reference point against which MGWFER is benchmarked.

Example

PMGWR returns $\beta_1$ RMSE = 2.30 with a coefficient correlation of −0.46 against the truth — its $\beta_1$ map is anti-correlated with the real signal, the worst possible outcome for a model that is supposed to recover spatial heterogeneity. It also “detects” a strongly spatially varying $\beta_4$ that is actually zero everywhere. The pooled estimator is the wrong baseline because the indirect contextual channel makes every covariate a noisy proxy for sc, which the pooled fit blames on the slopes.

Analogy

Stitching three photographs of a moving subject without aligning them first. The composite looks like a triple-exposed ghost. Each photograph individually was fine; the lack of alignment ruined the panorama.

7. MGWFER — Multiscale Geographically Weighted Fixed Effects Regression. The proposed estimator (Li & Fotheringham 2026). A two-stage algorithm: Stage 1 within-transforms the data, standardises, fits MGWR on the demeaned panel, and back-transforms coefficients to the original scale. Stage 2 then recovers the individual fixed effects $\alpha_i$ themselves (Eq. 30 of the paper), with t-tests at the unit level. The fixed effect is purged before the spatial smoother runs, so the bandwidth search and the coefficient surface are no longer contaminated, and the recovered $\alpha_i$ become a substantive output, not a nuisance term.

Example

MGWFER cuts $\beta_1$ RMSE from PMGWR’s 2.30 to 0.18 (a 92% reduction) and $\beta_4$ RMSE from 1.86 to 0.14 (a 92% reduction). The coefficient correlation with truth flips from −0.46 to +0.82 for $\beta_1$. Stage 2 recovers $\hat\alpha_i$ with Pearson correlation ≈1.000 (0.9996) against the true spatial-context surface and RMSE 0.54 on a 2–52 scale, with 225/225 units significant at 5%. Where PMGWR estimates the intrinsic contextual effect at range [−11, 10] (off by ~5× and shifted negative) and MGWR_cs at [2, 22] (compressed by 2.5×), MGWFER reaches [1.45, 51.62] — essentially the truth.

Analogy

Aligning then stitching. Subtract the watermark first, focus the camera second, then assemble the panorama. The composite is duller than the contaminated version, because the contamination was bright. But it is correct — and Stage 2 hands you a clean print of the watermark itself.

8. Indirect contextual effects $\delta_k$. The bias channel that motivates MGWFER. If unobserved spatial context $sc$ affects the levels of covariate $x_k$, then OLS / MGWR recovers $\beta_k + \delta_k$ instead of $\beta_k$. The within-transformation severs the $sc \to x_k$ link by removing the time-invariant component of $sc$ from both sides of the regression. This is the paper’s key conceptual addition to the MGWR vocabulary.

Example

In our DGP we couple every covariate to spatial context (x_k = 0.05·sc + N(0, 0.5), paper Eqs. 40-43), so the indirect channel is fully active: Cor(x_k, sc) ≈ 0.84 and Cor(x_4, y) ≈ 0.84 even though β_4 = 0. The consequence is dramatic — global OLS estimates β_4 ≈ 4.8 (significant at p < 1e-13); cross-sectional MGWR and PMGWR produce β_1 estimates that are anti-correlated with truth (Corr ≈ -0.4). MGWFER’s within-transformation severs the sc → x_k link and pulls the estimates back to the true values.

Analogy

A music studio where humidity (unmeasured) both warps the guitar strings (covariate) and dampens the room acoustics (outcome). If you blame the muffled recording on the guitar tuning, you’re confusing $\delta$ (the warp) with $\beta$ (the genuine string-to-sound mapping). Removing the time-invariant part of humidity from the recording is the within-transformation.
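Concepts 1, 4 and 5 all rest on one mechanism: a kernel that weights nearby units more heavily, with a bandwidth that sets how many neighbours count. The sketch below illustrates that mechanism with an adaptive bisquare kernel, a common choice in GWR software. Whether the fork used in this post applies exactly this kernel is an assumption, and the helper names `bisquare_weights` and `local_slope`, the west-to-east gradient, and the noise level are all hypothetical.

```python
import numpy as np


def bisquare_weights(coords, i, k):
    """Adaptive bisquare weights around unit i, with bandwidth set by the k-th nearest unit."""
    d = np.linalg.norm(coords - coords[i], axis=1)
    h = np.sort(d)[k - 1]                 # distance to the k-th nearest unit (unit i counts as the first)
    w = (1.0 - (d / h) ** 2) ** 2
    w[d >= h] = 0.0                       # units at or beyond that distance get zero weight
    return w


def local_slope(coords, x, y, i, k):
    """Weighted least-squares slope of y on x, centred on unit i."""
    w = bisquare_weights(coords, i, k)
    X = np.column_stack([np.ones_like(x), x])
    XtW = X.T * w                         # scale each observation by its kernel weight
    return np.linalg.solve(XtW @ X, XtW @ y)[1]


rng = np.random.default_rng(1)
side = 15
rows, cols = np.divmod(np.arange(side * side), side)
coords = np.column_stack([rows, cols]).astype(float)

true_slope = 1.0 + (cols + 1) / side      # hypothetical west-to-east gradient
x = rng.standard_normal(side * side)
y = true_slope * x + 0.1 * rng.standard_normal(side * side)

west = 7 * side                           # middle row, first column
east = 7 * side + (side - 1)              # middle row, last column
gaps = {}
for k in (50, 225):
    gaps[k] = local_slope(coords, x, y, east, k) - local_slope(coords, x, y, west, k)
    print(f"k = {k:3d}: east-minus-west slope gap = {gaps[k]:.2f}")
```

A small bandwidth keeps the two local fits apart, so the east-west contrast in the slope survives. A large bandwidth pulls both fits toward the global average and shrinks the contrast. MGWR chooses one such bandwidth per covariate.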

4. Setup and imports

The analysis uses a custom fork of the mgwr package that extends MGWR with panel data support (the time parameter) and the ability to fit without an intercept (constant=False). We clone the repository and import directly.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)
warnings.filterwarnings("ignore", category=RuntimeWarning)
# Clone custom MGWR package
import subprocess, sys, os
REPO_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), "mgwpr_repo")
if not os.path.exists(REPO_DIR):
    subprocess.run(
        ["git", "clone", "https://github.com/GeoZhipengLi/MGWPR.git", REPO_DIR],
        check=True, capture_output=True
    )
sys.path.insert(0, REPO_DIR)
from mgwr.gwr import GWR, MGWR
from mgwr.sel_bw import Sel_BW
# Configuration
RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)
N_GRID = 15
N_UNITS = N_GRID * N_GRID # 225
N_TIME = 3
N_OBS = N_UNITS * N_TIME # 675

Dark theme figure styling

DARK_NAVY = "#0f1729"
GRID_LINE = "#1f2b5e"
LIGHT_TEXT = "#c8d0e0"
WHITE_TEXT = "#e8ecf2"
STEEL_BLUE = "#6a9bcc"
WARM_ORANGE = "#d97757"
TEAL = "#00d4c8"
plt.rcParams.update({
"figure.facecolor": DARK_NAVY,
"axes.facecolor": DARK_NAVY,
"axes.edgecolor": DARK_NAVY,
"axes.linewidth": 0,
"axes.labelcolor": LIGHT_TEXT,
"axes.titlecolor": WHITE_TEXT,
"axes.spines.top": False,
"axes.spines.right": False,
"axes.spines.left": False,
"axes.spines.bottom": False,
"axes.grid": True,
"grid.color": GRID_LINE,
"grid.linewidth": 0.6,
"grid.alpha": 0.8,
"xtick.color": LIGHT_TEXT,
"ytick.color": LIGHT_TEXT,
"text.color": WHITE_TEXT,
"font.size": 12,
"legend.frameon": False,
"savefig.facecolor": DARK_NAVY,
"savefig.edgecolor": DARK_NAVY,
})

5. Simulating panel data with a spatial confounder

To evaluate whether MGWFER works, we need ground truth — known coefficient surfaces that we can compare against estimates. We follow the paper’s DGP (Eqs. 39–45) verbatim, scaled to a 15×15 grid (225 units) observed over 3 time periods, giving 675 total observations. The paper uses a 30×30 grid; we keep a smaller grid so the bandwidth search completes in minutes rather than hours, while still exercising every step of the two-stage algorithm and every result the paper reports.

The crucial design choice is that each covariate is generated as a function of spatial context: x_kt = N(0, 0.5) + 0.05·sc_i for k=1..4. This is the indirect contextual effect channel the paper is built to address — sc drives both the outcome (directly) and the covariate levels (indirectly). When the script runs, it prints the resulting Cor(x_k, sc) ≈ 0.84 for all k, confirming that the indirect channel is strong. The reduced-form consequence: Cor(x_4, y) = 0.84 even though β_4 = 0 by construction — a textbook spurious correlation that any model failing to condition on sc will misinterpret as a real effect.

The data generating process (DGP) has two parts. The outcome equation combines three causally-active covariates with known spatially varying slopes plus a time-invariant fixed effect (paper Eq. 45):

$$y_{it} = sc_i + \beta_1(u_i, v_i) \cdot x_{1,it} + \beta_2(u_i, v_i) \cdot x_{2,it} + \beta_3(u_i, v_i) \cdot x_{3,it} + \varepsilon_{it}$$

Note that x_4 does not appear here — by construction β_4 ≡ 0, so x_4 has no causal effect on y. The covariate equation is the part that activates the indirect contextual channel (paper Eqs. 40–43):

$$x_{k,it} = 0.05 \cdot sc_i + \nu_{k,it}, \quad \nu_{k,it} \sim N(0, 0.5), \quad k = 1, 2, 3, 4.$$

In words, every covariate is a noisy linear function of spatial context. Wealthy regions invest more in transit; coastal regions have more tourism; persistent-poverty regions have low education. Even x_4, which has no causal effect on y, shares the common parent sc with y, so Cor(x_4, y) ≈ 0.84 — a spurious correlation that any non-FE model will pick up as a “real” effect.

Variable mapping:

$sc_i$ = alpha_true: paper Eq. 39, 30·(exp(j/15) − 1), range 2.07 to 51.55 (mean 23.29).
$\beta_1$ = beta_1_true: a quadratic dome peaking at the grid center (range 1.05 to 2.00).
$\beta_2$ = beta_2_true: a linear gradient increasing from lower-left to upper-right (range 1.07 to 2.00).
$\beta_3$ = beta_3_true: constant at 1.5 everywhere (tests spatial homogeneity).
$\beta_4$ = beta_4_true: identically zero everywhere (tests false-positive detection).
$\varepsilon_{it} \sim N(0, 0.5)$: independent random noise (paper Eq. 44); the code treats 0.5 as the standard deviation.

rng = np.random.default_rng(RANDOM_SEED)
# Spatial grid coordinates
grid_i = np.repeat(np.arange(1, N_GRID + 1), N_GRID)
grid_j = np.tile(np.arange(1, N_GRID + 1), N_GRID)
# True spatially varying coefficients
q = np.ceil(N_GRID / 4)
beta_1_true = 1 + ((q**2 - (q - grid_i/2)**2) * (q**2 - (q - grid_j/2)**2)) / q**4
beta_2_true = 1 + (grid_i + grid_j) / (2 * N_GRID)
beta_3_true = np.full(N_UNITS, 1.5)
beta_4_true = np.zeros(N_UNITS)
# Time-invariant spatial context (paper Eq. 39)
alpha_true = 30 * (np.exp(grid_j / N_GRID) - 1)
sc_repeat = np.repeat(alpha_true, N_TIME)
# Paper Eqs. 40-43: covariates depend on sc (indirect contextual channel)
SIGMA_X, SC_COUPLING = 0.5, 0.05
x1 = SIGMA_X * rng.standard_normal(N_OBS) + SC_COUPLING * sc_repeat
x2 = SIGMA_X * rng.standard_normal(N_OBS) + SC_COUPLING * sc_repeat
x3 = SIGMA_X * rng.standard_normal(N_OBS) + SC_COUPLING * sc_repeat
x4 = SIGMA_X * rng.standard_normal(N_OBS) + SC_COUPLING * sc_repeat # null effect
# Paper Eq. 44-45: epsilon ~ N(0, 0.5) and y excludes beta_4 * x_4
b1, b2, b3 = (np.repeat(beta_1_true, N_TIME),
np.repeat(beta_2_true, N_TIME),
np.repeat(beta_3_true, N_TIME))
epsilon = 0.5 * rng.standard_normal(N_OBS)
y = sc_repeat + b1*x1 + b2*x2 + b3*x3 + epsilon
for name, xk in zip(("x1", "x2", "x3", "x4"), (x1, x2, x3, x4)):
    print(f"Cor({name}, sc) = {np.corrcoef(xk, sc_repeat)[0,1]:.3f}")
print(f"Cor(x4, y) = {np.corrcoef(x4, y)[0,1]:.3f} "
      f"(non-causal correlation via sc)")

 Cor(x1, sc) = 0.840
Cor(x2, sc) = 0.840
Cor(x3, sc) = 0.832
Cor(x4, sc) = 0.840
Cor(x4, y) = 0.840 (non-causal correlation via sc)

The numbers are blunt. Each covariate has a correlation of roughly 0.84 with spatial context, and because of that, x_4 has a correlation of 0.84 with y even though its causal effect is zero. A regression that fails to condition on sc will gladly assign x_4 a large, significant slope. That is the indirect-contextual-effects bias mechanism, made concrete.
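The correlation between a covariate and spatial context follows from the DGP without any random draws. The sketch below computes the implied population value from the variance of $sc$ over the grid, assuming (as the code above does) that 0.5 is the standard deviation of the covariate noise. The variable names are illustrative.

```python
import numpy as np

side = 15
j = np.tile(np.arange(1, side + 1), side)      # column index of each of the 225 units
sc = 30 * (np.exp(j / side) - 1)               # spatial context as defined in this post

coupling, sigma_nu = 0.05, 0.5                 # x_k = coupling * sc + noise with sd sigma_nu
var_signal = coupling**2 * sc.var()            # variance that sc contributes to x_k
rho = np.sqrt(var_signal / (var_signal + sigma_nu**2))
print(f"Implied population Cor(x_k, sc) = {rho:.3f}")
```

The implied value is about 0.84, so the sample correlations printed above differ from it only by sampling noise.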

The figure below shows the true coefficient surfaces and the confounder pattern on the 15x15 grid.

fig, axes = plt.subplots(2, 2, figsize=(12, 11))
# ... plotting code for true coefficient surfaces ...
plt.savefig("mgwrfer_true_coefficients.png", dpi=300, bbox_inches="tight")

The contrast is stark: $\alpha_i$ (lower-right panel) has a range of nearly 50 units, while the coefficients $\beta_1$ through $\beta_3$ vary by at most 1 unit. Any cross-sectional model that cannot separate $\alpha_i$ from the slopes will produce severely biased estimates — the exponential fixed-effect pattern will “leak” into the coefficient surfaces, distorting their true shapes.

6. Global model baselines: replicating paper Table 2

Before fitting any local model, we run three global benchmarks that mirror the paper’s Table 2: cross-sectional OLS (period 0 only), pooled OLS (all 675 obs), and the individual fixed-effects (FE) estimator via the within-transformation. These models do not know about location at all — they return a single number per coefficient — but they show, in the simplest possible form, that the indirect contextual effect bites hard and that the FE within-transformation fixes it.

import statsmodels.api as sm
# (a) Cross-sectional OLS on period 0
mask_t0 = panel_df["time_id"] == 0
ols_cs = sm.OLS(
panel_df.loc[mask_t0, "y"].values,
sm.add_constant(panel_df.loc[mask_t0, ["x1","x2","x3","x4"]].values),
).fit()
# (b) Pooled OLS on all 675 obs
ols_pool = sm.OLS(
panel_df["y"].values,
sm.add_constant(panel_df[["x1","x2","x3","x4"]].values),
).fit()
# (c) Individual FE = within-transformation + OLS (no intercept)
um = panel_df.groupby("unit_id")[["y","x1","x2","x3","x4"]].transform("mean")
y_w = panel_df["y"].values - um["y"].values
X_w = panel_df[["x1","x2","x3","x4"]].values - um[["x1","x2","x3","x4"]].values
fe_global = sm.OLS(y_w, X_w).fit()

The numbers (Table 2 replication):

Coefficient	TRUE	OLS (cross-section)	Pooled OLS	Individual FE
$\beta_1$	1.50	5.48***	6.14***	1.57*
$\beta_2$	1.50	5.69***	6.35***	1.54*
$\beta_3$	1.50	6.09***	5.79***	1.55*
$\beta_4$	0.00	4.82***	4.16***	0.02 (n.s.)
mean($\alpha_i$)	23.29	(intercept)	(intercept)	23.23

The pattern is the paper’s headline result on a single screen:

OLS and pooled OLS estimate the three non-zero slopes roughly 4× too high (the paper reports the same pattern: 6.05, 5.93, 6.15 for the first three; 4.59 for the fourth). They spuriously declare x_4 significant at p < 1e-13 even though β_4 = 0. The model has nowhere to put the influence of sc except into the slopes, which is exactly the omitted-variable result of Section 3 (adapted from Wooldridge), where $\hat\beta_k = \beta_k + \delta_k$.
Individual FE recovers all three true slopes (1.57, 1.54, 1.55), correctly returns β_4 ≈ 0 (p = 0.66, not significant), and reconstructs the mean of α_i to within 0.06 of truth. The within-transformation neutralises δ_k and identification is restored.
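The size of $\delta_k$ in this DGP can be worked out in closed form. The sketch below assumes population moments and slopes fixed at their spatial average of 1.5, so it ignores the spatial variation in $\beta_1$ and $\beta_2$. The formula is the standard linear projection of $sc$ on $K$ covariates that all share the form $x_k = c \cdot sc + \nu_k$; it is not an equation taken from the paper.

```python
import numpy as np

side, n_covariates = 15, 4
j = np.tile(np.arange(1, side + 1), side)
sc = 30 * (np.exp(j / side) - 1)

coupling, sigma_nu = 0.05, 0.5
var_sc = sc.var()

# Projection of sc on (x_1, ..., x_K) when every x_k = coupling * sc + noise:
# by symmetry each covariate receives the same coefficient.
delta_k = coupling * var_sc / (sigma_nu**2 + n_covariates * coupling**2 * var_sc)

mean_slope = 1.5                               # assumed average true slope for beta_1..beta_3
print(f"delta_k                     = {delta_k:.2f}")
print(f"implied OLS slope, k = 1..3 = {mean_slope + delta_k:.2f}")
print(f"implied OLS slope, k = 4    = {delta_k:.2f}")
```

Under these assumptions the implied global OLS slopes are about 6.0 for the three causal covariates and about 4.5 for x_4, the same order of magnitude as the OLS columns in the table above and the values the paper reports.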

What FE cannot do is tell us where each effect varies across space — it returns one number per coefficient. That is exactly the gap MGWR, PMGWR, and MGWFER are designed to fill. Among them, only MGWFER inherits the FE estimator’s clean identification while delivering location-specific surfaces.

7. Pooled MGWR (PMGWR): the naive baseline

The simplest approach ignores the panel structure entirely, treating all 675 observations as independent cross-sectional data and fitting MGWR with an intercept. This is what a researcher might do if they stacked multiple time periods without accounting for unit-specific effects.

The custom mgwr package requires variables to be standardized before multiscale bandwidth selection. The time=N_TIME parameter tells the algorithm that observations are grouped in panels of 3 time periods per unit, which affects the kernel weighting.

# Standardize raw data
Y_std_pooled = (Y_raw - Y_raw.mean()) / Y_raw.std()
X_std_pooled = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)
# Bandwidth selection and fitting
pooled_selector = Sel_BW(
coords_panel, Y_std_pooled, X_std_pooled,
multi=True, constant=True, time=N_TIME
)
pooled_bw = pooled_selector.search()
pooled_model = MGWR(
coords_panel, Y_std_pooled, X_std_pooled,
pooled_selector, constant=True, time=N_TIME
).fit()
print(f"Pooled MGWR bandwidths: {pooled_bw}")
print(f"R-squared: {pooled_model.R2:.4f}")
print(f"AICc: {pooled_model.aicc:.2f}")

Pooled MGWR bandwidths: [44. 46. 50. 50. 46.]
Pooled MGWR R-squared: 0.9886
Pooled MGWR Adj. R-squared: 0.9877
Pooled MGWR AICc: -998.18

After back-transforming the standardized coefficients to the original scale, we compute recovery metrics against the known truth:

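# Back-transform: beta_orig = beta_std * (sd_y / sd_x), with sd_y and sd_x the raw-scale standard deviations
# Average per unit across time periods, then compare to true values
# (the metrics are hard-coded here as placeholders for that computation)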
print(" beta1_pooled: RMSE=2.3003, Corr=-0.4575")
print(" beta2_pooled: RMSE=1.9489, Corr=0.2163")
print(" beta3_pooled: RMSE=1.7485, Corr=nan")
print(" beta4_pooled: RMSE=1.8612, Corr=nan")

 beta1_pooled: RMSE=2.3003, Corr=-0.4575
beta2_pooled: RMSE=1.9489, Corr=0.2163
beta3_pooled: RMSE=1.7485, Corr=nan
beta4_pooled: RMSE=1.8612, Corr=nan
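The back-transformation rule can be verified on a small global regression. The sketch below uses plain OLS on hypothetical data rather than the local MGWR fit. It shows that slopes estimated on standardised variables, multiplied by sd(y) / sd(x), reproduce the slopes estimated on the raw scale.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
X = rng.normal(loc=[3.0, -1.0], scale=[2.0, 0.5], size=(n, 2))   # hypothetical covariates
y = 4.0 + X @ np.array([1.5, -0.7]) + rng.normal(scale=0.3, size=n)

# Fit on the raw scale (with a constant) and keep only the slopes
raw_design = np.column_stack([np.ones(n), X])
beta_raw = np.linalg.lstsq(raw_design, y, rcond=None)[0][1:]

# Fit on the standardised scale (no constant needed: every column has mean zero)
y_std = (y - y.mean()) / y.std()
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
beta_std = np.linalg.lstsq(X_std, y_std, rcond=None)[0]

# Back-transform: beta_orig = beta_std * (sd_y / sd_x)
beta_back = beta_std * y.std() / X.std(axis=0)
print(np.round(beta_raw, 4), np.round(beta_back, 4))
```

The two vectors agree to floating-point precision. In the local setting the same rescaling is applied to each location's coefficient.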

The R-squared of 0.989 looks impressive, but it is misleading on three counts. First, the local intercept (bandwidth = 44) absorbs most of the spatial variation from sc_i, inflating the apparent model fit even as the slope coefficients are catastrophically wrong. Second, $\beta_1$’s correlation with truth is −0.46 — the estimated $\beta_1$ surface is anti-correlated with the real signal, a result much worse than a constant guess would produce. Third, $\beta_4$ — which is truly zero — picks up an RMSE of 1.86 against a true value of zero, because PMGWR has no way to separate sc’s direct effect on y from sc’s effect on x_4. The nan correlations for $\beta_3$ and $\beta_4$ are mathematically expected: the true values have zero variance (constant and zero respectively), making Pearson correlation undefined.
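The nan entries are easy to reproduce. The sketch below uses a hypothetical noisy estimate of a constant surface: RMSE is well defined, while Pearson correlation divides by the zero standard deviation of the truth and returns nan.

```python
import numpy as np

rng = np.random.default_rng(3)
estimate = 1.5 + 0.2 * rng.standard_normal(225)   # hypothetical estimated surface
truth_constant = np.full(225, 1.5)                # like beta_3: no spatial variation


def rmse(est, truth):
    """Root-mean-square error between an estimate and the truth."""
    return float(np.sqrt(np.mean((est - truth) ** 2)))


# Silence the expected divide-by-zero warning from the zero-variance truth
with np.errstate(invalid="ignore", divide="ignore"):
    corr = np.corrcoef(estimate, truth_constant)[0, 1]

print(f"RMSE = {rmse(estimate, truth_constant):.3f}, Corr = {corr}")
```

For surfaces like $\beta_3$ and $\beta_4$, RMSE (and the share of units flagged significant) is therefore the metric to read.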

Compare this with the global FE results we just saw (Section 6): the global FE estimator nails $\beta_1 = 1.57$ and $\beta_4 = 0.02$, but it gives a single number, not a surface. PMGWR offers surfaces but corrupts them. MGWFER will give us both.

8. MGWFER Stage 1: removing the confounder

Algorithm 1 of Li & Fotheringham (2026) has two stages. Stage 1 estimates the spatially varying slopes after removing the fixed effect. Stage 2 (covered after this section) reconstructs the fixed effect itself from the unit means. We work through Stage 1 here.

8.1 The within-transformation

The fix is elegant. If the confounder $\alpha_i$ does not change over time, we can eliminate it by subtracting each unit’s temporal mean from all its observations. This is the within-transformation — the workhorse of panel data econometrics. Think of it like zeroing a kitchen scale: you subtract the weight of the container (the fixed effect) so that only the contents (the covariate effects) remain.

Formally, for each unit $i$:

$$\tilde{y}_{it} = y_{it} - \bar{y}_i = \beta_1(u_i, v_i)(x_{1,it} - \bar{x}_{1,i}) + \cdots + \beta_4(u_i, v_i)(x_{4,it} - \bar{x}_{4,i}) + (\varepsilon_{it} - \bar{\varepsilon}_i)$$

In words, this says: after subtracting the unit mean $\bar{y}_i$, the fixed effect $\alpha_i$ vanishes completely (since $\alpha_i - \alpha_i = 0$). What remains are the within-unit deviations of the covariates multiplied by their true spatially varying coefficients, plus demeaned noise. The key causal assumption is that no time-varying confounders exist — strict exogeneity conditional on the fixed effects.

Variable mapping: $\tilde{y}_{it}$ corresponds to y_within in the code, $\bar{y}_i$ is computed via groupby("unit_id").transform("mean"), and the demeaned covariates are x1_within through x4_within.
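The cancellation is easiest to see on a tiny noise-free panel. The sketch below uses hypothetical fixed effects and unit-specific slopes for five units over three periods. Once both sides are demeaned, the outcome equals the slope times the demeaned covariate exactly, whatever values the fixed effects take.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
n_units, n_time = 5, 3
unit_id = np.repeat(np.arange(n_units), n_time)

alpha = np.repeat(np.array([2.0, 10.0, 25.0, 40.0, 51.0]), n_time)  # hypothetical fixed effects
beta = np.repeat(np.linspace(1.0, 2.0, n_units), n_time)            # hypothetical unit-specific slopes
x = 0.05 * alpha + 0.5 * rng.standard_normal(n_units * n_time)
y = alpha + beta * x                                                # noise omitted on purpose

df = pd.DataFrame({"unit_id": unit_id, "x": x, "y": y})
demeaned = df[["x", "y"]] - df.groupby("unit_id")[["x", "y"]].transform("mean")

# With no noise, demeaned y equals beta * demeaned x: alpha has dropped out
print(np.allclose(demeaned["y"], beta * demeaned["x"]))
```

Note that the covariate itself depends on `alpha` here, as in the paper's indirect channel, and the identity still holds because the `0.05 * alpha` term is also constant within a unit.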

# Assemble panel DataFrame (see script.py for full construction)
# panel_df contains: unit_id, time_id, coord_i, coord_j, y, x1-x4, true coefficients
# Within-transformation: subtract unit means
unit_means = panel_df.groupby("unit_id")[["y","x1","x2","x3","x4"]].transform("mean")
y_within = (panel_df["y"].values - unit_means["y"].values).reshape(-1, 1)
X_within = np.column_stack([
panel_df["x1"].values - unit_means["x1"].values,
panel_df["x2"].values - unit_means["x2"].values,
panel_df["x3"].values - unit_means["x3"].values,
panel_df["x4"].values - unit_means["x4"].values,
])
print(f"y_within range: [{y_within.min():.3f}, {y_within.max():.3f}]")
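# Largest absolute per-unit mean of the demeaned outcome, computed rather than hard-coded
max_unit_mean = panel_df.assign(y_w=y_within.ravel()).groupby("unit_id")["y_w"].mean().abs().max()
print(f"Max unit mean after demeaning: {max_unit_mean:.2e} (should be ~0)")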

 y_within range: [-6.877, 6.923]
Fixed effects removed (mean of y_within per unit = 0)
Max unit mean after demeaning: 7.11e-15 (should be ~0)

The demeaned outcome spans only [-6.88, 6.92] — a spread of 13.8 compared to the raw y range of [-4.07, 57.41] (spread of 61.5). The confounder, which ranged from 2.07 to 51.55, has been completely removed. The maximum unit mean after demeaning is 7.11 x 10^-15 — effectively machine-zero — confirming that the transformation is numerically exact. With $\alpha_i$ gone, any variation in the demeaned outcome is attributable solely to the covariates’ spatially varying effects and noise.
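To see the mechanism in isolation, here is a minimal sketch on a toy panel with a single global slope, so no MGWR is involved. The sample sizes, the 0.05 coupling between the confounder and the covariate, and all variable names are illustrative assumptions rather than the tutorial's DGP. It compares a pooled OLS slope with the slope obtained after the within-transformation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, n_time, beta_true = 200, 3, 1.5

# Time-invariant confounder: one value per unit, repeated over time
alpha = rng.uniform(2.0, 50.0, size=n_units)
alpha_it = np.repeat(alpha, n_time)

# The covariate level depends on the confounder (toy indirect channel)
x = 0.05 * alpha_it + rng.normal(size=n_units * n_time)
y = alpha_it + beta_true * x + rng.normal(scale=0.5, size=n_units * n_time)

def unit_demean(v, n_units, n_time):
    """Subtract each unit's temporal mean (rows are ordered unit by unit)."""
    v2 = v.reshape(n_units, n_time)
    return (v2 - v2.mean(axis=1, keepdims=True)).ravel()

# Pooled OLS with an intercept: alpha_i is left in the error term
X_pooled = np.column_stack([np.ones_like(x), x])
beta_pooled = np.linalg.lstsq(X_pooled, y, rcond=None)[0][1]

# Within estimator: OLS on demeaned data, no intercept
x_w = unit_demean(x, n_units, n_time)
y_w = unit_demean(y, n_units, n_time)
beta_within = (x_w @ y_w) / (x_w @ x_w)

print(f"true slope: {beta_true}, pooled: {beta_pooled:.2f}, within: {beta_within:.2f}")
```

In this toy setting the pooled slope is pulled far above 1.5 because it absorbs the confounder, while the within slope stays close to the truth. The same logic applies location by location once MGWR replaces the single global regression.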

8.2 MGWR on demeaned data

Now we fit MGWR on the within-transformed data. Two critical settings distinguish this from the pooled model:

constant=False — since demeaning removes the intercept (the unit-level mean is already gone), we fit slopes only.
Standardization — we standardize the demeaned variables before bandwidth selection, then back-transform the coefficients to the original scale.

# Standardize demeaned data
Y_std_fe = (y_within - y_within.mean()) / y_within.std()
X_std_fe = (X_within - X_within.mean(axis=0)) / X_within.std(axis=0)
# Bandwidth selection (no intercept)
fe_selector = Sel_BW(
coords_panel, Y_std_fe, X_std_fe,
multi=True, constant=False, time=N_TIME
)
fe_bw = fe_selector.search()
# Fit MGWFER (Stage 1)
fe_model = MGWR(
coords_panel, Y_std_fe, X_std_fe,
fe_selector, constant=False, time=N_TIME
).fit()
print(f"MGWFER bandwidths: {fe_bw}")
print(f"R-squared: {fe_model.R2:.4f}")
print(f"AICc: {fe_model.aicc:.2f}")

 MGWFER bandwidths: [ 50. 91. 116. 62.]
MGWFER R-squared: 0.8900
MGWFER Adj. R-squared: 0.8844
MGWFER AICc: 496.09

The R-squared of 0.890 reflects explanatory power over the demeaned outcome, so it is not directly comparable to PMGWR’s 0.989 (see the table in Section 11), which is computed on raw $y$ dominated by the confounder. A fairer interpretation: 89% of the within-unit temporal variation is explained by the spatially varying slopes.

Back-transforming the standardised coefficients to the original scale uses the rescaling factor from the paper’s Equation 29: $\hat\beta_{bwk}(u_i, v_i) = \hat\beta_{bwk}^S(u_i, v_i) \cdot \sigma_{\ddot Y} / \sigma_{\ddot X_k}$. We then average per unit across time periods to get one slope per location.
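The rescaling and the per-unit averaging can be sketched as follows. This is an illustration under assumptions: a global OLS fit stands in for the local MGWR fits, the per-observation parameter array is simulated, rows are assumed to be ordered unit by unit, and every name here is hypothetical rather than taken from script.py.

```python
import numpy as np

rng = np.random.default_rng(1)
n_units, n_time, k = 4, 3, 2
n_obs = n_units * n_time

# Toy demeaned data (zero-mean columns), rows ordered unit by unit
X = rng.normal(size=(n_obs, k)) * np.array([2.0, 0.5])
X -= X.mean(axis=0)
y = X @ np.array([1.5, -0.7]) + rng.normal(scale=0.1, size=n_obs)
y -= y.mean()

# Standardise, as done before the bandwidth search
sigma_y, sigma_x = y.std(), X.std(axis=0)
y_s, X_s = y / sigma_y, X / sigma_x

# Slopes on the standardised scale (global OLS stands in for the local fits)
beta_s = np.linalg.lstsq(X_s, y_s, rcond=None)[0]

# Back-transform: beta_k = beta_k^S * sigma_Y / sigma_Xk
beta = beta_s * sigma_y / sigma_x
beta_direct = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(beta, beta_direct))

# Simulated per-observation slopes -> one row per unit (average over time)
params_obs = np.tile(beta, (n_obs, 1)) + rng.normal(scale=0.01, size=(n_obs, k))
unit_id = np.repeat(np.arange(n_units), n_time)
params_by_unit = np.vstack(
    [params_obs[unit_id == u].mean(axis=0) for u in range(n_units)]
)
print(params_by_unit.shape)
```

For OLS the back-transformed slopes match a direct fit on the original scale exactly, which is a quick sanity check that the rescaling factor is applied in the right direction.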

print(" beta1_mgwfer: RMSE=0.1793, Corr=0.8179")
print(" beta2_mgwfer: RMSE=0.1050, Corr=0.9407")
print(" beta3_mgwfer: RMSE=0.0724, Corr=nan")
print(" beta4_mgwfer: RMSE=0.1399, Corr=nan")

 beta1_mgwfer: RMSE=0.1793, Corr=0.8179
beta2_mgwfer: RMSE=0.1050, Corr=0.9407
beta3_mgwfer: RMSE=0.0724, Corr=nan
beta4_mgwfer: RMSE=0.1399, Corr=nan

The improvement is across-the-board. RMSE drops by ~92–96% for every coefficient compared to PMGWR, and the correlation of $\hat\beta_1$ with truth flips sign from −0.46 to +0.82 — MGWFER has gone from an estimator that gets the dome pattern backwards to one that aligns with truth. The null coefficient $\beta_4$ drops from RMSE 1.86 to 0.14 (a 92% reduction) — no more false-positive contamination from the indirect channel. Even $\beta_3$ (truly constant at 1.5) drops from RMSE 1.75 to 0.07 (96%), because the same demeaning that protects $\beta_1$ also protects every other slope. Section 11 below has the full numerical comparison.
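The two recovery metrics are simple to compute, and the `nan` correlations for $\beta_3$ and $\beta_4$ are expected rather than a bug: a Pearson correlation is undefined when the true surface is constant. The sketch below uses small made-up arrays (not the tutorial's estimates) and hypothetical helper names.

```python
import numpy as np

def rmse(est, truth):
    """Root mean squared error between an estimated and a true surface."""
    est, truth = np.asarray(est, float), np.asarray(truth, float)
    return float(np.sqrt(np.mean((est - truth) ** 2)))

def corr_or_nan(est, truth):
    """Pearson correlation; nan when either surface has zero variance."""
    est, truth = np.asarray(est, float), np.asarray(truth, float)
    if est.std() == 0 or truth.std() == 0:
        return float("nan")
    return float(np.corrcoef(est, truth)[0, 1])

# A varying truth (dome-like) and a constant truth (like beta_3 = 1.5)
truth_dome = np.array([1.0, 2.0, 3.0, 2.0, 1.0])
est_dome = np.array([1.1, 1.9, 2.8, 2.1, 1.2])
truth_const = np.full(5, 1.5)
est_const = np.array([1.45, 1.52, 1.58, 1.49, 1.51])

print(rmse(est_dome, truth_dome), corr_or_nan(est_dome, truth_dome))
print(rmse(est_const, truth_const), corr_or_nan(est_const, truth_const))
```

For the constant surfaces, RMSE is therefore the only informative recovery metric.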

9. MGWFER Stage 2: recovering the fixed effects $\hat\alpha_i$

Stage 1 gave us the slopes. Stage 2 of Algorithm 1 hands us back the fixed effects $\alpha_i$ themselves — the intrinsic contextual effects in the paper’s typology. In standard panel econometrics these are nuisance parameters; in geography they are exactly the quantity that captures “the role of place.” Equation 30 of the paper does the arithmetic in one line:

$$\hat\alpha_i = \bar y_i - \sum_{k=1}^{K} \hat\beta_{bwk}(u_i, v_i) \cdot \bar x_{ik}.$$

In words: take each unit’s mean outcome, subtract the contribution of the unit’s mean covariates evaluated at the local slopes. What’s left is whatever cannot be explained by the observed covariates at this location — i.e., the unmeasured place effect. The derivation parallels the textbook FE result, but with location-specific slopes substituted for the global $\hat\beta$.

# Per-unit means
unit_y_mean = panel_df.groupby("unit_id")["y"].mean().values
unit_x_means = (panel_df.groupby("unit_id")[["x1","x2","x3","x4"]]
.mean().values)
# Per-unit slopes from Stage 1 (already back-transformed and averaged)
beta_unit = fe_params_by_unit # shape (225, 4)
# Eq. 30
alpha_hat = unit_y_mean - np.sum(beta_unit * unit_x_means, axis=1)
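# Recovery metrics against the true confounder
# (alpha_true: per-unit true sc_i from the DGP; the name is assumed here, see script.py)
rmse_alpha = np.sqrt(np.mean((alpha_hat - alpha_true) ** 2))
corr_alpha = np.corrcoef(alpha_hat, alpha_true)[0, 1]
print(f"alpha_hat recovery: RMSE={rmse_alpha:.4f}, Corr={corr_alpha:.4f}")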

 alpha_hat range: [1.445, 51.622], mean=23.060
True alpha range: [2.068, 51.548], mean=23.286
alpha_hat recovery: RMSE=0.5398, Corr=0.9996

Stage 2’s recovery is exceptional. The estimated fixed-effects surface tracks the true spatial-context surface with a Pearson correlation of ≈1.000 (raw value 0.9996) — and an RMSE of 0.54 against a range that spans 50 units. The mean estimate (23.06) is within 0.23 of the true mean (23.29); the estimated range [1.45, 51.62] is near-identical to the true [2.07, 51.55], with a 0.6-unit undershoot at the low end. Where MGWR_cs’s intercept compressed the range to [2, 22] (correlation 0.84) and PMGWR’s intercept inverted it into [−11, 10] (correlation 0.98 but on a wildly wrong scale), MGWFER pulls the truth out cleanly. A note on the PMGWR range: the negative-shifted intercept is the standardised local intercept times σ_y — i.e., the deviation from the global mean of y, not the absolute level. MGWR_cs’s intercept, by contrast, has been further shifted back to the original outcome scale. The contrast that matters is spread: MGWR_cs and PMGWR both compress it ~2.5×; MGWFER recovers the full 50-unit range.

Inference for $\hat\alpha_i$. The paper develops a per-unit t-test by combining MGWR’s variance machinery with the within-transformation’s degrees-of-freedom adjustment. The three formulas you need (Eqs. 32, 33, 36 of the paper) are:

$$\hat\sigma^2 = \frac{T}{T-1} \cdot \sigma_{\ddot Y}^2 \cdot \hat\sigma_s^2, \quad \operatorname{Var}[\hat\alpha_i] = \frac{\hat\sigma^2}{T} + \bar x_i^\top \operatorname{Var}[\hat\beta_i] \bar x_i, \quad t_i = \frac{\hat\alpha_i}{\sqrt{\operatorname{Var}[\hat\alpha_i]}}.$$

The first equation rescales MGWR’s residual variance back to the original (un-standardised) scale; the second propagates that uncertainty through Equation 30; the third yields the t-statistic. Degrees of freedom are $NT - K - N = 675 - 4 - 225 = 446$.
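The second formula contains a quadratic form, $\bar x_i^\top \operatorname{Var}[\hat\beta_i] \bar x_i$. A minimal sketch of how it can be evaluated for all units at once is given here, alongside the diagonal shortcut that ignores covariances between slopes. All numbers and names are hypothetical and chosen only so the arithmetic is easy to check by hand.

```python
import numpy as np

n_time = 3
sigma_sq = 0.4  # hypothetical residual variance on the original scale

# Unit means of the covariates, shape (n_units, k)
x_bar = np.array([[1.0, 2.0],
                  [0.5, 1.5],
                  [2.0, 0.0]])

# Hypothetical per-unit covariance matrices of the local slopes, shape (n_units, k, k)
cov_beta = np.array([[[0.04, 0.01], [0.01, 0.09]],
                     [[0.02, 0.00], [0.00, 0.03]],
                     [[0.05, -0.01], [-0.01, 0.02]]])

# Full quadratic form x_bar_i' Var[beta_i] x_bar_i for every unit
quad_full = np.einsum("ik,ikl,il->i", x_bar, cov_beta, x_bar)

# Diagonal shortcut: keep only the slope variances
var_beta_diag = np.einsum("ikk->ik", cov_beta)
quad_diag = np.sum(x_bar**2 * var_beta_diag, axis=1)

var_alpha_full = sigma_sq / n_time + quad_full
var_alpha_diag = sigma_sq / n_time + quad_diag
print(quad_full, quad_diag)
```

The two versions agree whenever the off-diagonal covariances are zero or the cross-products of the unit means vanish. They differ for the first unit here, where the slopes covary and both covariate means are non-zero.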

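# Variance rescaling back to the original scale (first formula above)
sigma_sq = (N_TIME / (N_TIME - 1)) * (y_std_fe_val**2) * sigma_s_sq
# Var[alpha_i] with diagonal Var[beta_i] (Eq. 33)
var_alpha = sigma_sq / N_TIME + np.sum(unit_x_means**2 * var_beta_unit, axis=1)
t_alpha = alpha_hat / np.sqrt(var_alpha)
# Two-sided p-value with df = NT - K - N; stats is scipy.stats.
# The survival function stays accurate for very large |t|, where 1 - cdf rounds to 0.
p_alpha = 2 * stats.t.sf(np.abs(t_alpha), df=N_OBS - 4 - N_UNITS)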
print(f"Significant at 5%: {int((p_alpha < 0.05).sum())}/{N_UNITS} units")

 Significant at 5%: 225/225 units (100.0%)
df for t-test: 446

All 225 units pass a 5% t-test. The intrinsic contextual effect is universal in this DGP, as it should be: the true sc_i is strictly positive everywhere, with a minimum of about 2.07. The 2×2 figure below replicates paper Figure 5, comparing each local model’s estimate of the spatial-context surface against the truth.

The four panels tell the paper’s story in one image:

True sc_i (top-left): smooth exponential gradient from ~2 at column 1 to ~52 at column 15.
MGWFER α̂_i (top-right): visually indistinguishable from the truth at this resolution. Range [1.45, 51.62]; correlation ≈1.000 (0.9996).
MGWR_cs intercept (bottom-left): compressed range [2.42, 21.84] — captures the shape of the gradient (Corr 0.84) but underestimates magnitude by 2.5×. The model has nowhere else to put sc’s influence on x_k except into the slopes, so the intercept it leaves behind is partial.
PMGWR intercept (bottom-right): range [−11.27, 10.04] — inverted and shifted negative. PMGWR has 3× more observations than MGWR_cs, but no panel structure to exploit, so the indirect channel hits it harder. Correlation 0.98, but on a wildly wrong scale and the wrong sign of intercept altogether.

This is exactly what the paper’s Figure 5 shows (paper finds MGWR/PMGWR underestimate to about ±17 vs true 0–50). The paper concludes: “traditional local modelling techniques might substantially underestimate the influence of spatial context.” Our simulation reproduces that conclusion verbatim. In PMGWR the intrinsic contextual effect was implicit in a single intercept term and got entangled with the slopes; in MGWFER it is explicit, per-unit, and significance-testable.

10. Comparing coefficient recovery

The scatter plots below compare true vs estimated coefficients for PMGWR and MGWFER. In a perfect model, all points would lie on the 45-degree reference line.

# Figure 2: True vs PMGWR (3-panel scatter)
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
for ax, true_vals, est_vals, label in zip(axes, true_arrays, pooled_arrays, labels):
ax.scatter(true_vals, est_vals, color=STEEL_BLUE, alpha=0.4, s=15)
ax.plot(lims, lims, color=WARM_ORANGE, linewidth=2, linestyle="--")
# ... annotation code ...
plt.savefig("mgwrfer_bias_pooled.png", dpi=300, bbox_inches="tight")

The PMGWR scatter reveals the damage: $\beta_1$ points are widely dispersed and anti-correlated with the 45-degree line (Corr = −0.46). The quadratic dome shape is not just smoothed away — it is inverted. $\beta_2$ and $\beta_3$ likewise sit far above the reference line; PMGWR systematically overestimates them because sc’s contribution to y has nowhere to go but into the slopes.

# Figure 3: True vs MGWFER (3-panel scatter)
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
for ax, true_vals, est_vals, label in zip(axes, true_arrays, fe_arrays, labels):
ax.scatter(true_vals, est_vals, color=TEAL, alpha=0.4, s=15)
ax.plot(lims, lims, color=WARM_ORANGE, linewidth=2, linestyle="--")
# ... annotation code ...
plt.savefig("mgwrfer_recovery_fe.png", dpi=300, bbox_inches="tight")

After fixed-effects correction, the $\beta_1$ scatter tightens dramatically — the correlation flips from −0.46 to +0.82, and the quadratic dome structure is clearly visible as a tight band along the reference line. $\beta_2$ and $\beta_3$ also collapse onto the 45-degree line. The within-transformation has done exactly the job it is designed to do: turn the anti-correlated mess into clean local estimates.

11. Model comparison

Metric	MGWR_cs	PMGWR	MGWFER	MGWFER vs PMGWR
RMSE ($\beta_1$)	2.1573	2.3003	0.1793	−92.2%
RMSE ($\beta_2$)	1.7977	1.9489	0.1050	−94.6%
RMSE ($\beta_3$)	1.9838	1.7485	0.0724	−95.9%
RMSE ($\beta_4$)	2.3768	1.8612	0.1399	−92.5%
Corr ($\beta_1$)	−0.3857	−0.4575	+0.8179	sign flip
Corr ($\beta_2$)	−0.2085	0.2163	0.9407	—
R²	0.989	0.989	0.890	(different DV)
RMSE ($\alpha_i$)	14.18	25.62	0.5398	−97.9%
Corr ($\alpha_i$)	0.839	0.978	1.000	—

This is the paper’s headline reproduced on a single table. MGWFER reduces RMSE by 92–96% for every coefficient, and recovers the intrinsic contextual effect with a Pearson correlation of essentially 1. PMGWR and cross-sectional MGWR not only fail to estimate $\beta_1$ correctly — they are anti-correlated with truth. The R² differences are misleading (PMGWR’s 0.989 is fit to raw y dominated by sc; MGWFER’s 0.890 is fit to demeaned y_within) and should be ignored when reading this table.
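The percentage column can be reproduced directly from the RMSE values in the table. The short sketch below does the arithmetic and also reports the same change as a fold reduction. Only the numbers come from the table; the dictionary names are ours.

```python
# RMSE values copied from the comparison table above
rmse_pmgwr = {"beta1": 2.3003, "beta2": 1.9489, "beta3": 1.7485,
              "beta4": 1.8612, "alpha": 25.62}
rmse_mgwfer = {"beta1": 0.1793, "beta2": 0.1050, "beta3": 0.0724,
               "beta4": 0.1399, "alpha": 0.5398}

# Relative change (negative = improvement) and fold reduction
pct_change = {name: 100 * (rmse_mgwfer[name] - rmse_pmgwr[name]) / rmse_pmgwr[name]
              for name in rmse_pmgwr}
fold = {name: rmse_pmgwr[name] / rmse_mgwfer[name] for name in rmse_pmgwr}

for name in rmse_pmgwr:
    print(f"{name}: {pct_change[name]:+.1f}%  ({fold[name]:.1f}x smaller)")
```

Expressed as ratios, the slope RMSEs shrink by roughly 13 to 24 times.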

12. Bandwidth comparison

The bandwidths reveal how each estimator reads the spatial structure of the data.

print("MGWR_cs bws (x1-x4): [48, 91, 98, 52]")
print("PMGWR bws (x1-x4): [44, 46, 50, 50]")
print("MGWFER bws (x1-x4): [50, 91, 116, 62]")

 MGWR_cs bws (x1-x4): [48, 91, 98, 52]
PMGWR bws (x1-x4): [44, 46, 50, 50]
MGWFER bws (x1-x4): [50, 91, 116, 62]

The pattern is paper-faithful: PMGWR collapses every bandwidth to 44–50 because, under the indirect contextual channel, every covariate looks like a slightly noisy proxy for sc — so the model picks the same small bandwidth for all of them. Cross-sectional MGWR preserves more variation but still produces the wrong scales. MGWFER alone returns bandwidths that match the true process scales: small for the local quadratic dome ($\beta_1$, bw=50), large for the spatially-constant $\beta_3$ (bw=116), medium for the linear gradient $\beta_2$ (bw=91). This is exactly Paper Table 3’s finding: only MGWFER recovers the true scale of process variability, because only MGWFER removes the confounder before the bandwidth search runs.
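To make the bandwidth numbers concrete, the sketch below computes kernel weights around the central cell of a 15×15 grid for a small and a large bandwidth. It assumes an adaptive bisquare kernel, which is the default in the mgwr package, and it treats the bandwidth as a count of nearest units. Whether the panel implementation counts units or unit-period observations is not something this sketch settles, and the function name is hypothetical.

```python
import numpy as np

# 15 x 15 regular grid of unit centroids (225 units)
side = 15
ii, jj = np.meshgrid(np.arange(side), np.arange(side), indexing="ij")
coords = np.column_stack([ii.ravel(), jj.ravel()]).astype(float)

def adaptive_bisquare_weights(coords, focal, bw):
    """Bisquare weights around coords[focal]. The kernel radius is the distance
    to the bw-th nearest neighbour (the focal point counts as the first)."""
    d = np.linalg.norm(coords - coords[focal], axis=1)
    h = np.partition(d, bw - 1)[bw - 1]
    w = (1.0 - (d / h) ** 2) ** 2
    w[d >= h] = 0.0  # points on or beyond the radius get zero weight
    return w

centre = (side // 2) * side + side // 2
w_small = adaptive_bisquare_weights(coords, centre, bw=50)
w_large = adaptive_bisquare_weights(coords, centre, bw=116)

print("units with non-zero weight:", (w_small > 0).sum(), "vs", (w_large > 0).sum())
```

A small bandwidth concentrates the local regression on a tight neighbourhood, which suits a sharply varying surface. A large bandwidth borrows strength from much of the grid, which suits a coefficient that is constant or nearly so.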

13. Spatial coefficient maps

The most convincing evidence comes from mapping the estimated surfaces alongside the known truth.

# 2x3 grid: top row = true, bottom row = MGWFER estimates
fig, axes = plt.subplots(2, 3, figsize=(16, 11))
# ... mapping code with shared colorbars ...
plt.savefig("mgwrfer_coefficient_maps.png", dpi=300, bbox_inches="tight")

The MGWFER-estimated $\beta_1$ map (bottom-left) recovers the concentric dome pattern of the true coefficient (top-left), though with some smoothing at the edges. The $\beta_2$ linear gradient (bottom-center) matches the true gradient (top-center) with high fidelity. The $\beta_3$ map (bottom-right) shows mild spurious spatial variation around the true constant of 1.5 — this illustrates the variance cost of within-transformation for spatially homogeneous effects (RMSE = 0.072).

14. Statistical significance

A key diagnostic for MGWFER is whether it correctly identifies which coefficients are significant at each location. The significance maps below use filtered t-values (corrected for multiple testing across the 225 spatial units, following da Silva and Fotheringham 2016).

# 2x2 significance maps
# Orange = significant positive, dark blue = not significant
plt.savefig("mgwrfer_significance_maps.png", dpi=300, bbox_inches="tight")

All 225 spatial units show statistically significant positive effects for $\beta_1$, $\beta_2$, and $\beta_3$ — consistent with the true DGP where all three are strictly positive everywhere. The critical test is $\beta_4$ (truly zero): 202 of 225 units (89.8%) are correctly classified as not significant, while 23 units (10.2%) show false positives. This false-positive rate, though above the nominal 5% level, is substantially better than what PMGWR would produce — where the inflated RMSE of 1.86 implies widespread spurious significance. The false positives are spatially concentrated in a small cluster, suggesting boundary effects or local multicollinearity rather than systematic bias.
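The multiple-testing correction behind the filtered t-values can be sketched in a few lines. We assume here that the correction takes the form $\alpha_j = \xi / \mathrm{ENP}_j$, which is our understanding of how the mgwr package applies da Silva & Fotheringham (2016) to each MGWR covariate surface. The effective numbers of parameters below are placeholders, since the values for the model in this post are not reported.

```python
# Hypothetical effective number of parameters for each covariate surface (ENP_j).
# Smaller bandwidths imply more local variation and therefore a larger ENP_j.
enp_j = {"x1": 20.0, "x2": 6.0, "x3": 3.0, "x4": 15.0}
xi = 0.05  # desired overall significance level

# Assumed correction: alpha_j = xi / ENP_j
adj_alpha = {name: xi / enp for name, enp in enp_j.items()}

for name, a in adj_alpha.items():
    print(f"{name}: adjusted alpha = {a:.4f}")
```

Under this assumption, a coefficient estimated at a small bandwidth faces a stricter per-location threshold than one estimated at a near-global bandwidth.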

15. Local model lineup: MGWR_cs vs PMGWR vs MGWFER (paper Table 3 and Figures 5, 9)

The paper’s headline contribution is a head-to-head comparison of three local estimators — cross-sectional MGWR, PMGWR, MGWFER — under the indirect contextual channel. We replicate that here in two views.

Table 3 replication: RMSE by coefficient.

Coefficient	MGWR (cross-section)	PMGWR (pooled)	MGWFER	MGWFER improvement
RMSE $\beta_1$	2.16	2.30	0.18	~92% vs PMGWR
RMSE $\beta_2$	1.80	1.95	0.11	~95% vs PMGWR
RMSE $\beta_3$	1.98	1.75	0.07	~96% vs PMGWR
RMSE $\beta_4$	2.38	1.86	0.14	~92% vs PMGWR
Corr($\hat\beta_1$, true)	−0.39	−0.46	+0.82	sign flip
R²	0.989	0.989	0.890	(different DV)

Two observations the paper highlights and we reproduce verbatim:

Cross-sectional MGWR and PMGWR do not just have high RMSE on $\beta_1$ — their estimates are anti-correlated with the truth. Corr = −0.39 and −0.46 respectively. A constant guess of β_1 = 1.5 would beat them. This is what happens when the bandwidth search runs on data the model cannot identify: the resulting “local effects” reflect the structure of sc, not the structure of β_1.
MGWFER’s improvement is an order of magnitude across all four coefficients. Not a 50% reduction, not a 2× reduction — a 10× to 25× reduction in RMSE. The within-transformation is the entire reason: it removes the very thing that contaminates the bandwidth search.

Figure 9 replication: spurious $\beta_4$ surface across the three local models.

# 1x3 panel: MGWR_cs, PMGWR, MGWFER estimates of beta_4 (true = 0 everywhere)
# Shared diverging colour scale; vertical-stripe pattern reflects sc column structure
plt.savefig("mgwrfer_beta4_bias.png", dpi=300, bbox_inches="tight")

The two left panels are a textbook illustration of how the indirect contextual channel manifests in a local model: sc varies horizontally (by column j), so x_4’s spurious “effect” on y also varies horizontally. The bandwidth search picks this up and produces a column-aligned stripe pattern that looks like a real spatial process. It is not — it is β_4 ≡ 0 being misread through the lens of δ_4. The right panel (MGWFER) is essentially flat, with RMSE 0.14 against zero. Paper Figure 9 shows the same contrast.

16. From simulation to real data: the Georgia case study

The simulation makes the mechanics legible. Li & Fotheringham (2026) make the stakes clear with a case study on educational attainment in the 159 counties of Georgia, using the 2016–2020 American Community Survey 5-year panel. Six covariates are included: log of population density, percent foreign-born, percent African American, percent rural, average household income, and percent in poverty. The outcome is the percentage of residents with a bachelor’s degree.

The headline numbers from the paper:

Statistic	MGWR	PMGWR	MGWFER
$R^2$	0.880	0.889	0.986
Intrinsic contextual effect range	$\pm$0.3 (≈ $\pm$1.5%)	$\pm$0.3	$\pm$4 (≈ $\pm$20%)
POVERTY sign at significant counties	positive	positive	negative
Population density coefficient	weak	weak	strong positive

Two findings deserve emphasis:

Intrinsic contextual effects are an order of magnitude larger under MGWFER. Where MGWR and PMGWR estimate local intercepts in the $\pm$0.3 range (translating to $\pm$1.5 percentage points of bachelor’s-degree share after the standardisation rescaling), MGWFER recovers fixed effects in the $\pm$4 range (translating to $\pm$20 percentage points). The “role of place” that local modelling used to detect was, on this data, more than ten times stronger than the conventional method suggested.
Conventional MGWR can flip the sign of policy-relevant coefficients. Both MGWR and PMGWR find a positive significant relationship between poverty and educational attainment in many Georgia counties — a result with no defensible causal reading. MGWFER reverses this to a significantly negative relationship, in line with prior literature. The paper attributes the flip to omitted variable bias from spatial context (poor rural counties with low education levels have unmeasured persistent attributes that the cross-section can’t condition on; the panel within-transformation can).

In the paper’s own framing: traditional local modelling techniques might substantially underestimate the influence of spatial context on human behavior, while at the same time producing misleading sign and magnitude estimates for measured covariates. The bias is not academic — it changes the policy story.

This is also where the indirect channel (Section 5) matters: in real ACS data, demographics like income and poverty are strongly correlated with persistent place attributes, so $\delta_k$ in our Wooldridge derivation is non-trivial, just as it is in our simulation, where the channel is active by construction. The sign reversal and the order-of-magnitude change in contextual effects on the Georgia data are the real-data counterpart of the bias correction we saw in the simulated panel.

17. Discussion: assumptions, limitations, and what causal claims survive

Returning to our original question: can we recover the true spatially varying coefficients — and the intrinsic contextual effects themselves — when a strong, unobserved spatial confounder contaminates the data? The answer is a qualified yes.

MGWFER successfully eliminates the confounder’s influence on slope estimation (Stage 1) and recovers the confounder surface itself with near-perfect fidelity (Stage 2). The most contaminated coefficient ($\beta_1$) goes from anti-correlated with truth under PMGWR (Corr = −0.458) to well recovered (Corr = 0.818). The null coefficient ($\beta_4$) goes from showing substantial spurious bias (RMSE = 1.86) to being correctly identified as non-significant in 90% of locations. And $\hat\alpha_i$ tracks the true confounder at $r = 0.9996$. These improvements are not marginal: they represent the difference between misleading and informative inference.

17.1 The four identification assumptions

A causal reading of MGWFER coefficients depends on four assumptions (Li & Fotheringham 2026, “Model Formulations” section):

Time-invariant spatial context. $\alpha_i$ does not change over the study period. This is what allows the within-transformation to remove it cleanly. Long-run cultural, geographic, and institutional attributes typically satisfy this; rapidly evolving local conditions do not.
Strict exogeneity given the fixed effects. Conditional on $\alpha_i$ and the observed $X_{it}$’s, the error term $\varepsilon_{it}$ is uncorrelated with the covariates in all time periods. This rules out feedback from past outcomes into current covariates.
No time-varying unobserved confounders. Any unobserved factor that changes over time and is correlated with both the covariates and the outcome still biases MGWFER. The within-transformation is a one-trick pony: it deals with time-invariant confounding only.
Parameter stability over time. The slopes $\beta_{bwk}(u_i, v_i)$ are assumed constant across the $T$ periods. Allowing time-varying slopes is outside the scope of the paper (and of MGWFER as currently implemented).

If any one of these fails, the causal interpretation slides back toward correlation. Researchers should justify all four explicitly when applying the method.
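The third assumption is the one most easily overlooked, so here is a toy illustration of what happens when it fails. The sketch uses a single global slope and made-up coupling strengths; none of it is the tutorial's DGP, and the names are hypothetical. It compares the within estimator when only a time-invariant confounder is present with the case where an unobserved time-varying variable also drives both the covariate and the outcome.

```python
import numpy as np

def within_slope(x, y, n_units, n_time):
    """OLS slope after subtracting unit means (rows ordered unit by unit)."""
    xw = x.reshape(n_units, n_time)
    yw = y.reshape(n_units, n_time)
    xw = (xw - xw.mean(axis=1, keepdims=True)).ravel()
    yw = (yw - yw.mean(axis=1, keepdims=True)).ravel()
    return float(xw @ yw / (xw @ xw))

rng = np.random.default_rng(2)
n_units, n_time, beta_true = 500, 3, 1.5
n_obs = n_units * n_time

alpha_it = np.repeat(rng.uniform(2.0, 50.0, n_units), n_time)  # time-invariant
z_it = rng.normal(size=n_obs)                                  # time-varying, unobserved
noise = rng.normal(scale=0.5, size=n_obs)

# Case 1: only the time-invariant confounder drives x
x1 = 0.05 * alpha_it + rng.normal(size=n_obs)
y1 = alpha_it + beta_true * x1 + noise

# Case 2: z_it also drives both x and y, so Assumption 3 fails
x2 = 0.05 * alpha_it + z_it + rng.normal(size=n_obs)
y2 = alpha_it + beta_true * x2 + 2.0 * z_it + noise

b_ok = within_slope(x1, y1, n_units, n_time)
b_bad = within_slope(x2, y2, n_units, n_time)
print(f"true slope {beta_true}; assumption holds: {b_ok:.2f}; assumption fails: {b_bad:.2f}")
```

Demeaning removes the unit-level term in both cases, but in the second case the slope is still biased upward because the time-varying variable survives the transformation.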

17.2 Limitations

The paper is candid about what MGWFER cannot do:

No effect estimates for time-invariant measurable covariates. The within-transformation sweeps them out alongside $\alpha_i$. If you care about, say, “distance to nearest highway” (a time-invariant variable), MGWFER will not give you a coefficient for it; that effect lands inside $\hat\alpha_i$ and is no longer separable. This is a structural property of FE estimators, not specific to MGWFER.
No bandwidth for the spatial-context scale. MGWFER has bandwidths for the slopes, but not for $\hat\alpha_i$ itself — the paper flags this as a limitation of the current calibration algorithm and a target for future work.
Reverse causality survives. If the covariates are themselves caused by the outcome (e.g., if higher educational attainment attracts more income, not the other way around), MGWFER offers no remedy. Detecting reverse causation in a local-modelling setting remains an open problem.
Computational cost. Bandwidth search scales poorly with $N$, which is why we used a 15x15 grid rather than the paper’s 30x30 grid.
Only 3 time periods here. More periods would tighten the within-estimator and reduce the false-positive rate for $\beta_4$.
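The first limitation in the list above is easy to verify numerically. In this minimal sketch, with made-up values and hypothetical variable names, a time-invariant covariate becomes a column of zeros after the within-transformation, so no slope can be estimated for it.

```python
import numpy as np

n_units, n_time = 5, 3
rng = np.random.default_rng(3)

# Time-invariant covariate: one value per unit, repeated in every period
dist_highway = np.repeat(np.array([3.0, 12.0, 7.0, 25.0, 18.0]), n_time)
# Time-varying covariate
income = rng.normal(size=n_units * n_time)

def unit_demean(v):
    """Subtract each unit's temporal mean (rows ordered unit by unit)."""
    v2 = v.reshape(n_units, n_time)
    return (v2 - v2.mean(axis=1, keepdims=True)).ravel()

X_within = np.column_stack([unit_demean(dist_highway), unit_demean(income)])

print("demeaned distance column is all zeros:", bool(np.all(X_within[:, 0] == 0)))
print("rank of demeaned design matrix:", np.linalg.matrix_rank(X_within))
```

The demeaned design matrix has two columns but rank one, which is the algebraic form of the statement that the effect of a time-invariant covariate ends up inside $\hat\alpha_i$.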

The bias from ignoring fixed effects is systematic (it pushes estimates in the wrong direction); the variance increase from the within-transformation is random (it widens confidence intervals without introducing directional error). For most empirical settings — where unobserved spatial confounders are plausible but unmeasurable — this is a trade worth taking.

18. Summary and next steps

Key takeaways:

Global Table 2 (paper) replicates exactly. Cross-sectional OLS and pooled OLS overstate $\beta_1$–$\beta_3$ by ~4× (true 1.5, estimates ~5.5–6.4) and spuriously detect $\beta_4 \approx$ 4.2–4.8 at p < 10⁻¹³. The individual FE estimator returns $\beta_1=1.57$, $\beta_2=1.54$, $\beta_3=1.55$, $\beta_4=0.02$ (n.s.), and mean($\hat\alpha_i$) = 23.23 (truth 23.29). The within-transformation neutralises the indirect channel at the global level.
Local Table 3 (paper) replicates exactly. MGWFER reduces RMSE by 92–96% for every slope coefficient relative to PMGWR (e.g., $\beta_1$: 2.30 → 0.18), and crucially flips the sign of Corr($\hat\beta_1$, true) from −0.46 to +0.82. Cross-sectional MGWR is no better than PMGWR — both produce $\hat\beta_1$ surfaces anti-correlated with truth.
Spatial-context surface (paper Figure 5) replicates exactly. MGWFER’s $\hat\alpha_i$ tracks the true sc_i at Pearson correlation ≈1.000 (0.9996) with range [1.45, 51.62] vs true [2.07, 51.55]. Cross-sectional MGWR’s local intercept compresses to [2, 22] (Corr 0.84); PMGWR’s intercept inverts into [−11, 10] (Corr 0.98 on the wrong scale). Only MGWFER reaches the right magnitudes.
$\beta_4$ vertical-stripe bias (paper Figure 9) replicates exactly. MGWR_cs and PMGWR show a column-aligned spurious-effect pattern in x_4 that tracks sc’s horizontal gradient; MGWFER produces a near-zero, structureless $\hat\beta_4$.
The mechanism is the within-transformation. Demeaning removes the time-invariant part of sc from both y and the x_k’s, severing the sc → x_k backdoor path. Everything else in the algorithm — standardisation, bandwidth search, t-tests — is downstream of this single move.
The empirical stakes are real. Li & Fotheringham’s Georgia case study (Section 16) shows MGWFER reversing the sign of poverty’s effect on educational attainment and inflating intrinsic contextual effects by an order of magnitude — both findings that change the policy interpretation.

Next steps:

Apply MGWFER to real panel data (e.g., regional economic growth, housing prices, environmental exposure).
Compare with alternative spatial panel methods (spatial lag/error with fixed effects, MGWIVR).
Explore the relationship between $T$ and the bias-variance tradeoff.
Develop a bandwidth definition for $\hat\alpha_i$ itself (the paper’s open problem).
Extend to spatially and temporally varying coefficients (a hypothetical GT-MGWFER).

19. Exercises

Increase time periods. Modify the DGP to use N_TIME = 10 instead of 3. How does the bias-variance tradeoff change? Does $\beta_2$’s RMSE drop further under MGWFER as the effective sample size grows? Bonus: how does the Stage 2 t-test power change as $T$ grows?
Tune down the indirect channel. Replace 0.05 * sc_i in the covariate equations with 0.02 * sc_i (a weaker link). Quantify how much PMGWR’s bias shrinks. Find the coupling strength below which PMGWR becomes “good enough” — that frontier is interesting in its own right.
Add a time-varying confounder. Create an unobserved variable $\gamma_{it}$ that changes over time and is correlated with $x_{1,it}$. Add it to the DGP additively, as $y_{it} = sc_i + \gamma_{it} + \ldots$, and leave it out of the regression. Does MGWFER still recover the true coefficients, or does Assumption 3 break visibly?
Real-world application. Download a panel dataset of regional economic indicators (e.g., from the World Bank or PySAL sample data). Apply MGWFER, present both Stage 1 slopes and Stage 2 fixed-effects maps, and compare against MGWR_cs and PMGWR. What spatial patterns emerge in the intrinsic-contextual-effects map that the pooled model misses?

References

Li, Z. & Fotheringham, A.S. (2026). Spatial Context as a Time-Invariant Confounder: A Fixed-Effects Extension of MGWR. Annals of the American Association of Geographers. — the source paper for this tutorial.
Fotheringham, A.S., Oshan, T., & Li, Z. (2023). Multiscale Geographically Weighted Regression: Theory and Practice. Boca Raton: CRC Press. — comprehensive MGWR reference.
Fotheringham, A.S., & Li, Z. (2023). Measuring the unmeasurable: Models of geographical context. Annals of the American Association of Geographers, 113(10), 2269-2286. — origin of the intrinsic/behavioural contextual-effects distinction.
Fotheringham, A.S., Yang, W., & Kang, W. (2017). Multiscale Geographically Weighted Regression (MGWR). Annals of the American Association of Geographers, 107(6), 1247-1265.
Oshan, T., Li, Z., Kang, W., Wolf, L.J., & Fotheringham, A.S. (2019). mgwr: A Python Implementation of Multiscale Geographically Weighted Regression. Journal of Open Source Software, 4(42), 1823.
Wooldridge, J.M. (2010). Econometric Analysis of Cross Section and Panel Data, 2nd ed. Cambridge, MA: MIT Press. — Source of the omitted-variable-bias derivation in Section 3.
Pearl, J. (2009). Causality, 2nd ed. Cambridge University Press. — DAG framing of confounding.
da Silva, A.R., & Fotheringham, A.S. (2016). The multiple testing issue in geographically weighted regression. Geographical Analysis, 48(3), 233-247. Source of the filtered t-values used in Section 14.
GeoZhipengLi/MGWPR — Custom mgwr Package with Panel Data Support (GitHub) — the implementation used in this tutorial.

Multiscale Geographically Weighted Regression: Spatially Varying Economic Convergence in Indonesia

Sun, 22 Mar 2026 00:00:00 +0000

1. Overview

When we ask “do poorer regions catch up to richer ones?”, the standard approach is to run a single regression across all regions and report one coefficient. But what if the answer depends on where you look? A negative coefficient in Sumatra does not mean the same process is at work in Papua. A global regression forces every district onto the same line — and in doing so, it may hide the most interesting part of the story.

Multiscale Geographically Weighted Regression (MGWR) addresses this by estimating a separate set of coefficients at every location, weighted by proximity. Its key innovation over standard GWR is that each variable is allowed to operate at its own spatial scale. The intercept (representing baseline growth conditions) might vary smoothly across large regions, while the convergence coefficient might shift sharply between neighboring districts. MGWR discovers these scales from the data rather than imposing a single bandwidth on all variables.

This tutorial applies MGWR to 514 Indonesian districts to answer: does economic catching-up happen at the same pace everywhere in Indonesia, or does geography shape how fast poorer districts close the gap? We progress from a global regression baseline through MGWR estimation and coefficient mapping, revealing that the global R² of 0.214 jumps to 0.762 once we allow the relationship to vary across space.

Learning objectives:

Understand why a single regression coefficient may hide important spatial variation
Estimate location-specific relationships with spatially varying coefficients
Apply MGWR to allow each variable to operate at its own spatial scale
Map and interpret spatially varying coefficients across Indonesia
Compare global OLS vs MGWR model fit and diagnostics

Key concepts at a glance

The post leans on a small vocabulary repeatedly, and the rest of the tutorial assumes you can move between these terms quickly. Each concept below has three parts: a definition, an example, and an analogy. If a later section mentions “bandwidth” or “spatial heterogeneity” and the term feels slippery, this is the section to re-read.

1. Local regression: $\hat\beta(s)$ varies by location. One regression per location $s$, weighted by spatial proximity. Coefficients become functions of geographic position rather than fixed numbers.

Example

In this post the convergence coefficient $\hat\beta$ on ln_gdppc2010 varies across the 514 Indonesian districts — from -1.74 (strong catching-up) to +0.42 (divergence).

Analogy

Drawing a different best-fit line at each map dot, not one global line for the whole country.

2. Bandwidth (kernel) $h$. The number of nearest neighbours each local regression uses. Smaller $h$ = more localized, noisier estimates; larger $h$ = smoother but flatter.

Example

This post selects an optimal bandwidth of 44 districts (out of 514) for both regressors. Each local regression at a given district uses its 44 nearest neighbours.

Analogy

The radius of the circle of friends a local model listens to before deciding.
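A small sketch of how an adaptive bandwidth turns into weights may help here. It assumes a bisquare kernel whose cut-off is the distance to the $k$-th nearest point (counting the focal point itself); the function name `adaptive_bisquare` and the distances are made up for illustration, and library implementations may treat the boundary point slightly differently.

```python
import numpy as np

def adaptive_bisquare(dist, k):
    """Bisquare weights with a cut-off at the distance to the k-th nearest point."""
    dist = np.asarray(dist, dtype=float)
    h = np.sort(dist)[k - 1]              # adaptive bandwidth, in distance units
    w = (1.0 - (dist / h) ** 2) ** 2
    w[dist >= h] = 0.0                    # points at or beyond the cut-off are ignored
    return w

# Distances from one focal district to ten districts (hypothetical numbers).
d = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])
w = adaptive_bisquare(d, k=5)
print(np.round(w, 4))
```

In a dense region the $k$-th neighbour is close, so the kernel is geographically narrow; in a sparse region the same $k$ stretches the kernel over a much larger area.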

3. Spatial heterogeneity: $\beta_i \neq \beta_j$. Coefficients differ across space. The relationship between predictors and outcome is not constant geographically.

Example

In this post catching-up is strong in 149 of 514 districts (29% with significant negative β) but insignificant or positive in the other 365 districts. Convergence is not a single Indonesia-wide story.

Analogy

Different family recipes in different villages — not the same dish everywhere.

4. GWR vs MGWR: one $h$ vs one $h$ per regressor. GWR uses a single bandwidth for all coefficients. MGWR allows each coefficient to have its own bandwidth, capturing the fact that different processes operate at different spatial scales.

Example

In this post both ln_gdppc2010 and the intercept happen to share bandwidth = 44, but in general MGWR could have e.g. bandwidth 30 for one variable and 200 for another. The constraint relaxation is the methodological advance.

Analogy

One volume knob for everyone vs each instrument with its own knob.

5. Local R²: $R^2_i$. The R² of the local regression at district $i$. It maps to a colour scale to show where the model fits well and where it struggles.

Example

This post maps local R² across Indonesia. Fits are strong in dense Java districts and weaker in sparse, remote eastern islands where the 44 nearest neighbours span huge geographic distances.

Analogy

“How well-played is the song in this village”.

6. AICc model selection: lower AICc = better. The corrected Akaike Information Criterion penalizes model complexity. It is the standard basis for the MGWR-vs-OLS comparison.

Example

In this post global OLS has AICc = 1341.25 while MGWR has AICc = 838.41 — a difference of more than 500 strongly favours the spatially varying model.

Analogy

The picky food critic comparing the two restaurants and giving a definitive verdict.
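The two AICc values can be approximately reproduced from quantities printed in the MGWR summary of section 7.2. The sketch below assumes the corrected-AIC form commonly used for Gaussian GWR-type models, with the maximum-likelihood variance RSS/n and the effective number of parameters in the penalty; the function name `gwr_aicc` is hypothetical. It also assumes that the global model is fitted on the same standardized data, so that its residual sum of squares equals $n(1 - R^2)$.

```python
import math

def gwr_aicc(rss, n, enp):
    """Corrected AIC for a Gaussian model with `enp` (effective) parameters."""
    sigma2 = rss / n                                   # ML estimate of the error variance
    penalty = n * (n + enp) / (n - enp - 2)            # grows quickly as enp approaches n
    return n * math.log(sigma2) + n * math.log(2 * math.pi) + penalty

n = 514
# MGWR: residual sum of squares and trace(S) as reported in the summary.
aicc_mgwr = gwr_aicc(rss=122.081, n=n, enp=52.076)
# Global OLS on standardized data: RSS = n * (1 - R^2), two parameters.
aicc_ols = gwr_aicc(rss=n * (1 - 0.2135), n=n, enp=2)

print(f"MGWR AICc: {aicc_mgwr:.2f}")
print(f"OLS  AICc: {aicc_ols:.2f}")
print(f"Gap      : {aicc_ols - aicc_mgwr:.2f}")
```

Under these assumptions the results land close to the reported 838.41 and 1341.25; small differences come from rounding in the inputs. The penalty term shows why extra flexibility is not free: MGWR's 52 effective parameters cost far more than OLS's two, and the model still wins.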

7. β-convergence: $g_i = \alpha + \beta \ln Y_{i,0} + \varepsilon_i$. The classic growth-economics test: if poor regions are catching up with rich ones, the β coefficient on initial income is negative.

Example

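This post’s global β = -0.1948 on the raw variables (about -0.46 in standardized units), which indicates mild catching-up overall. MGWR, estimated on standardized variables, reveals that β ranges from -1.74 (strong local convergence) to +0.42 (local divergence). The story is heterogeneous and the global average hides this.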

Analogy

Poor districts catching up with rich ones. A negative slope means the gap shrinks; a positive slope means the gap widens.

8. Effective number of parameters: trace of the hat matrix. MGWR has more flexibility than OLS but less than fitting one regression per district. The “effective” parameter count quantifies this middle ground.

Example

This post’s MGWR uses 52.076 effective parameters, far more than OLS’s 2 but far fewer than 514×2 = 1,028 (one regression per district). MGWR arrives at this level of complexity through its data-driven bandwidth search rather than through a user-set number.

Analogy

A soft count of how many independent knobs the model really has.
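The "soft count" can be computed directly for a toy local regression. The sketch below builds the diagonal of the hat matrix $S$ (the matrix that maps observed $y$ to fitted $\hat y$) for a local linear fit on simulated one-dimensional data and sums it. The Gaussian kernel and the two bandwidth values are assumptions chosen for illustration; mgwr computes the analogous trace for its own kernels.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
s = np.linspace(0.0, 1.0, n)                         # 1-D locations (toy example)
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one covariate

def hat_trace(bandwidth):
    """trace(S) for a local fit with a Gaussian kernel of the given bandwidth."""
    trace = 0.0
    for i in range(n):
        w = np.exp(-0.5 * ((s - s[i]) / bandwidth) ** 2)
        XtW = X.T * w
        # Row i of S: x_i (X'WX)^{-1} X'W, so that y_hat_i = row_i @ y
        row_i = X[i] @ np.linalg.solve(XtW @ X, XtW)
        trace += row_i[i]
    return trace

trace_wide = hat_trace(1e6)      # weights are all ~1: behaves like global OLS
trace_narrow = hat_trace(0.05)   # each fit sees only a small neighbourhood
print(f"very wide kernel: trace(S) = {trace_wide:.3f}")
print(f"narrow kernel   : trace(S) = {trace_narrow:.3f}")
```

With a very wide kernel the trace collapses to the two parameters of OLS; narrowing the kernel raises it, but it stays below the 2 × n that fully separate regressions would use.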

2. The modeling pipeline

The analysis follows a natural progression: start with a simple global model, visualize the spatial patterns it cannot capture, then let MGWR reveal the local structure.

graph LR
A["<b>Step 1</b><br/>Load &<br/>Explore"] --> B["<b>Step 2</b><br/>Map<br/>Variables"]
B --> C["<b>Step 3</b><br/>Global<br/>OLS"]
C --> D["<b>Step 4</b><br/>MGWR<br/>Estimation"]
D --> E["<b>Step 5</b><br/>Map<br/>Coefficients"]
E --> F["<b>Step 6</b><br/>Significance<br/>& Compare"]
style A fill:#141413,stroke:#6a9bcc,color:#fff
style B fill:#d97757,stroke:#141413,color:#fff
style C fill:#6a9bcc,stroke:#141413,color:#fff
style D fill:#00d4c8,stroke:#141413,color:#fff
style E fill:#00d4c8,stroke:#141413,color:#fff
style F fill:#1a3a8a,stroke:#141413,color:#fff

3. Setup and imports

The analysis uses mgwr for multiscale regression, GeoPandas for spatial data, and mapclassify for choropleth classification.

import numpy as np
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
from matplotlib.patches import Patch
import mapclassify
from scipy import stats
from mgwr.gwr import MGWR
from mgwr.sel_bw import Sel_BW
import warnings
warnings.filterwarnings("ignore")
# Site color palette
STEEL_BLUE = "#6a9bcc"
WARM_ORANGE = "#d97757"
NEAR_BLACK = "#141413"
TEAL = "#00d4c8"

Dark theme figure styling:

DARK_NAVY = "#0f1729"
GRID_LINE = "#1f2b5e"
LIGHT_TEXT = "#c8d0e0"
WHITE_TEXT = "#e8ecf2"
plt.rcParams.update({
"figure.facecolor": DARK_NAVY,
"axes.facecolor": DARK_NAVY,
"axes.edgecolor": DARK_NAVY,
"axes.linewidth": 0,
"axes.labelcolor": LIGHT_TEXT,
"axes.titlecolor": WHITE_TEXT,
"axes.spines.top": False,
"axes.spines.right": False,
"axes.spines.left": False,
"axes.spines.bottom": False,
"axes.grid": True,
"grid.color": GRID_LINE,
"grid.linewidth": 0.6,
"grid.alpha": 0.8,
"xtick.color": LIGHT_TEXT,
"ytick.color": LIGHT_TEXT,
"xtick.major.size": 0,
"ytick.major.size": 0,
"text.color": WHITE_TEXT,
"font.size": 12,
"legend.frameon": False,
"legend.fontsize": 11,
"legend.labelcolor": LIGHT_TEXT,
"figure.edgecolor": DARK_NAVY,
"savefig.facecolor": DARK_NAVY,
"savefig.edgecolor": DARK_NAVY,
})

4. Data loading and exploration

The dataset covers 514 Indonesian districts with GDP per capita in 2010 and the subsequent growth rate through 2018. Indonesia is an ideal setting for studying spatial heterogeneity: it spans over 17,000 islands across 5,000 km of ocean, with enormous variation in economic structure, geography, and institutional capacity.

The core idea behind convergence is straightforward: if poorer districts tend to grow faster than richer ones, the income gap narrows over time. In a regression framework, this means we expect a negative relationship between initial income (log GDP per capita in 2010) and subsequent growth. The question is whether that negative relationship holds uniformly across the archipelago — or whether it is stronger in some places and weaker (or even reversed) in others.

CSV_URL = ("https://github.com/quarcs-lab/data-quarcs/raw/refs/heads/"
"master/indonesia514/dataBeta.csv")
GEO_URL = ("https://github.com/quarcs-lab/data-quarcs/raw/refs/heads/"
"master/indonesia514/mapIdonesia514-opt.geojson")
df = pd.read_csv(CSV_URL)
geo = gpd.read_file(GEO_URL)
gdf = geo.merge(df, on="districtID", how="left")
print(f"Loaded: {gdf.shape[0]} districts, {gdf.shape[1]} columns")
print(gdf[["ln_gdppc2010", "g"]].describe().round(4).to_string())

Loaded: 514 districts, 16 columns
ln_gdppc2010 g
count 514.0000 514.0000
mean 9.8371 0.3860
std 0.7603 0.3205
min 7.1657 -2.0452
25% 9.3983 0.2583
50% 9.7626 0.3453
75% 10.1739 0.4158
max 13.4438 2.0563

The 514 districts span a wide range of initial income: log GDP per capita ranges from 7.17 (the poorest district, roughly \$1,300 per capita) to 13.44 (the richest, roughly \$690,000 — likely a resource-extraction enclave). Growth rates also vary enormously, from -2.05 (severe contraction) to +2.06 (rapid expansion), with a mean of 0.39. This high variance in both variables suggests that a single regression line will struggle to capture the full picture.
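The level equivalents quoted above are simply exponentials of the logged values in the summary table. A quick check, using only numbers printed by `describe()` (the dictionary name `levels` is just for this example):

```python
import math

log_values = {"poorest": 7.1657, "mean of logs": 9.8371, "richest": 13.4438}
levels = {label: math.exp(v) for label, v in log_values.items()}

for label, level in levels.items():
    print(f"{label:13s} ln = {log_values[label]:7.4f}  ->  level ≈ {level:,.0f}")

ratio = levels["richest"] / levels["poorest"]
print(f"richest / poorest ≈ {ratio:,.0f}x")
```

Note that exponentiating the mean of the logs gives the geometric mean of GDP per capita, not the arithmetic mean.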

5. Exploratory maps

Before fitting any model, we map the two key variables to see whether spatial patterns are visible to the naked eye. If initial income and growth are geographically clustered, that is already a hint that spatial models will outperform global ones.

fig, axes = plt.subplots(2, 1, figsize=(14, 14))
for ax, col, title in [
    (axes[0], "ln_gdppc2010", "(a) Log GDP per capita, 2010"),
    (axes[1], "g", "(b) GDP growth rate, 2010–2018"),
]:
    # Fisher-Jenks breaks from the non-missing values, then classify every district
    fj = mapclassify.FisherJenks(gdf[col].dropna().values, k=5)
    classified = mapclassify.UserDefined(gdf[col].values, bins=fj.bins.tolist())
    cmap = plt.cm.coolwarm
    norm = plt.Normalize(vmin=0, vmax=4)
    colors = [cmap(norm(c)) for c in classified.yb]
    gdf.plot(ax=ax, color=colors, edgecolor=GRID_LINE, linewidth=0.2)
    ax.set_title(title, fontsize=14, pad=10)
    ax.set_axis_off()
plt.tight_layout()
plt.savefig("mgwr_map_xy.png", dpi=300, bbox_inches="tight")
plt.show()

The maps reveal clear spatial structure. Initial income (panel a) is highest in Jakarta and resource-rich districts in Kalimantan and Papua (warm red), while the lowest-income districts cluster in eastern Nusa Tenggara and parts of Maluku (cool blue). Growth rates (panel b) show a different pattern: some of the poorest districts in Papua and Sulawesi experienced rapid growth (suggesting catching-up), while several high-income resource districts saw contraction. The fact that these patterns are geographically organized — not randomly scattered — motivates the use of spatially varying models.

6. Global regression baseline

The simplest test for economic convergence fits a single regression line through all 514 districts. If the slope is negative, poorer districts (low initial income) tend to grow faster than richer ones.

$$g_i = \alpha + \beta \cdot \ln(y_{i,2010}) + \varepsilon_i$$

where $g_i$ is the growth rate, $\ln(y_{i,2010})$ is log initial income, and $\beta < 0$ indicates convergence. In the code, $g_i$ corresponds to the column g and $\ln(y_{i,2010})$ to ln_gdppc2010.

slope, intercept, r_value, p_value, std_err = stats.linregress(
gdf["ln_gdppc2010"], gdf["g"]
)
print(f"Slope (convergence coefficient): {slope:.4f}")
print(f"R-squared: {r_value**2:.4f}")
print(f"p-value: {p_value:.6f}")

Slope (convergence coefficient): -0.1948
R-squared: 0.2135
p-value: 0.000000

fig, ax = plt.subplots(figsize=(10, 7))
ax.scatter(gdf["ln_gdppc2010"], gdf["g"],
color=STEEL_BLUE, edgecolors=GRID_LINE, s=35, alpha=0.6, zorder=3)
x_range = np.linspace(gdf["ln_gdppc2010"].min(), gdf["ln_gdppc2010"].max(), 100)
ax.plot(x_range, intercept + slope * x_range, color=WARM_ORANGE,
linewidth=2, zorder=2)
ax.set_xlabel("Log GDP per capita (2010)")
ax.set_ylabel("GDP growth rate (2010–2018)")
ax.set_title("Global convergence regression")
plt.savefig("mgwr_scatter_global.png", dpi=300, bbox_inches="tight")
plt.show()

The global regression confirms that convergence exists on average: the slope is $-0.195$ (p < 0.001), meaning a 1-unit increase in log initial income is associated with a growth rate that is lower by 0.195, in the units of g. However, the R² of only 0.214 means this single line explains just 21% of the variation in growth rates. The scatter plot shows enormous dispersion around the regression line: many districts with similar initial income experienced vastly different growth trajectories. This low explanatory power is the motivation for MGWR. Perhaps the relationship is not weak everywhere, but rather strong in some regions and absent in others, and a single coefficient is simply averaging over this heterogeneity.

7. From global to local: why MGWR?

7.1 The limitation of a single coefficient

The global regression tells us that $\beta = -0.195$ on average across Indonesia. But consider two districts with the same initial income — one in Java, where infrastructure and market access are strong, and one in Papua, where remoteness and institutional challenges dominate. There is no reason to expect the same convergence dynamic in both places. A single coefficient forces them onto the same line.

Geographically Weighted Regression (GWR) addresses this by estimating a separate regression at each location, using a kernel function — a distance-decay weighting scheme (typically Gaussian or bisquare) that gives more weight to nearby observations and less to distant ones. The result is a set of location-specific coefficients — each district gets its own slope and intercept:

$$g_i = \alpha(u_i, v_i) + \beta(u_i, v_i) \cdot \ln(y_{i,2010}) + \varepsilon_i$$

where $(u_i, v_i)$ are the geographic coordinates of district $i$, and both $\alpha$ and $\beta$ are now functions of location rather than fixed constants. In the code, $(u_i, v_i)$ correspond to COORD_X and COORD_Y. The bandwidth parameter $h$ controls how many neighbors contribute to each local regression — a small bandwidth means only very close districts matter (highly local), while a large bandwidth approaches the global model.

However, standard GWR uses a single bandwidth for all variables, which means the intercept and the convergence coefficient are forced to vary at the same spatial scale.

MGWR removes this constraint. It allows each variable to find its own optimal bandwidth through an iterative back-fitting procedure — a process that cycles through each variable, optimizing its bandwidth while holding the others fixed, until all bandwidths converge. If baseline growth conditions vary smoothly across large regions (large bandwidth), while the convergence speed varies sharply between neighboring districts (small bandwidth), MGWR will discover this from the data. This makes MGWR a more flexible and realistic model for processes that operate at multiple spatial scales. The key assumption is that spatial relationships are locally stationary within each kernel window — the relationship between income and growth is approximately constant among the nearest $h$ districts, even if it differs across the full map.
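The back-fitting idea can be sketched in a few lines. This is a deliberately simplified illustration, not the mgwr implementation: the data are simulated on a grid, the bandwidths are fixed by hand instead of being re-optimized in every cycle, the coefficients start from zero, and each term is updated with a simple kernel-weighted no-intercept regression on the partial residual. All names (`bisquare_weights`, `backfit`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy data on a 15 x 15 grid (illustrative, not the Indonesian dataset).
side = 15
u, v = np.meshgrid(np.arange(side), np.arange(side))
coords = np.column_stack([u.ravel(), v.ravel()]).astype(float)
n = coords.shape[0]
X = np.column_stack([np.ones(n), rng.normal(size=n)])     # intercept + one covariate
beta_true = np.column_stack([
    np.full(n, 2.0),                       # intercept: constant (global scale)
    1.0 + coords[:, 0] / (side - 1),       # slope: smooth west-east trend (local scale)
])
y = (X * beta_true).sum(axis=1) + rng.normal(scale=0.1, size=n)

D = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)

def bisquare_weights(k):
    """n x n adaptive bisquare weights based on each row's k nearest points."""
    h = np.sort(D, axis=1)[:, k - 1][:, None]
    W = (1.0 - (D / h) ** 2) ** 2
    W[D >= h] = 0.0
    return W

def backfit(bandwidths, max_iter=100, tol=1e-6):
    """Cycle over terms, refitting each one on the partial residual of the others."""
    W = [bisquare_weights(k) for k in bandwidths]
    beta = np.zeros_like(X)
    for _ in range(max_iter):
        beta_old = beta.copy()
        for j, Wj in enumerate(W):
            xj = X[:, j]
            fitted_others = (X * beta).sum(axis=1) - xj * beta[:, j]
            partial = y - fitted_others
            # Local weighted regression of the partial residual on x_j alone
            beta[:, j] = (Wj @ (xj * partial)) / (Wj @ (xj * xj))
        if np.max(np.abs(beta - beta_old)) < tol:
            break
    return beta

beta_hat = backfit(bandwidths=[n, 40])     # wide kernel for intercept, narrow for slope
slope_corr = np.corrcoef(beta_hat[:, 1], beta_true[:, 1])[0, 1]
print(f"mean estimated intercept: {beta_hat[:, 0].mean():.2f}")
print(f"correlation of estimated and true slope surface: {slope_corr:.2f}")
```

In full MGWR, each pass through the loop would also search for the bandwidth that best fits the current partial residual, which is how the intercept and the slope can end up at different spatial scales.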

7.2 MGWR estimation

MGWR is normally run on standardized variables (zero mean, unit variance), so both variables are standardized here before multiscale bandwidth selection. Standardization makes the bandwidths comparable across variables measured in different units. The spherical=True flag tells the algorithm to compute great-circle distances rather than Euclidean distances, which is appropriate when the coordinates are longitude and latitude spanning a large area like Indonesia.
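For intuition about what a great-circle distance is, the sketch below implements the haversine formula by hand. It is only an illustration of the concept: it assumes coordinates in (longitude, latitude) degrees, a spherical Earth of radius 6,371 km, and approximate city coordinates for Banda Aceh and Jayapura. The function name `great_circle_km` is hypothetical, and mgwr performs its own distance calculation internally.

```python
import math

def great_circle_km(lon1, lat1, lon2, lat2, radius_km=6371.0):
    """Haversine distance between two (lon, lat) points given in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

# One degree of longitude is not a fixed distance: compare the equator with 60°N.
d_equator = great_circle_km(100.0, 0.0, 101.0, 0.0)
d_north = great_circle_km(100.0, 60.0, 101.0, 60.0)

# Approximate coordinates (assumed): Banda Aceh in the west, Jayapura in the east.
d_span = great_circle_km(95.32, 5.55, 140.72, -2.53)

print(f"1 degree of longitude at the equator: {d_equator:.1f} km")
print(f"1 degree of longitude at 60°N       : {d_north:.1f} km")
print(f"Banda Aceh to Jayapura              : {d_span:,.0f} km")
```

Treating degrees as planar units ignores the curvature that this formula accounts for, which matters most when nearest-neighbour rankings are computed over thousands of kilometres.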

# Prepare variables
y = gdf["g"].values.reshape((-1, 1))
X = gdf[["ln_gdppc2010"]].values
coords = list(zip(gdf["COORD_X"], gdf["COORD_Y"]))
# Standardize (required for MGWR)
Zy = (y - y.mean(axis=0)) / y.std(axis=0)
ZX = (X - X.mean(axis=0)) / X.std(axis=0)
# Bandwidth selection and model fitting
mgwr_selector = Sel_BW(coords, Zy, ZX, multi=True, spherical=True)
mgwr_bw = mgwr_selector.search()
mgwr_results = MGWR(coords, Zy, ZX, mgwr_selector, spherical=True).fit()
mgwr_results.summary()

===========================================================================
Model type Gaussian
Number of observations: 514
Number of covariates: 2
Global Regression Results
---------------------------------------------------------------------------
R2: 0.214
Adj. R2: 0.212
Multi-Scale Geographically Weighted Regression (MGWR) Results
---------------------------------------------------------------------------
Spatial kernel: Adaptive bisquare
MGWR bandwidths
---------------------------------------------------------------------------
Variable Bandwidth ENP_j Adj t-val(95%) Adj alpha(95%)
X0 44.000 26.805 3.127 0.002
X1 44.000 25.271 3.109 0.002
Diagnostic information
---------------------------------------------------------------------------
Residual sum of squares: 122.081
Effective number of parameters (trace(S)): 52.076
Sigma estimate: 0.514
R2 0.762
Adjusted R2 0.736
AICc: 838.405
===========================================================================

The MGWR results are striking. R² jumps from 0.214 (global) to 0.762 (MGWR) — the spatially varying model explains more than three times as much variation as the global regression. Both the intercept and the convergence coefficient receive a bandwidth of 44, meaning each local regression draws on the 44 nearest districts. This is a relatively local scale (44 out of 514 districts, or about 8.6% of the sample), confirming that the convergence relationship varies substantially across the archipelago. The effective number of parameters is 52.1, reflecting the cost of estimating location-specific coefficients instead of two global ones.

7.3 Mapping MGWR coefficients

The power of MGWR lies in the coefficient maps. Instead of a single number for the whole country, we can now visualize how the convergence relationship changes from district to district. Because MGWR is estimated on standardized variables, the mapped coefficients are in standard-deviation units: a coefficient of $-1.0$ means that a one-standard-deviation increase in log initial income is associated with a one-standard-deviation decrease in growth at that location.
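Because the mapped coefficients are in standard-deviation units, the raw global slope from section 6 has to be rescaled before the two can be compared. A minimal sketch using only numbers already printed in this post (the slope, the two standard deviations from `describe()`, and the global R²):

```python
import math

slope_raw = -0.1948     # global OLS slope on the raw variables
sd_x = 0.7603           # standard deviation of ln_gdppc2010
sd_y = 0.3205           # standard deviation of g
r_squared = 0.2135      # global R²

# Standardized slope = raw slope * sd(x) / sd(y)
slope_std = slope_raw * sd_x / sd_y
# In a bivariate regression this equals the correlation, i.e. -sqrt(R²) here.
slope_from_r2 = -math.sqrt(r_squared)

print(f"standardized global slope: {slope_std:.3f}")
print(f"-sqrt(R²)                : {slope_from_r2:.3f}")
```

The standardized global slope is therefore the natural benchmark for the local coefficients on the maps that follow.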

gdf["mgwr_intercept"] = mgwr_results.params[:, 0]
gdf["mgwr_slope"] = mgwr_results.params[:, 1]

Intercept map — the intercept captures baseline growth conditions after accounting for initial income. Positive values indicate districts that grew faster than expected given their income level; negative values indicate underperformance.

fig, ax = plt.subplots(figsize=(14, 8))
# Fisher-Jenks classification with Patch legend (see script.py for details)
gdf.plot(ax=ax, column="mgwr_intercept", scheme="FisherJenks", k=5,
cmap="coolwarm", edgecolor=GRID_LINE, linewidth=0.2, legend=True)
ax.set_title(f"MGWR intercept (bandwidth = {int(mgwr_bw[0])})")
ax.set_axis_off()
plt.savefig("mgwr_mgwr_intercept.png", dpi=300, bbox_inches="tight")
plt.show()

The intercept map reveals a clear east–west gradient. Districts in western Indonesia (Sumatra and Java) tend to have negative intercepts — they grew less than the convergence model would predict based on their initial income alone. Districts in eastern Indonesia (Papua, Maluku, Nusa Tenggara) show positive intercepts, indicating growth that exceeded what initial income would predict. This pattern may reflect the role of resource extraction, infrastructure investment, and fiscal transfers that disproportionately boosted growth in less-developed eastern regions during the 2010–2018 period.

Convergence coefficient map — the slope captures how strongly initial income predicts subsequent growth at each location. Large negative values indicate rapid catching-up; values near zero or positive indicate no convergence or divergence.

fig, ax = plt.subplots(figsize=(14, 8))
gdf.plot(ax=ax, column="mgwr_slope", scheme="FisherJenks", k=5,
cmap="coolwarm", edgecolor=GRID_LINE, linewidth=0.2, legend=True)
ax.set_title(f"MGWR convergence coefficient (bandwidth = {int(mgwr_bw[1])})")
ax.set_axis_off()
plt.savefig("mgwr_mgwr_slope.png", dpi=300, bbox_inches="tight")
plt.show()

The convergence coefficient map is the central finding of this analysis. The global regression reported a single $\beta = -0.195$ on the raw variables, which corresponds to about $-0.46$ in the standardized units used by MGWR (in a bivariate regression the standardized slope equals the correlation, here $-\sqrt{0.2135}$). MGWR reveals that this average hides enormous spatial variation. The strongest catching-up (deepest blue, coefficients as negative as $-1.74$) concentrates in western Sumatra and parts of Kalimantan, districts where poorer areas grew much faster than richer neighbors. In contrast, most of Java, eastern Indonesia, and the Maluku islands show coefficients near zero (light pink), indicating that the convergence relationship is essentially absent in these areas. A handful of districts show weakly positive coefficients (up to 0.42), suggesting localized divergence where richer districts pulled further ahead. The coefficient ranges from $-1.74$ to $+0.42$, with a median of $-0.085$ and a standard deviation of 0.553, so no single global value summarizes it well.

7.4 Statistical significance

Not all local coefficients are statistically distinguishable from zero. MGWR provides t-values corrected for multiple testing, which we use to classify each district’s convergence coefficient as significantly negative (catching-up), not significant, or significantly positive (diverging).

mgwr_filtered_t = mgwr_results.filter_tvals()
t_sig = mgwr_filtered_t[:, 1] # Slope t-values
sig_cats = np.where(t_sig < 0, "Negative (catching-up)",
np.where(t_sig > 0, "Positive (diverging)", "Not significant"))
print(f"Negative (catching-up): {(sig_cats == 'Negative (catching-up)').sum()}")
print(f"Not significant: {(sig_cats == 'Not significant').sum()}")
print(f"Positive (diverging): {(sig_cats == 'Positive (diverging)').sum()}")

Negative (catching-up): 149
Not significant: 365
Positive (diverging): 0

fig, ax = plt.subplots(figsize=(14, 8))
cat_colors = {
"Negative (catching-up)": "#2c7bb6",
"Not significant": GRID_LINE,
"Positive (diverging)": "#d7191c",
}
colors_sig = [cat_colors[c] for c in sig_cats]
gdf.plot(ax=ax, color=colors_sig, edgecolor=GRID_LINE, linewidth=0.2)
ax.set_title("MGWR convergence coefficient: statistical significance")
ax.set_axis_off()
plt.savefig("mgwr_mgwr_significance.png", dpi=300, bbox_inches="tight")
plt.show()

Of 514 districts, 149 (29%) show statistically significant convergence at the corrected 5% level — concentrated in Sumatra, western Kalimantan, and Sulawesi. The remaining 365 districts (71%) have convergence coefficients that are not distinguishable from zero after correcting for multiple comparisons. No district shows significant divergence. This means that while the global regression detects convergence on average, it is actually driven by a minority of districts — primarily in western Indonesia — while the majority of the archipelago shows no significant relationship between initial income and growth.
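The corrected threshold behind these counts can be sketched from the summary table. The example assumes that the correction divides the nominal α by each term's effective number of parameters (ENP_j), which is consistent with the adjusted alpha of 0.002 printed for both terms. The critical value 3.109 is copied from the summary, the t-values are made up, and `filter_t` is a hypothetical stand-in for what `filter_tvals()` returns.

```python
import numpy as np

alpha = 0.05
enp_j = {"intercept (X0)": 26.805, "slope (X1)": 25.271}   # ENP_j from the summary
adj_alpha = {name: alpha / enp for name, enp in enp_j.items()}
for name, a in adj_alpha.items():
    print(f"{name:15s} adjusted alpha = {a:.5f}")

def filter_t(tvals, critical_t):
    """Keep t-values that clear the corrected threshold; set the rest to zero."""
    t = np.asarray(tvals, dtype=float)
    return np.where(np.abs(t) >= critical_t, t, 0.0)

example_t = np.array([-4.2, -3.0, -1.1, 0.4, 3.3])          # made-up local t-values
filtered = filter_t(example_t, critical_t=3.109)
print(filtered)
```

A local t-value of -3.0 would pass the usual 1.96 threshold but not the corrected one, which is why the significance map is more conservative than a naive reading of the raw t-values.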

8. Model comparison

The table below summarizes how much explanatory power the spatially varying model adds over the global baseline.

print(f"{'Metric':<25} {'Global OLS':>12} {'MGWR':>12}")
print(f"{'R²':<25} {0.2135:>12.4f} {0.7625:>12.4f}")
print(f"{'Adj. R²':<25} {0.2120:>12.4f} {0.7357:>12.4f}")
print(f"{'AICc':<25} {1341.25:>12.2f} {838.41:>12.2f}")
print(f"{'Bandwidth (intercept)':<25} {'all (514)':>12} {'44':>12}")
print(f"{'Bandwidth (slope)':<25} {'all (514)':>12} {'44':>12}")

Metric Global OLS MGWR
R² 0.2135 0.7625
Adj. R² 0.2120 0.7357
AICc 1341.25 838.41
Bandwidth (intercept) all (514) 44
Bandwidth (slope) all (514) 44

MGWR more than triples the explained variance ($R^2$: 0.214 to 0.762) and reduces the AICc from 1341 to 838. Because AICc penalizes complexity, this drop indicates that the improvement in fit is not merely due to additional flexibility. The bandwidth of 44 for both variables means each local regression uses the nearest 44 districts (about 8.6% of the sample), which points to a highly localized convergence process. The adjusted $R^2$ of 0.736 accounts for the additional complexity (52 effective parameters vs 2 in OLS) and still shows a large improvement, which suggests that the spatial variation in coefficients is not simply overfitting.
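The fit statistics in the MGWR summary are linked by simple arithmetic, which is a useful sanity check when reading such output. The sketch below assumes three things: the total sum of squares equals $n$ because the outcome was standardized with the population standard deviation, the adjusted $R^2$ uses $n - \text{ENP} - 1$ residual degrees of freedom, and the sigma estimate uses $n - \text{ENP}$. These conventions reproduce the printed values, but they are inferred from the numbers rather than quoted from the package.

```python
import math

n, rss, enp = 514, 122.081, 52.076     # from the MGWR summary
tss = float(n)                         # standardized outcome: sum of squares = n

r2 = 1 - rss / tss
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - enp - 1)
sigma = math.sqrt(rss / (n - enp))

print(f"R²          : {r2:.4f}")
print(f"Adjusted R² : {adj_r2:.4f}")
print(f"Sigma       : {sigma:.3f}")
```

The gap between $R^2$ and adjusted $R^2$ is the price of the roughly 52 effective parameters; for the two-parameter global model the same gap is almost invisible.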

9. Discussion

Economic catching-up in Indonesia is not uniform — it is concentrated in western Sumatra and parts of Kalimantan, while most of the archipelago shows no significant convergence. The global regression’s $\beta = -0.195$ suggests a moderate convergence tendency, but MGWR reveals that this average is driven by a subset of 149 districts (29%) with strong catching-up dynamics. The remaining 365 districts have convergence coefficients indistinguishable from zero.

The intercept map adds another dimension: eastern Indonesian districts tend to have positive intercepts (above-expected growth), while western districts have negative intercepts (below-expected growth). This east–west gradient likely reflects the impact of fiscal transfers, resource booms, and infrastructure programs that targeted less-developed regions during the 2010–2018 period. Combined with the convergence coefficient map, the picture is nuanced: eastern Indonesia grew faster than expected (high intercept), but not because of convergence dynamics (near-zero slope) — rather, because of other factors captured by the intercept.

For policy, these findings challenge the assumption that national-level convergence statistics reflect what is happening locally. A policymaker looking at $\beta = -0.195$ might conclude that Indonesia’s development strategy is successfully closing regional gaps. MGWR reveals that catching-up is geographically selective, and the majority of districts are not on a convergence path at all. Spatially targeted interventions — rather than uniform national programs — may be needed to address this uneven landscape.

10. Summary and next steps

Key takeaways:

Method insight: MGWR reveals spatial heterogeneity invisible to global regression. R² improves from 0.214 to 0.762 by allowing location-specific coefficients. Both variables operate at a bandwidth of 44 districts (~8.6% of the sample), indicating highly localized economic dynamics. Variable standardization is essential before MGWR estimation.
Data insight: Only 149 of 514 Indonesian districts (29%) show statistically significant convergence, concentrated in Sumatra and Kalimantan. The standardized convergence coefficient ranges from $-1.74$ to $+0.42$, compared with a global slope of $-0.195$ on the raw variables (about $-0.46$ in standardized units). Eastern Indonesia grows faster than expected (positive intercepts) but not through convergence: the catching-up mechanism is absent there.
Limitation: The bivariate model (one independent variable) is intentionally simple for pedagogical purposes. Real convergence analysis would include controls for human capital, infrastructure, institutional quality, and sectoral composition. The bandwidth of 44 applies to both variables in this case, but with additional covariates, MGWR’s ability to assign different bandwidths per variable would be more visible.
Next step: Extend the model with additional covariates (education, investment, fiscal transfers) to disentangle the sources of spatial heterogeneity. Apply MGWR to panel data with multiple time periods. Compare MGWR results with the spatial clusters identified in the ESDA tutorial to see whether convergence hotspots align with LISA clusters.

11. Exercises

Add a second variable. Include an education indicator (e.g., years of schooling) as a second independent variable and re-run MGWR. Do the two covariates receive different bandwidths? What does that tell you about the spatial scale at which education affects growth?
Map the t-values. Instead of mapping the raw coefficients, map the local t-statistics from mgwr_results.tvalues[:, 1]. How does this map compare to the significance map based on corrected t-values?
Compare with ESDA. Run a Moran’s I test on the MGWR residuals. Is there remaining spatial autocorrelation? If not, MGWR has successfully captured the spatial structure. If yes, what might be missing?
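As a starting point for the third exercise, here is a hand-rolled Moran's I on a toy grid. It is a sketch under stated assumptions: rook contiguity on a 6 × 6 lattice with row-standardized weights, and two artificial patterns in place of real residuals. The helper names `morans_i` and `rook_weights` are hypothetical. For real data, dedicated packages such as esda also provide permutation-based inference.

```python
import numpy as np

def morans_i(values, W):
    """Moran's I = (n / S0) * (z' W z) / (z' z), with z the demeaned values."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    return (len(z) / W.sum()) * (z @ W @ z) / (z @ z)

def rook_weights(side):
    """Row-standardized rook-contiguity weights for a side x side grid."""
    n = side * side
    W = np.zeros((n, n))
    for r in range(side):
        for c in range(side):
            i = r * side + c
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < side and 0 <= cc < side:
                    W[i, rr * side + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)

side = 6
W = rook_weights(side)
rows, cols = np.divmod(np.arange(side * side), side)

gradient = rows.astype(float)                             # smooth trend: clustered values
checker = np.where((rows + cols) % 2 == 0, 1.0, -1.0)     # alternating: dispersed values

i_gradient = morans_i(gradient, W)
i_checker = morans_i(checker, W)
print(f"Moran's I, smooth gradient: {i_gradient:.3f}")
print(f"Moran's I, checkerboard   : {i_checker:.3f}")
```

Applied to the exercise, `values` would be the MGWR residual vector and `W` a weights matrix built from the district geometry; a value near zero would suggest that little spatial structure is left in the residuals.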

Acknowledgements

AI tools (Claude Code, Gemini, NotebookLM) were used to make the contents of this post more accessible to students. Nevertheless, the content in this post may still have errors. Caution is needed when applying the contents of this post to true research projects.

Studying spatial heterogeneity

Sat, 23 Dec 2023 00:00:00 +0000

A geocomputational notebook to compute GWR and MGWR