<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>GWR and MGWR | Carlos Mendez</title><link>https://carlos-mendez.org/category/gwr-and-mgwr/</link><atom:link href="https://carlos-mendez.org/category/gwr-and-mgwr/index.xml" rel="self" type="application/rss+xml"/><description>GWR and MGWR</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><copyright>© 2018–2026 Carlos Mendez. All rights reserved.</copyright><lastBuildDate>Sun, 03 May 2026 00:00:00 +0000</lastBuildDate><image><url>https://carlos-mendez.org/media/icon_huedfae549300b4ca5d201a9bd09a3ecd5_79625_512x512_fill_lanczos_center_3.png</url><title>GWR and MGWR</title><link>https://carlos-mendez.org/category/gwr-and-mgwr/</link></image><item><title>MGWFER: Causal Spatially Varying Coefficients via Panel Fixed Effects</title><link>https://carlos-mendez.org/post/python_mgwrfer/</link><pubDate>Sun, 03 May 2026 00:00:00 +0000</pubDate><guid>https://carlos-mendez.org/post/python_mgwrfer/</guid><description>&lt;h2 id="1-overview">1. Overview&lt;/h2>
&lt;p>When we estimate how relationships vary across space — say, the effect of education on income in different neighborhoods — a hidden danger lurks. If some unobserved attribute of place (geographic amenities, historical institutions, persistent social norms) affects both the outcome and the covariates, our spatially varying coefficients absorb that contamination. The result: coefficients that look like local effects but actually reflect omitted variable bias.&lt;/p>
&lt;p>This post is a Python tutorial faithful to &lt;a href="https://doi.org/10.1080/24694452.2026.2654481" target="_blank" rel="noopener">Li &amp;amp; Fotheringham (2026)&lt;/a>, &lt;em>&amp;ldquo;Spatial Context as a Time-Invariant Confounder: A Fixed-Effects Extension of MGWR,&amp;rdquo;&lt;/em> &lt;em>Annals of the American Association of Geographers&lt;/em>. The paper introduces &lt;strong>Multiscale Geographically Weighted Fixed Effects Regression (MGWFER)&lt;/strong>, a local panel framework that combines two powerful ideas: (1) a &lt;em>within-transformation&lt;/em> that removes all time-invariant confounders from panel data, and (2) &lt;em>Multiscale GWR&lt;/em> that estimates location-specific coefficients at variable-optimal spatial scales. Think of it as giving each location its own regression while simultaneously controlling for everything about that location that does not change over time.&lt;/p>
&lt;p>This tutorial asks: &lt;strong>can we recover the true spatially varying coefficients — and the intrinsic contextual effects themselves — when an unobserved spatial context drives both the outcome and the covariate levels?&lt;/strong> We simulate a panel of 225 spatial units observed over 3 time periods using the paper&amp;rsquo;s DGP verbatim (the indirect channel &lt;code>sc → x_k&lt;/code> is active, with &lt;code>Cor(x_k, sc) ≈ 0.84&lt;/code>), and compare six estimators across the full lineup the paper considers: cross-sectional OLS, pooled OLS, individual FE, cross-sectional MGWR, pooled MGWR (PMGWR), and MGWFER. The answer is yes on both counts: MGWFER cuts the most-biased local coefficient&amp;rsquo;s error by ~92% (β₁ RMSE 2.30 → 0.18, with the sign of the correlation against truth flipping from −0.46 to +0.82), and &lt;strong>Stage 2&lt;/strong> recovers the unit-level fixed effects with Pearson correlation &lt;strong>≈1.000&lt;/strong> (0.9996) against the true confounder surface.&lt;/p>
&lt;p>&lt;strong>Learning objectives:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Distinguish the &lt;strong>three kinds of contextual effects&lt;/strong> (intrinsic, behavioral, indirect) that the paper formalises.&lt;/li>
&lt;li>See, via a causal DAG and a one-page Wooldridge derivation, &lt;em>why&lt;/em> an unobserved spatial context produces omitted-variable bias in MGWR.&lt;/li>
&lt;li>Implement the &lt;strong>two-stage MGWFER algorithm&lt;/strong>: Stage 1 (within-transform + standardise + MGWR + back-transform) and Stage 2 (recover individual fixed effects with per-unit t-tests).&lt;/li>
&lt;li>Compare PMGWR and MGWFER on RMSE, correlation, bandwidths, significance maps, and the recovered fixed-effects surface.&lt;/li>
&lt;li>Audit the &lt;strong>four identification assumptions&lt;/strong> under which MGWFER yields a causal interpretation, and the limitations that survive.&lt;/li>
&lt;/ul>
&lt;p>The analysis follows the paper&amp;rsquo;s progression: simulate known truth, fit the global baselines, fit cross-sectional MGWR and the naive PMGWR, apply the within-transform, fit MGWFER, recover the fixed effects, then compare all six estimators.&lt;/p>
&lt;pre>&lt;code class="language-mermaid">graph LR
A[&amp;quot;&amp;lt;b&amp;gt;Step 1&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Simulate&amp;lt;br/&amp;gt;Panel DGP&amp;quot;] --&amp;gt; G[&amp;quot;&amp;lt;b&amp;gt;Step 2&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Global baselines&amp;lt;br/&amp;gt;OLS / FE&amp;quot;]
G --&amp;gt; B[&amp;quot;&amp;lt;b&amp;gt;Step 3&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;MGWR_cs &amp;amp;amp;&amp;lt;br/&amp;gt;PMGWR&amp;quot;]
B --&amp;gt; C[&amp;quot;&amp;lt;b&amp;gt;Step 4&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Within-&amp;lt;br/&amp;gt;Transform&amp;quot;]
C --&amp;gt; D[&amp;quot;&amp;lt;b&amp;gt;Step 5&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Stage 1:&amp;lt;br/&amp;gt;MGWFER slopes&amp;quot;]
D --&amp;gt; F[&amp;quot;&amp;lt;b&amp;gt;Step 6&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Stage 2:&amp;lt;br/&amp;gt;Recover &amp;amp;alpha;&amp;lt;sub&amp;gt;i&amp;lt;/sub&amp;gt;&amp;quot;]
F --&amp;gt; E[&amp;quot;&amp;lt;b&amp;gt;Step 7&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Compare&amp;lt;br/&amp;gt;all six&amp;quot;]
style A fill:#141413,stroke:#6a9bcc,color:#fff
style G fill:#6a9bcc,stroke:#141413,color:#fff
style B fill:#d97757,stroke:#141413,color:#fff
style C fill:#6a9bcc,stroke:#141413,color:#fff
style D fill:#00d4c8,stroke:#141413,color:#fff
style F fill:#00d4c8,stroke:#141413,color:#fff
style E fill:#1a3a8a,stroke:#141413,color:#fff
&lt;/code>&lt;/pre>
&lt;p>The key insight is at Step 4: by subtracting each unit&amp;rsquo;s time-series mean, the confounder vanishes. It contributes the same amount at every time period, so the mean subtraction cancels it exactly. What remains is within-unit variation, driven only by the demeaned covariates (weighted by their spatially varying coefficients) and demeaned noise. Stage 2 then walks the algorithm backwards: once we have the slopes, we recover the fixed effects $\alpha_i$ themselves as a substantive quantity of interest.&lt;/p>
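&lt;p>A minimal NumPy sketch of that cancellation, on a small hypothetical panel rather than the tutorial&amp;rsquo;s DGP. The observations are arranged as a units × periods array, and the within-transformation subtracts each row mean. The demeaned fixed effect is zero up to floating-point rounding, so the demeaned outcome equals the slope times the demeaned covariate plus demeaned noise.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np
rng = np.random.default_rng(0)
n_units, n_time = 5, 3
# Hypothetical toy panel: rows are units, columns are periods
alpha = rng.uniform(2.0, 50.0, n_units)          # time-invariant shift per unit
x = rng.normal(size=(n_units, n_time))
eps = 0.5 * rng.normal(size=(n_units, n_time))
beta = 1.5
y = alpha[:, None] + beta * x + eps
def within(a):
    # Subtract the time-series mean of each unit (its row mean) from every period
    return a - a.mean(axis=1, keepdims=True)
alpha_tilde = within(np.repeat(alpha[:, None], n_time, axis=1))
y_tilde, x_tilde, eps_tilde = within(y), within(x), within(eps)
# alpha is constant within each row, so its demeaned version is zero
# (up to floating-point rounding) and drops out of the demeaned outcome
resid = y_tilde - (beta * x_tilde + eps_tilde)
print(np.abs(alpha_tilde).max(), np.abs(resid).max())
&lt;/code>&lt;/pre>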
&lt;h2 id="2-three-kinds-of-contextual-effects">2. Three kinds of contextual effects&lt;/h2>
&lt;p>Li &amp;amp; Fotheringham (2026) reorganise how &lt;em>place&lt;/em> can shape behaviour by splitting &amp;ldquo;contextual effects&amp;rdquo; into three categories. Two were already in the MGWR vocabulary; the third is the paper&amp;rsquo;s headline contribution and the reason MGWFER exists.&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Intrinsic contextual effects.&lt;/strong> Unmeasured attributes of place (traditions, local norms, persistent geographic conditions) that &lt;em>directly&lt;/em> shift the outcome. In MGWR these are captured by the &lt;strong>local intercept&lt;/strong> $\alpha_{bw0}(u_i, v_i)$. In MGWFER they are captured by the &lt;strong>individual fixed effect&lt;/strong> $\alpha_i$.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Behavioral contextual effects.&lt;/strong> How place &lt;em>modulates the slopes&lt;/em> — i.e., the elasticities between $y$ and each covariate $x_k$. In MGWR these are the spatially varying coefficients $\beta_{bwk}(u_i, v_i)$, allowed to operate at covariate-specific bandwidths.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Indirect contextual effects&lt;/strong> &lt;em>(the paper&amp;rsquo;s key addition).&lt;/em> How place shapes the &lt;em>levels of the covariates themselves&lt;/em>. Wealthy regions tend to invest more in transit; coastal regions have more tourism; old-industrial regions have higher unemployment. The covariates are therefore not exogenous: they share a backdoor link with the outcome through spatial context. Standard MGWR&amp;rsquo;s exogeneity assumption rules this channel out.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;p>It is the third channel that contaminates MGWR estimates: because spatial context can both raise the levels of the $x_k$&amp;rsquo;s and shift $y$ directly, ignoring it creates a spurious correlation between covariates and outcomes that looks like a &amp;ldquo;local effect.&amp;rdquo; MGWFER&amp;rsquo;s within-transformation severs that backdoor path by removing every time-invariant component of place from both sides of the regression.&lt;/p>
&lt;blockquote>
&lt;p>&amp;ldquo;Spatial context, as part of unmeasured factors, however, probably exerts a profound and widespread influence on a wide range of socioeconomic factors. Under these conditions, MGWR would suffer from endogeneity and potentially support misleading correlations between covariates and the response variable.&amp;rdquo; — Li &amp;amp; Fotheringham (2026)&lt;/p>
&lt;/blockquote>
&lt;h2 id="3-spatial-context-as-a-confounder-a-causal-diagram-view">3. Spatial context as a confounder: a causal-diagram view&lt;/h2>
&lt;p>The intuition is cleanest in the language of directed acyclic graphs (DAGs; Pearl 2009). Two graphs are at issue.&lt;/p>
&lt;pre>&lt;code class="language-mermaid">graph LR
subgraph &amp;quot;Figure 2A — MGWR's implicit assumption&amp;quot;
X1[&amp;quot;X (covariates)&amp;quot;] --&amp;gt;|&amp;quot;β&amp;quot;| Y1[&amp;quot;Y (outcome)&amp;quot;]
SC1((SC)):::hidden -.-&amp;gt;|&amp;quot;only direct&amp;quot;| Y1
end
subgraph &amp;quot;Figure 2B — what really happens (Li &amp;amp;amp; Fotheringham 2026)&amp;quot;
SC2((SC)):::hidden --&amp;gt;|&amp;quot;δ (indirect)&amp;quot;| X2[&amp;quot;X (covariates)&amp;quot;]
SC2 --&amp;gt;|&amp;quot;intrinsic&amp;quot;| Y2[&amp;quot;Y (outcome)&amp;quot;]
X2 --&amp;gt;|&amp;quot;β (behavioral)&amp;quot;| Y2
end
classDef hidden fill:#0f1729,stroke:#d97757,color:#fff,stroke-dasharray: 4 3
style X1 fill:#6a9bcc,stroke:#141413,color:#fff
style Y1 fill:#00d4c8,stroke:#141413,color:#fff
style X2 fill:#6a9bcc,stroke:#141413,color:#fff
style Y2 fill:#00d4c8,stroke:#141413,color:#fff
&lt;/code>&lt;/pre>
&lt;p>In &lt;strong>Figure 2A&lt;/strong>, spatial context only touches $Y$ directly. There is no backdoor path from $X$ to $Y$ through $SC$, and MGWR&amp;rsquo;s coefficient estimates can be read causally (under the usual exogeneity assumption). In &lt;strong>Figure 2B&lt;/strong>, the realistic structure, $SC$ is a parent of &lt;em>both&lt;/em> $X$ and $Y$. There is now a non-causal backdoor path $X \leftarrow SC \rightarrow Y$ that stays open whenever $SC$ is not conditioned on. That open path is what biases the MGWR estimates.&lt;/p>
&lt;p>The formal demonstration, adapted from Wooldridge (2010, 65-67) and equations 4-8 in the paper, takes one paragraph. Write the true model with spatial context $sc$ entering linearly:&lt;/p>
&lt;p>$$y = \beta_0 + x_1 \beta_1 + \cdots + x_K \beta_K + sc + \varepsilon, \quad E[\varepsilon \mid x, sc] = 0.$$&lt;/p>
&lt;p>Since $sc$ is unobservable, it is absorbed into the error term $\mu = sc + \varepsilon$. If $sc$ has a linear projection on the covariates,&lt;/p>
&lt;p>$$sc = \delta_0 + x_1 \delta_1 + \cdots + x_K \delta_K + \eta,$$&lt;/p>
&lt;p>then substituting and rearranging yields:&lt;/p>
&lt;p>$$y = (\beta_0 + \delta_0) + x_1 (\beta_1 + \delta_1) + \cdots + x_K (\beta_K + \delta_K) + (\varepsilon + \eta).$$&lt;/p>
&lt;p>OLS (or MGWR) therefore estimates &lt;strong>$\beta_k + \delta_k$&lt;/strong>, not $\beta_k$: in large samples, $\hat\beta_k \to \beta_k + \delta_k$. The bias term $\delta_k$ is exactly the indirect contextual effect, the strength of the link from $SC$ to $x_k$ (holding the other covariates fixed). When that link is non-trivial, the estimates are systematically wrong, and &lt;em>the magnitude of the bias equals the magnitude of the indirect contextual effect.&lt;/em> MGWFER&amp;rsquo;s within-transformation eliminates the time-invariant component of $sc$ (which, by the paper&amp;rsquo;s assumption, is &lt;em>all&lt;/em> of $sc$), neutralising $\delta_k$ and restoring identification of $\beta_k$.&lt;/p>
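&lt;p>The algebra can be checked numerically. The sketch below uses a hypothetical single-period cross-section, with values chosen for illustration that loosely echo this post&amp;rsquo;s covariate coupling. It is fitted with plain OLS through &lt;code>np.linalg.lstsq&lt;/code>, not MGWR. The &amp;ldquo;short&amp;rdquo; regression that omits $sc$ returns a slope close to $\beta + \delta$, where $\delta$ is the slope of the linear projection of $sc$ on $x$. The &amp;ldquo;long&amp;rdquo; regression that includes $sc$ recovers $\beta$.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np
rng = np.random.default_rng(1)
n = 20000
beta = 1.5
# Unobserved spatial context drives both the covariate level and the outcome
sc = rng.uniform(0.0, 50.0, n)
x = 0.05 * sc + rng.normal(0.0, 0.5, n)            # indirect channel: sc -> x
y = 2.0 + beta * x + sc + rng.normal(0.0, 0.5, n)  # intrinsic channel: sc -> y
def ols(design, target):
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef
ones = np.ones(n)
# delta: slope of the linear projection of sc on x
delta = ols(np.column_stack([ones, x]), sc)[1]
# Short regression omits sc; long regression controls for it
b_short = ols(np.column_stack([ones, x]), y)[1]
b_long = ols(np.column_stack([ones, x, sc]), y)[1]
print(b_short, beta + delta, b_long)
&lt;/code>&lt;/pre>
&lt;p>In this setup $\delta$ is large (the projection slope is roughly an order of magnitude bigger than $\beta$), so the short regression overstates the effect of $x$ many times over. That is the same mechanism behind the spurious &lt;code>x_4&lt;/code> slope later in the post.&lt;/p>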
&lt;h3 id="31-key-concepts-at-a-glance">3.1 Key concepts at a glance&lt;/h3>
&lt;p>The post leans on a small vocabulary repeatedly. The rest of the tutorial assumes you can move between these terms quickly. Each concept below has three parts. The &lt;strong>definition&lt;/strong> is always visible. The &lt;strong>example&lt;/strong> and &lt;strong>analogy&lt;/strong> sit behind clickable cards: open them when you need them, leave them collapsed for a quick scan. If a later section mentions &amp;ldquo;within-transformation&amp;rdquo; or &amp;ldquo;bandwidth selection&amp;rdquo; and the term feels slippery, this is the section to re-read.&lt;/p>
&lt;p>&lt;strong>1. Spatially varying coefficients&lt;/strong> $\beta_j(u_i, v_i)$.
A regression coefficient that depends on location. Each unit $i$ at coordinates $(u_i, v_i)$ has its own slope on covariate $j$. The coefficient surface tells you where the predictor matters more or less. It is the &lt;em>signal&lt;/em> MGWR is built to estimate.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">
&lt;summary>Example&lt;/summary>
&lt;p>True $\beta_1$ in this simulation ranges from 1.06 to 2.00 across the 15×15 grid — the effect of &lt;code>x1&lt;/code> on &lt;code>y&lt;/code> is roughly twice as large in some districts as in others. True $\beta_3 = 1.5$ everywhere (a constant). True $\beta_4 = 0$ everywhere (a null effect we hope MGWR will &lt;em>not&lt;/em> spuriously detect).&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">
&lt;summary>Analogy&lt;/summary>
&lt;p>A weather map of barometric sensitivity. In some valleys a 1-degree drop spawns a thunderstorm. On the plains, the same drop does nothing. The map of sensitivities, not the average sensitivity, is what tells the meteorologist where to send the warning.&lt;/p>
&lt;/details>
&lt;/div>
&lt;p>&lt;strong>2. Time-invariant confounder (fixed effect)&lt;/strong> $\alpha_i$.
A unit-specific shift that contributes equally at every time period. It contaminates pooled estimators because it is correlated with the covariates. It leaves no trace in within-unit variation over time, but it dominates variation across units. In the paper&amp;rsquo;s framing, $\alpha_i$ is the &lt;strong>statistical operationalisation of spatial context&lt;/strong>: the unmeasurable place-based factors that the within-transformation will eliminate.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">
&lt;summary>Example&lt;/summary>
&lt;p>In our simulation $\alpha_i$ (= &lt;code>sc_i&lt;/code> in the paper) ranges from 2.07 to 51.55 across the 225 units, exponential in column index. It enters the outcome equation directly &lt;em>and&lt;/em> it drives the levels of every covariate (paper Eqs. 40-43). PMGWR cannot disentangle these channels: it conflates &lt;code>sc_i&lt;/code> with the spatially varying coefficients, returning $\hat{\beta}_1$ estimates anti-correlated with the truth.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">
&lt;summary>Analogy&lt;/summary>
&lt;p>A stain printed on the negative before each exposure. Every photograph from that camera carries the same blot. Stitching three photos together does not reveal the scene; it reveals the blot.&lt;/p>
&lt;/details>
&lt;/div>
&lt;p>&lt;strong>3. Within-transformation (demeaning)&lt;/strong> $\tilde{y}_{it} = y_{it} - \bar{y}_i$.
Subtract each unit&amp;rsquo;s time-series mean from each observation. The unit-specific shift $\alpha_i$ vanishes by construction. What remains is within-unit variation: the part of &lt;code>y&lt;/code> that moves over time inside one unit.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">
&lt;summary>Example&lt;/summary>
&lt;p>Raw &lt;code>y&lt;/code> ranges from -4.07 to 57.41 (a span of 61). Demeaned &lt;code>y&lt;/code> ranges from -6.88 to 6.92 (a span of 14). The bulk of the original variation was &lt;em>between&lt;/em> units; demeaning isolates the &lt;em>within&lt;/em>-unit signal that identifies the spatially varying coefficients.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">
&lt;summary>Analogy&lt;/summary>
&lt;p>Subtracting the watermark from every page of a stamped manuscript. The text underneath is what you came for. Until you remove the watermark, every page looks dominated by it.&lt;/p>
&lt;/details>
&lt;/div>
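&lt;p>The between/within split in the example rests on the identity total SS = between SS + within SS. Only the within part survives demeaning. Below is a sketch on a hypothetical panel shaped like the tutorial&amp;rsquo;s (225 units, 3 periods, a large unit-level shift). Its numbers are illustrative and will not reproduce the ranges quoted above.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np
rng = np.random.default_rng(2)
n_units, n_time = 225, 3
# Hypothetical panel: large unit-level shift plus modest within-unit movement
unit_shift = rng.uniform(2.0, 52.0, n_units)
y = unit_shift[:, None] + rng.normal(0.0, 2.0, size=(n_units, n_time))
unit_mean = y.mean(axis=1, keepdims=True)
grand_mean = y.mean()
ss_total = ((y - grand_mean) ** 2).sum()
ss_between = (n_time * (unit_mean - grand_mean) ** 2).sum()
ss_within = ((y - unit_mean) ** 2).sum()   # the part the within-transform keeps
share_within = ss_within / ss_total
print(share_within)
&lt;/code>&lt;/pre>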
&lt;p>&lt;strong>4. Multiscale GWR (MGWR)&lt;/strong>.
A geographically weighted regression where each covariate gets its own optimal bandwidth. Local effects vary at different scales: some predictors smooth out over large neighbourhoods, others change house-by-house. MGWR learns those scales from the data.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">
&lt;summary>Example&lt;/summary>
&lt;p>In this post MGWFER fits four covariates (&lt;code>x1&lt;/code>-&lt;code>x4&lt;/code>). After bandwidth selection, MGWFER assigns bandwidths [50, 91, 116, 62] — &lt;code>x1&lt;/code> operates on tight neighbourhoods of ~50 nearest units, &lt;code>x3&lt;/code> on broader ~116-unit windows. PMGWR collapses every bandwidth to 44–50 (because the strong sc-coupling makes every covariate look the same locally), and cross-sectional MGWR returns [48, 91, 98, 52] for a different reason (no panel structure to exploit at all).&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">
&lt;summary>Analogy&lt;/summary>
&lt;p>A camera with one zoom lens per channel. The red channel zooms tight on a face. The blue channel pulls back to capture sky. A single fixed zoom for all channels would smear them.&lt;/p>
&lt;/details>
&lt;/div>
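&lt;p>At each location, GWR is a weighted least-squares fit whose weights decay with distance. The sketch below fits one on a hypothetical 15×15 grid using an adaptive bisquare kernel, where the bandwidth is a count of nearest neighbours. The helper names &lt;code>bisquare_weights&lt;/code> and &lt;code>local_fit&lt;/code> are illustrative and not part of the mgwr API, and the neighbour-counting convention may differ slightly from the package&amp;rsquo;s. The sketch uses a single bandwidth for all coefficients. MGWR generalises this by giving each covariate its own bandwidth inside a backfitting loop, which the sketch does not implement.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np
def bisquare_weights(coords, i, bw):
    # Adaptive bisquare kernel: the radius h is the distance from location i
    # to its bw-th nearest point (counting i itself), so every local fit
    # draws on roughly the same number of neighbours
    d = np.sqrt(((coords - coords[i]) ** 2).sum(axis=1))
    h = np.sort(d)[bw - 1]
    u = np.clip(d / h, 0.0, 1.0)   # points at or beyond the radius get u = 1
    return (1.0 - u ** 2) ** 2      # so those points receive zero weight
def local_fit(X, y, w):
    # Weighted least squares at one location: solve (X^T W X) b = X^T W y
    XtW = X.T * w
    return np.linalg.solve(XtW @ X, XtW @ y)
# Hypothetical 15 x 15 grid; the true slope rises from 1.0 (west) to 2.0 (east)
n_grid = 15
gi = np.repeat(np.arange(n_grid), n_grid).astype(float)
gj = np.tile(np.arange(n_grid), n_grid).astype(float)
coords = np.column_stack([gi, gj])
rng = np.random.default_rng(3)
x = rng.normal(size=n_grid * n_grid)
beta_true = 1.0 + gj / (n_grid - 1)
y = beta_true * x + rng.normal(0.0, 0.1, n_grid * n_grid)
X = np.column_stack([np.ones_like(x), x])
bw = 50   # nearest neighbours per local regression
beta_hat = np.array([local_fit(X, y, bisquare_weights(coords, i, bw))[1]
                     for i in range(len(y))])
print(np.corrcoef(beta_hat, beta_true)[0, 1])
&lt;/code>&lt;/pre>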
&lt;p>&lt;strong>5. Bandwidth selection&lt;/strong>.
The hyperparameter that controls kernel smoothness around each location. A data-driven search picks the bandwidth that minimises the corrected Akaike information criterion (AICc) or a cross-validation score. When the data contain an unremoved fixed effect that is correlated with the covariates, that criterion is contaminated and picks the wrong bandwidths.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">
&lt;summary>Example&lt;/summary>
&lt;p>PMGWR assigns &lt;code>x4&lt;/code> (a null effect) a bandwidth of 46. That bandwidth is small, and it is driven by spurious sc-aligned spatial structure that the model misreads as &amp;ldquo;local&amp;rdquo;. After demeaning, MGWFER assigns &lt;code>x4&lt;/code> a bandwidth of 62, which is wider and so closer to the global smoothing that a constant zero surface calls for. MGWFER still flags 23 of 225 units as significant (202/225 correctly non-significant, a 10.2% false-positive rate), even though its &lt;code>β_4&lt;/code> RMSE is 13× smaller than PMGWR&amp;rsquo;s.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">
&lt;summary>Analogy&lt;/summary>
&lt;p>A focal length on a camera lens. Auto-focus picks it from what is in the viewfinder. If a smear of mist is in the way, auto-focus locks onto the smear and the actual subject blurs out.&lt;/p>
&lt;/details>
&lt;/div>
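&lt;p>The sketch below shows AICc-based bandwidth selection for a single-bandwidth GWR. It assumes the standard GWR form of the criterion, $\mathrm{AICc} = 2n\ln\hat\sigma + n\ln(2\pi) + n\,\frac{n + \mathrm{tr}(S)}{n - 2 - \mathrm{tr}(S)}$, where $\hat\sigma^2 = \mathrm{RSS}/n$ and $\mathrm{tr}(S)$ is the trace of the hat matrix. It scores a short, hypothetical list of candidate bandwidths by brute force. &lt;code>Sel_BW&lt;/code> in the tutorial&amp;rsquo;s code performs a more efficient search (golden-section by default, to our understanding), and MGWR repeats the search for each covariate.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np
def bisquare_weights(coords, i, bw):
    # Adaptive bisquare kernel; radius = distance to the bw-th nearest point
    d = np.sqrt(((coords - coords[i]) ** 2).sum(axis=1))
    h = np.sort(d)[bw - 1]
    return (1.0 - np.clip(d / h, 0.0, 1.0) ** 2) ** 2
def gwr_aicc(coords, X, y, bw):
    n = len(y)
    fitted = np.empty(n)
    trace_s = 0.0
    for i in range(n):
        XtW = X.T * bisquare_weights(coords, i, bw)
        A = XtW @ X
        beta_i = np.linalg.solve(A, XtW @ y)
        fitted[i] = X[i] @ beta_i
        # Hat-matrix diagonal: S_ii = x_i^T (X^T W_i X)^(-1) X^T W_i e_i
        trace_s += X[i] @ np.linalg.solve(A, XtW[:, i])
    sigma = np.sqrt(((y - fitted) ** 2).sum() / n)
    return (2 * n * np.log(sigma) + n * np.log(2 * np.pi)
            + n * (n + trace_s) / (n - 2 - trace_s))
# Hypothetical data: a slope that varies from west to east on a 15 x 15 grid
n_grid = 15
gi = np.repeat(np.arange(n_grid), n_grid).astype(float)
gj = np.tile(np.arange(n_grid), n_grid).astype(float)
coords = np.column_stack([gi, gj])
rng = np.random.default_rng(4)
x = rng.normal(size=n_grid * n_grid)
y = (1.0 + gj / (n_grid - 1)) * x + rng.normal(0.0, 0.3, n_grid * n_grid)
X = np.column_stack([np.ones_like(x), x])
candidates = [20, 40, 80, 160, 225]
scores = [gwr_aicc(coords, X, y, bw) for bw in candidates]
best_bw = candidates[int(np.argmin(scores))]
print(best_bw)
&lt;/code>&lt;/pre>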
&lt;p>&lt;strong>6. Pooled MGWR (PMGWR)&lt;/strong>.
The naive baseline. It treats the 675 observations as an unstructured cross-section, ignoring that each &lt;code>unit_id&lt;/code> contributes 3 observations (one per period). It therefore cannot remove $\alpha_i$ and produces biased coefficient surfaces. The paper calls this &lt;em>pooled multiscale geographically weighted regression&lt;/em> and uses it as the reference point against which MGWFER is benchmarked.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">
&lt;summary>Example&lt;/summary>
&lt;p>PMGWR returns $\beta_1$ RMSE = 2.30 with a coefficient correlation of &lt;strong>−0.46&lt;/strong> against the truth. Its $\beta_1$ map is &lt;em>anti-correlated&lt;/em> with the real signal, the worst possible outcome for a model that is supposed to recover spatial heterogeneity. It also &amp;ldquo;detects&amp;rdquo; a strongly spatially varying $\beta_4$ that is actually zero everywhere. The pooled estimator fails because the indirect contextual channel makes every covariate a noisy proxy for &lt;code>sc&lt;/code>, and the pooled fit attributes the effect of &lt;code>sc&lt;/code> on &lt;code>y&lt;/code> to the slopes.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">
&lt;summary>Analogy&lt;/summary>
&lt;p>Stitching three photographs of a moving subject without aligning them first. The composite looks like a triple-exposed ghost. Each photograph individually was fine; the lack of alignment ruined the panorama.&lt;/p>
&lt;/details>
&lt;/div>
&lt;p>&lt;strong>7. MGWFER&lt;/strong> — Multiscale Geographically Weighted &lt;strong>F&lt;/strong>ixed &lt;strong>E&lt;/strong>ffects &lt;strong>R&lt;/strong>egression.
The proposed estimator (Li &amp;amp; Fotheringham 2026). A &lt;em>two-stage&lt;/em> algorithm: &lt;strong>Stage 1&lt;/strong> within-transforms the data, standardises, fits MGWR on the demeaned panel, and back-transforms coefficients to the original scale. &lt;strong>Stage 2&lt;/strong> then recovers the individual fixed effects $\alpha_i$ themselves (Eq. 30 of the paper), with t-tests at the unit level. The fixed effect is purged before the spatial smoother runs, so the bandwidth search and the coefficient surface are no longer contaminated, and the recovered $\alpha_i$ become a substantive output, not a nuisance term.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">
&lt;summary>Example&lt;/summary>
&lt;p>MGWFER cuts $\beta_1$ RMSE from PMGWR&amp;rsquo;s 2.30 to &lt;strong>0.18&lt;/strong> (a 92% reduction) and $\beta_4$ RMSE from 1.86 to &lt;strong>0.14&lt;/strong> (also a 92% reduction). The coefficient correlation with truth flips from −0.46 to &lt;strong>+0.82&lt;/strong> for $\beta_1$. Stage 2 recovers $\hat\alpha_i$ with &lt;strong>Pearson correlation ≈1.000 (0.9996)&lt;/strong> against the true spatial-context surface and &lt;strong>RMSE 0.54&lt;/strong> on a 2–52 scale, with 225/225 units significant at 5%. PMGWR places the intrinsic contextual effect in the range [−11, 10]: its maximum is about one-fifth of the true 51.55, and it goes negative where the truth never falls below 2.07. MGWR_cs places it in [2, 22], a span roughly 2.5× narrower than the true one. MGWFER reaches [1.45, 51.62], essentially the truth.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">
&lt;summary>Analogy&lt;/summary>
&lt;p>Aligning then stitching. Subtract the watermark first, focus the camera second, then assemble the panorama. The composite is duller than the contaminated version, because the contamination was bright. But it is correct — and Stage 2 hands you a clean print of the watermark itself.&lt;/p>
&lt;/details>
&lt;/div>
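&lt;p>Two pieces of the algorithm are simple enough to sketch. The hypothetical helper &lt;code>back_transform&lt;/code> rescales coefficients estimated on standardised, demeaned data to the original units, $\beta_k = \beta^*_k \cdot \mathrm{sd}(\tilde y)/\mathrm{sd}(\tilde x_k)$, assuming pooled standard deviations are used. The hypothetical &lt;code>recover_fixed_effects&lt;/code> applies the textbook fixed-effects recovery $\hat\alpha_i = \bar y_i - \sum_k \hat\beta_k(u_i, v_i)\,\bar x_{k,i}$. We assume, without having checked it line by line against the paper, that this matches Eq. 30. To isolate Stage 2, the true slopes stand in for Stage 1 output, and the unit-level t-tests are omitted.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np
rng = np.random.default_rng(5)
n_units, n_time, k = 225, 3, 2
# Hypothetical panel stored as (units, periods, covariates)
alpha = rng.uniform(2.0, 52.0, n_units)              # true fixed effects
beta = rng.uniform(1.0, 2.0, size=(n_units, k))      # one slope vector per unit
X = 0.05 * alpha[:, None, None] + rng.normal(0.0, 0.5, size=(n_units, n_time, k))
noise = rng.normal(0.0, 0.5, size=(n_units, n_time))
y = alpha[:, None] + (X * beta[:, None, :]).sum(axis=2) + noise
def back_transform(beta_std, y_tilde, X_tilde):
    # Undo standardisation: beta_k = beta_std_k * sd(y_tilde) / sd(x_tilde_k),
    # assuming pooled standard deviations of the demeaned variables
    sd_y = y_tilde.std()
    sd_x = X_tilde.reshape(-1, X_tilde.shape[-1]).std(axis=0)
    return beta_std * sd_y / sd_x
def recover_fixed_effects(y, X, beta_hat):
    # Stage 2 sketch: alpha_i = ybar_i - sum_k beta_hat_ik * xbar_ik
    return y.mean(axis=1) - (X.mean(axis=1) * beta_hat).sum(axis=1)
# Demeaned data, and the true slopes expressed on the standardised scale
y_tilde = y - y.mean(axis=1, keepdims=True)
X_tilde = X - X.mean(axis=1, keepdims=True)
beta_std = beta * X_tilde.reshape(-1, k).std(axis=0) / y_tilde.std()
# Stand-in for Stage 1 output: back-transform the standardised slopes
beta_hat = back_transform(beta_std, y_tilde, X_tilde)
alpha_hat = recover_fixed_effects(y, X, beta_hat)
corr = np.corrcoef(alpha_hat, alpha)[0, 1]
rmse = np.sqrt(np.mean((alpha_hat - alpha) ** 2))
print(corr, rmse)
&lt;/code>&lt;/pre>
&lt;p>With exact slopes, $\hat\alpha_i - \alpha_i$ reduces to the unit mean of the noise, $\bar\varepsilon_i$. The recovery error in this sketch therefore comes only from averaging three noisy periods. With estimated slopes, slope error adds to it.&lt;/p>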
&lt;p>&lt;strong>8. Indirect contextual effects&lt;/strong> $\delta_k$.
The bias channel that motivates MGWFER. If unobserved spatial context $sc$ affects the &lt;em>levels&lt;/em> of covariate $x_k$, then OLS / MGWR recovers $\beta_k + \delta_k$ instead of $\beta_k$. The within-transformation severs the $sc \to x_k$ link by removing the time-invariant component of $sc$ from both sides of the regression. This is the paper&amp;rsquo;s key conceptual addition to the MGWR vocabulary.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">
&lt;summary>Example&lt;/summary>
&lt;p>In our DGP we couple every covariate to spatial context (&lt;code>x_k = 0.05·sc + N(0, 0.5)&lt;/code>, paper Eqs. 40-43), so the indirect channel is fully active: &lt;code>Cor(x_k, sc) ≈ 0.84&lt;/code> and &lt;code>Cor(x_4, y) ≈ 0.84&lt;/code> even though &lt;code>β_4 = 0&lt;/code>. The consequence is dramatic — global OLS estimates &lt;code>β_4 ≈ 4.8&lt;/code> (significant at p &amp;lt; 1e-13); cross-sectional MGWR and PMGWR produce &lt;code>β_1&lt;/code> estimates that are &lt;em>anti-correlated&lt;/em> with truth (Corr ≈ -0.4). MGWFER&amp;rsquo;s within-transformation severs the &lt;code>sc → x_k&lt;/code> link and pulls the estimates back to the true values.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">
&lt;summary>Analogy&lt;/summary>
&lt;p>A music studio where humidity (unmeasured) both warps the guitar strings (covariate) and dampens the room acoustics (outcome). If you blame the muffled recording on the guitar tuning, you&amp;rsquo;re confusing $\delta$ (the warp) with $\beta$ (the genuine string-to-sound mapping). Removing the time-invariant part of humidity from the recording is the within-transformation.&lt;/p>
&lt;/details>
&lt;/div>
&lt;h2 id="4-setup-and-imports">4. Setup and imports&lt;/h2>
&lt;p>The analysis uses a &lt;a href="https://github.com/GeoZhipengLi/MGWPR" target="_blank" rel="noopener">custom fork of the mgwr package&lt;/a> that extends MGWR with panel data support (the &lt;code>time&lt;/code> parameter) and the ability to fit without an intercept (&lt;code>constant=False&lt;/code>). We clone the repository and import directly.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
import warnings
warnings.filterwarnings(&amp;quot;ignore&amp;quot;, category=FutureWarning)
warnings.filterwarnings(&amp;quot;ignore&amp;quot;, category=RuntimeWarning)
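# Clone the custom MGWR fork (adds the `time` parameter and `constant=False`)
import subprocess, sys, os
# __file__ is undefined in notebooks; fall back to the working directory there
BASE_DIR = (os.path.dirname(os.path.abspath(__file__))
            if &amp;quot;__file__&amp;quot; in globals() else os.getcwd())
REPO_DIR = os.path.join(BASE_DIR, &amp;quot;mgwpr_repo&amp;quot;)
if not os.path.exists(REPO_DIR):
    subprocess.run(
        [&amp;quot;git&amp;quot;, &amp;quot;clone&amp;quot;, &amp;quot;https://github.com/GeoZhipengLi/MGWPR.git&amp;quot;, REPO_DIR],
        check=True, capture_output=True
    )
sys.path.insert(0, REPO_DIR)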
from mgwr.gwr import GWR, MGWR
from mgwr.sel_bw import Sel_BW
# Configuration
RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)
N_GRID = 15
N_UNITS = N_GRID * N_GRID # 225
N_TIME = 3
N_OBS = N_UNITS * N_TIME # 675
&lt;/code>&lt;/pre>
&lt;details>
&lt;summary>Dark theme figure styling (click to expand)&lt;/summary>
&lt;pre>&lt;code class="language-python">DARK_NAVY = &amp;quot;#0f1729&amp;quot;
GRID_LINE = &amp;quot;#1f2b5e&amp;quot;
LIGHT_TEXT = &amp;quot;#c8d0e0&amp;quot;
WHITE_TEXT = &amp;quot;#e8ecf2&amp;quot;
STEEL_BLUE = &amp;quot;#6a9bcc&amp;quot;
WARM_ORANGE = &amp;quot;#d97757&amp;quot;
TEAL = &amp;quot;#00d4c8&amp;quot;
plt.rcParams.update({
&amp;quot;figure.facecolor&amp;quot;: DARK_NAVY,
&amp;quot;axes.facecolor&amp;quot;: DARK_NAVY,
&amp;quot;axes.edgecolor&amp;quot;: DARK_NAVY,
&amp;quot;axes.linewidth&amp;quot;: 0,
&amp;quot;axes.labelcolor&amp;quot;: LIGHT_TEXT,
&amp;quot;axes.titlecolor&amp;quot;: WHITE_TEXT,
&amp;quot;axes.spines.top&amp;quot;: False,
&amp;quot;axes.spines.right&amp;quot;: False,
&amp;quot;axes.spines.left&amp;quot;: False,
&amp;quot;axes.spines.bottom&amp;quot;: False,
&amp;quot;axes.grid&amp;quot;: True,
&amp;quot;grid.color&amp;quot;: GRID_LINE,
&amp;quot;grid.linewidth&amp;quot;: 0.6,
&amp;quot;grid.alpha&amp;quot;: 0.8,
&amp;quot;xtick.color&amp;quot;: LIGHT_TEXT,
&amp;quot;ytick.color&amp;quot;: LIGHT_TEXT,
&amp;quot;text.color&amp;quot;: WHITE_TEXT,
&amp;quot;font.size&amp;quot;: 12,
&amp;quot;legend.frameon&amp;quot;: False,
&amp;quot;savefig.facecolor&amp;quot;: DARK_NAVY,
&amp;quot;savefig.edgecolor&amp;quot;: DARK_NAVY,
})
&lt;/code>&lt;/pre>
&lt;/details>
&lt;h2 id="5-simulating-panel-data-with-a-spatial-confounder">5. Simulating panel data with a spatial confounder&lt;/h2>
&lt;p>To evaluate whether MGWFER works, we need &lt;strong>ground truth&lt;/strong> — known coefficient surfaces that we can compare against estimates. We follow the paper&amp;rsquo;s DGP (Eqs. 39–45) verbatim, scaled to a 15×15 grid (225 units) observed over 3 time periods, giving 675 total observations. The paper uses a 30×30 grid; we keep a smaller grid so the bandwidth search completes in minutes rather than hours, while still exercising every step of the two-stage algorithm and every result the paper reports.&lt;/p>
&lt;p>The crucial design choice is that &lt;strong>each covariate is generated as a function of spatial context&lt;/strong>: &lt;code>x_kt = N(0, 0.5) + 0.05·sc_i&lt;/code> for &lt;code>k=1..4&lt;/code>. This is the &lt;strong>indirect contextual effect channel&lt;/strong> the paper is built to address — &lt;code>sc&lt;/code> drives &lt;em>both&lt;/em> the outcome (directly) &lt;em>and&lt;/em> the covariate levels (indirectly). When the script runs, it prints the resulting &lt;code>Cor(x_k, sc) ≈ 0.84&lt;/code> for all &lt;code>k&lt;/code>, confirming that the indirect channel is strong. The reduced-form consequence: &lt;code>Cor(x_4, y) = 0.84&lt;/code> even though &lt;code>β_4 = 0&lt;/code> by construction — a textbook spurious correlation that any model failing to condition on &lt;code>sc&lt;/code> will misinterpret as a real effect.&lt;/p>
&lt;p>The data generating process (DGP) has two parts. &lt;strong>The outcome equation&lt;/strong> combines three causally active covariates with known slopes (two spatially varying, one constant) plus the time-invariant spatial context $sc_i$, which acts as the fixed effect (paper Eq. 45):&lt;/p>
&lt;p>$$y_{it} = sc_i + \beta_1(u_i, v_i) \cdot x_{1,it} + \beta_2(u_i, v_i) \cdot x_{2,it} + \beta_3(u_i, v_i) \cdot x_{3,it} + \varepsilon_{it}$$&lt;/p>
&lt;p>Note that &lt;code>x_4&lt;/code> does &lt;em>not&lt;/em> appear here — by construction &lt;code>β_4 ≡ 0&lt;/code>, so &lt;code>x_4&lt;/code> has no causal effect on &lt;code>y&lt;/code>. &lt;strong>The covariate equation&lt;/strong> is the part that activates the indirect contextual channel (paper Eqs. 40–43):&lt;/p>
&lt;p>$$x_{k,it} = 0.05 \cdot sc_i + \nu_{k,it}, \quad \nu_{k,it} \sim N(0, 0.5), \quad k = 1, 2, 3, 4.$$&lt;/p>
&lt;p>In words, every covariate is a noisy linear function of spatial context. Here, as in the code below, the second argument of $N(\cdot,\cdot)$ is a standard deviation, so $\nu_{k,it}$ has standard deviation 0.5. Even &lt;code>x_4&lt;/code>, which has no causal effect on &lt;code>y&lt;/code>, shares the common parent &lt;code>sc&lt;/code> with &lt;code>y&lt;/code>. That is why any non-FE model will pick up its spurious correlation with &lt;code>y&lt;/code> as a &amp;ldquo;real&amp;rdquo; effect.&lt;/p>
&lt;p>&lt;strong>Variable mapping:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>$sc_i$ = &lt;code>alpha_true&lt;/code> — paper Eq. 39: &lt;code>30·(exp(j/15) − 1)&lt;/code>, range 2.07 to 51.55 (mean 23.29).&lt;/li>
&lt;li>$\beta_1$ = &lt;code>beta_1_true&lt;/code> — a quadratic dome peaking at the grid center (range 1.06 to 2.00).&lt;/li>
&lt;li>$\beta_2$ = &lt;code>beta_2_true&lt;/code> — a linear gradient increasing from lower-left to upper-right (range 1.07 to 2.00).&lt;/li>
&lt;li>$\beta_3$ = &lt;code>beta_3_true&lt;/code> — constant at 1.5 everywhere (tests spatial homogeneity).&lt;/li>
&lt;li>$\beta_4$ = &lt;code>beta_4_true&lt;/code> — identically zero everywhere (tests false-positive detection).&lt;/li>
&lt;li>$\varepsilon_{it} \sim N(0, 0.5)$ — independent random noise with standard deviation 0.5 (paper Eq. 44).&lt;/li>
&lt;/ul>
&lt;pre>&lt;code class="language-python">rng = np.random.default_rng(RANDOM_SEED)
# Spatial grid coordinates
grid_i = np.repeat(np.arange(1, N_GRID + 1), N_GRID)
grid_j = np.tile(np.arange(1, N_GRID + 1), N_GRID)
# True spatially varying coefficients
q = np.ceil(N_GRID / 4)
beta_1_true = 1 + ((q**2 - (q - grid_i/2)**2) * (q**2 - (q - grid_j/2)**2)) / q**4
beta_2_true = 1 + (grid_i + grid_j) / (2 * N_GRID)
beta_3_true = np.full(N_UNITS, 1.5)
beta_4_true = np.zeros(N_UNITS)
# Time-invariant spatial context (paper Eq. 39)
alpha_true = 30 * (np.exp(grid_j / N_GRID) - 1)
sc_repeat = np.repeat(alpha_true, N_TIME)
# Paper Eqs. 40-43: covariates depend on sc (indirect contextual channel)
SIGMA_X, SC_COUPLING = 0.5, 0.05
x1 = SIGMA_X * rng.standard_normal(N_OBS) + SC_COUPLING * sc_repeat
x2 = SIGMA_X * rng.standard_normal(N_OBS) + SC_COUPLING * sc_repeat
x3 = SIGMA_X * rng.standard_normal(N_OBS) + SC_COUPLING * sc_repeat
x4 = SIGMA_X * rng.standard_normal(N_OBS) + SC_COUPLING * sc_repeat # null effect
# Paper Eq. 44-45: epsilon ~ N(0, 0.5) and y excludes beta_4 * x_4
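b1, b2, b3 = (np.repeat(beta_1_true, N_TIME),
              np.repeat(beta_2_true, N_TIME),
              np.repeat(beta_3_true, N_TIME))
epsilon = 0.5 * rng.standard_normal(N_OBS)
y = sc_repeat + b1*x1 + b2*x2 + b3*x3 + epsilon
# Indirect channel: every covariate tracks sc
for name, xk in [(&amp;quot;x1&amp;quot;, x1), (&amp;quot;x2&amp;quot;, x2), (&amp;quot;x3&amp;quot;, x3), (&amp;quot;x4&amp;quot;, x4)]:
    print(f&amp;quot;Cor({name}, sc) = {np.corrcoef(xk, sc_repeat)[0, 1]:.3f}&amp;quot;)
# Spurious: beta_4 is zero, so this correlation runs entirely through sc
print(f&amp;quot;Cor(x4, y) = {np.corrcoef(x4, y)[0, 1]:.3f} &amp;quot;
      f&amp;quot;(non-causal correlation via sc)&amp;quot;)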
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> Cor(x1, sc) = 0.840
Cor(x2, sc) = 0.840
Cor(x3, sc) = 0.832
Cor(x4, sc) = 0.840
Cor(x4, y) = 0.840 (non-causal correlation via sc)
&lt;/code>&lt;/pre>
&lt;p>The numbers are blunt. Each covariate has a correlation of about 0.84 with spatial context. &lt;em>Because of that&lt;/em>, &lt;code>x_4&lt;/code> also has a correlation of 0.84 with &lt;code>y&lt;/code>, even though its causal effect is zero. A regression that fails to condition on &lt;code>sc&lt;/code> will gladly assign &lt;code>x_4&lt;/code> a large, significant slope. That is the indirect contextual effects bias mechanism, made concrete.&lt;/p>
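&lt;p>The value 0.84 is not a sampling accident; it follows from the DGP. Each covariate is $x_k = 0.5\,z + 0.05\,sc$ with independent standard-normal $z$. Its population correlation with the confounder is therefore $0.05\,\sigma_{sc} / \sqrt{0.05^2\sigma_{sc}^2 + 0.5^2}$. The sketch below evaluates this value, assuming the same 15×15 grid, so that $sc$ depends only on the column index and every column carries equal weight:&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np

SIGMA_X, SC_COUPLING = 0.5, 0.05
j = np.arange(1, 16)                      # sc varies only by grid column
sc = 30 * (np.exp(j / 15) - 1)            # each column holds the same number of rows
sd_sc = sc.std()                          # population standard deviation across units

signal = SC_COUPLING * sd_sc              # standard deviation of the sc part of x_k
rho = signal / np.sqrt(signal ** 2 + SIGMA_X ** 2)
print(f'sd(sc) = {sd_sc:.2f}, implied Cor(x_k, sc) = {rho:.3f}')
&lt;/code>&lt;/pre>
&lt;p>The sample values of 0.832 to 0.840 printed above scatter around this population value. Because $sc$ dominates the variance of $y$, $\operatorname{Cor}(x_4, y)$ lands at almost the same number.&lt;/p>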
&lt;p>The figure below shows the true coefficient surfaces and the confounder pattern on the 15x15 grid.&lt;/p>
&lt;pre>&lt;code class="language-python">fig, axes = plt.subplots(2, 2, figsize=(12, 11))
# ... plotting code for true coefficient surfaces ...
plt.savefig(&amp;quot;mgwrfer_true_coefficients.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="mgwrfer_true_coefficients.png" alt="True DGP coefficient surfaces: beta_1 shows a quadratic dome, beta_2 a linear gradient, beta_3 is constant at 1.5, and alpha_i is an exponential confounder dominating the cross-sectional variation.">&lt;/p>
&lt;p>The contrast is stark: $\alpha_i$ (lower-right panel) has a range of nearly 50 units, while the coefficients $\beta_1$ through $\beta_3$ vary by at most 1 unit. Any cross-sectional model that cannot separate $\alpha_i$ from the slopes will produce severely biased estimates — the exponential fixed-effect pattern will &amp;ldquo;leak&amp;rdquo; into the coefficient surfaces, distorting their true shapes.&lt;/p>
&lt;h2 id="6-global-model-baselines-replicating-paper-table-2">6. Global model baselines: replicating paper Table 2&lt;/h2>
&lt;p>Before fitting any local model, we run three &lt;em>global&lt;/em> benchmarks that mirror the paper&amp;rsquo;s Table 2:&lt;/p>
&lt;ul>
&lt;li>cross-sectional OLS (period 0 only);&lt;/li>
&lt;li>pooled OLS (all 675 obs);&lt;/li>
&lt;li>the individual fixed-effects (FE) estimator via the within-transformation.&lt;/li>
&lt;/ul>
&lt;p>None of these models lets coefficients vary over space, so each returns a single number per coefficient. Together, though, they show in the simplest possible form that the indirect contextual effect bites hard and that the FE within-transformation fixes it.&lt;/p>
&lt;pre>&lt;code class="language-python">import statsmodels.api as sm
cols = [&amp;quot;x1&amp;quot;, &amp;quot;x2&amp;quot;, &amp;quot;x3&amp;quot;, &amp;quot;x4&amp;quot;]
# (a) Cross-sectional OLS on period 0 (225 obs)
mask_t0 = panel_df[&amp;quot;time_id&amp;quot;] == 0
ols_cs = sm.OLS(
    panel_df.loc[mask_t0, &amp;quot;y&amp;quot;].values,
    sm.add_constant(panel_df.loc[mask_t0, cols].values),
).fit()
# (b) Pooled OLS on all 675 obs
ols_pool = sm.OLS(
    panel_df[&amp;quot;y&amp;quot;].values,
    sm.add_constant(panel_df[cols].values),
).fit()
# (c) Individual FE = within-transformation + OLS (no intercept)
um = panel_df.groupby(&amp;quot;unit_id&amp;quot;)[[&amp;quot;y&amp;quot;] + cols].transform(&amp;quot;mean&amp;quot;)
y_w = panel_df[&amp;quot;y&amp;quot;].values - um[&amp;quot;y&amp;quot;].values
X_w = panel_df[cols].values - um[cols].values
# Note: sm.OLS uses NT - K = 671 residual df here; the FE-correct value is
# NT - N - K = 446, so the reported standard errors are somewhat too small.
fe_global = sm.OLS(y_w, X_w).fit()
# Implied fixed effects alpha_i = ybar_i - xbar_i' beta_FE (table row mean(alpha_i))
alpha_fe = um[&amp;quot;y&amp;quot;].values - um[cols].values @ fe_global.params
print(f&amp;quot;mean(alpha_i) = {alpha_fe.mean():.2f}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;p>The numbers (Table 2 replication):&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Coefficient&lt;/th>
&lt;th>TRUE (grid mean)&lt;/th>
&lt;th>OLS (cross-section)&lt;/th>
&lt;th>Pooled OLS&lt;/th>
&lt;th>&lt;strong>Individual FE&lt;/strong>&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>$\beta_1$&lt;/td>
&lt;td>1.50&lt;/td>
&lt;td>5.48***&lt;/td>
&lt;td>6.14***&lt;/td>
&lt;td>&lt;strong>1.57&lt;/strong>*&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>$\beta_2$&lt;/td>
&lt;td>1.53&lt;/td>
&lt;td>5.69***&lt;/td>
&lt;td>6.35***&lt;/td>
&lt;td>&lt;strong>1.54&lt;/strong>*&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>$\beta_3$&lt;/td>
&lt;td>1.50&lt;/td>
&lt;td>6.09***&lt;/td>
&lt;td>5.79***&lt;/td>
&lt;td>&lt;strong>1.55&lt;/strong>*&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>$\beta_4$&lt;/td>
&lt;td>0.00&lt;/td>
&lt;td>4.82***&lt;/td>
&lt;td>4.16***&lt;/td>
&lt;td>&lt;strong>0.02 (n.s.)&lt;/strong>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>mean($\alpha_i$)&lt;/td>
&lt;td>23.29&lt;/td>
&lt;td>(intercept)&lt;/td>
&lt;td>(intercept)&lt;/td>
&lt;td>&lt;strong>23.23&lt;/strong>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>The pattern is the paper&amp;rsquo;s headline result on a single screen:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>OLS and pooled OLS&lt;/strong> estimate every true slope roughly 4× too high. The paper reports similar values: 6.05, 5.93, and 6.15 for the first three, and 4.59 for the fourth. Both models spuriously declare &lt;code>x_4&lt;/code> significant at p &amp;lt; 1e-13, even though &lt;code>β_4 = 0&lt;/code>. The model has nowhere to put the influence of &lt;code>sc&lt;/code> except into the slopes. This is exactly Wooldridge&amp;rsquo;s Eq. 8 from Section 3, where $\hat\beta_k = \beta_k + \delta_k$.&lt;/li>
&lt;li>&lt;strong>Individual FE&lt;/strong> recovers all three true slopes (1.57, 1.54, 1.55), correctly returns &lt;code>β_4 ≈ 0&lt;/code> (p = 0.66, not significant), and reconstructs the mean of &lt;code>α_i&lt;/code> to within 0.06 of truth. The within-transformation neutralises &lt;code>δ_k&lt;/code> and identification is restored.&lt;/li>
&lt;/ul>
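&lt;p>The contrast between the pooled and FE columns can be reproduced in a few lines. The sketch below uses a deliberately stripped-down, hypothetical DGP, not the paper&amp;rsquo;s:&lt;/p>
&lt;ul>
&lt;li>a time-invariant confounder drawn uniformly on [0, 50] for 225 units over 3 periods;&lt;/li>
&lt;li>one covariate with true slope 1.5 and one null covariate;&lt;/li>
&lt;li>both covariates loading on the confounder with the same 0.05 coupling and 0.5 noise used above.&lt;/li>
&lt;/ul>
&lt;p>Both estimators are computed with plain &lt;code>numpy.linalg.lstsq&lt;/code>.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np

rng = np.random.default_rng(0)
n_units, n_time = 225, 3
unit = np.repeat(np.arange(n_units), n_time)            # unit index of each row
sc = rng.uniform(0, 50, n_units)[unit]                  # time-invariant confounder
x1 = 0.5 * rng.standard_normal(unit.size) + 0.05 * sc   # causal covariate, slope 1.5
x2 = 0.5 * rng.standard_normal(unit.size) + 0.05 * sc   # null covariate, slope 0
y = sc + 1.5 * x1 + 0.5 * rng.standard_normal(unit.size)

# Pooled OLS with an intercept: the confounder variation can only load onto the slopes
X_pool = np.column_stack([np.ones(unit.size), x1, x2])
b_pool = np.linalg.lstsq(X_pool, y, rcond=None)[0][1:]

def demean(v):
    # Within-transformation: subtract each unit mean from that unit rows
    unit_means = np.bincount(unit, weights=v) / n_time
    return v - unit_means[unit]

# Within (FE) estimator: OLS on demeaned data, no intercept
X_w = np.column_stack([demean(x1), demean(x2)])
b_fe = np.linalg.lstsq(X_w, demean(y), rcond=None)[0]
print('pooled OLS slopes:', b_pool.round(2))
print('within (FE) slopes:', b_fe.round(2))
&lt;/code>&lt;/pre>
&lt;p>The pooled slopes land far above the truth, because the confounder&amp;rsquo;s influence is split across both proxies. The within estimator returns values close to 1.5 and 0.&lt;/p>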
&lt;p>What FE &lt;em>cannot&lt;/em> do is tell us where each effect varies across space — it returns one number per coefficient. That is exactly the gap MGWR, PMGWR, and MGWFER are designed to fill. Among them, only MGWFER inherits the FE estimator&amp;rsquo;s clean identification while delivering location-specific surfaces.&lt;/p>
&lt;h2 id="7-pooled-mgwr-pmgwr-the-naive-baseline">7. Pooled MGWR (PMGWR): the naive baseline&lt;/h2>
&lt;p>The simplest approach ignores the panel structure entirely, treating all 675 observations as independent cross-sectional data and fitting MGWR with an intercept. This is what a researcher might do if they stacked multiple time periods without accounting for unit-specific effects.&lt;/p>
&lt;p>The custom &lt;code>mgwr&lt;/code> package requires variables to be &lt;strong>standardized&lt;/strong> before multiscale bandwidth selection. The &lt;code>time=N_TIME&lt;/code> parameter tells the algorithm that observations are grouped in panels of 3 time periods per unit, which affects the kernel weighting.&lt;/p>
&lt;pre>&lt;code class="language-python"># Sel_BW and MGWR come from the custom, panel-aware mgwr package used in this post
# (the time= argument is not part of the standard PySAL mgwr release).
# Standardize raw data
Y_std_pooled = (Y_raw - Y_raw.mean()) / Y_raw.std()
X_std_pooled = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)
# Bandwidth selection and fitting
pooled_selector = Sel_BW(
    coords_panel, Y_std_pooled, X_std_pooled,
    multi=True, constant=True, time=N_TIME
)
pooled_bw = pooled_selector.search()  # first entry is the local intercept
pooled_model = MGWR(
    coords_panel, Y_std_pooled, X_std_pooled,
    pooled_selector, constant=True, time=N_TIME
).fit()
print(f&amp;quot;Pooled MGWR bandwidths: {pooled_bw}&amp;quot;)
print(f&amp;quot;Pooled MGWR R-squared: {pooled_model.R2:.4f}&amp;quot;)
print(f&amp;quot;Pooled MGWR Adj. R-squared: {pooled_model.adj_R2:.4f}&amp;quot;)
print(f&amp;quot;Pooled MGWR AICc: {pooled_model.aicc:.2f}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Pooled MGWR bandwidths: [44. 46. 50. 50. 46.]
Pooled MGWR R-squared: 0.9886
Pooled MGWR Adj. R-squared: 0.9877
Pooled MGWR AICc: -998.18
&lt;/code>&lt;/pre>
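&lt;p>Because MGWR is fit on standardized data, every slope must be rescaled before it can be compared with the truth. The rescaling is the same as for ordinary standardized regression: $\beta_k = \beta_k^{S} \cdot \sigma_y / \sigma_{x_k}$. The minimal check below uses a global OLS fit on synthetic data, not the post&amp;rsquo;s panel. It confirms that the rescaled standardized slopes coincide with the slopes from the raw-scale fit:&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np

rng = np.random.default_rng(1)
n = 500
X = rng.normal(size=(n, 2)) * np.array([2.0, 0.3])    # regressors on very different scales
y = 3.0 + X @ np.array([1.5, -4.0]) + rng.normal(scale=0.5, size=n)

# Fit on standardized data (no intercept needed: every column is centred)
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
ys = (y - y.mean()) / y.std()
beta_std = np.linalg.lstsq(Xs, ys, rcond=None)[0]

# Back-transform: beta_k = beta_std_k * sd(y) / sd(x_k)
beta_back = beta_std * y.std() / X.std(axis=0)

# Reference: OLS with an intercept on the raw data
beta_raw = np.linalg.lstsq(np.column_stack([np.ones(n), X]), y, rcond=None)[0][1:]
print(beta_back.round(4), beta_raw.round(4))
&lt;/code>&lt;/pre>
&lt;p>The identity is exact. Standardizing divides each column by a constant, and OLS slopes scale inversely with their regressors and proportionally with the outcome.&lt;/p>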
&lt;p>After back-transforming the standardized coefficients to the original scale, we compute recovery metrics against the known truth:&lt;/p>
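&lt;pre>&lt;code class="language-python"># Back-transform: beta_orig = beta_std * (sd_y / sd_x), dropping the intercept column
beta_pooled = pooled_model.params[:, 1:] * (Y_raw.std() / X_raw.std(axis=0))
# Average per unit across time periods (rows are ordered unit by unit, N_TIME rows each)
beta_pooled_unit = beta_pooled.reshape(N_UNITS, N_TIME, 4).mean(axis=1)
true_betas = [beta_1_true, beta_2_true, beta_3_true, beta_4_true]
for k, truth in enumerate(true_betas):
    est = beta_pooled_unit[:, k]
    rmse = np.sqrt(np.mean((est - truth) ** 2))
    corr = np.corrcoef(est, truth)[0, 1]  # nan when truth is constant (beta_3, beta_4)
    print(f&amp;quot; beta{k + 1}_pooled: RMSE={rmse:.4f}, Corr={corr:.4f}&amp;quot;)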
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> beta1_pooled: RMSE=2.3003, Corr=-0.4575
beta2_pooled: RMSE=1.9489, Corr=0.2163
beta3_pooled: RMSE=1.7485, Corr=nan
beta4_pooled: RMSE=1.8612, Corr=nan
&lt;/code>&lt;/pre>
&lt;p>The R-squared of 0.989 looks impressive, but it is misleading on three counts.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>First&lt;/strong>, the local intercept (bandwidth = 44) absorbs much of the spatial variation from &lt;code>sc_i&lt;/code>. That inflates the apparent model fit even as the slope coefficients are badly wrong.&lt;/li>
&lt;li>&lt;strong>Second&lt;/strong>, the correlation of $\hat\beta_1$ with truth is &lt;strong>−0.46&lt;/strong>, so the estimated $\beta_1$ surface is &lt;em>anti-correlated&lt;/em> with the real signal. The RMSE tells the same story: a flat guess of 1.5 everywhere would score an RMSE of about 0.27, against PMGWR&amp;rsquo;s 2.30.&lt;/li>
&lt;li>&lt;strong>Third&lt;/strong>, $\beta_4$, which is truly zero, has an RMSE of 1.86. PMGWR has no way to separate &lt;code>sc&lt;/code>&amp;rsquo;s direct effect on &lt;code>y&lt;/code> from &lt;code>sc&lt;/code>&amp;rsquo;s effect on &lt;code>x_4&lt;/code>.&lt;/li>
&lt;/ul>
&lt;p>The &lt;code>nan&lt;/code> correlations for $\beta_3$ and $\beta_4$ are mathematically expected. The true values have zero variance (constant and zero, respectively), so the Pearson correlation is undefined.&lt;/p>
&lt;p>Compare this with the global FE results from Section 6. The &lt;em>global&lt;/em> FE estimator nails $\beta_1 = 1.57$ and $\beta_4 = 0.02$, but it gives a single number, not a surface. PMGWR offers surfaces but corrupts them. MGWFER will give us both.&lt;/p>
&lt;h2 id="8-mgwfer-stage-1-removing-the-confounder">8. MGWFER Stage 1: removing the confounder&lt;/h2>
&lt;p>Algorithm 1 of Li &amp;amp; Fotheringham (2026) has two stages. &lt;strong>Stage 1&lt;/strong> estimates the spatially varying slopes after removing the fixed effect. &lt;strong>Stage 2&lt;/strong> (Section 9 below) reconstructs the fixed effect itself from the unit means. We work through Stage 1 here.&lt;/p>
&lt;h3 id="81-the-within-transformation">8.1 The within-transformation&lt;/h3>
&lt;p>The fix is elegant. If the confounder $\alpha_i$ does not change over time, we can eliminate it by subtracting each unit&amp;rsquo;s temporal mean from all its observations. This is the &lt;em>within-transformation&lt;/em> — the workhorse of panel data econometrics. Think of it like zeroing a kitchen scale: you subtract the weight of the container (the fixed effect) so that only the contents (the covariate effects) remain.&lt;/p>
&lt;p>Formally, for each unit $i$:&lt;/p>
&lt;p>$$\tilde{y}_{it} = y_{it} - \bar{y}_i = \beta_1(u_i, v_i)(x_{1,it} - \bar{x}_{1,i}) + \cdots + \beta_4(u_i, v_i)(x_{4,it} - \bar{x}_{4,i}) + (\varepsilon_{it} - \bar{\varepsilon}_i)$$&lt;/p>
&lt;p>In words, this says: after subtracting the unit mean $\bar{y}_i$, the fixed effect $\alpha_i$ vanishes completely (since $\alpha_i - \alpha_i = 0$). What remains are the within-unit deviations of the covariates multiplied by their true spatially varying coefficients, plus demeaned noise. The key &lt;strong>causal assumption&lt;/strong> is that no &lt;em>time-varying&lt;/em> confounders exist — strict exogeneity conditional on the fixed effects.&lt;/p>
&lt;p>&lt;strong>Variable mapping:&lt;/strong> $\tilde{y}_{it}$ corresponds to &lt;code>y_within&lt;/code> in the code, $\bar{y}_i$ is computed via &lt;code>groupby(&amp;quot;unit_id&amp;quot;).transform(&amp;quot;mean&amp;quot;)&lt;/code>, and the demeaned covariates are &lt;code>x1_within&lt;/code> through &lt;code>x4_within&lt;/code>.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np
import pandas as pd
# Assemble panel DataFrame (see script.py for full construction)
# panel_df contains: unit_id, time_id, coord_i, coord_j, y, x1-x4, true coefficients
cols = [&amp;quot;x1&amp;quot;, &amp;quot;x2&amp;quot;, &amp;quot;x3&amp;quot;, &amp;quot;x4&amp;quot;]
# Within-transformation: subtract unit means
unit_means = panel_df.groupby(&amp;quot;unit_id&amp;quot;)[[&amp;quot;y&amp;quot;] + cols].transform(&amp;quot;mean&amp;quot;)
y_within = (panel_df[&amp;quot;y&amp;quot;].values - unit_means[&amp;quot;y&amp;quot;].values).reshape(-1, 1)
X_within = panel_df[cols].values - unit_means[cols].values  # shape (675, 4)
# Check: every unit mean of y_within should now be zero up to rounding error
max_unit_mean = (pd.Series(y_within.ravel(), index=panel_df.index)
                 .groupby(panel_df[&amp;quot;unit_id&amp;quot;]).mean().abs().max())
print(f&amp;quot; y_within range: [{y_within.min():.3f}, {y_within.max():.3f}]&amp;quot;)
print(&amp;quot; Fixed effects removed (mean of y_within per unit = 0)&amp;quot;)
print(f&amp;quot; Max unit mean after demeaning: {max_unit_mean:.2e} (should be ~0)&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> y_within range: [-6.877, 6.923]
Fixed effects removed (mean of y_within per unit = 0)
Max unit mean after demeaning: 7.11e-15 (should be ~0)
&lt;/code>&lt;/pre>
&lt;p>The demeaned outcome spans only [-6.88, 6.92], a spread of 13.8, compared with the raw $y$ range of [-4.07, 57.41] (a spread of 61.5). The confounder, which ranged from 2.07 to 51.55, has been removed. The maximum unit mean after demeaning is $7.11 \times 10^{-15}$, effectively machine zero, which confirms that the transformation is numerically exact. With $\alpha_i$ gone, the remaining variation in the demeaned outcome comes only from the covariates&amp;rsquo; spatially varying effects and noise.&lt;/p>
&lt;h3 id="82-mgwr-on-demeaned-data">8.2 MGWR on demeaned data&lt;/h3>
&lt;p>Now we fit MGWR on the within-transformed data. Two critical settings distinguish this from the pooled model:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>&lt;code>constant=False&lt;/code>&lt;/strong> — since demeaning removes the intercept (the unit-level mean is already gone), we fit slopes only.&lt;/li>
&lt;li>&lt;strong>Standardization&lt;/strong> — we standardize the demeaned variables before bandwidth selection, then back-transform the coefficients to the original scale.&lt;/li>
&lt;/ol>
&lt;pre>&lt;code class="language-python"># Standardize demeaned data
Y_std_fe = (y_within - y_within.mean()) / y_within.std()
X_std_fe = (X_within - X_within.mean(axis=0)) / X_within.std(axis=0)
# Bandwidth selection (no intercept)
fe_selector = Sel_BW(
coords_panel, Y_std_fe, X_std_fe,
multi=True, constant=False, time=N_TIME
)
fe_bw = fe_selector.search()
# Fit MGWFER (Stage 1)
fe_model = MGWR(
coords_panel, Y_std_fe, X_std_fe,
fe_selector, constant=False, time=N_TIME
).fit()
print(f&amp;quot; MGWFER bandwidths: {fe_bw}&amp;quot;)
print(f&amp;quot; MGWFER R-squared: {fe_model.R2:.4f}&amp;quot;)
print(f&amp;quot; MGWFER Adj. R-squared: {fe_model.adj_R2:.4f}&amp;quot;)
print(f&amp;quot; MGWFER AICc: {fe_model.aicc:.2f}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> MGWFER bandwidths: [ 50. 91. 116. 62.]
MGWFER R-squared: 0.8900
MGWFER Adj. R-squared: 0.8844
MGWFER AICc: 496.09
&lt;/code>&lt;/pre>
&lt;p>The R-squared of 0.890 measures explanatory power for the &lt;em>demeaned&lt;/em> outcome. It is therefore not directly comparable to PMGWR&amp;rsquo;s 0.989, which is computed on raw $y$ dominated by the confounder. A fairer reading: the spatially varying slopes explain 89% of the within-unit temporal variation.&lt;/p>
&lt;p>Back-transforming the standardised coefficients to the original scale uses the rescaling factor from the paper&amp;rsquo;s Equation 29: $\hat\beta_{bwk}(u_i, v_i) = \hat\beta_{bwk}^S(u_i, v_i) \cdot \sigma_{\ddot Y} / \sigma_{\ddot X_k}$. We then average per unit across time periods to get one slope per location.&lt;/p>
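&lt;pre>&lt;code class="language-python"># Eq. 29: rescale standardised slopes to the original scale, then average per unit
beta_fe = fe_model.params * (y_within.std() / X_within.std(axis=0))
fe_params_by_unit = beta_fe.reshape(N_UNITS, N_TIME, 4).mean(axis=1)
for k, truth in enumerate([beta_1_true, beta_2_true, beta_3_true, beta_4_true]):
    est = fe_params_by_unit[:, k]
    rmse, corr = np.sqrt(np.mean((est - truth) ** 2)), np.corrcoef(est, truth)[0, 1]
    print(f&amp;quot; beta{k + 1}_mgwfer: RMSE={rmse:.4f}, Corr={corr:.4f}&amp;quot;)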
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> beta1_mgwfer: RMSE=0.1793, Corr=0.8179
beta2_mgwfer: RMSE=0.1050, Corr=0.9407
beta3_mgwfer: RMSE=0.0724, Corr=nan
beta4_mgwfer: RMSE=0.1399, Corr=nan
&lt;/code>&lt;/pre>
&lt;p>The improvement is across the board. RMSE drops by 92–96% for every coefficient compared to PMGWR. The correlation of $\hat\beta_1$ with truth &lt;strong>flips sign&lt;/strong> from −0.46 to +0.82: where PMGWR gets the dome pattern &lt;em>backwards&lt;/em>, MGWFER aligns with it. The null coefficient $\beta_4$ drops from RMSE 1.86 to 0.14 (a 92% reduction), so the false-positive contamination from the indirect channel is largely gone. Even $\beta_3$ (truly constant at 1.5) drops from RMSE 1.75 to 0.07 (96%), because the same demeaning that protects $\beta_1$ also protects every other slope. Section 11 below has the full numerical comparison.&lt;/p>
&lt;h2 id="9-mgwfer-stage-2-recovering-the-fixed-effects-hatalpha_i">9. MGWFER Stage 2: recovering the fixed effects $\hat\alpha_i$&lt;/h2>
&lt;p>Stage 1 gave us the slopes. Stage 2 of Algorithm 1 hands us back the fixed effects $\alpha_i$ themselves — the &lt;strong>intrinsic contextual effects&lt;/strong> in the paper&amp;rsquo;s typology. In standard panel econometrics these are nuisance parameters; in geography they are exactly the quantity that captures &amp;ldquo;the role of place.&amp;rdquo; Equation 30 of the paper does the arithmetic in one line:&lt;/p>
&lt;p>$$\hat\alpha_i = \bar y_i - \sum_{k=1}^{K} \hat\beta_{bwk}(u_i, v_i) \cdot \bar x_{ik}.$$&lt;/p>
&lt;p>In words: take each unit&amp;rsquo;s mean outcome, subtract the contribution of the unit&amp;rsquo;s mean covariates evaluated at the local slopes. What&amp;rsquo;s left is whatever cannot be explained by the observed covariates at this location — i.e., the unmeasured place effect. The derivation parallels the textbook FE result, but with location-specific slopes substituted for the global $\hat\beta$.&lt;/p>
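&lt;p>Averaging the model over time for unit $i$ shows what Equation 30 does and where its error comes from. Take unit means of $y_{it} = \alpha_i + \sum_k \beta_k(u_i, v_i)\, x_{k,it} + \varepsilon_{it}$ and substitute them into Equation 30. This gives&lt;/p>
&lt;p>$$\hat\alpha_i - \alpha_i = \bar\varepsilon_i - \sum_{k=1}^{K} \left[\hat\beta_{bwk}(u_i, v_i) - \beta_k(u_i, v_i)\right] \bar x_{ik}.$$&lt;/p>
&lt;p>The recovered fixed effect is therefore off by two terms. The first is the time-averaged noise, whose standard deviation in this DGP is $0.5/\sqrt{3} \approx 0.29$. The second is the Stage 1 slope error, weighted by the unit&amp;rsquo;s mean covariates. Even perfect slopes would leave an RMSE of roughly 0.29; anything beyond that reflects slope estimation error.&lt;/p>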
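&lt;pre>&lt;code class="language-python"># Per-unit means (assumes unit_id follows the same grid order as alpha_true)
unit_y_mean = panel_df.groupby(&amp;quot;unit_id&amp;quot;)[&amp;quot;y&amp;quot;].mean().values
unit_x_means = (panel_df.groupby(&amp;quot;unit_id&amp;quot;)[[&amp;quot;x1&amp;quot;,&amp;quot;x2&amp;quot;,&amp;quot;x3&amp;quot;,&amp;quot;x4&amp;quot;]]
                .mean().values)
# Per-unit slopes from Stage 1 (already back-transformed and averaged), shape (225, 4)
beta_unit = fe_params_by_unit
# Eq. 30: alpha_i = ybar_i - sum_k beta_k(u_i, v_i) * xbar_ik
alpha_hat = unit_y_mean - np.sum(beta_unit * unit_x_means, axis=1)
rmse_alpha = np.sqrt(np.mean((alpha_hat - alpha_true) ** 2))
corr_alpha = np.corrcoef(alpha_hat, alpha_true)[0, 1]
print(f&amp;quot; alpha_hat range: [{alpha_hat.min():.3f}, {alpha_hat.max():.3f}], mean={alpha_hat.mean():.3f}&amp;quot;)
print(f&amp;quot; True alpha range: [{alpha_true.min():.3f}, {alpha_true.max():.3f}], mean={alpha_true.mean():.3f}&amp;quot;)
print(f&amp;quot; alpha_hat recovery: RMSE={rmse_alpha:.4f}, Corr={corr_alpha:.4f}&amp;quot;)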
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> alpha_hat range: [1.445, 51.622], mean=23.060
True alpha range: [2.068, 51.548], mean=23.286
alpha_hat recovery: RMSE=0.5398, Corr=0.9996
&lt;/code>&lt;/pre>
&lt;p>Stage 2&amp;rsquo;s recovery is exceptional. The estimated fixed-effects surface tracks the true spatial-context surface with a &lt;strong>Pearson correlation of ≈1.000&lt;/strong> (raw value 0.9996) and an &lt;strong>RMSE of 0.54&lt;/strong>, against a range that spans 50 units. The mean estimate (23.06) is within 0.23 of the true mean (23.29). The estimated range [1.45, 51.62] is nearly identical to the true [2.07, 51.55], with a 0.6-unit undershoot at the low end.&lt;/p>
&lt;p>The other local models do much worse. MGWR_cs&amp;rsquo;s intercept compresses the range to [2, 22] (correlation 0.84). PMGWR&amp;rsquo;s intercept shifts it to [−11, 10]; the correlation of 0.98 shows the shape survives, but the scale is badly wrong. MGWFER, by contrast, pulls the truth out cleanly.&lt;/p>
&lt;p>A note on the PMGWR range: the negatively shifted intercept is the standardised local intercept times &lt;code>σ_y&lt;/code>, i.e., the deviation from the global mean of &lt;code>y&lt;/code>, not the absolute level. MGWR_cs&amp;rsquo;s intercept, by contrast, has been shifted back to the original outcome scale. The contrast that matters is &lt;em>spread&lt;/em>: MGWR_cs and PMGWR both compress it by roughly 2.3 to 2.5×, while MGWFER recovers the full 50-unit range.&lt;/p>
&lt;p>&lt;strong>Inference for $\hat\alpha_i$.&lt;/strong> The paper develops a per-unit t-test by combining MGWR&amp;rsquo;s variance machinery with the within-transformation&amp;rsquo;s degrees-of-freedom adjustment. The three formulas you need (Eqs. 32, 33, 36 of the paper) are:&lt;/p>
&lt;p>$$\hat\sigma^2 = \frac{T}{T-1} \cdot \sigma_{\ddot Y}^2 \cdot \hat\sigma_s^2, \quad \operatorname{Var}[\hat\alpha_i] = \frac{\hat\sigma^2}{T} + \bar x_i^\top \operatorname{Var}[\hat\beta_i] \bar x_i, \quad t_i = \frac{\hat\alpha_i}{\sqrt{\operatorname{Var}[\hat\alpha_i]}}.$$&lt;/p>
&lt;p>The first equation rescales MGWR&amp;rsquo;s residual variance back to the original (un-standardised) scale; the second propagates that uncertainty through Equation 30; the third yields the t-statistic. Degrees of freedom are $NT - K - N = 675 - 4 - 225 = 446$.&lt;/p>
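&lt;p>The factor $T/(T-1)$ in the first formula undoes a shrinkage caused by demeaning. Subtracting a unit mean computed from only $T$ observations leaves residuals whose variance is $\sigma^2 (T-1)/T$ rather than $\sigma^2$. The quick Monte Carlo sketch below shows the effect and the correction, assuming i.i.d. Gaussian errors with the post&amp;rsquo;s $\sigma = 0.5$ and $T = 3$:&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np

rng = np.random.default_rng(2)
n_units, T, sigma = 20000, 3, 0.5
eps = sigma * rng.standard_normal((n_units, T))
eps_within = eps - eps.mean(axis=1, keepdims=True)   # within-transformation per unit

raw_var = eps.var()
within_var = eps_within.var()                        # close to sigma**2 * (T - 1) / T
corrected = within_var * T / (T - 1)                 # rescaled as in the first formula
print(f'raw {raw_var:.4f}, demeaned {within_var:.4f}, corrected {corrected:.4f}')
&lt;/code>&lt;/pre>
&lt;p>The demeaned variance sits near 0.167 instead of 0.25, and multiplying by $T/(T-1) = 1.5$ brings it back. The same loss of one degree of freedom per unit is why $N$ appears in the degrees of freedom above.&lt;/p>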
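&lt;pre>&lt;code class="language-python">from scipy import stats
# From the full script (not shown): y_std_fe_val = sd of the demeaned outcome, sigma_s_sq =
# residual variance of the standardised fit, var_beta_unit = per-unit slope variances (225, 4)
# Variance rescaling (Eq. 35)
sigma_sq = (N_TIME / (N_TIME - 1)) * (y_std_fe_val**2) * sigma_s_sq
# Var[alpha_i] with diagonal Var[beta_i] (Eq. 33)
var_alpha = sigma_sq / N_TIME + np.sum(unit_x_means**2 * var_beta_unit, axis=1)
t_alpha = alpha_hat / np.sqrt(var_alpha)
df_alpha = N_OBS - 4 - N_UNITS  # NT - K - N = 446
p_alpha = 2 * stats.t.sf(np.abs(t_alpha), df=df_alpha)
n_sig = int((p_alpha &amp;lt; 0.05).sum())
print(f&amp;quot; Significant at 5%: {n_sig}/{N_UNITS} units ({100 * n_sig / N_UNITS:.1f}%)&amp;quot;)
print(f&amp;quot; df for t-test: {df_alpha}&amp;quot;)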
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> Significant at 5%: 225/225 units (100.0%)
df for t-test: 446
&lt;/code>&lt;/pre>
&lt;p>All 225 units pass a 5% t-test. The intrinsic contextual effect is universal in this DGP, as it should be, because &lt;code>sc_i&lt;/code> is strictly positive everywhere (its minimum, in the first column, is 2.07). The 2×2 figure below replicates paper &lt;strong>Figure 5&lt;/strong>, comparing each local model&amp;rsquo;s estimate of the spatial-context surface against the truth.&lt;/p>
&lt;p>&lt;img src="mgwrfer_alpha_map.png" alt="Four-panel comparison on a 15x15 grid showing the spatial-context surface as estimated by each model: top-left is the true sc_i exponential gradient, top-right is MGWFER&amp;rsquo;s recovered alpha_hat tracking the truth almost exactly, bottom-left is cross-sectional MGWR&amp;rsquo;s local intercept (range compressed to roughly 2 to 22, Corr 0.84), and bottom-right is PMGWR&amp;rsquo;s local intercept (range -11 to 10, compressed and shifted negative, Corr 0.98). All panels share the same colour scale to make the magnitude differences visible.">&lt;/p>
&lt;p>The four panels tell the paper&amp;rsquo;s story in one image:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>True &lt;code>sc_i&lt;/code> (top-left)&lt;/strong>: smooth exponential gradient from ~2 at column 1 to ~52 at column 15.&lt;/li>
&lt;li>&lt;strong>MGWFER &lt;code>α̂_i&lt;/code> (top-right)&lt;/strong>: visually indistinguishable from the truth at this resolution. Range [1.45, 51.62]; correlation ≈1.000 (0.9996).&lt;/li>
&lt;li>&lt;strong>MGWR_cs intercept (bottom-left)&lt;/strong>: compressed range [2.42, 21.84] — captures the &lt;em>shape&lt;/em> of the gradient (Corr 0.84) but underestimates magnitude by 2.5×. The model has nowhere else to put &lt;code>sc&lt;/code>&amp;rsquo;s influence on &lt;code>x_k&lt;/code> except into the slopes, so the intercept it leaves behind is partial.&lt;/li>
&lt;li>&lt;strong>PMGWR intercept (bottom-right)&lt;/strong>: range [−11.27, 10.04], &lt;em>compressed and shifted negative&lt;/em>. PMGWR has 3× more observations than MGWR_cs but no panel structure to exploit, so the indirect channel hits it harder. The correlation of 0.98 shows that the spatial ordering survives, but the scale is badly wrong. Many units receive a negative intercept even though the true &lt;code>sc_i&lt;/code> is positive everywhere.&lt;/li>
&lt;/ul>
&lt;p>This matches the paper&amp;rsquo;s Figure 5, where MGWR and PMGWR underestimate the effect at about ±17 against a true range of 0–50. The paper concludes: &lt;em>&amp;ldquo;traditional local modelling techniques might substantially underestimate the influence of spatial context.&amp;rdquo;&lt;/em> Our simulation reproduces that conclusion. In PMGWR the intrinsic contextual effect is &lt;em>implicit&lt;/em> in the local intercept and gets entangled with the slopes. In MGWFER it is &lt;em>explicit&lt;/em>, per-unit, and significance-testable.&lt;/p>
&lt;h2 id="10-comparing-coefficient-recovery">10. Comparing coefficient recovery&lt;/h2>
&lt;p>The scatter plots below compare true vs estimated coefficients for PMGWR and MGWFER. In a perfect model, all points would lie on the 45-degree reference line.&lt;/p>
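&lt;pre>&lt;code class="language-python">import matplotlib.pyplot as plt
# Figure 2: True vs PMGWR (3-panel scatter)
# true_arrays / pooled_arrays: per-unit arrays for beta_1..beta_3; labels: panel titles;
# STEEL_BLUE and WARM_ORANGE are colour constants defined elsewhere in the script
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
for ax, true_vals, est_vals, label in zip(axes, true_arrays, pooled_arrays, labels):
    ax.scatter(true_vals, est_vals, color=STEEL_BLUE, alpha=0.4, s=15)
    # 45-degree reference line spanning both the truth and the estimates
    lims = [min(true_vals.min(), est_vals.min()), max(true_vals.max(), est_vals.max())]
    ax.plot(lims, lims, color=WARM_ORANGE, linewidth=2, linestyle=&amp;quot;--&amp;quot;)
    ax.set_title(label)
    # ... annotation code ...
plt.savefig(&amp;quot;mgwrfer_bias_pooled.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;)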
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="mgwrfer_bias_pooled.png" alt="True vs PMGWR scatter plots for three coefficients. Beta_1 shows severe scatter away from the identity line and is anti-correlated with truth (Corr=-0.46). Beta_2 and beta_3 are also far from the identity line, with PMGWR estimates clustered well above the true values.">&lt;/p>
&lt;p>The PMGWR scatter reveals the damage. The $\beta_1$ points are widely dispersed and trend &lt;strong>against&lt;/strong> the 45-degree line, i.e., the estimates are anti-correlated with truth (Corr = −0.46). The quadratic dome shape is not just smoothed away; it is &lt;em>inverted&lt;/em>. $\beta_2$ and $\beta_3$ likewise sit far above the reference line. PMGWR systematically overestimates them because &lt;code>sc&lt;/code>&amp;rsquo;s contribution to &lt;code>y&lt;/code> has nowhere to go but into the slopes.&lt;/p>
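&lt;pre>&lt;code class="language-python">import matplotlib.pyplot as plt
# Figure 3: True vs MGWFER (3-panel scatter)
# true_arrays / fe_arrays: per-unit arrays for beta_1..beta_3; labels: panel titles;
# TEAL and WARM_ORANGE are colour constants defined elsewhere in the script
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
for ax, true_vals, est_vals, label in zip(axes, true_arrays, fe_arrays, labels):
    ax.scatter(true_vals, est_vals, color=TEAL, alpha=0.4, s=15)
    # 45-degree reference line spanning both the truth and the estimates
    lims = [min(true_vals.min(), est_vals.min()), max(true_vals.max(), est_vals.max())]
    ax.plot(lims, lims, color=WARM_ORANGE, linewidth=2, linestyle=&amp;quot;--&amp;quot;)
    ax.set_title(label)
    # ... annotation code ...
plt.savefig(&amp;quot;mgwrfer_recovery_fe.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;)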
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="mgwrfer_recovery_fe.png" alt="True vs MGWFER scatter plots. Beta_1 is now tightly clustered around the identity line (Corr=+0.82), showing successful recovery of the quadratic dome pattern. Beta_2 and beta_3 are also centered on the identity line with low scatter.">&lt;/p>
&lt;p>After fixed-effects correction, the $\beta_1$ scatter tightens dramatically. The correlation &lt;strong>flips from −0.46 to +0.82&lt;/strong>, and the points form a tight band along the reference line, so high and low values of the dome land where they should. The $\beta_2$ points also fall along the 45-degree line, and the $\beta_3$ estimates cluster tightly around the true value of 1.5. The within-transformation has done exactly the job it is designed to do: it turns the anti-correlated mess into clean local estimates.&lt;/p>
&lt;h2 id="11-model-comparison">11. Model comparison&lt;/h2>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Metric&lt;/th>
&lt;th>MGWR_cs&lt;/th>
&lt;th>PMGWR&lt;/th>
&lt;th>&lt;strong>MGWFER&lt;/strong>&lt;/th>
&lt;th>MGWFER vs PMGWR&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>RMSE ($\beta_1$)&lt;/td>
&lt;td>2.1573&lt;/td>
&lt;td>2.3003&lt;/td>
&lt;td>&lt;strong>0.1793&lt;/strong>&lt;/td>
&lt;td>−92.2%&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>RMSE ($\beta_2$)&lt;/td>
&lt;td>1.7977&lt;/td>
&lt;td>1.9489&lt;/td>
&lt;td>&lt;strong>0.1050&lt;/strong>&lt;/td>
&lt;td>−94.6%&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>RMSE ($\beta_3$)&lt;/td>
&lt;td>1.9838&lt;/td>
&lt;td>1.7485&lt;/td>
&lt;td>&lt;strong>0.0724&lt;/strong>&lt;/td>
&lt;td>−95.9%&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>RMSE ($\beta_4$)&lt;/td>
&lt;td>2.3768&lt;/td>
&lt;td>1.8612&lt;/td>
&lt;td>&lt;strong>0.1399&lt;/strong>&lt;/td>
&lt;td>−92.5%&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Corr ($\beta_1$)&lt;/td>
&lt;td>−0.3857&lt;/td>
&lt;td>&lt;strong>−0.4575&lt;/strong>&lt;/td>
&lt;td>&lt;strong>+0.8179&lt;/strong>&lt;/td>
&lt;td>sign flip&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Corr ($\beta_2$)&lt;/td>
&lt;td>−0.2085&lt;/td>
&lt;td>0.2163&lt;/td>
&lt;td>0.9407&lt;/td>
&lt;td>—&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>R²&lt;/td>
&lt;td>0.989&lt;/td>
&lt;td>0.989&lt;/td>
&lt;td>0.890&lt;/td>
&lt;td>(different DV)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>RMSE ($\alpha_i$)&lt;/td>
&lt;td>14.18&lt;/td>
&lt;td>25.62&lt;/td>
&lt;td>&lt;strong>0.5398&lt;/strong>&lt;/td>
&lt;td>−97.9%&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Corr ($\alpha_i$)&lt;/td>
&lt;td>0.839&lt;/td>
&lt;td>0.978&lt;/td>
&lt;td>&lt;strong>1.000&lt;/strong>&lt;/td>
&lt;td>—&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>This is the paper&amp;rsquo;s headline, reproduced in a single table. Relative to PMGWR, MGWFER reduces RMSE by &lt;strong>92–96%&lt;/strong> for every slope &lt;em>and&lt;/em> recovers the intrinsic contextual effect with a Pearson correlation of essentially 1. PMGWR and cross-sectional MGWR do not merely fail to estimate $\beta_1$ correctly; they are anti-correlated with truth. Ignore the R² row when reading this table. PMGWR&amp;rsquo;s 0.989 is fit to raw &lt;code>y&lt;/code>, which is dominated by &lt;code>sc&lt;/code>, while MGWFER&amp;rsquo;s 0.890 is fit to the demeaned &lt;code>y_within&lt;/code>.&lt;/p>
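&lt;p>The last column of the table is the relative change in RMSE, $(\text{RMSE}_{\text{MGWFER}} - \text{RMSE}_{\text{PMGWR}})/\text{RMSE}_{\text{PMGWR}}$. Recomputing it from the rounded table entries reproduces the reported percentages:&lt;/p>
&lt;pre>&lt;code class="language-python"># RMSE pairs (PMGWR, MGWFER) copied from the table above
rmse = {
    'beta_1': (2.3003, 0.1793),
    'beta_2': (1.9489, 0.1050),
    'beta_3': (1.7485, 0.0724),
    'beta_4': (1.8612, 0.1399),
    'alpha_i': (25.62, 0.5398),
}
changes = {}
for name, (pmgwr, mgwfer) in rmse.items():
    changes[name] = round(100 * (mgwfer - pmgwr) / pmgwr, 1)   # percent change vs PMGWR
    print(f'{name}: {changes[name]:+.1f}%')
&lt;/code>&lt;/pre>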
&lt;h2 id="12-bandwidth-comparison">12. Bandwidth comparison&lt;/h2>
&lt;p>The bandwidths reveal &lt;em>how&lt;/em> each estimator reads the spatial structure of the data.&lt;/p>
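&lt;pre>&lt;code class="language-python"># MGWR_cs: cross-sectional fit computed elsewhere in the script; slope bandwidths quoted
print(&amp;quot;MGWR_cs bws (x1-x4): [48, 91, 98, 52]&amp;quot;)
# pooled_bw[0] is the local intercept (constant=True), so drop it; fe_bw has slopes only
print(f&amp;quot;PMGWR bws (x1-x4): {pooled_bw[1:].astype(int).tolist()}&amp;quot;)
print(f&amp;quot;MGWFER bws (x1-x4): {fe_bw.astype(int).tolist()}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> MGWR_cs bws (x1-x4): [48, 91, 98, 52]
PMGWR bws (x1-x4): [46, 50, 50, 46]
MGWFER bws (x1-x4): [50, 91, 116, 62]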
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="mgwrfer_bandwidth_comparison.png" alt="Grouped bar chart comparing MGWR_cs vs PMGWR vs MGWFER bandwidths for each covariate. PMGWR collapses every bandwidth into the 44-50 range; MGWFER and MGWR_cs preserve more variation across covariates, with bandwidth 116 for x3 (the spatially constant covariate, which MGWFER correctly diagnoses as broader-scale).">&lt;/p>
&lt;p>The pattern is paper-faithful.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>PMGWR collapses every bandwidth, intercept included, into the narrow 44–50 range.&lt;/strong> Under the indirect contextual channel, every covariate looks like a slightly noisy proxy for &lt;code>sc&lt;/code>, so the model picks nearly the same small bandwidth for all of them.&lt;/li>
&lt;li>&lt;strong>Cross-sectional MGWR&lt;/strong> preserves more variation but still produces the wrong scales.&lt;/li>
&lt;li>&lt;strong>MGWFER&lt;/strong> alone returns bandwidths that match the &lt;em>true&lt;/em> process scales: small for the local quadratic dome ($\beta_1$, bw = 50), medium for the linear gradient ($\beta_2$, bw = 91), and large for the spatially constant $\beta_3$ (bw = 116).&lt;/li>
&lt;/ul>
&lt;p>This is the finding of the paper&amp;rsquo;s Table 3: only MGWFER recovers the true scale of process variability, because only MGWFER removes the confounder before the bandwidth search runs.&lt;/p>
&lt;h2 id="13-spatial-coefficient-maps">13. Spatial coefficient maps&lt;/h2>
&lt;p>The most convincing evidence comes from mapping the estimated surfaces alongside the known truth.&lt;/p>
&lt;pre>&lt;code class="language-python"># 2x3 grid: top row = true, bottom row = MGWFER estimates
fig, axes = plt.subplots(2, 3, figsize=(16, 11))
# ... mapping code with shared colorbars ...
plt.savefig(&amp;quot;mgwrfer_coefficient_maps.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="mgwrfer_coefficient_maps.png" alt="Six-panel spatial map comparing true coefficients (top row) with MGWFER estimates (bottom row) for beta_1, beta_2, and beta_3. The quadratic dome and linear gradient are visually recovered.">&lt;/p>
&lt;p>The MGWFER-estimated $\beta_1$ map (bottom-left) recovers the concentric dome pattern of the true coefficient (top-left), though with some smoothing at the edges. The $\beta_2$ linear gradient (bottom-center) matches the true gradient (top-center) with high fidelity. The $\beta_3$ map (bottom-right) shows mild spurious spatial variation around the true constant of 1.5. This illustrates the variance cost of the within-transformation for spatially homogeneous effects (RMSE = 0.072).&lt;/p>
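&lt;p>Two numbers per surface can back up the visual comparison:&lt;/p>
&lt;ul>
&lt;li>the RMSE between estimated and true coefficients across the 225 locations;&lt;/li>
&lt;li>their Pearson correlation.&lt;/li>
&lt;/ul>
&lt;p>The sketch below uses a dome-shaped stand-in for the true $\beta_1$ on a 15 × 15 grid and compares a good estimate, an inverted one and a flat one. The surface, the noise level and the helpers &lt;code>rmse&lt;/code> and &lt;code>surface_corr&lt;/code> are assumptions for illustration, not the tutorial&amp;rsquo;s data-generating process.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np

rng = np.random.default_rng(42)
u, v = np.meshgrid(np.arange(15.0), np.arange(15.0))

# Dome-shaped stand-in for the true beta_1 (assumed shape, not the tutorial's DGP):
# 3.0 at the centre, falling to 1.0 at the corners.
beta1_true = 3.0 - ((u - 7) ** 2 + (v - 7) ** 2) / 49.0


def rmse(est, true):
    """Root-mean-square error across all locations."""
    return float(np.sqrt(np.mean((est - true) ** 2)))


def surface_corr(est, true):
    """Pearson correlation across locations (undefined if either surface is constant)."""
    return float(np.corrcoef(est.ravel(), true.ravel())[0, 1])


noise = rng.normal(0.0, 0.1, size=u.shape)
good = beta1_true + noise                              # recovers the dome, small error
inverted = 2 * beta1_true.mean() - beta1_true + noise  # same amplitude, flipped pattern
flat = np.full_like(u, beta1_true.mean())              # no spatial information at all

for name, est in [("good", good), ("inverted", inverted), ("flat", flat)]:
    print(f"{name:9s} RMSE = {rmse(est, beta1_true):.3f}")
print(f"corr(good, true) = {surface_corr(good, beta1_true):.2f}")
print(f"corr(inverted, true) = {surface_corr(inverted, beta1_true):.2f}")
&lt;/code>&lt;/pre>
&lt;p>In this toy, the inverted surface has about twice the RMSE of the flat one: an anti-correlated local estimate can be worse than no spatial estimate at all. For a spatially constant surface such as $\beta_3$, the correlation is undefined, so only the RMSE is informative.&lt;/p>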
&lt;h2 id="14-statistical-significance">14. Statistical significance&lt;/h2>
&lt;p>A key diagnostic for MGWFER is whether it correctly identifies which coefficients are significant at each location. The significance maps below use filtered t-values. These are adjusted for multiple testing across the 225 local tests using the effective-number-of-parameters correction of da Silva and Fotheringham (2016).&lt;/p>
&lt;pre>&lt;code class="language-python"># 2x2 significance maps
# Orange = significant positive, dark blue = not significant
plt.savefig(&amp;quot;mgwrfer_significance_maps.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="mgwrfer_significance_maps.png" alt="Significance maps for all four coefficients. Beta_1 through beta_3 are unanimously significant positive (all orange). Beta_4 correctly shows 202 of 225 units as not significant (dark blue), with a small false-positive cluster.">&lt;/p>
&lt;p>All 225 spatial units show statistically significant positive effects for $\beta_1$, $\beta_2$, and $\beta_3$, consistent with the true DGP in which all three are strictly positive everywhere. The critical test is $\beta_4$ (truly zero): 202 of 225 units (89.8%) are correctly classified as not significant, while 23 units (10.2%) show false positives. This false-positive rate is above the nominal 5% level. It is still far better than what PMGWR is likely to produce: PMGWR&amp;rsquo;s $\beta_4$ RMSE of 1.86, against a true value of zero, points to large spurious local estimates and, with them, widespread spurious significance. The false positives are concentrated in one small cluster. That suggests a local artefact, such as boundary effects or local multicollinearity, rather than systematic bias.&lt;/p>
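&lt;p>Filtering t-values means zeroing out local t-values that do not clear a stricter, multiple-testing-adjusted critical value. The sketch below follows the idea of da Silva and Fotheringham (2016) as we understand it. The nominal $\alpha$ is divided by the effective number of parameters of each parameter surface, so a surface with more effective local parameters faces a higher bar. Three things in the sketch are assumptions for illustration, not output from the tutorial&amp;rsquo;s models:&lt;/p>
&lt;ul>
&lt;li>the helper &lt;code>filtered_tvals&lt;/code>;&lt;/li>
&lt;li>the choice of $n - 1$ degrees of freedom;&lt;/li>
&lt;li>the ENP value of 10 used in the example.&lt;/li>
&lt;/ul>
&lt;pre>&lt;code class="language-python">import numpy as np
from scipy import stats


def filtered_tvals(tvals, enp, n, alpha=0.05, p=1.0):
    """Zero out local t-values below a multiple-testing-adjusted critical value.

    adjusted alpha = alpha / (enp / p); for a single MGWR-type surface p = 1 and enp is
    that surface's effective number of parameters. df = n - 1 is an assumption here.
    """
    adj_alpha = alpha * p / enp
    t_crit = stats.t.ppf(1.0 - adj_alpha / 2.0, n - 1)  # two-sided critical value
    keep = np.abs(tvals) >= t_crit
    return np.where(keep, tvals, 0.0), t_crit


rng = np.random.default_rng(3)
null_t = rng.standard_normal(225)  # hypothetical t-values for a truly-zero coefficient

_, t_plain = filtered_tvals(null_t, enp=1.0, n=225)        # enp = p: no adjustment
filtered, t_adj = filtered_tvals(null_t, enp=10.0, n=225)  # hypothetical ENP of 10
print(f"critical |t|: unadjusted {t_plain:.2f}, adjusted {t_adj:.2f}")
print(f"share flagged: unadjusted {np.mean(np.abs(null_t) >= t_plain):.1%}, "
      f"adjusted {np.mean(filtered != 0):.1%}")
print(f"false-positive share reported for beta_4: {23 / 225:.1%}")  # 10.2%
&lt;/code>&lt;/pre>
&lt;p>The independent normal draws ignore the spatial dependence of real local t-values. They only show how the threshold moves; they do not reproduce the false-positive cluster discussed above.&lt;/p>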
&lt;h2 id="15-local-model-lineup-mgwr_cs-vs-pmgwr-vs-mgwfer-paper-table-3-and-figures-5-9">15. Local model lineup: MGWR_cs vs PMGWR vs MGWFER (paper Table 3 and Figures 5, 9)&lt;/h2>
&lt;p>The paper&amp;rsquo;s headline contribution is a head-to-head comparison of three local estimators — cross-sectional MGWR, PMGWR, MGWFER — under the indirect contextual channel. We replicate that here in two views.&lt;/p>
&lt;p>&lt;strong>Table 3 replication: RMSE by coefficient.&lt;/strong>&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Coefficient&lt;/th>
&lt;th>MGWR (cross-section)&lt;/th>
&lt;th>PMGWR (pooled)&lt;/th>
&lt;th>&lt;strong>MGWFER&lt;/strong>&lt;/th>
&lt;th>MGWFER improvement&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>RMSE $\beta_1$&lt;/td>
&lt;td>2.16&lt;/td>
&lt;td>2.30&lt;/td>
&lt;td>&lt;strong>0.18&lt;/strong>&lt;/td>
&lt;td>~92% vs PMGWR&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>RMSE $\beta_2$&lt;/td>
&lt;td>1.80&lt;/td>
&lt;td>1.95&lt;/td>
&lt;td>&lt;strong>0.11&lt;/strong>&lt;/td>
&lt;td>~94% vs PMGWR&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>RMSE $\beta_3$&lt;/td>
&lt;td>1.98&lt;/td>
&lt;td>1.75&lt;/td>
&lt;td>&lt;strong>0.07&lt;/strong>&lt;/td>
&lt;td>~96% vs PMGWR&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>RMSE $\beta_4$&lt;/td>
&lt;td>2.38&lt;/td>
&lt;td>1.86&lt;/td>
&lt;td>&lt;strong>0.14&lt;/strong>&lt;/td>
&lt;td>~92% vs PMGWR&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Corr($\hat\beta_1$, true)&lt;/td>
&lt;td>−0.39&lt;/td>
&lt;td>&lt;strong>−0.46&lt;/strong>&lt;/td>
&lt;td>&lt;strong>+0.82&lt;/strong>&lt;/td>
&lt;td>sign flip&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>R²&lt;/td>
&lt;td>0.989&lt;/td>
&lt;td>0.989&lt;/td>
&lt;td>0.890&lt;/td>
&lt;td>not comparable (MGWFER fits the demeaned outcome)&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>Two observations the paper highlights, both of which our replication reproduces:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Cross-sectional MGWR and PMGWR do not just have &lt;em>high&lt;/em> RMSE on $\beta_1$; their estimates are &lt;em>anti-correlated&lt;/em> with the truth.&lt;/strong> Corr = −0.39 and −0.46 respectively. A constant guess of &lt;code>β_1 = 1.5&lt;/code> would arguably beat them: it carries no spatial information, but it does not point in the wrong direction. This is what happens when the bandwidth search runs on data in which the local slopes are not identified. The resulting &amp;ldquo;local effects&amp;rdquo; reflect the structure of &lt;code>sc&lt;/code>, not the structure of &lt;code>β_1&lt;/code>.&lt;/li>
&lt;li>&lt;strong>MGWFER&amp;rsquo;s improvement is an order of magnitude across all four coefficients.&lt;/strong> This is not a 50% reduction or a 2× reduction. RMSE falls roughly 13× to 25× relative to PMGWR, and about 12× to 28× relative to cross-sectional MGWR. In this simulation the within-transformation is what drives it: it removes the very thing that contaminates the bandwidth search.&lt;/li>
&lt;/ul>
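&lt;p>The RMSE ratios behind these statements can be recomputed directly from the Table 3 replication. The check below uses only the RMSE values in the table. Those values are rounded to two decimals, so the ratios are approximate, especially for $\beta_3$.&lt;/p>
&lt;pre>&lt;code class="language-python"># RMSE values copied from the Table 3 replication above
RMSE_TABLE = {
    "beta_1": {"mgwr_cs": 2.16, "pmgwr": 2.30, "mgwfer": 0.18},
    "beta_2": {"mgwr_cs": 1.80, "pmgwr": 1.95, "mgwfer": 0.11},
    "beta_3": {"mgwr_cs": 1.98, "pmgwr": 1.75, "mgwfer": 0.07},
    "beta_4": {"mgwr_cs": 2.38, "pmgwr": 1.86, "mgwfer": 0.14},
}


def summarize(table):
    """Percent RMSE reduction vs PMGWR and fold-reductions vs both comparison models."""
    out = {}
    for coef, r in table.items():
        out[coef] = {
            "reduction_vs_pmgwr": 1 - r["mgwfer"] / r["pmgwr"],
            "fold_vs_pmgwr": r["pmgwr"] / r["mgwfer"],
            "fold_vs_mgwr_cs": r["mgwr_cs"] / r["mgwfer"],
        }
    return out


for coef, s in summarize(RMSE_TABLE).items():
    print(f"{coef}: {s['reduction_vs_pmgwr']:.0%} lower than PMGWR, "
          f"{s['fold_vs_pmgwr']:.0f}x vs PMGWR, {s['fold_vs_mgwr_cs']:.0f}x vs MGWR_cs")
&lt;/code>&lt;/pre>
&lt;p>The fold-reductions come out at roughly 13× to 25× against PMGWR and 12× to 28× against cross-sectional MGWR. The percentage reductions match the table&amp;rsquo;s last column.&lt;/p>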
&lt;p>&lt;strong>Figure 9 replication: spurious $\beta_4$ surface across the three local models.&lt;/strong>&lt;/p>
&lt;pre>&lt;code class="language-python"># 1x3 panel: MGWR_cs, PMGWR, MGWFER estimates of beta_4 (true = 0 everywhere)
# Shared diverging colour scale; vertical-stripe pattern reflects sc column structure
plt.savefig(&amp;quot;mgwrfer_beta4_bias.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="mgwrfer_beta4_bias.png" alt="Three side-by-side maps of the estimated beta_4 surface (true value zero everywhere): cross-sectional MGWR on the left, PMGWR in the middle, and MGWFER on the right. MGWR_cs and PMGWR show large positive estimates aligned with the column-varying spatial context, producing the paper&amp;amp;rsquo;s signature vertical-stripe bias pattern. MGWFER&amp;amp;rsquo;s surface is near-zero everywhere, with no visible spatial structure.">&lt;/p>
&lt;p>The two left panels are a textbook illustration of how the indirect contextual channel manifests in a local model: &lt;code>sc&lt;/code> varies horizontally (by column &lt;code>j&lt;/code>), so &lt;code>x_4&lt;/code>&amp;rsquo;s spurious &amp;ldquo;effect&amp;rdquo; on &lt;code>y&lt;/code> also varies horizontally. The bandwidth search picks this up and produces a column-aligned stripe pattern that &lt;em>looks&lt;/em> like a real spatial process. It is not — it is &lt;code>β_4 ≡ 0&lt;/code> being misread through the lens of &lt;code>δ_4&lt;/code>. The right panel (MGWFER) is essentially flat, with RMSE 0.14 against zero. Paper Figure 9 shows the same contrast.&lt;/p>
&lt;h2 id="16-from-simulation-to-real-data-the-georgia-case-study">16. From simulation to real data: the Georgia case study&lt;/h2>
&lt;p>The simulation makes the mechanics legible. Li &amp;amp; Fotheringham (2026) make the stakes clear with a case study on &lt;strong>educational attainment in the 159 counties of Georgia&lt;/strong>, using the 2016–2020 American Community Survey 5-year panel. Six covariates are included: log of population density, percent foreign-born, percent African American, percent rural, average household income, and percent in poverty. The outcome is the percentage of residents with a bachelor&amp;rsquo;s degree.&lt;/p>
&lt;p>The headline numbers from the paper:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Statistic&lt;/th>
&lt;th>MGWR&lt;/th>
&lt;th>PMGWR&lt;/th>
&lt;th>&lt;strong>MGWFER&lt;/strong>&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>$R^2$&lt;/td>
&lt;td>0.880&lt;/td>
&lt;td>0.889&lt;/td>
&lt;td>&lt;strong>0.986&lt;/strong>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Intrinsic contextual effect range (standardised units)&lt;/td>
&lt;td>$\pm$0.3 (≈ $\pm$1.5 percentage points)&lt;/td>
&lt;td>$\pm$0.3&lt;/td>
&lt;td>&lt;strong>$\pm$4 (≈ $\pm$20 percentage points)&lt;/strong>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>POVERTY sign at significant counties&lt;/td>
&lt;td>positive&lt;/td>
&lt;td>positive&lt;/td>
&lt;td>&lt;strong>negative&lt;/strong>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Population density coefficient&lt;/td>
&lt;td>weak&lt;/td>
&lt;td>weak&lt;/td>
&lt;td>&lt;strong>strong positive&lt;/strong>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>Two findings deserve emphasis:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Intrinsic contextual effects are an order of magnitude larger under MGWFER.&lt;/strong> MGWR and PMGWR estimate local intercepts in the $\pm$0.3 range, which translates to $\pm$1.5 percentage points of bachelor&amp;rsquo;s-degree share after the standardisation rescaling. MGWFER recovers fixed effects in the $\pm$4 range, which translates to $\pm$20 percentage points. On these data, the &amp;ldquo;role of place&amp;rdquo; is more than ten times stronger than conventional local models suggest.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Conventional MGWR can flip the sign of policy-relevant coefficients.&lt;/strong> Both MGWR and PMGWR find a &lt;em>positive&lt;/em> significant relationship between poverty and educational attainment in many Georgia counties — a result with no defensible causal reading. MGWFER reverses this to a &lt;em>significantly negative&lt;/em> relationship, in line with prior literature. The paper attributes the flip to omitted variable bias from spatial context (poor rural counties with low education levels have unmeasured persistent attributes that the cross-section can&amp;rsquo;t condition on; the panel within-transformation can).&lt;/p>
&lt;/li>
&lt;/ol>
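&lt;p>Converting from the standardised scale to percentage points is a single multiplication by the standard deviation of the outcome. The sketch below backs out the scale factor implied by the figures quoted above: 0.3 ↔ 1.5 points and 4 ↔ 20 points, about 5 points per standard deviation. That factor is an inference from those figures, not a reported statistic, and &lt;code>to_percentage_points&lt;/code> is a hypothetical helper.&lt;/p>
&lt;pre>&lt;code class="language-python">def to_percentage_points(effect_sd_units, sd_y_points):
    """Rescale an effect measured in standard deviations of y into percentage points."""
    return effect_sd_units * sd_y_points


# Scale factor implied by the quoted ranges (an inference, not a reported statistic)
implied_sd = 1.5 / 0.3

conventional = to_percentage_points(0.3, implied_sd)  # MGWR / PMGWR local-intercept range
mgwfer = to_percentage_points(4.0, implied_sd)        # MGWFER fixed-effect range
print(f"implied SD of y: {implied_sd:.1f} points")
print(f"conventional: +/-{conventional:.1f} points, MGWFER: +/-{mgwfer:.1f} points")
print(f"ratio: {mgwfer / conventional:.1f}x (the same on either scale)")
&lt;/code>&lt;/pre>
&lt;p>Rescaling multiplies both ranges by the same factor. The &amp;ldquo;more than ten times&amp;rdquo; comparison (about 13×) therefore does not depend on which scale is used.&lt;/p>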
&lt;p>In the paper&amp;rsquo;s own framing: &lt;em>traditional local modelling techniques might substantially underestimate the influence of spatial context on human behavior, while at the same time producing misleading sign and magnitude estimates for measured covariates.&lt;/em> The bias is not academic — it changes the policy story.&lt;/p>
&lt;p>This is also where our suppressed indirect channel (Section 5) starts to matter: in real ACS data, demographics like income and poverty are &lt;em>strongly&lt;/em> correlated with persistent place attributes, so $\delta_k$ in our Wooldridge derivation is non-trivial, and the bias correction MGWFER delivers is correspondingly larger than what we see in our deliberately easier simulation.&lt;/p>
&lt;h2 id="17-discussion-assumptions-limitations-and-what-causal-claims-survive">17. Discussion: assumptions, limitations, and what causal claims survive&lt;/h2>
&lt;p>Returning to our original question: &lt;strong>can we recover the true spatially varying coefficients — and the intrinsic contextual effects themselves — when a strong, unobserved spatial confounder contaminates the data?&lt;/strong> The answer is a qualified yes.&lt;/p>
&lt;p>MGWFER removes the confounder&amp;rsquo;s influence on slope estimation (Stage 1) &lt;em>and&lt;/em> recovers the confounder surface itself with near-perfect fidelity (Stage 2). The most contaminated coefficient ($\beta_1$) goes from anti-correlated with the truth under PMGWR (Corr = −0.46) to well recovered (Corr = 0.818). The null coefficient ($\beta_4$) goes from large spurious effects under PMGWR (RMSE = 1.86) to being correctly identified as non-significant in 90% of locations. And $\hat\alpha_i$ tracks the true confounder at $r = 0.999$. These improvements are not marginal: they are the difference between misleading and informative inference.&lt;/p>
&lt;h3 id="171-the-four-identification-assumptions">17.1 The four identification assumptions&lt;/h3>
&lt;p>A causal reading of MGWFER coefficients depends on four assumptions (Li &amp;amp; Fotheringham 2026, &amp;ldquo;Model Formulations&amp;rdquo; section):&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Time-invariant spatial context.&lt;/strong> $\alpha_i$ does not change over the study period. This is what allows the within-transformation to remove it cleanly. Long-run cultural, geographic, and institutional attributes typically satisfy this; rapidly evolving local conditions do not.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Strict exogeneity given the fixed effects.&lt;/strong> Conditional on $\alpha_i$ and the observed $X_{it}$&amp;rsquo;s, the error term $\varepsilon_{it}$ is uncorrelated with the covariates in &lt;em>all&lt;/em> time periods. This rules out feedback from past outcomes into current covariates.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>No time-varying unobserved confounders.&lt;/strong> Any unobserved factor that &lt;em>changes over time&lt;/em> and is correlated with both the covariates and the outcome still biases MGWFER. The within-transformation is a one-trick pony: it deals with time-invariant confounding only.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Parameter stability over time.&lt;/strong> The slopes $\beta_{bwk}(u_i, v_i)$ are assumed constant across the $T$ periods. Allowing time-varying slopes is outside the scope of the paper (and of MGWFER as currently implemented).&lt;/p>
&lt;/li>
&lt;/ol>
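&lt;p>Assumptions 1 and 3 can be seen directly in a toy panel. The sketch below assumes a single covariate, 225 units and 3 periods. &lt;code>alpha&lt;/code> plays the role of $\alpha_i$, and the unit-specific linear &lt;code>trend&lt;/code> is a hypothetical time-varying confounder that also feeds into the covariate. The within-transformation removes $\alpha_i$ (up to floating-point rounding) and recovers the true slope of 1.5. Once the trend is switched on, it survives the demeaning and biases the slope upward.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np

rng = np.random.default_rng(0)
N, T = 225, 3                                          # units and periods, as in the simulation
alpha = rng.uniform(2, 52, size=(N, 1))                # hypothetical time-invariant context
trend = rng.uniform(0, 2, size=(N, 1)) * np.arange(T)  # hypothetical unit-specific time trend


def within(a):
    """Within-transformation: subtract each unit's mean over the T periods."""
    return a - a.mean(axis=1, keepdims=True)


def fe_slope(y, x):
    """Fixed-effects (within) estimator of a single slope."""
    xd, yd = within(x), within(y)
    return float((xd * yd).sum() / (xd ** 2).sum())


def simulate(with_trend):
    """y_it = alpha_i + 1.5 * x_it (+ trend_it) + noise, with x_it loading on alpha_i."""
    tv = trend if with_trend else np.zeros_like(trend)
    x = 0.05 * alpha + 0.5 * tv + rng.normal(size=(N, T))
    y = alpha + 1.5 * x + tv + rng.normal(size=(N, T))
    return y, x


# Assumption 1: a time-invariant term is swept out entirely ...
print(np.abs(within(np.repeat(alpha, T, axis=1))).max())
b_no_trend = fe_slope(*simulate(with_trend=False))
print(f"only alpha_i confounds: slope = {b_no_trend:.2f}")
# ... Assumption 3: a time-varying confounder is not, and the slope drifts upward
b_trend = fe_slope(*simulate(with_trend=True))
print(f"trend also confounds: slope = {b_trend:.2f}")
&lt;/code>&lt;/pre>
&lt;p>With these settings, the slope should land near 1.5 without the trend and near 2.0 with it; the omitted trend contributes a bias of about 0.5. The exact values depend on the random draw.&lt;/p>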
&lt;p>If any one of these fails, the causal interpretation slides back toward correlation. Researchers should justify all four explicitly when applying the method.&lt;/p>
&lt;h3 id="172-limitations">17.2 Limitations&lt;/h3>
&lt;p>The paper is candid about what MGWFER cannot do:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>No effect estimates for time-invariant &lt;em>measurable&lt;/em> covariates.&lt;/strong> The within-transformation sweeps them out alongside $\alpha_i$. If you care about, say, &amp;ldquo;distance to nearest highway&amp;rdquo; (a time-invariant variable), MGWFER will not give you a coefficient for it; that effect lands inside $\hat\alpha_i$ and is no longer separable. This is a structural property of FE estimators, not specific to MGWFER.&lt;/li>
&lt;li>&lt;strong>No bandwidth for the spatial-context scale.&lt;/strong> MGWFER has bandwidths for the &lt;em>slopes&lt;/em>, but not for $\hat\alpha_i$ itself — the paper flags this as a limitation of the current calibration algorithm and a target for future work.&lt;/li>
&lt;li>&lt;strong>Reverse causality survives.&lt;/strong> If the covariates are themselves caused by the outcome (e.g., if higher educational attainment attracts more income, not the other way around), MGWFER offers no remedy. Detecting reverse causation in a local-modelling setting remains an open problem.&lt;/li>
&lt;li>&lt;strong>Computational cost.&lt;/strong> Bandwidth search scales poorly with $N$, which is why we used a 15x15 grid rather than the paper&amp;rsquo;s 30x30 grid.&lt;/li>
&lt;li>&lt;strong>Only 3 time periods here.&lt;/strong> More periods would tighten the within-estimator and reduce the false-positive rate for $\beta_4$.&lt;/li>
&lt;/ul>
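&lt;p>The first and last limitations both come down to what demeaning leaves behind. In the sketch below, a time-invariant covariate (a hypothetical &lt;code>dist_highway&lt;/code>) becomes a column of zeros after the within-transformation. The demeaned design matrix therefore loses rank, and no slope can be estimated for it. The loop then counts the within-unit degrees of freedom, $N(T-1)$, for $T = 3$ and $T = 10$. All data here are simulated placeholders.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np

rng = np.random.default_rng(1)
N, T = 225, 3
dist_highway = rng.uniform(0, 30, size=(N, 1)) * np.ones((1, T))  # hypothetical, fixed over time
income = rng.normal(size=(N, T))                                 # hypothetical, varies over time


def within(a):
    """Subtract each unit's mean over the T periods."""
    return a - a.mean(axis=1, keepdims=True)


X_within = np.column_stack([within(income).ravel(), within(dist_highway).ravel()])
print("rank of demeaned design:", np.linalg.matrix_rank(X_within))  # 1, not 2
print("largest |demeaned dist_highway|:", np.abs(X_within[:, 1]).max())

# Each unit keeps only T - 1 independent deviations after demeaning
for periods in (3, 10):
    print(f"T = {periods}: N * (T - 1) = {N * (periods - 1)} within-unit degrees of freedom")
&lt;/code>&lt;/pre>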
&lt;p>The bias from ignoring fixed effects is &lt;em>systematic&lt;/em> (it pushes estimates in the wrong direction); the variance increase from the within-transformation is &lt;em>random&lt;/em> (it widens confidence intervals without introducing directional error). For most empirical settings — where unobserved spatial confounders are plausible but unmeasurable — this is a trade worth taking.&lt;/p>
&lt;h2 id="18-summary-and-next-steps">18. Summary and next steps&lt;/h2>
&lt;p>&lt;strong>Key takeaways:&lt;/strong>&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Global Table 2 (paper) is reproduced.&lt;/strong> Cross-sectional OLS and pooled OLS overstate $\beta_1$–$\beta_3$ by ~4× (true 1.5, estimates ~5.5–6.4) and spuriously detect $\beta_4 \approx$ 4.2–4.8 at p &amp;lt; 10⁻¹³. The individual FE estimator returns $\beta_1=1.57$, $\beta_2=1.54$, $\beta_3=1.55$, $\beta_4=0.02$ (n.s.), and mean($\hat\alpha_i$) = 23.23 (truth 23.29). The within-transformation neutralises the indirect channel at the global level.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Local Table 3 (paper) is reproduced.&lt;/strong> MGWFER reduces RMSE by &lt;strong>92–96%&lt;/strong> for every slope coefficient relative to PMGWR (e.g., $\beta_1$: 2.30 → 0.18). Crucially, it &lt;strong>flips the sign of Corr($\hat\beta_1$, true) from −0.46 to +0.82&lt;/strong>. Cross-sectional MGWR is no better than PMGWR: both produce $\hat\beta_1$ surfaces anti-correlated with truth.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Spatial-context surface (paper Figure 5) is reproduced.&lt;/strong> MGWFER&amp;rsquo;s $\hat\alpha_i$ tracks the true &lt;code>sc_i&lt;/code> at Pearson correlation &lt;strong>≈1.000 (0.9996)&lt;/strong>, with range [1.45, 51.62] vs true [2.07, 51.55]. Cross-sectional MGWR&amp;rsquo;s local intercept compresses to [2, 22] (Corr 0.84). PMGWR&amp;rsquo;s intercept is shifted into [−11, 10] (Corr 0.98, but on the wrong scale). Only MGWFER reaches the right magnitudes.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>$\beta_4$ vertical-stripe bias (paper Figure 9) is reproduced.&lt;/strong> MGWR_cs and PMGWR show a column-aligned spurious-effect pattern in &lt;code>x_4&lt;/code> that tracks &lt;code>sc&lt;/code>&amp;rsquo;s horizontal gradient; MGWFER produces a near-zero, structureless $\hat\beta_4$.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>The mechanism is the within-transformation.&lt;/strong> Because &lt;code>sc&lt;/code> is time-invariant, demeaning removes it from &lt;code>y&lt;/code> and strips its contribution from the &lt;code>x_k&lt;/code>&amp;rsquo;s, severing the &lt;code>sc → x_k&lt;/code> backdoor path. Everything else in the algorithm (standardisation, bandwidth search, t-tests) is downstream of this single move.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>The empirical stakes are real.&lt;/strong> Li &amp;amp; Fotheringham&amp;rsquo;s Georgia case study (Section 16) shows MGWFER doing two things that change the policy interpretation. It reverses the sign of poverty&amp;rsquo;s effect on educational attainment, and it reveals intrinsic contextual effects an order of magnitude larger than conventional models estimate.&lt;/p>
&lt;/li>
&lt;/ol>
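&lt;p>Takeaways 1 and 5 can be reproduced in miniature at the global level. The sketch below keeps only two covariates and sets the true $\beta_4$ to zero. Both covariates load on the spatial context with the same &lt;code>0.05 * sc_i&lt;/code> coupling that Exercise 2 refers to. The uniform context, the unit-variance noise and the helper names (&lt;code>ols&lt;/code>, &lt;code>within&lt;/code>) are assumptions of this toy, not the tutorial&amp;rsquo;s DGP. Pooled OLS leaves &lt;code>sc&lt;/code> in the error term and picks up a large spurious $\hat\beta_4$. The within estimator sweeps &lt;code>sc&lt;/code> out and returns estimates close to the truth.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np

rng = np.random.default_rng(7)
N, T = 225, 3
sc = rng.uniform(2, 52, size=(N, 1))                    # hypothetical time-invariant spatial context
x1 = 0.05 * sc + rng.normal(size=(N, T))                # indirect channel: covariates load on sc
x4 = 0.05 * sc + rng.normal(size=(N, T))
y = sc + 1.5 * x1 + 0.0 * x4 + rng.normal(size=(N, T))  # true beta_1 = 1.5, beta_4 = 0


def within(a):
    """Subtract each unit's mean over the T periods."""
    return a - a.mean(axis=1, keepdims=True)


def ols(y, covariates, intercept=True):
    """Least-squares slopes of y on the stacked covariates (intercept dropped from output)."""
    X = np.column_stack([c.ravel() for c in covariates])
    if intercept:
        X = np.column_stack([np.ones(X.shape[0]), X])
    coef, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
    return coef[1:] if intercept else coef


b_pooled = ols(y, [x1, x4])                                       # sc stays in the error
b_fe = ols(within(y), [within(x1), within(x4)], intercept=False)  # sc is swept out
print("pooled OLS  (beta_1, beta_4):", b_pooled.round(2))
print("within / FE (beta_1, beta_4):", b_fe.round(2))
&lt;/code>&lt;/pre>
&lt;p>With these settings, pooled OLS should put $\hat\beta_4$ several units above zero and inflate $\hat\beta_1$ well beyond 1.5. The within estimator should land close to (1.5, 0). The exact numbers depend on the random draw.&lt;/p>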
&lt;p>&lt;strong>Next steps:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Apply MGWFER to real panel data (e.g., regional economic growth, housing prices, environmental exposure).&lt;/li>
&lt;li>Compare with alternative spatial panel methods (spatial lag/error with fixed effects, MGWIVR).&lt;/li>
&lt;li>Explore the relationship between $T$ and the bias-variance tradeoff.&lt;/li>
&lt;li>Develop a bandwidth definition for $\hat\alpha_i$ itself (the paper&amp;rsquo;s open problem).&lt;/li>
&lt;li>Extend to spatially &lt;em>and&lt;/em> temporally varying coefficients (a hypothetical GT-MGWFER).&lt;/li>
&lt;/ul>
&lt;h2 id="19-exercises">19. Exercises&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Increase time periods.&lt;/strong> Modify the DGP to use &lt;code>N_TIME = 10&lt;/code> instead of 3. How does the bias-variance tradeoff change? Does $\beta_2$&amp;rsquo;s RMSE drop further under MGWFER as the effective sample size grows? Bonus: how does the Stage 2 t-test power change as $T$ grows?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Tune down the indirect channel.&lt;/strong> Replace &lt;code>0.05 * sc_i&lt;/code> in the covariate equations with &lt;code>0.02 * sc_i&lt;/code> (a weaker link). Quantify how much PMGWR&amp;rsquo;s bias shrinks. Find the coupling strength below which PMGWR becomes &amp;ldquo;good enough&amp;rdquo; — that frontier is interesting in its own right.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Add a time-varying confounder.&lt;/strong> Create a variable $\gamma_{it}$ that changes over time and is correlated with $x_1$. Add it to the DGP as an additive term: $y_{it} = sc_i + \gamma_{it} + \ldots$. Does MGWFER still recover the true coefficients, or does Assumption 3 break visibly? (Entering it multiplicatively, as $\gamma_t \cdot x_{1,it}$, would instead make the $x_1$ slope change over time, which probes Assumption 4.)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Real-world application.&lt;/strong> Download a panel dataset of regional economic indicators (e.g., from the World Bank or PySAL sample data). Apply MGWFER, present both Stage 1 slopes and Stage 2 fixed-effects maps, and compare against MGWR_cs and PMGWR. What spatial patterns emerge in the intrinsic-contextual-effects map that the pooled model misses?&lt;/p>
&lt;/li>
&lt;/ol>
&lt;h2 id="references">References&lt;/h2>
&lt;ol>
&lt;li>&lt;a href="https://doi.org/10.1080/24694452.2026.2654481" target="_blank" rel="noopener">Li, Z. &amp;amp; Fotheringham, A.S. (2026). Spatial Context as a Time-Invariant Confounder: A Fixed-Effects Extension of MGWR. &lt;em>Annals of the American Association of Geographers&lt;/em>.&lt;/a> — the source paper for this tutorial.&lt;/li>
&lt;li>&lt;a href="https://www.routledge.com/Multiscale-Geographically-Weighted-Regression-Theory-and-Practice/Fotheringham-Oshan-Li/p/book/9781032463711" target="_blank" rel="noopener">Fotheringham, A.S., Oshan, T., &amp;amp; Li, Z. (2023). &lt;em>Multiscale Geographically Weighted Regression: Theory and Practice&lt;/em>. Boca Raton: CRC Press.&lt;/a> — comprehensive MGWR reference.&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1080/24694452.2023.2227690" target="_blank" rel="noopener">Fotheringham, A.S., &amp;amp; Li, Z. (2023). Measuring the unmeasurable: Models of geographical context. &lt;em>Annals of the American Association of Geographers&lt;/em>, 113(10), 2269-2286.&lt;/a> — origin of the intrinsic/behavioural contextual-effects distinction.&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1080/24694452.2017.1352480" target="_blank" rel="noopener">Fotheringham, A.S., Yang, W., &amp;amp; Kang, W. (2017). Multiscale Geographically Weighted Regression (MGWR). &lt;em>Annals of the American Association of Geographers&lt;/em>, 107(6), 1247-1265.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.21105/joss.01823" target="_blank" rel="noopener">Oshan, T., Li, Z., Kang, W., Wolf, L.J., &amp;amp; Fotheringham, A.S. (2019). mgwr: A Python Implementation of Multiscale Geographically Weighted Regression. &lt;em>Journal of Open Source Software&lt;/em>, 4(42), 1823.&lt;/a>&lt;/li>
&lt;li>Wooldridge, J.M. (2010). &lt;em>Econometric Analysis of Cross Section and Panel Data&lt;/em>, 2nd ed. Cambridge, MA: MIT Press. — Source of the omitted-variable-bias derivation in Section 3.&lt;/li>
&lt;li>Pearl, J. (2009). &lt;em>Causality&lt;/em>, 2nd ed. Cambridge University Press. — DAG framing of confounding.&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1111/gean.12084" target="_blank" rel="noopener">da Silva, A.R., &amp;amp; Fotheringham, A.S. (2016). The multiple testing issue in geographically weighted regression. &lt;em>Geographical Analysis&lt;/em>, 48(3), 233-247.&lt;/a> — filtered t-values used in Section 13.&lt;/li>
&lt;li>&lt;a href="https://github.com/GeoZhipengLi/MGWPR" target="_blank" rel="noopener">GeoZhipengLi/MGWPR — Custom mgwr Package with Panel Data Support (GitHub)&lt;/a> — the implementation used in this tutorial.&lt;/li>
&lt;/ol>
&lt;hr>
&lt;style>
.podcast-overlay {
display: none;
position: fixed;
bottom: 0;
left: 0;
right: 0;
z-index: 9999;
animation: podSlideUp 0.35s ease-out;
}
@keyframes podSlideUp {
from { transform: translateY(100%); }
to { transform: translateY(0); }
}
.podcast-overlay.pod-closing {
animation: podSlideDown 0.3s ease-in forwards;
}
@keyframes podSlideDown {
from { transform: translateY(0); }
to { transform: translateY(100%); }
}
.podcast-container {
background: linear-gradient(135deg, #1a1a2e 0%, #16213e 100%);
padding: 18px 24px 20px;
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;
box-shadow: 0 -4px 32px rgba(0,0,0,0.5);
border-top: 1px solid rgba(106,155,204,0.2);
}
.podcast-inner {
max-width: 800px;
margin: 0 auto;
}
.podcast-top-row {
display: flex;
align-items: center;
gap: 14px;
margin-bottom: 14px;
}
.podcast-icon {
width: 42px;
height: 42px;
background: linear-gradient(135deg, #d97757, #e8956a);
border-radius: 10px;
display: flex;
align-items: center;
justify-content: center;
flex-shrink: 0;
}
.podcast-icon svg {
width: 22px;
height: 22px;
fill: #fff;
}
.podcast-title-block {
flex: 1;
min-width: 0;
}
.podcast-title-block h4 {
margin: 0 0 1px 0;
color: #f0ece2;
font-size: 14px;
font-weight: 600;
letter-spacing: 0.02em;
white-space: nowrap;
overflow: hidden;
text-overflow: ellipsis;
}
.podcast-title-block span {
color: #8b9dc3;
font-size: 11px;
}
.podcast-close-btn {
background: none;
border: none;
cursor: pointer;
padding: 6px;
border-radius: 50%;
display: flex;
align-items: center;
justify-content: center;
transition: background 0.2s;
flex-shrink: 0;
}
.podcast-close-btn:hover {
background: rgba(255,255,255,0.1);
}
.podcast-close-btn svg {
width: 20px;
height: 20px;
fill: #8b9dc3;
}
.podcast-progress-wrap {
margin-bottom: 12px;
}
.podcast-time-row {
display: flex;
justify-content: space-between;
font-size: 11px;
color: #8b9dc3;
margin-bottom: 5px;
font-variant-numeric: tabular-nums;
}
.podcast-bar-bg {
width: 100%;
height: 6px;
background: rgba(255,255,255,0.1);
border-radius: 3px;
cursor: pointer;
position: relative;
overflow: hidden;
transition: height 0.15s;
}
.podcast-bar-buffered {
position: absolute;
top: 0;
left: 0;
height: 100%;
background: rgba(106,155,204,0.25);
border-radius: 3px;
transition: width 0.3s;
}
.podcast-bar-progress {
position: absolute;
top: 0;
left: 0;
height: 100%;
background: linear-gradient(90deg, #6a9bcc, #00d4c8);
border-radius: 3px;
transition: width 0.1s linear;
}
.podcast-bar-bg:hover {
height: 10px;
margin-top: -2px;
}
.podcast-controls-row {
display: flex;
align-items: center;
justify-content: space-between;
}
.podcast-transport {
display: flex;
align-items: center;
gap: 8px;
}
.podcast-btn {
background: none;
border: none;
cursor: pointer;
padding: 4px;
display: flex;
align-items: center;
justify-content: center;
border-radius: 50%;
transition: all 0.2s;
}
.podcast-btn svg {
fill: #c8d0e0;
transition: fill 0.2s;
}
.podcast-btn:hover svg {
fill: #f0ece2;
}
.podcast-btn-skip {
position: relative;
}
.podcast-btn-skip span {
position: absolute;
font-size: 7px;
font-weight: 700;
color: #c8d0e0;
top: 50%;
left: 50%;
transform: translate(-50%, -50%);
pointer-events: none;
margin-top: 1px;
}
.podcast-btn-play {
width: 48px;
height: 48px;
background: linear-gradient(135deg, #d97757, #e8956a);
border-radius: 50%;
box-shadow: 0 3px 12px rgba(217,119,87,0.4);
transition: all 0.2s;
}
.podcast-btn-play:hover {
transform: scale(1.08);
box-shadow: 0 5px 20px rgba(217,119,87,0.5);
}
.podcast-btn-play svg {
fill: #fff;
width: 22px;
height: 22px;
}
.podcast-extras {
display: flex;
align-items: center;
gap: 10px;
}
.podcast-volume-wrap {
display: flex;
align-items: center;
gap: 5px;
}
.podcast-volume-wrap svg {
fill: #8b9dc3;
width: 16px;
height: 16px;
cursor: pointer;
flex-shrink: 0;
}
.podcast-volume-wrap svg:hover {
fill: #c8d0e0;
}
.podcast-volume-slider {
-webkit-appearance: none;
appearance: none;
width: 60px;
height: 4px;
background: rgba(255,255,255,0.12);
border-radius: 2px;
outline: none;
cursor: pointer;
}
.podcast-volume-slider::-webkit-slider-thumb {
-webkit-appearance: none;
appearance: none;
width: 12px;
height: 12px;
background: #6a9bcc;
border-radius: 50%;
cursor: pointer;
}
.podcast-speed-btn {
background: rgba(255,255,255,0.08);
border: 1px solid rgba(255,255,255,0.12);
color: #c8d0e0;
font-size: 11px;
font-weight: 600;
padding: 3px 9px;
border-radius: 12px;
cursor: pointer;
transition: all 0.2s;
font-family: inherit;
min-width: 40px;
text-align: center;
}
.podcast-speed-btn:hover {
background: rgba(106,155,204,0.2);
border-color: #6a9bcc;
color: #f0ece2;
}
.podcast-download-btn {
background: none;
border: 1px solid rgba(255,255,255,0.12);
border-radius: 8px;
padding: 4px 10px;
cursor: pointer;
display: flex;
align-items: center;
gap: 4px;
color: #8b9dc3;
font-size: 11px;
font-family: inherit;
text-decoration: none;
transition: all 0.2s;
}
.podcast-download-btn:hover {
border-color: #6a9bcc;
color: #f0ece2;
background: rgba(106,155,204,0.1);
}
.podcast-download-btn svg {
width: 14px;
height: 14px;
fill: currentColor;
}
@media (max-width: 600px) {
.podcast-container { padding: 14px 16px 16px; }
.podcast-volume-wrap { display: none; }
.podcast-title-block h4 { font-size: 13px; }
.podcast-extras { gap: 8px; }
}
&lt;/style>
&lt;div class="podcast-overlay" id="podOverlay">
&lt;div class="podcast-container">
&lt;div class="podcast-inner">
&lt;audio id="podAudio" preload="none" src="https://files.catbox.moe/q7xbo9.m4a">&lt;/audio>
&lt;div class="podcast-top-row">
&lt;div class="podcast-icon">
&lt;svg viewBox="0 0 24 24">&lt;path d="M12 1a5 5 0 0 0-5 5v4a5 5 0 0 0 10 0V6a5 5 0 0 0-5-5zm0 16a7 7 0 0 1-7-7H3a9 9 0 0 0 8 8.94V22h2v-3.06A9 9 0 0 0 21 10h-2a7 7 0 0 1-7 7z"/>&lt;/svg>
&lt;/div>
&lt;div class="podcast-title-block">
&lt;h4>AI Podcast: MGWFER and Spatial Confounders&lt;/h4>
&lt;span id="podDurationLabel">Click play to load&lt;/span>
&lt;/div>
&lt;button class="podcast-close-btn" onclick="podClose()" title="Close player">
&lt;svg viewBox="0 0 24 24">&lt;path d="M19 6.41L17.59 5 12 10.59 6.41 5 5 6.41 10.59 12 5 17.59 6.41 19 12 13.41 17.59 19 19 17.59 13.41 12z"/>&lt;/svg>
&lt;/button>
&lt;/div>
&lt;div class="podcast-progress-wrap">
&lt;div class="podcast-time-row">
&lt;span id="podCurrent">0:00&lt;/span>
&lt;span id="podDuration">0:00&lt;/span>
&lt;/div>
&lt;div class="podcast-bar-bg" id="podBarBg" onclick="podSeek(event)">
&lt;div class="podcast-bar-buffered" id="podBuffered">&lt;/div>
&lt;div class="podcast-bar-progress" id="podProgress">&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class="podcast-controls-row">
&lt;div class="podcast-transport">
&lt;button class="podcast-btn podcast-btn-skip" onclick="podSkip(-15)" title="Back 15s">
&lt;svg width="26" height="26" viewBox="0 0 24 24">&lt;path d="M12 5V1L7 6l5 5V7c3.31 0 6 2.69 6 6s-2.69 6-6 6-6-2.69-6-6H4c0 4.42 3.58 8 8 8s8-3.58 8-8-3.58-8-8-8z"/>&lt;/svg>
&lt;span>15&lt;/span>
&lt;/button>
&lt;button class="podcast-btn podcast-btn-play" id="podPlayBtn" onclick="podToggle()" title="Play">
&lt;svg id="podIconPlay" viewBox="0 0 24 24">&lt;path d="M8 5v14l11-7z"/>&lt;/svg>
&lt;svg id="podIconPause" viewBox="0 0 24 24" style="display:none">&lt;path d="M6 19h4V5H6v14zm8-14v14h4V5h-4z"/>&lt;/svg>
&lt;/button>
&lt;button class="podcast-btn podcast-btn-skip" onclick="podSkip(15)" title="Forward 15s">
&lt;svg width="26" height="26" viewBox="0 0 24 24">&lt;path d="M12 5V1l5 5-5 5V7c-3.31 0-6 2.69-6 6s2.69 6 6 6 6-2.69 6-6h2c0 4.42-3.58 8-8 8s-8-3.58-8-8 3.58-8 8-8z"/>&lt;/svg>
&lt;span>15&lt;/span>
&lt;/button>
&lt;/div>
&lt;div class="podcast-extras">
&lt;div class="podcast-volume-wrap">
&lt;svg id="podVolIcon" onclick="podMute()" viewBox="0 0 24 24">&lt;path d="M3 9v6h4l5 5V4L7 9H3zm13.5 3A4.5 4.5 0 0 0 14 8.5v7a4.47 4.47 0 0 0 2.5-3.5zM14 3.23v2.06a6.51 6.51 0 0 1 0 13.42v2.06A8.51 8.51 0 0 0 14 3.23z"/>&lt;/svg>
&lt;input type="range" class="podcast-volume-slider" id="podVolume" min="0" max="1" step="0.05" value="0.8">
&lt;/div>
&lt;button class="podcast-speed-btn" id="podSpeedBtn" onclick="podCycleSpeed()" title="Playback speed">1x&lt;/button>
&lt;a class="podcast-download-btn" href="https://files.catbox.moe/q7xbo9.m4a" target="_blank" rel="noopener" title="Stream">
&lt;svg viewBox="0 0 24 24">&lt;path d="M19 9h-4V3H9v6H5l7 7 7-7zM5 18v2h14v-2H5z"/>&lt;/svg>
&lt;/a>
&lt;/div>
&lt;/div>
&lt;/div>
&lt;/div>
&lt;/div>
</description></item><item><title>Multiscale Geographically Weighted Regression: Spatially Varying Economic Convergence in Indonesia</title><link>https://carlos-mendez.org/post/python_mgwr/</link><pubDate>Sun, 22 Mar 2026 00:00:00 +0000</pubDate><guid>https://carlos-mendez.org/post/python_mgwr/</guid><description>&lt;h2 id="1-overview">1. Overview&lt;/h2>
&lt;p>When we ask &amp;ldquo;do poorer regions catch up to richer ones?&amp;rdquo;, the standard approach is to run a single regression across all regions and report one coefficient. But what if the answer depends on &lt;em>where&lt;/em> you look? A negative coefficient in Sumatra does not mean the same process is at work in Papua. A global regression forces every district onto the same line &amp;mdash; and in doing so, it may hide the most interesting part of the story.&lt;/p>
&lt;p>&lt;strong>Multiscale Geographically Weighted Regression (MGWR)&lt;/strong> addresses this by estimating a separate set of coefficients at every location, weighted by proximity. Its key innovation over standard GWR is that each variable is allowed to operate at its own spatial scale. The intercept (representing baseline growth conditions) might vary smoothly across large regions, while the convergence coefficient might shift sharply between neighboring districts. MGWR discovers these scales from the data rather than imposing a single bandwidth on all variables.&lt;/p>
&lt;p>This tutorial applies MGWR to &lt;strong>514 Indonesian districts&lt;/strong> to answer: &lt;strong>does economic catching-up happen at the same pace everywhere in Indonesia, or does geography shape how fast poorer districts close the gap?&lt;/strong> We progress from a global regression baseline through MGWR estimation and coefficient mapping, revealing that the global R² of 0.214 jumps to 0.762 once we allow the relationship to vary across space.&lt;/p>
&lt;p>&lt;strong>Learning objectives:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Understand why a single regression coefficient may hide important spatial variation&lt;/li>
&lt;li>Estimate location-specific relationships with spatially varying coefficients&lt;/li>
&lt;li>Apply MGWR to allow each variable to operate at its own spatial scale&lt;/li>
&lt;li>Map and interpret spatially varying coefficients across Indonesia&lt;/li>
&lt;li>Compare global OLS vs MGWR model fit and diagnostics&lt;/li>
&lt;/ul>
&lt;h3 id="key-concepts-at-a-glance">Key concepts at a glance&lt;/h3>
&lt;p>The post relies on a small vocabulary throughout, and the rest of the tutorial assumes you can move between these terms quickly. Each concept below has three parts. The &lt;strong>definition&lt;/strong> is always visible. The &lt;strong>example&lt;/strong> and &lt;strong>analogy&lt;/strong> sit behind clickable cards: open them when you need them, leave them collapsed for a quick scan. If a later section mentions &amp;ldquo;bandwidth&amp;rdquo; or &amp;ldquo;spatial heterogeneity&amp;rdquo; and the term feels slippery, this is the section to re-read.&lt;/p>
&lt;p>&lt;strong>1. Local regression&lt;/strong> $\hat\beta(s)$ varies by location. One regression per location $s$, weighted by spatial proximity. Coefficients become functions of geographic position rather than fixed numbers.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">&lt;summary>Example&lt;/summary>
&lt;p>In this post the convergence coefficient $\hat\beta$ on &lt;code>ln_gdppc2010&lt;/code> varies across the 514 Indonesian districts, from -1.74 (strong catching-up) to +0.42 (weak, statistically insignificant divergence), in standard-deviation units.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">&lt;summary>Analogy&lt;/summary>
&lt;p>Drawing a different best-fit line at each map dot, not one global line for the whole country.&lt;/p>
&lt;/details>
&lt;/div>
&lt;p>&lt;strong>2. Bandwidth (kernel)&lt;/strong> $h$. The number of nearest neighbours each local regression uses. Smaller $h$ = more localized, noisier estimates; larger $h$ = smoother but flatter.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">&lt;summary>Example&lt;/summary>
&lt;p>This post selects an optimal bandwidth of 44 districts (out of 514) for both regressors. Each local regression at a given district uses its 44 nearest neighbours.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">&lt;summary>Analogy&lt;/summary>
&lt;p>The radius of the circle of friends a local model listens to before deciding.&lt;/p>
&lt;/details>
&lt;/div>
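&lt;p>As a rough illustration of how a neighbour count becomes a set of weights, the sketch below computes adaptive bisquare kernel weights for one focal point. It assumes planar coordinates and takes the bandwidth distance to be the distance to the $h$-th nearest point (the focal point included), inflated by a tiny factor so that this point keeps a small positive weight. The helper &lt;code>adaptive_bisquare_weights&lt;/code> and the random points are hypothetical; &lt;code>mgwr&lt;/code> computes its kernels internally and may differ in such edge conventions.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np
def adaptive_bisquare_weights(coords, i, h, eps=1.0000001):
    # Hypothetical helper (not part of the mgwr API): kernel weights of all
    # points for the local regression centred on point i.
    # coords: (n, 2) array of planar coordinates; h: number of nearest neighbours.
    d = np.sqrt(((coords - coords[i]) ** 2).sum(axis=1))
    # Adaptive bandwidth: distance to the h-th nearest point (point i itself
    # counts, at distance 0), inflated slightly so that point keeps a tiny weight.
    b = np.sort(d)[h - 1] * eps
    u = d / b
    # Bisquare kernel (1 - u^2)^2 inside the window; clipping sets it to 0 beyond b.
    return np.clip(1.0 - u ** 2, 0.0, None) ** 2
rng = np.random.default_rng(0)
pts = rng.uniform(0, 10, size=(514, 2))
w = adaptive_bisquare_weights(pts, i=0, h=44)
print('points with positive weight:', np.count_nonzero(w))
print('weight of the focal point:', w[0])
&lt;/code>&lt;/pre>
&lt;p>With 514 random points and $h = 44$, exactly 44 points receive a positive weight, the focal point gets the maximum weight of 1, and all remaining points are ignored by that local regression.&lt;/p>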
&lt;p>&lt;strong>3. Spatial heterogeneity&lt;/strong> $\beta_i \neq \beta_j$. Coefficients differ across space. The relationship between predictors and outcome is not constant geographically.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">&lt;summary>Example&lt;/summary>
&lt;p>In this post catching-up is statistically significant in 149 of 514 districts (29%, all with negative β) and insignificant in the other 365; no district shows significant divergence. Convergence is not a single Indonesia-wide story.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">&lt;summary>Analogy&lt;/summary>
&lt;p>Different family recipes in different villages — not the same dish everywhere.&lt;/p>
&lt;/details>
&lt;/div>
&lt;p>&lt;strong>4. GWR vs MGWR&lt;/strong> one $h$ vs $h$ per regressor. GWR uses a single bandwidth for &lt;em>all&lt;/em> coefficients. MGWR allows each coefficient to have its own bandwidth, capturing the fact that different processes operate at different spatial scales.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">&lt;summary>Example&lt;/summary>
&lt;p>In this post both &lt;code>ln_gdppc2010&lt;/code> and the intercept happen to share bandwidth = 44, but in general MGWR could assign, say, a bandwidth of 30 to one variable and 200 to another. Relaxing the shared-bandwidth constraint is MGWR&amp;rsquo;s methodological advance.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">&lt;summary>Analogy&lt;/summary>
&lt;p>One volume knob for everyone vs each instrument with its own knob.&lt;/p>
&lt;/details>
&lt;/div>
&lt;p>&lt;strong>5. Local R²&lt;/strong> $R^2_i$. The R² of the local regression at district $i$. Maps to a colour scale to show &lt;em>where&lt;/em> the model fits well and &lt;em>where&lt;/em> it struggles.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">&lt;summary>Example&lt;/summary>
&lt;p>This post maps local R² across Indonesia. Fits are strong in dense Java districts and weaker in sparse, remote eastern islands where the 44 nearest neighbours span huge geographic distances.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">&lt;summary>Analogy&lt;/summary>
&lt;p>&amp;ldquo;How well-played is the song in &lt;em>this&lt;/em> village&amp;rdquo;.&lt;/p>
&lt;/details>
&lt;/div>
&lt;p>&lt;strong>6. AICc model selection&lt;/strong> lower AICc = better. The corrected Akaike Information Criterion adds a small-sample correction to AIC and penalizes model complexity. It is the standard criterion for comparing MGWR with OLS.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">&lt;summary>Example&lt;/summary>
&lt;p>In this post global OLS has AICc = 1341.25 while MGWR has AICc = 838.41 — a difference of more than 500 strongly favours the spatially varying model.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">&lt;summary>Analogy&lt;/summary>
&lt;p>The picky food critic comparing the two restaurants and giving a definitive verdict.&lt;/p>
&lt;/details>
&lt;/div>
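&lt;p>The two AICc values can be approximately reproduced from the diagnostics in the MGWR summary, assuming the Gaussian form commonly used for GWR-type models, $\text{AICc} = n\ln(\text{RSS}/n) + n\ln(2\pi) + n + \frac{2n(\operatorname{tr}(S)+1)}{n-\operatorname{tr}(S)-2}$, with $\operatorname{tr}(S) = 2$ for OLS. The OLS residual sum of squares is not printed in this post, so the sketch backs it out from the rounded global R² of 0.2135 on the standardized outcome; that value should therefore match only to within rounding.&lt;/p>
&lt;pre>&lt;code class="language-python">import math
def aicc_gaussian(rss, n, trace_s):
    # Corrected AIC for a Gaussian model with n observations and trace_s effective parameters.
    neg2_loglik = n * math.log(rss / n) + n * math.log(2 * math.pi) + n
    penalty = 2 * n * (trace_s + 1) / (n - trace_s - 2)
    return neg2_loglik + penalty
n = 514
# MGWR: RSS and trace(S) taken from the summary table (standardized outcome).
aicc_mgwr = aicc_gaussian(122.081, n, 52.076)
# OLS: RSS implied by the rounded R2 = 0.2135 when the outcome is z-scored (total SS = n).
aicc_ols = aicc_gaussian(n * (1 - 0.2135), n, 2)
print(f'AICc OLS  : {aicc_ols:.2f}')
print(f'AICc MGWR : {aicc_mgwr:.2f}')
print(f'Difference: {aicc_ols - aicc_mgwr:.2f}')
&lt;/code>&lt;/pre>
&lt;p>Both values come out within rounding of the reported 1341.25 and 838.41, so the gap of roughly 500 points follows directly from the residual sums of squares and the effective parameter counts.&lt;/p>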
&lt;p>&lt;strong>7. β-convergence&lt;/strong> $g_i = \alpha + \beta \ln(y_{i,2010}) + \varepsilon_i$. The classic growth-economics test: poor regions catching up with rich ones leads to a &lt;em>negative&lt;/em> β coefficient on initial income.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">&lt;summary>Example&lt;/summary>
&lt;p>This post&amp;rsquo;s global β = -0.1948 (mild catching-up overall), or about -0.46 in standard-deviation units. MGWR, estimated on standardized variables, reveals local β values ranging from -1.74 (strong local convergence) to +0.42 (local divergence). The story is heterogeneous and the global average hides this.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">&lt;summary>Analogy&lt;/summary>
&lt;p>Poor districts catching up with rich ones. A negative slope means the gap shrinks; a positive slope means the gap widens.&lt;/p>
&lt;/details>
&lt;/div>
&lt;p>&lt;strong>8. Effective number of parameters&lt;/strong> trace of hat matrix. MGWR has more flexibility than OLS but less than fitting one regression per district. The &amp;ldquo;effective&amp;rdquo; parameter count quantifies this middle ground.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">&lt;summary>Example&lt;/summary>
&lt;p>This post&amp;rsquo;s MGWR uses 52.076 effective parameters: far more than OLS&amp;rsquo;s 2 but far fewer than the 514×2 = 1,028 coefficients of one unconstrained regression per district. Data-driven bandwidth selection sets this level of model complexity automatically.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">&lt;summary>Analogy&lt;/summary>
&lt;p>A soft count of how many independent knobs the model really has.&lt;/p>
&lt;/details>
&lt;/div>
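&lt;p>To see why the effective number of parameters sits between the OLS count and the number of observations, the sketch below computes the trace of the hat matrix $S$ (the matrix that maps $y$ to the fitted values) on simulated data. For OLS the trace equals the number of columns of $X$; for a locally weighted fit it grows as the bandwidth shrinks. For simplicity it uses a single fixed Gaussian kernel rather than MGWR&amp;rsquo;s per-variable adaptive bisquare kernels, and the helper &lt;code>gwr_hat_trace&lt;/code> is hypothetical.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np
rng = np.random.default_rng(42)
n = 200
coords = rng.uniform(0, 10, size=(n, 2))
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one regressor
# OLS hat matrix H = X (X'X)^{-1} X'; its trace equals the number of columns of X.
H = X @ np.linalg.solve(X.T @ X, X.T)
trace_ols = np.trace(H)
def gwr_hat_trace(coords, X, bandwidth):
    # Hypothetical helper: trace of the GWR hat matrix with a fixed Gaussian kernel.
    # Row i of S is x_i' (X' W_i X)^{-1} X' W_i; only its diagonal entry is kept.
    total = 0.0
    for i in range(len(X)):
        d2 = ((coords - coords[i]) ** 2).sum(axis=1)
        w = np.exp(-0.5 * d2 / bandwidth ** 2)
        XtW = X.T * w                      # X' W_i without forming the n x n matrix
        row_i = X[i] @ np.linalg.solve(XtW @ X, XtW)
        total += row_i[i]
    return total
traces = {bw: gwr_hat_trace(coords, X, bw) for bw in (1.0, 3.0, 50.0)}
print(f'OLS: tr(H) = {trace_ols:.3f}')
for bw, tr in traces.items():
    print(f'Gaussian bandwidth {bw:4.1f}: tr(S) = {tr:.1f}')
&lt;/code>&lt;/pre>
&lt;p>The OLS trace is exactly 2, the nearly global bandwidth gives a trace close to 2, and smaller bandwidths give progressively larger traces: the same middle ground occupied by the 52.076 effective parameters of this post&amp;rsquo;s MGWR.&lt;/p>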
&lt;h2 id="2-the-modeling-pipeline">2. The modeling pipeline&lt;/h2>
&lt;p>The analysis follows a natural progression: start with a simple global model, visualize the spatial patterns it cannot capture, then let MGWR reveal the local structure.&lt;/p>
&lt;pre>&lt;code class="language-mermaid">graph LR
A[&amp;quot;&amp;lt;b&amp;gt;Step 1&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Load &amp;amp;&amp;lt;br/&amp;gt;Explore&amp;quot;] --&amp;gt; B[&amp;quot;&amp;lt;b&amp;gt;Step 2&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Map&amp;lt;br/&amp;gt;Variables&amp;quot;]
B --&amp;gt; C[&amp;quot;&amp;lt;b&amp;gt;Step 3&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Global&amp;lt;br/&amp;gt;OLS&amp;quot;]
C --&amp;gt; D[&amp;quot;&amp;lt;b&amp;gt;Step 4&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;MGWR&amp;lt;br/&amp;gt;Estimation&amp;quot;]
D --&amp;gt; E[&amp;quot;&amp;lt;b&amp;gt;Step 5&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Map&amp;lt;br/&amp;gt;Coefficients&amp;quot;]
E --&amp;gt; F[&amp;quot;&amp;lt;b&amp;gt;Step 6&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Significance&amp;lt;br/&amp;gt;&amp;amp; Compare&amp;quot;]
style A fill:#141413,stroke:#6a9bcc,color:#fff
style B fill:#d97757,stroke:#141413,color:#fff
style C fill:#6a9bcc,stroke:#141413,color:#fff
style D fill:#00d4c8,stroke:#141413,color:#fff
style E fill:#00d4c8,stroke:#141413,color:#fff
style F fill:#1a3a8a,stroke:#141413,color:#fff
&lt;/code>&lt;/pre>
&lt;h2 id="3-setup-and-imports">3. Setup and imports&lt;/h2>
&lt;p>The analysis uses &lt;a href="https://mgwr.readthedocs.io/" target="_blank" rel="noopener">mgwr&lt;/a> for multiscale regression, &lt;a href="https://geopandas.org/" target="_blank" rel="noopener">GeoPandas&lt;/a> for spatial data, and &lt;a href="https://pysal.org/mapclassify/" target="_blank" rel="noopener">mapclassify&lt;/a> for choropleth classification.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
from matplotlib.patches import Patch
import mapclassify
from scipy import stats
from mgwr.gwr import MGWR
from mgwr.sel_bw import Sel_BW
import warnings
warnings.filterwarnings(&amp;quot;ignore&amp;quot;)
# Site color palette
STEEL_BLUE = &amp;quot;#6a9bcc&amp;quot;
WARM_ORANGE = &amp;quot;#d97757&amp;quot;
NEAR_BLACK = &amp;quot;#141413&amp;quot;
TEAL = &amp;quot;#00d4c8&amp;quot;
&lt;/code>&lt;/pre>
&lt;details>
&lt;summary>Dark theme figure styling (click to expand)&lt;/summary>
&lt;pre>&lt;code class="language-python">DARK_NAVY = &amp;quot;#0f1729&amp;quot;
GRID_LINE = &amp;quot;#1f2b5e&amp;quot;
LIGHT_TEXT = &amp;quot;#c8d0e0&amp;quot;
WHITE_TEXT = &amp;quot;#e8ecf2&amp;quot;
plt.rcParams.update({
&amp;quot;figure.facecolor&amp;quot;: DARK_NAVY,
&amp;quot;axes.facecolor&amp;quot;: DARK_NAVY,
&amp;quot;axes.edgecolor&amp;quot;: DARK_NAVY,
&amp;quot;axes.linewidth&amp;quot;: 0,
&amp;quot;axes.labelcolor&amp;quot;: LIGHT_TEXT,
&amp;quot;axes.titlecolor&amp;quot;: WHITE_TEXT,
&amp;quot;axes.spines.top&amp;quot;: False,
&amp;quot;axes.spines.right&amp;quot;: False,
&amp;quot;axes.spines.left&amp;quot;: False,
&amp;quot;axes.spines.bottom&amp;quot;: False,
&amp;quot;axes.grid&amp;quot;: True,
&amp;quot;grid.color&amp;quot;: GRID_LINE,
&amp;quot;grid.linewidth&amp;quot;: 0.6,
&amp;quot;grid.alpha&amp;quot;: 0.8,
&amp;quot;xtick.color&amp;quot;: LIGHT_TEXT,
&amp;quot;ytick.color&amp;quot;: LIGHT_TEXT,
&amp;quot;xtick.major.size&amp;quot;: 0,
&amp;quot;ytick.major.size&amp;quot;: 0,
&amp;quot;text.color&amp;quot;: WHITE_TEXT,
&amp;quot;font.size&amp;quot;: 12,
&amp;quot;legend.frameon&amp;quot;: False,
&amp;quot;legend.fontsize&amp;quot;: 11,
&amp;quot;legend.labelcolor&amp;quot;: LIGHT_TEXT,
&amp;quot;figure.edgecolor&amp;quot;: DARK_NAVY,
&amp;quot;savefig.facecolor&amp;quot;: DARK_NAVY,
&amp;quot;savefig.edgecolor&amp;quot;: DARK_NAVY,
})
&lt;/code>&lt;/pre>
&lt;/details>
&lt;h2 id="4-data-loading-and-exploration">4. Data loading and exploration&lt;/h2>
&lt;p>The dataset covers &lt;strong>514 Indonesian districts&lt;/strong> with GDP per capita in 2010 and the subsequent growth rate through 2018. Indonesia is an ideal setting for studying spatial heterogeneity: it comprises more than 17,000 islands stretching some 5,000 km from west to east, with enormous variation in economic structure, geography, and institutional capacity.&lt;/p>
&lt;p>The core idea behind convergence is straightforward: if poorer districts tend to grow faster than richer ones, the income gap narrows over time. In a regression framework, this means we expect a &lt;strong>negative relationship&lt;/strong> between initial income (log GDP per capita in 2010) and subsequent growth. The question is whether that negative relationship holds uniformly across the archipelago &amp;mdash; or whether it is stronger in some places and weaker (or even reversed) in others.&lt;/p>
&lt;pre>&lt;code class="language-python">CSV_URL = (&amp;quot;https://github.com/quarcs-lab/data-quarcs/raw/refs/heads/&amp;quot;
&amp;quot;master/indonesia514/dataBeta.csv&amp;quot;)
GEO_URL = (&amp;quot;https://github.com/quarcs-lab/data-quarcs/raw/refs/heads/&amp;quot;
&amp;quot;master/indonesia514/mapIdonesia514-opt.geojson&amp;quot;)
df = pd.read_csv(CSV_URL)
geo = gpd.read_file(GEO_URL)
gdf = geo.merge(df, on=&amp;quot;districtID&amp;quot;, how=&amp;quot;left&amp;quot;)
print(f&amp;quot;Loaded: {gdf.shape[0]} districts, {gdf.shape[1]} columns&amp;quot;)
print(gdf[[&amp;quot;ln_gdppc2010&amp;quot;, &amp;quot;g&amp;quot;]].describe().round(4).to_string())
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Loaded: 514 districts, 16 columns
ln_gdppc2010 g
count 514.0000 514.0000
mean 9.8371 0.3860
std 0.7603 0.3205
min 7.1657 -2.0452
25% 9.3983 0.2583
50% 9.7626 0.3453
75% 10.1739 0.4158
max 13.4438 2.0563
&lt;/code>&lt;/pre>
&lt;p>The 514 districts span a wide range of initial income: log GDP per capita ranges from 7.17 (the poorest district, about 1,300 per capita in level terms, since $e^{7.17} \approx 1{,}300$) to 13.44 (the richest, about 690,000, likely a resource-extraction enclave), a gap of more than 500-fold. Growth rates also vary enormously, from -2.05 (severe contraction) to +2.06 (rapid expansion), with a mean of 0.39. This high variance in both variables suggests that a single regression line will struggle to capture the full picture.&lt;/p>
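&lt;p>Because the income variable is in logs, exponentiating recovers levels. The short calculation below does this for the minimum, mean, and maximum of &lt;code>ln_gdppc2010&lt;/code> from the summary table; the exponential of the mean log is the geometric mean. The currency unit of the underlying series is not stated in this section, so the levels are left unit-free.&lt;/p>
&lt;pre>&lt;code class="language-python">import math
ln_min, ln_mean, ln_max = 7.1657, 9.8371, 13.4438  # from describe() above
level_min = math.exp(ln_min)
level_max = math.exp(ln_max)
geo_mean = math.exp(ln_mean)  # exp(mean of logs) is the geometric mean
print(f'poorest district : {level_min:,.0f}')
print(f'geometric mean   : {geo_mean:,.0f}')
print(f'richest district : {level_max:,.0f}')
print(f'richest / poorest: {level_max / level_min:,.0f} times')
&lt;/code>&lt;/pre>
&lt;p>A ratio of roughly 530 between the richest and the poorest district is what a log range of about 6.3 implies, which underlines how extreme the tails of the income distribution are.&lt;/p>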
&lt;h2 id="5-exploratory-maps">5. Exploratory maps&lt;/h2>
&lt;p>Before fitting any model, we map the two key variables to see whether spatial patterns are visible to the naked eye. If initial income and growth are geographically clustered, that is already a hint that spatial models will outperform global ones.&lt;/p>
&lt;pre>&lt;code class="language-python">fig, axes = plt.subplots(2, 1, figsize=(14, 14))
for ax, col, title in [
    (axes[0], &amp;quot;ln_gdppc2010&amp;quot;, &amp;quot;(a) Log GDP per capita, 2010&amp;quot;),
    (axes[1], &amp;quot;g&amp;quot;, &amp;quot;(b) GDP growth rate, 2010–2018&amp;quot;),
]:
    fj = mapclassify.FisherJenks(gdf[col].dropna().values, k=5)
    classified = mapclassify.UserDefined(gdf[col].values, bins=fj.bins.tolist())
    cmap = plt.cm.coolwarm
    norm = plt.Normalize(vmin=0, vmax=4)
    colors = [cmap(norm(c)) for c in classified.yb]
    gdf.plot(ax=ax, color=colors, edgecolor=GRID_LINE, linewidth=0.2)
    ax.set_title(title, fontsize=14, pad=10)
    ax.set_axis_off()
plt.tight_layout()
plt.savefig(&amp;quot;mgwr_map_xy.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;)
plt.show()
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="mgwr_map_xy.png" alt="Two-panel choropleth map of Indonesia showing log GDP per capita in 2010 and GDP growth rate 2010-2018.">&lt;/p>
&lt;p>The maps reveal clear spatial structure. Initial income (panel a) is highest in Jakarta and resource-rich districts in Kalimantan and Papua (warm red), while the lowest-income districts cluster in eastern Nusa Tenggara and parts of Maluku (cool blue). Growth rates (panel b) show a different pattern: some of the poorest districts in Papua and Sulawesi experienced rapid growth (suggesting catching-up), while several high-income resource districts saw contraction. The fact that these patterns are geographically organized &amp;mdash; not randomly scattered &amp;mdash; motivates the use of spatially varying models.&lt;/p>
&lt;h2 id="6-global-regression-baseline">6. Global regression baseline&lt;/h2>
&lt;p>The simplest test for economic convergence fits a single regression line through all 514 districts. If the slope is negative, poorer districts (low initial income) tend to grow faster than richer ones.&lt;/p>
&lt;p>$$g_i = \alpha + \beta \cdot \ln(y_{i,2010}) + \varepsilon_i$$&lt;/p>
&lt;p>where $g_i$ is the growth rate, $\ln(y_{i,2010})$ is log initial income, and $\beta &amp;lt; 0$ indicates convergence. In the code, $g_i$ corresponds to the column &lt;code>g&lt;/code> and $\ln(y_{i,2010})$ to &lt;code>ln_gdppc2010&lt;/code>.&lt;/p>
&lt;pre>&lt;code class="language-python">slope, intercept, r_value, p_value, std_err = stats.linregress(
gdf[&amp;quot;ln_gdppc2010&amp;quot;], gdf[&amp;quot;g&amp;quot;]
)
print(f&amp;quot;Slope (convergence coefficient): {slope:.4f}&amp;quot;)
print(f&amp;quot;R-squared: {r_value**2:.4f}&amp;quot;)
print(f&amp;quot;p-value: {p_value:.6f}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Slope (convergence coefficient): -0.1948
R-squared: 0.2135
p-value: 0.000000
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-python">fig, ax = plt.subplots(figsize=(10, 7))
ax.scatter(gdf[&amp;quot;ln_gdppc2010&amp;quot;], gdf[&amp;quot;g&amp;quot;],
color=STEEL_BLUE, edgecolors=GRID_LINE, s=35, alpha=0.6, zorder=3)
x_range = np.linspace(gdf[&amp;quot;ln_gdppc2010&amp;quot;].min(), gdf[&amp;quot;ln_gdppc2010&amp;quot;].max(), 100)
ax.plot(x_range, intercept + slope * x_range, color=WARM_ORANGE,
linewidth=2, zorder=2)
ax.set_xlabel(&amp;quot;Log GDP per capita (2010)&amp;quot;)
ax.set_ylabel(&amp;quot;GDP growth rate (2010–2018)&amp;quot;)
ax.set_title(&amp;quot;Global convergence regression&amp;quot;)
plt.savefig(&amp;quot;mgwr_scatter_global.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;)
plt.show()
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="mgwr_scatter_global.png" alt="Scatter plot of log GDP per capita 2010 vs growth rate with OLS regression line.">&lt;/p>
&lt;p>The global regression confirms that convergence exists &lt;strong>on average&lt;/strong>: the slope is $-0.195$ (p &amp;lt; 0.001), meaning that a one-unit increase in log initial income (roughly a 2.7-fold higher income level) is associated with a growth rate 0.195 units lower. However, the R² of only 0.214 means this single line explains just 21% of the variation in growth rates. The scatter plot shows enormous dispersion around the regression line: many districts with similar initial income experienced vastly different growth trajectories. This low explanatory power motivates MGWR: perhaps the relationship is not weak everywhere but strong in some regions and absent in others, and a single coefficient simply averages over this heterogeneity.&lt;/p>
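&lt;p>The local MGWR coefficients later in the post are estimated on standardized variables, so the global slope has to be put in standard-deviation units before the two can be compared. With a single regressor, the standardized OLS slope equals the Pearson correlation, so it can be obtained either by rescaling the raw slope by $s_x / s_y$ or as $\pm\sqrt{R^2}$ with the sign of the slope. A minimal check with the numbers printed above:&lt;/p>
&lt;pre>&lt;code class="language-python">import math
slope_raw = -0.1948           # global OLS slope, raw units
r2 = 0.2135                   # global R-squared
sd_x, sd_y = 0.7603, 0.3205   # sample std of ln_gdppc2010 and g from describe()
# Rescale the raw slope: change in g (in SDs) per one-SD change in ln_gdppc2010.
slope_std = slope_raw * sd_x / sd_y
# With one regressor, the standardized slope equals r, and R-squared equals r**2.
r = math.copysign(math.sqrt(r2), slope_raw)
print(f'standardized slope (rescaled raw slope): {slope_std:.3f}')
print(f'standardized slope (signed sqrt of R2) : {r:.3f}')
&lt;/code>&lt;/pre>
&lt;p>Both routes give about $-0.46$, which is the global benchmark to hold against the standardized MGWR coefficients mapped in Section 7.3.&lt;/p>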
&lt;h2 id="7-from-global-to-local-why-mgwr">7. From global to local: why MGWR?&lt;/h2>
&lt;h3 id="71-the-limitation-of-a-single-coefficient">7.1 The limitation of a single coefficient&lt;/h3>
&lt;p>The global regression tells us that $\beta = -0.195$ on average across Indonesia. But consider two districts with the same initial income &amp;mdash; one in Java, where infrastructure and market access are strong, and one in Papua, where remoteness and institutional challenges dominate. There is no reason to expect the same convergence dynamic in both places. A single coefficient forces them onto the same line.&lt;/p>
&lt;p>&lt;strong>Geographically Weighted Regression (GWR)&lt;/strong> addresses this by estimating a separate regression at each location, using a kernel function &amp;mdash; a distance-decay weighting scheme (typically Gaussian or bisquare) that gives more weight to nearby observations and less to distant ones. The result is a set of &lt;strong>location-specific coefficients&lt;/strong> &amp;mdash; each district gets its own slope and intercept:&lt;/p>
&lt;p>$$g_i = \alpha(u_i, v_i) + \beta(u_i, v_i) \cdot \ln(y_{i,2010}) + \varepsilon_i$$&lt;/p>
&lt;p>where $(u_i, v_i)$ are the geographic coordinates of district $i$, and both $\alpha$ and $\beta$ are now functions of location rather than fixed constants. In the code, $(u_i, v_i)$ correspond to &lt;code>COORD_X&lt;/code> and &lt;code>COORD_Y&lt;/code>. The &lt;strong>bandwidth&lt;/strong> parameter $h$ controls how many neighbors contribute to each local regression &amp;mdash; a small bandwidth means only very close districts matter (highly local), while a large bandwidth approaches the global model.&lt;/p>
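&lt;p>Each local regression is a weighted least-squares fit: at location $i$, $\hat{\boldsymbol\beta}(u_i, v_i) = (X^\top W_i X)^{-1} X^\top W_i\, y$, where $W_i$ is a diagonal matrix of kernel weights centred on $(u_i, v_i)$. The sketch below applies this to simulated data with an adaptive bisquare kernel and planar coordinates. The data-generating process, in which the true slope drifts from $-1$ in the west to $0$ in the east, and the helper &lt;code>local_fit&lt;/code> are invented for illustration only.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np
rng = np.random.default_rng(1)
n, h = 300, 44
u = rng.uniform(0, 10, n)             # easting
v = rng.uniform(0, 10, n)             # northing
x = rng.normal(size=n)
beta_true = -1.0 + 0.1 * u            # slope weakens from west to east
y = 0.5 + beta_true * x + rng.normal(scale=0.3, size=n)
X = np.column_stack([np.ones(n), x])
def local_fit(i):
    # Hypothetical helper: weighted least squares centred on observation i.
    d = np.hypot(u - u[i], v - v[i])
    b = np.sort(d)[h - 1] * 1.0000001               # adaptive bandwidth: h nearest points
    w = np.clip(1.0 - (d / b) ** 2, 0.0, None) ** 2  # bisquare weights, the diagonal of W_i
    XtW = X.T * w                                    # X' W_i
    return np.linalg.solve(XtW @ X, XtW @ y)         # (X' W_i X)^{-1} X' W_i y
west, east = int(np.argmin(u)), int(np.argmax(u))
for label, i in [('westernmost', west), ('easternmost', east)]:
    _, b1_i = local_fit(i)
    print(f'{label}: local slope {b1_i:+.2f}, true slope {beta_true[i]:+.2f}')
&lt;/code>&lt;/pre>
&lt;p>Because each local fit borrows from its neighbours, the estimates are pulled slightly toward the slopes of nearby points; this smoothing bias is the price paid for the variance reduction the bandwidth buys.&lt;/p>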
&lt;p>However, standard GWR uses a single bandwidth for all variables, which means the intercept and the convergence coefficient are forced to vary at the same spatial scale.&lt;/p>
&lt;p>&lt;strong>MGWR&lt;/strong> removes this constraint. It allows each variable to find its own optimal bandwidth through an iterative back-fitting procedure: the algorithm cycles through the variables, re-fitting each one&amp;rsquo;s coefficient surface to the partial residuals (the outcome minus the other variables&amp;rsquo; current contributions) and re-selecting its bandwidth while holding the others fixed, until the estimates stop changing. If baseline growth conditions vary smoothly across large regions (large bandwidth) while the convergence speed varies sharply between neighboring districts (small bandwidth), MGWR can discover this from the data. This makes MGWR a more flexible and realistic model for processes that operate at multiple spatial scales. The key assumption is that spatial relationships are &lt;strong>locally stationary&lt;/strong> within each kernel window: the relationship between income and growth is approximately constant among the nearest $h$ districts, even if it differs across the full map.&lt;/p>
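&lt;p>A stripped-down sketch of the back-fitting loop follows. It assumes the per-variable bandwidths are already known; the actual MGWR algorithm also re-selects each bandwidth (by an information criterion such as AICc) inside every pass, which is omitted here for brevity. Each pass regresses the partial residual on one covariate using that covariate&amp;rsquo;s own kernel, and the loop stops once the fitted terms stop changing. The helper functions and the simulated surfaces (a broad intercept trend and a rapidly oscillating slope) are hypothetical.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np
def bisquare_weight_matrix(u, v, h):
    # Row i holds adaptive bisquare weights for the local regression at point i.
    d = np.hypot(u[:, None] - u[None, :], v[:, None] - v[None, :])
    b = np.sort(d, axis=1)[:, h - 1:h] * 1.0000001   # distance to the h-th nearest point
    return np.clip(1.0 - (d / b) ** 2, 0.0, None) ** 2
def univariate_gwr(W, x, r):
    # Local no-intercept WLS of r on x at every point: sum(w*x*r) / sum(w*x*x).
    return (W @ (x * r)) / (W @ (x * x))
def backfit(X, y, u, v, bandwidths, tol=1e-6, max_iter=200):
    # Hypothetical sketch of MGWR back-fitting with fixed, per-column bandwidths.
    n, p = X.shape
    Ws = [bisquare_weight_matrix(u, v, h) for h in bandwidths]
    beta = np.tile(np.linalg.lstsq(X, y, rcond=None)[0], (n, 1))  # start from OLS
    terms = X * beta                     # column j holds the additive term x_j * beta_j
    for it in range(max_iter):
        old = terms.copy()
        for j in range(p):
            partial = y - terms.sum(axis=1) + terms[:, j]   # remove every term except j
            beta[:, j] = univariate_gwr(Ws[j], X[:, j], partial)
            terms[:, j] = X[:, j] * beta[:, j]
        change = np.sqrt(np.mean((terms - old) ** 2))
        if np.isclose(change, 0.0, atol=tol):   # fitted terms no longer move
            break
    return beta, it + 1
rng = np.random.default_rng(7)
n = 300
u, v = rng.uniform(0, 10, n), rng.uniform(0, 10, n)
x = rng.normal(size=n)
b0 = 0.05 * u                      # broad-scale intercept surface
b1 = np.sin(u) * np.cos(v)         # slope surface that changes over short distances
y = b0 + b1 * x + rng.normal(scale=0.2, size=n)
X = np.column_stack([np.ones(n), x])
beta_hat, n_iter = backfit(X, y, u, v, bandwidths=[200, 40])
rss_backfit = np.sum((y - (X * beta_hat).sum(axis=1)) ** 2)
rss_ols = np.sum((y - X @ np.linalg.lstsq(X, y, rcond=None)[0]) ** 2)
print(f'passes: {n_iter}, RSS OLS: {rss_ols:.1f}, RSS back-fitted: {rss_backfit:.1f}')
&lt;/code>&lt;/pre>
&lt;p>Giving the intercept a wide window and the slope a narrow one lets the loop recover a broad trend and a fine-grained pattern at the same time, something a single shared bandwidth would have to compromise on.&lt;/p>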
&lt;h3 id="72-mgwr-estimation">7.2 MGWR estimation&lt;/h3>
&lt;p>With the &lt;code>mgwr&lt;/code> package, multiscale bandwidth selection is normally run on &lt;strong>standardized&lt;/strong> variables (zero mean, unit variance). This keeps the bandwidths comparable across variables measured in different units. The &lt;code>spherical=True&lt;/code> flag tells the algorithm to compute great-circle distances rather than Euclidean distances, which is the appropriate choice for unprojected longitude/latitude coordinates spanning a large area like Indonesia.&lt;/p>
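&lt;p>For intuition about &lt;code>spherical=True&lt;/code>, here is a minimal great-circle (haversine) distance function. It assumes coordinates in decimal degrees and a mean Earth radius of 6,371 km, and it illustrates the kind of distance involved rather than reproducing &lt;code>mgwr&lt;/code>&amp;rsquo;s internal code. The comparison shows that near the equator, where Indonesia lies, a degree of longitude and a degree of latitude cover almost the same distance, whereas at 60 degrees of latitude a degree of longitude shrinks to about half of that.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np
def haversine_km(lon1, lat1, lon2, lat2, radius_km=6371.0):
    # Great-circle distance between two points given in decimal degrees.
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * np.arcsin(np.sqrt(a))
# One degree of longitude vs one degree of latitude, at the equator and at 60 degrees.
for lat in (0.0, 60.0):
    east_west = haversine_km(100.0, lat, 101.0, lat)
    north_south = haversine_km(100.0, lat, 100.0, lat + 1.0)
    print(f'lat {lat:4.1f}: 1 deg lon = {east_west:6.1f} km, 1 deg lat = {north_south:6.1f} km')
&lt;/code>&lt;/pre>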
&lt;pre>&lt;code class="language-python"># Prepare variables
y = gdf[&amp;quot;g&amp;quot;].values.reshape((-1, 1))
X = gdf[[&amp;quot;ln_gdppc2010&amp;quot;]].values
coords = list(zip(gdf[&amp;quot;COORD_X&amp;quot;], gdf[&amp;quot;COORD_Y&amp;quot;]))
# Standardize (required for MGWR)
Zy = (y - y.mean(axis=0)) / y.std(axis=0)
ZX = (X - X.mean(axis=0)) / X.std(axis=0)
# Bandwidth selection and model fitting
mgwr_selector = Sel_BW(coords, Zy, ZX, multi=True, spherical=True)
mgwr_bw = mgwr_selector.search()
mgwr_results = MGWR(coords, Zy, ZX, mgwr_selector, spherical=True).fit()
mgwr_results.summary()
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">===========================================================================
Model type Gaussian
Number of observations: 514
Number of covariates: 2
Global Regression Results
---------------------------------------------------------------------------
R2: 0.214
Adj. R2: 0.212
Multi-Scale Geographically Weighted Regression (MGWR) Results
---------------------------------------------------------------------------
Spatial kernel: Adaptive bisquare
MGWR bandwidths
---------------------------------------------------------------------------
Variable Bandwidth ENP_j Adj t-val(95%) Adj alpha(95%)
X0 44.000 26.805 3.127 0.002
X1 44.000 25.271 3.109 0.002
Diagnostic information
---------------------------------------------------------------------------
Residual sum of squares: 122.081
Effective number of parameters (trace(S)): 52.076
Sigma estimate: 0.514
R2 0.762
Adjusted R2 0.736
AICc: 838.405
===========================================================================
&lt;/code>&lt;/pre>
&lt;p>The MGWR results are striking. &lt;strong>R² jumps from 0.214 (global) to 0.762 (MGWR)&lt;/strong> &amp;mdash; the spatially varying model explains more than three times as much variation as the global regression. Both the intercept and the convergence coefficient receive a bandwidth of 44, meaning each local regression draws on the 44 nearest districts. This is a relatively local scale (44 out of 514 districts, or about 8.6% of the sample), confirming that the convergence relationship varies substantially across the archipelago. The effective number of parameters is 52.1, reflecting the cost of estimating location-specific coefficients instead of two global ones.&lt;/p>
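&lt;p>Several numbers in the summary follow directly from the residual sum of squares and $\operatorname{tr}(S)$. The sketch below recomputes them, assuming (as in the code above) that $y$ was z-scored with NumPy&amp;rsquo;s default population standard deviation, so its total sum of squares equals $n$, and assuming the adjusted R² is $1-(1-R^2)\frac{n-1}{n-\operatorname{tr}(S)-1}$; both assumptions are consistent with the printed table.&lt;/p>
&lt;pre>&lt;code class="language-python">import math
n = 514
rss = 122.081      # residual sum of squares from the MGWR summary
tr_s = 52.076      # effective number of parameters, trace(S)
tss = n            # total sum of squares of a z-scored outcome (np.std uses ddof=0)
r2 = 1 - rss / tss
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - tr_s - 1)
sigma = math.sqrt(rss / (n - tr_s))
print(f'R2 = {r2:.3f}, adjusted R2 = {adj_r2:.3f}, sigma = {sigma:.3f}')
print(f'MGWR R2 / global R2 = {r2 / 0.214:.2f}')
print(f'bandwidth as share of districts = {44 / n:.1%}')
&lt;/code>&lt;/pre>
&lt;p>The recomputed values match the table (0.762, 0.736, and 0.514), the R² ratio is about 3.6, and 44 of 514 districts is about 8.6%.&lt;/p>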
&lt;h3 id="73-mapping-mgwr-coefficients">7.3 Mapping MGWR coefficients&lt;/h3>
&lt;p>The power of MGWR lies in the coefficient maps. Instead of a single number for the whole country, we can now visualize how the convergence relationship changes from district to district. Because MGWR is estimated on standardized variables, the mapped coefficients are in &lt;strong>standard-deviation units&lt;/strong>: a coefficient of $-1.0$ means that a one-standard-deviation increase in log initial income is associated with a one-standard-deviation decrease in growth at that location.&lt;/p>
&lt;pre>&lt;code class="language-python">gdf[&amp;quot;mgwr_intercept&amp;quot;] = mgwr_results.params[:, 0]
gdf[&amp;quot;mgwr_slope&amp;quot;] = mgwr_results.params[:, 1]
&lt;/code>&lt;/pre>
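&lt;p>When raw units are needed, for example to compare a local slope with the global $-0.195$, a standardized local slope can be multiplied by $s_y / s_x$, the ratio of the standard deviations of &lt;code>g&lt;/code> and &lt;code>ln_gdppc2010&lt;/code>. This applies to slopes only; converting the local intercept would also require the two means. A minimal sketch using the sample standard deviations printed in Section 4 (the ddof convention cancels in the ratio); the helper &lt;code>to_raw_units&lt;/code> is hypothetical, and on the mapped column it would be called as &lt;code>to_raw_units(gdf['mgwr_slope'])&lt;/code>.&lt;/p>
&lt;pre>&lt;code class="language-python">sd_x, sd_y = 0.7603, 0.3205   # sample std of ln_gdppc2010 and g from describe()
def to_raw_units(beta_std):
    # Hypothetical helper: change in g per one-unit change in ln_gdppc2010.
    # Works on floats, NumPy arrays, and pandas Series alike.
    return beta_std * sd_y / sd_x
for label, b in [('strongest catching-up', -1.74), ('median', -0.085), ('largest positive', 0.42)]:
    print(f'{label:22s} standardized {b:+.3f}   raw {to_raw_units(b):+.3f}')
&lt;/code>&lt;/pre>
&lt;p>In raw units the strongest local slope is about $-0.73$, several times the global $-0.195$, while the median local slope, about $-0.04$, is much weaker than the global one.&lt;/p>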
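&lt;p>&lt;strong>Intercept map&lt;/strong> &amp;mdash; the intercept captures baseline growth conditions after accounting for initial income. Because both variables are centered, the local intercept is the growth (in standard-deviation units) that the local regression predicts for a district at the national mean of log income. Positive values indicate areas where such a district would grow faster than the national average; negative values indicate underperformance.&lt;/p>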
&lt;pre>&lt;code class="language-python">fig, ax = plt.subplots(figsize=(14, 8))
# Fisher-Jenks classes (k=5) via the geopandas scheme argument (uses mapclassify);
# the full script.py draws a custom Patch legend instead
gdf.plot(ax=ax, column=&amp;quot;mgwr_intercept&amp;quot;, scheme=&amp;quot;FisherJenks&amp;quot;, k=5,
cmap=&amp;quot;coolwarm&amp;quot;, edgecolor=GRID_LINE, linewidth=0.2, legend=True)
ax.set_title(f&amp;quot;MGWR intercept (bandwidth = {int(mgwr_bw[0])})&amp;quot;)
ax.set_axis_off()
plt.savefig(&amp;quot;mgwr_mgwr_intercept.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;)
plt.show()
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="mgwr_mgwr_intercept.png" alt="MGWR intercept map across Indonesia&amp;amp;rsquo;s 514 districts.">&lt;/p>
&lt;p>The intercept map reveals a clear east&amp;ndash;west gradient. Districts in &lt;strong>western Indonesia&lt;/strong> (Sumatra and Java) tend to have negative intercepts &amp;mdash; they grew &lt;strong>less&lt;/strong> than the convergence model would predict based on their initial income alone. Districts in &lt;strong>eastern Indonesia&lt;/strong> (Papua, Maluku, Nusa Tenggara) show positive intercepts, indicating growth that &lt;strong>exceeded&lt;/strong> what initial income would predict. This pattern may reflect the role of resource extraction, infrastructure investment, and fiscal transfers that disproportionately boosted growth in less-developed eastern regions during the 2010&amp;ndash;2018 period.&lt;/p>
&lt;p>&lt;strong>Convergence coefficient map&lt;/strong> &amp;mdash; the slope captures how strongly initial income predicts subsequent growth at each location. Large negative values indicate rapid catching-up; values near zero or positive indicate no convergence or divergence.&lt;/p>
&lt;pre>&lt;code class="language-python">fig, ax = plt.subplots(figsize=(14, 8))
gdf.plot(ax=ax, column=&amp;quot;mgwr_slope&amp;quot;, scheme=&amp;quot;FisherJenks&amp;quot;, k=5,
cmap=&amp;quot;coolwarm&amp;quot;, edgecolor=GRID_LINE, linewidth=0.2, legend=True)
ax.set_title(f&amp;quot;MGWR convergence coefficient (bandwidth = {int(mgwr_bw[1])})&amp;quot;)
ax.set_axis_off()
plt.savefig(&amp;quot;mgwr_mgwr_slope.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;)
plt.show()
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="mgwr_mgwr_slope.png" alt="MGWR convergence coefficient map across Indonesia.">&lt;/p>
&lt;p>The convergence coefficient map is the central finding of this analysis. The global regression reported a single $\beta = -0.195$ (about $-0.46$ once both variables are standardized), but MGWR reveals that this average hides enormous spatial variation. The &lt;strong>strongest catching-up&lt;/strong> (deepest blue, coefficients as negative as $-1.74$) concentrates in &lt;strong>western Sumatra and parts of Kalimantan&lt;/strong>, districts where poorer areas grew much faster than richer neighbors. In contrast, most of &lt;strong>Java and eastern Indonesia, including the Maluku islands&lt;/strong>, show coefficients near zero (light pink), indicating that the convergence relationship is essentially absent in these areas. A handful of districts show weakly positive coefficients (up to 0.42), suggesting localized divergence where richer districts pulled further ahead, although none of these is statistically significant. The local coefficient ranges from $-1.74$ to $+0.42$, with a median of $-0.085$ and a standard deviation of 0.553, far from the single standardized global value of about $-0.46$.&lt;/p>
&lt;h3 id="74-statistical-significance">7.4 Statistical significance&lt;/h3>
&lt;p>Not all local coefficients are statistically distinguishable from zero. MGWR&amp;rsquo;s &lt;code>filter_tvals()&lt;/code> compares each local t-value with a critical value corrected for multiple testing and sets the non-significant ones to zero, which lets us classify each district&amp;rsquo;s convergence coefficient as significantly negative (catching-up), not significant, or significantly positive (diverging).&lt;/p>
&lt;pre>&lt;code class="language-python">mgwr_filtered_t = mgwr_results.filter_tvals()
t_sig = mgwr_filtered_t[:, 1] # Slope t-values
sig_cats = np.where(t_sig &amp;lt; 0, &amp;quot;Negative (catching-up)&amp;quot;,
np.where(t_sig &amp;gt; 0, &amp;quot;Positive (diverging)&amp;quot;, &amp;quot;Not significant&amp;quot;))
print(f&amp;quot;Negative (catching-up): {(sig_cats == 'Negative (catching-up)').sum()}&amp;quot;)
print(f&amp;quot;Not significant: {(sig_cats == 'Not significant').sum()}&amp;quot;)
print(f&amp;quot;Positive (diverging): {(sig_cats == 'Positive (diverging)').sum()}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Negative (catching-up): 149
Not significant: 365
Positive (diverging): 0
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-python">fig, ax = plt.subplots(figsize=(14, 8))
cat_colors = {
    &amp;quot;Negative (catching-up)&amp;quot;: &amp;quot;#2c7bb6&amp;quot;,
    &amp;quot;Not significant&amp;quot;: GRID_LINE,
    &amp;quot;Positive (diverging)&amp;quot;: &amp;quot;#d7191c&amp;quot;,
}
colors_sig = [cat_colors[c] for c in sig_cats]
gdf.plot(ax=ax, color=colors_sig, edgecolor=GRID_LINE, linewidth=0.2)
ax.set_title(&amp;quot;MGWR convergence coefficient: statistical significance&amp;quot;)
ax.set_axis_off()
plt.savefig(&amp;quot;mgwr_mgwr_significance.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;)
plt.show()
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="mgwr_mgwr_significance.png" alt="Significance map showing districts with statistically significant catching-up.">&lt;/p>
&lt;p>Of 514 districts, &lt;strong>149 (29%)&lt;/strong> show statistically significant convergence at the corrected 5% level, concentrated in &lt;strong>Sumatra, western Kalimantan, and Sulawesi&lt;/strong>. The remaining &lt;strong>365 districts (71%)&lt;/strong> have convergence coefficients that are not distinguishable from zero after correcting for multiple comparisons, and &lt;strong>no district&lt;/strong> shows significant divergence. So although the global regression detects convergence on average, that average appears to be driven by a minority of districts, mostly in western Indonesia, while the majority of the archipelago shows no significant relationship between initial income and growth.&lt;/p>
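&lt;p>The &amp;ldquo;corrected 5% level&amp;rdquo; means that each local t-value is compared with a critical value stricter than the familiar 1.96, because hundreds of local tests are run at once. The sketch below illustrates the idea under stated assumptions: a Bonferroni-style correction that divides $\alpha/2$ by the effective number of parameters of the covariate, $\mathrm{ENP}_j$, with $n - 1$ degrees of freedom. We believe this is close to what mgwr&amp;rsquo;s &lt;code>filter_tvals()&lt;/code> does for MGWR, but the library source is the authority. The value &lt;code>enp_slope = 25&lt;/code> is hypothetical; the fitted per-covariate values should be available as &lt;code>mgwr_results.ENP_j&lt;/code>.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np
from scipy import stats


def adjusted_critical_t(alpha, enp_j, n):
    # Two-sided test: split alpha over both tails, then divide by the
    # effective number of parameters of covariate j (Bonferroni-style).
    # The n - 1 degrees of freedom are an assumption about mgwr internals.
    alpha_adj = (alpha / 2.0) / enp_j
    return stats.t.ppf(1 - alpha_adj, n - 1)


def classify_tvals(tvals, critical):
    # Keep a t-value only if |t| clears the critical value; zero it otherwise
    tvals = np.asarray(tvals, dtype=float)
    kept = np.where(np.greater_equal(np.abs(tvals), critical), tvals, 0.0)
    labels = np.array(['Negative (catching-up)', 'Not significant',
                       'Positive (diverging)'])
    # np.sign gives -1, 0 or +1; shifting by 1 turns it into an index 0, 1, 2
    return labels[np.sign(kept).astype(int) + 1]


enp_slope = 25.0  # hypothetical ENP for the slope, not taken from the fitted model
crit = adjusted_critical_t(0.05, enp_slope, n=514)
naive = stats.t.ppf(1 - 0.025, 513)  # unadjusted two-sided 5% critical value
toy_t = np.array([-4.1, -2.5, 0.8, 2.2])
print(f'adjusted critical |t|: {crit:.2f}  (unadjusted: {naive:.2f})')
print(classify_tvals(toy_t, crit))
&lt;/code>&lt;/pre>
&lt;p>With these toy numbers, a local t-value of $-2.5$ clears the unadjusted threshold but not the adjusted one. This is how a district can carry a clearly negative coefficient on the coefficient map and still fall in the &amp;ldquo;Not significant&amp;rdquo; class on the significance map.&lt;/p>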
&lt;h2 id="8-model-comparison">8. Model comparison&lt;/h2>
&lt;p>The table below summarizes how much explanatory power the spatially varying model adds over the global baseline.&lt;/p>
&lt;pre>&lt;code class="language-python">print(f&amp;quot;{'Metric':&amp;lt;25} {'Global OLS':&amp;gt;12} {'MGWR':&amp;gt;12}&amp;quot;)
print(f&amp;quot;{'R²':&amp;lt;25} {0.2135:&amp;gt;12.4f} {0.7625:&amp;gt;12.4f}&amp;quot;)
print(f&amp;quot;{'Adj. R²':&amp;lt;25} {0.2120:&amp;gt;12.4f} {0.7357:&amp;gt;12.4f}&amp;quot;)
print(f&amp;quot;{'AICc':&amp;lt;25} {1341.25:&amp;gt;12.2f} {838.41:&amp;gt;12.2f}&amp;quot;)
print(f&amp;quot;{'Bandwidth (intercept)':&amp;lt;25} {'all (514)':&amp;gt;12} {'44':&amp;gt;12}&amp;quot;)
print(f&amp;quot;{'Bandwidth (slope)':&amp;lt;25} {'all (514)':&amp;gt;12} {'44':&amp;gt;12}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Metric Global OLS MGWR
R² 0.2135 0.7625
Adj. R² 0.2120 0.7357
AICc 1341.25 838.41
Bandwidth (intercept) all (514) 44
Bandwidth (slope) all (514) 44
&lt;/code>&lt;/pre>
&lt;p>MGWR more than triples the explained variance ($R^2$: 0.214 to 0.762) and reduces the AICc from 1341 to 838. Because AICc penalizes model complexity, this drop suggests that the gain in fit outweighs the cost of the extra flexibility rather than merely reflecting it. The bandwidth of 44 for both variables means that each local regression draws on the nearest 44 districts (about 8.6% of the sample), which points to a highly localized convergence process. The adjusted $R^2$ of 0.736, which accounts for the additional complexity (about 52 effective parameters versus 2 in OLS), still shows a large improvement, suggesting that the spatial variation in coefficients reflects genuine heterogeneity rather than overfitting.&lt;/p>
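&lt;p>Two of the figures in this paragraph can be checked by hand. The sketch below first applies the adjusted-$R^2$ formula with the effective number of parameters in place of the usual parameter count, $\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - \mathrm{ENP} - 1}$, and inverts it to recover the ENP implied by the table. That this is the formula behind the reported 0.7357 is an assumption, though it does reproduce that value with ENP = 52. It then builds adaptive bisquare weights, $w_{ij} = \left(1 - (d_{ij}/h_i)^2\right)^2$ inside the bandwidth $h_i$ and zero beyond it, to show what a bandwidth of 44 neighbors means. The bisquare kernel is mgwr&amp;rsquo;s default, but that this tutorial used it is an assumption, and the coordinates are random toy points, not the Indonesian districts.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np


def adjusted_r2(r2, n, enp):
    # Adjusted R-squared with the effective number of parameters (ENP)
    # standing in for the usual parameter count
    return 1 - (1 - r2) * (n - 1) / (n - enp - 1)


def implied_enp(r2, adj_r2, n):
    # Solve the formula above for ENP, given a reported (R2, adjusted R2) pair
    return n - 1 - (1 - r2) * (n - 1) / (1 - adj_r2)


def bisquare_weights(dist, k):
    # Adaptive bisquare kernel (assumed): the bandwidth h is the distance to the
    # k-th nearest observation (the location itself counts, at distance 0),
    # nudged up slightly so that the k-th neighbor keeps a small positive weight.
    h = np.partition(dist, k - 1)[k - 1] * 1.0000001
    # (1 - (d/h)^2)^2 inside the bandwidth; clipping gives exactly 0 beyond it
    return np.clip(1 - (dist / h) ** 2, 0, None) ** 2


n_obs = 514
print(f'adj. R2 with ENP = 52: {adjusted_r2(0.7625, n_obs, 52):.4f}')
print(f'ENP implied by the table: {implied_enp(0.7625, 0.7357, n_obs):.1f}')
print(f'share of the sample in each local fit: {44 / n_obs:.1%}')

# Toy coordinates (random points, not the Indonesian districts)
rng = np.random.default_rng(0)
toy_coords = rng.uniform(size=(n_obs, 2))
dist_0 = np.sqrt(((toy_coords - toy_coords[0]) ** 2).sum(axis=1))
weights_0 = bisquare_weights(dist_0, k=44)
print('observations with positive weight:', np.count_nonzero(weights_0))
&lt;/code>&lt;/pre>
&lt;p>The weight falls smoothly from 1 at the district itself to almost 0 at its 44th nearest neighbor, so &amp;ldquo;the nearest 44 districts&amp;rdquo; is best read as a distance-decayed window rather than 44 equally weighted observations.&lt;/p>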
&lt;h2 id="9-discussion">9. Discussion&lt;/h2>
&lt;p>&lt;strong>Economic catching-up in Indonesia is not uniform: it is concentrated in western Sumatra and parts of Kalimantan, while most of the archipelago shows no significant convergence.&lt;/strong> The global regression&amp;rsquo;s $\beta = -0.195$ suggests a moderate convergence tendency, but MGWR suggests that this single estimate is driven largely by a subset of 149 districts (29%) with strong catching-up dynamics. The remaining 365 districts have convergence coefficients indistinguishable from zero.&lt;/p>
&lt;p>The intercept map adds another dimension: eastern Indonesian districts tend to have positive intercepts (above-expected growth), while western districts have negative intercepts (below-expected growth). Because the variables are standardized, a local intercept can be read as the predicted growth of a district with average initial income, relative to the national mean. This east&amp;ndash;west gradient may reflect fiscal transfers, resource booms, and infrastructure programs that targeted less-developed regions during the 2010&amp;ndash;2018 period, although this bivariate model cannot test that explanation. Combined with the convergence coefficient map, the picture is nuanced: eastern Indonesia grew faster than expected (high intercept), but not through convergence dynamics (near-zero slope); its faster growth instead reflects other factors captured by the intercept.&lt;/p>
&lt;p>For policy, these findings challenge the assumption that national-level convergence statistics reflect what is happening locally. A policymaker looking at $\beta = -0.195$ might conclude that Indonesia&amp;rsquo;s development strategy is successfully closing regional gaps. MGWR reveals that catching-up is geographically selective and that the majority of districts show no statistically significant convergence. Spatially targeted interventions, rather than uniform national programs, may be needed to address this uneven landscape.&lt;/p>
&lt;h2 id="10-summary-and-next-steps">10. Summary and next steps&lt;/h2>
&lt;p>&lt;strong>Key takeaways:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Method insight:&lt;/strong> MGWR reveals spatial heterogeneity invisible to global regression. R² improves from 0.214 to 0.762 by allowing location-specific coefficients. Both variables operate at a bandwidth of 44 districts (~8.6% of the sample), indicating highly localized economic dynamics. Variable standardization is essential before MGWR estimation.&lt;/li>
&lt;li>&lt;strong>Data insight:&lt;/strong> Only 149 of 514 Indonesian districts (29%) show statistically significant convergence, concentrated in Sumatra and Kalimantan. The convergence coefficient ranges from $-1.74$ to $+0.42$, a wide spread around the single global estimate of $-0.195$. Eastern Indonesia grows faster than expected (positive intercepts) but not through convergence: its local convergence coefficients are mostly near zero.&lt;/li>
&lt;li>&lt;strong>Limitation:&lt;/strong> The bivariate model (one independent variable) is intentionally simple for pedagogical purposes. Real convergence analysis would include controls for human capital, infrastructure, institutional quality, and sectoral composition. The bandwidth of 44 applies to both variables in this case, but with additional covariates, MGWR&amp;rsquo;s ability to assign different bandwidths per variable would be more visible.&lt;/li>
&lt;li>&lt;strong>Next step:&lt;/strong> Extend the model with additional covariates (education, investment, fiscal transfers) to disentangle the sources of spatial heterogeneity. Apply MGWR to panel data with multiple time periods. Compare MGWR results with the spatial clusters identified in the &lt;a href="https://carlos-mendez.org/post/python_esda2/">ESDA tutorial&lt;/a> to see whether convergence hotspots align with LISA clusters.&lt;/li>
&lt;/ul>
&lt;h2 id="11-exercises">11. Exercises&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Add a second variable.&lt;/strong> Include an education indicator (e.g., years of schooling) as a second independent variable and re-run MGWR. Do the two covariates receive different bandwidths? What does that tell you about the spatial scale at which education affects growth?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Map the t-values.&lt;/strong> Instead of mapping the raw coefficients, map the local t-statistics from &lt;code>mgwr_results.tvalues[:, 1]&lt;/code>. How does this map compare to the significance map, which uses t-values filtered against multiple-testing-corrected critical values?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Compare with ESDA.&lt;/strong> Run a Moran&amp;rsquo;s I test on the MGWR residuals. Is there remaining spatial autocorrelation? If not, MGWR has successfully captured the spatial structure. If yes, what might be missing?&lt;/p>
&lt;/li>
&lt;/ol>
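&lt;p>For the third exercise, the usual PySAL route is &lt;code>esda.Moran&lt;/code>, applied to the MGWR residuals together with a spatial weights object built from the district polygons. To see what that statistic measures, here is a from-scratch sketch of global Moran&amp;rsquo;s I, $I = \frac{n}{S_0}\,\frac{z^\top W z}{z^\top z}$, where $z$ holds deviations from the mean and $S_0$ is the sum of all weights. The ten-location chain and its binary weights are toy inputs chosen to show the two extremes; for the districts you would pass the residuals (in mgwr they should be available as &lt;code>mgwr_results.resid_response&lt;/code>) and a contiguity or k-nearest-neighbor weights matrix.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np


def morans_i(values, w):
    # Global Moran's I = (n / S0) * (z' W z) / (z' z), z = deviations from the mean
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    w = np.asarray(w, dtype=float)
    s0 = w.sum()  # S0: sum of all spatial weights
    return (len(z) / s0) * (z @ w @ z) / (z @ z)


def chain_weights(n):
    # Toy binary contiguity on a 1-D chain: location i neighbors i - 1 and i + 1
    w = np.zeros((n, n))
    idx = np.arange(n - 1)
    w[idx, idx + 1] = 1.0
    w[idx + 1, idx] = 1.0
    return w


w_chain = chain_weights(10)
smooth = np.arange(10.0)                 # neighbors hold similar values
alternating = np.array([1.0, -1.0] * 5)  # neighbors hold opposite values
print(f'smooth:      {morans_i(smooth, w_chain):.3f}')
print(f'alternating: {morans_i(alternating, w_chain):.3f}')
&lt;/code>&lt;/pre>
&lt;p>The smooth sequence gives a positive value (about 0.78) and the alternating one gives $-1$. On the MGWR residuals, a value close to zero would suggest that little spatial structure remains; &lt;code>esda.Moran&lt;/code> adds a permutation-based p-value (&lt;code>p_sim&lt;/code>) for judging whether the observed value differs from what random arrangements produce.&lt;/p>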
&lt;h2 id="12-references">12. References&lt;/h2>
&lt;ol>
&lt;li>&lt;a href="https://doi.org/10.1080/24694452.2017.1352480" target="_blank" rel="noopener">Fotheringham, A. S., Yang, W., and Kang, W. (2017). Multiscale Geographically Weighted Regression (MGWR). &lt;em>Annals of the American Association of Geographers&lt;/em>, 107(6), 1247&amp;ndash;1265.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.21105/joss.01750" target="_blank" rel="noopener">Oshan, T. M., Li, Z., Kang, W., Wolf, L. J., and Fotheringham, A. S. (2019). mgwr: A Python Implementation of Multiscale Geographically Weighted Regression. &lt;em>JOSS&lt;/em>, 4(42), 1750.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1111/j.1538-4632.1996.tb00936.x" target="_blank" rel="noopener">Brunsdon, C., Fotheringham, A. S., and Charlton, M. E. (1996). Geographically Weighted Regression: A Method for Exploring Spatial Nonstationarity. &lt;em>Geographical Analysis&lt;/em>, 28(4), 281&amp;ndash;298.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.wiley.com/en-us/Geographically&amp;#43;Weighted&amp;#43;Regression-p-9780471496168" target="_blank" rel="noopener">Fotheringham, A. S., Brunsdon, C., and Charlton, M. (2002). &lt;em>Geographically Weighted Regression: The Analysis of Spatially Varying Relationships&lt;/em>. Wiley.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://carlos-mendez.org/publication/20241219-ae/" target="_blank" rel="noopener">Mendez, C. and Jiang, Q. (2024). Spatial Heterogeneity Modeling for Regional Economic Analysis: A Computational Approach Using Python and Cloud Computing. Working Paper, Nagoya University.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://mgwr.readthedocs.io/" target="_blank" rel="noopener">mgwr documentation&lt;/a>&lt;/li>
&lt;/ol>
&lt;h4 id="acknowledgements">Acknowledgements&lt;/h4>
&lt;p>AI tools (Claude Code, Gemini, NotebookLM) were used to make the contents of this post more accessible to students. Nevertheless, the content in this post may still have errors. Caution is needed when applying the contents of this post to true research projects.&lt;/p></description></item><item><title>Studying spatial heterogeneity</title><link>https://carlos-mendez.org/post/python_gwr_mgwr/</link><pubDate>Sat, 23 Dec 2023 00:00:00 +0000</pubDate><guid>https://carlos-mendez.org/post/python_gwr_mgwr/</guid><description>&lt;h1 id="a-geocomputational-notebook-to-compute-gwr-and-mgwr">&lt;strong>A geocomputational notebook to compute GWR and MGWR&lt;/strong>&lt;/h1>
&lt;p>.&lt;/p></description></item></channel></rss>