Interactive data dictionary

MGWFER: Causal Spatially Varying Coefficients via Panel Fixed Effects

A fully synthetic 15×15 spatial panel (225 units × 3 periods) with known coefficient surfaces and a time-invariant spatial confounder.

1 dataset · 14 variables · 225 spatial units · 3 periods, panel (675 obs)

Downloads

Each dataset is available as a labeled Stata .dta and its source file.

Download all data (ZIP) · stata_codebook.do

| Dataset | Grain | Rows × columns | Stata | Source |
| --- | --- | --- | --- | --- |
| simulated_panel_data | unit-period (spatial panel) | 675 × 14 | simulated_panel_data.dta | simulated_panel_data.csv |

Run stata_codebook.do in Stata once to attach long-form per-variable notes to the .dta files.

Load directly in code

Every file loads straight from GitHub (raw URLs). Swap the file name to load any dataset.

Stata

* Stata 14+ : `use` reads an https URL directly
global BASE "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/python_mgwrfer/data/"
use "${BASE}simulated_panel_data.dta", clear
describe
notes

Python

!pip install -q pyreadstat
import pandas as pd
BASE = "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/python_mgwrfer/data/"
df = pd.read_stata(BASE + "simulated_panel_data.dta")

# load every dataset at once
files = ["simulated_panel_data"]
data = {f: pd.read_stata(BASE + f + ".dta") for f in files}

# pyreadstat (richest metadata) reads LOCAL files -> download first
import pyreadstat, urllib.request
urllib.request.urlretrieve(BASE + "simulated_panel_data.dta", "simulated_panel_data.dta")
df, meta = pyreadstat.read_dta("simulated_panel_data.dta")

Copy and paste this snippet into a Google Colab notebook: https://colab.research.google.com/notebooks/empty.ipynb

R

# R : haven::read_dta auto-downloads an https URL
library(haven)
BASE <- "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/python_mgwrfer/data/"
df <- read_dta(paste0(BASE, "simulated_panel_data.dta"))

Overview & sources

Companion data for a Python tutorial faithful to Li & Fotheringham (2026), which introduces Multiscale Geographically Weighted Fixed Effects Regression (MGWFER), a local panel framework that removes time-invariant spatial confounders from Multiscale GWR. The dataset is a fully synthetic spatial panel generated verbatim from the paper’s data-generating process (Eqs. 39–45) on a 15×15 grid of 225 spatial units observed over 3 time periods (675 observations). Each covariate is coupled to a time-invariant spatial context sc_i (Cor(x_k, sc) ≈ 0.84), so the indirect contextual effect channel is active: Cor(x_4, y) ≈ 0.84 even though β_4 ≡ 0. Because the true coefficient surfaces (beta1_true to beta4_true) and the confounder (alpha_true) are carried as columns, the panel provides a ground truth against which OLS, pooled OLS, fixed effects, cross-sectional MGWR, pooled MGWR, and MGWFER can be benchmarked. The entire data-generating process is open and reproducible.
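The quoted Cor(x_k, sc) ≈ 0.84 can be checked by hand from the covariate equation x_k = 0.05·sc_i + noise and the spatial-context formula sc_i = 30·(exp(j/15) − 1) given in the construction notes. The sketch below is a back-of-the-envelope calculation, not the post's script. It assumes the 0.5 in N(0, 0.5) is a standard deviation rather than a variance; that reading is an assumption, chosen because it matches the covariate SDs of roughly 0.9 reported in the statistics table.

```python
import numpy as np

# Spatial context per column index j = 1..15 (formula from the construction notes).
j = np.arange(1, 16)
sc = 30.0 * (np.exp(j / 15.0) - 1.0)

loading = 0.05   # coupling of each covariate to sc (from the x_k equations)
noise_sd = 0.5   # ASSUMPTION: 0.5 is a standard deviation, not a variance

# Variance of sc over the grid (every column index appears equally often).
var_sc = sc.var()

# Var(x) = loading^2 * Var(sc) + noise_sd^2 and Cov(x, sc) = loading * Var(sc)
var_x = loading**2 * var_sc + noise_sd**2
cor_x_sc = loading * var_sc / np.sqrt(var_x * var_sc)

print(f"sd(sc) = {np.sqrt(var_sc):.2f}")
print(f"sd(x)  = {np.sqrt(var_x):.3f}")
print(f"Cor(x, sc) = {cor_x_sc:.3f}")
```

Under that assumption the implied correlation lands close to the 0.84 quoted above, and the implied covariate SD is close to the values in the statistics table.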

One file. simulated_panel_data is a balanced spatial panel: one row per spatial unit × time period (225 units × 3 periods = 675 rows). Spatial position is fixed by integer grid coordinates (coord_i, coord_j) on a 15×15 lattice; time_id indexes the three periods. Alongside the observed outcome y and four covariates x1 to x4, the file carries the known truth columns (the fixed effect alpha_true and the four true local slopes beta1_true to beta4_true), so every estimator can be scored against ground truth.
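A quick structural check after loading is to confirm that the panel key is unique and that every unit appears in every period. The helper below is a minimal sketch; the function name check_balanced_panel is hypothetical, and the toy frame only mimics the key layout of the file rather than reading it.

```python
import pandas as pd

def check_balanced_panel(df, unit="unit_id", time="time_id"):
    """Return (n_units, n_periods) if the panel is balanced, else raise ValueError."""
    # The panel key must identify each row uniquely.
    if df.duplicated([unit, time]).any():
        raise ValueError("duplicate unit-period rows")
    n_units = df[unit].nunique()
    n_periods = df[time].nunique()
    # With a unique key, balance means rows = units x periods.
    if len(df) != n_units * n_periods:
        raise ValueError("panel is unbalanced")
    return n_units, n_periods

# Toy frame with the same key layout as the file (2 units x 3 periods).
toy = pd.DataFrame({"unit_id": [0, 0, 0, 1, 1, 1],
                    "time_id": [0, 1, 2, 0, 1, 2]})
print(check_balanced_panel(toy))
```

Applied to the loaded simulated_panel_data frame, the same call should report 225 units and 3 periods if the file matches the description above.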

Data sources

| Source | Provides | Reference / URL |
| --- | --- | --- |
| Li & Fotheringham (2026) | Replicated study; the verbatim data-generating process (Eqs. 39–45) and the MGWFER algorithm | Li, Z., & Fotheringham, A. S. (2026). Spatial Context as a Time-Invariant Confounder: A Fixed-Effects Extension of MGWR. Annals of the American Association of Geographers. https://doi.org/10.1080/24694452.2026.2654481 |
| Synthetic (this study) | All values: simulated from the paper's DGP with a fixed random seed (open & reproducible) | Mendez, C. (2026). See the post's Python script script.py for the full DGP (NumPy default_rng, seed 42). |
| Method references | Estimators and software | Fotheringham, Yang & Kang (2017, Multiscale GWR); Oshan et al. (2019, the mgwr package); GeoZhipengLi/MGWPR (panel-enabled mgwr fork); Wooldridge (2010, omitted-variable-bias derivation). |

Cite this data

Please cite this dataset as follows.

APA

Mendez, C. (2026). MGWFER: Causal Spatially Varying Coefficients via Panel Fixed Effects [Data set]. https://carlos-mendez.org/post/python_mgwrfer/

Li, Z., & Fotheringham, A. S. (2026). Spatial Context as a Time-Invariant Confounder: A Fixed-Effects Extension of MGWR. Annals of the American Association of Geographers. https://doi.org/10.1080/24694452.2026.2654481

BibTeX

@misc{mendez2026pythonmgwrfer,
  author       = {Mendez, Carlos},
  title        = {MGWFER: Causal Spatially Varying Coefficients via Panel Fixed Effects},
  year         = {2026},
  howpublished = {\url{https://carlos-mendez.org/post/python_mgwrfer/}},
  note         = {Data set}
}

@article{li2026spatial,
  author  = {Li, Zhipeng and Fotheringham, A. Stewart},
  title   = {Spatial Context as a Time-Invariant Confounder: A Fixed-Effects Extension of {MGWR}},
  journal = {Annals of the American Association of Geographers},
  year    = {2026},
  doi     = {10.1080/24694452.2026.2654481}
}

Variable explorer: all 14 variables


| Variable | Type | Distribution | Label | Definition | Units | In files | Source |
| --- | --- | --- | --- | --- | --- | --- | --- |
| alpha_true | continuous | min 2.07, median 21.1, max 51.5 | True spatial context / fixed effect (sc_i) | Known time-invariant confounder; the intrinsic contextual effect MGWFER recovers. Truth column, not an observable predictor. | synthetic units | simulated_panel_data | Simulation (ground truth) |
| beta1_true | continuous | min 1.05, median 1.46, max 2 | True local slope beta1 (quadratic dome) | Known spatially varying coefficient on x1; ground truth for scoring. Quadratic dome peaking at the grid centre. | coefficient | simulated_panel_data | Simulation (ground truth) |
| beta2_true | continuous | min 1.07, median 1.53, max 2 | True local slope beta2 (linear gradient) | Known spatially varying coefficient on x2; ground truth for scoring. Linear gradient in i+j. | coefficient | simulated_panel_data | Simulation (ground truth) |
| beta3_true | continuous |  | True local slope beta3 (constant 1.5) | Known spatially homogeneous coefficient on x3; ground truth for scoring. | coefficient | simulated_panel_data | Simulation (ground truth) |
| beta4_true | continuous |  | True local slope beta4 (null = 0) | Known null coefficient on x4; ground truth for false-positive testing. | coefficient | simulated_panel_data | Simulation (ground truth) |
| coord_i | identifier |  | Grid row coordinate (i) | Row position of the unit on the 15x15 lattice; spatial coordinate for kernel weighting. | grid units (1-15) | simulated_panel_data | Simulation |
| coord_j | identifier |  | Grid column coordinate (j) | Column position on the 15x15 lattice; drives the exponential spatial-context gradient. | grid units (1-15) | simulated_panel_data | Simulation |
| time_id | identifier |  | Time period index | Period index within the panel (3 periods per unit). Not a calendar year. | integer (0-2) | simulated_panel_data | Simulation |
| unit_id | identifier |  | Spatial unit ID | Identifier of the spatial unit (one of 225 grid cells); repeats across the unit's 3 time periods. | integer ID (0-224) | simulated_panel_data | Simulation |
| x1 | continuous | min -1.02, median 1.08, max 3.45 | Covariate x1 (effect = quadratic dome) | Causally-active covariate; its true local slope beta1 is a quadratic dome peaking at the grid centre. | synthetic units | simulated_panel_data | Simulation |
| x2 | continuous | min -1.61, median 1.09, max 3.98 | Covariate x2 (effect = linear gradient) | Causally-active covariate; its true local slope beta2 is a linear gradient increasing with i+j. | synthetic units | simulated_panel_data | Simulation |
| x3 | continuous | min -1.03, median 1.05, max 3.77 | Covariate x3 (effect = constant 1.5) | Causally-active covariate; its true local slope beta3 is constant at 1.5 everywhere. | synthetic units | simulated_panel_data | Simulation |
| x4 | continuous | min -1.2, median 1.12, max 3.7 | Covariate x4 (null effect; spurious link to y) | Covariate with NO causal effect on y (beta4 = 0); shares parent sc with y, so Cor(x4, y) ~ 0.84. | synthetic units | simulated_panel_data | Simulation |
| y | continuous | min -0.577, median 26.2, max 66.2 | Outcome variable | Simulated response: spatial context plus three causally-active covariates plus noise. | synthetic units | simulated_panel_data | Simulation |

Cross-file variable index

Which file each variable appears in (● = present).

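| Variable | simulated_panel_data |
| --- | --- |
| alpha_true | ● |
| beta1_true | ● |
| beta2_true | ● |
| beta3_true | ● |
| beta4_true | ● |
| coord_i | ● |
| coord_j | ● |
| time_id | ● |
| unit_id | ● |
| x1 | ● |
| x2 | ● |
| x3 | ● |
| x4 | ● |
| y | ● |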

Construction & formulas

The data are generated from a two-part data-generating process on a 15×15 grid of N = 225 units indexed by integer coordinates (i, j), each observed over T = 3 periods (paper Eqs. 39–45). The columns alpha_true and beta1_true to beta4_true are the known truth the estimators are scored against.

The headline correction: pooled estimators recover β_k + δ_k (true slope plus the indirect contextual effect δ_k); the within-transformation ỹ_it = y_it − ȳ_i removes the time-invariant sc_i exactly, neutralising δ_k and restoring identification of the local slopes.
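The mechanism can be illustrated with global estimators on a panel simulated from the formulas in this dictionary. The sketch below is not the post's script.py and not MGWR or MGWFER: it compares plain pooled OLS with a plain within (fixed-effects) regression, only to show that demeaning by unit removes sc_i. The seed and draw order are illustrative, so the numbers will not match the distributed file, and reading N(0, 0.5) as a standard deviation of 0.5 is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)  # illustrative seed; draws differ from the file
n, T = 15, 3

# Grid and truth surfaces (formulas from the construction notes).
i = np.repeat(np.arange(1, n + 1), n).astype(float)
j = np.tile(np.arange(1, n + 1), n).astype(float)
sc = 30.0 * (np.exp(j / n) - 1.0)
q = np.ceil(n / 4)
b1 = 1 + (q**2 - (q - i / 2) ** 2) * (q**2 - (q - j / 2) ** 2) / q**4
b2 = 1 + (i + j) / (2 * n)
betas = np.column_stack([b1, b2, np.full(n * n, 1.5), np.zeros(n * n)])

# Unit-major panel: each unit's T rows are contiguous.
unit = np.repeat(np.arange(n * n), T)
noise_sd = 0.5  # ASSUMPTION: 0.5 is a standard deviation
X = 0.05 * sc[unit, None] + rng.normal(0.0, noise_sd, size=(n * n * T, 4))
y = sc[unit] + (betas[unit] * X).sum(axis=1) + rng.normal(0.0, noise_sd, size=n * n * T)

def ols(X, y):
    """Least-squares slopes with an intercept (intercept dropped from the result)."""
    A = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(A, y, rcond=None)[0][1:]

def demean_by_unit(a):
    """Within transformation: subtract each unit's time mean."""
    shaped = a.reshape(n * n, T, -1)
    return (shaped - shaped.mean(axis=1, keepdims=True)).reshape(a.shape)

pooled = ols(X, y)
within = ols(demean_by_unit(X), demean_by_unit(y))
print("pooled slopes (x1..x4):", pooled.round(2))
print("within slopes (x1..x4):", within.round(2))
```

In this setup the pooled slope on x4 is strongly positive even though its true coefficient is zero, while the within slope on x4 sits near zero. The within slopes on x1 and x2 are global averages of spatially varying surfaces, which is the gap the local estimators are meant to close.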

The datasets

Each dataset is documented with its full variable dictionary and a statistics table with data coverage.


unit-period (spatial panel)  675 × 14 · 3 periods (time_id 0-2; not calendar years) · 225 spatial units on a 15x15 grid; 675 observations

Panel key: unit_id x time_id · Benchmark OLS / pooled OLS / FE / cross-sectional MGWR / PMGWR / MGWFER against known truth.
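Benchmarking against the truth columns comes down to comparing an estimated coefficient surface with the matching beta*_true column. The helper below is one possible scoring sketch; the name score_surface and the choice of bias, RMSE, and correlation as metrics are illustrative assumptions, not the metrics used in the post or the paper.

```python
import numpy as np

def score_surface(estimate, truth):
    """Bias, RMSE and (where defined) correlation of an estimate against truth."""
    estimate = np.asarray(estimate, dtype=float)
    truth = np.asarray(truth, dtype=float)
    err = estimate - truth
    out = {"bias": float(err.mean()),
           "rmse": float(np.sqrt((err**2).mean()))}
    # Correlation is undefined when either surface is constant
    # (for example beta3_true and beta4_true).
    if truth.std() > 0 and estimate.std() > 0:
        out["corr"] = float(np.corrcoef(estimate, truth)[0, 1])
    return out

# Toy example: a hypothetical estimate that overshoots the truth by 0.5 everywhere.
truth = np.array([1.0, 1.5, 2.0])
scores = score_surface(truth + 0.5, truth)
print(scores)
```

Because the truth columns repeat across each unit's 3 periods, it is usually cleaner to keep one row per unit_id before scoring, so that each grid cell is counted once.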

Variable dictionary

| Variable | Type | Label | Definition | Construction | Units | Source | Coverage |
| --- | --- | --- | --- | --- | --- | --- | --- |
| unit_id | identifier | Spatial unit ID | Identifier of the spatial unit (one of 225 grid cells); repeats across the unit's 3 time periods. | 0..224, in row-major order over the 15x15 grid (np.repeat(arange(225), 3)). | integer ID (0-224) | Simulation | all rows |
| time_id | identifier | Time period index | Period index within the panel (3 periods per unit). Not a calendar year. | np.tile(arange(3), 225); values 0, 1, 2. | integer (0-2) | Simulation | all rows |
| coord_i | identifier | Grid row coordinate (i) | Row position of the unit on the 15x15 lattice; spatial coordinate for kernel weighting. | Row index 1..15, np.repeat(arange(1,16), 15) then replicated across time. | grid units (1-15) | Simulation | all rows |
| coord_j | identifier | Grid column coordinate (j) | Column position on the 15x15 lattice; drives the exponential spatial-context gradient. | Column index 1..15, np.tile(arange(1,16), 15) then replicated across time. | grid units (1-15) | Simulation | all rows |
| y | continuous | Outcome variable | Simulated response: spatial context plus three causally-active covariates plus noise. | y = sc_i + beta1*x1 + beta2*x2 + beta3*x3 + epsilon; epsilon ~ N(0, 0.5). x4 is excluded (paper Eqs. 44-45). | synthetic units | Simulation | all rows |
| x1 | continuous | Covariate x1 (effect = quadratic dome) | Causally-active covariate; its true local slope beta1 is a quadratic dome peaking at the grid centre. | x1 = 0.05*sc_i + N(0, 0.5) (indirect contextual channel, paper Eq. 40). | synthetic units | Simulation | all rows |
| x2 | continuous | Covariate x2 (effect = linear gradient) | Causally-active covariate; its true local slope beta2 is a linear gradient increasing with i+j. | x2 = 0.05*sc_i + N(0, 0.5) (paper Eq. 41). | synthetic units | Simulation | all rows |
| x3 | continuous | Covariate x3 (effect = constant 1.5) | Causally-active covariate; its true local slope beta3 is constant at 1.5 everywhere. | x3 = 0.05*sc_i + N(0, 0.5) (paper Eq. 42). | synthetic units | Simulation | all rows |
| x4 | continuous | Covariate x4 (null effect; spurious link to y) | Covariate with NO causal effect on y (beta4 = 0); shares parent sc with y, so Cor(x4, y) ~ 0.84. | x4 = 0.05*sc_i + N(0, 0.5) (paper Eq. 43); omitted from the y equation. | synthetic units | Simulation | all rows |
| alpha_true | continuous | True spatial context / fixed effect (sc_i) | Known time-invariant confounder; the intrinsic contextual effect MGWFER recovers. Truth column, not an observable predictor. | sc_i = 30*(exp(j/15) - 1); exponential in column index j (range 2.07-51.55). Constant across the unit's 3 periods. | synthetic units | Simulation (ground truth) | all rows |
| beta1_true | continuous | True local slope beta1 (quadratic dome) | Known spatially varying coefficient on x1; ground truth for scoring. Quadratic dome peaking at the grid centre. | 1 + (q^2-(q-i/2)^2)*(q^2-(q-j/2)^2)/q^4, q=ceil(15/4) (range 1.05-2.00). Constant across periods. | coefficient | Simulation (ground truth) | all rows |
| beta2_true | continuous | True local slope beta2 (linear gradient) | Known spatially varying coefficient on x2; ground truth for scoring. Linear gradient in i+j. | 1 + (i+j)/(2*15) (range 1.07-2.00). Constant across periods. | coefficient | Simulation (ground truth) | all rows |
| beta3_true | continuous | True local slope beta3 (constant 1.5) | Known spatially homogeneous coefficient on x3; ground truth for scoring. | 1.5 everywhere (np.full(225, 1.5)). | coefficient | Simulation (ground truth) | all rows |
| beta4_true | continuous | True local slope beta4 (null = 0) | Known null coefficient on x4; ground truth for false-positive testing. | 0 everywhere (np.zeros(225)). | coefficient | Simulation (ground truth) | all rows |
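The identifier and truth columns are deterministic, so they can be rebuilt directly from the construction formulas in the table and compared with the reported ranges and distinct counts. This is a minimal sketch that follows the formulas as written in this dictionary; it is not the post's script.py, and the variable names are chosen here for illustration.

```python
import numpy as np

n, T = 15, 3

# Identifiers: row-major grid, then a unit-major panel.
coord_i = np.repeat(np.arange(1, n + 1), n)
coord_j = np.tile(np.arange(1, n + 1), n)
unit_id = np.repeat(np.arange(n * n), T)
time_id = np.tile(np.arange(T), n * n)

# Truth surfaces, one value per grid cell.
q = np.ceil(n / 4)  # q = 4 for n = 15
alpha = 30 * (np.exp(coord_j / n) - 1)
beta1 = 1 + (q**2 - (q - coord_i / 2) ** 2) * (q**2 - (q - coord_j / 2) ** 2) / q**4
beta2 = 1 + (coord_i + coord_j) / (2 * n)

for name, v in [("alpha", alpha), ("beta1", beta1), ("beta2", beta2)]:
    print(f"{name}: min={v.min():.2f} max={v.max():.2f} distinct={np.unique(v).size}")

# Replicate a per-unit surface across the 3 periods to match the file layout.
alpha_panel = alpha[unit_id]
```

The printed minima, maxima, and distinct counts can be read against the statistics table for alpha_true, beta1_true, and beta2_true.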

Distribution & statistics

| Variable | Coverage | N | Distinct | Min | Mean | Median | Max | SD |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| unit_id | 100% | 675 | 225 |  |  |  |  |  |
| time_id | 100% | 675 | 3 |  |  |  |  |  |
| coord_i | 100% | 675 | 15 |  |  |  |  |  |
| coord_j | 100% | 675 | 15 |  |  |  |  |  |
| y | 100% | 675 | 675 | -0.577 | 28.55 | 26.20 | 66.16 | 18.78 |
| x1 | 100% | 675 | 675 | -1.02 | 1.15 | 1.08 | 3.45 | 0.904 |
| x2 | 100% | 675 | 675 | -1.61 | 1.16 | 1.09 | 3.98 | 0.929 |
| x3 | 100% | 675 | 675 | -1.03 | 1.11 | 1.05 | 3.77 | 0.909 |
| x4 | 100% | 675 | 675 | -1.20 | 1.18 | 1.12 | 3.70 | 0.935 |
| alpha_true | 100% | 675 | 15 | 2.07 | 23.29 | 21.14 | 51.55 | 15.23 |
| beta1_true | 100% | 675 | 36 | 1.05 | 1.50 | 1.46 | 2.00 | 0.268 |
| beta2_true | 100% | 675 | 29 | 1.07 | 1.53 | 1.53 | 2.00 | 0.204 |
| beta3_true | 100% | 675 | 1 | 1.50 | 1.50 | 1.50 | 1.50 | 0 |
| beta4_true | 100% | 675 | 1 | 0 | 0 | 0 | 0 | 0 |

Known limitations & caveats