Downloads
Each dataset is available as a labeled Stata .dta file and as its source file.
Download all data (ZIP) · stata_codebook.do
| Dataset | Grain | Rows | Stata | Source |
|---|---|---|---|---|
| v113i06 | bank-quarter | 12,600 × 11 | v113i06.dta | v113i06.dta |
Run stata_codebook.do in Stata once to attach long-form per-variable notes to the .dta files.
Load directly in code
Every file loads straight from GitHub (raw URLs). Swap the file name to load any dataset.
Stata
* Stata 14+ : `use` reads an https URL directly
global BASE "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/stata_spxtivdfreg/data/"
use "${BASE}v113i06.dta", clear
describe
notes
Python
# Notebook (Colab/Jupyter) cell syntax; in a terminal run: pip install pyreadstat
!pip install -q pyreadstat
import pandas as pd
BASE = "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/stata_spxtivdfreg/data/"
df = pd.read_stata(BASE + "v113i06.dta")
# load every dataset at once
files = ["v113i06"]
data = {f: pd.read_stata(BASE + f + ".dta") for f in files}
# pyreadstat (richest metadata) reads LOCAL files -> download first
import pyreadstat, urllib.request
urllib.request.urlretrieve(BASE + "v113i06.dta", "v113i06.dta")
df, meta = pyreadstat.read_dta("v113i06.dta")
Copy and paste this snippet into Google Colab: https://colab.research.google.com/notebooks/empty.ipynb
R
# R : haven::read_dta auto-downloads an https URL
library(haven)
BASE <- "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/stata_spxtivdfreg/data/"
df <- read_dta(paste0(BASE, "v113i06.dta"))
Overview & sources
Companion data for a Stata tutorial that replicates the empirical application of Kripfganz & Sarafidis (2025, Journal of Statistical Software 113(6)) with the spxtivdfreg package — a defactored instrumental-variables estimator for spatial dynamic panels with unobserved common factors. The panel models the non-performing-loan (NPL) ratio of 350 US commercial banks observed quarterly over 2006:Q1–2014:Q4 (36 quarters; 12,600 observations, 12,250 in the effective estimation sample), spanning the entire Global Financial Crisis. Credit risk is modelled through four simultaneous endogeneity channels — a spatial lag of NPL (ψ), temporal persistence (ρ), an endogenous regressor (INEFF, instrumented by INTEREST), and latent common factors — using a 350×350 economic-distance spatial weight matrix built from Spearman rank correlations of bank debt ratios. This page documents the bank-quarter panel (v113i06.dta); the weight matrix W.csv is a separate file (see note below).
v113i06 is a strongly balanced bank-quarter panel with one row per bank × quarter: 350 banks × 36 quarters = 12,600 rows. ID is the bank identifier and TIME the quarterly index (1–36, mapping 2006:Q1–2014:Q4); xtset ID TIME declares the panel. The companion spatial weight matrix W.csv is a 350×350, row-standardized, economic-distance matrix (6,300 nonzero entries, ~18 neighbours per bank). It is a bare numeric matrix with no variable columns, so it is not documented as a variable table here; spxtivdfreg loads it via spmatrix("W.csv", import).
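The TIME-to-calendar mapping and the balance claim can be checked with a few lines of Python. This is a minimal sketch: `time_to_quarter` and `is_balanced` are hypothetical helper names, and the small frame is synthetic, standing in for the real panel.

```python
import pandas as pd

def time_to_quarter(t: int, start: str = "2006Q1") -> str:
    """Map the 1-based TIME index to a calendar quarter label."""
    return str(pd.Period(start, freq="Q") + (t - 1))

def is_balanced(df: pd.DataFrame, unit: str = "ID", time: str = "TIME") -> bool:
    """True when every unit is observed exactly once in every period."""
    n_units = df[unit].nunique()
    n_periods = df[time].nunique()
    no_duplicates = not df.duplicated([unit, time]).any()
    return no_duplicates and len(df) == n_units * n_periods

# Synthetic stand-in for the real panel: 3 banks x 4 quarters
toy = pd.DataFrame(
    [(i, t) for i in range(1, 4) for t in range(1, 5)],
    columns=["ID", "TIME"],
)

print(time_to_quarter(1), time_to_quarter(36))  # first and last quarter of the panel
print(is_balanced(toy))
```

On the real data, the same check would be `is_balanced(df)` after loading v113i06.dta, and it should agree with the 350 × 36 = 12,600 row count stated above.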
Data sources
| Source | Provides | Reference / URL |
|---|---|---|
| Kripfganz & Sarafidis (2025) | Replicated study; the v113i06.dta bank-quarter panel and the W.csv weight matrix (JSS replication package) | Kripfganz, S., & Sarafidis, V. (2025). Estimating spatial dynamic panel data models with unobserved common factors in Stata. Journal of Statistical Software, 113(6). https://doi.org/10.18637/jss.v113.i06 |
| Method references | Estimator and concepts (defactored IV, common factors, IV panels) | Kripfganz & Sarafidis (2021), Stata Journal 21(3); Sarafidis & Wansbeek (2012), Econometric Reviews 31(5); Pesaran (2006), Econometrica 74(4). |
| Spatial-panel context | Comparator packages and spatial-panel framework | Belotti, Hughes & Mortari (2017), Stata Journal 17(1); Elhorst (2014), Spatial Econometrics, Springer. |
Cite this data
Please cite this dataset as follows.
APA
Mendez, C. (2026). Spatial Dynamic Panels with Common Factors in Stata: Credit Risk in US Banking [Data set]. https://carlos-mendez.org/post/stata_spxtivdfreg/
Kripfganz, S., & Sarafidis, V. (2025). Estimating spatial dynamic panel data models with unobserved common factors in Stata. Journal of Statistical Software, 113(6). https://doi.org/10.18637/jss.v113.i06
BibTeX
@misc{mendez2026stataspxtivdfreg,
author = {Mendez, Carlos},
title = {Spatial Dynamic Panels with Common Factors in Stata: Credit Risk in US Banking},
year = {2026},
howpublished = {\url{https://carlos-mendez.org/post/stata_spxtivdfreg/}},
note = {Data set}
}
@article{kripfganz2025spatial,
author = {Kripfganz, Sebastian and Sarafidis, Vasilis},
title = {Estimating Spatial Dynamic Panel Data Models with Unobserved Common Factors in {Stata}},
journal = {Journal of Statistical Software},
volume = {113}, number = {6}, year = {2025},
doi = {10.18637/jss.v113.i06}
}
Variable explorer (all 11 variables)
| Variable | Type | Label | Definition | Units | In files | Source |
|---|---|---|---|---|---|---|
| BUFFER | continuous | Capital buffer (leverage ratio minus 8%) | Capital buffer above the regulatory minimum (leverage ratio minus the 8% threshold). | percentage points | v113i06 | Kripfganz & Sarafidis (2025) |
| CAR | continuous | Capital adequacy ratio (%) | Regulatory capital adequacy ratio of the bank. | % | v113i06 | Kripfganz & Sarafidis (2025) |
| ID | identifier | Bank identifier | Anonymized US commercial-bank identifier (1-350); the panel cross-section unit. | 1-350 | v113i06 | Kripfganz & Sarafidis (2025) |
| INEFF | continuous | Operational inefficiency (endogenous) | Bank operational inefficiency; treated as the endogenous regressor in the NPL equation. | ratio | v113i06 | Kripfganz & Sarafidis (2025) |
| INTEREST | continuous | Interest expenses / deposits (instrument) | Interest expenses relative to deposits; the excluded instrument for the endogenous INEFF. | ratio | v113i06 | Kripfganz & Sarafidis (2025) |
| LIQUIDITY | continuous | Loan-to-deposit ratio | Loans relative to deposits; the covariate with the largest effect on NPL in the full model. | ratio | v113i06 | Kripfganz & Sarafidis (2025) |
| NPL | continuous | Non-performing loan ratio (%) | Non-performing loans as a share of total loans, in percentage points; the dependent variable (credit risk). | % (percentage points) | v113i06 | Kripfganz & Sarafidis (2025) |
| PROFIT | continuous | Profitability (return on equity, %) | Bank profitability, annualized return on equity. | % (annualized ROE) | v113i06 | Kripfganz & Sarafidis (2025) |
| QUALITY | continuous | Loan quality (loan-loss provisions / assets, %) | Loan loss provisions as a share of assets; a flow indicator of asset quality. | % | v113i06 | Kripfganz & Sarafidis (2025) |
| SIZE | continuous | Bank size, ln(total assets) | Natural log of total assets; a proxy for bank scale and systemic exposure. | log (ln assets) | v113i06 | Kripfganz & Sarafidis (2025) |
| TIME | identifier | Quarterly time index | Quarter counter 1-36, mapping 2006:Q1 (=1) to 2014:Q4 (=36). | 1-36 (quarters) | v113i06 | Kripfganz & Sarafidis (2025) |
Construction & formulas
The model is a spatial dynamic panel with interactive (factor) fixed effects,
estimated by defactored IV. For bank i at quarter t:
- Structural equation: NPL_it = ψ Σ_j w_ij NPL_jt + ρ NPL_i,t-1 + x_it β + α_i + λ_i' f_t + ε_it, with the spatial lag (ψ), temporal lag (ρ), covariates (β), bank fixed effect (α_i), and the interactive fixed effect (λ_i' f_t, the common factors).
- Spatial lag: W·NPL, the row-standardized weighted average of neighbours' NPL; w_ij > 0 when banks i, j are economic neighbours (debt-ratio Spearman correlation above the 95th percentile).
- Defactored IV: step 1 extracts and removes the common factors via principal components (the std option standardizes first); step 2 runs IV/GMM on the defactored data, instrumenting INEFF with INTEREST and lagged exogenous regressors.
- Variance share of factors: ρ_factor = σ_f² / (σ_f² + σ_e²), the fraction of residual variance due to common factors (0.335 in the full model).
- Long-run total effect: β / [(1 − ρ)(1 − ψ)], the short-run coefficient amplified by the temporal multiplier 1/(1−ρ) and the spatial multiplier 1/(1−ψ).
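The long-run formula and the variance share are simple enough to check by hand. The sketch below encodes both; the function names are hypothetical and the inputs are illustrative round values, not estimates from the paper.

```python
def long_run_effect(beta: float, rho: float, psi: float) -> float:
    """Short-run coefficient scaled by the temporal and spatial multipliers."""
    if not (abs(rho) < 1 and abs(psi) < 1):
        raise ValueError("the multipliers require |rho| < 1 and |psi| < 1")
    return beta / ((1.0 - rho) * (1.0 - psi))

def factor_variance_share(sigma_f2: float, sigma_e2: float) -> float:
    """Fraction of residual variance attributed to the common factors."""
    return sigma_f2 / (sigma_f2 + sigma_e2)

# Illustrative values only (assumed, not taken from the estimation output)
beta, rho, psi = 0.25, 0.5, 0.5
print(long_run_effect(beta, rho, psi))   # 0.25 / (0.5 * 0.5) = 1.0
print(factor_variance_share(1.0, 3.0))   # 1 / (1 + 3) = 0.25
```

With ρ = ψ = 0.5, each multiplier equals 2, so the long-run effect is four times the short-run coefficient. The guard on |ρ| and |ψ| reflects the condition under which the two multipliers are finite and meaningful.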
The covariates x_it are the bank financial ratios INEFF,
CAR, SIZE, BUFFER, PROFIT, QUALITY,
and LIQUIDITY; INTEREST serves only as an excluded instrument for the
endogenous INEFF.
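To make the spatial-lag term concrete, here is a minimal NumPy sketch of row-standardizing a neighbour matrix and forming W·NPL for a single quarter. The 4-bank adjacency matrix and the NPL values are made up for illustration, and `row_standardize` and `spatial_lag` are hypothetical helper names; the real W.csv is already row-standardized.

```python
import numpy as np

def row_standardize(adj: np.ndarray) -> np.ndarray:
    """Scale each row to sum to 1; rows with no neighbours stay all-zero."""
    adj = np.asarray(adj, dtype=float)
    row_sums = adj.sum(axis=1, keepdims=True)
    return np.divide(adj, row_sums, out=np.zeros_like(adj), where=row_sums > 0)

def spatial_lag(W: np.ndarray, y: np.ndarray) -> np.ndarray:
    """W @ y for one cross-section: the weighted average of neighbours' values."""
    return W @ y

# Toy example: 4 banks with hypothetical neighbour links (1 = neighbours)
adj = np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
])
W = row_standardize(adj)
npl_t = np.array([1.0, 2.0, 4.0, 8.0])  # NPL of each bank in one quarter

# Bank 0 has neighbours 1 and 2, so its spatial lag is (2 + 4) / 2 = 3
print(spatial_lag(W, npl_t))
```

In the panel, the same product would be formed separately for each quarter, with the banks ordered consistently with the rows of W.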
The datasets
Variable dictionary
| Variable | Label | Definition | Construction | Units | Source | Coverage |
|---|---|---|---|---|---|---|
| ID (identifier) | Bank identifier | Anonymized US commercial-bank identifier (1-350); the panel cross-section unit. | Integer bank code from the replication package; declared with xtset ID TIME. | 1-350 | Kripfganz & Sarafidis (2025) | 350 banks |
| TIME (identifier) | Quarterly time index | Quarter counter 1-36, mapping 2006:Q1 (=1) to 2014:Q4 (=36). | Sequential quarter index from the replication package; the panel time variable. | 1-36 (quarters) | Kripfganz & Sarafidis (2025) | 36 quarters |
| NPL (continuous) | Non-performing loan ratio (%) | Non-performing loans as a share of total loans, in percentage points; the dependent variable (credit risk). | Bank-quarter NPL/total-loans from the replication package; modelled with spatial and temporal lags. | % (percentage points) | Kripfganz & Sarafidis (2025) | bank-quarter |
| INEFF (continuous) | Operational inefficiency (endogenous) | Bank operational inefficiency; treated as the endogenous regressor in the NPL equation. | Bank-quarter inefficiency measure; instrumented by INTEREST and lagged exogenous regressors. | ratio | Kripfganz & Sarafidis (2025) | bank-quarter |
| CAR (continuous) | Capital adequacy ratio (%) | Regulatory capital adequacy ratio of the bank. | Bank-quarter CAR from the replication package; an exogenous covariate. | % | Kripfganz & Sarafidis (2025) | bank-quarter |
| SIZE (continuous) | Bank size, ln(total assets) | Natural log of total assets; a proxy for bank scale and systemic exposure. | Log of bank total assets, bank-quarter; an exogenous covariate. | log (ln assets) | Kripfganz & Sarafidis (2025) | bank-quarter |
| BUFFER (continuous) | Capital buffer (leverage ratio minus 8%) | Capital buffer above the regulatory minimum (leverage ratio minus the 8% threshold). | Leverage ratio minus 8, bank-quarter; an exogenous covariate (protective: enters NPL negatively). | percentage points | Kripfganz & Sarafidis (2025) | bank-quarter |
| PROFIT (continuous) | Profitability (return on equity, %) | Bank profitability, annualized return on equity. | Bank-quarter ROE from the replication package; an exogenous covariate. | % (annualized ROE) | Kripfganz & Sarafidis (2025) | bank-quarter |
| QUALITY (continuous) | Loan quality (loan-loss provisions / assets, %) | Loan loss provisions as a share of assets; a flow indicator of asset quality. | Bank-quarter provisions/assets from the replication package; an exogenous covariate. | % | Kripfganz & Sarafidis (2025) | bank-quarter |
| LIQUIDITY (continuous) | Loan-to-deposit ratio | Loans relative to deposits; the covariate with the largest effect on NPL in the full model. | Bank-quarter loan-to-deposit ratio from the replication package; an exogenous covariate. | ratio | Kripfganz & Sarafidis (2025) | bank-quarter |
| INTEREST (continuous) | Interest expenses / deposits (instrument) | Interest expenses relative to deposits; the excluded instrument for the endogenous INEFF. | Bank-quarter interest-expense/deposits from the replication package; enters only the iv() instrument set. | ratio | Kripfganz & Sarafidis (2025) | bank-quarter |
Distribution & statistics
| Variable | Coverage | N | Distinct | Min | Mean | Median | Max | SD |
|---|---|---|---|---|---|---|---|---|
| ID | 100% | 12,600 | 350 | — | — | — | — | — |
| TIME | 100% | 12,600 | 36 | — | — | — | — | — |
| NPL | 100% | 12,600 | 11,742 | 0 | 1.73 | 1.10 | 23.04 | 2.11 |
| INEFF | 100% | 12,600 | 12,593 | 0.044 | 0.289 | 0.266 | 0.946 | 0.120 |
| CAR | 100% | 12,600 | 12,546 | 2.56 | 17.68 | 14.52 | 193.0 | 10.31 |
| SIZE | 100% | 12,600 | 12,327 | 9.19 | 11.98 | 11.85 | 19.25 | 1.26 |
| BUFFER | 100% | 12,600 | 12,595 | -6.49 | 2.86 | 1.81 | 48.00 | 3.82 |
| PROFIT | 100% | 12,600 | 12,582 | -189.7 | 8.59 | 8.54 | 217.4 | 10.38 |
| QUALITY | 100% | 12,600 | 8,517 | -4.95 | 0.283 | 0.126 | 27.87 | 0.625 |
| LIQUIDITY | 100% | 12,600 | 12,586 | 0.012 | 0.770 | 0.780 | 2.32 | 0.222 |
| INTEREST | 100% | 12,600 | 12,065 | -5.16 | -1.91 | -1.96 | 2.52 | 0.933 |
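The columns of the statistics table can be recomputed with pandas once the panel is loaded. The sketch below shows one way to do it on a tiny synthetic frame; `summarize` is a hypothetical helper, and the use of the sample standard deviation (pandas' default, ddof=1) is an assumption about how the SD column was produced.

```python
import pandas as pd

def summarize(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Coverage, counts, and moments for each requested column."""
    rows = []
    for col in columns:
        s = df[col]
        rows.append({
            "Variable": col,
            "Coverage": f"{s.notna().mean():.0%}",  # share of non-missing rows
            "N": int(s.notna().sum()),
            "Distinct": int(s.nunique()),
            "Min": s.min(),
            "Mean": s.mean(),
            "Median": s.median(),
            "Max": s.max(),
            "SD": s.std(),  # sample SD (ddof=1); assumed convention
        })
    return pd.DataFrame(rows).set_index("Variable")

# Synthetic values, not taken from the real panel
toy = pd.DataFrame({
    "NPL": [0.0, 1.0, 2.0, 5.0],
    "SIZE": [10.0, 11.0, 11.0, 12.0],
})
stats = summarize(toy, ["NPL", "SIZE"])
print(stats)
```

On the real data, `summarize(df, ["NPL", "INEFF", "CAR"])` would be the analogous call after `df = pd.read_stata(...)`.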
Known limitations & caveats
- Real replication data. This is the published JSS replication panel from Kripfganz & Sarafidis (2025); it is empirical bank data, not simulated. Use it under the terms of the original replication package and cite the source article.
- The weight matrix is separate. The spatial structure lives in W.csv (a 350×350 row-standardized economic-distance matrix), not in this panel; it has no variable columns and is documented in the post (§3.3), not as a variable table here.
- INEFF is endogenous. Operational inefficiency is treated as endogenous in the NPL equation (reverse causality plus unobserved management quality) and is instrumented by INTEREST and lagged exogenous regressors, so do not interpret its OLS correlation causally.
- Effective sample is smaller than 12,600. Absorbing bank fixed effects and forming the temporal lag drops the first period, leaving 12,250 observations (350 banks × 35 periods) in estimation.
- Common factors are essential for inference. Omitting them roughly doubles the temporal-persistence estimate and causes the Hansen J-test to reject (p < 0.001); the variables documented here do not by themselves guarantee valid inference unless the factor structure is modelled.
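The drop from 12,600 to 12,250 observations follows from simple counting: each bank loses its first quarter once a one-period lag of NPL is required. The sketch below reproduces that arithmetic on a synthetic panel. It covers only the lag step, not the fixed-effects transformation, and `effective_sample` is a hypothetical helper name.

```python
import pandas as pd

def effective_sample(df: pd.DataFrame, unit: str = "ID",
                     time: str = "TIME", y: str = "NPL") -> pd.DataFrame:
    """Add a within-unit one-period lag of y and drop rows where it is missing."""
    out = df.sort_values([unit, time]).copy()
    lag_name = "L1_" + y
    out[lag_name] = out.groupby(unit)[y].shift(1)
    return out.dropna(subset=[lag_name])

# Synthetic balanced panel: 5 banks x 4 quarters with made-up NPL values
n_banks, n_periods = 5, 4
toy = pd.DataFrame(
    [(i, t, float(i + t)) for i in range(1, n_banks + 1)
     for t in range(1, n_periods + 1)],
    columns=["ID", "TIME", "NPL"],
)
est = effective_sample(toy)

print(len(toy), len(est))   # 5 * 4 = 20 rows, then 5 * 3 = 15
print(350 * 36, 350 * 35)   # the same arithmetic at the scale of the real panel
```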