Standard Errors in Panel Data: A Beginner's Guide in Python

Carlos Mendez

Last updated on Mar 31, 2026 26 min read Python, Econometrics, Tutorial

Contents

1. Overview

Imagine you run a regression and find that R&D spending significantly boosts firm performance, with a t-statistic of 30. Sounds like a rock-solid result. But what if that impressive t-statistic is an illusion — a consequence of using the wrong formula for your standard errors? In panel data, where the same firms are observed year after year, this is not a hypothetical worry. The repeated observations within each firm create correlation patterns that violate the assumptions behind ordinary standard errors, and ignoring these patterns can make your estimates look far more precise than they actually are.

Standard errors are the bridge between a point estimate and a statistical conclusion. If that bridge is built on the wrong assumptions, the conclusion collapses. In a classic cross-sectional regression with independent observations, conventional standard errors work well. But panel data — where firm 1 in 2015 is related to firm 1 in 2016 — breaks the independence assumption. A firm that performs well one year tends to perform well the next. Errors within the same firm are correlated, and this within-cluster correlation means conventional standard errors understate the true uncertainty surrounding your estimates.

The solution is to use standard error estimators that account for the structure of the data. In this tutorial, we build a simulated panel of 100 firms over 10 years with a known true effect, then systematically compare six approaches to standard error estimation: conventional, White (heteroskedasticity-robust), entity-clustered, time-clustered, two-way clustered, and Driscoll-Kraay. Along the way, we discover two critical lessons. First, no standard error estimator can rescue a biased estimator — fixed effects are needed to remove omitted variable bias. Second, even after fixing bias, the choice of standard error estimator determines whether our confidence intervals have the coverage they promise. The tutorial is inspired by and builds upon the excellent reference by Gregoire (2024), while using original simulated data and explanations.

Learning objectives:

Understand why within-cluster correlation invalidates conventional standard errors in panel data
Implement six standard error estimators using Python’s linearmodels package
Compare how different SE choices affect t-statistics and inference for the same regression
Assess empirical rejection rates via Monte Carlo simulation to identify which SEs correctly control size — that is, reject the true null hypothesis no more than 5% of the time
Distinguish between the bias problem (which SEs cannot fix) and the inference problem (which SEs can fix)

2. Setup and imports

Before running the analysis, install the required package if needed:

pip install linearmodels

The linearmodels library, developed by Kevin Sheppard, extends statsmodels with specialized panel data estimators. It provides PanelOLS for fixed effects regressions with flexible covariance options. The from_formula() method accepts R-style formulas where EntityEffects and TimeEffects keywords absorb group-level fixed effects.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
from linearmodels.panel import PanelOLS

# Reproducibility
RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)

# Site color palette
STEEL_BLUE = "#6a9bcc"
WARM_ORANGE = "#d97757"
NEAR_BLACK = "#141413"
TEAL = "#00d4c8"

Dark theme figure styling (click to expand)

# Dark theme palette (consistent with site navbar/dark sections)
DARK_NAVY = "#0f1729"
GRID_LINE = "#1f2b5e"
LIGHT_TEXT = "#c8d0e0"
WHITE_TEXT = "#e8ecf2"

# Plot defaults — minimal, spine-free, dark background
plt.rcParams.update({
    "figure.facecolor": DARK_NAVY,
    "axes.facecolor": DARK_NAVY,
    "axes.edgecolor": DARK_NAVY,
    "axes.linewidth": 0,
    "axes.labelcolor": LIGHT_TEXT,
    "axes.titlecolor": WHITE_TEXT,
    "axes.spines.top": False,
    "axes.spines.right": False,
    "axes.spines.left": False,
    "axes.spines.bottom": False,
    "axes.grid": True,
    "grid.color": GRID_LINE,
    "grid.linewidth": 0.6,
    "grid.alpha": 0.8,
    "xtick.color": LIGHT_TEXT,
    "ytick.color": LIGHT_TEXT,
    "xtick.major.size": 0,
    "ytick.major.size": 0,
    "text.color": WHITE_TEXT,
    "font.size": 12,
    "legend.frameon": False,
    "legend.fontsize": 11,
    "legend.labelcolor": LIGHT_TEXT,
    "figure.edgecolor": DARK_NAVY,
    "savefig.facecolor": DARK_NAVY,
    "savefig.edgecolor": DARK_NAVY,
})

3. The data generating process

3.1 Why simulated data?

When studying standard errors, simulated data has a decisive advantage over real data: we know the true answer. If the true effect of R&D on performance is exactly 0.5, we can check whether each standard error estimator produces confidence intervals that contain 0.5 roughly 95% of the time. With real data, we never know the truth, so we cannot directly evaluate whether our SEs are working correctly.

Think of it like testing a thermometer. You would not test it in unknown conditions — you would dip it in ice water (0 degrees C) and boiling water (100 degrees C) to see if it reads correctly. Simulated data serves as our “known temperature.”

3.2 The DGP

Our data generating process creates a panel of 100 firms observed over 10 years. The key feature is that firm ability — an unobserved characteristic that differs across firms but stays constant over time — affects both R&D intensity and firm performance. This creates omitted variable bias in pooled regressions, exactly the scenario that motivates fixed effects.

The true model is:

$$y_{it} = 2.0 + 0.5 \cdot x_{it} + \mu_i + \lambda_t + \varepsilon_{it}$$

In words, firm performance ($y$) equals a constant (2.0) plus the true causal effect of R&D intensity ($x$) times 0.5, plus a firm-specific effect ($\mu_i$), a time-specific effect ($\lambda_t$), and an idiosyncratic error ($\varepsilon_{it}$). The firm effect $\mu_i$ is correlated with $x_{it}$ — more capable firms invest more in R&D — which means pooled OLS will overestimate the true effect. The errors follow an AR(1) — or first-order autoregressive — process within each firm, meaning each year’s error depends on the previous year’s error (with autocorrelation coefficient $\rho = 0.5$). This creates the within-cluster serial correlation that makes standard error choice critical.

In code, $y$ corresponds to our y column, $x$ is x (R&D intensity), and $\mu_i$ is the unobserved firm fixed effect that we will absorb with EntityEffects.

def simulate_panel(n_firms=100, n_years=10, seed=42):
    """Simulate a panel dataset with firm and time effects.

    True DGP:
        y_it = 2.0 + 0.5 * x_it + mu_i + lambda_t + eps_it

    Where mu_i is correlated with x_it (firm ability drives both
    R&D and performance), and eps_it has AR(1) serial correlation
    within firms (rho = 0.5).

    The TRUE causal effect of x on y is beta = 0.5.
    """
    rng = np.random.default_rng(seed)

    firms = np.repeat(np.arange(1, n_firms + 1), n_years)
    years = np.tile(np.arange(2010, 2010 + n_years), n_firms)

    # Firm-level unobserved heterogeneity (ability)
    firm_ability = rng.normal(0, 2, n_firms)
    mu = np.repeat(firm_ability, n_years)

    # Time effects (business cycle)
    time_shocks = rng.normal(0, 0.5, n_years)
    lam = np.tile(time_shocks, n_firms)

    # Treatment: R&D intensity (correlated with firm ability)
    x = 3.0 + 0.8 * mu + rng.normal(0, 1.5, n_firms * n_years)

    # Idiosyncratic errors with within-firm AR(1) serial correlation
    eps = np.zeros(n_firms * n_years)
    rho_ar = 0.5
    for i in range(n_firms):
        start = i * n_years
        eps[start] = rng.normal(0, 1.5)
        for t in range(1, n_years):
            eps[start + t] = rho_ar * eps[start + t - 1] + rng.normal(0, 1.5)

    # True model
    y = 2.0 + 0.5 * x + mu + lam + eps

    return pd.DataFrame({"firm": firms, "year": years, "y": y, "x": x})


df = simulate_panel(n_firms=100, n_years=10, seed=42)
print(f"Dataset shape: {df.shape}")
print(f"Number of firms: {df['firm'].nunique()}")
print(f"Number of years: {df['year'].nunique()}")
print(df.head())

Dataset shape: (1000, 4)
Number of firms: 100
Number of years: 10
 firm  year        y        x
    1  2010 6.721042 4.139183
    1  2011 5.889161 3.844151
    1  2012 2.355109 2.596322
    1  2013 2.589589 1.318461
    1  2014 3.569626 3.595742

The simulated panel contains 1,000 observations — 100 firms, each observed over 10 years from 2010 to 2019. Firm 1’s performance (y) ranges from about 2.4 to 6.7 across the decade, and its R&D intensity (x) varies between 1.3 and 4.1. These year-to-year fluctuations within a single firm represent the within-firm variation that fixed effects regressions exploit, while the systematic differences across firms (some consistently high, others consistently low) represent the between-firm variation that firm fixed effects absorb.

print(df.describe().round(4))

            firm       year          y          x
count  1000.0000  1000.0000  1000.0000  1000.0000
mean     50.5000  2014.5000     2.9699     2.8984
std      28.8805     2.8737     2.9686     1.9783
min       1.0000  2010.0000    -7.0880    -3.0834
25%      25.7500  2012.0000     0.9376     1.5721
50%      50.5000  2014.5000     2.9351     2.9669
75%      75.2500  2017.0000     5.0383     4.1769
max     100.0000  2019.0000    13.5170     9.1612

Firm performance (y) averages 2.97 with a standard deviation of 2.97, spanning from -7.09 to 13.52. R&D intensity (x) averages 2.90 with a standard deviation of 1.98. The wide ranges in both variables reflect the combination of genuine within-firm fluctuations and the large cross-firm differences injected by firm fixed effects. Next, we decompose this total variation to understand how much comes from differences between firms versus changes within firms over time.

4. Exploring the panel structure

Before estimating any model, we need to understand the structure of our panel data. A key diagnostic is the decomposition of variance into between-firm and within-firm components. This tells us where the action is — and why pooled OLS can go wrong.

4.1 Between vs. within variation

Think of variation in firm performance like variation in student test scores within a school. Some variation comes from differences between students (some students are consistently stronger than others) and some comes from variation within students over time (a student scores differently on different exams). In panel data, the “between” component captures persistent firm-level differences, while the “within” component captures how each firm deviates from its own average over time.

# Panel balance check
obs_per_firm = df.groupby("firm").size()
print(f"Observations per firm: min={obs_per_firm.min()}, "
      f"max={obs_per_firm.max()}, mean={obs_per_firm.mean():.1f}")
print(f"Panel is {'balanced' if obs_per_firm.nunique() == 1 else 'unbalanced'}")

# Within vs between variation
overall_std_y = df["y"].std()
between_std_y = df.groupby("firm")["y"].mean().std()
within_std_y = df.groupby("firm")["y"].transform(lambda g: g - g.mean()).std()

print(f"\nVariation in y:")
print(f"  Overall std:  {overall_std_y:.4f}")
print(f"  Between std:  {between_std_y:.4f}")
print(f"  Within std:   {within_std_y:.4f}")

overall_std_x = df["x"].std()
between_std_x = df.groupby("firm")["x"].mean().std()
within_std_x = df.groupby("firm")["x"].transform(lambda g: g - g.mean()).std()

print(f"\nVariation in x:")
print(f"  Overall std:  {overall_std_x:.4f}")
print(f"  Between std:  {between_std_x:.4f}")
print(f"  Within std:   {within_std_x:.4f}")

Observations per firm: min=10, max=10, mean=10.0
Panel is balanced

Variation in y:
  Overall std:  2.9686
  Between std:  2.4645
  Within std:   1.6715

Variation in x:
  Overall std:  1.9783
  Between std:  1.3751
  Within std:   1.4282

The decomposition reveals an important pattern. For firm performance (y), the between-firm standard deviation (2.46) is substantially larger than the within-firm standard deviation (1.67). This means that persistent differences across firms account for more of the total variation than year-to-year fluctuations within individual firms. The same pattern holds for R&D intensity (x): between-firm variation (1.38) is comparable to within-firm variation (1.43). Since firm fixed effects absorb all between-firm variation, this tells us that fixed effects will have a large impact on the regression — they are removing a dominant source of variation that is confounded with the treatment.

4.2 Within-firm correlations

within_corr = (
    df.groupby("firm")
    .apply(lambda g: g["y"].corr(g["x"]), include_groups=False)
)
print(f"Within-firm correlation (y, x):")
print(f"  Mean:   {within_corr.mean():.4f}")
print(f"  Median: {within_corr.median():.4f}")

Within-firm correlation (y, x):
  Mean:   0.4100
  Median: 0.4624

The average within-firm correlation between R&D and performance is 0.41, with a median of 0.46. This moderate positive correlation is what we expect given the true effect ($\beta = 0.5$): years in which a firm invests more in R&D tend to be years in which that firm performs better. The correlation is less than 0.5 because the AR(1) errors add noise.

# Figure: Panel structure and within-firm correlations
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
fig.patch.set_linewidth(0)

# Left: x vs y colored by firm (sample 10 firms)
rng_plot = np.random.default_rng(99)
sample_firms = sorted(rng_plot.choice(df["firm"].unique(), 10, replace=False))
colors_sample = [STEEL_BLUE, WARM_ORANGE, TEAL, "#e8956a", "#c4623d",
                 "#8fbfcc", "#e0a57a", "#5cc8c0", "#b0c4de", "#f0c8a0"]
for i, fid in enumerate(sample_firms):
    sub = df[df["firm"] == fid]
    axes[0].scatter(sub["x"], sub["y"], color=colors_sample[i % len(colors_sample)],
                    alpha=0.7, s=30, edgecolors=DARK_NAVY, linewidths=0.5)
axes[0].set_xlabel("R&D intensity (x)")
axes[0].set_ylabel("Firm performance (y)")
axes[0].set_title("10 sampled firms: x vs y", fontweight="bold")

# Right: within-firm correlation distribution
axes[1].hist(within_corr, bins=20, color=STEEL_BLUE, edgecolor=DARK_NAVY, alpha=0.85)
axes[1].axvline(within_corr.mean(), color=WARM_ORANGE, linewidth=2,
                linestyle="--", label=f"Mean = {within_corr.mean():.2f}")
axes[1].set_xlabel("Within-firm correlation (y, x)")
axes[1].set_ylabel("Number of firms")
axes[1].set_title("Distribution of within-firm correlations", fontweight="bold")
axes[1].legend()

plt.tight_layout()
plt.savefig("panel_ses_eda.png", dpi=300, bbox_inches="tight",
            facecolor=DARK_NAVY, edgecolor=DARK_NAVY, pad_inches=0)
plt.show()

Panel structure scatter plots showing 10 sampled firms and distribution of within-firm correlations.

The left panel shows how the 10 sampled firms form distinct clusters in the scatter plot — each firm occupies a different region of the x-y space. This visual clustering is the between-firm variation that fixed effects remove. The right panel shows that most firms have a positive within-firm correlation between R&D and performance, with the distribution centered around 0.41. A few firms have near-zero or negative correlations, reflecting the random noise in the simulation. These within-firm relationships are what fixed effects regressions actually estimate.

Now that we understand the panel structure, we are ready to set up the MultiIndex that linearmodels requires and begin estimating models.

5. Setting up the MultiIndex

The linearmodels package requires panel data to be stored in a pandas DataFrame with a MultiIndex: the entity (firm) as the first level and the time period (year) as the second. This structure tells the package which observations belong to the same firm and how they are ordered in time — information it needs to compute clustered standard errors and absorb fixed effects.

df_panel = df.set_index(["firm", "year"])
print(f"MultiIndex levels: {df_panel.index.names}")
print(df_panel.head(3))

MultiIndex levels: ['firm', 'year']
                  y         x
firm year
1    2010  6.721042  4.139183
     2011  5.889161  3.844151
     2012  2.355109  2.596322

The MultiIndex now encodes the panel structure directly in the DataFrame. Firm 1’s three displayed observations span 2010–2012, and linearmodels uses this ordering to know which observations to group when computing entity-clustered standard errors. With the data properly indexed, we can now estimate our first model.

6. Pooled OLS — the naive baseline

6.1 Conventional standard errors

We begin with the simplest possible approach: pooled OLS with conventional standard errors. This estimator ignores the panel structure entirely — it treats all 1,000 observations as if they were independent draws, like 1,000 different firms each observed once. We use PanelOLS.from_formula() with cov_type="unadjusted" to request conventional (homoskedastic) standard errors. The formula "y ~ 1 + x" specifies a regression of firm performance on R&D intensity with an intercept.

mod_pooled = PanelOLS.from_formula("y ~ 1 + x", data=df_panel)
res_pooled = mod_pooled.fit(cov_type="unadjusted")

beta_pooled = res_pooled.params["x"]
se_pooled = res_pooled.std_errors["x"]
t_pooled = res_pooled.tstats["x"]
print(f"Coefficient on x: {beta_pooled:.4f}")
print(f"Conventional SE:  {se_pooled:.4f}")
print(f"t-statistic:      {t_pooled:.4f}")

Coefficient on x: 1.0318
Conventional SE:  0.0345
t-statistic:      29.9151

The pooled OLS coefficient is 1.03 — more than double the true value of 0.5. This is omitted variable bias in action. Because high-ability firms both invest more in R&D and perform better, the regression attributes to R&D what is actually driven by unobserved ability. The conventional standard error of 0.0345 looks impressively small, yielding a t-statistic of 29.9. But this precision is doubly misleading: the point estimate itself is biased, and the standard error is too small because it ignores within-firm error correlation.

This is the first major lesson: a biased estimator with small standard errors is worse than a noisy but unbiased one. The conventional SE tells us we can be very confident that the effect is around 1.03 — but 1.03 is the wrong answer. No standard error correction can fix this; we need a different estimator (fixed effects) to address the bias. We will get there in Section 9. But first, let us see what happens when we try progressively better standard errors on the same biased pooled model.

6.2 White (heteroskedasticity-robust) standard errors

The next step up from conventional SEs is the White estimator, also called heteroskedasticity-consistent (HC) standard errors. While conventional SEs assume all errors have the same variance, the White estimator allows the error variance to differ across observations. Think of it as replacing a one-size-fits-all uncertainty measure with one tailored to each data point. In linearmodels, we request it with cov_type="robust".

The White covariance estimator is:

$$\hat{\Sigma}_{\text{White}} = (X’X)^{-1} \left( \sum_{i=1}^{N} X_i' \hat{e}_i^2 X_i \right) (X’X)^{-1}$$

In words, this replaces the constant variance assumption with the squared residuals $\hat{e}_i^2$ from each observation, producing standard errors that are robust to heteroskedasticity — situations where the spread of errors varies with the level of $X$.

res_white = mod_pooled.fit(cov_type="robust")
se_white = res_white.std_errors["x"]
t_white = res_white.tstats["x"]
print(f"White SE:    {se_white:.4f}")
print(f"t-statistic: {t_white:.4f}")

White SE:    0.0361
t-statistic: 28.5897

The White SE (0.0361) is only slightly larger than the conventional SE (0.0345), and the t-statistic barely budges from 29.9 to 28.6. This is because heteroskedasticity is not the main problem here — within-cluster correlation is. The White estimator treats each observation as independent, just with potentially different variances. It does not account for the fact that firm 1’s error in 2015 is correlated with firm 1’s error in 2016. For panel data with serial correlation, we need standard errors that account for this clustering.

7. Clustered standard errors

7.1 The intuition behind clustering

Clustering is the workhorse correction for panel data standard errors. The idea is simple: if errors within a firm are correlated, then 10 observations from the same firm do not contain as much independent information as 10 observations from 10 different firms. Clustering acknowledges this by allowing arbitrary correlation among all observations within the same cluster.

Think of surveying students in classrooms. If you survey 100 students from 10 classrooms (10 per classroom), you do not have 100 independent data points — students in the same classroom share the same teacher, curriculum, and classroom environment. The effective sample size is closer to 10 (the number of classrooms) than 100 (the number of students). Clustering adjusts the standard errors to reflect this reduced effective sample size.

7.2 Entity-clustered SEs

Entity clustering allows arbitrary correlation among all observations within the same firm. We request it by setting cluster_entity=True.

# Entity-clustered
res_cl_entity = mod_pooled.fit(cov_type="clustered", cluster_entity=True)
se_cl_entity = res_cl_entity.std_errors["x"]
t_cl_entity = res_cl_entity.tstats["x"]
print(f"Entity-clustered SE: {se_cl_entity:.4f}")
print(f"t-statistic:         {t_cl_entity:.4f}")

Entity-clustered SE: 0.0621
t-statistic:         16.6233

Entity-clustered SEs (0.0621) are 80% larger than conventional SEs (0.0345) and nearly double the White SEs (0.0361). The t-statistic drops from 29.9 to 16.6 — still highly significant in this case, but the inflation in standard errors demonstrates how much conventional SEs understate uncertainty when within-firm correlation is present. In a setting with a weaker true effect, this correction could flip a “significant” result to “insignificant.”

7.3 Time-clustered SEs

Time clustering allows correlation among all firms within the same year. This matters when firms face common shocks — a recession, a regulatory change, or a market-wide technology shift that affects all firms simultaneously.

# Time-clustered
res_cl_time = mod_pooled.fit(cov_type="clustered", cluster_time=True)
se_cl_time = res_cl_time.std_errors["x"]
t_cl_time = res_cl_time.tstats["x"]
print(f"Time-clustered SE:   {se_cl_time:.4f}")
print(f"t-statistic:         {t_cl_time:.4f}")

Time-clustered SE:   0.0168
t-statistic:         61.2757

Time-clustered SEs (0.0168) are actually smaller than conventional SEs, and the t-statistic jumps to 61.3. This happens because our DGP has only weak time effects ($\lambda_t \sim N(0, 0.5)$) but strong firm effects. With only 10 time clusters (years), the clustering correction has very few groups to work with, and the asymptotic theory — the mathematical guarantees that hold when the number of clusters is large — that justifies clustered SEs relies on having many clusters. As a rule of thumb, cluster on the dimension that has at least 40–50 groups. Here, entity clustering (100 firms) is far more appropriate than time clustering (10 years).

7.4 Two-way clustered SEs

Two-way clustering allows correlation along both dimensions simultaneously — within firms over time and across firms within the same year. This is the most conservative approach, proposed by Cameron, Gelbach, and Miller (2011). In linearmodels, set both cluster_entity=True and cluster_time=True.

# Two-way clustered
res_cl_both = mod_pooled.fit(cov_type="clustered",
                             cluster_entity=True, cluster_time=True)
se_cl_both = res_cl_both.std_errors["x"]
t_cl_both = res_cl_both.tstats["x"]
print(f"Two-way clustered SE: {se_cl_both:.4f}")
print(f"t-statistic:          {t_cl_both:.4f}")

Two-way clustered SE: 0.0532
t-statistic:          19.3829

The two-way clustered SE (0.0532) falls between the entity-clustered (0.0621) and time-clustered (0.0168) estimates. This makes sense: the two-way estimator combines information from both clustering dimensions. Since the time dimension contributes little (weak time effects, few clusters), the two-way SE is somewhat smaller than entity-only clustering. In practice, two-way clustering is recommended when both dimensions have enough clusters and both types of correlation are plausible.

8. A side-by-side comparison so far

Before introducing fixed effects, let us pause to see all the pooled OLS standard errors side by side. Remember: the point estimate (1.0318) is the same for all of them — only the standard errors and hence the confidence intervals differ.

Model / SE Type	Coefficient	Std. Error	t-stat
Pooled OLS (conventional)	1.0318	0.0345	29.92
Pooled OLS (White/HC)	1.0318	0.0361	28.59
Pooled OLS (cluster: entity)	1.0318	0.0621	16.62
Pooled OLS (cluster: time)	1.0318	0.0168	61.28
Pooled OLS (cluster: both)	1.0318	0.0532	19.38

The entity-clustered SE is 1.8 times larger than the conventional SE. But recall that all these models estimate the wrong coefficient (1.03 vs. the true 0.5). Correcting standard errors on a biased estimator is like putting better tires on a car driving in the wrong direction. Next, we fix the direction with fixed effects.

9. Entity fixed effects with clustered SEs

9.1 Why fixed effects solve the bias

Fixed effects regression removes all time-invariant differences between firms before estimating the coefficient. Mathematically, it subtracts each firm’s time-average from its observations — a process called demeaning. After demeaning, the unobserved firm ability $\mu_i$ vanishes because it is constant over time, and we estimate $\beta$ using only the within-firm variation in $x$ and $y$. This eliminates the omitted variable bias that inflated the pooled OLS estimate.

In linearmodels, adding EntityEffects to the formula absorbs firm fixed effects:

mod_fe = PanelOLS.from_formula("y ~ 1 + x + EntityEffects", data=df_panel)
res_fe_cl = mod_fe.fit(cov_type="clustered", cluster_entity=True)

beta_fe = res_fe_cl.params["x"]
se_fe_cl = res_fe_cl.std_errors["x"]
t_fe_cl = res_fe_cl.tstats["x"]
print(f"FE coefficient on x:    {beta_fe:.4f}")
print(f"Entity-clustered SE:    {se_fe_cl:.4f}")
print(f"t-statistic:            {t_fe_cl:.4f}")

FE coefficient on x:    0.4829
Entity-clustered SE:    0.0357
t-statistic:            13.5250

The fixed effects coefficient (0.4829) is dramatically closer to the true value of 0.5 than the pooled estimate (1.0318). The remaining gap of 0.017 is sampling noise, not systematic bias. The entity-clustered SE of 0.0357 is actually smaller than the pooled entity-clustered SE (0.0621) because fixed effects remove the between-firm variation that was inflating the residuals.

9.2 Two-way fixed effects

We can also absorb time fixed effects by adding TimeEffects, which removes year-specific shocks common to all firms. This controls for business cycle effects, regulatory changes, or any other year-level phenomenon.

mod_twfe = PanelOLS.from_formula("y ~ 1 + x + EntityEffects + TimeEffects",
                                 data=df_panel)
res_twfe = mod_twfe.fit(cov_type="clustered", cluster_entity=True)

beta_twfe = res_twfe.params["x"]
se_twfe = res_twfe.std_errors["x"]
t_twfe = res_twfe.tstats["x"]
print(f"TWFE coefficient on x:  {beta_twfe:.4f}")
print(f"Entity-clustered SE:    {se_twfe:.4f}")
print(f"t-statistic:            {t_twfe:.4f}")

TWFE coefficient on x:  0.4796
Entity-clustered SE:    0.0376
t-statistic:            12.7392

Adding time fixed effects barely changes the estimate (0.4796 vs. 0.4829) and slightly increases the standard error (0.0376 vs. 0.0357). This makes sense: the time effects in our DGP are small ($\lambda_t \sim N(0, 0.5)$), so absorbing them provides only a minor correction while consuming 9 additional degrees of freedom. In real applications where macroeconomic shocks are substantial, two-way FE can make a bigger difference.

10. Driscoll-Kraay standard errors

Driscoll and Kraay (1998) proposed a standard error estimator that accounts for both cross-sectional correlation (across firms within a period) and temporal dependence (within firms over time), using a kernel-based approach similar to Newey-West but applied to cross-sectional averages. In linearmodels, we request it with cov_type="kernel" and a Bartlett kernel (equivalent to Newey and West (1987) weighting). The bandwidth parameter controls how many time lags of correlation the estimator accounts for — a bandwidth of 3 means it incorporates correlations up to 3 years apart, with declining weights for longer lags.

res_dk = mod_pooled.fit(cov_type="kernel", kernel="bartlett", bandwidth=3)
se_dk = res_dk.std_errors["x"]
t_dk = res_dk.tstats["x"]
print(f"Driscoll-Kraay SE (BW=3): {se_dk:.4f}")
print(f"t-statistic:              {t_dk:.4f}")

Driscoll-Kraay SE (BW=3): 0.0158
t-statistic:              65.4073

The Driscoll-Kraay SE (0.0158) is the smallest we have seen — even smaller than conventional SEs. This reflects the estimator’s focus on cross-sectional dependence, which is weak in our simulation (firms are independent given their fixed effects). In applications with strong cross-sectional correlation — for example, banks exposed to the same macroeconomic shock — Driscoll-Kraay SEs can be substantially larger. The key feature is robustness to cross-sectional dependence that entity clustering alone cannot handle.

11. Full comparison

11.1 Summary table

Now we can see all eight model-SE combinations in a single table. The true coefficient is $\beta = 0.5$. The “Reject H0” column tests the default null H0: $\beta = 0$ (not H0: $\beta = 0.5$). In Section 12, the Monte Carlo explicitly tests against the true value.

Model / SE Type	Coefficient	Std. Error	t-stat	Reject H0 (5%)
Pooled OLS (conventional)	1.0318	0.0345	29.92	Yes
Pooled OLS (White/HC)	1.0318	0.0361	28.59	Yes
Pooled OLS (cluster: entity)	1.0318	0.0621	16.62	Yes
Pooled OLS (cluster: time)	1.0318	0.0168	61.28	Yes
Pooled OLS (cluster: both)	1.0318	0.0532	19.38	Yes
Entity FE (cluster: entity)	0.4829	0.0357	13.53	Yes
Two-way FE (cluster: entity)	0.4796	0.0376	12.74	Yes
Pooled OLS (Driscoll-Kraay)	1.0318	0.0158	65.41	Yes

Two patterns stand out. First, all pooled models estimate a coefficient around 1.03 — more than double the true 0.5 — while both FE models recover estimates close to 0.5. This is the bias-versus-variance distinction: standard errors address precision, not accuracy. Second, among the FE models (which have the right coefficient), entity-clustered SEs are appropriately sized relative to the true uncertainty.

11.2 Standard error comparison

# Figure: SE comparison bar chart (code in script.py)

Bar chart comparing standard error estimates across all eight model-SE combinations.

The bar chart reveals the full spectrum of standard error estimates. Entity-clustered SEs on the pooled model (0.0621) are the largest — they correctly reflect high within-firm correlation but sit atop a biased estimate. The FE models' entity-clustered SEs (0.036–0.038) are smaller because fixed effects absorbed the between-firm variation that inflated residuals. At the other extreme, Driscoll-Kraay (0.0158) and time-clustered (0.0168) SEs are the smallest, reflecting the weak cross-sectional and time-level correlation in our data.

11.3 Confidence intervals

# Figure: Confidence intervals across methods (code in script.py)

Confidence interval plot showing 95% CIs across all eight methods, with a dashed line at the true beta of 0.5.

The confidence interval plot delivers the tutorial’s core visual message. The teal dashed line at $\beta = 0.5$ is the truth. All five pooled OLS intervals (blue) are far to the right — none come close to covering the true value, regardless of which SE estimator we use. The two FE intervals (orange) are centered near 0.5 and easily cover it. The lesson is unmistakable: standard errors cannot rescue a biased point estimate, but combined with a consistent estimator, they produce intervals with correct coverage.

12. Monte Carlo simulation — which SEs get the right rejection rate?

12.1 The experiment

The confidence interval plot above shows one simulation. But how do we know whether those intervals typically contain the true value? A single simulation could be lucky or unlucky. To rigorously evaluate each SE estimator, we need a Monte Carlo simulation: generate hundreds of independent datasets from the same DGP, estimate the model on each, and check how often the 95% confidence interval covers the true $\beta = 0.5$.

If an SE estimator is correctly sized, its 95% CI should cover the truth 95% of the time, meaning it rejects the true null hypothesis only 5% of the time. An SE that is too small produces intervals that are too narrow, leading to over-rejection — false positives in more than 5% of simulations.

We focus on Entity FE models because they produce unbiased estimates. This isolates the SE question: given that the point estimate is right on average, do the standard errors correctly quantify the remaining uncertainty?

12.2 Results

N_SIM = 500
# ... (Monte Carlo loop runs Entity FE with 6 different SE types) ...

Empirical rejection rates at 5% level (H0: beta=0.5 is true):
  FE + conventional             : 0.060 (30/500)   ~correct
  FE + White (HC)               : 0.064 (32/500)   ~correct
  FE + cluster: entity          : 0.066 (33/500)   ~correct
  FE + cluster: time            : 0.090 (45/500)
  FE + cluster: both            : 0.078 (39/500)   ~correct
  TWFE + cluster: entity        : 0.032 (16/500)   ~correct

# Figure: Monte Carlo rejection rates (code in script.py)

Monte Carlo rejection rates for six FE model and SE combinations, with a dashed line at the nominal 5% level.

The Monte Carlo results across 500 simulations reveal meaningful differences. Entity FE with entity-clustered SEs rejects at 6.6% — close to the nominal 5% and well within the range expected from simulation noise. Conventional SEs (6.0%) and White SEs (6.4%) also perform well here because, after absorbing firm fixed effects, the remaining within-firm errors are approximately homoskedastic with moderate serial correlation that 100 clusters can handle.

The outlier is FE with time-clustered SEs at 9.0% — nearly double the nominal rate. This over-rejection occurs because time clustering with only 10 year-clusters violates the large-cluster asymptotic assumption. With 10 clusters, the finite-sample correction is insufficient, and the SEs are too small. TWFE with entity-clustered SEs (3.2%) is slightly conservative, meaning its confidence intervals are a bit wider than necessary — a benign property compared to over-rejection.

12.3 Standard error ratios

# Figure: SE ratios relative to entity-clustered (code in script.py)

SE ratios relative to entity-clustered standard errors as the benchmark.

This figure normalizes all standard errors to the entity-clustered SE (the recommended default). Ratios below 1.0 indicate SEs that are smaller than entity-clustered — and therefore potentially over-confident. Conventional SEs and White SEs on the pooled model are about 0.55–0.58 times the entity-clustered SE, confirming they understate uncertainty by roughly 40%. The FE-based entity-clustered SE (0.57x) is smaller because fixed effects reduce residual variance — this is a genuine precision gain, not an artifact of ignoring correlation.

13. Discussion

13.1 Answering the case study question

We asked: when firms are observed over multiple years, how does our choice of standard error estimator change what we conclude about the effect of R&D spending on firm performance? The answer has two parts.

First, the bias problem. Pooled OLS estimates R&D’s effect at 1.03 — more than double the true 0.5. This bias comes from omitted firm ability, not from standard error choice. Entity fixed effects reduce the estimate to 0.48, close to the truth. No standard error correction can fix a biased coefficient.

Second, the inference problem. Even after fixing bias with FE, standard error choice matters. In our Monte Carlo, time-clustered SEs on FE models rejected the true null at 9.0% instead of 5%. Entity-clustered SEs maintained correct size at 6.6%. For a practitioner, using the wrong SEs could mean reporting a “significant” finding that is actually a false positive.

13.2 Practical guidance

Following the recommendations of Petersen (2009), here is a decision framework:

Always start with fixed effects if the panel has entity-level unobserved heterogeneity. Without FE, standard error corrections address precision but not bias.
Cluster on the dimension with more groups. Entity clustering (100 firms) is more reliable than time clustering (10 years) because clustered SEs rely on large-cluster asymptotics.
Two-way clustering is the safe default when both dimensions have enough clusters (rule of thumb: at least 40–50 each). It accounts for both types of dependence simultaneously.
Driscoll-Kraay is specialized. Use it when cross-sectional dependence is strong and the number of time periods is large (e.g., long macroeconomic panels).

14. Summary and next steps

Key takeaways:

Standard errors cannot fix bias. Pooled OLS overestimated the R&D effect at 1.03 (true: 0.5) regardless of which SE estimator was applied. Entity fixed effects recovered an estimate of 0.48 — close to the truth. Always address the model before worrying about the standard errors.
Clustering dimension matters. Entity-clustered SEs (0.0621) were 80% larger than conventional SEs (0.0345) on the pooled model, reflecting the within-firm correlation that conventional SEs ignore. Time-clustered SEs (0.0168) were misleadingly small because only 10 year-clusters provided too few groups for reliable asymptotic inference.
Monte Carlo validation is essential. Entity-clustered SEs on the FE model rejected the true null at 6.6% (close to the nominal 5%), while time-clustered SEs rejected at 9.0% — nearly double the expected rate. Simulation is the only way to verify that your SE choice controls size in your specific data structure.
The FE + entity-clustered combination is the reliable default. It addresses both bias (via FE) and inference (via clustering). Two-way clustering adds insurance against cross-sectional correlation when both dimensions have enough groups.

Limitations:

Our simulation uses balanced panels. With unbalanced panels (firms entering and exiting), some SE estimators require additional adjustments.
We used 100 firms and 10 years. Results may differ with fewer clusters or different cluster-size ratios.
The DGP has a simple AR(1) error structure. Real data may have more complex dependence patterns.

Next steps:

Apply these techniques to a real firm-level dataset (e.g., Compustat) and compare SE estimates.
Explore bootstrap-based approaches for clustered inference with few clusters (wild cluster bootstrap).
Study the Cameron-Gelbach-Miller multi-way clustering theory for panels with more than two clustering dimensions.

15. Exercises

Modify the DGP. Change the AR(1) coefficient from 0.5 to 0.9 (stronger serial correlation) and re-run the Monte Carlo. Which SE estimators are most affected? Does entity-clustering still control size at 5%?
Reduce the number of firms. Set n_firms=20 (keeping n_years=10) and re-run the Monte Carlo. With only 20 entity clusters, do entity-clustered SEs still perform well? At what cluster count do they start to break down?
Add cross-sectional dependence. Modify simulate_panel() so that each year has a common shock ($\delta_t$) that enters all firms' errors: eps[start + t] += delta_t. Re-run the analysis and check whether entity-clustered SEs still control size, or whether Driscoll-Kraay / two-way clustering becomes necessary.

References

python econometrics panel