How does one cluster standard errors two ways in Stata? This question comes up frequently in time series panel data (i.e. where data are organized by unit ID and time period) but can come up in other data with panel structure as well (e.g. firms by industry and region).

I’ll first show how two-way clustering does **not** work in Stata. I have seen this occasionally in practice, so I think it’s important to get it out of the way.

The standard `regress`

command in Stata only allows one-way clustering. Getting around that restriction, one might be tempted to

- Create a group identifier for the interaction of your two levels of clustering
- Run
`regress`

and cluster by the newly created group identifier

## ## ## . use "http://www.stata-press.com/data/r14/nlswork.dta", clear ## (National Longitudinal Survey. Young Women 14-26 years of age in 1968) ## ## . egen double_cluster=group(idcode year) ## ## . regress ln_wage age i.race, vce(cluster double_cluster) ## ## Linear regression Number of obs = 28,510 ## F(3, 28509) = 905.75 ## Prob > F = 0.0000 ## R-squared = 0.0946 ## Root MSE = .45494 ## ## (Std. Err. adjusted for 28,510 clusters in double_cluster) ## ------------------------------------------------------------------------------ ## | Robust ## ln_wage | Coef. Std. Err. t P>|t| [95% Conf. Interval] ## -------------+---------------------------------------------------------------- ## age | .0196731 .0004233 46.48 0.000 .0188435 .0205028 ## | ## race | ## black | -.1377638 .0059505 -23.15 0.000 -.1494271 -.1261006 ## other | .0666999 .0284081 2.35 0.019 .0110187 .1223812 ## | ## _cons | 1.141686 .012024 94.95 0.000 1.118119 1.165254 ## ------------------------------------------------------------------------------

What goes wrong here? You can see already that something is off because the number of clusters is the same as the number of observations. Since, in this dataset, the combination of `idcode`

and `year`

uniquely identifies each observations, the above approach effectively does not cluster at all. Instead, it gives you *heteroskedasticity-robust* standard errors, which are typically too small. You can check this by comparing to the output the same regression as above but with the `robust`

option.

## ## ## . use "http://www.stata-press.com/data/r14/nlswork.dta", clear ## (National Longitudinal Survey. Young Women 14-26 years of age in 1968) ## ## . regress ln_wage age i.race, robust ## ## Linear regression Number of obs = 28,510 ## F(3, 28506) = 905.75 ## Prob > F = 0.0000 ## R-squared = 0.0946 ## Root MSE = .45494 ## ## ------------------------------------------------------------------------------ ## | Robust ## ln_wage | Coef. Std. Err. t P>|t| [95% Conf. Interval] ## -------------+---------------------------------------------------------------- ## age | .0196731 .0004233 46.48 0.000 .0188435 .0205028 ## | ## race | ## black | -.1377638 .0059505 -23.15 0.000 -.1494271 -.1261006 ## other | .0666999 .0284081 2.35 0.019 .0110187 .1223812 ## | ## _cons | 1.141686 .012024 94.95 0.000 1.118119 1.165254 ## ------------------------------------------------------------------------------

So how does two-way clustering in Stata work then? There are a couple of user-written commands that one can use. I recommend `reghdfe`

by Sergio Correia because it is extremely versatile. Give him credit for it if you use the command! Other good options are `ivreg2`

by Baum, Schaffer and Stillman or `cgmreg`

by Cameron, Gelbach and Miller.

One issue with `reghdfe`

is that the inclusion of fixed effects is a required option. Sometimes you want to explore how results change with and without fixed effects, while still maintaining two-way clustered standard errors. A shortcut to make it work in `reghdfe`

is to absorb a constant. In the example above:

## ## ## . use "http://www.stata-press.com/data/r14/nlswork.dta", clear ## (National Longitudinal Survey. Young Women 14-26 years of age in 1968) ## ## . gen temp = 1 ## ## . reghdfe ln_wage age i.race, absorb(temp) cluster(idcode year) ## (converged in 1 iterations) ## ## HDFE Linear regression Number of obs = 28,510 ## Absorbing 1 HDFE group F( 3, 14) = 99.06 ## Statistics robust to heteroskedasticity Prob > F = 0.0000 ## R-squared = 0.0946 ## Adj R-squared = 0.0945 ## Number of clusters (idcode) = 4,710 Within R-sq. = 0.0946 ## Number of clusters (year) = 15 Root MSE = 0.4549 ## ## (Std. Err. adjusted for 15 clusters in idcode year) ## ------------------------------------------------------------------------------ ## | Robust ## ln_wage | Coef. Std. Err. t P>|t| [95% Conf. Interval] ## -------------+---------------------------------------------------------------- ## age | .0196731 .0014594 13.48 0.000 .0165431 .0228032 ## | ## race | ## black | -.1377638 .0133762 -10.30 0.000 -.166453 -.1090747 ## other | .0666999 .0664563 1.00 0.333 -.0758347 .2092346 ## ------------------------------------------------------------------------------ ## ## Absorbed degrees of freedom: ## ---------------------------------------------------------------+ ## Absorbed FE | Num. Coefs. = Categories - Redundant | ## -------------+-------------------------------------------------| ## temp | 1 1 0 | ## ---------------------------------------------------------------+

Compared to the initial incorrect approach, correctly two-way clustered standard errors differ substantially in this example.

What goes on at a more technical level is that two-way clustering amounts to adding up standard errors from clustering by each variable separately and then subtracting standard errors from clustering by the interaction of the two levels, see Cameron, Gelbach and Miller for details. The incorrect *group ID* approach only computes the interaction part.

Luis SchmidtHallo Tom! Vielen Dank fuer den Text, es hat mich sehr geholfen. Ich habe aber gemerkt, dass man kann auch noabsorb schreiben anstatt temp=1 zu erstellen. Gruss aus Brasilien. Luis

Tom ZimmermannPost authorDanke fuer den Tipp, die Option kannte ich nicht!

KevinHi Tom – Thanks for this illustration. Quick follow up: do we still need to worry about the small number of clusters (in this case the small number of clusters for years)? Would we still need to do Wild bootstrap (or something similar) as Cameron, Gelbach, Miller recommend in their other work?

Thanks!!!

Tom ZimmermannPost authorHi Kevin,

yes, with a small number of clusters (here: years), you still need to worry about consistency of standard error estimates. Thanks for pointing that out. I’m not sure reghdfe provides an option for the Wild Bootstrap. Other commands might. In any case, the Wild Bootstrap is not your only option, Angrist and Pischke discuss alternatives in chapter 8 of Mostly Harmless Econometrics.

Best,

Tom

SergioHi Tom!

Just wanted to point out that newer versions of reghdfe include the “noabsorb” (noa) option that will just add a normal constant.

Also, to run wild bootstraps you can use the boottest Stata package [1] that David Roodman and coauthors have recently published.

Cheers from DC!

Sergio

[1] http://qed.econ.queensu.ca/working_papers/papers/qed_wp_1406.pdf

Tom ZimmermannPost authorHi Sergio,

great, thanks for letting me know! I just updated my reghdfe version and it works like a charm ðŸ™‚

Best,

Tom