How does one cluster standard errors two ways in Stata? This question comes up frequently in time series panel data (i.e. where data are organized by unit ID and time period) but can come up in other data with panel structure as well (e.g. firms by industry and region).

I’ll first show how two-way clustering does **not** work in Stata. I have seen this occasionally in practice, so I think it’s important to get it out of the way.

The standard `regress`

command in Stata only allows one-way clustering. Getting around that restriction, one might be tempted to

- Create a group identifier for the interaction of your two levels of clustering
- Run
`regress`

and cluster by the newly created group identifier

##
##
## . use "http://www.stata-press.com/data/r14/nlswork.dta", clear
## (National Longitudinal Survey. Young Women 14-26 years of age in 1968)
##
## . egen double_cluster=group(idcode year)
##
## . regress ln_wage age i.race, vce(cluster double_cluster)
##
## Linear regression Number of obs = 28,510
## F(3, 28509) = 905.75
## Prob > F = 0.0000
## R-squared = 0.0946
## Root MSE = .45494
##
## (Std. Err. adjusted for 28,510 clusters in double_cluster)
## ------------------------------------------------------------------------------
## | Robust
## ln_wage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
## -------------+----------------------------------------------------------------
## age | .0196731 .0004233 46.48 0.000 .0188435 .0205028
## |
## race |
## black | -.1377638 .0059505 -23.15 0.000 -.1494271 -.1261006
## other | .0666999 .0284081 2.35 0.019 .0110187 .1223812
## |
## _cons | 1.141686 .012024 94.95 0.000 1.118119 1.165254
## ------------------------------------------------------------------------------

What goes wrong here? You can see already that something is off because the number of clusters is the same as the number of observations. Since, in this dataset, the combination of `idcode`

and `year`

uniquely identifies each observations, the above approach effectively does not cluster at all. Instead, it gives you *heteroskedasticity-robust* standard errors, which are typically too small. You can check this by comparing to the output the same regression as above but with the `robust`

option.

##
##
## . use "http://www.stata-press.com/data/r14/nlswork.dta", clear
## (National Longitudinal Survey. Young Women 14-26 years of age in 1968)
##
## . regress ln_wage age i.race, robust
##
## Linear regression Number of obs = 28,510
## F(3, 28506) = 905.75
## Prob > F = 0.0000
## R-squared = 0.0946
## Root MSE = .45494
##
## ------------------------------------------------------------------------------
## | Robust
## ln_wage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
## -------------+----------------------------------------------------------------
## age | .0196731 .0004233 46.48 0.000 .0188435 .0205028
## |
## race |
## black | -.1377638 .0059505 -23.15 0.000 -.1494271 -.1261006
## other | .0666999 .0284081 2.35 0.019 .0110187 .1223812
## |
## _cons | 1.141686 .012024 94.95 0.000 1.118119 1.165254
## ------------------------------------------------------------------------------

So how does two-way clustering in Stata work then? There are a couple of user-written commands that one can use. I recommend `reghdfe`

by Sergio Correia because it is extremely versatile. Give him credit for it if you use the command! Other good options are `ivreg2`

by Baum, Schaffer and Stillman or `cgmreg`

by Cameron, Gelbach and Miller.

One issue with `reghdfe`

is that the inclusion of fixed effects is a required option. Sometimes you want to explore how results change with and without fixed effects, while still maintaining two-way clustered standard errors. A shortcut to make it work in `reghdfe`

is to absorb a constant. In the example above:

##
##
## . use "http://www.stata-press.com/data/r14/nlswork.dta", clear
## (National Longitudinal Survey. Young Women 14-26 years of age in 1968)
##
## . gen temp = 1
##
## . reghdfe ln_wage age i.race, absorb(temp) cluster(idcode year)
## (converged in 1 iterations)
##
## HDFE Linear regression Number of obs = 28,510
## Absorbing 1 HDFE group F( 3, 14) = 99.06
## Statistics robust to heteroskedasticity Prob > F = 0.0000
## R-squared = 0.0946
## Adj R-squared = 0.0945
## Number of clusters (idcode) = 4,710 Within R-sq. = 0.0946
## Number of clusters (year) = 15 Root MSE = 0.4549
##
## (Std. Err. adjusted for 15 clusters in idcode year)
## ------------------------------------------------------------------------------
## | Robust
## ln_wage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
## -------------+----------------------------------------------------------------
## age | .0196731 .0014594 13.48 0.000 .0165431 .0228032
## |
## race |
## black | -.1377638 .0133762 -10.30 0.000 -.166453 -.1090747
## other | .0666999 .0664563 1.00 0.333 -.0758347 .2092346
## ------------------------------------------------------------------------------
##
## Absorbed degrees of freedom:
## ---------------------------------------------------------------+
## Absorbed FE | Num. Coefs. = Categories - Redundant |
## -------------+-------------------------------------------------|
## temp | 1 1 0 |
## ---------------------------------------------------------------+

Compared to the initial incorrect approach, correctly two-way clustered standard errors differ substantially in this example.

What goes on at a more technical level is that two-way clustering amounts to adding up standard errors from clustering by each variable separately and then subtracting standard errors from clustering by the interaction of the two levels, see Cameron, Gelbach and Miller for details. The incorrect *group ID* approach only computes the interaction part.