Mostly Harmless Econometrics: An Empiricist’s Companion
Nonstandard Standard Error Issues
We have normality. I repeat, we have normality.
Anything you still can’t cope with is therefore your own problem.
Douglas Adams, The Hitchhiker’s Guide to the Galaxy (1979)
Today, software packages routinely compute asymptotic standard errors derived under weak assumptions about the sampling process or underlying model. For example, you get regression standard errors based on formula (3.1.7) using the Stata option "robust". Robust standard errors improve on old-fashioned standard errors because the resulting inferences are asymptotically valid when the regression residuals are heteroskedastic, as they almost certainly are when regression approximates a nonlinear CEF. In contrast, old-fashioned standard errors are derived assuming homoskedasticity. The hang-up here is that robust standard errors can be misleading when the asymptotic approximation is not very good. The first part of this chapter looks at the failure of asymptotic inference with robust standard errors and some simple palliatives.
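To make the contrast between the two kinds of standard errors concrete, here is a minimal Python sketch, not taken from the text, that computes both for a bivariate regression slope on simulated data. The data-generating process (a residual standard deviation equal to |x|), the sample size, and the function name ols_slope_ses are illustrative assumptions. The robust calculation is the basic sandwich formula with no degrees-of-freedom adjustment, so packaged routines may report slightly different numbers.

```python
import math
import random


def ols_slope_ses(x, y):
    """Bivariate OLS slope with conventional and robust standard errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)

    slope = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Conventional formula: assumes one residual variance for every observation.
    sigma2 = sum(e * e for e in resid) / (n - 2)
    se_conv = math.sqrt(sigma2 / sxx)

    # Robust (sandwich) formula: each squared residual is weighted by the
    # squared deviation of its own regressor value from the mean.
    meat = sum((d * e) ** 2 for d, e in zip(dx, resid))
    se_robust = math.sqrt(meat) / sxx

    return slope, se_conv, se_robust


# Illustrative heteroskedastic design (an assumption, not from the text):
# the residual standard deviation grows with the absolute value of x.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
y = [1.0 + 2.0 * xi + abs(xi) * rng.gauss(0.0, 1.0) for xi in x]

slope, se_conv, se_robust = ols_slope_ses(x, y)
print(f"slope={slope:.3f}  conventional SE={se_conv:.4f}  robust SE={se_robust:.4f}")
```

In this design the residual variance rises with the squared distance of x from its mean, so the conventional formula understates the sampling variance of the slope and the robust standard error comes out larger.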
A pillar of traditional cross-section inference - and the discussion in Section 3.1.3 - is the assumption that the data are independent. Each observation is treated as a random draw from the same population, uncorrelated with the observation before or after. We understand today that this sampling model is unrealistic and potentially even foolhardy. Much as in the time-series studies common in macroeconomics, cross-section analysts must worry about correlation between observations. The most important form of dependence arises in data with a group structure - for example, the test scores of children observed within classes or schools. Children in the same school or class tend to have test scores that are correlated since they are subject to some of the same environmental and family-background influences. We call this correlation the clustering problem, or the Moulton problem, after Moulton (1986), who made it famous. A closely related problem is correlation over time in the data sets commonly used to implement differences-in-differences estimation strategies. For example, studies of state-level minimum wages must confront the fact that state average employment rates are correlated over time. We call this the serial correlation problem, closely related to but distinct from the Moulton problem.
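The clustering problem can be illustrated with a small simulation. The Python sketch below is an illustration under stated assumptions, not the text's own procedure: the regressor is fixed within groups, the residual contains a shared group component, and the group count, group size, variances, and the function name slope_ses are all hypothetical choices. It compares the conventional standard error with a basic cluster-robust one that sums the regressor-times-residual products within each group before squaring, with no small-sample adjustment.

```python
import math
import random


def slope_ses(x, y, groups):
    """Bivariate OLS slope with conventional and cluster-robust standard errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)

    slope = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Conventional formula: treats all observations as independent draws.
    sigma2 = sum(e * e for e in resid) / (n - 2)
    se_conv = math.sqrt(sigma2 / sxx)

    # Cluster-robust formula: add up regressor-times-residual products within
    # each group first, so within-group correlation is not averaged away.
    score_by_group = {}
    for d, e, g in zip(dx, resid, groups):
        score_by_group[g] = score_by_group.get(g, 0.0) + d * e
    se_cluster = math.sqrt(sum(s * s for s in score_by_group.values())) / sxx

    return slope, se_conv, se_cluster


# Illustrative grouped design (an assumption, not from the text):
# 200 groups of 20, a regressor that varies only across groups, and a
# residual made of a shared group shock plus an individual shock.
rng = random.Random(0)
x, y, groups = [], [], []
for g in range(200):
    x_g = rng.gauss(0.0, 1.0)          # same regressor value for the whole group
    group_shock = rng.gauss(0.0, 1.0)  # shared by every member of the group
    for _ in range(20):
        x.append(x_g)
        y.append(1.0 + 0.5 * x_g + group_shock + rng.gauss(0.0, 1.0))
        groups.append(g)

slope, se_conv, se_cluster = slope_ses(x, y, groups)
print(f"slope={slope:.3f}  conventional SE={se_conv:.4f}  clustered SE={se_cluster:.4f}")
```

Because every member of a group shares both the regressor value and the group shock, the 4,000 observations carry far less independent information than their number suggests, and the conventional standard error is much too small relative to the clustered one.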
Researchers plagued by clustering and serial correlation also have to confront the fact that the simplest fix-ups for these problems, like Stata’s "cluster" option, may not be very good. The asymptotic approximation relevant for clustered or serially correlated data relies on a large number of clusters or time-series observations. Alas, we are rarely blessed with many clusters or long time series. The resulting inference problems are not always insurmountable, though often the best solution is to get more data. Econometric fix-ups for clustering and serial correlation are discussed in the second part of this chapter. Some of the material in this chapter is hard to work through without matrix algebra, so we take the plunge and switch to a mostly-matrix motif.