A COMPANION TO Theoretical Econometrics
Diagnostic Testing in Cross Section Contexts
To obtain a unified view of diagnostic testing, it is important to use a modern perspective. This requires facility with concepts from basic probability, such as conditional expectations and conditional variances, and application of tools such as the law of iterated expectations. While sloppiness in stating assumptions is often innocuous, in some cases it is not. In the following sections we show how to properly state the null hypothesis for diagnostic testing in cross section applications.
1.1 Diagnostic tests for the conditional mean in the linear regression model
We begin with the standard linear regression model because it is still the workhorse in empirical economics, and because it provides the simplest setting for a modern approach to diagnostic testing. Even in this basic setup, we must be careful in stating assumptions. The following statement, or a slight variant on it, appears in numerous textbooks and research papers:
Consider the model
$y_i = \beta_0 + x_i\beta + u_i, \quad i = 1, 2, \ldots, N,$ (9.1)
where $x_i$ is a $1 \times k$ vector of explanatory variables and the $u_i$ are iid zero-mean errors.
This formulation may be so familiar that we do not even think to question it. Unfortunately, for econometric analysis, the statement of the model and assumptions is almost useless, as it omits the most important consideration: What is the relationship between the error, $u_i$, and the explanatory variables, $x_i$? If we assume random sampling, the errors must be independent and identically distributed because $u_i$ is a function of $y_i$ and $x_i$. But this tells us nothing of value for estimating $\beta$.
Often, the explanatory variables in (9.1) are assumed to be fixed or nonrandom. Then we can obtain an unbiased estimator of $\beta$ because the model satisfies the Gauss-Markov assumptions. But assuming fixed regressors assumes away the interesting problems that arise with analyzing nonexperimental data.
What is a better model formulation? For most cross section applications it is best to start with a population model, which in the linear regression case is written as
$y = \beta_0 + x\beta + u,$ (9.2)
where $x$ is a $1 \times k$ vector of explanatory variables and $u$ is the error. If we supplement this model with random sampling (which, for econometric applications with cross section data, is more realistic than the fixed regressor assumption), we can forget about the realizations of the random variables in (9.2) and focus entirely on the assumptions in the population.
In order to consistently estimate $\beta$ by OLS (ordinary least squares), we need to make assumptions about the relationship between $u$ and $x$. There are several possibilities. The weakest useful assumptions are
$E(u) = 0$ (9.3)
$E(x'u) = 0,$ (9.4)
where we assume throughout that all expectations are well defined.
Assumption (9.3) is for free because we have included an intercept in the model. Assumption (9.4) is equivalent to assuming $u$ is uncorrelated with each $x_j$. Under (9.3), (9.4), random sampling, and the assumption that $\mathrm{var}(x)$ has full rank (i.e., there is no perfect collinearity in the population), the OLS estimator is consistent and $\sqrt{N}$-asymptotically normal for $\beta_0$ and $\beta$.
As a minimal set of assumptions, (9.4) is fine. But it is not generally enough to interpret the $\beta_j$ as the partial effect of $x_j$ on the expected value of $y$. A stronger assumption is that $u$ has a zero conditional mean:
$E(u \mid x) = 0,$ (9.5)
which implies that the population regression function, $E(y \mid x)$, is linear:
$E(y \mid x) = \beta_0 + x\beta.$ (9.6)
Under (9.6), no additional functions of $x$ appear in a linear regression model. More formally, if for any $1 \times h$ function $g(x)$ we write
$y = \beta_0 + x\beta + g(x)\gamma + u$
and (9.5) holds, then $\gamma = 0$. Tests for functional form always maintain (9.6) as the null hypothesis. If we assume only (9.3) and (9.4), there is nothing to test: by definition, $\beta_0$ and $\beta$ appear in the linear projection of $y$ on $x$.
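A small worked example may help show why the zero-correlation assumptions alone leave nothing to test while the zero conditional mean assumption does. The example is constructed here for illustration and is not from the original text; it assumes a scalar $x$ with $E(x) = E(x^3) = 0$ and $\mathrm{var}(x) = \tau^2 > 0$, and an error $e$ with $E(e \mid x) = 0$.

```latex
\begin{align*}
y &= x^2 + e, \qquad E(e \mid x) = 0, \quad E(x) = E(x^3) = 0, \quad \tau^2 \equiv E(x^2), \\
\beta &= \frac{\mathrm{cov}(x, y)}{\mathrm{var}(x)} = \frac{E(x^3) + E(xe)}{\tau^2} = 0,
\qquad \beta_0 = E(y) - \beta E(x) = \tau^2, \\
u &\equiv y - \beta_0 - x\beta = x^2 - \tau^2 + e, \\
E(u) &= 0, \qquad E(xu) = E(x^3) - \tau^2 E(x) + E(xe) = 0, \\
E(u \mid x) &= x^2 - \tau^2 \neq 0 .
\end{align*}
```

In this sketch the linear projection error is uncorrelated with $x$ by construction, yet its conditional mean is not zero, and adding $g(x) = x^2$ to the regression would give a coefficient of one on $g(x)$.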
In testing for omitted variables, z, that are not exact functions of x, the null hypothesis in (9.2) is
$E(u \mid x, z) = 0,$
which is equivalent to
$E(y \mid x, z) = E(y \mid x) = \beta_0 + x\beta.$ (9.8)
The first equality in equation (9.8) has a very natural interpretation: once x has been controlled for, z has no effect on the mean value of y. The most common way to test (9.8) is to specify the extended model
$y = \beta_0 + x\beta + z\gamma + u$ (9.9)
and to test $H_0: \gamma = 0$.
Similar considerations arise when we test a maintained model against a nonnested alternative. One use of nonnested tests is to detect misspecified functional form. A traditional way of specifying competing models that are linear in parameters is
$y = \beta_0 + a(x)\beta + u$ (9.10)
and
$y = \gamma_0 + h(x)\gamma + v,$ (9.11)
where $a(x)$ and $h(x)$ are row vectors of functions of $x$ that may or may not contain elements in common and can be of different dimensions. For example, in (9.10), all explanatory variables may be in level or quadratic form and in (9.11) some or all may be in logarithmic form. When we take the null to be (9.10), assumption (9.5) is the weakest assumption that makes sense: we are testing $H_0: E(y \mid x) = \beta_0 + a(x)\beta$ against the alternative $H_1: E(y \mid x) = \gamma_0 + h(x)\gamma$. Of course, there are different methods of testing $H_0$ against $H_1$, but before we choose a test we must agree on the proper statement of the null.
Stating the proper null requires even more care when using nonnested tests to choose between models with explanatory variables that are not functionally related to one another. If we agree to treat explanatory variables as random and focus on conditional expectations, no confusion can arise. The traditional approach to specifying the competing models is
$y = \beta_0 + x\beta + u$ (9.12)
and
$y = \gamma_0 + z\gamma + v.$ (9.13)
We are thinking of cases where not all elements of $z$ are functionally related to $x$, and vice versa. For example, in an equation to explain student performance, $x$ contains one set of school and family characteristics and $z$ contains another set (where there might be some overlap). Specifying (9.12) as the null model gives us nothing to test, as we have assumed nothing about how $u$ relates to $x$ and $z$. Instead, the null hypothesis is exactly as in equation (9.8): once the elements of $x$ have been controlled for, $z$ has no effect on $y$. This is the sense in which (9.12) is the correct model, and (9.13) is not. Similarly, if (9.13) is the null model, we are really saying $E(y \mid x, z) = E(y \mid z) = \gamma_0 + z\gamma$.
It may be tempting to specify the null model as $E(y \mid x) = \beta_0 + x\beta$ and the competing model as $E(y \mid z) = \gamma_0 + z\gamma$, but then we have nothing to test. Both of these hypotheses can be true, in which case it makes no sense to test one against the other.
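A simple illustration shows how both conditional means can be linear at once. It is constructed here and is not part of the original text; it assumes scalar $x$, $z$, and $e$ that are mutually independent with zero means.

```latex
\begin{align*}
y &= x + z + e, \qquad x,\ z,\ e \ \text{mutually independent with zero means}, \\
E(y \mid x) &= x + E(z) + E(e) = x, \\
E(y \mid z) &= E(x) + z + E(e) = z, \\
E(y \mid x, z) &= x + z .
\end{align*}
```

Here the conditional mean given $x$ alone and the conditional mean given $z$ alone are both linear, so neither hypothesis contradicts the other. Only a statement about the mean of $y$ given both $x$ and $z$, as in (9.8), is restrictive enough to be rejected by the data.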
So far, we have said nothing about actually computing conditional mean diagnostics. A common method is based on variable addition statistics or artificial regressions. (See Davidson and MacKinnon, Chapter 1, this volume, for a survey of artificial regressions.) A general variable addition statistic is obtained by regressing the OLS residuals obtained under the null on the $x_i$ and some additional test variables. In particular, the statistic is $N \cdot R_u^2$ from the regression
$\hat{u}_i$ on 1, $x_i$, $\hat{g}_i$, $\quad i = 1, 2, \ldots, N,$ (9.14)
where $\hat{u}_i \equiv y_i - \hat\beta_0 - x_i\hat\beta$ are the OLS residuals and $\hat{g}_i \equiv g(x_i, z_i, \hat\delta)$ is a $1 \times q$ vector of misspecification indicators. Notice that $\hat{g}_i$ is allowed to depend on some estimated nuisance parameters, $\hat\delta$, an $r \times 1$ vector.
When testing for misspecified functional form in (9.6), $\hat{g}_i$ depends only on $x_i$ (and possibly nuisance parameter estimates). For example, we can take $\hat{g}_i = g_i \equiv g(x_i)$, where $g(x)$ is a row vector of nonlinear functions of $x$. (Squares and cross products are common, but we can use more exotic functions, such as $g_j(x) = \exp(x_j)/(1 + \exp(x_j))$.) For RESET, $\hat{g}_i$ consists of powers of the OLS fitted values, $\hat{y}_i = \hat\beta_0 + x_i\hat\beta$, usually $\hat{y}_i^2$, $\hat{y}_i^3$, and possibly $\hat{y}_i^4$. The Davidson-MacKinnon (1981) test of (9.6) against the nonnested alternative (9.11) takes $\hat{g}_i$ to be the scalar $\hat\gamma_0 + h(x_i)\hat\gamma$, the fitted values from an OLS regression of $y_i$ on 1, $h(x_i)$, $i = 1, 2, \ldots, N$. Wooldridge (1992a) obtains the LM test of $\lambda = 1$ in the more general model $E(y \mid x) = (1 + \lambda(\beta_0 + x\beta))^{1/\lambda}$ when $y > 0$. Then, $\hat{g}_i = \hat{y}_i \log(\hat{y}_i)$, assuming that $\hat{y}_i > 0$ for all $i$.
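The variable addition regression in (9.14) can be sketched in a few lines. The example below is a minimal illustration using NumPy, with RESET indicators (the square and cube of the fitted values); the function names `ols_fit` and `reset_lm`, the simulated data, and the hard-coded 5% critical value are assumptions made for this sketch and do not come from the original text.

```python
import numpy as np

def ols_fit(y, X):
    """Return (residuals, fitted values) from OLS of y on X; X must contain the constant."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coef
    return y - fitted, fitted

def reset_lm(y, x, powers=(2, 3)):
    """N * R_u^2 from regressing the OLS residuals on 1, x, and powers of the fitted values."""
    n = y.shape[0]
    X = np.column_stack([np.ones(n), x])
    u_hat, y_hat = ols_fit(y, X)
    g_hat = np.column_stack([y_hat ** p for p in powers])
    e, _ = ols_fit(u_hat, np.column_stack([X, g_hat]))
    # u_hat has mean zero (the null model has an intercept), so its total
    # sum of squares is simply u_hat'u_hat.
    r2_u = 1.0 - (e @ e) / (u_hat @ u_hat)
    return n * r2_u

# Hypothetical simulated data: one linear and one quadratic conditional mean.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=(n, 2))
u = rng.normal(size=n)
y_linear = 1.0 + x @ np.array([0.5, -0.3]) + u
y_nonlinear = y_linear + 0.5 * x[:, 0] ** 2

CHI2_2_CRIT_5PCT = 5.991  # 5% critical value of a chi-square with 2 degrees of freedom
lm_linear = reset_lm(y_linear, x)
lm_nonlinear = reset_lm(y_nonlinear, x)
print("LM, linear mean:   ", lm_linear)
print("LM, quadratic mean:", lm_nonlinear)
```

With two powers of the fitted values there are $q = 2$ indicators, so under the null and homoskedasticity the statistic would be compared with a chi-square distribution with two degrees of freedom.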
As discussed above, the null hypothesis we are testing is given by (9.6). What auxiliary assumptions are needed to obtain a usable limiting distribution for $LM \equiv N \cdot R_u^2$? (We denote this statistic by LM because, when we are testing the null model against a more general alternative, it is the most popular form of the LM statistic.) By auxiliary assumptions we do not mean regularity conditions. In this setup, regularity conditions consist of assuming enough finite moments on the elements of $x_i$, $g_i \equiv g(x_i, \delta)$, and $u_i$, sufficient differentiability of $g(x_i, \cdot)$ over the interior of the nuisance parameter space, and $\sqrt{N}$-consistency of $\hat\delta$ for $\delta$ in the interior of the nuisance parameter space. We will say no more about these kinds of assumptions because they are rarely checked.
The key auxiliary assumption when testing for functional form, using the standard $N \cdot R^2$ statistic, is homoskedasticity (which, again, is stated in terms of the population):
$\mathrm{var}(y \mid x) = \mathrm{var}(u \mid x) = \sigma^2.$ (9.15)
(Notice how we do not restrict the variance of $u$ given explanatory variables that do not appear in $x$, as no such restrictions are needed.) It follows from Wooldridge (1991a) that, under (9.6) and (9.15),
$N \cdot R_u^2 \overset{a}{\sim} \chi_q^2,$ (9.16)
where we assume that there are no redundancies in $g(x_i, \delta)$. (Technically, the population residuals from the population regression of $g(x, \delta)$ on 1, $x$ must have variance matrix with full rank.)
When we test for omitted variables in equation (9.9), $\hat{g}_i = z_i$. If we test (9.8) against the nonnested alternative $E(y \mid x, z) = \gamma_0 + z\gamma$, there are two popular tests. Let $w$ be the elements of $z$ not in $x$. The $N \cdot R^2$ version of the F-statistic proposed by Mizon and Richard (1986) simply takes $\hat{g}_i = w_i$. In other words, we consider a composite model that contains all explanatory variables, and then test for joint significance of those variables not in $x$. The Davidson-MacKinnon test again takes $\hat{g}_i$ to be the fitted values from the alternative model. In these cases, the auxiliary assumption under $H_0$ is
$\mathrm{var}(y \mid x, z) = \mathrm{var}(u \mid x, z) = \sigma^2.$ (9.17)
Note that (9.15) is no longer sufficient.
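The two nonnested tests just described differ only in the choice of misspecification indicators, which the following sketch tries to make concrete. It is a hedged illustration, not the authors' code: the helper names `fit` and `variable_addition_lm`, the simulated design (data generated under the alternative model), and the hard-coded critical values are all assumptions introduced here.

```python
import numpy as np

def fit(y, X):
    """Return (fitted values, residuals) from OLS of y on X; X must contain the constant."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coef
    return fitted, y - fitted

def variable_addition_lm(y, x, g):
    """N * R_u^2 from regressing the null-model residuals on 1, x, and the indicators g."""
    n = y.shape[0]
    X = np.column_stack([np.ones(n), x])
    _, u_hat = fit(y, X)
    _, e = fit(u_hat, np.column_stack([X, g]))
    return n * (1.0 - (e @ e) / (u_hat @ u_hat))

rng = np.random.default_rng(4)
n = 1000
x = rng.normal(size=(n, 2))
w = rng.normal(size=(n, 2))            # elements of z that are not in x
z = np.column_stack([x[:, 0], w])      # z overlaps with x in its first element

# Hypothetical data generated under the alternative: the mean of y depends on z only.
y = 1.0 + z @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=n)

# Mizon-Richard style indicators: the variables in z that are not in x (q = 2).
lm_mr = variable_addition_lm(y, x, w)

# Davidson-MacKinnon style indicator: fitted values from the alternative model (q = 1).
z_fitted, _ = fit(y, np.column_stack([np.ones(n), z]))
lm_dm = variable_addition_lm(y, x, z_fitted)

CHI2_1_CRIT_5PCT, CHI2_2_CRIT_5PCT = 3.841, 5.991
print("Mizon-Richard N*R^2:     ", lm_mr, "reject:", lm_mr > CHI2_2_CRIT_5PCT)
print("Davidson-MacKinnon N*R^2:", lm_dm, "reject:", lm_dm > CHI2_1_CRIT_5PCT)
```

Because the fitted values from the alternative model are a linear combination of 1, the overlapping element of $x$, and $w$, the Davidson-MacKinnon regression can never have a larger $R^2$ than the regression that adds all of $w$; the tests differ in degrees of freedom rather than in the information they use.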
Rather than use an $N \cdot R^2$ statistic, an F-test is also asymptotically valid. We simply obtain the F-statistic for significance of $\hat{g}_i$ in the artificial model
$y_i = \beta_0 + x_i\beta + \hat{g}_i\gamma + \text{error}_i.$ (9.18)
The F-statistic is asymptotically valid in the sense that $q \cdot F \overset{a}{\sim} \chi_q^2$ under $H_0$ and the appropriate homoskedasticity assumption ((9.15) or (9.17)). From now on we focus on the LM form of the statistic.
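The closeness of the F and LM forms can be seen numerically. The sketch below is an illustration under assumed simulated data (the function names `ssr` and `lm_and_f` are hypothetical); it computes both statistics from the restricted and unrestricted sums of squared residuals and uses the exact finite-sample identity $q \cdot F = (N - k - 1 - q)\, R_u^2 / (1 - R_u^2)$, which follows from the definitions of the two statistics.

```python
import numpy as np

def ssr(y, X):
    """Sum of squared residuals from OLS of y on X; X must contain the constant."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

def lm_and_f(y, x, g):
    """Return (LM, F, q, df) for joint significance of g in y on 1, x, g."""
    n = y.shape[0]
    X = np.column_stack([np.ones(n), x])
    XG = np.column_stack([X, g])
    q = XG.shape[1] - X.shape[1]
    df = n - XG.shape[1]                 # N - k - 1 - q
    ssr_r, ssr_ur = ssr(y, X), ssr(y, XG)
    # R_u^2 from regressing the restricted residuals on 1, x, g equals
    # (SSR_r - SSR_ur) / SSR_r, so LM = N * R_u^2.
    lm = n * (ssr_r - ssr_ur) / ssr_r
    f_stat = ((ssr_r - ssr_ur) / q) / (ssr_ur / df)
    return lm, f_stat, q, df

# Hypothetical data generated under the null of a linear conditional mean.
rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=(n, 2))
y = 1.0 + x @ np.array([0.5, -0.3]) + rng.normal(size=n)
g = np.column_stack([x[:, 0] ** 2, x[:, 1] ** 2, x[:, 0] * x[:, 1]])

lm, f_stat, q, df = lm_and_f(y, x, g)
print("LM =", lm, " q*F =", q * f_stat)
```

For large $N$ and a null that holds, $R_u^2$ is close to zero and $(N - k - 1 - q)/N$ is close to one, which is why the two forms lead to essentially the same inference.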
Interestingly, when we test for omitted variables, the asymptotic result in (9.16) does not generally hold under (9.17) if we only assume $u$ and $z$ (and $u$ and $x$) are uncorrelated under $H_0$. To see this, as well as gain other insights into the asymptotic behavior of the LM statistic, it is useful to sketch the derivation of (9.16). First, straightforward algebra shows that LM can be written as
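$LM = \left(N^{-1/2}\sum_{i=1}^{N} \hat{g}_i'\hat{u}_i\right)' \left(\hat\sigma^2\, N^{-1}\sum_{i=1}^{N} \hat{r}_i'\hat{r}_i\right)^{-1} \left(N^{-1/2}\sum_{i=1}^{N} \hat{g}_i'\hat{u}_i\right),$ (9.19)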
where $\hat\sigma^2 = N^{-1}\sum_{i=1}^{N}\hat{u}_i^2$ and $\hat{r}_i \equiv \hat{g}_i - \hat\pi_0 - x_i\hat\Pi$ are the OLS residuals from a multivariate regression of $\hat{g}_i$ on 1, $x_i$, $i = 1, 2, \ldots, N$. Equation (9.19) makes it clear that the statistic is based on the sample covariance between $\hat{g}_i$ and $\hat{u}_i$. From (9.19), we see that the asymptotic distribution depends on the asymptotic distribution of
$N^{-1/2}\sum_{i=1}^{N} \hat{g}_i'\hat{u}_i.$
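The algebra behind (9.19) can be checked numerically: the quadratic form and $N \cdot R_u^2$ from the variable addition regression should agree up to rounding error. The sketch below assumes hypothetical simulated data with two test variables correlated with $x$; the helper name `residuals` is introduced only for this illustration.

```python
import numpy as np

def residuals(Y, X):
    """OLS residuals of Y on X; Y may have several columns (multivariate regression)."""
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return Y - X @ coef

rng = np.random.default_rng(2)
n = 500
x = rng.normal(size=(n, 2))
z = 0.5 * x + rng.normal(size=(n, 2))      # test variables, correlated with x
y = 1.0 + x @ np.array([0.5, -0.3]) + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
u_hat = residuals(y, X)                    # residuals from the null model
g_hat = z                                  # misspecification indicators
r_hat = residuals(g_hat, X)                # residuals of g_hat on 1, x (n x q)
sigma2_hat = np.mean(u_hat ** 2)

# Quadratic form: score' (sigma2_hat * N^{-1} sum r_hat'r_hat)^{-1} score.
score = g_hat.T @ u_hat / np.sqrt(n)
middle = sigma2_hat * (r_hat.T @ r_hat) / n
lm_quadratic = float(score @ np.linalg.solve(middle, score))

# N * R_u^2 from regressing u_hat on 1, x, g_hat (u_hat has mean zero).
e = residuals(u_hat, np.column_stack([X, g_hat]))
lm_nr2 = n * (1.0 - (e @ e) / (u_hat @ u_hat))

print(lm_quadratic, lm_nr2)
```

The agreement relies on the OLS residuals being orthogonal to 1 and $x_i$, so that the sample covariance between the indicators and the residuals is the same whether or not the indicators have first been purged of their linear dependence on $x_i$.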
As shown in Wooldridge (1990a, 1991a), under $H_0$ (either (9.6) or (9.8)),
$N^{-1/2}\sum_{i=1}^{N} \hat{g}_i'\hat{u}_i = N^{-1/2}\sum_{i=1}^{N} r_i'u_i + o_p(1),$ (9.20)
where $r_i \equiv g_i - \pi_0 - x_i\Pi$ are the population residuals from the population regression of $g_i$ on 1, $x_i$. Under $H_0$, $E(u_i \mid r_i) = 0$ (since $r_i$ is either a function of $x_i$ or of $(x_i, z_i)$), and so $E(r_i'u_i) = 0$. It follows that the sum on the right-hand side of (9.20) has a limiting $q$-variate normal distribution. Therefore, whether LM has a limiting chi-square distribution under $H_0$ hinges on whether $\hat\sigma^2\left(N^{-1}\sum_{i=1}^{N}\hat{r}_i'\hat{r}_i\right)$ is a consistent estimator of $\mathrm{var}(r_i'u_i) = E(u_i^2 r_i'r_i)$. By the law of iterated expectations,
$E(u_i^2 r_i'r_i) = E[E(u_i^2 \mid r_i)\, r_i'r_i] = \sigma^2 E(r_i'r_i),$ (9.21)
where the second equality holds provided $E(u_i^2 \mid r_i) = E(u_i^2) = \sigma^2$. For testing (9.8) under (9.17), $E(u_i^2 \mid x_i, z_i) = \mathrm{var}(u_i \mid x_i, z_i) = \sigma^2$, and $r_i$ is a function of $(x_i, z_i)$, so $E(u_i^2 \mid r_i) = \sigma^2$.
If we only assume $E(r_i'u_i) = 0$ (for example, $E(u) = 0$, $E(x'u) = 0$, and $E(z'u) = 0$ in (9.9)), then $E(u_i^2 \mid r_i)$ and $\mathrm{var}(u_i \mid r_i)$ are not necessarily the same, and (9.17) is no longer enough to ensure that LM has an asymptotic chi-square distribution.
We can also use (9.19) and (9.20) to see how LM behaves when the conditional mean null hypothesis holds but the auxiliary homoskedasticity assumption does not. An important point is that the representation in (9.20) holds with or without homoskedasticity, which implies that LM has a well-defined limiting distribution even if the conditional variance is not constant. Therefore, the rejection frequency tends to some number strictly less than one (typically, substantially below one), which means that a diagnostic test for the conditional mean has no systematic power to detect heteroskedasticity. Intuitively, it makes sense that a conditional mean test makes a poor test for heteroskedasticity. However, some authors have claimed that conditional mean diagnostics, such as RESET, have the ability to detect heteroskedasticity; the previous argument shows this simply is not true.
Without (9.21), the matrix in the quadratic form is not a consistent estimator of $\mathrm{var}(r_i'u_i)$, and so the limiting distribution of LM is not chi-square. The resulting test based on chi-square critical values may be asymptotically undersized or oversized, and it is difficult to know which is the case.
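It is now fairly well known how to adjust the usual LM statistic to allow for heteroskedasticity of unknown form under $H_0$. A computationally simple regression-based test has its roots in the Messer and White (1984) method for obtaining the heteroskedasticity-robust variance matrix of the OLS estimator, and was first proposed by Davidson and MacKinnon (1985). Subsequently, it was shown to be valid quite generally by Wooldridge (1990a). The simplest way to derive a heteroskedasticity-robust test is to note that a consistent estimator of $\mathrm{var}(r_i'u_i)$, with or without homoskedasticity, is $N^{-1}\sum_{i=1}^{N}\hat{u}_i^2\hat{r}_i'\hat{r}_i$. A useful algebraic fact is
$\sum_{i=1}^{N}\hat{g}_i'\hat{u}_i = \sum_{i=1}^{N}\hat{r}_i'\hat{u}_i.$
Therefore, the heteroskedasticity-robust LM statistic can be written as
$LM = \left(N^{-1/2}\sum_{i=1}^{N}\hat{r}_i'\hat{u}_i\right)' \left(N^{-1}\sum_{i=1}^{N}\hat{u}_i^2\hat{r}_i'\hat{r}_i\right)^{-1} \left(N^{-1/2}\sum_{i=1}^{N}\hat{r}_i'\hat{u}_i\right)$ (9.22)
$= N - SSR_0 = N \cdot R_0^2$ from the regression 1 on $\hat{u}_i\hat{r}_i$, $\quad i = 1, 2, \ldots, N,$ (9.23)
where $R_0^2$ is now the uncentered $R^2$ and $SSR_0$ is the usual sum of squared residuals. Under (9.6) or (9.8), the heteroskedasticity-robust LM statistic has an asymptotic $\chi_q^2$ distribution.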
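A common way to compute a heteroskedasticity-robust LM statistic of this kind, assumed here to match the regression-based form the text describes, is $N - SSR_0$ from regressing a column of ones on the products $\hat{u}_i\hat{r}_i$ without an intercept. The sketch below uses hypothetical simulated data with heteroskedastic errors; the function names `residuals` and `robust_lm` are introduced for illustration only.

```python
import numpy as np

def residuals(Y, X):
    """OLS residuals of Y on X; Y may have several columns."""
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return Y - X @ coef

def robust_lm(y, x, g):
    """Return (N - SSR_0, Z), where Z has rows u_hat_i * r_hat_i."""
    n = y.shape[0]
    X = np.column_stack([np.ones(n), x])
    u_hat = residuals(y, X)                               # null-model residuals
    g = np.asarray(g, dtype=float).reshape(n, -1)
    r_hat = residuals(g, X)                               # g purged of 1, x
    Z = u_hat[:, None] * r_hat                            # n x q products
    e = residuals(np.ones(n), Z)                          # regress 1 on Z, no intercept
    return n - float(e @ e), Z

# Hypothetical data: linear conditional mean, variance depending on x.
rng = np.random.default_rng(3)
n = 1000
x = rng.normal(size=(n, 2))
u = rng.normal(size=n) * np.sqrt(0.5 + x[:, 0] ** 2)
y = 1.0 + x @ np.array([0.5, -0.3]) + u
g = np.column_stack([x[:, 0] ** 2, x[:, 1] ** 2])

lm_robust, Z = robust_lm(y, x, g)

# Direct quadratic form with sum(u_hat^2 * r_hat'r_hat) as the middle matrix.
s = Z.sum(axis=0)
lm_direct = float(s @ np.linalg.solve(Z.T @ Z, s))
print(lm_robust, lm_direct)
```

The auxiliary regression is only a computational device: its explained sum of squares reproduces the quadratic form whose middle matrix does not impose homoskedasticity.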
As we mentioned in the introduction, for any diagnostic test it is important to know the alternatives against which it is consistent. Before we leave this subsection, we provide an example of how the previous tools can be used to shed light on conflicting claims about specification tests in the literature. Ramsey's RESET has often been touted as a general diagnostic that can detect, in addition to functional form problems, omitted variables. (See, e.g., Thursby (1979, 1989) and Godfrey (1988, section 4.2.2).) In fact, RESET, or any other test where the misspecification indicators are functions of $x_i$ (and possibly nuisance parameters), makes a poor test for omitted variables. To see why, suppose that $E(y \mid x, q) = \beta_0 + x\beta + \gamma q$, where $\gamma \neq 0$. We start with this model to emphasize that we are interested in the partial effect of each $x_j$, holding $q$, and the other elements of $x$, fixed. Now, suppose that $q$ is not observed. If $q$ is correlated with one or more elements of $x$, the OLS regression $y$ on 1, $x$, using a random sample of data, is biased and inconsistent for the $\beta_j$. What happens if we apply RESET or some other functional form test? Suppose that $q$ has a linear conditional expectation given $x$: $E(q \mid x) = \pi_0 + x\pi$. This is certainly the leading case; after all, we started with a linear model for $E(y \mid x, q)$. Then, by iterated expectations,
$E(y \mid x) = E[E(y \mid x, q) \mid x] = \beta_0 + x\beta + \gamma E(q \mid x)$
$= (\beta_0 + \gamma\pi_0) + x(\beta + \gamma\pi) \equiv \theta_0 + x\theta.$
In other words, regardless of the size of $\gamma$ or the amount of correlation between $q$ and $x$, $E(y \mid x)$ is linear in $x$, and so RESET generally converges in distribution to a quadratic form in a normal random vector. This means it is inconsistent against the omitted variable alternative. If $\mathrm{var}(y \mid x)$ is constant, as is the case if $\mathrm{var}(y \mid x, q)$ is constant and $\mathrm{var}(q \mid x)$ is constant, then RESET has a limiting chi-square distribution: its asymptotic power is equal to its asymptotic size. RESET would only detect the omission of $q$ if $E(q \mid x)$ is nonlinear. But then we could never distinguish between $\gamma \neq 0$ and $E(y \mid x)$ being nonlinear in $x$.
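A small Monte Carlo experiment can illustrate the argument. The sketch below is a hedged illustration with an assumed design (scalar $x$, normal errors, coefficient values chosen arbitrarily, 500 replications); the function names are hypothetical. It compares the rejection frequency of RESET when the omitted variable has a linear conditional mean given $x$ with the case where that conditional mean is quadratic.

```python
import numpy as np

CHI2_2_CRIT_5PCT = 5.991  # 5% critical value of a chi-square with 2 degrees of freedom

def ols(y, X):
    """Return (coefficients, residuals) from OLS of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, y - X @ coef

def reset_lm(y, x):
    """N * R_u^2 using the square and cube of the fitted values; also return the OLS coefficients."""
    n = y.shape[0]
    X = np.column_stack([np.ones(n), x])
    coef, u_hat = ols(y, X)
    y_hat = X @ coef
    W = np.column_stack([X, y_hat ** 2, y_hat ** 3])
    _, e = ols(u_hat, W)
    return n * (1.0 - (e @ e) / (u_hat @ u_hat)), coef

def simulate(linear_q, reps=500, n=500, seed=0):
    """Rejection frequency of RESET and mean OLS slope when q is omitted."""
    rng = np.random.default_rng(seed)
    rejections, slopes = 0, []
    for _ in range(reps):
        x = rng.normal(size=n)
        noise = rng.normal(size=n)
        # Omitted variable: conditional mean given x is 0.8*x (linear) or x^2 (nonlinear).
        q = 0.8 * x + noise if linear_q else x ** 2 + noise
        y = 1.0 + 1.0 * x + 2.0 * q + rng.normal(size=n)
        lm, coef = reset_lm(y, x)
        rejections += int(lm > CHI2_2_CRIT_5PCT)
        slopes.append(coef[1])
    return rejections / reps, float(np.mean(slopes))

rate_linear, slope_linear = simulate(linear_q=True)
rate_nonlinear, _ = simulate(linear_q=False)
print("rejection rate, linear E(q|x):   ", rate_linear)
print("mean OLS slope, linear E(q|x):   ", slope_linear)   # population value is 1 + 2*0.8 = 2.6
print("rejection rate, nonlinear E(q|x):", rate_nonlinear)
```

In the linear design the OLS slope centers on $\beta + \gamma\pi$ rather than $\beta$, so the omitted variable matters a great deal for estimation, yet the mean of $y$ given $x$ is still linear and RESET has nothing to detect.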
Finally, we can compare (9.19) and (9.22) to see that, when the homoskedasticity assumption holds, the statistics have the same limiting distribution under local alternatives. (See Davidson and MacKinnon (1993, section 12.2) for a discussion of local alternatives.) The statistics only differ in the $q \times q$ matrix in the quadratic form. Under homoskedasticity, these both converge in probability to $\sigma^2 E(r_i'r_i)$ under local alternatives. See Wooldridge (1990a) for a more general discussion.