A COMPANION TO Theoretical Econometrics
Sampling Theory Inference with Known Covariance Matrix
Writing $y_i = x_i'\beta + e_i$ so that all $N$ observations are included yields the familiar matrix expression
$$y = X\beta + e, \tag{4.2}$$
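where $y$ and $e$ are of dimension $(N \times 1)$ and $X$ is of dimension $(N \times K)$, with rank $K$. The assumption of heteroskedastic $y$ can be written as
$$E[(y - X\beta)(y - X\beta)'] = E[ee'] = V = \sigma^2\Lambda, \tag{4.3}$$
where
$$V = \operatorname{diagonal}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_N^2) = \sigma^2\operatorname{diagonal}(\lambda_1, \lambda_2, \ldots, \lambda_N) = \sigma^2\Lambda. \tag{4.4}$$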
In equation (4.4) a constant $\sigma^2$ has been factored out of $V$, yielding a matrix $\Lambda$ of ratios $\lambda_i = \sigma_i^2/\sigma^2$. This factoring device is useful (i) when $\Lambda$ is known but $\sigma^2$ is not, and (ii) if a heteroskedastic specification has a constant component ($\sigma^2$ in this case) and a component that varies over observations. The constant that is factored out is arbitrary and, in practice, is chosen for convenience.
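The generalized least squares estimator for $\beta$ which, from the Gauss-Markov Theorem, is known to be the best linear unbiased estimator, is given by
$$\begin{aligned}
\hat\beta &= (X'V^{-1}X)^{-1}X'V^{-1}y = \left(\sum_{i=1}^{N}\frac{x_i x_i'}{\sigma_i^2}\right)^{-1}\sum_{i=1}^{N}\frac{x_i y_i}{\sigma_i^2} \\
&= (X'\Lambda^{-1}X)^{-1}X'\Lambda^{-1}y = \left(\sum_{i=1}^{N}\lambda_i^{-1}x_i x_i'\right)^{-1}\sum_{i=1}^{N}\lambda_i^{-1}x_i y_i.
\end{aligned} \tag{4.5}$$
The right-hand expressions in equation (4.5) emphasize the weighted nature of the generalized least squares estimator. Each observation ($x_i$ and $y_i$) is weighted by the inverse standard deviation $\sigma_i^{-1}$, or a quantity proportional to it, $\lambda_i^{-1/2}$. Observations that are less reliable because they come from a distribution with a large variance are weighted less than more reliable observations where $\sigma_i^2$ is smaller. The mean and covariance matrix of the generalized least squares estimator are given by $E[\hat\beta] = \beta$ and $V_{\hat\beta}$, respectively, where
$$V_{\hat\beta} = (X'V^{-1}X)^{-1} = \sigma^2(X'\Lambda^{-1}X)^{-1}. \tag{4.6}$$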
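Practical application of (4.5) and (4.6) requires knowledge of at least $\Lambda$. For inference purposes, an unbiased estimator for $\sigma^2$ can be found from
$$\hat\sigma^2 = \frac{(y - X\hat\beta)'\Lambda^{-1}(y - X\hat\beta)}{N - K}. \tag{4.7}$$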
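The calculations described above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the variance ratios $\lambda_i$ are known; the function name `gls_known_lambda` and the simulated data are hypothetical and are not taken from the text. Generalized least squares is computed as ordinary least squares on observations weighted by $\lambda_i^{-1/2}$.

```python
import numpy as np

def gls_known_lambda(y, X, lam):
    """GLS for y = X beta + e with var(e_i) = sigma^2 * lam[i], lam known."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    lam = np.asarray(lam, dtype=float)
    n_obs, k = X.shape
    # Weight each observation by lam_i^{-1/2}; OLS on the weighted data is GLS.
    w = 1.0 / np.sqrt(lam)
    Xs = X * w[:, None]
    ys = y * w
    XtLiX = Xs.T @ Xs                               # X' Lambda^{-1} X
    beta_hat = np.linalg.solve(XtLiX, Xs.T @ ys)
    resid = ys - Xs @ beta_hat                      # weighted residuals
    sigma2_hat = resid @ resid / (n_obs - k)        # unbiased estimator of sigma^2
    cov_beta = sigma2_hat * np.linalg.inv(XtLiX)    # estimated version of (4.6)
    return beta_hat, sigma2_hat, cov_beta

# Illustrative data (made up for this sketch, with sigma^2 = 1)
rng = np.random.default_rng(0)
N = 50
X = np.column_stack([np.ones(N), rng.uniform(1.0, 10.0, N)])
lam = X[:, 1] ** 2                                  # assumed known variance ratios
y = X @ np.array([1.0, 0.5]) + rng.normal(0.0, np.sqrt(lam))
beta_hat, sigma2_hat, cov_beta = gls_known_lambda(y, X, lam)
```

Because the factored-out constant is arbitrary, multiplying `lam` by any positive number should leave `beta_hat` and `cov_beta` unchanged and rescale only `sigma2_hat`.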
Although most applications proceed by refining the specification of $\Lambda$ into one that contains a reduced number of parameters that is constant for changing sample size, there are some scenarios where knowledge of $\Lambda$ is a reasonable assumption. To illustrate one such example, suppose that we are interested in an industry cost function that can be written as
$$y_{ij} = x_{ij}'\beta + e_{ij}, \tag{4.8}$$
where the double subscript $(i, j)$ refers to the $j$th firm in the $i$th industry. Suppose also that the $e_{ij}$ are independent with $\operatorname{var}(e_{ij}) = \sigma^2$ (a constant) and that there are $n_i$ firms in the $i$th industry. A model for data obtained by averaging over all firms in each industry is given by
$$\frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij} = \frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij}'\beta + \frac{1}{n_i}\sum_{j=1}^{n_i} e_{ij}$$
or
$$\bar y_i = \bar x_i'\beta + \bar e_i.$$
The variance of the error term is
$$\operatorname{var}(\bar e_i) = \frac{1}{n_i^2}\sum_{j=1}^{n_i}\operatorname{var}(e_{ij}) = \frac{1}{n_i^2}\,n_i\sigma^2 = \frac{\sigma^2}{n_i}.$$
That is, $\bar e_i$ is heteroskedastic, with its variance depending on the number of firms used to compute the average industry data. Provided this number is available, the matrix $\Lambda$ is known, with its inverse given by
$$\Lambda^{-1} = \operatorname{diagonal}(n_1, n_2, \ldots, n_N).$$
The generalized least squares procedure can be applied. It recognizes that industry observations obtained by averaging a large number of firms are more reliable than those obtained by averaging a small number of firms.
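The industry-average example can be checked numerically. The sketch below uses made-up firm counts and simulated data, and it adds one assumption that is not in the text: the regressors are constant within each industry. Under that assumption, generalized least squares on the industry averages with weights $n_i$ should reproduce ordinary least squares on the firm-level data exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
n = np.array([3, 8, 20, 50, 5, 12])     # hypothetical number of firms per industry
N = n.size
# Regressors assumed constant within each industry (assumption of this sketch)
x_ind = np.column_stack([np.ones(N), rng.uniform(0.0, 5.0, N)])
beta = np.array([2.0, 1.5])

# Firm-level data: y_ij = x_i' beta + e_ij with var(e_ij) = sigma^2 = 1
industry = np.repeat(np.arange(N), n)   # industry index of each firm
X_firm = x_ind[industry]
y_firm = X_firm @ beta + rng.normal(0.0, 1.0, industry.size)

# Industry averages: var(ebar_i) = sigma^2 / n_i, so Lambda^{-1} = diagonal(n_i)
y_bar = np.bincount(industry, weights=y_firm) / n
w = np.sqrt(n)                          # lambda_i^{-1/2} = sqrt(n_i)
Xs, ys = x_ind * w[:, None], y_bar * w
beta_gls = np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)

# Firm-level OLS for comparison
beta_firm = np.linalg.lstsq(X_firm, y_firm, rcond=None)[0]
```

The agreement between `beta_gls` and `beta_firm` reflects the weighting: an industry averaged over 50 firms receives far more weight than one averaged over 3.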
To construct confidence intervals for the elements in $\beta$ or to test hypotheses about the elements in $\beta$, one can assume the error vector $e$ is normally distributed and proceed with finite sample inference procedures, or one can use large sample approximate inference procedures without the assumption of normally distributed errors. When the errors are normally distributed the following results hold:
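$$\frac{(N - K)\hat\sigma^2}{\sigma^2} \sim \chi^2_{(N-K)} \tag{4.10}$$
$$R\hat\beta \sim N\left[R\beta,\ \sigma^2 R(X'\Lambda^{-1}X)^{-1}R'\right] \tag{4.11}$$
$$\frac{(R\hat\beta - R\beta)'[R(X'\Lambda^{-1}X)^{-1}R']^{-1}(R\hat\beta - R\beta)}{\sigma^2} \sim \chi^2_{(J)} \tag{4.12}$$
$$F = \frac{(R\hat\beta - R\beta)'[R(X'\Lambda^{-1}X)^{-1}R']^{-1}(R\hat\beta - R\beta)/J}{\hat\sigma^2} \sim F_{[J,\,(N-K)]} \tag{4.13}$$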
In the above expressions $R$ is a $(J \times K)$ matrix of rank $J$ whose elements define the quantity for which inference is sought. These results parallel those for the general linear model with independent, identically distributed error terms, and can be derived from them by using a straightforward transformation. For details of the transformation, and more details of how equations (4.10)-(4.13) are used for hypothesis testing and interval estimation, see, for example, Judge et al. (1988, ch. 8). When the errors are not assumed to be normally distributed, approximate large sample inference is based on equation (4.12) with $\sigma^2$ replaced by $\hat\sigma^2$.
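As an illustration of how the $F$ statistic might be computed, the following sketch tests a hypothesis of the form $R\beta = r$, so that $R\beta$ in the expressions above is replaced by its hypothesized value. The vector `r`, the function name `gls_f_statistic`, and the simulated data are assumptions made for the example. The calculation uses the transformation that weights each observation by $\lambda_i^{-1/2}$.

```python
import numpy as np

def gls_f_statistic(y, X, lam, R, r):
    """F statistic for H0: R beta = r when the variance ratios lam are known."""
    w = 1.0 / np.sqrt(lam)
    Xs, ys = X * w[:, None], y * w            # transformed (weighted) data
    n_obs, k = Xs.shape
    J = R.shape[0]
    A_inv = np.linalg.inv(Xs.T @ Xs)          # (X' Lambda^{-1} X)^{-1}
    beta_hat = A_inv @ Xs.T @ ys
    resid = ys - Xs @ beta_hat
    sigma2_hat = resid @ resid / (n_obs - k)
    d = R @ beta_hat - r
    quad = d @ np.linalg.solve(R @ A_inv @ R.T, d)
    return (quad / J) / sigma2_hat

# Illustrative data (made up); the null hypothesis is true in this simulation
rng = np.random.default_rng(2)
N = 40
X = np.column_stack([np.ones(N), rng.normal(size=N), rng.normal(size=N)])
lam = rng.uniform(0.5, 4.0, N)
y = X @ np.array([1.0, 0.0, 0.0]) + rng.normal(0.0, np.sqrt(lam))
R = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])               # H0: beta_2 = beta_3 = 0
r = np.zeros(2)
F = gls_f_statistic(y, X, lam, R, r)
```

Under normality, `F` would be compared with a critical value from the $F$ distribution with $J$ and $N - K$ degrees of freedom. For exclusion restrictions such as these, the same value can be obtained from the restricted and unrestricted sums of squared residuals of the transformed model.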
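For inferences about nonlinear functions of the elements of $\beta$ that cannot be written as $R\beta$, we consider functions of the form $g(\beta) = 0$ where $g(\cdot)$ is a $J$-dimensional vector function. Inference can be based on the approximate result
$$g(\hat\beta)'\left[G\,\hat\sigma^2(X'\Lambda^{-1}X)^{-1}G'\right]^{-1}g(\hat\beta) \overset{a}{\sim} \chi^2_{(J)},$$
where $G$ is the $(J \times K)$ matrix of partial derivatives, with rank $J$,
$$G = \left.\frac{\partial g(\beta)}{\partial \beta'}\right|_{\beta = \hat\beta}.$$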
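A test of a nonlinear restriction along these lines might be computed as in the sketch below. It assumes the standard delta-method (Wald) form, in which the estimated covariance matrix of the generalized least squares estimator is pre- and post-multiplied by the matrix of partial derivatives evaluated at the estimate. The function name, the numerical estimates, and the restriction itself are hypothetical.

```python
import numpy as np

def wald_nonlinear(beta_hat, cov_beta, g, jac):
    """Wald statistic g(b)'[G V G']^{-1} g(b) for H0: g(beta) = 0."""
    gval = np.atleast_1d(g(beta_hat))
    G = np.atleast_2d(jac(beta_hat))       # (J x K) matrix of partial derivatives
    return gval @ np.linalg.solve(G @ cov_beta @ G.T, gval)

# Hypothetical GLS estimates and estimated covariance matrix
beta_hat = np.array([1.2, 0.8, 0.4])
cov_beta = np.array([[0.040, 0.010, 0.000],
                     [0.010, 0.090, 0.020],
                     [0.000, 0.020, 0.060]])

def g(b):
    # Arbitrary nonlinear restriction for illustration: beta_2 * beta_3 = 0.5
    return b[1] * b[2] - 0.5

def jac(b):
    # Analytic partial derivatives of g with respect to (beta_1, beta_2, beta_3)
    return np.array([0.0, b[2], b[1]])

W = wald_nonlinear(beta_hat, cov_beta, g, jac)
```

Here $J = 1$, so `W` would be compared with a critical value from the chi-square distribution with one degree of freedom. When `g` is linear, the statistic reduces to the quadratic form used for linear restrictions.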
Three categories of tests frequently used in econometrics are the Wald, Lagrange multiplier, and likelihood ratio tests. In the context of the scenarios discussed so far (hypothesis tests about $\beta$ in a model with covariance matrix $\sigma^2\Lambda$, with $\Lambda$ known), all three testing principles lead to the results given above. The only difference is that, in a Lagrange multiplier test, the estimate for $\sigma^2$ is based on the restricted rather than unrestricted generalized least squares residuals.
Further details on estimation and hypothesis testing for the case of a known error covariance matrix can be found in standard textbooks such as Judge et al. (1988, chs. 8, 9), Greene (1997, ch. 12) and Baltagi (1998, chs. 5, 9). Of particular interest might be the consequences of using the ordinary least squares (OLS) estimator $b = (X'X)^{-1}X'y$ in the presence of heteroskedastic errors. It is well known that, under these circumstances, the OLS estimator is inefficient and that the estimated covariance matrix $\hat\sigma^2(X'X)^{-1}$ is a biased estimate of the true covariance matrix $\sigma^2(X'X)^{-1}X'\Lambda X(X'X)^{-1}$. Examples of inefficiencies and bias are given in most textbook treatments.
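The inefficiency and bias can be illustrated numerically. The sketch below uses a made-up design in which the variance ratios grow with the square of the regressor, and compares three matrices: the true covariance matrix of OLS, the covariance matrix of generalized least squares, and the expected value of the usual OLS covariance formula. The last uses the fact that the OLS residual sum of squares has expectation $\sigma^2\operatorname{tr}(M\Lambda)$, with $M = I - X(X'X)^{-1}X'$.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 30
X = np.column_stack([np.ones(N), rng.uniform(1.0, 10.0, N)])
lam = X[:, 1] ** 2          # assumed variance ratios, growing with the regressor
sigma2 = 1.0
K = X.shape[1]

XtX_inv = np.linalg.inv(X.T @ X)
# True covariance matrix of OLS: sigma^2 (X'X)^{-1} X' Lambda X (X'X)^{-1}
cov_ols = sigma2 * XtX_inv @ (X.T * lam) @ X @ XtX_inv
# Covariance matrix of GLS: sigma^2 (X' Lambda^{-1} X)^{-1}
cov_gls = sigma2 * np.linalg.inv((X.T / lam) @ X)
# Expected value of the usual OLS formula sigma_hat^2 (X'X)^{-1}
M = np.eye(N) - X @ XtX_inv @ X.T
cov_naive = sigma2 * np.trace(M * lam) / (N - K) * XtX_inv

print("OLS variances:       ", np.diag(cov_ols))
print("GLS variances:       ", np.diag(cov_gls))
print("Usual formula (mean):", np.diag(cov_naive))
```

The difference `cov_ols - cov_gls` should be positive semidefinite, in line with the Gauss-Markov result, while `cov_naive` generally differs from `cov_ols`; the size and direction of that difference depend on the design chosen here.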