Testing in a Pooled Model
(1) The Chow-Test
Before pooling the data one may be concerned about whether the data are poolable. This hypothesis is also known as the stability of the regression equation across firms or across time. It can be formulated in terms of an unrestricted model which involves a separate regression equation for each firm:

$y_i = Z_i\delta_i + u_i$ for $i = 1, 2, \ldots, N$    (12.41)

where $y_i' = (y_{i1},\ldots,y_{iT})$, $Z_i = [\iota_T, X_i]$ and $X_i$ is $T\times K$. $\delta_i'$ is $1\times(K+1)$ and $u_i$ is $T\times 1$. The important thing to notice is that $\delta_i$ is different for every regression. We want to test the hypothesis $H_0\colon \delta_i = \delta$ for all $i$, versus $H_1\colon \delta_i \neq \delta$ for some $i$. Under $H_0$ we can write the restricted version of (12.41) as

$y = Z\delta + u$    (12.42)

where $Z' = (Z_1', Z_2', \ldots, Z_N')$ and $u' = (u_1', u_2', \ldots, u_N')$. The unrestricted model can also be written as

$y = Z^*\delta^* + u$    (12.43)

where $Z^* = \mathrm{diag}[Z_i]$ is block-diagonal and $\delta^{*\prime} = (\delta_1', \delta_2', \ldots, \delta_N')$. Note that $Z = Z^*I^*$, where $I^* = \iota_N \otimes I_{K'}$ and $K' = K + 1$. Hence the variables in $Z$ are all linear combinations of the variables in $Z^*$. Under the assumption that $u \sim N(0, \sigma^2 I_{NT})$, the MVU estimator for $\delta$ in equation (12.42) is
$\hat\delta_{OLS} = \hat\delta_{MLE} = (Z'Z)^{-1}Z'y$    (12.44)

and therefore

$y = Z\hat\delta_{OLS} + e$    (12.45)

implying that $e = (I_{NT} - Z(Z'Z)^{-1}Z')y = My = M(Z\delta + u) = Mu$ since $MZ = 0$. Similarly, under the alternative, the MVU estimator for $\delta_i$ is given by

$\hat\delta_{i,OLS} = \hat\delta_{i,MLE} = (Z_i'Z_i)^{-1}Z_i'y_i$    (12.46)

and therefore

$y_i = Z_i\hat\delta_{i,OLS} + e_i$    (12.47)
One can easily deduce that $y = Z^*\hat\delta^*_{OLS} + e^*$ with $e^* = M^*y = M^*u$ and $\hat\delta^*_{OLS} = (Z^{*\prime}Z^*)^{-1}Z^{*\prime}y$, where $M^* = I_{NT} - Z^*(Z^{*\prime}Z^*)^{-1}Z^{*\prime}$. Note that both $M$ and $M^*$ are symmetric and idempotent with $MM^* = M^*$. This easily follows since

$Z(Z'Z)^{-1}Z'Z^*(Z^{*\prime}Z^*)^{-1}Z^{*\prime} = Z(Z'Z)^{-1}I^{*\prime}Z^{*\prime}Z^*(Z^{*\prime}Z^*)^{-1}Z^{*\prime} = Z(Z'Z)^{-1}I^{*\prime}Z^{*\prime} = Z(Z'Z)^{-1}Z'$

This uses the fact that $Z = Z^*I^*$. Now, $e'e - e^{*\prime}e^* = u'(M - M^*)u$ and $e^{*\prime}e^* = u'M^*u$ are independent since $(M - M^*)M^* = 0$. Also, both quadratic forms, when divided by $\sigma^2$, are distributed as $\chi^2$ since $(M - M^*)$ and $M^*$ are idempotent, see Judge et al. (1985). Dividing these quadratic forms by their respective degrees of freedom and taking their ratio leads to the following test statistic:
$F_{obs} = \dfrac{(e'e - e_1'e_1 - e_2'e_2 - \cdots - e_N'e_N)/(N-1)K'}{(e_1'e_1 + e_2'e_2 + \cdots + e_N'e_N)/N(T-K')}$    (12.48)
Under $H_0$, $F_{obs}$ is distributed as an $F((N-1)K', N(T-K'))$, see lemma 2.2 of Fisher (1970). This is exactly Chow's (1960) test extended to the case of $N$ linear regressions.
The URSS in this case is the sum of the $N$ residual sums of squares obtained by applying OLS to (12.41), i.e., to each firm equation separately. The RRSS is simply the RSS from OLS performed on the pooled regression given by (12.42). In this case, there are $(N-1)K'$ restrictions and the URSS has $N(T-K')$ degrees of freedom. Similarly, one can test the stability of the regression across time. In this case, the degrees of freedom are $(T-1)K'$ and $T(N-K')$, respectively. Both tests target the whole set of regression coefficients, including the constant. If the LSDV model is suspected to be the proper specification, then the intercepts are allowed to vary but the slopes remain the same. To test the stability of the slopes only, the same Chow-test can be utilized; however, the RRSS is now that of the LSDV regression with firm (or time) dummies only. The number of restrictions becomes $(N-1)K$ for testing the stability of the slopes across firms and $(T-1)K$ for testing their stability across time.
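To make the mechanics of (12.48) concrete, the following sketch computes the poolability F-statistic from the restricted (pooled) and unrestricted (firm-by-firm) residual sums of squares. It is a minimal illustration, not from the text: the array layout (`y` of shape (N, T), `X` of shape (N, T, K) without a constant) and the helper `ols_rss` are assumptions of the sketch.

```python
import numpy as np
from scipy import stats

def ols_rss(y, Z):
    """Residual sum of squares from an OLS regression of y on Z."""
    coef, _, _, _ = np.linalg.lstsq(Z, y, rcond=None)
    e = y - Z @ coef
    return float(e @ e)

def chow_poolability(y, X):
    """Chow F-test of H0: delta_i = delta for all i (poolability across firms).

    y : (N, T) array of the dependent variable, one row per firm
    X : (N, T, K) array of regressors, constant excluded
    Returns the F statistic, its degrees of freedom and the p-value.
    """
    N, T, K = X.shape
    Kp = K + 1                                   # K' = K + 1 (intercept included)
    add_const = lambda M: np.column_stack([np.ones(M.shape[0]), M])

    # Restricted model: pooled OLS on all NT observations -> RRSS = e'e
    rrss = ols_rss(y.reshape(N * T), add_const(X.reshape(N * T, K)))

    # Unrestricted model: separate OLS for each firm -> URSS = sum_i e_i'e_i
    urss = sum(ols_rss(y[i], add_const(X[i])) for i in range(N))

    df1, df2 = (N - 1) * Kp, N * (T - Kp)
    F = ((rrss - urss) / df1) / (urss / df2)
    return F, (df1, df2), stats.f.sf(F, df1, df2)
```

Swapping the roles of $i$ and $t$ ($T$ cross-section regressions, each with $N$ observations) gives the corresponding test for poolability across time, with degrees of freedom $(T-1)K'$ and $T(N-K')$.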
The Chow-test, however, is proper under spherical disturbances, and if that hypothesis is not correct it will lead to improper inference. Baltagi (1981) showed that if the true specification of the disturbances is an error components structure, then the Chow-test tends to reject poolability too often when in fact it is true. However, a generalization of the Chow-test which takes care of the general variance-covariance matrix is available in Zellner (1962). This is exactly the test of the null hypothesis $H_0\colon R\beta = r$ when $\Omega$ is that of the error components specification, see Chapter 9. Baltagi (1981) shows that this test performs well in Monte Carlo experiments. In this case, all we need to do is transform our model (under both the null and alternative hypotheses) such that the transformed disturbances have variance $\sigma^2 I_{NT}$, then apply the Chow-test to the transformed model. The latter step is legitimate because the transformed disturbances are homoskedastic and serially uncorrelated, so the usual Chow-test applies. Given $\Omega = \sigma^2\Sigma$, we premultiply the restricted model given in (12.42) by $\Sigma^{-1/2}$ and we call $\Sigma^{-1/2}y = \tilde y$, $\Sigma^{-1/2}Z = \tilde Z$ and $\Sigma^{-1/2}u = \tilde u$. Hence
$\tilde y = \tilde Z\delta + \tilde u$    (12.49)

with $E(\tilde u\tilde u') = \Sigma^{-1/2}E(uu')\Sigma^{-1/2\prime} = \sigma^2 I_{NT}$. Similarly, we premultiply the unrestricted model given in (12.43) by $\Sigma^{-1/2}$ and we call $\Sigma^{-1/2}Z^* = \tilde Z^*$. Therefore

$\tilde y = \tilde Z^*\delta^* + \tilde u$    (12.50)

with $E(\tilde u\tilde u') = \sigma^2 I_{NT}$.
At this stage, we can test $H_0\colon \delta_i = \delta$ for every $i = 1, 2, \ldots, N$, simply by using the Chow-statistic, only now on the transformed models (12.49) and (12.50), since they satisfy $\tilde u \sim N(0, \sigma^2 I_{NT})$. Note that $\tilde Z = \tilde Z^*I^*$, which is simply obtained from $Z = Z^*I^*$ by premultiplying by $\Sigma^{-1/2}$. Defining $\tilde M = I_{NT} - \tilde Z(\tilde Z'\tilde Z)^{-1}\tilde Z'$ and $\tilde M^* = I_{NT} - \tilde Z^*(\tilde Z^{*\prime}\tilde Z^*)^{-1}\tilde Z^{*\prime}$, it is easy to show that $\tilde M$ and $\tilde M^*$ are both symmetric, idempotent and such that $\tilde M\tilde M^* = \tilde M^*$. Once again the conditions for lemma 2.2 of Fisher (1970) are satisfied, and the test statistic

$\tilde F_{obs} = \dfrac{(\tilde e'\tilde e - \tilde e^{*\prime}\tilde e^*)/(N-1)K'}{\tilde e^{*\prime}\tilde e^*/N(T-K')}$    (12.51)

is distributed as $F((N-1)K', N(T-K'))$ under $H_0$, where $\tilde e = \tilde y - \tilde Z\tilde\delta_{OLS}$ with $\tilde\delta_{OLS} = (\tilde Z'\tilde Z)^{-1}\tilde Z'\tilde y$, implying that $\tilde e = \tilde M\tilde y = \tilde M\tilde u$. Similarly, $\tilde e^* = \tilde y - \tilde Z^*\tilde\delta^*_{OLS}$ with $\tilde\delta^*_{OLS} = (\tilde Z^{*\prime}\tilde Z^*)^{-1}\tilde Z^{*\prime}\tilde y$, implying that $\tilde e^* = \tilde M^*\tilde y = \tilde M^*\tilde u$. This is the Chow-test after premultiplying the model by $\Sigma^{-1/2}$, or simply applying the Fuller and Battese (1974) transformation. See Baltagi (2008) for details.
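The generalized (error components) version of the test therefore only changes the data fed into the same F-ratio. Below is a minimal sketch of the Fuller and Battese (1974) quasi-demeaning for a balanced panel; it assumes the variance components `sigma2_nu` and `sigma2_mu` have already been estimated and reuses the hypothetical `chow_poolability` helper from the previous sketch. Dropping the factor $\sigma_\nu$ from $\Sigma^{-1/2}$ is harmless because it cancels in the numerator and denominator of the F-ratio.

```python
import numpy as np

def fuller_battese_transform(y, X, sigma2_nu, sigma2_mu):
    """Quasi-demean each firm's data: w_it - theta * wbar_i.

    This is sigma_nu * Sigma^{-1/2} applied to a balanced one-way error
    components model, with theta = 1 - sigma_nu / sqrt(sigma_nu^2 + T*sigma_mu^2).
    y : (N, T) array,  X : (N, T, K) array.
    """
    T = y.shape[1]
    theta = 1.0 - np.sqrt(sigma2_nu / (sigma2_nu + T * sigma2_mu))
    y_t = y - theta * y.mean(axis=1, keepdims=True)
    X_t = X - theta * X.mean(axis=1, keepdims=True)
    return y_t, X_t

# Poolability test under the error components assumption:
#   y_t, X_t = fuller_battese_transform(y, X, s2_nu_hat, s2_mu_hat)
#   F, dfs, pval = chow_poolability(y_t, X_t)
```

Re-adding a plain intercept inside `chow_poolability` is innocuous here: the transformed intercept column equals $(1-\theta)$ times a column of ones, which spans the same space and therefore leaves the residual sums of squares unchanged.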
For the gasoline data in Baltagi and Griffin (1983), Chow's test for poolability across countries yields an observed F-statistic of 129.38 and is distributed as $F(68, 270)$ under $H_0\colon \delta_i = \delta$ for $i = 1, \ldots, N$. This tests the stability of four time-series regression coefficients across 18 countries. The unrestricted SSE is based upon 18 OLS time-series regressions, one for each country. For the stability of the slope coefficients only, $H_0\colon \beta_i = \beta$, an observed F-value of 27.33 is obtained, which is distributed as $F(51, 270)$ under the null. Chow's test for poolability across time yields an F-value of 0.276 which is distributed as $F(72, 266)$ under $H_0\colon \delta_t = \delta$ for $t = 1, \ldots, T$. This tests the stability of four cross-section regression coefficients across 19 time periods. The unrestricted SSE is based upon 19 OLS cross-section regressions, one for each year. This does not reject poolability across time periods. The test for poolability across countries, allowing for a one-way error components model, yields an F-value of 21.64 which is distributed as $F(68, 270)$ under $H_0\colon \delta_i = \delta$ for $i = 1, \ldots, N$. The test for poolability across time yields an F-value of 1.66 which is distributed as $F(72, 266)$ under $H_0\colon \delta_t = \delta$ for $t = 1, \ldots, T$. This rejects $H_0$ at the 5% level.
(2) The Breusch-Pagan Test

Breusch and Pagan (1980) derived a Lagrange Multiplier test for the null hypothesis of no random individual effects, $H_0\colon \sigma_\mu^2 = 0$. The LM statistic is given by

$LM = \dfrac{NT}{2(T-1)}\left[\dfrac{\sum_{i=1}^N e_{i.}^2}{\sum_{i=1}^N\sum_{t=1}^T e_{it}^2} - 1\right]^2$    (12.52)

where $e_{it}$ denotes the OLS residuals from the pooled model and $e_{i.}$ denotes their sum over $t$. Under the null hypothesis $H_0$ this LM statistic is distributed as a $\chi_1^2$. For the gasoline data in Baltagi and Griffin (1983), the Breusch and Pagan LM test yields an LM statistic of 1465.6. This is obtained using the Stata command xttest0 after estimating the model with random effects. This is significant and rejects the null hypothesis. The corresponding likelihood ratio test, assuming Normal disturbances, is also reported by the Stata maximum likelihood output for the random effects model. This yields an LR statistic of 463.97 which is asymptotically distributed as $\chi_1^2$ under the null hypothesis $H_0$ and is also significant.
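As a quick illustration of (12.52), the statistic can be computed directly from the pooled OLS residuals. The sketch below assumes a balanced panel with the residuals ordered firm by firm; the function name is illustrative.

```python
import numpy as np
from scipy import stats

def breusch_pagan_lm(e, N, T):
    """Breusch-Pagan (1980) LM statistic for H0: sigma2_mu = 0.

    e : pooled OLS residuals, ordered so that e.reshape(N, T)[i] holds the
        T residuals of firm i. Under H0 the statistic is chi-squared(1).
    """
    e = np.asarray(e).reshape(N, T)
    A = (e.sum(axis=1) ** 2).sum() / (e ** 2).sum()   # sum_i e_i.^2 / sum_it e_it^2
    lm = N * T / (2.0 * (T - 1.0)) * (A - 1.0) ** 2
    return lm, stats.chi2.sf(lm, 1)
```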
One problem with the Breusch-Pagan test is that it assumes that the alternative hypothesis is two-sided, when we know that the variance component $\sigma_\mu^2$ cannot be negative. A one-sided version of this test is given by Honda (1985):

$HO = \sqrt{\dfrac{NT}{2(T-1)}}\left[\dfrac{e'(I_N\otimes J_T)e}{e'e} - 1\right]$    (12.53)

where $e$ denotes the vector of OLS residuals. Note that the square of this $N(0,1)$ statistic is the Breusch and Pagan (1980) LM test statistic. Honda (1985) finds that this test statistic is uniformly most powerful and robust to non-normality. However, Moulton and Randolph (1989) showed that the asymptotic $N(0,1)$ approximation for this one-sided LM statistic can be poor even in large samples. They suggest an alternative Standardized Lagrange Multiplier (SLM) test whose asymptotic critical values are generally closer to the exact critical values than those of the LM test. This SLM test statistic centers and scales the one-sided LM statistic so that its mean is zero and its variance is one:
$SLM = \dfrac{HO - E(HO)}{\sqrt{\mathrm{var}(HO)}} = \dfrac{d - E(d)}{\sqrt{\mathrm{var}(d)}}$    (12.54)
where $d = e'De/e'e$ and $D = (I_N\otimes J_T)$. Using the results on moments of quadratic forms in regression residuals, see, e.g., Evans and King (1985), we get
$E(d) = \mathrm{tr}(DP_Z)/p$

and

$\mathrm{var}(d) = 2\left\{p\,\mathrm{tr}\left[(DP_Z)^2\right] - \left[\mathrm{tr}(DP_Z)\right]^2\right\}/\left[p^2(p+2)\right]$    (12.55)

where $p = n - (K+1)$ and $P_Z = I_n - Z(Z'Z)^{-1}Z'$. Under the null hypothesis, SLM has an asymptotic $N(0,1)$ distribution.
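The following sketch computes Honda's one-sided statistic (12.53) and its standardized version (12.54) using the moments in (12.55). Building $D$ and $P_Z$ explicitly is only practical for moderate $n = NT$; the argument `Z` is assumed to be the pooled regressor matrix (constant included) that produced the residuals, and the function name is illustrative.

```python
import numpy as np

def honda_and_slm(e, Z, N, T):
    """Honda's one-sided LM statistic and its standardized (SLM) version.

    e : pooled OLS residuals of length n = N*T, ordered firm by firm
    Z : n x (K+1) pooled regressor matrix (constant included)
    Both statistics are compared with N(0,1) critical values under H0.
    """
    n = N * T
    e = np.asarray(e).reshape(n)
    D = np.kron(np.eye(N), np.ones((T, T)))            # D = I_N (x) J_T
    d = e @ D @ e / (e @ e)
    HO = np.sqrt(n / (2.0 * (T - 1.0))) * (d - 1.0)    # asymptotically N(0,1)

    # Exact mean and variance of d from (12.55), with P_Z the residual-maker
    PZ = np.eye(n) - Z @ np.linalg.solve(Z.T @ Z, Z.T)
    p = n - Z.shape[1]                                 # p = n - (K + 1)
    DP = D @ PZ
    t1 = np.trace(DP)
    Ed = t1 / p
    var_d = 2.0 * (p * np.trace(DP @ DP) - t1 ** 2) / (p ** 2 * (p + 2.0))
    SLM = (d - Ed) / np.sqrt(var_d)
    return HO, SLM
```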
(3) The Hausman-Test
A critical assumption in the error components regression model is that $E(u_{it}\mid X_{it}) = 0$. This is important given that the disturbances contain individual effects (the $\mu_i$'s) which are unobserved and may be correlated with the $X_{it}$'s. For example, in an earnings equation these $\mu_i$'s may denote unobservable ability of the individual, and this may be correlated with the schooling variable included on the right hand side of this equation. In this case, $E(u_{it}\mid X_{it}) \neq 0$ and the GLS estimator $\hat\beta_{GLS}$ becomes biased and inconsistent for $\beta$. However, the within transformation wipes out these $\mu_i$'s and leaves the Within estimator $\hat\beta_{Within}$ unbiased and consistent for $\beta$. Hausman (1978) suggests comparing $\hat\beta_{GLS}$ and $\hat\beta_{Within}$, both of which are consistent under the null hypothesis $H_0\colon E(u_{it}\mid X_{it}) = 0$, but which will have different probability limits if $H_0$ is not true. In fact, $\hat\beta_{Within}$ is consistent whether $H_0$ is true or not, while $\hat\beta_{GLS}$ is BLUE, consistent and asymptotically efficient under $H_0$, but is inconsistent when $H_0$ is false. A natural test statistic would be based on $\hat q = \hat\beta_{GLS} - \hat\beta_{Within}$. Under $H_0$, $\mathrm{plim}\,\hat q = 0$ and $\mathrm{cov}(\hat q, \hat\beta_{GLS}) = 0$.
Using the fact that $\hat\beta_{GLS} - \beta = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}u$ and $\hat\beta_{Within} - \beta = (X'QX)^{-1}X'Qu$, one gets $E(\hat q) = 0$ and

$\mathrm{cov}(\hat\beta_{GLS}, \hat q) = \mathrm{var}(\hat\beta_{GLS}) - \mathrm{cov}(\hat\beta_{GLS}, \hat\beta_{Within})$
$= (X'\Omega^{-1}X)^{-1} - (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}E(uu')QX(X'QX)^{-1} = 0$

where the last equality uses $E(uu') = \Omega$, so that the second term collapses to $(X'\Omega^{-1}X)^{-1}$. Using the fact that $\hat\beta_{Within} = \hat\beta_{GLS} - \hat q$, one gets $\mathrm{var}(\hat\beta_{Within}) = \mathrm{var}(\hat\beta_{GLS}) + \mathrm{var}(\hat q)$, since $\mathrm{cov}(\hat\beta_{GLS}, \hat q) = 0$. Therefore,

$\mathrm{var}(\hat q) = \mathrm{var}(\hat\beta_{Within}) - \mathrm{var}(\hat\beta_{GLS}) = (X'QX)^{-1} - (X'\Omega^{-1}X)^{-1}$    (12.56)
Hence, the Hausman test statistic is given by
$m = \hat q'\left[\mathrm{var}(\hat q)\right]^{-1}\hat q$    (12.57)
and under $H_0$ it is asymptotically distributed as $\chi_K^2$, where $K$ denotes the dimension of the slope vector $\beta$. In order to make this test operational, $\Omega$ is replaced by a consistent estimator $\hat\Omega$, and GLS by its corresponding FGLS. An alternative, asymptotically equivalent, test can be obtained from the augmented regression
$y^* = X^*\beta + \tilde X\gamma + w$    (12.58)
where $y^* = \sigma_\nu\Omega^{-1/2}y$, $X^* = \sigma_\nu\Omega^{-1/2}X$ and $\tilde X = QX$. Hausman's test is now equivalent to testing whether $\gamma = 0$. This is a standard Wald test for the omission of the variables $\tilde X$ from (12.58).
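A sketch of the variance-difference form (12.57) from pre-computed Within and FGLS slope estimates follows. It is illustrative only: the vectors are assumed to contain just the $K$ slope coefficients (no intercept), and a pseudo-inverse is used because the estimated $\mathrm{var}(\hat q)$ need not be positive definite in finite samples.

```python
import numpy as np
from scipy import stats

def hausman_test(b_within, V_within, b_gls, V_gls):
    """Hausman (1978) statistic m = q'[var(q)]^{-1} q, chi-squared(K) under H0.

    b_within, b_gls : length-K vectors of slope estimates (no intercept)
    V_within, V_gls : their K x K estimated covariance matrices
    """
    q = np.asarray(b_gls) - np.asarray(b_within)
    V_q = np.asarray(V_within) - np.asarray(V_gls)    # var(q) = var(Within) - var(GLS)
    m = float(q @ np.linalg.pinv(V_q) @ q)
    return m, stats.chi2.sf(m, q.size)
```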
This test was generalized by Arellano (1993) to make it robust to heteroskedasticity and autocorrelation of arbitrary forms. In fact, if either heteroskedasticity or serial correlation is present, the variances of the Within and GLS estimators are not valid and the corresponding Hausman test statistic is inappropriate. For the Baltagi and Griffin (1983) gasoline data, the Hausman test statistic based on the difference between the Within estimator and the feasible GLS estimator based on Swamy and Arora (1972) yields a $\chi_3^2$ value of $m = 306.1$, which rejects the null hypothesis. This is obtained using the Stata command hausman.