A COMPANION TO Theoretical Econometrics
Basic Model
Suppose we have a set of N cross-sections with T time series observations on each. The classic data introduced in Zellner's (1962) initial work comprised firm-level investment data collected annually for 20 years. More recently Batchelor and Gulley (1995) analysed the determinants of jewelry demand for a sample of six countries over 16 years. In both cases, the disturbances from different regression equations, at a given point in time, were correlated because of common unobservable factors.
It is convenient to continue with the cross-section, time series characterization of the data, but clearly what is distinctive is that the data have two dimensions. In general we are dealing with data fields. Often in demand studies a system of demand equations is specified to explain household level consumption of several commodities. Here the disturbance covariances arise because of potential correlations between household specific unobservables associated with each household's commodity demand. In the area of energy demand Bartels, Fiebig, and Plumb (1996) consider household expenditures on gas, and two types of electricity, while Fiebig, Bartels, and Aigner (1991) and Henley and Peirson (1994) consider electricity demands at different times of the day.
The structure of the multi-dimensional data focuses attention on two important specification issues: (i) what should be the appropriate parameter variation across the two dimensions, and (ii) what should be the appropriate stochastic specification. In the case of the basic SUR model these specification issues are resolved by forming a system of N equations each containing T observations:
$$y_i = X_i \beta_i + u_i, \qquad i = 1, \ldots, N \tag{5.1}$$
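where $y_i$ and $u_i$ are $T$-dimensional vectors, $X_i$ is $T \times K_i$, and $\beta_i$ is a $K_i$-dimensional vector. Stacking all $N$ equations yields:
$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix} = \begin{bmatrix} X_1 & 0 & \cdots & 0 \\ 0 & X_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & X_N \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_N \end{bmatrix} + \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_N \end{bmatrix}$$
which can be written compactly as: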
$$y = X\beta + u \tag{5.2}$$
where $\beta$ is a $K$-dimensional vector of unknown parameters that needs to be estimated and $K = \sum_{i=1}^{N} K_i$. For the $NT \times 1$ vector of stacked disturbances the assumptions are (i) $E(u) = 0$, and (ii) the $NT \times NT$ covariance matrix comprises $N^2$ blocks of the form $E(u_i u_j') = \sigma_{ij} I_T$, where $I_T$ is a $T \times T$ identity matrix. These assumptions mean that the $T$ disturbances in each of the $N$ equations have zero mean, equal variance, and are uncorrelated, and that covariances between contemporaneous disturbances for a pair of equations are potentially nonzero but equal, while non-contemporaneous covariances are all zero. Thus the full covariance matrix of $u$ is given by $\Omega = \Sigma \otimes I_T$, where $\Sigma = [\sigma_{ij}]$ is the $N \times N$ contemporaneous covariance matrix and $\otimes$ denotes the Kronecker product.
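The stacked system and the Kronecker structure of the disturbance covariance matrix can be made concrete with a small numerical sketch. The dimensions (N = 2, T = 3), the regressor values, and the entries of the contemporaneous covariance matrix (`Sigma` in the code) are illustrative assumptions, and `stack_block_diag` is a hypothetical helper rather than anything defined in the text.

```python
import numpy as np

def stack_block_diag(X_list):
    """Place the T x K_i regressor matrices on the diagonal of an NT x K matrix."""
    T = X_list[0].shape[0]
    K = sum(Xi.shape[1] for Xi in X_list)
    X = np.zeros((len(X_list) * T, K))
    col = 0
    for i, Xi in enumerate(X_list):
        X[i * T:(i + 1) * T, col:col + Xi.shape[1]] = Xi
        col += Xi.shape[1]
    return X

T = 3
# Assumed regressors: equation 1 has K_1 = 2 columns, equation 2 has K_2 = 3.
X1 = np.column_stack([np.ones(T), [1.0, 2.0, 3.0]])
X2 = np.column_stack([np.ones(T), [0.5, 1.5, 0.0], [2.0, 1.0, 4.0]])
X = stack_block_diag([X1, X2])          # NT x K = 6 x 5

# Assumed contemporaneous covariance matrix (N x N).
Sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])
# Full NT x NT covariance of the stacked disturbances: N^2 blocks sigma_ij * I_T.
Omega = np.kron(Sigma, np.eye(T))
```

Each off-diagonal block of `Omega` is a multiple of the identity, which is exactly the statement that only contemporaneous disturbances are correlated across equations.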
Each of the N equations is individually assumed to satisfy the classical assumptions associated with the linear regression model and can be estimated separately. Of course this ignores the correlation between the disturbances of different equations, which can be exploited by joint estimation. The individual equations are related, even though superficially they may not seem to be; they are only seemingly unrelated. The GLS (generalized least squares) estimator is readily defined as
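$$\hat\beta(\Sigma) = [X'(\Sigma^{-1} \otimes I_T)X]^{-1} X'(\Sigma^{-1} \otimes I_T)y \tag{5.3}$$
with a covariance matrix given by
$$\operatorname{var}[\hat\beta(\Sigma)] = [X'(\Sigma^{-1} \otimes I_T)X]^{-1}. \tag{5.4}$$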
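As a minimal sketch of how (5.3) and (5.4) might be computed when the contemporaneous covariance matrix is treated as known, consider the following. The function name `sur_gls`, the simulated two-equation data, and all parameter values are assumptions made for illustration. Forming the full NT x NT weight matrix is wasteful for large T and is done here only to mirror the formulas.

```python
import numpy as np

def sur_gls(X, y, Sigma, T):
    """GLS for y = X beta + u with var(u) = Sigma kron I_T, Sigma known."""
    W = np.kron(np.linalg.inv(Sigma), np.eye(T))    # inverse of Sigma kron I_T
    XtWX = X.T @ W @ X
    beta = np.linalg.solve(XtWX, X.T @ W @ y)        # estimator (5.3)
    cov = np.linalg.inv(XtWX)                        # covariance matrix (5.4)
    return beta, cov

# Illustrative two-equation system (all numbers are assumptions).
rng = np.random.default_rng(0)
T = 50
X1 = np.column_stack([np.ones(T), rng.standard_normal(T)])
X2 = np.column_stack([np.ones(T), rng.standard_normal(T)])
X = np.zeros((2 * T, 4))
X[:T, :2] = X1          # equation 1 block
X[T:, 2:] = X2          # equation 2 block

Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.5]])
beta_true = np.array([1.0, 2.0, -1.0, 0.5])

# Draw disturbances whose rows (one per time period) have covariance Sigma.
U = rng.standard_normal((T, 2)) @ np.linalg.cholesky(Sigma).T
u = U.T.reshape(-1)     # stack equation 1 first, then equation 2
y = X @ beta_true + u

beta_gls, cov_gls = sur_gls(X, y, Sigma, T)
std_err = np.sqrt(np.diag(cov_gls))
```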
It is well known that the GLS estimator reduces to OLS (ordinary least squares) when: (i) there is an absence of contemporaneous correlations ($\sigma_{ij} = 0$, $i \neq j$); or (ii) the same set of explanatory variables is included in each equation ($X_1 = X_2 = \cdots = X_N$). A more complete characterization of when OLS is equivalent to GLS is given in Baltagi (1989) and Bartels and Fiebig (1991).
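Both equivalence conditions can be checked numerically. The sketch below uses arbitrary simulated inputs (seed, dimensions, and covariance values are assumptions) and hypothetical helper names `gls` and `ols`. Because the equivalences are algebraic identities, any stacked vector y serves for the check.

```python
import numpy as np

def gls(X, y, Sigma, T):
    """GLS with disturbance covariance Sigma kron I_T."""
    W = np.kron(np.linalg.inv(Sigma), np.eye(T))
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

def ols(X, y):
    """OLS on the stacked system."""
    return np.linalg.solve(X.T @ X, X.T @ y)

rng = np.random.default_rng(1)
T = 30
y = rng.standard_normal(2 * T)

# Case (i): different regressors, but a diagonal contemporaneous covariance.
X_diff = np.zeros((2 * T, 4))
X_diff[:T, :2] = np.column_stack([np.ones(T), rng.standard_normal(T)])
X_diff[T:, 2:] = np.column_stack([np.ones(T), rng.standard_normal(T)])
Sigma_diag = np.diag([1.0, 3.0])
case_i = np.allclose(gls(X_diff, y, Sigma_diag, T), ols(X_diff, y))

# Case (ii): correlated disturbances, but identical regressors in each equation.
X0 = np.column_stack([np.ones(T), rng.standard_normal(T)])
X_same = np.kron(np.eye(2), X0)         # block-diagonal with X0 in both blocks
Sigma_full = np.array([[1.0, 0.9],
                       [0.9, 2.0]])
case_ii = np.allclose(gls(X_same, y, Sigma_full, T), ols(X_same, y))

# Neither condition holds: the two estimators generally differ.
differs = not np.allclose(gls(X_diff, y, Sigma_full, T), ols(X_diff, y))
```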
In his original article, Zellner (1962) recognized that the efficiency gains resulting from joint estimation tended to be larger when the explanatory variables in different equations were not highly correlated but the disturbances from these equations were highly correlated. Work by Binkley (1982) and Binkley and Nelson (1988) has led to an important qualification to this conventional wisdom. They show that even when correlation among variables across equations is present, efficiency gains from joint estimation can be considerable when there is multicollinearity within an equation.
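The conventional wisdom can be illustrated in the simplest setting: two equations, one regressor each, unit disturbance variances, and a known disturbance correlation `rho`. Under these assumptions, working through (5.4) gives a GLS-to-OLS variance ratio for the first coefficient of (1 - rho^2) / (1 - rho^2 r^2), where r is the uncentred correlation between the two regressors. The code below is a sketch of that special case only, with made-up regressor vectors. It does not reproduce the Binkley and Nelson multicollinearity result, which needs more than one regressor per equation.

```python
import numpy as np

def gls_ols_variance_ratio(x1, x2, rho):
    """var(GLS) / var(OLS) for the equation-1 coefficient, known Sigma."""
    T = len(x1)
    X = np.zeros((2 * T, 2))
    X[:T, 0] = x1                       # equation 1: single regressor x1
    X[T:, 1] = x2                       # equation 2: single regressor x2
    Sigma = np.array([[1.0, rho],
                      [rho, 1.0]])
    W = np.kron(np.linalg.inv(Sigma), np.eye(T))
    var_gls = np.linalg.inv(X.T @ W @ X)[0, 0]   # from (5.4)
    var_ols = 1.0 / (x1 @ x1)                    # sigma_11 (x1'x1)^{-1}
    return var_gls / var_ols

x1 = np.array([1.0, -1.0, 1.0, -1.0])
x_orth = np.array([1.0, 1.0, -1.0, -1.0])   # x1'x_orth = 0, so r = 0
x_corr = np.array([1.0, -1.0, 1.0, 1.0])    # x1'x_corr = 2, so r = 0.5

for rho in (0.3, 0.9):
    print(rho,
          gls_ols_variance_ratio(x1, x_orth, rho),
          gls_ols_variance_ratio(x1, x_corr, rho))
```

A ratio below one is an efficiency gain from joint estimation. In this special case the ratio falls as `rho` rises and rises toward one as the regressors become more alike.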
Consider the class of feasible GLS (FGLS) estimators that differ only in the choice of the estimator used for the contemporaneous covariance matrix, say $\hat\beta(\hat\Sigma)$. The estimator is given by:
$$\hat\beta(\hat\Sigma) = [X'(\hat\Sigma^{-1} \otimes I_T)X]^{-1} X'(\hat\Sigma^{-1} \otimes I_T)y, \tag{5.5}$$
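and inferences are based on the estimator of the asymptotic covariance matrix of $\hat\beta(\hat\Sigma)$ given by:
$$\widehat{\operatorname{avar}}[\hat\beta(\hat\Sigma)] = [X'(\hat\Sigma^{-1} \otimes I_T)X]^{-1}. \tag{5.6}$$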
There are many variants of this particular FGLS estimator. Obviously, OLS belongs to the class with $\hat\Sigma = I_N$, but Zellner (1962) proposed the first operational estimator that explicitly utilized the SUR structure. He suggested an estimated covariance matrix calculated from OLS residuals obtained from (5.1); namely $S = (s_{ij})$, where $s_{ij} = (y_i - X_i b_i)'(y_j - X_j b_j)/\tau$ and $b_i$ is the OLS estimator of $\beta_i$. For consistent estimation division by $\tau = T$ suffices, but other suggestions have also been made; see for example Srivastava and Giles (1987).
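Zellner's two-step procedure can be sketched as follows: equation-by-equation OLS residuals give S, and S then replaces the unknown contemporaneous covariance matrix in the GLS formula. The function name `zellner_fgls`, the default divisor (T), and the simulated data are assumptions made for illustration. This is one non-iterated variant, not a reproduction of any particular package's implementation.

```python
import numpy as np

def zellner_fgls(y_list, X_list, tau=None):
    """Two-step FGLS using equation-by-equation OLS residuals."""
    N, T = len(y_list), y_list[0].shape[0]
    tau = T if tau is None else tau

    # Step 1: OLS on each equation of (5.1); collect residuals as a T x N array.
    resid = np.column_stack([
        y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
        for y, X in zip(y_list, X_list)
    ])
    S = resid.T @ resid / tau                     # S = (s_ij)

    # Step 2: GLS on the stacked system with S in place of Sigma.
    K = [X.shape[1] for X in X_list]
    X_big = np.zeros((N * T, sum(K)))
    col = 0
    for i, X in enumerate(X_list):
        X_big[i * T:(i + 1) * T, col:col + K[i]] = X
        col += K[i]
    y_big = np.concatenate(y_list)
    W = np.kron(np.linalg.inv(S), np.eye(T))
    XtWX = X_big.T @ W @ X_big
    beta = np.linalg.solve(XtWX, X_big.T @ W @ y_big)
    return beta, np.linalg.inv(XtWX), S           # estimate, est. covariance, S

# Simulated data (all values are assumptions).
rng = np.random.default_rng(2)
T = 200
Sigma = np.array([[1.0, 0.9],
                  [0.9, 2.0]])
U = rng.standard_normal((T, 2)) @ np.linalg.cholesky(Sigma).T
X1 = np.column_stack([np.ones(T), rng.standard_normal(T)])
X2 = np.column_stack([np.ones(T), rng.standard_normal(T)])
beta_true = np.array([1.0, 2.0, -1.0, 0.5])
y1 = X1 @ beta_true[:2] + U[:, 0]
y2 = X2 @ beta_true[2:] + U[:, 1]

beta_hat, cov_hat, S = zellner_fgls([y1, y2], [X1, X2])
```

Passing a different `tau` changes only the scale of S. The coefficient estimate is unaffected because the scale cancels in the GLS formula, but the estimated covariance matrix of the coefficients is rescaled.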
$S$ has been referred to as the restricted estimator of $\Sigma$, but estimation can also be based on the unrestricted residuals derived from OLS regressions which include all explanatory variables from the SUR system. Considerable theoretical work has been devoted to the comparison of respective finite sample properties of the restricted and unrestricted SUR estimators associated with the different estimators of $\Sigma$. All of the results discussed in Srivastava and Giles (1987) were based on the assumption of normally distributed disturbances. More recently, Hasegawa (1995) and Srivastava and Maekawa (1995) have presented comparisons between the restricted and unrestricted estimators allowing for nonnormal errors. None of this work produces a conclusive choice between the two alternative estimators.
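The difference between the two residual choices is easy to see in code. In the sketch below, restricted residuals come from regressing each dependent variable on its own regressors, while unrestricted residuals come from regressing each on all distinct regressors in the system (`Z`). The helper name `ols_residual_cov`, the common divisor T, and the simulated data are assumptions for illustration.

```python
import numpy as np

def ols_residual_cov(y_list, X_list, tau):
    """Cross-product matrix of OLS residuals divided by tau."""
    resid = np.column_stack([
        y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
        for y, X in zip(y_list, X_list)
    ])
    return resid.T @ resid / tau

rng = np.random.default_rng(3)
T = 40
x1, x2 = rng.standard_normal(T), rng.standard_normal(T)
X1 = np.column_stack([np.ones(T), x1])      # regressors of equation 1
X2 = np.column_stack([np.ones(T), x2])      # regressors of equation 2
Z = np.column_stack([np.ones(T), x1, x2])   # all distinct regressors in the system

Sigma = np.array([[1.0, 0.7],
                  [0.7, 1.0]])
U = rng.standard_normal((T, 2)) @ np.linalg.cholesky(Sigma).T
y1 = X1 @ np.array([1.0, 2.0]) + U[:, 0]
y2 = X2 @ np.array([-1.0, 0.5]) + U[:, 1]

# Restricted: each equation uses only its own regressors.
S_restricted = ols_residual_cov([y1, y2], [X1, X2], tau=T)
# Unrestricted: every equation is regressed on the full set Z.
S_unrestricted = ols_residual_cov([y1, y2], [Z, Z], tau=T)
```

With a common divisor, the diagonal of `S_unrestricted` can never exceed that of `S_restricted`, since adding regressors cannot increase a residual sum of squares. That mechanical fact says nothing about which choice yields the better FGLS estimator.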
While theoretical work continues on both restricted and unrestricted estimators, software designers typically make the choice for practitioners. SAS, SHAZAM, TSP, and LIMDEP all use restricted residuals in the estimation of $\Sigma$. Moreover, there is limited scope to opt for alternatives, with only LIMDEP and TSP allowing users to supply their own estimator of $\Sigma$. Where the software packages do vary is in the default choice of $\tau$ and whether to iterate or not. See Silk (1996) for further discussion of software comparisons between SAS, SHAZAM, and TSP in terms of systems estimation.