Serial correlation in the disturbances of the linear regression model
The seminal work of Cochrane and Orcutt (1949) alerted the econometric profession to the difficulties of assuming uncorrelated disturbances in time series applications of the linear regression model. It soon became well known that neglecting serial correlation in regression disturbances can lead to inefficient parameter estimates, misleading hypothesis tests on these parameters and inefficient regression forecasts. The initial focus was on the linear regression model
$$ y = X\beta + u, \tag{3.12} $$
where $y$ is $n \times 1$, $X$ is an $n \times k$ nonstochastic matrix of rank $k < n$, $\beta$ is a $k \times 1$ parameter vector and $u$ is an $n \times 1$ disturbance vector whose elements are assumed to follow a stationary AR(1) process
$$ u_t = \rho u_{t-1} + \varepsilon_t, \quad |\rho| < 1, \quad \varepsilon_t \sim \mathrm{iid}(0, \sigma^2). \tag{3.13} $$
It was recognized (see, for example, Johnston, 1972) that the usual reason for including the disturbance term in the regression model is to account for the effects of omitted or unobservable regressors, errors in the measurement of the dependent variable, arbitrary human behavior and functional approximations. Given that some of these effects are typically autocorrelated, it was thought that the AR(1) disturbance process (3.13) might be a good model for the regression disturbances in econometric applications.
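As an illustration of the setup above, the following minimal sketch simulates a regression with AR(1) disturbances and computes the first-order autocorrelation of the OLS residuals. The design matrix (an intercept plus one standard-normal regressor), the normality of the innovations, the parameter values and the function names are all assumptions made for the example; the text only requires the innovations to be iid with mean zero and constant variance.

```python
import numpy as np

def simulate_ar1_regression(n, beta, rho, sigma=1.0, seed=0):
    """Simulate y = X beta + u with u_t = rho * u_{t-1} + eps_t, |rho| < 1."""
    rng = np.random.default_rng(seed)
    # Assumed design: intercept plus one standard-normal regressor.
    X = np.column_stack([np.ones(n), rng.standard_normal(n)])
    # Assumed normal innovations (the text only specifies iid(0, sigma^2)).
    eps = rng.normal(0.0, sigma, size=n)
    u = np.empty(n)
    # Start the process from its stationary distribution, whose variance
    # is sigma^2 / (1 - rho^2).
    u[0] = eps[0] / np.sqrt(1.0 - rho**2)
    for t in range(1, n):
        u[t] = rho * u[t - 1] + eps[t]
    return X @ beta + u, X

def ols_residual_autocorr(y, X):
    """Return the OLS coefficients and the lag-1 residual autocorrelation."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    r1 = (e[1:] @ e[:-1]) / (e @ e)
    return b, r1

y, X = simulate_ar1_regression(n=2000, beta=np.array([1.0, 2.0]), rho=0.7)
b_hat, r1 = ols_residual_autocorr(y, X)
print("OLS coefficients:", b_hat)
print("lag-1 residual autocorrelation:", r1)
```

In a sample of this size the residual autocorrelation should lie close to the value of rho used in the simulation, which is the kind of evidence that would prompt modelling the disturbances as in (3.13).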
From the 1970s onwards, there has been an increased understanding of other forms of serially correlated disturbance processes, no doubt helped by the work of Box and Jenkins (1970) and others. With the increased use of quarterly time series data, the simple AR(4) process
$$ u_t = \rho_4 u_{t-4} + \varepsilon_t, \quad |\rho_4| < 1, \quad \varepsilon_t \sim \mathrm{iid}(0, \sigma^2) \tag{3.14} $$
was made popular by the work of Thomas and Wallis (1971) and Wallis (1972). Combined with an AR(1) process, it also leads to the restricted AR(5) process
$$ u_t = \rho_1 u_{t-1} + \rho_4 u_{t-4} - \rho_1 \rho_4 u_{t-5} + \varepsilon_t, \quad \varepsilon_t \sim \mathrm{iid}(0, \sigma^2). \tag{3.15} $$
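The restriction in (3.15) can be checked numerically by reading "combined" as the product of the AR(1) and AR(4) lag polynomials, (1 - rho_1 L)(1 - rho_4 L^4). The sketch below multiplies the two polynomials and recovers the implied AR(5) coefficients; the parameter values and variable names are illustrative assumptions.

```python
import numpy as np

rho1, rho4 = 0.5, 0.3  # illustrative values, not taken from the text

# Lag polynomials written in ascending powers of the lag operator L.
ar1_poly = np.array([1.0, -rho1])                  # 1 - rho1 * L
ar4_poly = np.array([1.0, 0.0, 0.0, 0.0, -rho4])   # 1 - rho4 * L^4

# Multiplying polynomials is a convolution of their coefficient vectors.
product = np.convolve(ar1_poly, ar4_poly)

# Moving the lagged terms to the right-hand side flips their signs,
# giving the coefficients on u_{t-1}, ..., u_{t-5}.
ar5_coefs = -product[1:]
print(ar5_coefs)
```

Only the coefficients at lags 1, 4 and 5 are nonzero, and the lag-5 coefficient equals minus the product of the other two, so the AR(5) process has two free parameters rather than five.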
New numerical algorithms and improvements in computer hardware in the 1970s and 1980s made MA disturbance models a practical alternative to the standard AR models; see, for example, Nichols, Pagan, and Terrell (1975) for a review. The MA(1) disturbance model takes the form
$$ u_t = \varepsilon_t + \gamma \varepsilon_{t-1}, \quad \varepsilon_t \sim \mathrm{iid}(0, \sigma^2) \tag{3.16} $$
and the simple MA(4) process
$$ u_t = \varepsilon_t + \gamma_4 \varepsilon_{t-4}, \quad \varepsilon_t \sim \mathrm{iid}(0, \sigma^2) \tag{3.17} $$
is an alternative model to (3.14) for quarterly data. Of course, there is no reason not to consider a general ARMA(p, q) model for the disturbance process, and many econometricians now do so rather than restricting attention to an AR(1) model.
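One way to see how the MA models differ from their AR counterparts is through their autocorrelations, which cut off after the MA order instead of decaying geometrically. The sketch below computes the theoretical autocorrelations of a pure MA process from its coefficients; the function name and the coefficient values are assumptions made for illustration.

```python
import numpy as np

def ma_autocorrelations(ma_coefs, max_lag):
    """Theoretical autocorrelations of u_t = eps_t + c_1 eps_{t-1} + ... + c_q eps_{t-q}.

    ma_coefs holds (c_1, ..., c_q); the coefficient on eps_t is fixed at 1.
    """
    psi = np.concatenate([[1.0], np.asarray(ma_coefs, dtype=float)])
    q = len(psi) - 1
    denom = psi @ psi  # variance of u_t divided by sigma^2
    acf = np.zeros(max_lag + 1)
    for k in range(min(q, max_lag) + 1):
        # Lag-k autocovariance (divided by sigma^2) is the sum of psi_j * psi_{j+k}.
        acf[k] = (psi[: q + 1 - k] @ psi[k:]) / denom
    return acf

# MA(1) with coefficient 0.5: nonzero autocorrelation at lag 1 only.
print(ma_autocorrelations([0.5], max_lag=5))
# Simple MA(4) with coefficient 0.5 at lag 4: nonzero at lag 4 only.
print(ma_autocorrelations([0.0, 0.0, 0.0, 0.5], max_lag=5))
```

For the MA(1) case the lag-1 autocorrelation is the coefficient divided by one plus its square, and the simple MA(4) process has the same value at lag 4 with zeros elsewhere, which is why it is a natural competitor to the simple AR(4) model for quarterly data.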
In many applications of the linear regression model, it is clear that the value of the dependent variable depends very much on some fraction of its value in the previous period as well as on other independent variables. This leads naturally to the dynamic linear regression model, the simplest version of which is
$$ y_t = \alpha_1 y_{t-1} + x_t'\beta + u_t, \quad t = 1, \ldots, n, \tag{3.18} $$
where $\alpha_1$ is a scalar, $\beta$ is a $k \times 1$ parameter vector, $x_t$ is a $k \times 1$ vector of exogenous variables and $u_t$ is the disturbance term. A more general model is
$$ y_t = \alpha_1 y_{t-1} + \cdots + \alpha_p y_{t-p} + x_t'\beta + u_t, \quad t = 1, \ldots, n. \tag{3.19} $$
For completeness, we also need an assumption about how $y_0, y_{-1}, \ldots, y_{1-p}$ are generated. One approach is to treat $y_0, y_{-1}, \ldots, y_{1-p}$ as constants. Another is to assume they each have a constant mean equal to $E(y_1)$ and that $v_t = y_t - E(y_t)$ follows the stationary AR(p) process
$$ v_t = \alpha_1 v_{t-1} + \alpha_2 v_{t-2} + \cdots + \alpha_p v_{t-p} + u_t $$
in which $u_t$ is the error term in (3.19). The former assumption is appropriate if we wish to make inferences conditional on the values taken by $y_0, y_{-1}, \ldots, y_{1-p}$, while the latter assumption has been made by Tse (1982) and Inder (1985, 1986).
In some circumstances, there is not much difference between the linear regression (3.12) with AR(1) errors and the first-order dynamic regression model (3.18). To see this, suppose $x_t$ is made up of a constant intercept regressor and the time trend, i.e. $x_t = (1, t)'$. Consider
$$ y_t = x_t'\beta + u_t \tag{3.20} $$
in which $u_t$ follows the AR(1) process (3.13). If we lag (3.20) one period, multiply it by $\rho$ and subtract from (3.20), we get
$$ y_t = \rho y_{t-1} + x_t'\beta - \rho x_{t-1}'\beta + \varepsilon_t. \tag{3.21} $$
Observe that (3.21) is a regression with a well-behaved error term and five regressors. Because lagging $x_t$ again yields a constant regressor and a linear trend regressor, there is perfect multicollinearity between the regressors in $x_t$ and those in $x_{t-1}$. This problem can be solved by dropping the $\rho x_{t-1}'\beta$ term, whose effect is absorbed into the coefficients on $x_t$, in which case (3.21) becomes the simple dynamic linear model (3.18) with $u_t \sim \mathrm{iid}(0, \sigma^2)$. Thus in the general case of a linear regression model of the form (3.20), if $x_t$ is "lag invariant" in the sense that all regressors in $x_{t-1}$ can be written as linear combinations of the regressors in $x_t$ alone, then (3.18) and (3.20) are equivalent. This simple analysis has ignored the first observation. Any difference between the two models could depend largely on what is assumed about the first observation in each model.
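The lag-invariance argument for the intercept-and-trend case can be verified numerically. The sketch below checks that stacking the current and lagged regressors adds no new columns in the rank sense, and that the two regressor terms in (3.21) collapse to a single linear function of the constant and the trend. The sample size, parameter values and variable names are assumptions chosen for illustration.

```python
import numpy as np

n = 50
t = np.arange(1, n + 1, dtype=float)
X_t = np.column_stack([np.ones(n), t])           # rows are x_t' = (1, t)
X_lag = np.column_stack([np.ones(n), t - 1.0])   # rows are x_{t-1}' = (1, t - 1)

# Perfect multicollinearity: four columns, but only two are linearly independent.
rank_combined = np.linalg.matrix_rank(np.hstack([X_t, X_lag]))
print("rank of [X_t, X_lag]:", rank_combined)

# Illustrative parameter values (assumed, not from the text).
beta = np.array([1.0, 0.5])
rho = 0.6

# x_t'beta - rho * x_{t-1}'beta
#   = [beta_1 (1 - rho) + rho * beta_2] + [beta_2 (1 - rho)] * t
implied = np.array([beta[0] * (1 - rho) + rho * beta[1], beta[1] * (1 - rho)])
combined = X_t @ beta - rho * (X_lag @ beta)
print("implied coefficients on (1, t):", implied)
print("terms collapse onto x_t:", np.allclose(combined, X_t @ implied))
```

The implied coefficients are a reparameterization of the original ones, which is why the dynamic model fitted on $x_t$ alone loses nothing in this case.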
This section discusses the problem of estimation for the models of the previous section. Recall that subsection 2.1 considered models with mean zero. These can be readily generalized to models in which $y_t$ has a constant but unknown mean, say $\beta_1$. Such models can be written as a special case of the regression model (3.12) in which $X$ is the vector of ones. It is very rare that the mean of $y_t$ is known to be zero, so models which allow for nonzero means are more realistic. Thus in this section, we shall restrict our attention to the estimation of the linear regression model (3.12) with various ARMA-type disturbance processes.