The OPG Regression
By no means all interesting econometric models are regression models. It is therefore useful to see if artificial regressions other than the GNR exist for wide classes of models. One of these is the outer-product-of-the-gradient regression, or OPG regression, a particularly simple artificial regression that can be used with most models that are estimated by maximum likelihood. Suppose we are interested in a model for which the loglikelihood function can be written as
$$\ell(\theta) = \sum_{t=1}^{n} \ell_t(\theta), \qquad (1.27)$$
where $\ell_t(\theta)$ denotes the contribution to the loglikelihood function associated with observation $t$. This is the log of the density of the dependent variable(s) for observation $t$, conditional on observations $1, \ldots, t-1$. Thus lags of the dependent variable(s) are allowed. The key feature of (1.27) is that $\ell(\theta)$ is a sum of contributions from each of the $n$ observations.
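To make the notation concrete, consider a binary probit model, an illustrative choice that is not part of the discussion here. With regressor row $X_t$, parameter vector $\beta$, and binary dependent variable $y_t$, the contribution of observation $t$ is
$$\ell_t(\beta) = y_t \log \Phi(X_t\beta) + (1 - y_t)\log\bigl(1 - \Phi(X_t\beta)\bigr),$$
where $\Phi(\cdot)$ denotes the standard normal CDF. Summing these contributions over $t$ yields a loglikelihood of exactly the form (1.27).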
Now let $G(\theta)$ be the $n \times k$ matrix with typical element
$$G_{ti}(\theta) \equiv \frac{\partial \ell_t(\theta)}{\partial \theta_i}.$$
The matrix $G(\theta)$ is called the matrix of contributions to the gradient, or the CG matrix, because the derivative of the sample loglikelihood (1.27) with respect to $\theta_i$, the $i$th component of $\theta$, is the sum of the elements of column $i$ of $G(\theta)$. The OPG regression associated with (1.27) can be written as
$$\iota = G(\theta)b + \text{residuals}, \qquad (1.28)$$
where $\iota$ denotes an $n$-vector of 1s.
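For readers who prefer to see the construction in code, here is a minimal sketch, in Python, of the CG matrix and the OPG regression (1.28) for the probit illustration above. The simulated data, sample size, and variable names are assumptions made purely for the sketch, not anything taken from the text.

```python
# Minimal sketch: CG matrix G(beta) and the OPG regression (1.28) for an
# illustrative probit model.  Data and parameter values are simulated here.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])
beta_true = np.array([0.5, 1.0, -1.0])
y = (X @ beta_true + rng.standard_normal(n) > 0).astype(float)

def cg_matrix(beta):
    """Return the n x k CG matrix: row t is the gradient of ell_t at beta."""
    index = X @ beta
    Phi = np.clip(norm.cdf(index), 1e-12, 1 - 1e-12)
    phi = norm.pdf(index)
    score = (y - Phi) * phi / (Phi * (1.0 - Phi))   # d ell_t / d(X_t beta)
    return score[:, None] * X                        # G_ti = d ell_t / d beta_i

# OPG regression (1.28): regress a vector of ones on G(beta).
G = cg_matrix(beta_true)          # in practice G is evaluated at an estimate
iota = np.ones(n)
b, *_ = np.linalg.lstsq(G, iota, rcond=None)
```

Note that only $G$ and the ordinary least squares fit of $\iota$ on it are needed; no second derivatives enter anywhere.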
It is easy to see that the OPG regression (1.28) satisfies the conditions for it to be an artificial regression. Condition (1') is evidently satisfied, since $R^{\top}(\theta)r(\theta) = G^{\top}(\theta)\iota$, the components of which are the derivatives of $\ell(\theta)$ with respect to each of the $\theta_i$. Condition (2) is also satisfied because, under standard regularity conditions, if $\theta$ is the true parameter vector,
$$\operatorname*{plim}_{n\to\infty}\bigl(n^{-1}R^{\top}(\theta)R(\theta)\bigr) = \operatorname*{plim}_{n\to\infty}\bigl(n^{-1}G^{\top}(\theta)G(\theta)\bigr) = J(\theta).$$
Here $J(\theta)$ denotes the information matrix, defined as
$$J(\theta) = \lim_{n\to\infty} \frac{1}{n} \sum_{t=1}^{n} \mathrm{E}\bigl(G_t^{\top}(\theta)\,G_t(\theta)\bigr),$$
where $G_t(\theta)$ is the $t$th row of $G(\theta)$. Since, as is well known, the asymptotic covariance matrix of $n^{1/2}(\hat{\theta} - \theta_0)$ is given by the inverse of the information matrix, condition (2) is satisfied under the further weak regularity condition that $J(\theta)$ should be continuous in $\theta$. Condition (3) is also satisfied, since it can be shown that one-step estimates from the OPG regression are asymptotically equivalent to maximum likelihood estimates. The proof is quite similar to the one for the GNR given in Section 3.
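In code, the one-step property is simple to state: starting from any root-n consistent estimate, add to it the coefficient vector from the OPG regression evaluated at that estimate. The following lines continue the probit sketch above (the starting value and its perturbation are assumptions of the sketch):

```python
# One-step estimation via the OPG regression, continuing the probit sketch.
# beta_start stands in for any root-n consistent preliminary estimate.
beta_start = beta_true + 0.1 * rng.standard_normal(k)  # illustrative only
G0 = cg_matrix(beta_start)
b0, *_ = np.linalg.lstsq(G0, np.ones(n), rcond=None)
beta_onestep = beta_start + b0     # asymptotically equivalent to the MLE
```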
It is particularly easy to compute an LM test by using the OPG regression. Let $\tilde{\theta}$ denote the constrained ML estimates obtained by imposing $r$ restrictions when maximizing the loglikelihood. Then the ESS from the OPG regression
$$\iota = G(\tilde{\theta})b + \text{residuals}, \qquad (1.29)$$
which is equal to $n$ times the uncentered $R^2$, is the OPG form of the LM statistic. Like the GNR, the OPG regression can be used for many purposes. The use of what is essentially the OPG regression for obtaining maximum likelihood estimates and computing covariance matrices was advocated by Berndt, Hall, Hall, and Hausman (1974). Using it to compute Lagrange Multiplier, or LM, tests was suggested by Godfrey and Wickens (1981), and using it to compute information matrix tests was proposed by Chesher (1983) and Lancaster (1984). The OPG regression is appealing for all these uses because it applies to a very wide variety of models and requires only first derivatives. In general, however, both estimated covariance matrices and test statistics based on the OPG regression are not very reliable in finite samples. In particular, a large number of papers, including Chesher and Spady (1991), Davidson and MacKinnon (1985a, 1992), and Godfrey, McAleer, and McKenzie (1988), have shown that, in finite samples, LM tests based on the OPG regression tend to overreject, often very severely.
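A hedged sketch of the OPG form of the LM statistic, continuing the probit example, may help fix ideas. The restriction tested (the last coefficient equal to zero) and the optimizer used are assumptions of the sketch, not choices made in the text:

```python
# OPG form of the LM statistic (1.29), continuing the probit sketch.
# Restricted model: impose beta[2] = 0 (r = 1 restriction), estimate by ML.
from scipy.optimize import minimize

def negloglik_restricted(free):
    beta = np.array([free[0], free[1], 0.0])
    p = np.clip(norm.cdf(X @ beta), 1e-12, 1 - 1e-12)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

res = minimize(negloglik_restricted, x0=np.zeros(2), method="BFGS")
beta_tilde = np.array([res.x[0], res.x[1], 0.0])     # constrained MLE

G_tilde = cg_matrix(beta_tilde)                      # full CG matrix at beta_tilde
b_hat, *_ = np.linalg.lstsq(G_tilde, np.ones(n), rcond=None)
ess = np.sum((G_tilde @ b_hat) ** 2)                 # ESS = n * uncentered R^2
# Compare ess with a chi-squared(1) critical value to carry out the test.
```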
Despite this finite-sample drawback, the OPG regression provides a particularly convenient way to obtain various theoretical results. For example, suppose that we are interested in the variance of $\hat{\theta}_2$, the ML estimate of $\theta_2$, the last element of $\theta$. If $\theta_1$ denotes the vector of the remaining $k - 1$ elements, and $G(\theta)$ and $b$ are partitioned in the same way as $\theta$, the OPG regression becomes
$$\iota = G_1(\theta)b_1 + G_2(\theta)b_2 + \text{residuals},$$
and the FWL regression derived from this by retaining only the last regressor is
$$M_1\iota = M_1 G_2 b_2 + \text{residuals},$$
where $M_1 \equiv I - G_1(G_1^{\top}G_1)^{-1}G_1^{\top}$, and the dependence on $\theta$ has been suppressed for notational convenience. The covariance matrix estimate from this regression is just
$$(G_2^{\top}M_1G_2)^{-1} = \bigl(G_2^{\top}G_2 - G_2^{\top}G_1(G_1^{\top}G_1)^{-1}G_1^{\top}G_2\bigr)^{-1}. \qquad (1.30)$$
If we divide each of the matrices of the form $G_i^{\top}G_j$ in (1.30) by $n$ and take probability limits, we find that
$$\lim_{n\to\infty} \operatorname{var}\bigl(n^{1/2}(\hat{\theta}_2 - \theta_{20})\bigr) = \bigl(J_{22} - J_{21}J_{11}^{-1}J_{12}\bigr)^{-1},$$
where $\theta_{20}$ is the true value of $\theta_2$. This is a very well-known result, but, since its relation to the FWL theorem is not obvious without appeal to the OPG regression, it is not usually obtained in such a convenient or illuminating way.
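As a quick numerical illustration of this use of the FWL theorem, again continuing the assumed probit sketch, one can check that the last diagonal element of $(G^{\top}G)^{-1}$ coincides with $(G_2^{\top}M_1G_2)^{-1}$ computed as in (1.30):

```python
# Numerical check of (1.30), continuing the probit sketch: by the FWL theorem,
# the last diagonal element of (G'G)^{-1} equals (G_2' M_1 G_2)^{-1}.
G = cg_matrix(beta_true)
G1, G2 = G[:, :-1], G[:, -1:]
M1 = np.eye(n) - G1 @ np.linalg.solve(G1.T @ G1, G1.T)
lhs = np.linalg.inv(G2.T @ M1 @ G2)[0, 0]
rhs = np.linalg.inv(G.T @ G)[-1, -1]
assert np.isclose(lhs, rhs)
```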