Collinearity in maximum likelihood estimation
Collinearity in the context of maximum likelihood estimation is diagnosed similarly. Instead of minimizing the sum of squared errors we maximize the log-likelihood function. Standard gradient methods for numerical maximization use first and/or second derivatives. As in the Gauss-Newton algorithm for nonlinear least squares, these methods involve an inversion: of the Hessian for Newton-Raphson, the information matrix for the method of scoring, and the cross-product matrix of first derivatives for the method of Berndt, Hall, Hall, and Hausman. In these algorithms, if the matrix to be inverted becomes singular, or nearly so, estimation fails. In each case we can apply the BKW diagnostics to the matrix being inverted at each step of the nonlinear optimization, and to the estimate of the asymptotic covariance matrix. The same difficulties arise in diagnosing collinearity here as in nonlinear least squares, only worse: while the condition numbers measure how ill-conditioned the matrix is, the rows of Table 12.1 no longer provide any information about which variables are involved in collinear relations. Similar remarks hold for collinearity diagnosis in generalized least squares and simultaneous equations models.
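To make the mechanics concrete, here is a minimal sketch (our illustration, not a procedure from the text) of monitoring conditioning during a Newton-Raphson-style maximization: at each step, the condition number of the matrix to be inverted is computed from its singular values, and the step is abandoned if the matrix is nearly singular. The threshold is an arbitrary assumption.

```python
import numpy as np

def condition_number(A):
    """Ratio of largest to smallest singular value of A."""
    s = np.linalg.svd(A, compute_uv=False)
    return s[0] / s[-1]

def newton_step(beta, gradient, hessian):
    """One Newton-Raphson step on the log-likelihood, flagging a
    nearly singular Hessian.

    `gradient` and `hessian` are user-supplied callables returning the
    first- and second-derivative arrays of the log-likelihood at beta.
    """
    H = hessian(beta)
    kappa = condition_number(H)
    if kappa > 1e10:  # heuristic threshold; an assumption, adjust to taste
        raise np.linalg.LinAlgError(
            f"Hessian nearly singular (condition number {kappa:.2e})")
    return beta - np.linalg.solve(H, gradient(beta))
```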
Several common maximum likelihood estimators, among them probit, logit, tobit, Poisson regression, and the multiplicative heteroskedasticity model, have information matrices of a common form,
I(β) = X'WX, (12.25)

where W is a T × T diagonal weight matrix that is often a function of the unknown parameters, β, and the independent variables.
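As a concrete instance of (12.25), consider the logit model, where the diagonal weight for observation t is p_t(1 − p_t) with p_t = Λ(x_t'β). The sketch below (our illustration, not the chapter's) builds I(β) = X'WX without ever forming the T × T matrix W.

```python
import numpy as np

def logit_information(X, beta):
    """Information matrix I(beta) = X'WX for the logit model."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))  # p_t = Lambda(x_t' beta)
    w = p * (1.0 - p)                    # diagonal elements of W
    return (X * w[:, None]).T @ X        # X'WX via row-scaling of X
```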
The class of generalized linear models (McCullagh and Nelder, 1989) contains many of these estimators as special cases, and these models have information matrices of the form (12.25), so collinearity diagnostics for them are relevant. Weissfeld and Sereika (1991) explore the detection of collinearity in the class of generalized linear models (GLMs). Segerstedt and Nyquist (1992) observe that ill-conditioning in these models can be due to collinearity of the variables, X, to the influence of the weights, W, or to both. Weissfeld and Sereika suggest applying the BKW diagnostics to the scaled information matrix. Lee and Weissfeld (1996) do the same for the Cox regression model. Once again, while the variance decompositions can be computed in these instances, their interpretation is not straightforward, since collinearity can be due to the weights, W.
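One plausible reading of the Weissfeld and Sereika suggestion, sketched below under our own assumptions: scale the information matrix to unit diagonal, then compute BKW-style condition indexes and variance-decomposition proportions from its eigendecomposition. The mechanics are routine; as the text notes, interpreting the proportions is not, because W contributes to the conditioning.

```python
import numpy as np

def scaled_information_diagnostics(info):
    """Condition indexes and variance proportions from a scaled
    information matrix (a BKW-style decomposition; our sketch)."""
    d = np.sqrt(np.diag(info))
    S = info / np.outer(d, d)           # scale to unit diagonal
    lam, V = np.linalg.eigh(S)          # eigenvalues in ascending order
    cond_idx = np.sqrt(lam[-1] / lam)   # condition indexes
    phi = V**2 / lam                    # phi[j, k] = v_jk^2 / lambda_k
    pi = phi / phi.sum(axis=1, keepdims=True)  # proportions per coefficient
    return cond_idx, pi
```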
Lesaffre and Marx (1993) also investigate the problem of ill-conditioning in GLMs, taking a slightly different approach. Following Mackinnon and Puterman (1989), they suggest that only the columns of X be standardized to unit length, forming X1. Conditioning diagnostics are then computed on X1'MX1, where M is the estimated weight matrix based on the rescaled data. The square root of the ratio of the largest to the smallest eigenvalue describes the worst relative precision with which linear combinations of the parameters can be estimated; thus this scaling gives a structural interpretation to the conditioning diagnostic. One problem with the scaling is that X1'MX1 can be ill-conditioned because of the effects of M.
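A sketch of this scaling under an assumed logit weight function (the weight function and the use of a given parameter estimate are our illustrative assumptions; Lesaffre and Marx work with general GLM weights):

```python
import numpy as np

def lesaffre_marx_condition(X, beta):
    """Square root of the eigenvalue ratio of X1'MX1, where X1 has
    unit-length columns and M holds (assumed logit) weights estimated
    from the rescaled data."""
    X1 = X / np.linalg.norm(X, axis=0)    # columns scaled to unit length
    p = 1.0 / (1.0 + np.exp(-X1 @ beta))  # fitted probabilities on rescaled data
    m = p * (1.0 - p)                     # diagonal elements of M
    A = (X1 * m[:, None]).T @ X1          # X1' M X1
    lam = np.linalg.eigvalsh(A)           # eigenvalues in ascending order
    return np.sqrt(lam[-1] / lam[0])      # worst relative precision measure
```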