A COMPANION TO Theoretical Econometrics
Other Diagnostic Issues and Tools
Several further issues arise in diagnosing collinearity, and other diagnostic tools are available. In this section we summarize some of them.
Eigenvalue magnitudes are affected by the scale of the data. There is wide agreement that the X matrix should be scaled before analyzing collinearity, and scaling the columns of X to unit length is standard. A much more hotly debated issue, chronicled in Belsley (1984), is whether the data should be centered, and then scaled, prior to collinearity diagnosis. If the data are centered, by subtracting the mean, the origin is translated so that the regression, in terms of the centered data, has a y-intercept of zero. The least squares estimates of slopes are unaffected by centering. The least squares estimate of the intercept itself can be obtained after the slopes are estimated, as $b_1 = \bar{y} - b_2\bar{x}_2 - \cdots - b_K\bar{x}_K$. So nothing is really gained, or lost, by centering. Let $X_c$ be the X matrix after centering, scaling to unit length, and deleting the first column of zeros. Then $X_c'X_c = R_c$ is the regressor correlation matrix.
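The algebra in the preceding paragraph can be checked numerically. The following is a minimal sketch, not part of the original discussion: the simulated data, the seed, the coefficient values, and all variable names are illustrative assumptions. It verifies that centering leaves the slope estimates unchanged, that the intercept can be recovered from the sample means, and that the cross-product of the centered, unit-length regressors equals their correlation matrix.

```python
import numpy as np

# Simulated data (an assumption made purely for illustration).
rng = np.random.default_rng(0)
T = 50
x2 = rng.normal(10.0, 2.0, T)
x3 = 0.5 * x2 + rng.normal(0.0, 1.0, T)
X = np.column_stack([np.ones(T), x2, x3])      # first column is the constant
y = 1.0 + 2.0 * x2 - 1.0 * x3 + rng.normal(0.0, 1.0, T)

# Least squares on the original (uncentered) data.
b = np.linalg.lstsq(X, y, rcond=None)[0]

# Least squares on centered data: no intercept column is needed.
Z = X[:, 1:] - X[:, 1:].mean(axis=0)
yc = y - y.mean()
slopes = np.linalg.lstsq(Z, yc, rcond=None)[0]

# Recover the intercept from the means: b1 = ybar - b2*x2bar - ... - bK*xKbar.
b1 = y.mean() - X[:, 1:].mean(axis=0) @ slopes

# Scale the centered columns to unit length; Xc'Xc is then the
# correlation matrix of the regressors.
Xc = Z / np.linalg.norm(Z, axis=0)
Rc = Xc.T @ Xc

print(np.allclose(slopes, b[1:]), np.isclose(b1, b[0]))
print(np.allclose(Rc, np.corrcoef(X[:, 1:], rowvar=False)))
```

Both printed checks should come out true up to floating-point tolerance, whatever data are simulated, because the identities are algebraic rather than statistical.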
The "pro-centering" point of view is summarized by Stewart (1987, p. 75), who suggests that the constant term is rarely of interest and its inclusion "masks" the real variables. "Centering simply shows the variable for what it is." The "anti-centering" viewpoint is based on several points. First, as a practical matter, centering lowers the condition number of the data (Belsley, 1991, p. 189), usually by a large amount, and thus makes it an unreliable diagnostic. Second, and more importantly, centering the data makes it impossible to identify collinearities caused by linear combinations of explanatory variables which exhibit little variation. If a variable, or a linear combination of variables, exhibits little variation, then it will be "collinear" with the constant term, the column of 1s in the first column of X. That is, suppose $a_2 x_{t2} + a_3 x_{t3} + \cdots + a_K x_{tK} \approx a$, where $a$ is a constant. If $x_{t1} = 1$, then $a_2 x_{t2} + a_3 x_{t3} + \cdots + a_K x_{tK} - a x_{t1} \approx 0$.
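A small numerical sketch may make the anti-centering argument concrete. Everything in it is an illustrative assumption: the simulated data, the seed, and the helper name `condition_number`, which here means the ratio of the largest to the smallest singular value of the data matrix after its columns are scaled to unit length. One regressor is constructed to vary very little around a large mean, so it is nearly collinear with the column of 1s.

```python
import numpy as np

def condition_number(X):
    """Ratio of largest to smallest singular value of X after
    scaling each column to unit length (hypothetical helper)."""
    Xs = X / np.linalg.norm(X, axis=0)
    s = np.linalg.svd(Xs, compute_uv=False)   # sorted in descending order
    return s[0] / s[-1]

# Simulated data (assumed for illustration only).
rng = np.random.default_rng(1)
T = 100
x2 = 100.0 + rng.normal(0.0, 0.1, T)   # little variation: nearly a constant
x3 = rng.normal(0.0, 1.0, T)           # unrelated regressor
X = np.column_stack([np.ones(T), x2, x3])

# Scaled but not centered: the near-dependency with the constant is visible.
kappa_uncentered = condition_number(X)

# Centered, constant column dropped, then scaled: the dependency disappears.
Z = X[:, 1:] - X[:, 1:].mean(axis=0)
kappa_centered = condition_number(Z)

print(kappa_uncentered, kappa_centered)
```

In this construction the uncentered condition number is very large while the centered one is close to 1, even though the near-dependency between x2 and the constant is still present in the original data.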
The pro-centering view is that the constant term is not interesting, and therefore such linear dependencies are not important. The anti-centering group notes that such a collinear relationship affects not only the intercept but also the coefficients of the variables in the collinear relationship, whether or not the intercept is of theoretical importance.
We fall squarely into the anti-centering camp. The data should be scaled to unit length, but not centered, prior to examining collinearity diagnostics. The interested reader should see Belsley (1984), including the comments, for the complete, lively debate.