The Variance Decomposition of Belsley, Kuh, and Welsch (1980)
A property of eigenvalues is that tr(X'X) = Σ_{k=1}^K λ_k. This implies that the sizes of the eigenvalues are determined in part by the scaling of the data. Data matrices whose columns have large magnitudes will have large eigenvalues, and those with small magnitudes will have small eigenvalues, quite apart from any relationships among the variables. To remove this scale effect, BKW recommend that each column of X be scaled to unit length before the diagnostics are computed.
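The trace property, and the effect of scaling on the eigenvalues, can be checked numerically. Below is a minimal sketch using NumPy; the matrix X is an arbitrary illustrative example, not data from BKW.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))

eigvals = np.linalg.eigvalsh(X.T @ X)    # eigenvalues of X'X
print(np.trace(X.T @ X), eigvals.sum())  # the two quantities agree

X[:, 0] *= 1000                          # rescale one column...
print(np.linalg.eigvalsh(X.T @ X))       # ...and the eigenvalues change with it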
Step 1
Begin by identifying large condition indexes. Associated with each large condition index is a small eigenvalue and a near exact linear dependency among the columns of X. BKW's experiments lead them to the general guidelines that indexes in the range 0-10 indicate weak near dependencies, 10-30 moderately strong near dependencies, 30-100 a strong near dependency, and indexes in excess of 100 very strong near dependencies. Thus, when examining condition indexes, values of 30 and higher should immediately attract attention.
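As an illustration of Step 1, the following sketch computes condition indexes with NumPy, assuming a data matrix X whose columns are first scaled to unit length as BKW recommend; the cutoff of 30 simply follows the guideline above.

import numpy as np

def condition_indexes(X):
    # Scale each column of X to unit length, then compute the BKW condition
    # indexes eta_k = s_max / s_k from the singular values of the scaled
    # matrix (each s_k is the square root of an eigenvalue of X'X).
    Xs = X / np.linalg.norm(X, axis=0)
    s = np.linalg.svd(Xs, compute_uv=False)
    return s.max() / s

# eta = condition_indexes(X)
# print(eta[eta >= 30])   # indexes warranting immediate attention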
Step 2 (if there is a single large condition index)
Examine the variance-decomposition proportions. If there is a single large condition index, indicating one near dependency associated with one small eigenvalue, collinearity adversely affects estimation when two or more coefficients each have 50 percent or more of their variance associated with that condition index, in the last row of Table 12.1. The variables involved in the near dependency are those whose coefficients have large variance proportions in that row.
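The entries of a table like Table 12.1 can be computed as in the sketch below, which assumes the unit-length scaling described earlier. Here pi[j, k] is the proportion of var(b_k) associated with the jth condition index, and the rows are ordered so that the largest condition index appears last, matching the table's layout.

import numpy as np

def variance_decomposition(X):
    # Rows of pi correspond to condition indexes (largest last), columns to
    # coefficients: pi[j, k] = (v_kj^2 / lambda_j) / sum_i (v_ki^2 / lambda_i),
    # where V holds the eigenvectors of X'X and lambda_j = s_j^2.
    Xs = X / np.linalg.norm(X, axis=0)
    U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    eta = s.max() / s                       # condition indexes
    phi = (Vt.T ** 2) / s ** 2              # phi[k, j] = v_kj^2 / lambda_j
    pi = (phi / phi.sum(axis=1, keepdims=True)).T
    order = np.argsort(eta)                 # ascending, largest index last
    return eta[order], pi[order]

# With a single large condition index, apply the 50 percent rule to the last row:
# eta, pi = variance_decomposition(X)
# involved = pi[-1] >= 0.5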
Step 2 (if there are two or more large condition indexes of relatively equal magnitude)
If there are J ≥ 2 large and roughly equal condition indexes, then X'X has J eigenvalues that are near zero and there are J near exact linear dependencies among the columns of X. Since the J corresponding eigenvectors span the space containing the coefficients of the true linear dependencies, the "50 percent rule" for identifying the variables involved in the near dependencies must be modified.
If there are two (or more) small eigenvalues, then we have two (or more) near exact linear relations, such as Xci ≈ 0 and Xcj ≈ 0. These two relationships do not necessarily indicate the form of the linear dependencies, since X(a1ci + a2cj) ≈ 0 as well. In this case the two vectors of constants ci and cj define a two-dimensional vector space in which the two near exact linear dependencies exist. While we may not be able to identify the individual relationships among the explanatory variables that are causing the collinearity, we can identify the variables that appear in the two (or more) relations.
Thus variance proportions in a single row do not identify specific linear dependencies, as they did when there was but one large condition index. In this case, sum the variance proportions across the J rows of Table 12.1 with large condition indexes. The variables involved in the (set of) near linear dependencies are identified by summed variance proportions greater than 50 percent.
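Continuing the sketch above, the modified rule can be applied by summing the proportions over the J rows with large condition indexes; the cutoff of 30 is the guideline value from Step 1 and is an assumption here, not a hard rule.

import numpy as np

def involved_variables(eta, pi, cutoff=30.0):
    # Sum the variance proportions over the last J rows (the J large
    # condition indexes) and flag coefficients whose summed proportions
    # exceed 50 percent.
    J = int(np.sum(eta >= cutoff))
    if J == 0:
        return np.zeros(pi.shape[1], dtype=bool)  # no large indexes, nothing flagged
    summed = pi[-J:].sum(axis=0)
    return summed > 0.5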
Step 2 (if there are J ≥ 2 large condition indexes, with one extremely large)
An extremely large condition index, arising from a very small eigenvalue, can "mask" the variables involved in other near exact linear dependencies. For example, if one condition index is 500 and another is 50, then there are two near exact linear dependencies among the columns of X. However, the variance decompositions associated with the condition index of 50 may not indicate that there are two or more variables involved in a relationship. Identify the variables involved in the set of near linear dependencies by summing the coefficient variance proportions in the last J rows of Table 12.1, and locating the sums greater than 50 percent.
Step 3
Perhaps the most important step in the diagnostic process is determining which coefficients are not affected by collinearity. If there is a single large condition index, coefficients with variance proportions of less than 50 percent in the last row of Table 12.1 are not adversely affected by the collinear relationship in the data. If there are J ≥ 2 large condition indexes, then sum the variance proportions in the last J rows. Coefficients with summed variance proportions of less than 50 percent are not adversely affected by the collinear relationships. If the coefficients of interest are unaffected by collinearity, then small eigenvalues and large condition indexes are not a problem.
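Under the assumptions of the sketches above, Step 3 is simply the complement of the flagged set: a short usage example, given a data matrix X as before.

eta, pi = variance_decomposition(X)
unaffected = ~involved_variables(eta, pi)  # summed proportions below 50 percent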
Step 4
If key parameter estimates are adversely affected by collinearity, further diagnostic steps may be taken. If there is a single large condition index the variance proportions identify the variables involved in the near dependency. If there are multiple large condition indexes, auxiliary regressions may be used to further study the nature of the relationships between the columns of X. In these regressions one variable in a near dependency is regressed upon the other variables in the identified set. The usual t-statistics may be used as diagnostic tools to determine which variables are involved in specific linear dependencies. See Belsley (1991, p. 144) for suggestions. Unfortunately, these auxiliary regressions may also be confounded by collinearity, and thus they may not be informative.
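A minimal sketch of such an auxiliary regression follows, assuming the variables identified in the earlier steps; the column indexes dep and others are hypothetical, and OLS is computed directly with NumPy rather than with a regression package. As the text cautions, if the regressors in Z are themselves collinear, these t-statistics may be uninformative.

import numpy as np

def auxiliary_regression(X, dep, others):
    # Regress column `dep` of X on the columns listed in `others` and return
    # the OLS coefficients with their t-statistics (no intercept is added).
    y, Z = X[:, dep], X[:, others]
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    e = y - Z @ b                                 # residuals
    n, p = Z.shape
    s2 = e @ e / (n - p)                          # residual variance
    se = np.sqrt(s2 * np.diag(np.linalg.inv(Z.T @ Z)))
    return b, b / se

# b, t = auxiliary_regression(X, dep=0, others=[1, 2])
# Large |t| values point to variables entering the specific dependency.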