Advanced Econometrics Takeshi Amemiya
Optimal Significance Level
In the preceding sections we have considered the problem of choosing between equations in which there is no explicit relationship between the competing regressor matrices X₁ and X₂. In a special case where one set of regressors is contained in the other set, the choice of an equation becomes equivalent to the decision of accepting or rejecting a linear hypothesis on the parameters of the broader model. Because the acceptance or rejection of a hypothesis critically depends on the significance level chosen, the problem is that of determining the optimal significance level (or, equivalently, the optimal critical value) of the F test according to some criterion. We shall present the gist of the results obtained in a few representative papers on this topic and then shall explain the connection between these results and the foregoing discussion on modifications of R².
Let the broader of the competing models be
\[ y = X\beta + u = X_1\beta_1 + X_2\beta_2 + u, \tag{2.1.17} \]
where X₁ and X₂ are T × K₁ and T × K₂ matrices, respectively, and for which we assume u is normal so that model (2.1.17) is the same as Model 1 with normality. Note that the X₂ here has no relationship with the X₂ that appears in Eq. (2.1.2). Suppose we suspect β₂ might be 0 and test the hypothesis β₂ = 0 by the F test developed in Section 1.5.2. The appropriate test statistic is obtained by putting β₂ = 0 in Eq. (1.5.19) as
\[ \eta = \frac{(y'M_1y - y'My)/K_2}{y'My/(T-K)}, \tag{2.1.18} \]
where M = I − X(X′X)⁻¹X′ and M₁ = I − X₁(X₁′X₁)⁻¹X₁′.
The researcher first sets the critical value d and then chooses the model (2.1.17) if η ≥ d or the constrained model
\[ y = X_1\beta_1 + u \tag{2.1.19} \]
if η < d.
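This decision rule is easy to carry out numerically. The following sketch (simulated data; the variable names, sample size, and the particular choice of d are purely illustrative) computes the F statistic (2.1.18) from the residual sums of squares of the two models and applies the rule:

```python
import numpy as np

def f_stat_nested(y, X1, X2):
    """F statistic (2.1.18) for testing beta2 = 0 in y = X1*b1 + X2*b2 + u."""
    X = np.hstack([X1, X2])
    T, K = X.shape
    K2 = X2.shape[1]
    # Residual sums of squares of the broad model (2.1.17) and of (2.1.19)
    rss_full = np.sum((y - X @ np.linalg.lstsq(X, y, rcond=None)[0]) ** 2)
    rss_restr = np.sum((y - X1 @ np.linalg.lstsq(X1, y, rcond=None)[0]) ** 2)
    return ((rss_restr - rss_full) / K2) / (rss_full / (T - K))

rng = np.random.default_rng(0)
T = 100
X1 = rng.standard_normal((T, 3))
X2 = rng.standard_normal((T, 2))
y = X1 @ np.array([1.0, -0.5, 0.3]) + rng.standard_normal(T)  # beta2 = 0 is true

eta = f_stat_nested(y, X1, X2)
d = 2.0  # an illustrative critical value
chosen = "broad model (2.1.17)" if eta >= d else "constrained model (2.1.19)"
```

Since the regressors of (2.1.19) are a subset of those of (2.1.17), rss_restr ≥ rss_full always holds, so η is nonnegative by construction.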
Conventionally, the critical value d is determined rather arbitrarily in such a way that P(η ≥ d) evaluated under the null hypothesis equals a preassigned significance level such as 1 or 5 percent. We shall consider a decision-theoretic determination of d. For that we must first specify the risk function. The decision of the researcher who chooses between models (2.1.17) and (2.1.19) on the basis of the F statistic η may be interpreted as a decision to estimate β by the estimator β̂ defined as
\[ \hat\beta = \begin{cases} \tilde\beta & \text{if } \eta \ge d \\ (\bar\beta_1',\, 0')' & \text{if } \eta < d, \end{cases} \tag{2.1.20} \]
where β̃ is the least squares estimator applied to (2.1.17) and β̄₁ is that applied to (2.1.19). Thus it seems reasonable to adopt the mean squared error matrix Ω = E(β̂ − β)(β̂ − β)′, where the expectation is taken under (2.1.17), as our risk (or expected loss) function. However, Ω is not easy to work with directly because it depends on many variables and parameters, namely, X, β, σ², K, K₁, and d, in addition to having the fundamental difficulty of being a matrix. (For the derivation of Ω, see Sawa and Hiromatsu, 1973, or Farebrother, 1975.) Thus people have worked with simpler risk functions.
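Although Ω is hard to characterize analytically, its behavior is easy to explore by Monte Carlo. The sketch below (all names and parameter values are illustrative assumptions, not from the text) simulates the pre-test estimator (2.1.20) and estimates a scalar summary of its risk, the trace of the mean-squared-error matrix, for several critical values d:

```python
import numpy as np

rng = np.random.default_rng(1)
T, K1, K2 = 50, 2, 1
K = K1 + K2
X = rng.standard_normal((T, K))  # fixed regressors; first K1 columns form X1
X1 = X[:, :K1]
beta = np.array([1.0, -1.0, 0.2])  # beta2 = 0.2: small but nonzero

def pretest_estimator(y, d):
    b_full = np.linalg.lstsq(X, y, rcond=None)[0]
    b1 = np.linalg.lstsq(X1, y, rcond=None)[0]
    rss_full = np.sum((y - X @ b_full) ** 2)
    rss_restr = np.sum((y - X1 @ b1) ** 2)
    eta = ((rss_restr - rss_full) / K2) / (rss_full / (T - K))
    # (2.1.20): keep the broad-model estimate if eta >= d, else set beta2-hat to 0
    return b_full if eta >= d else np.concatenate([b1, np.zeros(K2)])

def trace_mse(d, reps=2000):
    """Monte Carlo estimate of tr(Omega) = E||beta-hat - beta||^2."""
    total = 0.0
    for _ in range(reps):
        y = X @ beta + rng.standard_normal(T)
        b = pretest_estimator(y, d)
        total += np.sum((b - beta) ** 2)
    return total / reps

risk = {d: trace_mse(d) for d in (0.0, 1.0, 2.0, 4.0)}
# d = 0 always keeps the broad model; larger d favors the constrained model
```

Plotting risk against d for various true values of β₂ reproduces the familiar trade-off that motivates the search for an optimal critical value.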
Sawa and Hiromatsu (1973) chose as their risk function the largest characteristic root of
\[ [Q'(X'X)^{-1}Q]^{-1/2}\, Q'\Omega Q\, [Q'(X'X)^{-1}Q]^{-1/2}, \tag{2.1.21} \]
where Q′ = (0, I), 0 being the K₂ × K₁ matrix of zeros and I the identity matrix of size K₂. This transformation of Ω lacks a strong theoretical justification and is used primarily for mathematical convenience. Sawa and Hiromatsu applied the minimax regret strategy to the risk function (2.1.21) and showed that in the special case K₂ = 1, d = 1.88 is optimal for most reasonable values of T − K. Brook (1976) applied the minimax regret strategy to a different transformation of Ω,
\[ \operatorname{tr} X'X\Omega, \tag{2.1.22} \]
and recommended d = 2 on the basis of his results. The risk function (2.1.22) seems more reasonable than (2.1.21) because it is more closely related to the mean squared prediction error (see Section 1.6). At any rate, the conclusions of these two articles are similar.
Now on the basis of these results we can evaluate the criteria discussed in the previous subsections by asking what critical value is implied by each criterion in a situation where the set of regressors of one model is contained in that of the other model. We must choose between models (2.1.17) and (2.1.19). For each criterion, let ρ denote the ratio of the value of the criterion for model (2.1.19) to that for model (2.1.17). Then, using (2.1.18), we can easily establish a relationship between η and ρ. For Theil's criterion we have from (2.1.6)
\[ \rho(\text{Theil}) = \frac{T - K + K_2\eta}{T - K_1}. \tag{2.1.23} \]
Therefore we obtain the well-known result that Theil's criterion selects (2.1.19) over (2.1.17) if and only if η < 1. Thus, compared with the optimal critical values suggested by Brook or by Sawa and Hiromatsu, Theil's criterion imposes far less of a penalty upon the inclusion of regressors. From the prediction criterion (2.1.13) we get
\[ \rho(\text{PC}) = \frac{(T - K)(T + K_1)}{(T - K_1)(T + K)}\left[1 + \frac{(K - K_1)\eta}{T - K}\right]. \tag{2.1.24} \]
Therefore
\[ \rho(\text{PC}) > 1 \quad \text{if and only if} \quad \eta > \frac{2T}{T + K_1}. \tag{2.1.25} \]
Table 2.2 gives the values of 2T/(T + K₁) for a few selected values of K₁/T. These values are close to the values recommended by Brook and by Sawa and Hiromatsu. The optimal critical value of the F test implied by the AIC can be easily computed for various values of K₁/T and K/T from (2.1.16) and (2.1.18). The critical value for the AIC is very close to that for the PC, although it is slightly smaller.

Table 2.2  Optimal critical value of the F test implied by PC

  K₁/T           1/10    1/20    1/30
  2T/(T + K₁)    1.82    1.90    1.94

Finally, for the MC we have from (2.1.15) and (2.1.18) that
\[ \rho(\text{MC}) > 1 \quad \text{if and only if} \quad \eta > 2. \]
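The implied critical values derived above (1 for Theil's criterion, 2T/(T + K₁) for the PC, and 2 for the MC) can be put side by side in a few lines of code; the function name is of course only illustrative:

```python
def implied_critical_values(T, K1):
    """Critical value of the F statistic eta above which each criterion
    selects the broad model (2.1.17) over the constrained model (2.1.19)."""
    return {
        "Theil": 1.0,             # rho(Theil) > 1  iff  eta > 1
        "PC": 2 * T / (T + K1),   # rho(PC) > 1     iff  eta > 2T/(T + K1)
        "MC": 2.0,                # rho(MC) > 1     iff  eta > 2
    }

# Reproduce the PC column of Table 2.2: K1/T = 1/10, 1/20, 1/30
table = {f"1/{m}": round(implied_critical_values(10 * m, 10)["PC"], 2)
         for m in (10, 20, 30)}
```

Laying the thresholds out this way makes the comparison in the text concrete: Theil's criterion is markedly more permissive toward additional regressors than either the PC or the MC.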
These results give some credence to the proposition that the modified R² proposed here is preferred to Theil's corrected R² as a measure of the goodness of fit. However, the reader should take this conclusion with a grain of salt for several reasons: (1) None of the criteria discussed in the previous subsections is derived from completely justifiable assumptions. (2) The results in the literature on the optimal significance level are derived from the somewhat questionable principle of minimizing the maximum regret. (3) The results in the literature on the optimal significance level are relevant to a comparison of the criteria considered in the earlier subsections only to the extent that one set of regressors is contained in the other set. The reader should be reminded again that a measure of the goodness of fit is merely one of the many things to be considered in the whole process of choosing a regression equation.