Advanced Econometrics Takeshi Amemiya
Stein’s Estimator: Heteroscedastic Case
Assume model (2.2.5), where $\Lambda$ is a general positive definite diagonal matrix. Two estimators for this case can be defined.
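Ridge estimator:
$$\alpha^* = (\Lambda + \gamma I)^{-1}\Lambda\hat\alpha,$$
$$\alpha_i^* = (1 - B_i)\hat\alpha_i, \quad \text{where } B_i = \frac{\gamma}{\lambda_i + \gamma}.$$
(Note: The transformation $\alpha = H'\beta$ translates this estimator into the ridge estimator (2.2.12). $\gamma$ is either a constant or a function of the sample.)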
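Generalized ridge estimator:
$$\alpha^* = (\Lambda + \Gamma)^{-1}\Lambda\hat\alpha, \quad \text{where } \Gamma \text{ is diagonal},$$
$$\alpha_i^* = (1 - B_i)\hat\alpha_i, \quad \text{where } B_i = \frac{\gamma_i}{\lambda_i + \gamma_i},$$
$$\beta^* = (X'X + H\Gamma H')^{-1}X'y.$$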
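The equivalence between the canonical form $\alpha^* = (\Lambda + \Gamma)^{-1}\Lambda\hat\alpha$ and the expression $\beta^* = (X'X + H\Gamma H')^{-1}X'y$ can be checked numerically. The following is a minimal sketch, assuming that $H$ and $\Lambda$ come from the spectral decomposition $X'X = H\Lambda H'$ and that $\hat\alpha = H'\hat\beta$ with $\hat\beta$ the least squares estimator; the function names, the simulated data, and the values of $\gamma_i$ are illustrative and not from the text.

```python
import numpy as np

def generalized_ridge_canonical(X, y, gammas):
    """Shrink each canonical coordinate: alpha*_i = (1 - B_i) * alpha_hat_i."""
    # Assumed decomposition X'X = H Lambda H' (H orthogonal, Lambda diagonal).
    lam, H = np.linalg.eigh(X.T @ X)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # least squares estimator
    alpha_hat = H.T @ beta_hat                    # alpha_hat = H' beta_hat
    B = gammas / (lam + gammas)                   # shrinkage factors B_i
    alpha_star = (1.0 - B) * alpha_hat
    return H @ alpha_star, B                      # map back to beta coordinates

def generalized_ridge_direct(X, y, gammas):
    """Direct formula beta* = (X'X + H Gamma H')^{-1} X'y."""
    lam, H = np.linalg.eigh(X.T @ X)
    return np.linalg.solve(X.T @ X + H @ np.diag(gammas) @ H.T, X.T @ y)

# Illustrative data (hypothetical, for the check only).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(size=50)

gammas = np.array([0.5, 1.0, 2.0, 4.0])
beta_canon, B = generalized_ridge_canonical(X, y, gammas)
beta_direct = generalized_ridge_direct(X, y, gammas)
print(np.allclose(beta_canon, beta_direct))

# Setting Gamma = gamma * I gives the ordinary ridge estimator (X'X + gamma I)^{-1} X'y.
gamma = 1.5
beta_ridge, _ = generalized_ridge_canonical(X, y, np.full(4, gamma))
beta_ridge_direct = np.linalg.solve(X.T @ X + gamma * np.eye(4), X.T @ y)
print(np.allclose(beta_ridge, beta_ridge_direct))
```

Both functions reuse the same eigendecomposition, so the ordering of the $\gamma_i$ matches the ordering of the $\lambda_i$ returned by `np.linalg.eigh`.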
Other ridge and generalized ridge estimators have been proposed by various authors. In the three following ridge estimators, $\gamma$ is a positive quantity that does not depend on $\lambda_i$; therefore $B_i = \gamma/(\lambda_i + \gamma)$ decreases as $\lambda_i$ increases. This is an intuitively appealing property because it seems reasonable to shrink the component with the larger variance more (a smaller $\lambda_i$ corresponds to a larger variance of $\hat\alpha_i$). In the four following generalized ridge estimators, exactly the opposite takes place: The amount of shrinkage $B_i$ is an increasing function of $\lambda_i$, an undesirable property. In some of the estimators, $\sigma^2$ appears in the formula, and in some, its estimate $\hat\sigma^2$, which is assumed to be independent of $\hat\alpha$, appears. As pointed out by Efron and Morris (1976), the fundamental properties of Stein’s estimators are not changed if $\sigma^2$ is independently estimated.
Selected Estimators and Their Properties
Ridge Estimators
Ridge 1 (Sclove, 1973)
$$\gamma = \frac{\hat\sigma^2 \operatorname{tr}\Lambda}{\hat\alpha'\Lambda\hat\alpha}.$$
Ridge 2 (Hoerl, Kennard, and Baldwin, 1975) and Modified Ridge 2 (Thisted, 1976)
$$\gamma = \frac{K\hat\sigma^2}{\hat\alpha'\hat\alpha}.$$
This estimator is obtained by putting $\Lambda = I$ in Sclove’s estimator. Although the authors claimed good properties for it on the basis of a Monte Carlo study, Thisted (1976) showed that it can sometimes be far inferior to the maximum likelihood estimator $\hat\alpha$; he proposed a modification, $\gamma = (K - 2)\hat\sigma^2/\hat\alpha'\hat\alpha$, and showed that the modified estimator is minimax for some $\Lambda$ if $\sigma^2$ is known.
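The data-dependent choices of $\gamma$ above can be computed directly from $\hat\alpha$, $\Lambda$, and $\hat\sigma^2$. The following is a minimal sketch, assuming Sclove's $\gamma$ is read as $\hat\sigma^2 \operatorname{tr}\Lambda / \hat\alpha'\Lambda\hat\alpha$ (a reading consistent with the statement that putting $\Lambda = I$ yields Ridge 2); the function names and the numerical inputs are hypothetical.

```python
import numpy as np

def gamma_sclove(alpha_hat, lam, sigma2_hat):
    # Assumed reading of Ridge 1: sigma2_hat * tr(Lambda) / (alpha_hat' Lambda alpha_hat).
    return sigma2_hat * lam.sum() / (alpha_hat @ (lam * alpha_hat))

def gamma_hkb(alpha_hat, sigma2_hat):
    # Ridge 2: K * sigma2_hat / (alpha_hat' alpha_hat).
    return alpha_hat.size * sigma2_hat / (alpha_hat @ alpha_hat)

def gamma_thisted_modified(alpha_hat, sigma2_hat):
    # Modified Ridge 2: (K - 2) * sigma2_hat / (alpha_hat' alpha_hat).
    return (alpha_hat.size - 2) * sigma2_hat / (alpha_hat @ alpha_hat)

def shrinkage(lam, gamma):
    # B_i = gamma / (lambda_i + gamma), with one common gamma for all i.
    return gamma / (lam + gamma)

# Hypothetical inputs, with lambda_i listed in increasing order.
alpha_hat = np.array([2.0, -1.0, 0.5, 1.5])
lam = np.array([0.5, 1.0, 2.0, 4.0])
sigma2_hat = 1.0

for name, g in [("Ridge 1", gamma_sclove(alpha_hat, lam, sigma2_hat)),
                ("Ridge 2", gamma_hkb(alpha_hat, sigma2_hat)),
                ("Modified Ridge 2", gamma_thisted_modified(alpha_hat, sigma2_hat))]:
    print(name, g, shrinkage(lam, g))

B = shrinkage(lam, gamma_hkb(alpha_hat, sigma2_hat))
```

Because $\gamma$ is common to all components, each printed vector of $B_i$ falls as $\lambda_i$ rises, whichever rule supplies $\gamma$.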
Ridge 3 (Thisted, 1976)
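$$\gamma = \begin{cases} \dfrac{\hat\sigma^2}{\sum_{i=1}^{K} d_i\hat\alpha_i^2} & \text{if all } d_i < \infty, \\ 0 & \text{otherwise,} \end{cases}$$
where $d_i$ is a function of $\lambda_1, \ldots, \lambda_K$ (its defining expression is not legible in this copy).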
This estimator is minimax for all $\Lambda$ if $\sigma^2$ is known. If the $\lambda_i$ are constant, this estimator reduces to the modified version of Ridge 2. When the $\lambda_i$ are too spread out, however, it becomes indistinguishable from $\hat\alpha$ (which is minimax).
Generalized Ridge Estimators
Generalized Ridge 1 (Berger, 1975)
$$B_i = \frac{(K-2)\sigma^2\lambda_i}{\hat\alpha'\Lambda^2\hat\alpha}.$$
This estimator is minimax for all $\Lambda$ and reduces to Stein’s estimator when the $\lambda_i$ are constant.
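The contrast with the ridge estimators can be seen by evaluating the shrinkage factors directly. The following is a minimal sketch, assuming Berger's factor is read as $B_i = (K-2)\sigma^2\lambda_i / \hat\alpha'\Lambda^2\hat\alpha$; the function name and the inputs are hypothetical.

```python
import numpy as np

def berger_shrinkage(alpha_hat, lam, sigma2):
    # Assumed reading: B_i = (K - 2) * sigma2 * lambda_i / (alpha_hat' Lambda^2 alpha_hat).
    K = alpha_hat.size
    return (K - 2) * sigma2 * lam / (alpha_hat @ (lam**2 * alpha_hat))

# Hypothetical inputs, with lambda_i listed in increasing order.
alpha_hat = np.array([2.0, -1.0, 0.5, 1.5])
lam = np.array([0.5, 1.0, 2.0, 4.0])
sigma2 = 1.0

B = berger_shrinkage(alpha_hat, lam, sigma2)
print(B)  # B_i rises with lambda_i, i.e. with the precision of alpha_hat_i

# With all lambda_i equal to 1, every B_i equals the Stein factor
# (K - 2) * sigma2 / (alpha_hat' alpha_hat).
B_equal = berger_shrinkage(alpha_hat, np.ones(4), sigma2)
stein_factor = (alpha_hat.size - 2) * sigma2 / (alpha_hat @ alpha_hat)
print(B_equal, stein_factor)
```

Under this reading, $B_i$ is proportional to $\lambda_i$, so the components estimated with the smallest variance are shrunk the most.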
Generalized Ridge 2 (Berger, 1976)
$$B_i = \frac{\lambda_i f(\hat\alpha'\Lambda^2\hat\alpha)}{\hat\alpha'\Lambda^2\hat\alpha}.$$
Berger (1976) obtained conditions on $f$ under which it is minimax and admissible for all $\Lambda$.
Generalized Ridge 3 (Bhattacharya, 1966)
$B_i$ is complicated and therefore is not reproduced here, but it is an increasing function of $\lambda_i$ and is minimax for all $\Lambda$.
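Generalized Ridge 4 (Strawderman, 1978)
$$B_i = \frac{a\hat\sigma^2\lambda_i}{\hat\alpha'\Lambda\hat\alpha + g\hat\sigma^2 + h + a\hat\sigma^2\lambda_i}.$$
This estimator is minimax for all $\Lambda$ if
$$0 \le a \le \frac{1}{\lambda_{\max}}\left[\frac{2(K-2)}{n+2}\right], \quad g \ge \frac{2K}{n+2}, \quad h \ge 0.$$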
Results
All the generalized ridge estimators are minimax for all $\Lambda$, and Generalized Ridge 2 is also admissible, whereas among the ridge estimators only Thisted’s (which is, strictly speaking, not ridge because of a discontinuity) is minimax for all $\Lambda$.
Because $\hat\alpha$ is minimax with a constant risk, any other minimax estimator dominates $\hat\alpha$. However, the mere fact that an estimator dominates $\hat\alpha$ does not necessarily make the estimator good in its own right. If the estimator is admissible as well, like Berger’s Generalized Ridge 2, there is no other estimator that dominates it. Even that, however, is no guarantee of excellence because there may be an estimator (which may be neither minimax nor admissible) that has a lower risk over a wide range of the parameter space. It is nice to prove minimaxity and admissibility; however, we should look for other criteria of performance as well, such as whether the amount of shrinkage is proportional to the variance, the criterion in which all the generalized ridge estimators fail.
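The dominance claim can be illustrated by simulation. The following is a minimal Monte Carlo sketch under assumptions not spelled out in this passage: $\hat\alpha_i$ is drawn as normal with mean $\alpha_i$ and variance $\sigma^2/\lambda_i$, risk is expected squared error summed over components, $\sigma^2$ is known, and Berger's Generalized Ridge 1 is read as $B_i = (K-2)\sigma^2\lambda_i/\hat\alpha'\Lambda^2\hat\alpha$. The function name and parameter values are hypothetical.

```python
import numpy as np

def mc_risk(alpha, lam, sigma2=1.0, reps=20000, seed=0):
    """Monte Carlo risk of alpha_hat and of the assumed Generalized Ridge 1 rule."""
    rng = np.random.default_rng(seed)
    K = alpha.size
    # Assumed sampling model: alpha_hat_i ~ N(alpha_i, sigma2 / lambda_i), independent.
    alpha_hat = alpha + np.sqrt(sigma2 / lam) * rng.standard_normal((reps, K))
    S = (lam**2 * alpha_hat**2).sum(axis=1)      # alpha_hat' Lambda^2 alpha_hat
    B = (K - 2) * sigma2 * lam / S[:, None]      # shrinkage factors, one row per draw
    alpha_star = (1.0 - B) * alpha_hat
    risk_mle = ((alpha_hat - alpha) ** 2).sum(axis=1).mean()
    risk_gr1 = ((alpha_star - alpha) ** 2).sum(axis=1).mean()
    return risk_mle, risk_gr1

lam = np.arange(1.0, 7.0)                        # lambda = 1, ..., 6 (hypothetical)
risk_mle, risk_gr1 = mc_risk(np.zeros(6), lam)
print(risk_mle, (1.0 / lam).sum())               # simulated vs. constant risk sigma2 * tr(Lambda^{-1})
print(risk_gr1)                                  # simulated risk of the shrinkage rule at alpha = 0
```

A lower simulated risk at one parameter point is only consistent with minimaxity; it does not demonstrate it, and it says nothing about whether the pattern of shrinkage across components is sensible.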
The exact distributions of Stein’s or ridge estimators are generally hard to obtain. However, in many situations they may be well approximated by the jackknife and the bootstrap methods (see Section 4.3.4).
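As an illustration of the bootstrap idea, the following is a minimal sketch of a residual bootstrap for a ridge estimator with the Ridge 2 choice $\gamma = K\hat\sigma^2/\hat\alpha'\hat\alpha$. It is not necessarily the procedure described in Section 4.3.4; the resampling scheme, the function names, and the simulated data are assumptions made for the example.

```python
import numpy as np

def ridge_hkb(X, y):
    """Ridge estimator with gamma = K * sigma2_hat / (alpha_hat' alpha_hat)."""
    T, K = X.shape
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (T - K)
    # alpha_hat' alpha_hat equals beta_hat' beta_hat because H is orthogonal.
    gamma = K * sigma2_hat / (beta_hat @ beta_hat)
    return np.linalg.solve(X.T @ X + gamma * np.eye(K), X.T @ y)

def bootstrap_ridge(X, y, n_boot=500, seed=0):
    """Residual bootstrap draws of the ridge estimator (assumed scheme)."""
    rng = np.random.default_rng(seed)
    T, K = X.shape
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta_hat
    resid = resid - resid.mean()                 # center residuals before resampling
    draws = np.empty((n_boot, K))
    for b in range(n_boot):
        y_b = X @ beta_hat + rng.choice(resid, size=T, replace=True)
        draws[b] = ridge_hkb(X, y_b)             # gamma is re-estimated in each replication
    return draws

# Hypothetical data for the illustration.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=60)

draws = bootstrap_ridge(X, y)
print(draws.mean(axis=0))                        # bootstrap mean of each coefficient
print(np.percentile(draws, [2.5, 97.5], axis=0)) # percentile intervals
```

Re-estimating $\gamma$ inside each replication keeps the sample dependence of the shrinkage in the simulated distribution.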