Stein’s Estimator: Homoscedastic Case
Let us consider a special case of the canonical model (2.2.5) in which $\Lambda = I$. James and Stein (1961) showed that $E\|[1 - c(\hat\alpha'\hat\alpha)^{-1}]\hat\alpha - \alpha\|^2$ is minimized for all $\alpha$ when $c = (K - 2)\sigma^2$ if $\sigma^2$ is known and $K \geq 3$, where $\|x\|^2$ denotes the vector product $x'x$. If we define

Stein's estimator:
$$\alpha^* = \left[1 - \frac{(K - 2)\sigma^2}{\hat\alpha'\hat\alpha}\right]\hat\alpha,$$

the result of James and Stein implies in particular that Stein's estimator is uniformly better than the maximum likelihood estimator $\hat\alpha$ with respect to the risk function $E(\hat\alpha - \alpha)'(\hat\alpha - \alpha)$ if $K \geq 3$. In other words, $\hat\alpha$ is inadmissible (see the definition of inadmissibility in Section 2.1.2). The fact that $\hat\alpha$ is minimax with a constant risk (see Hodges and Lehmann, 1950) implies that $\alpha^*$ is minimax. This surprising result has had a great impact on the theory of statistics.
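As a purely illustrative sketch (not part of the text), the following Python fragment simulates $\hat\alpha \sim N(\alpha, \sigma^2 I)$ and compares the empirical risks of $\hat\alpha$ and $\alpha^*$; the function name `stein` and all simulation settings are assumptions made for the example.

```python
import numpy as np

def stein(alpha_hat, sigma2):
    """Stein's estimator alpha* = [1 - (K-2) sigma^2 / (a'a)] a, sigma^2 known."""
    K = alpha_hat.size
    return (1.0 - (K - 2) * sigma2 / (alpha_hat @ alpha_hat)) * alpha_hat

rng = np.random.default_rng(0)
K, sigma2, reps = 10, 1.0, 20000
alpha = np.ones(K)  # an arbitrary true mean vector; the dominance holds for every alpha
draws = alpha + rng.normal(0.0, np.sqrt(sigma2), size=(reps, K))

# Empirical risks E||estimator - alpha||^2 for the MLE and for Stein's estimator.
factors = 1.0 - (K - 2) * sigma2 / np.sum(draws ** 2, axis=1)
risk_mle = np.mean(np.sum((draws - alpha) ** 2, axis=1))
risk_js = np.mean(np.sum((factors[:, None] * draws - alpha) ** 2, axis=1))
print(risk_mle, risk_js)  # risk_js should come out below risk_mle since K >= 3
```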
Translated into the regression model, the result of James and Stein implies two facts:

(1) Consider the ridge estimator $(X'X + \gamma I)^{-1}X'y$ with $\gamma = (K - 2)\sigma^2/[\hat\beta'\hat\beta - (K - 2)\sigma^2]$. If $X'X = I$, it reduces precisely to Stein's estimator of $\beta$ because $\hat\beta = X'y \sim N(\beta, \sigma^2 I)$. Therefore this ridge estimator is uniformly better than the least squares estimator if $X'X = I$ (the opposite of multicollinearity).

(2) Assume a general $X'X$ in Model 1. If we define $A = (X'X)^{1/2}$, we have $A\hat\beta \sim N(A\beta, \sigma^2 I)$, where $\hat\beta$ is the least squares estimator. Applying Stein's estimator to $A\hat\beta$, we know $E\|(1 - B)A\hat\beta - A\beta\|^2 < E\|A(\hat\beta - \beta)\|^2$ for all $\beta$, where $B = (K - 2)\sigma^2/\hat\beta'X'X\hat\beta$. Therefore, equivalently, $(1 - B)\hat\beta$ is uniformly better than $\hat\beta$ with respect to the risk function $E(\hat\beta - \beta)'X'X(\hat\beta - \beta)$. Note that this is essentially the risk function we proposed in (1.6.10) in Section 1.6, where we discussed prediction.
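The following sketch, again with hypothetical names and simulated data, applies fact (2): the least squares estimator is shrunk by the factor $1 - B$ computed from $\hat\beta'X'X\hat\beta$.

```python
import numpy as np

def stein_regression(X, y, sigma2):
    """(1 - B) beta_hat with B = (K - 2) sigma^2 / (beta_hat' X'X beta_hat).

    Uniformly better than least squares under the weighted risk
    E(b - beta)' X'X (b - beta) of (1.6.10), assuming sigma^2 is known."""
    XtX = X.T @ X
    beta_hat = np.linalg.solve(XtX, X.T @ y)
    B = (X.shape[1] - 2) * sigma2 / (beta_hat @ XtX @ beta_hat)
    return (1.0 - B) * beta_hat

# Usage with simulated data (all settings illustrative).
rng = np.random.default_rng(1)
T, K, sigma2 = 50, 6, 1.0
X = rng.normal(size=(T, K))
y = X @ np.full(K, 0.5) + rng.normal(0.0, np.sqrt(sigma2), size=T)
print(stein_regression(X, y, sigma2))
```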
So far, we have assumed $\sigma^2$ is known. James and Stein showed that even when $\sigma^2$ is unknown, if $S$ is distributed independently of $\hat\alpha$ and as $\sigma^2\chi^2_n$, then $E\|[1 - cS(\hat\alpha'\hat\alpha)^{-1}]\hat\alpha - \alpha\|^2$ attains its minimum for all $\alpha$ and $\sigma^2$ at $c = (K - 2)/(n + 2)$ if $K \geq 3$. They also showed that $[1 - cS(\hat\alpha'\hat\alpha)^{-1}]\hat\alpha$ is uniformly better than $\hat\alpha$ if $0 < c < 2(K - 2)/(n + 2)$. In the regression model we can put $S = y'[I - X(X'X)^{-1}X']y$ because it is independent of $\hat\beta$ and distributed as $\sigma^2\chi^2_{T-K}$.
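When $\sigma^2$ is unknown, the residual sum of squares supplies $S$; here is a minimal sketch under the same illustrative conventions as above.

```python
import numpy as np

def stein_regression_unknown_var(X, y):
    """Stein shrinkage with sigma^2 unknown.

    S = y'[I - X(X'X)^{-1}X']y is independent of beta_hat and distributed
    as sigma^2 chi^2_{T-K}; the minimizing constant is c = (K-2)/(n+2)
    with n = T - K."""
    T, K = X.shape
    XtX = X.T @ X
    beta_hat = np.linalg.solve(XtX, X.T @ y)
    resid = y - X @ beta_hat
    S = resid @ resid
    c = (K - 2) / (T - K + 2)
    B = c * S / (beta_hat @ XtX @ beta_hat)
    return (1.0 - B) * beta_hat
```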
Efron and Morris (1972) interpreted Stein's estimator as an empirical Bayes estimator. Suppose $\hat\alpha \sim N(\alpha, \sigma^2 I)$, where $\sigma^2$ is known, and the prior distribution of $\alpha$ is $N(0, \sigma^2\gamma^{-1}I)$. Then the Bayes estimator is $E(\alpha\,|\,\hat\alpha) = (1 + \gamma)^{-1}\hat\alpha = (1 - B)\hat\alpha$, where $B = \gamma/(1 + \gamma)$. The marginal distribution of $\hat\alpha$ is $N(0, \sigma^2 B^{-1}I)$. Therefore $B\hat\alpha'\hat\alpha/\sigma^2 \sim \chi^2_K$. Because $E[(\chi^2_K)^{-1}] = (K - 2)^{-1}$ (see Johnson and Kotz, 1970a, vol. 1, p. 166), we have
$$E\left[\frac{(K - 2)\sigma^2}{\hat\alpha'\hat\alpha}\right] = B.$$
Thus we can use the term within the square brackets as an unbiased estimator of $B$, thereby leading to Stein's estimator.
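The identity $E[(K - 2)\sigma^2/\hat\alpha'\hat\alpha] = B$ can be checked by simulation; a sketch follows (all settings are arbitrary assumptions).

```python
import numpy as np

# Empirical Bayes model: alpha ~ N(0, sigma^2/gamma I), alpha_hat | alpha ~ N(alpha, sigma^2 I).
rng = np.random.default_rng(2)
K, sigma2, gamma, reps = 8, 1.0, 0.5, 200_000
alpha = rng.normal(0.0, np.sqrt(sigma2 / gamma), size=(reps, K))
alpha_hat = alpha + rng.normal(0.0, np.sqrt(sigma2), size=(reps, K))

lhs = np.mean((K - 2) * sigma2 / np.sum(alpha_hat ** 2, axis=1))
print(lhs, gamma / (1.0 + gamma))  # the sample mean approximates B = gamma/(1+gamma)
```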
It is important not to confuse Stein's result with the statement that $E(\alpha^* - \alpha)(\alpha^* - \alpha)' \le E(\hat\alpha - \alpha)(\hat\alpha - \alpha)'$ in the matrix sense. This inequality does not generally hold. Note that Stein's estimator shrinks each component of $\hat\alpha$ by the same factor $B$. If the amount of shrinkage for a particular component is large, the mean squared error of Stein's estimator for that component may well exceed that of the corresponding component of $\hat\alpha$, even though $E\|\alpha^* - \alpha\|^2 < E\|\hat\alpha - \alpha\|^2$. In view of this possibility, Efron and Morris (1972) proposed a compromise: limit the amount of shrinkage to a fixed amount for each component. In this way the maximum possible mean squared error for the individual components can be decreased, whereas, with luck, the sum of the mean squared errors will not be increased by very much.
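One simple way to realize this compromise in code is to cap the per-component shift; the rule below is a hypothetical rendering of the idea, not Efron and Morris's exact procedure.

```python
import numpy as np

def limited_shrinkage(alpha_hat, sigma2, max_shift):
    """Shrink as Stein's estimator does, but move no component of alpha_hat
    by more than max_shift (a tuning constant chosen by the user)."""
    K = alpha_hat.size
    B = (K - 2) * sigma2 / (alpha_hat @ alpha_hat)
    shift = np.clip(B * alpha_hat, -max_shift, max_shift)  # Stein's shift is B * alpha_hat
    return alpha_hat - shift
```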
Earlier we stated that the maximum likelihood estimator $\hat\alpha$ is inadmissible because it is dominated by Stein's estimator. A curious fact is that Stein's estimator itself is dominated by Stein's positive-rule estimator
$$\left[1 - \frac{(K - 2)\sigma^2}{\hat\alpha'\hat\alpha}\right]_+ \hat\alpha,$$
where $[x]_+$ denotes $\max(0, x)$. Hence Stein's estimator is inadmissible, as proved by Baranchik (1970). Efron and Morris (1973) showed that Stein's positive-rule estimator is also inadmissible but cannot be greatly improved upon.
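In code, the positive-rule modification is a one-line truncation of the shrinkage factor (an illustrative sketch, with hypothetical names as before).

```python
import numpy as np

def stein_positive_rule(alpha_hat, sigma2):
    """Positive-rule estimator: [1 - (K-2) sigma^2 / (a'a)]_+ a.

    Truncating the factor at zero prevents the estimator from reversing
    the sign of alpha_hat when the shrinkage factor would be negative."""
    K = alpha_hat.size
    factor = 1.0 - (K - 2) * sigma2 / (alpha_hat @ alpha_hat)
    return max(factor, 0.0) * alpha_hat
```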
We defined Stein’s estimator as the estimator obtained by shrinking a toward 0. Stein’s estimator can easily be modified in such a way that it shrinks a toward any other value. It is easy to show that
Stein’s modified estimator 1 — т-я^-т./я7—г (a — с) + с
L (a —c)'(a —c)J
is minimax for any constant vector c. If the stochastic quantity X-1lTa is chosen to be c, where 1 is the vector of ones, then the resulting estimator can be shown to be minimax for ATS 4.
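A sketch of the modified estimator, with the choice $c = K^{-1}(\mathbf{1}'\hat\alpha)\mathbf{1}$ shown in the usage lines; the function name and the data are illustrative assumptions.

```python
import numpy as np

def stein_toward(alpha_hat, sigma2, c):
    """Stein's modified estimator: shrink alpha_hat toward the vector c."""
    K = alpha_hat.size
    dev = alpha_hat - c
    return (1.0 - (K - 2) * sigma2 / (dev @ dev)) * dev + c

# Shrinking toward the common mean of the components (the text notes
# minimaxity of this data-dependent choice requires K >= 4).
alpha_hat = np.array([2.1, 1.7, 2.5, 1.9, 2.2])
c = np.full_like(alpha_hat, alpha_hat.mean())
print(stein_toward(alpha_hat, 1.0, c))
```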