Springer Texts in Business and Economics
The Best Predictor
a. The problem is to minimize $E[Y - h(X)]^2$ with respect to $h(X)$. Add and subtract $E(Y \mid X)$ to get
$$E\{[Y - E(Y \mid X)] + [E(Y \mid X) - h(X)]\}^2 = E[Y - E(Y \mid X)]^2 + E[E(Y \mid X) - h(X)]^2$$
and the cross-product term $E\{[Y - E(Y \mid X)][E(Y \mid X) - h(X)]\}$ is zero because of the law of iterated expectations; see the Appendix to Chapter 2 and Amemiya (1994). In fact, this law says that expectations can be written as
$$E = E_X E_{Y \mid X}$$
and the inner expectation of the cross-product term, $E_{Y \mid X}\{[Y - E(Y \mid X)][E(Y \mid X) - h(X)]\}$, is zero: conditional on $X$, the factor $E(Y \mid X) - h(X)$ is a constant, and $E_{Y \mid X}[Y - E(Y \mid X)] = 0$. Hence, $E[Y - h(X)]^2$ is expressed as the sum of two non-negative terms. The first term is not affected by our choice of $h(X)$. The second term, however, is zero for $h(X) = E(Y \mid X)$. Therefore, $E(Y \mid X)$ is the best predictor of $Y$ based on $X$.
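The decomposition in part (a) can be checked exactly on a small discrete example. The following is a minimal sketch in Python: the joint distribution `JOINT`, the competing predictor `other`, and all function names are hypothetical and chosen only for illustration. It uses exact fractions, so the identity holds without rounding error.

```python
from fractions import Fraction as F

# Hypothetical joint pmf P(X = x, Y = y); the values are illustrative only.
JOINT = {
    (0, 0): F(1, 10), (0, 1): F(2, 10),
    (1, 0): F(1, 10), (1, 2): F(3, 10),
    (2, 1): F(1, 10), (2, 3): F(2, 10),
}


def marginal_x(joint):
    """Return P(X = x) for every x in the support."""
    px = {}
    for (x, _), p in joint.items():
        px[x] = px.get(x, 0) + p
    return px


def conditional_mean(joint):
    """Return E(Y | X = x) as a dict keyed by x."""
    px = marginal_x(joint)
    ey = {}
    for (x, y), p in joint.items():
        ey[x] = ey.get(x, 0) + y * p
    return {x: ey[x] / px[x] for x in px}


def mse(joint, h):
    """Return E[Y - h(X)]^2 for a predictor h given as a dict keyed by x."""
    return sum(p * (y - h[x]) ** 2 for (x, y), p in joint.items())


def gap(joint, h):
    """Return E[E(Y | X) - h(X)]^2, the second term of the decomposition."""
    m = conditional_mean(joint)
    px = marginal_x(joint)
    return sum(px[x] * (m[x] - h[x]) ** 2 for x in px)


best = conditional_mean(JOINT)
other = {0: F(1), 1: F(1), 2: F(2)}  # an arbitrary competing predictor

print("E(Y | X):", best)
print("MSE of E(Y | X):", mse(JOINT, best))
print("MSE of other:", mse(JOINT, other))
print("First term + second term:", mse(JOINT, best) + gap(JOINT, other))
```

For any choice of `other`, the last two printed values agree, and the mean squared error of `best` is never larger than that of `other`.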
b. In the Appendix to Chapter 2, we considered the bivariate Normal distribution and showed that $E(Y \mid X) = \mu_Y + \rho \frac{\sigma_Y}{\sigma_X}(X - \mu_X)$. In part (a), we showed that this is the best predictor of $Y$ based on $X$. But, in this case, this is exactly the form of the best linear predictor of $Y$ based on $X$ derived in problem 2.16. Hence, for the bivariate Normal density, the best predictor is identical to the best linear predictor of $Y$ based on $X$.