INTRODUCTION TO STATISTICS AND ECONOMETRICS
Cramer-Rao Lower Bound
We shall derive a lower bound to the variance of an unbiased estimator and show that in certain cases the variance of the maximum likelihood estimator attains the lower bound.
THEOREM 7.4.1 (Cramer-Rao) Let $L(X_1, X_2, \ldots, X_n \mid \theta)$ be the likelihood function and let $\hat{\theta}(X_1, X_2, \ldots, X_n)$ be an unbiased estimator of $\theta$. Then, under general conditions, we have
(7.4.1)  $V(\hat{\theta}) \ge -\dfrac{1}{E\,\dfrac{\partial^2 \log L}{\partial \theta^2}}.$
The right-hand side is known as the Cramer-Rao lower bound (CRLB).
(In Section 7.3 the likelihood function was always evaluated at the observed values of the sample, because there we were only concerned with the definition and computation of the maximum likelihood estimator. In this section, however, where we are concerned with the properties of the maximum likelihood estimator, we need to evaluate the likelihood function at the random variables $X_1, X_2, \ldots, X_n$, which makes the likelihood function itself a random variable. Note that $E$, the expectation operation, is taken with respect to the random variables $X_1, X_2, \ldots, X_n$.)
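Sketch of Proof. (A rigorous proof is obviously not possible, because the theorem uses the phrase “under general conditions.”) Put $X = \hat{\theta}$ and $Y = \partial \log L/\partial \theta$ in Theorem 7.4.1. Then we have
(7.4.2)  $EY = E\,\dfrac{\partial \log L}{\partial \theta} = E\,\dfrac{1}{L}\dfrac{\partial L}{\partial \theta} = \int \dfrac{1}{L}\dfrac{\partial L}{\partial \theta}\,L\,dx = \int \dfrac{\partial L}{\partial \theta}\,dx = \dfrac{\partial}{\partial \theta}\int L\,dx = \dfrac{\partial 1}{\partial \theta} = 0,$
where the integral is an $n$-tuple integral with respect to $x_1, x_2, \ldots, x_n$. We also have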
The unspecified general conditions, known as regularity conditions, are essentially the conditions on $L$ which justify interchanging the derivative and the integration operations in (7.4.2), (7.4.3), and (7.4.5). If, for example, the support of $L$ (the domain of $L$ over which $L$ is positive) depends on $\theta$, the conditions are violated because the fifth equality of (7.4.2), the fourth equality of (7.4.3), and the third equality of (7.4.5) do not hold. We shall give two examples in which the maximum likelihood estimator attains the Cramer-Rao lower bound.
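For orientation, the standard argument behind the sketch of proof can be outlined as follows. This is a reconstruction under the regularity conditions just described, based on the Cauchy-Schwarz inequality $[\mathrm{Cov}(X, Y)]^2 \le VX \cdot VY$ with $X = \hat{\theta}$ and $Y = \partial \log L/\partial \theta$. The grouping into steps is our assumption and need not coincide with the numbered equations of the text.

```latex
\begin{align*}
EY &= 0 \quad \text{by (7.4.2)}, \\
VY &= E\left(\frac{\partial \log L}{\partial \theta}\right)^{2}
    = -E\,\frac{\partial^{2} \log L}{\partial \theta^{2}}, \\
\operatorname{Cov}(\hat{\theta}, Y)
   &= E\,\hat{\theta}\,\frac{\partial \log L}{\partial \theta}
    = \int \hat{\theta}\,\frac{\partial L}{\partial \theta}\,dx
    = \frac{\partial}{\partial \theta}\int \hat{\theta}\,L\,dx
    = \frac{\partial}{\partial \theta}\,E\hat{\theta}
    = \frac{\partial \theta}{\partial \theta} = 1, \\
V(\hat{\theta}) &\ge \frac{[\operatorname{Cov}(\hat{\theta}, Y)]^{2}}{VY}
    = -\frac{1}{E\,\dfrac{\partial^{2} \log L}{\partial \theta^{2}}}.
\end{align*}
```

The second line uses $\partial^2 \log L/\partial \theta^2 = L^{-1}\,\partial^2 L/\partial \theta^2 - (L^{-1}\,\partial L/\partial \theta)^2$ together with $\int (\partial^2 L/\partial \theta^2)\,dx = 0$, and the third line uses the unbiasedness of $\hat{\theta}$. Each interchange of derivative and integral is exactly what the regularity conditions are needed for.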
EXAMPLE 7.4.1 Let $X \sim B(n, p)$ as in Example 7.3.1. Differentiating (7.3.3) again with respect to $p$, we obtain
$\dfrac{\partial^2 \log L}{\partial p^2} = -\dfrac{X}{p^2} - \dfrac{n - X}{(1 - p)^2},$
where we have substituted $X$ for $k$ because here we must treat $L$ as a random variable. Therefore we obtain
(7.4.7)  $\text{CRLB} = \dfrac{p(1 - p)}{n}.$
Since $V\hat{p} = p(1 - p)/n$ by (5.1.5), the maximum likelihood estimator $\hat{p}$ attains the Cramer-Rao lower bound and hence is the best unbiased estimator.
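The claim of Example 7.4.1 can be checked numerically. The following Python sketch computes the expectation of the second derivative of $\log L$ by direct summation over the binomial support and compares the resulting bound with the exact variance of $\hat{p} = X/n$. The values $n = 20$ and $p = 0.3$ and all function names are illustrative assumptions, not part of the text.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def expected_second_derivative(n, p):
    """E[-X/p^2 - (n - X)/(1 - p)^2], summed over the support of X."""
    return sum(
        binomial_pmf(k, n, p) * (-k / p**2 - (n - k) / (1 - p) ** 2)
        for k in range(n + 1)
    )


def variance_of_p_hat(n, p):
    """Exact variance of p_hat = X/n, summed over the support of X."""
    mean = sum(binomial_pmf(k, n, p) * k / n for k in range(n + 1))
    return sum(binomial_pmf(k, n, p) * (k / n - mean) ** 2 for k in range(n + 1))


n, p = 20, 0.3  # illustrative values only
crlb = -1.0 / expected_second_derivative(n, p)
var_p_hat = variance_of_p_hat(n, p)

# All three numbers should agree up to floating-point rounding.
print(crlb, var_p_hat, p * (1 - p) / n)
```

Because every quantity is obtained by exact summation rather than simulation, any disagreement beyond rounding error would indicate a mistake in the derivative or in the bound.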
EXAMPLE 7.4.2 Let $\{X_i\}$ be as in Example 7.3.3 (normal density) except that we now assume $\sigma^2$ is known, so that $\mu$ is the only parameter to estimate. Differentiating (7.3.15) again with respect to $\mu$, we obtain
(7.4.8)  $\dfrac{\partial^2 \log L}{\partial \mu^2} = -\dfrac{n}{\sigma^2}.$
Therefore
(7.4.9)  $\text{CRLB} = \dfrac{\sigma^2}{n}.$
But we have previously shown that $V(\bar{X}) = \sigma^2/n$. Therefore the maximum likelihood estimator attains the Cramer-Rao lower bound; in other words, $\bar{X}$ is the best unbiased estimator. It can also be shown that even if $\sigma^2$ is unknown and estimated, $\bar{X}$ is the best unbiased estimator.
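A small Monte Carlo experiment illustrates Example 7.4.2. The sketch below draws repeated normal samples, computes the sample mean of each, and compares the variance of those means with $\sigma^2/n$. The parameter values, the seed, and the function name are our assumptions, and the simulated variance matches the bound only up to sampling error.

```python
import random
import statistics


def simulate_sample_mean_variance(mu, sigma, n, replications, seed=0):
    """Variance across replications of the mean of n draws from N(mu, sigma^2)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(replications)
    ]
    return statistics.pvariance(means)


mu, sigma, n = 1.0, 2.0, 25  # illustrative values only
crlb = sigma**2 / n
mc_variance = simulate_sample_mean_variance(mu, sigma, n, replications=20000)

# The simulated variance should be close to the bound sigma^2 / n.
print(crlb, mc_variance)
```

Increasing `replications` tightens the agreement, since the simulation error of the variance estimate shrinks at the usual square-root rate.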
FIGURE 7.6 Convergence of log likelihood functions
The maximum likelihood estimator can be shown to be consistent under general conditions. We shall only provide the essential ingredients of the proof. Suppose $\{X_i\}$ are i.i.d. with the density $f(x, \theta)$. The discrete case can be similarly analyzed. Define
(7.4.10)  $Q_n(\theta) = \dfrac{1}{n} \log L_n(\theta) = \dfrac{1}{n} \sum_{i=1}^{n} \log f(X_i, \theta),$
where a random variable $X_i$ appears in the argument of $f$ because we need to consider the property of the likelihood function as a random variable. To prove the consistency of the maximum likelihood estimator, we essentially need to show that $Q_n(\theta)$ converges in probability to a nonstochastic function of $\theta$, denoted $Q(\theta)$, which attains the global maximum at the true value of $\theta$, denoted $\theta_0$. This is illustrated in Figure 7.6. Note that $Q_n(\theta)$ is maximized at $\hat{\theta}_n$, the maximum likelihood estimator. If $Q_n(\theta)$ converges to $Q(\theta)$, we should expect $\hat{\theta}_n$ to converge to $\theta_0$. (In the present analysis it is essential to distinguish $\theta$, the domain of the likelihood function, from $\theta_0$, the true value. This was unnecessary in the analysis of the preceding section. Whenever $L$ or its derivatives appeared in the equations, we implicitly assumed that they were evaluated at the true value of the parameter, unless it was noted otherwise.)
Next we shall show why we can expect $Q_n(\theta)$ to converge to $Q(\theta)$ and why we can expect $Q(\theta)$ to be maximized at $\theta_0$. To answer the first question, note by (7.4.10) that $Q_n(\theta)$ is $(1/n)$ times the sum of i.i.d. random variables. Therefore we can apply Khinchine’s LLN (Theorem 6.2.1), provided that $E \log f(X_i, \theta) < \infty$. Therefore $\operatorname{plim}_{n \to \infty} Q_n(\theta) = Q(\theta) = E \log f(X_i, \theta)$.
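The convergence of $Q_n(\theta)$ to $Q(\theta)$ can be seen in a simulation. The sketch below assumes, purely for illustration, that the density is $N(\theta, 1)$ with true value $\theta_0 = 0$, for which $Q(\theta) = -\tfrac{1}{2}\log(2\pi) - \tfrac{1}{2}[1 + (\theta_0 - \theta)^2]$. The choice of density, the sample sizes, the seed, and the function names are our assumptions.

```python
import math
import random


def log_normal_density(x, theta):
    """log f(x, theta) for the N(theta, 1) density."""
    return -0.5 * math.log(2 * math.pi) - 0.5 * (x - theta) ** 2


def q_n(sample, theta):
    """Q_n(theta) = (1/n) * sum_i log f(X_i, theta), as in (7.4.10)."""
    return sum(log_normal_density(x, theta) for x in sample) / len(sample)


def q_limit(theta, theta0):
    """Q(theta) = E log f(X, theta) when X ~ N(theta0, 1)."""
    return -0.5 * math.log(2 * math.pi) - 0.5 * (1 + (theta0 - theta) ** 2)


theta0 = 0.0  # assumed true value
rng = random.Random(0)  # fixed seed so the run is reproducible
sample = [rng.gauss(theta0, 1.0) for _ in range(100_000)]

# For each theta, Q_n(theta) should approach Q(theta) as n grows.
for theta in (-1.0, 0.0, 0.5):
    for n in (100, 10_000, 100_000):
        print(theta, n, q_n(sample[:n], theta), q_limit(theta, theta0))
```

The limit function is largest at $\theta = \theta_0$ among the values tried, which anticipates the second question addressed next.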
To answer the second question, we need
THEOREM 7.4.2 (Jensen) Let $X$ be a proper random variable (that is, it is not a constant) and let $g(\cdot)$ be a strictly concave function. That is to say, $g[\lambda a + (1 - \lambda) b] > \lambda g(a) + (1 - \lambda) g(b)$ for any $a < b$ and $0 < \lambda < 1$. Then
(7.4.11)  $Eg(X) < g(EX)$. (Jensen’s inequality)
Taking $g$ to be $\log$ and $X$ to be $f(X, \theta)/f(X, \theta_0)$ in Theorem 7.4.2, we obtain
(7.4.12)  $E \log \dfrac{f(X, \theta)}{f(X, \theta_0)} < \log E\,\dfrac{f(X, \theta)}{f(X, \theta_0)}$  if $\theta \ne \theta_0$.
But the right-hand side of the above inequality is equal to zero, because
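(7.4.13)  $E\,\dfrac{f(X, \theta)}{f(X, \theta_0)} = \int_{-\infty}^{\infty} \dfrac{f(x, \theta)}{f(x, \theta_0)}\,f(x, \theta_0)\,dx = \int_{-\infty}^{\infty} f(x, \theta)\,dx = 1.$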
Therefore we obtain from (7.4.12) and (7.4.13)
(7.4.14)  $E \log f(X, \theta) < E \log f(X, \theta_0)$  if $\theta \ne \theta_0$.
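Inequality (7.4.14) and the identity (7.4.13) can be verified exactly in a simple discrete case, which the text notes can be analyzed similarly. The sketch below assumes a Bernoulli density $f(x, \theta) = \theta^x (1 - \theta)^{1 - x}$ with true value $\theta_0 = 0.3$. The density, the grid, and the function names are illustrative assumptions.

```python
import math


def q_bernoulli(theta, theta0):
    """Q(theta) = E log f(X, theta) for X ~ Bernoulli(theta0)."""
    return theta0 * math.log(theta) + (1 - theta0) * math.log(1 - theta)


def expected_likelihood_ratio(theta, theta0):
    """E[f(X, theta) / f(X, theta0)] under theta0, as in (7.4.13)."""
    ratio_at_1 = theta / theta0
    ratio_at_0 = (1 - theta) / (1 - theta0)
    return theta0 * ratio_at_1 + (1 - theta0) * ratio_at_0


theta0 = 0.3  # assumed true value
grid = [i / 100 for i in range(1, 100)]  # candidate values 0.01, ..., 0.99
values = {theta: q_bernoulli(theta, theta0) for theta in grid}
best = max(values, key=values.get)

# Q(theta) should peak at theta0, and the expected ratio should equal one.
print(best, values[best], expected_likelihood_ratio(0.7, theta0))
```

Here the expectation is a two-term sum, so the strict inequality for every $\theta \ne \theta_0$ on the grid can be checked without any simulation error.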
We have essentially proved the consistency of the global maximum likelihood estimator. To prove the consistency of a local maximum likelihood estimator, we should replace (7.4.14) by the statement that the derivative of $Q(\theta)$ is zero at $\theta_0$. In other words, we should show
(7.4.15)  $\dfrac{\partial}{\partial \theta}\,E \log L = 0.$
But assuming we can interchange the derivative and the expectation operation, this is precisely what we showed in (7.4.2). The reader should verify (7.4.2) or (7.4.15) in Examples 7.4.1 and 7.4.2.
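For the binomial model of Example 7.4.1, the verification of (7.4.2) can also be carried out numerically. The sketch below assumes that the first derivative of $\log L$ is $X/p - (n - X)/(1 - p)$ and sums it against the binomial probabilities. The parameter values and function names are illustrative assumptions.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def expected_score(n, p):
    """E[X/p - (n - X)/(1 - p)], the expected derivative of log L."""
    return sum(
        binomial_pmf(k, n, p) * (k / p - (n - k) / (1 - p))
        for k in range(n + 1)
    )


# The expected score should vanish (up to rounding) for every (n, p).
for n, p in [(5, 0.2), (20, 0.5), (50, 0.9)]:
    print(n, p, expected_score(n, p))
```

In the normal model of Example 7.4.2 the same conclusion follows directly, since the derivative of $\log L$ with respect to $\mu$ is $\sigma^{-2} \sum_{i=1}^{n} (X_i - \mu)$, whose expectation is zero.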