Advanced Econometrics Takeshi Amemiya
Comparison of the Maximum Likelihood Estimator and the Minimum Chi-Square Estimator
In a simple model where the vector $x_t$ consists of 1 and a single independent variable and where $T$ is small, the exact mean and variance of the MLE and the MIN $\chi^2$ estimator can be computed by a direct method. Berkson (1955, 1957) did so for the logit and the probit model, respectively, and found the exact mean squared error of the MIN $\chi^2$ estimator to be smaller in all the examples considered.
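The direct method can be illustrated by enumerating every possible sample. The following is a minimal sketch, not Berkson's actual computation: it assumes a logit model with $T$ design points and $n$ Bernoulli trials at each, takes the MIN $\chi^2$ estimator to be weighted least squares of the observed logits on $(1, x_t)$, and replaces observed proportions of 0 and 1 by $1/(2n)$ and $1 - 1/(2n)$ so that both estimators are defined for every outcome. The design points, true coefficients, cell size, and all function names are illustrative assumptions.

```python
import math
from itertools import product


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def logistic(z):
    """Logistic distribution function, written to avoid overflow."""
    return math.exp(-softplus(-z))


def clipped_proportion(r, n):
    """Observed proportion, with 0 and 1 moved to 1/(2n) and 1 - 1/(2n) (assumed rule)."""
    return min(max(r / n, 1.0 / (2 * n)), 1.0 - 1.0 / (2 * n))


def solve_2x2(a11, a12, a22, b1, b2):
    """Solve the symmetric system [[a11, a12], [a12, a22]] z = (b1, b2)."""
    det = a11 * a22 - a12 * a12
    return (a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det


def min_chi2(xs, ps, n):
    """Weighted least squares of logit(p_t) on (1, x_t), weights n p_t (1 - p_t)."""
    w = [n * p * (1.0 - p) for p in ps]
    y = [math.log(p / (1.0 - p)) for p in ps]
    return solve_2x2(sum(w),
                     sum(wi * x for wi, x in zip(w, xs)),
                     sum(wi * x * x for wi, x in zip(w, xs)),
                     sum(wi * yi for wi, yi in zip(w, y)),
                     sum(wi * x * yi for wi, x, yi in zip(w, xs, y)))


def loglik(xs, ps, n, b0, b1):
    """Grouped logit log likelihood, up to an additive constant."""
    return -n * sum(p * softplus(-(b0 + b1 * x)) + (1.0 - p) * softplus(b0 + b1 * x)
                    for x, p in zip(xs, ps))


def mle(xs, ps, n, max_iter=100, tol=1e-12):
    """Newton's method with step halving on sum_t n (p_t - Lambda_t) (1, x_t) = 0."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        lam = [logistic(b0 + b1 * x) for x in xs]
        g0 = sum(n * (p - l) for p, l in zip(ps, lam))
        g1 = sum(n * (p - l) * x for p, l, x in zip(ps, lam, xs))
        w = [n * l * (1.0 - l) for l in lam]
        d0, d1 = solve_2x2(sum(w),
                           sum(wi * x for wi, x in zip(w, xs)),
                           sum(wi * x * x for wi, x in zip(w, xs)),
                           g0, g1)
        step, base = 1.0, loglik(xs, ps, n, b0, b1)
        # Halve the step while the log likelihood would clearly decrease.
        while step > 1e-8 and loglik(xs, ps, n, b0 + step * d0, b1 + step * d1) < base - 1e-12:
            step /= 2.0
        b0, b1 = b0 + step * d0, b1 + step * d1
        if abs(step * d0) + abs(step * d1) < tol:
            break
    return b0, b1


def exact_mse(xs, beta, n):
    """Enumerate every outcome (r_1, ..., r_T) and average squared errors exactly."""
    probs = [logistic(beta[0] + beta[1] * x) for x in xs]
    mse = {"mle": [0.0, 0.0], "min_chi2": [0.0, 0.0]}
    total = 0.0
    for rs in product(range(n + 1), repeat=len(xs)):
        pr = 1.0
        for r, p in zip(rs, probs):
            pr *= math.comb(n, r) * p ** r * (1.0 - p) ** (n - r)
        ps = [clipped_proportion(r, n) for r in rs]
        for name, est in (("mle", mle(xs, ps, n)), ("min_chi2", min_chi2(xs, ps, n))):
            for j in range(2):
                mse[name][j] += pr * (est[j] - beta[j]) ** 2
        total += pr
    return mse, total


if __name__ == "__main__":
    mse, total = exact_mse(xs=(-1.0, 0.0, 1.0), beta=(0.0, 1.0), n=5)
    print("total probability:", total)
    print("MSE of (intercept, slope):", mse)
```

The enumeration visits $(n+1)^T$ outcomes, which is why the approach is practical only when $T$ and the cell sizes are small. The results depend on the convention adopted for proportions of 0 and 1, so the numbers from this sketch should not be read as reproducing any published table.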
Amemiya (1980b) obtained the formulae for the bias to the order of $n^{-1}$ and the mean squared error to the order of $n^{-2}$ of the MLE and the MIN $\chi^2$ estimator in a general logit model.³ The method employed in this study is as follows: Using (9.2.19) and the sampling scheme described in Section 9.2.5, the normal equation (9.2.8) is reduced to
$$\sum_{t=1}^{T} n_t \left[ \hat P_t - \Lambda(x_t'\beta) \right] x_t = 0. \qquad (9.2.40)$$
We can regard (9.2.40) as defining the MLE $\hat\beta$ implicitly as a function of $\hat P_1, \hat P_2, \ldots, \hat P_T$, say, $\hat\beta = g(\hat P_1, \hat P_2, \ldots, \hat P_T)$. Expanding $g$ in a Taylor series around $P_1, P_2, \ldots, P_T$ and noting that $g(P_1, P_2, \ldots, P_T) = \beta_0$, we obtain
$$\hat\beta - \beta_0 \cong \sum_t g_t u_t + \frac{1}{2} \sum_t \sum_s g_{ts} u_t u_s + \frac{1}{6} \sum_t \sum_s \sum_r g_{tsr} u_t u_s u_r, \qquad (9.2.41)$$
where $u_t = \hat P_t - P_t$ and $g_t$, $g_{ts}$, and $g_{tsr}$ denote the first-, second-, and third-order partial derivatives of $g$ evaluated at $(P_1, P_2, \ldots, P_T)$, respectively. The bias of the MLE to the order of $n^{-1}$ is obtained by taking the expectation of the first two terms of the right-hand side of (9.2.41). The mean squared error of the MLE to the order of $n^{-2}$ is obtained by calculating the mean squared error of the right-hand side of (9.2.41), ignoring the terms of a smaller order than $n^{-2}$. We need not consider higher terms in the Taylor expansion because $E u_t^k$ for $k \geq 5$ are at most of the order of $n^{-3}$. A Taylor expansion for the MIN $\chi^2$ estimator $\tilde\beta$ is obtained by expanding the right-hand side of (9.2.32) around $P_t$.
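As a worked step, the expectation of the first two terms of (9.2.41) can be written out explicitly. The sketch below assumes that the observed proportions are independent across $t$ and that each is the mean of $n_t$ Bernoulli trials with success probability $P_t$, which is our reading of the sampling scheme of Section 9.2.5. Here $u_t$ is the deviation of the observed proportion from $P_t$, and $g_t$ and $g_{ts}$ are the first and second partial derivatives of $g$.

```latex
\begin{align*}
E u_t &= 0, \qquad
E u_t u_s = 0 \quad (t \neq s), \qquad
E u_t^2 = \frac{P_t (1 - P_t)}{n_t}, \\
E(\hat\beta - \beta_0)
  &\cong \sum_t g_t \, E u_t
   + \frac{1}{2} \sum_t \sum_s g_{ts} \, E u_t u_s
   = \frac{1}{2} \sum_t g_{tt} \, \frac{P_t (1 - P_t)}{n_t}.
\end{align*}
```

Under these assumptions the linear term contributes nothing and only the diagonal second derivatives $g_{tt}$ survive; with $n_t = n$ for all $t$, the remaining sum is of the order of $n^{-1}$.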
Using these formulae, Amemiya calculated the approximate mean squared errors of the MLE and the MIN $\chi^2$ estimator in several examples, both artificial and empirical, and found the MIN $\chi^2$ estimator to have a smaller mean squared error than the MLE in all the examples considered. However, the difference between the two mean squared error matrices can be shown to be neither positive definite nor negative definite (Ghosh and Sinha, 1981). In fact, Davis (1984) showed examples in which the MLE has a smaller mean squared error to the order of $n^{-2}$ and offered an intuitive argument that showed that the greater $T$, the more likely the MLE is to have a smaller mean squared error.
Amemiya also derived the formulae for the $n^{-2}$-order mean squared errors of the bias-corrected MLE and the bias-corrected MIN $\chi^2$ estimator and showed that the former is smaller. The bias-corrected MLE is defined as $\hat\beta - B(\hat\beta)$, where $B$ is the bias to the order of $n^{-1}$, and similarly for MIN $\chi^2$. This result is consistent with the second-order efficiency of the MLE in the exponential family proved by Ghosh and Subramanyam (1974), as mentioned in Section 4.2.4. The actual magnitude of the difference of the $n^{-2}$-order mean squared errors of the bias-corrected MLE and MIN $\chi^2$ in Amemiya's examples was always found to be extremely small. Davis did not report the corresponding results for her examples.
Smith, Savin, and Robertson (1984) conducted a Monte Carlo study of a logit model with one independent variable and found that although in point estimation MIN $\chi^2$ did better than the MLE, as in the studies of Berkson and Amemiya, the convergence of the distribution of the MIN $\chi^2$ estimator to a normal distribution was sometimes disturbingly slow, being unsatisfactory in one instance even at $n = 480$.
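An experiment of this general kind can be sketched as follows. This is not the design used by Smith, Savin, and Robertson: the design points, true coefficients, cell size, number of replications, and the $1/(2n)$ adjustment for proportions of 0 and 1 are all illustrative assumptions. Standardizing the slope estimate by its asymptotic standard deviation is only one simple way to examine closeness to normality.

```python
import math
import random


def min_chi2_slope(xs, rs, n):
    """Slope from weighted least squares of logit(p_t) on (1, x_t)."""
    # Proportions of 0 and 1 are moved to 1/(2n) and 1 - 1/(2n) (assumed rule).
    ps = [min(max(r / n, 1.0 / (2 * n)), 1.0 - 1.0 / (2 * n)) for r in rs]
    w = [n * p * (1.0 - p) for p in ps]
    y = [math.log(p / (1.0 - p)) for p in ps]
    sw = sum(w)
    swx = sum(wi * x for wi, x in zip(w, xs))
    swxx = sum(wi * x * x for wi, x in zip(w, xs))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxy = sum(wi * x * yi for wi, x, yi in zip(w, xs, y))
    return (sw * swxy - swx * swy) / (sw * swxx - swx * swx)


def asymptotic_slope_sd(xs, probs, n):
    """Square root of the slope element of [sum_t n P_t (1 - P_t) x_t x_t']^{-1}."""
    w = [n * p * (1.0 - p) for p in probs]
    sw = sum(w)
    swx = sum(wi * x for wi, x in zip(w, xs))
    swxx = sum(wi * x * x for wi, x in zip(w, xs))
    return math.sqrt(sw / (sw * swxx - swx * swx))


def simulate(xs, beta, n, reps, seed=0):
    """Return standardized MIN chi-square slope estimates from `reps` samples."""
    rng = random.Random(seed)
    probs = [1.0 / (1.0 + math.exp(-(beta[0] + beta[1] * x))) for x in xs]
    sd = asymptotic_slope_sd(xs, probs, n)
    zs = []
    for _ in range(reps):
        # Draw the number of successes in each cell as a sum of Bernoulli trials.
        rs = [sum(rng.random() < p for _ in range(n)) for p in probs]
        zs.append((min_chi2_slope(xs, rs, n) - beta[1]) / sd)
    return zs


def summarize(zs):
    """Moments and nominal 95 percent coverage of the standardized estimates."""
    m = sum(zs) / len(zs)
    var = sum((z - m) ** 2 for z in zs) / len(zs)
    skew = sum((z - m) ** 3 for z in zs) / len(zs) / var ** 1.5
    cover = sum(abs(z) <= 1.96 for z in zs) / len(zs)
    return {"mean": m, "variance": var, "skewness": skew, "coverage_95": cover}


if __name__ == "__main__":
    for n in (10, 40, 160):
        zs = simulate(xs=(-1.0, 0.0, 1.0), beta=(0.0, 1.0), n=n, reps=2000, seed=1)
        print(n, summarize(zs))
```

If the normal approximation were accurate, the standardized estimates would have mean near 0, variance near 1, skewness near 0, and coverage near 0.95; comparing these summaries across cell sizes gives a rough picture of how quickly the approximation improves in a given design.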
For further discussion of this topic, see the article by Berkson (1980) and the comments following the article.