Nonlinear Full Information Maximum Likelihood Estimator
In this subsection we shall consider the maximum likelihood estimator of model (8.2.1) under the normality assumption of u_t. To do so we must assume that (8.2.1) defines a one-to-one correspondence between y_t and u_t. This assumption enables us to write down the likelihood function in the usual way as the product of the density of u_t and the Jacobian. Unfortunately, this is a rather stringent assumption, which considerably limits the usefulness of the nonlinear full information maximum likelihood (NLFI) estimator in practice. There are two types of problems: (1) There may be no solution for y_t for some values of u_t. (2) There may be more than one solution for y_t for some values of u_t. In the first case the domain of u_t must be restricted, a condition that implies that the normality assumption cannot hold. In the second case we must specify a mechanism by which a unique solution is chosen, a condition that would complicate the likelihood function. We should note that the NL2S and NL3S estimators are free from both of these problems.
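A hypothetical one-equation illustration (not from the text) shows how both problems can arise. If f(y_t, x_t, α) = y_t + α y_t² - x_t, then for given x_t and u_t the value y_t must solve

\alpha y_t^2 + y_t - (x_t + u_t) = 0,

which has no real root when 1 + 4α(x_t + u_t) < 0 (problem 1) and two real roots when 1 + 4α(x_t + u_t) > 0 and α ≠ 0 (problem 2), so that a rule for selecting one of the roots would have to be specified.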
Assuming u_t ~ N(0, Σ), we can write the log likelihood function of model (8.2.1) as
L^* = -\frac{T}{2}\log|\Sigma| + \sum_{t=1}^{T}\log\left\|\frac{\partial f_t}{\partial y_t'}\right\| - \frac{1}{2}\sum_{t=1}^{T} f_t'\,\Sigma^{-1} f_t   (8.2.7)
Solving ∂L*/∂Σ = 0 for Σ, we get
\hat{\Sigma} = \frac{1}{T}\sum_{t=1}^{T} f_t f_t'   (8.2.8)
Inserting (8.2.8) into (8.2.7) yields the concentrated log likelihood function
L = \sum_{t=1}^{T}\log\left\|\frac{\partial f_t}{\partial y_t'}\right\| - \frac{T}{2}\log\left|\frac{1}{T}\sum_{t=1}^{T} f_t f_t'\right|   (8.2.9)
The NLFI maximum likelihood estimator of α is defined as the value of α that maximizes (8.2.9).
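As a purely illustrative numerical sketch (the two-equation model, the parameter values, and the use of numpy and scipy below are assumptions made for this example, not part of the text), the concentrated log likelihood (8.2.9) can be evaluated and maximized directly:

import numpy as np
from scipy.optimize import minimize

# Hypothetical two-equation model, nonlinear in the parameters:
#   f_1t = y_1t + a1*y_2t - exp(a2*x_t) = u_1t
#   f_2t = y_2t + a3*y_1t - a4*x_t     = u_2t
rng = np.random.default_rng(0)
T = 500
x = rng.normal(size=T)
true = np.array([0.5, 0.3, -0.4, 1.0])                 # (a1, a2, a3, a4)
u = rng.multivariate_normal([0, 0], [[1.0, 0.3], [0.3, 1.0]], size=T)

def solve_y(a, x, u):
    """Solve f_t(y_t, x_t, a) = u_t for y_t (the system is linear in y_t here)."""
    a1, a2, a3, a4 = a
    J = np.array([[1.0, a1], [a3, 1.0]])               # J_t = df_t/dy_t', constant over t here
    b = np.column_stack([np.exp(a2 * x) + u[:, 0], a4 * x + u[:, 1]])
    return np.linalg.solve(J, b.T).T

y = solve_y(true, x, u)

def concentrated_loglik(a, y, x):
    """Equation (8.2.9): sum_t log||df_t/dy_t'|| - (T/2) log|T^{-1} sum_t f_t f_t'|."""
    a1, a2, a3, a4 = a
    f = np.column_stack([y[:, 0] + a1 * y[:, 1] - np.exp(a2 * x),
                         y[:, 1] + a3 * y[:, 0] - a4 * x])
    n = len(x)
    jac_term = n * np.log(abs(1.0 - a1 * a3))          # log|det J_t| is the same for every t here
    sigma_hat = f.T @ f / n                            # Sigma-hat of (8.2.8)
    _, logdet = np.linalg.slogdet(sigma_hat)
    return jac_term - 0.5 * n * logdet

# NLFI: maximize (8.2.9) numerically; the estimate should lie near `true`.
res = minimize(lambda a: -concentrated_loglik(a, y, x), x0=np.zeros(4), method="BFGS")
print("NLFI estimate:", res.x)

In this example the Jacobian determinant 1 - a1*a3 happens not to depend on t; when it does, jac_term is simply the sum of log|det J_t| over t.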
Amemiya (1977a) proved that if the true distribution of u_t is normal, NLFI is consistent, asymptotically normal, and in general has a smaller asymptotic covariance matrix than BNL3S. It is well known that in the linear model the full information maximum likelihood estimator has the same asymptotic distribution as the three-stage least squares estimator. Amemiya showed that this asymptotic equivalence occurs in the nonlinear model if and only if f_{it} can be written in the form
f_{it}(y_t, x_t, \alpha_i) = A(\alpha_i)'\,z(y_t, x_t) + B_i(\alpha_i, x_t),   (8.2.10)

where z is an N-vector of surrogate variables.
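As an illustration of this condition (stated here as a consistency check, not reproduced from the text), the ith equation of a linear simultaneous equations model, f_{it} = y_t'γ_i + x_t'β_i, has the form (8.2.10) with A(α_i) = (γ_i', β_i')', z(y_t, x_t) = (y_t', x_t')', and B_i ≡ 0, which accords with the well-known equivalence of full information maximum likelihood and three-stage least squares in the linear model.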
If the true distribution of u_t is not normal, on the other hand, Amemiya proved that NLFI is generally not consistent, whereas NL3S is known to be consistent even then. This, again, is contrary to the linear case, in which the full information maximum likelihood estimator obtained under the assumption of normality is consistent even if the true distribution is not normal. Note that this result is completely separate from, and in no way contradicts, the quite likely fact that the maximum likelihood estimator of a nonlinear model derived under the assumption of a certain regular nonnormal distribution is consistent if the true distribution is the same as the assumed distribution.
We shall see how the consistency of NLFI crucially depends on the normality assumption. Differentiating (8.2.9) with respect to α_i, we obtain

\frac{\partial L}{\partial \alpha_i} = \sum_{t=1}^{T}\frac{\partial g_{it}}{\partial u_{it}} - T\sum_{t=1}^{T} g_{it}\,f_t'\left(\sum_{s=1}^{T} f_s f_s'\right)_i^{-1},   (8.2.11)

where g_{it} = ∂f_{it}/∂α_i and ( )_i^{-1} denotes the ith column of the inverse of the matrix within the parentheses. The consistency of NLFI is equivalent to the condition
\lim_{T\to\infty}\frac{1}{T}\,E\,\frac{\partial L}{\partial \alpha_i}\bigg|_{\alpha_0} = 0   (8.2.12)
and hence to the condition
\lim_{T\to\infty}\frac{1}{T}\sum_{t=1}^{T} E\,\frac{\partial g_{it}}{\partial u_{it}} = \lim_{T\to\infty}\frac{1}{T}\sum_{t=1}^{T} E\,g_{it}\,u_t'\sigma^i,   (8.2.13)
where σ^i is the ith column of Σ^{-1}. Now (8.2.13) could hold even if each term of the summation on the left-hand side differed from the corresponding term on the right-hand side, but that event is extremely unlikely. Therefore we can say that the consistency of NLFI is essentially equivalent to the condition
E\,\frac{\partial g_{it}}{\partial u_{it}} = E\,g_{it}\,u_t'\sigma^i.   (8.2.14)
It is interesting to note that the condition (8.2.14) holds if u_t is normal because of the following lemma.5
Lemma. Suppose u = (u_1, u_2, . . . , u_N)' is distributed as N(0, Σ), where Σ is positive definite. If ∂h(u)/∂u_i is continuous, E|∂h/∂u_i| < ∞, and E|hu_i| < ∞, then

E\,\frac{\partial h}{\partial u_i} = E\,h\,u'\sigma^i,   (8.2.15)

where σ^i is the ith column of Σ^{-1}.
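The lemma is a Stein-type identity for the multivariate normal distribution. The following Monte Carlo sketch (the particular Σ and the function h are arbitrary choices for illustration, not from the text) checks it numerically for i = 1:

import numpy as np

rng = np.random.default_rng(1)
Sigma = np.array([[1.0, 0.5], [0.5, 2.0]])             # illustrative covariance matrix
Sigma_inv = np.linalg.inv(Sigma)
u = rng.multivariate_normal([0.0, 0.0], Sigma, size=1_000_000)

# An arbitrary smooth h(u) with the required moments.
h = np.exp(0.3 * u[:, 0]) * np.sin(u[:, 1])
dh_du1 = 0.3 * np.exp(0.3 * u[:, 0]) * np.sin(u[:, 1])  # analytic dh/du_1

lhs = dh_du1.mean()                                    # sample analogue of E[dh/du_1]
rhs = (h * (u @ Sigma_inv[:, 0])).mean()               # sample analogue of E[h(u) u' sigma^1]
print(lhs, rhs)                                        # the two averages should nearly coincide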
In simple models the condition (8.2.14) may hold without normality. In the model defined by (8.1.36) and (8.1.37), we have g_{1t} = -z_t and g_{2t} = -x_t. Therefore (8.2.14) clearly holds for i = 2 for any distribution of u_t provided that the mean is 0. The equation for i = 1 gives (8.1.39), which is satisfied by a class of distributions including the normal distribution. (Phillips, 1982, has presented another simple example.) However, if g_{it} is a more complicated nonlinear function of the exogenous variables and the parameters {α_i} as well as of u_t, (8.2.14) can be made to hold only when we specify a density that depends on the exogenous variables and the parameters of the model. In such a case, normality can be regarded, for all practical purposes, as a necessary and sufficient condition for the consistency of NLFI.
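Spelling out the i = 2 case: because g_{2t} = -x_t involves no endogenous variables, E ∂g_{2t}/∂u_{2t} = 0, while E g_{2t} u_t'σ² = -x_t (E u_t')σ² = 0 whenever E u_t = 0, so both sides of (8.2.14) vanish regardless of the distribution of u_t.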
It is interesting to compare certain iterative formulae for calculating NLFI and BNL3S. By equating the right-hand side of (8.2.11) to 0 and rearranging terms, we can obtain the following iteration for computing NLFI:
\hat{\alpha}_{(2)} = \hat{\alpha}_{(1)} - (\hat{G}'A^{-1}G)^{-1}\hat{G}'A^{-1}f,   (8.2.16)
where Ĝ_i is defined in (8.2.17), Ĝ = diag(Ĝ_1, Ĝ_2, . . . , Ĝ_N), and all the variables that appear in the second term of the right-hand side of (8.2.16) are evaluated at α̂_(1).
The Gauss-Newton iteration for BNL3S is defined by
\hat{\alpha}_{(2)} = \hat{\alpha}_{(1)} - (\bar{G}'A^{-1}G)^{-1}\bar{G}'A^{-1}f,   (8.2.18)
where Ḡ_i = EG_i and Ḡ = diag(Ḡ_1, Ḡ_2, . . . , Ḡ_N) as before.
Thus we see that the only difference between (8.2.16) and (8.2.18) is in the respective "instrumental variables" used in the formulae. Note that Ĝ_i defined in (8.2.17) can work as a proper set of "instrumental variables" (that is, variables uncorrelated with u_t) only if u_t satisfies the condition of the aforementioned lemma, whereas Ḡ_i is always a proper set of instrumental variables, a fact that implies that BNL3S is more robust than NLFI. If u_t is normal, however, Ĝ_i contains more of the part of G_i uncorrelated with u_t than Ḡ_i does, which implies that NLFI is more efficient than BNL3S under normality.
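Schematically, a single step of either iteration is the same piece of linear algebra. In the sketch below (illustrative only, with hypothetical argument names), the two estimators differ solely in which instrument matrix W is supplied: Ĝ of (8.2.17) for NLFI as in (8.2.16), Ḡ for BNL3S as in (8.2.18).

import numpy as np

def iteration_step(alpha_1, W, G, A, f):
    """One step of alpha_2 = alpha_1 - (W' A^{-1} G)^{-1} W' A^{-1} f.

    alpha_1 : current parameter value (all quantities below are evaluated there)
    W       : block-diagonal "instrument" matrix (G-hat for NLFI, G-bar for BNL3S)
    G       : block-diagonal matrix of derivatives of f with respect to alpha
    A       : the weighting matrix A appearing in (8.2.16) and (8.2.18)
    f       : stacked residual vector evaluated at alpha_1
    """
    A_inv = np.linalg.inv(A)
    step = np.linalg.solve(W.T @ A_inv @ G, W.T @ A_inv @ f)
    return alpha_1 - step

Passing Ĝ or Ḡ as W is the only place where the two estimators differ in this step, which is the point of the comparison above.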
Note that (8.2.16) is a generalization of the formula (7.2.12) for the linear case. Unlike the iteration of the linear case, however, the iteration defined by (8.2.16) does not have the property that α̂_(2) is asymptotically equivalent to NLFI when α̂_(1) is consistent. Therefore its main value may be pedagogical, and it may not be useful in practice.