Introduction to the Mathematical and Statistical Foundations of Econometrics
Maximum Likelihood Theory
Consider a random sample $Z_1, \ldots, Z_n$ from a $k$-variate distribution with density $f(z \mid \theta_0)$, where $\theta_0 \in \Theta \subset \mathbb{R}^m$ is an unknown parameter vector with $\Theta$ a given parameter space. As is well known, owing to the independence of the $Z_j$'s, the joint density function of the random vector $Z = (Z_1^T, \ldots, Z_n^T)^T$ is the product of the marginal densities, $\prod_{j=1}^{n} f(z_j \mid \theta_0)$. The likelihood function in this case is defined as this joint density with the nonrandom arguments $z_j$ replaced by the corresponding random vectors $Z_j$, and $\theta_0$ by $\theta$:
$$L_n(\theta) = \prod_{j=1}^{n} f(Z_j \mid \theta). \tag{8.1}$$
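The maximum likelihood (ML) estimator of $\theta_0$ is now $\hat{\theta} = \operatorname{argmax}_{\theta \in \Theta} L_n(\theta)$, or equivalently, because $\ln$ is strictly increasing,
$$\hat{\theta} = \operatorname{argmax}_{\theta \in \Theta} \ln(L_n(\theta)), \tag{8.2}$$
where “argmax” stands for the argument for which the function involved takes its maximum value.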
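As a minimal numerical sketch of (8.2), assuming a hypothetical i.i.d. Poisson model $f(z \mid \theta) = e^{-\theta}\theta^z/z!$ and a finite grid standing in for $\Theta$, the snippet below maximizes $\ln(L_n(\theta))$ by brute force. The function names, the data, and the grid are illustrative assumptions. For this model the maximizer is known in closed form (the sample mean), which provides a check, and maximizing $L_n(\theta)$ itself picks the same grid point because $\ln$ is strictly increasing.

```python
import math

def poisson_log_likelihood(theta, sample):
    """ln L_n(theta) = sum_j ln f(Z_j | theta), with f(z | theta) = exp(-theta) theta**z / z!."""
    return sum(-theta + z * math.log(theta) - math.lgamma(z + 1) for z in sample)

def ml_estimate_on_grid(sample, grid):
    """Return the grid point maximizing ln L_n; the finite grid stands in for Theta."""
    return max(grid, key=lambda theta: poisson_log_likelihood(theta, sample))

sample = [2, 0, 3, 1, 4, 2, 1, 3]              # hypothetical observations, sample mean 2.0
grid = [k / 1000 for k in range(1, 10_001)]    # Theta approximated by {0.001, ..., 10.0}

theta_hat = ml_estimate_on_grid(sample, grid)
# argmax of L_n itself (levels instead of logs) for comparison
theta_hat_levels = max(grid, key=lambda t: math.exp(poisson_log_likelihood(t, sample)))
print(theta_hat, theta_hat_levels)             # prints 2.0 2.0
```

Grid search is used here only because it makes the argmax explicit; in practice a numerical optimizer, or for this model the closed form, would be used instead.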
The ML estimation method is motivated by the fact that, in this case, for all $\theta \in \Theta$,
$$E[\ln(L_n(\theta))] \le E[\ln(L_n(\theta_0))]. \tag{8.3}$$
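To see this, note that $\ln(u) = u - 1$ for $u = 1$ and $\ln(u) < u - 1$ for $0 < u < 1$ and $u > 1$. Therefore, if we take $u = f(Z_j \mid \theta)/f(Z_j \mid \theta_0)$, it follows that, for all $\theta$, $\ln(f(Z_j \mid \theta)/f(Z_j \mid \theta_0)) \le f(Z_j \mid \theta)/f(Z_j \mid \theta_0) - 1$, and taking expectations yields
$$
\begin{aligned}
E[\ln(f(Z_j \mid \theta)/f(Z_j \mid \theta_0))] &\le E[f(Z_j \mid \theta)/f(Z_j \mid \theta_0)] - 1 \\
&= \int_{\{z \in \mathbb{R}^k :\, f(z \mid \theta_0) > 0\}} \frac{f(z \mid \theta)}{f(z \mid \theta_0)}\, f(z \mid \theta_0)\, dz - 1 \\
&= \int_{\{z \in \mathbb{R}^k :\, f(z \mid \theta_0) > 0\}} f(z \mid \theta)\, dz - 1 \le 0.
\end{aligned}
$$
Summing up for $j = 1, 2, \ldots, n$, (8.3) follows.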
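The derivation of (8.3) can be checked numerically. The sketch below assumes a hypothetical normal location model, $Z_j \sim N(\theta_0, 1)$ with $f(z \mid \theta)$ the $N(\theta, 1)$ density, for which a short calculation gives $E[\ln(f(Z_j \mid \theta)/f(Z_j \mid \theta_0))] = -(\theta - \theta_0)^2/2 \le 0$. It verifies the pointwise inequality $\ln(u) \le u - 1$ on a grid and approximates the expectation by the trapezoidal rule on a truncated interval; the function names, truncation width, and step count are arbitrary illustrative choices.

```python
import math

def normal_pdf(z, theta):
    """Density f(z | theta) of N(theta, 1): the hypothetical model used in this sketch."""
    return math.exp(-0.5 * (z - theta) ** 2) / math.sqrt(2 * math.pi)

def expected_log_ratio(theta, theta0, half_width=12.0, steps=20_000):
    """Trapezoidal approximation of E[ln(f(Z|theta) / f(Z|theta0))] with Z ~ N(theta0, 1)."""
    a = theta0 - half_width
    h = 2 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        z = a + i * h
        # ln f(z|theta) - ln f(z|theta0), taken on the log scale
        log_ratio = -0.5 * (z - theta) ** 2 + 0.5 * (z - theta0) ** 2
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * log_ratio * normal_pdf(z, theta0)
    return total * h

# The pointwise inequality ln(u) <= u - 1 that drives the argument, on u = 0.01, ..., 10.0
assert all(math.log(u) <= u - 1 for u in (k / 100 for k in range(1, 1001)))

theta0 = 1.0
for theta in (-1.0, 0.5, 1.0, 2.5):
    # numerical value next to the closed form -(theta - theta0)**2 / 2
    print(theta, expected_log_ratio(theta, theta0), -0.5 * (theta - theta0) ** 2)
```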
This argument reveals that neither the independence assumption of the data $Z = (Z_1^T, \ldots, Z_n^T)^T$ nor the absolute continuity assumption is necessary for (8.3). The only condition that matters is that
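$$E[L_n(\theta)/L_n(\theta_0)] \le 1 \tag{8.4}$$
for all $\theta \in \Theta$ and $n \ge 1$; applying $\ln(u) \le u - 1$ to $u = L_n(\theta)/L_n(\theta_0)$ then gives $E[\ln(L_n(\theta))] - E[\ln(L_n(\theta_0))] \le E[L_n(\theta)/L_n(\theta_0)] - 1 \le 0$. Moreover, if the support of $Z_j$ is not affected by the parameters in $\theta_0$, that is, if in the preceding case the set $\{z \in \mathbb{R}^k : f(z \mid \theta) > 0\}$ is the same for all $\theta \in \Theta$, then the inequality in (8.4) becomes an equality: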
$$E[L_n(\theta)/L_n(\theta_0)] = 1 \tag{8.5}$$
for all $\theta \in \Theta$ and $n \ge 1$. Equality (8.5) is the most common case in econometrics.
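To see why the support condition matters, consider a hypothetical i.i.d. Uniform$(0, \theta)$ model, whose support $[0, \theta]$ moves with $\theta$. A short calculation gives $E[L_n(\theta)/L_n(\theta_0)] = 1$ for $\theta \le \theta_0$ but $(\theta_0/\theta)^n < 1$ for $\theta > \theta_0$, so (8.4) holds while (8.5) fails. The sketch below, with illustrative function names, reproduces this by integrating $f(z \mid \theta)$ over the support of $f(\cdot \mid \theta_0)$ with the midpoint rule and raising the result to the $n$-th power, which is valid under independence.

```python
def uniform_pdf(z, theta):
    """Density f(z | theta) of Uniform(0, theta): its support [0, theta] moves with theta."""
    return 1.0 / theta if 0.0 <= z <= theta else 0.0

def expected_likelihood_ratio(theta, theta0, n, steps=100_000):
    """E[L_n(theta) / L_n(theta0)] for n i.i.d. Uniform(0, theta0) observations.

    Per observation, E[f(Z|theta) / f(Z|theta0)] is the integral of f(z|theta) over the
    support [0, theta0] of f(.|theta0) (midpoint rule); independence gives the n-th power.
    """
    h = theta0 / steps
    per_obs = sum(uniform_pdf((i + 0.5) * h, theta) for i in range(steps)) * h
    return per_obs ** n

for theta in (0.5, 1.0, 1.25, 2.0):
    # equals 1 for theta <= 1.0 and (1.0 / theta)**5 for theta > 1.0
    print(theta, expected_likelihood_ratio(theta, 1.0, n=5))
```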
To show that absolute continuity is not essential for (8.3), suppose that the $Z_j$'s are independent and identically discrete distributed with support $S$, that is, for all $z \in S$, $P[Z_j = z] > 0$ and $\sum_{z \in S} P[Z_j = z] = 1$. Moreover, now let $f(z \mid \theta_0) = P[Z_j = z]$, where $f(z \mid \theta)$ is the probability model involved. Of course, $f(z \mid \theta)$ should be specified such that $\sum_{z \in S} f(z \mid \theta) = 1$ for all $\theta \in \Theta$. For example, suppose that the $Z_j$'s are independent Poisson($\theta_0$) distributed, and thus $f(z \mid \theta) = e^{-\theta}\theta^z/z!$ and $S = \{0, 1, 2, \ldots\}$. Then the likelihood function involved also takes the form (8.1), and
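$$E[f(Z_j \mid \theta)/f(Z_j \mid \theta_0)] = \sum_{z \in S} \frac{f(z \mid \theta)}{f(z \mid \theta_0)}\, f(z \mid \theta_0) = \sum_{z \in S} f(z \mid \theta) = 1;$$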
hence, (8.5) holds in this case as well and therefore so does (8.3).
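The Poisson case can be verified numerically as well. This sketch (function names hypothetical) truncates $S$ at a large value, which is an approximation, and evaluates both $E[f(Z_j \mid \theta)/f(Z_j \mid \theta_0)]$, which should equal 1, and $E[\ln(f(Z_j \mid \theta)/f(Z_j \mid \theta_0))]$, which for this model equals $\theta_0 - \theta + \theta_0 \ln(\theta/\theta_0) \le 0$. Probabilities are handled on the log scale so that $\theta^z$ and $z!$ never overflow.

```python
import math

def poisson_log_pmf(z, theta):
    """ln f(z | theta) for f(z | theta) = exp(-theta) theta**z / z!, on the log scale."""
    return -theta + z * math.log(theta) - math.lgamma(z + 1)

def poisson_expectations(theta, theta0, z_max=100):
    """Truncated sums over S = {0, 1, ..., z_max}, with Z ~ Poisson(theta0), of
    E[f(Z|theta) / f(Z|theta0)] and E[ln(f(Z|theta) / f(Z|theta0))]."""
    e_ratio = 0.0
    e_log_ratio = 0.0
    for z in range(z_max + 1):
        log_p0 = poisson_log_pmf(z, theta0)
        log_ratio = poisson_log_pmf(z, theta) - log_p0
        p0 = math.exp(log_p0)
        e_ratio += p0 * math.exp(log_ratio)  # each term equals f(z | theta)
        e_log_ratio += p0 * log_ratio
    return e_ratio, e_log_ratio

theta0 = 3.0
for theta in (0.5, 3.0, 5.0):
    e_ratio, e_log_ratio = poisson_expectations(theta, theta0)
    closed_form = theta0 - theta + theta0 * math.log(theta / theta0)
    print(theta, e_ratio, e_log_ratio, closed_form)
```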
In this and the previous case the likelihood function takes the form of a product. However, in the dependent case we can also write the likelihood function as a product. For example, let $Z = (Z_1^T, \ldots, Z_n^T)^T$ be absolutely continuously distributed with joint density $f_n(z_n, \ldots, z_1 \mid \theta_0)$, where the $Z_j$'s are no longer independent. It is always possible to decompose a joint density as a product of conditional densities and an initial marginal density. In particular, letting, for $t \ge 2$,
$$f_t(z_t \mid z_{t-1}, \ldots, z_1, \theta) = \frac{f_t(z_t, \ldots, z_1 \mid \theta)}{f_{t-1}(z_{t-1}, \ldots, z_1 \mid \theta)},$$
we can write
$$f_n(z_n, \ldots, z_1 \mid \theta) = f_1(z_1 \mid \theta) \prod_{t=2}^{n} f_t(z_t \mid z_{t-1}, \ldots, z_1, \theta).$$
Therefore, the likelihood function in this case can be written as
$$L_n(\theta) = f_n(Z_n, \ldots, Z_1 \mid \theta) = f_1(Z_1 \mid \theta) \prod_{t=2}^{n} f_t(Z_t \mid Z_{t-1}, \ldots, Z_1, \theta). \tag{8.6}$$
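The factorization (8.6) is how the likelihood of dependent data is typically evaluated. As a minimal sketch, assume a hypothetical stationary Gaussian AR(1) model, $Z_t = \rho Z_{t-1} + e_t$ with $e_t$ i.i.d. $N(0, 1)$ and $|\rho| < 1$, so that $\theta = \rho$, $Z_1 \sim N(0, 1/(1 - \rho^2))$, and the conditional density of $Z_t$ given $Z_{t-1}, \ldots, Z_1$ is $N(\rho Z_{t-1}, 1)$. The code sums the initial marginal log density and the conditional log densities, and for $n = 2$ compares the result with the joint bivariate normal log density computed directly from the covariance matrix. All names and data are illustrative.

```python
import math

def normal_log_density(x, mean, var):
    """ln of the N(mean, var) density at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def ar1_log_likelihood(rho, z):
    """ln L_n(rho) via (8.6) for a hypothetical stationary Gaussian AR(1), |rho| < 1:
    ln f_1(Z_1 | rho) + sum over t >= 2 of ln f_t(Z_t | Z_{t-1}, ..., Z_1, rho)."""
    loglik = normal_log_density(z[0], 0.0, 1.0 / (1.0 - rho ** 2))  # initial marginal density
    for t in range(1, len(z)):
        # in this model the conditional density depends on the past only through z[t - 1]
        loglik += normal_log_density(z[t], rho * z[t - 1], 1.0)
    return loglik

def bivariate_log_density(z1, z2, rho):
    """Joint log density of (Z_1, Z_2) computed directly from its covariance matrix."""
    s = 1.0 / (1.0 - rho ** 2)  # common variance of Z_1 and Z_2
    # Sigma = s * [[1, rho], [rho, 1]] has det(Sigma) = s and inverse [[1, -rho], [-rho, 1]]
    quad = z1 ** 2 - 2 * rho * z1 * z2 + z2 ** 2
    return -math.log(2 * math.pi) - 0.5 * math.log(s) - 0.5 * quad

z = [0.7, -1.2]  # hypothetical observations
for rho in (-0.6, 0.3, 0.9):
    print(rho, ar1_log_likelihood(rho, z), bivariate_log_density(z[0], z[1], rho))
```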
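It is easy to verify that in this case (8.5) also holds, and therefore so does (8.3). Moreover, writing $L_t(\theta) = f_t(Z_t, \ldots, Z_1 \mid \theta)$ for the likelihood of the first $t$ observations, so that $L_t(\theta)/L_{t-1}(\theta) = f_t(Z_t \mid Z_{t-1}, \ldots, Z_1, \theta)$, it follows straightforwardly from (8.6) and the preceding argument that
$$P\left( E\left[ \frac{L_t(\theta)/L_{t-1}(\theta)}{L_t(\theta_0)/L_{t-1}(\theta_0)} \,\middle|\, Z_{t-1}, \ldots, Z_1 \right] \le 1 \right) = 1 \quad \text{for } t = 2, 3, \ldots, n; \tag{8.7}$$
hence,
$$P\left( E\left[ \ln(L_t(\theta)/L_{t-1}(\theta)) - \ln(L_t(\theta_0)/L_{t-1}(\theta_0)) \,\middle|\, Z_{t-1}, \ldots, Z_1 \right] \le 0 \right) = 1 \quad \text{for } t = 2, 3, \ldots, n. \tag{8.8}$$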
Of course, these results hold in the independent case as well.
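The conditional statements above (the conditional expectation of the likelihood-ratio increment is at most 1, and that of the log-likelihood-ratio increment is at most 0) can be checked for a hypothetical Gaussian AR(1) model in which $Z_t$ given the past is $N(\rho Z_{t-1}, 1)$. Because the past enters only through $Z_{t-1}$, conditioning on $Z_{t-1} = z$ suffices, and a short calculation gives a conditional expected ratio of exactly 1 and a conditional expected log ratio of $-(\rho - \rho_0)^2 z^2/2 \le 0$. The sketch approximates both by the trapezoidal rule; the names and integration settings are illustrative assumptions.

```python
import math

def cond_log_density(z, z_prev, rho):
    """ln f_t(z | past, rho) for a hypothetical Gaussian AR(1): Z_t | Z_{t-1} = z_prev ~ N(rho * z_prev, 1)."""
    return -0.5 * math.log(2 * math.pi) - 0.5 * (z - rho * z_prev) ** 2

def conditional_expectations(rho, rho0, z_prev, half_width=12.0, steps=20_000):
    """Trapezoidal approximations, given Z_{t-1} = z_prev and Z_t drawn at the true rho0, of
    E[ratio | past] and E[ln(ratio) | past], ratio = f_t(Z_t | past, rho) / f_t(Z_t | past, rho0)."""
    a = rho0 * z_prev - half_width
    h = 2 * half_width / steps
    e_ratio = 0.0
    e_log_ratio = 0.0
    for i in range(steps + 1):
        z = a + i * h
        # trapezoid weight times the true conditional density
        w = (0.5 if i in (0, steps) else 1.0) * h * math.exp(cond_log_density(z, z_prev, rho0))
        log_ratio = cond_log_density(z, z_prev, rho) - cond_log_density(z, z_prev, rho0)
        e_ratio += w * math.exp(log_ratio)
        e_log_ratio += w * log_ratio
    return e_ratio, e_log_ratio

for rho, rho0, z_prev in [(0.2, 0.7, 1.5), (0.9, 0.1, -2.0), (0.5, 0.5, 0.3)]:
    e_ratio, e_log_ratio = conditional_expectations(rho, rho0, z_prev)
    # closed form of the second quantity: -(rho - rho0)**2 * z_prev**2 / 2
    print(rho, rho0, z_prev, e_ratio, e_log_ratio, -0.5 * (rho - rho0) ** 2 * z_prev ** 2)
```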