A COMPANION TO Theoretical Econometrics
Maximum likelihood estimation
An n x 1 vector y generated by the linear regression model (3.12) with an ARMA(p, q) disturbance process with normal errors, i. e. et ~ IN(0, a2), can be shown to be distributed as
y ~ N(XP, a2Q(p, у)) (3.22)
where a2Q(p, y) is the covariance matrix of the ARMA(p, q) process in which p = (p1,..., pp)' and y = (Y1,..., Yq)' are parameter vectors. The loglikelihood function of this model is
n 1
№ a^ ^ Yly) = - 2 log (2na2) - 2 logl °(p' Y)l 1
- ^(y - XP)'Q(P, Y)-1(y - XP). (3.23)
For any given values of p and y, the values of в and a2 that maximize (3.23) are Up, Y = (X'Q(p, Y)-1X)-1X'Q(p, y)-1y (3.24)
and
dP, Y = (y - ^p/0^ Y)-1(y - XIU/n. (3.25)
If these estimators of в and a2 are substituted back into (3.23), we get the profile or concentrated loglikelihood:
n 1 n
/p(p, Yly) = - 2 log (2ndP, Y) - - logl0(p, Y)l - - (3.26)
Full maximum likelihood estimates of the parameters in this model are therefore obtained by first finding the values of the p and y vectors which maximize (3.26). These values of p and y, denoted p and y, are then used in (3.24) and (3.25) to find U and d2, the maximum likelihood estimates.
There are a number of practical issues in this process worthy of discussion. The first involves exploiting the Cholesky decomposition of Q(p, y) denoted L(p, y) such that
L(p, y)L(p, y)' = Q(p, y)
where L(p, y) is a lower triangular matrix. If we denote
y*(^ Y) = Lfa Y)-1y |
(3.27) |
X*(p, Y) = L(p, Y)-1X, |
(3.28) |
then (3.24) becomes
Up, Y = (X*(p, Y)'X*(p, Y))-1X*(p, Y)'y*(p, Y), (3.29)
the ordinary least squares (OLS) estimator from the regression of y*(p, y) on X*(p, y), and (3.25) becomes
Z p, Y = e *(p, Y)'e *(p, Y)/n, (3.30)
the sum of squared residuals from this regression divided by n, where
e*(p, Y) = y*(p, Y) - X*(p, Y)Up, y. (3.31)
The problem of maximizing (3.26) with respect to p and y now reduces to one of maximizing
n n
- 2 ^gH^ Y/e*^ Y)) - log I L(P, Y)l = - 2 log(e*(p, YУ e*(^ Y))
where
e *(p, y) = e*(p, y)I L(p, Y)l1/n.
Thus the estimation problem reduces to minimizing the sum of squares
s = X e *(p, Y)2 (3.32)
i =1
with respect to p and Y. This may be achieved by applying a standard nonlinear least squares algorithm to s and then using the resultant p and у in
(3.29) and (3.30) to obtain U and o2. Also observe that because L(p, y) is a lower triangular matrix, its determinant is easily calculated as the product of the diagonal elements, i. e.
|L(P, у)| = ПL(P, Y)ii. (3.33)
i=1
The remaining issue is the construction of Q(p, y) or more importantly L(p, y) or L_1(p, y). Note that through (3.27), (3.28), (3.29), (3.30), and (3.32), maximum likelihood estimates can be obtained without the need to calculate Q(p, y) or Q-1(p, y). In the case of AR(1) disturbances
(1 - p2)1/2 0 ..................... 0 0
-P 1 :
L-1(p, X) =
1 0 - P 1
which was first derived by Prais and Winsten (1954) and is also known as the Prais-Winsten transformation.
For AR(p) disturbances, van der Leeuw (1994) has shown that
Q(p, y) = [PP - NN'j-1
where P is the n x n triangular matrix
1 |
0 |
0 ............ |
о о |
.......... 0 0 |
-P1 |
1 |
0 |
0 |
|
P2 |
-Pl |
1 |
Pp Pp-1 .................. |
.......... 1 |
0 ....................... |
.................. 0 |
0 - Pp........................ |
.......... - P1 |
1 |
0 |
: : 10 0 0....................................... 0 0 L - pp..................... - Pj 1
and N is an n x p matrix that has all elements zero except for its top p x p block which has the triangular form
pp pp-i............................ pi
0 - pp p2
0 0
00 0 -pp
As noted by Ara (1995), in this case L(p, y)-1 has the same form as P but with the top left p x p block replaced by the lower triangular matrix
a11 0 .................. 0
$21 0^22 0
ap1 dp2 ........................ app _
The afj values can be calculated recursively in the following order app = (1 - Pp)1^
ap, p-i = (—pi — Pp-i Pp)/app, і = 1, . . . , p — 1
for j = p - 1, p - 2,..., 2 with m = p - j
for € = 1,..., j - 2
and
and
|
|
|
|
Note that in this case
Щр, Y )l-1 — П aii •
For MA(q) disturbances, it is more convenient to construct L(p, y) which is the band triangular matrix:
a11 |
0 |
0 ....................................... |
.............. 0 |
a21 |
a22 |
0 |
0 |
a31 |
a32 |
a33 |
0 |
aq+1,1 aq+1,2 aq+1,3 |
aq + 1,q+1 |
0 |
0 aq + 2,2 aq + 2,3 |
0 |
0 |
O 0 |
|||
0 |
0 |
0 ....................... |
' * * an, n-q ' " |
* ann |
(3.34)
We solve for the nonzero elements of this matrix recursively via
W) = (1 + Y2 + ... + Y 2)
Wi = (Yi + Y1Y+1 + ... + Yq-iYq), i = 1,..., q,
au = w0/2
a;+11 — Wi/ an, i — 1,..., q,
for j = 2,, q - 1 (using aij = 0 for all zero values of the elements of the above matrix in these formulae). For i = q,..., n
1 A
W0 - X a2,i-j v j—1
and
z* = z*i = |
z* = |
Observe that given the band triangular nature of (3.34), the transformation of z to z* = L(p, Y)-1z, where z denotes an n x 1 vector (either y in order to get y*(p, y) or a column of X in order to compute X*(p, y)) can be performed recursively by
The generalization of this approach to ARMA( p, q) disturbances is discussed by Ansley (1979) and in the special case of ARMA(1, 1) disturbances, is outlined by Rahman and King (1993). The above approach to maximum likelihood estimation is a general approach which requires numerical methods for optimization of the profile or concentrated loglikelihood. In the special cases of AR(1) disturbances and AR(2) disturbances, Beach and MacKinnon (1978a, 1978b) have derived an iterative method for solving the first-order conditions for the full maximum likelihood estimates. For further discussion of this and other methods, also see Davidson and MacKinnon (1993).