Advanced Econometrics Takeshi Amemiya
Newton-Raphson Method
The Newton-Raphson method is based on the following quadratic approximation of the maximand (or minimand, as the case may be):
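$$
Q(\theta) \cong Q(\hat{\theta}_1) + g_1'(\theta - \hat{\theta}_1) + \tfrac{1}{2}(\theta - \hat{\theta}_1)' H_1 (\theta - \hat{\theta}_1), \tag{4.4.1}
$$
where $\hat{\theta}_1$ is an initial estimate and
$$
g_1 = \left.\frac{\partial Q}{\partial \theta}\right|_{\hat{\theta}_1}, \qquad H_1 = \left.\frac{\partial^2 Q}{\partial \theta\, \partial \theta'}\right|_{\hat{\theta}_1}.
$$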
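The second-round estimator $\hat{\theta}_2$ of the Newton-Raphson iteration is obtained by maximizing the right-hand side of the approximation (4.4.1). Setting its derivative with respect to $\theta$ equal to zero gives $g_1 + H_1(\theta - \hat{\theta}_1) = 0$. Therefore
$$
\hat{\theta}_2 = \hat{\theta}_1 - H_1^{-1} g_1. \tag{4.4.2}
$$
The iteration (4.4.2) is to be repeated until the sequence $\{\hat{\theta}_n\}$ thus obtained converges.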
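A minimal Python sketch of iteration (4.4.2), assuming NumPy and assuming the researcher supplies the gradient and Hessian of $Q$ as functions. The names `newton_raphson`, `grad`, `hess`, the tolerance `tol`, the stopping rule (stop when the step is tiny), and the test maximand $Q(\theta) = \log\theta_1 + \log\theta_2 - \theta_1 - 2\theta_2$ are all illustrative choices, not part of the text.

```python
import numpy as np

def newton_raphson(grad, hess, theta1, tol=1e-10, max_iter=100):
    """Iterate theta_{n+1} = theta_n - H_n^{-1} g_n, as in (4.4.2).

    grad(theta) returns g, the gradient of Q; hess(theta) returns H, its Hessian.
    The same update finds a minimum if Q is a minimand.
    """
    theta = np.asarray(theta1, dtype=float)
    for _ in range(max_iter):
        g = grad(theta)
        H = hess(theta)
        step = np.linalg.solve(H, g)  # computes H^{-1} g without inverting H
        theta = theta - step
        if np.linalg.norm(step) < tol:  # hypothetical convergence rule
            return theta
    raise RuntimeError("Newton-Raphson did not converge")

# Hypothetical concave maximand Q = log(t1) + log(t2) - t1 - 2*t2, maximized at (1, 0.5)
grad = lambda t: np.array([1 / t[0] - 1, 1 / t[1] - 2])
hess = lambda t: np.diag([-1 / t[0] ** 2, -1 / t[1] ** 2])

theta_hat = newton_raphson(grad, hess, [0.5, 0.3])
print(theta_hat)
```

For this particular $Q$ the error in the first coordinate is exactly squared at each step, so only a handful of iterations are needed; that fast local convergence is what makes the method attractive when the starting value is good.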
Inserting iteration (4.4.2) back into approximation (4.4.1) yields
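$$
Q(\hat{\theta}_2) \cong Q(\hat{\theta}_1) - \tfrac{1}{2}(\hat{\theta}_2 - \hat{\theta}_1)' H_1 (\hat{\theta}_2 - \hat{\theta}_1). \tag{4.4.3}
$$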
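The substitution behind (4.4.3) takes one line. From (4.4.2), $g_1 = -H_1(\hat{\theta}_2 - \hat{\theta}_1)$, and because $H_1$ is symmetric, $g_1'(\hat{\theta}_2 - \hat{\theta}_1) = -(\hat{\theta}_2 - \hat{\theta}_1)' H_1 (\hat{\theta}_2 - \hat{\theta}_1)$. Putting $\theta = \hat{\theta}_2$ in (4.4.1) then gives

$$
Q(\hat{\theta}_2) \cong Q(\hat{\theta}_1) - (\hat{\theta}_2 - \hat{\theta}_1)' H_1 (\hat{\theta}_2 - \hat{\theta}_1) + \tfrac{1}{2}(\hat{\theta}_2 - \hat{\theta}_1)' H_1 (\hat{\theta}_2 - \hat{\theta}_1) = Q(\hat{\theta}_1) - \tfrac{1}{2}(\hat{\theta}_2 - \hat{\theta}_1)' H_1 (\hat{\theta}_2 - \hat{\theta}_1).
$$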
Equation (4.4.3) shows a weakness of this method: Even if (4.4.3) holds exactly, $Q(\hat{\theta}_2) > Q(\hat{\theta}_1)$ is not guaranteed unless $H_1$ is a negative definite matrix. Another weakness is that even if $H_1$ is negative definite, $\hat{\theta}_2 - \hat{\theta}_1$ may be too large or too small: If it is too large, it overshoots the target; if it is too small, the speed of convergence is slow.
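A one-dimensional illustration of the first weakness, using a hypothetical maximand $Q(\theta) = -\theta^4/4 + \theta^2/2$, which has maxima at $\theta = \pm 1$ and a local minimum at $\theta = 0$. At the assumed starting value $\hat{\theta}_1 = 0.2$ the second derivative is positive (in one dimension, "not negative definite"), so the Newton-Raphson step heads toward the minimum and lowers $Q$.

```python
def Q(theta):
    return -theta**4 / 4 + theta**2 / 2   # maxima at +-1, local minimum at 0

def grad(theta):
    return -theta**3 + theta

def hess(theta):
    return -3 * theta**2 + 1

theta1 = 0.2
H1 = hess(theta1)                       # positive here, so not negative definite
theta2 = theta1 - grad(theta1) / H1     # one Newton-Raphson step (4.4.2)

print(f"H1 = {H1:.2f}, theta2 = {theta2:.4f}")
print(f"Q(theta1) = {Q(theta1):.5f}, Q(theta2) = {Q(theta2):.5f}")
```

The step jumps past zero toward the stationary point that is a minimum, which is exactly the case (4.4.3) warns about when $H_1$ is not negative definite.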
The first weakness may be alleviated if we modify (4.4.2) as
$$
\hat{\theta}_2 = \hat{\theta}_1 - (H_1 - \alpha_1 I)^{-1} g_1, \tag{4.4.4}
$$
where $I$ is the identity matrix and $\alpha_1$ is a scalar to be appropriately chosen by the researcher subject to the condition that $H_1 - \alpha_1 I$ is negative definite. This modification was proposed by Goldfeld, Quandt, and Trotter (1966) and is called quadratic hill-climbing. [Goldfeld, Quandt, and Trotter (1966) and Goldfeld and Quandt (1972, Chapter 1) have discussed how to choose $\alpha_1$ and the convergence properties of the method.]
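A minimal sketch of one step of (4.4.4), assuming NumPy. The rule for $\alpha_1$ here is a hypothetical one chosen for brevity, and it is not the rule Goldfeld, Quandt, and Trotter propose: take the smallest nonnegative shift that makes the largest eigenvalue of $H_1 - \alpha_1 I$ equal to $-\delta$, with $\delta = 1$ an arbitrary tuning constant. The test maximand $Q(\theta) = -\theta_1^4/4 + \theta_1^2/2 - \theta_2^2$ and the starting point are also assumptions.

```python
import numpy as np

def hill_climbing_step(theta, g, H, delta=1.0):
    """One quadratic hill-climbing step (4.4.4): theta - (H - alpha*I)^{-1} g.

    alpha follows a hypothetical rule: the smallest alpha >= 0 that pushes the
    largest eigenvalue of H - alpha*I down to -delta (or below, if alpha = 0).
    """
    lam_max = np.linalg.eigvalsh(H).max()      # H is symmetric
    alpha = max(0.0, lam_max + delta)
    A = H - alpha * np.eye(len(theta))         # negative definite by construction
    return theta - np.linalg.solve(A, g), alpha

# Hypothetical maximand that is not concave near theta_1 = 0
def Q(t):
    return -t[0] ** 4 / 4 + t[0] ** 2 / 2 - t[1] ** 2

def grad(t):
    return np.array([-t[0] ** 3 + t[0], -2 * t[1]])

def hess(t):
    return np.diag([-3 * t[0] ** 2 + 1, -2.0])

theta1 = np.array([0.2, 1.0])                  # H1 has a positive eigenvalue here
theta2, alpha = hill_climbing_step(theta1, grad(theta1), hess(theta1))
print(alpha, theta2, Q(theta1), Q(theta2))
```

Because $H_1 - \alpha_1 I$ is negative definite, $-(H_1 - \alpha_1 I)^{-1} g_1$ always has a positive inner product with $g_1$, so the step points uphill; how far it goes still depends on $\alpha_1$, and a very small $\delta$ would produce a huge step along the direction of near-zero curvature.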
The second weakness may be remedied by the modification
$$
\hat{\theta}_2 = \hat{\theta}_1 - \lambda_1 H_1^{-1} g_1, \tag{4.4.5}
$$
where the scalar $\lambda_1$ is to be appropriately determined. Fletcher and Powell (1963) have presented a method to determine $\lambda_1$ by cubic interpolation of $Q(\theta)$ along the current search direction. [This method is called the DFP iteration because Fletcher and Powell refined the method originally proposed by Davidon (1959).] Also, Berndt et al. (1974) have presented another method for choosing $\lambda_1$.
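The text determines $\lambda_1$ by cubic interpolation (Fletcher and Powell) or by the method of Berndt et al.; the sketch below swaps in plain step halving, which is simpler and is neither of those methods. It tries $\lambda = 1, 1/2, 1/4, \ldots$ and keeps the first value that raises $Q$. The test maximand $Q(\theta) = -\sqrt{1 + \theta^2}$ is a hypothetical choice: it is concave everywhere, yet its full Newton-Raphson map is $\theta \mapsto -\theta^3$, so from $\hat{\theta}_1 = 2$ the undamped step overshoots to $-8$.

```python
import math

def Q(theta):
    return -math.sqrt(1 + theta**2)       # concave, maximized at theta = 0

def grad(theta):
    return -theta / math.sqrt(1 + theta**2)

def hess(theta):
    return -(1 + theta**2) ** -1.5        # negative everywhere

def halving_step_length(theta, direction, max_halvings=50):
    """Return the first lambda in 1, 1/2, 1/4, ... that raises Q, or None."""
    lam = 1.0
    for _ in range(max_halvings):
        if Q(theta + lam * direction) > Q(theta):
            return lam
        lam /= 2
    return None

def damped_newton(theta, tol=1e-12, max_iter=100):
    """Iterate theta <- theta - lambda * H^{-1} g, as in (4.4.5)."""
    for _ in range(max_iter):
        direction = -grad(theta) / hess(theta)   # full Newton step, -H^{-1} g
        lam = halving_step_length(theta, direction)
        if lam is None:            # no increase in Q detectable at machine precision
            return theta
        new = theta + lam * direction
        if abs(new - theta) < tol:
            return new
        theta = new
    raise RuntimeError("damped Newton did not converge")

print(halving_step_length(2.0, -grad(2.0) / hess(2.0)))
print(damped_newton(2.0))
```

From $\hat{\theta}_1 = 2$ the halving rule rejects $\lambda = 1$ and $\lambda = 1/2$ and accepts $\lambda = 1/4$; after that, full steps are accepted and the iteration settles at the maximum. Requiring a strict increase in $Q$ prevents the iteration from cycling between points with equal $Q$.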
The Newton-Raphson method can be used to obtain either the maximum likelihood or the nonlinear least squares estimator by choosing the appropriate $Q$. In the case of the MLE, $E(\partial^2 \log L/\partial\theta\,\partial\theta')$ may be substituted for $\partial^2 \log L/\partial\theta\,\partial\theta'$ in defining $H$. If this is done, the iteration is called the method of scoring (see Rao, 1973, p. 366, or Zacks, 1971, p. 232). In view of Eq. (4.2.22), $-E[(\partial \log L/\partial\theta)(\partial \log L/\partial\theta')]$ may be used instead; then we need not calculate the second derivatives of $\log L$.
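A small sketch of both substitutions for the MLE, assuming NumPy, a hypothetical i.i.d. Poisson sample with mean $e^{\theta}$, and made-up counts. The scoring step uses $E(\partial^2 \log L/\partial\theta^2) = -n e^{\theta}$. The second step replaces the expectation $-E[(\partial \log L/\partial\theta)^2]$ by its sample analogue $-\sum_i s_i^2$ built from per-observation scores $s_i = y_i - e^{\theta}$ (the outer-product form often associated with Berndt et al., 1974), so no second derivatives are computed. Both should converge to the MLE $\hat{\theta} = \log \bar{y}$.

```python
import numpy as np

y = np.array([2, 0, 3, 1, 4, 2, 1, 3], dtype=float)   # hypothetical counts, mean 2

def scores(theta):
    """Per-observation scores d log f(y_i)/d theta for Poisson with mean exp(theta)."""
    return y - np.exp(theta)

def scoring_step(theta):
    # Method of scoring: H replaced by its expectation, -n * exp(theta)
    return theta + scores(theta).sum() / (len(y) * np.exp(theta))

def outer_product_step(theta):
    # H replaced by -sum_i s_i^2; only first derivatives are needed
    s = scores(theta)
    return theta + s.sum() / (s @ s)

def iterate(step, theta, tol=1e-12, max_iter=200):
    for _ in range(max_iter):
        new = step(theta)
        if abs(new - theta) < tol:
            return new
        theta = new
    raise RuntimeError("iteration did not converge")

theta_scoring = iterate(scoring_step, 0.0)
theta_outer = iterate(outer_product_step, 0.0)
print(theta_scoring, theta_outer, np.log(y.mean()))
```

In this model the Hessian $-n e^{\theta}$ does not depend on the data, so scoring coincides with Newton-Raphson; the outer-product version differs and, with these counts, converges more slowly (linearly rather than quadratically) because $\sum_i s_i^2$ only approximates $n e^{\theta}$ at the MLE.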