Asymptotic Properties of Least Squares and Maximum Likelihood Estimator in the Autoregressive Model
We shall first consider the least squares (LS) estimation of the parameters $\rho_1, \ldots, \rho_p$ and $\sigma^2$ of an AR(p) model (5.2.30) using $T$ observations $y_1, y_2, \ldots, y_T$. In vector notation we can write model (5.2.30) as

$$y = Y\rho + \epsilon, \tag{5.4.1}$$

where $y = (y_{p+1}, y_{p+2}, \ldots, y_T)'$, $\epsilon = (\epsilon_{p+1}, \epsilon_{p+2}, \ldots, \epsilon_T)'$, $\rho = (\rho_1, \rho_2, \ldots, \rho_p)'$, and

$$Y = \begin{bmatrix} y_p & y_{p-1} & \cdots & y_1 \\ y_{p+1} & y_p & \cdots & y_2 \\ \vdots & \vdots & & \vdots \\ y_{T-1} & y_{T-2} & \cdots & y_{T-p} \end{bmatrix}.$$
The LS estimator $\hat\rho$ of $\rho$ is defined as $\hat\rho = (Y'Y)^{-1}Y'y$. In the special case of $p = 1$, it becomes

$$\hat\rho = \frac{\sum_{t=2}^{T} y_{t-1} y_t}{\sum_{t=2}^{T} y_{t-1}^2}. \tag{5.4.2}$$
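To make the estimator concrete, here is a minimal Python sketch (the function name and interface are our own, and `numpy` is assumed) that forms the lagged matrix $Y$ of (5.4.1) and computes $\hat\rho = (Y'Y)^{-1}Y'y$:

```python
import numpy as np

def ar_ls(y, p):
    """LS estimation of an AR(p) model, following (5.4.1)-(5.4.2).

    A minimal sketch: build the lagged regressor matrix Y and
    solve the least squares problem y = Y rho + eps.
    """
    y = np.asarray(y, dtype=float)
    T = len(y)
    # Row for period t holds (y_{t-1}, ..., y_{t-p}), t = p+1, ..., T.
    Y = np.column_stack([y[p - j - 1 : T - j - 1] for j in range(p)])
    rhs = y[p:]                      # (y_{p+1}, ..., y_T)'
    rho_hat, *_ = np.linalg.lstsq(Y, rhs, rcond=None)
    return rho_hat
```

For $p = 1$ this reduces to the ratio in (5.4.2).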
Model (5.4.1) superficially looks like the classical regression model (1.1.4), but it is not, because the regressors $Y$ cannot be regarded as constants. This makes it difficult to derive the mean and variance of $\hat\rho$ for general $p$. However, in the case of $p = 1$, the exact distribution of $\hat\rho$ can be calculated by direct integration using the method of Imhof (1961). The distribution is negatively skewed and downward biased. Phillips (1977) derived the Edgeworth expansion [up to the order of $O(T^{-1})$] of the distribution of $\hat\rho$ in the case of $p = 1$, assuming the normality of $\{\epsilon_t\}$, and compared it with the exact distribution. He found that the approximation is satisfactory for $\rho = 0.4$ but not for $\rho = 0.8$.
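The downward bias in the case $p = 1$ is easy to reproduce by simulation. In the sketch below, the sample size, replication count, and all names are illustrative choices, not from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_ar1(rho, T, sigma=1.0):
    """Generate one AR(1) path, started from the stationary distribution."""
    y = np.empty(T)
    y[0] = rng.normal(scale=sigma / np.sqrt(1 - rho**2))
    for t in range(1, T):
        y[t] = rho * y[t - 1] + rng.normal(scale=sigma)
    return y

def ls_ar1(y):
    """LS estimator (5.4.2) for p = 1."""
    return (y[:-1] @ y[1:]) / (y[:-1] @ y[:-1])

# Arbitrary settings: T = 50 and 10,000 replications.
for rho in (0.4, 0.8):
    est = np.array([ls_ar1(simulate_ar1(rho, 50)) for _ in range(10_000)])
    print(f"rho = {rho}: mean of rho_hat = {est.mean():.3f}")
```

The averaged estimates fall below the true $\rho$, consistent with the negative skewness and downward bias noted above.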
In the general AR(p) model we must rely on the asymptotic properties of the LS estimator. For model (5.4.1) with Assumptions A, B″, and C, Anderson (1971, p. 193) proved

$$\sqrt{T}(\hat\rho - \rho) \to N(0, \sigma^2\Sigma_p^{-1}), \tag{5.4.3}$$

where $\Sigma_p$ is the $p \times p$ autocovariance matrix of the AR(p) process. We can estimate $\sigma^2$ consistently by

$$\hat\sigma^2 = T^{-1}(y - Y\hat\rho)'(y - Y\hat\rho). \tag{5.4.4}$$

Because $\operatorname*{plim}_{T\to\infty} T^{-1}Y'Y = \Sigma_p$, the distribution of $\hat\rho$ may be approximated by $N[\rho, \hat\sigma^2(Y'Y)^{-1}]$. Note that this conclusion is identical to the conclusion we derived for the classical regression model in Theorem 3.5.4. Thus the asymptotic theory of estimation and hypothesis testing developed in Chapters 3 and 4 for the classical regression model (1.1.4) can be applied to the autoregressive model (5.4.1). There is one difference: in testing a null hypothesis that specifies the values of all the elements of $\rho$, $\sigma^2\Sigma_p^{-1}$ need not be estimated because it depends only on $\rho$ [see (5.4.16) for the case of $p = 1$].
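As a sketch of the resulting inference recipe (names are illustrative; this simply combines (5.4.4) with the approximation $N[\rho, \hat\sigma^2(Y'Y)^{-1}]$):

```python
import numpy as np

def ar_ls_inference(y, p):
    """LS fit of AR(p) with the asymptotic approximation
    rho_hat ~ N[rho, sigma2_hat * (Y'Y)^{-1}]."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    Y = np.column_stack([y[p - j - 1 : T - j - 1] for j in range(p)])
    rhs = y[p:]
    rho_hat = np.linalg.solve(Y.T @ Y, Y.T @ rhs)
    resid = rhs - Y @ rho_hat
    sigma2_hat = resid @ resid / T        # (5.4.4): divide by T, not T - p
    cov = sigma2_hat * np.linalg.inv(Y.T @ Y)
    return rho_hat, sigma2_hat, np.sqrt(np.diag(cov))  # estimates, variance, s.e.
```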
We shall consider the simplest case of AR(1) and give a detailed proof of the consistency and a sketch of the proof of the asymptotic normality. These are based on the proof of Anderson (1971).

From (5.4.2) we have

$$\hat\rho - \rho = \frac{\sum_{t=2}^{T} y_{t-1}\epsilon_t}{\sum_{t=2}^{T} y_{t-1}^2}. \tag{5.4.5}$$
We shall prove consistency by showing that $T^{-1}$ times the numerator converges to 0 in probability and that $T^{-1}$ times the denominator converges to a positive constant in probability.
The cross-product terms in $\left(\sum_{t=2}^{T} y_{t-1}\epsilon_t\right)^2$ are of the form $y_t y_{t+s}\epsilon_{t+1}\epsilon_{t+1+s}$. But their expectation is 0 because $\epsilon_{t+1+s}$ is independent of $y_t y_{t+s}\epsilon_{t+1}$. Thus

$$E\left(\frac{1}{T}\sum_{t=2}^{T} y_{t-1}\epsilon_t\right)^2 = \frac{(T-1)\sigma^4}{T^2(1-\rho^2)}. \tag{5.4.6}$$
Therefore, by Theorem 3.2.1 (Chebyshev), we have

$$\operatorname*{plim}_{T\to\infty}\frac{1}{T}\sum_{t=2}^{T} y_{t-1}\epsilon_t = 0. \tag{5.4.7}$$
Putting $\epsilon_t = y_t - \rho y_{t-1}$ in (5.4.7), we have

$$\operatorname*{plim}_{T\to\infty}\frac{1}{T}\sum_{t=2}^{T} y_{t-1}(y_t - \rho y_{t-1}) = 0. \tag{5.4.8}$$
By Theorem 3.3.2 (Kolmogorov LLN 2), we have

$$\operatorname*{plim}_{T\to\infty}\frac{1}{T}\sum_{t=2}^{T}(y_t - \rho y_{t-1})^2 = \sigma^2. \tag{5.4.9}$$
By adding (5.4.9) and $2\rho$ times (5.4.8), we have

$$\operatorname*{plim}_{T\to\infty}\frac{1}{T}\left(\sum_{t=2}^{T} y_t^2 - \rho^2\sum_{t=2}^{T} y_{t-1}^2\right) = \sigma^2. \tag{5.4.10}$$
Therefore

$$\operatorname*{plim}_{T\to\infty}\frac{1}{T}\sum_{t=2}^{T} y_{t-1}^2 = \frac{\sigma^2}{1-\rho^2} + \frac{1}{1-\rho^2}\operatorname*{plim}_{T\to\infty}\frac{1}{T}\left(y_1^2 - y_T^2\right). \tag{5.4.11}$$
But the last term of (5.4.11) is 0 because of (3.2.5), the generalized Chebyshev inequality. Therefore

$$\operatorname*{plim}_{T\to\infty}\frac{1}{T}\sum_{t=2}^{T} y_{t-1}^2 = \frac{\sigma^2}{1-\rho^2}. \tag{5.4.12}$$
The consistency of $\hat\rho$ follows from (5.4.5), (5.4.7), and (5.4.12) because of Theorem 3.2.6.
Next consider asymptotic normality. For this purpose we need the following definition: a sequence $\{v_t\}$ is said to be K-dependent if $(v_{t_1}, v_{t_2}, \ldots, v_{t_n})$ is independent of $(v_{s_1}, v_{s_2}, \ldots, v_{s_m})$ for any set of integers satisfying $t_1 < t_2 < \cdots < t_n < s_1 < s_2 < \cdots < s_m$ and $t_n + K < s_1$.
To apply a central limit theorem to $\sum_{t=1}^{T} v_t$, split the $T$ observations into $S$ successive groups of $M + K$ observations each [so that $S(M + K) = T$] and then in each group retain the first $M$ observations and eliminate the remaining $K$ observations. If we choose $S$ and $M$ in such a way that $S \to \infty$, $M \to \infty$, and $S/T \to 0$ as $T \to \infty$, the elimination of $K$ observations from each of the $S$ groups does not matter asymptotically.
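A small illustrative sketch of this grouping device (all settings and names are arbitrary): the retained blocks are separated by more than $K$ positions, so under K-dependence the block sums are independent.

```python
import numpy as np

def blocked_indices(T, M, K):
    """Split 0..T-1 into successive groups of M + K indices and
    keep the first M of each group; the K trailing indices are
    dropped so retained blocks are more than K apart."""
    S = T // (M + K)                  # number of complete groups
    return [np.arange(g * (M + K), g * (M + K) + M) for g in range(S)]

# Example: T = 120, M = 20, K = 4 gives S = 5 retained blocks.
for block in blocked_indices(120, 20, 4):
    print(block[0], "...", block[-1])
```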
We can write

$$\frac{1}{\sqrt{T}}\sum_{t=2}^{T} y_{t-1}\epsilon_t = \frac{1}{\sqrt{T}}\sum_{t=2}^{T} v_{Nt} + A_{NT}, \tag{5.4.13}$$

where $v_{Nt} = \epsilon_t\sum_{s=0}^{N}\rho^s\epsilon_{t-1-s}$ and $A_{NT} = \frac{1}{\sqrt{T}}\sum_{t=2}^{T}\left(\epsilon_t\sum_{s=N+1}^{\infty}\rho^s\epsilon_{t-1-s}\right)$. But we have

$$E A_{NT}^2 = \frac{1}{T}\sum_{t=2}^{T}\sum_{s=N+1}^{\infty}\sum_{w=N+1}^{\infty}\rho^s\rho^w\, E\,\epsilon_t^2\,\epsilon_{t-1-s}\,\epsilon_{t-1-w} = \frac{T-1}{T}\cdot\frac{\sigma^4\rho^{2(N+1)}}{1-\rho^2}. \tag{5.4.14}$$
Therefore $A_{NT}$ can be ignored for large enough $N$. (Anderson, 1971, Theorem 7.7.1, has given an exact statement of this.) We can show that for a fixed $N$,
$$E v_{Nt} = 0, \qquad E v_{Nt}^2 = \sigma^4\sum_{s=0}^{N}\rho^{2s}, \qquad\text{and}\qquad E v_{Nt} v_{N\tau} = 0 \;\text{ for }\; t \neq \tau.$$
Moreover, $v_{Nt}$ and $v_{N,t+N+2}$ are independent for all $t$. Therefore $\{v_{Nt}\}$ for each $N$ are $(N + 1)$-dependent and can be subjected to a central limit theorem (see Anderson, 1971, Theorem 7.7.5). Therefore
$$\frac{1}{\sqrt{T}}\sum_{t=2}^{T} y_{t-1}\epsilon_t \to N\!\left(0, \frac{\sigma^4}{1-\rho^2}\right). \tag{5.4.15}$$
Combining (5.4.12) and (5.4.15), we have

$$\sqrt{T}(\hat\rho - \rho) \to N(0, 1 - \rho^2). \tag{5.4.16}$$
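Result (5.4.16) can be checked by simulation. In the sketch below ($\rho$, $T$, and the replication count are arbitrary choices), the sample variance of $\sqrt{T}(\hat\rho - \rho)$ should come out close to $1 - \rho^2$:

```python
import numpy as np

rng = np.random.default_rng(1)

rho, T, reps = 0.5, 1_000, 2_000
z = np.empty(reps)
for r in range(reps):
    eps = rng.normal(size=T)
    y = np.empty(T)
    y[0] = eps[0] / np.sqrt(1 - rho**2)   # start at stationarity
    for t in range(1, T):
        y[t] = rho * y[t - 1] + eps[t]
    rho_hat = (y[:-1] @ y[1:]) / (y[:-1] @ y[:-1])
    z[r] = np.sqrt(T) * (rho_hat - rho)
print("sample variance:", z.var(), " theory:", 1 - rho**2)
```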
Now we consider the MLE in AR(1) under the normality of $\{\epsilon_t\}$. The likelihood function is given by

$$L = (2\pi)^{-T/2}\,|\Sigma_1|^{-1/2}\exp\left[-\tfrac{1}{2}\,y'\Sigma_1^{-1}y\right], \tag{5.4.17}$$
where $|\Sigma_1|$ and $\Sigma_1^{-1}$ are given in (5.2.13) and (5.2.14), respectively. Therefore
$$\log L = -\frac{T}{2}\log(2\pi) - \frac{T}{2}\log\sigma^2 + \frac{1}{2}\log(1-\rho^2) - \frac{1}{2\sigma^2}\,Q, \tag{5.4.18}$$

where $Q = (1+\rho^2)\sum_{t=1}^{T} y_t^2 - \rho^2(y_1^2 + y_T^2) - 2\rho\sum_{t=2}^{T} y_{t-1}y_t$. Putting $\partial\log L/\partial\sigma^2 = 0$, we obtain

$$\hat\sigma^2 = \frac{Q}{T}. \tag{5.4.19}$$
Inserting (5.4.19) into (5.4.18) yields the concentrated log likelihood function (aside from terms not depending on the unknown parameters)

$$\log L^* = -\frac{T}{2}\log(2\pi) - \frac{T}{2}\log Q + \frac{1}{2}\log(1-\rho^2). \tag{5.4.20}$$
Setting $\partial\log L^*/\partial\rho = 0$ results in a cubic equation in $\rho$ with a unique real root in the range $[-1, 1]$. Beach and MacKinnon (1978) have reported a method of deriving this root.
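Rather than solving the cubic, one can simply maximize (5.4.20) numerically over $(-1, 1)$. The sketch below does this with `scipy`; it is a generic numerical alternative, not the procedure of Beach and MacKinnon (1978):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def ar1_mle(y):
    """Maximize the concentrated log likelihood (5.4.20) over
    rho in (-1, 1).  A numerical sketch; names are illustrative."""
    y = np.asarray(y, dtype=float)
    T = len(y)

    def Q(rho):
        # Q from (5.4.18): (1+rho^2)*sum(y_t^2) - rho^2*(y_1^2+y_T^2)
        #                  - 2*rho*sum(y_{t-1}*y_t)
        return ((1 + rho**2) * (y @ y)
                - rho**2 * (y[0]**2 + y[-1]**2)
                - 2 * rho * (y[:-1] @ y[1:]))

    def neg_loglik(rho):
        # Negative of (5.4.20), dropping the constant term.
        return 0.5 * T * np.log(Q(rho)) - 0.5 * np.log(1 - rho**2)

    res = minimize_scalar(neg_loglik, bounds=(-0.999, 0.999),
                          method="bounded")
    return res.x, Q(res.x) / T        # rho_MLE and sigma2 = Q/T from (5.4.19)
```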
However, by setting $\partial Q/\partial\rho = 0$, we obtain a much simpler estimator

$$\hat\rho_A = \frac{\sum_{t=2}^{T} y_{t-1} y_t}{\sum_{t=2}^{T-1} y_t^2}. \tag{5.4.21}$$

We call it the approximate MLE. Note that it is similar to the least squares estimator $\hat\rho$ given in (5.4.2), for which the range of the summation in the denominator is from $t = 2$ to $T$. If we denote the true MLE by $\hat\rho_M$, we can easily show that $\sqrt{T}(\hat\rho_A - \rho)$ and $\sqrt{T}(\hat\rho_M - \rho)$ have the same limit distribution by using the result of Section 4.2.5.
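A sketch of the two estimators side by side (names illustrative), which makes the difference in the denominators explicit; the denominators differ only by the single end term $y_1^2$:

```python
import numpy as np

def ar1_approx_mle(y):
    """Approximate MLE (5.4.21): same numerator as LS, but the
    denominator sums y_t^2 over t = 2, ..., T-1."""
    y = np.asarray(y, dtype=float)
    return (y[:-1] @ y[1:]) / (y[1:-1] @ y[1:-1])

def ar1_ls(y):
    """LS estimator (5.4.2): denominator sums y_{t-1}^2 over
    t = 2, ..., T, i.e., y_t^2 over t = 1, ..., T-1."""
    y = np.asarray(y, dtype=float)
    return (y[:-1] @ y[1:]) / (y[:-1] @ y[:-1])
```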