INTRODUCTION TO STATISTICS AND ECONOMETRICS
Asymptotic Properties of Least Squares Estimators
In this section we prove the consistency and the asymptotic normality of the least squares estimators $\hat\alpha$ and $\hat\beta$ and the consistency of $\hat\sigma^2$ under suitable assumptions about the regressor $\{x_t\}$.
To prove the consistency of $\hat\alpha$ and $\hat\beta$, we use Theorem 6.1.1, which states that convergence in mean square implies consistency. Since both $\hat\alpha$ and $\hat\beta$ are unbiased estimators of the respective parameters, we need only show that the variances given in (10.2.23) and (10.2.24) converge to zero. Therefore, we conclude that $\hat\alpha$ and $\hat\beta$ are consistent if
(10.2.55) $\lim_{T\to\infty} \sum (1_t^*)^2 = \infty$ and
(10.2.56) $\lim_{T\to\infty} \sum (x_t^*)^2 = \infty$.
We shall rewrite these conditions in terms of the original variables $\{x_t\}$. Since $\sum (1_t^*)^2$ and $\sum (x_t^*)^2$ are the sums of squared prediction errors in predicting the unity regressor by $\{x_t\}$ and in predicting $\{x_t\}$ by the unity regressor, respectively, the condition that the two regressors are distinctly different in some sense is essential for (10.2.55) and (10.2.56) to hold. Given the sequences of constants $\{x_t\}$ and $\{z_t\}$, $t = 1, 2, \ldots, T$, we measure the degree of closeness of the two sequences by the index
(10.2.57) $\rho_T^2 = \dfrac{(\sum x_t z_t)^2}{\sum x_t^2 \sum z_t^2}$.
Then we have $0 \le \rho_T^2 \le 1$. To show $\rho_T^2 \le 1$, consider the identity
(10.2.58) $\sum (x_t - \lambda z_t)^2 = \sum x_t^2 + \lambda^2 \sum z_t^2 - 2\lambda \sum x_t z_t$.
Since (10.2.58) holds for any $\lambda$, it holds in particular when
(10.2.59) $\lambda = \dfrac{\sum x_t z_t}{\sum z_t^2}$.
Inserting (10.2.59) into the right-hand side of (10.2.58) and noting that the left-hand side of (10.2.58) is the sum of nonnegative terms and hence is nonnegative, we obtain the Cauchy-Schwarz inequality:
(10.2.60) $\sum x_t^2 - \dfrac{(\sum x_t z_t)^2}{\sum z_t^2} \ge 0$.
(See Theorem 4.3.5 for another version of the Cauchy-Schwarz inequality.) The desired inequality $\rho_T^2 \le 1$ follows from (10.2.60). Note that $\rho_T^2 = 1$ if and only if $x_t = \lambda z_t$ for all $t$ for some constant $\lambda$ (for example, $x_t = z_t$), and $\rho_T^2 = 0$ if and only if $\{x_t\}$ and $\{z_t\}$ are orthogonal (that is, $\sum x_t z_t = 0$).
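Using the index (10.2.57) with $z_t = 1$, we can write
(10.2.61) $\rho_T^2 = \dfrac{(\sum x_t)^2}{T \sum x_t^2}$,
(10.2.62) $\sum (1_t^*)^2 = (1 - \rho_T^2)\,T$, and
(10.2.63) $\sum (x_t^*)^2 = (1 - \rho_T^2) \sum x_t^2$.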
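The two identities for the residual sums of squares, $\sum (1_t^*)^2 = (1 - \rho_T^2)T$ and $\sum (x_t^*)^2 = (1 - \rho_T^2)\sum x_t^2$, can be checked numerically. The following is a minimal Python sketch, assuming that $1_t^*$ is the residual from predicting the unity regressor by $x_t$ and that $x_t^*$ is the residual from predicting $x_t$ by the unity regressor, as described above. The function names and the sample sequence are hypothetical and serve only as an illustration.

```python
def closeness_index(x, z):
    """Index rho_T^2 = (sum x_t z_t)^2 / (sum x_t^2 * sum z_t^2)."""
    sxz = sum(a * b for a, b in zip(x, z))
    sxx = sum(a * a for a in x)
    szz = sum(b * b for b in z)
    return sxz ** 2 / (sxx * szz)


def residual_sums(x):
    """Return (sum (1*_t)^2, sum (x*_t)^2) computed directly from residuals."""
    T = len(x)
    sx = sum(x)
    sxx = sum(a * a for a in x)
    xbar = sx / T
    # Assumed definition: 1*_t is the residual from regressing 1 on x_t
    # without an intercept.
    one_star = [1.0 - (sx / sxx) * a for a in x]
    # Assumed definition: x*_t is the residual from regressing x_t on 1,
    # that is, the deviation from the sample mean.
    x_star = [a - xbar for a in x]
    return sum(r * r for r in one_star), sum(r * r for r in x_star)


x = [1.0, 2.0, 4.0, 7.0, 11.0]  # hypothetical regressor values
rho2 = closeness_index(x, [1.0] * len(x))
ss_one, ss_x = residual_sums(x)

# Each printed pair should agree up to floating-point rounding.
print(ss_one, (1.0 - rho2) * len(x))
print(ss_x, (1.0 - rho2) * sum(a * a for a in x))
```

Computing the residuals directly, rather than from the closed-form expressions, keeps the check independent of the identities being verified.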
Finally, we state our result as
THEOREM 10.2.1 In the bivariate regression model (10.1.1), the least squares estimators $\hat\alpha$ and $\hat\beta$ are consistent if
(10.2.64) $\lim_{T\to\infty} \sum x_t^2 = \infty$
and
(10.2.65) $\lim_{T\to\infty} \rho_T^2 < 1$.
Note that when we defined the bivariate regression model in Section 10.1, we assumed $\rho_T^2 \ne 1$. The assumption (10.2.65) states that $\rho_T^2 \ne 1$ holds in the limit as well. The condition (10.2.64) is in general not restrictive. Examples of sequences that do not satisfy (10.2.64) are $x_t = t^{-1}$ and $x_t = 2^{-t}$, but we do not commonly encounter these sequences in practice.
Next we prove the consistency of $\hat\sigma^2$. From (10.2.38) we have
(10.2.66) $\hat\sigma^2 = \dfrac{1}{T}\sum u_t^2 - \dfrac{1}{T}\sum (\hat u_t - u_t)^2$.
the transformed sequence has a constant variance for all $T$. This is accomplished by considering the sequence
(10.2.70) $\dfrac{\sum z_t u_t}{\sigma \sqrt{\sum z_t^2}}$,
since the variance of (10.2.70) is unity for all $T$. We need to obtain the conditions on $\{z_t\}$ such that the limit distribution of (10.2.70) is $N(0, 1)$. The answer is provided by the following theorem:
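THEOREM 10.2.2 Let $\{u_t\}$ be i.i.d. with mean zero and a constant variance $\sigma^2$ as in the model (10.1.1). If
(10.2.71) $\lim_{T\to\infty} \dfrac{\max_{1 \le t \le T} z_t^2}{\sum z_t^2} = 0$,
then
(10.2.72) $\dfrac{\sum z_t u_t}{\sigma \sqrt{\sum z_t^2}} \to N(0, 1)$.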
Note that if $z_t = 1$ for all $t$, (10.2.71) is clearly satisfied and this theorem reduces to the Lindeberg-Lévy central limit theorem (Theorem 6.2.2). Accordingly, this theorem may be regarded as a generalization of the Lindeberg-Lévy theorem. It can be proved using the Lindeberg-Feller central limit theorem; see Amemiya (1985, p. 96).
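The content of Theorem 10.2.2 can be illustrated by simulation. The sketch below draws the standardized sum $\sum z_t u_t / (\sigma \sqrt{\sum z_t^2})$ repeatedly and reports its sample mean, its sample variance, and the share of draws within $\pm 1.96$. The choices $z_t = t$, uniformly distributed errors on $(-1, 1)$, the sample size, the number of replications, and the seed are all assumptions made for illustration and do not come from the text.

```python
import math
import random


def standardized_sum(z, u, sigma):
    """Return sum z_t u_t / (sigma * sqrt(sum z_t^2))."""
    numerator = sum(a * b for a, b in zip(z, u))
    return numerator / (sigma * math.sqrt(sum(a * a for a in z)))


def simulate(T=200, reps=2000, seed=0):
    """Draw the standardized sum `reps` times with non-normal errors."""
    rng = random.Random(seed)
    # Assumed weights z_t = t: max z_t^2 / sum z_t^2 is of order 1/T.
    z = [float(t) for t in range(1, T + 1)]
    # Standard deviation of a Uniform(-1, 1) variable is sqrt(1/3).
    sigma = math.sqrt(1.0 / 3.0)
    draws = []
    for _ in range(reps):
        u = [rng.uniform(-1.0, 1.0) for _ in range(T)]
        draws.append(standardized_sum(z, u, sigma))
    return draws


draws = simulate()
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
inside = sum(abs(d) <= 1.96 for d in draws) / len(draws)

# For a standard normal limit one would expect values near 0, 1, and 0.95.
print(mean, var, inside)
```

A simulation of this kind only suggests agreement with the $N(0, 1)$ limit; it is not a substitute for the proof cited above.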
We shall apply the result (10.2.72) to $\hat\beta - \beta$ and $\hat\alpha - \alpha$ by putting $z_t = x_t^*$ and $z_t = 1_t^*$ in turn. Using (10.2.63), we have
(10.2.73) $\dfrac{\max (x_t^*)^2}{\sum (x_t^*)^2} = \dfrac{\max (x_t - \bar{x})^2}{(1 - \rho_T^2) \sum x_t^2} \le \dfrac{4 \max x_t^2}{(1 - \rho_T^2) \sum x_t^2}$,
where the inequality holds because $|x_t - \bar{x}| \le 2 \max |x_t|$. Therefore $\{x_t^*\}$ satisfies the condition (10.2.71) if we assume (10.2.65) and
(10.2.74) $\lim_{T\to\infty} \dfrac{\max_{1 \le t \le T} x_t^2}{\sum x_t^2} = 0$.
Next, using (10.2.61) and (10.2.62), we have
(10.2.75) $\dfrac{\max (1_t^*)^2}{\sum (1_t^*)^2} \le \dfrac{2}{T(1 - \rho_T^2)} + \dfrac{2 \rho_T^2 \max x_t^2}{(1 - \rho_T^2) \sum x_t^2}$.
Therefore $\{1_t^*\}$ satisfies the condition (10.2.71) if we assume (10.2.65) and (10.2.74). Thus we have proved that Theorem 10.2.2 implies the following theorem:
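THEOREM 10.2.3 In the bivariate regression model (10.1.1), assume further (10.2.65) and (10.2.74). Then we have
(10.2.76) $\dfrac{\sqrt{\sum (1_t^*)^2}}{\sigma} (\hat\alpha - \alpha) \to N(0, 1)$
and
(10.2.77) $\dfrac{\sqrt{\sum (x_t^*)^2}}{\sigma} (\hat\beta - \beta) \to N(0, 1)$.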
Using the terminology introduced in Section 6.2, we can say that $\hat\alpha$ and $\hat\beta$ are asymptotically normal with their respective means and variances. Note that the condition (10.2.74) is stronger than (10.2.64), which was required for the consistency proof; this is not surprising since asymptotic normality is a stronger result than consistency. We should point out, however, that (10.2.74) is only mildly more restrictive than (10.2.64). In order to be convinced of this fact, the reader should try to construct a sequence which satisfies (10.2.64) but not (10.2.74).
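One possible answer to this exercise is a geometrically growing sequence. The sketch below, offered only as an illustration, compares $x_t = 2^t$ with $x_t = t$. For $x_t = 2^t$ the sum $\sum x_t^2$ diverges, yet the ratio $\max x_t^2 / \sum x_t^2$ settles near $3/4$ instead of going to zero, because the last term dominates the sum. For $x_t = t$ the same ratio shrinks toward zero. The function names and the particular values of $T$ are hypothetical.

```python
def sum_of_squares(x):
    """Return sum_t x_t^2."""
    return sum(a * a for a in x)


def max_share(x):
    """Return max_t x_t^2 / sum_t x_t^2."""
    return max(a * a for a in x) / sum_of_squares(x)


for T in (5, 10, 20, 40):
    geometric = [2.0 ** t for t in range(1, T + 1)]  # x_t = 2^t
    linear = [float(t) for t in range(1, T + 1)]     # x_t = t
    # Columns: T, sum of squares for 2^t, max share for 2^t, max share for t.
    print(T, sum_of_squares(geometric), max_share(geometric), max_share(linear))
```

The closed form behind the first case is $\sum_{t=1}^{T} 4^t = (4^{T+1} - 4)/3$, so the ratio equals $3 \cdot 4^T / (4^{T+1} - 4)$, which tends to $3/4$.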
The conclusion of Theorem 10.2.3 states that $\hat\alpha$ and $\hat\beta$ are asymptotically normal when each estimator is considered separately. The assumptions of that theorem are actually sufficient to prove the joint asymptotic normality of $\hat\alpha$ and $\hat\beta$; that is, the joint distribution of the random variables defined in (10.2.76) and (10.2.77) converges to a joint normal distribution with zero means, unit variances, and a covariance equal to the limit of their covariance. We shall state this result as a theorem in Chapter 12, where we discuss the general regression model in matrix notation.