The Central Limit Theorem
The prime example of the concept of convergence in distribution is the central limit theorem, which we have seen in action in Figures 6.4–6.6:
Theorem 6.23: Let $X_1, \ldots, X_n$ be i.i.d. random variables satisfying $E(X_j) = \mu$, $\mathrm{var}(X_j) = \sigma^2 < \infty$, and let $\bar{X} = (1/n)\sum_{j=1}^{n} X_j$. Then $\sqrt{n}(\bar{X} - \mu) \to_d N(0, \sigma^2)$.
Proof: Without loss of generality we may assume that $\mu = 0$ and $\sigma = 1$. Let $\varphi(t)$ be the characteristic function of $X_j$. The assumptions $\mu = 0$ and $\sigma = 1$ imply that the first and second derivatives of $\varphi(t)$ at $t = 0$ are $\varphi'(0) = 0$ and $\varphi''(0) = -1$, respectively; hence, by Taylor's theorem applied to $\mathrm{Re}[\varphi(t)]$ and $\mathrm{Im}[\varphi(t)]$ separately, there exist numbers $\lambda_{1,t}, \lambda_{2,t} \in [0, 1]$ such that

$$\varphi(t) = \varphi(0) + t\varphi'(0) + \tfrac{1}{2}t^2\bigl(\mathrm{Re}[\varphi''(\lambda_{1,t} \cdot t)] + i \cdot \mathrm{Im}[\varphi''(\lambda_{2,t} \cdot t)]\bigr) = 1 - \tfrac{1}{2}t^2 + z(t)t^2,$$

say, where $z(t) = \bigl(1 + \mathrm{Re}[\varphi''(\lambda_{1,t} \cdot t)] + i \cdot \mathrm{Im}[\varphi''(\lambda_{2,t} \cdot t)]\bigr)/2$. Note that $z(t)$ is bounded and satisfies $\lim_{t \to 0} z(t) = 0$.
Next, let $\varphi_n(t)$ be the characteristic function of $\sqrt{n}\bar{X}$. Then

$$\varphi_n(t) = \bigl(\varphi(t/\sqrt{n})\bigr)^n = \bigl(1 - t^2/(2n) + z(t/\sqrt{n})\,t^2/n\bigr)^n$$
$$= \bigl(1 - t^2/(2n)\bigr)^n + \sum_{m=1}^{n} \binom{n}{m} \bigl(1 - t^2/(2n)\bigr)^{n-m} \bigl(z(t/\sqrt{n})\,t^2/n\bigr)^m. \qquad (6.32)$$

For $n$ so large that $t^2 < 2n$ we have

$$\left|\varphi_n(t) - \bigl(1 - t^2/(2n)\bigr)^n\right| \leq \sum_{m=1}^{n} \binom{n}{m} \bigl(1 - t^2/(2n)\bigr)^{n-m} \bigl(|z(t/\sqrt{n})|\,t^2/n\bigr)^m$$
$$= \bigl(1 - t^2/(2n) + |z(t/\sqrt{n})|\,t^2/n\bigr)^n - \bigl(1 - t^2/(2n)\bigr)^n. \qquad (6.33)$$
Now observe that, for any real-valued sequence $a_n$ that converges to $a$,

$$\lim_{n\to\infty} \ln\bigl((1 + a_n/n)^n\bigr) = \lim_{n\to\infty} n \ln(1 + a_n/n) = a \times \lim_{\delta \to 0} \frac{\ln(1 + \delta) - \ln(1)}{\delta} = a;$$

hence,

$$\lim_{n\to\infty} a_n = a \;\Rightarrow\; \lim_{n\to\infty} (1 + a_n/n)^n = e^a. \qquad (6.34)$$
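As a quick numerical sanity check of (6.34), the following Python sketch (the sequence $a_n = a + 1/\sqrt{n}$ and the value $a = -0.5$ are arbitrary illustrative choices) evaluates $(1 + a_n/n)^n$ for growing $n$:

```python
# Numerical check of (6.34): if a_n -> a, then (1 + a_n/n)^n -> e^a.
# The sequence a_n = a + 1/sqrt(n) is an arbitrary choice converging to a.
import numpy as np

a = -0.5
for n in [10, 100, 10_000, 1_000_000]:
    a_n = a + 1.0 / np.sqrt(n)
    print(f"n={n:8d}  (1 + a_n/n)^n = {(1 + a_n / n) ** n:.6f}  e^a = {np.exp(a):.6f}")
```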
If we let $a_n = |z(t/\sqrt{n})|\,t^2$, which has limit $a = 0$, it follows from (6.34) that the right-hand expression in (6.33) converges to zero, and if we let $a_n = a = -t^2/2$, it then follows from (6.32) that

$$\lim_{n\to\infty} \varphi_n(t) = e^{-t^2/2}. \qquad (6.35)$$
The right-hand side of (6.35) is the characteristic function of the standard normal distribution. The theorem now follows from Theorem 6.22. Q.E.D.
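Theorem 6.23 is easy to visualize by simulation. The following minimal Python sketch (the exponential distribution, sample size, and number of replications are arbitrary illustrative choices) draws repeated samples from a skewed distribution with $\mu = \sigma^2 = 1$ and checks that $\sqrt{n}(\bar{X} - \mu)$ has approximately the $N(0, \sigma^2)$ moments:

```python
# Illustration of Theorem 6.23: sqrt(n)*(Xbar - mu) for i.i.d. exponential(1)
# draws (mu = 1, sigma^2 = 1) should be approximately N(0, 1) for large n.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2 = 1.0, 1.0
n, replications = 1_000, 20_000

x = rng.exponential(scale=1.0, size=(replications, n))
z = np.sqrt(n) * (x.mean(axis=1) - mu)

print("sample mean of z:", z.mean())             # close to 0
print("sample var  of z:", z.var())              # close to sigma^2 = 1
print("P[z <= 1.645]   :", (z <= 1.645).mean())  # close to Phi(1.645) ~ 0.95
```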
There is also a multivariate version of the central limit theorem:
Theorem 6.24: Let $X_1, \ldots, X_n$ be i.i.d. random vectors in $\mathbb{R}^k$ satisfying $E(X_j) = \mu$, $\mathrm{Var}(X_j) = \Sigma$, where $\Sigma$ is finite, and let $\bar{X} = (1/n)\sum_{j=1}^{n} X_j$. Then $\sqrt{n}(\bar{X} - \mu) \to_d N_k(0, \Sigma)$.
Proof: Let $\xi \in \mathbb{R}^k$ be arbitrary but not a zero vector. Then it follows from Theorem 6.23 that $\sqrt{n}\,\xi^T(\bar{X} - \mu) \to_d N(0, \xi^T\Sigma\xi)$; hence, it follows from Theorem 6.22 that for all $t \in \mathbb{R}$, $\lim_{n\to\infty} E(\exp[i \cdot t\sqrt{n}\,\xi^T(\bar{X} - \mu)]) = \exp(-t^2\xi^T\Sigma\xi/2)$. Choosing $t = 1$, we thus have that, for arbitrary $\xi \in \mathbb{R}^k$, $\lim_{n\to\infty} E(\exp[i \cdot \xi^T\sqrt{n}(\bar{X} - \mu)]) = \exp(-\xi^T\Sigma\xi/2)$. Because the latter is the characteristic function of the $N_k(0, \Sigma)$ distribution, Theorem 6.24 now follows from Theorem 6.22. Q.E.D.
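The proof rests on reducing the multivariate problem to univariate ones via the linear combinations $\xi^T X_j$ (the Cramér-Wold device). A minimal simulation sketch of this reduction (the dimension, the covariance factor $A$, and the vector $\xi$ below are arbitrary choices):

```python
# Sketch of the reduction in Theorem 6.24: for any fixed xi,
# xi' * sqrt(n)(Xbar - mu) should be approximately N(0, xi' Sigma xi).
import numpy as np

rng = np.random.default_rng(1)
k, n, replications = 3, 200, 20_000
mu = np.array([0.0, 1.0, -1.0])
A = np.array([[1.0, 0.0, 0.0], [0.5, 1.0, 0.0], [-0.3, 0.2, 1.0]])
Sigma = A @ A.T                                        # positive definite covariance

# Non-Gaussian i.i.d. vectors with mean mu and variance Sigma:
e = rng.exponential(size=(replications, n, k)) - 1.0   # mean 0, variance 1 components
x = mu + e @ A.T
z = np.sqrt(n) * (x.mean(axis=1) - mu)                 # sqrt(n)(Xbar - mu)

xi = np.array([1.0, -2.0, 0.5])
print("sample var of xi'z:", (z @ xi).var())           # close to xi' Sigma xi
print("xi' Sigma xi      :", xi @ Sigma @ xi)
```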
Next, let $\Phi$ be a continuously differentiable mapping from $\mathbb{R}^k$ to $\mathbb{R}^m$, and let the conditions of Theorem 6.24 hold. The question is, What is the limiting distribution of $\sqrt{n}(\Phi(\bar{X}) - \Phi(\mu))$, if any? To answer this question, assume for the time being that $k = m = 1$ and let $\mathrm{var}(X_j) = \sigma^2$; thus, $\sqrt{n}(\bar{X} - \mu) \to_d N(0, \sigma^2)$. It follows from the mean value theorem (see Appendix II) that there exists a random variable $\lambda \in [0, 1]$ such that

$$\sqrt{n}(\Phi(\bar{X}) - \Phi(\mu)) = \sqrt{n}(\bar{X} - \mu)\,\Phi'(\mu + \lambda(\bar{X} - \mu)).$$

Because $\sqrt{n}(\bar{X} - \mu) \to_d N(0, \sigma^2)$ implies $(\bar{X} - \mu) \to_d 0$, which by Theorem 6.16 implies that $\bar{X} \to_p \mu$, it follows that $\mu + \lambda(\bar{X} - \mu) \to_p \mu$. Moreover, because the derivative $\Phi'$ is continuous in $\mu$, it now follows from Theorem 6.3 that $\Phi'(\mu + \lambda(\bar{X} - \mu)) \to_p \Phi'(\mu)$. Therefore, it follows from Theorem 6.21 that $\sqrt{n}(\Phi(\bar{X}) - \Phi(\mu)) \to_d N[0, \sigma^2(\Phi'(\mu))^2]$. Along similar lines, if we apply the mean value theorem to each of the components of $\Phi$ separately, the following more general result can be proved. This approach is known as the δ-method.
Theorem 6.25: Let $X_n$ be a random vector in $\mathbb{R}^k$ satisfying $\sqrt{n}(X_n - \mu) \to_d N_k[0, \Sigma]$, where $\mu \in \mathbb{R}^k$ is nonrandom. Moreover, let $\Phi(x) = (\Phi_1(x), \ldots, \Phi_m(x))^T$ with $x = (x_1, \ldots, x_k)^T$ be a mapping from $\mathbb{R}^k$ to $\mathbb{R}^m$ such that the $m \times k$ matrix of partial derivatives

$$\Delta(x) = \begin{pmatrix} \partial\Phi_1(x)/\partial x_1 & \cdots & \partial\Phi_1(x)/\partial x_k \\ \vdots & \ddots & \vdots \\ \partial\Phi_m(x)/\partial x_1 & \cdots & \partial\Phi_m(x)/\partial x_k \end{pmatrix} \qquad (6.36)$$

exists in an arbitrarily small open neighborhood of $\mu$ and its elements are continuous in $\mu$. Then $\sqrt{n}(\Phi(X_n) - \Phi(\mu)) \to_d N_m[0, \Delta(\mu)\Sigma\Delta(\mu)^T]$.
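A simulation sketch of the δ-method in the simplest case $k = m = 1$ (with the arbitrary choices $\Phi(x) = \exp(x)$ and normal data) compares the empirical variance of $\sqrt{n}(\Phi(\bar{X}) - \Phi(\mu))$ with the asymptotic variance $\sigma^2(\Phi'(\mu))^2$:

```python
# Sketch of the delta-method (Theorem 6.25) with k = m = 1 and Phi(x) = exp(x):
# sqrt(n)(Phi(Xbar) - Phi(mu)) should be approximately N(0, sigma^2 Phi'(mu)^2).
import numpy as np

rng = np.random.default_rng(2)
mu, sigma = 0.5, 1.0
n, replications = 1_000, 20_000

x = rng.normal(loc=mu, scale=sigma, size=(replications, n))
z = np.sqrt(n) * (np.exp(x.mean(axis=1)) - np.exp(mu))

print("sample var       :", z.var())
print("delta-method var :", sigma**2 * np.exp(mu)**2)   # Phi'(mu) = exp(mu)
```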
6.3. Stochastic Boundedness, Tightness, and the $O_p$ and $o_p$ Notations
The stochastic boundedness and related tightness concepts are important for various reasons, but one of the most important is that they are necessary conditions for convergence in distribution.
Definition 6.7: A sequence of random variables or vectors $X_n$ is said to be stochastically bounded if, for every $\varepsilon \in (0, 1)$, there exists a finite $M > 0$ such that $\inf_{n\geq 1} P[\|X_n\| \leq M] > 1 - \varepsilon$.
Of course, if $X_n$ is bounded itself (i.e., $P[\|X_n\| \leq M] = 1$ for all $n$), it is stochastically bounded as well, but the converse may not be true. For example, if the $X_n$'s are equally distributed (but not necessarily independent) random variables with common distribution function $F$, then for every $\varepsilon \in (0, 1)$ we can choose continuity points $-M$ and $M$ of $F$ such that $P[|X_n| \leq M] = F(M) - F(-M) \geq 1 - \varepsilon$. Thus, the stochastic boundedness condition limits the heterogeneity of the $X_n$'s.
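The argument in the preceding paragraph is constructive: given $\varepsilon$, a suitable bound $M$ can be read off from the quantiles of $F$. A small sketch (using SciPy; the standard Cauchy distribution is an arbitrary choice, deliberately one without moments):

```python
# Choosing M from the quantiles of F so that P[|X_n| <= M] >= 1 - eps.
# The standard Cauchy has no mean, yet identically Cauchy-distributed X_n
# are still stochastically bounded.
from scipy import stats

eps = 0.01
M = stats.cauchy.ppf(1 - eps / 2)          # by symmetry, P[|X| <= M] = 1 - eps
print("M =", M)
print("P[|X| <= M] =", stats.cauchy.cdf(M) - stats.cauchy.cdf(-M))
```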
Stochastic boundedness is usually denoted by $O_p(1)$: $X_n = O_p(1)$ means that the sequence $X_n$ is stochastically bounded. More generally,
Definition 6.8: Let $a_n$ be a sequence of positive nonrandom variables. Then $X_n = O_p(a_n)$ means that $X_n/a_n$ is stochastically bounded, and $O_p(a_n)$ by itself represents a generic random variable or vector $X_n$ such that $X_n = O_p(a_n)$.
The necessity of stochastic boundedness for convergence in distribution follows from the fact that
Theorem 6.26: Convergence in distribution implies stochastic boundedness.
Proof: Let $X_n$ and $X$ be random variables with corresponding distribution functions $F_n$ and $F$, respectively, and assume that $X_n \to_d X$. Given an $\varepsilon \in (0, 1)$, we can choose continuity points $-M_1$ and $M_1$ of $F$ such that $F(M_1) > 1 - \varepsilon/4$ and $F(-M_1) < \varepsilon/4$. Because $\lim_{n\to\infty} F_n(M_1) = F(M_1)$, there exists an index $n_1$ such that $|F_n(M_1) - F(M_1)| < \varepsilon/4$ if $n \geq n_1$; hence, $F_n(M_1) > 1 - \varepsilon/2$ if $n \geq n_1$. Similarly, there exists an index $n_2$ such that $F_n(-M_1) < \varepsilon/2$ if $n \geq n_2$. Let $m = \max(n_1, n_2)$. Then $\inf_{n\geq m} P[|X_n| \leq M_1] > 1 - \varepsilon$. Finally, we can always choose an $M_2$ so large that $\min_{1\leq n\leq m-1} P[|X_n| \leq M_2] > 1 - \varepsilon$. If we take $M = \max(M_1, M_2)$, the theorem follows. The proof of the multivariate case is almost the same. Q.E.D.
Note that, because convergence in probability implies convergence in distribution, it follows trivially from Theorem 6.26 that convergence in probability implies stochastic boundedness.
For example, let $S_n = \sum_{j=1}^{n} X_j$, where the $X_j$'s are i.i.d. random variables with expectation $\mu$ and variance $\sigma^2 < \infty$. If $\mu = 0$, then $S_n = O_p(\sqrt{n})$ because, by the central limit theorem, $S_n/\sqrt{n}$ converges in distribution to $N(0, \sigma^2)$. However, if $\mu \neq 0$, then only $S_n = O_p(n)$ because then $S_n/\sqrt{n} - \mu\sqrt{n} \to_d N(0, \sigma^2)$; hence, $S_n/\sqrt{n} = O_p(1) + O_p(\sqrt{n})$ and thus $S_n = O_p(\sqrt{n}) + O_p(n) = O_p(n)$.
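This $S_n$ example is easy to check by simulation. The sketch below (normal increments and the 99% quantile as the bound $M$ are arbitrary choices) shows that $|S_n|/\sqrt{n}$ has stable quantiles when $\mu = 0$, whereas for $\mu \neq 0$ only $|S_n|/n$ does:

```python
# S_n = O_p(sqrt(n)) when mu = 0, but only S_n = O_p(n) when mu != 0:
# the 99% bound M for |S_n| is stable under the correct normalization.
import numpy as np

rng = np.random.default_rng(3)
for mu in (0.0, 1.0):
    for n in (100, 10_000):
        s_n = (mu + rng.normal(size=(2_000, n))).sum(axis=1)
        M = np.quantile(np.abs(s_n), 0.99)
        print(f"mu={mu}  n={n:6d}  M/sqrt(n)={M/np.sqrt(n):8.2f}  M/n={M/n:8.4f}")
```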
In Definition 6.2 I have introduced the concept of uniform integrability. It is left as an exercise to prove that
Theorem 6.27: Uniform integrability implies stochastic boundedness.
Tightness is the version of stochastic boundedness for probability measures:
Definition 6.9: A sequence of probability measures $\mu_n$ on the Borel sets in $\mathbb{R}^k$ is called tight if, for an arbitrary $\varepsilon \in (0, 1)$, there exists a compact subset $K$ of $\mathbb{R}^k$ such that $\inf_{n\geq 1} \mu_n(K) > 1 - \varepsilon$.
Clearly, if $X_n = O_p(1)$, then the sequence of corresponding induced probability measures $\mu_n$ is tight because the sets of the type $K = \{x \in \mathbb{R}^k : \|x\| \leq M\}$ are closed and bounded for $M < \infty$ and therefore compact.
For sequences of random variables and vectors the tightness concept does not add much over the stochastic boundedness concept, but the tightness concept is fundamental in proving so-called functional central limit theorems.
If $X_n = O_p(1)$, then obviously, for any $\delta > 0$, $X_n = O_p(n^\delta)$. But then $X_n/n^\delta$ is more than stochastically bounded because we also have $X_n/n^\delta \to_p 0$. The latter is denoted by $X_n = o_p(n^\delta)$:
Definition 6.10: Let $a_n$ be a sequence of positive nonrandom variables. Then $X_n = o_p(a_n)$ means that $X_n/a_n$ converges in probability to zero (or a zero vector if $X_n$ is a vector), and $o_p(a_n)$ by itself represents a generic random variable or vector $X_n$ such that $X_n = o_p(a_n)$. Moreover, the sequence $1/a_n$ represents the rate of convergence of $X_n$.
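For instance, under the conditions of Theorem 6.23, $\bar{X} - \mu = O_p(n^{-1/2})$, and therefore $\bar{X} - \mu = o_p(n^{-a})$ for every $a < 1/2$. A minimal sketch of both rates (normal data, so that $\bar{X} - \mu \sim N(0, \sigma^2/n)$ can be drawn directly, and the exponent $0.4$ as an arbitrary choice of $a$):

```python
# Xbar - mu = O_p(n^{-1/2}) = o_p(n^{-0.4}): sqrt(n)|Xbar - mu| has stable
# quantiles, while n^0.4 |Xbar - mu| drifts to zero as n grows.
import numpy as np

rng = np.random.default_rng(5)
for n in (100, 10_000, 1_000_000):
    d = np.abs(rng.normal(scale=1 / np.sqrt(n), size=100_000))  # |Xbar - mu|
    print(f"n={n:8d}  q99 of sqrt(n)*d = {np.quantile(np.sqrt(n) * d, 0.99):.3f}"
          f"  q99 of n^0.4*d = {np.quantile(n**0.4 * d, 0.99):.3f}")
```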
Thus, $X_n \to_p X$ can also be denoted by $X_n = X + o_p(1)$. This notation is handy if the difference between $X_n$ and $X$ is a complicated expression. For example, the result of Theorem 6.25 is obtained because, by the mean value theorem, $\sqrt{n}(\Phi(X_n) - \Phi(\mu)) = \Delta_n(\mu)\sqrt{n}(X_n - \mu) = \Delta(\mu)\sqrt{n}(X_n - \mu) + o_p(1)$, where

$$\Delta_n(\mu) = \begin{pmatrix} \left.\partial\Phi_1(x)/\partial x^T\right|_{x=\mu+\lambda_{1,n}(X_n-\mu)} \\ \vdots \\ \left.\partial\Phi_m(x)/\partial x^T\right|_{x=\mu+\lambda_{m,n}(X_n-\mu)} \end{pmatrix} \quad \text{with } \lambda_{j,n} \in [0, 1] \text{ for } j = 1, \ldots, m.$$

The remainder term $(\Delta_n(\mu) - \Delta(\mu))\sqrt{n}(X_n - \mu)$ can now be represented by $o_p(1)$ because $\Delta_n(\mu) \to_p \Delta(\mu)$ and $\sqrt{n}(X_n - \mu) \to_d N_k[0, \Sigma]$; hence, by Theorem 6.21 this remainder term converges in distribution to the zero vector and thus also in probability to the zero vector.
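To see this $o_p(1)$ remainder at work, the following sketch (with the arbitrary choices $\Phi(x_1, x_2) = x_1/x_2$, normal data drawn at the exact distribution of $\bar{X}$, and the threshold $0.01$) compares $\sqrt{n}(\Phi(\bar{X}) - \Phi(\mu))$ with its linearization $\Delta(\mu)\sqrt{n}(\bar{X} - \mu)$:

```python
# The delta-method remainder (Delta_n(mu) - Delta(mu)) sqrt(n)(Xbar - mu)
# vanishes in probability: here Phi(x1, x2) = x1/x2, so
# Delta(mu) = (1/mu2, -mu1/mu2^2).
import numpy as np

rng = np.random.default_rng(4)
mu = np.array([1.0, 2.0])
grad = np.array([1 / mu[1], -mu[0] / mu[1] ** 2])

for n in (100, 10_000, 1_000_000):
    # For i.i.d. N(mu, 0.25 I) data, Xbar is exactly N(mu, (0.25/n) I):
    xbar = mu + rng.normal(scale=0.5 / np.sqrt(n), size=(100_000, 2))
    exact = np.sqrt(n) * (xbar[:, 0] / xbar[:, 1] - mu[0] / mu[1])
    linear = np.sqrt(n) * (xbar - mu) @ grad
    gap = np.abs(exact - linear)
    print(f"n={n:8d}  P[|remainder| > 0.01] = {(gap > 0.01).mean():.3f}")
```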