Introduction to the Mathematical and Statistical Foundations of Econometrics
Convergence in Distribution
Let X_n be a sequence of random variables (or vectors) with distribution functions F_n(x), and let X be a random variable (or conformable random vector) with distribution function F(x).
Definition 6.6: We say that X_n converges to X in distribution (denoted by X_n →_d X) if lim_{n→∞} F_n(x) = F(x) pointwise in x, possibly except at the discontinuity points of F(x).
Alternative notation: If X has a particular distribution, for example N(0, 1), then X_n →_d X is also denoted by X_n →_d N(0, 1).
The reason for excluding the discontinuity points of F(x) in the definition of convergence in distribution is that lim_{n→∞} F_n(x) may not be right-continuous at these discontinuity points. For example, let X_n = X + 1/n. Then F_n(x) = F(x − 1/n). Now if F(x) is discontinuous at x_0, then lim_{n→∞} F(x_0 − 1/n) < F(x_0); hence lim_{n→∞} F_n(x_0) < F(x_0). Thus, without the exclusion of discontinuity points, X + 1/n would not converge in distribution to the distribution of X, which would be counterintuitive.
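A small numerical sketch of this point (the concrete example is mine, not the book's): take X degenerate at 0, so F has a jump at x_0 = 0. Then F_n(0) = F(−1/n) = 0 for every n, while F(0) = 1, so F_n fails to converge to F exactly at the discontinuity point.

```python
# Sketch (my own example): X degenerate at 0, X_n = X + 1/n.
# F has a discontinuity at x0 = 0; there lim F_n(x0) = 0 < F(x0) = 1.
def F(x):
    """Distribution function of X with P(X = 0) = 1."""
    return 1.0 if x >= 0 else 0.0

def F_n(x, n):
    """Distribution function of X_n = X + 1/n, i.e., F(x - 1/n)."""
    return F(x - 1.0 / n)

print(F(0.0))            # 1.0 at the jump point
print(F_n(0.0, 10**6))   # 0.0 even for huge n: no convergence at x0 = 0
print(F_n(0.5, 10**6))   # 1.0: convergence holds at continuity points such as x = 0.5
```

At every continuity point of F the pointwise limit holds, which is exactly why the definition excludes the jump points.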
If each of the components of a sequence of random vectors converges in distribution, then the random vectors themselves may not converge in distribution. As a counterexample, let X_n = (X_{1n}, X_{2n})^T, where X_{1n} ∼ N(0, 1) and X_{2n} = (−1)^n X_{1n}.
Then X_{1n} →_d N(0, 1) and X_{2n} →_d N(0, 1) (by the symmetry of the standard normal distribution), but X_n does not converge in distribution, because the joint distribution of X_n alternates between that of (X_{1n}, X_{1n})^T and (X_{1n}, −X_{1n})^T.
Moreover, in general X_n →_d X does not imply that X_n →_p X. For example, if we replace X by an independent random drawing Z from the distribution of X, then X_n →_d X and X_n →_d Z are equivalent statements because they only say that the distribution function of X_n converges pointwise to the distribution function of X (or Z) at the continuity points of the latter distribution function. If X_n →_d X implied X_n →_p X, then X_n →_p Z would imply that X = Z, which is not possible because X and Z are independent. The only exception is the case in which the distribution of X is degenerate: P(X = c) = 1 for some constant c:
Theorem 6.16: If X_n converges in distribution to X, and P(X = c) = 1, where c is a constant, then X_n converges in probability to c.
Proof: Exercise.
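A Monte Carlo illustration of Theorem 6.16 (a sketch; the particular sequence is my own choice): X_n = c + Z/√n with Z ∼ N(0, 1) converges in distribution to the constant c, and the simulated P(|X_n − c| > ε) indeed vanishes as n grows.

```python
import numpy as np

rng = np.random.default_rng(0)
c, eps, reps = 2.0, 0.1, 100_000
probs = {}
for n in (10, 100, 1000):
    x_n = c + rng.standard_normal(reps) / np.sqrt(n)  # X_n ->_d c (degenerate limit)
    probs[n] = np.mean(np.abs(x_n - c) > eps)         # estimate P(|X_n - c| > eps)
    print(n, probs[n])
```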
Note that this result is demonstrated in the left-hand panels of Figures 6.1-6.3. On the other hand,
Theorem 6.17: X_n →_p X implies X_n →_d X.
Proof: Theorem 6.17 follows straightforwardly from Theorems 6.3, 6.4, and 6.18 below. Q.E.D.
There is a one-to-one correspondence between convergence in distribution and convergence of expectations of bounded continuous functions of random variables:
Theorem 6.18: Let X_n and X be random vectors in ℝ^k. Then X_n →_d X if and only if, for all bounded continuous functions φ on ℝ^k, lim_{n→∞} E[φ(X_n)] = E[φ(X)].
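Before the proof, a quick numerical check of the "only if" direction (the toy setup is mine): with X_n = X + 1/n and the bounded continuous function φ(x) = arctan(x), the Monte Carlo estimate of E[φ(X_n)] approaches that of E[φ(X)].

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(1_000_000)       # draws from the distribution of X
phi = np.arctan                          # a bounded continuous function
# X_n = X + 1/n ->_d X, so E[phi(X_n)] should approach E[phi(X)]
for n in (1, 10, 1000):
    print(n, phi(x + 1 / n).mean())      # E[phi(X_n)]
print(phi(x).mean())                     # E[phi(X)] (close to 0 by symmetry)
```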
Proof: I will only prove this theorem for the case in which X_n and X are random variables. Throughout the proof the distribution function of X_n is denoted by F_n(x) and the distribution function of X by F(x).
Proof of the "only if" case: Let X_n →_d X. Without loss of generality we may assume that φ(x) ∈ [0, 1] for all x. For any ε > 0 we can choose continuity points a and b of F(x) such that F(b) − F(a) > 1 − ε. Moreover, we can choose continuity points a = c_1 < c_2 < ⋯ < c_m = b of F(x) such that, for j = 1, …, m − 1,

sup_{x∈(c_j, c_{j+1}]} φ(x) − inf_{x∈(c_j, c_{j+1}]} φ(x) < ε.    (6.17)

Now define

f(x) = inf_{x*∈(c_j, c_{j+1}]} φ(x*)    for x ∈ (c_j, c_{j+1}].
Moreover,

E[φ(X)] = ∫ φ(x) dF(x) = F(a) + ∫_a^b ((b − x)/(b − a)) dF(x) ≤ F(b).    (6.25)
Combining (6.24) and (6.25) yields F(b) ≥ limsup_{n→∞} F_n(a); hence, because b (> a) was arbitrary, letting b ↓ a it follows that

F(a) ≥ limsup_{n→∞} F_n(a).    (6.26)
Similarly, for c < a we have F(c) ≤ liminf_{n→∞} F_n(a); hence, if we let c ↑ a, it follows that

F(a) ≤ liminf_{n→∞} F_n(a).    (6.27)
If we combine (6.26) and (6.27), the "if" part follows, that is, F(a) = lim_{n→∞} F_n(a). Q.E.D.
Note that the "only if" part of Theorem 6.18 implies another version of the bounded convergence theorem:
Theorem 6.19: (Bounded convergence theorem) If X_n is bounded, P(|X_n| ≤ M) = 1 for some M < ∞ and all n, then X_n →_d X implies lim_{n→∞} E(X_n) = E(X).
Proof: Easy exercise.
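A numerical sketch of Theorem 6.19 (the particular bounded sequence is my own choice): X_n = cos(Z + 1/n) with Z ∼ N(0, 1) satisfies |X_n| ≤ 1 and converges in distribution to X = cos(Z); since E[cos(Z + a)] = e^{−1/2} cos(a), we should see E(X_n) → e^{−1/2} = E(X).

```python
import numpy as np
from math import exp

rng = np.random.default_rng(2)
z = rng.standard_normal(1_000_000)
# X_n = cos(Z + 1/n) satisfies P(|X_n| <= 1) = 1 and X_n ->_d cos(Z),
# so the bounded convergence theorem gives E(X_n) -> E[cos(Z)] = exp(-1/2).
means = {n: np.cos(z + 1 / n).mean() for n in (1, 10, 1000)}
for n, m in means.items():
    print(n, m)
print(exp(-0.5))  # the limit E(X)
```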
On the basis of Theorem 6.18, it is not hard to verify that the following result holds.
Theorem 6.20: (Continuous mapping theorem) Let X_n and X be random vectors in ℝ^k such that X_n →_d X, and let Φ(x) be a continuous mapping from ℝ^k into ℝ^m. Then Φ(X_n) →_d Φ(X).
Proof: Exercise.
The following are examples of applications of Theorem 6.20:
(1) Let X_n →_d X, where X is N(0, 1) distributed. Then X_n² →_d χ²_1.
(2) Let X_n →_d X, where X is N_k(0, I) distributed. Then X_n^T X_n →_d χ²_k.
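A simulation sketch of example (1) (the setup is mine): by the central limit theorem the standardized mean of uniforms satisfies X_n →_d N(0, 1), so X_n² →_d χ²_1, and P(χ²_1 ≤ 1) = 2Φ(1) − 1 can serve as a check.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(3)
n, reps = 250, 40_000
u = rng.random((reps, n))
# CLT: X_n = sqrt(n)(mean - 1/2)/sigma ->_d N(0, 1), with sigma^2 = 1/12 for U(0, 1)
x_n = np.sqrt(n) * (u.mean(axis=1) - 0.5) / np.sqrt(1 / 12)
p_emp = np.mean(x_n**2 <= 1.0)        # empirical P(X_n^2 <= 1)
p_chi2 = erf(1 / sqrt(2))             # P(chi2_1 <= 1) = 2*Phi(1) - 1
print(p_emp, p_chi2)
```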
If X_n →_d X, Y_n →_d Y, and Φ(x, y) is a continuous function, then in general it does not follow that Φ(X_n, Y_n) →_d Φ(X, Y), except if either X or Y has a degenerate distribution:
Theorem 6.21: Let X and X_n be random vectors in ℝ^k such that X_n →_d X, and let Y_n be a random vector in ℝ^m such that plim_{n→∞} Y_n = c, where c ∈ ℝ^m is a nonrandom vector. Moreover, let Φ(x, y) be a continuous function on the set ℝ^k × {y ∈ ℝ^m : ||y − c|| < δ} for some δ > 0.⁶ Then Φ(X_n, Y_n) →_d Φ(X, c).
Proof: Again, we prove the theorem for the case k = m = 1 only. Let F_n(x) and F(x) be the distribution functions of X_n and X, respectively, and let Φ(x, y) be a bounded continuous function on ℝ × (c − δ, c + δ) for some δ > 0. Without loss of generality we may assume that |Φ(x, y)| ≤ 1. Next, let ε > 0 be arbitrary, and choose continuity points a < b of F(x) such that F(b) − F(a) > 1 − ε. Then for any γ > 0,
|E[Φ(X_n, Y_n)] − E[Φ(X_n, c)]|
  ≤ E[|Φ(X_n, Y_n) − Φ(X_n, c)| · I(|Y_n − c| ≤ γ)]
    + E[|Φ(X_n, Y_n) − Φ(X_n, c)| · I(|Y_n − c| > γ)]
  ≤ E[|Φ(X_n, Y_n) − Φ(X_n, c)| · I(|Y_n − c| ≤ γ) · I(X_n ∈ [a, b])]
    + 2P(X_n ∉ [a, b]) + 2P(|Y_n − c| > γ)
  ≤ sup_{x∈[a,b], |y−c|≤γ} |Φ(x, y) − Φ(x, c)| + 2(1 − F_n(b) + F_n(a))
    + 2P(|Y_n − c| > γ).    (6.28)
Because a continuous function on a closed and bounded subset of a Euclidean space is uniformly continuous on that subset (see Appendix II), we can choose γ so small that

sup_{x∈[a,b], |y−c|≤γ} |Φ(x, y) − Φ(x, c)| < ε.    (6.29)
Moreover, 1 − F_n(b) + F_n(a) → 1 − F(b) + F(a) < ε, and P(|Y_n − c| > γ) → 0. Therefore, it follows from (6.28) that

limsup_{n→∞} |E[Φ(X_n, Y_n)] − E[Φ(X_n, c)]| ≤ 3ε.    (6.30)
The rest of the proof is left as an exercise. Q. E.D.
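A quick simulation of Theorem 6.21 (the setup is my own): with X_n →_d N(0, 1), Y_n →_p 2, and Φ(x, y) = xy, the theorem gives X_n·Y_n →_d 2X ∼ N(0, 4).

```python
import numpy as np

rng = np.random.default_rng(4)
reps, n = 200_000, 400
x_n = rng.standard_normal(reps)                     # stand-in for X_n ->_d N(0, 1)
y_n = 2.0 + rng.standard_normal(reps) / np.sqrt(n)  # Y_n ->_p c = 2
prod = x_n * y_n                                    # Phi(x, y) = x * y
print(prod.mean(), prod.std())  # limit Phi(X, 2) = 2X ~ N(0, 4): mean near 0, std near 2
```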
Corollary 6.1: Let Z_n be t-distributed with n degrees of freedom. Then Z_n →_d N(0, 1).
Proof: By the definition of the t-distribution with n degrees of freedom we can write
Z_n = U_0 / √((1/n) Σ_{j=1}^n U_j²),    (6.31)

where U_0, U_1, …, U_n are i.i.d. N(0, 1). Let X_n = U_0 and X = U_0, so that trivially X_n →_d X. Let Y_n = (1/n) Σ_{j=1}^n U_j². Then by the weak law of large numbers (Theorem 6.2) we have plim_{n→∞} Y_n = E(U_j²) = 1. Let Φ(x, y) = x/√y. Note that Φ(x, y) is continuous on ℝ × (1 − ε, 1 + ε) for 0 < ε < 1. Thus, by Theorem 6.21, Z_n = Φ(X_n, Y_n) →_d Φ(X, 1) = U_0 ∼ N(0, 1). Q.E.D.

⁶ Thus, Φ(x, y) is continuous in y on a small neighborhood of c.
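A numerical sketch of Corollary 6.1 (the check is mine; it uses the fact that (1/n) Σ U_j² is distributed as χ²_n/n): for large n, P(Z_n ≤ 1) should be close to Φ(1).

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(5)
n, reps = 1000, 200_000
u0 = rng.standard_normal(reps)
y_n = rng.chisquare(n, reps) / n        # (1/n) sum U_j^2 ~ chi2_n / n ->_p 1
z_n = u0 / np.sqrt(y_n)                 # Z_n as in (6.31): t with n degrees of freedom
p_emp = np.mean(z_n <= 1.0)
p_norm = 0.5 * (1 + erf(1 / sqrt(2)))   # Phi(1), the standard normal CDF at 1
print(p_emp, p_norm)
```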
Corollary 6.2: Let U_1, …, U_n be a random sample from N_k(μ, Σ), where Σ is nonsingular. Denote Ū = (1/n) Σ_{j=1}^n U_j and Σ̂ = (1/(n − 1)) Σ_{j=1}^n (U_j − Ū)(U_j − Ū)^T, and let Z_n = n(Ū − μ)^T Σ̂⁻¹(Ū − μ). Then Z_n →_d χ²_k.
Proof: For a k × k matrix A = (a_1, …, a_k), let vec(A) be the k² × 1 vector of stacked columns a_j, j = 1, …, k, of A: vec(A) = (a_1^T, …, a_k^T)^T = b, for instance, with inverse vec⁻¹(b) = A. Let c = vec(Σ), Y_n = vec(Σ̂), X_n = √n(Ū − μ), X ∼ N_k(0, Σ), and Φ(x, y) = x^T(vec⁻¹(y))⁻¹x. Because Σ is nonsingular, there exists a neighborhood C(δ) = {y ∈ ℝ^{k²} : ||y − c|| < δ} of c such that for all y in C(δ), vec⁻¹(y) is nonsingular (Exercise: Why?), and consequently Φ(x, y) is continuous on ℝ^k × C(δ) (Exercise: Why?). The corollary now follows from Theorem 6.21 (Exercise: Why?). Q.E.D.
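A simulation sketch of Corollary 6.2 (k, n, and the particular covariance matrix are my own choices): the Monte Carlo mean of Z_n should be close to E[χ²_k] = k.

```python
import numpy as np

rng = np.random.default_rng(6)
k, n, reps = 3, 400, 2000
mu = np.zeros(k)
a = rng.standard_normal((k, k))
sigma = a @ a.T + k * np.eye(k)   # an arbitrary nonsingular covariance matrix
z = np.empty(reps)
for r in range(reps):
    u = rng.multivariate_normal(mu, sigma, size=n)  # random sample from N_k(mu, Sigma)
    ubar = u.mean(axis=0)
    s_hat = np.cov(u, rowvar=False)                 # Sigma-hat with 1/(n-1) scaling
    z[r] = n * (ubar - mu) @ np.linalg.solve(s_hat, ubar - mu)
print(z.mean())  # close to k = E[chi2_k] for large n
```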