Introduction to the Mathematical and Statistical Foundations of Econometrics
Dependent Laws of Large Numbers and Central Limit Theorems
In Chapter 6 I focused on the convergence of sums of i.i.d. random variables - in particular the law of large numbers and the central limit theorem. However, macroeconomic and financial data are time series data for which the independence assumption does not apply. Therefore, in this chapter I will generalize the weak law of large numbers and the central limit theorem to certain classes of time series.
7.1. Stationarity and the Wold Decomposition
Chapter 3 introduced the concept of strict stationarity, which for convenience will be restated here:
Definition 7.1: A time series process X_t is said to be strictly stationary if, for arbitrary integers m_1 < m_2 < \cdots < m_n, the joint distribution of X_{t-m_1}, \ldots, X_{t-m_n} does not depend on the time index t.
A weaker version of stationarity is covariance stationarity, which requires only that the first and second moments of any set X_{t-m_1}, \ldots, X_{t-m_n} of time series variables do not depend on the time index t.
Definition 7.2: A time series process X_t \in R^k is covariance stationary (or weakly stationary) if E[\|X_t\|^2] < \infty and, for all integers t and m, E[X_t] = \mu and E[(X_t - \mu)(X_{t-m} - \mu)^T] = \Gamma(m) do not depend on the time index t.
Clearly, a strictly stationary time series process X_t is covariance stationary if E[\|X_t\|^2] < \infty.
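To make Definition 7.2 concrete, the following sketch (my own illustration, not from the text) simulates a Gaussian AR(1) process, which is covariance stationary when the autoregressive coefficient is less than one in absolute value, and compares sample moments with the theoretical values \mu = 0 and \Gamma(m) = \phi^m \sigma_u^2 / (1 - \phi^2):

```python
import numpy as np

# Illustration (not from the text): the AR(1) process X_t = phi*X_{t-1} + U_t
# with |phi| < 1 and i.i.d. N(0, sigma_u^2) innovations is covariance
# stationary with mu = 0 and Gamma(m) = phi^m * sigma_u^2 / (1 - phi^2).
rng = np.random.default_rng(0)
phi, sigma_u = 0.5, 1.0
n = 200_000
u = rng.normal(0.0, sigma_u, n)
x = np.empty(n)
x[0] = u[0]
for t in range(1, n):
    x[t] = phi * x[t - 1] + u[t]

def autocov(x, m):
    # sample analogue of Gamma(m) = E[(X_t - mu)(X_{t-m} - mu)]
    xc = x - x.mean()
    return np.mean(xc[m:] * xc[:len(x) - m])

gamma0_theory = sigma_u**2 / (1 - phi**2)
for m in range(3):
    print(m, autocov(x, m), phi**m * gamma0_theory)
```

With a long sample the estimates computed over any subwindow are close to the same theoretical values, which is exactly the time-invariance the definition requires.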
For zero-mean covariance stationary processes the famous Wold (1938) decomposition theorem holds. This theorem is the basis for linear time series analysis and forecasting - in particular the Box-Jenkins (1976) methodology - and vector autoregression innovation response analysis. See Sims (1980, 1982, 1986) and Bernanke (1986) for the latter.
Theorem 7.1: (Wold decomposition) Let X_t \in R be a zero-mean covariance stationary process. Then we can write X_t = \sum_{j=0}^\infty \alpha_j U_{t-j} + W_t, where \alpha_0 = 1, \sum_{j=0}^\infty \alpha_j^2 < \infty, the U_t's are zero-mean covariance stationary and uncorrelated random variables, and W_t is a deterministic process; that is, there exist coefficients \beta_j such that P[W_t = \sum_{j=1}^\infty \beta_j W_{t-j}] = 1. Moreover, U_t = X_t - \sum_{j=1}^\infty \beta_j X_{t-j} and E[U_{t+m} W_t] = 0 for all integers m and t.
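As a concrete instance (an illustration I add here, not part of the theorem's statement), a covariance stationary AR(1) process already displays its Wold form: \beta_1 is the only nonzero projection coefficient, and there is no deterministic term.

```latex
% Illustrative example: Wold decomposition of a stationary AR(1) process.
X_t = \beta X_{t-1} + U_t, \quad |\beta| < 1
\;\Longrightarrow\;
X_t = \sum_{j=0}^{\infty} \beta^{\,j} U_{t-j},
\qquad \alpha_j = \beta^{\,j},
\qquad W_t = 0,
\qquad \sum_{j=0}^{\infty} \alpha_j^2 = \frac{1}{1-\beta^{2}} < \infty .
```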
Intuitive proof. The exact proof employs Hilbert space theory and will therefore be given in the appendix to this chapter. However, the intuition behind the Wold decomposition is not too difficult.
It is possible to find a sequence \beta_j, j = 1, 2, 3, \ldots, of real numbers such that E[(X_t - \sum_{j=1}^\infty \beta_j X_{t-j})^2] is minimal. The random variable

    \hat{X}_t = \sum_{j=1}^\infty \beta_j X_{t-j}    (7.1)
is then called the linear projection of X_t on X_{t-j}, j \ge 1. If we let
    U_t = X_t - \sum_{j=1}^\infty \beta_j X_{t-j},    (7.2)
it follows from the first-order conditions \partial E[(X_t - \sum_{j=1}^\infty \beta_j X_{t-j})^2] / \partial \beta_m = 0 that

    E[U_t X_{t-m}] = 0 for m = 1, 2, 3, \ldots.    (7.3)
Note that (7.2) and (7.3) imply
    E[U_t] = 0, and E[U_t U_{t-m}] = 0 for m = 1, 2, 3, \ldots.    (7.4)
Moreover, note that by (7.2) and (7.3),

    E[X_t^2] = E[(U_t + \sum_{j=1}^\infty \beta_j X_{t-j})^2] = E[U_t^2] + E[(\sum_{j=1}^\infty \beta_j X_{t-j})^2],

and thus, by the covariance stationarity of X_t,

    E[U_t^2] = \sigma_u^2 \le E[X_t^2]    (7.5)

and

    E[(\sum_{j=1}^\infty \beta_j X_{t-j})^2] = E[X_t^2] - \sigma_u^2    (7.6)

for all t. Hence it follows from (7.4) and (7.5) that U_t is a zero-mean covariance stationary time series process itself.
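The orthogonality conditions (7.3) can be checked numerically. The sketch below (my own simulation setup, chosen only for illustration) replaces the infinite projection with a least-squares projection on finitely many lags; by the normal equations of least squares, the resulting residuals are sample-orthogonal to every included regressor:

```python
import numpy as np

# Numerical check of the orthogonality conditions (7.3): with a finite
# number of lags p, the least-squares projection of X_t on
# X_{t-1}, ..., X_{t-p} yields residuals U_t whose sample covariance with
# each regressor is (numerically) zero.
rng = np.random.default_rng(1)
n, p = 100_000, 3
u = rng.normal(size=n)
x = np.empty(n)
x[0] = u[0]
for t in range(1, n):
    x[t] = 0.6 * x[t - 1] + u[t]

# lag matrix: row for time t holds (x[t-1], ..., x[t-p])
Y = x[p:]
Z = np.column_stack([x[p - j:n - j] for j in range(1, p + 1)])
beta = np.linalg.lstsq(Z, Y, rcond=None)[0]   # projection coefficients
resid = Y - Z @ beta                          # sample analogue of U_t

for m in range(1, p + 1):
    print(m, np.mean(resid * Z[:, m - 1]))    # ~ 0 by the first-order conditions
```

The printed sample covariances are zero up to floating-point error, the finite-lag counterpart of E[U_t X_{t-m}] = 0.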
Next, substitute X_{t-1} = U_{t-1} + \sum_{j=1}^\infty \beta_j X_{t-1-j} in (7.1). Then (7.1) becomes

    \hat{X}_t = \beta_1 (U_{t-1} + \sum_{j=1}^\infty \beta_j X_{t-1-j}) + \sum_{j=2}^\infty \beta_j X_{t-j}
              = \beta_1 U_{t-1} + \sum_{j=2}^\infty (\beta_j + \beta_1 \beta_{j-1}) X_{t-j}
              = \beta_1 U_{t-1} + (\beta_2 + \beta_1^2) X_{t-2} + \sum_{j=3}^\infty (\beta_j + \beta_1 \beta_{j-1}) X_{t-j}.    (7.7)
Now replace X_{t-2} in (7.7) by U_{t-2} + \sum_{j=1}^\infty \beta_j X_{t-2-j}. Then (7.7) becomes

    \hat{X}_t = \beta_1 U_{t-1} + (\beta_2 + \beta_1^2)(U_{t-2} + \sum_{j=1}^\infty \beta_j X_{t-2-j}) + \sum_{j=3}^\infty (\beta_j + \beta_1 \beta_{j-1}) X_{t-j}
              = \beta_1 U_{t-1} + (\beta_2 + \beta_1^2) U_{t-2} + \sum_{j=3}^\infty [(\beta_2 + \beta_1^2) \beta_{j-2} + (\beta_j + \beta_1 \beta_{j-1})] X_{t-j}
              = \beta_1 U_{t-1} + (\beta_2 + \beta_1^2) U_{t-2} + [(\beta_2 + \beta_1^2) \beta_1 + (\beta_3 + \beta_1 \beta_2)] X_{t-3}
                + \sum_{j=4}^\infty [(\beta_2 + \beta_1^2) \beta_{j-2} + (\beta_j + \beta_1 \beta_{j-1})] X_{t-j}.
Repeating this substitution m times yields an expression of the type

    \hat{X}_t = \sum_{j=1}^m \alpha_j U_{t-j} + \sum_{j=m+1}^\infty \theta_{m,j} X_{t-j},    (7.8)

where \alpha_1 = \beta_1 and \alpha_2 = \beta_2 + \beta_1^2, for instance. It follows now from (7.3), (7.4), (7.5), and (7.8) that

    E[X_t^2] = \sigma_u^2 \sum_{j=0}^m \alpha_j^2 + E[(\sum_{j=m+1}^\infty \theta_{m,j} X_{t-j})^2],

where \alpha_0 = 1.
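The bookkeeping of the repeated substitution can be automated. Writing out the substitutions, one can verify that the coefficients \alpha_m in (7.8) satisfy the convolution recursion \alpha_0 = 1, \alpha_m = \sum_{k=1}^{m} \beta_k \alpha_{m-k}, which reproduces \alpha_1 = \beta_1 and \alpha_2 = \beta_2 + \beta_1^2. A short sketch, with hypothetical \beta values chosen only for illustration:

```python
# Sketch of the repeated substitution: the coefficients alpha_j in (7.8)
# follow from the projection coefficients beta_j via the recursion
#   alpha_0 = 1,  alpha_m = sum_{k=1}^{m} beta_k * alpha_{m-k},
# consistent with alpha_1 = beta_1 and alpha_2 = beta_2 + beta_1**2.
def wold_alphas(beta, m_max):
    # beta[k-1] holds beta_k; beta_j = 0 for j > len(beta)
    alpha = [1.0]
    for m in range(1, m_max + 1):
        alpha.append(sum(beta[k - 1] * alpha[m - k]
                         for k in range(1, m + 1) if k <= len(beta)))
    return alpha

beta = [0.5, 0.25]           # hypothetical beta_1, beta_2
print(wold_alphas(beta, 3))  # alpha_2 = 0.25 + 0.5**2 = 0.5
```

For an AR(1) with a single coefficient beta_1, the recursion returns alpha_j = beta_1**j, matching the AR(1) example of the Wold decomposition.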
Hence, letting m \to \infty, we have

    E[X_t^2] = \sigma_u^2 \sum_{j=0}^\infty \alpha_j^2 + \lim_{m \to \infty} E[(\sum_{j=m+1}^\infty \theta_{m,j} X_{t-j})^2].
Therefore, we can write X_t as

    X_t = \sum_{j=0}^\infty \alpha_j U_{t-j} + W_t,    (7.9)

where \alpha_0 = 1 and \sum_{j=0}^\infty \alpha_j^2 < \infty, with W_t = plim_{m \to \infty} \sum_{j=m+1}^\infty \theta_{m,j} X_{t-j} a remainder term that satisfies

    E[U_{t+m} W_t] = 0 for all integers m and t.
Finally, observe from (7.2) and (7.9) that

    W_t = \sum_{j=1}^\infty \beta_j X_{t-j} - \sum_{j=1}^\infty \alpha_j U_{t-j} = \sum_{j=1}^\infty \beta_j W_{t-j} + \sum_{j=1}^\infty \delta_j U_{t-j},    (7.10)

where \delta_1 = \beta_1 - \alpha_1, for instance. It follows now straightforwardly from (7.4), (7.5), and (7.10) that \delta_j = 0 for all j \ge 1; hence, W_t = \sum_{j=1}^\infty \beta_j W_{t-j} with probability 1. Q.E.D.
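The steps of the intuitive proof can be run end to end numerically (a sketch under my own simulation setup, not from the text): for a simulated AR(1) with coefficient phi, least squares on a few lags should recover projection coefficients \beta_1 \approx phi and \beta_j \approx 0 for j > 1, and the implied Wold coefficients should approximate \alpha_j = phi^j, with no deterministic term left over.

```python
import numpy as np

# End-to-end sketch of the proof's construction on a simulated AR(1):
# estimate the projection coefficients beta_j by least squares, then
# convert them into Wold coefficients alpha_j by repeated substitution.
rng = np.random.default_rng(2)
phi, n, p = 0.7, 200_000, 4
u = rng.normal(size=n)
x = np.empty(n)
x[0] = u[0]
for t in range(1, n):
    x[t] = phi * x[t - 1] + u[t]

Y = x[p:]
Z = np.column_stack([x[p - j:n - j] for j in range(1, p + 1)])
beta = np.linalg.lstsq(Z, Y, rcond=None)[0]   # ~ (phi, 0, 0, 0)

alpha = [1.0]
for m in range(1, p + 1):   # alpha_m = sum_k beta_k * alpha_{m-k}
    alpha.append(sum(beta[k - 1] * alpha[m - k] for k in range(1, m + 1)))

print(np.round(beta, 3))    # approximately [phi, 0, 0, 0]
print(np.round(alpha, 3))   # approximately [1, phi, phi**2, ...]
```

That the estimated alpha_j decay geometrically is the numerical counterpart of W_t = 0 for this process: the whole of X_t is captured by the moving-average part of the decomposition.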
Theorem 7.1 carries over to vector-valued covariance stationary processes:
Theorem 7.2: (Multivariate Wold decomposition) Let X_t \in R^k be a zero-mean covariance stationary process. Then we can write X_t = \sum_{j=0}^\infty A_j U_{t-j} + W_t, where A_0 = I_k, \sum_{j=0}^\infty A_j A_j^T is finite, the U_t's are zero-mean covariance stationary and uncorrelated random vectors (i.e., E[U_t U_{t-m}^T] = O for m \ge 1), and W_t is a deterministic process (i.e., there exist matrices B_j such that P[W_t = \sum_{j=1}^\infty B_j W_{t-j}] = 1). Moreover, U_t = X_t - \sum_{j=1}^\infty B_j X_{t-j}, and

    E[U_{t+m} W_t^T] = O for all integers m and t.
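A minimal numerical companion to Theorem 7.2 (my illustration, with a hypothetical coefficient matrix): for a stationary VAR(1) process X_t = \Phi X_{t-1} + U_t, the Wold matrices are A_j = \Phi^j with A_0 = I_k, and the partial sums of A_j A_j^T settle down to a finite limit:

```python
import numpy as np

# Illustration for the multivariate case: a VAR(1) with spectral radius
# of Phi below 1 has Wold matrices A_j = Phi^j (A_0 = I_k), and the sum
# of A_j A_j^T converges.
Phi = np.array([[0.5, 0.2],
                [0.1, 0.4]])          # hypothetical coefficient matrix
assert max(abs(np.linalg.eigvals(Phi))) < 1   # stationarity check

A = [np.linalg.matrix_power(Phi, j) for j in range(60)]
S = sum(a @ a.T for a in A)           # partial sum of A_j A_j^T, near its limit
print(np.round(S, 6))
```

Because the eigenvalues of Phi lie inside the unit circle, the matrices A_j decay geometrically, which is what makes \sum_j A_j A_j^T finite as the theorem requires.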
Although the process W_t is deterministic in the sense that it is perfectly predictable from its past values, it still may be random. If so, let \mathcal{F}^W_t = \sigma(W_t, W_{t-1}, W_{t-2}, \ldots) be the \sigma-algebra generated by W_{t-m} for m \ge 0. Then all W_t's are measurable \mathcal{F}^W_{t-m} for arbitrary natural numbers m; hence, all W_t's are measurable \mathcal{F}^W_{-\infty} = \cap_{t=0}^\infty \mathcal{F}^W_{-t}. However, it follows from (7.2) and (7.9) that each W_t can be constructed from X_{t-j} for j \ge 0; hence, \mathcal{F}^X_t = \sigma(X_t, X_{t-1}, X_{t-2}, \ldots) \supset \mathcal{F}^W_t, and consequently, all W_t's are measurable \mathcal{F}^X_{-\infty} = \cap_{t=0}^\infty \mathcal{F}^X_{-t}. This implies that W_t = E[W_t | \mathcal{F}^X_{-\infty}]. See Chapter 3.

The \sigma-algebra \mathcal{F}^X_{-\infty} represents the information contained in the remote past of X_t. Therefore, \mathcal{F}^X_{-\infty} is called the remote \sigma-algebra, and the events therein are called the remote events. If \mathcal{F}^X_{-\infty} is the trivial \sigma-algebra \{\Omega, \emptyset\}, and thus the remote past of X_t is uninformative, then E[W_t | \mathcal{F}^X_{-\infty}] = E[W_t]; hence, W_t = 0. However, the same result holds if all the remote events have either probability 0 or 1, as is easy to verify from the definition of conditional expectations with respect to a \sigma-algebra. This condition follows automatically from Kolmogorov's zero-one law if the X_t's are independent (see Theorem 7.5 below), but for dependent processes this is not guaranteed. Nevertheless, for economic time series this is not too farfetched an assumption, for in reality they always start from scratch somewhere in the far past (e.g., 500 years ago for U.S. time series).
Definition 7.3: A time series process X_t has a vanishing memory if the events in the remote \sigma-algebra \mathcal{F}^X_{-\infty} = \cap_{t=0}^\infty \sigma(X_{-t}, X_{-t-1}, X_{-t-2}, \ldots) have either probability 0 or 1.
Thus, under the conditions of Theorems 7.1 and 7.2 and the additional assumption that the covariance stationary time series process involved has a vanishing memory, the deterministic term W_t in the Wold decomposition is zero or a zero vector, respectively.