Introduction to the Mathematical and Statistical Foundations of Econometrics
Borel Measurability
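Let $g$ be a real function and let $X$ be a random variable defined on the probability space $\{\Omega, \mathscr{F}, P\}$. For $g(X)$ to be a random variable, we must have that
$$\text{for all Borel sets } B, \quad \{\omega \in \Omega : g(X(\omega)) \in B\} \in \mathscr{F}. \tag{2.4}$$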
It is possible to construct a real function g and a random variable X for which this is not the case. But if
$$\text{for all Borel sets } B, \quad A_B = \{x \in \mathbb{R} : g(x) \in B\} \text{ is a Borel set itself,} \tag{2.5}$$
then (2.4) is clearly satisfied because then, for any Borel set $B$ and $A_B$ defined in (2.5),
$$\{\omega \in \Omega : g(X(\omega)) \in B\} = \{\omega \in \Omega : X(\omega) \in A_B\} \in \mathscr{F}.$$
Moreover, if (2.5) is not satisfied in the sense that there exists a Borel set $B$ for which $A_B$ is not a Borel set itself, then it is possible to construct a random variable $X$ such that the set
$$\{\omega \in \Omega : g(X(\omega)) \in B\} = \{\omega \in \Omega : X(\omega) \in A_B\} \notin \mathscr{F};$$
hence, for such a random variable $X$, $g(X)$ is not a random variable itself.[1] Thus, $g(X)$ is guaranteed to be a random variable if and only if (2.5) is satisfied. Such real functions $g(x)$ are described as being Borel measurable:
Definition 2.1: A real function $g$ is Borel measurable if and only if for all Borel sets $B$ in $\mathbb{R}$ the sets $A_B = \{x \in \mathbb{R} : g(x) \in B\}$ are Borel sets in $\mathbb{R}$. Similarly, a real function $g$ on $\mathbb{R}^k$ is Borel measurable if and only if for all Borel sets $B$ in $\mathbb{R}$ the sets $A_B = \{x \in \mathbb{R}^k : g(x) \in B\}$ are Borel sets in $\mathbb{R}^k$.
However, we do not need to verify condition (2.5) for all Borel sets. It suffices to verify it for Borel sets of the type $(-\infty, y]$, $y \in \mathbb{R}$, only:
Theorem 2.1: A real function $g$ on $\mathbb{R}^k$ is Borel measurable if and only if for all $y \in \mathbb{R}$ the sets $A_y = \{x \in \mathbb{R}^k : g(x) \le y\}$ are Borel sets in $\mathbb{R}^k$.
Proof: Let $\mathscr{D}$ be the collection of all Borel sets $B$ in $\mathbb{R}$ for which the sets $\{x \in \mathbb{R}^k : g(x) \in B\}$ are Borel sets in $\mathbb{R}^k$, including the Borel sets of the type $(-\infty, y]$, $y \in \mathbb{R}$. Then $\mathscr{D}$ contains the collection of all intervals of the type $(-\infty, y]$, $y \in \mathbb{R}$. The smallest $\sigma$-algebra containing the collection $\{(-\infty, y], y \in \mathbb{R}\}$ is just the Euclidean Borel field $\mathscr{B} = \sigma(\{(-\infty, y], y \in \mathbb{R}\})$; hence, if $\mathscr{D}$ is a $\sigma$-algebra, then $\mathscr{B} \subset \mathscr{D}$. But $\mathscr{D}$ is a collection of Borel sets; hence, $\mathscr{D} \subset \mathscr{B}$. Thus, if $\mathscr{D}$ is a $\sigma$-algebra, then $\mathscr{B} = \mathscr{D}$. The proof that $\mathscr{D}$ is a $\sigma$-algebra is left as an exercise. Q.E.D.
The simplest Borel measurable function is the simple function:
[1] The actual construction of such a counterexample is difficult but not impossible.
Definition 2.2: A real function $g$ on $\mathbb{R}^k$ is called a simple function if it takes the form $g(x) = \sum_{j=1}^m a_j I(x \in B_j)$, with $m < \infty$, $a_j \in \mathbb{R}$, where the $B_j$'s are disjoint Borel sets in $\mathbb{R}^k$.
Without loss of generality we may assume that the disjoint Borel sets $B_j$ form a partition of $\mathbb{R}^k$: $\cup_{j=1}^m B_j = \mathbb{R}^k$ because, if not, then let $g(x) = \sum_{j=1}^{m+1} a_j I(x \in B_j)$, with $B_{m+1} = \mathbb{R}^k \setminus (\cup_{j=1}^m B_j)$ and $a_{m+1} = 0$. Moreover, without loss of generality we may assume that the $a_j$'s are all different. For example, if $g(x) = \sum_{j=1}^{m+1} a_j I(x \in B_j)$ and $a_m = a_{m+1}$, then $g(x) = \sum_{j=1}^{m} a_j I(x \in B_j^*)$, where $B_j^* = B_j$ for $j = 1, \ldots, m-1$ and $B_m^* = B_m \cup B_{m+1}$.
Theorem 2.1 can be used to prove that
Theorem 2.2: Simple functions are Borel measurable.
Proof: Let $g(x) = \sum_{j=1}^m a_j I(x \in B_j)$ be a simple function on $\mathbb{R}^k$. For arbitrary $y \in \mathbb{R}$,
$$\{x \in \mathbb{R}^k : g(x) \le y\} = \Big\{x \in \mathbb{R}^k : \sum_{j=1}^m a_j I(x \in B_j) \le y\Big\} = \bigcup_{a_j \le y} B_j,$$
which is a finite union of Borel sets and therefore a Borel set itself. Because $y$ was arbitrary, it follows from Theorem 2.1 that $g$ is Borel measurable. Q.E.D.
Theorem 2.3: If $f(x)$ and $g(x)$ are simple functions, then so are $f(x) + g(x)$, $f(x) - g(x)$, and $f(x) \cdot g(x)$. If, in addition, $g(x) \ne 0$ for all $x$, then $f(x)/g(x)$ is a simple function.
Proof: Exercise.
Theorem 2.1 can also be used to prove
Theorem 2.4: Let $g_j(x)$, $j = 1, 2, 3, \ldots$ be a sequence of Borel-measurable functions. Then
(a) $f_{1,n}(x) = \min\{g_1(x), \ldots, g_n(x)\}$ and $f_{2,n}(x) = \max\{g_1(x), \ldots, g_n(x)\}$ are Borel measurable;
(b) $f_1(x) = \inf_{n \ge 1} g_n(x)$ and $f_2(x) = \sup_{n \ge 1} g_n(x)$ are Borel measurable;
(c) $h_1(x) = \liminf_{n \to \infty} g_n(x)$ and $h_2(x) = \limsup_{n \to \infty} g_n(x)$ are Borel measurable; and
(d) if $g(x) = \lim_{n \to \infty} g_n(x)$ exists, then $g$ is Borel measurable.
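Proof: First, note that the min, max, inf, sup, liminf, limsup, and lim operations are taken pointwise in $x$. I will only prove the min, inf, and liminf cases for Borel-measurable real functions on $\mathbb{R}$. Again, let $y \in \mathbb{R}$ be arbitrary. Then,
(a) $\{x \in \mathbb{R} : f_{1,n}(x) \le y\} = \cup_{j=1}^{n} \{x \in \mathbb{R} : g_j(x) \le y\} \in \mathscr{B}$.
(b) $\{x \in \mathbb{R} : f_1(x) < y\} = \cup_{j=1}^{\infty} \{x \in \mathbb{R} : g_j(x) < y\} \in \mathscr{B}$; hence, $\{x \in \mathbb{R} : f_1(x) \le y\} = \cap_{m=1}^{\infty} \{x \in \mathbb{R} : f_1(x) < y + 1/m\} \in \mathscr{B}$. (The union requires strict inequalities because the infimum can equal $y$ without any $g_j(x)$ satisfying $g_j(x) \le y$.)
(c) Because $h_1(x) = \sup_{n \ge 1} \inf_{j \ge n} g_j(x)$, we have $\{x \in \mathbb{R} : h_1(x) \le y\} = \cap_{n=1}^{\infty} \{x \in \mathbb{R} : \inf_{j \ge n} g_j(x) \le y\} \in \mathscr{B}$, where each set in the intersection is a Borel set by part (b) applied to the sequence $g_j$, $j \ge n$.
The max, sup, limsup, and lim cases are left as exercises. Q.E.D.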
Because continuous functions can be written as pointwise limits of step functions and step functions with a finite number of steps are simple functions, it follows from Theorems 2.1 and 2.4(d) that
Theorem 2.5: Continuous real functions are Borel measurable.
Proof: Let $g$ be a continuous function on $\mathbb{R}$. Define for natural numbers $n$, $g_n(x) = g(x)$ if $-n < x \le n$ but $g_n(x) = 0$ elsewhere. Next, define for $j = 0, \ldots, m-1$ and $m = 1, 2, \ldots$
$$B(j, m, n) = (-n + 2n \cdot j/m, \; -n + 2(j+1)n/m].$$
Then the $B(j, m, n)$'s are disjoint intervals such that $\cup_{j=0}^{m-1} B(j, m, n) = (-n, n]$; hence, the function
$$g_{n,m}(x) = \sum_{j=0}^{m-1} \Big( \inf_{x_* \in B(j,m,n)} g(x_*) \Big) I(x \in B(j, m, n))$$
is a step function with a finite number of steps and thus a simple function. Because, trivially, $g(x) = \lim_{n \to \infty} g_n(x)$ pointwise in $x$, $g(x)$ is Borel measurable if the functions $g_n(x)$ are Borel measurable (see Theorem 2.4(d)). Similarly, the functions $g_n(x)$ are Borel measurable if, for arbitrary fixed $n$, $g_n(x) = \lim_{m \to \infty} g_{n,m}(x)$ pointwise in $x$ because the $g_{n,m}(x)$'s are simple functions and thus Borel measurable. To prove $g_n(x) = \lim_{m \to \infty} g_{n,m}(x)$, choose an arbitrary fixed $x$ and choose $n > |x|$. Then there exists a sequence of indices $j_{n,m}$ such that $x \in B(j_{n,m}, m, n)$ for all $m$; hence,
$$0 \le g_n(x) - g_{n,m}(x) \le g(x) - \inf_{x_* \in B(j_{n,m}, m, n)} g(x_*) \le \sup_{|x - x_*| \le 2n/m} |g(x) - g(x_*)| \to 0$$
as $m \to \infty$. The latter result follows from the continuity of $g(x)$. Q.E.D.
Next, I will show in two steps that real functions are Borel measurable if and only if they are limits of simple functions:
Theorem 2.6: A nonnegative real function $g(x)$ is Borel measurable if and only if there exists a nondecreasing sequence $g_n(x)$ of nonnegative simple functions such that pointwise in $x$, $0 \le g_n(x) \le g(x)$, and $\lim_{n \to \infty} g_n(x) = g(x)$.
Proof: The “if” case follows straightforwardly from Theorems 2.2 and 2.4. For proving the “only if” case let, for $1 \le m \le n2^n$, $g_n(x) = (m-1)/2^n$ if $(m-1)/2^n \le g(x) < m/2^n$ and $g_n(x) = n$ otherwise. Then $g_n(x)$ is a sequence of simple functions satisfying $0 \le g_n(x) \le g(x)$ and $\lim_{n \to \infty} g_n(x) = g(x)$ pointwise in $x$. Q.E.D.
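The dyadic construction used in the "only if" part can be sketched numerically. The following Python snippet is only an illustration: the helper name `dyadic_approx` is hypothetical, and it evaluates the approximating value at a single point, given the value of the function there.

```python
import math

def dyadic_approx(gx: float, n: int) -> float:
    """Value of the n-th approximating simple function at a point where g equals gx >= 0."""
    if gx < 0:
        raise ValueError("the construction assumes a nonnegative function")
    if gx >= n:
        # the 'otherwise' branch of the proof: g(x) is at least n
        return float(n)
    # gx lies in [(m-1)/2^n, m/2^n) for exactly one m in 1..n*2^n;
    # floor(gx * 2^n) equals m - 1
    return math.floor(gx * 2**n) / 2**n

# Approximations of the value pi for n = 1, ..., 7
values = [dyadic_approx(math.pi, n) for n in range(1, 8)]
print(values)
```

The printed values never decrease and never exceed the target value, and once n exceeds the target the error is below 2^(-n), in line with the statement of the theorem.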
Every real function g(x) can be written as a difference of two nonnegative functions:
$$g(x) = g^+(x) - g^-(x), \quad \text{where } g^+(x) = \max\{g(x), 0\}, \quad g^-(x) = \max\{-g(x), 0\}. \tag{2.6}$$
Moreover, if $g$ is Borel measurable, then so are $g^+$ and $g^-$ in (2.6). Therefore, it follows straightforwardly from (2.6) and Theorems 2.3 and 2.6 that
Theorem 2.7: A real function g(x) is Borel measurable if and only if it is the limit of a sequence of simple functions.
Proof: Exercise.
Using Theorem 2.7, we can now generalize Theorem 2.3 to
Theorem 2.8: If $f(x)$ and $g(x)$ are Borel-measurable functions, then so are $f(x) + g(x)$, $f(x) - g(x)$, and $f(x) \cdot g(x)$. Moreover, if $g(x) \ne 0$ for all $x$, then $f(x)/g(x)$ is a Borel-measurable function.
Proof: Exercise.
2.2. Integrals of Borel-Measurable Functions with Respect to a Probability Measure
If $g$ is a step function on $(0, 1]$, for instance, $g(x) = \sum_{j=0}^{m} a_j I(x \in (b_j, b_{j+1}])$, where $b_0 = 0$ and $b_{m+1} = 1$, then the Riemann integral of $g$ over $(0, 1]$ is defined as
$$\int_0^1 g(x)\,dx = \sum_{j=0}^{m} a_j (b_{j+1} - b_j) = \sum_{j=0}^{m} a_j \mu((b_j, b_{j+1}]),$$
where $\mu$ is the uniform probability measure on $(0, 1]$. Mimicking these results for simple functions and more general probability measures $\mu$, we can define the integral of a simple function with respect to a probability measure $\mu$ as follows:
Definition 2.3: Let $\mu$ be a probability measure on $\{\mathbb{R}^k, \mathscr{B}^k\}$, and let $g(x) = \sum_{j=1}^m a_j I(x \in B_j)$ be a simple function on $\mathbb{R}^k$. Then the integral of $g$ with respect to $\mu$ is defined as
$$\int g(x)\,d\mu(x) = \sum_{j=1}^m a_j \mu(B_j).\text{[9]}$$
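As a small illustration of this definition, the following Python sketch computes the integral of a simple function with respect to the uniform probability measure on (0, 1]. The names `uniform_measure` and `integrate_simple` are hypothetical, and the sets are restricted to half-open intervals (lo, hi] so that their measure is just the interval length.

```python
from fractions import Fraction

def uniform_measure(interval):
    """Uniform probability measure of a half-open interval (lo, hi] with 0 <= lo <= hi <= 1."""
    lo, hi = interval
    return Fraction(hi) - Fraction(lo)

def integrate_simple(terms, measure):
    """Integral of g = sum_j a_j * I(x in B_j), computed as sum_j a_j * measure(B_j)."""
    return sum(Fraction(a) * measure(B) for a, B in terms)

# g equals 2 on (0, 1/4] and 5 on (1/4, 1]
terms = [
    (2, (Fraction(0), Fraction(1, 4))),
    (5, (Fraction(1, 4), Fraction(1))),
]
print(integrate_simple(terms, uniform_measure))  # 2*(1/4) + 5*(3/4) = 17/4
```

Exact rational arithmetic is used so that the result can be compared directly with the hand computation.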
For nonnegative continuous real functions $g$ on $(0, 1]$, the Riemann integral of $g$ over $(0, 1]$ is defined as $\int_0^1 g(x)\,dx = \sup_{0 \le g_* \le g} \int_0^1 g_*(x)\,dx$, where the supremum is taken over all step functions $g_*$ satisfying $0 \le g_*(x) \le g(x)$ for all $x$ in $(0, 1]$. Again, we may mimic this result for nonnegative, Borel-measurable functions $g$ and general probability measures $\mu$:
Definition 2.4: Let $\mu$ be a probability measure on $\{\mathbb{R}^k, \mathscr{B}^k\}$ and let $g(x)$ be a nonnegative Borel-measurable function on $\mathbb{R}^k$. Then the integral of $g$ with respect to $\mu$ is defined as
$$\int g(x)\,d\mu(x) = \sup_{0 \le g_* \le g} \int g_*(x)\,d\mu(x),$$
where the supremum is taken over all simple functions $g_*$ satisfying $0 \le g_*(x) \le g(x)$ for all $x$ in a Borel set $B$ with $\mu(B) = 1$.
Using the decomposition (2.6), we can now define the integral of an arbitrary Borel-measurable function with respect to a probability measure:
Definition 2.5: Let $\mu$ be a probability measure on $\{\mathbb{R}^k, \mathscr{B}^k\}$ and let $g(x)$ be a Borel-measurable function on $\mathbb{R}^k$. Then the integral of $g$ with respect to $\mu$ is defined as
$$\int g(x)\,d\mu(x) = \int g^+(x)\,d\mu(x) - \int g^-(x)\,d\mu(x), \tag{2.7}$$
where $g^+(x) = \max\{g(x), 0\}$, $g^-(x) = \max\{-g(x), 0\}$, provided that at least one of the integrals at the right-hand side of (2.7) is finite.[10]
Definition 2.6: The integral of a Borel-measurable function $g$ with respect to a probability measure $\mu$ over a Borel set $A$ is defined as $\int_A g(x)\,d\mu(x) = \int I(x \in A) g(x)\,d\mu(x)$.
All the well-known properties of Riemann integrals carry over to these new integrals. In particular,
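Theorem 2.9: Let $f(x)$ and $g(x)$ be Borel-measurable functions on $\mathbb{R}^k$, let $\mu$ be a probability measure on $\{\mathbb{R}^k, \mathscr{B}^k\}$, and let $A$ be a Borel set in $\mathbb{R}^k$. Then
(a) $\int_A (\alpha g(x) + \beta f(x))\,d\mu(x) = \alpha \int_A g(x)\,d\mu(x) + \beta \int_A f(x)\,d\mu(x)$.
(b) For disjoint Borel sets $A_j$ in $\mathbb{R}^k$, $\int_{\cup_{j=1}^{\infty} A_j} g(x)\,d\mu(x) = \sum_{j=1}^{\infty} \int_{A_j} g(x)\,d\mu(x)$.
(c) If $g(x) \ge 0$ for all $x$ in $A$, then $\int_A g(x)\,d\mu(x) \ge 0$.
(d) If $g(x) \ge f(x)$ for all $x$ in $A$, then $\int_A g(x)\,d\mu(x) \ge \int_A f(x)\,d\mu(x)$.
(e) $\left| \int_A g(x)\,d\mu(x) \right| \le \int_A |g(x)|\,d\mu(x)$.
(f) If $\mu(A) = 0$, then $\int_A g(x)\,d\mu(x) = 0$.
(g) If $\int |g(x)|\,d\mu(x) < \infty$ and $\lim_{n \to \infty} \mu(A_n) = 0$ for a sequence of Borel sets $A_n$, then $\lim_{n \to \infty} \int_{A_n} g(x)\,d\mu(x) = 0$.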
Proofs of (a)-(f): Exercise.
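Proof of (g): Without loss of generality we may assume that $g(x) \ge 0$. Let
$$C_k = \{x \in \mathbb{R} : k \le g(x) < k + 1\} \quad \text{and} \quad B_m = \{x \in \mathbb{R} : g(x) \ge m\} = \cup_{k=m}^{\infty} C_k.$$
Then $\int_{\mathbb{R}} g(x)\,d\mu(x) = \sum_{k=0}^{\infty} \int_{C_k} g(x)\,d\mu(x) < \infty$; hence,
$$\int_{B_m} g(x)\,d\mu(x) = \sum_{k=m}^{\infty} \int_{C_k} g(x)\,d\mu(x) \to 0 \quad \text{for } m \to \infty. \tag{2.8}$$
Therefore,
$$\int_{A_n} g(x)\,d\mu(x) = \int_{A_n \cap B_m} g(x)\,d\mu(x) + \int_{A_n \cap (\mathbb{R} \setminus B_m)} g(x)\,d\mu(x) \le \int_{B_m} g(x)\,d\mu(x) + m\mu(A_n);$$
hence, for fixed $m$, $\limsup_{n \to \infty} \int_{A_n} g(x)\,d\mu(x) \le \int_{B_m} g(x)\,d\mu(x)$. Letting $m \to \infty$, we find that part (g) of Theorem 2.9 follows from (2.8). Q.E.D.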
Moreover, there are two important theorems involving limits of a sequence of Borel-measurable functions and their integrals, namely, the monotone convergence theorem and the dominated convergence theorem:
Theorem 2.10: (Monotone convergence) Let $g_n$ be a nondecreasing sequence of nonnegative Borel-measurable functions on $\mathbb{R}^k$ (i.e., for any fixed $x \in \mathbb{R}^k$, $0 \le g_n(x) \le g_{n+1}(x)$ for $n = 1, 2, 3, \ldots$), and let $\mu$ be a probability measure on $\{\mathbb{R}^k, \mathscr{B}^k\}$. Then
$$\lim_{n \to \infty} \int g_n(x)\,d\mu(x) = \int \lim_{n \to \infty} g_n(x)\,d\mu(x).$$
Proof: First, observe from Theorem 2.9(d) and the monotonicity of $g_n$ that $\int g_n(x)\,d\mu(x)$ is monotonic nondecreasing and that therefore $\lim_{n \to \infty} \int g_n(x)\,d\mu(x)$ exists (but may be infinite) and $g(x) = \lim_{n \to \infty} g_n(x)$ exists (but may be infinite) and is Borel measurable. Moreover, given that for $x \in \mathbb{R}^k$, $g_n(x) \le g(x)$, it follows easily from Theorem 2.9(d) that $\int g_n(x)\,d\mu(x) \le \int g(x)\,d\mu(x)$; hence, $\lim_{n \to \infty} \int g_n(x)\,d\mu(x) \le \int g(x)\,d\mu(x)$. Thus, it remains to be shown that
$$\lim_{n \to \infty} \int g_n(x)\,d\mu(x) \ge \int g(x)\,d\mu(x). \tag{2.9}$$
It follows from the definition of the integral $\int g(x)\,d\mu(x)$ that (2.9) is true if, for any simple function $f(x)$ with $0 \le f(x) \le g(x)$,
$$\lim_{n \to \infty} \int g_n(x)\,d\mu(x) \ge \int f(x)\,d\mu(x). \tag{2.10}$$
Given such a simple function $f(x)$, let $A_n = \{x \in \mathbb{R}^k : g_n(x) \ge (1 - \varepsilon) f(x)\}$ for arbitrary $\varepsilon > 0$, and let $\sup_x f(x) = M$. Note that, because $f(x)$ is simple, $M < \infty$. Moreover, note that
$$\lim_{n \to \infty} \mu(\mathbb{R}^k \setminus A_n) = \lim_{n \to \infty} \mu(\{x \in \mathbb{R}^k : g_n(x) < (1 - \varepsilon) f(x)\}) = 0. \tag{2.11}$$
Furthermore, observe that
$$\begin{aligned}
\int g_n(x)\,d\mu(x) &\ge \int_{A_n} g_n(x)\,d\mu(x) \ge (1 - \varepsilon) \int_{A_n} f(x)\,d\mu(x) \\
&= (1 - \varepsilon) \int f(x)\,d\mu(x) - (1 - \varepsilon) \int_{\mathbb{R}^k \setminus A_n} f(x)\,d\mu(x) \\
&\ge (1 - \varepsilon) \int f(x)\,d\mu(x) - (1 - \varepsilon) M \mu(\mathbb{R}^k \setminus A_n).
\end{aligned} \tag{2.12}$$
It follows now from (2.11) and (2.12) that, for arbitrary $\varepsilon > 0$, $\lim_{n \to \infty} \int g_n(x)\,d\mu(x) \ge (1 - \varepsilon) \int f(x)\,d\mu(x)$, which implies (2.10). If we combine (2.9) and (2.10), the theorem follows. Q.E.D.
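The monotone convergence theorem can be checked by hand in a very simple setting. The Python sketch below assumes a hypothetical discrete probability measure with three point masses, so that every integral reduces to a finite weighted sum, and uses the dyadic approximations from the proof of Theorem 2.6 as the nondecreasing sequence. The names `dyadic_approx`, `mu`, `g`, and `integral` are illustrative choices, not notation from the text.

```python
from fractions import Fraction
import math

def dyadic_approx(gx, n):
    """n-th approximating simple function of Theorem 2.6, evaluated where g equals gx >= 0."""
    if gx >= n:
        return Fraction(n)
    return Fraction(math.floor(gx * 2**n), 2**n)

# Hypothetical discrete probability measure: point -> mass, masses sum to 1
mu = {
    Fraction(1, 3): Fraction(1, 2),
    Fraction(5, 2): Fraction(1, 3),
    Fraction(7): Fraction(1, 6),
}

def g(x):
    return x * x

def integral(h):
    """Integral of h with respect to mu; a finite sum because mu has finite support."""
    return sum(h(x) * mass for x, mass in mu.items())

limit = integral(g)
approximations = [
    integral(lambda x, n=n: dyadic_approx(g(x), n)) for n in range(1, 61)
]
print(float(approximations[0]), float(approximations[-1]), float(limit))
```

The sequence of integrals is nondecreasing and approaches the integral of the limit function from below, which is what the theorem asserts in this special case.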
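Theorem 2.11: (Dominated convergence) Let $g_n$ be a sequence of Borel-measurable functions on $\mathbb{R}^k$ such that pointwise in $x$, $g(x) = \lim_{n \to \infty} g_n(x)$, and let $\bar{g}(x) = \sup_{n \ge 1} |g_n(x)|$. If $\int \bar{g}(x)\,d\mu(x) < \infty$, where $\mu$ is a probability measure on $\{\mathbb{R}^k, \mathscr{B}^k\}$, then
$$\lim_{n \to \infty} \int g_n(x)\,d\mu(x) = \int g(x)\,d\mu(x).$$
Proof: Let $f_n(x) = \bar{g}(x) - \sup_{m \ge n} g_m(x)$. Then $f_n(x)$ is nondecreasing and nonnegative and $\lim_{n \to \infty} f_n(x) = \bar{g}(x) - g(x)$. Thus, it follows from the condition $\int \bar{g}(x)\,d\mu(x) < \infty$ and Theorems 2.9(a, d)-2.10 that
$$\int g(x)\,d\mu(x) = \lim_{n \to \infty} \int \sup_{m \ge n} g_m(x)\,d\mu(x) \ge \lim_{n \to \infty} \sup_{m \ge n} \int g_m(x)\,d\mu(x) = \limsup_{n \to \infty} \int g_n(x)\,d\mu(x). \tag{2.13}$$
Next, let $h_n(x) = \bar{g}(x) + \inf_{m \ge n} g_m(x)$. Then $h_n(x)$ is nondecreasing and nonnegative and $\lim_{n \to \infty} h_n(x) = \bar{g}(x) + g(x)$. Thus, it follows again from the condition $\int \bar{g}(x)\,d\mu(x) < \infty$ and Theorems 2.9(a, d)-2.10 that
$$\int g(x)\,d\mu(x) = \lim_{n \to \infty} \int \inf_{m \ge n} g_m(x)\,d\mu(x) \le \lim_{n \to \infty} \inf_{m \ge n} \int g_m(x)\,d\mu(x) = \liminf_{n \to \infty} \int g_n(x)\,d\mu(x). \tag{2.14}$$
The theorem now follows from (2.13) and (2.14). Q.E.D.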
In the statistical and econometric literature you will encounter integrals of the form $\int_A g(x)\,dF(x)$, where $F$ is a distribution function. Because each distribution function $F(x)$ on $\mathbb{R}^k$ is uniquely associated with a probability measure $\mu$ on $\mathscr{B}^k$, one should interpret these integrals as
$$\int_A g(x)\,dF(x) \overset{\text{def}}{=} \int_A g(x)\,d\mu(x), \tag{2.15}$$
where $\mu$ is the probability measure on $\mathscr{B}^k$ associated with $F$, $g$ is a Borel-measurable function on $\mathbb{R}^k$, and $A$ is a Borel set in $\mathscr{B}^k$.
2.4. General Measurability and Integrals of Random Variables with Respect to Probability Measures
All the definitions and results in the previous sections carry over to mappings $X: \Omega \to \mathbb{R}$, where $\Omega$ is a nonempty set, with $\mathscr{F}$ a $\sigma$-algebra of subsets of $\Omega$. Recall that $X$ is a random variable defined on a probability space $\{\Omega, \mathscr{F}, P\}$ if, for all Borel sets $B$ in $\mathbb{R}$, $\{\omega \in \Omega : X(\omega) \in B\} \in \mathscr{F}$. Moreover, recall that it suffices to verify this condition for Borel sets of the type $B_y = (-\infty, y]$, $y \in \mathbb{R}$. These generalizations are listed in this section with all random variables involved defined on a common probability space $\{\Omega, \mathscr{F}, P\}$.
Definition 2.7: A random variable $X$ is called simple if it takes the form $X(\omega) = \sum_{j=1}^m b_j I(\omega \in A_j)$, with $m < \infty$, $b_j \in \mathbb{R}$, where the $A_j$'s are disjoint sets in $\mathscr{F}$.
Compare Definition 2.2. (Verify as was done for Theorem 2.2 that a simple random variable is indeed a random variable.) Again, we may assume without loss of generality that the $b_j$'s are all different. For example, if $X$ has a hypergeometric or binomial distribution, then $X$ is a simple random variable.
Theorem 2.12: If $X$ and $Y$ are simple random variables, then so are $X + Y$, $X - Y$, and $X \cdot Y$. If, in addition, $Y$ is nonzero with probability 1, then $X/Y$ is a simple random variable.
Proof: Similar to Theorem 2.3.
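Theorem 2.13: Let $X_j$ be a sequence of random variables. Then $\max_{1 \le j \le n} X_j$, $\min_{1 \le j \le n} X_j$, $\sup_{n \ge 1} X_n$, $\inf_{n \ge 1} X_n$, $\limsup_{n \to \infty} X_n$, and $\liminf_{n \to \infty} X_n$ are random variables. If $\lim_{n \to \infty} X_n(\omega) = X(\omega)$ for all $\omega$ in a set $A$ in $\mathscr{F}$ with $P(A) = 1$, then $X$ is a random variable.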
Proof: Similar to Theorem 2.4.
Theorem 2.14: A mapping $X: \Omega \to \mathbb{R}$ is a random variable if and only if there exists a sequence $X_n$ of simple random variables such that $\lim_{n \to \infty} X_n(\omega) = X(\omega)$ for all $\omega$ in a set $A$ in $\mathscr{F}$ with $P(A) = 1$.
Proof: Similar to Theorem 2.7.
As in Definitions 2.3-2.6, we may define integrals of a random variable X with respect to the probability measure P in four steps as follows.
Definition 2.8: Let $X$ be a simple random variable: $X(\omega) = \sum_{j=1}^m b_j I(\omega \in A_j)$, for instance. Then the integral of $X$ with respect to $P$ is defined as $\int X(\omega)\,dP(\omega) = \sum_{j=1}^m b_j P(A_j)$.[11]
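To make this definition concrete, the following Python sketch treats a binomial random variable as a simple random variable on a finite sample space and computes its integral as the sum of the values times the probabilities of the sets on which they are taken. The choices n = 4 and p = 1/3, and the names `omega_space`, `prob`, and `P_A`, are assumptions made purely for illustration.

```python
from fractions import Fraction
from itertools import product

n, p = 4, Fraction(1, 3)

# Sample space: all sequences of n zeros and ones (independent trials)
omega_space = list(product((0, 1), repeat=n))

def prob(omega):
    """Probability of a single outcome under independent trials with success probability p."""
    k = sum(omega)
    return p**k * (1 - p)**(n - k)

# X(omega) = number of ones; A_j = {omega : X(omega) = j} and b_j = j
P_A = {
    j: sum(prob(w) for w in omega_space if sum(w) == j)
    for j in range(n + 1)
}

# Integral of X with respect to P: sum over j of b_j * P(A_j)
integral = sum(j * P_A[j] for j in range(n + 1))
print(integral)  # 4/3, which equals n * p
```

The sets A_j are disjoint and cover the sample space, so the probabilities P(A_j) sum to one.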
Definition 2.9: Let $X$ be a nonnegative random variable (with probability 1). Then the integral of $X$ with respect to $P$ is defined as $\int X(\omega)\,dP(\omega) = \sup_{0 \le X_* \le X} \int X_*(\omega)\,dP(\omega)$, where the supremum is taken over all simple random variables $X_*$ satisfying $P[0 \le X_* \le X] = 1$.
Definition 2.10: Let $X$ be a random variable. Then the integral of $X$ with respect to $P$ is defined as $\int X(\omega)\,dP(\omega) = \int X^+(\omega)\,dP(\omega) - \int X^-(\omega)\,dP(\omega)$, where $X^+ = \max\{X, 0\}$ and $X^- = \max\{-X, 0\}$, provided that at least one of the latter two integrals is finite.
Definition 2.11: The integral of a random variable $X$ with respect to a probability measure $P$ over a set $A$ in $\mathscr{F}$ is defined as $\int_A X(\omega)\,dP(\omega) = \int I(\omega \in A) X(\omega)\,dP(\omega)$.
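Theorem 2.15: Let $X$ and $Y$ be random variables, and let $A$ be a set in $\mathscr{F}$. Then
(a) $\int_A (\alpha X(\omega) + \beta Y(\omega))\,dP(\omega) = \alpha \int_A X(\omega)\,dP(\omega) + \beta \int_A Y(\omega)\,dP(\omega)$.
(b) For disjoint sets $A_j$ in $\mathscr{F}$, $\int_{\cup_{j=1}^{\infty} A_j} X(\omega)\,dP(\omega) = \sum_{j=1}^{\infty} \int_{A_j} X(\omega)\,dP(\omega)$.
(c) If $X(\omega) \ge 0$ for all $\omega$ in $A$, then $\int_A X(\omega)\,dP(\omega) \ge 0$.
(d) If $X(\omega) \ge Y(\omega)$ for all $\omega$ in $A$, then $\int_A X(\omega)\,dP(\omega) \ge \int_A Y(\omega)\,dP(\omega)$.
(e) $\left| \int_A X(\omega)\,dP(\omega) \right| \le \int_A |X(\omega)|\,dP(\omega)$.
(f) If $P(A) = 0$, then $\int_A X(\omega)\,dP(\omega) = 0$.
(g) If $\int |X(\omega)|\,dP(\omega) < \infty$ and for a sequence of sets $A_n$ in $\mathscr{F}$, $\lim_{n \to \infty} P(A_n) = 0$, then $\lim_{n \to \infty} \int_{A_n} X(\omega)\,dP(\omega) = 0$.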
Proof: Similar to Theorem 2.9.
Also the monotone and dominated-convergence theorems carry over:
Theorem 2.16: Let $X_n$ be a monotonic, nondecreasing sequence of nonnegative random variables defined on the probability space $\{\Omega, \mathscr{F}, P\}$, that is, there exists a set $A \in \mathscr{F}$ with $P(A) = 1$ such that for all $\omega \in A$, $0 \le X_n(\omega) \le X_{n+1}(\omega)$, $n = 1, 2, 3, \ldots$. Then
$$\lim_{n \to \infty} \int X_n(\omega)\,dP(\omega) = \int \lim_{n \to \infty} X_n(\omega)\,dP(\omega).$$
Proof: Similar to Theorem 2.10.
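Theorem 2.17: Let $X_n$ be a sequence of random variables defined on the probability space $\{\Omega, \mathscr{F}, P\}$ such that for all $\omega$ in a set $A \in \mathscr{F}$ with $P(A) = 1$, $Y(\omega) = \lim_{n \to \infty} X_n(\omega)$. Let $\bar{X} = \sup_{n \ge 1} |X_n|$. If $\int \bar{X}(\omega)\,dP(\omega) < \infty$, then $\lim_{n \to \infty} \int X_n(\omega)\,dP(\omega) = \int Y(\omega)\,dP(\omega)$.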
Proof: Similar to Theorem 2.11.
Finally, note that the integral of a random variable with respect to the corresponding probability measure $P$ is related to the definition of the integral of a Borel-measurable function with respect to a probability measure $\mu$:
Theorem 2.18: Let $\mu_X$ be the probability measure induced by the random variable $X$. Then $\int X(\omega)\,dP(\omega) = \int x\,d\mu_X(x)$. Moreover, if $g$ is a Borel-measurable real function on $\mathbb{R}^k$ and $X$ is a $k$-dimensional random vector with induced probability measure $\mu_X$, then $\int g(X(\omega))\,dP(\omega) = \int g(x)\,d\mu_X(x)$. Furthermore, denoting in the latter case $Y = g(X)$, with $\mu_Y$ the probability measure induced by $Y$, we have $\int Y(\omega)\,dP(\omega) = \int g(X(\omega))\,dP(\omega) = \int g(x)\,d\mu_X(x) = \int y\,d\mu_Y(y)$.
Proof: Let $X$ be a simple random variable: $X(\omega) = \sum_{j=1}^m b_j I(\omega \in A_j)$, for instance, and recall that without loss of generality we may assume that the $b_j$'s are all different. Each of the disjoint sets $A_j$ is associated with a Borel set $B_j$, with the $B_j$'s disjoint, such that $A_j = \{\omega \in \Omega : X(\omega) \in B_j\}$ (e.g., let $B_j = \{b_j\}$). Then $\int X(\omega)\,dP(\omega) = \sum_{j=1}^m b_j P(A_j) = \sum_{j=1}^m b_j \mu_X(B_j) = \int g_*(x)\,d\mu_X(x)$, where $g_*(x) = \sum_{j=1}^m b_j I(x \in B_j)$ is a simple function such that
$$g_*(X(\omega)) = \sum_{j=1}^m b_j I(X(\omega) \in B_j) = \sum_{j=1}^m b_j I(\omega \in A_j) = X(\omega).$$
Therefore, in this case the Borel set $C = \{x : g_*(x) \ne x\}$ has $\mu_X$ measure zero: $\mu_X(C) = 0$, and consequently,
$$\int X(\omega)\,dP(\omega) = \int_{\mathbb{R} \setminus C} g_*(x)\,d\mu_X(x) + \int_C g_*(x)\,d\mu_X(x) = \int_{\mathbb{R} \setminus C} x\,d\mu_X(x) = \int x\,d\mu_X(x).$$
The rest of the proof is left as an exercise. Q.E.D.
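The three equal integrals in Theorem 2.18 can be verified directly on a finite probability space. The Python sketch below is an illustration under stated assumptions: the sample space consists of the 36 equally likely outcomes of two dice, X is the sum of the faces, and g(x) = (x - 7)^2. These choices, and the names `induced_measure`, `mu_X`, and `mu_Y`, are hypothetical and do not come from the text.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Finite probability space: two fair dice, each outcome has probability 1/36
omega_space = list(product(range(1, 7), repeat=2))
P = {w: Fraction(1, 36) for w in omega_space}

def X(w):
    """Random variable: the sum of the two faces."""
    return w[0] + w[1]

def g(x):
    """A Borel-measurable function of the sum."""
    return (x - 7) ** 2

def induced_measure(rv):
    """Probability measure induced by rv: value -> P({omega : rv(omega) = value})."""
    mu = defaultdict(Fraction)
    for w, pw in P.items():
        mu[rv(w)] += pw
    return dict(mu)

mu_X = induced_measure(X)
mu_Y = induced_measure(lambda w: g(X(w)))

over_omega = sum(g(X(w)) * pw for w, pw in P.items())   # integral of g(X) dP
over_mu_X = sum(g(x) * m for x, m in mu_X.items())      # integral of g dmu_X
over_mu_Y = sum(y * m for y, m in mu_Y.items())         # integral of y dmu_Y
print(over_omega, over_mu_X, over_mu_Y)
```

All three computations give the same rational number, as the theorem requires; only the space over which the sum is taken changes.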