Introduction to the Mathematical and Statistical Foundations of Econometrics
Conditional Expectations
Roll a die, and let the outcome be Y. Define the random variable X = 1 if Y is even, and X = 0 if Y is odd. The expected value of Y is E[Y] = (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5. But what would the expected value of Y be if it is revealed that the outcome is even: X = 1? The latter information implies that Y is 2, 4, or 6 with equal probabilities 1/3; hence, the expected value of Y, conditional on the event X = 1, is E[Y|X = 1] = (2 + 4 + 6)/3 = 4. Similarly, if it is revealed that X = 0, then Y is 1, 3, or 5 with equal probabilities 1/3; hence, the expected value of Y, conditional on the event X = 0, is E[Y|X = 0] = (1 + 3 + 5)/3 = 3. Both results can be captured in a single statement:
E[Y|X] = 3 + X. (3.1)
In this example the conditional probability of Y = y, given X = x, is
$$
\begin{aligned}
P(Y = y\,|\,X = x) &= \frac{P(Y = y \text{ and } X = x)}{P(X = x)}\\[4pt]
&= \frac{P(\{y\}\cap\{2,4,6\})}{P(\{2,4,6\})} = \frac{P(\{y\})}{P(\{2,4,6\})} = \frac{1}{3} && \text{if } x = 1 \text{ and } y \in \{2,4,6\},\\[4pt]
&= \frac{P(\{y\}\cap\{2,4,6\})}{P(\{2,4,6\})} = \frac{P(\emptyset)}{P(\{2,4,6\})} = 0 && \text{if } x = 1 \text{ and } y \notin \{2,4,6\},\\[4pt]
&= \frac{P(\{y\}\cap\{1,3,5\})}{P(\{1,3,5\})} = \frac{P(\{y\})}{P(\{1,3,5\})} = \frac{1}{3} && \text{if } x = 0 \text{ and } y \in \{1,3,5\},\\[4pt]
&= \frac{P(\{y\}\cap\{1,3,5\})}{P(\{1,3,5\})} = \frac{P(\emptyset)}{P(\{1,3,5\})} = 0 && \text{if } x = 0 \text{ and } y \notin \{1,3,5\};
\end{aligned}
\tag{3.2}
$$
hence,

$$
\sum_{y=1}^{6} y\, P(Y = y\,|\,X = x) = 3 + x = E[Y\,|\,X = x].
$$

Thus, in the case in which both Y and X are discrete random variables, the conditional expectation E[Y|X] can be defined as

$$
E[Y|X] = \sum_{y} y\, p(y|X), \quad\text{where } p(y|x) = P(Y = y\,|\,X = x) \text{ for } P(X = x) > 0.
$$
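As a small numerical illustration, the sum defining E[Y|X = x] can be evaluated directly for the die example; the following minimal Python sketch (the helper name cond_expectation and the use of exact fractions are illustrative choices) reproduces (3.1):

```python
from fractions import Fraction

outcomes = range(1, 7)
p = {y: Fraction(1, 6) for y in outcomes}            # P(Y = y) for a fair die

def cond_expectation(x):
    """E[Y | X = x], where X = 1 if Y is even and X = 0 if Y is odd."""
    event = [y for y in outcomes if (y % 2 == 0) == (x == 1)]
    p_x = sum(p[y] for y in event)                   # P(X = x)
    return sum(y * p[y] / p_x for y in event)        # sum over y of y * p(y|x)

print(cond_expectation(1), cond_expectation(0))      # prints 4 and 3, i.e., 3 + x
```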
A second example is one in which X is uniformly [0, 1] distributed, and given the outcome x of X, Y is randomly drawn from the uniform [0, x] distribution. Then the distribution function F(y) of Y is
$$
\begin{aligned}
F(y) = P(Y \le y) &= P(Y \le y \text{ and } X \le y) + P(Y \le y \text{ and } X > y)\\
&= P(X \le y) + P(Y \le y \text{ and } X > y)\\
&= y + E[I(Y \le y)\, I(X > y)]\\
&= y + \int_0^1\Bigl(\int_0^x I(z \le y)\, x^{-1}\,dz\Bigr) I(x > y)\,dx\\
&= y + \int_y^1 \frac{y}{x}\,dx = y - y\ln(y) \quad \text{for } 0 \le y \le 1.
\end{aligned}
$$
Hence, the density of Y is
f(y) = F'(y) = −ln(y) for y ∈ (0, 1], f(y) = 0 for y ∉ (0, 1].
Thus, the expected value of Y is E[Y] = ∫_0^1 y(−ln(y)) dy = 1/4. But what would the expected value be if it is revealed that X = x for a given number x ∈ (0, 1)? The latter information implies that Y is now uniformly [0, x] distributed; hence, the conditional expectation involved is

$$
E[Y|X = x] = x^{-1}\int_0^x y\,dy = x/2.
$$

More generally, the conditional expectation of Y given X is

$$
E[Y|X] = X^{-1}\int_0^X y\,dy = X/2. \tag{3.3}
$$
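As a quick cross-check, the value E[Y] = 1/4 follows both from integration by parts and from averaging the conditional expectation (3.3) over X:

$$
E[Y] = \int_0^1 y(-\ln y)\,dy = \Bigl[-\tfrac{y^2}{2}\ln y\Bigr]_0^1 + \int_0^1 \tfrac{y}{2}\,dy = \tfrac{1}{4},
\qquad
E\bigl[E[Y|X]\bigr] = E[X/2] = \tfrac{1}{2}\cdot\tfrac{1}{2} = \tfrac{1}{4}.
$$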
The latter example is a special case of a pair (Y, X) of absolutely continuously distributed random variables with joint density function f(y, x) and marginal density f_X(x). The conditional distribution function of Y, given the event X ∈ [x, x + δ], δ > 0, is

$$
P(Y \le y\,|\,X \in [x, x+\delta]) = \frac{P(Y \le y \text{ and } X \in [x, x+\delta])}{P(X \in [x, x+\delta])}
= \frac{\int_{-\infty}^{y}\int_{x}^{x+\delta} f(u, v)\,dv\,du}{\int_{x}^{x+\delta} f_X(v)\,dv}.
$$

Letting δ ↓ 0 then yields the conditional distribution function of Y given the event X = x:

$$
F(y|x) = \lim_{\delta \downarrow 0} P(Y \le y\,|\,X \in [x, x+\delta]) = \int_{-\infty}^{y} f(u, x)\,du \Big/ f_X(x), \quad \text{provided } f_X(x) > 0.
$$
Note that we cannot define this conditional distribution function directly as
F(y|x) = P(Y ≤ y and X = x)/P(X = x)
because for continuous random variables X, P (X = x) = 0.
The conditional density of Y, given the event X = x, is now
f(y|x) = ∂F(y|x)/∂y = f(y, x)/f_X(x),
and the conditional expectation of Y given the event X = x can therefore be defined as
$$
E[Y|X = x] = \int_{-\infty}^{\infty} y\, f(y|x)\,dy = g(x),
$$

for instance.
Plugging in X for x then yields

$$
E[Y|X] = g(X). \tag{3.4}
$$
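In the second example above, for instance, the joint density is f(y, x) = x^{-1} I(0 ≤ y ≤ x) I(0 < x ≤ 1) with marginal f_X(x) = I(0 < x ≤ 1), so that f(y|x) = x^{-1} I(0 ≤ y ≤ x) and

$$
g(x) = \int_{-\infty}^{\infty} y\, f(y|x)\,dy = x^{-1}\int_0^x y\,dy = x/2,
$$

in agreement with (3.3).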
These examples demonstrate two fundamental properties of conditional expectations. The first one is that E[Y|X] is a function of X, which can be translated as follows: Let Y and X be two random variables defined on a common probability space {Ω, ℱ, P}, and let ℱ_X be the σ-algebra generated by X, ℱ_X = {X^{-1}(B), B ∈ ℬ}, where X^{-1}(B) is a shorthand notation for the set {ω ∈ Ω : X(ω) ∈ B} and ℬ is the Euclidean Borel field. Then

Z = E[Y|X] is measurable ℱ_X, (3.5)

which means that, for all Borel sets B, {ω ∈ Ω : Z(ω) ∈ B} ∈ ℱ_X. Second, we have

E[(Y − E[Y|X]) I(X ∈ B)] = 0 for all Borel sets B. (3.6)
In particular, in the case of (3.4) we have

$$
\begin{aligned}
E[(Y - E[Y|X])\, I(X \in B)] &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \bigl(y - g(x)\bigr) I(x \in B)\, f(y, x)\,dy\,dx\\
&= \int_{-\infty}^{\infty}\Bigl(\int_{-\infty}^{\infty} y\, f(y|x)\,dy - g(x)\Bigr) I(x \in B)\, f_X(x)\,dx\\
&= \int_{-\infty}^{\infty} \bigl(g(x) - g(x)\bigr) I(x \in B)\, f_X(x)\,dx = 0.
\end{aligned}
\tag{3.7}
$$
Because ℱ_X = {X^{-1}(B), B ∈ ℬ}, property (3.6) is equivalent to

$$
\int_A \bigl(Y(\omega) - Z(\omega)\bigr)\,dP(\omega) = 0 \quad \text{for all } A \in \mathcal{F}_X. \tag{3.8}
$$

Moreover, note that Ω ∈ ℱ_X, and thus (3.8) implies

$$
E(Y) = \int_\Omega Y(\omega)\,dP(\omega) = \int_\Omega Z(\omega)\,dP(\omega) = E(Z), \tag{3.9}
$$
provided that the expectations involved are defined. A sufficient condition for the existence of E(Y) is that
E(|Y|) < ∞. (3.10)
We will see later that (3.10) is also a sufficient condition for the existence of E (Z).
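In example (3.1), for instance, E(Z) = E[3 + X] = 3 + P(X = 1) = 3.5 = E(Y), in accordance with (3.9).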
I will show now that condition (3.6) also holds for the examples (3.1) and (3.3). Of course, in the case of (3.3) I have already shown this in (3.7), but it is illustrative to verify it again for the special case involved.
In the case of (3.1) the random variable Y·I(X = 1) takes the value 0 with probability 1/2 and the values 2, 4, or 6 each with probability 1/6; the random variable Y·I(X = 0) takes the value 0 with probability 1/2 and the values 1, 3, or 5 each with probability 1/6. Thus,
$$
E[Y \cdot I(X \in B)] =
\begin{cases}
E[Y \cdot I(X = 1)] = 2 & \text{if } 1 \in B \text{ and } 0 \notin B,\\
E[Y \cdot I(X = 0)] = 1.5 & \text{if } 1 \notin B \text{ and } 0 \in B,\\
E[Y] = 3.5 & \text{if } 1 \in B \text{ and } 0 \in B,\\
0 & \text{if } 1 \notin B \text{ and } 0 \notin B,
\end{cases}
$$

which by (3.1) and (3.6) is equal to

$$
\begin{aligned}
E[(E[Y|X])\, I(X \in B)] &= 3E[I(X \in B)] + E[X \cdot I(X \in B)]\\
&= 3P(X \in B) + P(X = 1 \text{ and } X \in B)\\
&=
\begin{cases}
3P(X = 1) + P(X = 1) = 2 & \text{if } 1 \in B \text{ and } 0 \notin B,\\
3P(X = 0) + P(X = 1 \text{ and } X = 0) = 1.5 & \text{if } 1 \notin B \text{ and } 0 \in B,\\
3P(X = 0 \text{ or } X = 1) + P(X = 1) = 3.5 & \text{if } 1 \in B \text{ and } 0 \in B,\\
0 & \text{if } 1 \notin B \text{ and } 0 \notin B.
\end{cases}
\end{aligned}
$$
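The same verification can be run numerically; a minimal Python sketch (the helper names x_of, lhs, and rhs are illustrative choices) compares E[Y·I(X ∈ B)] with E[(3 + X)·I(X ∈ B)] for the four cases above:

```python
from fractions import Fraction

outcomes = range(1, 7)
p = Fraction(1, 6)                        # P(Y = y) for each face of the die

def x_of(y):                              # X = 1 if Y is even, X = 0 if Y is odd
    return 1 if y % 2 == 0 else 0

def lhs(B):                               # E[Y * I(X in B)]
    return sum(y * p for y in outcomes if x_of(y) in B)

def rhs(B):                               # E[(3 + X) * I(X in B)], using E[Y|X] = 3 + X from (3.1)
    return sum((3 + x_of(y)) * p for y in outcomes if x_of(y) in B)

for B in ({1}, {0}, {0, 1}, set()):       # the four cases distinguished above
    print(sorted(B), lhs(B), rhs(B))      # 2, 3/2, 7/2, and 0 in both columns
```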
Moreover, in the case of (3.3) the distribution function of Y·I(X ∈ B) is

$$
\begin{aligned}
F_B(y) = P(Y \cdot I(X \in B) \le y) &= P(Y \le y \text{ and } X \in B) + P(X \notin B)\\
&= P(X \in B \cap [0, y]) + P(Y \le y \text{ and } X \in B \cap (y, 1)) + P(X \notin B)\\
&= \int_0^y I(x \in B)\,dx + y\int_y^1 x^{-1} I(x \in B)\,dx + 1 - \int_0^1 I(x \in B)\,dx\\
&= 1 - \int_y^1 I(x \in B)\,dx + y\int_y^1 x^{-1} I(x \in B)\,dx \quad \text{for } 0 \le y \le 1;
\end{aligned}
$$
hence, the density involved is

$$
f_B(y) = \int_y^1 x^{-1} I(x \in B)\,dx \quad \text{for } y \in [0, 1], \qquad f_B(y) = 0 \quad \text{for } y \notin [0, 1].
$$
Thus,

$$
E[Y \cdot I(X \in B)] = \int_0^1 y\Bigl(\int_y^1 x^{-1} I(x \in B)\,dx\Bigr)dy
= \int_0^1 x^{-1} I(x \in B)\Bigl(\int_0^x y\,dy\Bigr)dx
= \frac{1}{2}\int_0^1 y \cdot I(y \in B)\,dy,
$$
which is equal to

$$
E\bigl(E[Y|X]\, I(X \in B)\bigr) = \frac{1}{2} E[X \cdot I(X \in B)] = \frac{1}{2}\int_0^1 x \cdot I(x \in B)\,dx.
$$
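This equality can also be checked by simulation; a small Monte Carlo sketch in Python (with the arbitrarily chosen Borel set B = [0, 1/2]) approximates both expectations and the difference appearing in (3.6):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x = rng.uniform(0.0, 1.0, n)            # X ~ uniform[0, 1]
y = x * rng.uniform(0.0, 1.0, n)        # given X = x, Y ~ uniform[0, x]
in_b = (x <= 0.5)                       # I(X in B) for B = [0, 1/2]

print(np.mean(y * in_b))                # approx (1/2) * integral_0^{1/2} x dx = 1/16 = 0.0625
print(0.5 * np.mean(x * in_b))          # approx the same value
print(np.mean((y - x / 2) * in_b))      # approx 0, consistent with (3.6)
```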
The two conditions (3.5) and (3.8) uniquely define Z = E[Y|X] in the sense that if there exist two versions of E[Y|X], say Z_1 = E[Y|X] and Z_2 = E[Y|X], satisfying the conditions (3.5) and (3.8), then P(Z_1 = Z_2) = 1. To see this, let

A = {ω ∈ Ω : Z_1(ω) < Z_2(ω)}. (3.11)

Then A ∈ ℱ_X; hence, it follows from (3.8) that

$$
\int_A \bigl(Z_2(\omega) - Z_1(\omega)\bigr)\,dP(\omega) = E[(Z_2 - Z_1) I(Z_2 - Z_1 > 0)] = 0.
$$

The latter equality implies P(Z_2 − Z_1 > 0) = 0, as I will show in Lemma 3.1. If we replace the set A by A = {ω ∈ Ω : Z_1(ω) > Z_2(ω)}, it follows similarly that P(Z_2 − Z_1 < 0) = 0. Combining these two cases, we find that P(Z_2 ≠ Z_1) = 0.
Lemma 3.1: E[Z·I(Z > 0)] = 0 implies P(Z > 0) = 0.
Proof: Choose ε > 0 arbitrarily. Then

$$
0 = E[Z \cdot I(Z > 0)] = E[Z \cdot I(0 < Z < \varepsilon)] + E[Z \cdot I(Z \ge \varepsilon)]
\ge E[Z \cdot I(Z \ge \varepsilon)] \ge \varepsilon E[I(Z \ge \varepsilon)] = \varepsilon P(Z \ge \varepsilon);
$$

hence, P(Z ≥ ε) = 0 for all ε > 0. Now take ε = 1/n, n = 1, 2, ..., and let C_n = {ω ∈ Ω : Z(ω) ≥ n^{-1}}. Then C_n ⊂ C_{n+1}; hence,

$$
P(Z > 0) = P\Bigl(\bigcup_{n=1}^{\infty} C_n\Bigr) = \lim_{n \to \infty} P(C_n) = 0.
$$

Q.E.D.
Conditions (3.5) and (3.8) only depend on the conditioning random variable X via the sub-σ-algebra ℱ_X of ℱ. Therefore, we can define the conditional expectation of a random variable Y relative to an arbitrary sub-σ-algebra ℱ_0 of ℱ, denoted by E[Y|ℱ_0], as follows:
Definition 3.1: Let Y be a random variable defined on a probability space {Ω, ℱ, P} satisfying E(|Y|) < ∞, and let ℱ_0 ⊂ ℱ be a sub-σ-algebra of ℱ. The conditional expectation of Y relative to the sub-σ-algebra ℱ_0, denoted by E[Y|ℱ_0] = Z, for instance, is a random variable Z that is measurable ℱ_0 and is such that for all sets A ∈ ℱ_0,

$$
\int_A Y(\omega)\,dP(\omega) = \int_A Z(\omega)\,dP(\omega).
$$