2.5. Mathematical Expectation
With these new integrals introduced, we can now answer the second question stated at the end of the introduction: How do we define the mathematical expectation if the distribution of X is neither discrete nor absolutely continuous?
Definition 2.12: The mathematical expectation of a random variable X is defined as $E(X) = \int X(\omega)\,dP(\omega)$ or, equivalently, as $E(X) = \int x\,dF(x)$ (cf. (2.15)), where F is the distribution function of X, provided that the integrals involved are defined. Similarly, if g(x) is a Borel-measurable function on $\mathbb{R}^k$ and X is a random vector in $\mathbb{R}^k$, then, equivalently, $E[g(X)] = \int g(X(\omega))\,dP(\omega) = \int g(x)\,dF(x)$, provided that the integrals involved are defined.
Note that the latter part of Definition 2.12 covers both examples (2.1) and (2.3).
As motivated in the introduction, the mathematical expectation E[g(X)] may be interpreted as the limit of the average payoff of a repeated game with payoff function g. This is related to the strong law of large numbers, which we will discuss in Chapter 7: Let $X_1, X_2, X_3, \ldots$ be a sequence of independent random variables or vectors each distributed the same as X, and let g be a Borel-measurable function such that $E[|g(X)|] < \infty$. Then
$$P\left(\lim_{n\to\infty}\frac{1}{n}\sum_{j=1}^{n} g(X_j) = E[g(X)]\right) = 1.$$
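To make this interpretation concrete, the following minimal sketch (not from the text) approximates E[g(X)] by the sample average $(1/n)\sum_{j} g(X_j)$ over i.i.d. draws; the choices X ~ N(0, 1), g(x) = x², the seed, and the use of NumPy are assumptions made purely for illustration.

```python
import numpy as np

# Illustrative sketch: approximate E[g(X)] by the sample average
# (1/n) * sum_j g(X_j) of i.i.d. draws, as in the strong law of large
# numbers quoted above.  Here X ~ N(0, 1) and g(x) = x**2 are arbitrary
# choices, so the true value is E[g(X)] = 1.
rng = np.random.default_rng(0)

def g(x):
    return x ** 2

n = 1_000_000
x = rng.standard_normal(n)
print(np.mean(g(x)))   # close to E[g(X)] = 1 for large n
```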
There are a few important special cases of the function g, in particular the variance of X, which measures the variation of X around its expectation E(X), and the covariance of a pair of random variables X and Y, which measures how X and Y fluctuate together around their expectations:
Definition 2.13: The m-th moment (m = 1, 2, 3, ...) of a random variable X is defined as $E(X^m)$, and the m-th central moment of X is defined by $E(|X - \mu_X|^m)$, where $\mu_X = E(X)$. The second central moment is called the variance of X: $\mathrm{var}(X) = E[(X - \mu_X)^2] = \sigma_X^2$, for instance. The covariance of a pair (X, Y) of random variables is defined as $\mathrm{cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$, where $\mu_X$ is the same as before and $\mu_Y = E(Y)$. The correlation (coefficient) of a pair (X, Y) of random variables is defined as
$$\mathrm{corr}(X, Y) = \frac{\mathrm{cov}(X, Y)}{\sqrt{\mathrm{var}(X)}\,\sqrt{\mathrm{var}(Y)}}. \qquad (2.16)$$
The correlation coefficient measures the extent to which Y can be approximated by a linear function of X, and vice versa. In particular,
if exactly $Y = a + \beta X$, then $\mathrm{corr}(X, Y) = 1$ if $\beta > 0$ and $\mathrm{corr}(X, Y) = -1$ if $\beta < 0$. (2.17)
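As an illustration of (2.17), the following sketch (not from the text) computes the sample correlation of X and Y = a + βX for one positive and one negative β; the specific values, the seed, and the use of NumPy are arbitrary.

```python
import numpy as np

# Illustrative check of (2.17): if Y = a + beta * X exactly, the sample
# correlation equals 1 for beta > 0 and -1 for beta < 0.
rng = np.random.default_rng(1)
x = rng.normal(size=10_000)

for a, beta in [(2.0, 3.0), (2.0, -3.0)]:
    y = a + beta * x
    corr = np.corrcoef(x, y)[0, 1]
    print(beta, round(corr, 6))   # prints 1.0 and -1.0 (up to rounding)
```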
Moreover,
Definition 2.14: Random variables X and Y are said to be uncorrelated if cov(X, Y) = 0. A sequence of random variables $X_j$ is uncorrelated if, for all $i \ne j$, $X_i$ and $X_j$ are uncorrelated.
Furthermore, it is easy to verify that
Theorem 2.19: If $X_1, \ldots, X_n$ are uncorrelated, then $\mathrm{var}\left(\sum_{j=1}^{n} X_j\right) = \sum_{j=1}^{n} \mathrm{var}(X_j)$.
Proof: Exercise.
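The following numerical illustration (not a substitute for the exercise) checks Theorem 2.19 with independent, hence uncorrelated, normal variables; the chosen standard deviations, the seed, and the use of NumPy are assumptions for the example only.

```python
import numpy as np

# Numerical illustration of Theorem 2.19: for independent (hence
# uncorrelated) X_1, ..., X_n, the sample variance of sum_j X_j is close
# to the sum of the individual variances.
rng = np.random.default_rng(2)
n_vars, n_draws = 5, 1_000_000
sigmas = np.array([0.5, 1.0, 1.5, 2.0, 2.5])          # chosen arbitrarily

x = rng.normal(scale=sigmas, size=(n_draws, n_vars))  # independent columns
print(np.var(x.sum(axis=1)))                          # ~ sum of variances
print(np.sum(sigmas ** 2))                            # = 13.75
```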
2.6. Some Useful Inequalities Involving Mathematical Expectations
There are a few inequalities that will prove to be useful later on - in particular the inequalities of Chebishev, Holder, Liapounov, and Jensen.
2.6.1. Chebishev’s Inequality
Let X be a nonnegative random variable with distribution function F(x), and let $\varphi(x)$ be a monotonic, increasing, nonnegative Borel-measurable function on $[0, \infty)$. Then, for arbitrary $\varepsilon > 0$,
$$E[\varphi(X)] = \int_0^\infty \varphi(x)\,dF(x) = \int_{\{\varphi(x) > \varphi(\varepsilon)\}} \varphi(x)\,dF(x) + \int_{\{\varphi(x) \le \varphi(\varepsilon)\}} \varphi(x)\,dF(x)$$
$$\ge \int_{\{\varphi(x) > \varphi(\varepsilon)\}} \varphi(x)\,dF(x) \ge \varphi(\varepsilon) \int_{\{\varphi(x) > \varphi(\varepsilon)\}} dF(x) = \varphi(\varepsilon) \int_{\{x > \varepsilon\}} dF(x) = \varphi(\varepsilon)\,(1 - F(\varepsilon)); \qquad (2.18)$$
hence,
$$P(X > \varepsilon) = 1 - F(\varepsilon) \le E[\varphi(X)]/\varphi(\varepsilon). \qquad (2.19)$$
In particular, it follows from (2.19), by taking $\varphi(x) = x^2$, replacing X by $|Y - \mu_Y|$, and replacing $\varepsilon$ by $\sqrt{\sigma_Y^2/\varepsilon}$, that for a random variable Y with expected value $\mu_Y = E(Y)$ and variance $\sigma_Y^2$,
$$P\left(\{\omega \in \Omega : |Y(\omega) - \mu_Y| > \sqrt{\sigma_Y^2/\varepsilon}\,\}\right) \le \varepsilon. \qquad (2.20)$$
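The following sketch (not from the text) checks (2.20) empirically: the relative frequency of $|Y - \mu_Y| > \sqrt{\sigma_Y^2/\varepsilon}$ stays below ε; the exponential example distribution, the seed, and the use of NumPy are arbitrary choices.

```python
import numpy as np

# Illustrative check of Chebishev's inequality (2.20): the empirical
# frequency of |Y - mu_Y| > sqrt(var(Y)/eps) never exceeds eps.
rng = np.random.default_rng(3)
y = rng.exponential(scale=2.0, size=1_000_000)   # arbitrary example distribution
mu, var = y.mean(), y.var()

for eps in (0.5, 0.1, 0.01):
    threshold = np.sqrt(var / eps)
    freq = np.mean(np.abs(y - mu) > threshold)
    print(eps, freq)   # freq <= eps in each case
```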
2.6.2. Holder’s Inequality
Holder’s inequality is based on the fact that ln(x) is a concave function on $(0, \infty)$: for $0 < a < b$ and $0 < \lambda < 1$, $\ln(\lambda a + (1 - \lambda)b) > \lambda \ln(a) + (1 - \lambda)\ln(b)$;
hence,
$$\lambda a + (1 - \lambda)b \ge a^{\lambda} b^{1-\lambda}. \qquad (2.21)$$
Now let X and Y be random variables, and put $a = |X|^p / E(|X|^p)$, $b = |Y|^q / E(|Y|^q)$, where $p > 1$ and $p^{-1} + q^{-1} = 1$. Then it follows from (2.21), with $\lambda = 1/p$ and $1 - \lambda = 1/q$, that
$$\frac{|X|^p}{p\,E(|X|^p)} + \frac{|Y|^q}{q\,E(|Y|^q)} \ge \left(\frac{|X|^p}{E(|X|^p)}\right)^{1/p} \left(\frac{|Y|^q}{E(|Y|^q)}\right)^{1/q} = \frac{|X \cdot Y|}{(E(|X|^p))^{1/p}\,(E(|Y|^q))^{1/q}}.$$
Taking expectations yields Holder’s inequality:
$$E(|X \cdot Y|) \le (E(|X|^p))^{1/p}\,(E(|Y|^q))^{1/q}, \quad \text{where } p > 1 \text{ and } \frac{1}{p} + \frac{1}{q} = 1. \qquad (2.22)$$
For the case $p = q = 2$, inequality (2.22) reads $E(|X \cdot Y|) \le \sqrt{E(X^2)}\,\sqrt{E(Y^2)}$, which is known as the Cauchy-Schwarz inequality.
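As a numerical illustration (not from the text), the following sketch verifies (2.22) with sample moments in place of expectations, including the Cauchy-Schwarz case p = q = 2; the example distributions, the exponents tried, the seed, and the use of NumPy are arbitrary choices.

```python
import numpy as np

# Illustrative check of Holder's inequality (2.22) and its Cauchy-Schwarz
# special case p = q = 2, using sample moments in place of expectations.
rng = np.random.default_rng(4)
x = rng.normal(size=1_000_000)
y = rng.exponential(size=1_000_000)   # arbitrary example distributions

for p in (1.5, 2.0, 3.0):
    q = p / (p - 1.0)                 # conjugate exponent: 1/p + 1/q = 1
    lhs = np.mean(np.abs(x * y))
    rhs = np.mean(np.abs(x) ** p) ** (1 / p) * np.mean(np.abs(y) ** q) ** (1 / q)
    print(p, lhs <= rhs)              # True for each p
```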