INTRODUCTION TO STATISTICS AND ECONOMETRICS
NORMAL RANDOM VARIABLES
The normal distribution is by far the most important continuous distribution used in statistics. Many reasons for its importance will become apparent as we study its properties below. We should mention that the binomial random variable X defined in Definition 5.1.1 is approximately normally distributed when n is large. This is a special case of the so-called central limit theorem, which we shall discuss in Chapter 6. Examples of the normal approximation of the binomial distribution are given in Section 6.3.
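DEFINITION 5.2.1 The normal density is given by
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right], \qquad -\infty < x < \infty. \tag{5.2.1}$$
When X has the above density, we write symbolically $X \sim N(\mu, \sigma^2)$.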
We can verify $\int_{-\infty}^{\infty} f(x)\,dx = 1$ for all $\mu$ and all positive $\sigma$ by a rather complicated procedure using polar coordinates; see, for example, Hoel (1984, p. 78). The direct evaluation of a general integral $\int_a^b f(x)\,dx$ is difficult, however, because the normal density has no antiderivative expressible in terms of elementary functions. Such an integral can be evaluated approximately from a normal probability table or by a computer program based on a numerical method.
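As a rough illustration of the numerical route just mentioned, here is a minimal Python sketch that approximates $\int_a^b f(x)\,dx$ with composite Simpson's rule. Simpson's rule is our own choice of quadrature, not one prescribed by the text, and the values $\mu = 10$, $\sigma = 2$, $a = 8$, $b = 13$ are arbitrary illustrations. The result is compared with `statistics.NormalDist.cdf` from the Python standard library, which evaluates the normal distribution function through the error function; the sketch also checks that the total area is 1.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Normal density (5.2.1) with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)


def simpson(f, a, b, n=1000):
    """Composite Simpson's rule for the integral of f over [a, b]; n must be even."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # odd interior points get weight 4, even interior points weight 2
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


mu, sigma = 10.0, 2.0  # illustrative parameters
a, b = 8.0, 13.0       # illustrative limits of integration

numeric = simpson(lambda x: normal_pdf(x, mu, sigma), a, b)
dist = NormalDist(mu, sigma)
reference = dist.cdf(b) - dist.cdf(a)

# Total mass: integrate over mu +/- 12 sigma, beyond which the tails are negligible
mass = simpson(lambda x: normal_pdf(x, mu, sigma), mu - 12 * sigma, mu + 12 * sigma, n=2000)

print(f"Simpson: {numeric:.8f}  NormalDist: {reference:.8f}  total mass: {mass:.8f}")
```

Any standard quadrature rule would do here; Simpson's rule is used only because it is short and very accurate for a smooth integrand such as the normal density.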
The normal density is completely characterized by two parameters, $\mu$ and $\sigma$. We have
THEOREM 5.2.1 Let X be $N(\mu, \sigma^2)$. Then $EX = \mu$ and $VX = \sigma^2$.
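Proof. Using the change of variable $z = (x - \mu)/\sigma$, we have
$$EX = \int_{-\infty}^{\infty} x\,\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]dx = \sigma\int_{-\infty}^{\infty} z\,\frac{1}{\sqrt{2\pi}}\exp(-z^2/2)\,dz + \mu\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}\exp(-z^2/2)\,dz. \tag{5.2.2}$$
But
$$\int_{-\infty}^{\infty} z\,\frac{1}{\sqrt{2\pi}}\exp(-z^2/2)\,dz = -\frac{1}{\sqrt{2\pi}}\Big[\exp(-z^2/2)\Big]_{-\infty}^{\infty} = 0 \tag{5.2.3}$$
and
$$\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}\exp(-z^2/2)\,dz = 1 \tag{5.2.4}$$
because the integrand in (5.2.4) is the density of $N(0, 1)$. Therefore, from (5.2.2), (5.2.3), and (5.2.4), we have $EX = \mu$. Next we have, integrating by parts,
$$VX = \int_{-\infty}^{\infty}(x-\mu)^2 f(x)\,dx = \sigma^2\int_{-\infty}^{\infty} z^2\,\frac{1}{\sqrt{2\pi}}\exp(-z^2/2)\,dz = \sigma^2\left\{-\frac{1}{\sqrt{2\pi}}\Big[z\exp(-z^2/2)\Big]_{-\infty}^{\infty} + \int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}\exp(-z^2/2)\,dz\right\} = \sigma^2. \tag{5.2.5}$$
□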
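Theorem 5.2.1 can also be spot-checked numerically. The Python sketch below approximates $EX$ and $VX$ by integrating $x f(x)$ and $(x - EX)^2 f(x)$ with composite Simpson's rule over $\mu \pm 12\sigma$ (a truncation we assume is harmless because the tails beyond it are negligible). The parameters $\mu = -3$, $\sigma = 1.5$ are illustrative; this is a sanity check for one case, not a substitute for the proof.

```python
import math


def normal_pdf(x, mu, sigma):
    """Normal density (5.2.1)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def normal_moments(mu, sigma, width=12.0):
    """Approximate EX and VX by integrating over mu +/- width*sigma."""
    lo, hi = mu - width * sigma, mu + width * sigma
    mean = simpson(lambda x: x * normal_pdf(x, mu, sigma), lo, hi)
    var = simpson(lambda x: (x - mean) ** 2 * normal_pdf(x, mu, sigma), lo, hi)
    return mean, var


ex, vx = normal_moments(mu=-3.0, sigma=1.5)  # illustrative parameters
print(f"EX ~ {ex:.6f} (mu = -3), VX ~ {vx:.6f} (sigma^2 = 2.25)")
```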
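From (5.2.1) it is clear that $f(x)$ is symmetric and bell-shaped around $\mu$; $EX = \mu$ follows directly from this fact. To study the effect of $\sigma$ on the shape of $f(x)$, observe
$$f(\mu) = \frac{1}{\sqrt{2\pi}\,\sigma}, \tag{5.2.6}$$
which shows that the larger $\sigma$ is, the flatter $f(x)$ is. (Since the total area under $f(x)$ is always 1, a lower peak means that the probability mass is spread more widely around $\mu$.)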
Theorem 5.2.2 shows an important property of a normal random variable: a linear function of a normal random variable is again normal.
THEOREM 5.2.2 Let X be $N(\mu, \sigma^2)$ and let $Y = \alpha + \beta X$, where $\beta \neq 0$. Then we have $Y \sim N(\alpha + \beta\mu, \beta^2\sigma^2)$.
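Proof. Using Theorem 3.6.1, the density $g(y)$ of Y is given by
$$g(y) = \frac{1}{|\beta|}\,f\!\left(\frac{y-\alpha}{\beta}\right) = \frac{1}{\sqrt{2\pi}\,|\beta|\sigma}\exp\left[-\frac{(y-\alpha-\beta\mu)^2}{2\beta^2\sigma^2}\right]. \tag{5.2.7}$$
Therefore, by Definition 5.2.1, $Y \sim N(\alpha + \beta\mu, \beta^2\sigma^2)$. □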
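A quick simulation makes Theorem 5.2.2 concrete. The Python sketch below draws from $N(\mu, \sigma^2)$ with `random.Random.gauss`, applies $Y = \alpha + \beta X$, and compares the sample mean, variance, and one tail probability with the values the theorem predicts. The parameter values, seed, and sample size are illustrative assumptions, and agreement of a few summaries is evidence, not proof, of normality.

```python
import random
import statistics

rng = random.Random(0)   # fixed seed for reproducibility
mu, sigma = 2.0, 3.0     # X ~ N(mu, sigma^2); illustrative values
alpha, beta = 1.0, -0.5  # Y = alpha + beta * X; illustrative values
n = 200_000

ys = [alpha + beta * rng.gauss(mu, sigma) for _ in range(n)]

mean_y = statistics.fmean(ys)
var_y = statistics.pvariance(ys, mu=mean_y)
# Under Theorem 5.2.2, Y ~ N(0, 2.25) here, so P(Y < 1.5) = P(Z < 1), about 0.8413
frac_below = sum(y < 1.5 for y in ys) / n

print(f"mean {mean_y:.4f} (theory {alpha + beta * mu})")
print(f"variance {var_y:.4f} (theory {beta ** 2 * sigma ** 2})")
print(f"P(Y < 1.5) {frac_below:.4f} (theory 0.8413)")
```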
A useful corollary of Theorem 5.2.2 is that if X is $N(\mu, \sigma^2)$, then $Z = (X - \mu)/\sigma$ is $N(0, 1)$, which is called the standard normal random variable. We will often need to evaluate the probability $P(x_1 < X < x_2)$ when X is $N(\mu, \sigma^2)$. Defining Z in the above way, we have
$$P(x_1 < X < x_2) = P\left(\frac{x_1 - \mu}{\sigma} < Z < \frac{x_2 - \mu}{\sigma}\right). \tag{5.2.8}$$
The right-hand side of (5.2.8) can be evaluated from the probability table of the standard normal distribution.
$$= P(-3 < Z < -1), \quad \text{where } Z \sim N(0, 1),$$
$$= P(Z < -1) - P(Z < -3) = 0.1587 - 0.0013 = 0.1574$$
from the standard normal table.
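The standardization (5.2.8) translates directly into code. Below is a minimal Python sketch; the helper name `prob_between` is our own, and the standard normal distribution function is taken from `statistics.NormalDist` instead of a printed table. Applied to the standardized step $P(-3 < Z < -1)$, it gives about 0.1573; the table-based 0.1574 above differs only in the fourth decimal because the table entries are themselves rounded.

```python
from statistics import NormalDist


def prob_between(x1, x2, mu, sigma):
    """P(x1 < X < x2) for X ~ N(mu, sigma^2), standardized as in (5.2.8)."""
    z = NormalDist()  # the standard normal N(0, 1)
    return z.cdf((x2 - mu) / sigma) - z.cdf((x1 - mu) / sigma)


# The standardized step of the example: P(-3 < Z < -1) with Z ~ N(0, 1)
p = prob_between(-3, -1, mu=0.0, sigma=1.0)
print(f"{p:.4f}")
```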
Sometimes the problem specifies a probability and asks one to determine the variance, as in the following example.
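EXAMPLE 5.2.2 Assume that the life in hours of a light bulb is normally distributed with mean 100. If it is required that the life should exceed 80 with probability at least 0.9, what is the largest value that $\sigma$ can have?
Let X be the life of a light bulb. Then $X \sim N(100, \sigma^2)$. We must determine $\sigma^2$ so as to satisfy
$$P(X > 80) \ge 0.9. \tag{5.2.10}$$
Defining $Z = (X - 100)/\sigma$, (5.2.10) is equivalent to $P(Z > -20/\sigma) \ge 0.9$. By symmetry $P(Z > -20/\sigma) = P(Z < 20/\sigma)$, and $P(Z < 1.28) \approx 0.9$ from the standard normal table, so we need $20/\sigma \ge 1.28$, that is, $\sigma \le 15.6$ approximately.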
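The argument of Example 5.2.2 generalizes to any mean, threshold, and required probability. The Python sketch below is one way to do it; the function name `max_sigma` is hypothetical, and it assumes the threshold lies below the mean and the required probability exceeds 0.5, which is the situation of the example. It uses `NormalDist().inv_cdf` in place of a table lookup, so it returns $20/1.2816 \approx 15.61$ instead of the rounded table value, and then confirms that the probability equals 0.9 at that boundary.

```python
from statistics import NormalDist


def max_sigma(mean, threshold, prob):
    """Largest sigma with P(X > threshold) >= prob for X ~ N(mean, sigma^2).

    P(X > threshold) = P(Z > -(mean - threshold)/sigma) = Phi((mean - threshold)/sigma),
    so the requirement is (mean - threshold)/sigma >= z, where Phi(z) = prob.
    """
    if not (threshold < mean and 0.5 < prob < 1):
        raise ValueError("sketch assumes threshold < mean and 0.5 < prob < 1")
    z = NormalDist().inv_cdf(prob)
    return (mean - threshold) / z


sigma_max = max_sigma(100, 80, 0.9)
achieved = 1 - NormalDist(100, sigma_max).cdf(80)  # should equal 0.9 at the boundary
print(f"sigma_max = {sigma_max:.2f}, P(X > 80) = {achieved:.4f}")
```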
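BIVARIATE NORMAL RANDOM VARIABLES
The bivariate normal density is defined by
$$f(x, y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x-\mu_x}{\sigma_x}\right)^2 - 2\rho\left(\frac{x-\mu_x}{\sigma_x}\right)\left(\frac{y-\mu_y}{\sigma_y}\right) + \left(\frac{y-\mu_y}{\sigma_y}\right)^2\right]\right\}, \tag{5.3.1}$$
where $-\infty < x, y < \infty$, $\sigma_x > 0$, $\sigma_y > 0$, and $-1 < \rho < 1$.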
THEOREM 5.3.1 Let (X, Y) have the density (5.3.1). Then the marginal densities $f(x)$ and $f(y)$ and the conditional densities $f(y \mid x)$ and $f(x \mid y)$ are univariate normal densities, and we have $EX = \mu_x$, $VX = \sigma_x^2$, $EY = \mu_y$, $VY = \sigma_y^2$, $\text{Correlation}(X, Y) = \rho$, and finally
$$E(Y \mid X) = \mu_y + \rho\frac{\sigma_y}{\sigma_x}(X - \mu_x), \qquad V(Y \mid X) = \sigma_y^2(1 - \rho^2). \tag{5.3.2}$$
Proof. The joint density $f(x, y)$ can be rewritten as
$$f(x, y) = f_1 f_2, \tag{5.3.3}$$
where $f_1$ is the density of $N(\mu_x, \sigma_x^2)$ and $f_2$ is the density of $N[\mu_y + \rho\sigma_y\sigma_x^{-1}(x - \mu_x), \sigma_y^2(1 - \rho^2)]$. All the assertions of the theorem follow from (5.3.3) without much difficulty. We have
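$$\begin{aligned} f(x) &= \int_{-\infty}^{\infty} f_1 f_2\,dy \\ &= f_1\int_{-\infty}^{\infty} f_2\,dy \quad \text{because } f_1 \text{ does not depend on } y \\ &= f_1 \quad \text{because } f_2 \text{ is a normal density.} \end{aligned} \tag{5.3.4}$$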
Therefore we immediately see $X \sim N(\mu_x, \sigma_x^2)$. By symmetry we have $Y \sim N(\mu_y, \sigma_y^2)$. Next we have
$$f(y \mid x) = \frac{f(x, y)}{f(x)} = \frac{f_1 f_2}{f_1} = f_2. \tag{5.3.5}$$
Therefore we can conclude that the conditional distribution of Y given X = x is $N[\mu_y + \rho\sigma_y\sigma_x^{-1}(x - \mu_x), \sigma_y^2(1 - \rho^2)]$. All that is left to show is that $\text{Correlation}(X, Y) = \rho$. We have by Theorem 4.4.1
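$$\begin{aligned} EXY &= E_X E(XY \mid X) = E_X[X E(Y \mid X)] \\ &= E_X[X\mu_y + \rho\sigma_y\sigma_x^{-1}X(X - \mu_x)] \\ &= \mu_x\mu_y + \rho\sigma_y\sigma_x, \end{aligned} \tag{5.3.6}$$
where the last step uses $E_X[X(X - \mu_x)] = VX = \sigma_x^2$. Therefore $\text{Cov}(X, Y) = EXY - EX\,EY = \rho\sigma_y\sigma_x$; hence $\text{Correlation}(X, Y) = \rho$. □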
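The factorization (5.3.3) on which this proof rests can be spot-checked numerically. The Python sketch below codes the density (5.3.1) directly and compares it with the product $f_1 f_2$ at a few points. The parameter values and evaluation points are arbitrary illustrations, and agreement at finitely many points is a sanity check on the algebra, not a proof.

```python
import math


def normal_pdf(x, mu, sigma):
    """Univariate normal density (5.2.1)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)


def bivariate_normal_pdf(x, y, mx, my, sx, sy, rho):
    """Bivariate normal density (5.3.1)."""
    u, v = (x - mx) / sx, (y - my) / sy
    q = (u * u - 2 * rho * u * v + v * v) / (1 - rho ** 2)
    return math.exp(-q / 2) / (2 * math.pi * sx * sy * math.sqrt(1 - rho ** 2))


def factored_pdf(x, y, mx, my, sx, sy, rho):
    """Product f1 * f2 from (5.3.3)."""
    f1 = normal_pdf(x, mx, sx)                         # marginal N(mx, sx^2)
    f2 = normal_pdf(y, my + rho * sy / sx * (x - mx),  # conditional mean of Y given x
                    sy * math.sqrt(1 - rho ** 2))      # conditional s.d. of Y given x
    return f1 * f2


params = dict(mx=1.0, my=2.0, sx=0.8, sy=1.5, rho=-0.6)  # illustrative values
points = [(0.0, 0.0), (1.3, 2.7), (-0.5, 4.0)]
for x, y in points:
    print(x, y, bivariate_normal_pdf(x, y, **params), factored_pdf(x, y, **params))
```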
In the above discussion we have given the bivariate normal density (5.3.1) as a definition and then derived its various properties in Theorem 5.3.1. We can also prove that (5.3.1) is indeed the only function of x and y that possesses these properties. The next theorem shows a very important property of the bivariate normal distribution.
THEOREM 5.3.2 If X and Y are bivariate normal and $\alpha$ and $\beta$ are constants, then $\alpha X + \beta Y$ is normal.
But clearly $f_3$ is the density of $N[\alpha\mu_x + \beta\mu_y + \rho^*(\sigma^*/\sigma_x)(x - \mu_x), (\sigma^*)^2(1 - \rho^{*2})]$ and $f_1$ is that of $N(\mu_x, \sigma_x^2)$, as before. We conclude, therefore, using Theorem 5.3.1 and equation (5.3.1), that $g(t)$ is a normal density. □
It is important to note that the conclusion of Theorem 5.3.2 does not necessarily follow if we merely assume that each of X and Y is univariately normal. See Ferguson (1967, p. 111) for an example of a pair of random variables that are each univariately normal but not jointly normal.
By applying Theorem 5.3.2 repeatedly, we can easily prove that a linear combination of n-variate normal random variables is normal. In particular, we have
THEOREM 5.3.3 Let $\{X_i\}$, $i = 1, 2, \ldots, n$, be mutually independent and identically distributed as $N(\mu, \sigma^2)$. Then $\bar{X} = (1/n)\sum_{i=1}^{n} X_i$ is $N(\mu, \sigma^2/n)$.
The following is another important property of the bivariate normal distribution.
THEOREM 5.3.4 If X and Y are bivariate normal and $\text{Cov}(X, Y) = 0$, then X and Y are independent.
Proof. Since $\text{Cov}(X, Y) = 0$ implies $\rho = 0$, putting $\rho = 0$ in (5.3.1) we immediately see that $f(x, y) = f(x)f(y)$. Therefore X and Y are independent by Definition 3.4.6. □
Note that the expression for $E(Y \mid X)$ obtained in (5.3.2) is precisely the best linear predictor of Y based on X, which was obtained in Theorem 4.3.6. Since we showed in Theorem 4.4.3 that $E(Y \mid X)$ is the best predictor of Y based on X, the best predictor and the best linear predictor coincide in the case of the normal distribution, another interesting feature of normality.
In the preceding discussion we proved (5.3.2) before we proved Theorems 5.3.2 and 5.3.4. It may be worthwhile to point out that (5.3.2) follows readily from Theorems 5.3.2 and 5.3.4 and equations (4.3.10), (4.3.11), and (4.3.12). Recall that these three equations imply that for any pair of random variables X and Y there exists a random variable Z such that
$$Y = \mu_y + \rho\frac{\sigma_y}{\sigma_x}(X - \mu_x) + \sigma_y Z, \tag{5.3.10}$$
$EZ = 0$, $VZ = 1 - \rho^2$, and $\text{Cov}(X, Z) = 0$. If, in addition, X and Y are bivariate normal, Z is also normal because of Theorem 5.3.2. Therefore Z and X are independent because of Theorem 5.3.4, which implies that
$E(Z \mid X) = EZ = 0$ and $V(Z \mid X) = VZ = 1 - \rho^2$. Therefore, taking the conditional mean and variance of both sides of (5.3.10), we arrive at (5.3.2).
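Read in reverse, (5.3.10) also suggests a way to generate bivariate normal pairs: draw X, draw an independent $Z \sim N(0, 1 - \rho^2)$, and form Y from (5.3.10). Because Y is then a linear combination of two independent normal variables, (X, Y) is bivariate normal. The Python sketch below implements this idea with illustrative parameters and a fixed seed, and checks that the simulated Y has mean $\mu_y$, standard deviation $\sigma_y$, and correlation $\rho$ with X. The helper names are our own, and the simulation checks only these moments.

```python
import math
import random
import statistics


def simulate_bivariate(n, mx, my, sx, sy, rho, seed=0):
    """Draw n pairs (X, Y) by the representation (5.3.10).

    X ~ N(mx, sx^2) and Z ~ N(0, 1 - rho^2) are drawn independently, and
    Y = my + rho * (sy / sx) * (X - mx) + sy * Z.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(mx, sx)
        z = rng.gauss(0.0, math.sqrt(1 - rho ** 2))
        xs.append(x)
        ys.append(my + rho * (sy / sx) * (x - mx) + sy * z)
    return xs, ys


def sample_corr(xs, ys):
    """Sample correlation coefficient of two equal-length lists."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


xs, ys = simulate_bivariate(100_000, mx=1.0, my=2.0, sx=0.8, sy=1.5, rho=-0.6)
print(f"EY ~ {statistics.fmean(ys):.3f}, SD(Y) ~ {statistics.pstdev(ys):.3f}, "
      f"Corr ~ {sample_corr(xs, ys):.3f}")
```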
Conversely, however, the linearity of E(Y | X) does not imply the joint normality of X and Y, as Example 4.4.4 shows. Examples 4.4.1 and 4.4.2 also indicate the same point. The following two examples are applications of Theorems 5.3.1 and 5.3.3, respectively.
EXAMPLE 5.3.1 Suppose X and Y are distributed jointly normal with $EX = 1$, $EY = 2$, $VX = VY = 1/3$, and the correlation coefficient $\rho = 1/2$. Calculate $P(2.2 < Y < 3.2 \mid X = 3)$.
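Using (5.3.2) we have
$$E(Y \mid X) = 2 + \tfrac{1}{2}(X - 1), \qquad E(Y \mid X = 3) = 3,$$
$$V(Y \mid X) = \tfrac{1}{3}\left(1 - \tfrac{1}{4}\right) = \tfrac{1}{4}.$$
Therefore, Y given X = 3 is $N(3, 1/4)$. Defining $Z = (Y - 3)/(1/2)$, which is $N(0, 1)$ conditionally on X = 3, we have
$$P(2.2 < Y < 3.2 \mid X = 3) = P(-1.6 < Z < 0.4) = P(Z < 0.4) - P(Z < -1.6) = 0.6554 - 0.0548 = 0.6006.$$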
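The two steps of Example 5.3.1, computing the conditional mean and variance from (5.3.2) and then a normal probability, can be sketched in Python as follows. The helper name `conditional_y_given_x` is our own, and `statistics.NormalDist` stands in for the standard normal table. It should report a conditional mean of 3, a conditional variance of 1/4, and a probability of about 0.6006.

```python
import math
from statistics import NormalDist


def conditional_y_given_x(x, mx, my, vx, vy, rho):
    """Mean and variance of Y given X = x, from (5.3.2)."""
    sx, sy = math.sqrt(vx), math.sqrt(vy)
    return my + rho * (sy / sx) * (x - mx), vy * (1 - rho ** 2)


m, v = conditional_y_given_x(3, mx=1, my=2, vx=1 / 3, vy=1 / 3, rho=0.5)
cond = NormalDist(m, math.sqrt(v))  # distribution of Y given X = 3
p = cond.cdf(3.2) - cond.cdf(2.2)
print(f"mean {m:.4f}, variance {v:.4f}, probability {p:.4f}")
```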
EXAMPLE 5.3.2 If you wish to estimate the mean of a normal population whose variance is 9, how large a sample should you take so that the probability is at least 0.8 that your estimate will not be in error by more than 0.5?
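Put $X_i \sim N(\mu, 9)$, $i = 1, 2, \ldots, n$, independently. Then, by Theorem 5.3.3,
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i \sim N\left(\mu, \frac{9}{n}\right).$$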
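We want to choose n so that
$$P(|\bar{X}_n - \mu| < 0.5) \ge 0.8.$$
Defining the standard normal $Z = \sqrt{n}(\bar{X}_n - \mu)/3$, the inequality above is equivalent to
$$P\left(|Z| < \frac{\sqrt{n}}{6}\right) \ge 0.8, \quad \text{that is,} \quad \frac{\sqrt{n}}{6} \ge 1.2816,$$
which implies $n \ge 59.13$. Therefore, the answer is 60.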
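The calculation in Example 5.3.2 is a standard sample-size formula, $n \ge (z\sigma/e)^2$, where $e$ is the tolerated error and $z$ is the $(1 + p)/2$ quantile of $N(0, 1)$ for the required probability $p$. A minimal Python sketch follows; the function name `required_sample_size` is hypothetical, and the quantile comes from `NormalDist().inv_cdf` instead of a table. With $\sigma = 3$, $e = 0.5$, and $p = 0.8$ it reproduces the bound 59.13 and the answer 60.

```python
import math
from statistics import NormalDist


def required_sample_size(sigma, error, prob):
    """Smallest n with P(|Xbar_n - mu| < error) >= prob when Xbar_n ~ N(mu, sigma^2/n).

    With Z = sqrt(n) * (Xbar_n - mu) / sigma ~ N(0, 1), the condition is
    P(|Z| < sqrt(n) * error / sigma) >= prob, i.e. sqrt(n) * error / sigma >= z,
    where z is the (1 + prob) / 2 quantile of N(0, 1).
    """
    z = NormalDist().inv_cdf((1 + prob) / 2)
    bound = (z * sigma / error) ** 2
    return bound, math.ceil(bound)


bound, n = required_sample_size(sigma=3, error=0.5, prob=0.8)
print(f"n >= {bound:.2f}, so take n = {n}")
```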