INTRODUCTION TO STATISTICS AND ECONOMETRICS
HIGHER MOMENTS
As noted in Section 4.1, the expected value, or the mean, is a measure of the central location of the probability distribution of a random variable. Although it is probably the single most important measure of the characteristics of a probability distribution, it alone cannot capture all of the characteristics. For example, in the coin-tossing gamble of the previous section, suppose one must choose between two random variables, $X_1$ and $X_2$, where $X_1$ is 1 or 0 with probability 0.5 for each value and $X_2$ is 0.5 with probability 1. Though the two random variables have the same mean, they are obviously very different.
The characteristics of the probability distribution of random variable X can be represented by a sequence of moments defined either as
(4.2.1) kth moment around zero = $EX^k$ or
(4.2.2) kth moment around mean = $E(X - EX)^k$.
Knowing all the moments (either around zero or around the mean) for $k = 1, 2, \ldots$ is equivalent to knowing the probability distribution completely. The expected value (or the mean) is the first moment around zero. Since either $x^k$ or $(x - EX)^k$ is clearly a continuous function of $x$, moments can be evaluated using the formulae in Theorems 4.1.1 and 4.1.2.
As we defined sample mean in the previous section, we can similarly define the sample kth moment around zero. Let $X_1, X_2, \ldots, X_n$ be mutually independent and identically distributed as $X$. Then $n^{-1}\sum_{i=1}^{n} X_i^k$ is the sample kth moment around zero based on a sample of size $n$. Like the sample mean, the sample kth moment converges to the population kth moment in probability, as will be shown in Chapter 6.
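The sample kth moment is simple to compute directly. The following is a minimal sketch, assuming simulated tosses of the fair 0/1 coin from the earlier example; the names `sample_moment` and `simulate_coin`, the seed, and the sample size are illustrative choices, not anything prescribed by the text.

```python
import random


def sample_moment(xs, k):
    """Sample k-th moment around zero: (1/n) * sum of x_i ** k."""
    return sum(x ** k for x in xs) / len(xs)


def simulate_coin(n, seed=0):
    """Draw n independent values, each 0 or 1 with probability 0.5 (assumed setup)."""
    rng = random.Random(seed)
    return [rng.randint(0, 1) for _ in range(n)]


xs = simulate_coin(100_000)
m1 = sample_moment(xs, 1)  # sample mean
m2 = sample_moment(xs, 2)  # sample second moment around zero
print(m1, m2)
```

Because a 0/1 variable satisfies $X^k = X$, every population moment around zero equals 0.5 here, so both printed values should lie close to 0.5 for a large sample.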
Next to the mean, by far the most important moment is the second moment around the mean, which is called the variance. Denoting the variance of X by VX, we have
DEFINITION 4.2.1
$VX = E(X - EX)^2 = EX^2 - (EX)^2$.
The second equality in this definition can be easily proved by expanding the squared term and using Theorem 4.1.6. It gives a more convenient formula than the first. It says that the variance is the mean of the square minus the square of the mean. The square root of the variance is called the standard deviation and is denoted by $\sigma$. (Therefore variance is sometimes denoted by $\sigma^2$ instead of $V$.) From the definition it is clear that $VX \geq 0$ for any random variable and that $VX = 0$ if and only if $X = EX$ with probability 1.
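The expansion behind the second equality can be written out step by step. The sketch below assumes only that expectation is linear and that $EX$ is a constant, which is presumably what the cited theorem supplies:

```latex
\begin{align*}
E(X - EX)^2 &= E\left[X^2 - 2X \cdot EX + (EX)^2\right] \\
            &= EX^2 - 2(EX)(EX) + (EX)^2 \\
            &= EX^2 - (EX)^2 .
\end{align*}
```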
The variance measures the degree of dispersion of a probability distribution. In the example of the coin-tossing gamble we have $VX_1 = 1/4$ and $VX_2 = 0$. (As can be deduced from the definition, the variance of any constant is 0.) The following three examples indicate that the variance is an effective measure of dispersion.
EXAMPLE 4.2.1
$X = a$ with probability $1/2$, $= -a$ with probability $1/2$.
$VX = EX^2 = a^2$.
EXAMPLE 4.2.3 (same as Example 4.1.1). X has density $f(x) = 2x^{-3}$, $1 < x$.
$EX^2 = 2\int_1^\infty x^{-1}\,dx = 2[\log x]_1^\infty = \infty$.
$\therefore VX = \infty$.
Note that we previously obtained $EX = 2$, which shows that the variance is more strongly affected by the fat tail than the mean is.
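The contrast between a finite mean and an infinite variance can be seen by truncating the integrals at a finite upper limit $t$. The sketch below assumes the density $2x^{-3}$ on $x > 1$ and uses the closed forms $\int_1^t x \cdot 2x^{-3}\,dx = 2(1 - 1/t)$ and $\int_1^t x^2 \cdot 2x^{-3}\,dx = 2\log t$; the function names and the chosen limits are illustrative.

```python
import math


def truncated_first_moment(t):
    """Integral of x * 2x^(-3) over [1, t], which equals 2 * (1 - 1/t)."""
    return 2.0 * (1.0 - 1.0 / t)


def truncated_second_moment(t):
    """Integral of x^2 * 2x^(-3) over [1, t], which equals 2 * log(t)."""
    return 2.0 * math.log(t)


# The first column settles near 2; the second keeps growing with t.
for t in (10.0, 1e3, 1e6, 1e12):
    print(t, truncated_first_moment(t), truncated_second_moment(t))
```

The truncated first moment approaches 2 as $t$ grows, while the truncated second moment increases without bound, though only logarithmically.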
Examples 4.2.4 and 4.2.5 illustrate the use of the second formula of Definition 4.2.1 for computing the variance.
EXAMPLE 4.2.4 A die is loaded so that the probability of a given face turning up is proportional to the number on that face. Calculate the mean and variance for X, the face number showing.
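Since the probabilities are proportional to the face numbers and must sum to 1, $P(X = i) = i/21$ for $i = 1, 2, \ldots, 6$. We have, by Definition 4.1.1,
$EX = \frac{1}{21}(1 + 4 + 9 + 16 + 25 + 36) = \frac{13}{3}$.
Similarly,
$EX^2 = \frac{1}{21}(1 + 8 + 27 + 64 + 125 + 216) = 21$.
Therefore, by Definition 4.2.1,
$VX = 21 - \frac{169}{9} = \frac{20}{9}$.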
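The loaded-die calculation can be checked with exact rational arithmetic. This is a minimal sketch, assuming $P(X = i) = i/21$ as implied by the statement of the example; the variable names are illustrative.

```python
from fractions import Fraction

faces = range(1, 7)
total = sum(faces)  # 1 + 2 + ... + 6 = 21, the normalizing constant

# Probability of each face is proportional to the number on it.
pmf = {i: Fraction(i, total) for i in faces}

mean = sum(i * p for i, p in pmf.items())              # EX
second_moment = sum(i ** 2 * p for i, p in pmf.items())  # EX^2
variance = second_moment - mean ** 2                   # EX^2 - (EX)^2

print(mean, second_moment, variance)
```

Using `Fraction` avoids rounding, so the results can be compared directly with a hand calculation.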
EXAMPLE 4.2.5 X has density $f(x) = 2(1 - x)$ for $0 < x < 1$ and $= 0$ otherwise. Compute VX.
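By Definition 4.1.2 we have
(4.2.6) $EX = 2\int_0^1 (x - x^2)\,dx = \frac{1}{3}$.
By Theorem 4.1.2 we have
$EX^2 = 2\int_0^1 (x^2 - x^3)\,dx = \frac{1}{6}$.
Therefore, by Definition 4.2.1,
$VX = \frac{1}{6} - \frac{1}{9} = \frac{1}{18}$.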
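The moments in this example can be verified exactly. The sketch below assumes the density $f(x) = 2(1 - x)$ on $(0, 1)$ and uses the closed form $\int_0^1 x^k (2 - 2x)\,dx = 2/(k+1) - 2/(k+2)$; the function name `moment` is illustrative.

```python
from fractions import Fraction


def moment(k):
    """k-th moment around zero of the density f(x) = 2(1 - x) on (0, 1).

    Integrating x^k * (2 - 2x) term by term gives 2/(k+1) - 2/(k+2).
    """
    return Fraction(2, k + 1) - Fraction(2, k + 2)


total_probability = moment(0)          # should be 1 for a valid density
mean = moment(1)                       # EX
second_moment = moment(2)              # EX^2
variance = second_moment - mean ** 2   # EX^2 - (EX)^2

print(total_probability, mean, second_moment, variance)
```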
The following useful theorem is an easy consequence of the definition of the variance.
THEOREM 4.2.1 If $\alpha$ and $\beta$ are constants, we have
$V(\alpha X + \beta) = \alpha^2 VX$.
Note that Theorem 4.2.1 shows that adding a constant to a random variable does not change its variance. This makes intuitive sense because adding a constant changes only the central location of the probability distribution and not its dispersion, of which the variance is a measure.
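The theorem follows in a few lines from the definition of the variance. Writing the two constants as $\alpha$ (the multiplier) and $\beta$ (the additive term), and assuming only linearity of expectation, one way to set out the argument is:

```latex
\begin{align*}
V(\alpha X + \beta)
  &= E\left[(\alpha X + \beta) - E(\alpha X + \beta)\right]^2 \\
  &= E\left[\alpha X + \beta - \alpha EX - \beta\right]^2 \\
  &= E\left[\alpha (X - EX)\right]^2 \\
  &= \alpha^2 E(X - EX)^2 = \alpha^2 VX .
\end{align*}
```

The additive constant cancels in the second line, which is why it has no effect on the variance.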
We shall seldom need to know any other moment, but we mention the third moment around the mean. It is 0 if the probability distribution is symmetric around the mean, positive if it is positively skewed as in Figure 4.1, and negative if it is negatively skewed as the mirror image of Figure 4.1 would be.
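The sign behavior of the third moment around the mean can be illustrated with small discrete distributions. The three distributions below are hypothetical examples chosen for this sketch and are not taken from Figure 4.1; `central_moment` is an illustrative helper.

```python
from fractions import Fraction


def central_moment(pmf, k):
    """k-th moment around the mean for a discrete distribution {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())
    return sum((x - mean) ** k * p for x, p in pmf.items())


# Hypothetical distributions, chosen only to show the three cases.
symmetric = {-1: Fraction(1, 4), 0: Fraction(1, 2), 1: Fraction(1, 4)}
right_skewed = {0: Fraction(3, 4), 4: Fraction(1, 4)}          # long tail to the right
left_skewed = {-x: p for x, p in right_skewed.items()}          # mirror image

for name, pmf in [("symmetric", symmetric),
                  ("right_skewed", right_skewed),
                  ("left_skewed", left_skewed)]:
    print(name, central_moment(pmf, 3))
```

Reflecting a distribution around zero flips the sign of every odd moment around the mean, which is why the mirror image has a third moment of the same size and opposite sign.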