Partitioned Regression and the Frisch-Waugh-Lovell Theorem
In Chapter 4, we studied a useful property of least squares which allows us to interpret multiple regression coefficients as simple regression coefficients. This was called the residualing interpretation of multiple regression coefficients. In general, this property applies whenever the $k$ regressors given by $X$ can be separated into two sets of variables $X_1$ and $X_2$ of dimension $(n \times k_1)$ and $(n \times k_2)$, respectively, with $X = [X_1, X_2]$ and $k = k_1 + k_2$. The regression in equation (7.1) becomes a partitioned regression given by
$y = X\beta + u = X_1\beta_1 + X_2\beta_2 + u$   (7.8)
One may be interested in the least squares estimates of $\beta_2$ corresponding to $X_2$, but one has to control for the presence of $X_1$, which may include seasonal dummy variables or a time trend; see Frisch and Waugh (1933) and Lovell (1963).
The OLS normal equations from (7.8) are as follows:
$\begin{pmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{pmatrix} \begin{pmatrix} \hat{\beta}_{1,OLS} \\ \hat{\beta}_{2,OLS} \end{pmatrix} = \begin{pmatrix} X_1'y \\ X_2'y \end{pmatrix}$   (7.9)
These can be solved by partitioned inversion of the matrix on the left (see the Appendix to this chapter) or by solving two equations in two unknowns. Problem 2 asks the reader to verify that
$\hat{\beta}_{2,OLS} = (X_2'\bar{P}_{X_1}X_2)^{-1}X_2'\bar{P}_{X_1}y$   (7.10)
where $\bar{P}_{X_1} = I_n - P_{X_1}$ and $P_{X_1} = X_1(X_1'X_1)^{-1}X_1'$. Here $P_{X_1}$ is the orthogonal projection matrix on the space spanned by $X_1$, and $\bar{P}_{X_1}X_2$ generates the least squares residuals of each column of $X_2$ regressed on all the variables in $X_1$. In fact, if we denote $\widetilde{X}_2 = \bar{P}_{X_1}X_2$ and $\widetilde{y} = \bar{P}_{X_1}y$, then (7.10) can be written as
$\hat{\beta}_{2,OLS} = (\widetilde{X}_2'\widetilde{X}_2)^{-1}\widetilde{X}_2'\widetilde{y}$   (7.11)
using the fact that $\bar{P}_{X_1}$ is idempotent. This implies that $\hat{\beta}_{2,OLS}$ can be obtained from the regression of $\widetilde{y}$ on $\widetilde{X}_2$. In words, the residuals from regressing $y$ on $X_1$ are in turn regressed upon the residuals from each column of $X_2$ regressed on all the variables in $X_1$. This was illustrated in Chapter 4 with some examples. Following Davidson and MacKinnon (1993), we state this result more formally as the Frisch-Waugh-Lovell (FWL) Theorem. In fact, if we premultiply (7.8) by $\bar{P}_{X_1}$ and use the fact that $\bar{P}_{X_1}X_1 = 0$, one gets
$\bar{P}_{X_1}y = \bar{P}_{X_1}X_2\beta_2 + \bar{P}_{X_1}u$   (7.12)
The FWL Theorem states that: (1) the least squares estimates of $\beta_2$ from equations (7.8) and (7.12) are numerically identical, and (2) the least squares residuals from equations (7.8) and (7.12) are identical.
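Both parts of the theorem are easy to verify numerically. The following sketch (in Python with numpy; the data-generating process and variable names are purely illustrative and not part of the text) compares the estimate of $\beta_2$ and the residuals from the full regression (7.8) with those from the residualized regression (7.12):

import numpy as np

rng = np.random.default_rng(0)
n, k1, k2 = 100, 3, 2

# Illustrative regressors: X1 holds a constant, a time trend, and one more variable
X1 = np.column_stack([np.ones(n), np.arange(n), rng.standard_normal(n)])
X2 = rng.standard_normal((n, k2))
X = np.hstack([X1, X2])
y = X @ np.array([1.0, 0.5, -1.0, 2.0, -0.5]) + rng.standard_normal(n)

# Full regression (7.8): y on [X1, X2]
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
resid_full = y - X @ beta_full

# Annihilator of X1: P1_bar = I_n - X1 (X1'X1)^{-1} X1'
P1_bar = np.eye(n) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)

# Residualized regression (7.12): P1_bar y on P1_bar X2
X2_tilde, y_tilde = P1_bar @ X2, P1_bar @ y
beta2_fwl, *_ = np.linalg.lstsq(X2_tilde, y_tilde, rcond=None)
resid_fwl = y_tilde - X2_tilde @ beta2_fwl

# Part (1): identical estimates of beta_2;  Part (2): identical residuals
assert np.allclose(beta_full[k1:], beta2_fwl)
assert np.allclose(resid_full, resid_fwl)

The same check goes through for any choice of $X_1$ and $X_2$ of full column rank.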
Using the fact that $\bar{P}_{X_1}$ is idempotent, it immediately follows that OLS on (7.12) yields $\hat{\beta}_{2,OLS}$ as given by equation (7.10). Alternatively, one can start from equation (7.8) and use the result that
$y = P_X y + \bar{P}_X y = X\hat{\beta}_{OLS} + \bar{P}_X y = X_1\hat{\beta}_{1,OLS} + X_2\hat{\beta}_{2,OLS} + \bar{P}_X y$   (7.13)
where $P_X = X(X'X)^{-1}X'$ and $\bar{P}_X = I_n - P_X$. Premultiplying equation (7.13) by $X_2'\bar{P}_{X_1}$ and using the fact that $\bar{P}_{X_1}X_1 = 0$, one gets
$X_2'\bar{P}_{X_1}y = X_2'\bar{P}_{X_1}X_2\hat{\beta}_{2,OLS} + X_2'\bar{P}_{X_1}\bar{P}_X y$   (7.14)
But $P_{X_1}P_X = P_{X_1}$, since the columns of $X_1$ lie in the space spanned by $X$. Hence, $\bar{P}_{X_1}\bar{P}_X = \bar{P}_X$. Using this fact along with $\bar{P}_X X = \bar{P}_X[X_1, X_2] = 0$, the last term of equation (7.14) becomes $X_2'\bar{P}_X y = (\bar{P}_X X_2)'y = 0$ and drops out, yielding the result that $\hat{\beta}_{2,OLS}$ from (7.14) is identical to the expression in (7.10). Note that no partitioned inversion was used in this proof. This proves part (1) of the FWL Theorem.
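The projection algebra invoked in this step can also be checked numerically. A minimal sketch, assuming numpy and arbitrary full-rank matrices X1 and X2 (the names and dimensions are chosen only for illustration), confirms that $P_{X_1}P_X = P_{X_1}$, that $\bar{P}_{X_1}\bar{P}_X = \bar{P}_X$, and that $\bar{P}_X[X_1, X_2] = 0$:

import numpy as np

rng = np.random.default_rng(0)
n = 50
X1 = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
X2 = rng.standard_normal((n, 2))
X = np.hstack([X1, X2])

# Projections on X1 and on the full X, and the corresponding annihilators
P1 = X1 @ np.linalg.solve(X1.T @ X1, X1.T)
PX = X @ np.linalg.solve(X.T @ X, X.T)
P1_bar, PX_bar = np.eye(n) - P1, np.eye(n) - PX

# The column space of X1 is contained in that of X, hence:
assert np.allclose(P1 @ PX, P1)              # P_{X1} P_X = P_{X1}
assert np.allclose(P1_bar @ PX_bar, PX_bar)  # P1_bar PX_bar = PX_bar
assert np.allclose(PX_bar @ X, 0)            # PX_bar [X1, X2] = 0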
Also, premultiplying equation (7.13) by $\bar{P}_{X_1}$ and using the facts that $\bar{P}_{X_1}X_1 = 0$ and $\bar{P}_{X_1}\bar{P}_X = \bar{P}_X$, one gets
$\bar{P}_{X_1}y = \bar{P}_{X_1}X_2\hat{\beta}_{2,OLS} + \bar{P}_X y$   (7.15)
Now $\hat{\beta}_{2,OLS}$ was shown to be numerically identical to the least squares estimate obtained from equation (7.12). Hence, the first term on the right hand side of equation (7.15) must be the fitted values from equation (7.12). Since the dependent variables are the same in equations (7.15) and (7.12), $\bar{P}_X y$ in equation (7.15) must be the least squares residuals from regression (7.12). But $\bar{P}_X y$ is the vector of least squares residuals from regression (7.8). Hence, the least squares residuals from regressions (7.8) and (7.12) are numerically identical. This proves part (2) of the FWL Theorem.
Several applications of the FWL Theorem will be given in this book. Problem 2 shows that if $X_1$ is the vector of ones, indicating the presence of a constant in the regression, then regression (7.15) is equivalent to running $(y_i - \bar{y})$ on the set of variables in $X_2$ expressed as deviations from their respective sample means. Problem 3 shows that the FWL Theorem can be used to prove that including a dummy variable for one of the observations in the regression is equivalent to omitting that observation from the regression.
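As a hedged illustration of these two applications, the sketch below (again in Python with numpy; the simulated data, observation index, and variable names are made up for the example and are not taken from Problems 2 and 3) checks that demeaning reproduces the slope estimates when $X_1$ is a column of ones, and that adding an observation-specific dummy reproduces the estimates obtained by dropping that observation:

import numpy as np

rng = np.random.default_rng(1)
n = 60
X2 = rng.standard_normal((n, 2))
y = 1.0 + X2 @ np.array([2.0, -0.5]) + rng.standard_normal(n)
Xc = np.hstack([np.ones((n, 1)), X2])

# (a) X1 = column of ones: residualizing on it is just demeaning
b_const, *_ = np.linalg.lstsq(Xc, y, rcond=None)
b_dev, *_ = np.linalg.lstsq(X2 - X2.mean(axis=0), y - y.mean(), rcond=None)
assert np.allclose(b_const[1:], b_dev)

# (b) A dummy for observation 0 is equivalent to omitting observation 0
d = np.zeros((n, 1)); d[0] = 1.0
b_dummy, *_ = np.linalg.lstsq(np.hstack([Xc, d]), y, rcond=None)
b_omit, *_ = np.linalg.lstsq(Xc[1:], y[1:], rcond=None)
assert np.allclose(b_dummy[:Xc.shape[1]], b_omit)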