Mostly Harmless Econometrics: An Empiricist’s Companion
Appendix
Derivation of Equation (4.6.8)
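Rewrite equation (4.6.7) as follows:
$$Y_{ij} = \mu^* + \pi_0 \tau_i + (\pi_0 + \pi_1)\bar{S}_j + \nu_i,$$
where $\tau_i \equiv s_i - \bar{S}_j$. Since $\tau_i$ and $\bar{S}_j$ are uncorrelated by construction, we have:
$$\rho_1 = \pi_0 + \pi_1,$$
$$\pi_0 = \frac{C(\tau_i, Y_{ij})}{V(\tau_i)}.$$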
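Simplifying the second line,
$$\pi_0 = \frac{C[(s_i - \bar{S}_j), Y_{ij}]}{V(s_i) - V(\bar{S}_j)}$$
$$= \frac{C(s_i, Y_{ij})}{V(s_i)} \cdot \frac{V(s_i)}{V(s_i) - V(\bar{S}_j)} - \frac{C(\bar{S}_j, Y_{ij})}{V(\bar{S}_j)} \cdot \frac{V(\bar{S}_j)}{V(s_i) - V(\bar{S}_j)}$$
$$= \rho_0 \phi + \rho_1 (1 - \phi) = \rho_1 + \phi(\rho_0 - \rho_1),$$
where $\phi \equiv \frac{V(s_i)}{V(s_i) - V(\bar{S}_j)}$. Solving for $\pi_1$, we have
$$\pi_1 = \rho_1 - \pi_0 = \phi(\rho_1 - \rho_0).$$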
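The algebra above can be checked numerically. The following is a minimal sketch on simulated data, not part of the original derivation: the group structure, sample sizes, and coefficient values are all hypothetical, and $\rho_0$ and $\rho_1$ are taken to be the bivariate regression coefficients of $Y_{ij}$ on $s_i$ and on $\bar{S}_j$ respectively. Because deviations from group means are orthogonal to the group means in any sample, the two expressions should agree with the multivariate OLS coefficients up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design: 50 groups ("states") with 40 individuals each.
n_groups, n_per = 50, 40
group = np.repeat(np.arange(n_groups), n_per)

# Individual schooling with a group-level component, so group means vary.
s = rng.normal(12, 2, size=group.size) + rng.normal(0, 1, size=n_groups)[group]
# Group average schooling, broadcast back to individuals.
S_bar = (np.bincount(group, weights=s) / np.bincount(group))[group]
# Outcome with made-up coefficients on own and average schooling.
y = 1.0 + 0.08 * s + 0.05 * S_bar + rng.normal(0, 0.5, size=group.size)


def cov(a, b):
    """Sample covariance (divisor n); cov(a, a) is the sample variance."""
    return np.mean((a - a.mean()) * (b - b.mean()))


rho0 = cov(s, y) / cov(s, s)              # bivariate regression of y on s
rho1 = cov(S_bar, y) / cov(S_bar, S_bar)  # bivariate regression of y on S_bar
phi = cov(s, s) / (cov(s, s) - cov(S_bar, S_bar))

# Multivariate OLS of y on a constant, s, and S_bar.
X = np.column_stack([np.ones_like(s), s, S_bar])
_, pi0, pi1 = np.linalg.lstsq(X, y, rcond=None)[0]

print("pi0:", pi0, "formula:", rho1 + phi * (rho0 - rho1))
print("pi1:", pi1, "formula:", phi * (rho1 - rho0))
```

The same divisor is used for every variance and covariance, so the identity holds exactly in the sample rather than only in expectation.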
Derivation of the approximate bias of 2SLS
Start from the last equality in (4.6.20):
$$E[\hat{\beta}_{2SLS} - \beta] \approx [E(\pi'Z'Z\pi) + E(\xi'P_Z\xi)]^{-1} E(\xi'P_Z\eta).$$
The magic of linear algebra helps us simplify this expression: the term $\xi'P_Z\xi$ is a scalar and therefore equal to its trace; the trace is a linear operator, which passes through expectations and is invariant to cyclic permutations; finally, the trace of $P_Z$, an idempotent matrix, is equal to its rank, $Q$. Using these facts, we have
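$$E(\xi'P_Z\xi) = E[\mathrm{tr}(\xi'P_Z\xi)] = E[\mathrm{tr}(P_Z\xi\xi')] = \mathrm{tr}(P_Z E[\xi\xi'])$$
$$= \mathrm{tr}(P_Z \sigma_\xi^2 I)$$
$$= \sigma_\xi^2 \, \mathrm{tr}(P_Z)$$
$$= \sigma_\xi^2 Q,$$
where we have assumed that $\xi_i$ is homoskedastic. Similarly, applying the trace trick to $\xi'P_Z\eta$ shows that this term is equal to $\sigma_{\eta\xi} Q$. Therefore,
$$E[\hat{\beta}_{2SLS} - \beta] \approx [E(\pi'Z'Z\pi) + \sigma_\xi^2 Q]^{-1} E[\mathrm{tr}(\xi'P_Z\eta)]$$
$$= \sigma_{\eta\xi} Q \, [E(\pi'Z'Z\pi) + \sigma_\xi^2 Q]^{-1}.$$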
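The trace argument can be illustrated by simulation. The sketch below is an illustration under stated assumptions rather than part of the derivation: the instrument matrix is held fixed, the errors are drawn as homoskedastic bivariate normals, and the dimensions and variance values are hypothetical. It checks that the projection matrix is idempotent with trace $Q$, and that the simulated averages of the two quadratic forms are close to $\sigma_\xi^2 Q$ and $\sigma_{\eta\xi} Q$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: n observations, Q instruments, many error draws.
n, Q, draws = 100, 5, 20000
Z = rng.normal(size=(n, Q))                 # instruments, held fixed
P_Z = Z @ np.linalg.solve(Z.T @ Z, Z.T)     # projection matrix Z (Z'Z)^{-1} Z'

# Assumed error moments: V(xi), V(eta), C(eta, xi).
sigma_xi2, sigma_eta2, sigma_eta_xi = 4.0, 1.0, 1.0
err_cov = np.array([[sigma_xi2, sigma_eta_xi],
                    [sigma_eta_xi, sigma_eta2]])

# Homoskedastic errors, independent across observations; shape (draws, n, 2).
errs = rng.multivariate_normal(np.zeros(2), err_cov, size=(draws, n))
xi, eta = errs[..., 0], errs[..., 1]

# Quadratic forms xi' P_Z xi and xi' P_Z eta, one value per draw.
xi_P = xi @ P_Z
quad_xi = np.sum(xi_P * xi, axis=1)
quad_cross = np.sum(xi_P * eta, axis=1)

print("tr(P_Z):", np.trace(P_Z), "Q:", Q)
print("mean xi'P_Z xi:", quad_xi.mean(), "sigma_xi^2 Q:", sigma_xi2 * Q)
print("mean xi'P_Z eta:", quad_cross.mean(), "sigma_eta_xi Q:", sigma_eta_xi * Q)
```

The simulated means match the formulas only up to Monte Carlo error, which shrinks as the number of draws grows.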
Multivariate first-stage F-statistics
Assume any exogenous covariates have been partialled out of the instrument list and that there are two endogenous variables, $x_1$ and $x_2$, with coefficients $\delta_1$ and $\delta_2$. We are interested in the bias of the 2SLS estimator of $\delta_2$ when $x_1$ is also treated as endogenous. The second stage equation is
$$y = P_Z x_1 \delta_1 + P_Z x_2 \delta_2 + [\eta + (x_1 - P_Z x_1)\delta_1 + (x_2 - P_Z x_2)\delta_2], \qquad (4.7.1)$$
where $P_Z x_1$ and $P_Z x_2$ are the first-stage fitted values from regressions of $x_1$ and $x_2$ on $Z$. By the usual anatomy formula for multivariate regression, $\delta_2$ in (4.7.1) is the coefficient from a bivariate regression of $y$ on the residual from a regression of $P_Z x_2$ on $P_Z x_1$. This residual is
$$[I - P_Z x_1 (x_1' P_Z x_1)^{-1} x_1' P_Z] P_Z x_2 = M_{1z} P_Z x_2,$$
where $M_{1z} = [I - P_Z x_1 (x_1' P_Z x_1)^{-1} x_1' P_Z]$ is the relevant residual-maker matrix. In addition, note that $M_{1z} P_Z x_2 = P_Z [M_{1z} x_2]$.
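The commuting property of the residual-maker and the projection matrix is easy to confirm numerically. The sketch below uses small simulated matrices with hypothetical dimensions; the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical dimensions: n observations, q instruments.
n, q = 60, 3
Z = rng.normal(size=(n, q))
x1 = Z @ rng.normal(size=q) + rng.normal(size=n)
x2 = Z @ rng.normal(size=q) + rng.normal(size=n)

P_Z = Z @ np.linalg.solve(Z.T @ Z, Z.T)   # projection onto the columns of Z
x1_hat = P_Z @ x1                         # first-stage fitted values for x1

# Residual maker for a regression on x1_hat. Since x1' P_Z x1 equals
# x1_hat' x1_hat, this is I - P_Z x1 (x1' P_Z x1)^{-1} x1' P_Z.
M_1z = np.eye(n) - np.outer(x1_hat, x1_hat) / (x1_hat @ x1_hat)

lhs = M_1z @ (P_Z @ x2)   # residual from regressing P_Z x2 on P_Z x1
rhs = P_Z @ (M_1z @ x2)   # fitted values from regressing M_1z x2 on Z
print("max abs difference:", np.max(np.abs(lhs - rhs)))
```

The two vectors coincide because $P_Z x_1$ already lies in the column space of $Z$, so projecting before or after residualizing gives the same result.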
From here we conclude that the 2SLS estimator of $\delta_2$ is the OLS regression on $P_Z [M_{1z} x_2]$, in other words, OLS on the fitted values from a regression of $M_{1z} x_2$ on $Z$. This is the same as 2SLS using $P_Z$ to instrument $M_{1z} x_2$. So the 2SLS estimator of $\delta_2$ can be written
$$[x_2' M_{1z} P_Z M_{1z} x_2]^{-1} x_2' M_{1z} P_Z y = \delta_2 + [x_2' M_{1z} P_Z M_{1z} x_2]^{-1} x_2' M_{1z} P_Z \eta.$$
The explained sum of squares (numerator of the F-statistic) that determines the bias of the 2SLS estimator of $\delta_2$ is therefore the expectation of $[x_2' M_{1z} P_Z M_{1z} x_2]$, while the bias comes from the fact that the expectation $E[\xi' M_{1z} P_Z \eta]$ is non-zero when $\eta$ and $\xi$ are correlated.
Here’s how to compute this F-statistic in practice: (a) Regress the regressor of interest, $x_2$, on the other first-stage fitted values and any exogenous covariates. Save the residuals from this step, which correspond to $M_{1z} x_2$ in the derivation above; (b) Construct the F-statistic for excluded instruments in a first-stage regression of the residuals from (a) on the excluded instruments. Note that you should get the 2SLS coefficient of interest in a 2SLS procedure where the residuals from (a) are instrumented using $Z$, with no other covariates or endogenous variables. Use this fact to check your calculation.
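The two steps and the suggested check can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not a definitive implementation: the data are simulated with hypothetical coefficients, there are no exogenous covariates or intercept (the instruments are demeaned), and the F-statistic uses a plain $Q$ and $n - Q$ degrees-of-freedom convention, which may differ from what a given software package reports. Following the derivation, the sketch residualizes $x_2$ itself on $P_Z x_1$ to obtain $M_{1z} x_2$; residualizing the fitted values $P_Z x_2$ instead would give a vector lying in the column space of $Z$, which the regression in step (b) would fit perfectly.

```python
import numpy as np

rng = np.random.default_rng(0)
n, q = 2000, 4  # hypothetical sample size and number of excluded instruments

# Simulated instruments, demeaned so that no intercept is needed.
Z = rng.normal(size=(n, q))
Z -= Z.mean(axis=0)

# Correlated errors (xi1, xi2, eta) make x1 and x2 endogenous.
errs = rng.multivariate_normal(
    np.zeros(3),
    [[1.0, 0.3, 0.5], [0.3, 1.0, 0.4], [0.5, 0.4, 1.0]],
    size=n,
)
xi1, xi2, eta = errs.T
x1 = Z @ np.array([1.0, 0.5, 0.0, 0.2]) + xi1
x2 = Z @ np.array([0.3, 0.0, 1.0, 0.5]) + xi2
y = 1.0 * x1 + 2.0 * x2 + eta


def fitted(Z, v):
    """Fitted values from an OLS regression of v on the columns of Z."""
    coef = np.linalg.lstsq(Z, v, rcond=None)[0]
    return Z @ coef


x1_hat = fitted(Z, x1)
x2_hat = fitted(Z, x2)

# Step (a): residual from regressing x2 on the other fitted values (M_1z x2).
r2 = x2 - x1_hat * (x1_hat @ x2) / (x1_hat @ x1_hat)

# Step (b): F-statistic from regressing the step (a) residual on Z.
r2_hat = fitted(Z, r2)
ess = r2_hat @ r2_hat
rss = (r2 - r2_hat) @ (r2 - r2_hat)
F = (ess / q) / (rss / (n - q))  # assumed degrees-of-freedom convention

# Check: 2SLS of y on r2 alone, instrumented by Z, against full 2SLS.
delta2_check = (r2_hat @ y) / (r2_hat @ r2_hat)
delta_full = np.linalg.lstsq(np.column_stack([x1_hat, x2_hat]), y, rcond=None)[0]

print("F-statistic:", F)
print("delta_2 (full 2SLS):", delta_full[1], "check:", delta2_check)
```

The final comparison is the calculation check described in the text: the coefficient from instrumenting the step (a) residual with $Z$ should match the full 2SLS coefficient on $x_2$ up to floating-point error.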
CHAPTER 4. INSTRUMENTAL VARIABLES IN ACTION