Mostly Harmless Econometrics: An Empiricist’s Companion
IV Details
2SLS estimates are easy to compute, especially since software like SAS and Stata will do it for you. Occasionally, however, you might be tempted to do it yourself just to see if it really works. Or you may be stranded on the planet Krikkit with all of your software licenses expired (Krikkit is encased in a slo-time envelope, so it will take you a long time to get licenses renewed). "Manual 2SLS" is for just such emergencies. In the Manual 2SLS procedure, you estimate the first stage yourself (which in any case, you should be looking at), and plug the fitted values into the second stage equation, which is then estimated by OLS. Returning to the system at the beginning of this chapter, the first and second stages are
$$s_i = X_i'\pi_{10} + \pi_{11}'Z_i + \xi_{1i}$$
$$Y_i = \alpha'X_i + \rho\hat{s}_i + [\eta_i + \rho(s_i - \hat{s}_i)],$$
where $X_i$ is a set of covariates, $Z_i$ is a set of excluded instruments, and the first-stage fitted values are $\hat{s}_i = X_i'\hat{\pi}_{10} + \hat{\pi}_{11}'Z_i$.
Manual 2SLS takes some of the mystery out of canned 2SLS, and may be useful in a software crisis, but it opens the door to mistakes. For one thing, as we discussed earlier, the OLS standard errors from the manual second stage will not be correct (the OLS residual variance is the variance of $\eta_i + \rho(s_i - \hat{s}_i)$, while for proper 2SLS standard errors you want the variance of $\eta_i$ only). There are more subtle risks as well.
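The standard-error problem can be seen in a small simulation. The sketch below is only an illustration under assumed values: the data-generating process, the sample size, and the helper name `ols` are all hypothetical, and the degrees-of-freedom correction is one common convention rather than the only one. It runs manual 2SLS with NumPy and then computes the standard error of the coefficient on schooling twice, once from the naive second-stage residuals and once from residuals built with the actual $s_i$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Simulated data (all values hypothetical): a constant and one covariate,
# one excluded instrument z, and an endogenous regressor s.
x = rng.normal(size=n)
z = rng.normal(size=n)
u = rng.normal(size=n)                       # unobserved confounder
s = 0.5 * x + 0.3 * z + u + rng.normal(size=n)
y = 1.0 + 0.5 * x + 1.0 * s + 2.0 * u + rng.normal(size=n)   # true rho = 1

X = np.column_stack([np.ones(n), x])         # covariates X_i
W = np.column_stack([X, z])                  # first-stage regressors [X_i, Z_i]

def ols(A, b):
    """OLS coefficients from regressing b on the columns of A."""
    return np.linalg.lstsq(A, b, rcond=None)[0]

# First stage: fitted values s_hat.
s_hat = W @ ols(W, s)

# Manual second stage: OLS of y on [X, s_hat].
R_hat = np.column_stack([X, s_hat])
beta = ols(R_hat, y)
k = R_hat.shape[1]
bread = np.linalg.inv(R_hat.T @ R_hat)

# Naive residuals use s_hat, so they estimate eta + rho * (s - s_hat).
resid_naive = y - R_hat @ beta
# 2SLS residuals use the actual s, so they estimate eta alone.
resid_2sls = y - np.column_stack([X, s]) @ beta

se_naive = np.sqrt(resid_naive @ resid_naive / (n - k) * np.diag(bread))
se_2sls = np.sqrt(resid_2sls @ resid_2sls / (n - k) * np.diag(bread))

print("rho estimate:   ", beta[-1])
print("naive manual SE:", se_naive[-1])
print("2SLS SE:        ", se_2sls[-1])
```

The point estimate is the same either way; only the residual variance changes. In this particular design the naive standard error comes out too large, but the direction of the error depends on the data, so the naive number should not be read as conservative.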
Covariate Ambivalence
Suppose the covariate vector contains two sorts of variables, some (say, $X_{0i}$) that you are comfortable with, and others (say, $X_{1i}$) about which you are ambivalent. Griliches and Mason (1972) faced this scenario when constructing 2SLS estimates of a wage equation that treats AFQT scores (an ability test used by the armed forces) as an endogenous control variable to be instrumented. The instruments for AFQT are early schooling (completed before military service), race, and family background variables. They estimated a system that can be described like this:
$$s_i = X_{0i}'\pi_{10} + \pi_{11}'Z_i + \xi_{1i}$$
$$Y_i = \alpha_0'X_{0i} + \alpha_1'X_{1i} + \rho\hat{s}_i + [\eta_i + \rho(s_i - \hat{s}_i)].$$
This looks a lot like manual 2SLS.
A closer look, however, reveals an important difference between the equations above and the usual 2SLS procedure: the covariates in the first and second stages are not the same. For example, Griliches and Mason included age in the second stage but not in the first, a fact noted by Cardell and Hopkins (1977) in a comment on their paper. This is a mistake. Griliches’ and Mason’s second-stage estimates are not the same as 2SLS. What’s worse, they are inconsistent where 2SLS might have been fine. To see why, note that the first-stage residual, $s_i - \hat{s}_i$, is uncorrelated with $X_{0i}$ by construction, since OLS residuals are always uncorrelated with included regressors. But because $X_{1i}$ is not included in the first stage, it is likely to be correlated with the first-stage residuals (e.g., age is probably correlated with the AFQT residual from the Griliches and Mason (1972) first stage). The inconsistency from this correlation spills over to all coefficients in the second stage. The moral of the story: put the same exogenous covariates in your first and second stage. If a covariate is good enough for the second stage, it’s good enough for the first.
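A simulated version of the covariate mismatch makes the inconsistency concrete. Everything in the sketch below is an assumption made for illustration: the coefficients, the single "ambivalent" covariate `x1` (think of age), and the helper names `ols` and `manual_second_stage` are hypothetical and are not taken from Griliches and Mason. The instrument is deliberately correlated with `x1`, which is what lets the omitted first-stage covariate contaminate the coefficient on the fitted values.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000

x1 = rng.normal(size=n)                  # "ambivalent" covariate (think: age)
z = 0.5 * x1 + rng.normal(size=n)        # instrument, correlated with x1
u = rng.normal(size=n)                   # unobserved confounder
s = 1.0 * x1 + 0.5 * z + u + rng.normal(size=n)
y = 1.0 + 0.5 * x1 + 1.0 * s + 2.0 * u + rng.normal(size=n)   # true rho = 1

const = np.ones(n)

def ols(A, b):
    """OLS coefficients from regressing b on the columns of A."""
    return np.linalg.lstsq(A, b, rcond=None)[0]

def manual_second_stage(first_stage_regressors):
    """Fit the first stage, then regress y on [const, x1, s_hat]; return rho."""
    s_hat = first_stage_regressors @ ols(first_stage_regressors, s)
    second_stage = np.column_stack([const, x1, s_hat])
    return ols(second_stage, y)[-1]

# Mismatched: x1 appears in the second stage but not in the first.
rho_mismatched = manual_second_stage(np.column_stack([const, z]))
# Matched: the same covariates appear in both stages (ordinary 2SLS).
rho_matched = manual_second_stage(np.column_stack([const, x1, z]))

print("rho, x1 omitted from first stage:", rho_mismatched)
print("rho, same covariates both stages:", rho_matched)
```

With matched covariates the estimate settles near the true value of 1, while the mismatched version stays well away from it no matter how large the sample gets.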
Forbidden Regressions
Forbidden regressions were forbidden by MIT Professor Jerry Hausman in 1975, and while they occasionally resurface in an under-supervised thesis, they are still technically off-limits. A forbidden regression crops up when researchers apply 2SLS reasoning directly to nonlinear models. A common scenario is a dummy endogenous variable. Suppose, for example, the causal model of interest is
$$Y_i = \alpha'X_i + \rho D_i + \eta_i, \tag{4.6.1}$$
where $D_i$ is a dummy variable for veteran status. The usual 2SLS first stage is
$$D_i = \pi_{10}'X_i + \pi_{11}'Z_i + \xi_{1i}, \tag{4.6.2}$$
a linear regression of $D_i$ on covariates and excluded instruments.
Because $D_i$ is a dummy variable, the CEF associated with this first stage, $E[D_i|X_i, Z_i]$, is probably nonlinear. So the usual OLS first stage is an approximation to the underlying nonlinear CEF. We might, therefore, use a nonlinear first stage in an attempt to come closer to the CEF. Suppose that we use Probit to model $E[D_i|X_i, Z_i]$. The Probit first stage is $\Phi[X_i'\pi_{p0} + \pi_{p1}'Z_i]$, where $\pi_{p0}$ and $\pi_{p1}$ are Probit coefficients, and the fitted values are $\hat{D}_{pi} = \Phi[X_i'\hat{\pi}_{p0} + \hat{\pi}_{p1}'Z_i]$. The forbidden regression in this case is the second-stage equation created by substituting $\hat{D}_{pi}$ for $D_i$:
$$Y_i = \alpha'X_i + \rho\hat{D}_{pi} + [\eta_i + \rho(D_i - \hat{D}_{pi})]. \tag{4.6.3}$$
The problem with (4.6.3) is that only OLS estimation of (4.6.2) is guaranteed to produce first-stage residuals that are uncorrelated with fitted values and covariates. If $E[D_i|X_i, Z_i] = \Phi[X_i'\pi_{p0} + \pi_{p1}'Z_i]$, then residuals from the nonlinear model will be asymptotically uncorrelated with $X_i$ and $\hat{D}_{pi}$, but who is to say that the first-stage CEF is really Probit? With garden-variety 2SLS, in contrast, we do not need to worry about whether the first-stage CEF is really linear.[69]
A simple alternative to the forbidden second step, (4.6.3), avoids problems due to an incorrect nonlinear first stage. Instead of plugging in nonlinear fitted values, we can use the nonlinear fitted values as instruments. In other words, use $\hat{D}_{pi}$ as an instrument for (4.6.1) in a conventional 2SLS procedure (as always, the exogenous covariates, $X_i$, should also be in the instrument list). Use of fitted values as instruments is the same as plugging in fitted values when the first stage is estimated by OLS, but not in general. Nonlinear-fits-as-instruments has the further advantage that, if the nonlinear model gives a better approximation of the first-stage CEF than the linear model, the resulting 2SLS estimates will be more efficient than those using a linear first stage (Newey, 1990).
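The difference between plugging in fitted values and using them as instruments can be checked numerically. The sketch below is an illustration under stated assumptions, not a recommended implementation: the simulated design is hypothetical, the true first stage is deliberately not a Probit in a linear index, and the Probit is fit by a hand-rolled Fisher-scoring loop (`probit_fit`, a name invented here) so that the example needs nothing beyond NumPy and the standard library.

```python
import math
import numpy as np

rng = np.random.default_rng(2)
n = 20000

x = rng.normal(size=n)
z = rng.normal(size=n)
u = rng.normal(size=n)
# Dummy endogenous regressor. The x*z term means the true first-stage CEF
# is NOT a Probit in a linear index of (x, z): a deliberate misspecification.
D = (0.3 * x + 0.8 * z + 0.6 * x * z + u > 0).astype(float)
y = 1.0 + 0.5 * x + 1.0 * D + u + rng.normal(size=n)     # true rho = 1

X = np.column_stack([np.ones(n), x])     # exogenous covariates
W = np.column_stack([X, z])              # first-stage regressors
R = np.column_stack([X, D])              # regressors of the causal model

_erf = np.vectorize(math.erf)

def norm_cdf(t):
    return 0.5 * (1.0 + _erf(t / math.sqrt(2.0)))

def norm_pdf(t):
    return np.exp(-0.5 * t**2) / math.sqrt(2.0 * math.pi)

def ols(A, b):
    return np.linalg.lstsq(A, b, rcond=None)[0]

def probit_fit(A, d, iters=15):
    """Probit coefficients by Fisher scoring (illustrative, no convergence check)."""
    b = np.zeros(A.shape[1])
    for _ in range(iters):
        idx = A @ b
        p = np.clip(norm_cdf(idx), 1e-10, 1 - 1e-10)
        f = norm_pdf(idx)
        score = A.T @ ((d - p) * f / (p * (1 - p)))
        info = (A * (f**2 / (p * (1 - p)))[:, None]).T @ A
        b = b + np.linalg.solve(info, score)
    return b

def plug_in(fit):
    """Second stage with fitted values substituted for D."""
    return ols(np.column_stack([X, fit]), y)[-1]

def iv(fit):
    """Just-identified IV: [X, fit] instruments for [X, D]."""
    inst = np.column_stack([X, fit])
    return np.linalg.solve(inst.T @ R, inst.T @ y)[-1]

D_lin = W @ ols(W, D)                        # OLS first-stage fits
D_pro = norm_cdf(W @ probit_fit(W, D))       # Probit first-stage fits

rho_plug_lin, rho_iv_lin = plug_in(D_lin), iv(D_lin)
rho_plug_pro, rho_iv_pro = plug_in(D_pro), iv(D_pro)

print("OLS fits:    plug-in", rho_plug_lin, " as instrument", rho_iv_lin)
print("Probit fits: plug-in", rho_plug_pro, " as instrument", rho_iv_pro)
```

With OLS fits the two numbers agree up to rounding, which is the equivalence described above. With Probit fits they part ways: the plug-in number is the forbidden regression, while the instrument version is ordinary 2SLS and does not depend on the Probit being the right model for the first stage.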
But here, too, there is a drawback. The nonlinear-fits-as-instruments procedure implicitly uses nonlinearities in the first stage as a source of identifying information. To see this, suppose the causal model of interest includes the instruments, $Z_i$:
$$Y_i = \alpha'X_i + \gamma'Z_i + \rho D_i + \eta_i. \tag{4.6.4}$$
Now, with the first stage given by (4.6.2), the model is unidentified and conventional 2SLS estimates of (4.6.4) don’t exist. But 2SLS estimates using $X_i$, $Z_i$, and $\hat{D}_{pi}$ as instruments do exist, because $\hat{D}_{pi}$ is a nonlinear function of $X_i$ and $Z_i$ that is excluded from the second stage. Should you use this nonlinearity as a source of identifying information? We usually prefer to avoid this sort of back-door identification since it’s not clear what the underlying experiment really is.
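The back-door nature of this identification shows up as a simple rank condition. In the sketch below, which rests on assumed values throughout, the regressors in (4.6.4) occupy four columns (a constant, one covariate, one instrument, and the dummy), so four linearly independent instruments are needed. The index coefficients are hypothetical, and a logistic curve stands in for the Probit function $\Phi$ purely to keep the example NumPy-only; any nonlinear transformation would make the same point.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000

x = rng.normal(size=n)
z = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])     # constant and one covariate
W = np.column_stack([X, z])              # [X_i, Z_i]

# Hypothetical first-stage index coefficients, chosen only for illustration.
index = W @ np.array([0.1, 0.3, 0.8])

fit_linear = index                               # a linear fit lies in the span of [X, Z]
fit_nonlinear = 1.0 / (1.0 + np.exp(-index))     # logistic stand-in for Phi(index)

needed = W.shape[1] + 1                  # regressors in (4.6.4): X, Z, and D
rank_linear = np.linalg.matrix_rank(np.column_stack([W, fit_linear]))
rank_nonlinear = np.linalg.matrix_rank(np.column_stack([W, fit_nonlinear]))

print("instruments needed:         ", needed)
print("rank with linear fits:      ", rank_linear)
print("rank with nonlinear fits:   ", rank_nonlinear)
```

The extra column that rescues the rank condition comes entirely from the curvature of the fitted function, not from any additional source of variation in the data.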
As a rule, naively plugging in first-stage fitted values in nonlinear models is a bad idea. This includes models with a nonlinear second stage as well as those where the CEF for the first stage is nonlinear. Suppose, for example, that you believe the causal relation between schooling and earnings is approximately quadratic (as in Card’s [1995] structural model). In other words, the model of interest is
$$Y_i = \alpha'X_i + \rho_1 s_i + \rho_2 s_i^2 + \eta_i. \tag{4.6.5}$$
Given two instruments, it’s easy enough to estimate (4.6.5) treating both $s_i$ and $s_i^2$ as endogenous. In this case, there are two first-stage equations, one for $s_i$ and one for $s_i^2$. You need at least two instruments for this to work, of course. It’s natural to use $Z_i$ and its square (unless $Z_i$ is a dummy, in which case you’ll need a better idea).
You might be tempted, however, to work with a single first stage, say equation (4.6.2), and estimate the following second stage manually:
$$Y_i = \alpha'X_i + \rho_1\hat{s}_i + \rho_2\hat{s}_i^2 + [\eta_i + \rho_1(s_i - \hat{s}_i) + \rho_2(s_i^2 - \hat{s}_i^2)].$$
This is a mistake since $\hat{s}_i$ can be correlated with $s_i^2 - \hat{s}_i^2$, while $\hat{s}_i^2$ can be correlated with both $s_i - \hat{s}_i$ and $s_i^2 - \hat{s}_i^2$. On the other hand, as long as $X_i$ and $Z_i$ are uncorrelated with $\eta_i$ in (4.6.5), and you have enough instruments in $Z_i$, 2SLS estimation of (4.6.5) is straightforward.
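A simulation can put the two approaches side by side. The sketch below is a hypothetical design rather than anything from Card (1995): all coefficients are assumed, and the first-stage error is given a spread that depends on the instrument. That last assumption matters, because it is what makes $s_i^2 - \hat{s}_i^2$ correlated with the fitted values here; with a first-stage error fully independent of the instruments, the slope bias in this particular setup could be negligible.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 40000

x = rng.normal(size=n)
z = rng.normal(size=n)
u = rng.normal(size=n)                        # unobserved confounder
# First-stage error whose spread depends on z (an assumption of this sketch).
r = (1.0 + 0.5 * z) * (u + rng.normal(size=n))
s = 0.5 * x + 1.0 * z + r
eta = 2.0 * u + rng.normal(size=n)
y = 1.0 + 0.5 * x + 1.0 * s - 0.2 * s**2 + eta    # true rho1 = 1, rho2 = -0.2

X = np.column_stack([np.ones(n), x])

def ols(A, b):
    """OLS coefficients from regressing b on the columns of A."""
    return np.linalg.lstsq(A, b, rcond=None)[0]

# Proper 2SLS: s and s^2 are both endogenous; z and z^2 are the instruments.
# The model is just-identified, so 2SLS reduces to the simple IV formula.
R = np.column_stack([X, s, s**2])
Zmat = np.column_stack([X, z, z**2])
beta_2sls = np.linalg.solve(Zmat.T @ R, Zmat.T @ y)

# Mistaken manual version: a single first stage for s, then plug in
# s_hat and its square.
W = np.column_stack([X, z])
s_hat = W @ ols(W, s)
beta_forbidden = ols(np.column_stack([X, s_hat, s_hat**2]), y)

print("2SLS      rho1, rho2:", beta_2sls[2], beta_2sls[3])
print("forbidden rho1, rho2:", beta_forbidden[2], beta_forbidden[3])
```

In this design the 2SLS estimates land close to the true values, while the plug-in estimate of $\rho_1$ is pulled well away from 1 by the correlation between $\hat{s}_i$ and $s_i^2 - \hat{s}_i^2$.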