Mostly Harmless Econometrics: An Empiricist’s Companion
Grouped Data and 2SLS
The Wald estimator is the mother of all instrumental variables estimators because more complicated 2SLS estimators can typically be constructed from an underlying set of Wald estimators. The link between Wald and 2SLS is grouped-data: 2SLS using dummy instruments is the same thing as GLS on a set of group means. GLS in turn can be understood as a linear combination of all the Wald estimators that can be constructed from pairs of means. The generality of this link might appear to be limited by the presumption that the instruments at hand are dummies. Not all instrumental variables are dummies, or even discrete, but this is not really important. For one thing, many credible instruments can be thought of as defining categories, such as quarter of birth. Moreover, instrumental variables that appear more continuous (such as draft lottery numbers, which range from 1-365) can usually be grouped without much loss of information (for example, a single dummy for draft-eligibility status, or dummies for groups of 25 lottery numbers).[44]
To explain the Wald/grouping/2SLS nexus more fully, we stick with the draft-lottery study. Earlier we noted that draft-eligibility is a promising instrument for Vietnam-era veteran status. The draft-eligibility ceilings were RSN 195 for men born in 1950, RSN 125 for men born in 1951, and RSN 95 for men born in 1952. In practice, however, there is a richer link between draft lottery numbers (which we’ll call $R_i$, short for RSN) and veteran status ($D_i$) than draft-eligibility status alone. Although men with numbers above the eligibility ceiling were not drafted, the ceiling was unknown in advance. Some men therefore volunteered in the hope of serving under better terms and gaining some control over the timing of their service. The pressure to become a draft-induced volunteer was high for men with low lottery numbers, but low for men with high numbers. As a result, there is variation in $P[D_i = 1 \mid R_i]$ even for values strictly above or below the draft-eligibility cutoff. For example, men born in 1950 with lottery numbers 200-225 were more likely to serve than those with lottery numbers 226-250, though ultimately no one in either group was drafted.
The Wald estimator using draft-eligibility as an instrument for men born in 1950 compares the earnings of men with $R_i \le 195$ to the earnings of men with $R_i > 195$. But the previous discussion suggests the possibility of many more comparisons, for example men with $R_i \le 25$ vs. men with $R_i \in [26, 50]$; men with $R_i \in [51, 75]$ vs. men with $R_i \in [76, 100]$, and so on, until these 25-number intervals are exhausted. We might also make the intervals finer, comparing, say, men in 5-number or single-number intervals instead of 25-number intervals. The result of this expansion in the set of comparisons is a set of Wald estimators. These sets are complete in that the intervals partition the support of the underlying instrument, while the individual estimators are linearly independent in the sense that their numerators are linearly independent. Finally, each of these Wald estimators consistently estimates the same causal effect, assumed here to be constant, as long as $R_i$ is independent of potential outcomes and correlated with veteran status (i.e., the Wald denominators are not zero).
The possibility of constructing multiple Wald estimators for the same causal effect naturally raises the question of what to do with all of them. We would like to come up with a single estimate that somehow combines the information in the individual Wald estimates efficiently. As it turns out, the most efficient linear combination of a full set of linearly independent Wald estimates is produced by fitting a line through the group means used to construct these estimates.
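As a rough illustration of the set of comparisons described above, the following Python sketch simulates a toy lottery and computes the Wald estimator for each pair of adjacent lottery-number groups. Everything in it is an assumption made for illustration: the function names (`simulate_lottery`, `group_means`, `adjacent_wald`), the six groups, the service probabilities, and the constant effect of -2,000 are hypothetical and are not taken from Angrist (1990).

```python
import numpy as np

def simulate_lottery(n=200_000, n_groups=6, rho=-2000.0, seed=0):
    """Simulate a toy draft lottery (all parameter values are hypothetical)."""
    rng = np.random.default_rng(seed)
    r = rng.integers(0, n_groups, size=n)      # randomly assigned lottery group
    ability = rng.normal(size=n)               # unobserved earnings determinant
    # Service is more likely in low-numbered groups and for low-ability men.
    p_serve = 0.7 - 0.1 * r - 0.15 * (ability > 0)
    d = (rng.random(n) < p_serve).astype(float)
    # Constant-effects earnings model: rho is the causal effect of service.
    y = 15_000.0 + rho * d + 3_000.0 * ability + rng.normal(scale=2_000.0, size=n)
    return r, d, y

def group_means(r, d, y):
    """Return mean earnings, service rates, and cell sizes by lottery group."""
    n_j = np.bincount(r)
    p_hat = np.bincount(r, weights=d) / n_j
    y_bar = np.bincount(r, weights=y) / n_j
    return y_bar, p_hat, n_j

def adjacent_wald(y_bar, p_hat):
    """Wald estimator for each pair of adjacent groups (J - 1 estimates)."""
    return np.diff(y_bar) / np.diff(p_hat)

r, d, y = simulate_lottery()
y_bar, p_hat, n_j = group_means(r, d, y)
wald = adjacent_wald(y_bar, p_hat)

# Naive comparison of mean earnings by veteran status, for contrast.
naive = y[d == 1].mean() - y[d == 0].mean()

print("adjacent-group Wald estimates:", np.round(wald))
print("naive difference in means:", round(naive))
```

Each element of `wald` is a noisy estimate of the same constant effect, while `naive` is biased in this simulation because service is correlated with the unobserved `ability` term.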
The grouped data estimator can be motivated directly as follows. As in (4.1.11), we work with a bivariate constant-effects model, which in this case can be written
$$Y_i = \alpha + \rho D_i + \eta_i, \tag{4.1.14}$$
where $\rho = Y_{1i} - Y_{0i}$ is the causal effect of interest and $Y_{0i} = \alpha + \eta_i$. Because $R_i$ was randomly assigned and lottery numbers are assumed to have no effect on earnings other than through veteran status, $E[\eta_i \mid R_i] = 0$. It therefore follows that
$$E[Y_i \mid R_i] = \alpha + \rho P[D_i = 1 \mid R_i], \tag{4.1.15}$$
since $P[D_i = 1 \mid R_i] = E[D_i \mid R_i]$. In other words, the slope of the line connecting average earnings given lottery number with the average probability of service by lottery number is equal to the effect of military service, $\rho$. This is in spite of the fact that the regression of $Y_i$ on $D_i$ (in this case, the difference in means by veteran status) almost certainly differs from $\rho$, since $Y_{0i}$ and $D_i$ are likely to be correlated.
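One way to see the connection between (4.1.15) and the Wald estimator is to evaluate the equation at two lottery-number values, say $j$ and $k$, with different probabilities of service, and subtract. Writing the causal effect (the slope in (4.1.15)) as $\rho$, a sketch of this step is:

$$
\begin{aligned}
E[Y_i \mid R_i = j] - E[Y_i \mid R_i = k] &= \rho \left( P[D_i = 1 \mid R_i = j] - P[D_i = 1 \mid R_i = k] \right), \\
\rho &= \frac{E[Y_i \mid R_i = j] - E[Y_i \mid R_i = k]}{P[D_i = 1 \mid R_i = j] - P[D_i = 1 \mid R_i = k]}.
\end{aligned}
$$

The intercept cancels in the difference, so every pair of lottery-number values with a nonzero denominator identifies the same slope under the constant-effects assumption.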
Equation (4.1.15) suggests an estimation strategy based on fitting a line to the sample analogs of $E[Y_i \mid R_i]$ and $P[D_i = 1 \mid R_i]$. Suppose that $R_i$ takes on values $j = 1, \ldots, J$. In principle, $j$ might run from 1 to 365, but in Angrist (1990), lottery-number information was aggregated to 69 five-number intervals, plus a 70th for numbers 346-365. We can therefore think of $R_i$ as running from 1 to 70. Let $\bar{y}_j$ and $\hat{p}_j$ denote estimates of $E[Y_i \mid R_i = j]$ and $P[D_i = 1 \mid R_i = j]$, while $\bar{\eta}_j$ denotes the average error in (4.1.14). Because sample moments converge to population moments, it follows that OLS estimates of $\rho$ in the grouped equation
$$\bar{y}_j = \alpha + \rho \hat{p}_j + \bar{\eta}_j \tag{4.1.16}$$
are consistent. In practice, however, GLS may be preferable since a grouped equation is heteroskedastic with a known variance structure. The efficient GLS estimator for grouped data in a constant-effects linear model is weighted least squares, with weights inversely proportional to the variance of $\bar{\eta}_j$ (see, e.g., Prais and Aitchison, 1954, or Wooldridge, 2006). Assuming the microdata residual is homoskedastic with variance $\sigma^2_\eta$, this variance is $\sigma^2_\eta / n_j$, where $n_j$ is the group size.
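Because the variance of each group-mean error is proportional to $1/n_j$ under homoskedasticity, the weighted least squares fit of the grouped equation uses the cell sizes as weights. A minimal Python sketch follows; the helper `wls_line` and the five cells of group-level data are hypothetical values invented for illustration, not figures from Angrist (1990).

```python
import numpy as np

def wls_line(x, y, w):
    """Weighted least squares fit of y = intercept + slope * x with weights w."""
    x, y, w = (np.asarray(v, dtype=float) for v in (x, y, w))
    x_mean = np.average(x, weights=w)
    y_mean = np.average(y, weights=w)
    slope = np.sum(w * (x - x_mean) * (y - y_mean)) / np.sum(w * (x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return intercept, slope

# Hypothetical grouped data: one row per lottery-number cell.
p_hat = np.array([0.35, 0.30, 0.24, 0.19, 0.15])                  # service rate by cell
y_bar = np.array([14300.0, 14420.0, 14530.0, 14650.0, 14700.0])   # mean earnings by cell
n_j = np.array([1200, 800, 1500, 900, 1100])                      # cell sizes

# Weights proportional to n_j, i.e., inversely proportional to Var(group-mean error).
alpha_hat, rho_hat = wls_line(p_hat, y_bar, n_j)
print("intercept:", round(alpha_hat, 1), "slope:", round(rho_hat, 1))
```

As a cross-check, `np.polyfit(p_hat, y_bar, 1, w=np.sqrt(n_j))` should give the same line, since `polyfit` applies its weights to the unsquared residuals.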
The GLS (or weighted least squares) estimator of $\rho$ in equation (4.1.16) is especially important in this context for two reasons. First, the GLS slope estimate constructed from $J$ grouped observations is an asymptotically efficient linear combination of any full set of $J - 1$ linearly independent Wald estimators (Angrist, 1991). This can be seen without any mathematics: GLS and any linear combination of pairwise Wald estimators are both linear combinations of the grouped dependent variable. Moreover, GLS is the asymptotically efficient linear estimator for grouped data. Therefore we can conclude that there is no better (i.e., asymptotically more efficient) linear combination of Wald estimators than GLS (again, a maintained assumption here is that $\rho$ is constant). The formula for constructing the GLS estimator from a full set of linearly independent Wald estimators appears in Angrist (1988).
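The idea that a weighted least squares slope is a linear combination of Wald estimators can be checked numerically. The sketch below relies on a standard algebraic identity for weighted slopes over all pairs of cells, which is not the formula in Angrist (1988) for a set of $J - 1$ linearly independent estimators: the slope weighted by cell size equals an average of all pairwise Wald estimators, with the pair $(j, k)$ weighted in proportion to $n_j n_k (\hat{p}_j - \hat{p}_k)^2$. The group-level numbers are hypothetical.

```python
import itertools
import numpy as np

# Hypothetical grouped data: service rates, mean earnings, and cell sizes.
p_hat = np.array([0.35, 0.30, 0.24, 0.19, 0.15])
y_bar = np.array([14300.0, 14420.0, 14530.0, 14650.0, 14700.0])
n_j = np.array([1200.0, 800.0, 1500.0, 900.0, 1100.0])

# Weighted least squares slope of y_bar on p_hat, weighted by cell size.
p_mean = np.average(p_hat, weights=n_j)
y_mean = np.average(y_bar, weights=n_j)
rho_wls = (np.sum(n_j * (p_hat - p_mean) * (y_bar - y_mean))
           / np.sum(n_j * (p_hat - p_mean) ** 2))

# Wald estimator for every pair of cells (j, k) with j < k.
pairs = list(itertools.combinations(range(len(p_hat)), 2))
wald = np.array([(y_bar[j] - y_bar[k]) / (p_hat[j] - p_hat[k]) for j, k in pairs])

# Pair weights proportional to n_j * n_k * (p_j - p_k)^2, normalized to sum to one.
weights = np.array([n_j[j] * n_j[k] * (p_hat[j] - p_hat[k]) ** 2 for j, k in pairs])
weights /= weights.sum()
rho_from_wald = np.sum(weights * wald)

print("WLS slope:", rho_wls)
print("weighted average of pairwise Wald estimates:", rho_from_wald)
```

Pairs with larger cells and a larger gap in service rates receive more weight, which matches the intuition that such comparisons are more informative.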
Second, just as each Wald estimator is also an IV estimator, the GLS (weighted least squares) estimator of equation (4.1.16) is also 2SLS. The instruments in this case are a full set of dummies to indicate each lottery-number cell. To see why, define the set of dummy instruments $Z_i = \{r_{ji} = 1[R_i = j];\ j = 1, \ldots, J - 1\}$. Now, consider the first-stage regression of $D_i$ on $Z_i$ plus a constant. Since this first stage is saturated, the fitted values will be the sample conditional means, $\hat{p}_j$, repeated $n_j$ times for each $j$. The second-stage slope estimate is therefore exactly the same as weighted least squares estimation of the grouped equation, (4.1.16), weighted by the cell size, $n_j$.
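The equivalence between 2SLS with cell dummies and weighted least squares on cell means can be verified on simulated microdata. The following is a sketch under assumed values: the sample size, the eight cells, the service probabilities, and the effect of -2 are all hypothetical, and 2SLS is computed by running the two stages explicitly with `numpy.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(1)
n, J = 5_000, 8                               # hypothetical sample size and cell count
r = rng.integers(0, J, size=n)                # lottery-number cell for each person
u = rng.normal(size=n)                        # unobserved earnings determinant
d = (rng.random(n) < 0.65 - 0.07 * r + 0.1 * (u > 0)).astype(float)
y = 10.0 - 2.0 * d + u + rng.normal(size=n)   # constant-effects outcome equation

# First stage: regress D on a constant and J - 1 cell dummies (saturated).
Z = np.column_stack([np.ones(n)] + [(r == j).astype(float) for j in range(J - 1)])
gamma, *_ = np.linalg.lstsq(Z, d, rcond=None)
d_hat = Z @ gamma

# Second stage: regress Y on a constant and the first-stage fitted values.
X_hat = np.column_stack([np.ones(n), d_hat])
beta, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
rho_2sls = beta[1]

# Grouped data: cell sizes, service rates, and mean outcomes.
n_j = np.bincount(r, minlength=J)
p_hat = np.bincount(r, weights=d, minlength=J) / n_j
y_bar = np.bincount(r, weights=y, minlength=J) / n_j

# Weighted least squares on the J cell means, weighted by cell size.
p_mean = np.average(p_hat, weights=n_j)
y_mean = np.average(y_bar, weights=n_j)
rho_wls = (np.sum(n_j * (p_hat - p_mean) * (y_bar - y_mean))
           / np.sum(n_j * (p_hat - p_mean) ** 2))

print("2SLS slope:", rho_2sls)
print("grouped WLS slope:", rho_wls)
```

The two slopes should agree up to floating-point error, and `d_hat` should equal `p_hat[r]`, the cell mean repeated for every person in the cell.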
The connection between grouped-data and 2SLS is of both conceptual and practical importance. On the conceptual side, any 2SLS estimator using a set of dummy instruments can be understood as a linear combination of all the Wald estimators generated by these instruments one at a time. The Wald estimator in turn provides a simple framework used later in this chapter to interpret IV estimates in the much more realistic world of heterogeneous potential outcomes.
Although not all instruments are inherently discrete and therefore immediately amenable to a Wald or grouped-data interpretation, many are. Examples include the draft lottery number, quarter of birth, twins, and sibling-sex composition instruments we’ve already discussed. See also the recent studies by Bennedsen et al. (2007) and Ananat and Michaels (2008), both of which use dummies for male first births as instruments. Moreover, instruments that have a continuous flavor can often be fruitfully turned into discrete variables. For example, Angrist, Graddy and Imbens (2000) group continuous weather-based instruments into three dummy variables (stormy, mixed, and clear), which they then use to estimate the demand for fish. This dummy-variable parameterization seems to capture the main features of the relationship between weather conditions and the price of fish.[45]
On the practical side, the grouped-data equivalent of 2SLS gives us a simple tool that can be used to explain and evaluate any IV strategy. In the case of the draft lottery, for example, the grouped model embodies the assumption that the only reason average earnings vary with lottery numbers is the variation in probability of service across lottery-number groups. If the underlying causal relation is linear with constant effects, then equation (4.1.16) should fit the group means well, something we can assess by inspection and, as discussed in the next section, with the machinery of formal statistical inference.
Sometimes labor economists refer to grouped-data plots for discrete instruments as Visual Instrumental Variables (VIV).[46] An example appears in Angrist (1990), reproduced here as Figure 4.1.2. This figure shows the relationship between average earnings in 5-number RSN cells and the probability of service in these cells, for the 1981-84 earnings of white men born 1950-53. The slope of the line through these points is an IV estimate of the earnings loss due to military service, in this case about $2,400, not very different from the Wald estimates discussed earlier but with a lower standard error (in this case, about $800).
Figure 4.1.2: The relationship between average earnings and the probability of military service (from Angrist 1990). This is a VIV plot of average 1981-84 earnings by cohort and groups of five consecutive draft lottery numbers against conditional probabilities of veteran status in the same cells. The sample includes white men born 1950-53. Plotted points consist of average residuals (over four years of earnings) from regressions on period and cohort effects. The slope of the least-squares regression line drawn through the points is -2,384, with a standard error of 778.