Mostly Harmless Econometrics: An Empiricist’s Companion
Counting and Characterizing Compliers
We’ve seen that, except in special cases, each instrumental variable identifies a unique causal parameter, one specific to the subpopulation of compliers for that instrument. Different valid instruments for the same causal relation therefore estimate different things, at least in principle (an important exception being instruments that allow for perfect compliance on one side or the other). Although different IV estimates are "weighted-up" by 2SLS to produce a single average causal effect, over-identification testing of the sort discussed in Section 4.2.2, where multiple instruments are validated according to whether or not they estimate the same thing, is out the window in a fully heterogeneous world.
Differences in compliant sub-populations might explain variability in treatment effects from one instrument to another. We would therefore like to learn as much as we can about the compliers for different instruments. Moreover, if the compliant subpopulation is similar to other populations of interest, the case for extrapolating estimated causal effects to these other populations is stronger. In this spirit, Acemoglu and Angrist (2000) argue that quarter-of-birth instruments and state compulsory attendance laws (the minimum schooling required before leaving school in your state of birth when you were 14) affect essentially the same group of people and for the same reasons. We therefore expect IV estimates of the returns to schooling from these two sets of instruments to be similar. We might also expect the quarter of birth estimates to predict the impact of contemporary proposals to strengthen compulsory attendance laws.
On the other hand, if the compliant subpopulations associated with two or more instruments are very different, yet the IV estimates they generate are similar, we might be prepared to adopt homogeneous effects as a working hypothesis. This revives the over-identification idea, but puts it at the service of external validity.[63] This reasoning is illustrated by the study of the effects of family size on children’s education by Angrist, Lavy, and Schlosser (2006). The Angrist, Lavy, and Schlosser study is motivated by the observation that children from larger families typically end up with less education than those from smaller families. A long-standing concern in research on fertility is whether the observed negative correlation between family size and children’s outcomes is causal. As it turns out, IV estimates of the effect of family size using a number of different instruments, each with very different compliant subpopulations, all generate results showing no effect of family size. Angrist, Lavy, and Schlosser (2006) argue that their results point to a common treatment effect of zero for just about everybody in the Israeli population they study.
We have already seen that the size of a complier group is easy to measure. This is just the Wald first-stage, since, given monotonicity, we have
$$
\begin{aligned}
P[D_{1i} > D_{0i}] &= E[D_{1i} - D_{0i}] \\
&= E[D_{1i}] - E[D_{0i}] \\
&= E[D_i \mid Z_i = 1] - E[D_i \mid Z_i = 0].
\end{aligned}
$$
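The equality between the first stage and the complier share can be checked on simulated data. The following is a minimal sketch, not taken from the text: the population shares (20 percent compliers, 30 percent always-takers, 50 percent never-takers), the sample size, and all variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400_000

# Hypothetical compliance types; monotonicity holds because there are no defiers.
types = rng.choice(["complier", "always", "never"], size=n, p=[0.2, 0.3, 0.5])
z = rng.integers(0, 2, size=n)  # randomly assigned binary instrument

# Potential treatment status with the instrument off (d0) and on (d1).
d0 = (types == "always").astype(int)
d1 = ((types == "always") | (types == "complier")).astype(int)
d = np.where(z == 1, d1, d0)  # observed treatment

# Wald first stage: E[D | Z = 1] - E[D | Z = 0].
first_stage = d[z == 1].mean() - d[z == 0].mean()

# Share of compliers, observable only because this is a simulation.
complier_share = (d1 > d0).mean()

print(f"first stage:    {first_stage:.3f}")
print(f"complier share: {complier_share:.3f}")
```

In real data only `d` and `z` are observed, so the first stage is the only route to the complier share; the simulation simply confirms that the two quantities agree up to sampling error.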
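We can also tell what proportion of the treated are compliers since, for compliers, treatment status is completely determined by $Z_i$. Start with the definition of conditional probability:
$$
\begin{aligned}
P[D_{1i} > D_{0i} \mid D_i = 1] &= \frac{P[D_i = 1 \mid D_{1i} > D_{0i}]\, P[D_{1i} > D_{0i}]}{P[D_i = 1]} \\
&= \frac{P[Z_i = 1]\,\left(E[D_i \mid Z_i = 1] - E[D_i \mid Z_i = 0]\right)}{P[D_i = 1]}. \qquad (4.4.7)
\end{aligned}
$$
The second equality uses the fact that $P[D_i = 1 \mid D_{1i} > D_{0i}] = P[Z_i = 1 \mid D_{1i} > D_{0i}]$ and that $P[Z_i = 1 \mid D_{1i} > D_{0i}] = P[Z_i = 1]$ by Independence. In other words, the proportion of the treated who are compliers is given by the first stage, times the probability the instrument is switched on, divided by the proportion treated.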
Formula (4.4.7) is illustrated here by calculating the proportion of veterans who are draft-lottery compliers. The ingredients are reported in Table 4.4.2. For example, for white men born in 1950, the first stage is .159, the probability of draft-eligibility is 195/365, and the marginal probability of treatment is .267. From these statistics, we compute that the compliant subpopulation is .32 of the veteran population in this group. The proportion of veterans who were draft-lottery compliers falls to 20 percent for non-white men born in 1950. This is not surprising since the draft-lottery first stage is considerably weaker for non-whites. The last column of the table reports the proportion of nonveterans who would have served if they had been draft-eligible. This ranges from 3 percent of non-whites to 10 percent of whites, reflecting the fact that most non-veterans were deferred, ineligible, or unqualified for military service.
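The arithmetic behind the figures for white men born in 1950 can be sketched as follows. The function names are hypothetical, and the draft-eligibility probability of 195/365 is an assumption of this sketch, chosen because it reproduces the reported .32. The second function applies the same Bayes-rule argument to the untreated, replacing $P[Z_i = 1]$ with $P[Z_i = 0]$ and $P[D_i = 1]$ with $P[D_i = 0]$; that analog is inferred from the logic of (4.4.7) rather than quoted from the text.

```python
def complier_share_of_treated(first_stage, p_z1, p_d1):
    """Formula (4.4.7): first stage times P[Z=1], divided by P[D=1]."""
    return first_stage * p_z1 / p_d1


def complier_share_of_untreated(first_stage, p_z1, p_d1):
    """Analog for the untreated: first stage times P[Z=0], divided by P[D=0]."""
    return first_stage * (1 - p_z1) / (1 - p_d1)


# Ingredients quoted in the text for white men born in 1950.
first_stage = 0.159
p_d1 = 0.267
p_z1 = 195 / 365  # assumed draft-eligibility probability (not confirmed by the text)

share_treated = complier_share_of_treated(first_stage, p_z1, p_d1)
share_untreated = complier_share_of_untreated(first_stage, p_z1, p_d1)

print(f"compliers among veterans:    {share_treated:.2f}")    # 0.32
print(f"compliers among nonveterans: {share_untreated:.2f}")  # 0.10
```

Under these assumptions the two results line up with the .32 and the 10 percent figure quoted for whites.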
[Table 4.4.2 notes, partially recovered:] ... a number of instrumental variables. The first stage, reported in column 6, gives the absolute size of the complier group. Columns 8 and 9 show the size of the complier population relative to the treated and untreated populations.
The effect of compulsory military service is the parameter of primary interest in the Angrist (1990) study, so the fact that draft-eligibility compliers are a minority of veterans is not really a limitation of this study. Even in the Vietnam era, most soldiers were volunteers, a little-appreciated fact about Vietnam-era veterans. The LATE interpretation of IV estimates using the draft lottery highlights the fact that other identification strategies are needed to estimate effects of military service on volunteers (some of these are implemented in Angrist, 1998).
The remaining rows in Table 4.4.2 document the size of the compliant subpopulation for the twins and sibling-sex composition instruments used by Angrist and Evans (1998) to estimate the effects of childbearing and for the quarter of birth instruments and compulsory attendance laws used by Angrist and Krueger (1991) and Acemoglu and Angrist (2000) to estimate the returns to schooling. In each of these studies, the compliant subpopulation is a small fraction of the treated group. For example, less than 2 percent of those who graduated from high school did so because of compulsory attendance laws or by virtue of having been born in a late quarter.
The question of whether a small compliant subpopulation is a cause for worry is context-specific. In some cases, it seems fair to say, "you get what you need." With many policy interventions, for example, it is a marginal group that is of primary interest, a point emphasized in McClellan’s (1994) landmark IV study of the effects of surgery on heart attack patients. McClellan uses the relative distance to cardiac care facilities to construct instruments for whether an elderly heart-attack patient is treated with a surgical intervention. Most patients get the same treatment either way, but for some, the case for major surgery is marginal. In such cases, providers or patients opt for a less invasive strategy if the nearest surgical facility is far away. McClellan finds little benefit from surgical procedures for this marginal group. Similarly, an increase in the compulsory attendance age to age 18 is clearly irrelevant for the vast majority of American high school students, but it will affect a few who would otherwise drop out. IV estimates suggest the economic returns to schooling for this marginal group are substantial.
The last column of Table 4.4.2 illustrates the special feature of twins instruments alluded to at the end of the previous subsection. As before, let $D_i = 0$ for women with two children in a sample of women with at least two children, while $D_i = 1$ indicates women who have more than two. Because there are no never-takers in response to the event of a multiple birth, i.e., all mothers who have twins at second birth end up with (at least) three children, the probability of compliance among those with $D_i = 0$ is virtually one (the table shows an entry of .97). LATE is therefore the effect on the non-treated, $E[Y_{1i} - Y_{0i} \mid D_i = 0]$, in this case.
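The step from "no never-takers" to "effect on the non-treated" can be spelled out in the notation of this section. This is a sketch of the reasoning under the stated assumptions (Independence, monotonicity, and no never-takers), not a derivation quoted from the text. Always-takers have $D_i = 1$ regardless of the instrument, so without never-takers anyone observed with $D_i = 0$ must be a complier whose instrument is switched off:

$$
\begin{aligned}
E[Y_{1i} - Y_{0i} \mid D_i = 0] &= E[Y_{1i} - Y_{0i} \mid D_{1i} > D_{0i},\, Z_i = 0] \\
&= E[Y_{1i} - Y_{0i} \mid D_{1i} > D_{0i}].
\end{aligned}
$$

The second equality follows from Independence, and the last expression is LATE. The same logic shows why the compliance probability among the untreated is one in this case: $P[D_i = 0] = P[Z_i = 0]\,P[D_{1i} > D_{0i}]$, so the first stage times $P[Z_i = 0]$, divided by $P[D_i = 0]$, equals one.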
Unlike the size of the complier group, information on the characteristics of compliers seems like a tall order because the compliers cannot be individually identified. Because we can’t see both $D_{1i}$ and $D_{0i}$ for each individual, we can’t just list those with $D_{1i} > D_{0i}$ and then calculate the distribution of characteristics for this group. Nevertheless, it’s easy to describe the distribution of complier characteristics. To simplify, we focus here on characteristics, like race or degree completion, that can be described by dummy variables. In this case, everything we need to know can be learned from variation in the first stage across covariate groups.
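Let $x_{1i}$ be a Bernoulli-distributed characteristic, say a dummy indicating college graduates. Are sex-composition compliers more or less likely to be college graduates than other women with two children? This question is answered by the following calculation:
$$
\begin{aligned}
\frac{P[x_{1i} = 1 \mid D_{1i} > D_{0i}]}{P[x_{1i} = 1]} &= \frac{P[D_{1i} > D_{0i} \mid x_{1i} = 1]}{P[D_{1i} > D_{0i}]} \\
&= \frac{E[D_i \mid Z_i = 1, x_{1i} = 1] - E[D_i \mid Z_i = 0, x_{1i} = 1]}{E[D_i \mid Z_i = 1] - E[D_i \mid Z_i = 0]}.
\end{aligned}
$$
In other words, the relative likelihood a complier is a college graduate is given by the ratio of the first stage for college graduates to the overall first stage.[29]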
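The ratio-of-first-stages calculation is straightforward to code. The following is a minimal sketch on simulated data; the function names, the compliance probabilities, and the share of observations with the characteristic are all hypothetical and chosen only so that the true answer is known.

```python
import numpy as np


def first_stage(d, z):
    """Wald first stage: E[D | Z = 1] - E[D | Z = 0]."""
    return d[z == 1].mean() - d[z == 0].mean()


def complier_characteristic_ratio(d, z, x):
    """First stage among those with x = 1, relative to the overall first stage."""
    return first_stage(d[x == 1], z[x == 1]) / first_stage(d, z)


rng = np.random.default_rng(1)
n = 1_000_000
x = rng.binomial(1, 0.3, size=n)  # dummy characteristic, e.g. college graduate
z = rng.binomial(1, 0.5, size=n)  # randomly assigned binary instrument

# Assumed compliance probabilities: 0.2 when x = 1 and 0.4 when x = 0.
complier = rng.random(n) < np.where(x == 1, 0.2, 0.4)
always = ~complier & (rng.random(n) < 0.4)  # some non-compliers are always-takers
d = np.where(complier, z, always.astype(int))  # observed treatment

ratio = complier_characteristic_ratio(d, z, x)

# Benchmark available only in a simulation: P[x = 1 | complier] / P[x = 1].
true_ratio = x[complier].mean() / x.mean()

print(f"ratio of first stages: {ratio:.3f}")
print(f"true relative share:   {true_ratio:.3f}")
```

A ratio below one, as in this design, says that compliers are less likely than the average sample member to have the characteristic.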
This calculation is illustrated in Table 4.4.3, which reports compliers’ characteristics ratios for age at first birth, nonwhite race, and degree completion using twins and same-sex instruments. The table was constructed from the Angrist and Evans (1998) 1980 census extract. Twins compliers are much more likely to be over 30 than the average mother in the sample, reflecting the fact that younger women who had a multiple birth were likely to go on to have additional children anyway. Twins compliers are also more educated than the average mother, while sex-composition compliers are less educated. This helps to explain the smaller 2SLS estimates generated by twins instruments (reported here in Table 4.1.4), since Angrist and Evans (1998) show that the labor supply consequences of childbearing decline with mother’s schooling.
[29] A general method for constructing the mean or other features of the distribution of covariates for compliers uses Abadie’s (2003) kappa-weighting scheme. For example,
$$
E[X_i \mid D_{1i} > D_{0i}] = \frac{E[\kappa_i X_i]}{E[\kappa_i]},
$$
where
$$
\kappa_i = 1 - \frac{D_i (1 - Z_i)}{1 - P(Z_i = 1 \mid X_i)} - \frac{(1 - D_i) Z_i}{P(Z_i = 1 \mid X_i)}.
$$
This works because the weighting function, $\kappa_i$, "finds compliers," in a sense discussed in Section (4.5.2), below.
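The kappa-weighting formula in the footnote can be sketched in a few lines. This is an illustration on simulated data, not Abadie’s implementation: the function names and the data-generating process are hypothetical, and $P(Z_i = 1 \mid X_i)$ is estimated here by the sample mean of the instrument within each covariate cell, which is an assumption suited to a single binary covariate.

```python
import numpy as np


def kappa_weights(d, z, p_z):
    """Kappa averages to one for compliers and to zero for always- and never-takers."""
    return 1 - d * (1 - z) / (1 - p_z) - (1 - d) * z / p_z


def complier_mean(x, d, z, p_z):
    """Kappa-weighted mean of x, i.e. E[kappa * x] / E[kappa]."""
    k = kappa_weights(d, z, p_z)
    return np.mean(k * x) / np.mean(k)


rng = np.random.default_rng(2)
n = 1_000_000
x = rng.binomial(1, 0.3, size=n)  # binary covariate
z = rng.binomial(1, 0.5, size=n)  # randomly assigned binary instrument

# Assumed compliance probabilities: 0.10 when x = 1 and 0.20 when x = 0.
complier = rng.random(n) < np.where(x == 1, 0.10, 0.20)
always = ~complier & (rng.random(n) < 0.4)
d = np.where(complier, z, always.astype(int))  # observed treatment

# Estimate P(Z = 1 | X) by the mean of z within each x cell.
p_z = np.where(x == 1, z[x == 1].mean(), z[x == 0].mean())

kappa_estimate = complier_mean(x, d, z, p_z)
true_mean = x[complier].mean()  # known only because the data are simulated

print(f"kappa-weighted complier mean of x: {kappa_estimate:.3f}")
print(f"true complier mean of x:           {true_mean:.3f}")
```

Individual kappa values can be negative, so the weights make sense only in averages, which is why the sketch reports a ratio of sample means.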
                                           Twins at second birth           First two children are same sex
Variable                         E[x]      E[x|D1>D0]   P[x|D1>D0]/P[x]    E[x|D1>D0]   P[x|D1>D0]/P[x]
                                 (1)       (2)          (3)                (4)          (5)

Age 30 or older at first birth   0.00291   0.00404      1.39               0.00233      0.995
                                                        (0.0201)                        (0.374)
Black or hispanic                0.125     0.103        0.822              0.102        0.814
                                                        (0.00421)                       (0.0775)
High school graduate             0.822     0.861        1.048              0.815        0.998
                                                        (0.000772)                      (0.0140)
College graduate                 0.132     0.151        1.14               0.0904       0.704
                                                        (0.00376)                       (0.0692)

Table 4.4.3: Complier-characteristics ratios for twins and sex-composition instruments
Notes: The table reports an analysis of complier characteristics for twins and sex-composition instruments. The ratios in columns 3 and 5 give the relative likelihood compliers have the characteristic indicated in each row. Data are from the 1980 Census 5% sample, including married mothers age 21-35 with at least two children, as in Angrist and Evans (1998). The sample size is 254,654 for all columns.