Mostly Harmless Econometrics: An Empiricist’s Companion
Fuzzy RD is IV
Fuzzy RD exploits discontinuities in the probability or expected value of treatment conditional on a covariate. The result is a research design where the discontinuity becomes an instrumental variable for treatment status instead of deterministically switching treatment on or off. To see how this works, let $D_i$ denote the treatment as before, though here $D_i$ is no longer deterministically related to the threshold-crossing rule, $x_i \geq x_0$. Rather, there is a jump in the probability of treatment at $x_0$, so that

$$P[D_i = 1 \mid x_i] = \begin{cases} g_1(x_i) & \text{if } x_i \geq x_0 \\ g_0(x_i) & \text{if } x_i < x_0 \end{cases}, \qquad \text{where } g_1(x_0) \neq g_0(x_0).$$
The functions $g_0(x_i)$ and $g_1(x_i)$ can be anything as long as they differ (and the more the better) at $x_0$. We’ll assume $g_1(x_0) > g_0(x_0)$, so $x_i \geq x_0$ makes treatment more likely. We can write the relation between the probability of treatment and $x_i$ as
$$E[D_i \mid x_i] = P[D_i = 1 \mid x_i] = g_0(x_i) + [g_1(x_i) - g_0(x_i)]T_i,$$
where
$$T_i = 1(x_i \geq x_0).$$
The dummy variable $T_i$ indicates the point of discontinuity in $E[D_i \mid x_i]$.
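A small simulation can make the fuzzy assignment concrete. The sketch below is purely illustrative: the cutoff, the functions `g0` and `g1`, and the sample size are hypothetical choices, not taken from any study. It draws treatment with probability $g_0(x_i) + [g_1(x_i) - g_0(x_i)]T_i$ and checks that the share treated jumps by roughly $g_1(x_0) - g_0(x_0)$ at the cutoff, rather than switching from 0 to 1 as it would in a sharp design.

```python
import numpy as np

rng = np.random.default_rng(0)
x0 = 0.0  # hypothetical cutoff

def g0(x):
    return 0.2 + 0.1 * x  # hypothetical P[D=1|x] below the cutoff

def g1(x):
    return 0.6 + 0.1 * x  # hypothetical P[D=1|x] at or above the cutoff

n = 200_000
x = rng.uniform(-1.0, 1.0, n)
T = (x >= x0).astype(float)                  # T_i = 1(x_i >= x_0)
p = g0(x) + (g1(x) - g0(x)) * T              # E[D_i | x_i]
D = (rng.uniform(size=n) < p).astype(float)  # treatment is still random given x_i

near = np.abs(x - x0) < 0.05                 # narrow window around the cutoff
jump = D[near & (T == 1)].mean() - D[near & (T == 0)].mean()
print(f"share treated jumps by {jump:.2f} at the cutoff")  # roughly 0.4, not 1
```

Both sides of the cutoff contain treated and untreated units, which is exactly what distinguishes the fuzzy case from the sharp one.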
Fuzzy RD leads naturally to a simple 2SLS estimation strategy. Assuming that $g_0(x_i)$ and $g_1(x_i)$ can be described by $p$th-order polynomials as in (6.1.4), we have
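$$\begin{aligned}
E[D_i \mid x_i] &= \gamma_{00} + \gamma_{01}x_i + \gamma_{02}x_i^2 + \cdots + \gamma_{0p}x_i^p \\
&\quad + [\gamma_0^* + \gamma_1^* x_i + \gamma_2^* x_i^2 + \cdots + \gamma_p^* x_i^p]T_i \\
&= \gamma_{00} + \gamma_{01}x_i + \gamma_{02}x_i^2 + \cdots + \gamma_{0p}x_i^p \\
&\quad + \gamma_0^* T_i + \gamma_1^* x_i T_i + \gamma_2^* x_i^2 T_i + \cdots + \gamma_p^* x_i^p T_i.
\end{aligned} \tag{6.2.1}$$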
From this we see that $T_i$, as well as the interaction terms $\{x_iT_i, x_i^2T_i, \ldots, x_i^pT_i\}$, can be used as instruments for $D_i$ in (6.1.4).[97]
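The instrument list implied by (6.2.1) can be built mechanically. A minimal sketch, assuming NumPy and a hypothetical helper name `fuzzy_rd_instruments` (not a library function); column $k$ holds $x_i^k T_i$, so the first column is $T_i$ itself:

```python
import numpy as np

def fuzzy_rd_instruments(x, x0, p):
    """Columns T_i, x_i T_i, x_i^2 T_i, ..., x_i^p T_i implied by (6.2.1).

    Hypothetical helper for illustration only.
    """
    x = np.asarray(x, dtype=float)
    T = (x >= x0).astype(float)  # T_i = 1(x_i >= x_0)
    return np.column_stack([x**k * T for k in range(p + 1)])

# Three observations, cutoff at 0, quadratic interactions
Z = fuzzy_rd_instruments([-1.0, 0.5, 2.0], x0=0.0, p=2)
print(Z)
```

Observations below the cutoff get a row of zeros, since every instrument is multiplied by $T_i$.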
The simplest fuzzy RD estimator uses only $T_i$ as an instrument, without the interaction terms (with the interaction terms in the instrument list, we might also like to allow for interactions in the second stage as in (6.1.6)). The resulting just-identified IV estimator has the virtues of transparency and good finite-sample properties. The first stage in this case is
$$D_i = \gamma_0 + \gamma_1 x_i + \gamma_2 x_i^2 + \cdots + \gamma_p x_i^p + \pi T_i + \xi_{1i}, \tag{6.2.2}$$
where $T_i$ is the excluded instrument that provides identifying power with a first-stage effect given by $\pi$.
The fuzzy RD reduced form is obtained by substituting (6.2.2) into (6.1.4):
$$Y_i = \mu + \kappa_1 x_i + \kappa_2 x_i^2 + \cdots + \kappa_p x_i^p + \rho\pi T_i + \xi_{2i}, \tag{6.2.3}$$
where $\mu = \alpha + \rho\gamma_0$ and $\kappa_j = \beta_j + \rho\gamma_j$ for $j = 1, \ldots, p$. As with sharp RD, identification in the fuzzy case turns on the ability to distinguish the relation between $Y_i$ and the discontinuous function, $T_i = 1(x_i \geq x_0)$, from the effect of polynomial controls included in the first and second stage. In one of the first RD studies in applied econometrics, van der Klaauw (2002) used a fuzzy design to evaluate the effects of university financial aid awards on college enrollment. In van der Klaauw’s study, $D_i$ is the size of the financial aid award offer, and $T_i$ is a dummy variable indicating applicants with an ability index above pre-determined award-threshold cutoffs.[98] [99]
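Because the model is just identified, the 2SLS estimate of $\rho$ equals the ratio of the reduced-form coefficient on $T_i$ in (6.2.3) to the first-stage coefficient $\pi$ in (6.2.2), a shortcut known as indirect least squares. The sketch below checks this on simulated data. Everything in it is an assumption for illustration: the data-generating process, the quadratic controls ($p = 2$), the true $\rho = 2$, and the helper names `ols` and `fuzzy_rd_ils`. An unobserved term `u` drives both take-up and outcomes, so a naive regression of $Y_i$ on $D_i$ would be biased, while $T_i$ is as good as randomly assigned once the polynomial in $x_i$ is controlled.

```python
import numpy as np

def ols(y, X):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def fuzzy_rd_ils(y, d, x, x0, p=2):
    """Just-identified fuzzy RD: reduced-form / first-stage coefficient on T."""
    T = (x >= x0).astype(float)
    controls = np.column_stack([x**k for k in range(p + 1)])  # 1, x, ..., x^p
    X = np.column_stack([controls, T])
    pi_hat = ols(d, X)[-1]      # first stage (6.2.2): coefficient on T
    rho_pi_hat = ols(y, X)[-1]  # reduced form (6.2.3): coefficient on T
    return rho_pi_hat / pi_hat, pi_hat

# Simulated data; all parameters are hypothetical
rng = np.random.default_rng(1)
n = 100_000
x = rng.uniform(-1.0, 1.0, n)
T = (x >= 0.0).astype(float)
u = rng.uniform(size=n)                          # unobserved confounder
d = (u < 0.2 + 0.1 * x + 0.4 * T).astype(float)  # fuzzy take-up, jump of 0.4
y = 1 + 0.5 * x + 0.3 * x**2 + 2.0 * d + 3.0 * (u - 0.5) + rng.normal(size=n)

rho_hat, pi_hat = fuzzy_rd_ils(y, d, x, x0=0.0)
print(f"first stage {pi_hat:.3f}, rho {rho_hat:.3f}")
```

The same polynomial controls appear in both regressions, which is what makes the ratio coincide with just-identified 2SLS.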
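Fuzzy RD estimates with treatment effects that change as a function of $x_i$ can be constructed by 2SLS estimation of an equation with treatment-covariate interactions. Here, the second-stage model with interaction terms is the same as (6.1.6), while the first stage is similar to (6.2.1), except that, to match the second-stage parametrization, we center polynomial terms at $x_0$, writing $\tilde{x}_i = x_i - x_0$. In this case, the excluded instruments are $\{T_i, \tilde{x}_iT_i, \tilde{x}_i^2T_i, \ldots, \tilde{x}_i^pT_i\}$, while the variables $\{D_i, \tilde{x}_iD_i, D_i\tilde{x}_i^2, \ldots, D_i\tilde{x}_i^p\}$ are treated as endogenous. The first stage for $D_i$ becomes
$$D_i = \gamma_{00} + \gamma_{01}\tilde{x}_i + \gamma_{02}\tilde{x}_i^2 + \cdots + \gamma_{0p}\tilde{x}_i^p + [\pi + \gamma_1^*\tilde{x}_i + \gamma_2^*\tilde{x}_i^2 + \cdots + \gamma_p^*\tilde{x}_i^p]T_i + \xi_{1i}. \tag{6.2.4}$$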
An analogous first stage is constructed for each of the polynomial interaction terms in the set $\{\tilde{x}_iD_i, D_i\tilde{x}_i^2, \ldots, D_i\tilde{x}_i^p\}$.
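The interacted specification can be estimated with a generic 2SLS routine. The sketch below assumes a hypothetical `tsls` helper written in NumPy (not a library function), a linear specification ($p = 1$), and simulated data in which the true effect of $D_i$ is $2 + (x_i - x_0)$. Because the polynomial terms are centered at $x_0$, the coefficient on $D_i$ estimates the effect at the cutoff, and the coefficient on $\tilde{x}_iD_i$ estimates how the effect changes with $x_i$.

```python
import numpy as np

def tsls(y, endog, exog, excluded):
    """2SLS coefficients on the endogenous regressors.

    Regress y on first-stage fitted values of [exog, endog], where the
    first stage projects onto the instrument set [exog, excluded].
    """
    X = np.column_stack([exog, endog])
    Z = np.column_stack([exog, excluded])
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]  # first-stage fitted values
    beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]
    return beta[exog.shape[1]:]

# Simulated data; all parameters are hypothetical
rng = np.random.default_rng(2)
n, x0, p = 200_000, 0.0, 1
x = rng.uniform(-1.0, 1.0, n)
xt = x - x0                                      # centered running variable
T = (x >= x0).astype(float)
u = rng.uniform(size=n)                          # unobserved confounder
d = (u < 0.2 + 0.1 * x + 0.4 * T).astype(float)
y = 1 + 0.5 * xt + (2.0 + 1.0 * xt) * d + 3.0 * (u - 0.5) + rng.normal(size=n)

powers = range(1, p + 1)
exog = np.column_stack([np.ones(n)] + [xt**k for k in powers])     # 1, xt, ...
endog = np.column_stack([d] + [xt**k * d for k in powers])         # D, xt*D, ...
excluded = np.column_stack([T] + [xt**k * T for k in powers])      # T, xt*T, ...
rho0, rho1 = tsls(y, endog, exog, excluded)
print(f"effect at cutoff {rho0:.2f}, slope of effect {rho1:.2f}")
```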
The nonparametric version of fuzzy RD consists of IV estimation in a small neighborhood around the discontinuity. The reduced-form conditional expectation of Yi near xo is
$$E[Y_i \mid x_0 \leq x_i < x_0 + \delta] - E[Y_i \mid x_0 - \delta < x_i < x_0] \simeq \rho\pi.$$
Similarly, for the first stage for $D_i$, we have
$$E[D_i \mid x_0 \leq x_i < x_0 + \delta] - E[D_i \mid x_0 - \delta < x_i < x_0] \simeq \pi.$$
Therefore
$$\lim_{\delta \to 0} \frac{E[Y_i \mid x_0 \leq x_i < x_0 + \delta] - E[Y_i \mid x_0 - \delta < x_i < x_0]}{E[D_i \mid x_0 \leq x_i < x_0 + \delta] - E[D_i \mid x_0 - \delta < x_i < x_0]} = \rho. \tag{6.2.5}$$
The sample analog of (6.2.5) is a Wald estimator of the sort discussed in Section ??, in this case using $T_i$ as an instrument for $D_i$ in a $\delta$-neighborhood of $x_0$. As with other dummy-variable instruments, the result is a local average treatment effect. In particular, the Wald estimand for fuzzy RD captures the causal effect on compliers, defined as individuals whose treatment status changes as we move the value of $x_i$ from just to the left of $x_0$ to just to the right of $x_0$. This interpretation of fuzzy RD was introduced by Hahn, Todd, and van der Klaauw (2001). Note, however, that there is another sense in which this version of LATE is local: the estimates are for compliers with $x_i = x_0$, a feature of sharp nonparametric estimates as well.
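The sample analog of (6.2.5) is just a ratio of two differences in means. A minimal sketch, assuming NumPy, a fixed bandwidth `delta`, a hypothetical helper name `fuzzy_rd_wald`, and simulated data with a true effect of 2. Because outcomes trend in $x_i$ in this simulation, a fixed-bandwidth difference in means carries some bias, which is the finite-sample concern raised about this estimator.

```python
import numpy as np

def fuzzy_rd_wald(y, d, x, x0, delta):
    """Sample analog of (6.2.5) for a fixed bandwidth delta (hypothetical helper)."""
    right = (x >= x0) & (x < x0 + delta)
    left = (x > x0 - delta) & (x < x0)
    reduced_form = y[right].mean() - y[left].mean()  # estimates rho * pi
    first_stage = d[right].mean() - d[left].mean()   # estimates pi
    return reduced_form / first_stage

# Simulated data; all parameters are hypothetical
rng = np.random.default_rng(3)
n = 400_000
x = rng.uniform(-1.0, 1.0, n)
u = rng.uniform(size=n)                                  # unobserved confounder
d = (u < 0.2 + 0.1 * x + 0.4 * (x >= 0)).astype(float)  # fuzzy take-up
y = 1 + 0.5 * x + 2.0 * d + 3.0 * (u - 0.5) + rng.normal(size=n)

rho_wald = fuzzy_rd_wald(y, d, x, x0=0.0, delta=0.1)
print(f"Wald estimate: {rho_wald:.2f}")
```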
Finally, note that as with the nonparametric version of sharp RD, the finite-sample behavior of the sample analog of (6.2.5) is not likely to be very good. Hahn, Todd, and van der Klaauw (2001) develop a nonparametric IV procedure using local linear regression to estimate the top and bottom of the Wald estimator with less bias. This takes us back to a 2SLS model with linear or polynomial controls, but the model is fit in a discontinuity sample using a data-driven bandwidth. The idea of using discontinuity samples informally also applies in this context: start with a parametric 2SLS setup in the full sample, say, based on (6.1.4). Then restrict the sample to points near the discontinuity and get rid of most or all of the polynomial controls. Ideally, 2SLS estimates in the discontinuity samples with few controls will be broadly consistent with the more precise estimates constructed using the larger sample.
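The local linear idea can be sketched by fitting a separate line on each side of the cutoff and taking the gap between the two fitted values at $x_0$, once for $Y_i$ and once for $D_i$. This is a simplified illustration, not the Hahn, Todd, and van der Klaauw procedure itself: it assumes a uniform kernel and a fixed, hand-picked bandwidth `h` instead of kernel weights and a data-driven bandwidth, and the helper names and simulated data are hypothetical.

```python
import numpy as np

def jump_at_cutoff(v, x, x0, h):
    """Local linear (uniform kernel) estimate of the jump in E[v | x] at x0."""
    intercepts = []
    for side in (x >= x0, x < x0):
        m = side & (np.abs(x - x0) < h)  # observations on one side within h
        X = np.column_stack([np.ones(m.sum()), x[m] - x0])
        intercepts.append(np.linalg.lstsq(X, v[m], rcond=None)[0][0])  # fit at x0
    return intercepts[0] - intercepts[1]

def fuzzy_rd_local_linear(y, d, x, x0, h):
    """Reduced-form jump divided by first-stage jump."""
    return jump_at_cutoff(y, x, x0, h) / jump_at_cutoff(d, x, x0, h)

# Simulated data with a true effect of 2; all parameters are hypothetical
rng = np.random.default_rng(3)
n = 400_000
x = rng.uniform(-1.0, 1.0, n)
u = rng.uniform(size=n)
d = (u < 0.2 + 0.1 * x + 0.4 * (x >= 0)).astype(float)
y = 1 + 0.5 * x + 2.0 * d + 3.0 * (u - 0.5) + rng.normal(size=n)

rho_ll = fuzzy_rd_local_linear(y, d, x, x0=0.0, h=0.2)
print(f"local linear estimate: {rho_ll:.2f}")
```

Fitting lines rather than means removes the bias that a linear trend in $x_i$ induces in a simple difference of means, at the cost of some extra variance.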
Angrist and Lavy (1999) use a fuzzy RD research design to estimate the effects of class size on children’s test scores, the same question addressed by the STAR experiment discussed in Chapter 2. Fuzzy RD is an especially powerful and flexible research design, a fact highlighted by the Angrist and Lavy study, which generalizes fuzzy RD in two ways relative to the discussion above. First, the causal variable of interest, class size, takes on many values. So the first stage exploits jumps in average class size instead of probabilities. Second, the Angrist and Lavy (1999) research design uses multiple discontinuities.
The Angrist and Lavy study begins with the observation that class size in Israeli schools is capped at 40. Students in a grade with up to 40 students can expect to be in classes as large as 40, but grades with 41 students are split into two classes, grades with 81 students are split into three classes, and so on. Angrist and Lavy call this "Maimonides’ Rule" since a maximum class size of 40 was first proposed by the medieval Talmudic scholar Maimonides. To formalize Maimonides’ Rule, let $m_{sc}$ denote the predicted class size (in a given grade) assigned to class $c$ in school $s$, where enrollment in the grade is denoted $e_s$. Assuming grade cohorts are split up into classes of equal size, the predicted class size that results from a strict application of Maimonides’ Rule is
$$m_{sc} = \frac{e_s}{\operatorname{int}\left[\frac{e_s - 1}{40}\right] + 1},$$
where int(x) is the integer part of a real number, $x$. This function, plotted with dotted lines in Figure 6.2.1 for fourth and fifth graders, has a sawtooth pattern with discontinuities (in this case, sharp drops in predicted class size) at integer multiples of 40. At the same time, $m_{sc}$ is clearly an increasing function of enrollment, $e_s$, between the discontinuities, making the enrollment variable an important control.
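The rule is easy to compute directly. A minimal sketch, assuming positive integer enrollments so that Python's floor division `//` gives the integer part in $\operatorname{int}[(e_s - 1)/40]$; the function name `maimonides` and the `cap` parameter are hypothetical:

```python
def maimonides(enrollment, cap=40):
    """Predicted class size m_sc under Maimonides' Rule.

    (enrollment - 1) // cap equals int[(e_s - 1) / cap] for positive integers.
    """
    n_classes = (enrollment - 1) // cap + 1
    return enrollment / n_classes

for e in (40, 41, 80, 81, 120, 121):
    print(e, maimonides(e))
```

The printed values (40.0, 20.5, 40.0, 27.0, 40.0, 30.25) show the sharp drops just past each multiple of 40, with predicted class size rising again between them.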
Angrist and Lavy exploit the discontinuities in Maimonides’ Rule by constructing 2SLS estimates of an equation like
$$y_{isc} = \alpha_0 + \alpha_1 d_s + \beta_1 e_s + \beta_2 e_s^2 + \cdots + \beta_p e_s^p + \rho n_{sc} + \eta_{isc}, \tag{6.2.6}$$
where $y_{isc}$ is $i$'s test score in school $s$ and class $c$, $n_{sc}$ is the size of this class, and $e_s$ is enrollment. In this version of fuzzy RD, $m_{sc}$ plays the role of $T_i$, $e_s$ plays the role of $x_i$, and class size, $n_{sc}$, plays the role of $D_i$. Angrist and Lavy also include a non-enrollment covariate, $d_s$, to control for the proportion of students in the school from a disadvantaged background. This is not necessary for RD, since the only source of omitted variables bias in the RD model is $e_s$, but it makes the specification comparable to the model used to construct a corresponding set of OLS estimates.[100]
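To see the mechanics of (6.2.6), the sketch below runs 2SLS with $m_{sc}$ as the instrument for $n_{sc}$ on synthetic grade-level data. It is not Angrist and Lavy's data or code: the enrollment range, the true $\rho = -0.25$, the noise levels, and the helper `tsls` are all hypothetical, and enrollment enters linearly ($p = 1$). Unobserved school quality `u` lowers class size and raises scores, so OLS is biased, while $m_{sc}$ depends only on enrollment, which is controlled.

```python
import numpy as np

def tsls(y, endog, exog, excluded):
    """2SLS coefficients on the endogenous regressors (hypothetical helper)."""
    X = np.column_stack([exog, endog])
    Z = np.column_stack([exog, excluded])
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]  # first-stage fitted values
    return np.linalg.lstsq(X_hat, y, rcond=None)[0][exog.shape[1]:]

# Synthetic grade-level data; every number here is hypothetical
rng = np.random.default_rng(4)
n = 20_000
e = rng.integers(20, 160, n)                  # grade enrollment e_s
d = rng.uniform(0.0, 50.0, n)                 # percent disadvantaged d_s
m = e / ((e - 1) // 40 + 1)                   # Maimonides' Rule m_sc
u = rng.normal(size=n)                        # unobserved school quality
n_sc = m - 2.0 * u + rng.normal(0.0, 2.0, n)  # actual class size, fuzzy in m
y = 70 - 0.3 * d + 0.05 * e - 0.25 * n_sc + 3.0 * u + rng.normal(0.0, 5.0, n)

exog = np.column_stack([np.ones(n), d, e])    # constant, d_s, linear e_s
rho_ols = np.linalg.lstsq(np.column_stack([exog, n_sc]), y, rcond=None)[0][-1]
rho_2sls = tsls(y, n_sc[:, None], exog, m[:, None])[0]
print(f"OLS {rho_ols:.3f}, 2SLS {rho_2sls:.3f}")
```

Here the sawtooth in $m_{sc}$ is what separates the instrument from the linear enrollment control; without the discontinuities, $m_{sc}$ would be collinear with $e_s$.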
Figure 6.2.1 from Angrist and Lavy (1999) plots the average of actual and predicted class sizes against enrollment in fourth and fifth grade. Maimonides’ Rule does not predict class size perfectly because some schools split grades at enrollments lower than 40. This is what makes the RD design fuzzy. Still, there are clear drops in class size at enrollment levels of 40, 80, and 120. Note also that the $m_{sc}$ instrument neatly combines both discontinuities and slope-discontinuity interactions such as $\tilde{x}_iT_i$ in (6.2.4) in a single variable. This compact parametrization comes from a specific understanding of the institutions and rules that determine Israeli class size.
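Writing the rule segment by segment shows why a single variable captures both features. Within the $k$th segment the number of classes is fixed at $k$, so predicted class size is linear in enrollment with slope $1/k$:

$$m_{sc} = \frac{e_s}{k} \quad \text{for } 40(k-1) < e_s \leq 40k, \qquad k = 1, 2, \ldots$$

Crossing $e_s = 40k$ both shifts the level (from $40$ to $(40k+1)/(k+1)$, e.g. from 40 to 20.5 at the first cutoff) and flattens the slope (from $1/k$ to $1/(k+1)$), so $m_{sc}$ acts like a jump dummy and a slope interaction rolled into one regressor.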
Estimates of equation (6.2.6) for fifth-grade Math scores are reported in Table 6.2.1, beginning with OLS. With no controls, there is a strong positive relationship between class size and test scores. Most of this vanishes, however, when the percent disadvantaged in the school is included as a control. The correlation between class size and test scores shrinks to insignificance when enrollment is added as an additional control, as can be seen in column 3. Still, there is no evidence that smaller classes are better, as we might believe based on the results from the Tennessee STAR randomized trial.
Figure 6.2.1: The fuzzy-RD first-stage for regression-discontinuity estimates of the effect of class size on pupils’ test scores (from Angrist and Lavy, 1999)
In contrast with the OLS estimates in column 3, 2SLS estimates of similar specifications using $m_{sc}$ as an instrument for $n_{sc}$ strongly suggest that smaller classes increase test scores. These results, reported in column 4 for models that include a linear enrollment control and in column 5 for models that include a quadratic enrollment control, range from -.23 to -.26 with standard errors around .1. These results suggest that a 7-student reduction in class size (as in Tennessee STAR) raises Math scores by about 1.75 points, for an effect size of .18$\sigma$, where $\sigma$ is the standard deviation of class average scores. This is not too far from the Tennessee estimates.
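The back-of-the-envelope calculation behind the 1.75-point and .18$\sigma$ figures can be reproduced directly. This sketch assumes a per-student effect of -.25 (the middle of the reported range) and takes $\sigma = 9.6$ from the full-sample standard deviation of class-average scores in Table 6.2.1:

```python
rho = -0.25      # 2SLS effect per student, middle of the -.23 to -.26 range
reduction = 7    # class-size reduction in students, as in Tennessee STAR
sd_scores = 9.6  # s.d. of class-average scores (Table 6.2.1, full sample)

gain = -rho * reduction         # points gained from smaller classes
effect_size = gain / sd_scores  # gain in standard-deviation units
print(f"gain = {gain:.2f} points, effect size = {effect_size:.2f} sd")
# prints: gain = 1.75 points, effect size = 0.18 sd
```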
Importantly, the functional form of the enrollment control does not seem to matter very much (though estimates with no controls, not reported in the table, come out much smaller and insignificant). Columns 6 and 7 check the robustness of the main findings using a ±5 discontinuity sample. Not surprisingly, these results are much less precise than those reported in columns 4 and 5 since they were estimated with only about one-quarter of the data used to construct the full-sample estimates. Still, they bounce around the -.25 mark. Finally, the last column shows the results of estimation using an even narrower discontinuity sample, limited to schools with enrollment within 3 students of the discontinuities at 40, 80, and 120 (with dummy controls for which of these discontinuities is relevant). These are Wald estimates in the spirit of Hahn, Todd, and van der Klaauw (2001) and formula (6.2.5); the instrument used to construct these estimates is a dummy for being in a school with enrollment just to the right of the relevant discontinuity. The result is an imprecise -.270 (s.e. = .281), but still strikingly similar to the other estimates in the table. This set of estimates illustrates the high price to be paid in terms of precision when we shrink the sample around the discontinuities. Happily, however, the picture that emerges from Table 6.2.1 is fairly clear.
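Constructing a discontinuity sample amounts to flagging grades near each cutoff, recording which side they fall on, and noting which cutoff is relevant. The sketch below uses a hypothetical helper `discontinuity_sample` and assumes a window convention: because grades split between enrollments $c$ and $c + 1$, a window of width $w$ around cutoff $c$ is taken to be $c - w < e_s \leq c + w$, which for $w = 5$ reproduces the 36-45 and 76-85 ranges listed in Table 6.2.1. The returned `right` dummy and `segment` indicator can then serve as the instrument and the segment controls in a 2SLS routine.

```python
import numpy as np

def discontinuity_sample(enrollment, width, cutoffs=(40, 80, 120)):
    """Flag grades within `width` students of a cutoff (hypothetical helper).

    Window around cutoff c is c - width < e <= c + width, and "right of the
    discontinuity" means e > c. Assumes width < 20 so windows do not overlap.
    """
    e = np.asarray(enrollment)
    in_sample = np.zeros(e.shape, dtype=bool)
    right = np.zeros(e.shape, dtype=float)   # instrument: just past the cutoff
    segment = np.zeros(e.shape, dtype=int)   # which cutoff; 0 = outside all windows
    for k, c in enumerate(cutoffs, start=1):
        window = (e > c - width) & (e <= c + width)
        in_sample |= window
        right[window] = e[window] > c
        segment[window] = k
    return in_sample, right, segment

e = np.array([35, 36, 40, 41, 45, 46, 78, 81, 119, 121, 200])
in_sample, right, segment = discontinuity_sample(e, width=5)
print(e[in_sample], right[in_sample], segment[in_sample])
```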
Table 6.2.1: OLS and fuzzy RD estimates of the effects of class size on fifth grade math scores
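| | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) |
|---|---|---|---|---|---|---|---|---|
| Estimator | OLS | OLS | OLS | 2SLS | 2SLS | 2SLS | 2SLS | 2SLS |
| Sample | Full | Full | Full | Full | Full | ±5 discontinuity | ±5 discontinuity | ±3 discontinuity |
| Mean score | 67.3 | 67.3 | 67.3 | 67.3 | 67.3 | 67.0 | 67.0 | 67.0 |
| (s.d.) | (9.6) | (9.6) | (9.6) | (9.6) | (9.6) | (10.2) | (10.2) | (10.6) |
| Regressors | | | | | | | | |
| Class size | .322 | .076 | .019 | -.230 | -.261 | -.185 | -.443 | -.270 |
| | (.039) | (.036) | (.044) | (.092) | (.113) | (.151) | (.236) | (.281) |
| Percent disadvantaged | | -.340 | -.332 | -.350 | -.350 | -.459 | -.435 | |
| | | (.018) | (.018) | (.019) | (.019) | (.049) | (.049) | |
| Enrollment | | | .017 | .041 | .062 | | .079 | |
| | | | (.009) | (.012) | (.037) | | (.036) | |
| Enrollment squared/100 | | | | | -.010 | | | |
| | | | | | (.016) | | | |
| Segment 1 (enrollment 36-45) | | | | | | | | -12.6 |
| | | | | | | | | (3.80) |
| Segment 2 (enrollment 76-85) | | | | | | | | -2.89 |
| | | | | | | | | (2.41) |
| Root MSE | 9.36 | 8.32 | 8.30 | 8.40 | 8.42 | 8.79 | 9.10 | 10.2 |
| R-squared | .048 | .249 | .252 | | | | | |
| N | 2,018 | 2,018 | 2,018 | 2,018 | 2,018 | 471 | 471 | 302 |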
Notes: Adapted from Angrist and Lavy (1999). The table reports estimates of equation (6.2.6) in the text using class averages. Standard errors, reported in parentheses, are corrected for within-school correlation.