Mostly Harmless Econometrics: An Empiricist’s Companion
Regression DD
As with the fixed effects model, we can use regression to estimate equations like (5.2.2). Let $NJ_s$ be a dummy for restaurants in New Jersey and $d_t$ be a time dummy that switches on for observations obtained in November (i.e., after the minimum wage change). Then
$$Y_{ist} = \alpha + \gamma NJ_s + \lambda d_t + \beta (NJ_s \cdot d_t) + \varepsilon_{ist} \tag{5.2.3}$$
is the same as (5.2.2), where $NJ_s \cdot d_t = D_{st}$. In the language of Section 3.1.4, this model includes two main effects for state and year and an interaction term that marks observations from New Jersey in November. This is a saturated model since the conditional mean function $E(Y_{ist} \mid s, t)$ takes on four possible values and there are four parameters. The link between the parameters in the regression equation, (5.2.3), and those in the DD model for the conditional mean function, (5.2.2), is
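$$
\begin{aligned}
\alpha &= E(Y_{ist} \mid s = PA, t = Feb) = \gamma_{PA} + \lambda_{Feb} \\
\gamma &= E(Y_{ist} \mid s = NJ, t = Feb) - E(Y_{ist} \mid s = PA, t = Feb) = \gamma_{NJ} - \gamma_{PA} \\
\lambda &= E(Y_{ist} \mid s = PA, t = Nov) - E(Y_{ist} \mid s = PA, t = Feb) = \lambda_{Nov} - \lambda_{Feb} \\
\beta &= \{E(Y_{ist} \mid s = NJ, t = Nov) - E(Y_{ist} \mid s = NJ, t = Feb)\} \\
&\quad - \{E(Y_{ist} \mid s = PA, t = Nov) - E(Y_{ist} \mid s = PA, t = Feb)\}.
\end{aligned}
$$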
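The link between the saturated regression and the four cell means can be checked numerically. The following is a minimal sketch on simulated data; the sample size, parameter values, and variable names are assumptions made for illustration and are not taken from the New Jersey/Pennsylvania study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical micro data spread over four state-by-period cells.
n = 400
nj = rng.integers(0, 2, size=n)      # 1 = New Jersey, 0 = Pennsylvania
post = rng.integers(0, 2, size=n)    # 1 = November, 0 = February

# Made-up parameter values, used only to generate the simulated outcome.
alpha, gamma, lam, beta = 20.0, -2.0, -1.5, 2.5
y = alpha + gamma * nj + lam * post + beta * nj * post + rng.normal(0, 3, n)

# OLS on the saturated model: constant, two main effects, one interaction.
X = np.column_stack([np.ones(n), nj, post, nj * post])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)


def cell_mean(s, t):
    """Sample mean of y in the cell with state dummy s and period dummy t."""
    return y[(nj == s) & (post == t)].mean()


# Difference-in-differences computed directly from the four cell means.
dd = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

print("interaction coefficient:", coef[3])
print("DD from cell means:     ", dd)
```

Because the model is saturated, the interaction coefficient and the hand-computed DD agree up to floating-point error, and the other three coefficients match the corresponding cell-mean contrasts in the same way.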
The regression formulation of the difference-in-differences model offers a convenient way to construct DD estimates and standard errors. It’s also easy to add states or periods to the regression set-up. We might, for example, add control states and pre-treatment periods to the New Jersey/Pennsylvania sample. The resulting generalization of (5.2.3) includes a dummy for each state and period but is otherwise unchanged.
A second advantage of regression-DD is that it facilitates empirical work with regressors other than switched-on/switched-off dummy variables. Instead of New Jersey and Pennsylvania in 1992, for example, we might look at all state minimum wages in the United States. Some of these are a little higher than the federal minimum (which covers everyone regardless of where they live), some are a lot higher, and some are the same. The minimum wage is therefore a variable with differing "treatment intensity" across states and over time. Moreover, in addition to statutory variation in state minima, the local importance of a minimum wage varies with average state wage levels. For example, the early-1990s federal minimum of $4.25 was probably irrelevant in Connecticut, with its high average wages, but a big deal in Mississippi.
Card (1992) exploits regional variation in the impact of the federal minimum wage. His approach is motivated by an equation like
$$Y_{ist} = \gamma_s + \lambda_t + \beta (FA_s \cdot d_t) + \varepsilon_{ist} \tag{5.2.4}$$
where the variable $FA_s$ is a measure of the fraction of teenagers likely to be affected by a minimum wage increase in each state and $d_t$ is a dummy for observations after 1990, when the federal minimum increased from $3.35 to $3.80. The $FA_s$ variable measures the baseline (pre-increase) proportion of each state’s teen labor force earning less than $3.80.
As in the New Jersey/Pennsylvania study, Card (1992) works with data from two periods, before and after, in this case 1989 and 1992. But this study uses 51 states (including the District of Columbia), for a total of 102 state-year observations. Since there are no individual-level covariates in (5.2.4), this is the same as estimation with micro data (provided the group-level estimates are weighted by cell size). Note that $FA_s \cdot d_t$ is an interaction term, like $NJ_s \cdot d_t$ in (5.2.3), though here the interaction term takes on a distinct value for each observation in the data set. Finally, because Card (1992) analyzes data for only two periods, the reported estimates are from an equation in first-differences:
$$\Delta Y_s = \lambda^* + \beta FA_s + \Delta\varepsilon_s,$$
where $\Delta Y_s$ is the change in average teen employment in state $s$ and $\Delta\varepsilon_s$ is the error term in the differenced equation.
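With only two periods, the first-difference regression and the levels regression with state effects, a period effect, and the interaction term give the same slope on the fraction-affected variable. The sketch below illustrates this on simulated state-level data. All numbers and names are hypothetical, and the regressions are unweighted for simplicity, whereas Card (1992) weights by CPS sample size.

```python
import numpy as np

rng = np.random.default_rng(1)

n_states = 51
fa = rng.uniform(0.1, 0.7, size=n_states)   # hypothetical fraction affected
state_fx = rng.normal(0, 1, size=n_states)  # hypothetical state effects
beta_true = 0.15                            # illustrative value only

# Two periods per state: before (t = 0) and after (t = 1).
y0 = state_fx + rng.normal(0, 0.05, n_states)
y1 = state_fx + 0.02 + beta_true * fa + rng.normal(0, 0.05, n_states)

# (a) First differences: change in Y on a constant and FA_s.
dX = np.column_stack([np.ones(n_states), fa])
fd_coef, *_ = np.linalg.lstsq(dX, y1 - y0, rcond=None)

# (b) Levels: state dummies, a post-period dummy, and FA_s * d_t.
y = np.concatenate([y0, y1])
post = np.concatenate([np.zeros(n_states), np.ones(n_states)])
state_dummies = np.tile(np.eye(n_states), (2, 1))
X = np.column_stack([state_dummies, post, np.tile(fa, 2) * post])
fe_coef, *_ = np.linalg.lstsq(X, y, rcond=None)

print("first-difference slope:", fd_coef[1])
print("levels interaction:    ", fe_coef[-1])
```

The constant in the differenced equation corresponds to the coefficient on the post-period dummy in the levels equation, which is why both pairs of coefficients coincide.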
Table 5.2.2, based on Table 3 in Card (1992), shows that wages increased more in states where the minimum wage increase is likely to have had more bite (see the estimate of 0.15 in column 1). This is an important step in Card’s analysis: it verifies the notion that the fraction-affected variable is a good predictor of the wage changes induced by an increase in the federal minimum. Employment, on the other hand, seems largely unrelated to fraction affected, as can be seen in column 3. Thus, the results in Card (1992) are in line with the results from the New Jersey/Pennsylvania study.
Table 5.2.2: Regression-DD estimates of minimum wage effects on teens, 1989 to 1992
[Column groups: Equations for Change in Mean Log Wage; Equations for Change in Teen Employment-Population Ratio. Table body not recovered.]
Notes: Adapted from Card (1992). The table reports estimates from a regression of the change in average teen employment by state on the fraction of teens affected by a change in the federal minimum wage in each state. Data are from the 1989 and 1992 CPS. Regressions are weighted by the CPS sample size by state and year.
Card’s (1992) analysis illustrates a further advantage of regression-DD: it’s easy to add covariates in this framework. For example, we might like to control for adult employment as a source of omitted state-specific trends. In other words, we can model counterfactual employment in the absence of a change in the minimum wage as
$$E[Y_{0ist} \mid s, t, X_{st}] = \gamma_s + \lambda_t + X_{st}'\delta,$$
where $X_{st}$ is a vector of state-and-time-varying covariates, including adult employment (though this may not be kosher if adult employment also responds to the minimum wage change, in which case it’s a bad control; see Section 3.2.3). As it turns out, the addition of an adult employment control has little effect on Card’s estimates, as can be seen in columns 2 and 4 in Table 5.2.2.
It’s worth emphasizing the fact that Card (1992) analyzes state averages instead of individual data. He might have used a pooled multi-year sample of micro data from the CPS to estimate an equation like
$$Y_{ist} = \gamma_s + \lambda_t + \beta (FA_s \cdot d_t) + X_{ist}'\delta + \varepsilon_{ist}, \tag{5.2.5}$$
where $X_{ist}$ can include individual-level characteristics such as race. The covariate vector might also include time-varying variables measured at the state level. Only the latter are likely to be a source of omitted variables bias, but individual-level controls can increase precision, a point we noted in Section 2.3. Inference is a little more complicated in a framework that combines micro data on dependent variables with group-level regressors, however. The key issue is how best to adjust for possible group-level random effects, as we discuss in Chapter 8, below.
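The earlier remark that, without individual-level covariates, a regression on state-year averages weighted by cell size reproduces the micro-data estimates can also be verified on simulated data. The sketch below assumes made-up cell sizes and parameter values. It compares point estimates only and says nothing about standard errors.

```python
import numpy as np

rng = np.random.default_rng(2)

n_states = 10
fa = rng.uniform(0.1, 0.7, n_states)                 # hypothetical FA_s
cell_n = rng.integers(20, 200, size=(n_states, 2))   # obs per state-period cell

# Simulate micro data with state effects, a period effect, and FA_s * d_t.
rows = []
for s in range(n_states):
    for t in (0, 1):
        n = cell_n[s, t]
        y = 0.5 * s + 0.1 * t + 0.15 * fa[s] * t + rng.normal(0, 1, n)
        rows.append(np.column_stack([np.full(n, s), np.full(n, t), y]))
micro = np.vstack(rows)
s_i, t_i, y_i = micro[:, 0].astype(int), micro[:, 1], micro[:, 2]


def design(s, t):
    """State dummies, a post-period dummy, and the FA_s * d_t interaction."""
    return np.column_stack([np.eye(n_states)[s], t, fa[s] * t])


# (a) OLS on the micro data.
b_micro, *_ = np.linalg.lstsq(design(s_i, t_i), y_i, rcond=None)

# (b) Weighted least squares on cell means, with weights equal to cell size.
s_g = np.repeat(np.arange(n_states), 2)
t_g = np.tile([0.0, 1.0], n_states)
ybar = np.array([y_i[(s_i == s) & (t_i == t)].mean() for s, t in zip(s_g, t_g)])
sw = np.sqrt(cell_n.reshape(-1).astype(float))       # same cell order as s_g, t_g
b_group, *_ = np.linalg.lstsq(design(s_g, t_g) * sw[:, None], ybar * sw, rcond=None)

print("micro interaction:  ", b_micro[-1])
print("grouped interaction:", b_group[-1])
```

Weighted least squares is implemented here by scaling each row by the square root of its weight, which is a standard way to obtain WLS coefficients from an ordinary least-squares routine.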
When the sample includes many years, the regression-DD model lends itself to a test for causality in the spirit of Granger (1969). The Granger idea is to see whether causes happen before consequences and not vice versa (though as we know from the epigram at the beginning of Chapter 4, this alone is not sufficient for causal inference). Suppose the policy variable of interest, $D_{st}$, changes at different times in different states. In this context, Granger causality testing means a check on whether, conditional on state and year effects, past $D_{st}$ predicts $Y_{ist}$ while future $D_{st}$ does not. If $D_{st}$ causes $Y_{ist}$ but not vice versa, then leads should not matter in an equation like:
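$$Y_{ist} = \gamma_s + \lambda_t + \sum_{\tau=0}^{m} \beta_{-\tau} D_{s,t-\tau} + \sum_{\tau=1}^{q} \beta_{+\tau} D_{s,t+\tau} + X_{ist}'\delta + \varepsilon_{ist}, \tag{5.2.6}$$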
where the sums on the right-hand side allow for $m$ lags ($\beta_{-1}, \beta_{-2}, \ldots, \beta_{-m}$), or post-treatment effects, and $q$ leads ($\beta_{+1}, \beta_{+2}, \ldots, \beta_{+q}$), or anticipatory effects. The pattern of lagged effects is usually of substantive interest as well. We might, for example, believe that causal effects should grow or fade as time passes.
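A specification with leads and lags in the spirit of (5.2.6) can be sketched on a simulated state-year panel. The code below is an illustration under several assumptions that are not in the text: the policy variable is coded as a set of event-time dummies relative to a hypothetical adoption year, the final lag is an "m or more years after" category, some states never adopt, and the true effects (zero before adoption, growing and then flat afterward) are invented.

```python
import numpy as np

rng = np.random.default_rng(3)

n_states, n_years = 40, 20
m, q = 4, 2                                    # lags 0..m and leads 1..q

# Hypothetical adoption years; -1 marks states that never adopt.
adopt = np.where(np.arange(n_states) < 30,
                 rng.integers(6, 15, size=n_states), -1)
true_lag = np.array([0.5, 1.0, 1.5, 2.0, 2.0])  # event times 0, 1, 2, 3, 4+

s = np.repeat(np.arange(n_states), n_years)
t = np.tile(np.arange(n_years), n_states)
k = t - adopt[s]                               # years since adoption
treated = adopt[s] >= 0

# Event-time dummies: leads (k = -q..-1), lags (k = 0..m-1), and k >= m.
cols = []
for j in range(-q, m + 1):
    d = treated & ((k == j) if j < m else (k >= m))
    cols.append(d.astype(float))
D = np.column_stack(cols)

# Outcome: state effects, year effects, lagged effects, no anticipation.
y = (rng.normal(0, 1, n_states)[s] + rng.normal(0, 1, n_years)[t]
     + D[:, q:] @ true_lag + rng.normal(0, 0.1, s.size))

state_d = np.eye(n_states)[s]
year_d = np.eye(n_years)[t][:, 1:]             # drop one year dummy
X = np.column_stack([state_d, year_d, D])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

b = coef[-D.shape[1]:]
leads, lags = b[:q], b[q:]
print("lead coefficients:", leads)
print("lag coefficients: ", lags)
```

In this simulated setting the lead coefficients should be close to zero and the lag coefficients should trace out the assumed post-adoption pattern. Plotting the two sets of coefficients against event time gives the kind of picture used to judge whether effects appear only after the policy change.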
Autor (2003) implements the Granger test in an investigation of the effect of employment protection on firms’ use of temporary help. Employment protection is a type of labor law, promulgated by state legislatures or, more typically, through common law as made by state courts, that makes it harder to fire workers. As a rule, U.S. labor law allows "employment at will," which means that workers can be fired for just cause or no cause, at the employer’s whim. But some state courts have allowed a number of exceptions to the employment-at-will doctrine, leading to lawsuits for "unjust dismissal." Autor is interested in whether fear of employee lawsuits makes firms more likely to use temporary workers for tasks for which they would otherwise have increased their workforce. Temporary workers are employed by someone other than the firm for which they perform tasks. As a result, the firm using them cannot be sued for unjust dismissal when it lets temporary workers go.
Autor’s empirical strategy relates the employment of temporary workers in a state to dummy variables indicating state court rulings that allow exceptions to the employment-at-will doctrine. His regression-DD model includes both leads and lags, as in equation (5.2.6). The estimated leads and lags, running from two years ahead to four years behind, are plotted in Figure 5.2.4, a reproduction of Figure 3 from Autor (2003). The estimates show no effects in the two years before the courts adopted an exception, with sharply increasing effects on temporary employment in the first few years after the adoption, which then appear to flatten out with a permanently higher rate of temporary employment in affected states. This pattern seems consistent with a causal interpretation of Autor’s results.
An alternative check on the DD identification strategy adds state-specific time trends to the regressors in $X_{ist}$. In other words, we estimate
$$Y_{ist} = \gamma_{0s} + \gamma_{1s} t + \lambda_t + \beta D_{st} + X_{ist}'\delta + \varepsilon_{ist}, \tag{5.2.7}$$
where $\gamma_{0s}$ is a state-specific intercept as before and $\gamma_{1s}$ is a state-specific trend coefficient multiplying the time-trend variable, $t$. This allows treatment and control states to follow different trends in a limited but potentially revealing way. It’s heartening to find that the estimated effects of interest are unchanged by the inclusion of these trends, and discouraging otherwise. Note, however, that we need at least three periods to estimate a model with state-specific trends. Moreover, in practice, three periods are typically inadequate to pin down both the trends and the treatment effect. As a rule, DD estimation with state-specific trends is likely to be more robust and convincing when the pre-treatment data establish a clear trend that can be extrapolated into the post-treatment period.
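The role of state-specific trends as a robustness check can be illustrated with a simulated panel in which the policy has no effect but is adopted by states whose outcomes are already declining. The setup below is entirely hypothetical (number of states and years, trend slopes, adoption date) and is meant only as a sketch of a specification like (5.2.7) without covariates.

```python
import numpy as np

rng = np.random.default_rng(4)

n_states, n_years = 16, 30
s = np.repeat(np.arange(n_states), n_years)
t = np.tile(np.arange(n_years), n_states)

# Hypothetical setup: the first 8 states have declining outcomes and adopt
# the policy in year 15. The true policy effect is zero.
slope = np.where(np.arange(n_states) < 8, -0.05, 0.0)
D = ((s < 8) & (t >= 15)).astype(float)
y = (rng.normal(0, 1, n_states)[s] + rng.normal(0, 0.2, n_years)[t]
     + slope[s] * t + 0.0 * D + rng.normal(0, 0.05, s.size))

state_d = np.eye(n_states)[s]
year_d = np.eye(n_years)[t][:, 1:]             # drop one year dummy
# State-specific linear trends; one is dropped because the full set sums to
# the common time trend, which the year dummies already absorb.
trend_d = (state_d * t[:, None])[:, 1:]


def policy_coef(X):
    """OLS coefficient on the last column of X (the policy dummy)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[-1]


b_no_trend = policy_coef(np.column_stack([state_d, year_d, D]))
b_trend = policy_coef(np.column_stack([state_d, year_d, trend_d, D]))

print("DD estimate without state trends:", b_no_trend)
print("DD estimate with state trends:   ", b_trend)
```

Without trends, the estimate picks up the pre-existing decline in adopting states and looks like a negative policy effect. Adding state-specific trends moves the estimate toward the true value of zero in this simulation.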
In a study of the effect of labor regulation on businesses in Indian states, Besley and Burgess (2004) use state trends as a robustness check. Different states change regulatory regimes at different times, giving rise to a DD research design. As in Card (1992), the unit of observation in Besley and Burgess (2004) is a state-year average. Table 5.2.3 (based on Table IV in their paper) reproduces the key results.
The estimates in column 1, from a regression-DD model without state-specific trends, suggest that labor regulation leads to lower output per capita. The models used to construct the estimates in columns 2 and 3 add time-varying state-specific covariates like government expenditure per capita and state population. This is in the spirit of Card’s (1992) addition of state-level adult employment rates as a control in the minimum wage study. The addition of controls affects the Besley and Burgess estimates little. But the addition of state-specific trends kills the labor-regulation effect, as can be seen in column 4. Apparently, labor regulation in India increases in states where output is declining anyway. Controlling for this trend therefore drives the estimated regulation effect to zero.
[Figure 5.2.4, reproduced from Figure 3 in Autor (2003). Horizontal axis: time passage relative to year of adoption of implied contract exception.]
Table 5.2.3: Effect of labor regulation on the performance of firms in Indian states
Notes: Adapted from Besley and Burgess (2004), Table IV. The table reports regression-DD estimates of the effects of labor regulation on productivity. The dependent variable is log manufacturing output per capita. All models include state and year effects. Robust standard errors clustered at the state level are reported in parentheses. State amendments to the Industrial Disputes Act are coded 1 = pro-worker, 0 = neutral, -1 = pro-employer and then cumulated over the period to generate the labor regulation measure. Log of installed electrical capacity is measured in kilowatts, and log development expenditure is real per capita state spending on social and economic services. Congress, hard left, Janata, and regional majority are counts of the number of years for which these political groupings held a majority of the seats in the state legislatures. The data are for the sixteen main states for the period 1958-1992. There are 552 observations.
Picking Controls
We’ve labeled the two dimensions in the DD set-up “states” and “time” because this is the archetypical DD example in applied econometrics. But the DD idea is much more general. Instead of states, the subscript $s$ might denote demographic groups, some of which are affected by a policy while others are not. For example, Kugler, Jimeno, and Hernanz (2005) look at the effects of age-specific employment protection policies in Spain. Likewise, instead of time, we might group data by cohort or other types of characteristics. An example is Angrist and Evans (1999), who study the effect of changes in state abortion laws on teen pregnancy using variation by state and year of birth. Whatever the dimensions, however, DD designs always set up an implicit treatment-control comparison. The question of whether this comparison is a good one deserves careful consideration.
One potential pitfall in this context arises when the composition of the implicit treatment and control groups changes as a result of treatment. Going back to a design based on state/time comparisons, suppose we’re interested in the effects of the generosity of public assistance on labor supply. Historically, U.S. states have offered widely varying welfare payments to poor unmarried mothers. Labor economists have long been interested in the effects of such income maintenance policies: how much of an increase in living standards they facilitate, and whether they make work less attractive (see, e.g., Meyer and Rosenbaum, 2001, for a recent study). A concern here, emphasized in a review of research on welfare by Moffitt (1992), is that poor people who would in any case have weak labor force attachment might move to states with more generous welfare benefits. In a DD research design, this sort of program-induced migration tends to make generous welfare programs look worse for labor supply than they really are.
Migration problems can usually be fixed if we know where an individual starts out. Say we know state of residence in the period before treatment, or state of birth. State of birth and previous state of residence are unchanged by the treatment but still highly correlated with current state of residence. The problem of migration is therefore eliminated in comparisons using these dimensions instead of state of residence. This introduces a new problem, which is that individuals who do move are incorrectly located. In practice, however, this problem is easily addressed with the IV methods discussed in Chapter 4 (state of birth or previous residence is used to construct instruments for current location).
A modification of the two-by-two DD set-up uses higher-order contrasts to draw causal inferences. An example is the extension of Medicaid coverage in the U.S. studied by Yelowitz (1995). Eligibility for Medicaid, the massive U.S. health insurance program for the poor, was once tied to eligibility for AFDC, a large cash welfare program. At various times in the 1980s, however, some states extended Medicaid coverage to children in families ineligible for AFDC. Yelowitz was interested in how this expansion affected, among other things, mothers’ labor force participation and earnings.
In addition to state and time, children’s age provides a third dimension in which Medicaid policy varies. Yelowitz exploits this variation by estimating
$$Y_{iast} = \gamma_{st} + \lambda_{at} + \theta_{as} + \beta d_{ast} + X_{iast}'\delta + \varepsilon_{iast},$$
where $s$ indexes states, $t$ indexes time, and $a$ is the age of the youngest child in a family. This model provides full non-parametric control for state-specific time effects that are common across age groups ($\gamma_{st}$), time-varying