Using gretl for Principles of Econometrics, 4th Edition
Differences-in-Differences Estimation
If you want to learn how a change in policy affects outcomes, nothing beats a randomized controlled experiment. Unfortunately, these are rare in economics because they are either very expensive or morally unacceptable. No one wants to determine the return to schooling by randomly assigning people a prescribed number of schooling years. That choice should be yours and not someone else's.
But, the evaluation of policy is not hopeless when randomized controlled experiments are impossible. Life provides us with situations that happen to different groups of individuals at different points in time. Such events are not really random, but from a statistical point of view the treatment may appear to be randomly assigned. That is what so-called natural experiments are about. You have two groups of similar people. For whatever reason, one group gets treated to the policy and the other does not. Comparative differences are attributed to the policy.
In this example, we will look at the effects of a change in the minimum wage. It is made possible because the minimum wage was raised in one state and not in another. The similarity of the states is important because the non-treated state is going to be used for comparison.
The data come from Card and Krueger and are found in the file njmin3.gdt. We will open it and look at the summary statistics by state.
1 open "@gretldir\data\poe\njmin3.gdt"
2 smpl d = 0 --restrict
3 summary fte --by=nj --simple
4 smpl full
5 smpl d = 1 --restrict
6 summary fte --by=nj --simple
7 smpl full
Since we want a picture of what happened in NJ and PA before and after NJ raised its minimum wage, we first restrict the sample to the period before the increase (d=0) and get the summary statistics for fte by state in line 3. We then restore the full sample, restrict it to the period after the policy (d=1), and repeat the summary statistics for fte. The results suggest not much difference at this point.
nj = 0 (n = 79), d=0:

          Mean     Minimum     Maximum   Std. Dev.
fte     23.331      7.5000      70.500      11.856

nj = 1 (n = 331), d=0:

          Mean     Minimum     Maximum   Std. Dev.
fte     20.439      5.0000      85.000      9.1062

nj = 0 (n = 79), d=1:

          Mean     Minimum     Maximum   Std. Dev.
fte     21.166     0.00000      43.500      8.2767

nj = 1 (n = 331), d=1:

          Mean     Minimum     Maximum   Std. Dev.
fte     21.027     0.00000      60.500      9.2930
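The difference-in-differences estimate can be computed by hand from the four group means above: the change in NJ minus the change in PA. A quick sketch of that arithmetic (in Python rather than gretl, with the means typed in from the tables):

```python
# Difference-in-differences from the group means reported above.
mean_pa_before = 23.331   # nj=0, d=0
mean_nj_before = 20.439   # nj=1, d=0
mean_pa_after  = 21.166   # nj=0, d=1
mean_nj_after  = 21.027   # nj=1, d=1

# Change within each state, then the difference of the changes
change_nj = mean_nj_after - mean_nj_before   # about 0.588
change_pa = mean_pa_after - mean_pa_before   # about -2.165
delta = change_nj - change_pa

print(round(delta, 3))  # 2.753
```

Note that this matches the coefficient on nj in the differenced regression reported later in the chapter.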
Now, create the variable lists and run a few regressions:

1 list x1 = const nj d d_nj
2 list x2 = x1 kfc roys wendys co_owned
3 list x3 = x2 southj centralj pa1
4 summary x1 fte
5 ols fte x1
6 modeltab add
7 ols fte x2
8 modeltab add
9 ols fte x3
10 modeltab add
11 modeltab show
The first set of variables includes the indicator variables nj, d, and their interaction d_nj. The second set adds indicators for whether the restaurant is a kfc, roys, or wendys and whether the store is company owned. The final set adds indicators for location.
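The interaction d_nj is what identifies the treatment effect: in the regression fte = b1 + b2 nj + b3 d + delta (nj x d) + e, the least squares estimate of delta is algebraically identical to the difference-in-differences of the four group means. A small sketch verifying this on synthetic data (numpy assumed; this is not the Card-Krueger sample):

```python
import numpy as np

# Synthetic data for illustration only
rng = np.random.default_rng(42)
n = 400
nj = rng.integers(0, 2, n)          # state indicator
d = rng.integers(0, 2, n)           # after-policy indicator
y = 20 - 2*nj + 1*d + 3*nj*d + rng.normal(0, 1, n)

# OLS on constant, nj, d, and the interaction
X = np.column_stack([np.ones(n), nj, d, nj*d])
beta = np.linalg.lstsq(X, y, rcond=None)[0]

# Difference-in-differences of the four group means
did = (y[(nj == 1) & (d == 1)].mean() - y[(nj == 1) & (d == 0)].mean()) \
    - (y[(nj == 0) & (d == 1)].mean() - y[(nj == 0) & (d == 0)].mean())

print(np.isclose(beta[3], did))  # True: interaction coefficient equals the DiD
```

Because the model is saturated in the two indicators, this equality is exact, not an approximation.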
The results from the three regressions appear below:
OLS estimates
Dependent variable: fte

[model table not reproduced here]

Standard errors in parentheses
* indicates significance at the 10 percent level
** indicates significance at the 5 percent level
In the previous analysis we did not exploit an important feature of Card and Krueger's data: the same restaurants were observed before and after in both states, in 384 of the 410 cases. It seems reasonable to limit the before-and-after comparison to the same units.
This requires adding an individual fixed effect to the model and dropping observations that have no before or after with which to compare.
1 smpl missing(demp) != 1 --restrict
2 ols demp const nj
Fortunately, the data set includes the change in fte, where it is called demp. Dropping the observations for which demp is missing and using least squares to estimate the parameters of the simple regression yields:
demp = -2.28333 + 2.75000 nj
(0.73126) (0.81519)
T = 768   R² = 0.0134   F(1, 766) = 11.380   σ̂ = 8.9560
(standard errors in parentheses)
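The coefficient on nj in this differenced regression is again the difference-in-differences estimator: with the before/after change as the dependent variable, the intercept is the mean change among the PA stores and the slope is the NJ minus PA difference in mean changes. A sketch with synthetic first differences (numpy assumed; the group sizes and effect sizes below are illustrative, not Card and Krueger's):

```python
import numpy as np

# Synthetic first differences for illustration only
rng = np.random.default_rng(1)
nj = np.repeat([0, 1], [79, 305])                 # PA then NJ stores
demp = np.where(nj == 1, 0.47, -2.28) + rng.normal(0, 8, nj.size)

# OLS of the change on a constant and the state dummy
X = np.column_stack([np.ones(nj.size), nj])
b1, b2 = np.linalg.lstsq(X, demp, rcond=None)[0]

# Intercept = PA mean change; slope = difference in mean changes
assert np.isclose(b1, demp[nj == 0].mean())
assert np.isclose(b2, demp[nj == 1].mean() - demp[nj == 0].mean())
```

As with the interaction regression, these identities are exact consequences of least squares with a single dummy regressor.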
# obtain summary statistics for full sample
smpl full
summary
# create indicator variable for large homes
series ld = (sqft>25)
discrete ld
smpl 1 8
print ld sqft --byobs
smpl full
# create interaction and estimate model
series sqft_utown = sqft*utown
ols price const utown sqft sqft_utown age pool fplace
# generate some marginal effects
scalar premium = $coeff(utown)*1000
scalar sq_u = 10*($coeff(sqft)+$coeff(sqft_utown))
scalar sq_other = 10*$coeff(sqft)
scalar depr = 1000*$coeff(age)
scalar sp = 1000*$coeff(pool)
scalar firep = 1000*$coeff(fplace)
printf "\n University Premium = $%8.7g\n\
 Marginal effect of sqft near University = $%7.6g\n\
 Marginal effect of sqft elsewhere = $%7.6g\n\
 Depreciation Rate = $%7.2f\n\
 Pool = $%7.2f\n\
 Fireplace = $%7.2f\n", premium, sq_u, sq_other, depr, sp, firep
omit sqft_utown
# testing joint hypotheses
open "@gretldir\data\poe\cps4_small.gdt"
series blk_fem = black*female
ols wage const educ black female blk_fem
restrict
    b[3]=0
    b[4]=0
    b[5]=0
end restrict
ols wage const educ black female blk_fem south midwest west
omit south midwest west
scalar sser = $ess
# creation of interactions using a loop
list x = const educ black female blk_fem
list dx = null
loop foreach i x
    series south_$i = south * $i
    list dx = dx south_$i
endloop
modeltab clear
ols wage x dx
scalar sseu = $ess
scalar dfu = $df
modeltab add
# estimating subsets
smpl south=1 --restrict
ols wage x
modeltab add
smpl full
smpl south=0 --restrict
ols wage x
modeltab add
modeltab show
# Chow tests
smpl full
ols wage x
scalar sser = $ess
scalar fstat = ((sser-sseu)/5)/(sseu/dfu)
pvalue f 5 dfu fstat
ols wage x
chow south --dummy
# log-linear model--interpretation
open "@gretldir\data\poe\cps4_small.gdt"
logs wage
ols l_wage const educ female
scalar differential = 100*(exp($coeff(female))-1)
# linear probability model with HCCME
open "@gretldir\data\poe\coke.gdt"
ols coke const pratio disp_coke disp_pepsi --robust
# treatment effects
open "@gretldir\data\poe\star.gdt"
list v = totalscore small tchexper boy freelunch \
    white_asian tchwhite tchmasters schurban schrural
summary v --by=small --simple
summary v --by=regular --simple
smpl aide != 1 --restrict
list x1 = const small
list x2 = x1 tchexper
list x3 = x1 boy freelunch white_asian
list x4 = x1 tchwhite tchmasters schurban schrural
ols totalscore x1 --quiet
modeltab add
ols totalscore x2 --quiet
modeltab add
ols totalscore x3 --quiet
modeltab add
ols totalscore x4 --quiet
modeltab add
modeltab show
modeltab free
# manual creation of multiple indicators for school id
discrete schid
list d = dummify(schid)
ols totalscore x1 --quiet
scalar sser = $ess
scalar r_df = $df
modeltab add
ols totalscore x2 --quiet
modeltab add
ols totalscore x1 d --quiet
scalar sseu = $ess
scalar u_df = $df
modeltab add
ols totalscore x2 d --quiet
modeltab add
modeltab show
modeltab free
scalar J = r_df-u_df
scalar fstat = ((sser - sseu)/J)/(sseu/u_df)
pvalue f J u_df fstat
# testing random assignment of students
ols small const boy white_asian tchexper freelunch
restrict
    b[1] = .5
end restrict
# differences-in-differences
open "@gretldir\data\poe\njmin3.gdt"
smpl d = 0 --restrict
summary fte --by=nj --simple
smpl full
smpl d = 1 --restrict
summary fte --by=nj --simple
smpl full
list x1 = const nj d d_nj
list x2 = x1 kfc roys wendys co_owned
list x3 = x2 southj centralj pa1
summary x1 fte
ols fte x1
modeltab add
ols fte x2
modeltab add
ols fte x3
modeltab add
modeltab show
modeltab free
smpl missing(demp) != 1 --restrict
ols demp const nj