Mostly Harmless Econometrics: An Empiricist’s Companion
Appendix: More on fixed effects and lagged dependent variables
To simplify, we ignore covariates and year effects and assume there are only two periods, with treatment equal to zero for everyone in the first period (the punch line is the same in a more general setup). The causal effect of interest, $\beta$, is positive. Suppose first that treatment is correlated with an unobserved individual effect, $\alpha_i$, and that outcomes can be described by
$$Y_{it} = \alpha_i + \beta D_{it} + \varepsilon_{it}, \qquad (5.4.1)$$
where $\varepsilon_{it}$ is serially uncorrelated, and uncorrelated with $\alpha_i$ and $D_{it}$. We also have
$$Y_{it-1} = \alpha_i + \varepsilon_{it-1},$$
where $\alpha_i$ and $\varepsilon_{it-1}$ are uncorrelated. You mistakenly estimate the effect of $D_{it}$ in a model that controls for $Y_{it-1}$ but ignores fixed effects. The resulting estimator has probability limit $Cov(Y_{it}, \tilde{D}_{it}) / V(\tilde{D}_{it})$, where $\tilde{D}_{it} = D_{it} - \gamma Y_{it-1}$ is the residual from a regression of $D_{it}$ on $Y_{it-1}$.
Now substitute $\alpha_i = Y_{it-1} - \varepsilon_{it-1}$ in (5.4.1) to get
$$Y_{it} = Y_{it-1} + \beta D_{it} + \varepsilon_{it} - \varepsilon_{it-1}.$$
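From here, we get
$$\frac{Cov(Y_{it}, \tilde{D}_{it})}{V(\tilde{D}_{it})} = \beta - \frac{Cov(\varepsilon_{it-1}, \tilde{D}_{it})}{V(\tilde{D}_{it})} = \beta - \frac{Cov(\varepsilon_{it-1}, D_{it} - \gamma Y_{it-1})}{V(\tilde{D}_{it})} = \beta + \frac{\gamma \sigma_{\varepsilon}^{2}}{V(\tilde{D}_{it})},$$
where $\sigma_{\varepsilon}^{2}$ is the variance of $\varepsilon_{it-1}$. The last equality uses $Cov(\varepsilon_{it-1}, Y_{it-1}) = \sigma_{\varepsilon}^{2}$ and takes $\varepsilon_{it-1}$ to be uncorrelated with $D_{it}$. Since trainees have low $Y_{it-1}$, $\gamma < 0$ and the resulting estimate of $\beta$ is too small.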
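The downward bias in this first case can be checked numerically. The following is a minimal simulation sketch, not part of the original text: the sample size, the normal distributions, the unit variances, the value of the causal effect, and the selection rule (treatment for individuals with a low individual effect plus noise) are all illustrative assumptions. It compares the coefficient on treatment from a regression that controls for the lagged outcome with the value implied by the probability limit above, and with the first-differenced estimate, which is consistent in this setup.

```python
import numpy as np


def simulate_fixed_effects_case(n=200_000, beta=1.0, seed=0):
    """Truth: individual effects. Mistake: control for the lagged outcome instead.

    All distributional choices here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    alpha = rng.normal(size=n)      # unobserved individual effect
    eps_lag = rng.normal(size=n)    # first-period error
    eps = rng.normal(size=n)        # second-period error
    # Assumed selection rule: treatment goes to individuals with low alpha (plus noise).
    d = (alpha + rng.normal(size=n) < 0).astype(float)

    y_lag = alpha + eps_lag         # first period, nobody treated
    y = alpha + beta * d + eps      # second period

    # Mistaken model: regress y on a constant, the lagged outcome, and treatment.
    X = np.column_stack([np.ones(n), y_lag, d])
    ldv_estimate = np.linalg.lstsq(X, y, rcond=None)[0][2]

    # Auxiliary regression of treatment on the lagged outcome gives gamma and the residual.
    Z = np.column_stack([np.ones(n), y_lag])
    aux = np.linalg.lstsq(Z, d, rcond=None)[0]
    gamma = aux[1]
    d_tilde = d - Z @ aux
    predicted = beta + gamma * eps_lag.var() / d_tilde.var()

    # First differences remove alpha and are consistent here.
    fd_estimate = np.cov(y - y_lag, d)[0, 1] / d.var(ddof=1)
    return ldv_estimate, predicted, fd_estimate, gamma


if __name__ == "__main__":
    ldv, predicted, fd, gamma = simulate_fixed_effects_case()
    print(f"gamma: {gamma:.3f}")
    print(f"lagged-outcome estimate: {ldv:.3f} (formula predicts {predicted:.3f})")
    print(f"first-difference estimate: {fd:.3f}")
```

Under these assumptions the lagged-outcome estimate should fall well below the true effect and track the formula's prediction, while the first-differenced estimate should sit close to the true value.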
Suppose instead that treatment is determined by low $Y_{it-1}$. The correct specification is a simplified version of (5.3.3), say
$$Y_{it} = \alpha + \theta Y_{it-1} + \beta D_{it} + \varepsilon_{it}, \qquad (5.4.2)$$
where $\varepsilon_{it}$ is serially uncorrelated. You mistakenly estimate a first-differenced equation in an effort to kill fixed effects. This ignores lagged dependent variables. In this simple example, where $D_{it-1} = 0$ for everyone, the first-differenced estimator has probability limit
$$\frac{Cov(Y_{it} - Y_{it-1}, D_{it} - D_{it-1})}{V(D_{it} - D_{it-1})} = \frac{Cov(Y_{it} - Y_{it-1}, D_{it})}{V(D_{it})}.$$
Subtracting $Y_{it-1}$ from both sides of (5.4.2), we have
$$Y_{it} - Y_{it-1} = \alpha + (\theta - 1) Y_{it-1} + \beta D_{it} + \varepsilon_{it}.$$
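Substituting this into the probability limit of the first-differenced estimator, the inappropriately differenced model yields
$$\frac{Cov(Y_{it} - Y_{it-1}, D_{it})}{V(D_{it})} = \beta + (\theta - 1) \frac{Cov(Y_{it-1}, D_{it})}{V(D_{it})}.$$
In general, we think $\theta$ is a number between zero and one, otherwise $Y_{it}$ is non-stationary (i.e., an explosive time series process). Therefore, since trainees have low $Y_{it-1}$, so that $Cov(Y_{it-1}, D_{it}) < 0$, the estimate of $\beta$ in first differences is too big.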
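The upward bias in this second case can be illustrated the same way. This is a minimal simulation sketch under stated assumptions that are not in the original text: a standard normal lagged outcome, a persistence parameter of 0.5, a true effect of 1, and a selection rule that assigns treatment to individuals with a low lagged outcome plus noise. It compares the first-differenced estimate with the value implied by the probability limit derived above, and with the correctly specified lagged-outcome regression.

```python
import numpy as np


def simulate_lagged_outcome_case(n=200_000, beta=1.0, theta=0.5,
                                 intercept=0.0, seed=0):
    """Truth: lagged dependent variable. Mistake: estimate in first differences.

    All distributional choices here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    y_lag = rng.normal(size=n)      # first-period outcome
    # Assumed selection rule: treatment goes to individuals with low lagged outcomes.
    d = (y_lag + rng.normal(size=n) < 0).astype(float)
    eps = rng.normal(size=n)
    y = intercept + theta * y_lag + beta * d + eps

    # Mistaken first-differenced estimator (treatment is zero in the first period).
    var_d = d.var(ddof=1)
    fd_estimate = np.cov(y - y_lag, d)[0, 1] / var_d

    # Value implied by the probability limit: beta + (theta - 1) Cov(y_lag, d) / V(d).
    predicted = beta + (theta - 1.0) * np.cov(y_lag, d)[0, 1] / var_d

    # Correct specification: regress y on a constant, the lagged outcome, and treatment.
    X = np.column_stack([np.ones(n), y_lag, d])
    ldv_estimate = np.linalg.lstsq(X, y, rcond=None)[0][2]
    return fd_estimate, predicted, ldv_estimate


if __name__ == "__main__":
    fd, predicted, ldv = simulate_lagged_outcome_case()
    print(f"first-difference estimate: {fd:.3f} (formula predicts {predicted:.3f})")
    print(f"lagged-outcome estimate: {ldv:.3f}")
```

With a persistence parameter below one and treatment concentrated among low lagged outcomes, the first-differenced estimate should exceed the true effect, while the lagged-outcome regression should recover it.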
CHAPTER 5. FIXED EFFECTS, DD, AND PANEL DATA