Springer Texts in Business and Economics
Simultaneous Bias
Example 1: Consider a simple Keynesian model with no government
$C_t = \alpha + \beta Y_t + u_t \qquad t = 1, 2, \ldots, T$ (11.1)
$Y_t = C_t + I_t$ (11.2)
where $C_t$ denotes consumption, $Y_t$ denotes disposable income, and $I_t$ denotes autonomous investment. This is a system of two simultaneous equations, also known as structural equations, with the second equation being an identity. The first equation can be estimated by OLS, giving
$\hat{\beta}_{OLS} = \sum_{t=1}^{T} y_t c_t / \sum_{t=1}^{T} y_t^2$ and $\hat{\alpha}_{OLS} = \bar{C} - \hat{\beta}_{OLS}\bar{Y}$ (11.3)
with $y_t$ and $c_t$ denoting $Y_t$ and $C_t$ in deviation form, i.e., $y_t = Y_t - \bar{Y}$, and $\bar{Y} = \sum_{t=1}^{T} Y_t / T$.
Since $I_t$ is autonomous, it is an exogenous variable determined outside the system, whereas $C_t$ and $Y_t$ are endogenous variables determined by the system. Let us solve for $Y_t$ and $C_t$ in terms of the constant and $I_t$. The resulting two equations are known as the reduced form equations
$C_t = \alpha/(1-\beta) + \beta I_t/(1-\beta) + u_t/(1-\beta)$ (11.4)
$Y_t = \alpha/(1-\beta) + I_t/(1-\beta) + u_t/(1-\beta)$ (11.5)
B. H. Baltagi, Econometrics, Springer Texts in Business and Economics, DOI 10.1007/978-3-642-20059-5_11, © Springer-Verlag Berlin Heidelberg 2011
These equations express each endogenous variable in terms of exogenous variables and the error terms. Note that both $Y_t$ and $C_t$ are functions of $u_t$, and hence both are correlated with $u_t$. In fact, $Y_t - E(Y_t) = u_t/(1-\beta)$, and
$cov(Y_t, u_t) = E[(Y_t - E(Y_t))u_t] = \sigma_u^2/(1-\beta) > 0$ if $0 < \beta < 1$ (11.6)
This holds because $u_t \sim (0, \sigma_u^2)$ and $I_t$ is exogenous and independent of the error term. Equation (11.6) shows that the right hand side regressor in (11.1) is correlated with the error term. This causes the OLS estimates to be biased and inconsistent. In fact, from (11.1),
$c_t = C_t - \bar{C} = \beta y_t + (u_t - \bar{u})$
and substituting this expression in (11.3), we get
$\hat{\beta}_{OLS} = \beta + \sum_{t=1}^{T} y_t u_t / \sum_{t=1}^{T} y_t^2$ (11.7)
From (11.7), it is clear that $E(\hat{\beta}_{OLS}) \neq \beta$, since the expected value of the second term is not necessarily zero. Also, using (11.5) one gets
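$y_t = Y_t - \bar{Y} = [i_t + (u_t - \bar{u})]/(1-\beta)$
where $i_t = I_t - \bar{I}$ and $\bar{I} = \sum_{t=1}^{T} I_t / T$. Defining $m_{yy} = \sum_{t=1}^{T} y_t^2 / T$, we get
$m_{yy} = (m_{ii} + 2m_{iu} + m_{uu})/(1-\beta)^2$ (11.8)
where $m_{ii} = \sum_{t=1}^{T} i_t^2 / T$, $m_{iu} = \sum_{t=1}^{T} i_t(u_t - \bar{u})/T$ and $m_{uu} = \sum_{t=1}^{T} (u_t - \bar{u})^2 / T$. Also,
$m_{yu} = (m_{iu} + m_{uu})/(1-\beta)$ (11.9)
Using the fact that plim $m_{iu} = 0$ and plim $m_{uu} = \sigma_u^2$, we get
plim $\hat{\beta}_{OLS} = \beta +$ plim $(m_{yu}/m_{yy}) = \beta + [\sigma_u^2(1-\beta)/(\text{plim}\; m_{ii} + \sigma_u^2)]$
which shows that $\hat{\beta}_{OLS}$ overstates $\beta$ if $0 < \beta < 1$.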
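The probability limit above can be checked numerically. The following is a minimal simulation sketch, not part of the original text: the parameter values, the normal distributions for $I_t$ and $u_t$, and the function name `simulate_keynesian` are all illustrative assumptions. Under these assumptions plim $m_{ii}$ equals the variance of $I_t$.

```python
import numpy as np

def simulate_keynesian(T, alpha=10.0, beta=0.6, sigma_u=2.0, sigma_i=1.0, seed=0):
    """Simulate (11.1)-(11.2) and return the OLS slope and its probability limit.

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    I = 5.0 + sigma_i * rng.standard_normal(T)   # autonomous investment (exogenous)
    u = sigma_u * rng.standard_normal(T)         # structural disturbance
    Y = (alpha + I + u) / (1.0 - beta)           # reduced form (11.5)
    C = alpha + beta * Y + u                     # structural equation (11.1)
    y = Y - Y.mean()                             # deviations from the sample mean
    c = C - C.mean()
    beta_ols = (y @ c) / (y @ y)                 # OLS slope as in (11.3)
    # Probability limit derived in the text, with plim m_ii = sigma_i**2
    plim = beta + sigma_u**2 * (1.0 - beta) / (sigma_i**2 + sigma_u**2)
    return beta_ols, plim

beta_ols, plim = simulate_keynesian(T=200_000)
print(f"OLS slope: {beta_ols:.4f}  plim: {plim:.4f}  true beta: 0.6")
```

With a large sample the OLS slope should settle near the probability limit rather than near the true $\beta$, which illustrates that the overstatement does not vanish as $T$ grows.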
Example 2: Consider a simple demand and supply model
$Q_t^d = \alpha + \beta P_t + u_{1t}$ (11.10)
$Q_t^s = \gamma + \delta P_t + u_{2t}$ (11.11)
$Q_t^d = Q_t^s = Q_t \qquad t = 1, 2, \ldots, T$ (11.12)
Substituting the equilibrium condition (11.12) in (11.10) and (11.11), we get
$Q_t = \alpha + \beta P_t + u_{1t}$ (11.13)
$Q_t = \gamma + \delta P_t + u_{2t} \qquad t = 1, 2, \ldots, T$ (11.14)
For the demand equation (11.13), the sign of $\beta$ is expected to be negative, while for the supply equation (11.14), the sign of $\delta$ is expected to be positive. However, we only observe one equilibrium pair $(Q_t, P_t)$ in each period, and these are not labeled demand or supply quantities and prices. When we run the OLS regression of $Q_t$ on $P_t$, we do not know what we are estimating: demand or supply? In fact, any linear combination of (11.13) and (11.14) looks exactly like (11.13) or (11.14). It will have a constant, price, and a disturbance term in it. Since demand or supply cannot be distinguished from this ‘mongrel’, we have what is known as an identification problem. If the demand equation (or the supply equation) looked different from this mongrel, then this particular equation would be identified. More on this later. For now let us examine the properties of the OLS estimates of the demand equation. It is well known that
$\hat{\beta}_{OLS} = \sum_{t=1}^{T} q_t p_t / \sum_{t=1}^{T} p_t^2 = \beta + \sum_{t=1}^{T} p_t(u_{1t} - \bar{u}_1) / \sum_{t=1}^{T} p_t^2$ (11.15)
where $q_t$ and $p_t$ denote $Q_t$ and $P_t$ in deviation form, i.e., $q_t = Q_t - \bar{Q}$. Whether this estimator is unbiased depends on whether the last term in (11.15) has zero expectation. In order to find this expectation, we solve the structural equations in (11.13) and (11.14) for $Q_t$ and $P_t$
$Q_t = (\alpha\delta - \gamma\beta)/(\delta - \beta) + (\delta u_{1t} - \beta u_{2t})/(\delta - \beta)$ (11.16)
$P_t = (\alpha - \gamma)/(\delta - \beta) + (u_{1t} - u_{2t})/(\delta - \beta)$ (11.17)
(11.16) and (11.17) are known as the reduced form equations. Note that both $Q_t$ and $P_t$ are functions of both errors $u_1$ and $u_2$. Hence, $P_t$ is correlated with $u_{1t}$. In fact,
$p_t = (u_{1t} - \bar{u}_1)/(\delta - \beta) - (u_{2t} - \bar{u}_2)/(\delta - \beta)$ (11.18)
and
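plim $\sum_{t=1}^{T} p_t(u_{1t} - \bar{u}_1)/T = (\sigma_{11} - \sigma_{12})/(\delta - \beta)$ (11.19)
plim $\sum_{t=1}^{T} p_t^2/T = (\sigma_{11} + \sigma_{22} - 2\sigma_{12})/(\delta - \beta)^2$ (11.20)
where $\sigma_{ij} = cov(u_{it}, u_{jt})$ for $i, j = 1, 2$; and $t = 1, \ldots, T$. Hence, from (11.15)
plim $\hat{\beta}_{OLS} = \beta + (\sigma_{11} - \sigma_{12})(\delta - \beta)/(\sigma_{11} + \sigma_{22} - 2\sigma_{12})$ (11.21)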
and the last term is not necessarily zero, implying that $\hat{\beta}_{OLS}$ is not consistent for $\beta$. Similarly, one can show that the OLS estimator for $\delta$ is not consistent, see problem 1. This simultaneous bias is once again due to the correlation of the right hand side variable (price) with the error term $u_1$. This correlation could be due to the fact that $P_t$ is a function of $u_{2t}$, from (11.17), and $u_{2t}$ and $u_{1t}$ are correlated, making $P_t$ correlated with $u_{1t}$. Alternatively, $P_t$ is a function of $Q_t$, from (11.13) or (11.14), and $Q_t$ is a function of $u_{1t}$, from (11.13), making $P_t$ a function of $u_{1t}$. Intuitively, if a shock in demand (i.e., a change in $u_{1t}$) shifts the demand curve, the new intersection of demand and supply determines a new equilibrium price and quantity. This new price is therefore affected by the change in $u_{1t}$, and is correlated with it.
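The result in (11.21) can be illustrated with a short simulation. This is a sketch under assumed values only: the structural parameters, the error covariance matrix, the joint normality of the errors, and the name `simulate_market` are hypothetical choices made for illustration and do not come from the text.

```python
import numpy as np

def simulate_market(T, alpha=20.0, beta=-1.0, gamma=2.0, delta=1.5,
                    s11=1.0, s22=2.0, s12=0.3, seed=0):
    """Simulate equilibrium (Q_t, P_t) and regress Q_t on P_t by OLS.

    Parameter values are hypothetical; beta < 0 (demand) and delta > 0 (supply).
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[s11, s12], [s12, s22]])             # sigma_ij = cov(u_it, u_jt)
    u = rng.multivariate_normal(np.zeros(2), cov, size=T)
    u1, u2 = u[:, 0], u[:, 1]
    P = (alpha - gamma) / (delta - beta) + (u1 - u2) / (delta - beta)  # (11.17)
    Q = alpha + beta * P + u1                                          # (11.13)
    p = P - P.mean()                                     # deviation form
    q = Q - Q.mean()
    beta_ols = (p @ q) / (p @ p)                         # OLS slope in (11.15)
    # Probability limit from (11.21)
    plim = beta + (s11 - s12) * (delta - beta) / (s11 + s22 - 2.0 * s12)
    return beta_ols, plim

beta_ols, plim = simulate_market(T=200_000)
print(f"OLS slope: {beta_ols:.4f}  plim from (11.21): {plim:.4f}  true beta: -1.0")
```

In this particular configuration $\sigma_{11} > \sigma_{12}$ and $\delta > \beta$, so the estimated demand slope is pulled toward zero from its true negative value; other covariance choices would change the size and possibly the sign of the inconsistency.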
In general, whenever a right hand side variable is correlated with the error term, the OLS estimates are biased and inconsistent. We refer to this as an endogeneity problem. Recall Figure 3 of Chapter 3 with $cov(P_t, u_{1t}) > 0$. This shows that values of $P_t$ above their mean are on average associated with values of $u_{1t}$ above their mean (i.e., $u_{1t} > 0$). This implies that the quantity $Q_t$ associated with this particular $P_t$ is on average above the true line $(\alpha + \beta P_t)$. This is true for all observations to the right of $E(P_t)$. Similarly, any $P_t$ to the left of $E(P_t)$ is on average associated with a $u_{1t}$ below its mean (i.e., $u_{1t} < 0$). This implies that quantities associated with prices below their mean $E(P_t)$ are on average data points that lie below the true line. With this observed data, the estimated line using OLS will always be biased. In this case, the intercept estimate is biased downwards, whereas the slope estimate is biased upwards. This bias does not disappear with more data, as any new observation will on average be either above the true line if $P_t > E(P_t)$ or below the line if $P_t < E(P_t)$. Hence, these OLS estimates are inconsistent.
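The claim that the bias persists as the sample grows can be seen in a small experiment. The sketch below assumes a stylized data generating process that is not in the original text: a regressor with mean 5, an error that loads on the regressor's shock with a hypothetical coefficient `rho = 0.5` so that $cov(P_t, u_{1t}) > 0$, and illustrative values for $\alpha$ and $\beta$. The helper names `ols_line` and `endogenous_sample` are likewise made up for this example.

```python
import numpy as np

def ols_line(x, y):
    """Return (intercept, slope) of the simple OLS regression of y on x."""
    xd = x - x.mean()
    slope = (xd @ (y - y.mean())) / (xd @ xd)
    return y.mean() - slope * x.mean(), slope

def endogenous_sample(T, alpha=10.0, beta=-1.0, rho=0.5, seed=0):
    """Draw (P, Q) with cov(P, u1) = rho > 0; all values are hypothetical."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(T)                 # shock to the regressor
    P = 5.0 + v                                # regressor with E(P) = 5
    u1 = rho * v + rng.standard_normal(T)      # error correlated with P
    Q = alpha + beta * P + u1                  # true line: alpha + beta * P
    return P, Q

results = {}
for T in (100, 10_000, 1_000_000):
    P, Q = endogenous_sample(T)
    results[T] = ols_line(P, Q)
    intercept, slope = results[T]
    print(f"T={T:>9,}: intercept={intercept:.3f} (true 10.0), slope={slope:.3f} (true -1.0)")
```

Under these assumptions the slope converges to $\beta + \rho$ and the intercept to $\alpha - 5\rho$, so a larger $T$ tightens the estimates around the wrong values rather than around the true line.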
Deaton (1997, p. 95) has a nice discussion of endogeneity problems in development economics. One important example pertains to farm size and farm productivity. Empirical studies using OLS have found an inverse relationship between productivity as measured by log(Output/Acre) and farm size as measured by (Acreage). This seems counter-intuitive as it suggests that smaller farms are more productive than larger farms. Economic explanations of this phenomenon include the observation that hired labor (which is typically used on large farms) is of lower quality than family labor (which is typically used on small farms). The latter needs less monitoring and can be entrusted with valuable animals and machinery. Another explanation is that this phenomenon is an optimal response by small farmers to uncertainty. It could also be a sign of inefficiency, as farmers work too much on their own farms, pushing their marginal productivity below the market wage. How could this be an endogeneity problem? After all, the amount of land is outside the control of the farmer. This is true, but that does not mean that acreage is uncorrelated with the disturbance term. After all, size is unlikely to be independent of the quality of land. “Desert farms that are used for low-intensity animal grazing are typically larger than garden farms, where the land is rich and output/acre is high.” In this case, land quality is negatively correlated with land size. It takes more acres to sustain a cow in West Texas than in less arid areas. This negative correlation between acreage, the explanatory variable, and land quality, an omitted variable included in the error term, introduces endogeneity. This in turn results in a downward bias of the OLS estimate of the effect of acreage on productivity.
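The omitted land quality argument can be made concrete with a stylized simulation. Everything in this sketch is an assumption made for illustration: the linear specification, the numerical coefficients, the units, and the function name `simulate_farms`. The true effect of acreage on productivity is set to zero so that any estimated slope is pure omitted variable bias.

```python
import numpy as np

def simulate_farms(n=100_000, true_effect=0.0, quality_effect=0.5, seed=0):
    """Regress productivity on acreage while omitting land quality.

    Hypothetical setup: better land quality goes with smaller farms.
    """
    rng = np.random.default_rng(seed)
    quality = rng.standard_normal(n)                       # unobserved land quality
    acres = 20.0 - 2.0 * quality + rng.standard_normal(n)  # size falls as quality rises
    log_yield = (1.0 + true_effect * acres
                 + quality_effect * quality                # omitted from the regression
                 + 0.5 * rng.standard_normal(n))
    a = acres - acres.mean()
    slope = (a @ (log_yield - log_yield.mean())) / (a @ a)
    # Omitted variable bias: quality_effect * cov(acres, quality) / var(acres)
    implied_bias = quality_effect * (-2.0) / (2.0**2 + 1.0)
    return slope, implied_bias

slope, implied_bias = simulate_farms()
print(f"estimated acreage effect: {slope:.4f}  implied bias: {implied_bias:.4f}  true effect: 0.0")
```

Even though acreage has no effect on productivity in this setup, the regression reports a negative slope, mirroring the inverse relationship described above.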
Endogeneity can also be caused by sample selection. Gronau (1973) observed that women with small children had higher wages than women with no children. An economic explanation is that women with children have higher reservation wages and as a result fewer of them work. Among those who work, the observed wages of women with children are higher than those of women without children. The endogeneity works through the unobserved component in a working woman's wage that induces her to work. This is positively correlated with the number of children she has and therefore introduces an upward bias in the OLS estimate of the effect of the number of children on wages.
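A stylized simulation can illustrate how selection into work produces this upward bias. The sketch is not Gronau's model: the wage offer and reservation wage equations, their coefficients, the distribution of the number of children, and the name `simulate_selection` are hypothetical. The true effect of children on the wage offer is set to zero.

```python
import numpy as np

def simulate_selection(n=200_000, seed=0):
    """Regress observed wages on number of children among working women only.

    Hypothetical setup: children raise the reservation wage, not the wage offer.
    """
    rng = np.random.default_rng(seed)
    kids = rng.integers(0, 4, size=n)                  # 0 to 3 small children
    offer = 10.0 + 2.0 * rng.standard_normal(n)        # wage offer, unaffected by kids
    reservation = 8.0 + 1.5 * kids + rng.standard_normal(n)
    works = offer > reservation                        # wage observed only if she works
    k = kids[works] - kids[works].mean()
    w = offer[works] - offer[works].mean()
    slope = (k @ w) / (k @ k)                          # OLS on the selected sample
    participation = [works[kids == j].mean() for j in range(4)]
    return slope, participation

slope, participation = simulate_selection()
print(f"estimated effect of children on wages: {slope:.3f} (true effect: 0.0)")
print("participation rate by number of children:", [round(r, 3) for r in participation])
```

Participation falls with the number of children, and the women with children who do work are those with unusually high wage draws, so the selected-sample regression shows a positive slope despite a true effect of zero.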