Springer Texts in Business and Economics

Dummy Variables

Many explanatory variables are qualitative in nature. For example, the head of a household could be male or female, white or non-white, employed or unemployed. In this case, one codes these variables as “M” for male and “F” for female, or changes this qualitative variable into a quantitative variable called FEMALE which takes the value “0” for male and “1” for female. This obviously raises the question: “why not have a variable MALE that takes on the value 1 for male and 0 for female?” Actually, the variable MALE would be exactly 1 - FEMALE. In other words, the zero and one can be thought of as a switch, which turns on when it is 1 and off when it is 0. Suppose that we are interested in the earnings of households, denoted by EARN, and that MALE and FEMALE are the only explanatory variables available. Problem 10 asks the reader to verify that running OLS on the following model:

$$\text{EARN} = \alpha_M\,\text{MALE} + \alpha_F\,\text{FEMALE} + u \tag{4.21}$$

gives $\hat{\alpha}_M$ = “average earnings of the males in the sample” and $\hat{\alpha}_F$ = “average earnings of the females in the sample.” Notice that there is no intercept in (4.21). This is because of what is known in the literature as the “dummy variable trap.” Briefly stated, there will be perfect multicollinearity between MALE, FEMALE and the constant. In fact, MALE + FEMALE = 1. Some researchers may choose to include the intercept and exclude one of the sex dummy variables, say MALE, then

$$\text{EARN} = \alpha + \beta\,\text{FEMALE} + u \tag{4.22}$$

and the OLS estimates give $\hat{\alpha}$ = “average earnings of males in the sample” = $\hat{\alpha}_M$, while $\hat{\beta} = \hat{\alpha}_F - \hat{\alpha}_M$ = “the difference in average earnings between females and males in the sample.” Regression (4.22) is more popular when one is interested in contrasting the earnings between males and females and obtaining with one regression the markup or markdown in average earnings $(\hat{\alpha}_F - \hat{\alpha}_M)$ as well as the test of whether this difference is statistically different from zero. This would be simply the t-statistic on $\hat{\beta}$ in (4.22). On the other hand, if one is interested in estimating the average earnings of males and females separately, then model (4.21) should be the one to consider. In this case, the t-test for $\alpha_F - \alpha_M = 0$ would involve further calculations not directly given from the regression in (4.21) but similar to the calculations given in Example 3.
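The two parameterizations can be compared numerically. The following is a minimal sketch on a small hypothetical data set (the earnings figures are made up for illustration and are not from the text); it fits the no-intercept model with both dummies and the intercept model with FEMALE only, using NumPy's least-squares routine.

```python
import numpy as np

# Hypothetical toy data (not from the text): earnings and a FEMALE indicator.
earn = np.array([30.0, 42.0, 36.0, 28.0, 25.0, 33.0])
female = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
male = 1.0 - female

# Model (4.21): both dummies, no intercept.
X1 = np.column_stack([male, female])
a_M, a_F = np.linalg.lstsq(X1, earn, rcond=None)[0]

# Model (4.22): intercept plus FEMALE only.
X2 = np.column_stack([np.ones_like(earn), female])
alpha, beta = np.linalg.lstsq(X2, earn, rcond=None)[0]

# Sample group means for comparison.
mean_M = earn[female == 0].mean()
mean_F = earn[female == 1].mean()

print("no-intercept model:", a_M, a_F)
print("intercept model:   ", alpha, beta)
print("group means:       ", mean_M, mean_F)
```

In this sketch the no-intercept coefficients coincide with the two group means, while the intercept model returns the male mean and the female-minus-male difference.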

What happens when another qualitative variable is included, to depict another classification of the individuals in the sample, say for example, race? If there are three race groups in the sample, WHITE, BLACK and HISPANIC, one could create a dummy variable for each of these classifications. For example, WHITE will take the value 1 when the individual is white and 0 when the individual is non-white. Note that the dummy variable trap does not allow the inclusion of all three categories together with an intercept, as they sum up to 1. Also, even if the intercept is dropped, once MALE and FEMALE are included, perfect multicollinearity is still present because MALE + FEMALE = WHITE + BLACK + HISPANIC. Therefore, one category from race should be dropped. Suits (1984) argues that the researcher should use the dummy variable category omission to his or her advantage in interpreting the results, keeping in mind the purpose of the study. For example, if one is interested in comparing earnings across the sexes holding race constant, the omission of MALE or FEMALE is natural, whereas if one is interested in the race differential in earnings holding gender constant, one of the race variables should be omitted. Whichever variable is omitted becomes the base category against which the other earnings are compared. Most researchers prefer to keep an intercept, although regression packages allow for a no-intercept option. In this case one should omit one category from each of the race and sex classifications. For example, if MALE and WHITE are omitted:

$$\text{EARN} = \alpha + \beta_F\,\text{FEMALE} + \beta_B\,\text{BLACK} + \beta_H\,\text{HISPANIC} + u \tag{4.23}$$

Assuming the error $u$ satisfies all the classical assumptions, and taking expected values of both sides of (4.23), one can see that the intercept $\alpha$ = the expected value of earnings of the omitted category which is “white males”. For this category, all the other switches are off. Similarly, $\alpha + \beta_F$ is the expected value of earnings of “white females,” since the FEMALE switch is on. One can conclude that $\beta_F$ = difference in the expected value of earnings between white females and white males. Similarly, one can show that $\alpha + \beta_B$ is the expected earnings of “black males” and $\alpha + \beta_F + \beta_B$ is the expected earnings of “black females.” Therefore, $\beta_F$ represents the difference in expected earnings between black females and black males. In fact, problem 11 asks the reader to show that $\beta_F$ represents the difference in expected earnings between Hispanic females and Hispanic males. In other words, $\beta_F$ represents the differential in expected earnings between females and males holding race constant. Similarly, one can show that $\beta_B$ is the difference in expected earnings between blacks and whites holding sex constant, and $\beta_H$ is the differential in expected earnings between Hispanics and whites holding sex constant. The main key to the interpretation of the dummy variable coefficients is to be able to turn on and turn off the proper switches, and write the correct expectations.
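The collinearity argument behind the choice of omitted categories (MALE + FEMALE = WHITE + BLACK + HISPANIC) can be checked numerically. The sketch below uses randomly generated, purely hypothetical group memberships and compares the column rank of a design matrix containing all five dummies with one that keeps an intercept and omits MALE and WHITE.

```python
import numpy as np

# Hypothetical group memberships, generated at random for illustration only.
rng = np.random.default_rng(0)
n = 200
sex = rng.integers(0, 2, size=n)    # 0 = male, 1 = female
race = rng.integers(0, 3, size=n)   # 0 = white, 1 = black, 2 = hispanic

MALE = (sex == 0).astype(float)
FEMALE = (sex == 1).astype(float)
WHITE, BLACK, HISPANIC = [(race == k).astype(float) for k in range(3)]
const = np.ones(n)

# All five dummies, no intercept: 5 columns, one exact linear dependency.
X_all = np.column_stack([MALE, FEMALE, WHITE, BLACK, HISPANIC])
# Intercept with MALE and WHITE omitted: 4 linearly independent columns.
X_ok = np.column_stack([const, FEMALE, BLACK, HISPANIC])

rank_all = np.linalg.matrix_rank(X_all)
rank_ok = np.linalg.matrix_rank(X_ok)
print("all dummies:", X_all.shape[1], "columns, rank", rank_all)
print("base-category design:", X_ok.shape[1], "columns, rank", rank_ok)
```

The five-dummy matrix has rank 4 even without an intercept, so its cross-product matrix cannot be inverted, whereas the design with one category omitted from each classification has full column rank.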

The real regression will contain other quantitative and qualitative variables, like

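$$\text{EARN} = \alpha + \beta_F\,\text{FEMALE} + \beta_B\,\text{BLACK} + \beta_H\,\text{HISPANIC} + \gamma_1\,\text{EXP} + \gamma_2\,\text{EXP}^2 + \gamma_3\,\text{EDUC} + \gamma_4\,\text{UNION} + u \tag{4.24}$$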

where EXP is years of job experience, EDUC is years of education, and UNION is 1 if the individual belongs to a union and 0 otherwise. $\text{EXP}^2$ is the squared value of EXP. Once again, one can interpret the coefficients of these regressions by turning on or off the proper switches. For example, $\gamma_4$ is interpreted as the expected difference in earnings between union and non-union members holding all other variables included in (4.24) constant. Halvorsen and Palmquist (1980) warn economists about the interpretation of dummy variable coefficients when the dependent variable is in logs. For example, if the earnings equation is semi-logarithmic:

$$\log(\text{Earnings}) = \alpha + \beta\,\text{UNION} + \gamma\,\text{EDUC} + u$$

then $\gamma$ = % change in earnings (expressed as a proportion) for one extra year of education, holding union membership constant. But, what about the returns for union membership? If we let $Y_1$ = log(Earnings) when the individual belongs to a union, and $Y_0$ = log(Earnings) when the individual does not belong to a union, then $g$ = % change in earnings due to union membership $= (e^{Y_1} - e^{Y_0})/e^{Y_0}$. Equivalently, one can write that $\log(1 + g) = Y_1 - Y_0 = \beta$, or that $g = e^{\beta} - 1$. In other words, one should not hasten to conclude that $\beta$ has the same interpretation as $\gamma$. In fact, the % change in earnings due to union membership is $e^{\beta} - 1$ and not $\beta$. The error involved in using $\hat{\beta}$ rather than $e^{\hat{\beta}} - 1$ to estimate $g$ could be substantial, especially if $\hat{\beta}$ is large. For example, when $\hat{\beta} = 0.5, 0.75, 1$; $\hat{g} = e^{\hat{\beta}} - 1 = 0.65, 1.12, 1.72$, respectively. Kennedy (1981) notes that if $\hat{\beta}$ is unbiased for $\beta$, $\hat{g}$ is not necessarily unbiased for $g$. However, consistency of $\hat{\beta}$ implies consistency for $\hat{g}$. If one assumes log-normal distributed errors, then $E(e^{\hat{\beta}}) = e^{\beta + 0.5\,Var(\hat{\beta})}$, so that $e^{\hat{\beta}}$ overstates $e^{\beta}$ on average. Based on this result, Kennedy (1981) suggests estimating $g$ by $\tilde{g} = e^{\hat{\beta} - 0.5\,\widehat{Var}(\hat{\beta})} - 1$, where $\widehat{Var}(\hat{\beta})$ is a consistent estimate of $Var(\hat{\beta})$.
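The size of the gap between a dummy coefficient and the implied proportional change can be tabulated directly. The sketch below uses two hypothetical helper functions, `pct_effect` and `kennedy_effect` (names chosen here for illustration). The second takes the variance adjustment to be a subtraction of half the estimated variance, which is what the log-normal expectation result implies for removing the upward bias in the exponentiated estimate; the variance value passed to it is an arbitrary illustrative number.

```python
import math

def pct_effect(beta_hat):
    """Proportional change implied by a dummy coefficient in a log equation."""
    return math.exp(beta_hat) - 1.0

def kennedy_effect(beta_hat, var_beta_hat):
    """Variance-adjusted version: subtracts half the estimated variance
    of beta_hat before exponentiating (assumed form of the adjustment)."""
    return math.exp(beta_hat - 0.5 * var_beta_hat) - 1.0

# Coefficient values used in the text's illustration.
effects = {b: round(pct_effect(b), 2) for b in (0.5, 0.75, 1.0)}
for b, g in effects.items():
    print(f"beta_hat = {b:4.2f}  ->  exp(beta_hat) - 1 = {g:.2f}")

# Illustrative (made-up) variance to show the direction of the adjustment.
adjusted = kennedy_effect(0.5, 0.01)
print("adjusted estimate for beta_hat = 0.5, variance 0.01:", round(adjusted, 4))
```

For small coefficients the two quantities are close, which is why the distinction matters mainly when the estimated coefficient is large.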

Another use of dummy variables is in taking into account seasonal factors, i.e., including 3 seasonal dummy variables with the omitted season becoming the base for comparison (see Note 1). For example:

$$\text{Sales} = \alpha + \beta_W\,\text{Winter} + \beta_S\,\text{Spring} + \beta_F\,\text{Fall} + \gamma_1\,\text{Price} + u \tag{4.25}$$

the omitted season being the Summer season, and if (4.25) models the sales of air-conditioning units, then $\beta_F$ is the difference in expected sales between the Fall and Summer seasons, holding the price of an air-conditioning unit constant. If these were heating units one may want to change the base season for comparison.

Another use of dummy variables is for War years, where consumption is not at its normal level say due to rationing. Consider estimating the following consumption function

$$C_t = \alpha + \beta Y_t + \delta\,\text{WAR}_t + u_t \qquad t = 1, 2, \ldots, T \tag{4.26}$$

where $C_t$ denotes real per capita consumption, $Y_t$ denotes real per capita personal disposable income, and $\text{WAR}_t$ is a dummy variable taking the value 1 if it is a War time period and 0 otherwise. Note that the War years do not affect the slope of the consumption line with respect to income, only the intercept. The intercept is $\alpha$ in non-War years and $\alpha + \delta$ in War years. In other words, the marginal propensity to consume out of income is the same in War and non-War years, only the level of consumption is different.

Of course, one can dummy other unusual years like periods of strike, years of natural disaster, earthquakes, floods, hurricanes, or external shocks beyond control, like the oil embargo of 1973. If this dummy includes only one year like 1973, then the dummy variable for 1973, call it D73, takes the value 1 for 1973 and zero otherwise. Including D73 as an extra variable in the regression has the effect of removing the 1973 observation from estimation, and the resulting regression coefficient estimates are exactly the same as those obtained by excluding the 1973 observation and its corresponding dummy variable. In fact, using matrix algebra in Chapter 7, we will show that the coefficient estimate of D73 is the forecast error for 1973, using the regression that ignores the 1973 observation. In addition, the standard error of the dummy coefficient estimate is the standard error of this forecast. This is a much easier way of obtaining the forecast error and its standard error from the regression package without additional computations, see Salkever (1976). More on this in Chapter 7.
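The single-observation dummy result can be illustrated with a short simulation. This is a minimal sketch on artificial data (the income and consumption series, the position `k` of the unusual year, and the size of the shock are all hypothetical); it compares the regression that includes the one-period dummy with the regression that simply drops that observation.

```python
import numpy as np

# Artificial consumption-income data; one period receives an unusual shock.
rng = np.random.default_rng(1)
T = 30
income = np.linspace(10.0, 40.0, T)
cons = 2.0 + 0.8 * income + rng.normal(0.0, 1.0, T)
k = 23                      # index of the hypothetical unusual year
cons[k] -= 5.0

# Regression with a dummy equal to 1 only in period k.
d = np.zeros(T)
d[k] = 1.0
X_dummy = np.column_stack([np.ones(T), income, d])
b_dummy = np.linalg.lstsq(X_dummy, cons, rcond=None)[0]

# Regression that drops period k altogether.
keep = np.arange(T) != k
X_drop = np.column_stack([np.ones(T - 1), income[keep]])
b_drop = np.linalg.lstsq(X_drop, cons[keep], rcond=None)[0]

# Forecast error for period k from the regression that ignores it.
forecast_error = cons[k] - (b_drop[0] + b_drop[1] * income[k])

print("with dummy   :", b_dummy)
print("obs. dropped :", b_drop)
print("forecast error for period k:", forecast_error)
```

In this sketch the intercept and slope agree across the two regressions, and the dummy coefficient equals the actual value minus the forecast from the regression that excludes the period.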

Interaction Effects

So far the dummy variables have been used to shift the intercept of the regression keeping the slopes constant. One can also use the dummy variables to shift the slopes by letting them interact with the explanatory variables. For example, consider the following earnings equation:

$$\text{EARN} = \alpha + \alpha_F\,\text{FEMALE} + \beta\,\text{EDUC} + u \tag{4.27}$$

In this regression, only the intercept shifts from males to females. The return to an extra year of education is simply $\beta$, which is assumed to be the same for males as well as females. But if we now introduce the interaction variable (FEMALE × EDUC), then the regression becomes:

$$\text{EARN} = \alpha + \alpha_F\,\text{FEMALE} + \beta\,\text{EDUC} + \gamma\,(\text{FEMALE} \times \text{EDUC}) + u \tag{4.28}$$

In this case, the return to an extra year of education depends upon the sex of the individual. In fact, $\partial(\text{EARN})/\partial(\text{EDUC}) = \beta + \gamma\,\text{FEMALE}$, which equals $\beta$ if male and $\beta + \gamma$ if female. Note that the interaction variable = EDUC if the individual is female and 0 if the individual is male.

Estimating (4.28) is equivalent to estimating two earnings equations, one for males and another one for females, separately. The only difference is that (4.28) imposes the same variance across the two groups, whereas separate regressions do not impose this, albeit restrictive, equality of the variances assumption. This set-up is ideal for testing the equality of slopes, equality of intercepts, or equality of both intercepts and slopes across the sexes. This can be done with the F-test described in (4.17). In fact, for $H_0$: equality of slopes, given different intercepts, the restricted residual sum of squares (RRSS) is obtained from (4.27), while the unrestricted residual sum of squares (URSS) is obtained from (4.28). Problem 12 asks the reader to set up the F-test for the following null hypotheses: (i) equality of slopes and intercepts, and (ii) equality of intercepts given the same slopes.
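The equivalence between the interaction regression and two separate regressions, and the restricted versus unrestricted comparison for equal slopes, can be sketched as follows. The data are simulated and every parameter value is hypothetical; the F-statistic is computed in the usual form, the drop in the residual sum of squares per restriction divided by the unrestricted residual variance estimate, which is assumed here to match the statistic the text refers to.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients and the residual sum of squares."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    return b, float(resid @ resid)

# Simulated earnings data with made-up parameter values.
rng = np.random.default_rng(2)
n = 400
female = rng.integers(0, 2, size=n).astype(float)
educ = rng.integers(8, 21, size=n).astype(float)
earn = 10 + 2.0 * educ - 3.0 * female - 0.5 * female * educ + rng.normal(0, 4, n)

one = np.ones(n)
X_r = np.column_stack([one, female, educ])                  # restricted: common slope
X_u = np.column_stack([one, female, educ, female * educ])   # unrestricted: interaction
b_r, rrss = ols(X_r, earn)
b_u, urss = ols(X_u, earn)

# Separate regressions for males and females.
m = female == 0
b_m, _ = ols(np.column_stack([one[m], educ[m]]), earn[m])
b_f, _ = ols(np.column_stack([one[~m], educ[~m]]), earn[~m])

# F-statistic for equal slopes (one restriction, four unrestricted parameters).
F = ((rrss - urss) / 1) / (urss / (n - 4))

print("male   (interaction model):", b_u[0], b_u[2], " separate:", b_m)
print("female (interaction model):", b_u[0] + b_u[1], b_u[2] + b_u[3], " separate:", b_f)
print("F for equal slopes:", F)
```

The point estimates from the interaction model reproduce those of the separate regressions; only the error variance estimate, and hence the standard errors, would differ between the two approaches.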

Dummy variables have many useful applications in economics. For example, several tests including the Chow (1960) test, and Utts (1982) Rainbow test described in Chapter 8, can be applied using dummy variable regressions. Additionally, they can be used in modeling splines, see Poirier (1976) and Suits, Mason and Chan (1978), and fixed effects in panel data, see Chapter 12. Finally, when the dependent variable is itself a dummy variable, the regression equation needs special treatment, see Chapter 13 on qualitative limited dependent variables.

Empirical Example: Table 4.1 gives the results of a regression on 595 individuals drawn from the Panel Study of Income Dynamics (PSID) in 1982. These data are provided on the Springer web site as EARN.ASC. A description of the data is given in Cornwell and Rupert (1988). In particular, log wage is regressed on years of education (ED), weeks worked (WKS), years of full-time work experience (EXP), occupation (OCC = 1, if the individual is in a blue-collar occupation), residence (SOUTH = 1, SMSA = 1, if the individual resides in the South, or in a standard metropolitan statistical area), industry (IND = 1, if the individual works in a manufacturing industry), marital status (MS = 1, if the individual is married), sex and race (FEM = 1, BLK = 1, if the individual is female or black), and union coverage (UNION = 1, if the individual’s wage is set by a union contract). These results show that the return to an extra year of schooling is 5.7%, holding everything else constant. They also show that males on average earn more than females, blacks on average earn less than whites, and union workers earn more than non-union workers. Individuals residing in the South earn less than those living elsewhere. Those residing in a standard metropolitan statistical area earn more on average than those who do not. Individuals who work in a manufacturing industry, who are not blue-collar workers, or who are married earn more on average than their counterparts. With EXP2 = (EXP)^2, this regression indicates a significant quadratic relationship between earnings and experience. All the variables were significant at the 5% level except for WKS, SOUTH and MS.

Table 4.1 Earnings regression results (Dependent Variable: LWAGE)

Analysis of Variance
Source	DF	Sum of Squares	Mean Square	F Value	Prob > F
Model	12	52.48064	4.37339	41.263	0.0001
Error	582	61.68465	0.10599
C Total	594	114.16529

Root MSE	0.32556	R-square	0.4597
Dep Mean	6.95074	Adj R-sq	0.4485
C.V.	4.68377

Parameter Estimates
Variable	DF	Parameter Estimate	Standard Error	T for H0: Parameter=0	Prob > |T|
INTERCEP	1	5.590093	0.19011263	29.404	0.0001
WKS	1	0.003413	0.00267762	1.275	0.2030
SOUTH	1	-0.058763	0.03090689	-1.901	0.0578
SMSA	1	0.166191	0.02955099	5.624	0.0001
MS	1	0.095237	0.04892770	1.946	0.0521
EXP	1	0.029380	0.00652410	4.503	0.0001
EXP2	1	-0.000486	0.00012680	-3.833	0.0001
OCC	1	-0.161522	0.03690729	-4.376	0.0001
IND	1	0.084663	0.02916370	2.903	0.0038
UNION	1	0.106278	0.03167547	3.355	0.0008
FEM	1	-0.324557	0.06072947	-5.344	0.0001
BLK	1	-0.190422	0.05441180	-3.500	0.0005
ED	1	0.057194	0.00659101	8.678	0.0001
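The summary statistics reported in Table 4.1 are linked by simple arithmetic, which offers a quick consistency check on the printout. The sketch below takes the sums of squares, degrees of freedom, and the ED estimate and standard error from the table as inputs; the variable names are chosen here for illustration.

```python
# Figures copied from Table 4.1.
n, k = 595, 13                       # observations; parameters including the intercept
ss_model, ss_error = 52.48064, 61.68465

ms_model = ss_model / (k - 1)        # mean square for the model (12 df)
ms_error = ss_error / (n - k)        # mean square error (582 df)
F = ms_model / ms_error              # overall F for all slope coefficients
r2 = ss_model / (ss_model + ss_error)
root_mse = ms_error ** 0.5

# t-ratio for ED: estimate divided by its standard error.
t_ed = 0.057194 / 0.00659101

print(f"F = {F:.3f}, R-square = {r2:.4f}, Root MSE = {root_mse:.5f}, t(ED) = {t_ed:.3f}")
```

The recomputed values agree with the printed F Value, R-square, Root MSE, and the t-ratio on ED up to rounding.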

Note

1. There are more sophisticated ways of seasonal adjustment than introducing seasonal dummies, see Judge et al. (1985).

Problems

1. For the Cigarette Data given in Table 3.2, run the following regressions:

(a) Real per capita consumption of cigarettes on real price and real per capita income. (All variables are in log form, and all regressions in this problem include a constant).

(b) Real per capita consumption of cigarettes on real price.

(c) Real per capita income on real price.

(d) Real per capita consumption on the residuals of part (c).

(e) Residuals from part (b) on the residuals in part (c).

(f) Compare the regression slope estimates in parts (d) and (e) with the estimate of the real income coefficient in part (a). What do you conclude?
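The residual regressions in this problem can be tried out on artificial data before turning to the cigarette data. The sketch below is an illustration only: the simulated series and all coefficients are hypothetical, and the auxiliary regression referred to as part (c) is taken to be log income on a constant and log price, which is an assumption about the problem's intent.

```python
import numpy as np

def ols(X, y):
    """OLS coefficient vector via least squares."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Artificial data in logs (not the cigarette data).
rng = np.random.default_rng(3)
n = 50
log_price = rng.normal(0.0, 0.2, n)
log_income = 4.5 + 0.5 * log_price + rng.normal(0.0, 0.3, n)
log_cons = 4.0 - 1.2 * log_price + 0.2 * log_income + rng.normal(0.0, 0.1, n)
one = np.ones(n)

# Multiple regression of consumption on price and income.
b_a = ols(np.column_stack([one, log_price, log_income]), log_cons)

# Residuals from regressing consumption, and income, on a constant and price.
Xp = np.column_stack([one, log_price])
res_b = log_cons - Xp @ ols(Xp, log_cons)
res_c = log_income - Xp @ ols(Xp, log_income)

# Consumption on the income residuals, and consumption residuals on income residuals.
Xr = np.column_stack([one, res_c])
slope_d = ols(Xr, log_cons)[1]
slope_e = ols(Xr, res_b)[1]

print("income coefficient in the multiple regression:", b_a[2])
print("slope on income residuals (levels):           ", slope_d)
print("slope on income residuals (residuals):        ", slope_e)
```

On these simulated data all three slopes coincide to numerical precision, which is the pattern the problem asks the reader to look for and explain.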

2. Simple Versus Multiple Regression Coefficients. This is based on Baltagi (1987b). Consider the multiple regression

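$$Y_i = \alpha + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i \qquad i = 1, 2, \ldots, n$$

along with the following auxiliary regressions:

$$X_{2i} = \hat{a} + \hat{b} X_{3i} + \hat{\nu}_{2i}$$

$$X_{3i} = \hat{c} + \hat{d} X_{2i} + \hat{\nu}_{3i}$$

In section 4.3, we showed that $\hat{\beta}_2$, the OLS estimate of $\beta_2$, can be interpreted as a simple regression of $Y$ on the OLS residuals $\hat{\nu}_2$. A similar interpretation can be given to $\hat{\beta}_3$. Kennedy (1981, p. 416) claims that $\hat{\beta}_2$ is not necessarily the same as $\hat{\delta}_2$, the OLS estimate of $\delta_2$ obtained from the regression of $Y$ on $\hat{\nu}_2$, $\hat{\nu}_3$ and a constant, $Y_i = \gamma + \delta_2 \hat{\nu}_{2i} + \delta_3 \hat{\nu}_{3i} + w_i$. Prove this claim by finding a relationship between the $\hat{\beta}$’s and the $\hat{\delta}$’s.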

3. For the simple regression $Y_i = \alpha + \beta X_i + u_i$ considered in Chapter 3, show that

(a) $\hat{\beta}_{OLS} = \sum_{i=1}^{n} x_i y_i / \sum_{i=1}^{n} x_i^2$ can be obtained using the residual interpretation by regressing $X$ on a constant first, getting the residuals $\hat{\nu}$ and then regressing $Y$ on $\hat{\nu}$.

(b) $\hat{\alpha}_{OLS} = \bar{Y} - \hat{\beta}_{OLS}\bar{X}$ can be obtained using the residual interpretation by regressing 1 on $X$ and obtaining the residuals $\hat{\omega}$ and then regressing $Y$ on $\hat{\omega}$.

4. Effect of Additional Regressors on R2. This is based on Nieswiadomy (1986).

(a) Suppose that the multiple regression given in (4.1) has $K_1$ regressors in it. Denote the least squares sum of squared errors by $SSE_1$. Now add $K_2$ regressors so that the total number of regressors is $K = K_1 + K_2$. Denote the corresponding least squares sum of squared errors by $SSE_2$. Show that $SSE_2 \le SSE_1$, and conclude that the corresponding R-squares satisfy $R_2^2 \ge R_1^2$.

(b) Derive the equality given in (4.16) starting from the definition of $R^2$ and $\bar{R}^2$.

(c) Show that the corresponding $\bar{R}$-squares satisfy $\bar{R}_1^2 \ge \bar{R}_2^2$ when the F-statistic for the joint significance of these additional $K_2$ regressors is less than or equal to one.
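A numerical illustration can help fix ideas before attempting the proofs in this problem. The sketch below uses simulated data with hypothetical coefficients and adds three irrelevant regressors to a small model. Here the parameter counts include the constant, and the F-statistic is computed as the drop in the sum of squared errors per added regressor divided by the larger model's error variance estimate.

```python
import numpy as np

def fit(X, y):
    """Return SSE, R-squared and adjusted R-squared for an OLS fit with a constant in X."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    sse = float(e @ e)
    sst = float(((y - y.mean()) ** 2).sum())
    n, k = X.shape
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (sse / (n - k)) / (sst / (n - 1))
    return sse, r2, adj_r2

# Simulated data: two relevant regressors plus K2 = 3 irrelevant ones.
rng = np.random.default_rng(4)
n = 60
x = rng.normal(size=(n, 2))
y = 1.0 + x @ np.array([0.5, -0.3]) + rng.normal(size=n)
extra = rng.normal(size=(n, 3))

X1 = np.column_stack([np.ones(n), x])   # smaller model
X2 = np.column_stack([X1, extra])       # larger model
sse1, r2_1, adj1 = fit(X1, y)
sse2, r2_2, adj2 = fit(X2, y)

K2 = extra.shape[1]
K = X2.shape[1]
F = ((sse1 - sse2) / K2) / (sse2 / (n - K))

print("SSE:", sse1, sse2)
print("R-squared:", r2_1, r2_2)
print("adjusted R-squared:", adj1, adj2, " F for added regressors:", F)
```

The sum of squared errors cannot rise when regressors are added, whereas the adjusted R-squared moves up or down depending on whether the F-statistic for the added regressors is above or below one.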

5. Perfect Multicollinearity. Let Y be the output and X2 = skilled labor and X3 = unskilled labor in the following relationship:

$$Y_i = \alpha + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 (X_{2i} + X_{3i}) + \beta_5 X_{2i} + \beta_6 X_{3i} + u_i$$

What parameters are estimable by OLS?

6. Suppose that we have estimated the parameters of the multiple regression model:

$$Y_t = \beta_1 + \beta_2 X_{t2} + \beta_3 X_{t3} + u_t$$

by the Ordinary Least Squares (OLS) method. Denote the estimated residuals by $(e_t, t = 1, \ldots, T)$ and the predicted values by $(\hat{Y}_t, t = 1, \ldots, T)$.

(a) What is the $R^2$ of the regression of $e$ on a constant, $X_2$ and $X_3$?

(b) If we regress $Y$ on a constant and $\hat{Y}$, what are the estimated intercept and slope coefficients? What is the relationship between the $R^2$ of this regression and the $R^2$ of the original regression?

(c) If we regress $Y$ on a constant and $e$, what are the estimated intercept and slope coefficients? What is the relationship between the $R^2$ of this regression and the $R^2$ of the original regression?

(d) Suppose that we add a new explanatory variable X4 to the original model and re-estimate the parameters by OLS. Show that the estimated coefficient of X4 and its estimated standard error will be the same as in the OLS regression of e on a constant, X2, X3 and X4.

7. Consider the Cobb-Douglas production function in Example 5. How can you test for constant returns to scale using a t-statistic from the unrestricted regression given in (4.18)?

8. Testing Multiple Restrictions. For the multiple regression given in (4.1), set up the F-statistic described in (4.17) for testing

(a) $H_0$: $\beta_2 = \beta_4 = \beta_6$.

(b) $H_0$: $\beta_2 = \beta_3$ and $\beta_5 - \beta_6 = 1$.

9. Monte Carlo Experiments. Hanushek and Jackson (1977, pp. 60-65) generated the following data

$Y_i = 15 + 1X_{2i} + 2X_{3i} + u_i$ for $i = 1, 2, \ldots, 25$ with a fixed set of $X_{2i}$ and $X_{3i}$, and $u_i$’s that are IID ~ $N(0, 100)$. For each set of 25 $u_i$’s drawn randomly from the normal distribution, a corresponding set of 25 $Y_i$’s are created from the above equation. Then OLS is performed on the resulting data set. This can be repeated as many times as we can afford. 400 replications were performed by Hanushek and Jackson. This means that they generated 400 data sets each of size 25 and ran 400 regressions giving 400 OLS estimates of $\alpha$, $\beta_2$, $\beta_3$ and $\sigma^2$. The classical assumptions are satisfied for this model, by construction, so we expect these OLS estimators to be BLUE, MLE and efficient.

(a) Replicate the Monte Carlo experiments of Hanushek and Jackson (1977) and generate the means of the 400 estimates of the regression coefficients as well as $\sigma^2$. Are these estimates unbiased?

(b) Compute the standard deviation of these 400 estimates and call this $\hat{\sigma}_b$. Also compute the average of the 400 standard errors of the regression estimates reported by the regression. Denote this mean by $\bar{s}_b$. Compare these two estimates of the standard deviation of the regression coefficient estimates to the true standard deviation knowing the true $\sigma^2$. What do you conclude?

(d) Increase the sample size from 25 to 50 and repeat the experiment. What do you observe?
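One possible skeleton for the simulation loop in this problem is sketched below. The fixed regressor values used by Hanushek and Jackson are not reproduced in the problem, so this sketch draws its own hypothetical regressors once and holds them fixed across replications; the seed and the range of the regressors are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 25, 400
beta = np.array([15.0, 1.0, 2.0])   # intercept and the two slopes of the design
sigma2 = 100.0                      # true error variance

# Hypothetical regressors, drawn once and then held fixed (assumption).
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 10, n)])
XtX_inv = np.linalg.inv(X.T @ X)

est = np.empty((reps, 3))   # coefficient estimates per replication
se = np.empty((reps, 3))    # reported standard errors per replication
s2 = np.empty(reps)         # error variance estimates per replication
for r in range(reps):
    u = rng.normal(0.0, np.sqrt(sigma2), n)
    y = X @ beta + u
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    s2[r] = e @ e / (n - 3)
    est[r] = b
    se[r] = np.sqrt(s2[r] * np.diag(XtX_inv))

mean_b = est.mean(axis=0)                      # compare with beta
sd_b = est.std(axis=0, ddof=1)                 # empirical std. dev. of the estimates
mean_se = se.mean(axis=0)                      # average reported standard error
true_sd = np.sqrt(sigma2 * np.diag(XtX_inv))   # true std. dev. given sigma2

print("mean estimates:", mean_b, " mean s2:", s2.mean())
print("empirical sd:", sd_b)
print("mean reported se:", mean_se)
print("true sd:", true_sd)
```

Changing `n` to 50 and regenerating the fixed regressors gives the larger-sample version of the experiment.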

10. Female and Male Dummy Variables.

(a) Derive the OLS estimates of $\alpha_F$ and $\alpha_M$ for $Y_i = \alpha_F F_i + \alpha_M M_i + u_i$ where $Y$ is Earnings, $F$ is FEMALE and $M$ is MALE, see (4.21). Show that $\hat{\alpha}_F = \bar{Y}_F$, the average of the $Y_i$’s only for females, and $\hat{\alpha}_M = \bar{Y}_M$, the average of the $Y_i$’s only for males.

(b) Suppose that the regression is $Y_i = \alpha + \beta F_i + u_i$, see (4.22). Show that $\hat{\alpha} = \hat{\alpha}_M$, and $\hat{\beta} = \hat{\alpha}_F - \hat{\alpha}_M$.

(d) Verify parts (a), (b) and (c) using the earnings data underlying Table 4.1.

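11. Multiple Dummy Variables. For equation (4.23)

$$\text{EARN} = \alpha + \beta_F\,\text{FEMALE} + \beta_B\,\text{BLACK} + \beta_H\,\text{HISPANIC} + u$$

show that

(a) $E(\text{Earnings} \mid \text{Hispanic Female}) = \alpha + \beta_F + \beta_H$; also $E(\text{Earnings} \mid \text{Hispanic Male}) = \alpha + \beta_H$. Conclude that $\beta_F = E(\text{Earnings} \mid \text{Hispanic Female}) - E(\text{Earnings} \mid \text{Hispanic Male})$.

(b) $E(\text{Earnings} \mid \text{Hispanic Female}) - E(\text{Earnings} \mid \text{White Female}) = E(\text{Earnings} \mid \text{Hispanic Male}) - E(\text{Earnings} \mid \text{White Male}) = \beta_H$.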

12. For the earnings equation given in (4.28), how would you set up the F-test and what are the restricted and unrestricted regressions for testing the following hypotheses:

(a) The equality of slopes and intercepts for Males and Females.

(b) The equality of intercepts given the same slopes for Males and Females. Show that the resulting F-statistic is the square of a t-statistic from the unrestricted regression.

(c) The equality of intercepts allowing for different slopes for Males and Females. Show that the resulting F-statistic is the square of a t-statistic from the unrestricted regression.

(d) Apply your results in parts (a), (b) and (c) to the earnings data underlying Table 4.1.

13. For the earnings data regression underlying Table 4.1:

(a) Replicate the regression results given in Table 4.1.

(b) Verify that the joint significance of all slope coefficients can be obtained from (4.20).

(c) How would you test the joint restriction that expected earnings are the same for Males and Females whether Black or Non-Black holding everything else constant?

(d) How would you test the joint restriction that expected earnings are the same whether the individual is married or not and whether this individual belongs to a Union or not?

(e) From Table 4.1 what is your estimate of the % change in earnings due to Union membership? If the disturbances are assumed to be log-normal, what would be the estimate suggested by Kennedy (1981) for this % change in earnings?

(f) What is your estimate of the % change in earnings due to the individual being married?

14. Crude Quality. Use the data set of U.S. oil field postings on crude prices ($/barrel), gravity (degree API) and sulphur (% sulphur) given in the CRUDES.ASC file on the Springer web site.

(a) Estimate the following multiple regression model: $\text{POIL} = \beta_1 + \beta_2\,\text{GRAVITY} + \beta_3\,\text{SULPHUR} + \epsilon$.

(b) Regress $\text{GRAVITY} = \alpha_0 + \alpha_1\,\text{SULPHUR} + \nu_t$, then compute the residuals $(\hat{\nu}_t)$. Now perform the regression

$$\text{POIL} = \gamma_1 + \gamma_2 \hat{\nu}_t + \epsilon$$

Verify that $\hat{\gamma}_2$ is the same as $\hat{\beta}_2$ in part (a). What does this tell you?

(c) Regress $\text{POIL} = \phi_1 + \phi_2\,\text{SULPHUR} + w$. Compute the residuals $(\hat{w})$. Now regress $\hat{w}$ on $\hat{\nu}$ obtained from part (b), to get $\hat{w}_t = \hat{\delta}_1 + \hat{\delta}_2 \hat{\nu}_t$ + residuals. Show that $\hat{\delta}_2 = \hat{\beta}_2$ in part (a). Again, what does this tell you?

(d) To illustrate how additional data affects multicollinearity, show how your regression in part (a) changes when the sample is restricted to the first 25 crudes.

(e) Delete all crudes with sulphur content outside the range of 1 to 2 percent and run the multiple regression in part (a). Discuss and interpret these results.

Year	CAR	QMG (1,000 Gallons)	PMG ($)	POP (1,000)	RGNP (Billion)	PGNP
1950	49195212	40617285	0.272	152271	1090.4	26.1
1951	51948796	43896887	0.276	154878	1179.2	27.9
1952	53301329	46428148	0.287	157553	1226.1	28.3
1953	56313281	49374047	0.290	160184	1282.1	28.5
1954	58622547	51107135	0.291	163026	1252.1	29.0
1955	62688792	54333255	0.299	165931	1356.7	29.3
1956	65153810	56022406	0.310	168903	1383.5	30.3
1957	67124904	57415622	0.304	171984	1410.2	31.4
1958	68296594	59154330	0.305	174882	1384.7	32.1
1959	71354420	61596548	0.311	177830	1481.0	32.6
1960	73868682	62811854	0.308	180671	1517.2	33.2
1961	75958215	63978489	0.306	183691	1547.9	33.6
1962	79173329	62531373	0.304	186538	1647.9	34.0
1963	82713717	64779104	0.304	189242	1711.6	34.5
1964	86301207	67663848	0.312	191889	1806.9	35.0
1965	90360721	70337126	0.321	194303	1918.5	35.7
1966	93962030	73638812	0.332	196560	2048.9	36.6
1967	96930949	76139326	0.337	198712	2100.3	37.8
1968	101039113	80772657	0.348	200706	2195.4	39.4
1969	103562018	85416084	0.357	202677	2260.7	41.2
1970	106807629	88684050	0.364	205052	2250.7	43.4
1971	111297459	92194620	0.361	207661	2332.0	45.6
1972	117051638	95348904	0.388	209896	2465.5	47.5
1973	123811741	99804600	0.524	211909	2602.8	50.2
1974	127951254	100212210	0.572	213854	2564.2	55.1
1975	130918918	102327750	0.595	215973	2530.9	60.4
1976	136333934	106972740	0.631	218035	2680.5	63.5
1977	141523197	110023410	0.657	220239	2822.4	67.3
1978	146484336	113625960	0.678	222585	3115.2	72.2
1979	149422205	107831220	0.857	225055	3192.4	78.6
1980	153357876	100856070	1.191	227757	3187.1	85.7
1981	155907473	100994040	1.311	230138	3248.8	94.0
1982	156993694	100242870	1.222	232520	3166.0	100.0
1983	161017926	101515260	1.157	234799	3279.1	103.9
1984	163432944	102603690	1.129	237001	3489.9	107.9
1985	168743817	104719230	1.115	239279	3585.2	111.5
1986	173255850	107831220	0.857	241613	3676.5	114.5
1987	177922000	110467980	0.897	243915	3847.0	117.7

Table 4.2 U.S. Gasoline Data: 1950-1987

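15. Consider the U.S. gasoline data from 1950-1987 given in Table 4.2, and obtained from the file USGAS.ASC on the Springer web site.

(a) For the period 1950-1972 estimate models (1) and (2):

$$\log \text{QMG} = \beta_1 + \beta_2 \log \text{CAR} + \beta_3 \log \text{POP} + \beta_4 \log \text{RGNP} + \beta_5 \log \text{PGNP} + \beta_6 \log \text{PMG} + u \tag{1}$$

$$\log \frac{\text{QMG}}{\text{CAR}} = \gamma_1 + \gamma_2 \log \frac{\text{RGNP}}{\text{POP}} + \gamma_3 \log \frac{\text{CAR}}{\text{POP}} + \gamma_4 \log \frac{\text{PMG}}{\text{PGNP}} + \nu \tag{2}$$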

(b) What restrictions should the $\beta$’s satisfy in model (1) in order to yield the $\gamma$’s in model (2)?

(d) Compute the simple correlations among the X’s in model (1). What do you observe?

(e) Use the Chow-F test to test the parametric restrictions obtained in part (b).

(f) Estimate equations (1) and (2) now using the full data set 1950-1987. Discuss briefly the effects on individual parameter estimates and their standard errors of the larger data set.

(g) Using a dummy variable, test the hypothesis that gasoline demand per CAR permanently shifted downward for model (2) following the Arab Oil Embargo in 1973.

(h) Construct a dummy variable regression that will test whether the price elasticity has changed after 1973.

16. Consider the following model for the demand for natural gas by residential sector, call it model (1):

$$\log \text{Cons}_{it} = \beta_0 + \beta_1 \log \text{Pg}_{it} + \beta_2 \log \text{Po}_{it} + \beta_3 \log \text{Pe}_{it} + \beta_4 \log \text{HDD}_{it} + \beta_5 \log \text{PI}_{it} + u_{it}$$

where $i = 1, 2, \ldots, 6$ states and $t = 1, 2, \ldots, 23$ years. Cons is the consumption of natural gas by residential sector, Pg, Po and Pe are the prices of natural gas, distillate fuel oil, and electricity of the residential sector. HDD is heating degree days and PI is real per capita personal income. The data covers 6 states: NY, FL, MI, TX, UT and CA over the period 1967-1989. It is given in the NATURAL.ASC file on the Springer web site.

(a) Estimate the above model by OLS. Call this model (1). What do the parameter estimates imply about the relationship between the fuels?

(b) Plot actual consumption versus the predicted values. What do you observe?

(c) Add a dummy variable for each state except California and run OLS. Call this model (2). Compute the parameter estimates and standard errors and compare to model (1). Do any of the interpretations of the price coefficients change? What is the interpretation of the New York dummy variable? What is the predicted consumption of natural gas for New York in 1989?

(d) Test the hypothesis that the intercepts of New York and California are the same.

(e) Test the hypothesis that all the states have the same intercept.

(f) Add a dummy variable for each state and run OLS without an intercept. Call this model (3). Compare the parameter estimates and standard errors to the first two models. What is the interpretation of the coefficient of the New York dummy variable? What is the predicted consumption of natural gas for New York in 1989?

(g) Using the regression in part (f), test the hypothesis that the intercepts of New York and California are the same.

References

This chapter draws upon the material in Kelejian and Oates (1989) and Wallace and Silver (1988). Several econometrics books have an excellent discussion on dummy variables, see Gujarati (1978), Judge et al. (1985), Kennedy (1992), Johnston (1984) and Maddala (2001), to mention a few. Other readings referenced in this chapter include:

Baltagi, B. H. (1987a), “To Pool or Not to Pool: The Quality Bank Case,” The American Statistician, 41: 150-152.

Baltagi, B. H. (1987b), “Simple versus Multiple Regression Coefficients,” Econometric Theory, Problem 87.1.1, 3: 159.

Chow, G. C. (1960), “Tests of Equality Between Sets of Coefficients in Two Linear Regressions,” Econometrica, 28: 591-605.

Cornwell, C. and P. Rupert (1988), “Efficient Estimation with Panel Data: An Empirical Comparison of Instrumental Variables Estimators,” Journal of Applied Econometrics, 3: 149-155.

Dufour, J. M. (1980), “Dummy Variables and Predictive Tests for Structural Change,” Economics Letters, 6: 241-247.

Dufour, J. M. (1982), “Recursive Stability of Linear Regression Relationships,” Journal of Econometrics, 19: 31-76.

Gujarati, D. (1970), “Use of Dummy Variables in Testing for Equality Between Sets of Coefficients in Two Linear Regressions: A Note,” The American Statistician, 24: 18-21.

Gujarati, D. (1970), “Use of Dummy Variables in Testing for Equality Between Sets of Coefficients in Two Linear Regressions: A Generalization,” The American Statistician, 24: 50-52.

Halvorsen, R. and R. Palmquist (1980), “The Interpretation of Dummy Variables in Semilogarithmic Equations,” American Economic Review, 70: 474-475.

Hanushek, E. A. and J. E. Jackson (1977), Statistical Methods for Social Scientists (Academic Press: New York).

Hill, R. Carter and L. C. Adkins (2001), “Collinearity,” Chapter 12 in B. H. Baltagi (ed.) A Companion to Theoretical Econometrics (Blackwell: Massachusetts).

Kennedy, P. E. (1981), “Estimation with Correctly Interpreted Dummy Variables in Semilogarithmic Equations,” American Economic Review, 71: 802.

Kennedy, P. E. (1981), “The Balentine: A Graphical Aid for Econometrics,” Australian Economic Papers, 20: 414-416.

Kennedy, P. E. (1986), “Interpreting Dummy Variables,” Review of Economics and Statistics, 68: 174-175.

Nieswiadomy, M. (1986), “Effect of an Additional Regressor on R2,” Econometric Theory, Problem 86.3.1, 2: 442.

Poirier, D. (1976), The Econometrics of Structural Change (North Holland: Amsterdam).

Salkever, D. (1976), “The Use of Dummy Variables to Compute Predictions, Prediction Errors, and Confidence Intervals,” Journal of Econometrics, 4: 393-397.

Suits, D. (1984), “Dummy Variables: Mechanics vs Interpretation,” Review of Economics and Statistics, 66: 177-180.

Suits, D. B., A. Mason and L. Chan (1978), “Spline Functions Fitted by Standard Regression Methods,” Review of Economics and Statistics, 60: 132-139.

Utts, J. (1982), “The Rainbow Test for Lack of Fit in Regression,” Communications in Statistics-Theory and Methods, 11: 1801-1815.

