Springer Texts in Business and Economics

Empirical Example

Table 3.2 gives (i) the logarithm of cigarette consumption (in packs) per person of smoking age (> 16 years) for 46 states in 1992, (ii) the logarithm of the real price of cigarettes in each state, and (iii) the logarithm of real disposable income per capita in each state. The data are drawn from the Baltagi and Levin (1992) study on the dynamic demand for cigarettes and can be downloaded as Cigarett.dat from the Springer web site.

Table 3.2 Cigarette Consumption Data

LNC: log of consumption (in packs) per person of smoking age (>16)

LNP: log of real price (1983$/pack)

LNY: log of real disposable income per-capita (in thousand 1983$)

OBS	STATE	LNC	LNP	LNY
1	AL	4.96213	0.20487	4.64039
2	AZ	4.66312	0.16640	4.68389
3	AR	5.10709	0.23406	4.59435
4	CA	4.50449	0.36399	4.88147
5	CT	4.66983	0.32149	5.09472
6	DE	5.04705	0.21929	4.87087
7	DC	4.65637	0.28946	5.05960
8	FL	4.80081	0.28733	4.81155
9	GA	4.97974	0.12826	4.73299
10	ID	4.74902	0.17541	4.64307
11	IL	4.81445	0.24806	4.90387
12	IN	5.11129	0.08992	4.72916
13	IA	4.80857	0.24081	4.74211
14	KS	4.79263	0.21642	4.79613
15	KY	5.37906	-0.03260	4.64937
16	LA	4.98602	0.23856	4.61461
17	ME	4.98722	0.29106	4.75501
18	MD	4.77751	0.12575	4.94692
19	MA	4.73877	0.22613	4.99998
20	MI	4.94744	0.23067	4.80620
21	MN	4.69589	0.34297	4.81207
22	MS	4.93990	0.13638	4.52938
23	MO	5.06430	0.08731	4.78189
24	MT	4.73313	0.15303	4.70417
25	NE	4.77558	0.18907	4.79671
26	NV	4.96642	0.32304	4.83816
27	NH	5.10990	0.15852	5.00319
28	NJ	4.70633	0.30901	5.10268
29	NM	4.58107	0.16458	4.58202
30	NY	4.66496	0.34701	4.96075
31	ND	4.58237	0.18197	4.69163
32	OH	4.97952	0.12889	4.75875
33	OK	4.72720	0.19554	4.62730
34	PA	4.80363	0.22784	4.83516
35	RI	4.84693	0.30324	4.84670
36	SC	5.07801	0.07944	4.62549
37	SD	4.81545	0.13139	4.67747
38	TN	5.04939	0.15547	4.72525
39	TX	4.65398	0.28196	4.73437
40	UT	4.40859	0.19260	4.55586
41	VT	5.08799	0.18018	4.77578
42	VA	4.93061	0.11818	4.85490
43	WA	4.66134	0.35053	4.85645
44	WV	4.82454	0.12008	4.56859
45	WI	4.83026	0.22954	4.75826
46	WY	5.00087	0.10029	4.71169

Data: Cigarette Consumption of 46 States in 1992

Table 3.3 Cigarette Consumption Regression

Analysis of Variance

Source	DF	Sum of Squares	Mean Square	F Value	Prob > F
Model	1	0.48048	0.48048	18.084	0.0001
Error	44	1.16905	0.02657

Root MSE	0.16300	R-square	0.2913
Dep Mean	4.84784	Adj R-sq	0.2752
C.V.	3.36234

Parameter Estimates

Variable	DF	Parameter Estimate	Standard Error	T for H0: Parameter=0	Prob > |T|
INTERCEP	1	5.094108	0.06269897	81.247	0.0001
LNP	1	-1.198316	0.28178857	-4.253	0.0001

Figure 3.9 Residuals Versus LNP (horizontal axis: Log of Real Price, 1983$/Pack)

Table 3.3 gives the SAS output for the regression of logC on logP. The price elasticity of demand for cigarettes in this simple model is $d\log C/d\log P$, which is the slope coefficient. This is estimated to be -1.198 with a standard error of 0.282. This says that a 10% increase in the real price of cigarettes is associated with an estimated 12% drop in per capita consumption of cigarettes. The $R^2$ of this regression is 0.29, and $s^2$ is given by the Mean Square Error of the regression, which is 0.0266. Figure 3.9 plots the residuals of this regression versus the independent variable, while Figure 3.10 plots the predictions along with the 95% confidence interval band for these predictions. One observation clearly stands out as influential, given its distance from the rest of the data: the observation for Kentucky, a producer state with a very low real price. This observation almost anchors the straight line fit through the data. More on influential observations in Chapter 8.
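The estimates quoted above can be checked with a short calculation. The following is a minimal sketch in plain Python, assuming the LNC and LNP columns are transcribed from Table 3.2 exactly as printed; the helper name `simple_ols` is ours and is not part of the SAS output.

```python
import math

# LNP and LNC columns transcribed from Table 3.2, in the table's row order (AL ... WY).
LNP = [
    0.20487, 0.16640, 0.23406, 0.36399, 0.32149, 0.21929, 0.28946, 0.28733,
    0.12826, 0.17541, 0.24806, 0.08992, 0.24081, 0.21642, -0.03260, 0.23856,
    0.29106, 0.12575, 0.22613, 0.23067, 0.34297, 0.13638, 0.08731, 0.15303,
    0.18907, 0.32304, 0.15852, 0.30901, 0.16458, 0.34701, 0.18197, 0.12889,
    0.19554, 0.22784, 0.30324, 0.07944, 0.13139, 0.15547, 0.28196, 0.19260,
    0.18018, 0.11818, 0.35053, 0.12008, 0.22954, 0.10029,
]
LNC = [
    4.96213, 4.66312, 5.10709, 4.50449, 4.66983, 5.04705, 4.65637, 4.80081,
    4.97974, 4.74902, 4.81445, 5.11129, 4.80857, 4.79263, 5.37906, 4.98602,
    4.98722, 4.77751, 4.73877, 4.94744, 4.69589, 4.93990, 5.06430, 4.73313,
    4.77558, 4.96642, 5.10990, 4.70633, 4.58107, 4.66496, 4.58237, 4.97952,
    4.72720, 4.80363, 4.84693, 5.07801, 4.81545, 5.04939, 4.65398, 4.40859,
    5.08799, 4.93061, 4.66134, 4.82454, 4.83026, 5.00087,
]


def simple_ols(x, y):
    """Regress y on a constant and x (hypothetical helper, not from the text)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = sxy / sxx                      # slope: the price elasticity here
    alpha = y_bar - beta * x_bar          # intercept
    resid = [yi - alpha - beta * xi for xi, yi in zip(x, y)]
    sse = sum(e * e for e in resid)
    s2 = sse / (n - 2)                    # mean square error
    return {
        "alpha": alpha,
        "beta": beta,
        "s2": s2,
        "se_beta": math.sqrt(s2 / sxx),
        "r2": 1.0 - sse / syy,
        "resid": resid,
    }


fit = simple_ols(LNP, LNC)
print("intercept:", round(fit["alpha"], 4))
print("slope    :", round(fit["beta"], 4), "se:", round(fit["se_beta"], 4))
print("s^2      :", round(fit["s2"], 4), "R^2:", round(fit["r2"], 4))
```

If the transcription is faithful, the printed values should agree with Table 3.3 up to rounding.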

Figure 3.10 95% Confidence Band for Predicted Values (horizontal axis: Log of Real Price, 1983$/Pack)

Problems

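1. For the simple regression with a constant $Y_i = \alpha + \beta X_i + u_i$, given in equation (3.1), verify the following numerical properties of the OLS estimator:

(d) Show that $\mathrm{cov}(\hat{\alpha}_{OLS}, \hat{\beta}_{OLS}) = -\bar{X}\,\mathrm{var}(\hat{\beta}_{OLS}) = -\sigma^2 \bar{X}/\sum_{i=1}^n x_i^2$. This means that the sign of the covariance is determined by the sign of $\bar{X}$. If $\bar{X}$ is positive, this covariance will be negative. This also means that if $\hat{\alpha}_{OLS}$ is over-estimated, $\hat{\beta}_{OLS}$ will be under-estimated.

(b) Show that $\sum_{i=1}^n \lambda_i = 1$ and $\sum_{i=1}^n \lambda_i X_i = 0$.

(e) Prove that $\mathrm{var}(\tilde{\alpha}) = \sigma^2 \sum_{i=1}^n b_i^2 = \sigma^2 \sum_{i=1}^n \lambda_i^2 + \sigma^2 \sum_{i=1}^n f_i^2 = \mathrm{var}(\hat{\alpha}_{OLS}) + \sigma^2 \sum_{i=1}^n f_i^2$.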

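7. (a) Differentiate (3.9) with respect to $\alpha$ and $\beta$ and show that $\hat{\alpha}_{MLE} = \hat{\alpha}_{OLS}$, $\hat{\beta}_{MLE} = \hat{\beta}_{OLS}$.

(b) Differentiate (3.9) with respect to $\sigma^2$ and show that $\hat{\sigma}^2_{MLE} = \sum_{i=1}^n e_i^2/n$.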

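8. The t-Statistic in a Simple Regression. It is well known that a standard normal random variable $N(0,1)$ divided by the square root of a chi-squared random variable divided by its degrees of freedom, $(\chi^2_\nu/\nu)^{1/2}$, results in a random variable that is t-distributed with $\nu$ degrees of freedom, provided the $N(0,1)$ and the $\chi^2$ variables are independent, see Chapter 2. Use this fact to show that

$(\hat{\beta}_{OLS} - \beta)/[s/(\sum_{i=1}^n x_i^2)^{1/2}] \sim t_{n-2}$.

9. Relationship Between $R^2$ and $r^2_{xy}$.

(a) Using the fact that $R^2 = \sum_{i=1}^n \hat{y}_i^2/\sum_{i=1}^n y_i^2$; $\hat{y}_i = \hat{\beta}_{OLS} x_i$ and $\hat{\beta}_{OLS} = \sum_{i=1}^n x_i y_i/\sum_{i=1}^n x_i^2$, show that $R^2 = r^2_{xy}$ where,

$r^2_{xy} = (\sum_{i=1}^n x_i y_i)^2/[(\sum_{i=1}^n x_i^2)(\sum_{i=1}^n y_i^2)]$.

(b) Using the fact that $y_i = \hat{y}_i + e_i$, show that $\sum_{i=1}^n \hat{y}_i y_i = \sum_{i=1}^n \hat{y}_i^2$, and hence deduce that

$r^2_{y\hat{y}} = (\sum_{i=1}^n y_i \hat{y}_i)^2/[(\sum_{i=1}^n y_i^2)(\sum_{i=1}^n \hat{y}_i^2)]$

is equal to $R^2$.

10. Prediction. Consider the problem of predicting $Y_0$ from (3.11). Given $X_0$,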

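(a) Show that $E(Y_0) = \alpha + \beta X_0$.

(b) Show that $\hat{Y}_0$ is unbiased for $E(Y_0)$.

(c) Show that $\mathrm{var}(\hat{Y}_0) = \mathrm{var}(\hat{\alpha}_{OLS}) + X_0^2\,\mathrm{var}(\hat{\beta}_{OLS}) + 2X_0\,\mathrm{cov}(\hat{\alpha}_{OLS}, \hat{\beta}_{OLS})$. Deduce that $\mathrm{var}(\hat{Y}_0) = \sigma^2[(1/n) + (X_0 - \bar{X})^2/\sum_{i=1}^n x_i^2]$.

(d) Consider a linear predictor of $E(Y_0)$, say $\tilde{Y}_0 = \sum_{i=1}^n a_i Y_i$. Show that $\sum_{i=1}^n a_i = 1$ and $\sum_{i=1}^n a_i X_i = X_0$ for this predictor to be unbiased for $E(Y_0)$.

(e) Show that $\mathrm{var}(\tilde{Y}_0) = \sigma^2 \sum_{i=1}^n a_i^2$. Minimize $\sum_{i=1}^n a_i^2$ subject to the restrictions given in (d). Prove that the resulting predictor is $\tilde{Y}_0 = \hat{\alpha}_{OLS} + \hat{\beta}_{OLS} X_0$ and that the minimum variance is $\sigma^2[(1/n) + (X_0 - \bar{X})^2/\sum_{i=1}^n x_i^2]$.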
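The restrictions in parts (d) and (e) can be checked numerically before attempting the proof. The sketch below uses made-up regressor values and a made-up prediction point (both hypothetical, not from the text). It writes the OLS predictor as $\sum a_i Y_i$ with $a_i = 1/n + (X_0 - \bar{X})x_i/\sum x_i^2$, then compares it with another set of weights that also satisfies the unbiasedness restrictions.

```python
# Hypothetical regressor values and prediction point, for illustration only.
X = [1.0, 2.0, 4.0, 5.0, 8.0]
X0 = 6.0

n = len(X)
x_bar = sum(X) / n
sxx = sum((xi - x_bar) ** 2 for xi in X)

# Weights of the OLS predictor alpha_hat + beta_hat * X0 = sum_i a_i * Y_i.
a = [1.0 / n + (X0 - x_bar) * (xi - x_bar) / sxx for xi in X]

sum_a = sum(a)                                    # restriction 1: should equal 1
sum_aX = sum(ai * xi for ai, xi in zip(a, X))     # restriction 2: should equal X0
sum_a2 = sum(ai ** 2 for ai in a)                 # var(Y0_tilde) / sigma^2
var_factor = 1.0 / n + (X0 - x_bar) ** 2 / sxx    # formula from part (c)

# A competing unbiased predictor: add f with sum(f) = 0 and sum(f * X) = 0.
f = [2.0, -3.0, 1.0, 0.0, 0.0]
a_alt = [ai + 0.1 * fi for ai, fi in zip(a, f)]
sum_alt = sum(a_alt)
sum_altX = sum(ai * xi for ai, xi in zip(a_alt, X))
sum_alt2 = sum(ai ** 2 for ai in a_alt)

print(sum_a, sum_aX, sum_a2, var_factor)
print(sum_alt, sum_altX, sum_alt2)
```

Both sets of weights satisfy the two restrictions, but the perturbed weights have a larger sum of squares, which is what part (e) asks you to prove in general.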

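11. Optimal Weighting of Unbiased Estimators. This is based on Baltagi (1995). For the simple regression without a constant $Y_i = \beta X_i + u_i$, $i = 1, 2, \ldots, n$, where $\beta$ is a scalar and $u_i \sim \mathrm{IID}(0, \sigma^2)$ independent of $X_i$. Consider the following three unbiased estimators of $\beta$:

$\hat{\beta}_1 = \sum_{i=1}^n X_i Y_i/\sum_{i=1}^n X_i^2$, $\hat{\beta}_2 = \bar{Y}/\bar{X}$ and

$\hat{\beta}_3 = \sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})/\sum_{i=1}^n (X_i - \bar{X})^2$, where $\bar{X} = \sum_{i=1}^n X_i/n$ and $\bar{Y} = \sum_{i=1}^n Y_i/n$.

(a) Show that $\mathrm{cov}(\hat{\beta}_1, \hat{\beta}_2) = \mathrm{var}(\hat{\beta}_1) > 0$, and that $\rho_{12}$ (the correlation coefficient of $\hat{\beta}_1$ and $\hat{\beta}_2$) $= [\mathrm{var}(\hat{\beta}_1)/\mathrm{var}(\hat{\beta}_2)]^{1/2}$ with $0 < \rho_{12} < 1$. Show that the optimal combination of $\hat{\beta}_1$ and $\hat{\beta}_2$, given by $\hat{\beta} = \alpha\hat{\beta}_1 + (1 - \alpha)\hat{\beta}_2$ where $-\infty < \alpha < \infty$, occurs at $\alpha^* = 1$. Optimality here refers to minimizing the variance. Hint: Read the paper by Samuel-Cahn (1994).

(b) Similarly, show that $\mathrm{cov}(\hat{\beta}_1, \hat{\beta}_3) = \mathrm{var}(\hat{\beta}_1) > 0$, and that $\rho_{13}$ (the correlation coefficient of $\hat{\beta}_1$ and $\hat{\beta}_3$) $= [\mathrm{var}(\hat{\beta}_1)/\mathrm{var}(\hat{\beta}_3)]^{1/2} = (1 - \rho_{12}^2)^{1/2}$ with $0 < \rho_{13} < 1$. Conclude that the optimal combination of $\hat{\beta}_1$ and $\hat{\beta}_3$ is again $\alpha^* = 1$.

(c) Show that $\mathrm{cov}(\hat{\beta}_2, \hat{\beta}_3) = 0$ and that the optimal combination of $\hat{\beta}_2$ and $\hat{\beta}_3$ is $\hat{\beta} = (1 - \rho_{12}^2)\hat{\beta}_3 + \rho_{12}^2\hat{\beta}_2 = \hat{\beta}_1$. This exercise demonstrates a more general result, namely that the BLUE of $\beta$, in this case $\hat{\beta}_1$, has a positive correlation with any other linear unbiased estimator of $\beta$, and that this correlation can be easily computed from the ratio of the variances of these two estimators.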
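The identities in parts (b) and (c) hold for any sample, so they are easy to check on a toy data set. The sketch below uses hypothetical data and treats the regressors as fixed, so that the three variances are $\sigma^2/\sum X_i^2$, $\sigma^2/(n\bar{X}^2)$ and $\sigma^2/\sum (X_i - \bar{X})^2$; the common factor $\sigma^2$ cancels in the ratios.

```python
# Hypothetical data, for illustration only.
X = [1.0, 2.0, 4.0, 5.0, 8.0]
Y = [1.2, 2.1, 3.7, 5.4, 7.9]

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n
sum_X2 = sum(x * x for x in X)
sxx = sum((x - x_bar) ** 2 for x in X)

# The three unbiased estimators of beta from the problem statement.
b1 = sum(x * y for x, y in zip(X, Y)) / sum_X2
b2 = y_bar / x_bar
b3 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / sxx

# Variances up to the common factor sigma^2.
v1 = 1.0 / sum_X2
v2 = 1.0 / (n * x_bar ** 2)
v3 = 1.0 / sxx

rho12_sq = v1 / v2   # squared correlation between b1 and b2
rho13_sq = v1 / v3   # squared correlation between b1 and b3

# Part (c): the variance-minimizing combination of b2 and b3 reproduces b1.
combo = (1.0 - rho12_sq) * b3 + rho12_sq * b2

print(b1, b2, b3)
print(rho12_sq, rho13_sq, rho12_sq + rho13_sq)
print(combo)
```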

12. Efficiency as Correlation. This is based on Oksanen (1993). Let $\hat{\beta}$ denote the Best Linear Unbiased Estimator of $\beta$ and let $\tilde{\beta}$ denote any linear unbiased estimator of $\beta$. Show that the relative efficiency of $\tilde{\beta}$ with respect to $\hat{\beta}$ is the squared correlation coefficient between $\hat{\beta}$ and $\tilde{\beta}$. Hint: Compute the variance of $\hat{\beta} + \lambda(\hat{\beta} - \tilde{\beta})$ for any $\lambda$. This variance is minimized at $\lambda = 0$ since $\hat{\beta}$ is BLUE. This should give you the result that $E(\hat{\beta}^2) = E(\hat{\beta}\tilde{\beta})$, which in turn proves the required result, see Zheng (1994).

13. For the numerical illustration given in section 3.9, what happens to the least squares regression coefficient estimates ($\hat{\alpha}_{OLS}$, $\hat{\beta}_{OLS}$), $s^2$, the estimated se($\hat{\alpha}_{OLS}$) and se($\hat{\beta}_{OLS}$), the t-statistics for $\hat{\alpha}_{OLS}$ and $\hat{\beta}_{OLS}$ for $H_0^a$; $\alpha = 0$, and $H_0^b$; $\beta = 0$, and $R^2$ when:

(a) $Y_i$ is regressed on $X_i + 5$ rather than $X_i$. In other words, we add a constant 5 to each observation of the explanatory variable $X_i$ and rerun the regression. It is very instructive to see how the computations in Table 3.1 are affected by this simple transformation on $X_i$.

(b) $Y_i + 2$ is regressed on $X_i$. In other words, a constant 2 is added to $Y_i$.
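A quick experiment shows which quantities move under these two transformations. The sketch below does not use the section 3.9 data (not reproduced here); the numbers are hypothetical and `simple_ols` is our own helper, so treat the output as an illustration of the pattern rather than the answer for Table 3.1.

```python
import math


def simple_ols(x, y):
    """Regress y on a constant and x (hypothetical helper, not from the text)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    sse = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)
    return {
        "alpha": alpha,
        "beta": beta,
        "s2": s2,
        "se_alpha": math.sqrt(s2 * sum(xi * xi for xi in x) / (n * sxx)),
        "se_beta": math.sqrt(s2 / sxx),
        "r2": 1.0 - sse / syy,
    }


# Hypothetical data, for illustration only.
X = [1.0, 2.0, 4.0, 5.0, 8.0]
Y = [1.2, 2.1, 3.7, 5.4, 7.9]

base = simple_ols(X, Y)
shift_x = simple_ols([xi + 5 for xi in X], Y)   # part (a): X + 5
shift_y = simple_ols(X, [yi + 2 for yi in Y])   # part (b): Y + 2

for name, fit in (("base", base), ("X + 5", shift_x), ("Y + 2", shift_y)):
    print(name, {k: round(v, 4) for k, v in fit.items()})
```

In this sketch the slope, its standard error, $s^2$ and $R^2$ are unchanged in both cases; only the intercept (and, in part (a), its standard error and t-statistic) responds to the shift.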

14. For the cigarette consumption data given in Table 3.2.

(a) Give the descriptive statistics for logC, logP and logY. Plot their histogram. Also, plot logC versus logY and logC versus logP. Obtain the correlation matrix of these variables.

(b) Run the regression of logC on logY. What is the income elasticity estimate? What is its standard error? Test the null hypothesis that this elasticity is zero. What are the $s$ and $R^2$ of this regression?

(c) Show that the square of the simple correlation coefficient between logC and logY is equal to $R^2$. Show that the square of the correlation coefficient between the fitted and actual values of logC is also equal to $R^2$.

(d) Plot the residuals versus income. Also, plot the fitted values along with their 95% confidence band.

15. Consider the simple regression with no constant: $Y_i = \beta X_i + u_i$, $i = 1, 2, \ldots, n$

where $u_i \sim \mathrm{IID}(0, \sigma^2)$ independent of $X_i$. Theil (1971) showed that among all linear estimators in $Y_i$, the minimum mean square estimator for $\beta$, i.e., that which minimizes $E(\tilde{\beta} - \beta)^2$, is given by

$\tilde{\beta} = \beta^2 \sum_{i=1}^n X_i Y_i/(\beta^2 \sum_{i=1}^n X_i^2 + \sigma^2)$.

(a) Show that $E(\tilde{\beta}) = \beta/(1 + c)$, where $c = \sigma^2/(\beta^2 \sum_{i=1}^n X_i^2) > 0$.

(b) Conclude that the Bias($\tilde{\beta}$) $= E(\tilde{\beta}) - \beta = -[c/(1 + c)]\beta$. Note that this bias is positive (negative) when $\beta$ is negative (positive). This also means that $\tilde{\beta}$ is biased towards zero.
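One way to organize the algebra for parts (a) and (b) is sketched below. It assumes the $X_i$ are treated as fixed (or the expectation is taken conditional on them), so that $E(Y_i) = \beta X_i$; this is a sketch of the steps under that assumption, not the only route.

```latex
\begin{align*}
E(\tilde{\beta})
  &= \frac{\beta^2 \sum_{i=1}^n X_i E(Y_i)}{\beta^2 \sum_{i=1}^n X_i^2 + \sigma^2}
   = \frac{\beta^3 \sum_{i=1}^n X_i^2}{\beta^2 \sum_{i=1}^n X_i^2 + \sigma^2}
   = \frac{\beta}{1 + c},
  \qquad c = \frac{\sigma^2}{\beta^2 \sum_{i=1}^n X_i^2}, \\
\mathrm{Bias}(\tilde{\beta})
  &= \frac{\beta}{1 + c} - \beta
   = -\frac{c}{1 + c}\,\beta .
\end{align*}
```

Since $0 < c/(1+c) < 1$, the bias always has the opposite sign to $\beta$ and is smaller in absolute value, which is the sense in which $\tilde{\beta}$ is shrunk towards zero.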

Table 3.4 Energy Data for 20 countries

Country	RGDP (in 10^6 1975 U.S. $'s)	EN (in 10^6 Kilograms Coal Equivalents)
Malta	1251	456
Iceland	1331	1124
Cyprus	2003	1211
Ireland	11788	11053
Norway	27914	26086
Finland	28388	26405
Portugal	30642	12080
Denmark	34540	27049
Greece	38039	20119
Switzerland	42238	23234
Austria	45451	30633
Sweden	59350	45132
Belgium	62049	58894
Netherlands	82804	84416
Turkey	91946	32619
Spain	159602	88148
Italy	265863	192453
U.K.	279191	268056
France	358675	233907
W. Germany	428888	352.677

16. Table 3.4 gives cross-section data for 1980 on real gross domestic product (RGDP) and aggregate energy consumption (EN) for 20 countries.

(a) Enter the data and provide descriptive statistics. Plot the histograms for RGDP and EN. Plot EN versus RGDP.

(b) Estimate the regression:

$\log(En) = \alpha + \beta\log(RGDP) + u$.

Be sure to plot the residuals. What do they show?

(d) One of your Energy data observations has a misplaced decimal. Multiply it by 1000. Now repeat parts (a), (b) and (c).

(e) Was there any reason for ordering the data from the lowest to highest energy consumption? Explain.

Lesson Learned: Always plot the residuals. Always check your data very carefully.
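The lesson can be illustrated without a plot by listing the residuals. The sketch below, in plain Python, fits the log-log regression to Table 3.4 exactly as printed, flags the observation with the largest absolute residual, multiplies its EN value by 1000 as part (d) instructs, and refits. The helper `loglog_fit` and the rule "largest absolute residual" are our assumptions for illustration; a residual plot conveys the same information.

```python
import math

# (country, RGDP, EN) transcribed from Table 3.4 exactly as printed.
DATA = [
    ("Malta", 1251, 456), ("Iceland", 1331, 1124), ("Cyprus", 2003, 1211),
    ("Ireland", 11788, 11053), ("Norway", 27914, 26086),
    ("Finland", 28388, 26405), ("Portugal", 30642, 12080),
    ("Denmark", 34540, 27049), ("Greece", 38039, 20119),
    ("Switzerland", 42238, 23234), ("Austria", 45451, 30633),
    ("Sweden", 59350, 45132), ("Belgium", 62049, 58894),
    ("Netherlands", 82804, 84416), ("Turkey", 91946, 32619),
    ("Spain", 159602, 88148), ("Italy", 265863, 192453),
    ("U.K.", 279191, 268056), ("France", 358675, 233907),
    ("W. Germany", 428888, 352.677),
]


def loglog_fit(rows):
    """OLS of log(EN) on a constant and log(RGDP) (hypothetical helper)."""
    x = [math.log(r[1]) for r in rows]
    y = [math.log(r[2]) for r in rows]
    n = len(rows)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    resid = [yi - alpha - beta * xi for xi, yi in zip(x, y)]
    sse = sum(e * e for e in resid)
    return {"alpha": alpha, "beta": beta, "r2": 1.0 - sse / syy, "resid": resid}


raw = loglog_fit(DATA)

# Flag the observation with the largest absolute residual.
worst = max(range(len(DATA)), key=lambda i: abs(raw["resid"][i]))
suspect = DATA[worst][0]

# Part (d): multiply the suspect EN value by 1000 and refit.
fixed_rows = [(c, g, e * 1000 if i == worst else e)
              for i, (c, g, e) in enumerate(DATA)]
fixed = loglog_fit(fixed_rows)

print("suspect observation:", suspect)
print("raw   fit: beta =", round(raw["beta"], 3), "R^2 =", round(raw["r2"], 3))
print("fixed fit: beta =", round(fixed["beta"], 3), "R^2 =", round(fixed["r2"], 3))
```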

17. Using the Energy Data given in Table 3.4, corrected as in problem 16 part (d), is it legitimate to reverse the form of the equation?

$\log(RGDP) = \gamma + \delta\log(En) + \epsilon$

(a) Economically, does this change the interpretation of the equation? Explain.

(b) Estimate this equation and compare the $R^2$ of this equation with that of the previous problem. Also, check if $\hat{\delta} = 1/\hat{\beta}$. Why are they different?

(d) Show that $\hat{\delta}\hat{\beta} = R^2$.

(e) Effects of changing units in which variables are measured. Suppose you measured energy in BTU's instead of kilograms of coal equivalents so that the original series was multiplied by 60. How does it change $\alpha$ and $\beta$ in the following equations?

$\log(En) = \alpha + \beta\log(RGDP) + u$ and $En = \alpha^* + \beta^* RGDP + \nu$

Can you explain why $\hat{\alpha}$ changed, but not $\hat{\beta}$, for the log-log model, whereas both $\hat{\alpha}^*$ and $\hat{\beta}^*$ changed for the linear model?

(f) For the log-log specification and the linear specification, compare the GDP elasticity for Malta and W. Germany. Are both equally plausible?

(g) Plot the residuals from both linear and log-log models. What do you observe?

(h) Can you compare the $R^2$ and standard errors from both models in part (g)? Hint: Retrieve $\log(En)$ and $\widehat{\log(En)}$ in the log-log equation, exponentiate, then compute the residuals and $s$. These are comparable to those obtained from the linear model.
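Parts (d) and (e) can be explored numerically before working out the algebra. The sketch below uses a small made-up sample (hypothetical values, not Table 3.4) and our own helper `ols`. It compares the forward and reverse log-log slopes, then rescales the dependent variable by 60 in both the log-log and the linear specification.

```python
import math


def ols(x, y):
    """Regress y on a constant and x; return intercept, slope, R^2 (hypothetical helper)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = sxy / sxx
    return {"alpha": y_bar - beta * x_bar, "beta": beta,
            "r2": sxy ** 2 / (sxx * syy)}


# Hypothetical RGDP and EN values, for illustration only.
rgdp = [1200.0, 11800.0, 30600.0, 62000.0, 160000.0, 280000.0]
en = [450.0, 11000.0, 12000.0, 59000.0, 88000.0, 268000.0]

log_rgdp = [math.log(v) for v in rgdp]
log_en = [math.log(v) for v in en]

# Forward and reverse log-log regressions (part (d)).
forward = ols(log_rgdp, log_en)     # log(En) on log(RGDP): slope beta_hat
reverse = ols(log_en, log_rgdp)     # log(RGDP) on log(En): slope delta_hat
product = forward["beta"] * reverse["beta"]

# Energy measured in different units: series multiplied by 60 (part (e)).
forward_60 = ols(log_rgdp, [math.log(60 * v) for v in en])
linear = ols(rgdp, en)
linear_60 = ols(rgdp, [60 * v for v in en])

print("delta_hat * beta_hat =", product, " R^2 =", forward["r2"])
print("log-log slope:", forward["beta"], "->", forward_60["beta"])
print("log-log intercept shift:", forward_60["alpha"] - forward["alpha"],
      " log(60) =", math.log(60))
print("linear slope ratio:", linear_60["beta"] / linear["beta"])
```

In the log-log form the factor 60 enters additively as $\log 60$ and is absorbed by the intercept, while in the linear form it multiplies both coefficients.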

18. For the model considered in problem 16: $\log(En) = \alpha + \beta\log(RGDP) + u$ and measuring energy in BTU's (like part (e) of problem 17).

(a) What is the 95% confidence prediction interval at the sample mean?

(b) What is the 95% confidence prediction interval for Malta?

References

Additional readings on the material covered in this chapter can be found in:

Baltagi, B. H. (1995), “Optimal Weighting of Unbiased Estimators,” Econometric Theory, Problem 95.3.1, 11:637.

Baltagi, B. H. and D. Levin (1992), “Cigarette Taxation: Raising Revenues and Reducing Consumption,” Structural Change and Economic Dynamics, 3: 321-335.

Belsley, D. A., E. Kuh and R. E. Welsch (1980), Regression Diagnostics (Wiley: New York).

Greene, W. (1993), Econometric Analysis (Macmillan: New York).

Gujarati, D. (1995), Basic Econometrics (McGraw-Hill: New York).

Johnston, J. (1984), Econometric Methods (McGraw-Hill: New York).

Kelejian, H. and W. Oates (1989), Introduction to Econometrics (Harper and Row: New York).

Kennedy, P. (1992), A Guide to Econometrics (MIT Press: Cambridge).

Kmenta, J. (1986), Elements of Econometrics (Macmillan: New York).

Maddala, G. S. (1992), Introduction to Econometrics (Macmillan: New York).

Oksanen, E. H. (1993), “Efficiency as Correlation,” Econometric Theory, Problem 93.1.3, 9: 146.

Samuel-Cahn, E. (1994), “Combining Unbiased Estimators,” The American Statistician, 48: 34-36.

Wallace, D. and L. Silver (1988), Econometrics: An Introduction (Addison Wesley: New York).

Zheng, J. X. (1994), “Efficiency as Correlation,” Econometric Theory, Solution 93.1.3, 10: 228.

Appendix

Centered and Uncentered $R^2$

From the OLS regression on (3.1) we get

$Y_i = \hat{Y}_i + e_i, \quad i = 1, 2, \ldots, n$ (A.1)

where $\hat{Y}_i = \hat{\alpha}_{OLS} + X_i\hat{\beta}_{OLS}$. Squaring and summing the above equation we get

$\sum_{i=1}^n Y_i^2 = \sum_{i=1}^n \hat{Y}_i^2 + \sum_{i=1}^n e_i^2$ (A.2)

since $\sum_{i=1}^n \hat{Y}_i e_i = 0$. The uncentered $R^2$ is given by

uncentered $R^2 = 1 - \sum_{i=1}^n e_i^2/\sum_{i=1}^n Y_i^2 = \sum_{i=1}^n \hat{Y}_i^2/\sum_{i=1}^n Y_i^2$ (A.3)

Note that the total sum of squares for $Y_i$ is not expressed in deviation from the sample mean $\bar{Y}$. In other words, the uncentered $R^2$ is the proportion of variation of $\sum_{i=1}^n Y_i^2$ that is explained by the regression on $X$. Regression packages usually report the centered $R^2$, which was defined in section 3.6 as $1 - (\sum_{i=1}^n e_i^2/\sum_{i=1}^n y_i^2)$ where $y_i = Y_i - \bar{Y}$. The latter measure focuses on explaining the variation in $Y_i$ after fitting the constant.

From section 3.6, we have seen that a naive model with only a constant in it gives $\bar{Y}$ as the estimate of the constant, see also problem 2. The variation in $Y_i$ that is not explained by this naive model is $\sum_{i=1}^n y_i^2 = \sum_{i=1}^n (Y_i - \bar{Y})^2$. Subtracting $n\bar{Y}^2$ from both sides of (A.2) we get

$\sum_{i=1}^n y_i^2 = \sum_{i=1}^n \hat{Y}_i^2 - n\bar{Y}^2 + \sum_{i=1}^n e_i^2$

and the centered $R^2$ is

centered $R^2 = 1 - (\sum_{i=1}^n e_i^2/\sum_{i=1}^n y_i^2) = (\sum_{i=1}^n \hat{Y}_i^2 - n\bar{Y}^2)/\sum_{i=1}^n y_i^2$ (A.4)

If there is a constant in the model, $\bar{Y} = \bar{\hat{Y}}$, see section 3.6, and $\sum_{i=1}^n \hat{y}_i^2 = \sum_{i=1}^n (\hat{Y}_i - \bar{\hat{Y}})^2 = \sum_{i=1}^n \hat{Y}_i^2 - n\bar{Y}^2$. Therefore, the centered $R^2 = \sum_{i=1}^n \hat{y}_i^2/\sum_{i=1}^n y_i^2$, which is the $R^2$ reported by regression packages. If there is no constant in the model, some regression packages give you the option of (no constant) and the $R^2$ reported is usually the uncentered $R^2$. Check your regression package documentation to verify what you are getting. We will encounter uncentered $R^2$ again in constructing test statistics using regressions, see for example Chapter 11.
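The two definitions are easy to compare on a small sample. The sketch below uses hypothetical data and fits a regression with a constant. It computes each measure in both of the equivalent forms given in the appendix, so the decomposition can be checked numerically.

```python
# Hypothetical data, for illustration only.
X = [1.0, 2.0, 4.0, 5.0, 8.0]
Y = [3.1, 3.9, 6.2, 6.8, 10.1]

n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n
sxx = sum((x - x_bar) ** 2 for x in X)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))

beta = sxy / sxx
alpha = y_bar - beta * x_bar
fitted = [alpha + beta * x for x in X]
resid = [y - f for y, f in zip(Y, fitted)]

sum_Y2 = sum(y * y for y in Y)              # total sum of squares about zero
syy = sum((y - y_bar) ** 2 for y in Y)      # total sum of squares about the mean
sum_fit2 = sum(f * f for f in fitted)
sse = sum(e * e for e in resid)

# Uncentered R^2: both forms.
uncentered_r2 = 1.0 - sse / sum_Y2
uncentered_alt = sum_fit2 / sum_Y2

# Centered R^2: both forms.
centered_r2 = 1.0 - sse / syy
centered_alt = (sum_fit2 - n * y_bar ** 2) / syy

print("uncentered:", uncentered_r2, uncentered_alt)
print("centered  :", centered_r2, centered_alt)
```

Because the sum of squares about zero is at least as large as the sum of squares about the mean, the uncentered measure is never smaller than the centered one for the same residuals, which is worth keeping in mind when a package reports it for a no-constant regression.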

