Springer Texts in Business and Economics

Empirical Examples

Example 1: Union Participation

To illustrate the logit and probit models, we consider the PSID data for 1982 used in Chapter 4. In this example, we are interested in modelling union participation. Out of the 595 individuals observed in 1982, 218 individuals had their wage set by a union and 377 did not. The explanatory variables used are: years of education (ED), weeks worked (WKS), years of full-time work experience (EXP), occupation (OCC = 1, if the individual is in a blue-collar occupation), residence (SOUTH = 1, SMSA = 1, if the individual resides in the South, or in a standard metropolitan statistical area), industry (IND = 1, if the individual works in a manufacturing industry), marital status (MS = 1, if the individual is married), sex and race (FEM = 1, BLK = 1, if the individual is female or black). A full description of the data is given in Cornwell and Rupert (1988). The results of the linear probability, logit and probit models are given in Table 13.3. These were computed using EViews, and Table 13.4 gives the EViews probit output. We have already mentioned that the probit model normalizes $\sigma$ to be 1, whereas the logit model has variance $\pi^2/3$. Therefore, the logit estimates tend to be larger than the probit estimates, although by a factor less than $\pi/\sqrt{3}$. In order to make the logit results comparable to those of the probit, Amemiya (1981) suggests multiplying the logit coefficient estimates by 0.625.

Similarly, to make the linear probability estimates comparable to those of the probit model, one needs to multiply these coefficients by 2.5 and then subtract 1.25 from the constant term. For this example, both logit and probit procedures converged quickly in 4 iterations. The log-likelihood values and McFadden's (1974) $R^2$ obtained for the last iteration are recorded.
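The rescaling rules just described can be applied directly to the estimates in Table 13.3. The following is a minimal sketch, not part of the original EViews analysis: it copies four rows from Table 13.3, and the helper name `rescale` is hypothetical. It prints the rescaled linear probability and logit coefficients next to the probit estimates so the reader can judge how close this rule of thumb comes in this example.

```python
# Selected coefficients copied from Table 13.3 (OLS = linear probability model).
table_13_3 = {
    #          OLS     Logit   Probit
    "WKS":   (-0.045, -0.068, -0.061),
    "OCC":   (0.795, 1.036, 0.955),
    "SOUTH": (-0.425, -0.653, -0.593),
    "Const": (1.740, 2.738, 2.517),
}

LOGIT_TO_PROBIT = 0.625  # Amemiya (1981) factor for logit coefficients
LPM_TO_PROBIT = 2.5      # factor for linear probability coefficients
LPM_CONST_SHIFT = 1.25   # subtracted from the rescaled LPM constant only


def rescale(name, ols, logit):
    """Return (LPM, logit) coefficients put on the probit scale.

    Hypothetical helper for illustration; `name` is only used to detect
    the constant term, which gets the extra shift.
    """
    lpm_scaled = LPM_TO_PROBIT * ols
    if name == "Const":
        lpm_scaled -= LPM_CONST_SHIFT
    return lpm_scaled, LOGIT_TO_PROBIT * logit


for name, (ols, logit, probit) in table_13_3.items():
    lpm_scaled, logit_scaled = rescale(name, ols, logit)
    print(f"{name:6s} LPM*: {lpm_scaled:7.3f}  "
          f"logit*: {logit_scaled:7.3f}  probit: {probit:7.3f}")
```

Only the constant term receives the subtraction of 1.25; all slope coefficients are simply multiplied by the relevant factor.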

Table 13.3 Comparison of the Linear Probability, Logit and Probit Models: Union Participation*

Variable	OLS	Logit	Probit
EXP	-.005 (1.14)	-.007 (1.15)	-.007 (1.21)
WKS	-.045 (5.21)	-.068 (5.05)	-.061 (5.16)
OCC	.795 (6.85)	1.036 (6.27)	.955 (6.28)
IND	.075 (0.79)	.114 (0.89)	.093 (0.76)
SOUTH	-.425 (4.27)	-.653 (4.33)	-.593 (4.26)
SMSA	.211 (2.20)	.280 (2.05)	.261 (2.03)
MS	.247 (1.55)	.378 (1.66)	.351 (1.62)
FEM	-.272 (1.37)	-.483 (1.58)	-.407 (1.47)
ED	-.040 (1.88)	-.057 (1.85)	-.057 (1.99)
BLK	.125 (0.71)	.222 (0.90)	.226 (0.99)
Const	1.740 (5.27)	2.738 (3.27)	2.517 (3.30)
Log-likelihood		-312.337	-313.380
McFadden's $R^2$		0.201	0.198
$\chi^2_{10}$		157.2	155.1

* Figures in parentheses are t-statistics

Note that the logit and probit estimates yield similar results in magnitude, sign and significance. One would expect different results from the logit and probit only if there are several observations in the tails. The following variables were insignificant at the 5% level: EXP, IND, MS, FEM and BLK. The results show that union participation is less likely if the individual resides in the South and more likely if he or she resides in a standard metropolitan statistical area. Union participation is also less likely the more weeks worked and the more years of education. Union participation is more likely for blue-collar than for non-blue-collar occupations. The linear probability model yields different estimates from the logit and probit results. OLS predicts two observations with $\hat{y} > 1$, and 29 observations with $\hat{y} < 0$. Table 13.5 gives the actual versus predicted values of union participation for the linear probability, logit and probit models. The percentage of correct predictions is 75% for the linear probability and probit models and 76% for the logit model.
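The percentages of correct predictions quoted above follow from the cell counts in Table 13.5. As a small illustrative sketch (the dictionary layout and the function name `percent_correct` are assumptions made here, not part of the original analysis), they can be recomputed as the share of observations on the diagonal of each classification table:

```python
# Counts from Table 13.5, keyed by (actual, predicted) union status.
table_13_5 = {
    "OLS":    {(0, 0): 312, (0, 1): 65, (1, 0): 83, (1, 1): 135},
    "Logit":  {(0, 0): 316, (0, 1): 61, (1, 0): 82, (1, 1): 136},
    "Probit": {(0, 0): 314, (0, 1): 63, (1, 0): 86, (1, 1): 132},
}


def percent_correct(counts):
    """Percentage of observations whose predicted status equals the actual one."""
    correct = counts[(0, 0)] + counts[(1, 1)]
    total = sum(counts.values())
    return 100.0 * correct / total


for model, counts in table_13_5.items():
    print(f"{model:6s} {percent_correct(counts):.1f}% correct")
```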

One can test the significance of all slope coefficients by computing the LR statistic based on the unrestricted log-likelihood value ($\log \ell_u$) reported in Table 13.3 and the restricted log-likelihood value from a model including only the constant. The latter is the same for both the logit and probit models and is given by

$\log \ell_r = n[\bar{y} \log \bar{y} + (1 - \bar{y}) \log(1 - \bar{y})]$ (13.33)

where $\bar{y}$ is the proportion of the sample with $y_i = 1$, see problem 2. In this example, $\bar{y} = 218/595 = 0.366$ and $n = 595$ with $\log \ell_r = -390.918$. Therefore, for the probit model,

$LR = -2[\log \ell_r - \log \ell_u] = -2[-390.918 + 313.380] = 155.1$

which is distributed as $\chi^2_{10}$ under the null of zero slope coefficients. This is highly significant and the null is rejected. Similarly, for the logit model this LR statistic is 157.2. For the linear probability model, the same null hypothesis of zero slope coefficients can be tested using a

Table 13.4 Probit Estimates: Union Participation

Dependent Variable: UNION
Method: ML - Binary Probit
Sample: 1 595
Included observations: 595
Convergence achieved after 5 iterations
Covariance matrix computed using second derivatives

Variable	Coefficient	Std. Error	z-Statistic	Prob.
EX	-0.006932	0.005745	-1.206491	0.2276
WKS	-0.060829	0.011785	-5.161666	0.0000
OCC	0.955490	0.152137	6.280476	0.0000
IND	0.092827	0.122774	0.756085	0.4496
SOUTH	-0.592739	0.139102	-4.261183	0.0000
SMSA	0.260700	0.128630	2.026741	0.0427
MS	0.350520	0.216284	1.620648	0.1051
FEM	-0.407026	0.277038	-1.469203	0.1418
ED	-0.057382	0.028842	-1.989515	0.0466
BLK	0.226482	0.228845	0.989675	0.3223
C	2.516784	0.762612	3.300217	0.0010
Mean dependent var	0.366387	S.D. dependent var	0.482222
S.E. of regression	0.420828	Akaike info criterion	1.090351
Sum squared resid	103.4242	Schwarz criterion	1.171484
Log likelihood	-313.3795	Hannan-Quinn criter.	1.121947
Restr. log likelihood	-390.9177	Avg. log likelihood	-0.526688
LR statistic (10 df)	155.0763	McFadden R-squared	0.198349
Probability(LR stat)	0.000000
Obs with Dep=0	377	Total obs	595
Obs with Dep=1	218

Table 13.5 Actual Versus Predicted: Union Participation

Actual	Model	Predicted Union = 0	Predicted Union = 1	Total
Union = 0	OLS	312	65	377
Union = 0	Logit	316	61	377
Union = 0	Probit	314	63	377
Union = 1	OLS	83	135	218
Union = 1	Logit	82	136	218
Union = 1	Probit	86	132	218
Total	OLS	395	200	595
Total	Logit	398	197	595
Total	Probit	400	195	595

Chow F-statistic. This yields an observed value of 17.80, which is distributed as F(10, 584) under the null hypothesis. Again, the null is soundly rejected. This F-test is in fact the BRMR test considered in section 13.6. As described in section 13.8, McFadden's $R^2$ is given by $R^2 = 1 - [\log \ell_u / \log \ell_r]$, which for the probit model yields

$R^2 = 1 - (313.380/390.918) = 0.198.$

For the logit model, McFadden's $R^2$ is 0.201.
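The restricted log-likelihood in (13.33), the LR statistics and McFadden's $R^2$ can all be reproduced from the sample counts and the unrestricted log-likelihoods in Table 13.3. The sketch below is illustrative only; the variable and function names are assumptions, and the 5% critical value of the chi-squared distribution with 10 degrees of freedom (about 18.31) is taken from standard tables rather than computed.

```python
import math

n, n_union = 595, 218                          # sample size and number with y_i = 1
loglik_u = {"logit": -312.337, "probit": -313.380}  # unrestricted, from Table 13.3
CHI2_10_CRIT_5PCT = 18.307                     # tabulated 5% critical value, 10 df


def restricted_loglik(n_obs, n_ones):
    """Constant-only log-likelihood, equation (13.33)."""
    ybar = n_ones / n_obs
    return n_obs * (ybar * math.log(ybar) + (1 - ybar) * math.log(1 - ybar))


loglik_r = restricted_loglik(n, n_union)
print(f"restricted log-likelihood: {loglik_r:.3f}")

for model, ll_u in loglik_u.items():
    lr = -2 * (loglik_r - ll_u)                # LR statistic for zero slopes
    mcfadden_r2 = 1 - ll_u / loglik_r          # McFadden's R-squared
    reject = lr > CHI2_10_CRIT_5PCT
    print(f"{model:6s} LR = {lr:.1f}  McFadden R2 = {mcfadden_r2:.3f}  "
          f"reject at 5%: {reject}")
```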

Example 2: Employment and Problem Drinking

Mullahy and Sindelar (1996) estimate a linear probability model relating employment and measures of problem drinking. The analysis is based on the 1988 Alcohol Supplement of the National Health Interview Survey. The regression was performed for males and females separately, since the authors argue that women are less likely than men to be alcoholic, are more likely to abstain from consumption, and have lower mean alcohol consumption levels. They also report that women metabolize ethanol faster than men do and experience greater liver damage for the same level of ethanol consumption. The dependent variable takes the value 1 if the individual was employed in the past two weeks and zero otherwise.

The explanatory variables are as follows. The heavy-drinking indicator, denoted by hvdrnk90, is a dummy variable based on the 90th percentile of ethanol consumption in the sample (18 oz. for males and 10.8 oz. for females) and is zero otherwise. The remaining regressors are the state unemployment rate in 1988 (UE88), Age, $Age^2$, schooling, married, family size, and white; health status dummies indicating whether the individual's health was excellent, very good, fair; region of residence dummies indicating whether the individual resided in the northeast, midwest or south; dummies for whether he or she resided in a center city (msa1) or in another metropolitan statistical area (not center city, msa2); and three dummy variables for the quarters in which the survey was conducted. Details on the definitions of these variables are given in Table 1 of Mullahy and Sindelar (1996). Table 13.6 gives the probit results based on n = 9822 males using Stata. These results show a negative relationship between the 90th percentile alcohol variable and the probability of being employed, but this has a p-value of 0.075. Mullahy and Sindelar find that for both men and women, problem drinking results in reduced employment and increased unemployment.

Table 13.7 gives the marginal effects computed in Stata using the mfx command after probit estimation. The marginal effects are computed at the sample mean of the variables, except for dummy variables, where the effect is computed for a discrete change from 0 to 1. For example, the marginal effect of being a heavy drinker in the upper 90th percentile of ethanol consumption in the sample, with all other variables evaluated at their means, is to decrease the probability of employment by 1.6 percentage points. Marginal effects can also be computed at particular values of the explanatory variables with the at option in Stata. Table 13.8 gives the average marginal effect for all males, which can be computed using the margeff command in Stata. In this case the average marginal effect for a heavy drinker (-.0165) did not change much from the marginal effect computed at the sample mean (-.0162), and neither did the standard error (.0096 compared with .0093). The goodness of fit, as measured by how well this probit classifies the predicted probabilities, is given in Table 13.9 using the estat classification option in Stata. The percentage of correct predictions is 90.79%. Problem 13 asks the reader to verify these results as well as those in the original article by Mullahy and Sindelar (1996).
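To make the two kinds of marginal effect concrete, the sketch below recomputes two entries of the mfx output from the reported probit coefficients and the predicted probability at the sample mean (.92244871). It is an illustration under stated assumptions, not the Stata implementation: the index at the mean is recovered by inverting the standard normal CDF, and the function names are hypothetical. For a continuous regressor the effect is the normal density at the mean index times the coefficient; for a dummy it is the difference in predicted probabilities with the dummy set to 1 and to 0, holding the other variables at their means.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# Values copied from the Stata output for males.
prob_at_mean = 0.92244871          # Pr(emp) evaluated at the sample means
coef = {"ue88": -0.0532774, "educ": 0.0471834, "hvdrnk90": -0.1049465}
mean_hvdrnk90 = 0.099165           # sample mean of the heavy-drinking dummy

# Recover the index x-bar'beta implied by the reported probability.
index_at_mean = std_normal.inv_cdf(prob_at_mean)
density_at_mean = std_normal.pdf(index_at_mean)


def continuous_effect(beta):
    """Marginal effect of a continuous regressor at the sample mean."""
    return density_at_mean * beta


def dummy_effect(beta, dummy_mean):
    """Change in probability when a dummy moves from 0 to 1, others at means."""
    index_without = index_at_mean - beta * dummy_mean  # dummy set to 0
    return std_normal.cdf(index_without + beta) - std_normal.cdf(index_without)


print(f"ue88:     {continuous_effect(coef['ue88']):.7f}")
print(f"educ:     {continuous_effect(coef['educ']):.7f}")
print(f"hvdrnk90: {dummy_effect(coef['hvdrnk90'], mean_hvdrnk90):.7f}")
```

The printed values can be compared with the dy/dx column of the mfx output for ue88, educ and hvdrnk90.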

. probit emp hvdrnk90 ue88 age agesq educ married famsize white hlstat1 hlstat2 hlstat3 hlstat4 region1 region2 region3 msa1 msa2 q1 q2 q3, robust

Probit regression

Log pseudolikelihood = -2698.1797	Pseudo R2 = 0.1651
emp	Coef.	Robust Std. Err.	z	P>|z|	[95% Conf. Interval]
hvdrnk90	-.1049465	.0589881	-1.78	0.075	-.2205612	.0106681
ue88	-.0532774	.0142025	-3.75	0.000	-.0811137	-.0254411
age	.0996338	.0171185	5.82	0.000	.0660821	.1331855
agesq	-.0013043	.0002051	-6.36	0.000	-.0017062	-.0009023
educ	.0471834	.0066739	7.07	0.000	.0341029	.0602639
married	.2952921	.0540858	5.46	0.000	.189286	.4012982
famsize	.0188906	.0140463	1.34	0.179	-.0086398	.0464209
white	.3945226	.0483381	8.16	0.000	.2997818	.4892634
hlstat1	1.816306	.0983447	18.47	0.000	1.623554	2.009058
hlstat2	1.778434	.0991531	17.94	0.000	1.584098	1.972771
hlstat3	1.547836	.0982637	15.75	0.000	1.355243	1.74043
hlstat4	1.043363	.1077279	9.69	0.000	.8322205	1.254506
region1	.0343123	.0620021	0.55	0.580	-.0872096	.1558341
region2	.0604907	.0537885	1.12	0.261	-.0449327	.1659142
region3	.1821206	.0542346	3.36	0.001	.0758227	.2884185
msa1	-.0730529	.0518719	-1.41	0.159	-.1747199	.0286141
msa2	.0759533	.0513092	1.48	0.139	-.0246109	.1765175
q1	-.1054844	.0527728	-2.00	0.046	-.2089171	-.0020516
q2	-.0513229	.0528185	-0.97	0.331	-.1548453	.0521995
q3	-.0293419	.0543751	-0.54	0.589	-.1359152	.0772313
_cons	-3.017454	.3592321	-8.40	0.000	-3.721536	-2.313372

Number of obs = 9822
Wald chi2(20) = 928.33
Prob > chi2 = 0.0000

Example 3: Fertility and Same Sex of Previous Children

Carrasco (2001) estimated a probit equation for fertility using PSID data over the period 1986-1989. The sample consists of 1,442 married or cohabiting women between the ages of 18 and 55 in 1986. The dependent variable fertility (f) is specified by a dummy variable that equals 1 if the age of the youngest child in the next year is 1. The explanatory variables are: a dummy variable (ags26l) that equals 1 if the woman has a child between 2 and 6 years old; education, which has three levels (educ 1, educ 2 and educ 3); the female's age; race; and husband's income. Also included are an indicator of same sex of previous children (dsex) and its components, (dsexf) for girls and (dsexm) for boys. This variable exploits the widely observed phenomenon of parental preferences for a mixed sibling-sex composition in developed countries. Therefore, a dummy for whether the sex of the next child matches the sex of the previous children provides a plausible predictor for additional childbearing. The data set can be obtained from the Journal of Business & Economic Statistics archive data web site. Problem 15 asks the reader to replicate some of the results obtained in the original article by Carrasco (2001). The estimates reveal that having children of the same sex has a significant and positive effect on the probability of having an additional child. Having children of the same sex increases the probability of fertility by 3 percentage points, see Table 13.10. These marginal effects are obtained using the dprobit command in Stata.

. mfx compute

Marginal effects after probit

y = Pr(emp) (predict) = .92244871

variable	dy/dx	Std. Err.	z	P>|z|	[95% Conf. Interval]	X
hvdrnk90*	-.0161704	.00962	-1.68	0.093	-.035034	.002693	.099165
ue88	-.0077362	.00205	-3.78	0.000	-.011747	-.003725	5.56921
age	.0144674	.00248	5.83	0.000	.009607	.019327	39.1757
agesq	-.0001894	.00003	-6.37	0.000	-.000248	-.000131	1627.61
educ	.0068513	.00096	7.12	0.000	.004966	.008737	13.3096
married*	.0488911	.01009	4.85	0.000	.029119	.068663	.816432
famsize	.002743	.00204	1.35	0.179	-.001253	.006739	2.7415
white*	.069445	.01007	6.90	0.000	.049709	.089181	.853085
hlstat1*	.2460794	.01484	16.58	0.000	.216991	.275167	.415903
hlstat2*	.1842432	.00992	18.57	0.000	.164799	.203687	.301873
hlstat3*	.130786	.00661	19.80	0.000	.11784	.143732	.205254
hlstat4*	.0779836	.00415	18.77	0.000	.069841	.086126	.053451
region1*	.0049107	.00875	0.56	0.575	-.012233	.022054	.203014
region2*	.0086088	.0075	1.15	0.251	-.006092	.023309	.265628
region3*	.0252543	.00715	3.53	0.000	.011247	.039262	.318265
msa1*	-.0107946	.00779	-1.39	0.166	-.026061	.004471	.333232
msa2*	.0109542	.00735	1.49	0.136	-.003456	.025365	.434942
q1*	-.0158927	.00825	-1.93	0.054	-.032053	.000268	.254632
q2*	-.0075883	.00795	-0.95	0.340	-.023167	.007991	.252698
q3*	-.0043066	.00807	-0.53	0.594	-.020121	.011508	.242822

(*) dy/dx is for discrete change of dummy variable from 0 to 1
