Springer Texts in Business and Economics
Empirical Examples
Example 1: Union Participation
To illustrate the logit and probit models, we consider the PSID data for 1982 used in Chapter 4. In this example, we are interested in modelling union participation. Out of the 595 individuals observed in 1982, 218 individuals had their wage set by a union and 377 did not. The explanatory variables used are: years of education (ED), weeks worked (WKS), years of full-time work experience (EXP), occupation (OCC = 1, if the individual is in a blue-collar occupation), residence (SOUTH = 1, SMSA = 1, if the individual resides in the South, or in a standard metropolitan statistical area), industry (IND = 1, if the individual works in a manufacturing industry), marital status (MS = 1, if the individual is married), sex and race (FEM = 1, BLK = 1, if the individual is female or black). A full description of the data is given in Cornwell and Rupert (1988). The results of the linear probability, logit and probit models are given in Table 13.3. These were computed using EViews. In fact Table 13.4 gives the probit output. We have already mentioned that the probit model normalizes $\sigma$ to be 1. But, the logit model has variance $\pi^2/3$. Therefore, the logit estimates tend to be larger than the probit estimates although by a factor less than $\pi/\sqrt{3}$. In order to make the logit results comparable to those of the probit, Amemiya (1981) suggests multiplying the logit coefficient estimates by 0.625.
Similarly, to make the linear probability estimates comparable to those of the probit model one needs to multiply these coefficients by 2.5 and then subtract 1.25 from the constant term. For this example, both logit and probit procedures converged quickly in 4 iterations. The log-likelihood values and McFadden's (1974) $R^2$ obtained for the last iteration are recorded.
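As a rough illustration of these rules of thumb, the following Python sketch rescales logit and linear probability coefficients to the probit scale. The function names and the coefficient values are hypothetical (they are not the Table 13.3 estimates); only the factors 0.625, 2.5 and 1.25 come from the text.

```python
import math


def logit_to_probit(coefs):
    """Amemiya (1981) rule of thumb: multiply every logit coefficient by 0.625."""
    return {name: 0.625 * b for name, b in coefs.items()}


def lpm_to_probit(coefs, const="const"):
    """Multiply every LPM coefficient by 2.5, then subtract 1.25 from the constant."""
    scaled = {name: 2.5 * b for name, b in coefs.items()}
    scaled[const] -= 1.25
    return scaled


# Hypothetical estimates, for illustration only (not taken from Table 13.3).
logit_hat = {"const": 1.6, "x": -0.8}
lpm_hat = {"const": 0.9, "x": -0.2}

print(logit_to_probit(logit_hat))
print(lpm_to_probit(lpm_hat))

# The implied logit/probit ratio, 1/0.625 = 1.6, is below pi/sqrt(3).
print(1 / 0.625, math.pi / math.sqrt(3))
```

The rescaled values are only approximately comparable to probit estimates; they are a quick check, not a substitute for estimating the probit model.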
Table 13.3 Comparison of the Linear Probability, Logit and Probit Models: Union Participation*
* Figures in parentheses are t-statistics
Note that the logit and probit estimates yield similar results in magnitude, sign and significance. One would expect different results from the logit and probit only if there are several observations in the tails. The following variables were insignificant at the 5% level: EXP, IND, MS, FEM and BLK. The results show that union participation is less likely if the individual resides in the South and more likely if he or she resides in a standard metropolitan statistical area. Union participation is also less likely the more weeks worked and the more years of education. Union participation is more likely for blue-collar than for non-blue-collar occupations. The linear probability model yields different estimates from the logit and probit results. OLS predicts two observations with $\hat{y} > 1$, and 29 observations with $\hat{y} < 0$. Table 13.5 gives the actual versus predicted values of union participation for the linear probability, logit and probit models. The percentage of correct predictions is 75% for the linear probability and probit models and 76% for the logit model.
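The percentages of correct predictions quoted above follow directly from the counts in Table 13.5. A minimal Python sketch of that arithmetic is given below; the dictionary layout and the function name are illustrative choices, while the counts are those reported in the table.

```python
# Counts from Table 13.5, keyed by (actual, predicted) union status.
counts = {
    "OLS":    {(0, 0): 312, (0, 1): 65, (1, 0): 83, (1, 1): 135},
    "Logit":  {(0, 0): 316, (0, 1): 61, (1, 0): 82, (1, 1): 136},
    "Probit": {(0, 0): 314, (0, 1): 63, (1, 0): 86, (1, 1): 132},
}


def percent_correct(table):
    """Share of observations on the diagonal (actual == predicted), in percent."""
    correct = table[(0, 0)] + table[(1, 1)]
    total = sum(table.values())
    return 100.0 * correct / total


for model, table in counts.items():
    print(f"{model}: {percent_correct(table):.1f}% correct")
```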
One can test the significance of all slope coefficients by computing the LR based on the unrestricted log-likelihood value ($\log \ell_u$) reported in Table 13.3, and the restricted log-likelihood value including only the constant. The latter is the same for both the logit and probit models and is given by
$$\log \ell_r = n[\bar{y}\log \bar{y} + (1 - \bar{y})\log(1 - \bar{y})] \tag{13.33}$$
where $\bar{y}$ is the proportion of the sample with $y_i = 1$, see problem 2. In this example, $\bar{y} = 218/595 = 0.366$ and $n = 595$ with $\log \ell_r = -390.918$. Therefore, for the probit model,
$$LR = -2[\log \ell_r - \log \ell_u] = -2[-390.918 + 313.380] = 155.1$$
which is distributed as $\chi^2_{10}$ under the null of zero slope coefficients. This is highly significant and the null is rejected. Similarly, for the logit model this LR statistic is 157.2. For the linear probability model, the same null hypothesis of zero slope coefficients can be tested using a
Table 13.4 Probit Estimates: Union Participation
Convergence achieved after 5 iterations
Covariance matrix computed using second derivatives
Table 13.5 Actual Versus Predicted: Union Participation

| Actual | Model | Predicted Union = 0 | Predicted Union = 1 | Total |
|---|---|---|---|---|
| Union = 0 | OLS | 312 | 65 | 377 |
| Union = 0 | Logit | 316 | 61 | 377 |
| Union = 0 | Probit | 314 | 63 | 377 |
| Union = 1 | OLS | 83 | 135 | 218 |
| Union = 1 | Logit | 82 | 136 | 218 |
| Union = 1 | Probit | 86 | 132 | 218 |
| Total | OLS | 395 | 200 | 595 |
| Total | Logit | 398 | 197 | 595 |
| Total | Probit | 400 | 195 | 595 |

Chow F-statistic. This yields an observed value of 17.80 which is distributed as F(10, 584) under the null hypothesis. Again, the null is soundly rejected. This F-test is in fact the BRMR test considered in section 13.6. As described in section 13.8, McFadden's $R^2$ is given by $R^2 = 1 - [\log \ell_u/\log \ell_r]$, which for the probit model yields
$$R^2 = 1 - (313.380/390.918) = 0.198.$$
For the logit model, McFadden's $R^2$ is 0.201.
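The restricted log-likelihood in (13.33), the LR statistic and McFadden's $R^2$ for the probit model can be reproduced with a few lines of Python. This is a sketch using only the numbers quoted in the text (n = 595, 218 union members, and an unrestricted probit log-likelihood of -313.380); the variable names are illustrative, and the 5% critical value of a chi-squared distribution with 10 degrees of freedom (18.307) is a standard tabulated value.

```python
import math

n = 595          # sample size
n_union = 218    # observations with y_i = 1
ybar = n_union / n

# Restricted log-likelihood (constant only), equation (13.33).
loglik_r = n * (ybar * math.log(ybar) + (1 - ybar) * math.log(1 - ybar))

# Unrestricted probit log-likelihood as quoted in the text.
loglik_u = -313.380

# Likelihood ratio statistic and McFadden's R-squared.
lr_stat = -2 * (loglik_r - loglik_u)
mcfadden_r2 = 1 - loglik_u / loglik_r

# 5% critical value of chi-squared with 10 degrees of freedom (tabulated).
chi2_crit_10 = 18.307

print(f"log l_r = {loglik_r:.3f}")
print(f"LR      = {lr_stat:.1f} (reject at 5%: {lr_stat > chi2_crit_10})")
print(f"McFadden R2 = {mcfadden_r2:.3f}")
```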
Example 2: Employment and Problem Drinking
Mullahy and Sindelar (1996) estimate a linear probability model relating employment and measures of problem drinking. The analysis is based on the 1988 Alcohol Supplement of the National Health Interview Survey. The regression was performed for males and females separately, since the authors argue that women are less likely than men to be alcoholic, are more likely to abstain from consumption, and have lower mean alcohol consumption levels. They also report that women metabolize ethanol faster than do men and experience greater liver damage for the same level of consumption of ethanol. The dependent variable takes the value 1 if the individual was employed in the past two weeks and zero otherwise. The explanatory variables include a dummy variable, denoted by hvdrnk90, which takes the value 1 if the individual is in the 90th percentile of ethanol consumption in the sample (18 oz. for males and 10.8 oz. for females) and zero otherwise. The other explanatory variables are the state unemployment rate in 1988 (UE88), Age, Age$^2$, schooling, married, family size, and white; health status dummies indicating whether the individual's health was excellent, very good, fair; region of residence dummies indicating whether the individual resided in the northeast, midwest or south; and dummies indicating whether he or she resided in center city (msa1) or other metropolitan statistical area (not center city, msa2). Three additional dummy variables were included for the quarters in which the survey was conducted. Details on the definitions of these variables are given in Table 1 of Mullahy and Sindelar (1996). Table 13.6 gives the probit results based on n = 9822 males using Stata. These results show a negative relationship between the 90th percentile alcohol variable and the probability of being employed, but the coefficient has a p-value of 0.075 and is therefore not significant at the 5% level. Mullahy and Sindelar find that for both men and women, problem drinking results in reduced employment and increased unemployment. Table 13.7 gives the marginal effects computed in Stata using the mfx command after probit estimation.
The marginal effects are computed at the sample mean of the variables, except in the case of dummy variables, where they are computed for a discrete change from 0 to 1. For example, the marginal effect of being a heavy drinker in the upper 90th percentile of ethanol consumption in the sample (given that all the other variables are evaluated at their mean and dummy variables are changing from 0 to 1) is to decrease the probability of employment by 0.016, i.e., 1.6 percentage points. Marginal effects can also be computed at particular values of the explanatory variables with the at option in Stata. Table 13.8 gives the average marginal effect for all males. This can be computed using the margeff command in Stata. In this case the average marginal effect for a heavy drinker (-.0165) did not change much from the marginal effect computed at the sample mean (-.0162), and neither did the standard error (.0096 compared with .0093). The goodness of fit, as measured by how well this probit classifies the observations based on their predicted probabilities, is given in Table 13.9 using the estat classification option in Stata. The percentage of correct predictions is 90.79%. Problem 13 asks the reader to verify these results as well as those in the original article by Mullahy and Sindelar (1996).
. probit emp hvdrnk90 ue88 age agesq educ married famsize white hlstat1 hlstat2 hlstat3 hlstat4 region1 region2 region3 msa1 msa2 q1 q2 q3, robust
Probit regression
Number of obs = 9822
Wald chi2(20) = 928.33
Prob > chi2 = 0.0000
Log pseudolikelihood = -2698.1797
Pseudo R2 = 0.1651

| emp | Coef. | Robust Std. Err. | z | P>\|z\| | [95% Conf. | Interval] |
|---|---|---|---|---|---|---|
| hvdrnk90 | -.1049465 | .0589881 | -1.78 | 0.075 | -.2205612 | .0106681 |
| ue88 | -.0532774 | .0142025 | -3.75 | 0.000 | -.0811137 | -.0254411 |
| age | .0996338 | .0171185 | 5.82 | 0.000 | .0660821 | .1331855 |
| agesq | -.0013043 | .0002051 | -6.36 | 0.000 | -.0017062 | -.0009023 |
| educ | .0471834 | .0066739 | 7.07 | 0.000 | .0341029 | .0602639 |
| married | .2952921 | .0540858 | 5.46 | 0.000 | .189286 | .4012982 |
| famsize | .0188906 | .0140463 | 1.34 | 0.179 | -.0086398 | .0464209 |
| white | .3945226 | .0483381 | 8.16 | 0.000 | .2997818 | .4892634 |
| hlstat1 | 1.816306 | .0983447 | 18.47 | 0.000 | 1.623554 | 2.009058 |
| hlstat2 | 1.778434 | .0991531 | 17.94 | 0.000 | 1.584098 | 1.972771 |
| hlstat3 | 1.547836 | .0982637 | 15.75 | 0.000 | 1.355243 | 1.74043 |
| hlstat4 | 1.043363 | .1077279 | 9.69 | 0.000 | .8322205 | 1.254506 |
| region1 | .0343123 | .0620021 | 0.55 | 0.580 | -.0872096 | .1558341 |
| region2 | .0604907 | .0537885 | 1.12 | 0.261 | -.0449327 | .1659142 |
| region3 | .1821206 | .0542346 | 3.36 | 0.001 | .0758227 | .2884185 |
| msa1 | -.0730529 | .0518719 | -1.41 | 0.159 | -.1747199 | .0286141 |
| msa2 | .0759533 | .0513092 | 1.48 | 0.139 | -.0246109 | .1765175 |
| q1 | -.1054844 | .0527728 | -2.00 | 0.046 | -.2089171 | -.0020516 |
| q2 | -.0513229 | .0528185 | -0.97 | 0.331 | -.1548453 | .0521995 |
| q3 | -.0293419 | .0543751 | -0.54 | 0.589 | -.1359152 | .0772313 |
| _cons | -3.017454 | .3592321 | -8.40 | 0.000 | -3.721536 | -2.313372 |

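The z-statistic, p-value and 95% confidence interval reported for hvdrnk90 can be reconstructed from its coefficient and robust standard error alone. The following Python sketch assumes the usual Wald construction (coefficient divided by standard error, compared with the standard normal distribution); the variable names are illustrative and the two input numbers are copied from the probit output above.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# Coefficient and robust standard error for hvdrnk90 from the probit output.
coef = -0.1049465
se = 0.0589881

# Wald z-statistic and two-sided p-value.
z_stat = coef / se
p_value = 2 * (1 - std_normal.cdf(abs(z_stat)))

# 95% confidence interval: coef +/- 1.96 * se.
crit = std_normal.inv_cdf(0.975)
ci_low, ci_high = coef - crit * se, coef + crit * se

print(f"z = {z_stat:.2f}, p = {p_value:.3f}")
print(f"95% CI = [{ci_low:.7f}, {ci_high:.7f}]")
```

Because the interval includes zero, the heavy drinking dummy is not significant at the 5% level, in line with the reported p-value of 0.075.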
Example 3: Fertility and Same Sex of Previous Children
Carrasco (2001) estimated a probit equation for fertility using PSID data over the period 1986-1989. The sample consists of 1,442 married or cohabiting women between the ages of 18 and 55 in 1986. The dependent variable fertility (f) is specified by a dummy variable that equals 1 if the age of the youngest child in the next year is 1. The explanatory variables are: (ags26l), which is a dummy variable that equals 1 if the woman has a child between 2 and 6 years old; education, which has three levels (educ 1, educ 2 and educ 3); the female's age; race; and husband's income. Also included are an indicator of same sex of previous children (dsex) and its components: (dsexf) for girls, and (dsexm) for boys. This variable exploits the widely observed phenomenon of parental preferences for a mixed sibling-sex composition in developed countries. Therefore, a dummy for whether the sex of the next child matches the sex of the previous children provides a plausible predictor for additional childbearing. The data set can be obtained from the Journal of Business & Economic Statistics archive data web site. Problem 15 asks the reader to replicate some of the results obtained in the original article by Carrasco (2001). The estimates reveal that having children of the same sex has a significant and positive effect on the probability of having an additional child. Having children of the same sex increases the probability of fertility by 3 percentage points, see Table 13.10. These marginal effects are obtained using the dprobit command in Stata.
. mfx compute
Marginal effects after probit
y = Pr(emp) (predict) = .92244871

| variable | dy/dx | Std. Err. | z | P>\|z\| | [95% Conf. | Interval] | X |
|---|---|---|---|---|---|---|---|
| hvdrnk90* | -.0161704 | .00962 | -1.68 | 0.093 | -.035034 | .002693 | .099165 |
| ue88 | -.0077362 | .00205 | -3.78 | 0.000 | -.011747 | -.003725 | 5.56921 |
| age | .0144674 | .00248 | 5.83 | 0.000 | .009607 | .019327 | 39.1757 |
| agesq | -.0001894 | .00003 | -6.37 | 0.000 | -.000248 | -.000131 | 1627.61 |
| educ | .0068513 | .00096 | 7.12 | 0.000 | .004966 | .008737 | 13.3096 |
| married* | .0488911 | .01009 | 4.85 | 0.000 | .029119 | .068663 | .816432 |
| famsize | .002743 | .00204 | 1.35 | 0.179 | -.001253 | .006739 | 2.7415 |
| white* | .069445 | .01007 | 6.90 | 0.000 | .049709 | .089181 | .853085 |
| hlstat1* | .2460794 | .01484 | 16.58 | 0.000 | .216991 | .275167 | .415903 |
| hlstat2* | .1842432 | .00992 | 18.57 | 0.000 | .164799 | .203687 | .301873 |
| hlstat3* | .130786 | .00661 | 19.80 | 0.000 | .11784 | .143732 | .205254 |
| hlstat4* | .0779836 | .00415 | 18.77 | 0.000 | .069841 | .086126 | .053451 |
| region1* | .0049107 | .00875 | 0.56 | 0.575 | -.012233 | .022054 | .203014 |
| region2* | .0086088 | .0075 | 1.15 | 0.251 | -.006092 | .023309 | .265628 |
| region3* | .0252543 | .00715 | 3.53 | 0.000 | .011247 | .039262 | .318265 |
| msa1* | -.0107946 | .00779 | -1.39 | 0.166 | -.026061 | .004471 | .333232 |
| msa2* | .0109542 | .00735 | 1.49 | 0.136 | -.003456 | .025365 | .434942 |
| q1* | -.0158927 | .00825 | -1.93 | 0.054 | -.032053 | .000268 | .254632 |
| q2* | -.0075883 | .00795 | -0.95 | 0.340 | -.023167 | .007991 | .252698 |
| q3* | -.0043066 | .00807 | -0.53 | 0.594 | -.020121 | .011508 | .242822 |

(*) dy/dx is for discrete change of dummy variable from 0 to 1
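As a consistency check, the marginal effects at the sample mean can be approximately recovered from the probit coefficients and the predicted probability reported in the mfx header. The Python sketch below assumes the standard probit formulas: for a continuous regressor, dy/dx is the standard normal density at the mean index times the coefficient; for a dummy, it is the difference in the normal cdf with the dummy set to 1 and to 0, holding the other variables at their means. The variable names are illustrative; the input numbers are copied from the probit and mfx output.

```python
from statistics import NormalDist

std_normal = NormalDist()

# Predicted Pr(emp) at the sample means, from the mfx header.
p_at_mean = 0.92244871
index_at_mean = std_normal.inv_cdf(p_at_mean)   # x-bar' beta
density = std_normal.pdf(index_at_mean)         # phi(x-bar' beta)

# Probit coefficients for some continuous regressors.
beta = {"ue88": -0.0532774, "age": 0.0996338, "educ": 0.0471834}
me_continuous = {name: density * b for name, b in beta.items()}

# Discrete change for the dummy hvdrnk90 (coefficient and sample mean).
b_hv, mean_hv = -0.1049465, 0.099165
index_hv0 = index_at_mean - b_hv * mean_hv      # index with hvdrnk90 = 0
me_hvdrnk90 = std_normal.cdf(index_hv0 + b_hv) - std_normal.cdf(index_hv0)

for name, value in me_continuous.items():
    print(f"{name}: {value:.7f}")
print(f"hvdrnk90 (0 to 1): {me_hvdrnk90:.7f}")
```

Small discrepancies in the last digits are to be expected, since the inputs are the rounded values printed in the tables.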