Springer Texts in Business and Economics

The Censored Regression Model

Suppose one is interested in the amount one is willing to spend on the purchase of a durable good, for example, a car. In this case, one would observe the expenditures only if the car is bought, so

$$y_i^* = x_i'\beta + u_i, \quad \text{observed only if } y_i^* > 0 \tag{13.50}$$

where $x_i$ denotes a vector of household characteristics, such as income, number of children or education. $y_i^*$ is a latent variable, in this case the amount one is willing to spend on a car. We observe $y_i = y_i^*$ only if $y_i^* > 0$, and we set $y_i = 0$ if $y_i^* \le 0$. The censoring at zero is of course arbitrary, and the $u_i$'s are assumed to be IIN$(0, \sigma^2)$. This is known as the Tobit model after Tobin (1958). In this case, we have censored observations since we do not observe any $y_i^*$ that is negative. All we observe is the fact that this household did not buy a car and a corresponding vector $x_i$ of this household's characteristics. Without loss of generality, we assume that the first $n_1$ observations have positive $y_i^*$'s and the remaining $n_0 = n - n_1$ observations have non-positive $y_i^*$'s. In this case, OLS on the first $n_1$ observations, i.e., using only the positive observed $y_i^*$'s, would be biased since $u_i$ does not have zero mean in this subsample. In fact, by omitting observations for which $y_i^* \le 0$ from the sample, one is only considering disturbances from (13.50) such that $u_i > -x_i'\beta$. The distribution of these $u_i$'s is a truncated normal density given in Figure 13.2. The mean of this density is not zero and depends on $\beta$, $\sigma^2$ and $x_i$. More formally, the regression function can be written as:

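$$E(y_i^* \mid x_i, y_i^* > 0) = x_i'\beta + E[u_i \mid y_i^* > 0] = x_i'\beta + E[u_i \mid u_i > -x_i'\beta] = x_i'\beta + \sigma\gamma_i \quad \text{for } i = 1, 2, \ldots, n_1 \tag{13.51}$$

where $\gamma_i = \phi(-z_i)/[1 - \Phi(-z_i)]$ and $z_i = x_i'\beta/\sigma$, with $\phi$ and $\Phi$ denoting the standard normal density and distribution function. See Greene (1993, p. 685) for the moments of a truncated normal density or the Appendix to this chapter. OLS on the positive $y_i^*$'s omits the second term in (13.51), and is therefore biased and inconsistent.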
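The size of the omitted term in (13.51) can be checked numerically. The following is a minimal sketch, not part of the original text: it compares $\sigma\gamma_i$ with the simulated mean of normal disturbances kept only when $u_i > -x_i'\beta$. The values of `xb` and `sigma`, the seed, the number of draws, and all function names are illustrative assumptions.

```python
import math
import random


def norm_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def gamma(z):
    """gamma = phi(-z) / [1 - Phi(-z)], as defined below (13.51)."""
    return norm_pdf(-z) / (1.0 - norm_cdf(-z))


def simulated_truncated_mean(xb, sigma, draws=200_000, seed=0):
    """Average of u ~ N(0, sigma^2) over the draws that satisfy u > -xb."""
    rng = random.Random(seed)
    kept = [u for u in (rng.gauss(0.0, sigma) for _ in range(draws)) if u > -xb]
    return sum(kept) / len(kept)


# Hypothetical values of x_i'beta and sigma, chosen only for illustration.
xb, sigma = 0.5, 2.0
analytic = sigma * gamma(xb / sigma)          # second term of (13.51)
simulated = simulated_truncated_mean(xb, sigma)
print(f"sigma * gamma_i = {analytic:.4f}, simulated mean = {simulated:.4f}")
```

The two numbers should agree up to simulation noise, and both are well away from zero, which is the source of the OLS bias described above.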

A simple two-step procedure can be used to estimate (13.51). First, we define a dummy variable $d_i$, which takes the value 1 if $y_i^*$ is observed and 0 otherwise. This allows us to perform probit estimation on the whole sample, and provides us with a consistent estimator of $\beta/\sigma$. Also, $P[d_i = 1] = P[y_i^* > 0] = P[u_i > -x_i'\beta]$ and $P[d_i = 0] = P[y_i^* \le 0] = P[u_i \le -x_i'\beta]$. Therefore, the likelihood function is given by

$$\ell = \prod_{i=1}^{n} [P(u_i \le -x_i'\beta)]^{1-d_i}\,[P(u_i > -x_i'\beta)]^{d_i} = \prod_{i=1}^{n} \Phi(z_i)^{d_i}\,[1 - \Phi(z_i)]^{1-d_i} \tag{13.52}$$

where $z_i = x_i'\beta/\sigma$.

Table 13.14 Multinomial Logit Results: Problem Drinking

. mlogit y alc90th ue88 age agesq schooling married famsize white excellent verygood good fair northeast midwest south centercity othermsa q1 q2 q3, baseoutcome(1)

Multinomial logistic regression
Number of obs = 9822
Wald chi2(20) = 1276.47
Prob > chi2 = 0.0000
Pseudo R2 = 0.1655

Log likelihood = -3217.481

y	Coef.	Std. Err.	z	P>|z|	[95% Conf. Interval]
2
alc90th	.1270931	.21395	0.59	0.552	-.2922412	.5464274
ue88	.0458099	.051355	0.89	0.372	-.0548441	.1464639
age	.1617634	.0663205	2.44	0.015	.0317776	.2917492
agesq	-.0024377	.0007991	-3.05	0.002	-.004004	-.0008714
schooling	-.0092135	.0245172	-0.38	0.707	-.0572664	.0388393
married	.4004928	.1927458	2.08	0.038	.022718	.7782677
famsize	.0622453	.0503686	1.24	0.217	-.0364753	.1609659
white	.0391309	.1705625	0.23	0.819	-.2951653	.3734272
excellent	2.91833	.4486757	6.50	0.000	2.038942	3.797719
verygood	2.978336	.4505932	6.61	0.000	2.09519	3.861483
good	2.493939	.4446815	5.61	0.000	1.622379	3.365499
fair	1.460263	.4817231	3.03	0.002	.5161027	2.404422
northeast	.0849125	.2374365	0.36	0.721	-.3804545	.5502796
midwest	.0158816	.2037486	0.08	0.938	-.3834583	.4152215
south	.1750244	.2027444	0.86	0.388	-.2223474	.5723962
centercity	-.2717445	.1911074	-1.42	0.155	-.6463081	.1028192
othermsa	-.0921566	.1929076	-0.48	0.633	-.4702486	.2859354
q1	.422405	.1978767	2.13	0.033	.0345738	.8102362
q2	-.0219499	.2056751	-0.11	0.915	-.4250657	.3811659
q3	-.0365295	.2109049	-0.17	0.862	-.4498954	.3768364
_cons	-6.113244	1.427325	-4.28	0.000	-8.910749	-3.315739
3
alc90th	-.1534987	.1395003	-1.10	0.271	-.4269144	.1199169
ue88	-.0954848	.033631	-2.84	0.005	-.1614004	-.0295693
age	.227164	.0409884	5.54	0.000	.1468282	.3074999
agesq	-.0030796	.0004813	-6.40	0.000	-.0040228	-.0021363
schooling	.0890537	.0152314	5.85	0.000	.0592008	.1189067
married	.7085708	.1219565	5.81	0.000	.4695405	.9476012
famsize	.0622447	.0332365	1.87	0.061	-.0028975	.127387
white	.7380044	.1083131	6.81	0.000	.5257147	.9502941
excellent	3.702792	.1852415	19.99	0.000	3.339725	4.065858
verygood	3.653313	.1894137	19.29	0.000	3.282069	4.024557
good	2.99946	.1786747	16.79	0.000	2.649264	3.349656
fair	1.876172	.1885159	9.95	0.000	1.506688	2.245657
northeast	.088966	.1491191	0.60	0.551	-.203302	.3812341
midwest	.1230169	.1294376	0.95	0.342	-.130676	.3767099
south	.4393047	.1298054	3.38	0.001	.1848908	.6937185
centercity	-.2689532	.1231083	-2.18	0.029	-.510241	-.0276654
othermsa	.0978701	.1257623	0.78	0.436	-.1486195	.3443598
q1	-.0274086	.1286695	-0.21	0.831	-.2795961	.224779
q2	-.110751	.126176	-0.88	0.380	-.3580514	.1365494
q3	-.0530835	.1296053	-0.41	0.682	-.3071052	.2009382
_cons	-6.237275	.8886698	-7.02	0.000	-7.979036	-4.495515

(y==1 is the base outcome)

Once $\beta/\sigma$ is estimated, we substitute these estimates in $z_i$ and $\gamma_i$ given below (13.51) to get $\hat{\gamma}_i$. The second step is to estimate (13.51) using only the positive $y_i^*$'s with $\hat{\gamma}_i$ substituted for $\gamma_i$. The resulting estimator of $\beta$ is consistent and asymptotically normal; see Heckman (1976, 1979).
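The two building blocks of the first step can be sketched as follows, under stated assumptions. The code evaluates the log of the probit likelihood (13.52) for a given value of $\alpha = \beta/\sigma$ and then computes $\hat{\gamma}_i$ from a first-step estimate. It does not maximize the likelihood; in practice that would be left to a probit routine. The function names and the tiny data set are hypothetical.

```python
import math


def norm_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_loglik(d, X, alpha):
    """Log of (13.52), with alpha = beta/sigma so that z_i = x_i'alpha."""
    total = 0.0
    for d_i, x_i in zip(d, X):
        z_i = sum(a * x for a, x in zip(alpha, x_i))
        p_i = norm_cdf(z_i)  # P[d_i = 1] = Phi(z_i)
        total += d_i * math.log(p_i) + (1 - d_i) * math.log(1.0 - p_i)
    return total


def gamma_hat(x_i, alpha_hat):
    """gamma_i = phi(-z_i) / [1 - Phi(-z_i)] evaluated at z_i = x_i'alpha_hat."""
    z_i = sum(a * x for a, x in zip(alpha_hat, x_i))
    return norm_pdf(-z_i) / (1.0 - norm_cdf(-z_i))


# Hypothetical data: a constant and one regressor, two buyers and one non-buyer.
X = [[1.0, 2.0], [1.0, 0.5], [1.0, -1.0]]
d = [1, 1, 0]
alpha_hat = [0.2, 0.8]  # assumed first-step estimate of beta/sigma, for illustration only

print(probit_loglik(d, X, alpha_hat))
print([gamma_hat(x_i, alpha_hat) for x_i, d_i in zip(X, d) if d_i == 1])
```

The list printed last holds the $\hat{\gamma}_i$ for the observations with $d_i = 1$; these would enter the second-step regression as an additional regressor whose coefficient estimates $\sigma$.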

Alternatively, one can use maximum likelihood procedures to estimate the Tobit model. Note that we have two sets of observations: (i) the positive $y_i^*$'s with $y_i = y_i^*$, for which we can write the density function $N(x_i'\beta, \sigma^2)$, and (ii) the non-positive $y_i^*$'s, for which we assign $y_i = 0$ with probability

$$\Pr[y_i = 0] = \Pr[y_i^* < 0] = \Pr[u_i < -x_i'\beta] = \Phi(-x_i'\beta/\sigma) = 1 - \Phi(x_i'\beta/\sigma) \tag{13.53}$$

The probability over the entire censored region gets assigned to the censoring point. This allows us to write the following log-likelihood:

$$\log\ell = -\frac{1}{2}\sum_{i=1}^{n_1}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n_1}(y_i - x_i'\beta)^2 + \sum_{i=n_1+1}^{n}\log[1 - \Phi(x_i'\beta/\sigma)] \tag{13.54}$$
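A direct transcription of (13.54) into code may make the two kinds of contributions easier to see. This is a minimal sketch with hypothetical function names: observations with $y_i > 0$ contribute a normal log-density, and observations with $y_i = 0$ contribute the log-probability of the censored region. A numerical optimizer would be needed to maximize it.

```python
import math


def norm_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_loglik(y, X, beta, sigma2):
    """Log-likelihood (13.54) for observations censored at zero."""
    sigma = math.sqrt(sigma2)
    total = 0.0
    for y_i, x_i in zip(y, X):
        xb = sum(b * x for b, x in zip(beta, x_i))
        if y_i > 0:
            # Uncensored observation: log of the N(x_i'beta, sigma^2) density.
            total += -0.5 * math.log(2.0 * math.pi * sigma2)
            total -= (y_i - xb) ** 2 / (2.0 * sigma2)
        else:
            # Censored observation: log Pr[y_i = 0] = log[1 - Phi(x_i'beta/sigma)].
            total += math.log(1.0 - norm_cdf(xb / sigma))
    return total


# Hypothetical data: a constant and one regressor, with one censored observation.
y = [1.5, 0.0, 3.0]
X = [[1.0, 0.2], [1.0, -1.0], [1.0, 1.1]]
print(tobit_loglik(y, X, beta=[1.0, 2.0], sigma2=1.0))
```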

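Differentiating with respect to $\beta$ and $\sigma^2$ (see Maddala, 1983, p. 153), one gets

$$\frac{\partial \log\ell}{\partial \beta} = \sum_{i=1}^{n_1}\frac{(y_i - x_i'\beta)x_i}{\sigma^2} - \sum_{i=n_1+1}^{n}\frac{\phi_i x_i}{\sigma[1 - \Phi_i]} \tag{13.55}$$

$$\frac{\partial \log\ell}{\partial \sigma^2} = \sum_{i=1}^{n_1}\frac{(y_i - x_i'\beta)^2}{2\sigma^4} - \frac{n_1}{2\sigma^2} + \sum_{i=n_1+1}^{n}\frac{\phi_i x_i'\beta}{2\sigma^3(1 - \Phi_i)} \tag{13.56}$$

where $\Phi_i$ and $\phi_i$ are evaluated at $z_i = x_i'\beta/\sigma$.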

Premultiplying (13.55) by $\beta'/2\sigma^2$ and adding the result to (13.56), one gets

$$\hat{\sigma}^2_{MLE} = \sum_{i=1}^{n_1}(y_i - x_i'\beta)y_i/n_1 = Y_1'(Y_1 - X_1\beta)/n_1 \tag{13.57}$$
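The intermediate algebra behind this result can be spelled out as follows. This is a sketch of the omitted steps, assuming both first-order conditions (13.55) and (13.56) are set equal to zero; it adds nothing beyond the manipulation described in the text.

```latex
% Premultiply (13.55) by \beta'/2\sigma^2:
\frac{\beta'}{2\sigma^2}\,\frac{\partial \log\ell}{\partial \beta}
  = \sum_{i=1}^{n_1}\frac{(y_i - x_i'\beta)\,x_i'\beta}{2\sigma^4}
  - \sum_{i=n_1+1}^{n}\frac{\phi_i\,x_i'\beta}{2\sigma^3(1-\Phi_i)} = 0

% Add (13.56); the sums over the censored observations cancel:
\sum_{i=1}^{n_1}\frac{(y_i - x_i'\beta)\,[(y_i - x_i'\beta) + x_i'\beta]}{2\sigma^4}
  - \frac{n_1}{2\sigma^2} = 0

% Since (y_i - x_i'\beta) + x_i'\beta = y_i, multiplying by 2\sigma^4/n_1 gives
\sigma^2 = \frac{1}{n_1}\sum_{i=1}^{n_1}(y_i - x_i'\beta)\,y_i
```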

where $Y_1$ denotes the $n_1 \times 1$ vector of non-zero observations on $y_i$, and $X_1$ is the $n_1 \times k$ matrix of values of $x_i$ for the non-zero $y_i$'s. Also, after multiplying throughout by $\sigma$, (13.55) can be written as:

$$-X_0'\gamma_0 + X_1'(Y_1 - X_1\beta)/\sigma = 0 \tag{13.58}$$

where $X_0$ denotes the $n_0 \times k$ matrix of $x_i$'s for which $y_i$ is zero, and $\gamma_0$ is an $n_0 \times 1$ vector of $\gamma_i$'s $= \phi_i/[1 - \Phi_i]$ evaluated at $z_i = x_i'\beta/\sigma$ for the observations for which $y_i = 0$. Solving (13.58) one gets

$$\hat{\beta}_{MLE} = (X_1'X_1)^{-1}X_1'Y_1 - \sigma(X_1'X_1)^{-1}X_0'\gamma_0 \tag{13.59}$$

Note that the first term in (13.59) is the OLS estimator for the first $n_1$ observations, for which $y_i^*$ is positive.

One can use the Newton-Raphson procedure or the method of scoring to maximize the log-likelihood; for the second derivatives of the log-likelihood, see Maddala (1983, pp. 154-156). The Tobit estimates can be computed with the tobit command in Stata. Note that for the Tobit specification, both $\beta$ and $\sigma^2$ are identified. This contrasts with the logit and probit specifications, where only the ratio $\beta/\sigma$ is identified. Wooldridge (2009, Chapter 17) recommends obtaining the estimates of $\beta/\sigma$ from a probit and comparing them with the Tobit estimates generated by dividing $\hat{\beta}$ by $\hat{\sigma}$. If these estimates are different or have different signs, then the Tobit estimation may not be appropriate. Problem 13.17 illustrates the Tobit estimation for the married women's labor supply example using the Mroz (1987) data.

Maddala warns that the Tobit specification is not necessarily the right specification every time we have zero observations. It is applicable only in those cases where the latent variable can, in principle, take negative values and the observed zero values are a consequence of censoring and non-observability. In fact, one cannot have negative expenditures on a car, negative hours of work or negative wages. However, one can enter employment and earn wages when one’s observed wage is larger than the reservation wage. Let y* be the difference between observed wage and reservation wage. Only if y* is positive will wages be observed. Final warning: The Tobit specification is heavily reliant on the normality and homoskedasticity assumptions. Failure of these assumptions leads to misleading inference.
