A COMPANION TO Theoretical Econometrics
Some conventional sample selection models
The tobit model assumes that the censoring threshold is deterministic and known. A generalization of the tobit model assumes that the censoring threshold is an unobservable stochastic variable. This generalization consists of two latent regression functions defined on the population: $y_1^* = x_1\beta_1 + u_1$ and $y_2^* = x_2\beta_2 + u_2$. The sample observation is $(Iy_1, I)$, where $y_1 = y_1^*$ and $I = 1$ if $y_1^* \geq y_2^*$, and $I = 0$ if $y_1^* < y_2^*$. An example of this model is a labor supply model (Gronau, 1974; Heckman, 1974; Nelson, 1977), where $y_1^*$ is the offered wage and $y_2^*$ is the reservation wage of an individual. In a labor supply model, the individual maximizes utility with respect to income and leisure time subject to income and time constraints: $\max\{U(t, c, u) : c = y_1^*(T - t) + c_0,\ t \leq T\}$, where $t$ is leisure time, $T$ is the available time, $c_0$ is the nonlabor income available, $c$ is the total income, and $u$ represents unobserved characteristics of the individual. The reservation wage $y_2^*$ is $(\partial U/\partial t)/(\partial U/\partial c)|_{t=T}$, the marginal rate of substitution between leisure and income evaluated at zero hours of work. The market wage $y_1^*$ can be observed only for workers. This formulation can be generalized further to incorporate fixed costs of participation. A general framework for these models can be written as
$y^* = x\beta + u$, and $I^* = z\gamma - \epsilon$, (18.1)
where $y = y^*$ can be observed only if $I^* > 0$. This two-equation formulation provides the prototypical sample selection model in econometrics. The sample is censored if the sign of $I^*$ is observable in addition to $Iy$, where $I$ is the indicator of the event $I^* > 0$. The sample is truncated if only the event $I = 1$ and the corresponding sample observations of $y$ are available.
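The distinction between the censored and the truncated sample can be made concrete with a small simulation of (18.1). The following is only a sketch under stated assumptions: the disturbances are taken to be jointly normal with correlation `rho`, the regressors are scalar standard normals, and the function names and parameter values are hypothetical choices made for illustration.

```python
import random


def simulate_selection(n, beta=1.0, gamma=0.5, rho=0.8, seed=0):
    """Simulate n draws from y* = x*beta + u and I* = z*gamma - eps.

    Assumptions (not part of the model statement): scalar regressors,
    jointly normal (u, eps) with unit variances and correlation rho.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)
        eps = rng.gauss(0.0, 1.0)
        # Build u with unit variance and corr(u, eps) = rho.
        u = rho * eps + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        y_star = x * beta + u
        i_star = z * gamma - eps
        selected = 1 if i_star > 0 else 0
        # y is recorded only when I* > 0; otherwise it is missing (None).
        rows.append({"x": x, "z": z, "I": selected,
                     "y": y_star if selected else None})
    return rows


def censored_sample(rows):
    """Censored case: I is seen for everyone, y only when I = 1."""
    return list(rows)


def truncated_sample(rows):
    """Truncated case: only the observations with I = 1 are available."""
    return [r for r in rows if r["I"] == 1]


def mean_selected_disturbance(rows, beta=1.0):
    """Average of u = y - x*beta over the selected observations."""
    resid = [r["y"] - r["x"] * beta for r in rows if r["I"] == 1]
    return sum(resid) / len(resid)


rows = simulate_selection(20000)
print(len(censored_sample(rows)), len(truncated_sample(rows)))
print(mean_selected_disturbance(rows))
```

Under these assumptions, selection requires $\epsilon < z\gamma$, so with a positive `rho` the disturbance $u$ has a negative mean in the selected sample, and a regression of $y$ on $x$ using the selected observations alone is not based on a zero-mean disturbance. Setting `rho=0.0` removes this effect.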
Sample data can be generated by individuals choosing to belong to one group or another, i.e., by the self-selection of individuals. A prototypical choice-theoretic model of self-selection is that of Roy (1951), who discussed the problem of individuals choosing between two professions, hunting and fishing, based on their productivity (income) in each. There is a latent population of skills. While every person can, in principle, do the work in each "occupation", self-interest drives individuals to choose the occupation that produces the highest income for them. Roy's model is special in that an individual chooses an occupation solely on the basis of the highest income among occupations. In a more general setting, individuals choose among several alternatives based on their preferences, and the (potential) outcomes can be factors in their utility functions (Lee, 1978; Willis and Rosen, 1979).
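Roy's income-maximizing choice rule can be illustrated with a short simulation. This is a sketch, not a reproduction of Roy's analysis: the two potential (log) incomes are assumed to be independent standard normal draws, and the function names are hypothetical.

```python
import random


def simulate_roy(n, seed=0):
    """Each person has a potential income in both occupations and
    chooses the occupation with the higher one (assumed normal draws)."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        hunting = rng.gauss(0.0, 1.0)  # potential (log) income as a hunter
        fishing = rng.gauss(0.0, 1.0)  # potential (log) income as a fisher
        choice = "hunting" if hunting > fishing else "fishing"
        people.append((hunting, fishing, choice))
    return people


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def compare_means(people):
    """Population mean of each potential income versus the mean
    observed among those who actually chose that occupation."""
    return {
        "hunting_population": mean(h for h, _, _ in people),
        "hunting_observed": mean(h for h, _, c in people if c == "hunting"),
        "fishing_population": mean(f for _, f, _ in people),
        "fishing_observed": mean(f for _, f, c in people if c == "fishing"),
    }


print(compare_means(simulate_roy(20000)))
```

In this setup the mean income observed among hunters exceeds the mean hunting income of the whole latent population, and likewise for fishers, because each occupation retains only the individuals for whom it is the better option.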
A self-selection model with two alternatives and a potential outcome equation for each alternative can be written as
$y_1^* = x_1\beta_1 + u_1$, and $y_2^* = x_2\beta_2 + u_2$, (18.2)
with a choice equation
$I^* = z\gamma - \epsilon$. (18.3)
The sample observation $(I, y)$ is $I = 1$ and $y = y_1^*$ if $I^* > 0$, and $I = 0$ and $y = y_2^*$ if $I^* \leq 0$. For cases with polychotomous choices, a self-selection model with $m$ alternatives and $m_1$ potential outcome equations, where $0 < m_1 \leq m$, is
$y_j = x_j\beta_j + u_j$, $j = 1, \ldots, m_1$, (18.4)
and
$U_j = z_j\gamma + v_j$, $j = 1, \ldots, m$. (18.5)
Here $U_j$ represents the utility of alternative $j$. Outcomes are available for $m_1$ of the alternatives. The outcome $y_j$ can be observed only if alternative $j$ is chosen by the individual. In a utility maximization framework, alternative $j$ will be chosen if $U_j > U_l$ for all $l \neq j$, $l = 1, \ldots, m$.
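The observation rule of the polychotomous model can be sketched in a few lines. The helper below is hypothetical and makes two simplifying assumptions: alternatives are indexed from 0 rather than 1, and the alternatives with outcome equations are the first $m_1$ of the $m$ alternatives, as the indexing of (18.4) suggests. Ties in utility, which the strict inequality rules out, are broken in favor of the lowest index.

```python
def choose_and_observe(utilities, outcomes):
    """Apply the utility-maximizing choice rule and the observation rule.

    utilities: list of U_j for the m alternatives (0-based index).
    outcomes:  list of potential outcomes y_j for the first m1 alternatives.
    Returns (chosen_index, observed_outcome); the outcome is None when the
    chosen alternative has no outcome equation.
    """
    m, m1 = len(utilities), len(outcomes)
    if not 0 < m1 <= m:
        raise ValueError("need 0 < m1 <= m")
    # Alternative j is chosen when U_j exceeds every other U_l.
    chosen = max(range(m), key=lambda j: utilities[j])
    # Only the outcome of the chosen alternative is ever observed.
    observed = outcomes[chosen] if chosen < m1 else None
    return chosen, observed


# Three alternatives, outcome equations for the first two.
print(choose_and_observe([0.2, 1.5, 0.7], [10.0, 20.0]))
print(choose_and_observe([0.2, 0.5, 0.9], [10.0, 20.0]))
```

In the first call the second alternative has the highest utility, so its outcome is the one recorded; in the second call the chosen alternative has no outcome equation, so no outcome is recorded for that individual.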
The selection equations in the preceding models provide discrete choices. In some cases, the selection equations may provide more sample information than discrete choices. A censored regression selection criterion is such a case. A model of female labor supply without participation cost is an example. The market wage can be observed when the hours of work of an individual are positive, and the hours-of-work equation can be modeled by a tobit model. A sample selection model with a tobit (censored) selection rule can be specified as
$y_1^* = x\gamma + u_1$, and $y_2^* = x\beta + u_2$, (18.6)
where $(y_1, y_2)$ can be observed such that $(y_1, y_2) = (y_1^*, y_2^*)$ when $y_1^* > 0$. This model provides additional information in that positive values of $y_1^*$ can be observed instead of just the sign of $y_1^*$. Other models of interest are simultaneous equation models and panel data models.
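The tobit selection rule in (18.6) can be sketched as follows, reading $y_1^*$ as desired hours of work and $y_2^*$ as the market wage in the labor supply example. The code is illustrative only: recording a nonparticipant as zero hours with a missing wage is an assumed convention, the disturbances are assumed jointly normal with correlation `rho`, and all names and parameter values are hypothetical.

```python
import random


def observe(y1_star, y2_star):
    """Observation rule: both variables are seen when y1* > 0.

    Assumed convention for the censored case: y1 is recorded as 0.0
    and y2 is missing (None).
    """
    if y1_star > 0:
        return (y1_star, y2_star)
    return (0.0, None)


def simulate_tobit_selection(n, gamma=0.5, beta=1.0, rho=0.5, seed=0):
    """Simulate (y1, y2) from y1* = x*gamma + u1 and y2* = x*beta + u2."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        u1 = rng.gauss(0.0, 1.0)
        # Build u2 with unit variance and corr(u1, u2) = rho.
        u2 = rho * u1 + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        sample.append(observe(x * gamma + u1, x * beta + u2))
    return sample


sample = simulate_tobit_selection(1000)
participants = [(y1, y2) for y1, y2 in sample if y2 is not None]
print(len(participants), len(sample) - len(participants))
```

Unlike a binary selection indicator, the first element of each observed pair carries the magnitude of $y_1^*$ whenever it is positive, which is the additional sample information noted above.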
An important feature of a sample selection model is its use in investigating potential outcomes or opportunity costs in addition to observed outcomes. For sectoral wage or occupational choices with self-selection, Roy's model (Roy, 1951; Heckman and Honoré, 1990) emphasizes the comparative advantage of individuals and its effect on income distribution. For example, the comparative advantage measure of Sattinger (1978) involves the computation of opportunity costs for forgone choices (Lee, 1995). Opportunity costs are counterfactual outcomes. The evaluation of counterfactual outcomes is important in social welfare programs (Heckman and Robb, 1985; Björklund and Moffitt, 1987) because of its policy implications.