A COMPANION TO Theoretical Econometrics
Sample selection models with a tobit selection rule
For the semiparametric estimation of a sample selection model with a discrete choice equation, the exclusion restriction of a relevant regressor in the choice equation from the outcome equation provides the crucial identification condition. Such an exclusion restriction will not be required when the selection criterion is a tobit rule. The identification and estimation of such a model are considered in Lee (1994a), Chen (1997), and Honoré, Kyriazidou, and Udry (1997). The model in Lee (1994a) assumes that the disturbances $(u_1, u_2)$ in equation (18.6) are independent of the regressors $x$. With iid disturbances, model (18.6) implies two observable outcome equations: $E(y_{2i} \mid y_{1i} > 0, x_i) = x_i\beta + E(u_{2i} \mid y_{1i} > 0, x_i)$ and
$$E(y_2 \mid u_1 > -x_i\gamma,\ x\gamma > x_i\gamma,\ x_i) = E(x \mid x\gamma > x_i\gamma,\ x_i)\beta + E(u_2 \mid u_1 > -x_i\gamma,\ x\gamma > x_i\gamma,\ x_i).$$
As the disturbances are independent of $x$, the conditional moment restriction $E(u_2 \mid u_1 > -x_i\gamma,\ x\gamma > x_i\gamma,\ x_i) = E(u_{2i} \mid y_{1i} > 0, x_i)$ provides the identification of $\beta$ given $\gamma$ from the tobit equation. The intuition behind these formulations is based on the fact that the density of $(u_{1i}, u_{2i})$ conditional on $y_{1i} > 0$ and $x_i$ is the same as the density of any $(u_{1j}, u_{2j})$ conditional on $u_{1j} > -x_i\gamma$ and $x_j\gamma > x_i\gamma$ at the point $x_i\gamma$. The conditions $x_j\gamma > x_i\gamma$ and $u_{1j} > -x_i\gamma$ imply that $y_{1j} = x_j\gamma + u_{1j} > 0$ and, hence, the observability of $(y_{1j}, y_{2j})$. The conditional expectation $E(u_2 \mid u_1 > -x_i\gamma,\ x\gamma > x_i\gamma,\ x_i)$ can be estimated by
$\sum_{j=1}^n u_{2j}\, I(u_{1j} > -x_{1i}\gamma,\ x_{1j}\gamma > x_{1i}\gamma) \big/ \sum_{j=1}^n I(u_{1j} > -x_{1i}\gamma,\ x_{1j}\gamma > x_{1i}\gamma)$. Instead of this estimator,
Lee (1994a) suggests a kernel smoothing estimator $E_n(y_2 - x\beta \mid x_{1i}, \hat\gamma)$, where $\hat\gamma$ is a consistent estimator from first-stage semiparametric estimation of the tobit selection equation, and proposes a semiparametric least squares procedure: $\min_\beta \frac{1}{n}\sum_{i=1}^n I_X(x_i)\,(y_{2i} - x_i\beta - E_n(y_2 - x\beta \mid x_{1i}, \hat\gamma))^2$, where $I_X(x_i)$ is a trimming function on $x$. The two-stage estimator of $\beta$ has a closed form expression and is $\sqrt{n}$-consistent and asymptotically normal.
Chen (1997) proposes two estimation approaches for this model. One is similar to the semiparametric least squares procedure in Lee (1994a), except that the ratio of sample indicators is used without smoothing and the trimming function is replaced by a weighting function as in Powell (1987). The second estimation approach in Chen (1997) is $\min_{\beta,\alpha} \frac{1}{n}\sum_{i=1}^n I(y_{1i} - x_i\hat\gamma > 0,\ x_i\hat\gamma > 0)\,(y_{2i} - x_i\beta - \alpha)^2$. This approach utilizes a different portion of the observable disturbances, where $E(u_{2i} \mid u_{1i} > 0,\ x_i\gamma > 0) = \alpha$ is a constant for all $i$ under the independence assumption. Chen (1997) also derives the asymptotic efficiency bound for this semiparametric model. None of the estimators available in the literature (including the estimators in Honoré et al.
(1997) discussed later) attain the efficiency bound.
Honoré et al. (1997) propose two-stage estimators based on symmetry. They consider first the case that $(u_1, u_2)$ is symmetrically distributed conditional on $x$ (arbitrary heteroskedasticity is allowed), i.e. $(-u_1, -u_2)$ is distributed like $(u_1, u_2)$ conditional on $x$. The property of symmetry in disturbances was first explored for estimating the censored and truncated regression models in Powell (1986). Even though the underlying disturbances are symmetrically distributed, the observable disturbances are no longer symmetrically distributed under sample selection. Honoré et al. (1997) restore the symmetry property by restricting the estimation of $\beta$ to sample observations in the region where $-x\gamma < u_1 < x\gamma$ (equivalently, $0 < y_1 < 2x\gamma$). With sample observations in the restricted region, $u_1$ is symmetrically distributed around 0 and so, by the joint symmetry of $(u_1, u_2)$, is $u_2$; the proposed estimation procedures can therefore be based on least absolute deviations or least squares, i.e. $\min_\beta \frac{1}{n}\sum_{i=1}^n I(0 < y_{1i} < 2x_i\hat\gamma)\,|y_{2i} - x_i\beta|$, or $\min_\beta \frac{1}{n}\sum_{i=1}^n I(0 < y_{1i} < 2x_i\hat\gamma)\,(y_{2i} - x_i\beta)^2$, where $\hat\gamma$ is a first-stage estimator, e.g. semiparametric censored or truncated regression from Powell (1986). The restoration of the symmetric region demonstrates elegantly the usefulness of observed residuals of $u_1$ from the tobit selection equation for estimation.
Honoré et al. (1997) consider also the case that $(u_1, u_2)$ is independent of $x$ in the underlying equations. They suggest estimation approaches based on pairwise differences across sample observations. The pairwise difference approach aims to create a symmetry property for the difference of disturbances. The difference of two iid random variables must be symmetrically distributed around zero if there is no sample selection. Under sample selection, one has to restore the symmetry property of the pairwise difference. Honoré et al.
suggested the trimming of $u_{1i}$ and $u_{1j}$ identically so that $u_{1i} > \max\{-x_i\gamma, -x_j\gamma\}$ and $u_{1j} > \max\{-x_i\gamma, -x_j\gamma\}$ (equivalently, $y_{1i} > \max\{0, (x_i - x_j)\gamma\}$ and $y_{1j} > \max\{0, (x_j - x_i)\gamma\}$). On this trimmed region, the independence assumption implies that $u_{2i} - u_{2j}$ is distributed symmetrically around 0. Their suggested pairwise difference estimators are $\min_\beta \sum_{i<j} I(y_{1i} > \max\{0, (x_i - x_j)\hat\gamma\},\ y_{1j} > \max\{0, (x_j - x_i)\hat\gamma\})\,|y_{2i} - y_{2j} - (x_{2i} - x_{2j})\beta|$ or $\min_\beta \sum_{i<j} I(y_{1i} > \max\{0, (x_i - x_j)\hat\gamma\},\ y_{1j} > \max\{0, (x_j - x_i)\hat\gamma\})\,|y_{2i} - y_{2j} - (x_{2i} - x_{2j})\beta|^2$. Consistency and asymptotic normality of the estimators are derived by empirical process arguments from Pakes and Pollard (1989).
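The least squares versions of the two trimming ideas of Honoré et al. (1997) described above can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the authors' implementation: the function names `symmetric_trim_ls` and `pairwise_difference_ls` are hypothetical, the data are simulated with bivariate normal disturbances and a single regressor, the selection index has no intercept, and the true $\gamma$ is plugged in where a first-stage estimate $\hat\gamma$ would be used in practice.

```python
import numpy as np


def symmetric_trim_ls(y1, y2, x, gamma):
    """Least squares on the symmetric region 0 < y1 < 2 * x @ gamma.

    Returns the intercept followed by the slope coefficients.
    """
    index = x @ gamma
    keep = (y1 > 0) & (y1 < 2.0 * index)
    design = np.column_stack([np.ones(keep.sum()), x[keep]])
    coef, *_ = np.linalg.lstsq(design, y2[keep], rcond=None)
    return coef


def pairwise_difference_ls(y1, y2, x, gamma):
    """Least squares on pairwise differences over identically trimmed pairs.

    The intercept is differenced out, so only slopes are returned.
    """
    index = x @ gamma
    d_index = index[:, None] - index[None, :]        # (x_i - x_j) gamma
    # Keep pair (i, j) if y1i > max{0, (x_i - x_j) gamma}
    # and y1j > max{0, (x_j - x_i) gamma}.
    keep = (y1[:, None] > np.maximum(0.0, d_index)) & (
        y1[None, :] > np.maximum(0.0, -d_index)
    )
    i_idx, j_idx = np.nonzero(np.triu(keep, k=1))    # retained pairs with i < j
    dx = x[i_idx] - x[j_idx]
    dy = y2[i_idx] - y2[j_idx]
    coef, *_ = np.linalg.lstsq(dx, dy, rcond=None)
    return coef


# Toy design: every numerical value below is an illustrative assumption.
rng = np.random.default_rng(0)
n = 3000
gamma_true = np.array([1.0])
beta_true = np.array([2.0])
x = rng.normal(size=(n, 1))
u = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=n)
y1_star = x @ gamma_true + u[:, 0]
y2_star = 1.0 + x @ beta_true + u[:, 1]

observed = y1_star > 0                               # tobit selection rule
y1, y2, xs = y1_star[observed], y2_star[observed], x[observed]

# OLS on the selected sample ignores the selection and serves as a benchmark.
ols_design = np.column_stack([np.ones(observed.sum()), xs])
beta_ols = np.linalg.lstsq(ols_design, y2, rcond=None)[0]
beta_sym = symmetric_trim_ls(y1, y2, xs, gamma_true)
beta_pd = pairwise_difference_ls(y1, y2, xs, gamma_true)

print("OLS on selected sample (intercept, slope):", beta_ols)
print("symmetric trimming LS (intercept, slope): ", beta_sym)
print("pairwise difference LS (slope):           ", beta_pd)
```

In this design the same regressor enters both equations, so no exclusion restriction is available. Because the disturbances are correlated, OLS on the selected sample would generally be expected to miss the true slope, while the two trimmed estimators should be close to it. The pairwise version builds an n-by-n indicator matrix, which is acceptable for a few thousand observations but would need a blocked loop for larger samples.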