Springer Texts in Business and Economics
Sample Selection and Non-response
Non-response is a big problem plaguing survey data. Some individuals refuse to respond and some do not answer all the questions, especially on relevant economic variables like income. Suppose we randomly interviewed 150 individuals upon their graduation from high school. Among these, 50 were female and 100 were male. A year later, we try to re-interview these individuals to find out whether they are employed or not. Only 70 out of 100 males and 40 out of 50 females were located and interviewed a year later. Out of those re-interviewed, 60 males and 20 females were found to be employed. Let y = 1 if the individual is employed and zero if not. Let x be the sex of this individual and let z = 1 if this individual is located and interviewed a year later and zero otherwise.
Conditioning on the sex of the respondent, one can compute the probability of being employed a year after high school graduation as follows:
Pr[y = 1/x] = Pr[y = 1/x, z = 1] Pr[z = 1/x]+ Pr[y = 1/x, z = 0] Pr[z = 0/x]
In this case, Pr[y = 1/Male, z = 1] = 60/70, Pr[z = 1/Male] = 70/100 and Pr[z = 0/Male] = 30/100. But the sampling process is uninformative about the non-respondents or the censored observations, i.e., Pr[y = 1/Male, z = 0]. Therefore, in the absence of other information
Pr[y = 1/Male] = (0.6) + (0.3) Pr[y = 1/Male, z = 0]
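The decomposition can be checked numerically. The following Python sketch is an illustration only: the function name `employment_prob` and the argument `q`, which stands for the unidentified Pr[y = 1/x, z = 0], are hypothetical and do not come from the text. The value q = 1/2 is an arbitrary choice made to show how the result depends on an assumption the data cannot verify.

```python
from fractions import Fraction

def employment_prob(n_total, n_located, n_employed, q):
    """Pr[y = 1/x] by the law of total probability, for an assumed
    value q of the unobserved Pr[y = 1/x, z = 0]."""
    p_resp = Fraction(n_located, n_total)         # Pr[z = 1/x]
    p_emp_resp = Fraction(n_employed, n_located)  # Pr[y = 1/x, z = 1]
    return p_emp_resp * p_resp + q * (1 - p_resp)

# Males: identified part (60/70)(70/100) = 0.6, plus 0.3 times the assumed q.
half = employment_prob(100, 70, 60, Fraction(1, 2))
print(float(half))
```

Exact fractions are used so that the identified part comes out as exactly 0.6 rather than a floating-point approximation.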
Manski (1995) argues that one can estimate bounds on this probability. In fact, replacing Pr[y = 1/Male, z = 0], which satisfies 0 ≤ Pr[y = 1/Male, z = 0] ≤ 1, by its lower and upper bounds yields
0.6 ≤ Pr[y = 1/Male] ≤ 0.9
with the width of the bound equal to the probability of non-response conditional on being male, i.e., Pr[z = 0/Male] = 0.3. Similarly, 0.4 ≤ Pr[y = 1/Female] ≤ 0.6 with the width of the bound equal to the probability of non-response conditional on being female, i.e., Pr[z = 0/Female] = 10/50 = 0.2. Manski (1995) argues that these bounds are informative and should be the starting point of empirical analysis. Researchers assuming that non-response is ignorable or exogenous are imposing the following restriction
Pr[y = 1/Male, z = 1] = Pr[y = 1/Male, z = 0] = Pr[y = 1/Male] = 60/70
Pr[y = 1/Female, z = 1] = Pr[y = 1/Female, z = 0] = Pr[y = 1/Female] = 20/40
To the extent that these probabilities differ, the ignorable non-response assumption is cast into doubt.
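As a rough numerical companion to this example, the Python sketch below computes, for each sex, the worst-case bounds and the point estimate implied by ignorable non-response. The variable names (`counts`, `results`) and the dictionary layout are illustrative assumptions; only the counts themselves come from the example.

```python
from fractions import Fraction

# (initial sample, located a year later, employed among those located)
counts = {"Male": (100, 70, 60), "Female": (50, 40, 20)}

results = {}
for sex, (n_total, n_located, n_employed) in counts.items():
    p_resp = Fraction(n_located, n_total)         # Pr[z = 1/x]
    p_emp_resp = Fraction(n_employed, n_located)  # Pr[y = 1/x, z = 1]
    lower = p_emp_resp * p_resp                   # sets Pr[y = 1/x, z = 0] = 0
    upper = lower + (1 - p_resp)                  # sets Pr[y = 1/x, z = 0] = 1
    # Under ignorable non-response, Pr[y = 1/x] equals the respondents' rate.
    results[sex] = {"lower": lower, "ignorable": p_emp_resp, "upper": upper}
    print(sex, float(lower), round(float(p_emp_resp), 3), float(upper))
```

The ignorable-response estimates, 60/70 for males and 20/40 for females, necessarily fall inside the corresponding bounds, so the observed data alone can neither confirm nor refute the restriction.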