Mostly Harmless Econometrics: An Empiricist’s Companion
The Selection Problem
We take a brief time-out for a more formal discussion of the role experiments play in uncovering causal effects. Suppose you are interested in a causal “if-then” question. To be concrete, consider a simple example: Do hospitals make people healthier? For our purposes, this question is allegorical, but it is surprisingly close to the sort of causal question health economists care about. To make this question more realistic, imagine we’re studying a poor elderly population that uses hospital emergency rooms for primary care. Some of these patients are admitted to the hospital. This sort of care is expensive, crowds hospital facilities, and is, perhaps, not very effective (see, e.g., Grumbach, Keane, and Bindman, 1993). In fact, exposing patients who are themselves vulnerable to other sick patients might have a net negative impact on their health.
Since those admitted to the hospital get many valuable services, the answer to the hospital-effectiveness question still seems likely to be “yes.” But will the data back this up? The natural approach for an empirically minded person is to compare the health status of those who have been to the hospital with the health of those who have not. The National Health Interview Survey (NHIS) contains the information needed to make this comparison. Specifically, it includes a question “During the past 12 months, was the respondent a patient in a hospital overnight?” which we can use to identify recent hospital visitors. The NHIS also asks “Would you say your health in general is excellent, very good, good, fair, poor?” The following table displays the mean health status (assigning a 1 to excellent health and a 5 to poor health) among those who have been hospitalized and those who have not (tabulated from the 2005 NHIS):
Group Sample Size Mean health status Std. Error
The difference in the means is 0.71, a large and highly significant contrast in favor of the non-hospitalized, with a t-statistic of 58.9.
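As a reminder of the arithmetic behind a contrast like this one, the t-statistic for a difference between two group means can be computed from each group's mean and standard error. The sketch below assumes the two groups are independent samples; the function name `diff_in_means` and the input values are hypothetical and are not the NHIS figures.

```python
import math


def diff_in_means(mean_a, se_a, mean_b, se_b):
    """Return (difference, t-statistic) for two independent group means."""
    diff = mean_a - mean_b
    # The variance of a difference of independent means is the sum of
    # the two variances, so the standard errors combine in quadrature.
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return diff, diff / se_diff


# Made-up inputs for illustration only; these are NOT the NHIS values.
diff, t_stat = diff_in_means(3.0, 0.3, 2.0, 0.4)
print(f"difference = {diff:.2f}, t = {t_stat:.2f}")
```

With very large samples the standard errors become small, which is how a moderate difference in means can produce a very large t-statistic.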
Taken at face value, this result suggests that going to the hospital makes people sicker. It’s not impossible this is the right answer: hospitals are full of other sick people who might infect us, and dangerous machines and chemicals that might hurt us. Still, it’s easy to see why this comparison should not be taken at face value: people who go to the hospital are probably less healthy to begin with. Moreover, even after hospitalization, people who have sought medical care are not as healthy, on average, as those who never get hospitalized in the first place, though they may well be better off than they otherwise would have been.
To describe this problem more precisely, think about hospital treatment as described by a binary random variable, $D_i = \{0, 1\}$. The outcome of interest, a measure of health status, is denoted by $Y_i$. The question is whether $Y_i$ is affected by hospital care. To address this question, we assume we can imagine what might have happened to someone who went to the hospital if they had not gone and vice versa. Hence, for any individual there are two potential health variables:
$$
\text{potential outcome} =
\begin{cases}
Y_{1i} & \text{if } D_i = 1 \\
Y_{0i} & \text{if } D_i = 0
\end{cases}
$$
In other words, $Y_{0i}$ is the health status of an individual had he not gone to the hospital, irrespective of whether he actually went, while $Y_{1i}$ is the individual’s health status if he goes. We would like to know the difference between $Y_{1i}$ and $Y_{0i}$, which can be said to be the causal effect of going to the hospital for individual $i$. This is what we would measure if we could go back in time and change a person’s treatment status.[4]
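The observed outcome, $Y_i$, can be written in terms of potential outcomes as
$$
Y_i =
\begin{cases}
Y_{1i} & \text{if } D_i = 1 \\
Y_{0i} & \text{if } D_i = 0
\end{cases}
\;=\; Y_{0i} + (Y_{1i} - Y_{0i}) D_i .
$$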
This notation is useful because $Y_{1i} - Y_{0i}$ is the causal effect of hospitalization for an individual. In general, there is likely to be a distribution of both $Y_{1i}$ and $Y_{0i}$ in the population, so the treatment effect can be different for different people. But because we never see both potential outcomes for any one person, we must learn about the effects of hospitalization by comparing the average health of those who were and were not hospitalized.
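The way the observed outcome selects one of the two potential outcomes can be illustrated with a few lines of Python. This is only a sketch: the helper name `observed_outcome` and the four individuals listed are hypothetical, chosen to show that each person reveals exactly one potential outcome.

```python
def observed_outcome(y0, y1, d):
    """Observed outcome: Y_i = Y_0i + (Y_1i - Y_0i) * D_i."""
    return y0 + (y1 - y0) * d


# Hypothetical individuals as (Y_0i, Y_1i, D_i), with health on the
# 1 (excellent) to 5 (poor) scale. In real data only one of the two
# potential outcomes is ever observed for each person.
people = [(4, 3, 1), (2, 2, 0), (5, 3, 1), (1, 2, 0)]

for y0, y1, d in people:
    y = observed_outcome(y0, y1, d)
    # The individual causal effect needs both potential outcomes,
    # so it can be computed here but not from observed data.
    print(f"D={d}  observed Y={y}  causal effect={y1 - y0}")
```

Because the data contain only the observed outcome and the treatment indicator, the individual effect printed in the last column is exactly the quantity that cannot be recovered person by person.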
A naive comparison of averages by hospitalization status tells us something about potential outcomes, though not necessarily what we want to know. The comparison of average health conditional on hospitalization status is formally linked to the average causal effect by the equation below:
$$
\underbrace{E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]}_{\text{observed difference in average health}}
= \underbrace{E[Y_{1i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 1]}_{\text{average treatment effect on the treated}}
+ \underbrace{E[Y_{0i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 0]}_{\text{selection bias}}
$$
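A small simulation can make this decomposition concrete. The sketch below is illustrative only: the data-generating process, the parameter values, and the function names `simulate` and `decompose` are all assumptions made here, not anything estimated from the NHIS. Health is treated as a continuous index in which higher values mean worse health, hospitalization is assumed to help everyone a little on average, and sicker people are assumed more likely to be hospitalized.

```python
import random
from statistics import mean


def simulate(n=10_000, seed=0):
    """Draw hypothetical (Y_0i, Y_1i, D_i) triples."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y0 = rng.gauss(2.5, 1.0)       # health without hospitalization
        effect = rng.gauss(-0.3, 0.2)  # assumed individual causal effect
        y1 = y0 + effect               # health with hospitalization
        # Assumed selection rule: higher (worse) y0 raises the chance
        # of being hospitalized.
        d = 1 if y0 + rng.gauss(0.0, 1.0) > 3.0 else 0
        rows.append((y0, y1, d))
    return rows


def decompose(rows):
    """Return (naive difference, effect on the treated, selection bias)."""
    treated = [r for r in rows if r[2] == 1]
    control = [r for r in rows if r[2] == 0]
    # Observed outcomes: Y_1i for the treated, Y_0i for the controls.
    naive = mean(y1 for _, y1, _ in treated) - mean(y0 for y0, _, _ in control)
    att = mean(y1 - y0 for y0, y1, _ in treated)
    selection = (mean(y0 for y0, _, _ in treated)
                 - mean(y0 for y0, _, _ in control))
    return naive, att, selection


naive, att, selection = decompose(simulate())
print(f"naive difference = {naive:.3f}")
print(f"effect on treated = {att:.3f}")
print(f"selection bias    = {selection:.3f}")
```

Under these assumptions the effect on the treated is negative (hospitalization improves health on this scale), yet the naive difference can come out positive because the selection-bias term is positive and larger in magnitude. The first number always equals the sum of the other two, up to floating-point error.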
The term
$$
E[Y_{1i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 1] = E[Y_{1i} - Y_{0i} \mid D_i = 1]
$$
is the average causal effect of hospitalization on those who were hospitalized. This term captures the average difference between the health of the hospitalized, $E[Y_{1i} \mid D_i = 1]$, and what would have happened to them had they not been hospitalized, $E[Y_{0i} \mid D_i = 1]$. The observed difference in health status, however, adds to this causal effect a term called selection bias. This term is the difference in average $Y_{0i}$ between those who