Mostly Harmless Econometrics: An Empiricist’s Companion

The Selection Problem

We take a brief time-out for a more formal discussion of the role experiments play in uncovering causal effects. Suppose you are interested in a causal “if-then” question. To be concrete, consider a simple example: Do hospitals make people healthier? For our purposes, this question is allegorical, but it is surprisingly close to the sort of causal question health economists care about. To make this question more realistic, imagine we’re studying a poor elderly population that uses hospital emergency rooms for primary care. Some of these patients are admitted to the hospital. This sort of care is expensive, crowds hospital facilities, and is, perhaps, not very effective (see, e.g., Grumbach, Keane, and Bindman, 1993). In fact, exposure to other sick patients by those who are themselves vulnerable might have a net negative impact on their health.

Since those admitted to the hospital get many valuable services, the answer to the hospital-effectiveness question still seems likely to be “yes”. But will the data back this up? The natural approach for an empirically minded person is to compare the health status of those who have been to the hospital with the health of those who have not. The National Health Interview Survey (NHIS) contains the information needed to make this comparison. Specifically, it includes the question “During the past 12 months, was the respondent a patient in a hospital overnight?”, which we can use to identify recent hospital visitors. The NHIS also asks “Would you say your health in general is excellent, very good, good, fair, poor?” The following table displays the mean health status (assigning a 1 to excellent health and a 5 to poor health) among those who have been hospitalized and those who have not (tabulated from the 2005 NHIS):

Group	Sample Size	Mean Health Status	Std. Error
Hospital	7774	2.79	0.014
No Hospital	90049	2.07	0.003

The difference in the means is 0.71, a large and highly significant contrast in favor of the non-hospitalized, with a t-statistic of 58.9.

Taken at face value, this result suggests that going to the hospital makes people sicker. It’s not impossible this is the right answer: hospitals are full of other sick people who might infect us, and dangerous machines and chemicals that might hurt us. Still, it’s easy to see why this comparison should not be taken at face value: people who go to the hospital are probably less healthy to begin with. Moreover, even after hospitalization people who have sought medical care are not as healthy, on average, as those who never get hospitalized in the first place, though they may well be better than they otherwise would have been.

To describe this problem more precisely, think about hospital treatment as described by a binary random variable, $D_i \in \{0, 1\}$. The outcome of interest, a measure of health status, is denoted by $Y_i$. The question is whether $Y_i$ is affected by hospital care. To address this question, we assume we can imagine what might have happened to someone who went to the hospital if they had not gone and vice versa. Hence, for any individual there are two potential health variables:

$$
\text{potential outcome} =
\begin{cases}
Y_{1i} & \text{if } D_i = 1 \\
Y_{0i} & \text{if } D_i = 0
\end{cases}
$$

In other words, $Y_{0i}$ is the health status of an individual had he not gone to the hospital, irrespective of whether he actually went, while $Y_{1i}$ is the individual’s health status if he goes. We would like to know the difference between $Y_{1i}$ and $Y_{0i}$, which can be said to be the causal effect of going to the hospital for individual $i$. This is what we would measure if we could go back in time and change a person’s treatment status.[4]

The observed outcome, Yi, can be written in terms of potential outcomes as

$$
Y_i =
\begin{cases}
Y_{1i} & \text{if } D_i = 1 \\
Y_{0i} & \text{if } D_i = 0
\end{cases}
= Y_{0i} + (Y_{1i} - Y_{0i}) D_i. \tag{2.1.1}
$$

This notation is useful because $Y_{1i} - Y_{0i}$ is the causal effect of hospitalization for an individual. In general, there is likely to be a distribution of both $Y_{1i}$ and $Y_{0i}$ in the population, so the treatment effect can be different for different people. But because we never see both potential outcomes for any one person, we must learn about the effects of hospitalization by comparing the average health of those who were and were not hospitalized.
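The way the observed outcome picks out one potential outcome per person can be illustrated with a small numerical sketch. The five values below are invented purely for illustration (they are not NHIS data) and use the table's coding, where 1 is excellent health and 5 is poor health, so a beneficial effect shows up as a negative number.

```python
import numpy as np

# Hypothetical potential outcomes for five people (1 = excellent, 5 = poor).
# In real data only one of y0 / y1 is ever observed for each person.
y0 = np.array([2, 3, 4, 4, 5])  # health if NOT hospitalized
y1 = np.array([2, 2, 3, 4, 4])  # health if hospitalized
d = np.array([0, 0, 1, 1, 1])   # actual hospitalization status

# Observed outcome, equation (2.1.1): Y_i = Y_0i + (Y_1i - Y_0i) * D_i
y_obs = y0 + (y1 - y0) * d

# Individual causal effects, which differ across people
individual_effect = y1 - y0

print(y_obs.tolist())              # [2, 3, 3, 4, 4]
print(individual_effect.tolist())  # [0, -1, -1, 0, -1]
```

The array `y_obs` is all an analyst would have; `individual_effect` is computable here only because both potential outcomes were made up.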

A naive comparison of averages by hospitalization status tells us something about potential outcomes, though not necessarily what we want to know. The comparison of average health conditional on hospitalization status is formally linked to the average causal effect by the equation below:

$$
\underbrace{E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]}_{\text{observed difference in average health}}
= \underbrace{E[Y_{1i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 1]}_{\text{average treatment effect on the treated}}
+ \underbrace{E[Y_{0i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 0]}_{\text{selection bias}}
$$
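The step behind this decomposition is short enough to spell out. Since the observed outcome equals $Y_{1i}$ for the hospitalized and $Y_{0i}$ for everyone else, the two observed group means are

$$
E[Y_i \mid D_i = 1] = E[Y_{1i} \mid D_i = 1],
\qquad
E[Y_i \mid D_i = 0] = E[Y_{0i} \mid D_i = 0].
$$

Adding and subtracting the unobserved counterfactual mean $E[Y_{0i} \mid D_i = 1]$ in the difference of these two quantities splits it into the two labeled terms.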

The term

$$
E[Y_{1i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 1] = E[Y_{1i} - Y_{0i} \mid D_i = 1]
$$

is the average causal effect of hospitalization on those who were hospitalized. This term captures the average difference between the health of the hospitalized, $E[Y_{1i} \mid D_i = 1]$, and what would have happened to them had they not been hospitalized, $E[Y_{0i} \mid D_i = 1]$. The observed difference in health status, however, adds to this causal effect a term called selection bias. This term is the difference in average $Y_{0i}$ between those who were and were not hospitalized. Because the sick are more likely than the healthy to seek treatment, those who were hospitalized have worse $Y_{0i}$’s, making selection bias negative in this example, in the sense that it works against finding a health benefit (on the 1-to-5 scale above, where larger numbers mean worse health, the bias term is numerically positive). The selection bias may be so large (in absolute value) that it completely masks a positive treatment effect. The goal of most empirical economic research is to overcome selection bias, and therefore to say something about the causal effect of a variable like $D_i$.
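A small simulation can make the masking concrete. The sketch below is illustrative only: the data-generating process, the `frailty` variable, the constant effect of 0.3 points, and the admission rule are all assumptions invented for this example, not estimates from the NHIS. Health is coded as in the table (higher numbers mean worse health), so a beneficial hospital effect is negative and selection bias appears as a positive number.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Assumed latent frailty: higher values mean a sicker person.
frailty = rng.normal(size=n)

# Potential outcomes on a "higher = worse health" scale.
y0 = 2.0 + 0.8 * frailty + rng.normal(scale=0.5, size=n)  # not hospitalized
y1 = y0 - 0.3  # assumption: hospitalization improves health by 0.3 for everyone

# Assumed admission rule: sicker people are more likely to be hospitalized.
d = (frailty + rng.normal(size=n) > 1.0).astype(int)

# Observed outcome via the switching equation Y = Y0 + (Y1 - Y0) * D
y = y0 + (y1 - y0) * d

naive = y[d == 1].mean() - y[d == 0].mean()            # observed difference
att = (y1 - y0)[d == 1].mean()                          # effect on the treated
selection_bias = y0[d == 1].mean() - y0[d == 0].mean()  # baseline gap

print(f"naive difference: {naive:.3f}")
print(f"ATT:              {att:.3f}")
print(f"selection bias:   {selection_bias:.3f}")
print(f"ATT + bias:       {att + selection_bias:.3f}")
```

Under these assumptions the naive comparison makes the hospitalized look sicker even though every simulated patient benefits from treatment, and the naive difference equals the sum of the other two terms up to floating-point error.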
