Choice-Based Sampling

9.5.1 Introduction

Consider the multinomial QR model (9.3.1) or its special case (9.3.4). Up until now we have specified only the conditional probabilities of alternatives j = 0, 1, ..., m given a vector of exogenous or independent variables x and have based our statistical inference on the conditional probabilities. Thus we have been justified in treating x as a vector of known constants just as in the classical linear regression model of Chapter 1. We shall now treat both j and x as random variables and consider different sampling schemes that specify how j and x are sampled.

First, we shall list a few basic symbols frequently used in the subsequent discussion:

$P(j \mid x, \beta)$ or $P(j)$: Conditional probability that the jth alternative is chosen, given the exogenous variables x

$P(j \mid x, \beta_0)$ or $P_0(j)$: The above evaluated at the true value of $\beta$

$f(x)$: True density of x

$g(x)$: Density according to which a researcher draws x

$H(j)$: Probability according to which a researcher draws j

$Q(j) = Q(j \mid \beta) = \int P(j \mid x, \beta) f(x)\, dx$

$Q_0(j) = Q(j \mid \beta_0) = \int P(j \mid x, \beta_0) f(x)\, dx$
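As a rough numerical illustration of the last two definitions, the sketch below approximates $Q(j \mid \beta)$ by the trapezoid rule. Everything specific in it is an assumption made for the example and not part of the text: a binary logit for $P(j \mid x, \beta)$ with scalar x, a standard normal $f(x)$, the parameter value `beta0`, and the hypothetical names `choice_prob`, `density_f`, and `marginal_choice_prob`.

```python
import math

def choice_prob(j, x, beta):
    """P(j | x, beta) for an assumed binary logit with intercept beta[0] and slope beta[1]."""
    p1 = 1.0 / (1.0 + math.exp(-(beta[0] + beta[1] * x)))
    return p1 if j == 1 else 1.0 - p1

def density_f(x):
    """Assumed true density f(x): standard normal."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def marginal_choice_prob(j, beta, lo=-8.0, hi=8.0, steps=4000):
    """Approximate Q(j | beta) = integral of P(j | x, beta) f(x) dx by the trapezoid rule."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        x = lo + k * h
        weight = 0.5 if k in (0, steps) else 1.0  # half weight at the two endpoints
        total += weight * choice_prob(j, x, beta) * density_f(x)
    return total * h

beta0 = (-2.0, 1.0)  # hypothetical "true" parameter value
q0 = {j: marginal_choice_prob(j, beta0) for j in (0, 1)}  # Q_0(j) for j = 0, 1
print(q0)
```

With a negative intercept, alternative 1 is the rarer choice, so `q0[1]` falls below one half, and the two marginal probabilities sum to one up to quadrature error.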

Let i = 1, 2, ..., n be the individuals sampled according to some scheme. Then we can denote the alternative and the vector of the exogenous variables observed for the ith individual by $j_i$ and $x_i$, respectively.

We consider two types of sampling schemes called exogenous sampling and endogenous sampling (or choice-based sampling in the QR model). The first refers to sampling on the basis of exogenous variables, and the second refers to sampling on the basis of endogenous variables. The different sampling schemes are characterized by their likelihood functions. The likelihood function associated with exogenous sampling is given by

$$L_e = \prod_{i=1}^{n} P(j_i \mid x_i, \beta)\, g(x_i). \tag{9.5.1}$$

The likelihood function associated with choice-based sampling is given by

$$L_c = \prod_{i=1}^{n} P(j_i \mid x_i, \beta)\, f(x_i)\, Q(j_i \mid \beta)^{-1} H(j_i) \tag{9.5.2}$$

if $Q(j \mid \beta_0)$ is unknown and by

$$L_{c0} = \prod_{i=1}^{n} P(j_i \mid x_i, \beta)\, f(x_i)\, Q(j_i \mid \beta_0)^{-1} H(j_i) \tag{9.5.3}$$

if $Q(j \mid \beta_0)$ is known. Note that if $g(x) = f(x)$ and $H(j) = Q(j \mid \beta_0)$, (9.5.1) and (9.5.3) both become

$$L_R = \prod_{i=1}^{n} P(j_i \mid x_i, \beta)\, f(x_i), \tag{9.5.4}$$

which is the standard likelihood function associated with random sampling. This is precisely the likelihood function considered in the previous sections.
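The reduction to the random sampling likelihood can be checked numerically. The following is a minimal sketch under stated assumptions, none of which come from the text: a binary logit for $P(j \mid x, \beta)$, an x that takes only three values so that $f$ is a probability mass function and the integral defining $Q$ becomes a sum, and a small made-up sample. All function names are hypothetical.

```python
import math

X_SUPPORT = (-1.0, 0.0, 1.0)
F = {-1.0: 0.3, 0.0: 0.4, 1.0: 0.3}  # assumed f(x); a pmf here, so integrals become sums

def choice_prob(j, x, beta):
    """P(j | x, beta) for an assumed binary logit."""
    p1 = 1.0 / (1.0 + math.exp(-(beta[0] + beta[1] * x)))
    return p1 if j == 1 else 1.0 - p1

def marginal_choice_prob(j, beta):
    """Q(j | beta) = sum over x of P(j | x, beta) f(x)."""
    return sum(choice_prob(j, x, beta) * F[x] for x in X_SUPPORT)

def log_le(sample, beta, g):
    """Log of (9.5.1): exogenous sampling with design density g."""
    return sum(math.log(choice_prob(j, x, beta)) + math.log(g[x]) for j, x in sample)

def log_lc(sample, beta, H):
    """Log of (9.5.2): Q is evaluated at the trial value of beta."""
    return sum(math.log(choice_prob(j, x, beta)) + math.log(F[x])
               - math.log(marginal_choice_prob(j, beta)) + math.log(H[j])
               for j, x in sample)

def log_lc0(sample, beta, H, Q0):
    """Log of (9.5.3): Q is held fixed at its known true value Q0."""
    return sum(math.log(choice_prob(j, x, beta)) + math.log(F[x])
               - math.log(Q0[j]) + math.log(H[j]) for j, x in sample)

def log_lr(sample, beta):
    """Log of the random sampling likelihood: product of P(j_i | x_i, beta) f(x_i)."""
    return sum(math.log(choice_prob(j, x, beta)) + math.log(F[x]) for j, x in sample)

beta0 = (-1.0, 0.8)                      # hypothetical true value
Q0 = {j: marginal_choice_prob(j, beta0) for j in (0, 1)}
sample = [(0, -1.0), (1, 1.0), (0, 0.0), (1, 0.0), (0, 1.0)]  # made-up (j_i, x_i) pairs
beta_trial = (-0.5, 0.3)                 # any trial value of beta

lr = log_lr(sample, beta_trial)
le = log_le(sample, beta_trial, g=F)               # g = f
lc0 = log_lc0(sample, beta_trial, H=Q0, Q0=Q0)     # H = Q_0
lc = log_lc(sample, beta_trial, H=Q0)
print(lr, le, lc0, lc)
```

In this sketch `le` and `lc0` agree with `lr` at every trial value, whereas `lc` generally does not, because (9.5.2) evaluates $Q$ at the trial $\beta$ instead of at $\beta_0$.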

Although (9.5.2) may seem unfamiliar, it can be explained easily as follows. Consider drawing random variables j and x in the following order. We can first draw j with probability $H(j)$ and then, given j, we can draw x according to the conditional density $f(x \mid j)$. Thus the joint probability is $f(x \mid j) H(j)$, which by Bayes's rule is equal to $P(j \mid x) f(x) Q(j)^{-1} H(j)$.
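The two-step draw described above can be simulated directly. The sketch below is only an illustration under assumptions that are not in the text: a binary logit for $P(j \mid x)$, a standard normal $f(x)$, and design probabilities $H(0) = H(1) = 0.5$. It obtains a draw from $f(x \mid j)$ by rejection: draw x from $f$, simulate a choice from $P(\cdot \mid x)$, and keep x only when the simulated choice equals j, so accepted draws have density $P(j \mid x) f(x) / Q(j)$. The function names are hypothetical.

```python
import random
import math

def choice_prob(j, x, beta):
    """P(j | x, beta) for an assumed binary logit."""
    p1 = 1.0 / (1.0 + math.exp(-(beta[0] + beta[1] * x)))
    return p1 if j == 1 else 1.0 - p1

def draw_x_given_j(j, beta, rng):
    """Draw x from f(x | j) by rejection sampling from the joint law of (x, choice)."""
    while True:
        x = rng.gauss(0.0, 1.0)  # assumed f(x): standard normal
        simulated_choice = 1 if rng.random() < choice_prob(1, x, beta) else 0
        if simulated_choice == j:
            return x

def choice_based_sample(n, H, beta, rng):
    """Draw n pairs (j, x): first j with probability H(j), then x from f(x | j)."""
    sample = []
    for _ in range(n):
        j = 1 if rng.random() < H[1] else 0
        sample.append((j, draw_x_given_j(j, beta, rng)))
    return sample

rng = random.Random(0)          # fixed seed so the run is reproducible
beta0 = (-2.0, 1.0)             # hypothetical true value; alternative 1 is rare
H = {0: 0.5, 1: 0.5}            # researcher oversamples the rare alternative
sample = choice_based_sample(4000, H, beta0, rng)

share_1 = sum(j for j, _ in sample) / len(sample)
mean_x = {
    j: sum(x for jj, x in sample if jj == j) / sum(1 for jj, _ in sample if jj == j)
    for j in (0, 1)
}
print(share_1, mean_x)
```

The share of alternative 1 in the simulated sample tracks $H(1)$ and not the population share $Q_0(1)$, which is the sense in which the researcher controls $H(j)$.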

This sampling scheme is different from a scheme in which the proportion of people choosing alternative j is a priori determined and fixed. This latter scheme may be a more realistic one. (Hsieh, Manski, and McFadden, 1983, have discussed this sampling scheme.) However, we shall adopt the definition of the preceding paragraphs (following Manski and Lerman, 1977) because in this way choice-based sampling contains random sampling as a special case [$Q(j) = H(j)$] and because the two definitions lead to the same estimators with the same asymptotic properties.

Choice-based sampling is useful in a situation where exogenous sampling or random sampling would find only a very small number of people choosing a particular alternative. For example, suppose a small proportion of residents in a particular city use the bus for going to work. Then, to ensure that a certain number of bus riders are included in the sample, it would be much less expensive to interview people at a bus depot than to conduct a random survey of homes. Thus it is expected that random sampling augmented with choice-based sampling of rare alternatives would maximize the efficiency of estimation within the budgetary constraints of a researcher. Such augmentation can be analyzed in the framework of generalized choice-based sampling proposed by Cosslett (1981a) (to be discussed in Section 9.5.4).
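The arithmetic behind the bus example can be sketched as follows. The 5 percent bus share and the target of 200 riders are hypothetical figures chosen for illustration, and the function names are not from the text. The sketch computes the size of a random sample whose expected number of bus riders equals the target, and the binomial probability that a random sample of that size actually contains at least that many riders.

```python
import math

def expected_random_sample_size(target_count, share):
    """Random-sample size whose expected number of rare-alternative choosers is target_count."""
    return target_count / share

def prob_at_least(target_count, n, share):
    """P(at least target_count choosers in a random sample of n), binomial, via log-space pmf."""
    if target_count > n:
        return 0.0
    log_p, log_q = math.log(share), math.log1p(-share)
    below = 0.0
    for k in range(target_count):  # accumulate P(X = k) for k < target_count
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * log_p + (n - k) * log_q)
        below += math.exp(log_pmf)
    return 1.0 - below

bus_share = 0.05      # hypothetical population share of bus riders
target_riders = 200   # hypothetical number of riders wanted in the sample

n_needed = expected_random_sample_size(target_riders, bus_share)
p_enough = prob_at_least(target_riders, 4000, bus_share)
print(n_needed, p_enough)
```

Under these made-up numbers, a random survey needs about 4,000 interviews to expect 200 bus riders and still meets the target only about half the time, whereas 200 interviews at a bus depot meet it by construction.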

In the subsequent subsections we shall discuss four articles: Manski and Lerman (1977), Cosslett (1981a), Cosslett (1981b), and Manski and McFadden (1981). These articles together cover the four different types of models, varying according to whether f is known and whether Q is known, and cover five estimators of $\beta$: the exogenous sampling maximum likelihood estimator (ESMLE), the random sampling maximum likelihood estimator (RSMLE), the choice-based sampling maximum likelihood estimator (CBMLE), the Manski-Lerman weighted maximum likelihood estimator (WMLE), and the Manski-McFadden estimator (MME).

A comparison of RSMLE and CBMLE is important because within the framework of choice-based sampling a researcher controls $H(j)$, and the particular choice $H(j) = Q_0(j)$ yields random sampling. The choice of $H(j)$ is an important problem of sample design and, as we shall see later, $H(j) = Q_0(j)$ is not necessarily an optimal choice.

Table 9.6 indicates how the definitions of RSMLE and CBMLE vary with the four types of model; it also indicates in which article each case is discussed. Note that RSMLE = CBMLE if Q is known. ESMLE, which is not listed in Table 9.6, is the same as RSMLE except when f is unknown and Q is known. In that case ESMLE maximizes $L_e$ with respect to $\beta$ without constraints. RSMLE and CBMLE for the case of known Q will be referred to as the constrained RSMLE and the constrained CBMLE, respectively. For the case of unknown Q, we shall attach unconstrained to each estimator.

Table 9.6 Models, estimators, and cross references

f	Q	RSMLE	CBMLE	WMLE	MME
Known	Known	Max. $L_R$ wrt $\beta$ subject to $Q_0 = \int P f\, dx$	MM: Max. $L_c$ wrt $\beta$ subject to $Q_0 = \int P f\, dx$	MAL	
Known	Unknown	Max. $L_R$ wrt $\beta$	MM: Max. $L_c$ wrt $\beta$	-	-
Unknown	Known	Max. $L_R$ wrt $\beta$ and f subject to $Q_0 = \int P f\, dx$	C2 (see also Cosslett, 1978): Max. $L_{c0}$ wrt $\beta$ and f subject to $Q_0 = \int P f\, dx$	MAL	MM
Unknown	Unknown	Max. $L_R$ wrt $\beta$	C1 (also proves asymptotic efficiency): Max. $L_c$ wrt $\beta$ and f		

Note: RSMLE = random sampling maximum likelihood estimator; CBMLE = choice-based sampling maximum likelihood estimator; WMLE = Manski-Lerman weighted maximum likelihood estimator; MME = Manski-McFadden estimator. $L_R$ is the random sampling likelihood function.

MM = Manski and McFadden (1981); MAL = Manski and Lerman (1977); C2 = Cosslett (1981b); C1 = Cosslett (1981a).

Advanced Econometrics Takeshi Amemiya
