Advanced Econometrics Takeshi Amemiya
Multivariate Probit Model
A multivariate probit model was first proposed by Ashford and Sowden (1970) and applied to a bivariate set of data. They supposed that a coal miner develops breathlessness ($y_1 = 1$) if his tolerance ($y_1^*$) is less than 0. Assuming that $y_1^* \sim N(-\beta_1' x, 1)$, where $x = (1, \text{Age})'$, we have
$P(y_1 = 1) = \Phi(\beta_1' x).$ (9.4.15)
They also supposed that a coal miner develops wheeze ($y_2 = 1$) if his tolerance ($y_2^*$) against wheeze is less than 0 and that $y_2^* \sim N(-\beta_2' x, 1)$. Then we have
$P(y_2 = 1) = \Phi(\beta_2' x).$ (9.4.16)
Now that we have specified the marginal probabilities of $y_1$ and $y_2$, the multivariate model is completed by specifying the joint probability $P(y_1 = 1, y_2 = 1)$, which in turn is determined if the joint distribution of $y_1^*$ and $y_2^*$ is specified. Ashford and Sowden assumed that $y_1^*$ and $y_2^*$ are jointly normal with a correlation coefficient $\rho$. Thus
$P(y_1 = 1, y_2 = 1) = F_\rho(\beta_1' x, \beta_2' x),$ (9.4.17)
where $F_\rho$ denotes the bivariate normal distribution function with zero means, unit variances, and correlation $\rho$.
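As a rough numerical illustration of (9.4.17), the joint probability P(y_1 = 1, y_2 = 1) is the standard bivariate normal distribution function with correlation rho evaluated at (beta_1'x, beta_2'x). The sketch below evaluates it with the one-dimensional representation F_rho(a, b) = integral from -infinity to a of phi(z) Phi((b - rho z)/sqrt(1 - rho^2)) dz and Simpson's rule. The function names, the truncation point of the integral, and the coefficient values in the usage lines are all assumptions made for illustration; they are not Ashford and Sowden's estimates or their computational method.

```python
import math


def norm_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bvn_cdf(a, b, rho, n=2000, lower=-8.0):
    """Approximate F_rho(a, b) for zero means, unit variances, correlation rho.

    Integrates phi(z) * Phi((b - rho*z) / sqrt(1 - rho^2)) over [lower, a]
    with Simpson's rule (n must be even). The cutoff `lower` is an
    illustrative choice; the mass below -8 is negligible.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    if n % 2:
        raise ValueError("n must be even for Simpson's rule")
    if a <= lower:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    h = (a - lower) / n
    total = 0.0
    for i in range(n + 1):
        z = lower + i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * norm_pdf(z) * norm_cdf((b - rho * z) / s)
    return total * h / 3.0


def joint_prob(beta1, beta2, x, rho):
    """P(y1 = 1, y2 = 1) = F_rho(beta1'x, beta2'x), as in (9.4.17)."""
    a = sum(c * v for c, v in zip(beta1, x))
    b = sum(c * v for c, v in zip(beta2, x))
    return bvn_cdf(a, b, rho)


if __name__ == "__main__":
    # Hypothetical coefficients on x = (1, Age)'; not estimates from any data.
    x = (1.0, 50.0)
    print(joint_prob((-4.0, 0.07), (-3.0, 0.05), x, rho=0.6))
```

With rho = 0 the function reduces to the product of the two marginal probit probabilities, which gives a quick sanity check on any implementation of this kind.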
The parameters $\beta_1$, $\beta_2$, and $\rho$ can be estimated by MLE or MIN $\chi^2$ (if there are many observations per cell for the latter method). Amemiya (1972) has given the MIN $\chi^2$ estimation of this model.
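For MLE it may help to write out all four cell probabilities implied by the two marginal probabilities and the joint probability. The following is a sketch in notation introduced here for illustration only: $P_{jk}$ for $P(y_1 = j, y_2 = k)$ and the observation index $i$ are assumptions, not the text's notation.

$$
\begin{aligned}
P_{11} &= F_\rho(\beta_1' x, \beta_2' x), \\
P_{10} &= \Phi(\beta_1' x) - P_{11}, \\
P_{01} &= \Phi(\beta_2' x) - P_{11}, \\
P_{00} &= 1 - \Phi(\beta_1' x) - \Phi(\beta_2' x) + P_{11}, \\
\log L &= \sum_{i} \sum_{j=0}^{1} \sum_{k=0}^{1} 1(y_{1i} = j,\ y_{2i} = k)\, \log P_{jk}(x_i).
\end{aligned}
$$

The four probabilities sum to one by construction, and $\log L$ is then maximized over $\beta_1$, $\beta_2$, and $\rho$.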
Muthen (1979) estimated the following sociological model, which is equivalent to a bivariate probit model:
$y_1 = 1$ if $u_1 < \alpha_1 + \beta_1 \eta$ (9.4.18)
$y_2 = 1$ if $u_2 < \alpha_2 + \beta_2 \eta$
$\eta = x'\gamma + v$
$u_1, u_2 \sim N(0, 1)$, $v \sim N(0, \sigma^2)$.
Here $y_1$ and $y_2$ represent responses of parents to questions about their attitudes toward their children, $\eta$ is a latent (unobserved) variable signifying parents' sociological philosophy, and $x$ is a vector of observable characteristics of parents. Muthen generalized this model to a multivariate model in which $y$, $x$, $\eta$, $u$, $v$, and $\alpha$ are all vectors following
$y = \mathbf{1}$ if $u < \alpha + \Lambda \eta$ (9.4.19)
$B\eta = \Gamma x + v$
$u \sim N(0, I)$, $v \sim N(0, \Sigma)$,
where $\mathbf{1}$ is a vector of ones, and discussed the problem of identification. In addition, Lee (1982a) has applied a model like (9.4.19) to a study of the relationship between health and wages.
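The equivalence of (9.4.18) to a bivariate probit model can be seen by substituting the equation for the latent variable into the two threshold conditions. The derivation below is a sketch that assumes $u_1$, $u_2$, and $v$ are mutually independent (an assumption made here for the calculation), with the model written as $y_i = 1$ if $u_i < \alpha_i + \beta_i \eta$, $\eta = x'\gamma + v$, $u_i \sim N(0, 1)$, and $v \sim N(0, \sigma^2)$. The composite error $w_i$ is notation introduced for this sketch.

$$
\begin{aligned}
y_i = 1 \ &\text{if}\ \ w_i \equiv u_i - \beta_i v < \alpha_i + \beta_i x'\gamma, \qquad i = 1, 2, \\
\operatorname{Var}(w_i) &= 1 + \beta_i^2 \sigma^2, \qquad \operatorname{Cov}(w_1, w_2) = \beta_1 \beta_2 \sigma^2, \\
P(y_i = 1) &= \Phi\!\left( \frac{\alpha_i + \beta_i x'\gamma}{\sqrt{1 + \beta_i^2 \sigma^2}} \right), \\
\rho &= \frac{\beta_1 \beta_2 \sigma^2}{\sqrt{(1 + \beta_1^2 \sigma^2)(1 + \beta_2^2 \sigma^2)}}.
\end{aligned}
$$

Under these assumptions the marginals are probit in $x$ and the joint probability is a bivariate normal distribution function with correlation $\rho$, which is the structure of (9.4.17). The rescaling by $\sqrt{1 + \beta_i^2 \sigma^2}$ also suggests why identification needs separate discussion: only certain combinations of the structural parameters enter the reduced form.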
It is instructive to note a fundamental difference between the multivariate probit model and the multivariate logit or log-linear models discussed earlier: in the multivariate probit model the marginal probabilities are first specified and then a joint probability consistent with the given marginal probabilities is found, whereas in the multivariate logit and log-linear models the joint probabilities or conditional probabilities are specified at the outset. The consequence of these different methods of specification is that marginal probabilities have a simple form (probit) in the multivariate probit model and the conditional probabilities have a simple form (logit) in the multivariate logit and log-linear models.
Because of this fundamental difference between a multivariate probit model and a multivariate logit model, it is an important practical problem for a researcher to compare the two types of models using some criterion of goodness of fit. Morimune (1979) compared the Ashford-Sowden bivariate probit model with the Nerlove-Press log-linear model empirically in a model in which the two binary dependent variables represent home ownership ($y_1$) and whether or not the house has more than five rooms ($y_2$). As criteria for comparison, Morimune used Cox's test (Section 4.5.3) and his own modification of it. He concluded that probit was preferred to logit by either test.
It is interesting to ask whether we could specify an Ashford-Sowden type bivariate logit model by assuming the logistic distribution for $y_1^*$ and $y_2^*$ in the Ashford-Sowden model. Although there is no "natural" bivariate logistic distribution the marginal distributions of which are logistic (unlike the normal case), Lee (1982b) found that Plackett's bivariate logistic distribution function (Plackett, 1965) yielded results similar to a bivariate probit model when applied to Ashford and Sowden's data and Morimune's data. Furthermore, it was computationally simpler.
Given two marginal distribution functions $F(x)$ and $G(y)$, Plackett's class of bivariate distributions $H(x, y)$ is defined by
$\psi = \dfrac{H(1 - F - G + H)}{(F - H)(G - H)},$
for any fixed $\psi$ in $(0, \infty)$.
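Because the defining relation psi = H(1 - F - G + H) / ((F - H)(G - H)) is quadratic in H, the joint distribution function can be written in closed form, which is one way to see why this class is computationally convenient. Rearranging gives (psi - 1)H^2 - [1 + (psi - 1)(F + G)]H + psi F G = 0, and the root that is a valid distribution function is the one with the minus sign in front of the square root; psi = 1 gives independence, H = FG. The sketch below codes this solution with logistic marginals. The function names and the parameter values in the usage lines are hypothetical, and applying H to logistic marginals evaluated at beta_1'x and beta_2'x is an assumed analog of the Ashford-Sowden setup rather than a reproduction of Lee's (1982b) implementation.

```python
import math


def logistic_cdf(t):
    """Logistic distribution function 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + math.exp(-t))


def plackett_cdf(F, G, psi):
    """Solve Plackett's relation for H given marginal values F, G and psi > 0."""
    if psi <= 0.0:
        raise ValueError("psi must be positive")
    if not (0.0 <= F <= 1.0 and 0.0 <= G <= 1.0):
        raise ValueError("F and G must be probabilities")
    if abs(psi - 1.0) < 1e-12:
        return F * G  # psi = 1 corresponds to independence
    s = 1.0 + (psi - 1.0) * (F + G)
    disc = s * s - 4.0 * psi * (psi - 1.0) * F * G
    # The minus root stays within the bounds required of a joint probability.
    return (s - math.sqrt(disc)) / (2.0 * (psi - 1.0))


def cross_ratio(F, G, H):
    """Right-hand side of Plackett's defining relation; should return psi."""
    return H * (1.0 - F - G + H) / ((F - H) * (G - H))


if __name__ == "__main__":
    # Hypothetical index values standing in for beta_1'x and beta_2'x.
    F = logistic_cdf(-0.5)
    G = logistic_cdf(0.4)
    H = plackett_cdf(F, G, psi=2.5)
    print(H, cross_ratio(F, G, H))
```

Values of psi above 1 give H greater than FG (positive association) and values below 1 give H less than FG, so psi plays a role comparable to the correlation in the bivariate probit model.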
Unfortunately, this method does not easily generalize to a higher-order multivariate distribution, which is precisely where the logit analog of a multivariate probit model would be especially useful, given the computational burden of the probit model. Some progress in this direction has been made by Malik and Abraham (1973), who generalized Gumbel's bivariate logistic distribution (Gumbel, 1961) to a multivariate case.