A COMPANION TO Theoretical Econometrics
Modified count models
The leading motivation for modified count models is to solve the so-called problem of excess zeros, the presence of more zeros in the data than predicted by count models such as the Poisson.
The hurdle model or two-part model relaxes the assumption that the zeros and the positives come from the same data generating process. The zeros are determined by the density $f_1(\cdot)$, so that $\Pr[y = 0] = f_1(0)$. The positive counts come from the truncated density $f_2(y \mid y > 0) = f_2(y)/(1 - f_2(0))$, which is multiplied by $\Pr[y > 0] = 1 - f_1(0)$ to ensure that probabilities sum to unity. Thus
$$g(y) = \begin{cases} f_1(0) & \text{if } y = 0, \\ \dfrac{1 - f_1(0)}{1 - f_2(0)}\, f_2(y) & \text{if } y \ge 1. \end{cases}$$
This reduces to the standard model only if $f_1(\cdot) = f_2(\cdot)$. Thus in the modified model the two processes generating the zeros and the positives are not constrained to be the same. While the motivation for this model is to handle excess zeros, it is also capable of modeling too few zeros.
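As a minimal numerical sketch of this density, assume both parts are Poisson with no covariates. The function names and the parameter values 0.5, 2.0, and 3.0 are hypothetical and chosen only for illustration. The checks confirm that the probabilities sum to one, that the model collapses to the plain Poisson when the two densities coincide, and that it can generate either more or fewer zeros than the Poisson.

```python
import math

def poisson_pmf(y, mu):
    """Poisson probability mass at count y with mean mu."""
    return math.exp(-mu) * mu ** y / math.factorial(y)

def hurdle_pmf(y, mu1, mu2):
    """Hurdle Poisson mass: zeros from f1 (mean mu1), positives from truncated f2 (mean mu2)."""
    f1_zero = poisson_pmf(0, mu1)
    if y == 0:
        return f1_zero
    f2_zero = poisson_pmf(0, mu2)
    # Rescale the zero-truncated f2 by Pr[y > 0] = 1 - f1(0).
    return (1.0 - f1_zero) / (1.0 - f2_zero) * poisson_pmf(y, mu2)

# Probabilities sum to one (sum truncated at 100; the omitted tail is negligible here).
total = sum(hurdle_pmf(y, 0.5, 2.0) for y in range(100))

# With mu1 == mu2 the hurdle model reduces to the plain Poisson.
same = all(abs(hurdle_pmf(y, 2.0, 2.0) - poisson_pmf(y, 2.0)) < 1e-12
           for y in range(20))

# mu1 < mu2 gives more zeros than Poisson(mu2); mu1 > mu2 gives fewer.
excess_zeros = hurdle_pmf(0, 0.5, 2.0) > poisson_pmf(0, 2.0)
too_few_zeros = hurdle_pmf(0, 3.0, 2.0) < poisson_pmf(0, 2.0)
```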
Maximum likelihood estimation of the hurdle model involves separate maximization of the two terms in the likelihood, one corresponding to the zeros and the other to the positives. This is straightforward.
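The separability can be illustrated with a minimal sketch for an intercept-only hurdle Poisson, which is an assumption made here to keep the example short. The function name `fit_hurdle_poisson` and the sample `counts` are hypothetical. The zero part has a closed-form solution, and the positive part is solved from the first-order condition of the zero-truncated Poisson by bisection.

```python
import math

def fit_hurdle_poisson(counts):
    """Intercept-only hurdle Poisson, fitted by maximizing the two likelihood parts separately.

    Assumes the sample contains at least one zero and that the mean of the
    positive counts exceeds one.
    """
    n = len(counts)
    positives = [y for y in counts if y > 0]
    n_zero = n - len(positives)

    # Part 1 (zeros versus positives): Pr[y = 0] = f1(0) = exp(-mu1),
    # so the MLE matches the sample share of zeros.
    mu1 = -math.log(n_zero / n)

    # Part 2 (positives only): the first-order condition of the zero-truncated
    # Poisson is mean(positives) = mu2 / (1 - exp(-mu2)); solve it by bisection.
    target = sum(positives) / len(positives)
    lo, hi = 1e-8, max(positives) + 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid / (-math.expm1(-mid)) < target:
            lo = mid
        else:
            hi = mid
    mu2 = 0.5 * (lo + hi)
    return mu1, mu2

# Hypothetical sample, for illustration only.
counts = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2, 3, 4, 1, 2, 5, 3]
mu1_hat, mu2_hat = fit_hurdle_poisson(counts)
```

Neither step uses the other's parameter, which is why the two terms of the likelihood can be maximized independently. With covariates each part would instead require its own numerical optimization.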
A hurdle model can be interpreted as reflecting a two-stage decision-making process. For example, a patient may initiate the first visit to a doctor, but the second and subsequent visits may be determined by a different mechanism (Pohlmeier and Ulrich, 1995).
Regression applications use hurdle versions of the Poisson or negative binomial, obtained by specifying $f_1(\cdot)$ and $f_2(\cdot)$ to be the Poisson or negative binomial densities given earlier. The covariates in the hurdle part, which models the zero/one outcome, need not be the same as those in the truncated part, although in practice they often are. The hurdle model is widely used, and the hurdle negative binomial model is quite flexible. The drawbacks are that the model is not very parsimonious, since the number of parameters is typically doubled, and that parameter interpretation is not as easy as in the same model without a hurdle.
The conditional mean in the hurdle model is the product of the probability of a positive count and the conditional mean of the zero-truncated density, $E[y \mid x] = \Pr[y > 0 \mid x]\, E[y \mid y > 0, x]$. This generally differs from the conditional mean of the Poisson regression, so using a Poisson regression when the hurdle model is the correct specification misspecifies the conditional mean and leads to inconsistent estimates.
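A minimal sketch of this product form, again assuming a hurdle Poisson without covariates, is given below. The function name and the parameter values are hypothetical. It shows that the hurdle mean equals the Poisson mean only when the two parts share the same parameter.

```python
import math

def hurdle_poisson_mean(mu1, mu2):
    """E[y] = Pr[y > 0] * E[y | y > 0] for a hurdle Poisson."""
    prob_positive = 1.0 - math.exp(-mu1)            # 1 - f1(0)
    truncated_mean = mu2 / (1.0 - math.exp(-mu2))   # mean of the zero-truncated Poisson(mu2)
    return prob_positive * truncated_mean

# Different processes for zeros and positives: the mean is not mu2.
mean_hurdle = hurdle_poisson_mean(0.5, 2.0)

# Same process in both parts: the mean collapses to the Poisson mean mu2 = 2.
mean_same = hurdle_poisson_mean(2.0, 2.0)
```

When the two parameters differ, a model that imposes the Poisson mean fits a function the data do not follow, which is the source of the inconsistency noted above.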