Continuous mixture models
The negative binomial model can be obtained in many different ways. The following justification using a mixture distribution is one of the oldest and has wide appeal.
Suppose the distribution of a random count $y$ is Poisson, conditional on the parameter $\lambda$, so that $f(y \mid \lambda) = \exp(-\lambda)\lambda^{y}/y!$. Suppose now that the parameter $\lambda$ is random, rather than being a completely deterministic function of regressors $x$. In particular, let $\lambda = \mu\nu$, where $\mu$ is a deterministic function of $x$, for example $\exp(x'\beta)$, and $\nu > 0$ is iid with density $g(\nu \mid \alpha)$. This is an example of unobserved heterogeneity, as different observations may have different $\lambda$ (heterogeneity) but part of this difference is due to a random (unobserved) component $\nu$.
The marginal density of $y$, unconditional on the random parameter $\nu$ but conditional on the deterministic parameters $\mu$ and $\alpha$, is obtained by integrating out $\nu$. This yields

$$
h(y \mid \mu, \alpha) = \int f(y \mid \mu, \nu)\, g(\nu \mid \alpha)\, d\nu,
$$

where $g(\nu \mid \alpha)$ is called the mixing distribution and $\alpha$ denotes the unknown parameter of the mixing distribution. The integration defines an "average" distribution. For some specific choices of $f(\cdot)$ and $g(\cdot)$, the integral will have an analytical or closed-form solution.
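If $f(y \mid \lambda)$ is the Poisson density and $g(\nu)$, $\nu > 0$, is the gamma density with $E[\nu] = 1$ and $V[\nu] = \alpha$, we obtain the negative binomial density

$$
h(y \mid \mu, \alpha) = \frac{\Gamma(\alpha^{-1} + y)}{\Gamma(\alpha^{-1})\,\Gamma(y + 1)} \left(\frac{\alpha^{-1}}{\alpha^{-1} + \mu}\right)^{\alpha^{-1}} \left(\frac{\mu}{\mu + \alpha^{-1}}\right)^{y}, \qquad \alpha > 0, \tag{15.13}
$$

where $\Gamma(\cdot)$ denotes the gamma integral, which specializes to a factorial for an integer argument. Special cases of the negative binomial include the Poisson (the limiting case $\alpha \to 0$) and the geometric ($\alpha = 1$).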
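As a rough numerical check of the Poisson-gamma result, the following Python sketch integrates the Poisson density against a gamma mixing density with mean 1 and variance α using a simple midpoint rule, and compares the result with the closed-form negative binomial density (15.13). The function names, the integration grid, and the parameter values are illustrative assumptions, not part of the original derivation.

```python
import math

def nb2_pmf(y, mu, alpha):
    """Closed-form negative binomial density with mean mu and alpha > 0."""
    r = 1.0 / alpha
    log_p = (math.lgamma(r + y) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu)) + y * math.log(mu / (mu + r)))
    return math.exp(log_p)

def gamma_pdf(v, alpha):
    """Gamma density with shape 1/alpha and scale alpha: E[v] = 1, V[v] = alpha."""
    r = 1.0 / alpha
    return math.exp((r - 1) * math.log(v) - v / alpha
                    - math.lgamma(r) - r * math.log(alpha))

def poisson_pmf(y, lam):
    """Poisson density f(y | lam), computed on the log scale."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

def mixture_pmf(y, mu, alpha, upper=40.0, n=40_000):
    """Midpoint-rule approximation to the integral of f(y | mu*v) g(v | alpha) dv.

    Assumption: alpha < 1, so the gamma density is bounded near v = 0 and a
    plain midpoint rule on (0, upper) is adequate.
    """
    h = upper / n
    total = 0.0
    for i in range(n):
        v = (i + 0.5) * h
        total += poisson_pmf(y, mu * v) * gamma_pdf(v, alpha)
    return total * h

if __name__ == "__main__":
    for y in range(4):
        print(y, nb2_pmf(y, 3.0, 0.5), mixture_pmf(y, 3.0, 0.5))
```

For α = 1 the closed form reduces to a geometric density with success probability 1/(1 + μ), which gives a second simple check on `nb2_pmf`.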
The first two moments of the negative binomial distribution are
$$
\begin{aligned}
E[y \mid \mu, \alpha] &= \mu, \\
V[y \mid \mu, \alpha] &= \mu(1 + \alpha\mu).
\end{aligned} \tag{15.14}
$$
The variance therefore exceeds the mean, since $\alpha > 0$ and $\mu > 0$. Indeed it can be shown easily that overdispersion always arises if $y \mid \lambda$ is Poisson and the mixing is of the form $\lambda = \mu\nu$ where $E[\nu] = 1$. Note also that the overdispersion is of the form (15.10) discussed in Section 2.4.
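The overdispersion claim can be sketched with the laws of iterated expectations and total variance. The derivation below is a standard argument supplied for illustration; it assumes only that y is Poisson given the mixed parameter, that the random component ν has mean 1, and that its variance is finite.

```latex
\begin{aligned}
E[y \mid \mu] &= E_{\nu}\bigl[E[y \mid \mu, \nu]\bigr] = E_{\nu}[\mu\nu] = \mu, \\
V[y \mid \mu] &= E_{\nu}\bigl[V[y \mid \mu, \nu]\bigr] + V_{\nu}\bigl[E[y \mid \mu, \nu]\bigr] \\
              &= E_{\nu}[\mu\nu] + V_{\nu}[\mu\nu] \\
              &= \mu + \mu^{2}\, V[\nu] \;\ge\; \mu .
\end{aligned}
```

The inequality is strict whenever ν is not degenerate, and setting the variance of ν equal to α for gamma mixing reproduces the variance in (15.14).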
Two standard variants of the negative binomial are used in regression applications. Both variants specify $\mu_i = \exp(x_i'\beta)$. The most common variant lets $\alpha$ be a parameter to be estimated, in which case the conditional variance function, $\mu + \alpha\mu^{2}$ from (15.14), is quadratic in the mean. The loglikelihood is easily obtained from (15.13), and estimation is by maximum likelihood.
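To make the loglikelihood concrete, here is a minimal Python sketch that sums the log of the density (15.13) over observations with an exponential conditional mean. The function name `nb2_loglik` and the toy data are hypothetical. Maximizing the function over the regression coefficients and the dispersion parameter would require a numerical optimizer, which is not shown here.

```python
import math

def nb2_loglik(beta, alpha, y, X):
    """NB2 loglikelihood: sum of log h(y_i | mu_i, alpha), mu_i = exp(x_i'beta)."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    r = 1.0 / alpha
    total = 0.0
    for y_i, x_i in zip(y, X):
        # Conditional mean under the exponential specification.
        mu_i = math.exp(sum(b * x for b, x in zip(beta, x_i)))
        total += (math.lgamma(r + y_i) - math.lgamma(r) - math.lgamma(y_i + 1)
                  + r * math.log(r / (r + mu_i))
                  + y_i * math.log(mu_i / (mu_i + r)))
    return total

# Hypothetical toy data: an intercept and one regressor.
y = [0, 1, 3, 0, 7, 2]
X = [[1.0, 0.1], [1.0, 0.4], [1.0, 0.9], [1.0, 0.2], [1.0, 1.5], [1.0, 0.7]]
ll = nb2_loglik([0.2, 0.8], 0.5, y, X)
```

Working with `math.lgamma` on the log scale avoids overflow in the gamma functions for large counts.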
The other variant of the negative binomial model has a linear variance function, $V[y \mid \mu, \delta] = (1 + \delta)\mu$, obtained by replacing $\alpha$ by $\delta/\mu$ throughout (15.13). Estimation by ML is again straightforward. Sometimes this variant is called the negative binomial 1 (NB1) model, in contrast to the variant with a quadratic variance function, which has been called the negative binomial 2 (NB2) model (Cameron and Trivedi, 1998).
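The substitution that produces the linear-variance variant can be checked numerically. The sketch below evaluates the negative binomial density with its dispersion parameter replaced by the ratio of a constant to the mean, then computes the first two moments by direct summation. The names `nb1_pmf` and `moments`, the truncation point, and the parameter values are illustrative assumptions.

```python
import math

def nb1_pmf(y, mu, delta):
    """NB1 density: the NB density with alpha replaced by delta / mu."""
    r = mu / delta  # plays the role of 1/alpha in the NB2 formula
    log_p = (math.lgamma(r + y) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu)) + y * math.log(mu / (mu + r)))
    return math.exp(log_p)

def moments(pmf, y_max=2000):
    """Mean and variance by summing over y = 0, ..., y_max (truncated support)."""
    mean = sum(y * pmf(y) for y in range(y_max + 1))
    second = sum(y * y * pmf(y) for y in range(y_max + 1))
    return mean, second - mean ** 2

mu, delta = 2.0, 1.5
mean, var = moments(lambda y: nb1_pmf(y, mu, delta))
# The variance should be close to (1 + delta) * mu, i.e. linear in the mean.
```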
The negative binomial model with quadratic variance function has been found to be very useful in applied work. It is the standard cross-section model for counts, which are usually overdispersed, along with the quasi-MLE of Section 4.1.
For mixtures other than Poisson-gamma, such as those that instead use the lognormal or the inverse-Gaussian distribution as the mixing distribution, the marginal distribution cannot be expressed in closed form. Then one may have to use numerical quadrature or simulated maximum likelihood to estimate the model. These methods are entirely feasible with currently available computing power. If one is prepared to use simulation-based estimation methods (see Gourieroux and Monfort, 1997), the scope for using mixed-Poisson models of various types is very extensive.
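As an illustration of the simulation idea, the following Python sketch approximates a Poisson-lognormal density by averaging the Poisson density over random draws of the heterogeneity term, and builds a simulated loglikelihood from it. This is a minimal sketch under stated assumptions: the lognormal draws are normalized to have mean 1, the same draws are reused for every observation for simplicity, and all function names and parameter values are hypothetical.

```python
import math
import random

def poisson_pmf(y, lam):
    """Poisson density f(y | lam), computed on the log scale."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

def draw_heterogeneity(sigma, n_draws, seed=0):
    """Lognormal draws nu = exp(sigma*eps - sigma^2/2), eps ~ N(0, 1), so E[nu] = 1."""
    rng = random.Random(seed)
    return [math.exp(sigma * rng.gauss(0.0, 1.0) - 0.5 * sigma ** 2)
            for _ in range(n_draws)]

def simulated_pmf(y, mu, draws):
    """Monte Carlo average of f(y | mu * nu) over the draws of nu."""
    return sum(poisson_pmf(y, mu * nu) for nu in draws) / len(draws)

def simulated_loglik(beta, sigma, y, X, n_draws=500, seed=0):
    """Simulated loglikelihood with mu_i = exp(x_i'beta).

    Simplification: one fixed set of draws is shared by all observations.
    Keeping the seed fixed holds the draws constant across parameter values.
    """
    draws = draw_heterogeneity(sigma, n_draws, seed)
    total = 0.0
    for y_i, x_i in zip(y, X):
        mu_i = math.exp(sum(b * x for b, x in zip(beta, x_i)))
        total += math.log(simulated_pmf(y_i, mu_i, draws))
    return total

draws = draw_heterogeneity(0.5, 2000, seed=1)
probs = [simulated_pmf(y, 2.0, draws) for y in range(200)]
```

Because each Poisson density sums to one, the simulated probabilities also sum to one up to truncation of the support, whatever the draws happen to be.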