A COMPANION TO Theoretical Econometrics

Essentials of Count Data Regression

A. Colin Cameron and Pravin K. Trivedi

1 Introduction

In many economic contexts the dependent or response variable of interest (y) is a nonnegative integer or count, which we wish to explain or analyze in terms of a set of covariates (x). Unlike the classical regression model, the response variable is discrete with a distribution that places probability mass at nonnegative integer values only. Regression models for counts, like other limited or discrete dependent variable models such as the logit and probit, are nonlinear, with many properties and special features intimately connected to discreteness and nonlinearity.

Let us consider some examples from microeconometrics, beginning with samples of independent cross section observations. Fertility studies often model the number of live births over a specified age interval of the mother, with interest in analyzing its variation in terms of, say, mother's schooling, age, and household income (Winkelmann, 1995). Accident analysis studies model airline safety, for example, as measured by the number of accidents experienced by an airline over some period, and seek to determine its relationship to airline profitability and other measures of the financial health of the airline (Rose, 1990). Recreational demand studies seek to place a value on natural resources such as national forests by modeling the number of trips to a recreational site (Gurmu and Trivedi, 1996). Health demand studies model data on the number of times that individuals consume a health service, such as visits to a doctor or days in hospital in the past year (Cameron, Trivedi, Milne and Piggott, 1988), and estimate the impact of health status and health insurance.

Examples of count data regression based on time series and panel data are also available. A time series example is the annual number of bank failures over some period, which may be analyzed using explanatory variables such as bank profitability, corporate profitability, and bank borrowings from the Federal Reserve Bank (Davutyan, 1989). A panel data example that has attracted much attention in the industrial organization literature on the benefits of research and development expenditures is the number of patents received annually by firms (Hausman, Hall, and Griliches, 1984).

In some cases, such as number of births, the count is the variable of ultimate interest. In other cases, such as medical demand and results of research and development expenditure, the variable of ultimate interest is continuous, often expenditures or receipts measured in dollars, but the best data available are, instead, a count.

In all cases the data are concentrated on a few small discrete values, say 0, 1, and 2; skewed to the right, with a long upper tail; and intrinsically heteroskedastic with variance increasing with the mean. In many examples, such as number of births, virtually all the data are restricted to single digits, and the mean number of events is quite low. But in other cases, such as number of patents, the tail can be very long with, say, one-quarter of the sample being awarded no patents while one firm is awarded 400 patents.
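These features can be illustrated with a small simulation. The following is a minimal sketch, not taken from the chapter: it assumes Poisson-distributed counts (the model the chapter takes up in Section 2), and the function names `draw_poisson` and `summarize`, the sample size, the seed, and the chosen means are all hypothetical choices made for illustration. It reports the sample mean, the variance, the skewness, and the share of observations equal to 0, 1, or 2 for several values of the mean.

```python
import math
import random
from statistics import mean, pvariance


def draw_poisson(mu, rng):
    """Draw one Poisson(mu) count by multiplying uniforms (Knuth's method)."""
    threshold = math.exp(-mu)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def summarize(mu, n=20000, seed=0):
    """Simulate n counts with mean mu and return simple summary statistics."""
    rng = random.Random(seed)  # fixed seed so the illustration is reproducible
    ys = [draw_poisson(mu, rng) for _ in range(n)]
    m = mean(ys)
    v = pvariance(ys)
    # Skewness as the third standardized moment; positive means a long upper tail.
    skew = sum((y - m) ** 3 for y in ys) / n / v ** 1.5
    share_small = sum(y <= 2 for y in ys) / n
    return {"mean": m, "variance": v, "skewness": skew, "share_0_to_2": share_small}


if __name__ == "__main__":
    for mu in (0.5, 2.0, 8.0):  # illustrative means only
        stats = summarize(mu)
        print(mu, {name: round(value, 2) for name, value in stats.items()})
```

Under these assumptions the simulated variance rises with the mean, the skewness is positive, and for low means most of the mass sits on 0, 1, and 2. Real count data often show more dispersion than a Poisson draw, so this sketch understates features such as the very long tail in the patents example.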

These features motivate the application of special methods and models for count regression. There are two ways to proceed. The first approach is a fully parametric one that completely specifies the distribution of the data, fully respecting the restriction of y to nonnegative integer values. The second approach is a mean-variance approach, which specifies the conditional mean to be nonnegative, and specifies the conditional variance to be a function of the conditional mean.
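The contrast between the two approaches can be written out in symbols. The block below is only a sketch: the Poisson density, the exponential form of the mean, the quadratic variance function, and the symbols \mu, \beta, \omega, and \alpha are standard illustrative choices assumed here, not specifications given in this section.

```latex
% Fully parametric approach (illustrative case: Poisson with exponential mean)
\Pr[y \mid x] = \frac{e^{-\mu}\,\mu^{y}}{y!}, \qquad y = 0, 1, 2, \ldots, \qquad \mu = \exp(x'\beta)

% Mean-variance approach: only the first two conditional moments are specified
\mathrm{E}[y \mid x] = \mu = \exp(x'\beta) > 0, \qquad
\mathrm{V}[y \mid x] = \omega(\mu), \quad \text{for example } \omega(\mu) = \mu + \alpha\mu^{2}, \ \alpha \ge 0
```

In the first case the whole distribution of y is pinned down, so it places mass only on nonnegative integers. In the second case only the mean and variance are restricted, and setting \alpha = 0 in the example variance function gives the equality of mean and variance implied by the Poisson.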

These approaches are presented for cross section data in Sections 2 to 4. Section 2 details the Poisson regression model. This model is often too restrictive, and other, more commonly used, fully parametric count models are presented in Section 3. Less-used alternative parametric approaches for counts, such as discrete choice models and duration models, are also presented in this section. The partially parametric approach of modeling the conditional mean and conditional variance is detailed in Section 4. Extensions to other types of data, notably time series, multivariate and panel data, are given in Section 5. In Section 6 practical recommendations are provided. For pedagogical reasons the Poisson regression model for cross section data is presented in some detail. The other models, many superior to Poisson, are presented in less detail for space reasons. For more complete treatment see Cameron and Trivedi (1998) and the guide to further reading in Section 7.
