Advanced Econometrics Takeshi Amemiya
Log-Linear Model
A log-linear model refers to a particular parameterization of a multivariate model. We shall discuss it in the context of the $2 \times 2$ model given in Table 9.1. For the moment we shall assume that there are no independent variables and that there is no constraint among the probabilities; therefore the model is completely characterized by specifying any three of the four probabilities appearing in Table 9.1. We shall call Table 9.1 the basic parameterization and shall consider two alternative parameterizations.
The first alternative parameterization is a logit model and is given in Table 9.4, where $d$ is the normalization chosen to make the sum of probabilities equal to unity. The second alternative parameterization is called a log-linear model and is given in Table 9.5, where $d$, again, is a proper normalization, not necessarily equal to the $d$ in Table 9.4.
The three models described in Tables 9.1, 9.4, and 9.5 are equivalent; they differ only in parameterization. Parameterizations in Tables 9.4 and 9.5 have the attractive feature that the conditional probabilities have a simple logistic form. For example, in Table 9.4 we have
Note, however, that the parameterization in Table 9.5 has an additional attractive feature: $\alpha_{12} = 0$ if and only if $y_1$ and $y_2$ are independent.

Table 9.4 Bivariate logit model

The role of $\alpha_{12}$ also can be seen by the following equation that can be derived from Table 9.5:
$$P(y_1 = 1 \mid y_2) = \Lambda(\alpha_1 + \alpha_{12} y_2). \tag{9.4.9}$$
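The independence property and (9.4.9) can be checked numerically. The sketch below assumes, as (9.4.10) states, that the cells of Table 9.5 are proportional to $\exp(\alpha_1 y_1 + \alpha_2 y_2 + \alpha_{12} y_1 y_2)$, and takes $\Lambda(x) = 1/(1 + e^{-x})$; the function names and parameter values are illustrative only. It shows that the odds ratio $P(1,1)P(0,0)/[P(1,0)P(0,1)]$ equals $e^{\alpha_{12}}$ whatever $\alpha_1$ and $\alpha_2$ are, so the $2 \times 2$ table factors into its margins exactly when $\alpha_{12} = 0$.

```python
import math

def table_9_5(a1, a2, a12):
    """Cell probabilities P(y1, y2) proportional to exp(a1*y1 + a2*y2 + a12*y1*y2)."""
    w = {(y1, y2): math.exp(a1 * y1 + a2 * y2 + a12 * y1 * y2)
         for y1 in (0, 1) for y2 in (0, 1)}
    d = sum(w.values())  # the normalization d
    return {cell: v / d for cell, v in w.items()}

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def odds_ratio(p):
    return p[1, 1] * p[0, 0] / (p[1, 0] * p[0, 1])

def is_independent(p, tol=1e-12):
    """True if every cell equals the product of its two marginal probabilities."""
    m1 = {y1: p[y1, 0] + p[y1, 1] for y1 in (0, 1)}
    m2 = {y2: p[0, y2] + p[1, y2] for y2 in (0, 1)}
    return all(abs(p[y1, y2] - m1[y1] * m2[y2]) < tol
               for y1 in (0, 1) for y2 in (0, 1))

a1, a2, a12 = 0.3, -0.7, 1.2                       # illustrative values
p = table_9_5(a1, a2, a12)
print(math.isclose(odds_ratio(p), math.exp(a12)))  # True: the odds ratio is exp(a12)
for y2 in (0, 1):
    cond = p[1, y2] / (p[0, y2] + p[1, y2])        # P(y1 = 1 | y2) from the table
    print(math.isclose(cond, logistic(a1 + a12 * y2)))  # True, matching (9.4.9)
print(is_independent(p), is_independent(table_9_5(a1, a2, 0.0)))  # False True
```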
The log-linear parameterization in Table 9.5 can also be defined by

$$P(y_1, y_2) \propto \exp(\alpha_1 y_1 + \alpha_2 y_2 + \alpha_{12} y_1 y_2), \tag{9.4.10}$$

where $\propto$ reads “is proportional to.” This formulation can be generalized to a log-linear model of more than two binary random variables as follows (we shall write only the case of three variables):
$$P(y_1, y_2, y_3) \propto \exp(\alpha_1 y_1 + \alpha_2 y_2 + \alpha_3 y_3 + \alpha_{12} y_1 y_2 + \alpha_{13} y_1 y_3 + \alpha_{23} y_2 y_3 + \alpha_{123} y_1 y_2 y_3). \tag{9.4.11}$$
The first three terms in the exponential function are called the main effects. Terms involving the product of two variables are called second-order interaction terms, those involving the product of three variables third-order interaction terms, and so on. Note that (9.4.11) involves seven parameters that can be put into a one-to-one correspondence with the seven probabilities that completely determine the distribution of $y_1$, $y_2$, and $y_3$. Such a model, without any constraint among the parameters, is called a saturated model. A saturated model for $J$ binary variables involves $2^J - 1$ parameters. Researchers often use a constrained log-linear model, called an unsaturated model, which is obtained by setting some of the higher-order interaction terms to 0.
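To make the one-to-one correspondence concrete, the sketch below (an illustration of our own, with hypothetical names and values) maps the seven parameters of (9.4.11) to the eight cell probabilities and back. Writing $1_T$ for the outcome in which $y_j = 1$ exactly for $j \in T$, (9.4.11) gives $\log P(1_T) - \log P(0) = \sum_{S \subseteq T} \alpha_S$, and Möbius inversion over subsets recovers $\alpha_S = \sum_{T \subseteq S} (-1)^{|S| - |T|} \log P(1_T)$; the normalization $d$ cancels for every nonempty $S$.

```python
import itertools
import math

J = 3
# Nonempty subsets S of {1, 2, 3} index the 2**J - 1 = 7 parameters alpha_S of (9.4.11).
SUBSETS = [frozenset(s) for r in range(1, J + 1)
           for s in itertools.combinations(range(1, J + 1), r)]

def probs_from_params(alpha):
    """P(y1, y2, y3) proportional to exp(sum over S of alpha_S * prod_{j in S} y_j)."""
    w = {}
    for y in itertools.product((0, 1), repeat=J):
        ones = {j + 1 for j, yj in enumerate(y) if yj == 1}
        w[y] = math.exp(sum(a for S, a in alpha.items() if S <= ones))
    d = sum(w.values())
    return {y: v / d for y, v in w.items()}

def params_from_probs(p):
    """Invert the map by Moebius inversion on the log probabilities."""
    def logp(T):
        return math.log(p[tuple(1 if j in T else 0 for j in range(1, J + 1))])
    alpha = {}
    for S in SUBSETS:
        alpha[S] = sum((-1) ** (len(S) - len(T)) * logp(frozenset(T))
                       for r in range(len(S) + 1)
                       for T in itertools.combinations(sorted(S), r))
    return alpha

alpha = {S: 0.1 * k - 0.3 for k, S in enumerate(SUBSETS)}  # arbitrary illustrative values
recovered = params_from_probs(probs_from_params(alpha))
print(len(SUBSETS))  # 7 = 2**3 - 1
print(all(math.isclose(alpha[S], recovered[S], abs_tol=1e-12) for S in SUBSETS))  # True
```

Setting some higher-order entries of `alpha` to zero before calling `probs_from_params` gives an unsaturated model.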
Example 9.4.2 is an illustration of a multivariate log-linear model.
Example 9.4.2 (Goodman, 1972). Goodman sought to explain whether a soldier prefers a Northern camp to a Southern camp ($y_0$) by the race of the soldier ($y_1$), the region of his origin ($y_2$), and the present location of his camp (North or South) ($y_3$). Because each conditional probability has a logistic form, a log-linear model is especially suitable for analyzing a model of this sort. Generalizing (9.4.9), we can write a conditional probability as
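$$P(y_0 = 1 \mid y_1, y_2, y_3) = \Lambda(\alpha_0 + \alpha_{01} y_1 + \alpha_{02} y_2 + \alpha_{03} y_3 + \alpha_{012} y_1 y_2 + \alpha_{013} y_1 y_3 + \alpha_{023} y_2 y_3 + \alpha_{0123} y_1 y_2 y_3). \tag{9.4.12}$$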
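Equation (9.4.12) follows because, in $P(y_0 = 1, \cdot)/[P(y_0 = 0, \cdot) + P(y_0 = 1, \cdot)]$, every term of the joint log-linear model that does not involve $y_0$ cancels, along with $d$. The sketch below checks this for a saturated four-variable model whose 15 parameters are drawn at random; the random draws and helper names are illustrative assumptions.

```python
import itertools
import math
import random

VARS = (0, 1, 2, 3)  # indices of y0, y1, y2, y3
SUBSETS = [frozenset(s) for r in range(1, 5) for s in itertools.combinations(VARS, r)]

def log_weight(alpha, y):
    """Exponent of the saturated log-linear model: sum of alpha_S * prod_{j in S} y_j."""
    return sum(a for S, a in alpha.items() if all(y[j] == 1 for j in S))

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

rng = random.Random(0)
alpha = {S: rng.uniform(-1, 1) for S in SUBSETS}

max_err = 0.0
for rest in itertools.product((0, 1), repeat=3):    # values of (y1, y2, y3)
    w1 = math.exp(log_weight(alpha, (1,) + rest))
    w0 = math.exp(log_weight(alpha, (0,) + rest))
    direct = w1 / (w0 + w1)                         # P(y0 = 1 | y1, y2, y3) from the joint
    # (9.4.12): logistic of the terms that contain y0, evaluated at y0 = 1.
    index = sum(a for S, a in alpha.items()
                if 0 in S and all(rest[j - 1] == 1 for j in S if j != 0))
    max_err = max(max_err, abs(direct - logistic(index)))
print(max_err < 1e-12)  # True
```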
Goodman looked at the asymptotic $t$ value of the MLE of each $\alpha$ (the MLE divided by its asymptotic standard deviation) and tentatively concluded that $\alpha_{012} = \alpha_{013} = \alpha_{0123} = 0$, called the null hypothesis. He then formally accepted the null hypothesis as a result of the following testing procedure. Define $\hat{P}_t$, $t = 1, 2, \dots, 16$, as the observed frequencies in the 16 cells created by all the possible joint outcomes of the four binary variables. (They can be interpreted as the unconstrained MLE's of the probabilities $P_t$.) Define $\tilde{P}_t$ as the constrained MLE of $P_t$ under the null hypothesis. Then we must reject the null hypothesis if and only if
where $n$ is the total number of soldiers in the sample and $\chi^2_3(\alpha)$ is the $\alpha$% critical value of $\chi^2_3$. [Note that the left-hand side of (9.4.13) is analogous to (9.3.24).] Or, alternatively, we can use
$$2n \sum_{t=1}^{16} \hat{P}_t \log \frac{\hat{P}_t}{\tilde{P}_t} > \chi^2_3(\alpha).$$
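A minimal sketch of the likelihood-ratio version of the test, under stated assumptions: cells are indexed by $(y_0, y_1, y_2, y_3)$, and the null is $\alpha_{012} = \alpha_{013} = \alpha_{0123} = 0$. Because this null leaves every interaction among $y_1, y_2, y_3$ free, we take the constrained MLE to factor as $\tilde{P}_t$ = (observed share of the $(y_1, y_2, y_3)$ margin) × (fitted logit probability of $y_0$ from (9.4.12) with the three terms dropped), the logit being fitted by Newton-Raphson. The counts are synthetic, not Goodman's data, and all names are hypothetical. The $\chi^2_3$ tail uses the closed form $\operatorname{erfc}(\sqrt{x/2}) + \sqrt{2x/\pi}\, e^{-x/2}$.

```python
import itertools
import math
import numpy as np

CELLS = list(itertools.product((0, 1), repeat=4))   # (y0, y1, y2, y3): the 16 cells
GROUPS = list(itertools.product((0, 1), repeat=3))  # values of (y1, y2, y3)

def design(y1, y2, y3):
    """Regressors left in (9.4.12) after setting alpha_012 = alpha_013 = alpha_0123 = 0."""
    return [1.0, y1, y2, y3, y2 * y3]

def fit_grouped_logit(X, successes, trials, iters=50):
    """Logit MLE for grouped binary data by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (successes - trials * p)
        hess = X.T @ (X * (trials * p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-12:
            break
    return beta

def constrained_mle(counts):
    """Ptilde_t: observed (y1, y2, y3) margin times the fitted logit probability for y0."""
    n = sum(counts.values())
    X = np.array([design(*g) for g in GROUPS])
    succ = np.array([counts[(1,) + g] for g in GROUPS], dtype=float)
    tot = np.array([counts[(0,) + g] + counts[(1,) + g] for g in GROUPS], dtype=float)
    q = 1.0 / (1.0 + np.exp(-(X @ fit_grouped_logit(X, succ, tot))))
    fitted = {}
    for g, share, qg in zip(GROUPS, tot / n, q):
        fitted[(1,) + g] = float(share * qg)
        fitted[(0,) + g] = float(share * (1.0 - qg))
    return fitted

def lr_statistic(counts, fitted):
    """2n * sum_t Phat_t * log(Phat_t / Ptilde_t); empty cells contribute zero."""
    n = sum(counts.values())
    return 2 * n * sum((counts[c] / n) * math.log((counts[c] / n) / fitted[c])
                       for c in CELLS if counts[c] > 0)

def chi2_3_sf(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Usage with synthetic counts (not Goodman's data).
rng = np.random.default_rng(1)
counts = {c: int(rng.integers(20, 200)) for c in CELLS}
stat = lr_statistic(counts, constrained_mle(counts))
print(f"LR = {stat:.3f}, p-value = {chi2_3_sf(stat):.4f}")  # reject if p-value < level
```

Comparing the p-value with the chosen level is equivalent to comparing the statistic with $\chi^2_3(\alpha)$.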
We shall indicate how to generalize a log-linear model to the case of discrete variables that take more than two values. This is done simply by using the binary variables defined in (9.3.2). We shall illustrate this idea by a simple example: Suppose there are two variables $z$ and $y_3$ such that $z$ takes the three values 0, 1, and 2 and $y_3$ takes the two values 0 and 1. Define two binary (0, 1) variables $y_1$ and $y_2$ by the rule: $y_1 = 1$ if $z = 1$ and $y_2 = 1$ if $z = 2$. Then we can specify $P(z, y_3)$ by specifying $P(y_1, y_2, y_3)$, which we can specify by a log-linear model as in (9.4.11). However, we should remember one small detail: because in the present case $y_1 y_2 = 0$ by definition, the two terms involving $y_1 y_2$ on the right-hand side of (9.4.11) drop out.
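The coding can be checked directly. The sketch below (illustrative names and values) restricts attention to the six attainable cells, since $y_1 = y_2 = 1$ cannot occur. It builds $P(z, y_3)$ from (9.4.11) with the $\alpha_{12}$ and $\alpha_{123}$ terms dropped, then recovers the five remaining parameters from the six probabilities, so the reduced model is saturated for a $3 \times 2$ table.

```python
import math

def binary_coding(z):
    """Map z in {0, 1, 2} to (y1, y2): y1 = 1 if z == 1, y2 = 1 if z == 2."""
    return int(z == 1), int(z == 2)

def probs(a1, a2, a3, a13, a23):
    """P(z, y3) from (9.4.11) with the y1*y2 terms (alpha_12, alpha_123) dropped."""
    w = {}
    for z in (0, 1, 2):
        y1, y2 = binary_coding(z)
        assert y1 * y2 == 0  # the product y1*y2 vanishes by construction
        for y3 in (0, 1):
            w[z, y3] = math.exp(a1 * y1 + a2 * y2 + a3 * y3
                                + a13 * y1 * y3 + a23 * y2 * y3)
    d = sum(w.values())
    return {cell: v / d for cell, v in w.items()}

def params(p):
    """Recover the five parameters from the six cell probabilities (d cancels)."""
    L = {cell: math.log(v) for cell, v in p.items()}
    a1 = L[1, 0] - L[0, 0]
    a2 = L[2, 0] - L[0, 0]
    a3 = L[0, 1] - L[0, 0]
    a13 = L[1, 1] - L[1, 0] - L[0, 1] + L[0, 0]
    a23 = L[2, 1] - L[2, 0] - L[0, 1] + L[0, 0]
    return a1, a2, a3, a13, a23

theta = (0.4, -0.2, 0.9, -1.1, 0.5)  # illustrative (a1, a2, a3, a13, a23)
print(len(probs(*theta)))  # 6 cells and 5 free parameters
print(all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(theta, params(probs(*theta)))))  # True
```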
In the preceding discussion we have touched upon only a small aspect of the log-linear model. There is a vast amount of work on this topic in the statistical literature. The interested reader should consult Haberman (1978, 1979), Bishop, Fienberg, and Holland (1975), or the many references to Leo Goodman’s articles cited therein.
Nerlove and Press (1973) proposed making the parameters of a log-linear model dependent on independent variables. Specifically, they proposed that the main-effect parameters ($\alpha_1$, $\alpha_2$, and $\alpha_3$ in (9.4.11)) be linear combinations of independent variables. (However, there is no logical necessity to restrict this formulation to the main effects.)
Because in a log-linear model each conditional probability has a logit form as in (9.4.12), the following estimation procedure (which is simpler than MLE) can be used: Maximize the product of the conditional probabilities with respect to the parameters that appear therein. The remaining parameters must be estimated by maximizing the remaining part of the likelihood function. Amemiya (1975b) has given a sketch of a proof of the consistency of this estimator. We would expect that the estimator is, in general, not asymptotically as efficient as the MLE. However, Monte Carlo evidence, as reported by Guilkey and Schmidt (1979), suggests that the loss of efficiency may be minor.
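A minimal sketch of one reading of this two-step procedure, for a hypothetical bivariate version of the Nerlove-Press specification: $\alpha_1 = x'\beta_1$, $\alpha_2 = x'\beta_2$, constant $\alpha_{12}$, simulated data, and illustrative names throughout. Because $P(y_1, y_2 \mid x) = P(y_1 \mid y_2, x)\,P(y_2 \mid x)$, step 1 fits the conditional logit $\Lambda(x'\beta_1 + \alpha_{12} y_2)$, which contains $\beta_1$ and $\alpha_{12}$. Step 2 maximizes the remaining factor, whose log-odds are $\alpha_2 + \log(1 + e^{\alpha_1 + \alpha_{12}}) - \log(1 + e^{\alpha_1})$; with the step-1 estimates plugged in, this is a logit in $x$ with a known offset. No efficiency comparison with the full MLE is attempted.

```python
import numpy as np

def fit_logit(X, y, offset=None, iters=100):
    """Binary logit MLE by Newton-Raphson, with an optional known offset in the index."""
    offset = np.zeros(len(y)) if offset is None else offset
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ beta + offset)))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

# Hypothetical bivariate model: alpha1 = x'b1, alpha2 = x'b2, alpha12 constant.
rng = np.random.default_rng(0)
n = 20000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
b1_true, b2_true, a12_true = np.array([0.5, 1.0]), np.array([-0.3, 0.7]), 0.8
a1, a2 = X @ b1_true, X @ b2_true

# Simulate (y1, y2) from P(y1, y2 | x) proportional to exp(a1*y1 + a2*y2 + a12*y1*y2).
cells = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
logw = (a1[:, None] * cells[:, 0] + a2[:, None] * cells[:, 1]
        + a12_true * cells[:, 0] * cells[:, 1])
probs = np.exp(logw)
probs /= probs.sum(axis=1, keepdims=True)
idx = np.minimum((rng.random(n)[:, None] > probs.cumsum(axis=1)).sum(axis=1), 3)
y1, y2 = cells[idx, 0], cells[idx, 1]

# Step 1: P(y1 = 1 | y2, x) = Lambda(x'b1 + a12*y2) is a logit in (x, y2).
est = fit_logit(np.column_stack([X, y2]), y1)
b1_hat, a12_hat = est[:2], est[2]

# Step 2: P(y2 = 1 | x) is a logit in x with offset
# log(1 + exp(a1 + a12)) - log(1 + exp(a1)), evaluated at the step-1 estimates.
a1_hat = X @ b1_hat
offset = np.logaddexp(0.0, a1_hat + a12_hat) - np.logaddexp(0.0, a1_hat)
b2_hat = fit_logit(X, y2, offset)
print(b1_hat, a12_hat, b2_hat)  # near (0.5, 1.0), 0.8, (-0.3, 0.7) in large samples
```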