A COMPANION TO Theoretical Econometrics
Estimation
2.2 Maximum likelihood estimation
Maximum likelihood (ML) estimation of spatial lag and spatial error regression models was first outlined by Ord (1975).21 The point of departure is an assumption of normality for the error terms. The joint likelihood then follows from the multivariate normal distribution for $y$. Unlike in the classic regression model, the joint loglikelihood for a spatial regression does not equal the sum of the loglikelihoods associated with the individual observations. This is due to the two-directional nature of the spatial dependence, which results in a Jacobian term that is the determinant of a full $N \times N$ matrix, e.g. $|I - \rho W|$.
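As a brief sketch of where such a Jacobian comes from, assume the spatial lag form $y = \rho W y + X\beta + \varepsilon$ with $\varepsilon \sim N(0, \sigma^2 I)$ (the notation here is chosen for illustration). Solving for the errors and changing variables from $\varepsilon$ to $y$ gives

$$\varepsilon = (I - \rho W)y - X\beta, \qquad \frac{\partial \varepsilon}{\partial y'} = I - \rho W,$$

$$f_y(y) = f_\varepsilon\big((I - \rho W)y - X\beta\big)\,\big|\det(I - \rho W)\big|.$$

Taking logs adds the term $\ln|I - \rho W|$ to the normal loglikelihood. For $\rho = 0$ the determinant equals one, the term vanishes, and the loglikelihood reduces to the usual sum over observations.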
For the SAR error model, the loglikelihood is based on the multivariate normal case, for example, as used in the general treatment of Magnus (1978). Since $\varepsilon \sim MVN(0, \Sigma)$, it follows that, with $\varepsilon = y - X\beta$ and $\Sigma = \sigma^2[(I - \lambda W)'(I - \lambda W)]^{-1}$,
$$\ln L = -\frac{N}{2}\ln(2\pi) - \frac{N}{2}\ln\sigma^2 + \ln|I - \lambda W| - \frac{1}{2\sigma^2}(y - X\beta)'(I - \lambda W)'(I - \lambda W)(y - X\beta). \tag{14.21}$$
Closer inspection of the last term in (14.21) reveals that, conditional upon $\lambda$ (the spatial autoregressive parameter), a maximization of the loglikelihood is equivalent to the minimization of the sum of squared residuals in a regression of a spatially filtered dependent variable $y^* = y - \lambda Wy$ on a set of spatially filtered explanatory variables $X^* = X - \lambda WX$. The first order conditions for $\beta_{ML}$ indeed yield the familiar generalized least squares estimator:
$$\beta_{ML} = [(X - \lambda WX)'(X - \lambda WX)]^{-1}(X - \lambda WX)'(y - \lambda Wy) \tag{14.22}$$
and, similarly, the ML estimator for $\sigma^2$ follows as:
$$\sigma^2_{ML} = (e - \lambda We)'(e - \lambda We)/N \tag{14.23}$$
with $e = y - X\beta_{ML}$. However, unlike the time series case, a consistent estimator for $\lambda$ cannot be obtained from the OLS residuals and therefore the standard two-step FGLS approach does not apply.22 Instead, the estimator for $\lambda$ must be obtained from an explicit maximization of a concentrated likelihood function (for details, see Anselin, 1988a, ch. 6, and Anselin and Bera, 1998).
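The concentration step for the spatial error model can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of any particular software package: the function name, the simulated data, the line-shaped neighbour structure and the grid search are all hypothetical choices made for the example. For each trial value of the autoregressive parameter, the code filters $y$ and $X$, runs least squares on the filtered variables, and substitutes the resulting estimates back into the loglikelihood, whose last term then collapses to $-N/2$.

```python
import numpy as np

def sar_error_concentrated_loglik(lam, y, X, W):
    """Loglikelihood of the SAR error model with beta and sigma^2
    concentrated out at a given value of the autoregressive parameter."""
    n = y.shape[0]
    A = np.eye(n) - lam * W                      # spatial filter I - lam*W
    y_star, X_star = A @ y, A @ X                # filtered y and X
    beta = np.linalg.lstsq(X_star, y_star, rcond=None)[0]  # GLS via filtered OLS
    u = y_star - X_star @ beta                   # equals e - lam*W e, e = y - X beta
    sigma2 = (u @ u) / n                         # ML variance estimate
    sign, logdet = np.linalg.slogdet(A)          # log Jacobian ln|I - lam*W|
    if sign <= 0:
        return -np.inf, beta, sigma2             # treated as inadmissible here
    ll = -0.5 * n * (np.log(2 * np.pi) + np.log(sigma2) + 1.0) + logdet
    return ll, beta, sigma2

# Hypothetical data: 60 units on a line, each neighbouring the adjacent units.
rng = np.random.default_rng(0)
n = 60
C = np.zeros((n, n))
idx = np.arange(n - 1)
C[idx, idx + 1] = 1.0
C[idx + 1, idx] = 1.0
W = C / C.sum(axis=1, keepdims=True)             # row-standardized weights

X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_lam, true_beta = 0.5, np.array([1.0, 2.0])
eps = np.linalg.solve(np.eye(n) - true_lam * W, rng.normal(size=n))
y = X @ true_beta + eps

# Direct search: evaluate the concentrated loglikelihood on a grid.
grid = np.linspace(-0.95, 0.95, 191)
values = [sar_error_concentrated_loglik(lam, y, X, W)[0] for lam in grid]
lam_hat = grid[int(np.argmax(values))]
ll_hat, beta_hat, sigma2_hat = sar_error_concentrated_loglik(lam_hat, y, X, W)
print(lam_hat, beta_hat, sigma2_hat)
```

A grid is used only because it is easy to read; a one-dimensional optimizer would serve the same purpose. The dense determinant is also a convenience for small $N$ and would be costly for large samples.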
The loglikelihood for the spatial lag model is obtained using the same general principles (see Anselin, 1988, ch. 6 for details) and takes the form
$$\ln L = -\frac{N}{2}\ln(2\pi) - \frac{N}{2}\ln\sigma^2 + \ln|I - \rho W| - \frac{1}{2\sigma^2}(y - \rho Wy - X\beta)'(y - \rho Wy - X\beta). \tag{14.24}$$
The minimization of the last term in (14.24) corresponds to OLS, but since this ignores the log Jacobian $\ln|I - \rho W|$, OLS is not a consistent estimator in this model. As in the spatial error model, there is no satisfactory two-step procedure and estimators for the parameters must be obtained from an explicit maximization of the likelihood. This is greatly simplified since both $\beta_{ML}$ and $\sigma^2_{ML}$ can be obtained conditional upon $\rho$ from the first order conditions:
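$$\beta_{ML} = (X'X)^{-1}X'(y - \rho Wy), \tag{14.25}$$
or, with $\beta_0 = (X'X)^{-1}X'y$, $e_0 = y - X\beta_0$, $\beta_L = (X'X)^{-1}X'Wy$, $e_L = Wy - X\beta_L$,
$$\beta_{ML} = \beta_0 - \rho\beta_L \tag{14.26}$$
and
$$\sigma^2_{ML} = (e_0 - \rho e_L)'(e_0 - \rho e_L)/N. \tag{14.27}$$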
This yields a concentrated loglikelihood in a single parameter, which is straightforward to optimize by means of direct search techniques (see Anselin (1980, 1988a) for derivations and details).
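The single-parameter search for the spatial lag model can be illustrated with a short Python sketch. It is an illustration under assumptions rather than a reference implementation: the function name, the simulated data and the grid are hypothetical. The code takes $e_L$ to be the residuals from regressing $Wy$ on $X$, so that $e_0 - \rho e_L$ equals $y - \rho Wy - X\beta_{ML}$, and it uses the implied variance estimate $(e_0 - \rho e_L)'(e_0 - \rho e_L)/N$.

```python
import numpy as np

def lag_concentrated_loglik(rho, e0, eL, W):
    """Spatial lag loglikelihood with beta and sigma^2 concentrated out."""
    n = e0.shape[0]
    r = e0 - rho * eL                            # y - rho*Wy - X beta_ML
    sigma2 = (r @ r) / n
    sign, logdet = np.linalg.slogdet(np.eye(n) - rho * W)  # log Jacobian
    if sign <= 0:
        return -np.inf                           # treated as inadmissible here
    return -0.5 * n * (np.log(2 * np.pi) + np.log(sigma2) + 1.0) + logdet

# Hypothetical data: 60 units on a line with row-standardized weights.
rng = np.random.default_rng(1)
n = 60
C = np.zeros((n, n))
idx = np.arange(n - 1)
C[idx, idx + 1] = 1.0
C[idx + 1, idx] = 1.0
W = C / C.sum(axis=1, keepdims=True)

X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_rho, true_beta = 0.4, np.array([1.0, 2.0])
y = np.linalg.solve(np.eye(n) - true_rho * W,
                    X @ true_beta + rng.normal(size=n))

# Two auxiliary OLS regressions: y on X and Wy on X.
Wy = W @ y
P = np.linalg.solve(X.T @ X, X.T)                # (X'X)^{-1} X'
beta0, betaL = P @ y, P @ Wy
e0, eL = y - X @ beta0, Wy - X @ betaL

# Direct search over rho, then recover the remaining estimates.
grid = np.linspace(-0.95, 0.95, 191)
values = [lag_concentrated_loglik(rho, e0, eL, W) for rho in grid]
rho_hat = grid[int(np.argmax(values))]
beta_hat = beta0 - rho_hat * betaL
resid = e0 - rho_hat * eL
sigma2_hat = (resid @ resid) / n
print(rho_hat, beta_hat, sigma2_hat)
```

Because the two auxiliary regressions are computed once, each evaluation inside the search only needs a residual sum of squares and the log Jacobian, which is the practical appeal of the concentrated form.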
Both spatial lag and spatial error models are special cases of a more general specification that may include forms of heteroskedasticity as well. This also provides the basis for ML estimation of spatial SUR models with spatial lag or spatial error terms (Anselin, 1980, ch. 10). Similarly, ML estimation of error components models with spatial lag or spatial error terms can be implemented as well. Spatial models with discrete dependent variables are typically not estimated by means of ML, given the prohibitive nature of evaluating multiple integrals to determine the relevant marginal distributions.23
Finally, it is important to note that models with spatial dependence do not fit the classical framework (e.g. as outlined in Rao, 1973) under which the optimal properties (consistency, asymptotic efficiency, asymptotic normality) of ML estimators are established. This implies that these properties do not necessarily hold and that careful consideration must be given to the explicit formulation of regularity conditions. In general terms, aside from the usual restrictions on the variance and higher moments of the model variables, these conditions boil down to constraints on the range of dependence embodied in the spatial weights matrix.24 In addition, to avoid singularity or explosive processes, the parameter space for the coefficient in a spatial process model is restricted to an interval other than the familiar $(-1, +1)$. For example, for an SAR process, the parameter space is $1/\omega_{min} < \rho < 1/\omega_{max}$, where $\omega_{min}$ and $\omega_{max}$ are the smallest (on the real line) and largest eigenvalues of the spatial weights matrix $W$. For row-standardized weights, $\omega_{max} = 1$, but $\omega_{min} > -1$, such that the lower bound on the parameter space is less than $-1$ (Anselin, 1980). This must be taken into account in practical implementations of estimation routines.
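A small numerical check may make the bounds concrete. The sketch below assumes a hypothetical weights matrix for five units arranged on a ring, each with two neighbours; the variable names and the helper `log_jacobian` are illustrative only. It computes the eigenvalues of the row-standardized matrix, derives the admissible interval for the autoregressive coefficient, and evaluates the log Jacobian from the eigenvalues using the identity $\det(I - \rho W) = \prod_i (1 - \rho\omega_i)$.

```python
import numpy as np

# Hypothetical weights: 5 units on a ring, neighbours are the two adjacent units.
n = 5
C = np.zeros((n, n))
for i in range(n):
    C[i, (i - 1) % n] = 1.0
    C[i, (i + 1) % n] = 1.0
W = C / C.sum(axis=1, keepdims=True)             # row-standardized weights

# Assumption: the eigenvalues of W are real, which holds for this example.
omega = np.linalg.eigvals(W).real
omega_min, omega_max = omega.min(), omega.max()
rho_lower, rho_upper = 1.0 / omega_min, 1.0 / omega_max
print(rho_lower, rho_upper)

def log_jacobian(rho):
    """ln|I - rho*W| from the eigenvalues; valid inside the admissible interval."""
    return np.sum(np.log(1.0 - rho * omega))

# A value below -1 that still lies inside the admissible interval.
rho_trial = -1.1
print(rho_lower < rho_trial < rho_upper, log_jacobian(rho_trial))
```

An estimation routine that searches only over $(-1, +1)$ would exclude such admissible values, whereas one that searches beyond $1/\omega_{min}$ would evaluate the logarithm of a non-positive number.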