Methods for introducing inexact nonsample information
Economists usually bring general information about parameters to the estimation problem, but it is seldom as precise as the exact restrictions discussed in the previous section. For example, we may know the signs of marginal effects, which translate into inequality restrictions on parameters. Or we may think that a parameter falls in the unit interval, and that there is a good chance it falls between 0.25 and 0.75. That is, we are able to suggest the signs of parameters, and perhaps even ranges of reasonable values. While such information has long been available, it has been difficult to use in applications. Perhaps the biggest breakthrough in recent years has been the development of methods, and the distribution of software, that make it feasible to estimate linear (and nonlinear) models subject to inequality restrictions, and to implement Bayesian statistical methods.
The theory of inequality restricted least squares was developed some time ago; see Judge and Yancey (1986). However, the numerical problems of minimizing the sum of squared regression errors, or maximizing a likelihood function, subject to general inequality restrictions are substantial. Recently, major software packages (SAS, GAUSS, GAMS) have made algorithms for such constrained optimization much more accessible. With inequality restrictions, such as β* > 0, MSE gains require only that the direction of the inequality be correct.
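To make the mechanics concrete, here is a minimal sketch of inequality restricted least squares using a general-purpose optimizer. The data, the sign constraint, and the use of scipy's SLSQP routine are illustrative choices, not the algorithms used by the packages named above.

```python
# Inequality restricted least squares: minimize the sum of squared errors
# subject to a sign constraint on one coefficient (illustrative data).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 0.5, -0.3])
y = X @ beta_true + rng.normal(scale=1.0, size=n)

sse = lambda b: np.sum((y - X @ b) ** 2)

# Constraint: the second slope coefficient must be non-positive (b[2] <= 0),
# written as -b[2] >= 0 in scipy's "ineq" convention.
cons = [{"type": "ineq", "fun": lambda b: -b[2]}]

ols = np.linalg.lstsq(X, y, rcond=None)[0]        # unrestricted start value
res = minimize(sse, ols, constraints=cons, method="SLSQP")
print("OLS:       ", ols)
print("Restricted:", res.x)
```

If the unrestricted estimate already satisfies the constraint, the two solutions coincide; otherwise the restricted estimate sits on the boundary implied by the inequality.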
The Bayesian paradigm is an alternative mode of thought; see Zellner (1971). In it we represent our uncertainty about parameter values using probability distributions. Inexact nonsample information is introduced at the outset in the Bayesian world, by specifying a "prior" probability distribution for each parameter (in general, a joint prior). The prior density can be centered over likely values; it can be a truncated distribution, putting zero prior probability on parameter values we rule out on theoretical grounds; and so on. When prior beliefs are combined with the data, a multivariate probability distribution of the parameters is generated, called the posterior distribution, which summarizes all available information about the parameters.
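As a small illustration of how a prior is combined with the data, the following sketch computes the posterior mean and covariance of the coefficients in a normal linear model with a conjugate normal prior and a known error variance; the prior settings and data are invented for the example.

```python
# Conjugate normal prior for the regression coefficients, combined with the
# data to give the posterior mean and covariance.  The error variance is
# treated as known for simplicity; all numbers are illustrative.
import numpy as np

rng = np.random.default_rng(1)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([0.6, 0.4]) + rng.normal(scale=0.5, size=n)
sigma2 = 0.25                                   # assumed known error variance

b0 = np.array([0.5, 0.5])                       # prior mean: centered on likely values
V0 = np.diag([0.25, 0.25])                      # prior covariance: fairly diffuse

V0_inv = np.linalg.inv(V0)
V1 = np.linalg.inv(V0_inv + X.T @ X / sigma2)   # posterior covariance
b1 = V1 @ (V0_inv @ b0 + X.T @ y / sigma2)      # posterior mean

print("posterior mean:", b1)
print("posterior sd:  ", np.sqrt(np.diag(V1)))
```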
As noted in Judge et al. (1985, p. 908), Bayesians have no special problem dealing with the singularity or near-singularity of X'X. Their approach to the collinearity problem is to combine the prior densities on the parameters β with the sample information contained in the data to form a posterior density (see Zellner, 1971, pp. 75-81). The problem for Bayesians, as noted by Leamer (1978), is that when data are collinear the posterior distribution becomes very sensitive to changes in the prior. Small changes in the prior density result in large changes in the posterior, which complicates the use and analysis of the results in much the same way that collinearity makes inference imprecise in the classical theory of inference.
Bayesian theory is elegant and logically consistent, but it has been a nightmare in practice. Suppose g(β) is the multivariate posterior distribution for the vector of regression parameters β. The problem is how to extract the information about a single parameter of interest, say β*. The brute force method is to obtain the posterior density for β* by integrating all the other parameters out of g(β). When the posterior distribution g(β) is complicated, as it usually is, this integration is a challenging problem.
The Bayesian miracle has been the development of computationally intensive, but logically simple, procedures for deriving the posterior densities for individual parameters. These procedures include the Gibbs sampler and the Metropolis and Metropolis-Hastings algorithms (Dorfman, 1997). These developments will soon make Bayesian analysis feasible in many economic applications.
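The following is a minimal sketch of a Gibbs sampler for the normal linear regression model: the retained draws approximate the marginal posterior of each coefficient without any analytic integration. The priors, data, and number of draws are illustrative.

```python
# Gibbs sampler for the normal linear model: alternate draws of the
# coefficients given the error variance and of the error variance given the
# coefficients.  The retained draws approximate each coefficient's marginal
# posterior; priors and data are illustrative.
import numpy as np

rng = np.random.default_rng(2)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, -0.5]) + rng.normal(scale=0.8, size=n)
k = X.shape[1]

b0, V0_inv = np.zeros(k), np.eye(k) / 100.0     # vague normal prior for beta
a0, d0 = 2.0, 1.0                               # inverse-gamma prior for sigma^2

beta, sigma2 = np.zeros(k), 1.0
draws = []
for it in range(6000):
    # beta | sigma^2, y  ~  N(b1, V1)
    V1 = np.linalg.inv(V0_inv + X.T @ X / sigma2)
    b1 = V1 @ (V0_inv @ b0 + X.T @ y / sigma2)
    beta = rng.multivariate_normal(b1, V1)
    # sigma^2 | beta, y  ~  InvGamma(a0 + n/2, d0 + SSE/2)
    resid = y - X @ beta
    sigma2 = 1.0 / rng.gamma(a0 + n / 2, 1.0 / (d0 + resid @ resid / 2))
    if it >= 1000:                              # discard burn-in draws
        draws.append(beta)

draws = np.array(draws)
print("posterior means:", draws.mean(axis=0))
print("posterior sds:  ", draws.std(axis=0))
```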
In passing we note that non-Bayesians have tried to incorporate similar information by making the exact restrictions in Section 5.1 inexact (Theil and Goldberger, 1961). This is achieved by adding a random disturbance v ~ (0, Ω) to the exact restrictions, to obtain r = Rβ + v. This additional information is combined with the linear model as

$$\begin{bmatrix} y \\ r \end{bmatrix} = \begin{bmatrix} X \\ R \end{bmatrix}\beta + \begin{bmatrix} e \\ v \end{bmatrix}.$$

The resulting model is estimated by generalized least squares, which is called "mixed estimation" in this context. The difficulty, of course, apart from specifying the constraints, is the specification of the covariance matrix Ω, reflecting parameter uncertainty.
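A minimal sketch of the mixed estimator, under the assumption that the error variance is known, is given below; the restriction matrix R, the value r, and the covariance Ω are invented for the example.

```python
# Theil-Goldberger "mixed estimation": stack the stochastic restrictions
# r = R beta + v, v ~ (0, Omega), under the sample observations and apply
# generalized least squares to the augmented system.  Data, R, r, and Omega
# are illustrative.
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(3)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.8, 0.2]) + rng.normal(size=n)
sigma2 = 1.0                                    # error variance (assumed known here)

R = np.array([[0.0, 1.0, 1.0]])                 # belief: beta_1 + beta_2 is close to 1
r = np.array([1.0])
Omega = np.array([[0.05]])                      # uncertainty attached to the restriction

X_a = np.vstack([X, R])                         # augmented regressor matrix
y_a = np.concatenate([y, r])
W = block_diag(sigma2 * np.eye(n), Omega)       # covariance of the stacked disturbances
W_inv = np.linalg.inv(W)

beta_mixed = np.linalg.solve(X_a.T @ W_inv @ X_a, X_a.T @ W_inv @ y_a)
print("mixed estimator:", beta_mixed)
```

Shrinking Ω toward zero pushes the mixed estimator toward the exactly restricted estimator, while a very large Ω leaves ordinary least squares essentially unchanged.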
Another estimation methodology has been introduced recently, based upon the maximum entropy principle (Golan, Judge, and Miller, 1996). Instead of maximizing the likelihood function or minimizing the sum of squared errors, this method maximizes an entropy function subject to data and logical constraints. The method of maximum entropy is "nonparametric" in the sense that no specific probability distribution for the errors need be assumed. Like the Bayesian methodology, maximum entropy estimation requires the incorporation of prior information about the regression parameters at the outset. Golan, Judge, and Miller find that the maximum entropy estimator, which like the Stein-rule is a shrinkage estimator, performs well in the presence of collinearity.
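A stripped-down sketch of generalized maximum entropy estimation in this spirit is given below. The coefficient and error support points, and the use of a generic SLSQP solver, are illustrative choices rather than the authors' own algorithm.

```python
# Simplified generalized maximum entropy (GME) regression: each coefficient and
# each error is written as a convex combination of user-chosen support points,
# and the entropy of the combining weights is maximized subject to the data
# constraints.  Support points, data, and solver settings are illustrative.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
T, K = 20, 2
X = np.column_stack([np.ones(T), rng.normal(size=T)])
y = X @ np.array([1.0, 0.5]) + rng.normal(scale=0.5, size=T)

z = np.array([-5.0, 0.0, 5.0])                  # support points for each coefficient
v = np.array([-3.0, 0.0, 3.0])                  # support points for each error
M, J = len(z), len(v)

def unpack(theta):
    p = theta[:K * M].reshape(K, M)             # weights on coefficient supports
    w = theta[K * M:].reshape(T, J)             # weights on error supports
    return p, w

def neg_entropy(theta):
    return np.sum(theta * np.log(theta + 1e-12))  # minimizing this maximizes entropy

def data_constraint(theta):
    p, w = unpack(theta)
    return y - X @ (p @ z) - w @ v              # y = X beta + e must hold exactly

def adding_up(theta):
    p, w = unpack(theta)
    return np.concatenate([p.sum(axis=1) - 1.0, w.sum(axis=1) - 1.0])

theta0 = np.concatenate([np.full(K * M, 1.0 / M), np.full(T * J, 1.0 / J)])
res = minimize(neg_entropy, theta0, method="SLSQP",
               bounds=[(1e-8, 1.0)] * theta0.size,
               constraints=[{"type": "eq", "fun": data_constraint},
                            {"type": "eq", "fun": adding_up}])
p_hat, _ = unpack(res.x)
print("GME coefficients:", p_hat @ z)
```

The prior information enters through the support points: narrowing the coefficient supports around values thought likely shrinks the estimates toward them, which is the sense in which GME, like the Stein-rule, is a shrinkage estimator.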