A COMPANION TO Theoretical Econometrics
Semiparametric Estimation
We have seen that estimation of QRM relies on the assumption that the error term follows some known distribution (i.e. logistic or normal). Appropriate specification tests may reject this assumption, in which case the estimators are inconsistent. For this reason, several semiparametric estimation methods have been proposed in the literature; they estimate the relevant parameters of the model under weaker assumptions about the error distribution.
In this section, we will consider four semiparametric estimators: the Maximum Score (MS) estimator (Manski, 1975, 1985; Manski and Thompson, 1986), the Quasi Maximum Likelihood (QML) estimator (Klein and Spady, 1993), the Generalized Maximum Likelihood (GML) estimator (Cosslett, 1983), and the seminonparametric (SNP) estimator (Gallant and Nychka, 1987). This is, by no means, an exhaustive review of the many existing semiparametric estimators for QRM. The interested reader is referred to the comprehensive review of semiparametric models in the context of limited dependent variables by Powell (1994).
Our main focus is on the binary response model; however, the MS and QML estimators can be extended to multinomial response models. Other estimators for the multinomial response model have been developed by Thompson (1989a, 1989b) and Lee (1994).
Gabler, Laisney, and Lechner (1993) use a small-scale Monte Carlo study to conclude that the bias associated with incorrect distributional assumptions in the binary probit model can be substantial both in finite samples and asymptotically. Similar results for the binary logit model are shown by Horowitz (1993).
The MS estimator of the parameter vector $\beta$ is obtained, intuitively, by maximizing the number of correct predictions of the dependent variable $y$ by the sign of the latent regression function $x'\beta$. More formally, this estimator maximizes the following score function over a suitable parameter space:
$$S_N(\beta) = \sum_{i=1}^{N} \left[ y_i I\{x_i'\beta \geq 0\} + (1 - y_i) I\{x_i'\beta < 0\} \right]. \tag{17.17}$$
The only restriction imposed on the distribution of the error term is that its conditional median be zero, which ensures consistency of the estimator. Although the MS estimator is consistent, it is neither root-n consistent under standard regularity conditions nor asymptotically normal. Its rate of convergence is $n^{1/3}$, and $n^{1/3}(\hat{\beta} - \beta_0)$ converges to a nonnormal distribution (Kim and Pollard, 1990).
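As an illustration of the score function (17.17), the following minimal sketch counts correct sign predictions and maximizes the count by a crude grid search. Everything beyond the formula is an assumption made for the example: the two-regressor design, the scale normalization to the unit circle (the score is unchanged when $\beta$ is multiplied by a positive constant), the treatment of a zero index as a prediction of y = 1, the simulated heteroskedastic median-zero errors, and the names `score` and `max_score_grid`. It is not an algorithm proposed in the cited papers.

```python
import math
import random

def score(beta, X, y):
    """Number of observations whose y is predicted correctly by the sign of x'beta."""
    total = 0
    for xi, yi in zip(X, y):
        index = sum(b * x for b, x in zip(beta, xi))
        total += yi if index >= 0 else 1 - yi
    return total

def max_score_grid(X, y, n_angles=360):
    """Grid search over unit-length beta (two regressors only; illustrative)."""
    best_beta, best_score = None, -1
    for k in range(n_angles):
        theta = 2.0 * math.pi * k / n_angles
        beta = (math.cos(theta), math.sin(theta))
        s = score(beta, X, y)
        if s > best_score:
            best_beta, best_score = beta, s
    return best_beta, best_score

# Simulated data (assumption): the error has conditional median zero but its
# spread depends on x1, so it is neither logistic nor normal given x.
random.seed(0)
N = 500
X = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]
y = []
for x1, x2 in X:
    u = (1.0 + abs(x1)) * random.uniform(-1.0, 1.0)
    y.append(1 if x1 + x2 + u >= 0 else 0)

beta_hat, s_hat = max_score_grid(X, y)
print(beta_hat, s_hat, "correct predictions out of", N)
```

Because the score is a step function of $\beta$, gradient-based optimizers are of no use here, which is why the sketch falls back on an exhaustive grid.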
The QML estimator, proposed by Klein and Spady (1993), is obtained as follows.3 By the use of Bayes' theorem, we can write the probability of y = 1 given x as:
$$P\{y = 1 \mid x\} = \frac{P\{y = 1\}\, g(x'\beta \mid y = 1)}{g(x'\beta)} \equiv p(x'\beta). \tag{17.18}$$
The sample mean of $y$ is a consistent estimator of $P\{y = 1\}$ above, and consistent estimators of $g(x'\beta \mid y = 1)$ and $g(x'\beta)$ can be obtained from kernel estimates (for known $\beta$). With these estimators in hand, we obtain consistent estimates $\hat{p}(x'\beta)$, which are substituted for $F(\beta'x_i)$ in (17.4); maximizing the resulting expression yields the QML estimator.
This estimator is strongly consistent, asymptotically normal, and attains the semiparametric efficiency bound (Newey, 1990); however, it relies on the stringent assumption of independence between the error term and the regressors.
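The construction in (17.18) can be sketched in a few lines. The code below is a simplified illustration, not the Klein and Spady (1993) procedure itself: it assumes that (17.4) is the usual binary-choice log-likelihood, and it uses a Gaussian kernel, a fixed bandwidth `h`, leave-one-out estimates, and clipping of the estimated probabilities, all of which are choices made here for the example. The names `p_hat` and `quasi_loglik` are hypothetical, and the first coefficient is normalized to one as an identification assumption.

```python
import math
import random

def gaussian_kernel(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def p_hat(v, index, y, h, skip=None):
    """Bayes-rule estimate of P{y = 1 | x'beta = v}: ybar * g1_hat(v) / g_hat(v).
    Observation `skip` (if given) is left out of all three estimates."""
    keep = [j for j in range(len(index)) if j != skip]
    m = len(keep)
    n1 = sum(y[j] for j in keep)
    ybar = n1 / m                              # estimate of P{y = 1}
    weights = [gaussian_kernel((v - index[j]) / h) for j in keep]
    g = sum(weights) / (m * h)                 # kernel density of the index
    if n1 == 0 or g == 0.0:
        return ybar                            # fallback when the ratio is undefined
    g1 = sum(w * y[j] for w, j in zip(weights, keep)) / (n1 * h)  # density given y = 1
    return ybar * g1 / g

def quasi_loglik(beta, X, y, h, eps=1e-6):
    """Binary-choice log-likelihood with F(beta'x_i) replaced by the kernel estimate."""
    index = [sum(b * x for b, x in zip(beta, xi)) for xi in X]
    total = 0.0
    for i, yi in enumerate(y):
        p = min(max(p_hat(index[i], index, y, h, skip=i), eps), 1.0 - eps)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total

# Simulated data (assumption): errors independent of the regressors,
# true coefficients (1, 1) with the first one normalized to 1.
random.seed(1)
N = 150
X = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]
y = [1 if x1 + x2 + random.gauss(0, 1) >= 0 else 0 for x1, x2 in X]

grid = [k / 4 for k in range(-4, 13)]          # candidate values for the second coefficient
best_b2 = max(grid, key=lambda b2: quasi_loglik((1.0, b2), X, y, h=0.4))
print("second coefficient maximizing the quasi-likelihood on the grid:", best_b2)
```

Note that the product of the sample mean and the ratio of the two kernel densities collapses to a kernel-weighted average of $y$, so `p_hat` always lies between 0 and 1 when a nonnegative kernel is used.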
Cosslett's (1983) GML estimator uses the likelihood function for the binary choice model (17.4) and maximizes it with respect to the functional form $F$ as well as $\beta$, subject to the condition that $F$ is a distribution function. This maximization is done in two steps: first, $\beta$ is fixed and (17.4) is maximized with respect to $F$ to obtain $\hat{F}$; then, in the second step, $\hat{F}$ is used as the correct distribution and (17.4) is maximized with respect to $\beta$, obtaining $\hat{\beta}$. Cosslett (1983) derived the conditions under which this estimator is consistent; however, its asymptotic normality has not yet been proved, and the second step is computationally costly because the likelihood function at that stage varies in discrete steps over the parameter space.
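The two-step logic can be illustrated with a short sketch. It assumes that (17.4) is the usual binary-choice log-likelihood and that the index values have no ties; under those assumptions, maximizing over nondecreasing functions at the observed index values is a standard isotonic-regression problem, solved here with the pool-adjacent-violators algorithm. The function names, the simulated data, and the grid search standing in for the second step are choices made for this example, not details taken from Cosslett (1983).

```python
import math
import random

def pava(y_sorted):
    """Pool-adjacent-violators: nondecreasing fit to y ordered by the index."""
    blocks = []                                # each block holds [sum of y, count]
    for yi in y_sorted:
        blocks.append([float(yi), 1])
        # Merge while the previous block mean exceeds the current block mean.
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)
    return fit

def concentrated_loglik(beta, X, y):
    """Step 1 for a fixed beta: fit F_hat by PAVA, then evaluate the log-likelihood."""
    index = [sum(b * x for b, x in zip(beta, xi)) for xi in X]
    order = sorted(range(len(y)), key=lambda i: index[i])
    f_hat = pava([y[i] for i in order])
    total = 0.0
    for f, i in zip(f_hat, order):
        # A block containing y = 1 has mean > 0 and one containing y = 0 has
        # mean < 1, so both logarithms are finite.
        total += math.log(f) if y[i] == 1 else math.log(1.0 - f)
    return total

# Simulated data (assumption); the first coefficient is normalized to 1.
random.seed(2)
N = 300
X = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]
y = [1 if x1 + x2 + random.gauss(0, 1) >= 0 else 0 for x1, x2 in X]

# Step 2 (illustrative): search a grid, since the objective is a step function of beta.
grid = [k / 10 for k in range(0, 31)]
best_b2 = max(grid, key=lambda b2: concentrated_loglik((1.0, b2), X, y))
print("second coefficient maximizing the concentrated likelihood on the grid:", best_b2)
```

The concentrated likelihood depends on $\beta$ only through the ordering of the index values, which is the source of the discrete steps mentioned in the text.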
The paper by Gabler, Laisney, and Lechner (1993), referred to hereafter as GLL, applies the SNP estimator proposed by Gallant and Nychka (1987) to the binary choice model. In general, the term semi-nonparametric is used when the goal is to approximate a function of interest (in this case the distribution function of the errors) with a parametric form. The larger the number of observations available to estimate the function, the larger the number of parameters that can be used in the approximation and, as a result, the better the approximation.
Gallant and Nychka (1987) proposed the approximation of any smooth density that has a moment generating function with the Hermite form:4
$$h^*(u) = \sum_{i,j=0}^{K} \alpha_i \alpha_j u^{i+j} \exp\{-(u/\delta)^2\}. \tag{17.19}$$
The estimation method used by GLL is to fix K (the degree of the Hermite polynomial) in a somewhat optimal way and use the framework of pseudo maximum likelihood (PML).
For the binary choice model, we use the above approximation for the density corresponding to $F$ in the likelihood function (17.4). The likelihood function is then maximized subject to two additional restrictions on $\alpha$ and $\delta$: $u$ must have zero expectation, and the density must integrate to one. Moreover, a condition for consistency in this approach is that the degree $K$ of the approximation increases with the sample size.5 For the details of the estimation method, refer to GLL's paper.
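To make the two restrictions concrete, the sketch below evaluates the Hermite form (17.19) and computes its total mass and mean in closed form. It relies on two facts: the double sum equals the squared polynomial $(\sum_i \alpha_i u^i)^2$ times the exponential factor, and the integral of $u^m \exp\{-(u/\delta)^2\}$ over the real line is $\delta^{m+1}\Gamma((m+1)/2)$ for even $m$ and zero for odd $m$. The function names and the parameter values are hypothetical; GLL's actual parameterization and the way they impose the restrictions may differ.

```python
import math

def gauss_moment(m, delta):
    """Integral of u**m * exp(-(u/delta)**2) over the real line."""
    if m % 2 == 1:
        return 0.0
    return delta ** (m + 1) * math.gamma((m + 1) / 2.0)

def mass_and_mean(alpha, delta):
    """Total mass of the unnormalized Hermite form and the mean of the normalized density."""
    pairs = [(i, j) for i in range(len(alpha)) for j in range(len(alpha))]
    mass = sum(alpha[i] * alpha[j] * gauss_moment(i + j, delta) for i, j in pairs)
    first = sum(alpha[i] * alpha[j] * gauss_moment(i + j + 1, delta) for i, j in pairs)
    return mass, first / mass

def hermite_density(u, alpha, delta):
    """Hermite form rescaled to integrate to one (the unit-mass restriction)."""
    poly = sum(a * u ** i for i, a in enumerate(alpha))
    mass, _ = mass_and_mean(alpha, delta)
    return poly * poly * math.exp(-(u / delta) ** 2) / mass

# K = 0 with delta = sqrt(2) reproduces the standard normal density.
print(hermite_density(0.0, [1.0], math.sqrt(2.0)))

# K = 1 with a nonzero alpha_1 gives a nonzero mean, so the zero-expectation
# restriction is binding for this (hypothetical) parameter choice.
print(mass_and_mean([1.0, 0.5], math.sqrt(2.0)))
```

With $K = 0$ the form is a rescaled normal density, which is consistent with the remark in the text that a probit fit is a natural source of starting values.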
The asymptotic normality of this SNP estimator follows from the asymptotic theory of PML estimation. References in this literature are White (1982, 1983) and Gourieroux and Monfort (1995). This allows for hypothesis testing with the usual techniques. Note that the PML theory is being used for the asymptotic distribution of a potentially inconsistent estimator (due to the use of a fixed K).6 In addition, let us note that this asymptotic normality result is not a standard asymptotic result, since the asymptotic variance is only approximated as $N \to \infty$, holding K at the fixed value chosen a priori.7
While all four estimators discussed above relax distributional assumptions about the error term, each has drawbacks worth considering before use in empirical applications. The MS estimator has been used more than the others because of its computational feasibility (it is even available in some software packages, although with size limitations), but its asymptotic nonnormality is a drawback when hypothesis testing beyond parameter significance is needed, since additional steps are required. This last drawback is shared by GML, which is also computationally costly. QML, on the other hand, is asymptotically normal and efficient, but it should not be used when dependence between the error term and the regressors is suspected, which limits its application considerably. Finally, the relatively new SNP estimator does not share the drawbacks of the MS and GML estimators, and the authors offer a GAUSS program upon request. However, the estimator is inconsistent if K is not chosen carefully, and the nonglobal concavity of its objective function requires good starting values, which usually means estimating a probit model first. Moreover, it shares QML's drawback regarding dependence between the error term and the regressors. More evidence on the relative performance of these estimators is needed.