A COMPANION TO Theoretical Econometrics
Nonnested Hypothesis Testing: An Overview
M. Hashem Pesaran and Melvyn Weeks
This chapter focuses on the hypotheses testing problem when the hypotheses or models under consideration are "nonnested" or belong to "separate" families of distributions, in the sense that none of the individual models may be obtained from the remaining models either by imposition of parameter restrictions or through a limiting process.1 In econometric analysis nonnested models arise naturally when rival economic theories are used to explain the same phenomenon, such as unemployment, inflation, or output growth. Typical examples from the economics literature are Keynesian and new classical explanations of unemployment, structural and monetary theories of inflation, alternative theories of investment, and endogenous and exogenous theories of growth.[10] Nonnested models could also arise when alternative functional specifications are considered, such as multinomial probit and logit distribution functions used in the qualitative choice literature, exponential and power utility functions used in asset pricing models, and a variety of nonnested specifications considered in the empirical analysis of income and wealth distributions. Finally, even starting from the same theoretical paradigm, it is possible for different investigators to arrive at different models if they adopt different conditioning or follow different paths to a more parsimonious model using the general-to-specific specification search methodology advocated, for example, by Hendry (1993).
The concept of an econometric model is discussed in Section 2, where a distinction is made between conditional and unconditional models. This is an important distinction since most applied work in econometrics takes place within a modeling framework where the behavior of one or more "endogenous" variables is often explained conditional on a set of "exogenous" variables. This discussion also highlights the importance of conditioning in the process of model evaluation.
Examples of nonnested models are given in Section 3. Section 4 discusses the differences that lie behind model selection and hypotheses testing. Although this chapter is primarily concerned with hypotheses testing involving nonnested models, a discussion of the differences and similarities of the two approaches to model evaluation can serve an important pedagogic purpose in clarifying the conditions under which one approach rather than the other could be appropriate.
The literature on nonnested hypothesis testing in statistics was pioneered by the seminal contributions of Cox (1961), Cox (1962), and Atkinson (1970), and was subsequently applied to econometric models by Pesaran (1974) and Pesaran and Deaton (1978). The analysis of nonnested regression models was further considered by Davidson and MacKinnon (1981), Fisher and McAleer (1981), Dastoor (1983), Deaton (1982), Sawyer (1983), Gourieroux, Monfort, and Trognon (1983), and Godfrey and Pesaran (1983).3 This literature is reviewed in Section 5 where we examine a number of alternative approaches to testing nonnested hypotheses, including the encompassing approach advanced by Mizon and Richard (1986), Gourieroux and Monfort (1995), and Smith (1993).
Generally speaking, two models, say $H_f$ and $H_g$, are said to be nonnested if it is not possible to derive $H_f$ (or $H_g$) from the other model either by means of an exact set of parametric restrictions or as a result of a limiting process. But for many purposes a more rigorous definition is needed. Section 6 examines this issue and focuses on the Kullback-Leibler divergence measure, which has played a pivotal role in the development of a number of nonnested test statistics. The Vuong approach to model selection, viewed as a hypothesis testing problem, is also discussed in this section (see Vuong, 1989). Section 7 deals with the practical problems involved in the implementation of the Cox procedure. Apart from a few exceptions, the centering of the loglikelihood ratio statistic required to construct the Cox statistic will involve finding an estimate of the Kullback-Leibler measure of closeness of the alternative to the null hypothesis, which in most cases is not easy to compute using analytical techniques. Subsequently, we explore two methods which circumvent the problem. First, following work by Pesaran and Pesaran (1993), we examine the simulation approach, which provides a consistent estimator of the Kullback-Leibler information criterion (KLIC) measure. However, since this approach is predicated upon adherence to a classical testing framework, we also examine the use of a parametric bootstrap approach. Whereas the use of simulation facilitates the construction of a pivotal test statistic with an asymptotically well-defined limiting distribution, the bootstrap procedure effectively replaces the theoretical distribution with the empirical distribution function. We also discuss the use of pivotal bootstrap statistics for testing nonnested models.
2
$$W = (w_1', w_2', \ldots, w_T')'$$
$$M_i: \; f_i(w_1, w_2, \ldots, w_T \mid w_0, \varphi_i) = f_i(W \mid w_0, \varphi_i), \quad i = 1, 2, \ldots, m, \tag{13.1}$$
where $f_i(\cdot)$ is the probability density function of the model (hypothesis) $M_i$, and $\varphi_i$ is a $p \times 1$ vector of unknown parameters associated with model $M_i$.4
The models characterized by $f_i(W \mid w_0, \varphi_i)$ are unconditional in the sense that the probability distribution of $w_t$ is fully specified in terms of some initial values, $w_0$, and for a given value of $\varphi_i$. In econometrics the interest often centers on conditional models, where a vector of "endogenous" variables, $y_t$, is explained (or modeled) conditional on a set of "exogenous" variables, $x_t$. Such conditional models can be derived from (13.1) by noting that
$$f_i(w_1, w_2, \ldots, w_T \mid w_0, \varphi_i) = f_i(y_1, y_2, \ldots, y_T \mid x_1, x_2, \ldots, x_T, w_0, \psi(\varphi_i)) \times f_i(x_1, x_2, \ldots, x_T \mid w_0, \kappa(\varphi_i)), \tag{13.2}$$
where $w_t = (y_t', x_t')'$. The unconditional model $M_i$ is decomposed into a conditional model of $y_t$ given $x_t$ and a marginal model of $x_t$. Denoting the former by $M_{i,y|x}$, we have
$$M_{i,y|x}: \; f_i(y_1, y_2, \ldots, y_T \mid x_1, x_2, \ldots, x_T, w_0, \psi(\varphi_i)) = f_i(Y \mid X, w_0, \psi(\varphi_i)), \tag{13.3}$$
where $Y = (y_1', y_2', \ldots, y_T')'$ and $X = (x_1', x_2', \ldots, x_T')'$.
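The decomposition of an unconditional model into a conditional and a marginal model can be checked numerically in a simple special case. The sketch below is an illustration only and is not taken from the chapter: it assumes a single observation on a bivariate normal $w_t = (y_t, x_t)'$ with no dependence on initial values, and the function names and parameter values are hypothetical. In this case the conditional parameters are the intercept, slope, and error variance of the regression of $y_t$ on $x_t$, and the marginal parameters are the mean and variance of $x_t$.

```python
import math


def log_normal_pdf(value, mean, variance):
    """Log density of a N(mean, variance) variable evaluated at value."""
    return (-0.5 * math.log(2.0 * math.pi * variance)
            - (value - mean) ** 2 / (2.0 * variance))


def log_joint(y, x, mu_y, mu_x, var_y, var_x, rho):
    """Log of the joint (unconditional) bivariate normal density of (y, x)."""
    z_y = (y - mu_y) / math.sqrt(var_y)
    z_x = (x - mu_x) / math.sqrt(var_x)
    one_minus_rho2 = 1.0 - rho * rho
    quad = (z_y * z_y - 2.0 * rho * z_y * z_x + z_x * z_x) / one_minus_rho2
    return (-math.log(2.0 * math.pi)
            - 0.5 * math.log(var_y * var_x * one_minus_rho2)
            - 0.5 * quad)


def conditional_parameters(mu_y, mu_x, var_y, var_x, rho):
    """Map the joint parameters to (intercept, slope, conditional variance)."""
    slope = rho * math.sqrt(var_y / var_x)
    intercept = mu_y - slope * mu_x
    cond_var = var_y * (1.0 - rho * rho)
    return intercept, slope, cond_var


def log_conditional(y, x, psi):
    """Log density of y given x, using the conditional parameters psi."""
    intercept, slope, cond_var = psi
    return log_normal_pdf(y, intercept + slope * x, cond_var)


def log_marginal(x, kappa):
    """Log density of x, using the marginal parameters kappa."""
    mu_x, var_x = kappa
    return log_normal_pdf(x, mu_x, var_x)


# Hypothetical parameter values for the unconditional model.
phi = dict(mu_y=1.0, mu_x=-0.5, var_y=2.0, var_x=1.5, rho=0.6)
psi = conditional_parameters(**phi)
kappa = (phi["mu_x"], phi["var_x"])

y_t, x_t = 0.3, 1.2
lhs = log_joint(y_t, x_t, **phi)
rhs = log_conditional(y_t, x_t, psi) + log_marginal(x_t, kappa)
print(lhs, rhs)  # the two log densities agree up to rounding error
```

The product in the decomposition becomes a sum here because the check is carried out on log densities.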
Confining attention to the analysis and comparison of conditional models is valid only if variations in the parameters of the marginal model, $\kappa(\varphi_i)$, do not induce changes in the parameters of the conditional model, $\psi(\varphi_i)$; namely, $\partial \psi(\varphi_i) / \partial \kappa'(\varphi_i) = 0$. When this condition holds it is said that $x_t$ is weakly exogenous for $\psi_i$. The parameters of the conditional model, $\psi_i = \psi(\varphi_i)$, are often referred to as the parameters of interest.5
The conditional models $M_{i,y|x}$, $i = 1, 2, \ldots, m$, are all based on the same conditioning variables, $x_t$, and differ only in so far as they are based upon different pdfs. We may introduce an alternative set of models which share the same pdfs but differ with respect to the inclusion of exogenous variables. For any model $M_i$ we may partition the set of exogenous variables $x_t$ according to a simple included/excluded dichotomy. Thus $x_t = (x_{it}', x_{it}^{*\prime})'$ splits the exogenous variables into a subset $x_{it}$ which is included in model $M_i$ and a subset $x_{it}^{*}$ which is excluded. We may then write
$$\begin{aligned}
f_i(Y \mid x_1, x_2, \ldots, x_T, w_0, \varphi_i) &= f_i(Y \mid x_{i1}, x_{i2}, \ldots, x_{iT}, x_{i1}^{*}, x_{i2}^{*}, \ldots, x_{iT}^{*}, w_0, \varphi_i) \\
&= f_i(Y \mid X_i, w_0, \psi_i(\varphi_i)) \times f_i(X_i^{*} \mid X_i, w_0, c_i(\varphi_i)),
\end{aligned}$$
where $X_i = (x_{i1}', x_{i2}', \ldots, x_{iT}')'$ and $X_i^{*} = (x_{i1}^{*\prime}, x_{i2}^{*\prime}, \ldots, x_{iT}^{*\prime})'$. As noted above in the case of models differentiated solely by different pdfs, a comparison of models based upon the partition of $x_t$ into $x_{it}$ and $x_{it}^{*}$ should be preceded by determining whether $\partial \psi_i(\varphi_i) / \partial c_i'(\varphi_i) = 0$.
The above setup allows consideration of rival models that could differ in the conditioning set of variables, $\{x_{it}, \; i = 1, 2, \ldots, m\}$, and/or the functional form of their underlying probability distribution functions, $\{f_i(\cdot), \; i = 1, 2, \ldots, m\}$. In much of this chapter we will be concerned with two rival (conditional) models and for notational convenience we denote them by
$$H_f: \; \mathcal{F}_\theta = \{f(y_t \mid x_t, \Omega_{t-1}; \theta), \; \theta \in \Theta\}, \tag{13.4}$$
$$H_g: \; \mathcal{F}_\gamma = \{g(y_t \mid z_t, \Omega_{t-1}; \gamma), \; \gamma \in \Gamma\}, \tag{13.5}$$
where $\Omega_{t-1}$ denotes the set of all past observations on $y$, $x$ and $z$; $\theta$ and $\gamma$ are respectively $k_f \times 1$ and $k_g \times 1$ vectors of unknown parameters belonging to the non-empty compact sets $\Theta$ and $\Gamma$; and $x$ and $z$ represent the conditioning variables. For the sake of notational simplicity we shall also often use $f_t(\theta)$ and $g_t(\gamma)$ in place of $f(y_t \mid x_t, \Omega_{t-1}; \theta)$ and $g(y_t \mid z_t, \Omega_{t-1}; \gamma)$, respectively.
Now given the observations $(y_t, x_t, z_t, \; t = 1, 2, \ldots, T)$ and conditional on the initial values $w_0$, the maximum likelihood (ML) estimators of $\theta$ and $\gamma$ are given by
$$\hat{\theta}_T = \arg\max_{\theta \in \Theta} L_f(\theta), \qquad \hat{\gamma}_T = \arg\max_{\gamma \in \Gamma} L_g(\gamma),$$
where the respective loglikelihood functions are given by:
$$L_f(\theta) = \sum_{t=1}^{T} \ln f_t(\theta), \qquad L_g(\gamma) = \sum_{t=1}^{T} \ln g_t(\gamma).$$
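As a concrete illustration of the two maximization problems, the following sketch fits a pair of nonnested densities to the same sample. It is an assumption-laden example, not the chapter's own: $H_f$ is taken to be exponential with rate $\theta$, $H_g$ is taken to be lognormal with $\gamma = (\mu, \sigma^2)'$, there are no conditioning variables, and the observations are treated as independent. Both ML estimators then have closed forms, so no numerical optimizer is needed. All function names are hypothetical.

```python
import math
import random


def loglik_exponential(ys, theta):
    """L_f(theta): sum over t of ln f_t(theta) for an exponential density."""
    return sum(math.log(theta) - theta * y for y in ys)


def loglik_lognormal(ys, mu, sigma2):
    """L_g(gamma): sum over t of ln g_t(gamma) for a lognormal density."""
    return sum(
        -math.log(y)
        - 0.5 * math.log(2.0 * math.pi * sigma2)
        - (math.log(y) - mu) ** 2 / (2.0 * sigma2)
        for y in ys
    )


def fit_exponential(ys):
    """Closed-form ML estimator of the exponential rate: 1 / sample mean."""
    return len(ys) / sum(ys)


def fit_lognormal(ys):
    """Closed-form ML estimators: mean and (biased) variance of ln y."""
    logs = [math.log(y) for y in ys]
    mu_hat = sum(logs) / len(logs)
    sigma2_hat = sum((v - mu_hat) ** 2 for v in logs) / len(logs)
    return mu_hat, sigma2_hat


# Simulated sample; the rate 2.0 and the seed are arbitrary choices.
rng = random.Random(0)
ys = [rng.expovariate(2.0) for _ in range(500)]

theta_hat = fit_exponential(ys)
mu_hat, sigma2_hat = fit_lognormal(ys)

L_f = loglik_exponential(ys, theta_hat)
L_g = loglik_lognormal(ys, mu_hat, sigma2_hat)
print(theta_hat, (mu_hat, sigma2_hat), L_f, L_g)
```

The two maximized loglikelihoods `L_f` and `L_g` are the raw ingredients of a loglikelihood ratio; how that ratio is centered and scaled into a test statistic is a separate matter not shown here.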
Throughout we shall assume that the conditional densities $f_t(\theta)$ and $g_t(\gamma)$ satisfy the usual regularity conditions as set out, for example, in White (1982) and Smith (1993), needed to ensure that $\hat{\theta}_T$ and $\hat{\gamma}_T$ have asymptotically normal limiting distributions under the data generating process (DGP). We allow the DGP to differ from $H_f$ and $H_g$, and denote it by $H_h$, thus admitting the possibility that both $H_f$ and $H_g$ could be misspecified and that both are likely to be rejected in practice. In this setting $\hat{\theta}_T$ and $\hat{\gamma}_T$ are referred to as quasi-ML estimators and their probability limits under $H_h$, which we denote by $\theta_{h*}$ and $\gamma_{h*}$ respectively, are known as (asymptotic) pseudo-true values. These pseudo-true values are defined by
$$\theta_{h*} = \arg\max_{\theta \in \Theta} E_h\{T^{-1} L_f(\theta)\}, \qquad \gamma_{h*} = \arg\max_{\gamma \in \Gamma} E_h\{T^{-1} L_g(\gamma)\}, \tag{13.8}$$
where $E_h(\cdot)$ denotes expectations taken under $H_h$. In the case where $w_t$ follows a strictly stationary process, (13.8) simplifies to
$$\theta_{h*} = \arg\max_{\theta \in \Theta} E_h\{\ln f_t(\theta)\}, \qquad \gamma_{h*} = \arg\max_{\gamma \in \Gamma} E_h\{\ln g_t(\gamma)\}.$$
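The idea that quasi-ML estimators converge to pseudo-true values, even when neither model is the DGP, can be illustrated by simulation. The sketch below rests on assumptions that are not in the chapter: the DGP $H_h$ is a gamma distribution with shape 2 and scale 1, $H_f$ is exponential with rate $\theta$, $H_g$ is lognormal with parameters $(\mu, \sigma^2)$, and the observations are independent. Under these assumptions, maximizing $E_h\{\ln f_t(\theta)\}$ gives $\theta_{h*} = 1/E_h(y_t)$, and maximizing $E_h\{\ln g_t(\gamma)\}$ gives the mean and variance of $\ln y_t$ under $H_h$.

```python
import math
import random

EULER_CONSTANT = 0.5772156649015329  # Euler-Mascheroni constant

# Draw a large sample from the assumed DGP H_h: gamma(shape=2, scale=1).
rng = random.Random(2024)
T = 100_000
ys = [rng.gammavariate(2.0, 1.0) for _ in range(T)]

# Quasi-ML estimator under the (misspecified) exponential model H_f.
theta_hat = T / sum(ys)
# Pseudo-true value: 1 / E_h[y], and E_h[y] = shape * scale = 2.
theta_star = 0.5

# Quasi-ML estimators under the (misspecified) lognormal model H_g.
logs = [math.log(y) for y in ys]
mu_hat = sum(logs) / T
sigma2_hat = sum((v - mu_hat) ** 2 for v in logs) / T
# Pseudo-true values: E_h[ln y] = digamma(2) = 1 - Euler's constant,
# and Var_h[ln y] = trigamma(2) = pi^2 / 6 - 1.
mu_star = 1.0 - EULER_CONSTANT
sigma2_star = math.pi ** 2 / 6.0 - 1.0

print(theta_hat, theta_star)
print(mu_hat, mu_star)
print(sigma2_hat, sigma2_star)
```

With a sample this large the estimates should lie close to the pseudo-true values, although neither fitted model coincides with the gamma DGP.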
To ensure global identifiability of the pseudo-true values, it will be assumed that $\theta_{h*}$ and $\gamma_{h*}$ provide unique maxima of $E_h\{T^{-1} L_f(\theta)\}$ and $E_h\{T^{-1} L_g(\gamma)\}$, respectively. Clearly, under $H_f$, namely assuming $H_f$ is the DGP, we have $\theta_{f*} = \theta_0$ and $\gamma_{f*} = \gamma_*(\theta_0)$, where $\theta_0$ is the "true" value of $\theta$ under $H_f$. Similarly, under $H_g$ we have $\gamma_{g*} = \gamma_0$ and $\theta_{g*} = \theta_*(\gamma_0)$, with $\gamma_0$ denoting the "true" value of $\gamma$ under $H_g$. The functions $\gamma_*(\theta_0)$ and $\theta_*(\gamma_0)$ that relate the parameters of the two models under consideration are called the binding functions. These functions do not involve the true model, $H_h$, and depend only on the models $H_f$ and $H_g$ that are under consideration. As we shall see later, a formal definition of encompassing is given in terms of the pseudo-true values, $\theta_{h*}$ and $\gamma_{h*}$, and the binding functions $\gamma_*(\theta_0)$ and $\theta_*(\gamma_0)$.
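A binding function can be written down explicitly in simple cases. As a hedged illustration (the choice of models and all names below are assumptions, not the chapter's), take $H_f$ to be exponential with rate $\theta_0$ and $H_g$ to be lognormal with $\gamma = (\mu, \sigma^2)'$. Under $H_f$, $\ln y_t$ has mean $-\ln\theta_0$ minus Euler's constant and variance $\pi^2/6$, so $\gamma_*(\theta_0) = (-\ln\theta_0 - 0.5772\ldots, \; \pi^2/6)'$. The sketch compares this closed form with the lognormal quasi-ML estimates computed from data simulated under $H_f$.

```python
import math
import random

EULER_CONSTANT = 0.5772156649015329  # Euler-Mascheroni constant


def binding_function(theta0):
    """Pseudo-true lognormal parameters when exponential(rate=theta0) is the DGP."""
    mu_star = -math.log(theta0) - EULER_CONSTANT
    sigma2_star = math.pi ** 2 / 6.0
    return mu_star, sigma2_star


def lognormal_qmle(ys):
    """Quasi-ML estimates of (mu, sigma^2): mean and variance of ln y."""
    logs = [math.log(y) for y in ys]
    mu_hat = sum(logs) / len(logs)
    sigma2_hat = sum((v - mu_hat) ** 2 for v in logs) / len(logs)
    return mu_hat, sigma2_hat


# Simulate under H_f with an arbitrary "true" rate theta0.
rng = random.Random(7)
theta0 = 2.0
ys = [rng.expovariate(theta0) for _ in range(100_000)]

mu_hat, sigma2_hat = lognormal_qmle(ys)
mu_star, sigma2_star = binding_function(theta0)
print((mu_hat, sigma2_hat), (mu_star, sigma2_star))
```

Note that `binding_function` takes only $\theta_0$ as input: it is determined by the pair of models alone, which is the sense in which binding functions do not involve any third, true model.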
Before proceeding further it would be instructive to consider some examples of nonnested models from the literature.