Advanced Econometrics Takeshi Amemiya
Problem of Left-Censoring
In this subsection we shall consider the problem of left-censoring in models for unemployment duration. When every individual is observed at the start of his or her unemployment spell, there is no problem of left-censoring. Often, however, an individual is observed to be in an unemployment spell at the start of the sample period. This complicates the estimation.
Suppose that an unemployment spell of a particular individual starts at time $-s$, the individual is interviewed at time 0, and the spell terminates at $t$. (To simplify the analysis, we assume that right-censoring at time $t$ does not occur.) The treatment of left-censoring varies according to the following three cases: (1) $s$ is observed but $t$ is not observed, (2) both $s$ and $t$ are observed, and (3) $t$ is observed but $s$ is not observed. For each case we shall derive the relevant likelihood function.
The first case corresponds to the situation analyzed by Nickell (1979), although his is a Markov chain model. Assuming that the underlying distribution of the duration is $F(\cdot)$ and its density $f(\cdot)$, we can derive the density $g(s)$. Denoting the state of “being unemployed” by $U$, we have for sufficiently small $\Delta s$
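$$
\begin{aligned}
g(s)\,\Delta s &= P[U \text{ started in } (-s-\Delta s,\,-s) \mid U \text{ at } 0] \\
&= \frac{P[U \text{ at } 0 \mid U \text{ started in } (-s-\Delta s,\,-s)]\;P[U \text{ started in } (-s-\Delta s,\,-s)]}{\int_0^\infty (\text{Numerator})\,ds} \\
&= \frac{P[U \text{ at } 0 \mid U \text{ started in } (-s-\Delta s,\,-s)]\,\Delta s}{\int_0^\infty (\text{Numerator})\,ds} \\
&= \frac{[1-F(s)]\,\Delta s}{\int_0^\infty [1-F(s)]\,ds} \\
&= \frac{[1-F(s)]\,\Delta s}{ES},
\end{aligned}
\tag{11.2.70}
$$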
where $ES = \int_0^\infty s f(s)\,ds$. In (11.2.70) the third equality follows from the assumption that $P[U \text{ started in } (-s-\Delta s,\,-s)]$ does not depend on $s$ (the assumption of constant entry rate), and the last equality follows from integration by parts. By eliminating $\Delta s$ from both sides of (11.2.70), we obtain
$$
g(s) = \frac{1-F(s)}{ES}. \tag{11.2.71}
$$
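As a numerical illustration of the last step, the following sketch checks that the integral of $1 - F(s)$ equals $ES$ and that the resulting density of elapsed duration, $[1 - F(s)]/ES$, integrates to one. The Weibull form of $F$, its parameter values, and the helper names are assumptions made only for this example; they do not come from the text.

```python
import math

SHAPE, SCALE = 1.5, 2.0   # assumed Weibull parameters, for illustration only


def weibull_survivor(s):
    """1 - F(s) for the assumed Weibull duration distribution."""
    return math.exp(-((s / SCALE) ** SHAPE))


def trapezoid(func, lower, upper, steps=20000):
    """Composite trapezoid rule on [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.5 * (func(lower) + func(upper))
    for k in range(1, steps):
        total += func(lower + k * width)
    return total * width


# ES in closed form for the Weibull distribution.
mean_duration = SCALE * math.gamma(1.0 + 1.0 / SHAPE)

# Integration by parts: the integral of 1 - F(s) over [0, inf) equals ES.
# The upper limit 40 stands in for infinity; the tail beyond it is negligible.
survivor_integral = trapezoid(weibull_survivor, 0.0, 40.0)


def g_elapsed(s):
    """Density of the elapsed duration s at the interview date."""
    return weibull_survivor(s) / mean_duration


g_total_mass = trapezoid(g_elapsed, 0.0, 40.0)
print(mean_duration, survivor_integral, g_total_mass)
```

Any other duration distribution with a finite mean could be substituted for the Weibull without changing the logic of the check.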
In the second case we should derive the joint density $g(s, t) = g(t|s)g(s)$. The density $g(s)$ has been derived, but we still need to derive $g(t|s)$. Let $X$ denote total unemployment duration. First evaluate
$$
P(X > s+t \mid X > s) = \frac{P(X > s+t,\ X > s)}{P(X > s)} = \frac{P(X > s+t)}{P(X > s)} = \frac{1-F(s+t)}{1-F(s)}. \tag{11.2.72}
$$
If we denote the distribution function of $g(t|s)$ by $G(t|s)$, (11.2.72) implies
$$
1 - G(t|s) = \frac{1-F(s+t)}{1-F(s)}. \tag{11.2.73}
$$
Therefore, differentiating (11.2.73) with respect to t, we obtain
$$
g(t|s) = \frac{f(s+t)}{1-F(s)}. \tag{11.2.74}
$$
Finally, from (11.2.71) and (11.2.74), we obtain
$$
g(s, t) = \frac{f(s+t)}{ES}. \tag{11.2.75}
$$
This situation holds for Lancaster (1979), as he observed both $s$ and $t$. However, Lancaster used the conditional density (11.2.74) rather than the joint density (11.2.75) because he felt uncertain about the assumption of constant entry rate.
Finally, in the third case we need g(t). This can be obtained by integrating g(s, t) with respect to s as follows:
$$
g(t) = \frac{1}{ES}\int_0^\infty f(s+t)\,ds = \frac{1-F(t)}{ES}. \tag{11.2.76}
$$
See Flinn and Heckman (1982) for an alternative derivation of g(t).
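The three densities can be checked against one another numerically. The sketch below builds $g(t|s)$ from (11.2.74), forms the joint density as the product $g(t|s)g(s)$, and confirms that integrating the joint density over $s$ reproduces $[1 - F(t)]/ES$. The Weibull distribution, the evaluation points, and all function names are assumptions chosen for illustration.

```python
import math

SHAPE, SCALE = 1.5, 2.0                          # assumed Weibull parameters
MEAN = SCALE * math.gamma(1.0 + 1.0 / SHAPE)     # ES for this distribution


def survivor(x):
    """1 - F(x)."""
    return math.exp(-((x / SCALE) ** SHAPE))


def density(x):
    """f(x) = dF/dx."""
    if x <= 0.0:
        return 0.0
    return (SHAPE / SCALE) * (x / SCALE) ** (SHAPE - 1.0) * survivor(x)


def trapezoid(func, lower, upper, steps=20000):
    """Composite trapezoid rule on [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.5 * (func(lower) + func(upper))
    for k in range(1, steps):
        total += func(lower + k * width)
    return total * width


def g_elapsed(s):
    """Density of elapsed duration s: [1 - F(s)] / ES."""
    return survivor(s) / MEAN


def g_cond(t, s):
    """Conditional density of remaining duration t given s, as in (11.2.74)."""
    return density(s + t) / survivor(s)


def g_joint(s, t):
    """Joint density formed as g(t|s) * g(s)."""
    return g_cond(t, s) * g_elapsed(s)


def g_remaining(t):
    """Marginal density of remaining duration t: [1 - F(t)] / ES."""
    return survivor(t) / MEAN


s0, t0 = 1.0, 0.7   # arbitrary evaluation points
cond_mass = trapezoid(lambda t: g_cond(t, s0), 0.0, 40.0)         # should be 1
marginal_numeric = trapezoid(lambda s: g_joint(s, t0), 0.0, 40.0)
print(cond_mass, marginal_numeric, g_remaining(t0))
```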
11.2.5 Cox’s Partial Maximum Likelihood Estimator
Cox (1972) considered a hazard rate of the form
$$
\lambda^i(t) = \lambda(t)\exp(\beta'x_i), \tag{11.2.77}
$$
which generalizes (11.2.63). This model is often referred to as a proportional hazards model. Cox proposed a partial MLE (PMLE) obtained by maximizing a part of the likelihood function. We shall first derive the whole likelihood function and then write it as a product of the part Cox proposed to maximize and the remainder.
Cox allowed for right-censoring. Let $t_i$, $i = 1, 2, \ldots, n$, be completed durations and let $t_i$, $i = n+1, n+2, \ldots, N$, be censored durations. We assume for simplicity $\{t_i\}$ are distinct.9 Then the likelihood function has the form (11.2.11): specifically,
$$
L = \prod_{i=1}^{n} \exp(\beta'x_i)\lambda(t_i)\exp[-\exp(\beta'x_i)\Lambda(t_i)] \times \prod_{i=n+1}^{N} \exp[-\exp(\beta'x_i)\Lambda(t_i)], \tag{11.2.78}
$$
where $\Lambda(t) = \int_0^t \lambda(z)\,dz$. Combining the exp functions that appear in both terms and rewriting the combined term further, we obtain
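$$
\begin{aligned}
L &= \prod_{i=1}^{n} \exp(\beta'x_i)\lambda(t_i) \cdot \exp\Big\{-\sum_{i=1}^{N} \exp(\beta'x_i)\int_0^{t_i}\lambda(z)\,dz\Big\} \\
&= \prod_{i=1}^{n} \exp(\beta'x_i)\lambda(t_i) \cdot \exp\Big\{-\int_0^{\infty}\Big[\sum_{h\in R(t)}\exp(\beta'x_h)\Big]\lambda(t)\,dt\Big\},
\end{aligned}
\tag{11.2.79}
$$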
where $R(t) = \{i \mid t_i \ge t\}$. To understand the second equality in (11.2.79), note that $\sum_{h\in R(t)}\exp(\beta'x_h)$ is a step function described in Figure 11.1.
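The second equality in (11.2.79) can be verified directly on a small data set. The sketch below computes the sum over individuals and the integral of the step function interval by interval, and the two agree. The baseline hazard $\lambda(t) = \alpha t$, the data, and the variable names are assumptions introduced only for this check.

```python
import math


def cumulative_hazard(t, alpha=0.5):
    """Lambda(t) for the assumed baseline hazard lambda(t) = alpha * t."""
    return 0.5 * alpha * t * t


durations = [2.0, 0.5, 1.3, 3.1]       # t_i, completed or censored, all distinct
covariates = [0.2, -1.0, 0.7, 0.0]     # scalar x_i (illustrative values)
beta = 0.8
weights = [math.exp(beta * x) for x in covariates]   # exp(beta * x_i)

# Left side: sum over individuals of exp(beta * x_i) * Lambda(t_i).
individual_sum = sum(w * cumulative_hazard(t) for w, t in zip(weights, durations))

# Right side: integrate the step function sum_{h in R(t)} exp(beta * x_h)
# against lambda(t). Between consecutive ordered durations the risk set is
# constant, so each piece is (risk sum) * (increment of Lambda).
step_integral = 0.0
previous = 0.0
for t_k in sorted(durations):
    risk_sum = sum(w for w, t in zip(weights, durations) if t >= t_k)
    step_integral += risk_sum * (cumulative_hazard(t_k) - cumulative_hazard(previous))
    previous = t_k

print(individual_sum, step_integral)
```

Beyond the largest duration the risk set is empty, so the integral to infinity stops contributing there.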
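Cox’s PMLE $\hat\beta_p$ maximizes
$$
L_1 = \prod_{i=1}^{n} \frac{\exp(\beta'x_i)}{\sum_{h\in R(t_i)}\exp(\beta'x_h)}. \tag{11.2.80}
$$
It is a part of $L$ because we can write $L$ as
$$
L = L_1 L_2, \tag{11.2.81}
$$
where
$$
L_2 = \prod_{i=1}^{n}\Big[\sum_{h\in R(t_i)}\exp(\beta'x_h)\Big]\lambda(t_i) \times \exp\Big\{-\int_0^{\infty}\Big[\sum_{h\in R(t)}\exp(\beta'x_h)\Big]\lambda(t)\,dt\Big\}. \tag{11.2.82}
$$
[Figure 11.1: $\sum_{h\in R(t)}\exp(\beta'x_h)$ as a function of $t$]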
Because $L_1$ does not depend on $\lambda(t)$, Cox’s method enables us to estimate $\beta$ without specifying $\lambda(t)$. Cox (1975) suggested and Tsiatis (1981) proved that under general conditions the PMLE is consistent and asymptotically normal with the asymptotic covariance matrix given by
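$$
V\hat\beta_p = -\Big[E\,\frac{\partial^2 \log L_1}{\partial\beta\,\partial\beta'}\Big]^{-1}. \tag{11.2.83}
$$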
This result is remarkable considering that $L_1$ is not even a conditional likelihood function in the usual sense.10
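To make the construction concrete, the following sketch evaluates the partial likelihood for a scalar $\beta$, checks numerically that the full likelihood (11.2.78) factors into the partial likelihood and a remainder that carries $\lambda(t)$, and locates the maximizer of the partial likelihood by a simple search. The data, the baseline hazard $\lambda(t) = \alpha t$, and every function name are assumptions for illustration; the search exploits the fact that the log partial likelihood is concave in $\beta$ and is not meant as a production optimizer.

```python
import math

durations = [0.5, 1.3, 2.0, 3.1, 0.9, 2.6]
completed = [True, True, True, True, False, False]   # False = right-censored
covariates = [1.0, -0.5, 0.3, -1.2, 0.8, 0.1]        # scalar x_i


def risk_sum(t, beta):
    """Sum of exp(beta * x_h) over the risk set R(t) = {h : t_h >= t}."""
    return sum(math.exp(beta * x)
               for x, t_h in zip(covariates, durations) if t_h >= t)


def log_partial(beta):
    """Log of the partial likelihood: a product over completed durations."""
    return sum(beta * x - math.log(risk_sum(t, beta))
               for x, t, done in zip(covariates, durations, completed) if done)


def log_full(beta, alpha):
    """Log of (11.2.78) under the assumed baseline lambda(t) = alpha * t."""
    total = 0.0
    for x, t, done in zip(covariates, durations, completed):
        if done:
            total += beta * x + math.log(alpha * t)
        total -= math.exp(beta * x) * 0.5 * alpha * t * t
    return total


def log_remainder(beta, alpha):
    """Log of the remaining factor, which involves lambda(t)."""
    total = sum(math.log(risk_sum(t, beta)) + math.log(alpha * t)
                for t, done in zip(durations, completed) if done)
    previous = 0.0
    for t_k in sorted(durations):
        # The risk set is constant on (previous, t_k], so this piece is exact.
        total -= risk_sum(t_k, beta) * 0.5 * alpha * (t_k ** 2 - previous ** 2)
        previous = t_k
    return total


def maximize_concave(func, lower=-10.0, upper=10.0, iterations=200):
    """Ternary search for the maximizer of a concave function of one variable."""
    for _ in range(iterations):
        left = lower + (upper - lower) / 3.0
        right = upper - (upper - lower) / 3.0
        if func(left) < func(right):
            lower = left
        else:
            upper = right
    return 0.5 * (lower + upper)


beta_hat = maximize_concave(log_partial)
gap = log_full(0.4, 0.7) - (log_partial(0.4) + log_remainder(0.4, 0.7))
print(beta_hat, gap)
```

Note that `log_partial` takes no baseline-hazard argument at all, which is the computational counterpart of estimating $\beta$ without specifying $\lambda(t)$.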
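However, $L_1$ and $L_2$ do have intuitive meanings. We shall consider their meanings in a simple example. Suppose $N = 2$ and $t_1$ is a completed duration (say, the first person dies at time $t_1$) and $t_2$ a censored duration (the second person lives at least until time $t_2$), with $t_2 > t_1$. Then we have
$$
L = \exp(\beta'x_1)\lambda(t_1)\exp[-\exp(\beta'x_1)\Lambda(t_1)]\exp[-\exp(\beta'x_2)\Lambda(t_2)]. \tag{11.2.84}
$$
We can write (11.2.84) as
$$
L = L_1 L_{21} L_{22} L_{23}, \tag{11.2.85}
$$
where
$$
L_1 = \frac{\exp(\beta'x_1)}{\exp(\beta'x_1)+\exp(\beta'x_2)}, \tag{11.2.86}
$$
$$
L_{21} = [\exp(\beta'x_1)+\exp(\beta'x_2)]\lambda(t_1), \tag{11.2.87}
$$
$$
L_{22} = \exp\{-[\exp(\beta'x_1)+\exp(\beta'x_2)]\Lambda(t_1)\}, \tag{11.2.88}
$$
$$
L_{23} = \exp\{-\exp(\beta'x_2)[\Lambda(t_2)-\Lambda(t_1)]\}. \tag{11.2.89}
$$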
These four components of L can be interpreted as follows:
$L_1$ = P(#1 dies at $t_1$ | both #1 and #2 live until $t_1$ and either #1 or #2 dies at $t_1$)
$L_{21}$ = P(either #1 or #2 dies at $t_1$ | both #1 and #2 live until $t_1$)
$L_{22}$ = P(both #1 and #2 live until $t_1$)
$L_{23}$ = P(#2 lives at least until $t_2$ | #2 lives until $t_1$).
Note that $L_{21}L_{22}L_{23}$ corresponds to $L_2$ of (11.2.82).
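The two-person example can be checked numerically. In the sketch below each of the four probabilities is written out under the proportional hazards form (11.2.77), and their product is compared with the likelihood (11.2.78) specialized to $N = 2$, $n = 1$. The explicit expressions for the four factors are our own rendering of the verbal interpretations above, and the baseline hazard $\lambda(t) = \alpha t$ and all numerical values are assumptions.

```python
import math

alpha, beta = 0.7, 0.4     # assumed baseline slope and scalar coefficient
x1, x2 = 1.0, -0.5         # covariates of persons #1 and #2
t1, t2 = 0.8, 1.9          # #1 dies at t1; #2 is censored at t2 > t1


def hazard(t):
    """Assumed baseline hazard lambda(t) = alpha * t."""
    return alpha * t


def cumulative(t):
    """Lambda(t), the integral of the baseline hazard from 0 to t."""
    return 0.5 * alpha * t * t


e1, e2 = math.exp(beta * x1), math.exp(beta * x2)

# Full likelihood: density for person #1 times survivor function for person #2.
full = (e1 * hazard(t1) * math.exp(-e1 * cumulative(t1))
        * math.exp(-e2 * cumulative(t2)))

# #1 is the one who dies, given that exactly one of the two dies at t1.
part_1 = e1 / (e1 + e2)
# Either person dies at t1, given both are alive until t1 (a density).
part_21 = (e1 + e2) * hazard(t1)
# Both persons live until t1.
part_22 = math.exp(-(e1 + e2) * cumulative(t1))
# #2 lives at least until t2, given survival until t1.
part_23 = math.exp(-e2 * (cumulative(t2) - cumulative(t1)))

product = part_1 * part_21 * part_22 * part_23
print(full, product)
```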
Kalbfleisch and Prentice (1973) gave an alternative interpretation of Cox’s partial likelihood function. First, consider the case of no censoring. Let $t_1 < t_2 < \cdots < t_N$ be an ordered sequence of durations. Then by successive integrations we obtain
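$$
\begin{aligned}
P(t_1 < t_2 < \cdots < t_N)
&= \int_0^\infty \int_{t_1}^\infty \cdots \int_{t_{N-2}}^\infty \prod_{i=1}^{N-2} \lambda(t_i)\exp(\beta'x_i)\exp\Big[-\exp(\beta'x_i)\int_0^{t_i}\lambda(z)\,dz\Big] \\
&\qquad \times \lambda(t_{N-1})\exp(\beta'x_{N-1})\exp\Big\{-[\exp(\beta'x_{N-1})+\exp(\beta'x_N)]\int_0^{t_{N-1}}\lambda(z)\,dz\Big\} \\
&\qquad \times dt_{N-1}\,dt_{N-2}\cdots dt_1 \\
&= \int_0^\infty \cdots \int_{t_{N-3}}^\infty \prod_{i=1}^{N-2} \lambda(t_i)\exp(\beta'x_i)\exp\Big[-\exp(\beta'x_i)\int_0^{t_i}\lambda(z)\,dz\Big] \\
&\qquad \times \frac{\exp(\beta'x_{N-1})}{\exp(\beta'x_{N-1})+\exp(\beta'x_N)}\exp\Big\{-[\exp(\beta'x_{N-1})+\exp(\beta'x_N)]\int_0^{t_{N-2}}\lambda(z)\,dz\Big\} \\
&\qquad \times dt_{N-2}\cdots dt_1 \\
&= \exp(\beta'x_N)\exp(\beta'x_{N-1})\cdots\exp(\beta'x_1) \\
&\qquad \div \{\exp(\beta'x_N)[\exp(\beta'x_N)+\exp(\beta'x_{N-1})] \\
&\qquad\quad \cdots [\exp(\beta'x_N)+\exp(\beta'x_{N-1})+\cdots+\exp(\beta'x_1)]\}.
\end{aligned}
\tag{11.2.90}
$$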
Next, suppose that completed durations are ordered as $t_1 < t_2 < \cdots < t_n$ and in the interval $[t_i, t_{i+1})$, $i = 1, 2, \ldots, n$ (with the understanding $t_{n+1} = \infty$), we observe censored durations $t_{i1}, t_{i2}, \ldots, t_{iq_i}$. Then we have
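$$
\begin{aligned}
& P(t_1 < t_2 < \cdots < t_n,\ t_i \le t_{i1}, t_{i2}, \ldots, t_{iq_i} \text{ for all } i) \\
&= \int_0^\infty \int_{t_1}^\infty \cdots \int_{t_{n-1}}^\infty P(t_{i1}, t_{i2}, \ldots, t_{iq_i} \ge t_i \text{ for all } i \mid t_1, t_2, \ldots, t_n) \\
&\qquad \times f_1(t_1)f_2(t_2)\cdots f_n(t_n)\,dt_n\,dt_{n-1}\cdots dt_1 \\
&= \int_0^\infty \cdots \int_{t_{n-1}}^\infty \prod_{i=1}^{n} \lambda(t_i)\exp(\beta'x_i) \\
&\qquad \times \exp\Big\{-\Big[\exp(\beta'x_i)+\sum_{j=1}^{q_i}\exp(\beta'x_{ij})\Big]\int_0^{t_i}\lambda(z)\,dz\Big\}\,dt_n\,dt_{n-1}\cdots dt_1 \\
&= \exp(\beta'x_n)\exp(\beta'x_{n-1})\cdots\exp(\beta'x_1) \\
&\qquad \div \{[\exp(\beta'x_n)+C_n][\exp(\beta'x_n)+\exp(\beta'x_{n-1})+C_n+C_{n-1}] \\
&\qquad\quad \cdots [\exp(\beta'x_n)+\cdots+\exp(\beta'x_1)+C_n+\cdots+C_1]\},
\end{aligned}
\tag{11.2.91}
$$
where $C_i = \sum_{j=1}^{q_i}\exp(\beta'x_{ij})$.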
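The closed form in (11.2.90) lends itself to a simulation check in the no-censoring case. The sketch below draws durations with hazard $\exp(\beta x_i)$ (a constant baseline hazard equal to one, which is an assumption; the ordering probability does not depend on the baseline) and compares the observed frequency of the ordering $t_1 < t_2 < t_3$ with the formula. The covariate values, the seed, and the function name are illustrative.

```python
import math
import random


def ordering_probability(weights):
    """Right-hand side of (11.2.90).

    weights[i] holds exp(beta * x) for the person with the (i+1)-th shortest
    duration, so the list is in the order t_1 < t_2 < ... < t_N.
    """
    probability = 1.0
    for i, w in enumerate(weights):
        # Person i exits first among those still at risk at that moment.
        probability *= w / sum(weights[i:])
    return probability


random.seed(0)
beta = 0.5
covariates = [0.9, -0.4, 0.2]
weights = [math.exp(beta * x) for x in covariates]

draws = 200_000
hits = 0
for _ in range(draws):
    # With a constant baseline hazard of one, durations are exponential
    # with rate exp(beta * x_i).
    sample = [random.expovariate(w) for w in weights]
    if sample[0] < sample[1] < sample[2]:
        hits += 1

simulated = hits / draws
exact = ordering_probability(weights)
print(simulated, exact)
```

The product returned by `ordering_probability` has the same form as Cox’s partial likelihood with every duration completed, which is the point of the Kalbfleisch and Prentice interpretation.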
Because the parameter vector $\beta$ appears in both $L_1$ and $L_2$, we expect Cox’s PMLE to be asymptotically less efficient than the full MLE in general. This is indeed the case. We shall compare the asymptotic covariance matrix of the two estimators in the special case of a stationary model where the $\lambda(t)$ that appears in the right-hand side of (11.2.77) is a constant. Furthermore, we shall suppose there is no censoring. In this special case, Cox’s model is identical to the model considered in Section 11.2.3. Therefore the asymptotic covariance matrix of the MLE $\hat\beta$ can be derived from (11.2.30) by noting that the $\beta$ of the present section can be regarded as the vector consisting of all but the last element of the $\beta$ of Section 11.2.3. Hence, by Theorem 13 of Appendix 1, we obtain
(11.2.92)
The asymptotic covariance matrix of Cox’s PMLE can be derived from (11.2.83) as
(11.2.93)
where $\sum_h$ denotes $\sum_{h\in R(t_i)}$ and $E$ denotes the expectation taken with respect to random variables $t_i$ that appear in $R(t_i)$.
This expectation is rather cumbersome to derive in general cases. Therefore, we shall make a further simplification and assume $\beta$ is a scalar and equal to 0. Under this simplifying assumption, we can show that the PMLE is asymptotically efficient. (Although this is not a very interesting case, we have considered this case because this is the only case where the asymptotic variance of the PMLE can be derived without lengthy derivations while still enabling the reader to understand essentially what kind of operation is involved in the expectation that appears in (11.2.93). For a comparison of the two estimates in more general cases, a few relevant references will be given at the end.) Under this simplification (11.2.93) is reduced to
(11.2.94)
We shall first evaluate (11.2.94) for the case $N = 3$ and then for the case of general $N$. If $t_1 < t_2 < t_3$, we have
If we change the rank order of $(t_1, t_2, t_3)$, the right-hand side of (11.2.95) will change correspondingly. But, under our stationarity (constant $\lambda$) and homogeneity ($\beta = 0$) assumptions, each one of six possible rank orderings can happen with an equal probability. Therefore we obtain
Similarly, if $t_1 < t_2 < t_3$, we have
By using an argument similar to that above, we obtain
(11.2.99)
(11.2.100)
N(N - l)Ji к
The coefficient on the cross-product term $x_i x_j$ ($i < j$) can be derived as follows: There are $N!$ permutations of $N$ integers $1, 2, \ldots, N$, each of which happens with an equal probability. Let $P_k$ be the number of permutations in which a given pair of integers, say, 1 and 2, appear in the last $k$ positions. Then $P_k = \binom{k}{2}\,2\,[(N-2)!]$. Therefore the desired coefficient is given by $(N!)^{-1}\,2\sum_{k=2}^{N}\{\binom{k}{2}\,2\,[(N-2)!]/k^2\}$.
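The counting argument can be confirmed by brute force for a small $N$. The sketch below enumerates all permutations, counts those in which the integers 1 and 2 both fall in the last $k$ positions, and compares the count with the formula for $P_k$; it then evaluates the coefficient and an algebraically equivalent simplified form, $[2/(N(N-1))]\sum_{k=2}^{N}(k-1)/k$. The simplified form and the function names are ours, added for illustration.

```python
from fractions import Fraction
from itertools import permutations
from math import comb, factorial


def count_pair_in_last(n, k):
    """Brute-force P_k: permutations of 1..n with 1 and 2 in the last k positions."""
    return sum(1 for perm in permutations(range(1, n + 1))
               if 1 in perm[n - k:] and 2 in perm[n - k:])


def pair_count_formula(n, k):
    """P_k = C(k, 2) * 2 * (n - 2)!: choose two of the last k positions,
    place 1 and 2 in either order, then arrange the remaining integers."""
    return comb(k, 2) * 2 * factorial(n - 2)


def coefficient(n):
    """(n!)^(-1) * 2 * sum over k of P_k / k^2, in exact rational arithmetic."""
    total = sum(Fraction(pair_count_formula(n, k), k * k) for k in range(2, n + 1))
    return Fraction(2, factorial(n)) * total


def coefficient_simplified(n):
    """Equivalent form: 2 / (n (n - 1)) * sum over k of (k - 1) / k."""
    return Fraction(2, n * (n - 1)) * sum(Fraction(k - 1, k) for k in range(2, n + 1))


n = 5
counts_match = all(count_pair_in_last(n, k) == pair_count_formula(n, k)
                   for k in range(1, n + 1))
print(counts_match, coefficient(n), coefficient_simplified(n))
```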
Finally, from (11.2.94), (11.2.99), and (11.2.100), we conclude
(11.2.101)
(11.2.102) we see that Cox’s PMLE is asymptotically efficient in this special case.
Kalbfleisch (1974) evaluated the asymptotic relative efficiency of the PMLE also for nonzero $\beta$ in a model that is otherwise the same as the one we have just considered. He used a Taylor expansion of (11.2.93) around $\beta = 0$. Kay (1979) extended Kalbfleisch’s results to a case where $\beta$ is a two-dimensional vector. Han (1983), using a convenient representation of the asymptotic relative efficiency of the PMLE obtained by Efron (1977), evaluated it for Weibull as well as exponential models with any number of dimensions of the $\beta$ vector and with or without censoring. Cox’s estimator is found to have a high asymptotic relative efficiency in most of the cases considered by these authors. However, it would be useful to study the performance of Cox’s estimator in realistic situations, which are likely to occur in econometric applications.
Exercises
1. (Section 11.1.1)
Using (11.1.5), express the unconditional mean and variance of y'(2) as a function of the mean and variance of y'(0).
4. (Section 11.1.1)
Prove the statement following (11.1.15).
5. (Section 11.1.1)
Verify statement (11.1.23).
6. (Section 11.1.3)
Derive the asymptotic variance-covariance matrix of the MLE of a in the Boskin-Nold model using (11.1.38).
7. (Section 11.1.3)
The mean duration is derived in (11.1.65). Using a similar technique, derive Vft.
8. (Section 11.1.5)
Let the Markov matrix of a two-state (1 or 0) stationary homogeneous first-order Markov chain model be
where A is the only unknown parameter of the model. Define the following symbols:
$n_{jk}$  Number of people who were in state $j$ at time 0 and are in state $k$ at time 1
$n_{j\cdot}$  Number of people who were in state $j$ at time 0
$n_{\cdot j}$  Number of people who are in state $j$ at time 1
We are to treat $n_{j\cdot}$ as given constants and $n_{jk}$ and $n_{\cdot j}$ as random variables.
a. Supposing $n_{1\cdot} = 10$, $n_{0\cdot} = 5$, $n_{\cdot 1} = 8$, and $n_{\cdot 0} = 7$, compute the least squares estimate of A based on Eq. (11.1.73). Also, compute an estimate of its variance conditional on $n_{j\cdot}$.
b. Supposing $n_{11} = 7$ and $n_{01} = 1$ in addition to the data given in a, compute the MLE of A and an estimate of its variance conditional on $n_{j\cdot}$.
9. (Section 11.1.5)
Prove that minimizing (11.1.78) yields the asymptotically same estimator as minimizing (11.1.77).
10. (Section 11.1.5)
Write down the aggregate likelihood function (11.1.82) explicitly in the following special case: $T = 1$, $N = 5$, $r_0 = 3$, and $r_1 = 2$.
11. (Section 11.2.5)
Verify (11.2.50).
12. (Section 11.2.5)
Consider a particular individual. Suppose his hazard rate of moving from state 1 to state 2 is $\alpha_1 t + \beta_1 k$, where $t$ is the duration in state 1 and $k$ denotes the $k$th spell in state 1. The hazard rate of moving from state 2 to 1 is $\alpha_2 t + \beta_2 k$, where $t$ is the duration in state 2 and $k$ denotes the $k$th spell in state 2. Suppose he was found in state 1 at time 0. This is his first spell in state 1. (He may have stayed in state 1 prior to time 0.) Then he moved to state 2 at time $t_1$, completed his first spell in state 2, moved back to state 1 at time $t_2$, and stayed in state 1 at least until time $t_3$. Write down his contribution to the likelihood function. Assume $\alpha_1, \alpha_2 > 0$.
13. (Section 11.2.5)
Consider a homogeneous nonstationary duration model with the hazard rate
$\lambda(t) = \alpha t$, $\alpha > 0$.
Supposing we observe $n$ completed spells of duration $t_1, t_2, \ldots, t_n$ and $N - n$ censored spells of duration $t_{n+1}, t_{n+2}, \ldots, t_N$, derive the MLE of $\alpha$ and its asymptotic variance.
14. (Section 11.2.5)
Let $F(t|\lambda) = 1 - e^{-\lambda t}$ where $\lambda$ is a random variable distributed with density $g(\cdot)$. Define $\lambda(t) = f(t)/[1 - F(t)]$, where $F(t) = E_\lambda F(t|\lambda)$ and $f(t) = dF/dt$. Show $d\lambda(t)/dt < 0$ (cf. Flinn and Heckman, 1982).