A COMPANION TO Theoretical Econometrics
Truncation and censoring
Econometric data used in duration analysis are often panel data comprising a number of individuals observed over a fixed interval of time. Let us suppose that the survey concerns unemployment durations; the sampling period is January 2000-December 2000 and the individuals also provided information on their job history prior to January 2000. We can consider two different sampling schemes, which imply truncation and censoring.
Censoring
Let us first consider a sample drawn from the population including both employed and unemployed people, and assume at most one unemployment spell per individual. Within this sample we find persons who:
1. are unemployed in January and remain unemployed in December too;
2. are unemployed in January and find a job before December;
3. are employed in January, lose their job before December and are still unemployed at this date;
4. are employed in January, then lose their job and find new employment before December.
Due to the labor force dynamics, unemployment durations of some individuals are only partially observed. For groups (2) and (4) the unemployment spells are complete, whereas they are right censored for groups (1) and (3).
To identify the right censored durations we can introduce an indicator variable $d_i$. It takes value 1 if the observed duration spell for individual $i$ is complete, and 0 if this observation is right censored. We also denote by $T_i$ the date of the entry into the unemployment state, by $\xi_i$ the total unemployment duration and by $y_i$ the observed unemployment duration knowing that the sampling period ends at $T$.
The model involves two latent variables $T_i$ and $\xi_i$. The observed variables $d_i$ and $y_i$ are related to the latent variables by:

$$
(d_i, y_i) =
\begin{cases}
(1,\ \xi_i), & \text{if } T_i + \xi_i \le T, \\
(0,\ T - T_i), & \text{if } T_i + \xi_i > T.
\end{cases}
$$
Figure 21.1 Censoring scheme: unemployment spells
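The mapping from the latent entry date and total duration to the observed duration and censoring indicator can be sketched in a few lines of Python. This is only an illustration: the function name `observe_spell`, the month-based time scale (0 = January 2000, 11 = December 2000) and the four example spells are assumptions chosen to reproduce groups (1) to (4) above, not data from the survey.

```python
from typing import List, Tuple


def observe_spell(entry_date: float, total_duration: float,
                  end_date: float) -> Tuple[float, int]:
    """Return (observed duration, indicator) under right censoring at end_date.

    The indicator is 1 for a complete spell and 0 for a right censored one.
    A spell ending exactly at end_date is treated as complete (a convention;
    this event has probability zero for continuous durations).
    """
    exit_date = entry_date + total_duration
    if exit_date <= end_date:
        return total_duration, 1          # spell ends inside the sampling period
    return end_date - entry_date, 0       # only the elapsed part is observed


# Hypothetical time scale in months: 0 = January 2000, 11 = December 2000.
SAMPLE_END = 11.0

# Hypothetical (entry date, total duration) pairs, one per group (1) to (4).
latent_spells: List[Tuple[float, float]] = [
    (-6.0, 30.0),  # group 1: unemployed in January, still unemployed in December
    (-3.0, 8.0),   # group 2: unemployed in January, finds a job before December
    (4.0, 20.0),   # group 3: loses job during 2000, still unemployed in December
    (2.0, 5.0),    # group 4: loses job during 2000, finds a job before December
]

observed = [observe_spell(t, xi, SAMPLE_END) for t, xi in latent_spells]
```

With these illustrative values, groups (1) and (3) come out right censored and groups (2) and (4) complete, matching the classification given in the text.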
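Conditional on $T_i$, the density of the observed pair $(y_i, d_i)$ is:

$$
l_i(y_i, d_i) = f_i(y_i)^{d_i}\, S_i(y_i)^{1-d_i}, \qquad (21.17)
$$

where $f_i$ is the density and $S_i$ the survivor function of the duration. Assuming that individual durations are independent conditional on explanatory variables, the loglikelihood function for this model can be written:

$$
\begin{aligned}
\log L(y; d) &= \sum_{i=1}^{N} \log l_i(y_i; d_i) \\
&= \sum_{i=1}^{N} d_i \log f_i(y_i) + \sum_{i=1}^{N} (1 - d_i) \log S_i(y_i) \\
&= \sum_{i=1}^{N} d_i \log \lambda_i(y_i) + \sum_{i=1}^{N} \log S_i(y_i),
\end{aligned}
$$

where the last line follows by substituting the hazard expression $f_i(y) = \lambda_i(y)\, S_i(y)$ into equation (21.17).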
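To make the censored loglikelihood concrete, the sketch below evaluates it for a constant hazard, that is, an exponential duration distribution. The exponential specification is an assumption made here for illustration only (the text leaves the hazard general), and the function names and sample values are hypothetical. With a constant hazard $\lambda$ the loglikelihood reduces to $\sum_i d_i \log \lambda - \lambda \sum_i y_i$, which is maximized at the number of complete spells divided by the total observed time.

```python
import math
from typing import Sequence


def exponential_censored_loglik(rate: float, y: Sequence[float],
                                d: Sequence[int]) -> float:
    """Censored loglikelihood: sum of d_i * log hazard(y_i) + log S(y_i).

    Assumes a constant hazard `rate`, so log hazard = log(rate) and
    log S(y) = -rate * y.
    """
    log_hazard = math.log(rate)
    return sum(di * log_hazard - rate * yi for yi, di in zip(y, d))


def exponential_censored_mle(y: Sequence[float], d: Sequence[int]) -> float:
    """Closed-form maximizer: complete spells divided by total observed time."""
    return sum(d) / sum(y)


# Hypothetical observed durations (months) and completeness indicators.
y = [17.0, 8.0, 7.0, 5.0]
d = [0, 1, 0, 1]

rate_hat = exponential_censored_mle(y, d)          # 2 complete spells / 37 months
loglik_at_mle = exponential_censored_loglik(rate_hat, y, d)
```

Right censored spells contribute only through the survivor term, which is why they enter the denominator of the estimator but not its numerator.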
Note that the duration distributions are conditioned on the date $T_i$. This information generally has to be included among the explanatory variables to correct for the so-called cohort effect.
Truncation
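We can also draw the sample from the subpopulation of people who are unemployed in January 2000 (date $T_0$, say). Within this sample we find only persons of groups (1) and (2) above, that is, persons who are unemployed in January and either remain unemployed in December or find a job before December.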
Figure 21.2 Truncation scheme (legend: observed and unobserved unemployment spells)
However, we now need to take into account the endogenous selection of the sample, which only contains unemployed people at $T_0$ (see Lung-Fei Lee, Chapter 16, in this volume). This sampling scheme is called left truncated, since compared to the previous scheme we have only retained the individuals with unemployment duration larger than $T_0 - T_i$. Conditional on $T_i$, the density of the pair $(y_i, d_i)$ becomes:

$$
l_i(y_i, d_i) = \frac{f_i(y_i)^{d_i}\, S_i(y_i)^{1-d_i}}{S_i(T_0 - T_i)}.
$$
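The effect of the truncation correction can be sketched as follows. Taking logs of the truncated density and using the fact that the density equals the hazard times the survivor function, each individual contributes the censored term minus the log survivor function evaluated at the duration already elapsed at $T_0$. The function name, the callables `log_hazard` and `log_survivor`, the constant-hazard example and the sample values are all assumptions introduced for illustration.

```python
import math
from typing import Callable, Sequence


def truncated_censored_loglik(y: Sequence[float], d: Sequence[int],
                              elapsed: Sequence[float],
                              log_hazard: Callable[[float], float],
                              log_survivor: Callable[[float], float]) -> float:
    """Loglikelihood under left truncation and right censoring.

    `elapsed[i]` is the duration already spent unemployed at the sampling
    date (T_0 minus the entry date); each term is
    d_i * log hazard(y_i) + log S(y_i) - log S(elapsed_i).
    """
    total = 0.0
    for yi, di, ai in zip(y, d, elapsed):
        total += di * log_hazard(yi) + log_survivor(yi) - log_survivor(ai)
    return total


RATE = 0.1  # hypothetical constant hazard (exponential durations)


def exp_log_hazard(t: float) -> float:
    return math.log(RATE)


def exp_log_survivor(t: float) -> float:
    return -RATE * t


# Hypothetical sample of people unemployed at T_0: durations in months.
y = [17.0, 8.0]        # observed durations measured from entry into unemployment
d = [0, 1]             # first spell right censored, second complete
elapsed = [6.0, 3.0]   # time already spent unemployed at T_0

ll_truncated = truncated_censored_loglik(y, d, elapsed,
                                         exp_log_hazard, exp_log_survivor)

# With a constant hazard, only the time spent after T_0 matters
# (memoryless property), so the same value is obtained from y_i - elapsed_i.
ll_residual = sum(di * math.log(RATE) - RATE * (yi - ai)
                  for yi, di, ai in zip(y, d, elapsed))
```

For non-constant hazards the two quantities generally differ, and omitting the division by the survivor function at the truncation point would overweight long spells in the sample.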