A COMPANION TO Theoretical Econometrics
Time series: a brief historical introduction
Time series data have been used since the dawn of empirical analysis in the mid-seventeenth century. In the "Bills of Mortality" John Graunt compared data on births and deaths over the period 1604-60 and across regions (parishes); see Stigler (1986). The time dimension of such data, however, was not properly understood during the early stages of empirical analysis. Indeed, it can be argued that the time dimension continued to bedevil statistical analysis for the next three centuries before it could be tamed in the context of proper statistical models for time series data.
The descriptive statistics period: 1665-1926
Figure 28.1 US industrial production index
Up until the last quarter of the nineteenth century the time dimension of observed data, and what it entails, was not apparent to the descriptive statistics literature, which concentrated almost exclusively on looking at histograms and certain associated numerical characteristics such as the mean and variance. By its very nature, histogram analysis and the associated descriptive statistics suppress the time dimension and concentrate on a single aspect of the data generating process, the distribution. Two other aspects raised by the time dimension, the dependence and heterogeneity with respect to the time index, were largely ignored, because implicit in this literature is the assumption that data exhibit independence and complete homogeneity. Questions concerning the temporal independence/homogeneity of time series data were first explicitly raised in the last quarter of the nineteenth century by Lexis and Bienaymé (see Heyde and Seneta, 1977, ch. 3). By the mid-nineteenth century it became apparent that comparisons over time required a certain stability (temporal independence/homogeneity) of the measurements being compared: births, deaths, accidents, suicides, and murders. The proposed tests for stability took the form of comparing the sample variance of the time series in question with that of a stable (independent and homogeneous) binomially distributed process. The results of such comparisons were very discouraging, however: with the exception of the ratio of male to female births, they suggested that all the observed time series exhibited some form of instability.
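The stability comparison described above can be illustrated with a small simulation. What follows is only a sketch in the spirit of that comparison, not a reconstruction of the original tests: the function names `lexis_ratio` and `simulate_counts`, the number of periods, the number of trials per period, the probabilities, and the random seed are all assumptions chosen for illustration. The ratio divides the sample variance of the observed proportions by the variance a stable binomial process with the same overall proportion would imply; values near one are consistent with stability, while values well above one point to instability.

```python
import random


def lexis_ratio(counts, trials):
    """Observed variance of the proportions relative to the binomial variance.

    counts: number of 'successes' observed in each period
    trials: number of trials per period (assumed equal across periods)
    """
    props = [c / trials for c in counts]
    p_bar = sum(props) / len(props)
    # Sample variance of the observed proportions (divisor T - 1).
    sample_var = sum((p - p_bar) ** 2 for p in props) / (len(props) - 1)
    # Variance of a proportion under independent, homogeneous binomial sampling.
    binomial_var = p_bar * (1 - p_bar) / trials
    return sample_var / binomial_var


def simulate_counts(probs, trials, rng):
    """Draw one binomial count per period, with success probability probs[t]."""
    return [sum(rng.random() < p for _ in range(trials)) for p in probs]


rng = random.Random(0)   # fixed seed, purely for reproducibility
T, n = 40, 2000          # hypothetical: 40 periods, 2000 trials per period

# Stable process: the same success probability in every period.
stable = simulate_counts([0.5] * T, n, rng)
# Unstable process: the success probability drifts upward over time.
drifting = simulate_counts([0.40 + 0.005 * t for t in range(T)], n, rng)

ratio_stable = lexis_ratio(stable, n)
ratio_drifting = lexis_ratio(drifting, n)
print(f"stable series:   ratio = {ratio_stable:.2f}")
print(f"drifting series: ratio = {ratio_drifting:.2f}")
```

Under these assumptions the drifting series should produce a ratio far above one, because the trend in the underlying probability inflates the sample variance relative to the binomial benchmark.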
This apparent instability was, at the time, associated with the cycles and trends exhibited by time series $\{y_1, y_2, \ldots, y_T\}$ when plotted over time (t-plot); Figure 28.1 shows a typical economic time series, the monthly US industrial production index for the period 1960-94. The two chance regularity patterns one can see in Figure 28.1 are the secular trend (an increasing function of t) and the cycles around this trend. These cycles become more apparent when the data are de-trended, as shown in Figure 28.2. By the end of the nineteenth century a time series was perceived as made up of three different components. As summarized by Davis:
The problem of single time series,..., is concerned with three things: first, the determination of a trend; second, the discovery and interpretation of cyclical movements in the residuals; third, the determination of the magnitude of the erratic element in the data. (Davis, 1941, p. 59)
In view of this, it was only natural that time series analysis focused on detecting and capturing this instability (trends and cycles). In the early twentieth century the attempt to model observed cycles took two alternative (but related) forms. The first attempted to capture the apparent cycles using sinusoidal functions (see Schuster, 1906). The objective was to discover any hidden periodicities using a technique appropriately called the periodogram. The second way to capture cycles came in the form of the temporal correlation, the autocorrelation coefficients, and their graph, the correlogram; see Granger and Newbold (1977). This was a simple adaptation of Galton's (contemporaneous) correlation coefficient.
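To make the two tools concrete, the following sketch computes sample autocorrelation coefficients (the ordinates of a correlogram) and a raw periodogram for an artificial series. It uses textbook definitions rather than the exact formulas of the early literature, and the function names, the series length of 120, and the period of 12 are assumptions made purely for illustration.

```python
import math


def autocorrelations(y, max_lag):
    """Sample autocorrelation coefficients r_1, ..., r_max_lag."""
    T = len(y)
    mean = sum(y) / T
    dev = [v - mean for v in y]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, T)) / c0
        for k in range(1, max_lag + 1)
    ]


def periodogram(y):
    """Raw periodogram as (period, ordinate) pairs at the Fourier frequencies."""
    T = len(y)
    mean = sum(y) / T
    dev = [v - mean for v in y]
    result = []
    for j in range(1, T // 2 + 1):
        w = 2 * math.pi * j / T  # j-th Fourier frequency
        a = sum(d * math.cos(w * t) for t, d in enumerate(dev))
        b = sum(d * math.sin(w * t) for t, d in enumerate(dev))
        result.append((T / j, (a * a + b * b) / T))
    return result


# Hypothetical data: a strictly periodic series with period 12 (e.g. monthly).
y = [math.sin(2 * math.pi * t / 12) for t in range(120)]

r = autocorrelations(y, 12)
peak_period, peak_value = max(periodogram(y), key=lambda pair: pair[1])

print("autocorrelation at lag 6: ", round(r[5], 3))
print("autocorrelation at lag 12:", round(r[11], 3))
print("period with the largest periodogram ordinate:", peak_period)
```

For a strictly periodic series such as this one, the periodogram concentrates at the true period, and the correlogram oscillates with the same period, negative at half a cycle and positive at a full cycle. Observed economic series are not strictly periodic, so neither pattern appears this cleanly in practice.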
The early empirical studies indicated that the periodogram appeared to be somewhat unrealistic for economic time series because the harmonic scheme assumes strict periodicity. The correlogram was also partly unsuccessful because the sample correlogram gave rise to spurious correlations when it was applied to economic time series. Various techniques were suggested at the time in an attempt to deal with the spurious correlation problem, the most widely used being to difference the series and evaluate the correlogram using the differenced series (see Norton, 1902; Hooker, 1905). The first important use of these techniques with economic data was by Moore (1914), who attempted to discover the temporal interdependence among economic time series using both the periodogram and the temporal correlations. It was felt at the time that the problem of spurious correlation arises from the fact that time series are functions of time, but there was no apparent functional form to be used in order to capture that effect; see Hendry and Morgan (1995). This literature culminated with the classic papers of Yule (1921, 1926), where the spurious correlation (and regression) problem was diagnosed as due to the apparent departures (exhibited by economic time series) from the assumptions required to render correlation analysis valid. Yule (1921) is a particularly important paper because it constitutes the first systematic attempt to relate misleading results of statistical analysis to the invalidity of the underlying probabilistic assumptions: misspecification. To this day, the view of spurious correlation as arising from departures from the probabilistic assumptions needed to render correlation analysis valid is insufficiently understood. This is because the modeler is often unaware of the underlying probabilistic assumptions whose validity renders the analysis reliable.
Such a situation arises when the statistical model is not explicitly specified and thus no assessment of the underlying assumptions can be conducted in order to ensure their validity. For instance, the sample (contemporaneous) correlation coefficient between two different series $\{(x_t, y_t),\ t = 1, 2, \ldots, T\}$ is a meaningful measure of first-order dependence only in cases where the means of both processes underlying the data are constant over t: $E(X_t) = \mu_x$, $E(Y_t) = \mu_y$, for all $t \in T$; otherwise the measure is likely to be misleading.
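The role of the constant-mean condition can be seen in a small simulation. The sketch below is illustrative only: the linear trends, their slopes, the noise variance, the sample size, and the seed are assumptions, and the helper names `correlation` and `difference` are hypothetical. The two series are generated independently of each other, yet both have means that grow with t, so the constant-mean condition fails. First differencing a linear trend leaves a constant mean, which is one way to read the differencing remedy used in the early literature.

```python
import random


def correlation(x, y):
    """Sample (contemporaneous) correlation coefficient between x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def difference(z):
    """First differences z_t - z_{t-1}."""
    return [b - a for a, b in zip(z, z[1:])]


rng = random.Random(1)  # fixed seed, purely for reproducibility
T = 200                 # hypothetical sample size

# Two independently generated series whose means are NOT constant over t:
# E(X_t) = 0.5 t and E(Y_t) = 0.3 t (assumed linear trends).
x = [0.5 * t + rng.gauss(0, 1) for t in range(T)]
y = [0.3 * t + rng.gauss(0, 1) for t in range(T)]

corr_levels = correlation(x, y)
# After differencing, each series has a constant mean (0.5 and 0.3).
corr_diffs = correlation(difference(x), difference(y))

print(f"correlation in levels:      {corr_levels:.3f}")
print(f"correlation in differences: {corr_diffs:.3f}")
```

Under these assumptions the correlation in levels is close to one although the two series share nothing but a dependence of their means on t, while the correlation between the differenced series should be small. The large value in levels reflects the invalid constant-mean assumption, not first-order dependence between the two processes.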