Using gretl for Principles of Econometrics, 4th Edition
Spurious Regressions
It is possible to estimate a regression and find a statistically significant relationship even if none exists. In time-series analysis this is actually a common occurrence when data are not stationary. This example uses two data series, rw1 and rw2, that were generated as independent random walks.
rw1: $y_t = y_{t-1} + v_{1t}$    (12.1)
rw2: $x_t = x_{t-1} + v_{2t}$
The errors are independent standard normal random deviates generated using a pseudo-random number generator. As you can see, $x_t$ and $y_t$ are not related in any way. To explore the empirical relationship between these unrelated series, load the spurious.gdt data and declare the data to be time-series.
open "@gretldir\data\poe\spurious.gdt"
setobs 1 1 --special-time-series
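The series in spurious.gdt were generated as independent random walks. A comparable pair can be simulated directly in gretl, as in the minimal sketch below. The names y_sim and x_sim and the seed value are assumptions made for illustration; the script will not reproduce the numbers in spurious.gdt, and how strong the apparent relationship looks will vary with the seed. Because nulldata replaces whatever dataset is open, run this separately from the main example.

```
# Sketch: simulate two independent random walks (illustrative names)
nulldata 700                        # empty dataset with 700 observations
setobs 1 1 --special-time-series    # declare the data to be time-series
set seed 1234567                    # arbitrary seed, for reproducibility only
series v1 = normal()                # independent standard normal errors
series v2 = normal()
series y_sim = cum(v1)              # y_t = y_{t-1} + v_{1t}, starting from zero
series x_sim = cum(v2)              # x_t = x_{t-1} + v_{2t}, starting from zero
ols y_sim const x_sim               # regress one walk on the other
```

The cumulative sum of white-noise errors is exactly a random walk with a zero starting value, which is why cum() is used here.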
The sample information at the bottom of the main gretl window indicates that the data have already been declared as time-series and that the full range (1-700) is in memory. The first thing to do is to plot the data using a time-series plot. To place both series in the same time-series graph, select View>Graph specified vars>Time-series plots from the pull-down menu. This will reveal the ‘define graph’ dialog box. Place both series into the ‘Selected vars’ box and click OK. The result appears in the top part of Figure 12.4 (after editing) below. The XY scatter plot is obtained similarly, except that you use View>Graph specified vars>X-Y scatter from the pull-down menu. Put rw1 on the y axis and rw2 on the x axis.
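The same two graphs can also be requested from a script instead of the menus. The following is a sketch, assuming spurious.gdt is already open and declared as time-series; it will not include the manual editing applied to the published figure.

```
# Time-series plot of both series on one graph
gnuplot rw1 rw2 --time-series --with-lines --output=display

# XY scatter: the last variable listed (rw2) goes on the x axis,
# rw1 on the y axis; --fit=linear overlays the least squares line
gnuplot rw1 rw2 --fit=linear --output=display
```

Using --output=display sends each graph to the screen rather than to a file.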
A linear regression confirms the apparent relationship between the two series. Left click on the scatter plot to reveal a pop-up menu, from which you choose Edit. This brings up the plot controls shown in Figure 4.16. Select the linear fit option to reveal the regression results in Table 12.1.
The coefficient on rw2 is positive (0.842) and significant (t = 40.84 > 1.96). However, these variables are not related to one another! The observed relationship is purely spurious. The cause of the spurious result is the nonstationarity of the two series. This is why you must check your data for stationarity whenever you use time-series data in a regression.
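The regression can also be run from a script rather than through the plot controls. The following is a minimal sketch, assuming spurious.gdt is already open; the scalar names b, se and tstat are illustrative and not part of the original example.

```
# Least squares regression of rw1 on a constant and rw2
ols rw1 const rw2

# Recover the slope, its standard error and the t-ratio from the accessors
scalar b = $coeff(rw2)
scalar se = $stderr(rw2)
scalar tstat = b/se
printf "slope = %.3f, t-ratio = %.2f, R-squared = %.3f\n", b, tstat, $rsq
```

The t-ratio is simply the coefficient divided by its standard error, which is how the value compared with 1.96 in the text is obtained.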
OLS, using observations 1-700
Dependent variable: rw1
             Coefficient   Std. Error   t-ratio   p-value
  const       17.818       0.620478     28.7167   0.0000
  rw2          0.842       0.0206196    40.8368   0.0000

Sum squared resid    51112.33    S.E. of regression    8.557268
R-squared            0.704943    Adjusted R-squared    0.704521
Table 12.1: Least squares estimation of a spurious relationship.
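The text attributes the spurious result to the nonstationarity of the two series. One informal way to explore this is to rerun the regression in first differences, since the difference of a random walk is just its error term. The sketch below assumes spurious.gdt is open; it is offered as an illustration rather than as part of the original example, and no particular output is claimed here.

```
# Create first differences; gretl names them d_rw1 and d_rw2
diff rw1 rw2

# Regress the differenced series on one another
# (the first observation is lost to differencing)
ols d_rw1 const d_rw2
```

If the two series are independent random walks, the differenced series are independent errors, so one would not generally expect a significant slope in this regression.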