THE ECONOMETRICS OF MACROECONOMIC MODELLING
A large-scale EqCM model and four dVAR type forecasting systems based on differenced data
Section 11.2.1 brought out that even for very simple systems, it is in general difficult to predict which version of the model is going to have the smallest forecast error, the EqCM or the dVAR. While the forecast errors of the dVAR are robust to changes in the adjustment coefficient a and the long-run mean Z, the dVAR forecast error may still turn out to be larger than the EqCM forecast error. Typically, this is the case if the parameter change (included in the EqCM) is small relative to the contribution of the equilibrium-correcting term (which is omitted in the dVAR) at the start of the forecast period.
In the following, we generate multi-period forecasts from the econometric model RIMINI, and compare these to the forecasts from models based on differenced data. In order to provide some background to those simulations, this section first describes the main features of the incumbent EqCM and then explains how we have designed the dVAR forecasting systems.
The incumbent EqCM model: eRIM
The quarterly macroeconometric model RIMINI has 205 equations[110] which can be divided into three categories:
• 146 definitional equations, for example, national accounting identities, composition of the work-force, etc;
• 33 estimated ‘technical’ equations, for example, price indices with different base years and equations that serve special reporting purposes (with no feedback to the rest of the model);
• 26 estimated stochastic equations, representing economic behaviour.
The first two groups of equations are identical in RIMINI and the dVAR versions of the model. It is the specifications of the 26 econometric equations that distinguish the models. Together they contain putative quantitative knowledge about behaviour relating to aggregate outcomes, for example, consumption, savings, and household wealth; labour demand and unemployment; wage and price interactions (inflation); capital formation; foreign trade. Seasonally unadjusted data are used for the estimation of the equations. To a large extent, macroeconomic interdependencies are contained in the dynamics of the model. For example, prices and wages Granger-cause output, trade, and employment, and likewise the level of real activity feeds back onto wage-price inflation. The model is an open system: examples of important non-modelled variables are the level of economic activity of trading partners, as well as inflation and wage costs in those countries. Indicators of economic policy (the level of government expenditure, the short-term interest rate, and the exchange rate) are also non-modelled, and the forecasts are therefore conditional on a particular scenario for these variables. In the following, we refer to the incumbent version of RIMINI as eRIM.
Two full scale dVAR models: dRIM and dRIMc
Because all the stochastic equations in RIMINI are in equilibrium correction form, a simple dVAR version of the model, dRIM, can be obtained by omitting the equilibrium-correcting terms from the equations and re-estimating the coefficients of the remaining (differenced) variables. Omission of significant equilibrium-correcting terms means that the resulting differenced equations become mis-specified, with autocorrelated and heteroskedastic residuals. From one perspective, this is not a big problem: the main thrust of the theoretical discussion is that the dVAR is indeed mis-specified within sample; cf. the error term $e_{y,t}$ in the dVAR equation (11.11), which is autocorrelated provided that there is some autocorrelation in the disequilibrium term in (11.7). The dVAR might still forecast better than the EqCM, if the coefficients relating to the equilibrium-correcting terms change in the forecast period. That said, having a mis-specified dVAR does put that model at a disadvantage compared to the EqCM. Section 11.2.1 suggests that simply omitting the levels term while retaining the intercept may seriously damage the dVAR forecasts. Hence we decided to re-model all the affected equations, in terms of differences alone, in order to make the residuals of the dVAR equations empirically white noise. The intercept was only retained for levels variables. This constitutes the backbone of the dRIMc model.
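The mechanical step behind dRIM (drop the levels term, keep the intercept, re-estimate) can be sketched on a toy two-variable system. This is only an illustration under assumed values: the data-generating process, the parameters `alpha` and `mu`, and the helper `ols` are hypothetical and have nothing to do with the actual RIMINI equations.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 400
alpha, mu = 0.3, 1.0   # assumed adjustment coefficient and long-run mean

# Toy data: x is a random walk, y equilibrium-corrects towards x + mu.
x = np.cumsum(rng.normal(size=T))
y = np.empty(T)
y[0] = x[0] + mu
for t in range(1, T):
    gap_lag = y[t - 1] - x[t - 1] - mu
    y[t] = (y[t - 1] + 0.5 * (x[t] - x[t - 1])
            - alpha * gap_lag + rng.normal(scale=0.5))

dy, dx = np.diff(y), np.diff(x)
levels_lag = (y - x)[:-1]          # y_{t-1} - x_{t-1}
const = np.ones_like(dy)

def ols(regressors, target):
    """Least-squares coefficients and residuals."""
    coef, *_ = np.linalg.lstsq(regressors, target, rcond=None)
    return coef, target - regressors @ coef

# EqCM: intercept, differenced regressor, and lagged levels term.
eqcm_coef, eqcm_resid = ols(np.column_stack([const, dx, levels_lag]), dy)
# dRIM-style dVAR: drop the levels term, keep the intercept, re-estimate.
dvar_coef, dvar_resid = ols(np.column_stack([const, dx]), dy)

print("EqCM levels coefficient:", eqcm_coef[2])
print("residual std, EqCM vs dVAR:", eqcm_resid.std(), dvar_resid.std())
```

In this sketch the dVAR residuals absorb the omitted disequilibrium term, so their standard deviation cannot be smaller than that of the EqCM residuals. A dRIMc-style re-modelling would go further and search for a differences-only specification with white-noise residuals, which is not attempted here.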
Two univariate models: dAR and dARr
All three model versions considered so far are ‘system of equations’ forecasting models. For comparison, we have also prepared single equation forecasts for each variable. The first set of single equation forecasts is dubbed dAR, and is based on unrestricted estimation of AR(4) models. Finally, we generate forecasts from a completely restricted fourth-order autoregressive model: forecasts are generated from $\Delta_4 \Delta \ln X_t = 0$, for a variable $X_t$ that is among the endogenous variables in the original model. This set of forecasts is called dARr, where the r is a reminder that the forecasts are based on (heavily) restricted AR(4) processes. Both dAR and dARr are specified without drift terms, hence their forecasts are protected against trend-misrepresentation. Thus, we will compare forecast errors from five forecasting systems.
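The restricted rule is easy to state as a recursion. Reading it as $\Delta_4 \Delta \ln X_t = 0$ and expanding both difference operators gives $\ln X_t = \ln X_{t-1} + \ln X_{t-4} - \ln X_{t-5}$, so the rule needs no estimated parameters at all. The following is a minimal sketch; the function name `darr_forecast` is hypothetical.

```python
import numpy as np

def darr_forecast(log_x, horizon):
    """Dynamic forecasts from the rule D4 D1 log(X_t) = 0.

    Expanding the two difference operators gives
        log X_t = log X_{t-1} + log X_{t-4} - log X_{t-5},
    so each step uses the five most recent (observed or forecasted) values.
    """
    history = [float(v) for v in log_x]
    n_obs = len(history)
    if n_obs < 5:
        raise ValueError("need at least five observations")
    for _ in range(horizon):
        history.append(history[-1] + history[-4] - history[-5])
    return np.array(history[n_obs:])
```

Because the rule extrapolates the last observed annual growth rate quarter by quarter, it reproduces exactly any series made up of a linear trend plus a fixed seasonal pattern, which is one way to see why it suits seasonally unadjusted data.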
Table 11.1 summarises the five models in terms of the incumbent ‘baseline’ EqCM model and the four ‘rival’ dVAR type models.
Relative forecast performance 1992(1)-1994(4)
All models that enter this exercise were estimated on a sample ending in 1991(4). The period 1992(1)-1994(4) is used for forecast comparisons. That period saw the start of a marked upswing in the Norwegian economy. Hence, several of the model-endogenous variables change substantially over the 12-quarter forecast period.
Here, we first use graphs to illustrate how eRIM forecasts the interest rate level (RLB), housing price growth ($\Delta_4 ph$), the rate of inflation ($\Delta_4 cpi$), and the level of unemployment (UTOT) compared to the four dVARs: dRIM, dRIMc, dAR, and dARr. We evaluate three dynamic forecasts, distinguished by the start period: the first forecast is for the whole 12-quarter horizon, so the first period being forecasted is 1992(1). The second simulation starts in 1993(1) and the third in 1994(1). Furthermore, all forecasts are conditional on the actual values of the models’ exogenous variables and the initial conditions, which of course change accordingly when we initialise the forecasts in different start periods.

Table 11.1
The models used in the forecasts
Model    Description
eRIM     26 behavioural equations, equilibrium-correcting equations
         33 + 146 technical and definitional equations
dRIM     26 behavioural equations, re-estimated after omitting level terms
         33 + 146 technical and definitional equations
dRIMc    26 behavioural equations, remodelled without levels-information
         33 + 146 technical and definitional equations
dAR      71 equations modelled as 4th-order AR models
dARr     71 equations modelled as restricted 4th-order AR models
The results are summarised in Figures 11.1-11.3. Figure 11.1 shows actual and forecasted values from the 12-quarter dynamic simulation. Looking at the graph for the interest rate first, the poor forecast from the dRIM model is immediately evident. Remember that this model was set up by deleting all the levels terms in the individual EqCM equations, and then re-estimating these mis-specified equations on the same sample as in eRIM. Hence, dRIM imposes a large number of unit roots while retaining the intercepts, and there is no attempt to patch up the resulting mis-specification. Not surprisingly, dRIM is a clear loser on all four variables in Figure 11.1. This turns out to be a typical result: it is very seldom that a variable is forecasted more accurately with dRIM than with dRIMc, the re-modelled dVAR version of eRIM.
Turning to dRIMc vs. eRIM, one sees that for the 12-quarter dynamic forecasts in Figure 11.1, the incumbent equilibrium-correcting model seems to outperform dRIMc for interest rates, growth in housing prices, and the inflation rate. However, dRIMc beats the EqCM when it comes to forecasting the rate of unemployment.
One might wonder how it is possible for dRIMc to be accurate about unemployment in spite of the poor inflation forecasts. The explanation is found by considering eRIM, where the level of unemployment affects inflation, but where there is very little feedback from inflation per se on economic activity. In eRIM, the level of unemployment only reacts to inflation to the extent that inflation accrues to changes in level variables, such as the effective real exchange rates or real household wealth. Hence, if eRIM generated inflation forecast errors of the same size that we observe for dRIMc, that would be quite damaging for the unemployment forecasts of that model as well. However, this mechanism is not present in dRIMc, since all levels terms have been omitted. Hence, the unemployment forecasts of the dVAR versions of RIMINI are effectively insulated from the errors in the inflation forecast. In fact, the figures confirm the empirical relevance of Hendry’s (1997a) claim that when the data generating mechanism is unknown and non-constant, models with less causal content (dRIMc) may still outperform the model that contains a closer representation of the underlying mechanism (eRIM). The univariate forecasts, dAR and dARr, are also way off the mark for the interest rate and for the unemployment rate. However, the forecast rule $\Delta_4 \Delta cpi_t = 0$, in dARr, predicts a constant inflation rate that yields a quite good forecast for inflation in this period; see Figure 11.1.
Figure 11.2. The period 1993(1)-1994(4) forecasts and actual values for the interest rate level (RLB), housing price growth ($\Delta_4 ph$), the rate of inflation ($\Delta_4 cpi$), and the level of unemployment (UTOT)

Figure 11.2 shows the dynamic forecasts for the same selection of variables, but now the first forecast period is 1993(1). For the interest rate, the ranking of dRIMc and eRIM forecasts is reversed from Figure 11.1: dRIMc is spot on for most of the forecast horizon, while eRIM consistently overpredicts. Evidently, dRIMc uses the information embodied in the actual development in 1992 much more efficiently than eRIM. The result is a good example of the intercept correction provided by differencing. Equations (11.34) and (11.35) show that if the parameters of the EqCM change prior to the start of the forecast (i.e. in 1992 in the present case), then the dVAR might constitute the better forecasting model. Since the loan interest rate is a major explanatory variable for housing price growth (in both eRIM and dRIMc), it is not surprising that the housing price forecasts of dRIMc are much better than in Figure 11.1. That said, we note that, with the exception of 1993(4) and 1994(2), eRIM forecasts housing prices better than dRIMc, which is evidence of countervailing forces in the forecasts for housing prices. The impression of the inflation forecasts is virtually the same as in the previous figure, while the graph of actual and forecasted unemployment shows that eRIM wins on this forecast horizon.
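The intercept-correction effect of differencing can be illustrated with a noise-free toy process. The sketch below is not a reproduction of equations (11.34) and (11.35); it assumes a single variable that adjusts towards a long-run mean, a shift in that mean 12 periods before the forecast origin, and an EqCM forecaster who still uses the old mean. All parameter values are hypothetical.

```python
import numpy as np

alpha = 0.3                    # assumed adjustment speed
mu_old, mu_new = 1.0, 3.0      # assumed long-run mean before / after the break

# Noise-free path: the mean shifts 12 periods before the forecast origin T.
y = [mu_old]
for _ in range(12):
    y.append(y[-1] - alpha * (y[-1] - mu_new))
y_T = y[-1]

horizon = np.arange(1, 9)
decay = (1 - alpha) ** horizon

actual = mu_new + decay * (y_T - mu_new)   # true path after T
eqcm = mu_old + decay * (y_T - mu_old)     # EqCM forecast, still using mu_old
dvar = np.full(horizon.shape, y_T)         # difference rule: Delta y = 0

eqcm_err = actual - eqcm
dvar_err = actual - dvar
for h, e1, e2 in zip(horizon, eqcm_err, dvar_err):
    print(f"h={h}: EqCM error {e1:.3f}, dVAR error {e2:.3f}")
```

Under these assumptions the EqCM forecast is pulled back towards the outdated mean, while the difference rule starts from the last observation and therefore already embodies most of the shift.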
The 4-period forecasts are shown in Figure 11.3, where the simulation starts in 1994(1). Interestingly, the eRIM interest rate forecasts have now adjusted as well. This indicates that the parameter instability that damaged the forecasts that started in 1993(1) turned out to be a transitory shift. dRIMc now outperforms the housing price forecasts of eRIM. The improved accuracy of dARr as the forecast period is moved forward in time is very clear. It is only for the interest rate that dARr is still very badly off target. The explanation is probably that using $\Delta_4 \Delta x_t = 0$ to generate forecasts works reasonably well for series with a clear seasonal pattern, but not for interest rates. This is supported by noting the better interest rate forecast of dAR, the unrestricted AR(4) model.
Figure 11.3. The period 1994(1)-1994(4) forecasts and actual values for the interest rate level (RLB), housing price growth ($\Delta_4 ph$), the rate of inflation ($\Delta_4 cpi$), and the level of unemployment (UTOT)

The relative accuracy of the eRIM forecasts might be confined to the four variables covered by Figures 11.1-11.3. In Eitrheim et al. (1999) we therefore compare the forecasting properties of the five different models on a larger (sub)set of 43 macroeconomic variables. The list includes most of the variables that are regularly forecasted, such as GDP growth, the trade balance, wages, and productivity.
Eitrheim et al. (1999) follow convention and use the empirical root mean square forecast errors (RMSFE). The theoretical rationale for RMSFE is the mean squared forecast error (MSFE)
$$\mathrm{MSFE}_{\mathrm{mod}} = E\left[(y_{T+h} - \hat{y}_{T+h,\mathrm{mod}})^2 \mid I_T\right],$$
where $\hat{y}_{T+h,\mathrm{mod}} = E[y_{T+h} \mid I_T]$ and mod is either dVAR or EqCM. The MSFE can be rewritten as
$$\mathrm{MSFE}_{\mathrm{mod}} = \mathrm{bias}^2_{T+h,\mathrm{mod}} + \mathrm{Var}[y_{T+h} \mid I_T].$$
Conditional on the same information set $I_T$, the model with the largest squared bias has also the highest MSFE, and consequently the highest squared RMSFE.[111]
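For a single variable, the empirical RMSFE and a sample analogue of the bias/variance split can be computed as follows. This is a minimal sketch: the function names are hypothetical, and the split uses the sample mean and sample variance of the realised forecast errors, which is an assumption made for illustration and not the conditional decomposition itself.

```python
import numpy as np

def rmsfe(actual, forecast):
    """Root mean square forecast error over the forecast period."""
    errors = np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.mean(errors ** 2)))

def squared_bias_and_variance(actual, forecast):
    """Split the mean squared error into squared mean error and error variance.

    With the population variance (ddof=0) the two parts add up exactly to
    the mean squared forecast error.
    """
    errors = np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float)
    return float(errors.mean() ** 2), float(errors.var())
```

A forecast that is off by a constant amount in every period has all of its squared error in the bias part, whereas errors that average out to zero show up only in the variance part.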
Table 11.2 shows the placements of the five models in the 43 horse races. The incumbent model has the lowest RMSFE for 24 out of the 43 variables, and also has 13 second places. Hence eRIM comes out best or second best in 86% of the horse races, and seems to be a clear winner on this score. The two ‘difference’ versions of the large econometric model (dRIMc and dRIM) have very different fates. dRIMc, the version where each behavioural equation is carefully re-modelled in terms of differences, is a clear second best, while dRIM is just as clear a loser, with 27 bottom positions. Comparing the two sets of univariate forecasts, it seems that the restricted version ($\Delta_4 \Delta x_t = 0$) behaves better than the unrestricted AR model. Finding that the very simple forecasting rule in dARr outperforms the full model in 6 instances (and is runner-up in another 8) in itself suggests that it can be useful as a baseline and yardstick for the model-based forecasts.
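Counting placements of this kind amounts to ranking the models by RMSFE for each variable and tallying how often each model ends up in each position. The sketch below assumes the RMSFEs are held in a variables-by-models array; the function name `placements` and the numbers in the usage note are made up for illustration and are not the results reported in Table 11.2.

```python
import numpy as np

def placements(rmsfe_table, models):
    """Count 1st, 2nd, ... places per model.

    rmsfe_table has one row per variable and one column per model.
    Ties are broken by column order.
    """
    table = np.asarray(rmsfe_table, dtype=float)
    # argsort applied twice turns each row into ranks, 0 = lowest RMSFE.
    ranks = table.argsort(axis=1).argsort(axis=1)
    return {
        model: [int(np.sum(ranks[:, j] == place)) for place in range(len(models))]
        for j, model in enumerate(models)
    }
```

For example, with three hypothetical variables and three models, `placements([[1.0, 2.0, 3.0], [2.0, 1.0, 3.0], [1.0, 3.0, 2.0]], ["A", "B", "C"])` gives model A two first places and one second place.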
Table 11.2 Results of 43 RMSFE forecast contests
Parts (b)-(d) in Table 11.2 collect the results of three 4-quarter forecast contests. Interestingly, several facets of the picture drawn from the 12-quarter forecasts and the graphs in Figures 11.1-11.3 appear to be modified. Although the incumbent eRIM model collects a majority of first and second places, it is beaten by the double difference model $\Delta_4 \Delta x_t = 0$, dARr, in terms of first places in two of the three contests. This shows that the impression from the ‘headline’ graphs, namely that dARr works much better for the 1994(1)-1994(4) forecast than for the forecast that starts in 1992, carries over to the larger set of variables covered by Table 11.2. In this way, our result shows in practice what the theoretical discussion foreshadowed, namely that forecasting systems that are blatantly mis-specified econometrically can nevertheless forecast better than the econometric model with a higher causal content.
The results seem to corroborate the analytical results above. For short forecast horizons, for example 4 quarters, simple univariate dARr models offer much more protection against pre-forecast breaks than the other models, and their forecast errors are also insulated from forecast errors elsewhere in a larger system. However, the dARr model seems to lose this advantage relative to the other models as we increase the forecast horizon. The autonomous growth bias in dVAR type models tends to multiply as we increase the forecast horizon, causing the forecast error variance to ‘explode’. Over long forecast horizons we would then typically see huge dVAR biases relative to the EqCM forecast bias. Finally, none of the models protects against breaks that occur after the forecast is made.
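The horizon effect can be made concrete with a deliberately simple case. It is an assumption made for illustration, not the system analysed above: a dVAR-type rule forecasts with an intercept $\gamma$, while the variable in fact stays close to its current level, so that $E[y_{T+h} \mid I_T] \approx y_T$.

```latex
\hat{y}_{T+h,\mathrm{dVAR}} = y_T + h\gamma,
\qquad
\mathrm{bias}_{T+h,\mathrm{dVAR}}
  = E[y_{T+h} \mid I_T] - \hat{y}_{T+h,\mathrm{dVAR}}
  \approx -h\gamma .
```

In this stylised case the bias grows linearly with the horizon $h$ and the squared bias with $h^2$, whereas a rule without a drift term has no such component.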