Food Expenditure Example
food_exp_i = β1 + β2 income_i + e_i,   i = 1, 2, ..., N   (8.2)
where food_exp_i is food expenditure and income_i is income of the ith individual. When the errors of the model are heteroskedastic, the least squares estimator of the coefficients is still consistent. That means that the least squares point estimates of the intercept and slope are useful. However, when the errors are heteroskedastic the usual least squares standard errors are inconsistent and therefore should not be used to form confidence intervals or to test hypotheses.
To use least squares estimates with heteroskedastic data, at a very minimum, you'll need a consistent estimator of their standard errors in order to construct valid tests and intervals. A simple computation proposed by White accomplishes this. Standard errors computed using White's technique are loosely referred to as robust, though one has to be careful when using this term; the standard errors are robust to the presence of heteroskedasticity in the errors of model (but not necessarily other forms of model misspecification).
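In matrix form, White's idea is to replace the textbook variance formula for the least squares estimator b with a "sandwich" built from the squared least squares residuals ê_i² (this is the HC0 variant; HC1 simply rescales by N/(N − k)):

```latex
\widehat{\mathrm{Var}}(b) \;=\; (X'X)^{-1}\Bigl(\textstyle\sum_{i=1}^{N}\hat{e}_i^{\,2}\,x_i x_i'\Bigr)(X'X)^{-1}
```

The middle term estimates the variability induced by the heteroskedastic errors without requiring any model of how the error variance changes with income.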
Open the food.gdt data in gretl and estimate the model using least squares.
1 open "@gretldir\data\poe\food.gdt"
2 ols food_exp const income
3 gnuplot food_exp income --linear-fit
This yields the usual least squares estimates of the parameters, but produces the wrong standard errors when the data are heteroskedastic. To get an initial idea of whether this might be the case a plot of the data is generated and the least squares line is graphed. If the data are heteroskedastic with respect to income then you will see more variation around the regression line for some levels of income. The graph is shown in Figure 8.1 and this appears to be the case. There is significantly more variation in the data for high incomes than for low.
To obtain the heteroskedasticity-robust standard errors, simply add the --robust option to the regression as shown in the following gretl script. After issuing the --robust option, the standard errors stored in the accessor $stderr(income) are the robust ones.
1 ols food_exp const income --robust
2 # confidence intervals (Robust)
3 scalar lb = $coeff(income) - critical(t,$df,0.025) * $stderr(income)
4 scalar ub = $coeff(income) + critical(t,$df,0.025) * $stderr(income)
5 printf "\nThe 95%% confidence interval is (%.3f, %.3f).\n", lb, ub
In the script, we have used the critical(t,$df,0.025) function to get the desired critical value from the t-distribution. Remember, the degrees of freedom from the preceding regression are stored in $df. The first argument in the function indicates the desired distribution, and the last is the desired right-tail probability (α/2 in this case).
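For instance, this regression has N − 2 = 38 degrees of freedom, so the critical value can be checked directly (the value printed should be approximately 2.024):

```
scalar tc = critical(t, 38, 0.025)
print tc
```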
The script produces
Figure 8.1: Plot of food expenditures against income with least squares fit.

The 95% confidence interval is (6.391, 14.028).
This can also be done from the pull-down menus. Select Model>Ordinary Least Squares (see Figure 2.6) to generate the dialog to specify the model shown in Figure 8.2 below. Note that the check box to generate ‘robust standard errors’ is circled. You will also notice that there is a button labeled Configure just to the right of the ‘Robust standard errors’ check box. Clicking this button reveals a dialog from which several options can be selected. In this case, we can select the particular method that will be used to compute the robust standard errors and even set robust standard errors to be the default computation for least squares. This dialog box is shown in Figure 8.3 below.
To reproduce the results in Hill et al. (2011), you’ll want to select HC1 from the pull-down list. As you can see, other gretl options can be selected here that affect the default behavior of the program. The particular variant it uses depends on which dataset structure you have defined for your data. If none is defined, gretl assumes you have cross-sectional data.
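The same choice can be made in a script rather than through the dialog: gretl's set hc_version command selects the variant that --robust uses for cross-sectional data. To match Hill et al. (2011), for example:

```
set hc_version 1
ols food_exp const income --robust
```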
The model results for the food expenditure example appear in the table below. After estimating the model using the dialog, you can use Analysis>Confidence intervals for coefficients to generate 95% confidence intervals. Since you used the robust option in the dialog, these will be based on the variant of White’s standard errors chosen using the ‘configure’ button. In this case, I chose HC3, which some suggest performs slightly better in small samples. The result is:
VARIABLE    COEFFICIENT    95% CONFIDENCE INTERVAL
const       83.4160        25.4153    141.417
income      10.2096        6.39125    14.0280
Mean dependent var   283.5735    Sum squared resid   304505.2
R-squared            0.385002    F(1, 38)            29.29889
Of course, you can also generate graphs from a script, which in this case is:
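Reconstructed from the line-by-line description that follows (the output path is illustrative), the script runs along these lines:

```
1 ols food_exp const income
2 series res = $uhat
3 setinfo res -d "Least Squares Residuals" -n "Residual"
4 gnuplot res income --output=c:\Temp\olsres
```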
In this script we continue to expand the use of gretl functions. The residuals are saved in line 2. Then in line 3 the setinfo command is used to change the description and the graph label using the -d and -n switches, respectively. Then gnuplot is called to plot res against income. This time the output is directed to a specific file. Notice that no suffix was necessary. To view the file in MS Windows, simply launch wgnuplot and load 'c:\Temp\olsres'.
Another graphical method that shows the relationship between the magnitude of the residuals and the independent variable is shown below:
1 series abs_e = abs(res)
2 setinfo abs_e -d "Absolute value of the LS Residuals" \
3    -n "Absolute Value of Residual"
4 gnuplot abs_e income --loess-fit --output=c:\temp\loessfit.plt
The graph appears in Figure 8.5. To generate this graph two things have been done. First, the absolute values of the least squares residuals have been saved to a new variable called abs_e. Then these are plotted against income as a scatter plot and as a locally weighted, smoothed scatterplot estimated by a process called loess.
The basic idea behind loess is to create a new variable that, for each value of the dependent variable, y_i, contains the corresponding smoothed value, ŷ_i. The smoothed values are obtained by running a regression of y on x using only the data point (x_i, y_i) and a few of the data points near it. In loess, the regression is weighted so that the central point (x_i, y_i) gets the highest weight and points that are farther away (based on the distance |x_j − x_i|) receive less weight. The estimated regression line is then used to predict the smoothed value ŷ_i for y_i only. The procedure is repeated to obtain the remaining smoothed values, which means that a separate weighted regression is performed for every point in the data. Obviously, if your data set is large, this can take a while. Loess is said to be a desirable smoother because it tends to follow the data. Polynomial smoothing methods, for instance, are global in that what happens on the extreme left of a scatterplot can affect the fitted values on the extreme right.
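Concretely, the smoothed value at x_i solves a local weighted least squares problem of the form

```latex
\min_{a,b}\ \sum_{j \in \mathcal{N}(i)} w_{ij}\,(y_j - a - b\,x_j)^2,
\qquad \hat{y}_i = \hat{a} + \hat{b}\,x_i
```

where the neighborhood N(i) and the weights w_ij shrink with the distance |x_j − x_i|. Loess implementations conventionally use tricube weights and may fit a local quadratic rather than a line; the particular choices gretl makes behind --loess-fit are not spelled out here.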
One can see from the graph in Figure 8.5 that the residuals tend to get larger as income rises, reaching a maximum at 28. The residual for an observation having the largest income is relatively small and the locally smoothed prediction causes the line to start trending downward.