Using gretl for Principles of Econometrics, 4th Edition
Cars Example
The data set cars.gdt is included in the package of datasets that are distributed with this manual. In most cases it is a good idea to print summary statistics for any new dataset you work with. This serves several purposes. First, if there is some problem with the dataset, the summary statistics may give you some indication: Is the sample size as expected? Are the means, minimums, and maximums reasonable? If not, you'll need to do some investigative work. Second, the summary statistics give you an idea of how the variables have been scaled, which is vitally important when it comes to making economic sense of the results. Do the magnitudes of the coefficients make sense? They also put you on the lookout for discrete variables, which require some care in interpretation.
The summary command is used to obtain summary statistics. These include the mean, minimum, maximum, standard deviation, coefficient of variation, skewness, and excess kurtosis. The corr command computes the simple correlations among your variables. These can be helpful in gaining an initial sense of whether variables are highly collinear. Other measures are more useful, but it never hurts to look at the correlations. Either command can be followed by a variable list to limit the set of variables summarized or correlated.
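For instance, to look at only a few variables you can give each command a list. The lines below are a minimal sketch using the cars.gdt file examined in this section (the path assumes a standard Windows installation of gretl, as in the script below):

    open "c:\Program Files\gretl\data\poe\cars.gdt"
    summary mpg wgt           # statistics for the listed variables only
    corr mpg cyl eng wgt      # correlations among a subset of the variables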
Consider the cars example from POE4. The script is
1 open "c:\Program Files\gretl\data\poe\cars.gdt"
2 summary
3 corr
4 ols mpg const cyl eng wgt
5 vif
The summary statistics appear below:
Summary Statistics, using the observations 1-392

  Variable        Mean        Median      Minimum     Maximum
  mpg             23.4459     22.7500     9.00000     46.6000
  cyl             5.47194     4.00000     3.00000     8.00000
  eng             194.412     151.000     68.0000     455.000
  wgt             2977.58     2803.50     1613.00     5140.00

  Variable        Std. Dev.   C.V.        Skewness    Ex. kurtosis
  mpg             7.80501     0.332894    0.455341    -0.524703
  cyl             1.70578     0.311733    0.506163    -1.39570
  eng             104.644     0.538259    0.698981    -0.783692
  wgt             849.403     0.285266    0.517595    -0.814241
and the correlation matrix
Correlation coefficients, using the observations 1-392
5% critical value (two-tailed) = 0.0991 for n = 392
             mpg       cyl       eng       wgt
          1.0000   -0.7776   -0.8051   -0.8322   mpg
                    1.0000    0.9508    0.8975   cyl
                              1.0000    0.9330   eng
                                        1.0000   wgt
The variables are quite highly correlated in the sample. For instance, the correlation between weight and engine displacement is 0.933. Cars with big engines are heavy. What a surprise!
The regression results are:
OLS, using observations 1-392
Dependent variable: mpg
               Coefficient     Std. Error     t-ratio     p-value
  const         44.3710         1.48069       29.9665     0.0000
  cyl           -0.267797       0.413067      -0.6483     0.5172
  eng           -0.0126740      0.00825007    -1.5362     0.1253
  wgt           -0.00570788     0.000713919   -7.9951     0.0000

  Mean dependent var   23.44592    Sum squared resid    7162.549
  S.E. of regression   4.296531    R-squared            0.699293
  Adjusted R-squared   0.696967    F(3, 388)            300.7635
  P-value(F)           7.6e-101    Log-likelihood      -1125.674
  Akaike criterion     2259.349    Schwarz criterion    2275.234
  Hannan-Quinn         2265.644
The test of the individual significance of cyl and eng can be read from the table of regression results. Neither is significant at the 5% level. The joint test of their significance is performed using the omit statement. The F-statistic is 4.298 and has a p-value of 0.0142. The null hypothesis is rejected in favor of their joint significance.
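If you want to reproduce that joint test yourself, the short sketch below (my own, not part of the chapter's script) does it two equivalent ways: omit re-estimates the model without cyl and eng and reports the F-statistic, while the restrict block imposes the two zero restrictions directly.

    ols mpg const cyl eng wgt
    omit cyl eng

    # the same joint test via zero restrictions on the 2nd and 3rd coefficients
    ols mpg const cyl eng wgt
    restrict
        b[2] = 0
        b[3] = 0
    end restrict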
The new command that requires explanation is vif. vif stands for variance inflation factor and it is used as a collinearity diagnostic by many programs, including gretl. The vif is closely related to the statistic suggested by Hill et al. (2011), who recommend using the R² from auxiliary regressions to determine the extent to which each explanatory variable can be explained as a linear function of the others. They suggest regressing x_j on all of the other independent variables and comparing the R² from this auxiliary regression to 0.80. If the R² exceeds 0.80, then there is evidence of a collinearity problem.
The vif actually reports the same information, but in a less straightforward way. The vif associated with the jth regressor is computed as

\[
\mathrm{vif}_j = \frac{1}{1 - R_j^2}
\]

which is, as you can see, simply a function of the R² from the jth auxiliary regression. Notice that an auxiliary R² of 0.80 corresponds to a vif of 5, while a vif greater than 10 is equivalent to an auxiliary R² greater than 0.90. The two rules of thumb are therefore closely related, even though their thresholds are not identical.
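You can verify the relationship by computing a vif manually from its auxiliary regression. The lines below are a sketch for the eng equation (the scalar names are arbitrary) and assume the cars data are open:

    # auxiliary regression for eng, then convert its R-squared to a vif
    ols eng const cyl wgt --quiet
    scalar r2_eng = $rsq
    scalar vif_eng = 1/(1 - r2_eng)
    printf "auxiliary R-squared = %.3f, vif = %.3f\n", r2_eng, vif_eng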
The output from gretl is shown below:
Variance Inflation Factors
Minimum possible value = 1.0
Values > 10.0 may indicate a collinearity problem

    cyl    10.516
    eng    15.786

VIF(j) = 1/(1 - R(j)^2), where R(j) is the multiple correlation coefficient between variable j and the other independent variables
Properties of matrix X'X:
1-norm = 4.0249836e+009
Determinant = 6.6348526e+018
Reciprocal condition number = 1.7766482e-009
Once again, the gretl output is very informative. It gives you the threshold for high collinearity (vif > 10) and the relationship between the vif and the R² from the auxiliary regression. Clearly, these data are highly collinear. Two of the variance inflation factors are above the threshold, and the one associated with wgt is fairly large as well.
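The X'X diagnostics at the bottom of the output can also be reproduced by hand. The sketch below is my own and assumes the cars data are open; it collects the regressors into a matrix using a named list and applies gretl's det() and rcond() functions.

    list xvars = const cyl eng wgt
    matrix X = { xvars }      # regressors as columns of a matrix
    matrix XX = X'X
    printf "Determinant = %g\n", det(XX)
    printf "Reciprocal condition number = %g\n", rcond(XX)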
6.4 Script
open "@gretldir\data\poe\andy.gdt"
square advert
ols sales const price advert sq_advert
restrict
    b[2]=0
    b[3]=0
    b[4]=0
end restrict
ols sales const price advert sq_advert
scalar sseu = $ess
scalar unrest_df = $df
ols sales const
scalar sser = $ess
scalar rest_df = $df
scalar J = rest_df - unrest_df
scalar Fstat=((sser-sseu)/J)/(sseu/(unrest_df))
pvalue F J unrest_df Fstat
# t-test
ols sales const price advert sq_advert
omit price
# optimal advertising
open "@gretldirdatapoeandy. gdt" square advert
ols sales const price advert sq_advert
scalar Ao =(1-$coeff(advert))/(2*$coeff(sq_advert))
# test of optimal advertising
restrict
    b[3]+3.8*b[4]=1
end restrict
open "@gretldirdatapoeandy. gdt" square advert
ols sales const price advert sq_advert
scalar Ao =(1-$coeff(advert))/(2*$coeff(sq_advert))
# One-sided t-test
ols sales const price advert sq_advert --vcv
scalar r = $coeff(advert)+3.8*$coeff(sq_advert)-1
scalar v = $vcv[3,3]+((3.8)^2)*$vcv[4,4]+2*(3.8)*$vcv[3,4]
scalar t = r/sqrt(v)
pvalue t $df t
# joint test
ols sales const price advert sq_advert
restrict
    b[3]+3.8*b[4]=1
    b[1]+6*b[2]+1.9*b[3]+3.61*b[4]=80
end restrict
# restricted estimation
open "@gretldirdatapoebeer. gdt" logs q pb pl pr i
ols l_q const l_pb l_pl l_pr l_i —quiet restrict
b2+b3+b4+b5=0 end restrict restrict
b[2]+b[3]+b[4]+b[5]=0 end restrict
# model specification -- relevant and irrelevant vars open "@gretldirdatapoeedu_inc. gdt"
ols faminc const he we omit we
corr
list all_x = const he we kl6 xtra_x5 xtra_x6
ols faminc all_x
# reset test
ols faminc const he we kl6
reset --quiet --squares-only
reset --quiet
# model selection rules and a function
function matrix modelsel (series y, list xvars)
    ols y xvars --quiet
    scalar sse = $ess
    scalar N = $nobs
    scalar K = nelem(xvars)
    scalar aic = ln(sse/N)+2*K/N
    scalar bic = ln(sse/N)+K*ln(N)/N
    scalar rbar2 = 1-((1-$rsq)*(N-1)/$df)
    matrix A = { K, N, aic, bic, rbar2 }
    printf "\nRegressors: %s\n", varname(xvars)
    printf "K = %d, N = %d, AIC = %.4f, SC = %.4f, and Adjusted R2 = %.4f\n", K, N, aic, bic, rbar2
    return A
end function
list x1 = const he
list x2 = const he we
list x3 = const he we kl6
list x4 = const he we xtra_x5 xtra_x6
matrix a = modelsel(faminc, x1)
matrix b = modelsel(faminc, x2)
matrix c = modelsel(faminc, x3)
matrix d = modelsel(faminc, x4)
matrix MS = a|b|c|d
colnames(MS, "K N AIC SC Adj_R2")
printf "%10.5g", MS
function modelsel clear
ols faminc all_x
modeltab add
omit xtra_x5 xtra_x6
modeltab add
omit kl6
modeltab add
omit we
modeltab add
modeltab show
ols faminc x3 --quiet
reset
# collinearity
open "@gretldirdatapoecars. gdt"
summary
corr
ols mpg const cyl
ols mpg const cyl eng wgt --quiet
omit cyl
ols mpg const cyl eng wgt --quiet
omit eng
ols mpg const cyl eng wgt --quiet
omit eng cyl
# Auxiliary regressions for collinearity
# Check: r2 > .8 means severe collinearity
ols cyl const eng wgt
scalar r1 = $rsq
ols eng const wgt cyl
scalar r2 = $rsq
ols wgt const eng cyl
scalar r3 = $rsq
printf "R-squares for the auxiliary regressions\nDependent Variable:\n cylinders %3.3g\n engine displacement %3.3g\n weight %3.3g\n", r1, r2, r3
ols mpg const cyl eng wgt
vif