Using gretl for Principles of Econometrics, 4th Edition

# Heteroskedasticity in the Linear Probability Model

In chapter 7 we introduced the linear probability model. It was shown that the indicator variable, $y_i$, is heteroskedastic. That is,

$$\mathrm{var}(y_i) = \pi_i(1-\pi_i) \qquad (8.10)$$

where $\pi_i$ is the probability that the dependent variable is equal to 1 (the choice is made). The estimated variance is

$$\widehat{\mathrm{var}}(y_i) = \hat{\pi}_i(1-\hat{\pi}_i) \qquad (8.11)$$
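Equation (8.10) follows directly from the fact that $y_i$ is a Bernoulli random variable taking the value 1 with probability $\pi_i$; a short derivation:

$$E(y_i) = \pi_i, \qquad E(y_i^2) = \pi_i, \qquad \mathrm{var}(y_i) = E(y_i^2) - [E(y_i)]^2 = \pi_i - \pi_i^2 = \pi_i(1-\pi_i)$$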

This can be used to perform feasible GLS. The cola marketing data coke.gdt is the basis for this example. The dependent variable, coke, takes the value 1 if the individual purchases Coca-Cola and 0 if not. The decision to purchase Coca-Cola depends on the ratio of the Coke price relative to the Pepsi price (pratio), and on whether displays for Coca-Cola or Pepsi were present: disp_coke=1 if a Coca-Cola display was present, otherwise 0; disp_pepsi=1 if a Pepsi display was present, otherwise 0.
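In equation form, the linear probability model for this example can be sketched as (coefficient numbering is an assumption for illustration):

$$P(\mathtt{coke}_i = 1) = E(\mathtt{coke}_i) = \beta_1 + \beta_2\,\mathtt{pratio}_i + \beta_3\,\mathtt{disp\_coke}_i + \beta_4\,\mathtt{disp\_pepsi}_i$$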

First, the data are loaded and the summary statistics are provided.

```
open "@gretldir\data\poe\coke.gdt"
summary --simple
list x = const pratio disp_coke disp_pepsi
```

The --simple option is used for the summary command. Then a list is created that contains the names of the independent variables to be used in the estimated models. The basic summary statistics are:

Summary statistics, using the observations 1 - 1140

```
              Mean     Minimum   Maximum   Std. Dev.
coke        0.44737   0         1         0.49744
pr_pepsi    1.2027    0.68      1.79      0.30073
pr_coke     1.1901    0.68      1.79      0.29992
disp_pepsi  0.36404   0         1         0.48137
disp_coke   0.37895   0         1         0.48534
pratio      1.0272    0.49721   2.3247    0.28661
```

Everything looks good. There are no negative prices, and the indicator variables are all contained between 0 and 1. The magnitudes of the means are reasonable.

Next, least squares is used to estimate the model twice: once with the usual standard errors and again with the HCCME standard errors produced by the --robust option. Each is added to a model table using modeltab add.

```
# OLS
ols coke x
modeltab add
# OLS w/robust
ols coke x --robust
modeltab add
```

Feasible GLS will be estimated in two ways. In the first regression, we will omit any observation that has a negative estimated variance. Remember that one of the problems with the linear probability model is that predictions are not constrained to lie between 0 and 1. If a predicted value satisfies ŷ_i < 0 or ŷ_i > 1, then the variance estimate in (8.11) will be negative. In the first line below a new series is created to check this condition. If the variance, varp, is greater than zero, pos will be equal to 1; if not, it is zero. The second line creates a weight for wls, formed by multiplying the indicator variable pos by the reciprocal of the variance. In this way, observations with nonpositive estimated variances receive zero weight.

---------------- Remove observations with negative variance

```
series p = $yhat
series varp = p*(1-p)
series pos = (varp > 0)
series w = pos * 1/varp
# omit regression
wls w coke x
```

The first line uses the accessor for the predicted values from a linear regression, $yhat, and therefore it must follow least squares estimation of the linear probability model; in this model, the predictions are interpreted as probabilities. Once again, a trick is being used to eliminate observations from the model: any observation that has a zero weight in w is dropped from the computation. There are equivalent ways to do this in gretl, as shown below.

------------------ Two other ways to drop observations

```
smpl varp>0 --restrict
setmiss 0 w
```

Restricting the sample is probably the most straightforward method. The second way uses the setmiss command, which changes the missing value code to 0 for elements of w; any observation where w=0 is then considered missing and won't be used to estimate the model.
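For instance, the sample-restriction route could be scripted end-to-end as follows (a sketch using the same series names as above; it assumes the linear probability model has just been estimated by OLS):

```
ols coke x               # linear probability model by OLS
series p = $yhat         # predicted probabilities
series varp = p*(1-p)    # estimated variances
smpl varp>0 --restrict   # keep only observations with positive variance
series w = 1/varp        # GLS weights on the restricted sample
wls w coke x             # feasible GLS
smpl full                # restore the full sample afterwards
```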

Finally, another feasible GLS estimation is done. This time the predicted probability is truncated: it is set to 0.01 if ŷ_i < 0.01 and to 0.99 if ŷ_i > 0.99. The code to do this is

-------- WLS with truncated variances for observations out of bounds

```
series b = (p<.01) || (p>.99)
series pt = b*0.01 + p*(1-b)
series varp_t = pt*(1-pt)
series w_t = 1/varp_t
wls w_t coke x
modeltab add
modeltab show
```

The first line creates another indicator variable that takes the value of 1 if the predicted probability falls outside of the boundary. The || is a logical operator that takes the union of the two conditions (= "OR"). The second line creates the truncated value of the probability using the indicator variable:

b(0.01) + p(1 − b) = 0.01 when b = 1
b(0.01) + p(1 − b) = p    when b = 0

Note that out-of-bounds probabilities at both ends are set to 0.01; since the implied variance 0.01(1 − 0.01) equals 0.99(1 − 0.99), the weights are unaffected by this shortcut.

There is another, less transparent, way to generate the truncated probabilities: use the ternary conditional assignment operator. This operates like an if statement and can be used to save a line of script. This syntax would create the series as

-------------------- The conditional assignment operator

series pt = ( (p<.01) || (p>.99) ) ? 0.01 : p

Basically the bound condition in parentheses, (p < .01) || (p > .99), is checked: that is what the question mark represents. If it is true, pt is set to the value that appears in front of the colon (0.01 here). If false, it is set to the value specified to the right of the colon. It operates very much like a traditional if statement in a spreadsheet program. This method is computationally more efficient as well, which could save some time if used in a loop to perform simulations.

Once the truncated probabilities are created, the usual weighted least squares estimation can proceed. The model table appears below.

[Model table: dependent variable coke; standard errors in parentheses; * indicates significance at the 10 percent level, ** at the 5 percent level]

Columns (1) and (2) are the OLS estimates with usual and robust standard errors, respectively. Column (3) uses WLS with the negative variance observations omitted from the sample. Column (4) is WLS with the negative predictions truncated. These results are quite a bit different from the others. This no doubt occurs because of the large weight placed on the 16 observations whose weights were constructed by truncation. For these, var(e_i) = 0.01(1 − 0.01) = 0.0099. The square root of the reciprocal is approximately 10, a large weight to be placed on these 16 observations via WLS. Since these extreme observations carry a large weight relative to the others, they exert considerable influence on the estimated regression.
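The weight arithmetic for the truncated observations can be made explicit. Since weighted least squares effectively multiplies each observation by the square root of its weight, these 16 observations enter the regression scaled up by a factor of roughly 10:

$$\widehat{\mathrm{var}}(e_i) = 0.01(1-0.01) = 0.0099, \qquad w_t = \frac{1}{0.0099} \approx 101, \qquad \sqrt{w_t} \approx 10.05$$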

## 8.4 Script

```
open "@gretldir\data\poe\food.gdt"
set echo off
ols food_exp const income
gnuplot food_exp income --linear-fit
# see section 1.4 of this manual for commands to view these plots.

# ols with HCCME standard errors
ols food_exp const income --robust

# confidence intervals (Robust)
scalar lb = $coeff(income) - critical(t,$df,0.025) * $stderr(income)
scalar ub = $coeff(income) + critical(t,$df,0.025) * $stderr(income)
printf "\nThe 95%% confidence interval is (%.3f, %.3f).\n",lb, ub
```

```
# residual plot
ols food_exp const income --robust
series res = $uhat
setinfo res -d "Least Squares Residuals" -n "Residual"
gnuplot res income --output=c:\Temp\olsres

# launch gnuplot (Windows only)
launch wgnuplot
# To view graph, type: load 'C:\Temp\olsres' at prompt

# residual magnitude plot with loess fit
series abs_e = abs(res)
setinfo abs_e -d "Absolute value of the LS Residuals" -n "Absolute Value of Residual"
gnuplot abs_e income --loess-fit --output=c:\temp\loessfit.plt

# LM test for heteroskedasticity
ols food_exp const income
series sq_ehat = $uhat*$uhat
ols sq_ehat const income
scalar NR2 = $trsq
pvalue X 1 NR2
```

```
# built-in LM test
ols food_exp const income
modtest income --breusch-pagan

# White test
ols food_exp const income
series sq_ehat = $uhat*$uhat
series sq_income = income^2
ols sq_ehat const income sq_income
scalar NR2 = $trsq
pvalue X 2 NR2

# built-in White test
ols food_exp const income --quiet
modtest --white --quiet

# grouped data -- Goldfeld-Quandt
open "@gretldir\data\poe\cps2.gdt"
ols wage const educ exper metro

# Use only metro observations
smpl metro=1 --restrict
ols wage const educ exper
scalar stdm = $sigma
scalar df_m = $df
```

```
# Restore the full sample
smpl full

# Use only rural observations
smpl metro=0 --restrict
ols wage const educ exper
scalar stdr = $sigma
scalar df_r = $df

# GQ statistic
gq = stdm^2/stdr^2
scalar pv = pvalue(F, df_m, df_r, gq)
printf "\nThe F(%d, %d) statistic = %.3f. The right side p-value is %.4g.\n",df_m, df_r, gq, pv

# Goldfeld-Quandt for food expenditure
open "@gretldir\data\poe\food.gdt"
dataset sortby income
list x = const income
ols food_exp x

# large variance observations
smpl 21 40
ols food_exp x
scalar stdL = $sigma
scalar df_L = $df

# Restore the full sample
smpl full

# small variance observations
smpl 1 20
ols food_exp x
scalar stdS = $sigma
scalar df_S = $df

# GQ statistic
gq = stdL^2/stdS^2
scalar pv = pvalue(F, df_L, df_S, gq)
printf "\nThe F(%d, %d) statistic = %.3f. The right side p-value is %.4g.\n",df_L, df_S, gq, pv

# compare ols with and without HCCME
list x = const income
ols food_exp x --quiet
modeltab add
ols food_exp x --robust --quiet
modeltab add
modeltab show

# hypothesis test
ols food_exp x --robust
omit income
ols food_exp x --quiet
restrict
  b[2] = 0
end restrict
```

```
ols food_exp x --robust --quiet
restrict
  b[2] = 0
end restrict

open "@gretldir\data\poe\food.gdt"
# GLS using built-in function
series w = 1/income
wls w food_exp const income
scalar lb = $coeff(income) - critical(t,$df,0.025) * $stderr(income)
scalar ub = $coeff(income) + critical(t,$df,0.025) * $stderr(income)
printf "\nThe 95%% confidence interval is (%.3f, %.3f).\n",lb, ub

# GLS using OLS on transformed data
series wi = 1/sqrt(income)
series ys = wi*food_exp
series xs = wi*income
series cs = wi
ols ys cs xs

# Wage Example
```

```
open "@gretldir\data\poe\cps2.gdt"
ols wage const educ exper metro

# Use only metro observations
smpl metro --dummy
ols wage const educ exper
scalar stdm = $sigma
smpl full

# Create a dummy variable for rural
series rural = 1-metro
# Restrict sample to rural observations
smpl rural --dummy
ols wage const educ exper
scalar stdr = $sigma

# Restore the full sample
smpl full

# Generate standard deviations for each metro and rural obs
series wm = metro*stdm
series wr = rural*stdr
series w = 1/(wm + wr)^2

# Weighted least squares
wls w wage const educ exper metro

# heteroskedastic model
open "@gretldir\data\poe\food.gdt"
ols food_exp const income
series lnsighat = log($uhat*$uhat)
series z = log(income)
ols lnsighat const z
series predsighat = exp($yhat)
series w = 1/predsighat
wls w food_exp const income

# linear probability model
open "@gretldir\data\poe\coke.gdt"
summary --simple
list x = const pratio disp_coke disp_pepsi

# OLS
ols coke x
modeltab add
# OLS w/robust
ols coke x --robust
modeltab add

series p = $yhat
series varp = p*(1-p)
series pos = (varp > 0)
series w = pos * 1/varp
# omit regression
wls w coke x
modeltab add
# smpl varp>0 --restrict
# setmiss 0 w

series b = (p<.01) || (p>.99)
series pt = b*0.01 + p*(1-b)
series varp_t = pt*(1-pt)
series w_t = 1/varp_t
```

```
# trunc regression
wls w_t coke x
modeltab add
modeltab show
```

Figure 8.2: Check the box for heteroskedasticity robust standard errors.

Figure 8.3: Set the method for computing robust standard errors. These are located under the HCCME tab. From the pull-down list for cross-sectional data choose an appropriate option, HC3 in this case.

Figure 8.4: Plot of food expenditures against income with least squares fit.

Figure 8.5: Plot of the absolute value of the food expenditures model residuals against income with loess fit.

Figure 8.6: Select Data>Sort data from the main menu bar to reveal the dialog box shown on the right side of this figure. Choose the desired sort key and indicate whether you want to sort in ascending or descending order.


