Using gretl for Principles of Econometrics, 4th Edition
Simulation
In appendix 10F of POE4, the authors conduct a Monte Carlo experiment comparing the performance of OLS and TSLS. The basic simulation is based on the model
y = x + e                          (10.7)
x = πz1 + πz2 + πz3 + v            (10.8)
The zi are exogenous instruments, each distributed N(0,1). The errors, e and v, are standard normal random variables with correlation ρ.
The parameter π controls the strength of the instruments and is set to either 0.1 or 0.5. The parameter ρ controls the endogeneity of x: when ρ = 0, x is exogenous; when ρ = 0.8, it is seriously endogenous. The sample size is set to 100 and 10,000 simulated samples are drawn.
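A little algebra shows what to expect from least squares under this design. From (10.8), Var(x) = 3π² + 1 and Cov(x, e) = Cov(v, e) = ρ, so the OLS slope estimator converges to

    1 + ρ/(3π² + 1)

With π = 0.5 and ρ = 0.8 this is 1 + 0.8/1.75 ≈ 1.457, which is roughly where the simulated OLS mean reported below lands.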
The gretl script to perform the simulation appears below:
1 scalar N = 100
2 nulldata N
3 scalar rho = 0.8   # set rho = (0.0 or 0.8)
4 scalar p = 0.5     # set p = (0.1 or 0.5)
5 matrix S = {1, rho; rho, 1}
6 matrix C = cholesky(S)
7
8 series z1 = normal(0,1)
9 series z2 = normal(0,1)
10 series z3 = normal(0,1)
11 series xs = p*z1 + p*z2 + p*z3
12 list z = z1 z2 z3
13
14 loop 10000 --progressive --quiet
15     matrix errors = mnormal(N,2)*C'
16     series v = errors[,1]
17     series e = errors[,2]
18     series x = xs + v
19     series y = x + e
20     ols x const z --quiet
21     scalar f = $Fstat
22     ols y 0 x --quiet
23     scalar b_ols = $coeff(x)
24     tsls y 0 x; 0 z --quiet
25     scalar b_tsls = $coeff(x)
26     store coef.gdt b_ols b_tsls f
27     print b_ols b_tsls f
28 endloop
The top part of the script initializes the parameters of the simulation. The sample size is set to 100, an empty dataset is created, the values of ρ and p (the latter playing the role of π) are set, and then the covariance matrix S is formed and its Cholesky decomposition taken. The Cholesky decomposition is a trick used to create correlation among the residuals. There are more transparent ways to do this for two series (e.g., e = rho*v + sqrt(1-rho^2)*normal(0,1)), but the Cholesky approach is especially useful when you want to correlate more than two series. The systematic part of x is created and called xs, and a list to hold the instruments is created as well.
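To see why the trick works, the factorization can be written out explicitly in the 2×2 case:

    S = CC',   C = (1, 0; ρ, √(1−ρ²))

so if u1 and u2 are independent N(0,1) draws, then v = u1 and e = ρu1 + √(1−ρ²)u2 each have unit variance and correlation ρ. Postmultiplying a matrix of standard normal draws by C', as the loop below does, applies this transformation to every row.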
The loop uses the --progressive option and is set to do 10,000 iterations. The matrix called errors postmultiplies standard normal draws by the transpose of the Cholesky factor to create the correlated errors. The first column is assigned to v and the second to e. The endogenous regressor x is created by adding v to the systematic portion of the model, and then the dependent variable in the regression is created. The first regression, in line 20, is the reduced form. The overall F statistic from this regression can serve as the test for weak instruments since there are no other exogenous variables in the model. The omit form of the F-test won't work in a progressive loop, so I avoided it here. The slope estimates for least squares and two-stage least squares are collected, stored to coef.gdt, and printed.
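Outside of a progressive loop, the same statistic could be obtained with the omit command, which stores its test statistic in the $test accessor. A minimal sketch (not part of the author's script):

    ols x const z --quiet
    omit z --quiet       # joint F-test that the coefficients on z1, z2, z3 are zero
    scalar f = $test     # identical to the overall F here, since z contains all the regressors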
For this particular parameterization, I obtained the following result:
Statistics for 10000 repetitions

   Variable        mean      std. dev.
   b_ols         1.42382    0.0532148
   b_tsls        1.00887    0.106816
   f             30.6130    7.88943

With ρ = 0.8 and p = 0.5 the instruments are strong (the average F statistic is about 30.6) and least squares is badly biased: its mean is roughly 1.42 when the true slope is 1, while TSLS is centered very close to the true value.
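Because the estimates were stored, their sampling distributions can be examined once the loop finishes. A minimal sketch, assuming coef.gdt was written to the current working directory:

    open coef.gdt
    summary b_ols b_tsls f   # means and standard deviations across the 10,000 samples
    freq b_ols               # frequency distribution of the OLS estimates
    freq b_tsls              # frequency distribution of the TSLS estimates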
The complete script containing the commands used in this chapter follows:

set echo off
open "@gretldir\data\poe\mroz.gdt"
logs wage
square exper

list x = const educ exper sq_exper
list z = const exper sq_exper mothereduc

# least squares and IV estimation of wage eq
ols l_wage x
tsls l_wage x ; z

# tsls--manually
smpl wage>0 --restrict
ols educ z
series educ_hat = $yhat
ols l_wage const educ_hat exper sq_exper

# partial correlations--the FWL result
ols educ const exper sq_exper
series e1 = $uhat
ols mothereduc const exper sq_exper
series e2 = $uhat
ols e1 e2
corr e1 e2

list z = const exper sq_exper mothereduc
list z1 = const exper sq_exper fathereduc
list z2 = const exper sq_exper mothereduc fathereduc

# Hausman test with different sets of instruments
ols educ z --quiet
series ehat = $uhat
ols l_wage x ehat

ols educ z1 --quiet
series ehat1 = $uhat
ols l_wage x ehat1

ols educ z2 --quiet
series ehat2 = $uhat
ols l_wage x ehat2

# weak instruments
open "@gretldir\data\poe\mroz.gdt"
smpl wage>0 --restrict
logs wage
square exper
list x = const educ exper sq_exper
list z2 = const exper sq_exper mothereduc fathereduc
ols educ z2
omit mothereduc fathereduc

# Sargan test of overidentification
tsls l_wage x ; z2
series uhat2 = $uhat
ols uhat2 z2
scalar test = $trsq
pvalue X 1 test   # df = number of overidentifying restrictions

tsls l_wage x ; z2

open "@gretldir\data\poe\mroz.gdt"
smpl wage>0 --restrict
logs wage
square exper
list x = const educ exper sq_exper
list z2 = const exper sq_exper mothereduc fathereduc
tsls l_wage x ; z2
series ehat2 = $uhat
ols ehat2 z2
scalar test = $trsq
pvalue X 1 test   # df = number of overidentifying restrictions

# Cragg-Donald F
open "@gretldir\data\poe\mroz.gdt"
smpl wage>0 --restrict
logs wage
square exper
series nwifeinc = (faminc-wage*hours)/1000
list x = mtr educ kidsl6 nwifeinc const
list z = kidsl6 nwifeinc mothereduc fathereduc const
tsls hours x ; z
scalar df = $df

list w = const kidsl6 nwifeinc
ols mtr w --quiet
series e1 = $uhat
ols educ w --quiet
series e2 = $uhat
ols mothereduc w --quiet
series e3 = $uhat
ols fathereduc w --quiet
series e4 = $uhat

# canonical correlations in R
foreign language=R --send-data --quiet
    set1 <- gretldata[,29:30]
    set2 <- gretldata[,31:32]
    cc1 <- cancor(set1,set2)
    cc <- as.matrix(cc1$cor)
    gretl.export(cc)
end foreign

matrix vars = mread("@dotdir/cc.mat")
print vars
scalar mincc = minc(vars)
scalar cd = df*(mincc^2)/(2*(1-mincc^2))
printf "\nThe Cragg-Donald Statistic is %6.4f.\n", cd

# canonical correlations in gretl
function matrix cc(list Y, list X)
    matrix mY = cdemean({Y})
    matrix mX = cdemean({X})
    matrix YX = mY'mX
    matrix XX = mX'mX
    matrix YY = mY'mY
    matrix ret = eigsolve(qform(YX, invpd(XX)), YY)
    return sqrt(ret)
end function

list E1 = e1 e2
list E2 = e3 e4
matrix l = cc(E1, E2)
scalar mincc = minc(l)
scalar cd = df*(mincc^2)/(2*(1-mincc^2))
printf "\nThe Cragg-Donald Statistic is %10.4f.\n", cd

# simulation for ols and tsls
scalar N = 100
nulldata N
scalar rho = 0.8   # set rho = (0.0 or 0.8)
scalar p = 0.5     # set p = (0.1 or 0.5)
matrix S = {1, rho; rho, 1}
matrix C = cholesky(S)
series z1 = normal(0,1)
series z2 = normal(0,1)
series z3 = normal(0,1)
series xs = p*z1 + p*z2 + p*z3
list z = z1 z2 z3
loop 10000 --progressive --quiet
    matrix errors = mnormal(N,2)*C'
    series v = errors[,1]
    series e = errors[,2]
    series x = xs + v
    series y = x + e
    ols x const z --quiet
    scalar f = $Fstat
    ols y 0 x --quiet
    scalar b_ols = $coeff(x)
    tsls y 0 x; 0 z --quiet
    scalar b_tsls = $coeff(x)
    store coef.gdt b_ols b_tsls f
    print b_ols b_tsls f
endloop
Model 2: OLS, using observations 1-428
Dependent variable: educ

[coefficient table omitted]

Comparison of Model 1 and Model 2:

Null hypothesis: the regression parameters are zero for the variables
    mothereduc, fathereduc
Test statistic: F(2, 423) = 55.4003, with p-value = 4.2689e-22
Of the 3 model selection statistics, 0 have improved.

Figure 10.3: Results from using the omit statement after least squares