Using gret l for Principles of Econometrics, 4th Edition
Monte Carlo Simulation
The first step in a Monte Carlo exercise is to model the data generation process. This requires what Davidson and MacKinnon (2004) refer to as a fully specified statistical model. A fully specified parametric model “is one for which it is possible to simulate the dependent variable once the values of the parameters are known” (Davidson and MacKinnon, 2004, p. 19). First you’ll need a regression function, for instance:
E (ytQt) = ві + в2 xt (2.14)
where yt is your dependent variable, xt the dependent variable, Qt the current information set, and ві and в2 the parameters of interest. The information set Qt contains xt as well as other potential explanatory variables that determine the average of yt. The conditional mean of yt given the information set could represent a linear regression model or a discrete choice model. However, equation (2.14) is not complete; it requires some description of how the unobserved or excluded factors affect ytQt.
To complete the the specification we need to specify an “unambiguous recipe” for simulating the model on a computer (Davidson and MacKinnon, 2004, p. 17). This means we’ll need to specify a probability distribution for the unobserved components of the model and then use a pseudo-random number generator to generate samples of the desired size.
In this example the data generation process will be as follows. We will let N = 40 and consider a linear model of the form
Уі = ві + в2 Xi + ei i = 1, 2, ••• , 40. (2.15)
The errors of the model will iid N(0, 88). The parameters ві = 100 and в2 = 10. Finally, let xl, x2, ■ ■ ■ ,x20 = 10 and let x2l, x22, ■ ■ ■ , x40 = 20. This gives us enough information to simulate samples of yi from the model. The hansl script (hansl is an acronym for hansl’s a neat scripting language is:
1 nulldata 40
2 # Generate X
3 series x = (index>20) ? 20 : 10
4
4 # Generate systematic portion of model
5 series ys = 100 + 10*x
7
6 loop 1000 —progressive —quiet
7 y = ys + normal(0,50)
8 ols y const x
9 scalar b1 = $coeff(const)
10 scalar b2 = $coeff(x)
11 scalar sig2 = $sigma"2
12 print b1 b2 sig2
13 store "@workdircoef. gdt" b1 b2 sig2
14 endloop
17
15 open "@workdircoef. gdt"
16 summary
17 freq b2 —normal
The first line creates an empty dataset that has room for 40 observations. Line 3 contains a ternary conditional assignment operator.2 Here is how it works. A series x is being created. The statement in parentheses is checked. The question mark (?) is the conditional assignment. If the statement in parentheses is true, then x is assigned the value to the left of the colon. If false it gets the value to the right. So, when index (a gretl default way of identifying the observation number) is greater than 20, x is set to 20, if index is less than or equal to 20 it is set to 10.
Next, the systematic portion of the model is created. For this we need x and the known values of the parameters (100, 10). Then we loop from 1 to 1000 in increments of 1. Normal random variates are added to the model, it is estimated by ols, and several statistics from that computation are retrieved, printed, and stored in a specified location.
The normal(0,50) statement generates normal random variables with mean of 0 and a variance of 50. The print statement used in this context actually tells gretl to accumulate the things that are listed and to print out summary statistics from their computation inside the loop. The store command tells gretl to output b1, b2, and sig2 to an external file. The —progressive option to the loop command alters the print and store commands a bit, and you can consult the Gretl Users Guide for more information about how.
Here is the output from the Monte Carlo. First, the output from the progressive loop:
OLS estimates using the 40 observations 1-40 Statistics for 1000 repetitions Dependent variable: у
In a progressive loop, gretl will print out the mean and standard deviation from the series of estimates. It works with all single equation estimators in gretl and is quite useful for Monte Carlo analysis. From this you can see that the average value of the constant in 1000 samples is 100.491. The average slope was 9.962. The third column gives the mean of the standard error calculation
Statistics for 1000 repetitions Variable mean
M 100.491 24.5847
Ь2 9.96204 1.57931
sig2 2497.03 551.720
When the print command is issued, it will compute and print to the screen the ‘mean’ and ‘std. dev.’ of the estimated scalar. Notice that b1 and b2 match the output produced by the —progressive option. The print command is useful for studying the behavior of various statistics (like tests, confidence intervals, etc) and other estimators that cannot be handled properly within a progressive loop (e. g., mle, gmm, and system estimation commands).
The store statement works behind the scenes, but yields this informative piece of information:
3tore: using filename c:teirpcoef. gdt Data written OX.
This tells you where gretl wrote the dataset that contains the listed scalars, and that is was written properly. Now you are ready to open it up and perform additional analysis. In this example, we have used the @workdir macro. This basically tells gretl to go to the working directory to write the file. You could write files to gretl’s temporary directory using @dotdircoef. gdt.
The data set is opened and the summary statistics generated (again, if needed)
1 open "@workdircoef. gdt"
2 summary
3 freq b2 —normal
From here you can plot frequency distribution and test to see whether the least squares estimator of slope is normally distributed.
The script for chapter 2 is found below. These scripts can also be found at my website http: //www. learneconometrics. com/gretl.
1 set echo off
2 open "@gretldirdatapoefood. gdt"
3 setinfo food_exp - d "household food expenditure per week"
4 - n "Food Expenditure"
5 setinfo income - d "weekly household income" - n "Weekly Income"
6 labels
7
7 #Least squares
8 ols food_exp const income —vcv
9 ols 10 2
11
10 #Summary Statistics
11 summary food_exp income
14
15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 |
#Plot the Data gnuplot food_exp income
#List the Data
print food_exp income —byobs #Elasticity
ols food_exp const income --quiet
scalar elast=$coeff(income)*mean(income)/mean(food_exp) #Prediction
scalar yhat = $coeff(const) + $coeff(income)*20 #Table 2.2
open "@gretldirdatapoetable2_2.gdt" list ylist = y1 y2 y3 y4 y5 y6 y7 y8 y9 y10 loop foreach i ylist
ols ylist.$i const x endloop
# slopes and elasticities at different points open "@gretldirdatapoebr. gdt"
series sqft2 = sqftrt2 ols price const sqft2
scalar slope_2000 = 2*$coeff(sqft2)*2000 scalar slope_4000 = 2*$coeff(sqft2)*4000 scalar slope_6000 = 2*$coeff(sqft2)*6000 scalar elast_2000 = slope_2000*2000/117461.77 scalar elast_4000 = slope_4000*4000/302517.39 scalar elast_6000 = slope_6000*6000/610943.42
# histogram for price and log(price) series l_price = ln(price)
freq price freq l_price
# regression using indicator variables open "@gretldirdatapoeutown. gdt" ols price const utown —quiet
scalar ut = $coeff(const)+$coeff(utown) scalar other = $coeff(const)
printf "nThe average in Utown is %.4f and the average elsewhere is %.4fn",ut, other
# Monte Carlo simulation
open "@gretldirdatapoefood. gdt" set seed 3213789 loop 100 —progressive —quiet series u = normal(0,88) series y1= 80+10*income+u ols y1 const income
endloop
# Monte Carlo simulation #2
# Generate systematic portion of model nulldata 40
# Generate X
series x = (index>20) ? 20 : 10
# Generate systematic portion of model series ys = 100 + 10*x
loop 1000 —progressive —quiet series y = ys + normal(0,50) ols y const x scalar b1 = $coeff(const) scalar b2 = $coeff(x) scalar sig2 = $sigmart2 print b1 b2 sig2
store "@workdircoef. gdt" b1 b2 sig2 endloop
open "@workdircoef. gdt" summary
66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 |
freq b2 —normal
Figure 2.12: Results from the script to compute an elasticity based on a linear regression. |
Model 4: OLS, using |
obse |
Display actual, fitted, residual |
||
Dependent variable: |
food |
Forecasts... |
||
Confidence intervals for coefficients |
||||
coefficient |
Cr>nfir|gp|f P РІІІГКР... |
e |
||
const 83.4160 |
^Coefficient covariance matrix_^^ |
* |
||
income 10.2096 |
ANOVA |
35 *** |
||
Mean dependent var |
233 |
Bootstrap... |
.6752 |
|
.з гоз ь. v. dependent var |
112 |
|||
Sum squared resid |
304505.2 5.E. Of regression |
39. |
51700 |
|
R-squared |
0.385002 Adjusted R-squared |
0.368818 |
||
F(l, 38) |
23. |
78884 P-value(F) |
0.000019 |
|
Log-likelihood |
-235 |
.5088 Akaike criterion |
475 |
.0176 |
Schwarz criterion |
473 |
.3954 Hannan-Quinn |
476 |
.2339 |
OLS estimates using the 40 observations 1-40 Statistics for 100 repetitions Dependent variable: yl
Statistics for 100 repetitions Variable mean
Ы 38.1474
b2 9.59723
store: using filename c:tempcoeff. gdt
Data, written OK.
Figure 2.15: The summary results from 100 random samples of the Monte Carlo experiment.
|
|||||||||||
|
|||||||||||
|
|||||||||||
|
|||||||||||
|
|||||||||||
|
|||||||||||
|
|||||||||||
|
|
|
|
|
|
||||||
|
|
||||||||||
Figure 2.16: Price versus size from the quadratic model.
|