Predictions in the Log-linear Model
In this example, you use the regression to predict both the log wage and the level of the wage for a person with 12 years of schooling. The naive (or "natural") prediction of the wage simply takes the antilog of the predicted ln(wage). This can be improved upon by using properties of lognormal random variables. It can be shown that if \( \ln(w) \sim N(\mu, \sigma^2) \) then \( E(w) = e^{\mu + \sigma^2/2} \) and
\( \operatorname{var}(w) = e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right) \).
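For readers who want to see where these two moments come from, the following is a brief sketch of the standard argument. The standardization \( \ln(w) = \mu + \sigma z \) with \( z \sim N(0,1) \) is introduced here only for the derivation; it does not appear in the text above.

```latex
\begin{align*}
\ln(w) &= \mu + \sigma z, \qquad z \sim N(0,1) \\
E\!\left(e^{\sigma z}\right)
  &= \int_{-\infty}^{\infty} e^{\sigma z}\,\frac{1}{\sqrt{2\pi}}\,e^{-z^{2}/2}\,dz
   = e^{\sigma^{2}/2}\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}\,e^{-(z-\sigma)^{2}/2}\,dz
   = e^{\sigma^{2}/2} \\
E(w) &= e^{\mu}\,E\!\left(e^{\sigma z}\right) = e^{\mu+\sigma^{2}/2} \\
E(w^{2}) &= e^{2\mu}\,E\!\left(e^{2\sigma z}\right) = e^{2\mu+2\sigma^{2}} \\
\operatorname{var}(w) &= E(w^{2}) - \left[E(w)\right]^{2}
  = e^{2\mu+2\sigma^{2}} - e^{2\mu+\sigma^{2}}
  = e^{2\mu+\sigma^{2}}\left(e^{\sigma^{2}}-1\right)
\end{align*}
```

The second line completes the square in the exponent, so the remaining integral is that of a normal density and equals one.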
That means that the corrected prediction is \( \hat{y}_c = \exp(b_1 + b_2 x + \hat{\sigma}^2/2) = e^{(b_1 + b_2 x)}\, e^{\hat{\sigma}^2/2} \). The script to generate these is given below.
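open "@gretldir\data\poe\cps4_small.gdt"
# create l_wage = ln(wage)
logs wage
ols l_wage const educ
# predicted ln(wage) at 12 years of schooling
scalar l_wage_12 = $coeff(const)+$coeff(educ)*12
# natural predictor: antilog of the predicted log wage
scalar nat_pred = exp(l_wage_12)
# corrected predictor: scale by exp(sigma-hat^2/2)
scalar corrected_pred = nat_pred*exp($sigma^2/2)
print l_wage_12 nat_pred corrected_pred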
The results from the script are
l_wage_12 = 2.6943434
nat_pred = 14.795801
corrected_pred = 16.996428
That means that for a worker with 12 years of schooling the predicted wage is $14.80/hour using the natural predictor and $17.00/hour using the corrected one. In large samples we would expect the corrected predictor to be a bit better. Among the 1000 individuals in the sample, 328 have 12 years of schooling, and their average wage is $15.99. Hence the corrected prediction overshoots by about a dollar per hour ($17.00 versus $15.99), while the natural prediction undershoots by about $1.19. The corrected figure is therefore the closer of the two.
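The large-sample claim can be illustrated with a small simulation. The sketch below does not use the cps4_small data: the sample size, seed, and the values of mu and sigma are hypothetical choices made only for illustration. It draws a lognormal variable and compares the natural and corrected predictors with the sample mean of the level.

```hansl
# Simulation sketch with assumed (hypothetical) parameter values
nulldata 100000
set seed 12345

scalar mu = 2.7       # assumed mean of ln(w)
scalar sigma = 0.5    # assumed standard deviation of ln(w)

series ln_w = mu + sigma * normal()   # ln(w) ~ N(mu, sigma^2)
series w = exp(ln_w)                  # w is lognormal

scalar nat_sim = exp(mean(ln_w))              # antilog of the mean of ln(w)
scalar corr_sim = nat_sim * exp(var(ln_w)/2)  # lognormal correction
scalar theory = exp(mu + sigma^2/2)           # E(w) from the formula
scalar avg_w = mean(w)                        # sample mean of the level

printf "natural     = %.4f\n", nat_sim
printf "corrected   = %.4f\n", corr_sim
printf "theoretical = %.4f\n", theory
printf "sample mean = %.4f\n", avg_w
```

With a sample this large, the natural predictor should fall noticeably below the sample mean of w, while the corrected predictor should lie close to both the sample mean and the theoretical value.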
To get the average wage for those with 12 years of schooling, we can restrict the sample using the script below:
smpl educ=12 --restrict
summary wage
smpl full
The syntax is relatively straightforward. The smpl command instructs gretl that something is being done to the sample. The expression educ=12 is the condition that gretl checks for each observation. The --restrict option tells gretl to keep only those observations that satisfy the condition, and smpl full restores the complete sample afterward. The summary wage statement produces
Summary Statistics, using the observations 1-328
for the variable wage (328 valid observations)