Using gretl for Principles of Econometrics, 4th Edition
Seemingly Unrelated Regressions
The acronym SUR stands for seemingly unrelated regression equations. SUR is another way of estimating panel data models that are long (large T), but not wide (small N). More generally though, it is used to estimate systems of equations that do not necessarily have any parameters in common and whose regression functions do not appear to be related. In the SUR framework, each firm in your sample is parametrically different; each firm has its own regression function, i.e., different intercept and slopes. Firms are not totally unrelated, however. In this model the firms are linked by what is not included in the regression rather than by what is. The firms are thus related by unobserved factors and SUR requires us to specify how these omitted factors are linked in the system's error structure.
In the basic SUR model, the errors are assumed to be homoscedastic and linearly independent within each equation, or in our case, each firm. The error of each equation may have its own variance. Most importantly, the error of each equation (firm) is correlated with the errors of the others in the same time period. The latter assumption is called contemporaneous correlation, and it is this property that sets SUR apart from other models.
Now consider the investment model suggested by Grunfeld (1958). Considering investment decisions of only two firms, General Electric (g) and Westinghouse (w), we have
$inv_{gt} = \beta_{1g} + \beta_{2g} v_{gt} + \beta_{3g} k_{gt} + e_{gt}$ (15.13)
$inv_{wt} = \beta_{1w} + \beta_{2w} v_{wt} + \beta_{3w} k_{wt} + e_{wt}$ (15.14)
where $t = 1, 2, \ldots, 20$, k is capital stock and v is value of the firm. In the context of the two firm Grunfeld model in (15.13) and (15.14) this would mean that $\mathrm{var}[e_{gt}] = \sigma_g^2$; $\mathrm{var}[e_{wt}] = \sigma_w^2$; $\mathrm{cov}(e_{gt}, e_{wt}) = \sigma_{gw}$ for all time periods; and $\mathrm{cov}(e_{it}, e_{is}) = 0$ for $t \neq s$ for each firm, $i = g, w$. So in the SUR model you essentially have to estimate a variance for each individual and a covariance between each pair of individuals. These are then used to construct a feasible generalized least squares estimator of the equations' parameters.
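It may help to see these assumptions collected in matrix form. The following is the standard textbook representation rather than anything specific to gretl. It writes $\sigma_g^2$ and $\sigma_w^2$ for the two error variances and $\sigma_{gw}$ for the contemporaneous covariance, and it assumes (for this sketch only) that the 20 GE observations are stacked above the 20 Westinghouse observations, so that $y = (inv_g', inv_w')'$, $e = (e_g', e_w')'$ and $X$ is block diagonal with the two firms' regressor matrices on the diagonal.

```latex
\Sigma =
\begin{pmatrix}
\sigma_g^2  & \sigma_{gw} \\
\sigma_{gw} & \sigma_w^2
\end{pmatrix},
\qquad
\operatorname{cov}(e) = \Sigma \otimes I_T ,
\qquad
\hat{\beta}_{FGLS} =
\left[ X' \left( \hat{\Sigma}^{-1} \otimes I_T \right) X \right]^{-1}
X' \left( \hat{\Sigma}^{-1} \otimes I_T \right) y
```

Here $\hat{\Sigma}$ holds the estimated variances and covariance, typically computed from equation-by-equation least squares residuals. If $\sigma_{gw} = 0$, $\Sigma$ is diagonal and the estimator reduces to least squares applied to each equation separately.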
Even though SUR requires a T and an N dimension, it is not specifically a panel technique. This is because the equations in an SUR system may be modeling different behaviors for a single individual rather than the same behavior for several individuals. As mentioned before, it is best used when panels are long and narrow since this gives you more observations to estimate the equations' variances and the cross-equation covariances. More time observations reduce the sampling variation associated with these estimates, which in turn improves the performance of the feasible generalized least squares estimator. If your panel dataset has a very large number of individuals and only a few years, then FGLS may not perform very well in a statistical sense. In the two firm Grunfeld example, N=2 and T=20 so we needn't worry about this warning too much, although the asymptotic inferences are based on T (and not N) being infinite.
The two firm example is from Hill et al. (2011) who have provided the data in the grunfeld2.gdt data set. The first model we estimate is the pooled model, estimated by least squares. This is done in lines 2 and 3.
1 open "@gretldir\data\poe\grunfeld2.gdt"
2 list X = const v k
3 ols inv X
4 modeltab add
The second model estimates the investment equations for each firm separately. There are a number of ways to do this and several are explored below. The first method uses interaction terms (see chapter 7) to estimate the two equations. Basically, an indicator variable is interacted with each regressor, including the constant. The two firm model would be:
$inv = \beta_1 + \beta_2 k + \beta_3 v + \beta_4 d + \beta_5 (d \times k) + \beta_6 (d \times v) + e$ (15.15)
where d is the firm indicator. In the script the interactions are created using a loop. In this way, you could automate the procedure for any number of explanatory variables. Line 5 generates the set of indicators for the units of the panel. There are only two units in this panel, so only two indicators are created, named du_1 and du_2. In line 6 an empty list called dZ is created; it will hold the variables created in the loop. The foreach loop is used in this example. The index is called i and it loops over each element of the variable list, X. In line 8 the interaction term is assigned to a series. The name will be d$i, which as the loop proceeds becomes d followed by the variable's name (dconst, dv, dk). The next line adds the new interaction to the list dZ at each iteration. Finally, the model is estimated using the original regressors and the interactions.
5 genr unitdum
6 list dZ = null
7 loop foreach i X
8 series d$i = du_2 * $i
9 list dZ = dZ d$i
10 endloop
11 ols inv X dZ
12 modeltab add
13 modeltab show
The results appear below.
Pooled OLS estimates
Dependent variable: inv

                 (1)            (2)
const           17.87**        -9.956
               (7.024)        (23.63)
v               0.01519**      0.02655**
               (0.006196)     (0.01172)
k               0.1436**       0.1517**
               (0.01860)      (0.01936)
dconst                         9.447
                              (28.81)
dv                             0.02634
                              (0.03435)
dk                            -0.05929
                              (0.1169)
n               40             40
Adj. R²         0.7995         0.8025
lnL            -177.3         -175.3

Standard errors in parentheses
* indicates significance at the 10 percent level
** indicates significance at the 5 percent level
They match those in Table 13.12 of POE4. One of the disadvantages of estimating the separate equations in this way is that it assumes that the error variances of the two firms are equal. If this is not true, then the standard errors and t-ratios will not be valid. You could use a robust covariance estimator or estimate the model allowing for groupwise heteroskedasticity. Also, the use of interaction terms complicates the interpretation of the coefficients a bit. The coefficients on the interaction terms measure the difference in effect between the interacted group (Westinghouse) and the reference group (GE). To get the marginal effect of an increase in k on average investment for Westinghouse we would add $\beta_2 + \beta_5$. The computation based on the least squares estimates is 0.1517 - 0.0593 = 0.0924.
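The same addition can be applied to every coefficient, not only the one on k. The short Python sketch below (Python is used only as a calculator here; it is not part of the gretl workflow) adds each interaction estimate reported above to the corresponding GE estimate to recover the implied Westinghouse equation. The dictionary names are illustrative choices.

```python
# Estimates copied from the regression with interactions reported above.
ge = {"const": -9.956, "v": 0.02655, "k": 0.1517}            # reference firm (GE)
difference = {"const": 9.447, "v": 0.02634, "k": -0.05929}   # interaction terms

# Implied Westinghouse coefficient = GE coefficient + interaction coefficient.
westinghouse = {name: ge[name] + difference[name] for name in ge}

for name, value in westinghouse.items():
    print(f"{name:>5}: {value:.4f}")
```

Up to rounding, these are the values one should expect from a regression that uses only the Westinghouse observations.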
The next method of estimating the equations separately is better. It allows the variances of each subset to differ. The gretl script to estimate the two firm model using this data is
wls du_1 inv v k const
wls du_2 inv v k const
This uses the trick explored earlier where observations can be included or excluded when weighted by 1 or 0, respectively. The same regressions can be run in a loop:
loop foreach i du_*
    wls $i inv v k const
endloop
Notice that the wildcard du_* is used again. The results from this exercise were added to the model table and appear below:
Dependent variable: inv
Standard errors in parentheses
* indicates significance at the 10 percent level
** indicates significance at the 5 percent level
Notice that the coefficients are actually equivalent in the two sets of regressions. The GE equations are put side-by-side to ease the comparison. The standard errors differ, as expected. The Westinghouse coefficients estimated by WLS are also the same as the ones implied by the pooled model with interactions, though it is less obvious. Recall that the implied marginal effect of a change in k on average investment was estimated to be 0.1517 - 0.0593 = 0.0924, which matches the directly estimated result in the last column.
Collecting the results for large N would be somewhat of a problem, but remember, up to 6 models can be added to a model table in gretl.
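The reason 0/1 weights isolate one firm can be seen in a small numerical check. The Python sketch below uses a hypothetical helper, wls_line, that fits a one-regressor weighted least squares line, and made-up toy data (not the Grunfeld data). It shows that weighting by an indicator gives the same coefficients as running the regression on that unit's rows alone.

```python
def wls_line(x, y, w):
    """Weighted least squares fit of y = a + b*x; returns (a, b)."""
    total = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / total
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Toy stacked data: the first three rows belong to unit 1, the last three to unit 2.
x = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
y = [2.1, 3.9, 6.2, 0.9, 1.4, 2.1]
du_1 = [1, 1, 1, 0, 0, 0]   # indicator for unit 1, used as the weight

weighted = wls_line(x, y, du_1)              # all rows, 0/1 weights
subset = wls_line(x[:3], y[:3], [1, 1, 1])   # unit 1 rows only

print(weighted, subset)
```

Rows with weight zero drop out of every sum, so the two fits coincide.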
Next, we will estimate the model using SUR via the system command. To do this, some rearranging of the data is required. The system estimator in gretl handles many different cases, but the observations have to be ordered in a particular way in order for it to work. Since each equation in a system may be estimated separately, each firm's observations must be given unique names and the observations must be aligned by time period. The grunfeld2.gdt file has the data for each firm stacked on top of one another, making it 40 x 3. We need it to be 20 x 6. Ordinarily with SUR this is not a problem. Recall that SUR is for large T, small N models and the equations may not even contain the same variables. In our case, the data were ordered for use as a panel and not as a system. This is easy enough to rectify.
The easiest way to do this in gretl is to use matrices. The data series will be converted to a matrix, reshaped, and then reloaded as data series. Unique names will have to be given the new series and you’ll have to keep up with what gets placed where. It is slightly clumsy, but easy enough to do.
1 open "@gretldir\data\poe\grunfeld2.gdt"
2 list X = inv v k
3 matrix dat = { X }
4 matrix Y = mshape(dat,20,6)
5 colnames(Y,"ge_i w_i ge_v w_v ge_k w_k ")
In line 1 the data are opened and in line 2 the variable list is created. In line 3 the data series listed in X are converted to a matrix called dat. Then, in line 4, dat is reshaped from 40 x 3 into the 20 x 6 matrix Y. The mshape(matrix, rows, columns) function essentially takes whatever is in the matrix and converts it to the new dimension. Elements are read from the source matrix and written to the target in column-major order. Thus, y=mshape(X,2,4) fills a 2 x 4 matrix y by reading down the columns of X, one column at a time.
Finally, the proper column names were reassigned using the colnames command.
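Column-major reshaping is easy to get wrong, so a tiny worked example may help. The Python sketch below imitates the reshaping described above with a hypothetical helper, mshape_colmajor. It is not gretl's implementation, and it insists that the element counts match exactly. The 4 x 2 toy matrix stands for two units stacked on top of one another, each observed for two periods on two variables.

```python
def mshape_colmajor(matrix, rows, cols):
    """Reshape a list-of-rows matrix to rows x cols in column-major order."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    # Read the source down each column in turn.
    flat = [matrix[i][j] for j in range(n_cols) for i in range(n_rows)]
    if len(flat) != rows * cols:
        raise ValueError("element counts must match in this sketch")
    # Fill the target column by column: entry (i, j) is flat[j * rows + i].
    return [[flat[j * rows + i] for j in range(cols)] for i in range(rows)]

# Column 1 is variable a, column 2 is variable b.
# Rows 1-2 are unit 1 (periods 1 and 2); rows 3-4 are unit 2.
stacked = [[1, 5],
           [2, 6],
           [3, 7],
           [4, 8]]

wide = mshape_colmajor(stacked, 2, 4)
for row in wide:
    print(row)
```

The columns of the result are, in order, variable a for unit 1, variable a for unit 2, variable b for unit 1 and variable b for unit 2. This is the same ordering as the column names given to Y.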
Next, the matrix needs to reenter gretl as a dataset. Begin by creating a new, empty dataset that contains the proper number of observations (T=20) using the nulldata command. You must use the --preserve option, otherwise the contents of your matrix will be deleted when you create the empty dataset!
6 nulldata 20 --preserve
7 list v = null
8 scalar n = cols(Y)
9 loop for i=1..n
10 series v$i = Y[,i]
11 endloop
12 rename v1 inv_g
13 rename v2 inv_w
14 rename v3 v_g
15 rename v4 v_w
16 rename v5 k_g
17 rename v6 k_w
An empty list called v is created in line 7. A scalar n is created that contains the number of columns of Y and then a loop from 1 to n is initiated. Inside the loop is a single statement that will assign each column of the matrix to a separate series. The series name will begin with v and the column number will be appended. You will end up with variables v1, v2, to v6. The choice of variable names is not informative. You need to verify that the new variables correspond to the correct variables from the original dataset. To help keep the regressions straight, the dependent variables were renamed. A clever programmer could probably figure out how to do this automatically, but for now let’s move on.
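One way the renaming might be automated is to derive the new names from the column labels already attached to Y. The Python sketch below only generates the text of the rename commands; it does not talk to gretl. The two lookup tables are assumptions made for this illustration, chosen to reproduce names such as inv_g and inv_w.

```python
# Column labels in the order they were attached to the reshaped matrix.
labels = "ge_i w_i ge_v w_v ge_k w_k".split()

# Assumed naming scheme (illustrative): firm label -> suffix, variable -> prefix.
firm_suffix = {"ge": "g", "w": "w"}
var_prefix = {"i": "inv", "v": "v", "k": "k"}

commands = []
for position, label in enumerate(labels, start=1):
    firm, var = label.split("_")
    new_name = f"{var_prefix[var]}_{firm_suffix[firm]}"
    commands.append(f"rename v{position} {new_name}")

print("\n".join(commands))
```

The printed lines could then be pasted into a script. It is still worth checking that each new name points at the intended column.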
Before pushing on to estimation of the SUR, there is one more way to estimate the model equation by equation using least squares. This can be done within the same system framework as SUR.
It consists of a block of code that starts with the system name="Grunfeld" line. One advantage of naming your system is that results are attached to it, stored in the session, and are available for further analysis. For instance, with a saved set of equations you can impose restrictions on a single equation in the model or impose restrictions across equations.
system name="Grunfeld"
    equation inv_g const v_g k_g
    equation inv_w const v_w k_w
end system
estimate "Grunfeld" method=ols
Following the system name, each equation is put on a separate line. Each equation is identified by the keyword equation, followed by the dependent variable and then the independent variables, which include a constant. Close the system block using the end system command. The system is then estimated using the line estimate "Grunfeld" method=ols. Executing this script yields
Equation system, Grunfeld
Estimator: Ordinary Least Squares
Equation 1: OLS, using observations 1-20
Dependent variable: inv_g
Mean dependent var  102.2900   S.D. dependent var  48.58450
Sum squared resid   13216.59   S.E. of regression  27.88272
Equation 2: OLS, using observations 1-20
Dependent variable: inv_w

             Coefficient   Std. Error   t-ratio   p-value
  const      -0.509390     8.01529      -0.0636   0.9501
  v_w         0.0528941    0.0157065     3.3677   0.0037
  k_w         0.0924065    0.0560990     1.6472   0.1179

Mean dependent var  42.89150   S.D. dependent var  19.11019
Sum squared resid   1773.234   S.E. of regression  10.21312
Cross-equation VCV for residuals
(correlations above the diagonal)
Breusch-Pagan test for diagonal covariance matrix: Chi-square(1) = 10.6278 [0.0011]
Naming the system has many advantages. First, the specified model is saved to a session and an icon is added to the session icon view as shown below in Figure 15.4. Clicking on the model icon named "Grunfeld" opens the dialog shown in Figure 15.5. Do not worry about the code that appears in the box. It is not editable and is generated by gretl. You do have some choice as to how the particular system is estimated and whether iterations should be performed. These choices appear in Figure 15.6. As you can see, you may choose sur, ols, tsls, wls, and others. To reestimate a model, choose an estimator and click OK.
A test can be used to determine whether there is sufficient contemporaneous correlation. The test is simple to do from the standard output or you can rely on gretl's automatic result. Recall from POE4 that the test is based on the squared correlation computed from least squares estimation, $r^2_{gw} = \hat{\sigma}^2_{gw} / (\hat{\sigma}^2_g \hat{\sigma}^2_w)$.
A little caution is required here. The squared correlations must be computed from the residuals of the least squares estimator, not SUR. Since we've used the system command to estimate the model by OLS, the results above can be used directly.
The resulting cross-equation variance covariance for the residuals is
Cross-equation VCV for residuals
(correlations above the diagonal)

       777.45      (0.729)
       207.59       104.31
Gretl reports the correlation for you above the diagonal of the matrix and places it in parentheses. Using the given computation the test statistic is
$LM = T r^2_{gw} \overset{d}{\to} \chi^2_{(1)}$
provided the null hypothesis of no correlation is true. The arithmetic is $20 \times 0.729^2 = 10.63$.
Fortunately, gretl also produces this statistic as part of the standard output from system estimation by method=ols. It is referred to in the output as "Breusch-Pagan test for diagonal covariance matrix" and it is distributed $\chi^2(1)$ if there is no contemporaneous correlation among firms. The statistic is 10.6278 with a p-value of 0.0011. The two firms appear to be contemporaneously correlated and SUR estimation may be more efficient.

Figure 15.6: The estimator choices available from the system dialog: Seemingly Unrelated Regressions (sur), Three-Stage Least Squares (3sls), Full Information Maximum Likelihood (fiml), Limited Information Maximum Likelihood (liml), Ordinary Least Squares (ols), Two-Stage Least Squares (tsls), and Weighted Least Squares (wls).
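As a check on the arithmetic, the statistic can be rebuilt from the three entries of the residual covariance matrix reported above. The Python sketch below is a hand calculation, not gretl code. The p-value uses the fact that a chi-square variable with one degree of freedom is a squared standard normal, so its upper tail probability is erfc(sqrt(x/2)).

```python
import math

T = 20                       # time periods per firm
s_gg, s_ww = 777.45, 104.31  # residual variances (GE, Westinghouse)
s_gw = 207.59                # residual covariance

r = s_gw / math.sqrt(s_gg * s_ww)   # contemporaneous correlation
lm = T * r ** 2                     # LM statistic for a diagonal covariance matrix

# Upper-tail probability of a chi-square(1) variable evaluated at lm.
p_value = math.erfc(math.sqrt(lm / 2.0))

print(f"r = {r:.3f}, LM = {lm:.4f}, p-value = {p_value:.4f}")
```

Working from the unrounded correlation rather than 0.729 is what brings the hand calculation into line with the value gretl reports.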
To perform SUR, the only change is to rename the system (if desired) and to change method=ols to method=sur.
system name="Grunfeld_sur"
    equation inv_g const v_g k_g
    equation inv_w const v_w k_w
end system
estimate "Grunfeld_sur" method=sur
The results appear below:
Equation system, Seemingly Unrelated Regressions

Equation 1: SUR, using observations 1-20
Dependent variable: inv_g

             Coefficient   Std. Error   t-ratio   p-value
  const      -27.7193      27.0328      -1.0254   0.3174
  v_g          0.0383102    0.0132901    2.8826   0.0092
  k_g          0.139036     0.0230356    6.0357   0.0000

Mean dependent var  102.2900   S.D. dependent var  48.58450
Sum squared resid   13788.38   S.E. of regression  26.25679

Equation 2: SUR, using observations 1-20
Dependent variable: inv_w

             Coefficient   Std. Error   t-ratio   p-value
  const      -1.25199      6.95635      -0.1800   0.8590
  v_w         0.0576298    0.0134110     4.2972   0.0004
  k_w         0.0639781    0.0489010     1.3083   0.2056

Mean dependent var  42.89150   S.D. dependent var  19.11019
Sum squared resid   1801.301   S.E. of regression  9.490260
Once the system has been estimated, the restrict command can be used to impose cross-equation restrictions on a system of equations that has been previously defined and named. The set of restrictions starts with the keyword restrict and terminates with end restrict. Some additional details and examples of how to use the restrict command are given in section 6.1. Each restriction in the set is expressed as an equation. Put the linear combination of parameters to be tested on the left-hand side of the equality and a numeric value on the right. Parameters are referenced using b[i,j] where i refers to the equation number in the system, and j the parameter number. So, to equate the intercepts in equations one and two use the statement
b[1,1] - b[2,1] = 0 (15.20)
The full syntax for testing the full set of cross-equation restrictions
$\beta_{1g} = \beta_{1w}, \quad \beta_{2g} = \beta_{2w}, \quad \beta_{3g} = \beta_{3w}$ (15.21)
on equations (15.13) and (15.14) is shown below:
restrict "Grunfeld_sur"
    b[1,1]-b[2,1]=0
    b[1,2]-b[2,2]=0
    b[1,3]-b[2,3]=0
end restrict
estimate "Grunfeld_sur" method=sur --geomean
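The three restrictions can also be viewed as a single linear system $Rb = q$. The sketch below uses the usual textbook notation, not gretl syntax, and stacks the coefficients in the order gretl indexes them: equation one first, then equation two.

```latex
\underbrace{
\begin{pmatrix}
1 & 0 & 0 & -1 & 0 & 0 \\
0 & 1 & 0 & 0 & -1 & 0 \\
0 & 0 & 1 & 0 & 0 & -1
\end{pmatrix}}_{R}
\begin{pmatrix}
b_{11} \\ b_{12} \\ b_{13} \\ b_{21} \\ b_{22} \\ b_{23}
\end{pmatrix}
=
\underbrace{
\begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}}_{q}
```

Each row of $R$ corresponds to one line between restrict and end restrict. The three rows give the test three numerator degrees of freedom. The unrestricted system has $2 \times 20 = 40$ observations and 6 coefficients, which leaves 34 denominator degrees of freedom.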
Gretl estimates the two equation SUR subject to the restrictions.
Equation system, Grunfeld_sur
Estimator: Seemingly Unrelated Regressions

Equation 1: SUR, using observations 1-20
Dependent variable: inv_g

             Coefficient   Std. Error   t-ratio   p-value
  const      19.1578       2.54265      7.5346    0.0000
  v_g         0.0226805    0.00502650   4.5122    0.0002
  k_g         0.109053     0.0190478    5.7252    0.0000

Mean dependent var  102.2900   S.D. dependent var  48.58450
Sum squared resid   15923.76   S.E. of regression  28.21681

Equation 2: SUR, using observations 1-20
Dependent variable: inv_w

             Coefficient   Std. Error   t-ratio   p-value
  const      19.1578       2.54265      7.5346    0.0000
  v_w         0.0226805    0.00502650   4.5122    0.0002
  k_w         0.109053     0.0190478    5.7252    0.0000
Notice that the intercept and slopes are now equal across firms, so the restrictions have been imposed. Gretl also computes an F-statistic for the null hypothesis that the restrictions are true against the alternative that at least one of them is not, and reports its p-value. A p-value less than the desired level of significance leads to a rejection of the hypothesis.
The gretl output from this test procedure is
F test for the specified restrictions:
F(3,34) = 3.43793 [0.0275]
which does not match the result in the text. At the 5% level of significance, the equality of the two equations is rejected.
set echo off
open "@gretldir\data\poe\nls_panel.gdt"
# pooled least squares
list xvars = const educ exper exper2 tenure tenure2 south union black
panel lwage xvars --pooled --robust

# fixed effects
xvars -= const
panel lwage xvars const --fixed-effects

# fixed effects and lsdv
genr unitdum
xvars -= educ black
ols lwage xvars du_*
panel lwage xvars const --fixed-effects

# fe, re, between, and pooled comparison
open "@gretldir\data\poe\nls_panel.gdt"
list xvars = educ exper exper2 tenure tenure2 south union black
panel lwage xvars const --fixed-effects
modeltab add
panel lwage xvars const --random-effects
modeltab add
panel lwage xvars const --between
modeltab add
panel lwage xvars const --pooled --robust
modeltab add
modeltab show
modeltab clear

# Grunfeld example -- ols
open "@gretldir\data\poe\grunfeld2.gdt"
list X = const v k
ols inv X
modeltab add
genr unitdum
list dZ = null
loop foreach i X
    series d$i = du_2 * $i
    list dZ = dZ d$i
endloop
ols inv X dZ
modeltab add
modeltab show

# using wls to estimate each equation separately
wls du_1 inv v k const
modeltab add
wls du_2 inv v k const
modeltab add
modeltab show

# repeat using wls with a loop
loop foreach i du_*
    wls $i inv v k const
endloop

# sur--reshaping the data
open "@gretldir\data\poe\grunfeld2.gdt"
list X = inv v k
matrix dat = { X }
matrix Y = mshape(dat,20,6)
colnames(Y,"ge_i w_i ge_v w_v ge_k w_k ")

nulldata 20 --preserve
list v = null
scalar n = cols(Y)
loop for i=1..n
    series v$i = Y[,i]
endloop

# rename the variables to improve output
rename v1 inv_g
rename v2 inv_w
rename v3 v_g
rename v4 v_w
rename v5 k_g
rename v6 k_w
setinfo inv_g -d "Investment GE" -n ""
setinfo inv_w -d "Investment Westinghouse" -n ""

# actual system estimation -- ols
system name="Grunfeld"
    equation inv_g const v_g k_g
    equation inv_w const v_w k_w
end system
estimate "Grunfeld" method=ols

# actual system estimation -- sur
system name="Grunfeld_sur"
    equation inv_g const v_g k_g
    equation inv_w const v_w k_w
end system
estimate "Grunfeld_sur" method=sur

# restricting sur
restrict "Grunfeld_sur"
    b[1,1]-b[2,1]=0
    b[1,2]-b[2,2]=0
    b[1,3]-b[2,3]=0
end restrict
estimate "Grunfeld_sur" method=sur --geomean