Springer Texts in Business and Economics

A Measure of Fit

We have obtained the least squares estimates of $\alpha$, $\beta$ and $\sigma^2$ and found their distributions under normality of the disturbances. We have also learned how to test hypotheses regarding these parameters. Now we turn to a measure of fit for this estimated regression line. Recall that $e_i = Y_i - \hat{Y}_i$, where $\hat{Y}_i$ denotes the predicted $Y_i$ from the least squares regression line at the value $X_i$, i.e., $\hat{\alpha}_{OLS} + \hat{\beta}_{OLS} X_i$. Using the fact that $\sum_{i=1}^n e_i = 0$, we deduce that $\sum_{i=1}^n Y_i = \sum_{i=1}^n \hat{Y}_i$, and therefore $\bar{Y} = \bar{\hat{Y}}$. The actual and predicted values of $Y$ have the same sample mean, see numerical properties (i) and (iii) of the OLS estimators discussed in Section 3.2. This is true as long as there is a constant in the regression. Adding and subtracting $\bar{Y}$ from $e_i$, we get $e_i = y_i - \hat{y}_i$, or $y_i = e_i + \hat{y}_i$, where $y_i = Y_i - \bar{Y}$ and $\hat{y}_i = \hat{Y}_i - \bar{Y}$ denote deviations from the sample mean. Squaring and summing both sides:

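$$\sum_{i=1}^n y_i^2 = \sum_{i=1}^n e_i^2 + \sum_{i=1}^n \hat{y}_i^2 + 2\sum_{i=1}^n e_i \hat{y}_i = \sum_{i=1}^n e_i^2 + \sum_{i=1}^n \hat{y}_i^2 \tag{3.10}$$

where the last equality follows from the fact that $\hat{y}_i = \hat{\beta}_{OLS} x_i$, with $x_i = X_i - \bar{X}$, and $\sum_{i=1}^n e_i x_i = 0$. In fact,

$$\sum_{i=1}^n e_i \hat{y}_i = \sum_{i=1}^n e_i \hat{Y}_i = 0$$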

means that the OLS residuals are uncorrelated with the predicted values from the regression, see numerical properties (ii) and (iv) of the OLS estimates discussed in Section 3.2. In other words, (3.10) says that the total variation in $Y_i$ around its sample mean $\bar{Y}$, i.e., $\sum_{i=1}^n y_i^2$, can be decomposed into two parts: the first is the regression sums of squares $\sum_{i=1}^n \hat{y}_i^2 = \hat{\beta}_{OLS}^2 \sum_{i=1}^n x_i^2$, and the second is the residual sums of squares $\sum_{i=1}^n e_i^2$. In fact, regressing $Y$ on a constant yields $\tilde{\alpha}_{OLS} = \bar{Y}$, see problem 2, and the unexplained residual sums of squares of this naive model is

$$\sum_{i=1}^n (Y_i - \tilde{\alpha}_{OLS})^2 = \sum_{i=1}^n (Y_i - \bar{Y})^2 = \sum_{i=1}^n y_i^2$$

Therefore, $\sum_{i=1}^n \hat{y}_i^2$ in (3.10) gives the explanatory power of $X$ after the constant is fit.
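The decomposition in (3.10) can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the data values and the helper name `ols_simple` are made up for illustration and do not come from the text.

```python
import numpy as np

def ols_simple(X, Y):
    """Least squares fit of Y on a constant and X; returns (alpha_hat, beta_hat)."""
    x = X - X.mean()                      # deviations of X from its sample mean
    y = Y - Y.mean()                      # deviations of Y from its sample mean
    beta_hat = (x @ y) / (x @ x)
    alpha_hat = Y.mean() - beta_hat * X.mean()
    return alpha_hat, beta_hat

# Hypothetical illustrative data (an assumption, not from the text)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.8])

alpha_hat, beta_hat = ols_simple(X, Y)
Y_hat = alpha_hat + beta_hat * X          # predicted values
e = Y - Y_hat                             # OLS residuals

x = X - X.mean()
y = Y - Y.mean()
y_hat = Y_hat - Y.mean()

total_ss = y @ y                          # sum of y_i^2
regression_ss = y_hat @ y_hat             # sum of y_hat_i^2
residual_ss = e @ e                       # sum of e_i^2
cross_term = e @ y_hat                    # sum of e_i * y_hat_i, zero up to rounding

print("mean of Y and of Y_hat:", Y.mean(), Y_hat.mean())
print("total SS:", total_ss)
print("regression SS + residual SS:", regression_ss + residual_ss)
print("cross term:", cross_term)
```

Up to floating-point rounding, the two sample means agree, the cross term vanishes, and the total sums of squares equals the sum of the regression and residual sums of squares. These checks rely on the constant being included in the regression.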

Using this decomposition, one can define the explanatory power of the regression as the ratio of the regression sums of squares to the total sums of squares. In other words, define $R^2 = \sum_{i=1}^n \hat{y}_i^2 / \sum_{i=1}^n y_i^2$, and this value is clearly between 0 and 1. In fact, dividing (3.10) by $\sum_{i=1}^n y_i^2$ one gets $R^2 = 1 - \sum_{i=1}^n e_i^2 / \sum_{i=1}^n y_i^2$. The $\sum_{i=1}^n e_i^2$ is a measure of misfit which was minimized by least squares. If $\sum_{i=1}^n e_i^2$ is large, this means that the regression is not explaining a lot of the variation in $Y$ and hence, the $R^2$ value would be small. Alternatively, if the $\sum_{i=1}^n e_i^2$ is small, then the fit is good and $R^2$ is large. In fact, for a perfect fit, where all the observations lie on the fitted line, $Y_i = \hat{Y}_i$ and $e_i = 0$, which means that $\sum_{i=1}^n e_i^2 = 0$ and $R^2 = 1$. The other extreme case is where the regression sums of squares $\sum_{i=1}^n \hat{y}_i^2 = 0$. In other words, the linear regression explains nothing of the variation in $Y_i$. In this case, $\sum_{i=1}^n y_i^2 = \sum_{i=1}^n e_i^2$ and $R^2 = 0$. Note that $\sum_{i=1}^n \hat{y}_i^2 = 0$ implies $\hat{y}_i = 0$ for every $i$, which in turn means that $\hat{Y}_i = \bar{Y}$ for every $i$. The fitted regression line is a horizontal line drawn at $Y = \bar{Y}$, and the independent variable $X$ does not have any explanatory power in a linear relationship with $Y$.
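The two expressions for $R^2$ and the two extreme cases can be illustrated with a short computation. This is only a sketch under stated assumptions: it uses NumPy, the function name `r_squared` is hypothetical, and the three data sets are invented to produce an intermediate fit, a perfect fit, and a flat fitted line.

```python
import numpy as np

def r_squared(X, Y):
    """Centered R^2 for a regression of Y on a constant and X, computed two ways."""
    x = X - X.mean()
    y = Y - Y.mean()
    beta_hat = (x @ y) / (x @ x)
    y_hat = beta_hat * x                       # fitted values in deviation form
    e = y - y_hat                              # residuals
    r2_ratio = (y_hat @ y_hat) / (y @ y)       # regression SS / total SS
    r2_resid = 1.0 - (e @ e) / (y @ y)         # 1 - residual SS / total SS
    return r2_ratio, r2_resid

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

Y_noisy = np.array([1.2, 1.9, 3.3, 3.8, 5.1])  # roughly linear in X
Y_exact = 2.0 + 0.5 * X                        # all points on a line: perfect fit
Y_flat = np.array([3.0, 1.0, 2.0, 1.0, 3.0])   # chosen so that sum of x_i * y_i is 0

r2_noisy = r_squared(X, Y_noisy)
r2_exact = r_squared(X, Y_exact)
r2_flat = r_squared(X, Y_flat)

print("noisy line:", r2_noisy)
print("perfect fit:", r2_exact)
print("flat fitted line:", r2_flat)
```

In each case the two formulas return the same value up to rounding. The exact line gives $R^2 = 1$, and the third data set, for which the estimated slope is zero, gives $R^2 = 0$.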

Note that $R^2$ has two alternative meanings: (i) It is the simple squared correlation coefficient between $Y_i$ and $\hat{Y}_i$, see problem 9. Also, for the simple regression case, (ii) it is the simple squared correlation between $X$ and $Y$. This means that before one runs the regression of $Y$ on $X$, one can compute $r_{xy}^2$, which in turn tells us the proportion of the variation in $Y$ that will be explained by $X$. If this number is low, we have a weak linear relationship between $Y$ and $X$, and we know that a poor fit will result if $Y$ is regressed on $X$. It is worth emphasizing that $R^2$ is a measure of linear association between $Y$ and $X$. There could exist, for example, a perfect quadratic relationship between $X$ and $Y$, yet the estimated least squares line through the data is a flat line implying that $R^2 = 0$, see problem 3 of Chapter 2. One should also be suspicious of least squares regressions with $R^2$ that are too close to 1. In some cases, we may not want to include a constant in the regression. In such cases, one should use an uncentered $R^2$ as a measure of fit. The appendix to this chapter defines both centered and uncentered $R^2$ and explains the difference between them.
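Both interpretations of $R^2$, and the quadratic example, can be checked with a few lines of code. The sketch below assumes NumPy; the helper `fit_and_r2` and all data values are hypothetical, and the quadratic case uses $X$ values placed symmetrically around zero so that the least squares slope is exactly zero.

```python
import numpy as np

def fit_and_r2(X, Y):
    """Fit Y on a constant and X by least squares; return (Y_hat, centered R^2)."""
    x = X - X.mean()
    y = Y - Y.mean()
    beta_hat = (x @ y) / (x @ x)
    Y_hat = Y.mean() + beta_hat * x            # predicted values
    e = Y - Y_hat                              # residuals
    r2 = 1.0 - (e @ e) / (y @ y)
    return Y_hat, r2

# Hypothetical roughly linear data (an assumption, not from the text)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([2.3, 2.8, 4.5, 4.4, 6.2, 6.6])

Y_hat, r2 = fit_and_r2(X, Y)
r_xy = np.corrcoef(X, Y)[0, 1]                 # simple correlation between X and Y
r_y_yhat = np.corrcoef(Y, Y_hat)[0, 1]         # simple correlation between Y and Y_hat

print("R^2:", r2)
print("squared correlation of X and Y:", r_xy ** 2)
print("squared correlation of Y and Y_hat:", r_y_yhat ** 2)

# Perfect quadratic relationship with X symmetric around zero
X_sym = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
Y_quad = X_sym ** 2
_, r2_quad = fit_and_r2(X_sym, Y_quad)

print("R^2 for the quadratic data:", r2_quad)
```

For the first data set the three numbers coincide up to rounding. For the quadratic data the fitted line is flat at $\bar{Y}$ and $R^2$ is zero, even though $Y$ is an exact function of $X$.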
