A COMPANION TO Theoretical Econometrics
Collinearity in nonlinear regression models
A problem arises when examining Z(β) for collinearity: Z(β) depends not only on the data matrix X but also on the parameter values β. Thus collinearity changes from point to point in the parameter space, and the degree of collinearity among the columns of the data matrix X may or may not correspond to collinearity in Z(β). This problem affects nonlinear regression in two ways. First, the Gauss-Newton algorithm itself may be affected by collinearity in Z(β), because at each iteration the cross-product matrix Z(β_n)'Z(β_n) must be inverted. If the columns of Z(β_n) are highly collinear then the cross-product matrix may be difficult to invert, at which point the algorithm may fail. Second, the estimated asymptotic covariance matrix of the nonlinear least squares estimator, equation (12.24), contains the cross-product matrix Z(b)'Z(b), and thus the estimated variances and covariances suffer from the usual consequences of collinearity, depending on the relationships between the columns of Z(b). Computer software packages, such as SAS 6.12, compute and report the BKW diagnostics for the matrix Z(β_n)'Z(β_n) when the Gauss-Newton algorithm fails, so that the user may try to determine the source of the very nearly exact collinearity that leads to the failure; they also compute the conditioning diagnostics for Z(b)'Z(b) upon convergence of the algorithm. There remains, of course, the collinearity among the columns of Z(β), which enters the true asymptotic covariance matrix of the nonlinear least squares estimator in equation (12.23), and which remains unknown.
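The first of these two effects can be sketched in a few lines of Python. The code below is only an illustration under stated assumptions, not the routine used by any particular package: the function name gauss_newton, the threshold max_cond, the exponential model, and the noiseless data are all hypothetical choices made for the example. At each iteration it forms the derivative matrix Z at the current parameter values, checks its condition number, and solves the normal equations for the step; when Z is too ill-conditioned it stops and reports failure instead of inverting the cross-product matrix.

```python
import numpy as np

def gauss_newton(f, jac, beta0, y, max_cond=1e8, tol=1e-10, max_iter=50):
    """Plain Gauss-Newton without step-size control (illustrative sketch).

    Stops early if the derivative matrix Z at the current iterate is so
    ill-conditioned that solving with Z'Z would be unreliable.
    """
    beta = np.asarray(beta0, dtype=float)
    cond = np.nan
    for n in range(max_iter):
        Z = jac(beta)                      # T x K matrix of first derivatives
        cond = np.linalg.cond(Z)           # largest / smallest singular value
        if not np.isfinite(cond) or cond > max_cond:
            return {"beta": beta, "converged": False, "iterations": n, "cond": cond}
        resid = y - f(beta)
        # Gauss-Newton step: solve (Z'Z) step = Z'resid
        step = np.linalg.solve(Z.T @ Z, Z.T @ resid)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            return {"beta": beta, "converged": True, "iterations": n + 1, "cond": cond}
    return {"beta": beta, "converged": False, "iterations": max_iter, "cond": cond}

# Hypothetical model f(x, b) = b0 * exp(b1 * x) with noiseless, made-up data.
x = np.linspace(0.0, 2.0, 20)
f = lambda b: b[0] * np.exp(b[1] * x)
jac = lambda b: np.column_stack([np.exp(b[1] * x), b[0] * x * np.exp(b[1] * x)])
y = f(np.array([2.0, 0.5]))

# Reasonable starting values: the iterations settle on (2.0, 0.5).
fit = gauss_newton(f, jac, beta0=[1.8, 0.45], y=y)

# Starting at b0 = 0 makes the second column of Z identically zero, so
# Z'Z is singular and the sketch reports failure at the first iteration.
bad = gauss_newton(f, jac, beta0=[0.0, 0.45], y=y)
```

The second call shows how the failure depends on where in the parameter space Z is evaluated, not on the data alone: the same x values give a well-behaved Z at one starting point and a singular one at another.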
What do we do if collinearity, or ill-conditioning, of Z(β_n) causes the Gauss-Newton algorithm to fail to converge? The conditioning of Z(β_n) can be affected by scaling the data. One common problem is that the columns of Z(β_n) have greatly different magnitudes. Recall that Z(β_n) contains the first derivatives of the function evaluated at β_n, so the magnitudes in Z(β_n) are slopes of the function f(X, β). If these are greatly different then the function is steep in some directions and shallow in others. Such an irregular surface is difficult to work with. By rescaling the columns of X, it is sometimes possible to more nearly equalize the magnitudes of the columns of Z(β_n), meaning that the function f(X, β) itself has been smoothed. This is usually advantageous.
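A small numerical sketch may make the effect of rescaling concrete. The model f(x, b) = b0 exp(b1 x), the parameter values, and the data below are hypothetical and chosen only for illustration. Measuring x in thousands and multiplying b1 by 1000 leaves the fitted function unchanged, but it brings the two columns of the derivative matrix to comparable magnitudes and lowers its condition number.

```python
import numpy as np

def jacobian(x, b0, b1):
    """Derivatives of b0 * exp(b1 * x) with respect to b0 and b1."""
    return np.column_stack([np.exp(b1 * x), b0 * x * np.exp(b1 * x)])

# Made-up regressor in its original units, and the same regressor in thousands.
x_raw = np.linspace(100.0, 2000.0, 20)
x_scaled = x_raw / 1000.0

# Same function in both parameterizations: 0.0005 * x_raw == 0.5 * x_scaled.
Z_raw = jacobian(x_raw, 2.0, 0.0005)
Z_scaled = jacobian(x_scaled, 2.0, 0.5)

# Column lengths show how unequal the slopes are before rescaling.
norms_raw = np.linalg.norm(Z_raw, axis=0)
norms_scaled = np.linalg.norm(Z_scaled, axis=0)

cond_raw = np.linalg.cond(Z_raw)
cond_scaled = np.linalg.cond(Z_scaled)

print("column norms, raw units:   ", norms_raw)
print("column norms, rescaled x:  ", norms_scaled)
print("condition number, raw:     ", cond_raw)
print("condition number, rescaled:", cond_scaled)
```

In this example the second column of the raw matrix is exactly 1000 times the second column of the rescaled one, which is what drives the difference in conditioning.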
When computing the BKW diagnostics the columns of Z(β_n) should be scaled to unit length. If, after the data are scaled, the condition number of Z(β_n) is still large, closer examination of the function, data, and parameter values is required. To illustrate, Greene (1997, p. 456) and Davidson and MacKinnon (1993, pp. 181-6) give the example of the nonlinear consumption function C = α + βY^γ + e, where C is consumption and Y is aggregate income. For this model the t-th row of Z(β) is [1  Y_t^γ  βY_t^γ ln Y_t]. What happens if during the Gauss-Newton iterations the value of γ approaches zero? The second column approaches 1, and is collinear with the first column. What happens if β → 0? Then the third column approaches 0, making Z(β) ill-conditioned. In these cases collinearity is avoided by avoiding these parameter values, perhaps by selecting starting values wisely. For a numerical example see Greene (1997, pp. 456-8). There are alternative algorithms to use when convergence is a problem in nonlinear least squares regression. It is very useful to be aware of the alternatives offered by your software, as some may perform better than others in any given problem. See Greene (1997, ch. 5).
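The behavior of the consumption function example can be checked numerically. The following is a minimal sketch, assuming made-up income values and hypothetical helper names (z_matrix, condition_indexes); it is not the numerical example from Greene. It builds the derivative matrix with rows [1, Y^γ, βY^γ ln Y], scales its columns to unit length, and computes the condition indexes as ratios of the largest singular value to each singular value, the largest of which is the condition number.

```python
import numpy as np

def z_matrix(Y, beta, gamma):
    """Derivatives of alpha + beta * Y**gamma w.r.t. (alpha, beta, gamma)."""
    Yg = Y ** gamma
    return np.column_stack([np.ones_like(Y), Yg, beta * Yg * np.log(Y)])

def condition_indexes(Z):
    """Condition indexes of Z after scaling each column to unit length.

    Assumes no column of Z is identically zero.
    """
    Zs = Z / np.linalg.norm(Z, axis=0)
    s = np.linalg.svd(Zs, compute_uv=False)   # singular values, descending
    return s[0] / s

# Made-up income series, for illustration only.
Y = np.linspace(100.0, 1000.0, 30)

# Scaled condition number at a moderate gamma and at gamma near zero.
cond_moderate = condition_indexes(z_matrix(Y, 1.0, 1.0)).max()
cond_gamma_small = condition_indexes(z_matrix(Y, 1.0, 1e-3)).max()

# Unscaled condition number as beta shrinks toward zero.
cond_raw_beta_one = np.linalg.cond(z_matrix(Y, 1.0, 1.0))
cond_raw_beta_small = np.linalg.cond(z_matrix(Y, 1e-10, 1.0))

print("scaled condition number, gamma = 1:    ", cond_moderate)
print("scaled condition number, gamma = 0.001:", cond_gamma_small)
print("raw condition number, beta = 1:        ", cond_raw_beta_one)
print("raw condition number, beta = 1e-10:    ", cond_raw_beta_small)
```

One design point is worth noting. Because scaling to unit length divides the third column by its own length, a small but nonzero β does not by itself raise the scaled condition number; in this sketch the effect of β → 0 therefore appears in the unscaled matrix, while the effect of γ → 0 appears even after scaling.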