Uniform Continuity
A function g on $\mathbb{R}^k$ is called uniformly continuous if for every $\varepsilon > 0$ there exists a $\delta > 0$ such that $|g(x) - g(y)| < \varepsilon$ if $\|x - y\| < \delta$. In particular,
Theorem II.7: If a function g is continuous on a compact subset $\Theta$ of $\mathbb{R}^k$, then it is uniformly continuous on $\Theta$.
Proof: Let $\varepsilon > 0$ be arbitrary, and observe from the continuity of g that, for each x in $\Theta$, there exists a $\delta(x) > 0$ such that $|g(x) - g(y)| < \varepsilon/2$ if $\|x - y\| < 2\delta(x)$. Now let $U(x) = \{y \in \mathbb{R}^k : \|y - x\| < \delta(x)\}$. Then the collection $\{U(x),\, x \in \Theta\}$ is an open covering of $\Theta$; hence, by compactness of $\Theta$ there exists a finite number of points $\theta_1, \ldots, \theta_n$ in $\Theta$ such that $\Theta \subset \bigcup_{j=1}^{n} U(\theta_j)$. Next, let $\delta = \min_{1 \le j \le n} \delta(\theta_j)$. Each point $x \in \Theta$ belongs to at least one of the open sets $U(\theta_j)$: $x \in U(\theta_j)$ for some j. Then $\|x - \theta_j\| < \delta(\theta_j) < 2\delta(\theta_j)$ and hence $|g(x) - g(\theta_j)| < \varepsilon/2$. Moreover, if $\|x - y\| < \delta$, then
$$\|y - \theta_j\| = \|y - x + x - \theta_j\| \le \|x - y\| + \|x - \theta_j\| < \delta + \delta(\theta_j) \le 2\delta(\theta_j);$$
hence, $|g(y) - g(\theta_j)| < \varepsilon/2$. Consequently, $|g(x) - g(y)| \le |g(x) - g(\theta_j)| + |g(y) - g(\theta_j)| < \varepsilon$ if $\|x - y\| < \delta$. Q.E.D.
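To see why compactness matters here (a standard illustration rather than part of the theorem), take $g(x) = x^2$ on $\mathbb{R}$. For any fixed $\delta > 0$,
$$|g(x + \delta/2) - g(x)| = \left|x\delta + \delta^2/4\right| \to \infty \quad \text{as } x \to \infty,$$
so no single $\delta$ works for all x simultaneously, and g is continuous but not uniformly continuous on $\mathbb{R}$. On the compact interval $[0, 1]$, by contrast, $|g(x) - g(y)| = |x + y|\,|x - y| \le 2|x - y|$, so $\delta = \varepsilon/2$ suffices, in accordance with Theorem II.7.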
II.3. Derivatives of Vector and Matrix Functions
Consider a real function $f(x) = f(x_1, \ldots, x_n)$ on $\mathbb{R}^n$, where $x = (x_1, \ldots, x_n)^{\mathrm{T}}$. Recall that the partial derivative of f with respect to a component $x_i$ of x is denoted and defined by
$$\frac{\partial f(x)}{\partial x_i} = \frac{\partial f(x_1, \ldots, x_n)}{\partial x_i} \stackrel{\mathrm{def}}{=} \lim_{\delta \to 0} \frac{f(x_1, \ldots, x_{i-1}, x_i + \delta, x_{i+1}, \ldots, x_n) - f(x_1, \ldots, x_n)}{\delta}.$$
For example, let $f(x) = \beta^{\mathrm{T}}x = x^{\mathrm{T}}\beta = \beta_1 x_1 + \cdots + \beta_n x_n$. Then
$$\left(\frac{\partial f(x)}{\partial x_1}, \ldots, \frac{\partial f(x)}{\partial x_n}\right)^{\mathrm{T}} = (\beta_1, \ldots, \beta_n)^{\mathrm{T}} = \beta.$$
This result could also have been obtained by treating $x^{\mathrm{T}}$ as a scalar and taking the derivative of $f(x) = x^{\mathrm{T}}\beta$ with respect to $x^{\mathrm{T}}$: $\partial(x^{\mathrm{T}}\beta)/\partial x^{\mathrm{T}} = \beta$. This motivates the convention to denote the column vector of partial derivatives of $f(x)$ by $\partial f(x)/\partial x^{\mathrm{T}}$. Similarly, if we treat x as a scalar and take the derivative of $f(x) = \beta^{\mathrm{T}}x$ with respect to x, then the result is a row vector: $\partial(\beta^{\mathrm{T}}x)/\partial x = \beta^{\mathrm{T}}$. Thus, in general,
$$\frac{\partial f(x)}{\partial x^{\mathrm{T}}} \stackrel{\mathrm{def}}{=} \begin{pmatrix} \partial f(x)/\partial x_1 \\ \vdots \\ \partial f(x)/\partial x_n \end{pmatrix}, \qquad \frac{\partial f(x)}{\partial x} \stackrel{\mathrm{def}}{=} \bigl(\partial f(x)/\partial x_1, \ldots, \partial f(x)/\partial x_n\bigr).$$
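As a quick numerical check of these conventions, the following sketch (illustrative code, not part of the original text; NumPy and the helper name grad_column are assumptions of the sketch) approximates the column vector $\partial f(x)/\partial x^{\mathrm{T}}$ by central differences and compares it with $\beta$ for $f(x) = \beta^{\mathrm{T}}x$:

```python
import numpy as np

def grad_column(f, x, h=1e-6):
    """Finite-difference approximation of the column vector df(x)/dx^T."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)  # central difference for df/dx_i
    return g  # column convention: df(x)/dx^T; its transpose is the row vector df(x)/dx

beta = np.array([1.0, -2.0, 0.5])
f = lambda x: beta @ x                 # f(x) = beta'x = x'beta
x0 = np.array([0.3, 1.2, -0.7])

print(grad_column(f, x0))              # approximately equal to beta
```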
If the function H is vector valued, for instance $H(x) = (h_1(x), \ldots, h_m(x))^{\mathrm{T}}$, $x \in \mathbb{R}^n$, then applying the operation $\partial/\partial x$ to each of the components yields an $m \times n$ matrix:
$$\frac{\partial H(x)}{\partial x} = \begin{pmatrix} \partial h_1(x)/\partial x \\ \vdots \\ \partial h_m(x)/\partial x \end{pmatrix} = \begin{pmatrix} \partial h_1(x)/\partial x_1 & \cdots & \partial h_1(x)/\partial x_n \\ \vdots & \ddots & \vdots \\ \partial h_m(x)/\partial x_1 & \cdots & \partial h_m(x)/\partial x_n \end{pmatrix}.$$
Moreover, applying the latter to the column vector of partial derivatives of a real function f yields
$$\frac{\partial\bigl(\partial f(x)/\partial x^{\mathrm{T}}\bigr)}{\partial x} = \begin{pmatrix} \dfrac{\partial^2 f(x)}{\partial x_1 \partial x_1} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f(x)}{\partial x_n \partial x_1} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_n \partial x_n} \end{pmatrix} = \frac{\partial^2 f(x)}{\partial x\, \partial x^{\mathrm{T}}},$$
for instance.
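The shapes implied by these conventions can be verified numerically. The sketch below (illustrative code only; the helper names jacobian and hessian are ad hoc) builds the $m \times n$ matrix $\partial H(x)/\partial x$ and the $n \times n$ matrix $\partial^2 f(x)/\partial x\,\partial x^{\mathrm{T}}$ by finite differences for arbitrary example functions:

```python
import numpy as np

def jacobian(H, x, h=1e-6):
    """m x n matrix dH(x)/dx: row i holds the row vector dh_i(x)/dx."""
    x = np.asarray(x, dtype=float)
    m = np.atleast_1d(H(x)).size
    J = np.zeros((m, x.size))
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        J[:, j] = (np.atleast_1d(H(x + e)) - np.atleast_1d(H(x - e))) / (2 * h)
    return J

def hessian(f, x, h=1e-4):
    """n x n matrix d^2 f(x) / (dx dx^T) via central differences."""
    x = np.asarray(x, dtype=float)
    n = x.size
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n)
            ej = np.zeros(n)
            ei[i] = h
            ej[j] = h
            out[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                         - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return out

H = lambda x: np.array([x[0] * x[1], np.sin(x[2]), x[0] ** 2])   # m = 3 components
f = lambda x: x[0] ** 2 * x[1] + np.exp(x[2])                    # real-valued

x0 = np.array([0.5, -1.0, 0.25])
print(jacobian(H, x0).shape)    # (3, 3): an m x n matrix
print(hessian(f, x0))           # symmetric n x n matrix of second derivatives
```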
In the case of an $m \times n$ matrix X with columns $x_1, \ldots, x_n \in \mathbb{R}^m$, $x_j = (x_{1,j}, \ldots, x_{m,j})^{\mathrm{T}}$, and a differentiable function $f(X)$ on the vector space of $m \times n$ matrices, we may interpret $X = (x_1, \ldots, x_n)$ as a "row" of column vectors, and thus
$$\frac{\partial f(X)}{\partial X} \stackrel{\mathrm{def}}{=} \frac{\partial f(X)}{\partial(x_1, \ldots, x_n)} \stackrel{\mathrm{def}}{=} \begin{pmatrix} \partial f(X)/\partial x_1 \\ \vdots \\ \partial f(X)/\partial x_n \end{pmatrix}$$
is an $n \times m$ matrix. For the same reason, $\partial f(X)/\partial X^{\mathrm{T}} = \bigl(\partial f(X)/\partial X\bigr)^{\mathrm{T}}$. An example of such a derivative with respect to a matrix is given by Theorem I.33 in Appendix I, which states that if X is a square nonsingular matrix, then $\partial \ln[\det(X)]/\partial X = X^{-1}$.
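This result, too, is easy to check numerically. In the sketch below (illustrative code, not from the text), the $(j, i)$ entry of $\partial f(X)/\partial X$ is taken to be $\partial f(X)/\partial x_{i,j}$, in line with the "row of column vectors" convention above, and the resulting matrix is compared with $X^{-1}$ for $f(X) = \ln[\det(X)]$:

```python
import numpy as np

def matrix_derivative(f, X, h=1e-6):
    """n x m matrix df(X)/dX with (j, i) entry df(X)/dX[i, j]."""
    m, n = X.shape
    D = np.zeros((n, m))
    for i in range(m):
        for j in range(n):
            E = np.zeros_like(X)
            E[i, j] = h
            D[j, i] = (f(X + E) - f(X - E)) / (2 * h)
    return D

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 4)) + 4 * np.eye(4)      # well-conditioned, nonsingular
logdet = lambda M: np.linalg.slogdet(M)[1]       # ln|det(M)|

# maximum deviation from X^{-1} should be close to zero
print(np.max(np.abs(matrix_derivative(logdet, X) - np.linalg.inv(X))))
```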
Next, consider the quadratic function $f(x) = a + x^{\mathrm{T}}b + x^{\mathrm{T}}Cx$, where
$$C = \begin{pmatrix} c_{1,1} & \cdots & c_{1,n} \\ \vdots & \ddots & \vdots \\ c_{n,1} & \cdots & c_{n,n} \end{pmatrix} \quad \text{with} \quad c_{i,j} = c_{j,i}.$$
Thus, C is a symmetric matrix. Then
$$\begin{aligned}
\frac{\partial f(x)}{\partial x_k} &= \frac{\partial \sum_{i=1}^{n} x_i b_i}{\partial x_k} + \frac{\partial \sum_{i=1}^{n}\sum_{j=1}^{n} x_i c_{i,j} x_j}{\partial x_k} \\
&= b_k + 2 c_{k,k} x_k + \sum_{\substack{i=1 \\ i \ne k}}^{n} x_i c_{i,k} + \sum_{\substack{j=1 \\ j \ne k}}^{n} c_{k,j} x_j \\
&= b_k + 2 \sum_{j=1}^{n} c_{k,j} x_j, \qquad k = 1, \ldots, n;
\end{aligned}$$
hence, stacking these partial derivatives in a column vector yields
$$\partial f(x)/\partial x^{\mathrm{T}} = b + 2Cx. \tag{II.8}$$
If C is not symmetric, we may without loss of generality replace C in the function $f(x)$ by the symmetric matrix $C/2 + C^{\mathrm{T}}/2$ because $x^{\mathrm{T}}Cx = (x^{\mathrm{T}}Cx)^{\mathrm{T}} = x^{\mathrm{T}}C^{\mathrm{T}}x$, and thus
$$\partial f(x)/\partial x^{\mathrm{T}} = b + Cx + C^{\mathrm{T}}x.$$
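Both (II.8) and the expression for nonsymmetric C are easily verified numerically. The following sketch (illustrative code only, with ad hoc names) compares a central-difference gradient of $f(x) = a + x^{\mathrm{T}}b + x^{\mathrm{T}}Cx$ with $b + Cx + C^{\mathrm{T}}x$, which reduces to $b + 2Cx$ when C is symmetric:

```python
import numpy as np

def grad_column(f, x, h=1e-6):
    """Finite-difference column vector df(x)/dx^T."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

rng = np.random.default_rng(1)
n = 4
a, b = 2.0, rng.normal(size=n)
C = rng.normal(size=(n, n))                  # not necessarily symmetric
f = lambda x: a + b @ x + x @ C @ x          # f(x) = a + x'b + x'Cx
x0 = rng.normal(size=n)

print(np.allclose(grad_column(f, x0), b + C @ x0 + C.T @ x0, atol=1e-5))  # True

Cs = (C + C.T) / 2                           # symmetric case: gradient is b + 2Cx
fs = lambda x: a + b @ x + x @ Cs @ x
print(np.allclose(grad_column(fs, x0), b + 2 * Cs @ x0, atol=1e-5))       # True
```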
The result (II.8) for the case b = 0 can be used to give an interesting alternative interpretation of eigenvalues and eigenvectors of symmetric matrices, namely, as the solutions of a quadratic optimization problem under quadratic restrictions. Consider the optimization problem
$$\max \text{ or } \min\ x^{\mathrm{T}}Ax \quad \text{s.t.} \quad x^{\mathrm{T}}x = 1, \tag{II.9}$$
where A is a symmetric matrix and “max” and “min” include local maxima and minima and saddle-point solutions. The Lagrange function for solving this problem is
$$\mathcal{L}(x, \lambda) = x^{\mathrm{T}}Ax + \lambda(1 - x^{\mathrm{T}}x)$$
with first-order conditions
$$\partial \mathcal{L}(x, \lambda)/\partial x^{\mathrm{T}} = 2Ax - 2\lambda x = 0 \;\Rightarrow\; Ax = \lambda x, \tag{II.10}$$
$$\partial \mathcal{L}(x, \lambda)/\partial \lambda = 1 - x^{\mathrm{T}}x = 0 \;\Rightarrow\; \|x\| = 1. \tag{II.11}$$
Condition (II.10) defines the Lagrange multiplier $\lambda$ as the eigenvalue and the solution for x as the corresponding eigenvector of A, and (II.11) is the normalization of the eigenvector to unit length. If we combine (II.10) and (II.11), it follows that $\lambda = x^{\mathrm{T}}Ax$.
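This interpretation can be illustrated numerically. In the sketch below (illustrative code; the symmetric matrix A is randomly generated), np.linalg.eigh supplies unit-length eigenvectors for which $\lambda = x^{\mathrm{T}}Ax$ holds, and the smallest and largest eigenvalues bound $x^{\mathrm{T}}Ax$ over random unit vectors, as the constrained problem (II.9) requires:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
B = rng.normal(size=(n, n))
A = (B + B.T) / 2                       # symmetric matrix

eigvals, eigvecs = np.linalg.eigh(A)    # columns of eigvecs have unit length

# lambda = x'Ax at each normalized eigenvector (combining (II.10) and (II.11))
for lam, x in zip(eigvals, eigvecs.T):
    assert np.isclose(x @ A @ x, lam)

# x'Ax over random unit vectors lies between the smallest and largest eigenvalue,
# which are the constrained minimum and maximum in (II.9)
Z = rng.normal(size=(10_000, n))
U = Z / np.linalg.norm(Z, axis=1, keepdims=True)
vals = np.einsum('ij,jk,ik->i', U, A, U)
print(eigvals.min() <= vals.min(), vals.max() <= eigvals.max())   # True True
```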
Figure II.1. The mean value theorem.