Introduction to the Mathematical and Statistical Foundations of Econometrics
Taylor’s Theorem
The mean value theorem implies that if, for two points a < b, f (a) = f (b),then there exists a point c є [a, b] such that f'(c) = 0. This fact is the core of the proof of Taylor’s theorem:
Theorem II.9(a): Let f (x) be an n-times, continuously differentiable realfunction on an interval [a, b] with the nth derivative denoted by f(n)(x). For any pair of points x, x0 e [a, b] there exists a X e [0, 1] such that
where Rn is the remainder term. Now let a < x0 < x < b be fixed, and consider the function
^ ГҐ ГҐ ^ (x — u ^ vik)f Rn(x — u T
g(u) = f(x) — f (u) — ^ — ----------------- f( )(u) — (x_^ )И
with derivative
Then g(x) = g(x0) = 0; hence, there exists a point c e [x0, x] such that g/(c) = 0 :
Therefore,
Rn = ——f (n)(c) = ——f(n) (x0 +X(x — x0)), (II.13)
n! n!
where c = x0 + к (x - x0). If we combine (11.12) and (11.13), the theorem follows. Q. E.D.
Also, Taylor’s theorem carries over to real functions of more than one variable, but the result involved is awkward to display for n > 2. Therefore, we only state the second-order Taylor expansion theorem involved:
Theorem II.9(b): Let f (x) be a twice continuously differentiable real function on a convex subset S of Rn. For any pair of points x, x0 є S there exists a к є [0, 1] such that
II.4. Optimization
Theorem II.9(b) shows that the function f (x) involved is locally quadratic. Therefore, the conditions for a maximum or a minimum of f (x) in a point xo є S can be derived from (II.14) and the following theorem.
Theorem II.10: Let Cbea symmetric n x n matrix, and let f (x) = a + xTb + xTCx, є Rn, where a is a given scalar and b is a given vector in Rn. If C is positive (negative) definite, then f (x) takes a unique minimum (maximum) at x = -1/2 C-1 b.
Proof: The first-order condition for a maximum or minimum is df (x)/d xT = o^ Rn); hence, x = -1/2C-1 b. As to the uniqueness issue and the question of whether the optimum is a minimum or a maximum, recall that C = QA QT, where Л is the diagonal matrix of the eigenvalues of C and Q is the corresponding matrix of eigenvectors. Thus, we can write f (x) as f (x) = a + xT QQTb + xTQAQTx. Let y = QTx = (yi, yn)T and в = QTb =
(в1,..., fn)T Then f(Qy) = a + yJp + yTA y = a + ї™yj + kj yj). The latter is a sum of quadratic functions in one variable that each have a unique minimum if к j > o and a unique maximum if к j < o. Q. E.D.
It follows now from (II.14) and Theorem П. Ш that
Theorem II.11: The function f (x) in Theorem II.9(b) takes a local maximum (minimum) in a point xo є S, that is, xo is contained in an open subset S o of S such that, for all x є So{xo}, f (x) < f (xo)( f (x) > f (xo)) if and only if df (xo)/d xj = o^ Rn), and the matrix d2 f (xo)/(9 xo 9 xj) is negative (positive) definite.
A practical application of Theorems 11.8(a), II.9, and II.10 is the Newton iteration for finding a minimum or a maximum of a function. Suppose that the function f (x) in Theorem II.9(b) takes a unique global maximum at x* є s. Starting from an initial guess x0 of x*, for k > 0 let,
Thus, the Newton iteration maximizes or minimizes the local quadratic approximation of f in xk. The iteration is stopped if for some small threshold
Є > || xk+1 — xk || < Є