Math 55b: Honors Advanced Calculus and Linear Algebra (Spring 1999-2000)

Lecture notes for Math 55b: Honors Advanced Calculus and Linear Algebra (Spring 1999-2000)

If you find a mistake, omission, etc., please let me know by e-mail.

The orange balls mark our current location in the course, and the current problem set.

To develop univariate differential calculus, we largely follow Rudin, chapter 5.

As Rudin notes at the end of this chapter, differentiation can also be defined for vector-valued functions of one real variable. As Rudin does not note, the vector space can even be infinite-dimensional, provided that it is normed; and the basic algebraic properties of the derivative listed in Thm. 5.3 (p.104) can be adapted to this generality, e.g. the formula (fg)'=f'g+fg' still holds if f,g take values in normed vector spaces U,V and multiplication is interpreted as a continuous bilinear map from U x V to some other normed vector space W.
NB The norm does not have to come from an inner product structure. Often this does not matter because we work in finite dimensional vector spaces, where all norms are equivalent, and changing to an equivalent norm does not affect the definition of the derivative. The one exception to this is Thm. 5.19 (p.113) where one needs the norm exactly rather than up to a constant factor. This theorem still holds for a general norm but requires an additional argument. The key ingredient of the proof is this: given a nonzero vector z in a vector space V, we want a continuous functional w on V such that ||w||=1 and w(z)=|z|. If V is an inner product space (finite-dimensional or not), the inner product with z/|z| provides such a functional w. But this approach does not work in general. The existence of such w is usually proved as a corollary of the Hahn-Banach theorem. When V is finite dimensional, w can be constructed by induction on the dimension of V. To deal with the general case one must also invoke the Axiom of Choice in its usual guise of Zorn's Lemma.

We next start on univariate integral calculus, following Rudin, chapter 6. The following gives some motivation for the definitions there. (And yes, it's the same Riemann who gave number theorists like me the Riemann zeta function and the Riemann hypothesis.)

The Riemann-sum approach to integration goes back to the ``method of exhaustion'' of classical Greek geometry, in which the area of a plane figure (or the volume of a region in space) is bounded below and above by finding subsets and supersets that are finite unions of disjoint rectangles (or boxes). The lower and upper Riemann sums adapt this idea to the integrals of functions which may be negative as well as positive (one of the weaknesses of geometric Greek mathematics is that the ancient Greeks had no concept of negative quantities -- nor, for that matter, of zero). You may have encountered the quaint technical term ``quadrature'', used in some contexts as a synonym for ``integration''. This too is an echo of the geometrical origins of integration. ``Quadrature'' literally means ``squaring'', meaning not ``multiplying by itself'' but ``constructing a square of the same size as'', which is equivalent to ``finding the area of'' -- cf. the phrase ``squaring the circle''. For instance, Greek geometry contains a theorem equivalent to the integration of x²dx; this result is called the ``quadrature of the parabola''. The proof is tantamount to the evaluation of lower and upper Riemann sums for the integral of x²dx.
An alternative explanation of the upper and lower Riemann sums is that they arise by repeated application of the following two axioms describing the integral:

For any a,b,c (with a < b < c), the integral of a function from a to c is the sum of its integrals from a to b and from b to c;
If a function f on [a,b] takes values in [m,M] then its integral from a to b is in [m(b-a), M(b-a)] (again assuming a < b).
The latter axiom is a consequence of the following two: the integral of a constant function from a to b is that constant times the length b-a of the interval [a,b]; and if f<=g on some interval then the integral of f over that interval does not exceed the integral of g. Note that again all these axioms arise naturally from an interpretation of the integral as a ``signed area''.
Recall the ``integration by parts'' identity: fg is the integral of f dg + g df. The Stieltjes integral is a way of making sense of this identity even when f and/or g is not continuously differentiable. To be sure, some hypotheses on f and g must still be made for the Stieltjes integral of f dg to make sense. Rudin specifies one suitable system of such hypotheses.
Here's a version of Riemann-Stieltjes integrals that works cleanly for integrating bounded functions from [a,b] to any complete normed vector space: vint.pdf, vint.ps.
Riemann-Stieltjes integration by parts: Suppose both f and g are increasing functions on [a,b]. For any partition a = x₀ < ... < x_n = b of the interval, write f(b)g(b) - f(a)g(a) as the telescoping sum of f(x_i)g(x_i) - f(x_i-1)g(x_i-1) from i=1 to n. Now rewrite the i-th summand as f(x_i) (g(x_i)-g(x_i-1)) + g(x_i-1) (f(x_i)-f(x_i-1)). [Naturally it is no accident that this identity resembles the one used in the familiar proof of the formula for the derivative of fg!] Summing this over i yields the upper Riemann-Stieltjes sum for the integral of f dg plus the lower R.-S. sum for the integral of g df. Therefore: if one of these integrals exists, so does the other, and their sum is f(b)g(b) - f(a)g(a). [Cf. Rudin, page 141, Exercise 17.]

Most of Chapter 7 of Rudin we've covered already in the 55a lectures and problem sets. For more counterexamples along the lines of the first section of that chapter, see Counterexamples in Analysis by B.R.Gelbaum and J.M.H.Olsted -- there are two copies in the science center library (QA300.G4). After covering a few odds and ends from this chapter, we'll proceed to power series and the exponential and logarithmic functions in Chapter 8; we'll use these to derive the binomial theorem for arbitrary exponents. This will give us a way to simplify a key ingredient in the Stone-Weierstrass theorem, which is the one major result of Chapter 7 we haven't seen yet. We postpone discussion of Euler's Beta and Gamma integrals (also in Chapter 8) so that we can use multivariate integration to give a more direct proof of the formula relating them.

The result mentioned several times in our Feb.14 meeting re the convergence of alternating series is stated and proved on pages 70-71 of Rudin (Theorem 3.42).

The original Weierstrass approximation theorem (7.26 in Rudin) can be reduced to the uniform approximation of the single function |x| on [-1,1]. From this function we can construct an arbirtrary piecewise linear continuous function, and such P.L. functions uniformly approximate any continuous function on a closed interval. To get at |x|, we'll rewrite it as [1-(1-x²)]^1/2, and use the power series for (1-X)^1/2. See the first part of exercise 22 for chapter 8, on p.201, for this power series (and more generally for the power series for (1-X)^a). We need (1-X)^1/2 to be approximated by its power series uniformly on the closed interval [-1,1] (or at least [0,1]); but fortunately this too follows from the proof of Abel's theorem (8.2, pages 174-5). Actually this is a subtler result than we need, since the xⁿ coefficient of the power series for (1-X)^1/2 is negative for every n>0. If a power series in X has radius of convergence 1 and all but finitely many of its nonzero coefficients have the same sign, then it is easily shown that the sum of the coefficients converges if and only if f(X) has a finite limit as X approaches 1, in which case the sum equals that limit and the power series converges uniformly on [0,1]. That's all we need because clearly (1-X)^1/2 extends to a continuous function on [0,1]. (For an alternative approach to uniformly approximating |x|, see exercise 23 on p.169.)

An explicit upper bound of 4 on Pi can be obtained by calculating cos(2) < 1 - 2^2/2! + 2^4/4! = -1/3 < 0. For much better estimates, integrate (x-x²)⁴ dx/(x²+1) from 0 to 1 and note that 0 < 1/(x²+1) <= 1. :-)

We next begin multivariate differential calculus, starting at the middle of Rudin Chapter 9 (since the first part of that chapter is for us a review of linear algebra). Again, Rudin works with functions from open subsets of Rⁿ to R^m, but must of the discussion works equally well with the target space R^m replaced by an arbitrary normed vector space V. As in the univariate case, proving the Mean Value Theorem in the multivariate context (Theorem 9.19) requires either that V have an inner-product norm, or the use of the Hahn-Banach theorem to construct suitable functionals on V. Once this is done, the key Theorem 9.21 can also be proved for functions to V, and without first doing the case m=1. To do this, first prove the result in the special case when each D_j f(x) vanishes; then reduce to this case by subtracting from f the linear map from Rⁿ to V indicated by the partial derivatives D_j f(x).

The Inverse function theorem (9.24) is a special case of the Implicit function theorem (9.28), and its proof amounts to specializing the proof of the implicit function theorem. But Rudin proves the Implicit theorem as a special case of the Inverse theorem, so we have to do Inverse first. (NB of course for these two theorems we do need that our target space is finite-dimensional.) The proof of the second part of the implicit function theorem, which asserts that the implicit function g not only exists but is also continuously differentiable with derivative at b given by formula (58) (p.225), can be done more easily using the chain rule, since g has been constructed as the composition of the following three functions: first, send y to (0,y); then, apply the inverse function F^-1; finally, project the resulting vector (x,y) to x. The first and last of these three functions are linear, so certainly C¹; and the continuous differentiability of F^-1 comes from the inverse function theorem.

Here's an approach to D_ij=D_ji that works for a C² function to an arbitrary normed space. As in Rudin (see p.235) we reduce to the case of a function of two variables, and define u and Delta. Assume first that D₂₁ f vanishes at (a,b). Then use the Fundamental Theorem of Calculus to write Delta(f,Q) as as the integral of u'(t)dt on [a,a+h], and then write u'(t) as an integral of D₂₁ f(t,b+u)du on [0,k]. Conclude that u'(t)=o(k) and thus that Delta(f,Q) / hk approaches zero. Now apply this to the function f-xyD₂₁ f(x,y) to see that in general Delta(f,Q) / hk approaches D₂₁f(x,y). Do the same in reverse order to conclude that D₂₁ f(x,y)=D₁₂ f(x,y).

The next topic is multivariate integral calculus, covered in Chapter 10. After going over the basic definitions and properties through the change-of-variable formula, we take a detour back to Chapter 8 to treat Euler's Beta and Gamma functions, which are a fundamental tool in the evaluation of definite integrals and even in many identities and asymptotic formulas which do not seem to involve integration at all. The canonical approach to the identity

Beta(x,y) = Gamma(x) Gamma(y) / Gamma(x+y)

is through change of variables in a double integral, as outlined in the exercises from Chapter 10 assigned for this week; but the approach in Chapter 8 via logarithmic convexity and the Bohr-Mollerup theorem has its own virtues: it yields some integral identities which cannot be obtained directly from double integration, as well as the product formula from which follow such results as

Gamma(x) Gamma(1-x) = Pi / sin(Pi*x).

This last is also the only reasonable way -- or at least the only one I know -- to evaluate Beta(x,1-x) without complex analysis.

Rudin uses a fact about convex functions that is only presented as an exercise earlier in the book (p.100, #23). Namely: let f be a convex function on some interval I, and consider s(x, y) := (f(x)-f(y)) / (x-y) as a function on the set of (x, y) in I x I with x > y; then s is an increasing function of both variables. The proof is fortunately not hard. For instance, to prove that if x > y' > y then s(x, y') > s(x, y), write y' as px+qy and calculate that s(x, y') > s(x, y) is equivalent to the usual convexity condition. The case x > x' > y works in exactly the same way.

By the way, the rather obscure integration by parts in Rudin, p.194 is not necessary. A straightfoward choice of ``parts'' yields

x B(x, y+1) = y B(x+1, y) ;

This may seem to go in a useless direction, but the elementary observation that

B(x, y) = B(x, y+1) + B(x+1, y)

recovers the recursion (97). In addition to the trigonometric definite integrals noted by Rudin (formula 98), Beta functions also turn up in the evaluation of the definite integral of u^a du / (1+u^b)^c over (0,infinity): let t = u^b / (1+u^b). What is the value of that integral? Can you obtain in particular the formula Pi / (b sin(a Pi/b)) for the special case c=1?

We're back to Chapter 10 now, setting up the machinery of differential forms that is the framework for the higher-dimensional generalization of the Fundamental Theorem of Calculus that comprises the divergence, Stokes, and Green theorems and much else besides.

Here's an introduction to the underlying algebraic structure of exterior algebra, in .ps or .pdf format.

The basic formula (92) that yields generalized Stokes (formula 91) is even more easily proved for oriented k-cells instead of k-simplices, since each term of d(omega) on the left-hand side of the identity yields the sum of 2 of the 2k terms of the boundary occurring on the right-hand side. Naturally the signs that enter when one does it for a k-simplex are the reason we put the (-1)^j into the definition of the boundary (formula 85). One interpretation of generalized Stokes is that the operators d on differential forms, and boundary on chains, are adjoints relative to the pairing given by integration. In fact just as d²=0 we have (boundary)²=0; this is easy to show, and further bolsters our confidence in our choice of signs in formula 85. Just as we defined the k-th cohomology of E as the quotient of the closed k-forms on E by the exact k-forms, we can define the k-th homology H_k(E) to be the quotient of the closed k-chains in E by the boundaries of (k+1)-chains, where as you might expect a chain is said to be ``closed'' if its boundary is a trivial chain (this generalizes the notion of a ``closed curve''). For instance, H₀(E) has one generator for each component of E; when k is positive, H_k(E) is trivial if E is convex; and H₁(R²-{0}) is one-dimensional, generated by a circle around the origin. Moreover, thanks to generalized Stokes, integration yields a pairing of H_k(E) with H^k(E). You will learn in differential topology that this pairing often identifies each of these two spaces with the other's dual. You can already check that this holds for the punctured plane R²-{0}.

The following remarks may help explain how come a function from an open set in R³ to R³ yields both a 1-form and a 2-form (formulas 124, 125 on page 281). In general, once we choose a generator for the n-th exterior power of an n-dimensional vector space V over some field (remember that this top exterior power is 1-dimensional), the exterior product yields for each k a pairing between the k-th and (n-k)-th exterior powers of V, which identifies each of the two spaces with the other's dual. In our case n=3 and k=1, so we identify the dual of R³ with its exterior square. But, once we have chosen an inner product structure on R³, we have identified it with its own dual; and once we also give R³ an orientation, we get a generator for its top exterior power, namely e₁^e₂^e₃ for any positively oriented orthonormal basis (e₁, e₂, e₃). So, we have an identification of R³ with its exterior square, and thus can naturally identify the 1- and 2-forms on any open set in R³ with functions from that set to R³.

We're about to start on Fourier analysis, using Köorner's book (a bit of this material can also be found in Chapter 8 of Rudin). The natural context for Fourier analysis is the notion of a Hilbert space. The one strange choice Körner makes is to suppress this notion entirely (Hilbert's name does not even appear in the index!), as if to prove that all of Fourier analysis can be done without Hilbert spaces, as Axler set out to do all of linear algebra without determinants. But doing Fourier without Hilbert seems perverse. We shall thus begin with an introduction to the structure of Hilbert spaces, in particular separable ones. The first handout (hilbert1.pdf, hilbert1.ps) gives the definition, basic examples, and the fundamental notion of an orthonormal topological basis. The second handout (hilbert2.pdf, hilbert2.ps) introduces separability, and the behavior of orthogonal complements, projections, and duality for a Hilbert space.

With what we know already, we can give another kind of proof that the complex exponentials are an ontb for L₂([0,1]): their linear span is a C-subalgebra of C([0,1]) which separates points and is ``self-adjoint'' (closed under complex conjugation); therefore, by the complex Stone-Weierstrass theorem (Thm. 7.33 on page 165 of Rudin), it is dense in C([0,1]) in the sup metric. [In fact Rudin's introduction to the theory of Fourier series in Chapter 8 uses this argument; see Thm. 8.15 on page 190.] Thus a fortiori that linear span is dense in C([0,1]) also relative to the L₂ metric, and therefore is dense in L₂([0,1]), as desired. Rudin only deals with Fourier series in one dimension, but the same method also works for Fourier series on Tⁿ for n>1.

We'll proceed by developing Fourier analysis starting with the following chapters of Körner: 1 (Introduction), 2 (Statement and proof of Fejér's theorem, 3 (Weyl equidistribution), 9,15,16,18 (some results on convergence and divergence of Fourier series). Chapters 4 through 6 contain material most of which we have seen already, but the second part of Chapter 6 may still surprise you. You might also be interested in Chapter 19, which states without proof several results much more precise (though necessarily much harder) than is done in Chapter 18.

Re Chapter 3: Weyl showed more generally the following theorem: Let T₁, T₂, T₃,... be real numbers mod 2*pi, i.e. elements of T. Then the following are equivalent:

For all a,b with 0 < a < b < 1, the number of t_r with r <= n lying in 2*pi[a,b] is asymptotic to (b-a) n;
For any continuous function f from T to C, as n grows to infinity the average of f(t_r) over r <= n approaches the average of f over T
For each nonzero integer s, the average of e^ist_r approaches 0 as n grows to infinity.

When these equivalent conditions hold, the sequence {t_r} is said to be ``equidistributed mod 2*pi''. Thus the proof of Thm. 3.1 is in effect the application of this criterion to prove the equidistribution mod 1 of the sequence of integer multiples of a given irrational.

Re Chapter 18: As noted in class, what's really going on here is the following result: Let V,W be normed vector spaces, with V complete. Then any weakly convergent sequence {T_n} in B(V,W) must be bounded; in fact if {T_n} is unbounded then there are vectors v in V such that |T_nv| approaches infinity. (Recall that {T_n} is said to converge weakly to a linear transformation T if T_nv approaches Tv for all v in V -- in which case of course {T_nv} is bounded.)

To prove this result: Assume on the contrary that {T_n} is unbounded. Without loss of generality (i.e. by passing to a subsequence), assume that ||T₁||>1 and, for each n, we have ||T_n|| > 100 ||T_n-1||. let v_n be a vector of norm 1 such that |T_nv_n| > ||T_n|| / 2. Now define v as a sum of a convergent sequence thus: u₀=0; given u_n, choose for u_n+1 whichever of u_n+10^-nv_n or u_n-10^-nv_n makes |T_nu_n+1| larger; by the triangle inequality, |T_nv_n+1| is then at least ||T_n||/(2*10ⁿ). Let v be the limit of u_n. Then |T_nv| is at least (1/2-1/9)10^-n||T_n||, so approaches infinity as claimed. [Except for the needlessly large factors of 100 and 10, this is essentially what happens in Chapter 18 for the sequence of functionals on C([0,1]) taking f to S_n(f,t).]

Next on the menu: Chapters 40 and 41, on orthogonal polynomials and Gaussian quadrature. Unaccountably missing from the end of Chapter 40 is the following result, which I'll call ``Theorem 40.9'': let a,b,r,u_n be as in Theorem 40.8; then there is a ``three-term recurrence'' u_n+1 = (A_nt+B_n)u_n - C_nu_n-1, for some real scalars A_n , B_n , C_n. For instance, the existence of such a recurrence for the Tchebychev polynomials follows from the product formula for cos(nx)cos(x). In general, the recurrence can be proved by comparing the expansions of (t u_n) as a linear combination of the u_m and (t u_m) as a linear combination of the u_n. This method also gives explicitly formulas for A_n , C_n (but not generally B_n) in terms of the leading coefficients and norms of the u_n.

All the results of Chapter 41 generalize directly to integration with respect to an arbitrary weight on a finite interval, using zeros of polynomials orthogonal with respect to that weight. Note too that Lemma 41.1 can also be obtained as a consequence of the fact that, over any field F, evaluation at n distinct field elements is a linear bijection from the polynomials of degree <n to Fⁿ. This can be proved either computationally, as in Körner, or from general linear-algebra facts together with the observation that the kernel of the map is zero. The left-hand side (even with an arbitrary weight function) is a linear functional on that space, etc.

We're starting on discrete Fourier analysis, from chapters 98ff. in Körner. Fourier series, discrete Fourier analysis, and the Fourier transform (which we'll proceed to next) are special cases of Pontrjagin duality. We shall not be able to develop Pontrjagin duality in anything like its full scope in 55b; even the basic theorems require detailed study of things like Haar measure that are normally not established until well into a graduate analysis course like Math 212. But can still state some of the definitions and key theorems to suggest the framework into which the various Fourier variants fit. Here's an outline. For another glimpse into the Pontrjagin framework, in the special case of finite abelian groups where everything can be stated and proved using only linear algebra, see Chapters 103-4 of the textbook.

In each of our Fourier settings, to pointwise multiply two functions on a group G we convolve their Fourier transforms, and vice versa (up to scalar multiplication). See Chapters 51, 52 for convolution on R and T. Convolution on Z/nZ is defined in 97.7 and described in 97.8 on page 495. In general the convolution of two functions f,g is the function f*g whose value at any t is the sum/average/integral over x of f(t-x)g(x). We encountered such things already in connection with the Dirichlet and Fejér kernels; they will also be the key to our use of the fast Fourier transform (FFT, Chapter 98) to fast multiplication (Chapter 99). Note that the behavior of Pontrjagin duality on subgroups is also an important ingerdient in the FFT, which separates even and odd integers mod 2N.

Our final topic is Fourier transforms, Section IV of the text. We'll concentrate on the material in Chapters 46 (introduction, and Fejér part I), 49 (conclusion of Fejér, and some applications), 51 (convolution on R), and 60 (the inversion formula). Chapters 47 and 48 contain warnings and justifications of changing the order of integration that should be familiar and routine by now; you may want to read them for review purposes, but be aware that at least one typo is lurking (In Lemma 48.6 on page 234, f is a function not on R but on R²). For that matter there are at least two typos in Ch.46: in Lemm 46.3 we need f to satisfy the hypotheses of Lemma 46.2 (integrability and absolute convergence), not 41.2; and on the top line of p.224, it's the continuity of f^ that we need, not of f. [Also, there's only one `m' in ``imitate''!] The end of Chapter 49 recommends attention to Lemma 50.2, which contains two important integral formulas. Unfortunately Körner proves them with complex analysis, which most of you probably don't know yet. Fortunately 50.2(i) is also proved in Rudin (Example 9.43, pages 237-8), and 50.2(ii) will be easy once we have the inversion theorem of Chapter 60. As a natural extension of the results of Chapter 51 we have Parseval's identity and an application to the behavior at infinity of Fourier transforms of integrable functions: parseval.pdf, parseval.ps.

p1.pdf, p1.ps: First problem set: Univariate differential calculus

p2.pdf, p2.ps: Second problem set: Univariate integral calculus

p3.pdf, p3.ps: Third problem set: More univariate calculus, and Stone-Weierstrass

About problem 9(ii): The trick is to integrate by parts max(2, n₀) times so that alpha and beta are replaced by continuous functions integrated against each of 1, x, x², etc., and one can apply the result of 9(i). Just make sure that at each iteration one uses the antiderivative that vanishes at x=1.

p4.pdf, p4.ps: Fourth problem set: Multivariate differential calculus

p5.pdf, p5.ps: Fifth problem set: Integration in R^k, and special functions

The mystery numbers that show up in the final problem of the fifth problem set are essentially the Bernoulli and Euler numbers. The Bernoulli numbers B_n were already introduced in the second problem set; they can also be defined by the generating function: t / (e^t - 1) is the sum of B_n xⁿ / n! as n ranges from 0 to infinity. The Euler numbers 1, -1, 5, -61, 1385, -50521, ... are similarly generated by 1/cosh(t). They can also be obtained by evaluating the Bernoulli polynomials at 1/4. All this will become much easier to explain once we develop the theory of Fourier series.

p6.pdf, p6.ps: Sixth problem set: Convexity and Euler's integrals

p7.pdf, p7.ps: Seventh problem set: Exterior algebra and differential forms

p8.pdf, p8.ps: Eighth problem set: Exterior algebra, differential forms, chains, and integration

p9.pdf, p9.ps: Ninth problem set: Life in Hilbert space, and a bit on Fourier and summability methods

p10.pdf, p10.ps: Tenth problem set: Weyl, continued; differentiation of Fourier series; more Fourier applications

p11.pdf, p11.ps: Eleventh problem set: Fourier variations (Orthogonal polynomials and discrete Fourier analysis)

pp.pdf, pp.ps: An assortment of practice problems