Lecture notes, etc., for Math 55b: Honors Advanced Calculus and Linear Algebra (Spring 200[2-]3)

If you find a mistake, omission, etc., please let me know by e-mail. Andrei's Math 55 page Q & A: Questions that arose concerning lectures, problem sets, etc., and my replies

The orange balls mark our current location in the course, and the current problem set. We start with differential calculus of vector-valued functions of one real variable, building on Chapter 5 of Rudin.
You may have already seen ``little oh'' and ``big Oh'' notations. For functions f, g on the same space, ``f=O(g)'' means that g is a nonnegative real-valued function, f takes values in a normed vector space, and there exists a real constant M such that |f(x)|<=Mg(x) for all x. The notation ``f=o(g)'' is used in connection with a limit; for instance, ``f(x)=o(g(x)) as x approaches x0'' indicates that f, g are vector- and real-valued functions as above on some neighborhood of x0, and that for each c>0 there is a neighborhood of x0 such that |f(x)|<=cg(x) for all x in the neighborhood. Thus f'(x0)=a means the same as ``f(x)=f(x0)+a(x-x0)+o(|x-x0|) as x approaches x0'', with no need to exclude the case x=x0. Rudin in effect uses this approach when proving the Chain Rule (5.5).

In Rudin's proof of L'Hôpital's Rule (5.13), why can he assume that g(x) does not vanish for any x in (a,b), and that the denominator g(x)-g(y) in equation (18) is never zero?

The derivative of f/g can be obtained from the product rule, together with the derivative of 1/g -- which in turn can be obtained from the Chain Rule together with the the derivative of the single function 1/x. Once we do multivariate differential calculus, we'll see that the derivatives of f+g, f-g, fg, f/g could also be obtained in much the same way that we showed the continuity of those functions, by combining the multivariate Chain Rule with the derivatives of the specific functions x+y, x-y, xy, x/y of two variables x,y.

As Rudin notes at the end of this chapter, differentiation can also be defined for vector-valued functions of one real variable. As Rudin does not note, the vector space can even be infinite-dimensional, provided that it is normed; and the basic algebraic properties of the derivative listed in Thm. 5.3 (p.104) can be adapted to this generality, e.g., the formula (fg)'=f'g+fg' still holds if f,g take values in normed vector spaces U,V and multiplication is interpreted as a continuous bilinear map from U x V to some other normed vector space W.

NB The norm does not have to come from an inner product structure. Often this does not matter because we work in finite dimensional vector spaces, where all norms are equivalent, and changing to an equivalent norm does not affect the definition of the derivative. The one exception to this is Thm. 5.19 (p.113) where one needs the norm exactly rather than up to a constant factor. This theorem still holds for a general norm but requires an additional argument. The key ingredient of the proof is this: given a nonzero vector z in a vector space V, we want a continuous functional w on V such that ||w||=1 and w(z)=|z|. If V is an inner product space (finite-dimensional or not), the inner product with z/|z| provides such a functional w. But this approach does not work in general. The existence of such w is usually proved as a corollary of the Hahn-Banach theorem. When V is finite dimensional, w can be constructed by induction on the dimension of V. To deal with the general case one must also invoke the Axiom of Choice in its usual guise of Zorn's Lemma. We next start on univariate integral calculus, following Rudin, chapter 6. The following gives some motivation for the definitions there. (And yes, it's the same Riemann who gave number theorists like me the Riemann zeta function and the Riemann Hypothesis.)
The Riemann-sum approach to integration goes back to the ``method of exhaustion'' of classical Greek geometry, in which the area of a plane figure (or the volume of a region in space) is bounded below and above by finding subsets and supersets that are finite unions of disjoint rectangles (or boxes). The lower and upper Riemann sums adapt this idea to the integrals of functions which may be negative as well as positive (one of the weaknesses of geometric Greek mathematics is that the ancient Greeks had no concept of negative quantities -- nor, for that matter, of zero). You may have encountered the quaint technical term ``quadrature'', used in some contexts as a synonym for ``integration''. This too is an echo of the geometrical origins of integration. ``Quadrature'' literally means ``squaring'', meaning not ``multiplying by itself'' but ``constructing a square of the same size as''; this in turn is equivalent to ``finding the area of'', as in the phrase ``squaring the circle''. For instance, Greek geometry contains a theorem equivalent to the integration of x2dx, a result called the ``quadrature of the parabola''. The proof is tantamount to the evaluation of lower and upper Riemann sums for the integral of x2dx.

An alternative explanation of the upper and lower Riemann sums is that they arise by repeated application of the following two axioms describing the integral:

• For any a,b,c (with a < b < c), the integral of a function from a to c is the sum of its integrals from a to b and from b to c;
• If a function f on [a,b] takes values in [m,M] then its integral from a to b is in [m(b-a), M(b-a)] (again assuming a <b).
The latter axiom is a consequence of the following two: the integral of a constant function from a to b is that constant times the length b-a of the interval [a,b]; and if f<=g on some interval then the integral of f over that interval does not exceed the integral of g. Note that again all these axioms arise naturally from an interpretation of the integral as a ``signed area''.

Recall the ``integration by parts'' identity: fg is an integral of f dg + g df. The Stieltjes integral is a way of making sense of this identity even when f and/or g is not continuously differentiable. To be sure, some hypotheses on f and g must still be made for the Stieltjes integral of f dg to make sense. Rudin specifies one suitable system of such hypotheses.

Riemann-Stieltjes integration by parts: Suppose both f and g are increasing functions on [a,b]. For any partition a = x0 < ... < xn = b of the interval, write f(b)g(b) - f(a)g(a) as the telescoping sum of f(xi)g(xi) - f(xi-1)g(xi-1) from i=1 to n. Now rewrite the i-th summand as

f(xi) (g(xi)-g(xi-1)) + g(xi-1) (f(xi)-f(xi-1)).
[Naturally it is no accident that this identity resembles the one used in the familiar proof of the formula for the derivative of fg !] Summing this over i yields the upper Riemann-Stieltjes sum for the integral of f dg plus the lower R.-S. sum for the integral of g df. Therefore: if one of these integrals exists, so does the other, and their sum is f(b)g(b) - f(a)g(a). [Cf. Rudin, page 141, Exercise 17.] Most of Chapter 7 of Rudin we've covered already in the 55a lectures and problem sets. For more counterexamples along the lines of the first section of that chapter, see Counterexamples in Analysis by B.R.Gelbaum and J.M.H.Olsted -- there are two copies in the science center library (QA300.G4). Concerning Thm. 7.16, be warned that it can easily fail for ``improper integrals'' on infinite intervals (see Rudin, p.138, Exercise 8, assigned on PS2). It is often very useful to bring a limit or an infinite sum within an integral sign, but this procedure requires justification beyond Thm. 7.16.

After covering a few odds and ends from Chapter 7, we'll proceed to power series and the exponential and logarithmic functions in Chapter 8; we'll use these to derive the binomial theorem for arbitrary exponents. This will give us a way to simplify a key ingredient in the Stone-Weierstrass theorem, which is the one major result of Chapter 7 we haven't seen yet. We postpone discussion of Euler's Beta and Gamma integrals (also in Chapter 8) so that we can use multivariate integration to give a more direct proof of the formula relating them.

The result concerning the convergence of alternating series is stated and proved on pages 70-71 of Rudin (Theorem 3.42).

The original Weierstrass approximation theorem (7.26 in Rudin) can be reduced to the uniform approximation of the single function |x| on [-1,1]. From this function we can construct an arbirtrary piecewise linear continuous function, and such P.L. functions uniformly approximate any continuous function on a closed interval. To get at |x|, we'll rewrite it as [1-(1-x2)]1/2, and use the power series for (1-X)1/2. See the first part of exercise 22 for chapter 8, on p.201, for this power series (and more generally for the power series for (1-X)a). We need (1-X)1/2 to be approximated by its power series uniformly on the closed interval [-1,1] (or at least [0,1]); but fortunately this too follows from the proof of Abel's theorem (8.2, pages 174-5). Actually this is a subtler result than we need, since the Xn coefficient of the power series for (1-X)1/2 is negative for every n>0. If a power series in X has radius of convergence 1 and all but finitely many of its nonzero coefficients have the same sign, then it is easily shown that the sum of the coefficients converges if and only if f(X) has a finite limit as X approaches 1, in which case the sum equals that limit and the power series converges uniformly on [0,1]. That's all we need because clearly (1-X)1/2 extends to a continuous function on [0,1]. (For an alternative approach to uniformly approximating |x|, see exercise 23 on p.169.)

An explicit upper bound of 4 on Pi can be obtained by calculating cos(2) < 1 - 2^2/2! + 2^4/4! = -1/3 < 0. For much better estimates, integrate (x-x2)4 dx/(x2+1) from 0 to 1 and note that 0 < 1/(x2+1) <= 1. :-)

Rudin uses a fact about convex functions that is only presented as an exercise earlier in the book (p.100, #23). Namely: let f be a convex function on some interval I, and consider s(x, y) := (f(x)-f(y)) / (x-y) as a function on the set of (x, y) in I x I with x > y; then s is an increasing function of both variables. The proof is fortunately not hard. For instance, to prove that if x > y' > y then s(x, y') > s(x, y), write y' as px+qy and calculate that s(x, y') > s(x, y) is equivalent to the usual convexity condition. The case x > x' > y works in exactly the same way.

The rather obscure integration by parts in Rudin, p.194 is not necessary. A straightfoward choice of ``parts'' yields

x B(x, y+1) = y B(x+1, y) ;

This may seem to go in a useless direction, but the elementary observation that

B(x, y) = B(x, y+1) + B(x+1, y)

recovers the recursion (97).

In addition to the trigonometric definite integrals noted by Rudin (formula 98), Beta functions also turn up in the evaluation of the definite integral of u a du / (1+u b)c over (0,infinity): let t = u b / (1+u b). What is the value of that integral? Can you obtain in particular the formula Pi / (b sin(a Pi/b)) for the special case c=1?

The limit formula for Gamma(x) readily yields the product formula:

Gamma(x) = x-1 e-Cx Prod(exp(x/k) / (1+x/k), k=1,2,3,...)
where C=0.57721566490... is Euler's constant, the limit as N goes to infinity of 1+(1/2)+(1/3)+...+(1/N)-log(N). This lets us easily show that Gamma is infinitely differentiable (in fact analytic) and to obtain nice formulas for the derivatives of log(Gamma(x)); for instance, Gamma'(1)=-C, and more generally the logarithmic derivative of Gamma(x) at x=N+1 is 1+(1/2)+(1/3)+...+(1/N)-C. We next begin multivariate differential calculus, starting at the middle of Rudin Chapter 9 (since the first part of that chapter is for us a review of linear algebra). Again, Rudin works with functions from open subsets of Rn to Rm, but must of the discussion works equally well with the target space Rm replaced by an arbitrary normed vector space V. If we want to allow arbitrary normed vector spaces for the domain of f, we'll usually have to require that the derivative f' be a continuous linear map, or equivalently that its norm ||f'||=sup|v|<=1|f'(v)| be finite.

As in the univariate case, proving the Mean Value Theorem in the multivariate context (Theorem 9.19) requires either that V have an inner-product norm, or the use of the Hahn-Banach theorem to construct suitable functionals on V. Once this is done, the key Theorem 9.21 can also be proved for functions to V, and without first doing the case m=1. To do this, first prove the result in the special case when each Dj f(x) vanishes; then reduce to this case by subtracting from f the linear map from Rn to V indicated by the partial derivatives Dj f(x).

The Inverse function theorem (9.24) is a special case of the Implicit function theorem (9.28), and its proof amounts to specializing the proof of the implicit function theorem. But Rudin proves the Implicit theorem as a special case of the Inverse theorem, so we have to do Inverse first. (NB for these two theorems we will assume that our target space is finite-dimensional; how far can you generalize to infinite-dimensional spaces?) Note that Rudin's statement of the contraction principle (Theorem 9.23 on p.220) is missing the crucial hypothesis that X be nonempty!

The proof of the second part of the implicit function theorem, which asserts that the implicit function g not only exists but is also continuously differentiable with derivative at b given by formula (58) (p.225), can be done more easily using the chain rule, since g has been constructed as the composition of the following three functions: first, send y to (0,y); then, apply the inverse function F-1; finally, project the resulting vector (x,y) to x. The first and last of these three functions are linear, so certainly C1; and the continuous differentiability of F-1 comes from the inverse function theorem.

Here's an approach to Dij=Dji that works for a C2 function to an arbitrary normed space. As in Rudin (see p.235) we reduce to the case of a function of two variables, and define u and Delta. Assume first that D21 f vanishes at (a,b). Then use the Fundamental Theorem of Calculus to write Delta(f,Q) as as the integral of u'(t)dt on [a,a+h], and then write u'(t) as an integral of D21 f(t,s)ds on [b,b+k]. Conclude that u'(t)=o(k) and thus that Delta(f,Q) / hk approaches zero. Now apply this to the function f-xyD21 f(x,y) to see that in general Delta(f,Q) / hk approaches D21 f(x,y). Do the same in reverse order to conclude that D21 f(x,y)=D12 f(x,y). Can you prove D12(f)=D21(f) for a function f to an arbitrary inner product space under the hypotheses of Theorem 9.41? Next topic, and last one from Rudin, is multivariate integral calculus (Chapter 10). Having defined multivariate integrals and obtained the formula for change of variables, we're setting up the machinery of differential forms that is the framework for the higher-dimensional generalization of the Fundamental Theorem of Calculus that comprises the divergence, Stokes, and Green theorems and much else besides.

The basic formula (92) that yields generalized Stokes (formula 91) is even more easily proved for oriented k-cells instead of k-simplices, since each term of d(omega) on the left-hand side of the identity yields the sum of 2 of the 2k terms of the boundary occurring on the right-hand side. Naturally the signs that enter when one does it for a k-simplex are the reason we put the (-1)j into the definition of the boundary (formula 85). One interpretation of generalized Stokes is that the operators d on differential forms, and boundary on chains, are adjoints relative to the pairing given by integration. In fact just as d 2=0 we have (boundary)2=0; this is easy to show, and further bolsters our confidence in our choice of signs in formula 85. Just as we defined the k-th cohomology of E as the quotient of the closed k-forms on E by the exact k-forms, we can define the k-th homology Hk(E) to be the quotient of the closed k-chains in E by the boundaries of (k+1)-chains, where as you might expect a chain is said to be ``closed'' if its boundary is a trivial chain (this generalizes the notion of a ``closed curve''). For instance, H0(E) has one generator for each component of E; when k is positive, Hk(E) is trivial if E is convex; and H1(R2-{0}) is one-dimensional, generated by a circle around the origin. Moreover, thanks to generalized Stokes, integration yields a pairing of Hk(E) with H k(E). You will learn in differential topology that this pairing often identifies each of these two spaces with the other's dual. You can already check that this holds for the punctured plane R2-{0}. The following remarks may help explain how come a function from an open set in R3 to R3 yields both a 1-form and a 2-form (formulas 124, 125 on page 281). In general, once we choose a generator for the n-th exterior power of an n-dimensional vector space V over some field (remember that this top exterior power is 1-dimensional), the exterior product yields for each k a pairing between the k-th and (n-k)-th exterior powers of V, which identifies each of the two spaces with the other's dual. In our case n=3 and k=1, so we identify the dual of R3 with its exterior square. But, once we have chosen an inner product structure on R3, we have identified it with its own dual; and once we also give R3 an orientation, we get a generator for its top exterior power, namely e1^e2^e3 for any positively oriented orthonormal basis (e1, e2, e3). So, we have an identification of R3 with its exterior square, and thus can naturally identify the 1- and 2-forms on any open set in R3 with functions from that set to R3.

At a more down-to-earth level, we can also use these ideas to explain the ``cross product'' on an oriented 3-dimensional real inner product space V: the cross product v x w is the image of v^w under the map that identifies the exterior square of V with V.
A generator for the n-th exterior power of an n-dimensional real vector space V can be regarded as an orientation on V. If V also carries an inner product structure, then our construction identifies differential k-forms and (n-k)-forms on any open set in V; this identification is known as the Hodge star operator from k-forms to (n-k)-forms. You can check that the Laplacian of a function f, regarded as a 0-form, is just *d*df -- which is one explanation of why the Laplacian is invariant under orthogonal changes of coordinates. For each k, *d*d is an operator on the space of differential k-forms, and may be regarded as the Laplacian on this space. Can you write this operator explicitly as a differential operator on (n!)/(k!(n-k)!)-tuples of C2 functions? We're about to start on Fourier analysis, using Körner's book (a bit of this material can also be found in Chapter 8 of Rudin). The natural context for Fourier analysis is the notion of a Hilbert space. The one strange choice Körner makes is to suppress this notion entirely (Hilbert's name does not even appear in the index!), as if to prove that all of Fourier analysis can be done without Hilbert spaces, as Axler set out to do all of linear algebra without determinants. But doing Fourier without Hilbert seems perverse. We shall thus begin with an introduction to the structure of Hilbert spaces, in particular separable ones. The first handout (hilbert1.pdf, hilbert1.ps) gives the definition, basic examples, and the fundamental notion of an orthonormal topological basis (ontb). The second handout (hilbert2.pdf, hilbert2.ps) introduces separability, and the behavior of orthogonal complements, projections, and duality for a Hilbert space. With what we know already, we can give another kind of proof that the complex exponentials are an ontb for L2([0,1]): their linear span is a C-subalgebra of C([0,1]) which separates points and is ``self-adjoint'' (closed under complex conjugation); therefore, by the complex Stone-Weierstrass theorem (Thm. 7.33 on page 165 of Rudin), it is dense in C([0,1]) in the sup metric. [In fact Rudin's introduction to the theory of Fourier series in Chapter 8 uses this argument; see Thm. 8.15 on page 190.] Thus a fortiori that linear span is dense in C([0,1]) also relative to the L2 metric, and therefore is dense in L2([0,1]), as desired. Rudin only deals with Fourier series in one dimension, but the same method also works for Fourier series on Tn for n>1. We'll proceed by developing Fourier analysis starting with the following chapters of Körner: 1 (Introduction), 2 (Statement and proof of Fejér's theorem), 3 (Weyl equidistribution), 9,15,16,18 (some results on convergence and divergence of Fourier series). Chapters 4 through 6 contain material most of which we have seen already, but the second part of Chapter 6 may still surprise you. You might also be interested in Chapter 19, which states without proof several results much more precise (though necessarily much harder) than is done in Chapter 18. Re Chapter 3: Weyl showed more generally the following theorem: Let t1, t2, t3,... be real numbers mod 2*pi, i.e., elements of T. Then the following are equivalent:

• For all a,b with 0 < a < b < 1, the number of tr with r <= n lying in 2*pi[a,b] is asymptotic to (b-a) n;
• For any continuous function f from T to C, as n grows to infinity the average of f(tr) over r <= n approaches the average of f over T;
• For each nonzero integer s, the average of eistr approaches 0 as n grows to infinity.
When these equivalent conditions hold, the sequence {tr} is said to be ``equidistributed mod 2*pi''. Thus the proof of Thm. 3.1 is in effect the application of this criterion to prove the equidistribution mod 1 of the sequence of integer multiples of a given irrational. Weyl's 1914 paper is reprinted in Vol.I (pages 487-497) of his Gesammelte Abhandlungen (Collected Works), which you can find in several of the Harvard libraries. Re Chapter 18: As noted in class, what's really going on here is the following result: Let V,W be normed vector spaces, with V complete. Then any weakly convergent sequence {Tn} in B(V,W) must be bounded; in fact if {Tn} is unbounded then there are vectors v in V such that |Tnv| approaches infinity. (Recall that {Tn} is said to converge weakly to a linear transformation T if Tnv approaches Tv for all v in V -- in which case of course {Tnv} is bounded.)

To prove this result: Assume on the contrary that {Tn} is unbounded. Without loss of generality (i.e. by passing to a subsequence), assume that ||T1||>1 and, for each n, we have ||Tn|| > 100 ||Tn-1||. let vn be a vector of norm 1 such that |Tnvn| > ||Tn|| / 2. Now define v as a sum of a convergent sequence thus: u0=0; given un, choose for un+1 whichever of un+10-nvn or un-10-nvn makes |Tnun+1| larger; by the triangle inequality, |Tnvn+1| is then at least ||Tn||/(2*10n). Let v be the limit of un. Then |Tnv| is at least (1/2-1/9)10-n||Tn||, so approaches infinity as claimed. [Except for the needlessly large factors of 100 and 10, this is essentially what happens in Chapter 18 for the sequence of functionals on C([0,1]) taking f to Sn(f,t).] Next on the menu: Chapters 40 and 41, on orthogonal polynomials and Gaussian quadrature. Unaccountably missing from the end of Chapter 40 is the following result, which I'll call ``Theorem 40.9'': let a,b,r,un be as in Theorem 40.8; then there is a ``three-term recurrence'' un+1 = (Ant+Bn)un - Cnun-1, for some real scalars An , Bn , Cn. For instance, the existence of such a recurrence for the Tchebychev polynomials follows from the product formula for cos(nx)cos(x). In general, the recurrence can be proved by comparing the expansions of (t un) as a linear combination of the um and (t um) as a linear combination of the un. This method also gives explicitly formulas for An , Cn (but not generally Bn) in terms of the leading coefficients and norms of the un.

All the results of Chapter 41 generalize directly to integration with respect to an arbitrary weight on a finite interval, using zeros of polynomials orthogonal with respect to that weight. Note too that Lemma 41.1 can also be obtained as a consequence of the fact that, over any field F, evaluation at n distinct field elements is a linear bijection from the polynomials of degree <n to Fn. This can be proved either computationally, as in Körner, or from general linear-algebra facts together with the observation that the kernel of the map is zero. The left-hand side (even with an arbitrary weight function) is a linear functional on that space, etc. We're starting on discrete Fourier analysis, from chapters 97ff. in Körner. Fourier series, discrete Fourier analysis, and the Fourier transform (which we'll proceed to next) are special cases of Pontrjagin duality. We shall not be able to develop Pontrjagin duality in anything like its full scope in 55b; even the basic theorems require detailed study of things like Haar measure that are normally not established until well into a graduate analysis course like Math 212. But can still state some of the definitions and key theorems to suggest the framework into which the various Fourier variants fit. Here's an outline. For another glimpse into the Pontrjagin framework, in the special case of finite abelian groups where everything can be stated and proved using only linear algebra, see Chapters 103-4 of the textbook.

In each of our Fourier settings, to pointwise multiply two functions on a group G we convolve their Fourier transforms, and vice versa (up to scalar multiplication). See Chapters 51, 52 for convolution on R and T. Convolution on Z/nZ is defined in 97.7 and described in 97.8 on page 495. In general the convolution of two functions f,g is the function f*g whose value at any t is the sum/average/integral over x of f(t-x)g(x). We encountered such things already in connection with the Dirichlet and Fejér kernels; they will also be the key to our use of the fast Fourier transform (FFT, Chapter 98) to fast multiplication (Chapter 99). Note that the behavior of Pontrjagin duality on subgroups is also an important ingerdient in the FFT, which separates even and odd integers mod 2N. First problem set: Univariate differential calculus (PS, PDF') corrected 31.i.03 (see problem 5) Second problem set: Univariate integral calculus (PS, PDF') Third problem set: Univariate calculus cont'd, and Stone-Weierstrass (PS, PDF') If Problems 6, 7, 8 are too easy or too familiar, see if you can evaluate the integral of cosn(x) cos(cx) from 0 to Pi/2, and deduce further product formulas. Fourth problem set: Multivariate differential calculus (PS, PDF') Fifth problem set: Multivariate integral calculus (PS, PDF') Sixth problem set: Interlude on convexity; introduction to differential forms (PS, PDF') Seventh problem set: Differential forms, chains, integration, and more exterior algebra (PS, PDF') Eighth problem set: Life in Hilbert space (PS, PDF') Ninth problem set: Fourier series I [summability methods, Weyl equidistribution, and differentiation] (PS, PDF') Tenth problem set: Fourier series, cont'd; orthogonal polynomials (PS, PDF') Eleventh and last problem set: Fourier series, the grand finale: Gauss, Jacobi, Poisson, and Müntz (PS, PDF') pp.pdf, pp.ps: An assortment of practice problems