Math 55b: Honors Abstract Algebra (Fall 2017)

Math 55b: Honors Real and Complex Analysis

Lecture notes, etc., for Math 55b: Honors Real and Complex Analysis (Spring [2017-]2018)

If you find a mistake, omission, etc., please let me know by e-mail.

Office hours will be in the Lowell House Dining Hall as in Math 55a (usually Tuesday after Math Table, so 7:30-9:00 PM), or by appointment.

Math Night will again happen Monday evenings/nights, usually 8-10, in Leverett Dining Hall, starting January 22. The course CAs will again hold office hours there.

Section times:
Rohil Prasad: Thursday 3-4 PM, Science Center room 309A.
Vikram Sundar: Wednesday 4-5 PM, Science Center room 304.

Vikram’s notes for 55b will be here.

Our first topic is the topology of metric spaces, a fundamental tool of modern mathematics that we shall use mainly as a key ingredient in our rigorous development of differential and integral calculus over $\bf R$ and $\bf C$. To supplement the treatment in Rudin’s textbook, I wrote up 20-odd pages of notes in six sections; copies will be distributed in class, and you also may view them and print out copies in advance from the PDF files linked below. [Some of the explanations, e.g. of notations such as $f(\cdot)$ and the triangle inequality in $\bf C$, will not be necessary; they were needed when this material was the initial topic of Math 55a, and it doesn’t feel worth the effort to delete them now that it’s been moved to 55b. Likewise for the comment about the Euclidean distance at the top of page 2 of the initial handout on “basic definitions and examples”.]

Metric Topology I
Basic definitions and examples: the metric spaces Rⁿ and other product spaces; isometries; boundedness and function spaces

The “sup metric” on $X^S$ is sometimes also called the “uniform metric” because $d(\,f,g) \leq r$ is equivalent to a bound, $d(\,f(s),g(s)) \leq r$ for all $s \in S$, that is “uniform” in the sense that it’s independent of the choice of $s$. Likewise for the sup metric on the space of bounded functions from $S$ to an arbitrary metric space $X$ (see the next paragraph).

If $S$ is an infinite set and $X$ is an unbounded metric space then we can’t use our definition of $X^S$ as a metric space because $\sup_S d_X(\,f(s),g(s))$ might be infinite. But the bounded functions from $S$ to $X$ do constitute a metric space under the same definition of $d_{X^S}$. A function is said to be “bounded” if its image is a bounded set. You should check that $d_{X^S}(\,f,g)$ is in fact finite for bounded $f$ and $g$.

Now that metric topology is in 55b, not 55a, the following observation can be made: if $X$ is $\bf R$ or $\bf C$, the bounded functions in $X^S$ constitute a vector space, and the sup metric comes from a norm on that vector space: $d(\,f,g) = \| \, f-g \|$ where the norm $\left\| \cdot \right\|$ is defined by $\| \, f \| = \sup_{s \in S} | \, f(s)|.$ Likewise for the bounded functions from $S$ to any normed vector space. Such spaces will figure in our development of real analysis (and in your further study of analysis beyond Math 55).

The “Proposition” on page 3 of the first topology handout can be extended as follows:
iv) For every $p \in X$ there exists a real number $M$ such that $d(p,q) \lt M$ for all $q \in E.$
In other words, for every $p \in X$ there exists an open ball about $p$ that contains $E.$ Do you see why this is equivalent to (i), (ii), and (iii)?

Metric Topology II
Open and closed sets, and related notions

Metric Topology III
Introduction to functions and continuity

Metric Topology IV
Sequences and convergence, etc.

The proof of “uniform limit of continuous is continuous” also shows that “uniform limit of uniformly continuous is uniformly continuous”: if each $f_n$ is uniformly continuous then a single $\delta$ will work for all $x$.

A typical application is the continuity of power series such as $\sum_{k=0}^\infty x^k/k!$ (which by definition is the pointwise limit of the partial sums $f_n(x) = \sum_{k=0}^n x^k/k!$). The limit is not uniform as $x$ ranges over all of $\bf R$ (or $\bf C$), but it is uniform for $x$ in every ball $B_R(0)$, and since every $x$ is in some such ball the limit function is continuous. Likewise for any power series $\sum_{k=0}^\infty a_k x^k$ with $\{a_k\}$ bounded, as long as $x$ is within the open circle of convergence $|x| \lt 1$: the limit is not in general uniform, but it is uniform in the ball $B_r(0)$ for every $r \lt 1$ (why?), which is good enough because $|x|\lt 1$ means that $x$ is contained in such a ball. In either case, each $f_n$ is easily seen to be uniformly continuous on bounded sets, so the argument noted in the previous paragraph shows that the limit function $f$ is also uniformly continuous in every ball on which we showed that it is continuous (that is, arbitrary $B_R(0)$ for $\sum_{k=0}^\infty x^k/k!$, or $B_r(0)$ with $r \lt 1$ for $\sum_{k=0}^\infty a_k x^k$ with $\{a_k\}$ bounded). [NB we need to use open balls $B$ so that we can test continuity at $x \in X$ on a neighborhood in $B$: for small enough $\epsilon$, the $\epsilon$-neighborhood of $x$ in $X$ is contained in $B$.]
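A quick numerical illustration of the first example (not part of the original notes; the helper names `partial_exp` and `sup_error_on_ball` are ours, and the supremum is only estimated by sampling a grid of points): on a fixed interval $[-R,R]$ the sup error of $f_n$ shrinks as $n$ grows, while for fixed $n$ it blows up as $R$ grows, so the convergence is uniform on each ball but not on all of $\bf R$.

```python
import math

def partial_exp(x, n):
    """Partial sum f_n(x) = sum_{k=0}^{n} x**k / k! of the exponential series."""
    term, total = 1.0, 1.0
    for k in range(1, n + 1):
        term *= x / k          # term is now x**k / k!
        total += term
    return total

def sup_error_on_ball(n, R, samples=401):
    """Largest |exp(x) - f_n(x)| over equally spaced sample points of [-R, R]."""
    xs = [-R + 2 * R * j / (samples - 1) for j in range(samples)]
    return max(abs(math.exp(x) - partial_exp(x, n)) for x in xs)

# On a fixed ball the error goes to 0 as n grows ...
for n in (5, 10, 20, 40):
    print(n, sup_error_on_ball(n, R=3.0))
# ... but for fixed n it grows without bound as the ball grows.
for R in (3.0, 10.0, 30.0):
    print(R, sup_error_on_ball(10, R))
```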

Metric Topology V
Compactness and sequential compactness

Metric Topology VI
Cauchy sequences and related notions (completeness, completions, and a third formulation of compactness)

Here is a more direct proof (using sequential compactness) of the theorem that a continuous map $f: X \to Y$ between metric spaces is uniformly continuous if $X$ is compact. Assume not. Then there exists $\epsilon \gt 0$ such that for all $\delta \gt 0$ there are some points $p,q \in X$ such that $d(p,q) \lt \delta$ but $d(\,f(p),f(q)) \geq \epsilon.$ For each $n = 1, 2, 3, \ldots,$ choose $p_n, q_n$ that satisfy those inequalities for $\delta = 1/n.$ Since $X$ is assumed compact, we can extract a subsequence $\{ p_{n_i} \! \}$ of $\{p_n\!\}$ that converges to some $p \in X.$ But then $\{q_{n_i}\!\}$ converges to the same $p$, because $d(p_{n_i}, q_{n_i}) \lt 1/n_i \to 0.$ Hence, by continuity of $f$ at $p$, both $f(p_{n_i})$ and $f(q_{n_i})$ converge to $f(p),$ so $d(\,f(p_{n_i}), f(q_{n_i})) \to 0,$ which contradicts the fact that $d(\,f(p_{n_i}), f(q_{n_i})) \geq \epsilon$ for each $i$.

Our next topic is differential calculus of vector-valued functions of one real variable, building on Chapter 5 of Rudin.

You may have already seen “little oh” and “big Oh” notations. For functions $f,g$ on the same space, “$f = O(g)$” means that $g$ is a nonnegative real-valued function, $\,f$ takes values in a normed vector space, and there exists a real constant $M$ such that $\left|\,f(x)\right| \le M g(x)$ for all $x$. The notation “$f = o(g)$” is used in connection with a limit; for instance, “$f(x) = o(g(x))$ as $x$ approaches $x_0$” indicates that $f,g$ are vector- and real-valued functions as above on some neighborhood of $x_0$, and that for each $\epsilon \gt 0$ there is a neighborhood of $x_0$ such that $\left|\,f(x)\right| \le \epsilon g(x)$ for all $x$ in the neighborhood. (Given $g$ and the target of $f\!$, functions $f=O(g)$ form a vector space, which contains functions $o(g)$ as a subspace.) Thus $f'(x_0) = a$ means the same as “$f(x) = f(x_0) + a (x-x_0) + o(\left|x-x_0\right|)$ as $x$ approaches $x_0\!$”, with no need to exclude the case $x = x_0.$ Rudin in effect uses this approach when proving the Chain Rule (5.5).
Apropos the Chain Rule: as far as I can see we don’t need continuity of $f$ at any point except $x$ (though that hypothesis will usually hold in any application). All that’s needed is that $x$ has some relative neighborhood $N$ in $[a,b]$ such that $f(N)$ is contained in $I$. Also, it is necessary that $f$ map $[a,b]$ to $\bf R$, but $g$ can take values in any normed vector space.
The derivative of $f/g$ can be obtained from the product rule, together with the derivative of $1/g$ — which in turn can be obtained from the Chain Rule together with the derivative of the single function $1/x$. [Also, if you forget the quotient-rule formula, you can reconstruct it from the product rule by differentiating both sides of $f = g \cdot (\,f/g)$ and solving for $(\,f/g)';$ but this is not a proof unless you have some other argument to show that the derivative exists in the first place.] Once we do multivariate differential calculus, we’ll see that the derivatives of $f+g$, $f-g$, $fg$, $f/g$ could also be obtained in much the same way that we showed the continuity of those functions, by combining the multivariate Chain Rule with the derivatives of the specific functions $x+y$, $x-y$, $xy$, $x/y$ of two variables $x,y.$
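Written out, the route just described is the following short computation, valid at any $x$ where $f$ and $g$ are differentiable, $g$ is real-valued, and $g(x) \neq 0$ (here $h(y) = 1/y$ is our name for the auxiliary function):

```latex
% h(y) = 1/y, so h'(y) = -1/y^2; Chain Rule first, then the product rule.
\Bigl(\frac1g\Bigr)'(x) = h'\bigl(g(x)\bigr)\,g'(x) = -\frac{g'(x)}{g(x)^2},
\qquad
\Bigl(\frac fg\Bigr)' = \Bigl(f \cdot \frac1g\Bigr)' = \frac{f'}{g} - \frac{f\,g'}{g^2} = \frac{f'g - f\,g'}{g^2}.
```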
As Rudin notes at the end of this chapter, differentiation can also be defined for vector-valued functions of one real variable. As Rudin does not note, the vector space can even be infinite-dimensional, provided that it is normed; and the basic algebraic properties of the derivative listed in Thm. 5.3 (p.104) can be adapted to this generality, e.g., the formula $(\,fg)' = f'g + fg'$ still holds if $f,g$ take values in normed vector spaces $U,V$ and multiplication is interpreted as a continuous bilinear map from $U \times V$ to some other normed vector space $W$.
“Rolle’s Theorem” is the special case $f(b) = f(a)$ of Rudin’s Theorem 5.10; as you can see it is in effect the key step in his proof of Theorem 5.9, and thus of 5.10 as well. [As you can see from the Wikipedia page, the attribution of this result to Michel Rolle (1652–1719) is problematic in several ways, and seems to be a good example of “Stigler’s law of eponymy”.]
We omit 5.12 (continuity of derivatives) and 5.13 (L’Hôpital’s Rule). In 5.12, see p.94 for Rudin’s notion of “simple discontinuity” (or “discontinuity of the first kind”) vs. “discontinuity of the second kind”, but please don’t use those terms in your problem sets or other mathematical writing, since they’re not widely known. In Rudin’s proof of L’Hôpital’s Rule (5.13), why can he assume that $g(x)$ does not vanish for any $x$ in $(a,b)$, and that the denominator $g(x) - g(y)$ in equation (18) is never zero?
NB The norm does not have to come from an inner product structure. Often this does not matter because we work in finite-dimensional vector spaces, where all norms are equivalent, and changing to an equivalent norm does not affect the definition of the derivative. One exception to this is Theorem 5.19 (p.113), where one needs the norm exactly rather than up to a constant factor. This theorem still holds for a general norm but requires an additional argument. The key ingredient of the proof is this: given a nonzero vector $z$ in a normed vector space $V\!$, we want a continuous functional $w$ on $V\,$ such that $\left\| w \right \| = 1$ and $w(z) = \left| z \right|.$ If $V$ is an inner product space (finite-dimensional or not), the inner product with $z \left/ \left| z \right| \right.$ provides such a functional $w$. But this approach does not work in general. The existence of such $w$ is usually proved as a corollary of the Hahn-Banach theorem. When $V$ is finite-dimensional, $w$ can be constructed by induction on the dimension of $V\!$. To deal with the general case one must also invoke the Axiom of Choice in its usual guise of Zorn’s Lemma.

We next start on univariate integral calculus, largely following Rudin, chapter 6. The following gives some motivation for the definitions there. (And yes, it’s the same Riemann (1826–1866) who gave number theorists like me the Riemann zeta function and the Riemann Hypothesis.)

The Riemann-sum approach to integration goes back to the “method of exhaustion” of classical Greek geometry, in which the area of a plane figure (or the volume of a region in space) is bounded below and above by finding subsets and supersets that are finite unions of disjoint rectangles (or boxes). The lower and upper Riemann sums adapt this idea to the integrals of functions that may be negative as well as positive (recall that one of the weaknesses of geometric Greek mathematics is that the ancient Greeks had no concept of negative quantities — nor, for that matter, of zero). You may have encountered the quaint technical term “quadrature”, used in some contexts as a synonym for “integration”. This too is an echo of the geometrical origins of integration. “Quadrature” literally means “squaring”, meaning not “multiplying by itself” but “constructing a square of the same size as”; this in turn is equivalent to “finding the area of”, as in the phrase “squaring the circle”. For instance, Greek geometry contains a theorem equivalent to the computation of $\int x^2 \, dx,$ a result called the “quadrature of the parabola”. The proof is tantamount to the evaluation of lower and upper Riemann sums for the integral of $x^2 \, dx$.
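To make the “quadrature of the parabola” concrete, here is a small sketch (our own illustration, using exact rational arithmetic from Python’s `fractions` module) of the lower and upper Riemann sums of $x^2$ on $[0,1]$ for $n$ equal subintervals. Since $x^2$ is increasing on $[0,1]$, the inf and sup on each subinterval sit at its left and right endpoints, so the two sums differ by exactly $1/n$ and squeeze the value $1/3$.

```python
from fractions import Fraction

def riemann_sums_x_squared(n):
    """Exact lower and upper Riemann sums of x**2 on [0, 1] for n equal subintervals.

    x**2 is increasing on [0, 1], so on [(i-1)/n, i/n] its inf is at the left
    endpoint and its sup is at the right endpoint.
    """
    width = Fraction(1, n)
    lower = sum(Fraction(i - 1, n) ** 2 * width for i in range(1, n + 1))
    upper = sum(Fraction(i, n) ** 2 * width for i in range(1, n + 1))
    return lower, upper

for n in (4, 10, 100):
    lower, upper = riemann_sums_x_squared(n)
    print(n, lower, upper, float(upper - lower))  # the gap is exactly 1/n
```

In closed form the upper sum is $(n+1)(2n+1)/6n^2$, which is the kind of identity the Greek proof establishes geometrically.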
An alternative explanation of the upper and lower Riemann sums, and of “partitions” and “refinements” (Definitions 6.1 and 6.3 in Rudin), is that they arise by repeated application of the following two axioms describing the integral (see for instance L. Gillman’s expository paper in the American Mathematical Monthly (Vol. 100, #1, 16–25)):

For any $a,b,c$ (with $a \lt b \lt c$) we have $\int_a^c f(x)\, dx = \int_a^b f(x)\, dx + \int_b^c f(x)\, dx;$
If a function $f: [a,b] \to {\bf R}$ takes values in $[m,M]$ then $\int_a^b f(x) \, dx \in [m(b-a),M(b-a)]$ (again assuming $a \lt b$).
The latter axiom is a consequence of the following two: the integral $\int_a^b K \, dx$ of a constant function $K$ is $K(b-a);$ and if $f(x) \le g(x)$ for all $x$ in the interval $[a,b]$ then $\int_a^b f(x) \, dx \le \int_a^b g(x) \, dx.$ Note that again all these axioms arise naturally from an interpretation of the integral as a “signed area”.
The (Riemann-)Stieltjes integral, with $d\alpha$ in place of $dx$, is then obtained by replacing each $\Delta x = b - a$ by $\Delta\alpha = \alpha(b) - \alpha(a).$
Here’s a version of Riemann-Stieltjes integrals that works cleanly for integrating bounded functions from $[a,b]$ to any complete normed vector space.

In Theorem 6.11 (integrable functions are preserved under continuous maps), we readily generalize to the integrability over $[a,b]$ of $h = \phi \circ f$ when $f:[a,b] \to [m,M]$ is integrable and $\phi$ is a continuous map from $[m,M]$ to a complete normed vector space $V$. If we want to generalize further by letting $\,f$ itself be vector-valued, then we must explicitly assume that $\phi$ is uniformly continuous, which Rudin doesn’t have to do in 6.11 because $[m,M]$ is compact.
In Theorem 6.12, property (a) says the integrable functions form a vector space, and the integral is a linear transformation; property (d) says it’s a bounded transformation relative to the sup norm, with operator norm at most $\Delta\alpha = \alpha(b)-\alpha(a)$ (indeed it’s not hard to show that the operator norm equals $\Delta\alpha$); and (b) and (c) are the axioms noted above. Property (e) almost says the integral is linear as a function of $\alpha$ — do you see why “almost”?
Recall the “integration by parts” identity: $fg$ is an integral of $\,f \, dg + g \, df.$ The Stieltjes integral is a way of making sense of this identity even when $\,f$ and/or $g$ is not continuously differentiable. To be sure, some hypotheses on $\,f$ and $g$ must still be made for the Stieltjes integral of $\,f\, dg$ to make sense. Rudin specifies one suitable system of such hypotheses in Theorem 6.22.
Riemann-Stieltjes integration by parts: Suppose both $\,f$ and $g$ are increasing functions on $[a,b].$ For any partition $a = x_0 \lt \cdots \lt x_n = b$ of the interval, write $\,f(b) g(b) - f(a) g(a)$ as the telescoping sum $\sum_{i=1}^n \left(f(x_i) g(x_i) - f(x_{i-1}) g(x_{i-1})\right).$ Now rewrite the $i$-th summand as $$ f(x_i) (g(x_i) - g(x_{i-1})) + g(x_{i-1}) (f(x_i) - f(x_{i-1})). $$ [Naturally it is no accident that this identity resembles the one used in the familiar proof of the formula for the derivative of $fg$!] Since $f$ and $g$ are increasing, $f(x_i)$ is the largest value of $f$ and $g(x_{i-1})$ the smallest value of $g$ on $[x_{i-1},x_i]$, so summing this over $i$ yields the upper Riemann-Stieltjes sum for the integral of $\,f \, dg$ plus the lower R.-S. sum for the integral of $g \, df$. Therefore: if one of these integrals exists, so does the other, and their sum is $\,f(b) g(b) - f(a) g(a).$ [Cf. Rudin, page 141, Exercise 17.]
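The identity behind this argument is purely algebraic, so it can be checked exactly on any partition. In this sketch (our own; the increasing functions `f` and `g` are arbitrary illustrative choices, and `g` deliberately has a jump at $x = 1/2$), the upper sum for $f\,dg$ plus the lower sum for $g\,df$ reproduces $f(b)g(b) - f(a)g(a)$.

```python
import math
from fractions import Fraction

def f(x):
    """An increasing function on [0, 1] (illustrative choice)."""
    return x * x

def g(x):
    """An increasing function on [0, 1] with a jump at 1/2 (illustrative choice)."""
    return math.floor(2 * x) + x

def parts_sums(f, g, points):
    """Return (upper R.-S. sum for f dg, lower R.-S. sum for g df) on the partition.

    These are the upper/lower sums when f and g are increasing: f(x_i) is then the
    sup of f and g(x_{i-1}) the inf of g on [x_{i-1}, x_i].
    """
    pairs = list(zip(points, points[1:]))
    upper_f_dg = sum(f(x1) * (g(x1) - g(x0)) for x0, x1 in pairs)
    lower_g_df = sum(g(x0) * (f(x1) - f(x0)) for x0, x1 in pairs)
    return upper_f_dg, lower_g_df

partition = [Fraction(0), Fraction(1, 5), Fraction(1, 3), Fraction(1, 2), Fraction(3, 4), Fraction(1)]
upper, lower = parts_sums(f, g, partition)
print(upper + lower, f(partition[-1]) * g(partition[-1]) - f(partition[0]) * g(partition[0]))
```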

Some of Chapter 7 of Rudin we’ve covered already in the topology lectures and problem sets. For more counterexamples along the lines of the first section of that chapter, see Counterexamples in Analysis by B. R. Gelbaum and J. M. H. Olmsted — there’s a copy in the Science Center library (QA300.G4). Concerning Thm. 7.16, be warned that it can easily fail for “improper integrals” on infinite intervals. It is often very useful to bring a limit or an infinite sum within an integral sign, but this procedure requires justification beyond Thm. 7.16.
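A standard example of the failure (our own choice, not taken from Rudin): on $[0,\infty)$ let $f_n$ be $1/n$ times the indicator function of $[0,n]$. Then $f_n \to 0$ uniformly, but the improper integrals do not converge to the integral of the limit:

```latex
f_n(x) = \begin{cases} 1/n, & 0 \le x \le n,\\ 0, & x > n, \end{cases}
\qquad \sup_{x \ge 0} |f_n(x)| = \frac1n \to 0,
\qquad \int_0^\infty f_n(x)\,dx = 1 \not\to 0 = \int_0^\infty 0\,dx.
```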

We’ll cover some of the new parts of Chapter 7: the Weierstrass $M$-test (7.10), extended to vector-valued functions; uniform convergence and $\int$ (7.16, again in the vector-valued setting, with the target space $V$ normed and complete); and the Stone-Weierstrass theorem, which is the one major result of Chapter 7 we haven’t seen yet. We then proceed to power series and the exponential and logarithmic functions in Chapter 8. We omit most of the discussion of Fourier series (pp. 185–192), an important topic (which used to be the concluding topic of Math 55b), but one that alas cannot be accommodated given the mandates of the curricular review. We’ll encounter a significant special case in the guise of Laurent expansions of an analytic function on a disc. See these notes (part 1, part 2) from 2002-3 on Hilbert space for a fundamental context for Fourier series and much else (notably much of quantum mechanics), which is also what we’ll use to give one proof of the Müntz-Szász theorem on uniform approximation on $[0,1]$ by linear combinations of arbitrary powers. [Yes, if I were to rewrite these notes now I would not have to define separability, because we already did that in the course of developing the general notion of compactness.]

We also postpone discussion of Euler’s Beta and Gamma integrals (also in Chapter 8) so that we can use multivariate integration to give a more direct proof of the formula relating them.

The result concerning the convergence of alternating series is stated and proved on pages 70-71 of Rudin (Theorem 3.42).

The original Weierstrass approximation theorem (7.26 in Rudin) can be reduced to the uniform approximation of the single function $|x|$ on $[-1,1].$ From this function we can construct an arbitrary piecewise linear continuous function, and such piecewise linear functions uniformly approximate any continuous function on a closed interval.(*) [They also give yet another example of a natural vector space with an uncountably infinite algebraic basis.] To get at $|x|,$ we’ll rewrite it as $[1 - (1-x^2)]^{1/2},$ and use the power series for $(1-X)^{1/2}.$ We need $(1-X)^{1/2}$ to be approximated by its power series uniformly on the closed interval $[-1,1]$ (or at least $[0,1]$); but fortunately this too follows from the proof of Abel’s theorem (8.2, pages 174-5). Actually this is a subtler result than we need, since the $X^n$ coefficient of the power series for $(1-X)^{1/2}$ is negative for every $n \gt 0.$ If a power series $f(X)$ has radius of convergence 1 and all but finitely many of its nonzero coefficients have the same sign, then it is easily shown that the sum of the coefficients converges if and only if $f(X)$ has a finite limit as $X \to 1,$ in which case the sum equals that limit and the power series converges uniformly on $[0,1].$ That’s all we need because clearly $(1-X)^{1/2}$ extends to a continuous function on $[0,1].$ (For an alternative approach to uniformly approximating $|x|,$ see Exercise 23 on p.169.)
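Here is a numerical sketch of this construction (our own illustration; the function names are hypothetical). It truncates the series $\sum_n a_n X^n$ for $(1-X)^{1/2}$ at degree $N$, substitutes $X = 1 - x^2$, and measures the worst error against $|x|$ on a grid in $[-1,1]$. Because every $a_n$ with $n \gt 0$ is negative, the error is largest at $X = 1$ (that is, $x = 0$), where it equals the tail $\sum_{n \gt N} |a_n|$, which works out to $\binom{2N}{N}/4^N$; so the convergence is uniform but slow, of order $N^{-1/2}$.

```python
from math import comb

def sqrt_series_coeffs(N):
    """Coefficients a_0, ..., a_N of the power series of (1 - X)**(1/2).

    Uses a_0 = 1 and a_n = a_{n-1} * (n - 3/2) / n, so a_1 = -1/2, a_2 = -1/8, ...
    """
    a = [1.0]
    for n in range(1, N + 1):
        a.append(a[-1] * (n - 1.5) / n)
    return a

def abs_approx(x, a):
    """Evaluate the truncated series at X = 1 - x**2 by Horner's rule."""
    X = 1.0 - x * x
    total = 0.0
    for c in reversed(a):
        total = total * X + c
    return total

def sup_error(N, samples=1001):
    """Largest |abs_approx(x) - |x|| over an evenly spaced grid on [-1, 1] (includes x = 0)."""
    a = sqrt_series_coeffs(N)
    xs = [-1.0 + 2.0 * k / (samples - 1) for k in range(samples)]
    return max(abs(abs_approx(x, a) - abs(x)) for x in xs)

for N in (10, 100, 1000):
    print(N, sup_error(N), comb(2 * N, N) / 4 ** N)  # the two columns should agree
```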

(*) Let $\,f$ be any continuous function on $[0,1]$. It is uniformly continuous because $[0,1]$ is compact. So, given $\epsilon \gt 0$ there exists $\delta \gt 0$ such that $|x-x'| \lt \delta \Rightarrow |\,f(x)-f(x')| \lt \epsilon$. Now let $g$ be the piecewise linear function such that $g(x) = f(x)$ at $x=0,\delta,2\delta,3\delta,\ldots,N\delta$ (with $N = \lfloor 1/\delta \rfloor$) and at $x=1$, and is (affine) linear on $[N\delta,1]$ and on each $[(i-1)\delta,i\delta]$ ($1 \leq i \leq N$). Exercise: $|\,f(x)-g(x)| \lt \epsilon$ for all $x \in [0,1]$. So we have uniformly approximated $\,f$ to within $\epsilon$ by the piecewise-linear continuous function $g$.
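The construction in (*) is easy to carry out by machine. This sketch (our own; `piecewise_linear_interpolant` is a hypothetical name) builds $g$ from the nodes $0, \delta, \ldots, N\delta$ and $1$. For the test function $f(x) = \sqrt x$ we may take $\delta = \epsilon^2$, because $|\sqrt x - \sqrt{x'}| \le |x - x'|^{1/2}$; the maximum error over a sample grid should then be below $\epsilon$.

```python
import bisect
import math

def piecewise_linear_interpolant(f, delta):
    """Return g with g = f at 0, delta, ..., N*delta (N = floor(1/delta)) and at 1,
    affine linear in between."""
    N = math.floor(1 / delta)
    nodes = [i * delta for i in range(N + 1)]
    if nodes[-1] < 1.0:
        nodes.append(1.0)
    values = [f(t) for t in nodes]

    def g(x):
        # index j with nodes[j-1] <= x <= nodes[j], clamped to a valid subinterval
        j = min(max(bisect.bisect_right(nodes, x), 1), len(nodes) - 1)
        x0, x1 = nodes[j - 1], nodes[j]
        t = (x - x0) / (x1 - x0)
        return (1 - t) * values[j - 1] + t * values[j]

    return g

eps = 0.1
g = piecewise_linear_interpolant(math.sqrt, delta=eps ** 2)
grid = [k / 10000 for k in range(10001)]
print(max(abs(math.sqrt(x) - g(x)) for x in grid))  # expected to be below eps
```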

Rudin’s notion of an “algebra” of functions is almost a special case of what we called an “algebra over $\bf F$” in 55a (with ${\bf F} = \bf R$ or $\bf C$ as usual), except that Rudin does not require his algebras to have a unit (else he wouldn’t have to impose the “vanish on no point” condition). The notion can be usefully abstracted to a “normed algebra over $\bf F$”, which is an algebra together with a vector space norm $\| \cdot \|$ satisfying $\|xy\| \le \|x\| \, \|y\|$ for all $x$ and $y$ in the algebra. Among other things this leads to the Stone-Čech theorem.

In the first theorem of Chapter 8, Rudin obtains the termwise differentiability of a power series at any $x$ with $|x| \lt R$ by applying Theorem 7.17. That’s nice, but we’ll want to use the same result in other contexts, notably over $\bf C$, where the mean value theorem does not apply. So we instead give an argument that works in any complete field with an absolute value — this includes $\bf R$, $\bf C$, and other examples such as the field ${\bf Q}_p$ of $p$-adic numbers. If the sum of $c_n x^n$ converges at some point $x_1$ with $|x_1| = R \gt 0$, then any $x$ satisfying $|x| \lt R$ has a neighborhood that is still contained in $\{ y : |y| \lt R\}$. So if $\,f(x)$ is the sum of that series, then for $y \neq x$ in that neighborhood we may form the usual quotient $(\,f(y)-f(x)) \, / \, (y-x)$ and expand it termwise, then let $y \to x$ and recover the expected power series for $f'(x)$ using the Weierstrass $M$ test (Theorem 7.10).

An alternative derivation of formula (26) on p.179: differentiate the power series (25) termwise (now that we know this works also over $\bf C$) to show $dE(z)/dz = E(z)$; then for any fixed $w$ the difference $F(z) = E(w+z)-E(w)E(z)$ is an analytic function of $z$ that vanishes at $z = 0$ and satisfies $F' = F$. Hence every derivative of $F$ vanishes at $z = 0$, and $F$ is thus zero everywhere.

In algebraic terms, identities (26) and (27) say that $E$ (that is, $\exp$) gives group homomorphisms from $({\bf R}, +)$ to ${\bf R}^*$ and from $({\bf C}, +)$ to ${\bf C}^*$. Theorem 8.6 includes the assertion that in the real case this map has image the positive reals, and trivial kernel; so there is a well-defined inverse function from the multiplicative group of positive reals back to $({\bf R}, +);$ and that’s the logarithm function. In the complex case, we shall soon see that the image is all of ${\bf C}^*$, but the kernel is no longer trivial (in fact, $\ker(\exp)$ consists of the integer multiples of $2\pi i$), which means that more care will be needed if we want to define and use logarithms of complex numbers.
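Identities (26) and (27) and the kernel element $2\pi i$ can at least be sanity-checked numerically. The sketch below (our own; `exp_series` is a hypothetical helper, and a numerical check is of course not a proof) sums the series (25) directly with complex arithmetic.

```python
import cmath

def exp_series(z, terms=80):
    """Partial sum of sum_{k < terms} z**k / k! (accurate for moderate |z|)."""
    term, total = 1 + 0j, 1 + 0j
    for k in range(1, terms):
        term *= z / k          # term is now z**k / k!
        total += term
    return total

w, z = 0.7 - 1.2j, -0.4 + 2.5j
print(abs(exp_series(w + z) - exp_series(w) * exp_series(z)))  # E(w+z) = E(w)E(z), up to rounding
print(abs(exp_series(2j * cmath.pi) - 1))                      # 2*pi*i is in the kernel of exp
print(abs(exp_series(w) - cmath.exp(w)))                       # agrees with the library exponential
```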

Small error in Rudin: the argument on p.180 that “Since $E$ [a.k.a. $\exp$] is strictly increasing and differentiable on [the real numbers], it has an inverse function $L$ which is also strictly increasing and differentiable …” is not quite correct: consider the strictly increasing and differentiable function $x \mapsto x^3.$ What’s the correct statement? (Hint: the Chain Rule tells you what the derivative of the inverse function must be.)

In any case, we have deliberately omitted the univariate Inverse Function Theorem in anticipation of the multivariate setting where the Inverse and Implicit Function Theorems are equivalent. However, if there is a differentiable inverse function then we know its derivative from the Chain Rule. So if $L'(y)$ exists then it equals $1/y;$ this together with $L(1) = 0$ gives us the integral formula $L(y) = \int_1^y dx/x$ (via the Fundamental Theorem of Calculus), and then we can define $L(y)$ by this formula, and differentiate to prove that it is in fact the inverse function of $E$ for $y \gt 0.$
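As a numerical sanity check of the formula $L(y) = \int_1^y dx/x$ (our own sketch; we chose composite Simpson’s rule as the quadrature, and `log_by_quadrature` is a hypothetical name), integrating $1/x$ from $1$ to $E(t)$ should return approximately $t$.

```python
import math

def log_by_quadrature(y, n=1000):
    """Approximate L(y) = integral of dx/x from 1 to y by composite Simpson's rule (n even, y > 0)."""
    h = (y - 1) / n              # negative when y < 1, which orients the integral correctly
    s = 1.0 + 1.0 / y            # values of 1/x at the endpoints x = 1 and x = y
    for i in range(1, n):
        x = 1 + i * h
        s += (4 if i % 2 else 2) / x   # Simpson weights 4, 2, 4, ..., 4
    return s * h / 3

for t in (-2.0, 0.5, 3.0):
    print(t, log_by_quadrature(math.exp(t)))  # should be close to t
```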

The same approach identifies $\tan^{-1}(y)$ with $\int_0^y dx/(x^2+1)$ once we have constructed the sine and cosine functions (Rudin’s “$S$” and “$C$”) and checked that the derivative of their ratio $\tan(x)$ is $\tan^2(x) + 1.$ This yields the power-series expansion $$ \tan^{-1}(y) = y - \frac{y^3}{3} + \frac{y^5}{5} - \frac{y^7}{7} + \frac{y^9}{9} - \frac{y^{11}}{11} + - \cdots $$ for $|y| \lt 1$ (be sure you understand how to derive this from the formula for the derivative of $\tan^{-1}(y)$!), and thus also $$ \frac\pi4 = \tan^{-1}(1) = 1 - \frac13 + \frac15 - \frac17 + \frac19 - \frac1{11} + - \cdots $$ (why?). Notes:

You can also check that $1 / (x^2+1)$ is $\left( 1/(x-i) - 1/(x+i) \right) / 2i,$ and that the corresponding linear combination of $\log (x \pm i)$ seems to agree with $\tan^{-1}(x)$ — though I don’t think we are quite in a position yet to make rigorous sense of this route to $\int dx/(x^2+1)$.
The power series for $\sin$, $\cos$, and $\tan^{-1},$ and the alternating series for $\pi/4$, long predate 19th-century calculus. They are often named for Leibniz (1646–1716) or James Gregory (1638–1675), but were already known centuries earlier — together with some computational applications, including the evaluation of $\pi$ as $4 \tan^{-1}(1)$ — to the mathematicians of the Kerala school, and “are believed to have been discovered by Madhava of Sangamagrama (c. 1350 – c. 1425)” according to the “Madhava series” page on Wikipedia.
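The alternating series for $\pi/4$ converges, but slowly: since its terms decrease to zero in absolute value, the usual alternating-series estimate says that the error after $n$ terms is less than the first omitted term $1/(2n+1)$. This small sketch (ours) checks that numerically.

```python
import math

def leibniz_partial_sum(n):
    """Sum of the first n terms 1 - 1/3 + 1/5 - ... of the series for pi/4."""
    return sum((-1) ** k / (2 * k + 1) for k in range(n))

for n in (10, 100, 1000):
    err = abs(math.pi / 4 - leibniz_partial_sum(n))
    print(n, err, 1 / (2 * n + 1))  # error stays below the first omitted term
```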

Similarly we get $\int_0^y dx/(1-x^2)^{1/2} = \sin^{-1} y$ for $|y| \leq 1$; note that this is the “principal value” of $\sin^{-1} y$ (i.e., the choice in $[-\pi/2, \pi/2]$), and that for $y = \pm 1$ the integral is “improper” and must be interpreted as a limit $\lim_{y \to 1-}$ or $\lim_{y \to (-1)+}.$ Likewise $\int dx/(x^2+1)^{1/2}$ leads to inverse functions of the hyperbolic trigonometric functions $\sinh(x) = (e^{x} - e^{-x})/2$ and $\cosh(x) = (e^{x} + e^{-x})/2.$ These basic indefinite integrals, together with elementary changes of variable and integrations by parts, suffice to obtain any indefinite integral one is likely to encounter in a first-year calculus class.

As far as I can tell the final inequality “$\le 2$” in Rudin’s (50) can just as easily be “$\le 1$”, because if we have found a choice of $y$ that makes $C(y)$ negative then $C$ must already vanish somewhere between 0 and $y$. For that matter, we can find such $y$ directly from the power series for $C(y)$: we calculate $\cos 2 \lt 1 - 2^2/2! + 2^4/4! = -1/3 \lt 0$ (the omitted terms $-2^6/6! + 2^8/8!$ etc. pair up to a negative sum); this yields an explicit upper bound of $4$ on $\pi$. Likewise $\cos(x) \gt 0$ whenever $x^2 \leq 2$ (for $x \neq 0$ the terms omitted from $1 - x^2/2!$ pair up to a positive sum), so $\pi/2 \gt \sqrt2$, that is, $\pi^2 \gt 8.$ Better yet, $\cos 1 \gt 1 - 1^2/2! = 1/2 = \cos \pi/3,$ so $\pi \gt 3.$ It is “well known” that in fact $\pi^2$ is less than but rather close to 10; this one-page note explains this fact if you believe that $\zeta(2) = \pi^2 / 6$ ([Euler 1734] — a famous theorem of which we shall give at least one proof before the semester’s end). For much better estimates, integrate $(x-x^2)^4 dx/(x^2+1)$ from 0 to 1, and note that $1/2 \leq 1/(x^2+1) \leq 1$. ☺ [Published by D. P. Dalzell in 1944, as I learned from the replies to this MathOverflow question, where you can also find further information about this nifty proof and some related mathematics.]
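For the record, the computation suggested at the end of that paragraph runs as follows (a routine polynomial division, written out by us; the value $\int_0^1 x^4(1-x)^4\,dx = 4!\,4!/9! = 1/630$ is the Beta integral $B(5,5)$). Note that $(x-x^2)^4 = x^4(1-x)^4$.

```latex
\frac{x^4(1-x)^4}{x^2+1} = x^6 - 4x^5 + 5x^4 - 4x^2 + 4 - \frac{4}{x^2+1},
\qquad\text{so}\qquad
\int_0^1 \frac{x^4(1-x)^4}{x^2+1}\,dx = \frac17 - \frac23 + 1 - \frac43 + 4 - \pi = \frac{22}{7} - \pi .
% Since 1/2 <= 1/(x^2+1) <= 1 on [0,1] (strictly inside), and \int_0^1 x^4(1-x)^4\,dx = 1/630:
\frac{1}{1260} < \frac{22}{7} - \pi < \frac{1}{630}.
```

In particular $3.1412 \lt \pi \lt 3.1421$.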

We next begin multivariate differential calculus, starting at the middle of Rudin Chapter 9 (since the first part of that chapter is for us a review of linear algebra — but you might want to read through the material on norms of linear maps and related topics in pages 208–9). Again, Rudin works with functions from open subsets of ${\bf R}^n$ to ${\bf R}^m$, but most of the discussion works equally well with the target space ${\bf R}^m$ replaced by an arbitrary normed vector space $V\!$. If we want to allow arbitrary normed vector spaces for the domain of $f$, we’ll usually have to require that the derivative $f'$ be a continuous linear map, or equivalently that its norm $\| \, f' \| = \sup_{\|v\|=1} \left|\,f'(v)\right|$ be finite.

As in the univariate case, proving the Mean Value Theorem in the multivariate context (Theorem 9.19) requires either that $V$ have an inner-product norm, or the use of the Hahn-Banach theorem to construct suitable functionals on $V\!$. Once this is done, the key Theorem 9.21 can also be proved for functions to $V$, and without first doing the case $m=1.$ To do this, first prove the result in the special case when each $D_j f({\bf x})$ vanishes; then reduce to this case by subtracting from $f$ the linear map from ${\bf R}^n$ to $V$ indicated by the partial derivatives $D_j \,f({\bf x}).$

The Inverse function theorem (9.24) is a special case of the Implicit function theorem (9.28), and its proof amounts to specializing the proof of the implicit function theorem. But Rudin proves the Implicit theorem as a special case of the Inverse theorem, so we have to do Inverse first. (NB for these two theorems we will assume that our target space is finite-dimensional; how far can you generalize to infinite-dimensional spaces?) Note that Rudin’s statement of the contraction principle (Theorem 9.23 on p.220) is missing the crucial hypothesis that $X$ be nonempty! The end of the proof of 9.24 could be simplified if Rudin allowed himself the full use of the hypothesis that $\bf f$ is continuously differentiable on $E$, not just at $\bf a$: differentiability of the inverse function $\bf g$ at ${\bf b} = {\bf f}({\bf a})$ is easy given Rudin’s construction of $\bf g$; differentiability at any other point ${\bf f}({\bf x})$ follows, since ${\bf f}'({\bf x})$ is still invertible there, so $\bf x$ might as well be ${\bf a};$ and then the derivative ${\bf g}' = ({\bf f}' \circ {\bf g})^{-1}$ is continuous because $\bf g$ and ${\bf f}'$ are (and inversion of linear maps is continuous).
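The heart of Rudin’s construction of the local inverse is the contraction $\varphi(x) = x + A^{-1}(y - f(x))$ with $A = f'(a)$, whose fixed point solves $f(x) = y$. Here is a minimal sketch of that iteration for a hypothetical map $f: {\bf R}^2 \to {\bf R}^2$ chosen (by us) so that $a = 0$, $f(a) = 0$, and $A = f'(0)$ is the identity, which keeps $A^{-1}$ trivial; for $y$ near $0$ the iteration converges because $\varphi'(x) = I - f'(x)$ is small near $0$.

```python
import math

def f(x):
    """Hypothetical C^1 map R^2 -> R^2 with f(0) = 0 and f'(0) = identity."""
    x1, x2 = x
    return (x1 + 0.1 * math.sin(x2), x2 + 0.1 * x1 ** 2)

def local_inverse(y, x_start=(0.0, 0.0), tol=1e-14, max_iter=200):
    """Solve f(x) = y by iterating phi(x) = x + A^{-1}(y - f(x)) with A = f'(0) = I."""
    x = x_start
    for _ in range(max_iter):
        fx = f(x)
        new_x = (x[0] + (y[0] - fx[0]), x[1] + (y[1] - fx[1]))
        if max(abs(new_x[0] - x[0]), abs(new_x[1] - x[1])) < tol:
            return new_x
        x = new_x
    raise RuntimeError("no convergence; y may be too far from f(a)")

x = local_inverse((0.3, -0.2))
print(x, f(x))  # f(x) should reproduce (0.3, -0.2) up to rounding
```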

[We have seen that even in dimension $1$ there can be a function $\,f$ that is differentiable everywhere, and has $f'(0) \neq 0$, but is not locally injective near $0$, and thus has no inverse function. (Necessarily $f'$ is not continuous at $0$.) However, if $f'(x)$ exists and is nonzero for all $x$ in a neighborhood of $0$, then $\,f$ is injective by Rolle’s theorem, and then it does have an inverse function with the expected derivative. Remarkably this generalizes to higher dimensions (replacing “$f'$ nonzero” by “$f'$ invertible”), though the proof requires techniques from algebraic topology such as the Brouwer fixed point theorem instead of Rolle. See this September 2011 entry on Terry Tao’s blog.]

The proof of the second part of the implicit function theorem, which asserts that the implicit function $\bf g$ not only exists but is also continuously differentiable with derivative at $\bf b$ given by formula (58) (p.225), can be done more easily using the chain rule, since $\bf g$ has been constructed as the composition of the following three functions: first, send $\bf y$ to $({\bf 0}, {\bf y})$; then, apply the inverse function ${\bf F}^{-1}$; finally, project the resulting vector $({\bf x}, {\bf y})$ to $\bf x$. The first and last of these three functions are linear, so certainly ${\cal C}^1$; and the continuous differentiability of ${\bf F}^{-1}$ comes from the inverse function theorem.

Here’s an approach to $D_{ij} \, f = D_{ji} \, f$ that works for a ${\cal C}^2$ function to an arbitrary normed space. As in Rudin (see p.235) we reduce to the case of a function of two variables, and define $u$ and $\Delta$. Assume first that $D_{21} \, f$ vanishes at $(a,b)$. Then use the Fundamental Theorem of Calculus to write $\Delta(\,f,Q)$ as the integral of $u'(t) \, dt$ on $[a,a+h]$, and then write $u'(t)$ as the integral of $D_{21} \, f(t,s) \, ds$ on $[b,b+k]$. Since $D_{21} \, f$ is continuous and vanishes at $(a,b)$, conclude that $u'(t) = o(k)$ uniformly for $t \in [a,a+h]$, and thus that $\Delta(\, f,Q) / hk \to 0$ as $h,k \to 0.$ Now apply this to the function $\, f(x,y) - xy \, D_{21} \, f(a,b)$ (note that $\Delta(xy,Q) = hk$) to see that in general $\Delta(\, f,Q) / hk \to D_{21} \, f(a,b)$. Do the same in reverse order to conclude that $D_{21} \, f(a,b) = D_{12} \, f(a,b).$ Can you prove $D_{12} \, f = D_{21} \, f$ for a function $\,f$ to an arbitrary inner product space under the hypotheses of Theorem 9.41?
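A numerical illustration (ours; the test function $f(x,y) = x^3 y + e^{xy}$ is an arbitrary ${\cal C}^2$ choice): the quotient $\Delta(f,Q)/hk$ approaches the mixed partial $D_{21} f(a,b)$ as $h = k \to 0$. Since $\Delta(f,Q)$ treats the two variables symmetrically, the same quotient also approximates $D_{12} f(a,b)$, which is the source of the equality.

```python
import math

def f(x, y):
    """Hypothetical C^2 test function f(x, y) = x**3 * y + exp(x * y)."""
    return x ** 3 * y + math.exp(x * y)

def mixed_partial_exact(x, y):
    """D_21 f = D_2 (3x^2 y + y e^{xy}) = 3x^2 + (1 + xy) e^{xy} (same as D_12 f here)."""
    return 3 * x ** 2 + (1 + x * y) * math.exp(x * y)

def delta_quotient(a, b, h, k):
    """Delta(f, Q) / (h k) for the rectangle Q = [a, a+h] x [b, b+k]."""
    delta = f(a + h, b + k) - f(a + h, b) - f(a, b + k) + f(a, b)
    return delta / (h * k)

a, b = 1.0, 0.5
for h in (1e-1, 1e-2, 1e-3):
    print(h, abs(delta_quotient(a, b, h, h) - mixed_partial_exact(a, b)))  # error shrinks roughly like h
```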

We omit the “rank theorem” (whose lesser importance is noted by Rudin himself), as well as the section on determinants (which we treated at much greater length in Math 55a).

An important application of iterated partial derivatives is the Taylor expansion of an $m$-times differentiable function of several variables; see Exercise 30 (Rudin, pages 243-244). As promised at the start of Math 55a and/or Math 55b, this also applies to maxima and minima of real-valued functions $\,f$ of several variables, as follows. If $\,f$ is differentiable at a local maximum or minimum in the interior of its domain, then its derivative there vanishes, as was the case for a function of one variable. Again we say that a zero of the derivative is a “critical point” of $\,f$. Suppose now that $\, f$ is ${\cal C}^2$ near a critical point. The second derivative can be regarded as a quadratic form. It must be positive semidefinite at a local minimum, and negative semidefinite at a local maximum. Conversely, if it is positive (resp. negative) definite at a critical point then that point is a strict local minimum (resp. maximum) of $\,f$. Compare with Rudin’s exercise 31 on page 244, which however assumes that $\, f$ is ${\cal C}^3$; I don’t think this is needed, though it makes some of the estimates easier to obtain.
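A sketch of this second-derivative test in code, assuming $f$ is ${\cal C}^2$ near the critical point and approximating the Hessian by central differences (the step size `h` is an arbitrary choice). The quadratic form is classified by the signs of its eigenvalues: definite gives a strict extremum, indefinite rules out an extremum (by the semidefiniteness condition above), and a singular semidefinite form is reported as inconclusive.

```python
import numpy as np

def hessian(f, p, h=1e-4):
    """Central-difference approximation to the Hessian of f: R^n -> R at the point p."""
    p = np.asarray(p, dtype=float)
    n = p.size
    E = np.eye(n) * h
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            H[i, j] = (f(p + E[i] + E[j]) - f(p + E[i] - E[j])
                       - f(p - E[i] + E[j]) + f(p - E[i] - E[j])) / (4 * h * h)
    return H

def classify(f, p):
    """Classify a critical point p of f by the eigenvalues of the (symmetric) Hessian."""
    eig = np.linalg.eigvalsh(hessian(f, p))
    if np.all(eig > 0):
        return "strict local minimum"
    if np.all(eig < 0):
        return "strict local maximum"
    if np.any(eig > 0) and np.any(eig < 0):
        return "saddle (neither max nor min)"
    return "inconclusive (semidefinite)"

print(classify(lambda v: v[0]**2 + 3 * v[1]**2, [0, 0]))
print(classify(lambda v: v[0]**2 - v[1]**2, [0, 0]))
print(classify(lambda v: -np.cos(v[0]) - np.cos(v[1]), [0, 0]))
```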

Next topic, and last one from Rudin, is multivariate integral calculus (Chapter 10). Most of the chapter is concerned with setting up a higher-dimensional generalization of the Fundamental Theorem of Calculus that comprises the divergence, Stokes, and Green theorems and much else besides. With varying degrees of regret we’ll omit this material, as well as the Lebesgue theory of Chapter 11. We will, however, get some sense of multivariate calculus by giving a definition of integrals over ${\bf R}^n$ and proving the formula for change of variables (Theorem 10.9). This will already hint at why in general an integral over an $n$-dimensional space is often best viewed as an integral not of a function but of a “differential $n$-form”. For instance, in two dimensions an integrand of “$\,f(x,y) \, dx \, dy$ ” can be thought of as “$\,f(x,y) \, dx \wedge dy$ ”, and then we recover the formula involving the Jacobian from the rules of exterior algebra. You’ll have to read the rest of this chapter of Rudin, and/or take a course on differential geometry or “calculus on manifolds”, to see these ideas developed more fully.
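To illustrate the change of variables formula in the simplest nontrivial case: for polar coordinates $x = r\cos\theta$, $y = r\sin\theta$, the rules of exterior algebra give $dx\wedge dy = (\cos\theta\,dr - r\sin\theta\,d\theta)\wedge(\sin\theta\,dr + r\cos\theta\,d\theta) = r\,dr\wedge d\theta$, which is exactly the Jacobian factor. The following midpoint-rule sketch (grid sizes are arbitrary) checks that $\iint_{x^2+y^2\leq 1}(x^2+y^2)\,dx\,dy = \int_0^{2\pi}\!\int_0^1 r^2\cdot r\,dr\,d\theta = \pi/2$.

```python
import math

def polar_integral(g, n_r=400, n_t=400):
    """Midpoint-rule approximation of the integral of g(x, y) over the unit disc,
    computed in polar coordinates, where dx dy = r dr dtheta."""
    dr, dt = 1.0 / n_r, 2 * math.pi / n_t
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * dr
        for j in range(n_t):
            t = (j + 0.5) * dt
            total += g(r * math.cos(t), r * math.sin(t)) * r * dr * dt
    return total

print(polar_integral(lambda x, y: x * x + y * y), math.pi / 2)
```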

By induction on $n$, the map taking a continuous function $\,f$ on a box $B = \{ x \in {\bf R}^n: \forall i, x_i \in [a_i,b_i] \}$ to its integral on $B$ is a bounded linear map (with respect to the sup norm on continuous functions) whose norm equals the volume $\prod_{i=1}^n (b_i-a_i)$ of the box. The application of Stone-Weierstrass that Rudin uses to derive Fubini’s theorem (for continuous integrands on a box) suggests the following generalization: for any compact metric spaces $X_1,\ldots,X_n$, any continuous $\, f: X_1 \times X_2 \times \cdots \times X_n \to {\bf R}$ can be uniformly approximated by linear combinations of functions of the form $(x_1,x_2,\ldots,x_n) \mapsto \, f_1(x_1) \; f_2(x_2) \cdots \, f_n(x_n)$ for continuous $\, f_i: X_i \to {\bf R}.$ The proof is much the same as in the case of intervals $X_i = [a_i,b_i]$ that Rudin uses.
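On the unit square one can also see such an approximation explicitly, without Stone-Weierstrass: the tensor-product Bernstein polynomial $B_nf(x,y) = \sum_{i,j=0}^n f(i/n,j/n)\,b_{n,i}(x)\,b_{n,j}(y)$, with $b_{n,i}(t) = \binom ni t^i(1-t)^{n-i}$, is visibly a linear combination of products $f_1(x)f_2(y)$, and it converges uniformly to $f$ for continuous $f$. This is a swapped-in constructive technique, not the proof Rudin gives; the test function below is arbitrary, and the grid only estimates the sup norm.

```python
import math

def bernstein_basis(n, i, t):
    return math.comb(n, i) * t**i * (1 - t)**(n - i)

def bernstein2d(f, n):
    """Tensor-product Bernstein approximation of f on [0,1]^2:
    sum over i, j of f(i/n, j/n) b_{n,i}(x) b_{n,j}(y)."""
    samples = [[f(i / n, j / n) for j in range(n + 1)] for i in range(n + 1)]
    def approx(x, y):
        bx = [bernstein_basis(n, i, x) for i in range(n + 1)]
        by = [bernstein_basis(n, j, y) for j in range(n + 1)]
        return sum(samples[i][j] * bx[i] * by[j]
                   for i in range(n + 1) for j in range(n + 1))
    return approx

def sup_error(f, n, m=21):
    """Max of |f - B_n f| over an m x m grid in [0,1]^2 (an estimate of the sup norm)."""
    B = bernstein2d(f, n)
    grid = [k / (m - 1) for k in range(m)]
    return max(abs(f(x, y) - B(x, y)) for x in grid for y in grid)

f = lambda x, y: abs(x - y) + math.sin(3 * x * y)   # continuous, not smooth on the diagonal
for n in (5, 20, 40):
    print(n, sup_error(f, n))
```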

Rudin’s use of “compact support” (top of page 247) doesn’t quite match the definition (10.3, bottom of page 246): as defined there, the only continuous function of compact support is zero! But all that is needed is that the support is contained in a compact set (which is what “compact support” actually means in practice), which by Heine-Borel is equivalent to the assumption that the function has bounded support.

The “partition of unity” constructed in Theorem 10.8 works for compact subsets of any metric space, not just ${\bf R}^n$. In ${\bf R}^n$, it can also be done with differentiable $\psi_i$, or even ${\cal C}^\infty$ ones (but not analytic ones…), by choosing differentiable or ${\cal C}^\infty$ functions $\varphi_i$.

Complex Analysis 1:
Outline of solutions and extensions for the complex analysis problems from the 8th and 9th problem sets
Having defined line integrals, we can deduce that if $f$ is analytic in an open rectangle $R$ then $\oint_\gamma f(z)\, dz = 0$ for any closed path $\gamma$ in $R$, because that's $\oint_\gamma dF(z)$ where $F$ is an antiderivative (as constructed in #4). Likewise for $f$ analytic on an open disc, or on any other convex region (so that we can consistently construct $F$). Note that these paths are not required to be simple (i.e., they may self-intersect). In fact this is true on any simply-connected region $E$. We are not in a position here to properly define this notion, but it is preserved under 1:1 ${\cal C}^1$ maps $T$ with ${\cal C}^1$ inverses (recall that, for a 1:1 ${\cal C}^1$ map $T$, the Inverse Function Theorem shows that $T^{-1}$ is ${\cal C}^1$ if and only if the derivative of $T$ at each point is an invertible linear map). Note that $T$ is not required to be complex analytic!(*) The point is that since $\omega = f(z) \, dz$ is closed, so is its “pullback” $T^*\omega$ (obtained by substituting for $dx$ and $dy$ the total derivatives of the $x$- and $y$-coordinates of $T$); but then, taking the domain of $T$ to be convex (e.g. a rectangle or disc), $\oint_\gamma T^*\omega = 0$ for every closed path $\gamma$ in that domain (because $T^*\omega = dF$ there for some $F$), so $\oint_{T\gamma} f(z) \, dz = \oint_{T\gamma} \omega = \oint_\gamma T^*\omega = 0$. It then follows that $f = F'$ for some analytic function $F$ on our region: we can define $F$ on any connected component by fixing $z_0$ in the component and setting $F(z) = \int_\gamma f(w) \, dw$ for any path $\gamma$ from $z_0$ to $z$; this is well-defined because two different choices of $\gamma$ differ by a closed path, and the integral of $f(w) \, dw$ on a closed path vanishes.
(*) In fact any two connected and simply-connected regions in $\bf C$ are related by an analytic 1:1 map, unless exactly one of them is all of $\bf C$. But that is a considerably harder theorem.
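A numerical sketch of the vanishing of $\oint_\gamma f(z)\,dz$ along a self-intersecting closed path: the figure eight $\gamma(t) = \cos t + i\sin t\cos t$, $0 \leq t \leq 2\pi$ (our choice), integrated with the trapezoid rule, which is very accurate for smooth periodic integrands. For contrast, $dz/(z-\tfrac12)$ is not analytic inside the right-hand loop, which the path traverses once counterclockwise, and its integral is $2\pi i$.

```python
import cmath
import math

def contour_integral(f, gamma, dgamma, n=2000):
    """Trapezoid-rule approximation of the integral of f(z) dz over the closed path
    gamma(t), 0 <= t <= 2 pi (gamma periodic), with derivative dgamma."""
    h = 2 * math.pi / n
    return sum(f(gamma(k * h)) * dgamma(k * h) for k in range(n)) * h

# A figure eight through the origin (it crosses itself at z = 0).
gamma = lambda t: complex(math.cos(t), math.sin(t) * math.cos(t))
dgamma = lambda t: complex(-math.sin(t), math.cos(2 * t))

print(abs(contour_integral(cmath.exp, gamma, dgamma)))                    # essentially 0
print(contour_integral(lambda z: 1 / (z - 0.5), gamma, dgamma) / (2j * math.pi))  # close to 1
```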

Now that we recognize the rectangular $\oint_{\partial R}$ as a special case of a contour integral, we can also recognize $\int_0^{\theta_0} f(Re^{i\theta}) \, d\theta$ as $\int_\gamma f(z) \, \frac{dz}{iz}$ where $\gamma$ is the circular arc from $R$ to $Re^{i\theta_0}$. In particular, the formula $f(a) = (2\pi)^{-1} \int_0^{2\pi} f(a + Re^{i\theta}) \, d\theta$ is tantamount to Cauchy’s integral formula $f(a) = (2\pi i)^{-1} \oint_\gamma f(z) \, \frac{dz}{z-a}$ for a circular contour $\gamma$ centered at $a$ (in each case $\,f$ must be analytic in a neighborhood of the corresponding circular disc). Likewise for our generalization where $a$ can be any point in the open disc, not necessarily its center.
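A sketch checking the generalized formula at an off-center point: parametrizing the circle as $z = c + Re^{i\theta}$ gives $(2\pi i)^{-1}\oint f(z)\,dz/(z-a) = (2\pi)^{-1}\int_0^{2\pi} f(z)\,(z-c)/(z-a)\,d\theta$, which the trapezoid rule approximates; with $a = c$ this is the mean value formula above. The name `cauchy_formula` and its parameters are illustrative.

```python
import cmath
import math

def cauchy_formula(f, a, center=0j, R=1.0, n=256):
    """Approximate (2 pi i)^{-1} times the integral of f(z) dz/(z - a) over |z - center| = R,
    using z = center + w, w = R e^{i theta}, dz = i w dtheta, and the trapezoid rule."""
    total = 0j
    for k in range(n):
        w = R * cmath.exp(2j * math.pi * k / n)
        z = center + w
        total += f(z) * w / (z - a)
    return total / n

a = 0.3 + 0.4j                       # |a| = 0.5, inside the unit circle
print(cauchy_formula(cmath.exp, a), cmath.exp(a))
```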

An important application is the Laurent series of a function analytic in a neighborhood of an annulus $\{ z \in {\bf C} : r \leq |z-a| \leq R \}$, generalizing the power series expansion of an analytic function in a disc. This time we find that if $r \lt |z_0-a| \lt R$ then $$ f(z_0) = \frac1{2\pi i} \oint_{|z-a|=R} f(z) \, \frac{dz}{z-z_0} - \frac1{2\pi i} \oint_{|z-a|=r} f(z) \, \frac{dz}{z-z_0}. $$ The first integral is still $\sum_{n=0}^\infty c_n (z_0-a)^n$ where $c_n = (2\pi i)^{-1} \oint_{|z-a|=R} f(z) \, dz/(z-a)^{n+1},$ using the geometric series $$ \frac 1{z-z_0} = \frac 1{(z-a)-(z_0-a)} = \sum_{n=0}^\infty \frac{(z_0-a)^n}{(z-a)^{n+1}}, $$ which converges uniformly for $z_0$ in compact subsets of the open annulus (and indeed of the open disc $|z_0-a| \lt R$). For the second integral we use the geometric series $$ \frac 1{z-z_0} = \frac 1{(z-a)-(z_0-a)} = -\sum_{n=1}^\infty \frac{(z-a)^{n-1}}{(z_0-a)^n} = \! -\sum_{n=-\infty}^{-1} \frac{(z_0-a)^n}{(z-a)^{n+1}}, $$ which also converges uniformly for $z_0$ in compact subsets of the annulus (and indeed in $|z_0-a| \geq \rho$ for any $\rho \gt r$). We conclude that $f(z_0) = \sum_{n\in\bf Z} c_n (z_0-a)^n$ with $c_n = \frac1{2\pi i} \oint_\gamma f(z) \, dz/(z-a)^{n+1}$ for all $n,$ positive as well as negative. The contour $\gamma$ can be any circle $|z-a|=\rho$ with $\rho \in [r,R],$ or for that matter any closed contour in the annulus that winds around it once. In particular, taking $z_0 = a + \rho e^{i\theta}$ with $r \lt \rho \lt R$ recovers the Fourier series $f(a+\rho e^{i\theta}) = \sum_{n\in\bf Z} c_n \rho^n e^{in\theta}$ of the restriction of $\,f$ to the circle $|z_0-a| = \rho.$
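A sketch computing Laurent coefficients numerically, for the test function $f(z) = 1/((z-1)(z-2)) = 1/(z-2) - 1/(z-1)$ on the annulus $1 \lt |z| \lt 2$ (our choice), whose Laurent coefficients about $a = 0$ are $c_n = -2^{-(n+1)}$ for $n \geq 0$ and $c_n = -1$ for $n \leq -1$. Any circle $|z| = \rho$ with $1 \lt \rho \lt 2$ will do; on it, the trapezoid rule is just a discrete Fourier transform of $\theta \mapsto f(\rho e^{i\theta})$, consistent with the Fourier-series remark above.

```python
import cmath
import math

def laurent_coeff(f, n, rho, a=0j, N=256):
    """Approximate c_n = (2 pi i)^{-1} times the integral of f(z) dz/(z - a)^{n+1}
    over |z - a| = rho, i.e. the average of f(a + w) w^{-n} over N points w on |w| = rho."""
    total = 0j
    for k in range(N):
        w = rho * cmath.exp(2j * math.pi * k / N)
        total += f(a + w) * w ** (-n)
    return total / N

f = lambda z: 1 / ((z - 1) * (z - 2))   # analytic on 1 < |z| < 2
for n in (-3, -2, -1, 0, 1, 2):
    print(n, laurent_coeff(f, n, rho=1.5))
```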

Liouville’s theorem soon follows: every bounded entire function is constant. (An “entire function” is an analytic function $\,f: {\bf C} \to {\bf C}$.) Write $f(z) = \sum_{n=0}^\infty a_n z^n.$ Since the domain is the entire complex plane, we can apply the integral formula $a_n = (2\pi i)^{-1} \oint_{|z|=R} f(z) \, dz/z^{n+1}$ with $R$ arbitrarily large. If $|\,f(z)| \leq M$ for all $z$, this gives $|a_n| \leq M/R^n$ for every $R \gt 0$, and thus $a_n = 0$ for each $n > 0.$ The hypothesis may seem very restrictive, but note that the Fundamental Theorem of Algebra follows immediately on setting $\,f(z) = 1/P(z)$ for a polynomial $P \in {\bf C}[z]$ with no complex roots: $1/P$ is then entire, and it is bounded because $|P(z)| \to \infty$ as $|z| \to \infty$ if $P$ is nonconstant; so $1/P,$ and thus $P,$ must be constant!

The same argument shows more generally that if an entire function grows no faster than a polynomial then it is a polynomial; more precisely, if for some $d$ we have constants $C,R_0$ such that $|\,f(z)| \leq C |z|^d$ for all $z\in\bf C$ with $|z| \geq R_0,$ then $\,f$ is a polynomial of degree at most $d$. Indeed the integral formula shows $|a_n| \leq C R^{d-n}$ for all $R \geq R_0,$ whence $a_n = 0$ for $n \gt d.$
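A sketch of the coefficient estimates behind both results: `taylor_coeff` approximates $a_n = (2\pi i)^{-1}\oint_{|z|=R} f(z)\,dz/z^{n+1}$ by the trapezoid rule, and `cauchy_bound` computes $\max_{|z|=R}|f(z)|/R^n$ (sampled on the circle, so only an estimate of the true bound). For the cubic below (an arbitrary example) the coefficients beyond degree $3$ come out as zero up to rounding, and the bound for $a_5$ shrinks like $R^{-2}$.

```python
import cmath
import math

def taylor_coeff(f, n, R, N=512):
    """a_n = (2 pi i)^{-1} times the integral of f(z) dz / z^{n+1} over |z| = R (trapezoid rule)."""
    total = 0j
    for k in range(N):
        z = R * cmath.exp(2j * math.pi * k / N)
        total += f(z) / z ** n
    return total / N

def cauchy_bound(f, n, R, N=512):
    """The estimate |a_n| <= M(R) / R^n, with M(R) = max |f| on |z| = R (sampled at N points)."""
    M = max(abs(f(R * cmath.exp(2j * math.pi * k / N))) for k in range(N))
    return M / R ** n

p = lambda z: 3 - 2 * z + z ** 3
for n in range(6):
    print(n, taylor_coeff(p, n, R=10.0))
for R in (10.0, 100.0, 1000.0):
    print(R, cauchy_bound(p, 5, R))
```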

The “calculus of residues” is a central tool, both for developing complex analysis and for applications beyond it. If $\,f$ is an analytic function on a punctured neighborhood $E$ of $a \in \bf C$ then the residue of $\,f$ (better, of $\,f(z)\,dz$) at $a$ is $(2\pi i)^{-1} \oint_\gamma f(z) \, dz$ where $\gamma$ is the positively oriented boundary of a disc about $a$ whose closure lies in $E \cup \{a\}$. The factor of $(2\pi i)^{-1}$ is convenient because the residue of a Laurent series $\sum_n c_n (z-a)^n$ (with negative $n$ allowed) is $c_{-1}.$ (Proof: the residue is additive, and the residue of $\sum_{n\neq -1} c_n (z-a)^n$ vanishes because there is an analytic antiderivative $\sum_{n\neq -1} c_n (z-a)^{n+1}/(n+1)$; so it remains to check that $c_{-1} \, dz/(z-a)$ has residue $c_{-1},$ which is a restatement of something we have already shown.) For example, if $\,f$ is actually analytic on an (unpunctured) neighborhood of $a$, then $f(z) \, dz/(z-a)$ has a residue of $f(a)$ at $z=a,$ from which we recover the Cauchy integral formula.
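A sketch computing residues as contour averages: with $z = a + w$, $w = re^{i\theta}$, we have $(2\pi i)^{-1}\oint f(z)\,dz = (2\pi)^{-1}\int_0^{2\pi} f(a+w)\,w\,d\theta$. The two examples are our own choices: $e^z\,dz/z^3$ at $0$ (residue $c_{-1} = 1/2!$) and $e^z\,dz/(z-1)$ at $1$ (residue $e$, as in the Cauchy integral formula).

```python
import cmath
import math

def residue(f, a, r=0.5, N=256):
    """Approximate the residue of f(z) dz at a as the average of f(a + w) * w over |w| = r.
    Assumes f is analytic on 0 < |z - a| <= r (an isolated singularity at a)."""
    total = 0j
    for k in range(N):
        w = r * cmath.exp(2j * math.pi * k / N)
        total += f(a + w) * w
    return total / N

print(residue(lambda z: cmath.exp(z) / z**3, 0))      # close to 0.5
print(residue(lambda z: cmath.exp(z) / (z - 1), 1))   # close to e
```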

If $\gamma$ is the oriented boundary of a simply connected region $E,$ and $\,f$ is analytic on a neighborhood of $\overline{E}$ except for some points $a_1,\ldots,a_k$ in $E,$ then $\oint_\gamma f(z) \, dz = 2\pi i \sum_{j=1}^k {\rm Res}_{z=a_j} f(z) \, dz$. In particular this is true if $\,f$ is meromorphic on a neighborhood of $\overline E$ (that is, if each $a_j$ is at worst a pole). This will be the basis for most of our applications of contour integration to the evaluation of definite integrals. (The main exception is $\int_0^\infty \sin x \, dx/x = \frac12 \int_{-\infty}^\infty \sin x \, dx/x,$ for which we had to deal with the pole of $e^{iz} \, dz/z$ on the natural contour, and got a contribution of half its residue.) For example, if $\gamma$ is the oriented boundary of the semicircle $\{ z \in {\bf C}: |z| \leq R, {\rm Im}(z) \geq 0 \}$ with $R \gt 1$, then $e^{iyz} \, dz/(z^2+1)$ is analytic in a neighborhood of that semicircle except for the simple pole at $z=i,$ where its residue is $e^{-y}/2i$ (because $1/(z^2+1) = 1/((z-i)(z+i))$ etc.); thus if $y \geq 0$ we can let $R \to \infty$ (the integral over the arc is at most $\pi R/(R^2-1)$ in absolute value, since $|e^{iyz}| \leq 1$ when ${\rm Im}(z) \geq 0$) to deduce that $\int_{-\infty}^\infty e^{ixy} \, dx/(x^2+1) = \pi e^{-y},$ whence also $\int_0^\infty \cos(xy) \, dx/(x^2+1) = (\pi/2) e^{-y}.$
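A numerical check of the last formula, using only a truncated composite Simpson rule (no contours). One integration by parts shows that the neglected tail $\int_L^\infty \cos(xy)\,dx/(x^2+1)$ is at most $2/(y(1+L^2))$ in absolute value; the cutoff $L$ and the number of nodes are arbitrary choices.

```python
import numpy as np

def fourier_integral(y, L=2000.0, n=400_000):
    """Composite Simpson's rule for the integral of cos(x y)/(x^2 + 1) over [0, L] (n even).
    The neglected tail over [L, infinity) is at most 2/(y (1 + L^2)) in absolute value."""
    x = np.linspace(0.0, L, n + 1)
    fx = np.cos(x * y) / (x**2 + 1)
    h = L / n
    return h / 3 * (fx[0] + fx[-1] + 4 * fx[1:-1:2].sum() + 2 * fx[2:-1:2].sum())

for y in (0.5, 1.0, 2.0):
    print(y, fourier_integral(y), np.pi / 2 * np.exp(-y))
```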

An important example is a logarithmic derivative $(\log \, f)' = \,f'/f$ (better, a logarithmic differential $d(\log f) = df/f = (\,f'/f) \, dz$). Here $\,f$ is any meromorphic function that is not identically zero. This makes sense even though $\log f$ is not in general well-defined, and is additive: $d(\,fg)/(\,fg) = df/f + dg/g.$ The key fact is that $df/f$ has poles only at zeros and poles of $\,f$: a simple pole of residue $n$ at $z=a$ if $a$ is an order-$n$ zero of $\,f$, and a simple pole of residue $-n$ at $z=a$ if $a$ is an order-$n$ pole. Therefore $\oint_\gamma df/f$ is $2\pi i$ times the difference between the numbers of zeros and poles of $\,f$ enclosed by $\gamma$, counted with multiplicity (assuming that there is no zero or pole on $\gamma$ itself, and also assuming as usual that $\gamma$ is traversed “in the positive direction”). This is a form of the argument principle.
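A sketch of the argument principle for an example of our choosing, $f(z) = (z-\tfrac12)^2(z+0.3i)(z-3)/(z-0.2)$ on the unit circle. The logarithmic derivative is entered by hand, using additivity: $f'/f = 2/(z-\tfrac12) + 1/(z+0.3i) + 1/(z-3) - 1/(z-0.2)$. The contour integral should return $2+1-1 = 2$, since the zero at $3$ lies outside the circle.

```python
import cmath
import math

def zeros_minus_poles(dlogf, center=0j, R=1.0, N=1024):
    """(2 pi i)^{-1} times the integral of (f'/f)(z) dz over |z - center| = R (trapezoid rule).
    dlogf is the logarithmic derivative f'/f; f must have no zeros or poles on the circle."""
    total = 0j
    for k in range(N):
        w = R * cmath.exp(2j * math.pi * k / N)
        total += dlogf(center + w) * w   # dz = i w dtheta; the i cancels against 1/(2 pi i)
    return total / N

dlogf = lambda z: 2 / (z - 0.5) + 1 / (z + 0.3j) + 1 / (z - 3) - 1 / (z - 0.2)
print(zeros_minus_poles(dlogf))   # close to 2
```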

Now, since $2\pi i {\bf Z}$ is a discrete subset of ${\bf C},$ continuous changes in $\,f$ cannot change $\oint_\gamma df/f,$ and thus leave invariant the number of zeros minus poles: zeros can merge with zeros, and poles with poles (combining multiplicity), or can cancel each other (consider $\,f_a(z) = z/(z-a)$ as $a \to 0$, with $\,f_a(z)$ varying continuously on $|z|=1$), but they cannot “escape” or “enter” the region enclosed by $\gamma$. One might more simply say that the total number of zeros with multiplicity remains constant, where a pole of order $n$ is counted as a “zero of order $-n$”. An example is Rouché’s theorem: if $|g(z)| < |\,f(z)|$ for all $z$ on $\gamma$ then $\,f$ and $\,f+g$ have the same counts of zeros minus poles with multiplicity enclosed by $\gamma$. (Proof: consider $\oint_\gamma d(\,f+tg) \, / \, (\,f+tg)$ as $t$ varies from $0$ to $1$, an example of a “homotopy”; this never passes through a zero on $\gamma$, because $|\,f+tg| \geq |\,f| - |g| \gt 0$ there.) In particular, if $\,f$ and $g$ are both analytic then $\gamma$ encloses at least one zero of $\,f$ if and only if it encloses at least one zero of $\,f+g.$ An easy consequence is the open mapping theorem: a nonconstant analytic function on a connected open set takes open sets to open sets. (NB this is not generally true of continuous functions, nor even of analytic functions on $\bf R$; can you find an easy counterexample?)
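Rouché’s theorem in action on an example of our choosing, checked against NumPy's polynomial root finder `numpy.roots`: on $|z| = 2$ the term $z^5$ dominates $3z+1$ (since $|3z+1| \leq 7 \lt 32$), so $z^5+3z+1$ has all $5$ zeros in $|z| \lt 2$; on $|z| = 1$ the term $3z+1$ dominates $z^5$ (since $|3z+1| \geq 2 \gt 1$), so exactly one zero lies in $|z| \lt 1$, as for $3z+1$.

```python
import numpy as np

theta = np.linspace(0.0, 2 * np.pi, 2000, endpoint=False)

def count_zeros_inside(coeffs, radius):
    """Number of roots (with multiplicity) of the polynomial with these coefficients
    (highest degree first, as numpy.roots expects) in the open disc |z| < radius."""
    return int(np.sum(np.abs(np.roots(coeffs)) < radius))

poly = [1, 0, 0, 0, 3, 1]   # z^5 + 3z + 1

# On |z| = 2 the Rouche hypothesis holds with f = z^5, g = 3z + 1.
z2 = 2 * np.exp(1j * theta)
print(np.all(np.abs(3 * z2 + 1) < np.abs(z2**5)), count_zeros_inside(poly, 2))

# On |z| = 1 it holds with the roles swapped: f = 3z + 1, g = z^5.
z1 = np.exp(1j * theta)
print(np.all(np.abs(z1**5) < np.abs(3 * z1 + 1)), count_zeros_inside(poly, 1))
```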

Problem sets 1 and 2: Metric topology basics

Problem set 3: Metric topology cont’d

Problem set 4: Topology finale; differential-calculus prelude

Problem set 5: More univariate differential calculus; introducing univariate integral calculus

Problem set 6: Riemann(-Stieltjes) integration cont’d

Problem set 7: Fourier series via Stone-Weierstrass; power series; manipulating and estimating definite integrals to prove some classical product and sum formulas
Typos in problems 1 (F.Flesher), 2 (C.J.Dowd), and 4,5 (T.Piazza) corrected

Problem set 8: Introduction to multivariate differentiation — and to contour integration and complex analysis
Typo in problem 7 (D.Chiu) corrected 27.iii.2018

Problem set 9: More complex analysis, and (counter)examples of multivariate real analysis
Typos in problems 5 and 9 (A.Sun) corrected 5.iv.2018

Problem set 10: Integration in ${\bf R}^k$; more analysis in $\bf C$
Small error in problem 2 (J.Ahn) corrected 11.iv.2018; typo in problem 3, and missing hypothesis at the end of problem 7 (both D.Xiang), corrected 15.iv.2018

Problem set 11: Complex analysis cont’d:
definite integrals and other uses of residues; product formulas; rational functions; variation on a theme of Jensen
Problem 11 corrected (S. Hu)