Plan for fall – chapters 1 to 4 of the Folland textbook
Pre-midterm 1: chapter 1
Pre-midterm 2: chapters 2 and 3
Grade breakdown:
Reading responses, free points, 5%
HW, posted weekly, assigned on Wednesday and due the following Wednesday at 11:59, 25%
Midterms: 20%
Final: 30%
All exams are in class.
Historical Background
Fourier and the crisis of the 19th century
Fourier’s ideas animate the course
1798: joined Napoleon’s expedition to Cairo
Mathematics and imperialism
Becomes very interested in heat
How does heat diffuse across an object?
The total heat distribution should solve ∂²u/∂x² = ∂u/∂t, with u(0,x) = f(x)
Fourier found a general method for solving this – partial differential equations
His method requires assuming a special form: you can write your initial function as a sum of cosines. Then the distribution u can be found easily.
However, this assumes that f(±1) = 0 and that the function is continuous
You can’t solve f(x)=1, for instance: you can’t solve for heat when the temperature applied to a bar is constant.
What’s the solution? Bold idea: larger sums of cosines can predictably better approximate functions which cannot themselves be written as a finite sum.
For instance: write 1 = (4/π)[cos(πx/2) − (1/3)cos(3πx/2) + (1/5)cos(5πx/2) − ...], for x∈(−1,1)
You cannot do this with a Taylor expansion: you can’t write constant functions in terms of nonconstant functions.
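A quick numerical sketch (my addition, not from lecture): partial sums of the series above, assuming the reconstructed form (4/π)∑(−1)^{n+1}cos((2n−1)πx/2)/(2n−1), really do creep toward the constant 1 on (−1,1).

```python
import math

# Partial sums of Fourier's series for the constant function 1 on (-1, 1):
# 1 = (4/pi) * sum_{n>=1} (-1)^(n+1) cos((2n-1) pi x / 2) / (2n-1)
def partial_sum(x, N):
    total = 0.0
    for n in range(1, N + 1):
        total += (-1) ** (n + 1) * math.cos((2 * n - 1) * math.pi * x / 2) / (2 * n - 1)
    return 4 / math.pi * total

for N in (5, 50, 500):
    print(N, partial_sum(0.3, N))  # tends to 1 as N grows
```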
You immediately begin to run into all sorts of funny paradoxes
Notice that cos((2n−1)π(x+2)/2) = cos((2n−1)πx/2 + (2n−1)π): this amounts to a phase shift of π in each instance
Therefore you could write −1 simply by shifting, so the formula is the same but the interval is now (1,3)
All you did was add continuous things, but you got something totally discontinuous…
Response to Fourier: this is not a function. If you could call that a function and consider it a representation of such a familiar function, maybe Weierstrass was onto something when he claimed W(x) = cos(πx) + (1/2)cos(13πx) + (1/4)cos(169πx) + ... = ∑_{j=0}^∞ 2^{−j} cos(13^j πx).
This series converges and is a function
It is continuous in x
It is differentiable nowhere – why can’t I just take the derivatives of each term? – you can’t
“A monstrosity”
What does this do to our conception of continuity?
Several questions follow:
What is a function in the first place?
When can a function be represented as a Fourier series, ideally one that converges?
When can a series be differentiated or integrated, term-by-term?
A function from a set A to a set B is an assignment of every element in A to exactly one element in B, denoted by f: A→B
A: domain of f
B: codomain of f
f(A)={y∈B∣∃x∈A,f(x)=y}⊂B, the image/range of f
f is injective / 1-to-1 if a₁ ≠ a₂ ⟹ f(a₁) ≠ f(a₂)
f is surjective / onto if ∀b∈B, ∃a∈A: f(a) = b
f is bijective if both injective and surjective. Bijectivity gives you a one-to-one pairing of A with B — and, unlike counting, pairing works for infinite sets too.
If f:A→B and g:B→C, we can write their composition as g∘f:A→C, where (g∘f)(x)=g(f(x)). Requirement: the codomain of f must be a subset of the domain of g.
If f: A→B and g: B→A, f is invertible if there exists g: B→A such that g∘f: A→A satisfies ∀x∈A, (g∘f)(x) = x, and f∘g: B→B satisfies ∀y∈B, (f∘g)(y) = y. Basically: each composition sends every element back to itself.
Bijectivity implies invertibility.
Definition: g=f−1
Euclidean spaces and vectors
The reals R
We can also consider tuples of reals: Rn for n∈N, the set of ordered tuples of reals. Rn={(x1,x2,...,xn)∣∀j,xj∈R}
Vectors are also used to describe n-tuples
Vector arrows are meant to capture that an n-tuple can also describe a direction and magnitude
The magnitude / norm of a vector / n-tuple is ∥x∥ = √(x₁² + x₂² + ... + xₙ²)
Note that the norm is defined in terms of a dot product: ∥x∥ = √(x⋅x)
The Cauchy-Schwarz Inequality
If a,b∈Rn, ∣a⋅b∣≤∥a∥∥b∥
Going to be your best friend in this class
This proof, which seems to work only in Euclidean space, is very much broadly applicable.
Proof:
If b=0, then both sides are zero. The inequality holds.
Now, assume b ≠ 0. Consider the function f: R→R≥0 where f(t) = ∥a−tb∥² = (a−tb)⋅(a−tb) = ∥a∥² − 2t(a⋅b) + t²∥b∥²
This quadratic function has a minimum at t = (a⋅b)/∥b∥²
Plugging this into f gives you ∥a∥² − (a⋅b)²/∥b∥²
Since f(t) ≥ 0, we get ∥a∥² − (a⋅b)²/∥b∥² ≥ 0, so ∥a∥²∥b∥² ≥ (a⋅b)²
Take square roots to get our inequality.
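A numerical spot-check (my addition): random vectors never violate the inequality, and equality shows up exactly when one vector is a multiple of the other.

```python
import random

# Spot-check Cauchy-Schwarz, |a.b| <= ||a|| ||b||, on random vectors in R^4.
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return dot(a, a) ** 0.5

random.seed(0)
for _ in range(1000):
    a = [random.uniform(-1, 1) for _ in range(4)]
    b = [random.uniform(-1, 1) for _ in range(4)]
    assert abs(dot(a, b)) <= norm(a) * norm(b) + 1e-12
    c = [2.5 * x for x in a]  # parallel vectors: the equality case
    assert abs(abs(dot(a, c)) - norm(a) * norm(c)) < 1e-9
print("Cauchy-Schwarz held on all samples")
```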
Lecture 2: Euclidean Spaces, Open Sets, Limits and Continuity, Sequences
Triangle inequality: ∀a,b∈Rn, ∥a+b∥≤∥a∥+∥b∥
Norm of sum is less than or equal to the sum of the norms
Proof
Since ∥a+b∥² = (a+b)⋅(a+b) = ∥a∥² + 2(a⋅b) + ∥b∥²
Applying Cauchy-Schwarz to the middle term: ≤ ∥a∥² + 2∥a∥∥b∥ + ∥b∥²
This simplifies to (∥a∥+∥b∥)²; take square roots.
Why is it called the triangle inequality?
Define distance between two vectors using a norm: it’s faster to go along the direct line (‘hypotenuse’) than the legs of the triangle
Formally: d(x,y) = ∥x−y∥ = √((x₁−y₁)² + (x₂−y₂)² + ... + (xₙ−yₙ)²)
We use x⋅y or ⟨x,y⟩ to denote the dot product
The angle between two vectors is measured in the plane they span. By the Cauchy-Schwarz theorem, (x⋅y)/(∥x∥∥y∥) is between −1 and 1, so it is the cosine of some angle. Therefore, cos θ_{x,y} = (x⋅y)/(∥x∥∥y∥)
x, y are perpendicular / orthogonal iff x⋅y = 0.
As soon as you have a notion of inner product, you can define the angle between vectors.
Useful inequality: if you have a vector x∈Rⁿ, x = (x₁,...,xₙ), then max{|x₁|,...,|xₙ|} ≤ ∥x∥ ≤ √n · max{|x₁|,...,|xₙ|}.
∥x∥∞=max{∣x1∣,...,∣xn∣}, called the L∞ norm.
This is to be contrasted with the usual L-2 norm, ∥x∥2, which is the square root of the sum of the squares of the components.
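A small check of the sandwich ∥x∥∞ ≤ ∥x∥₂ ≤ √n∥x∥∞ (my addition):

```python
import math
import random

# Verify max|x_j| <= ||x||_2 <= sqrt(n) * max|x_j| on random vectors.
random.seed(1)
n = 6
for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(n)]
    sup = max(abs(t) for t in x)
    l2 = math.sqrt(sum(t * t for t in x))
    assert sup <= l2 <= math.sqrt(n) * sup
print("L-infinity / L2 sandwich verified")
```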
Norms are generalizable across vector spaces.
Subsets of Rn - introduction to topology
Topology tries to abstract the notion of “nearness” or “closeness” in a space, without specific notions of angle, length, etc.
What do you need to describe a space? You need to say when points are near, and you don’t need to do so quantitatively.
Definition: Let y be a point in Rn and r>0. Define an open ball Br(y) to be the collection of all points in Rn such that the distance from every point to the center is less than r.
Formally: Br(y)={x∈Rn∣∥x−y∥<r}
A set S⊂Rn is bounded if it is contained in some ball centered at the origin 0. There exists a sufficiently large radius for an enclosing ball.
Definition: a point x∈Rⁿ is an interior point of S if there exists a ball centered at x which is contained in S.
Definition: the set of points in S which are interior points of S is called the interior of S, denoted Sint.
Sint={x∈S∣∃r>0,Br(x)⊂S}
Examples of interiors:
If S=Br(y), then Sint=S
If S = {1/n ∈ R ∣ n∈N} (a subset of a line), then S^int = ∅, since there is no ball around any point which is contained in S. Also note that 0∉S.
Definition: a point x∈Rn is a boundary point of S if every ball centered at x contains a point in S and a point not in S.
Definition: the set of boundary points of S is called the boundary of S, denoted ∂S.
Example 1: if S=Br(y), ∂S={x∈Rn∣∥x−y∥=r}, the sphere of radius r centered at y.
Example 2: if S = {1/n ∈ R ∣ n∈N}, the boundary is S ∪ {0}. Every ball around 0 contains points in S and points not in S; and each point 1/n ∈ S is itself a boundary point, since every ball around it also contains points not in S.
Definition: the closure of a set S is S̄ = S ∪ ∂S: the set of all points of S together with all boundary points of S.
Example 1: if S=Br(y), then Sˉ={x∈Rn∣∥x−y∥≤r}, the closed ball of radius r centered at y.
Example 2: if S = {1/n ∈ R ∣ n∈N}, then S̄ = S ∪ {0}.
Definition: S is open if it contains no boundary points. S is closed if it contains all of its boundary points.
Example 1: Br(y) is open
Example 2: {1/n ∈ R ∣ n∈N} is neither open nor closed (no point is interior, and the boundary point 0 is missing)
Example 3: {1/n ∈ R ∣ n∈N} ∪ {0} is closed
Proposition: If S⊂Rn,
S is open iff it is equal to its interior
S is closed iff its complement is open
Proof:
An element of S is either an interior point or a boundary point.
S is open iff every point is interior.
S is closed ⟺ S = S̄ = S ∪ ∂S ⟺ ∂S ⊂ S. Since ∂(S^C) = ∂S (boundaries are shared between a set and its complement), this holds ⟺ S^C contains none of its boundary points ⟺ S^C is open
Example:
If S=∅, this is a set which is both open and closed.
If S = Q, the set of rationals, this set is neither open nor closed: the interior is empty, S^int = ∅, so S is not open; yet the closure of S is R, so S is not closed.
Limits
“You can’t lift your pencil”
Definition: let f:Rn→R be a real-valued function.
lim_{x→a} f(x) = L if ∀ε>0, ∃δ>0: |f(x)−L| < ε whenever ∥x−a∥ < δ
“a whenever b” means b ⟹ a
We can replace ∥x−a∥<δ with the L∞ norm:
∥x−a∥∞ = max{|x₁−a₁|, ..., |xₙ−aₙ|} < δ/√n (which forces ∥x−a∥ < δ, since ∥x−a∥ ≤ √n ∥x−a∥∞)
You can also consider a limit on a subset, f:S⊂Rn→R.
lim_{x→a, x∈S} f(x) = L if ∀ε>0, ∃δ>0: |f(x)−L| < ε whenever ∥x−a∥ < δ and x∈S
Continuity
Definition: f(x) is continuous at a if the limit as x approaches a of f(x), and f(a), both exist and are equal.
The definitions also make sense for vector-valued functions: you can go from Rⁿ to Rᵐ. All you need is a corresponding notion of norm, which you have in any Euclidean space / dimension. The only change is to replace |f(x)−L| < ε with ∥f(x)−L∥ < ε, using the norm of the value space Rᵐ.
Alternatively, you can take the limit of each individual component, considering the component functions fⱼ: Rⁿ→R for j = 1, ..., m.
The notion of “getting close” is very subtle in high dimensions – it’s harder than just taking left and right limits.
Example 1: f: R²→R, f(x,y) = xy/(x²+y²). This function is bounded everywhere: |f(x,y)| ≤ 1/2, a consequence of 2|xy| ≤ x²+y². But the function fails to be continuous at the origin: the limit as (x,y)→(0,0) does not exist. Along each straight line of approach y = mx, f is constant, but the constant value m/(1+m²) changes with the line.
Theorem. f₁(x,y) = x+y and f₂(x,y) = xy are continuous as functions R²→R, and g(x) = 1/x is continuous on R∖{0}. Proof (for f₁):
Let (a,b)∈R2
Given ε>0, we must show that there exists some δ such that if |x−a| < δ and |y−b| < δ, then |(x+y)−(a+b)| < ε.
Let us choose δ = ε/2.
|(x+y)−(a+b)| = |(x−a)+(y−b)| ≤ |x−a| + |y−b|, via the triangle inequality.
This is < δ + δ = ε/2 + ε/2 = ε.
Revisit proof!
Theorem. Say f: Rⁿ→Rᵐ and g: Rᵐ→Rᵏ. If f is continuous at a and g is continuous at f(a), then g∘f is continuous at a. Proof:
Let ϵ>0.
Since g is continuous at f(a), there exists δ1>0 such that ∥g(y)−g(f(a))∥<ϵ whenever ∥y−f(a)∥<δ1.
Since f is continuous at a, there exists δ2>0 such that ∥f(x)−f(a)∥<δ1 whenever ∥x−a∥<δ2.
Let δ=δ2.
Then ∥g(f(x))−g(f(a))∥<ϵ whenever ∥x−a∥<δ.
Later, with sequences: an=2an−1−an−2+2. Claim: an=n2. You can prove this with induction.
Lecture 3: Sequences, Induction, the Completeness Axiom
Sequences
A sequence is a collection of mathematical objects which are indexed by the natural numbers
We denote a sequence by {xk}k=1∞
A set has no notion of order, whereas a sequence does have order
e.g. the sequence 1, 4, 9, 16, … can be written as {n2}n=1∞
e.g. a sequence of intervals (−1,1),(−1/2,1/2),(−1/3,1/3),... can be written as {(−1/n,1/n)}n=1∞
You can also define sequences inductively: F₁ = 1, F₂ = 1, F_{n+2} = F_{n+1} + F_n
A nice proof technique: an inductive proof. Useful for proving statements of the form ∀n∈N,P(n) is true. First, show P(1). Then, show P(n)⟹P(n+1) (inductive step)
Example: consider a₁ = 1, a₂ = 4, aₙ = 2aₙ₋₁ − aₙ₋₂ + 2 for n≥3. Claim: aₙ = n².
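Before doing the induction on paper, a brute-force check of the claim (my addition):

```python
# Check a_n = n^2 for a_1 = 1, a_2 = 4, a_n = 2a_{n-1} - a_{n-2} + 2 (n >= 3).
a = [None, 1, 4]  # index 0 unused, so a[n] matches the math indexing
for n in range(3, 51):
    a.append(2 * a[n - 1] - a[n - 2] + 2)
assert all(a[n] == n * n for n in range(1, 51))
print("a_n = n^2 holds for n = 1..50")
```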
A sequence of real numbers / vectors {xₙ}_{n=1}^∞ ⊆ Rⁿ converges to a∈Rⁿ if ∀ε>0, ∃N>0 such that ∥xₙ−a∥ < ε, ∀n>N. If this is so, then xₙ→a. If {xₙ}_{n=1}^∞ does not converge to any vector, the sequence diverges.
We can describe limits / continuity in the form of sequences
e.g. the sequence {1/k}k=1∞ converges to 0.
e.g. the sequence {(−1)k}k=1∞ is divergent.
The following are equivalent.
f: Rⁿ→Rᵏ is continuous.
∀x0∈Rn,∀{xn}n=1∞ converging to x0, {f(xn)}n=1∞ converges to f(x0).
∀U⊆Rk an open set, the preimage f−1(U)={x∈Rn∣f(x)∈U}⊆Rn is also open.
Proof that 1 implies 2:
Let x0∈Rn be arbitrary, and assume f is continuous at x0.
Let {xn}n=1∞ be a sequence converging to x0.
By continuity, ∀ε>0, ∃δ>0 such that ∥f(x)−f(x₀)∥ < ε whenever ∥x−x₀∥ < δ.
Since xₙ→x₀, ∃N>0 such that ∥xₙ−x₀∥ < δ, ∀n>N. Then ∥f(xₙ)−f(x₀)∥ < ε for all n>N, so f(xₙ)→f(x₀).
Proof that 2 implies 1 (prove the logically equivalent contrapositive: “not 1 implies not 2”). If ∃x₀∈Rⁿ such that f is not continuous at x₀, we’ll prove ∃xₙ→x₀ such that f(xₙ) does not converge to f(x₀).
Since f is not continuous at x0, ∃ϵ0>0,∀δ>0,∃xδ∈Rn where ∥xδ−x0∥<δ and ∥f(xδ)−f(x0)∥>ϵ0
For each of δ=1,1/2,1/3,...,1/k,... choose such a vector xk as above.
Although the sequence xk→x0, f(xk) does not converge to f(x0).
Proof that 1 implies 3:
Say U⊆Rᵏ is open. We will show that every point in f⁻¹(U) is contained in a ball contained in the preimage.
If a∈f⁻¹(U), then f(a)∈U; since U is open, ∃r>0 such that B_r(f(a))⊆U; i.e., ∀y∈Rᵏ, if ∥y−f(a)∥ < r then y∈U.
By continuity of f, for r>0, ∃δ>0 such that ∥f(x)−f(a)∥<r whenever ∥x−a∥<δ; i.e., Bδ(a)⊆f−1(U), so f−1(U) is open.
Preimage definition: f−1(U)={x∈Rn∣f(x)∈U}
Proof that 3 implies 1.
Let a∈Rn be arbitrary. ∀ϵ>0,Bϵ(f(a))⊆Rk is open.
By 3, f−1(Bϵ(f(a))) is open. By openness, ∃δ>0 such that Bδ(a)⊆f−1(Bϵ(f(a))).
So, if ∥x−a∥ < δ, then ∥f(x)−f(a)∥ < ε, so f is continuous at a.
Corollary: If f: Rⁿ→Rᵏ is continuous and V⊂Rᵏ is closed, then f⁻¹(V) (the preimage) is also closed.
Example: if C>0, the sequence {Cn/n!}n=1∞ converges to 0.
Choose N>2C, then ∀n>N we have 0<Cn/n!=CN/N!⋅C/(N+1)⋅C/(N+2)⋅...⋅C/n
We can upper-bound the remaining factors by 1/2 each (each C/(N+j) < 1/2 since N > 2C): Cⁿ/n! ≤ C^N/N! ⋅ (1/2)⋅(1/2)⋅...⋅(1/2)
These halving factors are n−N in number, so the bound goes to zero as n→∞. Therefore the sequence converges to zero.
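Numerically (my addition), the terms grow for a while and then collapse, just as the proof predicts once n passes 2C:

```python
# Watch C^n / n! rise and then collapse to zero (here C = 10, so 2C = 20).
C = 10.0
term = 1.0
for n in range(1, 41):
    term *= C / n  # term now equals C^n / n!
    if n % 10 == 0:
        print(n, term)
```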
Theorem. A⊆Rⁿ is a subset. Then x∈Ā ⟺ A intersects every neighborhood of x. (A point is in the closure of a set iff the set intersects every neighborhood of the point.) Recall that U is a neighborhood of x if x is interior to U. We will prove the contrapositive: a point is not in the closure of a set iff the point has a neighborhood not intersecting A.
Say x∉Ā; then U = Rⁿ∖Ā is a neighborhood of x which does not intersect A.
Say ∃U, a neighborhood of x not meeting A. We may assume U is open (shrink it to an open ball around x). Then Rⁿ∖U is a closed set containing A, implying Ā ⊆ Rⁿ∖U. Since x∉Rⁿ∖U, x∉Ā.
Theorem. If S⊆Rn, and x0∈Rn, x0∈Sˉ⟺∃{xn}n=1∞⊆S where xn→x0.
“It’s not possible to take a sequence of points and escape a closed ball”
Leftwards proof. If xn→x0 and ∀n,xn∈S, then every neighborhood of x0 contains an element of S, namely xn for n sufficiently large.
Since xn→x0, ∀ϵ>0, ∃N>0 such that ∥xn−x0∥<ϵ, ∀n>N; any neighborhood of x0 contains such a ball and xn∈Bϵ(x0)⊆U, a neighborhood of x0.
So every neighborhood U of x₀ meets S (it contains some xₙ), and thus x₀∈S̄.
Rightwards proof. Say x₀∈S̄. If x₀∈S, then we can take the constant sequence {x₀}_{n=1}^∞, which converges to x₀.
If x0∈Sˉ but x0∈/S, then B1/k(x0) is a neighborhood of x0, ∀k∈N.
By the previous theorem, ∀k, ∃xₖ∈B_{1/k}(x₀)∩S. Choosing one for each ball produces a sequence, which clearly converges to x₀.
The rightward implication: a sequence in your set can take you to the closure, but no further.
Important property of the reals. Completeness: every sequence of real numbers whose terms get arbitrarily close to each other converges to a real number. When we talk about a limit existing, we usually point to the limit itself; completeness lets us detect convergence from the terms crowding together, even without knowing the limit.
If S⊆R is a subset, an upper bound b for S is a real number b∈R such that ∀x∈S, x≤b. Similarly, a lower bound is a y∈R such that ∀x∈S, y≤x. Bounds for a set are not unique. The Completeness Axiom — one of the defining properties of the reals (proven in Kemp’s notes, Theorem 4.13). If S⊆R is non-empty:
If S has an upper bound, it has a least upper bound
If S has a lower bound, it has a greatest lower bound. Although this is true for the reals, it is false for the rationals. For example, S = {x∈R ∣ x∈Q and x² < 2} has no least upper bound in the rationals (its least upper bound, √2, is real but irrational).
Lecture 4: Sequences, Extrema
Monotone Sequence Theorem: every bounded monotone sequence is convergent.
Proof (for an increasing sequence {xₙ} bounded above; let l be its least upper bound): since l is a least upper bound, ∀ε>0, ∃N>0 s.t. l−ε < xₙ, ∀n>N, as otherwise l−ε would be a smaller upper bound than l. Thus ∀ε>0, ∃N>0: l−ε ≤ xₙ ≤ l, ∀n>N. Therefore, xₙ→l.
Nested Interval Theorem: if I₁⊃I₂⊃I₃⊃... are nested closed intervals Iₖ = [aₖ,bₖ] with bₖ−aₖ→0, then their intersection contains exactly one point. Proof: since I₁⊃I₂⊃I₃⊃..., we get a₁≤a₂≤a₃≤... and b₁≥b₂≥b₃≥.... These are monotone sequences.
These are bounded sequences since a1≤ak≤bk≤b1, ∀k.
We can apply the monotone sequence theorem to show that both sequences are convergent.
Moreover, since bₖ−aₖ→0, they converge to the same element. Call this element x₀. Since aₖ≤x₀≤bₖ ∀k, x₀∈[aₖ,bₖ] ⟹ x₀∈∩_{k=1}^∞ Iₖ.
We have to show moreover that this point is unique. Assume there are two distinct elements x₀, x′ in the intersection of all intervals; then ε = |x₀−x′| > 0. But since bₖ−aₖ→0, ∃N s.t. |b_N−a_N| < ε, and x₀, x′∈[a_N,b_N] forces |x₀−x′| ≤ b_N−a_N < ε. This is a contradiction.
{xn}={(−1)n} is bounded but not monotone.
If {xₖ}_{k=1}^∞ is a sequence, a subsequence {x_{k_j}}_{j=1}^∞ is defined by a strictly increasing function j ↦ kⱼ.
That the injection is increasing means that the ordering is preserved.
Example: if kj=2j, {(−1)k}k=1∞ has a subsequence {(−1)2j}j=1∞={1,1,1,...}.
Theorem (Bolzano-Weierstrass). Every bounded sequence in R has a convergent subsequence. (It doesn’t need to be monotone.) Proof by divide and conquer. Have something like an inductive argument, and then we are forced into one of two choices.
Let {xk}k=1∞ be bounded, say xk∈[a,b],∀k∈N.
First, bisect the interval [a,b] into [a,(a+b)/2] and [(a+b)/2,b].
Let I₁ = [a,b]. One of the two subintervals must contain infinitely many terms of the sequence.
If both do, just choose the left one. Call this interval I2.
Bisect I2 into [a2,(a2+b2)/2] and [(a2+b2)/2,b2]. Choose a subinterval as above, call it I3.
We continue to narrow in on whichever interval still has an infinite number of terms of the sequence.
Proceeding in this way, we produce a sequence of nested intervals I1⊃I2⊃I3⊃....
We know, moreover, that their lengths are decreasing to zero: |Iₖ| = (b−a)/2^{k−1}.
To construct our subsequence: choose k₁∈N s.t. x_{k₁}∈I₁, then k₂>k₁ s.t. x_{k₂}∈I₂, and so on. This guarantees that the subsequence terms lie in the nested intervals and the index order is preserved.
By the Nested Interval Theorem, there is some x0∈∩k=1∞Ik. {xkj}j=1∞ converges to x0.
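A numerical sketch of the bisection argument (my addition; the “keep the fuller half” rule is a finite stand-in for “keep a half with infinitely many terms”):

```python
# Bolzano-Weierstrass by bisection on the bounded, non-monotone sequence
# x_k = (-1)^k + 1/k: halve the interval, keep the half holding more terms,
# and pick one later term per interval to build the subsequence.
terms = [(-1) ** k + 1 / k for k in range(1, 100001)]

a, b = -2.0, 2.0
picked, last = [], -1
for _ in range(12):
    mid = (a + b) / 2
    left = sum(a <= t <= mid for t in terms)
    right = sum(mid <= t <= b for t in terms)
    if left >= right:
        b = mid
    else:
        a = mid
    for k in range(last + 1, len(terms)):  # later index: order preserved
        if a <= terms[k] <= b:
            picked.append(terms[k])
            last = k
            break

print(picked)  # the picked subsequence closes in on a single limit
```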
Corollary. Every bounded sequence in Rⁿ has a convergent subsequence. By BW, each component sequence {xₖ^{(j)}}, j = 1,...,n, has a convergent subsequence — but the subsequences may differ for different components, so you can’t immediately say the vectors converge along a single subsequence. We proceed inductively.
Choose a subsequence of vectors, say {xkl}l=1∞, where the first component converges.
Choose a subsubsequence which makes the second components converge. This will preserve the property that the first component converges, too.
As long as n is finite, you can keep on taking subs ;)
Definition. A sequence is Cauchy if ∀ε>0, ∃N>0 s.t. ∥xₙ−xₘ∥ < ε whenever n,m>N. You can talk about a sequence being Cauchy without knowing what it converges to. Its terms pile up: there is an index such that all the terms after that index are very close to each other. This is stronger even than saying consecutive differences go to zero; it says the terms in general, after a certain point, are all close to each other.
Todd Kemp: the real numbers are equivalence classes of Cauchy sequences of rationals.
Theorem. A sequence in Rn is a convergent sequence iff it is Cauchy. You don’t need to know the limit to know that the sequence is Cauchy. So often it is easier to show that a sequence of things is Cauchy rather than finding the convergent value. Proof:
If {xₙ}_{n=1}^∞ converges to x₀∈Rⁿ, the sequence is Cauchy: write xⱼ−xₖ = (xⱼ−x₀)+(x₀−xₖ) and appeal to the triangle inequality, ∥xₖ−xⱼ∥ ≤ ∥xⱼ−x₀∥ + ∥x₀−xₖ∥. For every ε, choose N from convergence so that both terms on the right are < ε/2 whenever j,k > N. Then ∥xₖ−xⱼ∥ < ε.
Suppose {xₖ}_{k=1}^∞ is Cauchy. Choosing ε=1 in the definition, there exists some N such that ∥xⱼ−xₖ∥ < 1 for all j,k>N. In particular, ∥xₖ∥ < ∥x_{N+1}∥ + 1, ∀k>N, so the sequence is bounded and thus has a convergent subsequence (BW). But the Cauchy property shows the limit of the subsequence is the limit of the whole sequence.
Claim: the sequence converges to x0, not just the subsequence.
Given ϵ>0,∃J>0 s.t. ∥xkj−x0∥<ϵ/2 if j>J (by convergence), further ∃N>0 s.t. ∥xk−xm∥<ϵ/2 if k,m>N (Cauchy).
Thus ∥xₖ−x₀∥ ≤ ∥xₖ−x_{k_j}∥ + ∥x_{k_j}−x₀∥ < ε/2 + ε/2 = ε for k>N, taking j>J large enough that kⱼ>N.
Compactness
Definition. A set S⊆Rⁿ is compact if it is closed and bounded (equivalently: every sequence in S has a convergent subsequence whose limit is in S).
Remark: finite sets are compact. Compact sets can be infinite, but behave like finite sets.
Theorem (Extreme Value Theorem). If f:[a,b]→R is continuous, it achieves its max and min.
Notes
Midterm 1 is going to cover chapter 1 of the book.
Lecture 5: Compactness and Connectedness
COMPACTNESS
Definition. S⊆Rⁿ is compact if it is closed and bounded.
Theorem. The following are equivalent:
S⊆Rn is compact
Every sequence of points in S has a convergent subsequence whose limit is also in S
Proof:
If S is closed and bounded, then Bolzano-Weierstrass guarantees that you have a convergent subsequence. Closedness implies the limit lies in S.
Suppose S is not compact. If S is not bounded, there is a sequence in S whose norms go to infinity, which admits no convergent subsequence. If S is not closed, there exists x₀∈S̄∖S, and a sequence {xₖ}⊆S with xₖ∈B_{1/k}(x₀), ∀k∈N, so xₖ→x₀; every subsequence then also converges to x₀∉S.
Remark. Every finite set is compact.
Compactness is an expansion of useful properties of finite sets to infinite sets.
Theorem. Continuous functions map compact sets to compact sets. If f:S→Rm is continuous and S is compact, then the image of S under f is compact.
Proof:
Suppose {yₖ}⊆f(S) is a sequence of points in the image; then ∀k∈N, ∃xₖ∈S: f(xₖ)=yₖ.
Since S is compact, there exists a subsequence xkj which is converging in S: xkj→x0∈S.
Since f is continuous at x₀, f(x_{k_j})→f(x₀)
So {ykj}={f(xkj)}→f(x0)∈f(S).
Extreme Value Theorem. If f: S⊆Rⁿ→R is continuous and S is compact, then f has a max and min value attained in S, i.e. ∃a,b∈S s.t. f(a)≤f(x)≤f(b), ∀x∈S. We know suprema and infima exist in the reals, but not, in general, that max and min values are attained. This theorem says they are attained when the domain is compact.
Definition (maybe the most general notion of compactness). Let S⊆Rⁿ. A collection {U_α}_{α∈A} of open subsets of Rⁿ is a cover of S if S ⊆ ∪_{α∈A} U_α.
Heine-Borel Theorem. If S⊆Rⁿ, then S is compact iff every open cover of S has a finite subcover.
Example. S=[0,1] is compact. For any ε>0, the collection {B_ε(x)}_{x∈[0,1]} is an open cover of S, and finitely many of these balls already cover [0,1].
Definition. (Metric Spaces.) A metric space is a set X with a function d: X×X→R satisfying the following properties for all x,y,z∈X:
Symmetry: d(x,y)=d(y,x)
Positivity: d(x,y)≥0
Definiteness: d(x,y)=0⟺x=y
Triangle Inequality: d(x,z)≤d(x,y)+d(y,z)
Examples of metric spaces.
Real space with Euclidean distance: Rn with d(x,y)=∥x−y∥2
Real space with L-infinity distance: Rn with d(x,y)=∥x−y∥∞
Function space with sup-norm: C0([0,1])={f:[0,1]→R∣f is continuous.} with d(f,g)=∥f−g∥∞:=supx∈[0,1]∣f(x)−g(x)∣
Definition. If (X,d) is a metric space, an ε-ball is B_ε(x) = {y∈X ∣ d(x,y) < ε}.
Definition. A set S⊆(X,d) is open if ∀x∈S,∃ϵ>0 s.t. Bϵ(x)⊆S. Closed sets are complements of an open set.
What is compactness now? A sequential definition of compactness: every sequence has a convergent subsequence whose limit is in S. Consider the sequence {fₙ(x)}_{n=1}^∞ in C⁰([0,1]), fₙ(x) = xⁿ. The sequence is bounded: ∥fₙ∥∞ = 1 for every n. But its pointwise “limit” is discontinuous: y = 0 on [0,1) and 1 at x = 1. This sequence has no subsequence converging in C⁰([0,1]); the candidate limit lies outside our space. So compactness needs to be handled more carefully in infinite-dimensional spaces.
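A numerical sketch of that failure (my addition): sup-norm distances between far-apart terms of {xⁿ} never shrink, so no subsequence is Cauchy in C⁰([0,1]). The sup over [0,1] is approximated on a grid.

```python
# f_n(x) = x^n in C^0([0,1]) with the sup-norm: d(f_n, f_2n) stays near 1/4
# (the max of t - t^2 over t in [0,1]), so the sequence has no Cauchy subsequence.
grid = [i / 10000 for i in range(10001)]

def sup_dist(n, m):
    return max(abs(x ** n - x ** m) for x in grid)

for n in (1, 10, 100, 1000):
    print(n, round(sup_dist(n, 2 * n), 3))
```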
CONNECTEDNESS
Definition. A separation of a set S⊆Rn is a pair of disjoint nonempty sets A,B⊆Rn such that S=A∪B and Aˉ∩B=A∩Bˉ=∅. A set is connected if no separation exists.
Example:
The set S = {(x,y) ∣ (x−1)²+y² < 1} ∪ {(x,y) ∣ (x+1)²+y² < 1} is disconnected: the two open disks form a separation.
The origin, the only point where the closures could meet, belongs to neither set.
Theorem. The connected subsets of R are the intervals [open, closed, half-open, unbounded].
Proof:
If S is not an interval, it is disconnected. Not being an interval means there exist a,b∈S and c∈R s.t. a<c<b but c∉S. Set A = S∩(−∞,c), B = S∩(c,∞). The only point Ā and B (or A and B̄) could share is c, but c∉S, so this is a separation.
If S=[a,b], say S has some separation A∪B=S. WLOG, we can assume a∈A, b∈B. Set c = sup A, so c∈Ā. Because this is a separation, c∉B, and therefore c∈A. Since c∈A and c∉B, in particular c≠b. But then (c,b]⊆B, so c∈B̄ and thus cannot be in A. This is a contradiction, so [a,b] must be connected. A similar argument handles an unbounded interval S with a separation A∪B=S.
Intermediate Value Theorem. The continuous image of a connected set is connected: if f: S⊆Rⁿ→Rᵐ is continuous and S is connected, then f(S) is connected.
Proof by contrapositive: if the image is disconnected, then the original set is disconnected.
If A∪B=f(S) is a separation of the image, then f−1(A)∪f−1(B)=S is a separation of the original set.
Both of these sets are nonempty because A and B are nonempty subsets of the image.
Take x₀∈f⁻¹(A) and suppose, for contradiction, that x₀ lies in the closure of f⁻¹(B).
Use the sequential characterization of the closure: there exists a sequence of points xₖ∈f⁻¹(B) which converges, xₖ→x₀.
Since f is continuous, f(xₖ)→f(x₀), and each f(xₖ)∈B, so f(x₀)∈B̄.
But then f(x₀)∈A∩B̄, contradicting that A∪B=f(S) is a separation.
Corollary: If f: S⊆Rⁿ→R is continuous and S is connected, then f(S) is an interval. Thus ∀a,b∈S with f(a)<t<f(b), there must exist c∈S such that f(c)=t
Example: The function f(x) = x³⁶⁷ + 357x²⁵⁰ + 42e^{π²}x⁸³ + x² + 17 has a root in R. How do we know it exists? Because of the intermediate value theorem: the odd leading power makes the function negative for very negative x and positive for large x, which means that at some point it is zero.
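The IVT argument is effective: bisection hunts the root down. A sketch (my addition; the coefficients are reconstructed from garbled notes, so treat them as illustrative):

```python
import math

# Find a root of the odd-degree polynomial by bisection, as licensed by
# the intermediate value theorem: f(-3) < 0 < f(0).
def f(x):
    return x ** 367 + 357 * x ** 250 + 42 * math.exp(math.pi ** 2) * x ** 83 + x ** 2 + 17

a, b = -3.0, 0.0
assert f(a) < 0 < f(b)
for _ in range(60):
    mid = (a + b) / 2
    if f(mid) < 0:
        a = mid
    else:
        b = mid
print("root near", (a + b) / 2)
```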
Definition. A set S is path-connected if ∀x,y∈S, ∃ continuous g: [0,1]→S with g(0)=x, g(1)=y. Such a g defines a path in the set S.
Theorem. If S is path-connected, then S is connected. To prove, show the contrapositive and proof by contradiction as needed.
Topologist’s Sine Curve. It’s possible to be connected but not path-connected. Let S = {(0,y) ∣ y∈[−1,1]} ∪ {(x, sin(π/x)) ∣ 0<x<2}. The two pieces do not form a separation, because the closure of the curve piece meets the vertical segment. But no continuous path in S joins a point of the segment to a point of the curve, because it would need to pass through the wildly oscillating part of the curve.
Notes
Every function is surjective onto its own image
Lecture 6: Uniform Continuity
When we define f: S⊂Rⁿ→Rᵐ continuous on S, this means that f is continuous at every point of S, in the epsilon-delta sense.
However, note that the delta we choose is dependent on x (i.e. we need to adapt the delta based on the given conditions, i.e. shape of f around x and tightness of ϵ)
Uniform continuity. A function f:S⊆Rn→Rm is uniformly continuous on S if the following definition is satisfied.
∀ϵ>0,∃δ>0:∀x,y∈S,∥x−y∥<δ⟹∥f(x)−f(y)∥<ϵ
Example: f(x)=sin(x) is uniformly continuous on R.
Generally speaking, if f satisfies ∃M>0: ∀x,y∈S, ∥f(x)−f(y)∥ ≤ M∥x−y∥, then f is uniformly continuous on S, with δ = ε/M.
Lipschitz inequality with Lipschitz constant M. We say f is M-Lipschitz on S.
Consider f(x) = eˣ: R→R. This is not uniformly continuous on R. Assume it is, and let ε = 1. Then there exists some δ>0 such that ∀x,y∈R, |x−y| < δ ⟹ |eˣ−eʸ| < 1. Notice that e^{δ/2}−1 > 0 and lim_{x→∞} eˣ = ∞, thus ∃x₀∈R s.t. e^{x₀}(e^{δ/2}−1) > 1. But then |e^{x₀+δ/2}−e^{x₀}| > 1 while |(x₀+δ/2)−x₀| = δ/2 < δ, which is a contradiction.
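Numerically (my addition): with the gap in x fixed, the gap in eˣ blows up as x grows, so no single δ serves a given ε — the picture behind the proof.

```python
import math

# For f(x) = e^x, a fixed step delta/2 in x produces ever-larger jumps in f.
delta = 0.1
for x0 in (0, 5, 10, 20):
    print(x0, math.exp(x0 + delta / 2) - math.exp(x0))
```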
But, who gives a fuck about uniform continuity?
Suppose f:S⊂Rn→Rm is continuous, and {xn}n=1∞⊆S is Cauchy. Is {f(xn)}n=1∞ Cauchy?
Consider S=(0,1), f(x) = 1/x: (0,1)→R. Take {xₙ}_{n=1}^∞ = {1/n}_{n=1}^∞ ⊆ S; it is Cauchy (it converges to 0, outside the set), but {f(xₙ)}_{n=1}^∞ = {n}_{n=1}^∞ is not Cauchy.
Theorem. If f: S⊆Rⁿ→Rᵐ is uniformly continuous on S, then f sends Cauchy sequences to Cauchy sequences.
Theorem. (Heine, 1870). Suppose S⊆Rn is compact. Then if f:S⊆Rn→Rm is continuous, it is uniformly continuous.
(Proof.)
Assume for the sake of contradiction that f is not uniformly continuous.
This means that there exists an ε>0 such that for every δ>0, there exist x,y∈S such that ∥x−y∥ < δ and ∥f(x)−f(y)∥ > ε.
Suppose we have the sequence δ=1,1/2,1/3,...,1/n,....
For each element δ = 1/n of this sequence, ∃xₙ,yₙ∈S s.t. ∥xₙ−yₙ∥ < 1/n and ∥f(xₙ)−f(yₙ)∥ > ε.
By the BW theorem (S is compact), there exists a subsequence x_{n_j}→a∈S; and since ∥x_{n_j}−y_{n_j}∥→0, also y_{n_j}→a.
Therefore, by continuity at a, f(x_{n_j})−f(y_{n_j})→0.
This contradicts ∥f(xnj)−f(ynj)∥>ϵ.
Lecture 7: Differential Calculus
Some history
Many real analysis courses are taught “backwards”
Sharaf al-Din al-Tusi, 12th century Iranian mathematician, derivative of cubic equation
Bhaskara II, 12th century Indian mathematician, derivative of sine function, Taylor approximations
Merton School, 14th century, mean value theorem, etc.
An explosion of interest in calculus and derivatives in the 17th century, growing out of practical questions, trying to solve this type of question: You have a curve which is the graph of a function. You want to find the tangent line to that point.
Applications:
Angles of intersecting curves (Descartes)
Building telescopes and clocks (Galilei, Huygens)
Finding maxes and mins of functions (Newton, Fermat)
Astronomy (Kepler, Newton)
Derivatives
Leibniz’s argument: if x changes by δx, then y = x² changes to y+δy = (x+δx)² = x² + 2xδx + (δx)². Subtract y = x²: δy = 2xδx + (δx)². Relabel δx, δy as dx, dy; as they become ‘infinitely small’, (δx)² becomes negligible in comparison. So we get dy = 2x dx, or dy/dx = 2x.
But what does “infinitely small” mean? When can we just drop (δx)2?
Leibniz is considering some kind of limiting argument.
Consider this as a sort of linear approximation.
Idea: given f:R→R. Find a linear function l(x)=mx+b;m,b∈R approximating f(x) at x=a.
Basic criteria: f(a)=l(a); i.e. f(a)=ma+b; l(x)=m(x−a)+f(a)
The difference f(x)−l(x) goes to zero faster than x−a as x→a. In other words, (f(x)−l(x))/(x−a) → 0 as x→a.
We can rewrite the error of the linear approximation. Let h = x−a. Then f(x)−l(x) = f(a+h)−f(a)−mh =: E(h), the “error”.
Definition. Let f:I⊆R→R where a∈I. We say f is differentiable at a if there exists m∈R s.t. f(a+h)=f(a)+mh+E(h) where limh→0E(h)/h=0, where h=x−a. We call m the derivative of f at a. (Note that this is a first-order Taylor approximation.)
But we can rearrange: m = (f(a+h)−f(a)−E(h))/h, and since the error term vanishes in the limit, m = lim_{h→0} (f(a+h)−f(a))/h. If you think of it the first way, you’re measuring f as a linear approximation plus an error.
The condition lim_{h→0} E(h)/h = 0 is often denoted “E(h) is o(h)”, ‘little-oh’. Thus differentiability is described as “f(a+h) is linear + o(h)”.
Note that if lim_{h→0} E(h)/h = 0, then lim_{h→0} E(h) = 0. Then lim_{h→0} [f(a+h)−f(a)−mh] = lim_{h→0} E(h) = 0, so f(a+h)→f(a). This implies that f is continuous: differentiability at a point implies continuity at that point.
We will write the value of m as f′(a), the derivative of f at a. (f may be differentiable only at the single point a; there need not be a derivative function f′(x) defined nearby.)
Example. Proof of the product rule. If f,g are differentiable at a, the derivative of f(x)g(x) at a is f′(a)g(a)+f(a)g′(a). Since f,g are differentiable, we can write out their linear approximations: f(a+h) = f(a)+f′(a)h+E₁(h), g(a+h) = g(a)+g′(a)h+E₂(h). Multiplying, f(a+h)g(a+h) = f(a)g(a) + [f′(a)g(a)+f(a)g′(a)]h + E₃(h), with the really ugly term E₃(h) = (f(a)+f′(a)h)E₂(h) + (g(a)+g′(a)h+E₂(h))E₁(h) + f′(a)g′(a)h². Every piece of E₃ is o(h): divided by h, each still goes to zero as h→0.
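A finite-difference sanity check of the product rule (my addition), with f = sin and g = exp as arbitrary test functions:

```python
import math

# The symmetric difference quotient of f*g at a approaches f'(a)g(a) + f(a)g'(a).
f, fprime = math.sin, math.cos
g = math.exp  # g' = g for exp
a = 0.7
exact = fprime(a) * g(a) + f(a) * g(a)
for h in (1e-2, 1e-4, 1e-6):
    fd = (f(a + h) * g(a + h) - f(a - h) * g(a - h)) / (2 * h)
    print(h, fd, abs(fd - exact))
```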
Lecture 8: Derivatives in 1 Variable, Several Variables, the Mean Value Theorem
Bigger than 45: A; between 37 and 44: B; between 30 and 36: C; 29 or less: D
Average: 38, 81%
Median: 40, 84%
Definition. Let f: I⊆R→R, where I is an open interval and a∈I. f is differentiable at a if there exists m∈R s.t. f(a+h) = f(a)+mh+E(h), where lim_{h→0} E(h)/h = 0.
Note that (f(a+h)−f(a))/h → m as h→0.
Lemma. Say we have f: I⊂R→R with I open, and a∈I is a local maximum or minimum of the function. If f is differentiable at a, then f′(a)=0.
Proof.
Say a is a local min (the max case is similar): f(a+h) ≥ f(a) for all h sufficiently small.
Then (f(a+h)−f(a))/h has the same sign as h.
Look at the left and right limits as h→0±
Since f is differentiable, the limits must agree, so f′(a)=0.
Rolle’s Theorem. One of the earliest ideas about calculus. If f is continuous on a closed interval [a,b] and f is differentiable on (a,b) and if f(a)=f(b), then there exists c∈(a,b) s.t. f′(c)=0. “What comes up must come down.”
Proof.
From the extreme value theorem, because the closed interval is compact, any continuous function on it has a max and a min.
If both the max and the min occur at the endpoints, then since f(a)=f(b), max = min, so f is constant, and the derivative is 0 everywhere.
Otherwise, a min or max occurs inside the interval, and by the lemma the derivative vanishes there.
Non-constructive proof: doesn’t tell you where the min or max is. You can guarantee your tangent slope will vanish somewhere in the interval.
One-variable mean value theorem. Important consequence of Rolle’s theorem. Say f is continuous on [a,b] and differentiable on (a,b). Then ∃c∈(a,b) s.t. f′(c) = (f(b)−f(a))/(b−a). The line through the endpoints is the secant line; the slope of the tangent line at c is the same as the slope of the secant line.
Proof
(a,f(a)) and (b,f(b)) are connected by the line l(x) = f(a) + [(f(b)−f(a))/(b−a)](x−a).
We claim ∃c∈(a,b) where f′(c) equals the slope of this secant line.
Write down a new function which is just the difference of the other two: g(x) = f(x)−l(x). Then g′(c)=0 means f′(c)=l′(c), the secant slope.
Since g(a)=g(b)=0, we can apply Rolle’s theorem to g, which means that g′(c)=0 for some c∈(a,b).
Definition. We say that f is increasing or decreasing if ∀a<b, f(a)≤f(b) or f(a)≥f(b), respectively; strictly increasing or strictly decreasing if f(a)<f(b) or f(a)>f(b), respectively.
Theorem. Say f is differentiable on I, an open interval.
If ∣f′(x)∣≤C, ∀x∈I, then f is Lipschitz with constant C. That is, ∣f(x)−f(y)∣≤C∣x−y∣.
If f′(x)=0, ∀x∈I, then f is constant.
If f′(x)≥0, ∀x∈I, then f is increasing.
Proof.
For a,b∈I, by the MVT, we have c∈(a,b) s.t. f′(c)(b−a) = f(b)−f(a).
For 1: |f(b)−f(a)| = |f′(c)||b−a| ≤ C|b−a|, which is the Lipschitz inequality with constant C.
For 2: f′(c)=0 forces f(b)=f(a) for all a,b, so f is constant.
For 3: f′(c)≥0 gives f(b)−f(a)≥0, so f(a)≤f(b).
Corollary. For any a>0, we have
lim_{x→+∞} xᵃ/eˣ = lim_{x→+∞} (log x)/xᵃ = lim_{x→0⁺} (log x)/x^{−a} = 0
At infinity, exponentials grow faster than any power of x, and any power of x grows faster than log; near zero, log is dominated by any negative power of x.
L’Hopital’s rule – need to be continuous and differentiable in the neighborhood.
1-variable, vector-valued functions. Suppose you have a function f: R→Rⁿ: the input t is a scalar, but the output is a vector. The derivative at a is f′(a) = lim_{h→0} (f(a+h)−f(a))/h = (f₁′(a), f₂′(a), ..., fₙ′(a)); f is differentiable at a iff each fⱼ is differentiable at a, j = 1,...,n. If φ: R→R and f,g: R→Rⁿ, then (φf)′ = φ′f + φf′ and (⟨f,g⟩)′ = ⟨f′,g⟩ + ⟨f,g′⟩. It follows that (∥f∥²)′ = 2⟨f,f′⟩.
Think about these functions as parametrized curves in Rn: t→f(t), e.g. the path of a particle, in which case t→f′(t) is the path of velocity vectors of the particle.
The tangent line to f at t₀ is l(t) = f(t₀) + (t−t₀)f′(t₀).
Scalar-valued, multi-variable differentiability. f: D⊆Rⁿ→R. Start with a partial derivative: the derivative of the function w.r.t. one variable while fixing the others as constants. Suppose f(x) = f(x₁,...,xₙ). Then ∂f/∂xⱼ(x) = lim_{h→0} [f(x₁,...,xⱼ+h,...,xₙ) − f(x₁,...,xⱼ,...,xₙ)]/h — just changing a single component of the input.
Partial derivatives don’t tell you the whole story about a function’s local behavior.
Definition. A function f: D⊆Rⁿ→R is differentiable at a∈D if there exists c∈Rⁿ s.t. f(a+h) = f(a) + ⟨c,h⟩ + E(h) with lim_{h→0} E(h)/∥h∥ = 0. Equivalently, lim_{h→0} [f(a+h)−f(a)−⟨c,h⟩]/∥h∥ = 0. If this holds, we write c = ∇f(a), the gradient of f at a.
For example, z = f(x) is a surface, and z = f(a) + ⟨∇f(a), x−a⟩ is a plane. f is differentiable at a if z_surface − z_plane = f(x) − f(a) − ⟨∇f(a), x−a⟩ = E(x−a) → 0 faster than ∥x−a∥ → 0. At such a point, the tangent plane is a good approximation for points near the point on the surface.
Theorem. If f is differentiable at a, then all ∂ⱼf(a) exist, and ∇f(a) = (∂₁f(a), ..., ∂ₙf(a)).
(Recall: differentiable means there exists c∈Rⁿ s.t. f(a+h) = f(a) + c⋅h + E(h), where lim_{h→0} E(h)/∥h∥ = 0.)
Partials are all components of the derivative. If we know something about all of the partials, we can conclude something about the derivative at a point.
Theorem. If f is differentiable at a, then f is continuous at a.
Theorem. If f: D⊆Rⁿ→R and a∈D, and all ∂ⱼf exist in a neighborhood of a and are continuous at a, then f is differentiable at a.
Proof in 2D for simplicity.
We want to show that [f(a+h)−f(a)−⟨c,h⟩]/∥h∥ → 0 as h→0.
Let c = (∂₁f(a), ∂₂f(a)).
Then, for h = (h₁,h₂), write f(a+h)−f(a) = [f(a₁+h₁, a₂+h₂) − f(a₁, a₂+h₂)] + [f(a₁, a₂+h₂) − f(a₁, a₂)].
Use the one-variable mean value theorem on each bracket.
The first bracket equals ∂₁f(a₁+c₁, a₂+h₂)·h₁ for some c₁ between 0 and h₁, and the second equals ∂₂f(a₁, a₂+c₂)·h₂ for some c₂ between 0 and h₂; for ∥h∥ < δ sufficiently small, continuity of the partials at a makes these within ε of ∂₁f(a)h₁ and ∂₂f(a)h₂, so the error is o(∥h∥).
Theorem. If f is differentiable at a, then the directional derivatives at a in every direction exist, and ∂_u f(a) = ∇f(a)⋅u.
This implies ∂ⱼf(a) = ∇f(a)⋅eⱼ, where eⱼ is the one-hot vector with 1 at position j and 0 otherwise.
Proof.
Since f is differentiable at a, we have [f(a+h)−f(a)−∇f(a)⋅h]/∥h∥ → 0 as h→0
Let h = tu, with u a unit vector.
For t>0, ∥h∥ = t, so [f(a+tu)−f(a)−∇f(a)⋅(tu)]/t → 0.
Factoring out the t: (f(a+tu)−f(a))/t − ∇f(a)⋅u → 0 as t→0⁺.
If t<0, since ∥h∥ = −t, then −(f(a+tu)−f(a))/t + ∇f(a)⋅u → 0 as t→0⁻, giving the same one-sided limit.
Nice property of the gradient. For any direction u, your function grows quickest in the gradient direction: ∂_u f(a) = ∇f(a)⋅u = ∥∇f(a)∥∥u∥cos θ = ∥∇f(a)∥cos θ ≤ ∥∇f(a)∥. The gradient at every point a directs you towards the direction of maximal increase.
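A numerical check of the steepest-ascent property (my addition), for the arbitrary test function f(x,y) = x² + 3y:

```python
import math

# Compare finite-difference slopes of f(x, y) = x^2 + 3y at a = (1, 2)
# over many unit directions: the best slope matches ||grad f(a)||, and the
# best direction matches the gradient direction.
def f(x, y):
    return x * x + 3 * y

a = (1.0, 2.0)
grad = (2 * a[0], 3.0)  # gradient of f at a
h = 1e-6
slopes = [
    ((f(a[0] + h * math.cos(t), a[1] + h * math.sin(t)) - f(*a)) / h, t)
    for t in (k * 2 * math.pi / 3600 for k in range(3600))
]
best_slope, best_t = max(slopes)
gnorm = math.hypot(*grad)
print(best_slope, gnorm)  # nearly equal
print((math.cos(best_t) * gnorm, math.sin(best_t) * gnorm), grad)
```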
What is the chain rule? f(g(t)):R→R, where g:R→Rn, f:Rn→R.
Theorem. Let g: R→Rⁿ be differentiable at t=a and f: Rⁿ→R be differentiable at b=g(a). Then f(g(t)) is differentiable at t=a and (d/dt) f(g(t))|_{t=a} = ∇f(g(a))⋅g′(a) = (∂f/∂x₁)(dx₁/dt) + (∂f/∂x₂)(dx₂/dt) + ... + (∂f/∂xₙ)(dxₙ/dt).
Proof.
We have that f(b+h)=f(b)+∇f(b)⋅h+E1(h) with E1(h)/∥h∥→0 as h→0.
We also have g(a+ϵ)=g(a)+g′(a)ϵ+E2(ϵ) with E2(ϵ)/∥ϵ∥→0 as ϵ→0.
Writing h := g(a+ε)−g(a) = g′(a)ε + E₂(ε)
We have that g(a+ϵ)=h+b. So f(g(a+ϵ))=f(b+h)=f(b)+∇f(b)⋅h+E1(h)=f(b)+∇f(b)⋅(g′(a)ϵ+E2(ϵ))+E1(h)
We have f(b)+∇f(b)⋅g′(a)ϵ+E3(ϵ), where E3(ϵ)=∇f(b)⋅E2(ϵ)+E1(h)
Then, we have [f(g(a+ε))−f(g(a))]/ε = ∇f(g(a))⋅g′(a) + E₃(ε)/ε
After some tedious algebra: as ε→0, ∥h∥→0, while ∥h∥/|ε| stays bounded.
Then E₃(ε)/ε → 0 as ε→0.
The derivative of a function from Rn→Rm is a matrix.
2nd gradient property. Let F: U⊆R³→R be differentiable and non-constant, and suppose a level set S = {x∈U ∣ F(x) = c} is a smooth surface. Then for all a∈S, ∇F(a) is perpendicular to S.
Water flowing down the mountain travels at right angles to the level contours. Gradient descent follows the path of water.
Cross-product: only makes sense in R3, produces another vector.
The cross product v×w=∥v∥∥w∥sinθn, where n is the unit vector perpendicular to the plane spanned by v and w.
Lemma. If x(t),y(t):R→R3, f:R→R,c∈R, (fx)′=f′x+fx′ and (x×y)′=x′×y+x×y′ and (x⋅y)′=x′⋅y+x⋅y′.
Lecture 10: Celestial Mechanics, Mean Value Theorem, Higher order Partials
Theorem. Let S⊆Rⁿ be open, containing a,b∈Rⁿ and the segment L connecting a to b. Suppose f: S⊆Rⁿ→R is continuous on L and differentiable on L, except perhaps at the endpoints. Then there exists a point c∈L s.t. ∇f(c)⋅(b−a) = f(b)−f(a).
Proof.
Let h=b−a; then L={a+th∣t∈[0,1]}.
Define φ: [0,1]→R by φ(t) = f(a+th); φ is continuous on [0,1] and differentiable on (0,1)
We have φ′(t) = ∇f(a+th)⋅(d/dt)(a+th) = ∇f(a+th)⋅(b−a)
With the 1D MVT, there exists some t₀∈(0,1) such that φ(1)−φ(0) = f(b)−f(a) = φ′(t₀) = ∇f(a+t₀h)⋅(b−a) =: ∇f(c)⋅(b−a).
Definition. A set S is convex if ∀a,b∈S, a+t(b−a)∈S for all t∈[0,1].
Convex functions: sublevel sets f⁻¹((−∞,c]) of a convex function — preimages of half-infinite intervals — are convex.
Corollary. If f: S⊆Rⁿ→R is differentiable on S, open and convex, and ∥∇f(x)∥ ≤ M, ∀x∈S, then ∀a,b∈S, |f(b)−f(a)| ≤ M∥b−a∥.
Proof.
By convexity, the segment L_{a,b} lies in S, so from the MVT, ∀a,b∈S, ∃c∈L_{a,b} such that ∇f(c)⋅(b−a) = f(b)−f(a).
Thus ∣f(b)−f(a)∣≤∥∇f(c)∥∥b−a∥≤M∥b−a∥.
Corollary. If f is differentiable on S, open and convex, and ∇f(x)=0, ∀x∈S, then f is constant on S.
Theorem. Suppose f is differentiable on S, open and connected, and the gradient vanishes everywhere. Then f is constant.
Higher Order Partials.
If f: S⊆Rⁿ→R is differentiable on S, open, its partials are also functions, which may themselves have partials.
Lecture 14: 1-Dimensional Integration, Integration in Higher Dimensions
Theorem. If f is bounded and monotone on [a,b] then f is integrable on [a,b].
Theorem. If f is continuous on [a,b], then f is integrable on [a,b]. Proof.
Since f is bounded on [a,b] by the extreme value theorem, then UPf,LPf exist for any partition P.
Since f is continuous on a compact set, f is uniformly continuous.
For all ε>0, ∃δ s.t. ∀x,y∈[a,b], |x−y| < δ ⟹ |f(x)−f(y)| < ε/(b−a).
Let P be a partition of [a,b] into equally spaced intervals each with length <δ.
So we have Mj−mj<ϵ/(b−a)
Meaning U_P f − L_P f = ∑_{j=1}^J (Mⱼ−mⱼ)(xⱼ−xⱼ₋₁) ≤ (ε/(b−a)) ∑_{j=1}^J (xⱼ−xⱼ₋₁) = (ε/(b−a))(x_J−x₀) = (ε/(b−a))·(b−a) = ε
Note that partitions are all finite, yet the gap can be made smaller than any ε. So we have ∀ε>0, ∃P: U_P f − L_P f ≤ ε.
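A numerical illustration of the squeeze (my addition), for f(x) = x² on [0,1], where sup and inf on each subinterval sit at the endpoints since f is increasing:

```python
# Upper and lower Riemann sums pinch together as the partition refines.
def upper_lower(f, a, b, J):
    U = L = 0.0
    for j in range(J):
        x0 = a + (b - a) * j / J
        x1 = a + (b - a) * (j + 1) / J
        U += f(x1) * (x1 - x0)  # f increasing: sup at right endpoint
        L += f(x0) * (x1 - x0)  # f increasing: inf at left endpoint
    return U, L

for J in (10, 100, 1000):
    U, L = upper_lower(lambda x: x * x, 0.0, 1.0, J)
    print(J, L, U, U - L)  # both approach 1/3, gap shrinks like 1/J
```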
Midterm: 1.8 up to end of chapter 2
Function: f(x) = 0 if x∉Q; f(x) = 1/q if x = p/q in lowest terms.
f(x) is bounded, but not monotone or continuous
But it is integrable on [0,1]
Theorem. If f is bounded on [a,b] and is continuous on [a,b] except at finitely many points, then f is integrable on [a,b]. Proof.
Suppose y1,...,yL are points of discontinuity.
Set m = inf{f(x) ∣ x∈[a,b]} and M = sup{f(x) ∣ x∈[a,b]}
Given δ>0, set Il=[a,b]∩[yl−δ,yl+δ]
U=⋃l=1LIl, V=[a,b]∖Uint
Notice that the total length of U is ≤ 2δL, and f: V→R is continuous
If P is any partition whose points include the endpoints of the Iₗ (i.e., containing ∂U), we can write U_P f = U_{P,U} f + U_{P,V} f
Upshot: if g agrees with f except at finitely many points, then ∫ₐᵇ (f(x)−g(x)) dx = 0, so such a surrogate g has the same integral.
Fundamental Theorem of Calculus. There are two!
Let f be integrable on [a,b]. For x∈[a,b], define F(x) = ∫ₐˣ f(t)dt. Then F is continuous on [a,b], and F′(x₀) = f(x₀) at every point x₀ where f is continuous. (It makes sense even if f is integrable but not continuous!) Proof:
For all x,y∈[a,b], we have F(y)−F(x) = ∫ₓʸ f(t)dt
You can define C=sup{∣f(t)∣:t∈[a,b]}
We see that |F(y)−F(x)| ≤ |∫ₓʸ |f(t)| dt| ≤ C|y−x|, which means that F is Lipschitz continuous.
Say that f is continuous at x₀∈[a,b]. Then ∀ε>0, ∃δ>0 s.t. |t−x₀| < δ ⟹ |f(t)−f(x₀)| < ε.
We get that lim_{y→x₀} (F(y)−F(x₀))/(y−x₀) = f(x₀) by an epsilon-delta argument.
Let F be continuous on [a,b] and differentiable except at finitely many points of [a,b]. Let f be a function which agrees with F′(x) where it is defined. If f is integrable on [a,b] then ∫abf(t)dt=F(b)−F(a).
Suppose P = {x₀,...,x_J} is a partition of [a,b]. After perhaps refining, we can assume each point where F is non-differentiable is an endpoint of a subinterval of P.
Then ∀j,F is continuous on [xj−1,xj] and differentiable on (xj−1,xj).
Thus by the MVT, F(xj)−F(xj−1)=F′(tj)(xj−xj−1)=f(tj)(xj−xj−1) for some tj∈(xj−1,xj)
Then, we get a telescoping sum: F(b)−F(a)=F(xJ)−F(x0)=∑j=1Jf(tj)(xj−xj−1)
But we know this is bounded between the lower and upper Riemann sums: LPf≤F(b)−F(a)≤UPf
We know we can make the difference between the two arbitrarily small. Therefore F(b)−F(a)=∫abf(t)dt if f is integrable.
Integration in higher dimensions.
A product of intervals is a rectangle R=[a,b]×[c,d]
A partition is a grid of rectangles.
Definition. If f is a bounded function on the rectangle R and P is a partition of R as above, define mⱼₖ = inf_{x∈Rⱼₖ} f(x), Mⱼₖ = sup_{x∈Rⱼₖ} f(x). We can define upper and lower Riemann sums by similar logic.
Definition. The characteristic function of S is χ_S(x) = 1 if x∈S, and 0 if x∉S
Indicator functions are interesting examples depending on the set.
So you can define the integral over a non-rectangle as the integral over a rectangle of the indicator function.
Indicator functions can actually fuck up nice behavior of our original functions. So how can you guarantee the product is still integrable? If the boundary of S has zero content, we are fine: f⋅χ_S will only be discontinuous on the boundary of S.
Lecture 15: Higher Dimension Integration
if S⊆Rⁿ, the characteristic function of S is 1 at points of S and 0 otherwise
If f: S⊆R²→R is bounded, S is bounded, and R is any rectangle containing S, we define ∫∫_S f dA = ∫∫_R f⋅χ_S dA whenever f⋅χ_S is integrable on R
Theorem declaring the properties of integrals:
a. linearity – a linear combination of two integrable functions is integrable on S, and its integral is the same linear combination of the two integrals
b. if a function is integrable on bounded disjoint domain sets, it is integrable on their union, and the integral there is the sum of the individual integrals
c. if f,g are integrable and f(x)≤g(x) for all x∈S, then you have a bound on the integrals: ∫∫_S f(x) dA ≤ ∫∫_S g(x) dA
d. if f is integrable on S, then |f(x)| is integrable on S and |∫∫_S f dA| ≤ ∫∫_S |f| dA
Combining a. and d. gives you a form of the triangle inequality ∣∫∫f+gdA∣≤∫∫∣f∣dA+∫∫∣g∣dA
Even if f is very nice, f⋅χS won’t be continuous on R.
Lemma. χ_S is discontinuous at x iff x∈∂S
If x∈Sint, there exists a ball of some radius centered at x which is a subset of S, therefore the characteristic function is constant and continuous in the neighborhood of that point.
If x∈(S^C)^int, χ_S is likewise constant near x.
If x∈∂S, then ∀δ>0, B_δ(x)∩S ≠ ∅ and B_δ(x)∩S^C ≠ ∅.
Proposition.
If Z⊆R2 has zero content and U⊆Z then U has zero content.
If Z1,...,Zk have zero content, then their union has zero content.
If f: (a₀,b₀)→R² is C¹, then the image f([a,b]) of any closed subinterval [a,b]⊂(a₀,b₀) has zero content.
Each part of the curve can be contained in some set of rectangles; you can find a set of rectangles whose total area is less than your tolerance.
Proof for 3.
C1 buys us the mean value theorem
If Pₖ is the equally spaced partition of [a,b] into subintervals of length δ = (b−a)/k, and C = sup{∥f′(t)∥}, writing f(t) = (x(t), y(t)), we apply the MVT to each component to obtain |x(t)−x(tⱼ)| ≤ Cδ and |y(t)−y(tⱼ)| ≤ Cδ for t in the j-th subinterval.
Together, these imply that the image of each subinterval is contained in a square of side length 2Cδ.
The total area of the k squares is ≤ k(2Cδ)² = 4C²(b−a)²/k, which goes to zero as k→∞.
The image of any C¹ curve will have zero content.
If f: S⊆Rᵏ→Rⁿ, k<n, is C¹, and S is bounded, then f(S) has zero content.
Remark. A set S⊂R² that is bounded and whose boundary ∂S has zero content is called Jordan-measurable.
Theorem. If S⊆R2 is Jordan measurable and f:R2→R is bounded and continuous on S except perhaps on a set of zero content, then f is integrable on S.
Proposition. If Z⊆R2 of zero content and f:R2→R is bounded then f is integrable on Z and has integral zero.
Proof.
If ϵ>0, there exists some collection of rectangles such that Z is contained in their union and the sum of their areas is less than ϵ
After subdividing the rectangles as needed, we can assume they form a partition of a rectangle containing Z and interior pairwise disjoint
Setting C = sup_Z |f(x)|, we have −Cε < −C∑_{j=1}^M A(Rⱼ) ≤ L_P(f) ≤ U_P(f) < C∑_{j=1}^M A(Rⱼ) < Cε. Both the lower and upper Riemann sums lie within Cε of zero, so they squeeze together and the integral is zero.
Corollary
If f is integrable on S⊆R² and f=g except on a set of zero content, then ∫∫_S f dA = ∫∫_S g dA
If f is integrable on S and on T, and S∩T has zero content, then f is integrable on S∪T and ∫∫_{S∪T} f dA = ∫∫_S f dA + ∫∫_T f dA
How to generalize to higher dimensions? Every argument, lemma, proposition, and theorem in this section is generalizable in higher-dimensional spaces in a straightforward way. Turn squares into boxes. Riemann sums are defined on partitions of boxes.
Mean Value Theorem for integrals. If S⊆Rⁿ is compact, connected, and Jordan-measurable, f,g are continuous on S, and g is non-negative, then there exists some a∈S such that ∫_S f(x)g(x) dV = f(a)∫_S g(x) dV.
Proof.
Because f is continuous on S, by the extreme value theorem, we have that there exists M,m such that m≤f(x)≤M for all x∈S.
That is, we have that m∫SgdV≤∫f⋅gdV≤M∫SgdV.
This implies m ≤ (∫_S f⋅g dV)/(∫_S g dV) ≤ M (when ∫_S g dV > 0; if it is zero, the theorem is trivial)
By the intermediate value theorem, there exists an a∈S such that f(a) = (∫_S f⋅g dV)/(∫_S g dV).
Corollary. Average Value. If S⊆Rⁿ is compact, connected, Jordan-measurable, and f is continuous on S, then there exists some a∈S such that f(a) = Avg_S(f) = (1/Vol(S))∫_S f dV. There is some point in S where you equal your average across the set.
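A numerical sketch of the average-value corollary (my addition), on the unit square with the arbitrary test function f(x,y) = x + y:

```python
# Approximate Avg_S(f) for f(x, y) = x + y on S = [0, 1]^2 (Vol(S) = 1)
# with a midpoint Riemann sum; the average 1.0 is attained at (0.5, 0.5).
N = 500
total = 0.0
for i in range(N):
    for j in range(N):
        x, y = (i + 0.5) / N, (j + 0.5) / N
        total += x + y
print(total / (N * N))  # approx 1.0
```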