
Lecture Notes

MATH 334


Table of contents
  1. Lecture 1: Background and Notation
  2. Lecture 2: Euclidean Spaces, Open Sets, Limits and Continuity, Sequences
  3. Lecture 3: Sequences, Induction, the Completeness Axiom
  4. Lecture 4: Sequences, Extrema
  5. Lecture 5: Compactness and Connectedness
  6. Lecture 6: Uniform Continuity
  7. Lecture 7: Differential Calculus
  8. Lecture 8: Derivatives in 1 Variable, Several Variables, the Mean Value Theorem
  9. Lecture 9: \(C^1\) functions, chain rule, gradients, Kepler’s laws
  10. Lecture 10: Celestial Mechanics, Mean Value Theorem, Higher order Partials
  11. Lecture 14: 1-Dimensional Integration, Integration in Higher Dimensions
  12. Lecture 15: Higher Dimension Integration

Lecture 1: Background and Notation

  • Plan for fall – chapters 1 to 4 of the Folland textbook
  • Pre-midterm I: chapter 1
  • Pre-midterm II: chapters 2 and 3
  • Grade breakdown:
    • Reading responses, free points, 5%
    • HW, posted weekly, assigned on Wednesday and due the following Wednesday at 11:59, 25%
    • Midterms: 20%
    • Final: 30%
    • All exams are in class.

Historical Background

  • Fourier and the crisis of the 19th century
  • Fourier’s ideas animate the course
  • 1798: joined Napoleon’s expedition to Cairo
  • Mathematics and imperialism
  • Becomes very interested in heat
  • How does heat diffuse across an object?
  • The temperature distribution \(u(t, x)\) should solve the heat equation \(\frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2}\) with initial condition \(u(0, x) = f(x)\)
  • Fourier found a general method for solving this – partial differential equations
  • His method requires assuming a special form: you can write your initial function as a sum of cosines. Then the distribution \(u\) can be found easily.
  • However, this assumes that \(f(x) = 0\) at \(x = \pm 1\), and that the function is continuous
  • You can’t solve \(f(x) = 1\), for instance: you can’t solve for heat when the temperature applied to a bar is constant.
  • What’s the solution? Bold idea: larger sums of cosines can predictably better approximate functions which cannot themselves be written as a finite sum.
    • For instance: write \(1 = \frac{4}{\pi} \left[ \cos \left( \frac{\pi x}{2} \right) - \frac{1}{3} \cos \left( \frac{3\pi x}{2} \right) + \frac{1}{5} \cos \left( \frac{5 \pi x}{2} \right) - \cdots \right]\) for \(x \in (-1, 1)\) (see the numerical sketch after this list)
    • You cannot do this with a Taylor expansion: you can’t write constant functions in terms of nonconstant functions.
    • You immediately begin to run into all sorts of funny paradoxes
    • Notice that \(\cos \left( \frac{(2n-1) \pi (x+2)}{2} \right) = \cos \left( \frac{(2n-1) \pi x}{2} + (2n-1) \pi \right) = -\cos \left( \frac{(2n-1) \pi x}{2} \right)\): shifting by 2 amounts to a phase shift of \(\pi\) in each term
    • Therefore you could write \(-1\) simply by shifting: the formula is the same, but the interval is now \((1, 3)\)
    • All you did was add continuous things, but you got something totally discontinuous…
  • Response to Fourier: this is not a function. If you could call that a function and consider it a representation of such a familiar function, maybe Weierstrass was onto something when he claimed \(W(x) = \cos(\pi x) + \frac{1}{2} \cos (13 \pi x) + \frac{1}{4} \cos (169 \pi x) + \cdots = \sum_{j=0}^\infty \frac{\cos(13^j \pi x)}{2^j}\).
    • This series converges and is a function
    • It is continuous in \(x\)
    • It is differentiable nowhere – why can’t I just take the derivative of each term? – you can’t
    • “A monstrosity”
    • What does this do to our conception of continuity?
  • Several questions follow:
    • What is a function in the first place?
    • When can a function be represented as a Fourier series, ideally one that converges?
    • When can a series be differentiated or integrated, term-by-term?
    • What is differentiation? What is integration?
    • What is continuity? What is convergence?
  • This class will be occupied by these questions
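A quick numerical sanity check of Fourier’s claim (my own sketch, not from the lecture): partial sums of the alternating cosine series really do flatten out toward the constant 1 on \((-1, 1)\). Plain Python, no external libraries.

```python
# Partial sums of 1 = (4/pi)[cos(pi x/2) - (1/3)cos(3pi x/2) + (1/5)cos(5pi x/2) - ...]
import math

def partial_sum(x, terms):
    s = 0.0
    for n in range(1, terms + 1):
        s += (-1) ** (n + 1) / (2 * n - 1) * math.cos((2 * n - 1) * math.pi * x / 2)
    return 4 / math.pi * s

for terms in (5, 50, 500):
    print(terms, partial_sum(0.3, terms))   # tends to 1.0 as the number of terms grows
```

Evaluating the same code at a point of \((1, 3)\), say \(x = 2.3\), gives values near \(-1\), matching the phase-shift observation above.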

Notation

  • Basic
    • \forall, for all
    • \exists, there exists
    • Summation: \(\sum_{n=1}^k a_n = a_1 + a_2 + \cdots + a_k\)
    • \(\mathbb{N}\), the naturals / whole numbers: \(\mathbb{N} = \{1, 2, 3, \ldots\}\)
    • \(\mathbb{Z}\), the integers: \(\mathbb{Z} = \{\ldots, -2, -1, 0, 1, 2, \ldots\}\)
    • \(\mathbb{Q}\), the rationals: \(\mathbb{Q} = \left\{ \frac{p}{q} \mid p, q \in \mathbb{Z}, q \neq 0 \right\}\)
    • \(\mathbb{R}\), the reals.
  • Set theory
    • \(\Omega\), the universal set
    • Union: \(A \cup B = \{x \in \Omega \mid x \in A \vee x \in B\}\)
    • Intersection: \(A \cap B = \{x \in \Omega \mid x \in A \wedge x \in B\}\)
    • Complement: \(A^C = \{x \in \Omega \mid x \notin A\}\)
    • Set difference: \(A \setminus B = \{x \in \Omega \mid x \in A \wedge x \notin B\}\)
      • Can also be written as \(A \setminus B = A \cap B^C\)
    • Big union: \(\bigcup_{j=1}^k A_j = A_1 \cup A_2 \cup \cdots \cup A_k = \{ x \mid \exists j, x \in A_j \}\)
    • Big intersection: \(\bigcap_{j=1}^k A_j = A_1 \cap A_2 \cap \cdots \cap A_k = \{ x \mid \forall j, x \in A_j \}\)
  • Mappings and functions
    • A function from a set \(A\) to a set \(B\) is an assignment of every element of \(A\) to exactly one element of \(B\), denoted \(f: A \to B\)
    • \(A\): domain of \(f\)
    • \(B\): codomain of \(f\)
    • \(f(A) = \{ y \in B \mid \exists x \in A, f(x) = y \} \subseteq B\), the image/range of \(f\)
    • \(f\) is injective / 1-to-1 if \(a_1 \neq a_2 \implies f(a_1) \neq f(a_2)\)
    • \(f\) is surjective / onto if \(\forall b \in B, \exists a \in A : f(a) = b\)
    • \(f\) is bijective if both injective and surjective. Bijectivity gives a one-to-one pairing between \(A\) and \(B\); this notion of pairing also works for infinite sets.
    • If \(f: A \to B\) and \(g : B \to C\), we can write their composition as \(g \circ f : A \to C\), where \((g \circ f)(x) = g(f(x))\). Requirement: the codomain of \(f\) must be a subset of the domain of \(g\).
    • \(f : A \to B\) is invertible if \(\exists g: B \to A\) such that \(g \circ f : A \to A\) satisfies \(\forall x \in A, (g \circ f)(x) = x\) and \(f \circ g: B \to B\) satisfies \(\forall y \in B, (f \circ g)(y) = y\). Basically: the composition sends every element back to itself.
      • Bijectivity implies invertibility.
      • Definition: \(g = f^{-1}\)
  • Euclidean spaces and vectors
    • The reals \(\mathbb{R}\)
    • We can also consider tuples of reals: \(\mathbb{R}^n\) for \(n \in \mathbb{N}\), the set of ordered \(n\)-tuples of reals: \(\mathbb{R}^n = \{(x_1, x_2, \ldots, x_n) \mid \forall j, x_j \in \mathbb{R}\}\)
    • The word “vector” is also used to describe \(n\)-tuples
    • Vector arrows are meant to capture that an \(n\)-tuple can also describe a direction and magnitude
    • The magnitude / norm of a vector / \(n\)-tuple is \(\Vert \vec{x} \Vert = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}\)
    • Vector addition: \(\vec{x} + \vec{y} = (x_1 + y_1, x_2 + y_2, \ldots, x_n + y_n)\)
    • Scalar multiplication: \(\lambda \vec{x} = (\lambda x_1, \lambda x_2, \ldots, \lambda x_n)\)
    • Dot product / inner product: \(\vec{x} \cdot \vec{y} = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n\)
      • Note that the norm is defined in terms of the dot product: \(\Vert \vec{x} \Vert = \sqrt{\vec{x} \cdot \vec{x}}\)

The Cauchy-Schwarz Inequality

  • If \(\vec{a}, \vec{b} \in \mathbb{R}^n\), then \(\vert \vec{a} \cdot \vec{b} \vert \leq \Vert \vec{a} \Vert \, \Vert \vec{b} \Vert\)
  • Going to be your best friend in this class
  • Though the proof seems tied to Euclidean space, the argument is very broadly applicable.
  • Proof:
    1. If \(\vec{b} = \vec{0}\), then both sides are zero. The inequality holds.
    2. Now, assume \(\vec{b} \neq \vec{0}\). Consider the function \(f : \mathbb{R} \to \mathbb{R}_{\ge 0}\) where \(f(t) = \Vert \vec{a} - t \vec{b} \Vert^2 = (\vec{a} - t\vec{b}) \cdot (\vec{a} - t\vec{b}) = \Vert \vec{a} \Vert^2 - 2t (\vec{a} \cdot \vec{b}) + t^2 \Vert \vec{b} \Vert^2\)
    3. This quadratic function has a minimum at \(t = \frac{\vec{a} \cdot \vec{b}}{\Vert \vec{b} \Vert^2}\)
    4. Plugging this into \(f\) gives \(\Vert \vec{a} \Vert^2 - \frac{(\vec{a} \cdot \vec{b})^2}{\Vert \vec{b} \Vert^2}\)
    5. Since \(f(t) \ge 0\), \(\Vert \vec{a} \Vert^2 - \frac{(\vec{a} \cdot \vec{b})^2}{\Vert \vec{b} \Vert^2} \ge 0\), so \(\Vert \vec{a} \Vert^2 \Vert \vec{b} \Vert^2 \ge (\vec{a} \cdot \vec{b})^2\)
    6. Take square roots to get our inequality.
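A small numerical check of the proof (my own sketch, not part of the notes): for random vectors, the inequality holds and \(t = \frac{\vec{a} \cdot \vec{b}}{\Vert \vec{b} \Vert^2}\) really minimizes \(f(t) = \Vert \vec{a} - t\vec{b} \Vert^2\).

```python
import random

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

random.seed(0)
for _ in range(1000):
    a = [random.uniform(-1, 1) for _ in range(4)]
    b = [random.uniform(-1, 1) for _ in range(4)]
    # |a . b| <= ||a|| ||b||
    assert abs(dot(a, b)) <= (dot(a, a) * dot(b, b)) ** 0.5 + 1e-12
    # the quadratic f(t) = ||a - t b||^2 is minimized at t = (a . b) / ||b||^2
    t = dot(a, b) / dot(b, b)
    f = lambda s: dot([ai - s * bi for ai, bi in zip(a, b)],
                      [ai - s * bi for ai, bi in zip(a, b)])
    assert f(t) <= min(f(t - 0.01), f(t + 0.01)) + 1e-12
print("Cauchy-Schwarz checks passed")
```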

Lecture 2: Euclidean Spaces, Open Sets, Limits and Continuity, Sequences

  • Triangle inequality: \(\forall \vec{a}, \vec{b} \in \mathbb{R}^n\), \(\Vert \vec{a} + \vec{b} \Vert \le \Vert \vec{a} \Vert + \Vert \vec{b} \Vert\)
    • Norm of sum is less than or equal to the sum of the norms
    • Proof
      1. Expand: \(\Vert \vec{a} + \vec{b} \Vert^2 = (\vec{a} + \vec{b}) \cdot (\vec{a} + \vec{b}) = \Vert \vec{a} \Vert^2 + 2\, \vec{a} \cdot \vec{b} + \Vert \vec{b} \Vert^2\)
      2. Applying Cauchy-Schwarz to the middle term: \(\le \Vert \vec{a} \Vert^2 + 2 \Vert \vec{a} \Vert \Vert \vec{b} \Vert + \Vert \vec{b} \Vert^2\)
      3. Simplify: this is \(( \Vert \vec{a} \Vert + \Vert \vec{b} \Vert)^2\); take square roots.
    • Why is it called the triangle inequality?
      • Define distance between two vectors using the norm: it’s faster to go along the direct line (the ‘hypotenuse’) than along the legs of the triangle
      • Formally: \(d(\vec{a}, \vec{b}) = \Vert \vec{a} - \vec{b} \Vert = \sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2 + \cdots}\)
  • We use \(\vec{x} \cdot \vec{y}\) or \(\langle \vec{x}, \vec{y} \rangle\) to denote the dot product
  • Two vectors span a plane, and the angle between them is measured in that plane. By Cauchy-Schwarz, \(\frac{\vec{x} \cdot \vec{y}}{\Vert \vec{x} \Vert \Vert \vec{y} \Vert}\) is between \(-1\) and \(1\), so it is the cosine of some angle. Therefore, \(\cos \theta_{\vec{x}, \vec{y}} = \frac{\vec{x} \cdot \vec{y}}{\Vert \vec{x} \Vert \Vert \vec{y} \Vert}\)
    • \(\vec{x}, \vec{y}\) are perpendicular (orthogonal) iff \(\vec{x} \cdot \vec{y} = 0\)
    • As soon as you have a notion of inner product, you can define the angle between vectors.
  • Useful inequality: if \(\vec{x} = (x_1, \ldots, x_n) \in \mathbb{R}^n\), then \(\max \{ \vert x_1 \vert, \ldots, \vert x_n \vert \} \le \Vert \vec{x} \Vert \le \sqrt{n} \max \{ \vert x_1 \vert, \ldots, \vert x_n \vert \}\)
    • \(\Vert \vec{x} \Vert_{\infty} = \max \{ \vert x_1 \vert, \ldots, \vert x_n \vert \}\), called the \(L^\infty\) norm
    • This is to be contrasted with the usual \(L^2\) norm, \(\Vert \vec{x} \Vert_2\), which is the square root of the sum of the squares of the components.
    • Norms are generalizable across vector spaces.
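A quick numeric illustration of the inequality (my own sketch, not from the lecture): \(\Vert \vec{x} \Vert_\infty \le \Vert \vec{x} \Vert_2 \le \sqrt{n}\, \Vert \vec{x} \Vert_\infty\) on random vectors.

```python
import math, random

random.seed(1)
n = 5
for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(n)]
    inf_norm = max(abs(c) for c in x)              # L-infinity norm
    two_norm = math.sqrt(sum(c * c for c in x))    # usual L-2 norm
    assert inf_norm <= two_norm <= math.sqrt(n) * inf_norm + 1e-12
print("norm comparison holds on all samples")
```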

Subsets of \(\mathbb{R}^n\) – introduction to topology

  • Topology tries to abstract the notion of “nearness” or “closeness” in a space, without specific notions of angle, length, etc.
  • What do you need to describe a space? You need to say when points are near, and you don’t need to do so quantitatively.
  • Definition: Let \(\vec{y}\) be a point in \(\mathbb{R}^n\) and \(r > 0\). Define the open ball \(B_r(\vec{y})\) to be the collection of all points in \(\mathbb{R}^n\) whose distance to the center is less than \(r\).
    • Formally: \(B_r(\vec{y}) = \{ \vec{x} \in \mathbb{R}^n \mid \Vert \vec{x} - \vec{y} \Vert < r \}\)
  • A set \(S \subseteq \mathbb{R}^n\) is bounded if it is contained in some ball centered at the origin \(\vec{0}\): there exists a sufficiently large radius for an enclosing ball.
  • Definition: a point \(\vec{x} \in S\) is an interior point of \(S\) if some ball centered at \(\vec{x}\) is contained in \(S\).
  • Definition: the set of points of \(S\) which are interior points of \(S\) is called the interior of \(S\), denoted \(S^\text{int}\):
\[S^\text{int} = \{ \vec{x} \in S \mid \exists r > 0, B_r(\vec{x}) \subseteq S \}\]
  • Examples of interiors:
    • If \(S = B_r(\vec{y})\), then \(S^\text{int} = S\)
    • If \(S = \{ \frac{1}{n} \in \mathbb{R} \mid n \in \mathbb{N} \}\) (a subset of the line), then \(S^\text{int} = \emptyset\), since no ball around any point is contained in \(S\). Also note that \(0 \notin S\).
  • Definition: a point \(\vec{x} \in \mathbb{R}^n\) is a boundary point of \(S\) if every ball centered at \(\vec{x}\) contains a point in \(S\) and a point not in \(S\).
  • Definition: the set of boundary points of \(S\) is called the boundary of \(S\), denoted \(\partial S\).
    • Example 1: if \(S = B_r(\vec{y})\), \(\partial S = \{ \vec{x} \in \mathbb{R}^n \mid \Vert \vec{x} - \vec{y} \Vert = r \}\), the sphere of radius \(r\) centered at \(\vec{y}\).
    • Example 2: if \(S = \{ \frac{1}{n} \in \mathbb{R} \mid n \in \mathbb{N} \}\), the boundary is \(\partial S = S \cup \{ 0 \}\). Every ball around \(0\) contains points in \(S\) and points not in \(S\); and each point \(\frac{1}{n}\) is itself a boundary point, since a ball contains its own center (so it meets \(S\)) as well as points not in \(S\).
  • Definition: the closure of a set is \(\bar{S} = S \cup \partial S\): the set of all points of \(S\) together with all boundary points of \(S\).
    • Example 1: if \(S = B_r(\vec{y})\), then \(\bar{S} = \{ \vec{x} \in \mathbb{R}^n \mid \Vert \vec{x} - \vec{y} \Vert \le r \}\), the closed ball of radius \(r\) centered at \(\vec{y}\).
    • Example 2: if \(S = \{ \frac{1}{n} \in \mathbb{R} \mid n \in \mathbb{N} \}\), then \(\bar{S} = S \cup \{ 0 \}\).
  • Definition: \(S\) is open if it contains none of its boundary points. \(S\) is closed if it contains all of its boundary points.
    • Example 1: \(B_r(\vec{y})\) is open
    • Example 2: \(\{ \frac{1}{n} \in \mathbb{R} \mid n \in \mathbb{N} \}\) is neither open nor closed: it contains some boundary points (each \(\frac{1}{n}\)) but not all of them (it misses \(0\))
    • Example 3: \(\{ \frac{1}{n} \in \mathbb{R} \mid n \in \mathbb{N} \} \cup \{ 0 \}\) is closed
  • Proposition: If \(S \subseteq \mathbb{R}^n\),
    • \(S\) is open iff it is equal to its interior
    • \(S\) is closed iff its complement is open
    • Proof:
      1. An element of \(S\) is either an interior point or a boundary point.
      2. \(S\) is open iff every point of \(S\) is interior, i.e. iff \(S = S^\text{int}\).
      3. \(S\) is closed \(\iff S = \bar{S} = S \cup \partial S \iff \partial S \subseteq S\). Since boundaries are shared between a set and its complement, \(\partial(S^C) = \partial S \subseteq S\), so \(S^C\) contains none of its boundary points \(\iff S^C\) is open.
    • Example:
      • If \(S = \emptyset\), this is a set which is both open and closed.
      • If \(S = \mathbb{Q}\), the set of fractions, this set is neither open nor closed. The interior is empty, \(S^{\text{int}} = \emptyset\), so \(S\) is not open; yet the closure of \(S\) is \(\mathbb{R}\), so \(S\) is not closed.

Limits

  • “You can’t lift your pencil”
  • Definition: let \(f: \mathbb{R}^n \to \mathbb{R}\) be a real-valued function.
\[\lim_{\vec{x} \to \vec{a}} f(\vec{x}) = L \text{ if } \forall \epsilon > 0, \exists \delta > 0 : \vert f(\vec{x}) - L \vert < \epsilon \text{ whenever } \Vert \vec{x} - \vec{a} \Vert < \delta\]
  • “\(a\) whenever \(b\)” means \(b \implies a\)
  • We can replace \(\Vert \vec{x} - \vec{a} \Vert < \delta\) with the \(L^\infty\) norm:
\[\Vert \vec{x} - \vec{a} \Vert_{\infty} = \max \{ \vert x_1 - a_1 \vert, \ldots, \vert x_n - a_n \vert \} < \frac{\delta}{\sqrt{n}}\]
  • You can also consider a limit on a subset, \(f : S \subseteq \mathbb{R}^n \to \mathbb{R}\):
\[\lim_{\vec{x} \to \vec{a}, \vec{x} \in S} f(\vec{x}) = L \text{ if } \forall \epsilon > 0, \exists \delta > 0 : \vert f(\vec{x}) - L \vert < \epsilon \text{ whenever } \Vert \vec{x} - \vec{a} \Vert < \delta \text{ and } \vec{x} \in S\]

Continuity

  • Definition: \(f(\vec{x})\) is continuous at \(\vec{a}\) if \(\lim_{\vec{x} \to \vec{a}} f(\vec{x})\) and \(f(\vec{a})\) both exist and are equal.
  • These definitions also make sense for vector-valued functions: you can go from \(\mathbb{R}^n\) to \(\mathbb{R}^m\). All you need is a corresponding notion of norm, which you have in any Euclidean space / dimension. The only change is to replace \(\vert f(\vec{x}) - L \vert < \epsilon\) with \(\Vert f(\vec{x}) - \vec{L} \Vert < \epsilon\), using the norm of the target space \(\mathbb{R}^m\).
    • Alternatively, you can take the limit of each individual component, considering the component functions \(f_j : \mathbb{R}^n \to \mathbb{R}\), \(j = 1, \ldots, m\).
  • The notion of “getting close” is very subtle in high dimensions – it’s harder than just taking left and right limits.
    • Example 1: \(f : \mathbb{R}^2 \to \mathbb{R}\), \(f(x, y) = \frac{xy}{x^2 + y^2}\). This function is bounded everywhere, \(\vert f(x, y) \vert \le \frac{1}{2}\), a consequence of Cauchy-Schwarz. But the function fails to be continuous at the origin: the limit as \(\vec{x} \to \vec{0}\) does not exist. Along any linear line of approach \(y = mx\), the function is constant, but that constant, \(\frac{m}{1 + m^2}\), changes with the line (see the numerical sketch after this list).
  • Theorem. Let \(f_1(x, y) = x + y\), \(f_2(x, y) = xy\), \(g(x) = \frac{1}{x}\), with \(f_1, f_2 : \mathbb{R}^2 \to \mathbb{R}\) and \(g: \mathbb{R} \setminus \{0\} \to \mathbb{R}\). These are all continuous on their domains. Proof (for \(f_1\)):
    1. Let \((a, b) \in \mathbb{R}^2\)
    2. Given \(\epsilon > 0\), we must show that there exists some \(\delta\) such that if \(\vert x - a \vert < \delta\) and \(\vert y - b \vert < \delta\), then \(\vert (x + y) - (a + b) \vert < \epsilon\).
    3. Let us choose \(\delta = \frac{1}{2} \epsilon\).
    4. \(\vert (x + y) - (a + b) \vert = \vert (x - a) + (y - b) \vert \le \vert x - a \vert + \vert y - b \vert\), via the triangle inequality.
    5. This is less than \(\delta + \delta = \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon\), as needed.
  • Theorem. Say \(f : \mathbb{R}^n \to \mathbb{R}^m\) and \(g : \mathbb{R}^m \to \mathbb{R}^k\). If \(f\) is continuous at \(\vec{a}\) and \(g\) is continuous at \(f(\vec{a})\), then \(g \circ f\) is continuous at \(\vec{a}\). Proof:
    1. Let \(\epsilon > 0\).
    2. Since \(g\) is continuous at \(f(\vec{a})\), there exists \(\delta_1 > 0\) such that \(\Vert g(\vec{y}) - g(f(\vec{a})) \Vert < \epsilon\) whenever \(\Vert \vec{y} - f(\vec{a}) \Vert < \delta_1\).
    3. Since \(f\) is continuous at \(\vec{a}\), there exists \(\delta_2 > 0\) such that \(\Vert f(\vec{x}) - f(\vec{a}) \Vert < \delta_1\) whenever \(\Vert \vec{x} - \vec{a} \Vert < \delta_2\).
    4. Let \(\delta = \delta_2\).
    5. Then \(\Vert g(f(\vec{x})) - g(f(\vec{a})) \Vert < \epsilon\) whenever \(\Vert \vec{x} - \vec{a} \Vert < \delta\).
  • Later, with sequences: \(a_n = 2a_{n-1} - a_{n-2} + 2\). Claim: \(a_n = n^2\). You can prove this with induction.
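The sketch promised above for Example 1 (my own illustration, not from the lecture): along each line \(y = mx\) the function \(f(x, y) = \frac{xy}{x^2 + y^2}\) is constant, but the constant depends on \(m\), so no single limit exists at the origin.

```python
def f(x, y):
    return x * y / (x ** 2 + y ** 2)

for m in (0.0, 0.5, 1.0, 2.0):
    x = 1e-8                   # very close to the origin along y = m x
    print(m, f(x, m * x))      # equals m / (1 + m^2), independent of x
```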

Lecture 3: Sequences, Induction, the Completeness Axiom

Sequences

  • A sequence is a collection of mathematical objects indexed by the natural numbers
  • We denote a sequence by \(\{x_k \}_{k=1}^\infty\)
  • A set has no notion of order, whereas a sequence does have order
  • e.g. the sequence 1, 4, 9, 16, … can be written as \(\{ n^2 \}_{n=1}^\infty\)
  • e.g. a sequence of intervals \((-1, 1), (-1/2, 1/2), (-1/3, 1/3), \ldots\) can be written as \(\{ (-1/n, 1/n) \}_{n=1}^\infty\)
  • You can also define sequences inductively: \(F_1 = 1, F_2 = 1, F_{n+2} = F_{n+1} + F_n\)
  • A nice proof technique: an inductive proof. Useful for proving statements of the form “\(\forall n \in \mathbb{N}\), \(P(n)\) is true.” First, show \(P(1)\) (base case). Then, show \(P(n) \implies P(n+1)\) (inductive step)
  • Example: consider \(a_1 = 1, a_2 = 4\), \(a_n = 2a_{n-1} - a_{n-2} + 2\) for \(n \ge 3\). Claim: \(a_n = n^2\). (Inductive step: \(2(n-1)^2 - (n-2)^2 + 2 = n^2\); see the numerical check after this list.)
  • A sequence of real vectors \(\{ \vec{x}_n \}_{n=1}^\infty \subseteq \mathbb{R}^n\) converges to \(\vec{a} \in \mathbb{R}^n\) if \(\forall \epsilon > 0, \exists N > 0\) such that \(\Vert \vec{x}_n - \vec{a} \Vert < \epsilon\), \(\forall n > N\). If this is so, we write \(\vec{x}_n \to \vec{a}\). If \(\{ \vec{x}_n \}_{n=1}^\infty\) does not converge to any vector, the sequence diverges.
  • We can describe limits / continuity in the form of sequences
  • e.g. the sequence \(\{ 1/k \}_{k=1}^\infty\) converges to 0.
  • e.g. the sequence \(\{ (-1)^k \}_{k=1}^\infty\) is divergent.
  • The following are equivalent:
    1. \(f : \mathbb{R}^n \to \mathbb{R}^k\) is continuous.
    2. \(\forall \vec{x}_0 \in \mathbb{R}^n\) and \(\forall \{ \vec{x}_n \}_{n=1}^\infty\) converging to \(\vec{x}_0\), \(\{ f(\vec{x}_n ) \}_{n=1}^\infty\) converges to \(f(\vec{x}_0)\).
    3. \(\forall U \subseteq \mathbb{R}^k\) open, the preimage \(f^{-1}(U) = \{ \vec{x} \in \mathbb{R}^n \mid f(\vec{x}) \in U \} \subseteq \mathbb{R}^n\) is also open.
  • Proof that 1 implies 2:
    1. Let \(\vec{x}_0 \in \mathbb{R}^n\) be arbitrary, and assume \(f\) is continuous at \(\vec{x}_0\).
    2. Let \(\{ \vec{x}_n \}_{n=1}^\infty\) be a sequence converging to \(\vec{x}_0\).
    3. By continuity, \(\forall \epsilon > 0\), \(\exists \delta > 0\) such that \(\Vert f(\vec{x}) - f(\vec{x}_0) \Vert < \epsilon\) whenever \(\Vert \vec{x} - \vec{x}_0 \Vert < \delta\).
    4. Since \(\vec{x}_n \to \vec{x}_0\), \(\exists N > 0\) such that \(\Vert \vec{x}_n - \vec{x}_0 \Vert < \delta\), \(\forall n > N\). Thus \(\Vert f(\vec{x}_n) - f(\vec{x}_0) \Vert < \epsilon\) for all \(n > N\), so \(f(\vec{x}_n) \to f(\vec{x}_0)\).
  • Proof that 2 implies 1 (prove the logically equivalent contrapositive statement, “not 1 implies not 2”): if \(\exists \vec{x}_0 \in \mathbb{R}^n\) such that \(f(\vec{x})\) is not continuous at \(\vec{x}_0\), we’ll produce \(\vec{x}_n \to \vec{x}_0\) such that \(f(\vec{x}_n)\) does not converge to \(f(\vec{x}_0)\).
    1. Since \(f\) is not continuous at \(\vec{x}_0\), \(\exists \epsilon_0 > 0\) such that \(\forall \delta > 0\), \(\exists \vec{x}_{\delta} \in \mathbb{R}^n\) where \(\Vert \vec{x}_\delta - \vec{x}_0 \Vert < \delta\) and \(\Vert f(\vec{x}_\delta) - f(\vec{x}_0) \Vert > \epsilon_0\)
    2. For each of \(\delta = 1, 1/2, 1/3, \ldots, 1/k, \ldots\) choose such a vector \(\vec{x}_k\) as above.
    3. Although the sequence \(\vec{x}_k \to \vec{x}_0\), \(f(\vec{x}_k)\) does not converge to \(f(\vec{x}_0)\).
  • Proof that 1 implies 3:
    1. Say \(U \subseteq \mathbb{R}^k\) is open. We will show that every point of \(f^{-1}(U)\) is contained in a ball contained in the preimage.
    2. If \(\vec{a} \in f^{-1}(U)\), then \(f(\vec{a}) \in U\); since \(U\) is open, \(\exists r > 0\) such that \(B_r(f(\vec{a})) \subseteq U\); i.e., \(\forall \vec{y} \in \mathbb{R}^k\) with \(\Vert \vec{y} - f(\vec{a}) \Vert < r\), we have \(\vec{y} \in U\).
    3. By continuity of \(f\), for this \(r > 0\), \(\exists \delta > 0\) such that \(\Vert f(\vec{x}) - f(\vec{a}) \Vert < r\) whenever \(\Vert \vec{x} - \vec{a} \Vert < \delta\); i.e., \(B_\delta(\vec{a}) \subseteq f^{-1}(U)\), so \(f^{-1}(U)\) is open.
  • Preimage definition: \(f^{-1}(U) = \{ \vec{x} \in \mathbb{R}^n \mid f(\vec{x}) \in U \}\)
  • Proof that 3 implies 1.
    1. Let \(\vec{a} \in \mathbb{R}^n\) be arbitrary. \(\forall \epsilon > 0\), \(B_\epsilon ( f(\vec{a})) \subseteq \mathbb{R}^k\) is open.
    2. By 3, \(f^{-1}(B_\epsilon (f(\vec{a})))\) is open. By openness, \(\exists \delta > 0\) such that \(B_\delta(\vec{a}) \subseteq f^{-1} (B_\epsilon (f(\vec{a})))\).
    3. So, if \(\Vert \vec{x} - \vec{a} \Vert < \delta\), then \(\Vert f(\vec{x}) - f(\vec{a}) \Vert < \epsilon\), so \(f\) is continuous at \(\vec{a}\).
  • Corollary: If \(f : \mathbb{R}^n \to \mathbb{R}^k\) is continuous and \(V \subseteq \mathbb{R}^k\) is closed, then \(f^{-1}(V)\) (the preimage) is also closed.
  • Example: if \(C > 0\), the sequence \(\{ C^n / n! \}_{n=1}^\infty\) converges to 0 (see the numerical check after this list).
    1. Choose \(N > 2C\); then \(\forall n > N\) we have \(0 < \frac{C^n}{n!} = \frac{C^N}{N!} \cdot \frac{C}{N+1} \cdot \frac{C}{N+2} \cdots \frac{C}{n}\)
    2. Each factor past the first is less than \(\frac{1}{2}\), so we can upper-bound the product by \(\frac{C^N}{N!} \cdot \frac{1}{2} \cdot \frac{1}{2} \cdots \frac{1}{2}\)
    3. These factors of \(\frac{1}{2}\) are \(n - N\) in number, so the bound goes to zero as \(n \to \infty\). Therefore the sequence converges to zero.
  • Theorem. Let \(A \subseteq \mathbb{R}^n\) be a subset. Then \(\vec{x} \in \bar{A} \iff A\) intersects every neighborhood of \(\vec{x}\). (A point is in the closure of a set iff the set intersects every neighborhood of the point.) Recall that \(U\) is a neighborhood of \(\vec{x}\) if \(\vec{x}\) is interior to \(U\). We will prove the contrapositive: a point is not in the closure of a set iff the point has a neighborhood not intersecting \(A\).
    1. Say \(\vec{x} \notin \bar{A}\); then \(U = \mathbb{R}^n \setminus \bar{A}\) is a neighborhood of \(\vec{x}\) which does not intersect \(A\).
    2. Say \(\exists U\), a neighborhood of \(\vec{x}\) not meeting \(A\). Then \(\mathbb{R}^n \setminus U\) is a closed set containing \(A\), implying that \(\bar{A} \subseteq \mathbb{R}^n \setminus U\). Since \(\vec{x} \notin \mathbb{R}^n \setminus U\), \(\vec{x} \notin \bar{A}\).
  • Theorem. If \(S \subseteq \mathbb{R}^n\) and \(\vec{x}_0 \in \mathbb{R}^n\), then \(\vec{x}_0 \in \bar{S} \iff \exists \{ \vec{x}_n \}_{n=1}^\infty \subseteq S\) with \(\vec{x}_n \to \vec{x}_0\).
    • “It’s not possible to take a sequence of points and escape a closed ball”
      1. Leftwards proof. If \(\vec{x}_n \to \vec{x}_0\) and \(\forall n, \vec{x}_n \in S\), then every neighborhood of \(\vec{x}_0\) contains an element of \(S\), namely \(\vec{x}_n\) for \(n\) sufficiently large.
      2. Since \(\vec{x}_n \to \vec{x}_0\), \(\forall \epsilon > 0\), \(\exists N > 0\) such that \(\Vert \vec{x}_n - \vec{x}_0 \Vert < \epsilon\), \(\forall n > N\); any neighborhood \(U\) of \(\vec{x}_0\) contains such a ball, and \(\vec{x}_n \in B_\epsilon ( \vec{x}_0 ) \subseteq U\).
      3. So \(\vec{x}_n \in S \cap U\) for every neighborhood \(U\), and thus \(\vec{x}_0 \in \bar{S}\).
      4. Rightwards proof. Say we have a point in the closure, \(\vec{x}_0 \in \bar{S}\). If \(\vec{x}_0 \in S\), then in fact we can take the constant sequence \(\{ \vec{x}_0 \}_{n=1}^\infty\), which converges to \(\vec{x}_0\).
      5. If \(\vec{x}_0 \in \bar{S}\) but \(\vec{x}_0 \notin S\), then \(B_{1/k} (\vec{x}_0)\) is a neighborhood of \(\vec{x}_0\), \(\forall k \in \mathbb{N}\).
      6. By the previous theorem, \(\forall k, \exists \vec{x}_k \in B_{1/k}(\vec{x}_0) \cap S\). Choosing one for each ball produces a sequence, and this sequence clearly converges to \(\vec{x}_0\).
  • The rightward implication: by taking sequences in a set, you can reach anything in the closure, but no further.
  • Important property of the reals: completeness. Every sequence of real numbers whose terms get close to each other converges to a real number. When we talk about a limit existing, we point towards an existing limit; but even without knowing the limit, we can tell that the numbers get closer and closer to each other.
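The numerical check promised above (my own sketch, not from the lecture): the recurrence \(a_n = 2a_{n-1} - a_{n-2} + 2\) matches the closed form \(a_n = n^2\), and \(C^n / n!\) collapses to zero once \(n\) passes \(C\).

```python
# Check a_n = n^2 for a_1 = 1, a_2 = 4, a_n = 2 a_{n-1} - a_{n-2} + 2
prev, cur = 1, 4                      # a_1, a_2
for n in range(3, 50):
    prev, cur = cur, 2 * cur - prev + 2
    assert cur == n * n

# Watch C^n / n! -> 0 for C = 10
term = 1.0
for n in range(1, 61):
    term *= 10.0 / n                  # term is now 10^n / n!
print(term)                           # ~1e-22: the factorial dominates eventually
```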

If \(S \subseteq \mathbb{R}\) is a subset, an upper bound for \(S\) is a real number \(b \in \mathbb{R}\) such that \(\forall x \in S, x \le b\). Similarly, a lower bound is a \(y \in \mathbb{R}\) such that \(\forall x \in S\), \(y \le x\). Bounds for a set are not unique. The Completeness Axiom, one of the defining properties of the reals (proven in Kemp’s notes, Theorem 4.13): if \(S \subseteq \mathbb{R}\) is non-empty,

  1. If \(S\) has an upper bound, it has a least upper bound
  2. If \(S\) has a lower bound, it has a greatest lower bound

Although this is true for the reals, it is false for the rationals. For example, take \(S = \{ x \in \mathbb{R} \mid x \in \mathbb{Q} \text{ and } x^2 < 2 \}\). There is no least upper bound in the rationals (in the reals it is \(\sqrt{2}\)).

Lecture 4: Sequences, Extrema

  • Monotone Sequence Theorem: every bounded monotone sequence is convergent.
  • Proof sketch (increasing case): let \(l\) be the least upper bound of the sequence \(\{x_n\}\). Since \(l\) is a least upper bound, \(\forall \epsilon > 0, \exists N > 0\) s.t. \(l - \epsilon < x_n\), \(\forall n > N\), as otherwise \(l - \epsilon\) would be a smaller upper bound than \(l\). Thus \(\forall \epsilon > 0, \exists N > 0\) with \(l - \epsilon \le x_n \le l\), \(\forall n > N\). Therefore, \(x_n \to l\).
  • Nested Interval Theorem: consider a sequence of intervals \(\{I_k\}_{k=1}^\infty\), where \(I_k = [a_k, b_k]\), such that \(I_1 \supseteq I_2 \supseteq I_3 \supseteq \cdots\) and \(b_k - a_k \to 0\) as \(k \to \infty\). Then there exists a unique \(x_0\) lying in every \(I_k\). Proof:
    1. Since \(I_1 \supseteq I_2 \supseteq I_3 \supseteq \cdots\), we have \(a_1 \le a_2 \le a_3 \le \cdots\) and \(b_1 \ge b_2 \ge b_3 \ge \cdots\). These are monotone sequences.
    2. These are bounded sequences, since \(a_1 \le a_k \le b_k \le b_1\), \(\forall k\).
    3. We can apply the Monotone Sequence Theorem to show that both sequences are convergent.
    4. Moreover, since \(b_k - a_k \to 0\), they converge to the same element. Call this element \(x_0\). Since \(a_k \le x_0 \le b_k\), \(\forall k\), we get \(x_0 \in [a_k, b_k]\) \(\forall k\), i.e. \(x_0 \in \bigcap_{k=1}^\infty I_k\).
    5. We also have to show that this point is unique. Assume there are two distinct elements \(x_0 \neq x'\) both in the intersection of all the intervals. Then \(\epsilon := \vert x_0 - x' \vert > 0\). But since \(b_k - a_k \to 0\), \(\exists N\) s.t. \(b_N - a_N < \epsilon\). Since \(x_0, x' \in [a_N, b_N]\), we get \(\vert x_0 - x' \vert \le b_N - a_N < \epsilon\), a contradiction.
  • \(\{x_n\} = \{(-1)^n \}\) is bounded but not monotone.
  • If \(\{x_k\}_{k=1}^\infty\) is a sequence, a subsequence \(\{ x_{k_j} \}_{j=1}^\infty\) is defined by an increasing injection \(j \mapsto k_j\).
    • That the injection is increasing means that the ordering is preserved.
    • Example: with \(k_j = 2j\), \(\{ (-1)^k \}_{k=1}^\infty\) has the subsequence \(\{ (-1)^{2j} \}_{j=1}^\infty = \{ 1, 1, 1, \ldots \}\).

Theorem (Bolzano-Weierstrass). Every bounded sequence in \(\mathbb{R}\) has a convergent subsequence. (It doesn’t need to be monotone.) Proof by divide and conquer: something like an inductive argument in which, at each stage, we are forced into one of two choices.

  1. Let \(\{x_k\}_{k=1}^\infty\) be bounded, say \(x_k \in [a, b]\), \(\forall k \in \mathbb{N}\). Let \(I_1 = [a, b]\).
  2. First, bisect the interval \([a, b]\) into \([a, (a+b)/2]\) and \([(a+b)/2, b]\).
  3. One of these subintervals must contain infinitely many terms of the sequence.
  4. If both do, just choose the left one. Call this interval \(I_2 = [a_2, b_2]\).
  5. Bisect \(I_2\) into \([a_2, (a_2 + b_2)/2]\) and \([(a_2 + b_2)/2, b_2]\). Choose a subinterval as above; call it \(I_3\).
  6. We continue to narrow in on whichever half still contains infinitely many terms of the sequence.
  7. Proceeding in this way, we produce a sequence of nested intervals \(I_1 \supseteq I_2 \supseteq I_3 \supseteq \cdots\).
  8. We know, moreover, that their lengths decrease to zero: \(\vert I_k \vert = \frac{b - a}{2^{k-1}}\).
  9. To construct our subsequence, choose \(k_1 \in \mathbb{N}\) s.t. \(x_{k_1} \in I_1\), then \(k_2 > k_1\) s.t. \(x_{k_2} \in I_2\), and so on (possible because each interval contains infinitely many terms). This guarantees that the subsequence eventually lies in every interval and the order of indices is preserved.
  10. By the Nested Interval Theorem, there is some \(x_0 \in \bigcap_{k=1}^\infty I_k\), and \(\{ x_{k_j} \}_{j=1}^\infty\) converges to \(x_0\) (see the bisection sketch after this list).
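A concrete run of the bisection (my own illustration under simplifying assumptions, not from the lecture): for \(x_k = (-1)^k (1 + 1/k)\), “contains infinitely many terms” is approximated by counting hits in a long finite prefix of the sequence, and we pick one term per interval with strictly increasing indices.

```python
xs = [(-1) ** k * (1 + 1 / k) for k in range(1, 200_001)]   # a bounded sequence in [-2, 2]

a, b = -2.0, 2.0
last, picks = -1, []
for _ in range(12):
    mid = (a + b) / 2
    hits_left = sum(1 for v in xs if a <= v <= mid)
    if hits_left > 1000:       # crude finite stand-in for "infinitely many terms"
        b = mid                # keep the left half
    else:
        a = mid                # otherwise keep the right half
    # choose the next term, with a strictly larger index, lying in the chosen half
    last = next(k for k in range(last + 1, len(xs)) if a <= xs[k] <= b)
    picks.append(xs[last])
print(picks[-3:])              # the subsequence closes in on the subsequential limit -1
```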

Corollary. Every bounded sequence in \(\mathbb{R}^n\) has a convergent subsequence. Applying Bolzano-Weierstrass to each component sequence \(\{ x_k^{(j)}\}\) gives, \(\forall j = 1, \ldots, n\), a convergent subsequence in that component, but the subsequences may differ across components, so you cannot yet say that the sequence of vectors converges along a single subsequence. We proceed inductively.

  1. Choose a subsequence of vectors, say \(\{ \vec{x}_{k_l} \}_{l=1}^\infty\), where the first component converges.
  2. Choose a subsubsequence which makes the second components converge. This preserves the property that the first components converge, too.
  3. As long as \(n\) is finite, you can keep on taking subsequences ;)

Definition. A sequence is Cauchy if \(\forall \epsilon > 0, \exists N > 0\) s.t. \(\Vert \vec{x}_n - \vec{x}_m \Vert < \epsilon\) whenever \(n, m > N\). You can talk about a sequence being Cauchy without knowing what it converges to. Its terms pile up: there is an index such that all the terms after that index are very close to each other. This is stronger even than saying that the gaps between consecutive terms go to zero; it says that the terms in general, after a certain point, are all very close to each other.

  • Todd Kemp: the real numbers are equivalence classes of Cauchy sequences of rationals.

Theorem. A sequence in \(\mathbb{R}^n\) is convergent iff it is Cauchy. You don’t need to know the limit to know that the sequence is Cauchy, so it is often easier to show that a sequence is Cauchy than to find the value it converges to. Proof:

  1. If \(\{\vec{x}_n\}_{n=1}^\infty\) converges to \(\vec{x}_0 \in \mathbb{R}^n\), the sequence is Cauchy. Since \(\vec{x}_j - \vec{x}_k = (\vec{x}_j - \vec{x}_0) + (\vec{x}_0 - \vec{x}_k)\), appeal to the triangle inequality: \(\Vert \vec{x}_k - \vec{x}_j \Vert \le \Vert \vec{x}_j - \vec{x}_0 \Vert + \Vert \vec{x}_0 - \vec{x}_k \Vert \to 0\) as \(k, j \to \infty\). (For every \(\epsilon\), choose \(N\) from the definition of convergence applied with \(\epsilon/2\), and require \(\min(j, k) > N\).) Then \(\Vert \vec{x}_k - \vec{x}_j \Vert < \epsilon\).
  2. Suppose \(\{ \vec{x}_k \}_{k=1}^\infty\) is Cauchy. Choosing \(\epsilon = 1\) in the definition of Cauchy, there exists some \(N\) such that \(\Vert \vec{x}_j - \vec{x}_k \Vert < 1\) for all \(j, k > N\). Our sequence, then, is bounded: in particular, \(\Vert \vec{x}_k \Vert < \Vert \vec{x}_{N+1} \Vert + 1\), \(\forall k > N\). Thus the sequence is bounded and hence has a convergent subsequence \(\vec{x}_{k_j} \to \vec{x}_0\). But the Cauchy property shows that the whole sequence has the same limit as the subsequence.
    • Claim: the sequence converges to \(\vec{x}_0\), not just the subsequence.
    • Given \(\epsilon > 0\), \(\exists J > 0\) s.t. \(\Vert \vec{x}_{k_j} - \vec{x}_0 \Vert < \epsilon / 2\) if \(j > J\) (by convergence); further, \(\exists N > 0\) s.t. \(\Vert \vec{x}_k - \vec{x}_m \Vert < \epsilon / 2\) if \(k, m > N\) (Cauchy).
    • Thus \(\Vert \vec{x}_k - \vec{x}_0 \Vert \le \Vert \vec{x}_k - \vec{x}_{k_j} \Vert + \Vert \vec{x}_{k_j} - \vec{x}_0 \Vert < \epsilon / 2 + \epsilon / 2 = \epsilon\) if \(k > N\) and \(j > J\) (choosing \(j\) also large enough that \(k_j > N\)).
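A numeric view of the Cauchy property (my own sketch, not from the notes): for \(x_n = 1/n\), the spread of the tail \(\sup_{n, m > N} \vert x_n - x_m \vert\) shrinks as \(N\) grows, with no reference to the limit.

```python
def tail_spread(N, M=100_000):
    tail = [1 / n for n in range(N + 1, M)]
    return max(tail) - min(tail)   # sup of |x_n - x_m| over N < n, m < M

for N in (10, 100, 1000):
    print(N, tail_spread(N))       # decreases toward 0
```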

Compactness

  • Definition. A set \(S \subseteq \mathbb{R}^n\) is compact if it is closed and bounded; equivalently, every sequence in \(S\) has a convergent subsequence whose limit is in \(S\).
  • Remark: finite sets are compact. Compact sets can be infinite, but behave like finite sets.
  • Theorem (Extreme Value Theorem). If \(f: [a, b] \to \mathbb{R}\) is continuous, it achieves its max and min.

Notes

  • Midterm 1 is going to cover chapter 1 of the book.

Lecture 5: Compactness and Connectedness

COMPACTNESS

Definition. \(S \subseteq \mathbb{R}^n\) is compact if it is closed and bounded.

Theorem. The following are equivalent:

  1. \(S \subseteq \mathbb{R}^n\) is compact
  2. Every sequence of points in \(S\) has a convergent subsequence whose limit is also in \(S\)

Proof:

  1. If \(S\) is closed and bounded, then Bolzano-Weierstrass guarantees that any sequence in \(S\) has a convergent subsequence. Closedness implies the limit lies in \(S\).
  2. Suppose \(S\) is not compact. If \(S\) is not bounded, then there exists a sequence in \(S\) whose norms go to infinity, in which case there is no convergent subsequence. If \(S\) is not closed, then there exists an element \(\vec{x}_0\) in the closure of the set but not in the set itself; because \(\vec{x}_0\) is in the closure, there exists a sequence in \(S\) converging to \(\vec{x}_0\) (choose \(\vec{x}_k \in B_{1/k} (\vec{x}_0) \cap S\), \(\forall k \in \mathbb{N}\)), and every subsequence of it converges to \(\vec{x}_0 \notin S\).

Remark. Every finite set is compact.

  • Compactness is an expansion of useful properties of finite sets to infinite sets.

Theorem. Continuous functions map compact sets to compact sets: if \(f: S \to \mathbb{R}^m\) is continuous and \(S\) is compact, then the image of \(S\) under \(f\) is compact.

Proof:

  1. Suppose we have a sequence of points in the image, \(\{\vec{y}_k\} \subseteq f(S)\); then \(\forall k \in \mathbb{N}, \exists \vec{x}_k \in S : f(\vec{x}_k) = \vec{y}_k\).
  2. Since \(S\) is compact, there exists a subsequence \(\vec{x}_{k_j}\) which converges in \(S\): \(\vec{x}_{k_j} \to \vec{x}_0 \in S\).
  3. Since \(f\) is continuous at \(\vec{x}_0\), \(f(\vec{x}_{k_j}) \to f(\vec{x}_0)\)
  4. So \(\{\vec{y}_{k_j}\} = \{f(\vec{x}_{k_j})\} \to f(\vec{x}_0) \in f(S)\).

Extreme Value Theorem. Suppose \(f : S \subseteq \mathbb{R}^n \to \mathbb{R}\) is continuous and \(S\) is compact. Then \(f\) has a max and min value attained in \(S\), i.e. \(\exists \vec{a}, \vec{b} \in S\) s.t. \(f(\vec{a}) \le f(\vec{x}) \le f(\vec{b})\), \(\forall \vec{x} \in S\). These values are attained in the set. We know we have suprema and infima in the reals, but we don’t automatically have max and min values; this theorem says that we do have them when the domain is compact.

Definition. Toward maybe the most general notion of compactness. Let \(S \subseteq \mathbb{R}^n\). A collection \(\{ \mathcal{U}_\alpha \}_{\alpha \in A}\) of subsets of \(\mathbb{R}^n\) is a cover of \(S\) if \(S \subseteq \bigcup_{\alpha \in A} \mathcal{U}_\alpha\); it is an open cover if each \(\mathcal{U}_\alpha\) is open.

Heine-Borel Theorem. If \(S\) is a subset of \(\mathbb{R}^n\), then \(S\) is compact iff every open cover of \(S\) has a finite subcover.

Example. Let \(S = [0, 1]\), which is compact. Then for fixed \(\epsilon > 0\), the collection \(\{B_\epsilon(x)\}_{x \in [0, 1]}\) is an open cover of \(S\), and finitely many of these balls already suffice to cover \([0, 1]\).

Definition. (Metric Spaces.) A metric space is a set \(X\) with a function \(d : X \times X \to \mathbb{R}\) satisfying the following properties for all \(x, y, z \in X\):

  1. Symmetry: \(d(x, y) = d(y, x)\)
  2. Positivity: \(d(x, y) \ge 0\)
  3. Definiteness: \(d(x, y) = 0 \iff x = y\)
  4. Triangle Inequality: \(d(x, z) \le d(x, y) + d(y, z)\)

Examples of metric spaces.

  • Real space with Euclidean distance: \(\mathbb{R}^n\) with \(d(\vec{x}, \vec{y}) = \Vert \vec{x} - \vec{y} \Vert_2\)
  • Real space with \(L^\infty\) distance: \(\mathbb{R}^n\) with \(d(\vec{x}, \vec{y}) = \Vert \vec{x} - \vec{y} \Vert_\infty\)
  • Function space with sup-norm: \(C^0 ([0, 1]) = \{ f: [0, 1] \to \mathbb{R} \mid f \text{ is continuous}\}\) with \(d(f, g) = \Vert f - g \Vert_\infty := \sup_{x \in [0, 1]} \vert f(x) - g(x) \vert\)

Definition. If \((X, d)\) is a metric space, an \(\epsilon\)-ball is \(B_\epsilon (x) = \{ y \in X \mid d(x, y) < \epsilon \}\).

Definition. A set \(S \subseteq (X, d)\) is open if \(\forall x \in S, \exists \epsilon > 0\) s.t. \(B_\epsilon (x) \subseteq S\). Closed sets are complements of open sets.

What is compactness now? A sequential definition of compactness: every sequence has a convergent subsequence whose limit is in \(S\). Consider the sequence \(\{ f_n \}_{n=1}^\infty\) in \(C^0([0, 1])\), \(f_n(x) = x^n\). The sequence is bounded (in sup-norm, by 2), but its “limit” is discontinuous: it is \(y = 0\) on \([0, 1)\) and \(1\) at \(1\). Clearly this sequence does not have a convergent subsequence, because it converges pointwise to something outside of our space. So compactness needs to be more subtle when thinking about spaces that are infinite-dimensional.
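A numeric check of this failure (my own sketch, not from the lecture): in the sup metric, \(f_n(x) = x^n\) stays at distance about 1 from its pointwise limit, so no subsequence can converge to it in \(C^0([0, 1])\).

```python
# g is the (discontinuous) pointwise limit of x^n on [0, 1]
def sup_dist(f, g, grid=10_000):
    xs = [i / grid for i in range(grid + 1)]
    return max(abs(f(x) - g(x)) for x in xs)   # grid approximation of the sup metric

g = lambda x: 1.0 if x == 1.0 else 0.0
for n in (5, 50, 500):
    f_n = lambda x, n=n: x ** n
    print(n, sup_dist(f_n, g))   # stays near 1 no matter how large n gets
```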

CONNECTEDNESS

Definition. A separation of a set \(S \subseteq \mathbb{R}^n\) is a pair of disjoint nonempty sets \(A, B \subseteq \mathbb{R}^n\) such that \(S = A \cup B\) and \(\bar{A} \cap B = A \cap \bar{B} = \emptyset\). A set is connected if no separation exists.

Example:

  • The set \(S = \{ (x, y) \mid (x - 1)^2 + y^2 < 1\} \cup \{(x, y) \mid (x + 1)^2 + y^2 < 1 \}\), two open disks touching at the origin
  • The origin does not belong to either disk, so the two disks form a separation: \(S\) is disconnected even though the closures touch

Theorem. The connected subsets of \(\mathbb{R}\) are the intervals (open, closed, half-open, unbounded).

Proof:

  • If \(S\) is not an interval, it is disconnected. Not being an interval means that there exist \(a, b \in S\) and \(c \in \mathbb{R}\) s.t. \(a < c < b\) but \(c \notin S\). Set \(A = S \cap (-\infty, c)\), \(B = S \cap (c, \infty)\); both are nonempty (\(a \in A\), \(b \in B\)). The only point that \(\bar{A}\) and \(\bar{B}\) could have in common is \(c\), but \(c \notin S\), so \(\bar{A} \cap B = A \cap \bar{B} = \emptyset\) and this is a separation.
  • If \(S = [a, b]\), say \(S\) has some separation \(A \cup B = S\). WLOG, we can assume \(a \in A\), \(b \in B\). Let’s set \(c = \sup A\), so \(c \in \bar{A}\). Because this is a separation, \(c \notin B\) by the definition of separation; therefore \(c \in A\). Because \(c \in A\) and \(c \notin B\), in particular \(c \neq b\). But then \((c, b] \subseteq B\), so \(c \in \bar{B}\), and thus \(c\) cannot be in \(A\). This is a contradiction, so \([a, b]\) must be connected. The other interval types (half-open, unbounded) are handled by a similar argument.

Intermediate Value Theorem. The continuous image of a connected set is connected: if \(f : S \subseteq \mathbb{R}^n \to \mathbb{R}^m\) is continuous and \(S\) is connected, then \(f(S)\) is connected.

Proof by contrapositive: if the image is disconnected, then the original set is disconnected.

  1. If \(A \cup B = f(S)\) is a separation of the image, then \(f^{-1}(A) \cup f^{-1}(B) = S\); we claim it is a separation of the original set.
  2. Both of these sets are nonempty, because \(A\) and \(B\) are nonempty subsets of the image.
  3. Take \(\vec{x}_0 \in f^{-1}(A)\) and suppose, for contradiction, \(\vec{x}_0 \in \overline{f^{-1}(B)}\).
  4. Use the sequential definition of belonging to the closure: there exists a sequence of points \(\vec{x}_k \in f^{-1}(B)\) which converge, \(\vec{x}_k \to \vec{x}_0\).
  5. Since \(f\) is continuous, \(f(\vec{x}_k) \to f(\vec{x}_0)\), and each \(f(\vec{x}_k) \in B\), so \(f(\vec{x}_0) \in \bar{B}\).
  6. But then \(f(\vec{x}_0) \in A \cap \bar{B}\), contradicting that \(A \cup B = f(S)\) is a separation.

Corollary: If \(f : S \subseteq \mathbb{R}^n \to \mathbb{R}\) and \(S\) is connected, then \(f(S)\) is an interval. Thus \(\forall \vec{a}, \vec{b} \in S\) and any \(t\) with \(f(\vec{a}) < t < f(\vec{b})\), there must exist a third element \(\vec{c} \in S\) such that \(f(\vec{c}) = t\)

Example: The equation \(f(x) = x^{367} + \frac{57}{3}x^{250} + \frac{e^{\pi^2}}{42} x^{83} + x^2 + 17 = 0\) has a solution in \(\mathbb{R}\). How do we know it exists? Because of the intermediate value theorem: at some point the function is positive, and at some point it is negative, which means that at some point in between it is zero.

Definition. A set \(S\) is path-connected if \(\forall \vec{x}, \vec{y} \in S\), there exists a continuous \(g : [0, 1] \to S\) with \(g(0) = \vec{x}\), \(g(1) = \vec{y}\). Such a function defines a path in the set \(S\).

Theorem. If \(S\) is path-connected, then \(S\) is connected. To prove, show the contrapositive, with a proof by contradiction as needed.

Topologist’s Sine Curve. It’s possible to be connected but not path-connected. Let \(S = \{ (0, y) \mid y \in [-1, 1] \} \cup \{ (x, \sin(\pi / x)) \mid 0 < x < 2 \}\). These two pieces do not form a separation, because the closure of the curve meets the vertical segment. But for any point on the vertical segment, there is no continuous path in the set which goes from it to a point on the curve, because such a path would need to pass through the crazy part of the curve.

Notes

  • Every function is surjective onto its own image

Lecture 6: Uniform Continuity

  • When we say \(f : S \subseteq \mathbb{R}^n \to \mathbb{R}^m\) is continuous on \(S\), this means that at every point of \(S\), \(f\) is continuous at that point (in the epsilon-delta sense).
  • However, note that the \(\delta\) we choose may depend on \(\vec{x}\): we need to adapt \(\delta\) to the given conditions, i.e. the shape of \(f\) around \(\vec{x}\) and the tightness of \(\epsilon\).

Uniform continuity. A function \(f : S \subseteq \mathbb{R}^n \to \mathbb{R}^m\) is uniformly continuous on \(S\) if the following definition is satisfied.

\[\forall \epsilon > 0, \exists \delta > 0 : \forall \vec{x}, \vec{y} \in S, \Vert \vec{x} - \vec{y} \Vert < \delta \implies \Vert f(\vec{x}) - f(\vec{y}) \Vert < \epsilon\]
  • Example: \(f(x) = \sin(x)\) is uniformly continuous on \(\mathbb{R}\).

Generally speaking, if \(f\) satisfies \(\exists M > 0\) such that \(\forall \vec{x}, \vec{y} \in S\), \(\Vert f(\vec{x}) - f(\vec{y}) \Vert \le M \Vert \vec{x} - \vec{y} \Vert\), then \(f\) is uniformly continuous on \(S\), with \(\delta = \epsilon / M\).

This is the Lipschitz inequality with Lipschitz constant \(M\). We say \(f\) is \(M\)-Lipschitz on \(S\).
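A small check of the Lipschitz recipe (my own sketch, not from the notes): \(f(x) = \sin(x)\) is 1-Lipschitz, so \(\delta = \epsilon / M\) with \(M = 1\) works at every sampled point simultaneously.

```python
import math, random

M, eps = 1.0, 1e-3
delta = eps / M
random.seed(2)
for _ in range(100_000):
    x = random.uniform(-50, 50)
    y = x + random.uniform(-delta, delta)         # any pair with |x - y| <= delta
    assert abs(math.sin(x) - math.sin(y)) <= eps  # satisfies |f(x) - f(y)| <= eps
print("one delta =", delta, "works for eps =", eps, "across the whole sampled range")
```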

Consider \(f(x) = e^x : \mathbb{R} \to \mathbb{R}\). This is not uniformly continuous on \(\mathbb{R}\). Assume that it is. Let \(\epsilon = 1\). Then there exists some positive \(\delta\) such that \(\forall x, y \in \mathbb{R}\), \(\vert x - y \vert < \delta \implies \vert e^x - e^y \vert < 1\). Notice that \(e^{\delta / 2} - 1 > 0\) and \(\lim_{x \to \infty} e^x = \infty\); thus \(\exists x_0 \in \mathbb{R}\) s.t. \(e^{x_0} (e^{\delta / 2} - 1) > 1\). But then \(\vert e^{x_0 + \delta / 2} - e^{x_0} \vert > 1\) while \(\vert (x_0 + \delta/2) - x_0 \vert < \delta\), which is a contradiction.

But who cares about uniform continuity?

  • Suppose \(f : S \subseteq \mathbb{R}^n \to \mathbb{R}^m\) is continuous, and \(\{ \vec{x}_n \}_{n=1}^\infty \subseteq S\) is Cauchy. Is \(\{f(\vec{x}_n)\}_{n=1}^\infty\) Cauchy?
  • Not necessarily. Consider \(S = (0, 1)\), \(f(x) = 1/x : (0, 1) \to \mathbb{R}\). The sequence \(\{ x_n \}_{n=1}^\infty = \{ 1/n \}_{n=1}^\infty \subseteq S\) is Cauchy (it converges, to a point outside the set: \(x_n \to 0\)), but \(\{ f(x_n) \}_{n=1}^\infty = \{ n \}_{n=1}^\infty\) is not.

Theorem. If \(f : S \subseteq \mathbb{R}^n \to \mathbb{R}^m\) is uniformly continuous on \(S\), then \(f\) sends Cauchy sequences to Cauchy sequences.

Theorem. (Heine, 1870). Suppose \(S \subseteq \mathbb{R}^n\) is compact. Then if \(f : S \to \mathbb{R}^m\) is continuous, it is uniformly continuous.

(Proof.)

  1. Assume for the sake of contradiction that \(f\) is not uniformly continuous.
  2. This means that there exists an \(\epsilon\) such that for every \(\delta > 0\), there exist \(\vec{x}, \vec{y} \in S\) such that \(\Vert \vec{x} - \vec{y} \Vert < \delta\) and \(\Vert f(\vec{x}) - f(\vec{y}) \Vert > \epsilon\).
  3. Run through the sequence \(\delta = 1, 1/2, 1/3, \ldots, 1/n, \ldots\)
  4. For each element of this sequence, \(\exists \vec{x}_n, \vec{y}_n\) s.t. \(\Vert \vec{x}_n - \vec{y}_n \Vert < 1/n\) and \(\Vert f(\vec{x}_n) - f(\vec{y}_n) \Vert > \epsilon\).
  5. By the BW theorem (and compactness of \(S\)), there exists a subsequence \(\vec{x}_{n_j} \to \vec{a} \in S\); also \(\Vert \vec{x}_{n_j} - \vec{y}_{n_j} \Vert \to 0\), so \(\vec{y}_{n_j} \to \vec{a}\).
  6. Therefore, by continuity at \(\vec{a}\), \(\Vert f(\vec{x}_{n_j}) - f(\vec{y}_{n_j}) \Vert \to 0\).
  7. This contradicts \(\Vert f(\vec{x}_{n_j}) - f(\vec{y}_{n_j}) \Vert > \epsilon\).

Lecture 7: Differential Calculus

Some history

  • Many real analysis courses are taught “backwards”
  • Sharaf al-Din al-Tusi, 12th century Iranian mathematician, derivative of cubic equation
  • Bhaskara II, 12th century Indian mathematician, derivative of sine function, Taylor approximations
  • Merton School, 14th century, mean value theorem, etc.
  • An explosion of interest in calculus and derivatives in the 17th century, growing out of practical questions, trying to solve this type of question: You have a curve which is the graph of a function. You want to find the tangent line to that point.
  • Applications:
    • Angles of intersecting curves (Descartes)
    • Building telescopes and clocks (Galilei, Huygens)
    • Finding maxes and mins of functions (Newton, Fermat)
    • Astronomy (Kepler, Newton)

Derivatives

  • Leibniz’s argument: if \(x\) changes by \(\delta x\), then \(y = x^2\) should change to \(y + \delta y = (x + \delta x)^2 = x^2 + 2x\,\delta x + (\delta x)^2\). Relabel \(\delta x, \delta y\) as \(dx, dy\); as they become ‘infinitely small’, \((\delta x)^2\) gets infinitely smaller still. Also subtract \(y = x^2\). So we get \(dy = 2x\,dx\), or \(dy/dx = 2x\).
  • But what does “infinitely small” mean? When can we just drop \((\delta x)^2\)?
  • Leibniz is considering some kind of limiting argument.
  • Consider this as a sort of linear approximation.
  • Idea: given \(f : \mathbb{R} \to \mathbb{R}\), find a linear function \(l(x) = mx + b\); \(m, b \in \mathbb{R}\), approximating \(f(x)\) at \(x = a\).
    1. Basic criterion: \(f(a) = l(a)\), i.e. \(f(a) = ma + b\), so \(l(x) = m(x - a) + f(a)\)
    2. The difference \(f(x) - l(x)\) goes to zero faster than \(x - a\) as \(x \to a\). In other words, \(\frac{f(x) - l(x)}{x - a} \to 0\) as \(x \to a\).
  • We can rewrite the error of the linear approximation. Let \(h = x - a\). Then \(f(x) - l(x) = f(a + h) - f(a) - mh =: E(h)\), the “error”.

Definition. Let \(f : I \subseteq \mathbb{R} \to \mathbb{R}\) where \(a \in I\). We say \(f\) is differentiable at \(a\) if there exists \(m \in \mathbb{R}\) s.t. \(f(a + h) = f(a) + mh + E(h)\) where \(\lim_{h \to 0} E(h) / h = 0\), with \(h = x - a\). We call \(m\) the derivative of \(f\) at \(a\). (Note that this is a first-order Taylor approximation.)

But we can rearrange: \(m = \frac{f(a + h) - f(a) - E(h)}{h}\), so \(m = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h}\) (the error term vanishes in the limit). But if you think of it in the first way, you’re viewing \(f(a+h)\) as a linear approximation plus an error.
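A numeric look at the error term (my own sketch, not from the notes): for \(f(x) = x^2\) at \(a = 1\), we have \(m = 2\) and \(E(h) = h^2\), so \(E(h)/h \to 0\).

```python
f, a, m = (lambda x: x ** 2), 1.0, 2.0
for h in (1e-1, 1e-3, 1e-6):
    E = f(a + h) - f(a) - m * h    # E(h) = h^2 here
    print(h, E / h)                # goes to 0, confirming f'(1) = 2
```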

The condition \(\lim_{h \to 0} \frac{E(h)}{h} = 0\) is often denoted “\(E(h)\) is \(o(h)\)”, read “little-oh of \(h\)”. Thus differentiability is described as “\(f(a + h)\) is linear \(+\ o(h)\)”.

Note that if \(\lim_{h \to 0} \frac{E(h)}{h} = 0\), then \(\lim_{h \to 0} E(h) = 0\). Then \(\lim_{h \to 0} f(a + h) - f(a) - mh = \lim_{h \to 0} E(h) = 0\), which implies that \(f\) is continuous at \(a\): differentiability at a point implies continuity at that point.

We will write the value of \(m\) as \(f'(a)\), the derivative of \(f\) at \(a\). (Differentiability at a single point does not require that a derivative function \(f'(x)\) exist near \(a\) with \(m = f'(x) \vert_{x = a}\).)

Example. Proof of the product rule. If \(f, g\) are differentiable at \(a\), the derivative of \(f(x)g(x)\) at \(a\) is \(f'(a) g(a) + f(a) g'(a)\). Since \(f, g\) are differentiable, we can write out their linear approximations: \(f(a + h) = f(a) + f'(a) h + E_1(h)\), \(g(a + h) = g(a) + g'(a)h + E_2(h)\). Multiplying, \(f(a + h) g(a + h) = f(a) g(a) + \left[ f'(a) g(a) + f(a) g'(a) \right]h + E_3(h)\), where the really ugly collected error term is \(E_3(h) = \left( f(a) + f'(a) h + E_1(h)\right) E_2(h) + \left( g(a) + g'(a) h \right) E_1(h) + f'(a) g'(a) h^2\). This is all little-oh: each term, divided by \(h\), still goes to zero as \(h \to 0\).


Lecture 8: Derivatives in 1 Variable, Several Variables, the Mean Value Theorem

  • 45 or above: A; between 37 and 44: B; between 30 and 36: C; 29 or below: D
  • Average: 38 (81%)
  • Median: 40 (84%)

Definition. Let \(f : I \subseteq \mathbb{R} \to \mathbb{R}\), where \(I\) is an open interval and \(a \in I\). \(f\) is differentiable at \(a\) if there exists \(m \in \mathbb{R}\) s.t. \(f(a + h) = f(a) + mh + E(h)\), such that \(\lim_{h \to 0} \frac{E(h)}{h} = 0\).

Note that \(\frac{f(a+h) - f(a)}{h} \to m\) as \(h \to 0\).

Lemma. Say we have \(f : I \subseteq \mathbb{R} \to \mathbb{R}\) with \(I\) open and \(a \in I\), and \(a\) is a local maximum or minimum of the function. If \(f\) is differentiable at \(a\), then \(f'(a) = 0\).

Proof.

  1. Suppose \(a\) is a local min (the max case is similar): \(f(a + h) \ge f(a)\) for all \(h\) sufficiently small.
  2. Then \((f(a + h) - f(a)) / h\) has the same sign as \(h\).
  3. Look at the left and right limits as \(h \to 0^\pm\): the right limit is \(\ge 0\) and the left limit is \(\le 0\).
  4. Since \(f\) is differentiable, the two limits must agree, so \(f'(a) = 0\).

Rolle’s Theorem. One of the earliest ideas about calculus. If \(f\) is continuous on a closed interval \([a, b]\), differentiable on \((a, b)\), and \(f(a) = f(b)\), then there exists \(c \in (a, b)\) s.t. \(f'(c) = 0\). “What comes up must come down.”

Proof.

  1. From the extreme value theorem, because the closed interval is compact, any continuous function on it attains a max and a min.
  2. If both are attained at the endpoints, then since \(\max = f(a) = f(b) = \min\), \(f\) is constant, and the derivative of a constant function is 0.
  3. Otherwise, a min or max occurs inside the interval, and by the lemma the derivative vanishes there.

Non-constructive proof: doesn’t tell you where the min or max is. You can guarantee your tangent slope will vanish somewhere in the interval.

One-variable mean value theorem. Important consequence of Rolle’s theorem. Say \(f\) is continuous on \([a, b]\) and differentiable on \((a, b)\). Then \(\exists c \in (a, b)\) s.t. \(f'(c) = \frac{f(b) - f(a)}{b - a}\). The line through the two endpoints is the secant line; the slope of the tangent line at \(c\) is the same as the slope of the secant line.

Proof

  1. \((a, f(a))\) and \((b, f(b))\) are connected by the line \(l(x) = f(a) + \frac{f(b) - f(a)}{b - a} (x - a)\).
  2. We claim \(\exists c \in (a, b)\) where \(f'(c)\) equals the slope of this secant line.
  3. Write down a new function which is the difference of the other two: \(g(x) = f(x) - l(x)\). Then \(g'(c) = 0\) means that \(f'(c) = l'(c)\), the secant slope.
  4. Since \(g(a) = g(b) = 0\), we can apply Rolle’s theorem to \(g\), which gives \(g'(c) = 0\) for some \(c \in (a, b)\).

Definition. We say that \(f\) is increasing or decreasing if \(\forall a < b\), \(f(a) \le f(b)\) or \(f(a) \ge f(b)\), respectively; strictly increasing or strictly decreasing if \(f(a) < f(b)\) or \(f(a) > f(b)\), respectively.

Theorem. Say \(f\) is differentiable on \(I\), an open interval.

  1. If \(\vert f'(x) \vert \le C\), \(\forall x \in I\), then \(f\) is Lipschitz with constant \(C\). That is, \(\vert f(x) - f(y) \vert \le C \vert x - y \vert\).
  2. If \(f'(x) = 0\), \(\forall x \in I\), then \(f\) is constant.
  3. If \(f'(x) \ge 0\), \(\forall x \in I\), then \(f\) is increasing.

Proof.

  1. For \(a, b \in I\), by the MVT, we have \(c \in (a, b)\) s.t. \(f'(c) (b - a) = f(b) - f(a)\).
  2. Then hypothesis 1 implies \(\vert f'(c) \vert \le C\), and hypothesis 2 implies \(f(b) = f(a)\).
  3. For 1: the Lipschitz constant is the bound on the absolute value of the derivative, since \(\vert f(b) - f(a) \vert = \vert f'(c) \vert \vert b - a \vert \le C \vert b - a \vert\).
  4. For 3: if \(f'(c) \ge 0\), then \(f(b) - f(a) \ge 0\). So \(f(a) \le f(b)\).

Corollary. For any \(a > 0\), we have

\[\lim_{x \to +\infty} \frac{x^a}{e^x} = \lim_{x \to +\infty} \frac{\log x}{x^a} = \lim_{x \to 0^+} \frac{\log x}{x^{-a}} = 0\]

At infinity, exponentials grow faster than any power of \(x\), and any positive power of \(x\) grows faster than \(\log\); as \(x \to 0^+\), \(\log x\) is dominated by any negative power of \(x\).
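A quick numeric check (my own sketch, not from the notes) of the growth hierarchy, with \(a = 3\):

```python
import math

a = 3.0
for x in (10.0, 50.0, 200.0):
    print(x, x ** a / math.exp(x), math.log(x) / x ** a)   # both ratios head to 0
```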

L’Hopital’s rule: the functions need to be continuous and differentiable in a neighborhood of the point.

1-variable, vector-valued functions. Suppose you have a function \(f : \mathbb{R} \to \mathbb{R}^n\); the input is a scalar quantity, but the output is a vector. The derivative at \(a\) is \(f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h} = (f_1'(a), f'_2(a), \ldots, f'_n(a))\): \(f\) is differentiable at \(a\) iff \(f_j\) is differentiable at \(a\) for all \(j = 1, \ldots, n\). If \(\phi : \mathbb{R} \to \mathbb{R}\) and \(f, g : \mathbb{R} \to \mathbb{R}^n\) are differentiable, then \((\phi f)' = \phi' f + \phi f'\) and \(\langle f, g \rangle' = \langle f', g \rangle + \langle f, g' \rangle\). It follows that \((\Vert f \Vert^2)' = 2 \langle f, f' \rangle\).

Think about these functions as parametrized curves in $\mathbb{R}^n$: $t \mapsto f(t)$, e.g. the path of a particle, in which case $t \mapsto f'(t)$ gives the velocity vectors of the particle.

The tangent line to $f$ at $t_0$ is $l(t) = f(t_0) + (t - t_0) f'(t_0)$.
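
A small sketch of this first-order approximation (the helix below is an arbitrary smooth example): near $t_0$, the curve and its tangent line should agree to first order.

```python
import numpy as np

# A parametrized curve in R^3 (a helix) and its tangent line at t0:
# l(t) = f(t0) + (t - t0) f'(t0).
f  = lambda t: np.array([np.cos(t), np.sin(t), t])
fp = lambda t: np.array([-np.sin(t), np.cos(t), 1.0])   # exact derivative

t0 = np.pi / 4
l = lambda t: f(t0) + (t - t0) * fp(t0)

# The gap between curve and tangent line shrinks like dt^2:
for dt in [1e-1, 1e-2, 1e-3]:
    print(dt, np.linalg.norm(f(t0 + dt) - l(t0 + dt)))
```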

Scalar-valued, multi-variable differentiability. $f : D \subseteq \mathbb{R}^n \to \mathbb{R}$. Start with a partial derivative: the derivative of a function w.r.t. one variable while fixing the others as constant. Suppose $f(\vec{x}) = f(x_1, ..., x_n)$. Then
\[\frac{\partial f}{\partial x_j} (\vec{x}) = \lim_{h \to 0} \frac{f(x_1, ..., x_j + h, ..., x_n) - f(x_1, ..., x_j, ..., x_n)}{h}\]
This just changes a single component of the input.

Partial derivatives don’t tell you the whole story about a function’s local behavior.

Definition. A function $f : D \subseteq \mathbb{R}^n \to \mathbb{R}$ is differentiable at $\vec{a} \in D$ if there exists $\vec{c} \in \mathbb{R}^n$ s.t. $f(\vec{a} + \vec{h}) = f(\vec{a}) + \langle \vec{c}, \vec{h} \rangle + E(\vec{h})$ where $\lim_{\vec{h} \to \vec{0}} \frac{E(\vec{h})}{\Vert \vec{h} \Vert} = 0$. Equivalently, $\lim_{\vec{h} \to \vec{0}} \frac{f(\vec{a} + \vec{h}) - f(\vec{a}) - \langle \vec{c}, \vec{h} \rangle}{\Vert \vec{h} \Vert} = 0$. If this limit exists, we write $\vec{c} = \nabla f(\vec{a})$, the gradient of $f$ at $\vec{a}$.

For example, $z = f(\vec{x})$ is a surface, and $z = f(\vec{a}) + \langle \nabla f(\vec{a}), \vec{x} - \vec{a} \rangle$ is a plane. $f$ is differentiable at $\vec{a}$ if $Z_{\text{surface}} - Z_{\text{plane}} = f(\vec{x}) - f(\vec{a}) - \langle \nabla f(\vec{a}), \vec{x} - \vec{a} \rangle = E(\vec{x} - \vec{a}) \to 0$ faster than $\Vert \vec{x} - \vec{a} \Vert \to 0$. At any point of differentiability, the tangent plane is a good approximation for nearby points on the surface.

Theorem. If $f$ is differentiable at $\vec{a}$, then all $\partial_j f(\vec{a})$ exist, and $\nabla f(\vec{a}) = (\partial_1 f(\vec{a}), ..., \partial_n f(\vec{a}))$.

(In one dimension this reads: $f$ is differentiable at $a$ if there exists $m \in \mathbb{R}$ s.t. $f(a + h) = f(a) + m h + E(h)$, where $\lim_{h \to 0} \frac{E(h)}{\vert h \vert} = 0$.)

Partials are all components of the derivative. If we know something about all of the partials, we can conclude something about the derivative at a point.
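
To make the definition concrete, here is a hedged numerical sketch (the function $f(x, y) = x^2 + 3xy$, the base point, and the step sizes are arbitrary choices): assemble the candidate gradient from finite-difference partials and watch $E(\vec{h}) / \Vert \vec{h} \Vert$ shrink.

```python
import numpy as np

# Check the definition of differentiability for a sample f(x, y) = x^2 + 3xy.
f = lambda x: x[0]**2 + 3 * x[0] * x[1]
a = np.array([1.0, 2.0])

eps = 1e-6
grad = np.array([
    (f(a + np.array([eps, 0.0])) - f(a)) / eps,   # approximates the x1-partial
    (f(a + np.array([0.0, eps])) - f(a)) / eps,   # approximates the x2-partial
])

# E(h) / ||h|| should go to 0 as h -> 0.
for t in [1e-1, 1e-2, 1e-3]:
    h = t * np.array([1.0, 1.0])
    E = f(a + h) - f(a) - grad @ h
    print(t, E / np.linalg.norm(h))
```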

Theorem. If $f$ is differentiable at $\vec{a}$, then $f$ is continuous at $\vec{a}$.

Theorem. If $f : D \subseteq \mathbb{R}^n \to \mathbb{R}$ and for $\vec{a} \in D$, all $\partial_j f$ exist in a neighborhood of $\vec{a}$ and are continuous at $\vec{a}$, then $f$ is differentiable at $\vec{a}$.

Proof in 2D for simplicity.

  1. We want to show that $\frac{f(\vec{a} + \vec{h}) - f(\vec{a}) - \langle \vec{c}, \vec{h} \rangle}{\Vert \vec{h} \Vert} \to 0$ as $\vec{h} \to \vec{0}$.
  2. Let $\vec{c} = (\partial_1 f(\vec{a}), \partial_2 f(\vec{a}))$ and write $\vec{h} = (h_1, h_2)$.
  3. Split the increment: $f(\vec{a} + \vec{h}) - f(\vec{a}) = [f(a_1 + h_1, a_2 + h_2) - f(a_1, a_2 + h_2)] + [f(a_1, a_2 + h_2) - f(a_1, a_2)]$.
  4. Use the one-variable mean value theorem on each bracket: the first equals $\partial_1 f(a_1 + c_1, a_2 + h_2) h_1$ and the second equals $\partial_2 f(a_1, a_2 + c_2) h_2$ for some $c_1, c_2$ between $0$ and $h_1, h_2$.
  5. For $\Vert \vec{h} \Vert < \delta$ sufficiently small, continuity of the partials at $\vec{a}$ makes each bracketed difference below smaller than $\epsilon / 2$.
  6. Therefore $\left\vert \frac{f(\vec{a} + \vec{h}) - f(\vec{a}) - \langle \vec{c}, \vec{h} \rangle}{\Vert \vec{h} \Vert} \right\vert \le \vert \partial_1 f(a_1 + c_1, a_2 + h_2) - \partial_1 f(a_1, a_2) \vert \frac{\vert h_1 \vert}{\Vert \vec{h} \Vert} + \vert \partial_2 f(a_1, a_2 + c_2) - \partial_2 f(a_1, a_2) \vert \frac{\vert h_2 \vert}{\Vert \vec{h} \Vert} \le \epsilon/2 + \epsilon/2 = \epsilon$.
  7. The same argument works in $n$ dimensions: apply the mean value theorem in every coordinate.

Lecture 9: $C^1$-functions, chain rule, gradients, Kepler’s laws

Theorem. If $f : D \subseteq \mathbb{R}^n \to \mathbb{R}$, $\vec{a} \in D$, and the $\partial_j f$ exist and are continuous in a neighborhood of $\vec{a}$, then $f$ is differentiable at $\vec{a}$.

Definition. $C^1(D) = \{ f : D \subseteq \mathbb{R}^n \to \mathbb{R} \mid \text{all partials exist and are continuous on } D \}$

Definition. Given $\vec{a} \in \mathbb{R}^n$ and $\vec{u}$ with $\Vert \vec{u} \Vert = 1$, the directional derivative of $f : \mathbb{R}^n \to \mathbb{R}$ at $\vec{a}$ in direction $\vec{u}$ is the following limit:

\[\partial_{\vec{u}} f(\vec{a}) = \lim_{t \to 0} \frac{f(\vec{a} + t \vec{u}) - f(\vec{a})}{t}\]

Theorem. If $f$ is differentiable at $\vec{a}$, then the directional derivatives at $\vec{a}$ in any direction exist, and $\partial_{\vec{u}} f(\vec{a}) = \nabla f(\vec{a}) \cdot \vec{u}$.

This implies $\partial_j f(\vec{a}) = \nabla f(\vec{a}) \cdot \vec{e}_j$, where $\vec{e}_j$ is the standard basis vector with a 1 in position $j$ and 0s elsewhere.

Proof.

  1. Since $f$ is differentiable at $\vec{a}$, we have $\frac{f(\vec{a} + \vec{h}) - f(\vec{a}) - \nabla f(\vec{a}) \cdot \vec{h}}{\Vert \vec{h} \Vert} \to 0$ as $\vec{h} \to \vec{0}$.
  2. Let $\vec{h} = t \vec{u}$, so $\Vert \vec{h} \Vert = \vert t \vert$.
  3. For $t > 0$, $\Vert \vec{h} \Vert = t$ and the quotient becomes $\frac{f(\vec{a} + t \vec{u}) - f(\vec{a}) - \nabla f(\vec{a}) \cdot (t \vec{u})}{t}$.
  4. We can factor out the $t$: $\frac{f(\vec{a} + t \vec{u}) - f(\vec{a})}{t} - \nabla f(\vec{a}) \cdot \vec{u} \to 0$ as $t \to 0^+$.
  5. If $t < 0$, since $\Vert \vec{h} \Vert = -t$, then $-\frac{f(\vec{a} + t \vec{u}) - f(\vec{a})}{t} + \nabla f(\vec{a}) \cdot \vec{u} \to 0$ as $t \to 0^-$, giving the same limit from both sides.

Nice property of the gradient. Among all directions $\vec{u}$, your function grows quickest in the gradient direction, since $\partial_{\vec{u}} f(\vec{a}) = \nabla f(\vec{a}) \cdot \vec{u} = \Vert \nabla f(\vec{a}) \Vert \Vert \vec{u} \Vert \cos \theta = \Vert \nabla f(\vec{a}) \Vert \cos \theta \le \Vert \nabla f(\vec{a}) \Vert$, with equality when $\vec{u}$ points along $\nabla f(\vec{a})$. The gradient at every point $\vec{a}$ always directs you towards the direction of maximal increase.
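
A numerical illustration (the sample $f(x, y) = \sin(x) y$ and the base point are arbitrary choices): difference quotients in several directions never exceed $\Vert \nabla f(\vec{a}) \Vert$.

```python
import numpy as np

# Directional derivatives vs. the gradient for a sample f(x, y) = sin(x) y.
f = lambda x: np.sin(x[0]) * x[1]
a = np.array([0.5, 2.0])
grad = np.array([np.cos(a[0]) * a[1], np.sin(a[0])])   # exact gradient

t = 1e-6
for theta in np.linspace(0, 2 * np.pi, 8, endpoint=False):
    u = np.array([np.cos(theta), np.sin(theta)])       # unit direction
    d = (f(a + t * u) - f(a)) / t                      # difference quotient
    print(f"theta = {theta:.2f}: D_u f = {d:+.4f}, grad.u = {grad @ u:+.4f}")

# No direction beats the gradient direction, whose rate is ||grad f(a)||:
print("max rate:", np.linalg.norm(grad))
```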

What is the chain rule here? Consider $f(\vec{g}(t)) : \mathbb{R} \to \mathbb{R}$, where $\vec{g} : \mathbb{R} \to \mathbb{R}^n$ and $f : \mathbb{R}^n \to \mathbb{R}$.

Theorem. Let $\vec{g} : \mathbb{R} \to \mathbb{R}^n$ be differentiable at $t = a$ and let $f : \mathbb{R}^n \to \mathbb{R}$ be differentiable at $\vec{b} = \vec{g}(a)$. Then $f(\vec{g}(t))$ is differentiable at $t = a$ and $\frac{d}{dt} f(\vec{g}(t)) \big\vert_{t = a} = \nabla f(\vec{g}(a)) \cdot \vec{g}'(a) = \frac{\partial f}{\partial x_1} \frac{dx_1}{dt} + \frac{\partial f}{\partial x_2} \frac{dx_2}{dt} + ... + \frac{\partial f}{\partial x_n} \frac{d x_n}{dt}$.

Proof.

  1. We have that $f(\vec{b} + \vec{h}) = f(\vec{b}) + \nabla f(\vec{b}) \cdot \vec{h} + E_1 (\vec{h})$ with $E_1(\vec{h}) / \Vert \vec{h} \Vert \to 0$ as $\vec{h} \to \vec{0}$.
  2. We also have $\vec{g}(a + \epsilon) = \vec{g}(a) + \vec{g}'(a) \epsilon + \vec{E}_2 (\epsilon)$ with $\vec{E}_2(\epsilon) / \vert \epsilon \vert \to 0$ as $\epsilon \to 0$.
  3. Write $\vec{h} = \vec{g}(a + \epsilon) - \vec{g}(a) = \vec{g}'(a) \epsilon + \vec{E}_2 (\epsilon)$.
  4. Then $\vec{g}(a + \epsilon) = \vec{b} + \vec{h}$, so $f(\vec{g}(a + \epsilon)) = f(\vec{b} + \vec{h}) = f(\vec{b}) + \nabla f(\vec{b}) \cdot \vec{h} + E_1(\vec{h}) = f(\vec{b}) + \nabla f(\vec{b}) \cdot (\vec{g}'(a) \epsilon + \vec{E}_2(\epsilon)) + E_1(\vec{h})$.
  5. This equals $f(\vec{b}) + \nabla f(\vec{b}) \cdot \vec{g}'(a) \epsilon + E_3(\epsilon)$, where $E_3(\epsilon) = \nabla f(\vec{b}) \cdot \vec{E}_2 (\epsilon) + E_1(\vec{h})$.
  6. Then, we have $\frac{f(\vec{g}(a + \epsilon)) - f(\vec{g}(a))}{\epsilon} = \nabla f(\vec{g}(a)) \cdot \vec{g}'(a) + \frac{E_3(\epsilon)}{\epsilon}$.
  7. As $\epsilon \to 0$, $\Vert \vec{h} \Vert \to 0$, and $\Vert \vec{h} \Vert \le (\Vert \vec{g}'(a) \Vert + 1) \vert \epsilon \vert$ for small $\epsilon$, so $\frac{E_1(\vec{h})}{\vert \epsilon \vert} = \frac{E_1(\vec{h})}{\Vert \vec{h} \Vert} \cdot \frac{\Vert \vec{h} \Vert}{\vert \epsilon \vert} \to 0$.
  8. Hence $\frac{E_3(\epsilon)}{\epsilon} \to 0$ as $\epsilon \to 0$, which proves the formula.
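
A numerical check of the chain rule formula (the curve $\vec{g}$ and the function $f$ below are arbitrary smooth examples):

```python
import numpy as np

# Chain rule check: d/dt f(g(t)) = grad f(g(t)) . g'(t), on sample functions.
g  = lambda t: np.array([np.cos(t), t**2])    # g : R -> R^2
gp = lambda t: np.array([-np.sin(t), 2 * t])  # exact g'
f  = lambda x: x[0] * x[1]                    # f : R^2 -> R
grad_f = lambda x: np.array([x[1], x[0]])     # exact gradient of f

a, eps = 1.0, 1e-6
numeric = (f(g(a + eps)) - f(g(a))) / eps     # difference quotient
exact = grad_f(g(a)) @ gp(a)                  # chain-rule formula
print(numeric, exact)                         # agree to about 1e-6
```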

The derivative of a function from $\mathbb{R}^n$ to $\mathbb{R}^m$ is a matrix.

2nd gradient property. Let $F : \mathcal{U} \subseteq \mathbb{R}^3 \to \mathbb{R}$ be differentiable and non-constant, and suppose a level set $S \subseteq \mathcal{U}$ of $F$ is a smooth surface. Then at every point $\vec{a}$ of the surface, $\nabla F(\vec{a})$ is perpendicular to $S$.

Water flowing down the mountain travels at right angles to the level contours. Gradient descent follows the path of water.

  • Cross-product: only makes sense in $\mathbb{R}^3$; it produces another vector.
  • The cross product is $\vec{v} \times \vec{w} = \Vert \vec{v} \Vert \Vert \vec{w} \Vert \sin \theta \, \vec{n}$, where $\vec{n}$ is the unit vector perpendicular to the plane spanned by $\vec{v}$ and $\vec{w}$, oriented by the right-hand rule.

Lemma. If $\vec{x}(t), \vec{y}(t) : \mathbb{R} \to \mathbb{R}^3$ and $f : \mathbb{R} \to \mathbb{R}$, then $(f \vec{x})' = f' \vec{x} + f \vec{x}'$, $(\vec{x} \times \vec{y})' = \vec{x}' \times \vec{y} + \vec{x} \times \vec{y}'$, and $(\vec{x} \cdot \vec{y})' = \vec{x}' \cdot \vec{y} + \vec{x} \cdot \vec{y}'$.


Lecture 10: Celestial Mechanics, Mean Value Theorem, Higher order Partials

Theorem. Let $S \subseteq \mathbb{R}^n$ be open, containing $\vec{a}, \vec{b} \in \mathbb{R}^n$ and the line segment $L$ connecting $\vec{a}$ to $\vec{b}$. Suppose $f : S \to \mathbb{R}$ is continuous on $L$ and differentiable on $L$, except perhaps at the endpoints. Then there exists a point $\vec{c} \in L$ s.t. $\nabla f(\vec{c}) \cdot (\vec{b} - \vec{a}) = f(\vec{b}) - f(\vec{a})$.

Proof.

  1. Let $\vec{h} = \vec{b} - \vec{a}$; then $L = \{ \vec{a} + t \vec{h} \mid t \in [0, 1] \}$.
  2. Define $\phi : [0, 1] \to \mathbb{R}$ by $\phi(t) = f(\vec{a} + t \vec{h})$; it is continuous on $[0, 1]$ and differentiable on $(0, 1)$.
  3. By the chain rule, $\phi'(t) = \nabla f(\vec{a} + t \vec{h}) \cdot \frac{d}{dt} (\vec{a} + t \vec{h}) = \nabla f(\vec{a} + t \vec{h}) \cdot (\vec{b} - \vec{a})$.
  4. By the 1D MVT, there exists $t_0 \in (0, 1)$ such that $\phi(1) - \phi(0) = \phi'(t_0)$, i.e. $f(\vec{b}) - f(\vec{a}) = \nabla f(\vec{a} + t_0 \vec{h}) \cdot (\vec{b} - \vec{a}) =: \nabla f(\vec{c}) \cdot (\vec{b} - \vec{a})$.

Definition. A set $S$ is convex if for all $\vec{a}, \vec{b} \in S$, $\vec{a} + t(\vec{b} - \vec{a}) \in S$ for all $t \in [0, 1]$.

  • Aside on convex functions: a function is convex when the region above its graph (its epigraph) is a convex set.

Corollary. Suppose $f : \mathbb{R}^n \to \mathbb{R}$ is differentiable on $S$, open and convex. If $\Vert \nabla f(\vec{x}) \Vert \le M$ for all $\vec{x} \in S$, then for all $\vec{a}, \vec{b} \in S$, $\vert f(\vec{b}) - f(\vec{a}) \vert \le M \Vert \vec{b} - \vec{a} \Vert$.

Proof.

  1. By convexity, $L_{\vec{a}, \vec{b}} \subseteq S$, so from the MVT, for all $\vec{a}, \vec{b} \in S$ there exists $\vec{c} \in L_{\vec{a}, \vec{b}}$ such that $\nabla f(\vec{c}) \cdot (\vec{b} - \vec{a}) = f(\vec{b}) - f(\vec{a})$.
  2. Thus, by Cauchy-Schwarz, $\vert f(\vec{b}) - f(\vec{a}) \vert \le \Vert \nabla f(\vec{c}) \Vert \Vert \vec{b} - \vec{a} \Vert \le M \Vert \vec{b} - \vec{a} \Vert$.

Corollary. If $f$ is differentiable on $S$, open and convex, and $\nabla f(\vec{x}) = \vec{0}$ for all $\vec{x} \in S$, then $f$ is constant on $S$.

Theorem. Suppose $f$ is differentiable on $S$, open and connected, and the gradient vanishes everywhere. Then $f$ is constant.

Higher Order Partials.

If $f : S \subseteq \mathbb{R}^n \to \mathbb{R}$ is differentiable on $S$, with $S$ open, then its partials are also functions, which may themselves have partials.


Lecture 14: 1-Dimensional Integration, Integration in Higher Dimensions

Theorem. If $f$ is bounded and monotone on $[a, b]$, then $f$ is integrable on $[a, b]$.

Theorem. If $f$ is continuous on $[a, b]$, then $f$ is integrable on $[a, b]$. Proof.

  1. Since $f$ is bounded on $[a, b]$ by the extreme value theorem, $U_P f$ and $L_P f$ exist for any partition $P$.
  2. Since $f$ is continuous on a compact set, $f$ is uniformly continuous.
  3. So for all $\epsilon > 0$, $\exists \delta$ s.t. $\forall x, y \in [a, b]$, $\vert x - y \vert < \delta \implies \vert f(x) - f(y) \vert < \epsilon / (b - a)$.
  4. Let $P$ be a partition of $[a, b]$ into equally spaced intervals, each with length $< \delta$.
  5. Then on each subinterval we have $M_j - m_j \le \epsilon / (b - a)$.
  6. Meaning $U_P f - L_P f = \sum_{j = 1}^J (M_j - m_j) (x_j - x_{j - 1}) \le \frac{\epsilon}{b - a} \sum_{j = 1}^J (x_j - x_{j - 1}) = \frac{\epsilon}{b - a} (x_J - x_0) = \frac{\epsilon}{b - a} (b - a) = \epsilon$.

Note that partitions are all finite, but the gap $U_P f - L_P f$ can be made smaller than any $\epsilon$. So we have $\forall \epsilon > 0, \exists P : U_P f - L_P f \le \epsilon$, which is exactly the criterion for integrability.
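
A sketch of the shrinking gap (with $f = \sin$ on $[0, \pi]$ as an arbitrary continuous example; the sup and inf on each subinterval are approximated by dense sampling):

```python
import numpy as np

# The gap U_P f - L_P f shrinks as the partition is refined.
f = lambda x: np.sin(x)
a, b = 0.0, np.pi

for J in [10, 100, 1000]:
    edges = np.linspace(a, b, J + 1)
    gap = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        vals = f(np.linspace(lo, hi, 50))
        gap += (vals.max() - vals.min()) * (hi - lo)   # (M_j - m_j) * dx_j
    print(J, gap)
```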

  • Midterm: 1.8 up to end of chapter 2

Function: $f(x) = \begin{cases} 0 & \text{if } x \notin \mathbb{Q} \\ 1/q & \text{if } x = p/q \text{ in lowest terms} \end{cases}$ (Thomae's function).

  • $f(x)$ is bounded, but not monotone or continuous
  • But it is integrable on $[0, 1]$

Theorem. If $f$ is bounded on $[a, b]$ and is continuous on $[a, b]$ except at finitely many points, then $f$ is integrable on $[a, b]$. Proof.

  • Suppose $y_1, ..., y_L$ are the points of discontinuity.
  • Set $m = \inf \{ f(x) : x \in [a, b] \}$ and $M = \sup \{ f(x) : x \in [a, b] \}$.
  • Given $\delta > 0$, set $I_l = [a, b] \cap [y_l - \delta, y_l + \delta]$.
  • Let $U = \bigcup_{l = 1}^L I_l$ and $V = [a, b] \setminus U^{\text{int}}$.
  • Notice that the total length of $U$ is $\le 2 \delta L$, and $f : V \to \mathbb{R}$ is continuous on the compact set $V$, hence uniformly continuous there.
  • If $P$ is any partition whose points include the endpoints of the $I_l$, we can split $U_P f = U_P^U f + U_P^V f$: the $U$ part is controlled by $(M - m) \cdot 2 \delta L$, the $V$ part by uniform continuity.
  • The upshot, sketched for lack of time: changing $f$ at finitely many points does not affect integrability, and $\int_a^b (f(x) - g(x)) \, dx = 0$ for a surrogate $g$ agreeing with $f$ away from the discontinuities.

Fundamental Theorem of Calculus. There are two!

  1. Let $f$ be integrable on $[a, b]$. For $x \in [a, b]$, define $F(x) = \int_a^x f(t) \, dt$. Then $F$ is continuous on $[a, b]$, and at every point $x_0 \in [a, b]$ where $f$ is continuous, $F$ is differentiable with $F'(x_0) = f(x_0)$. ($F$ makes sense even if $f$ is integrable but not continuous!) Proof:
  2. For all $x, y \in [a, b]$ we have $F(y) - F(x) = \int_x^y f(t) \, dt$.
  3. Define $C = \sup \{ \vert f(t) \vert : t \in [a, b] \}$.
  4. We see that $\vert F(y) - F(x) \vert \le \left\vert \int_x^y \vert f(t) \vert \, dt \right\vert \le C \vert y - x \vert$, which means that $F$ is Lipschitz continuous.
  5. Now say that $f$ is continuous at $x_0 \in [a, b]$. Then $\forall \epsilon > 0$, $\exists \delta > 0$ s.t. $\vert t - x_0 \vert < \delta \implies \vert f(t) - f(x_0) \vert < \epsilon$.
  6. Further, $f(x_0) = \frac{1}{y - x_0} \int_{x_0}^y f(x_0) \, dt$, since the integrand is constant.
  7. Subtracting, $\frac{F(y) - F(x_0)}{y - x_0} - f(x_0) = \frac{1}{y - x_0} \int_{x_0}^y (f(t) - f(x_0)) \, dt$.
  8. For $\vert y - x_0 \vert < \delta$ the right side is bounded by $\epsilon$, so $\lim_{y \to x_0} \frac{F(y) - F(x_0)}{y - x_0} = f(x_0)$ by an epsilon-delta argument.
  9. Second theorem: let $F$ be continuous on $[a, b]$ and differentiable except at finitely many points of $[a, b]$, and let $f$ be a function which agrees with $F'(x)$ wherever that is defined. If $f$ is integrable on $[a, b]$, then $\int_a^b f(t) \, dt = F(b) - F(a)$. Proof:
  10. Suppose $P = \{ x_0, ..., x_J \}$ is a partition of $[a, b]$. After refining if necessary, we can assume each point where $F$ fails to be differentiable is an endpoint of a subinterval of $P$.
  11. Then for all $j$, $F$ is continuous on $[x_{j - 1}, x_j]$ and differentiable on $(x_{j - 1}, x_j)$.
  12. Thus by the MVT, $F(x_j) - F(x_{j - 1}) = F'(t_j)(x_j - x_{j - 1}) = f(t_j)(x_j - x_{j - 1})$ for some $t_j \in (x_{j - 1}, x_j)$.
  13. Summing over $j$ telescopes: $F(b) - F(a) = F(x_J) - F(x_0) = \sum_{j = 1}^J f(t_j) (x_j - x_{j - 1})$.
  14. But this Riemann sum is bounded between the lower and upper sums: $L_P f \le F(b) - F(a) \le U_P f$.
  15. Since $f$ is integrable, the difference between the two can be made arbitrarily small; therefore $F(b) - F(a) = \int_a^b f(t) \, dt$. (A numerical check follows below.)
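
As promised, a numerical sanity check (the antiderivative $F(x) = x \sin x$ is an arbitrary choice; a midpoint Riemann sum stands in for the integral):

```python
import numpy as np

# FTC check: F(x) = x sin x, so f(x) = F'(x) = sin x + x cos x.
F = lambda x: x * np.sin(x)
f = lambda x: np.sin(x) + x * np.cos(x)

a, b, J = 0.0, 2.0, 100_000
t = np.linspace(a, b, J + 1)
riemann = np.sum(f((t[:-1] + t[1:]) / 2)) * (b - a) / J   # midpoint sum

print(riemann, F(b) - F(a))   # the two agree closely
```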

Integration in higher dimensions.

  • A product of intervals is a rectangle $R = [a, b] \times [c, d]$
  • A partition is a grid of rectangles.

Definition. If $f : R \to \mathbb{R}$ is bounded and $P$ is a partition of $R$ into subrectangles $R_{jk}$, as above, define $m_{jk} = \inf_{\vec{x} \in R_{jk}} f(\vec{x})$ and $M_{jk} = \sup_{\vec{x} \in R_{jk}} f(\vec{x})$. We can define upper and lower Riemann sums by similar logic to one dimension.

Definition. The characteristic function of $S$ is $\chi_S(\vec{x}) = \begin{cases} 1 & \text{if } \vec{x} \in S \\ 0 & \text{if } \vec{x} \notin S \end{cases}$

Indicator functions are interesting examples depending on the set.

So you can define the integral of $f$ over a non-rectangular set as the integral of $f$ times the indicator function over a rectangle containing it.

Multiplying by an indicator function can ruin the nice behavior of our original function. So how can you guarantee the result is still integrable? The key observation: $f \cdot \chi_S$ is only discontinuous on the boundary of $S$, so it is enough for $\partial S$ to have zero content.
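
A hedged numerical sketch of this construction (the unit disk as $S$ and $f(x, y) = x^2 + y$ are arbitrary choices): integrate $f \cdot \chi_S$ over a bounding square via a grid Riemann sum.

```python
import numpy as np

# Integrate f over a non-rectangular S (the unit disk) as the integral
# of f * chi_S over a bounding square.
f   = lambda x, y: x**2 + y                  # sample integrand
chi = lambda x, y: (x**2 + y**2 <= 1.0)      # indicator of S

n = 2000                                     # grid resolution per axis
xs = np.linspace(-1, 1, n)                   # bounding square [-1, 1]^2
X, Y = np.meshgrid(xs, xs)
dA = (2.0 / n) ** 2

approx = np.sum(f(X, Y) * chi(X, Y)) * dA
exact = np.pi / 4    # the y term vanishes by symmetry; x^2 integrates to pi/4
print(approx, exact)
```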


Lecture 15: Higher Dimension Integration

  • If $S \subseteq \mathbb{R}^n$, the characteristic function of $S$ is 1 if the point is in $S$ and 0 otherwise.
  • If $f : S \subseteq \mathbb{R}^2 \to \mathbb{R}$ is bounded, $S$ is bounded, and $R$ is any rectangle containing $S$, then when $f \cdot \chi_S$ is integrable on $R$ we define $\int \int_S f \, dA = \int \int_R f \cdot \chi_S \, dA$.

Theorem declaring the properties of integrals:

  • linearity – a linear combination of integrable functions is integrable on $S$, and its integral is the same linear combination of the integrals
  • if a function is integrable on bounded disjoint domains, it is integrable on their union, and the integral is the sum of the individual integrals
  • if $f, g$ are integrable and $f(\vec{x}) \le g(\vec{x})$ for all $\vec{x} \in S$, then the integrals are ordered the same way: $\int \int_S f(\vec{x}) \, dA \le \int \int_S g(\vec{x}) \, dA$
  • if $f$ is integrable on $S$, then $\vert f(\vec{x}) \vert$ is integrable on $S$ and $\left\vert \int \int_S f \, dA \right\vert \le \int \int_S \vert f \vert \, dA$

Combining linearity with the last property gives a form of the triangle inequality: $\left\vert \int \int (f + g) \, dA \right\vert \le \int \int \vert f \vert \, dA + \int \int \vert g \vert \, dA$

Even if $f$ is very nice, $f \cdot \chi_S$ won't be continuous on $R$.

Lemma. $\chi_S$ is discontinuous at $\vec{x}$ if and only if $\vec{x} \in \partial S$.

  1. If $\vec{x} \in S^{\text{int}}$, there exists a ball of some radius centered at $\vec{x}$ which is a subset of $S$, so the characteristic function is constant (equal to 1), hence continuous, in a neighborhood of that point.
  2. If $\vec{x} \in (S^C)^{\text{int}}$, the function is likewise constant (equal to 0) near $\vec{x}$.
  3. If $\vec{x} \in \partial S$, then $\forall \delta > 0$, $B_\delta(\vec{x}) \cap S \neq \emptyset$ and $B_\delta(\vec{x}) \cap S^C \neq \emptyset$, so $\chi_S$ takes both values 0 and 1 in every neighborhood of $\vec{x}$ and cannot be continuous there.

Proposition.

  1. If $Z \subseteq \mathbb{R}^2$ has zero content and $U \subseteq Z$, then $U$ has zero content.
  2. If $Z_1, ..., Z_k$ have zero content, then their union has zero content.
  3. If $\vec{f} : (a_0, b_0) \to \mathbb{R}^2$ is $C^1$, then the image $\vec{f}([a, b])$ of any closed subinterval $[a, b] \subset (a_0, b_0)$ has zero content.
    • Each part of the curve can be contained in some set of rectangles; you can find a set of rectangles whose total area is less than your tolerance.

Proof for 3.

  1. $C^1$ buys us the mean value theorem.
  2. Let $P_k$ be the equally spaced partition of $[a, b]$ into $k$ subintervals of length $\delta = \frac{b - a}{k}$, and let $C = \sup \{ \Vert \vec{f}'(t) \Vert \}$. Writing $\vec{f}(t) = (x(t), y(t))$ and letting $t_j$ be the left endpoint of the $j$-th subinterval, apply the MVT to each component to obtain $\vert x(t) - x(t_j) \vert \le C \delta$ and $\vert y(t) - y(t_j) \vert \le C \delta$ for $t$ in that subinterval.
  3. Together, these imply that the image of each subinterval is contained in a square of side length $2 C \delta$.
  4. The total area is then at most $k (2 C \delta)^2 = \frac{4 C^2 (b - a)^2}{k}$, which goes to zero as $k \to \infty$.

The image of any $C^1$ curve has zero content.

If $f : S \subseteq \mathbb{R}^k \to \mathbb{R}^n$, $k < n$, is $C^1$ and $S$ is bounded, then $f(S)$ has zero content.

Remark. If $S \subset \mathbb{R}^2$ is bounded and $\partial S$ has zero content, then $S$ is Jordan-measurable.

Theorem. If $S \subseteq \mathbb{R}^2$ is Jordan measurable and $f : \mathbb{R}^2 \to \mathbb{R}$ is bounded and continuous on $S$ except perhaps on a set of zero content, then $f$ is integrable on $S$.

Proposition. If $Z \subseteq \mathbb{R}^2$ has zero content and $f : \mathbb{R}^2 \to \mathbb{R}$ is bounded, then $f$ is integrable on $Z$ and its integral is zero.

Proof.

  1. Given $\epsilon > 0$, there exists some collection of rectangles such that $Z$ is contained in their union and the sum of their areas is less than $\epsilon$.
  2. After subdividing the rectangles as needed, we can assume they form part of a partition of a rectangle containing $Z$, with pairwise disjoint interiors.
  3. Setting $C = \sup_Z \vert f(\vec{x}) \vert$, we have $-C \epsilon < -C \sum_{j = 1}^M A(R_j) \le L_P (f) \le U_P(f) \le C \sum_{j = 1}^M A (R_j) < C \epsilon$. This tells us both the lower and upper Riemann sums are within $C \epsilon$ of zero, so the integral exists and equals zero.

Corollary.

  1. If $f$ is integrable on $S \subseteq \mathbb{R}^2$ and $f = g$ except on a set of zero content, then $\int \int_S f \, dA = \int \int_S g \, dA$
  2. If $f$ is integrable on $S$ and $T$, and $S \cap T$ has zero content, then $f$ is integrable on $S \cup T$ and $\int \int_{S \cup T} f \, dA = \int \int_S f \, dA + \int \int_T f \, dA$

How does this generalize to higher dimensions? Every argument, lemma, proposition, and theorem in this section generalizes to higher-dimensional space in a straightforward way: turn rectangles into boxes, and define Riemann sums on partitions of boxes.

Mean Value Theorem for integrals. Suppose $S \subseteq \mathbb{R}^n$ is compact, connected, and Jordan-measurable, $f, g$ are continuous on $S$, and $g$ is non-negative. Then there exists some $\vec{a} \in S$ such that $\int_S f(\vec{x}) g(\vec{x}) \, dV = f(\vec{a}) \int_S g(\vec{x}) \, dV$.

Proof.

  1. Because $f$ is continuous on $S$, which is compact, by the extreme value theorem there exist $m, M$ such that $m \le f(\vec{x}) \le M$ for all $\vec{x} \in S$.
  2. Multiplying by $g \ge 0$ and integrating, $m \int_S g \, dV \le \int_S f \cdot g \, dV \le M \int_S g \, dV$.
  3. If $\int_S g \, dV = 0$, both sides vanish and any $\vec{a}$ works; otherwise, dividing gives $m \le \frac{\int_S f \cdot g \, dV}{\int_S g \, dV} \le M$.
  4. Since $S$ is connected and $f$ attains the values $m$ and $M$, by the intermediate value theorem there exists an $\vec{a} \in S$ such that $f(\vec{a}) = \frac{\int_S f \cdot g \, dV}{\int_S g \, dV}$.

Corollary. Average Value. If $S \subseteq \mathbb{R}^n$ is compact, connected, and Jordan-measurable, and $f$ is continuous on $S$, then there exists some $\vec{a} \in S$ such that $f(\vec{a}) = \text{Avg}_S(f) = \frac{1}{\text{Vol}(S)} \int_S f \, dV$. There is some point in $S$ where $f$ equals its average across the set.
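
A final numerical illustration of the corollary (the unit disk as $S$ and $f(x, y) = x^2 + y^2$ are arbitrary choices; the average works out to $1/2$):

```python
import numpy as np

# Average value on the unit disk: compute Avg_S(f) by a grid sum, then
# find a point of S where f attains it.
f = lambda x, y: x**2 + y**2

n = 1001
xs = np.linspace(-1, 1, n)
X, Y = np.meshgrid(xs, xs)
inside = X**2 + Y**2 <= 1.0
dA = (2.0 / n) ** 2

avg = np.sum(f(X, Y)[inside]) * dA / (np.sum(inside) * dA)
print("average of f over S:", avg)            # roughly 0.5

# f is continuous and S is connected, so some point a has f(a) = avg:
vals = np.where(inside, f(X, Y), np.inf)
i = np.unravel_index(np.argmin(np.abs(vals - avg)), vals.shape)
print("a =", (X[i], Y[i]), "f(a) =", f(X[i], Y[i]))
```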