
Lecture Notes - MATH 231A - Real Analysis

Kyle Hambrook

May 30, 2020


Contents

1 Introduction and Preliminaries
    1 Introduction: Riemann to Lebesgue
    2 Preliminaries

2 Measurable Sets and Measurable Functions
    3 σ-Algebras
    4 Measurable Functions

3 Measures and Integrals
    5 Measures
    6 Integral of Non-Negative Simple Functions
    7 Integral of Non-Negative Extended Real-Valued Functions
    8 Integral of Extended Real-Valued Functions
    9 Almost Everywhere
    10 Fatou’s Lemma and Dominated Convergence Theorem

4 Spaces of Functions
    11 Normed Spaces and Banach Spaces
    12 Lp: Definitions and Basic Properties
    13 Lp: Completeness
    14 B(X) and C(X)
    15 L∞

5 Some Generalizations
    16 Complex-Valued Measurable Functions

6 Construction of Measures
    17 Outline
    18 Set Functions
    19 Caratheodory’s Theorem: Outer Measures to Measures
    20 Construction of Outer Measures
    21 Existence and Uniqueness of Extensions

7 Lebesgue Measure on R
    22 Lebesgue Measure on R
    23 Complete Measures
    24 Completion of Measures (Optional)
    25 Riemann Integral
    26 Comparison of Lebesgue Integral and Riemann Integral

8 Product Measures and Fubini’s Theorem
    27 Measurable Rectangles and Product σ-Algebras
    28 Product Measures
    29 Fubini’s Theorem
    30 Product Spaces With More Than Two Factors
    31 Lebesgue Measure on Rn


Chapter 1

Introduction and Preliminaries

1 Introduction: Riemann to Lebesgue

You are probably familiar with the Riemann integral from calculus and undergraduate analysis. If f is a non-negative real-valued function defined on an interval [a, b], then, roughly speaking, the Riemann integral of f is a limit of Riemann sums

∫_a^b f(x) dx = lim_{n→∞} ∑_{i=1}^n f(x_i^*)(x_i − x_{i−1}),

where [a, b] is divided into subintervals [x_0, x_1], [x_1, x_2], . . . , [x_{n−1}, x_n] with a = x_0 ≤ x_1 ≤ . . . ≤ x_n = b, the (sample) points x_i^* ∈ [x_{i−1}, x_i] are arbitrary, and (x_i − x_{i−1}) is the length of [x_{i−1}, x_i]. We say f is Riemann integrable on [a, b] when its Riemann integral is defined.
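For readers who like to experiment, here is a small numerical sketch of the Riemann-sum formula above (an illustration only; the function, interval, and left-endpoint sample points are arbitrary choices):

```python
# Approximate the Riemann integral of f on [a, b] with n equal subintervals,
# using the left endpoint of each subinterval as the sample point x_i*.
def riemann_sum(f, a, b, n):
    dx = (b - a) / n
    return sum(f(a + i * dx) * dx for i in range(n))

# Example: the integral of x^2 over [0, 1] is 1/3.
print(riemann_sum(lambda x: x * x, 0.0, 1.0, 10_000))  # about 0.3333
```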

In this course, we will study Lebesgue’s theory of integration with respect to a measure. Roughly, a measure µ on a set X is a function which takes subsets A ⊆ X as inputs and gives non-negative real numbers µ(A) as outputs. You can think of µ(A) as some general notion of the size of A. The most important measure is the Lebesgue measure on R, denoted by λ. For each interval [a, b], λ([a, b]) is the length b − a. For other sets A ⊆ R, λ(A) is the length of A (we will see how to define this later). There are many other useful measures. If f is a non-negative real-valued function defined on a set X, then, roughly speaking, the integral of f with respect to µ is

∫_X f(x) dµ(x) = lim_{n→∞} ∑_{i=1}^n f(x_i^*) µ(A_i).

Here the set X is partitioned into subsets A_1, . . . , A_n, we choose sample points x_i^* ∈ A_i, and µ(A_i) is the measure of the set A_i. When the integral is defined, we say f is µ-integrable. When µ is the Lebesgue measure λ on R, the integral is simply called the Lebesgue integral.

Lebesgue’s theory of integration has several advantages over Riemann’s theory.

(1) Lebesgue’s theory allows us to integrate functions defined on arbitrary sets, whereas the Riemann integral is restricted to functions defined on R or Rd. This is especially important for certain applications. For example, Lebesgue’s theory of measure and integration forms the rigorous foundation for modern probability theory, where the functions are random variables and we integrate over the sample space of possible outcomes.

(2) The Lebesgue integral (i.e., the integral with respect to Lebesgue measure) is defined for a larger class of functions on R, though it still agrees with the Riemann integral whenever the latter is defined. For example, the indicator function of the rationals

1_Q(x) = 1 if x ∈ Q, 0 if x ∉ Q

is not Riemann integrable on any interval [a, b], but it is Lebesgue integrable on any such interval. We will see the details later.

(3) Lebesgue’s theory possesses better convergence theorems, which lead to more general and more elegant results and to more useful spaces of functions.

Here is the type of convergence theorem we are interested in:

Prototype Convergence Theorem. If (f_n) is a sequence of integrable functions defined on [a, b] and (f_n) converges to f “nicely”, then f is integrable and lim_n ∫_a^b f_n = ∫_a^b f.

The problem is, of course, to figure out what “nicely” might mean and to define the integral in such a way that theorems like this one will be widely applicable. These were important unresolved issues in the late nineteenth century; they arose, for example, in the study of Fourier series. The Lebesgue theory was developed, in large part, to address this. Let us look at this problem in a bit more detail.

Definition 1.1. A sequence (f_n) of functions f_n : X → R is said to converge pointwise to a function f : X → R if

lim_{n→∞} f_n(x) = f(x) for each x ∈ X.

In this case, we write f_n → f pointwise.

Example 1.2. Let f_n be the tent function whose graph is the triangle with vertices (0, 0), (1/n, n), (2/n, 0) and which equals zero elsewhere. Then f_n → f = 0 pointwise, f_n is Riemann integrable on [a, b], and f = 0 is Riemann integrable on [a, b], but lim_n ∫_a^b f_n = 1 ≠ 0 = ∫_a^b f. So we don’t get the desired conclusion of the Prototype Convergence Theorem.
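A quick numerical check of Example 1.2 (an added sketch, not part of the formal development): each tent function has integral 1 over [0, 1] once 2/n ≤ 1, even though the functions converge pointwise to 0.

```python
import numpy as np

def tent(n, x):
    # Triangle with vertices (0, 0), (1/n, n), (2/n, 0); zero outside [0, 2/n].
    x = np.asarray(x, dtype=float)
    return np.clip(np.minimum(n * n * x, 2 * n - n * n * x), 0.0, None)

xs = np.linspace(0.0, 1.0, 200_001)
dx = xs[1] - xs[0]
for n in (2, 10, 100):
    print(n, float(np.sum(tent(n, xs)) * dx))  # each integral is approximately 1
print(float(np.max(tent(100, xs))))            # but the peak heights blow up: 100
```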

Example 1.3. Choose an enumeration Q = {r_1, r_2, r_3, . . .}. Define f = 1_Q and define f_n by f_n(x) = 1 if x ∈ {r_1, . . . , r_n} and f_n(x) = 0 otherwise. Then f_n → f pointwise, and f_n is Riemann integrable on [a, b], but f is not Riemann integrable on [a, b]. Again we don’t get the desired conclusion of the Prototype Convergence Theorem. Note that here the sequence (f_n) is quite nice. Indeed, (f_n) is bounded (|f_n| ≤ 1 for all n) and (f_n) is increasing (f_n ≤ f_{n+1} for all n). But it’s not enough.

Example 1.4. Later, we will construct a sequence of functions fn : [0, 1]→ R such that

• 0 ≤ fn ≤ 1 for each n

• fn is continuous for each n


• (fn) is an increasing sequence

• (fn) converges pointwise to a function f on [0, 1].

• f is not Riemann integrable on [0, 1].

This sequence is even nicer than the one in the previous example, but we still don’t get the desiredconclusion of the Prototype Convergence Theorem.

The examples above show that pointwise convergence isn’t good enough for Riemann integration. By assuming more, we can get some convergence theorems for Riemann integration. We describe three such theorems below. Unfortunately, they are all somewhat unsatisfactory.

The first theorem requires uniform convergence.

Definition 1.5. A sequence (f_n) of functions f_n : X → R is said to converge uniformly to a function f : X → R if

lim_{n→∞} sup {|f_n(x) − f(x)| : x ∈ X} = 0.

In this case, we write fn → f uniformly.

Note that uniform convergence implies pointwise convergence.

Theorem 1.6. If f_n is Riemann integrable on [a, b] for each n and f_n → f uniformly on [a, b], then f is Riemann integrable on [a, b] and lim_n ∫_a^b f_n = ∫_a^b f.

The next two theorems require, respectively, the sequence of functions to be increasing or bounded.

Theorem 1.7. If f_n is Riemann integrable on [a, b] for each n, f_n ≤ f_{n+1} for each n, f_n → f pointwise on [a, b], and f is Riemann integrable on [a, b], then lim_n ∫_a^b f_n = ∫_a^b f.

Theorem 1.8. If f_n is Riemann integrable on [a, b] for each n, |f_n| ≤ M for each n, f_n → f pointwise on [a, b], and f is Riemann integrable on [a, b], then lim_n ∫_a^b f_n = ∫_a^b f.

The problem with the first of these three theorems is that uniform convergence is too strong. In many applications, we don’t have it. The problem with the last two theorems is that the Riemann integrability of f is part of the hypothesis, rather than part of the conclusion.

We will see that the Lebesgue theory gives us very powerful convergence theorems, which lead tosome very impressive results and some very useful function spaces.


2 Preliminaries

Set Theory

Definition 2.1.

• If A and B are sets, their union and intersection are

A ∪ B = {x : x ∈ A or x ∈ B},  A ∩ B = {x : x ∈ A and x ∈ B}.

• If {A_i : i ∈ I} is an indexed family of sets, the union and intersection of the family are

⋃_{i∈I} A_i = {x : x ∈ A_i for at least one i ∈ I},  ⋂_{i∈I} A_i = {x : x ∈ A_i for all i ∈ I}.

• If A and B are sets, the set difference of A and B (or the relative complement of B in A) is

A \ B = {x : x ∈ A and x ∉ B}.

It is read as “A minus B” or “A take away B”.

• If all the sets in a given context are subsets of a fixed set X, then the complement (or absolute complement) of a set A ⊆ X is

A^c = X \ A = {x ∈ X : x ∉ A}.

Properties of Complement: Suppose all sets under consideration are subsets of a fixed set X. Let A, B ⊆ X and let {A_i : i ∈ I} be an indexed family of subsets of X.

• A ∪Ac = X

• A ∩Ac = ∅

• ∅c = X

• Xc = ∅

• (Ac)c = A

• A \B = A ∩Bc = Bc \Ac

• If A ⊆ B, then Bc ⊆ Ac

• De Morgan’s Laws: (⋃_{i∈I} A_i)^c = ⋂_{i∈I} A_i^c and (⋂_{i∈I} A_i)^c = ⋃_{i∈I} A_i^c


In words, De Morgan’s Laws say that the complement of a union is the intersection of the complements, and the complement of an intersection is the union of the complements.
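De Morgan’s laws are easy to sanity-check on finite sets; here is a tiny added sketch (the particular sets are arbitrary):

```python
# Verify De Morgan's laws for an indexed family of subsets of a finite X.
X = set(range(10))
family = [{1, 2, 3}, {2, 4, 6, 8}, {0, 2, 5}]

union = set().union(*family)
intersection = set(X).intersection(*family)

def complement(A):
    return X - A

assert complement(union) == set.intersection(*[complement(A) for A in family])
assert complement(intersection) == set().union(*[complement(A) for A in family])
print("De Morgan's laws hold for this family.")
```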

Properties of Union, Intersection, and Set Difference

• B ∪ ⋃_{i∈I} A_i = ⋃_{i∈I} (B ∪ A_i)

• B ∩ ⋂_{i∈I} A_i = ⋂_{i∈I} (B ∩ A_i)

• B ∪ ⋂_{i∈I} A_i = ⋂_{i∈I} (B ∪ A_i)

• B ∩ ⋃_{i∈I} A_i = ⋃_{i∈I} (B ∩ A_i)

• A = (A ∩ B) ∪ (A \ B)

• A ∩ B = A \ (A \ B)

• A \ (B ∪ C) = (A \ B) \ C

• (A ∪ B) \ C = (A \ C) ∪ (B \ C)

Definition 2.2. For any set X, the power set of X, denoted by P(X), is the collection of allsubsets of X:

P(X) = {A : A ⊆ X}

Definition 2.3. Let X, Y be sets. A function (or map) f : X → Y is a rule that assigns to each element x ∈ X a unique element f(x) ∈ Y. The sets X and Y are called the domain and codomain of f. The image (or range) of f is the set

f(X) = {f(x) : x ∈ X} .

If A ⊆ X, the image of A under f is the set

f(A) = {f(x) : x ∈ A}

If B ⊆ Y , the inverse image (or preimage) of B under f is the set

f−1(B) = {x ∈ X : f(x) ∈ B}

The inverse image commutes with unions, intersections, set differences, and complements:

• f^{-1}(⋃_{i∈I} A_i) = ⋃_{i∈I} f^{-1}(A_i)

• f^{-1}(⋂_{i∈I} A_i) = ⋂_{i∈I} f^{-1}(A_i)

• f^{-1}(A \ B) = f^{-1}(A) \ f^{-1}(B)

• f^{-1}(A^c) = (f^{-1}(A))^c

Remark 2.4. The image is not so well-behaved. See Exercise ???.
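The preimage identities above can be checked directly for functions between finite sets; the following added sketch (with an arbitrary example function) does exactly that:

```python
# f : X -> Y given as a dict; preimage of B under f.
X = {0, 1, 2, 3, 4, 5}
Y = {"a", "b", "c"}
f = {0: "a", 1: "a", 2: "b", 3: "b", 4: "c", 5: "a"}

def preimage(B):
    return {x for x in X if f[x] in B}

A1, A2 = {"a", "b"}, {"b", "c"}
assert preimage(A1 | A2) == preimage(A1) | preimage(A2)
assert preimage(A1 & A2) == preimage(A1) & preimage(A2)
assert preimage(A1 - A2) == preimage(A1) - preimage(A2)
assert preimage(Y - A1) == X - preimage(A1)
print("Preimage commutes with unions, intersections, differences, and complements.")
```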

Definition 2.5. If f : X → Y and g : Y → Z, the composition of f and g is the function

g ◦ f : X → Z

defined by

(g ◦ f)(x) = g(f(x)) for every x ∈ X.


Extended Real Numbers

The set of extended real numbers is the set obtained by adjoining the two symbols −∞ and +∞ to the set of real numbers. It is denoted by R̄. Thus

R̄ = R ∪ {−∞, +∞}.

We often write ∞ instead of +∞. We extend the usual ordering on R to R̄ by declaring that

−∞ < x < ∞ for every x ∈ R.

The interval notation is extended in the natural way. For example,

(0, ∞] = (0, ∞) ∪ {∞} = {x ∈ R̄ : x > 0} = {x ∈ R̄ : 0 < x ≤ ∞}.

We extend the usual arithmetic operations on R to R̄ by declaring that

• x + ∞ = ∞ and x − ∞ = −∞ for all x ∈ R

• ∞ + ∞ = ∞ and −∞ − ∞ = −∞

• x · (±∞) = ±∞ for all x ∈ (0, ∞]

• x · (±∞) = ∓∞ for all x ∈ [−∞, 0)

• 0 · (±∞) = 0

• x/(±∞) = 0 for all x ∈ R.

Expressions of the form

∞ − ∞ and ∞/∞

are left undefined.

In some other areas of mathematics, the products 0 · (±∞) are left undefined, but not here.

The absolute values of −∞ and +∞ are

|−∞| = |+∞| = +∞.

Supremum and Infimum

Let A ⊆ R.

An element a0 ∈ R is called the maximum (or greatest element) of A if a0 ∈ A and a ≤ a0 forevery a ∈ A. The maximum of A is denoted by max(A). It is easy to see that the maximum isunique if it exists, hence why we say “the maximum” instead of “a maximum”. The minimum(or least element) of A is defined similarly; it is denoted by min(A) and is unique if it exists.


An element a0 ∈ R is called an upper bound of A if a ≤ a0 for every a ∈ A. If A has an upperbound, it is said to be bounded above. A lower bound of A is defined similarly.

An element a0 ∈ R is called the supremum of A if it is the least upper bound of A, i.e., if a0 isthe minimum of the set of all upper bounds of A. Thus a0 is the supremum of A if and only if a0is an upper bound of A and a0 ≤ a′0 for every upper bound a′0 of A. The supremum is denotedby sup(A). Since minimums are unique if they exist, the supremum is unique if it exists. Theinfimum of A is the greatest lower bound of A; it is denoted by inf(A) and is unique if it exists.

If A is non-empty and bounded above in R, the order completeness axiom of the real numbersimplies that A has a supremum sup(A) in R.

If A is non-empty but not bounded above in R, then ∞ is the only upper bound for A, and sosup(A) =∞.

If A is empty, then every extended real number is an upper bound for A, and so sup(A) = −∞.

Thus every subset of R has a supremum and an infimum in the extended real numbers.

Limits, Limsup, Liminf

Let (an) be a sequence in R.

Definition 2.6.

• Let a ∈ R. We write lim an = a (and we say that (an) converges to a and that a is the limitof (an)) if for every real ε > 0 there exists an N ∈ N such that if n ≥ N then

|an − a| < ε.

• We write lim an = +∞ (and we say that (an) converges to +∞ and that +∞ is the limit of(an)) if for every real M > 0 there exists an N ∈ N such that if n ≥ N then

an > M.

• We write lim an = −∞ (and we say that (an) converges to −∞ and that −∞ is the limit of(an)) if for every real M > 0 there exists an N ∈ N such that if n ≥ N then

an < −M.

Definition 2.7. The limsup (or limit superior) and liminf (or limit inferior) are defined by

lim sup a_n = inf_{k≥1} (sup_{n≥k} a_n)

lim inf a_n = sup_{k≥1} (inf_{n≥k} a_n)

Note that


• The sequence c_k = sup_{n≥k} a_n is a decreasing sequence

• The sequence b_k = inf_{n≥k} a_n is an increasing sequence

• lim sup a_n = inf_{k≥1} (sup_{n≥k} a_n) = lim_k (sup_{n≥k} a_n)

• lim inf a_n = sup_{k≥1} (inf_{n≥k} a_n) = lim_k (inf_{n≥k} a_n)

• b_k ≤ a_k ≤ c_k for all k

• lim inf a_n ≤ lim sup a_n

Theorem 2.8. We have lim sup a_n = lim inf a_n iff lim_n a_n = L for some L ∈ R̄, in which case

lim sup a_n = lim inf a_n = lim_n a_n.

Proof. Exercise.
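The tail suprema and infima in Definition 2.7 can be watched numerically; the added sketch below (for the arbitrary choice a_n = (−1)^n + 1/n) computes finite-horizon approximations of sup_{n≥k} a_n and inf_{n≥k} a_n.

```python
# a_n = (-1)^n + 1/n for n = 1, 2, ..., N.
N = 10_000
a = [(-1) ** n + 1 / n for n in range(1, N + 1)]

def tail_sup(k):  # approximates sup_{n >= k} a_n using the first N terms
    return max(a[k - 1:])

def tail_inf(k):  # approximates inf_{n >= k} a_n using the first N terms
    return min(a[k - 1:])

for k in (1, 10, 1000):
    print(k, tail_sup(k), tail_inf(k))
# For this sequence, sup_{n>=k} a_n decreases to 1 as k grows, while the true
# tail infimum equals -1 for every k (the finite-horizon minimum is close to -1),
# so lim sup a_n = 1 and lim inf a_n = -1.
```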

Infinite Series

Definition 2.9. Let ∑_{n=1}^∞ a_n be an infinite series whose terms a_n belong to R̄. The series has a sum in R̄ (i.e., the series converges in R̄) if both

(a) ∞ and −∞ do not both occur among the terms of ∑_{n=1}^∞ a_n, and

(b) the sequence s_k = ∑_{n=1}^k a_n of partial sums of ∑_{n=1}^∞ a_n has a limit in R̄.

The sum of the series ∑_{n=1}^∞ a_n is then defined to be the limit of the partial sums:

lim_k s_k = lim_k ∑_{n=1}^k a_n = ∑_{n=1}^∞ a_n.

Remark 2.10. (i) The sum of the series may be −∞ or +∞.

(ii) Note that condition (a) above is needed to guarantee that the undefined expression ∞ − ∞ does not occur in any of the partial sums s_k = ∑_{n=1}^k a_n.

(iii) If ∞ is one of the terms of the series and −∞ is not, then the sum of the series is ∞. Likewise if we swap the roles of ∞ and −∞.


Chapter 2

Measurable Sets and Measurable Functions

3 σ-Algebras

Definition 3.1. Let X be any set. A σ-algebra on X is a collection A ⊆ P(X) that satisfies thefollowing properties.

(i) ∅ ∈ A

(ii) If A ∈ A, then Ac = X \A ∈ A.

(iii) If A1, A2, . . . ∈ A, then⋃∞i=1Ai ∈ A.

The pair (X,A) is called a measurable space. The sets in A are called measurable sets (orA-measurable sets to avoid ambiguity).

Remark: σ indicates “countable union”

Theorem 3.2. If A is a σ-algebra on a set X, then A has the following properties:

(a) X ∈ A

(b) If A_1, A_2, . . . ∈ A, then ⋂_{i=1}^∞ A_i ∈ A.

(c) If A_1, . . . , A_n ∈ A, then ⋃_{i=1}^n A_i ∈ A.

(d) If A_1, . . . , A_n ∈ A, then ⋂_{i=1}^n A_i ∈ A.

(e) If A, B ∈ A, then A \ B = A ∩ B^c ∈ A.

Proof. In this proof, (i),(ii),(iii) refer to the properties in the definition of a σ-algebra.

(a): By (i) and (ii), X = ∅c ∈ A.


(b): Let A_1, A_2, . . . ∈ A. By De Morgan’s laws,

⋂_{i=1}^∞ A_i = (⋃_{i=1}^∞ A_i^c)^c.

By (ii), A_1^c, A_2^c, . . . ∈ A. Then, by (iii), ⋃_{i=1}^∞ A_i^c ∈ A. Finally, by (ii) again, ⋂_{i=1}^∞ A_i = (⋃_{i=1}^∞ A_i^c)^c ∈ A.

(c): Define A_i = ∅ for i > n and apply (iii).

(d): Define A_i = X for i > n and apply (b). Alternatively, use De Morgan’s laws and apply (ii) and (c).

(e): Use (ii) and (d).

Example 3.3. Let X be any set. The following are σ-algebras on X.

(a) P(X)

(b) {∅, X}

(c) {∅, A0, Ac0, X} for any fixed A0 ⊆ X.

(d) {A ⊆ X : A is countable or Ac is countable}
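For a finite set X the σ-algebra axioms can be checked by brute force; the added sketch below does this for Example 3.3(c). For a finite collection, closure under countable unions reduces to closure under pairwise unions.

```python
from itertools import combinations

def is_sigma_algebra(X, collection):
    """Check the sigma-algebra axioms for a finite collection of subsets of X."""
    A = [frozenset(s) for s in collection]
    X = frozenset(X)
    if frozenset() not in A:
        return False                      # must contain the empty set
    if any(X - S not in A for S in A):
        return False                      # closed under complements
    if any(S | T not in A for S, T in combinations(A, 2)):
        return False                      # closed under (finite) unions
    return True

X = {1, 2, 3, 4}
A0 = {1, 2}
print(is_sigma_algebra(X, [set(), A0, X - A0, X]))  # True: Example 3.3(c)
print(is_sigma_algebra(X, [set(), {1}, X]))         # False: complement of {1} is missing
```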

Theorem 3.4. Let X be any set. The intersection of any collection of σ-algebras on X is aσ-algebra on X.

Proof. Let {A_j}_{j∈J} be any collection of σ-algebras on X and consider their intersection ⋂_{j∈J} A_j. We need to check properties (i), (ii), and (iii) of the definition of a σ-algebra.

(i): We have ∅ ∈ A_j for every j ∈ J. So ∅ ∈ ⋂_{j∈J} A_j.

(ii): Suppose A ∈ ⋂_{j∈J} A_j. Then A ∈ A_j for every j ∈ J. Since A_j is a σ-algebra for every j ∈ J, we have A^c ∈ A_j for every j ∈ J. So A^c ∈ ⋂_{j∈J} A_j.

(iii): Suppose A_1, A_2, . . . ∈ ⋂_{j∈J} A_j. Then A_1, A_2, . . . ∈ A_j for every j ∈ J. Since A_j is a σ-algebra for every j ∈ J, we have ⋃_{i=1}^∞ A_i ∈ A_j for every j ∈ J. So ⋃_{i=1}^∞ A_i ∈ ⋂_{j∈J} A_j.

Definition 3.5. Let X be any set and let G ⊆ P(X). The intersection of all σ-algebras on X that contain G is denoted by σ(G). In other words,

σ(G) = ⋂ {A : A is a σ-algebra on X, G ⊆ A}.

We call σ(G) the σ-algebra generated by G.

The following theorem is immediate from the definition of σ(G).

Theorem 3.6. Let X be any set and let G ⊆ P(X). Then σ(G) is the smallest σ-algebra on X that contains G; in other words,


(i) σ(G) is a σ-algebra on X

(ii) G ⊆ σ(G)

(iii) If A is any σ-algebra on X and G ⊆ A, then σ(G) ⊆ A.

Definition 3.7.

• An open ball in Rd is a set of the form B(x_0, r) = {x ∈ Rd : |x − x_0| < r}, where x_0 ∈ Rd and 0 < r < ∞. Note that the open balls in R are the open intervals of the form (a, b) with −∞ < a < b < ∞. To see this, note (a, b) = (x_0 − r, x_0 + r) = B(x_0, r), where x_0 = (b + a)/2 and r = (b − a)/2.

• An open set in Rd is a union of open balls.

Definition 3.8. Suppose X is R or Rd (or any topological space). The Borel σ-algebra on X,denoted by B(X), is the σ-algebra generated by the collection of all open subsets of X. In otherwords, if T denotes the collection of all open subsets of X, then B(X) = σ(T ). The elements ofB(X) are called Borel sets in X.

Theorem 3.9. The Borel σ-algebra B(Rd) is generated by the collection of all open balls in Rd.In other words, if G is the collection of all open balls in Rd, then B(Rd) = σ(G).

Proof. Let T be the collection of all open sets in Rd. Since B(Rd) = σ(T), we must show σ(G) = σ(T). Since G ⊆ T, we have σ(G) ⊆ σ(T) by Theorem 3.6. It remains to show σ(T) ⊆ σ(G). If A ∈ T, then A is a union of open balls in Rd. Thus, for each point x ∈ A, there is an open ball B_x that contains x and is contained in A. Now for each point x ∈ A choose an open ball B′_x with the following four properties:

(1) B′_x contains x,

(2) B′_x is contained in B_x and (consequently) in A,

(3) the center of B′_x is a point with rational coordinates,

(4) the radius of B′_x is rational.

Then A is the union of the collection of the open balls B′_x. But, since the open balls B′_x all have rational centers and radii, there can be only countably many of them. Thus A is the union of a countable collection of open balls. Hence A ∈ σ(G). Thus T ⊆ σ(G). Therefore σ(T) = σ(G).

Theorem 3.10. The Borel σ-algebra B(R) is generated by each of the following collections of sets:

(i) G1 = {(a, b) : a, b ∈ R, a < b}

(ii) G2 = {(a, b] : a, b ∈ R, a < b}

(iii) G3 = {(−∞, b] : b ∈ R}

(iv) G4 = {(−∞, b) : b ∈ R}

(v) G5 = {[a, b] : a, b ∈ R, a < b}

(vi) G6 = {[a, b) : a, b ∈ R, a < b}


(vii) G7 = {[a,∞) : a ∈ R}

(viii) G8 = {(a,∞) : a ∈ R}

Proof. (i): The previous theorem implies G_1 generates B(R), i.e., σ(G_1) = B(R).

(ii): By writing

(a, b] = ⋂_{n=1}^∞ (a, b + 1/n)

we see that G_2 ⊆ σ(G_1), hence σ(G_2) ⊆ σ(G_1). By writing

(a, b) = ⋃_{n=1}^∞ (a, b − 1/n]

we see that G_1 ⊆ σ(G_2), hence σ(G_1) ⊆ σ(G_2). This proves G_2 generates B(R).

(iii): By writing

(−∞, b] = ⋃_{n=1}^∞ (−n, b]

we see that G_3 ⊆ σ(G_2), hence σ(G_3) ⊆ σ(G_2). By writing

(a, b] = (−∞, b] ∩ (a, ∞) = (−∞, b] ∩ ((−∞, a])^c

we see that G_2 ⊆ σ(G_3), hence σ(G_2) ⊆ σ(G_3). This proves G_3 generates B(R).

(iv): By writing

(−∞, b) = ⋃_{n=1}^∞ (−∞, b − 1/n]

we see that G_4 ⊆ σ(G_3), hence σ(G_4) ⊆ σ(G_3). By writing

(−∞, b] = ⋂_{n=1}^∞ (−∞, b + 1/n)

we see that G_3 ⊆ σ(G_4), hence σ(G_3) ⊆ σ(G_4). This proves G_4 generates B(R).

(v),(vi),(vii),(viii): Similar.

Definition 3.11. Let B(R̄) denote the collection of all sets of the form A, A ∪ {−∞}, A ∪ {∞}, A ∪ {−∞, ∞}, where A is a Borel set in R. It is straightforward to check that B(R̄) is a σ-algebra on R̄. We call B(R̄) the Borel σ-algebra on R̄.

Remark 3.12. It is possible to define open sets in R̄ and to define the Borel σ-algebra on R̄ as the σ-algebra generated by the open sets in R̄. See Exercise X???

Theorem 3.13. The Borel σ-algebra B(R̄) is generated by each of the following collections of sets:

(i) G′3 = {[−∞, b] : b ∈ R}

(ii) G′4 = {[−∞, b) : b ∈ R}


(iii) G′7 = {[a,∞] : a ∈ R}

(iv) G′8 = {(a,∞] : a ∈ R}

Proof. (i) By the previous theorem,

G′_3 = {{−∞} ∪ (−∞, b] : b ∈ R} ⊆ B(R̄),

hence σ(G′_3) ⊆ B(R̄). Now we show B(R̄) ⊆ σ(G′_3). Let B ∈ B(R̄). Then B equals one of A or A ∪ {−∞} or A ∪ {∞} or A ∪ {−∞, ∞}, for some A ∈ B(R). To show that B ∈ σ(G′_3), it suffices to show that {−∞}, {∞} ∈ σ(G′_3) and B(R) ⊆ σ(G′_3). By writing

{−∞} = ⋂_{n=1}^∞ [−∞, −n]

we see that {−∞} ∈ σ(G′_3). By writing

{∞} = ⋂_{n=1}^∞ (n, ∞] = ⋂_{n=1}^∞ [−∞, n]^c

we see that {∞} ∈ σ(G′_3). Recall the definition of G_2 from the previous theorem:

G_2 = {(a, b] : a, b ∈ R, a < b}

By writing

(a, b] = [−∞, b] ∩ (a, ∞] = [−∞, b] ∩ [−∞, a]^c

we see that G_2 ⊆ σ(G′_3). Therefore σ(G_2) ⊆ σ(G′_3). It is tempting to cite the previous theorem and assert that B(R) = σ(G_2). But this is not true. Here σ(G′_3), and hence σ(G_2), is taken inside the σ-algebras on R̄, so σ(G_2) is the smallest σ-algebra on R̄ that contains G_2. But (by the previous theorem) B(R) is the smallest σ-algebra on R that contains G_2. We need to work a bit harder. We showed G_2 ⊆ σ(G′_3). Since every element of G_2 is a subset of R, we have

G_2 ⊆ {R ∩ B : B ∈ σ(G′_3)}.

The reader can verify that {R ∩ B : B ∈ σ(G′_3)} is a σ-algebra on R. Therefore, since B(R) is the smallest σ-algebra on R that contains G_2, we have

B(R) ⊆ {R ∩ B : B ∈ σ(G′_3)}.

For each B ∈ σ(G′_3),

R ∩ B = {−∞}^c ∩ {∞}^c ∩ B ∈ σ(G′_3),

and so

{R ∩ B : B ∈ σ(G′_3)} ⊆ σ(G′_3).

Therefore

B(R) ⊆ σ(G′_3).

(ii),(iii),(iv): Similar.


4 Measurable Functions

To motivate the definition of a measurable function, we ask the reader to recall the followingtheorem from undergraduate analysis about continuous functions.

Theorem 4.1. Let f : R→ R. Then f is continuous iff

f−1(V ) is an open set in R whenever V is an open set in R

If T is the collection of open sets in R, we can rewrite this as: f is continuous iff

f−1(V ) ∈ T whenever V ∈ T

The reader who has studied topology may recall the following more general theorem.

Theorem 4.2. Let X and Y be topological spaces and f : X → Y . Then f is continuous iff

f−1(V ) is an open set in X whenever V is an open set in Y

If TX is the collection of open sets in X and TY is the collection of open sets in Y , we can rewritethis as: f is continuous iff

f−1(V ) ∈ TX whenever V ∈ TY

In summary, a continuous function is a function whose inverse image preserves open sets.

Now we state the definition of a measurable function.

Definition 4.3. Let f : X → Y . Let A be a σ-algebra on X. Let B be a σ-algebra on Y . We saythat f is measurable (or (A,B)-measurable to avoid ambiguity) if

f−1(B) ∈ A for every B ∈ B.

Now we turn to some special cases. If f : X → R̄, the σ-algebra on R̄ is always assumed to be the Borel σ-algebra B(R̄), i.e., (Y, B) = (R̄, B(R̄)). We say f : X → R̄ is measurable (or A-measurable to avoid ambiguity) if

f^{-1}(B) ∈ A for every B ∈ B(R̄).

If f : Rd → R̄, we sometimes (but not always) consider the Borel σ-algebra on the domain Rd. We say f : Rd → R̄ is Borel measurable (or B(Rd)-measurable) if

f^{-1}(B) ∈ B(Rd) for every B ∈ B(R̄).

The next theorem says that to show a function f is measurable we only need to check f−1(B) forB in a generating collection.

Theorem 4.4. Let (X,A) and (Y,B) be measurable spaces. Let f : X → Y . Let G be anycollection of subsets of Y that generates B, i.e., σ(G) = B. Then f is measurable iff

f−1(B) ∈ A for every B ∈ G.


Proof. ⇒: Since G ⊆ σ(G) = B, if

f^{-1}(B) ∈ A for every B ∈ B,

then

f^{-1}(B) ∈ A for every B ∈ G.

⇐: It is readily verified that {B ⊆ Y : f^{-1}(B) ∈ A} is a σ-algebra. If

f^{-1}(B) ∈ A for every B ∈ G,

then {B ⊆ Y : f^{-1}(B) ∈ A} contains G. Thus B = σ(G) ⊆ {B ⊆ Y : f^{-1}(B) ∈ A}. So f^{-1}(B) ∈ A for every B ∈ B. Therefore f is measurable.

By combining the above theorem and Theorem 3.13 (which gives generating sets for B(R̄)), we obtain:

Theorem 4.5. Let (X,A) be a measurable space and let f : X → R̄. Then the following are equivalent:

(i) f is measurable

(ii) {x ∈ X : f(x) < b} = f−1([−∞, b)) ∈ A for every b ∈ R

(iii) {x ∈ X : f(x) ≤ b} = f−1([−∞, b]) ∈ A for every b ∈ R

(iv) {x ∈ X : f(x) > a} = f−1((a,∞]) ∈ A for every a ∈ R

(v) {x ∈ X : f(x) ≥ a} = f−1([a,∞]) ∈ A for every a ∈ R

Remark 4.6. When verifying a function f : X → R is measurable, we usually use the theoremabove, rather than the definition.

Theorem 4.7. Let (X,A) be a measurable space and let f : X → R. If f is measurable, then{x ∈ X : f(x) = c} = f−1({c}) ∈ A for every c ∈ R.

Proof.{x ∈ X : f(x) = c} = {x ∈ X : f(x) ≤ c} ∩ {x ∈ X : f(x) ≥ c} ∈ A.

Notation. For brevity, we write

{f < b} = {x ∈ X : f(x) < b} = f^{-1}([−∞, b)),

{f = c} = {x ∈ X : f(x) = c} = f−1({c}),

and so on.

Theorem 4.8. Let (X,A) be a measurable space. Let f, g : X → R be measurable functions.Then (i) {f < g}, (ii) {f ≤ g}, and (iii) {f = g} are measurable sets.


Proof. (i): For each fixed x ∈ X, we have f(x) < g(x) iff there is an r ∈ Q such that f(x) < r < g(x). Thus

{x ∈ X : f(x) < g(x)} = ⋃_{r∈Q} ({x ∈ X : f(x) < r} ∩ {x ∈ X : r < g(x)}) ∈ A.

(ii): {f ≤ g} = {g < f}c ∈ A by (i).

(iii): {f = g} = {f ≤ g} ∩ {f ≥ g} ∈ A by (ii).

Now we give some examples of measurable functions.

Theorem 4.9. Let (X,A) be a measurable space. Let f : X → R̄. If f is constant, then f is measurable.

Proof. Assume f is constant. This means there exists a c ∈ R̄ such that f(x) = c for all x ∈ X. Let a ∈ R. Then {x ∈ X : f(x) ≥ a} = ∅ if a > c, and {x ∈ X : f(x) ≥ a} = X if a ≤ c. Either way, {x ∈ X : f(x) ≥ a} ∈ A. Thus f is measurable.

Theorem 4.10. Let (X,A) be a measurable space. Let A ⊆ X. Then A is measurable (i.e.,A ∈ A) iff the indicator function 1A is measurable.

Proof. ⇒: Assume A ∈ A. For every a ∈ R,

{x ∈ X : 1_A(x) ≥ a} = ∅ if a > 1, = A if 0 < a ≤ 1, = X if a ≤ 0,

and in each case the set belongs to A. So 1_A is measurable.

⇐: Assume 1_A is measurable. Then A = {x ∈ X : 1_A(x) > 0} ∈ A.

Theorem 4.11. Let f : R→ R. If f is continuous, then f is Borel measurable.

Proof. Since f is continuous, f^{-1}(G) is open for every open set G ⊆ R. So {x ∈ R : f(x) < a} = f^{-1}((−∞, a)) is open for every a ∈ R. Therefore {x ∈ R : f(x) < a} = f^{-1}((−∞, a)) ∈ B(R) for every a ∈ R.

Theorem 4.12. Let f : R→ R. If f is increasing or decreasing, then f is Borel measurable.

Proof. Exercise.

Theorem 4.13. Let (X,A), (Y,B), (Z, C) be measurable spaces. Let f : X → Y and g : Y → Z. If f is (A,B)-measurable and g is (B, C)-measurable, then the function g ◦ f : X → Z is (A, C)-measurable.

Proof. Exercise.


Theorem 4.14. Let (X,A) be a measurable space. Let f, g : X → R̄ be measurable functions. Then the following functions are measurable:

(i) f + g, if f(x) + g(x) is defined for every x ∈ X.

(ii) fg

(iii) f/g, if g(x) 6= 0 for every x ∈ X.

Proof. Let a ∈ R be arbitrary.

(i):

{f + g < a} = {f < −g + a} = ⋃_{r∈Q} ({f < r} ∩ {r < −g + a}) = ⋃_{r∈Q} ({f < r} ∩ {g < −r + a}) ∈ A.

(ii): {fg < a} = A_0 ∪ A_1 ∪ A_2 ∪ A_3 ∪ A_4, where the A_i are as follows:

A_0 = ({f = 0} ∪ {g = 0}) ∩ {fg < a}
A_1 = {f > 0} ∩ {g > 0} ∩ {fg < a}
A_2 = {f > 0} ∩ {g < 0} ∩ {fg < a}
A_3 = {f < 0} ∩ {g > 0} ∩ {fg < a}
A_4 = {f < 0} ∩ {g < 0} ∩ {fg < a}

We have:

A_0 = ({f = 0} ∪ {g = 0}) ∩ {fg < a} = ∅ if a ≤ 0, and A_0 = {f = 0} ∪ {g = 0} if a > 0; in either case A_0 ∈ A.

A_1 = {f > 0} ∩ {g > 0} ∩ {fg < a}
    = {f > 0, g > 0, fg < a}
    = {f > 0, g > 0, f < a/g}
    = ⋃_{r∈Q, r>0} ({f > 0, g > 0, f < r} ∩ {f > 0, g > 0, r < a/g})
    = ⋃_{r∈Q, r>0} ({f > 0, g > 0, f < r} ∩ {f > 0, g > 0, g < a/r})
    = {f > 0} ∩ {g > 0} ∩ ⋃_{r∈Q, r>0} ({f < r} ∩ {g < a/r}) ∈ A


A_2 = {f > 0} ∩ {g < 0} ∩ {fg < a}
    = {f > 0, g < 0, fg < a}
    = {f > 0, g < 0, f(−g) > −a}
    = {f > 0, g < 0, f > (−a)/(−g)}
    = ⋃_{r∈Q, r>0} ({f > 0, g < 0, f > r} ∩ {f > 0, g < 0, r > (−a)/(−g)})
    = ⋃_{r∈Q, r>0} ({f > 0, g < 0, f > r} ∩ {f > 0, g < 0, −g > (−a)/r})
    = ⋃_{r∈Q, r>0} ({f > 0, g < 0, f > r} ∩ {f > 0, g < 0, g < a/r})
    = {f > 0} ∩ {g < 0} ∩ ⋃_{r∈Q, r>0} ({f > r} ∩ {g < a/r}) ∈ A

Similarly, A_3, A_4 ∈ A. Therefore {fg < a} ∈ A.

(iii): Since g(x) ≠ 0 for every x ∈ X,

{f/g < a} = ({g > 0} ∩ {f < ag}) ∪ ({g < 0} ∩ {f > ag}) ∈ A

by (ii) and the previous theorem.

Corollary 4.15. Let (X,A) be a measurable space. Let f, g : X → R be measurable functionsand let c ∈ R. Then cf and f + g are measurable functions.

Proof. By Theorem 4.9 and Theorem 4.14(ii), cf is measurable. By Theorem 4.14(i), f + g ismeasurable.

Theorem 4.16. Let (X,A) be a measurable space. Let f, g : X → R̄ be measurable functions. The following functions are measurable:

(i) max {f, g}

(ii) min {f, g}

(iii) |f |

Proof. (i): {max {f, g} > a} = {f > a} ∪ {g > a} ∈ A

(ii): {min {f, g} > a} = {f > a} ∩ {g > a} ∈ A

(iii): {|f | < a} = {−a < f < a} = {f > −a}∩{f < a} ∈ A. Alternatively, write |f | = max(f,−f).

Theorem 4.17. Let (X,A) be a measurable space. Let (f_n) be a sequence of measurable functions X → R̄. The following functions are measurable:

(i) supn fn

(ii) infn fn


(iii) lim sup fn = infk≥1(supn≥k fn)

(iv) lim inf fn = supk≥1(infn≥k fn)

Proof. (i): For each a ∈ R,

{sup_n f_n > a} = {x ∈ X : sup_n f_n(x) > a} = ⋃_{n=1}^∞ {x ∈ X : f_n(x) > a} ∈ A.

(ii): Similar to (i).

(iii): By (i), g_k = sup_{n≥k} f_n is measurable for each k ∈ N. By (ii), lim sup f_n = inf_{k≥1} g_k is measurable.

(iv): Similar to (iii).

Corollary 4.18. Let (X,A) be a measurable space. Let (f_n) be a sequence of measurable functions X → [−∞,∞]. If f_n(x) converges to an extended real number for each x ∈ X, then lim_n f_n = lim sup_n f_n = lim inf_n f_n, and so lim_n f_n is measurable.


Chapter 3

Measures and Integrals

5 Measures

Definition 5.1. Let A be a σ-algebra on a set X. A measure on A is a function µ : A → [0,∞] that has the following properties:

(a) µ(∅) = 0

(b) (Countable Additivity) If E_1, E_2, . . . is a sequence of disjoint sets in A, then µ(⋃_{i=1}^∞ E_i) = ∑_{i=1}^∞ µ(E_i).

If µ is a measure on a σ-algebra A on X, the triple (X,A, µ) is called a measure space.

Theorem 5.2 (Properties of Measures). Let A be a σ-algebra on a set X. If µ is a measure on A,then µ has the following properties:

(i) (Finite Additivity) If A,B are disjoint sets in A, then µ(A ∪B) = µ(A) + µ(B).

(ii) (Subtraction) If A,B ∈ A, A ⊆ B, and µ(A) <∞, then µ(B \A) = µ(B)− µ(A).

(iii) (Monotonicity) If A,B ∈ A and A ⊆ B, then µ(A) ≤ µ(B).

(iv) (Countable Subadditivity) If A_1, A_2, . . . ∈ A, then µ(⋃_{i=1}^∞ A_i) ≤ ∑_{i=1}^∞ µ(A_i).

(v) (Continuity From Below) If A_1, A_2, . . . ∈ A and A_1 ⊆ A_2 ⊆ . . ., then µ(⋃_{i=1}^∞ A_i) = lim_n µ(A_n).

(vi) (Continuity From Above) If A_1, A_2, . . . ∈ A and A_1 ⊇ A_2 ⊇ . . ., and µ(A_1) < ∞, then µ(⋂_{i=1}^∞ A_i) = lim_n µ(A_n).

Proof. (i): Define A_1 = A, A_2 = B, and A_i = ∅ for i > 2 and use countable additivity.

(ii): Since B = (B ∩A) ∪ (B \A) = A ∪ (B \A), finite additivity gives

µ(B) = µ(A) + µ(B \A).

Since µ(A) <∞, we can subtract it from both sides to get the desired result.


(iii): Since B = (B ∩ A) ∪ (B \ A) = A ∪ (B \ A), finite additivity and the fact that µ(B \ A) ≥ 0 give

µ(B) = µ(A) + µ(B \ A) ≥ µ(A).

(iv): Define B_1 = A_1, B_2 = A_2 \ A_1, B_3 = A_3 \ (A_1 ∪ A_2), and so on. Then B_1, B_2, . . . is a sequence of disjoint sets in A such that ⋃_{i=1}^∞ B_i = ⋃_{i=1}^∞ A_i and B_i ⊆ A_i for all i ∈ N. Therefore, by countable additivity and monotonicity,

µ(⋃_{i=1}^∞ A_i) = µ(⋃_{i=1}^∞ B_i) = ∑_{i=1}^∞ µ(B_i) ≤ ∑_{i=1}^∞ µ(A_i).

(v): Define B_1 = A_1, B_2 = A_2 \ A_1, B_3 = A_3 \ A_2, and so on. Then B_1, B_2, . . . is a sequence of disjoint sets in A such that ⋃_{i=1}^∞ B_i = ⋃_{i=1}^∞ A_i and ⋃_{i=1}^n B_i = ⋃_{i=1}^n A_i = A_n for all n ∈ N. Therefore, by countable additivity and finite additivity,

µ(⋃_{i=1}^∞ A_i) = µ(⋃_{i=1}^∞ B_i) = ∑_{i=1}^∞ µ(B_i) = lim_{n→∞} ∑_{i=1}^n µ(B_i) = lim_{n→∞} µ(⋃_{i=1}^n B_i) = lim_{n→∞} µ(⋃_{i=1}^n A_i) = lim_{n→∞} µ(A_n).

(vi): Define B_1 = A_1 \ A_1 = ∅, B_2 = A_1 \ A_2, B_3 = A_1 \ A_3, and so on. Then B_1, B_2, . . . ∈ A and B_1 ⊆ B_2 ⊆ . . .. By (v),

µ(⋃_{i=1}^∞ B_i) = lim_{n→∞} µ(B_n).

Since ⋃_{i=1}^∞ B_i = A_1 \ ⋂_{i=1}^∞ A_i and B_n = A_1 \ A_n, we have

µ(A_1 \ ⋂_{i=1}^∞ A_i) = lim_{n→∞} µ(A_1 \ A_n).

By monotonicity, µ(⋂_{i=1}^∞ A_i) ≤ µ(A_n) ≤ µ(A_1) < ∞. Then (ii) gives

µ(A_1) − µ(⋂_{i=1}^∞ A_i) = lim_{n→∞} (µ(A_1) − µ(A_n)).

Subtracting µ(A_1) from both sides and then multiplying by −1 gives the desired result.

Example 5.3. Let X be any set.

(a) The counting measure on X is the measure µ : P(X) → [0,∞] defined by µ(A) = ∞ if Ais an infinite subset of X and µ(A) equals the number of elements in A if A is a finite subsetof X.

(b) Let x_0 ∈ X. The Dirac measure at x_0 or point mass at x_0 is the measure µ : P(X) → [0,∞] defined by µ(A) = 0 if x_0 ∉ A and µ(A) = 1 if x_0 ∈ A.


(c) If A is any σ-algebra on X, the function µ defined by µ(A) = 0 for every A ∈ A is a measureon A. It is called the zero measure or trivial measure on A.

(d) If A is any σ-algebra on X, the function µ defined by µ(∅) = 0 and µ(A) = ∞ for every non-empty set A ∈ A is a measure on A.

As an exercise, the reader should verify that these are indeed measures.
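For finite sets, the measures in Example 5.3 are easy to implement directly; the added sketch below realizes the counting measure and a Dirac measure as functions on subsets and spot-checks finite additivity on a pair of disjoint sets.

```python
def counting_measure(A):
    """mu(A) = number of elements of A (for finite A; infinite sets would get infinity)."""
    return len(A)

def dirac(x0):
    """Point mass at x0: mu(A) = 1 if x0 in A, else 0."""
    return lambda A: 1 if x0 in A else 0

mu = counting_measure
delta = dirac(3)

A, B = {1, 2}, {3, 4, 5}                      # disjoint sets
assert mu(A | B) == mu(A) + mu(B)             # finite additivity
assert delta(A | B) == delta(A) + delta(B)
print(mu(A | B), delta(A), delta(B))          # 5 0 1
```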

The most important measure is the Lebesgue measure on (R, B(R)). Later, we will see how to define the Lebesgue measure. For now, we state without proof the following theorem that characterizes it.

Theorem 5.4. There is a unique measure λ on (R,B(R)) such that

λ ((a, b]) = b− a

for every a, b ∈ R, a ≤ b. It is called the Lebesgue measure on (R, B(R)).

We will also consider the Lebesgue measure on (Rd,B(Rd)). The following theorem characterizesit.

Theorem 5.5. There is a unique measure λd on (Rd,B(Rd)) such that

λ_d((a_1, b_1] × · · · × (a_d, b_d]) = (b_1 − a_1) · · · (b_d − a_d)

for every a_i, b_i ∈ R, a_i ≤ b_i. It is called the Lebesgue measure on (Rd, B(Rd)).


6 Integral of Non-Negative Simple Functions

We define the integral in stages. We start by defining the integral for the class of simple functionsand establishing its basic properties.

Definition 6.1. A function s : X → R is called simple if its range s(X) = {s(x) : x ∈ X} is afinite set.

Lemma 6.2. If s : X → R is a simple function whose range consists of the distinct numbers c_1, c_2, . . . , c_n, then

s = ∑_{i=1}^n c_i 1_{E_i}, where E_i = s^{-1}({c_i}) = {x ∈ X : s(x) = c_i}.   (6.1)

We call (6.1) the standard representation of s. Note E_1, . . . , E_n are disjoint sets and ⋃_{i=1}^n E_i = X. Note also that s ≥ 0 iff c_i ≥ 0 for all 1 ≤ i ≤ n.

DRAW A GRAPH HERE

Lemma 6.3. Let (X,A) be a measurable space. Let s : X → R be a simple function whose range consists of the distinct numbers c_1, c_2, . . . , c_n. Then s is measurable iff each set E_i = s^{-1}({c_i}) in its standard representation is measurable.

Proof. If each E_i is measurable, then each 1_{E_i} is measurable, so s = ∑_{i=1}^n c_i 1_{E_i} is measurable.

Conversely, if s is measurable, then

E_i = {x ∈ X : s(x) = c_i} = ⋂_{k=1}^∞ {x ∈ X : c_i − 1/k < s(x) < c_i + 1/k}

is measurable for each i = 1, . . . , n.

Definition 6.4. Let (X,A, µ) be a measure space. If s : X → R is a non-negative measurable simple function with standard representation

s = ∑_{i=1}^n c_i 1_{E_i},

then the integral of s with respect to µ (or the µ-integral of s) is

∫ s dµ = ∑_{i=1}^n c_i µ(E_i).

For the sum, remember the convention 0 · ∞ = 0. Note that ∫ s dµ may equal ∞. When there is no ambiguity, we write ∫ s instead of ∫ s dµ. Alternatively, it is sometimes convenient to write ∫ s(x) dµ(x) instead of ∫ s dµ.
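On a finite measure space Definition 6.4 becomes a finite weighted sum; the added sketch below computes ∫ s dµ by grouping points according to the value of s (the sets E_i of the standard representation), for a measure given by weights on singletons.

```python
from collections import defaultdict

def integral_of_simple(s, weights):
    """Integral of a non-negative simple function s with respect to the measure
    mu(A) = sum of weights[x] for x in A, on the finite space X = weights.keys().
    s is given as a dict x -> s(x)."""
    mu_of_level = defaultdict(float)
    for x, value in s.items():
        mu_of_level[value] += weights[x]          # accumulates mu(E_i) for each value c_i
    return sum(c * m for c, m in mu_of_level.items())

weights = {"a": 0.5, "b": 2.0, "c": 1.5}          # a measure on X = {a, b, c}
s = {"a": 3.0, "b": 0.0, "c": 3.0}                # simple: values {0, 3}, E = {a, c}
print(integral_of_simple(s, weights))             # 3 * (0.5 + 1.5) + 0 * 2.0 = 6.0
```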

The next theorem says the integral is linear and monotone for non-negative measurable simplefunctions.


Theorem 6.5. Let (X,A, µ) be a measure space. Let s and t be non-negative measurable simple functions.

(a) If c ∈ [0,∞), then ∫ cs = c ∫ s.

(b) ∫ (s + t) = ∫ s + ∫ t

(c) If s ≤ t, then ∫ s ≤ ∫ t.

Proof. Let s = ∑_{i=1}^m a_i 1_{A_i} and t = ∑_{j=1}^n b_j 1_{B_j} be the standard representations of s and t. So A_i = {x : s(x) = a_i} and B_j = {x : t(x) = b_j}. Note that A_i = ⋃_{j=1}^n (A_i ∩ B_j) for each fixed i = 1, . . . , m. Likewise B_j = ⋃_{i=1}^m (A_i ∩ B_j) for each fixed j = 1, . . . , n. Finally, note that A_i ∩ B_j is disjoint from A_{i′} ∩ B_{j′} whenever (i, j) ≠ (i′, j′).

(a): If c ∈ (0,∞), then A_i = {x : cs(x) = c a_i}, and so the standard representation of cs is

cs = ∑_{i=1}^m c a_i 1_{A_i}.

Therefore

∫ cs = ∑_{i=1}^m c a_i µ(A_i) = c ∑_{i=1}^m a_i µ(A_i) = c ∫ s.

If c = 0, then cs = 0, and its standard representation is cs = 0 · 1_X. Thus

∫ cs = 0 · µ(X) = 0 = 0 · ∫ s = c ∫ s.

(b): By the finite additivity of µ,

∫ s + ∫ t = ∑_{i=1}^m a_i µ(A_i) + ∑_{j=1}^n b_j µ(B_j)
          = ∑_{i=1}^m ∑_{j=1}^n a_i µ(A_i ∩ B_j) + ∑_{i=1}^m ∑_{j=1}^n b_j µ(A_i ∩ B_j)
          = ∑_{i=1}^m ∑_{j=1}^n (a_i + b_j) µ(A_i ∩ B_j).

Note s + t is a simple function because its range is contained in the finite set

{a_i + b_j : 1 ≤ i ≤ m, 1 ≤ j ≤ n}.

Let c_1, . . . , c_ℓ be the distinct numbers in this set. For each k ∈ {1, . . . , ℓ}, let

E_k = (s + t)^{-1}({c_k}) = {x ∈ X : s(x) + t(x) = c_k}.

Then

E_k = ⋃_{(i,j) : a_i + b_j = c_k} (A_i ∩ B_j).

In other words, E_k is the union of those sets A_i ∩ B_j such that a_i + b_j = c_k. So the standard representation of s + t is

s + t = ∑_{k=1}^ℓ c_k 1_{E_k}.

Then, by the finite additivity of µ,

∫ (s + t) = ∑_{k=1}^ℓ c_k µ(E_k)
          = ∑_{k=1}^ℓ ∑_{(i,j) : a_i + b_j = c_k} c_k µ(A_i ∩ B_j)
          = ∑_{i=1}^m ∑_{j=1}^n (a_i + b_j) µ(A_i ∩ B_j).

Comparing the formulas above for ∫ s + ∫ t and ∫ (s + t) shows they are equal.

(c): If s ≤ t, then for every x ∈ A_i ∩ B_j we have

a_i = s(x) ≤ t(x) = b_j.

Thus a_i ≤ b_j whenever A_i ∩ B_j is non-empty. Then, by the finite additivity of µ,

∫ s = ∑_{i=1}^m a_i µ(A_i) = ∑_{i=1}^m ∑_{j=1}^n a_i µ(A_i ∩ B_j) ≤ ∑_{i=1}^m ∑_{j=1}^n b_j µ(A_i ∩ B_j) = ∑_{j=1}^n b_j µ(B_j) = ∫ t.


7 Integral of Non-Negative Extended Real-Valued Functions

Now we extend the definition of the integral to the class of non-negative functions.

Definition 7.1. Let (X,A, µ) be a measure space. Let f : X → [0,∞] be a measurable function. The integral of f with respect to µ (or the µ-integral of f) is defined by

∫ f dµ = sup {∫ s dµ : s simple measurable, 0 ≤ s ≤ f}.   (7.1)

Note that ∫ f dµ may equal ∞. When there is no ambiguity, we write ∫ f instead of ∫ f dµ. Alternatively, it is sometimes convenient to write ∫ f(x) dµ(x) instead of ∫ f dµ.

Remark 7.2. We must check that this definition of the integral agrees with the earlier one when f is simple. We temporarily use ∮ to denote the integral defined in Definition 6.4. Then the formula for the integral in Definition 7.1 becomes

∫ f dµ = sup {∮ s dµ : s simple measurable, 0 ≤ s ≤ f}.

If f : X → [0,∞] is a measurable simple function, then ∮ f is an element of the set on the right. Moreover, for any simple measurable function s with 0 ≤ s ≤ f, Theorem 6.5 implies ∮ s ≤ ∮ f, and so ∮ f is the maximum of the set on the right. Since the supremum of a set equals the maximum whenever the maximum exists, we have

∫ f dµ = sup {∮ s dµ : s simple measurable, 0 ≤ s ≤ f} = ∮ f dµ.

The next theorem establishes that the integral is monotone for non-negative functions.

Theorem 7.3. Let (X,A, µ) be a measure space. Let f, g : X → [0,∞] be measurable functions. If f ≤ g, then ∫ f ≤ ∫ g.

Proof. If s is a simple measurable function such that 0 ≤ s ≤ f, then 0 ≤ s ≤ g. Thus

{∫ s dµ : s simple measurable, 0 ≤ s ≤ f} ⊆ {∫ s dµ : s simple measurable, 0 ≤ s ≤ g}.

Taking supremums gives the desired inequality.

The next theorem is the first of the fundamental convergence theorems in Lebesgue’s theory ofintegration.

Theorem 7.4. (Monotone Convergence Theorem) Let (X,A, µ) be a measure space. Let f_n : X → [0,∞] (n = 1, 2, . . .) be a sequence of measurable functions. If f_n increases to f pointwise (meaning that f_1(x) ≤ f_2(x) ≤ . . . and f(x) = lim_n f_n(x) = sup_n f_n(x) for every x ∈ X), then f is measurable and

lim_n ∫ f_n = ∫ f.


Proof. Since f is the pointwise limit of a sequence of measurable functions, Corollary 4.18 implies f is measurable. Since f_1 ≤ f_2 ≤ . . . ≤ f, we have ∫ f_1 ≤ ∫ f_2 ≤ . . . ≤ ∫ f, and so lim_n ∫ f_n ≤ ∫ f.

We must prove the reverse inequality. Let 0 < c < 1 be given. Let s be a simple measurable function such that 0 ≤ s ≤ f. For every n ∈ N, define

E_n = {x ∈ X : c s(x) ≤ f_n(x)}.

For every n ∈ N, we have c s 1_{E_n} ≤ f_n, and so

c ∫ s 1_{E_n} ≤ ∫ f_n.   (7.2)

We want to take the limit n → ∞ in (7.2). Suppose the standard representation of s is s = ∑_{i=1}^k a_i 1_{A_i}. Then s 1_{E_n} is a simple function and its standard representation is s 1_{E_n} = ∑_{i=1}^k a_i 1_{E_n ∩ A_i}. Therefore

∫ s = ∑_{i=1}^k a_i µ(A_i)

and

∫ s 1_{E_n} = ∑_{i=1}^k a_i µ(E_n ∩ A_i).

Note E_1 ⊆ E_2 ⊆ . . . and

X = ⋃_{n=1}^∞ E_n.

(To see the last equality, consider an arbitrary x ∈ X. If f(x) = 0 or s(x) = 0, then x ∈ E_1. If f(x) > 0 and s(x) > 0, then c s(x) < f(x), hence there exists n ∈ N such that c s(x) < f_n(x) ≤ f(x), and so x ∈ E_n.) Let i ∈ {1, . . . , k} be arbitrary. We have E_1 ∩ A_i ⊆ E_2 ∩ A_i ⊆ . . . and

A_i = ⋃_{n=1}^∞ (E_n ∩ A_i).

By continuity from below,

lim_{n→∞} µ(E_n ∩ A_i) = µ(A_i).

Therefore

lim_n ∫ s 1_{E_n} = lim_n ∑_{i=1}^k a_i µ(E_n ∩ A_i) = ∑_{i=1}^k a_i µ(A_i) = ∫ s.

Thus taking n → ∞ in (7.2) gives

c ∫ s ≤ lim_n ∫ f_n.

Since 0 < c < 1 is arbitrary,

∫ s ≤ lim_n ∫ f_n.

Since s is an arbitrary simple measurable function with 0 ≤ s ≤ f, we have

∫ f ≤ lim_n ∫ f_n.
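On a finite measure space the monotone convergence theorem can be watched numerically; the sketch below (an added illustration with arbitrary data) uses f_n = min(f, n), which increases pointwise to f, and the discrete integral ∫ g dµ = ∑_x g(x) µ({x}).

```python
# Finite measure space X = {0, 1, 2, 3} with weights mu({x}).
weights = {0: 0.2, 1: 0.3, 2: 0.4, 3: 0.1}
f = {0: 1.0, 1: 5.0, 2: 12.0, 3: 30.0}        # a non-negative function on X

def integral(g):
    return sum(g[x] * weights[x] for x in weights)

def f_n(n):
    # f_n = min(f, n): increases pointwise to f as n grows.
    return {x: min(f[x], n) for x in f}

for n in (1, 5, 20, 50):
    print(n, integral(f_n(n)))
print("limit:", integral(f))   # the integrals of f_n increase to the integral of f
```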


The next theorem concerns approximation of functions by simple functions. It is frequently usedto produce a sequence of functions to which the monotone convergence theorem can be applied.Notice the theorem does not involve a measure µ.

Theorem 7.5. (Simple Approximation Theorem) Let X be any set. Let f : X → R̄. Then there exists a sequence (s_n) of functions s_n : X → R such that

(i) s1, s2, . . . are simple functions.

(ii) |s1| ≤ |s2| ≤ . . . ≤ |f |.

(iii) limk→∞ sk(x) = f(x) for each x ∈ X.

(iv) If f is bounded, then sk → f uniformly on X.

(v) If f ≥ 0, then 0 ≤ s1 ≤ s2 ≤ . . . ≤ f .

(vi) If (X,A) is a measurable space and f is measurable, then s1, s2, . . . are measurable.

Proof. Let k ∈ N. Decompose [−∞,∞] as the following union of subintervals:

[−∞,∞] = [−∞,−k] ∪ (−k, 0] ∪ [0, k) ∪ [k,∞]

Decompose [−∞,∞] further by dividing (−k, 0] and [0, k) up into intervals of length 1/2^k. So we get

[−∞,∞] = [−∞,−k] ∪ ⋃_{m=1}^{k2^k} (−m/2^k, −(m−1)/2^k] ∪ ⋃_{m=1}^{k2^k} [(m−1)/2^k, m/2^k) ∪ [k,∞].

For each x ∈ X, consider the subinterval to which f(x) belongs and define s_k(x) to be the endpoint of that subinterval which is closest to 0. More precisely,

s_k(x) = k if f(x) ∈ [k,∞],
s_k(x) = (m−1)/2^k if f(x) ∈ [(m−1)/2^k, m/2^k), m ∈ {1, . . . , k2^k},
s_k(x) = −(m−1)/2^k if f(x) ∈ (−m/2^k, −(m−1)/2^k], m ∈ {1, . . . , k2^k},
s_k(x) = −k if f(x) ∈ [−∞,−k].

Figure 3.1 illustrates the definition of sk and sk+1.

(i): Since sk takes only finitely many values, sk is simple.

(ii): No matter which subinterval f(x) belongs to, s_k(x) rounds f(x) to a number closer to zero than does s_{k+1}(x). See Figure 3.1.

(iii): The key observation is that if f(x) ∈ (−k, k), then

|f(x) − s_k(x)| ≤ 1/2^k.

Here are the details. Let x ∈ X. If f(x) = ∞, then s_k(x) = k for every k ∈ N, and so lim_{k→∞} s_k(x) = ∞ = f(x). If f(x) = −∞, then s_k(x) = −k for every k ∈ N, and so lim_{k→∞} s_k(x) = −∞ = f(x). If f(x) ∈ (−∞,∞), there exists M > 0 such that f(x) ∈ (−M, M). Let ε > 0. Choose K ∈ N such that 1/2^K < ε and K ≥ M. For every k ≥ K, we have f(x) ∈ (−k, k), and so

|f(x) − s_k(x)| ≤ 1/2^k ≤ 1/2^K < ε.

Therefore lim_{k→∞} s_k(x) = f(x).

(iv): If f is bounded, then there exists M > 0 such that f(x) ∈ (−M, M) for every x ∈ X. Let ε > 0. Choose K ∈ N such that 1/2^K < ε and K ≥ M. For every k ≥ K and every x ∈ X, we have f(x) ∈ (−k, k), and so

|f(x) − s_k(x)| ≤ 1/2^k ≤ 1/2^K < ε.

Therefore s_k → f uniformly on X.

(v): If f(x) ≥ 0, then sk(x) ≥ 0 by definition. The rest follows from (ii).

(vi): For k ∈ N and m ∈ {1, . . . , k2^k}, define

F_k = {x : f(x) ∈ [k,∞]} = f^{-1}([k,∞]),

F_{−k} = {x : f(x) ∈ [−∞,−k]} = f^{-1}([−∞,−k]),

E_m = {x : f(x) ∈ [(m−1)/2^k, m/2^k)} = f^{-1}([(m−1)/2^k, m/2^k)),

E_{−m} = {x : f(x) ∈ (−m/2^k, −(m−1)/2^k]} = f^{-1}((−m/2^k, −(m−1)/2^k]).

Then

s_k = k 1_{F_k} + (−k) 1_{F_{−k}} + ∑_{m=1}^{k2^k} ((m−1)/2^k) 1_{E_m} + ∑_{m=1}^{k2^k} (−(m−1)/2^k) 1_{E_{−m}}.


If f is measurable, then all the sets Fk, F−k, Em, E−m are measurable, and so sk is measurable.
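The construction of s_k translates directly into code; the added sketch below evaluates s_k(x) for a value y = f(x) in [−∞, ∞], following the case analysis in the proof.

```python
import math

def s_k(y, k):
    """The k-th simple approximation from the Simple Approximation Theorem,
    applied to a value y = f(x) in [-inf, inf]."""
    if y >= k:
        return float(k)
    if y <= -k:
        return float(-k)
    if y >= 0:
        # y lies in [(m-1)/2^k, m/2^k); take the endpoint closer to 0.
        return math.floor(y * 2 ** k) / 2 ** k
    # y lies in (-m/2^k, -(m-1)/2^k]; take the endpoint closer to 0.
    return -math.floor(-y * 2 ** k) / 2 ** k

for k in (1, 2, 3, 6):
    print(k, s_k(math.pi, k), s_k(-math.pi, k), s_k(math.inf, k))
# s_k(pi, k) increases to pi, |s_k| <= |f|, and s_k(inf, k) = k for every k.
```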

Now we combine the monotone convergence theorem and the simple approximation theorem to prove the linearity of the integral for non-negative functions.

Theorem 7.6. Let (X,A, µ) be a measure space. Let f, g : X → [0,∞] be measurable functions.

(a) If c ∈ [0,∞], then ∫ cf = c ∫ f.

(b) ∫ (f + g) = ∫ f + ∫ g

Proof. (a): Assume c ∈ [0,∞]. Choose a sequence (c_n) in [0,∞) such that c_n increases to c. By Theorem 7.5, there is a sequence of non-negative measurable simple functions (s_n) such that s_n increases to f. Then (c_n s_n) is a sequence of non-negative measurable simple functions that increases to cf. By the monotone convergence theorem and Theorem 6.5(a),

∫ cf = lim_n ∫ c_n s_n = lim_n c_n ∫ s_n = (lim_n c_n) · (lim_n ∫ s_n) = c ∫ f.

(b): By Theorem 7.5, there are sequences of non-negative measurable simple functions (s_n) and (t_n) such that s_n increases to f and t_n increases to g. Then (s_n + t_n) is a sequence of non-negative measurable simple functions that increases to f + g. By the monotone convergence theorem and Theorem 6.5(b),

∫ (f + g) = lim_n ∫ (s_n + t_n) = lim_n ∫ s_n + lim_n ∫ t_n = ∫ f + ∫ g.

Remark 7.7. In the exercises, you will be asked to prove the previous theorem without using themonotone convergence theorem or the simple approximation theorem.


8 Integral of Extended Real-Valued Functions

Finally, we extend the definition of the integral to extended real-valued (R̄-valued) functions.

Definition 8.1. Let f : X → R̄. The positive part of f is the function

f+ = max {0, f} .

The negative part of f is the function

f− = max {0,−f} .

INSERT PICTURE HERE

The following lemma summarizes the important properties of the positive and negative part of afunction.

Lemma 8.2. Let f : X → R̄.

(a) f+ ≥ 0 and f− ≥ 0

(b) f = f+ − f−

(c) |f | = f+ + f−

(d) f^+(x) = (1/2)(|f|(x) + f(x)) whenever f(x) ≠ −∞

(e) f^−(x) = (1/2)(|f|(x) − f(x)) whenever f(x) ≠ ∞

(f) For any measurable space (X,A), f is measurable iff f+ and f− are measurable.

Proof. (a)-(e) are immediate from the definitions, and (f) follows from Theorem 4.14 and Theorem 4.16.
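The positive/negative part decomposition is easy to sanity-check numerically; the added sketch below verifies (b) and (c) of Lemma 8.2 for a few sample values, including ±∞.

```python
def f_plus(y):   # positive part: max{0, y}
    return max(0.0, y)

def f_minus(y):  # negative part: max{0, -y}
    return max(0.0, -y)

for y in (3.5, -2.0, 0.0, float("inf"), float("-inf")):
    assert f_plus(y) - f_minus(y) == y        # f = f+ - f-  (never inf - inf)
    assert f_plus(y) + f_minus(y) == abs(y)   # |f| = f+ + f-
    print(y, f_plus(y), f_minus(y))
```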

Definition 8.3. Let (X,A, µ) be a measure space. Let f : X → R̄ be a measurable function. If both ∫ f^+ dµ and ∫ f^− dµ are finite, we say that f is integrable (or integrable with respect to µ or µ-integrable), and we define the integral of f (or integral of f with respect to µ or µ-integral of f) to be

∫ f dµ = ∫ f^+ dµ − ∫ f^− dµ.

If exactly one of ∫ f^+ dµ and ∫ f^− dµ is finite, we still define the integral of f with respect to µ (or µ-integral of f) to be

∫ f dµ = ∫ f^+ dµ − ∫ f^− dµ,

but we do not say that f is integrable.

If ∫ f^+ dµ = ∫ f^− dµ = ∞, the integral ∫ f dµ is not defined.

Note f is integrable iff ∫ f dµ is defined and finite.

Theorem 8.4. Let (X,A, µ) be a measure space. Let f : X → R̄ be a measurable function.


(a) f is integrable iff |f| is integrable.

(b) If ∫ f is defined, then |∫ f| ≤ ∫ |f|.

Remark. (b) is a triangle inequality for integrals.

Proof. (a): f is integrable iff ∫ f = ∫ f^+ − ∫ f^− is finite iff ∫ f^+ and ∫ f^− are finite iff ∫ |f| = ∫ (f^+ + f^−) = ∫ f^+ + ∫ f^− is finite iff |f| is integrable.

(b):

|∫ f| = |∫ f^+ − ∫ f^−| ≤ |∫ f^+| + |∫ f^−| = ∫ f^+ + ∫ f^− = ∫ |f|.

Corollary 8.5. Let (X,A, µ) be a measure space. Let f, g : X → R̄ be measurable functions. If |f| ≤ |g| and g is integrable, then f is integrable.

Proof. Note ∫ |f| ≤ ∫ |g| < ∞ and apply the previous theorem.

Theorem 8.6. Let (X,A, µ) be a measure space. Let f, g : X → R be integrable functions (which also means they are measurable).

(a) If f ≤ g, then ∫ f ≤ ∫ g.

(b) If c ∈ R, then ∫ cf = c ∫ f.

(c) ∫ (f + g) = ∫ f + ∫ g

Proof. (a): Assume f ≤ g. Then f^+ ≤ g^+ and f^− ≥ g^−. Therefore Theorem 7.3 gives

∫ f dµ = ∫ f^+ dµ − ∫ f^− dµ ≤ ∫ g^+ dµ − ∫ g^− dµ = ∫ g dµ.

(b): Exercise.

(c): Note (f + g)^+ ≤ f^+ + g^+. Then Theorem 7.3 and Theorem 7.6(b) imply

∫ (f + g)^+ ≤ ∫ (f^+ + g^+) = ∫ f^+ + ∫ g^+ < ∞.

Likewise ∫ (f + g)^− < ∞. Therefore f + g is integrable. Note

(f + g)^+ − (f + g)^− = f + g = f^+ − f^− + g^+ − g^−.

Since f^−, g^−, and (f + g)^− are finite, we can add them to both sides to get

(f + g)^+ + f^− + g^− = (f + g)^− + f^+ + g^+.


Then Theorem 7.6(b) gives

∫(f + g)+ + ∫f− + ∫g− = ∫(f + g)− + ∫f+ + ∫g+.

Since ∫f−, ∫g−, and ∫(f + g)− are finite, we can subtract them from both sides to get

∫(f + g)+ − ∫(f + g)− = ∫f+ − ∫f− + ∫g+ − ∫g−.

Therefore

∫(f + g) = ∫f + ∫g.

We can partially extend the previous theorem to R-valued functions.

Theorem 8.7. Let (X,A, µ) be a measure space. Let f, g : X → R be measurable functions. Assume ∫f and ∫g are defined.

(a) If f ≤ g, then ∫f ≤ ∫g.

(b) If c ∈ R, then ∫cf = c∫f.

(c) ∫(f + g) = ∫f + ∫g, provided that we do not have ∫f = −∫g = ∞ or ∫f = −∫g = −∞, and provided that for all x ∈ X we do not have f(x) = −g(x) = ∞ or f(x) = −g(x) = −∞.

Proof. Exercise.


9 Almost Everywhere

Definition 9.1. Let (X,A, µ) be a measure space. We say a property P holds almost everywhere or for almost every x ∈ X if there exists a set A such that A ∈ A, µ(Ac) = 0, and P holds for every x ∈ A. Thus “almost everywhere” means “everywhere except on a set of measure zero.” For brevity, we often write “a.e.” instead of “almost everywhere” and “almost every.” The concept of “almost everywhere” depends on the measure µ. So, when clarity demands it, we write “µ-a.e.” or “µ-almost everywhere” instead of “a.e.” and “almost everywhere.”

Example 9.3. Let (X,A, µ) be a measure space. If f, g : X → R and there exists A ∈ A such that µ(Ac) = 0 and f(x) = g(x) for every x ∈ A, then we say that f = g a.e.

Example 9.4. Let (X,A, µ) be a measure space. If (fn) is a sequence of functions fn : X → R and there exists A ∈ A such that µ(Ac) = 0 and (fn(x)) converges for each x ∈ A, then we say that (fn) converges a.e.
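For a first sanity check on the definition: if µ is counting measure on (X,P(X)), then µ(Ac) = 0 forces Ac = ∅, so “almost everywhere” simply means “everywhere”; in particular f = g a.e. iff f(x) = g(x) for every x ∈ X. In general, the more null sets µ has, the weaker an “a.e.” statement is.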

Observation. P a.e. if and only if there exists a set A such that A ∈ A, µ(Ac) = 0, and A ⊆ {x ∈ X : P holds at x}, if and only if there exists a set N such that N ∈ A, µ(N) = 0, and {x ∈ X : P does not hold at x} ⊆ N. For the second equivalence, take N = Ac.

Observation. If the set {x ∈ X : P holds at x} is in A, then:

(i) P a.e. is true iff µ({x ∈ X : P does not hold at x}) = 0.

(ii) P a.e. is false iff µ({x ∈ X : P does not hold at x}) > 0.

To see this take A = {x ∈ X : P holds at x} in the definition.

Observation. If P1 and P2 are two properties, then P1 a.e. and P2 a.e. iff P1 and P2 a.e. If P1, P2, . . . is a sequence of properties, then Pi holds a.e. for all i ∈ N iff Pi for all i ∈ N holds a.e. This is because a countable union of sets of measure 0 is a set of measure 0.

Remark. Almost everywhere convergence is not a topological convergence; there is no topology that can be placed on the set of measurable functions for which convergence in the topology is equivalent to convergence almost everywhere.

Theorem 9.5. Let (X,A, µ) be a measure space. If f : X → R is an integrable function, then µ({x ∈ X : |f(x)| = ∞}) = 0, hence f is finite a.e.

We give two proofs.


Proof 1. Let n ∈ N. Define En = {x ∈ X : n ≤ |f(x)|}. Then {x ∈ X : |f(x)| = ∞} ⊆ En and 1En ≤ (1/n)|f|. Therefore

µ({x ∈ X : |f(x)| = ∞}) ≤ µ(En) = ∫1En ≤ (1/n)∫|f|.

Since ∫|f| is finite by assumption, letting n → ∞ gives µ({x ∈ X : |f(x)| = ∞}) = 0.

Proof 2. Define A = {x ∈ X : |f(x)| < ∞}. Note A ∈ A and |f| = |f|1A + ∞·1Ac. Then

∫|f| = ∫|f|1A + ∫|f|1Ac = ∫|f|1A + ∞·µ(Ac).

If f is integrable, then ∫|f| < ∞, which forces µ(Ac) = 0, hence f is finite a.e.

Theorem 9.6. Let (X,A, µ) be a measure space. Let f : X → [0,∞] be a measurable function. Then ∫f = 0 iff f = 0 a.e.

We give two proofs.

Proof 1. Case 1: f is simple. Let f = ∑_{i=1}^n ci1Ei be the standard representation of f. So ci ≥ 0 and Ei = f−1({ci}) for each 1 ≤ i ≤ n. Then ∫f = ∑_{i=1}^n ciµ(Ei) = 0 iff for every 1 ≤ i ≤ n either ci = 0 or µ(Ei) = 0 iff f = ∑_{i=1}^n ci1Ei = 0 a.e.

Case 2: f not simple. Assume f = 0 a.e. For every measurable simple function s with 0 ≤ s ≤ f, we have s = 0 a.e., hence ∫s = 0 by Case 1. Therefore

∫f = sup {∫s : s measurable simple, 0 ≤ s ≤ f} = sup {0} = 0.

Conversely, assume ∫f = 0. Note {x ∈ X : f(x) > 0} = ⋃_{n=1}^∞ En, where En = {x ∈ X : f(x) ≥ 1/n}. For every n ∈ N, we have 1En ≤ nf, and so

µ(En) = ∫1En ≤ n∫f = 0.

Therefore

µ({x ∈ X : f(x) > 0}) ≤ ∑_{n=1}^∞ µ(En) = 0.

Thus f = 0 a.e.

Proof 2. Assume f = 0 a.e. Then there exists A ∈ A such that µ(Ac) = 0 and f(x) = 0 for all x ∈ A. Therefore

f = f1A + f1Ac = f1Ac ≤ ∞·1Ac.

Thus

∫f ≤ ∫∞·1Ac = ∞·µ(Ac) = 0.


Conversely, assume ∫f = 0. Note {x ∈ X : f(x) > 0} = ⋃_{n=1}^∞ En, where En = {x ∈ X : f(x) ≥ 1/n}. For every n ∈ N, we have 1En ≤ nf, and so

µ(En) = ∫1En ≤ n∫f = 0.

Therefore

µ({x ∈ X : f(x) > 0}) ≤ ∑_{n=1}^∞ µ(En) = 0.

Thus f = 0 a.e.

Corollary 9.7. Let (X,A, µ) be a measure space. Let f, g : X → R be integrable functions. Then f = g a.e. iff ∫|f − g| = 0.

Proof. f = g a.e. iff f − g = 0 a.e. iff |f − g| = 0 a.e. iff ∫|f − g| = 0. The first equivalence uses that f and g are finite a.e., which follows from Theorem 9.5. The last equivalence comes from Theorem 9.6.

Theorem 9.8. Let (X,A, µ) be a measure space. Let f, g : X → R be measurable functions. Assume both ∫f and ∫g are defined.

(a) If f ≤ g a.e., then ∫f ≤ ∫g.

(b) If f = g a.e., then ∫f = ∫g.

Proof. (a): Case 1: f, g : X → [0,∞]. Assume f ≤ g a.e. Then there exists A ∈ A such that µ(Ac) = 0 and f(x) ≤ g(x) for all x ∈ A. So f1Ac = 0 a.e. and f1A ≤ g everywhere. Thus

∫f = ∫(f1A + f1Ac) = ∫f1A + ∫f1Ac = ∫f1A ≤ ∫g.

Case 2: f, g : X → R. Assume f ≤ g a.e. Then f+ ≤ g+ a.e. and g− ≤ f− a.e. Therefore, by Case 1, ∫f+ ≤ ∫g+ and ∫g− ≤ ∫f−. Thus

∫f = ∫f+ − ∫f− ≤ ∫g+ − ∫g− = ∫g.

(b): Note that f = g a.e. iff f ≤ g a.e. and g ≤ f a.e. Then apply (a) twice.

Corollary 9.9. Let (X,A, µ) be a measure space. Let f, g : X → R be measurable functions.

(a) If |f| ≤ |g| a.e. and g is integrable, then f is integrable and ∫|f| ≤ ∫|g|.

(b) If f = g a.e. and g is integrable, then f is integrable and ∫f = ∫g.

Proof. (a): By the previous theorem, we have ∫|f| ≤ ∫|g| < ∞. Thus f is integrable.

(b): If f = g a.e., then |f| = |g| a.e., and in particular |f| ≤ |g| a.e. Then (a) implies f is integrable. The previous theorem implies ∫f = ∫g.


Definition 9.10. Let (X,A, µ) be a measure space. Let f : X → R. If f is measurable, E ∈ A, and ∫f1Edµ is defined, we define

∫E fdµ = ∫f1Edµ.

With this notation, ∫fdµ = ∫X fdµ.

Theorem 9.11. Let (X,A, µ) be a measure space. Let f, g : X → R be integrable functions. Then f = g a.e. iff ∫E f = ∫E g for every E ∈ A.

Proof. Exercise.


10 Fatou’s Lemma and Dominated Convergence Theorem

Lemma 10.1 (Fatou’s Lemma). Let (X,A, µ) be a measure space. If fn : X → [0,∞] (n = 1, 2, . . .) is a sequence of measurable functions, then

∫ lim inf fn ≤ lim inf ∫fn.

Proof. Define gk = inf_{n≥k} fn for each k ∈ N. Then (gk) is an increasing sequence of measurable functions and lim_k gk = lim inf fn. By the monotone convergence theorem,

∫ lim inf fn = lim_k ∫gk.

Note gk ≤ fn whenever n ≥ k. So ∫gk ≤ ∫fn whenever n ≥ k. Then, for every k ∈ N, ∫gk is a lower bound for the set {∫fn : n ≥ k}. Thus ∫gk ≤ inf_{n≥k} ∫fn for every k ∈ N. Therefore

∫ lim inf fn = lim_k ∫gk ≤ lim_k inf_{n≥k} ∫fn = lim inf ∫fn.
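The inequality in Fatou’s lemma can be strict. For example, if E1, E2, . . . ∈ A are disjoint sets with µ(En) = 1 for every n and fn = 1En, then each x ∈ X lies in at most one En, so lim inf fn = 0 and ∫ lim inf fn = 0, while ∫fn = µ(En) = 1 for every n, so lim inf ∫fn = 1.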

Theorem 10.2. (Dominated Convergence Theorem) Let (X,A, µ) be a measure space. Let (fn) be a sequence of measurable functions fn : X → R. Let f : X → R be a measurable function. Let g : X → [0,∞] be an integrable function. Suppose that the following properties hold a.e.: |fn| ≤ g for every n ∈ N and fn → f. Then f is integrable, fn is integrable for every n, and

lim_n ∫fn = ∫f.

Proof. We will need the following properties of lim inf that the reader should verify: If c ∈ R and (an) is a sequence in R, then

lim inf(c+ an) = c+ lim inf an

lim inf(−an) = − lim sup an

Let us first assume that the following properties hold everywhere:

g <∞, |fn| ≤ g for every n ∈ N, and fn → f .

Since fn → f pointwise and |fn| ≤ g for all n, we have |f| ≤ g. Since g is integrable, Corollary 8.5 implies f is integrable and fn is integrable for each n.


Since |f − fn| ≤ 2g, we have 0 ≤ 2g − |f − fn|. By Fatou’s lemma,

∫2g = ∫ lim_n (2g − |f − fn|)
    = ∫ lim inf_n (2g − |f − fn|)
    ≤ lim inf_n ∫(2g − |f − fn|)
    = lim inf_n (∫2g − ∫|f − fn|)
    = ∫2g + lim inf_n (−∫|f − fn|)
    = ∫2g − lim sup_n ∫|f − fn|.

Since ∫2g is finite, we can subtract it to obtain

lim sup_n ∫|f − fn| ≤ 0.

Therefore we have

0 ≤ lim inf_n ∫|f − fn| ≤ lim sup_n ∫|f − fn| ≤ 0.

Thus

lim_n ∫|f − fn| = lim sup_n ∫|f − fn| = lim inf_n ∫|f − fn| = 0.

Furthermore,

lim_n |∫fn − ∫f| = lim_n |∫(fn − f)| ≤ lim_n ∫|f − fn| = 0,

whence

lim_n ∫fn = ∫f.

Now let us only assume that the following properties hold almost everywhere:

g <∞, |fn| ≤ g for every n ∈ N, and fn → f .

(In fact, since g is integrable, we already have g < ∞ a.e. because of Theorem 9.5, so we don’t need to assume it.) Therefore there exists a set A ∈ A with µ(Ac) = 0 such that

g(x) <∞, |fn(x)| ≤ g(x) for every n ∈ N, and fn(x)→ f(x)

for every x ∈ A. Then the following hold everywhere:

g1A <∞, |fn1A| ≤ g1A for every n ∈ N, and fn1A → f1A.

The functions g1A, f1A, fn1A are measurable because g, f, fn, 1A are all measurable. Since g is integrable and g1A = g a.e., Corollary 9.9 implies g1A is integrable. Therefore, by the first part of the proof, f1A is integrable, fn1A is integrable for each n, and

lim_n ∫fn1A = ∫f1A.


Then since f1A = f a.e. and fn1A = fn a.e. for each n, Corollary 9.9 implies f is integrable, fn is integrable for each n, and

lim_n ∫fn = lim_n ∫fn1A = ∫f1A = ∫f.
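A quick numerical illustration of the theorem (a sketch in Python; NumPy and SciPy are assumed available, and the choice fn(x) = x^n on [0, 1], dominated by g ≡ 1, is only an example): fn → 0 pointwise on [0, 1) and ∫fn = 1/(n + 1) → 0 = ∫ lim fn.

import numpy as np
from scipy.integrate import quad

# g = 1 dominates every fn on [0, 1] and is integrable there
for n in (1, 5, 20, 100):
    fn = lambda x, n=n: x**n             # |fn| <= 1 and fn -> 0 pointwise on [0, 1)
    integral, _ = quad(fn, 0.0, 1.0)     # numerical approximation of the integral of fn
    print(n, integral)                   # the values 1/(n+1) tend to 0, as the theorem predicts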

The next theorem is an application of the dominated convergence theorem. It concerns interchanging limits and derivatives with integrals.

Theorem 10.3. Let (X,A, µ) be a measure space. Let g : X → [0,∞] be an integrable function. Let f : X × [a, b] → R be such that f( · , t) : X → R is integrable for each t ∈ [a, b]. Define

F(t) = ∫f(x, t)dµ(x)

for each t ∈ [a, b]. Let c ∈ [a, b].

(a) If lim_{t→c} f(x, t) = f(x, c) for all x ∈ X and |f(x, t)| ≤ g(x) for all x ∈ X, t ∈ [a, b], then

lim_{t→c} F(t) = F(c),

i.e.,

lim_{t→c} ∫X f(x, t)dµ(x) = ∫ lim_{t→c} f(x, t)dµ(x).

(b) If ∂f/∂t(x, t) exists for all x ∈ X, t ∈ [a, b] and |∂f/∂t(x, t)| ≤ g(x) for all x ∈ X, t ∈ [a, b], then

F′(c) = ∫ ∂f/∂t(x, c)dµ(x),

i.e.,

(d/dt ∫f(x, t)dµ(x))(c) = ∫ ∂f/∂t(x, c)dµ(x).

Proof. We need the following fact about limits that the reader should verify: lim_{x→c} h(x) = L iff lim_{n→∞} h(xn) = L for every sequence (xn) with xn ≠ c and xn → c. This fact will allow us to apply the dominated convergence theorem.

(a): Exercise.

(b): Let (tn) be any sequence of numbers in [a, b] such that tn ≠ c for all n and tn → c. Define

fn(x) = (f(x, tn) − f(x, c))/(tn − c)

for each x ∈ X. Then fn is measurable and lim_n fn(x) = ∂f/∂t(x, c). By the mean value theorem,

|fn(x)| ≤ sup_{t∈[a,b]} |∂f/∂t(x, t)| ≤ g(x)


for all x ∈ X. By the dominated convergence theorem,

lim_n (F(tn) − F(c))/(tn − c) = lim_n (1/(tn − c)) (∫f(x, tn)dµ(x) − ∫f(x, c)dµ(x))
                             = lim_n ∫ (f(x, tn) − f(x, c))/(tn − c) dµ(x)
                             = lim_n ∫fn(x)dµ(x)
                             = ∫ ∂f/∂t(x, c)dµ(x).

Since (tn) is arbitrary, the limit

F′(c) = lim_{t→c} (F(t) − F(c))/(t − c)

exists, and

F′(c) = ∫ ∂f/∂t(x, c)dµ(x).

Exercise. State and prove a stronger version of this theorem where the hypotheses are required to hold only almost everywhere.
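Here is a numerical sketch of part (b) (Python, with NumPy and SciPy assumed; the integrand f(x, t) = e^(−t x²) on X = [0, 1] is an illustrative choice, and |∂f/∂t| = x²e^(−t x²) ≤ 1 gives an integrable dominating function): a symmetric difference quotient of F(t) = ∫₀¹ e^(−t x²) dx should agree with ∫₀¹ ∂f/∂t(x, c) dx.

import numpy as np
from scipy.integrate import quad

def F(t):
    # F(t) = integral over [0, 1] of exp(-t x^2) dx
    return quad(lambda x: np.exp(-t * x**2), 0.0, 1.0)[0]

c, h = 2.0, 1e-6
difference_quotient = (F(c + h) - F(c - h)) / (2 * h)                      # approximates F'(c)
integral_of_partial = quad(lambda x: -x**2 * np.exp(-c * x**2), 0.0, 1.0)[0]

print(difference_quotient, integral_of_partial)   # the two numbers agree up to numerical error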


Chapter 4

Spaces of Functions

11 Normed Spaces and Banach Spaces

We assume the reader is familiar with the definition of a vector space. Here are some examples.

Example 11.1. (i) Rd is a vector space with scalar field R.

(ii) Cd is a vector space with scalar field C.

(iii) If X is a set and V is a vector space with scalar field K, then the set V^X of all functions from X to V is a vector space over K. Addition and scalar multiplication are defined pointwise: If f, g : X → V and c ∈ K, then f + g and cf are defined as usual by (f + g)(x) = f(x) + g(x) and (cf)(x) = cf(x) for all x ∈ X.

(iv) Let (X,A) be a measurable space. The set M(X) of measurable functions from X to R is a vector space with scalar field R. Indeed, Corollary 4.15 implies this set is closed under addition and scalar multiplication, so it is a subspace of the vector space of all functions from X to R.

Definition 11.2. A norm on a vector space V with scalar field K (where K is R or C) is a function ‖ · ‖ : V → [0,∞) such that the following conditions are satisfied for all x, y ∈ V and c ∈ K:

(i) ‖x‖ = 0 if and only if x = 0 (Definiteness)

(ii) ‖cx‖ = |c|‖x‖ (Homogeneity)

(iii) ‖x+ y‖ ≤ ‖x‖+ ‖y‖ (Triangle Inequality)

The pair (V, ‖ · ‖) is called a normed space. When the norm is clear from context, we write V instead of (V, ‖ · ‖).

Example 11.3. (i) For each x = (x1, . . . , xd) ∈ Rd, define

‖x‖2 = √(|x1|² + · · · + |xd|²) = (∑_{i=1}^d |xi|²)^{1/2}.

This is the Euclidean norm or 2-norm on Rd.


(ii) Let 1 ≤ p < ∞. For each x = (x1, . . . , xd) ∈ Rd, define

‖x‖p = (∑_{i=1}^d |xi|^p)^{1/p}.

This is the p-norm on Rd.

(iii) For each x = (x1, . . . , xd) ∈ Rd, define

‖x‖∞ = max_{1≤i≤d} |xi|.

This is called the max-norm or ∞-norm on Rd.
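A short computational illustration of these norms (Python, NumPy assumed; the vector is arbitrary): numpy.linalg.norm computes ‖x‖p for finite p and the max-norm for p = ∞, and ‖x‖p decreases toward ‖x‖∞ as p grows.

import numpy as np

x = np.array([3.0, -4.0, 1.0])

for p in (1, 2, 4, 10, np.inf):
    print(p, np.linalg.norm(x, ord=p))   # the p-norms decrease toward the max-norm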

In later sections, we will see examples of norms on vector spaces of functions.

For the rest of this section, assume V is a normed space with scalar field K, where K is either R or C.

Definition 11.4. Let (V, ‖ · ‖) be a normed space. A sequence (xn) in V is called convergent if there exists a point x ∈ V with the following property: For every ε > 0 there exists a positive integer N such that for every integer n ≥ N we have ‖xn − x‖ < ε. In such case, x is called the limit of (xn), we say that (xn) converges to x, and we write limn xn = x or xn → x. A sequence which is not convergent is called divergent.

Theorem 11.5. Let (V, ‖ · ‖) be a normed space. Let x ∈ V and let (xn) be a sequence in V. Then xn → x iff ‖xn − x‖ → 0.

Theorem 11.6. Let (V, ‖ · ‖) be a normed space. If (xn) is a convergent sequence in V, then (xn) has a unique limit. In other words, if (xn) is a convergent sequence in V such that xn → x ∈ V and xn → y ∈ V, then x = y. (This justifies saying “the limit” rather than “a limit” in the definition.)

Theorem 11.7. Let (V, ‖ · ‖) be a normed space. If (xn) and (yn) are convergent sequences in V and c ∈ K, then

(a) limn(xn + yn) = limn xn + limn yn

(b) limn cxn = c limn xn

(c) limn ‖xn‖ = ‖limn xn‖

Definition 11.8. Let (V, ‖ · ‖) be a normed space. A sequence (xn) in V is called Cauchy if for every ε > 0 there exists a positive integer N such that for all integers m,n ≥ N we have ‖xm − xn‖ < ε.

Theorem 11.9. In a normed space V , every convergent sequence is Cauchy.

Definition 11.10. Let (V, ‖ · ‖) be a normed space. A set E ⊆ V is called complete if every Cauchy sequence in E converges and its limit is in E. If V is complete, V is called a Banach space.

Example 11.11. Cn and Rn are complete, but Qn is not.


12 Lp: Definitions and Basic Properties

Definition 12.1. Let (X,A, µ) be a measure space. Let 1 ≤ p < ∞. For each measurable function f : X → R, define the Lp-norm (or p-norm) of f to be

‖f‖p = (∫|f(x)|^p dµ(x))^{1/p}

with the convention that (∞)^{1/p} = ∞ for 1 ≤ p < ∞.

Define the Lp space to be

Lp = {f : f : X → R, f measurable, ‖f‖p < ∞}

with the convention that functions in Lp are viewed as equal iff they are equal a.e. That is, f = g in Lp iff f = g a.e. (To be precise, Lp is the set of equivalence classes of measurable functions f : X → R such that ‖f‖p < ∞, where the equivalence relation is “equal almost everywhere.”)

When necessary for clarity, we may use the following more descriptive notations:

Lp = Lp(µ) = Lp(X) = Lp(X,A, µ) = Lp(X,A, µ,R)

‖ · ‖p = ‖ · ‖Lp = ‖ · ‖Lp(µ) = ‖ · ‖Lp(X) = ‖ · ‖Lp(X,A,µ) = ‖ · ‖Lp(X,A,µ,R)
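For a concrete feel for the definition, here is a sketch (Python, NumPy/SciPy assumed; the function and exponents are illustrative choices) approximating ‖f‖p for f(x) = x^(−1/4) on X = (0, 1] with Lebesgue measure. Here f is unbounded, yet f ∈ Lp exactly when p/4 < 1, i.e. p < 4.

import numpy as np
from scipy.integrate import quad

def lp_norm(f, p):
    # approximate (integral over (0,1] of |f|^p dx)^(1/p); the singularity at 0 is integrable for p < 4
    integral, _ = quad(lambda x: abs(f(x))**p, 0.0, 1.0)
    return integral**(1.0 / p)

f = lambda x: x**(-0.25)
for p in (1, 2, 3, 3.9):
    print(p, lp_norm(f, p))   # finite for each p < 4, and growing as p approaches 4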

Theorem 12.2. Let (X,A, µ) be a measure space. For each 1 ≤ p < ∞, Lp is a vector space and ‖ · ‖p is a norm on Lp.

The rest of this section is devoted to the proof of Theorem 12.2. We will show that Lp is closed under addition and scalar multiplication (which implies that Lp is a vector subspace of the space of all measurable functions), and that ‖ · ‖p satisfies the three properties in the definition of a norm (Definition 11.2): definiteness, homogeneity, and the triangle inequality.

Remark. We always assume Lp is equipped with the norm ‖ · ‖p. Whenever we talk about convergence in Lp, we are talking about convergence with respect to this norm.

In the following theorem, we prove the definiteness and homogeneity of the Lp norm, and we show that Lp is closed under scalar multiplication.

Theorem 12.3. Let (X,A, µ) be a measure space. Let 1 ≤ p < ∞. Let f : X → R be a measurable function.

(a) ‖f‖p = 0 iff f = 0 a.e. iff f = 0 in Lp

(b) If c ∈ R, then ‖cf‖p = |c|‖f‖p.

(c) If c ∈ R and f ∈ Lp, then cf ∈ Lp.

Proof. (a): By Theorem 9.6, ‖f‖p = 0 iff ∫|f|^p = 0 iff |f| = 0 a.e. iff f = 0 a.e. The second equivalence in (a) is the convention from the definition of Lp.


(b):

‖cf‖p^p = ∫|cf|^p = |c|^p ∫|f|^p = |c|^p ‖f‖p^p.

(c): Immediate from (b).

To prove the triangle inequality for the Lp norm and that Lp is closed under addition, we first establish:

Theorem 12.4. (Holder’s Inequality). Let (X,A, µ) be a measure space. Let f, g : X → R be measurable functions. Let 1 < p < ∞ and let 1 < q < ∞ be such that p⁻¹ + q⁻¹ = 1 (equivalently, q = p/(p − 1)). We call q the conjugate exponent of p. Then

‖fg‖1 ≤ ‖f‖p‖g‖q

The proof of Holder’s inequality will use the following lemma.

Lemma 12.5. (Young’s Inequality) If a ≥ 0, b ≥ 0, p > 0, q > 0, and p⁻¹ + q⁻¹ = 1, then

ab ≤ a^p/p + b^q/q.

Proof. Fix b, p, q and consider the function h(a) = p⁻¹a^p + q⁻¹b^q − ab for a ∈ [0,∞). The desired inequality is equivalent to h(a) ≥ 0. So it will suffice to show that the minimum value of h(a) is 0. Note h′(a) = a^{p−1} − b. So h′(a) > 0 if a > b^{1/(p−1)} and h′(a) < 0 if a < b^{1/(p−1)}. Thus the minimum value of h(a) occurs at a = b^{1/(p−1)} and is equal to

h(b^{1/(p−1)}) = p⁻¹(b^{1/(p−1)})^p + q⁻¹b^q − (b^{1/(p−1)})b = (p⁻¹ + q⁻¹)b^q − b^q = 0.

For the second equality we used that p/(p − 1) = 1 + 1/(p − 1) = q, which is equivalent to p⁻¹ + q⁻¹ = 1.

Proof of Theorem 12.4 (Holder’s Inequality). We consider three cases.

Case 1: ‖f‖p‖g‖q =∞. The desired inequality is clear.

Case 2: ‖f‖p‖g‖q = 0. Then ‖f‖p = 0 or ‖g‖q = 0. So f = 0 a.e. or g = 0 a.e. Either way, we have fg = 0 a.e. Hence ‖fg‖1 = 0 and the desired inequality holds.

Case 3: 0 < ‖f‖p‖g‖q < ∞. Set a = |f(x)|/‖f‖p and b = |g(x)|/‖g‖q, and apply the lemma to get

|f(x)g(x)| / (‖f‖p‖g‖q) ≤ |f(x)|^p / (p‖f‖p^p) + |g(x)|^q / (q‖g‖q^q).

Integrating gives

‖fg‖1 / (‖f‖p‖g‖q) ≤ 1/p + 1/q = 1.
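As a quick check in the special case of counting measure on {1, . . . , d} (a Python sketch, NumPy assumed; the random vectors are illustrative), Holder’s inequality reads ∑|xᵢyᵢ| ≤ (∑|xᵢ|^p)^{1/p}(∑|yᵢ|^q)^{1/q}.

import numpy as np

rng = np.random.default_rng(0)
x, y = rng.normal(size=50), rng.normal(size=50)

p = 3.0
q = p / (p - 1.0)                        # conjugate exponent, so that 1/p + 1/q = 1

lhs = np.sum(np.abs(x * y))              # ||fg||_1 for counting measure
rhs = np.linalg.norm(x, p) * np.linalg.norm(y, q)
print(lhs, rhs, lhs <= rhs)              # Holder's inequality: lhs <= rhs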


The next theorem is the triangle inequality for Lp norms when 1 ≤ p < ∞. It also establishes that Lp is closed under addition.

Theorem 12.6. (Minkowski’s Inequality). Let (X,A, µ) be a measure space. Let f, g : X → R be measurable functions. Let 1 ≤ p < ∞. Then

‖f + g‖p ≤ ‖f‖p + ‖g‖p

Consequently, if f, g ∈ Lp, then f + g ∈ Lp.

Proof. If p = 1, then, since |f + g| ≤ |f| + |g|, we have

‖f + g‖1 = ∫|f + g| ≤ ∫|f| + ∫|g| = ‖f‖1 + ‖g‖1.

Assume 1 < p <∞. We consider three cases.

Case 1: ‖f + g‖p = 0. The desired inequality is clear.

Case 2: ‖f + g‖p = ∞. Note

|f + g|^p ≤ (|f| + |g|)^p ≤ (2 max {|f|, |g|})^p = 2^p max {|f|^p, |g|^p} ≤ 2^p(|f|^p + |g|^p).

Integrating gives

‖f + g‖p^p ≤ 2^p(‖f‖p^p + ‖g‖p^p).

Therefore, if ‖f + g‖p = ∞, we must have ‖f‖p = ∞ or ‖g‖p = ∞, so either way the desired inequality holds.

Case 3: 0 < ‖f + g‖p < ∞. Note

|f + g|^p = |f + g||f + g|^{p−1} ≤ |f||f + g|^{p−1} + |g||f + g|^{p−1}.

Integrating gives

‖f + g‖p^p ≤ ‖ |f||f + g|^{p−1} ‖1 + ‖ |g||f + g|^{p−1} ‖1.

Applying Holder’s inequality separately to the two terms on the right implies

‖f + g‖p^p ≤ ‖f‖p ‖(f + g)^{p−1}‖q + ‖g‖p ‖(f + g)^{p−1}‖q,

where p⁻¹ + q⁻¹ = 1. But

‖(f + g)^{p−1}‖q = (∫|f + g|^{(p−1)q})^{1/q} = (∫|f + g|^p)^{(p−1)/p} = ‖f + g‖p^{p−1}.

Therefore the inequality above is

‖f + g‖p^p ≤ (‖f‖p + ‖g‖p)‖f + g‖p^{p−1}.

Dividing by ‖f + g‖p^{p−1} gives the desired result.


13 Lp: Completeness

Theorem 13.1. Let (X,A, µ) be a measure space. For every 1 ≤ p <∞, Lp is complete.

Proof. Let (fn) be a Cauchy sequence in Lp. Inductively choose positive integers n1 < n2 < · · · such that

‖f_{n_{j+1}} − f_{n_j}‖p < 2^{−j}    (13.1)

for all j. Then (f_{n_k})_{k=1}^∞ is a subsequence of (fn). For every x ∈ X and k ∈ N,

f_{n_k}(x) = f_{n_1}(x) + ∑_{j=1}^{k−1} (f_{n_{j+1}}(x) − f_{n_j}(x)),    (13.2)

so the terms of the sequence (f_{n_k}(x)) are the partial sums of the series

f_{n_1}(x) + ∑_{j=1}^∞ (f_{n_{j+1}}(x) − f_{n_j}(x)).    (13.3)

We will show that this series converges a.e. This will show that the sequence (f_{n_k}) converges a.e.

Let us consider the series of absolute values. Define

g_k = |f_{n_1}| + ∑_{j=1}^{k−1} |f_{n_{j+1}} − f_{n_j}|,

g = |f_{n_1}| + ∑_{j=1}^∞ |f_{n_{j+1}} − f_{n_j}|.

By (13.1), we have

‖g_k‖p ≤ ‖f_{n_1}‖p + ∑_{j=1}^k ‖f_{n_{j+1}} − f_{n_j}‖p ≤ ‖f_{n_1}‖p + ∑_{j=1}^k 2^{−j} < ‖f_{n_1}‖p + 1 < ∞.

Since g_k increases to g, it follows that g_k^p increases to g^p, and so the monotone convergence theorem implies

∫g^p = lim_k ∫g_k^p = lim_k ‖g_k‖p^p ≤ (‖f_{n_1}‖p + 1)^p < ∞.

So g^p is integrable. By Theorem 9.5, g^p is finite a.e. Therefore g is finite a.e. Thus the series

|f_{n_1}(x)| + ∑_{j=1}^∞ |f_{n_{j+1}}(x) − f_{n_j}(x)|

converges in R for a.e. x ∈ X. This means that the series (13.3) converges absolutely in R for a.e. x ∈ X. Since every absolutely convergent series of real numbers is convergent, the series (13.3) converges in R for a.e. x ∈ X. Therefore, since (f_{n_k}(x)) is the sequence of partial sums of the


series (13.3), the sequence (f_{n_k}(x)) converges in R for a.e. x ∈ X. So there exists A ∈ A such that µ(Ac) = 0 and (f_{n_k}(x)) converges in R for every x ∈ A. Define f : X → R by

f(x) = lim_k f_{n_k}(x) if x ∈ A,  and  f(x) = 0 if x ∉ A.

Then f = lim_k f_{n_k} a.e. Moreover, f is measurable because f = lim_k f_{n_k}1A and each function f_{n_k}1A is measurable.

Now we show that f_{n_k} → f in Lp. Recall that g^p is integrable. By (13.2), we have

|f_{n_k}| ≤ g_k ≤ g,

and so

|f| = lim_k |f_{n_k}| ≤ lim_k g_k = g a.e.

Thus

‖f_{n_k}‖p^p = ∫|f_{n_k}|^p ≤ ∫g^p < ∞,

‖f‖p^p = ∫|f|^p ≤ ∫g^p < ∞.

So f_{n_k}, f ∈ Lp. Moreover,

|f_{n_k} − f|^p ≤ (|f_{n_k}| + |f|)^p ≤ (2g)^p a.e.

Since g^p is integrable, (2g)^p is integrable. Since f_{n_k} → f a.e., we have

|f_{n_k} − f|^p → 0 a.e.

Now the dominated convergence theorem implies

lim_k ‖f_{n_k} − f‖p^p = lim_k ∫|f_{n_k} − f|^p = 0.

Thus f_{n_k} → f in Lp.

Finally, we show that fn → f in Lp. Let ε > 0 be given. We must show that there exists N ∈ N such that for all n ≥ N

‖fn − f‖p < ε.

Since f_{n_k} → f in Lp, there exists K ∈ N such that for all k ≥ K

‖f_{n_k} − f‖p < ε/2.

Since (fn) is Cauchy in Lp, there exists M ∈ N such that for all m,n ≥ M

‖fm − fn‖p < ε/2.

Set N = max {M,K}. Let n ≥ N be arbitrary. Choose k ≥ N. Then k ≥ N ≥ K and n_k ≥ k ≥ N ≥ M and n ≥ N ≥ M. In summary, k ≥ K and n_k, n ≥ M. So

‖fn − f‖p ≤ ‖f_{n_k} − f‖p + ‖f_{n_k} − fn‖p < ε/2 + ε/2 = ε.


14 B(X) and C(X)

Definition 14.1. Let X be any set. For every function f : X → R, define the uniform norm (or supremum norm) of f to be

‖f‖u = sup{|f(x)| : x ∈ X}

Since ‖f‖u is the least upper bound for the set {|f(x)| : x ∈ X}, we have the alternative formulas

‖f‖u = min {M ∈ [0,∞] : |f(x)| ≤ M for every x ∈ X} = inf {M ∈ [0,∞) : |f(x)| ≤ M for every x ∈ X}.

Given a function f : X → R, a number M ∈ [0,∞] is called an upper bound for |f| if |f(x)| ≤ M for all x ∈ X. We say f is bounded if |f| has a finite upper bound, i.e., if there is a number M ∈ [0,∞) such that |f(x)| ≤ M for all x ∈ X. Note that f is bounded iff ‖f‖u < ∞.

The set of all bounded functions f : X → R is denoted by B(X) (or B(X,R)). In other words,

B(X) = {f : f : X → R, ‖f‖u <∞}

Theorem 14.2. Let X be a set. Let f, g : X → R.

(a) f = 0 iff ‖f‖u = 0

(b) If c ∈ R, then ‖cf‖u = |c|‖f‖u

(c) ‖f + g‖u ≤ ‖f‖u + ‖g‖u

(d) B(X) is a vector space and ‖ · ‖u is a norm on B(X).

Proof.

(a) If f = 0, then ‖f‖u = sup {|f(x)| : x ∈ X} = sup {0} = 0. Conversely, if ‖f‖u = 0, then |f| ≤ ‖f‖u = 0, so f = 0.

(b) We have

‖cf‖u = sup {|cf(x)| : x ∈ X} = sup {|c||f(x)| : x ∈ X} = |c| sup {|f(x)| : x ∈ X} = |c|‖f‖u.

The third equality is the following general property of suprema: If A ⊆ R and c ∈ [0,∞], then sup cA = c sup A, where cA = {ca : a ∈ A}.

(c) Since |f + g| ≤ |f| + |g| ≤ ‖f‖u + ‖g‖u, we have ‖f + g‖u ≤ ‖f‖u + ‖g‖u.

(d) Parts (b) and (c) imply that B(X) is closed under addition and scalar multiplication. So B(X) is a vector subspace of the vector space of all functions f : X → R. Parts (a), (b), and (c) imply ‖ · ‖u is a norm on B(X).


Remark. We always assume B(X) is equipped with the uniform norm. Whenever we talk about convergence in B(X), we are talking about convergence with respect to this norm.

The next theorem says that convergence with respect to the uniform norm is equivalent to uniform convergence. This explains why the uniform norm is called the uniform norm.

Theorem 14.3. Let X be a set. Let (fn) be a sequence of functions fn : X → R. Let f : X → R. Then ‖fn − f‖u → 0 iff fn → f uniformly on X.

Proof. The statement ‖fn − f‖u → 0 means that for every ε > 0 there is an N ∈ N such that for every n ≥ N we have

sup {|fn(x) − f(x)| : x ∈ X} ≤ ε.    (14.1)

On the other hand, the statement fn → f uniformly on X means that for every ε > 0 there is an N ∈ N such that for every n ≥ N we have

|fn(x) − f(x)| ≤ ε for all x ∈ X.    (14.2)

Since (14.1) holds iff (14.2) holds, the two statements in question are equivalent.
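A small sketch contrasting pointwise and uniform convergence (Python, NumPy assumed; the example fn(x) = x^n on X = [0, 1) is the standard one): fn → 0 pointwise on [0, 1), but sup_{x∈[0,1)} x^n = 1 for every n, so ‖fn − 0‖u does not tend to 0 and fn does not converge to 0 in B(X). The supremum below is only approximated on a finite grid.

import numpy as np

x = np.linspace(0.0, 1.0, 10_001, endpoint=False)   # finite grid approximating [0, 1)

for n in (1, 10, 100, 1000):
    fn = x**n
    print(n, fn[5000], fn.max())   # the value at x = 0.5 tends to 0; the grid maximum stays large
                                   # (the true supremum over [0, 1) equals 1 for every n)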

Theorem 14.4. Let X be a set. Then B(X) is complete.

Proof. Let (fn) be a Cauchy sequence in B(X). For every ε > 0, there exists Nε ∈ N such that

‖fm − fn‖u < ε for all m,n ≥ Nε. (14.3)

Since |fm − fn| ≤ ‖fm − fn‖u, it follows that

|fm(x)− fn(x)| < ε for all x ∈ X and m,n ≥ Nε. (14.4)

Thus (fn(x)) is a Cauchy sequence in R for each fixed x ∈ X. By the completeness of R, (fn(x)) converges in R for each fixed x ∈ X. Define f : X → R by f(x) = limn fn(x) for each x ∈ X.

Now we show fn → f in B(X). Let ε > 0. Choose Nε so that (14.4) holds. Let x ∈ X and n ≥ Nε be arbitrary. Since f(x) = limm fm(x), there exists an integer m(x) ≥ Nε such that

|fm(x)(x)− f(x)| < ε.

Combining this inequality and (14.4) gives

|fn(x)− f(x)| ≤ |fn(x)− fm(x)(x)|+ |fm(x)(x)− f(x)| < ε+ ε = 2ε.

This shows that

|fn(x) − f(x)| < 2ε for all x ∈ X and n ≥ Nε.

It follows that

‖fn − f‖u ≤ 2ε for all n ≥ Nε.

Thus ‖fn − f‖u → 0. Moreover, since fn ∈ B(X) for all n, we have

‖f‖u ≤ ‖f − fn‖u + ‖fn‖u ≤ 2ε + ‖fn‖u < ∞

for all n ≥ Nε. So f ∈ B(X). Therefore fn → f in B(X).


Definition 14.5. Let X be a subset of Rd. Define C(X) to be the set of continuous functionsf : X → R.

Theorem 14.6. Let X be a compact subset of Rd (for example, X = [a, b]d). Then C(X) is a vector subspace of B(X), ‖ · ‖u is a norm on C(X), and C(X) is complete with respect to this norm. Moreover, for f ∈ C(X),

‖f‖u = max {|f(x)| : x ∈ X}

Proof. As any continuous function on a compact set is bounded, C(X) ⊆ B(X). As the sum of two continuous functions is continuous and a constant multiple of a continuous function is continuous, C(X) is a vector subspace of B(X). As ‖ · ‖u satisfies the properties of a norm for the elements of B(X), it also satisfies those properties for the elements of the subset C(X) ⊆ B(X). Thus ‖ · ‖u is a norm on C(X). To see that ‖f‖u = max {|f(x)| : x ∈ X} for every f ∈ C(X), we argue as follows. From undergraduate analysis, we know that each continuous function on a compact set achieves its maximum on the set. In particular, |f| achieves its maximum on X. Thus the supremum of |f| on X is actually the maximum of |f| on X; i.e.,

‖f‖u = sup {|f(x)| : x ∈ X} = max {|f(x)| : x ∈ X} .

To show that C(X) is complete, we first show:

Claim. For every sequence (fn) of functions in C(X), if (fn) converges in the uniform norm to a function f ∈ B(X), then f ∈ C(X).

(In the language of topology, the claim says that C(X) is a closed subset of B(X).) The proof of the claim is an example of an ε/3-proof.

Proof of Claim. Let (fn) be a sequence of functions in C(X) such that (fn) converges to a function f ∈ B(X). We need to show that f is continuous on X. Let x0 ∈ X be given. Let ε > 0 be given. Then there exists an N ∈ N such that

‖fn − f‖u ≤ ε/3 for all n ≥ N,

which means

|fn(x)− f(x)| ≤ ε/3 for all x ∈ X and n ≥ N. (14.5)

Since fN is continuous, it is continuous at x0. So there exists a δx0 > 0 such that

|fN (x)− fN (x0)| ≤ ε/3 for all x ∈ X with |x− x0| < δx0 . (14.6)

Now, for every x ∈ X with |x − x0| < δx0, we can apply (14.5) (at x and at x0, with n = N) and (14.6) to get

|f(x)− f(x0)| = |f(x)− fN (x) + fN (x)− fN (x0) + fN (x0)− f(x0)| (14.7)

≤ |f(x)− fN (x)|+ |fN (x)− fN (x0)|+ |fN (x0)− f(x0)| (14.8)

≤ ε/3 + ε/3 + ε/3 = ε. (14.9)


This shows f is continuous at x0. Since x0 is an arbitrary point in X, this shows f ∈ C(X). (Notice that the boundedness of f was not used. Only the fact that fn → f in the norm ‖ · ‖u was needed.)

Now we are ready to show that C(X) is complete. Let (fn) be a Cauchy sequence in C(X). Since C(X) is a subset of B(X), (fn) is a Cauchy sequence in B(X). Since B(X) is complete, (fn) converges in uniform norm to some f ∈ B(X). Now the Claim implies that f ∈ C(X). So (fn) converges in uniform norm to some f ∈ C(X). This proves C(X) is complete.

(The argument above can be generalized to show that closed subsets of complete subsets of normed spaces are complete.)


15 L∞

Definition 15.1. Let (X,A, µ) be a measure space. For each measurable function f : X → R, define the L∞ norm (or ∞-norm) of f to be

‖f‖∞ = inf {M ∈ [0,∞) : |f(x)| ≤M for almost every x ∈ X} .

Define

L∞ = {f : f : X → R, f measurable, ‖f‖∞ < ∞}

with the convention that two functions in L∞ are equal as elements of L∞ iff they are equal a.e. That is, f = g in L∞ iff f = g a.e. (To be precise, L∞ is the set of equivalence classes of measurable functions f : X → R such that ‖f‖∞ < ∞, where the equivalence relation is “equal almost everywhere.”)

If necessary for clarity, we may use the following more descriptive notations:

L∞ = L∞(µ) = L∞(X) = L∞(X,A, µ) = L∞(X,A, µ,R)

‖ · ‖∞ = ‖ · ‖L∞ = ‖ · ‖L∞(µ) = ‖ · ‖L∞(X) = ‖ · ‖L∞(X,A,µ) = ‖ · ‖L∞(X,A,µ,R)

Given a measurable function f : X → R, a number M ∈ [0,∞] is called an essential upper bound for |f| if |f(x)| ≤ M for a.e. x ∈ X. We say f is essentially bounded if |f| has a finite essential upper bound, i.e., if there is a number M ∈ [0,∞) such that |f(x)| ≤ M for a.e. x ∈ X. Thus f is essentially bounded iff ‖f‖∞ < ∞, and L∞ is the set of all essentially bounded measurable functions from X to R (with functions that are equal a.e. identified).
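For example, if N ∈ A is a nonempty set with µ(N) = 0 and f = 1N, then sup{|f(x)| : x ∈ X} = 1 while |f| ≤ 0 a.e., so ‖f‖u = 1 but ‖f‖∞ = 0. More generally, changing a function on a set of measure zero changes neither its L∞ norm nor its equivalence class in L∞.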

Remark. The L∞ norm is closely related to the uniform norm. Indeed, the uniform norm of f is the least upper bound of |f| and the L∞ norm is the least essential upper bound of |f|.

Lemma 15.2. Let (X,A, µ) be a measure space. Let f : X → R be a measurable function. Then |f| ≤ ‖f‖∞ a.e. and

‖f‖∞ = min {M ∈ [0,∞] : |f(x)| ≤ M for almost every x ∈ X}.

Proof. First we show that |f| ≤ ‖f‖∞ a.e. Since the set {x : |f(x)| ≤ M} belongs to A, |f| ≤ M a.e. holds iff µ({x : |f(x)| > M}) = 0. (To see this, take A = {x : |f(x)| ≤ M} in the definition of |f| ≤ M a.e.) Thus

‖f‖∞ = inf {M ∈ [0,∞) : µ({x : |f(x)| > M}) = 0}.

If ‖f‖∞ = ∞, the inequality is clear. Assume ‖f‖∞ < ∞. For every n ∈ N, since ‖f‖∞ < ‖f‖∞ + 1/n, the definition of the infimum implies there is a number Mn such that Mn < ‖f‖∞ + 1/n and µ({x : |f(x)| > Mn}) = 0. Then

{x : |f(x)| > ‖f‖∞} = ⋃_{n=1}^∞ {x : |f(x)| > ‖f‖∞ + 1/n} ⊆ ⋃_{n=1}^∞ {x : |f(x)| > Mn}


and so

µ({x : |f(x)| > ‖f‖∞}) ≤ ∑_{n=1}^∞ µ({x : |f(x)| > Mn}) = 0.

Thus |f | ≤ ‖f‖∞ a.e.

By definition, ‖f‖∞ is a lower bound for

{M ∈ [0,∞) : |f(x)| ≤M for almost every x ∈ X} .

So ‖f‖∞ is also a lower bound for

{M ∈ [0,∞] : |f(x)| ≤M for almost every x ∈ X} .

Since |f | ≤ ‖f‖∞ a.e.,

‖f‖∞ ∈ {M ∈ [0,∞] : |f(x)| ≤M for almost every x ∈ X} .

Therefore ‖f‖∞ = min {M ∈ [0,∞] : |f(x)| ≤ M for almost every x ∈ X}.

Theorem 15.3. Let (X,A, µ) be a measure space. Let f, g : X → R be measurable functions.

(a) ‖f‖∞ = 0 iff f = 0 a.e. iff f = 0 in L∞

(b) If c ∈ R, then ‖cf‖∞ = |c|‖f‖∞

(c) ‖f + g‖∞ ≤ ‖f‖∞ + ‖g‖∞ (Minkowski Inequality)

(d) ‖fg‖1 ≤ ‖f‖∞‖g‖1 (Holder Inequality)

(e) L∞ is a vector space and ‖ · ‖∞ is norm on L∞.

Proof.

(a) The second equivalence is the convention from the definition of L∞. Now we prove the first equivalence. If f = 0 a.e., then |f| = 0 a.e., so ‖f‖∞ = 0 by definition. Conversely, if ‖f‖∞ = 0, then |f| ≤ ‖f‖∞ = 0 a.e., so f = 0 a.e.

(b) If c = 0, then ‖cf‖∞ = ‖0f‖∞ = 0 = 0·‖f‖∞ = |c|‖f‖∞. If c ≠ 0, then

‖cf‖∞ = inf {M ∈ [0,∞] : |cf(x)| ≤ M for almost every x ∈ X}
       = inf {M ∈ [0,∞] : |f(x)| ≤ M/|c| for almost every x ∈ X}
       = inf {|c|M′ : M′ ∈ [0,∞], |f(x)| ≤ M′ for almost every x ∈ X}
       = |c| inf {M′ : M′ ∈ [0,∞], |f(x)| ≤ M′ for almost every x ∈ X}
       = |c|‖f‖∞.

The third equality holds because the sets involved are equal. The fourth equality is the following general property of infima: If A ⊆ R and c ∈ [0,∞], then inf cA = c inf A, where cA = {ca : a ∈ A}.


(c) Since |f + g| ≤ |f| + |g| ≤ ‖f‖∞ + ‖g‖∞ a.e., we have ‖f + g‖∞ ≤ ‖f‖∞ + ‖g‖∞.

(d) Since |f| ≤ ‖f‖∞ a.e., we have |fg| ≤ ‖f‖∞|g| a.e., hence ∫|fg| ≤ ‖f‖∞∫|g|.

(e) Parts (b) and (c) imply that L∞ is closed under addition and scalar multiplication. So L∞ is a vector subspace of the space of all measurable functions. Parts (a), (b), and (c) imply ‖ · ‖∞ is a norm on L∞.

Remark. We always assume L∞ is equipped with the norm ‖ · ‖∞. Whenever we talk about convergence in L∞, we are talking about convergence with respect to this norm.

The next theorem says that convergence in the L∞ norm is equivalent to uniform convergence almost everywhere.

Theorem 15.4. Let (X,A, µ) be a measure space. Let (fn) be a sequence of measurable functions fn : X → R. Let f : X → R be a measurable function. Then ‖fn − f‖∞ → 0 iff fn → f uniformly a.e. on X. In other words, ‖fn − f‖∞ → 0 iff there exists A ∈ A such that µ(Ac) = 0 and fn → f uniformly on A.

Proof. ⇒: Assume ‖fn − f‖∞ → 0. For each n ∈ N, we have |fn − f| ≤ ‖fn − f‖∞ a.e., hence there exists An ∈ A such that µ(An^c) = 0 and

|fn(x) − f(x)| ≤ ‖fn − f‖∞ for all x ∈ An.

Define A = ⋂_{n=1}^∞ An. Then µ(Ac) = 0 and

|fn(x)− f(x)| ≤ ‖fn − f‖∞ for all x ∈ A and all n ∈ N. (15.1)

Now we will use the assumption that ‖fn− f‖∞ → 0. Let ε > 0. There exists an Nε ∈ N such that

‖fn − f‖∞ < ε for all n ≥ Nε (15.2)

Combining (15.1) and (15.2) gives

|fn(x)− f(x)| < ε for all x ∈ A and n ≥ Nε

Thus fn → f uniformly on A and (as noted above) µ(Ac) = 0. So fn → f uniformly a.e.

⇐: Assume fn → f uniformly a.e. This means there exists A ∈ A such that µ(Ac) = 0 and fn → f uniformly on A. Let ε > 0 be given. Choose N ∈ N such that |fn(x) − f(x)| < ε for every x ∈ A and every n ≥ N. Since A ∈ A and µ(Ac) = 0, this means that |fn − f| < ε a.e. for every n ≥ N. Therefore ‖fn − f‖∞ ≤ ε for every n ≥ N. So ‖fn − f‖∞ → 0.

Theorem 15.5. Let (X,A, µ) be a measure space. Then L∞ is complete.

Proof. Let (fn) be a Cauchy sequence in L∞. For all m,n ∈ N, define

Am,n = {x : |fm(x)− fn(x)| ≤ ‖fm − fn‖∞} .


These sets are measurable because the functions fn are measurable. Since |fm − fn| ≤ ‖fm − fn‖∞ a.e., we have µ(Am,n^c) = 0. Define A = ⋂_{m,n∈N} Am,n. Then µ(Ac) = 0 and

|fm(x) − fn(x)| ≤ ‖fm − fn‖∞ for all x ∈ A and m,n ∈ N.    (15.3)

Now we use the assumption that (fn) is Cauchy in L∞. For every ε > 0, there exists Nε ∈ N such that

‖fm − fn‖∞ < ε for all m,n ≥ Nε. (15.4)

Combining (15.3) and (15.4) gives: For every ε > 0, there exists Nε ∈ N such that

|fm(x)− fn(x)| < ε for all x ∈ A and m,n ≥ Nε (15.5)

Thus (fn(x)) is a Cauchy sequence in R for each fixed x ∈ A. By the completeness of R, (fn(x)) converges in R for each fixed x ∈ A. Define f : X → R by

f(x) = limn fn(x) if x ∈ A,  and  f(x) = 0 if x ∈ Ac.

Note f is measurable because f = lim fn1A and the functions fn1A are measurable.

Now we show fn → f in L∞. Let ε > 0. Choose Nε so that (15.4) holds. Let x ∈ A and n ≥ Nε be arbitrary. Since x ∈ A, we have f(x) = limm fm(x), so there exists an integer m(x) ≥ Nε such that

|fm(x)(x) − f(x)| < ε.

Combining this inequality and (15.5) gives

|fn(x)− f(x)| ≤ |fn(x)− fm(x)(x)|+ |fm(x)(x)− f(x)| < ε+ ε = 2ε.

This shows that |fn(x) − f(x)| < 2ε for all x ∈ A and n ≥ Nε.

Since µ(Ac) = 0, it follows that

‖fn − f‖∞ ≤ 2ε for all n ≥ Nε.

Thus ‖fn − f‖∞ → 0. Moreover, since fn ∈ L∞ for all n, we have

‖f‖∞ ≤ ‖f − fn‖∞ + ‖fn‖∞ ≤ 2ε + ‖fn‖∞ < ∞

for all n ≥ Nε. So f ∈ L∞. Therefore fn → f in L∞.


Chapter 5

Some Generalizations

16 Complex-Valued Measurable Functions

Let (X,A, µ) be a measure space.

Definition 16.1. We identify the set of complex numbers C = {a + ib : a, b ∈ R} with R² = {(a, b) : a, b ∈ R}. Then the open sets in C are exactly the open sets in R², and B(C) = B(R²).

If z = a + ib ∈ C with a, b ∈ R, the real part of z is Re z = a and the imaginary part of z is Im z = b.

If f : X → C, the real part of f and the imaginary part of f are the functions Re f : X → R and Im f : X → R defined by Re(f)(x) = Re(f(x)) and Im(f)(x) = Im(f(x)) for every x ∈ X. Note that f = Re f + i Im f.

As a special case of Definition 4.3, a function f : X → C is called (A,B(C))-measurable (or simply A-measurable or measurable) if

f−1(B) ∈ A whenever B ∈ B(C).

The following characterization of measurable complex-valued functions is often useful.

Theorem 16.2. A function f : X → C is measurable iff Re f : X → R and Im f : X → R are measurable.

The reader is asked to prove this theorem in an exercise. Let us outline the proof here. First note B(C) = B(R²) is generated by the collection of open rectangles R = {(a, b) × (c, d) : a, b, c, d ∈ R}. To see this, in the proof of Theorem 3.9, replace the open balls B′x with rational centers and radii by open rectangles R′x with rational endpoints. Then use Theorem 4.4 and the formulas

f−1((a, b) × (c, d)) = (Re f)−1(a, b) ∩ (Im f)−1(c, d)

f−1((a, b) × R) = (Re f)−1(a, b)

f−1(R × (c, d)) = (Im f)−1(c, d)


Using Theorem 16.2, the reader can show that sums, products, and limits of complex-valued measurable functions are measurable. One simply applies the analogous results for real-valued functions to the real and imaginary parts of these functions separately. The reader is asked to supply the details in an exercise.

Definition 16.3. Let f : X → C be a measurable function. If Re f and Im f are both integrable, we say that f is integrable and we define the integral of f to be

∫fdµ = ∫Re f dµ + i∫Im f dµ.

The reader can check that if f : X → C and g : X → C are integrable and c ∈ C, then

(a) cf and f + g are integrable

(b) ∫cf = c∫f and ∫(f + g) = ∫f + ∫g

To prove these, we apply the analogous results for real-valued functions to the real and imaginary parts of these functions separately. The reader is asked to supply the details in an exercise.
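A tiny sketch of the definition (Python, NumPy assumed), with µ taken to be counting measure on a three-point set so that the integral is just a finite sum: the integral of a complex-valued f is the integral of its real part plus i times the integral of its imaginary part.

import numpy as np

f_values = np.array([1 + 2j, -3 + 0.5j, 2 - 1j])    # values of f on a 3-point space, counting measure

integral = np.sum(f_values)                          # integral of f with respect to counting measure
via_parts = np.sum(f_values.real) + 1j * np.sum(f_values.imag)

print(np.isclose(integral, via_parts))               # True: integral of f = integral of Re f + i * integral of Im f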

Theorem 16.4. Let f : X → C be a measurable function. Then:

(a) |f | is measurable.

(b) f is integrable iff |f | is integrable

(c) If f is integrable, then∣∣∫ f ∣∣ ≤ ∫ |f |.

Proof. (a): Since f is measurable, Re f and Im f are measurable, and so (Re f)² + (Im f)² is measurable. For each a ∈ R, if a ≤ 0 we have {x ∈ X : |f(x)| ≥ a} = X ∈ A, and if a > 0 we have

{x ∈ X : |f(x)| ≥ a} = {x ∈ X : |f(x)|² ≥ a²} = {x ∈ X : (Re f(x))² + (Im f(x))² ≥ a²} ∈ A.

(b): f is integrable iff ∫f = ∫Re f + i∫Im f is finite iff ∫Re f and ∫Im f are finite iff ∫|Re f| and ∫|Im f| are finite. But |f| = √(|Re f|² + |Im f|²) ≤ |Re f| + |Im f| ≤ 2|f|, so that

∫|f| ≤ ∫|Re f| + ∫|Im f| ≤ 2∫|f|.

Thus ∫|Re f| and ∫|Im f| are finite iff ∫|f| is finite iff |f| is integrable.

(c): The inequality is trivial if ∫f = 0. Assume ∫f ≠ 0. Set α = |∫f| (∫f)^{−1}. Then

|∫f| = α∫f = ∫αf.

In particular, ∫αf is real. Therefore

|∫f| = α∫f = ∫αf = Re ∫αf = ∫Re(αf) ≤ ∫|αf| = ∫|f|.


The dominated convergence theorem is valid if the functions f and fn appearing in it are complex-valued. To prove it, we apply the dominated convergence theorem to the real and imaginary parts of these functions separately.

We can also consider Lp spaces for complex-valued functions. All the definitions are analogous, all the theorems are valid, and all the proofs are identical. We use the notations Lp(X,A, µ,R) and Lp(X,A, µ,C) if we need to distinguish the Lp spaces with real-valued functions and complex-valued functions.


Chapter 6

Construction of Measures

17 Outline

In the next few sections, we will study a method for constructing measures due to Caratheodory.After developing the abstract theory, we will use the method to construct Lebesgue measure on R,Lebesgue measure on Rd, and product measures.

We now outline Caratheodory’s method for constructing a measure on a set X. We start by defining the measure only on a collection E of “elementary” subsets of X. The collection E need not be a σ-algebra. Denote this “primitive measure” defined on E by µ0. Next, we use µ0 to construct a function µ∗ defined on the collection P(X) of all subsets of X. Typically, µ∗ will not be countably additive, hence it will not be a measure. Instead, µ∗ will only be an outer measure (the definition will come later). To remedy this, we identify a collection M(µ∗) of subsets of X such that M(µ∗) is a σ-algebra on X and the restriction of µ∗ to M(µ∗) is a measure. We denote this measure by µ. If the “primitive measure” µ0 and the collection of “elementary” subsets E obey certain natural conditions, then the σ-algebra M(µ∗) will contain the collection E, and µ∗ and (hence) µ will agree with µ0 on E.

Let us consider this outline for the specific example of Lebesgue measure on R. The collection E of “elementary” subsets is a collection of intervals, and the “primitive measure” λ0 gives the length of an interval. The outer measure λ∗ is called the Lebesgue outer measure. The σ-algebra M(λ∗) is denoted by L(R); it is called the Lebesgue σ-algebra, and its elements are called Lebesgue measurable sets. The restriction of λ∗ to M(λ∗) = L(R) is called Lebesgue measure on L(R) and is denoted by λ. Lebesgue measure assigns every interval its length (in particular, it agrees with λ0). The Lebesgue σ-algebra L(R) contains the Borel σ-algebra B(R), so that restricting λ∗ even further gives a measure on B(R), which we call Lebesgue measure on B(R) and which we also denote by λ.

18 Set Functions

Let X be a set.


Definition 18.1. Let A be a collection of subsets of X. That is, A ⊆ P(X). Let φ be a function φ : A → [0,∞].

(i) φ is additive if φ(A∪B) = φ(A) +φ(B) whenever A,B are disjoint sets in A and A∪B ∈ A

(ii) φ is finitely additive if φ(⋃_{i=1}^n Ai) = ∑_{i=1}^n φ(Ai) whenever A1, . . . , An are disjoint sets in A and ⋃_{i=1}^n Ai ∈ A

(iii) φ is countably additive if φ(⋃_{i=1}^∞ Ai) = ∑_{i=1}^∞ φ(Ai) whenever A1, A2, . . . are disjoint sets in A and ⋃_{i=1}^∞ Ai ∈ A

(iv) φ is subadditive if φ(A ∪ B) ≤ φ(A) + φ(B) whenever A,B ∈ A and A ∪ B ∈ A

(v) φ is finitely subadditive if φ(⋃_{i=1}^n Ai) ≤ ∑_{i=1}^n φ(Ai) whenever A1, . . . , An ∈ A and ⋃_{i=1}^n Ai ∈ A

(vi) φ is countably subadditive if φ(⋃_{i=1}^∞ Ai) ≤ ∑_{i=1}^∞ φ(Ai) whenever A1, A2, . . . ∈ A and ⋃_{i=1}^∞ Ai ∈ A

(vii) φ is monotone if φ(A) ≤ φ(B) whenever A,B ∈ A and A ⊆ B.

(viii) φ is finitely monotone if φ(A) ≤ ∑_{i=1}^n φ(Ai) whenever A,A1, . . . , An ∈ A and A ⊆ ⋃_{i=1}^n Ai.

(ix) φ is countably monotone if φ(A) ≤ ∑_{i=1}^∞ φ(Ai) whenever A,A1, A2, . . . ∈ A and A ⊆ ⋃_{i=1}^∞ Ai.

Remark. In the definition of additivity, we must assume explicitly that A ∪ B ∈ A because A is not necessarily closed under finite unions. Similar remarks apply to the other properties.

Remark. Assume φ(∅) = 0. Countable additivity implies finite additivity (take Ai = ∅ for i > n), and finite additivity implies additivity (take n = 2). In general, additivity does not imply finite additivity. But, if A is closed under finite unions, then additivity does imply finite additivity by induction. Regardless of whether A is closed under countable unions, finite additivity does not imply countable additivity. Similar relationships hold amongst the three flavours of subadditivity and amongst the three flavours of monotonicity.

Recall the definition of a measure.

Definition 18.2. Let A be a σ-algebra on X. A measure on A is a function µ : A → [0,∞] thatis countably additive and satisfies µ(∅) = 0.

The concept of an outer measure will be very important in the following sections.

Definition 18.3. An outer measure on X is a function µ∗ : P(X) → [0,∞] that is countably subadditive, monotone, and satisfies µ∗(∅) = 0.

Remark. Since µ∗ is defined on P(X) and since µ∗(∅) = 0, countable subadditivity and monotonicity together are equivalent to countable monotonicity. Thus, in the definition of outer measure, we can replace “countably subadditive and monotone” by “countably monotone.”

Any measure defined on P(X) is an outer measure on X. See Example 5.3 for some examples.


19 Caratheodory’s Theorem: Outer Measures to Measures

Let X be a set. In this section, we tackle the following problem. Given an outer measure µ∗ on X, we want to find a σ-algebra on X such that the restriction of µ∗ to the σ-algebra is a measure. In other words, we want to find a σ-algebra on which µ∗ is countably additive. Naturally, we want this σ-algebra to be as large as possible, so that we have many measurable sets and (consequently) many measurable functions to work with.

The next lemma gives necessary and sufficient conditions for an outer measure to be countably additive on a σ-algebra.

Lemma 19.1. Let µ∗ be an outer measure on X and let A be a σ-algebra on X. Then the followingare equivalent:

(a) µ∗ is countably additive on A, i.e., µ∗(⋃_{i=1}^∞ Ai) = ∑_{i=1}^∞ µ∗(Ai) for all sequences of disjoint sets A1, A2, . . . ∈ A.

(b) µ∗ is additive on A, i.e., µ∗(A ∪ B) = µ∗(A) + µ∗(B) for all disjoint sets A,B ∈ A.

(c) Every F ∈ A satisfies

µ∗(E) = µ∗(E ∩ F ) + µ∗(E ∩ F c) for all E ∈ A

Proof. (a) ⇒ (b): Let A and B be disjoint sets in A. Define A1 = A, A2 = B, and Ai = ∅ for i > 2. Since µ∗(∅) = 0 and since µ∗ is assumed to be countably additive, we have µ∗(A ∪ B) = µ∗(⋃_{i=1}^∞ Ai) = ∑_{i=1}^∞ µ∗(Ai) = µ∗(A) + µ∗(B).

(b) ⇒ (a): Let A1, A2, . . . be any sequence of disjoint sets in A. By the countable subadditivity of µ∗, we have µ∗(⋃_{i=1}^∞ Ai) ≤ ∑_{i=1}^∞ µ∗(Ai). We must prove the reverse inequality. Since µ∗ is additive on A and since A is closed under finite unions, an induction gives that µ∗ is finitely additive on A. By the finite additivity and monotonicity of µ∗ on A, we have

∑_{i=1}^∞ µ∗(Ai) = lim_{n→∞} ∑_{i=1}^n µ∗(Ai) = lim_{n→∞} µ∗(⋃_{i=1}^n Ai) ≤ lim_{n→∞} µ∗(⋃_{i=1}^∞ Ai) = µ∗(⋃_{i=1}^∞ Ai).

(b) ⇒ (c): Take A = E ∩ F and B = E ∩ F c.

(c) ⇒ (b): Take E = A ∪B and F = A.

Condition (c) in Lemma 19.1 inspires the following definition.

Definition 19.2. Let µ∗ : P(X) → [0,∞]. A set F ∈ P(X) is called µ∗-measurable if

µ∗(E) = µ∗(E ∩ F) + µ∗(E ∩ F c) for all E ∈ P(X).    (19.1)

The collection of all µ∗-measurable subsets of X is denoted by M(µ∗).


Remark. A µ∗-measurable set F can be thought of as a sharp knife; it can cut up any set E into two pieces E ∩ F and E ∩ F c so cleanly that the sum of the outer measures of the pieces equals the outer measure of the whole.

Observation. Assume µ∗ is an outer measure on X. Let F ∈ P(X). By the countable subadditivity of µ∗, the ≤ inequality in (19.1) holds for all E ∈ P(X). Thus, to prove F is µ∗-measurable, it suffices to prove the ≥ inequality. Moreover, the ≥ inequality is trivial if µ∗(E) = ∞. Therefore F is µ∗-measurable iff

µ∗(E) ≥ µ∗(E ∩ F) + µ∗(E ∩ F c) for all E ∈ P(X) with µ∗(E) < ∞.    (19.2)

Now we come to the main theorem of this section, which provides a solution to the problem posed above. It is due to Caratheodory.

Theorem 19.3. (Caratheodory’s Restriction Theorem) Let µ∗ be an outer measure on X. Then M(µ∗) is a σ-algebra on X and µ∗|M(µ∗) (the restriction of µ∗ to M(µ∗)) is a measure on M(µ∗).

Proof. To show that M(µ∗) is a σ-algebra, we first show that ∅ ∈ M(µ∗) and that M(µ∗) is closed under complements. For every E ∈ P(X),

µ∗(E ∩ ∅) + µ∗(E ∩ ∅c) ≤ µ∗(∅) + µ∗(E) = µ∗(E),

so ∅ ∈M(µ∗). If A ∈M(µ∗), then, for every E ∈ P(X),

µ∗(E) = µ∗(E ∩A) + µ∗(E ∩Ac) = µ∗(E ∩Ac) + µ∗(E ∩ (Ac)c),

so Ac ∈M(µ∗).

It remains to show that M(µ∗) is closed under countable unions. Let A1, A2, . . . ∈ M(µ∗). We must show ⋃_{i=1}^∞ Ai ∈ M(µ∗). Let E ∈ P(X). Define En = ⋃_{i=1}^n Ai for n ≥ 0, with the understanding that E0 = ∅. Since A1 ∈ M(µ∗), we have

µ∗(E) = µ∗(E ∩ A1) + µ∗(E ∩ A1^c) = µ∗(E ∩ E0^c ∩ A1) + µ∗(E ∩ E1^c).

Since A2 ∈ M(µ∗), the last term on the right above is

µ∗(E ∩ E1^c) = µ∗(E ∩ E1^c ∩ A2) + µ∗(E ∩ E1^c ∩ A2^c) = µ∗(E ∩ E1^c ∩ A2) + µ∗(E ∩ E2^c).

Iterating (or induction) shows that

µ∗(E) = ∑_{i=1}^n µ∗(E ∩ E_{i−1}^c ∩ Ai) + µ∗(E ∩ En^c).

Since En^c ⊇ (⋃_{i=1}^∞ Ai)^c, the monotonicity of µ∗ implies

µ∗(E) ≥ ∑_{i=1}^n µ∗(E ∩ E_{i−1}^c ∩ Ai) + µ∗(E ∩ (⋃_{i=1}^∞ Ai)^c).


Now we let n → ∞, use ⋃_{i=1}^∞ (E ∩ E_{i−1}^c ∩ Ai) = E ∩ (⋃_{i=1}^∞ Ai), and apply the countable subadditivity of µ∗ (twice) to get

µ∗(E) ≥ ∑_{i=1}^∞ µ∗(E ∩ E_{i−1}^c ∩ Ai) + µ∗(E ∩ (⋃_{i=1}^∞ Ai)^c)
      ≥ µ∗(E ∩ (⋃_{i=1}^∞ Ai)) + µ∗(E ∩ (⋃_{i=1}^∞ Ai)^c)
      ≥ µ∗(E).

Thus all the inequalities in the last calculation are equalities. This proves ⋃_{i=1}^∞ Ai ∈ M(µ∗). Therefore M(µ∗) is a σ-algebra.

Furthermore, if E = ⋃_{i=1}^∞ Ai and the sets A1, A2, . . . are disjoint, then E ∩ E_{i−1}^c ∩ Ai = Ai and E ∩ (⋃_{i=1}^∞ Ai)^c = ∅, hence the calculation above reduces to the statement

µ∗(⋃_{i=1}^∞ Ai) = ∑_{i=1}^∞ µ∗(Ai).

So µ∗ is countably additive on M(µ∗). Thus the restriction of µ∗ to M(µ∗) is a measure on M(µ∗).


20 Construction of Outer Measures

Let X be a set. In this section, we see how to construct an outer measure µ∗ on X by starting with a function µ0 defined on a collection E of subsets of X. Moreover, we will give sufficient conditions for the outer measure µ∗ to be an extension of µ0 and for M(µ∗) to contain E.

Definition 20.1. A demi-ring on X is a collection E ⊆ P(X) that satisfies the following properties.

(i) ∅ ∈ E

(ii) If A,B ∈ E, then there exists a finite collection of disjoint sets C1, . . . , Cm ∈ E such that A \ B = ⋃_{i=1}^m Ci.

Every σ-algebra is a demi-ring, but not conversely.

Lemma 20.2. Let E be a demi-ring on X.

(a) If A,A1, . . . , An ∈ E, then there exists a finite collection of disjoint sets C1, . . . , Cm ∈ E such that A \ ⋃_{i=1}^n Ai = ⋃_{j=1}^m Cj.

(b) If A,B ∈ E, then there exists a finite collection of disjoint sets C1, . . . , Cm ∈ E such that A ∩ B = ⋃_{j=1}^m Cj.

Proof.

(a): We prove this by induction on n. If n = 1, this is immediate from the definition of E as a demi-ring. Assume the statement holds for some n ∈ N. So there exist disjoint sets D1, . . . , Dk ∈ E such that A \ ⋃_{i=1}^n Ai = ⋃_{i=1}^k Di. Then

A \ ⋃_{i=1}^{n+1} Ai = (A \ ⋃_{i=1}^n Ai) \ A_{n+1} = (⋃_{i=1}^k Di) \ A_{n+1} = ⋃_{i=1}^k (Di \ A_{n+1}).

By the definition of E as a demi-ring, for each i ∈ {1, . . . , k}, there exist disjoint sets Ei,1, . . . , Ei,ℓi ∈ E such that Di \ A_{n+1} = ⋃_{j=1}^{ℓi} Ei,j. Then

A \ ⋃_{i=1}^{n+1} Ai = ⋃_{i=1}^k ⋃_{j=1}^{ℓi} Ei,j.

By construction, for each fixed i ∈ {1, . . . , k}, the sets Ei,1, . . . , Ei,ℓi ∈ E are disjoint. If i, i′ ∈ {1, . . . , k} with i ≠ i′, then for every j ∈ {1, . . . , ℓi} and every j′ ∈ {1, . . . , ℓi′} we have Ei,j ⊆ Di and Ei′,j′ ⊆ Di′, so Ei,j and Ei′,j′ are disjoint because Di and Di′ are disjoint. Thus the sets Ei,j are disjoint for all i, j. By relabeling the Ei,j’s as F1, . . . , Fm where m = ℓ1 + · · · + ℓk, we have

A \ ⋃_{i=1}^{n+1} Ai = ⋃_{i=1}^m Fi

and F1, . . . , Fm are disjoint sets in E . This completes the proof by induction.

67

Page 69: Lecture Notes - MATH 231A - Real Analysis€¦ · applications. For example, Lebesgue’s theory of measure and integration forms the rigorous foun-dation for modern probability theory,

(b): If A,B ∈ E , then there exists a finite collection of disjoint sets C1, . . . , Cm ∈ E such thatA \B =

⋃mi=1Ci, hence

A ∩B = A \ (A \B) = A \

(m⋃i=1

Ci

)The desired result now follows from (a).

Theorem 20.3. (Caratheodory Construction of Outer Measures) Let E ⊆ P(X) and let µ0 : E →[0,∞]. Assume ∅ ∈ E and µ0(∅) = 0. Define µ∗ : P(X)→ [0,∞] by

µ∗(A) = inf

{ ∞∑i=1

µ0(Ei) : E1, E2, . . . ∈ E , A ⊆∞⋃i=1

Ei

}

for every A ∈ P(X), with the convention that inf ∅ =∞.

(a) µ∗ is an outer measure on X. It is called the outer measure on X generated by µ0.

(b) If µ0 is countably monotone, then µ∗(A) = µ0(A) for all A ∈ E .

(c) If E is a demi-ring and µ0 is finitely additive, then E ⊆M(µ∗).

Remark. The idea behind the definition of µ∗ is as follows. Given a set A ⊆ X, we wish tomeasure it from the outside. So we cover A by a countable unions of sets in E . The measure ofA should be no larger than the sum of the measures of the sets in the cover. We define the outermeasure of A to be the smallest possible sum that can be obtained in this way.

Remark. It is easy to check that the following statements also hold,(b’) µ0 is countably monotone if and only if µ∗(A) = µ0(A) for all A ∈ E .(c’) If E ⊆M(µ∗), then µ0 is finitely additive.

Proof.

(a): For every E1, E2, . . . ∈ E , we have∑∞

i=1 µ0(Ei) ∈ [0,∞]. So µ∗(A) ∈ [0,∞] for every A ∈ P(X).

Let’s show that µ∗(∅) = 0. If Ei = ∅ for i = 1, 2, . . ., then E1, E2, . . . ∈ E and ∅ ⊆⋃∞i=1Ei, hence

µ∗(∅) ≤∑∞

i=1 µ0(Ei) = 0. Therefore µ∗(∅) = 0.

Now let’s show that µ∗ is monotone. Let A,B ∈ P(X) with A ⊆ B. For any sequence E1, E2, . . . ∈E , if B ⊆

⋃∞i=1Ei, then A ⊆

⋃∞i=1Ei. Thus{ ∞∑

i=1

µ0(Ei) : E1, E2, . . . ∈ E , B ⊆∞⋃i=1

Ei

}⊆

{ ∞∑i=1

µ0(Ei) : E1, E2, . . . ∈ E , A ⊆∞⋃i=1

Ei

}

Taking infimums gives µ∗(A) ≤ µ∗(B). Therefore µ∗ is monotone.

Finally, let’s show that µ∗ is countably subadditive. Let A1, A2, . . . ∈ P(X). Let ε > 0. Foreach n ∈ N, by the definition of µ∗(An) as an infimum, there exists E1,n, E2,n, . . . ∈ E such that

68

Page 70: Lecture Notes - MATH 231A - Real Analysis€¦ · applications. For example, Lebesgue’s theory of measure and integration forms the rigorous foun-dation for modern probability theory,

An ⊆⋃∞m=1Em,n and

∞∑m=1

µ0(Em,n) ≤ µ∗(An) +ε

2n.

Then⋃∞n=1An ⊆

⋃∞n=1

⋃∞m=1Em,n and

µ∗

( ∞⋃n=1

An

)≤∞∑n=1

∞∑m=1

µ0(Em,n) ≤∞∑n=1

(µ∗(An) +

ε

2n

)=

∞∑n=1

µ∗(An) + ε.

Since ε > 0 is arbitrary, we have

µ∗

( ∞⋃n=1

An

)≤∞∑n=1

µ∗(An).

Therefore µ∗ is countably subadditive. This completes the proof that µ∗ is an outer measure.

(b): Assume µ0 is countably monotone.

First we show that µ∗(A) ≤ µ0(A) for every A ∈ E . Let A ∈ E . If E1 = A and Ei = ∅ for i > 1,then E1, E2, . . . ∈ E and A ⊆

⋃∞i=1Ei, hence

µ∗(A) ≤∞∑i=1

µ0(Ei) = µ0(A).

Now we show that µ0(A) ≤ µ∗(A) for every A ∈ E . Let A ∈ E . If E1, E2, . . . ∈ E with A ⊆⋃∞i=1Ei,

then the countable monotonicity of µ0 implies that µ0(A) ≤∑∞

i=1 µ0(Ei). Therefore µ0(A) is alower bound for the set { ∞∑

i=1

µ0(Ei) : E1, E2, . . . ∈ E , A ⊆∞⋃i=1

Ei

}.

Since µ∗(A) is the infimum of this set, we have µ0(A) ≤ µ∗(A).

(c): Assume E is a demi-ring and µ0 is finitely additive.

To show that E ⊆M(µ∗), We we must show that every A ∈ E satisfies

µ∗(E) = µ∗(E ∩A) + µ∗(E ∩Ac) for every E ∈ P(X).

Let A ∈ E . Let E ∈ P(X). Let ε > 0. By the definition of µ∗(E) as an infimum, there existsE1, E2, . . . ∈ E such that E ⊆

⋃∞n=1En and

∞∑n=1

µ0(En) ≤ µ∗(E) + ε. (20.1)

By Part (b) of Lemma 20.2, for each n ∈ N, there exist disjoint Cn,1, . . . , Cn,Kn ∈ A such that

En ∩A =

Kn⋃k=1

Cn,k.

69

Page 71: Lecture Notes - MATH 231A - Real Analysis€¦ · applications. For example, Lebesgue’s theory of measure and integration forms the rigorous foun-dation for modern probability theory,

By the definition of a demi-ring, for each n ∈ N, there exist disjoint Dn,1, . . . , Dn,Ln ∈ A such that

En ∩Ac = En \A =

Ln⋃`=1

Dn,`.

Since En∩A and En \A are disjoint, the sets Cn,1, . . . , Cn,Kn , Dn,1, . . . , Dn,Ln are all disjoint. Notethat

En = (En ∩A) ∪ (En ∩Ac) =

(Kn⋃k=1

Cn,k

)∪

(Ln⋃`=1

Dn,`

).

By the finite additivity of µ0,

µ0(En) =

Kn∑k=1

µ0(Cn,k) +

Ln∑`=1

µ0(Dn,`).

Combining this with (20.1) gives

∞∑n=1

Kn∑k=1

µ0(Cn,k) +

∞∑n=1

Ln∑`=1

µ0(Dn,`) ≤ µ∗(E) + ε. (20.2)

Since E ⊆⋃∞n=1Ei, we have

E ∩A ⊆∞⋃n=1

(En ∩A) ⊆∞⋃n=1

Kn⋃k=1

Cn,k

and

E ∩Ac ⊆∞⋃n=1

(En ∩Ac) ⊆∞⋃n=1

Ln⋃`=1

Dn,`

Recall that Cn,k ∈ A for all n, k and Dn,` ∈ A for all n, `. Then the definition of µ∗ implies

µ∗(E ∩A) ≤∞∑n=1

Kn∑k=1

µ∗(Cn,k)

and

µ∗(E ∩Ac) ≤∞∑n=1

Ln∑`=1

µ∗(Dn,`).

Combining these inequalities with (20.2) gives

µ(E ∩A) + µ(E ∩Ac) ≤ µ∗(E) + ε.

Since ε > 0 is arbitrary, we have

µ(E ∩A) + µ(E ∩Ac) ≤ µ∗(E).

The reverse inequality holds by the subadditivity of the outer measure µ∗. This completes theproof that E ⊆M(µ∗).

70

Page 72: Lecture Notes - MATH 231A - Real Analysis€¦ · applications. For example, Lebesgue’s theory of measure and integration forms the rigorous foun-dation for modern probability theory,

Theorem 20.4. Let X be a set. Let E ⊆ P(X) and let µ0 : E → [0,∞]. Assume ∅ ∈ E andµ0(∅) = 0. Let µ∗ be the outer measure on X generated by µ0. Let A ∈ P(X).

(a) Assume there exist E1, E2, . . . ∈ E such that A ⊆⋃∞i=1Ei. For every ε > 0 there exists

Gε ∈ Eσ such that A ⊆ Gε and µ∗(A) ≤ µ∗(Gε) ≤ µ∗(A) + ε. Moreover, there exists H ∈ Eσδsuch that A ⊆ H and µ∗(A) = µ∗(H).

(b) Assume E ⊆M(µ∗). If µ∗(A) <∞, then the following are equivalent:

(i) A ∈M(µ∗)

(ii) For every ε > 0 there exists Gε ∈ Eσ such that A ⊆ Gε and µ∗(Gε \A) ≤ ε.(iii) There exists H ∈ Eσδ such that A ⊆ H and µ∗(H \A) = 0.

(c) Assume E ⊆M(µ∗). If X is σ-finite with respect to µ0, then the following are equivalent:

(i) A ∈M(µ∗)

(ii) For every ε > 0 there exists Gε ∈ Eσ such that A ⊆ Gε and µ∗(Gε \A) ≤ ε.(iii) There exists H ∈ Eσδ such that A ⊆ H and µ∗(H \A) = 0.

Proof. Exercise.

71

Page 73: Lecture Notes - MATH 231A - Real Analysis€¦ · applications. For example, Lebesgue’s theory of measure and integration forms the rigorous foun-dation for modern probability theory,

21 Existence and Uniqueness of Extensions

Let X be a set. In this section, we prove the existence and uniqueness of a measure µ that extendsa given primitive function µ0 defined on a collection of elementary subsets of X.

Theorem 21.1. Let E be a demi-ring on X. Suppose µ0 : E → [0,∞] is countably monotone,finitely additive, and satisfies µ0(∅) = 0. Let µ∗ be the outer measure on X generated by µ0, as inTheorem 20.3(a). Let A be a σ-algebra on X such that E ⊆ A ⊆ M(µ∗). Let µ = µ∗|A (i.e., µ isthe restriction of µ∗ to A).

(a) µ is a measure on A and µ = µ0 on E .

(b) If ν is a measure on A such that ν = µ0 on E , then ν ≤ µ on A.

(c) If ν is a measure on A such that ν = µ0 on E and if X is σ-finite with respect to µ0 (i.e.,X =

⋃∞i=1Ei for some sets Ei ∈ A with µ0(Ei) <∞), then ν = µ.

Proof.

(a): Theorem 20.3(a) implies µ∗ is an outer measure on X. Theorem 19.3 implies that M(µ∗) is aσ-algebra and µ∗|M(µ∗) is a measure on M(µ∗). Since A ⊆M(µ∗), µ is a measure on A. Theorem20.3(b) implies that µ∗ = µ0 on E . Since E ⊆ A and µ = µ∗ on A, µ = µ0 on E .

(b): Let A ∈ A. If (Ei) ⊆ E and A ⊆⋃∞i=1Ei, then ν(A) ≤

∑∞i=1 ν(Ei) =

∑∞i=1 µ0(Ei), The

definition of µ∗(A) as an infimum implies ν(A) ≤ µ∗(A). Since µ = µ∗ on A, we get ν(A) ≤ µ(A).

(c): Claim: If A ∈ A, E ∈ E , A ⊆ E, and µ0(E) <∞, then µ(A) = ν(A).Proof of Claim: By (b), ν(A) ≤ µ(A) ≤ µ(E) ≤ µ0(E) <∞ and ν(E \A) ≤ µ(E \A). Therefore

µ0(E)− ν(A) = ν(E)− ν(A) ≤ ν(E \A) ≤ µ(E \A) = µ(E)− µ(A) ≤ µ0(E)− µ(A),

hence µ(A) ≤ ν(A).

Now we use the claim to prove (c). Let A ∈ A. Since X is σ-finite with respect to µ0, there existsets E1, E2, . . . ∈ E such that X =

⋃∞i=1Ei and µ0(Ei) <∞ for all i. Define Bi = Ei \

⋃i−1j=1Ej for

i ≥ 1 with the understanding that⋃0j=1Ej = ∅. Then B1, B2, . . . are disjoint sets in A such that

X =⋃∞i=1Ei =

⋃∞i=1Bi and Bi ⊆ Ei for all i. Then A ∩B1, A ∩B2, . . . are disjoint sets in A such

that A =⋃∞i=1(A ∩Bi) and µ(A ∩Bi) ≤ µ(Bi) ≤ µ(Ei) = µ0(Ei) <∞ for all i. The claim implies

that µ(A ∩Bi) = ν(A ∩Bi) for all i. So by the countable additivity of µ and ν we have

µ(A) =∞∑i=1

µ(A ∩Bi) =∞∑i=1

ν(A ∩Bi) = ν(A).

Remark. (b) and (c) do not require that E is a demi-ring. The fact that ν = µ0 on E implies thatµ0 is finitely additive and countably monotone, so those assumptions can also be dropped.

72

Page 74: Lecture Notes - MATH 231A - Real Analysis€¦ · applications. For example, Lebesgue’s theory of measure and integration forms the rigorous foun-dation for modern probability theory,

Chapter 7

Lebesgue Measure on R

22 Lebesgue Measure on R

In this section, we construct the Lebesgue measure on R.

Theorem 22.1. Let E be the collection of all intervals of the form (a, b] where a, b ∈ R and a ≤ b.Define λ0 : E → [0,∞] by

λ0((a, b]) = b− a

for each (a, b] ∈ E . Notice that ∅ = (a, a] ∈ E and λ0(∅) = λ((a, a]) = 0. Let λ∗ be the outermeasure generated by λ0, as in Theorem 20.3(a). It is called the Lebesgue outer measure on R.

(a) E is a demi-ring on R.

(b) λ0 is finitely additive.

(c) λ0 is countably monotone.

(d) R is σ-finite with respect to µ0.

(e) λ∗ = λ0 on E . Moreover, λ∗ assigns each interval its length.

(f) Define L(R) = M(λ∗). We call L(R) the Lebesgue σ-algebra on R. It satisfies E ⊆ σ(E) =B(R) ⊆ L(R).

(g) λ∗|L(R) is a measure on L(R). It is the unique measure on L(R) that agrees with λ0 on E . Itis called Lebesgue measure on L(R). It is denoted by λ.

(h) λ∗|B(R) is a measure on B(R). It is the unique measure on B(R) that agrees with λ0 on E . Itis called Lebesgue measure on B(R). It is also denoted by λ.

Proof.

(a) Let (a, b], (c, d] ∈ E .

Case 1: If a ≤ b ≤ c ≤ d or c ≤ d ≤ a ≤ b, then (a, b] \ (c, d] = (a, b].

73

Page 75: Lecture Notes - MATH 231A - Real Analysis€¦ · applications. For example, Lebesgue’s theory of measure and integration forms the rigorous foun-dation for modern probability theory,

Case 2: If a ≤ c ≤ b ≤ d, then (a, b] \ (c, d] = (a, c].

Case 3: If c ≤ a ≤ d ≤ b, then (a, b] \ (c, d] = (d, b].

Case 4: If a ≤ c ≤ d ≤ b, then (a, b] \ (c, d] = (a, c] ∪ (d, b].

Case 5: If c ≤ a ≤ b ≤ d, then (a, b] \ (c, d] = ∅.In all cases, (a, b] \ (c, d] is a finite union of disjoint members of E . Thus E is a demi-ring.

(b) Assume (a, b], (c, d] ∈ E are disjoint and (a, b] ∪ (c, d] ∈ E .

Case 1: (a, b] 6= ∅ and (c, d] 6= ∅. Then a ≤ b = c ≤ d or c ≤ d = a ≤ b. Without loss ofgenerality, assume a ≤ b = c ≤ d. Then (a, b] ∪ (c, d] = (a, d] and

λ0((a, b] ∪ (c, d]) = λ0((a, d]) = d− a = d− a+ b− c = b− a+ d− c = λ0((a, b]) + λ0((c, d])

Case 2: (a, b] = ∅ or (c, d] = ∅. Without loss of generality, assume (a, b] = ∅. Then a = b,(a, b] ∪ (c, d] = (c, d], and

λ0((a, b] ∪ (c, d]) = λ0((c, d]) = d− c = b− a+ d− c = λ0((a, b]) + λ0((c, d]).

(c) Let (a, b], (a1, b1], (a2, b2], . . . ,∈ E such that (a, b] ⊆⋃∞n=1(an, bn]. If a = b, then λ0((a, b]) =

0 ≤∑∞

n=1 λ0((an, bn]), and we are done. Assume a < b. Let ε > 0 such that a+ ε < b. Definec = a+ ε, d = b, cn = an, and dn = bn + ε2−n. Then

[c, d] ⊆ (a, b] ⊆∞⋃n=1

(an, bn] ⊆∞⋃n=1

(cn, dn).

Since [c, d] is compact and the collection {(cn, dn) : n ∈ N} is an open cover of [c, d], there isa finite subcover {(cn, dn) : n = 1, 2, . . . , N}. So

[c, d] ⊆N⋃n=1

(cn, dn).

Take the collection {(cn, dn) : n = 1, 2, . . . , N}, discard any (cn, dn) that is a subset of another(cm, dm), and relabel so that c ∈ (c1, d1), d1 ∈ (c2, d2), d2 ∈ (c3, d3), . . ., d ∈ (aN , dN ). Thenc1 < c < d < dN and cn < dn−1 for n = 2, . . . , N . Therefore

N∑n=1

(dn − cn) = dN − cN + dN−1 − cN−1 + · · ·+ d2 − c2 + d1 − c1

= dN + (dN−1 − cN ) + (dN−2 − cN−1) + · · ·+ (d1 − c2)− c1> dN − c1 > d− c = b− a− ε.

On the other hand,

N∑n=1

(dn − cn) ≤∞∑n=1

(dn − cn) =

∞∑n=1

(bn + ε2−n − an) = ε+

∞∑n=1

(bn − an).

Thus

b− a− ε < ε+∞∑n=1

(bn − an).

74

Page 76: Lecture Notes - MATH 231A - Real Analysis€¦ · applications. For example, Lebesgue’s theory of measure and integration forms the rigorous foun-dation for modern probability theory,

Letting ε→ 0 gives

b− a ≤∞∑n=1

(bn − an),

which is equivalent to

λ0((a, b]) ≤∞∑n=1

λ0((an, bn]).

(d) R =⋃∞n=1(−n, n] and µ0((−n, n]) = 2n <∞ for all n ∈ N.

(e) By (c) and Theorem 20.3(b), λ∗ = λ0 on E . Let us check that, for each interval I, λ∗(I) isequal to the length of the interval.

Let a, b ∈ R with a ≤ b. Since λ∗ = λ0 on E , we have λ∗((a, b]) = λ0((a, b]) = b− a.

Assume a < b. For every 0 < ε < b− a,

b− a− ε = λ∗((a, b− ε]) ≤ λ∗((a, b)) ≤ λ∗([a, b)) ≤ λ∗([a, b]) ≤ λ∗((a− ε, b]) = b− a+ ε.

Letting ε→ 0 gives λ∗((a, b)) = λ∗([a, b)) = λ∗([a, b]) = b− a.

Now assume a = b. For every ε > 0,

0 ≤ λ∗((a, b)) ≤ λ∗([a, b)) ≤ λ∗([a, b]) ≤ λ∗((a− ε, b]) = b− a+ ε = ε

Letting ε→ 0 gives λ∗((a, b)) = λ∗([a, b)) = λ∗([a, b]) = 0.

Let I be an interval with infinite length. In other words, I is an interval of the form(a,∞), (−∞, b), [a,∞), (−∞, b], (−∞,∞) where a, b ∈ R. For every M > 0 there are a, b ∈ Rwith a ≤ b such that (a, b] ⊆ I and b− a > M , hence λ∗(I) ≥ λ∗((a, b]) = b− a > M . Thusλ∗(I) =∞.

(f) Theorem 20.3(c) implies E ⊆ L(R). However, Theorem 3.10 says that B(R) is the smallestσ-algebra on R that contains E . Thus B(R) ⊆ L(R).

(g) Theorem 19.3 implies λ∗|L(R) is a measure on L(R). By (a),(b),(c),(d),(e),(f) and Theorem21.1(c), λ∗|L(R) is the unique measure on L(R) that equals λ0 on E .

(h) Since B(R) ⊆ L(R) and since λ∗|L(R) is a measure on L(R), λ∗|B(R) is a measure on B(R).By (a),(b),(c),(d),(e),(f) and Theorem 21.1(c), λ∗|B(R) is the unique measure on B(R) thatequals λ0 on E .

75

Page 77: Lecture Notes - MATH 231A - Real Analysis€¦ · applications. For example, Lebesgue’s theory of measure and integration forms the rigorous foun-dation for modern probability theory,

23 Complete Measures

Definition 23.1. Let (X,A, µ) be a measure space. We say that (X,A, µ) is complete (or µ iscomplete) if the following property holds: If N ∈ A, µ(N) = 0, and A ⊆ N , then A ∈ A. In otherwords, µ is complete if all subsets of measure zero sets are measurable.

Theorem 23.2. Let (X,A, µ) be a measure space. The following are equivalent.

(a) (X,A, µ) is complete.

(b) For all f, g : X → R, if f is measurable and f = g a.e., then g is measurable.

(c) For all f, f1, f2, . . . : X → R, if fn is measurable for each n and fn → f a.e., then f ismeasurable.

Proof. (a) ⇒ (b): Assume f is measurable and f = g a.e. Then there exists a set A ∈ A such thatµ(Ac) = 0 and f(x) = g(x) for all x ∈ A. Let B ∈ B(R) be given. Write

g−1(B) = (g−1(B) ∩A) ∪ (g−1(B) ∩Ac).

Note g−1(B) ∩ A = f−1(B) ∩ A ∈ A. Note g−1(B) ∩ Ac ⊆ Ac, Ac ∈ A, and µ(Ac) = 0. Since µ iscomplete, g−1(B) ∩Ac ∈ A. Therefore g−1(B) ∈ A. This proves g is measurable.

(b) ⇒ (c): Assume fn is measurable for each n and fn → f a.e.. Then there exists a set A ∈ Asuch that µ(Ac) = 0 and fn(x) → f(x) for all x ∈ A. Define gn = fn1A and g = f1A. Then gn ismeasurable for each n and gn → g pointwise. So g is measurable. Moreover, g = f a.e. Then (b)implies f is measurable.

(c) ⇒ (b): Assume f is measurable and f = g a.e. Define hn = f for all n. Then hn is measurablefor all n and hn → g a.e.. Then (c) implies g is measurable.

(b) ⇒ (a): We prove the contrapositive. Assume µ is not complete. So there exist sets A,N ⊆ Xsuch that N ∈ A, µ(N) = 0, A ⊆ N , and A /∈ A. Define f = 1Nc and g = 1Ac . Then f = g a.e., fis measurable, but g is not measurable.

The next theorem says that the Caratheodory restriction theorem produces a complete measure.

Theorem 23.3. Let µ∗ be an outer measure on a set X. Let M(µ∗) be the σ-algebra of µ∗-measurable sets and let µ be the measure which is the restriction of µ∗ toM(µ∗). Then (X,M(µ∗), µ)is complete.

Proof. Assume N ∈ M(µ∗), µ(N) = 0, and A ⊆ N . Then µ∗(A) ≤ µ∗(N) = µ(N) = 0, henceµ∗(A) = 0. Therefore, for any E ∈ P(X),

µ∗(E ∩A) + µ∗(E ∩Ac) ≤ µ∗(A) + µ∗(E) = µ∗(E).

So A ∈M(µ∗).

Corollary 23.4. (R,L(R), λ) is complete.

76

Page 78: Lecture Notes - MATH 231A - Real Analysis€¦ · applications. For example, Lebesgue’s theory of measure and integration forms the rigorous foun-dation for modern probability theory,

24 Completion of Measures (Optional)

Definition 24.1. Let (X,A, µ) be a measure space. A null set (or µ-null set) is a set N ∈ A suchthat µ(N) = 0. The completion of A with respect to µ is the collection A consisting of all sets ofthe form A ∪B, where A ∈ A and B is a subset of a null set.

Observation. According to Definition 23.1, (X,A, µ) is complete iff every subset of a null setbelongs to A.

Theorem 24.2. Let (X,A, µ) be a measure space.

(a) A is a σ-algebra on X.

(b) There is a unique extension of µ to A (i.e., there is a one and only one measure on A thatagrees with µ on A). This extension is called the completion of µ and is denoted by µ. Itsatisfies µ(A ∪B) = µ(A) whenever A ∈ A and B is a subset of a null set.

(c) The measure space (X,A, µ) is complete. It is called the completion of (X,A, µ).

Theorem 24.3. Let (X,A, µ) be a measure space.

(a) A is the smallest σ-algebra on X that contains both A and all subsets of null sets.

(b) A is the smallest σ-algebra on X for which some extension of µ is complete (i.e., for whichthere exists a measure ν on A such that (X,A, ν) is complete and µ = ν on A).

Theorem 24.4. Let (X,A, µ) be a σ-finite measure space. Let µ∗ be the outer measure gener-ated by µ (as in Theorem 20.3 with µ0 replaced by µ). Then M(µ∗) = A, µ∗|M(µ∗) = µ, and(X,M(µ∗), µ∗|M(µ∗)) is the completion of (X,A, µ).

Remark: Exploring what happens in the non-σ-finite case is an exercise left to the reader.

Corollary 24.5. (R,L(R), λ) is the completion of (R,B(R), λ).

Corollary 24.6. Let (X,S, µ) and (Y, T , ν) be σ-finite measure spaces. Let π∗ be the outermeasure on X × Y generated by µ and ν (as in Theorem 28.3). Then (X,M(π∗), µ ⊗ ν) is thecompletion of (X,S ⊗ T , µ⊗ ν).

Theorem 24.7. Let (X,A, µ) be a measure space and let (X,A, µ) be its completion.

(a) If f : X → R is A-measurable, then f is A-measurable.

(b) If f : X → R is A-measurable, then there is a A-measurable function g : X → R such thatf = g µ-a.e.

Proof Outline. (a): Use A ⊆ A and the definition of measurability.

(b): Prove it first for f simple. For general f , consider a sequence of simple functions convergingpointwise to f .

77

Page 79: Lecture Notes - MATH 231A - Real Analysis€¦ · applications. For example, Lebesgue’s theory of measure and integration forms the rigorous foun-dation for modern probability theory,

25 Riemann Integral

Definition 25.1. Let f : [a, b] → R be a bounded function. A partition of [a, b] is a finite setP = {x0, x1, . . . , xn} such that a = x0 < x1 < · · · < xn = b. For every partition P = {x0, . . . , xn}of [a, b], the lower Riemann sum for f and P is

L(f, P ) =

n∑i=1

(xi − xi−1) inf {f(x) : x ∈ [xi−1, xi]}

and the upper Riemann sum for f and P is

U(f, P ) =n∑i=1

(xi − xi−1) sup {f(x) : x ∈ [xi−1, xi]} .

The lower Riemann integral of f on [a, b] is∫ b

af = sup {L(f, P ) : P is a partition of [a, b]}

and the upper Riemann integral of f on [a, b] is∫ b

af = sup {U(f, P ) : P is a partition of [a, b]} .

If∫ ba f =

∫ ba f , we define the Riemann integral of f on [a, b] to be

∫ ba f =

∫ ba f =

∫ ba f, and we say

is f Riemann integrable on [a, b].

Lemma 25.2. Let f : [a, b]→ R be a bounded function. If P1 and P2 are partitions of [a, b] suchthat P1 ⊆ P2, then

L(f, P1) ≤ L(f, P2) ≤ U(f, P2) ≤ U(f, P1).

Proof. First inequality: Consider P2 with just one more point that P1 and compare sums; repeatfor larger P2. Third inequality: Similar to first. Middle inequality: The infimum of a set is lessthan or equal to the supremum of the set.

78

Page 80: Lecture Notes - MATH 231A - Real Analysis€¦ · applications. For example, Lebesgue’s theory of measure and integration forms the rigorous foun-dation for modern probability theory,

26 Comparison of Lebesgue Integral and Riemann Integral

In this section, (X,A, µ) = (R,B(R), λ). If f : [a, b]→ R is measurable, then the Lebesgue integralof f on [a, b] is

∫[a,b] fdλ =

∫[a,b] f̃dλ where f̃ : R → R is defined by f̃ = f on [a, b] and f̃ = 0 on

R \ [a, b]. It is easy to check f is measurable iff f̃ is measurable.

Theorem 26.1. Suppose f : [a, b]→ R is bounded on [a, b].

(a) If f is Riemann integrable on [a, b], then f is Lebesgue measurable, f is Lebesgue integrable,and the Riemann and Lebesgue integrals of f on [a, b] are equal.

(b) f is Riemann integrable on [a, b] iff f is continuous at almost every point of [a, b].

Proof. We use the notation∫f for the lower Riemann integral of f on [a,b],

∫f for the upper

Riemann integral of f on [a, b], and∫fdλ for the Lebesgue integral on [a, b]. By the definitions of∫

f and∫f , we can choose sequences (P ′n), (P ′′n ) of partitions of [a, b] such that

limnL(f, P ′n) =

∫f and lim

nU(f, P ′′n ) =

∫f.

Define P ′′′n = {a+ k(b− a)/n : k = 0, 1, . . . , n}. Define Pn =⋃nm=1(P

′m ∪ P ′′m ∪ P ′′′m ). Note that

P ′n ⊆ Pn, P ′′n ⊆ Pn, and Pn ⊆ Pn+1. Note also that, if Pn = {x0, x1, . . . , xk}, then 0 ≤ xi − xi−1 ≤1/n for all i = 0, 1, . . . , k. Lemma 25.2 implies

L(f, P ′n) ≤ L(f, Pn) ≤ L(f, Pn+1) ≤∫f and U(f, P ′n) ≥ U(f, Pn) ≥ U(f, Pn+1) ≥

∫f

It follows that

limnL(f, Pn) =

∫f and lim

nU(f, Pn) =

∫f.

For each n, write Pn = {x0, . . . , xk}, and define

sn = f(x0)1{x0} +k∑i=1

inf {f(x) : x ∈ [xi−1, xi]}1(xi−1,xi],

tn = f(x0)1{x0} +k∑i=1

sup {f(x) : x ∈ [xi−1, xi]}1(xi−1,xi].

Then sn and tn are measurable, and∫sndλ = L(f, Pn) and

∫tndλ = U(f, Pn).

Therefore

limn

∫sndλ =

∫f and lim

n

∫tndλ =

∫f.

Notesn ≤ sn+1 ≤ f ≤ tn+1 ≤ tn.

79

Page 81: Lecture Notes - MATH 231A - Real Analysis€¦ · applications. For example, Lebesgue’s theory of measure and integration forms the rigorous foun-dation for modern probability theory,

Define s = supn sn and t = infn tn. Then s and t are measurable and

s1 ≤ s ≤ f ≤ t ≤ t1. (26.1)

Since s1 and t1 are finite everywhere, s and t are finite everywhere. Moreover, since∫s1 and∫

t1 are finite,∫sdλ and

∫tdλ are finite, in other words, s and t are Lebesgue integrable. Since

0 ≤ sn − s1 ↑ s− s1, the monotone convergence theorem implies∫(s− s1)dλ = lim

n

∫(sn − s1)dλ.

Adding∫s1 gives ∫

sdλ = limn

∫sndλ =

∫f. (26.2)

A similar argument gives ∫tdλ = lim

n

∫tndλ =

∫f. (26.3)

Since s ≤ t, we see from (26.2) and (26.3) that f is Riemann integrable on [a, b] iff∫f =

∫f iff∫

(t− s)dλ = 0 iff s = t a.e.

Now we prove (a). Assume f is Riemann integrable on [a, b]. By the observation above, s = t a.e.By (26.1), we have s ≤ f ≤ t. Therefore s = f = t a.e. Since s and t are measurable and since theLebesgue measure is complete, Theorem 23.2 implies f is measurable. Moreover,∫

fdλ =

∫sdλ =

∫tdλ =

∫f =

∫f.

So f is Lebesgue integrable and the Riemann and Lebesgue integrals are equal.

Now we prove (b). As we observed above, f is Riemann integrable on [a, b] iff s(x) = t(x) for a.e.x ∈ [a, b]. Since

⋃∞n=1 Pn is a countable subset of [a, b], it has Lebesgue measure 0. Thus f is

Riemann integrable on [a, b] iff s(x) = t(x) for a.e. x ∈ [a, b] \⋃∞n=1 Pn. Therefore, it will suffice to

show that, for every x ∈ [a, b]\⋃∞n=1 Pn, s(x) = t(x) iff f is continuous at x. Let x ∈ [a, b]\

⋃∞n=1 Pn

be given.

Assume s(x) = t(x). Then limn(tn(x)−sn(x)) = 0. Let ε > 0. Choose n such that tn(x)−sn(x) < ε.Write Pn = {x0, x1, . . . , xk}. Then x ∈ (xj−1, xj) for some j. Choose δ > 0 such that (x−δ, x+δ) ⊆(xj−1, xj). Let y ∈ [a, b] with |x− y| < δ. Then x, y ∈ (xj−1, xj). Therefore

|f(x)− f(y)| ≤ sup {f(x) : x ∈ [xi−1, xi]} − inf {f(x) : x ∈ [xi−1, xi]}= sup {f(x) : x ∈ [xi−1, xi]} − inf {f(x) : x ∈ [xi−1, xi]}1(xj−1,xj ](x)

=k∑i=1

(sup {f(x) : x ∈ [xi−1, xi]} − inf {f(x) : x ∈ [xi−1, xi]}) 1(xj−1,xj ](x)

= tn(x)− sn(x) < ε.

Therefore f is continuous at x.

80

Page 82: Lecture Notes - MATH 231A - Real Analysis€¦ · applications. For example, Lebesgue’s theory of measure and integration forms the rigorous foun-dation for modern probability theory,

Conversely, now we assume f is continuous at x and deduce that s(x) = t(x). Let ε > 0. Chooseδ > 0 such that, for all y ∈ [a, b], if |x − y| < δ, then |f(x) − f(y)| < ε. Choose N ∈ N such that1/N < δ. Let n ≥ N be arbitrary. Write Pn = {x1, x0, . . . , xk}. Then x ∈ (xj−1, xj) for some1 ≤ j ≤ k. Recall, from the definition of Pn, that 0 ≤ xi − xi−1 ≤ 1/n for all 1 ≤ i ≤ k. So, forevery y ∈ [xj−1, xj ], we have 0 ≤ |y − x| ≤ xj − xj−1 ≤ 1/n ≤ 1/N < δ, hence |f(y) − f(x)| < ε.Thus, for every y1, y2 ∈ [xj−1, xj ] we have

|f(y1)− f(y2)| ≤ |f(y1)− f(x)|+ |f(y2)− f(x)| ≤ 2ε.

It follows that

tn(x)− sn(x) =k∑i=1

(sup {f(x) : x ∈ [xi−1, xi]} − inf {f(x) : x ∈ [xi−1, xi]}) 1(xj−1,xj ](x)

= sup {f(x) : x ∈ [xi−1, xi]} − inf {f(x) : x ∈ [xi−1, xi]}1(xj−1,xj ](x)

= sup {f(x) : x ∈ [xi−1, xi]} − inf {f(x) : x ∈ [xi−1, xi]}≤ 2ε.

Thust(x)− s(x) = lim

n(tn(x)− sn(x)) ≤ 2ε.

Letting ε→ 0 gives s(x) = t(x).

81

Page 83: Lecture Notes - MATH 231A - Real Analysis€¦ · applications. For example, Lebesgue’s theory of measure and integration forms the rigorous foun-dation for modern probability theory,

Chapter 8

Product Measures and Fubini’sTheorem

The goal of this chapter is to find sufficient conditions for changing the order of iterated integrals.Given measure spaces (X,S, µ) and (Y, T , ν) and a function f : X × Y → R, we will deriveconditions under which the following formula is valid:∫ ∫

f(x, y)dµ(x)dν(y) =

∫ ∫f(x, y)dν(y)dν(x)

27 Measurable Rectangles and Product σ-Algebras

Let X and Y be sets.

Notation. If S ⊆ P(X) and T ⊆ P(Y ), we define

S × T = {A×B : A ∈ S, B ∈ T }

Note that this is not the usual Cartesian product of S and T .

Remark. If S and T are σ-algebras, then S × T need not be a σ-algebra (see Exercise N).

Definition 27.1. Let S be a σ-algebra on X and T be a σ-algebra on Y .

(a) The sets in S × T are called measurable rectangles.

(b) The σ-algebra on X × Y generated by S × T is called the product σ-algebra of S and T .It is denoted by S ⊗ T . In other words, S ⊗ T = σ(S ⊗ T ).

Definition 27.2. Let E ⊆ X × Y .

(a) For each x0 ∈ X, the x0-section of E is the set Ex0 = {y ∈ Y : (x0, y) ∈ E} ⊆ Y.

(b) For each y0 ∈ Y , the y0-section of E is the set Ey0 = {x ∈ X : (x, y0) ∈ E} ⊆ X.

82

Page 84: Lecture Notes - MATH 231A - Real Analysis€¦ · applications. For example, Lebesgue’s theory of measure and integration forms the rigorous foun-dation for modern probability theory,

Example 27.3.

(a) If E = [1, 2]× [3, 4] ⊆ R× R, then Ex = [3, 4] for every x ∈ [1, 2] and Ex = ∅ otherwise.

(b) If E ={

(x, y) : x2 + y2 = 1}⊆ R× R, then E0 = {1,−1} and E1 = {0} and E2 = ∅.

Definition 27.4. Let f : X × Y → R.

(a) For each x0 ∈ X, the x0-section of f is the function fx0 : Y → R defined by fx0(y) = f(x0, y)

(b) For each y0 ∈ Y , the y0-section of f is the function fy0 : X → R defined by fy0(x) = f(x, y0)

Example 27.5.

(a) If f : R× R→ R is defined by f(x, y) = 5x2 + y3, then f2(y) = f(2, y) = 20 + y3.

(b) For any E ⊆ X × Y and x0 ∈ X and y0 ∈ Y , we have (1E)x0 = 1Ex0and (1E)y0 = 1Ey0 .

Theorem 27.6. Let S be a σ-algebra on X and let T be a σ-algebra on Y .

(a) If E ∈ S ⊗ T , then Ex0 ∈ T for each x0 ∈ X and Ey0 ∈ S for each y0 ∈ Y .

(b) If f : X × Y → R is S ⊗ T -measurable, then fx0 is T -measurable for each x0 ∈ X and fy0 isS-measurable for each y0 ∈ Y .

Proof.

(a) We must show

S ⊗ T ⊆ {E ⊆ X × Y : Ex0 ∈ T for all x0 ∈ X and Ey0 ∈ S for all y0 ∈ Y } .

It is easy to check that the right-hand side is a σ-algebra containing the collection of measur-able rectangles. As S ⊗ T is the smallest σ-algebra containing the collection of measurablerectangles, the desired containment follows.

(b) Let x0 ∈ X. For every Borel set B ⊆ R, we have f−1(B) ∈ S ⊗ T , and so

(fx0)−1(B) =[f−1(B)

]x0∈ T .

This proves that fx0 is T -measurable for each x0 ∈ X. The other conclusion is provedsimilarly.

83

Page 85: Lecture Notes - MATH 231A - Real Analysis€¦ · applications. For example, Lebesgue’s theory of measure and integration forms the rigorous foun-dation for modern probability theory,

28 Product Measures

If S and T are σ-algebras, then S × T need not be a σ-algebra (see Exercise N). However, we dohave the following lemma.

Lemma 28.1. If E is a demi-ring on a set X and F is a demi-ring on a set Y , then E × F is ademi-ring on X × Y .

Proof. First note that ∅ = ∅ × ∅ ∈ E × F . Now let A1 × A2 and B1 × B2 be elements of E × F .Then A1 \ B1 =

⋃n1i=1C1,i for some disjoint sets C1,1, . . . , C1,n1 ∈ E . Also A2 \ B2 =

⋃n2i=1C2,i for

some disjoint sets C2,1, . . . , C2,n2 ∈ E . Define C1,0 = B1 and C2,0 = B2. Then A1 =⋃n1i=0C1,i and

C1,0, . . . , C1,n1 are disjoint. Also A2 =⋃n2i=0C2,i and C2,0, . . . , C2,n2 are disjoint. Therefore

A1 ×A2 =

n1⋃i1=0

n2⋃i2=0

(C1,i1 × C2,i2).

Moreover, the sets C1,i1 × C2,i2 are disjoint sets in E × F for all pairs (i1, i2). Indeed, if (i1, i2) 6=(j1, j2), then i1 6= j1 or i2 6= j2, so C1,i1 ∩ C1,j1 = ∅ or C2,i2 ∩ C2,j2 = ∅, and hence

(C1,i1 × C2,i2) ∩ (C1,j1 × C2,j2) = (C1,i1 ∩ C1,j1)× (C2,i2 ∩ C2,j2) = ∅.

Therefore

(A1 ×A2) \ (B1 ×B2) =

n1⋃i1=0

n2⋃i2=0

((C1,i1 × C2,i2) \ (C1,0 × C2,0)) =

n1⋃i1=1

n2⋃i2=1

(C1,i1 × C2,i2).

Corollary 28.2. If S is a σ-algebra on a set X and T is a σ-algebra on a set Y , then S × T is ademi-ring on X × Y .

Theorem 28.3. Let (X,S, µ) and (Y, T , ν) be measure spaces. Define π0 : S × T → [0,∞] by

π0(A×B) = µ(A)ν(B)

for each A×B ∈ S × T . Let π∗ be the outer measure generated by π0.

(a) π0 is finitely additive and countably monotone.

(b) S × T ⊆M(π∗) and π∗ = π0 on S × T .

(c) Let A be a σ-algebra on X × Y such that S × T ⊆ A ⊆ M(π∗). Define µ ⊗ ν = π∗|A (i.e.,µ⊗ ν is the restriction of π∗ to A). Then µ⊗ ν is a measure on A and

µ⊗ ν(A×B) = µ(A)ν(B) for all A×B ∈ S × T . (28.1)

Moreover, if A = M(π∗), then µ⊗ ν is complete. We call µ⊗ ν the product measure of µand ν on A.

(d) Under the assumptions of (c), if X is σ-finite with respect to µ and Y is σ-finite with respectto ν, then X × Y is σ-finite with respect to π0 and µ ⊗ ν is the unique measure on A thatsatisfies (28.1).

84

Page 86: Lecture Notes - MATH 231A - Real Analysis€¦ · applications. For example, Lebesgue’s theory of measure and integration forms the rigorous foun-dation for modern probability theory,

Proof.

(a): We first prove that π0 is countably additive. Let A1 × B1, A2 × B2, . . . ∈ S × T be disjointsets. Let A×B ∈ S × T such that A×B =

⋃∞i=1(Ai ×Bi). For every x ∈ X and y ∈ Y ,

1A(x)1B(y) = 1A×B(x, y) =

∞∑i=1

1Ai×Bi(x, y) =

∞∑i=1

1Ai(x)1Bi(y).

For fixed y ∈ Y , integrating in x gives

µ(A)1B(y) =

∫1A(x)1B(y)dµ(x) =

∫ ∞∑i=1

1Ai(x)1Bi(y)dµ(x).

By applying the monotone convergence theorem to the partial sums of the last series, we obtain

µ(A)1B(y) =∞∑i=1

∫1Ai(x)1Bi(y)dµ(x) =

∞∑i=1

µ(Ai)1Bi(y).

Now integrating in y gives

µ(A)ν(B) =

∫µ(A)1B(y)dν(y) =

∫ ∞∑i=1

µ(Ai)1Bi(y)dν(y).

By applying the monotone convergence theorem to the partial sums of the last series, we obtain

µ(A)ν(B) =

∞∑i=1

∫µ(Ai)1Bi(y)dν(y). =

∞∑i=1

µ(Ai)ν(Bi).

Hence π0(A×B) =∑∞

i=1 π0(Ai ×Bi). Thus π0 is countably additive. It follows immediately thatπ0 is finitely additive. To see that π0 is countably monotone, we can appeal to HW8 Exercise 4(c).Alternatively, we can modify the argument above as follows. We drop the condition that the setsAi ×Bi are disjoint and require only that A×B ⊆

⋃∞i=1(Ai ×Bi). Then the second = in the first

displayed equation above becomes ≤, and in all subsequent equations the first = becomes ≤.

(b): Combine (a), Lemma 28.1, and Theorem 20.3.

(c): Combine (a), (b), Theorem 21.1(a), and Theorem 23.3.

(d): Since X is σ-finite with respect to µ, there exists A1, A2, . . . ∈ S such that X =⋃∞i=1Ai and

µ(Ai) < ∞ for all i. Since Y is σ-finite with respect to ν, there exists B1, B2, . . . ∈ T such thatY =

⋃∞j=1Bj and µ(Bj) <∞ for all j. Then

X × Y =

( ∞⋃i=1

Ai

∞⋃j=1

Bj

=

∞⋃i=1

∞⋃j=1

(Ai ×Bj),

Ai ×Bj ∈ S × T and π0(Ai ×Bj) = µ(Ai)ν(Bj) <∞ for all i, j. So X × Y is σ-finite with respectto π0. Now Theorem 21.1(c) implies the result.

85

Page 87: Lecture Notes - MATH 231A - Real Analysis€¦ · applications. For example, Lebesgue’s theory of measure and integration forms the rigorous foun-dation for modern probability theory,

29 Fubini’s Theorem

Let (X,S, µ) and (Y, T , ν) be complete measure spaces. The reader should not overlook the com-pleteness assumptions. Let A be a σ-algebra on X × Y such that S × T ⊆ A ⊆M(π∗). Let µ⊗ νbe the product of µ and ν on A.

Lemma 29.1. Let E ∈ (S × T )σδ with µ⊗ ν(E) <∞.

(a) The function g(y) = µ(Ey) is T -measurable

(b) The function h(x) = µ(Ex) is S-measurable.

(c)

∫µ(Ey)dν(y) = µ⊗ ν(E) =

∫ν(Ex)dµ(x)

Proof Outline. Verify the theorem for measurable rectangles directly. Then use countable addi-tivity, the monotone convergence theorem and the fact that limits of measurable functions aremeasurable to extend the theorem to countable unions of measurable rectangles. Finally, use con-tinuity from above and the dominated convergence theorem to extend the theorem to countableintersections of countable unions of measurable rectangles.

Proof. The proof that g(y) = µ(Ey) is T -measurable is symmetric with the proof that h(x) = 7→µ(Ex) is S-measurable. Thus we omit the latter. Similarly, the proofs of the first equality andsecond equality in (c) are virtually identical, so we omit the latter.

First note that Theorem 27.6 implies Ey ∈ S for every y ∈ Y . We consider three cases. We onlyuse the completeness of (X,S, µ) and (Y, T , ν) in the third case.

Case 1: E = A×B ∈ S × T .

For each y ∈ Y , we have Ey = A if y ∈ B and Ey = ∅ if y /∈ B, hence µ(Ey) = µ(A)1B(y).Therefore, since B ∈ T , the function g(y) = µ(Ey) is T -measurable. Moreover,∫

µ(Ey)dν(y) =

∫µ(A)1B(y)dν(y) = µ(A)ν(B) = µ⊗ ν(A×B) = µ⊗ ν(E)

Case 2: E ∈ (S × T )σ.

Then E =⋃∞i=1Ei for some sets Ei = Ai × Bi ∈ S × T . Since S × T is a demi-ring, we can use

Lemma 20.2(a) and relabel to ensure that the sets Ei are disjoint (as in HW8 Exercise 4(b)). Then,for each y ∈ Y , the sets Eyi are disjoint, Ey =

⋃∞i=1E

yi , and

µ(Ey) =∞∑i=1

µ(Eyi ) = limn

n∑i=1

µ(Eyi ).

By Case 1, gi(y) = µ(Eyi ) is T -measurable for each i. Therefore g(y) = µ(Ey) is T -measurablebecause g =

∑∞i=1 gi = limn

∑ni=1 gi. Moreover,∫

µ(Ey)dν(y) =

∫ ∞∑i=1

µ(Eyi )dν(y)

86

Page 88: Lecture Notes - MATH 231A - Real Analysis€¦ · applications. For example, Lebesgue’s theory of measure and integration forms the rigorous foun-dation for modern probability theory,

By applying the monotone convergence theorem to the partial sums of the series, we obtain∫µ(Ey)dν(y) =

∞∑i=1

∫µ(Eyi )dν(y).

Now using Case 1, we get ∫µ(Ey)dν(y) =

∞∑i=1

µ⊗ ν(Ei)

Since E =⋃∞i=1Ei and the sets Ei are disjoint, we have∫

µ(Ey)dν(y) = µ⊗ ν(E).

Case 3: E ∈ (S × T )σδ.

Then E =⋂∞n=1En for some sets En ∈ (S × T )σ.

Claim: There exists a decreasing sequence (Gn) of sets in (S × T )σ such that E =⋂∞n=1Gn and

µ⊗ ν(Gn) <∞ for every n.

Proof of Claim: By Theorem 20.4, we can find a set F ∈ (S×T )σ such that E ⊆ F and µ⊗ν(F ) ≤µ ⊗ ν(E) + 1. Define Fn = En ∩ F for each n. Then E =

⋂∞n=1 Fn. Since µ ⊗ ν(E) < ∞ and

Fn ⊆ F , we have µ ⊗ ν(Fn) < ∞ for each n. By Lemma 20.2(b), Fn ∈ (S × T )σ for each n.Define Gn =

⋂nk=1 Fk for each n. Then (Gn) is a decreasing sequence of sets, E =

⋂∞n=1Gn,

and µ ⊗ ν(Gn) ≤ µ ⊗ ν(Fn) < ∞ for each n. Using Lemma 20.2(b) again, we can show thatGn ∈ (S × T )σ for each n. This completes the proof of the claim.

By Case 2, for every n, the function gn(y) = µ(Gyn) is T -measurable and∫µ(Gyn)dν(y) = µ⊗ν(Gn).

So, by continuity from above (Theorem 5.2), we have

µ⊗ ν(E) = limnµ⊗ ν(Gn) = lim

n

∫µ(Gyn)dν(y) = lim

n

∫gn(y)dν(y). (29.1)

On the other hand, we have ∫g1(y)dν(y) = µ⊗ ν(G1) <∞.

Thus g1(y) = µ(Gy1) is finite for ν-a.e. y ∈ Y . For each y ∈ Y , Ey =⋂∞n=1G

yn and (Gyn) is a

decreasing sequence of sets in S. For each y ∈ Y such that g1(y) = µ(Gy1) is finite, continuity fromabove (Theorem 5.2) implies limn gn(y) = limn µ(Gyn) = limn µ(Ey) = g(y). Therefore gn → g forν-a.e. y ∈ Y . Since (Y, T , ν) is complete and each gn is T -measurable, Theorem 23.2 implies g isT -measurable. In other words, the function g(y) = µ(Ey) is T -measurable. We have observed thatgn → g for ν-a.e. and g1 is ν-integrable. We also have 0 ≤ gn ≤ g1 for all n. Thus the dominatedconvergence theorem gives∫

µ(Ey)dν(y) =

∫g(y)dν(y) = lim

n

∫g(y)dν(y)

Combining this with (29.1) gives ∫µ(Ey)dν(y) = µ⊗ ν(E).

87

Page 89: Lecture Notes - MATH 231A - Real Analysis€¦ · applications. For example, Lebesgue’s theory of measure and integration forms the rigorous foun-dation for modern probability theory,

This completes the proof.

Note: Everywhere that we used continuity from above, we could have instead used the dominatedconvergence theorem applied to indicator functions. Moreover, if we used the dominated conver-gence theorem, we would only need that Gn ⊆ G1, instead of that (Gn) is decreasing.

Lemma 29.2. Let E ∈ A with µ⊗ ν(E) <∞.

(a) Ey ∈ S for ν-a.e. y ∈ Y . In other words, there is a set B ∈ T such that ν(Bc) = 0 andEy ∈ S for every y ∈ B.

(b) Ex ∈ T for µ-a.e. x ∈ X. In other words, there is a set A ∈ S such that µ(Ac) = 0 andEx ∈ T for every x ∈ A.

(c) There exists a set B as in (a) such that the function g(y) =

{µ(Ey) if y ∈ B0 otherwise

}is T -

measurable

(d) There exists a set A as in (b) such that the function h(x) =

{ν(Ex) if x ∈ A0 otherwise

}is S-

measurable.

(e)

∫g(y)dν(y) = µ⊗ ν(E) =

∫h(x)dµ(x).

Proof Outline. Use Theorem 20.4 to approximate an arbitrary set E ∈ A by a set H ∈ (S ⊗ T )σδwith the same measure of E and the set H \ E ∈ A which is contained in a set H0 ∈ (S ⊗ T )σδhaving measure 0. Lemma 29.1 says the theorem holds for H and the completeness of µ and νcombined with Lemma 29.1 imply that the theorem holds for E \H. The additivity of measuresand the fact that sums of measurable functions are measurable then completes the proof.

Proof. The proofs of (a) and (c) are symmetric with the proofs of (b) and (d). Thus we omit thelatter. Similarly, the proofs of the first equality and second equality in (e) are virtually identical,so we omit the latter.

By Theorem 20.4, there is a set H ∈ (S × T )σδ such that E ⊆ H and µ⊗ ν(H) = µ⊗ ν(E) <∞.Then H \ E ∈ A and (by Theorem 5.2) µ⊗ ν(H \ E) = 0.

By applying Theorem 20.4 to H \ E, there is a set H0 ∈ (S × T )σδ such that H \ E ⊆ H0 andµ ⊗ ν(H0) = µ ⊗ ν(H \ E) = 0. By Theorem 27.6, Hy

0 ∈ S for every y ∈ Y . By Lemma 29.1, thefunction g0(y) = µ(Hy

0 ) is T -measurable and∫µ(Hy

0 )dν(y) = µ⊗ ν(H0) = 0.

Thus µ(Hy0 ) = 0 for ν-a.e. y ∈ Y . Since (X,S, µ) is complete and (E \ H)y ⊆ Hy

0 , we have(E \H)y ∈ S and µ((E \H)y) = 0 for ν-a.e. y ∈ Y . By Theorem 27.6, Hy ∈ S for every y ∈ Y .Then, since Ey = Hy \ (H \ E)y (verify this), we have Ey ∈ S for ν-a.e. y ∈ Y . This proves (a).

Let B ∈ T with ν(Bc) = 0 and Ey ∈ S for every y ∈ B. Define g as in (c). By Lemma 29.1, thefunction gH(y) = µ(Hy) is T -measurable. For each y ∈ Y , we have Hy = Ey∪ (H \E)y. Therefore,

88

Page 90: Lecture Notes - MATH 231A - Real Analysis€¦ · applications. For example, Lebesgue’s theory of measure and integration forms the rigorous foun-dation for modern probability theory,

for each y ∈ B, the countable additivity of µ gives

gH(y) = µ(Hy) = µ(Ey) + µ((H \ E)y) = µ(Ey) = g(y)

Thus gH(y) = g(y) for ν-a.e. y ∈ Y . Since gH is T -measurable and since (Y, T , ν) is complete,Theorem 23.2 implies g is T -measurable. This proves (c).

Since gH(y) = g(y) for ν-a.e. y ∈ Y , integrating gives∫µ(Hy)dν(y) =

∫gH(y)dν(y) =

∫g(y)dν(y)

By Lemma 29.1,

µ⊗ ν(H) =

∫µ(Hy)dν(y).

Finally, recall that µ⊗ ν(E) = µ⊗ ν(H). Putting everything together, we see that

µ⊗ ν(E) =

∫g(y)dν(y).

This completes the proof.

Theorem 29.3. (Tonelli’s Theorem). Let f : X × Y → R be A-measurable. Assume f ≥ 0 and{f > 0} is σ-finite with respect to µ⊗ ν.

(a) fy is S-measurable for ν-a.e. y ∈ Y . In other words, there is a set B ∈ T such that ν(Bc) = 0and fy is T -measurable for every y ∈ B.

(b) fx is T -measurable for µ-a.e. x ∈ X. In other words, there is a set A ∈ S such that µ(Ac) = 0and fx is S-measurable for every x ∈ A.

(c) There exists a set B as in (a) such that the function g(y) =

{ ∫fy(x)dµ(x) if y ∈ B

0 otherwise

}is

T -measurable.

(d) There exists a set A as in (b) such that the function h(x) =

{ ∫fx(y)dν(y) if x ∈ A

0 otherwise

}is

S-measurable.

(e)

∫g(y)dν(y) =

∫f(x, y)d(µ⊗ ν)(x, y) =

∫h(x)dµ(x).

Proof Outline. Lemma 29.2 implies the theorem holds for indicator functions. Then the linearity ofthe integral and the fact that linear combinations of measurable functions are measurable impliesthe theorem holds for non-negative simple functions. Finally, the simple approximation theorem,the monotone convergence theorem, and the fact limits of measurable functions are measurableimplies the theorem holds for non-negative functions.

Proof.

The proofs of (a) and (c) are symmetric with the proofs of (b) and (d). Thus we omit the latter.Similarly, the proofs of the first equality and second equality in (e) are virtually identical, so weomit the latter.

89

Page 91: Lecture Notes - MATH 231A - Real Analysis€¦ · applications. For example, Lebesgue’s theory of measure and integration forms the rigorous foun-dation for modern probability theory,

Case 1: f = 1E , E ∈ A, µ⊗ ν(E) <∞

This is Lemma 29.2.

Case 2: f is a A-measurable simple function such that µ⊗ ν({f > 0}) <∞.

Then f =∑n

i=1 ci1Ei , where ci ∈ (0,∞), Ei = f−1({ci}) ∈ S ⊗ T , and µ⊗ ν(Ei) <∞. Note that,for every y ∈ Y , fy =

∑ni=1 ci1

yEi

. By Case 1, for each i ∈ {1, . . . , n}, there is a set Bi ∈ T suchthat ν(Bc

i ) = 0, 1yEiis S-measurable for every y ∈ Bi, and the function

gi(y) =

{ ∫1yEi

(x)dµ(x) if y ∈ Bi0 otherwise

}is T -measurable. Let B =

⋂ni=1Bi. Then B ∈ T with ν(Bc) = 0, 1yEi

is S-measurable for everyy ∈ B and every i ∈ {1, . . . , n}, and the function

gi(y)1B(y) =

{ ∫1yEi

(x)dµ(x) if y ∈ B0 otherwise

}is T -measurable for every i ∈ {1, . . . , n}. Therefore fy =

∑ni=1 ci1

yEi

is S-measurable for everyy ∈ B and the function

g(y) =

{ ∫fy(x)dµ(x) if y ∈ B

0 otherwise

}=

{ ∫ ∑ni=1 ci1

yEi

(x)dµ(x) if y ∈ B0 otherwise

}=

n∑i=1

cigi(y)1B(y)

is T -measurable. Also by Case 1, we have that∫gi(y)dν(y) = µ ⊗ ν(Ei). Notice that gi = gi1B

for ν-a.e. y ∈ Y . So∫gi(y)1B(y)dν(y) =

∫1Ei(x, y) = d(µ⊗ ν)(x, y) = µ⊗ ν(Ei). Therefore∫

g(y)dν(y) =n∑i=1

ci

∫gi(y)1B(y)dν(y) =

n∑i=1

ciµ⊗ ν(Ei) =

∫f(x, y)dµ⊗ ν(x, y).

Case 3: f ≥ 0 and {f > 0} is σ-finite with respect to µ⊗ ν.

By the Simple Approximation Theorem (Theorem 7.5), there is a sequence (sn) of A-measurablesimple functions such that 0 ≤ sn ↑ f pointwise on X × Y . Since {f > 0} is σ-finite with respectto µ ⊗ ν, there is an increasing sequence of sets F1, F2, . . . ∈ A such that X × Y =

⋃∞i=n Fn and

µ ⊗ ν(Fn) < ∞ for each n. Define fn = sn1Fn . Then (fn) is a sequence of A-measurable simplefunctions such that 0 ≤ fn ↑ f pointwise on X × Y and µ⊗ ν({fn > 0}) <∞ for each n.

By Case 2, for each n, there is a set Bn ∈ T such that ν(Bcn) = 0, fyn is S-measurable for each

y ∈ Bn, and the function

gn(y) =

{ ∫fy(x)dµ(x) if y ∈ Bn

0 otherwise

}is T -measurable. Let B =

⋂∞n=1Bn. Then B ∈ T with ν(Bc) = 0, fyn is S-measurable for every

y ∈ B and every n, and the function

gn(y)1B(y) =

{ ∫fyn(x)dµ(x) if y ∈ B

0 otherwise

}

90

Page 92: Lecture Notes - MATH 231A - Real Analysis€¦ · applications. For example, Lebesgue’s theory of measure and integration forms the rigorous foun-dation for modern probability theory,

is T -measurable for every n. Note that 0 ≤ fyn ↑ fy pointwise on X for every y ∈ Y . Therefore fy

is T -measurable for every y ∈ B. Moreover, by the monotone convergence theorem, 0 ≤ gn1B ↑ gpointwise on Y , where

g(y) =

{ ∫fyn(x)dµ(x) if y ∈ B

0 otherwise

}.

Therefore g is T -measurable. Also by Case 2, we have that∫gn(y)dν(y) =

∫fn(x, y)d(µ⊗ν)(x, y).

Notice that gn = gn1B for ν-a.e. y ∈ Y . So∫gn(y)1B(y)dν(y) =

∫fn(x, y)d(µ ⊗ ν)(x, y). Then,

applying the monotone convergence theorem twice,∫g(y)dν(y) = lim

n

∫gn(y)dν(y) = lim

n

∫fn(x, y)d(µ⊗ ν)(x, y) =

∫f(x, y)dµ⊗ ν(x, y).

Theorem 29.4. (Fubini’s Theorem). Let f : X×Y → R be A-measurable. Assume f is integrablewith respect to µ⊗ ν.

(a) fy is S-measurable for ν-a.e. y ∈ Y . In other words, there is a set B ∈ T such that ν(Bc) = 0and fy is T -measurable for every y ∈ B.

(b) fx is T -measurable for µ-a.e. x ∈ X. In other words, there is a set A ∈ S such that µ(Ac) = 0and fx is S-measurable for every x ∈ A.

(c) There exists a set B as in (a) such that the function g(y) =

{ ∫fy(x)dµ(x) if y ∈ B

0 otherwise

}is

T -measurable.

(d) There exists a set A as in (b) such that the function h(x) =

{ ∫fx(y)dν(y) if x ∈ A

0 otherwise

}is

S-measurable.

(e)

∫g(y)dν(y) =

∫f(x, y)d(µ⊗ ν)(x, y) =

∫h(x)dµ(x).

Proof Outline. Apply Tonelli’s theorem to f+ and f−.

Proof. Since f is integrable with respect to µ⊗ ν and since∫fdµ⊗ ν =

∫f+dµ⊗ ν −

∫f+dµ⊗ ν,

f+ and f− are integrable with respect to µ ⊗ ν. {f+ > 0} and {f− > 0} σ-finite with respect toµ ⊗ ν. Tonelli’s theorem implies (a)-(e) hold with f replaced by f+ and f−. In particular, thereare sets B+, B− ∈ T such that ν((B+)c) = ν((B−)c) = 0, (f+)y is S-measurable for all y ∈ B+,

(f−)y is S-measurable for all y ∈ B−, the functions g+0 (y) =

{ ∫(f+)y(x)dµ(x) if y ∈ B+

0 otherwise

}and

g−0 (y) =

{ ∫(f−)y(x)dµ(x) if y ∈ B−

0 otherwise

}are both T -measurable, and

∫g+0 dν(y) =

∫f+d(µ⊗ ν) and

∫g−0 dν(y) =

∫f−d(µ⊗ ν). (29.2)

91

Page 93: Lecture Notes - MATH 231A - Real Analysis€¦ · applications. For example, Lebesgue’s theory of measure and integration forms the rigorous foun-dation for modern probability theory,

Define B = B+ ∩ B−. Then B ∈ T , ν(B) = 0, (f+)y and (f−)y are S-measurable for all y ∈ B,and the functions g+0 1B and g−0 1B are both T -measurable. Since f = f+ − f−, we have fy =(f+)y − (f−)y for every y ∈ Y . Therefore fy is S-measurable for all y ∈ B. This proves (a).Moreover, for every y ∈ Y ,

g(y) =

{ ∫fy(x)dµ(x) if y ∈ B

0 otherwise

}=

{ ∫(fy)+(x)dµ(x)−

∫(fy)−(x)dµ(x) if y ∈ B

0 otherwise

}=

{ ∫(fy)+(x)dµ(x) if y ∈ B

0 otherwise

}−{ ∫

(fy)−(x)dµ(x) if y ∈ B0 otherwise

}= g+0 (y)1B(y)− g−0 (y)1B(y).

So g is T -measurable. This proves (c). (As an aside, since g+0 , g−0 ≥ 0, we also have g+ = g+0 1B

and g− = g−0 1B.) Finally, since g+0 1B = g+0 and g−0 1B = g−0 hold ν-a.e., (29.2) implies∫g(y)dν(y) =

∫g+0 (y)1B(y)dν(y)−

∫g−0 (y)1B(y)dν(y) =

∫g+0 (y)dν(y)−

∫g−0 (y)dν(y)

=

∫f+d(µ⊗ ν)−

∫f−d(µ⊗ ν) =

∫fd(µ⊗ ν).

This proves the first equality in (e). The proofs of (b),(d), and the second equality in (e) aresimilar.

Remark 29.5. Concerning the hypotheses of Tonelli’s theorem, if X is σ-finite with respect to µand Y is σ-finite with respect to ν, then Theorem 28.3(d) implies {f > 0} is σ-finite with respectto µ⊗ ν.

Remark 29.6. The conclusion (e) in Fubini’s and Tonelli’s theorem is often stated as∫ ∫f(x, y)dµ(x)dν(y) =

∫f(x, y)dµ⊗ ν(x, y) =

∫ ∫f(x, y)dν(x)dµ(y).

The statement (e) is the precise version.

Remark 29.7. Fubini’s and Tonelli’s theorems are frequently used together. Typically one wishesto reverse the order of integration in a double integral

∫ ∫fdµdν. First one verifies that

∫|f |d(µ⊗

ν) < ∞ by using Tonelli’s theorem to evaluate this integral as the iterated integral∫ ∫|f |dµdν.

Then one applies Fubini’s theorem to conclude that∫ ∫

fdµdν =∫ ∫

fdνdµ. Some examples aregiven in the exercises.

Remark 29.8. If A = S ⊗ T , the hypothesis that (X,S, µ) and (X, T , µ) are complete can beomitted. Moreover, in Theorems 29.3 and 29.4, the conclusions can be modified as follows:

(a) fy is S-measurable for every y ∈ Y .

(b) fx is T -measurable for every x ∈ X.

(c) The function g(y) =∫fy(x)dµ(x) is T -measurable.

(d) The function h(x) =∫fx(y)dµ(y) is S-measurable.

(e)

∫g(y)dν(y) =

∫f(x, y)d(µ⊗ ν)(x, y) =

∫h(x)dµ(x).

An analogous modification of Lemma 29.2 also holds.

92

Page 94: Lecture Notes - MATH 231A - Real Analysis€¦ · applications. For example, Lebesgue’s theory of measure and integration forms the rigorous foun-dation for modern probability theory,

30 Product Spaces With More Than Two Factors

All the integration theory we have developed for product spaces of two factors X × Y generalizesnaturally to products spaces of n factors

∏ni=1Xi = X1 × · · · ×Xn. We state the main definitions

and theorems, but omit the proofs and any inessential details.

Let X1, . . . , Xn be sets.

Notation. If Si ⊆ P(Xi) (i = 1, . . . , n), we define

n∏i=1

Si = S1 × · · · × Sn =

{n∏i=1

Ai = A1 × · · · ×An : Ai ∈ Si

}Note that this is not the usual Cartesian product of S〉 (i = 1, . . . , n)

Definition 30.1. Let Si be a σ-algebra on Xi (i = 1, . . . , n).

(a) The sets in∏ni=1 Si are called measurable rectangles.

(b) The σ-algebra on∏ni=1Xi generated by

∏ni=1 Si is called the product σ-algebra of S1, . . . ,Sn

It is denoted by⊗n

i=1 Si = S1 ⊗ · · · ⊗ Sn. In other words,⊗n

i=1 Si = σ(∏ni=1 Si).

Lemma 30.2. If Ei is a demi-ring on Xi for i = 1, . . . , n, then∏ni=1 Ei is a demi-ring on

∏ni=1Xi.

Corollary 30.3. If Si is a σ-algebra on Xi for i = 1, . . . , n, then∏ni=1 Si is a demi-ring on

∏ni=1Xi.

Theorem 30.4. Let (Xi,Si, µi) be a measure space for i = 1, . . . , n. Define π0 :∏ni=1 Si → [0,∞]

by

π0

(n∏i=1

Ai

)=

n∏i=1

µi(Ai).

Let π∗ be the outer measure generated by π0.

(a) π0 is finitely additive and countably monotone.

(b)∏ni=1 Si ⊆M(π∗) and π∗ = π0 on

∏ni=1 Si.

(c) Let A be a σ-algebra on∏ni=1Xi such that

∏ni=1 Si ⊆ A ⊆ M(π∗). Let

⊗ni=1 µi == π∗|A

(i.e.,⊗n

i=1 µi is the restriction of π∗ to A). Then⊗n

i=1 µi is a measure on A and

⊗ni=1µi(

n∏i=1

Ai) =n∏i=1

µi(Ai) for alln∏i=1

Ai ∈n∏i=1

Si. (30.1)

Moreover, if A = M(π∗), then⊗n

i=1 µi is complete. We call⊗n

i=1 µi the product measureof µ1, . . . , µn on A.

(d) Under the assumptions of (c), if Xi is σ-finite with respect to µi for i = 1, . . . , n, then∏ni=1Xi

is σ-finite with respect to π0 and⊗n

i=1 µi is the unique measure on A that satisfies (30.1).

Theorem 30.5. (Fubini and Tonelli Theorems) Let (Xi,Si, µi) be a complete measure space fori = 1, . . . , n. Let A be a σ-algebra on

∏ni=1Xi such that

∏ni=1 Si ⊆ A ⊆ M(π∗). Let

⊗ni=1 µi the

product of µ1, . . . , µn on A. Let f :∏ni=1Xi → R be A measurable. Assume at least one of the

following two conditions holds:

93

Page 95: Lecture Notes - MATH 231A - Real Analysis€¦ · applications. For example, Lebesgue’s theory of measure and integration forms the rigorous foun-dation for modern probability theory,

(i) f ≥ 0 and {f > 0} is σ-finite with respect to⊗n

i=1 µi

(ii) f is integrable with respect to⊗n

i=1 µi.

Then, for any permutation p : {1, . . . , n} → {1, . . . , n},∫· · ·∫f(x1, . . . , xn)dµ1(x1) · · · dµn(xn) =

∫f(x1, . . . , xn)d(

⊗ni=1µi)(x1, . . . , xn)

=

∫· · ·∫f(x1, . . . , xn)dµp(1)(xp(1)) · · · dµp(n)(xp(n)).

94

Page 96: Lecture Notes - MATH 231A - Real Analysis€¦ · applications. For example, Lebesgue’s theory of measure and integration forms the rigorous foun-dation for modern probability theory,

31 Lebesgue Measure on Rn

In this section, we construct Lebesgue measure on Rn. We use the product measure constructionof Theorem 30.4.

Theorem 31.1. Let λ be the Lebesgue measure on L(R). Define λn0 : L(R)n → [0,∞] by

λn0

(n∏i=1

Ai

)=

n∏i=1

λ(Ai)

Let λn,∗ be the outer measure on Rn generated by λn0 . It is called the Lebesgue outer measureon Rn. Let E be the collection of all intervals of the form (a, b] where a, b ∈ R and a ≤ b. Let

En = E × · · · × E =

{n∏i=1

(ai, bi] : (ai, bi] ∈ E

}.

(a) L(R)n is a demi-ring on Rn.

(b) λn0 is finitely additive.

(c) λn0 is countably monotone.

(d) Rn is σ-finite with respect to λn0 .

(e) λn,∗ = λn0 on L(R)n. In particular, λn,∗ assigns any product of intervals its volume.

(f) Define L(Rn) = M(λn,∗). We call L(Rn) the Lebesgue σ-algebra on Rn. It satisfiesL(R)n ⊆ L(Rn) and B(Rn) ⊆ L(Rn).

(g) λ∗|L(Rn) is a complete measure on L(Rn). In fact, it is equal to the product measure⊗n

i=1 λon L(Rn). It is the unique measure on L(Rn) that agrees with λn0 on En. It is called Lebesguemeasure on L(Rn). It is denoted by λn.

(h) λn,∗|B(Rn) is a measure on B(Rn). It is the unique measure on B(Rn) that agrees with λn0 onEn. It is called Lebesgue measure on B(Rn). It is also denoted by λn.

Proof.

(a),(b),(c): Immediate from Theorem 30.4.

(d): Rn =⋃∞k=1(−k, k]n and λn0 (

⋃∞k=1(−k, k]n) = (2k)n <∞ for all k ∈ N.

(e): λn,∗ = λn0 on L(R)n is immediate from Theorem 30.4. Combining this with Theorem 22.1(e)gives that λn,∗ assigns any product of intervals its volume.

(f): The containment L(R)n ⊆ L(Rn) comes from Theorem 30.4. A simple modification of theproof of Theorem 3.9 shows that B(Rn) is generated by the collection of all open rectangles

R =

{n∏i=1

(ai, bi) : ai, bi ∈ R, ai ≤ bi

}.

95

Page 97: Lecture Notes - MATH 231A - Real Analysis€¦ · applications. For example, Lebesgue’s theory of measure and integration forms the rigorous foun-dation for modern probability theory,

Since R ⊆ L(R)n ⊆ L(Rn), and since L(Rn) is a σ-algebra it follows that B(Rn) ⊆ L(Rn).

(g): Everything except the uniqueness assertion follows immediately from Theorem 30.4. Now weprove uniqueness. By theorem 22.1(a), E is a demi-ring on R. By Lemma 30.2, En is a demi-ringon Rn. Note that En ⊆ L(R)n ⊆ L(Rn). So, by (e), λn0 = λn,∗ = λn on En. Since λn is a measure,it is finitely additive and countably monotone on En. But since λn0 = λn on En, we have that λn0 isfinitely additive and countably monotone on En. Therefore the desired uniqueness assertion followsfrom (d) and Theorem 21.1(c).

(h): Since B(Rn) ⊆ L(Rn) and since λ∗|L(Rn) is a measure on L(Rn), λ∗|B(Rn) is a measure onB(Rn). Since En ⊆ B(Rn), the uniqueness assertion can be proved just like in (g).

Since (R,L(R), λ) is complete and σ-finite, Theorem 30.5 implies the following.

Theorem 31.2. (Fubini and Tonelli Theorems on (Rn,L(Rn), λn)). Let f : Rn → R be L(Rn)-measurable. Assume at least one of the following two conditions holds:

(i) f ≥ 0

(ii) f is integrable with respect to λn.

Then, for any permutation p : {1, . . . , n} → {1, . . . , n},∫· · ·∫f(x1, . . . , xn)dλ(x1) · · · dλ(xn) =

∫f(x1, . . . , xn)dλn(x1, . . . , xn)

=

∫· · ·∫f(x1, . . . , xn)dλ(xp(1)) · · · dλ(xp(n)).

96