
Jan van Neerven

Measure, Integral, and Conditional Expectation

Handout

May 2, 2012


1

Measure and Integral

In this lecture we review elements of the Lebesgue theory of integration.

1.1 σ-Algebras

Let Ω be a set. A σ-algebra in Ω is a collection F of subsets of Ω with the following properties:

(1) Ω ∈ F;
(2) A ∈ F implies A^c ∈ F;
(3) A_1, A_2, . . . ∈ F implies ⋃_{n=1}^∞ A_n ∈ F.

Here, A^c = Ω \ A is the complement of A.

A measurable space is a pair (Ω, F), where Ω is a set and F is a σ-algebra in Ω.

The above three properties express that F is non-empty, closed under taking complements, and closed under taking countable unions. From

⋂_{n=1}^∞ A_n = ( ⋃_{n=1}^∞ A_n^c )^c

it follows that F is closed under taking countable intersections. Clearly, F is closed under finite unions and intersections as well.

Example 1.1. In every set Ω there is a minimal σ-algebra, {∅, Ω}, and a maximal σ-algebra, the set P(Ω) of all subsets of Ω.

Example 1.2. If A is any collection of subsets of Ω, then there is a smallest σ-algebra, denoted σ(A), containing A. This σ-algebra is obtained as the intersection of all σ-algebras containing A and is called the σ-algebra generated by A.


Example 1.3. Let (Ω_1, F_1), (Ω_2, F_2), . . . be a sequence of measurable spaces. The product

( ∏_{n=1}^∞ Ω_n, ∏_{n=1}^∞ F_n )

is defined by ∏_{n=1}^∞ Ω_n = Ω_1 × Ω_2 × . . . (the cartesian product), and ∏_{n=1}^∞ F_n = F_1 × F_2 × . . . is defined as the σ-algebra generated by all sets of the form

A_1 × · · · × A_N × Ω_{N+1} × Ω_{N+2} × . . .

with N = 1, 2, . . . and A_n ∈ F_n for n = 1, . . . , N. Finite products ∏_{n=1}^N (Ω_n, F_n) are defined similarly; here one takes the σ-algebra in Ω_1 × · · · × Ω_N generated by all sets of the form A_1 × · · · × A_N with A_n ∈ F_n for n = 1, . . . , N.

Example 1.4. The Borel σ-algebra of R^d, notation B(R^d), is the σ-algebra generated by all open sets in R^d. The sets in this σ-algebra are called the Borel sets of R^d. As an exercise, check that

(R^d, B(R^d)) = ∏_{n=1}^d (R, B(R)).

Hint: Every open set in R^d is a countable union of rectangles of the form [a_1, b_1) × · · · × [a_d, b_d).

Example 1.5. Let f : Ω → R be any function. For Borel sets B ∈ B(R) we define

{f ∈ B} := {ω ∈ Ω : f(ω) ∈ B}.

The collection

σ(f) = { {f ∈ B} : B ∈ B(R) }

is a σ-algebra in Ω, the σ-algebra generated by f.
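For instance, if f = 1_A is the indicator of a set A ⊆ Ω, then σ(f) = {∅, A, A^c, Ω}: depending on whether a Borel set B contains 0 and/or 1, the set {f ∈ B} is ∅, A, A^c, or Ω. In the notation of Example 1.2 this says σ(1_A) = σ({A}), the smallest σ-algebra containing A.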

1.2 Measurable functions

Let (Ω_1, F_1) and (Ω_2, F_2) be measurable spaces. A function f : Ω_1 → Ω_2 is said to be measurable if {f ∈ A} ∈ F_1 for all A ∈ F_2. Clearly, compositions of measurable functions are measurable.

Proposition 1.6. If A_2 is a subset of F_2 with the property that σ(A_2) = F_2, then a function f : Ω_1 → Ω_2 is measurable if and only if

{f ∈ A} ∈ F_1 for all A ∈ A_2.

Proof. Just notice that {A ∈ F_2 : {f ∈ A} ∈ F_1} is a sub-σ-algebra of F_2 containing A_2. □


In most applications we are concerned with functions f from a measurable space (Ω, F) to (R, B(R)). Such functions are said to be Borel measurable. In what follows we summarise some measurability properties of Borel measurable functions (the adjective 'Borel' will be omitted when no confusion is likely to arise).

Proposition 1.6 implies that a function f : Ω → R is Borel measurable if and only if {f > a} ∈ F for all a ∈ R. From this it follows that linear combinations, products, and quotients (if defined) of measurable functions are measurable. For example, if f and g are measurable, then f + g is measurable since

{f + g > a} = ⋃_{q∈Q} {f > q} ∩ {g > a − q}.

This, in turn, implies that fg is measurable, as we have

fg = ½ [ (f + g)^2 − (f^2 + g^2) ].

If f = sup_{n≥1} f_n pointwise and each f_n is measurable, then f is measurable since

{f > a} = ⋃_{n=1}^∞ {f_n > a}.

It follows from inf_{n≥1} f_n = − sup_{n≥1} (−f_n) that the pointwise infimum of a sequence of measurable functions is measurable as well. From this we get that the pointwise limes superior and limes inferior of measurable functions are measurable, since

lim sup_{n→∞} f_n = lim_{n→∞} ( sup_{k≥n} f_k ) = inf_{n≥1} ( sup_{k≥n} f_k )

and lim inf_{n→∞} f_n = − lim sup_{n→∞} (−f_n). In particular, the pointwise limit lim_{n→∞} f_n of a sequence of measurable functions is measurable.

In these results, it is implicitly understood that the suprema and infima exist and are finite pointwise. This restriction can be lifted by considering functions f : Ω → [−∞,∞]. Such a function is said to be measurable if the sets {f ∈ B}, B ∈ B(R), as well as the sets {f = ±∞} are in F. This extension will also be useful later on.

1.3 Measures

Let (Ω, F) be a measurable space. A measure is a mapping µ : F → [0,∞] with the following properties:

(1) µ(∅) = 0;
(2) For all disjoint sets A_1, A_2, . . . in F we have

µ( ⋃_{n=1}^∞ A_n ) = ∑_{n=1}^∞ µ(A_n).


A triple (Ω, F, µ), with µ a measure on a measurable space (Ω, F), is called a measure space. A measure space (Ω, F, µ) is called finite if µ is a finite measure, that is, if µ(Ω) < ∞. If µ(Ω) = 1, then µ is called a probability measure and (Ω, F, µ) is called a probability space. In probability theory, it is customary to use the symbol P for a probability measure.

The following properties of measures are easily checked:

(a) if A_1 ⊆ A_2 in F, then µ(A_1) ≤ µ(A_2);
(b) if A_1, A_2, . . . in F, then

µ( ⋃_{n=1}^∞ A_n ) ≤ ∑_{n=1}^∞ µ(A_n);

Hint: Consider the disjoint sets A_{n+1} \ (A_1 ∪ · · · ∪ A_n).
(c) if A_1 ⊆ A_2 ⊆ . . . in F, then

µ( ⋃_{n=1}^∞ A_n ) = lim_{n→∞} µ(A_n);

Hint: Consider the disjoint sets A_{n+1} \ A_n.
(d) if A_1 ⊇ A_2 ⊇ . . . in F and µ(A_1) < ∞, then

µ( ⋂_{n=1}^∞ A_n ) = lim_{n→∞} µ(A_n).

Hint: Take complements.

Note that in (c) and (d) the limits (in [0,∞]) exist by monotonicity.

Example 1.7. Let (Ω, F) and (Ω′, F′) be measurable spaces and let µ be a measure on (Ω, F). If f : Ω → Ω′ is measurable, then

f_*(µ)(A) := µ{f ∈ A}, A ∈ F′,

defines a measure f_*(µ) on (Ω′, F′). This measure is called the image of µ under f. In the context where (Ω, F, P) is a probability space, the image of P under f is called the distribution of f.

Concerning existence and uniqueness of measures we have the following two fundamental results.

Theorem 1.8 (Carathéodory). Let A be a non-empty collection of subsets of a set Ω that is closed under taking complements and finite unions. If µ : A → [0,∞] is a mapping such that properties 1.3(1) and 1.3(2) hold, then there exists a unique measure µ̃ : σ(A) → [0,∞] such that

µ̃|_A = µ.


The proof is lengthy and rather tedious. The typical situation is that measures are given, and therefore the proof is omitted. The uniqueness part is a consequence of the next result, which is useful in many other situations as well.

Theorem 1.9 (Dynkin). Let µ_1 and µ_2 be two finite measures defined on a measurable space (Ω, F). Let A ⊆ F be a collection of sets with the following properties:

(1) A is closed under finite intersections;
(2) σ(A), the σ-algebra generated by A, equals F.

If µ_1(A) = µ_2(A) for all A ∈ A, then µ_1 = µ_2.

Proof. Let D denote the collection of all sets D ∈ F with µ_1(D) = µ_2(D). Then A ⊆ D and D is a Dynkin system, that is,

• Ω ∈ D;
• if D_1 ⊆ D_2 with D_1, D_2 ∈ D, then also D_2 \ D_1 ∈ D;
• if D_1 ⊆ D_2 ⊆ . . . with all D_n ∈ D, then also ⋃_{n=1}^∞ D_n ∈ D.

The finiteness of µ_1 and µ_2 is used to prove the second property.

By assumption we have D ⊆ F = σ(A); we will show that σ(A) ⊆ D. To this end let D_0 denote the smallest Dynkin system in F containing A. We will show that σ(A) ⊆ D_0. In view of D_0 ⊆ D, this will prove the lemma.

Let C = {D ∈ D_0 : D ∩ A ∈ D_0 for all A ∈ A}. Then C is a Dynkin system and A ⊆ C since A is closed under taking finite intersections. It follows that D_0 ⊆ C, since D_0 is the smallest Dynkin system containing A. But obviously, C ⊆ D_0, and therefore C = D_0.

Now let C′ = {D ∈ D_0 : D ∩ D′ ∈ D_0 for all D′ ∈ D_0}. Then C′ is a Dynkin system, and the fact that C = D_0 implies that A ⊆ C′. Hence D_0 ⊆ C′, since D_0 is the smallest Dynkin system containing A. But obviously, C′ ⊆ D_0, and therefore C′ = D_0.

It follows that D_0 is closed under taking finite intersections. But a Dynkin system with this property is a σ-algebra. Thus, D_0 is a σ-algebra, and now A ⊆ D_0 implies that also σ(A) ⊆ D_0. □

Example 1.10 (Lebesgue measure). In R^d, let A be the collection of all finite unions of rectangles [a_1, b_1) × · · · × [a_d, b_d). Define µ : A → [0,∞] by

µ([a_1, b_1) × · · · × [a_d, b_d)) := ∏_{n=1}^d (b_n − a_n)

and extend this definition to finite disjoint unions by additivity. It can be verified that A and µ satisfy the assumptions of Carathéodory's theorem. The resulting measure µ on σ(A) = B(R^d), the Borel σ-algebra of R^d, is called the Lebesgue measure.
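As a quick illustration of how the properties of Section 1.3 combine with this construction: every singleton {x} = {(x_1, . . . , x_d)} in R^d is contained in [x_1, x_1 + ε) × · · · × [x_d, x_d + ε) for each ε > 0, so monotonicity gives µ({x}) ≤ ε^d and hence µ({x}) = 0; by countable subadditivity every countable set, for example Q^d, then has Lebesgue measure zero.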


Example 1.11. Let (Ω_1, F_1, P_1), (Ω_2, F_2, P_2), . . . be a sequence of probability spaces. There exists a unique probability measure P = ∏_{n=1}^∞ P_n on F = ∏_{n=1}^∞ F_n which satisfies

P(A_1 × · · · × A_N × Ω_{N+1} × Ω_{N+2} × . . . ) = ∏_{n=1}^N P_n(A_n)

for all N ≥ 1 and A_n ∈ F_n for n = 1, . . . , N. Finite products of arbitrary measures µ_1, . . . , µ_N can be defined similarly.

For example, the Lebesgue measure on R^d is the product measure of d copies of the Lebesgue measure on R.

As an illustration of the use of Dynkin's lemma we conclude this section with an elementary result on independence. In probability theory, a measurable function f : Ω → R, where (Ω, F, P) is a probability space, is called a random variable. Random variables f_1, . . . , f_N : Ω → R are said to be independent if for all B_1, . . . , B_N ∈ B(R) we have

P{f_1 ∈ B_1, . . . , f_N ∈ B_N} = ∏_{n=1}^N P{f_n ∈ B_n}.

Proposition 1.12. The random variables f_1, . . . , f_N : Ω → R are independent if and only if ν = ∏_{n=1}^N ν_n, where ν is the distribution of (f_1, . . . , f_N) and ν_n is the distribution of f_n, n = 1, . . . , N.

Proof. By definition, f_1, . . . , f_N are independent if and only if ν(B) = ∏_{n=1}^N ν_n(B_n) for all B = B_1 × · · · × B_N with B_n ∈ B(R) for all n = 1, . . . , N. Since these sets generate B(R^N), by Dynkin's lemma the latter implies ν = ∏_{n=1}^N ν_n. □

1.4 The Lebesgue integral

Let (Ω, F) be a measurable space. A simple function is a function f : Ω → R that can be represented in the form

f = ∑_{n=1}^N c_n 1_{A_n}

with coefficients c_n ∈ R and disjoint sets A_n ∈ F for all n = 1, . . . , N.

Proposition 1.13. A function f : Ω → [−∞,∞] is measurable if and only if it is the pointwise limit of a sequence of simple functions. If in addition f is non-negative, we may arrange that 0 ≤ f_n ↑ f pointwise.

Proof. Take, for instance, f_n = ∑_{m=−2^{2n}}^{2^{2n}} (m/2^n) 1_{f ∈ [m/2^n, (m+1)/2^n)}. □
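For instance, if Ω = [0, 1) with its Borel σ-algebra and f(ω) = ω, this construction gives f_n(ω) = m/2^n for ω ∈ [m/2^n, (m+1)/2^n), 0 ≤ m < 2^n, so that 0 ≤ f_n ↑ f pointwise and even sup_ω |f(ω) − f_n(ω)| ≤ 2^{−n}.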


1.4.1 Integration of simple functions

Let (Ω, F, µ) be a measure space. For a non-negative simple function f = ∑_{n=1}^N c_n 1_{A_n} we define

∫_Ω f dµ := ∑_{n=1}^N c_n µ(A_n).

We allow µ(A_n) to be infinite; this causes no problems because the coefficients c_n are non-negative (we use the convention 0 · ∞ = 0). It is easy to check that this integral is well defined, in the sense that it does not depend on the particular representation of f as a simple function. Also, the integral is linear with respect to addition and multiplication with non-negative scalars,

∫_Ω a f_1 + b f_2 dµ = a ∫_Ω f_1 dµ + b ∫_Ω f_2 dµ,

and monotone in the sense that if 0 ≤ f_1 ≤ f_2 pointwise, then

∫_Ω f_1 dµ ≤ ∫_Ω f_2 dµ.

1.4.2 Integration of non-negative functions

In what follows, a non-negative function is a function with values in [0,∞]. For a non-negative measurable function f we choose a sequence of simple functions 0 ≤ f_n ↑ f and define

∫_Ω f dµ := lim_{n→∞} ∫_Ω f_n dµ.

The following lemma implies that this definition does not depend on the approximating sequence.

Lemma 1.14. For a non-negative measurable function f and non-negative simple functions f_n and g such that 0 ≤ f_n ↑ f and g ≤ f pointwise we have

∫_Ω g dµ ≤ lim_{n→∞} ∫_Ω f_n dµ.

Proof. First consider the case g = 1_A. Fix ε > 0 arbitrary and let A_n = {1_A f_n > 1 − ε}. Then A_1 ⊆ A_2 ⊆ . . . ↑ A and therefore µ(A_n) ↑ µ(A). Since 1_A f_n ≥ (1 − ε) 1_{A_n},

lim_{n→∞} ∫_Ω 1_A f_n dµ ≥ (1 − ε) lim_{n→∞} µ(A_n) = (1 − ε) µ(A) = (1 − ε) ∫_Ω g dµ.

Since ε > 0 was arbitrary, this proves the lemma for g = 1_A. The general case follows by linearity. □


The integral is linear and monotone on the set of non-negative measurable functions. Indeed, if f and g are such functions and 0 ≤ f_n ↑ f and 0 ≤ g_n ↑ g, then for a, b ≥ 0 we have 0 ≤ a f_n + b g_n ↑ a f + b g and therefore

∫_Ω a f + b g dµ = lim_{n→∞} ∫_Ω a f_n + b g_n dµ = a lim_{n→∞} ∫_Ω f_n dµ + b lim_{n→∞} ∫_Ω g_n dµ = a ∫_Ω f dµ + b ∫_Ω g dµ.

If such f and g satisfy f ≤ g pointwise, then by noting that 0 ≤ f_n ≤ max{f_n, g_n} ↑ g we see that

∫_Ω f dµ = lim_{n→∞} ∫_Ω f_n dµ ≤ lim_{n→∞} ∫_Ω max{f_n, g_n} dµ ≤ ∫_Ω g dµ.

The next theorem is the cornerstone of Integration Theory.

Theorem 1.15 (Monotone Convergence Theorem). Let 0 ≤ f_1 ≤ f_2 ≤ . . . be a sequence of non-negative measurable functions converging pointwise to a function f. Then,

lim_{n→∞} ∫_Ω f_n dµ = ∫_Ω f dµ.

Proof. First note that f is non-negative and measurable. For each n ≥ 1 choose a sequence of simple functions 0 ≤ f_{nk} ↑_k f_n. Set

g_{nk} := max{f_{1k}, . . . , f_{nk}}.

For m ≤ n we have g_{mk} ≤ g_{nk}. Also, for k ≤ l we have f_{mk} ≤ f_{ml}, m = 1, . . . , n, and therefore g_{nk} ≤ g_{nl}. We conclude that the functions g_{nk} are monotone in both indices.

From f_{mk} ≤ f_m ≤ f_n, 1 ≤ m ≤ n, we see that f_{nk} ≤ g_{nk} ≤ f_n, and we conclude that 0 ≤ g_{nk} ↑_k f_n. From

f_n = lim_{k→∞} g_{nk} ≤ lim_{k→∞} g_{kk} ≤ f

we deduce that 0 ≤ g_{kk} ↑ f. Recalling that g_{kk} ≤ f_k it follows that

∫_Ω f dµ = lim_{k→∞} ∫_Ω g_{kk} dµ ≤ lim_{k→∞} ∫_Ω f_k dµ ≤ ∫_Ω f dµ. □

Example 1.16. In the situation of Example 1.7 we have the following substitution formula. For any measurable f : Ω → Ω′ and non-negative measurable φ : Ω′ → R,

∫_Ω φ ∘ f dµ = ∫_{Ω′} φ d f_*(µ).

To prove this, note that this is trivial for simple functions φ = 1_A with A ∈ F′. By linearity, the identity extends to non-negative simple functions φ, and by monotone convergence (using Proposition 1.13) to non-negative measurable functions φ.

From the monotone convergence theorem we deduce the following useful corollary.

Theorem 1.17 (Fatou Lemma). Let (f_n)_{n=1}^∞ be a sequence of non-negative measurable functions on (Ω, F, µ). Then

∫_Ω lim inf_{n→∞} f_n dµ ≤ lim inf_{n→∞} ∫_Ω f_n dµ.

Proof. From inf_{k≥n} f_k ≤ f_m, m ≥ n, we infer

∫_Ω inf_{k≥n} f_k dµ ≤ inf_{m≥n} ∫_Ω f_m dµ.

Hence, by the monotone convergence theorem,

∫_Ω lim inf_{n→∞} f_n dµ = ∫_Ω lim_{n→∞} inf_{k≥n} f_k dµ = lim_{n→∞} ∫_Ω inf_{k≥n} f_k dµ ≤ lim_{n→∞} inf_{m≥n} ∫_Ω f_m dµ = lim inf_{n→∞} ∫_Ω f_n dµ. □
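The inequality may be strict. On R with the Lebesgue measure, the functions f_n = 1_{[n, n+1)} satisfy lim inf_{n→∞} f_n = 0 pointwise, while ∫_R f_n dµ = 1 for every n, so the left-hand side is 0 and the right-hand side is 1.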

1.4.3 Integration of real-valued functions

A measurable function f : Ω → [−∞,∞] is called integrable if

∫_Ω |f| dµ < ∞.

Clearly, if f and g are measurable and |g| ≤ |f| pointwise, then g is integrable if f is integrable. In particular, if f is integrable, then the non-negative functions f^+ and f^− are integrable, and we define

∫_Ω f dµ := ∫_Ω f^+ dµ − ∫_Ω f^− dµ.

For a set A ∈ F we write

∫_A f dµ := ∫_Ω 1_A f dµ,

noting that 1_A f is integrable. The monotonicity and additivity properties of this integral carry over to this more general situation, provided we assume that the functions we integrate are integrable.

The next result is among the most useful in analysis.


Theorem 1.18 (Dominated Convergence Theorem). Let (f_n)_{n=1}^∞ be a sequence of integrable functions such that lim_{n→∞} f_n = f pointwise. If there exists an integrable function g such that |f_n| ≤ |g| for all n ≥ 1, then

lim_{n→∞} ∫_Ω |f_n − f| dµ = 0.

In particular,

lim_{n→∞} ∫_Ω f_n dµ = ∫_Ω f dµ.

Proof. We make the preliminary observation that if (h_n)_{n=1}^∞ is a sequence of non-negative measurable functions such that lim_{n→∞} h_n = 0 pointwise and h is a non-negative integrable function such that h_n ≤ h for all n ≥ 1, then by the Fatou lemma

∫_Ω h dµ = ∫_Ω lim inf_{n→∞} (h − h_n) dµ ≤ lim inf_{n→∞} ∫_Ω h − h_n dµ = ∫_Ω h dµ − lim sup_{n→∞} ∫_Ω h_n dµ.

Since ∫_Ω h dµ is finite, it follows that 0 ≤ lim sup_{n→∞} ∫_Ω h_n dµ ≤ 0 and therefore

lim_{n→∞} ∫_Ω h_n dµ = 0.

The theorem follows by applying this to h_n = |f_n − f| and h = 2|g|. □
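The domination assumption cannot be omitted. On (0, 1) with the Lebesgue measure, the functions f_n = n 1_{(0, 1/n)} converge to 0 pointwise, yet ∫ f_n dµ = 1 for every n; there is no integrable g with |f_n| ≤ |g| for all n, since such a g would dominate sup_n f_n, which behaves like 1/x near 0 and is not integrable.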

Let (Ω, F, P) be a probability space. The integral of an integrable random variable f : Ω → R is called the expectation of f, notation

Ef = ∫_Ω f dP.

Proposition 1.19. If f : Ω → R and g : Ω → R are independent integrable random variables, then fg : Ω → R is integrable and

Efg = Ef Eg.

Proof. To prove this we first observe that for any non-negative Borel measurable φ : R → R, the random variables φ ∘ f and φ ∘ g are independent. Taking φ(x) = x^+ and φ(x) = x^−, it follows that we may consider the positive and negative parts of f and g separately. In doing so, we may assume that f and g are non-negative.

Denoting the distributions of f, g, and (f, g) by ν_f, ν_g, and ν_{(f,g)} respectively, by the substitution formula (with φ(x, y) = xy 1_{[0,∞)^2}(x, y)) and the fact that ν_{(f,g)} = ν_f × ν_g we obtain

E(fg) = ∫_{[0,∞)^2} xy dν_{(f,g)}(x, y) = ∫_{[0,∞)^2} xy d(ν_f × ν_g)(x, y) = ∫_{[0,∞)} ∫_{[0,∞)} xy dν_f(x) dν_g(y) = Ef Eg.

This proves the integrability of fg along with the desired identity. □

1.4.4 The role of null sets

We begin with a simple observation.

Proposition 1.20. Let f be a non-negative measurable function.

1. If ∫_Ω f dµ < ∞, then µ{f = ∞} = 0.
2. If ∫_Ω f dµ = 0, then µ{f ≠ 0} = 0.

Proof. For all c > 0 we have 0 ≤ c 1_{f=∞} ≤ f and therefore

0 ≤ c µ{f = ∞} ≤ ∫_Ω f dµ.

The first result follows from this by letting c → ∞. For the second, note that for all n ≥ 1 we have (1/n) 1_{f ≥ 1/n} ≤ f and therefore

(1/n) µ{f ≥ 1/n} ≤ ∫_Ω f dµ = 0.

It follows that µ{f ≥ 1/n} = 0. Now note that {f > 0} = ⋃_{n=1}^∞ {f ≥ 1/n}. □

Proposition 1.21. If f is integrable and µ{f ≠ 0} = 0, then ∫_Ω f dµ = 0.

Proof. By considering f^+ and f^− separately we may assume f is non-negative. Choose simple functions 0 ≤ f_n ↑ f. Then µ{f_n > 0} ≤ µ{f > 0} = 0 and therefore ∫_Ω f_n dµ = 0 for all n ≥ 1. The result follows from this. □

What these results show is that in everything that has been said so far, we may replace pointwise convergence by pointwise convergence µ-almost everywhere, where the latter means that we allow an exceptional set of µ-measure zero. This applies, in particular, to the monotonicity assumption in the monotone convergence theorem and the convergence and domination assumptions in the dominated convergence theorem.

1.5 Lp-spaces

Let µ be a measure on (Ω, F) and fix 1 ≤ p < ∞. We define ℒ^p(µ) as the set of all measurable functions f : Ω → R such that

∫_Ω |f|^p dµ < ∞.

For such functions we define

‖f‖_p := ( ∫_Ω |f|^p dµ )^{1/p}.

The next result shows that ℒ^p(µ) is a vector space:

Proposition 1.22 (Minkowski's inequality). For all f, g ∈ ℒ^p(µ) we have f + g ∈ ℒ^p(µ) and

‖f + g‖_p ≤ ‖f‖_p + ‖g‖_p.

Proof. By elementary calculus it is checked that for all non-negative real numbers a and b one has

(a + b)^p = inf_{t∈(0,1)} { t^{1−p} a^p + (1 − t)^{1−p} b^p }.

Applying this identity to |f(x)| and |g(x)| and integrating with respect to µ, for all fixed t ∈ (0, 1) we obtain

∫_Ω |f(x) + g(x)|^p dµ(x) ≤ ∫_Ω (|f(x)| + |g(x)|)^p dµ(x) ≤ t^{1−p} ∫_Ω |f(x)|^p dµ(x) + (1 − t)^{1−p} ∫_Ω |g(x)|^p dµ(x).

Stated differently, this says that

‖f + g‖_p^p ≤ t^{1−p} ‖f‖_p^p + (1 − t)^{1−p} ‖g‖_p^p.

Taking the infimum over all t ∈ (0, 1) gives the result. □
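One way to check the identity used at the start of the proof: for a, b > 0 and t ∈ (0, 1), the convexity of s ↦ s^p gives

(a + b)^p = ( t · (a/t) + (1 − t) · (b/(1 − t)) )^p ≤ t (a/t)^p + (1 − t) (b/(1 − t))^p = t^{1−p} a^p + (1 − t)^{1−p} b^p,

and the choice t = a/(a + b) ∈ (0, 1) turns the right-hand side into a(a + b)^{p−1} + b(a + b)^{p−1} = (a + b)^p, so the infimum is attained and equals (a + b)^p. The boundary cases a = 0 or b = 0 follow by letting t → 0 or t → 1.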

In spite of this result, ‖ · ‖_p is not a norm on ℒ^p(µ), because ‖f‖_p = 0 only implies that f = 0 µ-almost everywhere. In order to get around this imperfection, we define an equivalence relation ∼ on ℒ^p(µ) by

f ∼ g ⇔ f = g µ-almost everywhere.

The equivalence class of a function f modulo ∼ is denoted by [f]. On the quotient space

L^p(µ) := ℒ^p(µ)/∼

we define a scalar multiplication and addition in the natural way:

c[f] := [cf], [f] + [g] := [f + g].

We leave it as an exercise to check that both operations are well defined. With these operations, L^p(µ) is a normed vector space with respect to the norm

‖[f]‖_p := ‖f‖_p.

Following common practice we shall make no distinction between a function in ℒ^p(µ) and its equivalence class in L^p(µ).

The next result states that L^p(µ) is a Banach space (which, by definition, is a complete normed vector space):

Theorem 1.23. The normed space L^p(µ) is complete.

Proof. Let (f_n)_{n=1}^∞ be a Cauchy sequence with respect to the norm ‖ · ‖_p of L^p(µ). By passing to a subsequence we may assume that

‖f_{n+1} − f_n‖_p ≤ 1/2^n, n = 1, 2, . . .

Define the non-negative measurable functions

g_N := ∑_{n=0}^{N−1} |f_{n+1} − f_n|, g := ∑_{n=0}^∞ |f_{n+1} − f_n|,

with the convention that f_0 = 0. By the monotone convergence theorem,

∫_Ω g^p dµ = lim_{N→∞} ∫_Ω g_N^p dµ.

Taking p-th roots and using Minkowski's inequality we obtain

‖g‖_p = lim_{N→∞} ‖g_N‖_p ≤ lim_{N→∞} ∑_{n=0}^{N−1} ‖f_{n+1} − f_n‖_p = ∑_{n=0}^∞ ‖f_{n+1} − f_n‖_p ≤ 1 + ‖f_1‖_p.

It follows that g is finitely valued µ-almost everywhere, which means that the sum defining g converges absolutely µ-almost everywhere. As a result, the sum

f := ∑_{n=0}^∞ (f_{n+1} − f_n)

converges on the set {g < ∞}. On this set we have

f = lim_{N→∞} ∑_{n=0}^{N−1} (f_{n+1} − f_n) = lim_{N→∞} f_N.

Defining f to be zero on the null set {g = ∞}, the resulting function f is measurable. From

|f_N|^p = | ∑_{n=0}^{N−1} (f_{n+1} − f_n) |^p ≤ ( ∑_{n=0}^{N−1} |f_{n+1} − f_n| )^p ≤ |g|^p

and the dominated convergence theorem we conclude that

lim_{N→∞} ‖f − f_N‖_p = 0.

We have proved that a subsequence of the original Cauchy sequence converges to f in L^p(µ). As is easily verified, this implies that the original Cauchy sequence converges to f as well. □

In the course of the proof we obtained the following result:

Corollary 1.24. Every convergent sequence in L^p(µ) has a µ-almost everywhere convergent subsequence.

For p = ∞ the above definitions have to be modified a little. We define ℒ^∞(µ) to be the vector space of all measurable functions f : Ω → R such that

sup_{ω∈Ω} |f(ω)| < ∞

and define, for such functions,

‖f‖_∞ := sup_{ω∈Ω} |f(ω)|.

The space

L^∞(µ) := ℒ^∞(µ)/∼

is again a normed vector space in a natural way. The simple proof that L^∞(µ) is a Banach space is left as an exercise.

We continue with an approximation result, the proof of which is an easy application of Proposition 1.13 together with the dominated convergence theorem:

Proposition 1.25. For 1 ≤ p ≤ ∞, the simple functions in L^p(µ) are dense in L^p(µ).

We conclude with an important inequality, known as Hölder's inequality. For p = q = 2, it is known as the Cauchy–Schwarz inequality.

Theorem 1.26. Let 1 ≤ p, q ≤ ∞ satisfy 1/p + 1/q = 1. If f ∈ L^p(µ) and g ∈ L^q(µ), then fg ∈ L^1(µ) and

‖fg‖_1 ≤ ‖f‖_p ‖g‖_q.

Proof. For p = 1, q = ∞ and for p = ∞, q = 1, this follows by a direct estimate. Thus we may assume from now on that 1 < p, q < ∞. The inequality is then proved in the same way as Minkowski's inequality, this time using the identity

ab = inf_{t>0} ( t^p a^p / p + b^q / (q t^q) ). □
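The identity used here can be checked by calculus, as in the Minkowski case. For a, b > 0 the function φ(t) = t^p a^p / p + b^q / (q t^q) is convex on (0, ∞), and φ′(t) = t^{p−1} a^p − b^q t^{−q−1} vanishes exactly when t^{p+q} = b^q / a^p. Using 1/p + 1/q = 1, that is p + q = pq, one computes t^p a^p = (ab)^{pq/(p+q)} = ab and b^q / t^q = ab at this point, so the minimum value is ab/p + ab/q = ab. For a = 0 or b = 0 the infimum is 0 = ab as well.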


1.6 Fubini’s theorem

A measure space (Ω, F, µ) is called σ-finite if we can write Ω = ⋃_{n=1}^∞ Ω^{(n)} with sets Ω^{(n)} ∈ F satisfying µ(Ω^{(n)}) < ∞.

Theorem 1.27 (Fubini). Let (Ω_1, F_1, µ_1) and (Ω_2, F_2, µ_2) be σ-finite measure spaces. For any non-negative measurable function f : Ω_1 × Ω_2 → R the following assertions hold:

(1) The non-negative function ω_1 ↦ ∫_{Ω_2} f(ω_1, ω_2) dµ_2(ω_2) is measurable;
(2) The non-negative function ω_2 ↦ ∫_{Ω_1} f(ω_1, ω_2) dµ_1(ω_1) is measurable;
(3) We have

∫_{Ω_1×Ω_2} f d(µ_1 × µ_2) = ∫_{Ω_2} ∫_{Ω_1} f dµ_1 dµ_2 = ∫_{Ω_1} ∫_{Ω_2} f dµ_2 dµ_1.

We shall not give a detailed proof, but sketch the main steps. First, for f(ω_1, ω_2) = 1_{A_1}(ω_1) 1_{A_2}(ω_2) with A_1 ∈ F_1, A_2 ∈ F_2, the result is clear. Dynkin's lemma is used to extend the result to functions f = 1_A with A ∈ F = F_1 × F_2. One has to assume first that µ_1 and µ_2 are finite; the σ-finite case then follows by approximation. By taking linear combinations, the result extends to simple functions. The result for arbitrary non-negative measurable functions then follows by monotone convergence.

A variation of the Fubini theorem holds if we replace 'non-negative measurable' by 'integrable'. In that case, for µ_1-almost all ω_1 ∈ Ω_1 the function ω_2 ↦ f(ω_1, ω_2) is integrable and for µ_2-almost all ω_2 ∈ Ω_2 the function ω_1 ↦ f(ω_1, ω_2) is integrable. Moreover, the functions ω_1 ↦ ∫_{Ω_2} f(ω_1, ω_2) dµ_2(ω_2) and ω_2 ↦ ∫_{Ω_1} f(ω_1, ω_2) dµ_1(ω_1) are integrable and the above identities hold.
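The integrability assumption cannot be dropped. On (0, 1) × (0, 1) with the Lebesgue measures, the function f(x, y) = (x² − y²)/(x² + y²)² satisfies

∫_0^1 ( ∫_0^1 f(x, y) dy ) dx = ∫_0^1 dx/(1 + x²) = π/4,

while by the antisymmetry f(x, y) = −f(y, x) the other iterated integral equals −π/4. The two iterated integrals disagree; by the theorem this can only happen because ∫ |f| d(µ_1 × µ_2) = ∞.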

1.7 Notes

The proof of the monotone convergence theorem is taken from Kallenberg [1]. The proofs of the Minkowski inequality and Hölder inequality, although not the ones that are usually presented in introductory courses, belong to the folklore of the subject.


2

Conditional expectations

In this lecture we introduce conditional expectations and study some of their basic properties.

We begin by recalling some terminology. A random variable is a measurable function on a probability space (Ω, F, P). In this context, Ω is often called the sample space and the sets in F the events. For an integrable random variable f : Ω → R we write

Ef := ∫_Ω f dP

for its expectation.

Below we fix a sub-σ-algebra G of F. For 1 ≤ p ≤ ∞ we denote by L^p(Ω, G) the subspace of all f ∈ L^p(Ω) having a G-measurable representative. With this notation, L^p(Ω) = L^p(Ω, F). We will be interested in the problem of finding good approximations of a random variable f ∈ L^p(Ω) by random variables in L^p(Ω, G).

Lemma 2.1. L^p(Ω, G) is a closed subspace of L^p(Ω).

Proof. Suppose that (f_n)_{n=1}^∞ is a sequence in L^p(Ω, G) such that lim_{n→∞} f_n = f in L^p(Ω). We may assume that the f_n are pointwise defined and G-measurable. After passing to a subsequence (when 1 ≤ p < ∞) we may furthermore assume that lim_{n→∞} f_n = f almost surely. The set C of all ω ∈ Ω where the sequence (f_n(ω))_{n=1}^∞ converges is G-measurable. Put f̃ := lim_{n→∞} 1_C f_n, where the limit exists pointwise. The random variable f̃ is G-measurable and agrees almost surely with f. This shows that f defines an element of L^p(Ω, G). □

Our aim is to show that L^p(Ω, G) is the range of a contractive projection in L^p(Ω). For p = 2 this is clear: we have the orthogonal decomposition

L^2(Ω) = L^2(Ω, G) ⊕ L^2(Ω, G)^⊥


and the projection we have in mind is the orthogonal projection, denoted by P_G, onto L^2(Ω, G) along this decomposition. Following common usage we write

E(f|G) := P_G(f), f ∈ L^2(Ω),

and call E(f|G) the conditional expectation of f with respect to G. Let us emphasise that E(f|G) is defined as an element of L^2(Ω, G), that is, as an equivalence class of random variables.

Lemma 2.2. For all f ∈ L^2(Ω) and G ∈ G we have

∫_G E(f|G) dP = ∫_G f dP.

As a consequence, if f ≥ 0 almost surely, then E(f|G) ≥ 0 almost surely.

Proof. By definition we have f − E(f|G) ⊥ L^2(Ω, G). If G ∈ G, then 1_G ∈ L^2(Ω, G) and therefore

∫_Ω 1_G (f − E(f|G)) dP = 0.

This gives the desired identity. For the second assertion, choose a G-measurable representative of g := E(f|G) and apply the identity to the G-measurable set {g < 0}. □

Taking G = Ω we obtain the identity E(E(f|G)) = Ef. This will be used in the next lemma, which asserts that the mapping f ↦ E(f|G) is L^1-bounded.

Lemma 2.3. For all f ∈ L^2(Ω) we have E|E(f|G)| ≤ E|f|.

Proof. It suffices to check that |E(f|G)| ≤ E(|f| | G), since then the lemma follows from E|E(f|G)| ≤ E E(|f| | G) = E|f|. Splitting f into positive and negative parts, almost surely we have

|E(f|G)| = |E(f^+|G) − E(f^−|G)| ≤ |E(f^+|G)| + |E(f^−|G)| = E(f^+|G) + E(f^−|G) = E(|f| | G). □

Since L^2(Ω) is dense in L^1(Ω) this lemma shows that the conditional expectation operator has a unique extension to a contractive linear operator on L^1(Ω), which we also denote by E(·|G). This operator is a projection (this follows by noting that it is a projection in L^2(Ω) and then using that L^2(Ω) is dense in L^1(Ω)) and it is positive in the sense that it maps positive random variables to positive random variables; this follows from Lemma 2.2 by approximation.

Lemma 2.4 (Conditional Jensen inequality). If φ : R → R is convex, then for all f ∈ L^1(Ω) such that φ ∘ f ∈ L^1(Ω) we have, almost surely,

φ(E(f|G)) ≤ E(φ ∘ f | G).


Proof. If a, b ∈ R are such that at + b ≤ φ(t) for all t ∈ R, then the positivity of the conditional expectation operator gives

a E(f|G) + b = E(af + b | G) ≤ E(φ ∘ f | G)

almost surely. Since φ is convex we can find real sequences (a_n)_{n=1}^∞ and (b_n)_{n=1}^∞ such that φ(t) = sup_{n≥1} (a_n t + b_n) for all t ∈ R; we leave the proof of this fact as an exercise. Hence almost surely,

φ(E(f|G)) = sup_{n≥1} ( a_n E(f|G) + b_n ) ≤ E(φ ∘ f | G). □

Theorem 2.5 (L^p-contractivity). For all 1 ≤ p ≤ ∞ the conditional expectation operator extends to a contractive positive projection on L^p(Ω) with range L^p(Ω, G). For f ∈ L^p(Ω), the random variable E(f|G) is the unique element of L^p(Ω, G) with the property that for all G ∈ G,

∫_G E(f|G) dP = ∫_G f dP. (2.1)

Proof. For 1 ≤ p < ∞ the L^p-contractivity follows from Lemma 2.4 applied to the convex function φ(t) = |t|^p. For p = ∞ we argue as follows. If f ∈ L^∞(Ω), then 0 ≤ |f| ≤ ‖f‖_∞ 1_Ω and therefore 0 ≤ E(|f| | G) ≤ ‖f‖_∞ 1_Ω almost surely. Hence, E(|f| | G) ∈ L^∞(Ω) and ‖E(|f| | G)‖_∞ ≤ ‖f‖_∞.

For 2 ≤ p ≤ ∞, (2.1) follows from Lemma 2.2. For f ∈ L^p(Ω) with 1 ≤ p < 2 we choose a sequence (f_n)_{n=1}^∞ in L^2(Ω) such that lim_{n→∞} f_n = f in L^p(Ω). Then lim_{n→∞} E(f_n|G) = E(f|G) in L^p(Ω) and therefore, for any G ∈ G,

∫_G E(f|G) dP = lim_{n→∞} ∫_G E(f_n|G) dP = lim_{n→∞} ∫_G f_n dP = ∫_G f dP.

If g ∈ L^p(Ω, G) satisfies ∫_G g dP = ∫_G f dP for all G ∈ G, then ∫_G g dP = ∫_G E(f|G) dP for all G ∈ G. Since both g and E(f|G) are G-measurable, as in the proof of the second part of Lemma 2.2 this implies that g = E(f|G) almost surely.

In particular, E(E(f|G)|G) = E(f|G) for all f ∈ L^p(Ω) and E(f|G) = f for all f ∈ L^p(Ω, G). This shows that E(·|G) is a projection onto L^p(Ω, G). □

The next two results develop some properties of conditional expectations.

Proposition 2.6.

(1) If f ∈ L^1(Ω) and H is a sub-σ-algebra of G, then almost surely

E(E(f|G)|H) = E(f|H).


(2) If f ∈ L^1(Ω) is independent of G (that is, f is independent of 1_G for all G ∈ G), then almost surely

E(f|G) = Ef.

(3) If f ∈ L^p(Ω) and g ∈ L^q(Ω, G) with 1 ≤ p, q ≤ ∞, 1/p + 1/q = 1, then almost surely

E(gf|G) = g E(f|G).

Proof. (1): For all H ∈ H we have ∫_H E(E(f|G)|H) dP = ∫_H E(f|G) dP = ∫_H f dP by Theorem 2.5, first applied to H and then to G (observe that H ∈ G). Now the result follows from the uniqueness part of the theorem.

(2): For all G ∈ G we have ∫_G Ef dP = E1_G · Ef = E(1_G f) = ∫_G f dP, and the result follows from the uniqueness part of Theorem 2.5.

(3): For all G, G′ ∈ G we have ∫_G 1_{G′} E(f|G) dP = ∫_{G∩G′} E(f|G) dP = ∫_{G∩G′} f dP = ∫_G f 1_{G′} dP. Hence E(f 1_{G′}|G) = 1_{G′} E(f|G) by the uniqueness part of Theorem 2.5. By linearity, this gives the result for simple functions g, and the general case follows by approximation. □

Example 2.7. Let f : (−r, r) → R be integrable and let G be the σ-algebra of all symmetric Borel sets in (−r, r) (we call a set B symmetric if B = −B). Then,

E(f|G) = ½ (f + f̃) almost surely,

where f̃(x) = f(−x). This is verified by checking the condition (2.1).
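A sketch of this check, assuming that the underlying probability measure P is the normalized Lebesgue measure on (−r, r) (any P invariant under x ↦ −x works the same way): the function ½(f + f̃) is unchanged under x ↦ −x, so its level sets are symmetric and it is G-measurable; moreover, for every symmetric Borel set G the substitution formula applied to the reflection x ↦ −x gives ∫_G f̃ dP = ∫_{−G} f dP = ∫_G f dP, hence

∫_G ½ (f + f̃) dP = ½ ( ∫_G f dP + ∫_G f dP ) = ∫_G f dP,

which is exactly condition (2.1).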

Example 2.8. Let f : [0, 1) → R be integrable and let G be the σ-algebra generated by the sets I_n := [(n−1)/N, n/N), n = 1, . . . , N. Then,

E(f|G) = ∑_{n=1}^N m_n 1_{I_n} almost surely,

where

m_n = (1/|I_n|) ∫_{I_n} f(x) dx.

Again this is verified by checking the condition (2.1).
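For instance, take N = 2 and f(x) = x, with P the Lebesgue measure on [0, 1). Then m_1 = 2 ∫_0^{1/2} x dx = 1/4 and m_2 = 2 ∫_{1/2}^1 x dx = 3/4, so E(f|G) equals 1/4 on [0, 1/2) and 3/4 on [1/2, 1): conditioning on G replaces f by its average over each interval of the partition.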

If f and g are random variables on (Ω, F, P), we write E(g|f) := E(g|σ(f)), where σ(f) is the σ-algebra generated by f.

Example 2.9. Let f_1, . . . , f_N be independent integrable random variables satisfying E(f_1) = · · · = E(f_N) = 0, and put S_N := f_1 + · · · + f_N. Then, for 1 ≤ n ≤ N,

E(S_N|f_n) = f_n almost surely.

This follows from the following almost sure identities:

E(S_N|f_n) = E(f_n|f_n) + ∑_{m≠n} E(f_m|f_n) = f_n + ∑_{m≠n} E(f_m) = f_n.


Example 2.10. Let f_1, . . . , f_N be independent and identically distributed integrable random variables and put S_N := f_1 + · · · + f_N. Then

E(f_1|S_N) = · · · = E(f_N|S_N) almost surely

and therefore

E(f_n|S_N) = S_N/N almost surely, n = 1, . . . , N.

It is clear that the second identity follows from the first (summing over n gives ∑_{n=1}^N E(f_n|S_N) = E(S_N|S_N) = S_N). To prove the first, let µ denote the (common) distribution of the random variables f_n. By independence, the distribution of the R^N-valued random variable (f_1, . . . , f_N) equals the N-fold product measure µ^N := µ × · · · × µ. Then, for any Borel set B ∈ B(R) we have, using the substitution formula and Fubini's theorem,

∫_{S_N∈B} E(f_n|S_N) dP = ∫_{S_N∈B} f_n dP = ∫_{x_1+···+x_N∈B} x_n dµ^N(x) = ∫_{R^{N−1}} ∫_{y∈B−z_1−···−z_{N−1}} y dµ(y) dµ^{N−1}(z_1, . . . , z_{N−1}),

which is independent of n.

2.1 Notes

We recommend Williams [2] for an introduction to the subject. The bible for modern probabilists is Kallenberg [1].

The definition of conditional expectations starting from orthogonal projections in L^2(Ω) follows Kallenberg [1]. Many textbooks prefer the shorter (but less elementary and transparent) route via the Radon–Nikodym theorem. In this approach, one notes that if f is a non-negative integrable random variable on (Ω, F, P) and G is a sub-σ-algebra of F, then

Q(G) := ∫_G f dP, G ∈ G,

defines a finite measure Q on (Ω, G) which is absolutely continuous with respect to P|_G. Hence, by the Radon–Nikodym theorem, Q has an integrable density with respect to P|_G. This density equals the conditional expectation E(f|G) almost surely.


References

1. O. Kallenberg, "Foundations of Modern Probability", second ed., Probability and its Applications (New York), Springer-Verlag, New York, 2002.

2. D. Williams, "Probability with Martingales", Cambridge Mathematical Textbooks, Cambridge University Press, Cambridge, 1991.