
CHAPTER II - MARTINGALES AND MEASURE PRESERVING TRANSFORMATIONS

JOSEPH G. CONLON

1. Conditional Probability

Consider a probability space (Ω, F, P) and an F measurable random variable X : Ω → R. Suppose for some x ∈ R that P(X = x) > 0. Then we can define for a set A ∈ F the conditional probability

(1.1) P(A | X = x) = P(A ∩ {X = x}) / P(X = x) .

This gives us a new probability measure on (Ω, F), so we may define expectations with respect to this conditioned probability measure. Thus for F measurable Y : Ω → R we define the conditional expectation E[Y | X = x] by taking the expectation of Y with respect to the measure (1.1). Consider now how to generalize the idea of conditional probability to the case when P(X = x) = 0. We wish to do this in a way which is consistent with Bayes' formula. Thus we want for all bounded Borel measurable functions f : R → R the identity

(1.2) E[ f(X) E[Y|X] ] = E[ f(X) Y ]

to hold provided E[|Y|] < ∞. In particular E[Y|X] is an F(X) measurable function satisfying (1.2) for all bounded Borel measurable functions f : R → R.

To carry this out we need to use the Radon-Nikodym theorem. Thus consider the Borel probability measure µX(·) on R associated with X, and the Borel measure µX,Y(·) on R associated with (X, Y) defined by

(1.3) µX(B) = P(X ∈ B), µX,Y(B) = E[ χB(X) Y ], B ⊂ R,

where χB(·) is the characteristic function of the set B: χB(x) = 1 if x ∈ B, χB(x) = 0 if x ∉ B. Evidently µX,Y(·) is absolutely continuous with respect to the measure µX(·). That is, for any Borel set B ⊂ R, µX(B) = 0 implies µX,Y(B) = 0. Hence by the Radon-Nikodym theorem (applied to the positive and negative parts of Y) there is a Borel measurable function φ(X,Y) : R → R, non-negative when Y ≥ 0, such that E[φ(X,Y)(X)] = E[Y] and (1.2) holds on setting E[Y|X] = φ(X,Y)(X).

We can actually further generalize the definition of conditional expectation. Thus let D ⊂ F be a sub σ-algebra of F. Then E[Y|D] is defined as in (1.2) by

(1.4) E[ Z E[Y|D] ] = E[ Z Y ]

for all bounded D measurable variables Z : Ω → R.
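The defining identity (1.2) can be checked numerically. The sketch below is a Monte Carlo illustration (not part of the notes): it estimates E[Y|X] by simple binning of X, a crude stand-in for the Radon-Nikodym derivative φ(X,Y), and verifies that E[f(X)E[Y|X]] ≈ E[f(X)Y] for a bounded f. The model Y = X² + noise and the choice f = cos are arbitrary assumptions for the demonstration.

```python
import numpy as np

# Monte Carlo sketch of the defining identity (1.2): a binned estimate of
# E[Y|X] (a crude stand-in for the Radon-Nikodym derivative phi_(X,Y))
# should satisfy E[f(X) E[Y|X]] = E[f(X) Y] for bounded Borel f.
rng = np.random.default_rng(0)
n = 200_000
X = rng.normal(size=n)
Y = X**2 + rng.normal(size=n)          # here E[Y | X] = X^2

bins = np.linspace(-4.0, 4.0, 81)      # partition of the range of X
idx = np.clip(np.digitize(X, bins) - 1, 0, len(bins) - 2)
bin_mean = np.array([Y[idx == k].mean() if np.any(idx == k) else 0.0
                     for k in range(len(bins) - 1)])
cond_exp = bin_mean[idx]               # phi(X), the estimated E[Y | X]

f = np.cos(X)                          # a bounded Borel measurable f
lhs = (f * cond_exp).mean()            # E[f(X) E[Y|X]]
rhs = (f * Y).mean()                   # E[f(X) Y]
```

The two expectations agree up to binning bias and Monte Carlo noise, which is exactly what (1.2) asserts.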

Proposition 1.1. Let Yn ≥ 0, n = 1, 2, ..., be an increasing sequence of random variables on (Ω, F, P) with limit Y, so limn→∞ Yn = Y, and assume E[Y] < ∞. Then for any sub σ-field D ⊂ F, one has limn→∞ E[Yn|D] = E[Y|D].


Proof. Let Zn = Y − Yn, so Zn, n = 1, 2, .., is a decreasing non-negative sequence with limn→∞ Zn = 0. It is easy to see that E[Zn|D], n = 1, 2, .., is also a decreasing non-negative sequence, whence there is a non-negative D measurable variable U such that limn→∞ E[Zn|D] = U with probability 1. The dominated convergence theorem then yields

(1.5) E[U] = limn→∞ E[ E[Zn|D] ] = limn→∞ E[Y − Yn] = 0,

whence U = 0 with probability 1.

2. Martingales

Definition 1. Let X1, X2, ..., be a sequence of random variables on a probability space (Ω, F, P). The sequence is said to form a martingale if E[ |Xk| ] < ∞, k = 1, 2, ..., and if for n ≥ 1,

(2.1) E[Xn+1 | X1, X2, ..., Xn] = Xn with probability 1.

Evidently if Y1, Y2, ..., is a sequence of independent variables with E[ |Yk| ] < ∞, k = 1, 2, .., and mean 0, then SN = Y1 + · · · + YN, N = 1, 2, .., is a martingale. The importance of martingales lies in the fact that they give the natural setting for a generalization of Lemma 5.1 of Chapter I, which bounded the cdf of the maximal function MN = sup_{1≤n≤N} Sn for a random walk in terms of the cdf of SN. The key to proving this is the so-called optional sampling theorem.
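The martingale property of a mean-zero random walk is easy to probe by simulation. The sketch below (illustrative only; the ±1 step distribution and the bounded function tanh are arbitrary choices) uses the equivalent form (1.2) of (2.1): the increment S_{n+1} − S_n should be orthogonal to bounded functions of the past.

```python
import numpy as np

# Monte Carlo check that S_N = Y_1 + ... + Y_N is a martingale:
# by (1.2), E[(S_{n+1} - S_n) f(S_1,..,S_n)] should vanish for bounded f.
rng = np.random.default_rng(1)
steps = rng.choice([-1.0, 1.0], size=(100_000, 10))  # i.i.d. mean-zero Y_k
S = steps.cumsum(axis=1)                             # S_1, ..., S_10

n = 5
f = np.tanh(S[:, n - 1])            # bounded function of the path up to time n
increment = S[:, n] - S[:, n - 1]   # S_{n+1} - S_n (0-based column indexing)
corr = (increment * f).mean()       # should be near 0
```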

Definition 2. Let X1, X2, ..., be a sequence of random variables on a probability space (Ω, F, P). A sequence m1, m2, ..., of integer-valued random variables are called sampling variables if

(2.2) 1 ≤ m1 ≤ m2 ≤ · · · , {mk = j} ∈ F(X1, .., Xj), j, k = 1, 2, ....

Then the sequence of random variables X̃n = X_{m_n}, n = 1, 2, ..., is called the sequence derived by optional sampling from the original sequence X1, X2, ....

Theorem 2.1 (Optional Sampling Theorem). Let X1, X2, ..., be a martingale, m1, m2, ..., sampling variables, and X̃1, X̃2, ..., the optionally sampled sequence. Then the sequence X̃1, X̃2, ..., is also a martingale provided

(2.3) E[ |X̃k| ] < ∞, k = 1, 2, ..., lim inf_{N→∞} E[ |X_N| ; m_n > N ] = 0, n = 1, 2, ...

Proof. From (1.2) we need to show that for all bounded Borel measurable functions f : R^n → R,

(2.4) E[ f(X̃1, .., X̃n) X̃n+1 ] = E[ f(X̃1, .., X̃n) X̃n ] ,

which we can rewrite as

(2.5) ∑_{k_1≤k_2≤···≤k_{n+1}} E[ f(X_{k_1}, X_{k_2}, ..., X_{k_n}) X_{k_{n+1}} ; m_1 = k_1, ..., m_{n+1} = k_{n+1} ]

= ∑_{k_1≤k_2≤···≤k_n} E[ f(X_{k_1}, X_{k_2}, ..., X_{k_n}) X_{k_n} ; m_1 = k_1, ..., m_n = k_n ] .

Now (2.5) will follow if we can prove for all fixed k1 ≤ k2 ≤ · · · ≤ kn, that


(2.6) ∑_{k_{n+1}=k_n}^{∞} E[ f(X_{k_1}, X_{k_2}, ..., X_{k_n}) X_{k_{n+1}} ; m_1 = k_1, ..., m_{n+1} = k_{n+1} ]

= E[ f(X_{k_1}, X_{k_2}, ..., X_{k_n}) X_{k_n} ; m_1 = k_1, ..., m_n = k_n ] .

We first show that (2.6) holds provided sup m_{n+1}(·) is finite. To see this let N be such that P(m_{n+1} > N) = 0, P(m_{n+1} = N) > 0, whence the sum in (2.6) is over k_n ≤ k_{n+1} ≤ N. Observe now that the set {m_{n+1} = N} = Ω − {m_{n+1} ≤ N − 1} ∈ F(X1, .., X_{N−1}). Hence

(2.7) E[ f(X_{k_1}, X_{k_2}, ..., X_{k_n}) X_N ; m_1 = k_1, ..., m_{n+1} = N ]

= E[ f(X_{k_1}, X_{k_2}, ..., X_{k_n}) X_{N−1} ; m_1 = k_1, ..., m_{n+1} = N ] .

We can rewrite the LHS of (2.6) using (2.7) by introducing new sampling variables. Thus for k ≥ 1, define m_{n+1,k} by m_{n+1,k} = min[m_{n+1}, k]. Then (2.7) implies that

(2.8) E[ f(X_{k_1}, X_{k_2}, ..., X_{k_n}) X_{m_{n+1,k}} ; m_1 = k_1, ..., m_{n+1,k} ≥ k_n ]

= E[ f(X_{k_1}, X_{k_2}, ..., X_{k_n}) X_{m_{n+1,k−1}} ; m_1 = k_1, ..., m_{n+1,k−1} ≥ k_n ] ,

for k = N. Now, proceeding by induction in (2.8) down to k = k_n + 1 yields (2.6). When m_{n+1}(·) is unbounded we need to use the limit in (2.3). Observe first that (2.8) implies that

(2.9) E[ f(X_{k_1}, X_{k_2}, ..., X_{k_n}) X_{k_n} ; m_1 = k_1, ..., m_n = k_n ]

= E[ f(X_{k_1}, X_{k_2}, ..., X_{k_n}) X_{m_{n+1,N}} ; m_1 = k_1, ..., m_{n+1,N} ≥ k_n ]

for any N ≥ k_n. Hence we have that

(2.10) | E[ f(X_{k_1}, X_{k_2}, ..., X_{k_n}) X_{k_n} ; m_1 = k_1, ..., m_n = k_n ]

− E[ f(X_{k_1}, X_{k_2}, ..., X_{k_n}) X_{m_{n+1}} ; m_1 = k_1, ..., m_{n+1} ≥ k_n ] |

≤ ‖f(·)‖∞ E[ |X_{m_{n+1}} − X_N| ; m_{n+1} > N ] .

Using the fact that E[ |X_{m_{n+1}}| ] < ∞ and (2.3), we see that

(2.11) lim inf_{N→∞} E[ |X_{m_{n+1}} − X_N| ; m_{n+1} > N ] = 0.

Hence (2.6) follows from (2.10) on taking a subsequence of N → ∞.

Proposition 2.1. Let X1, X2, ..., be a martingale and MN = sup_{1≤n≤N} |Xn| be the corresponding maximal function, N ≥ 1. Then for any p ≥ 1 and a > 0,

(2.12) P(MN > a) ≤ (1/a^p) E[ |XN|^p ], N ≥ 1.

Proof. Let mN : Ω → Z be the stopping time

(2.13) mN = N if sup_{1≤j≤N} |Xj| ≤ a, mN = min{ j ≥ 1 : |Xj| > a } otherwise.

Evidently mN is an optional sampling variable, i.e. {mN = j} ∈ F(X1, .., Xj). Observe now that

(2.14) P(MN > a) = P(|X_{m_N}| > a) ≤ (1/a^p) E[ |X_{m_N}|^p ].


Using the optional sampling theorem we also have

(2.15) E[X_N | X_{m_N}] = X_{m_N} ,

whence we conclude that

(2.16) E[ |X_{m_N}|^p ] = E[ |E[X_N | X_{m_N}]|^p ] ≤ E[ E[ |X_N|^p | X_{m_N} ] ] = E[ |X_N|^p ],

where we have used Jensen's inequality.
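Proposition 2.1 is easy to test by simulation. The sketch below (a hedged illustration; the ±1 random walk, a mean-zero martingale, and the parameters N, a are arbitrary choices) compares the empirical P(M_N > a) with the p = 2 bound E[|X_N|^2]/a^2.

```python
import numpy as np

# Empirical check of the maximal inequality (2.12) with p = 2.
rng = np.random.default_rng(2)
N, trials, a = 50, 50_000, 10.0
X = rng.choice([-1.0, 1.0], size=(trials, N)).cumsum(axis=1)  # martingale paths
M = np.abs(X).max(axis=1)                                     # maximal function

lhs = (M > a).mean()                        # P(M_N > a)
rhs = (X[:, -1] ** 2).mean() / a ** 2       # E[|X_N|^2]/a^2, here N/a^2 = 0.5
```

In this run the bound is not tight (the empirical probability is well below N/a²), which is typical for Chebyshev-type inequalities.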

Proposition 2.1 enables us to prove a stronger version of the SLLN, Proposition 2.2 of Chapter I.

Corollary 2.1. Let Y1, Y2, ..., be a sequence of i.i.d. variables with E[ |Y1|^2 ] < ∞, and SN = Y1 + · · · + YN, N = 1, 2, ... Then

(2.17) lim_{N→∞} SN/N = E[Y1] with probability 1.

Proof. We have already observed that XN = SN − E[SN] is a martingale. Applying Proposition 2.1 with p = 2 we see that

(2.18) P( sup_{2^{k−1}≤n<2^k} |Xn| > 2^{2k/3} ) ≤ E[Y1^2] / 2^{k/3} , k = 1, 2, ....,

whence

(2.19) ∑_{k=1}^{∞} P( sup_{2^{k−1}≤n<2^k} |Xn| > 2^{2k/3} ) < ∞.

The result follows from (2.19) and Borel-Cantelli (a), since with probability 1 one has sup_{2^{k−1}≤n<2^k} |Xn| ≤ 2^{2k/3} for all large k, and 2^{2k/3}/2^{k−1} → 0 as k → ∞.
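The statement of Corollary 2.1 can be illustrated directly; in the sketch below the exponential step distribution is an arbitrary choice satisfying E[|Y1|^2] < ∞.

```python
import numpy as np

# S_N / N should approach E[Y_1] = 2 with probability 1 as N grows.
rng = np.random.default_rng(3)
Y = rng.exponential(scale=2.0, size=1_000_000)         # i.i.d., mean 2
running_mean = Y.cumsum() / np.arange(1, Y.size + 1)   # S_N / N for each N
final = running_mean[-1]
```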

We also observe here that there is a version of the CLT for martingales. To see this let X1, X2, ..., be a martingale and denote by Fn the σ-field F(X1, X2, .., Xn), n = 1, 2, ... Setting Yn = Xn − Xn−1, n = 1, 2, .., with X0 ≡ 0, we see that the condition (2.1) is just

(2.20) E[Yn | Fn−1] = 0, n = 1, 2, ...

Proposition 2.2. Let X1, X2, ..., be a martingale and suppose the corresponding martingale differences Yn, n = 1, 2, .., satisfy

(2.21) E[Yn^2 | Fn−1] = 1, n = 1, 2, .., sup_{n≥1} E[ |Yn|^3 | Fn−1 ] = K,

for some finite constant K. Then XN/√N converges in distribution as N → ∞ to the standard normal variable Z.

Proof. We proceed as in Chapter I, showing that the proof of Lemma 3.1 of Chapter I also works for martingales. Thus setting ZN = XN/√N, we shall show that

(2.22) lim_{N→∞} χ_{Z_N}(σ) = e^{−σ^2/2} , σ ∈ R.

To see this we set a_{n,N} = χ_{X_n/√N}(σ), 1 ≤ n ≤ N, so a_{N,N} = χ_{Z_N}(σ). From (2.21) it follows that

(2.23) | a_{n,N} − [1 − σ^2/2N] a_{n−1,N} | ≤ Kσ^3 / 6N^{3/2} ,

and a_{0,N} = 1. Hence if we set

(2.24) b_{n,N} = a_{n,N} − [1 − σ^2/2N]^n, n = 0, 1, 2, ..,


then (2.23) implies that

(2.25) | b_{n,N} − [1 − σ^2/2N] b_{n−1,N} | ≤ Kσ^3 / 6N^{3/2} , n = 1, 2, .., b_{0,N} = 0.

It is easy to see from (2.25) that

(2.26) | b_{n,N} | ≤ Kσ^3 [1 − λ_N^n] / ( 6N^{3/2} [1 − λ_N] ), where λ_N = 1 − σ^2/2N,

whence lim_{N→∞} b_{N,N} = 0, since the RHS of (2.26) is at most (Kσ^3/6N^{3/2})(2N/σ^2) = K|σ|/3N^{1/2}. The limit (2.22) follows now from this and (2.24). The remainder of the CLT follows from (2.22) by a similar argument as in Chapter I.
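A quick simulation of Proposition 2.2 (illustrative only; i.i.d. ±1 differences trivially satisfy (2.20) and (2.21) with K = 1, and the parameters are arbitrary choices):

```python
import numpy as np

# X_N / sqrt(N) for a martingale with +-1 differences: compare with N(0,1).
rng = np.random.default_rng(4)
N, trials = 400, 200_000
Z = rng.choice([-1.0, 1.0], size=(trials, N)).sum(axis=1) / np.sqrt(N)

mean, var = Z.mean(), Z.var()      # should be near 0 and 1
frac = (np.abs(Z) < 1.96).mean()   # near 0.95 for a standard normal
```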

Up until now we have been emphasizing the fact that the notion of a martingale generalizes the concept of sums of independent random variables with mean zero, hence the connection with the SLLN and CLT. There is also another direction it brings us, which is a generalization of the measure theory result that the derivative of an absolutely continuous function is integrable. Thus let f : [0, 1] → R be a Lebesgue integrable function, and consider the well known result from measure theory,

(2.27) lim_{r→0} (1/2r) ∫_{x−r}^{x+r} f(z) dz = f(x) for almost every x ∈ [0, 1].

Let us define a martingale associated with the function f(·) as follows: For n = 0, 1, ..., and 1 ≤ k ≤ 2^n, let I_{n,k} be the dyadic interval { x ∈ [0, 1] : (k − 1)2^{−n} < x ≤ k2^{−n} } and Xn : [0, 1] → R be given by

(2.28) Xn(ω) = (1/|I_{n,k}|) ∫_{I_{n,k}} f(z) dz, ω ∈ I_{n,k} .

Thus Xn(ω) is the average of f(·) over the dyadic interval of length 2^{−n} containing ω. It is easy to see that Xn, n = 1, 2.., is a martingale and (2.27) tells us that

(2.29) lim_{n→∞} Xn = X with probability 1,

where X is just the function f(·). This result has a generalization in the following:

Theorem 2.2 (Martingale Convergence Theorem). Let X1, X2, ..., be a martingale which satisfies sup_{n≥1} E[ |Xn| ] < ∞. Then there exists a random variable X such that E[ |X| ] < ∞ and (2.29) holds.

Proof. We prove the result by showing that with probability 1 the sequence Xn(ω), n = 1, 2, .., does not oscillate, ω ∈ Ω. By oscillation we mean that for some a, b with a < b, there are two subsequences X_{n_a(j)}(ω), X_{n_b(j)}(ω), j = 1, 2, .., such that lim inf_{j→∞} X_{n_a(j)}(ω) ≤ a, lim sup_{j→∞} X_{n_b(j)}(ω) ≥ b. Evidently the sequences n_a(j), n_b(j), j = 1, 2..., depend on ω ∈ Ω in general. To do this we define sampling variables m∗_k, k = 1, 2, .., by

(2.30) m∗_1 = +∞ if Xj > a for all j; m∗_1 = min{ j ≥ 1 : Xj ≤ a } otherwise;

(2.31) m∗_2 = +∞ if Xj < b for all j ≥ m∗_1; m∗_2 = min{ j ≥ m∗_1 : Xj ≥ b } otherwise;

(2.32) m∗_3 = +∞ if Xj > a for all j ≥ m∗_2; m∗_3 = min{ j ≥ m∗_2 : Xj ≤ a } otherwise;


and so forth, so that if m∗_{2n}(ω) < ∞ then X_{m∗_{2k}}(ω) − X_{m∗_{2k−1}}(ω) ≥ b − a, 1 ≤ k ≤ n. Thus it will be sufficient for us to show that if S_{a,b} is the set

(2.33) S_{a,b} = ∩_{n=1}^{∞} { m∗_{2n} < ∞ } ,

then P(S_{a,b}) = 0 for all a < b.

To prove this we define for any M ≥ 1 optional sampling variables m_{n,M} = min{M, m∗_n}, n = 1, 2, ..., with corresponding martingale X̃_{n,M} = X_{m_{n,M}}, n = 1, 2, ... Suppose now that P(S_{a,b}) ≥ 2δ > 0, whence it follows that for any n ≥ 1 there exists Mn ≥ 1 such that P(m∗_{2n} ≤ M) > δ provided M ≥ Mn. Evidently we have that

(2.34) ∑_{r=1}^{n} [X̃_{2r,M}(ω) − X̃_{2r−1,M}(ω)] ≥ n(b − a) for ω ∈ { m∗_{2n} ≤ M },

∑_{r=1}^{n} [X̃_{2r,M}(ω) − X̃_{2r−1,M}(ω)] ≥ min{X_M − b, 0} for ω ∉ { m∗_{2n} ≤ M }.

Taking the expectation in (2.34) and using the fact that the expectation of the LHS is 0, we conclude that

(2.35) n(b − a) P(m∗_{2n} ≤ M) ≤ E[ |X_M − b| ] .

Since the RHS of (2.35) is uniformly bounded as M → ∞, we obtain a contradiction to P(m∗_{2n} ≤ M) > δ for all large M by choosing n sufficiently large in (2.35).

Since P(S_{a,b}) = 0 for all a < b, we conclude that the set

(2.36) S = ∪_{a,b∈Q, a<b} S_{a,b} , where Q are the rational numbers,

satisfies P(S) = 0. Now let U be the set

(2.37) U = { ω ∈ Ω : sup_{n≥1} |Xn(ω)| < ∞ }.

It is clear that for any ω ∈ U − S the sequence Xn(ω), n = 1, 2, .., converges. From Proposition 2.1 and the fact that sup_{n≥1} E[ |Xn| ] < ∞ it follows that P(U) = 1. Since we have already shown that P(S) = 0 the result follows.
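The dyadic-average martingale (2.28) makes Theorem 2.2 concrete: for an integrable f, the averages over shrinking dyadic intervals converge to f at almost every point, as (2.29) asserts. A small sketch (the choices f(x) = x² and evaluation point ω = 0.3 are arbitrary, and the integral is approximated by a midpoint sum):

```python
import numpy as np

# X_n(omega): average of f over the dyadic interval of length 2^-n
# containing omega, as in (2.28).
f = lambda x: x ** 2
omega = 0.3

def X_n(n, grid=4096):
    k = int(np.floor(omega * 2 ** n))                  # interval (k 2^-n, (k+1) 2^-n]
    z = (k + (np.arange(grid) + 0.5) / grid) / 2 ** n  # midpoint sample of the interval
    return f(z).mean()                                 # approximate average of f

vals = [X_n(n) for n in (2, 5, 10, 15)]                # should approach f(0.3) = 0.09
```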

Finally we prove a convergence result for martingales in the spaces Lp(Ω), p > 1.

Theorem 2.3. Let X1, X2, ..., be a martingale which satisfies sup_{n≥1} E[ |Xn|^p ] < ∞ for some p > 1. Then there exists a random variable X such that E[ |X|^p ] < ∞ and Xn converges to X in Lp(Ω) as n → ∞.

Proof. Using Jensen's inequality E[ |Xn| ]^p ≤ E[ |Xn|^p ], p ≥ 1, we see that the conditions of Theorem 2.2 hold. Hence there exists a random variable X such that E[ |X| ] < ∞ and lim_{n→∞} Xn = X with probability 1. To prove convergence in Lp(Ω) we need a strengthening of Proposition 2.1. Thus we show that

(2.38) E[ M_N^p ] ≤ [ p/(p − 1) ]^p E[ |X_N|^p ] , N = 1, 2, ...

To see this we observe that (2.14)-(2.15) can be improved to

(2.39) P(M_N > a) ≤ (1/a) E[ |X_N| ; |X_{m_N}| > a ] = (1/a) E[ |X_N| ; M_N > a ] .


Now we write the expectation E[ M_N^p ] by means of its distribution function P(M_N > λ), λ > 0, as

(2.40) E[ M_N^p ] = p ∫_0^∞ λ^{p−1} P(M_N > λ) dλ .

Now (2.39) implies that

(2.41) E[ M_N^p ] ≤ p ∫_0^∞ λ^{p−2} E[ |X_N| ; M_N > λ ] dλ = E[ |X_N| ∫_0^∞ p λ^{p−2} H(M_N − λ) dλ ] = (p/(p − 1)) E[ |X_N| M_N^{p−1} ],

where H(z), z ∈ R, is the Heaviside function: H(z) = 1 if z > 0, and otherwise H(z) = 0. Thus we conclude that

(2.42) E[ M_N^p ] ≤ (p/(p − 1)) E[ |X_N| M_N^{p−1} ] ≤ (p/(p − 1)) E[ |X_N|^p ]^{1/p} E[ |M_N|^p ]^{1−1/p} ,

on using the Hölder inequality, from which (2.38) follows.

To complete the proof of convergence in Lp(Ω) we let N → ∞ in (2.38), whence we see that

(2.43) E[ M_∞^p ] = E[ sup_{n≥1} |Xn|^p ] < ∞ .

Hence E[ |X|^p ] < ∞ and lim_{a→∞} E[ M_∞^p ; M_∞ > a ] = 0. We have now that

(2.44) lim sup_{n→∞} ‖Xn − X‖_p^p ≤ lim sup_{n→∞} ‖[Xn − X] H(a − M_∞)‖_p^p + 2^p E[ M_∞^p ; M_∞ > a ] ,

for any a > 0. By the dominated convergence theorem the first term on the RHS of (2.44) is 0, and we have already seen that the second term vanishes as a → ∞.

3. Measure Preserving Transformations

Consider a probability space (Ω, F, P). A measurable function T : Ω → Ω is said to be measure preserving if

(3.1) P(T^{−1}A) = P(A), A ∈ F,

where

(3.2) T^{−1}A = { ω ∈ Ω : Tω ∈ A } .

Example 1. Let Ω = [0, 1], λ ∈ R, F the Borel algebra generated by the open sets of [0, 1], and T : Ω → Ω be defined by

(3.3) Tω = ω + λ mod 1.

One can view Ω as the unit circle and T as the operation of rotation through the angle 2πλ. Taking P to be Lebesgue measure on [0, 1], we see that T is a measure preserving transformation. Observe also that T is (1-1) onto, so T is an isomorphism of Ω.

Example 2. Let Ω = [0, 1], F the Borel algebra generated by the open sets of [0, 1], and T : Ω → Ω be defined by

(3.4) Tω = 2ω mod 1.


Evidently T is the shift operator on the binary representation of a number, i.e.

(3.5) ω = ∑_{j=1}^{∞} a_j/2^j implies Tω = ∑_{j=1}^{∞} a_{j+1}/2^j .

To see that T is measure preserving, consider the dyadic interval I_{n,k} = [(k − 1)/2^n, k/2^n], where 1 ≤ k ≤ 2^n. Then

(3.6) T^{−1} I_{n,k} = I_{n+1,k} ∪ I_{n+1,k+2^n} ,

whence P(T^{−1} I_{n,k}) = P(I_{n,k}). Next define the σ-field G by

(3.7) G = { all Borel sets O such that P(T^{−1}O) = P(O) }.

Since G contains all the dyadic intervals I_{n,k}, it follows that G = F, whence T is measure preserving. Note that in this example T is not a one to one mapping. The action of T is to expand a subset of [0, 1/2] or [1/2, 1] in a 1-1 fashion to a set which is double its measure. Since T is a 2-1 mapping this makes it measure preserving. Transformations T which have the property of expansion are known as hyperbolic.
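A Monte Carlo sanity check of the measure preserving property of (3.4) (illustrative only; the test interval is an arbitrary choice, and floating point doubling only mirrors the exact map for moderately many iterations):

```python
import numpy as np

# For uniform omega, T(omega) = 2*omega mod 1 should again be uniform on [0,1].
rng = np.random.default_rng(5)
omega = rng.random(1_000_000)
T = (2.0 * omega) % 1.0

a, b = 0.25, 0.625
frac = ((T > a) & (T <= b)).mean()   # should be close to b - a = 0.375
```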

Let us assume now that T : Ω → Ω is a measure preserving mapping. Then if f : Ω → R is integrable, (3.1) implies that the function f ∘ T defined by f ∘ T(ω) = f(Tω), ω ∈ Ω, is also integrable, and

(3.8) 〈 f ∘ T(·) 〉 = 〈 f(·) 〉 .

We define a linear operator U on L2(Ω) as follows:

(3.9) Uφ(ω) = φ(Tω), ω ∈ Ω, φ ∈ L2(Ω).

From (3.8) it follows that

(3.10) (Uφ, Uψ) = 〈 Uφ(·) Uψ(·) 〉 = 〈 φ(·) ψ(·) 〉 = (φ, ψ) , φ, ψ ∈ L2(Ω),

where (·, ·) denotes the L2(Ω) inner product. Recall that the operator U is bounded on L2(Ω) with norm ‖U‖ if

(3.11) sup{ ‖Uφ(·)‖ : ‖φ‖ = 1 } = ‖U‖ < ∞ .

Setting ψ = φ in (3.10) we see that U is a bounded operator on L2(Ω) with norm ‖U‖ = 1. The adjoint U∗ of U is defined as the operator which satisfies

(3.12) (Uφ, ψ) = (φ, U∗ψ), φ, ψ ∈ L2(Ω).

The operator U∗ is also bounded on L2(Ω) and ‖U∗‖ = ‖U‖. Hence (3.10) is equivalent to the statement

(3.13) U∗U = I = identity operator on L2(Ω).

Note that (3.13) implies that the operator U is one to one but not necessarily onto. If U is onto then (3.13) implies that U is invertible with inverse U∗, so

(3.14) UU∗ = U∗U = I .

Equation (3.14) says that U is a unitary operator on L2(Ω). One can see that U is invertible if and only if T is one to one, in which case U∗ is given by the formula

(3.15) U∗φ(ω) = φ(T^{−1}ω), ω ∈ Ω, φ ∈ L2(Ω).

Note that the measure preserving property of T implies that if T is one to one then T is almost onto, i.e. onto up to a set of probability 0. This is the case because any Borel set A such that T(Ω) ⊂ A satisfies P(Ω − A) = 0.


We might expect U∗ to be given by a formula like (3.15) even when T is not one to one. To see this consider the mapping (3.4) of Example 2. In that case

(3.16) U∗φ(ω) = (1/2)[ φ(T_1^{−1}ω) + φ(T_2^{−1}ω) ], ω ∈ Ω, φ ∈ L2(Ω),

where T_1 : [0, 1/2] → [0, 1] and T_2 : [1/2, 1] → [0, 1] are the one to one mappings T_j(ω) = Tω, j = 1, 2. Suppose now that φ_0 : [0, 1/2] → R is an arbitrary square integrable function and define a function K_asym φ_0 : [0, 1] → R by

(3.17) K_asym φ_0(ω) = φ_0(ω) if 0 < ω < 1/2, K_asym φ_0(ω) = −φ_0(ω − 1/2) if 1/2 < ω < 1.

Then (3.16) implies that U∗K_asym φ_0 = 0. In fact it is easy to see that N(U∗) = { K_asym φ_0 : φ_0 ∈ L2([0, 1/2]) } and therefore has infinite dimension. The non-triviality of N(U∗) implies that U is not onto, as we see in the following:

Lemma 3.1. Let H be a Hilbert space and L : H → H a bounded linear operator. Then H is the orthogonal sum H = N(L∗) ⊕ R(L), where R(L) is the closure of the range of L, i.e. the closure of the linear space { Lφ : φ ∈ H }.

Proof. Let E be the linear space

(3.18) E = { ψ ∈ H : (ψ, Lφ) = 0 for all φ ∈ H } .

Thus E is the linear space orthogonal to R(L). Using the fact that (ψ, Lφ) = (L∗ψ, φ), we conclude that E = N(L∗).

Lemma 3.1 enables us to characterize R(U) for the mapping (3.4) since it is the orthogonal complement of N(U∗). One sees that R(U) = { K_sym φ_0 : φ_0 ∈ L2([0, 1/2]) }, where

(3.19) K_sym φ_0(ω) = φ_0(ω) if 0 < ω < 1/2, K_sym φ_0(ω) = φ_0(ω − 1/2) if 1/2 < ω < 1.

Definition 3. A measure preserving mapping T : Ω → Ω is said to be ergodic if the only eigenfunction of the operator U of (3.9) with eigenvalue 1 is the constant function, i.e. Uφ = φ and φ ∈ L2(Ω) implies φ(·) ≡ constant.

Lemma 3.2. A measure preserving mapping T : Ω → Ω is ergodic if and only if invariant sets of T have probability 0 or 1, i.e. if P( [A − T^{−1}A] ∪ [T^{−1}A − A] ) = 0, then P(A) = 0 or P(A) = 1.

Proof. Assume first that T is not ergodic, so there exists non-constant φ(·) ∈ L2(Ω) such that Uφ = φ. Hence one can find a < b such that if A = { ω ∈ Ω : a < φ(ω) < b }, then 0 < P(A) < 1. Evidently A is an invariant set with the property that P(A) ≠ 0, 1. Conversely suppose there exists an invariant set A such that 0 < P(A) < 1. Then the function φ(·) = χ_A(·) satisfies φ ∈ L2(Ω) and Uφ = φ but is not the constant function.

Theorem 3.1 (von Neumann Ergodic Theorem). Suppose the measure preserving mapping T : Ω → Ω is ergodic. Then for any φ ∈ L2(Ω) one has

(3.20) lim_{N→∞} (1/N) ∑_{n=0}^{N−1} φ(T^n ω) = 〈 φ(·) 〉 , ω ∈ Ω,

where the convergence in (3.20) is in the norm of L2(Ω).


Proof. Evidently we can assume that 〈 φ(·) 〉 = 0, whence (3.20) can be rewritten as

(3.21) lim_{N→∞} (1/N) [ I + U + U^2 + · · · + U^{N−1} ] φ(·) = 0.

We can easily show that (3.21) holds for a large class of functions, namely functions which are “derivatives” of a function ψ(·) ∈ L2(Ω). By the “derivative” of ψ we mean the function φ = Dψ = [U − I]ψ. In that case a discrete version of the fundamental theorem of calculus yields

(3.22) [ I + U + U^2 + · · · + U^{N−1} ] Dψ(·) = [U^N − I]ψ .

Hence the norm of the LHS of (3.21) is at most 2‖ψ‖/N, where we are using the fact that ‖U‖ ≤ 1, whence the limit (3.21) holds.

Next let E be the closure of the linear space { Dψ : ψ ∈ L2(Ω) }. Then (3.21) holds for all φ ∈ E. To see this we note that for any ε > 0 there exists ψ_ε ∈ L2(Ω) such that ‖φ − Dψ_ε‖ ≤ ε. Writing φ on the LHS of (3.21) as a sum φ = Dψ_ε + [φ − Dψ_ε], it follows from (3.22) that the LHS of (3.21) is bounded in norm by 2‖ψ_ε‖/N + ε. Letting N → ∞ we conclude that the lim sup_{N→∞} of the norm of the LHS of (3.21) is less than ε. Since ε is arbitrarily small we conclude that (3.21) holds for the function φ ∈ E.

To complete the proof we need to show that E is the same as the linear space { φ ∈ L2(Ω) : 〈 φ(·) 〉 = 0 }. Now from Lemma 3.1 the orthogonal complement of E in H is the null space of the operator D∗ = U∗ − I, which is the same as the eigenspace of U∗ with eigenvalue 1. Suppose now that U∗φ = φ for some φ ∈ L2(Ω). Then

(3.23) (Uφ, φ) = (φ, U∗φ) = (φ, φ) ,

which we can rewrite as

(3.24) 〈 φ(T·) φ(·) 〉 = 〈 φ^2(·) 〉 ,

and this implies that

(3.25) 〈 [φ(T·) − φ(·)]^2 〉 = 0.

We conclude that φ(T·) = φ(·). We have shown that U∗φ = φ implies that Uφ = φ. The ergodicity assumption now implies that φ(·) ≡ constant. It follows that the orthogonal complement of the linear space E is the one dimensional space of constant functions, whence E = { φ ∈ L2(Ω) : 〈 φ(·) 〉 = 0 }.

Next we turn to the notion of mixing. If we take φ = χ_A in Theorem 3.1 for a set A ∈ F, then we can conclude that

(3.26) lim_{N→∞} (1/N) ∑_{n=0}^{N−1} P(T^{−n}A ∩ B) = P(A)P(B) , A, B ∈ F .

Conversely if (3.26) holds then T is ergodic. We can interpret (3.26) as stating that for a fixed set B ∈ F, the set T^{−n}A becomes, on average, independent of B for large n. A more obvious statement of this approximate independence is

(3.27) lim_{n→∞} P(T^{−n}A ∩ B) = P(A)P(B) , A, B ∈ F .


If the measure preserving mapping T : Ω → Ω satisfies (3.27) then it is said to be strongly mixing. A condition which lies between (3.26) and (3.27) is

(3.28) lim_{N→∞} (1/N) ∑_{n=0}^{N−1} |P(T^{−n}A ∩ B) − P(A)P(B)| = 0 , A, B ∈ F .

If the measure preserving mapping T : Ω → Ω satisfies (3.28) then it is said to be weakly mixing.
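The mixing conditions can be probed numerically. The sketch below estimates P(T^{−n}A ∩ B) for the doubling map of Example 2, which is in fact strongly mixing (as Proposition 3.2 below implies); the events A and B are arbitrary choices, and the iteration is only reliable in floating point for moderate n.

```python
import numpy as np

# Estimate P(T^{-n}A ∩ B) for T(omega) = 2*omega mod 1; it should approach
# P(A)P(B) = 0.25 as n grows, illustrating (3.27).
rng = np.random.default_rng(6)
omega = rng.random(500_000)

def Tn(w, n):
    return (w * 2.0 ** n) % 1.0          # n-fold doubling (exact for small n)

in_A = lambda w: w < 0.5                 # P(A) = 0.5
in_B = lambda w: (w > 0.2) & (w < 0.7)   # P(B) = 0.5

joint = {n: (in_A(Tn(omega, n)) & in_B(omega)).mean() for n in (1, 5, 10)}
```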

We can characterize the notions of ergodicity and weak mixing slightly differently from (3.26), (3.28). Thus denoting the inner product on the complex Hilbert space L2(Ω) of functions f : Ω → C by (·, ·), where (f, g) = 〈 f ḡ 〉, we have the following:

Lemma 3.3. Suppose T : Ω → Ω is a measure preserving transformation. Then T is ergodic if and only if the operator U of (3.9) satisfies

(3.29) lim_{N→∞} (1/N) ∑_{n=0}^{N−1} (U^n f, f) = 0 , f ∈ L2(Ω), 〈f(·)〉 = 0.

T is weak mixing if and only if U satisfies

(3.30) lim_{N→∞} (1/N) ∑_{n=0}^{N−1} |(U^n f, f)| = 0 , f ∈ L2(Ω), 〈f(·)〉 = 0.

Proof. We show (3.26) and (3.29) are equivalent. To see this note that (3.26) can be written as

(3.31) lim_{N→∞} (1/N) ∑_{n=0}^{N−1} [ (U^n f_1, f_2) − 〈f_1〉〈f_2〉 ] = 0 ,

where f_1 and f_2 are characteristic functions of any measurable sets A and B. Hence on taking linear combinations of characteristic functions we see that (3.31) continues to hold for any pair of real valued functions f_1, f_2 ∈ L2(Ω). Now (3.29) follows from (3.31) on writing f = f_1 + if_2 for real f_1, f_2 ∈ L2(Ω). Conversely suppose (3.29) holds for all f ∈ L2(Ω) and let f_1, f_2 be a pair of real valued functions in L2(Ω) with mean 0. By taking f = f_1, f_2, f_1 + f_2, f_1 + if_2 in (3.29), we conclude that (3.31) holds.

To show that (3.28) and (3.30) are equivalent, we note that (3.28) can be written similarly as

(3.32) lim_{N→∞} (1/N) ∑_{n=0}^{N−1} |(U^n f_1, f_2) − 〈f_1〉〈f_2〉| = 0 ,

where f_1 and f_2 are characteristic functions of any measurable sets A and B. The argument proceeds then as in the previous paragraph.

The notion of weak mixing is important because it can be simply characterized in terms of spectral properties of the operator U. We have already seen this is true of ergodicity, i.e. U has only one eigenfunction with eigenvalue 1, namely the constant function. In contrast there is no known spectral condition equivalent to strong mixing.

Proposition 3.1. Suppose T : Ω → Ω is a measure preserving transformation. Then T is weak mixing if and only if the operator U of (3.9) has no eigenfunctions in L2(Ω) other than the constant function.


Proof. We first note that an eigenvalue λ ∈ C of U must lie on the unit circle in C, i.e. |λ| = 1. This follows from (3.13) since it implies that ‖Uφ‖ = ‖φ‖, φ ∈ L2(Ω). Suppose now that (3.30) holds and Uφ = λφ for some φ ∈ L2(Ω) orthogonal to the constant function, so 〈φ(·)〉 = 0. Since |λ| = 1, equation (3.30) with f = φ yields ‖φ‖^2 = 0, whence φ ≡ 0.

Conversely suppose that U has no eigenfunctions in L2(Ω) other than the constant function. In order to prove (3.30) we need to use a rather deep result [2], namely that for f ∈ L2(Ω) there is a spectral measure µ_f on the Borel sets of the unit circle S = { z ∈ C : |z| = 1 }, such that

(3.33) (U^n f, f) = ∫_S λ^n dµ_f(λ) .

Thus µ_f is a positive finite measure on the Borel sets of S and

(3.34) µ_f(S) = ‖f‖^2 .

Let us suppose now that f is orthogonal to the constant function. Then the condition on the spectrum of U implies that µ_f has no atoms, i.e. that µ_f({λ}) = 0 for all λ ∈ S. Consider now the sum

(3.35) (1/N) ∑_{n=0}^{N−1} |(U^n f, f)|^2 = ∫_S ∫_S [ (1/N) ∑_{n=0}^{N−1} (λ λ̄′)^n ] dµ_f(λ) dµ_f(λ′) = ∫_S ∫_S g_N(λ λ̄′) dµ_f(λ) dµ_f(λ′) ,

where the function g_N : S → C is continuous with the properties

(3.36) g_N(z) = [1 − z^N]/N[1 − z], z ≠ 1, g_N(1) = 1, ‖g_N‖_∞ ≤ 1.

Evidently we have that lim_{N→∞} g_N(z) = 0 for z ≠ 1. The fact that µ_f has no atoms implies that µ_f × µ_f({ (λ, λ) ∈ S × S : λ ∈ S }) = 0. This follows from Fubini's theorem: for any measurable subset E ⊂ S × S one has

(3.37) µ_f × µ_f(E) = ∫_S µ_f(E_λ) dµ_f(λ) ,

where µ_f(E_λ) is the µ_f measure of the cross-section of E at fixed λ ∈ S. From the dominated convergence theorem we conclude then from (3.35), (3.36) that

(3.38) lim_{N→∞} (1/N) ∑_{n=0}^{N−1} |(U^n f, f)|^2 = 0.

Finally from the Schwarz inequality we have that

(3.39) (1/N) ∑_{n=0}^{N−1} |(U^n f, f)| ≤ [ (1/N) ∑_{n=0}^{N−1} |(U^n f, f)|^2 ]^{1/2} ,

whence (3.30) follows.

Let us consider the two examples we gave of measure preserving transformations at the beginning of §3. In Example 1 we have that

(3.40) Uf(θ) = f(θ + λ), θ ∈ [0, 1], for periodic functions f : [0, 1] → C .

Hence f_n(θ) = e^{2πniθ}, n ∈ Z, are eigenfunctions of U in L2(Ω) with eigenvalues e^{2πniλ}, n ∈ Z. If λ is a rational number, λ = p/q for some integers p, q with q ≠ 0, it is clear that f_q(θ) has eigenvalue 1. Hence T is not ergodic if λ is rational. On the other hand if λ is irrational then the only eigenfunction with eigenvalue 1 is the


constant function f_0(θ), whence T is ergodic. We conclude that T is ergodic if and only if λ is irrational. Evidently from Proposition 3.1 we see that T is not weak mixing, so we have constructed maps which are ergodic but not weak mixing.
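These conclusions are easy to see numerically. In the sketch below (illustrative only) the mean-zero test function cos(4πω), the starting point, and the rational and irrational values of λ are arbitrary choices; for irrational λ the time average (3.20) tends to the space average 0, while for λ = 1/2 the orbit is periodic and the time average need not.

```python
import numpy as np

# Time averages (3.20) for the rotation T(omega) = omega + lam mod 1.
phi = lambda w: np.cos(4 * np.pi * w)      # mean zero over [0, 1]

def time_average(lam, omega=0.1, N=100_000):
    n = np.arange(N)
    return phi((omega + n * lam) % 1.0).mean()

avg_irrational = time_average(np.sqrt(2))  # ergodic: tends to <phi> = 0
avg_rational = time_average(0.5)           # 2-point orbit: stays near cos(0.4*pi)
```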

Next we consider Example 2. We can prove that the mapping T is strong mixing by the same argument as we used to prove the Kolmogorov zero-one law, Proposition 4.3 of Chapter I.

Proposition 3.2. Let X1, X2, ...., be i.i.d. variables on a probability space (Ω, F, P) and define the space Ω̄ as the set of all coordinate sequences (X1(ω), X2(ω), ....), ω ∈ Ω, and F̄ as the σ-field on Ω̄ generated by finite dimensional rectangles. Let P̄ be the probability measure on (Ω̄, F̄) induced by P, i.e.

(3.41) P̄(A1 × A2 × · · · × AN) = P(X1 ∈ A1, ..., XN ∈ AN)

on finite dimensional rectangles. Now define the shift operator T on Ω̄ by

(3.42) T(X1, X2, X3, ....) = (X2, X3, ....) .

Then T is measure preserving and strong mixing.

Proof. It follows from the fact that the variables X1, X2, ..., are i.i.d. that T is measure preserving. To prove (3.27) we observe that (3.27) holds if A, B are finite dimensional rectangles. To see that it holds in general we assume B is a rectangle and F1 is the collection of sets in F for which (3.27) holds. Evidently F1 is a σ-field containing all finite dimensional rectangles and thus is identical to F. We can similarly see that B can be an arbitrary set in F.

Proposition 3.2 implies that the mapping of Example II is strong mixing. To see that a mapping can be weak mixing but not strong mixing is not so easy, since all examples are complicated (see [1], page 87, for an example constructed by von Neumann).
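For the shift of Example II the strong mixing identity (3.27) can be seen concretely: with i.i.d. coin flips, take A = {X1 = 1} and B = {X1 = 1, X2 = 1}; for n ≥ 2 the event T^{−n}A = {X_{n+1} = 1} depends on a coordinate disjoint from B, so P(T^{−n}A ∩ B) = P(A)P(B) exactly. The Monte Carlo sketch below is hypothetical illustration code estimating both sides.

```python
import random

random.seed(0)
M = 200_000   # number of simulated i.i.d. sequences
n = 5         # shift amount; T^{-n}A depends on coordinate n+1 only

hits_A, hits_B, hits_both = 0, 0, 0
for _ in range(M):
    x = [random.randint(0, 1) for _ in range(n + 1)]   # X_1, ..., X_{n+1}
    in_A = (x[0] == 1)                  # A = {X_1 = 1}
    in_B = (x[0] == 1 and x[1] == 1)    # B = {X_1 = 1, X_2 = 1}
    in_shifted_A = (x[n] == 1)          # T^{-n}A = {X_{n+1} = 1}
    hits_A += in_A
    hits_B += in_B
    hits_both += (in_shifted_A and in_B)

lhs = hits_both / M                 # estimates P(T^{-n}A ∩ B) = 1/8
rhs = (hits_A / M) * (hits_B / M)   # estimates P(A) P(B) = (1/2)(1/4)
```

For events depending on disjoint coordinates the two sides agree exactly, so here the mixing limit is reached already at finite n; for general A, B ∈ F the proposition supplies the limit argument.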

4. The Birkhoff Ergodic Theorem

In this section our main goal will be to prove a probability 1 convergence theorem corresponding to the von Neumann Theorem 3.1. To do this we proceed similarly to the section on Martingales and define a maximal function. Thus for measure preserving T : Ω → Ω and φ ∈ L¹(Ω) we define M_N φ by

(4.1)  M_N φ(ω) = sup_{1≤n≤N} | (1/n) ∑_{j=0}^{n−1} φ(T^j ω) | , ω ∈ Ω.

Parallel to Proposition 2.1 we have the following:

Proposition 4.1. Let T : Ω → Ω be measure preserving and φ ∈ L¹(Ω). Then for a > 0 there is the inequality,

(4.2)  P( M_N φ > a ) ≤ (1/a) ⟨ |φ(·)| ⟩ , N ≥ 1.

Proof. The proof is based on a clever inequality. Let f ∈ L¹(Ω) be a real but not necessarily positive function and for n = 0, 1, 2, ..., set f_n = [I + U + ··· + U^{n−1}]f, where f_0 ≡ 0. The inequality states

(4.3)  E[ f(·) ; max_{0≤n≤N} f_n > 0 ] ≥ 0, N = 1, 2, ...


To see this let F_N = max_{0≤n≤N} f_n and observe first that f_0 ≡ 0 implies that F_N is non-negative. Furthermore if F_N(ω) > 0 then there exists n_ω with 1 ≤ n_ω ≤ N such that F_N(ω) = f_{n_ω}(ω) > 0. In that case one has f_{n_ω}(ω) − f_{n_ω−1}(Tω) = f(ω), and since f_{n_ω−1}(Tω) ≤ F_N(Tω) it follows that

(4.4)  E[ f(·) ; F_N > 0 ] ≥ E[ F_N(·) − F_N(T·) ; F_N > 0 ], N ≥ 1.

Finally we note that

(4.5)  E[ F_N(·) − F_N(T·) ; F_N > 0 ] ≥ E[ F_N(·) ] − E[ F_N(T·) ] = 0,

where in (4.5) we are using the fact that F_N(·) is a non-negative function and the measure preserving property of T.

The proof of (4.2) is a simple consequence of (4.3). We assume wlog that φ(·) is a positive function and set f(·) = φ(·) − a in (4.3), which yields the inequality

(4.6)  E[ φ(·) − a ; M_N φ(·) > a ] ≥ 0,

which immediately implies (4.2).
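The maximal inequality (4.2) can be checked empirically. In the hypothetical sketch below (an illustration, not from the notes) T is the irrational rotation of Example I and φ(θ) = θ, so ⟨|φ(·)|⟩ = 1/2; sampling ω uniformly, the observed frequency of {M_N φ > a} stays below the bound (1/a)⟨|φ(·)|⟩.

```python
import math
import random

random.seed(1)
lam = math.sqrt(2) - 1          # irrational rotation number (assumption of this sketch)
phi = lambda t: t               # phi(theta) = theta on [0, 1), so <|phi|> = 1/2

def max_avg(theta, N):
    """M_N phi(omega): the largest |(1/n) sum_{j<n} phi(T^j omega)| over 1 <= n <= N."""
    best, s = 0.0, 0.0
    for n in range(1, N + 1):
        s += phi(theta)
        theta = (theta + lam) % 1.0
        best = max(best, abs(s) / n)
    return best

a, N, M = 0.6, 50, 20_000
freq = sum(max_avg(random.random(), N) > a for _ in range(M)) / M
bound = 0.5 / a                 # (1/a) <|phi(.)|> = 5/6
```

The observed frequency is well below the bound; (4.2) is of course not sharp for this particular φ.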

Theorem 4.1 (Birkhoff Ergodic Theorem). Suppose the measure preserving transformation T : Ω → Ω is ergodic. Then for any φ ∈ L¹(Ω) one has

(4.7)  lim_{N→∞} (1/N) ∑_{n=0}^{N−1} φ(T^n ω) = ⟨ φ(·) ⟩ for ω ∈ Ω with probability 1.

Proof. As before we can assume 〈 φ(·) 〉 = 0, in which case we need to prove that

(4.8)  lim_{N→∞} (1/N) ∑_{n=0}^{N−1} φ(T^n ω) = 0 for ω ∈ Ω with probability 1

for all φ ∈ L¹(Ω) satisfying ⟨ φ(·) ⟩ = 0. Arguing as in the von Neumann theorem we see that (4.8) holds for φ = Dψ = [U − I]ψ where ψ(·) is any bounded function, i.e. ‖ψ‖∞ < ∞. Now for φ ∈ L¹(Ω) with ⟨ φ(·) ⟩ = 0, and any ε > 0, there exists ψε with ‖ψε‖∞ < ∞ and ‖φ − Dψε‖₁ < ε. To see this note first that there exists φε ∈ L²(Ω) such that ‖φ − φε‖₁ < ε/4 and ⟨ φε(·) ⟩ = 0. Second observe from the argument of Theorem 3.1 that there exists ψ̃ε ∈ L²(Ω) such that ‖φε − Dψ̃ε‖₂ < ε/4, and finally that there exists ψε with ‖ψε‖∞ < ∞ and ‖ψε − ψ̃ε‖₂ < ε/4.

To complete the proof we write φ = Dψε + [φ − Dψε] with ‖ψε‖∞ < ∞ and ‖φ − Dψε‖₁ < ε. In that case we have

(4.9)  sup_{N₀≤N≤N₁} | (1/N) ∑_{n=0}^{N−1} φ(T^n ω) | ≤ 2‖ψε‖∞/N₀ + M_{N₁}[φ − Dψε](ω), ω ∈ Ω.

Observe now that for any φ ∈ L¹(Ω) and ω ∈ Ω the sequence M_N φ(ω), N = 1, 2, .., is positive increasing. It follows therefore from Proposition 4.1 that the set Ea(φ),

(4.10)  Ea(φ) = ∪_{N=1}^∞ { ω ∈ Ω : M_N φ(ω) > a }

satisfies the inequality

(4.11)  P( Ea(φ) ) ≤ (1/a) ⟨ |φ(·)| ⟩ .


From (4.9) it follows that

(4.12)  sup_{N₀≤N<∞} | (1/N) ∑_{n=0}^{N−1} φ(T^n ω) | ≤ 2‖ψε‖∞/N₀ + a, N₀ ≥ 1, ω ∈ Ω − Ea([φ − Dψε]).

Taking a = √ε in (4.12) we conclude that

(4.13)  lim sup_{N→∞} | (1/N) ∑_{n=0}^{N−1} φ(T^n ω) | ≤ √ε, ω ∈ Ωε,

where P (Ωε) ≥ 1−√ε. The result follows from (4.13).
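For a concrete illustration of Theorem 4.1, the hypothetical sketch below (not part of the notes) takes T to be the irrational rotation of Example I with φ(θ) = θ, whose spatial mean is ⟨φ(·)⟩ = 1/2; the time average along a single orbit approaches 1/2.

```python
import math

lam = math.sqrt(2) - 1     # irrational, so T(theta) = theta + lam (mod 1) is ergodic
theta, total, N = 0.3, 0.0, 100_000
for _ in range(N):
    total += theta         # phi(theta) = theta, spatial mean <phi> = 1/2
    theta = (theta + lam) % 1.0
avg = total / N            # time average along one orbit
```

Since φ here is not an eigenfunction, the convergence is not seen from a geometric sum as in Example I; it is exactly what the Birkhoff theorem provides.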

The Birkhoff theorem enables us to prove a yet stronger version of the SLLN than Corollary 2.1.

Corollary 4.1. Let X1, X2, ..., be a sequence of i.i.d. variables with E[ |X1| ] < ∞, and S_N = X1 + ··· + X_N, N = 1, 2, ... Then

(4.14)  lim_{N→∞} S_N/N = E[X1] with probability 1.

Proof. This follows from Proposition 3.2 and Theorem 4.1 on taking φ ∈ L¹(Ω) to be φ(x1, x2, ....) = x1.
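A numerical sanity check of Corollary 4.1 (hypothetical illustration code; the uniform distribution on [0, 2], with E[X1] = 1, is a choice of this sketch):

```python
import random

random.seed(2)
N = 100_000
# i.i.d. variables X_i, here uniform on [0, 2] so that E[X_1] = 1
s = 0.0
for _ in range(N):
    s += random.uniform(0.0, 2.0)
sample_mean = s / N                 # S_N / N
```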

Finally we prove an Lᵖ(Ω) convergence theorem just as we did in the Martingale case, Theorem 2.3. Note however that here there is convergence even when p = 1.

Proposition 4.2. Suppose T : Ω → Ω is measure preserving and ergodic, 1 ≤ p < ∞ and φ ∈ Lᵖ(Ω). Then the convergence (4.7) holds in Lᵖ(Ω).

Proof. We first prove the result for p = 1. Observe that if ‖φ‖∞ < ∞ the L¹(Ω) convergence holds by Theorem 4.1 and the dominated convergence theorem. More generally for φ ∈ L¹(Ω) there exists for any ε > 0 a function φε with ‖φε‖∞ < ∞ and ‖φ − φε‖₁ < ε. Upon writing φ = φε + [φ − φε] we see that

(4.15)  lim sup_{N→∞} ‖ (1/N) ∑_{n=0}^{N−1} φ(T^n ω) − ⟨ φ(·) ⟩ ‖₁ ≤ 2ε .

The result follows by letting ε → 0. Evidently one can generalize this argument to p > 1.

Finally we show that if a transformation is measure preserving and ergodic with respect to two different probability measures, then these measures are mutually singular.

Proposition 4.3. Let (Ω, F) be a space of Borel sets and T : Ω → Ω a measurable transformation on (Ω, F) which is measure preserving and ergodic with respect to two different probability measures P1, P2 on F. Then there exists a set E ∈ F such that P1(E) = 1 and P2(E) = 0.

Proof. Since P1 and P2 are different, there exists a set A ∈ F such that P1(A) ≠ P2(A). Observe now from Birkhoff's theorem that

(4.16)  lim_{N→∞} (1/N) ∑_{n=0}^{N−1} χ_A(T^n ω) = P2(A), ω ∈ Ω − E,


for some set E ∈ F with P2(E) = 0. Since P1(A) ≠ P2(A), Birkhoff's theorem also implies that P1(Ω − E) = 0, whence the result follows.

5. Recurrence Theorems

We have already introduced the notion of recurrence in Chapter I in the context of random walk on the integers Z. Here we more or less generalize this to arbitrary probability spaces (Ω, F, P) with a measure preserving mapping T : Ω → Ω. Thus for any ω ∈ Ω consider the sequence T^n ω, n = 0, 1, 2, ... We can think of this sequence, in terms of the discrete dynamics ω → Tω, ω ∈ Ω, as the dynamical trajectory started at ω. We say that T is recurrent on a set A ∈ F with P(A) > 0 if { ω ∈ A : there exists n(ω) ≥ 1 and T^{n(ω)}ω ∈ A } has the same measure as A. Thus with probability 1 a trajectory started in A returns at some later time to A. Note that if T is recurrent on A then T recurs infinitely often to A with probability 1.

Proposition 5.1 (Poincaré Recurrence Theorem). Suppose T : Ω → Ω is measure preserving and A ∈ F has P(A) > 0. Then T is recurrent on A. If in addition T is ergodic then { ω ∈ Ω : there exists n(ω) ≥ 1 and T^{n(ω)}ω ∈ A } has probability 1.

Proof. Let B = { ω ∈ A : for all n ≥ 1, T^n ω ∉ A } and observe that the sets T^{−n}B, n = 0, 1, 2, .., are disjoint. Since they all have measure P(B) and their total measure cannot exceed 1, it follows that P(B) = 0.

Let us assume now that T is ergodic. For 0 ≤ m < ∞ let A_m be the sets

(5.1)  A_m = ∪_{n≥m} T^{−n}A ,

and observe that

(5.2)  A₀ = { ω ∈ Ω : there exists n(ω) ≥ 0 with T^{n(ω)}ω ∈ A } .

Evidently the A_m are a decreasing sequence of sets and T^{−1}A_m = A_{m+1}, m ≥ 0. Since T is measure preserving it follows that P(A_m) = P(A_{m+1}), whence we conclude that P(A_m − A_{m+1}) = 0. Observe now that

(5.3)  A_∞ = ∩_{m=0}^∞ A_m = A₀ − ∪_{m=0}^∞ [A_m − A_{m+1}] ,

whence P(A_∞) = P(A₀) ≥ P(A) > 0. The result now follows from the fact that T^{−1}A_∞ = A_∞, i.e. A_∞ is an invariant set, so ergodicity implies P(A_∞) = 1, and so P(A₀) = 1.
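Proposition 5.1 can be illustrated numerically. In the hypothetical sketch below (not part of the notes) T is the irrational rotation of Example I, which is ergodic, and A = [0, 0.1); every sampled starting point of A returns to A in finitely many steps.

```python
import math

lam = math.sqrt(2) - 1            # irrational, so the rotation is ergodic
in_A = lambda t: t < 0.1          # A = [0, 0.1), P(A) = 0.1

def return_time(theta, max_steps=10_000):
    """Smallest n >= 1 with T^n(theta) in A, or None if none found."""
    for n in range(1, max_steps + 1):
        theta = (theta + lam) % 1.0
        if in_A(theta):
            return n
    return None

# starting points 0, 0.001, ..., 0.099 all lie in A and all return to A
times = [return_time(k / 1000.0) for k in range(100)]
all_return = all(t is not None for t in times)
```

For a rotation the return times to an interval are in fact uniformly bounded, a stronger statement than recurrence itself.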

Proposition 5.1 enables us to define a recurrence time τA associated with a setA of positive probability by

(5.4)  τ_A(ω) = inf{ n ≥ 1 : T^n ω ∈ A } , ω ∈ A,

which has the property τ_A < ∞ with probability 1 on A. There is a beautiful formula relating the conditional expectation E[τ_A | A] and P(A) > 0.

Proposition 5.2. For measure preserving ergodic T : Ω→ Ω, there is the identity

(5.5) E[τA | A] = 1/P (A) .


Proof. We use the formula

(5.6)  E[τ_A | A] = (1/P(A)) ∑_{k=1}^∞ P( { ω ∈ A : τ_A(ω) ≥ k } ).

Observe now that

(5.7)  { ω ∈ A : τ_A(ω) ≥ k } = { ω ∈ A : Tω ∉ A, T²ω ∉ A, ...., T^{k−1}ω ∉ A } .

Furthermore one has for any m ≥ 1,

(5.8)  T^{−m}{ ω ∈ A : Tω ∉ A, T²ω ∉ A, ...., T^{k−1}ω ∉ A } = { ω ∈ Ω : T^m ω ∈ A, T^{m+1}ω ∉ A, ...., T^{m+k−1}ω ∉ A } .

Hence by the measure preserving property of T it follows that

(5.9)  P( { ω ∈ A : τ_A(ω) ≥ k } ) = P( { ω ∈ Ω : T^m ω ∈ A, T^{m+1}ω ∉ A, ...., T^{m+k−1}ω ∉ A } ) ,

which in turn implies that for any N ≥ 1,

(5.10)  ∑_{k=1}^{N+1} P( { ω ∈ A : τ_A(ω) ≥ k } ) = ∑_{k=1}^{N+1} P( { ω ∈ Ω : T^{N−(k−1)}ω ∈ A, T^{N−(k−2)}ω ∉ A, ...., T^N ω ∉ A } ) = P( { ω ∈ Ω : T^n ω ∈ A for some n, 0 ≤ n ≤ N } ).

The result follows now from Proposition 5.1 since it implies that

(5.11)  lim_{N→∞} P( { ω ∈ Ω : T^n ω ∈ A for some n, 0 ≤ n ≤ N } ) = 1.
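Formula (5.5) is often called Kac's formula, and it is easy to test numerically. The hypothetical sketch below (an illustration, not from the notes) uses the irrational rotation of Example I with A = [0, 0.1), so that 1/P(A) = 10 under Lebesgue measure, samples starting points uniformly from A, and averages the return times.

```python
import math
import random

random.seed(3)
lam = math.sqrt(2) - 1
p_A = 0.1
in_A = lambda t: t < p_A            # A = [0, 0.1)

def tau_A(theta):
    """First return time of theta in A to A under T(theta) = theta + lam (mod 1)."""
    n = 0
    while True:
        theta = (theta + lam) % 1.0
        n += 1
        if in_A(theta):
            return n

M = 20_000
mean_tau = sum(tau_A(random.uniform(0.0, p_A)) for _ in range(M)) / M
```

The sample mean settles near 10 = 1/P(A), independently of the particular rotation number, exactly as the Remark below observes.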

Remark 1. Observe that the formula (5.5) does not depend on the mapping T. Thus it is stating that a dynamical trajectory of T samples the space Ω in an equidistributed way, i.e. the proportion of time it spends in a set A is P(A).

Evidently from Proposition 5.1 we may define a sequence of recurrence times τ_{A,m}, m = 1, 2, .., on A, where ∑_{j=1}^m τ_{A,j}(ω) is the mth recurrence time of the sequence T^n ω, n = 1, 2, .., to A. It seems clear that the variables τ_{A,m}, m = 1, 2, .., should all have the same distribution. To prove this is actually a little tricky, so we will just show that τ_{A,1} and τ_{A,2} have the same distribution.

Lemma 5.1. Suppose T : Ω → Ω is measure preserving ergodic and A ⊂ Ω has P(A) > 0. Then the variables τ_{A,1} and τ_{A,2} on A, with conditional probability measure P_A where P_A(B) = P(B)/P(A), B ⊂ A, have the same distribution.

Proof. We have now that

(5.12)  τ_{A,1}(ω) = inf{ n ≥ 1 : T^n ω ∈ A }, ω ∈ A,
        τ_{A,2}(ω) = inf{ n ≥ 1 : T^{n+τ_{A,1}(ω)}ω ∈ A }, ω ∈ A,

and we need to show that

(5.13)  P( { ω ∈ A : τ_{A,1}(ω) = n } ) = P( { ω ∈ A : τ_{A,2}(ω) = n } ) , n ≥ 1.


To see this we write

(5.14)  P( { ω ∈ A : τ_{A,2}(ω) = n } ) = ∑_{k=1}^∞ P( { ω ∈ A : τ_{A,2}(ω) = n, τ_{A,1}(ω) = k } ) .

Now we have that

(5.15)  P( { ω ∈ A : τ_{A,2}(ω) = n, τ_{A,1}(ω) = k } ) = P( { ω ∈ A : Tω ∉ A, ...., T^{k−1}ω ∉ A, T^k ω ∈ A, T^{k+1}ω ∉ A, ...., T^{k+n−1}ω ∉ A, T^{k+n}ω ∈ A } ) .

Proceeding as in Proposition 5.2, let N ≥ n+ k, whence (5.15) implies that

(5.16)  P( { ω ∈ A : τ_{A,2}(ω) = n, τ_{A,1}(ω) = k } ) = P( { ω ∈ Ω : T^{N−k−n}ω ∈ A, T^{N−k−n+1}ω ∉ A, ...., T^{N−n−1}ω ∉ A, T^{N−n}ω ∈ A, T^{N−n+1}ω ∉ A, ...., T^{N−1}ω ∉ A, T^N ω ∈ A } ) .

We conclude that

(5.17)  ∑_{k=1}^{N−n} P( { ω ∈ A : τ_{A,2}(ω) = n, τ_{A,1}(ω) = k } ) = P( { ω ∈ Ω : T^j ω ∈ A for some j, 0 ≤ j < N − n, T^{N−n}ω ∈ A, T^{N−n+1}ω ∉ A, ...., T^{N−1}ω ∉ A, T^N ω ∈ A } ) .

Evidently the RHS of (5.17) is the same as

(5.18)  P( { ω ∈ Ω : T^{N−n}ω ∈ A, T^{N−n+1}ω ∉ A, ...., T^{N−1}ω ∉ A, T^N ω ∈ A } ) − P( { ω ∈ Ω : T^j ω ∉ A for all j, 0 ≤ j < N − n, T^{N−n}ω ∈ A, T^{N−n+1}ω ∉ A, ...., T^{N−1}ω ∉ A, T^N ω ∈ A } ) .

Using the measure preserving property of T, one sees that the expression in (5.18) is the same as

(5.19)  P( { ω ∈ A : Tω ∉ A, ...., T^{n−1}ω ∉ A, T^n ω ∈ A } ) − P( { ω ∈ Ω : T^j ω ∉ A for all j, 0 ≤ j < N − n, T^{N−n}ω ∈ A, T^{N−n+1}ω ∉ A, ...., T^{N−1}ω ∉ A, T^N ω ∈ A } ) .

Proposition 5.1 implies that

(5.20)  lim_{N→∞} P( { ω ∈ Ω : T^j ω ∉ A for all j, 0 ≤ j < N − n } ) = 0,

whence (5.13) follows from (5.14), (5.17), (5.19) and (5.20).
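Lemma 5.1 is also easy to test numerically. The hypothetical sketch below (not part of the notes) samples starting points from P_A for the irrational rotation of Example I with A = [0, 0.1), records the first and second return times, and compares their empirical distributions.

```python
import math
import random
from collections import Counter

random.seed(4)
lam = math.sqrt(2) - 1
in_A = lambda t: t < 0.1            # A = [0, 0.1)

def next_return(theta):
    """Iterate the rotation until the orbit re-enters A; return (steps, landing point)."""
    n = 0
    while True:
        theta = (theta + lam) % 1.0
        n += 1
        if in_A(theta):
            return n, theta

M = 20_000
c1, c2 = Counter(), Counter()
for _ in range(M):
    theta = random.uniform(0.0, 0.1)    # a sample from the conditional measure P_A
    t1, theta = next_return(theta)      # tau_{A,1}
    t2, theta = next_return(theta)      # tau_{A,2}
    c1[t1] += 1
    c2[t2] += 1

# largest difference between the two empirical distributions
gap = max(abs(c1[n] - c2[n]) / M for n in set(c1) | set(c2))
```

Note that the landing point after the first return is not distributed as P_A, so the equality of the two distributions is genuinely the content of the lemma rather than an obvious restart property.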

We can actually make a stronger statement about the sequence of variables τ_{A,m}, m = 1, 2, .., than that they all have the same distribution. To do this we introduce a new notion:

Definition 4. Let X1, X2, ...., be a sequence of (not necessarily independent) variables on a probability space (Ω, F, P) and let (Ω, F, P) be the associated coordinate space as in Proposition 3.2. If the shift operator (3.42) is measure preserving on Ω then the sequence X1, X2, ...., is said to be stationary. If the shift operator is also ergodic, the stationary sequence is said to be ergodic.


Evidently if X1, X2, ...., is stationary then all the variables Xj, j ≥ 1, have the same distribution. Stationarity of the sequence implies much more than this, in fact that for any n ≥ 0, all the vector variables (Xj, Xj+1, .., Xj+n), j ≥ 1, have the same distribution. One can generalize Lemma 5.1 to see that the sequence of variables τ_{A,m}, m = 1, 2, .., on (A, P_A) is stationary. We shall assume this and further show that the shift operator T on this sequence is also ergodic.

Proposition 5.3. Let T : Ω → Ω be measure preserving and ergodic, and A ⊂ Ω satisfy P(A) > 0. Then the sequence of recurrence times τ_{A,m}, m = 1, 2, .., on A is stationary and ergodic.

Proof. Observe first that if the sequence τ_{A,m}, m = 1, 2, .., on A is stationary and ergodic, then Proposition 5.2 and the Birkhoff theorem imply that

(5.21)  lim_{N→∞} (1/N) ∑_{m=1}^N τ_{A,m}(ω) = 1/P(A), ω ∈ A with probability 1.

We shall first prove (5.21) directly. To do this let R_N be the Nth recurrence time on A,

(5.22)  R_N(ω) = ∑_{k=1}^N τ_{A,k}(ω) , ω ∈ A.

Evidently we have that

(5.23)  ∑_{j=1}^{R_N(ω)} χ_A(T^j ω) = N, ω ∈ A,

where χ_A is the characteristic function for A. Since inf{ R_N(ω) : ω ∈ A } ≥ N, N = 1, 2, ..., it follows from the Birkhoff theorem that

(5.24)  lim_{N→∞} (1/R_N(ω)) ∑_{j=1}^{R_N(ω)} χ_A(T^j ω) = P(A) , ω ∈ A with probability 1.

We can rewrite (5.24) as

(5.25)  lim_{N→∞} R_N(ω)/N = 1/P(A), ω ∈ A with probability 1,

and this is the same as (5.21).

We may extend this argument to prove ergodicity of the shift operator on the sequence τ_{A,m}, m = 1, 2, ... To do this we recall the criterion (3.26) for ergodicity. A condition which implies this is the following:

(5.26)  lim_{N→∞} (1/N) ∑_{m=0}^{N−1} φ(T^m ·) = ⟨ φ(·) ⟩ , with probability 1 for any φ ∈ L∞(Ω) .

Note that (5.21) is simply (5.26) for the function φ : Z_+^∞ → R defined by φ(τ1, τ2, ...) = τ1, where Z_+^∞ is the infinite product of the positive integers. Of course this particular function φ is not in L∞(Ω). Now given a function φ : Z_+^∞ → R, we can construct another function ψ : {0, 1}^∞ → R such that

(5.27) φ(τ1, τ2, ...) = ψ(0, 0, .., 1, 0, 0, .., 1, ...) ,


where the first 1 on the RHS occurs at the τ1th position, the second 1 at the (τ1 + τ2)th position, etc. It is evident that

(5.28)  φ(τ_{A,1}(ω), τ_{A,2}(ω), ...) = ψ(χ_A(Tω), χ_A(T²ω), χ_A(T³ω), ......) , ω ∈ A.

More generally one has that for k ≥ 1,

(5.29)  φ(τ_{A,k}(ω), τ_{A,k+1}(ω), ...) = ψ(χ_A(T^{j+1}ω), χ_A(T^{j+2}ω), χ_A(T^{j+3}ω), ......) , ω ∈ A, j = R_{k−1}(ω) ,

where we set R₀ ≡ 0. From the ergodicity of T and the Birkhoff theorem we have that

(5.30)  lim_{N→∞} (1/R_N(ω)) ∑_{j=0}^{R_N(ω)} χ_A(T^j ω) ψ(χ_A(T^{j+1}ω), χ_A(T^{j+2}ω), χ_A(T^{j+3}ω), ......) = P(A) E[ ψ(χ_A(Tω), χ_A(T²ω), χ_A(T³ω), ......) | A ] , ω ∈ A with probability 1.

The limit (5.26) follows now from (5.25), (5.30) and the identity derived from (5.29),

(5.31)  ∑_{k=1}^{N+1} φ(τ_{A,k}(ω), τ_{A,k+1}(ω), ...) = ∑_{j=0}^{R_N(ω)} χ_A(T^j ω) ψ(χ_A(T^{j+1}ω), χ_A(T^{j+2}ω), χ_A(T^{j+3}ω), ......) , ω ∈ A .
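The key limit (5.25), R_N(ω)/N → 1/P(A), can be observed along a single orbit. A hypothetical sketch (not part of the notes), again for the irrational rotation of Example I with A = [0, 0.1):

```python
import math

lam = math.sqrt(2) - 1     # irrational rotation, ergodic for Lebesgue measure
in_A = lambda t: t < 0.1   # A = [0, 0.1), so 1/P(A) = 10

theta, visits, steps = 0.05, 0, 0   # start inside A
N = 2_000                           # run until the N-th return to A
while visits < N:
    theta = (theta + lam) % 1.0
    steps += 1
    if in_A(theta):
        visits += 1
ratio = steps / N                   # R_N / N
```

The ratio settles near 10 = 1/P(A), which is (5.21) for this orbit.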

References

[1] W. Parry, Topics in Ergodic Theory, Cambridge University Press, Cambridge 1981.

[2] M. Reed and B. Simon, Methods of Modern Mathematical Physics I: Functional Analysis, Academic Press, London 1980.

University of Michigan, Department of Mathematics, Ann Arbor, MI 48109-1109

E-mail address: [email protected]