
An Introduction to Probability Theory: Outline

1. Definitions: sample space and (measurable) random variables

2. σ-algebras

3. Expectation (integration)

4. Conditional expectation

5. Useful inequalities

6. Independent random variables

7. The central limit theorem

8. Laws of large numbers: Borel-Cantelli lemma

9. Uniform integrability

10. Kolmogorov’s extension theorem for consistent finite-dimensional distributions


Sample space and events

• Consider a random experiment resulting in an outcome (or “sample”), ω.

• E.g., the experiment is a pair of dice thrown onto a table and the outcome is the exact orientation of the dice and their position on the table when they stop moving.

• The space of all outcomes, Ω, is called the sample space, i.e., ω ∈ Ω.

• An event is merely a subset of Ω, e.g., “the sum of the dots on the upward facing surfaces of the dice is 7”.

• We say that an event A ⊆ Ω has occurred if the outcome ω of the random experiment belongs to A, i.e., ω ∈ A, so

– events A and B occurred if ω ∈ A ∩ B, and

– events A or B occurred if ω ∈ A ∪ B.

• A sample space Ω is an abstract, unordered set in general.

• Let F be the set of events, i.e., A ∈ F ⇒ A ⊆ Ω.
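
• Illustration (not from the original slides): a minimal Python sketch of these definitions for two dice, where each outcome ω is reduced to the ordered pair of upward faces; the names Omega, event_sum7 and P are introduced only for this example.

    from itertools import product
    from fractions import Fraction

    # Sample space: each outcome omega is an ordered pair of upward faces.
    Omega = list(product(range(1, 7), repeat=2))

    # Events are simply subsets of Omega.
    event_sum7 = {omega for omega in Omega if sum(omega) == 7}
    event_first_even = {omega for omega in Omega if omega[0] % 2 == 0}

    def P(A):
        # Uniform probability measure on the finite sample space.
        return Fraction(len(A), len(Omega))

    print(P(event_sum7))                     # 1/6
    print(P(event_sum7 & event_first_even))  # "A and B" occurred: omega in A ∩ B
    print(P(event_sum7 | event_first_even))  # "A or B" occurred:  omega in A ∪ B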


Probability on a sample space

• A probability measure P maps each event A ⊆ Ω to a real number between zero and one inclusive, i.e., P(A) ∈ [0,1].

• A probability measure has certain properties:

1. P(Ω) = 1, and

2. P(A) = 1 − P(Aᶜ) for all events A, where Aᶜ = {ω ∈ Ω | ω ∉ A} is the complement of A.

• Moreover, if the events {A_i}_{i=1}^n are disjoint (i.e., A_i ∩ A_j = ∅ for all i ≠ j), then

P(⋃_{i=1}^n A_i) = Σ_{i=1}^n P(A_i),

i.e., P is finitely additive.

• Formally, a probability measure is defined to be countably additive:

3. For any disjoint {A_i}_{i=1}^∞,

P(⋃_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i).


Probability measures on σ-algebras

• On large sample spaces Ω (e.g., Ω = R), a formal probability measure may be impossible to construct if all subsets of Ω are defined as events, i.e., if F = 2^Ω (the power set of Ω); cf. Caratheodory’s extension theorem.

• So, the set of events F is restricted to a σ-algebra (or σ-field) of subsets of Ω formally satisfying the following properties:

1. Ω ∈ F (possesses the intersection identity)

2. if A ∈ F then Aᶜ ∈ F (closed under complementation)

3. if A_1, A_2, A_3, ... ∈ F, then ⋂_{n=1}^∞ A_n ∈ F (closed under countable intersections)

• The probability measure P is defined only on the σ-algebra F ⊆ 2^Ω,

P : F → [0,1].

• We have thus identified a fundamental probability (measure) space: (Ω, F, P).

• Note: Equivalently, by De Morgan’s theorem, one can use ∅ ∈ F (the union identity) and closure under countable unions instead of conditions 1 and 3 above.


Conditioned events

• The probability that A occurred conditioned on (or “given that”) another event B occurred is P(A|B) := P(A ∩ B)/P(B), where P(B) > 0 is assumed.

• A group of events A_1, A_2, ..., A_n are said to be mutually independent if

P(⋂_{i∈I} A_i) = ∏_{i∈I} P(A_i)  ∀ I ⊆ {1, 2, ..., n}.

• Note if events A and B are independent and P(B) > 0, then P(A|B) = P(A), i.e., knowledge that the event B has occurred has no bearing on the probability that the event A has occurred as well.

• Given that B has occurred with P(B) > 0:

– The set of events F_B := {A ∩ B | A ∈ F} is itself a σ-algebra, and

– P(·|B), also a probability measure for (Ω, F) and (B, F_B), addresses the residual uncertainty in the random experiment given that the event B has occurred.

– On (Ω, F), P(A) = 0 ⇒ P(A|B) = 0 ∀A ∈ F, i.e., P(·|B) is absolutely continuous w.r.t. P.


Random variables

• A random variable X is a real-valued function with domain Ω, X : Ω → R.

• So, X(ω) is a real number representing some feature of the outcome ω.

• E.g., in a dice-throwing experiment, X(ω) could be defined as just the sum of the dots on the upward-facing surfaces of outcome ω (which is the configuration of the dice on the table when they stop moving).

• For random variables, we are typically interested in the probability of the event that X takes values in a contiguous interval B of the real line (including singleton points), or some union of such intervals, i.e.,

P(X ∈ B) := P({ω ∈ Ω | X(ω) ∈ B}) =: P(X⁻¹(B)).

• To ensure that the fundamental probability space (Ω, F, P) is capable of evaluating the probabilities of such events, we formally define random variables as being measurable.

• To explain measurability, we need to first define the Borel σ-algebra of subsets of R that is generated by contiguous intervals of R.


The Borel σ-algebra on R

• Consider contiguous intervals of the real line, e.g.,

[x, ∞) = {z ∈ R | z ≥ x} or (x, y] = {z ∈ R | x < z ≤ y}, etc.

• Define σ(A) as the smallest σ-algebra containing all elements of the collection A, i.e., the σ-algebra generated by A.

• The Borel σ-algebra is

B := σ([x, ∞) | x ∈ R).

• Note that the singleton sets {x} ∈ B ∀x ∈ R and that, e.g.,

B = σ((−∞, x] | x ∈ R) = σ([x, y) | x ≤ y, x, y ∈ Q), etc.

To see the first equality, note that [x, ∞)ᶜ = (−∞, x) and that we can define a monotonically decreasing sequence x_n converging to x so that ⋂_{n=1}^∞ (−∞, x_n) = (−∞, x].

• The Vitali subset of R is not in the Borel σ-algebra, i.e., B ≠ 2^R.

• Indeed, the cardinality of B is only that of R.


Measurability of random variables

• Formally, random variables are defined to be measurable with respect to (Ω, F), i.e.,

X⁻¹(B) ∈ F ∀B ∈ B, so that P(X ∈ B) is well-defined ∀B ∈ B.

• A random variable X induces a probability measure P_X on (R, B) (the distribution of X),

P_X(B) := P(X ∈ B),

so that (R, B, P_X) is also a probability space.

• Note: If a function g : R → R is (R, B)-measurable (i.e., g⁻¹(B) ∈ B ∀B ∈ B) and X is a random variable, then g(X) is a random variable too.

• Note: The cumulative distribution function (CDF) of X is just F_X(x) := P_X((−∞, x]) = P(X ≤ x).


Measurable compositions of random variables

If X, Y and X_1, X_2, X_3, ... are all extended random variables, then the following are also random variables:

• min{X, Y}, max{X, Y}, XY, 1{X ≠ 0}/X where 0/0 := 1.

• αX + βY ∀α, β ∈ R.

• sup_{n≥1} X_n, inf_{n≥1} X_n, lim sup_{n→∞} X_n, lim inf_{n→∞} X_n.


σ-algebra generated by a random variable

• Define σ(X) := σ({X⁻¹(B) | B ∈ B}), i.e., the smallest σ-algebra of events for which the random variable X is measurable.

• Note: One can directly show that

σ(X) = σ({X⁻¹([x, ∞)) | x ∈ R}),

i.e., it suffices to consider only a “generating” subset of B.

• σ(X) captures the “information” gained by the knowledge of X(ω) about the outcome ω.

• E.g., if X is constant then σ(X) = {∅, Ω}.

• E.g., if X = 1_B for an event B ∈ F,

– where the indicator function 1_B(ω) = 1 if ω ∈ B and 1_B(ω) = 0 otherwise (also written 1{B}),

– i.e., X is a Bernoulli distributed random variable,

– then σ(X) = {∅, B, Bᶜ, Ω}.

– If the scalars a ≠ b, then Y := a·1_B + b·1_{Bᶜ} also indicates whether B or Bᶜ has occurred,

– i.e., σ(X) = σ(Y) in this case.

• Doob’s Theorem: If Y is σ(X)-measurable (so that σ(Y) ⊆ σ(X)), then ∃ a Borel measurable g such that Y = g(X) a.s.

• If g is one-to-one, then σ(Y) = σ(X).


Independent random variables

• The random variables X_1, X_2, ..., X_n are said to be mutually independent (or just “independent”) if and only if

P(⋂_{i=1}^n {X_i ∈ B_i}) = ∏_{i=1}^n P(X_i ∈ B_i)  ∀ B_1, ..., B_n ∈ B.

• Clearly mutual independence implies that the joint CDF

F_{X_1,X_2,...,X_n}(x_1, x_2, ..., x_n) := P(⋂_{i=1}^n {X_i ≤ x_i})

satisfies

F_{X_1,X_2,...,X_n}(x_1, x_2, ..., x_n) = ∏_{i=1}^n F_{X_i}(x_i)  ∀ x_1, ..., x_n ∈ R.

• To prove the converse statement, we will now discuss monotone class theorems.

• Note: The marginal F_{X_1}(x_1) = F_{X_1,...,X_n}(x_1, ∞, ∞, ..., ∞).


Monotone class theorems

• C ⊆ 2^Ω is a π-class over Ω if A, B ∈ C ⇒ A ∩ B ∈ C.

• C ⊆ 2^Ω is a λ-class over Ω when:

(i) Ω ∈ C;

(ii) if A, B ∈ C and A ⊆ B, then B\A := B ∩ Aᶜ ∈ C; and

(iii) if A_1, A_2, ... ∈ C is monotonically increasing (A_n ⊆ A_{n+1} ∀n), then ⋃_{n=1}^∞ A_n ∈ C.

• Proposition: If C is a λ-class over Ω, then

(a) A ∈ C ⇒ Aᶜ ∈ C (by (i) and (ii), Ω\A ≡ Aᶜ ∈ C).

(b) if A, B ∈ C and A ∩ B = ∅ (i.e., they are disjoint), then A ∪ B ∈ C (A ⊆ Bᶜ ⇒ (Bᶜ\A)ᶜ ≡ B ∪ A ∈ C by (ii) and (a)).

(c) if C is also a π-class, then C is a σ-algebra.

The proof of (c) is left as an exercise.

• Because of the restrictions in (ii) and (b), a λ-class appears less inclusive than a σ-algebra.


Dynkin’s theorem

If D is a π-class, C a λ-class and D ⊆ C, then σ(D) ⊆ C.

Proof:

• Define G as the smallest λ-class such that D ⊆ G; thus D ⊆ G ⊆ C.

• We now prove G is also a π-class; the theorem then follows by the previous proposition (c).

• To this end, define H := {A ⊆ Ω | A ∩ D ∈ G ∀D ∈ D}:

– Since D ⊆ G and D is a π-class, D ⊆ H.

– Check that H is a λ-class ⇒ (by minimality) G ⊆ H.

– Thus, A ∈ G (⇒ A ∈ H) and D ∈ D ⇒ A ∩ D ∈ G.

• Now define F := {B ⊆ Ω | B ∩ A ∈ G ∀A ∈ G}:

– By the previous step, D ⊆ F.

– Check that F is a λ-class ⇒ (by minimality) G ⊆ F.

– Thus, G is a π-class.

Note: So, a λ-class can be much larger than a π-class.

Classical monotone class theorem: if D is an algebra, C a monotone class (contains all limits of its monotone sequences) and D ⊆ C, then σ(D) ⊆ C.


Independence in probability space (Ω, F, P)

• Lemma: If D, C ⊆ F are independent classes of events and D is a π-class, then σ(D) and C are independent.

Proof:

– Take an arbitrary B ∈ C and define D_B = {A ∈ σ(D) | P(A ∩ B) = P(A)P(B)}.

– D ⊆ D_B.

– Check D_B is a λ-class.

– Apply Dynkin’s theorem.

• Theorem: If the joint CDF F_{X_1,...,X_n} ≡ ∏_{i=1}^n F_{X_i} (i.e., the LHS and RHS are equal at all points in Rⁿ), then the n random variables X_1, ..., X_n are independent.

Proof:

– Define D_k = {{X_k ≤ x} | −∞ ≤ x ≤ ∞}.

– Note: x = −∞ ⇒ ∅ ∈ D_k.

– Check that D_k is a π-class ∀k.

– Finally, use the lemma to obtain independence of the σ(D_k) = σ(X_k).


Conditional Independence

• Events A and C are said to be independent given B if

P(A | B,C) = P(A | B).

• Note that this implies P(C | B,A) = P(C | B).

• This is a natural extension of the unqualified notion of independent events, i.e., events A and C are (unconditionally) independent if P(A | C) = P(A).

• Similarly, random variables X and Y are conditionally independent given Z if

P(X ∈ A | Z ∈ B, Y ∈ C) = P(X ∈ A | Z ∈ B)

for all Borel A, B, C ⊆ R; cf. Markov processes.


Expectation

• The expectation EX of a random variable X is simply its average or mean value, which can be expressed as the Riemann-Stieltjes integral:

EX = ∫_{−∞}^{∞} x dF_X(x);

recall that the CDF F_X is nondecreasing on R.

• In the special case of a differentiable F_X with probability density function (PDF) f_X = F_X′, i.e., X is continuously distributed, we can use the Riemann integral

EX = ∫_{−∞}^{∞} x f_X(x) dx.

• In the case of a discretely distributed random variable X with countable state-space R_X,

– F_X′(x) = Σ_{α∈R_X} p_X(α) δ(x − α), where

– δ is the Dirac unit impulse and the probability mass function (PMF) p_X(α) := P(X = α) > 0 for all α ∈ R_X, so that

EX = Σ_{α∈R_X} α p_X(α).


Lebesgue integration

• Formally, the Lebesgue integral is used to define expectation:

EX = ∫_Ω X(ω) dP(ω) = ∫_Ω X dP.

• Recall that Ω is generally abstract, unordered.

• If X is simple (discretely distributed with |R_X| = M < ∞),

– define the state-space R_X = {α_1, α_2, ..., α_M} and

– the events A_i := {X = α_i} := {ω ∈ Ω | X(ω) = α_i}, which a.s. partition Ω,

– so that the (well-defined) Lebesgue integral is

∫_Ω X(ω) dP(ω) = Σ_{i=1}^M α_i P(A_i),

i.e., P(A_i) = p_X(α_i) for all i.


Lebesgue integration (cont)

• To develop the general Lebesgue integral, we need to consider “extended” random variables X : Ω → R̄, where

– the extended reals R̄ := R ∪ {±∞}, and

– measurability involves Borel sets that include ±∞.

• For any sequence X_1, X_2, X_3, ... of random variables, the following are extended random variables:

– lim_{n→∞} X_n (assuming the limit in n of X_n(ω) exists ∀ω ∈ Ω) and

– sup_n X_n.


Approximating random variables

• Proposition: For any non-negative extended random variable X, there is a sequence of simple random variables X_n such that

(a) P(0 ≤ X_n ≤ X_{n+1} ≤ X) = 1 for all n (i.e., monotonicity), and

(b) X_n → X almost surely (a.s.), i.e., almost sure convergence:

P(lim_{n→∞} X_n = X) = 1.

• Proof:

1. Define the nth partition of R+ (i.e., of the y-axis, unlike Riemann integration) into a finite collection of contiguous intervals

{[b_n^k, b_n^{k+1})}_{k=0}^{K_n},

where ∀n: b_n^0 = 0 and b_n^{K_n+1} = ∞+ (the last interval includes ∞).

2. ∀k, define X_n(ω) = b_n^k ∀ω ∈ X⁻¹([b_n^k, b_n^{k+1})), i.e., X_n ≤ X a.s.

3. The (n+1)st partition is finer than the nth (so that X_n ≤ X_{n+1}), with K_n ↑ ∞ and the partition mesh shrinking as n → ∞, to achieve (b).
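
• Illustration (not in the original slides): a minimal numerical sketch of one standard choice of such partitions, the dyadic partition b_n^k = k/2ⁿ capped at level n, i.e., X_n = min(⌊2ⁿX⌋/2ⁿ, n), assuming X ≥ 0; the helper name simple_approx is introduced only for this example.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.exponential(size=100_000)   # samples of a non-negative random variable

    def simple_approx(x, n):
        # Dyadic simple-function approximation: round x down to the left endpoint
        # of its interval [k/2^n, (k+1)/2^n) and cap at level n, so X_n <= X and
        # X_n is nondecreasing in n.
        return np.minimum(np.floor(x * 2**n) / 2**n, n)

    prev = np.zeros_like(X)
    for n in [1, 2, 4, 8]:
        Xn = simple_approx(X, n)
        assert np.all(Xn <= X) and np.all(prev <= Xn)   # property (a): monotonicity
        print(n, np.mean(X - Xn))                       # mean gap shrinks: property (b)
        prev = Xn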


Construction of the Lebesgue integral

• So, for a non-negative extended random variable, the Lebesgue integral is defined as

∫_Ω X dP = lim_{n→∞} ∫_Ω X_n dP,

i.e., EX = lim_{n→∞} EX_n.

• For a signed extended random variable:

1. Note that X = X⁺ − X⁻ where the non-negative extended random variables

X⁺ := max{0, X} and X⁻ := max{0, −X}.

2. If EX⁺ < ∞ or EX⁻ < ∞, then define the Lebesgue integral

EX = EX⁺ − EX⁻;

otherwise the Lebesgue integral EX is not defined. Note: |X| = X⁺ + X⁻.

• E.g., 1_{Qᶜ} is Lebesgue but not Riemann integrable:

∫_0^1 1_{Qᶜ}(x) dx = 1 · P(Qᶜ ∩ [0,1]) + 0 · P(Q ∩ [0,1]) = 1,

where here P is Lebesgue measure on Ω = [0,1], so that P(Q ∩ [0,1]) = 0 as the rationals are countable.


Integration theorems (1 variable integrand)

Consider a sequence X1, X2, ... of random variables:

• Bounded convergence theorem: if sup_n |X_n| ≤ K < ∞ a.s. (where K is constant) and X = lim_{n→∞} X_n a.s., then EX = lim_{n→∞} EX_n and E|X| ≤ K.

• Proof: Define A_n = {|X − X_n| > ε} for arbitrary positive ε ≪ 1 and note

|EX_n − EX| ≤ E|X_n − X| = E|X_n − X|·1_{A_n} + E|X_n − X|·1_{A_nᶜ} ≤ 2K·P(A_n) + ε,

and P(A_n) → 0 since X_n → X a.s. (hence in probability).

• Now note that

(lim inf_{n→∞} X_n)(ω) := lim_{n→∞} inf_{k≥n} X_k(ω)

always exists (though possibly not finite) since Y_n := inf_{k≥n} X_k is a.s. monotonically nondecreasing in n.

• Fatou’s lemma (for non-negative X_n): lim inf_{n→∞} EX_n ≥ E(lim inf_{n→∞} X_n).

• Proof:

– Let X := lim inf_{n→∞} X_n.

– For any K > 0, invoke the bounded convergence theorem on min{Y_n, K} ↑ min{X, K}.

– Approximating with simple RVs and using monotonicity, lim_{K→∞} E min{X, K} = EX.


Integration theorems (cont)

Let X be the extended RV such that X_n → X a.s.

• Monotone convergence theorem: if 0 ≤ X_n ↑ X a.s., then

lim_{n→∞} EX_n ↑ EX.

• Lebesgue’s dominated convergence theorem: If there exists a random variable Y such that |X_n| ≤ |Y| a.s. ∀n and E|Y| < ∞, then

lim_{n→∞} E|X − X_n| = 0.

• Corollary (Scheffé):

lim_{n→∞} E|X_n − X| = 0 ⇔ lim_{n→∞} E|X_n| = E|X|.


Conditional expected value and event-conditional distributions

Consider a random variable X and an event A such that P(A) > 0.

• The conditional expected value of X given A, denoted µ(X|A), is

µ(X|A) = ∫_{−∞}^{∞} x dF_{X|A}(x), where F_{X|A}(z) := P(X ≤ z | A).

• For a discretely distributed random variable X with R_X = {a_j}_{j=1}^∞,

µ(X|A) = Σ_{j=1}^∞ a_j P(X = a_j | A).

• The conditional PMF of X given A is p_{X|A}(a_j) = P(X = a_j | A) for all j.

• The event-conditional PDF of a continuously distributed X is

f_{X|A}(x) := (d/dx) F_{X|A}(x) ⇒ µ(X|A) = ∫_{−∞}^{∞} x f_{X|A}(x) dx.


Conditional expectation

• Consider now two discretely distributed X and Y .

• The conditional expectation of X given the random variable Y, denoted E(X|Y), is a random variable itself.

• Indeed, suppose {b_j}_{j=1}^∞ = R_Y and, for all samples

ω_j ∈ {ω ∈ Ω | Y(ω) = b_j} =: B_j,

define

E(X|Y)(ω_j) := µ(X|B_j) := µ(X|Y = b_j).

• That is, E(X|Y) maps all samples in the event B_j to the conditional expected value µ(X|B_j), i.e., E(X|Y) is “smoother” (less uncertain) than X.

• Therefore, the random variable E(X|Y) is a.s. a function of Y, i.e., E(X|Y) is σ(Y)-measurable.

• So, E(X|Y) = E(X|Z) a.s. whenever σ(Z) = σ(Y), allowing for differences involving P-null events.


Conditional densities

• Now consider two random variables X and Y which are continuously distributed with joint PDF

f_{X,Y} = ∂²F_{X,Y}/(∂x ∂y).

• For f_Y(y) > 0, we can define the conditional density:

f_{X|Y}(x|y) := f_{X,Y}(x, y)/f_Y(y) for all x ∈ R.

• Note that f_{X|Y}(·|y) is itself a PDF and

µ(X|Y = y) = ∫_{−∞}^{∞} x f_{X|Y}(x|y) dx, even though P(Y = y) = 0.


Conditional expectation and MSE

• In general, E(X|Y) is the function of Y which minimizes the mean-square error (MSE),

E[(X − h(Y))²],

among all (measurable) functions h.

• So, E(X|Y) is the best (mean-square) approximation of X given Y.

• In particular, E(X|Y) and X have the same expectation,

E(E(X|Y)) = EX.

• Note: if X and Y are independent,

E(X|Y) = EX a.s.
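
• Illustration (not in the original slides): a Monte Carlo sketch of the MSE-optimality of E(X|Y). Here Y is uniform on {0,1,2} and X = Y² + noise, so E(X|Y) = Y²; this conditional-mean predictor attains a lower empirical MSE than other candidates h, and the tower property E(E(X|Y)) = EX is also visible.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 200_000
    Y = rng.integers(0, 3, size=n)          # Y uniform on {0, 1, 2}
    X = Y**2 + rng.normal(size=n)           # so E(X | Y) = Y**2 and var(X | Y) = 1

    def mse(h):
        return np.mean((X - h(Y))**2)

    print(mse(lambda y: y.astype(float)**2))         # ~1.0: the conditional mean E(X|Y)
    print(mse(lambda y: y.astype(float)))            # larger MSE: another function of Y
    print(mse(lambda y: np.full(len(y), X.mean())))  # larger still: the constant EX
    print(X.mean(), np.mean(Y.astype(float)**2))     # tower property: both are ~5/3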


Some useful inequalities

• If event A_1 ⊆ A_2, then P(A_1) ≤ P(A_2) = P(A_1) + P(A_2\A_1).

• For any group of events A_1, A_2, ..., A_n, Boole’s inequality holds:

P(⋃_{i=1}^n A_i) ≤ Σ_{i=1}^n P(A_i).

• Note that when the A_i are disjoint, equality holds simply by the additivity property of a probability measure P; recall also the inclusion-exclusion set identities.

• If two random variables X and Y are such that X ≥ Y a.s., then EX ≥ EY.

• Recall Fatou’s lemma.


Markov’s Inequality

• Consider a random variable X with E|X| < ∞ and a real number x > 0.

• Since |X| ≥ |X|·1{|X| ≥ x} ≥ x·1{|X| ≥ x} a.s., we arrive at Markov’s inequality:

E|X| ≥ E[x·1{|X| ≥ x}] = x·E1{|X| ≥ x} = x·P(|X| ≥ x).

• An alternative explanation for continuously distributed random variables X (with PDF f) is

E|X| = ∫_{−∞}^{∞} |z| f(z) dz ≥ ∫_{−∞}^{−x} (−z) f(z) dz + ∫_x^{∞} z f(z) dz ≥ ∫_{−∞}^{−x} x f(z) dz + ∫_x^{∞} x f(z) dz = x·P(|X| ≥ x).
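
• Illustration (not in the original slides): a quick Monte Carlo check of Markov’s inequality P(|X| ≥ x) ≤ E|X|/x, here with exponential samples (E|X| = 2).

    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.exponential(scale=2.0, size=1_000_000)   # non-negative, E|X| = 2

    for x in [1.0, 2.0, 5.0, 10.0]:
        lhs = np.mean(np.abs(X) >= x)   # empirical P(|X| >= x)
        rhs = np.mean(np.abs(X)) / x    # Markov bound E|X| / x
        print(x, lhs, rhs)              # lhs <= rhs (up to Monte Carlo noise)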


Chebyshev’s and Cramér’s Inequalities

• Take x = ε², where ε > 0, and argue Markov’s inequality with (X − EX)² in place of |X| to get Chebyshev’s inequality

var(X) := E[(X − EX)²] ≥ ε² P(|X − EX| ≥ ε),

i.e.,

P(|X − EX| ≥ ε) ≤ ε⁻² var(X).

• Noting that, for all θ > 0, {X ≥ x} = {e^{θX} ≥ e^{θx}} and arguing as for Markov’s inequality gives the Chernoff (or Cramér) inequality:

E e^{θX} ≥ e^{θx} P(X ≥ x) ⇒ P(X ≥ x) ≤ exp(−[xθ − log E e^{θX}]) ∀θ > 0

⇒ P(X ≥ x) ≤ exp(−max_{θ>0} [xθ − log E e^{θX}]),

where we have simply sharpened the bound by maximizing over the free parameter θ > 0.

• Note the Legendre transform of the log moment-generating function of X in the Chernoff bound.


Inequalities of Minkowski, Hölder, and Cauchy-Schwarz-Bunyakovsky

• Minkowski’s inequality: if E|X|^q, E|Y|^q < ∞ for q ≥ 1, then

(E|X + Y|^q)^{1/q} ≤ (E|X|^q)^{1/q} + (E|Y|^q)^{1/q},

i.e., the triangle inequality in the L^q space of random variables.

• Hölder’s inequality: if E|X|^r, E|Y|^q < ∞ for r > 1 and q⁻¹ := 1 − r⁻¹, then

E|XY| ≤ (E|X|^r)^{1/r} (E|Y|^q)^{1/q}.

• CBS inequality (r = q = 2): if EX², EY² < ∞, then

E|XY| ≤ √(E(X²)) √(E(Y²)).

• The CBS inequality is strict unless X = cY a.s. for some constant c or Y = 0 a.s.

• CBS is an immediate consequence of the fact that, whenever X ≠ 0 a.s. and Y ≠ 0 a.s.,

E[ X/√(E(X²)) − Y/√(E(Y²)) ]² ≥ 0.


Jensen’s Inequality

• Note that if we take Y = 1 a.s., the Cauchy-Schwarz-Bunyakovsky inequality simply states that var(X) ≥ 0, i.e.,

E(X²) − (EX)² ≥ 0.

• This is also an immediate consequence of Jensen’s inequality.

• A real-valued function g on R is said to be convex if

g(px + (1−p)y) ≤ p·g(x) + (1−p)·g(y)

for any x, y ∈ R and any real fraction p ∈ [0,1].

• If the inequality is reversed, g is said to be concave.

• For any convex function g and random variable X, we have Jensen’s inequality:

g(EX) ≤ E(g(X)).


Inequalities: Conditioned versions

• These inequalities and integration theorems have straightforward “conditional” extensions.

• E.g., the conditional Jensen’s inequality: if g is convex and G ⊆ F is a σ-algebra,

E(g(X) | G) ≥ g(E(X | G)).

• Applying conditional Jensen’s with g(x) = |x|^q for real q ≥ 1, and using the linearity of conditional expectation, we get the following result.

• If X and X_1, X_2, X_3, ... are random variables such that, for real q ≥ 1, E|X|^q < ∞ and E|X_n|^q < ∞ (i.e., X, X_n ∈ L^q) ∀n, then:

(a) ‖E(X | G)‖_q ≤ ‖X‖_q := (E(|X|^q))^{1/q}, and, therefore,

(b) if lim_{n→∞} ‖X − X_n‖_q = 0, i.e., convergence in L^q, then

lim_{n→∞} ‖E(X | G) − E(X_n | G)‖_q = 0.


Sums of independent random variables

• Consider two independent random variables X_1 and X_2 with PDFs f_1 and f_2 respectively; so, f_{X_1,X_2}(x_1, x_2) = f_1(x_1) f_2(x_2).

• The CDF of the sum is

F(z) = P(X_1 + X_2 ≤ z) = ∫_{−∞}^{∞} ∫_{−∞}^{z−x_1} f_1(x_1) f_2(x_2) dx_2 dx_1.

• Exchanging the first integral on the RHS with a derivative w.r.t. z gives the PDF of X_1 + X_2:

f(z) = (d/dz) F(z) = ∫_{−∞}^{∞} f_1(x_1) f_2(z − x_1) dx_1 for all z ∈ R.

• Thus, f is the convolution of f_1 and f_2, which is denoted f = f_1 ∗ f_2.


Sums of independent random variables

• In this context, moment generating functions can be used to simplify calculations.

• Let the MGF (bilateral Laplace transform) of X_i be

m_i(θ) = E e^{θX_i} = ∫_{−∞}^{∞} f_i(x) e^{θx} dx.

• The MGF of X_1 + X_2 is, by independence,

m(θ) = E e^{θ(X_1+X_2)} = E[e^{θX_1}] E[e^{θX_2}] = m_1(θ) m_2(θ).

• So, convolution of PDFs corresponds to simple multiplication of MGFs (and to addition of independent random variables).
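
• Illustration (not in the original slides): a numerical check that the PDF of an independent sum is the convolution of the marginals, using two Uniform(0,1) variables (whose sum has the triangular density) purely for simplicity.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 500_000
    S = rng.uniform(0, 1, size=n) + rng.uniform(0, 1, size=n)   # independent sum

    # Numerical convolution f1 * f2 of the two Uniform(0,1) densities on a grid.
    dz = 0.001
    z = np.arange(0, 1, dz)
    f1 = np.ones_like(z)                      # density of Uniform(0,1) on [0,1)
    f_conv = np.convolve(f1, f1) * dz         # triangular density on [0,2)
    z_conv = np.arange(len(f_conv)) * dz

    hist, edges = np.histogram(S, bins=50, range=(0, 2), density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    print(np.max(np.abs(hist - np.interp(centers, z_conv, f_conv))))  # small discrepancy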


Example: exponential and gamma distributions

Consider independent random variables that are all exponentially distributed with mean 1/λ.

• The PDF of X_1 + X_2 is f, where f(z) = 0 for z < 0 and, for z ≥ 0,

f(z) = ∫_0^z f_1(x_1) f_2(z − x_1) dx_1 = λ² z e^{−λz},

i.e., the (n, λ) gamma distribution with n = 2 (a.k.a. Erlang distribution when n ∈ Z+).

• So, the MGF of X_1 + X_2 is

m(θ) = (λ/(λ − θ))²,

which is consistent with the PDF just computed.

• There is a 1-to-1 relationship between PDFs and MGFs of nonnegative random variables (unilateral Laplace transform).

• So, for a sum of n such random variables,

m(θ) = (λ/(λ − θ))ⁿ ⇔ f_n(z) = λⁿ z^{n−1} e^{−λz}/(n−1)!  ∀z ≥ 0.

• Note: Construction of continuous-time Markov chains is based on the memoryless property that is unique to the exponential distribution.


The Gaussian distribution

Assume X_i is Gaussian (normally) distributed with mean µ_i and variance σ_i², i.e., X_i ∼ N(µ_i, σ_i²).

• If the X_i are independent RVs, the MGF of X_1 + X_2 is

m(θ) = exp(µ_1θ + ½σ_1²θ²) · exp(µ_2θ + ½σ_2²θ²) = exp((µ_1 + µ_2)θ + ½(σ_1² + σ_2²)θ²),

which we also recognize as a Gaussian MGF.

• Even if dependent (but jointly Gaussian), α_1X_1 + α_2X_2, for scalars α_i, is Gaussian distributed with mean α_1µ_1 + α_2µ_2 and variance α_1²σ_1² + α_2²σ_2² + 2α_1α_2·cov(X_1, X_2), where the covariance cov(X_1, X_2) := EX_1X_2 − EX_1·EX_2.

• X = (X_1, X_2, ..., X_n) are jointly Gaussian if

f_X(x) = 1/((2π)^{n/2} det(C)^{1/2}) exp(−½ (x − EX)ᵀ C⁻¹ (x − EX)),

where the (symmetric) covariance matrix is C = E[(X − EX)(X − EX)ᵀ].

• Note: for jointly Gaussian (X_1, X_2), E(X_1 | X_2) = EX_1 + (X_2 − EX_2)·cov(X_1, X_2)/var(X_2) is a.s. linear in X_2, and

• if jointly Gaussian X_1, X_2 are uncorrelated (diagonal covariance matrix), then X_1, X_2 are independent (the converse is always true).
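
• Illustration (not in the original slides): a sketch checking the mean/variance formula for a linear combination of dependent, jointly Gaussian variables; the covariance matrix C and scalars a are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(4)
    mu = np.array([1.0, -2.0])
    C = np.array([[2.0, 0.8],
                  [0.8, 1.0]])                 # covariance matrix (symmetric, PSD)
    X = rng.multivariate_normal(mu, C, size=1_000_000)

    a = np.array([3.0, -1.0])                  # scalars alpha_1, alpha_2
    Z = X @ a                                  # alpha_1*X_1 + alpha_2*X_2

    print(Z.mean(), a @ mu)                    # mean: alpha_1*mu_1 + alpha_2*mu_2
    print(Z.var(), a @ C @ a)                  # variance: a^T C a (includes the 2*a1*a2*cov term)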


de Moivre’s formula

There is a constant β > 0 such that n! eⁿ ∼ β nⁿ √n.

Proof:

• Define B(n) = n! eⁿ/(nⁿ√n).

• log B(n) = 1 + Σ_{j=2}^n [log B(j) − log B(j−1)], where 1 = log B(1).

• By Taylor’s theorem,

log(1 − x) = −x − x²/2 − x³/3 + o(x³), where lim_{y→0} o(y)/y = 0.

• So,

log B(j) − log B(j−1) = 1 + (j − ½) log(1 − 1/j) = −1/(12j²) + o(1/j²).

• Since j⁻² is summable, log B(n) converges, i.e., B(n) converges to a finite β > 0.


de Moivre-Laplace Central Limit Theorem (CLT)

• Consider a sequence of independent and identically distributed (i.i.d.) Bernoulli random variables X_1, X_2, ..., where

p := P(X_i = 1) and q := 1 − p = P(X_i = 0).

• Define the sum S_n = X_1 + X_2 + ... + X_n.

• S_n is binomially distributed with parameters (n, p), i.e.,

P(S_n = k) = (n choose k) p^k q^{n−k}

for all k ∈ {0, 1, 2, ..., n}.

• ES_n = np and variance var(S_n) = ES_n² − (ES_n)² = npq.

• Thus Y_n := (S_n − np)/√(npq) is centered (EY_n = 0) and has unit variance var(Y_n) = 1 for all n.

• Theorem (de Moivre-Laplace CLT): If the X_i are i.i.d. Bernoulli random variables, then Y_n defined above converges in distribution to a standard normal (Gaussian), i.e.,

lim_{n→∞} P(Y_n > y) = Φ̄(y) := ∫_y^∞ (1/√(2π)) e^{−x²/2} dx.
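
• Illustration (not in the original slides): a Monte Carlo sketch of the de Moivre-Laplace CLT, comparing the standardized binomial tail P(Y_n > y) with the standard normal tail Φ̄(y) as n grows.

    import math
    import numpy as np

    rng = np.random.default_rng(5)
    p, y = 0.3, 1.0
    normal_tail = 0.5 * math.erfc(y / math.sqrt(2))     # Phi-bar(y) = P(N(0,1) > y)

    for n in [10, 100, 1000, 10000]:
        Sn = rng.binomial(n, p, size=200_000)            # sums of n i.i.d. Bernoulli(p)
        Yn = (Sn - n * p) / math.sqrt(n * p * (1 - p))   # centered, unit variance
        print(n, np.mean(Yn > y), normal_tail)           # empirical tail -> normal tail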


de Moivre-Laplace CLT: Proof

P(a < Y_n := (S_n − np)/√(npq) ≤ b)

= Σ_{np + a√(npq) < k ≤ np + b√(npq)} (n choose k) p^k q^{n−k}

= Σ_{a√(npq) < k′ ≤ b√(npq)} (n choose k′+np) p^{k′+np} q^{nq−k′},

where the sums are over integers k, k′. Using de Moivre’s formula to uniformly approximate (n choose k′+np) over k′ as n → ∞:

P(a < Y_n ≤ b)

∼ (1/(β√(npq))) Σ_{a√(npq) < k′ ≤ b√(npq)} (1 + k′/(np))^{−k′−np} (1 − k′/(nq))^{−nq+k′}

∼ (1/(β√(npq))) Σ_{a√(npq) < k′ ≤ b√(npq)} exp(−(k′)²/(2npq))

→ ∫_a^b (e^{−x²/2}/β) dx = (√(2π)/β)(Φ̄(a) − Φ̄(b)) as n → ∞,

where the second step uses log(1 − x) = −x − x²/2 + o(x²) and the third is a Riemann-sum limit.

Taking −a, b → ∞ gives Wallis’ identity: β = √(2π).


Stirling’s formula

• de Moivre’s formula with Wallis’ identity gives Stirling’s formula:

n! eⁿ ∼ nⁿ √(2πn).

• In the following, we prove a more general CLT for i.i.d. sequences.


An i.i.d. CLT

Theorem: If X_1, X_2, ... are i.i.d. with E|X_1| < ∞ and 0 < σ² := var(X_1) < ∞, then

Y_n := (S_n − nµ)/(σ√n) →d N(0,1),

i.e., Y_n converges in distribution to a standard normal.

Proof:

• Taylor’s theorem: e^{ix} = 1 + ix − ½x² + R(x) where |R(x)| ≤ |x|³.

• For |x| > 4, |R(x)| ≤ |e^{ix}| + 1 + |x| + ½x² ≤ x²,

⇒ |R(x)| ≤ min{|x|³, x²}.

• So, if X ∼ (X_1 − EX_1)/σ, then the characteristic function of Y_n is E e^{itY_n} = (E e^{itX/√n})ⁿ

⇒ E e^{itY_n} = [1 + i·E(tX/√n) − E(tX)²/(2n) + E R(tX/√n)]ⁿ = [1 − t²/(2n) + E R(tX/√n)]ⁿ,

where, by the dominated convergence theorem,

|E R(tX/√n)| ≤ n⁻¹ E min{|tX|³/√n, (tX)²} = n⁻¹ o(1).

⇒ lim_{n→∞} E e^{itY_n} = lim_{n→∞} (1 − t²/(2n) + o(1)/n)ⁿ = e^{−t²/2}.


Trotter’s proof of the i.i.d. CLT: Preliminaries

• Let C be the set of bounded uniformly continuous functions on R, i.e., if f ∈ C then ∀ε > 0, ∃δ > 0 such that ∀x, y ∈ R, |x − y| < δ ⇒ |f(x) − f(y)| < ε.

• A transformation (function, operator) T : C → C is said to be linear if T(af + bg) = aTf + bTg ∀f, g ∈ C and ∀a, b ∈ R. Note that ∀x ∈ R, (aTf + bTg)(x) := a(Tf)(x) + b(Tg)(x).

• Define the supremum norm ‖f‖ := sup_{x∈R} |f(x)|.

• T is said to be a contraction operator if ‖Tf‖ ≤ ‖f‖ ∀f ∈ C.

• For a random variable X, define T_X : C → C as

(T_X f)(y) := E f(X + y) = ∫_{−∞}^{∞} f(x + y) dF_X(x), y ∈ R.

• Note: f ∈ C ⇒ T_X f ∈ C, T_X is a linear contraction, and (T_X f)(0) = E f(X).

• Note: T_{X_1} T_{X_2} = T_{X_2} T_{X_1} (commutation); furthermore, if X_1, X_2 are independent then (as with characteristic functions)

T_{X_1+X_2} = T_{X_1} T_{X_2} = T_{X_2} T_{X_1};

cf. Fubini’s theorem.

• Define C² = {f ∈ C | f′, f″ ∈ C}.


Trotter’s proof of the i.i.d. CLT: Preliminaries (cont)

Lemma 1: If lim_{n→∞} E f(X_n) = E f(X) ∀f ∈ C², then X_1, X_2, ... converges in distribution to X.

Note: The hypothesis is satisfied if ‖T_{X_n} f − T_X f‖ → 0.

Proof of Lemma 1:

• Consider any y at which F_X is continuous.

• Fix ε > 0 arbitrarily and take δ > 0 small enough so that F_X(y + δ) − F_X(y − δ) < ε.

• Define f, g ∈ C² such that

(i) f(x) = 1 for x ≤ y − δ,

(ii) g(x) = 1 for x ≤ y,

(iii) f(x) = 0 for x ≥ y, and

(iv) g(x) = 0 for x ≥ y + δ;

so that 0 ≤ f ≤ g ≤ 1 in particular.

• So, since 1{X ≤ y − δ} ≤ f(X) ≤ 1{X ≤ y} and 1{X ≤ y} ≤ g(X) ≤ 1{X ≤ y + δ},

F_X(y − δ) ≤ E f(X) = lim_{n→∞} E f(X_n) ≤ lim inf_{n→∞} F_{X_n}(y)

≤ lim sup_{n→∞} F_{X_n}(y) ≤ lim_{n→∞} E g(X_n) = E g(X) ≤ F_X(y + δ),

where the equalities are by hypothesis.

• Since this holds ∀ε > 0, lim_{n→∞} F_{X_n}(y) = F_X(y).


Trotter’s proof of the i.i.d. CLT: Preliminaries (cont)

Lemma 2: If A, B : C → C are linear contraction operators that commute, then ‖Aⁿf − Bⁿf‖ ≤ n‖Af − Bf‖ ∀n ∈ Z+, f ∈ C.

Proof:

• Factor

Aⁿf − Bⁿf = Σ_{i=0}^{n−1} A^{n−i−1}(A − B)B^i f = Σ_{i=0}^{n−1} A^{n−i−1}B^i(A − B)f,

where the second equality is by commutativity.

• Now take the norm of both sides, use the triangle inequality, and finally repeatedly use the contraction hypotheses.


Trotter’s proof of the i.i.d. CLT: Preliminaries (cont)

Lemma 3: If EX = 0 and EX² = 1, then ∀f ∈ C² and ∀ε > 0, ∃N < ∞ such that ‖T_{n^{-1/2}X} f − f − (1/(2n)) f″‖ ≤ ε/n ∀n ≥ N.

Proof:

• Arbitrarily fix y ∈ R.

• By Taylor’s theorem, ∃ z(x) ∈ [y, y + x] such that

f(y + x) = f(y) + x f′(y) + ½x² f″(y) + ½x²[f″(z(x)) − f″(y)].

• By uniform continuity of f″ (f ∈ C²), ∀ε > 0, ∃δ > 0 such that |z(x) − y| < δ ⇒ |f″(z(x)) − f″(y)| < ε. Thus, using EX = 0 and EX² = 1,

(T_{n^{-1/2}X} f)(y)

= ∫ f(y + n^{-1/2}x) dF_X(x)

= f(y) ∫ dF_X(x) + (1/√n) f′(y) ∫ x dF_X(x) + (1/(2n)) f″(y) ∫ x² dF_X(x) + (1/(2n)) ∫ [f″(z(n^{-1/2}x)) − f″(y)] x² dF_X(x)

= f(y) + (1/(2n)) f″(y) + (1/(2n)) ( ∫_{|x|<δ√n} + ∫_{|x|≥δ√n} ) [f″(z(n^{-1/2}x)) − f″(y)] x² dF_X(x).


Trotter’s proof of the i.i.d. CLT: Lemma 3’s proof (cont)

• Now, |x| < δ√n ⇒ |z(n^{-1/2}x) − y| ≤ |n^{-1/2}x| < δ.

• Thus,

| (1/(2n)) ∫_{|x|<δ√n} [f″(z(n^{-1/2}x)) − f″(y)] x² dF_X(x) | ≤ (1/(2n)) ∫_{|x|<δ√n} ε x² dF_X(x) ≤ ε/(2n).

• Since |f″(z) − f″(y)| ≤ 2‖f″‖ < ∞ and EX² < ∞,

(1/(2n)) | ∫_{|x|≥δ√n} [f″(z(n^{-1/2}x)) − f″(y)] x² dF_X(x) | ≤ (1/n) ‖f″‖ ∫_{|x|≥δ√n} x² dF_X(x) ≤ ε/n for all sufficiently large n.

• Finally, substitute the last two estimates into the expression for (T_{n^{-1/2}X} f)(y) of the previous slide.


Trotter’s proof of the i.i.d. CLT

Theorem: If X_1, X_2, ... are i.i.d. with E|X_1| < ∞ and 0 < σ² := var(X_1) < ∞, then n^{-1/2}(S_n − nµ) := n^{-1/2}(X_1 + ... + X_n − nµ) →d N(0, σ²), where µ := EX_1.

Proof:

• Let Y ∼ N(0, σ²).

• By Lemma 1, the theorem follows if lim_{n→∞} ‖T_{n^{-1/2}(S_n − nµ)} f − T_Y f‖ = 0 ∀f ∈ C².

• W.l.o.g., µ = 0 and σ² = 1, i.e., Y ∼ N(0,1), by restating the theorem in terms of (S_n − nµ)/(σ√n).

• Since Y ∼ N(0,1), T_Y = (T_{n^{-1/2}Y})ⁿ and, via integration by parts (or Lemma 3 applied to Y, which also has zero mean and unit variance),

‖T_{n^{-1/2}Y} f − f − (1/(2n)) f″‖ ≤ ε/n for all sufficiently large n.

• Since the X_i are i.i.d., T_{n^{-1/2}S_n} = (T_{n^{-1/2}X_1})ⁿ.

• By Lemma 2,

‖T_{n^{-1/2}S_n} f − T_Y f‖ ≤ n ‖T_{n^{-1/2}X_1} f − T_{n^{-1/2}Y} f‖.

• Applying Lemma 3 (to X_1 and to Y) we get

‖T_{n^{-1/2}X_1} f − T_{n^{-1/2}Y} f‖ ≤ 2ε/n for all sufficiently large n,

so ‖T_{n^{-1/2}S_n} f − T_Y f‖ ≤ 2ε; since ε > 0 was arbitrary, the theorem follows.


Lindeberg’s CLT for independent random variables

• Consider an independent sequence of random variables with EX_i = 0 ∀i w.l.o.g.

• Let σ_i² = EX_i²; the X_i are not necessarily identically distributed.

• The CLT can be generalized to such a sequence, assuming only mutual independence, under Lindeberg’s condition:

lim_{n→∞} s_n^{-2} Σ_{i=1}^n ∫_{|x|≥δ s_n} x² dF_{X_i}(x) = 0 ∀δ > 0,

where s_n := √(Σ_{i=1}^n σ_i²); then S_n/s_n →d N(0,1) (recall the last inequality in the proof of Lemma 3).

• A simple proof of Lindeberg’s CLT follows that of Trotter’s for the i.i.d. case [Trotter’59].

• Feller proved that Lindeberg’s condition is also necessary (given that the individual variances σ_i²/s_n² are uniformly asymptotically negligible).


Modes of convergence

• A CLT involves convergence in distribution.

• This is the weakest mode of convergence; no limiting random variable (defined on the same probability space) need exist.

• F_{Y_n}(y) → F_Y(y) ∀y that are points of continuity of F_Y defines Y_n → Y in distribution.

• In the following, we will see that convergence: in distribution ⇐ in probability ⇐ (a.s. or in L²).


Weak law of large numbers (WLLN): assumptions

• Assume random variables X_1, X_2, X_3, ... are i.i.d.

• Also suppose that the common distribution has finite variance, i.e., σ² := var(X) := E(X − EX)² < ∞, where X ∼ X_i ∀i.

• Finally, suppose that the mean exists and is finite, i.e., µ := EX < ∞.

• Recall the sum S_n := X_1 + X_2 + ··· + X_n for n ≥ 1; ES_n = nµ and var(S_n) = nσ².

• The quantity S_n/n is called the empirical mean of X after n samples and is an unbiased estimate of µ, i.e.,

E[S_n/n] = µ.


A WLLN: Statement and Proof

Theorem: If X_1, X_2, ... are i.i.d. with var(X_1) < ∞, then

lim_{n→∞} P(|S_n/n − µ| ≥ ε) = 0 ∀ε > 0.

• By Chebyshev’s inequality,

P(|S_n/n − µ| ≥ ε) ≤ var(S_n/n)/ε² = σ²/(nε²).

• Note: So, S_n/n is said to be a weakly consistent estimator of µ.

• Consequently, L² convergence, E(Y_n − Y)² → 0, implies weak convergence (convergence in probability), P(|Y_n − Y| > ε) → 0 ∀ε > 0.

• Example: Discretely distributed Y_n → 0 in probability but not in L² when:

P(Y_n = −n) = P(Y_n = n) = p_n = (1 − P(Y_n = 0))/2

such that p_n → 0 as n → ∞ but n²p_n ↛ 0, e.g., p_n = 1/n for n > 1 so that EY_n² = 2n ↛ 0.

• Example: Y_n → c (a constant) in probability ⇔ Y_n → c in distribution.
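
• Illustration (not in the original slides): a numerical sketch of the WLLN and the Chebyshev rate for Bernoulli(1/2) samples (µ = 1/2, σ² = 1/4), estimating P(|S_n/n − µ| ≥ ε) over many independent runs and comparing it to σ²/(nε²).

    import numpy as np

    rng = np.random.default_rng(6)
    mu, sigma2, eps = 0.5, 0.25, 0.05

    for n in [100, 1000, 10000]:
        Sn = rng.binomial(n, 0.5, size=20_000)     # S_n in 20,000 independent runs
        p_dev = np.mean(np.abs(Sn / n - mu) >= eps)
        print(n, p_dev, sigma2 / (n * eps**2))     # deviation prob. <= Chebyshev bound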


Strong Law of Large Numbers (SLLN)

• Again, a sequence of random variables X_1, X_2, ... is said to converge almost surely (a.s.) to a random variable X if

P(lim_{n→∞} X_n ≠ X) = 0.

• Kolmogorov’s strong LLN: if X_1, X_2, ... are i.i.d. and E|X_1| < ∞, then

P(lim_{n→∞} S_n/n = µ) = 1, i.e., S_n/n → µ := EX_1 a.s.

• Formally, the limit states that ∀r ∈ Z+, ∃n such that ∀k ≥ n: |S_k/k − µ| < 1/r, a.s.; i.e.,

P( ⋂_{r=1}^∞ ⋃_{n=1}^∞ ⋂_{k=n}^∞ {|S_k/k − µ| < 1/r} ) = 1.

• But P(⋂_{r=1}^∞ B_r) = 1 ⇔ P(B_r) = 1 ∀r, so S_n/n → µ a.s. if and only if

P( ⋃_{n=1}^∞ ⋂_{k=n}^∞ {|S_k/k − µ| < 1/r} ) = 1 ∀r.


SLLN statement (cont)

• Now fix r ∈ Z+ arbitrarily and let

A_kᶜ = {|S_k/k − µ| < r⁻¹}.

• The event

⋃_{n=1}^∞ ⋂_{k=n}^∞ A_kᶜ

is denoted {A_nᶜ a.a.} (almost always), i.e.,

ω ∈ {A_nᶜ a.a.} ⇔ ∃n*(ω) such that ω ∈ A_nᶜ ∀n ≥ n*(ω).

• Note: P(⋂_{k=n}^∞ A_kᶜ) ↑ P(A_nᶜ a.a.) as n → ∞.

• Note: equivalently, express the above in terms of A_n, where the event

⋂_{n=1}^∞ ⋃_{k=n}^∞ A_k

is denoted {A_n i.o.} (infinitely often) = ({A_nᶜ a.a.})ᶜ, i.e.,

ω ∈ {A_n i.o.} ⇔ ∀n ∃m ≥ n such that ω ∈ A_m.


SLLN and Borel-Cantelli Lemmas

So, S_n/n → µ a.s. ⇔ P(A_nᶜ a.a.) = 1 ⇔ P(A_n i.o.) = 0 for all r ∈ Z+, where A_n = {|S_n/n − µ| ≥ r⁻¹}.

• First BC Lemma: Generally, for events A_1, A_2, ...,

Σ_{n=1}^∞ P(A_n) < ∞ ⇒ P(A_n i.o.) = 0.

• Proof:

P(A_n i.o.) ≤ P(⋃_{m=k}^∞ A_m) ∀k ≤ P(A_k) + P(A_{k+1}) + ··· → 0 as k → ∞ (the tail of a convergent series).

• Second BC Lemma: For independent events A_1, A_2, ...,

Σ_{n=1}^∞ P(A_n) = ∞ ⇒ P(A_n i.o.) = 1.

• Proof: lim_{m→∞} Σ_{k=n}^m P(A_k) = ∞ implies

0 ← e^{−Σ_{k=n}^m P(A_k)} = ∏_{k=n}^m e^{−P(A_k)} ≥ ∏_{k=n}^m (1 − P(A_k)) = P(⋂_{k=n}^m A_kᶜ),

where the last equality is by independence and e^{−x} ≥ 1 − x was used; hence P(⋂_{k=n}^∞ A_kᶜ) = 0 ∀n, so P(A_n i.o.) = 1.

• Note: P(A_n i.o.) ∈ {0,1} by the “zero-one law”.


Kolmogorov’s maximal inequality

If X_1, X_2, ... are independent and EX_n² < ∞ ∀n, then

P(max_{1≤k≤n} |S_k − ES_k| ≥ λ) ≤ var(S_n)/λ²  ∀n ≥ 1, λ > 0.

Proof:

• W.l.o.g. assume EX_n = 0 ∀n ⇒ ES_n = 0 ∀n. Fix λ > 0.

• Define the disjoint events B_k = {|S_i| < λ ∀i < k, |S_k| ≥ λ}.

ES_n² ≥ Σ_{k=1}^n E[S_n² 1_{B_k}]

≥ Σ_{k=1}^n E[(2(S_n − S_k)S_k + S_k²) 1_{B_k}]

= Σ_{k=1}^n [2 E(S_n − S_k) E(S_k 1_{B_k}) + E(S_k² 1_{B_k})]

= Σ_{k=1}^n E(S_k² 1_{B_k}),

where the second-to-last equality is by independence (S_n − S_k is independent of S_k 1_{B_k}) and the last uses E(S_n − S_k) = 0.

• Thus, ES_n² ≥ λ² Σ_{k=1}^n P(B_k) = λ² P(⋃_{k=1}^n B_k).

• Note how this generalizes Chebyshev’s inequality.


SLLN: proof of bounded second moment case

Assume EX_1² < ∞.

• Again, assume centered X_n w.l.o.g., and apply the maximal inequality with λ = 2ⁿε and the First BC Lemma to get that

P({max_{1≤k≤2ⁿ} |S_k| ≤ 2ⁿε} a.a. in n) = 1 ∀ε > 0.

• Now, ∀m such that 2^{n−1} < m ≤ 2ⁿ,

max_{1≤k≤2ⁿ} |S_k| ≤ 2ⁿε implies |S_m| ≤ 2ⁿε ≤ 2mε.

• This leads to P({|S_m|/m ≤ 2ε} a.a.) = 1 ∀ε > 0, i.e., S_m/m → 0 a.s.

• Note: Kolmogorov’s SLLN only requires E|X_1| < ∞, i.e., a finite first moment.


Weak and strong LLNs

• SLLN ⇒ WLLN since P(A_n i.o.) = 0 ⇒ P(A_n) → 0.

• Example of a persistently shrinking pulse on Ω = [0,1] with P Lebesgue measure:

– ∀m ∈ Z+, k ∈ {1, 2, ..., m}, define

Y_{k + m(m−1)/2}(ω) := 1{(k−1)/m < ω ≤ k/m}.

– The random variables Y_n converge to zero weakly (in probability, since P(Y_n > ε) = 1/m → 0) but not strongly (a.s.), because P({Y_n > ε} i.o.) = 1 for all ε = r⁻¹ ∈ (0,1).
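
• Illustration (not in the original slides): a sketch of this “typewriter” sequence at a fixed sample point ω, showing that Y_n(ω) = 1 for infinitely many n (one n per block m) even though the fraction of indices with Y_n = 1 vanishes; the helper Y(n, omega) is introduced only for this example.

    def Y(n, omega):
        # Invert n = k + m(m-1)/2 with k in {1, ..., m}: find block m and offset k.
        m = 1
        while m * (m + 1) // 2 < n:
            m += 1
        k = n - m * (m - 1) // 2
        return 1.0 if (k - 1) / m < omega <= k / m else 0.0

    omega = 0.37                                   # a fixed sample point in (0, 1]
    vals = [Y(n, omega) for n in range(1, 5001)]
    print(sum(vals))               # roughly the number of blocks reached: grows without bound
    print(sum(vals) / len(vals))   # yet the fraction of 1's among n <= 5000 is small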

• Theorem: If X_1, X_2, ... converges to X in probability, then there is a subsequence X_{n_1}, X_{n_2}, ... that converges to X a.s.


Uniform integrability: motivation

• The persistently shrinking pulse example also showed that convergence in L^q, for q ≥ 1, does not generally imply convergence a.s.

• Example (Dirac/Heaviside impulse):

– Ω = [0,1] and P is Lebesgue measure.

– X_n(ω) := n·1_{[0, 1/n]}(ω) ∀n ≥ 1.

– Clearly, X_n → 0 a.s.

– But EX_n = 1 ∀n ≥ 1.

• So, convergence a.s. (i.e., “pointwise”) does not generally imply convergence in L^q either.

• Under what conditions does a.s. convergence imply convergence in L^q for q ≥ 1?
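
• Illustration (not in the original slides): a numerical sketch of the Dirac-impulse example, where X_n → 0 at (almost) every ω yet EX_n = 1 for all n, so there is no L¹ convergence.

    import numpy as np

    rng = np.random.default_rng(7)
    omega = rng.uniform(0, 1, size=1_000_000)   # P = Lebesgue measure on [0, 1]

    for n in [1, 10, 100, 1000]:
        Xn = n * (omega <= 1.0 / n)             # X_n(omega) = n * 1[0, 1/n](omega)
        print(n, Xn.mean(), np.mean(Xn > 0))    # E X_n stays ~1, but P(X_n > 0) = 1/n -> 0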


Uniform integrability: preliminaries

Consider the probability space (Ω, F, P).

• Theorem: If E|X| < ∞, then ∀ε ∈ (0,∞) ∃c(ε) ∈ [0,∞) such that

E(|X|·1{|X| ≥ c}) < ε ∀c ∈ [c(ε), ∞).

Proof:

– Lebesgue integrals are uniformly (absolutely) continuous, i.e., ∃δ(ε) ∈ (0,∞) such that E(|X|·1_A) < ε ∀A ∈ F such that P(A) < δ(ε) (exercise: prove by contradiction using E|X| < ∞).

– By Markov’s inequality, P(|X| ≥ c) ≤ c⁻¹E|X| ∀c ∈ (0,∞).

– Since E|X| < ∞, ∃c(ε) ∈ (0,∞) such that P(|X| ≥ c) < δ(ε) ∀c ∈ [c(ε), ∞).

– Finally, take A = {|X| ≥ c} for c ∈ [c(ε), ∞).


Uniform integrability: definition and sufficient conditions

• A collection of random variables C on (Ω, F, P) is uniformly integrable if: ∀ε ∈ (0,∞) ∃c(ε) ∈ [0,∞) such that

sup_{X∈C} E(|X|·1{|X| ≥ c}) < ε ∀c ∈ [c(ε), ∞).

• If |X| ≤ Y a.s. ∀X ∈ C with EY < ∞, then C is uniformly integrable (note that E(|X|·1{|X| ≥ c}) ≤ E(Y·1{Y ≥ c}) and apply the previous theorem to Y); the converse is not true.

• For uniformly integrable C, if c ∈ [c(ε), ∞) then

E|X| = E(|X|·1{|X| < c}) + E(|X|·1{|X| ≥ c}) < c + ε ∀X ∈ C,

i.e., C is uniformly L¹ bounded; the converse is not true.

• If C = {X_0, X_1, ...} is such that 0 ≤ X_n ≤ X_{n+1} ∀n ∈ Z+ (i.e., an increasing sequence) and C is L¹ bounded, then, by the monotone convergence theorem, C is uniformly integrable.

• Theorem: If X is a random variable on (Ω, F, P) with E|X| < ∞ and {G_α, α ∈ Λ} is a collection of sub-σ-algebras of F, then {E(X | G_α), α ∈ Λ} is uniformly integrable. Proof: exercise.


Uniform integrability: main theorem prelim

• Define the ramp ψ_c(x) = x·1{|x| < c} + c·1{x ≥ c} − c·1{x ≤ −c}.

• Theorem: If C is uniformly integrable then ∀ε ∈ (0,∞) ∃c(ε) ∈ [0,∞) such that

E|X − ψ_c(X)| < ε ∀X ∈ C, c ∈ [c(ε), ∞).

• Proof: Arbitrarily fix c ∈ (0,∞) and note that

|x − ψ_c(x)| = (x − c)⁺ + (x + c)⁻ ∀x ∈ R

⇒ E(X − c)⁺ = E[(X − c)·1{X ≥ c}] ≤ E[|X|·1{|X| ≥ c}]

and E(X + c)⁻ = −E[(X + c)·1{X ≤ −c}] ≤ E[|X|·1{|X| ≥ c}].


Swapping limits & expectation (integration)

Theorem: If (a) lim_{n→∞} X_n = X a.s. and (b) {X_0, X_1, ...} are uniformly integrable, then

E|X| < ∞ and lim_{n→∞} E|X_n − X| = 0.

Proof:

• By (b) and the previous “uniform L¹ boundedness” result, ∃B < ∞ such that E|X_n| < B ∀n ∈ Z+.

• So, by (a) and Fatou’s lemma,

E|X| = E lim inf_{n→∞} |X_n| ≤ lim inf_{n→∞} E|X_n| ≤ B.

• To get L¹ convergence of X_n to X, arbitrarily fix ε ∈ (0,∞). By the previous “ramp” result and (b), ∃c(ε) ∈ (0,∞) such that, ∀c ∈ (c(ε), ∞), n ∈ Z+,

E|X_n − ψ_c(X_n)| < ε/3 and E|X − ψ_c(X)| < ε/3.

• Fix c ∈ (c(ε), ∞). Since ψ_c is continuous and bounded in magnitude by c, by the dominated convergence theorem, ∃N(ε) ∈ Z+ such that

E|ψ_c(X_n) − ψ_c(X)| < ε/3 ∀n ≥ N(ε).

• By these three “ε/3” inequalities and the triangle inequality,

E|X − X_n| ≤ E|X_n − ψ_c(X_n)| + E|X − ψ_c(X)| + E|ψ_c(X_n) − ψ_c(X)| < ε.


Completeness of Lq

• Definition: L^q(Ω, F, P), or just L^q, is the set of random variables X on (Ω, F, P) such that ‖X‖_q := (E(|X|^q))^{1/q} < ∞.

• Definition: X_1, X_2, ... is said to be a Cauchy sequence in L^q if ∀ε > 0 ∃N_ε such that ‖X_n − X_m‖_q < ε whenever n, m > N_ε.

• Definition: A set is said to be complete if all of its Cauchy sequences converge to an element inside it.

• For q ≥ 1, ‖·‖_q is a norm by Minkowski’s inequality, ‖X + Y‖_q ≤ ‖X‖_q + ‖Y‖_q, so that L^q is a (complete) Banach space.

• To show that L^q is complete:

– Consider a Cauchy sequence X_1, X_2, ... in L^q.

– Let N*(ε) := inf{N_η | η ≤ ε} and define X* = X_{N*(ε)}; so, by Minkowski’s inequality, ∀n ≥ N*,

‖X_n‖_q ≤ ‖X*‖_q + ‖X_n − X*‖_q ≤ ‖X*‖_q + ε,

i.e., the sequence {X_n, n ≥ N*} is bounded in L^q.

– Then one can show that a subsequence X_{n_k} converges a.s. to a (measurable) random variable X (use the completeness of R and argue by contradiction).

– So, by Fatou’s lemma, ‖X‖_q^q < ∞, i.e., X ∈ L^q.

– Finally, use Fatou’s lemma on ‖X − X_n‖_q^q to establish L^q-convergence to X.


Caratheodory’s extension theorem

• Theorem: If A is an algebra on Ω and P is countably additive on A, then there exists P̄ on σ(A) such that P = P̄ on A.

• In addition, if ∃ Ω_1 ⊆ Ω_2 ⊆ Ω_3 ⊆ ... ∈ A such that Ω_n ↑ Ω, then the extension P̄ is unique.

• On R, Caratheodory’s theorem extends a countably additive probability measure on the algebra A containing all intervals and their finite unions to a σ-field that is a strict subset of 2^R but contains B = σ(A).


Product probability spaces

• Consider two probability spaces (Ω_i, F_i, P_i), i = 1, 2.

• Define the product sample space

Ω := Ω_1 × Ω_2 := {(ω_1, ω_2) | ω_i ∈ Ω_i, i = 1, 2}.

• Note that A_0 := {A_1 × A_2 | A_i ∈ F_i, i = 1, 2} is closed under intersections but not under finite unions, e.g., one cannot in general express

(A_1 × A_2) ∪ (B_1 × B_2) as C_1 × C_2

where A_i, B_i, C_i ∈ F_i.

• So, add all finite disjoint unions of elements of A_0 to A_0 and call the result A, an algebra.

• Denote F_1 × F_2 := σ(A).


Product probability space extension

Theorem: There exists a unique P such that

(a) P(A_1 × A_2) = P_1(A_1) P_2(A_2) ∀ A_1 × A_2 ∈ A_0, with P uniquely extended to A by finite additivity over finite disjoint unions, and

(b) (Ω = Ω_1 × Ω_2, F = σ(A), P) is a probability space.

Proof:

• Sections of measurable sets are measurable, where the section of A ⊆ Ω along ω_2 is

A_{ω_2} := {ω_1 ∈ Ω_1 | (ω_1, ω_2) ∈ A} for ω_2 ∈ Ω_2,

because ∀ω_2 ∈ Ω_2, M := {A ∈ F | A_{ω_2} ∈ F_1} is a monotone class containing A ⇒ M = F.

• A = A_1 × A_2 ∈ A_0 ⇒ A_{ω_2} = A_1 if ω_2 ∈ A_2, otherwise A_{ω_2} = ∅ ⇒

P_1(A_{ω_2}) = P_1(A_1)·1_{A_2}(ω_2) ⇒ P(A) = ∫ P_1(A_{ω_2}) dP_2(ω_2),

where P_1(A_{ω_2}) is a random variable on (Ω_2, F_2, P_2).

• Such disintegration extends to A ∈ A by finite additivity, and P is countably additive on A by dominated convergence.

• So, P (and the disintegration) extend uniquely to F by Caratheodory’s extension theorem.


Fubini-Tonelli Theorem

• Theorem: If

(i) X : Ω = Ω_1 × Ω_2 → R is F = F_1 × F_2 measurable and,

(ii) ω_{3−i} ↦ ∫ X(ω) dP_i(ω_i) is a.s. finite and F_{3−i}-measurable ∀i ∈ {1, 2},

then

EX := ∫ X dP = ∫ ( ∫ X dP_1 ) dP_2 = ∫ ( ∫ X dP_2 ) dP_1.

Proof:

– By disintegration, the ∫ X dP_i are F_{3−i}-measurable and the theorem holds for X = 1_A, A ∈ F.

– Extend to simple functions and take limits via dominated convergence to prove the case where E|X| < ∞.

• Considering hypothesis (ii), recall how absolute summability of a sequence implies its (unique) summability in any order.


Consistency of Probability Measures

• Consider the product space (R^{Z+}, B^{Z+}) =: (R^∞, B^∞), where Z+ := {0, 1, 2, 3, ...}.

• Again, the underlying probability space is (Ω, F, P).

• A cylinder event A ∈ B^∞ is of the form

A = A_0 × A_1 × A_2 × ...

where all but a finite number of the A_i equal R, i.e., there is a finite index set I_A ⊆ Z+ such that A_i = R ∀i ∉ I_A.

• A family of probability measures {Pⁿ}_{n∈Z+}, Pⁿ on (Rⁿ, Bⁿ), is said to be consistent if

Pⁿ(A_0 × A_1 × ... × A_{n−1}) = P^{n+1}(A_0 × A_1 × ... × A_{n−1} × R)

for all cylinder sets A_0 × A_1 × ... × A_{n−1}.


Kolmogorov’s Extension Theorem

For each consistent family of probability measures Pⁿ on (Rⁿ, Bⁿ), there exists a unique consistent P^∞ on (R^∞, B^∞).

• Clearly, we require that

P^∞(A) = Pⁿ(A_0 × A_1 × ... × A_{n−1})

for all cylinder sets A = A_0 × A_1 × ... (with A_i = R ∀i ≥ n) and all n ∈ Z+.

• Since P^∞ is specified for all cylinder sets, P^∞ is unique on the algebra generated by them and, by the monotone class theorem, unique on B^∞ too.

• For existence:

– Let A be the set of finite unions of cylinder sets, including ∅, so that σ(A) = B^∞.

– Show P^∞ is countably additive on A (the nontrivial step) and apply Caratheodory’s extension theorem.


Consistency and FDDs

• Consider a discrete-time/parameter stochastic process

X := {X_t | t ∈ Z+},

where each X_t is itself a random variable.

• Let F_{t_1,t_2,...,t_n} be the joint CDF of X_{t_1}, X_{t_2}, ..., X_{t_n} for some finite n and distinct t_k ∈ Z+ for all k ∈ {1, 2, ..., n}, i.e.,

F_{t_1,...,t_n}(x_1, ..., x_n) = P(X_{t_1} ≤ x_1, ..., X_{t_n} ≤ x_n),

where P is the underlying probability measure.

• This is called an n-dimensional distribution of X.


KET and Consistent FDDs

• A family of such joint CDFs is called a set of finite-dimensional distributions (FDDs).

• The FDDs are consistent if one can marginalize (reduce the dimension of) one and obtain another, e.g.,

F_{t_1,t_4}(x_1, x_4) := F_{t_1,t_2,t_3,t_4}(x_1, ∞, ∞, x_4);

then, using KET, one can prove that there exists a unique discrete-time stochastic process X on R^∞ (with distribution P^∞), i.e.,

dPⁿ := dF_{0,1,...,n−1} and dP^∞ := dF_{Z+}.

• Samples ω of the underlying probability space are actually sample paths of the stochastic process, i.e., t ↦ X_t(ω).

• KET can be extended to continuous-time stochastic processes, i.e., sample paths (X_t(ω))_{t∈R+} ∈ R^{R+} instead of ∈ R^{Z+}.

• In the following, we will focus on the underlying probability space Ω and the σ-algebras σ(X_s | s ≤ t) for t ∈ R+ = [0,∞), i.e., in continuous time.


Uncountable products

Suppose I is an uncountably infinite index set and, ∀t ∈ I, (Ω, F_t) is a sample space with a σ-algebra of events.

• Theorem: If A ∈ σ(F_t, t ∈ I) =: G_I, then there is some countable J ⊆ I (depending on A) such that A ∈ G_J.

Proof:

– Define H = {A ∈ G_I | ∃ countable J ⊆ I s.t. A ∈ G_J}.

– Clearly, Ω ∈ H and H is closed under complementation and countable unions, so that H is a σ-algebra.

– Finally, since F_t ⊆ H ∀t ∈ I, H = G_I.

• Corollary: If Y : Ω → R̄ is G_I-measurable, then ∃ a countable J ⊆ I such that Y is G_J-measurable.

Proof:

– Use the previous theorem if Y = 1_A; this easily extends to simple (discretely distributed) Y.

– Extend to any (measurable) random variable Y by approximating with simple functions.

• For the special case where G_{[0,t]} = σ(X_s | s ≤ t), i.e., F_s = σ(X_s) for random variables X_s: if Y is G_{[0,t]}-measurable then ∃ a countable {t_0, t_1, ...} ⊆ [0, t] and a B^{Z+}-measurable mapping φ such that

Y = φ(X_{t_0}, X_{t_1}, ...) a.s.

Recall Doob’s theorem.
