
An Introduction to Probability Theory: Outline

1. Definitions: sample space and (measurable) random variables

2. $\sigma$-algebras

3. Expectation (integration)

4. Conditional expectation

5. Useful inequalities

6. Independent random variables

7. The central limit theorem

8. Laws of large numbers: Borel-Cantelli lemma

9. Uniform integrability

10. Kolmogorov’s extension theorem for consistent finite-dimensional distributions


Sample space and events

• Consider a random experiment resulting in an outcome (or “sample”), $\omega$.

• E.g., the experiment is a pair of dice thrown onto a table, and the outcome is the exact orientation of the dice and their position on the table when they stop moving.

• The space of all outcomes, $\Omega$, is called the sample space, i.e., $\omega \in \Omega$.

• An event is merely a subset of $\Omega$, e.g., “the sum of the dots on the upward-facing surfaces of the dice is 7”.

• We say that an event $A \subseteq \Omega$ has occurred if the outcome $\omega$ of the random experiment belongs to $A$, i.e., $\omega \in A$, so

– events $A$ and $B$ occurred if $\omega \in A \cap B$, and

– events $A$ or $B$ occurred if $\omega \in A \cup B$.

• A sample space $\Omega$ is, in general, an abstract, unordered set.

• Let $\mathcal{F}$ be the set of events, i.e., $A \in \mathcal{F} \Rightarrow A \subseteq \Omega$.


Probability on a sample space

• A probability measure $P$ maps each event $A \subseteq \Omega$ to a real number between zero and one inclusive, i.e., $P(A) \in [0,1]$.

• A probability measure has certain properties:

1. $P(\Omega) = 1$, and

2. $P(A) = 1 - P(A^c)$ $\forall$ events $A$, where $A^c = \{\omega \in \Omega \mid \omega \notin A\}$ is the complement of $A$.

• Moreover, if the events $\{A_i\}_{i=1}^n$ are disjoint (i.e., $A_i \cap A_j = \emptyset$ for all $i \neq j$), then
$$P\left(\bigcup_{i=1}^n A_i\right) = \sum_{i=1}^n P(A_i),$$
i.e., $P$ is finitely additive.

• Formally, a probability measure is defined to be countably additive:

3. For any disjoint $\{A_i\}_{i=1}^\infty$,
$$P\left(\bigcup_{i=1}^\infty A_i\right) = \sum_{i=1}^\infty P(A_i).$$

• These properties are illustrated on a finite sample space in the sketch below.
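Since the two-dice sample space is finite, the above properties can be checked mechanically. A minimal Python sketch (the uniform measure on the 36 outcomes is an assumption of this example, not part of the slides):

```python
# Two fair dice: Omega has 36 equally likely outcomes.
from fractions import Fraction
from itertools import product

Omega = set(product(range(1, 7), repeat=2))        # sample space

def P(A):                                          # uniform probability measure
    return Fraction(len(A), len(Omega))

seven = {w for w in Omega if sum(w) == 7}          # event "sum of the dots is 7"
doubles = {w for w in Omega if w[0] == w[1]}       # event "both dice show the same"

assert P(Omega) == 1                               # P(Omega) = 1
assert P(seven) == 1 - P(Omega - seven)            # P(A) = 1 - P(A^c)
assert P(seven | doubles) == P(seven) + P(doubles) # additivity for disjoint events
print(P(seven))                                    # 1/6
```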


Probability measures on $\sigma$-algebras

• On large sample spaces $\Omega$ (e.g., $\Omega = \mathbb{R}$), a formal probability measure may be impossible to construct if all subsets of $\Omega$ are defined as events, i.e., if $\mathcal{F} = 2^\Omega$ (the power set of $\Omega$); cf. Carathéodory’s extension theorem.

• So, the set of events $\mathcal{F}$ is restricted to a $\sigma$-algebra (or $\sigma$-field) of subsets of $\Omega$, formally satisfying the following properties:

1. $\Omega \in \mathcal{F}$ (possesses the intersection identity)

2. if $A \in \mathcal{F}$ then $A^c \in \mathcal{F}$ (closed under complementation)

3. if $A_1, A_2, A_3, \ldots \in \mathcal{F}$, then $\bigcap_{n=1}^\infty A_n \in \mathcal{F}$ (closed under countable intersections)

• The probability measure $P$ is defined only on the $\sigma$-algebra $\mathcal{F} \subseteq 2^\Omega$:
$$P : \mathcal{F} \to [0,1].$$

• We have thus identified a fundamental probability (measure) space: $(\Omega, \mathcal{F}, P)$.

• Note: Equivalently, by De Morgan’s theorem, one can use $\emptyset \in \mathcal{F}$ (the union identity) and closure under countable unions instead of conditions 1 and 3 above.


Conditioned events

• The probability that $A$ occurred conditioned on (or “given that”) another event $B$ occurred is $P(A \mid B) := P(A \cap B)/P(B)$, where $P(B) > 0$ is assumed.

• A group of events $A_1, A_2, \ldots, A_n$ are said to be mutually independent if
$$P\left(\bigcap_{i \in I} A_i\right) = \prod_{i \in I} P(A_i) \quad \forall I \subseteq \{1, 2, \ldots, n\}.$$

• Note that if events $A$ and $B$ are independent and $P(B) > 0$, then $P(A \mid B) = P(A)$, i.e., knowledge that the event $B$ has occurred has no bearing on the probability that the event $A$ has occurred as well.

• Given that $B$ has occurred with $P(B) > 0$:

– The set of events $\mathcal{F}_B := \{A \cap B \mid A \in \mathcal{F}\}$ is itself a $\sigma$-algebra, and

– $P(\cdot \mid B)$, also a probability measure for $(\Omega, \mathcal{F})$ and $(B, \mathcal{F}_B)$, addresses the residual uncertainty in the random experiment given that the event $B$ has occurred.

– On $(\Omega, \mathcal{F})$, $P(A) = 0 \Rightarrow P(A \mid B) = 0$ $\forall A \in \mathcal{F}$, i.e., $P(\cdot \mid B)$ is absolutely continuous w.r.t. $P$.


Random variables

• A random variable $X$ is a real-valued function with domain $\Omega$, $X : \Omega \to \mathbb{R}$.

• So, $X(\omega)$ is a real number representing some feature of the outcome $\omega$.

• E.g., in a dice-throwing experiment, $X(\omega)$ could be defined as just the sum of the dots on the upward-facing surfaces of outcome $\omega$ (which is the configuration of the dice on the table when they stop moving).

• For random variables, we are typically interested in the probability of the event that $X$ takes values in a contiguous interval $B$ of the real line (including singleton points), or some union of such intervals, i.e.,
$$P(X \in B) := P(\{\omega \in \Omega \mid X(\omega) \in B\}) =: P(X^{-1}(B)).$$

• To ensure that the fundamental probability space $(\Omega, \mathcal{F}, P)$ is capable of evaluating the probabilities of such events, we formally define random variables as being measurable.

• To explain measurability, we need to first define the Borel $\sigma$-algebra of subsets of $\mathbb{R}$ that is generated by contiguous intervals of $\mathbb{R}$.


The Borel $\sigma$-algebra on $\mathbb{R}$

• Consider contiguous intervals of the real line, e.g.,
$$[x, \infty) = \{z \in \mathbb{R} \mid z \geq x\} \quad \text{or} \quad (x, y] = \{z \in \mathbb{R} \mid x < z \leq y\}, \ \text{etc.}$$

• Define $\sigma(\mathcal{A})$ as the smallest $\sigma$-algebra containing all elements of $\mathcal{A}$, i.e., generated by $\mathcal{A}$.

• The Borel $\sigma$-algebra is
$$\mathcal{B} := \sigma(\{[x, \infty) \mid x \in \mathbb{R}\}).$$

• Note that the singleton sets $\{x\} \in \mathcal{B}$ $\forall x \in \mathbb{R}$ and that, e.g.,
$$\mathcal{B} = \sigma(\{(-\infty, x] \mid x \in \mathbb{R}\}) = \sigma(\{[x, y) \mid x \leq y, \ x, y \in \mathbb{Q}\}), \ \text{etc.}$$
To see the first equality, note that $[x, \infty)^c = (-\infty, x)$ and that we can define a monotonically decreasing sequence $x_n$ converging to $x$ so that $\bigcap_{n=1}^\infty (-\infty, x_n) = (-\infty, x]$.

• The Vitali subset of $\mathbb{R}$ is not in the Borel $\sigma$-algebra, i.e., $\mathcal{B} \neq 2^{\mathbb{R}}$.

• Indeed, the cardinality of $\mathcal{B}$ is only that of $\mathbb{R}$.


Measurability of random variables

• Formally, random variables are defined to be measurable with respect to $(\Omega, \mathcal{F})$, i.e.,
$$X^{-1}(B) \in \mathcal{F} \quad \forall B \in \mathcal{B},$$
so that $P(X \in B)$ is well-defined $\forall B \in \mathcal{B}$.

• A random variable $X$ induces a probability measure $P_X$ on $(\mathbb{R}, \mathcal{B})$ (the distribution of $X$),
$$P_X(B) := P(X \in B),$$
so that $(\mathbb{R}, \mathcal{B}, P_X)$ is also a probability space.

• Note: If a function $g : \mathbb{R} \to \mathbb{R}$ is $(\mathbb{R}, \mathcal{B})$-measurable (i.e., $g^{-1}(B) \in \mathcal{B}$ $\forall B \in \mathcal{B}$) and $X$ is a random variable, then $g(X)$ is a random variable too.

• Note: The cumulative distribution function (CDF) of $X$ is just $F_X(x) := P_X((-\infty, x]) = P(X \leq x)$.


Measurable compositions of random variables

If $Y$, $X$ and $X_1, X_2, X_3, \ldots$ are all extended random variables, then the following are also random variables:

• $\min\{X, Y\}$, $\max\{X, Y\}$, $XY$, and $\mathbf{1}\{X \neq 0\}/X$, where $0/0 := 1$ (any fixed convention preserves measurability).

• $\alpha X + \beta Y$ $\forall \alpha, \beta \in \mathbb{R}$.

• $\sup_{n \geq 1} X_n$, $\inf_{n \geq 1} X_n$, $\limsup_{n \to \infty} X_n$, $\liminf_{n \to \infty} X_n$.


$\sigma$-algebra generated by a random variable

• Define $\sigma(X) := \sigma(\{X^{-1}(B) \mid B \in \mathcal{B}\})$, i.e., the smallest $\sigma$-algebra of events for which the random variable $X$ is measurable.

• Note: One can directly show that
$$\sigma(X) = \sigma(\{X^{-1}([x, \infty)) \mid x \in \mathbb{R}\}),$$
i.e., considering only a “generating” subset of $\mathcal{B}$.

• $\sigma(X)$ captures the “information” gained by the knowledge of $X(\omega)$ about the outcome $\omega$.

• E.g., if $X$ is constant, then $\sigma(X) = \{\emptyset, \Omega\}$.

• E.g., if $X = \mathbf{1}_B$ for an event $B \in \mathcal{F}$,

– where the indicator function $\mathbf{1}_B(\omega) = 1$ if $\omega \in B$ and $\mathbf{1}_B(\omega) = 0$ otherwise (one may also write $\mathbf{1}\{B\} := \mathbf{1}_B$),

– i.e., $X$ is a Bernoulli distributed random variable,

– then $\sigma(X) = \{\emptyset, B, B^c, \Omega\}$.

– If the scalars $a \neq b$, then $Y := a\mathbf{1}_B + b\mathbf{1}_{B^c}$ also indicates whether $B$ or $B^c$ has occurred,

– i.e., $\sigma(X) = \sigma(Y)$ in this case (see the sketch below).

• Doob’s theorem: If $Y$ is $\sigma(X)$-measurable (so that $\sigma(Y) \subseteq \sigma(X)$), then $\exists$ Borel measurable $g$ such that $Y = g(X)$ a.s.

• If $g$ is one-to-one, then $\sigma(Y) = \sigma(X)$.
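On a finite $\Omega$, $\sigma(X)$ is just the collection of all unions of the preimage blocks $\{X = v\}$, which makes the indicator example above checkable by brute force. A sketch (the four-point $\Omega$ and the values $a = 5$, $b = 7$ are hypothetical):

```python
# sigma(X) on a finite Omega: all unions of the preimage blocks {X = v}.
from itertools import combinations

Omega = ['a', 'b', 'c', 'd']
B = {'a', 'b'}
X = {w: (1 if w in B else 0) for w in Omega}   # X = indicator of B
Y = {w: (5 if w in B else 7) for w in Omega}   # Y = a*1_B + b*1_{B^c}, a != b

def sigma(Z):
    blocks = [frozenset(w for w in Omega if Z[w] == v) for v in set(Z.values())]
    events = {frozenset()}                     # include the empty set
    for r in range(1, len(blocks) + 1):
        for combo in combinations(blocks, r):
            events.add(frozenset().union(*combo))
    return events

print(sorted(map(sorted, sigma(X))))   # [], ['a','b'], ['c','d'], ['a','b','c','d']
assert sigma(X) == sigma(Y)            # sigma(X) = sigma(Y), as claimed
```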


Independent random variables

• The random variables $X_1, X_2, \ldots, X_n$ are said to be mutually independent (or just “independent”) if and only if
$$P\left(\bigcap_{i=1}^n \{X_i \in B_i\}\right) = \prod_{i=1}^n P(X_i \in B_i) \quad \forall B_1, \ldots, B_n \in \mathcal{B}.$$

• Clearly, mutual independence implies that the joint CDF
$$F_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n) := P\left(\bigcap_{i=1}^n \{X_i \leq x_i\}\right)$$
satisfies
$$F_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n) = \prod_{i=1}^n F_{X_i}(x_i) \quad \forall x_1, \ldots, x_n \in \mathbb{R}.$$

• To prove the converse statement, we will now discuss monotone class theorems.

• Note: The marginal $F_{X_1}(x_1) = F_{X_1, \ldots, X_n}(x_1, \infty, \infty, \ldots, \infty)$.


Monotone class theorems

• C " 2! is a %-class over ! if A,B ! C % A #B ! C.

• C " 2! is a &-class over ! when:

(i) ! ! C;

(ii) if A,B ! C and A " B % B\A := B #Ac ! C; and

(iii) if A1, A2, ... ! C is monotonically increasing(An " An+1 'n), then $*

n=1An ! C.

• Proposition: If C is a &-class over !, then

(a) A ! C % Ac ! C (by (i) and (ii), !\A / Ac ! C).

(b) A,B ! C and A # B = ) (i.e., they’re disjoint), thenA$B ! C (A " Bc % (Bc\A)c / B$A ! C by (ii) and (a)).

(c) if C is also a %-class, then C is a !-algebra.

The proof of (c) is left as an exercise.

• Because of the conditions on (ii) or (b), a &-class seems lessinclusive than a !-algebra.


Dynkin’s theorem

If $\mathcal{D}$ is a $\pi$-class, $\mathcal{C}$ a $\lambda$-class, and $\mathcal{D} \subseteq \mathcal{C}$, then $\sigma(\mathcal{D}) \subseteq \mathcal{C}$.

Proof:

• Define $\mathcal{G}$ as the smallest $\lambda$-class such that $\mathcal{D} \subseteq \mathcal{G}$; thus $\mathcal{D} \subseteq \mathcal{G} \subseteq \mathcal{C}$.

• We now prove $\mathcal{G}$ is also a $\pi$-class; the theorem then follows by the previous proposition (c), since $\mathcal{G}$ is then a $\sigma$-algebra containing $\mathcal{D}$, so that $\sigma(\mathcal{D}) \subseteq \mathcal{G} \subseteq \mathcal{C}$.

• To this end, define $\mathcal{H} := \{A \subseteq \Omega \mid A \cap D \in \mathcal{G} \ \forall D \in \mathcal{D}\}$:

– Since $\mathcal{D} \subseteq \mathcal{G}$ and $\mathcal{D}$ is a $\pi$-class, $\mathcal{D} \subseteq \mathcal{H}$.

– Check that $\mathcal{H}$ is a $\lambda$-class $\Rightarrow$ (by minimality) $\mathcal{G} \subseteq \mathcal{H}$.

– Thus, $A \in \mathcal{G}$ ($\Rightarrow A \in \mathcal{H}$) and $D \in \mathcal{D}$ $\Rightarrow$ $A \cap D \in \mathcal{G}$.

• Now define $\mathcal{F} := \{B \subseteq \Omega \mid B \cap A \in \mathcal{G} \ \forall A \in \mathcal{G}\}$:

– By the previous step, $\mathcal{D} \subseteq \mathcal{F}$.

– Check that $\mathcal{F}$ is a $\lambda$-class $\Rightarrow$ (by minimality) $\mathcal{G} \subseteq \mathcal{F}$.

– Thus, $\mathcal{G}$ is a $\pi$-class.

Note: So, a $\lambda$-class can be much larger than a $\pi$-class.

Classical monotone class theorem: if $\mathcal{D}$ is an algebra, $\mathcal{C}$ a monotone class (contains all limits of its monotone sequences), and $\mathcal{D} \subseteq \mathcal{C}$, then $\sigma(\mathcal{D}) \subseteq \mathcal{C}$.


Independence in the probability space $(\Omega, \mathcal{F}, P)$

• Lemma: If $\mathcal{D}, \mathcal{C} \subseteq \mathcal{F}$ are independent classes of events and $\mathcal{D}$ is a $\pi$-class, then $\sigma(\mathcal{D})$ and $\mathcal{C}$ are independent.

Proof:

– Take an arbitrary $B \in \mathcal{C}$ and define $\mathcal{D}_B = \{A \in \sigma(\mathcal{D}) \mid P(A \cap B) = P(A)P(B)\}$.

– $\mathcal{D} \subseteq \mathcal{D}_B$.

– Check that $\mathcal{D}_B$ is a $\lambda$-class.

– Apply Dynkin’s theorem.

• Theorem: If the joint CDF $F_{X_1, \ldots, X_n} \equiv \prod_{i=1}^n F_{X_i}$ (i.e., the LHS and RHS are equal at all points in $\mathbb{R}^n$), then the $n$ random variables $X_1, \ldots, X_n$ are independent.

Proof:

– Define $\mathcal{D}_k = \{\{X_k \leq x\} \mid -\infty \leq x \leq \infty\}$.

– Note: $x = -\infty \Rightarrow \emptyset \in \mathcal{D}_k$.

– Check that $\mathcal{D}_k$ is a $\pi$-class $\forall k$.

– Finally, use the lemma to obtain independence of the $\sigma(\mathcal{D}_k) = \sigma(X_k)$.


Conditional Independence

• Events $A$ and $C$ are said to be independent given $B$ if
$$P(A \mid B, C) = P(A \mid B).$$

• Note that this implies $P(C \mid B, A) = P(C \mid B)$.

• This is a natural extension of the unqualified notion of independent events, i.e., events $A$ and $C$ are (unconditionally) independent if $P(A \mid C) = P(A)$.

• Similarly, random variables $X$ and $Y$ are conditionally independent given $Z$ if
$$P(X \in A \mid Z \in B, Y \in C) = P(X \in A \mid Z \in B)$$
for all Borel $A, B, C \subseteq \mathbb{R}$; cf. Markov processes.


Expectation

• The expectation $EX$ of a random variable $X$ is simply its average or mean value, which can be expressed as the Riemann–Stieltjes integral
$$EX = \int_{-\infty}^{\infty} x \, dF_X(x);$$
recall that the CDF $F_X$ is nondecreasing on $\mathbb{R}$.

• In the special case of a differentiable $F_X$ with probability density function (PDF) $f_X = F_X'$, i.e., $X$ is continuously distributed, we can use the Riemann integral
$$EX = \int_{-\infty}^{\infty} x f_X(x) \, dx.$$

• In the case of a discretely distributed random variable $X$ with countable state space $R_X$:

– $F_X'(x) = \sum_{\ell \in R_X} p_X(\ell) \, \delta(x - \ell)$, where

– $\delta$ is the Dirac unit impulse and the probability mass function (PMF) $p_X(\ell) := P(X = \ell) > 0$ for all $\ell \in R_X$, so that
$$EX = \sum_{\ell \in R_X} \ell \, p_X(\ell).$$

(Both cases are computed numerically in the sketch below.)
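A numerical sketch of the two cases above; the die PMF and the exponential PDF with rate $\lambda = 2$ are assumed examples:

```python
import numpy as np

# discrete: X uniform on {1,...,6}, so E X = sum of v * p_X(v)
values, pmf = np.arange(1, 7), np.full(6, 1 / 6)
print(values @ pmf)                              # 3.5

# continuous: f_X(x) = lam * exp(-lam * x) on [0, inf), so E X = 1/lam
lam, dx = 2.0, 1e-4
x = np.arange(0.0, 50.0, dx)                     # truncate the infinite range
print(np.sum(x * lam * np.exp(-lam * x)) * dx)   # ~0.5 (Riemann sum)
```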


Lebesgue integration

• Formally, the Lebesgue integral is used to define expectation:
$$EX = \int_\Omega X(\omega) \, dP(\omega) = \int_\Omega X \, dP.$$

• Recall that $\Omega$ is generally abstract and unordered.

• If $X$ is simple (discretely distributed with $|R_X| = M < \infty$),

– define the state space $R_X = \{\ell_1, \ell_2, \ldots, \ell_M\}$ and

– the events $A_i := \{X = \ell_i\} := \{\omega \in \Omega \mid X(\omega) = \ell_i\}$, which a.s. partition $\Omega$,

– so that the (well-defined) Lebesgue integral is
$$\int_\Omega X(\omega) \, dP(\omega) = \sum_{i=1}^M \ell_i P(A_i),$$
i.e., $P(A_i) = p_X(\ell_i)$ for all $i$.


Lebesgue integration (cont)

• To develop the general Lebesgue integral, we need to consider “extended” random variables $X : \Omega \to \bar{\mathbb{R}}$, where

– the extended reals $\bar{\mathbb{R}} := \mathbb{R} \cup \{\pm\infty\}$, and

– measurability involves Borel sets that include $\pm\infty$.

• For any sequence $X_1, X_2, X_3, \ldots$ of random variables, the following are extended random variables:

– $\lim_{n\to\infty} X_n$ (assuming the limit in $n$ of $X_n(\omega)$ exists $\forall \omega \in \Omega$), and

– $\sup_{n \geq 1} X_n$.


Approximating random variables

• Proposition: For any non-negative extended random variable $X$, there is a sequence of simple random variables $X_n$ such that

(a) $P(0 \leq X_n \leq X_{n+1} \leq X) = 1$ for all $n$ (i.e., monotonicity), and

(b) $X_n \to X$ almost surely (a.s.), i.e., almost-sure convergence:
$$P\left(\lim_{n\to\infty} X_n = X\right) = 1.$$

• Proof:

1. Define the $n$th partition of $\mathbb{R}^+$ (i.e., of the $y$-axis, unlike Riemannian integration) into a finite collection of contiguous intervals
$$\{[b_n^k, b_n^{k+1})\}_{k=0}^{K_n},$$
where $\forall n$: $b_n^0 = 0$ and $b_n^{K_n+1} = \infty$ (the last interval includes $\infty$).

2. $\forall k$, define $X_n(\omega) = b_n^k$ $\forall \omega \in X^{-1}([b_n^k, b_n^{k+1}))$, i.e., $X_n \leq X$ a.s.

3. The $(n+1)$st partition is finer than the $n$th (i.e., $X_n \leq X_{n+1}$), in such a way that $\lim_{n\to\infty} K_n \uparrow \infty$, to achieve (b); the standard dyadic choice $b_n^k = k2^{-n}$ is used in the sketch below.


Construction of the Lebesgue integral

• So, for a non-negative extended random variable, the Lebesgue integral is defined as
$$\int_\Omega X \, dP = \lim_{n\to\infty} \int_\Omega X_n \, dP,$$
i.e., $EX = \lim_{n\to\infty} EX_n$.

• For a signed extended random variable:

1. Note that $X = X^+ - X^-$, where the non-negative extended random variables
$$X^+ := \max\{0, X\} \quad \text{and} \quad X^- := \max\{0, -X\}.$$

2. If $EX^+ < \infty$ or $EX^- < \infty$, then define the Lebesgue integral
$$EX = EX^+ - EX^-;$$
otherwise, the Lebesgue integral $EX$ is not defined. Note: $|X| = X^+ + X^-$.

• E.g., $\mathbf{1}_{\mathbb{Q}^c}$ is Lebesgue but not Riemann integrable:
$$\int_0^1 \mathbf{1}_{\mathbb{Q}^c}(x) \, dx = 1 \cdot P(\mathbb{Q}^c \cap [0,1]) + 0 \cdot P(\mathbb{Q} \cap [0,1]) = 1,$$
where here $P$ is Lebesgue measure on $\Omega = [0,1]$, so that $P(\mathbb{Q} \cap [0,1]) = 0$ as the rationals are countable.


Integration theorems (one-variable integrand)

Consider a sequence $X_1, X_2, \ldots$ of random variables.

• Bounded convergence theorem: if $\sup_n |X_n| \leq K < \infty$ a.s. (where $K$ is constant) and $X = \lim_{n\to\infty} X_n$ a.s., then $EX = \lim_{n\to\infty} EX_n$ and $E|X| \leq K$.

• Proof: Define $A_n = \{|X - X_n| > \epsilon\}$ for arbitrary positive $\epsilon \ll 1$ and note
$$|EX_n - EX| \leq E|X_n - X| = E|X_n - X|\mathbf{1}_{A_n} + E|X_n - X|\mathbf{1}_{A_n^c} \leq 2K P(A_n) + \epsilon,$$
where $P(A_n) \to 0$ by the a.s. convergence.

• Now note that
$$\left(\liminf_{n\to\infty} X_n\right)(\omega) := \lim_{n\to\infty} \inf_{k \geq n} X_k(\omega)$$
always exists (though possibly not finite), since $Y_n := \inf_{k \geq n} X_k$ is a.s. monotonically nondecreasing in $n$.

• Fatou’s lemma (for non-negative $X_n$): $\liminf_{n\to\infty} EX_n \geq E(\liminf_{n\to\infty} X_n)$.

• Proof:

– Let $X := \liminf_{n\to\infty} X_n$.

– For any $K > 0$, invoke the bounded convergence theorem on $\min\{Y_n, K\} \uparrow \min\{X, K\}$, noting $EX_n \geq EY_n \geq E\min\{Y_n, K\}$.

– Approximating with simple RVs and using monotonicity, $\lim_{K\to\infty} E\min\{X, K\} = EX$.


Integration theorems (cont)

Let $X$ be the extended RV such that $X_n \to X$ a.s.

• Monotone convergence theorem: if $X_n \uparrow X$ a.s., then
$$EX_n \uparrow EX \ \text{as } n \to \infty.$$

• Lebesgue’s dominated convergence theorem: if there exists a random variable $Y$ such that $|X_n| \leq |Y|$ a.s. $\forall n$ and $E|Y| < \infty$, then
$$\lim_{n\to\infty} E|X - X_n| = 0.$$

• Corollary (Scheffé):
$$\lim_{n\to\infty} E|X_n - X| = 0 \iff \lim_{n\to\infty} E|X_n| = E|X|.$$


Conditional expected value and event-conditional distributions

Consider a random variable $X$ and an event $A$ such that $P(A) > 0$.

• The conditional expected value of $X$ given $A$, denoted $\mu(X \mid A)$, is
$$\mu(X \mid A) = \int_{-\infty}^{\infty} x \, dF_{X|A}(x), \quad \text{where } F_{X|A}(z) := P(X \leq z \mid A).$$

• For a discretely distributed random variable $X$ with $R_X = \{a_j\}_{j=1}^\infty$,
$$\mu(X \mid A) = \sum_{j=1}^\infty a_j P(X = a_j \mid A).$$

• The conditional PMF of $X$ given $A$ is $p_{X|A}(a_j) = P(X = a_j \mid A)$ for all $j$.

• The event-conditional PDF of a continuously distributed $X$ is
$$f_{X|A}(x) := \frac{d}{dx} F_{X|A}(x) \;\Rightarrow\; \mu(X \mid A) = \int_{-\infty}^{\infty} x f_{X|A}(x) \, dx.$$


Conditional expectation

• Consider now two discretely distributed random variables $X$ and $Y$.

• The conditional expectation of $X$ given the random variable $Y$, denoted $E(X \mid Y)$, is a random variable itself.

• Indeed, suppose $\{b_j\}_{j=1}^\infty = R_Y$ and, for all samples
$$\omega_j \in \{\omega \in \Omega \mid Y(\omega) = b_j\} =: B_j,$$
define
$$E(X \mid Y)(\omega_j) := \mu(X \mid B_j) := \mu(X \mid Y = b_j).$$

• That is, $E(X \mid Y)$ maps all samples in the event $B_j$ to the conditional expected value $\mu(X \mid B_j)$, i.e., $E(X \mid Y)$ is “smoother” (less uncertain) than $X$.

• Therefore, the random variable $E(X \mid Y)$ is a.s. a function of $Y$, i.e., $E(X \mid Y)$ is $\sigma(Y)$-measurable.

• So, $E(X \mid Y) = E(X \mid Z)$ a.s. whenever $\sigma(Z) = \sigma(Y)$, allowing for differences involving $P$-null events.


Conditional densities

• Now consider two random variables $X$ and $Y$ which are continuously distributed with joint PDF
$$f_{X,Y} = \frac{\partial^2 F_{X,Y}}{\partial x \, \partial y}.$$

• For $f_Y(y) > 0$, we can define the conditional density
$$f_{X|Y}(x \mid y) := \frac{f_{X,Y}(x, y)}{f_Y(y)} \quad \text{for all } x \in \mathbb{R}.$$

• Note that $f_{X|Y}(\cdot \mid y)$ is itself a PDF and
$$\mu(X \mid Y = y) = \int_{-\infty}^{\infty} x f_{X|Y}(x \mid y) \, dx,$$
even though $P(Y = y) = 0$.


Conditional expectation and MSE

• In general, $E(X \mid Y)$ is the function of $Y$ which minimizes the mean-square error (MSE),
$$E[(X - h(Y))^2],$$
among all (measurable) functions $h$ (see the sketch below).

• So, $E(X \mid Y)$ is the best approximation of $X$ given $Y$.

• In particular, $E(X \mid Y)$ and $X$ have the same expectation:
$$E(E(X \mid Y)) = EX.$$

• Note: if $X$ and $Y$ are independent,
$$E(X \mid Y) = EX \ \text{a.s.}$$
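A Monte Carlo sketch of the MSE-optimality claim for a discretely distributed $Y$; the toy variables and the competing (linear) $h$ are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
Y = rng.integers(0, 3, size=200_000)              # R_Y = {0, 1, 2}
X = Y**2 + rng.normal(size=Y.size)                # X depends on Y, plus noise

mu = {b: X[Y == b].mean() for b in range(3)}      # mu(X | Y = b) per level b
EXgY = np.array([mu[b] for b in Y])               # the random variable E(X|Y)

mse_opt = np.mean((X - EXgY) ** 2)                # MSE of E(X|Y)
mse_other = np.mean((X - (2.0 * Y - 0.3)) ** 2)   # MSE of some other h(Y)
print(mse_opt, mse_other)                         # mse_opt is smaller
print(EXgY.mean(), X.mean())                      # equal: E(E(X|Y)) = E X
```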


Some useful inequalities

• If event A1 " A2, thenP(A1) - P(A2) = P(A1) + P(A2\A1).

• For any group of events A1, A2, ..., An, Boole’s inequalityholds:

P

!

n"

i=1

Ai

#

-n$

i=1

P(Ai).

• Note that when the Ai are disjoint, equality holds simply bythe additivity property of a probability measure P and recallthe inclusion-exclusion set identities.

• If two random variables X and Y are such that X , Y a.s.,then EX , EY .

• Recall Fatou’s lemma.


Markov’s Inequality

• Consider a random variable $X$ with $E|X| < \infty$ and a real number $x > 0$.

• Since $|X| \geq |X|\mathbf{1}\{|X| \geq x\} \geq x\mathbf{1}\{|X| \geq x\}$ a.s., we arrive at Markov’s inequality:
$$E|X| \geq E\,x\mathbf{1}\{|X| \geq x\} = x E\mathbf{1}\{|X| \geq x\} = x P(|X| \geq x).$$

• An alternative explanation for continuously distributed random variables $X$ (with PDF $f$) is
$$E|X| = \int_{-\infty}^{\infty} |z| f(z) \, dz \geq \int_{-\infty}^{-x} (-z) f(z) \, dz + \int_x^{\infty} z f(z) \, dz \geq \int_{-\infty}^{-x} x f(z) \, dz + \int_x^{\infty} x f(z) \, dz = x P(|X| \geq x).$$


Chebyshev’s and Chernoff’s (Cramér’s) inequalities

• Take $x = \epsilon^2$, where $\epsilon > 0$, and argue Markov’s inequality with $(X - EX)^2$ in place of $|X|$ to get Chebyshev’s inequality:
$$\mathrm{var}(X) := E[(X - EX)^2] \geq \epsilon^2 P(|X - EX| \geq \epsilon),$$
i.e.,
$$P(|X - EX| \geq \epsilon) \leq \epsilon^{-2}\,\mathrm{var}(X).$$

• Noting that, for all $\theta > 0$, $\{X \geq x\} = \{e^{\theta X} \geq e^{\theta x}\}$, and arguing as for Markov’s inequality, gives the Chernoff (or Cramér) inequality:
$$Ee^{\theta X} \geq e^{\theta x} P(X \geq x) \;\Rightarrow\; P(X \geq x) \leq \exp\left(-\left[x\theta - \log Ee^{\theta X}\right]\right) \leq \exp\left(-\max_{\theta > 0}\left[x\theta - \log Ee^{\theta X}\right]\right),$$
where we have simply sharpened the inequality by taking the maximum over the free parameter $\theta > 0$.

• Note the Legendre transform of the log moment-generating function of $X$ in the Chernoff bound. (All three bounds are compared numerically in the sketch below.)
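A numerical comparison of the three bounds for $X$ exponentially distributed with rate 1, an assumed example for which $P(X \geq x) = e^{-x}$, $EX = \mathrm{var}(X) = 1$, and $\log Ee^{\theta X} = -\log(1 - \theta)$ for $\theta \in (0, 1)$:

```python
import numpy as np

for x in [2.0, 4.0, 8.0]:
    true_tail = np.exp(-x)                     # P(X >= x) for exponential(1)
    markov = 1.0 / x                           # E|X| / x, with E X = 1
    chebyshev = 1.0 / (x - 1.0) ** 2           # P(|X - 1| >= x - 1) <= var/(x-1)^2
    theta = 1.0 - 1.0 / x                      # maximizes x*theta + log(1 - theta)
    chernoff = np.exp(-(x * theta + np.log(1.0 - theta)))
    print(x, true_tail, markov, chebyshev, chernoff)
```

The Chernoff bound decays exponentially in $x$ like the true tail, while the Markov and Chebyshev bounds decay only polynomially.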


Inequalities of Minkowski, Hölder, and Cauchy-Schwarz-Bunyakovsky

• Minkowski’s inequality: if $E|X|^q, E|Y|^q < \infty$ for $q \geq 1$, then
$$(E|X + Y|^q)^{1/q} \leq (E|X|^q)^{1/q} + (E|Y|^q)^{1/q},$$
i.e., the triangle inequality in the $L^q$ space of random variables.

• Hölder’s inequality: if $E|X|^r, E|Y|^q < \infty$ for $r > 1$ and $q^{-1} := 1 - r^{-1}$, then
$$E|XY| \leq (E|X|^r)^{1/r}(E|Y|^q)^{1/q}.$$

• CBS inequality ($r = q = 2$): if $EX^2, EY^2 < \infty$, then
$$E|XY| \leq \sqrt{E(X^2)}\sqrt{E(Y^2)}.$$

• CBS is strict unless $X = cY$ a.s. for some constant $c$, or $Y = 0$ a.s.

• CBS is an immediate consequence of the fact that, whenever neither $X$ nor $Y$ is a.s. zero,
$$E\left(\frac{X}{\sqrt{E(X^2)}} - \frac{Y}{\sqrt{E(Y^2)}}\right)^2 \geq 0.$$


Jensen’s Inequality

• Note that if we take $Y = 1$ a.s., the Cauchy-Schwarz-Bunyakovsky inequality simply states that $\mathrm{var}(X) \geq 0$, i.e.,
$$E(X^2) - (EX)^2 \geq 0.$$

• This is also an immediate consequence of Jensen’s inequality.

• A real-valued function $g$ on $\mathbb{R}$ is said to be convex if
$$g(px + (1-p)y) \leq p g(x) + (1-p) g(y)$$
for any $x, y \in \mathbb{R}$ and any real fraction $p \in [0, 1]$.

• If the inequality is reversed, $g$ is said to be concave.

• For any convex function $g$ and random variable $X$, we have Jensen’s inequality:
$$g(EX) \leq E(g(X)).$$


Inequalities: Conditioned versions

• These inequalities and integration theorems have straightforward “conditional” extensions.

• E.g., the conditional Jensen’s inequality: if $g$ is convex and $\mathcal{G} \subseteq \mathcal{F}$ is a $\sigma$-algebra,
$$E(g(X) \mid \mathcal{G}) \geq g(E(X \mid \mathcal{G})).$$

• Applying conditional Jensen’s with $g(x) = |x|^q$ for real $q \geq 1$, and using the linearity of conditional expectation, we get the following result.

• If $X$ and $X_1, X_2, X_3, \ldots$ are random variables such that, for real $q \geq 1$, $E|X|^q < \infty$ and $E|X_n|^q < \infty$ (i.e., $X, X_n \in L^q$) $\forall n$, then:

(a) $\|E(X \mid \mathcal{G})\|_q \leq \|X\|_q := (E(|X|^q))^{1/q}$, and, therefore,

(b) if $\lim_{n\to\infty} \|X - X_n\|_q = 0$, i.e., convergence in $L^q$, then
$$\lim_{n\to\infty} \|E(X \mid \mathcal{G}) - E(X_n \mid \mathcal{G})\|_q = 0.$$


Sums of independent random variables

• Consider two independent random variables $X_1$ and $X_2$ with PDFs $f_1$ and $f_2$, respectively; so $f_{X_1, X_2} = f_1 f_2$.

• The CDF of the sum is
$$F(z) = P(X_1 + X_2 \leq z) = \int_{-\infty}^{\infty} \int_{-\infty}^{z - x_1} f_1(x_1) f_2(x_2) \, dx_2 \, dx_1.$$

• Exchanging the first integral on the RHS with a derivative w.r.t. $z$ gives the PDF of $X_1 + X_2$:
$$f(z) = \frac{d}{dz} F(z) = \int_{-\infty}^{\infty} f_1(x_1) f_2(z - x_1) \, dx_1 \quad \text{for all } z \in \mathbb{R}.$$

• Thus, $f$ is the convolution of $f_1$ and $f_2$, which is denoted $f = f_1 * f_2$ (checked numerically in the sketch below).
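A numerical sketch of $f = f_1 * f_2$ for two independent Uniform(0,1) variables (an assumed example, for which the convolution is the triangle density on $[0, 2]$), checked against a sampled histogram:

```python
import numpy as np

dx = 1e-3
grid = np.arange(0.0, 1.0, dx)
f1 = f2 = np.ones_like(grid)                 # Uniform(0,1) PDFs
f = np.convolve(f1, f2) * dx                 # numeric f1 * f2, supported on [0, 2)
zf = np.arange(f.size) * dx                  # abscissae for f

rng = np.random.default_rng(2)
s = rng.random(1_000_000) + rng.random(1_000_000)
hist, edges = np.histogram(s, bins=40, range=(0, 2), density=True)
z = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(hist - np.interp(z, zf, f))))   # small discrepancy
```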


Sums of independent random variables

• In this context, moment generating functions can be used to simplify calculations.

• Let the MGF (bilateral Laplace transform) of $X_i$ be
$$m_i(\theta) = Ee^{\theta X_i} = \int_{-\infty}^{\infty} f_i(x) e^{\theta x} \, dx.$$

• The MGF of $X_1 + X_2$ is, by independence,
$$m(\theta) = Ee^{\theta(X_1 + X_2)} = Ee^{\theta X_1} e^{\theta X_2} = m_1(\theta) m_2(\theta).$$

• So, convolution of PDFs corresponds to simple multiplication of MGFs (and to addition of independent random variables).


Example: exponential and gamma distributions

Consider independent random variables that are all exponentially distributed with mean $1/\lambda$.

• The PDF of $X_1 + X_2$ is $f$, where $f(z) = 0$ for $z < 0$ and, for $z \geq 0$,
$$f(z) = \int_0^z f_1(x_1) f_2(z - x_1) \, dx_1 = \lambda^2 z e^{-\lambda z},$$
i.e., the $(n, \lambda)$ gamma distribution with $n = 2$ (a.k.a. the Erlang distribution when $n \in \mathbb{Z}^+$); see the simulation sketch below.

• So, the MGF of $X_1 + X_2$ is
$$m(\theta) = \left(\frac{\lambda}{\lambda - \theta}\right)^2,$$
which is consistent with the PDF just computed.

• There is a 1-to-1 relationship between PDFs and MGFs of nonnegative random variables (unilateral Laplace transform).

• So, for a sum of $n$ such random variables,
$$m(\theta) = \left(\frac{\lambda}{\lambda - \theta}\right)^n \iff f_n(z) = \frac{\lambda^n z^{n-1} e^{-\lambda z}}{(n-1)!} \quad \forall z \geq 0.$$

• Note: The construction of continuous-time Markov chains is based on the memoryless property, which is unique to the exponential distribution.
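A simulation sketch of the $n = 2$ Erlang density derived above, with an assumed rate $\lambda = 1.5$:

```python
import math
import numpy as np

lam, n = 1.5, 2
rng = np.random.default_rng(3)
s = rng.exponential(scale=1 / lam, size=(1_000_000, n)).sum(axis=1)

hist, edges = np.histogram(s, bins=200, range=(0, 8), density=True)
z = 0.5 * (edges[:-1] + edges[1:])
erlang = lam**n * z**(n - 1) * np.exp(-lam * z) / math.factorial(n - 1)
print(np.max(np.abs(hist - erlang)))     # ~0, up to Monte Carlo error
```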


The Gaussian distribution

Assume $X_i$ is Gaussian (normally) distributed with mean $\mu_i$ and variance $\sigma_i^2$, i.e., $X_i \sim N(\mu_i, \sigma_i^2)$.

• If the RVs are independent, the MGF of $X_1 + X_2$ is
$$m(\theta) = \exp\left(\mu_1\theta + \tfrac{1}{2}\sigma_1^2\theta^2\right) \times \exp\left(\mu_2\theta + \tfrac{1}{2}\sigma_2^2\theta^2\right) = \exp\left((\mu_1 + \mu_2)\theta + \tfrac{1}{2}(\sigma_1^2 + \sigma_2^2)\theta^2\right),$$
which we also recognize as a Gaussian MGF.

• Even if dependent (but jointly Gaussian), $\alpha_1 X_1 + \alpha_2 X_2$, for scalars $\alpha_i$, is Gaussian distributed with mean $\alpha_1\mu_1 + \alpha_2\mu_2$ and variance $\alpha_1^2\sigma_1^2 + \alpha_2^2\sigma_2^2 + 2\alpha_1\alpha_2\,\mathrm{cov}(X_1, X_2)$, where the covariance $\mathrm{cov}(X_1, X_2) := EX_1X_2 - EX_1EX_2$.

• $X = (X_1, X_2, \ldots, X_n)$ are jointly Gaussian if
$$f_X(x) = \frac{1}{(2\pi)^{n/2}\sqrt{\det(C)}} \exp\left(-\tfrac{1}{2}(x - EX)^{\mathrm{T}} C^{-1} (x - EX)\right),$$
where the (symmetric) covariance matrix is $C = E(X - EX)(X - EX)^{\mathrm{T}}$.

• Note: For jointly Gaussian $(X_1, X_2)$, $E(X_1 \mid X_2) = EX_1 + (X_2 - EX_2)\,\mathrm{cov}(X_1, X_2)/\mathrm{var}(X_2)$ is a.s. linear in $X_2$, and

• if jointly Gaussian $X_1, X_2$ are uncorrelated (diagonal covariance matrix), then $X_1, X_2$ are independent (the converse, that independence implies uncorrelatedness, is always true).


de Moivre’s formula

There is a constant $\kappa > 0$ such that $n! e^n \sim \kappa n^n \sqrt{n}$.

Proof:

• Define $B(n) = n! e^n / (n^n \sqrt{n})$.

• $\log B(n) = 1 + \sum_{j=2}^n [\log B(j) - \log B(j-1)]$, where $1 = \log B(1)$.

• By Taylor’s theorem,
$$\log(1 - x) = -x - x^2/2 - x^3/3 + o(x^3), \quad \text{where } \lim_{y\to 0} \frac{o(y)}{y} = 0.$$

• So,
$$\log B(j) - \log B(j-1) = 1 + \left(j - \tfrac{1}{2}\right)\log\left(1 - \tfrac{1}{j}\right) = -\frac{1}{12j^2} + o\left(\frac{1}{j^2}\right).$$

• Since $j^{-2}$ is summable, $B(n)$ converges to a finite $\kappa$.


de Moivre-Laplace Central Limit Theorem (CLT)

• Consider a sequence of independent and identically distributed (i.i.d.) Bernoulli random variables $X_1, X_2, \ldots$, where
$$p := P(X_i = 1) \quad \text{and} \quad q := 1 - p = P(X_i = 0).$$

• Define the sum $S_n = X_1 + X_2 + \cdots + X_n$.

• $S_n$ is binomially distributed with parameters $(n, p)$, i.e.,
$$P(S_n = k) = \binom{n}{k} p^k q^{n-k} \quad \text{for all } k \in \{0, 1, 2, \ldots, n\}.$$

• $ES_n = np$ and the variance $\mathrm{var}(S_n) = ES_n^2 - (ES_n)^2 = npq$.

• Thus $Y_n := (S_n - np)/\sqrt{npq}$ is centered ($EY_n = 0$) and has unit variance $\mathrm{var}(Y_n) = 1$ for all $n$.

• Theorem (de Moivre–Laplace CLT): If the $X_i$ are i.i.d. Bernoulli random variables, then $Y_n$ defined above converges in distribution to a standard normal (Gaussian), i.e.,
$$\lim_{n\to\infty} P(Y_n > y) = \overline{\Phi}(y) := \int_y^\infty \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \, dx$$
(compared numerically in the sketch below).
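A numerical sketch of the theorem: exact binomial tails of $Y_n$ against the standard normal tail $\overline{\Phi}$; the values $n$, $p$, and the test points $y$ are assumed examples:

```python
import math
import numpy as np

n, p = 2000, 0.3
q = 1.0 - p
k = np.arange(n + 1)
logC = np.array([math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                 for i in k])                   # log binomial coefficients
pmf = np.exp(logC + k * math.log(p) + (n - k) * math.log(q))

for y in [0.5, 1.0, 2.0]:
    tail = pmf[k > n * p + y * math.sqrt(n * p * q)].sum()   # P(Y_n > y)
    phibar = 0.5 * math.erfc(y / math.sqrt(2.0))             # normal tail
    print(y, tail, phibar)
```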


de Moivre-Laplace CLT: Proof

$$P\left(a < Y_n := \frac{S_n - np}{\sqrt{npq}} \leq b\right) = \sum_{np + a\sqrt{npq} < k \leq np + b\sqrt{npq}} \binom{n}{k} p^k q^{n-k} = \sum_{a\sqrt{npq} < k' \leq b\sqrt{npq}} \binom{n}{k' + np} p^{k' + np} q^{nq - k'},$$
where the sums are over integers $k, k'$. Using de Moivre’s formula to uniformly approximate $\binom{n}{k' + np}$ over $k'$ as $n \to \infty$:
$$P(a < Y_n \leq b) \sim \frac{1}{\kappa\sqrt{npq}} \sum_{a\sqrt{npq} < k' \leq b\sqrt{npq}} \left(1 + \frac{k'}{np}\right)^{-k' - np} \left(1 - \frac{k'}{nq}\right)^{-nq + k'}$$
$$\sim \frac{1}{\kappa\sqrt{npq}} \sum_{a\sqrt{npq} < k' \leq b\sqrt{npq}} \exp\left(-\frac{(k')^2}{2npq}\right) \;\to_{n\to\infty}\; \int_a^b \frac{e^{-x^2/2}}{\kappa} \, dx = \frac{\sqrt{2\pi}}{\kappa}\left(\overline{\Phi}(a) - \overline{\Phi}(b)\right),$$
where the second step uses $\log(1 - x) = -x - x^2/2 + o(x^2)$ and the third is a Riemann-integral limit.

Taking $-a, b \to \infty$ gives Wallis’ identity: $\kappa = \sqrt{2\pi}$.


Stirling’s formula

• de Moivre’s formula with Wallis’ identity gives Stirling’s formula:
$$n! e^n \sim n^n \sqrt{2\pi n}.$$

• In the following, we prove a more general CLT for sequences of i.i.d. (and then merely independent) random variables.


An i.i.d. CLT

Theorem: If $X_1, X_2, \ldots$ are i.i.d. with $E|X_1| < \infty$ and $0 < \sigma^2 := \mathrm{var}(X_1) < \infty$, then
$$Y_n := \frac{S_n - n\mu}{\sigma\sqrt{n}} \to_d N(0, 1),$$
i.e., $Y_n$ converges in distribution to a standard normal.

Proof:

• Taylor’s theorem: $e^{ix} = 1 + ix - \tfrac{1}{2}x^2 + R(x)$, where $|R(x)| \leq |x|^3$.

• For $|x| > 4$: $|R(x)| \leq |e^{ix}| + 1 + |x| + \tfrac{1}{2}x^2 \leq x^2$,
$$\Rightarrow\; |R(x)| \leq \min\{|x|^3, x^2\}.$$

• So, if $X \sim (X_1 - EX_1)/\sigma$, then the characteristic function of $Y_n$ is $Ee^{itY_n} = \left(Ee^{itX/\sqrt{n}}\right)^n$

$$\Rightarrow\; Ee^{itY_n} = \left[1 + iE\left(\frac{tX}{\sqrt{n}}\right) - \frac{E(tX)^2}{2n} + ER\left(\frac{tX}{\sqrt{n}}\right)\right]^n = \left[1 - \frac{t^2}{2n} + ER\left(\frac{tX}{\sqrt{n}}\right)\right]^n,$$
where, by the dominated convergence theorem,
$$\left|ER\left(\frac{tX}{\sqrt{n}}\right)\right| \leq n^{-1} E\min\left\{\frac{|tX|^3}{\sqrt{n}}, (tX)^2\right\} = n^{-1} o(1).$$

$$\Rightarrow\; \lim_{n\to\infty} Ee^{itY_n} = \lim_{n\to\infty} \left(1 - \frac{t^2}{2n} + \frac{o(1)}{n}\right)^n = e^{-t^2/2},$$
the characteristic function of $N(0, 1)$, so the claim follows by Lévy’s continuity theorem (a numerical illustration follows below).
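A sketch of the characteristic-function convergence just derived, for standardized sums of centered Uniform$(-1, 1)$ variables, an assumed example with variance $\sigma^2 = 1/3$:

```python
import numpy as np

rng = np.random.default_rng(7)
sigma = np.sqrt(1.0 / 3.0)                     # std. dev. of Uniform(-1, 1)
t = np.array([0.5, 1.0, 2.0])
for n in [1, 4, 32, 256]:
    Yn = rng.uniform(-1, 1, (20_000, n)).sum(axis=1) / (sigma * np.sqrt(n))
    phi = np.exp(1j * np.outer(t, Yn)).mean(axis=1)   # empirical E e^{i t Y_n}
    print(n, np.round(np.abs(phi - np.exp(-t**2 / 2)), 3))  # gap shrinks in n
```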


Trotter’s proof of the i.i.d. CLT: Preliminaries

• Let $C$ be the set of bounded, uniformly continuous functions on $\mathbb{R}$, i.e., if $f \in C$ then $\forall \epsilon > 0$ $\exists \delta > 0$ such that $\forall x, y \in \mathbb{R}$: $|x - y| < \delta \Rightarrow |f(x) - f(y)| < \epsilon$.

• A transformation (function, operator) $T : C \to C$ is said to be linear if $T(af + bg) = aTf + bTg$ $\forall f, g \in C$ and $\forall a, b \in \mathbb{R}$. Note that $\forall x \in \mathbb{R}$, $(aTf + bTg)(x) := a(Tf)(x) + b(Tg)(x)$.

• Define the supremum norm $\|f\| := \sup_{x \in \mathbb{R}} |f(x)|$.

• $T$ is said to be a contraction operator if $\|Tf\| \leq \|f\|$ $\forall f \in C$.

• For a random variable $X$, define $T_X : C \to C$ as
$$(T_X f)(y) := Ef(X + y) = \int_{-\infty}^{\infty} f(x + y) \, dF_X(x), \quad y \in \mathbb{R}.$$

• Note: $f \in C \Rightarrow T_X f \in C$; $T_X$ is a linear contraction; and $(T_X f)(0) = Ef(X)$.

• Note: $T_{X_1}T_{X_2} = T_{X_2}T_{X_1}$ (commutation); furthermore, if $X_1, X_2$ are independent, then (as with characteristic functions)
$$T_{X_1 + X_2} = T_{X_1}T_{X_2} = T_{X_2}T_{X_1};$$
cf. Fubini’s theorem.

• Define $C^2 = \{f \in C \mid f', f'' \in C\}$.


Trotter’s proof of the i.i.d. CLT: Preliminaries (cont)

Lemma 1: If $\lim_{n\to\infty} Ef(X_n) = Ef(X)$ $\forall f \in C^2$, then $X_1, X_2, \ldots$ converges in distribution to $X$.

Note: The hypothesis is satisfied if $\|T_{X_n} f - T_X f\| \to 0$.

Proof of Lemma 1:

• Consider any $y$ at which $F_X$ is continuous.

• Fix $\epsilon > 0$ arbitrarily and take $\delta > 0$ small enough so that $F_X(y + \delta) - F_X(y - \delta) < \epsilon$.

• Define $f, g \in C^2$ such that

(i) $f(x) = 1$ for $x \leq y - \delta$,

(ii) $g(x) = 1$ for $x \leq y$,

(iii) $f(x) = 0$ for $x \geq y$, and

(iv) $g(x) = 0$ for $x \geq y + \delta$;

so that $0 \leq f \leq g \leq 1$ in particular.

• So, since $\mathbf{1}\{X \leq y - \delta\} \leq f(X) \leq \mathbf{1}\{X \leq y\} \leq g(X) \leq \mathbf{1}\{X \leq y + \delta\}$, etc.,
$$F_X(y - \delta) \leq Ef(X) = \lim_{n\to\infty} Ef(X_n) \leq \liminf_{n\to\infty} F_{X_n}(y) \leq \limsup_{n\to\infty} F_{X_n}(y) \leq \lim_{n\to\infty} Eg(X_n) = Eg(X) \leq F_X(y + \delta),$$
where the equalities are by hypothesis.

• Since this holds $\forall \epsilon > 0$, $\lim_{n\to\infty} F_{X_n}(y) = F_X(y)$.


Trotter’s proof of the i.i.d. CLT: Preliminaries (cont)

Lemma 2: If $A, B : C \to C$ are linear contraction operators that commute, then $\|A^n f - B^n f\| \leq n\|Af - Bf\|$ $\forall n \in \mathbb{Z}^+, f \in C$.

Proof:

• Factor
$$A^n f - B^n f = \sum_{i=0}^{n-1} A^{n-i-1}(A - B)B^i f = \sum_{i=0}^{n-1} A^{n-i-1}B^i(A - B)f,$$
where the second equality is by commutativity.

• Now take the norm of both sides, use the triangle inequality, and finally repeatedly use the contraction hypotheses.


Trotter’s proof of the i.i.d. CLT: Preliminaries (cont)

Lemma 3: If $EX = 0$ and $EX^2 = 1$, then $\forall f \in C^2$ and $\forall \epsilon > 0$: $\exists N < \infty$ such that
$$\left\|T_{n^{-1/2}X} f - f - \tfrac{1}{2n} f''\right\| \leq \frac{\epsilon}{n} \quad \forall n \geq N.$$

Proof:

• Fix $y \in \mathbb{R}$ (the estimates below are uniform in $y$).

• By Taylor’s theorem, $\exists z(x) \in [y, y + x]$ such that
$$f(y + x) = f(y) + x f'(y) + \tfrac{1}{2}x^2 f''(y) + \tfrac{1}{2}x^2\left[f''(z(x)) - f''(y)\right].$$

• By uniform continuity of $f''$ ($f \in C^2$), $\forall \epsilon > 0$ $\exists \delta > 0$ such that $|z(x) - y| < \delta \Rightarrow |f''(z(x)) - f''(y)| < \epsilon$. Thus,
$$(T_{n^{-1/2}X} f)(y) = \int f(y + n^{-1/2}x) \, dF_X(x)$$
$$= f(y)\int dF_X(x) + \frac{1}{\sqrt{n}} f'(y)\int x \, dF_X(x) + \frac{1}{2n} f''(y)\int x^2 \, dF_X(x) + \frac{1}{2n}\int \left[f''(z(n^{-1/2}x)) - f''(y)\right] x^2 \, dF_X(x)$$
$$= f(y) + \frac{1}{2n} f''(y) + \frac{1}{2n}\left(\int_{|x| < \delta\sqrt{n}} + \int_{|x| \geq \delta\sqrt{n}}\right)\left[f''(z(n^{-1/2}x)) - f''(y)\right] x^2 \, dF_X(x).$$


Trotter’s proof of the i.i.d. CLT: Lemma 3’s proof (cont)

• Now, $|x| < \delta\sqrt{n} \Rightarrow |z(n^{-1/2}x) - y| \leq |n^{-1/2}x| \leq \delta$.

• Thus,
$$\left|\frac{1}{2n}\int_{|x| < \delta\sqrt{n}} \left[f''(z(n^{-1/2}x)) - f''(y)\right] x^2 \, dF_X(x)\right| \leq \left|\frac{1}{2n}\int_{|x| < \delta\sqrt{n}} \epsilon x^2 \, dF_X(x)\right| \leq \frac{\epsilon}{n}.$$

• Since $|f''(z) - f''(x)| \leq 2\|f''\| < \infty$ and $EX^2 < \infty$,
$$\frac{1}{2n}\left|\int_{|x| \geq \delta\sqrt{n}} \left[f''(z(n^{-1/2}x)) - f''(y)\right] x^2 \, dF_X(x)\right| \leq \frac{1}{n}\|f''\|\left|\int_{|x| \geq \delta\sqrt{n}} x^2 \, dF_X(x)\right| \leq \frac{\epsilon}{n} \quad \forall \text{ sufficiently large } n.$$

• Finally, substitute the last two estimates into the expression for $(T_{n^{-1/2}X} f)(y)$ of the previous slide.


Trotter’s proof of the i.i.d. CLT

Theorem: If $X_1, X_2, \ldots$ are i.i.d. with $E|X_1| < \infty$ and $0 < EX_1^2 < \infty$, then $n^{-1/2}(S_n - n\mu) := n^{-1/2}(X_1 + \cdots + X_n - n\mu) \to_d N(0, \sigma^2)$, where $\mu := EX_1$ and $\sigma^2 := \mathrm{var}(X_1)$.

Proof:

• Let $Y \sim N(0, \sigma^2)$.

• By Lemma 1, the theorem follows if $\lim_{n\to\infty} \|T_{n^{-1/2}(S_n - n\mu)} f - T_Y f\| = 0$ for all $f \in C^2$.

• W.l.o.g. $\mu = 0$ and $\sigma^2 = 1$, i.e., $Y \sim N(0, 1)$, if we restate the theorem in terms of $(S_n - n\mu)/(\sigma\sqrt{n})$.

• Since $Y \sim N(0, 1)$, $T_Y = T^n_{n^{-1/2}Y}$ (a sum of $n$ i.i.d. $N(0, 1/n)$ variables is $N(0, 1)$) and, by integration by parts,
$$T_{n^{-1/2}Y} f \approx f + \tfrac{1}{2n} f''$$
in the sense of Lemma 3’s estimate (which applies to $Y$ since $EY = 0$, $EY^2 = 1$).

• Since the $X_i$ are i.i.d., $T_{n^{-1/2}S_n} = T^n_{n^{-1/2}X_1}$.

• By Lemma 2,
$$\|T_{n^{-1/2}S_n} f - T_Y f\| \leq n\|T_{n^{-1/2}X_1} f - T_{n^{-1/2}Y} f\|.$$

• Applying Lemma 3 to both $X_1$ and $Y$ (with the triangle inequality), we get
$$\|T_{n^{-1/2}X_1} f - T_{n^{-1/2}Y} f\| \leq \frac{2\epsilon}{n}$$
for all sufficiently large $n$, so that $\|T_{n^{-1/2}S_n} f - T_Y f\| \leq 2\epsilon$.


Lindeberg’s CLT for independent random variables

• Consider an independent sequence of random variables with $EX_i = 0$ $\forall i$ w.l.o.g.

• Let $\sigma_i^2 = EX_i^2$; the $X_i$ are not necessarily identically distributed.

• The CLT can be generalized to such a sequence, assuming only mutual independence, under Lindeberg’s condition:
$$\lim_{n\to\infty} s_n^{-2} \sum_{i=1}^n \int_{|x| \geq \delta s_n} x^2 \, dF_{X_i}(x) = 0 \quad \forall \delta > 0,$$
where $s_n := \sqrt{\sum_{i=1}^n \sigma_i^2}$ (recall the last inequality of the proof of Lemma 3); the conclusion is that $S_n/s_n \to_d N(0, 1)$.

• A simple proof of Lindeberg’s CLT follows that of Trotter’s for the i.i.d. case [Trotter ’59].

• Feller proved that Lindeberg’s condition is necessary.


Modes of convergence

• A CLT involves convergence in distribution.

• This is the weakest class of convergence results, in which no limiting random variable need exist (only a limiting distribution).

• $F_{Y_n}(y) \to F_Y(y)$ $\forall y$ that are points of continuity of $F_Y$ is precisely what is meant by $Y_n \to Y$ in distribution.

• In the following, we will see that convergence: in distribution $\Leftarrow$ in probability $\Leftarrow$ (a.s. or in $L^2$).


Weak law of large numbers (WLLN): assumptions

• Assume random variables $X_1, X_2, X_3, \ldots$ are i.i.d.

• Also suppose that the common distribution has finite variance, i.e., $\sigma^2 := \mathrm{var}(X) := E(X - EX)^2 < \infty$, where $X \sim X_i$ $\forall i$.

• Finally, suppose that the mean exists and is finite, i.e., $\mu := EX < \infty$.

• Recall the sum $S_n := X_1 + X_2 + \cdots + X_n$ for $n \geq 1$; $ES_n = n\mu$ and $\mathrm{var}(S_n) = n\sigma^2$.

• The quantity $S_n/n$ is called the empirical mean of $X$ after $n$ samples and is an unbiased estimate of $\mu$, i.e.,
$$E\left(\frac{S_n}{n}\right) = \mu.$$


A WLLN: Statement and Proof

Theorem: If $X_1, X_2, \ldots$ are i.i.d. with $\mathrm{var}(X_1) < \infty$, then
$$\lim_{n\to\infty} P\left(\left|\frac{S_n}{n} - \mu\right| \geq \epsilon\right) = 0 \quad \forall \epsilon > 0.$$

• By Chebyshev’s inequality,
$$P\left(\left|\frac{S_n}{n} - \mu\right| \geq \epsilon\right) \leq \frac{\mathrm{var}(S_n/n)}{\epsilon^2} = \frac{\sigma^2}{n\epsilon^2}$$
(compared against simulation in the sketch below).

• Note: So, $S_n/n$ is said to be a weakly consistent estimator of $\mu$.

• Consequently, $L^2$ convergence, $E(Y_n - Y)^2 \to 0$, implies weak convergence (convergence in probability), $P(|Y_n - Y| > \epsilon) \to 0$ $\forall \epsilon > 0$.

• Example: Discretely distributed $Y_n \to 0$ in probability but not in $L^2$ when
$$P(Y_n = -n) = P(Y_n = n) = p_n = (1 - P(Y_n = 0))/2$$
such that $p_n \to 0$ as $n \to \infty$ but $n^2 p_n \not\to 0$, e.g., $p_n = 1/n$ for $n > 1$, so that $EY_n^2 = 2n \not\to 0$.

• Example: $Y_n \to c$ (a constant) in probability $\iff$ $Y_n \to c$ in distribution.
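A simulation sketch comparing the observed frequency of $\{|S_n/n - \mu| \geq \epsilon\}$ with the Chebyshev bound $\sigma^2/(n\epsilon^2)$ for Uniform(0,1) samples ($\mu = 1/2$, $\sigma^2 = 1/12$); the values of $\epsilon$ and $n$ are assumed:

```python
import numpy as np

rng = np.random.default_rng(4)
eps, runs = 0.01, 2000
for n in [100, 1_000, 10_000]:
    means = rng.random((runs, n)).mean(axis=1)    # 2000 realizations of S_n/n
    freq = np.mean(np.abs(means - 0.5) >= eps)    # empirical probability
    bound = (1.0 / 12.0) / (n * eps**2)           # Chebyshev (may exceed 1)
    print(n, freq, bound)
```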


Strong Law of Large Numbers (SLLN)

• Again, a sequence of random variables $X_1, X_2, \ldots$ is said to converge almost surely (a.s.) to a random variable $X$ if
$$P\left(\lim_{n\to\infty} X_n \neq X\right) = 0.$$

• Kolmogorov’s strong LLN: if $X_1, X_2, \ldots$ are i.i.d. and $E|X_1| < \infty$, then
$$P\left(\lim_{n\to\infty} \frac{S_n}{n} = \mu\right) = 1, \quad \text{i.e., } \frac{S_n}{n} \to \mu := EX_1 \ \text{a.s.}$$

• Formally, the limit states that $\forall r \in \mathbb{Z}^+$, $\exists n$ such that $\forall k \geq n$: $|S_k/k - \mu| < 1/r$ a.s.; i.e.,
$$P\left(\bigcap_{r=1}^\infty \bigcup_{n=1}^\infty \bigcap_{k=n}^\infty \left\{\left|\frac{S_k}{k} - \mu\right| < \frac{1}{r}\right\}\right) = 1.$$

• But $P\left(\bigcap_{r=1}^\infty B_r\right) = 1 \iff P(B_r) = 1$ $\forall r$, so $S_n/n \to \mu$ a.s. if and only if
$$P\left(\bigcup_{n=1}^\infty \bigcap_{k=n}^\infty \left\{\left|\frac{S_k}{k} - \mu\right| < \frac{1}{r}\right\}\right) = 1 \quad \forall r.$$


SLLN statement (cont)

• Now fix $r \in \mathbb{Z}^+$ arbitrarily and let
$$A_k^c = \left\{|S_k/k - \mu| < r^{-1}\right\}.$$

• The event
$$\bigcup_{n=1}^\infty \bigcap_{k=n}^\infty A_k^c$$
is denoted $\{A_n^c \text{ almost always (a.a.)}\}$, i.e.,
$$\omega \in \{A_n^c \text{ a.a.}\} \iff \exists n^*(\omega) \text{ such that } \omega \in A_n^c \ \forall n \geq n^*(\omega).$$

• Note: $P\left(\bigcap_{k=n}^\infty A_k^c\right) \uparrow P(A_n^c \text{ a.a.})$ as $n \to \infty$.

• Note: Equivalently, express the above in terms of $A_n$, where the event
$$\bigcap_{n=1}^\infty \bigcup_{k=n}^\infty A_k$$
is denoted $\{A_n \text{ infinitely often (i.o.)}\} = \{A_n^c \text{ a.a.}\}^c$, i.e.,
$$\omega \in \{A_n \text{ i.o.}\} \iff \forall n \ \exists m \geq n \text{ such that } \omega \in A_m.$$


SLLN and Borel-Cantelli Lemmas

So, $S_n/n \to \mu$ a.s. $\iff$ $P(A_n^c \text{ a.a.}) = 1 \iff P(A_n \text{ i.o.}) = 0$ for all $r \in \mathbb{Z}^+$, where $A_n = \{|S_n/n - \mu| \geq r^{-1}\}$.

• First BC lemma: Generally, for events $A_1, A_2, \ldots$,
$$\sum_{n=1}^\infty P(A_n) < \infty \;\Rightarrow\; P(A_n \text{ i.o.}) = 0.$$

• Proof:
$$P(A_n \text{ i.o.}) \leq P\left(\bigcup_{n=k}^\infty A_n\right) \ \forall k \leq P(A_k) + P(A_{k+1}) + \cdots \to_{k\to\infty} 0,$$
the tail of a convergent series.

• Second BC lemma: For independent events $A_1, A_2, \ldots$,
$$\sum_{n=1}^\infty P(A_n) = \infty \;\Rightarrow\; P(A_n \text{ i.o.}) = 1.$$

• Proof: $\lim_{m\to\infty} \sum_{k=n}^m P(A_k) = \infty$ implies
$$0 \leftarrow e^{-\sum_{k=n}^m P(A_k)} = \prod_{k=n}^m e^{-P(A_k)} \geq \prod_{k=n}^m (1 - P(A_k)) = P\left(\bigcap_{k=n}^m A_k^c\right),$$
where the last equality is by independence and $e^{-x} \geq 1 - x$ was used; so $P\left(\bigcap_{k=n}^\infty A_k^c\right) = 0$ $\forall n$, i.e., $P(A_n \text{ i.o.}) = 1$.

• Note: $P(A_n \text{ i.o.}) \in \{0, 1\}$ by the “zero–one law”. (Both lemmas are illustrated by simulation below.)
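A simulation sketch of both lemmas with independent $A_n = \{U_n < p_n\}$ for i.i.d. Uniform(0,1) $U_n$: $p_n = n^{-2}$ is summable (only finitely many $A_n$ occur), while $p_n = n^{-1}$ is not (occurrences keep appearing all the way to the horizon). The finite horizon $N$ is an assumed truncation:

```python
import numpy as np

rng = np.random.default_rng(5)
N = 1_000_000                                   # finite horizon (truncation)
n = np.arange(1, N + 1)
U = rng.random(N)
for label, p in [("p_n = 1/n^2", 1.0 / n**2), ("p_n = 1/n", 1.0 / n)]:
    hits = n[U < p]                             # indices n for which A_n occurs
    print(label, "occurrences:", hits.size,
          "last occurrence:", hits[-1] if hits.size else None)
```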


Kolmogorov’s maximal inequality

If $X_1, X_2, \ldots$ are independent and $EX_n^2 < \infty$ $\forall n$, then
$$P\left(\max_{1 \leq k \leq n} |S_k - ES_k| \geq \lambda\right) \leq \frac{\mathrm{var}(S_n)}{\lambda^2} \quad \forall n \geq 1, \ \lambda > 0.$$

Proof:

• W.l.o.g. assume $EX_n = 0$ $\forall n$ $\Rightarrow$ $ES_n = 0$ $\forall n$. Fix $\lambda > 0$.

• Define the disjoint events $B_k = \{|S_i| < \lambda \ \forall i < k, \ |S_k| \geq \lambda\}$.

$$ES_n^2 \geq \sum_{k=1}^n ES_n^2\mathbf{1}_{B_k} \geq \sum_{k=1}^n E\left(2(S_n - S_k)S_k + S_k^2\right)\mathbf{1}_{B_k} = \sum_{k=1}^n \left[2E(S_n - S_k) \cdot ES_k\mathbf{1}_{B_k} + ES_k^2\mathbf{1}_{B_k}\right] = \sum_{k=1}^n ES_k^2\mathbf{1}_{B_k},$$
where the second-to-last equality is by independence (and then $E(S_n - S_k) = 0$).

• Thus, $ES_n^2 \geq \lambda^2 \sum_{k=1}^n P(B_k) = \lambda^2 P\left(\bigcup_{k=1}^n B_k\right)$.

• Note how this generalizes Chebyshev’s inequality (the case $n = 1$).


SLLN: proof of bounded second moment case

Assume $EX_1^2 < \infty$.

• Again, assume centered $X_n$ w.l.o.g., and apply the maximal inequality with $\lambda = 2^n\epsilon$ and the first BC lemma to get that
$$P\left(\left\{\max_{1 \leq k \leq 2^n} |S_k| \leq 2^n\epsilon\right\} \text{ a.a.}\right) = 1 \quad \forall \epsilon > 0.$$

• Now, $\forall m$ such that $2^{n-1} < m \leq 2^n$,
$$\max_{1 \leq k \leq 2^n} |S_k| \leq 2^n\epsilon \ \text{implies} \ |S_m| \leq 2^n\epsilon \leq 2m\epsilon.$$

• This leads to $P(|S_m|/m \leq 2\epsilon \text{ a.a.}) = 1$ $\forall \epsilon > 0$, i.e., $S_m/m \to 0$ a.s.

• Note: Kolmogorov’s SLLN only requires $E|X_1| < \infty$, i.e., a bounded first moment.


Weak and strong LLNs

• SLLN $\Rightarrow$ WLLN, since $P(A_n \text{ i.o.}) = 0 \Rightarrow P(A_n) \to 0$.

• Example of a persistently shrinking pulse on $\Omega = [0,1]$ with $P$ Lebesgue measure:

– $\forall m \in \mathbb{Z}^+$, $k \in \{1, 2, \ldots, m\}$, define
$$Y_{k + m(m-1)/2}(\omega) := \mathbf{1}\left\{\frac{k-1}{m} < \omega \leq \frac{k}{m}\right\}.$$

– The random variables $Y_n$ converge weakly but not strongly to zero, because $P(\{Y_m > \epsilon\} \text{ i.o.}) = 1$ for all $\epsilon = r^{-1} \in (0, 1)$.

• Theorem: If $X_1, X_2, \ldots$ converges to $X$ in probability, then there is a subsequence $X_{n_1}, X_{n_2}, \ldots$ that converges to $X$ a.s.


Uniform integrability: motivation

• The persistently shrinking pulse example also showed that convergence in $L^q$, for $q \geq 1$, does not generally imply convergence a.s.

• Example (Dirac/Heaviside impulse):

– $\Omega = [0,1]$ and $P$ is Lebesgue measure.

– $X_n(\omega) := n\mathbf{1}\{\omega \in [0, \tfrac{1}{n}]\}$ $\forall n \geq 1$.

– Clearly, $X_n \to 0$ a.s.

– But $EX_n = 1$ $\forall n \geq 1$ (see the sketch below).

• So, convergence a.s. (i.e., “pointwise”) does not generally imply convergence in $L^q$ either.

• Under what conditions does a.s. convergence imply convergence in $L^q$ for $q \geq 1$?
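A sketch of the Dirac-impulse example, estimating $EX_n$ by Monte Carlo over $\omega$ (note the estimate gets noisier as $n$ grows, since $\mathrm{var}(X_n) = n - 1$):

```python
import numpy as np

rng = np.random.default_rng(6)
omega = rng.random(2_000_000)                  # P = Lebesgue measure on [0,1]
for n in [10, 1_000, 100_000]:
    Xn = n * (omega <= 1.0 / n)                # X_n = n * 1[0, 1/n]
    print(n, "E X_n ~", Xn.mean(), "  P(X_n != 0) ~", np.mean(Xn != 0))
```

Every row reports $EX_n \approx 1$ even though $X_n(\omega) \to 0$ for each fixed $\omega > 0$.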


Uniform integrability: preliminaries

Consider the probability space $(\Omega, \mathcal{F}, P)$.

• Theorem: If $E|X| < \infty$, then $\forall \epsilon \in (0, \infty)$ $\exists c(\epsilon) \in [0, \infty)$ such that
$$E\left(|X|\mathbf{1}\{|X| \geq c\}\right) < \epsilon \quad \forall c \in [c(\epsilon), \infty).$$

Proof:

– Lebesgue integrals are uniformly continuous, i.e., $\exists \delta(\epsilon) \in (0, \infty)$ such that $E|X|\mathbf{1}_A < \epsilon$ $\forall A \in \mathcal{F}$ such that $P(A) < \delta(\epsilon)$ (exercise: prove by contradiction of the assumption that $E|X| < \infty$).

– By Markov’s inequality, $P(|X| \geq c) \leq c^{-1}E|X|$ $\forall c \in (0, \infty)$.

– Since $E|X| < \infty$, $\exists c(\epsilon) \in (0, \infty)$ such that $P(|X| \geq c) < \delta(\epsilon)$ $\forall c \in [c(\epsilon), \infty)$.

– Finally, take $A = \{|X| \geq c\}$ for $c \in [c(\epsilon), \infty)$.


Uniform integrability: definition and sufficient conditions

• A collection of random variables $\mathcal{C}$ on $(\Omega, \mathcal{F}, P)$ is uniformly integrable if: $\forall \epsilon \in (0, \infty)$ $\exists c(\epsilon) \in [0, \infty)$ such that
$$\sup_{X \in \mathcal{C}} E\left(|X|\mathbf{1}\{|X| \geq c\}\right) < \epsilon \quad \forall c \in [c(\epsilon), \infty).$$

• If $|X| \leq Y$ a.s. $\forall X \in \mathcal{C}$ with $EY < \infty$, then $\mathcal{C}$ is uniformly integrable (note that $EX\mathbf{1}\{|X| > Y\} = 0$, and recall Lebesgue’s dominated convergence theorem); the converse is not true.

• For uniformly integrable $\mathcal{C}$, if $c \in [c(\epsilon), \infty)$ then
$$E|X| = E|X|\mathbf{1}\{|X| < c\} + E|X|\mathbf{1}\{|X| \geq c\} < c + \epsilon \quad \forall X \in \mathcal{C},$$
i.e., $\mathcal{C}$ is uniformly $L^1$-bounded; the converse is not true.

• If $\mathcal{C} = \{X_0, X_1, \ldots\}$ such that $0 \leq X_n \leq X_{n+1}$ $\forall n \in \mathbb{Z}^+$ (i.e., an increasing sequence) and $\mathcal{C}$ is $L^1$-bounded, then, by the monotone convergence theorem, $\mathcal{C}$ is uniformly integrable.

• Theorem: If $X$ is a random variable on $(\Omega, \mathcal{F}, P)$ with $E|X| < \infty$ and $\{\mathcal{G}_\alpha, \alpha \in \Lambda\}$ is a collection of sub-$\sigma$-algebras of $\mathcal{F}$, then $\{E(X \mid \mathcal{G}_\alpha), \alpha \in \Lambda\}$ is uniformly integrable. Proof: exercise.


Uniform integrability: main theorem prelim

• Define the ramp $\psi_c(x) = x\mathbf{1}\{|x| < c\} + c\mathbf{1}\{x \geq c\} - c\mathbf{1}\{x \leq -c\}$.

• Theorem: If $\mathcal{C}$ is uniformly integrable, then $\forall \epsilon \in (0, \infty)$ $\exists c(\epsilon) \in [0, \infty)$ such that
$$E|X - \psi_c(X)| < \epsilon \quad \forall X \in \mathcal{C}, \ c \in [c(\epsilon), \infty).$$

• Proof: Arbitrarily fix $c \in (0, \infty)$ and note that
$$|x - \psi_c(x)| = (x - c)^+ + (x + c)^- \quad \forall x \in \mathbb{R}$$
$$\Rightarrow\; E(X - c)^+ = E(X - c)\mathbf{1}\{X \geq c\} \leq E|X|\mathbf{1}\{|X| \geq c\}$$
$$\text{and} \quad E(X + c)^- = -E(X + c)\mathbf{1}\{X \leq -c\} \leq E|X|\mathbf{1}\{|X| \geq c\}.$$


Swapping limits & expectation (integration)

Theorem: If (a) $\lim_{n\to\infty} X_n = X$ a.s. and (b) $\{X_0, X_1, \ldots\}$ are uniformly integrable, then
$$E|X| < \infty \quad \text{and} \quad \lim_{n\to\infty} E|X_n - X| = 0.$$

Proof:

• By (b) and the previous “uniform $L^1$-boundedness” result, $\exists B < \infty$ such that $E|X_n| < B$ $\forall n \in \mathbb{Z}^+$.

• So, by (a) and Fatou’s lemma,
$$E|X| = E\liminf_{n\to\infty} |X_n| \leq \liminf_{n\to\infty} E|X_n| \leq B.$$

• To get $L^1$ convergence of $X_n$ to $X$, arbitrarily fix $\epsilon \in (0, \infty)$. By the previous “ramp” result and (b), $\exists c(\epsilon) \in (0, \infty)$ such that, $\forall c \in (c(\epsilon), \infty)$ and $n \in \mathbb{Z}^+$,
$$E|X_n - \psi_c(X_n)| < \epsilon/3 \quad \text{and} \quad E|X - \psi_c(X)| < \epsilon/3.$$

• Fix $c \in (c(\epsilon), \infty)$. Since $\psi_c$ is continuous and bounded in magnitude by $c$, by the dominated convergence theorem, $\exists N(\epsilon) \in \mathbb{Z}^+$ such that
$$E|\psi_c(X_n) - \psi_c(X)| < \epsilon/3 \quad \forall n \geq N(\epsilon).$$

• By these three “$\epsilon/3$” inequalities and the triangle inequality,
$$E|X - X_n| \leq E|X_n - \psi_c(X_n)| + E|X - \psi_c(X)| + E|\psi_c(X_n) - \psi_c(X)| < \epsilon.$$


Completeness of $L^q$

• Definition: $L^q(\Omega, \mathcal{F}, P)$, or just $L^q$, is the set of random variables $X$ on $(\Omega, \mathcal{F}, P)$ such that $\|X\|_q := (E(|X|^q))^{1/q} < \infty$.

• Definition: $X_1, X_2, \ldots$ is said to be a Cauchy sequence in $L^q$ if $\forall \epsilon > 0$ $\exists N_\epsilon$ such that $\|X_n - X_m\|_q < \epsilon$ whenever $n, m > N_\epsilon$.

• Definition: A set is said to be complete if all of its Cauchy sequences converge to an element inside it.

• For $q \geq 1$, $\|\cdot\|_q$ is a norm by Minkowski’s inequality, $\|X + Y\|_q \leq \|X\|_q + \|Y\|_q$, so that $L^q$ is a (complete) Banach space.

• To show that $L^q$ is complete:

– Consider a Cauchy sequence $X_1, X_2, \ldots$ in $L^q$.

– Let $N^*(\epsilon) := \inf\{N_\eta \mid \eta \leq \epsilon\}$ and define $X^* = X_{N^*(\epsilon)}$; so, by Minkowski’s inequality, $\forall n \geq N^*$,
$$\|X_n\|_q \leq \|X^*\|_q + \|X_n - X^*\|_q \leq \|X^*\|_q + \epsilon,$$
i.e., the sequence $\{X_n, n \geq N^*\}$ is bounded in $L^q$.

– Then, one can show that a subsequence $X_{n_k}$ a.s. converges to a (measurable) random variable $X$ (use the completeness of $\mathbb{R}$ and argue by contradiction).

– So, by Fatou’s lemma, $\|X\|_q^q < \infty$, i.e., $X \in L^q$.

– Finally, use Fatou’s lemma on $\|X - X_n\|_q^q$ to establish $L^q$-convergence to $X$.


Carathéodory’s extension theorem

• Theorem: If $\mathcal{A}$ is an algebra on $\Omega$ and $P$ is countably additive on $\mathcal{A}$, then there exists $\bar{P}$ on $\sigma(\mathcal{A})$ such that $P = \bar{P}$ on $\mathcal{A}$.

• In addition, if $\exists \Omega_1 \subseteq \Omega_2 \subseteq \Omega_3 \subseteq \cdots \in \mathcal{A}$ such that $\Omega_n \uparrow \Omega$, then the extension $\bar{P}$ is unique.

• On $\mathbb{R}$, Carathéodory’s theorem extends a countably additive probability measure on the algebra $\mathcal{A}$, containing all intervals and their finite unions, to a $\sigma$-field that is a strict subset of $2^{\mathbb{R}}$ but contains $\mathcal{B} = \sigma(\mathcal{A})$.


Product probability spaces

• Consider two probability spaces $(\Omega_i, \mathcal{F}_i, P_i)$, $i = 1, 2$.

• Define the product sample space
$$\Omega := \Omega_1 \times \Omega_2 := \{(\omega_1, \omega_2) \mid \omega_i \in \Omega_i, \ i = 1, 2\}.$$

• Note that $\mathcal{A}_0 := \{A_1 \times A_2 \mid A_i \in \mathcal{F}_i, \ i = 1, 2\}$ is closed under intersections but not under finite unions; e.g., one cannot express
$$(A_1 \times A_2) \cup (B_1 \times B_2) \ \text{as} \ C_1 \times C_2,$$
where $A_i, B_i, C_i \in \mathcal{F}_i$.

• So, add all finite disjoint unions of elements of $\mathcal{A}_0$ to $\mathcal{A}_0$ and call the result $\mathcal{A}$, an algebra.

• Denote $\mathcal{F}_1 \times \mathcal{F}_2 = \sigma(\mathcal{A})$.


Product probability space extension

Theorem: There exists a unique $P$ such that

(a) $P(A_1 \times A_2) = P_1(A_1)P_2(A_2)$ $\forall A_1 \times A_2 \in \mathcal{A}_0$, uniquely extending $P$ to $\mathcal{A}$ via finite unions, and

(b) $(\Omega = \Omega_1 \times \Omega_2, \mathcal{F} = \sigma(\mathcal{A}), P)$ is a probability space.

Proof:

• Sections of measurable sets are measurable, where the section of $A \subseteq \Omega$ along $\omega_2$ is
$$A_{\omega_2} := \{\omega_1 \in \Omega_1 \mid (\omega_1, \omega_2) \in A\} \quad \text{for } \omega_2 \in \Omega_2,$$
because $\forall \omega_2 \in \Omega_2$, $\mathcal{M} := \{A \in \mathcal{F} \mid A_{\omega_2} \in \mathcal{F}_1\}$ is a monotone class $\Rightarrow \mathcal{M} = \mathcal{F}$.

• $A = A_1 \times A_2 \in \mathcal{A}_0 \Rightarrow A_{\omega_2} = A_1$ if $\omega_2 \in A_2$, otherwise $A_{\omega_2} = \emptyset$; hence
$$P_1(A_{\omega_2}) = P_1(A_1)\mathbf{1}_{A_2}(\omega_2) \;\Rightarrow\; P(A) = \int P_1(A_{\omega_2}) \, dP_2(\omega_2),$$
where $P_1(A_{\omega_2})$ is an $(\Omega_2, \mathcal{F}_2, P_2)$ random variable.

• Such disintegration extends to $A \in \mathcal{A}$ by finite additivity, and $P$ is countably additive on $\mathcal{A}$ by dominated convergence.

• So, $P$ (and the disintegration) extend uniquely to $\mathcal{F}$ by Carathéodory.


Fubini-Tonelli Theorem

• Theorem: If

(i) $X : \Omega = \Omega_1 \times \Omega_2 \to \mathbb{R}$ is $\mathcal{F} = \mathcal{F}_1 \times \mathcal{F}_2$ measurable and

(ii) $\omega_{3-i} \mapsto \int X(\omega) \, dP(\omega_i)$ is a.s. finite and $\mathcal{F}_{3-i}$-measurable $\forall i \in \{1, 2\}$,

then
$$EX := \int X \, dP = \int\left(\int X \, dP_1\right) dP_2 = \int\left(\int X \, dP_2\right) dP_1.$$

Proof:

– By disintegration, the $\int X \, dP_i$ are $\mathcal{F}_{3-i}$-measurable and the theorem holds for $X = \mathbf{1}_A$, $A \in \mathcal{F}$.

– Extend to simple functions and take limits via dominated convergence to prove the case where $E|X| < \infty$.

• Considering hypothesis (ii), recall how absolute summability of a sequence implies its (unique) summability in any order.


Consistency of Probability Measures

• Consider the product space $(\mathbb{R}^{\mathbb{Z}^+}, \mathcal{B}^{\mathbb{Z}^+}) =: (\mathbb{R}^\infty, \mathcal{B}^\infty)$, where $\mathbb{Z}^+ := \{0, 1, 2, 3, \ldots\}$.

• Again, the underlying probability space is $(\Omega, \mathcal{F}, P)$.

• A cylinder event $A \in \mathcal{B}^\infty$ is of the form
$$A = A_0 \times A_1 \times A_2 \times \cdots,$$
where all but a finite number of the $A_i = \mathbb{R}$, i.e., there is a finite index set $I_A \subseteq \mathbb{Z}^+$ such that $A_i = \mathbb{R}$ $\forall i \notin I_A$.

• A family of probability measures $\{P^n\}_{n \in \mathbb{Z}^+}$, $P^n$ on $(\mathbb{R}^n, \mathcal{B}^n)$, is said to be consistent if
$$P^n(A_0 \times A_1 \times \cdots \times A_{n-1}) = P^{n+1}(A_0 \times A_1 \times \cdots \times A_{n-1} \times \mathbb{R})$$
for all $A_0 \times A_1 \times \cdots \times A_{n-1}$ with $A_i \in \mathcal{B}$.


Kolmogorov’s Extension Theorem

For each consistent family of probability measures $P^n$ on $(\mathbb{R}^n, \mathcal{B}^n)$, $\exists!$ consistent $P^\infty$ on $(\mathbb{R}^\infty, \mathcal{B}^\infty)$.

• Clearly, we require that
$$P^\infty(A) = P^n(A_0 \times A_1 \times \cdots \times A_{n-1})$$
for all cylinder sets $A = A_0 \times A_1 \times \cdots$ (with $A_i = \mathbb{R}$ for $i \geq n$) and all $n \in \mathbb{Z}^+$.

• Since $P^\infty$ is specified for all cylinder sets, $P^\infty$ is unique on the algebra generated by them and, by the monotone class theorem, unique on $\mathcal{B}^\infty$ too.

• For existence:

– Let $\mathcal{A}$ be the set of finite unions of cylinder sets, including $\emptyset$, so that $\sigma(\mathcal{A}) = \mathcal{B}^\infty$.

– Show $P^\infty$ is finitely additive on $\mathcal{A}$ and apply Carathéodory’s extension theorem.


Consistency and FDDs

• Consider a discrete-time/parameter stochastic process
$$X := \{X_t \mid t \in \mathbb{Z}^+\},$$
where each $X_t$ is itself a random variable.

• Let $F_{t_1, t_2, \ldots, t_n}$ be the joint CDF of $X_{t_1}, X_{t_2}, \ldots, X_{t_n}$ for some finite $n$ and distinct $t_k \in \mathbb{Z}^+$ for all $k \in \{1, 2, \ldots, n\}$, i.e.,
$$F_{t_1, \ldots, t_n}(x_1, \ldots, x_n) = P(X_{t_1} \leq x_1, \ldots, X_{t_n} \leq x_n),$$
where $P$ is the underlying probability measure.

• This is called an $n$-dimensional distribution of $X$.


KET and Consistent FDDs

• A family of such joint CDFs is called a set of finite-dimensional distributions (FDDs).

• The FDDs are consistent if one can marginalize (reduce the dimension of) one and obtain another, e.g.,
$$F_{t_1, t_4}(x_1, x_4) := F_{t_1, t_2, t_3, t_4}(x_1, \infty, \infty, x_4);$$
then, using KET, one can prove $\exists!$ a discrete-time stochastic process $X$ on $\mathbb{R}^\infty$ (with distribution $P^\infty$), i.e.,
$$dP^n := dF_{0, 1, \ldots, n-1} \quad \text{and} \quad dP^\infty := dF_{\mathbb{Z}^+}.$$

• Samples $\omega$ of the underlying probability space are actually sample paths of the stochastic process, i.e., $t \mapsto X_t(\omega)$.

• KET can be extended to continuous-time stochastic processes, i.e., sample paths $X_\cdot(\omega) \in \mathbb{R}^{\mathbb{R}^+}$ instead of $\in \mathbb{R}^{\mathbb{Z}^+}$.

• In the following, we will focus on the underlying probability space $\Omega$ and the $\sigma$-algebras $\sigma(X_s \mid s \leq t)$ for $t \in \mathbb{R}^+ = [0, \infty)$, i.e., in continuous time.


Uncountable products

Suppose $I$ is an uncountably infinite index set and $\forall t \in I$: $(\Omega, \mathcal{F}_t)$ is a sample space with a $\sigma$-algebra of events.

• Theorem: If $A \in \sigma(\mathcal{F}_t, t \in I) =: \mathcal{G}_I$, then there is some countable $J \subseteq I$ (depending on $A$) such that $A \in \mathcal{G}_J$.

Proof:

– Define $\mathcal{H} = \{A \in \mathcal{G}_I \mid \exists \text{ countable } J \subseteq I \text{ s.t. } A \in \mathcal{G}_J\}$.

– Clearly, $\Omega \in \mathcal{H}$ and $\mathcal{H}$ is closed under complementation and countable unions (a countable union of countable index sets is countable), so that $\mathcal{H}$ is a $\sigma$-algebra.

– Finally, since $\mathcal{F}_t \subseteq \mathcal{H}$ $\forall t \in I$, $\mathcal{H} = \mathcal{G}_I$.

• Corollary: If $Y : \Omega \to \bar{\mathbb{R}}$ is $\mathcal{G}_I$-measurable, then $\exists$ a countable $J \subseteq I$ such that $Y$ is $\mathcal{G}_J$-measurable.

Proof:

– Use the previous theorem if $Y = \mathbf{1}_A$; this is easily extended to simple (discretely distributed) $Y$.

– Extend to any (measurable) random variable $Y$ by approximating with simple functions.

• For the special case where $\mathcal{G}_{[0,t]} = \sigma(X_s \mid s \leq t)$, i.e., $\mathcal{F}_t = \sigma(X_t)$ for random variables $X_t$: if $Y$ is $\mathcal{G}_{[0,t]}$-measurable, then $\exists$ a countable $\{t_0, t_1, \ldots\} \subseteq [0, t]$ and a $\mathcal{B}^{\mathbb{Z}^+}$-measurable mapping $\phi$ such that
$$Y = \phi(X_{t_0}, X_{t_1}, \ldots) \ \text{a.s.}$$
Recall Doob’s theorem.
