Transcript of "Introduction to Probability Theory" (USP)

Page 1:

Introduction to Probability Theory

Fabio G. Cozman - [email protected]

August 29, 2018

Page 2:

Part I: Basic Concepts for Finite Spaces

1 Possibility/sample space, outcomes, events

2 Variables and indicator functions

3 Probabilities, expectations

4 Properties of probabilities

Page 3:

Possibility/sample space

1 Possibility/sample space: set Ω.

2 Elements ω of Ω are outcomes.

3 Subsets of Ω are events (no fuzziness!).

Example

Two coins are tossed; each coin can be heads (H) or tails (T). Then Ω = {HH, HT, TH, TT}. Consider three events. Event A = {HH} is the event that both tosses produce heads. Event B = {HH, TT} is the event that both tosses produce identical outcomes. Event C = {HH, TH} is the event that the second toss yields heads. Note that A = B ∩ C.
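A minimal Python sketch of this example (the names Omega, A, B, C follow the slide; outcomes are encoded as two-letter strings):

```python
from itertools import product

# Sample space for two coin tosses: each outcome is a string like "HT".
Omega = {a + b for a, b in product("HT", repeat=2)}

A = {"HH"}                              # both tosses heads
B = {w for w in Omega if w[0] == w[1]}  # identical outcomes: {"HH", "TT"}
C = {w for w in Omega if w[1] == "H"}   # second toss heads: {"HH", "TH"}

print(A == B & C)  # the slide's claim A = B ∩ C
```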

Page 4:

Probability measure (finite spaces!)

A probability measure is a function that assigns a probability value to each event.

PU1 For any event A, P(A) ≥ 0.

PU2 The space Ω has probability one: P(Ω) = 1.

PU3 If events A and B are disjoint (that is, A ∩ B = ∅), then P(A ∪ B) = P(A) + P(B).

Page 5:

Easy example

Example

A six-sided die is rolled. Suppose all outcomes of Ω are assigned precise and identical probability values.

We must have ∑_{ω∈Ω} P(ω) = P(Ω) = 1, thus we have P(ω) = 1/6 for all outcomes.

The event A = {1, 3, 5} (outcome is odd) has probability P(A) = 1/2.

The event B = {1, 2, 3, 5} (outcome is prime) has probability P(B) = 2/3 and P(A ∩ B) = P({1, 3, 5}) = 1/2.
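A quick check of these values in Python, assuming the equal probabilities stated above (exact arithmetic via `fractions`):

```python
from fractions import Fraction

# Fair six-sided die: each outcome gets probability 1/6.
Omega = {1, 2, 3, 4, 5, 6}
P = {w: Fraction(1, 6) for w in Omega}

def prob(event):
    return sum(P[w] for w in event)

A = {1, 3, 5}      # outcome is odd
B = {1, 2, 3, 5}   # the slide's event B

print(prob(A), prob(B), prob(A & B))  # 1/2, 2/3, 1/2
```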

Page 6:

Properties of probabilities

1 As A and A^c are disjoint and A ∪ A^c = Ω, we have P(A) + P(A^c) = P(Ω) = 1, so

P(A) = 1 − P(A^c).

2 P(∅) = 1 − P(∅^c) = 1 − P(Ω) = 0.

3 P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

4 For n mutually disjoint events B_i,

P(∪_{i=1}^n B_i) = ∑_{i=1}^n P(B_i).

5 If events B_i form a partition of Ω,

P(A) = P(∪_{i=1}^n (A ∩ B_i)) = ∑_i P(A ∩ B_i).

Page 7:

Random variables

1 A function X : Ω → ℝ is usually called a random variable.

2 If X is a variable, then any function f : ℝ → ℝ defines a random variable f(X).

Example

The age in months of a person ω selected from a population Ω is a variable X. The same population can be used to define a different variable Y where Y(ω) is the weight (rounded to the next kilogram) of a person ω selected from Ω. We can also have a random variable Z = X + Y.

Page 8:

Distributions

Possibility space Ω, variable X : Ω → ℝ.

Then: possibility space Ω_X containing every possible value of X.

A probability measure on Ω induces a measure over subsets of Ω_X:

P(X ∈ A) = P({ω ∈ Ω : X(ω) ∈ A}).

The induced measure on Ω_X is usually called the distribution of X.
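A sketch of the induced distribution in Python. The population and the values of X are made up for illustration; the computation P(X = x) = P({ω : X(ω) = x}) is the slide's formula:

```python
from fractions import Fraction

# Hypothetical finite population with a probability for each person,
# and a variable X mapping each person to an age.
P = {"ann": Fraction(1, 2), "bob": Fraction(1, 4), "cid": Fraction(1, 4)}
X = {"ann": 30, "bob": 30, "cid": 41}

# Induced distribution of X: sum the probabilities of the outcomes
# mapped to each value.
dist = {}
for w, p in P.items():
    dist[X[w]] = dist.get(X[w], 0) + p

print(dist)  # {30: 3/4, 41: 1/4}
```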

Page 9:

Expectations

Given variable X, its expectation is

E[X] = ∑_{ω∈Ω} X(ω)P(ω) = ∑_x x P(X = x).

An expectation functional yields a real number for each variable. Properties:

For constants α and β, if α ≤ X ≤ β, then α ≤ E[X] ≤ β.

E[X + Y] = E[X] + E[Y].

Variance is E[(X − E[X])²].
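Both formulas for E[X], plus the variance, can be checked on the fair die from the earlier example (X(ω) = ω is an assumption made here for concreteness):

```python
from fractions import Fraction

# Fair-die variable X(ω) = ω; expectation from the definition
# E[X] = Σ_ω X(ω) P(ω), variance as E[(X − E[X])²].
P = {w: Fraction(1, 6) for w in range(1, 7)}
E = sum(w * p for w, p in P.items())
var = sum((w - E) ** 2 * p for w, p in P.items())
print(E, var)  # 7/2 and 35/12
```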

Page 10:

Part II: A Bit of History

1 Old times.

2 Classical probabilities.

3 Frequentist and Bayesian schemes.

Page 11:

Brief Look: History of Probabilities

Classical:
Leibniz, Fermat, Pascal, De Moivre (1600)
Bayes (1700)
Laplace (1800)
Modifications: Keynes, Jeffreys, Jaynes (1940)

Frequentist:
Venn, Boole, De Morgan (1850)
Fisher, Neyman/Pearson (1900)

Bayesian:
Ramsey, De Finetti (1930)
Savage (1950)

Page 12:

The Classical Theory: Ancient Time

First thoughts appeared in Philosophy:

Aristotle: “the probable is that which for the mostpart happens”

Bishop Butler: “to us probability is the very guideof life”

Also, many philosophers have used probability to prove the existence of God (e.g., the proof of the ecliptic)

Page 13:

The Classical Theory: Evolution

Pascal, De Moivre, Bernoulli: central limit theorem, law of large numbers.

Thomas Bayes: What you believe depends on what you believed before; we need prior distributions.

Bayes' rule: P(A|B) = P(B|A)P(A) / P(B).

Page 14:

The Classical Theory: Laplace

Probability is the ratio of the number of favorable cases to that of all the cases possible.

The Principle of Non-Sufficient Reason: twopossible cases are equally probable if there is noreason to prefer one to the other

Page 15:

The Classical Theory: Difficulties

The great problem: the Principle of Non-Sufficient Reason.

Too many proofs from too little knowledge.

The problem of reparameterizations: if you are not sure about x, you are not sure about x². How to express that?

Page 16:

Now Come the Frequentists

Basic idea: instead of using ignorance, let's use knowledge.

Let's define probability as the limiting relative frequency of observations:

P(A) = lim_{n→∞} n_A/n.

Venn, Boole and De Morgan proposed it around 1850; Statistics was built upon this conception of probabilities.

Page 17:

The Frequentist Theory: Difficulties

The definition is too poor compared to what we want.

It is impossible to talk about probabilities for things that will happen only once!

More mathematically, how to use the limit in the definition (lim_{n→∞} n_A/n)?

Many deterministic sequences have limits. Do random sequences have limits?

Page 18:

A Brief Summary So Far

Classical Theory:
Probability is the ratio of favorable cases to the number of cases (Principle of Non-Sufficient Reason).
Problem: the Principle of Non-Sufficient Reason is untenable.

Frequentist Theory:
Probability is a limiting relative frequency.
Problems: too narrow a concept; hard to define mathematically.

Page 19:

The Emergence of Subjectivism

Since everything else seems to fail, why don't we admit that there is a component of subjectivism in probability?

Ramsey/De Finetti groundbreaking idea: let's define probability as a "fair" betting strategy:

I'll give you 1 unit of currency if President X is re-elected.
How much would you pay to enter this bet "fairly"?
The amount you pay is your probability for X being re-elected.

Page 20:

The Bayesian Theory: Savage’s Idea

Axiomatize preferences over “gambles”

From preferences, obtain "money" (utility) and probabilities.

Result: If f ≺ g, then there is a probability measure P and a utility function U such that E[U(f)] < E[U(g)].

Page 21:

The Bayesian Theory: Basics

All forms of uncertainty are reduced to probability.

Judgements of uncertainty are reduced to preferences.

All forms of updating knowledge are equivalent to application of Bayes' rule.

Page 22:

Frequentists Versus Bayesians

Bayesians:

Induction is a solved problem: you define your prior, you collect data, and then you apply Bayes' rule, always following decision theory.

Challenges: basically subjective (annoying priors).

Frequentists:

Induction is an ad hoc activity; Statistics furnishes useful tools for induction.

Some tools: significance testing, hypothesis testing, least-squares...

Challenges: based on shaky foundations; piecemeal and ad hoc approach to problems.

Page 23:

Part III

1 Moments, variance, covariance.

2 Weak laws of large numbers.

Page 24:

Moments

Definition

The ith moment of X is the expectation E[X^i].

Definition

The ith central moment of X is the expectation E[(X − E[X])^i].

Definition

The variance V[X] of X is the second central moment of X.

Note:

V[X] = E[(X − E[X])²] = E[X²] − E[X]².
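The identity V[X] = E[X²] − E[X]² can be verified numerically; the pmf below is made up for illustration:

```python
from fractions import Fraction

# Check V[X] = E[X²] − E[X]² on a small, made-up pmf.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 3: Fraction(1, 4)}
E  = sum(x * p for x, p in pmf.items())
E2 = sum(x * x * p for x, p in pmf.items())
var_central = sum((x - E) ** 2 * p for x, p in pmf.items())
print(var_central == E2 - E * E)  # True
```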

Page 25:

Markov inequality

1 Suppose X ≥ 0 and t > 0. If X(ω) < t, then I_{X≥t}(ω) = 0, so X(ω)/t ≥ I_{X≥t}(ω). If X(ω) ≥ t, then X(ω)/t ≥ 1 = I_{X≥t}(ω). Consequently, X/t ≥ I_{X≥t}, and then E[X]/t ≥ E[I_{X≥t}], so:

P(X ≥ t) ≤ E[X]/t.

2 Chebyshev inequality:

P(|X − E[X]| ≥ t) ≤ V[X]/t².
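Markov's inequality can be checked exhaustively on a small example; the fair-die pmf is assumed here for concreteness:

```python
from fractions import Fraction

# Exhaustive check of Markov's inequality P(X ≥ t) ≤ E[X]/t on a fair die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
E = sum(x * p for x, p in pmf.items())
for t in range(1, 10):
    tail = sum(p for x, p in pmf.items() if x >= t)
    assert tail <= E / t
print("Markov bound holds for t = 1..9")
```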

Page 26:

Digression: Covariance

Definition

The covariance of variables X and Y is Cov(X, Y) = E[(X − E[X])(Y − E[Y])].

If two variables X and Y are such that Cov(X, Y) = 0, then X and Y are uncorrelated.

Page 27:

Very weak law of large numbers

Theorem

If variables X_1, X_2, ..., X_n have expectations E[X_i] ∈ [µ̲, µ̄] and variances V[X_i] ∈ [σ̲², σ̄²], and X_i and X_j are uncorrelated for every i ≠ j, then for any ε > 0,

P(µ̲ − ε < (∑_i X_i)/n < µ̄ + ε) ≥ 1 − σ̄²/(nε²).

Page 28:

Weak law of large numbers

Theorem

If variables X_1, X_2, ... have expectations E[X_i] = µ and variances V[X_i] = σ², and X_i and X_j are uncorrelated for every i ≠ j, then for any ε > 0,

lim_{n→∞} P(|(∑_i X_i)/n − µ| < ε) = 1.
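A simulated illustration of the weak law, assuming fair-coin indicators (µ = 0.5); the seed is fixed so the run is reproducible:

```python
import random

# Averages of fair-coin indicator variables concentrate around µ = 0.5
# as n grows, as the weak law predicts.
random.seed(0)

def average(n):
    return sum(random.random() < 0.5 for _ in range(n)) / n

for n in (100, 10_000, 1_000_000):
    print(n, average(n))
```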

Page 29:

Philosophy behind the "law"

Idea: irregularities observed in the X_i do not affect the average of these variables.

We should have regularity out of apparent chaos: even though the random variables behave randomly, their average does approach some meaningful number (the probability...).

Suggests the "definition": P(A) = lim_{n→∞} #A/n.

Page 30:

Part IV: Conditioning

1 Bayes rule.

2 Theorem of total probabilities.

Page 31:

Conditioning: Bayes rule

Definition

If P(B) > 0, then

P(A|B) = P(A ∩ B) / P(B).

Definition

The conditional expectation of X given B, denoted by E[X|B], is defined only if P(B) > 0, as

E[X|B] = ∑_x x P(X = x|B) = E[I_B X] / P(B).
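A small sketch of conditional probability and conditional expectation on the fair die (conditioning on the event B = "outcome is odd", chosen here for illustration):

```python
from fractions import Fraction

# Fair die; condition on B = {1, 3, 5} (outcome is odd).
P = {w: Fraction(1, 6) for w in range(1, 7)}
B = {1, 3, 5}
pB = sum(P[w] for w in B)

# P(ω|B) = P(ω)/P(B) for ω in B; E[X|B] = Σ_{ω∈B} X(ω) P(ω|B).
E_given_B = sum(w * P[w] / pB for w in B)
print(pB, E_given_B)  # 1/2 and 3
```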

Page 32:

Basic facts

For any C such that P(C ) > 0:

For any A, P(A|C ) ≥ 0.

P(Ω|C ) = 1.

If A ∩ B = ∅, then P(A ∪ B|C) = P(A|C) + P(B|C).

Note that P(A|A) = 1 whenever P(A) > 0.

Page 33:

Properties

1 E[X|B] = ∑_{ω∈B} X(ω)P(ω|B).

2 For events {B_i}_{i=1}^n,

P(B_1 ∩ B_2 ∩ · · · ∩ B_n) = P(B_1) ∏_{i=2}^n P(B_i | ∩_{j=1}^{i−1} B_j).

Page 34:

More properties

1 Total probability theorem: If events B_i form a partition of Ω such that all P(B_i) > 0,

P(A) = ∑_i P(A ∩ B_i) = ∑_i P(A|B_i)P(B_i).

2 Then:

P(B_i|A) = P(A|B_i)P(B_i) / ∑_j P(A|B_j)P(B_j).

Page 35:

Example

1 Individuals in an office may have a disease D. A test detects the disease (result R or R^c).

2 P(R |D) = 9/10 (sensitivity of the test).

3 P(Rc |Dc) = 4/5 (specificity of the test).

4 P(D) = 1/9.

5 Then:

P(D|R) = (9/10 × 1/9) / (9/10 × 1/9 + 1/5 × 8/9) = 9/25.
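The same computation in Python, using exactly the slide's numbers (sensitivity, specificity, prior):

```python
from fractions import Fraction

# Bayes' rule with the slide's numbers.
p_R_given_D   = Fraction(9, 10)   # sensitivity P(R|D)
p_Rc_given_Dc = Fraction(4, 5)    # specificity P(Rc|Dc)
p_D           = Fraction(1, 9)    # prior P(D)

p_R_given_Dc = 1 - p_Rc_given_Dc  # 1/5
num = p_R_given_D * p_D
den = num + p_R_given_Dc * (1 - p_D)
print(num / den)  # 9/25
```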

Page 36:

Three-prisoners problem

1 Three prisoners, Teddy, Jay, and Mark, waiting to be executed.

2 Governor will select one to be freed (equal probability).

3 Warden knows the governor's decision.

4 Teddy convinces the warden to say the name of one of his fellow inmates who will be executed (useless information...).

5 Warden is honest.

6 Warden says that Jay is to be executed: Teddy is happy (1/3 to 1/2)!

7 But if warden said Mark, Teddy would be happy??

Page 37:

Analysis

1 Possibility space:

Ω = { Teddy freed ∩ warden says Jay,
      Teddy freed ∩ warden says Mark,
      Jay freed ∩ warden says Mark,
      Mark freed ∩ warden says Jay }.

2 We know that

P(Teddy freed) = P(Jay freed) = P(Mark freed) = 1/3.

3 How would the warden behave if Teddy is to be freed?

P(warden says Jay|Teddy freed).

Page 38:

Possible conclusion...

If P(warden says Jay|Teddy freed) = 1/2, then:

P(Teddy freed ∩ warden says Jay) = 1/6,

P(Teddy freed ∩ warden says Mark) = 1/6,

P(Jay freed ∩ warden says Mark) = 1/3,

P(Mark freed ∩ warden says Jay) = 1/3.

Hence

P(Teddy freed|warden names Jay) = (1/6)/(1/3 + 1/6) = 1/3.
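This conditioning can be reproduced by enumerating the joint probabilities above (assuming, as in this slide, that the warden flips a fair coin when Teddy is the one freed):

```python
from fractions import Fraction

# Joint probabilities (freed prisoner, name the warden says) from the slide.
joint = {
    ("Teddy", "Jay"):  Fraction(1, 6),
    ("Teddy", "Mark"): Fraction(1, 6),
    ("Jay",   "Mark"): Fraction(1, 3),
    ("Mark",  "Jay"):  Fraction(1, 3),
}

# Condition on the event "warden says Jay".
says_jay = {k: p for k, p in joint.items() if k[1] == "Jay"}
posterior = joint[("Teddy", "Jay")] / sum(says_jay.values())
print(posterior)  # 1/3
```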

Page 39:

Complete analysis...

1 The statement does not say anything about the behaviour of the warden.

2 All that is really known is

P(warden names Jay|Teddy freed) ∈ [0, 1].

3 Consequently

P(Teddy freed|warden names Jay) ∈ [0/(0 + 1/3), (1/3)/(1/3 + 1/3)] = [0, 1/2].

Page 40:

Part V: Probability mass functions

1 Mass functions.

2 Marginal probability mass functions.

3 Conditional probability mass functions.

4 Multivariate models.

Page 41:

Probability mass function

1 The probability mass function is simply p_X(x) = P(X = x).

2 Then P(X ∈ A) = ∑_{x∈A} p_X(x).

Example (Uniform distribution)

The uniform distribution for X with k values assigns p_X(x) = 1/k to every value x of X.

Example (Bernoulli distribution)

Binary variable X with values 0 and 1. A Bernoulli distribution with parameter p for X takes two values: p_X(0) = 1 − p and p_X(1) = p. E[X] = 0(1 − p) + 1p = p; V[X] = p(1 − p).

Page 42:

More on probability mass functions...

1 For Y = f(X),

p_Y(y) = P(Y = y) = ∑_{x∈Ω_X : f(x)=y} p_X(x),

p_Y(y) = P(Y = y) = ∑_{ω∈Ω : f(X(ω))=y} P(ω).

2 Conditional probability mass function: p_{X|B}(x|B) = P(X = x|B).

3 Joint probability mass function p_{X,Y}:

p_{X,Y}(x, y) = P({X = x} ∩ {Y = y}).

Page 43:

Marginal probability mass functions

p_X(x) = P(X = x) = ∑_{y∈Ω_Y} P({X = x} ∩ {Y = y}) = ∑_{y∈Ω_Y} p_{X,Y}(x, y).

Example

X and Y with three values each and

p_{X,Y}(x, y) | y = 1 | y = 2 | y = 3
x = 1         | 1/10  | 1/25  | 1/20
x = 2         | 1/20  | 1/5   | 1/25
x = 3         | 1/10  | 1/50  | 2/5
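Marginalizing the table above in Python (entries taken directly from the example):

```python
from fractions import Fraction as F

# Joint pmf from the table; marginalize over y to get pX.
joint = {
    (1, 1): F(1, 10), (1, 2): F(1, 25), (1, 3): F(1, 20),
    (2, 1): F(1, 20), (2, 2): F(1, 5),  (2, 3): F(1, 25),
    (3, 1): F(1, 10), (3, 2): F(1, 50), (3, 3): F(2, 5),
}
pX = {x: sum(p for (x2, y), p in joint.items() if x2 == x) for x in (1, 2, 3)}
print(pX)
```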

Page 44:

Expectations and mass functions

Finite spaces:

E[X] = ∑_{ω∈Ω} X(ω)P(ω)
     = ∑_{x∈Ω_X} ∑_{ω:X(ω)=x} x P(ω)
     = ∑_{x∈Ω_X} x ∑_{ω:X(ω)=x} P(ω)
     = ∑_{x∈Ω_X} x P(X = x)
     = ∑_{x∈Ω_X} x p_X(x).

Page 45:

Conditional probability mass

For variable X and event A such that P(A) > 0,

p_{X|A}(x|A) = P(X = x|A).

For variables X and Y,

p_{X|Y}(x|y) = P(X = x|Y = y).

Page 46:

Iterated expectations

Denote E[X|Y = y] by E[X|y]. Then E[X|Y] is a function of Y. For finite spaces:

E[X] = ∑_{x∈Ω_X} x p_X(x)
     = ∑_{x∈Ω_X} x ∑_{y∈Ω_Y} p_{X,Y}(x, y)
     = ∑_{x∈Ω_X} ∑_{y∈Ω_Y} x p_{X|Y}(x|y) p_Y(y)
     = ∑_{y∈Ω_Y} E[X|Y = y] p_Y(y)
     = E[E[X|Y]].

(A similar expression holds for infinite spaces.)
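The identity E[X] = E[E[X|Y]] can be checked on a small joint pmf (the binary variables and their probabilities below are made up for illustration):

```python
from fractions import Fraction as F

# Check E[X] = E[E[X|Y]] on a made-up joint pmf for binary X, Y.
joint = {(0, 0): F(3, 8), (0, 1): F(1, 8),
         (1, 0): F(1, 8), (1, 1): F(3, 8)}

pY = {y: joint[(0, y)] + joint[(1, y)] for y in (0, 1)}
E_X = sum(x * p for (x, y), p in joint.items())
E_X_given = {y: sum(x * joint[(x, y)] for x in (0, 1)) / pY[y] for y in (0, 1)}
iterated = sum(E_X_given[y] * pY[y] for y in (0, 1))
print(E_X, iterated)  # both 1/2
```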

Page 47:

For sets of variables

1 Probability mass function, conditional, joint,marginal, etc.

2 Vectors:

X = [X_1, ..., X_n]^T,

p_X(x) = P(X = x), E[X] = [E[X_1], ..., E[X_n]]^T.

Page 48:

Part VI: Independence

1 Independence for two events, for many events.

2 Independence for random variables.

3 Conditional independence.

Page 49:

Independence for events

1 Events A and B are independent if

P(A|B) = P(A) whenever P(B) > 0;

or, equivalently,

P(A ∩ B) = P(A)P(B).

2 Events {A_i}_{i=1}^n are independent if, for every subset of them,

P(∩_i A_i) = ∏_i P(A_i).

(Pairwise independence is not enough!)
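The classic counterexample showing that pairwise independence is not enough (two fair coin tosses; the events are assumptions chosen for the illustration):

```python
from fractions import Fraction

# A = first toss heads, B = second toss heads, C = both tosses equal.
# A, B, C are pairwise independent but not mutually independent.
Omega = {"HH": Fraction(1, 4), "HT": Fraction(1, 4),
         "TH": Fraction(1, 4), "TT": Fraction(1, 4)}

def prob(event):
    return sum(p for w, p in Omega.items() if w in event)

A = {"HH", "HT"}; B = {"HH", "TH"}; C = {"HH", "TT"}

assert prob(A & B) == prob(A) * prob(B)
assert prob(A & C) == prob(A) * prob(C)
assert prob(B & C) == prob(B) * prob(C)
print(prob(A & B & C), prob(A) * prob(B) * prob(C))  # 1/4 vs 1/8
```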

Page 50:

Independence for random variables

For all events such that the conditional probabilities are defined,

P(X_i = x_i | ∩_{j≠i} {X_j = x_j}) = P(X_i = x_i);

that is,

p(x_i | ∩_{j≠i} {X_j = x_j}) = p(x_i).

Or, more concisely:

p(x_1, ..., x_n) = ∏_{i=1}^n p(x_i).

Page 51:

Conditional independence

1 (X ⊥⊥Y |A) if

E[f (X )g(Y )|A] = E[f (X )|A]E[g(Y )|A]

for all functions f , g , whenever P(A) > 0.

2 (X ⊥⊥Y |Z ) if

(X ⊥⊥Y |Z = z)

for every category z of Z such that P(Z = z) > 0.

Page 52:

Part VII: Laws of Large Numbers

1 Weak law.

2 Strong law.

Page 53:

Weak law of large numbers again

1 Independence implies zero correlation:

E[(X_i − E[X_i])(X_j − E[X_j])] = E[X_i − E[X_i]] E[X_j − E[X_j]] = 0.

2 If independent variables X_1, X_2, ... have expectations E[X_i] = µ and variances V[X_i] = σ², then for any ε > 0,

lim_{n→∞} P(|(∑_i X_i)/n − µ| < ε) = 1.

3 There are variants: assuming no variance,assuming expectations change, etc.

Page 54:

Advanced: strong law of large numbers

In a sequence of variables X_1, X_2, ... with E[X_i] = µ, the mean converges to the expectation with probability one:

P(lim_{n→∞} (∑_{i=1}^n X_i)/n = µ) = 1.

1 It requires the theory of infinite spaces.

2 It is hard to prove and requires severalassumptions.

3 It is really a strong result.

Page 55:

Part VIII: General Spaces

1 Infinities.

2 General axioms.

Page 56:

Infinite spaces

So far, Ω has been a finite set.
1 Random variables have finitely many values.
2 A probability mass function specifies a distribution through finitely many values.

Now suppose Ω is an infinite set: Ω may be
1 countable (natural, odd, integer, rational numbers) or
2 uncountable (real numbers).

Page 57:

Kolmogorov’s axioms

PU1 For any event A, P(A) ≥ 0.

PU2 The space Ω has probability one: P(Ω) = 1.

PU3 If events A and B are disjoint (that is, A ∩ B = ∅), then P(A ∪ B) = P(A) + P(B).

PU4 If events A_i are such that lim_{n→∞} ∩_{i=1}^n A_i = ∅, then lim_{n→∞} P(∩_{i=1}^n A_i) = 0.

Page 58:

Equivalent axioms (countable additivity)

P1 For any event A, P(A) ≥ 0.

P2 The space Ω has probability one:P(Ω) = 1.

P3 If countably many events {A_i}_{i=1}^∞ are disjoint, then P(∪_i A_i) = ∑_{i=1}^∞ P(A_i).

The last axiom introduces countable additivity.

Page 59:

Example with discrete variable

Suppose X has integer values 0, 1, 2, . . . .

Then X has a Poisson distribution withparameter λ > 0 when

P(X = x) = e^{−λ} λ^x / x!,

for x ≥ 0.

Page 60:

Part IX: Mathematics of Infinite Spaces

1 Fields and algebras; Borel algebra.

2 Measurability.

3 Lebesgue integration.

Page 61:

Digression: Measurability

Given an infinite Ω, we have to specify probability values for its subsets. All subsets?

1 It is impossible to define a countably additive probability measure over all subsets of the real numbers for which the probability of an interval [a, b] is b − a.

2 In fact, there are subsets of ℝ that are non-measurable: a countably additive set-function cannot be defined on them such that [a, b] maps to b − a.

3 Ulam's theorem: if a countably additive measure is defined over all subsets of the real numbers and vanishes on all singletons, it is identically zero.

Page 62:

Kolmogorov’s solution: fields

Consider first a finite set Ω. A field F is a nonempty set of subsets of Ω such that:

1 if A ∈ F, then A^c ∈ F;
2 if A ∈ F and B ∈ F, then A ∪ B ∈ F.

Note: if A is in a field, then A^c, ∅ and Ω are automatically in the field.

Example: {∅, A, B, A^c, B^c, A ∪ B, A^c ∪ B, A ∪ B^c, A^c ∪ B^c, A ∩ B, A^c ∩ B, A ∩ B^c, A^c ∩ B^c, (A^c ∩ B) ∪ (A ∩ B^c), (A ∪ B^c) ∩ (A^c ∪ B), Ω}.
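The two closure conditions can be applied mechanically to generate a field. Below, Ω and the generators A, B are made up, chosen in "general position" so the resulting field has the 16 elements of the slide's example:

```python
# Generate the field containing events A and B by closing the collection
# under complement and pairwise union (a fixpoint iteration).
Omega = frozenset({0, 1, 2, 3})
A = frozenset({0, 1})
B = frozenset({1, 2})

field = {A, B}
changed = True
while changed:
    changed = False
    candidates = {Omega - S for S in field} | {S | T for S in field for T in field}
    for new in candidates:
        if new not in field:
            field.add(new)
            changed = True

print(len(field))  # 16 sets, matching the example
```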

Page 63:

σ-fields

Now consider an infinite set Ω. A σ-field is a set of subsets of Ω such that:

1 if A ∈ F, then A^c ∈ F;
2 if A_i ∈ F for i = 1, 2, ..., then ∪_i A_i ∈ F.

Note that σ-fields are closed under countable unions.

Page 64:

Fields and algebras

Fields are also called algebras.

σ-fields are also called σ-algebras.

In fact, “algebra” seems to be a better term(there are other meanings for the word “field”that do not apply here...).

Terminology is confusing!

Page 65:

Borel algebras

"Minimal σ-algebra containing the open sets/compact sets of a topological space Ω."

The Borel algebra for the real numbers: the smallest σ-algebra on ℝ that contains the intervals.

The elements of a Borel algebra are the Borel sets.

Page 66:

Consequences of countable additivity

There is no way to extend arbitrary assessments over arbitrary spaces.

There is no uniform distribution on the integers.

BUT: countable additivity basically allows us to use integrals to compute expectations!

VERY important: there is a unique probability measure that corresponds to an expectation (and vice-versa)!

Page 67:

End of digression

A probability space is a triple (Ω, F, P), where

Ω is a set (possibility space).

F is a σ-algebra on Ω.

P is a probability measure on F; that is, a non-negative, normalized (to unity), and countably additive set-function.

Note: in almost all books on probability theory, the probability space takes the real numbers and their Borel algebra.

Page 68:

Extending the previous theory

A variable with finitely many values is called simple. We know how to relate expectation and probability for those.

Take two possible approximations for E[X]:

E[X] ≈ sup{E[Y] : Y ≤ X, Y is simple}.

E[X] ≈ inf{E[Z] : Z ≥ X, Z is simple}.

In fact, for many random variables both approximations coincide, and then

E[X] = sup{E[Y] : Y ≤ X, Y is simple}.

The problem is to characterize these variables, and the properties of this functional.

Page 69:

Measurable random variables

A function f : Ω → ℝ is F-measurable with respect to an algebra F when every set

{ω : f(ω) ≤ α}

belongs to F.

Note: there are more general definitions in the literature...

A random variable X is measurable when it is a measurable function.

Page 70:

The Lebesgue integral

A variable with finitely many values is called simple. Take two possible approximations for E[X]:

E[X] ≈ sup {E[Y] : Y ≤ X, Y is simple} .

E[X] ≈ inf {E[Z] : Z ≥ X, Z is simple} .

In fact, given an expectation/measure, we have

E[X] = sup {E[Y] : Y ≤ X, Y is simple} .

This quantity is the Lebesgue integral with respect to the probability measure P. Notation:

E[X] = ∫ X dP.



Discrete random variables

Suppose X has an enumerable set of values. Then

E[X] = ∑_x x P(X = x) .

Example: X has a Poisson distribution with parameter λ > 0 when

P(X = x) = e^(−λ) λ^x / x! ,

for integer x ≥ 0; then

E[X] = λ and V[X] = λ.
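The Poisson claims can be checked by summing the (truncated) series directly; λ = 3 is an arbitrary illustrative value.

```python
import math

# Check E[X] = lambda and V[X] = lambda for a Poisson pmf by summing
# the series directly (truncated at 100 terms; the tail for lam = 3
# is negligible).
lam = 3.0
pmf = [math.exp(-lam) * lam**x / math.factorial(x) for x in range(100)]
mean = sum(x * p for x, p in enumerate(pmf))
var = sum((x - mean) ** 2 * p for x, p in enumerate(pmf))
```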


The Riemann integral

Under quite general conditions, the Lebesgue integral can be computed using the Riemann integral (the “usual” integral).

The key idea is to define densities and then to integrate with respect to densities.


Part X: Densities and the like

1 Cumulative distribution functions and densities.

2 A summary.


Cumulative distribution function

The function

FX(x) = P({ω : X(ω) ≤ x})

is the cumulative distribution function of X. Note:

FX is a non-negative non-decreasing function, with FX(−∞) = 0 and FX(∞) = 1.

When X has a density p, P(X ∈ [a, b]) = FX(b) − FX(a) = ∫_a^b p(x) dx .


Densities

For a measurable variable X, the density of X is, when it exists:

pX(x) = dFX(x)/dx = d P({ω : X(ω) ≤ τ}) / dτ , evaluated at τ = x .

Then:

E[X] = ∫_{ΩX} x pX(x) dx ,

where ΩX is the set of values of X, and the integral is the Riemann integral.
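The chain CDF → density → expectation can be illustrated numerically; here the exponential CDF F(x) = 1 − e^(−x) (an assumed example) is differentiated by finite differences to recover p(x) = e^(−x), and E[X] = 1 is recovered by a Riemann sum.

```python
import math

# For an exponential variable with rate 1: F(x) = 1 - exp(-x).
# Differentiate F numerically to recover the density p(x) = exp(-x),
# then compute E[X] = int x p(x) dx by a Riemann sum (a sketch).
F = lambda x: 1.0 - math.exp(-x)
h = 1e-6
p = lambda x: (F(x + h) - F(x - h)) / (2 * h)   # central difference

dx = 1e-3
mean = sum(x * p(x) * dx for x in [i * dx for i in range(1, 20000)])
# mean approximates E[X] = 1 for the unit-rate exponential.
```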


Summary

1 Kolmogorov’s theory: probability space (Ω,F ,P), where Ω is a general possibility space, F is a σ-algebra, and P is a non-negative, normalized-to-unity, and countably additive set-function (a normalized measure).

The most common σ-algebra for the real numbers is the Borel algebra (generated by intervals). Random variables are F-measurable functions. Expectations of measurable functions are Lebesgue integrals.

2 The distribution of X is entirely captured by FX(x), the cumulative distribution function.

3 If FX is continuous, the variable X is continuous; if FX is also differentiable, we can differentiate FX(x) to obtain the density pX(x).

4 If the distribution of X has a density pX(x), then the expectation E[X] is a Riemann integral

∫ x pX(x) dx .


Part XI: Catalog of Distributions

1 Common densities.

2 De Moivre - Laplace’s theorem.


Uniform distribution

Suppose X is a real-valued variable.

The distribution of X is uniform on [a, b] if its density is

pX(x) = 1/(b − a) if x ∈ [a, b],

and pX(x) = 0 otherwise.


Gaussian distribution

X has a Gaussian distribution with parameters µ and σ² when

pX(x) = (1 / √(2πσ²)) exp(−(x − µ)² / (2σ²)).

E[X] = µ and V[X] = σ².


De Moivre - Laplace’s theorem

Take n and p ∈ (0, 1) such that n p (1 − p) ≫ 1; then

(n choose k) p^k (1 − p)^(n−k) ≈ exp(−(k − np)² / (2np(1 − p))) / √(2πnp(1 − p)) .

(That is, the ratio of the two sides goes to 1.)

That is, the probability that k among n trials are positive, when n grows without bound, can be approximated by a Gaussian density.
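The approximation can be checked directly; n = 1000, p = 0.3, k = 310 are arbitrary illustrative choices with n p (1 − p) ≫ 1.

```python
import math

# Compare the binomial probability of k successes in n trials to the
# de Moivre-Laplace Gaussian approximation (n, p, k chosen arbitrarily).
n, p = 1000, 0.3
k = 310
binom = math.comb(n, k) * p**k * (1 - p) ** (n - k)
gauss = math.exp(-((k - n * p) ** 2) / (2 * n * p * (1 - p))) / math.sqrt(
    2 * math.pi * n * p * (1 - p)
)
ratio = binom / gauss   # close to 1 when n p (1 - p) >> 1
```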


Gamma distribution

X has a gamma distribution with parameters α and β when

pX(x) = (β^α / Γ(α)) x^(α−1) e^(−βx) ,

for x > 0, and pX(x) = 0 otherwise.

Gamma function:

Γ(α) = ∫_0^∞ z^(α−1) e^(−z) dz .

Note: for any positive integer k, Γ(k) = (k − 1)!.

We have E[X] = α/β and the variance of X is α/β².
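Both facts (Γ(k) = (k − 1)! and E[X] = α/β) can be checked numerically; α = 2.5 and β = 1.5 are arbitrary illustrative values.

```python
import math

# Gamma(k) = (k - 1)! for positive integers k, via math.gamma.
facts_ok = all(math.isclose(math.gamma(k), math.factorial(k - 1))
               for k in range(1, 15))

# Mean of the gamma density with alpha = 2.5, beta = 1.5 (arbitrary
# values), computed by a Riemann sum of x * p(x); should be alpha/beta.
alpha, beta = 2.5, 1.5
p = lambda x: beta**alpha / math.gamma(alpha) * x ** (alpha - 1) * math.exp(-beta * x)
dx = 1e-3
mean = sum(x * p(x) * dx for x in [i * dx for i in range(1, 40000)])
```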


Gamma and exponential distributions

Important: if X1, . . . , Xn are independent and each Xi has a gamma distribution with parameters αi and β, then X = X1 + · · · + Xn has a gamma distribution with parameters α1 + · · · + αn and β.

If α = 1 and β > 0, then X has an exponential distribution with parameter β,

pX(x) = β e^(−βx) ,

for x > 0, and pX(x) = 0 otherwise.


Chi-square distribution

X has a chi-square distribution when

pX(x) = (1 / (√2 Γ(1/2))) exp(−x/2) / √x

when x > 0, and pX(x) = 0 otherwise.

The χ² with n degrees of freedom:

pX(x) = (1 / (2^(n/2) Γ(n/2))) x^(n/2 − 1) exp(−x/2)

when x > 0, and pX(x) = 0 otherwise. (This is a gamma distribution with α = n/2 and β = 1/2.)

If X1, . . . , Xn are independent Gaussian variables with µ = 0 and σ² = 1, then X1² + · · · + Xn² has a χ² distribution with n degrees of freedom.
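The last statement suggests a quick simulation check: sums of squares of n standard Gaussians should show the χ² moments, mean n and variance 2n (here n = 5, an arbitrary choice; 2n follows from the gamma moments with α = n/2, β = 1/2).

```python
import random

# Simulate: the sum of squares of n standard Gaussians should have
# mean n and variance 2n (moments of the chi-square distribution).
random.seed(0)
n, trials = 5, 20000
samples = [sum(random.gauss(0, 1) ** 2 for _ in range(n))
           for _ in range(trials)]
mean = sum(samples) / trials
var = sum((s - mean) ** 2 for s in samples) / trials
```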


Beta distribution

Often used to model random variables that are limited to an interval.

X has a beta distribution with parameters α and β when

pX(x) = (Γ(α + β) / (Γ(α)Γ(β))) x^(α−1) (1 − x)^(β−1) ,

for x ∈ [0, 1], and 0 otherwise.

Note: a beta density is proportional to x^(α−1)(1 − x)^(β−1). If α = β = 1, then we obtain the uniform distribution.

For a beta distribution pX(·) with parameters α and β, the expected value of X is α/(α + β) and the variance is αβ/((α + β)²(α + β + 1)).


Dirichlet distribution

A column vector of dimension n has a Dirichlet distribution when:

pX1,...,Xn(x1, . . . , xn) = (Γ(∑_i αi) / ∏_i Γ(αi)) ∏_i xi^(αi−1) ,

when ∑_i xi = 1 and each xi ≥ 0, and 0 otherwise.

The distribution is defined on a simplex of dimension n − 1.

The values α1, . . . , αn, where αi > 0, are the parameters of the Dirichlet distribution.

This is a direct generalization of the beta distribution.
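A standard way to sample a Dirichlet vector is to normalize independent gamma variates; a sketch with arbitrary parameters (2, 3, 5), whose component means should approach αi/∑αi:

```python
import random

# Sample from a Dirichlet distribution by normalizing independent
# gamma variates (standard construction; parameters chosen arbitrarily).
random.seed(1)
alphas = [2.0, 3.0, 5.0]

def dirichlet(alphas):
    g = [random.gammavariate(a, 1.0) for a in alphas]
    total = sum(g)
    return [x / total for x in g]

draws = [dirichlet(alphas) for _ in range(20000)]
# Component means should approach alpha_i / sum(alphas) = 0.2, 0.3, 0.5.
means = [sum(d[i] for d in draws) / len(draws) for i in range(3)]
```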


t distribution

X has a t distribution with n degrees of freedom (for n > 0) when

pX(x) = (Γ((n + 1)/2) / (Γ(n/2) √(nπ))) (1 + x²/n)^(−(n+1)/2) .

When n = 1, the distribution is called the Cauchy distribution; it is a distribution with undefined expected value and variance!

Important: if X has a Gaussian distribution with µ = 0 and σ² = 1, and Y, independent of X, has a χ² distribution with n degrees of freedom, then X/√(Y/n) has a t distribution with n degrees of freedom.


Part XII

1 Multivariate densities.


Multivariate densities

For two variables X and Y,

FX,Y(x, y) = P(X ≤ x, Y ≤ y)

and

pX,Y(x, y) = ∂²FX,Y(x, y) / ∂x ∂y .

Then

P((X, Y) ∈ D) = ∫∫_D pX,Y(x, y) dx dy .

...and similarly for any number of variables.


Gaussian vector

A column vector X of dimension n has a Gaussian joint probability density when:

pX(x) = (1 / √((2π)^n det P)) exp(−(x − µ)^T P^(−1) (x − µ) / 2) ,

where µ is a vector and P is a symmetric positive-definite matrix of appropriate dimensions.

For a Gaussian vector, we have:

E[X] = µ; V[X] = E[(X − E[X])(X − E[X])^T] = P .
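A sketch that simulates a 2-dimensional Gaussian vector via x = µ + Lz, with L a Cholesky factor of P and z a vector of independent standard Gaussians (µ and P below are arbitrary illustrative values), and checks E[X] = µ and the cross-covariance empirically:

```python
import random

# Simulate a 2-d Gaussian vector with mean mu and covariance P by the
# transformation x = mu + L z, L the Cholesky factor of P.
random.seed(2)
mu = [1.0, -2.0]
P = [[4.0, 1.2], [1.2, 1.0]]
# Cholesky factor of the 2x2 matrix P.
l11 = P[0][0] ** 0.5
l21 = P[1][0] / l11
l22 = (P[1][1] - l21 ** 2) ** 0.5

n = 50000
xs = []
for _ in range(n):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    xs.append((mu[0] + l11 * z1, mu[1] + l21 * z1 + l22 * z2))

m0 = sum(x for x, _ in xs) / n
m1 = sum(y for _, y in xs) / n
cov01 = sum((x - m0) * (y - m1) for x, y in xs) / n   # should be P[0][1]
```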


Part XIII: Functions

1 Functions of random variables.

2 Expected values of functions.


Functions of a random variable

Example: if Y = f(X) = aX + b, with a > 0, then:

f(X) ≤ y ⇔ X ≤ (y − b)/a ,

and then:

FY(y) = P(f(X) ≤ y) = P(X ≤ (y − b)/a) ,

so FY(y) = FX((y − b)/a).
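The identity FY(y) = FX((y − b)/a) can be checked against an empirical CDF; X uniform on [0, 1] with a = 2, b = 3 are assumed illustrative choices.

```python
import random

# Empirical check of F_Y(y) = F_X((y - b)/a) for Y = aX + b with a > 0,
# using X uniform on [0, 1] (a = 2, b = 3 are arbitrary).
random.seed(3)
a, b = 2.0, 3.0
xs = [random.random() for _ in range(100000)]
ys = [a * x + b for x in xs]

def ecdf(samples, t):
    return sum(1 for s in samples if s <= t) / len(samples)

y = 4.0
lhs = ecdf(ys, y)       # empirical F_Y(4.0)
rhs = (y - b) / a       # F_X(0.5) = 0.5, since F_X(t) = t on [0, 1]
```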


Functions of a random variable

Example: if Z = X + Y, then:

FZ(z) = P(Z ≤ z) = P(X + Y ≤ z) = ∫_{−∞}^{∞} ∫_{−∞}^{z−y} pX,Y(x, y) dx dy .


Linear combinations and conditioning

For a vector Y of dimension m such that Y = AX, where A is a matrix (of full row rank, so that the density exists), we have:

pY(y) = (1 / √((2π)^m det Q)) exp(−(y − ν)^T Q^(−1) (y − ν) / 2) ,

where ν = Aµ and Q = APA^T.

If X and Y are jointly Gaussian vectors, pX|Y(x|y) is also Gaussian with

E[X|Y] = E[X] + Cov[X, Y] V[Y]^(−1) (Y − E[Y]) ,

V[X|Y] = V[X] − Cov[X, Y] V[Y]^(−1) Cov[Y, X] .
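The conditional-mean formula can be probed by simulation: in the scalar case, the residual X − (E[X] + Cov[X, Y] V[Y]^(−1) (Y − E[Y])) should be uncorrelated with Y (the parameter values below are arbitrary illustrative choices).

```python
import random

# Jointly Gaussian scalars with E[X] = 1, E[Y] = 2, V[Y] = 1,
# Cov[X, Y] = 0.8, so E[X|Y] = 1 + 0.8 (Y - 2).  The residual
# X - E[X|Y] should be (empirically) uncorrelated with Y.
random.seed(4)
n = 50000
pairs = []
for _ in range(n):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    y = 2.0 + z1
    x = 1.0 + 0.8 * z1 + 0.6 * z2
    pairs.append((x, y))

my = sum(y for _, y in pairs) / n
resid = [x - (1.0 + 0.8 * (y - 2.0)) for x, y in pairs]
corr_resid_y = sum(r * (y - my) for r, (_, y) in zip(resid, pairs)) / n
```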


Also, we have:

If Y = f(X) and X has density pX(x), then

E[Y] = E[f(X)] = ∫ f(x) pX(x) dx .

Concepts of covariance, correlation, etc., are defined just as for discrete random variables.


Part XIV: Important concepts

1 Conditioning.

2 Independence.


Usual solution for conditioning

Introduce:

pX|Y(x|y) = pX,Y(x, y) / pY(y) = pX,Y(x, y) / ∫ pX,Y(x, y) dx .

Then:

E[X|Y] = ∫ x pX|Y(x|Y) dx .


Independence: basic definition

Consider random variables X1, X2, . . . , Xn. These random variables are independent when all distributions have densities and

p(X1, . . . , Xn) = ∏_{i=1}^{n} p(Xi) .


Part XV: Some advanced concepts

1 Kolmogorovian conditioning.

2 Other definitions of independence.

3 Convergence.

4 Laws of large numbers.

5 Exchangeability.

6 Central limit theorem.

7 Bayesian consensus.


Conditioning, again

Conditional expectation:

E[X|A] = E[X I_A] / P(A)

whenever P(A) > 0 (here I_A denotes the indicator function of A).

Now consider E[X|Y]. If Ω is uncountable, E[X|Y = y] may face the difficulty that one can have P(Y = y) = 0 for all values of Y. So, it is hard to define P(X|Y) as a function of X and Y.


Real thing: Kolmogorovian conditioning

Definition: E[X|Y] is a random variable that is “Y-measurable” and such that

E[f(Y)(X − E[X|Y])] = 0

for any (bounded measurable) function f(Y).

This is not simple!

Usually, a proper density pX|Y(x|y) exists such that “probabilities” P(A(X)|B(Y)) can be calculated.


Independence, again

Variables X1, . . . , Xn are independent if

E[fi(Xi) | ∩_{j≠i} {Xj ∈ Aj}] = E[fi(Xi)] ,

for all functions fi(Xi) and all events ∩_{j≠i} {Xj ∈ Aj} with positive probability.

Equivalently, for all functions fi(Xi),

E[∏_{i=1}^{n} fi(Xi)] = ∏_{i=1}^{n} E[fi(Xi)] .

Equivalently, for all sets of events A1, . . . , An,

P(∩_{i=1}^{n} {Xi ∈ Ai}) = ∏_{i=1}^{n} P(Xi ∈ Ai) .


Convergence

There are many notions of convergence for random variables. Consider a sequence of random variables {Xi} defined in the same probability space (Ω,F ,P).

1 Convergence in distribution (function): lim_{n→∞} FXn(x) = FX(x) at every continuity point x of FX.

2 Convergence in probability: for every ε > 0, lim_{n→∞} P(|Xn − X| ≥ ε) = 0.

3 Almost sure convergence (with probability one): P(lim_{n→∞} Xn = X) = 1.


Review: Law of Large Numbers

Consider an infinite sequence of independent variables X1, X2, . . ., each with expectation µ and variance σ².

Define X̄ = (∑_{i=1}^{n} Xi) / n.

Weak law of large numbers: for every ε > 0,

lim_{n→∞} P(|X̄ − µ| ≥ ε) = 0.

Strong law of large numbers:

P(lim_{n→∞} X̄ = µ) = 1.
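The weak law can be illustrated with i.i.d. uniform [0, 1] draws (µ = 1/2): deviations of the sample mean from µ shrink as n grows.

```python
import random

# Law of large numbers, illustrated: the sample mean of i.i.d.
# uniform [0, 1] draws (mu = 0.5) concentrates as n grows.
random.seed(5)

def sample_mean(n):
    return sum(random.random() for _ in range(n)) / n

devs = {n: abs(sample_mean(n) - 0.5) for n in (10, 1000, 100000)}
```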


Central limit theorem

Take a sequence of n independent random variables Xi, each with mean µi and variance σi².

Consider the random variable X = ∑_i Xi; then E[X] = µ = ∑_i µi and the variance is σ² = ∑_i σi².

If we define

Z = (X − µ) / σ ,

then, under mild conditions on the Xi, the distribution of Z tends to a Gaussian distribution with expectation 0 and variance 1 as n → ∞.
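A quick illustration with i.i.d. uniform [0, 1] variables (µ = 1/2, σ² = 1/12, arbitrary choices): standardized sums look standard Gaussian.

```python
import random

# Central limit theorem, illustrated: standardized sums of i.i.d.
# uniform [0, 1] variables should be roughly standard Gaussian.
random.seed(6)
n, trials = 200, 20000
mu, sigma2 = 0.5, 1.0 / 12.0

zs = []
for _ in range(trials):
    s = sum(random.random() for _ in range(n))
    zs.append((s - n * mu) / (n * sigma2) ** 0.5)

zmean = sum(zs) / trials
zvar = sum(z ** 2 for z in zs) / trials
frac_within_1 = sum(1 for z in zs if abs(z) <= 1) / trials  # ~ 0.683
```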


Exchangeability

1 Binary variables X1, X2, . . . are exchangeable if P(X1, X2, . . .) does not change if we just change the order of the variables.

2 If X1, X2, . . . are exchangeable, then

P(k ones in n selected variables)

can always be written as

∫ (n choose k) θ^k (1 − θ)^(n−k) p(θ) dθ.

(This is de Finetti’s representation theorem.)

3 Note the deep implications of exchangeability!


Bayesian consensus

1 Suppose that n Bayesians have different priors Pi(θ).

2 Suppose they all observe X1, X2, . . . .

3 Suppose all Xi are independent and identically distributed as P(Xi|θ∗).

4 Then, provided every prior assigns positive density around θ∗, all Bayesians will reach an essentially identical posterior P(θ|X1, X2, . . .) that becomes infinitely concentrated around θ∗.


Summary

1 Kolmogorov’s theory: probability space (Ω,F ,P), where Ω is a general possibility space, F is a σ-algebra, and P is a non-negative, normalized-to-unity, and countably additive set-function (a normalized measure).

2 Random variables are F-measurable functions; expectations are Lebesgue integrals.

3 Distributions are captured by cumulative distribution functions and densities (there are many univariate and multivariate densities!).

4 Conditional density p(X|Y) is given by p(X, Y) / p(Y); independence means “conditional equal to unconditional” or “factorization” (actually only the latter).

5 Convergence concepts are important, with many results: laws of large numbers, central limit theorems...