the law of large numbers & the CLT · Strong Law 㱺 Weak Law (but not vice versa)! Strong law...

34
1 the law of large numbers & the CLT

Transcript of the law of large numbers & the CLT · Strong Law 㱺 Weak Law (but not vice versa)! Strong law...

Page 1: the law of large numbers & the CLT · Strong Law 㱺 Weak Law (but not vice versa)! Strong law implies that for any ε > 0, there are only a finite ... Each round bet one dollar

1

the law of large numbers & the CLT

Page 2: the law of large numbers & the CLT · Strong Law 㱺 Weak Law (but not vice versa)! Strong law implies that for any ε > 0, there are only a finite ... Each round bet one dollar

2

sums of random variables

If X,Y are independent, what is the distribution of Z = X + Y ?

Discrete case:

pZ(z) = Σx pX(x) • pY(z-x)

Continuous case:

fZ(z) = ∫-∞ fX(x) • fY(z-x) dx

W = X + Y + Z ? Similar, but double sums/integrals

V = W + X + Y + Z ? Similar, but triple sums/integrals

+∞

Page 3: the law of large numbers & the CLT · Strong Law 㱺 Weak Law (but not vice versa)! Strong law implies that for any ε > 0, there are only a finite ... Each round bet one dollar

3

example

If X and Y are uniform, then Z = X + Y is not; it’s triangular:

Intuition: X + Y ≈ 0 or ≈ 1 is rare, but many ways to get X + Y ≈ 0.5

Page 4: the law of large numbers & the CLT · Strong Law 㱺 Weak Law (but not vice versa)! Strong law implies that for any ε > 0, there are only a finite ... Each round bet one dollar

4

“laws of large numbers”

i.i.d. (independent, identically distributed) random vars

X1, X2, X3, …

Xi has μ = E[Xi] < ∞ and σ2 = Var[Xi]

So limits as n→∞ do not exist (except in the degenerate case where μ = σ2 = 0; note that if μ = 0, the center of the data stays fixed, but if σ2 > 0, then the spread grows with n).

Page 5: the law of large numbers & the CLT · Strong Law 㱺 Weak Law (but not vice versa)! Strong law implies that for any ε > 0, there are only a finite ... Each round bet one dollar

5

weak law of large numbers

i.i.d. (independent, identically distributed) random vars

X1, X2, X3, …

Xi has μ = E[Xi] < ∞ and σ2 = Var[Xi]

Consider the sample mean:

The Weak Law of Large Numbers: ��� For any ε > 0, as n → ∞

Page 6: the law of large numbers & the CLT · Strong Law 㱺 Weak Law (but not vice versa)! Strong law implies that for any ε > 0, there are only a finite ... Each round bet one dollar

6

weak law of large numbers

For any ε > 0, as n → ∞

Proof: (assume σ2 < ∞)

By Chebyshev inequality,

n→∞

Page 7: the law of large numbers & the CLT · Strong Law 㱺 Weak Law (but not vice versa)! Strong law implies that for any ε > 0, there are only a finite ... Each round bet one dollar

7

strong law of large numbers

i.i.d. (independent, identically distributed) random vars

X1, X2, X3, …

Xi has μ = E[Xi] < ∞

Strong Law ⇒ Weak Law (but not vice versa) Strong law implies that for any ε > 0, there are only a finite number of n satisfying the weak law condition ���(almost surely, i.e., with probability 1)

Page 8: the law of large numbers & the CLT · Strong Law 㱺 Weak Law (but not vice versa)! Strong law implies that for any ε > 0, there are only a finite ... Each round bet one dollar

8

weak vs strong laws

Weak Law:

Strong Law:

How do they differ? Imagine an infinite 2d table, whose rows are indp repeats of the infinite sample Xi. Pick ε. Imagine cell m,n lights up if average of 1st n samples in row m is > ε in row

WLLN says fraction of lights in nth column goes to zero as n →∞. It does not prohibit every row from having ∞ lights, so long as frequency declines.

SLLN says every row has only finitely many lights (with probability 1).

lim

Page 9: the law of large numbers & the CLT · Strong Law 㱺 Weak Law (but not vice versa)! Strong law implies that for any ε > 0, there are only a finite ... Each round bet one dollar

9

sample mean → population mean

Xi ~ Unif(0,1) limn→∞ Σi=1 Xi/n→ µ=0.5

n

Page 10: the law of large numbers & the CLT · Strong Law 㱺 Weak Law (but not vice versa)! Strong law implies that for any ε > 0, there are only a finite ... Each round bet one dollar

10

sample mean → population mean

μ±2σ

Xi ~ Unif(0,1) limn→∞ Σi=1 Xi/n→ µ=0.5

std dev(Σi=1 Xi/n) = 1/√12n

n

n

Page 11: the law of large numbers & the CLT · Strong Law 㱺 Weak Law (but not vice versa)! Strong law implies that for any ε > 0, there are only a finite ... Each round bet one dollar

demo

Page 12: the law of large numbers & the CLT · Strong Law 㱺 Weak Law (but not vice versa)! Strong law implies that for any ε > 0, there are only a finite ... Each round bet one dollar

12

another example

Page 13: the law of large numbers & the CLT · Strong Law 㱺 Weak Law (but not vice versa)! Strong law implies that for any ε > 0, there are only a finite ... Each round bet one dollar

13

another example

Page 14: the law of large numbers & the CLT · Strong Law 㱺 Weak Law (but not vice versa)! Strong law implies that for any ε > 0, there are only a finite ... Each round bet one dollar

14

another example

Page 15: the law of large numbers & the CLT · Strong Law 㱺 Weak Law (but not vice versa)! Strong law implies that for any ε > 0, there are only a finite ... Each round bet one dollar

15

the law of large numbers

Note: Dn = E[ | Σ1≤i≤n(Xi-μ) | ] grows with n, but Dn/n → 0

Justifies the “frequency” interpretation of probability

Suppose that Pr(A) = p

Consider independent trials in which event may or may not occur. Let Xi be indicator for whether or not it occurs in ith trial.

Law of Large numbers says relative frequency converges to p.

Page 16: the law of large numbers & the CLT · Strong Law 㱺 Weak Law (but not vice versa)! Strong law implies that for any ε > 0, there are only a finite ... Each round bet one dollar

16

the law of large numbers

Implications for gambler playing an unfair game:

Each round bet one dollar that pays off $2 with probability 0.49 and 0 with probability 0.51. Expected payoff is 2*0.49 – 1 = -$0.02

Expected loss in one round not so bad.

Law of large numbers says that in $n$ trials average loss will tend to -0.02.

Large number of games: small average loss translates to HUGE accumulated loss with probability close to 1.

Page 17: the law of large numbers & the CLT · Strong Law 㱺 Weak Law (but not vice versa)! Strong law implies that for any ε > 0, there are only a finite ... Each round bet one dollar

17

the law of large numbers

Note: Dn = E[ | Σ1≤i≤n(Xi-μ) | ] grows with n, but Dn/n → 0

Justifies the “frequency” interpretation of probability

Does not justify:

Gambler’s fallacy: “I’m due for a win!”

Many web demos, e.g. ��� http://stat-www.berkeley.edu/~stark/Java/Html/lln.htm

Page 18: the law of large numbers & the CLT · Strong Law 㱺 Weak Law (but not vice versa)! Strong law implies that for any ε > 0, there are only a finite ... Each round bet one dollar

18

normal random variable

X is a normal random variable X ~ N(μ,σ2)

Page 19: the law of large numbers & the CLT · Strong Law 㱺 Weak Law (but not vice versa)! Strong law implies that for any ε > 0, there are only a finite ... Each round bet one dollar

19

the central limit theorem (CLT)

i.i.d. (independent, identically distributed) random vars

X1, X2, X3, …

Xi has μ = E[Xi] < ∞ and σ2 = Var[Xi] < ∞ As n → ∞,

Restated: As n → ∞,

Page 20: the law of large numbers & the CLT · Strong Law 㱺 Weak Law (but not vice versa)! Strong law implies that for any ε > 0, there are only a finite ... Each round bet one dollar

20

Page 21: the law of large numbers & the CLT · Strong Law 㱺 Weak Law (but not vice versa)! Strong law implies that for any ε > 0, there are only a finite ... Each round bet one dollar

21

CLT applies even to even wacky distributions

Page 22: the law of large numbers & the CLT · Strong Law 㱺 Weak Law (but not vice versa)! Strong law implies that for any ε > 0, there are only a finite ... Each round bet one dollar

22

a good fit (but relatively less good in

extreme tails, perhaps)

Page 23: the law of large numbers & the CLT · Strong Law 㱺 Weak Law (but not vice versa)! Strong law implies that for any ε > 0, there are only a finite ... Each round bet one dollar

23

CLT in the real world

CLT is the reason many things appear normally distributed Many quantities = sums of (roughly) independent random vars

Exam scores: sums of individual problems People’s heights: sum of many genetic & environmental factors Measurements: sums of various small instrument errors ...

Page 24: the law of large numbers & the CLT · Strong Law 㱺 Weak Law (but not vice versa)! Strong law implies that for any ε > 0, there are only a finite ... Each round bet one dollar

Where we go next

A little bit of statistics:

•  Maximum likelihood estimation

•  Expectation Maximization

•  Hypothesis Testing

Some applications of probability and statistics in computer science.

24

Page 25: the law of large numbers & the CLT · Strong Law 㱺 Weak Law (but not vice versa)! Strong law implies that for any ε > 0, there are only a finite ... Each round bet one dollar

Machine Learning

Machine Learning: algorithms that use “experience” to improve their performance

Can be applied in situations where it is very challenging (or impossible) to define the rules by hand: e.g.

•  face detection

•  speech recognition

•  stock prediction

25

Page 26: the law of large numbers & the CLT · Strong Law 㱺 Weak Law (but not vice versa)! Strong law implies that for any ε > 0, there are only a finite ... Each round bet one dollar

Machine Learning

Machine Learning: write programs with thousands/millions of undefined constants.

Learn through experience how to set those constants.

Humans do it: why not computers?

Problem: we don’t know how brain works.

Nonetheless, machine learning algorithms are getting better and better and better…..

26

Page 27: the law of large numbers & the CLT · Strong Law 㱺 Weak Law (but not vice versa)! Strong law implies that for any ε > 0, there are only a finite ... Each round bet one dollar

27

Page 28: the law of large numbers & the CLT · Strong Law 㱺 Weak Law (but not vice versa)! Strong law implies that for any ε > 0, there are only a finite ... Each round bet one dollar

28

Page 29: the law of large numbers & the CLT · Strong Law 㱺 Weak Law (but not vice versa)! Strong law implies that for any ε > 0, there are only a finite ... Each round bet one dollar

29

Page 30: the law of large numbers & the CLT · Strong Law 㱺 Weak Law (but not vice versa)! Strong law implies that for any ε > 0, there are only a finite ... Each round bet one dollar

30

Page 31: the law of large numbers & the CLT · Strong Law 㱺 Weak Law (but not vice versa)! Strong law implies that for any ε > 0, there are only a finite ... Each round bet one dollar

31

Page 32: the law of large numbers & the CLT · Strong Law 㱺 Weak Law (but not vice versa)! Strong law implies that for any ε > 0, there are only a finite ... Each round bet one dollar

32

Page 33: the law of large numbers & the CLT · Strong Law 㱺 Weak Law (but not vice versa)! Strong law implies that for any ε > 0, there are only a finite ... Each round bet one dollar

More generally, might use random variables to represent everything about the world

Thus, goal is to estimate f(y|x) which is selected from some carefully chosen “hypothesis space”

Space indexed by parameters which are knobs we turn to create different classifiers.

Learning: the problem of estimating joint probability density functions, tuning the knobs, given samples from the function.

33

Page 34: the law of large numbers & the CLT · Strong Law 㱺 Weak Law (but not vice versa)! Strong law implies that for any ε > 0, there are only a finite ... Each round bet one dollar

Why now?

growing flood of online data

recent progress in algorithms and theoretical foundations

computational power

never-ending industrial applications.

34