Page 1

The Central Limit Theorem

Paul Cornwell, March 31, 2011

Page 2

Let X1,…,Xn be independent, identically distributed random variables with mean μ and finite, positive variance σ². When n is large, the average of these variables is approximately normally distributed with mean μ and standard deviation σ/√n.

Statement of Theorem
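The statement can be checked numerically. The sketch below is an illustration rather than anything from the slides: it assumes an Exponential(1) parent distribution (so μ = σ = 1), a sample size of 30, and 20,000 repetitions, all chosen arbitrarily, and compares the spread of the simulated averages with σ/√n.

```python
import math
import random
import statistics


def simulate_means(n, reps, seed=0):
    """Return `reps` sample means, each computed from n Exponential(1) draws."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]


n = 30  # hypothetical sample size, chosen only for illustration
means = simulate_means(n, reps=20000)

observed_mean = statistics.fmean(means)  # should be close to mu = 1
observed_sd = statistics.stdev(means)    # should be close to sigma / sqrt(n)
predicted_sd = 1.0 / math.sqrt(n)        # sigma / sqrt(n) with sigma = 1

print(observed_mean, observed_sd, predicted_sd)
```

The agreement of the mean and standard deviation holds for any n; what improves with n is how closely the shape of the distribution of averages matches the normal curve.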

Page 3

How large a sample size is required for the Central Limit Theorem (CLT) approximation to be good?

What is a ‘good’ approximation?

Questions

Page 4

Permits analysis of random variables even when underlying distribution is unknown

Estimating parameters

Hypothesis Testing

Polling

Importance

Page 5

Performing a hypothesis test to determine whether a set of data came from a normal distribution

Considerations

◦ Power: probability that a test will reject the null hypothesis when it is false

◦ Ease of use

Testing for Normality

Page 6

Problems

◦ No test is desirable in every situation (no universally most powerful test)

◦ Some cannot test the composite hypothesis of normality (i.e. a normal distribution with unspecified mean and variance)

◦ The reliability of tests is sensitive to sample size; with enough data, even trivial departures from normality lead to rejection of the null hypothesis

Testing for Normality

Page 7

Symmetric

Unimodal

Bell-shaped

Continuous

Characteristics of Distribution

Page 8

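Skewness: Measures the asymmetry of a distribution.

◦ Defined as the third standardized moment

◦ Skew of normal distribution is 0

$$\gamma_1 = E\left[\left(\frac{X-\mu}{\sigma}\right)^{3}\right]$$

Sample estimate:

$$\frac{\sum_{i=1}^{n}\left(X_i-\bar{X}\right)^{3}}{(n-1)\,s^{3}}$$

Closeness to Normal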

Page 9

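Kurtosis: Measures peakedness or heaviness of the tails.

◦ Defined as the fourth standardized moment

◦ Kurtosis of normal distribution is 3

$$\beta_2 = E\left[\left(\frac{X-\mu}{\sigma}\right)^{4}\right]$$

Sample estimate:

$$\frac{\sum_{i=1}^{n}\left(X_i-\bar{X}\right)^{4}}{(n-1)\,s^{4}}$$

Closeness to Normal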
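The sample versions of skewness and kurtosis divide the summed third or fourth powers of the deviations from the sample mean by (n − 1) times the matching power of the sample standard deviation s. A minimal sketch of that calculation follows; the function names are hypothetical, and the kurtosis returned here is not shifted by 3, so a normal sample should give a value near 3.

```python
import statistics


def sample_skewness(xs):
    """Sum of cubed deviations divided by (n - 1) * s**3."""
    n = len(xs)
    mean = statistics.fmean(xs)
    s = statistics.stdev(xs)  # sample standard deviation (n - 1 denominator)
    return sum((x - mean) ** 3 for x in xs) / ((n - 1) * s ** 3)


def sample_kurtosis(xs):
    """Sum of fourth-power deviations divided by (n - 1) * s**4."""
    n = len(xs)
    mean = statistics.fmean(xs)
    s = statistics.stdev(xs)
    return sum((x - mean) ** 4 for x in xs) / ((n - 1) * s ** 4)


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 10]
print(sample_skewness(symmetric), sample_kurtosis(symmetric))
print(sample_skewness(right_skewed))
```

Symmetric data give a skewness of zero, while a long right tail gives a positive value.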

Page 10

Cumulative distribution function:

$$F(x; n, p) = \sum_{i=0}^{x} \binom{n}{i}\, p^{i} (1-p)^{n-i}$$

$$E[X] = np$$

$$\mathrm{Var}[X] = np(1-p)$$

Binomial Distribution

Page 11

Binomial Distribution*

| Parameters | Kurtosis | Skewness | % outside 1.96*sd | K-S distance | Mean | Std Dev |
|---|---|---|---|---|---|---|
| n = 20, p = .2 | -.0014 (.25) | .3325 (1.5) | .0434 | .128 | 3.9999 | 1.786 |
| n = 25, p = .2 | .002 | .3013 | .0743 | .116 | 5.0007 | 2.002 |
| n = 30, p = .2 | .0235 | .2786 | .0363 | .106 | 5.997 | 2.188 |
| n = 50, p = .2 | .0106 | .209 | .0496 | .083 | 10.001 | 2.832 |
| n = 100, p = .2 | .005 | .149 | .05988 | .0574 | 19.997 | 4.0055 |

*from R
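The K-S distance is the largest vertical gap between two cumulative distribution functions. The slides do not say how the R values were obtained, so the following is only a sketch under the assumption that the binomial CDF is compared with a normal CDF of the same mean and standard deviation, without a continuity correction. Because the binomial CDF is a step function, the gap is checked on both sides of every jump.

```python
import math
from statistics import NormalDist


def binomial_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p), summed term by term."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(x + 1))


def ks_distance_binomial_normal(n, p):
    """Largest gap between the Binomial(n, p) CDF and the matching normal CDF."""
    normal = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))
    worst = 0.0
    below = 0.0  # value of the binomial CDF just to the left of x
    for x in range(n + 1):
        at = binomial_cdf(x, n, p)
        target = normal.cdf(x)
        worst = max(worst, abs(at - target), abs(below - target))
        below = at
    return worst


for n in (20, 25, 30, 50, 100):
    print(n, round(ks_distance_binomial_normal(n, 0.2), 4))
```

Exact values computed this way need not match simulated figures digit for digit, but they should show the same downward trend as n grows.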

Page 12

Cumulative distribution function:

$$F(x; a, b) = \frac{x-a}{b-a}, \quad a \le x \le b$$

$$E[X] = \frac{a+b}{2}$$

$$\mathrm{Var}[X] = \frac{(b-a)^{2}}{12}$$

Uniform Distribution

Page 13

Uniform Distribution*

| Parameters | Kurtosis | Skewness | % outside 1.96*sd | K-S distance | Mean | Std Dev |
|---|---|---|---|---|---|---|
| n = 5, (a,b) = (0,1) | -.236 (-1.2) | .004 (0) | .0477 | .0061 | .4998 | .1289 (.129) |
| n = 5, (a,b) = (0,50) | -.234 | 0 | .04785 | .0058 | 24.99 | 6.468 (6.455) |
| n = 5, (a,b) = (0, .1) | -.238 | -.0008 | .048 | .0060 | .0500 | .0129 (.0129) |
| n = 3, (a,b) = (0,50) | -.397 | -.001 | .0468 | .01 | 24.99 | 8.326 (8.333) |

*from R

Page 14

Cumulative distribution function:

$$F(x; \lambda) = 1 - e^{-\lambda x}, \quad x \ge 0$$

$$E[X] = \frac{1}{\lambda}$$

$$\mathrm{Var}[X] = \frac{1}{\lambda^{2}}$$

Exponential Distribution

Page 15

Exponential Distribution*

| Parameters | Kurtosis | Skewness | % outside 1.96*sd | K-S distance | Mean | Std Dev |
|---|---|---|---|---|---|---|
| n = 5, λ = 1 | 1.239 (6) | .904 (2) | .0434 | .0598 | .9995 | .4473 (.4472) |
| n = 10 | .597 | .630 | .045 | .042 | 1.0005 | .316 (.316) |
| n = 15 | .396 | .515 | .0464 | .034 | .9997 | .258 (.2581) |

*from R
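The kurtosis and skewness columns shrink in a predictable way. For an average of n independent draws, skewness equals the parent's skewness divided by √n, and excess kurtosis equals the parent's excess kurtosis divided by n. The sketch below applies that scaling to the exponential values shown in parentheses (6 and 2); the function names are hypothetical.

```python
import math


def skewness_of_mean(parent_skewness, n):
    """Skewness of the average of n i.i.d. draws."""
    return parent_skewness / math.sqrt(n)


def excess_kurtosis_of_mean(parent_excess_kurtosis, n):
    """Excess kurtosis of the average of n i.i.d. draws."""
    return parent_excess_kurtosis / n


for n in (5, 10, 15):
    print(
        n,
        round(excess_kurtosis_of_mean(6, n), 3),
        round(skewness_of_mean(2, n), 3),
    )
```

For n = 5 this gives an excess kurtosis of 1.2 and a skewness of about .894, close to the simulated 1.239 and .904 in the table.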

Page 16

Find n values for more distributions

Refine criteria for quality of approximation

Explore distributions with no mean

Classify distributions in order to have more general guidelines for minimum sample size

For Next Time…

Page 17

The Central Limit Theorem (Pt 2)

Paul Cornwell, May 2, 2011

Page 18

Central Limit Theorem: Averages of i.i.d. variables become approximately normally distributed as sample size increases

Rate of convergence depends on the underlying distribution

What sample size is needed to produce a good approximation from the CLT?

Review

Page 19

Real-life applications of the Central Limit Theorem

What does kurtosis tell us about a distribution?

What is the rationale for requiring np ≥ 5?

What about distributions with no mean?

Questions

Page 20

The probability distribution of the total distance covered in a random walk tends toward normal

Hypothesis testing

Confidence intervals (polling)

Signal processing, noise cancellation

Applications of Theorem

Page 21

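Measures the “peakedness” of a distribution

Higher peaks mean fatter tails

Excess kurtosis (0 for a normal distribution):

$$\gamma_2 = E\left[\left(\frac{x-\mu}{\sigma}\right)^{4}\right] - 3$$

Kurtosis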

Page 22

Traditional rule of thumb for the normal approximation to the binomial is np > 5 (or 10)

Skewness of binomial distribution increases as p moves away from .5

Larger n is required for convergence for skewed distributions

Why np?
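The link between p and the required n can be made concrete with the skewness of a Binomial(n, p) variable, (1 − 2p)/√(np(1 − p)). The sketch below searches for the smallest n whose skewness is within a given threshold; the function names and the "at most the threshold" convention are assumptions made for illustration.

```python
import math


def binomial_skewness(n, p):
    """Skewness of a Binomial(n, p) random variable."""
    return (1 - 2 * p) / math.sqrt(n * p * (1 - p))


def min_n_for_skewness(p, threshold):
    """Smallest n for which |skewness| is at most `threshold`."""
    n = 1
    while abs(binomial_skewness(n, p)) > threshold:
        n += 1
    return n


print(min_n_for_skewness(0.1, 0.25))
print(min_n_for_skewness(0.1, 0.15))
print(min_n_for_skewness(0.5, 0.25))
```

With p = .1 the search stops at n = 114 for the .25 threshold and n = 317 for the .15 threshold, the same values listed for Binomial (p=.1) skewness in the Adequate and Stronger Approximation tables. By comparison, np first exceeds 5 at n = 51, where the skewness is still about .37.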

Page 23

Has no moments (including mean, variance)

The distribution of averages is the same as the distribution of a single observation

CLT does not apply

$$f(x) = \frac{1}{\pi\left(1+x^{2}\right)}$$

Cauchy Distribution
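A short simulation illustrates why averaging does not help here. The sketch below is an assumption-laden illustration, not the method used in the slides: it draws standard Cauchy values through the inverse CDF, tan(π(U − 1/2)), and measures the interquartile range of the averages, which for a standard Cauchy variable is 2.

```python
import math
import random
import statistics


def cauchy_draw(rng):
    """Standard Cauchy draw via the inverse CDF: tan(pi * (U - 1/2))."""
    return math.tan(math.pi * (rng.random() - 0.5))


def iqr_of_means(n, reps, seed=0):
    """Interquartile range of `reps` averages of n standard Cauchy draws."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(cauchy_draw(rng) for _ in range(n))
        for _ in range(reps)
    ]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1


iqr_single = iqr_of_means(1, reps=4000)
iqr_fifty = iqr_of_means(50, reps=4000)
print(iqr_single, iqr_fifty)
```

Both interquartile ranges should come out near 2: the averages of 50 draws are just as spread out as single draws, whereas for a distribution covered by the CLT the spread would shrink by a factor of √50.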

Page 24

α = β = 1/3

Distribution is symmetric and bimodal

Averages converge to normal quickly

Beta Distribution

Page 25

Heavier-tailed, bell-shaped curve

Approaches normal distribution as degrees of freedom increase

Student’s t Distribution

Page 26

4 statistics: K-S distance, tail probabilities, skewness and kurtosis

Different thresholds for “adequate” and “superior” approximations

Both are fairly conservative

Criteria
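Two of the four statistics, the tail probability and the K-S distance, can be estimated from simulated averages. The sketch below is one possible way to do so, not necessarily how the slide values were produced: it assumes averages of 5 Uniform(0, 1) draws, 20,000 repetitions, and a reference normal curve fitted to the simulated averages' own mean and standard deviation.

```python
import random
import statistics
from statistics import NormalDist


def tail_probability(values):
    """Fraction of values more than 1.96 standard deviations from their mean."""
    m = statistics.fmean(values)
    s = statistics.stdev(values)
    return sum(abs(v - m) > 1.96 * s for v in values) / len(values)


def ks_distance(values):
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    normal = NormalDist(statistics.fmean(values), statistics.stdev(values))
    ordered = sorted(values)
    count = len(ordered)
    worst = 0.0
    for i, v in enumerate(ordered):
        cdf = normal.cdf(v)
        # The empirical CDF jumps from i/count to (i + 1)/count at v.
        worst = max(worst, (i + 1) / count - cdf, cdf - i / count)
    return worst


rng = random.Random(0)
means = [
    statistics.fmean(rng.random() for _ in range(5))  # average of 5 Uniform(0, 1) draws
    for _ in range(20000)
]
tail = tail_probability(means)
distance = ks_distance(means)
print(tail, distance)
```

For a normal distribution the tail probability is .05, so a value between .04 and .06 indicates that the 1.96 standard deviation rule is working about as intended.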

Page 27

Adequate Approximation

| Distribution | ∣Kurtosis∣ < .5 | ∣Skewness∣ < .25 | Tail Prob. .04 < x < .06 | K-S Distance < .05 | max |
|---|---|---|---|---|---|
| Uniform | 3 | 1 | 2 | 2 | 3 |
| Beta (α=β=1/3) | 4 | 1 | 3 | 3 | 4 |
| Exponential | 12 | 64 | 5 | 8 | 64 |
| Binomial (p=.1) | 11 | 114 | 14 | 332 | 332 |
| Binomial (p=.5) | 4 | 1 | 12 | 68 | 68 |
| Student’s t with 2.5 df | NA | NA | 13 | 20 | 20 |
| Student’s t with 4.1 df | 120 | 1 | 1 | 2 | 120 |

Page 28

Stronger Approximation

| Distribution | ∣Kurtosis∣ < .3 | ∣Skewness∣ < .15 | Tail Prob. .04 < x < .06 | K-S Distance < .02 | max |
|---|---|---|---|---|---|
| Uniform | 4 | 1 | 2 | 2 | 4 |
| Beta (α=β=1/3) | 6 | 1 | 3 | 4 | 6 |
| Exponential | 20 | 178 | 5 | 45 | 178 |
| Binomial (p=.1) | 18 | 317 | 14 | 1850 | 1850 |
| Binomial (p=.5) | 7 | 1 | 12 | 390 | 390 |
| Student’s t with 2.5 df | NA | NA | 13 | 320 | 320 |
| Student’s t with 4.1 df | 200 | 1 | 1 | 5 | 200 |

Page 29

Skewness is difficult to shake

Tail probabilities are fairly accurate for small sample sizes

The traditional sample-size recommendation is too small for many common distributions

Conclusions