
1

The Central Limit Theorem

Paul Cornwell, March 31, 2011

2

Let X1,…,Xn be independent, identically distributed random variables with mean μ and positive, finite variance σ². The average of these variables will be approximately normally distributed with mean μ and standard deviation σ/√n when n is large.

Statement of Theorem
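As a rough illustration of this statement, here is a minimal R sketch (the Exponential(1) example and all settings are our own choices, not from the slides):

```r
# Averages of skewed i.i.d. draws compared with the N(mu, sigma/sqrt(n)) curve
set.seed(1)
n    <- 30                                          # sample size per average
xbar <- replicate(10000, mean(rexp(n, rate = 1)))   # 10,000 simulated averages of Exponential(1)
hist(xbar, breaks = 50, freq = FALSE,
     main = "Averages of Exponential(1) draws, n = 30")
curve(dnorm(x, mean = 1, sd = 1 / sqrt(n)), add = TRUE, lwd = 2)  # CLT normal approximation
```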

3

How large of a sample size is required for the Central Limit Theorem (CLT) approximation to be good?

What is a ‘good’ approximation?

Questions

4

Permits analysis of random variables even when underlying distribution is unknown

Estimating parameters

Hypothesis Testing

Polling

Importance

5

Performing a hypothesis test to determine whether a set of data came from a normal distribution

Considerations
◦ Power: the probability that a test will reject the null hypothesis when it is false
◦ Ease of use

Testing for Normality

6

Problems
◦ No single test is best in every situation (there is no universally most powerful test)
◦ Some tests cannot handle the composite hypothesis of normality (i.e., a nonstandard normal)
◦ The reliability of tests is sensitive to sample size; with enough data, the null hypothesis will eventually be rejected

Testing for Normality

7

Symmetric

Unimodal

Bell-shaped

Continuous

Characteristics of Distribution

8

Skewness: measures the asymmetry of a distribution.
◦ Defined as the third standardized moment
◦ Skewness of the normal distribution is 0

Closeness to Normal

γ₁ = E[((X − μ)/σ)³], estimated from a sample by

Σᵢ₌₁ⁿ (Xᵢ − X̄)³ / ((n − 1)s³)

9

Kurtosis: measures peakedness or heaviness of the tails.
◦ Defined as the fourth standardized moment
◦ Kurtosis of the normal distribution is 3

Closeness to Normal

γ₂ = E[((X − μ)/σ)⁴], estimated from a sample by

Σᵢ₌₁ⁿ (Xᵢ − X̄)⁴ / ((n − 1)s⁴)
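For reference, both sample statistics can be computed directly in R; this sketch follows the formulas as reconstructed above (the function names are ours):

```r
# Sample skewness and kurtosis: k-th standardized central moments
# with an (n - 1) * s^k denominator, as on the two slides above.
sample_skewness <- function(x) {
  n <- length(x)
  sum((x - mean(x))^3) / ((n - 1) * sd(x)^3)
}
sample_kurtosis <- function(x) {
  n <- length(x)
  sum((x - mean(x))^4) / ((n - 1) * sd(x)^4)   # about 3 for normal data
}
set.seed(1)
z <- rnorm(1e5)
c(skewness = sample_skewness(z), kurtosis = sample_kurtosis(z))  # roughly 0 and 3
```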

10

Cumulative distribution function:

Binomial Distribution

F(x; n, p) = Σ_{i=0}^{⌊x⌋} C(n, i) p^i (1 − p)^(n − i)

E[X] = np,  Var[X] = np(1 − p)
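As a sanity check on the formula above, the manual sum can be compared with R's built-in pbinom() (the particular n, p, x values here are arbitrary):

```r
# Manual binomial CDF vs. the built-in function
n <- 20; p <- 0.2; x <- 6
manual  <- sum(choose(n, 0:x) * p^(0:x) * (1 - p)^(n - 0:x))
builtin <- pbinom(x, size = n, prob = p)
all.equal(manual, builtin)   # TRUE
```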

11

Binomial Distribution*

| Parameters | Kurtosis | Skewness | % outside 1.96*sd | K-S distance | Mean | Std Dev |
|---|---|---|---|---|---|---|
| n = 20, p = .2 | -.0014 (.25) | .3325 (1.5) | .0434 | .128 | 3.9999 | 1.786 |
| n = 25, p = .2 | .002 | .3013 | .0743 | .116 | 5.0007 | 2.002 |
| n = 30, p = .2 | .0235 | .2786 | .0363 | .106 | 5.997 | 2.188 |
| n = 50, p = .2 | .0106 | .209 | .0496 | .083 | 10.001 | 2.832 |
| n = 100, p = .2 | .005 | .149 | .05988 | .0574 | 19.997 | 4.0055 |

*from R
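A sketch of how a row like the first one (n = 20, p = .2) could be simulated; this is our code, not the original script, so the exact numbers will differ slightly from the table:

```r
set.seed(1)
reps <- 1e5
sums <- rbinom(reps, size = 20, prob = 0.2)      # each draw is a Binomial(20, .2), i.e. a sum of 20 Bernoulli(.2)
m <- mean(sums); s <- sd(sums)
skew   <- sum((sums - m)^3) / ((reps - 1) * s^3)
exkurt <- sum((sums - m)^4) / ((reps - 1) * s^4) - 3
tails  <- mean(abs(sums - m) > 1.96 * s)         # proportion outside mean +/- 1.96 sd
ksdist <- unname(ks.test(sums, "pnorm", mean = m, sd = s)$statistic)  # ties warning expected for discrete data
round(c(kurtosis = exkurt, skewness = skew, outside = tails,
        KS = ksdist, mean = m, sd = s), 4)
```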

12

Cumulative distribution function:

Uniform Distribution

F(x; a, b) = (x − a)/(b − a),  a ≤ x ≤ b

E[X] = (a + b)/2,  Var[X] = (b − a)²/12

13

Uniform Distribution*

| Parameters | Kurtosis | Skewness | % outside 1.96*sd | K-S distance | Mean | Std Dev |
|---|---|---|---|---|---|---|
| n = 5, (a,b) = (0,1) | -.236 (-1.2) | .004 (0) | .0477 | .0061 | .4998 | .1289 (.129) |
| n = 5, (a,b) = (0,50) | -.234 | 0 | .04785 | .0058 | 24.99 | 6.468 (6.455) |
| n = 5, (a,b) = (0,.1) | -.238 | -.0008 | .048 | .0060 | .0500 | .0129 (.0129) |
| n = 3, (a,b) = (0,50) | -.397 | -.001 | .0468 | .01 | 24.99 | 8.326 (8.333) |

*from R

14

Cumulative distribution function:

Exponential Distribution

F(x; λ) = 1 − e^(−λx),  x ≥ 0

E[X] = 1/λ,  Var[X] = 1/λ²

15

Exponential Distribution*

| Parameters | Kurtosis | Skewness | % outside 1.96*sd | K-S distance | Mean | Std Dev |
|---|---|---|---|---|---|---|
| n = 5, λ = 1 | 1.239 (6) | .904 (2) | .0434 | .0598 | .9995 | .4473 (.4472) |
| n = 10 | .597 | .630 | .045 | .042 | 1.0005 | .316 (.316) |
| n = 15 | .396 | .515 | .0464 | .034 | .9997 | .258 (.2581) |

*from R
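The first row can be cross-checked against a standard moment-scaling fact: for an average of n i.i.d. draws, skewness scales as γ₁/√n and excess kurtosis as γ₂/n. A small R sketch (our addition):

```r
# For Exponential(1): gamma1 = 2, gamma2 = 6 (the parenthesized values above)
n <- 5
c(skewness = 2 / sqrt(n), excess_kurtosis = 6 / n)  # ~0.894 and 1.2, near the simulated .904 and 1.239
```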

16

Find n values for more distributions

Refine criteria for quality of approximation

Explore distributions with no mean

Classify distributions in order to have more general guidelines for minimum sample size

For Next Time…

17

The Central Limit Theorem (Pt 2)

Paul Cornwell, May 2, 2011

18

Central Limit Theorem: averages of i.i.d. variables become approximately normally distributed as the sample size increases

Rate of convergence depends on the underlying distribution

What sample size is needed to produce a good approximation from the CLT?

Review

19

Real-life applications of the Central Limit Theorem

What does kurtosis tell us about a distribution?

What is the rationale for requiring np ≥ 5?

What about distributions with no mean?

Questions

20

The distribution of the total distance covered in a random walk tends toward the normal

Hypothesis testing

Confidence intervals (polling)

Signal processing, noise cancellation

Applications of Theorem

21

Measures the “peakedness” of a distribution

A higher peak goes together with fatter tails

Kurtosis

γ₂ = E[((X − μ)/σ)⁴] − 3

22

The traditional rule of thumb for approximate normality of the binomial is np > 5 (or 10)

Skewness of binomial distribution increases as p moves away from .5

A larger n is required for convergence when the distribution is skewed

Why np?
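A quick way to see this in R (the binomial skewness formula is standard; the code and parameter choices are ours):

```r
# Skewness of Binomial(n, p) is (1 - 2p) / sqrt(n * p * (1 - p)):
# for fixed n it grows as p moves away from .5, so skewed cases need larger n.
binom_skew <- function(n, p) (1 - 2 * p) / sqrt(n * p * (1 - p))
round(binom_skew(30, c(0.5, 0.2, 0.1)), 3)   # 0.000 0.274 0.487
```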

23

Has no moments (including mean, variance)

The distribution of averages looks like the original distribution

CLT does not apply

Cauchy Distribution

f(x) = 1 / (π(1 + x²))
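A small R sketch (our own) of the point above: averaging does not help here, because the mean of n i.i.d. standard Cauchy variables is itself standard Cauchy.

```r
# The spread of Cauchy averages does not shrink as n grows
set.seed(1)
q <- c(.1, .25, .5, .75, .9)
avg10   <- replicate(1e4, mean(rcauchy(10)))
avg1000 <- replicate(1e4, mean(rcauchy(1000)))
round(rbind(n_10 = quantile(avg10, q), n_1000 = quantile(avg1000, q)), 2)
# roughly the same quantiles for both n, unlike distributions with finite variance
```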

24

α = β = 1/3

Distribution is symmetric and bimodal

Convergence of averages to the normal distribution is fast

Beta Distribution

25

Heavier-tailed, bell-shaped curve

Approaches normal distribution as degrees of freedom increase

Student’s t Distribution
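One way to see the convergence in R (our snippet; the 2.5 and 4.1 degrees of freedom echo the tables on the next slides):

```r
# Upper 2.5% quantiles of t distributions approach the normal quantile as df grows
round(c(normal = qnorm(0.975),
        t_2.5  = qt(0.975, df = 2.5),
        t_4.1  = qt(0.975, df = 4.1),
        t_30   = qt(0.975, df = 30)), 3)
```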

26

4 statistics: K-S distance, tail probabilities, skewness and kurtosis

Different thresholds for “adequate” and “superior” approximations

Both are fairly conservative

Criteria

27

Adequate Approximation

Entries are the smallest sample size n at which each criterion is met; "max" is the largest of the four.

| Distribution | ∣Kurtosis∣ < .5 | ∣Skewness∣ < .25 | Tail prob. .04 < x < .06 | K-S distance < .05 | max |
|---|---|---|---|---|---|
| Uniform | 3 | 1 | 2 | 2 | 3 |
| Beta (α = β = 1/3) | 4 | 1 | 3 | 3 | 4 |
| Exponential | 12 | 64 | 5 | 8 | 64 |
| Binomial (p = .1) | 11 | 114 | 14 | 332 | 332 |
| Binomial (p = .5) | 4 | 1 | 12 | 68 | 68 |
| Student’s t with 2.5 df | NA | NA | 13 | 20 | 20 |
| Student’s t with 4.1 df | 120 | 1 | 1 | 2 | 120 |

28

Stronger Approximation

| Distribution | ∣Kurtosis∣ < .3 | ∣Skewness∣ < .15 | Tail prob. .04 < x < .06 | K-S distance < .02 | max |
|---|---|---|---|---|---|
| Uniform | 4 | 1 | 2 | 2 | 4 |
| Beta (α = β = 1/3) | 6 | 1 | 3 | 4 | 6 |
| Exponential | 20 | 178 | 5 | 45 | 178 |
| Binomial (p = .1) | 18 | 317 | 14 | 1850 | 1850 |
| Binomial (p = .5) | 7 | 1 | 12 | 390 | 390 |
| Student’s t with 2.5 df | NA | NA | 13 | 320 | 320 |
| Student’s t with 4.1 df | 200 | 1 | 1 | 5 | 200 |

29

Skewness is difficult to shake

Tail probabilities are fairly accurate for small sample sizes

The traditional sample-size recommendation is too small for many common distributions

Conclusions