408clt

1
Math 408, Actuarial Statistics I A.J. Hildebrand The Central Limit Theorem Random samples, iid random variables Definition: A random sample of size n from a given distribution is a set of n in- dependent r.v.’s X 1 ,X 2 ,...,X n , each having the given distribution, with expectation E(X i )= μ and variance Var(X i )= σ 2 . Such a set of random variables is also called independent, identically distributed (iid). Sample sum: S = n i=1 X i . E(S )= , Var(S )= 2 , Sample mean: X = 1 n n i=1 X i . E( X )= μ, Var( X )= σ 2 /n Normal approximation, Central Limit Theorem The Central Limit Theorem (CLT) says that the mean and the sum of a random sample of a large enough size 1 from an (essentially) arbitrary distribution have approximately normal distribution: Given a random sample X 1 ,...,X n with μ = E(X i ) and σ 2 = Var(X i ), we have: The sample sum S = n i=1 X i is approximately normal N (nμ, nσ 2 ). Equivalently, the standardized version of S , S * = S - σ n , is approximately standard normal. The sample mean X = 1 n n i=1 X i is approximately normal N (μ, σ 2 /n). Equivalently, the standardized version of X , X * = X - μ σ/ n , is approximately standard normal. Sums of independent normal r.v.’s In the case the X i ’s are already normal, the associated sums and random samples are exactly (and not just approximately) normal, with the appropriate parameters: Sums and differences of two normal r.v.’s: If X 1 and X 2 are N (μ 1 2 1 ) and N (μ 2 2 2 ), respectively, then X 1 ± X 2 is normal N (μ 1 ± μ 2 2 1 + σ 2 2 ). (Note the plus sign in the formula for the variance of X 1 + X 2 . Also, note that it is the variances that add up, not the standard deviations.) General linear combinations of normal r.v.’s: If each X i is normal N (μ i 2 i ), then n i=1 c i X i is normal N (μ, σ 2 ), where μ = n i=1 c i μ i and σ 2 = n i=1 c 2 i σ 2 i . 1 That is, a large enough value of n. The larger n is, the better the CLT approximation becomes, but greater than 25 is usually more than enough in practice, and in some cases the CLT works already well for single digit values of n. 1

Transcript of 408clt

Page 1: 408clt

Math 408, Actuarial Statistics I A.J. Hildebrand

The Central Limit Theorem

Random samples, iid random variables

• Definition: A random sample of size n from a given distribution is a set of n in-dependent r.v.’s X1, X2, . . . , Xn, each having the given distribution, with expectationE(Xi) = µ and variance Var(Xi) = σ2. Such a set of random variables is also calledindependent, identically distributed (iid).

• Sample sum: S =∑n

i=1 Xi. E(S) = nµ, Var(S) = nσ2,

• Sample mean: X = 1n

∑ni=1 Xi. E(X) = µ, Var(X) = σ2/n

Normal approximation, Central Limit Theorem

The Central Limit Theorem (CLT) says that the mean and the sum of a random sample ofa large enough size1 from an (essentially) arbitrary distribution have approximately normaldistribution: Given a random sample X1, . . . , Xn with µ = E(Xi) and σ2 = Var(Xi), we have:

• The sample sum S =∑n

i=1 Xi is approximately normal N(nµ, nσ2).

Equivalently, the standardized version of S, S∗ =S − nµ

σ√

n, is approximately standard

normal.

• The sample mean X = 1n

∑ni=1 Xi is approximately normal N(µ, σ2/n).

Equivalently, the standardized version of X, X∗ =

X − µ

σ/√

n, is approximately standard

normal.

Sums of independent normal r.v.’s

In the case the Xi’s are already normal, the associated sums and random samples are exactly(and not just approximately) normal, with the appropriate parameters:

• Sums and differences of two normal r.v.’s: If X1 and X2 are N(µ1, σ21) and

N(µ2, σ22), respectively, then X1 ± X2 is normal N(µ1 ± µ2, σ

21 + σ2

2). (Note the plussign in the formula for the variance of X1 + X2. Also, note that it is the variances thatadd up, not the standard deviations.)

• General linear combinations of normal r.v.’s: If each Xi is normal N(µi, σ2i ), then∑n

i=1 ciXi is normal N(µ, σ2), where µ =∑n

i=1 ciµi and σ2 =∑n

i=1 c2i σ

2i .

1That is, a large enough value of n. The larger n is, the better the CLT approximation becomes, but greaterthan 25 is usually more than enough in practice, and in some cases the CLT works already well for single digitvalues of n.

1