Confidence Intervals - Kasetsart University

42
Confidence Intervals รศ.ดร. อนันต์ ผลเพิ่ม Assoc.Prof.Anan Phonphoem, Ph.D. [email protected] Intelligent Wireless Network Group (IWING Lab) http://iwing.cpe.ku.ac.th Computer Engineering Department Kasetsart University, Bangkok,Thailand 1

Transcript of Confidence Intervals - Kasetsart University

Page 1: Confidence Intervals - Kasetsart University

Confidence Intervals รศ.ดร. อนันต์ ผลเพิม่

Assoc.Prof. Anan Phonphoem, Ph.D. [email protected]

Intelligent Wireless Network Group (IWING Lab)

http://iwing.cpe.ku.ac.th

Computer Engineering Department

Kasetsart University, Bangkok, Thailand

1

Page 2: Confidence Intervals - Kasetsart University

Confidence Interval

A sample of independent observation

x1,x2,…xn from unknown distribution

Want to find population mean μ

How to know that μ plausible (believable)

◦ Point estimate μ and its estimated s.e.

Confidence Interval

◦ Interval that covers parameter (e.g. μ) with range

of confidence (confidence level)

◦ Ex. 95% of confidence that μ will be in this range

2

Page 3: Confidence Intervals - Kasetsart University

Chi-Square (2) Distribution

If X is normal distribution,

◦ is Chi-Square with degree of freedom

(d.f) equals to 1

3

)1,0(NX

2XY

For independent X1, X2,…,Xn with

is Chi-Square with d.f = n

n

i

iXY

2

)1,0(NX

For independent X1, X2,…,Xn with

is Chi-Square with d.f = n

2

1

n

i

ix

Y

),(2

NX

Page 4: Confidence Intervals - Kasetsart University

Degree of Freedom (d.f)

# of data that is not depended on others

Example:

◦ Select 3 random numbers

d.f = 3

◦ Select 3 random numbers that sum = 10

can select any 2 numbers, the last one is depended on condition

E.g. Select 2 and 5, the last must be 3

d.f =2

◦ Select 3 random numbers that sum of square(x) = 54

can select any 1 number, the last two are depended on condition

E.g. Select 7, the rest must be 1 and 2

d.f = 1

4

Page 5: Confidence Intervals - Kasetsart University

t-Distribution

5

If X is normal distribution, and

are independent, variable t )1,0(NX

YandX

2Y

f

Xt

2

is t-Distribution with d.f = f

For n random from , we can calculate and

And we know the distribution ),(

2NX x

2s

)1,0(N

n

X

2

)1(2

2)1(

n

sn

Therefore, is t-Distribution with d.f = (n-1)

n

s

X

n

sn

n

X

t

2

2

)1(

)1(

Page 6: Confidence Intervals - Kasetsart University

Confidence Interval of μ, known σ

6

n samples are random with unknown μ, known σ (not practical for study)

)1,0(NX

We can find and use it to find Confidence Interval of μ

x

),(2

NX From

nNX

2

,

, then

and )1,0(N

n

XZ

Page 7: Confidence Intervals - Kasetsart University

Confidence Interval of μ, known σ

7

From normal distribution, suppose we want to find μ with the confidence of 95% (0.95)

0.95 0.025 0.025

z is between ± 1.96

-1.96 +1.96

Therefore,

P(-1.96 < Z < 1.96) = 0.95

Page 8: Confidence Intervals - Kasetsart University

Confidence Interval of μ, known σ

8

95.096.196.1

n

XP

n

X

n

n

X

96.196.196.196.1

n

X

n

X

96.196.1

n

X

n

X

96.196.1

95.096.196.1

n

X

n

XP

Therefore,

Page 9: Confidence Intervals - Kasetsart University

Confidence Interval of μ, known σ

9

Confidence Level 95% of μ is n

X

96.1

( ) n

X

96.1n

X

96.1 X

Confidence Level 95%

Page 10: Confidence Intervals - Kasetsart University

Adjust the Interval

We can select any Confidence Interval

high confidence interval

wide confidence interval length

(not a good estimation)

To decrease confidence interval length

◦ Increase number of samples high cost

◦ Decrease the confidence level

Popular Confidence Interval

◦ 90%, 95%, and 99%

For non-normal distribution

◦ Central limit theorem

◦ n > 30 is OK for estimation 10

Page 11: Confidence Intervals - Kasetsart University

Example 1

A cylinder with normal distribution

Unknown μ, and known σ = 2.009 cm.

Randomly select 12 sample cylinders

11

Calculate .91.12 cmx

Find the confidence interval of population mean of

cylinder for this factory with 95% confidence level

Solution

Let μ = population mean

confidence interval = n

x

96.1

12

009.296.191.12

05.1477.11

14.191.12

to

Page 12: Confidence Intervals - Kasetsart University

More easy symbol

12

pz represents 100p percentile of normal

475.0z represents 95 percentile of normal

495.0z represents 99 percentile of normal

Let α is the different between 1.00 and required value

(e.g. require 95% α = 0.05)

n

zX

2

1Confidence interval (1- α ) 100% of μ =

Page 13: Confidence Intervals - Kasetsart University

Example 1

13

For more confidence level = 99%

α = 0.01 α/2 = 0.005 58.2495.0

z

confidence interval = n

x

58.2

12

009.258.291.12

41.1441.11

4964.191.12

to

n

zx

495.0

Which is wider than 95% confidence level )05.1477.11(14.191.12 to

Page 14: Confidence Intervals - Kasetsart University

Example 2

Study the salary of employee

σ2= 10,000 Baht

Randomly ask 100 employee

14

Calculate Bahtx 500,2

Find the confidence interval of population mean of salary for this company with 95% confidence level

Solution σ2= 10,000 σ = 100, , n = 100 500,2x

From central limit theorem, 95% confidence level of μ is

n

zx

495.0

n

x

96.1 60.192500

100

10096.12500

60.519,240.480,2 to

Page 15: Confidence Intervals - Kasetsart University

Example 3: National Discount, Inc.

National Discount has 260 retail outlets throughout

the United States. National evaluates each

potential location for a new retail outlet in part on

the mean annual income of the individuals in the

marketing area of the new location.

Sampling can be used to develop an interval

estimate of the mean annual income for individuals

in a potential marketing area for National Discount.

A sample of size n = 36 was taken.

◦ The sample mean, , is $21,100

◦ The sample standard deviation, s, is $4,500

We will use .95 as the confidence coefficient in our

interval estimate.

x

Page 16: Confidence Intervals - Kasetsart University

Precision Statement

There is a .95 probability that the value of a

sample mean for National Discount will

provide a sampling error of $1,470 or less.

Determined as follows:

◦ 95% of the sample means that can be observed

are within 1.96 of the population mean .

◦ If , then

Example 3: National Discount, Inc.

75036

500,4

n

sx

147096.1 x

Page 17: Confidence Intervals - Kasetsart University

Example 3: National Discount, Inc.

Interval Estimate of the Population Mean:

Unknown

Interval Estimate of is:

$21,100 + $1,470

or $19,630 to $22,570

We are 95% confident that the interval

contains the population mean.

Page 18: Confidence Intervals - Kasetsart University

Many Intervals

For a random set

◦ Can create many intervals (a,b)

◦ Depends on confidence level

For 100 confidence intervals finding

◦ There are (1-α)*100 times from 100 times that

the confidence intervals will cover the parameter (e.g. μ)

18

Page 19: Confidence Intervals - Kasetsart University

Example 3

From a normal distribution with μ = 10, σ = 9

Generate data 16 sets (8 sampling per set)

19

For each set

create confidence interval for 95% level of x

Page 20: Confidence Intervals - Kasetsart University

Example 3

20

n

96.1

n

96.1

5 6 7 8 9 10 11 12 13 14 15

Set 1: 295.10x

With 95% confidence level (8.216,12.374) cover μ = 10

Set 16: 703.12x

With 95% confidence level (10.624,14.781) not cover μ = 10

Therefore, for 16 sets

not cover: 1/16 = 0.0625 = 6.25%

cover: 15/16 = 0.9375 = 93.75%

Page 21: Confidence Intervals - Kasetsart University

Interval Estimation of a Population Mean:

Small-Sample Case (n < 30)

Population is Not Normally Distributed

◦ The only option is to increase the sample size to

◦ n > 30 and use the large-sample interval-estimation

◦ procedures.

Population is Normally Distributed and is Known

◦ The large-sample interval-estimation procedure can

◦ be used.

Population is Normally Distributed and is Unknown

◦ The appropriate interval estimate is based on a

◦ probability distribution known as the t distribution.

Page 22: Confidence Intervals - Kasetsart University

t Distribution

The t distribution is a family of similar probability

distributions.

A specific t distribution depends on a parameter

known as the degrees of freedom.

As the number of degrees of freedom increases,

the difference between the t distribution and the

standard normal probability distribution becomes

smaller and smaller.

A t distribution with more degrees of freedom has

less dispersion.

The mean of the t distribution is zero.

Page 23: Confidence Intervals - Kasetsart University

Interval Estimate

where

1 -α = the confidence coefficient

t α /2 = the t value providing an area of α /2

in the upper tail of a t distribution

with n - 1 degrees of freedom

s = the sample standard deviation

Interval Estimation of a Population Mean:

Small-Sample Case (n < 30) with Unknown

n

stx

2/

Page 24: Confidence Intervals - Kasetsart University

Example 4: Apartment Rents

Interval Estimation of a Population Mean:

Small-Sample Case (n < 30) with Unknown

A reporter for a student newspaper is writing an article on the

cost of off-campus housing.

A sample of 10 one-bedroom units within a half-mile of

campus resulted in a sample mean of $550 per month and a

sample standard deviation of $60.

Let us provide a 95% confidence interval estimate of the

mean rent per month for the population of one-bedroom units

within a half-mile of campus.

Assume this population to be normally distributed.

Page 25: Confidence Intervals - Kasetsart University

Example 4: Apartment Rents

t Value

At 95% confidence, 1 - α = .95, α = .05,

and α /2 = .025.

t.025 is based on n - 1 = 10 - 1 = 9 degrees of

freedom.

In the t distribution table we see that t.025 = 2.262.

Degrees

of

Freedom

Area in Upper Tail

.10 .05 .025 .01 .005

. . . . . .

7 1.415 1.895 2.365 2.998 3.499

8 1.397 1.860 2.306 2.896 3.355

9 1.383 1.833 2.262 2.821 3.250

10 1.372 1.812 2.228 2.764 3.169

Page 26: Confidence Intervals - Kasetsart University

Example 4: Apartment Rents

Interval Estimation of a Population Mean:

Small-Sample Case (n < 30) with Unknown

550 42.92

or $507.08 to $592.92

We are 95% confident that the mean rent per month for the

population of one-bedroom units within a half-mile of campus

is between $507.08 and $592.92.

n

stx

2/

10

60262.2550

Page 27: Confidence Intervals - Kasetsart University

Sample Size for an Interval Estimate

of a Population Mean

Let E = the maximum sampling error mentioned

in the precision statement.

E is the amount added to and subtracted from the

point estimate to obtain an interval estimate.

E is often referred to as the margin of error.

We have

Solving for n we have

n

zE

2/

2

22

2/)(

E

zn

Page 28: Confidence Intervals - Kasetsart University

Example 5: National Discount, Inc.

Sample Size for an Interval Estimate of a

Population Mean

Suppose that National’s management team

wants an estimate of the population

mean such that there is a 0.95 probability

that the sampling error is $500 or less.

How large a sample size is needed to meet

the required precision?

Page 29: Confidence Intervals - Kasetsart University

Example 5: National Discount, Inc.

Sample Size for Interval Estimate of a

Population Mean

At 95% confidence, z.025 = 1.96.

Recall that = 4,500

Solving for n we have

We need to sample 312 to reach a desired

precision of $500 at 95% confidence.

500

n

σz

2α/

17.311)500(

)4500()96.1(

2

22

n

Page 30: Confidence Intervals - Kasetsart University

Interval Estimate

where: 1 -α is the confidence coefficient

zα/2 is the z value providing an area of

α/2 in the upper tail of the standard

normal probability distribution

is the sample proportion

Interval Estimation

of a Population Proportion

n

ppzp

)1(

2/

p

Page 31: Confidence Intervals - Kasetsart University

Example 6: Political Science, Inc.

Interval Estimation of a Population Proportion

Political Science, Inc. (PSI) specializes in voter

polls and surveys designed to keep political office

seekers informed of their position in a race. Using

telephone surveys, interviewers ask registered

voters who they would vote for if the election were

held that day.

In a recent election campaign, PSI found that

◦ 220 registered voters, out of 500 contacted, favored a

particular candidate.

◦ PSI wants to develop a 95% confidence interval estimate

for the proportion of the population of registered voters that

favors the candidate.

Page 32: Confidence Intervals - Kasetsart University

Example 6: Political Science, Inc.

Interval Estimate of a Population Proportion

where: n = 500, = 220/500 = .44, zα/2 = 1.96

.44 + .0435

PSI is 95% confident that the proportion of all voters that favors

the candidate is between .3965 and .4835.

n

ppzp

)1(

2/

p

500

)44.1(44.96.144.

Page 33: Confidence Intervals - Kasetsart University

Let E = the maximum sampling error mentioned

in the precision statement.

We have

Solving for n we have

Sample Size for an Interval Estimate

of a Population Proportion

n

ppzE

)1(

2/

2

2

2/)1()(

E

ppzn

Page 34: Confidence Intervals - Kasetsart University

Example 7: Political Science, Inc.

Sample Size for an Interval Estimate of a

Population Proportion

Suppose that PSI would like a .99

probability that the sample proportion is

within .03 of the population proportion.

How large a sample size is needed to meet

the required precision?

Page 35: Confidence Intervals - Kasetsart University

Example 7: Political Science, Inc.

Sample Size for Interval Estimate of a Population

Proportion

At 99% confidence, z.005 = 2.576.

Note:

We used 0.44 as the best estimate of p in the above expression.

If no information is available about p, then 0.5 is often

assumed because it provides the highest possible sample size.

If we had used p = 0.5, the recommended n would have been

1843.

1817)03(.

)56)(.44(.)576.2()1()(

2

2

2

2

2/

E

ppzn

Page 36: Confidence Intervals - Kasetsart University

Confidence Interval of μ, unknown σ

36

n samples are random with unknown μ, unknown σ

)1,0(NX

Confidence Level 95% of μ is n

sX 96.1

For n > 30, good estimate

n < 30, not good estimate of σ

For n < 30, Confidence Level 95% of μ is

n

stX

Page 37: Confidence Intervals - Kasetsart University

tα for 95% confidence level

Sample size (n) tα 4 3.128

8 2.365

12 2.201

20 2.093

25 2.065

30 2.045

1.960

37

Note: t fast increasing for small n

t slow increasing for large n

For n , t z (t normal distribution)

Page 38: Confidence Intervals - Kasetsart University

8 samples randomly select from population

Unknown μ with , s = 0.67

Find 95% confidence level of confidence Intervals of μ

Example 8

38

91.7x

Solution α= 0.05, d.f = n-1 = 8-1 =7 , t0.05 = 2.36

n

stx

05.0

47.835.7

8

67.036.291.7

8

67.036.291.7

With 95% confidence level that μ will be in the confidence interval (7.35,8.47)

Page 39: Confidence Intervals - Kasetsart University

Example 9

49 students randomly select from 3rd IUP students

Unknown μ with average grade , s = 0.7

Find 95% confidence level of confidence Intervals of μ

39

3.2x

Solution n=49, unknown σ s n > 30 use z (normal)

50.210.2

49

7.096.13.2

49

7.096.13.2

With 95% confidence level that μ will be in the confidence interval (2.1, 2.5)

n

szx

475.0

Page 40: Confidence Intervals - Kasetsart University

Estimating Population Total

40

N

i

iNXXTotal

1

Sometimes, we interested in Total population more than population mean

; N = # population

Confidence Interval (1-α) 100% of Total is n

sNtXN

n )1(

factorfpcforN

nN

n

sNtXN

n

1)1(

fpc factor: Finite population correction factor

N >> n, factor ~ 1

Page 41: Confidence Intervals - Kasetsart University

Example 10 For a maize field, Area = 5,000 Rai

Randomly select 1,000 Rai

The average total output Tung, s =273 Tung/Rai

41

1076x

Solution Total output of 5,000 Rai can be approximate with

Total output of maize field is in the interval (5,111,865.14 , 5,648,134.86 )

1)1(

N

nN

n

sNtXN

n

86.134,268000,380,5

1000,5

1000000,5

1000

273)9842.1(000,5)076,1(000,5

Page 42: Confidence Intervals - Kasetsart University

Reference

สถิตพีิน้ฐาน ส าหรับนกัพฒันาสงัคม, วีนสั พีชวณิชย์ สมจิต วฒันาชยากลู และเบญจมาศ ตลุยนิตกิลุ, คณะวิทยาศาสตร์และเทคโนโลยี ม.ธรรมศาสตร์, มค.2547

42