Confidence Intervals - Kasetsart University

Post on 17-Apr-2022


Confidence Intervals

Assoc. Prof. Anan Phonphoem, Ph.D. (รศ.ดร. อนันต์ ผลเพิ่ม)

anan.p@ku.ac.th

Intelligent Wireless Network Group (IWING Lab)

http://iwing.cpe.ku.ac.th

Computer Engineering Department

Kasetsart University, Bangkok, Thailand


Confidence Interval

A sample of independent observations

$x_1, x_2, \ldots, x_n$ from an unknown distribution

Want to find the population mean μ

How do we know which values of μ are plausible (believable)?

◦ Use a point estimate of μ and its estimated standard error (s.e.)

Confidence Interval

◦ An interval that covers a parameter (e.g. μ) with a stated degree of confidence (confidence level)

◦ Ex. We are 95% confident that μ lies in this range

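Chi-Square ($\chi^2$) Distribution

If X is standard normal, $X \sim N(0,1)$,

◦ $Y = X^2$ is Chi-Square with degree of freedom (d.f.) equal to 1

For independent $X_1, X_2, \ldots, X_n$ with $X_i \sim N(0,1)$,

$$Y = \sum_{i=1}^{n} X_i^2$$

is Chi-Square with d.f. = n

For independent $X_1, X_2, \ldots, X_n$ with $X_i \sim N(\mu, \sigma^2)$,

$$Y = \sum_{i=1}^{n} \left( \frac{X_i - \mu}{\sigma} \right)^2$$

is Chi-Square with d.f. = n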
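A quick simulation can make the definition concrete. The sketch below uses only the Python standard library; the function name `chi_square_draw`, the seed, and the number of draws are illustrative assumptions, not part of the slides. It sums n squared standard-normal draws and checks that the average of many such sums is close to n, which is the known mean of a Chi-Square variable with d.f. = n.

```python
import random

def chi_square_draw(n, rng):
    """Return one draw of Y = X1^2 + ... + Xn^2 with Xi ~ N(0, 1)."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))

rng = random.Random(1)  # fixed seed (arbitrary) so the run is repeatable
n = 5
draws = [chi_square_draw(n, rng) for _ in range(20000)]
mean_y = sum(draws) / len(draws)
print(f"d.f. = {n}, simulated mean of Y = {mean_y:.2f}")  # should be near n
```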

Degree of Freedom (d.f.)

The number of data values that are free to vary, i.e. not determined by the others

Example:

◦ Select 3 random numbers
  d.f. = 3

◦ Select 3 random numbers whose sum = 10
  Any 2 numbers can be selected; the last one is fixed by the condition
  E.g. select 2 and 5, the last must be 3
  d.f. = 2

◦ Select 3 random numbers whose sum = 10 and whose sum of squares = 54
  Any 1 number can be selected; the last two are fixed by the conditions
  E.g. select 7, the rest must be 1 and 2 (7 + 1 + 2 = 10, 49 + 1 + 4 = 54)
  d.f. = 1

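t-Distribution

If $X \sim N(0,1)$, $Y \sim \chi^2_f$, and X and Y are independent, the variable

$$t = \frac{X}{\sqrt{Y / f}}$$

is t-distributed with d.f. = f

For n random samples from $X \sim N(\mu, \sigma^2)$, we can calculate $\bar{x}$ and $s^2$

And we know the distributions

$$\frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \sim N(0,1), \qquad \frac{(n-1)\,s^2}{\sigma^2} \sim \chi^2_{(n-1)}$$

Therefore, $\frac{\bar{X} - \mu}{s / \sqrt{n}}$ is t-distributed with d.f. = (n-1):

$$t = \frac{\dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}}{\sqrt{\dfrac{(n-1)\,s^2}{\sigma^2\,(n-1)}}} = \frac{\bar{X} - \mu}{s / \sqrt{n}}$$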

Confidence Interval of μ, known σ

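n samples are random with unknown μ, known σ (not practical for study)

We can find $\bar{x}$ and use it to find the confidence interval of μ

From $X \sim N(\mu, \sigma^2)$, then

$$\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right)$$

and

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \sim N(0,1)$$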

From the standard normal distribution, suppose we want to find μ with a confidence of 95% (0.95)

[Figure: standard normal curve with area 0.95 between z = -1.96 and z = +1.96 and area 0.025 in each tail]

z is between ±1.96

Therefore,

$$P(-1.96 < Z < 1.96) = 0.95$$
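The value 1.96 can be checked numerically. The sketch below is one way to do it with `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# z value that leaves 0.025 in the upper tail (0.975 to its left)
z_975 = Z.inv_cdf(0.975)

# probability that Z falls between -1.96 and +1.96
central = Z.cdf(1.96) - Z.cdf(-1.96)

print(round(z_975, 2))    # 1.96
print(round(central, 4))  # 0.95
```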

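Substituting $Z = (\bar{X} - \mu) / (\sigma / \sqrt{n})$:

$$P\!\left(-1.96 < \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} < 1.96\right) = 0.95$$

$$-1.96 \frac{\sigma}{\sqrt{n}} < \bar{X} - \mu < 1.96 \frac{\sigma}{\sqrt{n}}$$

$$-\bar{X} - 1.96 \frac{\sigma}{\sqrt{n}} < -\mu < -\bar{X} + 1.96 \frac{\sigma}{\sqrt{n}}$$

$$\bar{X} - 1.96 \frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96 \frac{\sigma}{\sqrt{n}}$$

Therefore,

$$P\!\left(\bar{X} - 1.96 \frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96 \frac{\sigma}{\sqrt{n}}\right) = 0.95$$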

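The 95% confidence interval of μ is

$$\bar{X} \pm 1.96 \frac{\sigma}{\sqrt{n}}$$

[Figure: the 95% confidence interval drawn as a bracket from $\bar{X} - 1.96\,\sigma/\sqrt{n}$ to $\bar{X} + 1.96\,\sigma/\sqrt{n}$, centered at $\bar{X}$]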

Adjust the Interval

We can select any confidence level

◦ Higher confidence level → wider confidence interval (a less precise estimate)

To decrease the confidence interval length

◦ Increase the number of samples → high cost

◦ Decrease the confidence level

Popular confidence levels

◦ 90%, 95%, and 99%

For a non-normal distribution

◦ Central limit theorem

◦ n > 30 is OK for estimation
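The trade-off between confidence level, sample size, and interval length can be tabulated with a few lines of Python. This is a minimal sketch for the known-σ case; `half_width` is a hypothetical helper name, and σ = 1 and the sample sizes are made-up values chosen only to show the pattern.

```python
from math import sqrt
from statistics import NormalDist

def half_width(sigma, n, level):
    """Half-length z * sigma / sqrt(n) of a known-sigma interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # e.g. 0.975 for level 0.95
    return z * sigma / sqrt(n)

sigma = 1.0  # hypothetical population standard deviation
for level in (0.90, 0.95, 0.99):
    for n in (10, 40):
        print(f"level={level:.2f} n={n:3d} half-width={half_width(sigma, n, level):.3f}")
```

Raising the level widens the interval, while quadrupling n (10 to 40) halves it, which is why a shorter interval through sampling alone is costly.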

Example 1

Cylinder measurements follow a normal distribution

Unknown μ, and known σ = 2.009 cm

Randomly select 12 sample cylinders

Calculate $\bar{x} = 12.91$ cm

Find the confidence interval of the population mean of the cylinders from this factory with 95% confidence level

Solution

Let μ = population mean

$$\text{confidence interval} = \bar{x} \pm 1.96 \frac{\sigma}{\sqrt{n}} = 12.91 \pm 1.96 \times \frac{2.009}{\sqrt{12}} = 12.91 \pm 1.14$$

or 11.77 to 14.05 cm
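The arithmetic of Example 1 can be reproduced in Python. The sketch below assumes the known-σ formula $\bar{x} \pm z\,\sigma/\sqrt{n}$; the function name `z_interval` and its `level` parameter are illustrative choices, and the z value is taken from `statistics.NormalDist` rather than a printed table.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, level=0.95):
    """Interval xbar +/- z * sigma / sqrt(n) for known sigma."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin

low, high = z_interval(xbar=12.91, sigma=2.009, n=12)
print(f"{low:.2f} to {high:.2f}")  # 11.77 to 14.05
```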

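Simpler notation

$z_p$ denotes the z value for which the area under the standard normal curve between 0 and z is p

$z_{0.475} = 1.96$ bounds the central 95% of the normal distribution

$z_{0.495} = 2.58$ bounds the central 99% of the normal distribution

Let α be the difference between 1.00 and the required confidence level

(e.g. for 95%, α = 0.05)

$$\text{Confidence interval } (1-\alpha)\,100\% \text{ of } \mu = \bar{X} \pm z_{(1-\alpha)/2} \frac{\sigma}{\sqrt{n}}$$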

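Example 1 (continued)

For a higher confidence level = 99%

α = 0.01, α/2 = 0.005, $z_{0.495} = 2.58$

$$\text{confidence interval} = \bar{x} \pm z_{0.495} \frac{\sigma}{\sqrt{n}} = 12.91 \pm 2.58 \times \frac{2.009}{\sqrt{12}} \approx 12.91 \pm 2.58 \times 0.58 = 12.91 \pm 1.4964$$

or 11.41 to 14.41 cm

Which is wider than the 95% confidence interval, 12.91 ± 1.14 (11.77 to 14.05)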

Example 2

Study the salary of employees

σ² = 10,000 Baht²

Randomly ask 100 employees

Calculate $\bar{x} = 2{,}500$ Baht

Find the confidence interval of the population mean salary for this company with 95% confidence level

Solution

σ² = 10,000, so σ = 100; $\bar{x} = 2{,}500$; n = 100

From the central limit theorem, the 95% confidence interval of μ is

$$\bar{x} \pm z_{0.475} \frac{\sigma}{\sqrt{n}} = \bar{x} \pm 1.96 \frac{\sigma}{\sqrt{n}} = 2500 \pm 1.96 \times \frac{100}{\sqrt{100}} = 2500 \pm 19.60$$

or 2,480.40 to 2,519.60 Baht

Example 3: National Discount, Inc.

National Discount has 260 retail outlets throughout

the United States. National evaluates each

potential location for a new retail outlet in part on

the mean annual income of the individuals in the

marketing area of the new location.

Sampling can be used to develop an interval

estimate of the mean annual income for individuals

in a potential marketing area for National Discount.

A sample of size n = 36 was taken.

◦ The sample mean, x̄, is $21,100

◦ The sample standard deviation, s, is $4,500

We will use .95 as the confidence coefficient in our interval estimate.

Precision Statement

There is a .95 probability that the value of a sample mean for National Discount will provide a sampling error of $1,470 or less.

Determined as follows:

◦ 95% of the sample means that can be observed are within ±1.96 σ_x̄ of the population mean μ.

◦ If σ_x̄ = s/√n = 4,500/√36 = 750, then 1.96 σ_x̄ = 1,470.

Interval Estimate of the Population Mean: σ Unknown

Interval estimate of μ is:

$21,100 ± $1,470

or $19,630 to $22,570

We are 95% confident that the interval contains the population mean.

Many Intervals

For a random sample

◦ Can create many intervals (a, b)

◦ The interval depends on the confidence level

If 100 confidence intervals are constructed

◦ About (1-α)×100 of the 100 intervals will cover the parameter (e.g. μ)

Example 3

From a normal distribution with μ = 10, σ² = 9 (σ = 3)

Generate 16 data sets (8 samples per set)

For each set, create the 95% confidence interval around $\bar{x}$

[Figure: the 16 confidence intervals, each of half-width $1.96\,\sigma/\sqrt{n}$, plotted on an axis from 5 to 15]

Set 1: $\bar{x} = 10.295$

With 95% confidence level, (8.216, 12.374) covers μ = 10

Set 16: $\bar{x} = 12.703$

With 95% confidence level, (10.624, 14.781) does not cover μ = 10

Therefore, for the 16 sets

not cover: 1/16 = 0.0625 = 6.25%

cover: 15/16 = 0.9375 = 93.75%
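The same experiment can be repeated with many more sets to see the coverage settle near 95%. The sketch below is an illustration, not the procedure used for the slide: it assumes σ = 3 (σ² = 9), which is the value consistent with the half-width 2.079 of the intervals reported for Set 1 and Set 16, and the seed, the number of sets, and the helper name `covers` are arbitrary choices.

```python
import random
from math import sqrt

MU, SIGMA, N = 10.0, 3.0, 8       # sigma = 3 is an assumption (see lead-in)
MARGIN = 1.96 * SIGMA / sqrt(N)   # half-width, about 2.079

def covers(rng):
    """Draw one set of N values and report whether its interval covers MU."""
    xbar = sum(rng.gauss(MU, SIGMA) for _ in range(N)) / N
    return xbar - MARGIN < MU < xbar + MARGIN

rng = random.Random(2022)  # arbitrary seed for repeatability
sets = 10000
hits = sum(covers(rng) for _ in range(sets))
print(f"{hits}/{sets} intervals cover mu ({hits / sets:.1%})")
```

With only 16 sets, as on the slide, an observed coverage of 15/16 is well within what chance allows around the nominal 95%.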

Interval Estimation of a Population Mean:

Small-Sample Case (n < 30)

Population is Not Normally Distributed

◦ The only option is to increase the sample size to n > 30 and use the large-sample interval-estimation procedures.

Population is Normally Distributed and σ is Known

◦ The large-sample interval-estimation procedure can be used.

Population is Normally Distributed and σ is Unknown

◦ The appropriate interval estimate is based on a probability distribution known as the t distribution.

t Distribution

The t distribution is a family of similar probability

distributions.

A specific t distribution depends on a parameter

known as the degrees of freedom.

As the number of degrees of freedom increases,

the difference between the t distribution and the

standard normal probability distribution becomes

smaller and smaller.

A t distribution with more degrees of freedom has

less dispersion.

The mean of the t distribution is zero.

Interval Estimation of a Population Mean:

Small-Sample Case (n < 30) with σ Unknown

Interval Estimate

$$\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$$

where

1 - α = the confidence coefficient

$t_{\alpha/2}$ = the t value providing an area of α/2 in the upper tail of a t distribution with n - 1 degrees of freedom

s = the sample standard deviation

Example 4: Apartment Rents

Interval Estimation of a Population Mean:

Small-Sample Case (n < 30) with σ Unknown

A reporter for a student newspaper is writing an article on the

cost of off-campus housing.

A sample of 10 one-bedroom units within a half-mile of

campus resulted in a sample mean of $550 per month and a

sample standard deviation of $60.

Let us provide a 95% confidence interval estimate of the

mean rent per month for the population of one-bedroom units

within a half-mile of campus.

Assume this population to be normally distributed.

t Value

At 95% confidence, 1 - α = .95, α = .05, and α/2 = .025.

$t_{.025}$ is based on n - 1 = 10 - 1 = 9 degrees of freedom.

In the t distribution table we see that $t_{.025}$ = 2.262.

Degrees of          Area in Upper Tail
Freedom      .10     .05     .025    .01     .005
   .          .       .       .       .       .
   7        1.415   1.895   2.365   2.998   3.499
   8        1.397   1.860   2.306   2.896   3.355
   9        1.383   1.833   2.262   2.821   3.250
  10        1.372   1.812   2.228   2.764   3.169

Interval estimate:

$$\bar{x} \pm t_{.025} \frac{s}{\sqrt{n}} = 550 \pm 2.262 \times \frac{60}{\sqrt{10}} = 550 \pm 42.92$$

or $507.08 to $592.92

We are 95% confident that the mean rent per month for the population of one-bedroom units within a half-mile of campus is between $507.08 and $592.92.
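The Example 4 calculation can be checked with a short Python sketch. The standard library has no t quantile function, so the critical value 2.262 is passed in by hand from the t table (9 degrees of freedom, upper-tail area .025); the function name `t_interval` is an illustrative choice.

```python
from math import sqrt

def t_interval(xbar, s, n, t_crit):
    """Interval xbar +/- t_crit * s / sqrt(n); t_crit comes from a t table."""
    margin = t_crit * s / sqrt(n)
    return xbar - margin, xbar + margin

# t_crit = 2.262 is the table entry for 9 degrees of freedom, upper tail .025
low, high = t_interval(xbar=550, s=60, n=10, t_crit=2.262)
print(f"${low:.2f} to ${high:.2f}")  # $507.08 to $592.92
```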

Sample Size for an Interval Estimate

of a Population Mean

Let E = the maximum sampling error mentioned

in the precision statement.

E is the amount added to and subtracted from the

point estimate to obtain an interval estimate.

E is often referred to as the margin of error.

We have

$$E = z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$

Solving for n we have

$$n = \frac{(z_{\alpha/2})^2\, \sigma^2}{E^2}$$

Example 5: National Discount, Inc.

Sample Size for an Interval Estimate of a

Population Mean

Suppose that National’s management team

wants an estimate of the population

mean such that there is a 0.95 probability

that the sampling error is $500 or less.

How large a sample size is needed to meet

the required precision?

At 95% confidence, $z_{.025}$ = 1.96.

Recall that σ = 4,500

We have

$$z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 500$$

Solving for n we have

$$n = \frac{(1.96)^2 (4500)^2}{(500)^2} = 311.17$$

We need to sample 312 to reach a desired precision of $500 at 95% confidence.
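The sample-size formula $n = (z_{\alpha/2})^2 \sigma^2 / E^2$ is easy to wrap in a helper. This is a minimal sketch with a hypothetical function name, `sample_size_for_mean`; it returns the unrounded value so that the rounding-up step stays visible.

```python
from math import ceil

def sample_size_for_mean(z, sigma, E):
    """Return the unrounded n = z^2 * sigma^2 / E^2."""
    return (z ** 2) * (sigma ** 2) / (E ** 2)

n_raw = sample_size_for_mean(z=1.96, sigma=4500, E=500)
print(round(n_raw, 2))  # 311.17
print(ceil(n_raw))      # 312, rounded up so the margin is not exceeded
```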

Interval Estimation of a Population Proportion

Interval Estimate

$$\bar{p} \pm z_{\alpha/2} \sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$$

where: 1 - α is the confidence coefficient

$z_{\alpha/2}$ is the z value providing an area of α/2 in the upper tail of the standard normal probability distribution

$\bar{p}$ is the sample proportion

Example 6: Political Science, Inc.

Interval Estimation of a Population Proportion

Political Science, Inc. (PSI) specializes in voter

polls and surveys designed to keep political office

seekers informed of their position in a race. Using

telephone surveys, interviewers ask registered

voters who they would vote for if the election were

held that day.

In a recent election campaign, PSI found that

◦ 220 registered voters, out of 500 contacted, favored a

particular candidate.

◦ PSI wants to develop a 95% confidence interval estimate

for the proportion of the population of registered voters that

favors the candidate.

Interval Estimate of a Population Proportion

$$\bar{p} \pm z_{\alpha/2} \sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$$

where: n = 500, $\bar{p}$ = 220/500 = .44, $z_{\alpha/2}$ = 1.96

$$.44 \pm 1.96 \sqrt{\frac{.44(1-.44)}{500}} = .44 \pm .0435$$

PSI is 95% confident that the proportion of all voters that favors the candidate is between .3965 and .4835.
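The PSI interval can be reproduced in a few lines of Python. The sketch below assumes the large-sample proportion formula with z = 1.96 supplied as a default argument; `proportion_interval` is an illustrative name.

```python
from math import sqrt

def proportion_interval(successes, n, z=1.96):
    """Interval p +/- z * sqrt(p * (1 - p) / n) for a sample proportion p."""
    p = successes / n
    margin = z * sqrt(p * (1 - p) / n)
    return p - margin, p + margin

low, high = proportion_interval(successes=220, n=500)
print(f"{low:.4f} to {high:.4f}")  # 0.3965 to 0.4835
```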

Sample Size for an Interval Estimate of a Population Proportion

Let E = the maximum sampling error mentioned in the precision statement.

We have

$$E = z_{\alpha/2} \sqrt{\frac{p(1-p)}{n}}$$

Solving for n we have

$$n = \frac{(z_{\alpha/2})^2\, p(1-p)}{E^2}$$

Example 7: Political Science, Inc.

Sample Size for an Interval Estimate of a

Population Proportion

Suppose that PSI would like a .99

probability that the sample proportion is

within .03 of the population proportion.

How large a sample size is needed to meet

the required precision?

At 99% confidence, $z_{.005}$ = 2.576.

$$n = \frac{(z_{\alpha/2})^2\, p(1-p)}{E^2} = \frac{(2.576)^2 (.44)(.56)}{(.03)^2} \cong 1817$$

Note:

We used 0.44 as the best estimate of p in the above expression. If no information is available about p, then 0.5 is often assumed because it provides the highest possible sample size. If we had used p = 0.5, the recommended n would have been 1843.
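The effect of the planning value p on the required sample size can be seen by evaluating the formula for both choices. This is a sketch with a hypothetical helper name, `sample_size_for_proportion`; it prints the unrounded values, which round to the 1817 and 1843 quoted in the example.

```python
def sample_size_for_proportion(z, p, E):
    """Return the unrounded n = z^2 * p * (1 - p) / E^2."""
    return (z ** 2) * p * (1 - p) / (E ** 2)

for p in (0.44, 0.5):
    n_raw = sample_size_for_proportion(z=2.576, p=p, E=0.03)
    print(f"p = {p}: n = {n_raw:.2f}")
```

Because p(1 - p) is largest at p = 0.5, that choice gives the most conservative sample size.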

Confidence Interval of μ, unknown σ

n samples are random with unknown μ, unknown σ

The 95% confidence interval of μ is

$$\bar{X} \pm 1.96 \frac{s}{\sqrt{n}}$$

For n > 30, s is a good estimate of σ

For n < 30, s is not a good estimate of σ

For n < 30, the 95% confidence interval of μ is

$$\bar{X} \pm t_{\alpha} \frac{s}{\sqrt{n}}$$

$t_{\alpha}$ for 95% confidence level

Sample size (n)    t_α
       4          3.182
       8          2.365
      12          2.201
      20          2.093
      25          2.064
      30          2.045
       ∞          1.960

Note: $t_{\alpha}$ changes quickly for small n and slowly for large n

For n → ∞, t → z (the t distribution approaches the normal distribution)
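To see how much wider the t-based interval is than the z-based one, the table values can be compared directly. The sketch below copies four of the 95% table entries (n = 8, 12, 20, 30) into a dictionary; s = 1 is a made-up sample standard deviation used only for illustration, and the names `T_TABLE` and `Z_95` are assumptions.

```python
from math import sqrt

# t values for 95% confidence, copied from the table (sample size -> t)
T_TABLE = {8: 2.365, 12: 2.201, 20: 2.093, 30: 2.045}
Z_95 = 1.960

s = 1.0  # hypothetical sample standard deviation
for n, t in T_TABLE.items():
    t_half = t * s / sqrt(n)
    z_half = Z_95 * s / sqrt(n)
    print(f"n={n:2d}  t half-width={t_half:.3f}  z half-width={z_half:.3f}  ratio={t / Z_95:.3f}")
```

The ratio column falls toward 1 as n grows, which is the sense in which t approaches z.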

Example 8

8 samples are randomly selected from a population

Unknown μ with $\bar{x} = 7.91$, s = 0.67

Find the 95% confidence interval of μ

Solution

α = 0.05, d.f. = n - 1 = 8 - 1 = 7, $t_{0.05}$ = 2.36

$$\bar{x} \pm t_{0.05} \frac{s}{\sqrt{n}} = 7.91 \pm 2.36 \times \frac{0.67}{\sqrt{8}}$$

$$7.91 - 2.36 \times \frac{0.67}{\sqrt{8}} = 7.35, \qquad 7.91 + 2.36 \times \frac{0.67}{\sqrt{8}} = 8.47$$

With 95% confidence level, μ will be in the confidence interval (7.35, 8.47)

Example 9

49 students are randomly selected from 3rd IUP students

Unknown μ with average grade $\bar{x} = 2.3$, s = 0.7

Find the 95% confidence interval of μ

Solution

n = 49, unknown σ → use s; n > 30 → use z (normal)

$$\bar{x} \pm z_{0.475} \frac{s}{\sqrt{n}} = 2.3 \pm 1.96 \times \frac{0.7}{\sqrt{49}}$$

$$2.3 - 1.96 \times \frac{0.7}{\sqrt{49}} = 2.10, \qquad 2.3 + 1.96 \times \frac{0.7}{\sqrt{49}} = 2.50$$

With 95% confidence level, μ will be in the confidence interval (2.1, 2.5)

Estimating Population Total

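Sometimes we are interested in the population total more than the population mean

$$\text{Total} = \sum_{i=1}^{N} X_i = N\bar{X}$$

N = population size

The (1-α) 100% confidence interval of the Total is

$$N\bar{X} \pm t_{(n-1)}\, N \frac{s}{\sqrt{n}}$$

or, with the fpc factor for finite N,

$$N\bar{X} \pm t_{(n-1)}\, N \frac{s}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$$

fpc factor: finite population correction factor

N >> n, factor ~ 1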
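The total-interval formula, with and without the fpc factor, can be sketched as a small Python function. Everything in the usage lines is hypothetical (a population of 2,000 units, a sample of 25, mean 40, standard deviation 5); the critical value 2.064 is the t-table entry for 24 degrees of freedom at 95% confidence, and `total_interval` and `use_fpc` are illustrative names.

```python
from math import sqrt

def total_interval(N, n, xbar, s, t_crit, use_fpc=True):
    """Interval N*xbar +/- t_crit * N * s / sqrt(n), optionally times the fpc factor."""
    fpc = sqrt((N - n) / (N - 1)) if use_fpc else 1.0
    margin = t_crit * N * s / sqrt(n) * fpc
    return N * xbar - margin, N * xbar + margin

# Hypothetical inputs: population of 2,000 units, sample of 25,
# t_crit = 2.064 is the t-table value for 24 d.f. at 95% confidence.
low, high = total_interval(N=2000, n=25, xbar=40.0, s=5.0, t_crit=2.064)
print(f"total between {low:,.0f} and {high:,.0f}")
```

Since the fpc factor is at most 1, applying it can only shorten the interval, and for N much larger than n the two versions nearly coincide.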

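Example 10

For a maize field, Area = 5,000 Rai

Randomly select 100 Rai

The average output is $\bar{x} = 1{,}076$ Tung/Rai, s = 273 Tung/Rai

Solution

Total output of 5,000 Rai can be approximated with

$$N\bar{X} \pm t_{(n-1)}\, N \frac{s}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}} = 5{,}000(1{,}076) \pm (1.9842)(5{,}000) \frac{273}{\sqrt{100}} \sqrt{\frac{5{,}000 - 100}{5{,}000 - 1}} \approx 5{,}380{,}000 \pm 268{,}134.86$$

(with the fpc factor rounded to 0.99)

Total output of the maize field is in the interval (5,111,865.14, 5,648,134.86) Tung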
