Confidence Intervals - Kasetsart University
Confidence Intervals
Assoc. Prof. Anan Phonphoem, Ph.D.
Intelligent Wireless Network Group (IWING Lab)
http://iwing.cpe.ku.ac.th
Computer Engineering Department
Kasetsart University, Bangkok, Thailand
Confidence Interval
A sample of independent observations
x1, x2, …, xn from an unknown distribution
Goal: estimate the population mean μ
How do we know which values of μ are plausible (believable)?
◦ Use a point estimate of μ together with its estimated standard error (s.e.)
Confidence Interval
◦ An interval that covers the parameter (e.g. μ) with a stated
level of confidence (confidence level)
◦ Ex. We are 95% confident that μ lies in this range
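Chi-Square (χ²) Distribution
If X is a standard normal variable, \( X \sim N(0,1) \),
◦ \( Y = X^2 \) is Chi-Square with degrees of freedom
(d.f.) equal to 1
For independent X1, X2, …, Xn with \( X_i \sim N(0,1) \),
◦ \( Y = \sum_{i=1}^{n} X_i^2 \) is Chi-Square with d.f. = n
For independent X1, X2, …, Xn with \( X_i \sim N(\mu, \sigma^2) \),
◦ \( Y = \sum_{i=1}^{n} \left( \frac{X_i - \mu}{\sigma} \right)^2 \) is Chi-Square with d.f. = n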
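The definition above can be checked numerically. The following is a minimal simulation sketch, not part of the original slides: it draws Y as a sum of n squared standard normal values and averages many draws. It relies on the standard fact that a Chi-Square variable has mean equal to its d.f.; the function names, the number of draws, and the seed are illustrative assumptions.

```python
import random

def chi_square_sample(n, rng):
    """One draw of Y = X1^2 + ... + Xn^2 with each Xi ~ N(0, 1)."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))

def mean_of_chi_square_draws(n, draws=20000, seed=1):
    """Average many simulated Y values; in theory the mean equals d.f. = n."""
    rng = random.Random(seed)  # fixed seed (assumption) so runs are repeatable
    return sum(chi_square_sample(n, rng) for _ in range(draws)) / draws

# The simulated means should land close to 1, 5 and 10 respectively.
for n in (1, 5, 10):
    print(n, round(mean_of_chi_square_draws(n), 2))
```

The printed averages vary slightly with the seed, so no exact output is quoted here.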
Degrees of Freedom (d.f.)
Number of data values that are free to vary (do not depend on the others)
Example:
◦ Select 3 random numbers
d.f. = 3
◦ Select 3 random numbers whose sum = 10
Any 2 numbers can be selected freely; the last one is fixed by the condition
E.g. Select 2 and 5, the last must be 3
d.f. = 2
◦ Select 3 random numbers whose sum = 10 and whose sum of squares = 54
Only 1 number can be selected freely; the last two are fixed by the conditions
E.g. Select 7, the rest must be 1 and 2 (7 + 1 + 2 = 10 and 49 + 1 + 4 = 54)
d.f. = 1
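t-Distribution
If X is standard normal, \( X \sim N(0,1) \), Y is Chi-Square with d.f. = f, \( Y \sim \chi^2_f \), and
X and Y are independent, the variable
\( t = \frac{X}{\sqrt{Y / f}} \)
is t-Distribution with d.f. = f
For n random samples from \( X \sim N(\mu, \sigma^2) \), we can calculate \( \bar{x} \) and \( s^2 \)
And we know the distributions
\( \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \sim N(0,1) \) and \( \frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1} \)
Therefore,
\( t = \frac{(\bar{X} - \mu) / (\sigma / \sqrt{n})}{\sqrt{\frac{(n-1)s^2}{\sigma^2} / (n-1)}} = \frac{\bar{X} - \mu}{s / \sqrt{n}} \)
is t-Distribution with d.f. = (n-1)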
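As an illustration of the statistic \( t = (\bar{x} - \mu) / (s / \sqrt{n}) \), here is a small sketch that computes it from raw data. The data values and the hypothesised mean of 10.0 are made up for the example; `statistics.stdev` uses the n - 1 divisor, which matches the sample standard deviation s used in the slides.

```python
import math
import statistics

def t_statistic(sample, mu):
    """Return (x_bar - mu) / (s / sqrt(n)), which has n - 1 degrees of freedom."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    return (x_bar - mu) / (s / math.sqrt(n))

# Hypothetical measurements, not taken from the slides.
data = [9.8, 10.4, 10.1, 9.6, 10.6]
print(round(t_statistic(data, 10.0), 4))  # t with d.f. = 4
```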
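Confidence Interval of μ, known σ
n samples are random with unknown μ, known σ (not practical for study)
We can find \( \bar{x} \) and use it to find the Confidence Interval of μ
From \( X \sim N(\mu, \sigma^2) \),
\( \bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \), and then
\( Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \sim N(0,1) \)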
Confidence Interval of μ, known σ
From the normal distribution, suppose we want a confidence interval for μ with 95% (0.95) confidence
[Figure: standard normal curve with central area 0.95 between z = -1.96 and z = +1.96, and area 0.025 in each tail]
z is between ± 1.96
Therefore,
P(-1.96 < Z < 1.96) = 0.95
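The probability statement can be verified with the standard normal distribution in Python's standard library. This is only a quick check, assuming Python 3.8 or later for `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

central = z.cdf(1.96) - z.cdf(-1.96)  # area between -1.96 and +1.96
upper_tail = 1 - z.cdf(1.96)          # area to the right of +1.96

print(round(central, 4), round(upper_tail, 4))
print(round(z.inv_cdf(0.975), 3))     # z value that leaves 0.025 in the upper tail
```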
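Confidence Interval of μ, known σ
\( P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} < 1.96\right) = 0.95 \)
\( -1.96\,\frac{\sigma}{\sqrt{n}} < \bar{X} - \mu < 1.96\,\frac{\sigma}{\sqrt{n}} \)
\( -\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} < -\mu < -\bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}} \)
\( \bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}} \)
Therefore,
\( P\left(\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) = 0.95 \)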
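Confidence Interval of μ, known σ
The 95% confidence interval of μ is \( \bar{X} \pm 1.96\,\frac{\sigma}{\sqrt{n}} \)
[Figure: number line showing the interval from \( \bar{X} - 1.96\,\sigma/\sqrt{n} \) to \( \bar{X} + 1.96\,\sigma/\sqrt{n} \), centered at \( \bar{X} \), labeled "Confidence Level 95%"]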
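The interval \( \bar{X} \pm z\,\sigma/\sqrt{n} \) translates directly into a few lines of code. The sketch below is one possible helper, not the lecturer's own; the function name `z_interval` and the input values are hypothetical, and the z value comes from `NormalDist.inv_cdf` rather than from a printed table.

```python
import math
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for mu when sigma is known."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for 95%
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical inputs chosen only to exercise the function.
low, high = z_interval(x_bar=50.0, sigma=4.0, n=25)
print(round(low, 3), round(high, 3))
```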
Adjust the Interval
We can select any confidence level
A higher confidence level gives
a wider confidence interval
(a less precise estimate)
To decrease confidence interval length
◦ Increase the number of samples (high cost)
◦ Decrease the confidence level
Popular confidence levels
◦ 90%, 95%, and 99%
For non-normal distributions
◦ Use the central limit theorem
◦ n > 30 is usually adequate for estimation
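The trade-off between confidence level, sample size, and interval length can be tabulated. The following sketch assumes a known σ of 10 (a made-up value) and prints the half-length \( z\,\sigma/\sqrt{n} \) for a few combinations; the function name `margin` is illustrative.

```python
import math
from statistics import NormalDist

def margin(sigma, n, confidence):
    """Half-length of the known-sigma interval: z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sigma / math.sqrt(n)

sigma = 10.0  # hypothetical population standard deviation
for confidence in (0.90, 0.95, 0.99):
    row = [round(margin(sigma, n, confidence), 2) for n in (25, 100, 400)]
    print(confidence, row)
```

Reading across a row, quadrupling n halves the interval length; reading down a column, a higher confidence level widens it.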
Example 1
Cylinder size is normally distributed
Unknown μ, and known σ = 2.009 cm
Randomly select 12 sample cylinders
Calculate \( \bar{x} = 12.91 \) cm
Find the confidence interval of the population mean cylinder size
for this factory at the 95% confidence level
Solution
Let μ = population mean
confidence interval = \( \bar{x} \pm 1.96\,\frac{\sigma}{\sqrt{n}} \)
= \( 12.91 \pm 1.96 \times \frac{2.009}{\sqrt{12}} \)
= \( 12.91 \pm 1.14 \)
= 11.77 to 14.05
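Simpler notation
\( z_p \) is the value with area p between 0 and \( z_p \) under the standard normal curve, \( P(0 < Z < z_p) = p \)
\( z_{0.475} = 1.96 \) bounds the central 95% of the normal distribution
\( z_{0.495} = 2.58 \) bounds the central 99% of the normal distribution
Let α be the difference between 1.00 and the required confidence level
(e.g. for 95%, α = 0.05)
Confidence interval (1 - α)100% of μ = \( \bar{X} \pm z_{(1-\alpha)/2}\,\frac{\sigma}{\sqrt{n}} \)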
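Example 1
For a higher confidence level = 99%
α = 0.01, α/2 = 0.005, \( z_{0.495} = 2.58 \)
confidence interval = \( \bar{x} \pm z_{0.495}\,\frac{\sigma}{\sqrt{n}} \)
= \( 12.91 \pm 2.58 \times \frac{2.009}{\sqrt{12}} \)
= \( 12.91 \pm 1.4963 \)
= 11.41 to 14.41
Which is wider than the 95% confidence interval \( 12.91 \pm 1.14 \) (11.77 to 14.05)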
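The two versions of Example 1 can be reproduced side by side. This sketch uses the rounded table values z = 1.96 and z = 2.58 from the slides; the helper name `interval` is an assumption made for illustration.

```python
import math

def interval(x_bar, sigma, n, z):
    """Return (lower, upper) for x_bar +/- z * sigma / sqrt(n)."""
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

ci95 = interval(12.91, 2.009, 12, z=1.96)
ci99 = interval(12.91, 2.009, 12, z=2.58)

print([round(v, 2) for v in ci95])  # 95% interval in cm
print([round(v, 2) for v in ci99])  # 99% interval in cm, wider than the 95% one
```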
Example 2
Study the salary of employees
σ² = 10,000 Baht²
Randomly ask 100 employees
Calculate \( \bar{x} = 2{,}500 \) Baht
Find the confidence interval of the population mean salary for this company at the 95% confidence level
Solution σ² = 10,000, so σ = 100; \( \bar{x} = 2{,}500 \), n = 100
From the central limit theorem, the 95% confidence interval of μ is
\( \bar{x} \pm z_{0.475}\,\frac{\sigma}{\sqrt{n}} = \bar{x} \pm 1.96\,\frac{\sigma}{\sqrt{n}} \)
= \( 2500 \pm 1.96 \times \frac{100}{\sqrt{100}} \)
= \( 2500 \pm 19.60 \)
= 2,480.40 to 2,519.60
Example 3: National Discount, Inc.
National Discount has 260 retail outlets throughout
the United States. National evaluates each
potential location for a new retail outlet in part on
the mean annual income of the individuals in the
marketing area of the new location.
Sampling can be used to develop an interval
estimate of the mean annual income for individuals
in a potential marketing area for National Discount.
A sample of size n = 36 was taken.
◦ The sample mean, \( \bar{x} \), is $21,100
◦ The sample standard deviation, s, is $4,500
We will use .95 as the confidence coefficient in our
interval estimate.
Example 3: National Discount, Inc.
Precision Statement
There is a .95 probability that the value of a
sample mean for National Discount will
provide a sampling error of $1,470 or less.
Determined as follows:
◦ 95% of the sample means that can be observed
are within \( 1.96\,\sigma_{\bar{x}} \) of the population mean μ.
◦ If \( \sigma_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{4{,}500}{\sqrt{36}} = 750 \), then \( 1.96\,\sigma_{\bar{x}} = 1{,}470 \)
Example 3: National Discount, Inc.
Interval Estimate of the Population Mean:
σ Unknown
Interval Estimate of μ is:
$21,100 ± $1,470
or $19,630 to $22,570
We are 95% confident that the interval
contains the population mean.
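The National Discount figures can be recomputed step by step. The sketch below follows the large-sample approach of the example (s stands in for σ, z = 1.96); the variable names are illustrative.

```python
import math

n = 36
x_bar = 21100.0  # sample mean annual income, in dollars
s = 4500.0       # sample standard deviation, in dollars

standard_error = s / math.sqrt(n)  # estimated standard error of the mean
margin = 1.96 * standard_error     # sampling error at 95% confidence
low, high = x_bar - margin, x_bar + margin

print(round(standard_error, 2), round(margin, 2))
print(round(low, 2), round(high, 2))
```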
Many Intervals
For a random sample
◦ Can create many intervals (a, b)
◦ The interval depends on the confidence level
If we construct 100 confidence intervals
◦ About (1 - α) × 100 of the 100 intervals
will cover the parameter (e.g. μ)
Example 3
From a normal distribution with μ = 10, σ² = 9 (σ = 3)
Generate 16 data sets (8 samples per set)
For each set
create the 95% confidence interval \( \bar{x} \pm 1.96\,\frac{\sigma}{\sqrt{n}} \)
[Figure: the 16 intervals \( \bar{x} \pm 1.96\,\sigma/\sqrt{n} \) drawn on an axis from 5 to 15]
Set 1: \( \bar{x} = 10.295 \)
The 95% confidence interval (8.216, 12.374) covers μ = 10
Set 16: \( \bar{x} = 12.703 \)
The 95% confidence interval (10.624, 14.781) does not cover μ = 10
Therefore, for 16 sets
not cover: 1/16 = 0.0625 = 6.25%
cover: 15/16 = 0.9375 = 93.75%
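The coverage experiment can be repeated by simulation. This is a sketch under stated assumptions: it takes σ = 3 (that is, σ² = 9), which is the value that reproduces the half-length of about 2.079 seen in the intervals above, and it uses a fixed random seed, so the individual data sets differ from those on the slide.

```python
import math
import random

def coverage(mu=10.0, sigma=3.0, n=8, sets=16, z=1.96, seed=0):
    """Fraction of simulated intervals x_bar +/- z*sigma/sqrt(n) that cover mu."""
    rng = random.Random(seed)
    half_length = z * sigma / math.sqrt(n)
    covered = 0
    for _ in range(sets):
        x_bar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if x_bar - half_length < mu < x_bar + half_length:
            covered += 1
    return covered / sets

print(coverage(sets=16))     # a small experiment like the one on the slide
print(coverage(sets=20000))  # with many sets the fraction approaches 0.95
```

With only 16 sets the observed fraction moves in steps of 1/16, so a result such as 93.75% is as close to 95% as such a small experiment can usually get.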
Interval Estimation of a Population Mean:
Small-Sample Case (n < 30)
Population is Not Normally Distributed
◦ The only option is to increase the sample size to
n > 30 and use the large-sample interval-estimation
procedures.
Population is Normally Distributed and σ is Known
◦ The large-sample interval-estimation procedure can
be used.
Population is Normally Distributed and σ is Unknown
◦ The appropriate interval estimate is based on a
probability distribution known as the t distribution.
t Distribution
The t distribution is a family of similar probability
distributions.
A specific t distribution depends on a parameter
known as the degrees of freedom.
As the number of degrees of freedom increases,
the difference between the t distribution and the
standard normal probability distribution becomes
smaller and smaller.
A t distribution with more degrees of freedom has
less dispersion.
The mean of the t distribution is zero.
Interval Estimation of a Population Mean:
Small-Sample Case (n < 30) with σ Unknown
Interval Estimate
\( \bar{x} \pm t_{\alpha/2}\,\frac{s}{\sqrt{n}} \)
where
1 - α = the confidence coefficient
\( t_{\alpha/2} \) = the t value providing an area of α/2
in the upper tail of a t distribution
with n - 1 degrees of freedom
s = the sample standard deviation
Example 4: Apartment Rents
Interval Estimation of a Population Mean:
Small-Sample Case (n < 30) with σ Unknown
A reporter for a student newspaper is writing an article on the
cost of off-campus housing.
A sample of 10 one-bedroom units within a half-mile of
campus resulted in a sample mean of $550 per month and a
sample standard deviation of $60.
Let us provide a 95% confidence interval estimate of the
mean rent per month for the population of one-bedroom units
within a half-mile of campus.
Assume this population to be normally distributed.
Example 4: Apartment Rents
t Value
At 95% confidence, 1 - α = .95, α = .05,
and α /2 = .025.
t.025 is based on n - 1 = 10 - 1 = 9 degrees of
freedom.
In the t distribution table we see that t.025 = 2.262.
Degrees of    Area in Upper Tail
Freedom       .10      .05      .025     .01      .005
.             .        .        .        .        .
7             1.415    1.895    2.365    2.998    3.499
8             1.397    1.860    2.306    2.896    3.355
9             1.383    1.833    2.262    2.821    3.250
10            1.372    1.812    2.228    2.764    3.169
Example 4: Apartment Rents
Interval Estimation of a Population Mean:
Small-Sample Case (n < 30) with σ Unknown
\( \bar{x} \pm t_{.025}\,\frac{s}{\sqrt{n}} = 550 \pm 2.262 \times \frac{60}{\sqrt{10}} = 550 \pm 42.92 \)
or $507.08 to $592.92
We are 95% confident that the mean rent per month for the
population of one-bedroom units within a half-mile of campus
is between $507.08 and $592.92.
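The apartment-rent interval can be reproduced with a small lookup of t values. Python's standard library has no t-distribution quantile function, so this sketch copies the .025 column of the t table shown earlier into a dictionary; the names `T_025` and `t_interval_95` are hypothetical, and only the listed degrees of freedom are supported.

```python
import math

# Upper-tail t values for area .025, copied from the t table (d.f. -> t).
T_025 = {7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}

def t_interval_95(x_bar, s, n):
    """95% interval x_bar +/- t * s / sqrt(n) using n - 1 degrees of freedom."""
    t = T_025[n - 1]  # raises KeyError for d.f. not in the small table above
    margin = t * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

low, high = t_interval_95(x_bar=550.0, s=60.0, n=10)
print(round(low, 2), round(high, 2))
```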
Sample Size for an Interval Estimate
of a Population Mean
Let E = the maximum sampling error mentioned
in the precision statement.
E is the amount added to and subtracted from the
point estimate to obtain an interval estimate.
E is often referred to as the margin of error.
We have
\( E = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \)
Solving for n we have
\( n = \frac{(z_{\alpha/2})^2\,\sigma^2}{E^2} \)
Example 5: National Discount, Inc.
Sample Size for an Interval Estimate of a
Population Mean
Suppose that National’s management team
wants an estimate of the population
mean such that there is a 0.95 probability
that the sampling error is $500 or less.
How large a sample size is needed to meet
the required precision?
Example 5: National Discount, Inc.
Sample Size for Interval Estimate of a
Population Mean
\( z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} = 500 \)
At 95% confidence, z.025 = 1.96.
Recall that σ = 4,500.
Solving for n we have
\( n = \frac{(1.96)^2 (4500)^2}{(500)^2} = 311.17 \)
We need to sample 312 to reach a desired
precision of $500 at 95% confidence.
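The sample-size formula \( n = (z_{\alpha/2})^2 \sigma^2 / E^2 \) is easy to wrap in a function. The sketch below returns both the exact value and the value rounded up to the next whole observation; the function name is an illustrative assumption.

```python
import math

def sample_size_for_mean(sigma, error, z=1.96):
    """Return (exact n, n rounded up) for a margin of error `error`."""
    n_exact = (z * sigma / error) ** 2
    return n_exact, math.ceil(n_exact)

n_exact, n_needed = sample_size_for_mean(sigma=4500, error=500)
print(round(n_exact, 2), n_needed)
```

Rounding up rather than to the nearest integer guarantees that the margin of error does not exceed E.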
Interval Estimation
of a Population Proportion
Interval Estimate
\( \bar{p} \pm z_{\alpha/2}\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} \)
where: 1 - α is the confidence coefficient
zα/2 is the z value providing an area of
α/2 in the upper tail of the standard
normal probability distribution
\( \bar{p} \) is the sample proportion
Example 6: Political Science, Inc.
Interval Estimation of a Population Proportion
Political Science, Inc. (PSI) specializes in voter
polls and surveys designed to keep political office
seekers informed of their position in a race. Using
telephone surveys, interviewers ask registered
voters who they would vote for if the election were
held that day.
In a recent election campaign, PSI found that
◦ 220 registered voters, out of 500 contacted, favored a
particular candidate.
◦ PSI wants to develop a 95% confidence interval estimate
for the proportion of the population of registered voters that
favors the candidate.
Example 6: Political Science, Inc.
Interval Estimate of a Population Proportion
\( \bar{p} \pm z_{\alpha/2}\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} \)
where: n = 500, \( \bar{p} \) = 220/500 = .44, zα/2 = 1.96
\( .44 \pm 1.96\sqrt{\frac{.44(1-.44)}{500}} = .44 \pm .0435 \)
PSI is 95% confident that the proportion of all voters that favors
the candidate is between .3965 and .4835.
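The PSI poll result can be checked with a short function for the proportion interval. This is a minimal sketch; the name `proportion_interval` and its return layout are assumptions made for the example.

```python
import math

def proportion_interval(successes, n, z=1.96):
    """Return (p_bar, margin, (lower, upper)) for a population proportion."""
    p_bar = successes / n
    margin = z * math.sqrt(p_bar * (1 - p_bar) / n)
    return p_bar, margin, (p_bar - margin, p_bar + margin)

p_bar, margin, (low, high) = proportion_interval(220, 500)
print(p_bar, round(margin, 4))
print(round(low, 4), round(high, 4))
```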
Sample Size for an Interval Estimate
of a Population Proportion
Let E = the maximum sampling error mentioned
in the precision statement.
We have
\( E = z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} \)
Solving for n we have
\( n = \frac{(z_{\alpha/2})^2\,p(1-p)}{E^2} \)
Example 7: Political Science, Inc.
Sample Size for an Interval Estimate of a
Population Proportion
Suppose that PSI would like a .99
probability that the sample proportion is
within .03 of the population proportion.
How large a sample size is needed to meet
the required precision?
Example 7: Political Science, Inc.
Sample Size for Interval Estimate of a Population
Proportion
At 99% confidence, z.005 = 2.576.
\( n = \frac{(z_{\alpha/2})^2\,p(1-p)}{E^2} = \frac{(2.576)^2(.44)(.56)}{(.03)^2} \approx 1817 \)
Note:
We used 0.44 as the best estimate of p in the above expression.
If no information is available about p, then 0.5 is often
assumed because it provides the highest possible sample size.
If we had used p = 0.5, the recommended n would have been
1844 (1843.27 rounded up).
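The effect of the planning value p on the required sample size can be seen by evaluating the formula for both p = 0.44 and p = 0.5. In this sketch the function name is hypothetical, and rounding up with `math.ceil` is a choice made here; rounding 1843.27 to the nearest integer instead gives 1843.

```python
import math

def sample_size_for_proportion(p, error, z):
    """Exact (unrounded) n = z^2 * p * (1 - p) / E^2."""
    return z ** 2 * p * (1 - p) / error ** 2

n_44 = sample_size_for_proportion(0.44, 0.03, 2.576)
n_50 = sample_size_for_proportion(0.50, 0.03, 2.576)

print(round(n_44, 2), math.ceil(n_44))
print(round(n_50, 2), math.ceil(n_50))
```

Because p(1 - p) is largest at p = 0.5, that planning value always gives the largest n.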
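Confidence Interval of μ, unknown σ
n samples are random with unknown μ, unknown σ
For large n, \( \frac{\bar{X} - \mu}{s / \sqrt{n}} \) is approximately N(0,1)
The 95% confidence interval of μ is \( \bar{X} \pm 1.96\,\frac{s}{\sqrt{n}} \)
For n > 30, s is a good estimate of σ
For n < 30, s is not a good estimate of σ
For n < 30, the 95% confidence interval of μ is
\( \bar{X} \pm t_{\alpha/2}\,\frac{s}{\sqrt{n}} \) with d.f. = n - 1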
t values for the 95% confidence level (upper-tail area 0.025, d.f. = n - 1)
Sample size (n)    t
4                  3.182
8                  2.365
12                 2.201
20                 2.093
25                 2.064
30                 2.045
∞                  1.960
Note: t increases quickly as n becomes small
t changes slowly for large n
For n → ∞, t → z (the t distribution approaches the normal distribution)
Example 8
8 samples are randomly selected from the population
Unknown μ with \( \bar{x} = 7.91 \), s = 0.67
Find the 95% confidence interval of μ
Solution α = 0.05, d.f. = n - 1 = 8 - 1 = 7, \( t_{\alpha/2} = t_{0.025} = 2.36 \)
\( \bar{x} \pm t_{0.025}\,\frac{s}{\sqrt{n}} = 7.91 \pm 2.36 \times \frac{0.67}{\sqrt{8}} \)
= 7.35 to 8.47
With 95% confidence, μ is in the confidence interval (7.35, 8.47)
Example 9
49 students are randomly selected from the 3rd IUP students
Unknown μ with average grade \( \bar{x} = 2.3 \), s = 0.7
Find the 95% confidence interval of μ
Solution n = 49, unknown σ (estimated by s); n > 30, so use z (normal)
\( \bar{x} \pm z_{0.475}\,\frac{s}{\sqrt{n}} = 2.3 \pm 1.96 \times \frac{0.7}{\sqrt{49}} \)
= 2.10 to 2.50
With 95% confidence, μ is in the confidence interval (2.1, 2.5)
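Estimating Population Total
Sometimes we are interested in the population total more than the population mean
\( \text{Total} = \sum_{i=1}^{N} X_i = N\bar{X} \) ; N = population size
Confidence Interval (1-α) 100% of Total is
\( N\bar{X} \pm t_{(n-1)}\,\frac{N s}{\sqrt{n}} \)
or, with the fpc factor,
\( N\bar{X} \pm t_{(n-1)}\,\frac{N s}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} \)
fpc factor: Finite population correction factor, \( \sqrt{\frac{N-n}{N-1}} \)
N >> n, factor ~ 1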
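Example 10 For a maize field, Area = 5,000 Rai
Randomly select 100 Rai
The average output is \( \bar{x} = 1{,}076 \) Tung/Rai, s = 273 Tung/Rai
Solution Total output of 5,000 Rai can be approximated with
\( N\bar{X} \pm t_{(n-1)}\,\frac{N s}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} \)
\( = 5{,}000(1{,}076) \pm (1.9842)\,\frac{5{,}000 \times 273}{\sqrt{100}}\sqrt{\frac{5{,}000 - 100}{5{,}000 - 1}} \)
\( = 5{,}380{,}000 \pm 268{,}134.86 \) (fpc factor taken as 0.99)
Total output of the maize field is in the interval (5,111,865.14 , 5,648,134.86) Tung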
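The population-total interval with the finite population correction can be sketched as a function. One assumption is made loudly here: the sample size is taken as n = 100 Rai, because t = 1.9842 is the 95% t value for 99 degrees of freedom and n = 100 is the size that reproduces the stated margin. The function name and return layout are also illustrative.

```python
import math

def total_interval(N, n, x_bar, s, t):
    """Return (estimate, margin, fpc) for the population total N * x_bar."""
    fpc = math.sqrt((N - n) / (N - 1))  # finite population correction factor
    estimate = N * x_bar
    margin = t * N * s / math.sqrt(n) * fpc
    return estimate, margin, fpc

# n = 100 is an assumption (see the note above); other values are from the example.
estimate, margin, fpc = total_interval(N=5000, n=100, x_bar=1076, s=273, t=1.9842)
print(estimate, round(fpc, 4), round(margin, 1))
print(round(estimate - margin, 1), round(estimate + margin, 1))
```

With the unrounded fpc factor the margin comes out near 268,148, slightly above the value obtained when the factor is rounded to 0.99.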