# Chapter 6 Estimation

Embed Size (px)

description

### Transcript of Chapter 6 Estimation

PROBABILITY (6MTCOAE205)

Chapter 6

Estimation

Confidence Intervals

Contents of this chapter: Confidence Intervals for the Population

Mean, μ when Population Variance σ2 is Known when Population Variance σ2 is Unknown

Confidence Intervals for the Population Proportion, (large samples)

Confidence interval estimates for the variance of a normal population

p̂

Assist. Prof. Dr. İmran Göker 6-2

Definitions

An estimator of a population parameter is a random variable that depends on sample

information . . . whose value provides an approximation to this

unknown parameter

A specific value of that random variable is called an estimate

Assist. Prof. Dr. İmran Göker 6-3

Point and Interval Estimates

A point estimate is a single number, a confidence interval provides additional

information about variability

Point EstimateLower Confidence Limit

UpperConfidence Limit

Width of confidence interval

Assist. Prof. Dr. İmran Göker 6-4

Point Estimates

We can estimate a Population Parameter …

with a SampleStatistic

(a Point Estimate)

Mean

Proportion P

xμ

p̂

Assist. Prof. Dr. İmran Göker 6-5

Unbiasedness

A point estimator is said to be an unbiased estimator of the parameter if the expected value, or mean, of the sampling distribution of is ,

Examples: The sample mean is an unbiased estimator of μ The sample variance s2 is an unbiased estimator of σ2

The sample proportion is an unbiased estimator of P

θ̂

θ̂

θ)θE( ˆ

Assist. Prof. Dr. İmran Göker 6-6

x

p̂

Unbiasedness

is an unbiased estimator, is biased:

1θ̂ 2θ̂

θ̂θ

1θ̂ 2θ̂

(continued)

Assist. Prof. Dr. İmran Göker 6-7

Bias

Let be an estimator of

The bias in is defined as the difference between its mean and

The bias of an unbiased estimator is 0

θ̂

θ̂

θ)θE()θBias( ˆˆ

Assist. Prof. Dr. İmran Göker 6-8

Most Efficient Estimator

Suppose there are several unbiased estimators of The most efficient estimator or the minimum variance

unbiased estimator of is the unbiased estimator with the smallest variance

Let and be two unbiased estimators of , based on the same number of sample observations. Then,

is said to be more efficient than if

The relative efficiency of with respect to is the ratio of their variances:

)θVar()θVar( 21ˆˆ

)θVar()θVar( Efficiency Relative

1

2

ˆˆ

1θ̂ 2θ̂

1θ̂ 2θ̂

1θ̂ 2θ̂

Assist. Prof. Dr. İmran Göker 6-9

Confidence Intervals

How much uncertainty is associated with a point estimate of a population parameter?

An interval estimate provides more information about a population characteristic than does a point estimate

Such interval estimates are called confidence intervals

Assist. Prof. Dr. İmran Göker 6-10

Confidence Interval Estimate

An interval gives a range of values: Takes into consideration variation in sample

statistics from sample to sample Based on observation from 1 sample Gives information about closeness to

unknown population parameters Stated in terms of level of confidence

Can never be 100% confident

Assist. Prof. Dr. İmran Göker 6-11

Confidence Interval and Confidence Level

If P(a < < b) = 1 - then the interval from a to b is called a 100(1 - )% confidence interval of .

The quantity (1 - ) is called the confidence level of the interval ( between 0 and 1)

In repeated samples of the population, the true value of the parameter would be contained in 100(1 - )% of intervals calculated this way.

The confidence interval calculated in this manner is written as a < < b with 100(1 - )% confidence

Assist. Prof. Dr. İmran Göker 6-12

Estimation Process

(mean, μ, is unknown)

Population

Random Sample

Mean X = 50

Sample

I am 95% confident that μ is between 40 & 60.

Assist. Prof. Dr. İmran Göker 6-13

Confidence Level, (1-)

Suppose confidence level = 95% Also written (1 - ) = 0.95 A relative frequency interpretation:

From repeated samples, 95% of all the confidence intervals that can be constructed will contain the unknown true parameter

A specific interval either will contain or will not contain the true parameter No probability involved in a specific interval

(continued)

Assist. Prof. Dr. İmran Göker 6-14

General Formula

The general formula for all confidence intervals is:

The value of the reliability factor depends on the desired level of confidence

Point Estimate ± (Reliability Factor)(Standard Error)

Assist. Prof. Dr. İmran Göker 6-15

Confidence Intervals

Population Mean

σ2 Unknown

ConfidenceIntervals

PopulationProportion

σ2 Known

Assist. Prof. Dr. İmran Göker 6-16

PopulationVariance

Confidence Interval for μ(σ2 Known)

Assumptions Population variance σ2 is known Population is normally distributed If population is not normal, use large sample

Confidence interval estimate:

(where z/2 is the normal distribution value for a probability of /2 in each tail)

nσzxμ

nσzx α/2α/2

Assist. Prof. Dr. İmran Göker 6-17

Margin of Error The confidence interval,

Can also be written aswhere ME is called the margin of error

The interval width, w, is equal to twice the margin of error

nσzxμ

nσzx α/2α/2

MEx ±

nσzME α/2

Assist. Prof. Dr. İmran Göker 6-18

Reducing the Margin of Error

The margin of error can be reduced if

the population standard deviation can be reduced (σ↓)

The sample size is increased (n↑)

The confidence level is decreased, (1 – ) ↓

nσzME α/2

Assist. Prof. Dr. İmran Göker 6-19

Finding the Reliability Factor, z/2

Consider a 95% confidence interval:

z = -1.96 z = 1.96

.951

.0252α

.0252α

Point EstimateLower Confidence Limit

UpperConfidence Limit

Z units:

X units: Point Estimate

0

Find z.025 = ±1.96 from the standard normal distribution table

Assist. Prof. Dr. İmran Göker 6-20

Common Levels of Confidence

Commonly used confidence levels are 90%, 95%, and 99%

Confidence Level

Confidence Coefficient,

Z/2 value

1.281.6451.962.332.583.083.27

.80

.90

.95

.98

.99

.998

.999

80%90%95%98%99%99.8%99.9%

1

Assist. Prof. Dr. İmran Göker 6-21

Intervals and Level of Confidence

μμx

Confidence Intervals

Intervals extend from

to

100(1-)%of intervals constructed contain μ;

100()% do not.

Sampling Distribution of the Mean

nσzxLCL

nσzxUCL

x

x1

x2

/2 /21

Assist. Prof. Dr. İmran Göker 6-22

Example

A sample of 11 circuits from a large normal population has a mean resistance of 2.20 ohms. We know from past testing that the population standard deviation is 0.35 ohms.

Determine a 95% confidence interval for the true mean resistance of the population.

Assist. Prof. Dr. İmran Göker 6-23

Example

A sample of 11 circuits from a large normal population has a mean resistance of 2.20 ohms. We know from past testing that the population standard deviation is .35 ohms.

Solution:

2.4068μ1.9932

.2068 2.20

)11(.35/ 1.96 2.20

nσz x

±

±

±

(continued)

Assist. Prof. Dr. İmran Göker 6-24

Interpretation

We are 95% confident that the true mean resistance is between 1.9932 and 2.4068 ohms

Although the true mean may or may not be in this interval, 95% of intervals formed in this manner will contain the true mean

Assist. Prof. Dr. İmran Göker 6-25

Confidence Intervals

Population Mean

σ2 Unknown

ConfidenceIntervals

PopulationProportion

σ2 Known

Assist. Prof. Dr. İmran Göker 6-26

PopulationVariance

Student’s t Distribution

Consider a random sample of n observations with mean x and standard deviation s from a normally distributed population with mean μ

Then the variable

follows the Student’s t distribution with (n - 1) degrees of freedom

ns/μxt

Assist. Prof. Dr. İmran Göker 6-27

Confidence Interval for μ(σ2 Unknown)

If the population standard deviation σ is unknown, we can substitute the sample standard deviation, s

This introduces extra uncertainty, since s is variable from sample to sample

So we use the t distribution instead of the normal distribution

Assist. Prof. Dr. İmran Göker 6-28

Confidence Interval for μ(σ Unknown)

Assumptions Population standard deviation is unknown Population is normally distributed If population is not normal, use large sample

Use Student’s t Distribution Confidence Interval Estimate:

where tn-1,α/2 is the critical value of the t distribution with n-1 d.f. and an area of α/2 in each tail:

nStxμ

nStx α/21,-nα/21,-n

(continued)

α/2)tP(t α/21,n1n

Assist. Prof. Dr. İmran Göker 6-29

Margin of Error The confidence interval,

Can also be written as

where ME is called the margin of error:

MEx ±

nσtME α/21,-n

Assist. Prof. Dr. İmran Göker 6-30

nStxμ

nStx α/21,-nα/21,-n

Student’s t Distribution

The t is a family of distributions The t value depends on degrees of

freedom (d.f.) Number of observations that are free to vary after

sample mean has been calculated

d.f. = n - 1

Assist. Prof. Dr. İmran Göker 6-31

Student’s t Distribution

t0

t (df = 5)

t (df = 13)t-distributions are bell-shaped and symmetric, but have ‘fatter’ tails than the normal

Standard Normal

(t with df = ∞)

Note: t Z as n increases

Assist. Prof. Dr. İmran Göker 6-32

Student’s t Table

Upper Tail Area

df .10 .025.05

1 12.706

2

3 3.182

t0 2.920The body of the table contains t values, not probabilities

Let: n = 3 df = n - 1 = 2 = .10 /2 =.05

/2 = .05

3.078

1.886

1.638

6.314

2.920

2.353

4.303

Assist. Prof. Dr. İmran Göker 6-33

t distribution valuesWith comparison to the Z value

Confidence t t t Z Level (10 d.f.) (20 d.f.) (30 d.f.) ____

.80 1.372 1.325 1.310 1.282

.90 1.812 1.725 1.697 1.645

.95 2.228 2.086 2.042 1.960

.99 3.169 2.845 2.750 2.576

Note: t Z as n increases

Assist. Prof. Dr. İmran Göker 6-34

Example

A random sample of n = 25 has x = 50 and s = 8. Form a 95% confidence interval for μ

d.f. = n – 1 = 24, so

The confidence interval is

2.0639tt 24,.025α/21,n

53.302μ46.698258(2.0639)50μ

258(2.0639)50

nStxμ

nStx α/21,-n α/21,-n

Assist. Prof. Dr. İmran Göker 6-35

Confidence Intervals

Population Mean

σ2 Unknown

ConfidenceIntervals

PopulationProportion

σ2 Known

Assist. Prof. Dr. İmran Göker 6-36

PopulationVariance

Confidence Intervals for the Population Proportion

An interval estimate for the population proportion ( P ) can be calculated by adding an allowance for uncertainty to the sample proportion ( ) p̂

Assist. Prof. Dr. İmran Göker 6-37

Confidence Intervals for the Population Proportion, p

Recall that the distribution of the sample proportion is approximately normal if the sample size is large, with standard deviation

We will estimate this with sample data:

(continued)

n)p(1p ˆˆ

nP)P(1σP

Assist. Prof. Dr. İmran Göker 6-38

Confidence Interval Endpoints

Upper and lower confidence limits for the population proportion are calculated with the formula

where z/2 is the standard normal value for the level of confidence desired is the sample proportion n is the sample size nP(1−P) > 5

n)p(1pzpP

n)p(1pzp α/2α/2

ˆˆˆˆˆˆ

p̂

Assist. Prof. Dr. İmran Göker 6-39

Example

A random sample of 100 people shows that 25 are left-handed.

Form a 95% confidence interval for the true proportion of left-handers

Assist. Prof. Dr. İmran Göker 6-40

Example A random sample of 100 people shows

that 25 are left-handed. Form a 95% confidence interval for the true proportion of left-handers.

(continued)

0.3349P0.1651

100.25(.75)1.96

10025P

100.25(.75)1.96

10025

n)p(1pzpP

n)p(1pzp α/2α/2

ˆˆˆˆˆˆ

Assist. Prof. Dr. İmran Göker 6-41

Interpretation

We are 95% confident that the true percentage of left-handers in the population is between

16.51% and 33.49%.

Although the interval from 0.1651 to 0.3349 may or may not contain the true proportion, 95% of intervals formed from samples of size 100 in this manner will contain the true proportion.

Assist. Prof. Dr. İmran Göker 6-42

Confidence Intervals

Population Mean

σ2 Unknown

ConfidenceIntervals

PopulationProportion

σ2 Known

Assist. Prof. Dr. İmran Göker 6-43

PopulationVariance

Confidence Intervals for the Population Variance

The confidence interval is based on the sample variance, s2

Assumed: the population is normally distributed

Assist. Prof. Dr. İmran Göker

Goal: Form a confidence interval for the population variance, σ2

6-44

Confidence Intervals for the Population Variance

Assist. Prof. Dr. İmran Göker

The random variable

2

22

1n σ1)s(n

follows a chi-square distribution with (n – 1) degrees of freedom

(continued)

Where the chi-square value denotes the number for which2

, 1n

αχχ )P( 2α , 1n

21n

6-45

Confidence Intervals for the Population Variance

Assist. Prof. Dr. İmran Göker

The (1 - )% confidence interval for the population variance is

2/2 - 1 , 1n

22

2/2 , 1n

2 1)s(nσ1)s(n

αα χχ

(continued)

6-46

Example

You are testing the speed of a batch of computer processors. You collect the following data (in Mhz):

Sample size 17Sample mean 3004Sample std dev 74

Assist. Prof. Dr. İmran Göker

Assume the population is normal. Determine the 95% confidence interval for σx

2

6-47

Finding the Chi-square Values n = 17 so the chi-square distribution has (n – 1) = 16

degrees of freedom = 0.05, so use the the chi-square values with area

0.025 in each tail:

Assist. Prof. Dr. İmran Göker

probability α/2 = .025

216

216= 28.85

6.91

28.85

20.975 , 16

2/2 - 1 , 1n

20.025 , 16

2/2 , 1n

χχ

χχ

α

α

216 = 6.91

probability α/2 = .025

6-48

Calculating the Confidence Limits

The 95% confidence interval is

Assist. Prof. Dr. İmran Göker

Converting to standard deviation, we are 95% confident that the population standard deviation of

CPU speed is between 55.1 and 112.6 Mhz

2/2 - 1 , 1n

22

2/2 , 1n

2 1)s(nσ1)s(n

αα χχ

6.911)(74)(17σ

28.851)(74)(17 2

22

12683σ3037 2

6-49

Finite Populations

If the sample size is more than 5% of the population size (and sampling is without replacement) then a finite population correction factor must be used when calculating the standard error

Assist. Prof. Dr. İmran Göker 6-50

Finite PopulationCorrection Factor

Suppose sampling is without replacement and the sample size is large relative to the population size

Assume the population size is large enough to apply the central limit theorem

Apply the finite population correction factor when estimating the population variance

Assist. Prof. Dr. İmran Göker

1NnNfactor correction population finite

Ch.17-51

Estimating the Population Mean

Let a simple random sample of size n be taken from a population of N members with mean μ

The sample mean is an unbiased estimator of the population mean μ

The point estimate is:

Assist. Prof. Dr. İmran Göker

n

1iix

n1x

Ch.17-52

Finite Populations: Mean

If the sample size is more than 5% of the population size, an unbiased estimator for the variance of the sample mean is

So the 100(1-α)% confidence interval for the population mean is

Assist. Prof. Dr. İmran Göker 6-53

1NnN

nsσ

22xˆ

xα/21,-nxα/21,-n σtxμσt-x ˆˆ

Estimating the Population Total

Consider a simple random sample of size n from a population of size N

The quantity to be estimated is the population total Nμ

An unbiased estimation procedure for the population total Nμ yields the point estimate Nx

Assist. Prof. Dr. İmran Göker Ch.17-54

Estimating the Population Total

An unbiased estimator of the variance of the population total is

A 100(1 - )% confidence interval for the population total is

Assist. Prof. Dr. İmran Göker

xα/21,-nxα/21,-n σNtxNNμσNtxN ˆˆ

1-Nn)(N

nsNσN

222

x2

ˆ

Ch.17-55

Confidence Interval for Population Total: Example

Assist. Prof. Dr. İmran Göker

A firm has a population of 1000 accounts and wishes to estimate the total population value

A sample of 80 accounts is selected with average balance of $87.6 and standard

deviation of $22.3

Find the 95% confidence interval estimate of the total balance

Ch.17-56

Example Solution

Assist. Prof. Dr. İmran Göker

The 95% confidence interval for the population total balance is $82,837.53 to $92,362.47

2392.65724559.6σN

5724559.6999920

80(22.3)(1000)

1-Nn)(N

nsNσN

x

22

222

x2

ˆ

ˆ

22.3s 87.6,x 80, n 1000,N

392.6)(1.9905)(26)(1000)(87.σNtxN x79,0.025 ±± ˆ

92362.47Nμ82837.53

Ch.17-57

Estimating the Population Proportion

Let the true population proportion be P Let be the sample proportion from n

observations from a simple random sample The sample proportion, , is an unbiased

estimator of the population proportion, P

Assist. Prof. Dr. İmran Göker

p̂

p̂

Ch.17-58

Finite Populations: Proportion

If the sample size is more than 5% of the population size, an unbiased estimator for the variance of the population proportion is

So the 100(1-α)% confidence interval for the population proportion is

Assist. Prof. Dr. İmran Göker 6-59

1NnN

n)p(1-pσ2

p

ˆˆˆ ˆ

pα/2pα/2 σzpPσz-p ˆˆ ˆˆˆˆ

Estimation: Additional Topics

Assist. Prof. Dr. İmran Göker

Chapter Topics

Population Means,

Independent Samples

Population Means,

Dependent Samples

Sample Size Determination

Group 1 vs. independent

Group 2

Same group before vs. after

treatment

Finite Populations

Examples:

Population Proportions

Proportion 1 vs. Proportion 2

6-60

Large Populations

Confidence Intervals

Dependent Samples

Tests Means of 2 Related Populations Paired or matched samples Repeated measures (before/after) Use difference between paired values:

Eliminates Variation Among Subjects Assumptions:

Both Populations Are Normally Distributed

Assist. Prof. Dr. İmran Göker

Dependent samples

di = xi - yi

6-61

Mean Difference

The ith paired difference is di , where

Assist. Prof. Dr. İmran Göker

di = xi - yi

The point estimate for the population mean paired difference is d : n

dd

n

1ii

n is the number of matched pairs in the sample

1n

)d(dS

n

1i

2i

d

The sample standard deviation is:

Dependent samples

6-62

Confidence Interval forMean Difference

The confidence interval for difference between population means, μd , is

Assist. Prof. Dr. İmran Göker

Where n = the sample size (number of matched pairs in the paired sample)

nStdμ

nStd d

α/21,ndd

α/21,n

Dependent samples

6-63

Confidence Interval forMean Difference

The margin of error is

tn-1,/2 is the value from the Student’s t distribution with (n – 1) degrees of freedom for which

Assist. Prof. Dr. İmran Göker

(continued)

2α)tP(t α/21,n1n

nstME d

α/21,n

Dependent samples

6-64

Six people sign up for a weight loss program. You collect the following data:

Assist. Prof. Dr. İmran Göker

Paired Samples Example

Weight: Person Before (x) After (y) Difference, di

1 136 125 11 2 205 195 10 3 157 150 7 4 138 140 - 2 5 175 165 10 6 166 160 6 42

d = din

4.82

1n)d(d

S2

id

= 7.0

6-65

Dependent samples

For a 95% confidence level, the appropriate t value is tn-1,/2 = t5,.025 = 2.571

The 95% confidence interval for the difference between means, μd , is

Assist. Prof. Dr. İmran Göker

12.06μ1.94

64.82(2.571)7μ

64.82(2.571)7

nStdμ

nStd

d

d

dα/21,nd

dα/21,n

Paired Samples Example (continued)

Since this interval contains zero, we cannot be 95% confident, given this limited data, that the weight loss program helps people lose weight

6-66

Dependent samples

Difference Between Two Means:Independent Samples

Different data sources Unrelated Independent

Sample selected from one population has no effect on the sample selected from the other population

The point estimate is the difference between the two sample means:

Assist. Prof. Dr. İmran Göker

Population means, independent

samples

Goal: Form a confidence interval for the difference between two population means, μx – μy

x – y6-67

Assist. Prof. Dr. İmran Göker

Population means, independent

samples

Confidence interval uses z/2

Confidence interval uses a value from the Student’s t distribution

σx2 and σy

2 assumed equal

σx2 and σy

2 known

σx2 and σy

2 unknown

σx2 and σy

2 assumed unequal

(continued)

6-68

Difference Between Two Means:Independent Samples

σx2 and σy

2 Known

Assist. Prof. Dr. İmran Göker

Population means, independent

samples

Assumptions:

Samples are randomly and

independently drawn

both population distributions

are normal

Population variances are known

*σx2 and σy

2 known

σx2 and σy

2 unknown

6-69

σx2 and σy

2 Known

Assist. Prof. Dr. İmran Göker

Population means, independent

samples

…and the random variable

has a standard normal distribution

When σx and σy are known and both populations are

normal, the variance of X – Y is y

2y

x

2x2

YX nσ

nσσ

(continued)

*

Y

2y

X

2x

YX

nσ

nσ

)μ(μ)yx(Z

σx2 and σy

2 known

σx2 and σy

2 unknown

6-70

Confidence Interval, σx

2 and σy2 Known

Assist. Prof. Dr. İmran Göker

Population means,

independent samples The confidence interval for

μx – μy is:*

y

2Y

x

2X

α/2YXy

2Y

x

2X

α/2 nσ

nσz)yx(μμ

nσ

nσz)yx(

σx2 and σy

2 known

σx2 and σy

2 unknown

6-71

σx2 and σy

2 Unknown,Assumed Equal

Assist. Prof. Dr. İmran Göker

Population means, independent

samples

Assumptions:

Samples are randomly and independently drawn

Populations are normally distributed

Population variances are unknown but assumed equal

*σx2 and σy

2 assumed equal

σx2 and σy

2 known

σx2 and σy

2 unknown

σx2 and σy

2 assumed unequal

6-72

σx2 and σy

2 Unknown,Assumed Equal

Assist. Prof. Dr. İmran Göker

Population means, independent

samples

(continued)Forming interval estimates:

The population variances are assumed equal, so use the two sample standard deviations and pool them to estimate σ

use a t value with (nx + ny – 2) degrees of freedom

*σx2 and σy

2 assumed equal

σx2 and σy

2 known

σx2 and σy

2 unknown

σx2 and σy

2 assumed unequal

6-73

σx2 and σy

2 Unknown,Assumed Equal

Assist. Prof. Dr. İmran Göker

Population means, independent

samplesThe pooled variance is

(continued)

* 2nn1)s(n1)s(n

syx

2yy

2xx2

p

σx

2 and σy2

assumed equal

σx2 and σy

2 known

σx2 and σy

2 unknown

σx2 and σy

2 assumed unequal

6-74

Confidence Interval, σx

2 and σy2 Unknown, Equal

Assist. Prof. Dr. İmran Göker

The confidence interval for μ1 – μ2 is:

Where

*σx2 and σy

2 assumed equal

σx2 and σy

2 unknown

σx2 and σy

2 assumed unequal

y

2p

x

2p

α/22,nnYXy

2p

x

2p

α/22,nn ns

ns

t)yx(μμns

ns

t)yx(yxyx

2nn1)s(n1)s(n

syx

2yy

2xx2

p

6-75

Pooled Variance Example

You are testing two computer processors for speed. Form a confidence interval for the difference in CPU speed. You collect the following speed data (in Mhz):

CPUx CPUyNumber Tested 17 14Sample mean 3004 2538Sample std dev 74 56

Assist. Prof. Dr. İmran Göker

Assume both populations are normal with equal variances, and use 95% confidence

6-76

Calculating the Pooled Variance

Assist. Prof. Dr. İmran Göker

4427.031)141)-(17

56114741171)n(n

S1nS1nS

22

y

2yy

2xx2

p

(()1x

The pooled variance is:

The t value for a 95% confidence interval is:

2.045tt 0.025 , 29α/2 , 2nn yx

6-77

Calculating the Confidence Limits

The 95% confidence interval is

Assist. Prof. Dr. İmran Göker

y

2p

x

2p

α/22,nnYXy

2p

x

2p

α/22,nn ns

ns

t)yx(μμns

ns

t)yx(yxyx

144427.03

174427.03(2.054)2538)(3004μμ

144427.03

174427.03(2.054)2538)(3004 YX

515.31μμ416.69 YX

We are 95% confident that the mean difference in CPU speed is between 416.69 and 515.31 Mhz.

6-78

σx2 and σy

2 Unknown,Assumed Unequal

Assist. Prof. Dr. İmran Göker

Population means, independent

samples

Assumptions:

Samples are randomly and independently drawn

Populations are normally distributed

Population variances are unknown and assumed unequal

*σx

2 and σy2

assumed equal

σx2 and σy

2 known

σx2 and σy

2 unknown

σx2 and σy

2 assumed unequal

6-79

σx2 and σy

2 Unknown,Assumed Unequal

Assist. Prof. Dr. İmran Göker

Population means, independent

samples

(continued)

Forming interval estimates:

The population variances are assumed unequal, so a pooled variance is not appropriate

use a t value with degrees of freedom, where

σx2 and σy

2 known

σx2 and σy

2 unknown

*σx

2 and σy2

assumed equal

σx2 and σy

2 assumed unequal

1)/(nns

1)/(nns

)ns

()ns(

y

2

y

2y

x

2

x

2x

2

y

2y

x

2x

v

6-80

Confidence Interval, σx

2 and σy2 Unknown, Unequal

Assist. Prof. Dr. İmran Göker

The confidence interval for μ1 – μ2 is:

*σx

2 and σy2

assumed equal

σx2 and σy

2 unknown

σx2 and σy

2 assumed unequal

y

2y

x

2x

α/2,YXy

2y

x

2x

α/2, ns

nst)yx(μμ

ns

nst)yx(

1)/(nns

1)/(nns

)ns

()ns(

y

2

y

2y

x

2

x

2x

2

y

2y

x

2x

vWhere

6-81

Two Population Proportions

Assist. Prof. Dr. İmran Göker

Goal: Form a confidence interval for the difference between two population proportions, Px – Py

The point estimate for the difference is

Population proportions

Assumptions: Both sample sizes are large (generally at least 40 observations in each sample)

yx pp ˆˆ

6-82

Two Population Proportions

The random variable

is approximately normally distributed

Assist. Prof. Dr. İmran Göker

Population proportions

(continued)

y

yy

x

xx

yxyx

n)p(1p

n)p(1p

)p(p)pp(Z

ˆˆˆˆ

ˆˆ

6-83

Confidence Interval forTwo Population Proportions

Assist. Prof. Dr. İmran Göker

Population proportions

The confidence limits for Px – Py are:

y

yy

x

xxyx n

)p(1pn

)p(1pZ )pp(ˆˆˆˆˆˆ

2/

±

6-84

Example: Two Population Proportions

Form a 90% confidence interval for the difference between the proportion of men and the proportion of women who have college degrees.

In a random sample, 26 of 50 men and 28 of 40 women had an earned college degree

Assist. Prof. Dr. İmran Göker 6-85

Example: Two Population Proportions

Men:

Women:

Assist. Prof. Dr. İmran Göker

0.101240

0.70(0.30)50

0.52(0.48)n

)p(1pn

)p(1p

y

yy

x

xx

ˆˆˆˆ

0.525026px ˆ

0.704028py ˆ

(continued)

For 90% confidence, Z/2 = 1.645

6-86

Example: Two Population Proportions

The confidence limits are:

so the confidence interval is

-0.3465 < Px – Py < -0.0135

Since this interval does not contain zero we are 90% confident that the two proportions are not equal

Assist. Prof. Dr. İmran Göker

(continued)

(0.1012)1.645.70)(.52

n)p(1p

n)p(1pZ)pp(

y

yy

x

xxα/2yx

±

±

ˆˆˆˆˆˆ

6-87

Sample Size Determination

Assist. Prof. Dr. İmran Göker

For the Mean

DeterminingSample Size

For theProportion

6-88

Large Populations

Finite Populations

For the Mean

For theProportion

Margin of Error

The required sample size can be found to reach a desired margin of error (ME) with a specified level of confidence (1 - )

The margin of error is also called sampling error the amount of imprecision in the estimate of the

population parameter the amount added and subtracted to the point

estimate to form the confidence interval

Assist. Prof. Dr. İmran Göker 6-89

Sample Size Determination

Assist. Prof. Dr. İmran Göker

nσzx α/2±

nσzME α/2

Margin of Error (sampling error)

6-90

For the Mean

Large Population

s

Sample Size Determination

Assist. Prof. Dr. İmran Göker

nσzME α/2

(continued)

2

22α/2

MEσzn Now solve

for n to get

6-91

For the Mean

Large Populations

Sample Size Determination

To determine the required sample size for the mean, you must know:

The desired level of confidence (1 - ), which determines the z/2 value

The acceptable margin of error (sampling error), ME The population standard deviation, σ

Assist. Prof. Dr. İmran Göker

(continued)

6-92

Required Sample Size Example

If = 45, what sample size is needed to estimate the mean within ± 5 with 90% confidence?

Assist. Prof. Dr. İmran Göker

(Always round up)

219.195

(45)(1.645)ME

σzn 2

22

2

22α/2

So the required sample size is n = 220

6-93

Sample Size Determination:Population Proportion

Assist. Prof. Dr. İmran Göker

n)p(1pzp α/2

ˆˆˆ ±

n)p(1pzME α/2

ˆˆ

Margin of Error (sampling error)

6-94

For theProportion

Large Populations

Assist. Prof. Dr. İmran Göker

2

2α/2

MEz 0.25n

Substitute 0.25 for and solve for n to get

(continued)

n)p(1pzME α/2

ˆˆ

cannot be larger than 0.25, when = 0.5

)p(1p ˆˆ

p̂)p(1p ˆˆ

6-95

For theProportion

Large Populations

Sample Size Determination:Population Proportion

The sample and population proportions, and P, are generally not known (since no sample has been taken yet)

P(1 – P) = 0.25 generates the largest possible margin of error (so guarantees that the resulting sample size will meet the desired level of confidence)

To determine the required sample size for the proportion, you must know:

The desired level of confidence (1 - ), which determines the critical z/2 value

The acceptable sampling error (margin of error), ME Estimate P(1 – P) = 0.25

Assist. Prof. Dr. İmran Göker

(continued)

p̂

6-96

Sample Size Determination:Population Proportion

Required Sample Size Example:Population Proportion

How large a sample would be necessary to estimate the true proportion defective in a large population within ±3%, with 95% confidence?

Assist. Prof. Dr. İmran Göker 6-97

Required Sample Size Example

Assist. Prof. Dr. İmran Göker

Solution:

For 95% confidence, use z0.025 = 1.96

ME = 0.03Estimate P(1 – P) = 0.25

So use n = 1068

(continued)

1067.11(0.03)

6)(0.25)(1.9ME

z 0.25n 2

2

2

2α/2

6-98

Assist. Prof. Dr. İmran Göker 6-99

Sample Size Determination:Finite Populations

Finite Populations

For the Mean

1NnN

nσ)XVar(

2

A finite population correction factor is added:

1. Calculate the required sample size n0 using the prior formula:

2. Then adjust for the finite population:

2

22α/2

0 MEσzn

1)-(NnNnn

0

0

Sample Size Determination:Finite Populations

Assist. Prof. Dr. İmran Göker 6-100

Finite Populations

For theProportion

1NnN

nP)P(1-)pVar( ˆ

A finite population correction factor is added:

1. Solve for n:

2. The largest possible value for this expression (if P = 0.25) is:

3. A 95% confidence interval will extend ±1.96 from the sample proportion

P)P(11)σ(NP)NP(1n 2

p

ˆ

0.251)σ(NP)0.25(1n 2

p

ˆ

pσ ˆ

Example: Sample Size to Estimate Population Proportion

Assist. Prof. Dr. İmran Göker 6-101

(continued)

pσ ˆHow large a sample would be necessary to estimate within ±5% the true proportion of college graduates in a population of 850 people with 95% confidence?

Required Sample Size Example

Assist. Prof. Dr. İmran Göker

Solution: For 95% confidence, use z0.025 = 1.96 ME = 0.05

So use n = 265

(continued)

6-102

0.02551σ0.05σ1.96 pp ˆˆ

264.80.25551)(849)(0.02

)(0.25)(8500.251)σ(N

0.25Nn 22p

max

ˆ