Transcript
Page 1: Unit3:  Statistical Inferences

Unit3: Statistical Inferences

Wenyaw Chan, Division of Biostatistics, School of Public Health

University of Texas Health Science Center at Houston

Page 2: Unit3:  Statistical Inferences

Estimation

• Point Estimates

– A point estimate of a parameter θ is a single number used as an estimate of the value of θ.

– e.g. A natural estimate of the population mean μ is the sample mean $\bar{X} = \sum_{i=1}^{n} X_i / n$.

• Interval Estimation

– If a random interval I = (L, U) satisfies Pr(L < θ < U) = 1 − α, the observed values of L and U for a given sample are called a 1 − α confidence interval estimate for θ.

Which one is more accurate? Which one is more precise?

Page 3: Unit3:  Statistical Inferences

Estimation

What to estimate?

• B(n, p): proportion p

• Poisson(μ): mean μ

• N(μ, σ²): mean μ and/or variance σ²

Page 4: Unit3:  Statistical Inferences

Estimation of the Mean of a Distribution

• A point estimator of the population mean μ is the sample mean $\bar{X}$.

• The sampling distribution of $\bar{X}$ is the distribution of values of $\bar{X}$ over all possible samples of size n that could have been selected from the reference population.

• $E(\bar{X}) = \mu$

Page 5: Unit3:  Statistical Inferences

Estimation

• An estimator of a parameter is an unbiased estimator if its expectation is equal to the parameter.

• Note: Unbiasedness alone is not a sufficient criterion for choosing an estimator.

• The unbiased estimator with the minimum variance (MVUE) is preferred.

• If the population is normal, then $\bar{X}$ is the MVUE of μ.

Page 6: Unit3:  Statistical Inferences

Sample Mean

• Standard error (of the mean) = standard deviation of the sample mean $= \sqrt{\sigma^2/n} = \sigma/\sqrt{n}$

• The estimated standard error $= s/\sqrt{n}$,

where s: sample standard deviation.
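A minimal Python sketch of the estimated standard error s/√n. The data values are hypothetical and chosen only for illustration; the function name is not from the slides.

```python
import math
import statistics

def estimated_standard_error(sample):
    """Return (sample mean, estimated standard error s / sqrt(n))."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample s.d., n - 1 in the denominator
    return xbar, s / math.sqrt(n)

# Hypothetical data, for illustration only
xbar, se = estimated_standard_error([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(xbar, se)
```

The true standard error σ/√n requires the population σ; replacing σ by s is what makes this an estimate.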

Page 7: Unit3:  Statistical Inferences

Central Limit Theorem

• Let X1,…,Xn be a random sample from some population with mean μ and variance σ².

Then, for large n,

$$\bar{X} \;\dot\sim\; N\!\left(\mu, \frac{\sigma^2}{n}\right)$$
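The theorem can be checked by simulation. The sketch below assumes an Exponential(1) population (mean 1, variance 1), which is clearly non-normal; the population, sample size, and seed are illustrative choices, not part of the slides.

```python
import random
import statistics

def sample_means(n, reps, rng):
    """Means of `reps` random samples of size n from Exponential(rate=1)."""
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

rng = random.Random(0)  # fixed seed so the run is repeatable
means = sample_means(n=50, reps=4000, rng=rng)
center = statistics.mean(means)   # CLT: close to mu = 1
spread = statistics.stdev(means)  # CLT: close to sigma / sqrt(n) = 1 / sqrt(50)
print(center, spread)
```

A histogram of `means` would look approximately bell-shaped even though the underlying population is skewed.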

Page 8: Unit3:  Statistical Inferences

Interval Estimation

• Let X1,…,Xn be a random sample from a normal population N(μ, σ²). If σ² is known, a 95% confidence interval (C.I.) for μ is

$$\left(\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}},\; \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right)$$

why? (next slide)

Page 9: Unit3:  Statistical Inferences

Interval Estimation

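If $\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right)$, then $\Pr\!\left(-1.96 \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le 1.96\right) = .95$,

i.e.

$$\begin{aligned}
-1.96\,\frac{\sigma}{\sqrt{n}} &\le \bar{X} - \mu \le 1.96\,\frac{\sigma}{\sqrt{n}} \\
-\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} &\le -\mu \le -\bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}} \\
\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} &\le \mu \le \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}
\end{aligned}$$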
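A short sketch of the known-σ interval X̄ ± z·σ/√n in Python, using the standard library's `statistics.NormalDist` to obtain the 1.96 multiplier. The input numbers are hypothetical.

```python
import math
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, level=0.95):
    """Two-sided C.I. for the mean of a normal population with known sigma."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 when level = 0.95
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical inputs, for illustration only
lo, hi = z_confidence_interval(xbar=120.0, sigma=10.0, n=25)
print(lo, hi)
```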

Page 10: Unit3:  Statistical Inferences

Interval Estimation

Interpretation of Confidence Interval

• Over the collection of 95% confidence intervals that could be constructed from repeated random samples of size n, 95% of them will contain the parameter μ.

• It is wrong to say: There is a 95% chance that the parameter will fall within a particular 95% confidence interval.
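The repeated-sampling interpretation can be illustrated by simulation: draw many samples, build a 95% interval from each, and count how often the interval covers the true mean. This is only a sketch; the population parameters, sample size, and seed are arbitrary assumptions.

```python
import math
import random
from statistics import NormalDist

def coverage(mu, sigma, n, reps, rng, level=0.95):
    """Fraction of known-sigma intervals, over repeated samples, that contain mu."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - half < mu < xbar + half:
            hits += 1
    return hits / reps

rate = coverage(mu=120.0, sigma=10.0, n=20, reps=5000, rng=random.Random(1))
print(rate)  # should be near 0.95
```

Each individual interval either contains μ or it does not; the 95% describes the procedure across samples, not any single interval.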

Page 11: Unit3:  Statistical Inferences

Interval Estimation

• Note:

1. When σ and n are fixed, the 99% C.I. is wider than the 95% C.I.

2. If the width of the C.I. is specified, the sample size can be determined.

n ↑ ⇒ length ↓; σ ↑ ⇒ length ↑
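For the known-σ interval the total width is 2·z·σ/√n, so a target width gives n = (2·z·σ/width)², rounded up. A sketch under that assumption, with hypothetical inputs:

```python
import math
from statistics import NormalDist

def n_for_ci_width(sigma, width, level=0.95):
    """Smallest n whose known-sigma C.I. is no wider than `width`."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return math.ceil((2 * z * sigma / width) ** 2)

n95 = n_for_ci_width(sigma=10.0, width=4.0)              # 95% interval
n99 = n_for_ci_width(sigma=10.0, width=4.0, level=0.99)  # 99% interval needs more
print(n95, n99)
```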

Page 12: Unit3:  Statistical Inferences

Hypothesis Testing

• Null hypothesis (H0): the statement to be tested, usually reflecting the status quo.

• Alternative hypothesis (H1): the logical complement of H0.

• Note: the null hypothesis is analogous to the defendant in court. It is presumed to be true unless the data argue overwhelmingly to the contrary.

Page 13: Unit3:  Statistical Inferences

Hypothesis Testing

• Four possible outcomes of the decision:

                    Truth
Decision            H0              H1
Accept H0           OK              Type II error
Reject H0           Type I error    OK

• Notation:

α = Pr(Type I error) = level of significance

β = Pr(Type II error)

1 − β = power = Pr(reject H0 | H1 is true)

Page 14: Unit3:  Statistical Inferences

Hypothesis Testing

• Goal: to make α and β both small

• Facts: α ↓ then β ↑; β ↓ then α ↑

• General Strategy: fix α, minimize β

Page 15: Unit3:  Statistical Inferences

Testing for the Population Mean

• When the sample is from a normal population

H0 : μ = 120 vs H1 : μ < 120

• The best test is based on $\bar{X}$, which is called the test statistic. The "best test" means that the test has the highest power among all tests with a given type I error.

Is there any bad test? Yes.

• Rejection Region:

– range of values of the test statistic for which H0 is rejected.

Page 16: Unit3:  Statistical Inferences

One-tailed test

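• Our rejection region is $\bar{X} < c$.

• Now,

$$\begin{aligned}
\alpha &= \Pr(\text{Type I error}) = \Pr(\bar{X} < c \mid H_0 \text{ is true}) \\
&= \Pr\!\left(\bar{X} < c \,\middle|\, \bar{X} \sim N\!\left(\mu_0, \frac{\sigma^2}{n}\right)\right) \\
&= \Phi\!\left(\frac{c - \mu_0}{\sigma/\sqrt{n}}\right)
\end{aligned}$$

i.e. $\dfrac{c - \mu_0}{\sigma/\sqrt{n}} = z_{\alpha}$ or $c = \mu_0 + z_{\alpha}\,\sigma/\sqrt{n}$.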
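The cutoff c = μ0 + z_α·σ/√n can be computed directly, where z_α is the lower α percentile of the standard normal (negative for small α). A sketch with hypothetical values for σ and n:

```python
import math
from statistics import NormalDist

def lower_tail_critical_value(mu0, sigma, n, alpha=0.05):
    """Cutoff c such that Pr(Xbar < c | mu = mu0) = alpha, sigma known."""
    z_alpha = NormalDist().inv_cdf(alpha)  # lower percentile, about -1.645 at 0.05
    return mu0 + z_alpha * sigma / math.sqrt(n)

# Hypothetical sigma and n; mu0 = 120 as in the example hypothesis
c = lower_tail_critical_value(mu0=120.0, sigma=10.0, n=25)
type_i = NormalDist(120.0, 10.0 / math.sqrt(25)).cdf(c)  # recovers alpha
print(c, type_i)
```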

Page 17: Unit3:  Statistical Inferences

Result

• To test H0 : μ = μ0 vs H1 : μ < μ0, based on a sample taken from a normal population with mean μ and variance σ² unknown, the test statistic is

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}.$$

• Assume the level of significance is α. Then,

– if $t < t_{n-1,\alpha}$, we reject H0.

– if $t \ge t_{n-1,\alpha}$, we do not reject H0.
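A sketch of the lower-tailed one-sample t test. The Python standard library has no t distribution, so the critical value t_{9, 0.05} = −1.833 is taken from a t table and hard-coded as an assumption; the data are hypothetical.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t = (xbar - mu0) / (s / sqrt(n))."""
    n = len(sample)
    return (statistics.mean(sample) - mu0) / (statistics.stdev(sample) / math.sqrt(n))

# Hypothetical sample of size n = 10, for illustration only
sample = [118.0, 112.0, 121.0, 109.0, 115.0, 117.0, 113.0, 119.0, 111.0, 115.0]
t = one_sample_t(sample, mu0=120.0)

T_CRIT = -1.833      # t_{9, 0.05}, looked up in a t table (df = n - 1 = 9)
reject = t < T_CRIT  # lower-tailed rejection rule
print(t, reject)
```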

Page 18: Unit3:  Statistical Inferences

P-value

• The minimum α-level at which we can reject H0 based on the sample.

• The p-value can also be thought of as the probability of obtaining a test statistic as extreme as or more extreme than the actual test statistic obtained from the sample, given that the null hypothesis is true.

Page 19: Unit3:  Statistical Inferences

Remarks

• Two different approaches to determining statistical significance:

– Critical value method

– P-value method

Page 20: Unit3:  Statistical Inferences

One-tailed test

• Testing H0 : µ = µ0 vs H1 : µ > µ0 when σ² is unknown and the population is normal

Test Statistic: $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$

Rejection Region: $t > t_{n-1,1-\alpha}$

p-value = $1 - F_{t,n-1}(t)$, where $F_{t,n-1}(\cdot)$ is the cdf of the t distribution with df = n−1.

• Note: If σ² is known, the s in the test statistic is replaced by σ, $t_{n-1,1-\alpha}$ in the rejection region is replaced by $z_{1-\alpha}$, and $F_{t,n-1}(t)$ is replaced by Φ(t).

Page 21: Unit3:  Statistical Inferences

Testing For Two-Sided Alternative

• Let X1,…,Xn be a random sample from the population N(µ, σ²), where σ² is unknown.

• H0 : µ=µ0 vs H1 : µ≠µ0

– Test Statistic: $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$

– Rejection Region: $|t| > t_{n-1,1-\alpha/2}$

– p-value (see figures on next slide):

$$p = \begin{cases} 2\,F_{t,n-1}(t), & \text{if } t \le 0 \\ 2\,[1 - F_{t,n-1}(t)], & \text{if } t > 0 \end{cases}$$

• Warning: exact p-value requires use of computer.
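The two-branch p-value rule can be sketched in Python. Because the standard library has no t cdf, this sketch swaps in the known-σ version of the test, where the statistic uses σ instead of s and Φ replaces the t cdf; it is not the unknown-variance t test itself. Inputs are hypothetical.

```python
import math
from statistics import NormalDist

def two_sided_p_value_known_sigma(xbar, mu0, sigma, n):
    """Two-sided p-value for H0: mu = mu0 when sigma is known (z test)."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    phi = NormalDist().cdf(z)
    return 2 * phi if z <= 0 else 2 * (1 - phi)

# Hypothetical inputs: z = -1.96 and z = +1.96 give the same p-value
p_low = two_sided_p_value_known_sigma(xbar=116.08, mu0=120.0, sigma=10.0, n=25)
p_high = two_sided_p_value_known_sigma(xbar=123.92, mu0=120.0, sigma=10.0, n=25)
print(p_low, p_high)
```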

Page 22: Unit3:  Statistical Inferences

Testing For Two-Sided Alternative

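[Figures: the two-sided p-value shown as the total area in both tails of the null distribution centered at µ0. If $\bar{x} > \mu_0$, the tails lie below $2\mu_0 - \bar{x}$ and above $\bar{x}$. If $\bar{x} \le \mu_0$, the tails lie below $\bar{x}$ and above $2\mu_0 - \bar{x}$.]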

Page 23: Unit3:  Statistical Inferences

The Power of A Test

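• To test H0 : µ=µ0 vs H1 : µ<µ0 in a normal population with known variance σ², the power is

$$\Phi\!\left[z_{\alpha} + (\mu_0 - \mu_1)\sqrt{n}/\sigma\right],$$

where µ1 is the true mean under H1.

• Review: Power = Pr[rejecting H0 | H0 is false]

• Factors Affecting the Power

1. α ↓ ⇒ $z_{\alpha}$ ↓ ⇒ power ↓

2. |µ0 − µ1| ↑ ⇒ power ↑

3. σ ↑ ⇒ power ↓

4. n ↑ ⇒ power ↑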
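A sketch of the power formula Φ[z_α + (µ0 − µ1)√n/σ] for the lower-tailed test with known σ. The parameter values are hypothetical; changing n shows how one of the listed factors moves the power.

```python
import math
from statistics import NormalDist

def power_lower_tail(mu0, mu1, sigma, n, alpha=0.05):
    """Power of the lower-tailed z test of H0: mu = mu0 when the true mean is mu1."""
    nd = NormalDist()
    return nd.cdf(nd.inv_cdf(alpha) + (mu0 - mu1) * math.sqrt(n) / sigma)

# Hypothetical values, for illustration only
power_25 = power_lower_tail(mu0=120.0, mu1=115.0, sigma=10.0, n=25)
power_50 = power_lower_tail(mu0=120.0, mu1=115.0, sigma=10.0, n=50)  # larger n
print(power_25, power_50)
```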

Page 24: Unit3:  Statistical Inferences

The Power of The 1-Sample T Test

• To test H0 : µ=µ0 vs H1 : µ<µ0 in a normal population with unknown variance σ², the power, for true mean µ1 and true s.d. = σ, is $F(t_{n-1,.05})$, where F( ) is the c.d.f. of the non-central t distribution with df = n−1 and non-centrality $(\mu_1 - \mu_0)\sqrt{n}/\sigma$.

Notes:

1. $t_{n-1,0.05} = -t_{n-1,0.95}$.

2. If X and Y are independent random variables such that Y ~ N(δ, 1) and X ~ χ² with d.f. = m, then $Y/\sqrt{X/m}$ is said to have a non-central t distribution with non-centrality δ.

Page 25: Unit3:  Statistical Inferences

Power Function For Two-Sided Alternative

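• To test H0 : µ=µ0 vs H1 : µ≠µ0 in a normal population with known variance σ², the power is

$$\Phi\!\left[-z_{1-\alpha/2} + (\mu_0 - \mu_1)\sqrt{n}/\sigma\right] + \Phi\!\left[-z_{1-\alpha/2} + (\mu_1 - \mu_0)\sqrt{n}/\sigma\right],$$

where µ1 is the true alternative.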
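A sketch of the two-sided power function for known σ, summing the two Φ terms. The numbers are hypothetical. As a sanity check, the power at µ1 = µ0 equals α.

```python
import math
from statistics import NormalDist

def power_two_sided(mu0, mu1, sigma, n, alpha=0.05):
    """Power of the two-sided z test of H0: mu = mu0 when the true mean is mu1."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2)
    shift = (mu0 - mu1) * math.sqrt(n) / sigma
    return nd.cdf(-z + shift) + nd.cdf(-z - shift)

# Hypothetical values, for illustration only
power_alt = power_two_sided(mu0=120.0, mu1=115.0, sigma=10.0, n=25)
power_null = power_two_sided(mu0=120.0, mu1=120.0, sigma=10.0, n=25)  # equals alpha
print(power_alt, power_null)
```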

Page 26: Unit3:  Statistical Inferences

Case of Unknown Variance

• For the same test with an unknown variance population, the power is $F(-t_{n-1,1-\alpha/2}) + 1 - F(t_{n-1,1-\alpha/2})$, where F( ) is the c.d.f. of the non-central t distribution with df = n−1 and non-centrality $(\mu_1 - \mu_0)\sqrt{n}/\sigma$.

Page 27: Unit3:  Statistical Inferences

Sample Size Determination

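For example: H0 : µ=µ0 vs H1 : µ<µ0

power: $\Phi\!\left[z_{\alpha} + (\mu_0 - \mu_1)\sqrt{n}/\sigma\right] = 1 - \beta$, if $\mu_1 < \mu_0$.

Hence,

$$\begin{aligned}
z_{\alpha} + (\mu_0 - \mu_1)\sqrt{n}/\sigma &= z_{1-\beta} \\
\sqrt{n} &= \frac{(z_{1-\beta} + z_{1-\alpha})\,\sigma}{\mu_0 - \mu_1} \\
n &= \frac{\sigma^2\,(z_{1-\alpha} + z_{1-\beta})^2}{(\mu_0 - \mu_1)^2}
\end{aligned}$$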
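A sketch of the one-sided sample size formula n = σ²(z_{1−α} + z_{1−β})²/(µ0 − µ1)², rounded up to the next integer. The design values are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_one_sided(mu0, mu1, sigma, alpha=0.05, power=0.80):
    """Smallest n giving at least `power` for a one-sided z test, sigma known."""
    nd = NormalDist()
    z_sum = nd.inv_cdf(1 - alpha) + nd.inv_cdf(power)  # z_{1-alpha} + z_{1-beta}
    return math.ceil(sigma ** 2 * z_sum ** 2 / (mu0 - mu1) ** 2)

# Hypothetical design values, for illustration only
n_needed = sample_size_one_sided(mu0=120.0, mu1=115.0, sigma=10.0)
print(n_needed)
```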

Page 28: Unit3:  Statistical Inferences

Factor Affecting Sample Size

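1. σ² ↑ ⇒ n ↑

2. α ↓ ⇒ n ↑

3. 1 − β ↑ ⇒ n ↑

4. |µ0 − µ1| ↑ ⇒ n ↓

• To test H0 : µ=µ0 vs H1 : µ≠µ0, where σ² is known, the sample size calculation is

$$n = \frac{\sigma^2\,(z_{1-\alpha/2} + z_{1-\beta})^2}{(\mu_0 - \mu_1)^2}$$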
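The two-sided formula differs from the one-sided one only in using z_{1−α/2} in place of z_{1−α}. The sketch below handles both with a hypothetical `two_sided` flag, so the effect on n can be compared; the design values are illustrative.

```python
import math
from statistics import NormalDist

def sample_size(mu0, mu1, sigma, alpha=0.05, power=0.80, two_sided=True):
    """Sample size for a z test of H0: mu = mu0 against true mean mu1, sigma known."""
    nd = NormalDist()
    z_a = nd.inv_cdf(1 - alpha / 2) if two_sided else nd.inv_cdf(1 - alpha)
    z_b = nd.inv_cdf(power)  # z_{1-beta}
    return math.ceil(sigma ** 2 * (z_a + z_b) ** 2 / (mu0 - mu1) ** 2)

# Hypothetical design values, for illustration only
n_two = sample_size(120.0, 115.0, 10.0)                   # two-sided
n_one = sample_size(120.0, 115.0, 10.0, two_sided=False)  # one-sided
n_big_sigma = sample_size(120.0, 115.0, 20.0)             # larger sigma, larger n
print(n_two, n_one, n_big_sigma)
```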

Page 29: Unit3:  Statistical Inferences

Relationship between Hypothesis Testing and Confidence Interval

• To test H0 : µ=µ0 vs H1 : µ≠µ0, H0 is rejected with a two-sided level α test if and only if the two-sided 100%*(1 - α) confidence interval for µ does not contain µ0.

Page 30: Unit3:  Statistical Inferences

One Sample Test for the Variance of A Normal Population

Page 31: Unit3:  Statistical Inferences

One Sample Test for A Proportion

Page 32: Unit3:  Statistical Inferences

Exact Method

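• If $\hat{p} < p_0$, the p-value

$$= 2 \times \Pr[X \le \text{\# of events in the observation} \mid X \sim B(n, p_0)] = 2 \sum_{k=0}^{\text{\# of events}} \binom{n}{k} p_0^{k} (1 - p_0)^{n-k}$$

• If $\hat{p} \ge p_0$, the p-value

$$= 2 \times \Pr[X \ge \text{\# of events in the observation} \mid X \sim B(n, p_0)] = 2 \sum_{k=\text{\# of events}}^{n} \binom{n}{k} p_0^{k} (1 - p_0)^{n-k}$$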
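A sketch of the exact binomial p-value using `math.comb`. Capping the result at 1 is an added assumption (doubling a tail can exceed 1), not something stated on the slide; the example counts are hypothetical.

```python
from math import comb

def exact_binomial_p_value(x, n, p0):
    """Two-sided exact p-value for H0: p = p0 given x events in n trials."""
    def pmf(k):
        return comb(n, k) * p0 ** k * (1 - p0) ** (n - k)
    if x / n < p0:
        tail = sum(pmf(k) for k in range(0, x + 1))  # Pr(X <= x)
    else:
        tail = sum(pmf(k) for k in range(x, n + 1))  # Pr(X >= x)
    return min(2 * tail, 1.0)  # cap at 1: an assumption, see lead-in

# Hypothetical counts, for illustration only
p_few = exact_binomial_p_value(x=2, n=10, p0=0.5)
p_many = exact_binomial_p_value(x=8, n=10, p0=0.5)
print(p_few, p_many)
```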

Page 33: Unit3:  Statistical Inferences

Power and Sample size

Page 34: Unit3:  Statistical Inferences

One-Sample Inference for the Poisson Distribution

• X ~ Poisson with mean µ

• To test H0 : µ=µ0 vs H1 : µ≠µ0 at α level of significance,

– Obtain a two-sided 100(1 − α)% C.I. for µ, say (C1, C2)

– If µ0 ∈ (C1, C2), we accept H0; otherwise we reject H0.

Page 35: Unit3:  Statistical Inferences

One-Sample Inference for the Poisson Distribution

• The p-value (for the above two-sided test)

– If observed x < µ0, then $P = \min[2\,F(x \mid \mu_0),\, 1]$

– If observed x > µ0, then $P = \min[2\,(1 - F(x - 1 \mid \mu_0)),\, 1]$

where F(x | µ0) is the Poisson c.d.f. with mean = µ0.
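A sketch of the exact Poisson p-value, with the Poisson c.d.f. computed by direct summation. The slide covers x < µ0 and x > µ0; treating x = µ0 with the upper-tail branch is an assumption made here. The example counts are hypothetical.

```python
import math

def poisson_cdf(x, mu):
    """F(x | mu) = Pr(X <= x) for X ~ Poisson(mu), by summing the pmf."""
    if x < 0:
        return 0.0
    term = math.exp(-mu)  # Pr(X = 0)
    total = term
    for k in range(1, x + 1):
        term *= mu / k    # Pr(X = k) from Pr(X = k - 1)
        total += term
    return total

def poisson_exact_p_value(x, mu0):
    """Two-sided exact p-value for H0: mu = mu0 given an observed count x."""
    if x < mu0:
        return min(2 * poisson_cdf(x, mu0), 1.0)
    # x >= mu0 (the x == mu0 case is an assumption, see lead-in)
    return min(2 * (1 - poisson_cdf(x - 1, mu0)), 1.0)

p_zero = poisson_exact_p_value(x=0, mu0=1.0)
p_four = poisson_exact_p_value(x=4, mu0=1.0)
print(p_zero, p_four)
```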

Page 36: Unit3:  Statistical Inferences

Large-Sample Test for Poisson (for µ0 ≥ 10)

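• To test H0 : µ=µ0 vs H1 : µ≠µ0 at α level of significance,

– Test Statistic:

$$X^2 = \frac{(x - \mu_0)^2}{\mu_0} = \mu_0 \left(\frac{SMR}{100} - 1\right)^2 \sim \chi^2_1 \text{ under } H_0,$$

where SMR = 100 × x/µ0.

– Rejection Region: $X^2 > \chi^2_{1,1-\alpha}$

– p-value: $\Pr(\chi^2_1 > X^2)$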
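A sketch of the large-sample test. The standard library has no chi-square distribution, so the sketch relies on the identity that a χ² variable with 1 d.f. is the square of a standard normal, giving Pr(χ²₁ > x) = erfc(√(x/2)). The observed and expected counts are hypothetical.

```python
import math

def poisson_large_sample_test(x, mu0, alpha=0.05):
    """Return (X2, p-value, reject?) for H0: mu = mu0, intended for mu0 >= 10."""
    x2 = (x - mu0) ** 2 / mu0
    # Pr(chi2_1 > x2) = Pr(|Z| > sqrt(x2)) = erfc(sqrt(x2 / 2))
    p_value = math.erfc(math.sqrt(x2 / 2))
    return x2, p_value, p_value < alpha  # same decision as X2 > chi2_{1, 1 - alpha}

# Hypothetical counts: 30 observed events against 20 expected
x2, p_value, reject = poisson_large_sample_test(x=30, mu0=20.0)
print(x2, p_value, reject)
```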