CHAPTER 6 Statistical Inference & Hypothesis Testing

• 6.1 - One Sample: Mean μ, Variance σ², Proportion π

• 6.2 - Two Samples: Means (μ1 vs. μ2), Variances (σ1² vs. σ2²), Proportions (π1 vs. π2)

• 6.3 - Multiple Samples: Means (μ1, …, μk), Variances (σ1², …, σk²), Proportions (π1, …, πk)

Consider two independent populations… and a random variable X, normally distributed in each.

POPULATION 1: X1 ~ N(μ1, σ1)

POPULATION 2: X2 ~ N(μ2, σ2)

Classic Example: “Randomized Clinical Trial”… Pop 1 = Treatment, Pop 2 = Control

Null Hypothesis H0: μ1 = μ2, i.e., μ1 – μ2 = 0 (“No mean difference”). Test at significance level α.

Take a random sample of size n1 from Population 1 and a random sample of size n2 from Population 2.

Sampling Distribution of X̄1 – X̄2 = ?

Sampling distributions of the two sample means (as in X ~ N(μ, σ), the second parameter is the standard deviation):

$$\bar{X}_1 \sim N\!\left(\mu_1,\ \frac{\sigma_1}{\sqrt{n_1}}\right), \qquad \bar{X}_2 \sim N\!\left(\mu_2,\ \frac{\sigma_2}{\sqrt{n_2}}\right)$$

Recall from section 4.1 (Discrete Models):

Mean(X – Y) = Mean(X) – Mean(Y)

and if X and Y are independent…

Var(X – Y) = Var(X) + Var(Y)

Therefore the difference of the sample means is also normally distributed:

$$\bar{X}_1 - \bar{X}_2 \sim N\!\left(\mu_1 - \mu_2,\ \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right)$$

where μ1 – μ2 = 0 under H0.
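The mean and variance rules above can be checked numerically. The following is a minimal simulation sketch, not part of the original slides; every parameter value in it (means, standard deviations, sample sizes, number of replications, seed) is a hypothetical choice made only for illustration.

```r
# Simulation check of the sampling distribution of Xbar1 - Xbar2.
# All parameter values below are hypothetical, chosen only for illustration.
set.seed(1)
mu1 <- 10; sigma1 <- 3; n1 <- 20
mu2 <- 8;  sigma2 <- 5; n2 <- 30

# Draw many pairs of independent samples and record the difference of means.
diffs <- replicate(20000,
                   mean(rnorm(n1, mu1, sigma1)) - mean(rnorm(n2, mu2, sigma2)))

# Theoretical mean and standard deviation of Xbar1 - Xbar2.
theory_mean <- mu1 - mu2
theory_sd   <- sqrt(sigma1^2 / n1 + sigma2^2 / n2)

print(c(simulated = mean(diffs), theory = theory_mean))
print(c(simulated = sd(diffs),   theory = theory_sd))
```

The simulated mean and standard deviation should land close to the theoretical values, with the gap shrinking as the number of replications grows.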



Null Distribution

$$\bar{X}_1 - \bar{X}_2 \sim N\!\left(0,\ \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right)$$

The standard deviation of this null distribution is the standard error (s.e.) of X̄1 – X̄2.

But what if σ1² and σ2² are unknown?

Then use sample estimates s1² and s2² with Z- or t-test, if n1 and n2 are large:

$$\bar{X}_1 - \bar{X}_2 \sim N\!\left(0,\ \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\right)$$

(But what if n1 and n2 are small?)

Later…

Example: X = “$ Cost of a certain medical service”

Assume X is known to be normally distributed at each of k = 2 health care facilities (“groups”).

Hospital: X1 ~ N(μ1, σ1)     Clinic: X2 ~ N(μ2, σ2)

• Null Hypothesis H0: μ1 = μ2, i.e., μ1 – μ2 = 0 (“No difference exists.”) 2-sided test at significance level α = .05

• Data
  Sample 1: n1 = 137, x̄1 = 630, s1² = 788.5
  Sample 2: n2 = 140, x̄2 = 546, s2² = 1663.0

NOTE: x̄1 – x̄2 = 84 > 0

Null Distribution

$$\bar{X}_1 - \bar{X}_2 \sim N\!\left(0,\ \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\right) = N\!\left(0,\ \sqrt{\frac{788.5}{137} + \frac{1663.0}{140}}\right) = N(0,\ 4.2)$$

95% Confidence Interval for μ1 – μ2:

95% Margin of Error = (1.96)(4.2) = 8.232

(84 – 8.232, 84 + 8.232) = (75.768, 92.232), which does not contain 0

Z-score = (84 – 0) / 4.2 = 20 >> 1.96, so p << .05

Reject H0; extremely strong significant difference
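The arithmetic of this large-sample example can be reproduced from the summary statistics alone. The sketch below is an illustration rather than the slides' own code; the variable names are assumptions, and only base R functions (`sqrt`, `pnorm`, `qnorm`) are used.

```r
# Two-sample Z-test from summary statistics (large n1 and n2).
# Values are the hospital/clinic figures quoted in the example.
n1 <- 137; xbar1 <- 630; s1_sq <- 788.5    # Sample 1 (hospital)
n2 <- 140; xbar2 <- 546; s2_sq <- 1663.0   # Sample 2 (clinic)

d  <- xbar1 - xbar2                        # observed difference in means
se <- sqrt(s1_sq / n1 + s2_sq / n2)        # standard error of the difference
z  <- (d - 0) / se                         # Z-score under H0: mu1 - mu2 = 0
p  <- 2 * pnorm(-abs(z))                   # two-sided p-value

moe <- qnorm(0.975) * se                   # 95% margin of error
ci  <- c(d - moe, d + moe)                 # 95% confidence interval for mu1 - mu2

print(round(c(se = se, z = z), 2))
print(ci)
print(p)
```

Because the code keeps full precision instead of rounding the standard error to 4.2 and the critical value to 1.96, the interval endpoints may differ from the slide's in the later decimals.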


Consider two independent populations… and a random variable X, normally distributed in each.

POPULATION 1: X1 ~ N(μ1, σ1), sample size n1

POPULATION 2: X2 ~ N(μ2, σ2), sample size n2

Null Distribution

$$\bar{X}_1 - \bar{X}_2 \sim N\!\left(0,\ \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right)$$

σ1², σ2² unknown and:

• large n1 and n2: use the sample estimates s1² and s2² in place of σ1² and σ2².

• small n1 and n2: IF the two populations are equivariant, i.e., H0: σ1² = σ2², then conduct a t-test on the “pooled” samples.

Test of equivariance:

H0: σ1² = σ2²
HA: σ1² ≠ σ2²

Test Statistic

$$F = \frac{s_1^2}{s_2^2}$$

Working Rule of Thumb

Acceptance Region for H0: ¼ < F < 4
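As a rough illustration of the equivariance check, the sketch below applies the rule of thumb to the two small cost samples from this chapter's worked example. The formal F-test via base R's `var.test` is an addition for comparison and is not something the slides use; the variable names are assumptions.

```r
# Equal-variance check for two small samples.
# Data: the two cost samples from this chapter's small-sample example.
y1 <- c(667, 653, 614, 612, 604)   # Sample 1 (hospital)
y2 <- c(593, 525, 520)             # Sample 2 (clinic)

f_ratio <- var(y1) / var(y2)                    # F = s1^2 / s2^2
within_rule <- f_ratio > 1/4 && f_ratio < 4     # working rule of thumb

# Formal F-test of H0: sigma1^2 = sigma2^2 (base R, stats package).
ft <- var.test(y1, y2)

print(f_ratio)
print(within_rule)
print(ft$p.value)
```

The rule of thumb is symmetric: taking the ratio the other way round gives 1/F, which lies in (¼, 4) exactly when F does.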

σ1², σ2² unknown and small n1 and n2:

IF equal variances H0: σ1² = σ2² is accepted, then estimate their common value with a “pooled” sample variance.

$$s_{\text{pooled}}^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}$$

The pooled variance is a weighted average of s1² and s2², using the degrees of freedom as the weights.

Standard Error

$$\text{s.e.} = \sqrt{\frac{s_{\text{pooled}}^2}{n_1} + \frac{s_{\text{pooled}}^2}{n_2}} = \sqrt{s_{\text{pooled}}^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

IF equal variances H0: σ1² = σ2² is rejected, then use Satterthwaite Test, Welch Test, etc. SEE LECTURE NOTES AND TEXTBOOK.
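For the unequal-variance branch, a minimal sketch of the Welch test with the Welch–Satterthwaite degrees of freedom is shown here. It is an illustration only, since the slides defer the details to the lecture notes and textbook. It reuses the chapter's two cost samples purely as sample input (equal variances are not rejected for those data), and the variable names are assumptions.

```r
# Welch two-sample t-test (unequal variances), computed by hand.
# Data: the chapter's two cost samples, used here only as sample input.
y1 <- c(667, 653, 614, 612, 604)   # Sample 1 (hospital)
y2 <- c(593, 525, 520)             # Sample 2 (clinic)

# Per-group contributions s_i^2 / n_i to the variance of the difference.
v1 <- var(y1) / length(y1)
v2 <- var(y2) / length(y2)

se_welch <- sqrt(v1 + v2)                      # unpooled standard error
t_welch  <- (mean(y1) - mean(y2)) / se_welch   # test statistic

# Welch-Satterthwaite approximation to the degrees of freedom.
df_welch <- (v1 + v2)^2 /
  (v1^2 / (length(y1) - 1) + v2^2 / (length(y2) - 1))

p_welch <- 2 * pt(-abs(t_welch), df_welch)     # two-sided p-value

# Cross-check against base R: var.equal = FALSE requests the Welch test.
fit <- t.test(y1, y2, var.equal = FALSE)
print(c(manual_df = df_welch, t_test_df = unname(fit$parameter)))
print(c(manual_p = p_welch, t_test_p = fit$p.value))
```

Unlike the pooled test, the degrees of freedom here are generally not an integer and are smaller than n1 + n2 – 2 when the group variances differ.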

Example: Y = “$ Cost of a certain medical service”

Assume Y is known to be normally distributed at each of k = 2 health care facilities (“groups”).

Hospital: Y1 ~ N(μ1, σ1)     Clinic: Y2 ~ N(μ2, σ2)

• Null Hypothesis H0: μ1 = μ2, i.e., μ1 – μ2 = 0 (“No difference exists.”) 2-sided test at significance level α = .05

• Data: Sample 1 = {667, 653, 614, 612, 604}; n1 = 5
        Sample 2 = {593, 525, 520}; n2 = 3

• Analysis via T-test (if equivariance holds):

Point estimates

“Group Means” (ȳ = Σ yi / n)

$$\bar{y}_1 = \frac{667 + 653 + 614 + 612 + 604}{5} = 630, \qquad \bar{y}_2 = \frac{593 + 525 + 520}{3} = 546$$

NOTE: ȳ1 – ȳ2 = 84 > 0

“Group Variances” (s² = SS/df)

$$s_1^2 = \frac{(667 - 630)^2 + \cdots + (604 - 630)^2}{5 - 1} = 788.5, \qquad s_2^2 = \frac{(593 - 546)^2 + \cdots + (520 - 546)^2}{3 - 1} = 1663$$

Equivariance check: F = 1663 / 788.5 = 2.11 < 4

Pooled Variance

$$s_{\text{pooled}}^2 = \frac{(5 - 1)(788.5) + (3 - 1)(1663)}{(5 - 1) + (3 - 1)} = \frac{SS_1 + SS_2}{df_1 + df_2} = \frac{6480}{6} = 1080$$

The pooled variance is a weighted average of the group variances, using the degrees of freedom as the weights. Here SS = 6480 and df = 6.

Standard Error

$$\text{s.e.} = \sqrt{s_{\text{pooled}}^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{1080\left(\frac{1}{5} + \frac{1}{3}\right)} = 24$$

p-value

$$p = 2\,P\!\left(\bar{Y}_1 - \bar{Y}_2 \ge 84\right) = 2\,P\!\left(T_6 \ge \frac{84 - 0}{24}\right) = 2\,P(T_6 \ge 3.5)$$

> 2 * (1 - pt(3.5, 6))
[1] 0.01282634

Reject H0 at α = .05; statistically significant, Hosp > Clinic
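The hand calculation above can be retraced step by step from the raw data. This is a sketch using base R only; the intermediate variable names (`ss1`, `sp2`, `t_stat`, and so on) are assumptions chosen to mirror the slide's labels.

```r
# Pooled two-sample t-test computed step by step from the raw data.
y1 <- c(667, 653, 614, 612, 604)   # Sample 1 (hospital)
y2 <- c(593, 525, 520)             # Sample 2 (clinic)
n1 <- length(y1); n2 <- length(y2)

ss1 <- sum((y1 - mean(y1))^2)      # sum of squares, group 1
ss2 <- sum((y2 - mean(y2))^2)      # sum of squares, group 2
df  <- (n1 - 1) + (n2 - 1)         # pooled degrees of freedom

sp2 <- (ss1 + ss2) / df                        # pooled variance = SS / df
se  <- sqrt(sp2 * (1 / n1 + 1 / n2))           # standard error of the difference
t_stat  <- (mean(y1) - mean(y2) - 0) / se      # t statistic under H0
p_value <- 2 * pt(-abs(t_stat), df)            # two-sided p-value

print(c(SS = ss1 + ss2, df = df, pooled = sp2, se = se, t = t_stat))
print(p_value)
```

Writing the p-value as `2 * pt(-abs(t_stat), df)` is equivalent to the slide's `2 * (1 - pt(3.5, 6))` for a positive statistic and also handles a negative one.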

R code:

> y1 = c(667, 653, 614, 612, 604)
> y2 = c(593, 525, 520)
> t.test(y1, y2, var.equal = TRUE)

        Two Sample t-test

data:  y1 and y2
t = 3.5, df = 6, p-value = 0.01283
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  25.27412 142.72588
sample estimates:
mean of x mean of y
      630       546

Formal Conclusion: p-value < α = .05. Reject H0 at this level.

Interpretation: The samples provide evidence that the difference between mean costs is (moderately) statistically significant, at the 5% level, with the hospital being higher than the clinic (by an average of $84).
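The 95 percent confidence interval in the `t.test` output follows the same pattern as the large-sample interval, with a t critical value on 6 degrees of freedom in place of 1.96. The sketch below rebuilds it by hand and compares it with the interval that `t.test` reports; the variable names are assumptions.

```r
# Rebuild the 95% confidence interval for mu1 - mu2 from the pooled t-test.
y1 <- c(667, 653, 614, 612, 604)   # Sample 1 (hospital)
y2 <- c(593, 525, 520)             # Sample 2 (clinic)

fit <- t.test(y1, y2, var.equal = TRUE)

d      <- mean(y1) - mean(y2)                   # point estimate of mu1 - mu2
se     <- d / unname(fit$statistic)             # standard error, recovered as d / t
t_crit <- qt(0.975, df = unname(fit$parameter)) # two-sided 5% critical value

ci_manual <- d + c(-1, 1) * t_crit * se         # estimate +/- margin of error

print(ci_manual)
print(as.numeric(fit$conf.int))
```

The interval excludes 0, which agrees with rejecting H0 at the 5% level.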

NEXT UP… PAIRED MEANS (page 6.2-7, etc.)
