CHAPTER 6 Statistical Inference & Hypothesis Testing


Page 1: CHAPTER 6 Statistical Inference & Hypothesis Testing

• 6.1 - One Sample: Mean μ, Variance σ², Proportion π

• 6.2 - Two Samples: Means, Variances, Proportions (μ1 vs. μ2, σ1² vs. σ2², π1 vs. π2)

• 6.3 - Multiple Samples: Means, Variances, Proportions (μ1, …, μk; σ1², …, σk²; π1, …, πk)

Page 3: CHAPTER 6 Statistical Inference & Hypothesis Testing

Example: Y = “$ Cost of a certain medical service”

Assume Y is known to be normally distributed at each of k = 2 health care facilities (“groups”).

Hospital: Y1 ~ N(μ1, σ1) Clinic: Y2 ~ N(μ2, σ2)

• Null Hypothesis H0: μ1 = μ2, i.e., μ1 – μ2 = 0 (“No difference exists.”) 2-sided test at significance level α = .05

• Data: Sample 1 = {667, 653, 614, 612, 604}; n1 = 5 Sample 2 = {593, 525, 520}; n2 = 3

• Analysis via T-test (if equivariance holds): Point estimates \( \bar{y} = \sum y_i / n \)

“Group Means”

\[ \bar{y}_1 = \frac{667 + 653 + 614 + 612 + 604}{5} = 630, \qquad \bar{y}_2 = \frac{593 + 525 + 520}{3} = 546 \]

\[ \bar{y}_1 - \bar{y}_2 = 84 \quad (\text{NOTE: } > 0) \]

“Group Variances” s² = SS/df

\[ s_1^2 = \frac{(667 - 630)^2 + \cdots + (604 - 630)^2}{5 - 1} = \frac{SS_1}{4} = 788.5 \]

\[ s_2^2 = \frac{(593 - 546)^2 + \cdots + (520 - 546)^2}{3 - 1} = \frac{SS_2}{2} = 1663 \]

\[ F = \frac{1663}{788.5} = 2.11 < 4 \]

Pooled Variance

\[ s^2_{\text{pooled}} = \frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2} = \frac{(5 - 1)(788.5) + (3 - 1)(1663)}{5 + 3 - 2} = 1080 \]

The pooled variance is a weighted average of the group variances, using the degrees of freedom as the weights.
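The point estimates on this page can be checked in a few lines of R. This is only a sketch: the variable names (`ybar1`, `s2_pooled`, and so on) are illustrative choices, not taken from the slides.

```r
# Data from the example (hospital = Sample 1, clinic = Sample 2)
y1 <- c(667, 653, 614, 612, 604)
y2 <- c(593, 525, 520)
n1 <- length(y1)
n2 <- length(y2)

# Group means
ybar1 <- mean(y1)   # 630
ybar2 <- mean(y2)   # 546

# Group variances, s^2 = SS / df (var() divides by n - 1)
s2_1 <- var(y1)     # 788.5
s2_2 <- var(y2)     # 1663

# Ratio of larger to smaller variance, compared informally with 4
var_ratio <- max(s2_1, s2_2) / min(s2_1, s2_2)   # about 2.11

# Pooled variance: df-weighted average of the group variances
s2_pooled <- ((n1 - 1) * s2_1 + (n2 - 1) * s2_2) / (n1 + n2 - 2)   # 1080
```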

Page 4: CHAPTER 6 Statistical Inference & Hypothesis Testing

• Analysis via T-test (if equivariance holds), continued

Pooled Variance: s² = SS/df

\[ s^2_{\text{pooled}} = \frac{(5 - 1)(788.5) + (3 - 1)(1663)}{5 + 3 - 2} = \frac{SS_{\text{Err}}}{df_{\text{Err}}} = \frac{6480}{6} = 1080 \]

SSErr = 6480

dfErr = 6

Standard Error

\[ \text{s.e.}_0 = \sqrt{s^2_{\text{pooled}} \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} = \sqrt{1080 \left( \frac{1}{5} + \frac{1}{3} \right)} = 24 \]

\[ p\text{-value} = 2\, P(\bar{Y}_1 - \bar{Y}_2 \ge 84) = 2\, P\!\left( T_6 \ge \frac{84}{24} \right) = 2\, P(T_6 \ge 3.5) \]

> 2 * (1 - pt(3.5, 6))
[1] 0.01282634

Reject H0 at α = .05: stat signif, Hosp > Clinic
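As a rough check of the standard error, test statistic, and p-value, the arithmetic can be carried out directly. The names below (`se0`, `t_stat`, `df_err`) are assumptions made for illustration only.

```r
n1 <- 5
n2 <- 3
diff_means <- 630 - 546   # observed difference in group means, 84
s2_pooled  <- 1080        # pooled variance from the previous step

# Standard error of the difference under H0
se0 <- sqrt(s2_pooled * (1 / n1 + 1 / n2))   # 24

# Test statistic and its degrees of freedom
t_stat <- diff_means / se0                   # 3.5
df_err <- n1 + n2 - 2                        # 6

# Two-sided p-value from the t distribution with 6 df
p_value <- 2 * pt(abs(t_stat), df_err, lower.tail = FALSE)
```

Using `lower.tail = FALSE` is equivalent to the `1 - pt(...)` form on the slide.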

Page 5: CHAPTER 6 Statistical Inference & Hypothesis Testing

R code:

> y1 = c(667, 653, 614, 612, 604)
> y2 = c(593, 525, 520)
>
> t.test(y1, y2, var.equal = TRUE)

	Two Sample t-test

data:  y1 and y2
t = 3.5, df = 6, p-value = 0.01283
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  25.27412 142.72588
sample estimates:
mean of x mean of y
      630       546

Formal Conclusion

p-value < α = .05. Reject H0 at this level.

Interpretation

The samples provide evidence that the difference between mean costs is (moderately) statistically significant, at the 5% level, with the hospital being higher than the clinic (by an average of $84).

Page 6: CHAPTER 6 Statistical Inference & Hypothesis Testing

Alternate method ~ Analysis of Variance (ANOVA)

Main Idea: Among several (k ≥ 2) independent, equivariant, normally-distributed “treatment groups”…

[Diagram: k normal populations Y1, Y2, …, Yk with means μ1, μ2, …, μk and equal variances σ1² = σ2² = … = σk²]

Null Hypothesis? H0: μ1 = μ2 = … = μk

HA: “At least one ‘treatment mean’ μi is significantly different from the others.”

“Total Variability” = “Variability between groups” + “Variability within groups”

Page 7: CHAPTER 6 Statistical Inference & Hypothesis Testing

Example: Y = “$ Cost of a certain medical service”

Assume Y is known to be normally distributed at each of k = 2 health care facilities (“groups”).

Hospital: Y1 ~ N(μ1, σ1) Clinic: Y2 ~ N(μ2, σ2)

• Null Hypothesis H0: μ1 = μ2, i.e., μ1 – μ2 = 0 (“No difference exists.”) 2-sided test at significance level α = .05

• Data: Sample 1 = {667, 653, 614, 612, 604}; n1 = 5 Sample 2 = {593, 525, 520}; n2 = 3

• Analysis via ANOVA F-test (if equivariance holds): Point estimates \( \bar{y} = \sum y_i / n \)

“Group Means”

\[ \bar{y}_1 = \frac{667 + 653 + 614 + 612 + 604}{5} = 630, \qquad \bar{y}_2 = \frac{593 + 525 + 520}{3} = 546, \qquad \bar{y}_1 - \bar{y}_2 = 84 \quad (\text{NOTE: } > 0) \]

“Grand Mean”

\[ \bar{y} = \frac{667 + 653 + 614 + 612 + 604 + 593 + 525 + 520}{5 + 3} = \frac{5\,(630) + 3\,(546)}{5 + 3} = 598.50 \]

The grand mean is a weighted average of the group means, using the sample sizes as the weights.
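The two ways of computing the grand mean (pooling all eight observations, or weighting the group means by sample size) can be compared in R. A small sketch, with illustrative variable names:

```r
y1 <- c(667, 653, 614, 612, 604)
y2 <- c(593, 525, 520)

# Direct calculation: mean of all 8 observations pooled together
grand_direct <- mean(c(y1, y2))

# Weighted average of the group means, weights = sample sizes
grand_weighted <- weighted.mean(
  x = c(mean(y1), mean(y2)),
  w = c(length(y1), length(y2))
)

# Both are 598.5; note this differs from the unweighted
# average of the two group means, (630 + 546) / 2 = 588
unweighted <- mean(c(mean(y1), mean(y2)))
```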

Page 9: CHAPTER 6 Statistical Inference & Hypothesis Testing

• ANOVA F-test (if equivariance holds)

“Grand Mean”

\[ \bar{y} = \frac{5\,(630) + 3\,(546)}{5 + 3} = 598.50 \]

How far is the “total” sample from the grand mean?

Page 10: CHAPTER 6 Statistical Inference & Hypothesis Testing

• Data: Sample 1 = {667, 653, 614, 612, 604}; n1 = 5 Sample 2 = {593, 525, 520}; n2 = 3

\[ SS_{\text{Tot}} = (667 - 598.5)^2 + (653 - 598.5)^2 + (614 - 598.5)^2 + (612 - 598.5)^2 + (604 - 598.5)^2 + (593 - 598.5)^2 + (525 - 598.5)^2 + (520 - 598.5)^2 = 19710 \]

dfTot = (5 + 3) – 1 = 7
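The total sum of squares treats all eight observations as one sample and measures their spread around the grand mean. A brief sketch in R (names such as `ss_tot` are illustrative):

```r
y1 <- c(667, 653, 614, 612, 604)
y2 <- c(593, 525, 520)
y  <- c(y1, y2)            # the "total" sample

# Squared deviations of every observation from the grand mean
ss_tot <- sum((y - mean(y))^2)   # 19710
df_tot <- length(y) - 1          # 7

# Equivalent route: sample variance of the pooled data times its df
ss_tot_alt <- var(y) * df_tot
```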

Page 11: CHAPTER 6 Statistical Inference & Hypothesis Testing

Alternate method ~ Analysis of Variance (ANOVA)

“Total Variability” = “Variability between groups” + “Variability within groups”

How can we measure the “variability between groups”? Imagine zero variability within groups…

Page 13: CHAPTER 6 Statistical Inference & Hypothesis Testing

• Data: Sample 1 = {667, 653, 614, 612, 604}; n1 = 5 Sample 2 = {593, 525, 520}; n2 = 3

“The Clonemaster”: replace every observation by its own group mean, so that no variability remains within groups.

Sample 1 → {630, 630, 630, 630, 630} Sample 2 → {546, 546, 546}

\[ SS_{\text{Trt}} = 5\,(630 - 598.5)^2 + 3\,(546 - 598.5)^2 = 13230 \]

dfTrt = (2) – 1 = 1

For comparison: SSTot = 19710, dfTot = (5 + 3) – 1 = 7
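The “clone” idea translates directly into code: build the cloned samples, then measure their spread around the grand mean. This is a sketch; `clones` and `ss_trt` are illustrative names.

```r
y1 <- c(667, 653, 614, 612, 604)
y2 <- c(593, 525, 520)
grand_mean <- mean(c(y1, y2))   # 598.5

# Replace each observation by its group mean (zero within-group variability)
clones <- c(rep(mean(y1), length(y1)), rep(mean(y2), length(y2)))

# Between-group ("treatment") sum of squares
ss_trt <- sum((clones - grand_mean)^2)   # 13230
df_trt <- 2 - 1                          # number of groups minus 1

# Same quantity written as in the slide formula
ss_trt_formula <- 5 * (630 - 598.5)^2 + 3 * (546 - 598.5)^2
```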

Page 15: CHAPTER 6 Statistical Inference & Hypothesis Testing

• ANOVA F-test (if equivariance holds)

SSTot = 19710, dfTot = (5 + 3) – 1 = 7

SSTrt = 13230, dfTrt = (2) – 1 = 1

How far is each sample from its own group mean?

Page 16: CHAPTER 6 Statistical Inference & Hypothesis Testing

• Data: Sample 1 = {667, 653, 614, 612, 604}; n1 = 5 Sample 2 = {593, 525, 520}; n2 = 3

\[ SS_{\text{Err}} = (667 - 630)^2 + (653 - 630)^2 + (614 - 630)^2 + (612 - 630)^2 + (604 - 630)^2 + (593 - 546)^2 + (525 - 546)^2 + (520 - 546)^2 \]

BUT…

Page 18: CHAPTER 6 Statistical Inference & Hypothesis Testing

RECALL… (from the T-test)

“Group Variances” s² = SS/df

\[ s_1^2 = \frac{(667 - 630)^2 + \cdots + (604 - 630)^2}{5 - 1} = 788.5, \qquad s_2^2 = \frac{(593 - 546)^2 + \cdots + (520 - 546)^2}{3 - 1} = 1663 \]

Pooled Variance

\[ s^2_{\text{pooled}} = \frac{(5 - 1)(788.5) + (3 - 1)(1663)}{5 + 3 - 2} = \frac{SS_{\text{Err}}}{df_{\text{Err}}} = \frac{6480}{6} = 1080 \]

The pooled variance is a weighted average of the group variances, using the degrees of freedom as the weights.

SSErr = 6480

dfErr = 6

Page 20: CHAPTER 6 Statistical Inference & Hypothesis Testing

• ANOVA F-test (if equivariance holds)

SSTot = 19710, dfTot = (5 + 3) – 1 = 7

SSTrt = 13230, dfTrt = (2) – 1 = 1

\[ SS_{\text{Err}} = 4\,(788.5) + 2\,(1663) = 6480 \]

dfErr = (5 + 3) – 2 = 6

SSTot = SSTrt + SSErr

dfTot = dfTrt + dfErr
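The decomposition can be confirmed numerically. The sketch below recomputes all three sums of squares from the raw data; the variable names are assumptions for illustration.

```r
y1 <- c(667, 653, 614, 612, 604)
y2 <- c(593, 525, 520)
y  <- c(y1, y2)
n1 <- length(y1)
n2 <- length(y2)

# Within-group ("error") sum of squares: deviations from own group mean
ss_err <- sum((y1 - mean(y1))^2) + sum((y2 - mean(y2))^2)   # 6480

# Same value via the group variances: (n_i - 1) * s_i^2 summed over groups
ss_err_alt <- (n1 - 1) * var(y1) + (n2 - 1) * var(y2)

ss_tot <- sum((y - mean(y))^2)                                       # 19710
ss_trt <- n1 * (mean(y1) - mean(y))^2 + n2 * (mean(y2) - mean(y))^2  # 13230

# Degrees of freedom
df_tot <- n1 + n2 - 1   # 7
df_trt <- 2 - 1         # 1
df_err <- n1 + n2 - 2   # 6

# The partition of variability holds for both SS and df
partition_ok <- isTRUE(all.equal(ss_tot, ss_trt + ss_err)) &&
  df_tot == df_trt + df_err
```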

Page 21: CHAPTER 6 Statistical Inference & Hypothesis Testing

ANOVA Table (MS = SS/df)

Source      df   SS      MS      F-ratio   p-value
Treatment    1   13230   13230
Error        6    6480    1080
Total        7   19710   –

MSTrt = s²between

MSErr = s²within (Note: This is also s²pooled.)

F = MSTrt / MSErr

SSTot = SSTrt + SSErr

dfTot = dfTrt + dfErr

Page 22: CHAPTER 6 Statistical Inference & Hypothesis Testing

ANOVA Table (MS = SS/df)

\[ F = \frac{MS_{\text{Trt}}}{MS_{\text{Err}}} = \frac{13230}{1080} = 12.25 \]

F-ratio = 12.25, p-value = ????

Page 23: CHAPTER 6 Statistical Inference & Hypothesis Testing

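\[ s_1^2 = \frac{SS_1}{df_1}, \qquad s_2^2 = \frac{SS_2}{df_2} \]

\[ H_0: \sigma_1^2 = \sigma_2^2 \qquad H_A: \sigma_1^2 \ne \sigma_2^2 \]

Test Statistic

\[ F = \frac{s_1^2}{s_2^2} \]

Sampling Distribution = ?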

Page 24: CHAPTER 6 Statistical Inference & Hypothesis Testing

ANOVA Table: F-ratio = 12.25

[Graph: density of the F1,6 distribution; the p-value is the area to the right of the observed F = 12.25.]

Page 26: CHAPTER 6 Statistical Inference & Hypothesis Testing

ANOVA Table: F-ratio = 12.25

[Graph: density of the F1,6 distribution; the critical value 5.99 cuts off a right-tail area of α = .05, and the p-value is the smaller area to the right of the observed F = 12.25.]

Page 27: CHAPTER 6 Statistical Inference & Hypothesis Testing

ANOVA Table: F-ratio = 12.25, p < .05 (on F1,6)

Page 28: CHAPTER 6 Statistical Inference & Hypothesis Testing

ANOVA Table (MS = SS/df)

Source      df   SS      MS      F-ratio   p-value
Treatment    1   13230   13230   12.25     .01282634
Error        6    6480    1080
Total        7   19710   –

MSTrt = s²between

MSErr = s²within (Note: This is also s²pooled.)

F = MSTrt / MSErr (on F1,6)

p-value = 1–pf(12.25, 1, 6)

SSTot = SSTrt + SSErr

dfTot = dfTrt + dfErr
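The last two columns of the table, together with the 5.99 critical value, can be reproduced from the sums of squares alone. A sketch, with illustrative variable names:

```r
ss_trt <- 13230; df_trt <- 1
ss_err <- 6480;  df_err <- 6

# Mean squares: MS = SS / df
ms_trt <- ss_trt / df_trt   # 13230
ms_err <- ss_err / df_err   # 1080 (also the pooled variance)

# F-ratio
f_ratio <- ms_trt / ms_err  # 12.25

# Critical value of F(1, 6) at alpha = .05 (about 5.99)
f_crit <- qf(0.95, df1 = df_trt, df2 = df_err)

# p-value: right-tail area beyond the observed F-ratio
p_value <- pf(f_ratio, df_trt, df_err, lower.tail = FALSE)

reject_h0 <- f_ratio > f_crit   # TRUE, equivalently p_value < .05
```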

Page 29: CHAPTER 6 Statistical Inference & Hypothesis Testing

ANOVA Table (MS = SS/df)

Source      df   SS      MS      F-ratio   p-value
Treatment    1   13230   13230   12.25     .01282634
Error        6    6480    1080
Total        7   19710   –

SSTot = SSTrt + SSErr

dfTot = dfTrt + dfErr

Thus, the treatment accounts for SSTrt / SSTot = 13230 / 19710 = 67.1% of the total variability in the response Y.
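This proportion is the same quantity that R reports as R-squared when the model is fit with `lm()`. The sketch below assumes a data frame laid out like the one used in the R code later in the deck; `fit` and `r2` are illustrative names.

```r
y1 <- c(667, 653, 614, 612, 604)
y2 <- c(593, 525, 520)
Data <- data.frame(
  Y = c(y1, y2),
  X = factor(rep(c("y1", "y2"), times = c(length(y1), length(y2))))
)

# Proportion of total variability explained by the treatment
prop_explained <- 13230 / 19710        # about 0.671

# The same value from a linear model fit
fit <- lm(Y ~ X, data = Data)
r2  <- summary(fit)$r.squared
```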

Page 30: CHAPTER 6 Statistical Inference & Hypothesis Testing

R code:

# ANOVA FOR UNBALANCED DESIGN

> y1 = c(667, 653, 614, 612, 604)
> y2 = c(593, 525, 520)
>
> Data = data.frame(
+   Y = c(y1, y2),
+   X = factor(rep(c("y1", "y2"), times = c(length(y1), length(y2))))
+ )
>
> var.test(Y ~ X, data = Data) # EQUIVARIANCE?

	F test to compare two variances

data:  Y by X
F = 0.4741, num df = 4, denom df = 2, p-value = 0.4738
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.01208057 5.04920249
sample estimates:
ratio of variances
         0.4741431
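The `var.test` output can be reproduced by hand, which shows where the F statistic and its two-sided p-value come from. This is a sketch with illustrative names; doubling the smaller tail area is the usual two-sided convention for this test and is assumed here.

```r
y1 <- c(667, 653, 614, 612, 604)
y2 <- c(593, 525, 520)

# Test statistic: ratio of sample variances, first group over second
f_stat <- var(y1) / var(y2)          # 788.5 / 1663, about 0.4741
df_num <- length(y1) - 1             # 4
df_den <- length(y2) - 1             # 2

# Two-sided p-value: double the smaller tail area of F(4, 2)
p_lower <- pf(f_stat, df_num, df_den)
p_value <- 2 * min(p_lower, 1 - p_lower)   # about 0.4738

# Built-in test for comparison
builtin <- var.test(y1, y2)
```

The large p-value gives no evidence against equal variances, which supports using the pooled-variance methods.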

Page 31: CHAPTER 6 Statistical Inference & Hypothesis Testing

R code:

# ANOVA FOR UNBALANCED DESIGN

> out = aov(Y ~ X, data = Data)
> anova(out)

Analysis of Variance Table

Response: Y
          Df Sum Sq Mean Sq F value  Pr(>F)
X          1  13230   13230   12.25 0.01283 *
Residuals  6   6480    1080
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Note: Vis-à-vis T-test vs. F-test,

• p-value is the same using either method (.01283), since the sample is unchanged!

• The square of the T_df-score (3.5) is equal to the F_{1, df}-score (12.25).

(Recall that the square of the Z-score is equal to the χ²_1-score.)
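Both identities in the note can be verified numerically. In this sketch the value `z <- 1.96` is an arbitrary illustrative choice, not a number from the slides.

```r
t_stat <- 3.5
df_err <- 6

# T-test p-value (two-sided) vs. F-test p-value (right tail) on (1, df)
p_t <- 2 * pt(t_stat, df_err, lower.tail = FALSE)
p_f <- pf(t_stat^2, 1, df_err, lower.tail = FALSE)

# Analogous identity for the normal and chi-square distributions
z     <- 1.96   # arbitrary example value
p_z   <- 2 * pnorm(z, lower.tail = FALSE)
p_chi <- pchisq(z^2, df = 1, lower.tail = FALSE)

same_t_f   <- isTRUE(all.equal(p_t, p_f))     # TRUE
same_z_chi <- isTRUE(all.equal(p_z, p_chi))   # TRUE
```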

Page 34: CHAPTER 6 Statistical Inference & Hypothesis Testing

Suppose this ANOVA “overall F-test” indicates that a significant difference exists between one (or more) of the treatment means, at α = .05.

How can we find out which one(s)?

Page 35: CHAPTER 6 Statistical Inference & Hypothesis Testing

Idea: Test all possible pairwise comparisons, each via a two-sample t-test.

Example: Suppose there are k = 5 treatment groups.

(1, 2) (1, 3) (1, 4) (1, 5) (2, 3) (2, 4) (2, 5) (3, 4) (3, 5) (4, 5), each comparison yielding its own p-value.

There are \( \binom{5}{2} = 10 \) such comparisons.

PROBLEM???

Page 36: CHAPTER 6 Statistical Inference & Hypothesis Testing

Idea: Test all possible pairwise comparisons, each via a two-sample t-test.

Example: Suppose there are k = 5 treatment groups, hence \( \binom{5}{2} = 10 \) such comparisons.

PROBLEM??? If each of the 10 comparisons is tested at α = .05, the chance of at least one false rejection among them is much larger than .05: SPURIOUS SIGNIFICANCE!!!

α* = .05/10

Page 38: CHAPTER 6 Statistical Inference & Hypothesis Testing

Idea: Test all possible pairwise comparisons, each via a two-sample t-test.

Example: Suppose there are k = 5 treatment groups, hence \( \binom{5}{2} = 10 \) such comparisons.

BONFERRONI CORRECTION

Make each comparison at level α* = α / 10.
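A short calculation illustrates why the correction is needed. The family-wise error rates below assume the 10 tests are independent, which pairwise comparisons are not exactly, so they are rough guides only. The vector `p_raw` is made-up illustrative data, not from the slides.

```r
k     <- 5
alpha <- 0.05
m     <- choose(k, 2)            # 10 pairwise comparisons

# Chance of at least one false rejection if all 10 tests use alpha = .05
# (assumes independent tests; an approximation)
fwer_uncorrected <- 1 - (1 - alpha)^m     # about 0.40

# Bonferroni: test each comparison at alpha* = alpha / m
alpha_star <- alpha / m                   # 0.005
fwer_bonferroni <- 1 - (1 - alpha_star)^m # about 0.049

# Hypothetical p-values for the 10 comparisons (illustration only)
p_raw <- c(0.001, 0.004, 0.012, 0.03, 0.04, 0.20, 0.35, 0.50, 0.70, 0.90)

# Two equivalent ways to apply the correction
signif_by_level  <- p_raw < alpha_star
signif_by_adjust <- p.adjust(p_raw, method = "bonferroni") < alpha
```

With real data, `pairwise.t.test(Y, X, p.adjust.method = "bonferroni")` runs all the pairwise tests and applies the adjustment in one step.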

Page 39: CHAPTER 6 Statistical Inference & Hypothesis Testing

Alternate method ~ Analysis of Variance (ANOVA)

Main Idea: Among several (k ≥ 2) independent, equivariant, normally-distributed “treatment groups”…

H0: μ1 = μ2 = … = μk

MODEL ASSUMPTIONS?

Page 40: CHAPTER 6 Statistical Inference & Hypothesis Testing

Analysis of Variance (ANOVA): MODEL ASSUMPTIONS?

• Equivariance can be tested via an F-test very similar to the “two variances” F-test in 6.2.2 (but this is very sensitive to the normality assumption), or via other tests. If equivariance is violated, the Welch Test for two means can be extended to k groups.
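In R, one possible route is sketched below. `bartlett.test` stands in here for the k-group equal-variance test (like the two-sample F-test, it is sensitive to non-normality), and `oneway.test` with `var.equal = FALSE` is the Welch-type extension. The choice of these particular functions is an assumption, since the slides do not name them.

```r
y1 <- c(667, 653, 614, 612, 604)
y2 <- c(593, 525, 520)
Data <- data.frame(
  Y = c(y1, y2),
  X = factor(rep(c("y1", "y2"), times = c(length(y1), length(y2))))
)

# Test of equal variances across k groups
equivar <- bartlett.test(Y ~ X, data = Data)

# Classical one-way ANOVA (assumes equal variances): same F as aov()
classic <- oneway.test(Y ~ X, data = Data, var.equal = TRUE)

# Welch-type one-way test (does not assume equal variances)
welch <- oneway.test(Y ~ X, data = Data, var.equal = FALSE)

classic_F <- unname(classic$statistic)   # 12.25
```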

Page 41: CHAPTER 6 Statistical Inference & Hypothesis Testing

Analysis of Variance (ANOVA): MODEL ASSUMPTIONS?

• Normality can be tested via usual methods. If violated, use nonparametric Kruskal-Wallis Test.
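For the running cost example, the Kruskal-Wallis test could be run as follows. This is a sketch; with samples this small the chi-square approximation behind its p-value is rough.

```r
y1 <- c(667, 653, 614, 612, 604)
y2 <- c(593, 525, 520)
Data <- data.frame(
  Y = c(y1, y2),
  X = factor(rep(c("y1", "y2"), times = c(length(y1), length(y2))))
)

# Rank-based alternative to the one-way ANOVA F-test
kw <- kruskal.test(Y ~ X, data = Data)

# Every hospital cost exceeds every clinic cost, so the clinic takes
# ranks 1-3 and the hospital ranks 4-8, giving H = 5 on 1 df
kw_stat <- unname(kw$statistic)
kw_p    <- kw$p.value
```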

Page 42: CHAPTER 6 Statistical Inference & Hypothesis Testing

Analysis of Variance (ANOVA): EXTENSIONS

• Extensions of ANOVA for data in matched “blocks” designs, repeated measures, multiple factor levels within groups, etc.

Page 43: CHAPTER 6 Statistical Inference & Hypothesis Testing

Analysis of Variance (ANOVA): WHICH GROUP(S)?

• How to identify significant group(s)? Pairwise testing, with correction (e.g., Bonferroni) for spurious significance.

• Example: k = 5 groups result in 10 such tests, so let each α* = α / 10.

Page 45: CHAPTER 6 Statistical Inference & Hypothesis Testing

“spurious significance”