Analysis of Variance Comparisons among multiple populations.

  • date post

    21-Dec-2015

Page 1: Analysis of Variance Comparisons among multiple populations.

Analysis of Variance

Comparisons among multiple populations

Page 2: Analysis of Variance Comparisons among multiple populations.

More than two populations

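Group   Observations
A       $X_{A,1}, X_{A,2}, X_{A,3}, X_{A,4}, \ldots, X_{A,n_a}$
B       $X_{B,1}, X_{B,2}, X_{B,3}, \ldots, X_{B,n_b}$
C       $X_{C,1}, X_{C,2}, X_{C,3}, X_{C,4}, X_{C,5}, \ldots, X_{C,n_c}$
D       $X_{D,1}, X_{D,2}, X_{D,3}, X_{D,4}, \ldots, X_{D,n_d}$

$H_0: \mu_A = \mu_B = \mu_C = \mu_D$

Discussed in Chapter 8: $H_0: \mu_A = \mu_B$, $H_0: \mu_B = \mu_C$, $H_0: \mu_C = \mu_D$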

Page 3: Analysis of Variance Comparisons among multiple populations.

Basic assumptions and the hypothesis testing logic

The observed data are normally distributed with a common, unknown variance $\sigma^2$.

Derive two estimators of $\sigma^2$:

The first is always valid, whether or not the hypothesis $H_0: \mu_A = \mu_B = \mu_C = \mu_D$ is true.

The second is valid only when $H_0$ is true; when $H_0$ is not true, it tends to overestimate $\sigma^2$.

Compare the two estimators through their ratio (2nd/1st), computed from the sample.

If the ratio is too large, reject $H_0$.
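The logic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: equal group sizes, made-up data, and a hypothetical helper name `variance_estimators`; the first estimator is taken to be the average of the group sample variances and the second to be n times the sample variance of the group means.

```python
from statistics import mean, variance

def variance_estimators(groups):
    """Return (first, second) estimators of sigma^2 for equal-sized groups."""
    n = len(groups[0])  # observations per group
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal group sizes")
    # 1st estimator: average of the group sample variances.
    # It does not depend on whether the group means are equal.
    first = mean(variance(g) for g in groups)
    # 2nd estimator: n times the sample variance of the group means.
    # It estimates sigma^2 only when all population means are equal.
    second = n * variance([mean(g) for g in groups])
    return first, second

# Made-up data: three groups, the third clearly shifted upward.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [6.0, 7.0, 8.0]]
first, second = variance_estimators(groups)
ratio = second / first
print(first, second, ratio)  # 1.0 21.0 21.0
```

A ratio far above 1, as here, is the kind of evidence against H0 that the last step refers to; how large is "too large" is decided by the F distribution.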

Page 4: Analysis of Variance Comparisons among multiple populations.

ANOVA testing

ANalysis Of VAriance

To test the considered hypothesis by analyzing the variance $\sigma^2$

Search for the proper estimators

Decomposition of variance …

Page 5: Analysis of Variance Comparisons among multiple populations.

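E.g., decomposition of $S_{yy}$: $S_{yy} = SS_r + SS_m$, i.e. $\sum_i (Y_i - \bar{Y})^2 = \sum_i (Y_i - \hat{Y}_i)^2 + \sum_i (\hat{Y}_i - \bar{Y})^2$

[Figure: scatter of $Y$ against $X$ with the fitted line $\hat{Y}$ and the horizontal line $\bar{Y}$; for a point $Y_i$ the deviations $(Y_i - \bar{Y})$, $(Y_i - \hat{Y})$ and $(\hat{Y} - \bar{Y})$ are marked.]

$(\hat{Y} - \bar{Y})$: the larger this deviation, the less likely it is that $\hat{Y}$ is a horizontal line, because for a horizontal line the sum of these squared differences would be very small.

$(Y_i - \hat{Y})$: the smaller this deviation, the closer the regression line is to the true values and the more accurate the prediction.

$(Y_i - \bar{Y})$: the deviation of $Y$, which is fixed once the sample is given.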
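The regression decomposition on this slide can be checked numerically. The sketch below assumes a simple least-squares line, reads SSr as the residual sum of squares and SSm as the model sum of squares (an interpretation, since the slide does not define them), and uses made-up data and a hypothetical function name.

```python
from statistics import mean

def regression_decomposition(x, y):
    """Return (Syy, residual SS, model SS) for a least-squares line."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]
    syy = sum((yi - y_bar) ** 2 for yi in y)                    # total deviation
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) # Y_i minus fitted
    ss_model = sum((fi - y_bar) ** 2 for fi in fitted)          # fitted minus mean
    return syy, ss_resid, ss_model

# Made-up data lying close to a straight line.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
syy, ss_resid, ss_model = regression_decomposition(x, y)
print(syy, ss_resid + ss_model)  # the two numbers agree up to rounding
```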

Page 6: Analysis of Variance Comparisons among multiple populations.

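Review: $\chi^2$ distribution

$X_1, X_2, X_3, \ldots, X_n$ are independent random variables, each $\sim N(0,1)$:

$\sum_{i=1}^{n} X_i^2 \sim \chi^2_n$

$\sum_{i=1}^{n} (X_i - \bar{X})^2 \sim \chi^2_{n-1}$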
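A small simulation can make the loss of one degree of freedom visible. It relies on the fact that a chi-square variable with k degrees of freedom has mean k; the function name, seed and sample sizes are arbitrary choices for this sketch.

```python
import random

def simulate_sum_of_squares(n, reps, seed=0):
    """Average of sum(Z^2) and of sum((Z - Zbar)^2) over many samples."""
    rng = random.Random(seed)
    total_raw = 0.0
    total_centred = 0.0
    for _ in range(reps):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]  # n independent N(0,1) draws
        z_bar = sum(z) / n
        total_raw += sum(v * v for v in z)               # about n on average
        total_centred += sum((v - z_bar) ** 2 for v in z)  # about n - 1 on average
    return total_raw / reps, total_centred / reps

avg_raw, avg_centred = simulate_sum_of_squares(n=5, reps=20000)
print(avg_raw, avg_centred)  # expect values close to 5 and 4
```

Replacing the true mean by the sample mean costs one degree of freedom, which is the step reused repeatedly in the ANOVA derivations.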

Page 7: Analysis of Variance Comparisons among multiple populations.

One-way ANOVA approach (i)

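$m$ = # of groups, $n$ = # of obs. in each group (equal group size $n$)

$\sum_{i=1}^{m}\sum_{j=1}^{n} (X_{ij} - \mu_i)^2 / \sigma^2 \sim \chi^2_{nm}$

Total d.f. = # of groups × obs. in each group

The $i$th group mean $\mu_i$ is unknown, so replace the group mean by the sample mean $X_{i\cdot} = \frac{1}{n}\sum_{j=1}^{n} X_{ij}$:

$\sum_{i=1}^{m}\sum_{j=1}^{n} (X_{ij} - X_{i\cdot})^2 / \sigma^2 \sim \chi^2_{nm-m}$

Total d.f. = # of groups × (obs. in each group − 1)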

Page 8: Analysis of Variance Comparisons among multiple populations.

One-way ANOVA approach (ii)

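By definition, $SS_w = \sum_{i=1}^{m}\sum_{j=1}^{n} (X_{ij} - X_{i\cdot})^2$ is called the "within samples sum of squares".

$SS_w/\sigma^2 \sim \chi^2_{nm-m}$

$Var(X_{i\cdot}) = Var(X_{ij})/n = \sigma^2/n$, so if all group means equal $\mu$, then $\sqrt{n}\,(X_{i\cdot} - \mu)/\sigma \sim N(0,1)$ and $n\sum_{i=1}^{m} (X_{i\cdot} - \mu)^2/\sigma^2 \sim \chi^2_m$.

Replace the total mean $\mu$ by the total sample mean $X_{\cdot\cdot}$: d.f. = # of groups − 1

i.e., assume all $X_{i\cdot}$ population means are equal to $\mu$ (that is, $H_0$ holds) in order to replace $\mu$ by the total sample mean $X_{\cdot\cdot}$.

By definition, $SS_b = n\sum_{i=1}^{m} (X_{i\cdot} - X_{\cdot\cdot})^2$ (group sample mean minus total sample mean) is called the "between samples sum of squares".

$SS_b/\sigma^2 \sim \chi^2_{m-1}$ under $H_0$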

Page 9: Analysis of Variance Comparisons among multiple populations.

One-way ANOVA approach (iii)

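$TS = \frac{SS_b/(m-1)}{SS_w/(nm-m)} \sim F_{m-1,\,nm-m}$ under $H_0$

Reject $H_0$ when $TS > F_{m-1,\,nm-m,\,\alpha}$ (the upper-$\alpha$ critical value), i.e., when the numerator is sufficiently large while the denominator is smaller.

Or reject $H_0$ when the p-value $< \alpha$.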

Page 10: Analysis of Variance Comparisons among multiple populations.

Decomposition of Var(Xij) (i)

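[Figure: observations $X_{ij}$ in the A, B and C groups, with the total mean $X_{\cdot\cdot}$ and the group means $X_{A\cdot}$, $X_{B\cdot}$, $X_{C\cdot}$.]

Total deviation = between deviation + within deviation: $X_{ij} - X_{\cdot\cdot} = (X_{i\cdot} - X_{\cdot\cdot}) + (X_{ij} - X_{i\cdot})$

If the differences between groups are small, the deviation from the center is mostly caused by the within-group randomness.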

Page 11: Analysis of Variance Comparisons among multiple populations.

Decomposition of Var(Xij) (ii)

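$\sum_{i=1}^{m}\sum_{j=1}^{n} (X_{ij} - X_{\cdot\cdot})^2 = \sum_{i=1}^{m}\sum_{j=1}^{n} X_{ij}^2 - nm X_{\cdot\cdot}^2$

Usually, define $SST$ as the total sum of squares:

$SST = \sum_{i=1}^{m}\sum_{j=1}^{n} (X_{ij} - X_{\cdot\cdot})^2 = \sum_{i=1}^{m}\sum_{j=1}^{n} (X_{ij} - X_{i\cdot} + X_{i\cdot} - X_{\cdot\cdot})^2 = \sum_{i=1}^{m}\sum_{j=1}^{n} (X_{ij} - X_{i\cdot})^2 + n\sum_{i=1}^{m} (X_{i\cdot} - X_{\cdot\cdot})^2 = SS_w + SS_b$

(the cross term vanishes because $\sum_{j=1}^{n} (X_{ij} - X_{i\cdot}) = 0$ for every $i$)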
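The decomposition of the total sum of squares can be verified on any balanced data set. The following sketch, with made-up numbers and a hypothetical function name, computes each piece directly and also the cross term, which should come out as zero up to rounding.

```python
def sums_of_squares(groups):
    """Return (SST, SSw, SSb, cross term, shortcut SST) for equal-sized groups."""
    m, n = len(groups), len(groups[0])
    group_means = [sum(g) / n for g in groups]
    total_mean = sum(sum(g) for g in groups) / (n * m)
    sst = sum((x - total_mean) ** 2 for g in groups for x in g)
    ssw = sum((x - gm) ** 2 for g, gm in zip(groups, group_means) for x in g)
    ssb = n * sum((gm - total_mean) ** 2 for gm in group_means)
    # Cross term of the expanded square; it is zero because deviations
    # from a group mean sum to zero inside each group.
    cross = sum((x - gm) * (gm - total_mean)
                for g, gm in zip(groups, group_means) for x in g)
    # Shortcut form: sum of squares minus nm times the squared total mean.
    shortcut = sum(x * x for g in groups for x in g) - n * m * total_mean ** 2
    return sst, ssw, ssb, cross, shortcut

groups = [[3.0, 5.0, 4.0], [6.0, 8.0, 7.0], [1.0, 2.0, 3.0]]  # made-up data
sst, ssw, ssb, cross, shortcut = sums_of_squares(groups)
print(sst, ssw + ssb, cross, shortcut)
```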

Page 12: Analysis of Variance Comparisons among multiple populations.

If H0 is not accepted?

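$X_{i\cdot} \sim N(\mu_i, \sigma^2/n)$

Set $\mu_\cdot = \frac{1}{m}\sum_{i=1}^{m} \mu_i$ & $Y_i = X_{i\cdot} - \mu_i + \mu_\cdot$, so that $Y_i \sim N(\mu_\cdot, \sigma^2/n)$

∵ $X_{i\cdot} = Y_i + \mu_i - \mu_\cdot$ and $X_{\cdot\cdot} = Y_\cdot$,

$E[(X_{i\cdot} - X_{\cdot\cdot})^2] = E[(Y_i - Y_\cdot)^2] + (\mu_i - \mu_\cdot)^2 + 2(\mu_i - \mu_\cdot)E[Y_i - Y_\cdot]$

where the random deviation $Y_i - Y_\cdot$ has mean $E[Y_i] - E[Y_\cdot] = \mu_\cdot - \mu_\cdot = 0$

Hence $E[SS_b/(m-1)] = \sigma^2 + \frac{n}{m-1}\sum_{i=1}^{m} (\mu_i - \mu_\cdot)^2 \ge \sigma^2$, with equality only when all $\mu_i$ are equal.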

Page 13: Analysis of Variance Comparisons among multiple populations.

ANOVA table

$SST = SS_b + SS_w$

If p-value $< \alpha$, reject $H_0$.

Page 14: Analysis of Variance Comparisons among multiple populations.

The meaning of ANOVA table

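Source    Sum of squares   Degrees of freedom               Mean square               F
Between   $SS_b$           $m-1$ (# of groups − 1)          $MS_b = SS_b/(m-1)$       $MS_b/MS_w$
Within    $SS_w$           $nm-m = m(n-1)$                  $MS_w = SS_w/(nm-m)$
Total     $SST$            $nm-1$ (# of observations − 1)

$SST = SS_b + SS_w$: the larger $SS_b$ is, the smaller $SS_w$ will be. With the number of groups $m$ unchanged, $MS_b$ then becomes larger and $MS_w$ smaller, so the F value is larger and $H_0$ (no difference among the group means) is more likely to be rejected. In other words, the deviation of the observed variable $X_{ij}$ from the total mean $X_{\cdot\cdot}$ is then mostly caused by the differences among the group means $X_{i\cdot}$.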
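As an illustration, the rows of the table can be produced by a short function. This is a sketch for the balanced case only, with made-up data; `anova_table` and `show` are hypothetical names, and no p-value is computed because the standard library has no F distribution.

```python
def anova_table(groups):
    """Rows (source, SS, d.f., MS, F) of a one-way ANOVA table, balanced case."""
    m, n = len(groups), len(groups[0])
    group_means = [sum(g) / n for g in groups]
    total_mean = sum(group_means) / m  # valid because all groups have size n
    ssb = n * sum((gm - total_mean) ** 2 for gm in group_means)
    ssw = sum((x - gm) ** 2 for g, gm in zip(groups, group_means) for x in g)
    msb = ssb / (m - 1)
    msw = ssw / (n * m - m)
    return [
        ("Between", ssb, m - 1, msb, msb / msw),
        ("Within", ssw, n * m - m, msw, None),
        ("Total", ssb + ssw, n * m - 1, None, None),
    ]

def show(value):
    """Format a number, leaving empty table cells blank."""
    return "" if value is None else f"{value:.3f}"

data = [[5.0, 7.0, 6.0, 6.0], [8.0, 9.0, 7.0, 8.0], [4.0, 5.0, 6.0, 5.0]]  # made-up
rows = anova_table(data)
print(f"{'Source':<10}{'SS':>10}{'d.f.':>6}{'MS':>10}{'F':>10}")
for source, ss, df, ms, f in rows:
    print(f"{source:<10}{ss:>10.3f}{df:>6}{show(ms):>10}{show(f):>10}")
```

The F value in the first row would then be compared with the critical value of the F distribution with (m−1, nm−m) degrees of freedom.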

Page 15: Analysis of Variance Comparisons among multiple populations.

Unbalanced case—unequal sample size within the groups

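Different group sizes $n_i$, $i = 1, \ldots, m$

Conditional estimator of $\sigma^2$ (valid only when $H_0$ is true): $SS_b/(m-1)$, with $SS_b = \sum_{i=1}^{m} n_i (X_{i\cdot} - X_{\cdot\cdot})^2$

Unconditional estimator of $\sigma^2$ (valid whether or not $H_0$ is true): $SS_w/(\sum_{i=1}^{m} n_i - m)$, with $SS_w = \sum_{i=1}^{m}\sum_{j=1}^{n_i} (X_{ij} - X_{i\cdot})^2$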

Page 16: Analysis of Variance Comparisons among multiple populations.

Unbalanced F-test for ANOVA

A balanced design is preferable to an unbalanced one because its F-test is less sensitive to slight departures from the assumption of equal population variances.
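For unequal group sizes the test statistic can be sketched as follows. The formulas used here are the standard ones and are stated as assumptions, since the slide's own equations did not survive: SSb weights each squared group-mean deviation by its group size n_i, SSw has (sum of n_i) − m degrees of freedom, and the total mean is the mean of all observations. The function name and data are made up.

```python
def unbalanced_f_statistic(groups):
    """Return (F statistic, (df_between, df_within)) for groups of any sizes."""
    m = len(groups)
    sizes = [len(g) for g in groups]
    total_n = sum(sizes)
    group_means = [sum(g) / len(g) for g in groups]
    # Total mean over all observations (not the plain average of group means).
    total_mean = sum(sum(g) for g in groups) / total_n
    ssb = sum(ni * (gm - total_mean) ** 2 for ni, gm in zip(sizes, group_means))
    ssw = sum((x - gm) ** 2 for g, gm in zip(groups, group_means) for x in g)
    df_between, df_within = m - 1, total_n - m
    f_stat = (ssb / df_between) / (ssw / df_within)
    return f_stat, (df_between, df_within)

groups = [[1.0, 2.0, 3.0], [4.0, 6.0], [7.0, 8.0, 9.0, 10.0]]  # made-up data
f_stat, dfs = unbalanced_f_statistic(groups)
print(f_stat, dfs)
```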

Page 17: Analysis of Variance Comparisons among multiple populations.

Two classification factors

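                                             Column factor
Row factor   A                                           B                                           C                                           D
a            $X_{a,A,1}, X_{a,A,2}, \ldots, X_{a,A,k}$   $X_{a,B,1}, X_{a,B,2}, \ldots, X_{a,B,k}$   $X_{a,C,1}, X_{a,C,2}, \ldots, X_{a,C,k}$   $X_{a,D,1}, X_{a,D,2}, \ldots, X_{a,D,k}$
b            $X_{b,A,1}, X_{b,A,2}, \ldots, X_{b,A,k}$   $X_{b,B,1}, X_{b,B,2}, \ldots, X_{b,B,k}$   $X_{b,C,1}, X_{b,C,2}, \ldots, X_{b,C,k}$   $X_{b,D,1}, X_{b,D,2}, \ldots, X_{b,D,k}$
c            $X_{c,A,1}, X_{c,A,2}, \ldots, X_{c,A,k}$   $X_{c,B,1}, X_{c,B,2}, \ldots, X_{c,B,k}$   $X_{c,C,1}, X_{c,C,2}, \ldots, X_{c,C,k}$   $X_{c,D,1}, X_{c,D,2}, \ldots, X_{c,D,k}$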

Page 18: Analysis of Variance Comparisons among multiple populations.

Two-way ANOVA approach (i)

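Row factor: $m$ types; column factor: $n$ types

Only one observation within each cell

Review ($\alpha_i = \mu_i - \mu$, the deviation from the total $\mu$):

$\sum_{i=1}^{m} \alpha_i = \sum_{i=1}^{m} (\mu_i - \mu) = \sum_{i=1}^{m} \mu_i - m\mu = 0$  (∵ $\mu = \frac{1}{m}\sum_{i=1}^{m} \mu_i$)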

Page 19: Analysis of Variance Comparisons among multiple populations.

Two-way ANOVA approach (ii)

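$E[X_{ij}] = \mu_{ij}$: the cell mean (of a cell of size $k$ or other)

Suppose an additive model for the cell mean, composed of $a_i$ and $b_j$: $\mu_{ij} = a_i + b_j$

Average row factor: $a_\cdot = \frac{1}{m}\sum_{i=1}^{m} a_i$

Average column factor: $b_\cdot = \frac{1}{n}\sum_{j=1}^{n} b_j$

The $i$th row mean: $\mu_{i\cdot} = \frac{1}{n}\sum_{j=1}^{n} \mu_{ij} = a_i + b_\cdot$ (the average column factor + the specific $i$th row factor)

The $j$th column mean: $\mu_{\cdot j} = \frac{1}{m}\sum_{i=1}^{m} \mu_{ij} = a_\cdot + b_j$

The total mean: $\mu = a_\cdot + b_\cdot$

Deviation from the average row factor: $\alpha_i = a_i - a_\cdot = \mu_{i\cdot} - \mu$ & deviation from the average column factor: $\beta_j = b_j - b_\cdot = \mu_{\cdot j} - \mu$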

Page 20: Analysis of Variance Comparisons among multiple populations.

Two-way ANOVA approach (iii)

Page 21: Analysis of Variance Comparisons among multiple populations.

Two-way ANOVA approach (iv)

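The assumed two-way ANOVA model: $E[X_{ij}] = \mu + \alpha_i + \beta_j$, with $\sum_{i=1}^{m} \alpha_i = 0$ & $\sum_{j=1}^{n} \beta_j = 0$

i.e., the expected value of a specific $(i,j)$ cell can be decomposed into: the total mean + the $i$th deviation from the average row factor (the $i$th row deviation from the total mean) + the $j$th deviation from the average column factor (the $j$th column deviation from the total mean).

Use the unbiased estimators to test the hypothesis of interest: $\hat{\mu} = X_{\cdot\cdot}$, $\hat{\alpha}_i = X_{i\cdot} - X_{\cdot\cdot}$, $\hat{\beta}_j = X_{\cdot j} - X_{\cdot\cdot}$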

Page 22: Analysis of Variance Comparisons among multiple populations.

Two-way ANOVA approach (v)

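$\sum_{i=1}^{m}\sum_{j=1}^{n} (X_{ij} - \mu - \alpha_i - \beta_j)^2/\sigma^2 \sim \chi^2_{nm}$

Apply each unbiased estimator ($\hat{\mu}$, $\hat{\alpha}_i$, $\hat{\beta}_j$) in place of the unknown parameters:

$\sum_{i=1}^{m}\sum_{j=1}^{n} (X_{ij} - X_{i\cdot} - X_{\cdot j} + X_{\cdot\cdot})^2/\sigma^2 \sim \chi^2_{(n-1)(m-1)}$

Estimating the $\beta_j$ reduces $n-1$ d.f., estimating the $\alpha_i$ reduces $m-1$ d.f., and estimating $\mu$ reduces 1 d.f.: $nm - (n-1) - (m-1) - 1 = (n-1)(m-1)$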

Page 23: Analysis of Variance Comparisons among multiple populations.

Two-way ANOVA approach (vi)

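If $H_0$: all $\alpha_i = 0$ is true, then $X_{i\cdot} \sim N(\mu, \sigma^2/n)$ and

$n\sum_{i=1}^{m} (X_{i\cdot} - \mu)^2/\sigma^2 \sim \chi^2_m$

Define the row sum of squares $SS_{row} = n\sum_{i=1}^{m} (X_{i\cdot} - X_{\cdot\cdot})^2$; then

$SS_{row}/\sigma^2 \sim \chi^2_{m-1}$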

Page 24: Analysis of Variance Comparisons among multiple populations.

Two way ANOVA table

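Source   Sum of squares                                                                                       Degrees of freedom   F
Row      $SS_{row} = n\sum_{i=1}^{m} (X_{i\cdot} - X_{\cdot\cdot})^2$                                         $m-1$                $\frac{SS_{row}/(m-1)}{SS_e/((n-1)(m-1))}$
Column   $SS_{col} = m\sum_{j=1}^{n} (X_{\cdot j} - X_{\cdot\cdot})^2$                                        $n-1$                $\frac{SS_{col}/(n-1)}{SS_e/((n-1)(m-1))}$
Error    $SS_e = \sum_{i=1}^{m}\sum_{j=1}^{n} (X_{ij} - X_{i\cdot} - X_{\cdot j} + X_{\cdot\cdot})^2$         $(n-1)(m-1)$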
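A two-way table with one observation per cell can be analysed with a few lines of Python. The sketch assumes the additive model with m row types and n column types, uses the usual row, column and error sums of squares (stated here as assumptions because the slide's table is not legible), and works on made-up data; `two_way_anova` is a hypothetical name.

```python
def two_way_anova(table):
    """table[i][j] is the single observation for row type i, column type j."""
    m, n = len(table), len(table[0])
    row_means = [sum(row) / n for row in table]
    col_means = [sum(table[i][j] for i in range(m)) / m for j in range(n)]
    grand = sum(row_means) / m
    ss_row = n * sum((r - grand) ** 2 for r in row_means)
    ss_col = m * sum((c - grand) ** 2 for c in col_means)
    # Error: what is left after removing the row and column effects.
    ss_err = sum((table[i][j] - row_means[i] - col_means[j] + grand) ** 2
                 for i in range(m) for j in range(n))
    df_err = (n - 1) * (m - 1)
    ms_err = ss_err / df_err
    return {
        "ss_row": ss_row, "ss_col": ss_col, "ss_err": ss_err, "df_err": df_err,
        "f_row": (ss_row / (m - 1)) / ms_err,  # compare with F(m-1, df_err)
        "f_col": (ss_col / (n - 1)) / ms_err,  # compare with F(n-1, df_err)
    }

table = [[10.0, 12.0, 15.0], [11.0, 14.0, 15.0], [14.0, 15.0, 19.0]]  # made-up
result = two_way_anova(table)
print(result)
```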

Page 25: Analysis of Variance Comparisons among multiple populations.

Two-way ANOVA with interaction (i)

Page 26: Analysis of Variance Comparisons among multiple populations.

Two-way ANOVA with interaction (ii)

Page 27: Analysis of Variance Comparisons among multiple populations.

Two-way ANOVA with interaction (iii)

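$l$ observations within each cell: $X_{ijk}$, $k = 1, \ldots, l$, with cell mean $\mu_{ij} = E[X_{ijk}]$

$\sum_{i=1}^{m}\sum_{j=1}^{n}\sum_{k=1}^{l} (X_{ijk} - \mu_{ij})^2/\sigma^2 \sim \chi^2_{nml}$

Replace each cell mean by the cell sample mean $X_{ij\cdot}$: $\sum_{i=1}^{m}\sum_{j=1}^{n}\sum_{k=1}^{l} (X_{ijk} - X_{ij\cdot})^2/\sigma^2 \sim \chi^2_{nm(l-1)}$

i.e., with $SS_e = \sum_{i=1}^{m}\sum_{j=1}^{n}\sum_{k=1}^{l} (X_{ijk} - X_{ij\cdot})^2$, $SS_e/\sigma^2 \sim \chi^2_{nm(l-1)}$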

Page 28: Analysis of Variance Comparisons among multiple populations.

Two-way ANOVA with interaction (iv)

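If $H_0$: $\mu_{ij} = \mu + \alpha_i + \beta_j$ (no interaction) is true: $l\sum_{i=1}^{m}\sum_{j=1}^{n} (X_{ij\cdot} - \mu - \alpha_i - \beta_j)^2/\sigma^2 \sim \chi^2_{nm}$

With $SS_{int} = l\sum_{i=1}^{m}\sum_{j=1}^{n} (X_{ij\cdot} - X_{i\cdot\cdot} - X_{\cdot j\cdot} + X_{\cdot\cdot\cdot})^2$: $SS_{int}/\sigma^2 \sim \chi^2_{(n-1)(m-1)}$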

Page 29: Analysis of Variance Comparisons among multiple populations.

Two-way ANOVA with interaction (v)

Page 30: Analysis of Variance Comparisons among multiple populations.

Two-way ANOVA with interaction (vi)

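If $H_0$: all $\alpha_i = 0$ is true: $nl\sum_{i=1}^{m} (X_{i\cdot\cdot} - \mu)^2/\sigma^2 \sim \chi^2_m$

With the row sum of squares $SS_{row} = nl\sum_{i=1}^{m} (X_{i\cdot\cdot} - X_{\cdot\cdot\cdot})^2$: $SS_{row}/\sigma^2 \sim \chi^2_{m-1}$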
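With l observations in every cell, the interaction can be separated from the error. The sketch below assumes a balanced layout and the standard sums of squares (an assumption, since the slide's formulas are not legible); the function name and data are made up.

```python
def two_way_anova_interaction(cells):
    """cells[i][j] is the list of l observations for row type i, column type j."""
    m, n, l = len(cells), len(cells[0]), len(cells[0][0])
    cell_means = [[sum(c) / l for c in row] for row in cells]
    row_means = [sum(r) / n for r in cell_means]
    col_means = [sum(cell_means[i][j] for i in range(m)) / m for j in range(n)]
    grand = sum(row_means) / m
    # Error: variation of observations around their own cell mean.
    ss_err = sum((x - cell_means[i][j]) ** 2
                 for i in range(m) for j in range(n) for x in cells[i][j])
    # Interaction: part of the cell means not explained by row + column effects.
    ss_int = l * sum((cell_means[i][j] - row_means[i] - col_means[j] + grand) ** 2
                     for i in range(m) for j in range(n))
    ss_row = n * l * sum((r - grand) ** 2 for r in row_means)
    ss_col = m * l * sum((c - grand) ** 2 for c in col_means)
    ms_err = ss_err / (n * m * (l - 1))
    return {
        "ss_err": ss_err, "ss_int": ss_int, "ss_row": ss_row, "ss_col": ss_col,
        "f_int": (ss_int / ((n - 1) * (m - 1))) / ms_err,
        "f_row": (ss_row / (m - 1)) / ms_err,
        "f_col": (ss_col / (n - 1)) / ms_err,
    }

# Made-up 2 x 2 layout with l = 2 observations per cell.
cells = [[[1.0, 3.0], [4.0, 6.0]], [[5.0, 7.0], [12.0, 14.0]]]
out = two_way_anova_interaction(cells)
print(out["f_int"], out["f_row"], out["f_col"])  # 4.0 36.0 25.0
```

Each F value is compared with the F distribution whose numerator degrees of freedom match the tested effect and whose denominator degrees of freedom are nm(l−1).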

Page 31: Analysis of Variance Comparisons among multiple populations.

Homework #3: Problems 5, 15, 19, 20, 25