Chapter 3 Analysis of Variance (ANOVA)


Chapter 3 Analysis of Variance

(ANOVA)

許湘伶

Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li), Chapter 16

Design and Analysis of Experiments (Douglas C. Montgomery)

hsuhl (NUK) DAE Chap. 3 1 / 46


Part I

Supplement


Relation between Regression and Analysis of Variance

Regression model:

$$y_i = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_k X_{ki} + \varepsilon_i, \quad i = 1, \ldots, n$$

ANOVA model (one-way model):

$$y_{ij} = \mu_i + \varepsilon_{ij} = \mu + \tau_i + \varepsilon_{ij}, \quad i = 1, \ldots, a; \; j = 1, \ldots, n$$


Relation between Regression and Analysis of Variance (cont.)

Analysis of variance models differ from ordinary regression models in two key respects:

1 The explanatory or predictor variables in ANOVA models may be qualitative.

2 If the predictor variables are quantitative, no assumption is made in ANOVA models about the nature of the statistical relation between the Xs and Y.


Relation between Regression and Analysis of Variance (cont.)

When indicator variables are so used with regression models, the regression results will be identical to those obtained with ANOVA models.

ANOVA models and regression models with indicator variables will lead to identical results.
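The equivalence can be checked numerically. The sketch below is only an illustration: the etch-rate values are an assumption (the data table is not reproduced in these slides; the numbers are the Montgomery plasma etching data as commonly tabulated and should be checked against the textbook), and the object names `fit_lm` and `fit_aov` are hypothetical.

```r
# Etch-rate data (assumed values; not listed in the slides)
etch <- data.frame(
  RF   = rep(c(160, 180, 200, 220), each = 5),
  rate = c(575, 542, 530, 539, 570,
           565, 593, 590, 579, 610,
           600, 651, 610, 637, 629,
           725, 700, 715, 685, 710)
)
etch$FRF <- factor(etch$RF)

# Regression with indicator variables: R builds them from the factor
fit_lm  <- lm(rate ~ FRF, data = etch)
# One-way ANOVA on the same data
fit_aov <- aov(rate ~ FRF, data = etch)

F_lm  <- anova(fit_lm)[["F value"]][1]
F_aov <- summary(fit_aov)[[1]][["F value"]][1]

coef(fit_lm)                  # intercept = mean at 160 W, others = differences
c(F_lm = F_lm, F_aov = F_aov) # the two F statistics coincide
```

With R's default treatment coding, the intercept estimates the mean of the first level and each remaining coefficient estimates a difference from it, so the fitted means are the same as in the ANOVA model.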


Relation between Regression and Analysis of Variance (cont.)

Figure: Regression model

Figure 16.4: Illustration of partitioning of total deviations $Y_{ij} - \bar{Y}_{\cdot\cdot}$

Part II

Chapter 3 The Analysis of Variance


Outline

1 Example
2 The analysis of variance
3 Analysis of the fixed effects model
4 Model adequacy checking
5 Practical interpretation of results
6 Sample computer output
7 Determining sample size
8 Other examples of single-factor experiments
9 The random effects model
10 The regression approach to the ANOVA
11 Nonparametric methods in the ANOVA


Example

methods for the design and analysis of single-factor experiments with a levels of the factor (or a treatments)

Assume: completely randomized design

Wafer example. Relationship: RF power setting vs. the etch rate

- RF power: 4 levels: 160, 180, 200, 220 W
- Etch rate: how fast material is removed from the wafer surface

n = 5 replicates, i.e., 20 runs in random order


Example (cont.)

no strong evidence to suggest that the variability in the etch rate around the average depends on the power setting

Test: differences between the mean etch rates at a = 4 levels of RF power

1 t-test for all six possible pairs of means: inflates the type I error
2 the analysis of variance
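A rough calculation shows why the pairwise approach inflates the type I error. The sketch below assumes, purely for illustration, that the six tests are independent and each run at level 0.05; the pairwise tests actually share data, so the true rate differs, but the direction of the effect is the same. The variable names are hypothetical.

```r
a       <- 4                 # number of treatments
alpha   <- 0.05              # level of each individual t-test (assumed)
n_pairs <- choose(a, 2)      # number of pairwise comparisons

# Probability of at least one false rejection if the tests were independent
fwer <- 1 - (1 - alpha)^n_pairs

n_pairs
fwer
```

Under these assumptions the chance of at least one false rejection is about 0.26 rather than 0.05, which is the motivation for a single overall test.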


ANOVA

a treatments of a single factor

$y_{ij}$: the $j$th observation taken under treatment $i$

means model:

$$y_{ij} = \mu_i + \varepsilon_{ij}, \quad i = 1, 2, \ldots, a; \; j = 1, 2, \ldots, n$$


ANOVA (cont.)

Model:

$$
\begin{aligned}
y_{ij} &= \mu_i + \varepsilon_{ij} && \text{(means model)} \\
       &= \mu + \tau_i + \varepsilon_{ij} && \text{(effects model)}
\end{aligned}
\qquad i = 1, 2, \ldots, a; \; j = 1, 2, \ldots, n
$$

$y_{ij}$: the $(i,j)$th observation
$\mu_i$: the mean of the $i$th factor level
$\mu$: overall mean
$\tau_i$: the $i$th treatment effect
$\varepsilon_{ij}$: the random error component; sources of variability:

- measurement
- variability from uncontrolled factors
- differences between the experimental units
- noise in the process


ANOVA (cont.)

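$$
\begin{aligned}
y_{ij} &= \mu_i + \varepsilon_{ij} && \text{(means model)} \\
       &= \mu + \tau_i + \varepsilon_{ij} && \text{(effects model)}
\end{aligned}
\qquad i = 1, 2, \ldots, a; \; j = 1, 2, \ldots, n
$$

both are linear statistical models

one-way or single-factor analysis of variance model

the effects model is more widely encountered in the experimental design literature

objectives:

- test hypotheses about the treatment means
- estimate the model parameters $(\mu, \tau_i, \sigma^2)$

$$\varepsilon_{ij} \sim \mathrm{NID}(0, \sigma^2) \;\Rightarrow\; y_{ij} \sim N(\mu + \tau_i, \sigma^2)$$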


ANOVA (cont.)

fixed effects model: the a treatments are specifically chosen by the experimenter

random effects model (components of variance model) (Chap. 3.9; Chap. 13): the a treatments could be a random sample from a larger population of treatments


Notation

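$y_{i\cdot}$: the total of the observations under the $i$th treatment

$\bar{y}_{i\cdot}$: the average of the observations under the $i$th treatment

$y_{\cdot\cdot}$: the grand total of all the observations

$\bar{y}_{\cdot\cdot}$: the grand average of all the observations

$$y_{i\cdot} = \sum_{j=1}^{n} y_{ij}, \qquad \bar{y}_{i\cdot} = y_{i\cdot}/n, \qquad i = 1, 2, \ldots, a$$

$$y_{\cdot\cdot} = \sum_{i=1}^{a} \sum_{j=1}^{n} y_{ij}, \qquad \bar{y}_{\cdot\cdot} = y_{\cdot\cdot}/N, \qquad N = an$$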
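The dot notation maps directly onto grouped sums and means. The sketch below is an illustration only: the etch-rate values are an assumption (the data table is not reproduced in these slides; they are the Montgomery plasma etching data as commonly tabulated), and the variable names are hypothetical.

```r
# Etch-rate data (assumed values; not listed in the slides)
etch <- data.frame(
  RF   = rep(c(160, 180, 200, 220), each = 5),
  rate = c(575, 542, 530, 539, 570,
           565, 593, 590, 579, 610,
           600, 651, 610, 637, 629,
           725, 700, 715, 685, 710)
)
a <- 4; n <- 5; N <- a * n

y_i_total <- tapply(etch$rate, etch$RF, sum)  # y_i. : treatment totals
y_i_bar   <- y_i_total / n                    # ybar_i. : treatment averages
y_total   <- sum(etch$rate)                   # y_.. : grand total
y_bar     <- y_total / N                      # ybar_.. : grand average

y_i_total
y_i_bar
c(y_total = y_total, y_bar = y_bar)
```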


Testing

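Testing the equality of the a treatment means, $E(y_{ij}) = \mu + \tau_i = \mu_i$:

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_a \quad \text{vs.} \quad H_a: \mu_i \neq \mu_j \text{ for at least one pair } (i, j)$$

$$\Leftrightarrow \quad H_0: \tau_1 = \tau_2 = \cdots = \tau_a = 0 \quad \text{vs.} \quad H_a: \tau_i \neq 0 \text{ for at least one } i$$

since

$$\frac{\sum_{i=1}^{a} \mu_i}{a} = \mu \quad \Leftrightarrow \quad \sum_{i=1}^{a} \tau_i = 0$$

The appropriate procedure for testing the equality of a treatment means is the analysis of variance.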


Decomposition of the Total Sum of Squares

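ANOVA: derived from a partitioning of total variability into its component parts

1 $SS_T$: the total corrected sum of squares
2 $SS_{\text{Treatment}}$: the sum of squares due to treatments (between treatments)
3 $SS_E$: the sum of squares due to error (within treatments)

$$
\underbrace{SS_T}_{N-1}
= \sum_{i=1}^{a} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{\cdot\cdot})^2
= n \sum_{i=1}^{a} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 + \sum_{i=1}^{a} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{i\cdot})^2
= \underbrace{SS_{\text{Treatment}}}_{a-1} + \underbrace{SS_E}_{N-a}
$$

(the quantity under each brace is the number of degrees of freedom)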


Decomposition of the Total Sum of Squares (cont.)

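Total variability and its partition:

1 the total corrected sum of squares

$$SS_T = \sum_{i=1}^{a} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{\cdot\cdot})^2 = \sum_{i=1}^{a} \sum_{j=1}^{n} y_{ij}^2 - \frac{y_{\cdot\cdot}^2}{N}$$

2 a sum of squares of the differences between the treatment averages and the grand average

$$SS_{\text{Treatment}} = n \sum_{i=1}^{a} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 = \frac{1}{n} \sum_{i=1}^{a} y_{i\cdot}^2 - \frac{y_{\cdot\cdot}^2}{N}$$

3 a sum of squares of the differences of observations within treatments from the treatment average

$$SS_E = \sum_{i=1}^{a} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{i\cdot})^2 = SS_T - SS_{\text{Treatment}}$$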
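The three sums of squares and the ANOVA identity can be verified directly. This is a sketch under the assumption that the etch-rate values below are the ones behind the slides (the data table is not reproduced here; the numbers are the Montgomery plasma etching data as commonly tabulated). Variable names are hypothetical.

```r
# Etch-rate data (assumed values; not listed in the slides)
etch <- data.frame(
  RF   = rep(c(160, 180, 200, 220), each = 5),
  rate = c(575, 542, 530, 539, 570,
           565, 593, 590, 579, 610,
           600, 651, 610, 637, 629,
           725, 700, 715, 685, 710)
)
n <- 5; N <- nrow(etch)

y_bar   <- mean(etch$rate)           # grand average
y_bar_i <- ave(etch$rate, etch$RF)   # treatment average, repeated per observation

# Definitional forms
SST  <- sum((etch$rate - y_bar)^2)
SSTr <- sum((y_bar_i - y_bar)^2)     # same as n * sum over the a treatment means
SSE  <- sum((etch$rate - y_bar_i)^2)

# Computational (shortcut) forms
SST_short  <- sum(etch$rate^2) - sum(etch$rate)^2 / N
SSTr_short <- sum(tapply(etch$rate, etch$RF, sum)^2) / n - sum(etch$rate)^2 / N

c(SST = SST, SSTr = SSTr, SSE = SSE)
```

The values agree with the sums of squares reported in the R output later in these slides (66870.55 for treatments and 5339.20 for error).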


Decomposition of the Total Sum of Squares (cont.)

1 a pooled estimate of the common variance within each of the a treatments:

$$\frac{SS_E}{N - a}$$

2 an estimate of $\sigma^2$ if the $\mu_i$ are all equal:

$$\frac{SS_{\text{Treatment}}}{a - 1}$$

3 ANOVA identity: provides two estimates of $\sigma^2$


Decomposition of the Total Sum of Squares (cont.)

Error mean square ($MS_E$):

1 $MS_E = \dfrac{SS_E}{N - a}$
2 $E(MS_E) = \sigma^2$

Treatment mean square ($MS_{\text{Treatment}}$):

1 $MS_{\text{Treatment}} = \dfrac{SS_{\text{Treatment}}}{a - 1}$
2 $E(MS_{\text{Treatment}}) = \sigma^2 + \dfrac{n \sum_{i=1}^{a} \tau_i^2}{a - 1}$
3 if there are no differences in treatment means (i.e., $\tau_i = 0$ for all $i$), $MS_{\text{Treatment}}$ also estimates $\sigma^2$

A test of the hypothesis of no difference in treatment means can be performed by comparing $MS_{\text{Treatment}}$ and $MS_E$


Statistical Analysis

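Assumptions

$$\varepsilon_{ij} \sim \mathrm{NID}(0, \sigma^2) \;\Rightarrow\; y_{ij} \sim N(\mu + \tau_i, \sigma^2), \text{ independently}$$

Cochran's Theorem

$SS_T$: a sum of squares in normally distributed r.v.

1 $SS_T/\sigma^2 \sim \chi^2_{N-1}$
2 $SS_{\text{Treatment}}/\sigma^2 \sim \chi^2_{a-1}$ if $H_0: \tau_i = 0$ is true
3 $SS_E/\sigma^2 \sim \chi^2_{N-a}$
4 $SS_{\text{Treatment}}/\sigma^2$ and $SS_E/\sigma^2$ are independent $\chi^2$ r.v.

$\Rightarrow$ test statistic:

$$F_0 = \frac{SS_{\text{Treatment}}/(a - 1)}{SS_E/(N - a)} = \frac{MS_{\text{Treatment}}}{MS_E} \overset{H_0}{\sim} F_{a-1,\,N-a}$$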


Statistical Analysis (cont.)

Cochran’s Theorem

Let $Z_i \sim \mathrm{NID}(0, 1)$, $i = 1, \ldots, \nu$, and

$$\sum_{i=1}^{\nu} Z_i^2 = \sum_{i=1}^{s} Q_i,$$

where $s \le \nu$ and $Q_i$ has $\nu_i$ d.f. ($i = 1, \ldots, s$). Then $Q_1, \ldots, Q_s$ are independent $\chi^2_{\nu_i}$ r.v. if and only if

$$\nu = \sum_{i=1}^{s} \nu_i$$


Statistical Analysis (cont.)

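If $H_0$ is false, $E(MS_{\text{Treatment}}) > E(MS_E) = \sigma^2$, so $MS_{\text{Treatment}}$ tends to exceed $MS_E$

$\Rightarrow$ reject $H_0$ if $F_0$ is too large, i.e., $F_0 > F_{\alpha,\,a-1,\,N-a}$

ANOVA table:

Source of variation        Sum of squares   d.f.    Mean square     F0
Between treatments         SS_Treatment     a - 1   MS_Treatment    MS_Treatment / MS_E
Error (within treatments)  SS_E             N - a   MS_E
Total                      SS_T             N - 1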


Statistical Analysis (cont.)

The Plasma Etching Experiment

H0 : µ1 = µ2 = µ3 = µ4 vs. H1 : some means are different


Statistical Analysis (cont.)

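## ANOVA table
etch$FRF <- as.factor(etch$RF)             # treat RF power as a qualitative factor
etch.aov <- aov(rate ~ FRF, data = etch)   # one-way ANOVA of etch rate on RF power
summary(etch.aov)

            Df   Sum Sq  Mean Sq F value Pr(>F)
FRF          3 66870.55 22290.18   66.80 0.0000
Residuals   16  5339.20   333.70

$F_0 = 66.80 > F_{0.01,\,3,\,16} = 5.29$ (the 0.99 quantile of $F_{3,16}$), so $H_0$ is rejected at the 1% level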
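The critical value and the p-value behind this comparison can be obtained with `qf()` and `pf()`. The sketch below assumes the etch-rate values listed in the code (the data table is not reproduced in these slides; they are the Montgomery plasma etching data as commonly tabulated); the names `F0`, `F_crit`, and `p_value` are hypothetical.

```r
# Etch-rate data (assumed values; not listed in the slides)
etch <- data.frame(
  RF   = rep(c(160, 180, 200, 220), each = 5),
  rate = c(575, 542, 530, 539, 570,
           565, 593, 590, 579, 610,
           600, 651, 610, 637, 629,
           725, 700, 715, 685, 710)
)

tab <- summary(aov(rate ~ factor(RF), data = etch))[[1]]
F0  <- tab[["F value"]][1]                      # observed F statistic

F_crit  <- qf(0.99, df1 = 3, df2 = 16)          # upper 1% point of F(3, 16)
p_value <- pf(F0, 3, 16, lower.tail = FALSE)    # P(F > F0) under H0

c(F0 = F0, F_crit = F_crit, p_value = p_value)
```

Reporting the p-value alongside the fixed-level comparison avoids having to pick a single significance level in advance.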


Estimation of the Model Parameters

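Model:

$$y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \quad i = 1, \ldots, a; \; j = 1, \ldots, n$$

Parameters: $\mu$, $\tau_i$, $\sigma^2$

Estimates:

- overall mean: $\hat{\mu} = \bar{y}_{\cdot\cdot}$
- treatment effect: $\hat{\tau}_i = \bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}$, $i = 1, \ldots, a$
- treatment mean: $\hat{\mu}_i = \hat{\mu} + \hat{\tau}_i = \bar{y}_{i\cdot}$
- error variance: $\hat{\sigma}^2 = MS_E$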
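These estimates are simple functions of the group and grand averages. The sketch below assumes the etch-rate values listed in the code (the data table is not reproduced in these slides; they are the Montgomery plasma etching data as commonly tabulated); the `_hat` variable names are hypothetical.

```r
# Etch-rate data (assumed values; not listed in the slides)
etch <- data.frame(
  RF   = rep(c(160, 180, 200, 220), each = 5),
  rate = c(575, 542, 530, 539, 570,
           565, 593, 590, 579, 610,
           600, 651, 610, 637, 629,
           725, 700, 715, 685, 710)
)
a <- 4; N <- nrow(etch)

mu_hat   <- mean(etch$rate)                    # overall mean
mu_i_hat <- tapply(etch$rate, etch$RF, mean)   # treatment means
tau_hat  <- mu_i_hat - mu_hat                  # treatment effects (sum to zero)

# Error variance estimate: pooled within-treatment sum of squares / (N - a)
sigma2_hat <- sum((etch$rate - ave(etch$rate, etch$RF))^2) / (N - a)

mu_hat
tau_hat
sigma2_hat
```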


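Estimation of the Model Parameters (cont.)

$$\varepsilon_{ij} \sim \mathrm{NID}(0, \sigma^2) \;\Rightarrow\; \bar{y}_{i\cdot} \sim N(\mu_i, \sigma^2/n)$$

$100(1 - \alpha)\%$ confidence intervals:

$$\bar{y}_{i\cdot} - t_{\alpha/2,\,N-a} \sqrt{\frac{MS_E}{n}} \;\le\; \mu_i \;\le\; \bar{y}_{i\cdot} + t_{\alpha/2,\,N-a} \sqrt{\frac{MS_E}{n}}$$

$$\bar{y}_{i\cdot} - \bar{y}_{j\cdot} - t_{\alpha/2,\,N-a} \sqrt{\frac{2 MS_E}{n}} \;\le\; \mu_i - \mu_j \;\le\; \bar{y}_{i\cdot} - \bar{y}_{j\cdot} + t_{\alpha/2,\,N-a} \sqrt{\frac{2 MS_E}{n}}$$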
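As a worked sketch of the first interval, the code below computes a 95% interval for the mean at 220 W. It assumes the etch-rate values listed in the code (the data table is not reproduced in these slides; they are the Montgomery plasma etching data as commonly tabulated); the variable names are hypothetical.

```r
# Etch-rate data (assumed values; not listed in the slides)
etch <- data.frame(
  RF   = rep(c(160, 180, 200, 220), each = 5),
  rate = c(575, 542, 530, 539, 570,
           565, 593, 590, 579, 610,
           600, 651, 610, 637, 629,
           725, 700, 715, 685, 710)
)
a <- 4; n <- 5; N <- a * n; alpha <- 0.05

y_bar_i <- tapply(etch$rate, etch$RF, mean)
MSE     <- sum((etch$rate - ave(etch$rate, etch$RF))^2) / (N - a)
t_crit  <- qt(1 - alpha / 2, df = N - a)       # t_{alpha/2, N-a}

# One-at-a-time interval for mu_4 (220 W)
half_width <- t_crit * sqrt(MSE / n)
ci_mu4 <- unname(y_bar_i["220"]) + c(-1, 1) * half_width

# Interval for the difference mu_4 - mu_1 uses sqrt(2 * MSE / n)
ci_diff41 <- unname(y_bar_i["220"] - y_bar_i["160"]) +
  c(-1, 1) * t_crit * sqrt(2 * MSE / n)

ci_mu4
ci_diff41
```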


Estimation of the Model Parameters (cont.)

Ex 3.3

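overall mean: $\hat{\mu} = 617.75$

treatment effect:

i                  1        2        3       4
RF power (W)     160      180      200     220
$\hat{\tau}_i$  -66.55   -30.35     7.65   89.25

95% confidence interval for $\mu_4$ (one-at-a-time):

$$689.6815 \le \mu_4 \le 724.3185$$

Bonferroni method: for $r$ simultaneous confidence intervals, replace $\alpha/2$ by $\alpha/(2r)$, i.e., use $t_{\alpha/(2r),\,N-a}$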
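A minimal sketch of the Bonferroni adjustment, assuming r = 4 simultaneous intervals (one per treatment mean) and using the summary values from these slides ($MS_E = 333.70$, $N - a = 16$, $n = 5$) together with the treatment means implied by the overall mean and the effects above. The choice r = 4 and the variable names are assumptions made for illustration.

```r
alpha <- 0.05
r     <- 4                     # number of simultaneous intervals (assumed)
df_E  <- 16                    # N - a
MSE   <- 333.70
n     <- 5
y_bar_i <- 617.75 + c(-66.55, -30.35, 7.65, 89.25)   # mu_hat + tau_hat_i

t_one  <- qt(1 - alpha / 2, df_E)          # one-at-a-time critical value
t_bonf <- qt(1 - alpha / (2 * r), df_E)    # Bonferroni critical value

ci_one  <- cbind(lower = y_bar_i - t_one  * sqrt(MSE / n),
                 upper = y_bar_i + t_one  * sqrt(MSE / n))
ci_bonf <- cbind(lower = y_bar_i - t_bonf * sqrt(MSE / n),
                 upper = y_bar_i + t_bonf * sqrt(MSE / n))

ci_one
ci_bonf   # wider intervals: the price of joint coverage of at least 95%
```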


Unbalanced data

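$n_i$ observations under treatment $i$ ($i = 1, \ldots, a$)

$N = \sum_{i=1}^{a} n_i$: total sample size

$$SS_T = \sum_{i=1}^{a} \sum_{j=1}^{n_i} y_{ij}^2 - \frac{y_{\cdot\cdot}^2}{N}$$

$$SS_{\text{Treatment}} = \sum_{i=1}^{a} \frac{y_{i\cdot}^2}{n_i} - \frac{y_{\cdot\cdot}^2}{N}$$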
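The unbalanced formulas can be checked against R's own ANOVA table. The sketch below creates a hypothetical unbalanced data set by dropping two runs from the etch-rate data; both the data values (assumed, since the table is not reproduced in these slides) and the choice of dropped rows are illustrative only.

```r
# Etch-rate data (assumed values; not listed in the slides)
etch <- data.frame(
  RF   = rep(c(160, 180, 200, 220), each = 5),
  rate = c(575, 542, 530, 539, 570,
           565, 593, 590, 579, 610,
           600, 651, 610, 637, 629,
           725, 700, 715, 685, 710)
)
unb <- etch[-c(1, 12), ]   # hypothetical: pretend two runs were lost

n_i   <- tapply(unb$rate, unb$RF, length)   # unequal sample sizes
y_i   <- tapply(unb$rate, unb$RF, sum)      # treatment totals y_i.
N     <- sum(n_i)
y_tot <- sum(unb$rate)                      # grand total y_..

SST  <- sum(unb$rate^2) - y_tot^2 / N
SSTr <- sum(y_i^2 / n_i) - y_tot^2 / N
SSE  <- SST - SSTr

tab <- anova(lm(rate ~ factor(RF), data = unb))
c(SSTr = SSTr, SSE = SSE)
tab[["Sum Sq"]]   # same two numbers from the fitted model
```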


Model Adequacy Checking

$\hat{y}_{ij}$: estimate (fitted value) of $y_{ij}$

$$\hat{y}_{ij} = \hat{\mu} + \hat{\tau}_i = \bar{y}_{i\cdot}$$

residual $e_{ij}$: used to investigate violations of the basic assumptions and model adequacy

$$e_{ij} = y_{ij} - \hat{y}_{ij}$$

- The checking should be automatic
- Model is adequate $\Rightarrow$ the $e_{ij}$ should be structureless
- graphical analysis
- how to deal with commonly occurring abnormalities

standardized residual: $d_{ij} = \dfrac{e_{ij}}{\sqrt{MS_E}}$
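Residuals and standardized residuals follow directly from these definitions. The sketch below assumes the etch-rate values listed in the code (the data table is not reproduced in these slides; they are the Montgomery plasma etching data as commonly tabulated); the variable names are hypothetical.

```r
# Etch-rate data (assumed values; not listed in the slides)
etch <- data.frame(
  RF   = rep(c(160, 180, 200, 220), each = 5),
  rate = c(575, 542, 530, 539, 570,
           565, 593, 590, 579, 610,
           600, 651, 610, 637, 629,
           725, 700, 715, 685, 710)
)
a <- 4; N <- nrow(etch)

y_hat <- ave(etch$rate, etch$RF)   # fitted value = treatment average
e     <- etch$rate - y_hat         # residuals e_ij
MSE   <- sum(e^2) / (N - a)
d     <- e / sqrt(MSE)             # standardized residuals d_ij

# The same residuals come out of the fitted aov object
fit <- aov(rate ~ factor(RF), data = etch)
range(d)
```

If the errors are roughly N(0, sigma^2), most standardized residuals should fall within about plus or minus 2, and values beyond plus or minus 3 would be candidates for outliers.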


Model Adequacy Checking (cont.)

Residual plot

## Residual plot
opar <- par(mfrow = c(2, 2), cex = .8)   # 2 x 2 grid of diagnostic plots
plot(etch.aov)
par(opar)                                # restore the previous settings


Model Adequacy Checking (cont.)

$e_{ij}$ vs. time: checks the independence assumption

$e_{ij}$ vs. $\hat{y}_{ij}$: checks for nonconstant variance; if present, consider a variance-stabilizing transformation


Statistical Tests for Equality of Variance

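Bartlett's test:

$$H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_a^2 \quad \text{vs.} \quad H_a: \text{above not true for at least one } \sigma_i^2$$

Test statistic:

$$\chi_0^2 = 2.3026\, \frac{q}{c} \overset{H_0}{\sim} \chi^2_{a-1}$$

$$q = (N - a) \log_{10} S_p^2 - \sum_{i=1}^{a} (n_i - 1) \log_{10} S_i^2$$

$$c = 1 + \frac{1}{3(a - 1)} \left( \sum_{i=1}^{a} (n_i - 1)^{-1} - (N - a)^{-1} \right)$$

$$S_p^2 = \frac{\sum_{i=1}^{a} (n_i - 1) S_i^2}{N - a}$$

where $S_i^2$ is the sample variance of the $i$th treatment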
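The statistic can be assembled term by term and compared with `bartlett.test()`. The sketch below assumes the etch-rate values listed in the code (the data table is not reproduced in these slides; they are the Montgomery plasma etching data as commonly tabulated); it uses `log(10)` in place of the rounded constant 2.3026, and the variable names are hypothetical.

```r
# Etch-rate data (assumed values; not listed in the slides)
etch <- data.frame(
  RF   = rep(c(160, 180, 200, 220), each = 5),
  rate = c(575, 542, 530, 539, 570,
           565, 593, 590, 579, 610,
           600, 651, 610, 637, 629,
           725, 700, 715, 685, 710)
)

S2_i <- tapply(etch$rate, etch$RF, var)      # sample variance per treatment
n_i  <- tapply(etch$rate, etch$RF, length)
a    <- length(n_i); N <- sum(n_i)

S2_p   <- sum((n_i - 1) * S2_i) / (N - a)    # pooled variance
q      <- (N - a) * log10(S2_p) - sum((n_i - 1) * log10(S2_i))
c_corr <- 1 + (sum(1 / (n_i - 1)) - 1 / (N - a)) / (3 * (a - 1))
chi0   <- log(10) * q / c_corr               # log(10) is 2.3026 to four decimals

bt <- bartlett.test(rate ~ RF, data = etch)
c(by_hand = chi0, bartlett.test = unname(bt$statistic))
```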


Statistical Tests for Equality of Variance (cont.)

Reject $H_0$ if $\chi_0^2 > \chi^2_{\alpha,\,a-1}$

very sensitive to the normality assumption

> bartlett.test(rate ~ RF, data = etch)

Bartlett test of homogeneity of variances

data: rate by RF
Bartlett's K-squared = 0.4335, df = 3, p-value = 0.9332

> qchisq(0.95, 3)
[1] 7.814728


Statistical Tests for Equality of Variance (cont.)

Modified Levene test:

robust to departures from normality

considers the absolute deviation of $y_{ij}$ from the treatment median $\tilde{y}_{i\cdot}$:

$$d_{ij} = |y_{ij} - \tilde{y}_{i\cdot}|, \quad i = 1, 2, \ldots, a; \; j = 1, 2, \ldots, n$$

The test statistic for Levene's test is simply the usual ANOVA F statistic for testing equality of means, applied to the absolute deviations
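Because the test is just an ANOVA on the $d_{ij}$, it can be carried out in base R without an extra package. The sketch below uses the peak discharge example that appears later in these slides; the data values are an assumption (the table is not reproduced here; they are the Montgomery peak discharge data as commonly tabulated), and the variable names are hypothetical.

```r
# Peak discharge data (assumed values; not listed in the slides)
peak <- data.frame(
  Method = rep(1:4, each = 6),
  Observ = c( 0.34,  0.12,  1.23,  0.70,  1.75,  0.12,
              0.91,  2.94,  2.14,  2.36,  2.86,  4.55,
              6.31,  8.37,  9.75,  6.09,  9.82,  7.24,
             17.15, 11.82, 10.95, 17.20, 14.35, 16.82)
)

# Absolute deviations from each treatment median
med_i  <- ave(peak$Observ, peak$Method, FUN = median)
peak$d <- abs(peak$Observ - med_i)

# Usual one-way ANOVA F test applied to the deviations
lev   <- anova(lm(d ~ factor(Method), data = peak))
F_lev <- lev[["F value"]][1]
p_lev <- lev[["Pr(>F)"]][1]

c(F = F_lev, p = p_lev)
```

A small p-value here points to unequal variances across the methods, which is what motivates a transformation of the response.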


Statistical Tests for Equality of Variance (cont.)

Peak Discharge Data


Statistical Tests for Equality of Variance (cont.)

> library(car)   # leveneTest() is provided by car (lawstat offers levene.test())
> peak.aov <- aov(Observ ~ as.factor(Method), data = peak)
> summary(peak.aov)
                  Df Sum Sq Mean Sq F value   Pr(>F)
as.factor(Method)  3  708.3   236.1   76.07 4.11e-11 ***
Residuals         20   62.1     3.1
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> leveneTest(peak$Observ, as.factor(peak$Method))
Levene's Test for Homogeneity of Variance (center = median)
      Df F value  Pr(>F)
group  3  4.5684 0.01357 *
      20
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


Statistical Tests for Equality of Variance (cont.)

Transformation: $y^*_{ij} = \sqrt{y_{ij}}$


Statistical Tests for Equality of Variance (cont.)

Formal method: Box-Cox Method

## Box-Cox Method
library(MASS)
# Method is treated as a factor, matching the one-way ANOVA model
boxcox(Observ ~ as.factor(Method), data = peak,
       lambda = seq(-1, 1, length.out = 10))
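To read off the maximizing $\lambda$ numerically rather than from the plot, `boxcox()` can be called with `plotit = FALSE`, in which case it returns the grid and the profile log-likelihood. The sketch below assumes the peak discharge values listed in the code (the table is not reproduced in these slides; they are the Montgomery peak discharge data as commonly tabulated) and uses a finer grid than the call above; the name `lambda_hat` is hypothetical.

```r
library(MASS)

# Peak discharge data (assumed values; not listed in the slides)
peak <- data.frame(
  Method = rep(1:4, each = 6),
  Observ = c( 0.34,  0.12,  1.23,  0.70,  1.75,  0.12,
              0.91,  2.94,  2.14,  2.36,  2.86,  4.55,
              6.31,  8.37,  9.75,  6.09,  9.82,  7.24,
             17.15, 11.82, 10.95, 17.20, 14.35, 16.82)
)

# Profile log-likelihood over a grid of lambda values, without plotting
bc <- boxcox(Observ ~ factor(Method), data = peak,
             lambda = seq(-1, 1, by = 0.05), plotit = FALSE)

lambda_hat <- bc$x[which.max(bc$y)]   # grid value with the highest likelihood
lambda_hat
```

In practice a convenient value near the maximum (for example 0.5 for a square root, or 0 for a logarithm) is usually preferred over the exact maximizer, provided it lies inside the likelihood-based confidence interval shown on the plot.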


Comparing Among Treatment Means

ANOVA:

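reject $H_0$ $\Rightarrow$ there are differences between the treatment means

which means differ is not specified; this calls for multiple comparison methods

$\bar{y}_{i\cdot} \sim N(\mu_i, \sigma^2/n)$, $\hat{\sigma}^2 = MS_E$

$\Rightarrow \mu_1 \neq \mu_2 \neq \mu_3 \neq \mu_4$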
