6. ANalysis Of VAriance (ANOVA)

Copyright © 2019 Dan Nettleton (Iowa State University), Statistics 510

Setup and Notation

y = Xβ + ε,  ε ∼ N(0, σ²I)

Let X₁ = 1, X_m = X, and X_{m+1} = I.

Suppose X₂, …, X_m are matrices satisfying

C(X₁) ⊂ C(X₂) ⊂ ··· ⊂ C(X_{m−1}) ⊂ C(X_m).

Let P_j = P_{X_j} and r_j = rank(X_j) ∀ j = 1, …, m + 1.


The Total Sum of Squares

The total sum of squares (also known as the corrected total sum of squares) is

∑_{i=1}^n (y_i − ȳ·)² = [y − ȳ·1]′[y − ȳ·1]

= [y − P₁y]′[y − P₁y] = [Iy − P₁y]′[Iy − P₁y]

= [(I − P₁)y]′[(I − P₁)y] = y′(I − P₁)′(I − P₁)y

= y′(I − P₁)(I − P₁)y = y′(I − P₁)y.
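A quick numerical check, added here as a sketch (not part of the original slides; the six response values are arbitrary): the last identity can be verified directly in R.

> #Sketch: verify sum((y_i - ybar)^2) = y'(I - P1)y for an arbitrary y
> y=c(12.2,11.4,12.4,16.0,15.5,16.5)
> n=length(y)
> one=matrix(1,nrow=n,ncol=1)
> P1=one%*%solve(t(one)%*%one)%*%t(one)  #projection onto C(1)
> sum((y-mean(y))^2)                     #direct computation
[1] 25.06
> t(y)%*%(diag(n)-P1)%*%y                #quadratic form; same value
      [,1]
[1,] 25.06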


Partitioning the Total Sum of Squares

∑_{i=1}^n (y_i − ȳ·)² = y′(I − P₁)y = y′(P_{m+1} − P₁)y

= y′(∑_{j=2}^{m+1} P_j − ∑_{j=1}^m P_j)y

= y′(P_{m+1} − P_m + P_m − P_{m−1} + ··· + P₂ − P₁)y

= y′(P_{m+1} − P_m)y + ··· + y′(P₂ − P₁)y

= ∑_{j=1}^m y′(P_{j+1} − P_j)y.


The sums of squares in the equation

y′(I − P₁)y = ∑_{j=1}^m y′(P_{j+1} − P_j)y

are often arranged in an ANOVA table.
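As a sketch (not from the original slides), the partition can be checked numerically in the smallest nontrivial case, m = 2, with a toy design:

> #Sketch: check SSTo = SS(2|1) + SSE when X1 = 1, X2 = X, X3 = I
> proj=function(x){x%*%solve(t(x)%*%x)%*%t(x)}
> x=c(1,1,2,2,3,3)
> y=c(1.1,0.9,2.2,1.8,3.3,2.7)
> n=length(y)
> P1=proj(matrix(1,n,1))
> P2=proj(cbind(1,x))
> ss21=t(y)%*%(P2-P1)%*%y        #SS(2 | 1)
> sse=t(y)%*%(diag(n)-P2)%*%y    #SSE = y'(P3 - P2)y with P3 = I
> ss21+sse                       #equals SSTo = y'(I - P1)y below
     [,1]
[1,] 4.28
> t(y)%*%(diag(n)-P1)%*%y
     [,1]
[1,] 4.28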


Some Additional Sum of Squares Notation

Sum of Squares          Notation

y′(P₂ − P₁)y            SS(2 | 1)
y′(P₃ − P₂)y            SS(3 | 2)
⋮                       ⋮
y′(P_m − P_{m−1})y      SS(m | m − 1)
y′(P_{m+1} − P_m)y      SSE = y′(I − P_X)y
y′(I − P₁)y             SSTo = ∑_{i=1}^n (y_i − ȳ·)²


Note that

SS(j + 1 | j) = y′(P_{j+1} − P_j)y

= y′(P_{j+1} − P_j + I − I)y

= y′(I − P_j − I + P_{j+1})y

= y′(I − P_j)y − y′(I − P_{j+1})y

= SSE_j − SSE_{j+1},

where SSE_j ≡ y′(I − P_j)y denotes the error sum of squares when y is projected onto C(X_j).


Thus, SS(j + 1 | j) is the amount the error sum of squares decreases when y is projected onto C(X_{j+1}) instead of C(X_j).

SS(j + 1 | j), j = 1, …, m − 1 are called Sequential Sums of Squares.

SAS calls these Type I Sums of Squares.
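As a sketch (not from the original slides), this characterization can be verified with nested lm() fits, since deviance() returns the residual sum of squares of an lm fit and anova() reports the sequential sums of squares:

> #Sketch: sequential (Type I) sums of squares are drops in SSE
> #between nested least squares fits
> set.seed(1)
> x=rep(1:5,each=3)
> y=2+0.5*x+rnorm(15)
> sse1=deviance(lm(y~1))                #SSE projecting onto C(1)
> sse2=deviance(lm(y~x))                #SSE projecting onto C([1, x])
> sse3=deviance(lm(y~x+I(x^2)))         #SSE projecting onto C([1, x, x^2])
> c(sse1-sse2,sse2-sse3)                #SS(2 | 1) and SS(3 | 2)
> anova(lm(y~x+I(x^2)))[1:2,"Sum Sq"]   #the same two values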


Properties of the Matrices of the Quadratic Forms

The matrices of the quadratic forms in the ANOVA table have several useful properties:

Symmetry

Idempotency

rank(P_{j+1} − P_j) = r_{j+1} − r_j

Zero Cross-Products


Symmetry and Idempotency

Note that ∀ j = 1, …, m,

(P_{j+1} − P_j)′ = P_{j+1}′ − P_j′ = P_{j+1} − P_j

and

(P_{j+1} − P_j)(P_{j+1} − P_j) = P_{j+1}P_{j+1} − P_{j+1}P_j − P_jP_{j+1} + P_jP_j

= P_{j+1} − P_j − P_j + P_j

= P_{j+1} − P_j.


By idempotency and symmetry,

y′(P_{j+1} − P_j)y = y′(P_{j+1} − P_j)(P_{j+1} − P_j)y

= y′(P_{j+1} − P_j)′(P_{j+1} − P_j)y

= [(P_{j+1} − P_j)y]′[(P_{j+1} − P_j)y]

= ||(P_{j+1} − P_j)y||²

= ||P_{j+1}y − P_j y||²

≡ ||y^(j+1) − y^(j)||², where y^(j) ≡ P_j y,

= ∑_{i=1}^n (y_i^(j+1) − y_i^(j))²,

which is why we call y′(P_{j+1} − P_j)y a "sum of squares."


rank(P_{j+1} − P_j) = r_{j+1} − r_j

Because rank is equal to trace for idempotent matrices, we have

rank(P_{j+1} − P_j) = tr(P_{j+1} − P_j) = tr(P_{j+1}) − tr(P_j)

= rank(P_{j+1}) − rank(P_j)

= rank(X_{j+1}) − rank(X_j)

= r_{j+1} − r_j.


Zero Cross-Products

∀ j < ℓ,

(P_{j+1} − P_j)(P_{ℓ+1} − P_ℓ) = P_{j+1}P_{ℓ+1} − P_{j+1}P_ℓ − P_jP_{ℓ+1} + P_jP_ℓ

= P_{j+1} − P_{j+1} − P_j + P_j

= 0.

Transposing both sides and using symmetry gives

(P_{ℓ+1} − P_ℓ)(P_{j+1} − P_j) = 0.
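As a sketch (not from the original slides), all four properties can be checked numerically for a toy nested sequence:

> #Sketch: verify symmetry, idempotency, rank = trace, and zero
> #cross-products for differences of nested projection matrices
> proj=function(x){x%*%solve(t(x)%*%x)%*%t(x)}
> x=rep(1:5,each=3); n=length(x)
> P1=proj(matrix(1,n,1)); P2=proj(cbind(1,x)); P3=proj(cbind(1,x,x^2))
> D1=P2-P1; D2=P3-P2
> all.equal(D1,t(D1))     #symmetry
[1] TRUE
> all.equal(D1%*%D1,D1)   #idempotency
[1] TRUE
> sum(diag(D1))           #trace = rank = r2 - r1 = 2 - 1 = 1
[1] 1
> max(abs(D1%*%D2))       #zero cross-product, up to rounding error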


Distribution of Scaled ANOVA Sums of Squares

Because

((P_{j+1} − P_j)/σ²)(σ²I) = P_{j+1} − P_j

is idempotent,

y′(P_{j+1} − P_j)y/σ² ∼ χ²_{r_{j+1}−r_j}(β′X′(P_{j+1} − P_j)Xβ/(2σ²))

for all j = 1, …, m.
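As a sketch (not from the original slides), the distributional claim can be checked by simulation in the central case, where E(y) ∈ C(X₁) makes the non-centrality parameter zero:

> #Sketch: with E(y) constant, y'(P2 - P1)y / sigma^2 should be
> #central chi-square with r2 - r1 = 1 degree of freedom
> proj=function(x){x%*%solve(t(x)%*%x)%*%t(x)}
> x=rep(1:5,each=3); n=length(x); sigma=2
> P1=proj(matrix(1,n,1)); P2=proj(cbind(1,x))
> set.seed(510)
> sims=replicate(10000,{
+   y=5+rnorm(n,sd=sigma)   #Xb in C(X1), so noncentrality is 0
+   drop(t(y)%*%(P2-P1)%*%y)/sigma^2
+ })
> mean(sims)   #approximately the df, 1
> var(sims)    #approximately 2*df = 2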


ANOVA Table with Degrees of Freedom

Sum of Squares        Degrees of Freedom           DF

y′(P₂ − P₁)y          rank(X₂) − rank(X₁)          r₂ − 1
y′(P₃ − P₂)y          rank(X₃) − rank(X₂)          r₃ − r₂
⋮                     ⋮                            ⋮
y′(P_m − P_{m−1})y    rank(X_m) − rank(X_{m−1})    r − r_{m−1}
y′(P_{m+1} − P_m)y    rank(X_{m+1}) − rank(X_m)    n − r
y′(I − P₁)y           rank(X_{m+1}) − rank(X₁)     n − 1


Mean Squares

For j = 1, …, m − 1, define

MS(j + 1 | j) = SS(j + 1 | j)/(r_{j+1} − r_j) = y′(P_{j+1} − P_j)y/(r_{j+1} − r_j).

These sums of squares divided by their degrees of freedom are known as mean squares.


ANOVA Table with Mean Squares

Sum of Squares    Degrees of Freedom    Mean Square

SS(2 | 1)         r₂ − 1                MS(2 | 1)
SS(3 | 2)         r₃ − r₂               MS(3 | 2)
⋮                 ⋮                     ⋮
SS(m | m − 1)     r − r_{m−1}           MS(m | m − 1)
SSE               n − r                 MSE
SSTo              n − 1


Independence of ANOVA Sums of Squares

Because

(P_{j+1} − P_j)(σ²I)(P_{ℓ+1} − P_ℓ) = 0

for all j ≠ ℓ, any two ANOVA sums of squares (not including SSTo) are independent.

It is also true that the ANOVA sums of squares (not including SSTo) are mutually independent by Cochran's Theorem, but that stronger result is not usually needed.
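As a sketch (not from the original slides), the pairwise independence can be illustrated by simulating both sums of squares from the same draws of y:

> #Sketch: two ANOVA sums of squares computed from the same y
> #should be (nearly) uncorrelated across simulated draws
> proj=function(x){x%*%solve(t(x)%*%x)%*%t(x)}
> x=rep(1:5,each=3); n=length(x)
> P1=proj(matrix(1,n,1)); P2=proj(cbind(1,x)); P3=proj(cbind(1,x,x^2))
> set.seed(510)
> sims=replicate(10000,{
+   y=rnorm(n)
+   c(drop(t(y)%*%(P2-P1)%*%y),drop(t(y)%*%(P3-P2)%*%y))
+ })
> cor(sims[1,],sims[2,])   #approximately 0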


ANOVA F Statistics

For j = 1, …, m − 1 we have

F_j = MS(j + 1 | j)/MSE = [y′(P_{j+1} − P_j)y/(r_{j+1} − r_j)] / [y′(I − P_X)y/(n − r)]

∼ F_{r_{j+1}−r_j, n−r}(β′X′(P_{j+1} − P_j)Xβ/(2σ²)).
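As a sketch (not from the original slides), F₁ and its p-value can be computed directly from the quadratic forms; the data used here are the plant density values analyzed later in these notes:

> #Sketch: compute F1 = MS(2|1)/MSE directly for the plant density data
> proj=function(x){x%*%solve(t(x)%*%x)%*%t(x)}
> x=rep(c(10,20,30,40,50),each=3); n=15; r=5
> y=c(12.2,11.4,12.4,16.0,15.5,16.5,18.6,20.2,18.2,
+     17.6,19.3,17.1,18.0,16.4,16.6)
> P1=proj(matrix(1,n,1)); P2=proj(cbind(1,x))
> PX=proj(model.matrix(~0+factor(x)))   #cell means model matrix, rank r = 5
> F1=drop(t(y)%*%(P2-P1)%*%y)/(drop(t(y)%*%(diag(n)-PX)%*%y)/(n-r))
> F1                                    #43.2/0.748
[1] 57.75401
> pf(F1,1,n-r,lower.tail=FALSE)         #matches 1.841e-05 in the table later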


ANOVA Table with F Statistics

Sum of Squares    Degrees of Freedom    Mean Square      F Stat

SS(2 | 1)         r₂ − 1                MS(2 | 1)        F₁
SS(3 | 2)         r₃ − r₂               MS(3 | 2)        F₂
⋮                 ⋮                     ⋮                ⋮
SS(m | m − 1)     r − r_{m−1}           MS(m | m − 1)    F_{m−1}
SSE               n − r                 MSE
SSTo              n − 1


Relationship with Reduced vs. Full Model F Statistic

The ANOVA F_j statistic:

F_j = [y′(P_{j+1} − P_j)y/(r_{j+1} − r_j)] / [y′(I − P_X)y/(n − r)] = MS(j + 1 | j)/MSE

The reduced vs. full model F statistic:

F = [y′(P_X − P_{X₀})y/(r − r₀)] / [y′(I − P_X)y/(n − r)],

where X₀ is a reduced model matrix with C(X₀) ⊂ C(X) and r₀ = rank(X₀).


What do ANOVA F statistics test?

In general, an F statistic is used to test

H0 : “The non-centrality parameter of the F statistic is zero.”

vs.

HA : “The non-centrality parameter of the F statistic is not zero.”


What do ANOVA F statistics test?

The ANOVA F statistic

F_j = [y′(P_{j+1} − P_j)y/(r_{j+1} − r_j)] / [y′(I − P_X)y/(n − r)] = MS(j + 1 | j)/MSE

has non-centrality parameter

β′X′(P_{j+1} − P_j)Xβ/(2σ²).

Thus, F_j can be used to test

H_{0j}: β′X′(P_{j+1} − P_j)Xβ/(2σ²) = 0  vs.  H_{Aj}: β′X′(P_{j+1} − P_j)Xβ/(2σ²) ≠ 0.


What do ANOVA F statistics test?

The following are equivalent ways to write the null and alternative hypotheses tested by F_j.

H_{0j}                               H_{Aj}

β′X′(P_{j+1} − P_j)Xβ = 0            β′X′(P_{j+1} − P_j)Xβ ≠ 0
(P_{j+1} − P_j)Xβ = 0                (P_{j+1} − P_j)Xβ ≠ 0
P_j E(y) = P_{j+1} E(y)              P_j E(y) ≠ P_{j+1} E(y)
P_{j+1} E(y) ∈ C(X_j)                P_{j+1} E(y) ∈ C(X_{j+1}) \ C(X_j)


What do ANOVA F statistics test?

H_{0j}: (P_{j+1} − P_j)Xβ = 0 vs. H_{Aj}: (P_{j+1} − P_j)Xβ ≠ 0

is of the form

H_{0j}: C*_j β = 0 vs. H_{Aj}: C*_j β ≠ 0,

where C*_j = (P_{j+1} − P_j)X.

As written, H_{0j} is not a testable hypothesis because C*_j has n rows but rank r_{j+1} − r_j < n (homework problem).

We can rewrite H_{0j} as a testable hypothesis by replacing C*_j with any matrix C_j whose q = r_{j+1} − r_j rows form a basis for the row space of C*_j.
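As a sketch (not from the original slides; the construction shown is one standard option, not the only one), a set of q linearly independent rows of C*_j can be extracted with a pivoted QR decomposition of its transpose:

> #Sketch: retain q linearly independent rows of Cstar = (P2 - P1)X;
> #the retained rows form a testable C_j
> proj=function(x){x%*%solve(t(x)%*%x)%*%t(x)}
> x=rep(c(10,20,30,40,50),each=3); n=15
> P1=proj(matrix(1,n,1)); P2=proj(cbind(1,x))
> X=model.matrix(~0+factor(x))   #cell means model matrix
> Cstar=(P2-P1)%*%X
> dec=qr(t(Cstar))               #columns of t(Cstar) are rows of Cstar
> Cj=Cstar[dec$pivot[seq_len(dec$rank)],,drop=FALSE]
> Cj   #one nonzero row here, proportional to the contrast (-2,-1,0,1,2)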


Example: Multiple Regression

X₁ = 1
X₂ = [1, x₁]
X₃ = [1, x₁, x₂]
⋮
X_m = [1, x₁, …, x_{m−1}]

SS(j + 1 | j) is the decrease in SSE that results when the explanatory variable x_j is added to a model containing an intercept and explanatory variables x₁, …, x_{j−1}.


Example: Polynomial Regression

X₁ = 1
X₂ = [1, x]
X₃ = [1, x, x²]
⋮
X_m = [1, x, x², …, x^{m−1}]

SS(j + 1 | j) is the decrease in SSE that results when the explanatory variable x^j is added to a model containing an intercept and explanatory variables x, x², …, x^{j−1}.


An Example in R

> #An example from "Design of Experiments: Statistical
> #Principles of Research Design and Analysis"
> #2nd Edition by Robert O. Kuehl
>
> d=read.delim("https://.../S510/PlantDensity.txt")


The Data

> d
   PlantDensity GrainYield
1            10       12.2
2            10       11.4
3            10       12.4
4            20       16.0
5            20       15.5
6            20       16.5
7            30       18.6
8            30       20.2
9            30       18.2
10           40       17.6
11           40       19.3
12           40       17.1
13           50       18.0
14           50       16.4
15           50       16.6


Renaming the Variables and Plotting the Data

> names(d)=c("x","y")
> head(d)
   x    y
1 10 12.2
2 10 11.4
3 10 12.4
4 20 16.0
5 20 15.5
6 20 16.5
>
> plot(d[,1],d[,2],col=4,pch=16,xlab="Plant Density",
+      ylab="Grain Yield")


[Figure: scatterplot of Grain Yield versus Plant Density]


Matrices with Nested Column Spaces

X₁ is the 15 × 1 column of ones,

X₂ = [1, x], where x = (10, 10, 10, 20, 20, 20, 30, 30, 30, 40, 40, 40, 50, 50, 50)′, and

X₃ = [1, x, x²], where x² is computed elementwise, so the third column of X₃ is (100, 100, 100, 400, 400, 400, 900, 900, 900, 1600, 1600, 1600, 2500, 2500, 2500)′,


Matrices with Nested Column Spaces

X₄ = [1, x, x², x³] and X₅ = [1, x, x², x³, x⁴], again with powers computed elementwise: the added columns are x³ = (1000, …, 125000)′ and x⁴ = (10000, …, 6250000)′.


Centering and Standardizing for Numerical Stability

It is typically best for numerical stability to center and scale a quantitative explanatory variable prior to computing higher order terms.

In the plant density example, we could replace x by (x − 30)/10 and work with the matrices on the next two slides.

Because these matrices have the same column spaces as the original matrices, the ANOVA table entries are mathematically identical for either set of matrices.
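As a sketch (not from the original slides; it continues the R session above, where d holds the plant density data), the invariance can be confirmed by fitting the quartic model with raw and with centered and scaled x:

> #Sketch: sequential sums of squares are unchanged when x is
> #centered and scaled, because the column spaces are the same
> w=(d$x-30)/10   #centered and scaled plant density
> raw=anova(lm(y~x+I(x^2)+I(x^3)+I(x^4),data=d))
> cen=anova(lm(d$y~w+I(w^2)+I(w^3)+I(w^4)))
> all.equal(raw$"Sum Sq",cen$"Sum Sq")   #TRUE up to numerical error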


Matrices with Centered and Scaled x

With x replaced by (x − 30)/10 = (−2, −2, −2, −1, −1, −1, 0, 0, 0, 1, 1, 1, 2, 2, 2)′, the matrices become

X₁ = 1 (the 15 × 1 column of ones), X₂ = [1, x], X₃ = [1, x, x²],

where the third column of X₃ is now x² = (4, 4, 4, 1, 1, 1, 0, 0, 0, 1, 1, 1, 4, 4, 4)′,


Matrices with Centered and Scaled x

X₄ = [1, x, x², x³] and X₅ = [1, x, x², x³, x⁴], with added columns x³ = (−8, −8, −8, −1, −1, −1, 0, 0, 0, 1, 1, 1, 8, 8, 8)′ and x⁴ = (16, 16, 16, 1, 1, 1, 0, 0, 0, 1, 1, 1, 16, 16, 16)′.


Regardless of whether we center and scale x, the column space of X₅ is the same as the column space of the cell means model matrix

X =

1 0 0 0 0
1 0 0 0 0
1 0 0 0 0
0 1 0 0 0
0 1 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 1 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 1 0
0 0 0 1 0
0 0 0 0 1
0 0 0 0 1
0 0 0 0 1.


ANOVA Table for the Plant Density Data

Source               Sum of Squares    DF
x | 1                y′(P₂ − P₁)y      2 − 1 = 1
x² | 1, x            y′(P₃ − P₂)y      3 − 2 = 1
x³ | 1, x, x²        y′(P₄ − P₃)y      4 − 3 = 1
x⁴ | 1, x, x², x³    y′(P₅ − P₄)y      5 − 4 = 1
Error                y′(I − P₅)y       15 − 5 = 10
C. Total             y′(I − P₁)y       15 − 1 = 14


Creating the Matrices in R

> y=d$y
> x=(d$x-mean(d$x))/10
> x
 [1] -2 -2 -2 -1 -1 -1  0  0  0  1  1  1  2  2  2
>
> n=nrow(d)
>
> x1=matrix(1,nrow=n,ncol=1)
> x2=cbind(x1,x)
> x3=cbind(x2,x^2)
> x4=cbind(x3,x^3)
> x5=matrix(model.matrix(~0+factor(x)),nrow=n)
> I=diag(rep(1,n))


Creating the Projection Matrices in R

> library(MASS)
> proj=function(x){
+   x%*%ginv(t(x)%*%x)%*%t(x)
+ }
>
> p1=proj(x1)
> p2=proj(x2)
> p3=proj(x3)
> p4=proj(x4)
> p5=proj(x5)


Computing the Sums of Squares in R

> t(y)%*%(p2-p1)%*%y
     [,1]
[1,] 43.2
> t(y)%*%(p3-p2)%*%y
     [,1]
[1,]   42
> t(y)%*%(p4-p3)%*%y
     [,1]
[1,]  0.3
> t(y)%*%(p5-p4)%*%y
     [,1]
[1,]  2.1
> t(y)%*%(I-p5)%*%y
     [,1]
[1,] 7.48
> t(y)%*%(I-p1)%*%y
      [,1]
[1,] 95.08


The ANOVA Table in R

> o=lm(y~x+I(x^2)+I(x^3)+I(x^4),data=d)
> anova(o)
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq F value    Pr(>F)
x          1  43.20  43.200 57.7540 1.841e-05 ***
I(x^2)     1  42.00  42.000 56.1497 2.079e-05 ***
I(x^3)     1   0.30   0.300  0.4011    0.5407
I(x^4)     1   2.10   2.100  2.8075    0.1248
Residuals 10   7.48   0.748
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


What do these ANOVA F statistics test?

1st line: Does a linear mean function fit the data significantly better than a constant mean function?

2nd line: Does a quadratic mean function fit the data significantly better than a linear mean function?

3rd line: Does a cubic mean function fit the data significantly better than a quadratic mean function?

4th line: Does a quartic mean function fit the data significantly better than a cubic mean function?

To answer each question, the error variance σ² is estimated from the fit of the full model with one mean for each plant density.


What do these ANOVA F statistics test?

In general, we have

H_{0j}: (P_{j+1} − P_j)Xβ = 0 vs. H_{Aj}: (P_{j+1} − P_j)Xβ ≠ 0

which, in testable form, is

H_{0j}: C_j β = 0 vs. H_{Aj}: C_j β ≠ 0,

where C_j is any matrix whose q = r_{j+1} − r_j rows form a basis for the row space of (P_{j+1} − P_j)X.


First Line of the ANOVA Table as Test of H₀: Cβ = 0

> X=x5
> (p2-p1)%*%X
      [,1] [,2] [,3] [,4] [,5]
 [1,]  0.4  0.2    0 -0.2 -0.4
 [2,]  0.4  0.2    0 -0.2 -0.4
 [3,]  0.4  0.2    0 -0.2 -0.4
 [4,]  0.2  0.1    0 -0.1 -0.2
 [5,]  0.2  0.1    0 -0.1 -0.2
 [6,]  0.2  0.1    0 -0.1 -0.2
 [7,]  0.0  0.0    0  0.0  0.0
 [8,]  0.0  0.0    0  0.0  0.0
 [9,]  0.0  0.0    0  0.0  0.0
[10,] -0.2 -0.1    0  0.1  0.2
[11,] -0.2 -0.1    0  0.1  0.2
[12,] -0.2 -0.1    0  0.1  0.2
[13,] -0.4 -0.2    0  0.2  0.4
[14,] -0.4 -0.2    0  0.2  0.4
[15,] -0.4 -0.2    0  0.2  0.4


First Line of the ANOVA Table as Test of H₀: Cβ = 0

Because

rank[(P₂ − P₁)X] = rank(P₂ − P₁) = rank(X₂) − rank(X₁) = 2 − 1 = 1,

any nonzero constant times any one nonzero row of (P₂ − P₁)X forms a basis for the row space of (P₂ − P₁)X.

For example, we could choose C to be the following one-row matrix:

> 5*((p2-p1)%*%X)[15,]
[1] -2 -1  0  1  2

Some textbooks would describe these as "the coefficients of a contrast to test for linear trend." (Note this is different from a test for "lack of linear fit.")
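As a sketch (not from the original slides; it continues the R session above), the contrast reproduces SS(2 | 1) = 43.2 through the usual contrast sum of squares formula (∑ᵢ cᵢ ȳᵢ·)² / ∑ᵢ (cᵢ²/nᵢ):

> #Sketch: the linear-trend contrast applied to the treatment means
> #reproduces SS(2 | 1)
> m=tapply(d$y,d$x,mean)     #treatment means: 12, 16, 19, 18, 17
> cc=c(-2,-1,0,1,2)          #the contrast coefficients from above
> sum(cc*m)^2/sum(cc^2/3)    #3 observations per plant density
[1] 43.2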


We can add consecutive lines in an ANOVA table.

Source               Sum of Squares    DF
x | 1                y′(P₂ − P₁)y      2 − 1 = 1
x² | 1, x            y′(P₃ − P₂)y      3 − 2 = 1
x³ | 1, x, x²        y′(P₄ − P₃)y      4 − 3 = 1
x⁴ | 1, x, x², x³    y′(P₅ − P₄)y      5 − 4 = 1
Error                y′(I − P₅)y       15 − 5 = 10
C. Total             y′(I − P₁)y       15 − 1 = 14


We can add consecutive lines in an ANOVA table.

Source               Sum of Squares    DF
x | 1                y′(P₂ − P₁)y      2 − 1 = 1
x², x³, x⁴ | 1, x    y′(P₅ − P₂)y      5 − 2 = 3
Error                y′(I − P₅)y       15 − 5 = 10
C. Total             y′(I − P₁)y       15 − 1 = 14


In this case, the combined rows test for lack of linear fit relative to a model with one unrestricted mean for each plant density.

Source                Sum of Squares    DF
x | 1                 y′(P₂ − P₁)y      2 − 1 = 1
Lack of Linear Fit    y′(P₅ − P₂)y      5 − 2 = 3
Error                 y′(I − P₅)y       15 − 5 = 10
C. Total              y′(I − P₁)y       15 − 1 = 14
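As a sketch (not from the original slides), the pooled line's F statistic can be computed by hand from the sums of squares already in the table, anticipating the anova() output below:

> #Sketch: pooling the last three polynomial lines gives the
> #lack-of-linear-fit F statistic
> ssLOF=42.0+0.3+2.1        #y'(P5 - P2)y = 44.4 on 3 df
> Flof=(ssLOF/3)/0.748      #MSE = 7.48/10 = 0.748
> Flof
[1] 19.7861
> pf(Flof,3,10,lower.tail=FALSE)   #0.0001582, as in the anova() output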


> #Let's add the best fitting simple linear regression
> #line to our plot.
>
> o=lm(y~x,data=d)
>
> u=seq(0,60,by=.01) #overkill here but used later.
>
> lines(u,coef(o)[1]+coef(o)[2]*u,col=2)


[Figure: the scatterplot of Grain Yield versus Plant Density with the fitted simple linear regression line added]


> #The linear fit doesn't look very good.
> #Let's formally test for lack of fit.
>
> o=lm(y~x+factor(x),data=d)
> anova(o)
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq F value    Pr(>F)
x          1  43.20  43.200  57.754 1.841e-05 ***
factor(x)  3  44.40  14.800  19.786 0.0001582 ***
Residuals 10   7.48   0.748
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


> #It looks like a linear fit is inadequate.
> #Let's try a quadratic fit.
>
> o=lm(y~x+I(x^2)+factor(x),data=d)
> anova(o)
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq F value    Pr(>F)
x          1  43.20  43.200 57.7540 1.841e-05 ***
I(x^2)     1  42.00  42.000 56.1497 2.079e-05 ***
factor(x)  2   2.40   1.200  1.6043    0.2487
Residuals 10   7.48   0.748
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


> #It looks like a quadratic fit is adequate.
> #Let's estimate the coefficients for the best
> #quadratic fit.
>
> b=coef(lm(y~x+I(x^2),data=d))
>
> #Let's add the best fitting quadratic curve
> #to our plot.
> lines(u,b[1]+b[2]*u+b[3]*u^2,col=3)


[Figure: the scatterplot with the fitted quadratic curve added]


> #Let's add the treatment group means to our plot.
>
> trt.means=tapply(d$y,d$x,mean)
>
> points(unique(d$x),trt.means,pch="X")


[Figure: the scatterplot with the treatment group means marked by "X"]


> #The quartic fit will pass through the treatment
> #means.
>
> b=coef(lm(y~x+I(x^2)+I(x^3)+I(x^4),data=d))
> lines(u,b[1]+b[2]*u+b[3]*u^2+b[4]*u^3+b[5]*u^4,col=1)


[Figure: the scatterplot with the fitted quartic curve, which passes through the treatment means marked by "X"]
