ANOVA and Residuals in Regression - Department of Statistics · Statistics 850 Clayton January 20,...

1
Statistics 850 Clayton January 20, 2004 ANOVA and Residuals in Regression For the model: Y i = β 0 + β 1 x i1 + ··· + β k x ik + i we can write the ANOVA table as: Source df SS MS Regression k ˆ β X Y - n ¯ y 2 SSReg/k Error n - k - 1 2 i ˆ SSErr/(n - k - 1) = s 2 Total n - 1 (y i - ¯ y) 2 = Y Y - n ¯ y 2 Note that F = MSReg/MSError is a test of H 0 : β 1 = β 2 = ··· = β k =0|β 0 . In that sense, we could write SSReg = SS(β 1 2 ,...,β k |β 0 ) and rewrite the ANOVA table as: Source df SS β 0 1 n ¯ y 2 SS(β 1 ,...,β k |β 0 ) k ˆ β X Y - n ¯ y 2 Error n - k - 1 2 i Total NCM n Y Y but we usually do not. (Why?) When we want to focus on the sequential fitting order of parameters, we can write the ANOVA table as, for example, Source df β 1 |β 0 1 β 2 |β 0 1 1 . . . . . . β k |β 0 ,...,β k-1 1 Error n - k - 1 Total n - 1 Residuals There are several different definitions of residual. Here is a summary table of these definitions, including their name in R and SAS. Definition R SAS ˆ i = y i - ˆ y i residual residual r i = ˆ i s 1-h ii rstandard student t i = ˆ i s (i) 1-h ii rstudent rstudent Var(ˆ i )= σ 2 (1 - h ii ) where h ii is from the “hat” matrix H = X(X X) -1 X . In simple linear regression, h ii = 1 n + (x i - ¯ x) 2 (x i - ¯ x) 2 In the formula for t i , s (i) is the “leave one out” estimate of σ based on the regression that is conducted on all the data except the ith case. t i can also be calculated as t i = ˆ (i) ˆ Var(ˆ (i) ) See Rencher for details. 1

Transcript of ANOVA and Residuals in Regression - Department of Statistics · Statistics 850 Clayton January 20,...

Page 1: ANOVA and Residuals in Regression - Department of Statistics · Statistics 850 Clayton January 20, 2004 ANOVA and Residuals in Regression For the model: Y i = ... • In simple linear

Statistics 850 Clayton January 20, 2004

ANOVA and Residuals in Regression

For the model: Yi = β0 + β1xi1 + · · · + βkxik + εi we can write the ANOVA table as:

Source df SS MSRegression k β′X ′Y − ny2 SSReg/kError n − k − 1

∑ε2i = ε′ε SSErr/(n − k − 1) = s2

ε

Total n − 1∑

(yi − y)2 = Y ′Y − ny2

Note that F = MSReg/MSError is a test of H0 : β1 = β2 = · · · = βk = 0|β0. In that sense, we could writeSSReg = SS(β1, β2, . . . , βk|β0) and rewrite the ANOVA table as:

Source df SSβ0 1 ny2

SS(β1, . . . , βk|β0) k β′X ′Y − ny2

Error n − k − 1∑

ε2iTotalNCM n Y ′Y

but we usually do not. (Why?)When we want to focus on the sequential fitting order of parameters, we can write the ANOVA table as,

for example,Source dfβ1|β0 1β2|β0, β1 1...

...βk|β0, . . . , βk−1 1Error n − k − 1Total n − 1

Residuals

There are several different definitions of residual. Here is a summary table of these definitions, includingtheir name in R and SAS.

Definition R SASεi = yi − yi residual residual

ri = εi

√1−hii

rstandard student

ti = εi

s(i)

√1−hii

rstudent rstudent

• Var(εi) = σ2ε (1 − hii) where hii is from the “hat” matrix H = X(X ′X)−1X ′.

• In simple linear regression,

hii =1n

+(xi − x)2∑(xi − x)2

• In the formula for ti, s(i) is the “leave one out” estimate of σε based on the regression that is conductedon all the data except the ith case.

• ti can also be calculated as

ti =ε(i)√

Var(ε(i))

See Rencher for details.

1