STAT22200 Spring 2014 Chapter 6 - University of Chicago
Source: yibi/teaching/stat222/2014/Lectures/C06/Chap6.pdf

STAT22200 Spring 2014 Chapter 6

Yibi Huang

May 1, 2014 / May 6, 2014

Chapter 6 Model Assumption Checking and Remedies

Chapter 6 - 1


Three Assumptions We Need to Check

In the means model yij = µi + εij or the effects model

yij = µ + τi + εij,

we make 3 assumptions about the error terms εij:

1. the errors εij are independent

2. the errors εij have constant variance across treatments

3. the errors εij follow a normal distribution

As the 3 assumptions are all about the errors εij, most of the model diagnostic methods are based on the residuals

eij = yij − ŷij = yij − ȳi•.

Chapter 6 - 2


Standardized and Studentized Residuals

Because εij has SD σ, we sometimes standardize the residuals:

standardized residual  dij = eij / √MSE.

By the normality assumption, dij is approximately N(0, 1). Observations with |dij| > 3 are potential outliers.

If the design is not balanced, a more accurate standardization is the

studentized residual  sij = eij / √(MSE (1 − 1/ni)).

We divide by √(MSE (1 − 1/ni)) because eij = yij − ȳi• has SD σ √(1 − 1/ni). Again, if the errors are normal, sij is approximately N(0, 1). Observations with |sij| > 3 are potential outliers.
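As a sanity check on this formula, the studentized residuals can be computed by hand. A minimal sketch, assuming the resin data (columns life and temp, introduced a few slides later) are loaded and attached; note that studres() in MASS uses a leave-one-out estimate of σ, so its values differ slightly from this formula (which R's rstandard() implements):

```r
# A sketch (not the lecture's code): the slide's formula computed by hand,
# assuming the resin data (columns life, temp) are loaded and attached.
aov1 <- aov(life ~ as.factor(temp))
MSE  <- sum(residuals(aov1)^2) / df.residual(aov1)
ni   <- ave(life, temp, FUN = length)   # group size n_i of each observation
sij  <- residuals(aov1) / sqrt(MSE * (1 - 1/ni))
# rstandard(aov1) computes the same quantity; studres() in MASS instead
# uses a leave-one-out estimate of sigma, so its values differ slightly
```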

Chapter 6 - 3


Studentized Residuals In a Regression Model (May Skip)

In general, in a regression model Y = Xβ + ε, the residual is

e = Y − X β̂ = Y − X(XᵀX)⁻¹XᵀY = (I − H)Y,

in which H = X(XᵀX)⁻¹Xᵀ is called the hat matrix. One can show that the variance-covariance matrix of the residuals is

Var(e) = σ²(I − H).

Thus Var(ei) = σ²(1 − hii), in which hii is the ith diagonal element of the hat matrix H, called the leverage. The studentized residuals are thus defined as

si = ei / √(MSE (1 − hii)).

In a completely randomized design, hii = 1/ni.
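The claim hii = 1/ni in a CRD can be verified numerically. A small sketch with simulated (made-up) data:

```r
# Sketch with simulated data: in a one-way (CRD) fit, h_ii = 1/n_i.
set.seed(1)
g   <- rep(c("A", "B", "C"), times = c(4, 6, 5))  # unbalanced groups
y   <- rnorm(length(g))
fit <- lm(y ~ g)
h   <- hatvalues(fit)                # leverages = diagonal of the hat matrix H
ni  <- ave(y, g, FUN = length)       # n_i for each observation
all.equal(unname(h), unname(1/ni))   # TRUE: h_ii = 1/n_i
```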

Chapter 6 - 4


Example: Resin Glue Failure Time

In previous lectures, the response of the resin glue experiment was log10(lifetime). In the original data, the response is simply the lifetime of the glue in hours.

[Scatterplot: Life (Hours) vs. Temperature (Celsius) at 175, 194, 213, 231, and 250°C]

Temperature   Life time in Hours
175°C         110.0  82.2  99.1  82.9  71.3  91.7  76.0  79.2
194°C          45.8  51.3  26.5  58.0  45.3  40.8  35.8  45.6
213°C          33.8  34.8  24.2  20.5  22.5  18.8  18.2  24.2
231°C          14.2  16.7  14.8  14.6  16.2  18.9  14.8
250°C          18.0   6.7  12.0  10.5  12.2  11.4

Chapter 6 - 5


Residuals

> mydata = read.table("resinlife.txt", header=T)

> attach(mydata)

> aov1 = aov(life ~ as.factor(temp))

> aov1$res # get residuals, output omitted

> round(aov1$res,2) # round residuals to 2 decimals

1 2 3 4 5 6 7 8

23.45 -4.35 12.55 -3.65 -15.25 5.15 -10.55 -7.35

9 10 11 12 13 14 15 16

2.16 7.66 -17.14 14.36 1.66 -2.84 -7.84 1.96

17 18 19 20 21 22 23 24

9.17 10.17 -0.42 -4.12 -2.12 -5.82 -6.43 -0.42

25 26 27 28 29 30 31 32

-1.54 0.96 -0.94 -1.14 0.46 3.16 -0.94 6.20

33 34 35 36 37

-5.10 0.20 -1.30 0.40 -0.40

We see the first observation has the largest residual, 23.45. Is it an outlier?

Chapter 6 - 6


Studentized Residuals

In R, studres() is the command to get the studentized residuals.

> library(MASS) # must load library "MASS" to use studres()

> ?studres # see help file of studres()

> studres(aov1) # studentized residual

1 2 3 4 5

3.55341908 -0.55843235 1.67395045 -0.46787374 -2.07936247

(... some output is omitted ...)

36 37

0.05235783 -0.05235783

> round(studres(aov1),2) # round studentized residuals to 2 decimals

1 2 3 4 5 6 7 8 9 10

3.55 -0.56 1.67 -0.47 -2.08 0.66 -1.39 -0.95 0.28 0.99

11 12 13 14 15 16 17 18 19 20

-2.38 1.94 0.21 -0.36 -1.02 0.25 1.20 1.34 -0.05 -0.53

21 22 23 24 25 26 27 28 29 30

-0.27 -0.75 -0.83 -0.05 -0.20 0.12 -0.12 -0.15 0.06 0.41

31 32 33 34 35 36 37

-0.12 0.82 -0.67 0.03 -0.17 0.05 -0.05

The first observation with sij = 3.55 > 3 is a potential outlier.

Chapter 6 - 7


3.4.1 Methods for Checking the Normality Assumption

- Histogram of the residuals: if normal, it should be bell-shaped
  - Pros: simple, intuitive
  - Cons: for a small sample, the histogram may not be bell-shaped even though the sample is from a normal distribution

- Normal probability plot of the residuals
  - a.k.a. normal QQ plot, in which QQ stands for "quantile-quantile"
  - the best method to assess normality
  - See the next slide for details

Chapter 6 - 8


How to Make a Normal Probability Plot?

1. Given data (y1, y2, . . . , yn), standardize: xi = (yi − ȳ)/s.

2. Arrange the standardized data in increasing order: x(1) ≤ x(2) ≤ · · · ≤ x(n).

3. Find quantiles of the N(0, 1) distribution: z(1/(n+1)), z(2/(n+1)), . . . , z(n/(n+1)). That is, z(i/(n+1)) is the value such that P(Z ≤ z(i/(n+1))) = i/(n+1) for Z ∼ N(0, 1).

4. Plot the x(i) values against the z(i/(n+1)) values. That is, plot the points (z(i/(n+1)), x(i)) for i = 1, 2, . . . , n.
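The four steps can be carried out directly. A sketch with simulated data (qqnorm() automates this, up to the exact plotting positions used):

```r
# Sketch: normal probability plot "by hand" for a simulated sample y.
set.seed(1)
y  <- rnorm(30)
n  <- length(y)
x  <- (y - mean(y)) / sd(y)      # step 1: standardize
xs <- sort(x)                    # step 2: order statistics x_(1) <= ... <= x_(n)
z  <- qnorm((1:n) / (n + 1))     # step 3: N(0,1) quantiles z_(i/(n+1))
plot(z, xs, xlab = "Theoretical Quantiles", ylab = "Sample Quantiles")
abline(0, 1, lty = 2)            # near this line when the data are normal
```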

Chapter 6 - 9


Interpreting Normal Probability Plots

- If the data are approximately normal, the plot will be close to a straight line.

- Systematic deviations from a straight line indicate a non-normal distribution.

- Outliers appear as points that are far away from the overall pattern of the plot.

- R command: qqnorm()

Chapter 6 - 10


[Four normal Q−Q plots (Theoretical Quantiles vs. Sample Quantiles) illustrating typical patterns: Normal, Skewed, Heavy tailed, Light tailed]

Chapter 6 - 11


QQ plot for the Resin Glue Residuals

> lmorig = lm(life ~ as.factor(temp)) # same fit as aov1, as an lm object
> qqnorm(lmorig$res)

> qqline(lmorig$res)

> hist(lmorig$res,xlab="Residuals",main="Histogram of Residuals")

[Normal Q−Q plot of the residuals, and histogram of the residuals]

Do the residuals look normal?

Chapter 6 - 12


Example: Hodgkin’s Disease

Hodgkin’s disease is a type of lymphoma, a cancer originating from white blood cells called lymphocytes. The data set Hodgkins.txt contains plasma bradykininogen levels (in micrograms of bradykininogen per milliliter of plasma) in 3 types of subjects:

- normal,

- patients with active Hodgkin’s disease, and

- patients with inactive Hodgkin’s disease.

The globulin bradykininogen is the precursor substance for bradykinin, which is thought to be a chemical mediator of inflammation.

- Is this an experiment?

- We can use ANOVA to compare the means of several samples in an observational study.

Chapter 6 - 13


> mydata = read.table("Hodgkins.txt", header=T)

> attach(mydata)

> plot(Hodgkins,BradyLevel)

> plot(as.integer(Hodgkins), BradyLevel, pch=20, cex=0.75)

[Strip plots of BradyLevel by group (Active, Inactive, Normal)]

The distribution within each group looks right-skewed. Let’s fit the ANOVA model anyway and take a look at the residuals.

> brady1 = lm(BradyLevel ~ Hodgkins)

> anova(brady1)

Df Sum Sq Mean Sq F value Pr(>F)

Hodgkins 2 65.89 32.95 10.67 0.000104 ***

Residuals 62 191.45 3.09

Chapter 6 - 14


Normal QQ Plot for The Hodgkin Data

> qqnorm(brady1$res, ylab="Residuals")

> qqline(brady1$res)

> library(MASS)

> qqnorm(studres(brady1), ylab="Studentized Residuals")

> qqline(studres(brady1))

[Normal Q−Q plots of the residuals (left) and of the studentized residuals (right)]

Does the distribution of the residuals look normal?
No, it is right-skewed, and has an outlier with sij > 5.

Chapter 6 - 15


Remedies for Non-Normality

Asymmetry can often be ameliorated by transforming the response, often with a power transformation:

fλ(y) = { y^λ,    if λ ≠ 0
        { log(y), if λ = 0.

1. If right-skewed, try taking the square root, the logarithm, or other powers < 1:

y −→ log(y), √y, or y^λ with λ < 1

2. If left-skewed, try squaring, cubing, or other powers > 1:

y −→ y², y³, or y^λ with λ > 1

Chapter 6 - 16


Example: Hodgkin’s Disease – QQ Plots

[Normal Q−Q plots of studentized residuals: Original Scale, Square-root Transform, Log Transform]

After a log transformation, the residuals are less skewed, and the outlier looks less extreme.

The square-root transformation also improves normality, but not as much as the log transformation.

Chapter 6 - 17


R-Codes for Making the Plots on the Previous Slide

> brady1 = lm(BradyLevel ~ Hodgkins)

> brady2 = lm(sqrt(BradyLevel) ~ Hodgkins)

> brady3 = lm(log(BradyLevel) ~ Hodgkins)

> library(MASS)

> qqnorm(studres(brady1), main="Original Scale")

> qqline(studres(brady1))

> qqnorm(studres(brady2), main="Square-root Transform")

> qqline(studres(brady2))

> qqnorm(studres(brady3), main="Log Transform")

> qqline(studres(brady3))

Chapter 6 - 18


Box-Cox Method

The Box-Cox method is an automatic procedure to select the “best” power λ that makes the residuals of the model

y^λ_ij = µi + εij

closest to normal.

- We usually round the optimal λ to a convenient power like

−1, −1/2, −1/3, 0, 1/3, 1/2, 1, 2, . . .

since the practical difference between y^0.5827 and y^0.5 is usually small, but the square-root transformation is much easier to interpret.

- A confidence interval for the optimal λ can also be obtained; see Oehlert, p.129 for details. We usually select a convenient power λ* in this C.I.

Chapter 6 - 19


Example: Hodgkin’s Disease – Box-Cox

In R, one must first load the library MASS to use boxcox(). The argument of the boxcox() function can be a model formula, an lm model, or an aov model.

> library(MASS)

> boxcox(brady1)

> boxcox(BradyLevel ~ Hodgkins)

[Box-Cox plot: profile log-likelihood vs. λ, with the 95% C.I. marked by dashed lines]

The middle dashed line marks the optimal λ; the left and right dashed lines mark the 95% C.I. for the optimal λ.

From the plot, we see the optimal λ is around −0.2, and the 95% C.I. contains 0. For simplicity, we use the log-transformed BradyLevel as our response.

Chapter 6 - 20


A Remark On the Log-Scale

In fact, for concentrations, the log scale is more commonly used than the original scale. For example,

- Concentrations of 10.1 and 10.001 are almost the same, but concentrations of 0.1 and 0.001 are very different, since 0.1 is 100 times higher than 0.001.

- On the original scale, (10.1, 10.001) and (0.1, 0.001) differ by the same amount, 0.099.

- On the log scale,

log10(0.1) − log10(0.001) = 2, which is much greater than
log10(10.1) − log10(10.001) ≈ 0.0043.

Chapter 6 - 21


Example: Hodgkin’s Disease

From the 2 ANOVA tables below, we see a log transformation makes the difference between the 3 groups of patients more significant, because the outlier becomes less extreme and does not inflate the MSE as much.

Response: BradyLevel

Df Sum Sq Mean Sq F value Pr(>F)

Hodgkins 2 65.893 32.946 10.67 0.0001042 ***

Residuals 62 191.449 3.088

Response: log(BradyLevel)

Df Sum Sq Mean Sq F value Pr(>F)

Hodgkins 2 2.2526 1.12631 15.436 3.628e-06 ***

Residuals 62 4.5238 0.07297

Chapter 6 - 22


Non-Parametric Tests

- Transformation does not always work. For example, it helps little for symmetric but heavy-tailed (many outliers) distributions.

- The ANOVA F-test is robust to non-normality, but it is not resistant to outliers.

- If outliers are unavoidable and cannot be removed, try non-parametric tests (like the permutation test or the Kruskal-Wallis test (Section 3.11.1)) that do not rely on the normality assumption.

- There might be a lecture specifically on non-parametric tests if we have time.

Chapter 6 - 23


Constant Variance Assumption

- Why Is Non-Constant Variance a Problem?

- Tools for checking the constant variance assumption — Residual Plots
  - Residuals vs. Fitted Values
  - Residuals vs. Treatments
  - Residuals vs. Other Variables

- Remedies
  - Transforming the Response: Variance-Stabilizing Transformations
  - Brown-Forsythe Modified F-test — an alternative to the ANOVA F-test
  - Welch Test for Contrasts w/o the Constant Variance Assumption

Chapter 6 - 24


Why Is Non-Constant Variance a Problem?

Example — Serial Dilution Plating (Exercise 6.2 on p.143): a method to count the number of bacteria in solution.

Chapter 6 - 25


Example — Serial Dilution Plating (Cont’d)

- Goal: to compare 3 pasteurization methods for milk

- Design: 15 samples of milk randomly assigned to the 3 treatments

- Response: the bacterial load in each sample after treatment, determined via serial dilution plating

- Data: http://users.stat.umn.edu/~gary/book/fcdae.data/ex6.2

        Method 1    Method 2    Method 3
        26 × 10²    35 × 10³    29 × 10⁵
        29 × 10²    23 × 10³    23 × 10⁵
        20 × 10²    20 × 10³    17 × 10⁵
        22 × 10²    30 × 10³    29 × 10⁵
        32 × 10²    27 × 10³    20 × 10⁵
Mean  25.8 × 10²    27 × 10³  23.6 × 10⁵
SD           492        5874      536656

√MSE = 309900

Chapter 6 - 26


Example — Serial Dilution Plating (Cont’d)

95% C.I. for the mean of Method 1:

ȳ1• ± t(0.025, 15−3) √MSE/√n1 = 2580 ± 2.179 × 309900/√5

                              = 2580 ± 301965

                              = (−299385, 304545),

which is way larger than the range of the 5 observations for Method 1 (2000–3200). What happened?
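A sketch reproducing this interval from the Method 1 counts in the table and the pooled √MSE = 309900 quoted on the previous slide:

```r
# Sketch: reproducing the useless C.I. for Method 1.
m1      <- c(26, 29, 20, 22, 32) * 100   # Method 1 counts: 2600, ..., 3200
rootMSE <- 309900                        # sqrt(MSE) from the ANOVA of all 15 obs
tval    <- qt(0.975, df = 15 - 3)        # t_{0.025, 12} = 2.179
mean(m1) + c(-1, 1) * tval * rootMSE / sqrt(5)
# roughly (-299000, 305000): the pooled MSE is dominated by the huge
# Method 3 variance, so the interval is absurdly wide for Method 1
```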

Chapter 6 - 27


Checking Constant Variance Assumption (1)

Tests for equality of variance are available, but not useful, because

1. the tests do not tell us how much non-constant variance is present and how much it affects our inferences;

2. classical tests of constant variance are very sensitive to non-normality in the errors. (It is very difficult to tell non-normality apart from normality with non-constant variance.)

Chapter 6 - 28


Checking Constant Variance Assumption (2)

A better tool for checking constant variance — residual plots:

- Residuals vs. Fitted Values

- Residuals vs. Treatments

- Residuals vs. Other Variables

If the constant variance assumption is true, the residuals will spread evenly around the zero line.

Rule of thumb: ANOVA F-tests for a CRD can tolerate non-constant variance to some extent, and so can tests for contrasts. It is usually fine as long as

max{σi} / min{σi} ≤ 2 (or even 3),

especially when the group sizes ni are (roughly) equal.

Chapter 6 - 29


Example: Resin Glue Data

> plot(lmorig$fit,lmorig$res,ylab="Residuals",xlab="Fitted Values")

> abline(h=0) # adding a zero line

> plot(temp,lmorig$res,ylab="Residuals",xlab="Centigrade Temperature")

> abline(h=0) # adding a zero line

[Residual plots: Residuals vs. Fitted Values (left) and Residuals vs. Centigrade Temperature (right)]

- Why do the points form vertical strips?

- The spread of the residuals increases with the fitted values.
  - The lower the temperature, the more variable the response.

Chapter 6 - 30


More on Residual Plots

Residual plots can be used to check many other things, like non-linearity.

In the resin glue example, if one fits the linear model yij = β0 + β1T + εij, one can check whether lifetime is indeed linear in temperature using a residual plot.

> lmorig1 = lm(life ~ temp)

> plot(temp,lmorig1$res,ylab="Residuals",xlab="Centigrade Temperature")

> abline(h=0)

[Residual plot: Residuals vs. Centigrade Temperature (175, 194, 213, 231, 250) for the linear fit]

From the residual plot, we see

- non-constant variance across temperatures

- lifetime is curved, not linear, in temperature

Chapter 6 - 31


Plot of Residuals v.s. Other Variables

If there are other variables that might possibly affect the response but are not included in the model, then one should check residual plots against these variables. For example,

- if experimental units come from different batches, then the residuals should be plotted vs. batches;

- if measurements are made by several operators, then the residuals should be plotted vs. operators.

Patterns in such residual plots suggest these variables should be

- either included in the analysis (but note one CANNOT claim these variables have a causal effect on the response, since they were not controlled in advance),

- or controlled more carefully (e.g., by a block design) in future experiments.

Chapter 6 - 32


Remedy 1: Variance-Stabilizing Transformation

If the SD σ (the spread of the residuals) depends on the mean µ (the fitted value), you can try a variance-stabilizing transformation of the response to make the variance (close to) constant.

- If the SD ∝ the fitted value, then transform the response y to log(y)
  - e.g., the serial dilution plating example

- If the SD ∝ √(the fitted value), i.e., Variance ∝ fitted value, then

  y → √y

- In general, if σ ∝ µ^α, then the variance-stabilizing transformation is

  y → y^(1−α) for α ≠ 1,   log(y) for α = 1
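A quick simulated illustration of the α = 1/2 case: for Poisson counts the variance equals the mean, so the square root should stabilize the variance.

```r
# Sketch: when Variance is proportional to the mean (sigma ~ mu^(1/2)),
# the square root stabilizes the variance; Poisson counts are the
# classic example.
set.seed(1)
mu <- rep(c(5, 50, 500), each = 200)
y  <- rpois(length(mu), mu)       # within each group, Var(y) = mean(y)
tapply(y, mu, sd)                 # SD grows like sqrt(mu)
tapply(sqrt(y), mu, sd)           # roughly constant, about 0.5
```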

Chapter 6 - 33


Example: Resin Glue

> lm1 = lm(life ~ as.factor(temp))

> lm2 = lm(sqrt(life) ~ as.factor(temp))

> lm3 = lm(log(life) ~ as.factor(temp))

> plot(lm1$fit, lm1$res, xlab="Fitted Value", ylab="Residuals")

> abline(h=0,lty=2)

> plot(lm2$fit, lm2$res, xlab="Fitted Value", ylab="Residuals")

> abline(h=0,lty=2)

> plot(lm3$fit, lm3$res, xlab="Fitted Value", ylab="Residuals")

> abline(h=0,lty=2)

[Residual plots (Residuals vs. Fitted Value): Original Scale, Square-root Transform, Log Transform]

Chapter 6 - 34


Box-Cox Again

In many cases, we don’t have a good idea of the value of α. We can still try a power transformation of the response:

fλ(y) = { y^λ,    if λ ≠ 0
        { log(y), if λ = 0.

How to select λ?

- Trial and error: try convenient powers like −1, −1/2, −1/3, 0, 1/3, 1/2, 2, . . . and then check the residual plots of each of them for constant variance.

- Box-Cox method: though Box-Cox was developed to select a power transformation making the residuals as normal as possible, it has been shown that the optimal λ is often close to the variance-stabilizing λ.

Chapter 6 - 35


Example — Count of Bacteria Revisited

> library(MASS)

> boxcox(count ~ as.factor(method))

> plot(method, log10(count),

xlab="Methods",

ylab="log10(Count of Bacteria)")

[Box-Cox plot (profile log-likelihood vs. λ with 95% C.I.) and strip plot of log10(Count of Bacteria) by method]

After log transformation, the 3 variances look even.

Chapter 6 - 36


Example: Resin Glue – Box-Cox

> library(MASS)

> boxcox(life ~ as.factor(temp))

[Box-Cox plot: profile log-likelihood vs. λ, with the 95% C.I. marked by dashed lines]

The 95% C.I. for λ contains both 0 and 1/2. As λ = 1/2 is very close to the boundary of the C.I., λ = 0 seems to be a better choice, which is consistent with the Arrhenius law.

Chapter 6 - 37


Drawbacks of Transformation

- Except for a few special transformations (log, square root, reciprocal), the transformed response usually lacks a natural interpretation (what does y^0.1 mean?)

- Unless you have a good interpretation of the transformed response, think again before transforming.

Remember that ANOVA tests have some tolerance for non-constant variance. If

max{σi} / min{σi} ≤ 2 or 3,

don’t worry too much about non-constant variance. In that case, it is fine to leave the response untransformed.

Chapter 6 - 38


> tapply(life, temp, sd)

175 194 213 231 250

12.895514 9.557785 6.378703 1.661181 3.646917

> tapply(sqrt(life), temp, sd)

175 194 213 231 250

0.6802037 0.7505140 0.6223047 0.2048698 0.5305554

> tapply(log(life), temp, sd)

175 194 213 231 250

0.1440872 0.2393055 0.2451210 0.1012540 0.3177677

After the log transformation, the ratio of the largest to the smallest SD is 0.3178/0.1013 ≈ 3.1 (versus 12.9/1.66 ≈ 7.8 on the original scale), which is within the tolerable range.

Chapter 6 - 39


Brown-Forsythe Modified F-test

If transformation doesn’t work well, or you don’t want to transform for lack of a good interpretation, try tests that don’t rely on the constant variance assumption:

- Brown-Forsythe modified F-test — an alternative to the ANOVA F-test:

  BF = Σ_{i=1}^{g} ni (ȳi• − ȳ••)² / Σ_{i=1}^{g} si² (1 − ni/N) = SSTrt / Σ_{i=1}^{g} si² (1 − ni/N),

  in which si² is the sample variance in treatment i. Under the null hypothesis of equal treatment means, BF is approximately distributed as an F-distribution with g − 1 and ν degrees of freedom, where

  ν = (Σ_{i=1}^{g} di)² / Σ_{i=1}^{g} di²/(ni − 1),

  in which di = si² (1 − ni/N).
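The statistic is easy to compute by hand. A minimal sketch (the function name bf.test is our own, not a built-in):

```r
# Sketch: Brown-Forsythe modified F-test (bf.test is our own name).
bf.test <- function(y, group) {
  group <- as.factor(group)
  ni <- tapply(y, group, length)
  N  <- sum(ni)
  g  <- nlevels(group)
  di <- tapply(y, group, var) * (1 - ni/N)              # d_i = s_i^2 (1 - n_i/N)
  SSTrt <- sum(ni * (tapply(y, group, mean) - mean(y))^2)
  BF <- SSTrt / sum(di)
  nu <- sum(di)^2 / sum(di^2 / (ni - 1))                # approximate denominator df
  c(BF = BF, df1 = g - 1, df2 = nu,
    p.value = pf(BF, g - 1, nu, lower.tail = FALSE))
}
# e.g. bf.test(life, temp) for the resin data
```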

Chapter 6 - 40


Welch Test for Contrasts w/o Constant Variance Assumption

If transformation doesn’t work well, try tests that don’t rely on the constant variance assumption:

- Welch test for a contrast Σ_{i=1}^{g} ci µi: the test statistic is

  t = Σ_{i=1}^{g} ci ȳi• / √(Σ_{i=1}^{g} ci² si²/ni),

  which is approximately t-distributed with ν degrees of freedom, where

  ν = (Σ_{i=1}^{g} ci² si²/ni)² / Σ_{i=1}^{g} [1/(ni − 1)] (ci⁴ si⁴/ni²).

  Observe that this is a generalization of the two-sample t-test without the equal variance assumption.
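A minimal sketch implementing this test (the function name welch.contrast is our own, not a built-in; cc holds the contrast coefficients in the order of the factor levels):

```r
# Sketch: Welch t-test for a contrast sum(c_i * mu_i)
# (welch.contrast is our own name).
welch.contrast <- function(y, group, cc) {
  group <- as.factor(group)
  ni   <- tapply(y, group, length)
  ybar <- tapply(y, group, mean)
  s2   <- tapply(y, group, var)
  se2  <- sum(cc^2 * s2 / ni)                      # squared SE of the contrast
  t    <- sum(cc * ybar) / sqrt(se2)
  nu   <- se2^2 / sum(cc^4 * s2^2 / (ni^2 * (ni - 1)))
  c(t = t, df = nu, p.value = 2 * pt(-abs(t), nu))
}
# e.g. welch.contrast(BradyLevel, Hodgkins, c(1, -1, 0))
```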

Chapter 6 - 41


Checking for Dependent Errors and Remedies

Chapter 6 - 42


Checking for Dependent Errors

- Among the 3 assumptions, violation of the independence assumption causes the severest problems. Most of our analyses (ANOVA, tests of contrasts, multiple comparisons, etc.) have little tolerance for dependence among the errors.

- There are various forms of dependence; serial dependence and spatial dependence are two common ones.

- A time plot of the residuals (vs. the order in which we measured them) is a good way to check for serial dependence.
  - It’s better to keep track of the order in which units are measured.
  - A smooth time plot is a sign of positive serial dependence, since a smooth time plot means successive residuals are too close together.

Chapter 6 - 43


Autocorrelation and Autocorrelation Plots

Autocorrelation plots are also a common way to check for serial dependence.

- Lag-1 autocorrelation plot: plot (X1, . . . , Xn−1) vs. (X2, . . . , Xn)

- Lag-k autocorrelation plot: plot (X1, . . . , Xn−k) vs. (X1+k, . . . , Xn)

- Any trend in the autocorrelation plot is a sign of serial dependence.

A numerical way to check for serial dependence is to compute the lag-k autocorrelation coefficient: the correlation coefficient of (X1, . . . , Xn−k) and (X1+k, . . . , Xn), for k = 1, 2, 3, . . . .
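A sketch of the lag-k autocorrelation computation, illustrated on a simulated serially dependent series (R’s acf() computes and plots all lags at once):

```r
# Sketch: lag-k autocorrelation of residuals, on a simulated
# serially dependent series.
set.seed(1)
res <- as.numeric(arima.sim(list(ar = 0.7), n = 100))  # positively dependent
lag.cor <- function(x, k) {
  n <- length(x)
  cor(x[1:(n - k)], x[(1 + k):n])    # correlation of the series with its lag
}
sapply(1:3, function(k) lag.cor(res, k))  # large positive values: dependence
plot(res[-length(res)], res[-1],          # lag-1 autocorrelation plot
     xlab = "residual i", ylab = "residual i + 1")
# for independent errors these correlations should be near 0
```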

Chapter 6 - 44


Spatial Dependence

Spatial dependence can arise when the experimental units are arranged in space, like plants on a farm. Spatial dependence occurs when units that are closer together are more similar than units farther apart.

Chapter 6 - 45


Remedies for Dependence

1. There isn’t much we can do about dependence using our current machinery, since no simple transformation can remove dependence.

2. Analysis of dependent data requires tools like time series or spatial statistics, which are beyond the scope of this class.

Chapter 6 - 46


Example: Standard Gravity

The National Bureau of Standards performed eight series of experiments between 1924 and 1935 to determine g, the standard gravity.

The data are given in the table below (in units of 10^−5 m/s^2, as deviations from 9.8 m/s^2, so that, for example, the first measurement of g is 9.80076 m/s^2), with series 1 representing the earliest set of experiments and series 8 the last.

Series Measurements

1   76  82  83  54  35  46  87  68
2   87  95  98 100 109 109 100  81  75  68  67
3  105  83  76  75  51  76  93  75  62
4   95  90  76  76  87  79  77  71
5   76  76  78  79  72  68  75  78
6   78  78  78  86  87  81  73  67  75  82  83
7   82  79  81  79  77  79  79  78  79  82  76  73  64
8   84  86  85  82  77  76  77  80  83  81  78  78  78

Chapter 6 - 47


Example: Standard Gravity

The National Bureau of Standards¹ is the government agency that measures things. The following statement is taken from the NIST website:

Founded in 1901, NIST is a non-regulatory federal agency within the U.S. Department of Commerce. NIST's mission is to promote U.S. innovation and industrial competitiveness by advancing measurement science, standards, and technology in ways that enhance economic security and improve our quality of life.

Thus, it is safe to assume that the NBS scientists were trying hard to measure the same quantity g (e.g., all experiments were done in the same location) throughout all eight series of experiments.

¹ now called the National Institute of Standards and Technology

Chapter 6 - 48


> gee = read.table("g.txt", header=T)

> attach(gee)

> plot(series,g)

Variance decreases with series, which makes sense since the accuracy of measurement improves as time goes by.

[Figure: strip chart of g by series, 1–8; y-axis: Standard Gravity g (m/s^2), range 40–100.]

What happens if we ignore this fact and do an ANOVA F-test?

> lmg = lm(g ~ as.factor(series), data=gee)

> anova(lmg)

Df Sum Sq Mean Sq F value Pr(>F)

as.factor(series) 7 2819 402.7 3.568 0.00236 **

Residuals 73 8239 112.9

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

ANOVA rejects the null hypothesis that the 8 series have the same mean.

Chapter 6 - 49


Question

What is the implication if the null hypothesis is rejected? Will you conclude that

(a) g had changed over time, or

(b) the ANOVA F -test failed?

If your choice is (a), how do you explain the change in g?
If your choice is (b), why does the ANOVA F-test fail?

The ANOVA F-test here is not reliable because at least one of its assumptions is not met: constant variance.

> round(tapply(g,series,sd),2)

1 2 3 4 5 6 7 8

19.25 15.29 15.76 8.30 3.65 5.84 4.74 3.36

Will a transformation work here? Try Box-Cox?

Chapter 6 - 50


Brown-Forsythe Modified F-test

Since the variances are not constant, let's try the Brown-Forsythe modified F-test:

BF = SSTrt / ∑_{i=1}^{g} s_i²(1 − n_i/N)

The numerator SSTrt = 2819 can be found in the ANOVA table above. The denominator is found using R (see the code below) to be 888.5747.

> sds = tapply(g,series,sd)

> nis = tapply(g,series,length)

> di = sds^2*(1-nis/length(g))

> BFbottom = sum(di)

> BFbottom

[1] 888.5747

> BF = 2819/BFbottom

The BF statistic is thus

BF = 2819 / 888.5747 = 3.1725

Chapter 6 - 51


Under the null hypothesis of equal treatment means, BF is approximately distributed as an F-distribution with g − 1 and ν degrees of freedom, where

ν = (∑_{i=1}^{g} d_i)² / (∑_{i=1}^{g} d_i²/(n_i − 1)),  in which  d_i = s_i²(1 − n_i/N),

which is calculated as 29.46249 in the R code below.

> nu = (BFbottom)^2/sum(di^2/(nis-1))

> nu

[1] 29.46249

> 1 - pf(BF, df1 = 7, df2 = nu) # P-value of the BF-test

[1] 0.01263341
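As a cross-check (a sketch in Python, independent of the R session above; the per-series data are copied from the table earlier in this chapter), the BF denominator, the BF statistic, and the degrees of freedom ν can all be recomputed from scratch:

```python
# Measurements by series (deviations from 9.8 m/s^2 in units of 1e-5)
series = {
    1: [76, 82, 83, 54, 35, 46, 87, 68],
    2: [87, 95, 98, 100, 109, 109, 100, 81, 75, 68, 67],
    3: [105, 83, 76, 75, 51, 76, 93, 75, 62],
    4: [95, 90, 76, 76, 87, 79, 77, 71],
    5: [76, 76, 78, 79, 72, 68, 75, 78],
    6: [78, 78, 78, 86, 87, 81, 73, 67, 75, 82, 83],
    7: [82, 79, 81, 79, 77, 79, 79, 78, 79, 82, 76, 73, 64],
    8: [84, 86, 85, 82, 77, 76, 77, 80, 83, 81, 78, 78, 78],
}
N = sum(len(v) for v in series.values())           # 81 observations
grand = sum(sum(v) for v in series.values()) / N   # grand mean

def var(v):
    """Sample variance s_i^2 of one series."""
    m = sum(v) / len(v)
    return sum((x - m) ** 2 for x in v) / (len(v) - 1)

# SSTrt = sum_i n_i * (ybar_i - ybar)^2
ss_trt = sum(len(v) * (sum(v) / len(v) - grand) ** 2 for v in series.values())

# d_i = s_i^2 * (1 - n_i/N); BF denominator and degrees of freedom nu
d = [var(v) * (1 - len(v) / N) for v in series.values()]
denom = sum(d)
BF = ss_trt / denom
nu = denom ** 2 / sum(di ** 2 / (len(v) - 1)
                      for di, v in zip(d, series.values()))
# denom and nu should reproduce the 888.5747 and 29.46249 from the slides
print(denom, BF, nu)
```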

However, the BF-test, which does not rely on the constant variance assumption, also rejects the null hypothesis of equal means, with a P-value of 0.0126. How can this be true?

Chapter 6 - 52


Maybe the earlier measurements are less reliable; what if we exclude the first series?

> lmg2 = lm(g ~ as.factor(series), subset = (series > 1))

> anova(lmg2)

Df Sum Sq Mean Sq F value Pr(>F)

as.factor(series) 6 1428.6 238.092 2.7835 0.01787 *

Residuals 66 5645.5 85.538

The Brown-Forsythe modified F-test gives BF = 2.616 and P-value = 0.038.

What if we remove the first and second series?

> lmg2 = lm(g ~ as.factor(series), subset = (series>2))

> anova(lmg2)

Df Sum Sq Mean Sq F value Pr(>F)

as.factor(series) 5 222.8 44.553 0.7545 0.5863

Residuals 56 3306.6 59.046

The Brown-Forsythe modified F-test gives BF = 0.658 and P-value = 0.659.

Can we conclude that series 3–8 have a common mean, but not series 1 and 2?

Chapter 6 - 53


The observations in each series are in fact given in the time order taken. We can thus make a time plot.

[Figure: time plot of the 81 measurements in the order taken; x-axis: Index, 0–80; y-axis: Standard Gravity g (m/s^2), range 40–100.]

The vertical lines separate measurements of different series.

We can see that many measurements are close to the previous measurement, which indicates a positive serial correlation.

It's not surprising that the NBS scientists might unconsciously match their results with the previous measurement, since the previous measurement is often regarded as the most accurate one up to that time.

Chapter 6 - 54


> plot(g[2:81],g[1:80], pch=20,ylab = "g",xlab="g (Lag = 1)")

> cor(g[2:81],g[1:80])

[1] 0.5002454

> plot(g[3:81],g[1:79], pch=20,ylab = "g",xlab="g (Lag = 2)")

> cor(g[3:81],g[1:79])

[1] 0.1232966

[Figure: scatter plots of g[2:81] vs. g[1:80] (lag 1, left) and g[3:81] vs. g[1:79] (lag 2, right); both axes range 40–100.]

Lag-1 autocorrelation = 0.5002; lag-2 autocorrelation = 0.1233.
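These autocorrelations can be reproduced without the `g.txt` file (a sketch in Python; the data are the 81 measurements from the table earlier in this chapter, concatenated across series 1–8 in time order):

```python
# g data from the standard-gravity table, series 1 through 8 in order
g = ([76, 82, 83, 54, 35, 46, 87, 68]
     + [87, 95, 98, 100, 109, 109, 100, 81, 75, 68, 67]
     + [105, 83, 76, 75, 51, 76, 93, 75, 62]
     + [95, 90, 76, 76, 87, 79, 77, 71]
     + [76, 76, 78, 79, 72, 68, 75, 78]
     + [78, 78, 78, 86, 87, 81, 73, 67, 75, 82, 83]
     + [82, 79, 81, 79, 77, 79, 79, 78, 79, 82, 76, 73, 64]
     + [84, 86, 85, 82, 77, 76, 77, 80, 83, 81, 78, 78, 78])

def corr(a, b):
    """Pearson correlation coefficient, as computed by R's cor()."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

lag1 = corr(g[1:], g[:-1])   # mirrors cor(g[2:81], g[1:80])
lag2 = corr(g[2:], g[:-2])   # mirrors cor(g[3:81], g[1:79])
print(lag1, lag2)            # should match 0.5002 and 0.1233 above
```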

Chapter 6 - 55


Effect of Dependent Errors

With positive serial correlation, the mean of SSTrt is higher than (8 − 1)σ², which makes the F-statistic larger and the null hypothesis easier to reject.
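A small simulation illustrates this inflation (a hypothetical sketch in Python; the group sizes, AR(1) coefficient, and replication count are arbitrary choices, not from the slides). Even when all group means are truly equal, positively correlated errors push the average MSTrt well above the average MSE:

```python
import random

random.seed(0)

def ar1_errors(n, phi=0.6):
    """AR(1) errors with marginal variance 1 (innovation sd sqrt(1-phi^2))."""
    sd = (1 - phi ** 2) ** 0.5
    e = [random.gauss(0, 1)]
    for _ in range(n - 1):
        e.append(phi * e[-1] + random.gauss(0, sd))
    return e

g_groups, n_per, reps = 8, 10, 500
ms_trt_sum = ms_e_sum = 0.0
for _ in range(reps):
    # Null model: all 8 group means equal; groups are consecutive blocks in time
    y = ar1_errors(g_groups * n_per)
    groups = [y[i * n_per:(i + 1) * n_per] for i in range(g_groups)]
    grand = sum(y) / len(y)
    means = [sum(gr) / n_per for gr in groups]
    ms_trt_sum += n_per * sum((m - grand) ** 2 for m in means) / (g_groups - 1)
    ms_e_sum += sum(sum((x - m) ** 2 for x in gr)
                    for gr, m in zip(groups, means)) / (g_groups * (n_per - 1))

# Under independent errors this ratio would be close to 1 on average;
# positive serial correlation inflates it well above 1.
ratio = ms_trt_sum / ms_e_sum
print(ratio)
```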

Chapter 6 - 56


We cannot check the independence assumption since we don't have other information. If the data are collected in the order they are recorded, we can check the autocorrelation. The time plot below shows no clear pattern.

[Figure: time plot of the residuals; x-axis: Index, 0–60; y-axis: Residuals, range −0.6 to 0.6.]

The lag-1 autocorrelation coefficient is −0.113, which indicates essentially no correlation.

> cor(brady3$res[-65],brady3$res[-1])

[1] -0.1132061

If there are other variables available, one should plot the residuals against each of these variables.

Chapter 6 - 57


Coming Up Next...

Chapter 6 - 58