Chapter 6 Model Assumption Checking and Remediesyibi/teaching/stat222/2017/Lectures/C06.pdf ·...

Chapter 6 Model Assumption Checking andRemedies

Yibi Huang

Chapter 6 - 1

THREE ASSUMPTIONS We Need to Check

In the means model yij = µi + εij or the effects model

yij = µ+ τi + εij ,

we make 3 assumptions about the error term εij ’s.:

1. the errors εij are independent, randomly distributed

2. the errors εij have constant variance across treatments

3. the errors εij follow a normal distribution

As the 3 assumptions are all related to errors εij , most of themodel diagnostic methods are based on the residuals

residual eij = yij − yij = yij − y i•.

Chapter 6 - 2

Standardized and Studentized Residuals

Because the error εij has mean 0 and SD σ, we sometimesstandardize the residuals

standardized residual dij =eij − 0

σ=

eij√MSE

.

By the normality assumption, dij is approximately N(0, 1).Observations with |dij | > 3 are potential outliers.

A more accurate standardization is the

studentized residual sij =eij√

MSE(1− 1ni

).

We divided by√

MSE(1− 1ni

) because eij = yij − y i• has SD

σ√

1− 1ni. Again, if the errors are normal, sij is approximately

N(0, 1). Observations with |sij | > 3 are potential outliers.

Chapter 6 - 3

Example: Resin Glue Failure Time

In previous lectures, the responseof the resin glue experiment islog10(lifetime).In the original data, the responseis simply the life time of glue inhours

●

●

●

●

●

●

●●

●●

●

●

●●●

●

●●

●●●●●●

●●●●●●●

●

●●●●●

020

6010

0

Temperature (Celsius)

Life

(H

our)

175 194 213 231 250

Temperature Life time in Hours175◦C 110.0 82.2 99.1 82.9 71.3 91.7 76.0 79.2194◦C 45.8 51.3 26.5 58.0 45.3 40.8 35.8 45.6213◦C 33.8 34.8 24.2 20.5 22.5 18.8 18.2 24.2231◦C 14.2 16.7 14.8 14.6 16.2 18.9 14.8250◦C 18.0 6.7 12.0 10.5 12.2 11.4

Chapter 6 - 4

Residuals

> resinlife = read.table("resinlife.txt", header=T)

> lmorig = lm(life ~ as.factor(tempC), data=resinlife)

> round(lmorig$res,2)

1 2 3 4 5 6 7 8

23.45 -4.35 12.55 -3.65 -15.25 5.15 -10.55 -7.35

9 10 11 12 13 14 15 16

2.16 7.66 -17.14 14.36 1.66 -2.84 -7.84 1.96

17 18 19 20 21 22 23 24

9.17 10.17 -0.42 -4.12 -2.12 -5.82 -6.42 -0.42

25 26 27 28 29 30 31 32

-1.54 0.96 -0.94 -1.14 0.46 3.16 -0.94 6.20

33 34 35 36 37

-5.10 0.20 -1.30 0.40 -0.40

We see the first observation has the largest residual 23.45.Is it an outlier?

Chapter 6 - 5

Studentized ResidualsIn R, studres() is the command to get the studentized residuals.

> library(MASS) # must load library "MASS" to studres()

> ?studres # see help file of studres()

> studres(lmorig) # studentized residual

1 2 3 4 5

3.55341908 -0.55843235 1.67395045 -0.46787374 -2.07936247

(... some output is omitted ...)

36 37

0.05235783 -0.05235783

> round(studres(lmorig),2) # round studentized residuals to 2 decimals

1 2 3 4 5 6 7 8 9 10

3.55 -0.56 1.67 -0.47 -2.08 0.66 -1.39 -0.95 0.28 0.99

11 12 13 14 15 16 17 18 19 20

-2.38 1.94 0.21 -0.36 -1.02 0.25 1.20 1.34 -0.05 -0.53

21 22 23 24 25 26 27 28 29 30

-0.27 -0.75 -0.83 -0.05 -0.20 0.12 -0.12 -0.15 0.06 0.41

31 32 33 34 35 36 37

-0.12 0.82 -0.67 0.03 -0.17 0.05 -0.05

The first observation with sij = 3.55 > 3 appears to be a potentialoutlier. Chapter 6 - 6

3.4.1 Methods for Checking the Normality Assumption

I Histogram of the residuals, if normal, should be bell-shaped

I Pros: simple, easy to understandI Cons: for a small sample, histogram may not be

bell-shaped even though the sample is from a normaldistribution

I Normal probability plot of the residuals

I aka. normal QQ plot, in which QQ stands for“quantile-quantile”

I the best method to assess normalityI See next slide for details

Chapter 6 - 7

How to Make a Normal Probability Plot?

1. Given data (y1, y2, . . . , yn), standardize: xi = (yi − y)/s.

2. Arrange the data in increasing order: x(1), x(2), . . . , x(n).

3. Find quantiles of the N(0, 1) distribution: z( 1n+1

), z( 2n+1

),. . . , z( n

n+1).

That is, z( in+1

) is a value such that P(Z ≤ z( in+1

)) = in+1 for

Z ∼ N(0, 1).

4. Plot the x(i) values against the z( in+1

) values.

That is, plot the points (z( in+1

), x(i)) for i = 1, 2, . . . , n

Chapter 6 - 8

Interpreting Normal Probability Plots

I If the data are approximately normal, the plot will be close toa straight line.

I Systematic deviations from a straight line indicate anon-normal distribution.

I Outliers appear as points that are far away from the overallpattern of the plot.

I R command: qqnorm()

Chapter 6 - 9

●

●●

●●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●●

●

●

●

●

●●

●

●

●

●

●

●●●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●●

●

●

●

●

●

●

●●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●●

●

●

−2 −1 0 1 2

−3

−2

−1

01

2

Normal Q−Q Plot

Theoretical Quantiles

Sam

ple

Qua

ntile

s

Normal

●

●●●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●●

●

●

●

●

●

●

●●

●

●●

●

●

●●●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

−2 −1 0 1 2

01

23

45

Normal Q−Q Plot


Sam

ple

Qua

ntile

s

Skewed

●●

●

●

●

●

●

●

●

●

●●

●

●●●

●

●

●●

●

●●●

●●

●

●●

●

●

●

●

●

●

●

● ●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

● ●●

●

●

●

●●

●

●

●●

●

●

●

●

●●

● ●

●

●

●

●●

●

●

●●

●●●

●● ●

●

●

●

●

●●

●

−2 −1 0 1 2

−5

05

10

Normal Q−Q Plot


Sam

ple

Qua

ntile

s

Heavy tailed

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●●

●

●

● ●

●

●●

●

●

●

●

●●

●

●

●

●

●

−2 −1 0 1 2

0.0

0.2

0.4

0.6

0.8

1.0

Normal Q−Q Plot


Sam

ple

Qua

ntile

s

Light tailed

Chapter 6 - 10

QQ plot for the Resin Glue Residuals

> qqnorm(lmorig$res)

> qqline(lmorig$res)

> hist(lmorig$res,xlab="Residuals",main="Histogram of Residuals")

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●●

●●

●●

●●●

●

●

●

●

●

●●

●●

−2 −1 0 1 2

−10

010

20

Normal Q−Q Plot


Sam

ple

Qua

ntile

s

Histogram of Residuals

Residuals

Fre

quen

cy−20 −10 0 10 20

02

46

810

Do the residuals look normal?

Chapter 6 - 11

Example: Hodgkin’s Disease

Hodgkin’s disease is a type of lymphoma, which is a canceroriginating from white blood cells called lymphocytes. The dataset Hodgkins.txt contains plasma bradykininogen levels (inmicrograms of bradykininogen per milliliter of plasma) in 3 types ofsubjects

I normal,

I in patients with active Hodgkin’s disease, and

I in patients with inactive Hodgkin’s disease.

The globulin bradykininogen is the precursor substance forbradykinin, which is thought to be a chemical mediator ofinflammation.

I Is this an experiment?

I We can use ANOVA to compare means of several samples inan observational study.

Chapter 6 - 12

> hodgkins = read.table("Hodgkins.txt", header=T)

> library(hodgkins)

> bwplot(BradyLevel~Hodgkins, data=hodgkins)

> qplot(Hodgkins, BradyLevel, data=hodgkins)

Bra

dyLe

vel

2

4

6

8

10

12

14

Active Inactive Normal

●

● ●

●

●

●●

●

●

●

●

●●

●

●

●●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●●

●

●

●●●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●●

●●

●

●

●●

●

●●

●

●

●

●

●

●

5

10

Active Inactive NormalHodgkins

Bra

dyLe

vel

The distribution within each group looks right skewed. Let’s fit theANOVA model anyway and take a look at the residuals.

> brady1 = lm(BradyLevel ~ Hodgkins)

> anova(brady1)

Df Sum Sq Mean Sq F value Pr(>F)

Hodgkins 2 65.89 32.95 10.67 0.000104 ***

Residuals 62 191.45 3.09

Chapter 6 - 13

Normal QQ Plot for The Hodgkin Data

> qqnorm(brady1$res, ylab="Residuals")

> qqline(brady1$res)

> library(MASS)

> qqnorm(studres(brady1), ylab="Studentized Residuals")

> qqline(studres(brady1))

●●

●

●

●

●

●●

●

●

●●

●

●

●

●●

●

●

●

●

●●

●

●

●

●●

●

●

●●●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●●

●●

●

●

●●

●

●●

●

●

●

●

●

●

−2 −1 0 1 2

−2

02

46

Normal Q−Q Plot


Res

idua

ls

●●

●

●

●

●

●●

●

●

●●

●

●

●

●●

●

●

●

●

●●

●

●

●●

●

●

●

●●●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●●

●●

●

●

●●

●●

●

●

●

●

●

●

●

−2 −1 0 1 2−

10

12

34

5

Normal Q−Q Plot


Stu

dent

ized

Res

idua

ls

Does the distribution of the residuals look normal?It looks somewhat right-skewed, a potential outlier with sij > 5.

Chapter 6 - 14

Remedies for Non-Normality

Skewness can often be ameliorated by transforming the response— often a power transformation.

fλ(y) =

{yλ, if λ 6= 0

log(y), if λ = 0.

1. If right-skewned, try taking square root, logarithm, or otherpowers < 1

y −→ log(y),√y , or yλ with λ < 1

2. If left-skewned, try squaring, cubing, or other powers > 1

y −→ y2, y3, or yλ with λ > 1

Chapter 6 - 15

Example: Hodgkin’s Disease – QQ Plots

●●

●

●

●

●

●●

●

●

●●

●

●

●

●●

●

●

●

●

●●

●

●

●●

●

●

●

●●●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●●

●●

●

●

●●

●●

●

●

●

●

●

●

●

−2 −1 0 1 2

−1

01

23

45

Original Scale


Stu

dent

ized

Res

idua

ls After a log transformation, theresiduals are less skewed, and theoutlier looks less extreme.

The square-root transformationalso reduces skewness but not asmuch as the log transformation.

●●

●

●

●

●

●●

●

●

●●

●

●

●

●●

●

●

●

●

●●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●●

●●

●

●

●●

●

●●

●

●

●

●

●

●

−2 −1 0 1 2

−2

01

23

4

Square−root Transform


Stu

dent

ized

Res

idua

ls

●●

●

●

●

●

●●

●

●

●●

●

●

●

●●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●●

●●

●

●

●●

●

●●

●

●

●

●

●

●

−2 −1 0 1 2

−2

−1

01

23

Log Transform


Stu

dent

ized

Res

idua

ls

Chapter 6 - 16

R-Codes for Making the Plots on the Previous Slide

> brady1 = lm(BradyLevel ~ Hodgkins, data=hodgkins)

> brady2 = lm(sqrt(BradyLevel) ~ Hodgkins, data=hodgkins)

> brady3 = lm(log(BradyLevel) ~ Hodgkins, data=hodgkins)

> library(MASS)

> qqnorm(studres(brady1), main="Original Scale")


> qqnorm(studres(brady2), main="Square-root Transform")


> qqnorm(studres(brady3), main="Log Transform")


Chapter 6 - 17

Box-Cox MethodBox-Cox method is an automatic procedure to select the “best”power λ that make the residuals of the model

yλij = µi + εij

closest to normal.

I We usually round the optimal λ to a convenient power like

−1, −1

2, −1

3, 0,

1

3,

1

2, 1, 2, . . .

since the practical difference of y0.5827 and y0.5 is usuallysmall, but the square-root transformation is much easier tointerpret.

I A confidence interval for the optimal λ can also be obtained.See Oehlert, p.129 for details.We usually select a convenient power λ∗ in this C.I.

Chapter 6 - 18

Example: Hodgkin’s Disease – Box-Cox

In R, one must first load thelibrary MASS to use boxcox().

The argument of theboxcox() function can be amodel formula, a lm model.

> library(MASS)

> boxcox(brady1)

> boxcox(BradyLevel ~ Hodgkins, data=hodgkins)−2 −1 0 1 2

−80

−70

−60

−50

λlo

g−Li

kelih

ood

95%

The middle dash line marks the optimal λ, the right and left dashline mark the 95% C.I. for the optimal λ.

For the plot, we see the optimal λ is around −0.2, and the 95%C.I. contains 0. For simplicity, we use the log-transformedBradyLevel as our response.

Chapter 6 - 19

A Remark On the Log-Scale

In fact, for measurements of concentration, log scale morecommonly used than the original scale. For example,

I A concentration of 10.1 and 10.001 are almost the same, buta concentration of 0.1 and 0.001 are very different since 0.1 is100 times higher than 0.001.

I In the original scale, (10.1, 10.001) and (0.1, 0.001) differ bythe same amount, 0.099.

I In log scale,

log10 0.1− log10 0.001 = 2 much greater than

log10 10.1− log10 10.001 ≈ 0.0043

Chapter 6 - 20

Example: Hodgkin’s Disease

From the 2 ANOVA tables below, we see the differences of the 3group of patients become more significant after a logtransformation, because the outlier become less extreme and donot inflate the MSE as much.

Response: BradyLevel


Hodgkins 2 65.893 32.946 10.67 0.0001042 ***

Residuals 62 191.449 3.088

Response: log(BradyLevel)


Hodgkins 2 2.2526 1.12631 15.436 3.628e-06 ***

Residuals 62 4.5238 0.07297

Chapter 6 - 21

Non-Parametric Tests

I Transformation does not always work.For example, it helps little for symmetric but heavy-tailed(many outliers) distributions.

I The ANOVA F -test is robust to non-normality, but it is notresistent to outliers.

I If outliers are unavoidable, and cannot be removed, trynon-parametric tests, like permutation test in Chapter 2, andKruskal-Wallis test in Section 3.11.1, that doesn’t rely onnormality assumption, which we will introduce soon after wefinish Chapter 6.

Chapter 6 - 22

Part II: Constant Variance Assumption

Outline:

I Why Is Non-Constant Variance a Problem?I Tools for checking nonconstant variance — Residual Plots

I Residuals v.s. Fitted ValuesI Residuals v.s. TreatmentsI Residuals v.s. Other Variables

I RemediesI Transforming the Response —

Variance-Stabilizing TransformationI Brown-Forsythe Modified F -test — an alternative to ANOVA

F -testI Welch Test for Contrasts w/o Constant Variance Assumption

Chapter 6 - 23

Why Is Non-Constant Variance a Problem?Example — Serial Dilution Plating (Exercise 6.2 on p.143). . . . . . . . . . . . a method to count the number of bacteria in solution

Chapter 6 - 24

Example — Serial Dilution Plating (Cont’d)

I Goal: to compare 3 pasteurization methods for milk

I Design: 15 samples of milk randomly assigned to the 3 trts

I Response: the bacterial load in each sample after treatment,determined via serial dilution plating

I Data: http://users.stat.umn.edu/~gary/book/fcdae.data/ex6.2

Method 1 Method 2 Method 3

26× 102 35× 103 29× 105

29× 102 23× 103 23× 105

20× 102 20× 103 17× 105

22× 102 30× 103 29× 105

32× 102 27× 103 20× 105

Mean 25.8× 102 27× 103 23.6× 105

SD 492 5874 536656

√MSE = 309900

Chapter 6 - 25

Example — Serial Dilution Plating (Cont’d)

95%C.I. for the mean of Method 1:

y 1• ± t0.025,15−3

√MSE√n1

= 2580± 2.179309900√

5

= 2580± 301965

= (−299385, 304545)

which is way larger than the range of 5 observations formethod 1 (2000-3200). What happened?

Chapter 6 - 26

Checking Constant Variance Assumption (1)

Tests for equality of variance are available, but notuseful, because

1. the tests do not tell us how much nonconstantvariance is present and how much it affects ourinferences.

2. classical tests of constant variance are verysensitive to nonnormality in the errors.It is very difficult to tell non-normality fromnormality with non-constant variance.)

Chapter 6 - 27

Checking Constant Variance Assumption (2)

Better tool for checking constant variance — Residual Plots

I Residuals v.s. Fitted Values

I Residuals v.s. Treatments

I Residuals v.s. Other Variables

If the constant variance assumption is true, residuals will evenlyspread around the zero line.

Rule of thumb: ANOVA F -tests for CRD can toleratenon-constant variance to some extent, so do tests for contrasts.It usually fine as long as

max{σi}min{σi}

≤ 2, 3 or even 4,

especially when the group sizes ni are (roughly) equal.

Chapter 6 - 28

Example: Resin Glue Data> plot(lmorig$fit,lmorig$res,ylab="Residuals",xlab="Fitted Values")

> abline(h=0) # adding a zero line

> plot(temp,lmorig$res,ylab="Residuals",xlab="Centigrade Temperature")

> abline(h=0) # adding a zero line

Residuals v.s. Fitted Values Residuals v.s. Treatments●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●●

●

●●

●●

●●●●●●●

●

●

●

●●●●

20 40 60 80

−10

010

20

Fitted Values

Res

idua

ls

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●●

●

●●

●●

● ●●●●●●

●

●

●

●●●●

180 200 220 240

−10

010

20Centigrade Temperature

Res

idua

ls

I Why the points line up vertically?

I The spread of residuals increases as the fitted value increaseI The lower the temperature, the higher the variability of the

response. Chapter 6 - 29

More on the Residuals PlotsResiduals plots can be used to check many other things, likenon-linearity.

E.g., for the resin glue data, one can check if the linear modelyij = β0 + β1T + εij is appropriate by checking the residual plot

> lmorig1 = lm(life ~ temp)

> plot(temp,lmorig1$res,ylab="Residuals",xlab="Centigrade Temperature")

> abline(h=0)

●

●

●

●

●

●

●●

●

●

●

●

●●●

●

●●

●●●●●

●

●●●●●●●

●

●●●●●

−30

−10

1030

Centigrade Temperature

Res

idua

ls

175 194 213 231 250

From the residual plot, we see

I non-constant varianceacross temperature

I lifetime is curved, notlinear with temperature

Chapter 6 - 30

Plot of Residuals v.s. Variables Not in the Model

If there exist other variables that might affect the response, but arenot included in the model, then one should check the plots ofresiduals versus these variables. For example,

I if experimental units come from different batches, then plotresiduals v.s. batches

I if measurements are made by several operators, then plotresiduals should v.s. operators

Patterns in such residual plots suggest these variables should

I either be included in the analysis(but note one CANNOT claim these variables have acausal-effect on the response, since they are not controlled inadvance)

I or be controlled more carefully, e.g., by a block design, infuture experiments

Chapter 6 - 31

Remedy 1: Variance-Stabilizing TransformationIf the SD σ (the spread of residuals) changes the mean µ (thefitted values), you can try a variance-stabilizing transformation ofthe response to make the variance (closer to) constant.

I if the SD is proportional to the fitted value, then

y → log(y)

I if the SD is proportional to√

the fitted value, i.e., thevariance is proportional to the fitted value, then

y → √y

I In general, if the SD σ is proportional to (the fitted values)α,then the variance-stabilizing transformation is

y →

{y1−α for α 6= 1

log(y) for α = 1

Chapter 6 - 32

Example: Resin Glue

> lm1 = lm(life ~ as.factor(temp))

> lm2 = lm(sqrt(life) ~ as.factor(temp))

> lm3 = lm(log(life) ~ as.factor(temp))

> plot(lm1$fit, lm1$res, xlab="Fitted Value", ylab="Residuals")

> abline(h=0,lty=2)


> abline(h=0,lty=2)


> abline(h=0,lty=2)

Original Scale Square-root Transform Log Transform●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●●

●●

●●●●●●●

●

●

●

●●●●

20 40 60 80

−10

010

20

Fitted Value

Res

idua

ls

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●●

●

●

●

●●

●

●

●

●

●

●

●

●●

4 5 6 7 8 9

−1.

5−

0.5

0.5

Fitted Value

Res

idua

ls

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●●

●

●

●

●●

●

●

●

●●

●

●

●

●

●

●

●

●

●

2.5 3.0 3.5 4.0 4.5

−0.

40.

00.

20.

4

Fitted ValueR

esid

uals

Chapter 6 - 33

Box-Cox Again

In many cases, we don’t have a good idea what is the value of α.We can still try power transformation of the response.

fλ(y) =

{yλ, if λ 6= 0

log(y), if λ = 0.

How to select λ?

I Trial and error: try convenient power like −1,−1/2,−1/3, 0,1/3, 1/2, 2, . . . and then check residual plots for each of themfor the constant variance.

I Box-Cox method:Though Box-Cox is developed to select a power transformationmaking the residuals as normal as possible, it’s been shownthat the optimal λ is often close to the variance-stabilizing λ.

Chapter 6 - 34

Example – Count of Bacteria Revisit

> library(MASS)

> boxcox(count ~ as.factor(method))

> plot(method, log10(count),

xlab="Methods",

ylab="log10(Count of Bacteria)")

−2 −1 0 1 2

−10

0−

60−

200

λ

log−

Like

lihoo

d

95%

●●●●●

●●●●●

●●●

●●

1.0 1.5 2.0 2.5 3.0

3.5

4.5

5.5

6.5

Methods

log1

0(C

ount

of B

acte

ria)

After log transformation, the 3 variances look even.

Chapter 6 - 35

Example: Resin Glue – Box-Cox

> library(MASS)

> boxcox(life ~ as.factor(temp))

−2 −1 0 1 2

−70

−50

−30

−10

λ

log−

Like

lihoo

d

95%

The 95% C.I. for λ contains both 0 and 1/2. As λ = 1/2 is veryclose to the boundary of the C.I, λ = 0 seems to be a betterchoice, which is consistent with the Arrhenius Law.

Chapter 6 - 36

Drawbacks of Transformation

I Except for a few special transformation (log,√

, reciprocal),the transformed response usually lacks natural interpretation(How to interpret y0.1?)

I Unless having a good interpretation on the transformedresponse, think again before making transformations

Remember that ANOVA tests have some tolerance fornon-constant variance. If

max{σi}min{σi}

≤ 2, 3, or even 4,

don’t worry too much about non-constant variance.In that case, it is fine to leave the response untransformed.

Chapter 6 - 37

> library(mosaic)

> sd(life ~ tempC, data=resinlife)

175 194 213 231 250

12.895514 9.557785 6.378703 1.661181 3.646917

> sd(sqrt(life) ~ tempC, data=resinlife)

175 194 213 231 250

0.6802037 0.7505140 0.6223047 0.2048698 0.5305554

> sd(log(life) ~ tempC, data=resinlife)

175 194 213 231 250

0.1440872 0.2393055 0.2451210 0.1012540 0.3177677

After the log transformation, the ratio of the largest and smallestSD is 2.2, which is acceptable

Chapter 6 - 38

Brown-Forsythe Modified F -test

If transformation doesn’t work well, or you don’t want to transformfor lacking a good interpretation, try tests that don’t rely on theconstant variance assumption:

I Brown-Forsythe modified F -test — an alternative ofANOVA F -test:

BF =

∑gi=1 ni (y i• − y••)2∑gi=1 s

2i (1− ni/N)

=SSTrt∑g

i=1 s2i (1− ni/N)

in which s2i is the sample variance in treatment i . Under thenull hypothesis of equal treatment means, BF is approximatelydistributed as an F -distribution with g − 1 and ν degrees offreedom, where

ν =(∑g

i=1 di )2∑g

i=1 d2i /(ni − 1)

in which di = s2i (1− ni/N).

Chapter 6 - 39

Welch Test for Contrasts w/o Constant VarianceAssumption

If transformation doesn’t work well, try tests that doesn’t rely onthe constant variance assumption:

I Welch test for a contrast∑g

i=1 ciµi : The test statistic is

t =

∑gi=1 ciy i•√∑gi=1 c

2i s

2i /ni

which is approximately a t-distribution with ν degrees offreedom, where

ν =

(∑gi=1 c

2i s

2i /ni

)2∑gi=1

1ni−1

c4i s4i

n2i

Observe that this is a generalization of the two-sample testwithout equal variance assumption.

Chapter 6 - 40

Part III — Checking for Dependent Errors

I Among the 3 assumptions, violation of the independenceassumption causes severest problem. Most of our the analysis(ANOVA, test of contrasts, multiple comparisons, etc.) havelittle tolerance on dependence of errors

I There are various forms of dependence, serial dependence andspatial dependence are two common ones

I Remedies for Dependence

I There isn’t much we can do about dependence using ourcurrent machinery, since no simple transformation canremove dependence.

I Analysis of dependent data requires tools like time seriesor spatial statistics, which is beyond the scope of thisclass

Chapter 6 - 41

Example: Standard GravityThe National Bureau of Standards performed 8 series ofexperiments in 1924-1935 to determine g , the standard gravity.

The data are given in the table below (in deviations from9.8m/s2 × 105, e.g., the first measurement of g is 9.80076 m/s2),with series 1 representing the earliest set of experiments and series8 the last.

Series Measurements

1 76 82 83 54 35 46 87 682 87 95 98 100 109 109 100 81 75 68 673 105 83 76 75 51 76 93 75 624 95 90 76 76 87 79 77 715 76 76 78 79 72 68 75 786 78 78 78 86 87 81 73 67 75 82 837 82 79 81 79 77 79 79 78 79 82 76 73 648 84 86 85 82 77 76 77 80 83 81 78 78 78

Chapter 6 - 42

Weird ANOVA F -Test

> g = c(76,82,83,54,35,46,87,68,87,95,98,100,109,109,100,81,75,68,67,

105,83,76,75,51,76,93,75,62,95,90,76,76,87,79,77,71,

76,76,78,79,72,68,75,78,78,78,78,86,87,81,73,67,75,82,83,

82,79,81,79,77,79,79,78,79,82,76,73,64,

84,86,85,82,77,76,77,80,83,81,78,78,78)

> series = c(rep(1,8),rep(2,11),rep(3,9),rep(4,8),rep(5,8),rep(6,11),

rep(7,13),rep(8,13))

> lmg = lm(g ~ as.factor(series))

> anova(lmg)

Response: g


as.factor(series) 7 2818.6 402.66 3.5675 0.002357 **

Residuals 73 8239.4 112.87

ANOVA rejects the H0 of the 8 series having equal means. Whatdoes this mean? Will you conclude that

(a) g had changed in the 8 series of measurements, or

(b) the ANOVA F -test failed?

If your answer is (a), how do you explain the change of g?If your answer is (b), why the ANOVA F -test failed?

Chapter 6 - 43

Example: Standard Gravity

The National Bureau of Standards1 is the government agency thatmeasures things. The following statement is taken from the NISTwebsite:

Founded in 1901, NIST is a non-regulatory federalagency within the U.S. Department of Commerce. NISTmission is to promote U.S. innovation and industrialcompetitiveness by advancing measurement science,standards, and technology in ways that enhanceeconomic security and improve our quality of life.

Thus, it is safe to assume that the NBS scientists were trying hardto measure the same quantity g (e.g., all experiments were done inthe same location) throughout all eight series of experiments.

1now called the National Institute of Standards and TechnologyChapter 6 - 44

> qplot(series, g, xlab="Series",

ylab="Standard Gravity g (m/s^2)")+

scale_x_continuous(breaks=1:8)

Variance decreases with series,which makes sense since theaccuracy of measurementimproves as time goes by.

●

●●

●

●

●

●

●

●

●●●

●●

●

●

●

●●

●

●

●●

●

●

●

●

●

●

●

●●

●

●●

●

●●●●

●

●

●● ●●●

●●

●

●

●

●

●● ●●●●●●●●●●

●●

●

●●●●

●●●●●●●●●

50

70

90

110

1 2 3 4 5 6 7 8Series

Sta

ndar

d G

ravi

ty g

(m

/s^2

)

The ANOVA F -test here may not be reliable because at least theconstant variance assumption is not met

> round(sd(g ~ series), 2)

1 2 3 4 5 6 7 8

19.25 15.29 15.76 8.30 3.65 5.84 4.74 3.36

Will a transformation work here? Box-Cox?

Chapter 6 - 45

Brown-Forsythe Modified F -testIn view of the nonconstant variability, let’s try the Brown-Forsythemodified F -test

BF =SSTrt∑g

i=1 s2i (1− ni/N)

The numerator SSTrt = 2819 can be found in the ANOVA tableabove. The denominator is found using R (see the codes below) tobe 888.5747

> sds = sd(g ~ series)

> ni = tally(~ series)

> di = sds^2*(1-ni/sum(ni))

> BFbottom = sum(di)

> BFbottom

[1] 888.5747

> BF = 2819/BFbottom

The BF -statistic is thus

BF =2819

888.5747= 3.1725

Chapter 6 - 46

Under the null hypothesis of equal treatment means, BF isapproximately distributed as an F -distribution with g − 1 and νdegrees of freedom, where the degrees of freedom ν is calculatedas 29.46 in the R code below.

ν =(∑g

i=1 di )2∑g

i=1 d2i /(ni − 1)

in which di = s2i (1− ni/N)

> nu = (BFbottom)^2/sum(di^2/(ni-1))

> nu

[1] 29.46249

> 1 - pf(BF, df1 = 7, df2 = nu) # P-value of the BF-test

[1] 0.01263341

However, the BF-test, not relying on the constant varianceassumption, also rejects the null hypothesis of equal mean at aP-value 0.0126. Why the BF-test also failed?

Chapter 6 - 47

Tools for Checking Serial Dependence1. Time plot: a plot of residuals v.s. the order they are

measured)I It’s better to keep track of the order units are measured.I A smooth time-plot is a sign of positive serial

dependence, since a smooth time plot means successiveresiduals are too close together

2. Autocorrelation PlotsI Lag 1 autocorrelation plot: plotting (e1, . . . , en−1) v.s.

(e2, . . . , en)I Lag k autocorrelation plot: plotting (e1, . . . , en−k) v.s.

(e1+k , . . . , en)I Any trend in the autocorrelation plot is a sign of serial

dependence.3. Autocorrelation

I the Lag-k autocorrelation coefficient is the correlationcoefficient of (e1, . . . , en−k) v.s. (e1+k , . . . , en),k = 1, 2, 3, . . . .

Chapter 6 - 48

The observations in each series are in fact given in time ordertaken. We can thus make a time plot.

4060

8010

012

0

Sta

ndar

d G

ravi

ty g

(m

/s^2

)

●

● ●

●

●

●

●

●

●

●●

●

● ●

●

●

●

● ●

●

●

● ●

●

●

●

●

●

●

●

● ●

●

●●

●

● ●● ●

●

●

●● ● ● ●

● ●

●

●

●

●

● ● ●●

●●

●● ● ● ●

●

●●

●

●● ●

●

● ● ●●

●●

● ● ●

series 1

series 2

series 3

series 4

series 5

series 6

series 7

series 8

We can see a lot of measurements are close to the previousmeasurements, which indicates a positive serial correlation.

It’s not surprising that scientists in NIST might unconsciouslymatch their results with the previous measurement, which wasoften regarded as the most accurate one till then.

Chapter 6 - 49

Autocorrelation Plots

> qplot(g[2:81],g[1:80])

> cor(g[2:81],g[1:80])

[1] 0.5002454

> qplot(g[3:81],g[1:79])

> cor(g[3:81],g[1:79])

[1] 0.1232966

●

●●

●

●

●

●

●

●

●●

●

●●

●

●

●

● ●

●

●

●●

●

●

●

●

●

●

●

● ●

●

●●

●

●●●●

●

●

●●●● ●

●●

●

●

●

●

●●●●

●●

●●●● ●

●

●●

●

●●●

●

●● ●●

●●●●

50

70

90

110

50 70 90 110g[2:81]

g[1:

80]

●

●●

●

●

●

●

●

●

●●●

●●

●

●

●

●●

●

●

● ●

●

●

●

●

●

●

●

●●

●

●●

●

●●●●

●

●

●●● ●●

●●

●

●

●

●

●●●●

●●●

●● ●●●

●●

●

●●●

●

● ● ●●

●●●

50

70

90

110

50 70 90 110g[3:81]

g[1:

79]

Lag 1 autocorrelation = 0.5002 Lag 2 autocorrelation = 0.1233

Chapter 6 - 50

Effect of Dependent Errors

Positive serial correlation make the value of one measurementcloser to that of the previous measurement and hence reduces thewithin group variability (SSE) and enlarges the F -statistic, so it iseasier to reject the null.

Chapter 6 - 51

Spatial Dependence

Spatial dependence can arise when the experimental units arearranged in space, like plants in a farm. Spatial dependence occurswhen units that are closer together are more similar than unitsfarther apart.

Chapter 6 - 52

Chapter 6 Model Assumption Checking and Remediesyibi/teaching/stat222/2017/Lectures/C06.pdf ·...

Documents

Transcript of Chapter 6 Model Assumption Checking and Remediesyibi/teaching/stat222/2017/Lectures/C06.pdf ·...