Experimental design and statistical analyses of data


Lesson 5: Mixed models, nested anovas, split-plot designs


Randomized block design

• All treatments are allocated to the same experimental units

• Treatments are allocated at random

Blocks (b = 3), one column per block:

Block 1   Block 2   Block 3
   B         C         B
   A         B         D
   D         A         A
   C         D         C

Treatments (a = 4): A, B, C, D
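The random allocation within blocks can be sketched in a few lines of Python. This is only an illustration: the function name `randomized_block_layout` and the seed are assumptions, not part of the lecture material.

```python
import random


def randomized_block_layout(treatments, n_blocks, seed=None):
    """Return one independently shuffled treatment order per block."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        order = list(treatments)
        rng.shuffle(order)  # random allocation of treatments within the block
        layout.append(order)
    return layout


# a = 4 treatments, b = 3 blocks, as on the slide
layout = randomized_block_layout(["A", "B", "C", "D"], n_blocks=3, seed=1)
for block_no, order in enumerate(layout, start=1):
    print(f"Block {block_no}: {' '.join(order)}")
```

Every block receives each treatment exactly once; only the order within a block is random.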

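Blocks (patients) and treatments (drugs)

Patient    A              B              C              D              Average
1          $y_{A1}$       $y_{B1}$       $y_{C1}$       $y_{D1}$       $\bar{y}_1$
2          $y_{A2}$       $y_{B2}$       $y_{C2}$       $y_{D2}$       $\bar{y}_2$
3          $y_{A3}$       $y_{B3}$       $y_{C3}$       $y_{D3}$       $\bar{y}_3$
Average    $\bar{y}_A$    $\bar{y}_B$    $\bar{y}_C$    $\bar{y}_D$    $\bar{y}$

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \varepsilon$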

An alternative way of writing a GLM

$y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}$

$y_{ij}$: response of patient j receiving drug i
$\mu$: overall mean
$\alpha_i = \mu_i - \mu$: effect of drug i
$\beta_j = \mu_j - \mu$: effect of patient j
$\varepsilon_{ij}$: residual

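Predicted value of y (the response of patient j receiving drug i):

$\hat{y}_{ij} = \hat{\mu} + \hat{\alpha}_i + \hat{\beta}_j$

$\alpha_i = \mu_i - \mu$ is estimated by $\hat{\alpha}_i = \bar{y}_i - \bar{y}$

$\beta_j = \mu_j - \mu$ is estimated by $\hat{\beta}_j = \bar{y}_j - \bar{y}$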

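Patient          A        B        C        D        $\bar{y}_j$
1                5.17     5.21     4.91     4.74     5.008
2                6.23     7.34     6.18     6.31     6.515
3                4.93     4.55     4.64     4.61     4.683
$\bar{y}_i$      5.443    5.700    5.243    5.220    5.402

$\hat{\mu} = \bar{y} = 5.402$

$\hat{\alpha}_A = \bar{y}_A - \bar{y} = 5.443 - 5.402 = 0.042$
$\hat{\alpha}_B = \bar{y}_B - \bar{y} = 5.700 - 5.402 = 0.298$
$\hat{\alpha}_C = \bar{y}_C - \bar{y} = 5.243 - 5.402 = -0.158$
$\hat{\alpha}_D = \bar{y}_D - \bar{y} = 5.220 - 5.402 = -0.182$

$\sum_i \hat{\alpha}_i = 0$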

From the patient averages in the same table:

$\hat{\mu} = \bar{y} = 5.402$

$\hat{\beta}_1 = \bar{y}_1 - \bar{y} = 5.008 - 5.402 = -0.394$
$\hat{\beta}_2 = \bar{y}_2 - \bar{y} = 6.515 - 5.402 = 1.113$
$\hat{\beta}_3 = \bar{y}_3 - \bar{y} = 4.683 - 5.402 = -0.719$

$\sum_j \hat{\beta}_j = 0$

Overall mean: $\hat{\mu} = 5.402$

Effects of drugs: $\hat{\alpha}_A = 0.042$, $\hat{\alpha}_B = 0.298$, $\hat{\alpha}_C = -0.158$, $\hat{\alpha}_D = -0.182$

Effects of patients: $\hat{\beta}_1 = -0.394$, $\hat{\beta}_2 = 1.113$, $\hat{\beta}_3 = -0.719$

Ex: Patient 2 receiving treatment C:

$\hat{y}_{C2} = \hat{\mu} + \hat{\alpha}_C + \hat{\beta}_2 = 5.402 - 0.158 + 1.113 = 6.357$
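As a cross-check, the estimated effects and the predicted value can be recomputed from the data table. The sketch below uses plain Python; the variable and function names (`alpha_hat`, `beta_hat`, `predict`) are chosen here for illustration only.

```python
# Responses per drug; list positions correspond to patients 1, 2, 3
data = {
    "A": [5.17, 6.23, 4.93],
    "B": [5.21, 7.34, 4.55],
    "C": [4.91, 6.18, 4.64],
    "D": [4.74, 6.31, 4.61],
}
n_drugs = len(data)
n_patients = 3

# Overall mean (estimate of mu)
grand_mean = sum(sum(values) for values in data.values()) / (n_drugs * n_patients)

# Drug effects: drug average minus overall mean
alpha_hat = {drug: sum(values) / n_patients - grand_mean
             for drug, values in data.items()}

# Patient effects: patient average minus overall mean
beta_hat = {j + 1: sum(data[drug][j] for drug in data) / n_drugs - grand_mean
            for j in range(n_patients)}


def predict(drug, patient):
    """Predicted response: overall mean + drug effect + patient effect."""
    return grand_mean + alpha_hat[drug] + beta_hat[patient]


print(round(predict("C", 2), 3))
```

Both sets of effects sum to zero by construction, which is the constraint stated on the slides.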


Consider the two questions:

• Are the three patients different?

• Are patients in general different?

• In the first case, ”patients” is considered as a fixed factor

• In the second case, ”patients” is considered as a random factor

"Patients" is a random effect:

$y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}$

If patients are randomly chosen, $\beta_j$ will be a stochastic variable.

$\beta_j$ is assumed to be iid ND(0, $\sigma_b^2$), i.e. independently and identically normally distributed with zero mean and variance $\sigma_b^2$.

[Figure: normal probability density of $\beta$, centred at 0. Axes: value of $\beta$ (horizontal), probability (vertical).]

Variances

$V(y) = V(\mu + \alpha_i + \beta_j + \varepsilon) = V(\mu) + V(\alpha_i) + V(\beta_j) + V(\varepsilon) = \sigma_a^2 + \sigma_b^2 + \sigma^2$

$\sigma_a^2$: variance due to drug (factor a)
$\sigma_b^2$: variance due to patient (factor b)
$\sigma^2$: residual variance

Which terms remain depends on which factors are random; a fixed factor contributes no variance component.

Both factors are fixed

Variance of a single observation: $V(y) = \sigma^2$
Variance of an average: $V(\bar{y}) = V(y)/n = \sigma^2/(ab)$

"Patients" is a random factor (mixed anova)

Variance of a single observation: $V(y) = \sigma_b^2 + \sigma^2$
Variance of an average: $V(\bar{y}) = \sigma_b^2/b + \sigma^2/(ab) = (a\sigma_b^2 + \sigma^2)/(ab)$

Both factors are random

Variance of a single observation: $V(y) = \sigma_a^2 + \sigma_b^2 + \sigma^2$
Variance of an average: $V(\bar{y}) = \sigma_a^2/a + \sigma_b^2/b + \sigma^2/(ab) = (b\sigma_a^2 + a\sigma_b^2 + \sigma^2)/(ab)$

Expected Mean Squares

$E[MS_a] = b\sigma_a^2 + \sigma^2$, df = a-1
$E[MS_b] = a\sigma_b^2 + \sigma^2$, df = b-1
$E[MS_e] = \sigma^2$, df = (a-1)(b-1)

$H_0$: $\alpha_A = \alpha_B = \alpha_C = \alpha_D = 0$ → $\sigma_a^2 = 0$ → $F = MS_a/MS_e$

$H_0$: $\beta_1 = \beta_2 = \beta_3 = 0$ → $\sigma_b^2 = 0$ → $F = MS_b/MS_e$

Source      df            MS     E[MS]                       F
Drugs       a-1           MSa    $b\sigma_a^2 + \sigma^2$    MSa/MSe
Patients    b-1           MSb    $a\sigma_b^2 + \sigma^2$    MSb/MSe
Error       (a-1)(b-1)    MSe    $\sigma^2$
Total       ab-1

With the data of the example (a = 4, b = 3):

Source      df    MS       E[MS]                       F
Drugs        3    0.149    $b\sigma_a^2 + \sigma^2$    MSa/MSe
Patients     2    3.824    $a\sigma_b^2 + \sigma^2$    MSb/MSe
Error        6    0.117    $\sigma^2$
Total       11
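The mean squares in this table can be reproduced directly from the raw data. The following is a minimal sketch of a two-way anova without replication in plain Python; the function name `two_way_anova` and the dictionary layout are assumptions made for this illustration.

```python
def two_way_anova(data):
    """Two-way anova without replication.

    data maps each treatment to its responses, one per block, in block order.
    Returns degrees of freedom, mean squares and F ratios against MSe.
    """
    treatments = list(data)
    a = len(treatments)                 # number of treatments
    b = len(data[treatments[0]])        # number of blocks
    grand = sum(sum(data[t]) for t in treatments) / (a * b)

    treat_means = [sum(data[t]) / b for t in treatments]
    block_means = [sum(data[t][j] for t in treatments) / a for j in range(b)]

    ss_a = b * sum((m - grand) ** 2 for m in treat_means)
    ss_b = a * sum((m - grand) ** 2 for m in block_means)
    ss_total = sum((y - grand) ** 2 for t in treatments for y in data[t])
    ss_e = ss_total - ss_a - ss_b       # residual sum of squares

    ms_a = ss_a / (a - 1)
    ms_b = ss_b / (b - 1)
    ms_e = ss_e / ((a - 1) * (b - 1))
    return {
        "df": (a - 1, b - 1, (a - 1) * (b - 1)),
        "MSa": ms_a, "MSb": ms_b, "MSe": ms_e,
        "Fa": ms_a / ms_e, "Fb": ms_b / ms_e,
    }


data = {
    "A": [5.17, 6.23, 4.93],
    "B": [5.21, 7.34, 4.55],
    "C": [4.91, 6.18, 4.64],
    "D": [4.74, 6.31, 4.61],
}
result = two_way_anova(data)
for key, value in result.items():
    print(key, value)
```

The F ratios here use MSe as denominator for both factors, which is what the expected mean squares of this design call for.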

If "patients" is a random factor, $\sigma_b^2$ is estimated from

$E[MS_b] = a\sigma_b^2 + \sigma^2$ → $MS_b = a s_b^2 + s^2$ → $s_b^2 = (MS_b - MS_e)/a$

$s_b^2 = (3.824 - 0.117)/4 = 0.927$

Variance of a single observation:

$V(y) = \sigma_b^2 + \sigma^2 = 0.927 + 0.117 = 1.044$

Variance of the average:

$V(\bar{y}) = \sigma_b^2/b + \sigma^2/(ab) = 0.927/3 + 0.117/12 = 0.309 + 0.010 = 0.319$
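The same arithmetic as a short Python sketch, using the rounded mean squares from the anova table. The function name is hypothetical and only serves the illustration.

```python
def patient_variance_component(ms_b, ms_e, a):
    """Method-of-moments estimate of sigma_b^2 from E[MSb] = a*sigma_b^2 + sigma^2."""
    return (ms_b - ms_e) / a


ms_b, ms_e = 3.824, 0.117   # mean squares for patients and error
a, b = 4, 3                 # number of drugs and patients

s2_b = patient_variance_component(ms_b, ms_e, a)
var_single = s2_b + ms_e                  # variance of a single observation
var_mean = s2_b / b + ms_e / (a * b)      # variance of the overall average

print(s2_b, var_single, var_mean)
```

The printed values agree with the slide's 0.927, 1.044 and 0.319 up to rounding.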


How to do it with SAS

DATA eks5_1;
INPUT pat $ treat $ y; /* read the data */
CARDS; /* the data follow; they can also be read from a file */
1 A 5.17
2 A 6.23
3 A 4.93
1 B 5.21
2 B 7.34
3 B 4.55
1 C 4.91
2 C 6.18
3 C 4.64
1 D 4.74
2 D 6.31
3 D 4.61
;
PROC GLM; /* General Linear Models procedure */
TITLE 'Eksempel 5.1'; /* include if a title is wanted ("Example 5.1") */
CLASS pat treat; /* pat and treat are class (qualitative) variables */
MODEL y = pat treat;
RANDOM pat; /* patients are a random factor */
RUN;

Eksempel 5.1                      8        13:18 Monday, November 5, 2001

General Linear Models Procedure

Dependent Variable: Y

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               5        8.09475000     1.61895000      13.80    0.0031
Error               6        0.70401667     0.11733611
Corrected Total    11        8.79876667

R-Square    C.V.        Root MSE      Y Mean
0.919987    6.341443    0.34254359    5.40166667

Source    DF    Type I SS     Mean Square    F Value    Pr > F
PAT        2    7.64831667    3.82415833       32.59    0.0006
TREAT      3    0.44643333    0.14881111        1.27    0.3666

Source    DF    Type III SS   Mean Square    F Value    Pr > F
PAT        2    7.64831667    3.82415833       32.59    0.0006
TREAT      3    0.44643333    0.14881111        1.27    0.3666

In this output the mean square for PAT is MSb, the mean square for TREAT is MSa, and the mean square for Error is MSe.


Eksempel 5.1 18

09:00 Friday, November 16, 2001

General Linear Models Procedure

Source Type III Expected Mean Square

PAT Var(Error) + 4 Var(PAT)

TREAT Var(Error) + Q(TREAT)


Nested designs

Crossed design: patient j is the same for all drugs

Factor A (drug):      A, B, C, D
Factor B (patient):   1, 2, 3 (the same three patients under every drug)
Replicate:            1, 2 for each drug and patient

Model: $y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{(ij)k}$

Nested design: patient j is not the same for all drugs

Factor A (drug):      A, B, C, D
Factor B (patient):   1, 2, 3 within each drug (different patients under each drug)
Replicate:            1, 2 within each patient

Model: $y_{ijk} = \mu + \alpha_i + \beta_{(i)j} + \varepsilon_{(ij)k}$

Patients are said to be nested within drugs.

Replicates can also be regarded as nested within drugs and patients.

Rules for finding the EMS (after Dunn and Clark)

1. For each effect, write down every possible variance component containing every letter of the effect name. For example, in a two-way design with r replicates per cell, the EMS for factor A includes $\sigma_a^2$, $\sigma_{ab}^2$ and $\sigma_{(ab)e}^2$, but not $\sigma_b^2$.

2. For any nested factor, add in parentheses to the effect name the name(s) of the factor(s) within which it is nested; e.g. if B is nested in A, $\sigma_{(a)b}^2$ is the variance of $\beta_{(i)j}$.

3. For the coefficient of each variance component, use all letters not in the subscripts of the variance component.

4. For each variance component, look at any subscripts outside parentheses that are not in the effect name; if any of these letters corresponds to a fixed effect, omit that variance component.

Two-way anova (A and B fixed)

Factor A (drug):      A, B, C, D
Factor B (patient):   1, 2, 3 under every drug
Replicate:            1, 2 for each drug and patient

Model: $y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{(ij)k}$

$(\alpha\beta)_{ij}$: interaction between drug and patient
$\varepsilon_{(ij)k}$: residual of the kth replicate nested within drug i and patient j

$\sum_i \alpha_i = 0$, $\sum_j \beta_j = 0$, $\sum_i \sum_j (\alpha\beta)_{ij} = 0$

Applying the four rules step by step:

(1) Write down every possible variance component containing every letter of the effect name:

Factor A:    $\sigma_a^2 + \sigma_{ab}^2 + \sigma_{(ab)e}^2$
Factor B:    $\sigma_b^2 + \sigma_{ab}^2 + \sigma_{(ab)e}^2$
Factor AB:   $\sigma_{ab}^2 + \sigma_{(ab)e}^2$
Residual:    $\sigma_{(ab)e}^2$

(2) Add in parentheses the factors within which an effect is nested. Only the residual is nested here (within A and B), so the expressions are unchanged.

(3) For the coefficient of each variance component, use all letters not in its subscripts:

Factor A:    $br\sigma_a^2 + r\sigma_{ab}^2 + \sigma_{(ab)e}^2$
Factor B:    $ar\sigma_b^2 + r\sigma_{ab}^2 + \sigma_{(ab)e}^2$
Factor AB:   $r\sigma_{ab}^2 + \sigma_{(ab)e}^2$
Residual:    $\sigma_{(ab)e}^2$

(4) Omit every variance component that has, outside parentheses, a subscript that is not in the effect name and corresponds to a fixed effect. With A and B both fixed, $\sigma_{ab}^2$ drops out of the lines for factors A and B:

Factor A:    $br\sigma_a^2 + \sigma_{(ab)e}^2$
Factor B:    $ar\sigma_b^2 + \sigma_{(ab)e}^2$
Factor AB:   $r\sigma_{ab}^2 + \sigma_{(ab)e}^2$
Residual:    $\sigma_{(ab)e}^2$

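Two-way anova (A and B fixed)

Factor A (drug): A, B, C, D; factor B (patient): 1, 2, 3 under every drug; replicates 1, 2.

Model: $y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{(ij)k}$

$\sum_i \alpha_i = 0$, $\sum_j \beta_j = 0$, $\sum_i \sum_j (\alpha\beta)_{ij} = 0$

In general $V(y) = \sigma_a^2 + \sigma_b^2 + \sigma_{ab}^2 + \sigma_{(ab)r}^2$. With both factors fixed only the residual variance remains, so $V(y) = \sigma^2$ and $V(\bar{y}) = \sigma^2/(abr)$.

Source    df            MS      E[MS]                            F
A         a-1           MSa     $br\sigma_a^2 + \sigma^2$        MSa/MSe
B         b-1           MSb     $ar\sigma_b^2 + \sigma^2$        MSb/MSe
AB        (a-1)(b-1)    MSab    $r\sigma_{ab}^2 + \sigma^2$      MSab/MSe
Error     ab(r-1)       MSe     $\sigma^2$

The interaction component $r\sigma_{ab}^2$ is omitted from the lines for A and B by rule 4, which is why all three effects are tested against MSe.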

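Two-way anova (A fixed, B random)

Factor A (drug): A, B, C, D; factor B (patient): 1, 2, 3 under every drug; replicates 1, 2.

Model: $y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{(ij)k}$

$\sum_i \alpha_i = 0$, $\sum_i (\alpha\beta)_{ij} = 0$

$\beta_j$ is ND(0, $\sigma_b^2$); $(\alpha\beta)_{ij}$ is ND(0; $\sigma_{ab}^2(1-1/a)$)

In general $V(y) = \sigma_a^2 + \sigma_b^2 + \sigma_{ab}^2 + \sigma_{(ab)r}^2$; the term $\sigma_a^2$ vanishes because A is fixed.

Source    df            MS      E[MS]                                          F
A         a-1           MSa     $br\sigma_a^2 + r\sigma_{ab}^2 + \sigma^2$     MSa/MSab
B         b-1           MSb     $ar\sigma_b^2 + \sigma^2$                      MSb/MSe
AB        (a-1)(b-1)    MSab    $r\sigma_{ab}^2 + \sigma^2$                    MSab/MSe
Error     ab(r-1)       MSe     $\sigma^2$

NB! By rule 4 the component $r\sigma_{ab}^2$ is omitted from E[MSb], because A is fixed. The random factor B is therefore tested against MSe, while the fixed factor A is tested against MSab.

Variance of an average: $V(\bar{y}) = \sigma_b^2/b + \sigma^2/(abr) = (ar\sigma_b^2 + \sigma^2)/(abr)$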

Two-way anova (A and B random)

Factor A: A, B, C, D; factor B: 1, 2, 3 under every level of A; replicates 1, 2.

Model: $y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{(ij)k}$

$\alpha_i$ is ND(0, $\sigma_a^2$); $\beta_j$ is ND(0, $\sigma_b^2$); $(\alpha\beta)_{ij}$ is ND(0; $\sigma_{ab}^2$)

$V(y) = \sigma_a^2 + \sigma_b^2 + \sigma_{ab}^2 + \sigma_{(ab)r}^2$

Source    df            MS      E[MS]                                          F
A         a-1           MSa     $br\sigma_a^2 + r\sigma_{ab}^2 + \sigma^2$     MSa/MSab
B         b-1           MSb     $ar\sigma_b^2 + r\sigma_{ab}^2 + \sigma^2$     MSb/MSab
AB        (a-1)(b-1)    MSab    $r\sigma_{ab}^2 + \sigma^2$                    MSab/MSe
Error     ab(r-1)       MSe     $\sigma^2$

Variance of an average: $V(\bar{y}) = (br\sigma_a^2 + ar\sigma_b^2 + r\sigma_{ab}^2 + \sigma^2)/(abr)$

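Nested anova (A fixed, B random)

Factor A (drug):      A, B, C, D
Factor B (patient):   1, 2, 3 within each drug
Replicate:            1, 2 within each patient

Model: $y_{ijk} = \mu + \alpha_i + \beta_{(i)j} + \varepsilon_{(ij)k}$

$\sum_i \alpha_i = 0$; $\beta_{(i)j}$ is ND(0, $\sigma_{(a)b}^2$)

In general $V(y) = \sigma_a^2 + \sigma_{(a)b}^2 + \sigma_{(ab)r}^2$; the term $\sigma_a^2$ vanishes because A is fixed.

Source    df         MS        E[MS]                                             F
A         a-1        MSa       $br\sigma_a^2 + r\sigma_{(a)b}^2 + \sigma^2$      MSa/MS(a)b
B(A)      a(b-1)     MS(a)b    $r\sigma_{(a)b}^2 + \sigma^2$                     MS(a)b/MSe
Error     ab(r-1)    MSe       $\sigma^2$

Variance of an average: $V(\bar{y}) = \sigma_{(a)b}^2/(ab) + \sigma^2/(abr) = (r\sigma_{(a)b}^2 + \sigma^2)/(abr)$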

Nested anova (A and B random)

Factor A (doctor):    A, B, C, D
Factor B (patient):   1, 2, 3 within each doctor
Replicate:            1, 2 within each patient

Model: $y_{ijk} = \mu + \alpha_i + \beta_{(i)j} + \varepsilon_{(ij)k}$

$\alpha_i$ is ND(0, $\sigma_a^2$); $\beta_{(i)j}$ is ND(0, $\sigma_{(a)b}^2$)

$V(y) = \sigma_a^2 + \sigma_{(a)b}^2 + \sigma_{(ab)r}^2$

Source    df         MS        E[MS]                                             F
A         a-1        MSa       $br\sigma_a^2 + r\sigma_{(a)b}^2 + \sigma^2$      MSa/MS(a)b
B(A)      a(b-1)     MS(a)b    $r\sigma_{(a)b}^2 + \sigma^2$                     MS(a)b/MSe
Error     ab(r-1)    MSe       $\sigma^2$

Variance of an average: $V(\bar{y}) = \sigma_a^2/a + \sigma_{(a)b}^2/(ab) + \sigma^2/(abr) = (br\sigma_a^2 + r\sigma_{(a)b}^2 + \sigma^2)/(abr)$

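Four level nested anova

Treatment (a = 3):   40%, 20%, 0%
Tree (b = 2):        trees 1, 2 within each treatment
Leaf (c = 3):        leaves 1, 2, 3 within each tree
Replicate (r = 2):   replicates 1, 2 within each leaf

Model: $y_{ijkr} = \mu + \alpha_i + \beta_{(i)j} + \gamma_{(ij)k} + \varepsilon_{(ijk)r}$

$\sum_i \alpha_i = 0$; $\beta_{(i)j}$ is ND(0, $\sigma_{(a)b}^2$); $\gamma_{(ij)k}$ is ND(0, $\sigma_{(ab)c}^2$)

In general $V(y) = \sigma_a^2 + \sigma_{(a)b}^2 + \sigma_{(ab)c}^2 + \sigma_{(abc)r}^2$; the term $\sigma_a^2$ vanishes because the treatment is fixed.

Variance of an average: $V(\bar{y}) = \sigma_{(a)b}^2/(ab) + \sigma_{(ab)c}^2/(abc) + \sigma^2/(abcr) = (cr\sigma_{(a)b}^2 + r\sigma_{(ab)c}^2 + \sigma^2)/(abcr)$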

Source        df          MS         E[MS]                                                                  F
Treatments    a-1         MSa        $bcr\sigma_a^2 + cr\sigma_{(a)b}^2 + r\sigma_{(ab)c}^2 + \sigma^2$     MSa/MS(a)b
Trees         a(b-1)      MS(a)b     $cr\sigma_{(a)b}^2 + r\sigma_{(ab)c}^2 + \sigma^2$                     MS(a)b/MS(ab)c
Leaves        ab(c-1)     MS(ab)c    $r\sigma_{(ab)c}^2 + \sigma^2$                                         MS(ab)c/MSe
Error         abc(r-1)    MSe        $\sigma^2$

Estimating the variance components:

$MS_{(ab)c} = r s_{(ab)c}^2 + s^2$ → $s_{(ab)c}^2 = (MS_{(ab)c} - MS_e)/r$

$MS_{(a)b} = cr s_{(a)b}^2 + r s_{(ab)c}^2 + s^2 = cr s_{(a)b}^2 + MS_{(ab)c}$ → $s_{(a)b}^2 = (MS_{(a)b} - MS_{(ab)c})/(cr)$

$MS_a = bcr s_a^2 + cr s_{(a)b}^2 + r s_{(ab)c}^2 + s^2 = bcr s_a^2 + MS_{(a)b}$ → $s_a^2 = (MS_a - MS_{(a)b})/(bcr)$

$V(y) = \sigma_a^2 + \sigma_{(a)b}^2 + \sigma_{(ab)c}^2 + \sigma_{(abc)r}^2$ is estimated by $\hat{V}(y) = s_a^2 + s_{(a)b}^2 + s_{(ab)c}^2 + s^2$

How to do it with SAS

DATA nested; /* Nested anova (example 6-4 in the lecture notes) */
INFILE 'H:\lin-mod\eks6x.prn' firstobs =2 ;
INPUT treat $ tree $ leaf $ disc $ Nitro ;

PROC GLM;
CLASS treat tree leaf disc;
MODEL Nitro = treat tree(treat) leaf(tree treat);
/* treatment is a fixed factor, while trees and leaves are random */
RANDOM tree(treat) leaf(tree treat);
/* gives the expected mean squares */
RUN;


General Linear Models Procedure

Dependent Variable: NITRO

Source DF Sum of Squares Mean Square F Value Pr > F

Model 17 134.04000000 7.88470588 8.00 0.0001

Error 18 17.75000000 0.98611111

Corrected Total 35 151.79000000

R-Square C.V. Root MSE NITRO Mean

0.883062 3.271932 0.99303127 30.35000000

Source DF Type I SS Mean Square F Value Pr > F

TREAT 2 71.78000000 35.89000000 36.40 0.0001

TREE(TREAT) 3 36.04666667 12.01555556 12.18 0.0001

LEAF(TREAT*TREE) 12 26.21333333 2.18444444 2.22 0.0618

Source DF Type III SS Mean Square F Value Pr > F

TREAT 2 71.78000000 35.89000000 36.40 0.0001

TREE(TREAT) 3 36.04666667 12.01555556 12.18 0.0001

LEAF(TREAT*TREE) 12 26.21333333 2.18444444 2.22 0.0618

NB! These F values and P-values are based on MSe as the error term, which is wrong for TREAT and TREE(TREAT)!



General Linear Models Procedure

Source Type III Expected Mean Square

TREAT Var(Error) + 2 Var(LEAF(TREAT*TREE)) + 6 Var(TREE(TREAT))

+ Q(TREAT)

TREE(TREAT) Var(Error) + 2 Var(LEAF(TREAT*TREE)) + 6 Var(TREE(TREAT))

LEAF(TREAT*TREE) Var(Error) + 2 Var(LEAF(TREAT*TREE))


PROC GLM;

CLASS treat tree leaf disc;

MODEL Nitro = treat tree(treat) leaf(tree treat);

/* treatment is a fixed factor, while trees and leaves are random */

RANDOM tree(treat) leaf(tree treat);

/* gives the expected means squares */

TEST h=treat e= tree(treat);

/* tests for the difference between treatments with MS for tree(treat) as denominator */

TEST h= tree(treat) e=leaf(tree treat);

/* tests for the difference between trees with MS for leaf(tree treat) as denominator*/


General Linear Models Procedure

Dependent Variable: NITRO

Tests of Hypotheses using the Type III MS for TREE(TREAT) as an error term

Source DF Type III SS Mean Square F Value Pr > F

TREAT 2 71.78000000 35.89000000 2.99 0.1933

Tests of Hypotheses using the Type III MS for LEAF(TREAT*TREE) as an error term

Source DF Type III SS Mean Square F Value Pr > F

TREE(TREAT) 3 36.04666667 12.01555556 5.50 0.0130
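The effect of choosing the right error term can be checked by hand from the mean squares in the SAS output. The sketch below only forms the F ratios; the dictionary names are illustrative, and the P-values are not recomputed because that would need an F-distribution routine.

```python
# Mean squares taken from the PROC GLM output above
ms = {
    "treat": 35.89,
    "tree(treat)": 12.01555556,
    "leaf(treat*tree)": 2.18444444,
    "error": 0.98611111,
}

# Denominator for each effect, following the expected mean squares
error_term = {
    "treat": "tree(treat)",
    "tree(treat)": "leaf(treat*tree)",
    "leaf(treat*tree)": "error",
}

f_correct = {effect: ms[effect] / ms[den] for effect, den in error_term.items()}
f_against_mse = {effect: ms[effect] / ms["error"] for effect in error_term}

for effect in error_term:
    print(f"{effect:18s} F (correct) = {f_correct[effect]:6.2f}   "
          f"F (against MSe) = {f_against_mse[effect]:6.2f}")
```

Only the lowest nested level (leaves) is correctly tested against MSe; for treatments the ratio drops from 36.40 to 2.99 once the tree mean square is used as denominator.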


PROC GLM;

CLASS treat tree leaf disc;

MODEL Nitro = treat tree(treat) leaf(tree treat);

/* treatment is a fixed factor, while trees and leaves are random */

RANDOM tree(treat) leaf(tree treat);

/* gives the expected means squares */

TEST h=treat e= tree(treat);

/* tests for the difference between treatments with MS for tree(treat) as denominator */

TEST h= tree(treat) e=leaf(tree treat);

/* tests for the difference between trees with MS for leaf(tree treat) as denominator*/

MEANS treat / Tukey Dunnett('Control') e= tree(treat) cldiff;

/* finds possible significant differences between treatments and the control and the other treatments */

RUN;


Tukey's Studentized Range (HSD) Test for variable: NITRO

NOTE: This test controls the type I experimentwise error rate.

Alpha= 0.05 Confidence= 0.95 df= 3 MSE= 12.01556

Critical Value of Studentized Range= 5.910

Minimum Significant Difference= 5.9134

Comparisons significant at the 0.05 level are indicated by '***'.

Simultaneous Simultaneous

Lower Difference Upper

TREAT Confidence Between Confidence

Comparison Limit Means Limit

20% - 40% -3.663 2.250 8.163

20% - Control -2.513 3.400 9.313

40% - 20% -8.163 -2.250 3.663

40% - Control -4.763 1.150 7.063

Control - 20% -9.313 -3.400 2.513

Control - 40% -7.063 -1.150 4.763


Dunnett's T tests for variable: NITRO

NOTE: This tests controls the type I experimentwise error for

comparisons of all treatments against a control.

Alpha= 0.05 Confidence= 0.95 df= 3 MSE= 12.01556

Critical Value of Dunnett's T= 3.866

Minimum Significant Difference= 5.4714

Comparisons significant at the 0.05 level are indicated by '***'.

Simultaneous Simultaneous

Lower Difference Upper

TREAT Confidence Between Confidence

Comparison Limit Means Limit

20% - Control -2.071 3.400 8.871

40% - Control -4.321 1.150 6.621


PROC NESTED;

CLASS treat tree leaf;

VAR Nitro;

RUN;

Coefficients of Expected Mean Squares

Source    TREAT    TREE    LEAF    ERROR
TREAT        12       6       2        1
TREE          0       6       2        1
LEAF          0       0       2        1
ERROR         0       0       0        1

These coefficients agree with the general table, since a = 3, b = 2, c = 3 and r = 2 give bcr = 12, cr = 6 and r = 2:

Source        df          MS         E[MS]                                                                  F
Treatments    a-1         MSa        $bcr\sigma_a^2 + cr\sigma_{(a)b}^2 + r\sigma_{(ab)c}^2 + \sigma^2$     MSa/MS(a)b
Trees         a(b-1)      MS(a)b     $cr\sigma_{(a)b}^2 + r\sigma_{(ab)c}^2 + \sigma^2$                     MS(a)b/MS(ab)c
Leaves        ab(c-1)     MS(ab)c    $r\sigma_{(ab)c}^2 + \sigma^2$                                         MS(ab)c/MSe
Error         abc(r-1)    MSe        $\sigma^2$


Nested Random Effects Analysis of Variance for Variable NITRO

Degrees

Variance of Sum of Error

Source Freedom Squares F Value Pr > F Term

TOTAL 35 151.790000

TREAT 2 71.780000 2.987 0.1933 TREE

TREE 3 36.046667 5.501 0.0130 LEAF

LEAF 12 26.213333 2.215 0.0618 ERROR

ERROR 18 17.750000

Variance Variance Percent

Source Mean Square Component of Total

TOTAL 4.336857 5.213333 100.0000

TREAT 35.890000 1.989537 38.1625

TREE 12.015556 1.638519 31.4294

LEAF 2.184444 0.599167 11.4930

ERROR 0.986111 0.986111 18.9152

Mean 30.35000000

Standard error of mean 0.99847105

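The variance components in the output follow from:

$s_{(ab)c}^2 = (MS_{(ab)c} - MS_e)/r$

$s_{(a)b}^2 = (MS_{(a)b} - MS_{(ab)c})/(cr)$

$s_a^2 = (MS_a - MS_{(a)b})/(bcr)$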
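These estimators can be applied to the mean squares reported by PROC NESTED to recover its variance components. The following is a minimal Python sketch; the variable names are chosen here and are not SAS output names.

```python
# Design sizes: treatments, trees per treatment, leaves per tree, replicates per leaf
a, b, c, r = 3, 2, 3, 2

# Mean squares from the PROC NESTED output
ms_treat, ms_tree, ms_leaf, ms_error = 35.890000, 12.015556, 2.184444, 0.986111

# Method-of-moments estimates, working upwards from the residual
s2_error = ms_error
s2_leaf = (ms_leaf - ms_error) / r
s2_tree = (ms_tree - ms_leaf) / (c * r)
s2_treat = (ms_treat - ms_tree) / (b * c * r)

total = s2_treat + s2_tree + s2_leaf + s2_error
components = {"TREAT": s2_treat, "TREE": s2_tree, "LEAF": s2_leaf, "ERROR": s2_error}

for name, value in components.items():
    print(f"{name:6s} {value:9.6f} {100 * value / total:8.4f} %")
print(f"TOTAL  {total:9.6f}")
```

Note that PROC NESTED treats every level as random, so it also reports a component for TREAT even though the treatment is regarded as fixed in the lecture's model.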


The problem of pseudoreplication

Two-way anova (A fixed, B random)

Factor A (drug):      A, B, C
Factor B (patient):   1, 2, 3 for each drug
Replicate:            1, 2 for each patient

18 measurements

If we want to increase the power of the analysis, we may e.g. double the number of measurements.

But be careful about what you do!

Design 1

Factor A (drug):      A, B, C
Factor B (patient):   1, 2, 3 for each drug
Replicate:            1, 2, 3, 4 for each patient

3 experimental units/treatment

Design 2

Factor A (drug):      A, B, C
Factor B (patient):   1, 2, 3, 4, 5, 6 for each drug
Replicate:            1, 2 for each patient

6 experimental units/treatment

Both experiments have 36 measurements. The repeated measurements on the same patient are pseudoreplicates.

Design 2 is best because it uses 6 experimental units/treatment.
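Why the second design is better can be seen from the variance of the average in the mixed model (A fixed, B random), $V(\bar{y}) = \sigma_b^2/b + \sigma^2/(abr)$. The sketch below evaluates this formula for both designs. The variance values 0.9 and 0.1 are hypothetical and chosen purely for illustration, as is the function name.

```python
def variance_of_average(sigma2_b, sigma2, a, b, r):
    """V(ybar) = sigma_b^2 / b + sigma^2 / (a*b*r) for A fixed, B random."""
    return sigma2_b / b + sigma2 / (a * b * r)


# Hypothetical variance components (assumptions, not estimates from the lecture)
sigma2_b = 0.9   # variance between patients
sigma2 = 0.1     # residual variance

design_1 = variance_of_average(sigma2_b, sigma2, a=3, b=3, r=4)  # 36 measurements
design_2 = variance_of_average(sigma2_b, sigma2, a=3, b=6, r=2)  # 36 measurements

print(f"Design 1: V(ybar) = {design_1:.4f}")
print(f"Design 2: V(ybar) = {design_2:.4f}")
```

Both designs have the same residual term because abr = 36 in each case; only the patient term changes, and it is halved when the number of patients is doubled. Extra replicates on the same patients do not reduce it at all.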

Four level nested anova

Treatment (a = 3):   40%, 20%, 0%
Tree (b = 2):        trees 1, 2 within each treatment
Leaf (c = 3):        leaves 1, 2, 3 within each tree
Replicate (r = 2):   replicates 1, 2 within each leaf

Trees are the experimental units (2 replicates/treatment). The leaves and the replicate measurements within leaves are pseudoreplicates.


Split-plot designs

• Three types of fertilizers

• Two types of soil treatment

• Interactions between fertilizers and soil treatment

Soil treatments: 2 whole-plots within each block

Block 1:   A2, A1
Block 2:   A1, A2
Block 3:   A1, A2
Block 4:   A2, A1

Fertilizer treatments: 3 sub-plots within each whole-plot

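Analysis of whole-plots

Model: $y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{(ij)k}$

$\alpha_i$: effect of soil treatment, with $\sum_i \alpha_i = 0$
$\beta_j$: effect of block; $\beta_j$ is ND(0, $\sigma_b^2$)
$(\alpha\beta)_{ij}$: interaction between soil and block

Factor                df                MS      E[MS]                           F
Soil treatment (A)    a-1 = 1           MSa     $b\sigma_a^2 + \sigma_{ab}^2$   MSa/MSab
Block (B)             b-1 = 3           MSb     $a\sigma_b^2 + \sigma_{ab}^2$   MSb/MSab
Soil*Block (AB)       (a-1)(b-1) = 3    MSab    $\sigma_{ab}^2$
Error                 0
Total                 ab-1 = 7

There are no degrees of freedom left for error, so the interaction term serves as the error term.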

Analysis of sub-plots

Model: $\hat{y}_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \gamma_k + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk}$

$\gamma_k$: effect of fertilizer, with $\sum_k \gamma_k = 0$
$(\alpha\gamma)_{ik}$: interaction between soil treatment and fertilizer, with $\sum_i \sum_k (\alpha\gamma)_{ik} = 0$
$(\beta\gamma)_{jk}$: interaction between block and fertilizer; $(\beta\gamma)_{jk}$ is ND(0, $\sigma_{bc}^2(1-1/c)$)

Full table, with the three-factor interaction as the only available error term:

Factor                    df                     MS       E[MS]                                F
Whole plots               ab-1 = 7
Fertilizer (C)            c-1 = 2                MSc      $ab\sigma_c^2 + \sigma_{abc}^2$      MSc/MSabc
Soil*Fertilizer (AC)      (a-1)(c-1) = 2         MSac     $b\sigma_{ac}^2 + \sigma_{abc}^2$    MSac/MSabc
Block*Fert. (BC)          (b-1)(c-1) = 6         MSbc     $a\sigma_{bc}^2 + \sigma_{abc}^2$    MSbc/MSabc
Soil*Block*Fert. (ABC)    (a-1)(b-1)(c-1) = 6    MSabc    $\sigma_{abc}^2$
Error                     0
Total                     abc-1 = 23

Treating the three-factor interaction as the residual ($\sigma_{abc}^2 = \sigma^2$, MSabc = MSe):

Factor                    df                     MS       E[MS]                            F
Whole plots               ab-1 = 7
Fertilizer (C)            c-1 = 2                MSc      $ab\sigma_c^2 + \sigma^2$        MSc/MSe
Soil*Fertilizer (AC)      (a-1)(c-1) = 2         MSac     $b\sigma_{ac}^2 + \sigma^2$      MSac/MSe
Block*Fert. (BC)          (b-1)(c-1) = 6         MSbc     $a\sigma_{bc}^2 + \sigma^2$      MSbc/MSe
Error (ABC)               (a-1)(b-1)(c-1) = 6    MSe      $\sigma^2$
Total                     abc-1 = 23

If the block*fertilizer interaction is also left out of the model, only the fixed sub-plot effects are tested, and the degrees of freedom of BC go into the error:

Factor                    df                     MS       E[MS]                            F
Whole plots               ab-1 = 7
Fertilizer (C)            c-1 = 2                MSc      $ab\sigma_c^2 + \sigma^2$        MSc/MSe
Soil*Fertilizer (AC)      (a-1)(c-1) = 2         MSac     $b\sigma_{ac}^2 + \sigma^2$      MSac/MSe
Error (BC + ABC)          6 + 6 = 12             MSe      $\sigma^2$
Total                     abc-1 = 23

How to do it with SAS


DATA SplitPlt;

/* Example 6-8 in the lecture notes */

/* block = block effect (random factor) */

/* soil = effect of soil treatment (whole-plot effect) */

/* fert = effect of fertilizer (subplot effect) */

/* yield = dependent variable */

INFILE 'h:\lin-mod\eks6-8.prn';

INPUT soil $ block $ fert $ yield;

PROC GLM;

TITLE 'Split plot - full model';

CLASS block soil fert;

MODEL yield= block soil block*soil fert soil*fert block*fert ;

RANDOM block; /* declare block as a random effect */

TEST h = soil e = block*soil; /* tests effect of wholeplot */

TEST h = block e = block*soil; /* tests effect of blocks */

RUN;

Split plot - full model

The GLM Procedure
Dependent Variable: yield

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              17       32796.58333     1929.21078       3.24    0.0764
Error               6        3575.41667      595.90278
Corrected Total    23       36372.00000

R-Square    Coeff Var    Root MSE    yield Mean
0.901699    15.02223     24.41112    162.5000

Source        DF    Type III SS    Mean Square    F Value    Pr > F
block          3      588.33333      196.11111       0.33    0.8050
soil           1     7848.16667     7848.16667      13.17    0.0110
block*soil     3     3740.83333     1246.94444       2.09    0.2027
fert           2    10950.75000     5475.37500       9.19    0.0149
soil*fert      2      462.58333      231.29167       0.39    0.6942
block*fert     6     9205.91667     1534.31944       2.57    0.1373

Tests of Hypotheses Using the Type III MS for block*soil as an Error Term

Source    DF    Type III SS    Mean Square    F Value    Pr > F
soil       1    7848.166667    7848.166667       6.29    0.0870
block      3     588.333333     196.111111       0.16    0.9185

In the Type III table, fert, soil*fert and block*fert are the sub-plot effects. The rows for block and soil are whole-plot effects. NB! Their P-values in the Type III table cannot be used! Instead use the whole-plot results from the tests with block*soil as the error term.
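The difference between the two sets of whole-plot tests is only the denominator of the F ratio. A small Python sketch with the mean squares from the output above (variable names are illustrative):

```python
# Type III mean squares from the full split-plot model
ms_soil = 7848.16667
ms_block = 196.11111
ms_block_soil = 1246.94444   # whole-plot error term
ms_residual = 595.90278      # sub-plot residual (MSe)

# Wrong for whole-plot effects: tested against the sub-plot residual
f_soil_wrong = ms_soil / ms_residual
f_block_wrong = ms_block / ms_residual

# Correct for whole-plot effects: tested against block*soil
f_soil = ms_soil / ms_block_soil
f_block = ms_block / ms_block_soil

print(f"soil : F = {f_soil_wrong:.2f} against MSe, {f_soil:.2f} against block*soil")
print(f"block: F = {f_block_wrong:.2f} against MSe, {f_block:.2f} against block*soil")
```

The correct whole-plot tests have only 3 denominator degrees of freedom, which is why the soil effect is no longer significant at the 5% level in the SAS output (P = 0.0870).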

PROC GLM;

TITLE 'Split plot - reduced model block*fert omitted';

CLASS block soil fert;

MODEL yield= block soil block*soil fert soil*fert;

RANDOM block;

TEST h = soil e = block*soil; /* tests effect of wholeplot */

TEST h = block e = block*soil; /* tests effect of blocks */

RUN;

Split plot - reduced model block*fert omitted

The GLM Procedure
Dependent Variable: yield

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              11       23590.66667     2144.60606       2.01    0.1224
Error              12       12781.33333     1065.11111
Corrected Total    23       36372.00000

R-Square    Coeff Var    Root MSE    yield Mean
0.648594    20.08372     32.63604    162.5000

Source        DF    Type III SS    Mean Square    F Value    Pr > F
block          3      588.33333      196.11111       0.18    0.9051
soil           1     7848.16667     7848.16667       7.37    0.0188
block*soil     3     3740.83333     1246.94444       1.17    0.3615
fert           2    10950.75000     5475.37500       5.14    0.0244
soil*fert      2      462.58333      231.29167       0.22    0.8079

PROC GLM;

TITLE 'Split plot - reduced model block*fert and soil*fert omitted';

CLASS block soil fert;

MODEL yield= block soil block*soil fert;

RANDOM block;

TEST h = soil e = block*soil; /* tests effect of wholeplot */

TEST h = block e = block*soil; /* tests effect of blocks */

MEANS soil /TUKEY e= block*soil CLM CLDIFF; /* confidence limits for wholeplot effects */

MEANS fert /TUKEY CLM CLDIFF; /* confidence limits for subplot effects */

RUN;

Split plot - reduced model block*fert and soil*fert omitted

Dependent Variable: yield

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               9       23128.08333     2569.78704       2.72    0.0457
Error              14       13243.91667      945.99405
Corrected Total    23       36372.00000

R-Square    Coeff Var    Root MSE    yield Mean
0.635876    18.92739     30.75702    162.5000

Source        DF    Type III SS    Mean Square    F Value    Pr > F
block          3      588.33333      196.11111       0.21    0.8896
soil           1     7848.16667     7848.16667       8.30    0.0121
block*soil     3     3740.83333     1246.94444       1.32    0.3079
fert           2    10950.75000     5475.37500       5.79    0.0147

The GLM Procedure

Tukey's Studentized Range (HSD) Test for yield

NOTE: This test controls the Type I experimentwise error rate.

Alpha                                  0.05
Error Degrees of Freedom               3
Error Mean Square                      1246.944
Critical Value of Studentized Range    4.50067
Minimum Significant Difference         45.879

Comparisons significant at the 0.05 level are indicated by ***.

soil          Difference       Simultaneous 95%
Comparison    Between Means    Confidence Limits
2 - 1              36.17        -9.71     82.05
1 - 2             -36.17       -82.05      9.71

The GLM Procedure

Tukey's Studentized Range (HSD) Test for yield

NOTE: This test controls the Type I experimentwise error rate.

Alpha                                  0.05
Error Degrees of Freedom               14
Error Mean Square                      945.994
Critical Value of Studentized Range    3.70139
Minimum Significant Difference         40.25

Comparisons significant at the 0.05 level are indicated by ***.

fert          Difference       Simultaneous 95%
Comparison    Between Means    Confidence Limits
3 - 2              15.38       -24.87     55.62
3 - 1              51.00        10.75     91.25  ***
2 - 3             -15.38       -55.62     24.87
2 - 1              35.63        -4.62     75.87
1 - 3             -51.00       -91.25    -10.75  ***
1 - 2             -35.63       -75.87      4.62
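The minimum significant differences in the two Tukey listings can be reproduced from the critical value and the error mean square, using MSD = q * sqrt(MSE / n), where n is the number of observations behind each mean. In the sketch below, n is inferred from the design (24 observations, so 12 per soil treatment and 8 per fertilizer); that inference and the function name are assumptions of this illustration, not part of the SAS output.

```python
import math


def tukey_msd(q_critical, error_mean_square, n_per_mean):
    """Minimum significant difference for Tukey's HSD with equal group sizes."""
    return q_critical * math.sqrt(error_mean_square / n_per_mean)


# Whole-plot effect (soil): error term is block*soil with 3 df, 12 observations per mean
msd_soil = tukey_msd(4.50067, 1246.944, n_per_mean=12)

# Sub-plot effect (fert): residual error with 14 df, 8 observations per mean
msd_fert = tukey_msd(3.70139, 945.994, n_per_mean=8)

print(f"soil: MSD = {msd_soil:.3f}")
print(f"fert: MSD = {msd_fert:.3f}")
```

Although the soil means are based on more observations, their minimum significant difference is larger, because the whole-plot error term has fewer degrees of freedom (a larger critical value) and a larger mean square.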