Page 1

Linear models

F. Farnir, E. Moyse

Biostatistics & Bioinformatics

Faculty of Vet. Medicine

University of Liege

Page 2

Outline of the course

• Linear models
  ◦ Basic formulation
  ◦ The two tasks
    • Estimating parameters
    • Testing hypotheses about the parameters
• Examples of linear models
  ◦ Simple linear regression
  ◦ Simple ANOVA
  ◦ A more complicated example
  ◦ A more complex situation: repeated measures

Page 3

Linear models

• A basic formulation
  ◦ A model is a linear model if the relationship between the
    parameters and the modelled variable is linear.
  ◦ Examples:
    • Linear regression: y(i) = β0 + β1*x(i) + e(i)
    • Quadratic regression: y(i) = β0 + β1*x(i) + β2*x(i)² + e(i)
    • One-way ANOVA: y(ij) = µ + a(i) + e(ij)
    • Multi-way ANOVA, other regressions, mixtures of ANOVA and
      regression, …

Page 4

Linear models

• Matrix formulation: y = X*β + e
  ◦ y' = (y(1),…,y(n)) = vector of observations
    • known (observed, or measured)
  ◦ β' = (β(1),…,β(m)) = vector of parameters
    • unknown, and to be estimated
  ◦ e' = (e(1),…,e(n)) = vector of residuals
    • unknown, but assumed N(0,σ²*I)
  ◦ X = design (or incidence) matrix, linking the parameters to the
    observations
    • known

Page 5

Linear models

• Two main problems:
  ◦ How do we obtain estimators b for the parameters β of the model?
  ◦ How do we test hypotheses about the parameters of the model?

Page 6

Linear models

• Estimation problem:
  ◦ The rationale we use to infer estimators is to choose parameter
    values that make the errors « as small as possible »
    • To simultaneously reduce all components of e, we choose to
      minimize e'*e = Σ e(i)²
    • This leads to a « least-squares (of the error) estimate » of
      the following form:

        b = (X'*X)⁻¹*X'*y

Page 7

Linear models

• Estimation problem: example 1
  ◦ Assume we have the following simple problem:

      10 = 3*β + e1
      12 = 2*β + e2

  ◦ 2 equations – 1 unknown => no exact solution
  ◦ We choose an estimator of β that minimizes the sum of the
    squared errors:

      min (e1² + e2²) = min [(10 – 3*β)² + (12 – 2*β)²]

Page 8

Linear models

• Estimation problem: example 1
  ◦ Minimizing this function of β can be achieved by differentiating
    it with respect to β and setting the derivative equal to 0:

      2*(10 – 3*b)*(-3) + 2*(12 – 2*b)*(-2) = 0
      => b = 54/13

  ◦ No other value of β makes the sum of squared errors smaller.

Page 9

Linear models

• Estimation problem: example 1
  ◦ Using the matrix notation:

      y' = (10 12), X' = (3 2), b = (b), e' = (e1 e2)

  ◦ X'*X = 13 => (X'*X)⁻¹ = 1/13
    X'*y = 54
  ◦ => b = (X'*X)⁻¹*X'*y = 54/13
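
The same computation can be checked directly in R (a minimal sketch;
the variable names are ours):

> y <- c(10, 12)
> X <- matrix(c(3, 2), nrow = 2)
> solve(t(X) %*% X) %*% t(X) %*% y   # = 54/13
         [,1]
[1,] 4.153846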

Page 10

Linear models

• Estimation problem: example 2
  ◦ The simplest linear model:

      yi = µ + ei

  ◦ In matrix form: y = X*β + e
    • y' = (y1 y2 … yn)
    • X' = (1 1 … 1)
    • β = (µ)
    • e' = (e1 e2 … en)
  ◦ The estimator b of β is then: b = (X'*X)⁻¹*X'*y

Page 11

Linear models

• Estimation problem: example 2
  ◦ Let's compute b:
    • X'*X = n => (X'*X)⁻¹ = 1/n
    • X'*y = y1 + y2 + … + yn = Σ yi
    • b = (X'*X)⁻¹*X'*y = 1/n * Σ yi = m
  ◦ Let's compute (y-X*b)'*(y-X*b)/(n-r(X)):
    • (X*b)' = (m m … m)
    • (y-X*b)' = (y1-m y2-m … yn-m)
    • (y-X*b)'*(y-X*b) = (y1-m)² + (y2-m)² + … + (yn-m)² = Σ (yi-m)²
    • r(X) = rank of X (# of independent columns) = 1

Page 12

Linear models

• Estimation problem: example 2
  ◦ Consequently:
    • (y-X*b)'*(y-X*b)/(n-r(X)) = Σ (yi-m)²/(n-1) = s²
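
A quick numeric check in R (a minimal sketch; it reuses the weights
data introduced in the regression example below):

> y <- c(4.1, 4.6, 4.9, 5.2, 5.4)
> n <- length(y)
> X <- matrix(1, nrow = n)              # X' = (1 1 ... 1)
> b <- solve(t(X) %*% X) %*% t(X) %*% y
> c(b, mean(y))                         # both give the sample mean
[1] 4.84 4.84
> c(t(y - X %*% b) %*% (y - X %*% b) / (n - 1), var(y))
[1] 0.263 0.263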

Page 13

Linear models

• Testing problem:
  ◦ Most of the (null) hypotheses we might want to test can be
    written as:

      H0: L*β = c

    where L and c are a known matrix and a known vector,
    respectively. This is known as the « general linear hypothesis ».

Page 14

Linear models

• Testing problem (cont'd):
  ◦ A general test, based on an F statistic, can be devised for such
    hypotheses:

      Fq,n-r(X) = [(L*b-c)'*(L*G*L')⁻¹*(L*b-c)/q] / [(y-X*b)'*(y-X*b)/(n-r(X))]

    • G is a generalized inverse of X'*X. If the hypothesis is
      « testable », all G provide the same F value
    • q = # of independent rows of L
    • The hypothesis is of course embedded in the statistic
    • The denominator is the estimator of σ²
    • Normality assumptions are necessary to obtain the F
      distribution

Page 15

Examples of linear models

• A simple linear regression:
  ◦ Let's consider the following dataset:

      Week   Weight
        1     4.1
        2     4.6
        3     4.9
        4     5.2
        5     5.4

  ◦ A question of interest: is there a significant relation between
    weights and weeks?

Page 16

Examples of linear models

• A simple linear regression:
  ◦ A first answer is to look at a plot of weights versus weeks.
    This can be achieved using R:

> weeks<-1:5
> weights<-c(4.1,4.6,4.9,5.2,5.4)
> plot(weeks,weights)

Page 17

Examples of linear models

• A simple linear regression:

  [scatter plot of weights versus weeks]

Page 18

Examples of linear models

• A simple linear regression: there is a clear, almost linear,
  increasing trend
  ◦ This could be modeled using a classical linear regression:

      Y(i) = β0 + β1*X(i) + e(i)

    or, using the matrix notation:

      Y = X*β + e   where β' = (β0, β1)

Page 19

Examples of linear models

• A simple linear regression:
  ◦ Computing the estimators b of β.
    • X and y are easy to obtain:

          | 1  X(1) |   | 1  1 |       | y(1) |   | 4.1 |
      X = | 1  X(2) | = | 1  2 | , y = | y(2) | = | 4.6 |
          |    ⋮    |   |  ⋮   |       |  ⋮   |   |  ⋮  |
          | 1  X(5) |   | 1  5 |       | y(5) |   | 5.4 |

      b = (b0, b1)' = (X'*X)⁻¹*X'*y

Page 20

Examples of linear models

• A simple linear regression:
  ◦ Computing the estimators b of β.

> weeks<-1:5
> weights<-c(4.1,4.6,4.9,5.2,5.4)
> Y<-weights
> X<-matrix(c(rep(1,5),weeks),byrow=F,nr=5)
> b<-solve(t(X)%*%X)%*%t(X)%*%Y
> b
     [,1]
[1,] 3.88
[2,] 0.32
> abline(b[1],b[2],col="red")

Page 21

Examples of linear models

• A simple linear regression:
  ◦ Computing the estimators b of β.

  [scatter plot of weights versus weeks with the fitted line]

Page 22

Examples of linear models

• A simple linear regression:
  ◦ Testing a hypothesis: β1 = 0.
    • The hypothesis can be put in the form L*β = c as follows:
      L = (0 1), c = 0
    • Next, we can use these elements in the formula:

      Fq,n-r(X) = [(L*b-c)'*(L*G*L')⁻¹*(L*b-c)/q] / [(y-X*b)'*(y-X*b)/(n-r(X))]

    • Note that q = 1 and r(X) = 2

Page 23

Examples of linear models

• A simple linear regression:
  ◦ Testing a hypothesis: β1 = 0.

> weeks<-1:5
> weights<-c(4.1,4.6,4.9,5.2,5.4)
> Y<-weights
> n<-length(Y)
> X<-matrix(c(rep(1,5),weeks),byrow=F,nr=5)
> G<-solve(t(X)%*%X)
> b<-G%*%t(X)%*%Y
> L<-matrix(c(0,1),nr=1)
> c<-c(0)
> hypo<-L%*%b-c
> numer<-t(hypo)%*%solve(L%*%G%*%t(L))%*%hypo
> denom<-t(Y-X%*%b)%*%(Y-X%*%b)
> F<-(numer/1)/(denom/(n-2))
> pf(F,1,n-2,lower.tail=FALSE)
            [,1]
[1,] 0.001857831

Page 24

Examples of linear models

• A simpler solution to linear regression:
  ◦ Testing a hypothesis: β1 = 0.

> weeks<-1:5
> weights<-c(4.1,4.6,4.9,5.2,5.4)
# 'lm' stands for 'linear models'
> lr<-lm(weights~weeks)
> summary(lr)

Call:
lm(formula = weights ~ weeks)

Residuals:
    1     2     3     4     5
-0.10  0.08  0.06  0.04 -0.08

Page 25

Examples of linear models

• A simpler solution to linear regression (cont'd):
  ◦ Testing a hypothesis: β1 = 0.

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  3.88000    0.10132   38.29 3.92e-05 ***
weeks        0.32000    0.03055   10.47  0.00186 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.09661 on 3 degrees of freedom
Multiple R-squared: 0.9734,  Adjusted R-squared: 0.9645
F-statistic: 109.7 on 1 and 3 DF,  p-value: 0.001858

Page 26

Examples of linear models

• The classical solution for linear regression:
  ◦ Testing a hypothesis: β1 = 0.

> weeks<-1:5
> weights<-c(4.1,4.6,4.9,5.2,5.4)
# b = Σ(X-Xm)*(Y-Ym) / Σ(X-Xm)²
> xm<-mean(weeks)
> x<-weeks-xm
> ym<-mean(weights)
> y<-weights-ym
> b<-sum(x*y)/sum(x**2)
> b
[1] 0.32
> SCR<-b*sum(x*y)
> SCT<-sum(y*y)
> SCE<-SCT-SCR
> dfR<-1
> dfE<-length(weights)-2
> F<-(SCR/dfR)/(SCE/dfE)
> pf(F,dfR,dfE,lower.tail=FALSE)
[1] 0.001857831

Page 27

Examples of linear models

• A simple analysis of variance:
  ◦ As a second example, consider these data, with horses' heart
    rates for three breeds:

      Ardennes   Warm    Half
        106.6   115.4   100.2
        100.8    97.8   102.1
        110.9   120.3    99.6
        114.5    98.2   103.8
        115.9   113.2   100.7
         91.9           107.6
                         95.0

Page 28

Examples of linear models

• A simple analysis of variance:
  ◦ The question of interest here is: is there a link between heart
    rates and breed?
  ◦ This question can be addressed using an ANOVA, i.e. the
    following model:

      y(ij) = µ + α(i) + e(ij), i = 1, …, 3

    or, using the matrix notation:

      y = X*β + e   where β' = (µ, αA, αW, αH)

Page 29

Examples of linear models

• A simple analysis of variance
  ◦ Elements of the model:

          | 1 1 0 0 |       | 106.6 |
      X = | 1 1 0 0 | , y = | 100.8 |
          |    ⋮    |       |   ⋮   |
          | 1 0 0 1 |       |  95.0 |

             | 18  6  5  7 |          | 1894.5 |
      X'*X = |  6  6  0  0 | , X'*y = |  640.6 |
             |  5  0  5  0 |          |  544.9 |
             |  7  0  0  7 |          |  709.0 |

Page 30

Examples of linear models

• A simple analysis of variance
  ◦ Elements of the model (using R)

# Data
> X<-matrix(c(rep(c(1,1,0,0),6),rep(c(1,0,1,0),5),
+ rep(c(1,0,0,1),7)),byrow=TRUE,nr=18)
> Y<-c(106.6,100.8,110.9,114.5,115.9,91.9,115.4,
+ 97.8,120.3,98.2,113.2,100.2,102.1,99.6,103.8,
+ 100.7,107.6,95.0)
# Parameters estimators
> XX<-t(X)%*%X
> XY<-t(X)%*%Y

Page 31

Examples of linear models

• A simple analysis of variance
  ◦ Computing estimators of β
    It is easy to see that the X'X matrix is singular (the first row
    is equal to the sum of the 3 following ones) => use a
    generalized inverse:

> library(MASS)
> G1<-ginv(XX)
> b1<-G1%*%XY
> b1
         [,1]
[1,] 79.25810
[2,] 27.50857
[3,] 29.72190
[4,] 22.02762

Page 32

Examples of linear models

• A simple analysis of variance
  ◦ Computing estimators of β
    Note that another generalized inverse could be obtained « by
    hand », by setting the estimator of µ to 0 (and inverting the
    remaining diagonal):

             | 18  6  5  7 |       | 0   0    0    0  |
      X'*X = |  6  6  0  0 | , G = | 0  1/6   0    0  |
             |  5  0  5  0 |       | 0   0   1/5   0  |
             |  7  0  0  7 |       | 0   0    0   1/7 |

Page 33

Examples of linear models

• A simple analysis of variance
  ◦ Computing estimators of β (using R)

# Another G
> G2<-matrix(rep(0,16),nr=4)
> G2[2,2]<-1/6
> G2[3,3]<-1/5
> G2[4,4]<-1/7
# Check generalized inverse
> XX%*%G2%*%XX
     [,1] [,2] [,3] [,4]
[1,]   18    6    5    7
[2,]    6    6    0    0
[3,]    5    0    5    0
[4,]    7    0    0    7

Page 34

Examples of linear models

• A simple analysis of variance
  ◦ Computing estimators of β (using R) (cont'd)

# Solutions
> b2<-G2%*%XY
> b2
         [,1]
[1,]   0.0000
[2,] 106.7667
[3,] 108.9800
[4,] 101.2857
# i.e. the 3 breed averages
# Observe that:
> b1[1,1]+b1[2,1]
[1] 106.7667
> b2[1,1]+b2[2,1]
[1] 106.7667

Page 35

Examples of linear models

• A simple analysis of variance
  ◦ Testing a first hypothesis:
    • H0: µA = µW = µH
      which can be rewritten as:
      H0: (µA = µW & µA = µH) or (µA - µW = 0 & µA - µH = 0)
    • In terms of the general linear hypothesis, this can be
      written as:

                           | µ  |
      | 0  1  -1   0 |     | αA |   | 0 |
      | 0  1   0  -1 |  *  | αW | = | 0 |
                           | αH |

Page 36

Examples of linear models

• A simple analysis of variance
  ◦ Testing a first hypothesis using R:

# Hypothesis 1
> L<-matrix(c(0,1,-1,0,0,1,0,-1),byrow=TRUE,nr=2)
> q<-2
> n<-18
> rX<-3
> num<-(t(L%*%b1)%*%solve(L%*%G1%*%t(L))%*%L%*%b1)/q
> den<-(t(Y-X%*%b1)%*%(Y-X%*%b1))/(n-rX)
# Test the hypothesis
> F<-num/den
> F
         [,1]
[1,] 1.549397
> pf(F,q,n-rX,lower.tail=FALSE)
          [,1]
[1,] 0.2445187

Page 37

Examples of linear models

• A simple analysis of variance
  ◦ Easily testing the first hypothesis using R:

# Hypothesis 1
> breed<-factor(c(rep("A",6),rep("W",5),rep("H",7)))
> model<-lm(Y~breed)
> summary(model)

Call:
lm(formula = Y ~ breed)

Residuals:
     Min       1Q   Median       3Q      Max
-14.8667  -4.8964   0.3238   5.7907  11.3200

Page 38

Examples of linear models

• A simple analysis of variance
  ◦ Easily testing the first hypothesis using R:

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  106.767      3.225  33.106 1.94e-15 ***
breedW         2.213      4.783   0.463    0.650
breedH        -5.481      4.395  -1.247    0.231
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 7.9 on 15 degrees of freedom
Multiple R-squared: 0.1712,  Adjusted R-squared: 0.06071
F-statistic: 1.549 on 2 and 15 DF,  p-value: 0.2445

Page 39

Examples of linear models

• A simple analysis of variance
  ◦ Testing a second hypothesis using R:

# Hypothesis 2: H0: µ(H)=0.5*(µ(A)+µ(W))
> L<-matrix(c(0,-0.5,-0.5,1),byrow=TRUE,nr=1)
> q<-1
> n<-18
> rX<-3
> num<-(t(L%*%b1)%*%solve(L%*%G1%*%t(L))%*%L%*%b1)/q
> den<-(t(Y-X%*%b1)%*%(Y-X%*%b1))/(n-rX)
# Test the hypothesis
> F<-num/den
> F
         [,1]
[1,] 2.965257
> pf(F,q,n-rX,lower.tail=FALSE)
          [,1]
[1,] 0.1056209

Page 40

Examples of linear models

• A more complex example:
  ◦ As a third example, consider this dataset:

      H  St  Ge  Age  HR        H  St  Ge  Age  HR
      1  T   M    88  68        8  NT  M    64  76
      2  T   M    96  64        9  NT  M    77  75
      3  T   F    90  76       10  NT  F   100  71
      4  T   F    73  71       11  NT  F    75  85
      5  T   M    85  63       12  NT  M    63  81
      6  T   F    99  63       13  NT  M    73  80
      7  T   F    60  67       14  NT  F    67  81
                               15  NT  F    76  83

Page 41

Examples of linear models

• A more complex example:
  ◦ Possible questions are:
    • Is there any effect of training on the heart rate (HR)?
    • Is there an age effect and/or a gender effect on HR?
    • Is the (potential) training effect similar in males and in
      females?
  ◦ Answers:
    • All these questions can be « easily » addressed using a linear
      model…

Page 42

Examples of linear models

• A more complex example:
  ◦ The model:

      y(ijk) = µ + β*a(ijk) + τ(i) + γ(j) + (τγ)(ij) + e(ijk)

  ◦ Using the matrix notation:

      y = X*β + e

    where β' = (µ, β, τT, τNT, γF, γM, τγTF, τγTM, τγNTF, τγNTM)

Page 43

Examples of linear models

• A more complex example:
  ◦ Obtaining X and y:

          | 1  88  1 0  0 1  0 1 0 0 |        | 68 |
          | 1  96  1 0  0 1  0 1 0 0 |        | 64 |
          | 1  90  1 0  1 0  1 0 0 0 |        | 76 |
          | 1  73  1 0  1 0  1 0 0 0 |        | 71 |
          | 1  85  1 0  0 1  0 1 0 0 |        | 63 |
          | 1  99  1 0  1 0  1 0 0 0 |        | 63 |
      X = | 1  60  1 0  1 0  1 0 0 0 | ,  y = | 67 |
          | 1  64  0 1  0 1  0 0 0 1 |        | 76 |
          | 1  77  0 1  0 1  0 0 0 1 |        | 75 |
          | 1 100  0 1  1 0  0 0 1 0 |        | 71 |
          | 1  75  0 1  1 0  0 0 1 0 |        | 85 |
          | 1  63  0 1  0 1  0 0 0 1 |        | 81 |
          | 1  73  0 1  0 1  0 0 0 1 |        | 80 |
          | 1  67  0 1  1 0  0 0 1 0 |        | 81 |
          | 1  76  0 1  1 0  0 0 1 0 |        | 83 |

Page 44

Examples of linear models

• A more complex example:
  ◦ Performing computations (using R)

# Building matrices
> y<-c(68,64,76,71,63,63,67,76,75,71,85,81,80,81,83)
> X<-matrix(rep(0,150),nr=15)
> X[,1]<-rep(1,15)
> X[,2]<-c(88,96,90,73,85,99,60,64,77,
+ 100,75,63,73,67,76)
> X[,3]<-c(rep(1,7),rep(0,8))
> X[,4]<-1-X[,3]
> X[,5]<-c(0,0,1,1,0,1,1,0,0,1,1,0,0,1,1)
> X[,6]<-1-X[,5]
> X[,7]<-c(X[,5][1:7],rep(0,8))
> X[,8]<-c(X[,6][1:7],rep(0,8))
> X[,9]<-c(rep(0,7),X[,5][8:15])
> X[,10]<-c(rep(0,7),X[,6][8:15])

Page 45

Examples of linear models

• A more complex example:
  ◦ Performing computations (using R) (cont'd)

# Computing estimators
> XX<-t(X)%*%X
> Xy<-t(X)%*%y
> G<-ginv(XX)
> b<-G%*%Xy
> b
            [,1]
 [1,] 38.1162191
 [2,] -0.1592766
 [3,] 15.6683053
 [4,] 22.4479138
 [5,] 20.1285345
 [6,] 17.9876846
 [7,]  8.1587098
 [8,]  7.5095955
 [9,] 11.9698247
[10,] 10.4780891

Page 46

Examples of linear models

• A more complex example:
  ◦ Remarks:
    • Other solutions could be obtained, using other generalized
      inverses.
    • Since regression parameters are « estimable », the other
      solutions would give the same solution for β (see notes for an
      example; a quick check is sketched below).
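
One standard estimability criterion states that L*β is estimable
exactly when L = L*G*X'*X. A minimal sketch of this check (assuming
the XX and G matrices computed on the previous slide; the criterion
itself is classical theory, not shown in these notes):

> L<-matrix(c(0,1,rep(0,8)),nr=1)   # picks the regression coefficient
> max(abs(L%*%G%*%XX-L))            # ~0 (up to rounding) => estimable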

Page 47

Examples of linear models

• A more complex example:
  ◦ Testing hypotheses (using R)

# test_H: a generic function to test hypotheses
test_H<-function(X,y,L,c) {
  library(MASS)
  XX<-t(X)%*%X
  G<-ginv(t(X)%*%X)
  b<-G%*%t(X)%*%y
  num<-t(L%*%b-c)%*%solve(L%*%G%*%t(L))%*%(L%*%b-c)
  den<-t(y-X%*%b)%*%(y-X%*%b)
  q<-dim(L)[1]
  n<-length(y)
  rX<-qr(XX)$rank
  F<-(num/q)/(den/(n-rX))
  pF<-pf(F,q,n-rX,lower.tail=FALSE)
  c(F,pF)
}

Page 48

Examples of linear models

• A more complex example:
  ◦ a) Testing the regression coefficient
  ◦ b) Testing the training effect

# a) Test: beta=0
> L<-matrix(c(0,1,rep(0,8)),nr=1)
> c<-matrix(c(0))
> test_H(X,y,L,c)
[1] 2.1324558 0.1749022
# No significant regression

# b) Test: tau(T)-tau(NT)=0
> L<-matrix(c(0,0,1,-1,rep(0,6)),nr=1)
> c<-matrix(c(0))
> test_H(X,y,L,c)
[1] 14.951074581  0.003126119
# Significant training effect

Page 49

Examples of linear models

• A word of caution:
  ◦ When testing the training effect, we actually compare the means
    of the 2 groups (Trained – Not Trained)
  ◦ The raw means embed information on other effects of the model,
    which might not be desirable…
    • This can be shown by replacing the observations by the assumed
      model and averaging over each group (see next slide)

Page 50

Examples of linear models

• A word of caution:

  ȳ(T) = µ + β*ā(T) + τ(T) + (4*γ(F) + 3*γ(M))/7
         + (4*τγ(TF) + 3*τγ(TM))/7 + ē(T)

  ȳ(NT) = µ + β*ā(NT) + τ(NT) + (γ(F) + γ(M))/2
          + (τγ(NTF) + τγ(NTM))/2 + ē(NT)

  ȳ(T) – ȳ(NT) = β*(ā(T) – ā(NT)) + (τ(T) – τ(NT)) + (γ(F) – γ(M))/14
                 + (8*τγ(TF) + 6*τγ(TM) – 7*τγ(NTF) – 7*τγ(NTM))/14
                 + (ē(T) – ē(NT))

Page 51

Examples of linear models

• A word of caution:
  ◦ This complicated expression shows that:
    • Due to the non-balanced nature of the dataset, comparing
      training statuses involves the gender effect
    • The presence of a covariate might induce differences if the
      two groups are not balanced with respect to age
    • The potential interactions between training status and gender
      might render the comparison of statuses meaningless.

Page 52

Examples of linear models

• A possible solution:
  ◦ Use « Least Square Means » (LSM)
    • We first obtain averages (LSM) on subgroups
    • We average these means to obtain marginal LSM
    • Example: LSM(T,F) = ?, LSM(T) = ?

      Heart rates   Trained          Not Trained
      Females       76, 71, 63, 67   71, 85, 81, 83
      Males         68, 64, 63       76, 75, 81, 80

      ȳ(TF) = µ + β*ā + τ(T) + γ(F) + τγ(TF) + ē(TF)

      where ā is conventionally averaged over the whole dataset.

Page 53

Examples of linear models

• A possible solution:
  ◦ Use « Least Square Means » (LSM)
    • We first obtain averages on subgroups

# Compute L for the 4 subgroups
> L_TF<-matrix(c(1,mean(X[,2]),1,0,1,0,1,0,0,0),nr=1)
> L_TM<-matrix(c(1,mean(X[,2]),1,0,0,1,0,1,0,0),nr=1)
> L_NTF<-matrix(c(1,mean(X[,2]),0,1,1,0,0,0,1,0),nr=1)
> L_NTM<-matrix(c(1,mean(X[,2]),0,1,0,1,0,0,0,1),nr=1)
# Compute LSM for the 4 subgroups
> LSM_TF<-L_TF%*%b
> LSM_TM<-L_TM%*%b
> LSM_NTF<-L_NTF%*%b
> LSM_NTM<-L_NTM%*%b

Page 54

Examples of linear models

• A possible solution:
  ◦ Use « Least Square Means » (LSM)
    • We then average to obtain marginal LSM

# Compute LSM for the main effects
> LSM_T<-0.5*(LSM_TF+LSM_TM)
> LSM_NT<-0.5*(LSM_NTF+LSM_NTM)
> LSM_F<-0.5*(LSM_TF+LSM_NTF)
> LSM_M<-0.5*(LSM_TM+LSM_NTM)

Page 55

Examples of linear models

• A possible solution:
  ◦ Use « Least Square Means » (LSM)
    • Finally, since they are linear combinations of the parameters,
      LSM (or differences of LSM) can be tested using the general
      linear hypothesis test given above!
    • Example: let's compare the T & NT groups:

      LSM_T - LSM_NT
        = 0.5*(LSM_TM + LSM_TF - LSM_NTF - LSM_NTM)
        = 0.5*(L_TM + L_TF - L_NTF - L_NTM)*b
        = (0,0,1,-1,0,0,0.5,0.5,-0.5,-0.5)*b

Page 56

Examples of linear models

• A possible solution:
  ◦ Use « Least Square Means » (LSM)
    • (0,0,1,-1,0,0,0.5,0.5,-0.5,-0.5)*b

# Compute difference of LSM between T and NT
> L<-matrix(c(0,0,1,-1,0,0,0.5,0.5,-0.5,-0.5),nr=1)
> c<-0
> test_H(X,y,L,c)
[1] 14.951074581  0.003126119
# Same result as before, showing that the obtained
# solution is corrected for the other effects of the
# model!

Page 57

An (even) more complex situation

• Imagine the following situation:
  ◦ 2 groups of 2 individuals are followed longitudinally, and 3
    measures are taken on each individual at 3 specific times (see
    figure)

Page 58

A more complex situation

• Some questions are:
  ◦ Is there a significant difference in the measures between the
    groups?
  ◦ Is there a significant difference in the measures between the
    times?
    • If yes, for which times?
  ◦ [Are the dynamic behaviours of the 2 groups different?]

Page 59

A more complex situation

• These questions can easily be addressed using linear models, as
  done above.
  ◦ Omitting the interaction for simplicity:

          | 1 1 0 1 0 0 |        |  89.4 |
          | 1 1 0 0 1 0 |        | 106.4 |
          | 1 1 0 0 0 1 |        | 116.3 |
          | 1 1 0 1 0 0 |        | 103.7 |
          | 1 1 0 0 1 0 |        | 113.7 |
      X = | 1 1 0 0 0 1 | ,  y = | 118.0 | ,  β' = (µ, γ1, γ2, τ1, τ2, τ3)
          | 1 0 1 1 0 0 |        |  91.5 |
          | 1 0 1 0 1 0 |        |  89.8 |
          | 1 0 1 0 0 1 |        | 110.6 |
          | 1 0 1 1 0 0 |        |  85.0 |
          | 1 0 1 0 1 0 |        |  88.5 |
          | 1 0 1 0 0 1 |        |  97.2 |

Page 60

A more complex situation

• LM analysis, using R (1):

#
# Observations
#
> y<-c(89.4,106.4,116.3,103.7,113.7,118.0,91.5,89.8,
+ 110.6,85.0,88.5,97.2)
#
# Design matrix
#
> X<-matrix(rep(0,72),nr=12)
> X[,1]<-1
> X[1:6,2]<-1
> X[7:12,3]<-1
> X[c(1,4,7,10),4]<-1
> X[c(2,5,8,11),5]<-1
> X[c(3,6,9,12),6]<-1

Page 61

A more complex situation

• LM analysis, using R (2):

#
# Compute solutions
#
> XX<-t(X)%*%X
> Xy<-t(X)%*%y
> library(MASS)
> b<-ginv(XX)%*%Xy
#
# Test of group effect
#
> L<-matrix(c(0,1,-1,0,0,0),nr=1)
> c<-matrix(c(0),nr=1)
> test_H(X,y,L,c)
[1] 14.89196727  0.00481566
# Significant group effect (p = 0.0048)

Page 62

A more complex situation

• LM analysis, using R (3):

#
# Test of time effect
#
> L<-matrix(c(0,0,0,1,-1,0,0,0,0,0,1,-1),nr=2,byrow=T)
> c<-matrix(c(0,0),nr=2)
> test_H(X,y,L,c)
[1] 8.25934879 0.01133366
# Significant time effect (p = 0.0113)

# Or, equivalently:
> L<-matrix(c(0,0,0,1,-1,0,0,0,0,1,0,-1),nr=2,byrow=T)
> c<-matrix(c(0,0),nr=2)
> test_H(X,y,L,c)
[1] 8.25934879 0.01133366
# Significant time effect (p = 0.0113)

Page 63

A more complex situation

• LM analysis, using R (4):

# Obtain LSM for groups
> L_G1T1<-matrix(c(1,1,0,1,0,0),nr=1)
> L_G1T2<-matrix(c(1,1,0,0,1,0),nr=1)
> L_G1T3<-matrix(c(1,1,0,0,0,1),nr=1)
> L_G2T1<-matrix(c(1,0,1,1,0,0),nr=1)
> L_G2T2<-matrix(c(1,0,1,0,1,0),nr=1)
> L_G2T3<-matrix(c(1,0,1,0,0,1),nr=1)
> LSM_G1T1<-L_G1T1%*%b
> LSM_G1T2<-L_G1T2%*%b
> LSM_G1T3<-L_G1T3%*%b
> LSM_G2T1<-L_G2T1%*%b
> LSM_G2T2<-L_G2T2%*%b
> LSM_G2T3<-L_G2T3%*%b
> LSM_G1<-(LSM_G1T1+LSM_G1T2+LSM_G1T3)/3
> LSM_G2<-(LSM_G2T1+LSM_G2T2+LSM_G2T3)/3

Page 64

A more complex situation

• LM analysis, using R (5):

#
# Show LSM for groups, and difference
#
> c(LSM_G1,LSM_G2,LSM_G1-LSM_G2)
[1] 107.91667  93.76667  14.15000
#
# Test whether the true difference is 0
#
> L_delta<-(L_G1T1+L_G1T2+L_G1T3-L_G2T1-L_G2T2-L_G2T3)/3
> c_delta<-matrix(c(0),nr=1)
> LSM_delta<-L_delta%*%b
#
# Test the difference
#
> test_H(X,y,L_delta,c_delta)
[1] 14.89196727  0.00481566
# Of course identical to the previous group test

Page 65

A more complex situation

• LM analysis, summary:
  ◦ Everything seems fine, but...
    • Independence assumptions have clearly been violated (measures
      taken on the same individual are likely to be correlated)
    • Erroneously assuming independence might:
      • Underestimate the random residual variation (σ²e)
      • Consequently, overestimate effects...
      • And thus, increase false positive rates
  ◦ So, a question of interest is: how can we take these
    correlations into account?

Page 66

A more complex situation

• Idea: use a more general family of linear models, named « mixed
  models », allowing for correlations
  ◦ « Mixed » refers to the simultaneous use of « fixed » and
    « random » effects
    • Fixed: this effect would be the same if we repeated the
      experiment
      • Examples: groups, times
    • Random: this effect is randomly sampled in a population of
      possible levels
      • Example: animals

Page 67

A more complex situation

• Matrix formulation:

      y = X*β + Z*u + e

  ◦ Z = design (or incidence) matrix, linking the random parameters
    to the observations
    • known
  ◦ u = vector of random effects
    • unknown, values to be predicted
    • assumed to be random samples from N(0,I*σ²u)
      • so, var(ui) = σ²u for all i
      • and cov(ui,uj) = 0 for all combinations of i and j ≠ i
        (i.e. individuals are assumed to be un(co)related)
    • σ²u is an unknown parameter, to be estimated

Page 68

A more complex situation

• Matrix formulation (cont'd):

      y = X*β + Z*u + e

  ◦ e = vector of random residuals
    • unknown
    • assumed to be random samples from N(0,I*σ²e)
      • so, var(ei) = σ²e for all i
      • and cov(ei,ej) = 0 for all combinations of i and j ≠ i
        (i.e. residuals are assumed to be un(co)related)
    • σ²e is an unknown parameter, to be estimated
  ◦ Furthermore, we will assume:
    • Cov(ui,ej) = 0 for all i,j

Page 69

A more complex situation

• Variances and covariances
  ◦ u ~ N(0;G) with G = I*σ²u
  ◦ e ~ N(0;R) with R = I*σ²e
  ◦ V = V(y)
      = V(X*β + Z*u + e)
      = V(X*β) + V(Z*u) + V(e) + 2*Cov(X*β,Z*u) + 2*Cov(X*β,e)
        + 2*Cov(Z*u,e)
      = 0 + Z*V(u)*Z' + R + 0 + 0 + 2*Z*Cov(u,e)
      = Z*G*Z' + R + 0
      = Z*G*Z' + R

Page 70

A more complex situation

• Variances and covariances: example
  ◦ Back to our problem...

#
# Random effect design matrix
#
> Z<-matrix(rep(0,48),nr=12)
> Z[c(1:3),1]<-1
> Z[c(4:6),2]<-1
> Z[c(7:9),3]<-1
> Z[c(10:12),4]<-1
#
# Known (co)variance matrices
# Arbitrary values are used to start with
#
> sigma_2_a<-10.0
> sigma_2_e<-20.0
> G<-diag(4)*sigma_2_a  # No correlation between animals
> R<-diag(12)*sigma_2_e # No correlation between residuals

Page 71

A more complex situation

• Variances and covariances: example
  ◦ Back to our problem...

#
# Observations variance-covariance matrix
#
> V<-Z%*%G%*%t(Z)+R
> V
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]   30   10   10    0    0    0    0    0    0     0     0     0
 [2,]   10   30   10    0    0    0    0    0    0     0     0     0
 [3,]   10   10   30    0    0    0    0    0    0     0     0     0
 [4,]    0    0    0   30   10   10    0    0    0     0     0     0
 [5,]    0    0    0   10   30   10    0    0    0     0     0     0
 [6,]    0    0    0   10   10   30    0    0    0     0     0     0
 [7,]    0    0    0    0    0    0   30   10   10     0     0     0
 [8,]    0    0    0    0    0    0   10   30   10     0     0     0
 [9,]    0    0    0    0    0    0   10   10   30     0     0     0
[10,]    0    0    0    0    0    0    0    0    0    30    10    10
[11,]    0    0    0    0    0    0    0    0    0    10    30    10
[12,]    0    0    0    0    0    0    0    0    0    10    10    30

Page 72

A more complex situation

• Variances and covariances: example
  ◦ More generally, in our problem, V is block-diagonal, with one
    3×3 block per animal (and zeros everywhere else):

          | B 0 0 0 |             | σ²u+σ²e   σ²u      σ²u     |
      V = | 0 B 0 0 |   with  B = |  σ²u     σ²u+σ²e   σ²u     |
          | 0 0 B 0 |             |  σ²u      σ²u     σ²u+σ²e  |
          | 0 0 0 B |

Page 73

A more complex situation

• We can see that:
  ◦ Introducing a random individual effect in the model has
    introduced a correlation between observations on the same
    individual
  ◦ The price to be paid:
    • More parameters (u, σ²u)
    • A much more complicated resolution => use of specialized
      software (SAS, AIREML, ...; see also the sketch below)
      • Some details are given in the appendix for an R solution
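
For reference, a packaged mixed-model routine fits this model in a
few lines. A minimal sketch with the nlme package (our addition, not
part of the original analyses; the group/time/animal codings match
the design matrices above, and REML is nlme's default estimation
method):

> library(nlme)
> groupe<-factor(rep(1:2,each=6))
> temps<-factor(rep(1:3,times=4))
> animal<-factor(rep(1:4,each=3))
> d<-data.frame(y,groupe,temps,animal)
> m<-lme(y~groupe+temps,random=~1|animal,data=d)
> anova(m)   # approximate F tests for the fixed effects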

Page 74

A more complex situation

• An alternative solution:
  ◦ Instead of introducing an individual effect in order to
    correlate the observations, correlations can be introduced
    directly in the R matrix
    • No random effect anymore (Z = 0)
    • V = Z*G*Z' + R = R
    • Two parameters (σ1² & σ2²) need to be estimated, with

      R(i,i) = σ1² + σ2²  for all i
      R(i,j) = σ1²        for all i ≠ j (same animal)

Page 75

A more complex situation

• An alternative solution (cont'd):
  ◦ Another equivalent coding: R = K*σ², with

      σ² = σu² + σe², ρ = σu²/(σu² + σe²)
      K(i,i) = 1  for all i
      K(i,j) = ρ  for all i ≠ j (same animal)

  ◦ This correlation structure is referred to as « compound
    symmetry » (CS). It involves only 2 parameters (σ² and ρ); it is
    sketched numerically below.
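
As an illustration, the CS block for the 3 measures of one animal can
be built directly; a minimal sketch, plugging in the REML estimates
σu² ≈ 18.83 and σe² ≈ 26.22 obtained in the appendix:

> sigma_2_u<-18.83
> sigma_2_e<-26.22
> rho<-sigma_2_u/(sigma_2_u+sigma_2_e)
> K<-matrix(rho,3,3); diag(K)<-1
> K*(sigma_2_u+sigma_2_e)   # one animal's 3x3 block of R = V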

Page 76

A more complex situation

• An alternative solution (cont'd):
  ◦ The latter approach is more flexible, because other R structures
    than CS can be introduced
    • For example, it could be expected that measures taken at times
      1 and 3 are less correlated than measures taken at times 1
      and 2
    • A possible structure to model this is:

      σ² = σu² + σe², ρ = σu²/(σu² + σe²)
      K(i,i) = 1        for all i
      K(i,j) = ρ^|i-j|  for all i,j (same animal)

    • This type of structure is named « type 1 auto-regression »
      (AR(1)), and also involves only 2 parameters (σ² and ρ); it is
      contrasted with CS in the sketch below
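
A small sketch contrasting the two structures for the 3 times of one
animal (ρ = 0.4 is an arbitrary illustration value):

> rho<-0.4
> K_cs<-outer(1:3,1:3,function(i,j) ifelse(i==j,1,rho))
> K_ar1<-outer(1:3,1:3,function(i,j) rho^abs(i-j))
> K_ar1   # corr(time 1, time 3) = rho² < rho = corr(time 1, time 2)
     [,1] [,2] [,3]
[1,] 1.00  0.4 0.16
[2,] 0.40  1.0 0.40
[3,] 0.16  0.4 1.00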

Page 77

A word on model selection

• Model selection refers to the procedures used to select the
  « best » model for a given dataset
• Various approaches have been proposed; we'll show one as an
  example
• Although almost-automated procedures exist and are implemented,
  nothing will replace the experimenter's knowledge of the problem
  and sound reasoning...

Page 78

A word on model selection

• We can re-use a previous example to show the rationale:

      H  St  Ge  Age  HR        H  St  Ge  Age  HR
      1  T   M    88  68        8  NT  M    64  76
      2  T   M    96  64        9  NT  M    77  75
      3  T   F    90  76       10  NT  F   100  71
      4  T   F    73  71       11  NT  F    75  85
      5  T   M    85  63       12  NT  M    63  81
      6  T   F    99  63       13  NT  M    73  80
      7  T   F    60  67       14  NT  F    67  81
                               15  NT  F    76  83

Page 79

Possible models

• Several models could be compared to select the « best » one:

      (1) yi = m + ei
      (2) yi = m + b*Ai + ei
      (3) yi = m + b*Ai + Gi + ei
      (4) yi = m + b*Ai + Gi + Si + ei
      (5) yi = m + b*Ai + Gi + Si + Gi*Si + ei
      (6) yi = m + Gi + Si + Gi*Si + ei
      (7) yi = m + Gi + Si + ei
      ...

Page 80

Nested models

• A model is nested within another if it is built with a subset of
  the factors of that model:

      (1) => (2) => (3) => (4) => (5)
      (7) => (6) => (5)

  but, for example:

      (6) ≠> (4)

• Comparing nested (linear) models can be done using an F test (see
  next slide)

Page 81

F test for nested models selection

• Idea: if a new factor does not contribute significantly to the
  model, the extra fit provided by this factor is an estimator of
  the error variance:

      Fq,n-r(Xc) = [(Xc*bc-Xr*br)'*y/q] / [(y-Xc*bc)'*(y-Xc*bc)/(n-r(Xc))]

  where the subscripts c and r denote the complete and reduced
  models, and q = r(Xc) – r(Xr).

• Remark: when models are not nested, other criteria (such as the
  'Akaike Information Criterion' = AIC) must be used; a small AIC
  sketch follows below.
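
For instance, models (2) and (7) above are not nested. A minimal AIC
comparison in R (our addition; it assumes the hr, age and gender
vectors defined in the code on the next page, while the status
vector is built here; lower AIC is better):

> status<-factor(c(rep("T",7),rep("NT",8)))
> m2<-lm(hr~age)            # model (2)
> m7<-lm(hr~gender+status)  # model (7)
> AIC(m2,m7)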

Page 82

F test for nested models selection

• Example: comparing (3) to (2)

# Data (from the table above; gender coded 1 = F)
> hr<-c(68,64,76,71,63,63,67,76,75,71,85,81,80,81,83)
> age<-c(88,96,90,73,85,99,60,64,77,100,75,63,73,67,76)
> gender<-c(0,0,1,1,0,1,1,0,0,1,1,0,0,1,1)
# Reduced model
> Xr<-matrix(rep(0,30),nr=15)
> Xr[,1]<-rep(1,15)
> Xr[,2]<-age
# Complete model
> Xc<-matrix(rep(0,45),nr=15)
> Xc[,1]<-rep(1,15)
> Xc[,2]<-age
> Xc[,3]<-gender
# Solutions
> library(MASS)
> br<-ginv(t(Xr)%*%Xr)%*%t(Xr)%*%hr
> bc<-ginv(t(Xc)%*%Xc)%*%t(Xc)%*%hr
# Test
> q<-1        # r(Xc) - r(Xr)
> n<-15
> rxc<-3      # r(Xc)
> num<-t(Xc%*%bc-Xr%*%br)%*%hr/q
> den<-t(hr-Xc%*%bc)%*%(hr-Xc%*%bc)/(n-rxc)
> pf(num/den,q,n-rxc,lower.tail=FALSE)
          [,1]
[1,] 0.4151881

Page 83

Appendix

Page 84

Appendix: solving mixed models

• Procedure:
  1. Obtain variance components (σ²,ρ,...) estimates using specific
     methods
     • The preferred method today is called REML (REstricted Maximum
       Likelihood)
  2. Obtain solutions using these estimates
     • A practical method is to use the so-called Henderson's mixed
       model equations (MME)
  3. Perform (approximate) testing
     • Use a modified version of the general linear hypothesis test
       described above

Page 85

Appendix: solving mixed models

• Procedure – 1) REML: AI algorithm
  ◦ θ(k+1) = θ(k) + AI⁻¹*SC
    • k = iteration #
    • θ' = (σ²u, σ²e) and θ(0) arbitrary
    • SC = « score vector »
      • SC(1) = 0.5*(y'*P*Z*G*Z'*P*y – trace(Z*G*Z'*P))
      • SC(2) = 0.5*(y'*P*P*y – trace(P))
      • P = « hat matrix » = V⁻¹ – V⁻¹*X*(X'*V⁻¹*X)⁻*X'*V⁻¹
    • AI = « average information matrix »

                 | y'*P*Z*G*Z'*P*Z*G*Z'*P*y   y'*P*P*Z*G*Z'*P*y |
      AI = 0.5 * |                                              |
                 | y'*P*P*Z*G*Z'*P*y          y'*P*P*P*y        |

Page 86

Appendix: solving mixed models

• Procedure – 1) REML: R implementation

#
# REML estimators computation
#
library(MASS)
# Init computations
diff<-1000.0
AI<-matrix(rep(0,4),nr=2)
SC<-matrix(rep(0,2),nr=2)
sigma_2_u<-10.0
sigma_2_e<-20.0
sigma_2<-c(sigma_2_u,sigma_2_e)
# G and R are identity matrices here: the variance
# components enter through V in the loop body
G<-diag(4)
R<-diag(12)
# Loop while estimates differ
while (diff>0.01) {
  # Loop body => see next slides
}

Page 87

Appendix: solving mixed models

• Procedure – 1) REML: R implementation

# Loop body (1)
# Variance of the observations
ZGZ<-Z%*%G%*%t(Z)
V<-(ZGZ)*sigma_2_u+R*sigma_2_e
# P matrix
Vi<-solve(V)
XVi<-t(X)%*%Vi
P<-Vi-t(XVi)%*%(ginv(XVi%*%X)%*%XVi)
# Partial computations
Py<-P%*%y
ZGZP<-ZGZ%*%P
ZGZPy<-ZGZ%*%Py
# Continued on next slide...

Page 88

Appendix: solving mixed models

• Procedure – 1) REML: R implementation

# Loop body (2)
# Traces
trP<-0
trPZGZ<-0
for (i in 1:dim(P)[1]) {
  trP<-trP+P[i,i]
  trPZGZ<-trPZGZ+ZGZP[i,i]
}
# AI matrix
AI[1,1]<-0.5*t(ZGZPy)%*%(P%*%ZGZPy)
AI[1,2]<-0.5*t(Py)%*%(P%*%ZGZPy)
AI[2,2]<-0.5*t(Py)%*%(P%*%Py)
AI[2,1]<-AI[1,2]
# Score vector
SC[1]<-0.5*(t(Py)%*%ZGZPy-trPZGZ)
SC[2]<-0.5*(t(Py)%*%Py-trP)

Page 89

Appendix: solving mixed models

• Procedure – 1) REML: R implementation

# Loop body (3)
# New estimators
new_sigma_2<-sigma_2+solve(AI)%*%SC
new_sigma_2_u<-new_sigma_2[1]
new_sigma_2_e<-new_sigma_2[2]
# Difference
diff<-(sigma_2[1]-new_sigma_2[1])**2
diff<-diff+(sigma_2[2]-new_sigma_2[2])**2
sigma_2<-new_sigma_2
}
sigma_2
         [,1]
[1,] 18.82629
[2,] 26.21528

Page 90

Appendix: solving mixed models

• Procedure – 2) MME: method
  ◦ BLUE (β̂) and BLUP (û) can be obtained using:

      | X'*R⁻¹*X   X'*R⁻¹*Z       |   | β̂ |   | X'*R⁻¹*y |
      |                           | * |    | = |          |
      | Z'*R⁻¹*X   Z'*R⁻¹*Z + G⁻¹ |   | û  |   | Z'*R⁻¹*y |

  ◦ In our case (R = I*σ²e, G = I*σ²u), this can be written:

      | X'*X   X'*Z               |   | β̂ |   | X'*y |
      |                           | * |    | = |      |
      | Z'*X   Z'*Z + I*(σ²e/σ²u) |   | û  |   | Z'*y |

Page 91

Appendix: solving mixed models

• Procedure – 2) MME: R implementation

MMEl<-matrix(rep(0,100),nr=10)
MMEr<-matrix(rep(0,10),nr=10)
MMEl[1:6,1:6]<-t(X)%*%X
MMEl[1:6,7:10]<-t(X)%*%Z
MMEl[7:10,1:6]<-t(Z)%*%X
MMEl[7:10,7:10]<-t(Z)%*%Z+solve(G)*(sigma_2[2]/sigma_2[1])
MMEr[1:6]<-t(X)%*%y
MMEr[7:10]<-t(Z)%*%y
sol<-ginv(MMEl)%*%MMEr

Page 92

Appendix: solving mixed models

• Procedure – 3) Approximate testing
  ◦ The approximation comes from the fact that only estimators of β
    and u are available
  ◦ The denominator degrees of freedom are estimated using various
    methods (see details in the literature), well beyond the scope
    of this text...!
  ◦ The statistic is a simple extension of the method for
    fixed-effects-only models:

      Fq,v = [(L*b̂-c)'*(L*Ĉ*L')⁻¹*(L*b̂-c)]/q

    where b̂' = (β̂', û') and Ĉ is a generalized inverse of the MME
    coefficient matrix, multiplied by the estimate of σ²e.

Page 93

Appendix: solving mixed models

• Procedure – 3) Approximate testing

# Testing group effects
> L<-matrix(c(0,1,-1,rep(0,7)),nr=1)
> Lsol<-L%*%sol
> C_hat<-ginv(MMEl/sigma_2[2])
> LCL<-L%*%(C_hat%*%t(L))
> LCLi<-ginv(LCL)
> Fg<-t(Lsol)%*%(LCLi%*%Lsol)/1
> dfg1<-1
> dfg2<-2 # Cf. Kenward-Roger...
> 1-pf(Fg,dfg1,dfg2)
          [,1]
[1,] 0.1145035

Page 94

Appendix: solving mixed models

• Procedure – 3) Approximate testing

# Testing time effects
> L<-matrix(c(0,0,0,1,-1,0,0,0,0,0,
+             0,0,0,1,0,-1,0,0,0,0),nr=2,byrow=T)
> Lsol<-L%*%sol
> C_hat<-ginv(MMEl/sigma_2[2])
> LCL<-L%*%(C_hat%*%t(L))
> LCLi<-ginv(LCL)
> Ft<-t(Lsol)%*%(LCLi%*%Lsol)/2   # q = 2 independent contrasts
> dft1<-2
> dft2<-6 # Cf. Kenward-Roger...
> 1-pf(Ft,dft1,dft2)
            [,1]
[1,] 0.006966431

Page 95

Appendix: solving mixed models

• The same analyses, with SAS: program

options ls=80;
data phd;
  input groupe temps animal pheno @@;
cards;
1 1 1  89.4  1 1 2 103.7  1 2 1 106.4  1 2 2 113.7
1 3 1 116.3  1 3 2 118.0  2 1 3  91.5  2 1 4  85.0
2 2 3  89.8  2 2 4  88.5  2 3 3 110.6  2 3 4  97.2
;
proc glm;
  class groupe temps;
  model pheno=groupe temps;
  lsmeans groupe /pdiff stderr;
proc mixed;
  class groupe temps animal;
  model pheno=groupe temps / solution;
  repeated /sub=animal type=cs;

Page 96

Appendix: solving mixed models

• The same analyses, with SAS: result (1)

  [SAS listing]