
Plan Day 1

Contents

1 Day 1: Linear Mixed Models (LMM)
    1 Linear Models
    2 Random and fixed effects
    3 Definition and notations
    4 Estimation and algorithms
    5 Model selection

C.G.B. Demetrio An Introduction to Mixed Models in Agriculture


Linear Model

The classical linear model is defined by

Y = Xβ + ε

where

Y is an observable data (response variable) vector

β is a vector of unknown parameters

X is the design matrix (for factors and regressors)

ε is a vector of random errors, with ε ∼ N(0, σ²I)

Then E(Y) = Xβ and Var(Y) = σ²I

The ordinary least-squares estimator of β (the same as the MLE) is

$$\hat{\beta} = (X'X)^{-1}X'Y$$

Disadvantages

too restrictive for most typical data sets

the error structure in real-world experiments is often more complex than Σ = σ²I


Explanatory variables

(the next ten slides are from Demetrio, Mortier and Trottier, 2009)

2 types of explanatory variables:

1 factors
→ interest is in attributing variability in Y to the various categories of the factor

Example: patients classified by gender (M/F) and age group (A/B/C)

$$Y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}, \quad i = 1, 2, \quad j = 1, 2, 3$$

→ parameter values give the impact of the factor's levels on the response variable

factors may be crossed or nested

factors may have main effects and interaction effects
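To make the role of factors in the design matrix concrete, here is a minimal sketch of a 0/1 coding for the gender by age-group example. The layout (one observation per cell) and the helper name `design_row` are assumptions made for illustration only.

```python
# Build an over-parameterised 0/1 design matrix for
# Y_ij = mu + alpha_i + beta_j + e_ij (gender i, age group j).
genders = ["M", "F"]
age_groups = ["A", "B", "C"]

def design_row(gender, age_group):
    """One row of X: intercept, gender indicators, age-group indicators."""
    row = [1]
    row += [1 if gender == g else 0 for g in genders]
    row += [1 if age_group == a else 0 for a in age_groups]
    return row

# Hypothetical layout: one observation per gender x age-group cell.
cells = [(g, a) for g in genders for a in age_groups]
X = [design_row(g, a) for g, a in cells]

for cell, row in zip(cells, X):
    print(cell, row)
```

With this coding the gender columns sum to the intercept column, and so do the age-group columns, so X is not of full rank. In practice a constraint or a reference level is imposed on each factor before the model is fitted.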


2 regressors
→ interest is in attributing variability in Y to changes in the values of a continuous covariable

Example: changes due to weight x

$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$

→ parameter values give the impact of an increase in x on the response variable
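The single-regressor model can be fitted with the closed-form least-squares solution. The sketch below uses made-up weights and responses, and the function name `fit_simple_regression` is hypothetical.

```python
# Least-squares fit of Y_i = b0 + b1 * x_i + e_i using the closed-form solution.
def fit_simple_regression(x, y):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx           # slope: change in Y per unit increase in x
    b0 = y_bar - b1 * x_bar  # intercept
    return b0, b1

# Hypothetical weights (x) and responses (y), made up for illustration.
x = [50.0, 60.0, 70.0, 80.0, 90.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1]
b0, b1 = fit_simple_regression(x, y)
print(b0, b1)
```

The slope `b1` is the quantity the slide refers to: the estimated impact on the response of a one-unit increase in x.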


Terminology :

Multiple Linear Regression/ANOVA/ANCOVA

if matrix X contains only regressors, models are called regression models

if matrix X contains only factors, models are called Analysis of Variance (ANOVA) models (X is a matrix of 1's and 0's)

if matrix X contains both regressors and factors, models are called Analysis of Covariance (ANCOVA) models


Estimation

Let's assume a linear model:

Y = Xβ + ε

Parameters to be estimated are β and σ

In all the following, X is assumed to be of full rank: rank(X) = K

Least squares approach: $\min_{\beta} \|Y - X\beta\|^2$

$$\hat{\beta}_{ls} = (X'X)^{-1}X'Y$$

best linear unbiased estimator of β

$$\hat{\beta}_{ls} \sim N(\beta, \sigma^2 (X'X)^{-1})$$

best quadratic unbiased estimator of σ²

$$\hat{\sigma}^2_{ls} = \frac{1}{n-K}(Y - X\hat{\beta}_{ls})'(Y - X\hat{\beta}_{ls}) \quad \text{and} \quad \hat{\sigma}^2_{ls} \sim \frac{\sigma^2}{n-K}\chi^2(n-K)$$


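Maximum likelihood approach

Likelihood

$$L(\beta, \sigma; y) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{1}{2\sigma^2}(y_i - x_i'\beta)'(y_i - x_i'\beta)\right]$$

Log-likelihood

$$\ell(\beta, \sigma; y) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)$$

Maximum log-likelihood

$$\partial_{\beta,\sigma}\,\ell(\beta, \sigma; y) = 0 \;\Rightarrow\; \hat{\beta}_{ml} = (X'X)^{-1}X'Y, \qquad \hat{\sigma}^2_{ml} = \frac{1}{n}(Y - X\hat{\beta}_{ml})'(Y - X\hat{\beta}_{ml})$$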


$\hat{\beta}_{ls} = \hat{\beta}_{ml}$, unbiased

$$E[\hat{\beta}_{ls}] = E[\hat{\beta}_{ml}] = E[(X'X)^{-1}X'Y] = \beta$$

but $\hat{\sigma}^2_{ls} \neq \hat{\sigma}^2_{ml}$

$\hat{\sigma}^2_{ls}$ is unbiased

it is calculated on the orthogonal space of X

it takes into account the difference between Y and its projection $X\hat{\beta}$ on X, and the loss of degrees of freedom due to the estimation of β

$\hat{\sigma}^2_{ml}$ is biased

joint estimation of σ² and β

it does not take into account the loss of degrees of freedom due to the estimation of β

Note that

$$E[(Y - X\hat{\beta})'(Y - X\hat{\beta})] = E\{Y'[I - X(X'X)^{-1}X']Y\} = (n - K)\sigma^2$$

Then $E(\hat{\sigma}^2_{ls}) = \sigma^2$ and $E(\hat{\sigma}^2_{ml}) = \frac{n-K}{n}\sigma^2$
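The relationship between the two variance estimators can be checked numerically. This is a minimal sketch on made-up data for a simple regression (K = 2); it only illustrates that the ML estimate equals the LS estimate scaled by (n − K)/n.

```python
# Compare the least-squares and maximum-likelihood estimators of sigma^2
# for a simple regression (K = 2 parameters: intercept and slope).
def residual_sum_of_squares(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

# Hypothetical data, made up for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]
n, K = len(x), 2

rss = residual_sum_of_squares(x, y)
sigma2_ls = rss / (n - K)   # divides by the residual degrees of freedom
sigma2_ml = rss / n         # divides by n, hence biased downwards
print(sigma2_ls, sigma2_ml, sigma2_ml / sigma2_ls)
```

The printed ratio equals (n − K)/n, so the downward bias of the ML estimator shrinks as n grows relative to K.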


Definition: The trace of a square matrix is the sum of its diagonal elements.

Definition: The degrees of freedom of a sum of squares is the rank of the idempotent matrix of its quadratic form. That is, the degrees of freedom of Y′AY is given by rank(A).

Lemma: For B idempotent, rank(B) = trace(B).

Lemma: Let c be a scalar and A, B and C be matrices. Then, when the appropriate operations are defined, we have

(i) trace(A) = trace(A′);

(ii) trace(cA) = c trace(A);

(iii) trace(A + B) = trace(A) + trace(B);

(iv) trace(AB) = trace(BA);

(v) trace(ABC) = trace(CAB) = trace(BCA);

(vi) trace(A ⊗ B) = trace(A) trace(B);

(vii) trace(A′A) = 0 if and only if A = 0.


Theorem: Let Y be an n × 1 vector of random variables with

E[Y] = Ψ and Var[Y] = V

where Ψ is an n × 1 vector of expected values and V is an n × n matrix. Let A be an n × n matrix of real values. Then

E(Y′AY) = trace(AV) + Ψ′AΨ
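As a worked illustration, the theorem can be applied to the classical linear model with $A = I - X(X'X)^{-1}X'$, $\Psi = X\beta$ and $V = \sigma^2 I$. Together with the trace lemmas, this sketch recovers the expected residual sum of squares $(n - K)\sigma^2$:

```latex
\begin{align*}
E\{Y'[I - X(X'X)^{-1}X']Y\}
  &= \operatorname{trace}\{[I - X(X'X)^{-1}X']\,\sigma^2 I\}
     + \beta'X'[I - X(X'X)^{-1}X']X\beta \\
  &= \sigma^2\{\operatorname{trace}(I_n) - \operatorname{trace}[X(X'X)^{-1}X']\} + 0 \\
  &= \sigma^2\{n - \operatorname{trace}[(X'X)^{-1}X'X]\} \\
  &= (n - K)\sigma^2
\end{align*}
```

The second term vanishes because $[I - X(X'X)^{-1}X']X = 0$, and the last step uses $\operatorname{trace}(I_K) = K$ for X of full rank K.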


Goodness of fit criterion

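Adjusted R-square

$$R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - x_i'\hat{\beta})^2/(n - K)}{\sum_{i=1}^{n}(y_i - \bar{y})^2/(n - 1)}$$

Akaike's Information Criterion

$$AIC = -2\log L(\hat{\beta}_{ml}, \hat{\sigma}_{ml}; y) + 2K$$

Bayesian Information Criterion

$$BIC = -2\log L(\hat{\beta}_{ml}, \hat{\sigma}_{ml}; y) + K\log(n)$$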
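For a Gaussian linear model both criteria can be computed from the residual sum of squares alone, since at the ML estimates $-2\log L = n\log(2\pi\hat{\sigma}^2_{ml}) + n$. The sketch below follows the slide's convention of penalising K, the number of regression parameters; some software also counts σ² as a parameter. The function name `gaussian_criteria` is hypothetical.

```python
import math

def gaussian_criteria(rss, n, K):
    """Return (AIC, BIC) for a Gaussian linear model.

    rss: residual sum of squares at the ML estimate of beta
    n:   number of observations
    K:   number of regression parameters (columns of X)
    """
    sigma2_ml = rss / n
    # -2 log L at the ML estimates: n*log(2*pi*sigma2_ml) + rss/sigma2_ml,
    # and rss/sigma2_ml is exactly n.
    minus2_loglik = n * math.log(2 * math.pi * sigma2_ml) + n
    aic = minus2_loglik + 2 * K
    bic = minus2_loglik + K * math.log(n)
    return aic, bic

# Hypothetical inputs, chosen only for illustration.
aic, bic = gaussian_criteria(rss=10.0, n=20, K=3)
print(aic, bic)
```

Only differences in AIC or BIC between models fitted to the same data are meaningful; the smaller value indicates the preferred model.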


SAS procedure

proc glm data=data;
  class x; * if x is a factor;
  model y = x;
  output out=Regr p=Predite r=Residu;
run;


Checking

Gaussian hypothesis

Graphical

histogram, QQ-plot,

proc univariate data=Regr;
  var Residu;
  histogram Residu / normal;
  qqplot Residu / normal(mu=est sigma=est color=red L=1);
  inset mean std / cfill=blank format=5.2;
run;

Statistical test

Kolmogorov-Smirnov

proc univariate data=Regr normaltest;
  var Residu;
run;

Homoscedasticity hypothesis

Graphical

residual/predicted

proc gplot data=Regr;
  plot Residu*Predite / vref=0;
run;

Independence hypothesis

Difficult to test !!!


Variable selection in multiple regression

The main approaches

Forward selection, which involves starting with no variables in the model, trying out the variables one by one and including them if they are statistically significant.

Backward elimination, which involves starting with all candidate variables and testing them one by one for statistical significance, deleting any that are not significant.

Methods that are a combination of the above, testing at each stage for variables to be included or excluded.

SAS Reg procedure

proc reg;
  model Y = x / selection=adjrsq bic;
  model Y = x / selection=stepwise;
run;


Linear Mixed Models

Linear mixed effects models have been widely used in the analysis of data where responses are clustered around some random effects, such that there is a natural dependence between observations in the same cluster.

For example, consider repeated measurements taken on each subject in longitudinal data, or observations taken on members of the same family in a genetic study.

They can easily accommodate covariances among observations.

They handle correlated data by incorporating random effects and estimating their associated variance components to model variability over and above the residual error.

Because of the estimation procedures usually involved, mixed-model approaches can circumvent the problems associated with unbalanced and incomplete data.


Maize trial

Example

5 progenies of a population of maize progenies were investigated

the trial was conducted by completely randomizing 4 replicates of each progeny

the response variable was the weight of corn-cob (kg/10 m²)

Progeny   Replicates
1         5.95   6.21   5.40   5.18
2         5.07   6.71   5.46   4.98
3         4.82   5.11   4.68   4.52
4         3.87   4.16   4.11   4.84
5         5.53   5.82   4.29   4.70

At crossing, genetic effects may reasonably be assumed to be normal random variables.

During the early stages of a selection programme, the nature of genotypic effects may still be regarded as random.

In general, the interest is in the heritability of a trait.


Penicillin yield (Brien, 2009)

Example

The effects of four treatments on the yield of penicillin are to be investigated. It is known that corn steep liquor, an important raw material in producing penicillin, is highly variable from one blending of it to another. To ensure that the results of the experiment apply to more than one blend, five blends (blocks) are to be used in the experiment. The trial was conducted using the same blend in four flasks and randomizing the four treatments to these four flasks.

interest, of course, in each particular treatment used

no interest in each particular blend, which depends heavily on the circumstances

blend effect can be viewed as a sample of a random blend effect (levels are chosen at random from an infinite set of blend levels)

interest in estimating the variance of the blend effect as a source of random variation in the data

the four flasks with the same blend share something, which presumably violates the assumption of independence


[Figure: layout of the experiment, with Blends 1 to 5 each split into Flasks 1 to 4]

Blend   Treatment
        A    B    C    D
1       89   88   97   94
2       84   77   92   79
3       81   87   87   85
4       87   92   89   84
5       79   81   80   88


Calf birth weight

Example

In an animal breeding experiment, 20 unrelated cows were subjected to superovulation and artificial insemination. Each group of 4 cows was inseminated with a different sire, with a total of 5 unrelated sires. Out of each mating (combination of dam and sire), three calves were generated and their yearling weights were recorded.

no interest in each particular sire or dam, which depend heavily on the circumstances

sire effect can be viewed as a sample of a random sire effect (levels are chosen at random from an infinite set of sire levels)

dam effect can be viewed as a sample of a random dam effect (levels are chosen at random from an infinite set of dam levels)

interest in estimating the variance of the sire and dam effects as sources of random variation in the data

the three calves with the same parents share something, which presumably violates the assumption of independence


[Figure: nested layout of the experiment, with dams D1 to D4 within sire S1, and so on up to dams D17 to D20 within sire S5]


Fixed vs Random effects

Random effect: A factor will be designated as random if it is considered appropriate to use a probability distribution function to describe the distribution of effects associated with the population set of levels.

influences only the variance of the response variable

infinite set of levels (only a finite subset present), and interest lies more in the variance induced by these levels than in the estimation of the levels themselves

blends in the penicillin example, progenies in the maize trial

Fixed effect: A factor will be designated as fixed if it is considered appropriate to have the effects associated with the population set of levels for the factor differ in an arbitrary manner, rather than being distributed according to a regularly-shaped p.d.f.

influences only the mean of the response variable

finite set of levels, and interest lies in the estimation of each particular level effect

treatments in the penicillin example


In practice

Random if

i. large number of population levels and
ii. random behaviour
iii. occurs in two contrasting kinds of circumstances:

observational studies or designed experiments with hierarchical structure
- School/Class/Student
- Sire/Dam/Calf

designed experiments with different spatial or temporal scales
- longitudinal studies

Fixed if

i. small or large number of population levels and
ii. systematic behaviour

→ Consequence: data collected within each level of the random effect factor are linked to the same realization of a random variable. This introduces dependency between these data.


Type of Models

Fixed-effects model - involves only fixed effects
– to make inferences about those particular levels of the classification factor that were used in the experiment

Random-effects model - involves only random effects
– to make inferences about the population from which these levels were drawn

Mixed model - involves both fixed and random effects


Example

Consider a study related to observations of half-sib families of I unrelated sires.

If the interest is in comparing only the I sires, the following fixed model can be used to represent the data:

$$E(Y_{ij}) = \mu + s_i$$

where $y_{ij}$ represents the phenotypic trait observation of progeny j, j = 1, ..., r, in family i, i = 1, ..., I, µ is a mean, and $s_i$ is a fixed effect common to all animals having sire i.

If the I sires are considered as a sample from a population of sires, the following random model can be used to represent the data:

$$E(Y_{ij} \mid s_i) = \mu + s_i$$

where $s_i$ is a random effect

Two usual assumptions:

1 the $s_i$'s are independently and identically distributed
2 the $s_i$'s have zero mean and the same variance $\sigma^2_s$

$$s_i \sim \text{i.i.d.}(0, \sigma^2_s)$$


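In matrix notation, this model can be expressed as:

$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_I \end{bmatrix} = \begin{bmatrix} 1_r \\ 1_r \\ \vdots \\ 1_r \end{bmatrix}\mu + \begin{bmatrix} 1_r & 0_r & \cdots & 0_r \\ 0_r & 1_r & \cdots & 0_r \\ \vdots & \vdots & \ddots & \vdots \\ 0_r & 0_r & \cdots & 1_r \end{bmatrix} \begin{bmatrix} s_1 \\ s_2 \\ \vdots \\ s_I \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_I \end{bmatrix}$$

where $y_i = [y_{i1}, y_{i2}, \ldots, y_{ir}]'$ represents the vector of observations of the progeny of sire i; $1_r$ and $0_r$ represent r-dimensional column vectors of 1's and 0's, respectively; and $\varepsilon_i = [\varepsilon_{i1}, \varepsilon_{i2}, \ldots, \varepsilon_{ir}]'$ is the vector of residuals associated with the progeny of sire i.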


Simulation

Case 1: Consider the simple model $y_{ij} = \mu + s_i + e_{ij}$, with 3 independent sires and 2 replicates

fix µ = 2

draw a sample of 3 values for $s_i$ from a $N(0, \sigma^2_s)$

draw a sample of 6 values for $e_{ij}$ from a $N(0, \sigma^2)$
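The three steps of Case 1 can be sketched as follows. The slide fixes only µ = 2, so the variance values, the seed and the function name `simulate_sire_model` are assumptions chosen for illustration.

```python
import random

def simulate_sire_model(mu, sigma2_s, sigma2, n_sires=3, n_reps=2, seed=1):
    """Simulate y_ij = mu + s_i + e_ij with independent sires."""
    rng = random.Random(seed)
    # One random effect per sire, drawn from N(0, sigma2_s).
    s = [rng.gauss(0.0, sigma2_s ** 0.5) for _ in range(n_sires)]
    # One residual per observation, drawn from N(0, sigma2).
    y = [[mu + s[i] + rng.gauss(0.0, sigma2 ** 0.5) for _ in range(n_reps)]
         for i in range(n_sires)]
    return s, y

# The variance values below are hypothetical; the slide fixes only mu = 2.
s, y = simulate_sire_model(mu=2.0, sigma2_s=1.0, sigma2=0.5)
for i, row in enumerate(y, start=1):
    print("sire", i, [round(v, 3) for v in row])
```

Observations in the same row share the same draw of $s_i$, which is what makes them correlated.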

Case 2: We could have a more complex covariance structure among the sires (of the type $A\sigma^2_s$, where A is the relationship matrix). The simulation could then be done using the Cholesky decomposition of the matrix A, i.e. $A = DD'$. One draws a vector z of dimension 3 from the normal N(0, I), which can be obtained by sampling each of its elements from a standard normal N(0, 1), and then multiplies z by D and by the square root of $\sigma^2_s$, i.e. the vector s of sires is obtained as $s = D z \sigma_s$.
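A minimal sketch of the Cholesky-based draw described in Case 2 is given below. The relationship matrix A, the value of $\sigma^2_s$ and the function names are hypothetical; the Cholesky factor is computed by hand so that the example needs only the standard library.

```python
import random

def cholesky(a):
    """Lower-triangular D with A = D D' for a symmetric positive-definite A."""
    n = len(a)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = a[i][j] - sum(d[i][k] * d[j][k] for k in range(j))
            d[i][j] = acc ** 0.5 if i == j else acc / d[j][j]
    return d

def simulate_correlated_sires(a, sigma2_s, seed=1):
    """Draw s = D z sigma_s, so that Var(s) = A sigma2_s."""
    rng = random.Random(seed)
    d = cholesky(a)
    n = len(a)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]  # z ~ N(0, I)
    sigma_s = sigma2_s ** 0.5
    return [sigma_s * sum(d[i][k] * z[k] for k in range(n)) for i in range(n)]

# Hypothetical relationship matrix: sires 1 and 2 are related (0.25), sire 3 is unrelated.
A = [[1.0, 0.25, 0.0],
     [0.25, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
D = cholesky(A)
s = simulate_correlated_sires(A, sigma2_s=1.0)
print(s)
```

Since Var(z) = I, the draw has Var(s) = $\sigma^2_s DD' = A\sigma^2_s$, which is the covariance structure the text asks for.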


Advantages of Linear Mixed Models

flexibility of mixed effects models in modelling grouped, or correlated, observations.

models applied to related individuals (as in animal and plant breeding), longitudinal data, spatial statistics, etc.

generalized linear models with mixed effects, as implemented for example by GLIMMIX in SAS,

non-linear mixed effects models (NLINMIX in SAS, for example), such as for growth curves.


Linear Mixed Model

Y = Xβ + Zu + ε

Y is an observable data vector

β is a vector of unknown parameters

u is a vector of unobservable random variables

X and Z are design matrices for the fixed and random effects

ε is a vector of random errors

Generally, it is assumed that u and ε are independent of each other and normally distributed with zero-mean vectors and variance-covariance matrices G and Σ, respectively, i.e.:

$$\begin{bmatrix} u \\ \varepsilon \end{bmatrix} \sim N\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} G & 0 \\ 0 & \Sigma \end{bmatrix}\right)$$

Inferences regarding mixed effects models refer to the estimation of fixed effects, the prediction of random effects, and the estimation of variance and covariance components, which are briefly discussed next.


Estimation of Fixed Effects

Recall that the general linear mixed model is

Y = Xβ + Zu + ε

u ∼ N(0, G)

ε ∼ N(0, Σ)

u and ε independent

Then,

E(Y|u) = Xβ + Zu and Var(Y|u) = Σ

E(Y) = E[E(Y|u)] = E(Xβ + Zu) = Xβ

Var(Y) = Var[E(Y|u)] + E[Var(Y|u)] = Var(Xβ + Zu) + E(Σ) = ZGZ′ + Σ

The implied marginal model is Y ∼ N(Xβ, V), where V = ZGZ′ + Σ

Note that inferences based on the marginal model do not explicitly assume the presence of random effects representing the natural heterogeneity between subjects (case of longitudinal data)
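As a worked illustration, take the balanced sire model introduced earlier, with I sires and r progeny per sire. Assuming $G = \sigma^2_s I_I$ and $\Sigma = \sigma^2 I_{Ir}$ (an assumption about the covariance structure, not something fixed by this slide), the marginal covariance matrix is block diagonal:

```latex
\begin{align*}
Z &= I_I \otimes 1_r, \qquad G = \sigma^2_s I_I, \qquad \Sigma = \sigma^2 I_{Ir} \\
V &= ZGZ' + \Sigma = I_I \otimes (\sigma^2_s 1_r 1_r' + \sigma^2 I_r) \\
\text{for } r = 2:\quad
\sigma^2_s 1_2 1_2' + \sigma^2 I_2 &=
\begin{bmatrix} \sigma^2_s + \sigma^2 & \sigma^2_s \\ \sigma^2_s & \sigma^2_s + \sigma^2 \end{bmatrix}
\end{align*}
```

Two progeny of the same sire therefore have covariance $\sigma^2_s$, while progeny of different sires are uncorrelated.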


Estimation of Fixed Effects

Notation

β: vector of fixed effects (as before)

α: vector of all variance components in G and Σ

θ = (β′, α′)′: vector of all parameters in the marginal model

Marginal likelihood function:

$$L_{ML}(\theta) = (2\pi)^{-n/2}\,|V(\alpha)|^{-1/2} \exp\left[-\frac{1}{2}(Y - X\beta)'V^{-1}(\alpha)(Y - X\beta)\right]$$

If α were known, the MLE of β would be

$$\hat{\beta}(\alpha) = (X'V^{-1}X)^{-1}X'V^{-1}Y \sim N(\beta, (X'V^{-1}X)^{-1})$$


Estimation of Fixed Effects

As G and Σ are generally unknown, an estimate $\hat{V}$ of V is used instead, such that the estimator becomes $\hat{\beta}(\hat{\alpha}) = (X'\hat{V}^{-1}X)^{-1}X'\hat{V}^{-1}Y$.

The variance-covariance matrix of $\hat{\beta}$ is approximated by $(X'\hat{V}^{-1}X)^{-1}$.

Note: $(X'\hat{V}^{-1}X)^{-1}$ is biased downwards as a consequence of ignoring the variability introduced by working with estimates of (co)variance components instead of their true (unknown) parameter values.

Approximate confidence regions and test statistics for estimable functions of the type K′β can be obtained by using the result:

$$\frac{(K'\beta^{0})'\,[K'(X'\hat{V}^{-1}X)^{-}K]^{-1}\,(K'\beta^{0})}{\operatorname{rank}(K)} \approx F_{[\varphi_N, \varphi_D]}$$

where $F_{[\varphi_N, \varphi_D]}$ refers to an F-distribution with $\varphi_N = \operatorname{rank}(K)$ degrees of freedom for the numerator, and $\varphi_D$ degrees of freedom for the denominator, which is generally calculated from the data using, for example, Satterthwaite's approach


Matrix review

X ∼ Nk(µ,Σ)

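Consider the partitions:

$$X = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}, \quad \mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} \quad \text{and} \quad \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}$$

It follows that

$X_1 \sim N(\mu_1, \Sigma_{11})$ and $X_2 \sim N(\mu_2, \Sigma_{22})$ (marginal distributions)

and that

$X_1 | X_2 \sim N(\mu_{1.2}, \Sigma_{11.2})$ and $X_2 | X_1 \sim N(\mu_{2.1}, \Sigma_{22.1})$ (conditional distributions),

where

$$\mu_{1.2} = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(X_2 - \mu_2), \quad \Sigma_{11.2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$$

and

$$\mu_{2.1} = \mu_2 + \Sigma_{21}\Sigma_{11}^{-1}(X_1 - \mu_1), \quad \Sigma_{22.1} = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}.$$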
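In the simplest case, where $X_1$ and $X_2$ are both scalars, the conditional formulas reduce to ordinary arithmetic. The sketch below uses made-up values, and the function name `conditional_normal` is hypothetical.

```python
def conditional_normal(mu1, mu2, s11, s12, s22, x2):
    """Mean and variance of X1 | X2 = x2 for a bivariate normal (scalar blocks)."""
    mean = mu1 + s12 / s22 * (x2 - mu2)
    var = s11 - s12 / s22 * s12   # Sigma_21 = Sigma_12 in the scalar case
    return mean, var

# Hypothetical values, chosen only for illustration.
mean, var = conditional_normal(mu1=0.0, mu2=1.0, s11=4.0, s12=1.0, s22=2.0, x2=3.0)
print(mean, var)
```

Observing $X_2$ shifts the mean of $X_1$ in proportion to the covariance and never increases its variance.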


Prediction of Random Effects

In addition to the estimation of fixed effects, interest is very often also in the prediction of random effects, for example in genetics.

In linear (Gaussian) models such predictions are given by the conditional expectation of u given the data, i.e. E[u|y].

Given the model specifications, the joint distribution of Y and u is:

$$\begin{bmatrix} Y \\ u \end{bmatrix} \sim N\left(\begin{bmatrix} X\beta \\ 0 \end{bmatrix}, \begin{bmatrix} V & ZG \\ GZ' & G \end{bmatrix}\right)$$

From the properties of the multivariate normal distribution, we have

$$\begin{aligned} E[u|y] &= E[u] + \operatorname{Cov}[u, Y']\,\operatorname{Var}^{-1}[Y]\,(y - E[Y]) \\ &= GZ'V^{-1}(y - X\beta) = GZ'(ZGZ' + \Sigma)^{-1}(y - X\beta) \end{aligned}$$

The fixed effects β are typically replaced by their estimates, so that predictions are made based on the following expression:

$$\hat{u} = GZ'(ZGZ' + \Sigma)^{-1}(y - X\hat{\beta})$$


Mixed Model Equations

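The solutions $\hat{\beta}$ and $\hat{u}$ discussed before require $V^{-1}$. As V can be of huge dimensions, especially in plant and animal breeding applications, computing its inverse is generally computationally demanding, if not unfeasible.

However, Henderson (1950) presented the mixed model equations (MME) to estimate β and u simultaneously, without the need for computing $V^{-1}$.

The MME were derived by maximizing, with respect to β and u, the joint density of Y and u, $f(y, u|\beta, G, \Sigma) = f(y|u, \beta, \Sigma)\,f(u|G)$, expressed as:

$$f(y, u|\beta, G, \Sigma) \propto |\Sigma|^{-1/2}|G|^{-1/2}\exp\left[-\frac{1}{2}(y - X\beta - Zu)'\Sigma^{-1}(y - X\beta - Zu) - \frac{1}{2}u'G^{-1}u\right]$$

The logarithm of this function is, up to an additive constant:

$$\begin{aligned} \ell = -\frac{1}{2}\big[&\log|\Sigma| + \log|G| + (y - X\beta - Zu)'\Sigma^{-1}(y - X\beta - Zu) + u'G^{-1}u\big] \\ = -\frac{1}{2}\big[&\log|\Sigma| + \log|G| + y'\Sigma^{-1}y - 2y'\Sigma^{-1}X\beta - 2y'\Sigma^{-1}Zu \\ &+ \beta'X'\Sigma^{-1}X\beta + 2\beta'X'\Sigma^{-1}Zu + u'Z'\Sigma^{-1}Zu + u'G^{-1}u\big] \end{aligned}$$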


Mixed Model Equations

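The derivatives of ℓ with respect to β and u are:

$$\begin{bmatrix} \partial\ell/\partial\beta \\ \partial\ell/\partial u \end{bmatrix} = \begin{bmatrix} X'\Sigma^{-1}y - X'\Sigma^{-1}X\beta - X'\Sigma^{-1}Zu \\ Z'\Sigma^{-1}y - Z'\Sigma^{-1}X\beta - Z'\Sigma^{-1}Zu - G^{-1}u \end{bmatrix}$$

Equating them to zero gives the following system:

$$\begin{bmatrix} X'\Sigma^{-1}X\hat{\beta} + X'\Sigma^{-1}Z\hat{u} \\ Z'\Sigma^{-1}X\hat{\beta} + Z'\Sigma^{-1}Z\hat{u} + G^{-1}\hat{u} \end{bmatrix} = \begin{bmatrix} X'\Sigma^{-1}y \\ Z'\Sigma^{-1}y \end{bmatrix}$$

which can be expressed as:

$$\begin{bmatrix} X'\Sigma^{-1}X & X'\Sigma^{-1}Z \\ Z'\Sigma^{-1}X & Z'\Sigma^{-1}Z + G^{-1} \end{bmatrix} \begin{bmatrix} \hat{\beta} \\ \hat{u} \end{bmatrix} = \begin{bmatrix} X'\Sigma^{-1}y \\ Z'\Sigma^{-1}y \end{bmatrix}$$

known as the mixed model equations (MME).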
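The MME can be illustrated with a small sketch on the maize trial data shown earlier, using the one-way model $y_{ij} = \mu + s_i + e_{ij}$ with $\Sigma = \sigma^2 I$ and $G = \sigma^2_s I$. The variance components below are hypothetical values (they are not estimated here), and the small Gaussian-elimination helper `solve` is written out only to keep the example free of external libraries.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x

# Maize trial data: 5 progenies x 4 replicates.
y = [[5.95, 6.21, 5.40, 5.18],
     [5.07, 6.71, 5.46, 4.98],
     [4.82, 5.11, 4.68, 4.52],
     [3.87, 4.16, 4.11, 4.84],
     [5.53, 5.82, 4.29, 4.70]]
I, r = len(y), len(y[0])

# Hypothetical variance components (NOT estimated here); lam = sigma2 / sigma2_s.
sigma2, sigma2_s = 0.3, 0.3
lam = sigma2 / sigma2_s

# MME multiplied through by sigma2; the unknowns are (mu, s_1, ..., s_I).
lhs = [[float(I * r)] + [float(r)] * I]
for i in range(I):
    lhs.append([float(r)] + [r + lam if k == i else 0.0 for k in range(I)])
rhs = [sum(sum(row) for row in y)] + [sum(row) for row in y]

mu_hat, *s_hat = solve(lhs, rhs)
print(mu_hat, s_hat)
```

In this balanced case the solution for µ is the grand mean, and each predicted effect is the deviation of the progeny mean from the grand mean shrunk by the factor r/(r + λ), with λ = σ²/σ²ₛ. Only a system of size 1 + I is solved, rather than inverting the n × n matrix V.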


BLUE and BLUP

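Using the second part of the MME, we have that:

$$Z'\Sigma^{-1}X\hat{\beta} + (Z'\Sigma^{-1}Z + G^{-1})\hat{u} = Z'\Sigma^{-1}y$$

so that

$$\hat{u} = (Z'\Sigma^{-1}Z + G^{-1})^{-1}Z'\Sigma^{-1}(y - X\hat{\beta})$$

It can be shown that this expression is equivalent to $\hat{u} = GZ'(ZGZ' + \Sigma)^{-1}(y - X\hat{\beta})$ and, more importantly, that $\hat{u}$ is the best linear unbiased predictor (BLUP) of u.

Substituting this result into the first part of the MME, we have that:

$$X'\Sigma^{-1}X\hat{\beta} + X'\Sigma^{-1}Z\hat{u} = X'\Sigma^{-1}y$$

$$X'\Sigma^{-1}X\hat{\beta} + X'\Sigma^{-1}Z(Z'\Sigma^{-1}Z + G^{-1})^{-1}Z'\Sigma^{-1}(y - X\hat{\beta}) = X'\Sigma^{-1}y$$

$$\hat{\beta} = \{X'[\Sigma^{-1} - \Sigma^{-1}Z(Z'\Sigma^{-1}Z + G^{-1})^{-1}Z'\Sigma^{-1}]X\}^{-1}X'[\Sigma^{-1} - \Sigma^{-1}Z(Z'\Sigma^{-1}Z + G^{-1})^{-1}Z'\Sigma^{-1}]y$$

Similarly, it can be shown that this expression is equivalent to $\hat{\beta} = (X'V^{-1}X)^{-1}X'V^{-1}y$, which is the best linear unbiased estimator (BLUE) of β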
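The first equivalence can be verified in a few lines. As a sketch, assuming G and Σ are nonsingular and writing $V = ZGZ' + \Sigma$:

```latex
\begin{align*}
Z'\Sigma^{-1}(ZGZ' + \Sigma) &= Z'\Sigma^{-1}ZGZ' + Z' = (Z'\Sigma^{-1}Z + G^{-1})\,GZ' \\
\Rightarrow\quad (Z'\Sigma^{-1}Z + G^{-1})^{-1}Z'\Sigma^{-1} &= GZ'(ZGZ' + \Sigma)^{-1}
\end{align*}
```

The second line follows by premultiplying by $(Z'\Sigma^{-1}Z + G^{-1})^{-1}$ and postmultiplying by $V^{-1}$. The equivalence for $\hat{\beta}$ rests on the matrix inversion identity $V^{-1} = \Sigma^{-1} - \Sigma^{-1}Z(Z'\Sigma^{-1}Z + G^{-1})^{-1}Z'\Sigma^{-1}$.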


BLUE and BLUP

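It is important to note that $\hat{\beta}$ and $\hat{u}$ require knowledge of G and Σ.

These matrices, however, are rarely known.

This is a problem without an exact solution using classical methods.

The practical approach is to replace G and Σ by their estimates ($\hat{G}$ and $\hat{\Sigma}$) in the MME.

Note that if G and Σ are known, the variance-covariance matrix of the BLUE and of the BLUP prediction errors is the inverse of the MME coefficient matrix:

$$\operatorname{Var}\begin{bmatrix} \hat{\beta} \\ \hat{u} - u \end{bmatrix} = \begin{bmatrix} X'\Sigma^{-1}X & X'\Sigma^{-1}Z \\ Z'\Sigma^{-1}X & Z'\Sigma^{-1}Z + G^{-1} \end{bmatrix}^{-1}$$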


BLUE and BLUP

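If G and Σ are unknown and their values are replaced in the MME by some sort of point estimates $\hat{G}$ and $\hat{\Sigma}$, the new solutions $\tilde{\beta}$ and $\tilde{u}$ of the system:

$$\begin{bmatrix} X'\hat{\Sigma}^{-1}X & X'\hat{\Sigma}^{-1}Z \\ Z'\hat{\Sigma}^{-1}X & Z'\hat{\Sigma}^{-1}Z + \hat{G}^{-1} \end{bmatrix} \begin{bmatrix} \tilde{\beta} \\ \tilde{u} \end{bmatrix} = \begin{bmatrix} X'\hat{\Sigma}^{-1}y \\ Z'\hat{\Sigma}^{-1}y \end{bmatrix}$$

are no longer BLUE and BLUP solutions, as they are not even linear functions of the data y.

It can also be shown that generally:

$$\operatorname{Var}\begin{bmatrix} \tilde{\beta} \\ \tilde{u} - u \end{bmatrix} > \begin{bmatrix} X'\hat{\Sigma}^{-1}X & X'\hat{\Sigma}^{-1}Z \\ Z'\hat{\Sigma}^{-1}X & Z'\hat{\Sigma}^{-1}Z + \hat{G}^{-1} \end{bmatrix}^{-1}$$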


Estimation methods for the variance components

Recall that α is the vector of all variance components in G and Σ

In most cases, α is not known, and needs to be replaced by an estimate $\hat{\alpha}$

Three frequently used estimation methods for α:

Moment method or ANOVA Method (MM)

Maximum likelihood method (ML)

Restricted maximum likelihood method (REML)


Estimation methods for the variance components

Anova Estimation

Fit the model by assuming that the random effects in the model are fixed effects. Obtain the corresponding ANOVA table.

Compute the expected mean squares of the observed mean squares in the ANOVA table under the true assumption about the u's and ε.

Equate the observed mean squares to their expected mean squares and solve the resulting system of equations for each of the variance components.

Use the resulting solutions as the estimates of the variance components.


Estimation methods for the variance components

Example

Consider the data set below, related to observations of half-sib families of I unrelated sires.

Sire:   1        2        ...   I
        y_11     y_21     ...   y_I1
        y_12     y_22     ...   y_I2
        ...      ...      ...   ...
        y_1n_1   y_2n_2   ...   y_In_I

The following model can be used to represent these data:

$$y_{ij} = \mu + s_i + \varepsilon_{ij}$$

where $y_{ij}$ represents the phenotypic trait observation of progeny j ($j = 1, 2, \dots, n_i$) in family i, µ is a mean, $s_i$ is an effect common to all animals having sire i, and $\varepsilon_{ij}$ is a residual term.


Estimation methods for the variance components

Example

The sire effect $s_i$ is equivalent to the transmitting ability (which is equal to one-half of the additive genetic value) of sire i, as one-half of its genes are (randomly) transmitted to each of its $n_i$ progeny.

The residual terms $\varepsilon_{ij}$ refer to additional genetic effects (such as the effect of dams) and environmental components.

It is assumed that $s_i \sim N(0, \sigma_s^2)$ and $\varepsilon_{ij} \sim N(0, \sigma^2)$

The expectation and variance of $Y_{ij}$ are

$$E(Y_{ij}) = \mu \quad\text{and}\quad \mathrm{Var}(Y_{ij}) = \sigma_s^2 + \sigma^2$$


Estimation methods for the variance components

Example

The ANOVA table with expected mean squares is

Source        df       SSq     MSq     E[MSq]
Units         N − 1
  Sire        I − 1    SSSq    SMSq    $\sigma^2 + k\sigma_s^2$
  Residual    N − I    RSSq    RMSq    $\sigma^2$

where

$$k = \frac{1}{I-1}\left(N - \frac{1}{N}\sum_{i=1}^{I} n_i^2\right).$$

The ANOVA (MM) estimators for $\sigma^2$ and $\sigma_s^2$ are

$$\hat{\sigma}^2 = \frac{RSSq}{N-I} \quad\text{and}\quad \hat{\sigma}_s^2 = \frac{SMSq - RMSq}{k} = \frac{1}{k}\left[SMSq - \hat{\sigma}^2\right]$$

In the specific case of balanced data, i.e. the same progeny size for all sires, $n_i = n = N/I$ and the ANOVA estimators become:

$$\hat{\sigma}^2 = RMSq = \frac{RSSq}{I(n-1)} \quad\text{and}\quad \hat{\sigma}_s^2 = \frac{SMSq - RMSq}{n} = \frac{1}{n}\left[\frac{SSSq}{I-1} - \hat{\sigma}^2\right]$$
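The ANOVA (MM) estimators for the sire model can be sketched in base R as follows. This is a minimal illustration, not code from the slides: the function name `anova_varcomp` and the small unbalanced data set are hypothetical, chosen only to show how k is computed when the progeny sizes differ.

```r
# Method-of-moments (ANOVA) estimators for y_ij = mu + s_i + e_ij,
# allowing unequal progeny sizes n_i. Function name is hypothetical.
anova_varcomp <- function(y, sire) {
  sire  <- factor(sire)
  n_i   <- as.vector(table(sire))             # progeny size of each sire
  N     <- length(y)
  I     <- nlevels(sire)
  means <- tapply(y, sire, mean)              # sire means
  SSSq  <- sum(n_i * (means - mean(y))^2)     # between-sire sum of squares
  RSSq  <- sum((y - means[sire])^2)           # within-sire sum of squares
  SMSq  <- SSSq / (I - 1)
  RMSq  <- RSSq / (N - I)
  k     <- (N - sum(n_i^2) / N) / (I - 1)     # coefficient of sigma_s^2 in E[SMSq]
  c(k = k, sigma2 = RMSq, sigma2_s = (SMSq - RMSq) / k)
}

# Hypothetical unbalanced data: 3 sires with 3, 2 and 4 progeny
y    <- c(10, 12, 11, 14, 15, 9, 8, 10, 13)
sire <- c(1, 1, 1, 2, 2, 3, 3, 3, 3)
est  <- anova_varcomp(y, sire)
print(est)

# Balanced check: with n_i = n for all sires, k reduces to n
est_bal <- anova_varcomp(c(1, 2, 3, 4, 5, 6), rep(1:3, each = 2))
```

With balanced data the coefficient k equals the common progeny size n, which is why the balanced formulas divide by n.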


Estimation methods for the variance components

Anova Estimation – Advantages

In general, the ANOVA approach works well for simple models (such as a one-way structure) or balanced data (such as data from designed experiments with no missing data).

The estimators of the variance components are unbiased.

One can often approximate the degrees of freedom corresponding to the estimated standard errors of estimators of estimable functions of the fixed effects by using Satterthwaite's method. For the sire example,

$$\hat{\sigma}_s^2 = \frac{SMSq - RMSq}{k}$$

with $n_s$ degrees of freedom given by

$$n_s = \frac{(SMSq - RMSq)^2}{\dfrac{(SMSq)^2}{I-1} + \dfrac{(RMSq)^2}{N-I}}$$

SAS and R can produce the information needed to perform these analyses.
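The Satterthwaite approximation above can be written as a small base-R helper. This is a sketch under stated assumptions: the function name `satterthwaite_df` is hypothetical, and the inputs are the mean squares and degrees of freedom reported in the maize ANOVA table of this course (1.3770 on 4 df and 0.3212 on 15 df, with k = 4), reused here purely as an illustration.

```r
# Satterthwaite degrees of freedom for a linear combination of mean squares,
# sum(coef * ms). Function name is hypothetical.
satterthwaite_df <- function(ms, df, coef) {
  num <- sum(coef * ms)^2
  den <- sum((coef * ms)^2 / df)
  num / den
}

# Variance component (SMSq - RMSq) / k corresponds to coefficients (1/k, -1/k)
SMSq <- 1.3770; RMSq <- 0.3212    # mean squares from the maize ANOVA table
df_S <- 4;      df_R <- 15        # their degrees of freedom
k    <- 4                         # balanced case: k = number of replicates

ns <- satterthwaite_df(ms = c(SMSq, RMSq), df = c(df_S, df_R), coef = c(1, -1) / k)

# Same quantity written exactly as in the slide formula (k cancels)
ns_direct <- (SMSq - RMSq)^2 / (SMSq^2 / df_S + RMSq^2 / df_R)
print(c(ns = ns, ns_direct = ns_direct))
```

The general form with coefficients makes it clear that the constant k cancels, so the degrees of freedom depend only on the two mean squares and their degrees of freedom.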


Estimation methods for the variance components

Anova Estimation – Disadvantages

It is not suitable for more complex models and data structures, such as those generally found in plant and animal breeding and in longitudinal studies.

There is no unique way in which to form an ANOVA table when the data are not balanced.

The procedure can produce negative estimates of the variance components, which do not make sense.

If some of the expected mean squares of the random effects in the ANOVA table depend on fixed effects, the method cannot be applied. This problem can be avoided by placing all the fixed effects in the model first, followed by the random effects.


Estimation methods for the variance components

A number of methods have been proposed for estimating variance components in more complex scenarios, such as the expected mean squares approach of Henderson (1953) and the minimum norm quadratic unbiased estimation (Rao 1971a, 1971b), but maximum likelihood based methods are currently the most popular ones, especially the restricted (or residual) maximum likelihood (REML) approach, which attempts to correct for the well-known bias in the classical maximum likelihood (ML) estimation of variance components.

These two methods are briefly described next.


Estimation methods for the variance components

Maximum Likelihood Method

Maximum likelihood estimates of the variance components can be obtained by maximizing the log-likelihood $L(\beta, G, \Sigma) = L(\beta, \alpha)$ with respect to each element of G and Σ in α, after replacing β by $\hat{\beta} = (X'V^{-1}X)^{-1}X'V^{-1}y$.

Alternatively, G, Σ, and β can be estimated simultaneously by maximizing their joint log-likelihood with respect to the variance components and the fixed effects. Standard errors can then be obtained from the inverse of the estimated Fisher information matrix. This approach provides an estimator for the variance-covariance matrix of $\hat{\beta}$ which takes into account the extra variability related to the estimation of the variance components.

This means finding the values of $\beta, \sigma_1^2, \sigma_2^2, \dots, \sigma^2$ that maximize the likelihood function over the parameter space.


Example

As a simple example of maximum likelihood estimation of variance components, consider the balanced case (i.e., constant progeny sizes) of the half-sib families data set discussed previously, and the linear model:

$$y_{ij} = \mu + s_i + \varepsilon_{ij}$$

with the same definitions as before, but with the additional assumption of normality for both the sire and the residual effects, i.e.:

$$s_i \sim N(0, \sigma_s^2) \quad\text{and}\quad \varepsilon_{ij} \sim N(0, \sigma^2)$$


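In matrix notation, this model can be expressed as:

$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_I \end{bmatrix} = \begin{bmatrix} 1_n \\ 1_n \\ \vdots \\ 1_n \end{bmatrix} \mu + \begin{bmatrix} 1_n & 0_n & \cdots & 0_n \\ 0_n & 1_n & \cdots & 0_n \\ \vdots & \vdots & \ddots & \vdots \\ 0_n & 0_n & \cdots & 1_n \end{bmatrix} \begin{bmatrix} s_1 \\ s_2 \\ \vdots \\ s_I \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_I \end{bmatrix}$$

where $y_i = [y_{i1}, y_{i2}, \dots, y_{in}]'$ represents the vector of observations on the progeny of sire i; $1_n$ and $0_n$ represent n-dimensional column vectors of 1's and 0's, respectively; and $\varepsilon_i = [\varepsilon_{i1}, \varepsilon_{i2}, \dots, \varepsilon_{in}]'$ is the vector of residuals associated with the progeny of sire i.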


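Then, the vector of observations $y = [y_1', y_2', \dots, y_I']'$ has a multivariate normal distribution with mean vector $1_N\mu$ and variance-covariance matrix given by $V = I_I \otimes (1_n \sigma_s^2 1_n') + I_N\sigma^2 = I_I \otimes J_n\sigma_s^2 + I_N\sigma^2$. The density function (from which the likelihood function is obtained) can be written as:

$$p(y; \mu, \sigma_s^2, \sigma^2) = \frac{1}{(2\pi)^{N/2}\,|I_I \otimes J_n\sigma_s^2 + I_N\sigma^2|^{1/2}} \exp\left[-\frac{1}{2}(y - 1_N\mu)'(I_I \otimes J_n\sigma_s^2 + I_N\sigma^2)^{-1}(y - 1_N\mu)\right]$$

$$= (2\pi)^{-\frac{N}{2}} (\sigma^2)^{-\frac{N-I}{2}} (\sigma^2 + n\sigma_s^2)^{-\frac{I}{2}} \exp\left\{-\frac{1}{2}(y - 1_N\mu)'\left[\frac{1}{\sigma^2} I_N + I_I \otimes J_n\,\frac{1}{n}\left(\frac{1}{\sigma^2 + n\sigma_s^2} - \frac{1}{\sigma^2}\right)\right](y - 1_N\mu)\right\}$$

where $J_n = 1_n 1_n'$ is an (n × n) matrix of 1's, and ⊗ is the Kronecker product.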


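The log-likelihood function can be written, up to an additive constant, as

$$\ell(\mu, \sigma_s^2, \sigma^2) = -\frac{N-I}{2}\log(\sigma^2) - \frac{I}{2}\log(\sigma^2 + n\sigma_s^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{I}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{i.})^2 - \frac{1}{2}\sum_{i=1}^{I}\frac{n(\bar{y}_{i.} - \mu)^2}{\sigma^2 + n\sigma_s^2}$$

By taking the derivatives and setting them to 0, the following solutions are obtained:

$$\hat{\mu} = \bar{y}_{..}, \quad \hat{\sigma}^2 = RMSq = \frac{RSSq}{I(n-1)} \quad\text{and}\quad \hat{\sigma}_s^2 = \frac{1}{n}\left[\frac{SSSq}{I} - \hat{\sigma}^2\right]$$

from which maximum likelihood estimates of the variance components are obtained, except if $\hat{\sigma}_s^2 < 0$, in which case the estimate is set to zero.

Note the difference between the maximum likelihood and the ANOVA estimators of $\sigma_s^2$: the former divides SSSq by I, the latter by I − 1. It is well known that maximum likelihood estimates of variance components are biased downwards, as they do not take into account the degrees of freedom used for estimating the fixed effects.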
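The step from the log-likelihood to these solutions can be spelled out. The following is a derivation sketch, assuming the reparameterization $\lambda = \sigma^2 + n\sigma_s^2$ (the same one used for the REML example), with $RSSq = \sum_i\sum_j (y_{ij} - \bar{y}_{i.})^2$ and $SSSq = n\sum_i(\bar{y}_{i.} - \bar{y}_{..})^2$, and using the identity $\sum_i n(\bar{y}_{i.} - \mu)^2 = SSSq + N(\bar{y}_{..} - \mu)^2$:

```latex
\begin{aligned}
\ell &= \text{const} - \frac{N-I}{2}\log\sigma^2 - \frac{I}{2}\log\lambda
        - \frac{RSSq}{2\sigma^2} - \frac{SSSq + N(\bar{y}_{..} - \mu)^2}{2\lambda}, \\[4pt]
\frac{\partial\ell}{\partial\mu} &= \frac{N(\bar{y}_{..} - \mu)}{\lambda} = 0
        \;\Rightarrow\; \hat{\mu} = \bar{y}_{..}, \\[4pt]
\frac{\partial\ell}{\partial\sigma^2} &= -\frac{N-I}{2\sigma^2} + \frac{RSSq}{2\sigma^4} = 0
        \;\Rightarrow\; \hat{\sigma}^2 = \frac{RSSq}{N-I} = \frac{RSSq}{I(n-1)}, \\[4pt]
\left.\frac{\partial\ell}{\partial\lambda}\right|_{\mu=\hat{\mu}} &= -\frac{I}{2\lambda} + \frac{SSSq}{2\lambda^2} = 0
        \;\Rightarrow\; \hat{\lambda} = \frac{SSSq}{I}, \\[4pt]
\hat{\sigma}_s^2 &= \frac{\hat{\lambda} - \hat{\sigma}^2}{n}
        = \frac{1}{n}\left[\frac{SSSq}{I} - \hat{\sigma}^2\right].
\end{aligned}
```

The last line uses the invariance of maximum likelihood estimators under reparameterization; the divisor I (rather than I − 1) in $\hat{\lambda}$ is the source of the downward bias.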


Some properties of the direct product of matrices

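If $A_r$ and $B_c$ are square matrices of order r and c, respectively,

$$A_r \otimes B_c = \begin{bmatrix} a_{11}B & \cdots & a_{1r}B \\ \vdots & & \vdots \\ a_{r1}B & \cdots & a_{rr}B \end{bmatrix}$$

where ⊗ is called the direct (Kronecker) product operator

In general, $A \otimes B \neq B \otimes A$

If u and v are vectors, then $u' \otimes v = v \otimes u' = vu'$

If $D_{(n)}$ is a diagonal matrix and A is any matrix, then:

$$D \otimes A = d_{11}A \oplus d_{22}A \oplus \dots \oplus d_{nn}A$$

If matrix dimensions are compatible (and, for the inverse, A and B are nonsingular)

$(A \otimes B)(C \otimes D) = AC \otimes BD$

$(\alpha_A A \otimes \alpha_B B) = \alpha_A\alpha_B (A \otimes B)$

$(A \otimes B)^T = A^T \otimes B^T$

$(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}$

$\mathrm{rank}(A \otimes B) = \mathrm{rank}(A)\,\mathrm{rank}(B)$

$\mathrm{tr}(A \otimes B) = \mathrm{tr}(A)\,\mathrm{tr}(B)$

$\det(A_r \otimes B_c) = \det(A)^{c}\det(B)^{r}$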
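These identities are easy to check numerically with base R, where `%x%` is the Kronecker product operator (an alias for `kronecker()`). The matrices below are arbitrary hypothetical examples, chosen nonsingular so that the inverse exists; the helper `same` is introduced only for this sketch.

```r
# Numerical check of the Kronecker product properties listed above
A <- matrix(c(2, 1, 0, 3), nrow = 2)                  # order r = 2, det = 6
B <- matrix(c(1, 2, 0, 0, 1, 1, 1, 0, 2), nrow = 3)   # order c = 3, det = 4
C <- matrix(c(1, 0, 2, 1), nrow = 2)
D <- matrix(c(0, 1, 1, 1, 0, 2, 2, 1, 0), nrow = 3)

# Equality up to floating-point tolerance
same <- function(x, y) isTRUE(all.equal(x, y))

checks <- c(
  noncommutative = !same(A %x% B, B %x% A),
  mixed_product  = same((A %x% B) %*% (C %x% D), (A %*% C) %x% (B %*% D)),
  transpose      = same(t(A %x% B), t(A) %x% t(B)),
  inverse        = same(solve(A %x% B), solve(A) %x% solve(B)),
  rank           = qr(A %x% B)$rank == qr(A)$rank * qr(B)$rank,
  trace          = same(sum(diag(A %x% B)), sum(diag(A)) * sum(diag(B))),
  determinant    = same(det(A %x% B), det(A)^nrow(B) * det(B)^nrow(A))
)
print(checks)
```

Note that the determinant rule uses the orders of the two matrices as exponents: det(A) is raised to the order of B and det(B) to the order of A.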


Estimation methods for the variance components

Maximum Likelihood Method – Disadvantages

Numerically intensive

Solving the likelihood equations requires an iterative process which may or may not converge. Even when it converges, it may converge to a local maximum rather than to the global maximum.

Tends to underestimate the variance components.

Distributional properties are not known except asymptotically.


Estimation methods for the variance components

Restricted Maximum Likelihood Method

Another likelihood-based method for inferring variance components in mixed models is the restricted (or residual) maximum likelihood (REML) approach, which reduces the bias associated with maximum likelihood estimates by taking into account the degrees of freedom used for estimating the fixed effects.

REML estimators of the variance components are found by maximizing that part of the likelihood function that is invariant to the fixed effects in the model.

We have Y = Xβ + Zu + ε. The REML approach for estimation of variance components maximizes the likelihood function of a set of error contrasts Y∗ = LY, where L is a full-rank (n − rank(X)) × n matrix (n being the dimension of Y) whose rows are orthogonal to the columns of the incidence matrix X, that is, LX = 0.


Estimation methods for the variance components

Restricted Maximum Likelihood Method

Then the vector Y∗ follows a multivariate normal distribution with null mean vector and variance-covariance matrix $LVL' = L(ZGZ' + \Sigma)L'$, that is,

$$Y^* \sim N(0,\ \sigma_1^2 LZ_1Z_1'L' + \sigma_2^2 LZ_2Z_2'L' + \dots + \sigma^2 LL')$$

(the last term reduces to $\sigma^2 I$ when the rows of L are orthonormal).

Since the distribution of Y∗ does not depend on β, the likelihood formed from Y∗ depends only on the variance components. The residual likelihood function for the variance components is then:

$$L(\alpha; y) = (2\pi)^{-(n-p)/2}\,|LVL'|^{-1/2} \exp\left\{-\frac{1}{2} Y^{*\prime}(LVL')^{-1}Y^*\right\}$$

where p = rank(X).

The REML estimates of the variance components are those values of $\sigma_1^2, \sigma_2^2, \dots, \sigma^2$ that maximize the restricted likelihood function L(α; y).

Another approach for obtaining the residual likelihood function for the variance components is to integrate the fixed effects out of the 'full' likelihood function, i.e.:

$$L(\alpha; y) = \int L(\beta, \alpha \mid y)\, d\beta$$
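The construction of the error contrasts can be illustrated in base R. This is a sketch under stated assumptions: the design matrix X, the two sets of fixed effects and the simulated errors are all hypothetical, and the QR decomposition is just one convenient way of obtaining a matrix L with LX = 0.

```r
# Build error contrasts Y* = L Y with L X = 0 and check that they do not
# depend on the fixed effects beta. All data here are hypothetical.
set.seed(1)
n <- 8
X <- cbind(1, rep(c(0, 1), each = 4))      # intercept + one two-level factor
p <- qr(X)$rank

# Complete orthonormal basis of R^n; the first p columns span the columns of X
Q <- qr.Q(qr(X), complete = TRUE)
L <- t(Q[, (p + 1):n])                     # (n - p) x n, rows orthogonal to X

e  <- rnorm(n)                             # same random part for both data sets
y1 <- X %*% c(1, 2) + e                    # fixed effects beta = (1, 2)
y2 <- X %*% c(-5, 10) + e                  # very different fixed effects

print(max(abs(L %*% X)))                   # numerically zero
print(cbind(L %*% y1, L %*% y2))           # identical columns
```

Because the rows of this particular L are orthonormal, LL′ is the identity matrix, so a residual term σ²I in the model stays σ²I for the contrasts.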


Example

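Recall the balanced half-sib families data set, and its associated likelihood function:

$$L(\mu, \sigma_s^2, \sigma^2) = (2\pi)^{-\frac{N}{2}} (\sigma^2)^{-\frac{N-I}{2}} (\sigma^2 + n\sigma_s^2)^{-\frac{I}{2}} \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{I}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{i.})^2 - \frac{1}{2}\sum_{i=1}^{I}\frac{n(\bar{y}_{i.} - \mu)^2}{\sigma^2 + n\sigma_s^2}\right]$$

Its residual likelihood is then:

$$L(\sigma_s^2, \sigma^2) = \int L(\mu, \sigma_s^2, \sigma^2)\, d\mu$$

$$= (2\pi)^{-\frac{N}{2}} (\sigma^2)^{-\frac{N-I}{2}} (\sigma^2 + n\sigma_s^2)^{-\frac{I}{2}} \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{I}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{i.})^2\right] \int \exp\left[-\frac{1}{2}\sum_{i=1}^{I}\frac{n(\bar{y}_{i.} - \mu)^2}{\sigma^2 + n\sigma_s^2}\right] d\mu$$

which is equal to:

$$L(\sigma_s^2, \sigma^2) = (2\pi)^{-\frac{N}{2}} (\sigma^2)^{-\frac{N-I}{2}} (\lambda)^{-\frac{I}{2}} \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{I}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{i.})^2\right] \exp\left[-\frac{1}{2\lambda}\sum_{i=1}^{I} n(\bar{y}_{i.} - \hat{\mu})^2\right] \sqrt{\frac{2\pi\lambda}{In}}$$

where $\lambda = \sigma^2 + n\sigma_s^2$ and $\hat{\mu} = \bar{y}_{..}$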


By taking the derivatives with respect to λ and $\sigma^2$, and by using the invariance property of maximum likelihood estimators, the following solutions are obtained:

$$\hat{\sigma}^2 = RMSq = \frac{RSSq}{I(n-1)} \quad\text{and}\quad \hat{\sigma}_s^2 = \frac{1}{n}\left[\frac{SSSq}{I-1} - \hat{\sigma}^2\right]$$

which are the REML estimates of the variance components, except if $\hat{\sigma}_s^2 < 0$, i.e. if $SSSq < \frac{I-1}{I(n-1)}RSSq$, in which case the estimate is set to zero.

As explicit forms of ML and REML estimators are often not available for more complex mixed effects models, ML and REML estimates are generally obtained by iterative approaches such as the expectation-maximization (EM) algorithm and Newton-Raphson-based procedures.
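For the balanced one-way case, an iterative maximization can be compared directly with the closed-form REML solution. The sketch below uses base R's general-purpose optimizer `optim()` with the BFGS method, which is a quasi-Newton stand-in chosen for illustration and not the EM or Newton-Raphson procedures used by mixed-model software. The summary statistics are hypothetical round numbers, not data from the slides.

```r
# Hypothetical balanced summary statistics
I <- 6; n <- 5; N <- I * n
SSSq <- 30     # between-sire sum of squares
RSSq <- 24     # within-sire sum of squares

# Negative restricted log-likelihood (additive constants dropped), with
# theta = (log sigma^2, log lambda) and lambda = sigma^2 + n * sigma_s^2.
# The log scale keeps both parameters positive during the search.
neg_reml <- function(theta) {
  s2  <- exp(theta[1])
  lam <- exp(theta[2])
  0.5 * ((N - I) * log(s2) + (I - 1) * log(lam) + RSSq / s2 + SSSq / lam)
}

fit      <- optim(c(0.5, 0.5), neg_reml, method = "BFGS")
sigma2   <- exp(fit$par[1])
lambda   <- exp(fit$par[2])
sigma2_s <- max(0, (lambda - sigma2) / n)   # truncate at zero if negative

# Closed-form REML solutions for comparison
sigma2_cf   <- RSSq / (I * (n - 1))
sigma2_s_cf <- max(0, (SSSq / (I - 1) - sigma2_cf) / n)

print(rbind(iterative   = c(sigma2 = sigma2, sigma2_s = sigma2_s),
            closed_form = c(sigma2_cf, sigma2_s_cf)))
```

The search is unconstrained in (σ², λ) and the variance component is truncated afterwards, which mirrors the rule of setting a negative estimate to zero.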


Advantages

Less numerically intensive than the Maximum Likelihood Method.

The REML estimates and the ANOVA estimates agree when the data are balanced and all MM estimates of the variance components are non-negative.

REML estimates tend to be less biased than the maximum likelihood estimates.

Disadvantage

The distributional properties of these estimators are not known, except asymptotically.


Example

Yi = Xi


Model selection

Inference for fixed effects β

Wald tests
t- and F-tests
LR tests (with ML, not with REML)

Inference for components of variance α

Wald tests
LR tests (even with REML)
Caution 1: Marginal vs hierarchical model
Caution 2: Boundary problems

To test random terms, REML ratio tests (REMLRTs) are used when the two models being compared are nested.

Akaike (AIC) and Bayesian (BIC) Information Criteria are used when the two models being compared are non-nested.

When the REMLRT is a test of whether a component constrained to be nonnegative is zero, the null distribution is a mixture of χ² distributions (Self & Liang, 1987).
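The effect of the boundary on the p-value can be sketched in base R. The example assumes the simplest case, a test of a single variance component equal to zero, for which the null mixture is commonly taken to be 50:50 between a point mass at zero (χ² with 0 df) and a χ² with 1 df; other situations need other mixtures. The function name and the test statistic are hypothetical.

```r
# P-value for a REML (or ML) ratio test of one variance component = 0,
# assuming a 50:50 mixture of chi-square(0) and chi-square(1) under H0.
# Function name is hypothetical.
boundary_lrt_pvalue <- function(stat) {
  ifelse(stat <= 0, 1, 0.5 * pchisq(stat, df = 1, lower.tail = FALSE))
}

stat    <- 2.71                                      # hypothetical statistic
p_naive <- pchisq(stat, df = 1, lower.tail = FALSE)  # ignores the boundary
p_mix   <- boundary_lrt_pvalue(stat)                 # accounts for the boundary

print(c(naive = p_naive, mixture = p_mix))
```

Ignoring the boundary doubles the p-value in this case, so the naive χ² test with 1 df is conservative.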


Likelihood ratio test for fixed effects

Test model M0 nested (embedded) in M1, with m0 = dim(Vec(X0)) and m1 = dim(Vec(X1)) (m0 < m1).

Let $\ell_{M_0}$ and $\ell_{M_1}$ denote the associated log-likelihoods calculated at the estimated parameter values. Then

$$X^2 = -2\ell_{M_0} + 2\ell_{M_1} \overset{H_0}{\sim} \chi^2_{m_1 - m_0}$$


Goodness of fit criterion

Akaike’s Information Criterion

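$$AIC = -2\log L(\hat{\beta}_{ml}, \hat{\sigma}_{ml}; y) + 2K$$

Bayesian Information Criterion

$$BIC = -2\log L(\hat{\beta}_{ml}, \hat{\sigma}_{ml}; y) + K\log(n)$$

where K is the number of estimated parameters and n the number of observations.

Other criteria use other penalty terms.

Caution: software often reports AIC or BIC calculated at REML-estimated parameter values.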


Software

R functions

lm() – classical linear model

aov() – analysis of variance model

glm() – generalized linear model

gls() – generalized least squares model

gee() – generalized estimating equations (package gee)

lme() – linear mixed models (package nlme)

nlme() – non-linear mixed model (package nlme)

nls() – non-linear regression model (package stats)

lmer() – linear mixed models (package lme4)

ASReml: www.vsni.co.uk/products/asreml
ASReml forum: www.vsni.co.uk/forum
Cookbook: http://uncronopio.org/ASReml


Differences between lme4 and nlme

(B. Venables, 2010, personal communication)

1 With nlme the fixed and random parts of the model are specified using two formulae; in lme4 they are specified in one formula, with the random parts "added on" to the fixed parts.

2 With nlme you have no generalized linear mixed model fitter, though glmmPQL in the MASS library can be used for some GLMMs, and it uses the nlme library. lme4 has a GLMM fitter built in. It allows you to specify families in the glm sense, but not all glm families are supported yet.

3 nlme offers non-linear mixed effect models; lme4 does not and never will.

4 The nlme package allows you to specify variance heterogeneity and correlation patterns; the only way to do this within lme4 is to use a glm family, which is often not what you want to do.

5 The nlme package has a gls function for "generalized least squares". This allows you to make use of the variance heterogeneity and correlation patterns feature even if the model does not contain any random effects. This is handy.


Differences between lme4 and nlme

(B. Venables, 2010, personal communication, cont.)

6 (Probably the most important difference.) nlme is hard to use with crossed random effects, but is very well developed for nested random effects. lme4 is the opposite: it handles crossed random effects well, and using it with nested random effects is still simple enough, but a bit more work than with nlme.

7 nlme uses an older algorithm which struggles with large data sets. lme4 uses a newer algorithm and can handle quite large data sets very quickly. (I think SAS PROC MIXED, though, will handle even bigger ones.)

8 lme4 is, at this stage, relatively under-developed. Some important things are missing.

9 ASREML is wonderful, but it only handles a relatively small set of models (though the most important set, of course).


(C. Brien, 2010, personal communication)

1 ASREML does a wide range of heterogeneous variances and correlations for nested and crossed random effects, although probably not the full range of heterogeneous, nested models that nlme does. ASREML also does GLMMs, similar to glmmPQL. It does not do the non-linear models.

2 ASREML is good for experiments and lme4/nlme are good for large surveys, because that is what they were developed for.


Software

SAS procedures

PROC GLM – general linear model

PROC MIXED – linear mixed model

PROC GENMOD – generalized linear model

PROC GLIMMIX – generalized linear mixed model

PROC NLMIXED – non-linear mixed model


(next three slides are from Demetrio, Mortier and Trottier (2009))

Basic SAS code

1/ proc mixed data=variety.eval;        call procedure and declare data set
2/ class block type dose;               define block, type, dose as factors
3/ model y = type|dose ;                define fixed effects in the model
4/ random block block*dose ;            declare random effects
5/ ods select Tests3 CovParms; run;     output type 3 tests and covariance parameters


1/ proc mixed statement <options>;

DATA= SAS data set. Name of the SAS data set to be used by PROC MIXED. The default is the most recently created data set.

METHOD= estimation method: REML (default) or ML

COVTEST requests asymptotic standard errors and Wald Z-tests for the variance-covariance structure parameter estimates.


3/ MODEL statement <option>;

describes the linear relation between Y and the fixed covariables

S or SOLUTION: output the solution for the fixed effects

DDFM= method used to compute the approximate degrees of freedom: CONTAIN (default), RES, KR, SATTERTH

outpred=Names1: the output data set Names1 contains the predicted values $X\hat{\beta} + Z\hat{u}$, sd, ...

outpredm=Names2: the output data set Names2 contains the predicted values $X\hat{\beta}$, sd, ...

4/ Random statement

random block / Solution;

→ BLUPs and t-tests


Maize trial

Example

5 progenies of a maize population were investigated

the trial was conducted using a completely randomized design with 4 replicates of each progeny

the response variable was the weight of corn-cob (kg/10 m²)

Progenies   Replicates
1           5.95   6.21   5.40   5.18
2           5.07   6.71   5.46   4.98
3           4.82   5.11   4.68   4.52
4           3.87   4.16   4.11   4.84
5           5.53   5.82   4.29   4.70


Completely Randomized Design (CRD)

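For a random-effects CRD, with treatments random, the model is

$$Y_{jk} = \tau_k + \varepsilon_{jk}$$ with $\tau_k$ random and $\varepsilon_{jk}$ random

$\tau_k \sim N(0, \sigma_T^2)$ and $\varepsilon_{jk} \sim N(0, \sigma^2)$

$\tau_k$ and $\varepsilon_{jk}$, $\tau_k$ and $\tau_{k'}$, $\varepsilon_{jk}$ and $\varepsilon_{j'k'}$ ($j \neq j'$ and/or $k \neq k'$) are independent

then

$\mathrm{Var}(Y_{jk}) = \mathrm{Var}(\tau_k + \varepsilon_{jk}) = \sigma^2 + \sigma_T^2$

$\mathrm{Cov}(Y_{jk}, Y_{j'k}) = \mathrm{Cov}(\tau_k + \varepsilon_{jk}, \tau_k + \varepsilon_{j'k}) = \sigma_T^2$ (observations from the same treatment)

$\mathrm{Cov}(Y_{jk}, Y_{jk'}) = \mathrm{Cov}(\tau_k + \varepsilon_{jk}, \tau_{k'} + \varepsilon_{jk'}) = 0$ (observations from different treatments)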


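The variance matrices of the observations for the fixed and random models when r = 2, t = 3 are

$$\mathrm{Var}(Y) = \Sigma = \begin{bmatrix} \sigma^2 & 0 & 0 & 0 & 0 & 0 \\ 0 & \sigma^2 & 0 & 0 & 0 & 0 \\ 0 & 0 & \sigma^2 & 0 & 0 & 0 \\ 0 & 0 & 0 & \sigma^2 & 0 & 0 \\ 0 & 0 & 0 & 0 & \sigma^2 & 0 \\ 0 & 0 & 0 & 0 & 0 & \sigma^2 \end{bmatrix},$$

$$\mathrm{Var}(Y) = ZGZ' + \Sigma = \begin{bmatrix} \sigma^2 + \sigma_T^2 & \sigma_T^2 & 0 & 0 & 0 & 0 \\ \sigma_T^2 & \sigma^2 + \sigma_T^2 & 0 & 0 & 0 & 0 \\ 0 & 0 & \sigma^2 + \sigma_T^2 & \sigma_T^2 & 0 & 0 \\ 0 & 0 & \sigma_T^2 & \sigma^2 + \sigma_T^2 & 0 & 0 \\ 0 & 0 & 0 & 0 & \sigma^2 + \sigma_T^2 & \sigma_T^2 \\ 0 & 0 & 0 & 0 & \sigma_T^2 & \sigma^2 + \sigma_T^2 \end{bmatrix}$$

In this case:

$$Z = \begin{bmatrix} 1_2 & 0_{2\times 1} & 0_{2\times 1} \\ 0_{2\times 1} & 1_2 & 0_{2\times 1} \\ 0_{2\times 1} & 0_{2\times 1} & 1_2 \end{bmatrix}, \quad G = \sigma_T^2 I_3 \quad\text{and}\quad \Sigma = \sigma^2 I_6$$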
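The block structure of Var(Y) can be reproduced in base R from Z, G and Σ. This is a minimal sketch with hypothetical numerical values for the two variance components; observations are assumed to be ordered by treatment, with the replicates of each treatment adjacent, as in the matrices above.

```r
# Build Var(Y) = Z G Z' + Sigma for a random-effects CRD with t = 3, r = 2
t_trt    <- 3
r        <- 2
sigma2   <- 1.0     # hypothetical residual variance
sigma2_T <- 0.5     # hypothetical treatment variance

Z     <- diag(t_trt) %x% matrix(1, nrow = r, ncol = 1)   # I_3 (x) 1_2, 6 x 3
G     <- sigma2_T * diag(t_trt)                          # sigma_T^2 I_3
Sigma <- sigma2 * diag(t_trt * r)                        # sigma^2 I_6
V     <- Z %*% G %*% t(Z) + Sigma

print(V)

# Equivalent compact form: I_3 (x) (sigma_T^2 J_2) + sigma^2 I_6
V_kron <- diag(t_trt) %x% (sigma2_T * matrix(1, r, r)) + Sigma
```

Observations sharing a treatment have covariance equal to the treatment variance, and observations from different treatments are uncorrelated, which gives the 2 × 2 diagonal blocks.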


Expected mean squares for an ANOVA

Let Y be an n × 1 vector of random variables with E[Y] = µ and Var[Y] = V, where µ is an n × 1 vector of expected values and V is an n × n matrix. Let A be an n × n matrix of real numbers. Then

$$E(Y^TAY) = \mathrm{tr}(AV) + \mu^TA\mu$$

For a fixed CRD model, $E(Y) = X_T\tau$ and $V = \sigma^2 I_n$

For a random CRD model, $E(Y) = 1_n\mu$ and $V = \sigma^2 I_n + r\sigma_T^2 M_T$

where $X_T = I_t \otimes 1_r$ and $M_T = X_T(X_T^TX_T)^{-1}X_T^T = r^{-1}\, I_t \otimes J_r$
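The expected mean squares for the random CRD model follow from this quadratic-form result and can be checked numerically in base R. The sketch assumes t = 3 treatments, r = 2 replicates and hypothetical values for the variance components and the mean; the projection matrices are the standard treatment-means and grand-mean projectors, with the treatment and residual quadratic forms built as their differences.

```r
# Check E[MSq] = (tr(QV) + mu' Q mu) / df for the random CRD model
t_trt <- 3; r <- 2; n <- t_trt * r
sigma2 <- 1.0; sigma2_T <- 0.5            # hypothetical variance components

J  <- function(m) matrix(1, m, m)         # m x m matrix of ones
tr <- function(M) sum(diag(M))            # trace

M_T <- (diag(t_trt) %x% J(r)) / r         # projector onto treatment means
M_G <- J(n) / n                           # projector onto the grand mean
M_U <- diag(n)                            # identity (units)

Q_T   <- M_T - M_G                        # treatments quadratic form
Q_Res <- M_U - M_T                        # residual quadratic form

V  <- sigma2 * diag(n) + r * sigma2_T * M_T
mu <- rep(10, n)                          # E(Y) = 1_n * mu, hypothetical mu = 10

E_TMSq <- (tr(Q_T %*% V)   + drop(t(mu) %*% Q_T %*% mu))   / (t_trt - 1)
E_RMSq <- (tr(Q_Res %*% V) + drop(t(mu) %*% Q_Res %*% mu)) / (n - t_trt)

print(c(E_TMSq = E_TMSq, expected_T = sigma2 + r * sigma2_T,
        E_RMSq = E_RMSq, expected_R = sigma2))
```

The mean contributes nothing to either expectation because a constant vector is annihilated by both quadratic-form matrices.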


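The expected mean squares under the fixed and random models are given in the following table

Source         df      SSq             MSq (s²)                  E[MSq] (fixed)            E[MSq] (random)
Units          n − 1   $Y'Q_UY$
  Treatments   t − 1   $Y'Q_TY$        $Y'Q_TY/(t-1)$            $\sigma^2 + q_T(\Psi)$    $\sigma^2 + r\sigma_T^2$
  Residual     n − t   $Y'Q_{URes}Y$   $Y'Q_{URes}Y/(n-t)$       $\sigma^2$                $\sigma^2$

where

$$q_T(\Psi) = \frac{\Psi'Q_T\Psi}{t-1} = \frac{\sum_{k=1}^{t} r(\alpha_k - \bar{\alpha}_.)^2}{t-1}$$

$M_U = (I_t \otimes I_r) = I_{tr}$, $M_G = n^{-1}J_t \otimes J_r = n^{-1}J_n$

$Q_T = M_T - M_G$, $Q_U = M_U - M_G$, $Q_{URes} = M_U - M_T$

$\sigma^2$ and $\sigma_T^2$ are called components of variance

ANOVA estimators:

$$\hat{\sigma}^2 = RMSq, \quad \hat{\sigma}_T^2 = \frac{TMSq - RMSq}{r}$$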


Maize trial

ANOVA table using R

Source       df   SSq      MSq      F        Prob
Plots        19
  Progeny     4   5.5078   1.3770   4.2872   0.01644 *
  Residual   15   4.8177   0.3212

                MM       ML       REML
$\sigma_P^2$    0.2639   0.1951   0.2639
$\sigma^2$      0.3212   0.3212   0.3212
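The three columns of this table can be recovered with base R alone, using the closed-form expressions for balanced one-way data given in the sire example (divisor t − 1 for MM and REML, divisor t for ML). This is a sketch for checking the arithmetic, with hypothetical variable names; it does not replace a fit with mixed-model software.

```r
# Maize trial: 5 progenies, 4 replicates each
yield   <- c(5.95, 6.21, 5.40, 5.18, 5.07, 6.71, 5.46, 4.98, 4.82, 5.11,
             4.68, 4.52, 3.87, 4.16, 4.11, 4.84, 5.53, 5.82, 4.29, 4.70)
progeny <- factor(rep(1:5, each = 4))

tab  <- anova(lm(yield ~ progeny))   # ANOVA treating progeny as fixed
PSSq <- tab[["Sum Sq"]][1]           # progeny sum of squares
RSSq <- tab[["Sum Sq"]][2]           # residual sum of squares

t_p  <- nlevels(progeny)             # number of progenies
r    <- 4                            # replicates per progeny
RMSq <- RSSq / (length(yield) - t_p)

est <- rbind(
  sigma2_P = c(MM   = (PSSq / (t_p - 1) - RMSq) / r,
               ML   = (PSSq / t_p       - RMSq) / r,
               REML = (PSSq / (t_p - 1) - RMSq) / r),
  sigma2   = c(RMSq, RMSq, RMSq)
)
print(round(est, 4))
```

MM and REML coincide here because the data are balanced and the estimate is positive, while ML is smaller because it divides the progeny sum of squares by 5 instead of 4.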


SAS program

data prog;

input Progeny Yield @@;

cards;

1 5.95 3 4.68

1 6.21 3 4.52

1 5.40 4 3.87

1 5.18 4 4.16

2 5.07 4 4.11

2 6.71 4 4.84

2 5.46 5 5.53

2 4.98 5 5.82

3 4.82 5 4.29

3 5.11 5 4.70

;

* Moment Method;

proc glm data=prog;

class Progeny;

model Yield = Progeny;

run;

* Restricted Maximum Likelihood Method;

proc mixed data=prog;

class Progeny;

model Yield = / solution ddfm=sat;

random Progeny / solution ;

run;

* Maximum Likelihood Method;

proc mixed data=prog method=ML;

class Progeny;

model Yield = / solution ddfm=sat;

random Progeny / solution ;

run;


R program

CRDMaize.dat <- data.frame(
  Plots   = factor(1:20),
  Progeny = factor(rep(1:5, each = 4)),
  Yield   = c(5.95, 6.21, 5.40, 5.18, 5.07, 6.71, 5.46, 4.98, 4.82, 5.11,
              4.68, 4.52, 3.87, 4.16, 4.11, 4.84, 5.53, 5.82, 4.29, 4.70))
CRDMaize.dat

# Moment Method
CRDMaize.lm <- lm(Yield ~ Progeny, data = CRDMaize.dat)
anova(CRDMaize.lm)
(1.3770 - 0.3212) / 4              # sigma2_P = (Progeny MSq - Residual MSq) / r
(summary(CRDMaize.lm)$sigma)^2     # sigma2 = Residual MSq

library(nlme)
# Restricted Maximum Likelihood Method
CRDMaize.reml <- lme(Yield ~ 1, random = ~ 1 | Progeny, data = CRDMaize.dat, method = "REML")
summary(CRDMaize.reml)
(summary(CRDMaize.reml)$sigma)^2   # residual variance
random.effects(CRDMaize.reml)      # BLUPs of the progeny effects
CRDMaize.reml$coefficients
coef(CRDMaize.reml)
summary(lm(Yield ~ Progeny - 1, data = CRDMaize.dat))   # progeny means as fixed effects

# Maximum Likelihood Method
CRDMaize.ml <- update(CRDMaize.reml, method = "ML")
summary(CRDMaize.ml)
(summary(CRDMaize.ml)$sigma)^2
random.effects(CRDMaize.ml)
coef(CRDMaize.ml)


Penicillin yield (Brien, 2009)

Example

the effects of four treatments (A, B, C and D) on the yield of penicillin are to be investigated

it was known that corn steep liquor, an important raw material in producing penicillin, is highly variable from one blending to another

to ensure that the results of the experiment apply to more than one blend, several blends are to be used in the experiment

the trial was conducted using the same blend in four flasks and randomizing the four treatments to these four flasks

altogether five blends were utilized

the blends used can be regarded as a sample from a population of blends


Data

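Yield by blend (rows) and treatment (columns):

| Blend | A | B | C | D |
|---|---|---|---|---|
| 1 | 89 | 88 | 97 | 94 |
| 2 | 84 | 77 | 92 | 79 |
| 3 | 81 | 87 | 87 | 85 |
| 4 | 87 | 92 | 89 | 84 |
| 5 | 79 | 81 | 80 | 88 |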


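Randomized Complete Block Design (RCBD)

Considering a randomized complete block design, the model we are supposing is:

$Y_{jk} = \beta_j + \tau_k + \varepsilon_{jk}$, with $\tau_k$ fixed and $\beta_j$ and $\varepsilon_{jk}$ random

$\beta_j \sim N(0, \sigma^2_B)$ and $\varepsilon_{jk} \sim N(0, \sigma^2)$

$\beta_j$ and $\varepsilon_{jk}$, $\beta_j$ and $\beta_{j'}$ ($j \neq j'$), $\varepsilon_{jk}$ and $\varepsilon_{j'k'}$ ($j \neq j'$ and/or $k \neq k'$) are independent

then

$\mathrm{Var}(Y_{jk}) = \mathrm{Var}(\beta_j + \tau_k + \varepsilon_{jk}) = \mathrm{Var}(\beta_j + \varepsilon_{jk}) = \sigma^2 + \sigma^2_B$

$\mathrm{Cov}(Y_{jk}, Y_{j'k}) = \mathrm{Cov}(\beta_j + \tau_k + \varepsilon_{jk},\ \beta_{j'} + \tau_k + \varepsilon_{j'k}) = 0$ (observations from different blocks and the same treatment)

$\mathrm{Cov}(Y_{jk}, Y_{jk'}) = \mathrm{Cov}(\beta_j + \tau_k + \varepsilon_{jk},\ \beta_j + \tau_{k'} + \varepsilon_{jk'}) = \sigma^2_B$ (observations from the same block and different treatments)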


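The variance matrices of the observations when $b = 2$, $t = 3$

Block fixed, treatment fixed

$$
\mathrm{Var}(\mathbf{Y}) = \boldsymbol{\Sigma} =
\begin{bmatrix}
\sigma^2 & 0 & 0 & 0 & 0 & 0 \\
0 & \sigma^2 & 0 & 0 & 0 & 0 \\
0 & 0 & \sigma^2 & 0 & 0 & 0 \\
0 & 0 & 0 & \sigma^2 & 0 & 0 \\
0 & 0 & 0 & 0 & \sigma^2 & 0 \\
0 & 0 & 0 & 0 & 0 & \sigma^2
\end{bmatrix}
$$

Block random, treatment fixed

$$
\mathrm{Var}(\mathbf{Y}) = \mathbf{Z}\mathbf{G}\mathbf{Z}' + \boldsymbol{\Sigma} =
\begin{bmatrix}
\sigma^2 + \sigma^2_B & \sigma^2_B & \sigma^2_B & 0 & 0 & 0 \\
\sigma^2_B & \sigma^2 + \sigma^2_B & \sigma^2_B & 0 & 0 & 0 \\
\sigma^2_B & \sigma^2_B & \sigma^2 + \sigma^2_B & 0 & 0 & 0 \\
0 & 0 & 0 & \sigma^2 + \sigma^2_B & \sigma^2_B & \sigma^2_B \\
0 & 0 & 0 & \sigma^2_B & \sigma^2 + \sigma^2_B & \sigma^2_B \\
0 & 0 & 0 & \sigma^2_B & \sigma^2_B & \sigma^2 + \sigma^2_B
\end{bmatrix}
$$

In this case:

$$
\mathbf{Z} =
\begin{bmatrix}
\mathbf{1}_3 & \mathbf{0}_{3 \times 1} \\
\mathbf{0}_{3 \times 1} & \mathbf{1}_3
\end{bmatrix},
\quad \mathbf{G} = \sigma^2_B \mathbf{I}_2
\quad \text{and} \quad \boldsymbol{\Sigma} = \sigma^2 \mathbf{I}_6
$$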
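The block-random variance matrix can be assembled numerically from Z, G and Σ. The following is a minimal sketch in base R. The variance values are arbitrary illustrative numbers and do not come from the slides, and the names `nb` and `nt` are chosen here only to avoid masking the transpose function `t()`.

```r
nb <- 2          # number of blocks (b)
nt <- 3          # number of treatments (t)
sigma2  <- 1.0   # illustrative residual variance (assumption)
sigma2B <- 0.5   # illustrative block variance (assumption)

# Z maps each of the nb * nt observations to its block
Z     <- kronecker(diag(nb), matrix(1, nt, 1))   # 6 x 2
G     <- sigma2B * diag(nb)                      # Var of the block effects
Sigma <- sigma2 * diag(nb * nt)                  # Var of the residuals

V <- Z %*% G %*% t(Z) + Sigma
V
```

Observations in the same block share the off-diagonal value `sigma2B`, and observations in different blocks have covariance zero, which is the block-diagonal pattern shown above.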


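Treatments and Blocks fixed and Plots random: $E(\mathbf{Y}) = \mathbf{X}_B\boldsymbol{\beta} + \mathbf{X}_T\boldsymbol{\tau}$ and $\mathrm{Var}(\mathbf{Y}) = \mathbf{V}_Y = \sigma^2 \mathbf{I}_n$

Treatments fixed, Blocks and Plots random: $E(\mathbf{Y}) = \mathbf{X}_T\boldsymbol{\tau}$ and $\mathrm{Var}(\mathbf{Y}) = \mathbf{V}_Y = \sigma^2 \mathbf{M}_U + t\sigma^2_B \mathbf{M}_B$

For Blocks and Plots random, the analysis of variance table is

| Source | df | SSq | E(MSq) |
|---|---|---|---|
| Blocks | $b - 1$ | $\mathbf{Y}'\mathbf{Q}_B\mathbf{Y}$ | $\sigma^2 + t\sigma^2_B$ |
| Units[Blocks] | $b(t - 1)$ | $\mathbf{Y}'\mathbf{Q}_U\mathbf{Y}$ | |
| Treatments | $t - 1$ | $\mathbf{Y}'\mathbf{Q}_T\mathbf{Y}$ | $\sigma^2 + q_T(\boldsymbol{\Psi})$ |
| Residual | $(b - 1)(t - 1)$ | $\mathbf{Y}'\mathbf{Q}_{U_{Res}}\mathbf{Y}$ | $\sigma^2$ |

ANOVA estimators:

$$
\hat\sigma^2 = \mathrm{RMSq}, \qquad \hat\sigma^2_B = \frac{\mathrm{BMSq} - \mathrm{RMSq}}{t}
$$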


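where

$$
\begin{aligned}
\mathbf{X}_B &= \mathbf{I}_b \otimes \mathbf{1}_t, \qquad \mathbf{X}_T = \mathbf{1}_b \otimes \mathbf{I}_t \\
\mathbf{M}_U &= \mathbf{I}_b \otimes \mathbf{I}_t = \mathbf{I}_{bt} \\
\mathbf{M}_G &= n^{-1}\, \mathbf{J}_b \otimes \mathbf{J}_t = n^{-1} \mathbf{J}_n \\
\mathbf{M}_B &= \mathbf{X}_B(\mathbf{X}_B'\mathbf{X}_B)^{-1}\mathbf{X}_B' = t^{-1}\, \mathbf{I}_b \otimes \mathbf{J}_t \\
\mathbf{Q}_B &= \mathbf{M}_B - \mathbf{M}_G \\
\mathbf{M}_T &= \mathbf{X}_T(\mathbf{X}_T'\mathbf{X}_T)^{-1}\mathbf{X}_T' = b^{-1}\, \mathbf{J}_b \otimes \mathbf{I}_t \\
\mathbf{Q}_T &= \mathbf{M}_T - \mathbf{M}_G \\
\mathbf{Q}_U &= \mathbf{M}_U - \mathbf{M}_B \\
\mathbf{Q}_{U_{Res}} &= \mathbf{M}_U - \mathbf{M}_B - \mathbf{M}_T + \mathbf{M}_G \\
q_T(\boldsymbol{\Psi}) &= \frac{\boldsymbol{\Psi}'\mathbf{Q}_T\boldsymbol{\Psi}}{t - 1} = \frac{\sum_{k=1}^{t} b\,(\tau_k - \bar\tau_{.})^2}{t - 1}
\end{aligned}
$$

Here $\mathbf{Q}_U = \mathbf{M}_U - \mathbf{M}_B$ is the projector with $b(t - 1)$ degrees of freedom, so that $\mathbf{Q}_{U_{Res}} = \mathbf{Q}_U - \mathbf{Q}_T$.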
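The projection matrices above can be built directly with Kronecker products. The sketch below is an illustration in base R, not the author's code: it applies them to the penicillin yields from the data slide, with observations ordered blend by blend and treatments A to D within each blend. The helper names `J` and `ssq` are hypothetical.

```r
# Penicillin yields, blend by blend (treatments A, B, C, D within each blend)
y <- c(89, 88, 97, 94,  84, 77, 92, 79,  81, 87, 87, 85,
       87, 92, 89, 84,  79, 81, 80, 88)
nb <- 5; nt <- 4; n <- nb * nt

J <- function(k) matrix(1, k, k)            # k x k matrix of ones

MU <- diag(n)                               # identity (units)
MG <- kronecker(J(nb), J(nt)) / n           # grand-mean projector
MB <- kronecker(diag(nb), J(nt)) / nt       # block-mean projector
MT <- kronecker(J(nb), diag(nt)) / nb       # treatment-mean projector

QB    <- MB - MG
QT    <- MT - MG
QURes <- MU - MB - MT + MG

# Sum of squares as the quadratic form Y'QY
ssq <- function(Q) drop(t(y) %*% Q %*% y)
c(Blocks = ssq(QB), Treatments = ssq(QT), Residual = ssq(QURes))
# Blocks 264, Treatments 70, Residual 226 (up to floating-point rounding)
```

The three quadratic forms reproduce the sums of squares that `anova()` reports for the same data, which is a quick way to check the Kronecker definitions.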


Penicillin yield

ANOVA table using R

| Source | df | SSq | MSq | F | Prob |
|---|---|---|---|---|---|
| Blend | 4 | 264.0 | 66.0 | 1.97 | 0.15 |
| Plots[Blocks] | 15 | | | | |
| Treat | 3 | 70.0 | 23.3 | 1.24 | 0.34 |
| Residual | 12 | 226.0 | 18.8 | | |

| Estimate | MM | ML | REML |
|---|---|---|---|
| $\hat\sigma^2_B$ | 11.8 | 9.4 | 11.8 |
| $\hat\sigma^2$ | 18.8 | 15.1 | 18.8 |
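The MM and ML columns can be recovered from the sums of squares in the ANOVA table. The sketch below is a hand check under an assumption that is not stated on the slides: for balanced data with positive estimates, ML divides each stratum sum of squares by the full stratum dimension (b for blends, b(t - 1) for plots within blends) instead of the residual degrees of freedom. Variable names are illustrative.

```r
nb <- 5            # blends (blocks)
nt <- 4            # treatments per blend
ss_blend <- 264.0  # Blend sum of squares from the ANOVA table
ss_resid <- 226.0  # Residual sum of squares from the ANOVA table

ms_blend <- ss_blend / (nb - 1)
ms_resid <- ss_resid / ((nb - 1) * (nt - 1))

# Method of moments
mm <- c(sigma2B = (ms_blend - ms_resid) / nt, sigma2 = ms_resid)

# ML for balanced data (assumed closed form, valid when estimates are positive)
ml_sigma2 <- ss_resid / (nb * (nt - 1))
ml <- c(sigma2B = (ss_blend / nb - ml_sigma2) / nt, sigma2 = ml_sigma2)

round(rbind(MM = mm, ML = ml), 1)
```

Both ML estimates are smaller than their MM counterparts because ML does not subtract the degrees of freedom used to estimate the fixed effects. REML does, and for these balanced data it agrees with the moment estimates.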


SAS program

data pen;

input Blend Treat$ Yield @@;

cards;

1 A 89 3 C 87

1 B 88 3 D 85

1 C 97 4 A 87

1 D 94 4 B 92

2 A 84 4 C 89

2 B 77 4 D 84

2 C 92 5 A 79

2 D 79 5 B 81

3 A 81 5 C 80

3 B 87 5 D 88

;

* Moment Method;

proc glm data=pen;

class Blend Treat;

model Yield = Blend Treat;

run;

* Restricted Maximum Likelihood Method;

proc mixed data=pen;

class Blend Treat;

model Yield = Treat / solution ddfm=sat;

random Blend / solution ;

run;

* Maximum Likelihood Method;

proc mixed data=pen method=ML;

class Blend Treat;

model Yield = Treat / solution ddfm=sat;

random Blend / solution ;

run;


R program

# set up data.frame with factors Blend, Flask and Treat and response variable Yield

RCBDPen.dat <- data.frame(Blend = factor(rep(1:5, each = 4)),

Flask = factor(rep(1:4, times = 5)),

Treat = factor(rep(c("A","B","C","D"), times = 5)))

RCBDPen.dat$Yield <- c(89,88,97,94,84,77,92,79,81,87,87,85,87,92,89,84,79,81,80,88)

RCBDPen.dat

# Moment Method

RCBDPen.lm <- lm(Yield ~ Blend + Treat, RCBDPen.dat)

anova(RCBDPen.lm)

(66.000 - 18.833)/4 # sigmaB^2_hat = (Blend MSq - Residual MSq)/t

anova(lm(Yield ~ 1, RCBDPen.dat)) # to get the Total SS

# Restricted Maximum Likelihood Method

require(nlme)

RCBD.reml <- lme(Yield ~ Treat, random = ~1|Blend, RCBDPen.dat, method="REML")

summary(RCBD.reml, corr = FALSE)

# Maximum Likelihood Method

RCBD.ml <- lme(Yield ~ Treat, random = ~1|Blend, RCBDPen.dat, method="ML")

summary(RCBD.ml, corr = FALSE)


Calf birth weight

Example

In an animal breeding experiment 20 unrelated cows were subjected to superovulation and artificial insemination. Each group of 4 cows was inseminated with a different sire, with a total of 5 unrelated sires. Out of each mating (combination of dam and sire), three calves were generated and their yearling weights were recorded.

The following model can be used to represent these data:

$y_{ijk} = \mu + s_i + d_{ij} + \varepsilon_{ijk}$

where $y_{ijk}$ represents the observed weight of calf $k$ ($k = 1, 2, 3$) in family $ij$, $\mu$ is a mean, $s_i$ is an effect common to all animals having sire $i$, $d_{ij}$ is an effect common to all animals having dam $j$ crossed with sire $i$, and $\varepsilon_{ijk}$ is a residual term.


Hierarchical classification model

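Considering a general hierarchical classification model, the model we are supposing is:

$y_{ijk} = \mu + s_i + d_{ij} + \varepsilon_{ijk}$, with $s_i$, $d_{ij}$ and $\varepsilon_{ijk}$ random effects

$s_i \sim N(0, \sigma^2_s)$, $d_{ij} \sim N(0, \sigma^2_d)$ and $\varepsilon_{ijk} \sim N(0, \sigma^2)$

$s_i$, $d_{ij}$ and $\varepsilon_{ijk}$; $s_i$ and $s_{i'}$ ($i \neq i'$); $d_{ij}$ and $d_{i'j'}$ ($i \neq i'$ and/or $j \neq j'$); $\varepsilon_{ijk}$ and $\varepsilon_{i'j'k'}$ ($i \neq i'$, $j \neq j'$ and/or $k \neq k'$) are independent

then

$\mathrm{Var}(Y_{ijk}) = \mathrm{Var}(\mu + s_i + d_{ij} + \varepsilon_{ijk}) = \mathrm{Var}(s_i + d_{ij} + \varepsilon_{ijk}) = \sigma^2 + \sigma^2_s + \sigma^2_d$

$\mathrm{Cov}(Y_{ijk}, Y_{ijk'}) = \mathrm{Cov}(\mu + s_i + d_{ij} + \varepsilon_{ijk},\ \mu + s_i + d_{ij} + \varepsilon_{ijk'}) = \sigma^2_s + \sigma^2_d$ (observations from the same sire and the same dam)

$\mathrm{Cov}(Y_{ijk}, Y_{ij'k'}) = \mathrm{Cov}(\mu + s_i + d_{ij} + \varepsilon_{ijk},\ \mu + s_i + d_{ij'} + \varepsilon_{ij'k'}) = \sigma^2_s$ (observations from the same sire and different dams)

$\mathrm{Cov}(Y_{ijk}, Y_{i'j'k'}) = \mathrm{Cov}(\mu + s_i + d_{ij} + \varepsilon_{ijk},\ \mu + s_{i'} + d_{i'j'} + \varepsilon_{i'j'k'}) = 0$ (observations from different sires and different dams)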


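Considering 2 sires / 2 dams / 2 calves, the variance matrix of the observations is

$$
\mathrm{Var}(\mathbf{Y}) = \mathbf{Z}\mathbf{G}\mathbf{Z}' + \boldsymbol{\Sigma}
= \mathbf{Z}_1\mathbf{G}_1\mathbf{Z}_1' + \mathbf{Z}_2\mathbf{G}_2\mathbf{Z}_2' + \boldsymbol{\Sigma}
= \begin{bmatrix}
\mathbf{V} & \mathbf{0}_{4 \times 4} \\
\mathbf{0}_{4 \times 4} & \mathbf{V}
\end{bmatrix}
$$

where

$$
\mathbf{V} =
\begin{bmatrix}
\sigma^2 + \sigma^2_s + \sigma^2_d & \sigma^2_s + \sigma^2_d & \sigma^2_s & \sigma^2_s \\
\sigma^2_s + \sigma^2_d & \sigma^2 + \sigma^2_s + \sigma^2_d & \sigma^2_s & \sigma^2_s \\
\sigma^2_s & \sigma^2_s & \sigma^2 + \sigma^2_s + \sigma^2_d & \sigma^2_s + \sigma^2_d \\
\sigma^2_s & \sigma^2_s & \sigma^2_s + \sigma^2_d & \sigma^2 + \sigma^2_s + \sigma^2_d
\end{bmatrix}
$$

In this case:

$$
\mathbf{Z}_1 =
\begin{bmatrix}
\mathbf{1}_4 & \mathbf{0}_{4 \times 1} \\
\mathbf{0}_{4 \times 1} & \mathbf{1}_4
\end{bmatrix},
\quad
\mathbf{Z}_2 =
\begin{bmatrix}
\mathbf{1}_2 & \mathbf{0}_{2 \times 1} & \mathbf{0}_{2 \times 1} & \mathbf{0}_{2 \times 1} \\
\mathbf{0}_{2 \times 1} & \mathbf{1}_2 & \mathbf{0}_{2 \times 1} & \mathbf{0}_{2 \times 1} \\
\mathbf{0}_{2 \times 1} & \mathbf{0}_{2 \times 1} & \mathbf{1}_2 & \mathbf{0}_{2 \times 1} \\
\mathbf{0}_{2 \times 1} & \mathbf{0}_{2 \times 1} & \mathbf{0}_{2 \times 1} & \mathbf{1}_2
\end{bmatrix},
$$

$$
\mathbf{G}_1 = \sigma^2_s \mathbf{I}_2, \quad \mathbf{G}_2 = \sigma^2_d \mathbf{I}_4 \quad \text{and} \quad \boldsymbol{\Sigma} = \sigma^2 \mathbf{I}_8
$$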
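The same variance matrix can be assembled numerically from the two design matrices. This is a minimal sketch in base R with arbitrary illustrative variance values (they are not estimates from the slides). It shows that the sire term fills each 4 x 4 block and the dam term adds to the 2 x 2 sub-blocks.

```r
n_sire <- 2; n_dam <- 2; n_calf <- 2   # 2 sires / 2 dams per sire / 2 calves per dam
sigma2  <- 1.0    # illustrative residual variance (assumption)
sigma2s <- 0.5    # illustrative sire variance (assumption)
sigma2d <- 0.25   # illustrative dam variance (assumption)

Z1 <- kronecker(diag(n_sire), matrix(1, n_dam * n_calf, 1))   # 8 x 2, calf -> sire
Z2 <- kronecker(diag(n_sire * n_dam), matrix(1, n_calf, 1))   # 8 x 4, calf -> dam
G1 <- sigma2s * diag(n_sire)
G2 <- sigma2d * diag(n_sire * n_dam)
Sigma <- sigma2 * diag(n_sire * n_dam * n_calf)

VarY <- Z1 %*% G1 %*% t(Z1) + Z2 %*% G2 %*% t(Z2) + Sigma
VarY[1:4, 1:4]   # the block V for the first sire
```

Calves of different sires are uncorrelated, so the off-diagonal 4 x 4 blocks of `VarY` are zero.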


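For Sires, Dams and Calves random ($I$ Sires / $J$ Dams / $K$ Calves), the analysis of variance table is

| Source | df | SSq | E(MSq) |
|---|---|---|---|
| Sire | $I - 1$ | $\mathbf{Y}'\mathbf{Q}_S\mathbf{Y}$ | $\sigma^2 + K\sigma^2_d + JK\sigma^2_s$ |
| Dam[Sire] | $I(J - 1)$ | $\mathbf{Y}'\mathbf{Q}_D\mathbf{Y}$ | $\sigma^2 + K\sigma^2_d$ |
| Residual | $IJ(K - 1)$ | $\mathbf{Y}'\mathbf{Q}_{U_{Res}}\mathbf{Y}$ | $\sigma^2$ |

ANOVA estimators:

$$
\hat\sigma^2 = \mathrm{RMSq}, \qquad
\hat\sigma^2_d = \frac{\mathrm{DSMSq} - \mathrm{RMSq}}{K}, \qquad
\hat\sigma^2_s = \frac{\mathrm{SMSq} - \mathrm{DSMSq}}{JK}
$$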


Calf birth weight

ANOVA table using R or SAS

| Source | df | SSq | MSq | F | Prob |
|---|---|---|---|---|---|
| Sire | 4 | 1356 | 339 | 71.22 | < 0.01 |
| Dam[Sire] | 15 | 129 | 9 | 1.81 | 0.068 |
| Residual | 40 | 190 | 4.8 | | |

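| Estimate | MM | ML | REML |
|---|---|---|---|
| $\hat\sigma^2_s$ | 28 | 22 | 27 |
| $\hat\sigma^2_d$ | 1.3 | 1.2 | 1.2 |
| $\hat\sigma^2$ | 4.8 | 4.8 | 4.8 |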


SAS program

data calf_b;

input weight sire dam @@;

cards;

30.1 1 1 39.3 2 7 43.9 4 14

31.1 1 1 39.9 2 8 46.7 4 14

34.6 1 1 36.7 2 8 44.5 4 15

29.2 1 2 38.7 2 8 46.0 4 15

30.8 1 2 39.8 3 9 47.0 4 15

31.6 1 2 36.5 3 9 43.9 4 16

32.0 1 3 38.9 3 9 45.0 4 16

32.6 1 3 37.5 3 10 48.0 4 16

32.7 1 3 38.6 3 10 41.9 5 17

33.3 1 4 36.8 3 10 43.2 5 17

40.2 1 4 39.0 3 11 45.3 5 17

36.7 1 4 39.8 3 11 45.3 5 18

32.3 2 5 38.6 3 11 44.0 5 18

36.7 2 5 36.7 3 12 47.1 5 18

40.1 2 5 37.6 3 12 45.3 5 19

35.6 2 6 38.9 3 12 44.8 5 19

34.3 2 6 40.8 4 13 45.3 5 19

41.1 2 6 42.0 4 13 46.0 5 20

34.1 2 7 45.0 4 13 47.2 5 20

30.8 2 7 42.7 4 14 48.0 5 20

;

* Moment Method;

proc glm data=calf_b;

class sire dam;

model weight = sire dam(sire);

random sire dam(sire)/test;

run;


SAS program (cont.)

* Restricted Maximum Likelihood Method;

proc mixed data=calf_b;

class sire dam;

model weight = / solution ;

random sire dam / solution G;

run;

* Maximum Likelihood Method;

proc mixed data=calf_b method=ML;

class sire dam;

model weight = / solution ddfm=sat;

random sire dam / solution G;

run;


R program

sire <- factor(rep(1:5, each = 12)); sire

dam <- factor(rep(1:20, each = 3)); dam

weight <- c(30.1, 31.1, 34.6, 29.2, 30.8, 31.6, 32.0, 32.6, 32.7, 33.3, 40.2, 36.7,

32.3, 36.7, 40.1, 35.6, 34.3, 41.1, 34.1, 30.8, 39.3, 39.9, 36.7, 38.7,

39.8, 36.5, 38.9, 37.5, 38.6, 36.8, 39.0, 39.8, 38.6, 36.7, 37.6, 38.9,

40.8, 42.0, 45.0, 42.7, 43.9, 46.7, 44.5, 46.0, 47.0, 43.9, 45.0, 48.0,

41.9, 43.2, 45.3, 45.3, 44.0, 47.1, 45.3, 44.8, 45.3, 46.0, 47.2, 48.0)

calf.dat <- data.frame(weight, sire, dam)

# Moment Method

calf.lm <- lm(weight ~ sire/dam, calf.dat)

anova(calf.lm)

(MSSire <- anova(calf.lm)[["Mean Sq"]][1])

(MSDamdSire <- anova(calf.lm)[["Mean Sq"]][2])

(MSRes <- anova(calf.lm)[["Mean Sq"]][3])

(MSSire - MSDamdSire)/(3*4) # sigmaS^2_hat, with K = 3 calves and J = 4 dams

(MSDamdSire - MSRes)/3 # sigmaD^2_hat

# Maximum Likelihood Method

library(nlme)

calf.ml <- lme(weight ~ 1, random = ~1|sire/dam, calf.dat, method="ML")

summary(calf.ml, corr = FALSE)

(summary(calf.ml)$sigma)^2 # sigma^2

# Restricted Maximum Likelihood Method

calf.reml <- lme(weight ~ 1, random = ~1|sire/dam, calf.dat, method="REML")

summary(calf.reml, corr = FALSE)

(summary(calf.reml)$sigma)^2 # sigma^2

names(summary(calf.reml)) # components available in the summary object


Unbalanced data

The balanced experiment just described is not common. In general, experiments of this type are unbalanced, as in the example in the following figure.

[Figure: unbalanced hierarchical structure. Sire S1 has dams D1 and D2; sire S2 has dams D3, D4 and D5; sire S3 has dams D6 and D7.]


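For Sires, Dams and Calves random ($I$ Sires / $n_i$ Dams / $m_{ij}$ Calves), the analysis of variance table is

| Source | df | SSq | E(MSq) |
|---|---|---|---|
| Sire | $I - 1$ | $\mathbf{Y}'\mathbf{Q}_S\mathbf{Y}$ | $\sigma^2 + K_2\sigma^2_d + K_3\sigma^2_s$ |
| Dam[Sire] | $\sum_{i=1}^{I} n_i - I$ | $\mathbf{Y}'\mathbf{Q}_D\mathbf{Y}$ | $\sigma^2 + K_1\sigma^2_d$ |
| Residual | $N - \sum_{i=1}^{I} n_i$ | $\mathbf{Y}'\mathbf{Q}_{U_{Res}}\mathbf{Y}$ | $\sigma^2$ |

where

$$
K_1 = \frac{1}{\sum_{i=1}^{I} n_i - I}\left(N - \sum_{i=1}^{I}\frac{\sum_{j=1}^{n_i} m_{ij}^2}{m_{i.}}\right),
$$

$$
K_2 = \frac{1}{I - 1}\left[\sum_{i=1}^{I}\sum_{j=1}^{n_i} m_{ij}^2\left(\frac{1}{m_{i.}} - \frac{1}{N}\right)\right]
\quad \text{and} \quad
K_3 = \frac{1}{I - 1}\left(N - \frac{\sum_{i=1}^{I} m_{i.}^2}{N}\right),
$$

with $N = \sum_{i=1}^{I}\sum_{j=1}^{n_i} m_{ij}$ and $m_{i.} = \sum_{j=1}^{n_i} m_{ij}$.

ANOVA estimators:

$$
\hat\sigma^2 = \mathrm{RMSq}, \qquad
\hat\sigma^2_d = \frac{\mathrm{DSMSq} - \mathrm{RMSq}}{K_1}, \qquad
\hat\sigma^2_s = \frac{\mathrm{SMSq} - K_2\hat\sigma^2_d - \hat\sigma^2}{K_3}
$$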
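The coefficients K1, K2 and K3 depend only on the numbers of calves per dam within each sire. The helper below is a hypothetical sketch in base R (the function name `k_coefficients` and its list argument are not from the slides) that evaluates the three formulas for any unbalanced layout. The usage line takes its counts from the unbalanced calf data set listed later in these slides.

```r
# m: a list with one numeric vector per sire; each vector holds the
#    number of calves m_ij for the dams mated to that sire
k_coefficients <- function(m) {
  I   <- length(m)                        # number of sires
  n_i <- vapply(m, length, integer(1))    # dams per sire
  m_i <- vapply(m, sum, numeric(1))       # calves per sire (m_i.)
  N   <- sum(m_i)                         # total number of calves

  K1 <- (N - sum(mapply(function(mij, mi) sum(mij^2) / mi, m, m_i))) /
        (sum(n_i) - I)
  K2 <- sum(mapply(function(mij, mi) sum(mij^2) * (1 / mi - 1 / N), m, m_i)) /
        (I - 1)
  K3 <- (N - sum(m_i^2) / N) / (I - 1)
  c(K1 = K1, K2 = K2, K3 = K3)
}

# Sire 1: dams with 2 and 1 calves; sire 2: 1, 2 and 3; sire 3: 2 and 2
k <- k_coefficients(list(c(2, 1), c(1, 2, 3), c(2, 2)))
round(k, 2)
```

For balanced data with J dams per sire and K calves per dam, the same function returns K1 = K2 = K and K3 = JK, which recovers the balanced table.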


Calf birth weight

ANOVA table using R or SAS

| Source | df | SSq | MSq | F | Prob |
|---|---|---|---|---|---|
| Sire | 2 | 37.25 | 18.63 | 25.30 | < 0.01 |
| Dam[Sire] | 4 | 1275.33 | 318.83 | 433.13 | < 0.01 |
| Residual | 6 | 4.42 | 0.74 | | |

K1 = 1.75, K2 = 1.96, K3 = 4.15

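| Estimate | MM | ML | REML |
|---|---|---|---|
| $\hat\sigma^2_s$ | -81.53 | 0.00000012 | 0.00000023 |
| $\hat\sigma^2_d$ | 181.77 | 104.29 | 121.75 |
| $\hat\sigma^2$ | 0.74 | 0.74 | 0.74 |

Negative component of variance: the moment estimate of $\sigma^2_s$ is negative, whereas the ML and REML estimates are constrained to be non-negative and are essentially zero.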


SAS program

data calf_unb;

input sire dam weight;

cards;

1 1 32.0

1 1 33.5

1 2 55.0

2 3 36.0

2 4 34.5

2 4 35.0

2 5 48.0

2 5 49.5

2 5 50.0

3 6 32.5

3 6 31.5

3 7 58.0

3 7 57.0

;

* Moment Method;

proc glm data=calf_unb;

class sire dam;

model weight = sire dam(sire);

random sire dam(sire)/test;

run;

* Restricted Maximum Likelihood Method;

proc mixed data=calf_unb;

class sire dam;

model weight = / solution ;

random sire dam / solution G;

run;

* Maximum Likelihood Method;

proc mixed data=calf_unb method=ML;

class sire dam;

model weight = / solution ddfm=sat;

random sire dam / solution G;

run;


R program

sire <- factor(c(1,1,1,2,2,2,2,2,2,3,3,3,3))

dam <- factor(c(1,1,2,3,4,4,5,5,5,6,6,7,7))

weight <- c(32.0,33.5,55.0,36.0,34.5,35.0,48.0,49.5,50.0,32.5,31.5,58.0,57.0)

(calf.dat<- data.frame(sire, dam, weight))

# Moment Method

calf.lm <- lm(weight ~ sire/dam, calf.dat)

anova(calf.lm)

(SireMSq <- anova(calf.lm)[["Mean Sq"]][1])

(DamdSireMSq <- anova(calf.lm)[["Mean Sq"]][2])

(ResMSq <- anova(calf.lm)[["Mean Sq"]][3])

(k1 <- (13-(2^2+1^2)/3-(1^2+2^2+3^2)/6-(2^2+2^2)/4)/((2+3+2)-3))

(k2 <- ((2^2+1^2)*(1/3-1/13)+(1^2+2^2+3^2)*(1/6-1/13)+(2^2+2^2)*(1/4-1/13))/(3-1))

(k3 <- (13-(3^2+6^2+4^2)/13)/(3-1))

(sigma2D <- (DamdSireMSq - ResMSq)/k1) # sigmaD^2_hat

(sigma2S <- (SireMSq - ResMSq - k2*sigma2D)/ k3) # sigmaS^2_hat

# Restricted Maximum Likelihood Method

require(nlme)

calf.reml <- lme(weight ~ 1, random = ~1|sire/dam, calf.dat, method="REML")

summary(calf.reml,corr = F)

(summary(calf.reml)$sigma)^2 # sigma^2

# Maximum Likelihood Method

calf.ml <- lme(weight ~ 1, random = ~1|sire/dam, calf.dat, method="ML")

summary(calf.ml,corr = F)

(summary(calf.ml)$sigma)^2


General Covariance Structures

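Model: description, followed by the covariance matrix

ID: Identical variation

$$
\begin{bmatrix}
\sigma^2 & 0 & \cdots & 0 \\
0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma^2
\end{bmatrix}
$$

DIAG: Heterogeneous variation

$$
\begin{bmatrix}
\sigma^2_1 & 0 & \cdots & 0 \\
0 & \sigma^2_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma^2_J
\end{bmatrix}
$$

CS: Compound symmetry with homogeneous variation

$$
\begin{bmatrix}
\sigma^2 & \sigma^2_a & \cdots & \sigma^2_a \\
\sigma^2_a & \sigma^2 & \cdots & \sigma^2_a \\
\vdots & \vdots & \ddots & \vdots \\
\sigma^2_a & \sigma^2_a & \cdots & \sigma^2
\end{bmatrix}
$$

CSHet: Compound symmetry with heterogeneous variation

$$
\begin{bmatrix}
\sigma^2_1 & \sigma^2_a & \cdots & \sigma^2_a \\
\sigma^2_a & \sigma^2_2 & \cdots & \sigma^2_a \\
\vdots & \vdots & \ddots & \vdots \\
\sigma^2_a & \sigma^2_a & \cdots & \sigma^2_J
\end{bmatrix}
$$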


General Covariance Structures (cont.)

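Model: description, followed by the covariance matrix

AR1: First-order autoregressive model with homogeneous variation

$$
\begin{bmatrix}
\sigma^2 & \sigma^2\rho & \cdots & \sigma^2\rho^{d(1,J)} \\
\sigma^2\rho & \sigma^2 & \cdots & \sigma^2\rho^{d(2,J)} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma^2\rho^{d(J,1)} & \sigma^2\rho^{d(J,2)} & \cdots & \sigma^2
\end{bmatrix}
$$

AR1Het: First-order autoregressive model with heterogeneous variation

$$
\begin{bmatrix}
\sigma^2_1 & \sigma^2_a\rho & \cdots & \sigma^2_a\rho^{d(1,J)} \\
\sigma^2_a\rho & \sigma^2_2 & \cdots & \sigma^2_a\rho^{d(2,J)} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma^2_a\rho^{d(J,1)} & \sigma^2_a\rho^{d(J,2)} & \cdots & \sigma^2_J
\end{bmatrix}
$$

UN: Unstructured model

$$
\begin{bmatrix}
\sigma^2_1 & \sigma_{12} & \cdots & \sigma_{1J} \\
\sigma_{21} & \sigma^2_2 & \cdots & \sigma_{2J} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{J1} & \sigma_{J2} & \cdots & \sigma^2_J
\end{bmatrix}
$$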
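Some of these structures are easy to build numerically, which helps when checking how many parameters each one uses. The sketch below is illustrative base R, with arbitrary parameter values that do not come from the slides. It assumes that the distance d(i, j) is the absolute lag |i - j|, which the slides do not define explicitly.

```r
J <- 4             # dimension of the matrix (illustrative)
sigma2  <- 2.0     # common variance (illustrative)
sigma2a <- 0.5     # common covariance for compound symmetry (illustrative)
rho     <- 0.6     # autocorrelation parameter (illustrative)

# ID: one parameter
ID <- sigma2 * diag(J)

# DIAG: J parameters, one variance per position
DIAG <- diag(c(1.0, 2.0, 3.0, 4.0))

# CS: two parameters, constant covariance off the diagonal
CS <- matrix(sigma2a, J, J)
diag(CS) <- sigma2

# AR1: two parameters; assumes d(i, j) = |i - j|
d   <- abs(outer(1:J, 1:J, "-"))
AR1 <- sigma2 * rho^d

AR1
```

The unstructured model UN needs J(J + 1)/2 parameters, compared with one or two for ID, CS and AR1, which is why the simpler structures are usually tried first.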


Table: Covariance Structure (SAS)
