
Plan Day 1

Contents

1 Day 1: Linear Mixed Models (LMM)
   1 Linear Models
   2 Random and fixed effects
   3 Definition and notations
   4 Estimation and algorithms
   5 Model selection

C.G.B. Demetrio An Introduction to Mixed Models in Agriculture


Linear Model

The classical linear model is defined by

Y = Xβ + ε

where

Y is an observable data (response variable) vector

β is a vector of unknown parameters

X is the design matrix (for factors and regressors)

ε is a vector of random errors, with $\varepsilon \sim N(0, \sigma^2 I)$

Then $E(Y) = X\beta$ and $Var(Y) = \sigma^2 I$

The ordinary least-squares estimator (the same as the MLE) of β is

$$\hat{\beta} = (X'X)^{-1}X'Y$$

Disadvantages

too restrictive for most typical data sets

the error structure in real-world experiments is often more complex than $\Sigma = \sigma^2 I$



Explanatory variables

(next ten slides are from Demetrio, Mortier and Trottier, 2009)

2 types of explanatory variables:

1 factors
→ interest is in attributing variability in Y to the various categories of the factor

Example: patients classified by gender (M/F) and age group (A/B/C)

$$Y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}, \quad i = 1, 2, \quad j = 1, 2, 3$$

→ parameter values give the impact of the factor's levels on the response variable

factors may be crossed or nested

factors may have main effects and interaction effects



2 regressors
→ interest is in attributing variability in Y to changes in the values of a continuous covariate

Example: changes due to weight x

$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$

→ parameter values give the impact of an increase in x on the response variable



Terminology :

Multiple Linear Regression/ANOVA/ANCOVA

if matrix X contains only regressors, models are called regression models

if matrix X contains only factors, models are called Analysis of Variance (ANOVA) models (X is a matrix of 1's and 0's).

if matrix X contains both regressors and factors, models are called Analysis of Covariance (ANCOVA) models.



Estimation

Let’s assume a linear model :

Y = Xβ + ε

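Parameters to be estimated are β and σ

In all that follows, X is assumed to be of full rank: rank(X) = K

Least squares approach: $\min_{\beta} \|Y - X\beta\|^2$

$$\hat{\beta}_{ls} = (X'X)^{-1}X'Y$$

best linear unbiased estimator of β

$$\hat{\beta}_{ls} \sim N(\beta, \sigma^2 (X'X)^{-1})$$

best quadratic unbiased estimator of σ²

$$\hat{\sigma}^2_{ls} = \frac{1}{n-K}(Y - X\hat{\beta}_{ls})'(Y - X\hat{\beta}_{ls}) \quad \text{and} \quad \hat{\sigma}^2_{ls} \sim \frac{\sigma^2}{n-K}\,\chi^2(n-K)$$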



Maximum likelihood approach

Likelihood

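$$L(\beta, \sigma; y) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2\sigma^2}(y_i - x_i'\beta)'(y_i - x_i'\beta)\right\}$$

Log-likelihood

$$\ell(\beta, \sigma; y) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)$$

Maximum log-likelihood

$$\partial_{\beta,\sigma}\,\ell(\beta, \sigma; y) = 0 \;\Rightarrow\; \hat{\beta}_{ml} = (X'X)^{-1}X'Y, \qquad \hat{\sigma}^2_{ml} = \frac{1}{n}(Y - X\hat{\beta})'(Y - X\hat{\beta})$$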



$\hat{\beta}_{ls} = \hat{\beta}_{ml}$, unbiased

$$E[\hat{\beta}_{ls}] = E[\hat{\beta}_{ml}] = E[(X'X)^{-1}X'Y] = \beta$$

but $\hat{\sigma}^2_{ls} \neq \hat{\sigma}^2_{ml}$

$\hat{\sigma}^2_{ls}$ is unbiased
- it is calculated on the orthogonal complement of the column space of X
- it takes into account the difference between Y and its projection $X\hat{\beta}$ on X, and the loss of degrees of freedom due to the estimation of β

$\hat{\sigma}^2_{ml}$ is biased
- joint estimation of σ² and β
- it does not take into account the loss of degrees of freedom due to the estimation of β

Note that

$$E[(Y - X\hat{\beta})'(Y - X\hat{\beta})] = E\{Y'[I - X(X'X)^{-1}X']Y\} = (n-K)\sigma^2$$

Then $E(\hat{\sigma}^2_{ls}) = \sigma^2$ and $E(\hat{\sigma}^2_{ml}) = \frac{n-K}{n}\sigma^2$
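The difference between the two variance estimators can be checked numerically. The sketch below is only an illustration: the data are hypothetical (they are not from the slides), the model is a simple regression with an intercept and one regressor (K = 2), and the normal equations are solved in closed form rather than with a matrix library.

```python
# Hypothetical data (not from the slides): response y against one regressor x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
n, K = len(y), 2  # K = number of columns of X = [1, x]

# Normal equations X'X b = X'y, solved in closed form for the 2 x 2 case.
sx, sy = sum(x), sum(y)
sxx = sum(xi * xi for xi in x)
sxy = sum(xi * yi for xi, yi in zip(x, y))
det = n * sxx - sx * sx
b0 = (sxx * sy - sx * sxy) / det  # intercept
b1 = (n * sxy - sx * sy) / det    # slope

# Residual sum of squares (Y - Xb)'(Y - Xb).
rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

sigma2_ls = rss / (n - K)  # unbiased estimator
sigma2_ml = rss / n        # ML estimator, smaller by the factor (n - K) / n

print(b0, b1, sigma2_ls, sigma2_ml)
```

With n = 6 and K = 2 the ML estimate is two thirds of the least-squares estimate, which shows how strong the downward bias can be in small samples.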



Definition: The trace of a square matrix is the sum of its diagonal elements.

Definition: The degrees of freedom of a sum of squares is the rank of the idempotent matrix of its quadratic form. That is, the degrees of freedom of Y′AY is given by rank(A).

Lemma: For B idempotent, rank(B) = trace(B).

Lemma: Let c be a scalar and A, B and C be matrices. Then, when the appropriate operations are defined, we have

(i) trace(A) = trace(A′);

(ii) trace(cA) = c trace(A);

(iii) trace(A + B) = trace(A) + trace(B);

(iv) trace(AB) = trace(BA);

(v) trace(ABC) = trace(CAB) = trace(BCA);

(vi) trace(A ⊗ B) = trace(A) trace(B);

(vii) trace(A′A) = 0 if and only if A = 0.



Theorem: Let Y be an n × 1 vector of random variables with

E[Y] = Ψ and Var[Y] = V

where Ψ is an n × 1 vector of expected values and V is an n × n matrix.

Let A be an n × n matrix of real values. Then

E(YTAY) = trace (AV) + ΨTAΨ
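As a worked application (a sketch under the assumptions of the classical linear model above), the theorem gives the expectation of the residual sum of squares. Take $A = I - X(X'X)^{-1}X'$, which is idempotent with $AX = 0$, together with $V = \sigma^2 I$ and $\Psi = X\beta$:

```latex
\begin{aligned}
E(Y'AY) &= \operatorname{trace}(A\,\sigma^2 I) + \beta'X'AX\beta \\
        &= \sigma^2 \operatorname{trace}(A) + 0 \\
        &= \sigma^2 \left[\operatorname{trace}(I_n) - \operatorname{trace}\{(X'X)^{-1}X'X\}\right] \\
        &= (n - K)\,\sigma^2 .
\end{aligned}
```

The third line uses property (iv), trace(AB) = trace(BA), and rank(X) = K.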



Goodness of fit criterion

Adjusted R-square

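$$R^2_{adj} = 1 - \frac{\sum_{i=1}^{n}(y_i - x_i'\hat{\beta})^2/(n-K)}{\sum_{i=1}^{n}(y_i - \bar{y})^2/(n-1)}$$

Akaike's Information Criterion

$$AIC = -2\log L(\hat{\beta}_{ml}, \hat{\sigma}_{ml}; y) + 2K$$

Bayesian Information Criterion

$$BIC = -2\log L(\hat{\beta}_{ml}, \hat{\sigma}_{ml}; y) + K\log(n)$$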



SAS procedure

Proc GLM data = data;

class x; * if x is a factor;

model y = x;

output out=Regr p=Predite r=Residu;

run;



Checking

Gaussian hypothesis

Graphical

histogram, QQ-plot,

proc univariate data=Regr;
  var Residu;
  histogram Residu / normal;
  qqplot Residu / normal(mu=est sigma=est color=red L=1);
  inset mean std / cfill=blank format=5.2;
run;

Statistical test

Kolmogorov-Smirnov

proc univariate data=Regr normaltest;
  var Residu;
run;

Homoscedasticity hypothesis

Graphical

residual/predicted

proc GPlot data=Regr;
  plot Residu*Predite / vref=0;
run;

Independence hypothesis

Difficult to test !!!



Variable selection in multiple regression

The main approaches

Forward selection, which involves starting with no variables in the model, trying out the variables one by one and including them if they are statistically significant.

Backward elimination, which involves starting with all candidate variables and testing them one by one for statistical significance, deleting any that are not significant.

Methods that are a combination of the above, testing at each stage for variables to be included or excluded.

SAS Reg procedure

proc reg;

model Y = x/selection = adjrsq bic;

model Y = x/selection = stepwise;

run;



Linear Mixed Models

Linear mixed effects models have been widely used in the analysis of data where responses are clustered around some random effects, such that there is a natural dependence between observations in the same cluster.

For example, consider repeated measurements taken on each subject in longitudinal data, or observations taken on members of the same family in a genetic study.

They can easily accommodate covariances among observations.

They handle correlated data by incorporating random effects and estimating their associated variance components to model variability over and above the residual error.

Because of the estimation procedures usually involved, mixed-model approaches can circumvent the problems associated with unbalanced and incomplete data.



Maize trial

Example

5 progenies of a population of maize progenies were investigated

the trial was conducted by completely randomizing 4 replicates of each progeny

the response variable was the weight of corn-cob (kg/10 m²)

            Replicates
Progeny     1      2      3      4
1         5.95   6.21   5.40   5.18
2         5.07   6.71   5.46   4.98
3         4.82   5.11   4.68   4.52
4         3.87   4.16   4.11   4.84
5         5.53   5.82   4.29   4.70

At crossing, genetic effects may reasonably be assumed to be normal random variables.

During the early stages of a selection programme, the nature of genotypic effects may still be regarded as random.

In general, the interest is in the heritability of a trait.



Penicillin yield (Brien, 2009)

Example

The effects of four treatments on the yield of penicillin are to be investigated. It is known that corn steep liquor, an important raw material in producing penicillin, is highly variable from one blending of it to another. To ensure that the results of the experiment apply to more than one blend, five blends (blocks) are to be used in the experiment. The trial was conducted using the same blend in four flasks and randomizing the four treatments to these four flasks.

interest, of course, in each particular treatment used

no interest in each particular blend, which depends heavily on the circumstances

blend effect can be viewed as a sample of a random blend effect (levels are chosen at random from an infinite set of blend levels)

interest in estimating the variance of the blend effect as a source of random variation in the data

the four flasks with the same blend share something, which presumably violates the assumption of independence



[Figure: layout of the experiment; each blend (Blend 1, ..., Blend 5) is divided among four flasks (Flask 1 to Flask 4).]

          Treatment
Blend     A     B     C     D
1        89    88    97    94
2        84    77    92    79
3        81    87    87    85
4        87    92    89    84
5        79    81    80    88



Calf birth weight

Example

In an animal breeding experiment 20 unrelated cows were subjected to superovulation and artificial insemination. Each group of 4 cows was inseminated with a different sire, with a total of 5 unrelated sires. Out of each mating (combination of dam and sire), three calves were generated and their yearling weights were recorded.

no interest in each particular sire or dam, which depend heavily on the circumstances

sire effect can be viewed as a sample of a random sire effect (levels are chosen at random from an infinite set of sire levels)

dam effect can be viewed as a sample of a random dam effect (levels are chosen at random from an infinite set of dam levels)

interest in estimating the variance of the sire and dam effects as sources of random variation in the data

the three calves with the same parents share something, which presumably violates the assumption of independence



[Figure: nested mating design; sire S1 is mated to dams D1 to D4, ..., sire S5 is mated to dams D17 to D20.]



Fixed vs Random effects

Random effect: A factor will be designated as random if it is considered appropriate to use a probability distribution function to describe the distribution of effects associated with the population set of levels.

influences only the variance of the response variable

infinite set of levels (only a finite subset present), and interest lies more in the variance induced by these levels than in the estimation of the levels themselves

blends in the penicillin example, progenies in the maize trial

Fixed effect: A factor will be designated as fixed if it is considered appropriate to have the effects associated with the population set of levels for the factor differ in an arbitrary manner, rather than being distributed according to a regularly-shaped p.d.f.

influences only the mean of the response variable

finite set of levels, and interest lies in the estimation of each particular level effect

treatments in the penicillin example



In practice

Random if

i. large number of population levels and

ii. random behaviour

iii. occur in two contrasting kinds of circumstances:

observational studies or designed experiments with hierarchical structure
- School/Class/Student
- Sire/Dam/Calf

designed experiments with different spatial or temporal scales
- longitudinal studies

Fixed if

i. small or large number of population levels and

ii. systematic behaviour

→ Consequence: data collected within each level of the random-effect factor are linked to the same realization of a random variable. This introduces dependence among these data.



Type of Models

Fixed-effects model: involves only fixed effects; used to make inferences about those particular levels of the classification factor that were used in the experiment

Random-effects model: involves only random effects; used to make inferences about the population from which these levels were drawn

Mixed model: involves both fixed and random effects



Example

Consider a study of observations on half-sib families of I unrelated sires.

If the interest is in comparing only the I sires, the following fixed model can be used to represent the data:

$$E(Y_{ij}) = \mu + s_i$$

where $y_{ij}$ represents the phenotypic trait observation of progeny j, j = 1, ..., r, in family i, i = 1, ..., I, μ is a mean, and $s_i$ is a fixed effect common to all animals having sire i.

If the I sires are considered as a sample from a population of sires, the following random model can be used to represent the data:

$$E(Y_{ij} \mid s_i) = \mu + s_i$$

where $s_i$ is a random effect.

Two usual assumptions:

1 the $s_i$'s are independently and identically distributed

2 the $s_i$'s have zero mean and the same variance $\sigma^2_s$

$$s_i \sim \text{i.i.d.}(0, \sigma^2_s)$$



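In matrix notation, this model can be expressed as:

$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_I \end{bmatrix} = \begin{bmatrix} 1_r \\ 1_r \\ \vdots \\ 1_r \end{bmatrix}\mu + \begin{bmatrix} 1_r & 0_r & \cdots & 0_r \\ 0_r & 1_r & \cdots & 0_r \\ \vdots & \vdots & \ddots & \vdots \\ 0_r & 0_r & \cdots & 1_r \end{bmatrix} \begin{bmatrix} s_1 \\ s_2 \\ \vdots \\ s_I \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_I \end{bmatrix}$$

where $y_i = [y_{i1}, y_{i2}, \ldots, y_{ir}]'$ represents the vector of observations on the progeny of sire i; $1_r$ and $0_r$ represent r-dimensional column vectors of 1's and 0's, respectively; and $\varepsilon_i = [\varepsilon_{i1}, \varepsilon_{i2}, \ldots, \varepsilon_{ir}]'$ is the vector of residuals associated with the progeny of sire i.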



Simulation

Case 1: Consider the simple model $y_{ij} = \mu + s_i + e_{ij}$, with 3 independent sires and 2 replicates

fix µ = 2

get a sample of 3 values for $s_i$ from a $N(0, \sigma^2_s)$

get a sample of 6 values for $e_{ij}$ from a $N(0, \sigma^2)$

Case 2: We could have a more complex covariance structure among the sires (of the type $A\sigma^2_s$, where A is the relationship matrix). The simulation could then be carried out using the Cholesky decomposition of the matrix A, i.e. A = DD′. One obtains a vector z of dimension 3 from the normal N(0, I), which can be done by sampling each of its elements from a standard normal N(0, 1), and then multiplies z by D and by the square root of $\sigma^2_s$, i.e. the vector of sire effects is obtained as $s = D z \sigma_s$.
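The two simulation cases can be sketched as follows. This is a minimal illustration using only the Python standard library; the values of $\sigma^2_s$ and $\sigma^2$, the seed and the relationship matrix A are assumptions chosen for the example, and `cholesky` is a small hand-written helper rather than a library routine.

```python
import math
import random

def cholesky(a):
    """Return lower-triangular D with a = D D' (a symmetric positive definite)."""
    m = len(a)
    d = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            acc = a[i][j] - sum(d[i][k] * d[j][k] for k in range(j))
            d[i][j] = math.sqrt(acc) if i == j else acc / d[j][j]
    return d

rng = random.Random(2024)        # fixed seed, for reproducibility only
mu = 2.0                         # as on the slide
sigma2_s, sigma2 = 1.0, 0.5      # assumed variance components
n_sires, n_reps = 3, 2

# Case 1: independent sires, s_i ~ N(0, sigma2_s), e_ij ~ N(0, sigma2).
s = [rng.gauss(0.0, math.sqrt(sigma2_s)) for _ in range(n_sires)]
e = [[rng.gauss(0.0, math.sqrt(sigma2)) for _ in range(n_reps)]
     for _ in range(n_sires)]
y = [[mu + s[i] + e[i][j] for j in range(n_reps)] for i in range(n_sires)]

# Case 2: related sires, Var(s) = A * sigma2_s, simulated as s = D z sigma_s.
A = [[1.00, 0.50, 0.25],         # hypothetical relationship matrix
     [0.50, 1.00, 0.25],
     [0.25, 0.25, 1.00]]
D = cholesky(A)
z = [rng.gauss(0.0, 1.0) for _ in range(n_sires)]
s_related = [math.sqrt(sigma2_s) * sum(D[i][k] * z[k] for k in range(n_sires))
             for i in range(n_sires)]

print(y)
print(s_related)
```

Since Var(Dz) = D I D′ = A, scaling by $\sigma_s$ gives the required covariance $A\sigma^2_s$.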



Advantages of Linear Mixed Models

flexibility of mixed-effects models in modelling grouped or correlated observations.

models applied to related individuals (as in animal and plant breeding), longitudinal data, spatial statistics, etc.

generalized linear models with mixed effects, as implemented for example in GLIMMIX in SAS,

nonlinear mixed-effects models (NLINMIX in SAS, for example), such as those for growth curves.



Linear Mixed Model

Y = Xβ + Zu + ε

Y is an observable data vector

β is a vector of unknown parameters

u is a vector of unobservable random variables

X and Z are design matrices for the fixed and random effects

ε is a vector of random errors

Generally, it is assumed that u and ε are independent of each other and normally distributed with zero-mean vectors and variance-covariance matrices G and Σ, respectively, i.e.:

$$\begin{bmatrix} u \\ \varepsilon \end{bmatrix} \sim N\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} G & 0 \\ 0 & \Sigma \end{bmatrix}\right)$$

Inferences regarding mixed effects models refer to the estimation of fixed effects, the prediction of random effects, and the estimation of variance and covariance components, which are briefly discussed next.



Estimation of Fixed Effects

Recall that the general linear mixed model is

Y = Xβ + Zu + ε

u ∼ N(0,G)

ε ∼ N(0,Σ)

u and ε independent

Then,

E(Y|u) = Xβ + Zu and Var(Y|u) = Σ

E(Y) = E[E(Y|u)] = E(Xβ + Zu) = Xβ

Var(Y) = Var[E(Y|u)] + E[Var(Y|u)] = Var(Xβ + Zu) + E(Σ) = ZGZ′ + Σ

The implied marginal model is Y ∼ N(Xβ, V), where V = ZGZ′ + Σ

Note that inferences based on the marginal model do not explicitly assume the presence of random effects representing the natural heterogeneity between subjects (case of longitudinal data)
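To make V = ZGZ′ + Σ concrete, the sketch below builds it for a one-way layout with 3 groups (e.g. sires) and 2 observations each, taking $G = \sigma^2_u I$ and $\Sigma = \sigma^2_e I$. The function name `marginal_cov` and the variance values are assumptions made for this illustration.

```python
def marginal_cov(n_groups, n_per_group, sigma2_u, sigma2_e):
    """V = Z G Z' + Sigma with G = sigma2_u * I and Sigma = sigma2_e * I.

    Z is the incidence matrix assigning each observation to its group.
    """
    n_obs = n_groups * n_per_group
    # Z[k][g] = 1 if observation k belongs to group g, else 0.
    Z = [[1.0 if k // n_per_group == g else 0.0 for g in range(n_groups)]
         for k in range(n_obs)]
    V = [[0.0] * n_obs for _ in range(n_obs)]
    for a in range(n_obs):
        for b in range(n_obs):
            zgz = sigma2_u * sum(Z[a][g] * Z[b][g] for g in range(n_groups))
            V[a][b] = zgz + (sigma2_e if a == b else 0.0)
    return V

V = marginal_cov(n_groups=3, n_per_group=2, sigma2_u=1.0, sigma2_e=0.5)
for row in V:
    print(row)
```

The result is block diagonal: observations in the same group have covariance $\sigma^2_u$, observations in different groups are uncorrelated, and every variance equals $\sigma^2_u + \sigma^2_e$.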




Notation

β: vector of fixed effects (as before)

α: vector of all variance components in G and Σ

θ = (β′, α′)′: vector of all parameters in the marginal model

Marginal likelihood function:

$$L_{ML}(\theta) = (2\pi)^{-n/2}\,|V(\alpha)|^{-1/2} \exp\left[-\frac{1}{2}(Y - X\beta)'V^{-1}(\alpha)(Y - X\beta)\right]$$

If α were known, the MLE of β would be

$$\hat{\beta}(\alpha) = (X'V^{-1}X)^{-1}X'V^{-1}Y \sim N(\beta, (X'V^{-1}X)^{-1})$$




As G and Σ are generally unknown, an estimate $\hat{V}$ of V is used instead, such that the estimator becomes $\hat{\beta} = (X'\hat{V}^{-1}X)^{-1}X'\hat{V}^{-1}Y$.

The variance-covariance matrix of $\hat{\beta}$ is approximated by $(X'\hat{V}^{-1}X)^{-1}$.

Note: $(X'\hat{V}^{-1}X)^{-1}$ is biased downwards as a consequence of ignoring the variability introduced by working with estimates of (co)variance components instead of their true (unknown) parameter values.

Approximate confidence regions and test statistics for estimable functions of the type K′β can be obtained by using the result:

$$\frac{(K'\beta^0)'\,[K'(X'\hat{V}^{-1}X)^{-}K]^{-1}\,(K'\beta^0)}{\operatorname{rank}(K)} \approx F_{[\varphi_N, \varphi_D]}$$

where $F_{[\varphi_N, \varphi_D]}$ refers to an F-distribution with $\varphi_N = \operatorname{rank}(K)$ degrees of freedom for the numerator and $\varphi_D$ degrees of freedom for the denominator, the latter generally calculated from the data using, for example, Satterthwaite's approach



Matrix review

X ∼ Nk(µ,Σ)

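Consider the partitions:

$$X = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}, \quad \mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} \quad \text{and} \quad \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}.$$

Then

$X_1 \sim N(\mu_1, \Sigma_{11})$ and $X_2 \sim N(\mu_2, \Sigma_{22})$ (marginal distributions)

and

$X_1 | X_2 \sim N(\mu_{1.2}, \Sigma_{11.2})$ and $X_2 | X_1 \sim N(\mu_{2.1}, \Sigma_{22.1})$ (conditional distributions),

where

$$\mu_{1.2} = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(X_2 - \mu_2), \quad \Sigma_{11.2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$$

and

$$\mu_{2.1} = \mu_2 + \Sigma_{21}\Sigma_{11}^{-1}(X_1 - \mu_1), \quad \Sigma_{22.1} = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}.$$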



Prediction of Random Effects

In addition to the estimation of fixed effects, very often (in genetics, for example) interest is also in the prediction of random effects.

In linear (Gaussian) models such predictions are given by the conditional expectation of u given the data, i.e. E[u|y].

Given the model specifications, the joint distribution of Y and u is:

$$\begin{bmatrix} Y \\ u \end{bmatrix} \sim N\left(\begin{bmatrix} X\beta \\ 0 \end{bmatrix}, \begin{bmatrix} V & ZG \\ GZ' & G \end{bmatrix}\right)$$

From the properties of the multivariate normal distribution, we have

$$E[u|y] = E[u] + Cov[u, Y']\,Var^{-1}[Y]\,(y - E[Y]) = GZ'V^{-1}(y - X\beta) = GZ'(ZGZ' + \Sigma)^{-1}(y - X\beta)$$

The fixed effects β are typically replaced by their estimates, so that predictions are made based on the following expression:

$$\hat{u} = GZ'(ZGZ' + \Sigma)^{-1}(y - X\hat{\beta})$$



Mixed Model Equations

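The solutions $\hat{\beta}$ and $\hat{u}$ discussed before require $V^{-1}$. As V can be of huge dimensions, especially in plant and animal breeding applications, computing its inverse is generally computationally demanding, if not infeasible.

However, Henderson (1950) presented the mixed model equations (MME) to estimate β and u simultaneously, without the need for inverting V.

The MME were derived by maximizing, with respect to β and u, the joint density of Y and u, $f(y, u|\beta, G, \Sigma) = f(y|u, \beta, \Sigma)\,f(u|G)$, expressed as:

$$f(y, u|\beta, G, \Sigma) \propto |\Sigma|^{-1/2}|G|^{-1/2} \exp\left[-\frac{1}{2}(y - X\beta - Zu)'\Sigma^{-1}(y - X\beta - Zu) - \frac{1}{2}u'G^{-1}u\right]$$

Up to an additive constant c, the logarithm of this function is:

$$\begin{aligned} \ell &= c - \tfrac{1}{2}\left\{\log|\Sigma| + \log|G| + (y - X\beta - Zu)'\Sigma^{-1}(y - X\beta - Zu) + u'G^{-1}u\right\} \\ &= c - \tfrac{1}{2}\left\{\log|\Sigma| + \log|G| + y'\Sigma^{-1}y - 2y'\Sigma^{-1}X\beta - 2y'\Sigma^{-1}Zu \right. \\ &\qquad\qquad \left. +\, \beta'X'\Sigma^{-1}X\beta + 2\beta'X'\Sigma^{-1}Zu + u'Z'\Sigma^{-1}Zu + u'G^{-1}u\right\} \end{aligned}$$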



Mixed Model Equations

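The derivatives of ℓ with respect to β and u are:

$$\begin{bmatrix} \dfrac{\partial \ell}{\partial \beta} \\[2mm] \dfrac{\partial \ell}{\partial u} \end{bmatrix} = \begin{bmatrix} X'\Sigma^{-1}y - X'\Sigma^{-1}X\beta - X'\Sigma^{-1}Zu \\ Z'\Sigma^{-1}y - Z'\Sigma^{-1}X\beta - Z'\Sigma^{-1}Zu - G^{-1}u \end{bmatrix}$$

Equating them to zero gives the following system:

$$\begin{bmatrix} X'\Sigma^{-1}X\hat{\beta} + X'\Sigma^{-1}Z\hat{u} \\ Z'\Sigma^{-1}X\hat{\beta} + Z'\Sigma^{-1}Z\hat{u} + G^{-1}\hat{u} \end{bmatrix} = \begin{bmatrix} X'\Sigma^{-1}y \\ Z'\Sigma^{-1}y \end{bmatrix}$$

which can be expressed as:

$$\begin{bmatrix} X'\Sigma^{-1}X & X'\Sigma^{-1}Z \\ Z'\Sigma^{-1}X & Z'\Sigma^{-1}Z + G^{-1} \end{bmatrix} \begin{bmatrix} \hat{\beta} \\ \hat{u} \end{bmatrix} = \begin{bmatrix} X'\Sigma^{-1}y \\ Z'\Sigma^{-1}y \end{bmatrix}$$

known as the mixed model equations (MME).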



BLUE and BLUP

Using the second part of the MME, we have that:

Z′Σ−1Xβ + (Z′Σ−1Z + G−1)u = Z′Σ−1y

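so that

$$\hat{u} = (Z'\Sigma^{-1}Z + G^{-1})^{-1}Z'\Sigma^{-1}(y - X\hat{\beta})$$

It can be shown that this expression is equivalent to $\hat{u} = GZ'(ZGZ' + \Sigma)^{-1}(y - X\hat{\beta})$ and, more importantly, that $\hat{u}$ is the best linear unbiased predictor (BLUP) of u.

Substituting this result into the first part of the MME, we have that:

$$X'\Sigma^{-1}X\hat{\beta} + X'\Sigma^{-1}Z\hat{u} = X'\Sigma^{-1}y$$

$$X'\Sigma^{-1}X\hat{\beta} + X'\Sigma^{-1}Z(Z'\Sigma^{-1}Z + G^{-1})^{-1}Z'\Sigma^{-1}(y - X\hat{\beta}) = X'\Sigma^{-1}y$$

$$\hat{\beta} = \{X'[\Sigma^{-1} - \Sigma^{-1}Z(Z'\Sigma^{-1}Z + G^{-1})^{-1}Z'\Sigma^{-1}]X\}^{-1}X'[\Sigma^{-1} - \Sigma^{-1}Z(Z'\Sigma^{-1}Z + G^{-1})^{-1}Z'\Sigma^{-1}]Y$$

Similarly, it can be shown that this expression is equivalent to $\hat{\beta} = (X'V^{-1}X)^{-1}X'V^{-1}Y$, which is the best linear unbiased estimator (BLUE) of β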
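The equivalence between the MME solution and the closed-form BLUE/BLUP can be checked numerically on the maize trial (progeny as the random factor). In this sketch the variance components are assumed known and their values are chosen only for illustration; `solve` is a small hand-written Gaussian elimination, used to keep the example free of external libraries.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, m):
            f = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, m))
        x[r] = (aug[r][m] - tail) / aug[r][r]
    return x

# Maize trial from the slides: 5 progenies x 4 replicates.
y = [[5.95, 6.21, 5.40, 5.18],
     [5.07, 6.71, 5.46, 4.98],
     [4.82, 5.11, 4.68, 4.52],
     [3.87, 4.16, 4.11, 4.84],
     [5.53, 5.82, 4.29, 4.70]]
I, n = len(y), len(y[0])
sigma2, sigma2_s = 0.32, 0.26   # assumed known (illustrative values)

# MME with X = 1_N, Z = I_I (x) 1_n, Sigma = sigma2 * I, G = sigma2_s * I.
C = [[0.0] * (I + 1) for _ in range(I + 1)]
rhs = [0.0] * (I + 1)
C[0][0] = I * n / sigma2                      # X' Sigma^-1 X
rhs[0] = sum(map(sum, y)) / sigma2            # X' Sigma^-1 y
for i in range(I):
    C[0][i + 1] = C[i + 1][0] = n / sigma2    # X' Sigma^-1 Z and its transpose
    C[i + 1][i + 1] = n / sigma2 + 1.0 / sigma2_s  # Z' Sigma^-1 Z + G^-1
    rhs[i + 1] = sum(y[i]) / sigma2           # Z' Sigma^-1 y
sol = solve(C, rhs)
mu_hat, u_hat = sol[0], sol[1:]

# Closed form for balanced data: GLS mean and shrunken progeny deviations.
ybar = sum(map(sum, y)) / (I * n)
shrink = n * sigma2_s / (sigma2 + n * sigma2_s)
u_closed = [shrink * (sum(row) / n - ybar) for row in y]

print(mu_hat, u_hat)
```

The MME system here has only I + 1 = 6 unknowns, whereas the direct formula works with the N × N = 20 × 20 matrix V; that difference is the practical point of the MME.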



BLUE and BLUP

It is important to note that β and u require knowledge of G and Σ.

These matrices, however, are rarely known.

This is a problem without an exact solution using classical methods.

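The practical approach is to replace G and Σ by their estimates ($\hat{G}$ and $\hat{\Sigma}$) in the MME.

Note that if G and Σ are known, the variance-covariance matrix of the BLUE and of the BLUP prediction errors is the inverse of the MME coefficient matrix:

$$Var\begin{bmatrix} \hat{\beta} \\ \hat{u} - u \end{bmatrix} = \begin{bmatrix} X'\Sigma^{-1}X & X'\Sigma^{-1}Z \\ Z'\Sigma^{-1}X & Z'\Sigma^{-1}Z + G^{-1} \end{bmatrix}^{-1}$$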



BLUE and BLUP

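If G and Σ are unknown and their values are replaced in the MME by some sort of point estimates $\hat{G}$ and $\hat{\Sigma}$, the new solutions $\tilde{\beta}$ and $\tilde{u}$ of the system:

$$\begin{bmatrix} X'\hat{\Sigma}^{-1}X & X'\hat{\Sigma}^{-1}Z \\ Z'\hat{\Sigma}^{-1}X & Z'\hat{\Sigma}^{-1}Z + \hat{G}^{-1} \end{bmatrix} \begin{bmatrix} \tilde{\beta} \\ \tilde{u} \end{bmatrix} = \begin{bmatrix} X'\hat{\Sigma}^{-1}y \\ Z'\hat{\Sigma}^{-1}y \end{bmatrix}$$

are no longer BLUE and BLUP solutions, as they are not even linear functions of the data y.

It can also be shown that, generally:

$$Var\begin{bmatrix} \tilde{\beta} \\ \tilde{u} - u \end{bmatrix} > \begin{bmatrix} X'\hat{\Sigma}^{-1}X & X'\hat{\Sigma}^{-1}Z \\ Z'\hat{\Sigma}^{-1}X & Z'\hat{\Sigma}^{-1}Z + \hat{G}^{-1} \end{bmatrix}^{-1}$$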



Estimation methods for the variance components

Recall that α is the vector of all variance components in G and Σ

In most cases, α is not known, and needs to be replaced by an estimate $\hat{\alpha}$

Three frequently used estimation methods for α

Moment method or ANOVA Method (MM)

Maximum likelihood method (ML)

Restricted maximum likelihood method (REML)




Anova Estimation

Fit the model by assuming that the random effects in the model are fixed effects. Obtain the corresponding ANOVA table.

Compute the expected mean squares of the observed mean squares in the ANOVA table under the true assumption about the u's and ε.

Equate the observed mean squares to their expected mean squares and solve the resulting system of equations for each of the variance components.

Use the resulting solutions as the estimates of the variance components




Example

Consider the data set below, consisting of observations on half-sib families of I unrelated sires.

Sire
1          2          ...    I
y_11       y_21       ...    y_I1
y_12       y_22       ...    y_I2
...        ...        ...    ...
y_1n_1     y_2n_2     ...    y_In_I

The following model can be used to represent these data:

$$y_{ij} = \mu + s_i + \varepsilon_{ij}$$

where $y_{ij}$ represents the phenotypic trait observation of progeny j ($j = 1, 2, \ldots, n_i$) in family i, µ is a mean, $s_i$ is an effect common to all animals having sire i, and $\varepsilon_{ij}$ is a residual term.




Example

The sire effect $s_i$ is equivalent to the transmitting ability (which is equal to one-half of the additive genetic value) of sire i, as one-half of its genes are (randomly) transmitted to each of its $n_i$ progeny.

The residual terms $\varepsilon_{ij}$ refer to additional genetic effects (such as the effect of dams) and environmental components.

It is assumed that $s_i \sim N(0, \sigma^2_s)$ and $\varepsilon_{ij} \sim N(0, \sigma^2)$

The expectation and variance of $Y_{ij}$ are

$$E(Y_{ij}) = \mu \quad \text{and} \quad Var(Y_{ij}) = \sigma^2_s + \sigma^2$$




Example

The ANOVA table with expected mean squares is

Source      df       SSq      MSq      E[MSq]
Units       N − 1
Sire        I − 1    SSSq     SMSq     σ² + kσ²_s
Residual    N − I    RSSq     RMSq     σ²

where $k = \dfrac{1}{I-1}\left(N - \dfrac{1}{N}\sum_{i=1}^{I} n_i^2\right)$.

The ANOVA (MM) estimators for σ² and $\sigma^2_s$ are

$$\hat{\sigma}^2 = \frac{RSSq}{N - I} \quad \text{and} \quad \hat{\sigma}^2_s = \frac{SMSq - RMSq}{k} = \frac{1}{k}\left[SMSq - \hat{\sigma}^2\right]$$

In the specific case of balanced data, i.e. the same progeny size for all sires, $n_i = n = N/I$ and the ANOVA estimators become:

$$\hat{\sigma}^2 = RMSq = \frac{RSSq}{I(n-1)} \quad \text{and} \quad \hat{\sigma}^2_s = \frac{SMSq - RMSq}{n} = \frac{1}{n}\left[\frac{1}{I-1}SSSq - \hat{\sigma}^2\right]$$
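As an illustration of the balanced-data formulas, the sketch below applies them to the maize trial shown earlier, with progeny playing the role of the random factor (I = 5, n = 4). Treating that data set with this one-way model is an assumption made here for the example; the slides do not report these numerical results.

```python
# Maize trial from the slides: 5 progenies x 4 replicates (kg/10 m^2).
y = [[5.95, 6.21, 5.40, 5.18],
     [5.07, 6.71, 5.46, 4.98],
     [4.82, 5.11, 4.68, 4.52],
     [3.87, 4.16, 4.11, 4.84],
     [5.53, 5.82, 4.29, 4.70]]
I, n = len(y), len(y[0])
N = I * n

grand = sum(map(sum, y)) / N
means = [sum(row) / n for row in y]

# Sums of squares of the one-way ANOVA.
sssq = n * sum((m - grand) ** 2 for m in means)                     # between
rssq = sum((v - m) ** 2 for row, m in zip(y, means) for v in row)   # within

smsq = sssq / (I - 1)
rmsq = rssq / (N - I)

# Balanced-data ANOVA (moment) estimators.
sigma2_hat = rmsq
sigma2_s_hat = (smsq - rmsq) / n

print(f"SMSq = {smsq:.4f}, RMSq = {rmsq:.4f}")
print(f"sigma2 = {sigma2_hat:.4f}, sigma2_s = {sigma2_s_hat:.4f}")
```

Here SMSq exceeds RMSq, so the estimate of $\sigma^2_s$ is positive; with other data the same formula can return a negative value.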



Anova Estimation – Advantages

In general, the ANOVA approach works well for simple models (such as a one-way structure) or balanced data (such as data from designed experiments with no missing data).

The estimators of the variance components are unbiased.

One can often approximate the degrees of freedom corresponding to the estimated standard errors of estimators of estimable functions of the fixed effects by using Satterthwaite's Method.

For the sire example

$$\hat{\sigma}^2_s = \frac{SMSq - RMSq}{k}$$

with $n_s$ degrees of freedom given by

$$n_s = \frac{(SMSq - RMSq)^2}{\dfrac{(SMSq)^2}{I - 1} + \dfrac{(RMSq)^2}{N - I}}$$

SAS and R can produce the necessary information to perform these analyses.
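The Satterthwaite formula above is simple to evaluate once the two mean squares are available. In this sketch the function name and the numerical mean squares are illustrative assumptions, not results reported in the slides.

```python
def satterthwaite_df(smsq, rmsq, df_sire, df_resid):
    """Approximate degrees of freedom of (SMSq - RMSq) by Satterthwaite's method."""
    numerator = (smsq - rmsq) ** 2
    denominator = smsq ** 2 / df_sire + rmsq ** 2 / df_resid
    return numerator / denominator

# Illustrative values: I = 5 sires and N = 20 observations.
ns = satterthwaite_df(smsq=1.38, rmsq=0.32, df_sire=4, df_resid=15)
print(round(ns, 2))
```

The approximate degrees of freedom are smaller than the I − 1 = 4 attached to the sire mean square, reflecting the extra uncertainty from subtracting RMSq.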




Anova Estimation – Disadvantages

It is not recommended for more complex models and data structures, such as those generally found in plant and animal breeding and in longitudinal studies.

There is no unique way in which to form an ANOVA table when the data are not balanced.

The procedure can produce negative estimates of the variance components, which do not make sense.

If some of the expected mean squares of the random effects in the ANOVA table depend on fixed effects, the method cannot be applied. This problem can be avoided by placing all the fixed effects in the model first, followed by the random effects.




A number of methods have been proposed for estimating variance components in more complex scenarios, such as the expected mean squares approach of Henderson (1953) and the minimum norm quadratic unbiased estimation (Rao 1971a, 1971b), but maximum likelihood based methods are currently the most popular ones, especially the restricted (or residual) maximum likelihood (REML) approach, which attempts to correct for the well-known bias in the classical maximum likelihood (ML) estimation of variance components.

These two methods are briefly described next.




Maximum Likelihood Method

Maximum likelihood estimates of the variance components can be obtained by maximizing the log-likelihood L(β, G, Σ) = L(β, α) with respect to each element of G and Σ in α, after replacing β by $\hat{\beta} = (X'V^{-1}X)^{-1}X'V^{-1}y$

Alternatively, G, Σ, and β can be estimated simultaneously by maximizing their joint log-likelihood with respect to the variance components and the fixed effects. Standard errors can then be obtained from the inverse of the estimated Fisher information matrix. This approach provides an estimator for the variance-covariance matrix of $\hat{\beta}$ which takes into account the extra variability related to the estimation of the variance components.

This means finding the values of β, $\sigma^2_1, \sigma^2_2, \ldots, \sigma^2$ that maximize the likelihood function over the parameter space.



Example

As a simple example of maximum likelihood estimation of variance components, consider the balanced case (i.e., constant progeny sizes) of the half-sib families data set discussed previously, and the linear model:

$$y_{ij} = \mu + s_i + \varepsilon_{ij}$$

with the same definitions as before, but with the additional assumption of normality for both the sire and the residual effects, i.e.:

$$s_i \sim N(0, \sigma^2_s) \quad \text{and} \quad \varepsilon_{ij} \sim N(0, \sigma^2)$$



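In matrix notation, this model can be expressed as:

$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_I \end{bmatrix} = \begin{bmatrix} 1_n \\ 1_n \\ \vdots \\ 1_n \end{bmatrix}\mu + \begin{bmatrix} 1_n & 0_n & \cdots & 0_n \\ 0_n & 1_n & \cdots & 0_n \\ \vdots & \vdots & \ddots & \vdots \\ 0_n & 0_n & \cdots & 1_n \end{bmatrix} \begin{bmatrix} s_1 \\ s_2 \\ \vdots \\ s_I \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_I \end{bmatrix}$$

where $y_i = [y_{i1}, y_{i2}, \ldots, y_{in}]'$ represents the vector of observations on the progeny of sire i; $1_n$ and $0_n$ represent n-dimensional column vectors of 1's and 0's, respectively; and $\varepsilon_i = [\varepsilon_{i1}, \varepsilon_{i2}, \ldots, \varepsilon_{in}]'$ is the vector of residuals associated with the progeny of sire i.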



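Then, the vector of observations $y = [y_1', y_2', \ldots, y_I']'$ has a multivariate normal distribution with mean vector $1_N\mu$ and variance-covariance matrix given by $I_I \otimes (1_n\sigma^2_s 1_n') + I_N\sigma^2$. The density function (from which the likelihood function is obtained) can be written as:

$$\begin{aligned} p(y; \mu, \sigma^2_s, \sigma^2) &= \frac{1}{(2\pi)^{N/2}\,|I_I \otimes J_n\sigma^2_s + I_N\sigma^2|^{1/2}} \exp\left[-\frac{1}{2}(y - 1_N\mu)'(I_I \otimes J_n\sigma^2_s + I_N\sigma^2)^{-1}(y - 1_N\mu)\right] \\ &= (2\pi)^{-\frac{N}{2}}(\sigma^2)^{-\frac{N-I}{2}}(\sigma^2 + n\sigma^2_s)^{-\frac{I}{2}} \exp\left\{-\frac{1}{2}(y - 1_N\mu)'\left[\frac{1}{\sigma^2}I_N + I_I \otimes J_n\,\frac{1}{n}\left(\frac{1}{\sigma^2 + n\sigma^2_s} - \frac{1}{\sigma^2}\right)\right](y - 1_N\mu)\right\} \end{aligned}$$

where $J_n = 1_n 1_n'$ is an (n × n) matrix of 1's, and ⊗ is the Kronecker product.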



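The log-likelihood function can be written as

$$\ell(\mu, \sigma^2_s, \sigma^2) \propto -\frac{N-I}{2}\log(\sigma^2) - \frac{I}{2}\log(\sigma^2 + n\sigma^2_s) - \frac{1}{2\sigma^2}\sum_{i=1}^{I}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{i.})^2 - \frac{1}{2}\sum_{i=1}^{I}\frac{n(\bar{y}_{i.} - \mu)^2}{\sigma^2 + n\sigma^2_s}$$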

By taking the derivatives and setting them to 0, the following solutions are obtained:

$$\hat{\mu} = \bar{y}_{..}, \quad \hat{\sigma}^2 = RMSq = \frac{RSSq}{I(n-1)} \quad \text{and} \quad \hat{\sigma}^2_s = \frac{1}{n}\left[\frac{SSSq}{I} - \hat{\sigma}^2\right]$$

from which maximum likelihood estimates of the variance components are obtained, except if $\hat{\sigma}^2_s < 0$, in which case the estimate is set to zero.

Note the difference between the maximum likelihood and the ANOVA estimators of $\sigma^2_s$. It is well known that maximum likelihood estimates of variance components are biased downwards, as they do not take into account the degrees of freedom used for estimating the fixed effects.
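The closed-form ML solutions can be checked against the log-likelihood itself. The sketch below evaluates the balanced-data log-likelihood (additive constants dropped) on the maize trial, again treating progeny as the random factor, which is an assumption of this illustration; it then compares the ML and ANOVA estimates of $\sigma^2_s$.

```python
import math

# Maize trial from the slides: 5 progenies x 4 replicates.
y = [[5.95, 6.21, 5.40, 5.18],
     [5.07, 6.71, 5.46, 4.98],
     [4.82, 5.11, 4.68, 4.52],
     [3.87, 4.16, 4.11, 4.84],
     [5.53, 5.82, 4.29, 4.70]]
I, n = len(y), len(y[0])
means = [sum(row) / n for row in y]
grand = sum(means) / I              # equals the overall mean for balanced data
sssq = n * sum((m - grand) ** 2 for m in means)
rssq = sum((v - m) ** 2 for row, m in zip(y, means) for v in row)

def loglik(mu, sigma2_s, sigma2):
    """Balanced one-way random model log-likelihood, constants dropped."""
    lam = sigma2 + n * sigma2_s
    between = sum(n * (m - mu) ** 2 for m in means)
    return (-(I * n - I) / 2 * math.log(sigma2) - I / 2 * math.log(lam)
            - rssq / (2 * sigma2) - between / (2 * lam))

# Closed-form ML solutions (sigma2_s truncated at zero if negative).
mu_ml = grand
sigma2_ml = rssq / (I * (n - 1))
sigma2_s_ml = max(0.0, (sssq / I - sigma2_ml) / n)

# ANOVA estimator of sigma2_s, for comparison.
sigma2_s_anova = max(0.0, (sssq / (I - 1) - sigma2_ml) / n)

best = loglik(mu_ml, sigma2_s_ml, sigma2_ml)
print(sigma2_ml, sigma2_s_ml, sigma2_s_anova, best)
```

The ML estimate of $\sigma^2_s$ divides SSSq by I rather than I − 1, so it is smaller than the ANOVA estimate, in line with the downward bias noted above.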



Some properties of the direct product of matrices

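If $A_r$ and $B_c$ are square matrices of order r and c, respectively,

$$A_r \otimes B_c = \begin{bmatrix} a_{11}B & \cdots & a_{1r}B \\ \vdots & \ddots & \vdots \\ a_{r1}B & \cdots & a_{rr}B \end{bmatrix}$$

where ⊗ is called the direct (Kronecker) product operator

In general, $A \otimes B \neq B \otimes A$

If u and v are vectors, then $u' \otimes v = v \otimes u' = vu'$

If $D_{(n)}$ is a diagonal matrix and A is any matrix, then:

$$D \otimes A = d_{11}A \oplus d_{22}A \oplus \cdots \oplus d_{nn}A$$

If matrix dimensions are compatible

$(A \otimes B)(C \otimes D) = AC \otimes BD$

$(\alpha_A A \otimes \alpha_B B) = \alpha_A\alpha_B(A \otimes B)$

$(A \otimes B)^T = A^T \otimes B^T$

$(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}$

$\operatorname{rank}(A \otimes B) = \operatorname{rank}(A)\operatorname{rank}(B)$

$\operatorname{tr}(A \otimes B) = \operatorname{tr}(A)\operatorname{tr}(B)$

$\det(A_r \otimes B_c) = \det(A)^{c}\det(B)^{r}$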
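Some of these properties can be verified numerically with a few lines of code. The helpers `kron`, `matmul` and `trace` below are hand-written for this sketch, and the matrices and variance values are arbitrary illustrative choices; the last part builds the sire-model covariance $I_I \otimes J_n\sigma^2_s + I_N\sigma^2$ used in the example.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def matmul(a, b):
    """Ordinary matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def eye(m):
    return [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[0.0, 1.0], [1.0, 1.0]]
C = [[2.0, 0.0], [1.0, 1.0]]
D = [[1.0, 1.0], [0.0, 2.0]]

# Mixed-product property: (A (x) B)(C (x) D) = AC (x) BD.
lhs = matmul(kron(A, B), kron(C, D))
rhs = kron(matmul(A, C), matmul(B, D))

# Trace property and non-commutativity.
tr_ok = trace(kron(A, B)) == trace(A) * trace(B)
commutes = kron(A, B) == kron(B, A)

# Sire-model covariance with I = 3 sires, n = 2 progeny (illustrative values).
n_sires, n_prog, sigma2_s, sigma2 = 3, 2, 1.0, 0.5
J = [[1.0] * n_prog for _ in range(n_prog)]
block = kron(eye(n_sires), J)
ident = eye(n_sires * n_prog)
V = [[sigma2_s * block[i][j] + sigma2 * ident[i][j]
      for j in range(n_sires * n_prog)] for i in range(n_sires * n_prog)]

print(lhs == rhs, tr_ok, commutes)
```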




Maximum Likelihood Method – Disadvantages

Numerically intensive

Solving the likelihood equations requires an iterative process which may or may not converge. Even when it converges, it may converge to a local maximum rather than to the global maximum.

Tends to underestimate the variance components.

Distributional properties are not known except asymptotically.




Restricted Maximum Likelihood Method

An alternative likelihood-based method for inferring variance components in mixed models is the restricted (or residual) maximum likelihood approach (REML), which corrects the bias associated with maximum likelihood estimates by taking into account the degrees of freedom used for estimating the fixed effects.

REML estimators of the variance components are found by maximizing that part of the likelihood function that is invariant to the fixed effects in the model.

We have Y = Xβ + Zu + ε. The REML approach for estimation of variance components maximizes the likelihood function of a set of error contrasts Y∗ = LY, where L is a full-rank (n − rank(X)) × n matrix (n being the dimension of Y) with rows orthogonal to the columns of the incidence matrix X, that is, LX = 0.




Restricted Maximum Likelihood Method

Then the vector Y∗ = LY follows a multivariate normal distribution with null mean vector and variance-covariance matrix LVL′ = L(ZGZ′ + Σ)L′, that is,

$$Y^* \sim N(0,\; \sigma^2_1 LZ_1Z_1'L' + \sigma^2_2 LZ_2Z_2'L' + \ldots + \sigma^2 LL').$$

Note that the distribution of Y∗ does not depend on β; hence the likelihood formed from Y∗ depends only on the variance components.

The residual likelihood function for the variance components is then:

$$L(\alpha; y) = (2\pi)^{-(n-p)/2}\,|LVL'|^{-1/2} \exp\left\{-\frac{1}{2}Y^{*\prime}(LVL')^{-1}Y^*\right\}$$

where p = rank(X).

The REML estimates of the variance components are those values of $\sigma^2_1, \sigma^2_2, \ldots, \sigma^2$ that maximize the restricted likelihood function L(α; y).

Another approach for obtaining the residual likelihood function for the variance components is to integrate the fixed effects out of the 'full' likelihood function, i.e.:

$$L(\alpha; y) = \int L(\beta, \alpha | y)\,d\beta$$



Example

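Recall the balanced half-sib families data set, and its associated likelihood function:

$$L(\mu, \sigma^2_s, \sigma^2) = (2\pi)^{-\frac{N}{2}}(\sigma^2)^{-\frac{N-I}{2}}(\sigma^2 + n\sigma^2_s)^{-\frac{I}{2}} \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{I}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{i.})^2 - \frac{1}{2}\sum_{i=1}^{I}\frac{n(\bar{y}_{i.} - \mu)^2}{\sigma^2 + n\sigma^2_s}\right]$$

Its residual likelihood is then:

$$\begin{aligned} L(\sigma^2_s, \sigma^2) &= \int L(\mu, \sigma^2_s, \sigma^2)\,d\mu \\ &= (2\pi)^{-\frac{N}{2}}(\sigma^2)^{-\frac{N-I}{2}}(\sigma^2 + n\sigma^2_s)^{-\frac{I}{2}} \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{I}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{i.})^2\right] \int \exp\left[-\frac{1}{2}\sum_{i=1}^{I}\frac{n(\bar{y}_{i.} - \mu)^2}{\sigma^2 + n\sigma^2_s}\right]d\mu \end{aligned}$$

which is equal to:

$$L(\sigma^2_s, \sigma^2) = (2\pi)^{-\frac{N}{2}}(\sigma^2)^{-\frac{N-I}{2}}\lambda^{-\frac{I}{2}} \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{I}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{i.})^2\right] \exp\left[-\frac{n}{2\lambda}\sum_{i=1}^{I}(\bar{y}_{i.} - \bar{y}_{..})^2\right]\sqrt{\frac{2\pi\lambda}{In}}$$

where $\lambda = \sigma^2 + n\sigma^2_s$. The last two factors follow from $\sum_i n(\bar{y}_{i.} - \mu)^2 = n\sum_i(\bar{y}_{i.} - \bar{y}_{..})^2 + In(\bar{y}_{..} - \mu)^2$ and the Gaussian integral over µ.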



By taking the derivatives with respect to λ and σ², and by using the invariance property of maximum likelihood estimators, the following solutions are obtained:

$$\hat{\sigma}^2 = RMSq = \frac{RSSq}{I(n-1)} \quad \text{and} \quad \hat{\sigma}^2_s = \frac{1}{n}\left[\frac{SSSq}{I-1} - \hat{\sigma}^2\right]$$

which are the REML estimates of the variance components, except if $\hat{\sigma}^2_s < 0$, i.e. if $SSSq < \frac{I-1}{I(n-1)}RSSq$, in which case the estimate is set to zero.

As explicit forms of ML and REML estimators are often not available for more complex mixed effects models, ML and REML estimates are generally obtained by iterative approaches such as the expectation-maximization (EM) algorithm and Newton-Raphson-based procedures.
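For the balanced one-way case the REML solution can be checked directly against the residual log-likelihood, which (dropping constants) is $-\frac{N-I}{2}\log\sigma^2 - \frac{I-1}{2}\log\lambda - \frac{RSSq}{2\sigma^2} - \frac{SSSq}{2\lambda}$ with $\lambda = \sigma^2 + n\sigma^2_s$. The sketch below uses the maize trial with progeny as the random factor, which is an assumption of this illustration.

```python
import math

# Maize trial from the slides: 5 progenies x 4 replicates.
y = [[5.95, 6.21, 5.40, 5.18],
     [5.07, 6.71, 5.46, 4.98],
     [4.82, 5.11, 4.68, 4.52],
     [3.87, 4.16, 4.11, 4.84],
     [5.53, 5.82, 4.29, 4.70]]
I, n = len(y), len(y[0])
means = [sum(row) / n for row in y]
grand = sum(means) / I
sssq = n * sum((m - grand) ** 2 for m in means)
rssq = sum((v - m) ** 2 for row, m in zip(y, means) for v in row)

def restricted_loglik(sigma2_s, sigma2):
    """Residual log-likelihood of the balanced one-way model, constants dropped."""
    lam = sigma2 + n * sigma2_s
    return (-(I * n - I) / 2 * math.log(sigma2) - (I - 1) / 2 * math.log(lam)
            - rssq / (2 * sigma2) - sssq / (2 * lam))

# Closed-form REML and ML solutions (truncated at zero if negative).
sigma2_reml = rssq / (I * (n - 1))
sigma2_s_reml = max(0.0, (sssq / (I - 1) - sigma2_reml) / n)
sigma2_s_ml = max(0.0, (sssq / I - sigma2_reml) / n)

best = restricted_loglik(sigma2_s_reml, sigma2_reml)
print(sigma2_reml, sigma2_s_reml, sigma2_s_ml)
```

For these balanced data the REML estimate of $\sigma^2_s$ coincides with the ANOVA estimate and exceeds the ML estimate.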



Advantages

Less numerically intensive than the Maximum Likelihood Method.

The REML estimates and the ANOVA estimates agree when the data are balanced and all MM estimates of the variance components are non-negative.

REML estimates tend to be less biased than the Maximum Likelihood estimates

Disadvantage

The distributional properties of these estimators are not known, except asymptotically.



Example

Yi = Xi



Model selection

Inference for fixed effects β

Wald tests

t- and F-tests

LR tests (with ML, not with REML)

Inference for components of variance α

Wald tests

LR tests (even with REML)

Caution 1: Marginal vs hierarchical model

Caution 2: Boundary problems

To test random terms, REML Ratio Tests (REMLRTs) are used when the two models being compared are nested.

Akaike (AIC) and Bayesian (BIC) Information Criteria are used when the two models being compared are not nested.

When the REMLRT is a test of whether a component constrained to be nonnegative is zero, the null distribution is a mixture of χ² distributions (Self & Liang, 1987).
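For the simplest boundary case, a single variance component tested against zero, the mixture is commonly taken to be an equal-weight mixture of $\chi^2_0$ (a point mass at zero) and $\chi^2_1$. That specific weighting is an assumption of this sketch and is not stated on the slide; the function names and the test statistic value are also hypothetical.

```python
import math

def chi2_sf_1df(x):
    """P(chi2 with 1 df > x), using chi2_1 = Z^2 with Z standard normal."""
    return math.erfc(math.sqrt(x / 2.0))

def boundary_pvalue(stat):
    """p-value under the assumed 50:50 mixture of chi2_0 and chi2_1."""
    if stat <= 0.0:
        return 1.0
    return 0.5 * chi2_sf_1df(stat)

stat = 2.71   # hypothetical REML ratio test statistic
naive_p = chi2_sf_1df(stat)
mixture_p = boundary_pvalue(stat)
print(naive_p, mixture_p)
```

Referring the statistic to a plain $\chi^2_1$ doubles the p-value in this case, so ignoring the boundary makes the test conservative.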



Likelihood ratio test for fixed effects

Test model M0 nested (embedded) in M1

with m0 = dim(Vec(X0)) and m1 = dim(Vec(X1)) (m0 < m1).

Let $\ell_{M_0}$ and $\ell_{M_1}$ denote the associated log-likelihoods calculated at the estimated parameter values.

Then

$$X^2 = -2\ell_{M_0} + 2\ell_{M_1} \sim_{H_0} \chi^2_{m_1 - m_0}$$



Goodness of fit criterion

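Akaike's Information Criterion

\[
\mathrm{AIC} = -2\log L(\hat\beta_{ml}, \hat\sigma_{ml}, y) + 2K
\]

Bayesian Information Criterion

\[
\mathrm{BIC} = -2\log L(\hat\beta_{ml}, \hat\sigma_{ml}, y) + K\log(n)
\]

where $K$ is the number of estimated parameters and $n$ is the number of observations.

Other criteria use other penalty terms.

Danger! Software often reports AIC or BIC calculated at REML-estimated parameter values.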
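The two criteria can be checked by hand against the built-in `AIC()` and `BIC()` functions. This is a small sketch in base R, assuming a fixed-effects fit of the maize data for illustration; `aic_hand` and `bic_hand` are hypothetical names.

```r
yield <- c(5.95, 6.21, 5.40, 5.18,  5.07, 6.71, 5.46, 4.98,
           4.82, 5.11, 4.68, 4.52,  3.87, 4.16, 4.11, 4.84,
           5.53, 5.82, 4.29, 4.70)
progeny <- factor(rep(1:5, each = 4))

fit <- lm(yield ~ progeny)
ll  <- logLik(fit)        # ML log-likelihood
K   <- attr(ll, "df")     # estimated parameters: 5 means + residual variance
n   <- nobs(fit)          # number of observations

aic_hand <- -2 * as.numeric(ll) + 2 * K
bic_hand <- -2 * as.numeric(ll) + K * log(n)

c(AIC = aic_hand, BIC = bic_hand)
all.equal(aic_hand, AIC(fit))   # TRUE
all.equal(bic_hand, BIC(fit))   # TRUE
```

For `lme()` fits, `logLik()` returns the REML log-likelihood when the model was fitted by REML, so criteria from models with different fixed effects are only comparable after refitting with `method = "ML"`.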



Software

R functions

lm() – classical linear model

aov() – analysis of variance model

glm() – generalized linear model

gls() – generalized least squares model

gee() – generalized estimating equations (package gee)

lme() – linear mixed models (package nlme)

nlme() – non-linear mixed model (package nlme)

nls() – non-linear regression model (package stats)

lmer() – linear mixed models (package lme4)

ASReml: http://www.vsni.co.uk/products/asreml
ASReml forum: www.vsni.co.uk/forum
Cookbook: http://uncronopio.org/ASReml



Differences between lme4 and nlme

(B. Venables, 2010, personal communication)

1 With nlme the fixed and random parts of the model are specified using two formulae; in lme4 they are specified in the one formula with the random parts "added on" to the fixed parts.

2 With nlme you have no generalized linear mixed model fitter, though glmmPQL in the MASS library can be used for some GLMMs, and it uses the nlme library. lme4 has a GLMM built-in. It allows you to specify families in the glm sense, but not all glm families are supported, yet.

3 nlme offers non-linear mixed effect models; lme4 does not and never will.

4 The nlme package allows you to specify variance heterogeneity and correlation patterns; the only way to do this within lme4 is to use a glm family, which is often not what you want to do.

5 The nlme package has a gls function for "generalized least squares". This allows you to make use of the variance heterogeneity and correlation patterns feature even if the model does not contain any random effects. This is handy.



Differences between lme4 and nlme

(B. Venables, 2010, personal communication, cont.)

6 (Probably the most important difference.) nlme is hard to use with crossed random effects, but is very well-developed for nested random effects. lme4 is the opposite: it handles crossed random effects well and using it with nested random effects is still simple enough, but a bit more work than with nlme.

7 nlme uses an older algorithm which struggles for large data sets. lme4 uses a newer algorithm and can handle quite large data sets very quickly. (I think the SAS Proc mixed, though, will handle even bigger ones.)

8 A drawback of lme4 is that, at this stage, it is relatively under-developed. Some important things are missing.

9 ASREML is wonderful, but it only handles a relatively small set of models (though the most important set, of course).



(C. Brien, 2010, personal communication)

1 ASREML does a wide range of heterogeneous variances and correlations for nested and crossed random effects, although probably not the full range of heterogeneous, nested models that nlme does. ASREML also does GLMMs, similar to GLMMPQL. It does not do the non-linear models.

2 ASREML is good for experiments and lme4/nlme are good for large surveys, because that is what they were developed for.



Software

SAS procedures

PROC GLM – general linear model

PROC MIXED – linear mixed model

PROC GENMOD – generalized linear model

PROC GLIMMIX – generalized linear mixed model

PROC NLMIXED – non-linear mixed model



(next three slides are from Demetrio, Mortier and Trottier (2009))

Basic SAS code

1/ proc mixed data=variety.eval;       call procedure and declare data set
2/ class block type dose;              define block, type, dose as factors
3/ model y = type|dose ;               define fixed effects in the model
4/ random block block*dose ;           declare random effects
5/ ods select Tests3 CovParms; run;    output Type 3 tests and covariance parameters



1/ proc mixed statement <options>;

DATA= SAS data set. Name of the SAS data set to be used by PROC MIXED. The default is the most recently created data set.

METHOD= estimation method: REML (default) or ML.

COVTEST requests asymptotic standard errors and Wald Z-tests for the variance-covariance structure parameter estimates.



3/ MODEL statement <options>;

describes the linear relation between Y and the fixed covariables

S or SOLUTION: output the solution for the fixed effects;

DDFM= method used to compute the approximate degrees of freedom: CONTAIN (default), RES, KR, SATTERTH

outpred=Names1: the output data set Names1 contains the predicted values $X\hat\beta + Z\hat u$, sd, ...

outpredm=Names2: the output data set Names2 contains the predicted values $X\hat\beta$, sd, ...

4/ RANDOM statement

random block / Solution;

↪→ BLUP and t-test



Maize trial

Example

5 progenies of a maize population were investigated

the trial was conducted using a completely randomized design with 4 replicates of each progeny

the response variable was the weight of corn-cob (kg/10m2)

Progenies   Replicates
1           5.95   6.21   5.40   5.18
2           5.07   6.71   5.46   4.98
3           4.82   5.11   4.68   4.52
4           3.87   4.16   4.11   4.84
5           5.53   5.82   4.29   4.70



Completely Randomized Design (CRD)

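For a CRD with random treatment effects, the model is

\[
Y_{jk} = \tau_k + \varepsilon_{jk} \quad\text{with } \tau_k \text{ random and } \varepsilon_{jk} \text{ random}
\]

$\tau_k \sim N(0, \sigma^2_T)$ and $\varepsilon_{jk} \sim N(0, \sigma^2)$

$\tau_k$ and $\varepsilon_{jk}$, $\tau_k$ and $\tau_{k'}$ ($k \neq k'$), $\varepsilon_{jk}$ and $\varepsilon_{j'k'}$ ($j \neq j'$ and/or $k \neq k'$) are independent

then

\[
\mathrm{Var}(Y_{jk}) = \mathrm{Var}(\tau_k + \varepsilon_{jk}) = \sigma^2 + \sigma^2_T
\]

\[
\mathrm{Cov}(Y_{jk}, Y_{j'k}) = \mathrm{Cov}(\tau_k + \varepsilon_{jk},\ \tau_k + \varepsilon_{j'k}) = \sigma^2_T \quad\text{(observations from the same treatment)}
\]

\[
\mathrm{Cov}(Y_{jk}, Y_{jk'}) = \mathrm{Cov}(\tau_k + \varepsilon_{jk},\ \tau_{k'} + \varepsilon_{jk'}) = 0 \quad\text{(observations from different treatments)}
\]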



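The variance matrices of the observations for the fixed and random models when r = 2, t = 3 are

\[
\mathrm{Var}(\mathbf{Y}) = \boldsymbol\Sigma =
\begin{bmatrix}
\sigma^2 & 0 & 0 & 0 & 0 & 0\\
0 & \sigma^2 & 0 & 0 & 0 & 0\\
0 & 0 & \sigma^2 & 0 & 0 & 0\\
0 & 0 & 0 & \sigma^2 & 0 & 0\\
0 & 0 & 0 & 0 & \sigma^2 & 0\\
0 & 0 & 0 & 0 & 0 & \sigma^2
\end{bmatrix},
\]

\[
\mathrm{Var}(\mathbf{Y}) = \mathbf{Z}\mathbf{G}\mathbf{Z}' + \boldsymbol\Sigma =
\begin{bmatrix}
\sigma^2+\sigma^2_T & \sigma^2_T & 0 & 0 & 0 & 0\\
\sigma^2_T & \sigma^2+\sigma^2_T & 0 & 0 & 0 & 0\\
0 & 0 & \sigma^2+\sigma^2_T & \sigma^2_T & 0 & 0\\
0 & 0 & \sigma^2_T & \sigma^2+\sigma^2_T & 0 & 0\\
0 & 0 & 0 & 0 & \sigma^2+\sigma^2_T & \sigma^2_T\\
0 & 0 & 0 & 0 & \sigma^2_T & \sigma^2+\sigma^2_T
\end{bmatrix}
\]

In this case:

\[
\mathbf{Z} =
\begin{bmatrix}
\mathbf{1}_2 & \mathbf{0}_{2\times 1} & \mathbf{0}_{2\times 1}\\
\mathbf{0}_{2\times 1} & \mathbf{1}_2 & \mathbf{0}_{2\times 1}\\
\mathbf{0}_{2\times 1} & \mathbf{0}_{2\times 1} & \mathbf{1}_2
\end{bmatrix},
\quad \mathbf{G} = \sigma^2_T \mathbf{I}_3 \quad\text{and}\quad \boldsymbol\Sigma = \sigma^2 \mathbf{I}_6
\]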
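The structure Var(Y) = ZGZ′ + Σ for r = 2, t = 3 can be reproduced numerically. The sketch below uses base R; the numerical values of the two variance components are illustrative assumptions, not estimates from any data set.

```r
nr <- 2; nt <- 3        # replicates per treatment, number of treatments
sigma2  <- 1.0          # residual variance (illustrative value)
sigma2T <- 0.5          # treatment variance component (illustrative value)

Z     <- kronecker(diag(nt), matrix(1, nr, 1))  # I_t (x) 1_r, a 6 x 3 matrix
G     <- sigma2T * diag(nt)                     # G = sigma_T^2 I_3
Sigma <- sigma2 * diag(nr * nt)                 # Sigma = sigma^2 I_6

V <- Z %*% G %*% t(Z) + Sigma
V
```

The result is block diagonal: each 2 × 2 block has σ² + σ²_T on the diagonal and σ²_T off the diagonal, and observations from different treatments have zero covariance.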



Expected mean squares for an ANOVA

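Let $\mathbf{Y}$ be an $n \times 1$ vector of random variables with $E[\mathbf{Y}] = \boldsymbol\mu$ and $\mathrm{Var}[\mathbf{Y}] = \mathbf{V}$, where $\boldsymbol\mu$ is an $n \times 1$ vector of expected values and $\mathbf{V}$ is an $n \times n$ matrix. Let $\mathbf{A}$ be an $n \times n$ matrix of real numbers. Then

\[
E(\mathbf{Y}^\top \mathbf{A}\mathbf{Y}) = \mathrm{tr}(\mathbf{A}\mathbf{V}) + \boldsymbol\mu^\top \mathbf{A}\boldsymbol\mu
\]

For a fixed CRD model, $E(\mathbf{Y}) = \mathbf{X}_T\boldsymbol\tau$ and $\mathbf{V} = \sigma^2\mathbf{I}_n$.

For a random CRD model, $E(\mathbf{Y}) = \mathbf{1}_n\mu$ and $\mathbf{V} = \sigma^2\mathbf{I}_n + r\sigma^2_T\mathbf{M}_T$,

where $\mathbf{X}_T = \mathbf{I}_t \otimes \mathbf{1}_r$ and $\mathbf{M}_T = \mathbf{X}_T(\mathbf{X}_T^\top\mathbf{X}_T)^{-1}\mathbf{X}_T^\top = r^{-1}\,\mathbf{I}_t \otimes \mathbf{J}_r$.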



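The expected mean squares under the fixed and random models are given in the following table

Source        df      SSq           MSq (s²)              E[MSq] (fixed)   E[MSq] (random)
Units         n − 1   Y′Q_U Y
 Treatments   t − 1   Y′Q_T Y       Y′Q_T Y/(t − 1)       σ² + q_T(Ψ)      σ² + rσ²_T
 Residual     n − t   Y′Q_URes Y    Y′Q_URes Y/(n − t)    σ²               σ²

where

\[
q_T(\Psi) = \frac{\Psi'\mathbf{Q}_T\Psi}{t-1} = \frac{\sum_{k=1}^{t} r(\tau_k - \bar\tau_{.})^2}{t-1}
\]

\[
\mathbf{M}_U = \mathbf{I}_t \otimes \mathbf{I}_r = \mathbf{I}_{tr}, \qquad \mathbf{M}_G = n^{-1}\mathbf{J}_t \otimes \mathbf{J}_r = n^{-1}\mathbf{J}_n
\]

\[
\mathbf{Q}_T = \mathbf{M}_T - \mathbf{M}_G, \qquad \mathbf{Q}_U = \mathbf{M}_U - \mathbf{M}_G, \qquad \mathbf{Q}_{URes} = \mathbf{M}_U - \mathbf{M}_T
\]

$\sigma^2$ and $\sigma^2_T$ are called components of variance

ANOVA estimators:

\[
\hat\sigma^2 = \mathrm{RMSq}, \qquad \hat\sigma^2_T = \frac{\mathrm{TMSq} - \mathrm{RMSq}}{r}
\]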
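The expected mean squares of the random CRD model follow from E(Y′AY) = tr(AV) + µ′Aµ applied to the projectors of the ANOVA. The following base R sketch checks this numerically; the design size and the values of µ, σ² and σ²_T are illustrative assumptions, and `ems()` is a hypothetical helper.

```r
nr <- 4; nt <- 5; n <- nr * nt            # replicates, treatments, units
sigma2 <- 0.30; sigma2T <- 0.25; mu <- 5  # illustrative values

J  <- function(k) matrix(1, k, k)         # k x k matrix of ones
MU <- diag(n)
MG <- J(n) / n
MT <- kronecker(diag(nt), J(nr)) / nr     # r^{-1} I_t (x) J_r

QT   <- MT - MG                           # treatments projector
QRes <- MU - MT                           # residual projector

V <- sigma2 * diag(n) + nr * sigma2T * MT # Var(Y) under the random model
m <- rep(mu, n)                           # E(Y) = 1_n mu

# E[MSq] = { tr(QV) + m'Qm } / df
ems <- function(Q, df) (sum(diag(Q %*% V)) + drop(t(m) %*% Q %*% m)) / df

E_TMSq <- ems(QT, nt - 1)    # equals sigma2 + nr * sigma2T
E_RMSq <- ems(QRes, n - nt)  # equals sigma2
c(E_TMSq = E_TMSq, E_RMSq = E_RMSq)
```

Solving these two expectations for the variance components gives the ANOVA (moment) estimators.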



Maize trial

ANOVA table using R

Source      df   SSq      MSq      F        Prob
Plots       19
 Progeny     4   5.5078   1.3770   4.2872   0.01644*
 Residual   15   4.8177   0.3212

Estimates of the components of variance

        MM       ML       REML
σ²_P    0.2639   0.1951   0.2639
σ²      0.3212   0.3212   0.3212
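The moment (MM) estimates in this table can be recovered directly from the mean squares. A short base R sketch follows; the object names are hypothetical, and the data are the maize yields listed in the slides.

```r
yield <- c(5.95, 6.21, 5.40, 5.18,  5.07, 6.71, 5.46, 4.98,
           4.82, 5.11, 4.68, 4.52,  3.87, 4.16, 4.11, 4.84,
           5.53, 5.82, 4.29, 4.70)
progeny <- factor(rep(1:5, each = 4))
r <- 4                                   # replicates per progeny

tab  <- anova(lm(yield ~ progeny))
TMSq <- tab[["Mean Sq"]][1]              # progeny mean square
RMSq <- tab[["Mean Sq"]][2]              # residual mean square

sigma2_hat  <- RMSq                      # ANOVA estimator of sigma^2
sigma2P_hat <- (TMSq - RMSq) / r         # ANOVA estimator of sigma_P^2

round(c(TMSq = TMSq, RMSq = RMSq,
        sigma2 = sigma2_hat, sigma2P = sigma2P_hat), 4)
```

Because the data are balanced and the moment estimate is positive, these values coincide with the REML column.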


SAS program

data prog;

input Progeny Yield @@;

cards;

1 5.95 3 4.68

1 6.21 3 4.52

1 5.40 4 3.87

1 5.18 4 4.16

2 5.07 4 4.11

2 6.71 4 4.84

2 5.46 5 5.53

2 4.98 5 5.82

3 4.82 5 4.29

3 5.11 5 4.70

;

* Moment Method;

proc glm data=prog;

class Progeny;

model Yield = Progeny;

run;

* Restricted Maximum Likelihood Method;

proc mixed data=prog;

class Progeny;

model Yield = / solution ddfm=sat;

random Progeny / solution ;

run;

* Maximum Likelihood Method;

proc mixed data=prog method=ML;

class Progeny;

model Yield = / solution ddfm=sat;

random Progeny / solution ;

run;

R program

CRDMaize.dat <- data.frame(Plots = factor(c(1:20)),
                           Progeny = factor(rep(c(1:5), each=4)),
                           Yield = c(5.95,6.21,5.40,5.18,5.07,6.71,5.46,4.98,4.82,5.11,
                                     4.68,4.52,3.87,4.16,4.11,4.84,5.53,5.82,4.29,4.70))
CRDMaize.dat

# Moment Method
CRDMaize.lm <- lm(Yield ~ Progeny, CRDMaize.dat)
anova(CRDMaize.lm)
(1.3770-0.3212)/4                 # sigmaP^2_hat = (TMSq - RMSq)/r
(summary(CRDMaize.lm)$sigma)^2    # sigma^2_hat = RMSq

require(nlme)
# Restricted Maximum Likelihood Method
CRDMaize.reml <- lme(Yield ~ 1, random = ~1|Progeny, CRDMaize.dat, method="REML")
summary(CRDMaize.reml)
(summary(CRDMaize.reml)$sigma)^2  # sigma^2_hat
random.effects(CRDMaize.reml)     # predicted progeny effects
CRDMaize.reml$coef
coef(CRDMaize.reml)
summary(lm(Yield ~ Progeny-1, CRDMaize.dat))   # progeny means from the fixed model

# Maximum Likelihood Method
CRDMaize.ml <- update(CRDMaize.reml, method="ML")
#CRDMaize.ml <- lme(Yield ~ 1, random = ~1|Progeny, CRDMaize.dat, method="ML")
summary(CRDMaize.ml, corr = FALSE)
(summary(CRDMaize.ml)$sigma)^2
random.effects(CRDMaize.ml)
coef(CRDMaize.ml)


Penicillin yield (Brien, 2009)

Example

the effects of four treatments (A, B, C and D) on the yield of penicillin are to be investigated

it was known that corn steep liquor, an important raw material in producing penicillin, is highly variable from one blending of it to another

to ensure that the results of the experiment apply to more than one blend, several blends are to be used in the experiment

the trial was conducted using the same blend in four flasks and randomizing the four treatments to these four flasks

altogether five blends were utilized

the blends used can be regarded as a sample from a population of blends



Data

         Treatment
Blend    A    B    C    D
1        89   88   97   94
2        84   77   92   79
3        81   87   87   85
4        87   92   89   84
5        79   81   80   88



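Randomized Complete Block Design (RCBD)

Considering a randomized complete block design, the model we are supposing is:

\[
Y_{jk} = \beta_j + \tau_k + \varepsilon_{jk} \quad\text{with } \tau_k \text{ fixed and } \beta_j \text{ and } \varepsilon_{jk} \text{ random}
\]

$\beta_j \sim N(0, \sigma^2_B)$ and $\varepsilon_{jk} \sim N(0, \sigma^2)$

$\beta_j$ and $\varepsilon_{jk}$, $\beta_j$ and $\beta_{j'}$ ($j \neq j'$), $\varepsilon_{jk}$ and $\varepsilon_{j'k'}$ ($j \neq j'$ and/or $k \neq k'$) are independent

then

\[
\mathrm{Var}(Y_{jk}) = \mathrm{Var}(\beta_j + \tau_k + \varepsilon_{jk}) = \mathrm{Var}(\beta_j + \varepsilon_{jk}) = \sigma^2 + \sigma^2_B
\]

\[
\mathrm{Cov}(Y_{jk}, Y_{j'k}) = \mathrm{Cov}(\beta_j + \tau_k + \varepsilon_{jk},\ \beta_{j'} + \tau_k + \varepsilon_{j'k}) = 0
\]

(observations from different blocks and the same treatment)

\[
\mathrm{Cov}(Y_{jk}, Y_{jk'}) = \mathrm{Cov}(\beta_j + \tau_k + \varepsilon_{jk},\ \beta_j + \tau_{k'} + \varepsilon_{jk'}) = \sigma^2_B
\]

(observations from the same block and different treatments)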



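The variance matrices of the observations when b = 2, t = 3

Block fixed, treatment fixed

\[
\mathrm{Var}(\mathbf{Y}) = \boldsymbol\Sigma =
\begin{bmatrix}
\sigma^2 & 0 & 0 & 0 & 0 & 0\\
0 & \sigma^2 & 0 & 0 & 0 & 0\\
0 & 0 & \sigma^2 & 0 & 0 & 0\\
0 & 0 & 0 & \sigma^2 & 0 & 0\\
0 & 0 & 0 & 0 & \sigma^2 & 0\\
0 & 0 & 0 & 0 & 0 & \sigma^2
\end{bmatrix},
\]

Block random, treatment fixed

\[
\mathrm{Var}(\mathbf{Y}) = \mathbf{Z}\mathbf{G}\mathbf{Z}' + \boldsymbol\Sigma =
\begin{bmatrix}
\sigma^2+\sigma^2_B & \sigma^2_B & \sigma^2_B & 0 & 0 & 0\\
\sigma^2_B & \sigma^2+\sigma^2_B & \sigma^2_B & 0 & 0 & 0\\
\sigma^2_B & \sigma^2_B & \sigma^2+\sigma^2_B & 0 & 0 & 0\\
0 & 0 & 0 & \sigma^2+\sigma^2_B & \sigma^2_B & \sigma^2_B\\
0 & 0 & 0 & \sigma^2_B & \sigma^2+\sigma^2_B & \sigma^2_B\\
0 & 0 & 0 & \sigma^2_B & \sigma^2_B & \sigma^2+\sigma^2_B
\end{bmatrix}
\]

In this case:

\[
\mathbf{Z} =
\begin{bmatrix}
\mathbf{1}_3 & \mathbf{0}_{3\times 1}\\
\mathbf{0}_{3\times 1} & \mathbf{1}_3
\end{bmatrix},
\quad \mathbf{G} = \sigma^2_B \mathbf{I}_2 \quad\text{and}\quad \boldsymbol\Sigma = \sigma^2 \mathbf{I}_6
\]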



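Treatments and Blocks fixed and Plots random:
$E(\mathbf{Y}) = \mathbf{X}_B\boldsymbol\beta + \mathbf{X}_T\boldsymbol\tau$ and $\mathrm{Var}(\mathbf{Y}) = \mathbf{V}_Y = \sigma^2\mathbf{I}_n$.

Treatments fixed, Blocks and Plots random:
$E(\mathbf{Y}) = \mathbf{X}_T\boldsymbol\tau$ and $\mathrm{Var}(\mathbf{Y}) = \mathbf{V}_Y = \sigma^2\mathbf{M}_U + t\sigma^2_B\mathbf{M}_B$.

For Blocks and Plots random, the analysis of variance table is

Source           df               SSq           E(MSq)
Blocks           b − 1            Y′Q_B Y       σ² + tσ²_B
Units[Blocks]    b(t − 1)         Y′Q_U Y
 Treatments      t − 1            Y′Q_T Y       σ² + q_T(Ψ)
 Residual        (b − 1)(t − 1)   Y′Q_URes Y    σ²

ANOVA estimators:

\[
\hat\sigma^2 = \mathrm{RMSq}, \qquad \hat\sigma^2_B = \frac{\mathrm{BMSq} - \mathrm{RMSq}}{t}
\]

where

\[
\mathbf{X}_B = \mathbf{I}_b \otimes \mathbf{1}_t, \qquad \mathbf{X}_T = \mathbf{1}_b \otimes \mathbf{I}_t
\]

\[
\mathbf{M}_U = \mathbf{I}_b \otimes \mathbf{I}_t = \mathbf{I}_{bt}, \qquad \mathbf{M}_G = n^{-1}\mathbf{J}_b \otimes \mathbf{J}_t = n^{-1}\mathbf{J}_n
\]

\[
\mathbf{M}_B = \mathbf{X}_B(\mathbf{X}_B^\top\mathbf{X}_B)^{-1}\mathbf{X}_B^\top = t^{-1}\,\mathbf{I}_b \otimes \mathbf{J}_t, \qquad \mathbf{Q}_B = \mathbf{M}_B - \mathbf{M}_G
\]

\[
\mathbf{M}_T = \mathbf{X}_T(\mathbf{X}_T^\top\mathbf{X}_T)^{-1}\mathbf{X}_T^\top = b^{-1}\,\mathbf{J}_b \otimes \mathbf{I}_t, \qquad \mathbf{Q}_T = \mathbf{M}_T - \mathbf{M}_G
\]

\[
\mathbf{Q}_U = \mathbf{M}_U - \mathbf{M}_B, \qquad \mathbf{Q}_{URes} = \mathbf{M}_U - \mathbf{M}_B - \mathbf{M}_T + \mathbf{M}_G
\]

\[
q_T(\Psi) = \frac{\Psi'\mathbf{Q}_T\Psi}{t-1} = \frac{\sum_{k=1}^{t} b(\tau_k - \bar\tau_{.})^2}{t-1}
\]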



Penicillin yield

ANOVA table using R

Source           df   SSq     MSq    F      Prob
Blend             4   264.0   66.0   1.97   0.15
Plots[Blocks]    15
 Treat            3    70.0   23.3   1.24   0.34
 Residual        12   226.0   18.8

Estimates of the components of variance

        MM     ML     REML
σ²_B    11.8    9.4   11.8
σ²      18.8   15.1   18.8
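The sums of squares and the moment estimate of the blend component can be recomputed from the penicillin data. A minimal base R sketch, with hypothetical object names:

```r
yield <- c(89, 88, 97, 94,  84, 77, 92, 79,  81, 87, 87, 85,
           87, 92, 89, 84,  79, 81, 80, 88)
blend <- factor(rep(1:5, each = 4))                    # 5 blends (blocks)
treat <- factor(rep(c("A", "B", "C", "D"), times = 5)) # 4 treatments

tab  <- anova(lm(yield ~ blend + treat))
BMSq <- tab["blend", "Mean Sq"]          # blend mean square
RMSq <- tab["Residuals", "Mean Sq"]      # residual mean square

sigma2_hat  <- RMSq                             # ANOVA estimator of sigma^2
sigma2B_hat <- (BMSq - RMSq) / nlevels(treat)   # (BMSq - RMSq)/t

tab[, c("Df", "Sum Sq", "Mean Sq")]
round(c(sigma2 = sigma2_hat, sigma2B = sigma2B_hat), 2)
```

The divisor is t = 4, the number of treatments, because each blend mean is an average over four flasks.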


SAS program

data pen;

input Blend Treat$ Yield @@;

cards;

1 A 89 3 C 87

1 B 88 3 D 85

1 C 97 4 A 87

1 D 94 4 B 92

2 A 84 4 C 89

2 B 77 4 D 84

2 C 92 5 A 79

2 D 79 5 B 81

3 A 81 5 C 80

3 B 87 5 D 88

;

* Moment Method;

proc glm data=pen;

class Blend Treat;

model Yield = Blend Treat;

run;


* Restricted Maximum Likelihood Method;

proc mixed data=pen;

class Blend Treat;

model Yield = Treat / solution ddfm=sat;

random Blend / solution ;

run;


* Maximum Likelihood Method;

proc mixed data=pen method=ML;

class Blend Treat;

model Yield = Treat / solution ddfm=sat;

random Blend / solution ;

run;

R program

#set up data.frame with factors Flask, Blend and Treat and response variable Yield
RCBDPen.dat <- data.frame(Blend = factor(rep(c(1,2,3,4,5), times=c(4,4,4,4,4))),
                          Flask = factor(rep(c(1,2,3,4), times=5)),
                          Treat = factor(rep(c("A","B","C","D"), times=5)))
RCBDPen.dat$Yield <- c(89,88,97,94,84,77,92,79,81,87,87,85,87,92,89,84,79,81,80,88)
RCBDPen.dat
attach(RCBDPen.dat)

# Moment Method
RCBDPen.lm <- lm(Yield ~ Blend + Treat, RCBDPen.dat)
anova(RCBDPen.lm)
(66.000-18.833)/4                          # sigmaB^2_hat = (BMSq - RMSq)/t
anova(lm(Yield ~1, RCBDPen.dat))           # to get the Total SS

require(nlme)
# Restricted Maximum Likelihood Method
RCBD.reml <- lme(Yield ~ Treat, random = ~1|Blend, RCBDPen.dat, method="REML")
summary(RCBD.reml,corr = F)

# Maximum Likelihood Method
RCBD.ml <- lme(Yield ~ Treat, random = ~1|Blend, RCBDPen.dat, method="ML")
summary(RCBD.ml,corr = F)


Calf birth weight

Example

In an animal breeding experiment 20 unrelated cows were subjected to superovulation and artificial insemination. Each group of 4 cows was inseminated with a different sire, with a total of 5 unrelated sires. Out of each mating (combination of dam and sire), three calves were generated and their yearling weights were recorded.

The following model can be used to represent these data:

\[
y_{ijk} = \mu + s_i + d_{ij} + \varepsilon_{ijk}
\]

where $y_{ijk}$ represents the observed weight of calf $k$ ($k = 1, 2, 3$) in family $ij$, $\mu$ is a mean, $s_i$ is an effect common to all animals having sire $i$, $d_{ij}$ is an effect common to all animals having dam $j$ crossed with sire $i$ and $\varepsilon_{ijk}$ is a residual term.



Hierarchical classification model

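Considering a general hierarchical classification model, the model we are supposing is:

\[
y_{ijk} = \mu + s_i + d_{ij} + \varepsilon_{ijk} \quad\text{with } s_i,\ d_{ij} \text{ and } \varepsilon_{ijk} \text{ random effects}
\]

$s_i \sim N(0, \sigma^2_s)$, $d_{ij} \sim N(0, \sigma^2_d)$ and $\varepsilon_{ijk} \sim N(0, \sigma^2)$

$s_i$, $d_{ij}$ and $\varepsilon_{ijk}$; $s_i$ and $s_{i'}$ ($i \neq i'$); $d_{ij}$ and $d_{i'j'}$ ($i \neq i'$ and/or $j \neq j'$); $\varepsilon_{ijk}$ and $\varepsilon_{i'j'k'}$ ($i \neq i'$, $j \neq j'$ and/or $k \neq k'$) are independent

then

\[
\mathrm{Var}(Y_{ijk}) = \mathrm{Var}(\mu + s_i + d_{ij} + \varepsilon_{ijk}) = \mathrm{Var}(s_i + d_{ij} + \varepsilon_{ijk}) = \sigma^2 + \sigma^2_s + \sigma^2_d
\]

\[
\mathrm{Cov}(Y_{ijk}, Y_{ijk'}) = \mathrm{Cov}(\mu + s_i + d_{ij} + \varepsilon_{ijk},\ \mu + s_i + d_{ij} + \varepsilon_{ijk'}) = \sigma^2_s + \sigma^2_d
\]

(observations from same sire and same dam)

\[
\mathrm{Cov}(Y_{ijk}, Y_{ij'k'}) = \mathrm{Cov}(\mu + s_i + d_{ij} + \varepsilon_{ijk},\ \mu + s_i + d_{ij'} + \varepsilon_{ij'k'}) = \sigma^2_s
\]

(observations from same sire and different dam)

\[
\mathrm{Cov}(Y_{ijk}, Y_{i'j'k'}) = \mathrm{Cov}(\mu + s_i + d_{ij} + \varepsilon_{ijk},\ \mu + s_{i'} + d_{i'j'} + \varepsilon_{i'j'k'}) = 0
\]

(observations from different sire and different dam)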



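Considering 2 sires/2 dams/2 calves, the variance matrix of the observations is

\[
\mathrm{Var}(\mathbf{Y}) = \mathbf{Z}\mathbf{G}\mathbf{Z}' + \boldsymbol\Sigma = \mathbf{Z}_1\mathbf{G}_1\mathbf{Z}_1' + \mathbf{Z}_2\mathbf{G}_2\mathbf{Z}_2' + \boldsymbol\Sigma =
\begin{bmatrix}
\mathbf{V} & \mathbf{0}_{4\times 4}\\
\mathbf{0}_{4\times 4} & \mathbf{V}
\end{bmatrix}
\]

where

\[
\mathbf{V} =
\begin{bmatrix}
\sigma^2+\sigma^2_s+\sigma^2_d & \sigma^2_s+\sigma^2_d & \sigma^2_s & \sigma^2_s\\
\sigma^2_s+\sigma^2_d & \sigma^2+\sigma^2_s+\sigma^2_d & \sigma^2_s & \sigma^2_s\\
\sigma^2_s & \sigma^2_s & \sigma^2+\sigma^2_s+\sigma^2_d & \sigma^2_s+\sigma^2_d\\
\sigma^2_s & \sigma^2_s & \sigma^2_s+\sigma^2_d & \sigma^2+\sigma^2_s+\sigma^2_d
\end{bmatrix}
\]

In this case:

\[
\mathbf{Z}_1 =
\begin{bmatrix}
\mathbf{1}_4 & \mathbf{0}_{4\times 1}\\
\mathbf{0}_{4\times 1} & \mathbf{1}_4
\end{bmatrix},
\quad
\mathbf{Z}_2 =
\begin{bmatrix}
\mathbf{1}_2 & \mathbf{0}_{2\times 1} & \mathbf{0}_{2\times 1} & \mathbf{0}_{2\times 1}\\
\mathbf{0}_{2\times 1} & \mathbf{1}_2 & \mathbf{0}_{2\times 1} & \mathbf{0}_{2\times 1}\\
\mathbf{0}_{2\times 1} & \mathbf{0}_{2\times 1} & \mathbf{1}_2 & \mathbf{0}_{2\times 1}\\
\mathbf{0}_{2\times 1} & \mathbf{0}_{2\times 1} & \mathbf{0}_{2\times 1} & \mathbf{1}_2
\end{bmatrix},
\]

\[
\mathbf{G}_1 = \sigma^2_s\mathbf{I}_2, \quad \mathbf{G}_2 = \sigma^2_d\mathbf{I}_4 \quad\text{and}\quad \boldsymbol\Sigma = \sigma^2\mathbf{I}_8
\]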
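With two random terms the variance matrix is a sum of one ZGZ′ term per term plus the residual part. The base R sketch below builds it for 2 sires/2 dams/2 calves; the three variance values are illustrative assumptions.

```r
sigma2  <- 1.0   # residual variance (illustrative value)
sigma2s <- 0.6   # sire component (illustrative value)
sigma2d <- 0.3   # dam-within-sire component (illustrative value)

Z1 <- kronecker(diag(2), matrix(1, 4, 1))   # sires: 8 x 2, I_2 (x) 1_4
Z2 <- kronecker(diag(4), matrix(1, 2, 1))   # dams within sires: 8 x 4, I_4 (x) 1_2

G1    <- sigma2s * diag(2)
G2    <- sigma2d * diag(4)
Sigma <- sigma2 * diag(8)

VY <- Z1 %*% G1 %*% t(Z1) + Z2 %*% G2 %*% t(Z2) + Sigma
VY[1:4, 1:4]   # the block V for the first sire
VY[1:4, 5:8]   # zero block: calves of different sires are uncorrelated
```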



For Sires, Dams and Calves random (I Sires/J Dams/K Calves), the analysis of variance table is

Source       df          SSq           E(MSq)
Sire         I − 1       Y′Q_S Y       σ² + Kσ²_d + JKσ²_s
Dam[Sire]    I(J − 1)    Y′Q_D Y       σ² + Kσ²_d
Residual     IJ(K − 1)   Y′Q_URes Y    σ²

ANOVA estimators:

\[
\hat\sigma^2 = \mathrm{RMSq}, \qquad
\hat\sigma^2_d = \frac{\mathrm{DSMSq} - \mathrm{RMSq}}{K}, \qquad
\hat\sigma^2_s = \frac{\mathrm{SMSq} - \mathrm{DSMSq}}{JK}
\]



Calf birth weight

ANOVA table using R or SAS

Source       df   SSq    MSq   F       Prob
Sire          4   1356   339   71.22   < 0.01
Dam[Sire]    15    129     9    1.81   0.068
Residual     40    190   4.8

Estimates of the components of variance

        MM    ML    REML
σ²_s    28    22    27
σ²_d    1.3   1.2   1.2
σ²      4.8   4.8   4.8


SAS program

data calf_b;
input weight sire dam @@;
cards;
30.1 1 1 39.3 2 7 43.9 4 14
31.1 1 1 39.9 2 8 46.7 4 14
34.6 1 1 36.7 2 8 44.5 4 15
29.2 1 2 38.7 2 8 46.0 4 15
30.8 1 2 39.8 3 9 47.0 4 15
31.6 1 2 36.5 3 9 43.9 4 16
32.0 1 3 38.9 3 9 45.0 4 16
32.6 1 3 37.5 3 10 48.0 4 16
32.7 1 3 38.6 3 10 41.9 5 17
33.3 1 4 36.8 3 10 43.2 5 17
40.2 1 4 39.0 3 11 45.3 5 17
36.7 1 4 39.8 3 11 45.3 5 18
32.3 2 5 38.6 3 11 44.0 5 18
36.7 2 5 36.7 3 12 47.1 5 18
40.1 2 5 37.6 3 12 45.3 5 19
35.6 2 6 38.9 3 12 44.8 5 19
34.3 2 6 40.8 4 13 45.3 5 19
41.1 2 6 42.0 4 13 46.0 5 20
34.1 2 7 45.0 4 13 47.2 5 20
30.8 2 7 42.7 4 14 48.0 5 20
;

* Moment Method;
proc glm data=calf_b;
class sire dam;
model weight = sire dam(sire);
random sire dam(sire)/test;
run;

SAS program (cont.)

* Restricted Maximum Likelihood Method;
proc mixed data=calf_b;
class sire dam;
model weight = / solution ;
random sire dam / solution G;
run;

* Maximum Likelihood Method;
proc mixed data=calf_b method=ML;
class sire dam;
model weight = / solution ddfm=sat;
random sire dam / solution G;
run;

R program

sire <- factor(rep(c(1,2,3,4,5), times=c(rep(12,5)))); sire
dam <- factor(rep(c(1:20), times=c(rep(3,20)))); dam
weight <- c(30.1, 31.1, 34.6, 29.2, 30.8, 31.6, 32 , 32.6, 32.7, 33.3, 40.2, 36.7,
            32.3, 36.7, 40.1, 35.6, 34.3, 41.1, 34.1, 30.8, 39.3, 39.9, 36.7, 38.7,
            39.8, 36.5, 38.9, 37.5, 38.6, 36.8, 39.0, 39.8, 38.6, 36.7, 37.6, 38.9,
            40.8, 42.0, 45.0, 42.7, 43.9, 46.7, 44.5, 46.0, 47.0, 43.9, 45.0, 48.0,
            41.9, 43.2, 45.3, 45.3, 44.0, 47.1, 45.3, 44.8, 45.3, 46.0, 47.2, 48.0)

calf.dat <- data.frame(weight, sire, dam)

# Moment Method
calf.lm <- lm(weight ~ sire/dam, calf.dat)
anova(calf.lm)
(MSSire = anova(calf.lm)$Mean[1])
(MSDamdSire = anova(calf.lm)$Mean[2])
(MSRes = anova(calf.lm)$Mean[3])
(MSSire - MSDamdSire)/ (3*4)     # sigmaS^2_hat = (SMSq - DSMSq)/(JK)
(MSDamdSire - MSRes)/ (3)        # sigmaD^2_hat = (DSMSq - RMSq)/K

library(nlme)
# Maximum Likelihood Method
calf.ml <- lme(weight ~ 1, random = ~1|sire/dam, calf.dat, method="ML")
summary(calf.ml,corr = F)
(summary(calf.ml)$sigma)^2       # sigma^2

# Restricted Maximum Likelihood Method
calf.reml <- lme(weight ~ 1, random = ~1|sire/dam, calf.dat, method="REML")
summary(calf.reml,corr = F)
(summary(calf.reml)$sigma)^2     # sigma^2
names(summary(calf.reml))        # components available in the summary object


Unbalanced data

Balanced experiments like the one just described are not common. In general, experiments of this type are unbalanced, as in the example in the following figure.

[Figure: three sires with unequal numbers of dams: S1 with dams D1, D2; S2 with dams D3, D4, D5; S3 with dams D6, D7.]



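For Sires, Dams and Calves random (I Sires/n_i Dams/m_ij Calves), the analysis of variance table is

Source       df              SSq           E(MSq)
Sire         I − 1           Y′Q_S Y       σ² + K2 σ²_d + K3 σ²_s
Dam[Sire]    Σ_i n_i − I     Y′Q_D Y       σ² + K1 σ²_d
Residual     N − Σ_i n_i     Y′Q_URes Y    σ²

where

\[
K_1 = \frac{1}{\sum_{i=1}^{I} n_i - I}\left(N - \sum_{i=1}^{I}\frac{\sum_{j=1}^{n_i} m_{ij}^2}{m_{i.}}\right),
\]

\[
K_2 = \frac{1}{I-1}\left[\sum_{i=1}^{I}\sum_{j=1}^{n_i} m_{ij}^2\left(\frac{1}{m_{i.}} - \frac{1}{N}\right)\right]
\quad\text{and}\quad
K_3 = \frac{1}{I-1}\left(N - \frac{\sum_{i=1}^{I} m_{i.}^2}{N}\right),
\]

\[
N = \sum_{i=1}^{I}\sum_{j=1}^{n_i} m_{ij} \quad\text{and}\quad m_{i.} = \sum_{j=1}^{n_i} m_{ij}
\]

ANOVA estimators:

\[
\hat\sigma^2 = \mathrm{RMSq}, \qquad
\hat\sigma^2_d = \frac{\mathrm{DSMSq} - \mathrm{RMSq}}{K_1}, \qquad
\hat\sigma^2_s = \frac{\mathrm{SMSq} - K_2\hat\sigma^2_d - \hat\sigma^2}{K_3}
\]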
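The coefficients K1, K2 and K3 depend only on the numbers of calves per dam and per sire, so they can be computed from the two factors alone. The sketch below does this in base R for the unbalanced layout with 3 sires, 7 dams and 13 calves; the helper names (`m_ij`, `sire_of_dam`, `ssq_i`) are hypothetical.

```r
sire <- factor(c(1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3))
dam  <- factor(c(1, 1, 2, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7))

m_ij        <- as.vector(table(dam))   # calves per dam: 2 1 1 2 3 2 2
sire_of_dam <- as.vector(tapply(as.integer(sire), dam, function(x) x[1]))
m_i         <- as.vector(tapply(m_ij, sire_of_dam, sum))     # calves per sire
ssq_i       <- as.vector(tapply(m_ij^2, sire_of_dam, sum))   # sum_j m_ij^2 per sire

N   <- sum(m_ij)        # total number of calves
I   <- nlevels(sire)    # number of sires
n_d <- length(m_ij)     # total number of dams, sum_i n_i

K1 <- (N - sum(ssq_i / m_i)) / (n_d - I)
K2 <- sum(ssq_i * (1 / m_i - 1 / N)) / (I - 1)
K3 <- (N - sum(m_i^2) / N) / (I - 1)
round(c(K1 = K1, K2 = K2, K3 = K3), 2)
```

In the balanced case (every m_ij equal to K and every n_i equal to J) these expressions reduce to K1 = K2 = K and K3 = JK.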



Calf birth weight

ANOVA table using R or SAS

Source       df   SSq       MSq      F        Prob
Sire          2     37.25    18.63    25.30   < 0.01
Dam[Sire]     4   1275.33   318.83   433.13   < 0.01
Residual      6      4.42     0.74

K1 = 1.75, K2 = 1.96, K3 = 4.15

Estimates of the components of variance

        MM       ML           REML
σ²_s    -81.53   0.00000012   0.00000023
σ²_d    181.77   104.29       121.75
σ²      0.74     0.74         0.74

Negative component of variance: the MM estimate of σ²_s is negative.


SAS program

data calf_unb;

input sire dam weight;

cards;

1 1 32.0

1 1 33.5

1 2 55.0

2 3 36.0

2 4 34.5

2 4 35.0

2 5 48.0

2 5 49.5

2 5 50.0

3 6 32.5

3 6 31.5

3 7 58.0

3 7 57.0

;

* Moment Method;

proc glm data=calf_unb;

class sire dam;

model weight = sire dam(sire);

random sire dam(sire)/test;

run;


* Restricted Maximum Likelihood Method;

proc mixed data=calf_unb;

class sire dam;

model weight = / solution ;

random sire dam / solution G;

run;


* Maximum Likelihood Method;

proc mixed data=calf_unb method=ML;

class sire dam;

model weight = / solution ddfm=sat;

random sire dam / solution G;

run;

R program

sire <- factor(c(1,1,1,2,2,2,2,2,2,3,3,3,3))

dam <- factor(c(1,1,2,3,4,4,5,5,5,6,6,7,7))

weight <- c(32.0,33.5,55.0,36.0,34.5,35.0,48.0,49.5,50.0,32.5,31.5,58.0,57.0)

(calf.dat<- data.frame(sire, dam, weight))

# Moment Method

calf.lm <- lm(weight ~ sire/dam, calf.dat)

anova(calf.lm)

(SireMSq = anova(calf.lm)$Mean[1])

(DamdSireMSq = anova(calf.lm)$Mean[2])

(ResMSq = anova(calf.lm)$Mean[3])

(k1 <- (13-(2^2+1^2)/3-(1^2+2^2+3^2)/6-(2^2+2^2)/4)/((2+3+2)-3))

(k2 <- ((2^2+1^2)*(1/3-1/13)+(1^2+2^2+3^2)*(1/6-1/13)+(2^2+2^2)*(1/4-1/13))/(3-1))

(k3 <- (13-(3^2+6^2+4^2)/13)/(3-1))

(sigma2D <- (DamdSireMSq - ResMSq)/k1) # sigmaD^2_hat

(sigma2S <- (SireMSq - ResMSq - k2*sigma2D)/ k3) # sigmaS^2_hat

# Restricted Maximum Likelihood Method

require(nlme)

calf.reml <- lme(weight ~ 1, random = ~1|sire/dam, calf.dat, method="REML")

summary(calf.reml,corr = F)

(summary(calf.reml)$sigma)^2 # sigma^2

# Maximum Likelihood Method

calf.ml <- lme(weight ~ 1, random = ~1|sire/dam, calf.dat, method="ML")

summary(calf.ml,corr = F)

(summary(calf.ml)$sigma)^2


General Covariance Structures

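Model    Description and matrix

ID       Identical variation

\[
\begin{bmatrix}
\sigma^2 & 0 & \cdots & 0\\
0 & \sigma^2 & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & \sigma^2
\end{bmatrix}
\]

DIAG     Heterogeneous variation

\[
\begin{bmatrix}
\sigma^2_1 & 0 & \cdots & 0\\
0 & \sigma^2_2 & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & \sigma^2_J
\end{bmatrix}
\]

CS       Compound symmetry with homogeneous variation

\[
\begin{bmatrix}
\sigma^2 & \sigma^2_a & \cdots & \sigma^2_a\\
\sigma^2_a & \sigma^2 & \cdots & \sigma^2_a\\
\vdots & \vdots & \ddots & \vdots\\
\sigma^2_a & \sigma^2_a & \cdots & \sigma^2
\end{bmatrix}
\]

CSHet    Compound symmetry with heterogeneous variation

\[
\begin{bmatrix}
\sigma^2_1 & \sigma^2_a & \cdots & \sigma^2_a\\
\sigma^2_a & \sigma^2_2 & \cdots & \sigma^2_a\\
\vdots & \vdots & \ddots & \vdots\\
\sigma^2_a & \sigma^2_a & \cdots & \sigma^2_J
\end{bmatrix}
\]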



General Covariance Structures (cont.)

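Model    Description and matrix

AR1      First-order autoregressive model with homogeneous variation

\[
\begin{bmatrix}
\sigma^2 & \sigma^2\rho & \cdots & \sigma^2\rho^{d(1,J)}\\
\sigma^2\rho & \sigma^2 & \cdots & \sigma^2\rho^{d(2,J)}\\
\vdots & \vdots & \ddots & \vdots\\
\sigma^2\rho^{d(J,1)} & \sigma^2\rho^{d(J,2)} & \cdots & \sigma^2
\end{bmatrix}
\]

AR1Het   First-order autoregressive model with heterogeneous variation

\[
\begin{bmatrix}
\sigma^2_1 & \sigma^2_a\rho & \cdots & \sigma^2_a\rho^{d(1,J)}\\
\sigma^2_a\rho & \sigma^2_2 & \cdots & \sigma^2_a\rho^{d(2,J)}\\
\vdots & \vdots & \ddots & \vdots\\
\sigma^2_a\rho^{d(J,1)} & \sigma^2_a\rho^{d(J,2)} & \cdots & \sigma^2_J
\end{bmatrix}
\]

UN       Unstructured model

\[
\begin{bmatrix}
\sigma^2_1 & \sigma_{12} & \cdots & \sigma_{1J}\\
\sigma_{21} & \sigma^2_2 & \cdots & \sigma_{2J}\\
\vdots & \vdots & \ddots & \vdots\\
\sigma_{J1} & \sigma_{J2} & \cdots & \sigma^2_J
\end{bmatrix}
\]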



Table: Covariance Structure (SAS)
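Some of these structures are easy to build explicitly, which helps when checking what a fitted model implies. The following base R sketch constructs the ID, CS and AR1 matrices for J = 4; the parameter values are illustrative, and taking d(i, j) = |i − j| assumes equally spaced levels.

```r
J       <- 4     # dimension (illustrative)
sigma2  <- 2.0   # common variance (illustrative value)
sigma2a <- 0.5   # common covariance for CS (illustrative value)
rho     <- 0.6   # autocorrelation for AR1 (illustrative value)

# ID: identical variation, no covariance
ID <- sigma2 * diag(J)

# CS: sigma2 on the diagonal, sigma2a everywhere else
CS <- matrix(sigma2a, J, J)
diag(CS) <- sigma2

# AR1: sigma2 * rho^d(i,j), with d(i,j) = |i - j| (equal spacing assumed)
d   <- abs(outer(1:J, 1:J, "-"))
AR1 <- sigma2 * rho^d

CS
round(AR1, 3)
```

CS uses two parameters and AR1 also two, whatever the value of J, whereas the unstructured model needs J(J + 1)/2.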

