Bayesian Random Effect Models

Duke University

November 2, 2008

Bayesian Random Effect Models – p.1/19

One-Way AOV Model

In the classical one-way analysis of variance model:

$$Y_{ij} = \mu_j + \epsilon_{ij}, \qquad \epsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2)$$

interest is in the individual means or differences in means for the $J$ particular groups that have been selected.

The $\mu_j$ are referred to as fixed effects. An alternative model is the random effects model that we illustrate next.


Example: Math Achievement

Representative sample of U.S. public schools (160 schools)

Within each school, a random sample of students is selected

$Y_{ij}$ is a standardized measure of math achievement for student $i$ in school $j$

Other variables are also measured (more later)


Hierarchical Random Effect Model

The student level model:

$$Y_{ij} = \mu_j + \epsilon_{ij}, \qquad \epsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2)$$

where $\sigma^2$ measures how much individual students vary around their school mean $\mu_j$.

Question: How much do US high schools vary in their mean mathematics achievement?

Inference about the broader population of school means:

$$\mu_j \overset{iid}{\sim} N(\mu, \sigma^2_\mu)$$

In this 2nd-level model, the school level means vary about an overall mean $\mu$ with variance $\sigma^2_\mu$.

Random Effects

In the 2-level model, the school-level means are viewed as random effects arising from a normal population.

$$\mu_j \overset{iid}{\sim} N(\mu, \sigma^2_\mu)$$

$\mu$ is the overall population mean, a fixed effect

$\sigma^2$ is the within-group variance or variance component

$\sigma^2_\mu$ is the between-group variance

Only 2 second-level parameters ($\mu$, $\sigma^2_\mu$) in addition to $\sigma^2$, versus the $J + 1$ parameters ($\mu_1, \ldots, \mu_J$, $\sigma^2$) in the fixed effects model.

How much US high schools vary in their mean mathematics achievement is captured by the variance component $\sigma^2_\mu$


Mixed Effects Model

We can write $\mu_j = \mu + s_j$, where each school mean is the overall mean $\mu$ plus a normal random effect $s_j \sim N(0, \sigma^2_\mu)$. Substituting this into the distribution for $Y_{ij}$, we arrive at the combined model:

$$Y_{ij} = \mu + s_j + \epsilon_{ij}$$

with fixed effect $\mu$, school level random effects $s_j$, and individual random effects $\epsilon_{ij}$, leading to what is known as a mixed effects model.


Marginal Model

Because linear combinations of normals are normally distributed, we have the equivalent model:

$$Y_{ij} \sim N(\mu, \sigma^2_\mu + \sigma^2)$$

where

$$\operatorname{Cov}(Y_{ij}, Y_{i'j}) = \sigma^2_\mu, \quad \text{for } i \neq i'$$

$$\operatorname{Cov}(Y_{ij}, Y_{i'j'}) = 0, \quad \text{for } j \neq j'$$

This model implies that students within schools are exchangeable and that student achievements across different schools are independent given the school effect. (Reasonable assumption?)
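A short derivation may make these moments easier to check. It uses only the combined model $Y_{ij} = \mu + s_j + \epsilon_{ij}$, and assumes (as is standard for this model) that the school effects $s_j$ and the student errors $\epsilon_{ij}$ are mutually independent:

```latex
\begin{align*}
\operatorname{Var}(Y_{ij})
  &= \operatorname{Var}(s_j) + \operatorname{Var}(\epsilon_{ij})
   = \sigma^2_\mu + \sigma^2, \\
\operatorname{Cov}(Y_{ij}, Y_{i'j})
  &= \operatorname{Cov}(s_j + \epsilon_{ij},\; s_j + \epsilon_{i'j})
   = \operatorname{Var}(s_j) = \sigma^2_\mu \qquad (i \neq i'), \\
\operatorname{Cov}(Y_{ij}, Y_{i'j'})
  &= \operatorname{Cov}(s_j + \epsilon_{ij},\; s_{j'} + \epsilon_{i'j'})
   = 0 \qquad (j \neq j').
\end{align*}
```

Two students in the same school share the single term $s_j$, which is the only source of their covariance.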


Intraclass Correlation

The intraclass correlation:

$$\operatorname{Corr}(Y_{ij}, Y_{i'j}) = \frac{\sigma^2_\mu}{\sigma^2_\mu + \sigma^2}$$

provides a measure of the proportion of total variation that is explained by between group variability. It is

0 when there is no between group variability, $\sigma^2_\mu = 0$

1 when there is no within group variability, $\sigma^2 = 0$
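The following simulation is a minimal sketch of how the intraclass correlation arises from the two-level model. The number of schools, the overall mean, and both variance components are hypothetical values chosen for illustration, not quantities from the math achievement data.

```r
# Simulate Y_ij = mu + s_j + eps_ij for two students per school and compare
# the empirical within-school correlation with sigma2_mu / (sigma2_mu + sigma2).
# All numeric settings below are illustrative assumptions.
set.seed(1)
J <- 20000         # number of schools (hypothetical)
mu <- 12           # overall mean (hypothetical)
sigma2_mu <- 9     # between-school variance (hypothetical)
sigma2 <- 36       # within-school variance (hypothetical)

s <- rnorm(J, mean = 0, sd = sqrt(sigma2_mu))   # school effects s_j

# Two students per school share the same school effect s_j
y1 <- mu + s + rnorm(J, sd = sqrt(sigma2))
y2 <- mu + s + rnorm(J, sd = sqrt(sigma2))

icc_theory <- sigma2_mu / (sigma2_mu + sigma2)  # 9 / 45
icc_empirical <- cor(y1, y2)                    # should be close to icc_theory
print(c(theory = icc_theory, empirical = icc_empirical))
```

Pairing one student with another from the same school isolates the shared school effect, so the sample correlation of the pairs estimates the intraclass correlation directly.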


Classical Estimation

Method of Moments Estimates: Equate the expected mean squares, E(MS), to the observed mean squares. This gives two equations with two unknowns; solve. The MOM estimate of $\mu$ is the sample mean.

Find the MLEs of $\mu$, $\sigma^2$, $\sigma^2_\mu$ from the marginal model for $Y_{ij}$ (make sure estimates are in the parameter space, i.e. estimates of variance components are not negative)

Use REsidual Maximum Likelihood (REML). Fit fixed effects by least squares, then estimate variance components by ML using the residuals.

Use library nlme in R and function lme to fit linear mixed effect models using Restricted Maximum Likelihood (method = "REML", the default) or ML

lme(fixed = formula, data, random = formula, method)
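The method of moments step can be sketched in base R for the simplest case. This sketch assumes a balanced design (every school has the same number of students $n$) and simulated data with hypothetical true values. The math achievement data are not balanced, so this illustrates the idea rather than reproducing the analysis.

```r
# Method-of-moments sketch for a balanced one-way random effects design.
# Assumptions (not from the slides): equal school sizes n and simulated data.
set.seed(2)
J <- 200; n <- 30
mu <- 12; sigma2_mu <- 9; sigma2 <- 36          # hypothetical true values
school <- rep(seq_len(J), each = n)
y <- mu + rep(rnorm(J, sd = sqrt(sigma2_mu)), each = n) +
  rnorm(J * n, sd = sqrt(sigma2))

ybar_j <- as.numeric(tapply(y, school, mean))   # school sample means
ybar <- mean(y)                                 # MOM estimate of mu

# Between-school and within-school mean squares
msb <- n * sum((ybar_j - ybar)^2) / (J - 1)
msw <- sum((y - ybar_j[school])^2) / (J * (n - 1))

# E(MSW) = sigma2 and E(MSB) = sigma2 + n * sigma2_mu, so solve for the unknowns
sigma2_hat <- msw
sigma2_mu_hat <- max(0, (msb - msw) / n)        # truncate at 0 to stay in the parameter space
print(c(mu = ybar, sigma2 = sigma2_hat, sigma2_mu = sigma2_mu_hat))
```

The truncation at zero matters because `msb - msw` can be negative in small samples even when the true between-school variance is positive.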


Edited Output

    > summary(lme(fixed = y ~ 1, random = ~ 1 | school))

    Linear mixed-effects model fit by REML

    Random effects:
     Formula: ~1 | school
            (Intercept) Residual
    StdDev:    2.934966 6.256862

    Fixed effects: y ~ 1
                 Value Std.Error   DF t-val p-value
    (Intercept) 12.637 0.2443936 7025 51.71       0


REML Estimates

$\hat\mu = 12.64$

$\hat\sigma_\mu = 2.93$ or $\hat\sigma^2_\mu = 8.61$

$\hat\sigma = 6.26$ or $\hat\sigma^2 = 39.15$

$\hat\rho = 8.61/(8.61 + 39.15) = 0.18$

About 18% of the variation in math achievement scores can be attributed to differences among schools. The remaining variation is due to variation among students within schools.

No estimates of individual school means, however.
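The arithmetic behind these estimates can be checked with a few lines of R. The two standard deviations are copied from the edited `lme` output; the variable names are only illustrative.

```r
# Recompute the variance components and the intraclass correlation from the
# standard deviations printed in the edited lme output.
sd_school <- 2.934966    # StdDev of (Intercept): estimate of sigma_mu
sd_resid  <- 6.256862    # StdDev of Residual: estimate of sigma

sigma2_mu_hat <- sd_school^2
sigma2_hat <- sd_resid^2
rho_hat <- sigma2_mu_hat / (sigma2_mu_hat + sigma2_hat)

print(round(c(sigma2_mu = sigma2_mu_hat, sigma2 = sigma2_hat, rho = rho_hat), 2))
```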


Bayesian Model

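Unknown parameters of interest: $\mu_j$, $\mu$, $\sigma^2$, $\sigma^2_\mu$

Distribution for $\mu_j$ is given by the 2nd level model specification

Specify joint prior distribution for remaining unknowns $\mu$, $\sigma^2$, $\sigma^2_\mu$

A default prior $p(\mu, \sigma^2, \sigma^2_\mu) \propto 1/\sigma^2$

Obtain a joint posterior distribution over all unknowns

$$p(\mu_1, \ldots, \mu_J, \mu, \sigma^2, \sigma^2_\mu \mid Y) \propto p(Y \mid \mu_1, \ldots, \mu_J, \sigma^2) \, \prod_{j=1}^{J} p(\mu_j \mid \mu, \sigma^2_\mu) \; p(\mu, \sigma^2, \sigma^2_\mu)$$

(replace variance components with precisions $\phi = 1/\sigma^2$ and $\phi_\mu = 1/\sigma^2_\mu$)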


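Hierarchical Model

$$p(Y \mid \mu_1, \ldots, \mu_J, \phi) \propto \prod_{j} \prod_{i} \phi^{1/2} \exp\left\{ -\frac{1}{2} \phi \, (Y_{ij} - \mu_j)^2 \right\}$$

$$p(\mu_j \mid \mu, \phi_\mu) \propto \phi_\mu^{1/2} \exp\left\{ -\frac{1}{2} \phi_\mu \, (\mu_j - \mu)^2 \right\}$$

$$p(\mu) \propto 1$$

$$p(\phi) \propto 1/\phi$$

$$p(\phi_\mu) \propto \; ? \quad \text{(see HW 6)}$$

$$p(\mu_1, \ldots, \mu_J, \mu, \phi, \phi_\mu \mid Y) \propto p(Y \mid \mu_1, \ldots, \mu_J, \phi) \, \prod_{j=1}^{J} p(\mu_j \mid \mu, \phi_\mu) \; p(\mu) \, p(\phi) \, p(\phi_\mu)$$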

Markov Chain Monte Carlo Sampling

Cannot obtain the posterior distributions in closed form; instead create a Markov chain that generates values from the following full conditional distributions

$\mu_j \mid \mu, \phi, \phi_\mu, Y$ for $j = 1, \ldots, J$

$\mu \mid \mu_j, \phi, \phi_\mu, Y$

$\phi \mid \mu, \mu_j, \phi_\mu, Y$

$\phi_\mu \mid \mu, \mu_j, \phi, Y$

Gibbs Sampling gives a dependent sequence of draws from the joint posterior distribution. Given the sample output, summarize to estimate the posterior densities, quantiles, etc.


Full Conditional Distributions

Let $\theta = (\theta_1, \ldots, \theta_p)$

The full conditional distribution for any component $\theta_j$ given the rest $\theta_{-j}$ is

$$p(\theta_j \mid \theta_{-j}, Y) \propto L(\theta) \, p(\theta)$$

Only terms that involve $\theta_j$ are needed in writing the full conditional

May "block" together terms to use "vectorized" code
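As a worked instance of keeping only the terms that involve one component, the conditionals for $\mu_j$ and $\phi$ in the hierarchical model take standard conjugate forms. The symbols $n_j$ (number of students in school $j$), $\bar{Y}_j$ (sample mean of school $j$), and $N = \sum_j n_j$ are notation introduced for this sketch. The conditional for $\phi_\mu$ is omitted because its prior is left unspecified.

```latex
\begin{align*}
p(\mu_j \mid \mu, \phi, \phi_\mu, Y)
  &\propto \exp\Big\{ -\tfrac{1}{2} \phi \sum_{i=1}^{n_j} (Y_{ij} - \mu_j)^2 \Big\}
           \exp\Big\{ -\tfrac{1}{2} \phi_\mu (\mu_j - \mu)^2 \Big\} \\
  &\Rightarrow \quad \mu_j \mid \cdot \;\sim\;
     N\!\left( \frac{n_j \phi \, \bar{Y}_j + \phi_\mu \, \mu}{n_j \phi + \phi_\mu},\;
               \frac{1}{n_j \phi + \phi_\mu} \right), \\[6pt]
p(\phi \mid \mu_1, \ldots, \mu_J, Y)
  &\propto \phi^{N/2 - 1}
           \exp\Big\{ -\tfrac{1}{2} \phi \sum_{j} \sum_{i} (Y_{ij} - \mu_j)^2 \Big\} \\
  &\Rightarrow \quad \phi \mid \cdot \;\sim\;
     \mathrm{Gamma}\!\left( \tfrac{N}{2},\; \tfrac{1}{2} \sum_{j} \sum_{i} (Y_{ij} - \mu_j)^2 \right)
     \quad \text{(shape, rate)}.
\end{align*}
```

Because the $\mu_j$ are conditionally independent given $(\mu, \phi, \phi_\mu)$, all $J$ of them can be drawn in one vectorized step, which is one example of the "blocking" mentioned above.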


Full conditional for µ

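$$p(\mu \mid \mu_1, \ldots, \mu_J, \phi, \phi_\mu, Y) \propto \prod_{j=1}^{J} p(\mu_j \mid \mu, \phi_\mu) \, p(\mu)$$

$$\Rightarrow \quad \mu \mid \mu_1, \ldots, \mu_J, \phi, \phi_\mu, Y \sim N\left( \frac{\sum_j \mu_j}{J}, \; \frac{1}{J \phi_\mu} \right)$$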
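The Gibbs sampling scheme can be sketched in base R as follows. Two things here are assumptions and not part of the slides: the data are simulated with hypothetical settings, and $\phi_\mu$ is given a Gamma(a, b) prior (shape a, rate b) purely so that the sketch can run, since the slides leave $p(\phi_\mu)$ unspecified. The conditionals for $\mu_j$ and $\phi$ are the standard conjugate forms implied by the normal likelihood, the normal second-level model, and $p(\phi) \propto 1/\phi$.

```r
# Gibbs sampler sketch for the hierarchical model (precision parameterization).
# ASSUMPTIONS not in the slides: simulated data, and a Gamma(a, b) prior on
# phi_mu (shape a, rate b); the slides leave p(phi_mu) unspecified.
set.seed(3)
J <- 40; n_j <- rep(25, J)                       # hypothetical design
school <- rep(seq_len(J), times = n_j)
y <- 12 + rnorm(J, sd = 3)[school] + rnorm(sum(n_j), sd = 6)

ybar_j <- as.numeric(tapply(y, school, mean))    # school sample means
N <- length(y)
a <- 0.001; b <- 0.001                           # hypothetical prior settings
n_iter <- 2000
draws <- matrix(NA_real_, n_iter, 3,
                dimnames = list(NULL, c("mu", "sigma2", "sigma2.mu")))

# Starting values
mu <- mean(y); phi <- 1 / var(y); phi_mu <- 1 / var(ybar_j)

for (t in seq_len(n_iter)) {
  # mu_j | rest: precision-weighted combination of school mean and mu
  prec_j <- n_j * phi + phi_mu
  mu_j <- rnorm(J, (n_j * phi * ybar_j + phi_mu * mu) / prec_j, sqrt(1 / prec_j))
  # mu | rest ~ N(mean(mu_j), 1 / (J * phi_mu)), using p(mu) proportional to 1
  mu <- rnorm(1, mean(mu_j), sqrt(1 / (J * phi_mu)))
  # phi | rest ~ Gamma(N / 2, rate = SSE / 2), using p(phi) proportional to 1 / phi
  phi <- rgamma(1, shape = N / 2, rate = sum((y - mu_j[school])^2) / 2)
  # phi_mu | rest under the assumed Gamma(a, b) prior
  phi_mu <- rgamma(1, shape = a + J / 2, rate = b + sum((mu_j - mu)^2) / 2)
  draws[t, ] <- c(mu, 1 / phi, 1 / phi_mu)
}

post <- draws[-seq_len(500), ]                   # drop burn-in (arbitrary length)
print(round(apply(post, 2, quantile, probs = c(0.025, 0.5, 0.975)), 2))
```

The draws are stored on the variance scale so that the summaries are comparable in form to a table of `mu`, `sigma2`, and `sigma2.mu`. A different prior for $\phi_\mu$ would change only the last `rgamma` line, if it remains conjugate.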


Summary of Output from WinBUGS

               mean    sd   2.5%    25%    50%    75%  97.5%
    rho         0.2   0.0    0.1    0.2    0.2    0.2    0.2
    sigma2.mu   8.8   1.1    6.9    8.0    8.7    9.5   11.1
    sigma2     39.2   0.6   37.9   38.7   39.1   39.6   40.4
    mu         12.6   0.3   12.1   12.5   12.6   12.8   13.1


Posterior Densities

[Figure: estimated posterior densities of $\mu$, $\rho$, $\sigma^2$, and $\sigma^2_\mu$, one panel per parameter, with Density on the vertical axis.]

Summary

Can obtain a posterior distribution for each school mean. The posterior mean will be a convex combination of the MLE (observed school mean) and the overall mean.

School level means are "shrunk" toward the overall mean; the degree of shrinkage depends on the variance components.

Compromise between two fixed effects models:

each school has its own mean

common mean ($\mu_1 = \cdots = \mu_J = \mu$)

Avoids multiple testing.

Bayesian results are useful for ranking of schools ("report cards").
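The convex combination can be made concrete with a small sketch. Conditional on $\mu$, $\sigma^2$, and $\sigma^2_\mu$, the posterior mean of $\mu_j$ is $w_j \bar{Y}_j + (1 - w_j)\mu$ with $w_j = \sigma^2_\mu / (\sigma^2_\mu + \sigma^2 / n_j)$, where $n_j$ is the number of students in school $j$. The code plugs in the REML estimates from the `lme` output in place of a full posterior, and the school sizes and observed school means are hypothetical.

```r
# Shrinkage sketch: conditional posterior mean of a school mean as a convex
# combination of the observed school mean and the overall mean.
# Variance components and mu are the REML estimates from the lme output;
# school sizes and observed school means are hypothetical.
sigma2_mu <- 2.934966^2
sigma2 <- 6.256862^2
mu <- 12.637

n_j <- c(5, 25, 60)            # hypothetical numbers of students per school
ybar_j <- c(18, 18, 18)        # hypothetical observed school means

w_j <- sigma2_mu / (sigma2_mu + sigma2 / n_j)   # weight on the observed school mean
shrunk <- w_j * ybar_j + (1 - w_j) * mu         # shrunken estimate of mu_j

print(round(data.frame(n_j, ybar_j, weight = w_j, shrunk), 2))
```

Schools with fewer students get a smaller weight on their own sample mean and are pulled further toward the overall mean, which is the dependence of shrinkage on the variance components and sample size.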

