Page 1

Generalised Additive (Mixed) Models
I refuse to use a soft ‘G’

Daniel Simpson

Department of Mathematical Sciences, University of Bath

Page 2

Outline

Background

Latent Gaussian model

Example: Continuous vs Discrete

Priors on functions

Page 3

Two main paradigms for statistical analysis

- Let y denote a set of observations, distributed according to a probability model π(y; θ).
- Based on the observations, we want to estimate θ.

The classical approach:

θ denotes parameters (unknown fixed numbers), estimated for example by maximum likelihood.

The Bayesian approach:

θ denotes random variables, assigned a prior π(θ). Estimate θ based on the posterior:

π(θ | y) = π(y | θ)π(θ) / π(y) ∝ π(y | θ)π(θ).


Page 5

Example (Ski flying records)

Assume a simple linear regression model with Gaussian observations y = (y1, . . . , yn), where

E(yi) = α + β xi,   Var(yi) = τ⁻¹,   i = 1, . . . , n.

[Figure: world records in ski jumping, 1961–2011; x-axis: Year, y-axis: Length in metres (roughly 140–240).]

Page 6

The Bayesian approach

Assign priors to the parameters α, β and τ and calculate posteriors:

[Figure: posterior densities for the intercept (mean 137.354, sd 1.508), the slope x (mean 2.14, sd 0.054), and the precision of the Gaussian observations.]
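As a minimal sketch (not from the slides) of how this model is fitted in R-INLA, assuming the records sit in a data frame ski with columns year and length (illustrative names):

library(INLA)

fit = inla(length ~ year, family = "gaussian", data = ski)  # alpha, beta and tau get default priors

summary(fit)                        # posterior means, sds and quantiles
plot(fit$marginals.fixed$year)      # marginal posterior of the slope
plot(fit$marginals.hyperpar[[1]])   # marginal posterior of the observation precision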

Page 7

Real-world datasets are usually much more complicated!

Using a Bayesian framework:

- Build (hierarchical) models to account for potentially complicated dependency structures in the data.
- Attribute uncertainty to model parameters and latent variables using priors.

Two main challenges:

1. Need computationally efficient methods to calculate posteriors.
2. Select priors in a sensible way.


Page 9

Outline

Background

Latent Gaussian model
  Computational framework and approximations
  Semiparametric regression

Example: Continuous vs Discrete

Priors on functions

Page 10

What is a latent Gaussian model?

Classical multiple linear regression model

The mean µ of an n-dimensional observational vector y is given by

µi = E(Yi) = α + ∑_{j=1}^{nβ} βj zji,   i = 1, . . . , n

where

α : Intercept
β : Linear effects of covariates z

Page 11

Account for non-Gaussian observations

Generalized linear model (GLM)

The mean µ is linked to the linear predictor ηi:

ηi = g(µi) = α + ∑_{j=1}^{nβ} βj zji,   i = 1, . . . , n

where g(·) is a link function and

α : Intercept
β : Linear effects of covariates z

Page 12

Account for non-linear effects of covariates

Generalized additive model (GAM)

The mean µ is linked to the linear predictor ηi:

ηi = g(µi) = α + ∑_{k=1}^{nf} fk(cki),   i = 1, . . . , n

where g(·) is a link function and

α : Intercept
{fk(·)} : Non-linear smooth effects of covariates ck

Page 13

Structured additive regression models

GLM / GAM / GLMM / GAMM +++

The mean µ is linked to the linear predictor ηi:

ηi = g(µi) = α + ∑_{j=1}^{nβ} βj zji + ∑_{k=1}^{nf} fk(cki) + εi,   i = 1, . . . , n

where g(·) is a link function and

α : Intercept
β : Linear effects of covariates z
{fk(·)} : Non-linear smooth effects of covariates ck
ε : Iid random effects
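In R-INLA such a predictor is written term by term in the model formula. The sketch below is illustrative only; the covariate names z1, z2, c1, the index id and the likelihood family are placeholders, not from the slides:

formula = y ~ z1 + z2 +            # linear effects (the betas)
  f(c1, model = "rw2") +           # a smooth effect f_k(c_k)
  f(id, model = "iid")             # iid random effect epsilon
# result = inla(formula, data = mydata, family = "poisson")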

Page 14

Latent Gaussian models

- Collect all parameters (random variables) in the linear predictor in a latent field

  x = {α, β, {fk(·)}, η}.

- A latent Gaussian model is obtained by assigning Gaussian priors to all elements of x.
- Very flexible due to many different forms of the unknown functions {fk(·)}:
  - Include temporally and/or spatially indexed covariates.
  - Hyperparameters account for variability and length/strength of dependence.


Page 17

Some examples of latent Gaussian models

- Generalized linear and additive (mixed) models
- Semiparametric regression
- Disease mapping
- Survival analysis
- Log-Gaussian Cox processes
- Geostatistical models
- Spatial and spatio-temporal models
- Stochastic volatility
- Dynamic linear models
- State-space models
- +++

Page 18

Unified framework: A three-stage hierarchical model

1. Observations: y
   Assumed conditionally independent given x and θ1:

   y | x, θ1 ∼ ∏_{i=1}^n π(yi | xi, θ1).

2. Latent field: x
   Assumed to be a GMRF with a sparse precision matrix Q(θ2) (see the sketch after this list):

   x | θ2 ∼ N(µ(θ2), Q⁻¹(θ2)).

3. Hyperparameters: θ = (θ1, θ2)
   Precision parameters of the Gaussian priors:

   θ ∼ π(θ).
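To make the "sparse precision matrix" point concrete, here is a small illustration in plain R of the structure matrix of a second-order random walk (RW2); this is a sketch of the idea, not INLA's internal code:

# Precision of an RW2 on n equally spaced points: Q = tau * t(D) %*% D,
# where D takes second differences. Q is banded (bandwidth 2), i.e. sparse.
n   = 100
D   = diff(diag(n), differences = 2)   # (n-2) x n second-difference matrix
tau = 1
Q   = tau * crossprod(D)               # n x n banded precision matrix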


Page 21

Model summary

The joint posterior for the latent field and hyperparameters:

π(x, θ | y) ∝ π(y | x, θ) π(x, θ)
            ∝ ∏_{i=1}^n π(yi | xi, θ) π(x | θ) π(θ)

Remarks:

- m = dim(θ) is often quite small, like m ≤ 6.
- n = dim(x) is often large, typically n = 10² – 10⁶.

Page 22

Target densities are given as high-dimensional integrals

We want to estimate:

- The marginals of all components of the latent field:

  π(xi | y) = ∫∫ π(x, θ | y) dx₋ᵢ dθ = ∫ π(xi | θ, y) π(θ | y) dθ,   i = 1, . . . , n.

- The marginals of all the hyperparameters:

  π(θj | y) = ∫∫ π(x, θ | y) dx dθ₋ⱼ = ∫ π(θ | y) dθ₋ⱼ,   j = 1, . . . , m.


Page 24

Example: Logistic regression, 2 × 2 factorial design

Example (Seeds)

Consider the proportion of seeds that germinate on each of 21 plates. We have two seed types (x1) and two root extracts (x2).

> data(Seeds)
> head(Seeds)
   r  n x1 x2 plate
1 10 39  0  0     1
2 23 62  0  0     2
3 23 81  0  0     3
4 26 51  0  0     4
5 17 39  0  0     5
6  5  6  0  1     6

Page 25

Summary data set

Number of seeds that germinated in each group:

                     Seed type
                  x1 = 0    x1 = 1
Root extract
  x2 = 0          99/272    49/123
  x2 = 1         201/295    75/141

Page 26

Statistical model

- Assume that the number of seeds that germinate on plate i is binomial:

  ri ∼ Binomial(ni, pi),   i = 1, . . . , 21.

- Logistic regression model:

  logit(pi) = log(pi / (1 − pi)) = α + β1 x1i + β2 x2i + β3 x1i x2i + εi,

  where the εi ∼ N(0, σ_ε²) are iid.

Aim:

Estimate the main effects, β1 and β2, and a possible interaction effect β3.


Page 28

Using R-INLA

> formula = r ~ x1 + x2 + x1*x2 + f(plate, model = "iid")
> result = inla(formula, data = Seeds,
                family = "binomial",
                Ntrials = n,
                control.predictor = list(compute = TRUE, link = 1),
                control.compute = list(dic = TRUE))

Default priors

The default prior for the fixed effects is

β ∼ N(0, 1000).

Change it using the control.fixed argument in the inla call.
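As a sketch (not from the slides): if the N(0, 1000) above is read as a variance of 1000, it corresponds to a prior precision of 0.001, and a tighter N(0, 1) prior on the fixed effects could be requested like this:

> result = inla(formula, data = Seeds,
                family = "binomial", Ntrials = n,
                control.fixed = list(mean = 0, prec = 1))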

Page 29

Output

> summary(result)

Call:
"inla(formula = formula, family = \"binomial\", data = Seeds, Ntrials = n)"

Time used:
 Pre-processing  Running inla  Post-processing   Total
         0.1354        0.0911           0.0347  0.2613

Fixed effects:
               mean     sd 0.025quant 0.5quant 0.975quant   kld
(Intercept) -0.5581 0.1261    -0.8076  -0.5573    -0.3130 0e+00
x1           0.1461 0.2233    -0.2933   0.1467     0.5823 0e+00
x2           1.3206 0.1776     0.9748   1.3197     1.6716 1e-04
x1:x2       -0.7793 0.3066    -1.3799  -0.7796    -0.1774 0e+00

Random effects:
Name    Model
plate   IID model

Model hyperparameters:
                        mean       sd 0.025quant 0.5quant 0.975quant
Precision for plate 18413.03 18280.63    1217.90 13003.76   66486.29

Expected number of effective parameters (std dev): 4.014 (0.0114)
Number of equivalent replicates: 5.231
Marginal Likelihood: -72.07

Page 30

Estimated germination probabilities

> result$summary.fitted.values$mean

[Figure: posterior mean germination probability for each plate; x-axis: Plate no. (1–21), y-axis: Probability of germination (roughly 0.40–0.65).]
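A minimal sketch of how a plot like the one above could be drawn from the fitted object:

p.hat = result$summary.fitted.values$mean
plot(seq_along(p.hat), p.hat, type = "b",
     xlab = "Plate no.", ylab = "Probability of germination")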

Page 31

More in the practicals . . .

> plot(result)
> result$summary.fixed
> result$summary.random
> result$summary.linear.predictor
> result$summary.fitted.values
> result$marginals.fixed
> result$marginals.hyperpar
> result$marginals.linear.predictor
> result$marginals.fitted.values

Page 32

Example: Semiparametric regression

Example (Annual global temperature anomalies)

[Figure: annual global temperature anomalies, 1850–2000; x-axis: Year, y-axis: Temperature anomaly (roughly −0.5 to 0.5).]

Page 33

Estimating a smooth non-linear trend

- Assume the model

  yi = α + f(xi) + εi,   i = 1, . . . , n,

  where the errors are iid, εi ∼ N(0, σ_ε²).

- Want to estimate the true underlying curve f(·).

Page 34

R-code

- Define the formula and run the model:

  > formula = y ~ f(x, model = "rw2", hyper = ...)
  > result = inla(formula, data = data.frame(y, x))

- The default prior for the hyperparameter of rw2:

  hyper = list(prec = list(prior = "loggamma",
                           param = c(1, 0.00005)))

Page 35

Output

- > summary(result)
  > plot(result)

- The mean effect of x:

  > result$summary.random$x$mean

  Note that this effect is constrained to sum to 0.

- Resulting fitted curve:

  > result$summary.fitted.values$mean

Page 36

Estimated fit using the default prior

Example (Annual global temperature anomalies)

[Figure: temperature anomalies with the fitted rw2 curve under the default prior; x-axis: Year, y-axis: Temperature anomaly.]

Page 37

Estimated fit using R-INLA compared with smooth.spline

Example (Annual global temperature anomalies)

[Figure: the R-INLA fit overlaid with a smooth.spline fit; x-axis: Year, y-axis: Temperature anomaly.]
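A sketch of how such a comparison might be produced, assuming x (year) and y (anomaly) are the vectors passed to inla() above:

ss = smooth.spline(x, y)                               # smoothness chosen by GCV

plot(x, y, xlab = "Year", ylab = "Temperature anomaly")
lines(x, result$summary.fitted.values$mean, lwd = 2)   # R-INLA posterior mean fit
lines(ss$x, ss$y, lty = 2, lwd = 2)                    # smooth.spline fit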

Page 38

Using different priors for the precision

Example (Annual global temperature anomalies)

[Figure: fitted curves under different precision priors; x-axis: Year, y-axis: Temperature anomaly.]

Page 39

“Hepatitis Z” in Ontario

Somebody gave us all of the cases of a disease throughout Ontario.

- Analyse risk factors
- We have counts in each postcode
- Try something simple first...

(data: diseasemapping toolbox in R)

[Figure: map of Ontario, colour scale from 0 to 2,500,000.]

Page 40

Oops - doesn’t work!

Fit a model with age (in groups) and sex:

- It doesn’t fit!
- Lots of under-prediction
- Not enough zeros (56 vs 143)

We need to do something else?
Zero-inflation??
Over-dispersion????

Page 41

Not just yet...

Let’s actually look at the data!

- Nearby counts are similar
- Big counts are in the cities
- Maybe each observation is not independent?
- Maybe the risk is not independent?

Page 42

Outline

Background

Latent Gaussian model

Example: Continuous vs Discrete

Priors on functions

Page 43

Leukaemia survival

Leukaemia survival data (Henderson et al., 2002, JASA), 1043 cases.

Page 44

Survival models

Survival models are different from most other models in statistics: there are two types of observations, an event (death) or the point at which we stop measuring (censoring).

Rather than modelling the event times directly, we model the hazard (instantaneous risk)

h(y) dy = Prob(y ≤ Y < y + dy | Y ≥ y),

so that h(y) = f(y)/S(y), where f is the density and S the survival function.

Page 45

Cox proportional hazards model

Write the hazard function for each patient as:

h(yi | wi, xi) = h0(yi) wi exp(ciᵀβ) exp(x(si)),   i = 1, . . . , 1043,

where

h0(·) is the baseline hazard function,
wi is the log-Normal frailty effect associated with patient i,
ci is the vector of observed covariates for patient i,
β is a vector of unknown parameters,
x(si) is the value of the spatial effect x(s) for patient i.

Page 46

Spatial survival: example

log(hazard) = log(baseline)
            + f(age)
            + f(white blood cell count)
            + f(deprivation index)
            + f(spatial)
            + sex

Page 47

R-code (regions)

data(Leuk)
g = system.file("demodata/Leuk.graph", package = "INLA")   # district adjacency graph

formula = inla.surv(Leuk$time, Leuk$cens) ~ sex + age +
    f(inla.group(wbc), model = "rw1") +         # smooth effect of white blood cell count
    f(inla.group(tpi), model = "rw2") +         # smooth effect of the deprivation index
    f(district, model = "besag", graph = g)     # spatially structured district effect

result = inla(formula, family = "coxph", data = Leuk)

source(system.file("demodata/Leuk-map.R", package = "INLA"))
Leuk.map(result$summary.random$district$mean)   # map the posterior mean spatial effect
plot(result)

Page 48

[Figure: estimated baseline hazard over time, with posterior mean and 0.025, 0.5 and 0.975 quantiles.]

Page 49

[Figure: posterior densities for the intercept (mean −9.401, sd 0.261), sex (mean 0.077, sd 0.069) and age (mean 0.036, sd 0.002).]

Page 50

[Figure: estimated smooth effect of inla.group(wbc), with posterior mean and 0.025, 0.5 and 0.975 quantiles.]

Page 51

[Figure: estimated smooth effect of inla.group(tpi), with posterior mean and 0.025, 0.5 and 0.975 quantiles.]

Page 52

[Figure: map of the posterior mean spatial (district) effect, colour scale from about −0.2 to 0.32.]

Page 53

Outline

Background

Latent Gaussian model

Example: Continuous vs Discrete

Priors on functions

Page 54

Let’s talk about sets, baby

We know that there is no “uniform” measure over all random functions, so we need to think about useful subsets.

- These are choices about smoothness.
- How differentiable do we want these things to be?
- Can there be sharp changes in the functions or their derivatives?
- Should these functions “look the same” everywhere?

And a much better question: how do we put a probability over a subset of functions?


Page 60

Aside: The Gaussian distribution

The most important distribution in probability and statistics is the Gaussian distribution, which has density

p(x) = (1 / (√(2π) σ)) exp(−(x − µ)² / (2σ²)).

- µ is the mean (the centre of the distribution).
- σ² is the variance. Approximately 99.7% of the probability mass is within 3σ of the mean.
- A standard normal RV has mean 0 and variance 1.
- The central limit theorem says that empirical averages of independent RVs are approximately normal:

  (n⁻¹ ∑_{i=1}^n Xi − µ) / (σ/√n)  →  N(0, 1)   (in distribution).
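A quick simulation sketch of the CLT in R (the Exponential(1) choice is just an example, not from the slides): standardised means of iid non-Gaussian variables look approximately standard normal.

set.seed(1)
n = 50
means = replicate(10000, mean(rexp(n, rate = 1)))   # Exp(1): mu = 1, sigma = 1
z = (means - 1) / (1 / sqrt(n))                     # standardised sample means
hist(z, breaks = 50, freq = FALSE)
curve(dnorm(x), add = TRUE)                         # N(0, 1) reference density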


Page 64

Multivariate Gaussian distributions

Now, in the cases we care about, things are not univariate.

The multivariate Gaussian distribution has density

p(x) = (2π)^{−d/2} |Σ|^{−1/2} exp(−½ (x − µ)ᵀ Σ⁻¹ (x − µ)).

- µ is still the mean (now a d-dimensional vector).
- Σ is a d × d covariance matrix, Σij = Cov(xi, xj).
- For any constant vector a, aᵀx is normally distributed.


Page 67

What does a multivariate Gaussian look like?

[Figure: illustration of a multivariate Gaussian density (from Wikipedia).]

Page 68

Aside: Sampling from multivariate Gaussian distributions

An important and useful skill is sampling from multivariate Gaussians.

- Compute the Cholesky factorisation of the covariance matrix Σ = RRᵀ. Then x = Rz ∼ N(0, Σ) if z ∼ N(0, I).
- Compute the Cholesky factorisation of the precision matrix Q = Σ⁻¹ = LLᵀ. Then x = L⁻ᵀz ∼ N(0, Σ) (both recipes are sketched below).
- General calculations with a Gaussian cost O(n³) flops, where n is the dimension of the problem.
- For large n, we obviously don’t compute with “general” Gaussians!
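A minimal R sketch of both recipes, using an illustrative 2 × 2 covariance matrix (note that R's chol() returns the upper-triangular factor):

set.seed(1)
Sigma = matrix(c(1.0, 0.6,
                 0.6, 2.0), nrow = 2)    # illustrative covariance matrix

# (1) Via the covariance: Sigma = R R^T with R lower triangular.
R  = t(chol(Sigma))
x1 = R %*% rnorm(2)                      # x1 ~ N(0, Sigma)

# (2) Via the precision: Q = Sigma^{-1} = L L^T, then x = L^{-T} z.
Q  = solve(Sigma)
L  = t(chol(Q))
x2 = backsolve(t(L), rnorm(2))           # solves t(L) %*% x2 = z, so x2 ~ N(0, Sigma)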


Page 72

Gaussian random fields

If we have a process that is occurring everywhere in space, it is natural to try to model it using some sort of function.

- This is hard (for good measure theory-ish reasons)!
- We typically make our lives easier by making everything Gaussian.
- What makes a function Gaussian?


Page 75

Gaussian random fields

If we are trying to model u(s), what sort of things do we need?

- We don’t ever observe a function everywhere.
- If u is a vector of observations of u(s) at different locations, we want this to be normally distributed:

  u = (u(s1), . . . , u(sp))ᵀ ∼ N(0, Σ)

- This is actually quite tricky: the covariance matrix Σ will need to depend on the set of observation sites and always has to be positive definite.
- It turns out you can actually do this by setting Σij = c(si, sj) for some covariance function c(·, ·).
- Not every function will ensure that Σ is positive definite!


Page 80

Gaussian random fields

Defn: Gaussian random fields

A Gaussian random field u(s) is defined by a mean function µ(s) and a covariance function c(s1, s2). It has the property that, for every finite collection of points {s1, . . . , sp},

u ≡ (u(s1), . . . , u(sp))ᵀ ∼ N(µ, Σ),

where µi = µ(si) and Σij = c(si, sj) (a small R sketch of this construction follows below).

- Σ will almost never be sparse.
- It is typically very hard to find families of parameterised covariance functions.
- It isn’t straightforward to make this work for multivariate or spatiotemporal processes, or for processes on non-flat spaces.
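As a hedged illustration of the definition (the exponential covariance function and the range value are example choices, not from the slides), one can build Σ from c(·, ·) at a finite set of sites and draw a realisation:

s = seq(0, 1, length.out = 200)            # observation sites in 1-d
Sigma = exp(-as.matrix(dist(s)) / 0.1)     # c(s_i, s_j) = exp(-|s_i - s_j| / range), range = 0.1

u = t(chol(Sigma)) %*% rnorm(length(s))    # one realisation, u ~ N(0, Sigma)
plot(s, u, type = "l", xlab = "s", ylab = "u(s)")

# Note: Sigma is dense, and chol() costs O(p^3) flops -- exactly the
# bottleneck that the sparse-precision (GMRF) representation avoids.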