Formulation of model likelihood functions


The most useful representation of stochastic models

June 12, 2017, Andreas Scheidegger

Eawag: Swiss Federal Institute of Aquatic Science and Technology


“Statistical models are stories about how the data came to be.” – Dave Harris

Andreas Scheidegger Motivation 1


What is a likelihood function?

Definition: The likelihood function $p(y_1, \ldots, y_n \mid \theta)$ or $L(\theta)$ is the joint probability (density) of the observations $\{y_1, \ldots, y_n\}$ given a stochastic model with parameter values $\theta$.

Informal: If we simulate output data similar to my measurements with a stochastic model, setting the parameters equal to $\theta$, what is the probability (density) of obtaining exactly $\{y_1, \ldots, y_n\}$?
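To make the informal reading concrete, here is a minimal R sketch. It assumes, purely for illustration, three hypothetical observations and a model in which each $y_i$ is independently normal with mean $\theta$ and a known standard deviation of 1. The likelihood is the joint density evaluated at the observed values; a larger value means the parameter makes these observations more plausible.

```r
## hypothetical observations (not from the slides)
y.obs <- c(1.2, 0.8, 1.5)

## joint density of y.obs under the assumed model: iid N(theta, 1)
likelihood <- function(theta) prod(dnorm(y.obs, mean = theta, sd = 1))

L.a <- likelihood(1.0)   # theta close to the sample mean
L.b <- likelihood(3.0)   # theta far away from the data
print(c(L.a, L.b))
```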


What do we need likelihood functions for?

Many parameter calibration and prediction techniques require that the model be described by its likelihood function:

Frequentist statistics: maximum likelihood estimator (MLE), likelihood-ratio (LR) tests, ...

Bayesian statistics: parameter inference, uncertainty propagation, predictions, model comparison, ... → topic of this course

Note: The actual value of the likelihood function per se is usually not of interest.
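As a minimal illustration of the frequentist use, the sketch below finds a maximum likelihood estimate with base R's `optimize()`. The data and the normal model with known standard deviation 1 are assumptions made for the example. It also illustrates the note: adding a constant to the log-likelihood (i.e., multiplying the likelihood by a constant) leaves the maximizer unchanged.

```r
y.obs <- c(1.2, 0.8, 1.5)   # hypothetical data

## log-likelihood of iid N(theta, 1) observations
logL <- function(theta) sum(dnorm(y.obs, mean = theta, sd = 1, log = TRUE))

## the same function shifted by an arbitrary constant
logL.shifted <- function(theta) logL(theta) + 100

mle <- optimize(logL, interval = c(-10, 10), maximum = TRUE)$maximum
mle.shifted <- optimize(logL.shifted, interval = c(-10, 10), maximum = TRUE)$maximum

## both should be close to mean(y.obs), the analytic MLE for this model
print(c(mle, mle.shifted, mean(y.obs)))
```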


How to formulate likelihood functions?

Often, models are not described by their likelihood function. A common description looks more like this:

$$Y_i = M(x_i, \theta) + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2)$$

While this is a complete description of the stochastic model¹, it is not directly useful for inference → we must translate such a description into $p(y \mid \theta, x)$.

¹ $M(x_i, \theta)$ is a deterministic function. The complete model, however, is stochastic because we added a random error term $\varepsilon_i$.
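A description like this is generative: it tells us how to simulate data. The minimal sketch below assumes a hypothetical linear deterministic part $M(x, \theta) = \theta_1 + \theta_2 x$ and arbitrary parameter values, simulates observations from the stochastic model, and then evaluates the corresponding log-likelihood, which for independent normal errors is the sum of normal log densities.

```r
set.seed(1)

## hypothetical deterministic model M(x, theta) (an assumption for illustration)
M <- function(x, theta) theta[1] + theta[2] * x

x <- seq(0, 10, length.out = 50)
theta.true <- c(2, 0.5)
sigma <- 1

## stochastic model: deterministic output plus a normal random error
y.sim <- M(x, theta.true) + rnorm(length(x), mean = 0, sd = sigma)

## log p(y | theta, x) for independent normal errors
logL <- function(theta, sigma, y, x) {
  sum(dnorm(y, mean = M(x, theta), sd = sigma, log = TRUE))
}

logL(theta.true, sigma, y.sim, x)
logL(c(0, 0), sigma, y.sim, x)   # a poor parameter guess
```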


Derivation of a likelihood function

1. Decompose the joint probability density:

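$$p(y \mid \theta) = p(y_1, \ldots, y_n \mid \theta) = p(y_1 \mid \theta)\, p(y_2 \mid \theta, y_1)\, p(y_3 \mid \theta, y_1, y_2) \cdots p(y_n \mid \theta, y_1, \ldots, y_{n-1})$$

2. Formulate the conditional probabilities:

$$p(y_i \mid \theta, y_1, \ldots, y_{i-1})$$

If the observations are independent: $p(y_i \mid \theta, y_1, \ldots, y_{i-1}) = p(y_i \mid \theta)$.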
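The decomposition in step 1 is the product rule of probability and holds for any joint density. A small numerical check in R, assuming a standard bivariate normal distribution with correlation $\rho = 0.6$ (an arbitrary choice): the joint density equals $p(y_1)\,p(y_2 \mid y_1)$, where for this distribution $y_2 \mid y_1 \sim N(\rho y_1, 1 - \rho^2)$.

```r
rho <- 0.6
y <- c(0.3, -1.1)   # an arbitrary evaluation point

## joint density of a bivariate normal with unit variances, computed directly
Sigma <- matrix(c(1, rho, rho, 1), nrow = 2)
joint <- exp(-0.5 * drop(t(y) %*% solve(Sigma) %*% y)) / (2 * pi * sqrt(det(Sigma)))

## the same density via the decomposition p(y1) * p(y2 | y1)
decomposed <- dnorm(y[1], mean = 0, sd = 1) *
  dnorm(y[2], mean = rho * y[1], sd = sqrt(1 - rho^2))

print(c(joint, decomposed))
```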


Some (informal) advice

• First formulate the likelihood in general terms, without specific distribution assumptions.
• Think (informally!) of p(x) as Prob(X = x), and replace sums with integrals.
• In practice, a function that is proportional to the likelihood function is sufficient.
• The logarithmic scale is preferred for computation.
• Don't worry about the identifiability of the parameters at this stage.
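Why the logarithmic scale? A likelihood is a product of many numbers, often far below one, which quickly underflows to zero in floating-point arithmetic. A minimal sketch with 2000 simulated standard-normal observations (an arbitrary example) shows the direct product collapsing to 0 while the sum of log densities stays finite and usable.

```r
set.seed(42)
y <- rnorm(2000)   # simulated data, for illustration only

## direct product of densities: underflows to exactly 0
L.direct <- prod(dnorm(y))

## sum of log densities: a finite, usable number
LL <- sum(dnorm(y, log = TRUE))

print(c(L.direct, LL))
```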


Example 1: sex ratio (discrete data)

Observed data y: the sex of n newborns.

Model description: We assume the probability of a girl is $\theta$ and of a boy $1 - \theta$.


Probability for a single observation:

$$\text{Prob}(y_i \mid \theta) = \begin{cases} \theta, & y_i = \text{girl} \\ 1 - \theta, & y_i = \text{boy} \end{cases}$$

Independence is a reasonable assumption:

$$\text{Prob}(y_1, \ldots, y_n \mid \theta) = \prod_{i=1}^{n} \text{Prob}(y_i \mid \theta) = \theta^{\#\text{girls}} (1 - \theta)^{\#\text{boys}}$$
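A short worked derivation shows how this likelihood can be used for maximum likelihood estimation: setting the derivative of the log-likelihood to zero yields the observed proportion of girls.

$$
\begin{aligned}
\log L(\theta) &= \#\text{girls}\,\log\theta + \#\text{boys}\,\log(1-\theta) \\
\frac{d \log L}{d\theta} &= \frac{\#\text{girls}}{\theta} - \frac{\#\text{boys}}{1-\theta} = 0
\quad\Longrightarrow\quad
\hat\theta = \frac{\#\text{girls}}{\#\text{girls} + \#\text{boys}}
\end{aligned}
$$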


R implementation as a function:

logL <- function(theta, n.girls, n.boys) {
  LL <- n.girls*log(theta) + n.boys*log(1-theta)
  return(LL)
}

Call:

logL(theta=0.4, n.girls=10, n.boys=5)
## -11.717035
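Continuing the example, the maximum likelihood estimate can also be found numerically. A minimal sketch with base R's `optimize()`, using the counts from the call above, should land close to the analytic value #girls / n = 10/15. The log-likelihood is restated so the block runs on its own.

```r
## log-likelihood of the sex-ratio example (restated from above)
logL <- function(theta, n.girls, n.boys) {
  n.girls * log(theta) + n.boys * log(1 - theta)
}

## extra named arguments are passed on to logL
fit <- optimize(logL, interval = c(0, 1), maximum = TRUE,
                n.girls = 10, n.boys = 5)
fit$maximum   # numerically close to 10/15
```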


Example 2: rating curve (continuous data)

Observed data y, x: n pairs of water level $x_i$ and runoff $y_i$.

Model description: Water level x and runoff y are related as

$$y = RC(x, \theta) = \theta_1 (x - \theta_2)^{\theta_3}$$

Figure: Rating curve of Sluzew Creek. Sikorska et al. (2013)


A deterministic model? → We must make assumptions about the error distribution, e.g.,

$$Y_i = RC(x_i, \theta) + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2)$$

or equivalently

$$Y_i \sim N\big(RC(x_i, \theta), \sigma^2\big)$$

The RC model describes only the expected value of an observation for a given $x_i$.

So the pdf for a single observation is the density of a normal distribution²:

$$p(y_i \mid x_i, \theta, \sigma) = \frac{1}{\sigma}\, \varphi\!\left(\frac{y_i - RC(x_i, \theta)}{\sigma}\right)$$

Finally, assuming independent observations:

$$p(y_1, \ldots, y_n \mid x_1, \ldots, x_n, \theta, \sigma) = \prod_{i=1}^{n} p(y_i \mid x_i, \theta, \sigma)$$

² $\varphi(x) = \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{x^2}{2}\right\}$
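The density formula can be checked numerically: rescaling the standard normal density $\varphi$ by $1/\sigma$ gives exactly what R's `dnorm()` returns for mean $RC(x_i, \theta)$ and standard deviation $\sigma$. The numbers below (a model prediction of 2.0, $\sigma = 0.3$, and an observation of 2.4) are arbitrary.

```r
## standard normal density, written out as in the footnote
phi <- function(z) exp(-z^2 / 2) / sqrt(2 * pi)

mu <- 2.0      # stands in for RC(x_i, theta); arbitrary value
sigma <- 0.3
y.i <- 2.4

by.formula <- phi((y.i - mu) / sigma) / sigma
by.dnorm <- dnorm(y.i, mean = mu, sd = sigma)

print(c(by.formula, by.dnorm))
```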


Figure: Rating curve, runoff Y plotted against water level X with the curve RC(X, θ). Example of a non-linear regression.

## deterministic rating curve model
RC <- function(x, theta) {
  y <- theta[1]*(x-theta[2])^theta[3]
  return(y)
}

## log likelihood with normally distributed errors
## sigma is included as theta[4] = sigma
logL <- function(theta, y.data, x.data) {
  mean.y <- RC(x.data, theta[1:3])   # mean value for y
  LL <- sum(dnorm(y.data, mean=mean.y, sd=theta[4], log=TRUE))
  return(LL)
}
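A minimal usage sketch: simulate rating-curve data from assumed "true" parameters, then maximize the log-likelihood with base R's `optim()` (Nelder-Mead, with `fnscale = -1` so that it maximizes). The parameter values, water levels, and starting point are all invented for illustration. Because `(x - theta[2])^theta[3]` is `NaN` when x < θ2 and θ3 is not an integer, this sketch adds a guard (not part of the slide's function) that returns `-Inf` for such parameters and for a non-positive σ.

```r
set.seed(7)

## rating curve and its normal log-likelihood, restated so the block runs alone
RC <- function(x, theta) theta[1] * (x - theta[2])^theta[3]

logL <- function(theta, y.data, x.data) {
  mean.y <- RC(x.data, theta[1:3])
  ## guard added for this sketch: invalid parameters get -Inf
  if (theta[4] <= 0 || any(!is.finite(mean.y))) return(-Inf)
  sum(dnorm(y.data, mean = mean.y, sd = theta[4], log = TRUE))
}

## synthetic data from assumed "true" parameters
theta.true <- c(5, 0.2, 1.6, 0.4)
x.data <- runif(60, min = 0.5, max = 3)
y.data <- RC(x.data, theta.true[1:3]) + rnorm(60, sd = theta.true[4])

theta.start <- c(4, 0.1, 1.5, 1)
fit <- optim(theta.start, logL, y.data = y.data, x.data = x.data,
             control = list(fnscale = -1, maxit = 5000))
fit$par   # estimates; compare with theta.true
```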


Example 3: limit of quantification (censored data)

Observed data y:

                lab 1   lab 2   lab 3   ...
concentration   y1      y2      n.d.    ...

(n.d. = not detected, i.e., below the limit of quantification)

Limit of quantification: LOQ
Standard deviation of measurements: σ

Model description: “A model? I just want to calculate the concentration.”


Figure: Left censored data.

Model description: The measurements are normally distributed around the true mean $\theta$ with standard deviation $\sigma$.


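Likelihood for a single measured observation:

$$p(y_i \mid \theta, \sigma) = \frac{1}{\sigma}\, \varphi\!\left(\frac{y_i - \theta}{\sigma}\right)$$

Likelihood for a single “not detected” observation:

$$\text{Prob}(\text{n.d.} \mid \theta, \sigma) = \text{Prob}(y_i < \text{LOQ} \mid \theta, \sigma) = \int_{-\infty}^{\text{LOQ}} p(y \mid \theta, \sigma)\, dy = \Phi\!\left(\frac{\text{LOQ} - \theta}{\sigma}\right)$$

where $\Phi$ is the cumulative distribution function of the standard normal distribution. Assuming independent observations, the likelihood of all observations is

$$p(y_1, \ldots, y_n \mid \theta, \sigma) = \text{Prob}(y < \text{LOQ} \mid \theta, \sigma)^{\#\text{censored}} \prod_{i \,\in\, \text{not censored}} p(y_i \mid \theta, \sigma)$$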
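The integral can be checked numerically with base R's `integrate()`; the values θ = 0.4, σ = 0.1 and LOQ = 0.3 are arbitrary. Under the normal model the probability of a non-detect is the area under the density from −∞ up to LOQ, which `pnorm()` returns in closed form.

```r
theta <- 0.4; sigma <- 0.1; LOQ <- 0.3   # arbitrary example values

## area under the normal density below the limit of quantification
p.integral <- integrate(dnorm, lower = -Inf, upper = LOQ,
                        mean = theta, sd = sigma)$value

## closed form Phi((LOQ - theta) / sigma)
p.closed <- pnorm((LOQ - theta) / sigma)

print(c(p.integral, p.closed))   # both approximately 0.1587
```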


## data; left-censored observations are marked as "nd"
y <- c(y1=0.35, y2=0.45, y3="nd", y4="nd", y5=0.4)

## log likelihood
logL <- function(theta, y, sigma, LOQ) {
  ## number of censored observations
  n.censored <- sum(y=="nd")
  ## convert non-censored observations to type 'numeric'
  y.not.cen <- as.numeric(y[y!="nd"])

  ## likelihood for non-censored observations
  LL.not.cen <- sum(dnorm(y.not.cen, mean=theta, sd=sigma, log=TRUE))
  ## likelihood for left-censored observations: n.censored * log Prob(y < LOQ)
  LL.left.cen <- n.censored * pnorm(LOQ, mean=theta, sd=sigma, log.p=TRUE)
  return(LL.not.cen + LL.left.cen)
}
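A possible use of this likelihood, sketched under stated assumptions: with the example data and assumed values σ = 0.1 and LOQ = 0.3 (the slide leaves both unspecified), maximize the censored log-likelihood over θ with `optimize()` and compare the result with the mean of the detected values alone. Because every non-detect is evidence of a low concentration, the censored estimate lies below that naive mean. The name `logL.cens` is hypothetical; it restates the slide's likelihood in compact form so the block runs on its own.

```r
## detected values and number of non-detects from the example data
y.detected <- c(0.35, 0.45, 0.4)
n.censored <- 2
sigma <- 0.1   # assumed known
LOQ <- 0.3     # assumed value

## censored log-likelihood (hypothetical name, same structure as the slide)
logL.cens <- function(theta) {
  sum(dnorm(y.detected, mean = theta, sd = sigma, log = TRUE)) +
    n.censored * pnorm(LOQ, mean = theta, sd = sigma, log.p = TRUE)
}

theta.cens <- optimize(logL.cens, interval = c(0, 1), maximum = TRUE)$maximum
theta.naive <- mean(y.detected)   # ignores the non-detects

print(c(censored = theta.cens, naive = theta.naive))
```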


Example 4: auto-regressive model (auto-correlated data)

Observed data y: equally spaced time series data $y_1, \ldots, y_n$.

Model description: classical AR(1) model:

$$y_{t+1} = \theta y_t + \varepsilon_{t+1}, \quad \varepsilon_{t+1} \sim N(0, \sigma^2), \qquad y_0 = k$$

Figure: Annual water level of Lake Huron (water level in feet over time). Brockwell and Davis (1991)


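Each observation depends only on the preceding observation. Hence:

$$p(y_1, \ldots, y_n \mid \theta, \sigma, y_0) = \prod_{i=1}^{n} p(y_i \mid y_{i-1}, \theta, \sigma)$$

The conditional probabilities are all normal:

$$p(y_t \mid y_0, \ldots, y_{t-1}, \theta, \sigma) = p(y_t \mid y_{t-1}, \theta, \sigma) = \frac{1}{\sigma}\, \varphi\!\left(\frac{y_t - \theta y_{t-1}}{\sigma}\right)$$

## the first observation is conditioned on y0 = k, so its mean is theta*k
LL <- dnorm(y[1], mean=theta*k, sd=sigma, log=TRUE) +
      sum(dnorm(y[2:n], mean=theta*y[1:(n-1)], sd=sigma, log=TRUE))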
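A self-contained sketch of the full AR(1) log-likelihood, assuming hypothetical values θ = 0.7, σ = 1 and k = 0 for a simulated series. For fixed σ the log-likelihood is quadratic in θ, so its maximizer has the closed form $\sum_t y_t y_{t-1} / \sum_t y_{t-1}^2$ (with $y_0 = k$); the numerical optimum from `optimize()` should agree with it.

```r
set.seed(3)

## simulate an AR(1) series y_1..y_n started at y_0 = k
n <- 200; theta.true <- 0.7; sigma <- 1; k <- 0
y <- numeric(n)
y.prev <- k
for (t in 1:n) {
  y[t] <- theta.true * y.prev + rnorm(1, sd = sigma)
  y.prev <- y[t]
}

## log p(y_1, ..., y_n | theta, sigma, y_0 = k)
logL.ar1 <- function(theta, sigma, y, k) {
  n <- length(y)
  dnorm(y[1], mean = theta * k, sd = sigma, log = TRUE) +
    sum(dnorm(y[2:n], mean = theta * y[1:(n - 1)], sd = sigma, log = TRUE))
}

## numerical maximizer over theta (sigma held fixed)
theta.num <- optimize(logL.ar1, interval = c(-2, 2), maximum = TRUE,
                      sigma = sigma, y = y, k = k, tol = 1e-10)$maximum

## closed-form maximizer: regress y_t on y_{t-1} without intercept
y.lag <- c(k, y[1:(n - 1)])
theta.closed <- sum(y * y.lag) / sum(y.lag^2)

print(c(theta.num, theta.closed))
```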


Normality and “iid”

Reality is normally not normally distributed.

Typical statistical assumptions, such as
• normality
• independence

are often chosen from a computational point of view.

However, other distributional assumptions can be incorporated easily in most cases.


Rating curve modified

Let's assume we observe more extreme values than are compatible with a normal distribution → try a t-distribution.

## log likelihood with t-distributed errors
## theta[4] = scale, theta[5] = degrees of freedom
logL <- function(theta, y.data, x.data) {
  mean.y <- RC(x.data, theta[1:3])          # mean value for y
  residuals <- (y.data - mean.y)/theta[4]   # scaled residuals
  ## density of a scaled t-distribution is dt(residual)/scale,
  ## hence the -log(scale) term for each observation
  LL <- sum(dt(residuals, df=theta[5], log=TRUE) - log(theta[4]))
  return(LL)
}
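A quick sanity check of the scaling term, using arbitrary example values: as the degrees of freedom grow, a scaled t-distribution approaches a normal distribution, so `dt(r, df) - log(scale)` should approach `dnorm(y, mean, sd = scale)` on the log scale. Without the `- log(scale)` term the two would differ by log(scale) per observation.

```r
y <- c(1.8, 2.1, 2.9, 0.7)   # arbitrary observations
mu <- 2.0                    # stands in for RC(x_i, theta)
s <- 0.5                     # scale, plays the role of theta[4]

## scaled t log density with very many degrees of freedom
ll.t <- dt((y - mu) / s, df = 1e6, log = TRUE) - log(s)

## normal log density with sd equal to the scale
ll.norm <- dnorm(y, mean = mu, sd = s, log = TRUE)

print(max(abs(ll.t - ll.norm)))   # small
```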


Summary

1. Decompose the joint probability density:

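$$p(y \mid \theta) = p(y_1, \ldots, y_n \mid \theta) = p(y_1 \mid \theta)\, p(y_2 \mid \theta, y_1)\, p(y_3 \mid \theta, y_1, y_2) \cdots p(y_n \mid \theta, y_1, \ldots, y_{n-1})$$

2. Make assumptions to formulate the conditional probabilities:

$$p(y_i \mid \theta, y_1, \ldots, y_{i-1})$$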

3. Perform inference, check the assumptions, and revise them if necessary.