Formulation of model likelihood functions

Post on 22-Jan-2018


The most useful representation of stochastic models

June 12, 2017

Andreas Scheidegger

Eawag: Swiss Federal Institute of Aquatic Science and Technology

Statistical models are stories about how the data came to be – Dave Harris

Andreas Scheidegger Motivation 1

What is a likelihood function?

Definition
The likelihood function $p(y_1, \dots, y_n \mid \theta)$, or $L(\theta)$, is the joint probability (density) of the observations $\{y_1, \dots, y_n\}$ given a stochastic model with parameter values $\theta$.

Informal
If we simulate output data similar to our measurements with a stochastic model while setting the parameters equal to $\theta$, what is the probability (density) that we obtain $\{y_1, \dots, y_n\}$?


For what do we need likelihood functions?

Many parameter calibration and prediction techniques require that the model is described by its likelihood function:

Frequentist statistics:
Maximum likelihood estimator (MLE), LR-tests, ...

Bayesian statistics:
Parameter inference, uncertainty propagation, predictions, model comparison, ...
→ topic of this course

Note: The actual value of the likelihood function per se is usually not of interest.


How to formulate likelihood functions?

Often, models are not described by the likelihood function. A common description may rather look like this:

$$Y_i = M(x_i, \theta) + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2)$$

While this is a complete description of the stochastic model¹, it is not directly useful for inference → we must translate such a description into $p(\mathbf{y} \mid \theta, \mathbf{x})$.

¹ $M(x_i, \theta)$ is a deterministic function. The complete model, however, is stochastic because we added a random error term $\varepsilon_i$.


Derivation of a likelihood function

1. Decompose the joint probability density:

$$p(\mathbf{y} \mid \theta) = p(y_1, \dots, y_n \mid \theta) = p(y_1 \mid \theta)\, p(y_2 \mid \theta, y_1)\, p(y_3 \mid \theta, y_1, y_2) \cdots p(y_n \mid \theta, y_1, \dots, y_{n-1})$$

2. Formulate the conditional probabilities:

$$p(y_i \mid \theta, y_1, \dots, y_{i-1})$$

If the observations are independent: $p(y_i \mid \theta, y_1, \dots, y_{i-1}) = p(y_i \mid \theta)$.


Some (informal) advice

• First formulate the likelihood in general terms, without specific distributional assumptions.
• Think (informally!) of p(x) as Prob(X = x) and change sums to integrals.
• In practice, a function that is proportional to the likelihood function is sufficient.
• The logarithmic scale is preferred for computation.
• Don't worry about the identifiability of the parameters at this stage.
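Why the logarithmic scale is preferred can be illustrated with a small sketch. The data here are simulated and purely hypothetical (2000 independent standard-normal values); the point is only that the product of many densities underflows in double precision, while the sum of log densities does not.

```r
## Hypothetical data: 2000 independent standard-normal observations
set.seed(42)
y <- rnorm(2000)

## Direct product of the densities: underflows to exactly 0
L.direct <- prod(dnorm(y, mean=0, sd=1))

## Sum of the log densities: stays finite and usable
LL <- sum(dnorm(y, mean=0, sd=1, log=TRUE))

print(L.direct)
print(LL)
```

Taking `log(L.direct)` afterwards would give `-Inf`, so the log must be applied to each density before combining them.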


Example 1: sex ratio
discrete data

Observed data y
The gender of n newborns.

Model description
We assume the probability for a girl is θ and for a boy 1 − θ.


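Probability for a single observation:

$$\mathrm{Prob}(y_i \mid \theta) = \begin{cases} \theta, & y_i = \text{girl} \\ 1 - \theta, & y_i = \text{boy} \end{cases}$$

Independence is a reasonable assumption:

$$\mathrm{Prob}(y_1, \dots, y_n \mid \theta) = \prod_{i=1}^{n} \mathrm{Prob}(y_i \mid \theta) = \theta^{\#\text{girls}} (1 - \theta)^{\#\text{boys}}$$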


R implementation as function:

    logL <- function(theta, n.girls, n.boys) {
      LL <- n.girls*log(theta) + n.boys*log(1-theta)
      return(LL)
    }

Call:

    logL(theta=0.4, n.girls=10, n.boys=5)
    > -11.717035
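As a possible next step, the log-likelihood can be maximized numerically to obtain the maximum likelihood estimate mentioned earlier. This is a minimal sketch using `optimize()` from base R with the same counts as in the call above; the search interval is an assumption chosen to avoid `log(0)` at the boundaries.

```r
## log-likelihood of the sex ratio model, as defined above
logL <- function(theta, n.girls, n.boys) {
  n.girls*log(theta) + n.boys*log(1-theta)
}

## numerical maximum over theta; the interval bounds are an assumption
fit <- optimize(logL, interval=c(0.001, 0.999),
                n.girls=10, n.boys=5, maximum=TRUE)
theta.hat <- fit$maximum

## analytical maximizer for comparison: #girls / n
theta.analytic <- 10/(10 + 5)

print(c(theta.hat, theta.analytic))
```

For this simple model the numerical result should agree with the closed-form estimate up to the tolerance of the optimizer.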


Example 2: rating curve
continuous data

Observed data y, x
n pairs of water level $x_i$ and run-off $y_i$.

Model description
Water level x and run-off y are related as

$$y = RC(x, \theta) = \theta_1 (x - \theta_2)^{\theta_3}$$

Figure: Rating curve of Sluzew Creek. Sikorska et al. (2013)


A deterministic model?
→ We must make assumptions about the error distribution. E.g.,

$$Y_i = RC(x_i, \theta) + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2)$$

or, equivalently,

$$Y_i \sim N\left(RC(x_i, \theta), \sigma^2\right)$$

The RC model describes only the expected value of an observation for a given $x_i$.

So the pdf for a single observation is the density of a normal distribution²:

$$p(y_i \mid x_i, \theta, \sigma) = \frac{1}{\sigma}\, \varphi\left(\frac{y_i - RC(x_i, \theta)}{\sigma}\right)$$

Finally, assuming independent observations:

$$p(y_1, \dots, y_n \mid x_1, \dots, x_n, \theta, \sigma) = \prod_{i=1}^{n} p(y_i \mid x_i, \theta, \sigma)$$

² $\varphi(x) = \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{x^2}{2}\right\}$


Figure: Rating curve. Example of a non-linear regression. Axes: X (water level), Y (runoff); curve: RC(X, θ).

    ## deterministic rating curve model
    RC <- function(x, theta) {
      y <- theta[1]*(x-theta[2])^theta[3]
      return(y)
    }

    ## log-likelihood with normally distributed errors
    ## sigma is included as theta[4]=sigma.
    logL <- function(theta, y.data, x.data) {
      mean.y <- RC(x.data, theta[1:3])   # mean value for y
      LL <- sum(dnorm(y.data, mean=mean.y, sd=theta[4], log=TRUE))
      return(LL)
    }
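To see how this log-likelihood behaves, it can be evaluated on simulated data. The following is only a sketch: the "true" parameter values, the water levels, and the sample size are hypothetical and not taken from the Sluzew Creek data.

```r
## rating curve and log-likelihood as defined above
RC <- function(x, theta) theta[1]*(x - theta[2])^theta[3]
logL <- function(theta, y.data, x.data) {
  mean.y <- RC(x.data, theta[1:3])
  sum(dnorm(y.data, mean=mean.y, sd=theta[4], log=TRUE))
}

## hypothetical parameters (theta1, theta2, theta3, sigma) used to simulate data
set.seed(1)
theta.true <- c(2, 0.1, 1.5, 0.2)
x.data <- seq(0.5, 3, length.out=50)   # water levels, all above theta2
y.data <- RC(x.data, theta.true[1:3]) + rnorm(50, mean=0, sd=theta.true[4])

## the data-generating parameters should score higher than a wrong exponent
LL.true  <- logL(theta.true, y.data, x.data)
LL.wrong <- logL(c(2, 0.1, 1.0, 0.2), y.data, x.data)

print(c(LL.true, LL.wrong))
```

Only the difference between the two values matters here, in line with the earlier note that the absolute value of the likelihood is usually not of interest.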


Example 3: limit of quantification
censored data

Observed data y

                     lab 1    lab 2    lab 3    ...
    concentration    y1       y2       n.d.     ...

Limit of quantification: LOQ
Standard deviation of measurements: σ

Model description
“A model? I just want to calculate the concentration.”


Figure: Left censored data.

Model description
The measurements are normally distributed around the true mean θ with standard deviation σ.


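Likelihood for a single measured observation:

$$p(y_i \mid \theta, \sigma) = \frac{1}{\sigma}\, \varphi\left(\frac{y_i - \theta}{\sigma}\right)$$

Likelihood for a single “not detected” observation:

$$\mathrm{Prob}(\text{n.d.} \mid \theta, \sigma) = \mathrm{Prob}(y_i < \mathrm{LOQ} \mid \theta, \sigma) = \int_{-\infty}^{\mathrm{LOQ}} p(y \mid \theta, \sigma)\, dy = \Phi\left(\frac{\mathrm{LOQ} - \theta}{\sigma}\right)$$

Combined, for independent observations:

$$p(y_1, \dots, y_n \mid \theta, \sigma) = \mathrm{Prob}(y_i < \mathrm{LOQ} \mid \theta, \sigma)^{\#\text{censored}} \prod_{\neg\text{censored}} p(y_i \mid \theta, \sigma)$$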


    ## data, if left censored = "nd"
    y <- c(y1=0.35, y2=0.45, y3="nd", y4="nd", y5=0.4)

    ## log-likelihood
    logL <- function(theta, y, sigma, LOQ) {
      ## number of censored observations
      n.censored <- sum(y=="nd")
      ## convert not censored observations into type 'numeric'
      y.not.cen <- as.numeric(y[y!="nd"])

      ## log-likelihood for not censored observations
      LL.not.cen <- sum(dnorm(y.not.cen, mean=theta, sd=sigma, log=TRUE))
      ## log-likelihood for left censored observations
      LL.left.cen <- n.censored * pnorm(LOQ, mean=theta, sd=sigma, log.p=TRUE)
      return(LL.not.cen + LL.left.cen)
    }
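A short usage sketch shows what the censored observations contribute. The values of `sigma` and `LOQ` below are assumptions made for illustration only (the slides do not give them); the data vector is the one from the code above.

```r
## data and log-likelihood as defined above
y <- c(y1=0.35, y2=0.45, y3="nd", y4="nd", y5=0.4)
logL <- function(theta, y, sigma, LOQ) {
  n.censored <- sum(y=="nd")
  y.not.cen <- as.numeric(y[y!="nd"])
  LL.not.cen <- sum(dnorm(y.not.cen, mean=theta, sd=sigma, log=TRUE))
  LL.left.cen <- n.censored * pnorm(LOQ, mean=theta, sd=sigma, log.p=TRUE)
  LL.not.cen + LL.left.cen
}

## assumed values, not from the slides
sigma <- 0.1
LOQ <- 0.3

## maximum likelihood estimate of the true concentration theta
fit <- optimize(logL, interval=c(0, 1), y=y, sigma=sigma, LOQ=LOQ,
                maximum=TRUE)
theta.hat <- fit$maximum

## naive estimate that simply ignores the "nd" values
naive.mean <- mean(as.numeric(y[y!="nd"]))

print(c(theta.hat, naive.mean))
```

Under these assumptions the estimate that accounts for censoring lies below the naive mean of the detected values, because the two "nd" entries carry the information that some measurements fell below the LOQ.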


Example 4: auto-regressive model
auto-correlated data

Observed data y
Equally spaced time series data $y_1, \dots, y_n$.

Model description
Classical AR(1) model:

$$y_{t+1} = \theta y_t + \varepsilon_{t+1}, \quad \varepsilon_{t+1} \sim N(0, \sigma^2), \quad y_0 = k$$

Figure: Annual water level of Lake Huron (axes: time, water level [feet]). Brockwell and Davis (1991)


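Each observation depends only on the preceding observation. Hence:

$$p(y_1, \dots, y_n \mid \theta, \sigma, y_0) = \prod_{i=1}^{n} p(y_i \mid y_{i-1}, \theta, \sigma)$$

The conditional probabilities are all normal:

$$p(y_t \mid y_0, \dots, y_{t-1}, \theta, \sigma) = p(y_t \mid y_{t-1}, \theta, \sigma) = \frac{1}{\sigma}\, \varphi\left(\frac{y_t - \theta y_{t-1}}{\sigma}\right)$$

    ## y: observed series y[1], ..., y[n]; k: known initial value y0
    n <- length(y)
    LL <- dnorm(y[1], mean=theta*k, sd=sigma, log=TRUE) +
      sum(dnorm(y[2:n], mean=theta*y[1:(n-1)], sd=sigma, log=TRUE))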
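The AR(1) log-likelihood can be wrapped into a function and tried on a simulated series. This is a sketch under stated assumptions: the function name `logL.ar1` is hypothetical, and the values θ = 0.8, σ = 1, k = 0 and n = 200 are chosen only for illustration. Following the model equation, the first observation has mean θ·k.

```r
## AR(1) log-likelihood; the name logL.ar1 is hypothetical
logL.ar1 <- function(theta, sigma, y, k) {
  n <- length(y)
  y.prev <- c(k, y[1:(n-1)])   # y0 = k precedes y1
  sum(dnorm(y, mean=theta*y.prev, sd=sigma, log=TRUE))
}

## simulate a series with assumed values theta=0.8, sigma=1, k=0
set.seed(123)
n <- 200
y <- numeric(n)
y.last <- 0                    # y0 = k = 0
for (t in 1:n) {
  y[t] <- 0.8*y.last + rnorm(1, mean=0, sd=1)
  y.last <- y[t]
}

## maximize over theta, with sigma held at its simulated value
fit <- optimize(logL.ar1, interval=c(-0.99, 0.99),
                sigma=1, y=y, k=0, maximum=TRUE)
theta.hat <- fit$maximum

print(theta.hat)
```

With a series of this length the estimate should land reasonably close to the value used in the simulation.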


Normality and “iid.”
Reality is normally not normally distributed

Typical statistical assumptions, such as
• normality
• independence
are often chosen from a computational viewpoint.

However, other distributional assumptions can be incorporated easily in most cases.


Rating curve modified

Let's assume we observe more extreme values than are compatible with a normal distribution → try a t-distribution.

    ## log-likelihood with t-distributed errors
    ## theta[4]=scale, theta[5]=degrees of freedom.
    logL <- function(theta, y.data, x.data) {
      mean.y <- RC(x.data, theta[1:3])          # mean value for y
      residuals <- (y.data - mean.y)/theta[4]   # scaling
      ## density of y is dt(residuals)/scale, hence the -log(scale) term
      LL <- sum(dt(residuals, df=theta[5], log=TRUE) - log(theta[4]))
      return(LL)
    }
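When residuals are divided by a scale parameter, the density of y itself carries a factor 1/scale (change of variables), just like the factor 1/σ in the normal case. The following numerical check is a sketch with arbitrary, assumed values for location, scale, and degrees of freedom; it only verifies that the scaled t density integrates to one when that factor is included.

```r
## assumed location, scale, and degrees of freedom (illustrative values)
mu <- 2
s  <- 3
nu <- 4

## density of y including the 1/scale factor
dens.scaled <- function(y) dt((y - mu)/s, df=nu)/s
## the same expression without the 1/scale factor
dens.unscaled <- function(y) dt((y - mu)/s, df=nu)

area.scaled   <- integrate(dens.scaled, -Inf, Inf)$value
area.unscaled <- integrate(dens.unscaled, -Inf, Inf)$value

print(c(area.scaled, area.unscaled))
```

Without the factor, the expression integrates to the scale itself rather than to one, so a log-likelihood built from it would favor ever larger scale values when the scale is estimated.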


Summary

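1. Decompose the joint probability density:

$$p(\mathbf{y} \mid \theta) = p(y_1, \dots, y_n \mid \theta) = p(y_1 \mid \theta)\, p(y_2 \mid \theta, y_1)\, p(y_3 \mid \theta, y_1, y_2) \cdots p(y_n \mid \theta, y_1, \dots, y_{n-1})$$

2. Make assumptions to formulate the conditional probabilities:

$$p(y_i \mid \theta, y_1, \dots, y_{i-1})$$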

3. Make inference, check assumptions, revise if necessary.
