Formulation of model likelihood functions

The most useful representation of stochastic models

June 12, 2017

Andreas Scheidegger

Eawag: Swiss Federal Institute of Aquatic Science and Technology

Statistical models are stories about how the data came to be – Dave Harris

Andreas Scheidegger Motivation 1

What is a likelihood function?

Definition
The likelihood function $p(y_1, \dots, y_n \mid \theta)$, or $L(\theta)$, is the joint probability (density) of the observations $\{y_1, \dots, y_n\}$ given a stochastic model with parameter values $\theta$.

Informal
If we simulate output data comparable to our measurements with a stochastic model whose parameters are set to $\theta$, what is the probability (density) that we obtain $\{y_1, \dots, y_n\}$?
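A minimal R sketch of this idea: the data stay fixed and the likelihood is read as a function of the parameter. The three observations and the model $y_i \sim N(\theta, 1)$ with independent observations are made up for illustration and are not part of the course material.

```r
## Hypothetical observations (made up for illustration)
y <- c(1.2, 0.8, 1.9)

## Likelihood of theta under the assumed model y_i ~ N(theta, 1), independent
L <- function(theta) prod(dnorm(y, mean = theta, sd = 1))

## The data stay fixed; only theta varies
L.far  <- L(0)        # parameter value far from the data
L.near <- L(mean(y))  # parameter value at the sample mean
print(c(L.far = L.far, L.near = L.near))
```

Parameter values under which the observed data are more plausible receive a higher likelihood.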


What do we need likelihood functions for?

Many parameter calibration and prediction techniques require that the model is described by its likelihood function:

Frequentist statistics: maximum likelihood estimator (MLE), LR-tests, ...

Bayesian statistics: parameter inference, uncertainty propagation, predictions, model comparison, ... → topic of this course

Note: The actual value of the likelihood function per se is usually not of interest.


How to formulate likelihood functions?

Often, models are not described by their likelihood function. A common description may instead look like this:

$Y_i = M(x_i, \theta) + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2)$

While this is a complete description of the stochastic model¹, it is not directly useful for inference → we must translate such a description into $p(y \mid \theta, x)$.

¹ $M(x_i, \theta)$ is a deterministic function. The complete model, however, is stochastic because we added a random error term $\varepsilon_i$.
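As a sketch of that translation, assume in addition that the errors $\varepsilon_i$ are independent and that $\sigma$ is either known or treated as one more parameter (both are assumptions made here). Each $Y_i$ is then normally distributed around $M(x_i, \theta)$, and the joint density factorizes, with $\varphi$ denoting the standard normal density:

```latex
p(y \mid \theta, x)
  = \prod_{i=1}^{n} \frac{1}{\sigma}\,
    \varphi\!\left(\frac{y_i - M(x_i, \theta)}{\sigma}\right),
\qquad
\varphi(z) = \frac{1}{\sqrt{2\pi}} \exp\{-z^2/2\}
```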


Some (informal) advice

• First formulate the likelihood in general terms, without specific distribution assumptions.
• Think (informally!) of p(x) as Prob(X = x) and change sums to integrals.
• In practice, a function that is proportional to the likelihood function is sufficient.
• The logarithmic scale is preferred for computation.
• Don't worry about identifiability of the parameters at this stage.
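The advice to compute on the logarithmic scale can be illustrated with a small R sketch. The data are simulated here purely for illustration: a product of many small densities underflows to zero in double precision, while the sum of the log densities stays finite.

```r
## Hypothetical data: 2000 draws from a standard normal distribution
set.seed(1)
y <- rnorm(2000)

## Likelihood on the natural scale: the product of 2000 densities,
## each below 0.4, underflows to exactly zero in double precision
L.direct <- prod(dnorm(y, mean = 0, sd = 1))

## Log-likelihood: the sum of the log densities stays finite
LL <- sum(dnorm(y, mean = 0, sd = 1, log = TRUE))

print(L.direct)
print(LL)
```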


Example 1: sex ratio (discrete data)

Observed data y
The gender of n newborns.

Model description
We assume the probability of a girl is $\theta$ and of a boy $1 - \theta$.



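Probability for a single observation:

$\mathrm{Prob}(y_i \mid \theta) = \begin{cases} \theta, & y_i = \text{girl} \\ 1 - \theta, & y_i = \text{boy} \end{cases}$

Independence is a reasonable assumption:

$\mathrm{Prob}(y_1, \dots, y_n \mid \theta) = \prod_{i=1}^{n} \mathrm{Prob}(y_i \mid \theta) = \theta^{\#\text{girls}} (1 - \theta)^{\#\text{boys}}$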



R implementation as function:

    logL <- function(theta, n.girls, n.boys) {
      LL <- n.girls*log(theta) + n.boys*log(1-theta)
      return(LL)
    }

Call:

    logL(theta=0.4, n.girls=10, n.boys=5)
    ## returns -11.717035
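One thing such a function can be used for is finding the maximum likelihood estimate. The sketch below repeats the log-likelihood of this example so that it runs on its own and maximizes it numerically with base R's `optimize`; the counts (10 girls, 5 boys) are the ones from the call above. For this simple model the result can be checked against the analytical estimate, the observed share of girls.

```r
## Log-likelihood of the sex-ratio model (same as in the text)
logL <- function(theta, n.girls, n.boys) {
  n.girls*log(theta) + n.boys*log(1 - theta)
}

## Numerical maximization over theta in (0, 1)
fit <- optimize(logL, interval = c(0, 1), maximum = TRUE,
                n.girls = 10, n.boys = 5)
theta.hat <- fit$maximum

## Analytical maximum likelihood estimate: observed share of girls
theta.analytic <- 10 / (10 + 5)

print(c(numerical = theta.hat, analytical = theta.analytic))
```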


Example 2: rating curve (continuous data)

Observed data y, x
n pairs of water level $x_i$ and run-off $y_i$.

Model description
Water level x and run-off y are related as

$y = RC(x, \theta) = \theta_1 (x - \theta_2)^{\theta_3}$

Figure: Rating curve of Sluzew Creek. Sikorska et al. (2013)



A deterministic model?
→ We must make assumptions about the error distribution. E.g.,

$Y_i = RC(x_i, \theta) + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2)$

or equivalently

$Y_i \sim N(RC(x_i, \theta), \sigma^2)$

The RC model describes only the expected value of an observation for a given $x_i$.

So the pdf for a single observation is the density of a normal distribution²

$p(y_i \mid x_i, \theta, \sigma) = \frac{1}{\sigma} \varphi\left(\frac{y_i - RC(x_i, \theta)}{\sigma}\right)$

Finally, assuming independent observations

$p(y_1, \dots, y_n \mid x_1, \dots, x_n, \theta, \sigma) = \prod_{i=1}^{n} p(y_i \mid x_i, \theta, \sigma)$

² $\varphi(x) = \frac{1}{\sqrt{2\pi}} \exp\{-x^2/2\}$




Figure: Rating curve. Example of a non-linear regression. (Axes: X, water level; Y, runoff; curve: RC(X, θ).)



    ## deterministic rating curve model
    RC <- function(x, theta) {
      y <- theta[1]*(x-theta[2])^theta[3]
      return(y)
    }

    ## log likelihood with normally distributed errors
    ## sigma is included as theta[4]=sigma.
    logL <- function(theta, y.data, x.data) {
      mean.y <- RC(x.data, theta[1:3])   # mean value for y
      LL <- sum(dnorm(y.data, mean=mean.y, sd=theta[4], log=TRUE))
      return(LL)
    }
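A quick way to check such a function is to simulate data from the model and compare log-likelihood values. The sketch below repeats the two functions so that it runs on its own. The "true" parameter values, the water levels and the sample size are invented for illustration and have nothing to do with the Sluzew Creek data.

```r
## Rating curve and log-likelihood (same as in the text)
RC <- function(x, theta) theta[1]*(x - theta[2])^theta[3]
logL <- function(theta, y.data, x.data) {
  sum(dnorm(y.data, mean = RC(x.data, theta[1:3]), sd = theta[4], log = TRUE))
}

## Hypothetical "true" parameters: theta1, theta2, theta3, sigma
theta.true <- c(2, 0.1, 1.5, 0.2)

## Synthetic water levels and noisy run-off observations
set.seed(42)
x.sim <- seq(0.5, 3, length.out = 50)
y.sim <- RC(x.sim, theta.true) + rnorm(50, mean = 0, sd = theta.true[4])

## Log-likelihood at the true parameters and at a clearly wrong theta1
LL.true  <- logL(theta.true, y.sim, x.sim)
LL.wrong <- logL(c(3, 0.1, 1.5, 0.2), y.sim, x.sim)
print(c(LL.true = LL.true, LL.wrong = LL.wrong))
```

The same function could be passed to a general-purpose optimizer such as `optim` (with `control = list(fnscale = -1)` to maximize). Note that water levels below `theta[2]` produce `NaN` for non-integer exponents, so starting values need some care.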


Example 3: limit of quantification (censored data)

Observed data y

                   lab 1   lab 2   lab 3   ...
    concentration  y1      y2      n.d.    ...

Limit of quantification: LOQ
Standard deviation of measurements: σ

Model description
“A model? I just want to calculate the concentration.”



Figure: Left censored data.

Model description
The measurements are normally distributed around the true mean θ with standard deviation σ.
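Under this model, and assuming in addition that the measurements are independent (an assumption made here), the likelihood can be sketched as follows. Each quantified value contributes its normal density. Each non-detect contributes the probability that a measurement falls below LOQ. The symbols $O$ (index set of the quantified observations), $n_c$ (number of censored observations) and $\Phi$ (standard normal cumulative distribution function) are introduced here for illustration; $\varphi$ is the standard normal density.

```latex
p(y \mid \theta, \sigma, \mathrm{LOQ})
  = \prod_{i \in O} \frac{1}{\sigma}\,
    \varphi\!\left(\frac{y_i - \theta}{\sigma}\right)
    \times
    \Phi\!\left(\frac{\mathrm{LOQ} - \theta}{\sigma}\right)^{n_c}
```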



    ## data, if left censored = "nd"
    y <- c(y1=0.35, y2=0.45, y3="nd", y4="nd", y5=0.4)

    ## log likelihood
    logL <- function(theta, y, sigma, LOQ) {
      ## number of censored observations
      n.censored <- sum(y=="nd")
      ## convert not censored observations into type 'numeric'
      y.not.cen <- as.numeric(y[y!="nd"])

      ## log likelihood for not censored observations
      LL.not.cen <- sum(dnorm(y.not.cen, mean=theta, sd=sigma, log=TRUE))
      ## log likelihood for left censored observations
      LL.left.cen <- n.censored * pnorm(LOQ, mean=theta, sd=sigma, log.p=TRUE)
      return(LL.not.cen + LL.left.cen)
    }
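To see what the censored observations contribute, the sketch below maximizes this log-likelihood over θ and compares the result with the naive mean of the quantified values. The function is repeated so that the block runs on its own. The values `sigma = 0.1` and `LOQ = 0.3` are assumptions chosen for illustration; the slides do not give them.

```r
## Data and log-likelihood as in the text; "nd" marks a non-detect
y <- c(y1 = 0.35, y2 = 0.45, y3 = "nd", y4 = "nd", y5 = 0.4)
logL <- function(theta, y, sigma, LOQ) {
  n.censored <- sum(y == "nd")
  y.not.cen <- as.numeric(y[y != "nd"])
  sum(dnorm(y.not.cen, mean = theta, sd = sigma, log = TRUE)) +
    n.censored * pnorm(LOQ, mean = theta, sd = sigma, log.p = TRUE)
}

## Assumed values, for illustration only
sigma <- 0.1
LOQ <- 0.3

## Maximum likelihood estimate of the true concentration theta
fit <- optimize(logL, interval = c(0, 1), maximum = TRUE,
                y = y, sigma = sigma, LOQ = LOQ)
theta.hat <- fit$maximum

## Naive estimate that simply drops the non-detects
theta.naive <- mean(as.numeric(y[y != "nd"]))

print(c(censored.MLE = theta.hat, naive = theta.naive))
```

Because the non-detects carry the information that two measurements fell below LOQ, the estimate that accounts for censoring lies below the naive mean.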


Example 4: auto-regressive model (auto-correlated data)

Observed data y
Equally spaced time series data $y_1, \dots, y_n$.

Model description
Classical AR(1) model:

$y_{t+1} = \theta y_t + \varepsilon_{t+1}, \quad \varepsilon_{t+1} \sim N(0, \sigma^2), \quad y_0 = k$

Figure: Annual water level of Lake Huron (x-axis: time; y-axis: water level [feet]). Brockwell and Davis (1991)



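Observations depend only on the preceding observation. Hence:

$p(y_1, \dots, y_n \mid \theta, \sigma, y_0) = \prod_{t=1}^{n} p(y_t \mid y_{t-1}, \theta, \sigma)$

The conditional probabilities are all normal:

$p(y_t \mid y_0, \dots, y_{t-1}, \theta, \sigma) = p(y_t \mid y_{t-1}, \theta, \sigma) = \frac{1}{\sigma} \varphi\left(\frac{y_t - \theta y_{t-1}}{\sigma}\right)$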

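    n <- length(y)
    ## y[1] depends on the initial value y0 = k; y[2:n] on their predecessors
    LL <- dnorm(y[1], mean=theta*k, sd=sigma, log=TRUE) +
      sum(dnorm(y[2:n], mean=theta*y[1:(n-1)], sd=sigma, log=TRUE))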
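The expression can be wrapped in a function and tried on simulated data. In the sketch below the function name `logL.ar1`, the series length and the parameter values are made up for illustration. Following the model equation, the first observation is given the mean θ·k.

```r
## AR(1) log-likelihood, conditional on the initial value y0 = k
logL.ar1 <- function(theta, sigma, y, k) {
  n <- length(y)
  y.prev <- c(k, y[-n])   # predecessor of each observation, starting at y0
  sum(dnorm(y, mean = theta*y.prev, sd = sigma, log = TRUE))
}

## Simulate a hypothetical series from the model
set.seed(123)
n <- 200; theta.true <- 0.8; sigma.true <- 1; k <- 0
y <- numeric(n)
y.last <- k
for (i in 1:n) {
  y[i] <- theta.true*y.last + rnorm(1, mean = 0, sd = sigma.true)
  y.last <- y[i]
}

## Compare the log-likelihood at the true theta and at a wrong one
LL.true  <- logL.ar1(0.8, sigma.true, y, k)
LL.wrong <- logL.ar1(-0.8, sigma.true, y, k)
print(c(LL.true = LL.true, LL.wrong = LL.wrong))
```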


Normality and “iid.”

Reality is normally not normally distributed

Typical statistical assumptions, such as
• normality
• independence
are often chosen from a computational viewpoint.

However, other distribution assumptions can be incorporated easily in most cases.


Rating curve modified

Let's assume we observe more extreme values than are compatible with a normal distribution → try a t-distribution.

    ## log likelihood with t-distributed errors
    ## theta[4]=scale, theta[5]=degrees of freedom.
    logL <- function(theta, y.data, x.data) {
      mean.y <- RC(x.data, theta[1:3])          # mean value for y
      residuals <- (y.data - mean.y)/theta[4]   # scaling
      ## the density of y is dt(residuals)/scale, hence the -log(scale) term
      LL <- sum(dt(residuals, df=theta[5], log=TRUE) - log(theta[4]))
      return(LL)
    }
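When residuals are scaled before a standard density is evaluated, the density of the observation itself carries an extra factor 1/scale from the change of variables. This matters as soon as the scale is one of the parameters to be inferred. The sketch below checks this numerically with base R's `integrate`; the location, scale and degrees of freedom are arbitrary values picked for illustration.

```r
## Assumed values for illustration: location, scale, degrees of freedom
m <- 2; s <- 3; nu <- 4

## Density of y when (y - m)/s follows a t-distribution:
## the factor 1/s comes from the change of variables
dens.scaled   <- function(y) dt((y - m)/s, df = nu) / s
dens.unscaled <- function(y) dt((y - m)/s, df = nu)

## A proper density integrates to 1; without 1/s the integral equals s
int.scaled   <- integrate(dens.scaled, -Inf, Inf)$value
int.unscaled <- integrate(dens.unscaled, -Inf, Inf)$value
print(c(with.factor = int.scaled, without.factor = int.unscaled))
```

On the log scale the factor 1/s becomes a term `-log(s)` per observation.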

