Formulation of model likelihood functions

The most useful representation of stochastic models

June 12, 2017
Andreas Scheidegger

Eawag: Swiss Federal Institute of Aquatic Science and Technology

“Statistical models are stories about how the data came to be.” – Dave Harris

Andreas Scheidegger Motivation 1

What is a likelihood function?

Definition: The likelihood function $p(y_1,\ldots,y_n\mid\theta)$ or $L(\theta)$ is the joint probability (density) of the observations $\{y_1,\ldots,y_n\}$ given a stochastic model with parameter values $\theta$.

Informal: If we simulate output data similar to our measurements with a stochastic model, setting the parameters equal to $\theta$, what is the probability (density) of obtaining $\{y_1,\ldots,y_n\}$?
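As a minimal sketch of the definition, assume (hypothetically) three observations and a stochastic model in which each $y_i$ is independently normal with mean $\theta$ and standard deviation 1. The likelihood $L(\theta)$ is then the joint density of exactly these observations, evaluated for different values of $\theta$; the data and the model are assumptions for illustration only.

```r
## Hypothetical observations (illustration only)
y <- c(1.2, 0.7, 1.9)

## Likelihood L(theta): joint density of y under the assumed model
## y_i ~ N(theta, 1), independent, so the joint density is a product
L <- function(theta) prod(dnorm(y, mean = theta, sd = 1))

## The same observations are more probable under some parameter values
L(1.0)
L(3.0)
```

Here `L(1.0)` is larger than `L(3.0)`, because the observations lie closer to 1 than to 3.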


For what do we need likelihood functions?

Many parameter calibration and prediction techniques require that the model be described by its likelihood function:

• Frequentist statistics: maximum likelihood estimator (MLE), likelihood-ratio (LR) tests, …

• Bayesian statistics: parameter inference, uncertainty propagation, predictions, model comparison, … → topic of this course

Note: The actual value of the likelihood function per se is usually not of interest.
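The note can be illustrated with a Bayesian grid approximation, a technique chosen here only for illustration: the posterior is prior × likelihood, normalised over the grid. Multiplying the likelihood by any constant leaves the normalised posterior unchanged, so only relative likelihood values matter. The data, the flat prior, and the grid are hypothetical.

```r
## Hypothetical data; assumed model y_i ~ N(theta, 1), independent
y <- c(1.2, 0.7, 1.9)
theta.grid <- seq(-2, 4, by = 0.01)

## Likelihood evaluated on a grid of parameter values
lik <- sapply(theta.grid, function(th) prod(dnorm(y, mean = th, sd = 1)))

## Assumed flat prior on the grid
prior <- rep(1, length(theta.grid))

## Grid approximation of the posterior: prior x likelihood, normalised
posterior <- function(lik) {
  w <- prior * lik
  w / sum(w)
}

post1 <- posterior(lik)
post2 <- posterior(5 * lik)   # a function merely proportional to the likelihood

all.equal(post1, post2)        # TRUE: the constant factor cancels
theta.grid[which.max(post1)]   # posterior mode, close to mean(y)
```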

How to formulate likelihood functions?

Often, models are not described by their likelihood function. A common description may rather look like this:

$$Y_i = M(x_i,\theta) + \varepsilon_i,\quad \varepsilon_i \sim N(0,\sigma^2)$$

While this is a complete description of the stochastic model¹, it is not directly useful for inference → we must translate such a description into $p(\mathbf{y}\mid\theta,\mathbf{x})$.

¹ $M(x_i,\theta)$ is a deterministic function. The complete model, however, is stochastic because we added a random error term $\varepsilon_i$.
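To see the two views of the same stochastic model side by side, the sketch below uses a hypothetical linear $M(x,\theta) = \theta_1 + \theta_2 x$ (an assumption; the slides leave $M$ generic). Simulation draws $\varepsilon_i$ and adds it to $M$; the likelihood evaluates the normal density of the observed $y_i$ around $M(x_i,\theta)$.

```r
## Hypothetical deterministic model M(x, theta); any function of x could be used
M <- function(x, theta) theta[1] + theta[2] * x

## Simulation view: Y_i = M(x_i, theta) + eps_i, eps_i ~ N(0, sigma^2)
simulate.Y <- function(x, theta, sigma) {
  M(x, theta) + rnorm(length(x), mean = 0, sd = sigma)
}

## Likelihood view: log p(y | theta, sigma, x) for independent normal errors
logL <- function(theta, sigma, y, x) {
  sum(dnorm(y, mean = M(x, theta), sd = sigma, log = TRUE))
}

set.seed(1)
x <- seq(0, 10, length.out = 50)
y <- simulate.Y(x, theta = c(1, 0.5), sigma = 0.3)

logL(c(1, 0.5), 0.3, y, x)   # parameters used for the simulation
logL(c(2, 0.1), 0.3, y, x)   # clearly different parameters: much lower
```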

Derivation of a likelihood function

1. Decompose the joint probability density:

$$p(\mathbf{y}\mid\theta) = p(y_1,\ldots,y_n\mid\theta) = p(y_1\mid\theta)\,p(y_2\mid\theta,y_1)\,p(y_3\mid\theta,y_1,y_2)\cdots p(y_n\mid\theta,y_1,\ldots,y_{n-1})$$

2. Formulate the conditional probabilities:

$$p(y_i\mid\theta,y_1,\ldots,y_{i-1})$$

If the observations are independent: $p(y_i\mid\theta,y_1,\ldots,y_{i-1}) = p(y_i\mid\theta)$.
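The decomposition in step 1 holds for any joint density, not only for independent observations. As a small numerical check (an illustration, not from the slides), take a hypothetical pair $(y_1, y_2)$ from a standard bivariate normal with correlation $\rho$: the joint density equals $p(y_1)\,p(y_2\mid y_1)$, where the conditional is normal with mean $\rho y_1$ and standard deviation $\sqrt{1-\rho^2}$.

```r
## Joint density of a standard bivariate normal with correlation rho
joint.density <- function(y1, y2, rho) {
  q <- (y1^2 - 2 * rho * y1 * y2 + y2^2) / (1 - rho^2)
  exp(-q / 2) / (2 * pi * sqrt(1 - rho^2))
}

## Decomposition p(y1, y2) = p(y1) * p(y2 | y1)
decomposed.density <- function(y1, y2, rho) {
  dnorm(y1, mean = 0, sd = 1) *                        # p(y1)
    dnorm(y2, mean = rho * y1, sd = sqrt(1 - rho^2))   # p(y2 | y1)
}

rho <- 0.8
joint.density(0.5, 1.1, rho)
decomposed.density(0.5, 1.1, rho)   # same value

## Wrongly assuming independence gives a different number
dnorm(0.5) * dnorm(1.1)
```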


Some (informal) advice

• First formulate the likelihood in general terms, without specific distribution assumptions.
• Think (informally!) of p(x) as Prob(X = x), and change sums to integrals.
• In practice, a function that is proportional to the likelihood function is sufficient.
• The logarithmic scale is preferred for computation.
• Don't worry about the identifiability of the parameters at this stage.
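Two of these points can be checked with a hypothetical sample of 2000 standard normal draws (size and model are assumptions chosen for illustration). The product of 2000 densities underflows to 0 in double precision, while the sum of log-densities stays finite. And dropping the constant $-\frac{n}{2}\ln(2\pi)$ gives a function proportional to the likelihood whose difference from the full log-likelihood does not depend on the parameter, so both have the same maximiser.

```r
set.seed(42)
y <- rnorm(2000)   # hypothetical sample, assumed model y_i ~ N(mu, 1)

## Direct product of 2000 densities underflows to 0
L.direct <- prod(dnorm(y, mean = 0, sd = 1))

## Sum of log-densities stays finite
logL.value <- sum(dnorm(y, mean = 0, sd = 1, log = TRUE))

L.direct     # 0
logL.value   # large negative but finite

## A function proportional to the likelihood (constant -n/2*log(2*pi) dropped)
logL.full <- function(mu) sum(dnorm(y, mean = mu, sd = 1, log = TRUE))
logL.prop <- function(mu) -0.5 * sum((y - mu)^2)

## The difference is the same constant for every mu, so maximisers coincide
logL.full(0.1) - logL.prop(0.1)
logL.full(-0.3) - logL.prop(-0.3)
```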


Example 1: sex ratio (discrete data)

Observed data y: The gender of n newborns.

Model description: We assume the probability of a girl is θ and of a boy 1 − θ.


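Probability for a single observation:

$$\text{Prob}(y_i\mid\theta) = \begin{cases}\theta, & y_i = \text{girl}\\ 1-\theta, & y_i = \text{boy}\end{cases}$$

Independence is a reasonable assumption:

$$\text{Prob}(y_1,\ldots,y_n\mid\theta) = \prod_{i=1}^{n}\text{Prob}(y_i\mid\theta) = \theta^{\#\text{girls}}(1-\theta)^{\#\text{boys}}$$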


R implementation as a function:

```r
logL <- function(theta, n.girls, n.boys) {
  LL <- n.girls*log(theta) + n.boys*log(1-theta)
  return(LL)
}
```

Call:

```r
logL(theta=0.4, n.girls=10, n.boys=5)
## -11.717035
```
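As a usage sketch (an addition, not from the slides), the log-likelihood can be maximised numerically with base R's `optimize()`. For this model the maximum likelihood estimate is also available in closed form, $\hat\theta = \#\text{girls}/(\#\text{girls} + \#\text{boys})$, which serves as a check.

```r
logL <- function(theta, n.girls, n.boys) {
  LL <- n.girls*log(theta) + n.boys*log(1-theta)
  return(LL)
}

## Maximise over theta in (0, 1); n.girls and n.boys are passed on to logL
fit <- optimize(logL, interval = c(0, 1), maximum = TRUE,
                n.girls = 10, n.boys = 5, tol = 1e-8)
fit$maximum      # numerical MLE
10 / (10 + 5)    # closed-form MLE
```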

Example 2: rating curve (continuous data)

Observed data y, x: n pairs of water level $x_i$ and run-off $y_i$.

Model description: Water level x and run-off y are related as

$$y = RC(x,\theta) = \theta_1 (x-\theta_2)^{\theta_3}$$

Figure: Rating curve of Sluzew Creek. Sikorska et al. (2013)


A deterministic model? → We must make assumptions about the error distribution, e.g.,

$$Y_i = RC(x_i,\theta) + \varepsilon_i,\quad \varepsilon_i \sim N(0,\sigma^2)$$

or equivalently

$$Y_i \sim N\bigl(RC(x_i,\theta),\,\sigma^2\bigr)$$

The RC model describes only the expected value of an observation for a given $x_i$.

So the pdf for a single observation is the density of a normal distribution²:

$$p(y_i\mid x_i,\theta,\sigma) = \frac{1}{\sigma}\,\phi\!\left(\frac{y_i - RC(x_i,\theta)}{\sigma}\right)$$

Finally, assuming independent observations:

$$p(y_1,\ldots,y_n\mid x_1,\ldots,x_n,\theta,\sigma) = \prod_{i=1}^{n} p(y_i\mid x_i,\theta,\sigma)$$

² $\phi(x) = \frac{1}{\sqrt{2\pi}}\exp\left\{-\frac{x^2}{2}\right\}$
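Taking the logarithm of this product of normal densities (following the advice to compute on the log scale) gives the form that is actually evaluated; this worked step is added for clarity:

$$\ln p(y_1,\ldots,y_n\mid x_1,\ldots,x_n,\theta,\sigma) = \sum_{i=1}^{n}\ln\left[\frac{1}{\sigma}\,\phi\!\left(\frac{y_i - RC(x_i,\theta)}{\sigma}\right)\right] = -n\ln\sigma - \frac{n}{2}\ln(2\pi) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\bigl(y_i - RC(x_i,\theta)\bigr)^2$$

For fixed σ, maximising it over θ is therefore the same as minimising the sum of squared residuals.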


Figure: Rating curve. Example of a non-linear regression (x-axis: X (water level); y-axis: runoff; curve: RC(X, θ)).

```r
## deterministic rating curve model
RC <- function(x, theta) {
  y <- theta[1]*(x-theta[2])^theta[3]
  return(y)
}

## log likelihood with normally distributed errors
## sigma is included as theta[4] = sigma.
logL <- function(theta, y.data, x.data) {
  mean.y <- RC(x.data, theta[1:3])   # mean value for y
  LL <- sum(dnorm(y.data, mean=mean.y,
                  sd=theta[4], log=TRUE))
  return(LL)
}
```
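A usage sketch with simulated data: the parameter values, the water-level range, and σ below are hypothetical, chosen so that $x - \theta_2$ stays positive. For a non-integer exponent $\theta_3$, R returns `NaN` when $x - \theta_2 < 0$, which makes the whole log-likelihood `NaN`. The functions are repeated so the block runs on its own.

```r
## deterministic rating curve model (as above)
RC <- function(x, theta) theta[1]*(x-theta[2])^theta[3]

## log likelihood with normally distributed errors, theta[4] = sigma
logL <- function(theta, y.data, x.data) {
  mean.y <- RC(x.data, theta[1:3])
  sum(dnorm(y.data, mean = mean.y, sd = theta[4], log = TRUE))
}

## Hypothetical "true" parameters and simulated observations
set.seed(7)
theta.true <- c(2, 0.1, 1.5, 0.2)
x.data <- runif(40, min = 0.5, max = 3)   # water levels, all > theta[2]
y.data <- RC(x.data, theta.true[1:3]) + rnorm(40, sd = theta.true[4])

logL(theta.true, y.data, x.data)
logL(c(1, 0.1, 1.5, 0.2), y.data, x.data)   # wrong theta[1]: much lower
logL(c(2, 1.0, 1.5, 0.2), y.data, x.data)   # NaN whenever some x.data < theta[2]
```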

Example 3: limit of quantification (censored data)

Observed data y:

| | lab 1 | lab 2 | lab 3 | … |
|---|---|---|---|---|
| concentration | $y_1$ | $y_2$ | n.d. | … |

Limit of quantification: LOQ
Standard deviation of measurements: σ

Model description: “A model? I just want to calculate the concentration.”

Figure: Left censored data.

Model description: The measurements are normally distributed around the true mean θ with standard deviation σ.

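Likelihood for a single measured observation:

$$p(y_i\mid\theta,\sigma) = \frac{1}{\sigma}\,\phi\!\left(\frac{y_i-\theta}{\sigma}\right)$$

Likelihood for a single “not detected” observation:

$$\text{Prob}(\text{n.d.}\mid\theta,\sigma) = \text{Prob}(y_i<\text{LOQ}\mid\theta,\sigma) = \int_{-\infty}^{\text{LOQ}} p(y\mid\theta,\sigma)\,dy = \Phi\!\left(\frac{\text{LOQ}-\theta}{\sigma}\right)$$

Assuming independent observations, the likelihood of all observations is

$$p(y_1,\ldots,y_n\mid\theta,\sigma) = \text{Prob}(y_i<\text{LOQ}\mid\theta,\sigma)^{\#\text{censored}}\prod_{\neg\text{censored}} p(y_i\mid\theta,\sigma)$$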


```r
## data, if left censored = "nd"
## (mixing numbers and "nd" makes y a character vector)
y <- c(y1=0.35, y2=0.45, y3="nd", y4="nd", y5=0.4)

## log likelihood
logL <- function(theta, y, sigma, LOQ) {
  ## number of censored observations
  n.censored <- sum(y=="nd")
  ## convert not censored observations into type 'numeric'
  y.not.cen <- as.numeric(y[y!="nd"])

  ## likelihood for not censored observations
  LL.not.cen <- sum(dnorm(y.not.cen, mean=theta, sd=sigma, log=TRUE))
  ## likelihood for left censored observations (pnorm uses log.p, not log)
  LL.left.cen <- n.censored * pnorm(LOQ, mean=theta, sd=sigma, log.p=TRUE)
  return(LL.not.cen + LL.left.cen)
}
```
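A usage sketch: σ = 0.05 and LOQ = 0.3 are hypothetical values (the slides do not give them), and `optimize()` is used to find the maximum likelihood estimate of θ. The two non-detects pull the estimate below 0.4, the mean of the three measured values, which is what simply discarding the "nd" entries would give.

```r
## data, if left censored = "nd"
y <- c(y1=0.35, y2=0.45, y3="nd", y4="nd", y5=0.4)

## log likelihood for left-censored normal observations (as above)
logL <- function(theta, y, sigma, LOQ) {
  n.censored <- sum(y == "nd")
  y.not.cen <- as.numeric(y[y != "nd"])
  LL.not.cen <- sum(dnorm(y.not.cen, mean = theta, sd = sigma, log = TRUE))
  LL.left.cen <- n.censored * pnorm(LOQ, mean = theta, sd = sigma, log.p = TRUE)
  LL.not.cen + LL.left.cen
}

## Hypothetical measurement standard deviation and limit of quantification
sigma <- 0.05
LOQ <- 0.3

fit <- optimize(logL, interval = c(0, 1), maximum = TRUE,
                y = y, sigma = sigma, LOQ = LOQ)
fit$maximum                        # MLE of theta, accounting for censoring
mean(as.numeric(y[y != "nd"]))     # 0.4: ignores the non-detects
```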

Example 4: auto-regressive model (auto-correlated data)

Observed data y: equally spaced time series data $y_1,\ldots,y_n$.

Model description: Classical AR(1) model:

$$y_{t+1} = \theta y_t + \varepsilon_{t+1},\quad \varepsilon_{t+1}\sim N(0,\sigma^2),\quad y_0 = k$$

Figure: Annual water level of Lake Huron. Brockwell and Davis (1991)

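Observations depend only on the preceding observation. Hence:

$$p(y_1,\ldots,y_n\mid\theta,\sigma,y_0) = \prod_{i=1}^{n} p(y_i\mid y_{i-1},\theta,\sigma)$$

The conditional probabilities are all normal:

$$p(y_t\mid y_0,\ldots,y_{t-1},\theta,\sigma) = p(y_t\mid y_{t-1},\theta,\sigma) = \frac{1}{\sigma}\,\phi\!\left(\frac{y_t-\theta y_{t-1}}{\sigma}\right)$$

```r
## n <- length(y); y0 = k, so the mean of y[1] is theta*k
LL <- dnorm(y[1], mean=theta*k, sd=sigma, log=TRUE) +
      sum(dnorm(y[2:n], mean=theta*y[1:(n-1)], sd=sigma, log=TRUE))
```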
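A usage sketch with a simulated series; θ = 0.7, σ = 1, k = 0, and n = 200 are hypothetical. The θ-dependent part of this log-likelihood is $-\frac{1}{2\sigma^2}\sum_{t=1}^{n}(y_t - \theta y_{t-1})^2$, so its maximiser has the closed form $\hat\theta = \sum_t y_t y_{t-1} / \sum_t y_{t-1}^2$ (with $y_0 = k$), which is used to check the numerical optimum.

```r
## AR(1) log likelihood, y0 = k known, so y[1] has mean theta*k
logL <- function(theta, sigma, y, k) {
  n <- length(y)
  dnorm(y[1], mean = theta*k, sd = sigma, log = TRUE) +
    sum(dnorm(y[2:n], mean = theta*y[1:(n-1)], sd = sigma, log = TRUE))
}

## Simulate a hypothetical AR(1) series
set.seed(3)
n <- 200; theta.true <- 0.7; sigma <- 1; k <- 0
y <- numeric(n)
y.prev <- k
for (t in 1:n) {
  y[t] <- theta.true * y.prev + rnorm(1, sd = sigma)
  y.prev <- y[t]
}

## Numerical maximisation over theta
fit <- optimize(logL, interval = c(-1, 1), maximum = TRUE,
                sigma = sigma, y = y, k = k, tol = 1e-10)

## Closed-form estimate, with y0 = k prepended
y.full <- c(k, y)
theta.hat <- sum(y.full[-1] * y.full[-(n + 1)]) / sum(y.full[-(n + 1)]^2)

c(numerical = fit$maximum, closed.form = theta.hat)
```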


Normality and “iid.”

Reality is normally not normally distributed.

Typical statistical assumptions, such as
• normality
• independence

are often chosen from a computational point of view.

However, other distributional assumptions can be incorporated easily in most cases.


Rating curve modified

Let's assume we observe more extreme values than are compatible with a normal distribution → try a t-distribution.

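```r
## log likelihood with t-distributed errors
## theta[4] = scale, theta[5] = degrees of freedom.
## RC() is the rating curve function defined above.
logL <- function(theta, y.data, x.data) {
  mean.y <- RC(x.data, theta[1:3])          # mean value for y
  residuals <- (y.data - mean.y)/theta[4]   # scaling
  ## density of y is dt(residuals)/scale, hence the -log(scale) term
  LL <- sum(dt(residuals, df=theta[5], log=TRUE) - log(theta[4]))
  return(LL)
}
```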
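The density of $y = \text{mean} + \text{scale}\cdot\varepsilon$ with $\varepsilon \sim t_\nu$ is `dt((y - mean)/scale, df)/scale`, so a −log(scale) term per observation belongs in the log-likelihood. As a consistency check (an addition, with hypothetical data, location, and scale), a t log-likelihood with a very large number of degrees of freedom should nearly coincide with the normal log-likelihood; without the −log(scale) term it is shifted by n·log(scale).

```r
## Log likelihood of t-distributed errors with location mu and scale s
logL.t <- function(y, mu, s, df) sum(dt((y - mu)/s, df = df, log = TRUE) - log(s))

## Same errors under a normal distribution with standard deviation s
logL.norm <- function(y, mu, s) sum(dnorm(y, mean = mu, sd = s, log = TRUE))

set.seed(11)
y <- rnorm(50, mean = 3, sd = 0.5)   # hypothetical observations
mu <- 3; s <- 0.5

logL.t(y, mu, s, df = 1e6)   # close to the normal value
logL.norm(y, mu, s)

## Without the -log(s) term the value is shifted by 50*log(0.5)
sum(dt((y - mu)/s, df = 1e6, log = TRUE))
```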

Summary

1. Decompose the joint probability density:

$$p(\mathbf{y}\mid\theta) = p(y_1,\ldots,y_n\mid\theta) = p(y_1\mid\theta)\,p(y_2\mid\theta,y_1)\,p(y_3\mid\theta,y_1,y_2)\cdots p(y_n\mid\theta,y_1,\ldots,y_{n-1})$$

2. Make assumptions to formulate the conditional probabilities:

$$p(y_i\mid\theta,y_1,\ldots,y_{i-1})$$

3. Make inference, check assumptions, revise if necessary.
