Bayesian methods for parameter estimation and
model comparison
Carson C Chow, LBM, NIDDK, NIH
Monday, April 26, 2010
Task: Fit a model (ODE, PDE, ...) to data - estimate parameters

dy/dt = f(t; θ)

with solution y(t) = g(t | y(0), θ); e.g. a straight line y = at + b.

[Figure: data points in the (t, y) plane with a fitted curve]
Task: Fit a model (ODE, PDE, ...) to data - estimate parameters

Questions:
- What algorithm to use?
- How good is the fit?
- Sensitivity?
- Is there a better model?

Answer: Use Bayesian inference and MCMC - infer rather than invert.
Bayesian inference

Frequentist: probability is a frequency (of a random variable)
Bayesian: probability is a measure of uncertainty
Jaynes: probability is extended logic

Models and parameters have probabilities; anything can be assigned a probability. This makes everything straightforward.
Parameter estimation

D = data, θ = parameters, y(θ, t) = model solution

Maximum likelihood estimation: find the θ that maximizes the likelihood

P(D|θ) ∝ exp[ −Σᵢ (Dᵢ − yᵢ(θ))² / 2σ² ]

e.g. minimize the mean square error Σᵢ (Dᵢ − yᵢ(θ))².
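As an illustration of the maximum likelihood idea, here is a Python sketch: with Gaussian noise, maximizing the likelihood is the same as minimizing the mean square error. The exponential model, grid ranges, and noise level are assumptions chosen just for this example.

```python
import numpy as np

def neg_log_likelihood(theta, t, data, sigma=1.0):
    """Gaussian negative log-likelihood up to a constant:
    sum_i (D_i - y_i(theta))^2 / (2 sigma^2)."""
    a, b = theta
    y = a * np.exp(-b * t)              # model solution y(theta, t)
    return np.sum((data - y) ** 2) / (2 * sigma ** 2)

# Synthetic data from a known model: y = 10 exp(-0.05 t) + noise
rng = np.random.default_rng(0)
t = np.arange(0, 100)
data = 10 * np.exp(-0.05 * t) + rng.normal(0, 1, t.size)

# Crude grid search for the maximum likelihood estimate
a_grid = np.linspace(5, 15, 101)
b_grid = np.linspace(0.01, 0.1, 91)
nll = np.array([[neg_log_likelihood((a, b), t, data) for b in b_grid]
                for a in a_grid])
ia, ib = np.unravel_index(nll.argmin(), nll.shape)
print(a_grid[ia], b_grid[ib])           # should land near a = 10, b = 0.05
```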
Statistics

The "natural" thing to do is to find the probability distribution of a model or its parameters. But in frequentist statistics, models and parameters cannot have probabilities; confidence intervals of a parameter are with respect to sampling errors.
Bayes theorem

P(θ|D) = P(D|θ) P(θ) / P(D)

Posterior = Likelihood × Prior / Evidence

which follows from P(θ, D) = P(θ|D) P(D) = P(D|θ) P(θ).

Bayesian inference reduces statistics to one equation.
Parameter estimation

D = data, θ = parameters, y(θ, t) = model solution

P(θ|D) = P(D|θ) P(θ) / P(D) ∝ P(D|θ) P(θ)
       ∝ exp[ −Σᵢ (Dᵢ − yᵢ(θ))² / 2σ² ] P(θ)

where P(D) = ∫ P(D|θ) P(θ) dθ.

The posterior is the probability of the parameters given the data.
Example

Model the data as exponential decay with Gaussian noise:

Likelihood (model): P(y|b, t) ∝ Πₜ exp[ −(yₜ − 10 exp(−bt))² / 2 ]
Posterior: P(b|y) ∝ P(y|b, t) P(b)

[Figures: noisy data and model curve in the (t, y) plane; prior and posterior densities over b on [0, 0.1], with the posterior peaked near the true value b = 0.05]
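For a one-dimensional parameter like b, the posterior can be computed directly on a grid. A Python sketch, where the synthetic data, flat prior on [0, 0.1], and unit noise variance are assumptions matching the example above:

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(1, 101)
y = 10 * np.exp(-0.05 * t) + rng.normal(0, 1, t.size)  # data, true b = 0.05

# Unnormalized log-likelihood on a grid of b values
b_grid = np.linspace(0.0, 0.1, 401)
log_like = np.array([-0.5 * np.sum((y - 10 * np.exp(-b * t)) ** 2)
                     for b in b_grid])
prior = np.ones_like(b_grid)                 # flat prior on [0, 0.1]

# Posterior: exponentiate (stably) and normalize
post = np.exp(log_like - log_like.max()) * prior
post /= post.sum()

b_mean = (b_grid * post).sum()
print(b_mean)                                # posterior mean, near b = 0.05
```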
Sensitivity analysis

Deviation around the maximum likelihood estimate is set by the curvature of the log-likelihood:

Δθ² ~ [ −∂²/∂θ² ln P(D|θ) ]⁻¹ evaluated at the maximum

Fisher information: I(θ) = −E[ ∂²/∂θ² ln P(D|θ) ]

The posterior probability is more robust.
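The curvature at the maximum can be estimated numerically with a finite difference. A Python sketch for the exponential-decay example (the data, grid, and step size h are hypothetical choices):

```python
import numpy as np

def log_like(b, t, y):
    return -0.5 * np.sum((y - 10 * np.exp(-b * t)) ** 2)

rng = np.random.default_rng(2)
t = np.arange(1, 101)
y = 10 * np.exp(-0.05 * t) + rng.normal(0, 1, t.size)

# Locate the maximum on a grid, then estimate the curvature by a
# central finite difference: d^2/db^2 ln P(D|b) at the maximum.
b_grid = np.linspace(0.01, 0.1, 901)
ll = np.array([log_like(b, t, y) for b in b_grid])
b_hat = b_grid[ll.argmax()]
h = 1e-4
curv = (log_like(b_hat + h, t, y) - 2 * log_like(b_hat, t, y)
        + log_like(b_hat - h, t, y)) / h ** 2

sigma_b = 1 / np.sqrt(-curv)   # Laplace-approximation width of the posterior
print(b_hat, sigma_b)
```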
Model comparison

Given some data, what is the best model? Fit the data and penalize extra parameters.
Model comparison - two competing models M1 and M2

M1: y′ = f₁(y, θ)    M2: y′ = f₂(y, θ)

P(M1|D) = P(D|M1) P(M1) / P(D)    P(M2|D) = P(D|M2) P(M2) / P(D)

where P(D|M) = ∫ P(D|M, θ) P(θ|M) dθ.

Odds:  P(M1|D) / P(M2|D) = [P(D|M1) / P(D|M2)] × [P(M1) / P(M2)]

where P(D|M1)/P(D|M2) is the Bayes factor and P(M1)/P(M2) the prior odds.

Log odds: ln P(D|M1) − ln P(D|M2)
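When θ is low-dimensional, the evidence integrals can be done by brute force. A Python sketch comparing a two-parameter decay model against a one-parameter constant model; the synthetic data, flat priors, and grid ranges are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.arange(1, 51)
y = 10 * np.exp(-0.05 * t) + rng.normal(0, 1, t.size)  # generated from M1

def log_like_m1(a, b):                 # M1: y = a exp(-b t)
    return -0.5 * np.sum((y - a * np.exp(-b * t)) ** 2)

def log_like_m2(c):                    # M2: y = c (constant)
    return -0.5 * np.sum((y - c) ** 2)

# Evidence P(D|M) = integral of P(D|M,theta) P(theta|M), flat priors, grid sums
a_grid = np.linspace(0, 20, 201)
b_grid = np.linspace(0, 0.2, 201)
ll1 = np.array([[log_like_m1(a, b) for b in b_grid] for a in a_grid])
m1 = ll1.max()
da, db = a_grid[1] - a_grid[0], b_grid[1] - b_grid[0]
log_ev1 = m1 + np.log(np.exp(ll1 - m1).sum() * da * db / (20 * 0.2))

c_grid = np.linspace(0, 20, 2001)
ll2 = np.array([log_like_m2(c) for c in c_grid])
m2 = ll2.max()
dc = c_grid[1] - c_grid[0]
log_ev2 = m2 + np.log(np.exp(ll2 - m2).sum() * dc / 20)

print(log_ev1 - log_ev2)   # log Bayes factor; strongly favors M1 here
```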
Likelihood integrated over parameters

P(D|M) = ∫ P(D|M, θ) P(θ|M) dθ

e.g. for a flat prior on 0 < θ < σ, P(θ|M) = 1/σ, so

P(D|M) = (1/σ) ∫₀^σ P(D|M, θ) dθ

P(D|M) ∝ the "overlap" between the likelihood and the prior.
For k parameters:

P(D|M) ≈ (1/σᵏ) ∫₀^σ P(D|M, θ) dᵏθ

Writing ∫₀^σ P(D|M, θ) dᵏθ = L Δᵏ, where L is the maximum likelihood, gives

P(D|M) ≈ L (Δ/σ)ᵏ     where (Δ/σ)ᵏ is the Occam's factor

ln P(D|M) ≈ ln L − k ln(σ/Δ)

~ Bayes information criterion:  −BIC/2 = ln L − k ln √n
or Akaike information criterion:  −AIC/2 = ln L − k
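In code the two criteria are one-liners; a Python sketch, where the log-likelihood values and parameter counts are made up to illustrate the trade-off:

```python
import numpy as np

def aic(log_l, k):
    # -AIC/2 = ln L - k   =>   AIC = -2 ln L + 2k
    return -2 * log_l + 2 * k

def bic(log_l, k, n):
    # -BIC/2 = ln L - k ln sqrt(n)   =>   BIC = -2 ln L + k ln n
    return -2 * log_l + k * np.log(n)

# Two hypothetical fits to the same n = 100 data points:
# model 1 (2 params) fits slightly worse than model 2 (5 params).
n = 100
aic1, bic1 = aic(-120.0, 2), bic(-120.0, 2, n)
aic2, bic2 = aic(-118.0, 5), bic(-118.0, 5, n)
print(aic1, aic2)   # 244.0 vs 246.0: AIC prefers the smaller model
print(bic1, bic2)   # BIC penalizes the extra parameters even harder
```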
AIC and BIC penalize the log of the likelihood with the number of parameters.

Caveat: these approximations are generally only valid if the likelihood function is nearly normal and sharply peaked.

The real criterion is to maximize the likelihood together with the overlap of the likelihood with the prior - e.g. constants are not penalized.
Priors

• The main complaint about the Bayesian framework is the necessity of a prior
• But all modeling is biased by prior information
• Assuming that modeling is possible at all is already a prior
• The Bayesian perspective just makes this systematic and consistent
Curse of dimensionality

Consider a model with k parameters: sampling each parameter at p points requires pᵏ evaluations. For 100 parameters at 10 points each that is 10¹⁰⁰, a googol - far beyond anything computable.

Modeling is impossible without priors.
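The pᵏ count is easy to check directly in Python:

```python
# Number of grid points for k parameters sampled at p points each
def grid_size(p, k):
    return p ** k

print(grid_size(10, 2))     # 100: easy
print(grid_size(10, 100))   # 10^100, a googol - hopeless without priors
```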
Markov Chain Monte Carlo

MCMC is a probabilistic way to compute integrals. It is the de facto method to compute posteriors and do model comparison; combined with parallel tempering, it can do parameter estimation and model comparison in one computation.
MCMC: Metropolis algorithm

1. Start with θ; propose a guess θ′
2. Compute r = P(θ′|D) / P(θ|D) = P(D|θ′) P(θ′) / [P(D|θ) P(θ)]
3. Accept with probability min(r, 1)
4. Repeat

The time series of θ converges to the posterior.
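The four steps translate directly into code. A minimal Python sketch for an exponential-decay model; the synthetic data, flat positive priors, proposal widths, and burn-in length are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
t = np.arange(1, 21)
y = 10 * np.exp(-0.3 * t) + rng.normal(0, 1, t.size)  # data, true a=10, b=0.3

def log_post(a, b):
    # log posterior up to a constant: Gaussian likelihood, flat prior on a, b > 0
    if a <= 0 or b <= 0:
        return -np.inf
    return -0.5 * np.sum((y - a * np.exp(-b * t)) ** 2)

a, b = 5.0, 0.5                                  # 1. start with theta
chain = []
for _ in range(20000):
    a_new = a + rng.normal(0, 0.3)               #    propose theta'
    b_new = b + rng.normal(0, 0.02)
    log_r = log_post(a_new, b_new) - log_post(a, b)  # 2. compute r (log space)
    if np.log(rng.random()) < log_r:             # 3. accept with prob min(r, 1)
        a, b = a_new, b_new
    chain.append((a, b))                         # 4. repeat

samples = np.array(chain[5000:])                 # discard burn-in
a_mean, b_mean = samples.mean(axis=0)
print(a_mean, b_mean)                            # posterior means, near (10, 0.3)
```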
E.g. fit the model to data

Model: P(y|a, b, t) ∝ Π_{t=1}^{20} exp[ −(yₜ − a exp(−bt))² / 2 ]

[Figure: 20 noisy data points in the (t, y) plane]
[Figures: MCMC traces of a and b over 1000 iterations, with posterior histograms; estimates b = 0.33 ± 0.06 and a = 11.4 ± 1.7]
Model fit

[Figure: the data with the fitted model curve yₜ = a exp(−bt) overlaid]
The posterior is the stationary distribution of the MCMC

since if θ is drawn from the posterior, so is θ′. With transition probability

P(θ′|θ) = min[1, P(θ′|D) / P(θ|D)]

∫ P(θ′|θ) P(θ|D) dθ = ∫ min[1, P(θ|D) / P(θ′|D)] P(θ′|D) dθ
                    = P(θ′|D) ∫ P(θ|θ′) dθ = P(θ′|D)
Parallel tempering

Introduce an inverse temperature β with 0 ≤ β ≤ 1:

π(θ|D) ∝ p(D|θ)^β P(θ)

Run chains at multiple β and swap states randomly; smaller-β runs search more of parameter space. Model comparison comes out of the same run, since

ln p(D|M) = ∫₀¹ ⟨ln p(D|θ, M)⟩_β dβ
Applications

1. Cell state transitions in embryonic stem cells
2. The effect of insulin on free fatty acids in the blood
Differentiation of ES cells with soluble factors

ES cells with LIF → (add Retinoic Acid) → ExE cells

Is the transition reversible?
Experiments

Experiments:
• forward: treat ES cells with RA (no LIF) for 2-5 days
• reverse: replace RA with LIF

Measurements:
• quantitative RT-PCR (population-average gene expression)
• Nanog-GFP reporter (single-cell gene expression)
• flow cytometry
Reverse transition

1. High Nanog under LIF
2. Transition to low Nanog under RA
3. High Nanog returns when LIF is restored

[Flow cytometry panels: LIF; RA 2d/LIF 1.7d; RA 2d/LIF 3d; RA 2d/LIF 6d]
Questions

Are low and high Nanog distinct stable states?
Are there transitions between the states?
Or is it just that the growth rates are different?

Answer with a PDE model and Bayesian inference.
Drift-diffusion-growth equation

∂w(y, t)/∂t = ∂²/∂y² (D w) − ∂/∂y [f(y) w] + r(y) w
              (diffusion)   (drift)         (growth)

with effective potential V(y) = −∫₀^y f(s) ds

y = Nanog level, w = probability density
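A minimal explicit finite-difference sketch of this equation in Python; the grid, the drift f(y) = −(y − 1), the diffusion constant, and the zero growth rate are all hypothetical choices made just to show the structure:

```python
import numpy as np

# Explicit scheme for  dw/dt = D d2w/dy2 - d/dy [f(y) w] + r(y) w
ny, D, dy, dt = 200, 0.01, 0.01, 1e-3
y = np.arange(ny) * dy
f = -(y - 1.0)          # drift toward y = 1 (a bistable f would go here)
r = np.zeros(ny)        # no growth in this sketch

w = np.exp(-((y - 0.5) ** 2) / 0.01)   # initial density centered at y = 0.5
w /= w.sum() * dy

for _ in range(2000):
    lap = (np.roll(w, 1) - 2 * w + np.roll(w, -1)) / dy ** 2   # diffusion
    flux = f * w
    adv = (np.roll(flux, -1) - np.roll(flux, 1)) / (2 * dy)    # drift
    w = w + dt * (D * lap - adv + r * w)                       # growth term
    w[0] = w[-1] = 0.0   # crude boundary conditions for the sketch

print(y[np.argmax(w)])   # the density drifts toward the minimum of V at y = 1
```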
Likelihood function

Compares data point i of trial j with the corresponding model prediction; fit to 6 time-dependent experiments.
Log likelihoods for 20 two-state hypotheses
(rows: potentials V/D of forms V1-V5; columns: growth rates r1-r4; the hypotheses span constant, linear, and sigmoidal forms)

      r1           r2          r3          r4
V1    -4492±25     -1792±19    -1115±24    -1105±24
V2    -110346±23   -2150±17    -7±28       0±30
V3    -5198±18     -1488±32    -130±25     -142±28
V4    -15641±19    -2666±18    -1251±38    -1288±20
V5    -4698±17     -1851±16    -961±24     -1036±22
Log likelihoods for 3 single-state hypotheses (V/D and r): -1657±21, -2756±22, -2946±34
Cell state transitions

[Diagram: transitions between low- and high-Nanog states under RA and under no LIF; transition rates 0.0076/cell/day (→) and 0.011/cell/day (←); growth is faster in the high-Nanog state: low Nanog 1.3 doublings/day, high Nanog 1.8 doublings/day]
Insulin's effect on Free Fatty Acids

Insulin resistance and Type 2 diabetes are a major health concern.

Insulin increases glucose uptake by muscle and suppresses the release of FFAs by fat cells (lipolysis).

We need a way to quantitatively measure sensitivity to insulin.
Insulin's effect on glucose and free fatty acids

[Figure: FFA, glucose, and insulin concentrations over 120 min; Periwal et al. 2008]
Individual variation

[Figures: FFA (µM/l) and insulin (µU/ml) time courses over 100 min for two subjects, showing individual variation]
[Diagrams: two model variants coupling the insulin input I(t), glucose G, remote insulin compartments X and Y, and FFA F]

dG/dt = S_G G_b − (S_G + S_I X) G   }
dX/dt = c_X [I(t) − X − I_b]        }  Glucose Minimal Model (Bergman et al., 1979)

dF/dt = L(X) − c_f F

or, with a separate remote compartment Y driving lipolysis,

dF/dt = L(Y) − c_f F
dY/dt = c_Y [I(t) − Y − I_{b,Y}]

Periwal et al., 2008
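A Python sketch of simulating the first variant with Euler steps; the parameter values, the insulin input I(t), and the lipolysis function L(X) below are made up for illustration only, not the fitted values from Periwal et al.:

```python
import numpy as np

# Euler integration of the glucose minimal model coupled to FFA.
# All parameter values are hypothetical.
SG, SI, cX, cf, Gb, Ib = 0.02, 1e-4, 0.05, 0.1, 90.0, 10.0

def I_t(t):                       # hypothetical insulin input (µU/ml)
    return Ib + 200.0 * np.exp(-t / 30.0)

def L(X):                         # hypothetical lipolysis function
    return 50.0 / (1.0 + X)

dt, T = 0.01, 120.0
G, X, F = Gb, 0.0, 400.0
for step in range(int(T / dt)):
    t = step * dt
    dG = SG * Gb - (SG + SI * X) * G    # glucose dynamics
    dX = cX * (I_t(t) - X - Ib)         # remote insulin compartment
    dF = L(X) - cf * F                  # FFA: lipolysis minus clearance
    G, X, F = G + dt * dG, X + dt * dX, F + dt * dF

print(G, X, F)   # insulin raises X, which lowers G and suppresses FFA release
```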
Summary

Bayesian methods give a straightforward means for parameter estimation and model comparison.

Combined with the Markov Chain Monte Carlo method, they provide a one-stop shop for all your model fitting and evaluation needs.

Many mathematical challenges remain, e.g. estimating the convergence time of MCMC.