
Bayes Factors, posterior predictives, short intro to RJMCMC

© Dave Campbell 2016


Bayesian Statistical Inference

P(θ∣Y) ∝ P(Y∣θ) π(θ)

Once you have posterior samples you can compute the predictive distribution of future observations:

P(Ynew∣Yold) = ∫ P(Ynew∣θ) P(θ∣Yold) dθ

To do this:

1. Sample a θ* from P(θ∣Y) (sample 1 value from your collection of posterior samples).

2. Generate simulated data from the likelihood: P(Ynew∣θ*).

3. Repeat for a large sample of θ* from P(θ∣Y) to get at the posterior predictive distribution.
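A minimal sketch of this loop in R. The model and the objects post_mu and post_sigma are hypothetical stand-ins for real MCMC output from a N(μ, σ²) model:

# Posterior predictive sampling: one theta* per simulated observation.
set.seed(1)
post_mu    <- rnorm(5000, 20, 0.5)                  # stand-in posterior draws of mu
post_sigma <- sqrt(1 / rgamma(5000, 10, rate = 30)) # stand-in posterior draws of sigma
y_new <- numeric(5000)
for (i in seq_along(y_new)) {
  k        <- sample(length(post_mu), 1)            # step 1: draw theta* from P(theta | Y)
  y_new[i] <- rnorm(1, post_mu[k], post_sigma[k])   # step 2: simulate from the likelihood
}                                                   # step 3: repeat; y_new ~ P(Ynew | Yold)
hist(y_new, main = "Posterior predictive distribution")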

Posterior predictive distribution:

No need to use asymptotic normal assumptions or a single point and variance estimate for θ. Any shaped distribution on P(θ∣Y) naturally feeds its entire distribution through to the data generating process!

Obtaining P(Ynew∣Yold) is related to obtaining a set of fake data samples for a parametric bootstrap, except that the distributional assumption on the parameters doesn't require asymptotic arguments.

Uses:

Another diagnostic tool: obtain a sample from P(Ynew∣Yold) and see if it is similar to the data (see the sketch below).

Use the posterior predictive distribution for sequential experimental design: choose the new covariate points that optimize some criterion.
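A hedged sketch of the diagnostic use in R; every object here (y_obs, post_mu, post_sigma) is a hypothetical stand-in. Simulate replicate data sets, compute a test statistic on each, and see where the observed value falls:

# Posterior predictive check via a test statistic (here the sample maximum).
set.seed(2)
y_obs      <- rnorm(40, 20, 1)            # stand-in for the observed data
post_mu    <- rnorm(2000, 20, 0.2)        # stand-in posterior draws of mu
post_sigma <- rep(1, 2000)                # stand-in posterior draws of sigma
T_obs <- max(y_obs)
T_rep <- replicate(2000, {
  k <- sample(length(post_mu), 1)                      # one theta* per replicate
  max(rnorm(length(y_obs), post_mu[k], post_sigma[k])) # statistic on simulated data
})
mean(T_rep >= T_obs)   # posterior predictive p-value; values near 0 or 1 flag misfit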

Hypothesis testing; model comparison

Ultimately we want inference on P(M∣Y), but computing the marginal likelihood is difficult:

P(M∣Y) = [ ∫Θ P(Y∣θM) π(θ) dθ ] π(M) / P(Y)

Usually Bayesians make model decisions through Bayes Factors:

B12(y) = w1(y) / w2(y), where wi(y) = ∫Θ πi(θ) fi(y∣θ) dθ

Bayes Factor interpretation


The odds ratio for two models:

posterior odds = Bayes Factor × prior odds

Uniform prior odds across models implies that

posterior odds = Bayes Factor

So the Bayes Factor is the amount of evidence for one model compared to another: the change in odds when moving from the prior to the posterior.
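As a quick worked example in R (the two log marginal likelihood values are made up for illustration):

# Posterior odds and model probabilities from (log) marginal likelihoods.
logw1 <- -234.2                          # hypothetical log w1(y) for model 1
logw2 <- -238.9                          # hypothetical log w2(y) for model 2
logBF12    <- logw1 - logw2              # log Bayes Factor B12
prior_odds <- 1                          # uniform prior odds across models
post_odds  <- exp(logBF12) * prior_odds  # posterior odds = BF x prior odds
post_odds / (1 + post_odds)              # P(M1 | Y) when only M1 and M2 are considered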

Recall:

P(θ∣Y) = P(y∣θ) P(θ) / P(y)

P(Y) = ∫ P(y∣θ) P(θ) dθ

Newton & Raftery (1994)

Rearranging Bayes rule:

P(Y) P(θ∣Y) / P(y∣θ) = P(θ)

Integrating both sides over θ:

P(Y) ∫ P(θ∣Y) / P(y∣θ) dθ = ∫ P(θ) dθ = 1

so that

E_{P(θ∣Y)}[ 1 / P(y∣θ) ] = 1 / P(Y)

And they estimated P(Y) by

P̂(Y) = [ (1/N) Σ_{i=1}^{N} 1 / P(y∣θi) ]^{-1}

Compute this by calculating the likelihood for each value of θi that was obtained from the posterior sampling step.

The harmonic mean estimator is very very very sensitive to outliers: posterior draws with extremely small values of P(y∣θ) make 1/P(y∣θ) enormous and dominate the sum. But it is asymptotically unbiased.


Calderhead and Girolami (2009) showed that the harmonic mean estimator can be massively biased for finite samples.
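A sketch of the estimator in R, computed on the log scale with log-sum-exp for numerical stability; loglik is a hypothetical vector of log P(y∣θi) evaluated at the posterior draws:

# Harmonic mean estimator of log P(Y), Newton & Raftery (1994).
log_phat_Y <- function(loglik) {
  a <- -loglik                               # log of 1 / P(y | theta_i)
  m <- max(a)                                # log-sum-exp to avoid overflow
  log_mean_inv <- m + log(mean(exp(a - m)))  # log of (1/N) sum_i 1 / P(y | theta_i)
  -log_mean_inv                              # log P_hat(Y)
}
log_phat_Y(rep(-100, 1000))          # all draws equal: returns -100
log_phat_Y(c(rep(-100, 999), -160))  # one tiny-likelihood draw drags it to about -153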

Thermodynamic Integration

Friel, N. and Pettitt, A. (2008). Marginal likelihood estimation via power posteriors. Journal of the Royal Statistical Society: Series B 70(3).

Calderhead, B. and Girolami, M. (2009). Estimating Bayes factors via thermodynamic integration and population MCMC. Computational Statistics and Data Analysis 53.

In Parallel Tempering we sample from the power posteriors

Pm(θ∣Y) = P(y∣θ)^βm P(θ) / Pm(y)

But we can get the marginal likelihood via:

log p(Y) = ∫₀¹ ∫ log[ p(Y∣θ) ] Pβ(θ∣Y) dθ dβ = ∫₀¹ E_{Pβ(θ∣Y)}[ log p(Y∣θ) ] dβ

Compute via 1-dimensional quadrature over the temperature!

Approximate with the trapezoidal rule:

log p(Y) ≈ (1/2) Σm (βm − βm−1)(Em + Em−1), where Em = E_{Pm(θ∣Y)}[ log p(Y∣θ) ]

To compute log(marginal likelihoods) all we need is to define a good grid for temperatures.

Calderhead and Girolami (2009) suggest the temperature schedule

β = (seq(from = 1, to = N) / N)^5
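A sketch of the whole quadrature step in R. The Em values below are a smooth stand-in curve; in practice each Em is the average log-likelihood over the parallel tempering chain at temperature βm, and prepending β = 0 (the prior) is an assumption here so the grid spans [0, 1]:

# Thermodynamic integration by trapezoidal quadrature over temperature.
M    <- 30
beta <- c(0, (seq(from = 1, to = M) / M)^5)  # power-of-5 schedule, plus beta = 0
Em   <- -500 + 450 * beta^0.3                # stand-in for E[ log p(Y | theta) ] at each beta
logml <- sum(0.5 * diff(beta) * (head(Em, -1) + tail(Em, -1)))
logml                                        # estimated log p(Y)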

Parallel Tempering To the Extreme!

RStudio plots for the Galaxy data set (3 groups): density of one of the mean parameters vs temperature.

Parallel Tempering densities: that dip just before temperature β = 1 is real. It is caused by the introduction of new modes.

Compare the 3-group Galaxy to the 6-group Galaxy.

Plots of mean density vs temperature:

25,000 iterations with 30 parallel chains.

B12(y) = w1(y) / w2(y)

Now, back to RStudio to compare the Galaxy data with k=3 groups vs k=6 groups.

(The result: there is decisive evidence that the k = 3 group model is better.)
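In R, the comparison amounts to differencing the two log marginal likelihood estimates; the numbers below are placeholders, not the actual Galaxy output:

# Bayes factor for k = 3 vs k = 6 groups from two TI runs.
logml_k3 <- -226.8                  # placeholder log p(Y | k = 3)
logml_k6 <- -233.4                  # placeholder log p(Y | k = 6)
BF <- exp(logml_k3 - logml_k6)
BF                                  # BF > 100 is conventionally called decisive evidence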

Alternative to Bayes Factors: RJMCMC

MODEL POSTERIOR PROBABILITY

Likelihood: P(Y∣θj, Mj)

Parameter Prior: P(θj∣Mj)

Model Prior: P(Mj∣Ω) for Mj ∈ Ω

The marginal posterior probability of a model is helpful when the answer is not clear:

P(Mj∣Y, Ω) = ∫ P(Y∣θj, Mj, Ω) P(θj∣Mj, Ω) P(Mj∣Ω) dθj / P(Y)

P(Mj∣Y, Ω) = ∫ P(θj, Mj∣Y, Ω) dθj

Our goal is to get P(Mj∣Y, Ω) = ∫ P(θj, Mj∣Y, Ω) dθj in a single MCMC chain, even if Ω contains a lot of models.

We need simulation methods that sample across models.

REVERSIBLE JUMP MCMC

We can avoid extensive MCMC for each model and instead sample from P(Mj∣Y, Ω) directly!

We just adjust MCMC so at each iteration we:

1. Sample j, i.e. choose a model Mnew

2. Then propose a θnew from Mnew

3. Keep Mnew and θnew with probability

α = min( [P(Y∣θnew, Mnew) P(θnew, Mnew) Pnew(vnew)] / [P(Y∣θold, Mold) P(θold, Mold) Pold(vold)] × Jold,new , 1 )

Green, P. J. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82(4), 711-732.

We use auxiliary variables v to augment the dimension space so that dim(Mold) = dim(Mnew).

JACOBIAN

We need the Jacobian for the transformation, and the proposed values need to allow the possibility of being accepted.

J is the matrix of partial derivatives of (θold, vold) with respect to (θnew, vnew), whose determinant enters the acceptance probability:

⎡ ∂θold,1/∂θnew,1     …  ∂θold,1/∂θnew,pnew     …  ∂θold,1/∂vnew    ⎤
⎢        ⋮            ⋱         ⋮                         ⋮          ⎥
⎢ ∂θold,pold/∂θnew,1  …  ∂θold,pold/∂θnew,pnew  …  ∂θold,pold/∂vnew ⎥
⎢        ⋮                      ⋮               ⋱         ⋮          ⎥
⎣ ∂vold/∂θnew,1       …         …               …  ∂vold/∂vnew      ⎦
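When the transformation is awkward to differentiate by hand, the Jacobian can be checked numerically. A sketch using the numDeriv package, with a hypothetical split/merge map between a one-component and a two-component parameterization:

# Numerical check of |J| for a dimension-matching map.
# Forward map: (theta_old, v) -> (theta_old * v, theta_old * (1 - v));
# J below differentiates the inverse (merge) map, matching d theta_old / d theta_new.
library(numDeriv)                      # install.packages("numDeriv") if needed
inv_map <- function(x) {               # x = (theta_new_1, theta_new_2)
  c(theta_old = x[1] + x[2],           # merge the pair back into theta_old
    v_old     = x[1] / (x[1] + x[2]))  # recover the split fraction
}
J <- jacobian(inv_map, c(1.2, 0.8))
abs(det(J))                            # here 1 / (1.2 + 0.8) = 0.5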

POTENTIAL PROBLEMS

M1 and M2 have different parameter dimensions.

Often model parameters don't have an obvious transformation allowing an intuitive transition.

The last accepted value might be from a different model and may require a large jump in the parameter space.

M1: Y ~ N(β1,0 + β1,1 X, σ1²)

M2: Y ~ N(β2,0 + β2,1 X + β2,2 X², σ2²)

Moving from M1 to M2 will require moving β1,0 quite far to get to a reasonable location for β2,0.
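A minimal sketch of one such jump in R, under simplifying assumptions not in the slides: error sd fixed at 1, N(0, 10²) priors on all coefficients, equal model priors (so they cancel), and the identity map on the shared coefficients with the extra coefficient drawn from an auxiliary N(0, 0.5), giving ∣J∣ = 1. This identity mapping is exactly the kind of naive choice the slide warns about; the point here is only the mechanics of the acceptance ratio:

# One RJMCMC proposal from M1 (linear) to M2 (quadratic).
set.seed(3)
X <- runif(50, -2, 2)
Y <- 1 + 0.5 * X + rnorm(50)                    # data generated from M1
loglik <- function(b) {
  mu <- cbind(1, X, X^2)[, seq_along(b)] %*% b  # design columns used depend on dim(b)
  sum(dnorm(Y, mu, 1, log = TRUE))
}
logprior <- function(b) sum(dnorm(b, 0, 10, log = TRUE))
b_old <- c(1.02, 0.48)                          # current state in M1
v     <- rnorm(1, 0, 0.5)                       # auxiliary draw for the new coefficient
b_new <- c(b_old, v)                            # dimension-matched proposal in M2
log_alpha <- loglik(b_new) + logprior(b_new) -
             loglik(b_old) - logprior(b_old) -
             dnorm(v, 0, 0.5, log = TRUE)       # reverse move drops v deterministically; |J| = 1
accept <- log(runif(1)) < min(0, log_alpha)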

M1: Galaxy with 3 Gaussians

M2: Galaxy with 4 Gaussians

Moving from M1 to M2 can be done by splitting one of the current Gaussians; moving from M2 to M1 can be done by merging 2 components.

RJMCMC: Beautiful in principle, nasty in practice

Needs: a transition function between parameters in the different model spaces.

Efficiency depends completely on this functional choice and the distribution for the auxiliary variables.

Works well when we can use a birth/death process (e.g. change-point analysis).