Page 1:

Bayesian inference: what it means and why we care

Robin J. Ryder

Centre de Recherche en Mathématiques de la Décision, Université Paris-Dauphine

6 November 2017, Mathematical Coffees

Robin Ryder (Dauphine) Bayesian inference: what and why 06/11/17 1 / 42

Page 2:

The aim of Statistics

In Statistics, we generally care about inferring information about an unknown parameter θ. For instance, we observe X_1, …, X_n ∼ N(θ, 1) and wish to:

Obtain a (point) estimate θ̂ of θ, e.g. θ̂ = 1.3.

Measure the uncertainty of our estimator, by obtaining an interval or region of plausible values, e.g. [0.9, 1.5] is a 95% confidence interval for θ.

Perform model choice/hypothesis testing, e.g. decide between H_0: θ = 0 and H_1: θ ≠ 0, or between H_0: X_i ∼ N(θ, 1) and H_1: X_i ∼ E(θ).

Use this inference in postprocessing: prediction, decision-making,input of another model...

Page 3:

Why be Bayesian?

Some application areas make heavy use of Bayesian inference, because:

The models are complex

Estimating uncertainty is paramount

The output of one model is used as the input of another

We are interested in complex functions of our parameters

Page 4:

Frequentist statistics

Statistical inference deals with estimating an unknown parameter θ given some data D.

In the frequentist view of statistics, θ has a true fixed (deterministic) value.

Uncertainty is measured by confidence intervals, which are not intuitive to interpret: if I get a 95% CI of [80; 120] (i.e. 100 ± 20) for θ, I cannot say that there is a 95% probability that θ belongs to the interval [80; 120].

Frequentist statistics often use the maximum likelihood estimator: for which value of θ would the data be most likely (under our model)?

L(θ|D) = P[D|θ]

θ̂ = arg max_θ L(θ|D)
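As a concrete illustration (my addition, not from the slides): for the N(θ, 1) model above, the MLE is the sample mean. A minimal sketch with assumed toy data, using only the Python standard library:

```python
import math

# Log-likelihood of theta for i.i.d. observations X_i ~ N(theta, 1)
def log_lik(theta, xs):
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (x - theta) ** 2 for x in xs)

xs = [1.2, 0.7, 1.9, 1.4, 0.8]          # assumed toy sample
mle = sum(xs) / len(xs)                  # for the normal mean, the MLE is the sample mean

# The sample mean attains at least as high a log-likelihood as nearby values
assert all(log_lik(mle, xs) >= log_lik(mle + d, xs) for d in (-0.1, 0.1))
print(round(mle, 2))  # 1.2
```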

Page 5:

Bayes’ rule

Recall Bayes’ rule: for two events A and B, we have

P[A|B] = P[B|A] P[A] / P[B].

Alternatively, with marginal and conditional densities:

π(y|x) = π(x|y) π(y) / π(x).

Page 6:

Bayesian statistics

In the Bayesian framework, the parameter θ is seen as inherentlyrandom: it has a distribution.

Before I see any data, I have a prior distribution π(θ) on θ, usually uninformative.

Once I take the data into account, I get a posterior distribution, which is hopefully more informative. By Bayes' rule,

π(θ|D) = π(D|θ) π(θ) / π(D).

By definition, π(D|θ) = L(θ|D). The quantity π(D) is a normalizing constant with respect to θ, so we usually do not include it and write instead

π(θ|D) ∝ π(θ)L(θ|D).
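When π(D) is intractable in closed form, one low-tech option for a one-dimensional θ is to normalize prior × likelihood numerically on a grid. A sketch under assumed toy data (7 successes in 10 Bernoulli trials, uniform prior), standard library only:

```python
# Unnormalized posterior on a grid: prior x likelihood, then normalize.
# Assumed toy data: 7 successes in 10 Bernoulli trials, uniform prior on theta.
n, s = 10, 7
grid = [(i + 0.5) / 1000 for i in range(1000)]              # theta values in (0, 1)
unnorm = [1.0 * t ** s * (1 - t) ** (n - s) for t in grid]  # prior density = 1
z = sum(unnorm) / len(grid)                                 # approximates pi(D)
post = [u / z for u in unnorm]                              # normalized density values

post_mean = sum(t * p for t, p in zip(grid, post)) / len(grid)
print(round(post_mean, 3))  # close to (s + 1) / (n + 2) ≈ 0.667
```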

Page 7:

Bayesian statistics

π(θ|D) ∝ π(θ)L(θ|D)

Different people have different priors, hence different posteriors. But with enough data, the choice of prior matters little.

We are now allowed to make probability statements about θ, such as "there is a 95% probability that θ belongs to the interval [78; 119]" (credible interval).

Page 8:

Advantages and drawbacks of Bayesian statistics

More intuitive interpretation of the results

Easier to think about uncertainty

In a hierarchical setting, it becomes easier to take into account all the sources of variability

Prior specification: need to check that changing your prior does not change your result

Computationally intensive

Page 9:

Example: Bernoulli

Take X_i ∼ Bernoulli(θ), i.e.

P[X_i = 1] = θ,  P[X_i = 0] = 1 − θ.

Possible prior: θ ∼ U([0, 1]), i.e. π(θ) = 1 for 0 ≤ θ ≤ 1.

Likelihood:

L(θ|X_i) = θ^{X_i} (1 − θ)^{1 − X_i}

L(θ|X_1, …, X_n) = θ^{∑ X_i} (1 − θ)^{n − ∑ X_i} = θ^{S_n} (1 − θ)^{n − S_n}

Posterior, with S_n = ∑_{i=1}^n X_i:

π(θ|X_1, …, X_n) ∝ 1 · θ^{S_n} (1 − θ)^{n − S_n}

We can compute the normalizing constant analytically:

π(θ|X_1, …, X_n) = (n + 1)! / (S_n! (n − S_n)!) · θ^{S_n} (1 − θ)^{n − S_n}
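A quick numeric sanity check of this normalizing constant (my addition, not from the slides), using assumed toy values S_n = 7, n = 10 and a midpoint-rule integral:

```python
import math

# Check numerically that (n+1)!/(S_n!(n-S_n)!) * theta^S (1-theta)^(n-S)
# integrates to 1 over [0, 1] (midpoint rule with m subintervals).
n, s = 10, 7
const = math.factorial(n + 1) / (math.factorial(s) * math.factorial(n - s))
m = 100_000
total = sum(const * ((k + 0.5) / m) ** s * (1 - (k + 0.5) / m) ** (n - s)
            for k in range(m)) / m
print(round(total, 4))  # 1.0
```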

Page 10:

Conjugate prior

Suppose we take the prior θ ∼ Beta(α, β):

π(θ) = Γ(α + β) / (Γ(α) Γ(β)) · θ^{α−1} (1 − θ)^{β−1}.

Then the posterior verifies

π(θ|X_1, …, X_n) ∝ θ^{α−1} (1 − θ)^{β−1} · θ^{S_n} (1 − θ)^{n − S_n}

hence

θ|X_1, …, X_n ∼ Beta(α + S_n, β + n − S_n).

Whatever the data, the posterior is in the same family as the prior: we say that the prior is conjugate for this model. This is very convenient mathematically.
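The conjugate update is a one-liner in code. A sketch with assumed toy numbers (a uniform Beta(1, 1) prior, and the S_n = 72, n = 100 data that appears later in the talk):

```python
# Beta-Binomial conjugate update: prior Beta(alpha, beta), data S_n successes
# out of n trials, posterior Beta(alpha + S_n, beta + n - S_n).
def update(alpha, beta, s, n):
    return alpha + s, beta + n - s

a, b = update(1, 1, 72, 100)          # uniform prior = Beta(1, 1)
post_mean = a / (a + b)               # mean of a Beta(a, b) distribution
print(a, b, round(post_mean, 3))      # 73 29 0.716
```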

Page 11:

Jeffreys prior

Another possible default prior is the Jeffreys prior, which is invariant under a change of variables.

Let ℓ be the log-likelihood and I be Fisher's information:

I(θ) = E[ (dℓ/dθ)² | X ∼ P_θ ] = −E[ (d²/dθ²) ℓ(θ; X) | X ∼ P_θ ].

The Jeffreys prior is defined by

π(θ) ∝ √I(θ).
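For the Bernoulli model, the Fisher information works out to I(θ) = 1/(θ(1 − θ)), so the Jeffreys prior is ∝ θ^{−1/2}(1 − θ)^{−1/2}, i.e. a Beta(1/2, 1/2). A small numeric check of the expectation (my sketch, standard library only):

```python
import math

# Fisher information for Bernoulli(theta): the score is
# dl/dtheta = X/theta - (1 - X)/(1 - theta), and
# I(theta) = E[score^2] = 1 / (theta (1 - theta)).
def fisher_info(theta):
    score_x1 = 1 / theta             # score when X = 1 (probability theta)
    score_x0 = -1 / (1 - theta)      # score when X = 0 (probability 1 - theta)
    return theta * score_x1 ** 2 + (1 - theta) * score_x0 ** 2

theta = 0.3
assert abs(fisher_info(theta) - 1 / (theta * (1 - theta))) < 1e-12
jeffreys_unnorm = math.sqrt(fisher_info(theta))   # pi(theta) ∝ sqrt(I(theta))
print(round(jeffreys_unnorm, 3))  # 2.182
```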

Page 12:

Invariance of the Jeffreys prior

Let φ be an alternate parameterization of the model. Then the prior induced on φ by the Jeffreys prior on θ is

π(φ) = π(θ) |dθ/dφ| ∝ √( I(θ) (dθ/dφ)² )

= √( E[(dℓ/dθ)²] (dθ/dφ)² )

= √( E[(dℓ/dθ · dθ/dφ)²] )

= √( E[(dℓ/dφ)²] )

= √I(φ),

which is the Jeffreys prior on φ.

Page 13:

Effect of prior

Example: Bernoulli model (biased coin); θ is the probability of success. Observe S_n = 72 successes out of n = 100 trials. Frequentist estimate: θ̂ = 0.72; 95% confidence interval: [0.63, 0.81]. Bayesian estimate: will depend on the prior.
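To see the prior's influence shrink with sample size, one can compare posterior means under a few Beta priors. The specific prior choices below, Beta(1, 1), Beta(1/2, 1/2), Beta(10, 10), are my assumptions for illustration:

```python
# Posterior mean (alpha + s) / (alpha + beta + n) under different Beta priors,
# for the three data sets shown on the following slides.
def post_mean(alpha, beta, s, n):
    return (alpha + s) / (alpha + beta + n)

priors = [(1, 1), (0.5, 0.5), (10, 10)]
for s, n in [(7, 10), (72, 100), (721, 1000)]:
    means = [post_mean(a, b, s, n) for a, b in priors]
    print(s, n, [round(m, 3) for m in means])  # means converge as n grows
```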

Page 14:

Effect of prior

[Figure: three priors and the resulting posteriors for S_n = 72, n = 100. Black: prior; green: likelihood; red: posterior.]

Page 15:

Effect of prior

[Figure: three priors and the resulting posteriors for S_n = 7, n = 10. Black: prior; green: likelihood; red: posterior.]

Page 16:

Effect of prior

[Figure: three priors and the resulting posteriors for S_n = 721, n = 1000. Black: prior; green: likelihood; red: posterior.]

Page 17:

Choosing the prior

The choice of the prior distribution can have a large impact, especially if the data are of small to moderate size. How do we choose the prior?

Expert knowledge of the application

A previous experiment

A conjugate prior, i.e. one that is convenient mathematically, withmoments chosen by expert knowledge

A non-informative prior

...

In all cases, the best practice is to try several priors, and to see whether the posteriors agree: would the data be enough to make experts who disagreed a priori come to an agreement?

Page 18:

Example: phylogenetic tree

Example from Ryder & Nicholls (2011). Given lexical data, we wish to infer the age of the Most Recent Common Ancestor of the Indo-European languages. Two main hypotheses:

Kurgan hypothesis: root age is 6000-6500 years Before Present (BP).

Anatolian hypothesis: root age is 8000-9500 years BP.

Page 19:

Example of a tree

Page 20:

Why be Bayesian in this setting?

Our model is complex and the likelihood function is not pleasant

We are interested in the marginal distribution of the root age

Many nuisance parameters: tree topology, internal ages, evolutionrates...

We want to make sure that our inference procedure does not favour one of the two hypotheses a priori

We will use the output as input of other models

For the root age, we choose a prior U([5000, 16000]). Priors for the other parameters are outside the scope of this talk.

Page 21:

Model parameters

Parameter space is large:

Root age R

Tree topology and internal ages g (complex state space)

Evolution parameters λ, µ, ρ, κ

...

The posterior distribution is defined by

π(R, g, λ, µ, ρ, κ|D) ∝ π(R) π(g) π(λ, µ, ρ, κ) L(R, g, λ, µ, ρ, κ|D)

We are interested in the marginal distribution of R given the data D:

π(R|D) = ∫ π(R, g, λ, µ, ρ, κ|D) dg dλ dµ dρ dκ.

Page 22:

Computation

This distribution is not available analytically, nor can we sample from it directly. But we can build a Markov Chain Monte Carlo scheme (see Jalal's talk) to get a sample from the joint posterior distribution of (R, g, λ, µ, ρ, κ) given D. Then keeping only the R component gives us a sample from the marginal posterior distribution of R given D.
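The phylogenetic model itself is beyond a short example, but the MCMC idea can be sketched on the earlier Bernoulli posterior: a random-walk Metropolis-Hastings chain targeting π(θ|D) ∝ θ^{S_n}(1 − θ)^{n−S_n}. The step size, chain length and burn-in below are assumed toy values:

```python
import math
import random

random.seed(1)

# Unnormalized log-posterior for the Bernoulli example (uniform prior),
# with the data S_n = 72 successes out of n = 100 trials.
n, s = 100, 72
def log_post(theta):
    if not 0 < theta < 1:
        return -math.inf
    return s * math.log(theta) + (n - s) * math.log(1 - theta)

theta, samples = 0.5, []
for _ in range(20000):
    prop = theta + random.gauss(0, 0.05)   # symmetric random-walk proposal
    if math.log(random.random()) < log_post(prop) - log_post(theta):
        theta = prop                        # accept the proposal
    samples.append(theta)                   # on rejection, keep the old state

burned = samples[5000:]                     # discard burn-in
print(round(sum(burned) / len(burned), 2))  # close to the posterior mean 73/102
```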

Page 23:

Root age: prior

[Figure: prior density for the age of Proto-Indo-European, in years Before Present, with the ranges of the Kurgan and Anatolian hypotheses marked.]

Page 24:

Root age: posterior

[Figure: prior and posterior densities for the age of Proto-Indo-European, in years Before Present, with the ranges of the Kurgan and Anatolian hypotheses marked.]

Page 25:

Phylogenetics tree of languages: conclusions

Strong support for Anatolian hypothesis; no support for Kurgan hypothesis

Measuring the uncertainty of the root age estimate is key

We integrate out the uncertainty of the nuisance parameters

This problem is much easier to handle in the Bayesian setting than in the frequentist setting

Computational aspects are complex

Page 26:

Air France Flight 447

This section and its figures are after Stone et al. (Statistical Science 2014). Air France Flight 447 disappeared over the Atlantic on 1 June 2009, en route from Rio de Janeiro to Paris; all 228 people on board were killed. The first three search parties did not succeed in retrieving the wreckage or flight recorders. In 2011, a fourth party was launched, based on a Bayesian search.

Figure: Flight route. Picture by Mysid, Public Domain.

Page 27:

Why be Bayesian?

Many sources of uncertainty

Subjective probabilities

The object of interest is a distribution

Frequentist formalism does not apply (unique event)

Page 28:

Previous searches

Page 29:

Prior based on flight dynamics

Page 30:

Probabilities derived from drift

Page 31:

Posterior

Page 32:

Conclusions

Once the posterior distribution was derived, the search was organized starting with the areas of highest posterior probability

Actually several posteriors, because several models were considered

The wreckage was located in one week

Page 33:

Point estimates

Although one of the main purposes of Bayesian inference is getting a distribution, we may also need to summarize the posterior with a point estimate. Common choices:

Posterior mean

θ̂ = ∫ θ · π(θ|D) dθ

Maximum a posteriori (MAP)

θ̂ = arg max_θ π(θ|D)

Posterior median

...
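Given a posterior sample (e.g. MCMC output), all three summaries are easy to compute. A sketch where draws from Beta(73, 29), the posterior of the earlier coin example, stand in for chain output; the histogram-based MAP estimate is a crude assumption of mine, not a standard recipe:

```python
import random
import statistics

random.seed(0)
# Posterior samples stand in for MCMC output; here, draws from Beta(73, 29).
samples = [random.betavariate(73, 29) for _ in range(50000)]

post_mean = statistics.fmean(samples)        # posterior mean
post_median = statistics.median(samples)     # posterior median
# Crude MAP estimate: midpoint of the fullest bin of a 100-bin histogram.
counts = [0] * 100
for x in samples:
    counts[min(int(x * 100), 99)] += 1
map_est = (counts.index(max(counts)) + 0.5) / 100

print(round(post_mean, 2), round(post_median, 2), round(map_est, 2))
```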

Page 34:

Optimality

From a frequentist point of view, the posterior expectation is optimal in a certain sense. Let θ be the true value of the parameter of interest, and θ̂(X) an estimator. Then the posterior mean minimizes the expectation of the squared error under the prior:

E[ ‖θ − θ̂(X)‖₂² ].

For this reason, the posterior mean is also called the minimum mean square error (MMSE) estimator. For other loss functions, other point estimates are optimal.

Page 35:

2D Ising models

[Figure: (a) original image; (b) focused region of image.]

Page 36:

2D Ising model

Higdon (JASA 1998)

Target density

Consider a 2D Ising model, with posterior density

π(x|y) ∝ exp( α ∑_i 1[y_i = x_i] + β ∑_{i∼j} 1[x_i = x_j] )

with α = 1, β = 0.7.

The first term (likelihood) encourages states x which are similar to the original image y.

The second term (prior) favors states x for which neighbouring pixels are equal, like a Potts model.
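A minimal single-site Metropolis sampler for this target can be sketched as follows; the grid size, the noisy toy image y, and the iteration counts are my assumptions for illustration, not Higdon's setup:

```python
import math
import random

random.seed(0)
N = 16                      # small grid for illustration
alpha, beta = 1.0, 0.7      # the values used on the slide

# Toy "observed" image y: left half on (1), right half off (0), plus 10% noise.
y = [[1 if j < N // 2 else 0 for j in range(N)] for i in range(N)]
for i in range(N):
    for j in range(N):
        if random.random() < 0.1:
            y[i][j] = 1 - y[i][j]

x = [[row[j] for j in range(N)] for row in y]   # start the chain at y

def delta_logpi(x, i, j, new):
    """Change in log pi(x|y) from setting pixel (i, j) to `new`."""
    old = x[i][j]
    d = alpha * ((y[i][j] == new) - (y[i][j] == old))
    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < N and 0 <= nj < N:
            d += beta * ((x[ni][nj] == new) - (x[ni][nj] == old))
    return d

# Single-site Metropolis sweeps: propose flipping one pixel at a time
# (a deterministic flip is a symmetric proposal, so the usual ratio applies).
for sweep in range(200):
    for i in range(N):
        for j in range(N):
            new = 1 - x[i][j]
            if math.log(random.random()) < delta_logpi(x, i, j, new):
                x[i][j] = new

print(sum(map(sum, x)))   # number of "on" pixels in the final state
```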

Page 37:

2D Ising models: posterior exploration

[Figure: states explored over 500,000 iterations for the Metropolis-Hastings (top) and Wang-Landau (bottom) algorithms. Figure from Bornn et al. (JCGS 2013); see also Jacob & Ryder (AAP 2014) for more on the algorithm.]

Page 38:

2D Ising models: posterior mean

Figure: Spatial model example: average state explored with Wang-Landau after importance sampling. Figure from Bornn et al. (JCGS 2013); see also Jacob & Ryder (AAP 2014) for more on the algorithm.

Page 39:

Ising model: conclusions

Problem-specific prior

Even with a point estimate (posterior mean), we measure uncertainty

Computational cost is very high

Page 40:

Several models

Suppose we have several models m_1, m_2, …, m_k. Then the model index can be viewed as a parameter. Take a uniform (or other) prior:

P[M = m_j] = 1/k.

The posterior distribution then gives us the probability associated with each model given the data. We can use this for model choice (but there are other, more sophisticated, techniques), but also for estimation/prediction while integrating out the uncertainty on the model.
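A toy instance (my assumption, not from the talk): two models for n coin flips with s heads, m_1 fixing θ = 1/2 and m_2 putting a uniform prior on θ. Both marginal likelihoods are available in closed form, using ∫ θ^s (1 − θ)^{n−s} dθ = s!(n − s)!/(n + 1)!:

```python
import math

# Two models for a given sequence of n coin flips containing s heads:
#   m1: theta = 0.5 fixed    -> marginal likelihood 0.5^n
#   m2: theta ~ U(0, 1)      -> marginal likelihood s!(n-s)!/(n+1)!
def model_posterior(s, n):
    ml1 = 0.5 ** n
    ml2 = math.factorial(s) * math.factorial(n - s) / math.factorial(n + 1)
    prior = 0.5                      # uniform prior over the two models
    z = prior * ml1 + prior * ml2    # pi(D), summing over models
    return prior * ml1 / z, prior * ml2 / z

# For 72 heads in 100 flips, the fair-coin model gets little posterior mass.
print([round(p, 3) for p in model_posterior(72, 100)])
```

With a balanced outcome such as s = 50, the simpler fixed-θ model is favoured instead, a small illustration of the automatic penalization of the more flexible model.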

Page 41:

Example: variable selection for linear regression

A model is a choice of covariates to include in the regression. With p covariates, there are 2^p models. Classical (frequentist) setting:

Select variables, using your favourite penalty, thus selecting one model

Perform estimation and prediction within that model

If you want error bars, you can compute them, but only within that model

Bayesian setting:

Explore space of all models

Get posterior probabilities

Compute estimation and prediction for each model (or, in practice, for those with non-negligible probability)

Weight these estimates/predictions by the posterior probability of each model

The uncertainty about the model is thus fully taken into account.

Page 42:

Conclusions

Bayesian inference is a powerful tool to fully take into account all sources of uncertainty

Difficulty of prior specification

Computational issues are the main hurdle
