# Chris Sherlock's slides

date post

06-Jan-2017Category

## Science

view

2.405download

0

Embed Size (px)

### Transcript of Chris Sherlock's slides

Fast generic MCMC for targets with expensivelikelihoods

C.Sherlock (Lancaster), A.Golightly (Ncl) & D.Henderson(Ncl); this talk by: Jere Koskela (Warwick)

Monte Carlo Techniques, Paris, July 2016

Motivation

Metropolis-Hastings (MH) algorithms create a realisation:(1), (2), . . . from a Markov chain with stationary density ().For each (i) we evaluate ((i)) - and then discard it.

Pseudo-marginal MH algorithms create a realisation of ((i), u(i))- and then discard it.

In many applications evaluating () or (, u) is computationallyexpensive.

We would like to reuse these values to create a more efficient MHalgorithm that still targets the correct stationary distribution.

We will focus on the [pseudo-marginal] random walk Metropolis(RWM) with a Gaussian proposal:

| N(, 2V ).

Motivation

Metropolis-Hastings (MH) algorithms create a realisation:(1), (2), . . . from a Markov chain with stationary density ().For each (i) we evaluate ((i)) - and then discard it.

Pseudo-marginal MH algorithms create a realisation of ((i), u(i))- and then discard it.

In many applications evaluating () or (, u) is computationallyexpensive.

We would like to reuse these values to create a more efficient MHalgorithm that still targets the correct stationary distribution.

We will focus on the [pseudo-marginal] random walk Metropolis(RWM) with a Gaussian proposal:

| N(, 2V ).

This talk

Creating an approximation to (): k-NN.

Using an approximation: Delayed acceptance [PsM]MH.

Storing the values: KD-trees.

Algorithm: adaptive da-[PsM]MH; ergodicity.

Choice of P(fixed kernel).

Simulation study.

Creating an approximation

At iteration n of the MH we have ((1)), ((2)), . . . , ((n)), andwe would like to create a(

) ().

Gaussian process to log ?:[problems with cost of fitting and evaluating as n ]

Weighted average of k-nearest neighbour values:(i) Fitting cost: 0 (actually O(n0)).(ii) Per-iteration cost: O(n).(iii) Accuracy with n.

Creating an approximation

At iteration n of the MH we have ((1)), ((2)), . . . , ((n)), andwe would like to create a(

) ().

Gaussian process to log ?:

[problems with cost of fitting and evaluating as n ]

Weighted average of k-nearest neighbour values:(i) Fitting cost: 0 (actually O(n0)).(ii) Per-iteration cost: O(n).(iii) Accuracy with n.

Creating an approximation

At iteration n of the MH we have ((1)), ((2)), . . . , ((n)), andwe would like to create a(

) ().

Gaussian process to log ?:[problems with cost of fitting and evaluating as n ]

Weighted average of k-nearest neighbour values:(i) Fitting cost: 0 (actually O(n0)).(ii) Per-iteration cost: O(n).(iii) Accuracy with n.

Creating an approximation

At iteration n of the MH we have ((1)), ((2)), . . . , ((n)), andwe would like to create a(

) ().

Gaussian process to log ?:[problems with cost of fitting and evaluating as n ]

Weighted average of k-nearest neighbour values:(i) Fitting cost: 0 (actually O(n0)).

(ii) Per-iteration cost: O(n).(iii) Accuracy with n.

Creating an approximation

At iteration n of the MH we have ((1)), ((2)), . . . , ((n)), andwe would like to create a(

) ().

Gaussian process to log ?:[problems with cost of fitting and evaluating as n ]

Weighted average of k-nearest neighbour values:(i) Fitting cost: 0 (actually O(n0)).(ii) Per-iteration cost: O(n).

(iii) Accuracy with n.

Creating an approximation

At iteration n of the MH we have ((1)), ((2)), . . . , ((n)), andwe would like to create a(

) ().

Gaussian process to log ?:[problems with cost of fitting and evaluating as n ]

Delayed acceptance MH (1)

(Christen and Fox, 2005). Suppose we have acomputationally-cheap approximation to the posterior: a().

Define da(; ) := 1(;

) 2(; ), where

1 := 1 a(

)q(|)a()q(|)

and 2 := 1 ()/a(

)

()/a().

Detailed balance (with respect ()) is still preserved with dabecause

() q(|) da(; )= a() q(

|) 1 ()a() 2.

But this algorithm mixes worse than the equivalent MH algorithm(Peskun, 1973; Tierney, 1998).

Delayed acceptance MH (1)

(Christen and Fox, 2005). Suppose we have acomputationally-cheap approximation to the posterior: a().

Define da(; ) := 1(;

) 2(; ), where

1 := 1 a(

)q(|)a()q(|)

and 2 := 1 ()/a(

)

()/a().

Detailed balance (with respect ()) is still preserved with dabecause

() q(|) da(; )= a() q(

|) 1 ()a() 2.

But this algorithm mixes worse than the equivalent MH algorithm(Peskun, 1973; Tierney, 1998).

Delayed acceptance MH (1)

(Christen and Fox, 2005). Suppose we have acomputationally-cheap approximation to the posterior: a().

Define da(; ) := 1(;

) 2(; ), where

1 := 1 a(

)q(|)a()q(|)

and 2 := 1 ()/a(

)

()/a().

Detailed balance (with respect ()) is still preserved with dabecause

() q(|) da(; )

= a() q(|) 1 ()a() 2.

But this algorithm mixes worse than the equivalent MH algorithm(Peskun, 1973; Tierney, 1998).

Delayed acceptance MH (1)

Define da(; ) := 1(;

) 2(; ), where

1 := 1 a(

)q(|)a()q(|)

and 2 := 1 ()/a(

)

()/a().

Detailed balance (with respect ()) is still preserved with dabecause

() q(|) da(; )= a() q(

|) 1 ()a() 2.

But this algorithm mixes worse than the equivalent MH algorithm(Peskun, 1973; Tierney, 1998).

Delayed acceptance MH (1)

Define da(; ) := 1(;

) 2(; ), where

1 := 1 a(

)q(|)a()q(|)

and 2 := 1 ()/a(

)

()/a().

Detailed balance (with respect ()) is still preserved with dabecause

() q(|) da(; )= a() q(

|) 1 ()a() 2.

But this algorithm mixes worse than the equivalent MH algorithm(Peskun, 1973; Tierney, 1998).

Delayed-acceptance [PsM]MH (2)

Using da = 1(; )2(;

) mixes worse but CPU time/iterationcan be much reduced.

Accept accept at Stage 1 (w.p. 1) and accept at Stage 2 (w.p. 2).

1 is quick to calculate.

There is no need to calculate 2 if we reject at Stage One.

If a is accurate then 2 1.

If a is also cheap then (RWM) can use large jump proposals[EXPLAIN].

Delayed-acceptance PMMH:

2 := 1 (, u)/a(

)

(, u)/a().

Delayed-acceptance [PsM]MH (2)

Using da = 1(; )2(;

) mixes worse but CPU time/iterationcan be much reduced.

Accept accept at Stage 1 (w.p. 1) and accept at Stage 2 (w.p. 2).

1 is quick to calculate.

There is no need to calculate 2 if we reject at Stage One.

If a is accurate then 2 1.

If a is also cheap then (RWM) can use large jump proposals[EXPLAIN].

Delayed-acceptance PMMH:

2 := 1 (, u)/a(

)

(, u)/a().

Delayed-acceptance [PsM]MH (2)

Using da = 1(; )2(;

) mixes worse but CPU time/iterationcan be much reduced.

Accept accept at Stage 1 (w.p. 1) and accept at Stage 2 (w.p. 2).

1 is quick to calculate.

There is no need to calculate 2 if we reject at Stage One.

If a is accurate then 2 1.

If a is also cheap then (RWM) can use large jump proposals[EXPLAIN].

Delayed-acceptance PMMH:

2 := 1 (, u)/a(

)

(, u)/a().

Delayed-acceptance [PsM]MH (2)

Using da = 1(; )2(;

) mixes worse but CPU time/iterationcan be much reduced.

Accept accept at Stage 1 (w.p. 1) and accept at Stage 2 (w.p. 2).

1 is quick to calculate.

There is no need to calculate 2 if we reject at Stage One.

If a is accurate then 2 1.

If a is also cheap then (RWM) can use large jump proposals[EXPLAIN].

Delayed-acceptance PMMH:

2 := 1 (, u)/a(

)

(, u)/a().

Delayed-acceptance [PsM]MH (2)

Using da = 1(; )2(;

) mixes worse but CPU time/iterationcan be much reduced.

Accept accept at Stage 1 (w.p. 1) and accept at Stage 2 (w.p. 2).

1 is quick to calculate.

There is no need to calculate 2 if we reject at Stage One.

If a is accurate then 2 1.

If a is also cheap then (RWM) can use large jump proposals[EXPLAIN].

Delayed-acceptance PMMH:

2 := 1 (, u)/a(

)

(, u)/a().

Delayed-acceptance [PsM]MH (2)

Using da = 1(; )2(;

) mixes worse but CPU time/iterationcan be much reduced.

Accept accept at Stage 1 (w.p. 1) and accept at Stage 2 (w.p. 2).

1 is quick to calculate.

There is no need to calculate 2 if we reject at Stage One.

If a is accurate then 2 1.

If a is also cheap then (RWM) can use large jump proposals[EXPLAIN].

Delayed-acceptance PMMH:

2 := 1 (, u)/a(

)

(, u)/a().

Delayed-acceptance [PsM]MH (2)

Using da = 1(; )2(;

) mixes worse but CPU time/iterationcan be much reduced.

Accept accept at Stage 1 (w.p. 1) and accept at Stage 2 (w.p. 2).

1 is quick to calculate.

There is no need to calculate 2 if we reject at Stage One.

If a is accurate then 2 1.

If a is also cheap then (RWM) can use large jump proposals[EXPLAIN].

Delayed-acceptance PMMH:

2 := 1 (, u)/a