Chris Sherlock's slides
Fast generic MCMC for targets with expensive likelihoods

C. Sherlock (Lancaster), A. Golightly (Newcastle) & D. Henderson (Newcastle); this talk by: Jere Koskela (Warwick)

Monte Carlo Techniques, Paris, July 2016
Motivation

Metropolis-Hastings (MH) algorithms create a realisation θ(1), θ(2), ... from a Markov chain with stationary density π(θ). For each θ(i) we evaluate π(θ(i)), and then discard it.

Pseudo-marginal MH algorithms create a realisation of (θ(i), u(i)), and then discard it.

In many applications evaluating π(θ) or π̂(θ, u) is computationally expensive.

We would like to reuse these values to create a more efficient MH algorithm that still targets the correct stationary distribution.

We will focus on the [pseudo-marginal] random walk Metropolis (RWM) with a Gaussian proposal:

θ′ | θ ∼ N(θ, λ²V).
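A single RWM step with this Gaussian proposal can be sketched as follows; the target log_pi, the dimension, and the tuning values here are illustrative assumptions, not from the slides.

```python
import numpy as np

def rwm_step(theta, log_pi, lam, V, rng):
    """One random walk Metropolis step with proposal theta' | theta ~ N(theta, lam^2 V)."""
    L = np.linalg.cholesky(V)
    prop = theta + lam * (L @ rng.standard_normal(theta.shape))
    # Accept with probability 1 ^ pi(theta') / pi(theta), computed on the log scale.
    if np.log(rng.uniform()) < log_pi(prop) - log_pi(theta):
        return prop
    return theta

# Toy usage: a short chain targeting a standard bivariate Gaussian.
rng = np.random.default_rng(1)
log_pi = lambda th: -0.5 * th @ th
theta = np.zeros(2)
for _ in range(1000):
    theta = rwm_step(theta, log_pi, lam=1.0, V=np.eye(2), rng=rng)
```

Each iteration pays one evaluation of log_pi at the proposal; it is exactly this cost that the rest of the talk tries to reduce.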
This talk

Creating an approximation to π(θ): k-NN.
Using the approximation: delayed-acceptance [PsM]MH.
Storing the values: KD-trees.
Algorithm: adaptive da-[PsM]MH; ergodicity.
Choice of P(fixed kernel).
Simulation study.
Creating an approximation

At iteration n of the MH we have π(θ(1)), π(θ(2)), ..., π(θ(n)), and we would like to create a π̂a(θ) ≈ π(θ).

Fit a Gaussian process to log π? [Problems with the cost of fitting and evaluating as n → ∞.]

Weighted average of k-nearest-neighbour values:
(i) Fitting cost: 0 (actually O(n⁰)).
(ii) Per-iteration cost: O(n).
(iii) Accuracy ↑ with n.
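A minimal brute-force sketch of such a k-NN estimate of log π. The inverse-distance weighting below is an assumption (the slides do not specify the weights); the costs match the slide: nothing to fit, and an O(n) search per query, which the KD-trees mentioned in the outline are used to accelerate.

```python
import numpy as np

def knn_log_pi_hat(theta, thetas, log_pis, k=5, eps=1e-12):
    """Approximate log pi(theta) by an inverse-distance-weighted average of the
    k nearest stored evaluations.

    thetas  : (n, d) array of past chain states theta(1), ..., theta(n)
    log_pis : (n,) array of the stored values log pi(theta(i))
    """
    dist = np.linalg.norm(thetas - theta, axis=1)  # brute force: O(n) per query
    idx = np.argsort(dist)[:k]
    w = 1.0 / (dist[idx] + eps)  # assumed weights; an exact match dominates
    return np.sum(w * log_pis[idx]) / np.sum(w)
```

Querying at a stored state essentially returns the stored value, and querying between states interpolates, which is the behaviour the approximation needs.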
Delayed acceptance MH (1)

(Christen and Fox, 2005). Suppose we have a computationally cheap approximation to the posterior: π̂a(θ).

Define α_da(θ; θ′) := α̃1(θ; θ′) α̃2(θ; θ′), where

α̃1 := 1 ∧ [π̂a(θ′) q(θ | θ′)] / [π̂a(θ) q(θ′ | θ)]

and

α̃2 := 1 ∧ [π(θ′) / π̂a(θ′)] / [π(θ) / π̂a(θ)].

Detailed balance (with respect to π(θ)) is still preserved with α_da because

π(θ) q(θ′ | θ) α_da(θ; θ′) = [π̂a(θ) q(θ′ | θ) α̃1] × [(π(θ) / π̂a(θ)) α̃2],

and each factor in square brackets is symmetric in θ and θ′.

But this algorithm mixes worse than the equivalent MH algorithm (Peskun, 1973; Tierney, 1998).
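Under a symmetric (random walk) proposal the q-ratio in α̃1 cancels, and one delayed-acceptance step can be sketched as below; log_pi and log_pi_hat are placeholders for the expensive target and the cheap approximation. The point of the structure is that the expensive log_pi is only touched when Stage 1 accepts.

```python
import numpy as np

def da_mh_step(theta, log_pi, log_pi_hat, lam, V, rng):
    """One delayed-acceptance MH step with a symmetric random walk proposal,
    so the q-ratio in the Stage-1 acceptance probability cancels."""
    L = np.linalg.cholesky(V)
    prop = theta + lam * (L @ rng.standard_normal(theta.shape))
    # Stage 1: screen with the cheap approximation pi_hat_a (prob alpha~1).
    if np.log(rng.uniform()) >= min(0.0, log_pi_hat(prop) - log_pi_hat(theta)):
        return theta  # cheap rejection: the expensive pi is never evaluated
    # Stage 2: correct with the ratio of true to approximate densities (prob alpha~2).
    log_a2 = (log_pi(prop) - log_pi_hat(prop)) - (log_pi(theta) - log_pi_hat(theta))
    if np.log(rng.uniform()) < min(0.0, log_a2):
        return prop
    return theta
```

When log_pi_hat equals log_pi exactly, log_a2 is 0 and Stage 2 always accepts, recovering plain RWM.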
Delayed-acceptance [PsM]MH (2)

Using α_da = α̃1(θ; θ′) α̃2(θ; θ′) mixes worse, but CPU time per iteration can be much reduced.

Accept ⇔ accept at Stage 1 (w.p. α̃1) and accept at Stage 2 (w.p. α̃2).

α̃1 is quick to calculate.

There is no need to calculate α̃2 if we reject at Stage One.

If π̂a is accurate then α̃2 ≈ 1.

If π̂a is also cheap then (RWM) can use large jump proposals [EXPLAIN].

Delayed-acceptance PMMH:

α̃2 := 1 ∧ [π̂(θ′, u′) / π̂a(θ′)] / [π̂(θ, u) / π̂a(θ)].
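The saving from "no need to calculate α̃2 after a Stage-1 rejection" can be made concrete by counting calls to the expensive density. Everything here (target, approximation, step size) is an illustrative assumption; with deliberately large jumps, most proposals die cheaply at Stage 1.

```python
import numpy as np

rng = np.random.default_rng(0)
expensive_calls = 0

def log_pi(th):        # stand-in for an expensive likelihood evaluation
    global expensive_calls
    expensive_calls += 1
    return -0.5 * th @ th

def log_pi_hat(th):    # cheap, slightly-off approximation
    return -0.55 * th @ th

theta, lam, n_iter = np.zeros(2), 5.0, 2000   # deliberately large jumps
for _ in range(n_iter):
    prop = theta + lam * rng.standard_normal(2)
    # Stage 1: cheap screen (symmetric proposal, so the q-ratio cancels).
    if np.log(rng.uniform()) < min(0.0, log_pi_hat(prop) - log_pi_hat(theta)):
        # Stage 2 runs only for the minority of proposals surviving Stage 1.
        log_a2 = (log_pi(prop) - log_pi_hat(prop)) - (log_pi(theta) - log_pi_hat(theta))
        if np.log(rng.uniform()) < min(0.0, log_a2):
            theta = prop
```

In a real implementation log_pi(theta) at the current state would be cached rather than recomputed, halving the Stage-2 cost again.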