Chris Sherlock's slides

  • Fast generic MCMC for targets with expensive likelihoods

    C. Sherlock (Lancaster), A. Golightly (Ncl) & D. Henderson (Ncl); this talk by: Jere Koskela (Warwick)

    Monte Carlo Techniques, Paris, July 2016

  • Motivation

    Metropolis-Hastings (MH) algorithms create a realisation $\theta^{(1)}, \theta^{(2)}, \dots$ from a Markov chain with stationary density $\pi(\theta)$. For each $\theta^{(i)}$ we evaluate $\pi(\theta^{(i)})$ - and then discard it.

    Pseudo-marginal MH algorithms create a realisation of $\hat{\pi}(\theta^{(i)}, u^{(i)})$ - and then discard it.

    In many applications evaluating $\pi(\theta)$ or $\hat{\pi}(\theta, u)$ is computationally expensive.

    We would like to reuse these values to create a more efficient MH algorithm that still targets the correct stationary distribution.

    We will focus on the [pseudo-marginal] random walk Metropolis (RWM) with a Gaussian proposal (sketched below):

    $\theta' \mid \theta \sim N(\theta, \lambda^2 V)$.

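    As a reference point, here is a minimal sketch of the plain RWM with this Gaussian proposal. It is not from the slides: the example target, the scaling $\lambda = 2.4/\sqrt{d}$, and names such as `log_target` are illustrative assumptions.

    ```python
    import numpy as np

    def rwm(log_target, theta0, V, lam, n_iters=10_000, rng=None):
        """Random walk Metropolis: propose theta' | theta ~ N(theta, lam^2 V)."""
        rng = np.random.default_rng() if rng is None else rng
        d = len(theta0)
        L = np.linalg.cholesky(V)                 # lam * (L @ z) ~ N(0, lam^2 V)
        theta = np.asarray(theta0, dtype=float)
        log_pi = log_target(theta)
        chain = np.empty((n_iters, d))
        for i in range(n_iters):
            prop = theta + lam * (L @ rng.standard_normal(d))
            log_pi_prop = log_target(prop)        # the expensive evaluation...
            if np.log(rng.uniform()) < log_pi_prop - log_pi:
                theta, log_pi = prop, log_pi_prop
            chain[i] = theta                      # ...which is otherwise discarded
        return chain

    # Illustrative run on a standard bivariate Gaussian target.
    chain = rwm(lambda th: -0.5 * th @ th, np.zeros(2), np.eye(2), lam=2.4 / np.sqrt(2))
    ```

    Every iteration pays for one fresh evaluation of the target; the ideas below aim to avoid or cheapen many of those calls.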

  • This talk

    Creating an approximation to $\pi(\theta)$: k-NN.

    Using an approximation: delayed acceptance [PsM]MH.

    Storing the values: KD-trees.

    Algorithm: adaptive da-[PsM]MH; ergodicity.

    Choice of P(fixed kernel).

    Simulation study.

  • Creating an approximation

    At iteration n of the MH we have $\pi(\theta^{(1)}), \pi(\theta^{(2)}), \dots, \pi(\theta^{(n)})$, and we would like to create $\hat{\pi}_a(\theta') \approx \pi(\theta')$.

    A Gaussian process fit to $\log \pi$? [problems with the cost of fitting and evaluating as $n \to \infty$]

    Weighted average of k-nearest neighbour values (sketched after this slide):
    (i) Fitting cost: 0 (actually $O(n^0)$).
    (ii) Per-iteration cost: $O(n)$.
    (iii) Accuracy increases with $n$.

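    A minimal sketch of such a k-NN approximation, under stated assumptions: the inverse-distance weighting and the brute-force search are illustrative choices, not the authors' exact scheme. The linear scan is what makes each query $O(n)$ and motivates the KD-tree storage discussed later.

    ```python
    import numpy as np

    def knn_log_pi_hat(theta_new, thetas, log_pis, k=5, eps=1e-12):
        """Approximate log pi(theta_new) by an inverse-distance-weighted
        average of the k nearest stored evaluations (brute force, O(n))."""
        d2 = np.sum((thetas - theta_new) ** 2, axis=1)  # distances to all n points
        idx = np.argpartition(d2, min(k, len(d2)) - 1)[:k]
        w = 1.0 / (np.sqrt(d2[idx]) + eps)              # closer points weigh more
        return np.sum(w * log_pis[idx]) / np.sum(w)
    ```

    "Fitting" is just appending $(\theta^{(n)}, \pi(\theta^{(n)}))$ to the store, each query scans all n stored points, and the estimate improves as the store grows - matching (i)-(iii) above.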

  • Delayed acceptance MH (1)

    (Christen and Fox, 2005). Suppose we have a computationally-cheap approximation to the posterior: $\hat{\pi}_a(\theta)$.

    Define $\alpha_{da}(\theta; \theta') := \alpha_1(\theta; \theta')\,\alpha_2(\theta; \theta')$, where

    $\alpha_1 := 1 \wedge \dfrac{\hat{\pi}_a(\theta')\,q(\theta \mid \theta')}{\hat{\pi}_a(\theta)\,q(\theta' \mid \theta)}$ and $\alpha_2 := 1 \wedge \dfrac{\pi(\theta')/\hat{\pi}_a(\theta')}{\pi(\theta)/\hat{\pi}_a(\theta)}$.

    Detailed balance (with respect to $\pi(\theta)$) is still preserved with $\alpha_{da}$ because

    $\pi(\theta)\,q(\theta' \mid \theta)\,\alpha_{da}(\theta; \theta') = \big[\hat{\pi}_a(\theta)\,q(\theta' \mid \theta)\,\alpha_1\big]\,\big[(\pi(\theta)/\hat{\pi}_a(\theta))\,\alpha_2\big]$,

    and each bracketed factor is symmetric in $(\theta, \theta')$, since $x\,(1 \wedge y/x) = x \wedge y$.

    But this algorithm mixes worse than the equivalent MH algorithm (Peskun, 1973; Tierney, 1998).

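    A minimal sketch of one delayed-acceptance step under these definitions. It is illustrative: a symmetric proposal is assumed (so the $q$-ratio in $\alpha_1$ cancels), and `log_pi`, `log_pi_hat`, and `propose` are hypothetical callables.

    ```python
    import numpy as np

    def da_step(theta, log_pi, log_pi_hat, propose, rng):
        """One delayed-acceptance MH step (Christen & Fox, 2005),
        assuming a symmetric proposal q."""
        prop = propose(theta, rng)
        # Stage 1: screen with the cheap approximation,
        # alpha_1 = 1 ∧ pihat_a(prop) / pihat_a(theta).
        if np.log(rng.uniform()) >= log_pi_hat(prop) - log_pi_hat(theta):
            return theta                      # early rejection: pi never evaluated
        # Stage 2: correct with the expensive target,
        # alpha_2 = 1 ∧ [pi(prop)/pihat_a(prop)] / [pi(theta)/pihat_a(theta)].
        log_a2 = (log_pi(prop) - log_pi_hat(prop)) - (log_pi(theta) - log_pi_hat(theta))
        return prop if np.log(rng.uniform()) < log_a2 else theta
    ```

    Accepting overall with probability $\alpha_1 \alpha_2$ is achieved by two independent uniforms; in practice the current state's $\log \pi(\theta)$ and $\log \hat{\pi}_a(\theta)$ would be cached rather than recomputed.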

  • Delayed-acceptance [PsM]MH (2)

    Using $\alpha_{da} = \alpha_1(\theta; \theta')\,\alpha_2(\theta; \theta')$ mixes worse, but CPU time per iteration can be much reduced.

    Accept $\Leftrightarrow$ accept at Stage 1 (w.p. $\alpha_1$) and accept at Stage 2 (w.p. $\alpha_2$).

    $\alpha_1$ is quick to calculate.

    There is no need to calculate $\alpha_2$ if we reject at Stage One.

    If $\hat{\pi}_a$ is accurate then $\alpha_2 \approx 1$.

    If $\hat{\pi}_a$ is also cheap then the (RWM) algorithm can use large jump proposals: bold moves are screened at Stage 1 at negligible cost, so only promising ones incur the expensive Stage 2 evaluation.

    Delayed-acceptance PMMH:

    $\alpha_2 := 1 \wedge \dfrac{\hat{\pi}(\theta', u')/\hat{\pi}_a(\theta')}{\hat{\pi}(\theta, u)/\hat{\pi}_a(\theta)}$.

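    The pseudo-marginal variant changes only Stage 2: the exact $\pi$ is replaced by the unbiased estimate $\hat{\pi}(\theta, u)$, whose value at the current state is carried along with the chain. A minimal sketch under the same assumptions as before; `log_pi_hat_est` stands in for a hypothetical unbiased estimator such as a particle-filter log-likelihood.

    ```python
    import numpy as np

    def da_psm_step(theta, log_est, log_pi_hat, log_pi_hat_est, propose, rng):
        """One delayed-acceptance pseudo-marginal MH step (symmetric proposal).
        log_est is log pihat(theta, u) for the current state."""
        prop = propose(theta, rng)
        # Stage 1: unchanged - screen with the cheap deterministic approximation.
        if np.log(rng.uniform()) >= log_pi_hat(prop) - log_pi_hat(theta):
            return theta, log_est             # the estimator is never run
        # Stage 2: draw fresh auxiliary variables u' inside the estimator and form
        # alpha_2 = 1 ∧ [pihat(prop,u')/pihat_a(prop)] / [pihat(theta,u)/pihat_a(theta)].
        log_est_prop = log_pi_hat_est(prop, rng)
        log_a2 = (log_est_prop - log_pi_hat(prop)) - (log_est - log_pi_hat(theta))
        if np.log(rng.uniform()) < log_a2:
            return prop, log_est_prop         # the estimate moves with the state
        return theta, log_est
    ```

    Stage 1 costs only two cheap evaluations of $\hat{\pi}_a$, so the expensive estimator runs only for proposals that survive the screen.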