
Page 1: Intro to ABC

an introduction to Approximate Bayesian Computation

Matt Moores

Mathematical Sciences School, Queensland University of Technology

Brisbane, Australia

ABC in Sydney, July 3, 2014

Page 2: Intro to ABC


Motivation

Inference for a parameter θ when it is impossible, or very expensive, to evaluate the likelihood p(y|θ).

ABC is a likelihood-free method for approximating the posterior distribution π(θ|y) by generating pseudo-data from the model:

w ∼ f(·|θ)

Page 3: Intro to ABC


Likelihood-free rejection sampler

Algorithm 1 Likelihood-free rejection sampler

1: Draw parameter value θ′ ∼ π(θ)
2: Generate w ∼ f(·|θ′)
3: if w = y (the observed data) then
4:     accept θ′
5: end if

But if the observations y are continuous (or the space y ∈ Y is enormous) then P(w = y) ≈ 0.

Tavaré, Balding, Griffiths & Donnelly (1997) Genetics 145(2)
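For a discrete model the exact-match acceptance step is feasible. As a minimal sketch of Algorithm 1 (the Binomial model, uniform prior, and all constants here are illustrative choices, not from the slides):

```r
# Algorithm 1 on a discrete toy model:
# y ~ Binomial(10, theta), with prior theta ~ Uniform(0, 1)
set.seed(1)
n_trials <- 10
y_obs <- 7                                  # a single observed Binomial count
accepted <- numeric(0)
for (i in 1:50000) {
  theta_prop <- runif(1)                    # 1: draw theta' ~ pi(theta)
  w <- rbinom(1, n_trials, theta_prop)      # 2: generate w ~ f(.|theta')
  if (w == y_obs) {                         # 3: if w = y then
    accepted <- c(accepted, theta_prop)     # 4: accept theta'
  }
}
# The accepted draws are an exact sample from pi(theta|y),
# which is Beta(y+1, n-y+1) = Beta(8, 4) by conjugacy.
mean(accepted)    # should be close to 8/12
```

Growing a vector with c() inside a loop is slow in R; it is written this way only to mirror the algorithm line by line.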

Page 4: Intro to ABC


ABC tolerance

accept θ′ if δ(w, y) < ε

where

ε > 0 is the tolerance level

δ(·, ·) is a distance function (for an appropriate choice of norm)

Inference is more accurate when ε is close to zero, but more proposed θ′ are rejected (a tradeoff between accuracy and computational cost).

Pritchard, Seielstad, Perez-Lezaun & Feldman (1999) Mol. Biol. Evol. 16(12)
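The tradeoff can be seen numerically. A small sketch (the toy Gaussian data, the flat proposal on (0, 6), and the distance on the sample mean are my assumptions, not the slides'): shrinking ε lowers the fraction of proposals accepted.

```r
set.seed(2)
y <- rnorm(20, mean = 3)                         # toy observed data
delta <- function(w, y) abs(mean(w) - mean(y))   # distance on a simple summary
accept_rate <- function(eps, n_sim = 5000) {
  d <- replicate(n_sim, {
    theta_prop <- runif(1, 0, 6)                 # proposal from a flat prior
    delta(rnorm(20, theta_prop, 1), y)           # distance for this pseudo-data
  })
  mean(d < eps)                                  # fraction of proposals accepted
}
sapply(c(2, 1, 0.5, 0.1), accept_rate)           # rate falls as eps shrinks
```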

Page 5: Intro to ABC


Summary statistics

Computing δ(w, y) for w1, . . . , wn and y1, . . . , yn can be very expensive for large n.

Instead, compute summary statistics s(y),

e.g. sufficient statistics (only available for the exponential family)

Page 6: Intro to ABC


Sufficient statistics

Fisher–Neyman factorisation theorem:

if s(y) is sufficient for θ, then p(y|θ) = f(y) g(s(y)|θ)

only applies to exponential-family models such as the Potts, Ising, and exponential random graph models (ERGM)

otherwise, selection of suitable summary statistics can be a very difficult problem
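The practical payoff of sufficiency can be checked directly. A small sketch (Bernoulli data with a flat prior; the data values are my own toy choice): the posterior computed from the full data equals the posterior computed from s(y) = ∑ yᵢ alone.

```r
y <- c(1, 0, 1, 1, 0, 1, 1, 0, 1, 1)        # toy Bernoulli observations
s <- sum(y)                                  # sufficient statistic s(y)
n <- length(y)
theta <- seq(0.01, 0.99, by = 0.01)          # grid of parameter values
lik_full <- theta^s * (1 - theta)^(n - s)    # product of all n Bernoulli terms
post_full <- lik_full / sum(lik_full)        # normalised on the grid (flat prior)
post_suff <- dbeta(theta, s + 1, n - s + 1)  # Beta posterior from s(y) alone
post_suff <- post_suff / sum(post_suff)
all.equal(post_full, post_suff)              # TRUE: identical up to rounding
```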

Page 7: Intro to ABC


ABC rejection sampler

Algorithm 2 ABC rejection sampler

1: for all iterations t ∈ 1 . . . T do
2:     Draw independent proposal θ′ ∼ π(θ)
3:     Generate w ∼ f(·|θ′)
4:     if ‖s(w) − s(y)‖ < ε then
5:         set θt ← θ′
6:     else
7:         set θt ← θt−1
8:     end if
9: end for

Approximates π(θ|y) by πε(θ | ‖s(w) − s(y)‖ < ε).

Marin, Pudlo, Robert & Ryder (2012) Stat. Comput. 22(6)
Marin & Robert (2014) Bayesian Essentials with R §8.3
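Algorithm 2 can be written out directly for a Gaussian toy model. A sketch only: the N(0, 100) prior (much tighter than the N(0, 10⁶) used in the example that follows) and ε = 0.5 are my own choices, picked to keep the acceptance rate usable in a short run.

```r
set.seed(3)
n <- 5
y <- rnorm(n, mean = 3, sd = 1)              # observed data
s_y <- mean(y)                               # summary statistic s(y)
eps <- 0.5                                   # tolerance level
T_iter <- 10000
theta <- numeric(T_iter)
theta_t <- 0                                 # arbitrary starting value
for (t in 1:T_iter) {                        # 1: for all iterations
  theta_prop <- rnorm(1, 0, 10)              # 2: independent proposal ~ pi(theta)
  w <- rnorm(n, theta_prop, 1)               # 3: generate pseudo-data
  if (abs(mean(w) - s_y) < eps) {            # 4: ||s(w) - s(y)|| < eps ?
    theta_t <- theta_prop                    # 5: accept theta'
  }                                          # 6-7: else keep theta_{t-1}
  theta[t] <- theta_t
}
mean(theta[1000:T_iter])                     # roughly centred on mean(y)
```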

Page 8: Intro to ABC


A trivial (counter) example

Gaussian with unknown mean:

y ∼ N(µ, 1)

natural conjugate prior:

π(µ) ∼ N(0, 10⁶)

sufficient statistic:

ȳ = (1/n) ∑ᵢ₌₁ⁿ yᵢ

posterior is analytically tractable:

π(µ|y) ∼ N(m′, s²′)

where

1/s²′ = n/1 + 1/10⁶    m′ = s²′ (nȳ/1 + 0) = nȳ / (n + 10⁻⁶)

∴ no need for ABC (nor MCMC) in practice

Page 9: Intro to ABC


R code

[Figure: density of the exact posterior π(µ|y)]

y <- rnorm(n=5, mean=3, sd=1)
n <- length(y)
ybar <- sum(y) / n
post_s <- 1 / (n + 1e-6)
post_m <- post_s * n * ybar
post_sim <- rnorm(10000, post_m, sd=sqrt(post_s))

Page 10: Intro to ABC


now with ABC

[Figure: prior π(µ) and ABC approximation πε(µ | δ(s(w), s(y)) < ε)]

prop_mu <- rnorm(10000, 0, sqrt(1e6))
pseudo <- rnorm(n * 10000, prop_mu, 1)
pseudoMx <- matrix(pseudo, nrow=10000, ncol=n)
ps_ybar <- rowMeans(pseudoMx)
ps_norm <- abs(ps_ybar - ybar)
epsilon <- sort(ps_norm)[20]
prop_keep <- prop_mu[ps_norm <= epsilon]

Page 11: Intro to ABC


choice of ε

[Figure: ABC posterior density for decreasing tolerance levels]

(a) ε = 15.498    (b) ε = 3.47    (c) ε = 1.65    (d) ε = 1.11

Page 12: Intro to ABC


Improvements to ABC

Alternatives to i.i.d. proposals:

ABC-MCMC

ABC-SMC

Regression adjustment

compensates for larger ε

Validation of ABC approximation

ABC for model choice

Page 13: Intro to ABC


Summary

ABC is a method for likelihood-free inference.

It enables inference for models that are otherwise computationally intractable.

Main components of ABC:

π(θ) proposal density for θ′
f(·|θ) generative model for w
ε tolerance level
δ(·, ·) distance function
s(y) summary statistics

Page 14: Intro to ABC


References

Jean-Michel Marin & Christian Robert. Bayesian Essentials with R. Springer-Verlag, 2014.

Jean-Michel Marin, Pierre Pudlo, Christian Robert & Robin Ryder. Approximate Bayesian computational methods. Statistics & Computing, 22(6): 1167–80, 2012.

Simon Tavaré, David Balding, Robert Griffiths & Peter Donnelly. Inferring coalescence times from DNA sequence data. Genetics, 145(2): 505–18, 1997.

Jonathan Pritchard, Mark Seielstad, Anna Perez-Lezaun & Marcus Feldman. Population Growth of Human Y Chromosomes: A Study of Y Chromosome Microsatellites. Mol. Biol. Evol., 16(12): 1791–98, 1999.