Transcript
Page 1: Intro to ABC

Intro to ABC Example Conclusion

An introduction to Approximate Bayesian Computation

Matt Moores

Mathematical Sciences School, Queensland University of Technology

Brisbane, Australia

ABC in Sydney, July 3, 2014

Page 2: Intro to ABC

Motivation

Inference for a parameter θ when it is impossible, or very expensive, to evaluate the likelihood p(y|θ)

ABC is a likelihood-free method for approximating the posterior distribution

π(θ|y)

by generating pseudo-data from the model:

w ∼ f(·|θ)

Page 3: Intro to ABC

Likelihood-free rejection sampler

Algorithm 1 Likelihood-free rejection sampler

1: Draw parameter value θ′ ∼ π(θ)
2: Generate w ∼ f(·|θ′)
3: if w = y (the observed data) then
4:   accept θ′
5: end if

But if the observations y are continuous (or the space y ∈ Y is enormous), then P(w = y) ≈ 0

Tavare, Balding, Griffiths & Donnelly (1997) Genetics 145(2)
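A minimal R sketch of Algorithm 1, assuming a toy model that is not in the slides: a single Binomial(10, θ) count with a Uniform(0, 1) prior on θ. Because the data are discrete, exact matches w = y occur with usable probability, and the accepted draws are samples from the exact posterior, which for this conjugate toy model is Beta(y + 1, 10 − y + 1). The names `y_obs`, `n_trials` and `n_prop` are illustrative.

```r
set.seed(1)
n_trials <- 10      # assumed binomial size (toy model, not from the slides)
y_obs <- 7          # assumed observed count
n_prop <- 50000     # number of draws from the prior

theta_prop <- runif(n_prop)                               # step 1: theta' ~ pi(theta)
w <- rbinom(n_prop, size = n_trials, prob = theta_prop)   # step 2: w ~ f(.|theta')
theta_acc <- theta_prop[w == y_obs]                       # steps 3-4: accept if w = y

# Compare with the known conjugate posterior Beta(y + 1, n - y + 1)
acc_rate <- length(theta_acc) / n_prop
exact_mean <- (y_obs + 1) / (n_trials + 2)
print(c(acc_rate = acc_rate, abc_mean = mean(theta_acc), exact_mean = exact_mean))
```

With continuous data the comparison `w == y_obs` would essentially never be true, which is the problem the tolerance ε addresses.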

Page 4: Intro to ABC

ABC tolerance

accept θ′ if δ(w,y) < ε

where

ε > 0 is the tolerance level

δ(·, ·) is a distance function (for an appropriate choice of norm)

Inference is more exact when ε is close to zero, but more proposed θ′ are rejected (tradeoff between accuracy & computational cost)

Pritchard, Seielstad, Perez-Lezaun & Feldman (1999) Mol. Biol. Evol. 16(12)
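The tradeoff can be seen in a small R sketch. It assumes a toy setup that is not in the slides: one continuous observation y ∼ N(θ, 1), a N(0, 3²) prior, and δ(w, y) = |w − y|. Shrinking ε lowers the acceptance rate while moving the ABC estimate towards the exact posterior mean, which is available here only because the toy model is conjugate.

```r
set.seed(2)
y_obs <- 1.5                                    # assumed single observation
n_prop <- 100000
theta_prop <- rnorm(n_prop, mean = 0, sd = 3)   # assumed prior N(0, 3^2)
w <- rnorm(n_prop, mean = theta_prop, sd = 1)   # one pseudo-observation per proposal
d <- abs(w - y_obs)                             # delta(w, y): absolute difference

eps_grid <- c(2, 1, 0.5, 0.1)
acc_rate <- sapply(eps_grid, function(eps) mean(d < eps))
abc_mean <- sapply(eps_grid, function(eps) mean(theta_prop[d < eps]))

# Exact posterior mean for this conjugate toy model: y * 9 / (9 + 1)
exact_mean <- y_obs * 9 / (9 + 1)
print(data.frame(eps = eps_grid, acc_rate = acc_rate, abc_mean = abc_mean))
```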

Page 5: Intro to ABC

Summary statistics

Computing δ(w,y) for w1, . . . , wn and y1, . . . , yn can be very expensive for large n

Instead, compute summary statistics s(y)

e.g. sufficient statistics (only available for the exponential family)

Page 6: Intro to ABC

Sufficient statistics

Fisher-Neyman factorisation theorem:

if s(y) is sufficient for θ

then p(y|θ) = f(y) g(s(y)|θ)

only applies to Potts, Ising, exponential random graph models (ERGM)

otherwise, selection of suitable summary statistics can be a very difficult problem
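As a worked instance of the factorisation, take n i.i.d. observations y_i ∼ N(µ, 1), the model used in the example later in these slides. Using the identity ∑(y_i − µ)² = ∑(y_i − ȳ)² + n(ȳ − µ)², the likelihood splits into a factor that is free of µ and a factor that depends on the data only through s(y) = ȳ:

```latex
p(y \mid \mu)
  = \underbrace{(2\pi)^{-n/2}
      \exp\!\Big(-\tfrac{1}{2}\sum_{i=1}^{n}(y_i - \bar{y})^2\Big)}_{f(y)}
    \;\times\;
    \underbrace{\exp\!\Big(-\tfrac{n}{2}(\bar{y} - \mu)^2\Big)}_{g(s(y)\mid\mu)},
  \qquad s(y) = \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i
```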

Page 7: Intro to ABC

ABC rejection sampler

Algorithm 2 ABC rejection sampler

1: for all iterations t ∈ 1 . . . T do
2:   Draw independent proposal θ′ ∼ π(θ)
3:   Generate w ∼ f(·|θ′)
4:   if ‖s(w) − s(y)‖ < ε then
5:     set θt ← θ′
6:   else
7:     set θt ← θt−1
8:   end if
9: end for

Approximates π(θ|y) by πε(θ | ‖s(w) − s(y)‖ < ε)

Marin, Pudlo, Robert & Ryder (2012) Stat. Comput. 22(6)

Marin & Robert (2014) Bayesian Essentials with R §8.3
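Algorithm 2 can be sketched as a generic R function. This is an illustration under stated assumptions, not the author's implementation: `rprior`, `rmodel` and `summ` are hypothetical user-supplied functions, the norm is taken to be Euclidean, and because the algorithm does not define θ0, the chain is left as `NA` until the first acceptance.

```r
# Generic ABC rejection sampler following the steps of Algorithm 2.
# rprior(), rmodel() and summ() are hypothetical user-supplied functions.
abc_rejection <- function(n_iter, rprior, rmodel, summ, y, eps) {
  s_obs <- summ(y)
  theta <- rep(NA_real_, n_iter)
  accepted <- logical(n_iter)
  for (t in seq_len(n_iter)) {
    theta_prop <- rprior()                          # independent proposal from the prior
    w <- rmodel(theta_prop)                         # pseudo-data from the model
    if (sqrt(sum((summ(w) - s_obs)^2)) < eps) {     # Euclidean norm (assumed)
      theta[t] <- theta_prop                        # accept the proposal
      accepted[t] <- TRUE
    } else if (t > 1) {
      theta[t] <- theta[t - 1]                      # repeat the previous value
    }                                               # stays NA before the first acceptance
  }
  list(theta = theta, accepted = accepted)
}

# Usage on an assumed toy model: 20 observations from N(theta, 1), prior N(0, 5^2)
set.seed(3)
y <- rnorm(20, mean = 2, sd = 1)
out <- abc_rejection(
  n_iter = 20000,
  rprior = function() rnorm(1, 0, 5),
  rmodel = function(theta) rnorm(20, theta, 1),
  summ   = mean,
  y = y, eps = 0.1
)
print(mean(out$accepted))             # acceptance rate
print(mean(out$theta, na.rm = TRUE))  # ABC estimate of the posterior mean
```

The explicit loop mirrors the pseudocode; for independent prior proposals the same draws can be generated in one vectorised call, which is usually much faster in R.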

Page 8: Intro to ABC

A trivial (counter) example

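Gaussian with unknown mean:

$y \sim \mathcal{N}(\mu, 1)$

natural conjugate prior:

$\pi(\mu) \sim \mathcal{N}(0, 10^6)$

sufficient statistic:

$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$

posterior is analytically tractable:

$\pi(\mu \mid y) \sim \mathcal{N}(m', s'^2)$

where

$\frac{1}{s'^2} = \frac{n}{1} + \frac{1}{10^6}$

$m' = s'^2 \left( \frac{n\bar{y}}{1} + 0 \right) = \frac{n\bar{y}}{n + 10^{-6}}$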

∴ no need for ABC (nor MCMC) in practice

Page 9: Intro to ABC

R code

[Figure: posterior density π(µ|y), x-axis from 1.5 to 4.5]

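# simulate n = 5 observations with true mean 3
y <- rnorm(n=5, mean=3, sd=1)
n <- length(y)
ybar <- sum(y)/n
# posterior variance and mean under the N(0, 1e6) prior
post_s <- 1/(n + 1e-6)
post_m <- post_s * n * ybar
# draw from the exact posterior
post_sim <- rnorm(10000, post_m, sd=sqrt(post_s))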

Page 10: Intro to ABC

now with ABC

[Figure: prior density π(µ), x-axis from −4000 to 4000]

[Figure: ABC posterior density πε(µ | δ(s(w), s(y)) < ε), x-axis from 0 to 6]

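# 10000 proposals from the prior
prop_mu <- rnorm(10000, 0, sqrt(1e6))
# n pseudo-observations per proposal (prop_mu is recycled, so row i uses prop_mu[i])
pseudo <- rnorm(n*10000, prop_mu, 1)
pseudoMx <- matrix(pseudo, nrow=10000, ncol=n)
ps_ybar <- rowMeans(pseudoMx)
# distance between simulated and observed summary statistics
ps_norm <- abs(ps_ybar - ybar)
# tolerance chosen so that the 20 closest proposals are kept
epsilon <- sort(ps_norm)[20]
prop_keep <- prop_mu[ps_norm <= epsilon]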

Page 11: Intro to ABC

choice of ε

[Figure: ABC posterior densities for four tolerance levels: (a) ε = 15.498, (b) ε = 3.47, (c) ε = 1.65, (d) ε = 1.11]

Page 12: Intro to ABC

Improvements to ABC

Alternatives to i.i.d. proposals:

ABC-MCMC

ABC-SMC

Regression adjustment

compensates for larger ε

Validation of ABC approximation

ABC for model choice
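To illustrate the regression adjustment item above, here is a hedged R sketch of a linear regression adjustment (unweighted, for simplicity) on the Gaussian example. Assumptions that are not in the slides: a narrower N(0, 10²) prior so that a deliberately loose tolerance still retains many draws, and a tolerance set at the 20% quantile of the distances. Each accepted draw is shifted along the fitted regression line to where it would sit if its summary statistic had matched s(y) exactly.

```r
set.seed(4)
n <- 5
y <- rnorm(n, mean = 3, sd = 1)          # assumed data, as in the Gaussian example
s_obs <- mean(y)

n_prop <- 10000
theta_prop <- rnorm(n_prop, 0, 10)       # assumed prior N(0, 10^2), narrower than in the slides
s_sim <- rowMeans(matrix(rnorm(n * n_prop, theta_prop, 1), nrow = n_prop))

# Deliberately loose tolerance: keep the closest 20% of proposals
d <- abs(s_sim - s_obs)
keep <- d <= quantile(d, 0.2)

# Regress accepted theta on the summary statistic, then remove the fitted
# dependence on the discrepancy s(w) - s(y)
fit <- lm(theta_prop[keep] ~ s_sim[keep])
beta <- coef(fit)[2]
theta_adj <- theta_prop[keep] - beta * (s_sim[keep] - s_obs)

# Exact posterior sd for this conjugate setup, for reference
exact_sd <- sqrt(1 / (n + 1 / 100))
print(c(sd_unadjusted = sd(theta_prop[keep]), sd_adjusted = sd(theta_adj), exact_sd = exact_sd))
```

In this linear-Gaussian toy case the adjustment recovers the exact posterior spread almost perfectly; for models where θ depends non-linearly on the summaries, the correction is only approximate.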

Page 13: Intro to ABC

Summary

ABC is a method for likelihood-free inference

It enables inference for models that are otherwise computationally intractable

Main components of ABC:

π(θ) proposal density for θ′

f(·|θ) generative model for w

ε tolerance level

δ(·, ·) distance function

s(y) summary statistics

Page 14: Intro to ABC

References

Jean-Michel Marin & Christian Robert. Bayesian Essentials with R. Springer-Verlag, 2014.

Jean-Michel Marin, Pierre Pudlo, Christian Robert & Robin Ryder. Approximate Bayesian computational methods. Statistics & Computing, 22(6): 1167–80, 2012.

Simon Tavare, David Balding, Robert Griffiths & Peter Donnelly. Inferring coalescence times from DNA sequence data. Genetics, 145(2): 505–18, 1997.

Jonathan Pritchard, Mark Seielstad, Anna Perez-Lezaun & Marcus Feldman. Population Growth of Human Y Chromosomes: A Study of Y Chromosome Microsatellites. Mol. Biol. Evol. 16(12): 1791–98, 1999.