Tolerance levels for the Approximate Bayesian Computation algorithm

Matthew Robinson, Wentao Li, Paul Fearnhead

Statistics and Operational Research Doctoral Training Centre, Lancaster University

Applied Bayesian Computation

Preliminaries (1), (3):

Start with an n-dimensional set of data yobs and a model for the data depending on an unknown parameter θ.

Then take a prior distribution π(θ); the important property of this prior is that θ ∈ Ω(π).

The probability density of the data given a specific parameter value is π(y | θ).

ABC requires, as an assumption for its use, that it is simple to simulate Y from π(y | θ).

The ABC posterior is often defined in terms of:

a function S(), the summary statistic, e.g. mean, skew, etc.

a density kernel K(), which integrates to 1 and acts as a weighted acceptance probability corresponding to the summary statistic.

a bandwidth h > 0, a rescaling to make sure everything falls within the density kernel.

It has been shown that, without loss of generality, we can take max(K()) = 1.

Then an approximation to the likelihood is:

$$p(\theta \mid S(y_{obs})) = \int \pi(y \mid \theta)\, K\!\left(\frac{S(y) - S(y_{obs})}{h}\right) dy$$

The ABC posterior can be defined as:

$$\pi_{ABC}(\theta \mid S(y_{obs})) \propto \pi(\theta)\, p(\theta \mid S(y_{obs}))$$
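The integral in the likelihood approximation is an expectation of the kernel over data simulated from π(y | θ), so it can be estimated by plain Monte Carlo. The following is a minimal sketch under stated assumptions, not the authors' code: the model is taken to be a zero-mean normal with variance θ, S() is the sample second moment, and the kernel is an unnormalised Gaussian with maximum 1 (the normalising constant does not matter because the ABC posterior is only defined up to proportionality). The names `gaussian_kernel` and `abc_likelihood` are hypothetical.

```python
import math
import random


def gaussian_kernel(u):
    """Unnormalised Gaussian kernel, scaled so that its maximum (at u = 0) is 1."""
    return math.exp(-0.5 * u * u)


def abc_likelihood(theta, s_obs, h, n_sims=2000, n_data=50, seed=0):
    """Monte Carlo estimate of the average of K((S(y) - s_obs) / h) over y ~ pi(y | theta).

    Assumption: the model is N(0, theta) with theta the variance, and
    S(y) is the sample second moment about the known mean 0.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sims):
        y = [rng.gauss(0.0, math.sqrt(theta)) for _ in range(n_data)]
        s = sum(v * v for v in y) / n_data
        total += gaussian_kernel((s - s_obs) / h)
    return total / n_sims


# Parameter values whose simulated summaries land near s_obs get a larger value.
print(abc_likelihood(5.0, s_obs=5.0, h=0.5))
print(abc_likelihood(1.0, s_obs=5.0, h=0.5))
```

Multiplying this estimate by the prior density π(θ) gives an unnormalised value of the ABC posterior at θ.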

Algorithm (1), (3):

There are 3 versions of the ABC algorithm; in these experiments we used the simplest:

Input: a set of data yobs, a function S(), a prior π(θ) > 0, an integer N and a tolerance ε.

Initial step: compute S(yobs).

Iterate: for (i in 1:N)

Step 1. Simulate θi from π(θ).

Step 2. Simulate ysim from π(y | θi) and calculate S(ysim).

Step 3. If ||S(ysim) − S(yobs)|| < ε, accept θi; otherwise reject θi.

Output: the set of accepted parameter values among {θi}, i = 1, ..., N.
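The steps above can be sketched in a few lines of Python. This is an illustrative sketch only: the uniform prior on (0.1, 20), the zero-mean normal model with variance θ, the sample size of 100, and the helper names `summary` and `abc_rejection` are all assumptions made for the example, and reading normal(0,5) as "variance 5" is likewise an assumption.

```python
import math
import random


def summary(y):
    """S(y): sample second moment about a known mean of 0 (assumed summary)."""
    return sum(v * v for v in y) / len(y)


def abc_rejection(y_obs, prior_sampler, simulate, n_iter, eps, rng):
    """Rejection ABC: keep theta_i whenever |S(y_sim) - S(y_obs)| < eps."""
    s_obs = summary(y_obs)                      # Initial step
    accepted = []
    for _ in range(n_iter):
        theta = prior_sampler(rng)              # Step 1: draw from the prior
        s_sim = summary(simulate(theta, rng))   # Step 2: simulate and summarise
        if abs(s_sim - s_obs) < eps:            # Step 3: accept or reject
            accepted.append(theta)
    return accepted


rng = random.Random(1)
# Hypothetical observed data: 100 draws from N(0, variance 5).
y_obs = [rng.gauss(0.0, math.sqrt(5.0)) for _ in range(100)]


def prior(r):
    """Hypothetical prior on the variance: Uniform(0.1, 20)."""
    return r.uniform(0.1, 20.0)


def simulate(theta, r):
    """Simulate 100 observations from N(0, variance theta)."""
    return [r.gauss(0.0, math.sqrt(theta)) for _ in range(100)]


posterior_sample = abc_rejection(y_obs, prior, simulate, 5000, 0.5, rng)
print(len(posterior_sample), sum(posterior_sample) / len(posterior_sample))
```

The accepted values form a sample from the ABC posterior; their number, and hence the Monte Carlo error of any estimate built from them, depends directly on ε.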

Problems with Tolerance

Tolerance controls what proportion of the simulated parameter values is accepted. There are two problems with choosing a tolerance:

choose too high a tolerance and the distribution is biased towards the prior, e.g. accept everything and the algorithm returns our prior.

choose too low a tolerance and too few values are accepted, so the outliers amongst these carry too much weight even though the mean converges to the true value (Monte Carlo error).
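The trade-off between these two problems can be seen by running the same rejection scheme at several tolerances. The sketch below is an illustration under assumptions, not a reproduction of the experiments: the observed summary is fixed at 5, the prior is Uniform(0.1, 20) on the variance, and `run_abc` is a hypothetical helper. Re-using the seed means every tolerance sees the same simulated draws, so the accepted sets are nested.

```python
import math
import random


def run_abc(eps, n_iter=2000, n_data=50, seed=0):
    """Rejection ABC for the variance of N(0, theta) at tolerance eps."""
    rng = random.Random(seed)
    s_obs = 5.0  # assumed observed summary (sample second moment)
    accepted = []
    for _ in range(n_iter):
        theta = rng.uniform(0.1, 20.0)  # hypothetical prior
        y = [rng.gauss(0.0, math.sqrt(theta)) for _ in range(n_data)]
        s_sim = sum(v * v for v in y) / n_data
        if abs(s_sim - s_obs) < eps:
            accepted.append(theta)
    return accepted


for eps in (0.05, 0.5, 5.0, 1e9):
    acc = run_abc(eps)
    mean = sum(acc) / len(acc) if acc else float("nan")
    print(f"eps={eps:g}: accepted {len(acc)} of 2000, mean theta = {mean:.3f}")
```

At the largest tolerance every draw is accepted and the output is simply a sample from the prior; at the smallest, only a handful of draws survive and the estimate is dominated by Monte Carlo error.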

Figure 1: Error of the ABC algorithm with varying tolerance.

Some issues investigated

Different choices of summary statistic will lead to different choices of optimum tolerance.

Increasing the dimension of the summary statistic will require more precise choices of tolerance level.

Choice of prior will affect the choice of tolerance.

Results:

Choice of summary statistic has often been tailored to the problem at hand; we hope to provide some information as to how the tolerance changes with that choice.

For our study we limited ourselves to multivariate normal distributions, with the unknown parameter being the variance or covariance matrix, and the prior being normal or multivariate normal with Identity covariance matrix, composed with a covariance operation. Our observed data was sampled from a normal(0,5) distribution or multivariate normal distribution with Identity covariance matrix.

Figure 2: Optimal tolerance against number of iterations for (resp.) variance, 4th, 3rd and 5th moments; colours correspond to the variance of the prior.

The similarity between the first and second graphs, and between the third and fourth graphs, in Fig. 2 shows that the even moments behave similarly to one another, as do the odd moments.

Multiple-dimensions to the Summary Statistic

When using the ABC algorithm, people often do not know the underlying distribution, and so do not know the sufficient statistic. This leads to as many statistics being used as possible so as to better characterise the data.

Figure 3: Tables of Optimum Tolerance: the columns depict different summary statistics and the rows increasing numbers of Monte Carlo samples.

The red boxes around columns 7 through 11 depict first the sufficient statistic, followed by the sufficient statistic and the 3rd moment, then the sufficient statistic with the 3rd and 4th moments, and so on.

The lack of discrepancy between these values shows that once you include the sufficient statistic, increasing the dimension of the summary statistic has no effect on the optimum tolerance.

Figure 4: The effect of different summary statistics on the tolerance

Fig. 4 shows that if you do not include the sufficient statistic, increasing the dimension of the summary statistic increases the optimum tolerance for 2 dimensions but not for 1.
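To make the idea of a multi-dimensional summary statistic concrete, the sketch below stacks several central moments into a vector and compares two such vectors with the Euclidean norm, which is one possible choice for the ||·|| in Step 3 of the algorithm. The choice of central moments, the Euclidean norm, and the helper names `central_moment`, `summary_vector` and `distance` are assumptions made for illustration; the poster does not state which norm or moment definition was used.

```python
import math


def central_moment(y, k):
    """k-th sample central moment of y."""
    mean = sum(y) / len(y)
    return sum((v - mean) ** k for v in y) / len(y)


def summary_vector(y, orders=(2, 3, 4)):
    """Stack several moments into one summary statistic S(y)."""
    return [central_moment(y, k) for k in orders]


def distance(s_sim, s_obs):
    """Euclidean distance between two summary vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s_sim, s_obs)))


y_a = [1.0, 2.0, 3.0]
y_b = [0.0, 2.0, 4.0]
print(summary_vector(y_a), summary_vector(y_b))
print(distance(summary_vector(y_a), summary_vector(y_b)))
```

Because each added component contributes to the distance, a tolerance that suits a one-dimensional summary will generally not carry over unchanged when more moments are appended.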

Shape of Error Graph

The error, as a function of tolerance, is normally thought to be U-shaped, but (2) implies this might not be the case, due to a term in the bias; we have verified that this occurs in the systems we have been looking at.

Figure 5: Multiple minima

Applications of the ABC algorithm

The ABC algorithm is often applied in Biology, in areas such as Evolution and Ecology (4).

Example: A Markov Chain Monte Carlo form of the ABC was used to characterise the speciation of apes such as chimpanzees by estimating the split times between alleles, the population sizes and the gene flow rates. There are two main methods: one constructs 3 populations at differing loci and uses a split time to describe the interaction between the 3; the other allows multiple species to evolve under locally determined gene flow rates. As there are many loci, this lends itself to the ABC method rather than a more direct maximum likelihood method (6).

Applications of our work

It is hoped that this research will help in the choice of summary statistics, thus increasing the accuracy or speed of such characterisations.

Acknowledgements

We acknowledge the financial support of STOR-i under EPSRC.

References

(1) Fearnhead, P., & Prangle, D. (2012). Constructing summary statistics for approximate Bayesian computation: semiautomatic approximate Bayesian computation. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 74(3), 419-474.

(2) Barber, S., Voss, J., & Webster, M. (2014). The Rate of Convergence for Approximate Bayesian Computation. arXiv:1311.2038v3 [math.ST], 18th Jul 2014.

(3) Marin, J. M., Pudlo, P., Robert, C. P., & Ryder, R. J. (2012). Approximate Bayesian computational methods. Statistics and Computing, 22(6), 1167-1180.

(4) Beaumont, M. A. (2010). Approximate Bayesian computation in evolution and ecology. Annual Review of Ecology, Evolution, and Systematics, 41, 379-406.

(5) Robert, C. P., & Casella, G. (2010). Introducing Monte Carlo Methods with R (Vol. 18). New York: Springer.

(6) Becquet, C., & Przeworski, M. (2007). A new approach to estimate parameters of speciation models with application to apes. Genome Research, 17(10), 1505-1519.

http://www.stor-i.lancs.ac.uk/intern/interns/2014
