

Tolerance levels for the Approximate Bayesian Computation algorithm

Matthew Robinson, Wentao Li, Paul Fearnhead
Statistics and Operational Research Doctoral Training Centre, Lancaster University

Approximate Bayesian Computation

Preliminaries (1),(3):

Start with an n-dimensional set of data y_obs and a model for the data depending on an unknown parameter θ.

Then take a prior distribution π; the importance of this prior is that θ ∈ Ω(π).

The probability density of the data given a specific parameter value is π(y | θ).

ABC assumes that it is simple to simulate Y from π(y | θ).

The ABC posterior is often defined in terms of:
- a function S(), the summary statistic, e.g. mean, skewness, etc.;
- a density kernel K(), which integrates to 1 and acts as a weighted acceptance probability corresponding to the summary statistic;
- a bandwidth h > 0, a rescaling to make sure everything falls within the density kernel.
It has been shown that, without loss of generality, we can take max K() = 1.

Then an approximation to the likelihood is:

\[
p(\theta \mid S(y_{\mathrm{obs}})) = \int \pi(y \mid \theta)\, K\!\left[\frac{S(y) - S(y_{\mathrm{obs}})}{h}\right] dy
\]

The ABC posterior can be defined as:

\[
\pi_{\mathrm{ABC}}(\theta \mid S(y_{\mathrm{obs}})) \propto \pi(\theta)\, p(\theta \mid S(y_{\mathrm{obs}}))
\]
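The integral in the likelihood approximation can itself be estimated by simulation: draw data sets from π(y | θ) and average the kernel values. The sketch below is illustrative only and rests on assumptions not stated on the poster: a Normal(0, variance θ) model, the sample variance as S(), and a uniform kernel on [-1/2, 1/2] (which integrates to 1 and has maximum 1). The function names and default arguments are hypothetical.

```python
import random
import statistics

def uniform_kernel(u):
    """Uniform kernel on [-1/2, 1/2]: integrates to 1 and has maximum 1."""
    return 1.0 if abs(u) <= 0.5 else 0.0

def approx_likelihood(theta, s_obs, h, n_sim=2000, n_data=50, seed=0):
    """Monte Carlo estimate of the integral of pi(y|theta) K[(S(y) - s_obs)/h] dy."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sim):
        # Assumed model (not from the poster): y_1..y_n iid Normal(0, variance=theta).
        y = [rng.gauss(0.0, theta ** 0.5) for _ in range(n_data)]
        # Assumed summary statistic: the sample variance.
        total += uniform_kernel((statistics.variance(y) - s_obs) / h)
    return total / n_sim
```

With this kernel the estimate is simply the fraction of simulated data sets whose summary statistic lies within h/2 of the observed one, which is why it links directly to an accept/reject rule.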

Algorithm (1),(3):

There are three versions of the ABC algorithm; in these experiments we used the simplest:

Input: a set of data y_obs, a function S(), a prior π(θ) > 0, an integer N and a tolerance ε.

Initial step: calculate S(y_obs).

Iterate: for (i in 1:N)
Step 1. Simulate θ_i from π(θ).
Step 2. Simulate y_sim from π(y | θ_i) and calculate S(y_sim).
Step 3. If ||S(y_sim) − S(y_obs)|| < ε, accept θ_i; otherwise reject θ_i.

Output: the set of accepted parameter values among {θ_i}, i = 1, …, N.
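The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the experiments: the uniform prior on (0.1, 10), the Normal(0, variance θ) simulator, the scalar summary statistic and all function names are assumptions made for the example, and normal(0,5) is read here as variance 5.

```python
import math
import random
import statistics

def abc_rejection(y_obs, summary, prior_sample, simulate, n_iter, eps, seed=0):
    """Rejection ABC with a scalar summary statistic."""
    rng = random.Random(seed)
    s_obs = summary(y_obs)                        # Initial step
    accepted = []
    for _ in range(n_iter):
        theta = prior_sample(rng)                 # Step 1: draw theta from the prior
        y_sim = simulate(theta, len(y_obs), rng)  # Step 2: simulate data given theta
        if abs(summary(y_sim) - s_obs) < eps:     # Step 3: accept if within tolerance
            accepted.append(theta)
    return accepted

def prior_sample(rng):
    # Hypothetical prior on the variance, chosen only for illustration.
    return rng.uniform(0.1, 10.0)

def simulate(theta, n, rng):
    # Assumed model: iid Normal(0, variance=theta).
    return [rng.gauss(0.0, math.sqrt(theta)) for _ in range(n)]

# Synthetic "observed" data, assuming normal(0,5) means variance 5.
data_rng = random.Random(1)
y_obs = [data_rng.gauss(0.0, math.sqrt(5.0)) for _ in range(50)]

accepted = abc_rejection(y_obs, statistics.variance, prior_sample, simulate,
                         n_iter=2000, eps=0.5)
print(len(accepted), "of 2000 draws accepted")
```

Passing the prior, simulator and summary statistic as functions keeps the sampler itself independent of the particular model.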

Problems with Tolerance

The tolerance controls what proportion of the simulated parameter values are accepted. There are two problems with choosing a tolerance:

Choose too high a tolerance and the ABC posterior is biased towards the prior; in the extreme, everything is accepted and we simply recover the prior.

Choose too low a tolerance and too few values are accepted, so the outliers among them carry too much weight, even though the mean converges to the true value (Monte Carlo error).
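The trade-off described above can be seen by reusing one set of simulations and applying several tolerances to it. The sketch below is a hedged illustration: the uniform prior, the Normal(0, variance θ) model, the sample variance as summary statistic and the particular tolerance values are all assumptions for the example, not settings from the study.

```python
import random
import statistics

def simulate_distances(y_obs, n_iter, seed=0):
    """Return (theta, |S(y_sim) - S(y_obs)|) pairs from one batch of simulations."""
    rng = random.Random(seed)
    s_obs = statistics.variance(y_obs)
    pairs = []
    for _ in range(n_iter):
        theta = rng.uniform(0.1, 10.0)  # hypothetical prior on the variance
        y_sim = [rng.gauss(0.0, theta ** 0.5) for _ in range(len(y_obs))]
        pairs.append((theta, abs(statistics.variance(y_sim) - s_obs)))
    return pairs

def accept(pairs, eps):
    """Keep the parameter values whose simulated summary is within eps."""
    return [theta for theta, dist in pairs if dist < eps]

data_rng = random.Random(1)
y_obs = [data_rng.gauss(0.0, 5.0 ** 0.5) for _ in range(50)]  # assumed variance 5
pairs = simulate_distances(y_obs, n_iter=2000)

for eps in (0.1, 0.5, 2.0, float("inf")):
    kept = accept(pairs, eps)
    mean = statistics.fmean(kept) if kept else float("nan")
    print(f"eps={eps}: accepted {len(kept)} of {len(pairs)}, mean theta = {mean:.2f}")
```

With an infinite tolerance every draw is kept, so the accepted sample is just a sample from the prior; with a very small tolerance only a handful of draws survive and the estimate becomes noisy.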

Figure 1: Error of the ABC algorithm with varying tolerance.

Some issues investigated

Different choices of summary statistic will lead to different choices of optimum tolerance.

Increasing dimensions of summary statistic will require more precise choices of tolerance level.

Choice of prior will affect the choice of tolerance.

Results:

The choice of summary statistic has often been tailored to the problem at hand; we hope to provide some information on how the optimum tolerance changes with that choice.

For our study we limited ourselves to multivariate normal distributions, with the unknown parameter being the variance or covariance matrix, and the prior being normal or multivariate normal with identity covariance matrix, composed with a covariance operation. Our observed data was sampled from a normal(0,5) distribution or multivariate normal distribution with identity covariance matrix.

Figure 2: Optimal tolerance against number of iterations for (resp.) variance, 4th, 3rd and 5th moments. Colours correspond to variance of the prior.

The similarity between the first and second graphs, and between the third and fourth graphs, in Fig. 2 shows that the even moments behave in similar ways to one another, as do the odd moments.
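For reference, the moment-based summary statistics compared above can be computed as sample central moments. The helper below is a sketch under the assumption that central moments with divisor n were meant; the poster does not say which normalisation or centring was used, and the function name is hypothetical.

```python
def central_moment(y, k):
    """k-th sample central moment of y (divisor n, an assumed convention)."""
    n = len(y)
    mean = sum(y) / n
    return sum((v - mean) ** k for v in y) / n

sample = [1.0, 2.0, 3.0, 4.0]
second = central_moment(sample, 2)  # even moment: never negative
third = central_moment(sample, 3)   # odd moment: signed, zero for a symmetric sample
```

One structural difference between the two families is visible here: even central moments are sums of non-negative terms, whereas odd ones are signed and cancel for symmetric data. This is offered only as background, not as an explanation of the pattern in Fig. 2.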

Multiple Dimensions to the Summary Statistic

When using the ABC algorithm, people often do not know the underlying distribution, and so do not know the sufficient statistic. This leads to as many statistics as possible being used, so as to better characterise the data.
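When several statistics are stacked into one summary vector, the acceptance test compares vectors rather than scalars. A minimal sketch, assuming the norm ||·|| is Euclidean (the poster does not specify the norm) and using hypothetical function names:

```python
import math
import statistics

def summary_vector(y, stats):
    """Apply each statistic in `stats` to the data and stack the results."""
    return [f(y) for f in stats]

def within_tolerance(y_sim, y_obs, stats, eps):
    """Accept when the Euclidean distance between summary vectors is below eps."""
    s_sim = summary_vector(y_sim, stats)
    s_obs = summary_vector(y_obs, stats)
    return math.dist(s_sim, s_obs) < eps

# Example with a two-dimensional summary: (sample mean, sample variance).
stats = (statistics.fmean, statistics.variance)
print(summary_vector([1.0, 2.0, 3.0, 4.0], stats))
```

Because every added component can only increase the Euclidean distance, a tolerance tuned for a one-dimensional summary will generally accept fewer draws once more statistics are appended, unless the added components happen to match closely.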

Figure 3: Tables of optimum tolerance. The columns depict different summary statistics and the rows increasing numbers of Monte Carlo samples.

The red boxes around columns 7 through 11 mark, in order, the sufficient statistic alone; the sufficient statistic with the 3rd moment; the sufficient statistic with the 3rd and 4th moments; and so on.

The lack of discrepancy between these values shows that, once the sufficient statistic is included, increasing the dimension of the summary statistic has no effect on the optimum tolerance.

Figure 4: The effect of different summary statistics on the tolerance

Fig. 4 shows that, if the sufficient statistic is not included, increasing the dimension of the summary statistic increases the optimum tolerance for 2 dimensions but not for 1.

Shape of Error Graph

The error is normally thought to be U-shaped as a function of the tolerance, but (2) implies this might not be the case, owing to a term in the bias; we have verified that this occurs in the systems we have been looking at.
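One simple way to check an estimated error curve for more than one minimum is to look for interior grid points that are lower than both neighbours. The helper below is a hypothetical sketch of that check, not the procedure used in this work; the numbers in the usage lines are made up purely to exercise the function and are not results.

```python
def local_minima(tolerances, errors):
    """Return the tolerances at which the error is below both neighbouring values."""
    return [
        tolerances[i]
        for i in range(1, len(errors) - 1)
        if errors[i] < errors[i - 1] and errors[i] < errors[i + 1]
    ]

# Made-up curves for illustration only.
u_shaped = local_minima([1, 2, 3], [3.0, 1.0, 3.0])
two_dips = local_minima([1, 2, 3, 4, 5], [5.0, 3.0, 4.0, 2.0, 6.0])
```

On a Monte Carlo estimate of the error, small dips can be produced by noise alone, so in practice such a check would need repeated runs or smoothing before concluding that the curve truly has several minima.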

Figure 5: Multiple minima

Applications of the ABC algorithm

The ABC algorithm is often applied in Biology, in areas such as Evolution and Ecology. (4)

Example: A form of ABC based on Markov chain Monte Carlo was used to characterise the speciation of apes such as chimpanzees by estimating the split times between alleles, the population and the gene flow rates. There are two main methods: one constructs 3 populations at differing loci and uses a split time to describe the interaction between the 3; the other allows multiple species to evolve under locally determined gene flow rates. As there are many loci, this lends itself to the ABC method rather than a more direct maximum likelihood method. (6)

Applications of our work

It is hoped that this research will help in the choice of summary statistics, thus increasing the accuracy, or speed, of such characterisations.

Acknowledgements

We acknowledge the financial support of STOR-i under EPSRC.

References

(1) Fearnhead, P., & Prangle, D. (2012). Constructing summary statistics for approximate Bayesian computation: semiautomatic approximate Bayesian computation. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 74(3), 419-474.

(2) Barber, S., Voss, J., & Webster, M. (2014). The Rate of Convergence for Approximate Bayesian Computation. arXiv:1311.2038v3 [math.ST], 18th Jul 2014.

(3) Marin, J. M., Pudlo, P., Robert, C. P., & Ryder, R. J. (2012). Approximate Bayesian computational methods. Statistics and Computing, 22(6), 1167-1180.

(4) Beaumont, M. A. (2010). Approximate Bayesian computation in evolution and ecology. Annual Review of Ecology, Evolution, and Systematics, 41, 379-406.

(5) Robert, C. P., & Casella, G. (2010). Introducing Monte Carlo Methods with R (Vol. 18). New York: Springer.

(6) Becquet, C., & Przeworski, M. (2007). A new approach to estimate parameters of speciation models with application to apes. Genome Research, 17(10), 1505-1519.

http://www.stor-i.lancs.ac.uk/intern/interns/2014