Tolerance levels for the Approximate Bayesian Computation algorithm
Matthew Robinson, Wentao Li, Paul Fearnhead
Statistics and Operational Research Doctoral Training Centre, Lancaster University
Applied Bayesian Computation
Preliminaries (1), (3):
Start with an n-dimensional set of data y_obs and a model for the data depending on an unknown parameter θ.
Then take a prior distribution π; the key requirement on this prior is that θ ∈ Ω(π).
The probability density of the data given a specific parameter value is π(y|θ).
The ABC algorithm requires that it is simple to simulate Y from π(y|θ).
The ABC posterior is often defined in terms of:
a function S(), the summary statistic, e.g. mean, skew, etc.;
a density kernel K() which integrates to 1, giving a weighted acceptance probability based on the summary statistic;
a bandwidth h > 0, a rescaling to make sure everything falls within the density kernel.
It has been shown that, without loss of generality, we can take max K() = 1.
Then an approximation to the likelihood is:
$$p(\theta \mid S(y_{obs})) = \int \pi(y \mid \theta)\, K\!\left[\frac{S(y) - S(y_{obs})}{h}\right] dy$$
The ABC posterior can be defined as:
$$\pi_{ABC}(\theta \mid S(y_{obs})) \propto \pi(\theta)\, p(\theta \mid S(y_{obs}))$$
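To make the role of the kernel concrete, here is a minimal Python sketch (assuming NumPy) of a single ABC acceptance step: a θ drawn from the prior, with data y simulated from π(y|θ), is kept with probability K[(S(y) − S(y_obs))/h]. Since max K = 1 this is a valid probability, and the accepted values then follow π_ABC; rescaling K by a constant does not change π_ABC because it is only defined up to proportionality. The two kernels and the function names (`uniform_kernel`, `gaussian_kernel`, `accept`) are illustrative assumptions, not taken from the poster.

```python
import numpy as np

def uniform_kernel(u):
    """Uniform kernel scaled so that max K = 1: K(u) = 1 if ||u|| < 1, else 0."""
    return float(np.linalg.norm(u) < 1.0)

def gaussian_kernel(u):
    """Unnormalised Gaussian kernel exp(-||u||^2 / 2); its maximum is 1 at u = 0."""
    return float(np.exp(-0.5 * np.dot(u, u)))

def accept(s_sim, s_obs, h, kernel, rng):
    """Accept with probability K[(S(y) - S(y_obs)) / h]; valid because max K = 1."""
    u = (np.atleast_1d(s_sim) - np.atleast_1d(s_obs)) / h
    return rng.uniform() < kernel(u)

rng = np.random.default_rng(0)
print(accept(1.02, 1.0, 0.1, gaussian_kernel, rng))
```

With `uniform_kernel` and h = ε, the step reduces to the hard threshold ||S(y_sim) − S(y_obs)|| < ε used by the simplest rejection algorithm; a smooth kernel such as `gaussian_kernel` instead down-weights distant simulations gradually.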
Algorithm (1), (3):
There are three versions of the ABC algorithm; in these experiments we used the simplest, rejection ABC:
Input: a set of data y_obs, a summary function S(), a prior π(θ) > 0, an integer N and a tolerance ε > 0.
Initial step: compute S(y_obs).
Iterate: for i in 1:N
Step 1. Simulate θ_i from π(θ).
Step 2. Simulate y_sim from π(y|θ_i) and calculate S(y_sim).
Step 3. If ||S(y_sim) − S(y_obs)|| < ε, accept θ_i; otherwise reject θ_i.
Output: the set of accepted parameter values among {θ_i}, i = 1, …, N.
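The rejection steps above can be sketched in Python with NumPy as follows. This is a minimal illustration, not the code used for the poster's experiments. The function name `rejection_abc` is hypothetical, and the example problem is an assumption: unknown variance σ² of N(0, σ²) data, a hypothetical Uniform(0.1, 10) prior on σ², "normal(0,5)" read as variance 5, and summary S(y) = mean(y²), which is sufficient for σ² when the mean is known to be 0.

```python
import numpy as np

def rejection_abc(y_obs, summary, prior_sample, simulate, n_iter, eps, rng):
    """Rejection ABC: keep theta_i whenever ||S(y_sim) - S(y_obs)|| < eps."""
    s_obs = np.atleast_1d(summary(y_obs))             # initial step: S(y_obs)
    accepted = []
    for _ in range(n_iter):
        theta = prior_sample(rng)                      # step 1: theta_i ~ pi(theta)
        y_sim = simulate(theta, len(y_obs), rng)       # step 2: y_sim ~ pi(y | theta_i)
        s_sim = np.atleast_1d(summary(y_sim))
        if np.linalg.norm(s_sim - s_obs) < eps:        # step 3: accept or reject
            accepted.append(theta)
    return np.array(accepted)

# Hypothetical example: unknown variance of N(0, sigma^2) data.
rng = np.random.default_rng(1)
y_obs = rng.normal(0.0, np.sqrt(5.0), size=100)       # assumes variance 5
post = rejection_abc(
    y_obs,
    summary=lambda y: np.mean(y ** 2),                 # sufficient for sigma^2 (mean 0)
    prior_sample=lambda r: r.uniform(0.1, 10.0),       # hypothetical prior
    simulate=lambda s2, n, r: r.normal(0.0, np.sqrt(s2), size=n),
    n_iter=10_000,
    eps=0.5,
    rng=rng,
)
print(len(post), post.mean())
```

The loop mirrors Steps 1 to 3 one-to-one for readability; a vectorised version that simulates all N data sets at once is much faster in NumPy but harder to map back onto the algorithm.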
Problems with Tolerance
The tolerance ε controls what proportion of the simulated parameter values are accepted. There are two problems in choosing it:
Choose too high a tolerance and the ABC posterior is biased towards the prior; in the extreme, every value is accepted and the algorithm simply returns the prior.
Choose too low a tolerance and too few values are accepted, so the outliers among them carry too much weight: although the estimate is centred on the true value, its Monte Carlo error is large.
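Both failure modes can be seen in a small vectorised NumPy sketch. It assumes the same illustrative setting as elsewhere on this poster (N(0, σ²) data with variance 5 and summary mean(y²)) plus a hypothetical Uniform(0.1, 10) prior on σ². With a very large ε every draw is accepted and the "posterior" mean is just the prior mean (about 5.05). As ε shrinks, the number of accepted values, and hence the Monte Carlo sample size, falls sharply.

```python
import numpy as np

rng = np.random.default_rng(2)
n, N = 50, 20_000
y_obs = rng.normal(0.0, np.sqrt(5.0), size=n)
s_obs = np.mean(y_obs ** 2)

# Hypothetical Uniform(0.1, 10) prior on the variance sigma^2.
theta = rng.uniform(0.1, 10.0, size=N)
# Simulate all N data sets at once: row i has variance theta[i].
y_sim = rng.normal(0.0, 1.0, size=(N, n)) * np.sqrt(theta)[:, None]
dist = np.abs(np.mean(y_sim ** 2, axis=1) - s_obs)

results = {}
for eps in (100.0, 1.0, 0.1, 0.01):
    kept = theta[dist < eps]                  # accepted parameter values
    results[eps] = kept
    mean = kept.mean() if kept.size else float("nan")
    print(f"eps={eps:>6}: accepted {kept.size:5d}, posterior mean {mean:.2f}")
```

Reusing one pool of prior draws and simulations for every ε keeps the comparison fair, since each smaller tolerance accepts a subset of the values accepted by a larger one.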
Figure 1: Error of the ABC algorithm with varying tolerance.
Some issues investigated
Different choices of summary statistic will lead to different choices of optimum tolerance.
Increasing the dimension of the summary statistic will require more precise choices of tolerance level.
The choice of prior will affect the choice of tolerance.
Results:
The choice of summary statistic is often tailored to the problem at hand; we hope to provide some information on how the optimum tolerance changes with that choice.
For our study we limited ourselves to normal and multivariate normal models, with the unknown parameter being the variance or covariance matrix, and the prior being a normal or multivariate normal distribution with identity covariance matrix, composed with a covariance operation. Our observed data were sampled from a normal(0,5) distribution or from a multivariate normal distribution with identity covariance matrix.
Figure 2: Optimal tolerance against number of iterations for (resp.) the variance, 4th, 3rd and 5th moments; colours correspond to the variance of the prior.
The similarity between the first and second graphs, and between the third and fourth graphs, in Fig. 2 shows that even moments behave similarly to one another, as do odd moments.
Multiple Dimensions in the Summary Statistic
When using the ABC algorithm, practitioners often do not know the underlying distribution and so do not know a sufficient statistic; this leads them to use as many statistics as possible in order to better characterise the data.
Figure 3: Tables of optimum tolerance: the columns depict different summary statistics and the rows increasing numbers of Monte Carlo samples.
The red boxes around columns 7 to 11 depict, in turn, the sufficient statistic alone, then the sufficient statistic with the 3rd moment, then with the 3rd and 4th moments, and so on.
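The nested summary statistics described above can be built as in the following sketch. It assumes i.i.d. N(0, σ²) data with known mean 0, for which T = mean(y²) is a sufficient statistic for σ², and it takes the extra components to be sample moments about zero. The poster does not say whether raw or central moments were used, so that choice and the helper names `sample_moment` and `nested_summaries` are assumptions.

```python
import numpy as np

def sample_moment(y, k):
    """k-th sample moment about zero (the data are assumed to have mean 0)."""
    return np.mean(np.asarray(y, dtype=float) ** k)

def nested_summaries(y, extra_orders=(3, 4, 5)):
    """Return [T], [T, m3], [T, m3, m4], ... where T = mean(y^2) is sufficient
    for sigma^2 when y_i ~ N(0, sigma^2) i.i.d. with known mean 0."""
    stats = [sample_moment(y, 2)]
    out = [np.array(stats)]
    for k in extra_orders:
        stats.append(sample_moment(y, k))     # append the next higher moment
        out.append(np.array(stats))
    return out

for s in nested_summaries([1.0, -1.0, 2.0, -2.0]):
    print(s)
```

The components live on very different scales (the 4th moment grows like σ⁴), so one may want to standardise them before taking ||·|| in the acceptance step; the poster does not state whether this was done.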
The lack of discrepancy between these values shows that, once the sufficient statistic is included, increasing the dimension of the summary statistic has no effect on the optimum tolerance.
Figure 4: The effect of different summary statistics on the tolerance
Fig. 4 shows that, if the sufficient statistic is not included, increasing the dimension of the summary statistic increases the optimum tolerance for 2 dimensions but not for 1.
Shape of Error Graph
The error, as a function of tolerance, is normally thought to be U-shaped, but (2) implies this might not be the case because of a term in the bias; we have verified that this occurs in the systems we have been looking at.
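One way to trace an error-against-tolerance curve, and to check it for more than one minimum, is sketched below. The error measure is an assumption: the squared error of the rejection-ABC posterior mean of σ², averaged over replicate observed data sets. The sketch also assumes N(0, σ²) data with a hypothetical true variance of 2 and a hypothetical Uniform(0.1, 10) prior, and uses the fact that n·mean(y²)/σ² follows a χ²ₙ distribution to simulate the summary directly. The function names `abc_error_curve` and `local_minima` are illustrative.

```python
import numpy as np

def abc_error_curve(eps_grid, true_var=2.0, n=50, N=20_000, reps=20, seed=3):
    """Average squared error of the rejection-ABC posterior mean of sigma^2
    over `reps` observed data sets, for each tolerance in eps_grid."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.1, 10.0, size=N)            # hypothetical prior draws
    # For N(0, s2) data, n * mean(y^2) / s2 ~ chi^2_n, so simulate T directly.
    t_sim = theta * rng.chisquare(n, size=N) / n
    errors = np.zeros(len(eps_grid))
    for _ in range(reps):
        t_obs = true_var * rng.chisquare(n) / n      # summary of one observed data set
        dist = np.abs(t_sim - t_obs)
        for j, eps in enumerate(eps_grid):
            kept = theta[dist < eps]
            # If nothing is accepted, fall back to the prior mean (an arbitrary choice).
            est = kept.mean() if kept.size else theta.mean()
            errors[j] += (est - true_var) ** 2 / reps
    return errors

def local_minima(values):
    """Indices of interior grid points that are lower than both neighbours."""
    v = np.asarray(values)
    return [i for i in range(1, len(v) - 1) if v[i] < v[i - 1] and v[i] < v[i + 1]]

grid = np.linspace(0.02, 2.0, 50)
errs = abc_error_curve(grid)
print(local_minima(errs))
```

Reusing one pool of prior draws across replicates keeps the sketch cheap but correlates the replicates; on a fine grid, Monte Carlo noise alone can also create spurious local minima, so apparent multiple minima should be checked with more replicates before being trusted.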
Figure 5: Multiple minima
Applications of the ABC algorithm
The ABC algorithm is often applied in biology, in areas such as evolution and ecology (4).
Example: a Markov chain Monte Carlo form of ABC was used to characterise the speciation of apes such as chimpanzees by estimating the split times between alleles and the population and gene flow rates. There are two main methods: one constructs three populations at differing loci and uses a split time to describe the interaction between the three; the other allows multiple species to evolve from locally determined gene flow rates. As there are many loci, this lends itself to the ABC method rather than a more direct maximum likelihood method (6).
Applications of our work
It is hoped that this research will help in the choice of summary statistics, thus increasing the accuracy or speed of such characterisations.
Acknowledgements
We acknowledge the financial support of STOR-i under EPSRC.
References
(1) Fearnhead, P., & Prangle, D. (2012). Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 74(3), 419-474.
(2) Barber, S., Voss, J., & Webster, M. (2014). The Rate of Convergence for Approximate Bayesian Computation. arXiv:1311.2038v3 [math.ST], 18 July 2014.
(3) Marin, J. M., Pudlo, P., Robert, C. P., & Ryder, R. J. (2012). Approximate Bayesian computational methods. Statistics and Computing, 22(6), 1167-1180.
(4) Beaumont, M. A. (2010). Approximate Bayesian computation in evolution and ecology. Annual Review of Ecology, Evolution, and Systematics, 41, 379-406.
(5) Robert, C. P., & Casella, G. (2010). Introducing Monte Carlo Methods with R (Vol. 18). New York: Springer.
(6) Becquet, C., & Przeworski, M. (2007). A new approach to estimate parameters of speciation models with application to apes. Genome Research, 17(10), 1505-1519.
http://www.stor-i.lancs.ac.uk/intern/interns/2014