Discussion of ABC talk by Francesco Pauli, Padova, March 21, 2013

Discussion of “Approximate Bayesian Computation (ABC) as the new empirical Bayes approach” by Christian Robert

The validation of ABC

Francesco Pauli

DEAMS - University of Trieste

Padova, March 21st, 2013


ABC: picture

We actually have π_ABC(θ|y) = π(θ | s(Y) ∈ U_ε(s(y_obs))), or a nonparametric approximation.

This is the easy part; there are various answers:

“ε matters little if ‘small enough’”; it can also be included in the estimation.

We can at best aim at π(θ|s(y))

we need a good statistic S; good with respect to what?

consistency, coverage.

We want π(θ|y)

What legitimates using π_ABC(θ|y)? The type of justification is also connected to whether ABC is a computational tool or a ‘new inference machine’.
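The three layers of the picture can be made concrete with a minimal rejection-ABC sketch. Everything specific here is an assumption chosen for illustration and is not taken from the talk: a N(θ, 1) model, a N(0, 3²) prior, the sample mean as s(·), and an interval of half-width ε as U_ε. Because the sample mean is sufficient in this toy model, π(θ|s(y)) equals π(θ|y), so only the ε approximation is left.

```python
import random
import statistics


def abc_rejection(y_obs, eps, n_draws, rng, prior_sd=3.0):
    """Rejection ABC for the mean of a N(theta, 1) sample (toy assumption).

    Keeps the prior draws whose simulated summary falls within eps of the
    observed summary, i.e. draws from pi(theta | s(Y) in U_eps(s(y_obs))).
    """
    n = len(y_obs)
    s_obs = statistics.fmean(y_obs)
    accepted = []
    for _ in range(n_draws):
        theta = rng.gauss(0.0, prior_sd)                   # draw from the prior
        y_sim = [rng.gauss(theta, 1.0) for _ in range(n)]  # simulate a dataset
        if abs(statistics.fmean(y_sim) - s_obs) <= eps:    # compare summaries
            accepted.append(theta)
    return accepted


rng = random.Random(2013)
y_obs = [rng.gauss(1.0, 1.0) for _ in range(20)]
accepted = abc_rejection(y_obs, eps=0.1, n_draws=20000, rng=rng)

# Exact conjugate posterior mean, available only because the toy model is Gaussian.
n, tau2 = len(y_obs), 3.0 ** 2
exact_mean = statistics.fmean(y_obs) * n * tau2 / (n * tau2 + 1.0)

print("accepted draws:", len(accepted))
print("ABC posterior mean:", round(statistics.fmean(accepted), 3))
print("exact posterior mean:", round(exact_mean, 3))
```

In a realistic application the exact posterior is not available, which is why the question of what legitimates π_ABC arises at all.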


Legitimacy: consistency results I

π_ABC(θ|s) is consistent for π(θ|s)

It is easily seen that, as ε → 0, π_ABC(θ|Data) tends to π(θ|s(y)).

Biau et al (2012) define the approximation as a k-nearest-neighbour method and prove consistency.

What about π(θ|s) and π(θ|y)?

Equal if S is sufficient for θ.

Consistent if S ‘tends to sufficiency’.

The approach taken is to find conditions under which an insufficient S guarantees consistency.
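The k-nearest-neighbour reading of ABC mentioned above can be sketched as follows: instead of fixing ε, keep the k simulated parameters whose summaries are closest to the observed one. This is only an illustration of the idea, not the estimator analysed by Biau et al.; the Gaussian toy model, the prior and the sample-mean summary are assumptions.

```python
import random
import statistics


def abc_knn(y_obs, k, n_draws, rng, prior_sd=3.0):
    """Keep the k prior draws whose simulated summary is nearest to s(y_obs)."""
    n = len(y_obs)
    s_obs = statistics.fmean(y_obs)
    sims = []
    for _ in range(n_draws):
        theta = rng.gauss(0.0, prior_sd)  # assumed N(0, prior_sd^2) prior
        s_sim = statistics.fmean(rng.gauss(theta, 1.0) for _ in range(n))
        sims.append((abs(s_sim - s_obs), theta))
    sims.sort()                           # sort by distance between summaries
    return [theta for _, theta in sims[:k]]


rng = random.Random(42)
y_obs = [rng.gauss(1.0, 1.0) for _ in range(20)]
knn_thetas = abc_knn(y_obs, k=100, n_draws=5000, rng=rng)

# Exact conjugate posterior mean for the assumed toy model.
n, tau2 = len(y_obs), 3.0 ** 2
exact_mean = statistics.fmean(y_obs) * n * tau2 / (n * tau2 + 1.0)
print(round(statistics.fmean(knn_thetas), 3), round(exact_mean, 3))
```

Here the tolerance is implicit: it is the distance of the k-th nearest simulated summary, so it shrinks as the number of simulations grows with k fixed.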


Legitimacy: consistency results II

Using the framework of noisy ABC, consistency for π(θ|y) is shown by Dean, Singh, Jasra, & Peters (2011).

The proof is written assuming that the observations, and not a summary statistic, are used.

However, they also say that “If the mapping S() preserves the identifiability of the system, that is to say if assumption (A1) also holds for the HMMs with observations S(Y1); S(Y2), then it is trivial to see that assumptions (A2)-(A7) will also be preserved for all reasonable choices of S() and thus that Theorems 1, 2 and 3 will also hold for ABC MLE performed using the summary statistic”


Legitimacy: consistency results III

The conclusion is “Theorems 1 and 2 provide a theoretical justification for the ABC MLE procedure analogous to that provided for the standard MLE procedure by the classical notion of asymptotic consistency. In particular they show that an arbitrary degree of accuracy in the parameter estimate can be achieved given sufficient data and a sufficiently small ε.”


Legitimacy: consistency results IV

Fearnhead and Prangle (2012) also put forward a consistency result within the noisy ABC framework,

in particular assuming that a coverage property holds: “[. . .] under repeated sampling from the prior, data and summary statistics, events assigned probability q by the ABC posterior will occur with probability q.”

they show that “[. . .] under the standard regularity conditions (Bernardo and Smith, 1994), the noisy ABC posterior will converge onto a point mass on the true parameter value as m → ∞.”


Legitimacy: consistency results V

Marin et al (2012) focus on consistency in model selection;

they state conditions on the summary statistics for obtaining consistent model selection through Bayes factors;

they also point out that “(a) different statistics should be used for estimation and for testing and (b) that they should not be mixed in a single summary statistic.”


Legitimacy: consistency results VI

Connection between the different results? Are they equivalent, or do they cover different situations?

Undoubtedly, they offer legitimacy to ABC procedures.

It seems to me they go in the direction of justifying the procedure per se, not as an approximation of the standard ABC (this might be relevant to interpreting ABC as a mere computational tool or a new inference type).


Is consistency enough?

Consistency is a nice property.

It does not say how far from the target π(θ|y) we get in a specific instance.

The strategy is to find a class of statistics S for which ABC is consistent.

It does not let us say which of the different (insufficient) statistics, or strategies for selecting the statistics, is better.

It is true that some of the aforementioned works state optimality of particular strategies, for instance Fearnhead & Prangle state that “[. . .] choosing summary statistics as the posterior means produces ABC estimators that are optimal in terms of minimizing quadratic loss”,

but it is also true that when different procedures are compared the picture is not totally clear.


Comparison of procedures

Blum et al (2013) compare methods of dimension reduction in ABC;

that is, the methods differ because of the choice of the summary statistics;

the comparison is based on simulations on three different models;

they put forward that “the most suitable set of summary statistics for an analysis may be dataset dependent”;

eventually, no uniformly best method is found.

This would call for “application specific” validation to complement the theory.


Diagnostics based on coverage properties

Prangle et al (2013) propose diagnostics based on the coverage property: “For inference on a continuous scalar parameter, θ, an informal definition is that a given credible interval based on (θ|y0), where y0 ∼ π(y|θ0, m0) for fixed m0, should contain the true parameter, θ0, the appropriate proportion of times.”

Diagnostics are obtained by repeatedly constructing ABC approximations for known values of the parameters (for known models) and checking that the coverage property holds.

Technically, this becomes a problem of checking that the p-values are uniformly distributed.
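The diagnostic described above can be sketched in a few lines. This is a toy version under stated assumptions, not the procedure of Prangle et al.: θ0 is drawn from a N(0, 1) prior, the summary is the mean of n unit-variance Gaussian observations (simulated directly as N(θ, 1/n)), ABC is plain rejection, and uniformity is measured with a hand-written Kolmogorov-Smirnov distance. The `shrink` argument is a hypothetical device that makes the ABC sample artificially overconfident, to show what a failed check looks like.

```python
import random
import statistics


def abc_sample(s_obs, n, eps, n_draws, rng):
    """Rejection ABC draws of theta given the observed summary s_obs."""
    sd_mean = (1.0 / n) ** 0.5
    kept = []
    for _ in range(n_draws):
        theta = rng.gauss(0.0, 1.0)        # prior draw (assumed N(0, 1))
        s_sim = rng.gauss(theta, sd_mean)  # summary of a simulated dataset
        if abs(s_sim - s_obs) <= eps:
            kept.append(theta)
    return kept


def p_value(sample, theta0, shrink=1.0):
    """Estimate p0 = G_{y0}(theta0) from the sample, optionally shrunk to its mean."""
    centre = statistics.fmean(sample)
    scaled = [centre + shrink * (t - centre) for t in sample]
    return sum(t <= theta0 for t in scaled) / len(scaled)


def ks_uniform(p):
    """Kolmogorov-Smirnov distance between the values in p and U(0, 1)."""
    xs = sorted(p)
    m = len(xs)
    return max(max((i + 1) / m - x, x - i / m) for i, x in enumerate(xs))


rng = random.Random(2013)
n, eps = 10, 0.1
p_abc, p_over = [], []
for _ in range(200):
    theta0 = rng.gauss(0.0, 1.0)              # known "true" parameter
    s0 = rng.gauss(theta0, (1.0 / n) ** 0.5)  # pseudo-observed summary
    sample = abc_sample(s0, n, eps, 3000, rng)
    if len(sample) < 20:                      # skip runs with too few draws
        continue
    p_abc.append(p_value(sample, theta0))
    p_over.append(p_value(sample, theta0, shrink=0.2))

ks_abc = ks_uniform(p_abc)
ks_over = ks_uniform(p_over)
print("KS distance, ABC:", round(ks_abc, 3))
print("KS distance, overconfident approximation:", round(ks_over, 3))
```

A small distance for the plain ABC run and a large one for the shrunk sample is the pattern the check is meant to detect; a formal test would compare the distance with its null distribution.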


What kind of justification is most appropriate?

Is using validation the most appropriate thing to do?

Can we say something about how far we get from π(θ|y)?

Does using validation qualify the method as approximation or new inference?

Prangle et al (2013) say that “Note that the above results do not prove that the posterior π(θ|y) is the only distribution to satisfy coverage with respect to our choice of H. However, we are unaware of any other such distributions that are likely to arise in the ABC context.” This may be more coherent with seeing ABC as a new inference machine.

Connections with Monahan and Boos (1992)?


ABCel

In ABCel the likelihood is replaced by the empirical likelihood (EL);

no simulation of the sample is involved;

As a side note, it seems to me that this is a different framework, even if we look at the empirical likelihood as a summary statistic: is ABCel A?

Anyway, since we substitute the likelihood with a surrogate, the issue of validating the results we get is relevant.
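To make the "no simulation of the sample" point concrete, here is a minimal sketch of a sampler in which prior draws are weighted by an empirical likelihood instead of being matched against simulated data. The setting is an assumption chosen for illustration: θ is the mean of the data, the constraint is E[X − θ] = 0, the prior is N(0, 3²), and the Lagrange multiplier is found by bisection. It is not presented as the authors' implementation.

```python
import math
import random


def log_el_mean(x, theta, tol=1e-10):
    """Log empirical likelihood ratio for the constraint E[X] = theta."""
    z = [xi - theta for xi in x]
    if min(z) >= 0.0 or max(z) <= 0.0:
        return -math.inf                 # theta outside the data range: EL is zero
    lo, hi = -1.0 / max(z), -1.0 / min(z)
    width = hi - lo
    lo += 1e-9 * width                   # keep every 1 + lam * z_i positive
    hi -= 1e-9 * width
    for _ in range(200):
        lam = 0.5 * (lo + hi)
        # g is decreasing in lam; its root is the Lagrange multiplier
        g = sum(zi / (1.0 + lam * zi) for zi in z)
        if g > 0.0:
            lo = lam
        else:
            hi = lam
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    return -sum(math.log1p(lam * zi) for zi in z)


rng = random.Random(0)
x = [rng.gauss(1.0, 1.0) for _ in range(30)]          # observed data (toy)
xbar = sum(x) / len(x)

thetas = [rng.gauss(0.0, 3.0) for _ in range(3000)]   # draws from the assumed prior
weights = [math.exp(log_el_mean(x, t)) for t in thetas]
post_mean = sum(w * t for w, t in zip(weights, thetas)) / sum(weights)

print("sample mean:", round(xbar, 3))
print("EL-weighted posterior mean:", round(post_mean, 3))
```

The weighted sample plays the role of the posterior sample, and the observed data enter only through the surrogate likelihood, which is why the result itself needs validating.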


Legitimacy of EL in (A)BC I

Lazar (2003) proposed using EL in the Bayesian paradigm;

the procedure seems to lack a general justification;

in particular a simulation study is performed;

the conclusion is that “Based on both the Monahan & Boos (1992) heuristic and an examination of the frequentist properties of Bayesian intervals, it appears reasonable to use empirical likelihood within the Bayesian paradigm.”


Legitimacy of EL in (A)BC II

however “These results need to be interpreted with some care. While they indicate that it is feasible to consider a Bayesian inferential procedure based on replacing the data likelihood with empirical likelihood, the validity of the posterior inference needs to be established for each case individually. For example, as demonstrated in an unpublished Carnegie Mellon University technical report by L. A. Wasserman, empirical likelihood for the median and Jeffreys’ likelihood are related, and hence the two can be expected to exhibit similar poor behaviour.”

This may suggest that the proposals above for the diagnostics in ABC can be exploited here.


Legitimacy of EL in (A)BC III

Adimari & Pauli (2010) also employed EL as a surrogate for the proper likelihood in the context of pairwise likelihood inference;

they argue that “based on general results for empirical likelihood, [. . .] such a surrogate has good asymptotic properties.”

In particular, asymptotic normality with covariance matrix the Godambe information matrix is put forward as a justification;

they also explored its efficacy “by comparing it with the ordinary posterior distribution on simulated datasets.”


Diagnostics based on coverage properties, details I

g(θ|y), G_y(θ): respectively the density and distribution function approximating π(θ|y);

B(α): [0, 1] → ℬ([0, 1]) (the Borel sets of [0, 1]) such that B(α) has measure α;

C(y, α) = G_y^{-1}[B(α)]: a credible interval according to g;

H(θ, y): distribution function for (θ, y).

g satisfies the coverage property with respect to H(θ_0, y_0) if, for all B and all α ∈ [0, 1],

P(θ_0 ∈ C(y_0, α)) = α.

That is, if

p_0 = G_{y_0}(θ_0) ∼ U(0, 1).
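The step from the coverage statement to the uniformity of p_0 can be spelled out as follows. This is a short sketch that assumes G_{y_0} is continuous and strictly increasing, and writes λ for Lebesgue measure on [0, 1]; both are assumptions added here for the derivation.

```latex
\begin{align*}
\theta_0 \in C(y_0,\alpha) = G_{y_0}^{-1}[B(\alpha)]
  &\iff p_0 = G_{y_0}(\theta_0) \in B(\alpha), \\
P\bigl(\theta_0 \in C(y_0,\alpha)\bigr) = \alpha
  \quad \forall\, B,\ \alpha \in [0,1]
  &\iff P\bigl(p_0 \in B(\alpha)\bigr) = \lambda\bigl(B(\alpha)\bigr)
  \quad \forall\, B,\ \alpha \in [0,1] \\
  &\iff p_0 \sim U(0,1).
\end{align*}
```

The last equivalence holds because a distribution on [0, 1] that assigns every Borel set its Lebesgue measure is the uniform distribution.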


Diagnostics based on coverage properties, details II

[Figure: axes θ and y; shows g(θ|y0), π(θ|y0) and the credible interval C(y0, α) with level α.]


Diagnostics based on coverage properties, details III

[Figure: axes θ and y; shows the credible interval C(y0, α) with level α.]

