Discussion of ABC talk by Stefano Cabras, Padova, March 21, 2013

Posted on 10 May 2015


This discussion of my talk was given by Stefano Cabras at the Padova workshop on recent advances in statistical inference.


DISCUSSION of

Bayesian Computation via empirical likelihood

Stefano Cabras, stefano.cabras@uc3m.es

Universidad Carlos III de Madrid (Spain)

Università di Cagliari (Italy)

Padova, 21-Mar-2013

Summary

◮ Problem:
   ◮ a statistical model $f(y \mid \theta)$;
   ◮ a prior $\pi(\theta)$ on $\theta$;
   ◮ we want to obtain the posterior

$$\pi_N(\theta \mid y) \propto L_N(\theta)\, \pi(\theta).$$

◮ BUT
   ◮ IF $L_N(\theta)$ is not available:
      ◮ THEN use ABC;
   ◮ IF it is not even possible to simulate from $f(y \mid \theta)$:
      ◮ THEN replace $L_N(\theta)$ with $L_{EL}(\theta)$ (the proposed BCel procedure):

$$\pi(\theta \mid y) \propto L_{EL}(\theta) \times \pi(\theta).$$
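The BCel step can be made concrete with a minimal Python sketch. It rests on stated assumptions: the parameter and $h$ are scalar, and $\pi(\theta \mid y)$ is approximated by importance sampling. Each $\theta_i$ is drawn from the prior and weighted by $L_{EL}(\theta_i)$, which is our reading of the basic scheme. The helper names `log_el` and `bcel`, the bisection solver for the Lagrange multiplier, and the toy Normal-mean example are all hypothetical illustrations, not code from the talk.

```python
import math
import random


def log_el(h):
    """Log empirical likelihood ratio sum(log(n * w_i)) for one scalar
    constraint sum(w_i * h_i) = 0 with sum(w_i) = 1 (Owen's dual form).

    Returns -inf when 0 is not strictly inside the range of the h_i.
    """
    lo, hi = min(h), max(h)
    if lo == hi == 0:
        return 0.0
    if lo >= 0 or hi <= 0:
        return -math.inf
    # w_i = 1 / (n * (1 + lam * h_i)); lam is the root of the decreasing
    # function sum(h_i / (1 + lam * h_i)) on (-1/hi, -1/lo), found by bisection.
    a, b = -1.0 / hi, -1.0 / lo
    for _ in range(60):
        lam = 0.5 * (a + b)
        if sum(v / (1.0 + lam * v) for v in h) > 0:
            a = lam
        else:
            b = lam
    lam = 0.5 * (a + b)
    return -sum(math.log1p(lam * v) for v in h)


def bcel(y, h, sample_prior, m, rng):
    """Hypothetical BCel sketch: draw theta_i from the prior and give it the
    importance weight L_EL(theta_i); returns draws and normalised weights."""
    thetas = [sample_prior(rng) for _ in range(m)]
    log_w = [log_el([h(yi, t) for yi in y]) for t in thetas]
    top = max(log_w)
    if top == -math.inf:
        raise ValueError("every draw has zero empirical likelihood")
    w = [math.exp(lw - top) for lw in log_w]
    total = sum(w)
    return thetas, [wi / total for wi in w]


# Toy usage (assumed): Normal data, constraint E[Y - theta] = 0, prior N(0, 10^2).
rng = random.Random(1)
y = [rng.gauss(1.0, 1.0) for _ in range(50)]
thetas, weights = bcel(y, lambda yi, t: yi - t,
                       lambda r: r.gauss(0.0, 10.0), 2000, rng)
post_mean = sum(t * w for t, w in zip(thetas, weights))
```

With such a diffuse prior, the weighted mean should sit close to the sample mean. Draws outside the range of the data get zero weight, because there $0$ is not inside the convex hull of the $h(y_i, \theta)$.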


... what remains of $f(y \mid \theta)$?

◮ Recall that, for an iid sample, the Empirical Likelihood is defined by means of a set of constraints:

$$E_{f(y \mid \theta)}[h(Y, \theta)] = 0.$$

◮ The relation between $\theta$ and the observations $Y$ is model-dependent and is expressed by $h(Y, \theta)$;
◮ The constraints are model-driven, so a faint trace of $f(y \mid \theta)$ still remains in BCel.
◮ Examples:
   ◮ the coalescent model example is illuminating in suggesting the score of the pairwise likelihood;
   ◮ the residuals in GARCH models.
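For a single scalar constraint, the maximisation behind $L_{EL}(\theta)$ has a well-known dual form. The weights are $w_i = 1/\{n(1 + \lambda h_i)\}$, where $\lambda$ solves $\sum_i h_i/(1 + \lambda h_i) = 0$. The sketch below assumes a scalar $h$ and finds $\lambda$ by bisection. `el_weights` is a hypothetical helper name, and the five data points are made up for illustration.

```python
import math


def el_weights(h):
    """Empirical likelihood weights for one scalar constraint sum(w_i h_i) = 0.

    Returns (weights, log_el_ratio) with log_el_ratio = sum(log(n * w_i));
    returns (None, -inf) when 0 is not strictly inside the range of h.
    """
    n = len(h)
    lo, hi = min(h), max(h)
    if lo == hi == 0:
        return [1.0 / n] * n, 0.0
    if lo >= 0 or hi <= 0:
        return None, -math.inf
    # Lagrange multiplier: root of the decreasing function
    # sum(h_i / (1 + lam * h_i)) on (-1/max h, -1/min h), by bisection;
    # 60 halvings shrink the bracket by a factor of about 1e18.
    a, b = -1.0 / hi, -1.0 / lo
    for _ in range(60):
        lam = 0.5 * (a + b)
        if sum(v / (1.0 + lam * v) for v in h) > 0:
            a = lam
        else:
            b = lam
    lam = 0.5 * (a + b)
    w = [1.0 / (n * (1.0 + lam * v)) for v in h]
    return w, sum(math.log(n * wi) for wi in w)


# Constraint for a mean: h(y, theta) = y - theta, evaluated at theta = 1.0.
y = [0.3, 1.1, 1.9, 2.4, 0.8]
w, log_r = el_weights([yi - 1.0 for yi in y])
```

At $\theta = \bar y$ the uniform weights $1/n$ already satisfy the constraint, so the log ratio is $0$. Anywhere else it is negative.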

... a suggestion

What if we do not even know $h(\cdot)$?

... how to elicit $h(\cdot)$ automatically

◮ Set $h(Y, \theta) = Y - g(\theta)$, where

$$g(\theta) = E_{f(y \mid \theta)}(Y \mid \theta)$$

is the regression function of $Y \mid \theta$;
◮ $g(\theta)$ should be replaced by an estimator $\hat g(\theta)$.

How to estimate $g(\theta)$?

◮ Use a once-and-for-all pilot-run simulation study:¹
   1. Consider a grid (or regular lattice) of $\theta$ made of $M$ points: $\theta_1, \ldots, \theta_M$;
   2. Simulate the corresponding $y_1, \ldots, y_M$;
   3. Regress $y_1, \ldots, y_M$ on $\theta_1, \ldots, \theta_M$, obtaining $\hat g(\theta)$.

¹ ... similar to Fearnhead, P. and D. Prangle (JRSS-B, 2012) or Cabras, Castellanos, Ruli (ERCIM 2012, Oviedo).
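The three pilot-run steps can be sketched in Python as follows. The slides do not say which regression is used in step 3, so the Gaussian-kernel (Nadaraya-Watson) smoother and its bandwidth of 0.5 are assumptions. The helper names `pilot_run` and `fit_g` are also hypothetical. The simulator is the $N(|\theta|, 1)$ example, with $M = 1000$ grid points on $[-10, 10]$.

```python
import math
import random


def pilot_run(simulate, lo, hi, m, rng):
    """Steps 1-2: a regular grid theta_1..theta_M on [lo, hi] and one
    simulated y_j ~ f(y | theta_j) per grid point."""
    thetas = [lo + (hi - lo) * j / (m - 1) for j in range(m)]
    ys = [simulate(t, rng) for t in thetas]
    return thetas, ys


def fit_g(thetas, ys, bandwidth):
    """Step 3 (assumed method): Nadaraya-Watson regression of y on theta
    with a Gaussian kernel; returns the estimator g_hat."""
    def g_hat(t):
        k = [math.exp(-0.5 * ((t - th) / bandwidth) ** 2) for th in thetas]
        return sum(ki * yi for ki, yi in zip(k, ys)) / sum(k)
    return g_hat


# Example from the slides: y ~ N(|theta|, 1) on a grid of M = 1000 points.
rng = random.Random(0)
thetas, ys = pilot_run(lambda t, r: r.gauss(abs(t), 1.0), -10.0, 10.0, 1000, rng)
g_hat = fit_g(thetas, ys, bandwidth=0.5)


def h(y, theta):
    # Automatically elicited estimating function h(Y, theta) = Y - g_hat(theta).
    return y - g_hat(theta)
```

Any smoother could replace the kernel regression. A local average is used here only because it needs no extra libraries and recovers $|\theta|$ away from the kink at $0$.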

... example: $y \sim N(|\theta|, 1)$

For a pilot run of $M = 1000$ we have $g(\theta) = |\theta|$.

[Figure: Pilot-Run s.s.: simulated $y$ against $\theta \in [-10, 10]$, with the curve $g(\theta)$.]

... example: $y \sim N(|\theta|, 1)$

Suppose we draw a sample of size $n = 100$ with $\theta = 2$:

[Figure: Histogram of $y$ (frequency against $y$).]

... example: $y \sim N(|\theta|, 1)$

The Empirical Likelihood is then:

[Figure: Emp. Lik. against $\theta \in [-4, 4]$.]
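The empirical likelihood of this example can be reproduced in spirit with the Python sketch below. It takes $g(\theta) = |\theta|$, so $h(y, \theta) = y - |\theta|$. The seed and the $\theta$ grid are arbitrary assumptions, so the numbers will not match the plot. Because $h$ depends on $\theta$ only through $|\theta|$, the computed curve is exactly symmetric. It has two maxima at $\theta = \pm\bar y$, where the log EL ratio is $0$.

```python
import math
import random


def log_el(h):
    """Log empirical likelihood ratio for one scalar constraint
    sum(w_i * h_i) = 0; -inf when 0 is outside the range of the h_i."""
    lo, hi = min(h), max(h)
    if lo == hi == 0:
        return 0.0
    if lo >= 0 or hi <= 0:
        return -math.inf
    # Bisection for the Lagrange multiplier on (-1/hi, -1/lo).
    a, b = -1.0 / hi, -1.0 / lo
    for _ in range(60):
        lam = 0.5 * (a + b)
        if sum(v / (1.0 + lam * v) for v in h) > 0:
            a = lam
        else:
            b = lam
    lam = 0.5 * (a + b)
    return -sum(math.log1p(lam * v) for v in h)


rng = random.Random(2013)
y = [rng.gauss(2.0, 1.0) for _ in range(100)]  # n = 100 draws with theta = 2
ybar = sum(y) / len(y)


def profile(theta):
    """log L_EL(theta) with h(y, theta) = y - |theta| (i.e. g = |theta|)."""
    return log_el([yi - abs(theta) for yi in y])


grid = [-4.0 + 0.05 * k for k in range(161)]  # theta in [-4, 4]
curve = [profile(t) for t in grid]
```

The two symmetric modes anticipate the first point below: $L_{EL}$ sees the data only through $h$, so it cannot separate $\theta$ from $-\theta$.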

1st Point: Do we necessarily have to use $f(y \mid \theta)$?

◮ The above data may well have been drawn from (e.g.) a Half Normal;
◮ How is this reflected in BCel?
   ◮ For given data $y$,
   ◮ and a fixed $h(Y, \theta)$,
   ◮ $L_{EL}(\theta)$ is the same regardless of $f(y \mid \theta)$.

Can we ignore $f(y \mid \theta)$?

2nd Point: Sample free vs Simulation free

◮ The Empirical Likelihood is “simulation free” but not “sample free”, i.e.
   ◮ $L_{EL}(\theta) \to L_N(\theta)$ for $n \to \infty$,
   ◮ implying $\pi(\theta \mid y) \to \pi_N(\theta \mid y)$ asymptotically in $n$.
◮ ABC is “sample free” but not “simulation free”, i.e.
   ◮ $\pi(\theta \mid \rho(s(y), s_{obs}) < \epsilon) \to \pi_N(\theta \mid y)$ as $\epsilon \to 0$,
   ◮ implying convergence in the number of simulations, if $s(y)$ were sufficient.

A quick answer recommends BCel,

BUT would a small sample recommend ABC?
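For comparison, the ABC side of this contrast can be sketched in Python as plain rejection sampling. It keeps the prior draws whose simulated summary falls within $\epsilon$ of $s_{obs}$. Several choices are assumptions: the absolute-difference distance $\rho$, the Normal toy model, the prior, and the tuning values ($\epsilon = 0.1$, 20 000 simulations). Here $s(y)$ is the sample mean, which is sufficient for a Normal mean with known variance. `abc_rejection` is a hypothetical helper name.

```python
import random
import statistics


def abc_rejection(s_obs, sample_prior, simulate, summary, eps, m, rng):
    """Keep prior draws theta whose simulated summary satisfies
    rho(s(y), s_obs) < eps, with rho the absolute difference (assumed)."""
    accepted = []
    for _ in range(m):
        theta = sample_prior(rng)
        s = summary(simulate(theta, rng))
        if abs(s - s_obs) < eps:
            accepted.append(theta)
    return accepted


# Toy example (assumed): y ~ N(theta, 1), n = 20, s(y) = sample mean.
rng = random.Random(7)
n = 20
y_obs = [rng.gauss(1.5, 1.0) for _ in range(n)]
s_obs = statistics.fmean(y_obs)
draws = abc_rejection(
    s_obs,
    sample_prior=lambda r: r.gauss(0.0, 3.0),
    simulate=lambda t, r: [r.gauss(t, 1.0) for _ in range(n)],
    summary=statistics.fmean,
    eps=0.1,
    m=20000,
    rng=rng,
)
```

Shrinking `eps` brings the accepted draws closer to $\pi_N(\theta \mid y)$. The cost is more simulations for the same number of acceptances, which is the sense in which ABC is not "simulation free".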

3rd Point: How to validate a pseudo-posterior $\pi(\theta \mid y) \propto L_{EL}(\theta) \times \pi(\theta)$?

◮ The use of pseudo-likelihoods is not new in the Bayesian setting:
   ◮ Empirical Likelihoods:
      ◮ Lazar (Biometrika, 2003): examples and coverages of C.I.;
      ◮ Mengersen et al. (PNAS, 2012): examples and coverages of C.I.;
      ◮ ...
   ◮ Modified Likelihoods:
      ◮ Ventura et al. (JASA, 2009): second-order matching properties;
      ◮ Chang and Mukerjee (Stat. & Prob. Letters, 2006): examples;
      ◮ ...
   ◮ Quasi-Likelihoods:
      ◮ Lin (Statist. Methodol., 2006): examples;
      ◮ Greco et al. (JSPI, 2008): robustness properties;
      ◮ Ventura et al. (JSPI, 2010): examples and coverages of C.I.;
      ◮ ...

3rd Point: How to validate a pseudo-posterior $\pi(\theta \mid y) \propto L_{EL}(\theta) \times \pi(\theta)$?

◮ Monahan & Boos (Biometrika, 1992) proposed a notion of validity:

   $\pi(\theta \mid y)$ should obey the laws of probability in a fashion that is consistent with statements derived from Bayes' rule.

◮ Very difficult!

How can the pseudo-posterior $\pi(\theta \mid y)$ be validated when this is not possible?
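One simulation-based check works in the spirit of the coverage view of validity associated with Monahan & Boos. Draw $\theta^*$ from the prior and $y$ from the model, then compute $H = \int_{-\infty}^{\theta^*} \pi(\theta \mid y)\, d\theta$. Across replicates, $H$ should look Uniform(0, 1). The Python sketch below assumes a prior $N(0, 1)$, the model $y \sim N(\theta, 1)$, $h(y, \theta) = y - \theta$, and a fixed $\theta$ grid in place of importance sampling. `coverage_check` is a hypothetical name. The sketch also makes the difficulty plain: the check must simulate from $f(y \mid \theta)$, which is exactly what BCel is meant to avoid.

```python
import math
import random


def log_el(h):
    """Log empirical likelihood ratio for one scalar constraint
    sum(w_i * h_i) = 0; -inf when 0 is outside the range of the h_i."""
    lo, hi = min(h), max(h)
    if lo == hi == 0:
        return 0.0
    if lo >= 0 or hi <= 0:
        return -math.inf
    a, b = -1.0 / hi, -1.0 / lo  # bracket for the Lagrange multiplier
    for _ in range(60):
        lam = 0.5 * (a + b)
        if sum(v / (1.0 + lam * v) for v in h) > 0:
            a = lam
        else:
            b = lam
    lam = 0.5 * (a + b)
    return -sum(math.log1p(lam * v) for v in h)


def coverage_check(reps, n, grid, rng):
    """For each replicate, draw theta* ~ N(0, 1) and y ~ N(theta*, 1), then
    record H = pseudo-posterior cdf at theta*, computed on a fixed grid."""
    hs = []
    for _ in range(reps):
        theta_star = rng.gauss(0.0, 1.0)
        y = [rng.gauss(theta_star, 1.0) for _ in range(n)]
        # log L_EL(theta) + log N(theta; 0, 1), up to an additive constant.
        logp = [log_el([yi - t for yi in y]) - 0.5 * t * t for t in grid]
        top = max(logp)
        p = [math.exp(lp - top) for lp in logp]
        below = sum(pk for pk, t in zip(p, grid) if t <= theta_star)
        hs.append(below / sum(p))
    return hs


grid = [-4.0 + 0.05 * k for k in range(161)]
hs = coverage_check(reps=30, n=20, grid=grid, rng=random.Random(11))
# Share of replicates where theta* lies in the central 90% credible interval.
coverage_90 = sum(0.05 < v < 0.95 for v in hs) / len(hs)
```

With only 30 replicates, this is a rough diagnostic rather than a formal test. A histogram of `hs`, or a uniformity test on it, would be the natural next step.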

... Last point: ABC is still a terrific tool

◮ ... a lot of references:
   ◮ statistical journals;
   ◮ Twitter;
   ◮ Xi'an's blog (xianblog.wordpress.com);
◮ ... it is tailored to approximating $L_N(\theta)$.

Where is the A in BCel?