Approximate Bayesian Computation for the Stellar Initial Mass …cschafer/SCMA6/Cisewski.pdf ·...

Approximate Bayesian Computation for theStellar Initial Mass Function

Jessi CisewskiDepartment of Statistics

Yale University

SCMA6

Collaborators: Grant Weller (Savvysherpa), Chad Schafer(Carnegie Mellon), David Hogg (NYU)

Background: ABC

The posterior for θ given observed data xobs:

π(θ | xobs) =f (xobs | θ)π(θ)∫f (xobs | θ)π(θ)dθ

=f (xobs | θ)π(θ)

f (xobs)

Approximate Bayesian Computation

“Likelihood-free” approach to approximating π(θ | xobs)(f (xobs | θ) not specified)

Proceeds via simulation of the forward process

Why would we not know f (xobs | θ)?

1 Physical model too complex to write as a closed formfunction of θ

2 Strong dependency in data3 Observational limitations

1

ABC for Astronomy

cosmoabc: Likelihood-free inference via Population Monte CarloApproximate Bayesian Computation (Ishida et al., 2015)

Approximate Bayesian Computation for Forward Modeling inCosmology (Akeret et al., 2015)

Likelihood-Free Cosmological Inference with Type Ia Supernovae:Approximate Bayesian Computation for a Complete Treatment ofUncertainty (Weyant et al., 2013)

Likelihood - free inference in cosmology: potential for theestimation of luminosity functions (Schafer and Freeman, 2012)

Approximate Bayesian Computation for Astronomical ModelAnalysis: A case study in galaxy demographics and morphologicaltransformation at high redshift (Cameron and Pettitt, 2012)

3

Stellar Initial Mass Function: the dis-tribution of star masses after a star for-mation event within a specified volumeof space

Molecular cloud → Protostars → StarsImage: adapted from

http://www.astro.ljmu.ac.uk

Properties of the cluster formation could include, for example, coregrowth by accretion, interaction due to turbulent environment(Bate, 2012)

Observation effects include aging, completeness, measurementerror

4

http://www.astro.ljmu.ac.uk

Basic ABC algorithm

For the observed data xobs and prior π(θ):

Algorithm∗

1 Sample θprop from prior π(θ)

2 Generate xprop from forward process f (x | θprop)3 Accept θprop if xobs = xprop4 Return to step 1

∗Introduced in Pritchard et al. (1999) (population genetics)

5

Step 3: Accept θprop if xobs = xprop

Waiting for proposals such that xobs = xprop would be computationallyprohibitive

Instead, accept proposals with ∆(xobs, xprop) ≤ εfor some distance ∆ and some tolerance threshold ε

An accepted θ is a draw from the posterior if

P(Accept θprop | θprop = θ) ∝ f (xobs|θ) (the likelihood)

6

Toy Example: Assume that xobs is Gaussian N(θ,1).

Suppose xobs = 1, ε = 0.1.

−2 −1 0 1 2 3 4

x

−2 −1 0 1 2 3 4

θ

Likelihood

Acceptance Prob.

(left) Propose θprop = 0; acceptance region for xprop in redrectangle

(right) Consider all possible θ and calculate acceptance probability

Illustration from Chad Schafer7

xobs = 1, θ = 0, ε = 0.4.

−2 −1 0 1 2 3 4

x

−2 −1 0 1 2 3 4

θ

Likelihood

Acceptance Prob.

xobs = 1, ε = 1→

−2 −1 0 1 2 3 4

θ

Likelihood

Acceptance Prob.

8

ABC Background

Comparing xprop with xobs is not feasible.

When x is high-dimensional, will have to make ε too large inorder to keep acceptance probability reasonable.

Instead, reduce the dimension by comparing summaries,S(xprop) and S(xobs).

9

Gaussian illustration

Data xobs consists of 25 iid draws from Normal(µ, 1)

Summary statistics S(x) = x̄

Distance function ∆(S(xprop), S(xobs)) = |x̄prop − x̄obs|

Tolerance ε = 0.50 and 0.10

Prior π(µ) = Normal(0,10)

10

Gaussian illustration: posteriors for µ

Tolerance: 0.5, N:1000, n:25

µ

Density

-1.0 -0.5 0.0 0.5 1.0

0.0

0.5

1.0

1.5

2.0

True PosteriorABC PosteriorInput value

Tolerance: 0.1, N:1000, n:25

µ

Density

-1.0 -0.5 0.0 0.5 1.0

0.0

0.5

1.0

1.5

2.0 True Posterior

ABC PosteriorInput value

11

How to pick a tolerance, ε?

Instead of starting the ABC algorithm over with a smallertolerance (ε), decrease the tolerance and use the alreadysampled particle system as a proposal distribution rather thandrawing from the prior distribution.

Particle system: (1) retained sampled values, (2) importanceweights

Beaumont et al. (2009); Moral et al. (2011); Bonassi and West(2004)

12

Gaussian illustration: sequential posteriors

-1.0 -0.5 0.0 0.5 1.0

0.0

0.5

1.0

1.5

2.0

N:1000, n:25

µ

Density

True PosteriorABC PosteriorInput value

Tolerance sequence, ε1:10:1.00 0.75 0.53 0.38 0.27 0.19 0.15 0.11 0.08 0.06

13

We propose an ABC algorithm for inference on the stellar IMF

14

Examples of IMF models

Power-law: Salpeter (1955)- Used a power law with α = 2.35

Broken power-law: Kroupa (2001)

Φ(M) ∝ M−αi ,M1i ≤ M ≤ M2i

α1 = 0.3 for 0.01 ≤ M/M∗Sun ≤ 0.08 [Sub-stellar]

α2 = 1.3 for 0.08 ≤ M/MSun ≤ 0.50α3 = 2.3 for 0.50 ≤ M/MSun ≤ Mmax

Log-Normal model: Chabrier (2003)

ξ(logm) =dn

d logm= 0.158× exp

(−(logm − log 0.08)2

2(0.69)2

)∗1 MSun = 1 Solar Mass (the mass of our Sun)

15

ABC for the stellar initial mass function

We propose an ABC algorithm with a new data-generating model

Can account for various observational limitations and uncertaintiesGoal is to capture information about the cluster formation process

Use ideas of preferential attachment [“Rich get richer” (Yule,1925; Simon, 1955)]

Applications: wealth distribution, evolution of citation networks,number of internet page links

Often considered underlying mechanism for data exhibitingpower-law behavior (D’Souza et al., 2007)

16

Proposed Model Generation Process

1 Fix cloud mass that forms stars: Mecl

2 mt ∼ Exponential(λ) enters the system of stars, t = 1, 2, . . .3 The mass quantity mt does one of the following:

Starts a new starJoins an existing star

4 Process is complete when the total mass reaches Mecl

17

Proposed Model Form

A mass, mt , entering the system will

Form a new star with probability

πt = min (1, α)

Join existing star k , k = 1, . . . , nt , with probability

πkt ∝ Mγk,t

α = probability of entering mass forming a new star

γ = growth component (γ = 1 is linear growth)

The generating process is complete when the total mass of formedstars reaches Mecl . The possible ranges of parameters are λ > 0,α ∈ [0, 1), and γ > 0.

18

Selecting summary statistics

Use simplified model: broken power-law, assume independent drawsAccount for the following observation effects:

Aging −→ more massive stars die fasterCompleteness function:

P(observing star | m) =

0, m < Cminm−Cmin

Cmax−Cmin, m ∈ [Cmin,Cmax]

1, m > Cmax

Measurement error

L(α | m1:nobs, ntot ) =(

P(M > Tage ) +

(1− α

M1−αmax − M1−α

min

)∫ Cmax

Cmin

M−α ×(1−

M − Cmin

Cmax − Cmin

)dM

)ntot−nobs

×nobs∏i=1

{∫ Tage

2(2πσ2)

− 12 m−1

i e− 1

2σ2 (log(mi )−log(M))2(

1− α

M1−αmax − M1−α

min

)M−α

×(I{M > Cmax} +

(M − Cmin

Cmax − Cmin

)I{Cmin ≤ M ≤ Cmax}

)dM

}

19

0 10 20 30 40 50

0.00

0.10

0.20

0.30

IMF

N = 1000 Bandwidth = 0.4971

Density

0 10 20 30 40 50

0.00

0.10

0.20

0.30

Observed MF

N = 510 Bandwidth = 0.5625

Density

Sample size = 1000 stars, [Cmin,Cmax] = [2, 4], σ = 0.25

20

Summary statistics

We want to account for the following with our summary statisticsand distance functions:

1 Shape of the observed Mass Function

ρ1(mprop,mobs) =

[∫ {f̂logmprop(x)− f̂logmobs

(x)}2

dx

]1/2

2 Number of stars observed

ρ2(mprop,mobs) = |1− nprop/nobs |

mprop = masses of the stars simulated from the forward modelmobs = masses of observed starsnprop = number of stars simulated from the forward modelnobs = number of observed stars

21

Simulation Study with simpler model

1 Draw n = 103 stars

2 IMF slope α = 2.35 with Mmin = 2 and Mmax = 60

3 N = 103 particles

4 T = 30 sequential time steps

0.4 0.6 0.8 1.0

0.0

0.5

1.0

1.5

2.0

2.5

(A) Observed Mass Function

Log10(M)

Log10(f M)

Observed MF95% Credible bandPosterior Median

1.6 1.8 2.0 2.2 2.4 2.6 2.8

0.0

0.5

1.0

1.5

2.0

2.5

3.0

3.5

(B) Posteriors

α

Density

ABC PosteriorTrue PosteriorInput value

Initial ABC posterior...Intermediate ABC posterior...Final ABC Posterior

22

Special cases of proposed forward model

Yule-Simon model: mt fixed, γ = 1, α = 0.30

Chinese restaurant process: mt fixed, γ = 1, α = 0, modified πt(prob of forming a new star)

23

Summary of 183 starsSize and color of points based on final mass

0.8 0.9 1.0 1.1 1.2

0.002

0.006

0.010

Birth time

Birt

h m

ass

0.5

1.0

1.5

2.0

2.5

3.0

3.5

Fina

l mas

s

24

Bate (2012) simulation

Growth of 183 starsColor of lines based on start timebluer = formed earlier, redder = formed later

0.8 0.9 1.0 1.1 1.2

01

23

4

Time

Mass

25

Bate (2012) simulation - ABC posteriors

2000 particles, 23 time steps, Mtot = 88.68

26

Summary

ABC can be a useful tool when data are too complex to define areasonable likelihood

Selection of good summary statistics is crucial for ABC posterior tobe meaningful

A challenge in describing the Stellar Initial Mass Function iscapturing the cluster formation mechanism

We proposed a Preferential Attachment mechanism for the ABCforward model

27

THANK YOU!!!

28

Bibliography I

Akeret, J., Refregier, A., Amara, A., Seehars, S., and Hasner, C. (2015), “Approximate Bayesian Computation forForward Modeling in Cosmology,” arXiv preprint arXiv:1504.07245.

Bate, M. R. (2012), “Stellar, brown dwarf and multiple star properties from a radiation hydrodynamical simulationof star cluster formation,” Monthly Notices of the Royal Astronomical Society, 419, 3115–3146.

Beaumont, M. A., Cornuet, J.-M., Marin, J.-M., and Robert, C. P. (2009), “Adaptive approximate Bayesiancomputation,” Biometrika, 96, 983 – 990.

Bonassi, F. V. and West, M. (2004), “Sequential Monte Carlo with Adaptive Weights for Approximate BayesianComputation,” Bayesian Analysis, 1, 1–19.

Cameron, E. and Pettitt, A. N. (2012), “Approximate Bayesian Computation for Astronomical Model Analysis: ACase Study in Galaxy Demographics and Morphological Transformation at High Redshift,” Monthly Notices ofthe Royal Astronomical Society, 425, 44–65.

Chabrier, G. (2003), “The galactic disk mass function: Reconciliation of the hubble space telescope and nearbydeterminations,” The Astrophysical Journal Letters, 586, L133.

D’Souza, R. M., Borgs, C., Chayes, J. T., Berger, N., and Kleinberg, R. D. (2007), “Emergence of temperedpreferential attachment from optimization,” Proceedings of the National Academy of Sciences, 104, 6112–6117.

Ishida, E., Vitenti, S., Penna-Lima, M., Cisewski, J., de Souza, R., Trindade, A., Cameron, E., et al. (2015),“cosmoabc: Likelihood-free inference via Population Monte Carlo Approximate Bayesian Computation,” arXivpreprint arXiv:1504.06129.

Kroupa, P. (2001), “On the variation of the initial mass function,” Monthly Notices of the Royal AstronomicalSociety, 322, 231–246.

Moral, P. D., Doucet, A., and Jasra, A. (2011), “An adaptive sequential Monte Carlo method for approximateBayesian computation,” Statistics and Computing, 22, 1009–1020.

Pritchard, J. K., Seielstad, M. T., and Perez-Lezaun, A. (1999), “Population Growth of Human Y Chromosomes:A study of Y Chromosome Microsatellites,” Molecular Biology and Evolution, 16, 1791 – 1798.

Salpeter, E. E. (1955), “The luminosity function and stellar evolution.” The Astrophysical Journal, 121, 161.

29

Bibliography II

Schafer, C. M. and Freeman, P. E. (2012), Statistical Challenges in Modern Astronomy V, Springer, chap. 1,Lecture Notes in Statistics, pp. 3 – 19.

Simon, H. A. (1955), “On a class of skew distribution functions,” Biometrika, 425–440.

Weyant, A., Schafer, C., and Wood-Vasey, W. M. (2013), “Likeihood-free cosmological inference with type Iasupernovae: approximate Bayesian computation for a complete treatment of uncertainty,” The AstrophysicalJournal, 764, 116.

Yule, G. U. (1925), “A mathematical theory of evolution, based on the conclusions of Dr. JC Willis, FRS,”Philosophical Transactions of the Royal Society of London. Series B, Containing Papers of a BiologicalCharacter, 21–87.

30

Approximate Bayesian Computation for the Stellar Initial Mass …cschafer/SCMA6/Cisewski.pdf ·...

Documents

Transcript of Approximate Bayesian Computation for the Stellar Initial Mass …cschafer/SCMA6/Cisewski.pdf ·...