Approximate Bayesian Computation for theStellar Initial Mass Function
Jessi CisewskiDepartment of Statistics
Yale University
SCMA6
Collaborators: Grant Weller (Savvysherpa), Chad Schafer(Carnegie Mellon), David Hogg (NYU)
Background: ABC
The posterior for θ given observed data xobs:
π(θ | xobs) =f (xobs | θ)π(θ)∫f (xobs | θ)π(θ)dθ
=f (xobs | θ)π(θ)
f (xobs)
Approximate Bayesian Computation
“Likelihood-free” approach to approximating π(θ | xobs)(f (xobs | θ) not specified)
Proceeds via simulation of the forward process
Why would we not know f (xobs | θ)?
1 Physical model too complex to write as a closed formfunction of θ
2 Strong dependency in data3 Observational limitations
1
2
ABC for Astronomy
cosmoabc: Likelihood-free inference via Population Monte CarloApproximate Bayesian Computation (Ishida et al., 2015)
Approximate Bayesian Computation for Forward Modeling inCosmology (Akeret et al., 2015)
Likelihood-Free Cosmological Inference with Type Ia Supernovae:Approximate Bayesian Computation for a Complete Treatment ofUncertainty (Weyant et al., 2013)
Likelihood - free inference in cosmology: potential for theestimation of luminosity functions (Schafer and Freeman, 2012)
Approximate Bayesian Computation for Astronomical ModelAnalysis: A case study in galaxy demographics and morphologicaltransformation at high redshift (Cameron and Pettitt, 2012)
3
Stellar Initial Mass Function: the dis-tribution of star masses after a star for-mation event within a specified volumeof space
Molecular cloud → Protostars → StarsImage: adapted from
http://www.astro.ljmu.ac.uk
Properties of the cluster formation could include, for example, coregrowth by accretion, interaction due to turbulent environment(Bate, 2012)
Observation effects include aging, completeness, measurementerror
4
Basic ABC algorithm
For the observed data xobs and prior π(θ):
Algorithm∗
1 Sample θprop from prior π(θ)
2 Generate xprop from forward process f (x | θprop)3 Accept θprop if xobs = xprop4 Return to step 1
∗Introduced in Pritchard et al. (1999) (population genetics)
5
Step 3: Accept θprop if xobs = xprop
Waiting for proposals such that xobs = xprop would be computationallyprohibitive
Instead, accept proposals with ∆(xobs, xprop) ≤ εfor some distance ∆ and some tolerance threshold ε
An accepted θ is a draw from the posterior if
P(Accept θprop | θprop = θ) ∝ f (xobs|θ) (the likelihood)
6
Toy Example: Assume that xobs is Gaussian N(θ,1).
Suppose xobs = 1, ε = 0.1.
−2 −1 0 1 2 3 4
x
−2 −1 0 1 2 3 4
θ
Likelihood
Acceptance Prob.
(left) Propose θprop = 0; acceptance region for xprop in redrectangle
(right) Consider all possible θ and calculate acceptance probability
Illustration from Chad Schafer7
xobs = 1, θ = 0, ε = 0.4.
−2 −1 0 1 2 3 4
x
−2 −1 0 1 2 3 4
θ
Likelihood
Acceptance Prob.
xobs = 1, ε = 1→
−2 −1 0 1 2 3 4
θ
Likelihood
Acceptance Prob.
8
ABC Background
Comparing xprop with xobs is not feasible.
When x is high-dimensional, will have to make ε too large inorder to keep acceptance probability reasonable.
Instead, reduce the dimension by comparing summaries,S(xprop) and S(xobs).
9
ABC Background
Comparing xprop with xobs is not feasible.
When x is high-dimensional, will have to make ε too large inorder to keep acceptance probability reasonable.
Instead, reduce the dimension by comparing summaries,S(xprop) and S(xobs).
9
Gaussian illustration
Data xobs consists of 25 iid draws from Normal(µ, 1)
Summary statistics S(x) = x̄
Distance function ∆(S(xprop), S(xobs)) = |x̄prop − x̄obs|
Tolerance ε = 0.50 and 0.10
Prior π(µ) = Normal(0,10)
10
Gaussian illustration: posteriors for µ
Tolerance: 0.5, N:1000, n:25
µ
Density
-1.0 -0.5 0.0 0.5 1.0
0.0
0.5
1.0
1.5
2.0
True PosteriorABC PosteriorInput value
Tolerance: 0.1, N:1000, n:25
µ
Density
-1.0 -0.5 0.0 0.5 1.0
0.0
0.5
1.0
1.5
2.0 True Posterior
ABC PosteriorInput value
11
How to pick a tolerance, ε?
Instead of starting the ABC algorithm over with a smallertolerance (ε), decrease the tolerance and use the alreadysampled particle system as a proposal distribution rather thandrawing from the prior distribution.
Particle system: (1) retained sampled values, (2) importanceweights
Beaumont et al. (2009); Moral et al. (2011); Bonassi and West(2004)
12
Gaussian illustration: sequential posteriors
-1.0 -0.5 0.0 0.5 1.0
0.0
0.5
1.0
1.5
2.0
N:1000, n:25
µ
Density
True PosteriorABC PosteriorInput value
Tolerance sequence, ε1:10:1.00 0.75 0.53 0.38 0.27 0.19 0.15 0.11 0.08 0.06
13
We propose an ABC algorithm for inference on the stellar IMF
14
Examples of IMF models
Power-law: Salpeter (1955)- Used a power law with α = 2.35
Broken power-law: Kroupa (2001)
Φ(M) ∝ M−αi ,M1i ≤ M ≤ M2i
α1 = 0.3 for 0.01 ≤ M/M∗Sun ≤ 0.08 [Sub-stellar]
α2 = 1.3 for 0.08 ≤ M/MSun ≤ 0.50α3 = 2.3 for 0.50 ≤ M/MSun ≤ Mmax
Log-Normal model: Chabrier (2003)
ξ(logm) =dn
d logm= 0.158× exp
(−(logm − log 0.08)2
2(0.69)2
)∗1 MSun = 1 Solar Mass (the mass of our Sun)
15
ABC for the stellar initial mass function
We propose an ABC algorithm with a new data-generating model
Can account for various observational limitations and uncertaintiesGoal is to capture information about the cluster formation process
Use ideas of preferential attachment [“Rich get richer” (Yule,1925; Simon, 1955)]
Applications: wealth distribution, evolution of citation networks,number of internet page links
Often considered underlying mechanism for data exhibitingpower-law behavior (D’Souza et al., 2007)
16
Proposed Model Generation Process
1 Fix cloud mass that forms stars: Mecl
2 mt ∼ Exponential(λ) enters the system of stars, t = 1, 2, . . .3 The mass quantity mt does one of the following:
Starts a new starJoins an existing star
4 Process is complete when the total mass reaches Mecl
17
Proposed Model Form
A mass, mt , entering the system will
Form a new star with probability
πt = min (1, α)
Join existing star k , k = 1, . . . , nt , with probability
πkt ∝ Mγk,t
α = probability of entering mass forming a new star
γ = growth component (γ = 1 is linear growth)
The generating process is complete when the total mass of formedstars reaches Mecl . The possible ranges of parameters are λ > 0,α ∈ [0, 1), and γ > 0.
18
Selecting summary statistics
Use simplified model: broken power-law, assume independent drawsAccount for the following observation effects:
Aging −→ more massive stars die fasterCompleteness function:
P(observing star | m) =
0, m < Cminm−Cmin
Cmax−Cmin, m ∈ [Cmin,Cmax]
1, m > Cmax
Measurement error
L(α | m1:nobs, ntot ) =(
P(M > Tage ) +
(1− α
M1−αmax − M1−α
min
)∫ Cmax
Cmin
M−α ×(1−
M − Cmin
Cmax − Cmin
)dM
)ntot−nobs
×nobs∏i=1
{∫ Tage
2(2πσ2)
− 12 m−1
i e− 1
2σ2 (log(mi )−log(M))2(
1− α
M1−αmax − M1−α
min
)M−α
×(I{M > Cmax} +
(M − Cmin
Cmax − Cmin
)I{Cmin ≤ M ≤ Cmax}
)dM
}
19
0 10 20 30 40 50
0.00
0.10
0.20
0.30
IMF
N = 1000 Bandwidth = 0.4971
Density
0 10 20 30 40 50
0.00
0.10
0.20
0.30
Observed MF
N = 510 Bandwidth = 0.5625
Density
Sample size = 1000 stars, [Cmin,Cmax] = [2, 4], σ = 0.25
20
Summary statistics
We want to account for the following with our summary statisticsand distance functions:
1 Shape of the observed Mass Function
ρ1(mprop,mobs) =
[∫ {f̂logmprop(x)− f̂logmobs
(x)}2
dx
]1/2
2 Number of stars observed
ρ2(mprop,mobs) = |1− nprop/nobs |
mprop = masses of the stars simulated from the forward modelmobs = masses of observed starsnprop = number of stars simulated from the forward modelnobs = number of observed stars
21
Simulation Study with simpler model
1 Draw n = 103 stars
2 IMF slope α = 2.35 with Mmin = 2 and Mmax = 60
3 N = 103 particles
4 T = 30 sequential time steps
0.4 0.6 0.8 1.0
0.0
0.5
1.0
1.5
2.0
2.5
(A) Observed Mass Function
Log10(M)
Log10(f M)
Observed MF95% Credible bandPosterior Median
1.6 1.8 2.0 2.2 2.4 2.6 2.8
0.0
0.5
1.0
1.5
2.0
2.5
3.0
3.5
(B) Posteriors
α
Density
ABC PosteriorTrue PosteriorInput value
Initial ABC posterior...Intermediate ABC posterior...Final ABC Posterior
22
Special cases of proposed forward model
Yule-Simon model: mt fixed, γ = 1, α = 0.30
Chinese restaurant process: mt fixed, γ = 1, α = 0, modified πt(prob of forming a new star)
23
Summary of 183 starsSize and color of points based on final mass
0.8 0.9 1.0 1.1 1.2
0.002
0.006
0.010
Birth time
Birt
h m
ass
0.5
1.0
1.5
2.0
2.5
3.0
3.5
Fina
l mas
s
24
Bate (2012) simulation
Growth of 183 starsColor of lines based on start timebluer = formed earlier, redder = formed later
0.8 0.9 1.0 1.1 1.2
01
23
4
Time
Mass
25
Bate (2012) simulation - ABC posteriors
2000 particles, 23 time steps, Mtot = 88.68
26
Summary
ABC can be a useful tool when data are too complex to define areasonable likelihood
Selection of good summary statistics is crucial for ABC posterior tobe meaningful
A challenge in describing the Stellar Initial Mass Function iscapturing the cluster formation mechanism
We proposed a Preferential Attachment mechanism for the ABCforward model
27
THANK YOU!!!
28
Bibliography I
Akeret, J., Refregier, A., Amara, A., Seehars, S., and Hasner, C. (2015), “Approximate Bayesian Computation forForward Modeling in Cosmology,” arXiv preprint arXiv:1504.07245.
Bate, M. R. (2012), “Stellar, brown dwarf and multiple star properties from a radiation hydrodynamical simulationof star cluster formation,” Monthly Notices of the Royal Astronomical Society, 419, 3115–3146.
Beaumont, M. A., Cornuet, J.-M., Marin, J.-M., and Robert, C. P. (2009), “Adaptive approximate Bayesiancomputation,” Biometrika, 96, 983 – 990.
Bonassi, F. V. and West, M. (2004), “Sequential Monte Carlo with Adaptive Weights for Approximate BayesianComputation,” Bayesian Analysis, 1, 1–19.
Cameron, E. and Pettitt, A. N. (2012), “Approximate Bayesian Computation for Astronomical Model Analysis: ACase Study in Galaxy Demographics and Morphological Transformation at High Redshift,” Monthly Notices ofthe Royal Astronomical Society, 425, 44–65.
Chabrier, G. (2003), “The galactic disk mass function: Reconciliation of the hubble space telescope and nearbydeterminations,” The Astrophysical Journal Letters, 586, L133.
D’Souza, R. M., Borgs, C., Chayes, J. T., Berger, N., and Kleinberg, R. D. (2007), “Emergence of temperedpreferential attachment from optimization,” Proceedings of the National Academy of Sciences, 104, 6112–6117.
Ishida, E., Vitenti, S., Penna-Lima, M., Cisewski, J., de Souza, R., Trindade, A., Cameron, E., et al. (2015),“cosmoabc: Likelihood-free inference via Population Monte Carlo Approximate Bayesian Computation,” arXivpreprint arXiv:1504.06129.
Kroupa, P. (2001), “On the variation of the initial mass function,” Monthly Notices of the Royal AstronomicalSociety, 322, 231–246.
Moral, P. D., Doucet, A., and Jasra, A. (2011), “An adaptive sequential Monte Carlo method for approximateBayesian computation,” Statistics and Computing, 22, 1009–1020.
Pritchard, J. K., Seielstad, M. T., and Perez-Lezaun, A. (1999), “Population Growth of Human Y Chromosomes:A study of Y Chromosome Microsatellites,” Molecular Biology and Evolution, 16, 1791 – 1798.
Salpeter, E. E. (1955), “The luminosity function and stellar evolution.” The Astrophysical Journal, 121, 161.
29
Bibliography II
Schafer, C. M. and Freeman, P. E. (2012), Statistical Challenges in Modern Astronomy V, Springer, chap. 1,Lecture Notes in Statistics, pp. 3 – 19.
Simon, H. A. (1955), “On a class of skew distribution functions,” Biometrika, 425–440.
Weyant, A., Schafer, C., and Wood-Vasey, W. M. (2013), “Likeihood-free cosmological inference with type Iasupernovae: approximate Bayesian computation for a complete treatment of uncertainty,” The AstrophysicalJournal, 764, 116.
Yule, G. U. (1925), “A mathematical theory of evolution, based on the conclusions of Dr. JC Willis, FRS,”Philosophical Transactions of the Royal Society of London. Series B, Containing Papers of a BiologicalCharacter, 21–87.
30
Top Related