Asymptotics for discrete random measures

Pierpaolo De Blasi, email: [email protected]

Statalk on Bayesian Nonparametrics, Collegio Carlo Alberto, February 19, 2016


Outline

1. Introduction
• Dirichlet process PD(0, θ)
• Stick-breaking representation
• Two-parameter Poisson-Dirichlet process PD(α, θ)
• Truncation error $R_n$
• Almost sure approximations

2. Applications in PD(α, θ) mixtures
• Blocked Gibbs sampler
• Slice sampler
• Posterior asymptotics

3. Asymptotics
• Limiting distribution
• Large deviation


Dirichlet process

The Dirichlet process (DP) defines a prior distribution on the space of probability measures on $\mathbb{R}$ (or $\mathbb{R}^d$, or any Polish space).

Let $\theta > 0$ and let $H(\cdot)$ be a probability measure on $\mathbb{R}$. Then

$$P \sim \mathrm{DP}(\theta H)$$

when, for any partition $B_1, \dots, B_d$ of $\mathbb{R}$,

$$(P(B_1), \dots, P(B_d)) \sim \mathrm{Dirichlet}(\theta H(B_1), \dots, \theta H(B_d)).$$

With probability one, $P$ is a discrete probability measure that admits an infinite sum representation:

$$P(\cdot) = \sum_{j \ge 1} p_j \, \delta_{z_j}(\cdot)$$

where
• $(p_j) = (p_1, p_2, \dots)$ is a random vector taking values in the infinite probability simplex $\Delta_\infty = \{p_j \ge 0, \ \sum_{j \ge 1} p_j = 1\}$;
• $(z_j) = (z_1, z_2, \dots)$ is an iid sequence from $H(\cdot)$, independent of $(p_j)$.


Ferguson and Klass sum representation

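$$P(\cdot) = \sum_{j \ge 1} p_j \, \delta_{z_j}(\cdot) = \sum_{j \ge 1} \frac{J_j}{\sum_{\ell \ge 1} J_\ell} \, \delta_{z_j}(\cdot)$$

with $(J_j)$ the jumps of a gamma process $\{\xi_t, \ t \in [0, 1]\}$ ($\theta = 1$ here),

$$E(e^{-s \xi_t}) = (1 + s)^{-t} = e^{-t \int_0^\infty (1 - e^{-su}) \, \nu(du)}, \qquad \nu(du) = u^{-1} e^{-u} \, du.$$

• Let $\Gamma_j = E_1 + \cdots + E_j$ for $E_1, E_2, \dots \sim_{iid} \mathrm{Exp}(1)$, and let $N(x) = \int_x^\infty u^{-1} e^{-u} \, du$ be the right tail of the Lévy measure $\nu(du)$.
• Then $J_j = N^{-1}(\Gamma_j)$, with $J_1 > J_2 > \dots$, so that the normalized weights

$$p_j = N^{-1}(\Gamma_j) \Big/ \sum_{\ell \ge 1} N^{-1}(\Gamma_\ell)$$

are a.s. decreasing, $p_1 > p_2 > \dots$, and define a distribution on $\Delta_\infty$ known as the Poisson-Dirichlet distribution.
• There is no closed form for $N^{-1}(u)$, and each $p_j$ requires the computation of an infinite sum.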


Stick-breaking sum representation

Also known as the residual allocation model,

$$P(\cdot) = \sum_{j \ge 1} \tilde p_j \, \delta_{z_j}(\cdot) = \sum_{j \ge 1} v_j \prod_{\ell=1}^{j-1} (1 - v_\ell) \, \delta_{z_j}(\cdot)$$

where $v_j \sim_{iid} \mathrm{beta}(1, \theta)$.
• The stick-breaking weights $\tilde p_j = v_j \prod_{\ell=1}^{j-1} (1 - v_\ell)$ have decreasing expected value, $E(\tilde p_1) > E(\tilde p_2) > \dots$, and define a distribution on $\Delta_\infty$ known as the GEM distribution.
• They correspond to the Ferguson and Klass weights $(p_j)$ in size-biased random order*, that is, in the order in which they appear in multinomial sampling.
• Convenient for simulation purposes: no need to compute an infinite sum.

*Size-biased random order:

$$P(\tilde p_1 = p_j \mid p_1, p_2, \dots) = p_j$$

$$P(\tilde p_{j+1} = p_i \mid \tilde p_1, \dots, \tilde p_j; \ p_1, p_2, \dots) = \frac{p_i \, 1(p_i \ne \tilde p_\ell \text{ for all } 1 \le \ell \le j)}{1 - \tilde p_1 - \cdots - \tilde p_j}$$
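The stick-breaking construction above can be simulated directly. The following is a minimal sketch using only the Python standard library; the function name `dp_stick_breaking` and the chosen values of `theta` and `n_atoms` are illustrative assumptions, not part of the talk.

```python
import random

def dp_stick_breaking(theta, n_atoms, rng):
    """Draw the first n_atoms stick-breaking weights of a DP with total mass theta.

    Returns (weights, residual), where residual = prod_{j <= n_atoms} (1 - v_j).
    """
    weights = []
    remaining = 1.0  # length of the stick not yet broken off
    for _ in range(n_atoms):
        v = rng.betavariate(1.0, theta)  # v_j ~ beta(1, theta)
        weights.append(v * remaining)    # p_j = v_j * prod_{l < j} (1 - v_l)
        remaining *= 1.0 - v
    return weights, remaining

rng = random.Random(0)
weights, residual = dp_stick_breaking(theta=2.0, n_atoms=50, rng=rng)
total = sum(weights) + residual  # equals 1 up to floating-point rounding
```

Only finitely many `v` draws are needed, and the mass not yet allocated is available at every step as `remaining`.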


Two-parameter Poisson-Dirichlet process

Also known as the Pitman-Yor process after Pitman and Yor (1997),

$$P(\cdot) = \sum_{j \ge 1} \tilde p_j \, \delta_{z_j}(\cdot) = \sum_{j \ge 1} v_j \prod_{\ell=1}^{j-1} (1 - v_\ell) \, \delta_{z_j}(\cdot)$$

where $v_j \sim_{ind} \mathrm{beta}(1 - \alpha, \theta + j\alpha)$ for $\alpha \in (0, 1)$ and $\theta > -\alpha$.
• The distribution of the ordered sequence $p_1 > p_2 > \dots$, where $p_j = \tilde p_{(j)}$, is known as the two-parameter Poisson-Dirichlet distribution.
• It admits a representation as normalized jumps of a process with independent increments only for $\theta = 0$ (stable process):

$$p_j = \Gamma_j^{-1/\alpha} \Big/ \sum_{\ell \ge 1} \Gamma_\ell^{-1/\alpha}$$

• Since the $v_j$ are stochastically decreasing, $E(\tilde p_j)$ decays more slowly than in the DP case.
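The slower decay can be made concrete by computing $E(\tilde p_j)$ exactly. Because the $v_j$ are independent, $E(\tilde p_j) = E(v_j) \prod_{\ell < j} E(1 - v_\ell)$, with $E(v_j) = (1-\alpha)/(\theta + 1 + (j-1)\alpha)$. The sketch below (function name and parameter values are illustrative assumptions) compares PD(0, 1) with PD(0.5, 1).

```python
def expected_stick_weights(alpha, theta, n_atoms):
    """Return ([E(p_1), ..., E(p_n)], E(R_n)) for PD(alpha, theta) stick-breaking weights."""
    expected = []
    remaining = 1.0  # E[prod_{l < j} (1 - v_l)], using independence of the v_l
    for j in range(1, n_atoms + 1):
        denom = theta + 1.0 + (j - 1) * alpha
        expected.append(remaining * (1.0 - alpha) / denom)  # E(v_j) * E(remaining stick)
        remaining *= (theta + j * alpha) / denom            # times E(1 - v_j)
    return expected, remaining

dp_weights, dp_residual = expected_stick_weights(alpha=0.0, theta=1.0, n_atoms=20)
py_weights, py_residual = expected_stick_weights(alpha=0.5, theta=1.0, n_atoms=20)
```

For these particular values the expected residual mass is $2^{-n}$ in the DP case and $3/(n+3)$ for PD(0.5, 1), that is, geometric against polynomial decay.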


Notation

We use
• PD(0, θ) for the Dirichlet process;
• PD(α, θ) for the two-parameter Poisson-Dirichlet process,

whenever we refer to the corresponding distribution of $(p_j)$ or $(\tilde p_j)$ on $\Delta_\infty$.

In both cases, and unless otherwise specified, the locations $(z_j)$ are taken as iid draws from $H(\cdot)$.

Whenever there is
• . . .

an open problem is pointed out.


Truncation error

Define the residual probability or truncation error as

$$R_n = \sum_{j > n} \tilde p_j = \prod_{j=1}^{n} (1 - v_j).$$

For $\varepsilon > 0$, define the counting process

$$N_\varepsilon = \inf\{n \in \mathbb{N} : R_n < \varepsilon\}.$$

When the $v_j$ are identically distributed (i.e. the DP case), $N_\varepsilon - 1$ takes on the interpretation of the renewal process $X_t$ evaluated at $t = -\log \varepsilon$, with
• independent interarrival times distributed as $-\log(1 - v_j)$;
• $n$-th arrival times $T_n$ distributed as $-\log R_n$.


PD(0, θ)

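• Since $(1 - v_j) \sim_{iid} \mathrm{beta}(\theta, 1)$, $-\log(1 - v_j) \sim_{iid} \mathrm{Exp}(\theta)$ and

$$-\log R_n \sim \mathrm{Gamma}(n, \theta) \quad \text{(shape } n \text{, rate } \theta \text{)}.$$

• Renewal process interpretation:

$$P(T_n \le t) = 1 - P(X_t \le n - 1), \qquad T_n \sim \mathrm{Gamma}(n, \theta), \quad X_t \sim \mathrm{Pois}(\theta t)$$

where $T_n = -\log R_n$ and $X_t = N_\varepsilon - 1$ for $t = \log(1/\varepsilon)$.
• Hence $N_\varepsilon - 1 \sim \mathrm{Pois}(\theta \log(1/\varepsilon))$: the smaller $\varepsilon$, the larger the number $N_\varepsilon$ of weights $\tilde p_j$ needed to account for $1 - \varepsilon$ of the probability mass.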
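The Poisson law of $N_\varepsilon - 1$ can be checked by simulation. The sketch below (names, seed and parameter values are illustrative assumptions) accumulates $\log R_n$ through exponential interarrival times and compares the Monte Carlo mean of $N_\varepsilon$ with $1 + \theta \log(1/\varepsilon)$.

```python
import math
import random

def sample_n_eps(theta, eps, rng):
    """Return N_eps = inf{n : R_n < eps} for one DP stick-breaking path."""
    n, log_residual = 0, 0.0  # R_0 = 1
    log_eps = math.log(eps)
    while log_residual >= log_eps:
        # -log(1 - v_j) ~ Exp(rate theta) when v_j ~ beta(1, theta)
        log_residual -= rng.expovariate(theta)
        n += 1
    return n

theta, eps = 2.0, 0.01
rng = random.Random(1)
draws = [sample_n_eps(theta, eps, rng) for _ in range(20000)]
mc_mean = sum(draws) / len(draws)
poisson_mean = 1.0 + theta * math.log(1.0 / eps)  # E(N_eps) under the Poisson law
```

Working on the log scale avoids underflow of $R_n$ for small $\varepsilon$.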


PD(α, θ)

• $(1 - v_j) \sim \mathrm{beta}(\theta + j\alpha, 1 - \alpha)$, so $-\log(1 - v_j)$ is a positive r.v. whose distribution is not closed under convolution. It is infinitely divisible, see Lemma 1 in Ferguson (1974).

• $-\log R_n$ does not correspond to the $n$-th arrival time of a Poisson process; rather, $N_\varepsilon$ is a counting process with independent but not identically distributed interarrival times.

• Still,

$$P(-\log R_n \le \log(1/\varepsilon)) = 1 - P(N_\varepsilon \le n).$$

However, there is no closed-form distribution for $-\log R_n$ and $N_\varepsilon$.

We focus on the distribution of $R_n$, seeking asymptotic approximations as $n \to \infty$:

1. limiting distribution

2. large deviation principle

Throughout, we keep the PD(0, θ) case as a running example for comparison and illustration.


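Truncation error and a.s. approximation

The stick-breaking representation suggests the almost sure truncation

$$P_N(\cdot) = \sum_{j=1}^{N} \tilde p_j \, \delta_{z_j}(\cdot) + R_N \, \delta_{z_0}(\cdot) = \sum_{j=1}^{N} v_j \prod_{\ell=1}^{j-1} (1 - v_\ell) \, \delta_{z_j}(\cdot) + \prod_{j=1}^{N} (1 - v_j) \, \delta_{z_0}(\cdot)$$

obtained by setting $v_{N+1} = 1$ so that $\tilde p_{N+1} = 1 - \sum_{j=1}^{N} \tilde p_j = R_N$.
• Muliere and Tardella (1998): sampling functionals of $P$ like

$$P(g) = \int g(x) \, P(dx) = \sum_{j \ge 1} \tilde p_j \, g(z_j)$$

• For each bounded and continuous real-valued function $g$,

$$P_N(g) \to_{a.s.} P(g)$$

• Let $P_\varepsilon = P_{N_\varepsilon}$ for $N_\varepsilon = \inf\{n \in \mathbb{N} : R_n < \varepsilon\}$. Then

$$d_{TV}(P_\varepsilon, P) \le \varepsilon, \quad \text{a.s.}$$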
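A minimal sketch of the $\varepsilon$-truncation $P_\varepsilon$ and of a functional $P_\varepsilon(g)$ follows. The function names, the standard normal base measure $H$, the choice $g = \tanh$, and drawing $z_0$ from $H$ are all assumptions made for illustration.

```python
import math
import random

def sample_p_eps(alpha, theta, eps, base_sampler, rng):
    """Sample atoms and weights of P_eps: break sticks until the residual R_N < eps."""
    atoms, weights = [], []
    remaining, j = 1.0, 0
    while remaining >= eps:
        j += 1
        v = rng.betavariate(1.0 - alpha, theta + j * alpha)  # v_j
        atoms.append(base_sampler(rng))                      # z_j ~ H
        weights.append(v * remaining)
        remaining *= 1.0 - v
    # The residual mass R_N is placed on one extra atom z_0 (assumed here to come from H).
    atoms.append(base_sampler(rng))
    weights.append(remaining)
    return atoms, weights

def functional(g, atoms, weights):
    """P_N(g) = sum_j p_j g(z_j), including the residual atom."""
    return sum(w * g(z) for z, w in zip(atoms, weights))

rng = random.Random(2)
atoms, weights = sample_p_eps(0.3, 1.0, 1e-3, lambda r: r.gauss(0.0, 1.0), rng)
value = functional(math.tanh, atoms, weights)
```

Setting `alpha=0.0` gives the DP case, since `betavariate(1.0, theta)` is then the beta(1, θ) stick.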


PD(α, θ) mixtures for density estimation

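$$x_i \mid y_i \sim f(x_i \mid y_i), \qquad y_i \mid P \sim P, \qquad i = 1, 2, \dots, n$$

$$P \sim \mathcal{P} \qquad (1)$$

where the $y_i$ are latent (unobservable) variables.
• By exploiting the stick-breaking sum representation,

$$f(x_i) = \int f(x_i \mid y_i) \, dP(y_i), \qquad P(\cdot) = \sum_{j \ge 1} \tilde p_j \, \delta_{z_j}(\cdot)$$

or

$$f(x_i) = \sum_{j \ge 1} \tilde p_j \, f(x_i \mid z_j), \qquad (\tilde p_j) \sim \mathrm{PD}(\alpha, \theta), \quad (z_j) \sim_{iid} H(\cdot)$$

• Rewrite (1) as

$$x_i \mid (\tilde p_j), (z_j) \sim \sum_{j \ge 1} \tilde p_j \, f(x \mid z_j), \qquad i = 1, 2, \dots, n$$

$$(\tilde p_j) \sim \mathrm{PD}(\alpha, \theta), \qquad (z_j) \sim_{iid} H(\cdot) \qquad (2)$$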


Truncation applied to the posterior

Ishwaran and James (2001), Gelfand and Kottas (2002).
• Marginal sampler: avoid dealing with an infinite number of parameters by integrating out the unknown mixing $P$, i.e. the sequences $(\tilde p_j)$ and $(z_j)$:

$$x_i \mid y_i \sim f(x_i \mid y_i), \qquad i = 1, 2, \dots, n$$

$$y_1, \dots, y_n \sim \mathrm{PU}(\alpha, \theta, H)$$

where $\mathrm{PU}(\alpha, \theta, H)$ is the prediction rule of the Pólya urn.
• Let $\pi(dy \mid x)$ be the posterior distribution of $y = (y_1, \dots, y_n)$ given $x = (x_1, \dots, x_n)$.
• Inference for the posterior of $P$ based only on the posterior $y_i$ values:
• PD(0, θ)

$$P(\cdot \mid y) \sim \mathrm{DP}(\theta H + n P_n)$$

where $P_n$ is the empirical measure of $y = (y_1, \dots, y_n)$.


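• PD(α, θ)

$$P(\cdot \mid y) = \sum_{j=1}^{k} \tilde p_j^* \, \delta_{\zeta_j}(\cdot) + \tilde p_{k+1}^* \, P^*(\cdot)$$

where $\zeta_1, \dots, \zeta_k$ are the unique values among the $y_i$, with frequencies $n_1^*, \dots, n_k^*$, and

$$(\tilde p_1^*, \dots, \tilde p_k^*, \tilde p_{k+1}^*) \sim \mathrm{Dirichlet}(n_1^* - \alpha, \dots, n_k^* - \alpha, \theta + \alpha k)$$

is independent of $P^*$, which is PD(α, θ + kα).
• Thus, to approximate functionals of the posterior $P(\cdot \mid y)$, use

$$P_N(\cdot \mid y) = \sum_{j=1}^{k} \tilde p_j^* \, \delta_{\zeta_j}(\cdot) + \tilde p_{k+1}^* \, P_N^*(\cdot)$$

where

$$P_N^*(\cdot) = \sum_{j=1}^{N} \tilde p_j \, \delta_{z_j}(\cdot) + R_N \, \delta_{z_0}(\cdot) = \sum_{j=1}^{N} v_j \prod_{\ell=1}^{j-1} (1 - v_\ell) \, \delta_{z_j}(\cdot) + \prod_{j=1}^{N} (1 - v_j) \, \delta_{z_0}(\cdot)$$

for $v_j \sim \mathrm{beta}(1 - \alpha, \theta + k\alpha + j\alpha)$.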
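The weights of the truncated posterior $P_N(\cdot \mid y)$ can be sampled as sketched below. The Dirichlet vector is generated by normalizing independent gamma draws; the frequencies `counts`, the truncation level and the parameter values are hypothetical.

```python
import random

def sample_posterior_weights(counts, alpha, theta, rng):
    """Dirichlet(n*_1 - alpha, ..., n*_k - alpha, theta + alpha*k) via normalized gammas."""
    k = len(counts)
    shapes = [n_j - alpha for n_j in counts] + [theta + alpha * k]
    gammas = [rng.gammavariate(shape, 1.0) for shape in shapes]
    total = sum(gammas)
    return [g / total for g in gammas]

def sample_truncated_remainder(k, alpha, theta, n_trunc, rng):
    """Weights of P*_N: N sticks with v_j ~ beta(1 - alpha, theta + k*alpha + j*alpha), then R_N."""
    weights, remaining = [], 1.0
    for j in range(1, n_trunc + 1):
        v = rng.betavariate(1.0 - alpha, theta + k * alpha + j * alpha)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights + [remaining]

rng = random.Random(3)
counts = [5, 3, 1, 1]  # hypothetical frequencies n*_j of k = 4 unique values
alpha, theta = 0.25, 1.0
post = sample_posterior_weights(counts, alpha, theta, rng)
rem = sample_truncated_remainder(len(counts), alpha, theta, 25, rng)
# Masses of P_N(.|y): k observed atoms, then N + 1 fresh atoms scaled by p*_{k+1}.
masses = post[:-1] + [post[-1] * w for w in rem]
```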


Truncation applied to the prior

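Blocked Gibbs sampler by Ishwaran and James (2001).
• Conditional sampler: $P$ is not marginalized out, but replaced with its almost sure truncation in the prior:

$$x_i \mid y_i \sim f(x_i \mid y_i), \qquad y_i \mid P \sim P, \qquad i = 1, 2, \dots, n$$

$$P \sim \mathcal{P}_N \qquad (3)$$

• $N$ can be chosen sufficiently large that the $L_1$ distance $\| \cdot \|_1$ between the marginal densities

$$m_N(x) = \int \Big\{ \prod_{i=1}^{n} \int f(x_i \mid y_i) \, dP(y_i) \Big\} \, d\mathcal{P}_N(P), \qquad m_\infty(x) = \int \Big\{ \prod_{i=1}^{n} \int f(x_i \mid y_i) \, dP(y_i) \Big\} \, d\mathcal{P}(P)$$

is small.
• It can be shown that

$$\| m_N(x) - m_\infty(x) \|_1 \le 4 \left( 1 - E\left[ (1 - R_N)^n \right] \right)$$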


PD(0, θ)

$$1 - E\left[ (1 - R_N)^n \right] \approx n e^{-N/\theta}, \quad \text{as } N \to \infty$$

based on

$$1 - R_N =_d 1 - \exp(-E_1/\theta) \cdots \exp(-E_N/\theta) \approx 1 - \exp(-N/\theta)$$

by the law of large numbers, since $E_1, \dots, E_N$ are iid $\mathrm{Exp}(1)$.

PD(α, θ)

• An asymptotic evaluation of $E[(1 - R_N)^n]$ is not available.
• By direct calculation,

$$E[(R_N)^r] = \prod_{j=1}^{N} \frac{(\theta + j\alpha)_{(r)}}{(\theta + (j - 1)\alpha + 1)_{(r)}}$$

where

$$x_{(r)} = \frac{\Gamma(x + r)}{\Gamma(x)} = x (x + 1) \cdots (x + r - 1)$$

is the ascending factorial.
• . . .
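The moment formula can be evaluated numerically, and for a small sample size $n$ the bound $4(1 - E[(1 - R_N)^n])$ follows from the binomial expansion $E[(1 - R_N)^n] = \sum_{r=0}^{n} \binom{n}{r} (-1)^r E[R_N^r]$. This is a sketch under the assumption that $n$ is small: the alternating sum loses accuracy for large $n$. Function names are illustrative.

```python
import math

def residual_moment(alpha, theta, n_trunc, r):
    """E[R_N^r] = prod_j (theta + j*alpha)_(r) / (theta + (j-1)*alpha + 1)_(r)."""
    log_moment = 0.0
    for j in range(1, n_trunc + 1):
        a = theta + j * alpha
        b = theta + (j - 1) * alpha + 1.0
        # log of the ascending factorial x_(r) = Gamma(x + r) / Gamma(x)
        log_moment += (math.lgamma(a + r) - math.lgamma(a)) - (math.lgamma(b + r) - math.lgamma(b))
    return math.exp(log_moment)

def truncation_bound(alpha, theta, n_trunc, n_obs):
    """4 * (1 - E[(1 - R_N)^n]) via the binomial expansion of (1 - R_N)^n."""
    expectation = sum(
        (-1) ** r * math.comb(n_obs, r) * residual_moment(alpha, theta, n_trunc, r)
        for r in range(n_obs + 1)
    )
    return 4.0 * (1.0 - expectation)

dp_first_moment = residual_moment(0.0, 2.0, 10, 1)   # (theta / (theta + 1))^N in the DP case
py_first_moment = residual_moment(0.5, 1.0, 20, 1)
dp_bound = truncation_bound(0.0, 2.0, 40, 5)
```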


Slice sampler

Walker (2007), Kalli, Griffin and Walker (2011).

• Exact conditional sampler: the truncation level is adaptive, included in the model via auxiliary parameters.

• Start with the hierarchical representation

$$x_i \mid (\tilde p_j), (z_j) \sim \sum_{j \ge 1} \tilde p_j \, f(x \mid z_j), \qquad i = 1, 2, \dots, n$$

$$(\tilde p_j) \sim \mathrm{PD}(\alpha, \theta), \qquad (z_j) \sim_{iid} H(\cdot)$$

• Data augmentation 1: introduce uniform r.v. $u_1, \dots, u_n \in [0, 1]$,

$$x_i, u_i \mid (\tilde p_j), (z_j) \sim \sum_{j \ge 1} 1(u_i \le \tilde p_j) \, f(x \mid z_j), \qquad i = 1, 2, \dots, n$$

• Data augmentation 2: introduce allocation variables $d_1, \dots, d_n \in \mathbb{N}$ such that $P(d_i = j) = \tilde p_j$,

$$x_i, u_i, d_i \mid (\tilde p_j), (z_j) \sim 1(u_i \le \tilde p_{d_i}) \, f(x \mid z_{d_i}), \qquad i = 1, 2, \dots, n$$


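• Gibbs sampler on the augmented parameter space $\{\tilde p_j, z_j, u_i, d_i\}$.
• Convenient to work with $\{v_j, z_j, u_i, d_i\}$ according to $\tilde p_j = v_j \prod_{\ell=1}^{j-1} (1 - v_\ell)$.

• Full conditionals: let $n_j = \sum_{i=1}^{n} 1(d_i = j)$,

$$\pi(v_j \mid \text{rest}) \sim \mathrm{beta}\Big( 1 - \alpha + n_j, \ \theta + j\alpha + \sum_{\ell \ge j+1} n_\ell \Big),$$

$$\pi(u_i \mid \text{rest}) \sim \mathrm{Unif}(0, \tilde p_{d_i}),$$

$$\pi(d_i = j \mid \text{rest}) = \frac{1(\tilde p_j > u_i) \, f(x_i \mid z_j)}{\sum_{\ell \in A_{u_i}} f(x_i \mid z_\ell)}, \qquad A_u = \{j : \tilde p_j > u\}$$

• We need to account only for a finite number $N$ of components in sampling $\pi(d_i = j \mid \text{rest})$, that is, $\bigcup_i A_{u_i} \subset \{1, \dots, N\}$ with $N$ such that

$$R_N < u_{(1)} = \min\{u_1, \dots, u_n\}$$

• Note that $N = N_{u_{(1)}} = \inf\{n \in \mathbb{N} : R_n < u_{(1)}\}$.
• However, $R_N$ is not as in the prior but with respect to $v_j \mid \text{rest}$, so the configuration of $n_1, n_2, \dots$ should be taken into account.
• . . .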
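The updates of $(u_i)$ and $(d_i)$ can be sketched as follows. This is only a fragment of one Gibbs sweep (the $v_j$ and $z_j$ updates are omitted), and it rests on assumptions not made in the talk: a unit-variance normal kernel $f$, a base measure $H = N(0, 3^2)$, and toy data. Sticks beyond the occupied components are drawn from beta(1 − α, θ + jα), which is their full conditional when $n_j = 0$ and no later component is occupied.

```python
import math
import random

def normal_pdf(x, mean, sd=1.0):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def slice_allocation_step(x, d, v, z, alpha, theta, rng):
    """Update (u_i) and (d_i) given sticks v and locations z (both lists grow in place).

    Labels d_i are 1-based as in the text; v[j-1] and z[j-1] hold v_j and z_j.
    """
    def weight(j):  # p_j = v_j * prod_{l < j} (1 - v_l)
        return v[j - 1] * math.prod(1.0 - v[l] for l in range(j - 1))

    u = [rng.uniform(0.0, weight(d_i)) for d_i in d]  # u_i | rest ~ Unif(0, p_{d_i})
    u_min = min(u)
    # Extend the sticks until the residual R_N falls below u_(1).
    residual = math.prod(1.0 - v_j for v_j in v)
    while residual >= u_min:
        j = len(v) + 1
        v.append(rng.betavariate(1.0 - alpha, theta + j * alpha))
        z.append(rng.gauss(0.0, 3.0))  # hypothetical base measure H = N(0, 3^2)
        residual *= 1.0 - v[-1]
    n_comp = len(v)
    p = [weight(j) for j in range(1, n_comp + 1)]
    new_d = []
    for x_i, u_i in zip(x, u):
        active = [j for j in range(1, n_comp + 1) if p[j - 1] > u_i]  # the set A_{u_i}
        dens = [normal_pdf(x_i, z[j - 1]) for j in active]
        new_d.append(rng.choices(active, weights=dens, k=1)[0])
    return u, new_d, n_comp

rng = random.Random(4)
x = [-1.2, -0.8, 0.1, 2.3, 2.9]  # toy data
d = [1, 1, 1, 2, 2]              # current allocations
v = [0.5, 0.5]                   # current sticks v_1, v_2
z = [-1.0, 2.5]                  # current locations z_1, z_2
u, new_d, n_comp = slice_allocation_step(x, d, v, z, alpha=0.2, theta=1.0, rng=rng)
```

Each set $A_{u_i}$ is nonempty because it always contains the current allocation $d_i$.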


Efficient slice sampler

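• Let $(\xi_j) = (\xi_1, \xi_2, \dots)$ be a decreasing positive sequence and write

$$x_i, u_i, d_i \mid (\tilde p_j), (z_j) \sim 1(u_i \le \xi_{d_i}) \, \frac{\tilde p_{d_i}}{\xi_{d_i}} \, f(x \mid z_{d_i}), \qquad i = 1, 2, \dots, n$$

i.e. the previous version is recovered for the (random) sequence $(\xi_j) = (\tilde p_j)$.

• The full conditionals now are

$$\pi(u_i \mid \text{rest}) \sim \mathrm{Unif}(0, \xi_{d_i})$$

$$\pi(d_i = j \mid \text{rest}) = \frac{1(\xi_j > u_i) \, \frac{\tilde p_j}{\xi_j} \, f(x_i \mid z_j)}{\sum_{\ell=1}^{N_i} \frac{\tilde p_\ell}{\xi_\ell} \, f(x_i \mid z_\ell)}, \qquad N_i = \max\{j \in \mathbb{N} : \xi_j > u_i\}$$

• $N_i$ is deterministic given $u_i$. For $\xi_j = e^{-\xi j}$,

$$N_i = \lfloor -(1/\xi) \log u_i \rfloor, \qquad N = \max_i \{N_i\} = \lfloor -(1/\xi) \log u_{(1)} \rfloor$$

compared to $N = \inf\{n \in \mathbb{N} : R_n < u_{(1)}\}$.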
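For the geometric sequence $\xi_j = e^{-\xi j}$ the truncation levels are plain arithmetic, as the following sketch shows. The slice values in `u` and the constant `xi=0.5` are hypothetical.

```python
import math

def deterministic_truncation(u, xi):
    """N_i = floor(-(1/xi) * log(u_i)) for the sequence xi_j = exp(-xi * j)."""
    n_i = [math.floor(-math.log(u_i) / xi) for u_i in u]
    return n_i, max(n_i)

u = [0.05, 0.20, 0.011]
n_i, n_max = deterministic_truncation(u, xi=0.5)
# n_i == [5, 3, 9] and n_max == 9: the smallest slice variable fixes the truncation level
```

Each $N_i$ satisfies $e^{-\xi N_i} > u_i \ge e^{-\xi (N_i + 1)}$, so no component with $\xi_j > u_i$ is left out.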


• The choice of the sequence $(\xi_j)$, or of the constant $\xi$ in

$$\xi_j = e^{-\xi j},$$

is however a delicate issue.
• Balance between mixing and computational time of the sampler.
• The mixing of the sampler depends on how the ratio

$$E(\tilde p_j) / \xi_j = E(\tilde p_j) / e^{-\xi j}$$

increases with $j$: faster rates (larger $\xi$) are associated with better mixing but longer running time.

• The study of the asymptotic behavior of $R_N$ should give some guidelines for the choice of the sequence $(\xi_j)$.

• . . .


Posterior asymptotics

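• PD(α, θ) mixture of normal densities

$$f(x) = f_{P,\sigma}(x) = \int \frac{1}{\sigma} \, \phi\Big( \frac{x - y}{\sigma} \Big) \, dP(y), \qquad P(\cdot) = \sum_{j \ge 1} \tilde p_j \, \delta_{z_j}(\cdot), \qquad \sigma \sim \pi$$

• Assume $x^n = (x_1, \dots, x_n)$ are iid from some $f_0(x)$.
• Let $\Pi$ be the distribution on the space of densities $\mathcal{F}$ induced by $P$ and $\pi$.
• Interest is in establishing that the posterior $\Pi(\cdot \mid x^n)$ accumulates in $L_1$ neighborhoods of $f_0$:

$$\Pi\left( f : \| f - f_0 \|_1 \le \varepsilon_n \mid x^n \right) \to 1,$$

where $\varepsilon_n \to 0$, $n \varepsilon_n^2 \to \infty$, is the posterior convergence rate.

• Standard sufficient conditions for posterior convergence involve the prior $\Pi$ and the regularity of $f_0$.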


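Ghosal, Ghosh and van der Vaart (2000).
• Prior contraction rates

$$\Pi\Big( f : \int f_0 \log(f_0/f) \le \tilde\varepsilon_n^2, \ \int f_0 \big( \log(f_0/f) \big)^2 \le \tilde\varepsilon_n^2 \Big) \ge e^{-n \tilde\varepsilon_n^2}$$

where $\tilde\varepsilon_n \le \varepsilon_n$.
• PD(0, θ): when $f_0$ is $\beta$-smooth, $\tilde\varepsilon_n = n^{-\beta/(1+2\beta)} (\log n)^t$.
• Low entropy - high mass sieve: let $(\mathcal{F}_n)$ be a sequence of sets $\mathcal{F}_n \subset \mathcal{F}$ with $\mathcal{F}_n \uparrow \mathcal{F}$ such that

$$\Pi(\mathcal{F}_n^c) \le e^{-4 n \tilde\varepsilon_n^2}$$

$$\log N(\varepsilon_n, \mathcal{F}_n, \| \cdot \|_1) \le n \varepsilon_n^2$$

where $N(\varepsilon_n, \mathcal{F}_n, \| \cdot \|_1)$ is the covering number of $\mathcal{F}_n$, the smallest number of $\varepsilon_n$-balls needed to cover $\mathcal{F}_n$ in the $L_1$-metric, and its logarithm is the entropy.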


Low entropy - high mass sieve

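Shen, Tokdar and Ghosal (2013).
• For $P = \sum_{j \ge 1} \tilde p_j \delta_{z_j}$, $(z_j) \sim_{iid} H$,

$$\mathcal{F}_n = \Big\{ f_{P,\sigma}(x) = \int \frac{1}{\sigma} \, \phi\Big( \frac{x - y}{\sigma} \Big) \, dP(y) : \ R_{m_n} = \sum_{j > m_n} \tilde p_j \le \varepsilon_n, \ \sigma_n \le \sigma \le \sigma_n (1 + \varepsilon_n)^{M_n} \text{ and } z_j \in [-a_n, a_n], \ j \le m_n \Big\}$$

where $M_n = \sigma_n^{-1} = a_n = n$.
• PD(0, θ): for $\varepsilon_n = n^{-\gamma} (\log n)^t$, $\gamma \in (0, 1/2)$, set $m_n = \lfloor n \varepsilon_n^2 / \log n \rfloor$. Then

$$\log N(\varepsilon_n, \mathcal{F}_n, \| \cdot \|_1) = m_n \log(1/\varepsilon_n) + \text{other things} \lesssim n \varepsilon_n^2$$

$$\Pi(\mathcal{F}_n^c) = P(R_{m_n} > \varepsilon_n) + \text{other things} \lesssim e^{-(1 - 2\gamma) n \varepsilon_n^2}$$

as it can be shown that $P(R_{m_n} > \varepsilon_n) \lesssim e^{-(1 - 2\gamma) n \varepsilon_n^2}$.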


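Limiting distribution PD(0, θ)

$$\frac{-\log R_n - n/\theta}{\sqrt{n}/\theta} \to_d Z, \qquad Z \sim N(0, 1)$$

• Follows from the CLT since $-\log R_n \sim \mathrm{Gamma}(n, \theta)$.

• For $a > 0$ and $\Phi(z) = \int_{-\infty}^{z} (2\pi)^{-1/2} e^{-x^2/2} \, dx$,

$$P\big( R_n > \underbrace{e^{-n/\theta} e^{a \sqrt{n}}}_{\varepsilon_n} \big) \to \Phi(-\theta a)$$

that is, $\varepsilon_n = e^{-n/\theta} s_n \to 0$ for $s_n = e^{a \sqrt{n}} \to \infty$.
• As for $N_\varepsilon$, as $\varepsilon \to 0$,

$$\frac{N_\varepsilon - 1 - \theta \log(1/\varepsilon)}{\sqrt{\theta \log(1/\varepsilon)}} \to_d Z, \qquad Z \sim N(0, 1)$$

Is it possible to obtain this CLT from the CLT of $R_n$ and the relation $P(-\log R_n \le \log(1/\varepsilon)) = 1 - P(N_\varepsilon \le n)$?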
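The limit $P(R_n > e^{-n/\theta} e^{a\sqrt{n}}) \to \Phi(-\theta a)$ can be checked by Monte Carlo, drawing $-\log R_n$ directly from its gamma law. The values of `theta`, `n`, `a`, the seed and the number of replications below are illustrative assumptions.

```python
import math
import random

def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tail_prob_mc(theta, n, a, n_rep, rng):
    """Monte Carlo estimate of P(R_n > exp(-n/theta + a*sqrt(n))) in the DP case."""
    threshold = n / theta - a * math.sqrt(n)  # the event is {-log R_n < threshold}
    # -log R_n ~ Gamma(shape n, rate theta); gammavariate takes the scale 1/theta
    hits = sum(rng.gammavariate(n, 1.0 / theta) < threshold for _ in range(n_rep))
    return hits / n_rep

theta, n, a = 2.0, 400, 0.3
estimate = tail_prob_mc(theta, n, a, 20000, random.Random(5))
limit = std_normal_cdf(-theta * a)
```

At finite $n$ the two numbers differ slightly because of the skewness of the gamma distribution, on top of Monte Carlo error.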


Limiting distribution PD(α, θ)

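THEOREM 1. Let $T_\alpha$ be a stable random variable with exponent $\alpha$, $E e^{-u T_\alpha} = e^{-u^\alpha}$. Also, let $T_{\alpha,\theta}$ be a polynomially tilted version of $T_\alpha$, i.e., for $f_\alpha$ the density of $T_\alpha$, the density of $T_{\alpha,\theta}$ is proportional to $t^{-\theta} f_\alpha(t)$. Then

$$n^{\frac{1-\alpha}{\alpha}} R_n \to_{a.s.} \alpha \, (T_{\alpha,\theta})^{-1}$$

• See Lemma 3.11, Pitman (2006); the proof is based on Kingman's paintbox representation (Kingman, 1978) and results in Gnedin, Hansen and Pitman (2007) for deterministic $(p_j) \in \Delta_\infty$.

• Direct proof via the moment generating function of $-\log R_n$.
• For $c > 1$,

$$P\big( R_n > \underbrace{n^{-\frac{1-\alpha}{\alpha}} c}_{\varepsilon_n} \big) \to P(T_{\alpha,\theta} < \alpha / c)$$

compared to $P(R_n > \varepsilon_n) \to \Phi(-\theta a)$ for $\varepsilon_n = e^{-n/\theta} e^{a \sqrt{n}}$ in the PD(0, θ) case.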


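• For the case PD(α, 0), the weights $(\tilde p_j)$ can be represented as

$$\tilde p_j = \frac{\tilde J_j}{T_\alpha}, \qquad (\tilde J_j) =_d (\Gamma_j^{-1/\alpha}) \text{ in size-biased random order}$$

hence

$$T_\alpha R_n = \sum_{j > n} \tilde J_j \quad \Longrightarrow \quad n^{\frac{1-\alpha}{\alpha}} \sum_{j > n} \tilde J_j \to_{a.s.} \alpha$$

that is, the small jumps of the stable process in size-biased random order, once properly rescaled by $n^{\frac{1-\alpha}{\alpha}}$, converge to a proportion equal to $\alpha$.

• Interestingly, $\alpha$ is also the proportion of singletons, or dust, with respect to the number of unique values, say $K_n$, in a multinomial sample of size $n$ from the PD(α, θ) distribution:

$$n^{-\alpha} M_{n,1} \to_{a.s.} \alpha \, (T_{\alpha,\theta})^{-\alpha}, \qquad n^{-\alpha} K_n \to_{a.s.} (T_{\alpha,\theta})^{-\alpha}, \qquad \frac{M_{n,1}}{K_n} \to_{a.s.} \alpha$$

where $M_{n,k} = \#\{j : n_j = k, \ j = 1, \dots, K_n\}$, so that $\sum_k k M_{n,k} = n$.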


Large deviations PD(0, θ)

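• For $a < 1/\theta$,

$$\frac{1}{n} \log P\Big( \frac{-\log R_n}{n} \le a \Big) \to -[\theta a - 1 - \log(\theta a)]$$

i.e. $-\log R_n$ satisfies a Large Deviation Principle (LDP) with speed $n$ and rate function $I(x) = \theta x - 1 - \log(\theta x) > 0$ for $\theta x < 1$.

• Hence

$$P\big( R_n > \underbrace{e^{-n/\theta} e^{(1/\theta - a) n}}_{\varepsilon_n} \big) \asymp e^{-I(a) n}$$

that is, $\varepsilon_n = e^{-n/\theta} s_n \to 0$ for $s_n = e^{cn} \to \infty$, with $c = 1/\theta - a > 0$.
• Beyond large deviations, $\varepsilon_n = n^{-\gamma}$,

$$P(R_n > \varepsilon_n) = P(\mathrm{Gamma}(n, \theta) < \gamma \log n) \le \frac{(\theta \gamma \log n)^n}{\Gamma(n + 1)} \asymp \frac{1}{\sqrt{n}} \exp\{-n \log n\}$$

by Stirling's formula.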
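The LDP limit can be checked without simulation, because for integer $n$ the gamma probability is a Poisson tail: $P(\mathrm{Gamma}(n, \theta) \le t) = P(\mathrm{Pois}(\theta t) \ge n)$. The sketch below sums that tail on the log scale; the function names and the values $\theta = 2$, $a = 0.25$ are illustrative assumptions.

```python
import math

def log_prob_small_gamma(n, theta, a):
    """log P(Gamma(n, rate theta) <= n*a), computed as log P(Pois(theta*n*a) >= n)."""
    lam = theta * n * a
    log_terms = []
    k = n
    while True:
        log_term = -lam + k * math.log(lam) - math.lgamma(k + 1)  # log Poisson pmf at k
        if log_terms and log_term < log_terms[0] - 50.0:
            break  # the remaining Poisson tail terms are negligible
        log_terms.append(log_term)
        k += 1
    top = max(log_terms)
    return top + math.log(sum(math.exp(t - top) for t in log_terms))

def rate_function(theta, x):
    """I(x) = theta*x - 1 - log(theta*x)."""
    return theta * x - 1.0 - math.log(theta * x)

theta, a = 2.0, 0.25
table = [(n, log_prob_small_gamma(n, theta, a) / n, -rate_function(theta, a))
         for n in (50, 500, 5000)]
```

The second column of `table` approaches the third as $n$ grows, with a gap of order $(\log n)/n$.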


Large deviations PD(α, θ)

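THEOREM 2. Let $I(x) = \big( x - \frac{1-\alpha}{\alpha} \big)(\theta + \alpha)$ and $J(x) = 1 - x \frac{\alpha}{1-\alpha}$.

$$\frac{1}{\log n} \log P\Big( \frac{-\log R_n}{\log n} \ge a \Big) \to -I(a), \qquad a > \frac{1-\alpha}{\alpha} \qquad \text{(i)}$$

$$\frac{1}{\log n} \log\Big[ -\frac{1}{1-\alpha} \log P\Big( \frac{-\log R_n}{\log n} \le a \Big) \Big] \to J(a), \qquad a < \frac{1-\alpha}{\alpha} \qquad \text{(ii)}$$

• Part (i) is an LDP for $-\log R_n$ with speed $\log n$ and rate function $I(x)$. See Dembo and Zeitouni (2010, Ch. 2).

• It corresponds to

$$P\Big( R_n \le n^{-\frac{1-\alpha}{\alpha}} n^{-c} \Big) \asymp n^{-(\theta + \alpha) c}, \qquad c > 0$$

that is, the probability that $R_n$ is smaller than a negative power of $n$ below its long-run behavior $n^{-\frac{1-\alpha}{\alpha}}$ vanishes polynomially fast.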


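• Part (ii) is a non-standard LDP which corresponds to

$$P\big( R_n > \underbrace{n^{-\frac{1-\alpha}{\alpha}} n^{c}}_{\varepsilon_n} \big) \asymp e^{-(1-\alpha) n^{\frac{\alpha}{1-\alpha} c}}, \qquad 0 < c < \frac{1-\alpha}{\alpha}$$

• It is of direct use for posterior asymptotics: for $\varepsilon_n = n^{-\gamma} (\log n)^t$, $\gamma \in (0, 1/2)$, the sequence $m_n$ which satisfies

$$P(R_{m_n} > \varepsilon_n) \lesssim e^{-c n \varepsilon_n^2}$$

is given by

$$m_n = \lfloor n \varepsilon_n^{\tau} (\log n)^{ct} \rfloor, \qquad \tau = 2 - \frac{\alpha}{1-\alpha}, \qquad c = \frac{\alpha}{1-\alpha} \, \frac{1}{1-2\gamma}.$$

• Compared with the PD(0, θ) case, where $m_n = \lfloor n \varepsilon_n^2 / \log n \rfloor$, $m_n = \lfloor n \varepsilon_n^{\tau} (\log n)^{ct} \rfloor$ grows faster since $\tau < 2$, and, in particular,

$$\log N(\varepsilon_n, \mathcal{F}_n, \| \cdot \|_1) \lesssim m_n \log(1/\varepsilon_n) = n \varepsilon_n^2 \, n^{\frac{\alpha}{1-\alpha} \gamma} (\log n)^{\frac{\alpha}{1-\alpha} \frac{2\gamma}{1-2\gamma} t + 1}$$

$\Longrightarrow$ the low entropy - high mass sieve condition is not satisfied.
• . . .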
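To see how much faster the PD(α, θ) sieve index grows, the two choices of $m_n$ can be evaluated numerically. This is a sketch only: the function name and the values of `n`, `gamma`, `t` and `alpha` are hypothetical, and constants hidden in $\lesssim$ are ignored.

```python
import math

def sieve_sizes(n, gamma, t, alpha):
    """Return (eps_n, m_n for PD(0, theta), m_n for PD(alpha, theta))."""
    log_n = math.log(n)
    eps = n ** (-gamma) * log_n ** t                     # eps_n = n^(-gamma) (log n)^t
    m_dp = math.floor(n * eps ** 2 / log_n)              # DP choice of m_n
    tau = 2.0 - alpha / (1.0 - alpha)
    c = alpha / (1.0 - alpha) / (1.0 - 2.0 * gamma)
    m_pd = math.floor(n * eps ** tau * log_n ** (c * t))  # PD(alpha, theta) choice of m_n
    return eps, m_dp, m_pd

eps, m_dp, m_pd = sieve_sizes(n=10 ** 6, gamma=0.4, t=1.0, alpha=0.5)
entropy_ratio_dp = m_dp * math.log(1.0 / eps) / (10 ** 6 * eps ** 2)
entropy_ratio_pd = m_pd * math.log(1.0 / eps) / (10 ** 6 * eps ** 2)
```

For the DP the ratio $m_n \log(1/\varepsilon_n) / (n \varepsilon_n^2)$ stays bounded, while for PD(α, θ) it is already very large at these values.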


References

• Dembo and Zeitouni (2010). Large Deviations Techniques and Applications (2nd ed). Springer.

• Ferguson (1974). Prior distributions on spaces of probability measures. Ann. Statist. 2, 615–629.

• Gelfand and Kottas (2002). A computational approach for full nonparametric Bayesian inference under Dirichlet process mixture models. J. Comput. Graph. Statist. 11, 289–305.

• Ghosal, Ghosh and van der Vaart (2000). Convergence rates for posterior distributions. Ann. Statist. 28, 500–531.

• Gnedin, Hansen and Pitman (2007). Notes on the occupancy problem with infinitely many boxes: general asymptotics and power laws. Probab. Surv. 4, 146–171.

• Ishwaran and James (2001). Gibbs sampling methods for stick-breaking priors. J. Amer. Statist. Assoc. 96, 161–173.


• Kingman (1978). The representation of partition structures. J. Lond. Math. Soc. (2) 18, 374–380.

• Kalli, Griffin and Walker (2011). Slice sampling mixture models. Statist. Comput. 21, 93–105.

• Muliere and Tardella (1998). Approximating distributions of random functionals of Ferguson-Dirichlet priors. Canad. J. Statist. 26, 283–297.

• Pitman (2006). Combinatorial Stochastic Processes. Springer.

• Pitman and Yor (1997). The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator. Ann. Probab. 25, 855–900.

• Shen, Tokdar and Ghosal (2013). Adaptive Bayesian multivariate density estimation with Dirichlet mixtures. Biometrika 100, 623–640.

• Walker (2007). Sampling the Dirichlet mixture model with slices. Comm. Statist. Simulation Comput. 36, 45–54.
