Random permutations and the two-parameter Poisson ...aldous/Pitman_Conference/Slides/... · 2 we...
Random permutations and the two-parameter Poisson-Dirichlet distribution.
Sasha Gnedin
Queen Mary, University of London
Sasha Gnedin Random permutations and the 2-parameter PD
The Pitman-Yor definition
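• PD(α, θ) is a probability law for a sequence of random frequencies
$$P^{\downarrow} = (P^{\downarrow}_1, P^{\downarrow}_2, \dots), \quad P^{\downarrow}_1 > P^{\downarrow}_2 > \dots > 0, \quad \sum_j P^{\downarrow}_j = 1,$$
obtained by arranging in decreasing order another sequence $P = (P_1, P_2, \dots)$ with
$$P_j = W_j \prod_{i=1}^{j-1} (1 - W_i), \quad j = 1, 2, \dots,$$
where the $W_j$'s are independent, with $W_j \overset{\mathcal{L}}{=} \mathrm{Beta}(1-\alpha,\ \theta + \alpha j)$.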
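The stick-breaking recipe above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `gem_stick_breaking`, the truncation at `n_sticks` sticks, and the use of the standard-library `random.betavariate` are choices made here, not part of the talk. It also assumes θ + α > 0 so that every beta parameter is positive.

```python
import random

def gem_stick_breaking(alpha, theta, n_sticks, rng=None):
    """Return the first n_sticks frequencies P_1, ..., P_n and the unbroken remainder.

    Hypothetical helper: P_j = W_j * prod_{i<j} (1 - W_i) with independent
    W_j ~ Beta(1 - alpha, theta + alpha * j). Assumes theta + alpha > 0.
    """
    rng = rng or random.Random()
    remaining = 1.0          # length of the stick not yet broken off
    freqs = []
    for j in range(1, n_sticks + 1):
        w = rng.betavariate(1.0 - alpha, theta + alpha * j)
        freqs.append(w * remaining)
        remaining *= 1.0 - w
    return freqs, remaining

freqs, rest = gem_stick_breaking(0.5, 1.0, 50, random.Random(0))
ranked = sorted(freqs, reverse=True)   # truncated approximation to P-down
```

Sorting the truncated sequence only approximates the ranked frequencies, since the discarded remainder may still contain mass.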
Two algorithms for size-biased ordering
• Conventional sampling-without-replacement algorithm: for $p_1, p_2, \dots$ with $s = \sum_j p_j < \infty$, the first size-biased pick is $p_J$, where $\mathbb{P}(J = j) = p_j/s$. Removing $J$ from $\mathbb{N}$, resp. $p_J$ from $p_1, p_2, \dots$, and iterating the SB-picking yields a size-biased permutation (SBP) of $\mathbb{N}$, resp. of $p_1, p_2, \dots$
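For a finite list of weights, the sampling-without-replacement description can be sketched as follows. The function name `size_biased_permutation` and the 0-based indexing are assumptions made for this illustration.

```python
import random

def size_biased_permutation(weights, rng=None):
    """Return indices 0..k-1 in size-biased order (hypothetical helper).

    Each step picks one remaining index with probability proportional to its
    weight, removes it, and repeats on what is left.
    """
    rng = rng or random.Random()
    remaining = list(range(len(weights)))
    order = []
    while remaining:
        pick = rng.choices(remaining,
                           weights=[weights[i] for i in remaining], k=1)[0]
        order.append(pick)
        remaining.remove(pick)
    return order

print(size_biased_permutation([0.5, 0.3, 0.2], random.Random(1)))
```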
• Ranking algorithm to arrange $p_1, p_2, \dots$ in SB order: the kth iteration only deals with $p_1, \dots, p_k$. After $1, \dots, k$ have been arranged as $i_1, \dots, i_k$ with $(q_1, \dots, q_k) := (p_{i_1}, \dots, p_{i_k})$, the relative rank $\rho_{k+1}$ of $k+1$ is determined by moving $k+1$ left-to-right through $i_1, \dots, i_k$ until settling in position $\rho_{k+1} = m \in \{1, \dots, k+1\}$ with odds
$$p_{k+1} : (q_m + \dots + q_k).$$
The infinite SB order is defined by $\rho_1, \rho_2, \dots$
• k steps yield $1, \dots, k$ (resp. $p_1, \dots, p_k$) in size-biased order, showing that the finite orders are consistent under restrictions (cf. also P-Tran ’12).
• Works also if $\sum_j p_j = \infty$, although in this case the SB order is not a well-order.
• When $p_1 = p_2 = \dots$ the ranks $\rho_k$ are independent, uniform on $[k] := \{1, \dots, k\}$, and the resulting order is the exchangeable infinite order (Aldous ’83), which restricts to $[k]$ as a uniformly distributed permutation.
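A minimal sketch of the ranking algorithm for finitely many weights, reading "odds a : b" as settling with probability a / (a + b) at each candidate position. The function name `sb_order_by_ranking` and the 0-based labels are assumptions made here.

```python
import random

def sb_order_by_ranking(weights, rng=None):
    """Insert items 0, 1, 2, ... one at a time (hypothetical helper).

    The new item walks left to right through the current arrangement and
    settles before position m with odds p_new : (q_m + ... + q_k).
    """
    rng = rng or random.Random()
    order = []
    for new, p_new in enumerate(weights):
        tail = sum(weights[i] for i in order)   # q_m + ... + q_k for m = 1
        pos = 0
        while pos < len(order):
            if rng.random() < p_new / (p_new + tail):
                break                            # settle in front of order[pos]
            tail -= weights[order[pos]]          # step past order[pos]
            pos += 1
        order.insert(pos, new)                   # pos == len(order): settle last
    return order

print(sb_order_by_ranking([0.5, 0.3, 0.2], random.Random(2)))
```

After processing the first k weights, the arrangement restricted to those k items is already in size-biased order, which is the consistency property noted above.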
Characterisation of PD by SBP
• If $P_1$ is independent of $(P_2, P_3, \dots)/(1 - P_1)$ then the stick-breaking factors $W_j$ are independent and (excluding some trivial cases)
$$P^{\downarrow} \overset{\mathcal{L}}{=} \mathrm{PD}(\alpha, \theta) \quad \text{for some } \alpha, \theta.$$
– McCloskey ’65, P ’96, G-Haulk-P ’09
The arrangement problem
Ordered representations of PD involve
• either an increasing jump process (random c.d.f.) $(F_t, t \ge 0)$,
• or an interval partition of [0, 1] into components of $[0, 1] \setminus Z$, for $Z$ a random measure-0 closed set.
Every such representation implies a certain arrangement $P^*$ of the frequencies $P^{\downarrow}_j$ in accord with the natural ordering of jump-times, resp. component intervals.
• The arrangement problem concerns features of this induced order $P^*$, characterization of PD and sub-families, as well as the connection of $P^*$ to the well-orders $P^{\downarrow}$ and $P$.
A combinatorial counterpart of the arrangement problem
• Recall that $P_j$ is the asymptotic frequency of the jth occupied table in the Dubins-Pitman Chinese Restaurant.
When the occupancy numbers are $n_1, \dots, n_k$ ($n_1 + \dots + n_k = n$), the next customer
• sits at occupied table j with probability $\dfrac{n_j - \alpha}{n + \theta}$,
• occupies a new table with probability $\dfrac{\theta + k\alpha}{n + \theta}$.
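The seating rule can be simulated directly. The sketch below is illustrative only: the function name `chinese_restaurant` is an assumption, and the first customer is seated at a new table unconditionally so that the case θ = 0 does not produce an all-zero weight vector.

```python
import random

def chinese_restaurant(alpha, theta, n_customers, rng=None):
    """Return table occupancy counts in order of table creation (hypothetical helper)."""
    rng = rng or random.Random()
    counts = []
    for _ in range(n_customers):
        if not counts:
            counts.append(1)                     # first customer opens table 1
            continue
        k = len(counts)
        # Unnormalised weights: n_j - alpha per occupied table, theta + k*alpha for a new one.
        weights = [c - alpha for c in counts] + [theta + k * alpha]
        j = rng.choices(range(k + 1), weights=weights, k=1)[0]
        if j == k:
            counts.append(1)
        else:
            counts[j] += 1
    return counts

counts = chinese_restaurant(0.5, 1.0, 200, random.Random(0))
print(counts, [c / sum(counts) for c in counts][:3])
```

Dividing the counts by the number of customers gives empirical table frequencies, which approximate $P_1, P_2, \dots$ as the number of customers grows.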
• Hence an n-sample from $P^*$ has the structure of a composition (ordered partition) $\Pi^*_n$ of the integer n, with the CRP ‘table’ occupancy counts arranged in the corresponding order. The $\Pi^*_n$'s are consistent as n varies.
Construction from $Z$ and $U_1, \dots, U_n$:
• sample uniform[0,1] points $U_1, \dots, U_n$
• scan the gaps in $Z$ in the left-to-right order
• record the sizes of clusters in each occupied gap
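The three steps can be sketched for a toy closed set. The sketch assumes, purely for illustration, that Z is a finite set of interior breakpoints, whereas the sets Z in the talk are random and typically have infinitely many gaps. The function name `composition_from_gaps` is hypothetical.

```python
import bisect
import random

def composition_from_gaps(breakpoints, n, rng=None):
    """Return the composition of n induced by n uniform points (hypothetical helper).

    breakpoints: interior points of a finite set Z in (0, 1); its gaps are the
    intervals between consecutive points of {0} | Z | {1}.
    """
    rng = rng or random.Random()
    cuts = sorted(breakpoints)
    counts = [0] * (len(cuts) + 1)               # one counter per gap, left to right
    for _ in range(n):
        u = rng.random()
        counts[bisect.bisect_right(cuts, u)] += 1
    return [c for c in counts if c > 0]          # keep occupied gaps only

print(composition_from_gaps([0.2, 0.5], 10, random.Random(0)))
```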
Subordinator ‘bridge’ representations of PD
• For $(S_t, t \ge 0)$ a subordinator with $S_0 = 0$, tilted by manipulating the distribution of $(T, S_T)$,
$$F_t = \frac{S_t}{S_T}, \quad 0 \le t \le T.$$
• Depending on the choice of subordinator (gamma, stable, generalized gamma), some restricted range of $(\alpha, \theta) \in [0, 1) \times [0, \infty)$ may be covered – McCloskey ’65, Kingman ’75, Perman-PY ’92, PY ’97, P ’03
• The induced order $P^*$ is exchangeable, i.e. the $P^{\downarrow}_j$ (equivalently, the $P_j$'s) are shuffled ‘uniformly at random’.
The self-similar representation of PD(α, 0)
• Let $Z$ be the α-stable set (e.g., the zero set of BM for α = 1/2), so $Z \cap [0, 1]$ is the range of an α-stable subordinator $(S_t, t \ge 0)$ ‘cut’ by passing level 1.
• The SB pick $P_1$ is the size of the rightmost ‘meander’ interval, while all other frequencies occur in the exchangeable order (PY ’96).
• Exactly the same (and not just in the $n \to \infty$ regime) arrangement of parts occurs on the combinatorial level of the composition $\Pi^*$ (P ’97).
Regenerative compositions
• For a subordinator $(S_t, t \ge 0)$ (with $S_0 = 0$) the ‘discrete’ c.d.f.
$$F_t := 1 - e^{-S_t}$$
is known as a neutral-to-the-right prior.
• Compositions $\Pi^*_n$ have a (characteristic) first-part deletion property: given the first part is m, deleting the part from $\Pi^*_n$ yields a copy of $\Pi^*_{n-m}$.
• The distribution of $\Pi^*_n$ has a product form
$$p^*(n_1, \dots, n_k) = \prod_{j=1}^{k} q(n_j + \dots + n_k,\ n_j)$$
where (assuming zero drift)
$$q(n, m) := \frac{\Phi(n, m)}{\Phi(n)}, \qquad \Phi(\lambda) := \int_0^\infty (1 - e^{-\lambda x})\, \nu(dx),$$
$$\Phi(n, m) := \binom{n}{m} \int_0^\infty (1 - e^{-x})^m e^{-(n-m)x}\, \nu(dx)$$
and $\nu$ is the Lévy measure of the subordinator.
• The NTR/regenerative representation is an intrinsic property of the unordered objects $P^{\downarrow}$ / partition structure $(\Pi_n)$, by virtue of
$$p^*(n) = \mathbb{E}\big[P_1^{\,n-1}\big] = \mathbb{E}\Big[\sum_j (P^{\downarrow}_j)^n\Big].$$
• Moreover, if (unordered) partitions $\Pi_n$ (derived by sampling from some frequencies $P^{\downarrow}$) have some kind of part-deletion property, then they can be represented by regenerative compositions $\Pi^*_n$.
The regenerative representation of PD
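The Redwoods Park Theorem: for 0 ≤ α < 1, θ ≥ 0, the PD(α, θ) is regenerative:
• there is a Lévy measure with the upper tail
$$\nu_{\alpha,\theta}[x, \infty) = (1 - e^{-x})^{-\alpha} e^{-x\theta}, \quad x \ge 0,$$
• which corresponds to
$$\Phi(\lambda) = \frac{\lambda\, \Gamma(1 - \alpha)\, \Gamma(\lambda + \theta)}{\Gamma(\lambda + 1 - \alpha + \theta)},$$
$$q(n, m) = \binom{n}{m} \frac{(1 - \alpha)_{m-1}}{(\theta + n - m)_{m}} \cdot \frac{(n - m)\alpha + m\theta}{n},$$
where $(a)_m = a(a+1)\cdots(a+m-1)$ is the rising factorial, and agrees with the Ewens-Pitman sampling formula, e.g.
$$p^*(n) = \frac{(1 - \alpha)_{n-1}}{(1 + \theta)_{n-1}}.$$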
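As a numerical sanity check, the decrement probabilities q(n, m) and the product formula for a composition can be evaluated directly. The sketch takes the denominator of q(n, m) to be the rising factorial $(\theta + n - m)_m$, the reading under which the q(n, ·) sum to one and q(n, n) reproduces $p^*(n)$. The function names `rising`, `q` and `composition_probability` are assumptions made here.

```python
from math import comb

def rising(a, m):
    """Rising factorial (a)_m = a (a+1) ... (a+m-1), with (a)_0 = 1."""
    out = 1.0
    for i in range(m):
        out *= a + i
    return out

def q(n, m, alpha, theta):
    """Probability that the first part of the (alpha, theta) composition of n equals m."""
    if m == n:
        # Closed form p*(n); also avoids 0/0 in the general formula when theta = 0.
        return rising(1 - alpha, n - 1) / rising(1 + theta, n - 1)
    return (comb(n, m) * rising(1 - alpha, m - 1) / rising(theta + n - m, m)
            * ((n - m) * alpha + m * theta) / n)

def composition_probability(parts, alpha, theta):
    """Product formula p*(n_1, ..., n_k) = prod_j q(n_j + ... + n_k, n_j)."""
    prob, remaining = 1.0, sum(parts)
    for m in parts:
        prob *= q(remaining, m, alpha, theta)
        remaining -= m
    return prob

print(sum(q(6, m, 0.5, 1.0) for m in range(1, 7)))   # should be close to 1
```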
Subfamily PD(0, θ)
• For
$$\nu_{0,\theta}(dx) = \theta e^{-x\theta}\, dx, \quad x \ge 0,$$
$(S_t, t \ge 0)$ is compound Poisson with exponential jumps
• The range of $F_t = 1 - e^{-S_t}$ is a Poisson point process with rate function $\theta(1 - y)^{-1}$, $y \in [0, 1)$, hence the points are obtainable by i.i.d. beta(1, θ) stick-breaking
• $P^*$ is a size-biased permutation of $P^{\downarrow}$, likewise $\Pi^*_n$ has parts in the SB order
• If a partition structure has the SB-part deletion property, it is Ewens’ – Kingman ’78
Subfamily PD(α, α)
• The (α, α) Lévy measure
$$\nu_{\alpha,\alpha}(dx) = \frac{\alpha e^{x}}{(e^{x} - 1)^{\alpha + 1}}\, dx, \quad x \ge 0,$$
was introduced by Lamperti in ’73. He observed that $e^{S_t}$ is a time-changed α-stable subordinator starting at 1, with the range $1 + Z_\alpha$ (for $Z_\alpha$ the α-stable regenerative set). Hence the range of $F_t = 1 - e^{-S_t}$ is $Z_\alpha/(Z_\alpha + 1)$, which is a known representation of the α-stable bridge
• $P^*$ is an exchangeable ordering of $P^{\downarrow}$
• If the first part and the last part of regenerative $\Pi^*_n$ have the same distribution (for every n), then the partition structure is Pitman’s (α, α).
Subfamily PD(α, 0)
• The (α, 0) Lévy measure is
$$\nu_{\alpha,0}(dx) = \frac{\alpha e^{\alpha x}}{(e^{x} - 1)^{\alpha + 1}}\, dx + \delta_\infty(dx), \quad x \ge 0$$
• The killed subordinator $(S_t)$ belongs to the family of Lamperti-stable subordinators (Chaumont and Caballero ’06)
• $P^*$ has the last part $P_1$, and the other parts are in exchangeable order. Same for $\Pi^*_n$. This kind of ordering is characteristic for (α, 0) among the regenerative structures.
Alternative proof of the regenerative representation for α, θ > 0
• To construct a path of a subordinator with Lévy measure $\nu[x, \infty) = e^{-x\theta}(1 - e^{-x})^{-\alpha}$:
(a) split [0, ∞) at points $E_1 + \dots + E_j$ of a homogeneous Poisson(θ) process,
(b) run a Lamperti-stable (α, 0) (killed) subordinator from the origin (0, 0) until crossing $E_1$ at some time $\tau_1$,
(c) start another copy of this (α, 0)-subordinator from $(E_1, \tau_1)$ and run it until passing $E_1 + E_2$, etc.
• The construction agrees with PY ’97, Corollary to Proposition 21: for 0 ≤ α < 1, θ ≥ 0, $P^{\downarrow}$ can be obtained by first splitting the unity according to PD(0, θ), then further fragmenting each piece by an independent copy of the α-stable set.
Size-cosize biased permutations
• Fix ξ ∈ [0, ∞]. For positive $p_1, \dots, p_k$ with $s = p_1 + \dots + p_k$, a ξ-biased pick is a random element $p_J$, where
$$\mathbb{P}(J = j) = \frac{\xi p_j + (s - p_j)}{s(\xi + k - 1)}.$$
• ∞-biased pick is size-biased,
• 1-biased pick is uniformly random,
• 0-biased pick is cosize biased.
Removing $J$, $p_J$ and iterating yields a ξ-biased permutation of $p_1, \dots, p_k$, denoted
$$\mathrm{perm}_\xi(p_1, \dots, p_k).$$
This is only defined for k < ∞!
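The ξ-biased pick and the resulting permutation can be sketched for a finite list of weights. The helper names `xi_pick_probabilities` and `xi_biased_permutation` are assumptions made here. The case ξ = ∞ is handled as the size-biased limit, and a single remaining item is picked with probability one (which also avoids a zero denominator when ξ = 0).

```python
import math
import random

def xi_pick_probabilities(ps, xi):
    """P(J = j) = (xi*p_j + (s - p_j)) / (s*(xi + k - 1)) for the current weights ps."""
    k, s = len(ps), sum(ps)
    if k == 1:
        return [1.0]
    if math.isinf(xi):
        return [p / s for p in ps]               # size-biased limit
    return [(xi * p + (s - p)) / (s * (xi + k - 1)) for p in ps]

def xi_biased_permutation(weights, xi, rng=None):
    """Return indices in xi-biased order by repeated picking and removal."""
    rng = rng or random.Random()
    remaining = list(range(len(weights)))
    order = []
    while remaining:
        probs = xi_pick_probabilities([weights[i] for i in remaining], xi)
        pick = rng.choices(remaining, weights=probs, k=1)[0]
        order.append(pick)
        remaining.remove(pick)
    return order

print(xi_pick_probabilities([0.5, 0.3, 0.2], 0.0))   # cosize-biased pick
print(xi_biased_permutation([0.5, 0.3, 0.2], 2.0, random.Random(3)))
```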
The arrangement problem for fixed n
Let $\Pi^{\downarrow}_n$ be the (α, θ)-partition of n, and let ξ = θ/α.
• the partition structure $(\Pi^{\downarrow}_n)$ has the ξ-biased pick deletion property, which is characteristic;
• $\Pi^*_n$ is obtained from $(\Pi^{\downarrow}_n)$ by arranging the parts in the ξ-biased order.
ξ-biased permutations are not consistent under restrictions . . . How to extend to $p_1, p_2, \dots$ with infinitely many $p_j > 0$?
Permutations/orders with biased record counts
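• For ξ ∈ [0, ∞], define $\lhd_\xi$ as a random order on $\mathbb{N}$ with independent ranks $\rho_1, \rho_2, \dots$ such that
$$\mathbb{P}(\rho_k = i) = \begin{cases} \dfrac{1}{k + \xi - 1} & \text{for } 1 \le i \le k - 1, \\[2ex] \dfrac{\xi}{k + \xi - 1} & \text{for } i = k. \end{cases}$$
• $\lhd_\xi$ restricts to $[k]$ as a permutation with distribution
$$\frac{\xi^{\#\text{records}}}{(\xi)_k}$$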
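A small sketch of the record-biased order restricted to [n]: the ranks are drawn independently and each new element is inserted at its relative rank, so that rank k for element k means a new record. The function names `record_biased_ranks` and `order_from_ranks` are assumptions, and element 1 is given rank 1 outright so that ξ = 0 causes no division by zero.

```python
import math
import random

def record_biased_ranks(n, xi, rng=None):
    """Draw independent ranks rho_1..rho_n: rho_k = k w.p. xi/(k+xi-1), else uniform on 1..k-1."""
    rng = rng or random.Random()
    ranks = []
    for k in range(1, n + 1):
        if k == 1 or math.isinf(xi):
            ranks.append(k)                       # forced record
            continue
        weights = [1.0] * (k - 1) + [xi]          # unnormalised; total is k + xi - 1
        ranks.append(rng.choices(range(1, k + 1), weights=weights, k=1)[0])
    return ranks

def order_from_ranks(ranks):
    """Build the arrangement of 1..n in which element k has relative rank ranks[k-1]."""
    order = []
    for k, r in enumerate(ranks, start=1):
        order.insert(r - 1, k)
    return order

ranks = record_biased_ranks(8, 2.0, random.Random(5))
print(ranks, order_from_ranks(ranks), sum(r == k for k, r in enumerate(ranks, 1)))
```

Multiplying the rank probabilities over k = 1, ..., n gives $\xi^{\#\text{records}}$ divided by $\prod_{k=1}^{n}(k + \xi - 1) = (\xi)_n$, matching the stated distribution.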
A solution to the arrangement problem
• Key observation:
$$\mathrm{perm}_\xi(p_1, \dots, p_k) \overset{\mathcal{L}}{=} \lhd_\xi \circ \mathrm{perm}_\infty(p_1, \dots, p_k)$$
where the record-biased order $\lhd_\xi$ and the SBP $\mathrm{perm}_\infty$ are independent. The RHS makes sense for infinite $p_1, p_2, \dots$ – P-Winkel ’09, G-Haulk-P ’09
• Conclusion: for 0 ≤ α < 1, θ ≥ 0, the regenerative version $P^*$ of PD(α, θ) can be obtained by arranging the SB frequencies $P$ in the independent order $\lhd_{\theta/\alpha}$.
Self-similar, stationary regenerative representations of PD
• $Z \subset [0, \infty)$ is self-similar if $Z \overset{\mathcal{L}}{=} cZ$ for $c > 0$
• In particular, $Z \cap [0, 1]$ could be the range of $(e^{-(S_t + X)}, t \ge 0)$ for a mean-µ subordinator $(S_t)$ and independent $X$ with distribution $\mathbb{P}(X \in dx) = \mu^{-1} \nu[x, \infty]\, dx$.
• Then the induced composition structure $(\Pi^*_n)$ has a last-bit deletion property in the representation via binary code (e.g. 1010001 for (2,4,1)), and a version of the last-part deletion property. – Young ’05, GP ’05
• For 0 ≤ α < 1, θ > −α, PD(α, θ) has such a representation where
$$1 - e^{-X} = P_1 \overset{\mathcal{L}}{=} \mathrm{beta}(1 - \alpha,\ \theta + \alpha)$$
and $(S_t)$ is a subordinator with $\nu[x, \infty] = e^{-x(\theta + \alpha)}(1 - e^{-x})^{-\alpha}$. – PY ’96, GP ’05
• Then the SB pick $P_1$ is in the last position, while the other frequencies are arranged in the order inverse to the regenerative-PD(α, θ + α).
• Example: the BM (1/2, 0) interval partition is left-to-right regenerative (NTR), and at the same time right-to-left stationary-regenerative (hence self-similar).
• Example: the BB (1/2, 1/2) interval partition is NTR, but also has a realization as a stationary version of regenerative-PD(1/2, 1)
? + ? + ? =?
Intelligence + effort + persistence = excellence