Random permutations and the two-parameter Poisson ...aldous/Pitman_Conference/Slides/... · 2 we...

Random permutations and the two-parameterPoisson-Dirichlet distribution.

Sasha Gnedin

Queen Mary, University of London

Sasha Gnedin Random permutations and the 2-parameter PD

The Pitman-Yor definition

• PD(α, θ) is a probability law for a sequence of random frequencies

P ↓ = (P1,P2, · · · ), with P1 > P2 > · · · > 0,∑j

Pj = 1,

obtained by arranging in decreasing order another sequence

P = (P1, P2, · · · )

Pj = Wj

j−1∏i=1

(1−Wj), j = 1, 2, . . . ,

where Wi ’s independent, with WjL= Beta(1− α, θ + αj)

The Pitman-Yor definition

• PD(α, θ) is a probability law for a sequence of random frequencies

P ↓ = (P1,P2, · · · ), with P1 > P2 > · · · > 0,∑j

Pj = 1,

obtained by arranging in decreasing order another sequence

P = (P1, P2, · · · )

Pj = Wj

j−1∏i=1

(1−Wj), j = 1, 2, . . . ,

where Wi ’s independent, with WjL= Beta(1− α, θ + αj)

Two algorithms for size-biased ordering

• Conventional sampling without replacement algorithm:For p1, p2, . . . with s =

∑pj <∞, a size-biased pick p1 := pJ is

defined by setting P(J = j) = pj/s. Removing J from N, resp. pJfrom p1, p2, · · · , and iterating the SB-picking yields a SBP of N, resp.of p1, p2, · · ·

• Ranking algorithm to arrange p1, p2, · · · in SB order:kth iteration only deals with p1, · · · , pk . After 1, · · · , k have beenarranged as i1, · · · , ik with (q1, · · · , qk) := (pi1 , · · · , pik ) the relativerank ρk+1 of k + 1 is determined by moving k + 1 left-to-rightthrough i1, · · · , ik until settling in positionρk+1 = m ∈ {1, · · · , k + 1} with odds

pk+1 : (qm + · · ·+ qk).

The infinite SB order is defined by ρ1, ρ2, . . .

• k steps yield 1, · · · , k (resp. p1, · · · , pk) in size-biased order, showingthat the finite orders are consistent under restrictions (cf also P-Tran’12).

• Works also if∑

pj =∞ although in this case the SB order is not awell-order.

• When p1 = p2 = · · · we have the ranks ρk independent, uniform on[k] := {1, · · · , k}, and the resulting order is the exchangeable infiniteorder (Aldous ’83), which restricts to [k] as uniformly distributedpermutation.

pk+1 : (qm + · · ·+ qk).

Characterisation of PD by SBP

• If P1 is independent of (P2, P3, · · · )/(1− P1) then the stick-breakingfactors Yj are independent and (excluding some trivial cases)

P ↓L= PD(α, θ) for some α, θ.

– McCloskey ’65, P ’96, G-Haulk-P ’09

The arrangement problem

Ordered representations of PD involve

• either an increasing jumpprocess (random c.d.f.)(Ft , t ≥ 0),

• or interval partition of [0, 1] into components of [0, 1] \ Z , for Z arandom measure-0 closed set.

Every such representation implies certain arrangement P ∗ of thefrequencies P↓j ’s in accord with the natural ordering of jump-times, resp.component intervals.

• The arrangement problem concerns features of this induced order P ∗,characterization of PD and sub-families, as well as connection of P ∗

to the well-orders P ↓ and P .

The arrangement problem

Ordered representations of PD involve

• either an increasing jumpprocess (random c.d.f.)(Ft , t ≥ 0),

• or interval partition of [0, 1] into components of [0, 1] \ Z , for Z arandom measure-0 closed set.

Every such representation implies certain arrangement P ∗ of thefrequencies P↓j ’s in accord with the natural ordering of jump-times, resp.component intervals.

• The arrangement problem concerns features of this induced order P ∗,characterization of PD and sub-families, as well as connection of P ∗

to the well-orders P ↓ and P .

A combinatorial counterpart of the arrangement problem

• Recall that Pj is the asymptotic frequency of the jth occupied table inthe Dubins-Pitman Chinese Restaurant

When the occupancy numbers aren1, . . . , nk , (n1 + · · ·+ nk = n)

• sits at occupied table j with probabilitynj−αn+θ ,

• occupies a new table with probabilityθ+kαn+θ .

• Hence a n-sample from P ∗ has the structure of composition (orderedpartition) Π∗n of integer n, with the CRP ‘table’ occupancy countsarranged in the corresponding order. The Π∗n’s are consistent as nvaries.

Z ,U1, · · · ,Un

• sample uniform[0,1] points U1, . . . ,Un

• scan the gaps in Z in the left-to-right order

• record the sizes of clusters in each occupied gap

Z ,U1, · · · ,Un

Subordinator ‘bridge’ representations of PD

• For (St , t ≥ 0) a subordinator with S0 = 0 and tilted by manipulatingthe distribution of (T , ST )

Ft =StST

, 0 ≤ t ≤ T

• depending on choice of subordinator (gamma, stable, generalizedgamma) some restricted range of (α, θ) ∈ [0, 1)× [0,∞) may becovered–McCloskey ’65, Kingman ’75, Perman-PY ’92, PY ’97, P ’03

• The induced order P ∗ is exchangeable, i.e. P↓j (equivalently, Pj ’s) areshuffled ‘uniformly at random’.

Subordinator ‘bridge’ representations of PD

• For (St , t ≥ 0) a subordinator with S0 = 0 and tilted by manipulatingthe distribution of (T , ST )

Ft =StST

, 0 ≤ t ≤ T

• depending on choice of subordinator (gamma, stable, generalizedgamma) some restricted range of (α, θ) ∈ [0, 1)× [0,∞) may becovered–McCloskey ’65, Kingman ’75, Perman-PY ’92, PY ’97, P ’03

• The induced order P ∗ is exchangeable, i.e. P↓j (equivalently, Pj ’s) areshuffled ‘uniformly at random’.

The self-similar representation of PD(α, 0)

• Let Z be the α-stable set (e.g., the zero set of BM for α = 1/2), soZ ∩ [0, 1] is the range of α-stable subordinator (St , t ≥ 0) ‘cut’ bypassing level 1.

• The SB pick P1 is the size of the rightmost ‘meander’ interval, whileall other frequencies occur in the exchangeable order (PY ’96).

• Exactly the same (and not just in the n→∞ regime) arrangement ofparts occurs on the combinatorial level of composition Π∗ (P ’97).

Regenerative compositions

• For subordinator (St , t ≥ 0) (with S0 = 0) the ‘discrete’ c.d.f.

Ft := 1− e−St

is known as a neutral-to-the right prior.

• Compositions Π∗n have a(characteristic) first-part deletionproperty: given the first part is m,deleting the part from Π∗n yields acopy of Π∗n−m.

Regenerative compositions

• For subordinator (St , t ≥ 0) (with S0 = 0) the ‘discrete’ c.d.f.

Ft := 1− e−St

is known as a neutral-to-the right prior.

• Compositions Π∗n have a(characteristic) first-part deletionproperty: given the first part is m,deleting the part from Π∗n yields acopy of Π∗n−m.

• The distribution of Π∗n has a product form

p∗(n1, . . . , nk) =k∏

q(nj + · · ·+ nk , nj)

where (assuming zero drift)

q(n,m) : =Φ(n,m)

Φ(n),

Φ(λ) :=

∫ ∞0

(1− e−λx)ν(dx),

Φ(n,m) :=

)∫ ∞0

(1− e−x)me−(n−m)xν(dx)

and ν is the Levy measure of the subordinator.

• The NTR/regenerative representation is an intrinsic property ofunordered objects P ↓/partition structure (Πn), by the virtue of

p∗(n) = E[Pn−11 ] = E

(P↓j )n

.• Moreover, if (unordered) partitions Πn (derived by sampling from

some frequencies P ↓) have some kind of part-deletion property, thenthey can be represented by regenerative compositions Π∗n.

• The NTR/regenerative representation is an intrinsic property ofunordered objects P ↓/partition structure (Πn), by the virtue of

p∗(n) = E[Pn−11 ] = E

(P↓j )n

.• Moreover, if (unordered) partitions Πn (derived by sampling from

some frequencies P ↓) have some kind of part-deletion property, thenthey can be represented by regenerative compositions Π∗n.

The regenerative representation of PD

The Redwoods Park Theorem: for 0 ≤ α < 1, θ ≥ 0, the PD(α, θ) isregenerative:

• there is a Levy measure with the upper tail

να,θ[x ,∞) = (1− e−x)−αe−xθ, x ≥ 0

• which corresponds to

Φ(λ) =λ Γ(1− α)Γ(λ+ θ)

Γ(λ+ 1− α + θ)

q(n,m) =

)(1− α)m−1

(θ + n −m)n−1

(n −m)α + mθ

and agrees with the Ewens-Pitman sampling formula, e.g.

p∗(n) =(1− α)n−1(1 + θ)n−1

να,θ[x ,∞) = (1− e−x)−αe−xθ, x ≥ 0

Φ(λ) =λ Γ(1− α)Γ(λ+ θ)

Γ(λ+ 1− α + θ)

q(n,m) =

)(1− α)m−1

(θ + n −m)n−1

(n −m)α + mθ

p∗(n) =(1− α)n−1(1 + θ)n−1

να,θ[x ,∞) = (1− e−x)−αe−xθ, x ≥ 0

Φ(λ) =λ Γ(1− α)Γ(λ+ θ)

Γ(λ+ 1− α + θ)

q(n,m) =

)(1− α)m−1

(θ + n −m)n−1

(n −m)α + mθ

p∗(n) =(1− α)n−1(1 + θ)n−1

Subfamily PD(0, θ)

• Forν0,θ(dx) = θe−xθdx , x ≥ 0

(St , t ≥ 0) is compound Poisson with exponential jumps

• The range of Ft = 1− e−St is a Poisson point process with ratefunction θ(1− y)−1, y ∈ [0, 1), hence points obtainable by i.i.d.beta(1, θ) stick-breaking

• P ∗ is a size-biased permutation of P ↓, likewise Π∗n has parts in theSB order

• If a partition structure has the SB-part deletion property, it is Ewens’– Kingman ’78

Subfamily PD(α, α)

• The (α, α) Levy measure

να,α(dx) =αex

(ex − 1)α+1dx , x ≥ 0

was introduced by Lamperti in ’73. He observed that eSt is atime-changed α-stable subordinator starting at 1, with the range1 + Zα (for Zα α-stable regenerative set). Hence the range ofFt = 1− e−St is Zα/(Zα + 1), which is a known representation of theα-stable bridge

• P ∗ is an exchangeable ordering of P ↓

• If the first part and the last part of regenerative Π∗n have the samedistribution (for every n), then the partition structure is Pitman’s(α, α).

Subfamily PD(α, 0)

• The (α, 0) Levy measure is

να,0(dx) =αeαx

(ex − 1)α+1dx + δ∞(dx), x ≥ 0

• Killed subordinator (St) belongs to the family of Lamperti-stablesubordinators (Chaumont and Caballero ’06)

• P ∗ has the last part P1, and the other parts are in exchangeableorder. Same for Π∗n. This kind of ordering is characteristic for (α, 0)among the regenerative structures.

Alternative proof of the regenerative representation forα, θ > 0

• To construct a path of subordinator with Levy measureν[x ,∞) = e−xθ(1− e−x)−α

(a) split [0,∞) at points E1 + · · ·+ Ej of homogeneous Poisson(θ) process,(b) run a Lamperti-stable (α, 0) (killed) subordinator from the origin (0, 0)

until crossing E1 at some time τ1,(c) start another copy of this (α, 0)-subordinator from (E1, τ1) and run it

until passing E1 + E2, etc.

• The construction agrees with PY ’97, Corollary to Proposition 21: For0 ≤ α < 1, θ ≥ 0, P ↓ can be obtained by first splitting the unityaccording to PD(0, θ), then further fragmenting each piece by anindependent copy of the α-stable set.

Size-cosize biased permutations

• Fix ξ ∈ [0,∞]. For positive p1, . . . , pk with s = p1 + · · ·+ pk , aξ-biased pick is a random element pJ , where

P(J = j) =ξpj + (s − pj)

s(ξ + k − 1).

• ∞-biased pick is size-biased,

• 1-biased pick is uniformly random,

• 0-biased pick is cosize biased.

Removing J, pJ and iterating yields a ξ-biased permutation of p1, . . . , pk ,denoted

permξ(p1, . . . , pn).

This is only defined for k <∞!

The arrangement problem for fixed n

Let Π↓n be the (α, θ)-partition of n, and let ξ = θ/α.

• partition structure (Π↓n) has the ξ-biased pick deletion property, whichis characteristic;

• Π∗n is obtained from (Π↓n) by arranging the parts in the ξ-biased order.

ξ-biased permutations are not consistent under restrictions . . . How toextend to p1, p2, . . . with infinitely many pj > 0?

Permutations/orders with biased record counts

• For ξ ∈ [0,∞], define /ξ as a random order on N with independentranks ρ1, ρ2, . . . such that

P(ρk = i) =

k+ξ−1 for 1 ≤ i ≤ k − 1,ξ

k+ξ−1 for i = k .

• /ξ restricts to [k] as a permutation with distribution

ξ#records

A solution to the arrangement problem

• Key observation:

permξ(p1, . . . , pk)L= /ξ ◦ perm∞(p1, . . . , pk)

where the record-biased order /ξ and SBP perm∞ are independent.The RHS makes sense for infinite p1, p2, . . .– P-Winkel ’09, G-Haulk-P ’09

• Conclusion: For 0 ≤ α < 1, θ ≥ 0, the regenerative version P ∗ ofPD(α, θ) can be obtained by arranging the SB frequencies P in theindependent order /θ/α.

Selfsimiar, stationary regenerative representations of PD

• Z ⊂ [0,∞) is selfsimilar if ZL= cZ for c > 0

• In particular, Z ∩ [0, 1] could be the range of (e−(St+X ), t ≥ 0) formean-µ subordinator (St) and independent X with distributionP(X ∈ dx) = µ−1ν[x ,∞]dx .

• Then the induced composition structure (Π∗n) has a last-bit deletionproperty in the representation via binary code (e.g. 1010001 for(2,4,1)), and a version of the last-part deletion property.–Young ’05, GP ’05

• For 0 ≤ α < 1, θ > −α, PD(α, θ) has such representation where

1− e−X = P1L= beta(1− α, θ + α)

and (St) is subordinator withν[x ,∞] = e−x(θ+α)(1− e−x)−α.–PY ’96, GP ’05

• Then the SB pick P1 is in the last position, while other frequenciesare frequencies are arranged in the order inverse to the regenerative-PD(α, θ + α).

• Example: the BM (1/2, 0) interval partition is left-to-rightregenerative (NTR), and at the same time right-to-leftstationary-regenerative (hence self-similar).

• Example: The BB (1/2, 1/2) interval partition is NTR, but also has arealization as a stationary version of regenerative-PD(1/2, 1)

? + ? + ? =?

Intelligence + effort + persistence = excellence

? + ? + ? =?Intelligence + effort + persistence = excellence

Random permutations and the two-parameter Poisson ...aldous/Pitman_Conference/Slides/... · 2 we...

Documents

Transcript of Random permutations and the two-parameter Poisson ...aldous/Pitman_Conference/Slides/... · 2 we...