Dirichlet random probabilities and applications

Gérard Letac, Institut de Mathématiques de Toulouse

Abstract. If α is a fixed bounded measure on E, a Dirichlet random probability P governed by α is defined by the following property: if E_0, ..., E_n is a partition of E then the density of (P(E_1), ..., P(E_n)) is proportional to
\[
(1-x_1-\cdots-x_n)^{\alpha(E_0)-1}\,x_1^{\alpha(E_1)-1}\cdots x_n^{\alpha(E_n)-1}.
\]
This random measure is a very natural object in a number of contexts: nonparametric Bayesian methods, solutions of perpetuity equations, Markov-Krein theory, random paths. It is sometimes inappropriately called a Dirichlet process. The Italian school has made important contributions to its theory. The course will first describe the properties of P: even if α is continuous, P is almost surely purely atomic. It will study the existence and the distribution of the real random variable ∫_E f(w)P(dw), a particularly interesting object (for instance, if E = (0, 1) and α(dx) = dx, then the density of X = ∫_0^1 wP(dw) is proportional to sin(πx) x^{x-1}(1-x)^{-x}). It will describe some of the applications mentioned above. The course requires no knowledge of stochastic integrals and martingales, but standard training in probability and measure theory is needed, with a small dose of Lévy processes or infinitely divisible distributions. Up to this, it is fairly elementary.

A tentative schedule for eight classes of 2 academic hours is

1. The algebra of beta-gamma random variables. Dirichlet distributions, amalgamation property. The classical characterization of the gamma distributions. Classical objects of the exponential family of Dirichlet distributions.

2. The Tc transform of a probability on a tetrahedron: properties, examples and applications.

3. Definition and proof of the existence of the Dirichlet random probability (DRP). A DRP is a purely atomic distribution. Description of its random weights.

4. The Ewens distribution, the Chinese restaurant process. If P is a DRP, the random variable ∫_E f(w)P(dw) exists if and only if ∫_E log⁺|f(w)| α(dw) < ∞.

5. A short course on infinitely divisible distributions and on Lévy processes. The random measure associated to the Gamma process and its application to the Dirichlet random probability.

6. Some particular cases of the calculation of the distribution of ∫_ℝ wP(dw): when α is Cauchy, beta, and more generally uniform on a tetrahedron. The Markov-Krein moment problem. Applications to nonparametric Bayesian theory.

7. The Markov chains of the form X_{n+1} = F_{n+1}(X_n) when (F_n)_{n≥1} is an iid sequence of random maps from a set E into itself. Applications to perpetuities; examples when perpetuities are Dirichlet.

8. If Q is a probability on R^n, θ > 0, and if Y ∼ β(1, θ), W ∼ Q and X are independent, then X ∼ (1 − Y)X + YW if and only if X ∼ ∫ xP(dx) where P is a DRP governed by θQ. We prove this remarkable theorem of Diaconis and Freedman and extend it to Y ∼ β(k, θ) with another random measure called quasi-Bernoulli of order k.

Some references


1. Cifarelli, D.M. and Regazzini, E. (1990) 'Distribution functions of means of a Dirichlet process', Ann. Statist. 18, 429-442.

2. Diaconis, P. and Kemperman, J. (1996) 'Some new tools for Dirichlet priors', Bayesian Statistics 5, 97-106, Oxford University Press.

3. Diaconis, P. and Freedman, D. (1999) 'Iterated random functions', SIAM Review 41, 45-76.

4. James, L.F., Lijoi, A. and Prünster, I. (2010) 'On the posterior distribution of classes of random means', Bernoulli 16, 155-180.

5. Lijoi, A. and Regazzini, E. (2004) 'Means of a Dirichlet process and multiple hypergeometric functions', Ann. Probab. 32, 1469-1495.

1 The beta gamma algebra

If p, a > 0 we define the gamma distribution γ_{p,a} on the positive half line as
\[
\gamma_{p,a}(dx) = \frac{1}{\Gamma(p)}\,e^{-x/a}\left(\frac{x}{a}\right)^{p-1}\mathbf{1}_{(0,\infty)}(x)\,\frac{dx}{a}.
\]
The number p is called the shape parameter and the number a is called the scale parameter. If a = 1 then γ_{p,a} = γ_p is said to be standard.

If p, q > 0, the beta distribution (or beta distribution of the first kind) and the beta distribution of the second kind are respectively
\[
\beta_{p,q}(dx) = \frac{1}{B(p,q)}\,x^{p-1}(1-x)^{q-1}\,\mathbf{1}_{(0,1)}(x)\,dx,
\qquad
\beta^{(2)}_{p,q}(dx) = \frac{1}{B(p,q)}\,\frac{x^{p-1}}{(1+x)^{p+q}}\,\mathbf{1}_{(0,\infty)}(x)\,dx.
\]

Here are their Laplace or Mellin transforms:

Proposition 1.1. If a, p, q > 0 we have
\[
\int_0^{\infty}e^{-sx}\,\gamma_{p,a}(dx) = \frac{1}{(1+as)^{p}}\quad (s>-1/a),
\qquad
\int_0^{\infty}x^{s}\,\gamma_{p,a}(dx) = a^{s}\,\frac{\Gamma(p+s)}{\Gamma(p)}\quad (s>-p),
\]
\[
\int_0^{1}x^{s}\,\beta_{p,q}(dx) = \frac{\Gamma(p+s)}{\Gamma(p)}\,\frac{\Gamma(p+q)}{\Gamma(p+q+s)}\quad (s>-p),
\qquad
\int_0^{\infty}x^{s}\,\beta^{(2)}_{p,q}(dx) = \frac{\Gamma(p+s)}{\Gamma(p)}\,\frac{\Gamma(q-s)}{\Gamma(q)}\quad (-p<s<q).
\]

The proof is obvious. Note that if Y ∼ γ_{p,a} then E(1/Y^n) = ∞ if n ≥ p.

Theorem 1.2. Let a, p, q > 0. If X ∼ γ_{p,a} and Y ∼ γ_{q,a} are independent, then the sum S = X + Y ∼ γ_{p+q,a} is independent of U = X/(X+Y) ∼ β_{p,q} and of V = X/Y ∼ β^{(2)}_{p,q}.

More generally, for n ≥ 2 and p_1, ..., p_n > 0, consider independent random variables X_1, ..., X_n such that X_k ∼ γ_{p_k,a} for all k = 1, ..., n. Define S_k = X_1 + ··· + X_k and q_k = p_1 + ··· + p_k. Then
\[
\frac{S_1}{S_2},\ \frac{S_2}{S_3},\ \ldots,\ \frac{S_{n-1}}{S_n},\ S_n
\]
are independent random variables with S_k/S_{k+1} ∼ β_{q_k,p_{k+1}} and S_n ∼ γ_{q_n,a}.

Proof. For −p < t < q and s > −1/a the first part is proved by the following computation:
\[
E\!\left(V^{t}e^{-sS}\right)
= E\!\left(\Bigl(\frac{X}{Y}\Bigr)^{t}e^{-s(X+Y)}\right)
= E\!\left(X^{t}e^{-sX}\right)E\!\left(Y^{-t}e^{-sY}\right)
= \frac{\Gamma(p+t)}{\Gamma(p)}\,\frac{\Gamma(q-t)}{\Gamma(q)}\,\frac{1}{(1+as)^{p+q}}.
\]
Note that U = X/(X+Y) ∼ β_{p,q} is implied by V = X/Y ∼ β^{(2)}_{p,q}.

We prove the second part by induction on n; things are more delicate. For n = 2 this is the first part applied to X = X_1, Y = X_2, S = S_2 and U = S_1/S_2. Assume that the statement is true for n ≥ 2 and let us prove it for n + 1. By the induction hypothesis, (S_n, X_{n+1}) is independent of S_1/S_2, S_2/S_3, ..., S_{n−1}/S_n, and we use this fact when t_k > −q_k and s > −1/a for writing
\[
E\!\left(\Bigl(\frac{S_1}{S_2}\Bigr)^{t_1}\Bigl(\frac{S_2}{S_3}\Bigr)^{t_2}\cdots\Bigl(\frac{S_{n-1}}{S_n}\Bigr)^{t_{n-1}}\Bigl(\frac{S_n}{S_{n+1}}\Bigr)^{t_n}e^{-sS_{n+1}}\right)
= E\!\left(\Bigl(\frac{S_1}{S_2}\Bigr)^{t_1}\cdots\Bigl(\frac{S_{n-1}}{S_n}\Bigr)^{t_{n-1}}\right)
E\!\left(\Bigl(\frac{S_n}{S_{n+1}}\Bigr)^{t_n}e^{-sS_{n+1}}\right)
= \frac{C}{(1+as)^{q_{n+1}}}\prod_{k=1}^{n}\frac{\Gamma(q_k+t_k)}{\Gamma(p_{k+1}+q_k+t_k)}
\]
where C is the constant with respect to t_1, ..., t_n such that when t_1 = ··· = t_n = 0 the last expression is 1.
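The following sketch is not part of the original notes; it is a minimal Monte Carlo check of the first part of Theorem 1.2, assuming NumPy is available. The parameter values p, q, a are arbitrary illustrative choices.

```python
# Monte Carlo sketch of Theorem 1.2: with X ~ gamma(p, a) and Y ~ gamma(q, a)
# independent, S = X + Y should be gamma(p+q, a), U = X/(X+Y) should be
# beta(p, q), and S and U should be (empirically) uncorrelated.
import numpy as np

rng = np.random.default_rng(0)
p, q, a, n = 2.0, 3.5, 1.7, 200_000   # arbitrary illustrative parameters

X = rng.gamma(shape=p, scale=a, size=n)
Y = rng.gamma(shape=q, scale=a, size=n)
S, U = X + Y, X / (X + Y)

# gamma(p+q, a) has mean (p+q)a and variance (p+q)a^2
print("mean of S :", S.mean(), " expected", (p + q) * a)
print("var  of S :", S.var(),  " expected", (p + q) * a**2)
# beta(p, q) has mean p/(p+q) and variance pq/((p+q)^2 (p+q+1))
print("mean of U :", U.mean(), " expected", p / (p + q))
print("var  of U :", U.var(),  " expected", p * q / ((p + q)**2 * (p + q + 1)))
# independence of S and U shows up as near-zero correlation
print("corr(S,U) :", np.corrcoef(S, U)[0, 1])
```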

Example 1: the chi square distribution with one degree of freedom. If Z ∼ N(0, 1) then Z² ∼ γ_{1/2,2} = χ²_1. To see this, compute the image of N(0, 1) by z ↦ y = z², which is the composition of z ↦ u = |z| ↦ y = u². We get
\[
e^{-z^{2}/2}\,\frac{dz}{\sqrt{2\pi}}\ \to\ 2e^{-u^{2}/2}\,\mathbf{1}_{(0,\infty)}(u)\,\frac{du}{\sqrt{2\pi}}\ \to\ e^{-y/2}\,y^{-1/2}\,\frac{dy}{\sqrt{2}\,\Gamma(1/2)}.
\]

Example 2: the chi square distribution with n degrees of freedom. If Z_1, ..., Z_n are independent random variables with the same distribution N(0, 1) then
\[
Z_1^{2}+\cdots+Z_n^{2} \sim \gamma_{n/2,\,2} = \chi^{2}_{n}.
\]
This comes from the above example for n = 1 and from Theorem 1.2.

Example 3: Gamma distributions and Brownian motion. Until recently, chi square distributions were the only gamma distributions met in practical situations. Only in 1990 did Yor show that if B is a standard one-dimensional Brownian motion then
\[
\left(\int_0^{\infty}e^{aB(t)-bt}\,dt\right)^{-1} \sim \gamma_{\frac{2b}{a^{2}},\,\frac{a^{2}}{2}}
\]
(see Exercise 1.2). The important point of the remark is that the shape parameter 2b/a² is not necessarily a half integer.

We mention now a converse of the first part of Theorem 1.2. It is due to Lukacs (1956).

Theorem 1.3. Let X and Y be independent positive random variables, not concentrated at a single point, such that S = X + Y and U = X/(X + Y) are independent. Then there exist a, p, q > 0 such that X ∼ γ_{p,a} and Y ∼ γ_{q,a}.


Proof. For s > 0 denote L_X(s) = E(e^{-sX}) = e^{-k_X(s)}. Therefore
\[
-L_X'(s)L_Y(s) = E(Xe^{-sX})E(e^{-sY}) = E(Xe^{-s(X+Y)}) = E(USe^{-sS}) = E(U)E(Se^{-sS}) = -E(U)L_S'(s) = -E(U)\bigl(L_X'(s)L_Y(s)+L_X(s)L_Y'(s)\bigr).
\]
As a consequence
\[
(1-E(U))\,k_X'(s) = E(U)\,k_Y'(s).
\]
Similarly
\[
L_X''(s)L_Y(s) = E(X^{2}e^{-sX})E(e^{-sY}) = E(X^{2}e^{-s(X+Y)}) = E(U^{2}S^{2}e^{-sS}) = E(U^{2})E(S^{2}e^{-sS}) = E(U^{2})L_S''(s) = E(U^{2})\bigl(L_X''(s)L_Y(s)+2L_X'(s)L_Y'(s)+L_X(s)L_Y''(s)\bigr),
\]
leading to
\[
(k_X')^{2}-k_X'' = E(U^{2})\bigl((k_X')^{2}-k_X''+2k_X'k_Y'+(k_Y')^{2}-k_Y''\bigr).\qquad (1)
\]
Denote for simplicity m_1 = E(U), m_2 = E(U²) and f(s) = m_1 k_Y'(s). We get
\[
k_X' = f/(1-m_1),\qquad k_Y' = f/m_1.
\]
Observe also that 0 < m_1² < m_2 < m_1 < 1: the point is that these inequalities are strict because X and Y are not concentrated at one point. With these notations, equality (1) becomes the differential equation Af' + Bf² = 0 where
\[
A = \frac{m_1-m_2}{m_1(1-m_1)} > 0,\qquad B = \frac{m_2-m_1^{2}+2m_1m_2(1-m_1)}{m_1^{2}(1-m_1)^{2}} > 0.
\]
The solution of this differential equation is f(s) = A/(C + Bs) where C is an arbitrary constant, which must be > 0 since k_Y'(s) > 0 for s > 0. As a consequence k_Y(s) = \frac{A}{m_1B}\log(1+\frac{B}{C}s) (the constant of integration is determined by lim_{s→0} k_Y(s) = 0). Therefore
\[
L_Y(s) = \frac{1}{(1+as)^{q}}
\]
is the Laplace transform of γ_{q,a} with q = A/(m_1B) and a = B/C. The reasoning for L_X is similar.

Exercise 1.1. If S ∼ γ_{p+q,a} and U ∼ β_{p,q} are independent, show that SU ∼ γ_{p,a} (Hint: use the Mellin transform of SU and Proposition 1.1).

Exercise 1.2. If Z ∼ N(0, t) compute E(e^{aZ}). Use the result to compute f(a, b) = E(Y(a, b)) where Y(a, b) = ∫_0^∞ e^{aB(t)−bt} dt, B is a standard Brownian motion and b > a²/2 > 0. Show that if b > na²/2 then E(Y^n(a, b)) = n! ∏_{k=1}^{n} f(ka, kb). Hint for n = 2:
\[
E(Y^{2}(a,b)) = 2E\!\left(\int_{0<t_1<t_2}e^{2aB(t_1)+a(B(t_2)-B(t_1))-2bt_1-b(t_2-t_1)}\,dt_1dt_2\right)
= 2E\!\left(\int_0^{\infty}e^{2aB(t_1)-2bt_1}dt_1\times\int_0^{\infty}e^{aB_2(s_2)-bs_2}ds_2\right)
\]
by introducing, for fixed t_1, the Brownian motion B_2(s_2) = B(t_1 + s_2) − B(t_1), which is independent of B(t_1). Finally prove that E(Y^n(a, b)) = ∫_0^∞ x^{−n} γ_{2b/a²,\,a²/2}(dx). What are the values of n such that E(Y^n(a, b)) is finite? Note that this calculation does not completely prove that Y^{−1}(a, b) ∼ γ_{2b/a²,\,a²/2}, since we have only proved that a finite number of moments coincide.


2 The Dirichlet distribution

We will need the following notation. The natural basis of R^{d+1} is denoted by e_0, ..., e_d. The convex hull of e_0, ..., e_d is a tetrahedron that we denote by E_{d+1}. The elements of E_{d+1} are therefore the vectors λ = (λ_0, ..., λ_d) of R^{d+1} such that λ_i ≥ 0 for i = 0, ..., d and λ_0 + ··· + λ_d = 1. If a_0, ..., a_d are positive numbers, the Dirichlet distribution D(a_0, ..., a_d) of X = (X_0, ..., X_d) ∈ E_{d+1} is such that the law of (X_1, ..., X_d) is
\[
\frac{1}{B(a_0,\ldots,a_d)}\,(1-x_1-\cdots-x_d)^{a_0-1}x_1^{a_1-1}\cdots x_d^{a_d-1}\,\mathbf{1}_{T_d}(x_1,\ldots,x_d)\,dx_1\cdots dx_d\qquad (2)
\]
where B(a_0, ..., a_d) = Γ(a_0)···Γ(a_d)/Γ(a_0 + ··· + a_d) and where T_d is the set of (x_1, ..., x_d) such that x_i > 0 for all i = 0, 1, ..., d, with the convention x_0 = 1 − x_1 − ··· − x_d. For instance, if the real random variable X_1 follows the beta distribution
\[
\beta(a_1,a_0)(dx) = \frac{1}{B(a_1,a_0)}\,x^{a_1-1}(1-x)^{a_0-1}\,\mathbf{1}_{(0,1)}(x)\,dx
\]
then (X_1, 1 − X_1) ∼ D(a_1, a_0). A simple example is D(1, ..., 1), which is the uniform probability on the tetrahedron E_{d+1}. It is not obvious that (2) has total mass 1; the next theorem will show it. Indeed, the following result is crucial for Dirichlet distributions:

Theorem 2.1. Consider independent random variables X_0, ..., X_d such that X_k ∼ γ_{a_k,c} for all k = 0, ..., d and define S = X_0 + ··· + X_d. Then S is independent of the vector (1/S)(X_0, ..., X_d). Furthermore (1/S)(X_1, ..., X_d) has the density (2) and (1/S)(X_0, ..., X_d) ∼ D(a_0, ..., a_d).

Proof. Denote a = a_0 + ··· + a_d. It seems that the method of the Jacobian is after all the quickest. We look for the image of the probability
\[
e^{-\frac{1}{c}(x_0+\cdots+x_d)}\,x_0^{a_0-1}x_1^{a_1-1}\cdots x_d^{a_d-1}\,\frac{1}{c^{a}\prod_{k=0}^{d}\Gamma(a_k)}\,\mathbf{1}_{(0,\infty)^{d+1}}(x_0,\ldots,x_d)\,dx_0\cdots dx_d\qquad (3)
\]
by the map (x_0, ..., x_d) ↦ (y_1, ..., y_d, s) defined by s = x_0 + ··· + x_d and y_i = x_i/s for i = 1, ..., d. The inverse is given by
\[
x_1 = sy_1,\ \ldots,\ x_d = sy_d,\qquad x_0 = s(1-y_1-\cdots-y_d) = sy_0,
\]
and the Jacobian matrix is very easy to compute:
\[
\begin{pmatrix}
s & 0 & \cdots & 0 & y_1\\
0 & s & \cdots & 0 & y_2\\
\vdots & & \ddots & & \vdots\\
0 & 0 & \cdots & s & y_d\\
-s & -s & \cdots & -s & y_0
\end{pmatrix}.
\]
Adding the first d rows to the last one shows easily that its determinant is s^d, and the image of the probability (3) is
\[
\frac{e^{-s/c}\,s^{a-1}}{c^{a}\,\Gamma(a)}\,\mathbf{1}_{(0,\infty)}(s)\,ds\times\frac{1}{B(a_0,\ldots,a_d)}\,(1-y_1-\cdots-y_d)^{a_0-1}y_1^{a_1-1}\cdots y_d^{a_d-1}\,\mathbf{1}_{T_d}(y_1,\ldots,y_d)\,dy_1\cdots dy_d
\]
where T_d = {(y_1, ..., y_d); y_i > 0, i = 0, 1, ..., d} (recall the notation y_0 = 1 − y_1 − ··· − y_d).

Remarks: Let us insist on the fact that the Dirichlet distribution D(a_0, ..., a_d) is a singular distribution in R^{d+1} concentrated on the tetrahedron E_{d+1}. However, many calculations about it can only be performed by using the density of its projection (x_0, ..., x_d) ↦ (x_1, ..., x_d) from E_{d+1} to T_d.


For the next proposition we use the Pochhammer symbol (X)_n = X(X + 1)(X + 2)⋯(X + n − 1) for n > 0, with the convention (X)_0 = 1.

Proposition 2.2 (Moments). If X = (X_0, ..., X_d) ∼ D(a_0, ..., a_d) with a = a_0 + ··· + a_d, then for s_i > −a_i one has
\[
E\bigl(X_0^{s_0}\cdots X_d^{s_d}\bigr) = \frac{\Gamma(a)}{\Gamma(a+s_0+\cdots+s_d)}\prod_{k=0}^{d}\frac{\Gamma(a_k+s_k)}{\Gamma(a_k)}.
\]
In particular, if n_0, ..., n_d are nonnegative integers, then
\[
E\bigl(X_0^{n_0}\cdots X_d^{n_d}\bigr) = \frac{1}{(a)_{n_0+\cdots+n_d}}\prod_{k=0}^{d}(a_k)_{n_k}.
\]
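The following sketch is not in the original notes; it is a minimal check, assuming NumPy, of Theorem 2.1 (a Dirichlet vector obtained by normalizing independent gamma variables) against the Pochhammer moment formula of Proposition 2.2. The parameters and the moment multi-index are arbitrary.

```python
# Build D(a_0, a_1, a_2) by normalising independent gammas (Theorem 2.1) and
# compare an empirical joint moment with the formula of Proposition 2.2.
import numpy as np

def pochhammer(x, n):
    """(x)_n = x (x+1) ... (x+n-1), with (x)_0 = 1."""
    out = 1.0
    for k in range(n):
        out *= x + k
    return out

rng = np.random.default_rng(1)
a = np.array([0.5, 1.0, 2.0])          # parameters (a_0, a_1, a_2)
n = np.array([2, 1, 3])                # moment multi-index (n_0, n_1, n_2)
N = 500_000

G = rng.gamma(shape=a, scale=1.0, size=(N, a.size))
X = G / G.sum(axis=1, keepdims=True)   # each row is a D(a_0, a_1, a_2) sample

empirical = np.prod(X ** n, axis=1).mean()
theoretical = np.prod([pochhammer(ai, ni) for ai, ni in zip(a, n)]) \
              / pochhammer(a.sum(), int(n.sum()))
print("empirical moment :", empirical)
print("Proposition 2.2  :", theoretical)
```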

Proposition 2.3 (Limiting behavior of the Dirichlet distribution).
\[
\lim_{t\to\infty}D(ta_0,\ldots,ta_d) = \delta_{(a_0/a,\ldots,a_d/a)}\qquad (4)
\]
\[
\lim_{\varepsilon\to0}D(\varepsilon a_0,\ldots,\varepsilon a_d) = \sum_{k=0}^{d}\frac{a_k}{a}\,\delta_{e_k}\qquad (5)
\]
Furthermore, let Y_0, ..., Y_d be independent random variables such that Y_i ∼ β(a_i, 1) and denote by μ_i the distribution of ∑_{j≠i} Y_j e_j in R^{d+1}. If X(ε) ∼ D(εa_0, ..., εa_d), then the limiting distribution as ε → 0 of (X_0^ε(ε), ..., X_d^ε(ε)) is the mixture
\[
\sum_{i=0}^{d}\frac{a_i}{a}\,\mu_i\qquad (6)
\]
which is concentrated on d + 1 facets of the unit cube of R^{d+1}.

Proof: To prove (4) observe that
\[
\lim_{t\to\infty}E\bigl(X_0^{n_0}\cdots X_d^{n_d}\bigr) = \lim_{t\to\infty}\frac{1}{(ta)_{n_0+\cdots+n_d}}\prod_{k=0}^{d}(ta_k)_{n_k} = \prod_{k=0}^{d}\left(\frac{a_k}{a}\right)^{n_k}.
\]
Now ∏_{k=0}^{d}(a_k/a)^{n_k} is the moment of order (n_0, ..., n_d) of δ_{(a_0/a,...,a_d/a)}. Since E_{d+1} is compact, the Stone-Weierstrass theorem says that any continuous function f on E_{d+1} can be uniformly approximated by a sequence of polynomials. Therefore convergence of moments implies convergence in distribution, and (4) is proved.

To prove (5) we use a similar method. But
\[
\lim_{\varepsilon\to0}E\bigl(X_0^{n_0}\cdots X_d^{n_d}\bigr) = \lim_{\varepsilon\to0}\frac{1}{(\varepsilon a)_{n_0+\cdots+n_d}}\prod_{k=0}^{d}(\varepsilon a_k)_{n_k}
\]
is zero as soon as at least two of the n_j are positive, and equals a_k/a if n_k > 0 and n_j = 0 for all j ≠ k. This limiting sequence of moments corresponds to the Bernoulli distribution ∑_{k=0}^{d}(a_k/a)δ_{e_k}.

To prove (6), recall that since Γ(t + 1) = tΓ(t) for t > 0 we have Γ(t) ∼ 1/t as t → 0. We observe that for all s_0, ..., s_d ≥ 0 we have
\[
E\bigl(X_0^{\varepsilon s_0}(\varepsilon)\cdots X_d^{\varepsilon s_d}(\varepsilon)\bigr)
= \frac{B(\varepsilon(a_0+s_0),\ldots,\varepsilon(a_d+s_d))}{B(\varepsilon a_0,\ldots,\varepsilon a_d)}
\ \to_{\varepsilon\to0}\
\frac{s_0+\cdots+s_d+a}{a}\prod_{i=0}^{d}\frac{a_i}{a_i+s_i}
= \sum_{i=0}^{d}\frac{a_i}{a}\prod_{j\ne i}\frac{a_j}{a_j+s_j}
= \sum_{i=0}^{d}\frac{a_i}{a}\int y_0^{s_0}\cdots y_d^{s_d}\,\mu_i(dy).
\]


Comments. If we consider d + 1 independent gamma Lévy processes Y_0(t), ..., Y_d(t), denote S(t) = Y_0(a_0t) + ··· + Y_d(a_dt). Then we have
\[
D(t) = \frac{1}{S(t)}\bigl(Y_0(a_0t),\ldots,Y_d(a_dt)\bigr) \sim D(a_0t,\ldots,a_dt).
\]
Clearly, from the law of large numbers, the almost sure limit of D(t) as t → ∞ is (1/a)(a_0, ..., a_d). On the other hand, it is false that the almost sure limit of D(ε) as ε → 0 exists. To see this, just consider the case d = 1 and a_0 = a_1 = 1. Denote V(t) = log Y_0(t) − log Y_1(t) and observe that
\[
D(t) = \left(\frac{e^{V(t)}}{1+e^{V(t)}},\ \frac{1}{1+e^{V(t)}}\right).
\]
Now the almost sure limit of V(ε) as ε → 0 fails to exist: this can be proved by the so-called zero-one law of Blumenthal and Getoor. As a consequence, the convergence of D(ε) as ε → 0 given by Proposition 2.3 is only in distribution.

Theorem 2.4 (Amalgamation). Let a, q_1, ..., q_n be positive numbers and let X_1, ..., X_n be independent random variables such that X_j ∼ γ_{q_j,a}. Let T_1, ..., T_m be a partition of {1, 2, ..., n} and denote Q_i = ∑_{j∈T_i} q_j. Consider
\[
S_i = \sum_{j\in T_i}X_j,\qquad Z_i = \frac{(X_j)_{j\in T_i}}{S_i},\qquad S = \sum_{j=1}^{n}X_j.
\]
Then Y = (1/S)(S_1, ..., S_m), Z_1, ..., Z_m and S are independent. Furthermore Y ∼ D(Q_1, ..., Q_m) and for all i = 1, ..., m we have Z_i ∼ D((q_j)_{j∈T_i}).

Proof. Denote V_i = Z_iY_i. Then from Theorem 2.1, (Y, V_1, ..., V_m) is independent of S. As a consequence (Y, Z_1, ..., Z_m) is independent of S. Similarly Z_1 is independent of S_1 from Theorem 2.1 again, and is independent of the other S_i for obvious reasons. Therefore Z_1 and Y are independent. More generally, (Z_1, ..., Z_m) and Y are independent. Finally, Z_1, ..., Z_m are independent for obvious reasons.

Comments. The most useful case of such a partition occurs when the index set {1, ..., n} is rather written I × J where I = {1, ..., m}, J = {1, ..., p} and n = mp. The Dirichlet distribution has the form
\[
(X_{ij})_{i\in I,\,j\in J} \sim D\bigl((q_{ij})_{i\in I,\,j\in J}\bigr).
\]
If we consider the partition T_i = {i} × J, and Q_i = ∑_{j=1}^{p} q_{ij} and X_{i·} = ∑_{j=1}^{p} X_{ij}, we get
\[
(X_{1\cdot},\ldots,X_{m\cdot}) \sim D(Q_1,\ldots,Q_m)
\]
(see Exercise 2.4).

Corollary 2.5. Let q_1, ..., q_n be positive numbers. Let T_1, ..., T_m be a partition of {1, 2, ..., n} and denote Q_i = ∑_{j∈T_i} q_j. If X ∼ D(q_1, ..., q_n) and if Y_i = ∑_{j∈T_i} X_j, then
\[
(Y_1,\ldots,Y_m) \sim D(Q_1,\ldots,Q_m).
\]
In particular X_j ∼ β_{q_j,\,Q-q_j} where Q = ∑_{i=1}^{n} q_i.

Proposition 2.6. If Y ∼ D(a_0, ..., a_d) and if, conditionally on Y, the random variable X ∈ {e_0, ..., e_d} has the Bernoulli distribution ∑_{k=0}^{d} Y_k δ_{e_k}, then Pr(X = e_k) = a_k/a and
\[
Y \mid X = e_k \ \sim\ D(a_0,\ldots,a_{k-1},a_k+1,a_{k+1},\ldots,a_d).
\]


Proof. We can write
\[
\Pr(X=e_k) = E(\mathbf{1}_{X=e_k}) = E\bigl(E(\mathbf{1}_{X=e_k}\mid Y)\bigr) = E(Y_k) = \frac{a_k}{a}.
\]
Finally
\[
\frac{a_k}{a}\,E\bigl(Y_0^{n_0}\cdots Y_d^{n_d}\mid X=e_k\bigr)
= \Pr(X=e_k)\,E\bigl(Y_0^{n_0}\cdots Y_d^{n_d}\mid X=e_k\bigr)
= E\bigl(Y_0^{n_0}\cdots Y_d^{n_d}\,\mathbf{1}_{X=e_k}\bigr)
= E\bigl(Y_0^{n_0}\cdots Y_d^{n_d}\,E(\mathbf{1}_{X=e_k}\mid Y)\bigr)
= E\bigl(Y_0^{n_0}\cdots Y_d^{n_d}\,Y_k\bigr)
= \frac{1}{(a)_{n_0+\cdots+n_d+1}}\,(a_0)_{n_0}\cdots(a_{k-1})_{n_{k-1}}(a_k)_{n_k+1}(a_{k+1})_{n_{k+1}}\cdots(a_d)_{n_d},
\]
and therefore
\[
E\bigl(Y_0^{n_0}\cdots Y_d^{n_d}\mid X=e_k\bigr)
= \frac{1}{(a+1)_{n_0+\cdots+n_d}}\,(a_0)_{n_0}\cdots(a_{k-1})_{n_{k-1}}(a_k+1)_{n_k}(a_{k+1})_{n_{k+1}}\cdots(a_d)_{n_d},
\]
which implies the result.

Comment. If X_1, ..., X_N are iid with the Bernoulli distribution ∑_{k=0}^{d} Y_k δ_{e_k} when conditioned on Y ∼ D(a_0, ..., a_d), then S_N = X_1 + ··· + X_N conditioned on Y has the multinomial distribution
\[
\Bigl(\sum_{k=0}^{d}Y_k\delta_{e_k}\Bigr)^{*N} = \sum_{i_0+\cdots+i_d=N}\binom{N}{i_0,\ldots,i_d}\,Y_0^{i_0}\cdots Y_d^{i_d}\,\delta_{i_0e_0+\cdots+i_de_d}.
\]
Denote for a while \vec{a} = (a_0, ..., a_d). Then Proposition 2.6 implies that
\[
Y\mid S_N \ \sim\ D(\vec{a}+S_N),
\]
as can be seen by induction on N.
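The following sketch is not in the original notes; it is a small empirical illustration of the conjugacy statement Y | S_N ∼ D(\vec{a} + S_N), assuming NumPy. It conditions on one particular count vector by rejection, so the prior parameters, N, and the target counts below are arbitrary small choices.

```python
# Empirical check of Y | S_N ~ D(a + S_N) for a small N, by rejection.
import numpy as np

rng = np.random.default_rng(2)
a = np.array([1.0, 2.0, 0.5])          # prior parameters (a_0, a_1, a_2)
N = 4
target = np.array([2, 1, 1])           # condition on these category counts
M = 200_000

Y = rng.dirichlet(a, size=M)                               # Y ~ D(a)
counts = np.array([rng.multinomial(N, y) for y in Y])      # S_N given Y
keep = (counts == target).all(axis=1)

post = a + target                                          # D(a + S_N)
print("kept samples       :", int(keep.sum()))
print("empirical E(Y|S_N) :", Y[keep].mean(axis=0))
print("posterior mean     :", post / post.sum())
```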

The next proposition shows that a Dirichlet distribution can generate a non-homogeneous Markov chain:

Proposition 2.7. If (Y_1, ..., Y_n) ∼ D(p_1, ..., p_n) and if F_k = Y_1 + ··· + Y_k, then (F_k)_{k=0}^{n} is a Markov chain. Its transition kernel is such that for 1 ≤ k ≤ n − 1, conditionally on F_{k−1}, we have F_k = F_{k−1} + (1 − F_{k−1})B_k where
\[
B_k \sim \beta_{p_k,\,p_{k+1}+\cdots+p_n}.
\]

Proof. We start from independent random variables X_1, ..., X_n such that X_k ∼ γ_{p_k,a} for all k = 1, ..., n, and define S_k = X_1 + ··· + X_k and q_k = p_1 + ··· + p_k. Theorem 1.2 says that
\[
\frac{S_1}{S_2},\ \frac{S_2}{S_3},\ \ldots,\ \frac{S_{n-1}}{S_n},\ S_n
\]
are independent random variables with S_k/S_{k+1} ∼ β_{q_k,p_{k+1}} and S_n ∼ γ_{q_n,a}. Denoting Y_i = X_i/S_n we get that
\[
\frac{F_1}{F_2},\ \frac{F_2}{F_3},\ \ldots,\ \frac{F_{n-1}}{F_n} = F_{n-1},\ S_n
\]
are independent. As a consequence (F_k)_{k=0}^{n} is a Markov chain, and identifying the conditional distribution of F_k given F_{k−1} yields the announced transition kernel.

The next two propositions give two useful geometrical examples of Dirichlet distributions.

Proposition 2.7 (Uniform distribution on the Euclidean sphere). If E = R^n has its natural Euclidean structure, let S be the unit sphere of E and denote by μ(ds) the uniform measure on S, namely the unique probability on S which is invariant under all orthogonal transformations s ↦ u(s) where u is in the orthogonal group O(E). Let Z = (Z_1, ..., Z_n) ∼ N(0, I_n) be a standard Gaussian variable in E. Then
\[
\Theta = (\Theta_1,\ldots,\Theta_n) = \frac{Z}{\|Z\|}
\]
is uniform and (Θ_1², ..., Θ_n²) ∼ D(1/2, ..., 1/2).

Proposition 2.8 (Uniform distribution on the Hermitian sphere). If H = C^n has its natural Hermitian structure, let S be the unit sphere of H and denote by μ(ds) the uniform measure on S, namely the unique probability on S which is invariant under all unitary transformations s ↦ u(s) where u is in the unitary group U(H). Let Z = (Z_1, ..., Z_n) ∼ N(0, I_n) be a standard Gaussian variable in H. Then
\[
\Theta = (\Theta_1,\ldots,\Theta_n) = \frac{Z}{\|Z\|}
\]
is uniform and (|Θ_1|², ..., |Θ_n|²) ∼ D(1, ..., 1) is uniform on the tetrahedron E_n.

Proof of Proposition 2.7: If u ∈ O(E) then u(Z) ∼ Z since the density of Y = u(Z) is
\[
e^{-\|u^{*}y\|^{2}/2}\,|\det u|\,\frac{dy}{(2\pi)^{n/2}} = e^{-\|y\|^{2}/2}\,\frac{dy}{(2\pi)^{n/2}}
\]
since u preserves the norm and det u = ±1. Hence u(Z) ∼ Z for all u ∈ O(E), hence u(Z/‖Z‖) ∼ Z/‖Z‖. Since the invariant probability is unique, Θ = Z/‖Z‖ is uniform. Now
\[
(\Theta_1^{2},\ldots,\Theta_n^{2}) = \frac{1}{Z_1^{2}+\cdots+Z_n^{2}}\,(Z_1^{2},\ldots,Z_n^{2}).
\]
Since Z_1², ..., Z_n² are independent with the same gamma distribution γ_{1/2,2}, Theorem 2.1 proves the desired result.

Proof of Proposition 2.8: The important point is that a standard Gaussian variable Z_k in C has the form X_k + iY_k where X_k, Y_k are N(0, 1) and independent. The proof is quite similar to the proof of Proposition 2.7 and we get
\[
(|\Theta_1|^{2},\ldots,|\Theta_n|^{2}) = \frac{1}{X_1^{2}+Y_1^{2}+\cdots+X_n^{2}+Y_n^{2}}\,(X_1^{2}+Y_1^{2},\ldots,X_n^{2}+Y_n^{2}) \sim D(1,\ldots,1)
\]
by amalgamation.
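The following sketch is not in the original notes; it is a quick Monte Carlo illustration of Proposition 2.7, assuming NumPy, with an arbitrary dimension n: the squared coordinates of a uniform point on the Euclidean sphere are compared with direct D(1/2, ..., 1/2) samples.

```python
# Squared coordinates of Z/||Z||, Z standard Gaussian in R^n, vs D(1/2,...,1/2).
import numpy as np

rng = np.random.default_rng(3)
n, N = 4, 300_000

Z = rng.standard_normal((N, n))
Theta2 = Z**2 / (Z**2).sum(axis=1, keepdims=True)    # squared coordinates
D = rng.dirichlet([0.5] * n, size=N)                 # direct Dirichlet samples

# compare a few moments of the first coordinate
for k in (1, 2, 3):
    print(f"E(Theta_1^{2*k}):", (Theta2[:, 0]**k).mean(),
          " Dirichlet:", (D[:, 0]**k).mean())
```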

Exercise 2.1. If Y ∼ D(a_0, ..., a_d), what is the covariance matrix of Y? (Hint: use Proposition 2.2.)

Exercise 2.2. If X_n ∼ β_{p/n,q/n}, what is the limiting distribution of (X_n^{1/n}, (1 − X_n)^{1/n}) when n → ∞? (Hint: use Proposition 2.3.)

Exercise 2.3. If Y ∼ D(a_0, ..., a_d), what is the distribution of Y_0 + Y_1? What is the distribution of (Y_0, Y_1)?

Exercise 2.4. Let I = {1, ..., m}, J = {1, ..., p}. Consider a random matrix X with Dirichlet distribution
\[
(X_{ij})_{i\in I,\,j\in J} \sim D\bigl((q_{ij})_{i\in I,\,j\in J}\bigr).
\]
Consider Q_i = ∑_{j=1}^{p} q_{ij} as well as X_{i·} = ∑_{j=1}^{p} X_{ij}, Y = (X_{1·}, ..., X_{m·}) ∼ D(Q_1, ..., Q_m) and Z = (X_{ij}/X_{i·}). Denote R_j = ∑_{i=1}^{m} q_{ij}. Show that YZ is Dirichlet distributed and express its parameters in terms of (R_1, ..., R_p).

Exercise 2.5. If Y ∼ D(a_0, a_1, a_2), what is the conditional distribution of Y_1 given Y_2? Hint: use the definition of the Dirichlet distribution.


3 The T_c transform of a distribution on a tetrahedron

In the sequel, if f = (f_0, ..., f_d) and x = (x_0, ..., x_d) are in R^{d+1} we write ⟨f, x⟩ = ∑_{i=0}^{d} f_i x_i and we denote
\[
U_{d+1} = \{f=(f_0,\ldots,f_d)\in\mathbb{R}^{d+1}\,;\ f_0>0,\ldots,f_d>0\}.
\]
Let X = (X_0, ..., X_d) be a random variable on E_{d+1} and let c > 0. The T_c transform of X is the following function on U_{d+1}:
\[
T_c(X)(f) = E\bigl(\langle f,X\rangle^{-c}\bigr).
\]
Its existence is clear from T_c(X)(f) ≤ (min_i f_i)^{−c} < ∞. It satisfies T_c(X)(λf) = λ^{−c}T_c(X)(f). The explicit calculation of T_c(X) is easy in some rare cases, including the Dirichlet case D(a_0, ..., a_d) when c = a = a_0 + ··· + a_d, and the Bernoulli case ∑_{i=0}^{d} p_i δ_{e_i}. The choice of a proper c is generally important when using the T_c transform. For d = 1, knowing the T_c transform is equivalent to knowing the function t ↦ E((1 − tX)^{−c}) on (−∞, 1) when X is a random variable valued in [0, 1], since
\[
T_c\bigl((1-X,X)\bigr)(1,1-t) = E\bigl((1-tX)^{-c}\bigr).
\]

The T_c transform is a tool which is in general better adapted to the study of distributions on the tetrahedron than the Laplace transform E(exp(−⟨f, X⟩)). Its knowledge gives a kind of Cauchy-Stieltjes transform of ⟨f, X⟩ when f ∈ R^{d+1}, since for s > −min_i f_i we have
\[
E\!\left(\frac{1}{(s+\langle f,X\rangle)^{c}}\right) = T_c(X)(s+f_0,\ldots,s+f_d).
\]
For y_i > −1 for all i = 0, ..., d we have similarly
\[
E\!\left(\frac{1}{(1+\langle y,X\rangle)^{c}}\right) = T_c(X)(1+y_0,\ldots,1+y_d).
\]

Example: the Bernoulli distribution. If X ∼ ∑_{k=0}^{d} p_k δ_{e_k} with p_k ≥ 0 and ∑_{k=0}^{d} p_k = 1, then
\[
T_c(X)(f_0,\ldots,f_d) = \sum_{k=0}^{d}\frac{p_k}{f_k^{c}}.
\]

The next theorem gathers the main properties of the T_c transform. It shows for instance that T_c(X) characterizes the distribution of X, and gives in (10) a crucial probabilistic interpretation of the product T_a(X_0)T_b(X_1) when X_0 and X_1 are independent random variables valued in E_{d+1}.

Theorem 3.1:

1. If X and Z are random variables on E_{d+1} and if there exists c > 0 such that T_c(X)(f) = T_c(Z)(f) for all f ∈ U_{d+1}, then X ∼ Z.

2. If k is a non-negative integer and if H = −(∂/∂f_0 + ··· + ∂/∂f_d), then
\[
H^{k}T_c(X) = (c)_k\,T_{c+k}(X),\qquad (7)
\]
where (c)_n is the Pochhammer symbol defined by (c)_0 = 1 and (c)_{n+1} = (c)_n(c + n).

3. If (a_0, ..., a_d) ∈ U_{d+1} with a = a_0 + ··· + a_d and if X ∼ D(a_0, ..., a_d), then
\[
T_a(X)(f) = f_0^{-a_0}\cdots f_d^{-a_d}.\qquad (8)
\]

4. Suppose that X_0, ..., X_r, Y are independent random variables such that X_i ∈ E_{d+1} for i = 0, ..., r and Y = (Y_0, ..., Y_r) ∈ E_{r+1} has Dirichlet distribution D(b_0, ..., b_r). Then for b = b_0 + ··· + b_r and for Z = X_0Y_0 + ··· + X_rY_r we have on U_{d+1}:
\[
T_b(Z)(f) = T_{b_0}(X_0)(f)\cdots T_{b_r}(X_r)(f).\qquad (9)
\]
In particular, if Y ∼ β(b_1, b_0) we have
\[
T_{b_0+b_1}\bigl((1-Y)X_0+YX_1\bigr) = T_{b_0}(X_0)\,T_{b_1}(X_1).\qquad (10)
\]

5. The probability of the face {x_0 = ··· = x_k = 0} is computable from the T_c transform:
\[
\lim_{f_0\to\infty}T_c(X)(f_0,\ldots,f_0,1,1,\ldots,1) = \Pr(X_0=X_1=\cdots=X_k=0),
\]
where f_0 occupies the first k + 1 coordinates.

Proof: For part 1, fix g ∈ R^{d+1}, set f_i = 1 − tg_i for t small enough, and expand t ↦ E(⟨f, X⟩^{−c}) in a neighborhood of t = 0. Since ⟨f, X⟩ = 1 − t⟨g, X⟩ we have
\[
T_c(X)(f) = E\bigl((1-t\langle g,X\rangle)^{-c}\bigr) = \sum_{n=0}^{\infty}\frac{(c)_n}{n!}\,E\bigl(\langle g,X\rangle^{n}\bigr)\,t^{n}.
\]
It follows from the hypothesis T_c(X) = T_c(Z) that E(⟨g, X⟩^n) = E(⟨g, Z⟩^n) for all n. Thus ⟨g, X⟩ ∼ ⟨g, Z⟩ since both are bounded random variables with the same moments. Since this is true for all g ∈ R^{d+1}, we have X ∼ Z. Formula (7) is easy to obtain by induction on k, using the fact that X_0 + ··· + X_d = 1. Let us give a proof of the standard formula (8) by the so-called beta-gamma algebra. It differs from the method of Proposition 2.1 in [4]. We write γ_c(dv) = e^{−v}v^{c−1}\mathbf{1}_{(0,∞)}(v)\,dv/Γ(c). Consider independent V_0, ..., V_d such that V_i ∼ γ_{a_i}, and define V = V_0 + ··· + V_d and X_i = V_i/V for all i = 0, ..., d. Recall that (X_0, ..., X_d) ∼ D(a_0, ..., a_d) is independent of V ∼ γ_a. Therefore
\[
E\!\left(\frac{1}{\langle f,X\rangle^{a}}\right)
= E\!\left(\int_0^{\infty}e^{-v\langle f,X\rangle}\,v^{a-1}\,\frac{dv}{\Gamma(a)}\right)
= E\!\left(e^{V-V\langle f,X\rangle}\right)
= E\!\left(e^{\sum_{i=0}^{d}(V_i-f_iV_i)}\right)
= \prod_{i=0}^{d}\frac{1}{f_i^{a_i}}.
\]
Formula (9) follows from (8) by replacing X, a_0, ..., a_d by Y, b_0, ..., b_r and f by (⟨f, X_0⟩, ..., ⟨f, X_r⟩). Using conditioning and the independence of X_0, ..., X_r we obtain
\[
T_b(Z)(f)
= E\!\left(E\Bigl(\bigl[\textstyle\sum_{j=0}^{r}Y_j\langle f,X_j\rangle\bigr]^{-b}\,\Big|\,X_0,\ldots,X_r\Bigr)\right)
= E\!\left(\prod_{j=0}^{r}\langle f,X_j\rangle^{-b_j}\right)
= \prod_{j=0}^{r}T_{b_j}(X_j)(f).
\]
Applying (9) to (Y_0, Y_1) = (1 − Y, Y) ∼ D(b_0, b_1) with Z = (1 − Y)X_0 + YX_1 leads to (10). Property 5 is obvious since the events {X_0 + ··· + X_k = 0} and {X_0 = X_1 = ··· = X_k = 0} coincide.

Let us give an example of the power of the Tc transform method:

Proposition 3.2: Let a_0, ..., a_d be positive numbers and denote a = a_0 + ··· + a_d. Let X, Y and B be three independent random variables, Dirichlet, beta and Bernoulli respectively, such that X ∼ D(a_0, ..., a_d) and B ∼ ∑_{i=0}^{d}(a_i/a)δ_{e_i} are valued in R^{d+1}, and Y ∼ β(1, a). Then
\[
X \sim X(1-Y)+BY.
\]

Proof: We prove it by taking X_0 = X, X_1 = B, b_1 = 1 and b_0 = a in (10). As seen before,
\[
T_1(B)(f) = \frac{1}{a}\left(\frac{a_0}{f_0}+\cdots+\frac{a_d}{f_d}\right).
\]
The trick for computing T_{1+a}(X) is to observe from (7) and (8) that
\[
T_{1+a}(X)(f) = \frac{-1}{a}\left(\sum_{i=0}^{d}\partial_{f_i}\right)\prod_{i=0}^{d}\frac{1}{f_i^{a_i}} = T_a(X)(f)\,T_1(B)(f).
\]
From (10) we also know that for Z = (1 − Y)X + YB we have T_{1+a}(Z) = T_a(X)T_1(B). Thus T_{1+a}(Z) = T_{1+a}(X), and part 1 of Theorem 3.1 implies X ∼ Z.
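The following sketch is not in the original notes; it is a minimal Monte Carlo check of the distributional identity of Proposition 3.2, assuming NumPy, with arbitrary parameters a_i.

```python
# Check X ~ (1-Y)X + BY with X ~ D(a), Y ~ beta(1, a), B a random vertex e_i
# chosen with probabilities a_i/a, all independent.
import numpy as np

rng = np.random.default_rng(4)
a = np.array([1.0, 2.0, 3.0])
atot, N = a.sum(), 300_000

X = rng.dirichlet(a, size=N)
Y = rng.beta(1.0, atot, size=N)[:, None]
B = np.eye(len(a))[rng.choice(len(a), size=N, p=a / atot)]   # random vertex
Z = (1 - Y) * X + Y * B

print("E(X)   :", X.mean(axis=0))
print("E(Z)   :", Z.mean(axis=0))
print("E(X^2) :", (X**2).mean(axis=0))
print("E(Z^2) :", (Z**2).mean(axis=0))
```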

Here is another example of the use of the T_c transform method; it gives a quick proof of a perpetuity result of Devroye et al. (1985).

Proposition 3.3: If X ∼ β(2, 2) is independent of Y ∼ β(1, 1), then
\[
X \sim Z = X(1-Y)\,\mathbf{1}_{Y\le1/2} + (1-XY)\,\mathbf{1}_{Y\ge1/2} = X(1-Y) + (1-X)\,\mathbf{1}_{Y\ge1/2}.
\]

Proof: For f_0, f_1 > 0 we write
\[
f_0(1-Z)+f_1Z = (1-X)\bigl(f_0\,\mathbf{1}_{Y\le1/2}+f_1\,\mathbf{1}_{Y\ge1/2}\bigr) + X\bigl(f_0Y+f_1(1-Y)\bigr)
\]
and, using (8) conditionally on Y, we get
\[
E\bigl((f_0(1-Z)+f_1Z)^{-4}\bigr)
= E\Bigl(\bigl(f_0\,\mathbf{1}_{Y\le1/2}+f_1\,\mathbf{1}_{Y\ge1/2}\bigr)^{-2}\bigl(f_0Y+f_1(1-Y)\bigr)^{-2}\Bigr)
\]
\[
= \frac{1}{f_0^{2}}\int_0^{1/2}\frac{dy}{(f_0y+f_1(1-y))^{2}} + \frac{1}{f_1^{2}}\int_{1/2}^{1}\frac{dy}{(f_0y+f_1(1-y))^{2}}
= \frac{1}{f_0^{2}f_1^{2}}
= E\bigl((f_0(1-X)+f_1X)^{-4}\bigr).
\]

Exercise 3.1. If Y = (Y_1, ..., Y_n) ∼ D(q, ..., q) is Dirichlet on the tetrahedron E_n and if d_1, ..., d_n are real numbers, denote Z = Y_1d_1 + ··· + Y_nd_n.

1. Compute S_{qn}(Z)(t) = E\bigl(\frac{1}{(1-tZ)^{qn}}\bigr) when |t| is small enough (Hint: use (8) by taking f_i = 1 − td_i).

2. If q = 1 the function t ↦ S_n(Z)(t) is a rational fraction: expand it as a sum of partial fractions, and use the result to express S_n(Z)(t) as a power series in t. Prove that
\[
S_n(Z)(t) = \sum_{k=0}^{\infty}\frac{(n)_k}{k!}\,E(Z^{k})\,t^{k}
\]
and compute E(Z^k) for all integers k.

Exercise 3.2. Let 1 ≤ m < n and consider the random unitary matrix U = [A, B] where the block A has n rows and m columns. The distribution of U is the uniform one on U(n). Let D = diag(d_1, ..., d_n) where the d_i are real numbers. The study of the random (m, m) matrix A*DA is a challenging problem (what is the distribution of its eigenvalues, for instance?). Denote by Z_1, ..., Z_m the diagonal of A*DA. Give the moments of Z_j (Hint: use Proposition 2.8 and Exercise 3.1 part 2).

4 The Dirichlet random measure

4.1 Definition of a Dirichlet random measure

The aim of this section is to construct the Dirichlet random measures on a locally compact space Ω with a countable basis, equipped with its Borel field B and with a positive bounded measure α of total mass a, not necessarily equal to 1.


Definition: A random probability P on (Ω, B) is said to be a Dirichlet random probability with parameter α if for all m and for any partition (T_1, ..., T_m) of Ω we have
\[
(P(T_1),\ldots,P(T_m)) \sim D\bigl(\alpha(T_1),\ldots,\alpha(T_m)\bigr).\qquad (11)
\]
For instance, if Ω = {1, ..., n} is a finite set and if α({i}) = a_i > 0, Corollary 2.5 shows that if X ∼ D(a_1, ..., a_n) then P({i}) = X_i defines a random probability measure such that property (11) is satisfied. The construction that we give is the one used by Feigin and Tweedie (1987); it is not as intuitive as Ferguson's (1973), but it gives in one shot a powerful representation of this law D(α) of a random probability measure. This approach gives an immediate proof of a somewhat paradoxical fact: if P ∼ D(α) then almost surely P is purely atomic, even if α has no atoms.

4.2 A principle for Markov chains

We begin with a general principle about Markov chains. See for instance Chamayou and Letac (1991) and Propp and Wilson (1995):

Theorem 4.1: Let E be a locally compact space with countable basis and its Borel σ-field, and denote by C the set of continuous maps f : E → E endowed with the smallest σ-field such that f ↦ f(x_0) is measurable for any x_0 ∈ E. Let ν be a probability on C, and let F_1, ..., F_n, ... be iid in C with common distribution ν. Consider
\[
W_n(x) = F_n\circ F_{n-1}\circ\cdots\circ F_1(x)\qquad\text{and}\qquad Z_n(x) = F_1\circ F_2\circ\cdots\circ F_n(x).
\]
Assume that almost surely Z = lim_{n→∞} Z_n(x) exists and does not depend on x. Then the distribution π of Z is a stationary distribution of the Markov chain (W_n(x))_{n≥0}, and this chain has only one stationary distribution. In particular, if F ∈ C and X ∈ E are independent with F ∼ ν, then F(X) ∼ X if and only if X ∼ π.

Proof: If g : E → R is continuous and bounded, the map x ↦ ∫_C g(f(x))ν(df) is continuous and bounded on E (by dominated convergence). Write π_n^{x_0} for the common distribution of W_n(x_0) and Z_n(x_0). For n ≥ 1 we have
\[
\int_E g(x)\,\pi_n^{x_0}(dx) = \int_{E\times C}g(f(x))\,\pi_{n-1}^{x_0}(dx)\,\nu(df).
\]
From the hypothesis, Z_n(x_0) → Z almost surely implies
\[
\int_E g(x)\,\pi_n^{x_0}(dx)\ \to_{n\to\infty}\ \int_E g(x)\,\pi(dx),
\qquad
\int_E\left(\int_C g(f(x))\,\nu(df)\right)\pi_{n-1}^{x_0}(dx)\ \to_{n\to\infty}\ \int_{E\times C}g(f(x))\,\pi(dx)\,\nu(df),
\]
which proves the stationarity of π. If π_0 ∼ X_0 were another stationary distribution, one would get X_0 ∼ W_n(X_0) ∼ Z_n(X_0). But we have just seen that the limiting distribution of Z_n(x_0) is π. Therefore π_0 = π.

Example: Let a_0, ..., a_d be positive numbers and a = a_0 + ··· + a_d. Let (Y_n)_{n≥1} and (B_n)_{n≥1} be independent beta and Bernoulli random variables such that B_n ∼ ∑_{i=0}^{d}(a_i/a)δ_{e_i} are valued in R^{d+1} and Y_n ∼ β(1, a). Consider the Markov chain X_n = (1 − Y_n)X_{n−1} + Y_nB_n. This chain is of the above type, where E is the tetrahedron E_{d+1} and the random maps F_n are given by F_n(x) = (1 − Y_n)x + Y_nB_n. Therefore
\[
W_1(x) = (1-Y_1)x + B_1Y_1,
\]
\[
W_2(x) = (1-Y_2)(1-Y_1)x + (1-Y_2)Y_1B_1 + Y_2B_2,
\]
\[
W_3(x) = (1-Y_3)(1-Y_2)(1-Y_1)x + (1-Y_3)(1-Y_2)Y_1B_1 + (1-Y_3)Y_2B_2 + Y_3B_3,
\]
while
\[
Z_1(x) = (1-Y_1)x + B_1Y_1,
\]
\[
Z_2(x) = (1-Y_1)(1-Y_2)x + (1-Y_1)Y_2B_2 + Y_1B_1,
\]
\[
Z_3(x) = (1-Y_1)(1-Y_2)(1-Y_3)x + (1-Y_1)(1-Y_2)Y_3B_3 + (1-Y_1)Y_2B_2 + Y_1B_1.
\]
Clearly
\[
Z_n(x) = x\prod_{j=1}^{n}(1-Y_j) + \sum_{k=1}^{n}B_kY_k\prod_{j=1}^{k-1}(1-Y_j)\ \to_{n\to\infty}\ Z = \sum_{k=1}^{\infty}B_kY_k\prod_{j=1}^{k-1}(1-Y_j).
\]
From Proposition 3.2 we can claim that π = D(a_0, ..., a_d) is a stationary distribution of this Markov chain, and from Theorem 4.1 we can claim that Z ∼ π, or
\[
\sum_{k=1}^{\infty}B_kY_k\prod_{j=1}^{k-1}(1-Y_j) \sim D(a_0,\ldots,a_d),\qquad (12)
\]
an important result for the next section.

Exercise 4.2.1. If X and Y are independent such that X ∼ β_{p,p+q} and Y ∼ β_{p,q}, show that X ∼ Y(1 − X) (Hint: compute the Mellin transforms of X, 1 − X and Y with the help of Section 1). If (Y_k)_{k≥1} are iid such that Y_k ∼ β_{p,q}, consider the Markov chain (X_n)_{n≥0} on (0, 1) defined by X_n = Y_n(1 − X_{n−1}). By applying Theorem 4.1 to F_n(x) = Y_n(1 − x), compute W_n(x) and Z_n(x). Show that Z = lim Z_n(x) exists, express it in terms of the (Y_k)_{k≥1} and give its distribution π (source: Chamayou and Letac (1991), pages 19-20).

Exercise 4.2.2. If X and Y are independent such that X ∼ β^{(2)}_{p,q} and Y ∼ β^{(2)}_{p,p+q}, show that X ∼ Y(1 + X) (Hint: compute the Mellin transforms of X, 1 + X and Y with the help of Section 1). If (Y_k)_{k≥1} are iid such that Y_k ∼ β^{(2)}_{p,p+q}, consider the Markov chain (X_n)_{n≥0} on (0, ∞) defined by X_n = Y_n(1 + X_{n−1}). By applying Theorem 4.1 to F_n(x) = Y_n(1 + x), compute W_n(x) and Z_n(x). Show that Z = lim Z_n(x) exists, express it in terms of the (Y_k)_{k≥1} and give its distribution π (source: Chamayou and Letac (1991), page 21).

Exercise 4.2.3. If X and Y are independent such that X ∼ β_{1/3,2/3} and Y ∼ β_{1/2,1/3}, show that X ∼ Y(1 − X)² (Hint: compute the Mellin transforms of X, 1 − X and Y with the help of Section 1). If (Y_k)_{k≥1} are iid such that Y_k ∼ β_{1/2,1/3}, consider the Markov chain (X_n)_{n≥0} on (0, 1) defined by X_n = Y_n(1 − X_{n−1})². Admitting that Z = lim Z_n(x) exists, give its distribution π (source: Chamayou and Letac (1991), pages 27-29, where the existence of Z is proved).

4.3 Construction of the Dirichlet random measure

Theorem 4.2: Let Ω be locally compact with a countable topological basis, equipped with its Borel σ-field, and let α be a positive bounded measure on Ω of total mass a. Let X_1, Y_1, ..., X_n, Y_n, ... be independent random variables such that Y_j ∼ β_{1,a} and such that X_j is valued in Ω with distribution α/a. Then the distribution of the random probability
\[
P = \sum_{k=1}^{\infty}\delta_{X_k}\,Y_k\prod_{j=1}^{k-1}(1-Y_j)\qquad (13)
\]
is Dirichlet with parameter α.

Proof: We fix a partition (T_1, ..., T_m) of Ω and define B_k = (1, 0, ..., 0) = e_1 if X_k ∈ T_1, B_k = (0, 1, ..., 0) = e_2 if X_k ∈ T_2, ..., B_k = (0, 0, ..., 1) = e_m if X_k ∈ T_m. Here e_1, ..., e_m is the canonical basis of R^m, so B_k is Bernoulli distributed in the tetrahedron E_m. The probability of {B_k = (1, 0, ..., 0) = e_1} is Pr(X_k ∈ T_1) = α(T_1)/a. More generally B_k ∼ ∑_{i=1}^{m}(α(T_i)/a)δ_{e_i}. Clearly
\[
(P(T_1),\ldots,P(T_m)) = \sum_{k=1}^{\infty}B_kY_k\prod_{j=1}^{k-1}(1-Y_j)
\]
and (12) shows that (P(T_1), ..., P(T_m)) ∼ D(a_1, ..., a_m) where a_i = α(T_i).
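The following sketch is not in the original notes; it is a truncated implementation of the stick-breaking series (13), assuming NumPy, with base measure α = a × Uniform(0, 1). The total mass a, truncation level K and partition interval are arbitrary; by (11) the mass P(T) of an interval T should be approximately beta distributed.

```python
# Truncated stick-breaking construction of P ~ D(alpha), alpha = a*Uniform(0,1).
import numpy as np

rng = np.random.default_rng(5)
a, K, N = 3.0, 400, 20_000              # total mass, truncation, replicates

Yk = rng.beta(1.0, a, size=(N, K))
W = Yk * np.cumprod(np.hstack([np.ones((N, 1)), 1 - Yk[:, :-1]]), axis=1)
Xk = rng.uniform(0.0, 1.0, size=(N, K))  # atom locations, law alpha/a

# P(T) for T = (0, 1/4): sum of the weights whose atom falls in T
PT = np.where(Xk < 0.25, W, 0.0).sum(axis=1)

# by (11), P(T) ~ beta(alpha(T), a - alpha(T)) with alpha(T) = a/4
p, q = a / 4, 3 * a / 4
print("mean of P(T):", PT.mean(), " expected", p / (p + q))
print("var  of P(T):", PT.var(),  " expected", p * q / ((p + q)**2 * (p + q + 1)))
print("mean leftover (truncation) mass:", (1 - W.sum(axis=1)).mean())
```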

Comments:

1. Existence. Theorem 4.2 proves the existence of Dirichlet random probabilities.

2. Uniqueness. We are not going to prove uniqueness, in the sense that if we consider the set P(Ω) of all probabilities on Ω, there is only one distribution (denoted by D(α)) on P(Ω) such that if P ∼ D(α) then P satisfies the definition of a Dirichlet random probability governed by α given in Section 4.1. This uniqueness is carefully proved in the founding paper of Ferguson (1973). We take it for granted in all the sequel.

3. Atoms and support. As observed in Theorem 4.2, the Dirichlet random probability P ∼ D(α) is almost surely purely atomic, but the atoms are randomly placed at X_1, ..., X_n, .... Note that if A ⊂ Ω is such that α(A) = 0, then necessarily P(A) = 0 almost surely. If Ω is a bounded interval, for instance, and α is the Lebesgue measure, these atoms are dense in the interval Ω.

4. Image. If ϕ : Ω → Ω_1 is measurable, denote by ϕ_*α = α_1 the image of α by ϕ. Obviously if α(Ω) = a then α_1(Ω_1) = a. We leave to the reader the proof of the following statement:
\[
P \sim D(\alpha)\ \Rightarrow\ P_1 = \varphi_{*}P \sim D(\alpha_1).
\]
This is quite important: suppose that we want to study the random variable I = ∫_Ω ϕ(w)P(dw) where ϕ : Ω → R = Ω_1. Then α_1 = ϕ_*α is a bounded measure on R and we are led to the study of I = ∫_{−∞}^{∞} xP_1(dx) where P_1 ∼ D(α_1). If Ω is a real finite-dimensional linear space, the case where ϕ is a linear or an affine transformation is worth considering.

5. Dirichlet process. Suppose that α is a bounded measure on the real line, consider P ∼ D(α) and its distribution function
\[
F_P(t) = P((-\infty,t]).
\]
Then t ↦ F_P(t) is a Markov process (this can be proved in a way analogous to Proposition 2.7, which is actually the particular case of this statement when α = ∑_{k=1}^{n} p_kδ_k). For this reason, the Dirichlet random probability P is often called the Dirichlet process, a questionable term. It makes no sense if Ω is not an interval of the real line.

Exercise 4.3.1 (Yamato, 1984). If Y_1, ..., Y_k, ... are iid with Y_k ∼ β_{1,a}, denote
\[
P_k = Y_k\prod_{j=1}^{k-1}(1-Y_j)
\]
(see (13)). Compute E(Y_k²), E((1 − Y_k)²), E(P_k²) and E(∑_{k=1}^{∞} P_k²).

5 The random variable ∫_Ω g(w)P(dw) when P ∼ D(α)

Given a random Dirichlet measure on Ω governed by α and a real function g on Ω, our next task is to study I_g = ∫_Ω g(w)P(dw). If the image of α by g is α_1 on the real line and if P_1 ∼ D(α_1), this integral becomes I_g = ∫_{−∞}^{∞} xP_1(dx). We first solve the existence problem, and we next consider the difficult question: what is the distribution of I_g in terms of α? In this course, with a few exceptions, we only have time for the case where the mass of α is one. In this case, a powerful tool is the ordinary Cauchy-Stieltjes transform of a bounded measure on the real line.


5.1 Existence of ∫_Ω g(w)P(dw)

We begin with an obvious generalization of (8):

Theorem 5.1. Let α be a bounded measure on Ω of total mass a, and let P ∼ D(α). If f : Ω → (0, ∞) is such that there exist 0 < m ≤ M with m ≤ f(w) ≤ M for all w ∈ Ω, and if I_f = ∫_Ω f(w)P(dw), then
\[
E\!\left(\frac{1}{(I_f)^{a}}\right) = e^{-\int_{\Omega}\log f(w)\,\alpha(dw)}.\qquad (14)
\]

Proof. If f takes only a finite number of values, (14) is nothing but (8). In the general case there exists an increasing sequence (f_n), each f_n taking a finite number of values, such that lim f_n(w) = f(w) for all w. Therefore I_{f_n} → I_f and ∫_Ω log f_n(w)α(dw) → ∫_Ω log f(w)α(dw) by monotone convergence. Also I_{f_n} increases towards I_f, therefore I_{f_n}^{−a} decreases towards I_f^{−a}. Since E(1/(I_f)^a) ≤ 1/m^a is finite, monotone convergence (for decreasing sequences) can be applied to claim that the limit of E(1/(I_{f_n})^a) is E(1/(I_f)^a), and this ends the proof.

If α is a bounded measure of total mass a on Ω = R^d with bounded support, formula (14) may be quite sufficient for finding the distribution μ_α of X = ∫_{R^d} xP(dx) when P ∼ D(α). If we apply (14) to f(x) = 1 + t⟨y, x⟩, where t is small enough that 1 + t⟨y, x⟩ is positive on the support of α, we get a way to find the moments of ⟨y, X⟩ by expanding both sides of
\[
\int_{\mathbb{R}^d}\frac{\mu_\alpha(dx)}{(1+t\langle y,x\rangle)^{a}} = e^{-\int_{\mathbb{R}^d}\log(1+t\langle y,w\rangle)\,\alpha(dw)}.
\]
Knowing the moments of ⟨y, X⟩ gives the knowledge of the distribution of ⟨y, X⟩. Since we know it for all y ∈ R^d, we get the knowledge of the distribution of X.

Example: the measure α is a multiple of an arcsine distribution. Let Ω = (−1, 1) and let α be a times the arcsine probability:
\[
\alpha(dw) = \frac{a}{\pi}\,\frac{1}{\sqrt{1-w^{2}}}\,\mathbf{1}_{(-1,1)}(w)\,dw,\qquad
\frac{1}{a}\,\alpha((-1,x)) = \frac{1}{2}+\frac{1}{\pi}\arcsin x.\qquad (15)
\]
Let P ∼ D(α) and I = ∫ wP(dw). Let us apply Theorem 5.1 to f(w) = 1 + tw for −1 < t < 1; therefore m = 1 − |t| > 0. We now compute g(t) = −∫_{-1}^{1}\log(1+tw)\,\alpha(dw) by expanding in a power series in t, obtaining by a standard calculation:
\[
g(t) = a\sum_{n=1}^{\infty}\frac{t^{2n}}{2n}\,\frac{1}{\pi}\int_{-1}^{1}\frac{w^{2n}}{\sqrt{1-w^{2}}}\,dw = a\sum_{n=1}^{\infty}\frac{t^{2n}}{2n}\,\frac{(1/2)_n}{n!}.
\]
To complete the calculation introduce h(u) = ∑_{n=1}^{∞}\frac{u^{n}}{2n}\,\frac{(1/2)_n}{n!}. We get easily
\[
uh'(u)+\frac{1}{2} = \sum_{n=0}^{\infty}\frac{u^{n}}{2}\,\frac{(1/2)_n}{n!} = \frac{1}{2\sqrt{1-u}}.
\]
Therefore
\[
h(u) = \frac{1}{2}\int_0^{u}\left(\frac{1}{\sqrt{1-v}}-1\right)\frac{dv}{v} = \log\frac{2}{1+\sqrt{1-u}}.
\]
We finally get, since g(t) = a\,h(t²),
\[
\int_{-1}^{1}\frac{\mu_\alpha(dx)}{(1+tx)^{a}} = \left(\frac{2}{1+\sqrt{1-t^{2}}}\right)^{a} = {}_2F_1\!\left(\frac{a}{2},\frac{a+1}{2};a+1;t^{2}\right) = \sum_{n=0}^{\infty}\frac{(\frac{a}{2})_n(\frac{a+1}{2})_n}{n!\,(a+1)_n}\,t^{2n}
\]
by a miracle of the hypergeometric function. This implies that the even moments of μ_α are
\[
\int_{-1}^{1}x^{2n}\,\mu_\alpha(dx) = \frac{(2n)!}{(a)_{2n}}\times\frac{(\frac{a}{2})_n(\frac{a+1}{2})_n}{n!\,(a+1)_n} = \frac{(\frac{1}{2})_n}{(a+1)_n} = \int_0^{1}v^{n}\,\beta_{\frac{1}{2},\,a+\frac{1}{2}}(dv).
\]
The probability μ_α is obviously symmetric. Since its image by x ↦ v = x² is β_{1/2,\,a+1/2}(dv), we get at last that
\[
\mu_\alpha(dx) = \frac{1}{B(\frac{1}{2},a+\frac{1}{2})}\,(1-x^{2})^{a-\frac{1}{2}}\,\mathbf{1}_{(-1,1)}(x)\,dx.\qquad (16)
\]
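The following sketch is not in the original notes; it is a Monte Carlo check of the arcsine example, assuming NumPy: sampling I by a truncated stick-breaking sum with arcsine atoms and comparing its even moments with the formula (1/2)_n/(a+1)_n obtained above. The mass a and truncation level are arbitrary.

```python
# Even moments of I = int w P(dw) for the arcsine base measure of mass a.
import numpy as np

rng = np.random.default_rng(6)
a, K, N = 2.0, 400, 100_000

Yk = rng.beta(1.0, a, size=(N, K))
W = Yk * np.cumprod(np.hstack([np.ones((N, 1)), 1 - Yk[:, :-1]]), axis=1)
Xk = np.sin(np.pi * (rng.uniform(size=(N, K)) - 0.5))   # arcsine law, see (15)
I = (W * Xk).sum(axis=1)                                 # truncated I

def pochhammer(x, n):
    out = 1.0
    for k in range(n):
        out *= x + k
    return out

for n in (1, 2, 3):
    print(f"E(I^{2*n}) empirical:", (I**(2 * n)).mean(),
          " formula:", pochhammer(0.5, n) / pochhammer(a + 1, n))
```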

The next theorem solves the existence problem completely. We adopt the notation a_+ = max{0, a} when a is a real number.

Theorem 5.2: Let Ω be a complete metric space, let α be a positive bounded measure on Ω of total mass a, and let P ∼ D(α). If g is a positive measurable function on Ω, then ∫_Ω g(w)P(dw) < ∞ almost surely if and only if ∫_Ω (log g(w))_+ α(dw) < ∞.

Proof: We show first part ⇐. We write f(w) = (log g(w))_+. By definition
\[
\int_\Omega g(w)P(dw) \le \int_\Omega\max(1,g(w))P(dw) = \lim_{n\to\infty}\sum_{k=1}^{n}e^{f(X_k)}\,Y_k\prod_{j=1}^{k-1}(1-Y_j) \le \infty.
\]
Since 1 − Y_k ∼ β(a, 1), we have E(log(1 − Y_k)) = a∫_0^1 y^{a−1}\log y\,dy = −1/a, and the law of large numbers implies that almost surely
\[
\lim_{k\to\infty}\left(\prod_{j=1}^{k-1}(1-Y_j)\right)^{1/k} = e^{-1/a}.
\]
The fact that E(f(X_1)) < ∞ means that E(f(X_1)) = ∫_0^∞ \Pr(f(X_1) ≥ x)\,dx < ∞. As a consequence, for each ε > 0 we have ∑_{n=1}^{∞}\Pr(f(X_1) ≥ nε) < ∞, a fact that we prefer to write as follows:
\[
\sum_{n=1}^{\infty}\Pr\!\left(\frac{f(X_n)}{n}\ge\varepsilon\right) < \infty.
\]
From the Borel-Cantelli lemma, this implies that the set {n; f(X_n)/n ≥ ε} is almost surely finite, that is, lim sup_{n→∞} f(X_n)/n ≤ ε almost surely. Since this is true for all ε, and f ≥ 0, we have lim_{n→∞} f(X_n)/n = 0. In the same manner we have lim_{k→∞} (log Y_k)/k = 0 and lim_{k→∞} Y_k^{1/k} = 1. We now use the Cauchy criterion for convergence of a series with positive terms u_k: if lim sup_{k→∞} u_k^{1/k} = r < 1, then ∑_{k=1}^{∞} u_k converges. We apply it to
\[
u_k = e^{f(X_k)}\,Y_k\prod_{j=1}^{k-1}(1-Y_j).
\]
The above considerations show that lim_{k→∞} u_k^{1/k} = e^{−1/a} = r < 1. This proves part ⇐.

We now show part ⇒. We suppose that ∫_Ω(log g(w))_+ α(dw) = ∞, and we want to show that for all real x we have
\[
\Pr\!\left(\int_\Omega g(w)P(dw)\le x\right) = 0.
\]
A trick invented by Hannum et al. (1981) is the following. Consider a bounded function h and I_h = ∫_Ω h(w)P(dw), which exists since h is bounded. Then from Theorem 5.1 applied to f(w) = 1 + sh(w) we have
\[
E\!\left(\frac{1}{(1+sI_h)^{a}}\right) = e^{-\int_\Omega\log(1+s\,h(w))\,\alpha(dw)}.\qquad (17)
\]
Actually, as functions of s defined on the interval where −s\min_{w∈Ω}h(w) < 1, the two members of (17) are the Laplace transform of the random variable T_h^0 = UI_h, where U ∼ γ_{a,1} is independent of I_h. This comes from
\[
E\!\left(\frac{1}{(1+sI_h)^{a}}\right) = E\!\left(\int_0^{\infty}e^{-u-usI_h}\,\frac{u^{a-1}}{\Gamma(a)}\,du\right) = E\!\left(e^{-sUI_h}\right).
\]
Note that Pr(I_h ≤ 0) = Pr(T_h^0 ≤ 0) since I_h and UI_h have the same sign. We now repeat the process, replacing h by h − x, I_h by I_h − x and T_h^0 by T_h^x = U(I_h − x). Now we have
\[
\Pr(I_h\le x) = \Pr(T_h^x\le0),\qquad E\bigl(e^{-sT_h^x}\bigr) = e^{-\int_\Omega\log(1+s(h(w)-x))\,\alpha(dw)}\qquad (18)
\]
for s(x − \min h) < 1. We now choose a sequence of bounded positive functions g_n such that g_n(w) increases to g(w). We also denote f_n = (log g_n)_+, and we need the following inequality: for s(1 + x) < 1, s > 0 and x > 0 we have
\[
\log\bigl(1+s(g_n-x)\bigr) \ge f_n + \log s.
\]
This is an elementary fact: just discuss g_n ≤ 1 and g_n ≥ 1. From (18) we have
\[
\Pr(I_{g_n}\le x) = \Pr(T^x_{g_n}\le0) \le E\bigl(e^{-sT^x_{g_n}}\bigr)
\le \exp\!\left(-\int_\Omega(f_n(w)+\log s)\,\alpha(dw)\right)
= \frac{1}{s^{a}}\exp\!\left(-\int_\Omega f_n(w)\,\alpha(dw)\right)\ \to_{n\to\infty}\ 0
\]
since ∫_Ω f_n(w)α(dw) → ∫_Ω(log g(w))_+ α(dw) = ∞ by hypothesis. Since Pr(I_{g_n} ≤ x) → 0 and I_{g_n} ≤ I_g, we have Pr(I_g ≤ x) = 0.

The following corollary is an extension of the fundamental example following Theorem 4.1:

Corollary 5.3. Let α be a bounded measure on R^d with total mass a = α(R^d). Let Y_1, B_1, Y_2, B_2, ... be a sequence of independent random variables such that for all n, B_n in R^d has distribution α/a and Y_n has distribution β_{1,a}. Assume that ∫_{R^d}(log‖w‖)_+ α(dw) < ∞. Consider the Markov chain (X_n)_{n≥0} on R^d defined by
\[
X_n = (1-Y_n)X_{n-1} + Y_nB_n.
\]
Then this chain has a unique stationary distribution, which is the distribution of I = ∫_{R^d} wP(dw) where P ∼ D(α).

Proof. From Theorem 4.1 applied to E = R^d and to F_n(x) = (1 − Y_n)x + Y_nB_n, the stationary distribution is unique and is the distribution of
\[
Z = B_1Y_1 + B_2Y_2(1-Y_1) + B_3Y_3(1-Y_1)(1-Y_2) + \cdots.
\]
From Theorem 5.2, since ∫_{R^d}(log‖w‖)_+ α(dw) < ∞, the random integral I = ∫_{R^d} wP(dw) converges absolutely when P ∼ D(α). Furthermore Theorem 4.2, through (13), says that the random probability P has the distribution of
\[
Q = \sum_{k=1}^{\infty}\delta_{B_k}\,Y_k\prod_{j=1}^{k-1}(1-Y_j).
\]
Hence
\[
I = \int_{\mathbb{R}^d}w\,P(dw) \sim Z = \int_{\mathbb{R}^d}w\,Q(dw) = \sum_{k=1}^{\infty}B_kY_k\prod_{j=1}^{k-1}(1-Y_j).
\]

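The following sketch is not in the original notes; it illustrates Corollary 5.3, assuming NumPy, for d = 1 with the arbitrary choice α = a × Uniform(0, 1): the long-run output of the Markov chain is compared with independent stick-breaking copies of I.

```python
# Markov chain X_n = (1 - Y_n) X_{n-1} + Y_n B_n vs stick-breaking copies of I.
import numpy as np

rng = np.random.default_rng(7)
a, burn, N = 2.5, 2_000, 100_000

# --- Markov chain of Corollary 5.3 ---------------------------------------
x, chain = 0.5, np.empty(N)
for n in range(burn + N):
    y = rng.beta(1.0, a)
    b = rng.uniform()
    x = (1 - y) * x + y * b
    if n >= burn:
        chain[n - burn] = x

# --- independent stick-breaking copies of I (truncated at K terms) --------
K = 300
Yk = rng.beta(1.0, a, size=(N, K))
W = Yk * np.cumprod(np.hstack([np.ones((N, 1)), 1 - Yk[:, :-1]]), axis=1)
I = (W * rng.uniform(size=(N, K))).sum(axis=1)

for k in (1, 2, 3):
    print(f"moment {k}: chain {np.mean(chain**k):.4f}"
          f"  stick-breaking {np.mean(I**k):.4f}")
```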

Exercise 5.1.1 (Cauchy distribution, Yamato 1984). If z = p + iq is a complex number with q > 0, define the Cauchy distribution on the real line
\[
c_z(dx) = \frac{1}{\pi}\,\frac{q\,dx}{(x-p)^{2}+q^{2}}.
\]
Show that if t > 0
\[
\int_{-\infty}^{\infty}e^{ixt}\,c_z(dx) = e^{izt}
\]
(Hint: prove that if X ∼ c_i then E(e^{iXt}) = e^{−|t|}, and that qX + p ∼ c_z). What is the value of this integral if t < 0? Consider now the random Dirichlet distribution P ∼ D(ac_z). Using Theorem 5.2, show that I = ∫_{−∞}^{∞} xP(dx) converges almost surely. Express I using (13). Using the fact that ∑_{k=1}^{∞} P_k = 1 when P_k = Y_k∏_{j=1}^{k-1}(1 − Y_j), prove that I ∼ c_z for all a (Hint: consider its characteristic function E(e^{itI})).

5.2 Some tutorial on the Cauchy Stieltjes transform

Among the various extensions of Theorem 5.1, the one considered in Theorem 5.9 below is particularly attractive. To introduce it we use some mathematical results described in this section. We consider the analytic function z ↦ log z defined on the complex plane minus the negative part of the real axis by log z = log|z| + iθ when z = |z|e^{iθ} with θ ∈ (−π, π). The derivative of this analytic function is 1/z. Furthermore, the exponential of a log z is denoted by z^a. With these conventions, if μ is a probability on the real line, we define the Cauchy-Stieltjes transform of μ of type c > 0 as the function of z defined on the upper complex half plane ℑz > 0 by
\[
z\mapsto\int_{-\infty}^{\infty}\frac{\mu(dw)}{(w-z)^{c}}.
\]
This integral converges since
\[
\left|\frac{1}{(w-z)^{c}}\right| \le \frac{1}{|\Im z|^{c}}.
\]
It is sometimes called the generalized Cauchy-Stieltjes transform. If c = 1, one speaks simply of the Cauchy-Stieltjes transform of μ. This section gives information mainly about the case c = 1.

Theorem 5.4. Let H^+ = {z; ℑz > 0} be the upper half complex plane. Let μ be a probability on R and consider the map on H^+
\[
z\mapsto s_\mu(z) = \int_{-\infty}^{\infty}\frac{\mu(dw)}{w-z},
\]
called the Cauchy-Stieltjes transform of μ. Then s_μ(z) exists and is valued in H^+, and μ ↦ s_μ characterizes μ: more specifically, if a and b are not atoms of μ we have
\[
\lim_{y\downarrow0}\frac{1}{\pi}\int_a^{b}\Im s_\mu(x+iy)\,dx = \mu([a,b]).\qquad (19)
\]
Finally, if X ∼ μ and aX + b ∼ μ_1 with a > 0, then s_{μ_1}(z) = \frac{1}{a}\,s_\mu\!\bigl(\frac{z-b}{a}\bigr).

Proof. Clearly
\[
\Re s_\mu(x+iy) = \int_{-\infty}^{\infty}\frac{(w-x)\,\mu(dw)}{(w-x)^{2}+y^{2}},\qquad
\Im s_\mu(x+iy) = y\int_{-\infty}^{\infty}\frac{\mu(dw)}{(w-x)^{2}+y^{2}},
\]
which proves the existence and the fact that s_μ(x + iy) ∈ H^+ for y > 0. Now recall that for y > 0 the characteristic function of the Cauchy distribution
\[
c_z(dw) = c_{x+iy}(dw) = \frac{1}{\pi}\,\frac{y\,dw}{(w-x)^{2}+y^{2}}
\]
is
\[
\varphi_{c_{x+iy}}(t) = \int_{-\infty}^{\infty}e^{itw}\,c_{x+iy}(dw) = e^{-y|t|+ixt},
\]
that is, e^{itz} for t > 0. This implies that for fixed y > 0, x ↦ (1/π)ℑs_μ(x + iy) is the density of a probability μ_y with characteristic function
\[
\varphi_{\mu_y}(t) = e^{-y|t|}\,\varphi_\mu(t).
\]
Clearly μ is determined by μ_y. One observes that lim_{y→0} μ_y = μ for tight convergence. With the Paul Lévy theorem we get (19). The last formula is obvious.

Corollary 5.5 (Cauchy Stieltjes transform of type c when c is an integer). If μ is a probability on R with Cauchy-Stieltjes transform s_μ and if c is a positive integer, then the Cauchy-Stieltjes transform of type c of μ is
\[
\int_{-\infty}^{\infty}\frac{\mu(dw)}{(w-z)^{c}} = \frac{s_\mu^{(c-1)}(z)}{(c-1)!}
\]
and it characterizes μ.

Proof. The only thing to prove is that it characterizes μ. We show it by induction on c. For c = 1 this is Theorem 5.4. Suppose the statement is true for c and that we know f(z) = ∫_{−∞}^{∞}μ(dw)/(w−z)^{c+1} = s_μ^{(c)}(z)/c!. Then s_μ^{(c−1)}(z) is the only primitive of c!\,f which vanishes when |z| → ∞ along the imaginary axis, so f determines the transform of type c, and the induction follows.

Remark. If c > 0 is not an integer, the fact that the Cauchy-Stieltjes transform of μ of type c, namely the function z ↦ f(z) = ∫_{−∞}^{∞}μ(dw)/(w−z)^{c} on the upper half plane H^+, characterizes μ is harder to prove and we will not do it here. The reading of Sumner (1949), which gives the proper references, is recommended: he proves for instance that when μ is concentrated on the positive half line then, denoting
\[
L_k(f)(z) = C_k\left(\frac{d}{dz}\right)^{k}\left(z^{2k+c-2}\left(\frac{d}{dz}\right)^{k-1}f(z)\right),
\qquad C_k = \frac{2^{c-1}(2k-1)!}{k!\,(k-2)!\,(c)_{2k-1}},
\]
then if a and b are not atoms of μ we have the following inversion formula:
\[
\mu([a,b]) = \lim_{k\to\infty}\int_a^{b}L_k(f)(-z)\,dz.
\]

The next three propositions of this section will give useful examples of Cauchy Stieltjes transforms.

Proposition 5.6 (The Cauchy Stieltjes transform of the Cauchy law). If p + iq is a complex number with q > 0, the Cauchy distribution c_{p+iq} on the real line is
\[
c_{p+iq}(dx) = \frac{1}{\pi}\,\frac{q\,dx}{(x-p)^{2}+q^{2}}.
\]
Its Cauchy-Stieltjes transform for ℑz > 0 is
\[
\int_{-\infty}^{\infty}\frac{c_{p+iq}(dw)}{w-z} = \frac{1}{-z+p-iq}.\qquad (20)
\]

Proof. One proves (20) for the standard Cauchy distribution c_i by computing the integral using partial fractions:
\[
\frac{1}{\pi}\int_{-\infty}^{\infty}\frac{dw}{(1+w^{2})(w-z)}
= \frac{1}{\pi}\lim_{T\to\infty}\int_{-T}^{T}\frac{dw}{(1+w^{2})(w-z)}
= \frac{1}{1+z^{2}}\,\frac{1}{\pi}\lim_{T\to\infty}\int_{-T}^{T}\left(\frac{1}{w-z}-\frac{z}{1+w^{2}}-\frac{w}{1+w^{2}}\right)dw
\]
\[
= \frac{1}{1+z^{2}}\,\frac{1}{\pi}\lim_{T\to\infty}\left(\log\frac{z-T}{z+T}-2z\arctan T\right)
= \frac{1}{1+z^{2}}\,\frac{1}{\pi}\bigl(i\pi-z\pi\bigr)
= \frac{1}{-z-i}.
\]
To pass to the general case of (20), use the fact that p + qX ∼ c_{p+iq} when X ∼ c_i.

Remark. Let us mention two other proofs of Proposition 5.6. One can be done by residues: quicker and less elementary. The second one is my favorite (and probably Yor's favorite proof) and uses Brownian motion through the following steps:

1. If f is a meromorphic function and t ↦ B_Z(t) is a complex Brownian motion starting from Z ∈ C, then t ↦ f(B_Z(t)) = M(t) is a complex martingale.

2. If T is a regular stopping time, f(Z) = E(f(B_Z(T))).

3. If Z ∈ H^+ and if T is the hitting time of R, then taking f(Z) = e^{itZ} with t > 0 shows that the Fourier transform of the distribution of B_Z(T) is e^{itZ}; thus this distribution is the Cauchy distribution c_Z.

4. If z ∈ H^+, then W ↦ f(W) = 1/(W − z) is analytic on H^− = −H^+. Let T be the hitting time of R by the Brownian motion started at the conjugate point \bar{Z} ∈ H^−. We have
\[
E\!\left(\frac{1}{B_{\bar{Z}}(T)-z}\right) = \frac{1}{\bar{Z}-z}.
\]
Since by symmetry the distribution of B_{\bar{Z}}(T) is c_Z, we get the desired Cauchy-Stieltjes transform of c_Z.

Let us compute some other Cauchy-Stieltjes transforms. The next proposition uses the hypergeometric function
\[
F(a,b;c;z) = \sum_{n=0}^{\infty}\frac{z^{n}}{n!}\,\frac{(a)_n(b)_n}{(c)_n}
\]
with the notation (a)_0 = 1 and (a)_{n+1} = (a + n)(a)_n.

Proposition 5.7 (The Cauchy Stieltjes transform of the beta law). The Cauchy-Stieltjes transform of the distribution
\[
\beta_{b,c-b}(dw) = \frac{1}{B(b,c-b)}\,w^{b-1}(1-w)^{c-b-1}\,\mathbf{1}_{(0,1)}(w)\,dw
\]
is
\[
s_{\beta_{b,c-b}}(z) = -\frac{1}{z}\,F(1,b;c;1/z).
\]
In particular
\[
s_{\beta_{3/2,3/2}}(z) = 4\bigl[1-2z+2\sqrt{z(z-1)}\bigr] = -4\bigl[\sqrt{z}-\sqrt{z-1}\bigr]^{2}.
\]
If X ∼ β_{3/2,3/2} and Y = r(2X − 1), then the Stieltjes transform of the "Wigner half circle law" W_r of radius r is
\[
s_Y(z) = \frac{2}{r^{2}}\bigl[\sqrt{z^{2}-r^{2}}-z\bigr] = -\frac{1}{r^{2}}\bigl[\sqrt{z+r}-\sqrt{z-r}\bigr]^{2}.
\]
For r = 2σ, s = s_Y satisfies
\[
s^{2}(z) + \frac{z}{\sigma^{2}}\,s(z) + \frac{1}{\sigma^{2}} = 0.\qquad (21)
\]

Proof. We use the Gauss formula ∫_0^1 w^{b−1}(1−w)^{c−b−1}(1−wz)^{−a}\,dw = B(b, c−b)F(a, b; c; z). To get the explicit result for s_{β_{3/2,3/2}} we write
\[
F(1,\tfrac32;3;z) = \sum_{n=0}^{\infty}\frac{2\,(3/2)_n}{(n+2)!}\,z^{n}
= \frac{2}{z^{2}}\sum_{n=0}^{\infty}\frac{(3/2)_n}{(n+2)!}\,z^{n+2}
= \frac{2}{z^{2}}\bigl[-4(1-z)^{1/2}-2z+4\bigr]
= 4\left(\frac{1-(1-z)^{1/2}}{z}\right)^{2}.\qquad (22)
\]

The last example will not be used in this course, but is worth knowing for its role in random matrix theory:

Proposition 5.8 (The Cauchy Stieltjes transform of the Marchenko Pastur law). Let 0 < a < b. Then the following measure
\[
A(dw) = A_{a,b}(dw) = \frac{2}{\pi}\,\frac{1}{(\sqrt{b}-\sqrt{a})^{2}}\,\frac{1}{w}\,\sqrt{(b-w)(w-a)}\,\mathbf{1}_{(a,b)}(w)\,dw
\]
is a probability. Its Cauchy-Stieltjes transform is
\[
s_A(z) = \frac{2}{(\sqrt{b}-\sqrt{a})^{2}\,z}\bigl[\sqrt{(z-a)(z-b)}-z+\sqrt{ab}\bigr]
\]
and satisfies
\[
z\left(\frac{\sqrt{b}-\sqrt{a}}{2}\right)^{2}s_A^{2}(z) + (z-\sqrt{ab})\,s_A(z) + 1 = 0.
\]

Proof. We compute first the integral
\[
K(z) = \int_a^{b}\frac{\sqrt{(b-w)(w-a)}}{w-z}\,dw.
\]
The change of variable w = a + (b − a)u gives
\[
K(z) = \frac{(b-a)^{2}}{a-z}\int_0^{1}\frac{\sqrt{u(1-u)}}{1-\frac{a-b}{a-z}\,u}\,du
= \frac{(b-a)^{2}}{a-z}\,B(\tfrac32,\tfrac32)\,F\!\left(1,\tfrac32;3;\frac{a-b}{a-z}\right)
\]
from the Gauss formula. Using (22) and B(3/2, 3/2) = π/8 we get
\[
K(z) = -\frac{\pi}{2}\bigl[\sqrt{z-b}-\sqrt{z-a}\bigr]^{2}.
\]
Considering K(0) shows easily that A is a probability distribution. Now consider
\[
I = \int_a^{b}\frac{\sqrt{(b-w)(w-a)}}{w(w-z)}\,dw = \frac{1}{z}\bigl(K(z)-K(0)\bigr).
\]
Since s_A = I/K(0) we get the result.

Remark. For γ > 0 we now define the Marchenko Pastur law MP_γ from A_{a,b} as follows: take a(γ) = γ − 2√γ + 1 = (√γ − 1)² and b(γ) = γ + 2√γ + 1 = (√γ + 1)². We define MP_γ(dw) by
\[
MP_\gamma = (1-\gamma)\,\delta_0 + \gamma\,A_{a(\gamma),b(\gamma)}\ \text{ for }0<\gamma<1,\qquad
MP_\gamma = A_{a(\gamma),b(\gamma)}\ \text{ for }1\le\gamma.
\]
For A = A_{a(γ),b(γ)} we can write
\[
z\,s_A^{2}(z) + (z-\gamma+1)\,s_A(z) + 1 = 0\ \text{ for }\gamma\ge1,\qquad
z\gamma\,s_A^{2}(z) + (z-1+\gamma)\,s_A(z) + 1 = 0\ \text{ for }\gamma\le1.
\]
Since for 0 < γ < 1 we have s_{MP_γ}(z) = −\frac{1-\gamma}{z} + γ\,s_A(z), this implies that for all γ > 0 the Cauchy-Stieltjes transform of MP_γ satisfies the important formula
\[
z\,s_{MP_\gamma}^{2}(z) + (z+1-\gamma)\,s_{MP_\gamma}(z) + 1 = 0.\qquad (23)
\]

Exercise 5.2.1. Let 0 < a < b and 0 < β < q + 1 where q is a non-negative integer. Consider the following probability
\[
A(dw) = C\,\frac{1}{w}\,(b-w)^{\beta-1}(w-a)^{q-\beta}\,\mathbf{1}_{(a,b)}(w)\,dw.
\]
Show that its Cauchy-Stieltjes transform is s_A(z) = \frac{F(z)}{F(0)} - 1, where
\[
F(z) = (a-z)^{\beta-1}(b-z)^{q-\beta} - \sum_{k=0}^{q-1}\frac{(\beta-q)_k}{k!}\,(a-b)^{k}(a-z)^{q-k-1}.
\]
Hint: compute first the integral
\[
K(z) = \int_a^{b}\frac{(b-w)^{\beta-1}(w-a)^{q-\beta}}{w-z}\,dw,
\]
which is proportional to F(z) (the constant involves q! and sin βπ), by the change of variable w = a + (b − a)u, and conclude as in Proposition 5.8.

5.3 Cauchy Stieltjes transform of ∫_Ω g(w)P(dw)

Theorem 5.9. Let α be a bounded positive measure of total mass a on the real line such that ∫_{−∞}^{∞}(log|x|)_+ α(dx) < ∞. Let P ∼ D(α) and I = ∫_{−∞}^{∞} xP(dx). Then for all complex z such that ℑz > 0, the Cauchy-Stieltjes transform of type a of the random variable I is
\[
E\!\left(\frac{1}{(I-z)^{a}}\right) = e^{-\int_{-\infty}^{\infty}\log(w-z)\,\alpha(dw)}.\qquad (24)
\]
Furthermore, if I ∼ μ_α, then the map α ↦ μ_α is injective when restricted to bounded measures of the same mass a.

Proof. In the particular case $\alpha = a_0\delta_{w_0} + \cdots + a_d\delta_{w_d}$ we have $I = w_0Y_0 + \cdots + w_dY_d$ where $Y = (Y_0,\ldots,Y_d) \sim D(a_0,\ldots,a_d)$. In this case, (24) becomes
$$E\!\left(\frac{1}{(w_0Y_0+\cdots+w_dY_d-z)^a}\right) = \frac{1}{(w_0-z)^{a_0}\cdots(w_d-z)^{a_d}} = e^{-a_0\log(w_0-z)-\cdots-a_d\log(w_d-z)}. \qquad (25)$$


To prove it we first imitate the proof of (8) while replacing $f_i > 0$ by complex $f_i$ such that $\Re f_i > 0$, getting
$$E\!\left(\frac{1}{(f_0Y_0+\cdots+f_dY_d)^a}\right) = \frac{1}{f_0^{a_0}\cdots f_d^{a_d}}.$$

We are allowed to do this while modifying the proof of (8) since
$$\int_0^\infty e^{-vf}\,\frac{v^{a-1}}{\Gamma(a)}\,dv = \frac{1}{f^a}$$

when $\Re f > 0$. Now we assume that $|z|$ is large enough such that $f_i = 1 - \frac{w_i}{z}$ has a positive real part, getting
$$E\!\left(\frac{1}{\big((1-\frac{w_0}{z})Y_0+\cdots+(1-\frac{w_d}{z})Y_d\big)^a}\right) = \frac{1}{(1-\frac{w_0}{z})^{a_0}\cdots(1-\frac{w_d}{z})^{a_d}}.$$
Multiplying both sides of the last equality by $1/(-z)^a$ we get (25) for $|z|$ large enough. Finally, both sides of (25) being analytic in the half-plane $\Im z > 0$, we can claim that (25) holds in the whole half-plane $H^+$.

If the support of $\alpha$ is bounded, we imitate the proof of Theorem 5.1 and construct a sequence $(f_n)$ of functions taking a finite number of values such that $|f_n(x)| \leq |x|$ on the support of $\alpha$ and such that $\lim f_n(x) = x$ on this support. By this method, we pass from (25) to (24). If the support of $\alpha$ is not bounded, we approximate $\alpha$ by

$$\alpha_n(dw) = 1_{(-n,n)}(w)\,\alpha(dw) + u_n\delta_{-n} + v_n\delta_n$$
where $u_n = \alpha((-\infty,-n])$ and $v_n = \alpha([n,\infty))$. If $P_n \sim D(\alpha_n)$ and $I_n = \int_{-\infty}^{\infty} w\,P_n(dw)$ then (24) holds for $I_n$ and $\alpha_n$. Let us show that
$$\lim_n \int_{-\infty}^{\infty}\log(w-z)\,\alpha_n(dw) = \int_{-\infty}^{\infty}\log(w-z)\,\alpha(dw). \qquad (26)$$

To see this, observe that
$$0 < v_n\log n \leq \int_n^{\infty}\log w\,\alpha(dw) \to_{n\to\infty} 0.$$
Therefore
$$\int_n^{\infty}(\log w - \log n)\,\alpha(dw) = \int_n^{\infty}\log w\,\alpha(dw) - v_n\log n \to_{n\to\infty} 0$$

and filling in some details leads to (26). We can also claim that $D(\alpha_n)$ converges weakly to $D(\alpha)$ on the compact space of positive measures on $\mathbb{R}$ with total mass $\leq 1$: just apply Theorem 5.1 to all positive functions $f$ taking a finite number of values to see this. As a consequence, the distribution of $I_n$ converges weakly towards the distribution of $I$. Since $w \mapsto \frac{1}{(w-z)^a}$ is continuous and vanishes at infinity, the fact that $I_n$ converges to $I$ in distribution implies that $E\!\left(\frac{1}{(I_n-z)^a}\right)$ converges towards $E\!\left(\frac{1}{(I-z)^a}\right)$. This finally shows (24).

To complete the proof, if $\alpha$ and $\alpha'$ have the same mass $a$ and $\mu_\alpha = \mu_{\alpha'}$, then (24) implies that on $H^+$
$$-\int_{-\infty}^{\infty}\log(w-z)\,\alpha(dw) = -\int_{-\infty}^{\infty}\log(w-z)\,\alpha'(dw).$$
Taking derivatives in $z$ we get that $\alpha$ and $\alpha'$ have the same Cauchy Stieltjes transform and therefore, from Theorem 5.4, they coincide.
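The finite case (25) lends itself to a quick Monte Carlo check. The sketch below (not from the notes; it uses NumPy's Dirichlet sampler, and the principal branch for the complex powers, which is consistent here because $w_j - z$ and $I - z$ stay in the lower half-plane when $\Im z > 0$) compares the two sides of (25) for an arbitrary small example.

import numpy as np

rng = np.random.default_rng(0)

# Monte Carlo check of (25): for Y ~ D(a0, ..., ad) and Im z > 0,
# E[(w0 Y0 + ... + wd Yd - z)^(-a)] = prod_j (wj - z)^(-aj)  with a = a0 + ... + ad.
a_vec = np.array([0.7, 1.3, 2.0])          # (a0, a1, a2), arbitrary choice
w_vec = np.array([-1.0, 0.5, 3.0])         # (w0, w1, w2), arbitrary choice
a = a_vec.sum()
z = 0.4 + 1.1j

Y = rng.dirichlet(a_vec, size=200_000)     # rows are samples of (Y0, Y1, Y2)
I = Y @ w_vec                              # samples of w0 Y0 + w1 Y1 + w2 Y2
lhs = np.mean((I - z)**(-a))               # Monte Carlo estimate of the left side
rhs = np.prod((w_vec - z)**(-a_vec))       # closed form on the right side
print(lhs, rhs)                            # equal up to Monte Carlo error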

Let us give an elegant consequence of the above theorem, due to Lijoi and Regazzini (2004).

Corollary 5.10. (Characterization of the Cauchy distributions) Let $\alpha$ be a probability on the real line such that $\int_{-\infty}^{\infty}(\log|x|)^+\,\alpha(dx) < \infty$. Let $P \sim D(\alpha)$ and $I = \int_{-\infty}^{\infty} x\,P(dx)$. Then $I \sim \alpha$ if and only if $\alpha$ is a Cauchy distribution or a Dirac mass.


Proof. The 'if' part is a consequence of Exercise 5.1.1. We prove the 'only if' part. For $\Im z > 0$ we define $g(z) = -\int_{-\infty}^{\infty}\log(w-z)\,\alpha(dw)$ and we get
$$g'(z) \overset{(1)}{=} \int_{-\infty}^{\infty}\frac{\alpha(dw)}{w-z} \overset{(2)}{=} e^{g(z)}.$$
(1) comes from differentiating $z \mapsto \log(w-z)$ and (2) from Theorem 5.9 applied to $\alpha$. Hence $e^{-g(z)}g'(z) = 1$ implies the existence of a complex number $p - iq$ such that $e^{-g(z)} = -z + p - iq$, that is $g'(z) = \frac{1}{-z+p-iq}$. Since the function $z \mapsto g(z)$ is analytic in the half-plane $\Im z > 0$, this is also true of $z \mapsto g'(z)$. Therefore $g'$ cannot have a pole in $\Im z > 0$. As a consequence $q \geq 0$: if $q = 0$ we have $\alpha = \delta_p$ and if $q > 0$ we have $\alpha = c_{p+iq}$ from (20).
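Corollary 5.10 can be illustrated by simulation. The sketch below assumes the stick-breaking (Sethuraman) representation of $D(\alpha)$ for a base measure of total mass 1: weights $W_k = V_k\prod_{j<k}(1-V_j)$ with $V_j$ i.i.d. $\beta(1,1)$ and atoms drawn independently from $\alpha$. With $\alpha$ standard Cauchy, the truncated sums $\sum_k W_kX_k$ should again look standard Cauchy.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Stick-breaking simulation of I = int x P(dx), P ~ D(alpha), alpha standard Cauchy.
n_rep, K = 5000, 200                           # replications and truncation level
V = rng.beta(1.0, 1.0, size=(n_rep, K))        # stick-breaking proportions
stick = np.hstack([np.ones((n_rep, 1)), np.cumprod(1.0 - V[:, :-1], axis=1)])
W = V * stick                                  # truncated weights W_k
X = rng.standard_cauchy(size=(n_rep, K))       # atoms from alpha
I = (W * X).sum(axis=1)                        # truncated version of int x P(dx)

print(stats.kstest(I, "cauchy"))               # p-value should not be small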

A more important application is the explicit calculation of the density of $X = \int_{-\infty}^{\infty} w\,P(dw)$ when $\alpha$ has mass 1.

Theorem 5.11. Let $\alpha$ be a probability on $\mathbb{R}$ such that $\int_{-\infty}^{\infty}(\log|w|)^+\,\alpha(dw) < \infty$ and such that $\alpha$ is not concentrated on one point. Denote by $F_\alpha(x) = \alpha((-\infty,x])$ its distribution function. Then the distribution $\mu_\alpha(dx)$ of $X = \int_{-\infty}^{\infty} x\,P(dx)$ when $P \sim D(\alpha)$ has density
$$f(x) = \frac{1}{\pi}\,\sin(\pi F_\alpha(x))\; e^{-\int_{-\infty}^{\infty}\log|w-x|\,\alpha(dw)}.$$

Remark. $-\int_{-\infty}^{\infty}\log|w-x|\,\alpha(dw)$ is valued in $(-\infty,\infty]$ and therefore is well defined. Indeed $\int_{x-1}^{x+1}(-\log|w-x|)\,\alpha(dw)$ is valued in $[0,\infty]$ since one integrates a nonnegative function. On the other hand $\int_{|w-x|>1}(\log|w-x|)\,\alpha(dw) < \infty$ from the hypothesis: the difference of the two integrals makes sense.

Proof. From Theorem 5.9, since $a = 1$ we have
$$s_{\mu_\alpha}(x+iy) = \int_{-\infty}^{\infty}\frac{\mu_\alpha(dt)}{t-x-iy} = e^{-\int_{-\infty}^{\infty}\log(w-x-iy)\,\alpha(dw)} = e^{A+iB}$$
where $A = A(x,y) = -\int_{-\infty}^{\infty}\log|w-x-iy|\,\alpha(dw)$ and $B = B(x,y) = -\int_{-\infty}^{\infty}\arg(w-x-iy)\,\alpha(dw)$. Having in mind the use of Theorem 5.4 let us compute
$$f(x) = \frac{1}{\pi}\lim_{y\downarrow 0}\,\Im\, s_{\mu_\alpha}(x+iy) = \frac{1}{\pi}\lim_{y\downarrow 0}\, e^{A(x,y)}\sin B(x,y).$$

The trick is to see that if $x$ is not an atom of $\alpha$ we have
$$\lim_{y\downarrow 0} B(x,y) = \lim_{y\downarrow 0}\left(-\int_{-\infty}^{\infty}\arg(w-x-iy)\,\alpha(dw)\right) = \pi\int_{-\infty}^{x}\alpha(dw) + 0\cdot\int_{x}^{\infty}\alpha(dw) = \pi F_\alpha(x).$$

The limit of $A(x,y)$ is $-\int_{-\infty}^{\infty}\log|w-x|\,\alpha(dw)$, as can be seen by separating the cases $|w-x| > 1$ and $|w-x| \leq 1$. Finally we have from Theorem 5.4

$$\mu_\alpha([a,b]) = \frac{1}{\pi}\lim_{y\downarrow 0}\int_a^b \Im\, s_{\mu_\alpha}(x+iy)\,dx = \int_a^b f(x)\,dx$$
and the result is proved.

Let us give some applications of this result.

Proposition 5.12. Discrete $\alpha$. Let $Y \sim D(p_0,\ldots,p_d)$ with $p_0 + \cdots + p_d = 1$. Let $x_0 < x_1 < \cdots < x_d$ be real numbers. Consider the function $g$ on $(x_0,x_d)$ defined by
$$g(x) = \frac{1}{\prod_{j=0}^{d}|x-x_j|^{p_j}}.$$
Then the density $f(x)$ of $Y_0x_0 + \cdots + Y_dx_d$ is $f(x) = C_j\, g(x)$ for $x_{j-1} < x < x_j$, where $C_j = \frac{1}{\pi}\sin(\pi(p_0+\cdots+p_{j-1}))$.

Proof. This is an immediate application of Theorem 5.11 to $\alpha = \sum_{j=0}^d p_j\delta_{x_j}$. One easily sees that $\log g(x) = -\int_{-\infty}^{\infty}\log|w-x|\,\alpha(dw)$. Also the distribution function $F_\alpha(x)$ equals $p_0+\cdots+p_{j-1}$ for $x_{j-1} < x < x_j$.
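Proposition 5.12 is easy to test numerically: the density $C_jg$ should integrate to 1 over $(x_0,x_d)$ and reproduce the probabilities obtained by simulating $Y \sim D(p_0,\ldots,p_d)$. A small sketch (the values of $p$ and $x$ are arbitrary choices, not from the notes):

import numpy as np
from scipy.integrate import quad

rng = np.random.default_rng(2)

p = np.array([0.3, 0.5, 0.2])     # p0, p1, p2 with sum 1
x = np.array([0.0, 1.0, 3.0])     # x0 < x1 < x2

def f(t):
    # density C_j g(t) on (x_{j-1}, x_j); quad only calls f inside the intervals
    j = np.searchsorted(x, t)
    C = np.sin(np.pi * p[:j].sum()) / np.pi
    return C / np.prod(np.abs(t - x)**p)

total = quad(f, x[0], x[1], limit=200)[0] + quad(f, x[1], x[2], limit=200)[0]  # ~ 1

Y = rng.dirichlet(p, size=200_000)
I = Y @ x                                   # samples of Y0 x0 + Y1 x1 + Y2 x2
mc = np.mean((I > 1.0) & (I < 2.0))         # Monte Carlo P(1 < I < 2)
qd = quad(f, 1.0, 2.0)[0]                   # same probability from the density
print(total, mc, qd)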

Proposition 5.13. ($\alpha$ uniform on $(a,b)$). If $\alpha(dw) = \frac{1}{b-a}1_{(a,b)}(w)\,dw$, the density $f(x)$ of $\mu_\alpha$, the distribution of $X = \int_a^b x\,P(dx)$ when $P \sim D(\alpha)$, is for $a < x < b$
$$f(x) = \frac{e}{\pi}\,\sin\!\left(\pi\,\frac{x-a}{b-a}\right)\times\frac{1}{(x-a)^{\frac{x-a}{b-a}}\,(b-x)^{\frac{b-x}{b-a}}}.$$

Proof. Just a patient computation of $\int_a^b \log|w-x|\,dw$ and the application of Theorem 5.11.
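As a sanity check of the constant $e/\pi$, one can verify numerically that this density integrates to 1; a minimal sketch (not from the notes) for $(a,b) = (0,1)$:

import numpy as np
from scipy.integrate import quad

# The density of Proposition 5.13 for (a, b) = (0, 1) should integrate to 1.
def f(x):
    return (np.e / np.pi) * np.sin(np.pi * x) / (x**x * (1.0 - x)**(1.0 - x))

print(quad(f, 0.0, 1.0)[0])   # should be very close to 1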

Example. Suppose that $\alpha$ is the uniform probability on the unit sphere $S_2$ of the Euclidean space $\mathbb{R}^3$. Archimedes' theorem says that $X = (X_1,X_2,X_3) \sim \alpha$ implies that $X_1$ is uniform on $(-1,1)$. A way to see this is to observe that if $Z = (Z_1,Z_2,Z_3) \sim N(0,I_3)$ then $\frac{Z}{\|Z\|} \sim \alpha$ and therefore
$$X_1^2 \sim \frac{Z_1^2}{Z_1^2+Z_2^2+Z_3^2} \sim \beta_{1/2,1}.$$
Since the density of $X_1$ is symmetric and since the density of $V = X_1^2$ is $\frac{1}{2}v^{-1/2}\,dv$, the density of $X_1$ is $1/2$. Consider now $P \sim D(\alpha)$, concentrated on the unit ball $B_3$, and $Y = (Y_1,Y_2,Y_3) = \int_{B_3}(x_1,x_2,x_3)\,P(dx)$. Proposition 5.13 gives the density of $Y_1$, namely
$$f(y) = \frac{e}{\pi}\,\sin\!\left(\pi\,\frac{1+y}{2}\right)\times\frac{1}{(1+y)^{\frac{1+y}{2}}\,(1-y)^{\frac{1-y}{2}}}.$$

Furthermore the density of $Y = (Y_1,Y_2,Y_3)$ is invariant under rotations. Let us compute the density $g(r)$ of $R = \sqrt{Y_1^2+Y_2^2+Y_3^2}$. We write $Y = R\Theta$ where $\Theta = (\Theta_1,\Theta_2,\Theta_3)$ is uniformly distributed on $S_2$. Therefore for $y > 0$, since $\Pr(|\Theta_1| < x) = \min(x,1)$,
$$\int_{-y}^{y} f(u)\,du = \Pr(|Y_1| < y) = \Pr\!\left(|\Theta_1| < \frac{y}{R}\right) = E\!\left(\min\!\left(1,\frac{y}{R}\right)\right) = \int_0^y g(r)\,dr + \int_y^1 \frac{y}{r}\,g(r)\,dr$$
and we get by differentiation
$$2f(y) = \int_y^1 \frac{g(r)}{r}\,dr, \qquad g(r) = -2r f'(r).$$
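Since $f$ is explicit, the relation $g(r) = -2rf'(r)$ can be checked numerically: $g$ should be nonnegative on $(0,1)$ and integrate to 1. A small sketch (not part of the notes) using a central difference for $f'$:

import numpy as np
from scipy.integrate import quad

# With f the density of Y1 above, g(r) = -2 r f'(r) should be a probability
# density on (0, 1); f'(r) is approximated by a central difference.
def f(y):
    return (np.e / np.pi) * np.sin(np.pi * (1.0 + y) / 2.0) \
           / ((1.0 + y)**((1.0 + y)/2.0) * (1.0 - y)**((1.0 - y)/2.0))

def g(r, h=1e-6):
    return -2.0 * r * (f(r + h) - f(r - h)) / (2.0 * h)

print(quad(g, 0.0, 1.0 - 1e-5)[0])                            # close to 1
print(all(g(r) >= 0.0 for r in np.linspace(0.01, 0.99, 99)))  # g is nonnegative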

Comments.

1. As said before, this section deals with the case where the mass $a$ of $\alpha$ is one. Explicit formulas for $a \neq 1$ are presented in Cifarelli and Regazzini (1990) and in Lijoi and Regazzini (2003). They are quite complicated. As an example, if $\alpha(dw) = a\,1_{(0,1)}(w)\,dw$ with $a > 1$, the density of $\mu_\alpha$ is given in Lijoi and Regazzini (2003), page 1483, example 1 b), thus extending Proposition 5.13. This density is expressed by an integral.

2. The result of (16) in the particular case $a = 1$ can be recovered from Theorem 5.11. The calculations actually do not seem simpler by this last method.


Exercise 5.3.1. If $\alpha$ is a probability on $\mathbb{R}$ such that $\int_{-\infty}^{\infty}(\log|w|)^+\,\alpha(dw) < \infty$, denote by $\mu_\alpha$ the distribution of $\int_{-\infty}^{\infty} x\,P(dx)$ when $P \sim D(\alpha)$. Corollary 5.10 shows that $\alpha = \mu_\alpha$ if and only if $\alpha$ is either Dirac or Cauchy. Prove that the same result is true if $\alpha = \mu_{\mu_\alpha}$.

Hint: if $g(z) = -\int_{-\infty}^{\infty}\log(w-z)\,\alpha(dw)$, show that $g''(z) = g'(z)e^{g(z)}$. Integrating this differential equation and assuming that $\alpha$ is neither Dirac nor Cauchy, one gets the existence of two complex constants $C \neq 0$ and $D$ such that
$$\int_{-\infty}^{\infty}\frac{\mu_\alpha(dw)}{w-z} = e^{g(z)} = \frac{Ce^{Cz+D}}{1-e^{Cz+D}}.$$
Show that this function of $z$ has poles in $H^+$ and that such an $\alpha$ cannot exist.

Exercise 5.3.2. Suppose that $\alpha$ is the uniform probability on the unit circle $S_1$ of the Euclidean plane $\mathbb{R}^2$. If $(X_1,X_2) \sim \alpha$, show that $X_1$ has the arcsine distribution (15). If $P \sim D(\alpha)$ let $Y = (Y_1,Y_2) = \int_{B_2}(x_1,x_2)\,P(dx_1,dx_2)$. Give the density of $Y_1$ using Section 5.2.1. Give the density of $R = \sqrt{Y_1^2+Y_2^2}$ by imitating the example following Proposition 5.13.
