Logarithmic Sobolev inequalities in discrete product ...

35
Logarithmic Sobolev inequalities in discrete product spaces: proof by a transportation cost distance Katalin Marton Alfréd Rényi Institute of Mathematics of the Hungarian Academy of Sciences

Transcript of Logarithmic Sobolev inequalities in discrete product ...

Logarithmic Sobolev inequalities in discrete product spaces:proof by a transportation cost distance

Katalin Marton

Alfréd Rényi Institute of Mathematicsof the Hungarian Academy of Sciences

Relative entropy

DefinitionZ: measurable space, µ, ν: probability measures on Z.Z, U : random variables, L(Z) = µ, L(U) = ν.

Relative entropy:

D(µ||ν) = D(Z||U) =

∫Z

logdµ

dνdµ (1)

If Z is finite:

D(µ||ν) = D(Z||U) =∑z∈Z

µ(z) logµ(z)

ν(z). (2)

Entropy contraction of Markov kernels

Definition(Z, ν) : probability space,P(Z): measures on Z,Γ: Markov kernel on Z with invariant measure ν.

Entropy contraction for (Z, ν,Γ)with rate 1− c, 0 < c ≤ 1:

D(µΓ||ν) ≤ (1− c) ·D(µ||ν). (3)

Equivalently:

c ·D(µ||ν) ≤(D(µ||ν)−D(µΓ||ν)

), for all µ ∈ P(Z) (4)

Gibbs sampler governed by local specifications of qn

DefinitionΓi: Markov kernel X n 7→ Xn

Γi(zn|yn) = δ(yi, zi) · qi(zi|yi),

Γ: Markov kernel X n 7→ Xn:

Γ =1

n·n∑i=1

Γi.

Definitionqn has the entropy contraction property if:

its Gibbs sampler Γ has.

Entropy contraction in product space

(X n, qn): probability space

QuestionWhich measures qn have the entropy contraction property witha reasonable constant c?

c cannot be larger than O(1/n).Changing notation: WHEN

c

n·D(pn||qn) ≤

(D(pn||qn)−D(pnΓ||qn)

)for all pn ∈ P(X n)

?(5)

Equivalently: WHEN

D(pn||qn) ≤ (1− c

n) ·D(pnΓ||qn) ? (6)

Conditional relative entropy

DefinitionZ: measurable space,

V: another measurable space, π: probability measure on V,V : random variable, L(V ) = π

For v ∈ Vprobability measures on Z:µ(·|v) = L(Z|V = v) ν(·|v) = L(U |V = v)

Conditional relative entropy:

D(µ(·|V )||ν(·|V )

)= D

(Z|V )||U |V )

),∫VD(µ(·|v)||ν(·|v)

)dπ

(7)

Cont’dIf V is finite:

D(µ(·|V )||ν(·|V

)= D

(Z|V )||U |V )

)=∑v∈V

π(v)D(µ(·|v)||ν(·|v)

). (8)

Sufficient condition for entropy contraction in productspace

(X n, qn,Γ): probability space with Gibbs sampler,qn = L(Xn)pn = L(Y n): another distribution on X n

Recall: Γ = 1n

∑ni=1 Γi Yi = (Y1, . . . , Yi−1, Yi+1, . . . , Yn),

Proposition

c ·D(pn||qn) ≤n∑i=1

D(pi(·|Yi)||qi(·|Yi

)(∗)

=⇒

D(pnΓ||qn) ≤(1− c

n

)D(pn||qn).

(9)

If qn is a product measure then: c = 1.

NotationFor xn = (x1, x2, . . . , xn) ∈ X n:

xi , (x1, . . . , xi−1, xi+1, . . . , xn)

pi , L(Yi), qi , L(Xi)

Expansion formula

D(pn||qn) = D(pi||qi) +D(pi(·|Yi)||qi(·|Yi)

)=⇒

D(pn||qn)

=1

n·n∑i=1

D(pi||qi) +1

n·n∑i=1

D(pi(·|Yi)||qi(·|Yi)

)

Proof of Proposition

D(pn||qn)−D(pnΓ||qn)

≥ (by convexity of entropy)

D(pn||qn)− 1

n·n∑i=1

D(pi||qi)

=1

n

n∑i=1

D(pi(·|Yi||qi(·|Yi)

)(by assumption) ≥ c

nD(pn||qn)

Entropy contraction in DISCRETE product spaces

X finite

(Xn, qn,Γ)

Wanted: inequality

D(pn||qn) ≤ C·n∑i=1

D(pi(·|Yi||qi(·|Yi)

)for all pn ∈ P(Xn) (∗)

To get (*):use a Wassersteine-like distance.

A reverse Pinsker’s inequality

µ, ν: probability measures on X X finite

Notation: Variational distance

|µ− ν| = 1

2·∑x∈X|µ(x)− ν(x)|

LemmaSet

X+ , {x ∈ X : ν(x) > 0}, α , min{ν(x) : x ∈ X+

}Then

D(µ||ν) ≤ 4

α· |µ− ν|2

Follows from the inequality D(µ||ν) ≤∑

x∈X|µ(x)−ν(x)|2

ν(x)

A distance between measures on product spaces

Xn: product space

µn = L(Zn), νn = L(Un): probability measures on Xn

Definition (P. Massart)The square of the W2-distance of µn and νn:

W 22 (µn, νn) , min

n∑i=1

Pr2{Zi 6= Ui

}inf: on couplings of µn and νn.

NotationFor I ⊂ [1, n], pn = L(Y n) and yn ∈ X n:

yI , (yi : i ∈ I), yI , (yi : i /∈ I)

pI , L(YI), pI(·|yI) , L(YI |YI = yI)

For Theorem 1 we need the inequality

W 22

(pn||qn) ≤ C · E

n∑i=1

∣∣pi(·|Yi)− qi(·|Yi)∣∣2.in a MORE GENERAL FORM:

We require a bound for

W 22

(pI(·|yI), qI(·|yI)

)for ALL subsets I ⊂ [1, n] (not just I = [1, n]), and all yI .

Main Theorem

(X n, qn), X finite !

Theorem 1Set

α = min{qi(xi|xi) : qn(xn) > 0, 1 ≤ i ≤ n

}(10)

Fix a pn = L(Y n) ∈ P(X n); assume

qn(xn) = 0 =⇒ pn(xn) = 0. (11)

Main assumption:

W 22

(pI(·|yI), qI(·|yI))

≤ C · E{∑i∈I

∣∣pi(·|Yi)− qi(·|Yi)∣∣2 ∣∣∣∣ YI = yI

},

(12)

for all I ⊂ [1, n], yI ∈ X [1,n]\I .

Main Theorem continued

Assume all the inequalities

W 22

(pI(·|yI), qI(·|yI))

≤ C · E{∑i∈I

∣∣pi(·|Yi)− qi(·|Yi)∣∣2 ∣∣∣∣ YI = yI

},

(13)

where I ⊂ [1, n] and yI ∈ X [1,n]\I is fixed.Then

D(pn||qn) ≤

4C

α·n∑i=1

E∣∣pi(·|Yi)− qi(·|Yi)∣∣2

≤ 2C

α·n∑i=1

D(pi(·|Yi)||qi(·|Yi)

).

(14)

An analogous result for densities in Rn

Theoremf(xn) = exp(−V (xn) : density on Rn,qn : probability measure with density f .

Assume: conditional densities a f(xi|xi) satisfy alogarithmic Sobolev inequality with constant ρ, for all i, xi.Under some conditions on

1

ρ·Hess V (xn)

(expressing that V is not too far from being uniformly convex):

D(pn||qn) ≤ C ·n∑i=1

D(pi(·|Yi)||qi(·|Yi)

)for all pn

(C = C(qn))

Proof of Theorem 1By induction on n. Assume Theorem 1 for n− 1

Notation

pi(·|yi) , L(Yi|Yi = yi)

For every i ∈ [1, n]

D(pn||qn) = D(Y n||Xn) = D(Yi||Xi) +D(pi(·|Yi)||qi(·|Yi)

)=⇒

D(pn||qn) =1

n

n∑i=1

D(Yi||Xi) +1

n

n∑i=1

D(pi(·|Yi)|qi(·|Yi)

)(15)

By the induction hypothesis the second term is

≤(1− 1

n

)· 4C

α

n∑i=1

E∣∣pi(·|Yi)− qi(·|Yi)∣∣2

Proof of Theorem 1 Cont’dSecond term:

≤(1− 1

n

)· 4C

α

n∑i=1

E∣∣pi(·|Yi)− qi(·|Yi)∣∣2

First term:

1

n

n∑i=1

D(Yi||Xi) ≤ (by the Lemma)1

n· 4

α·∣∣L(Yi)− L(Xi)

∣∣2≤

n∑i=1

Pr2{Yi 6= Xi

}in any coupling of pn, qn

=1

n· 4

α·W 2

2 (pn, qn) for the best coupling

≤ (by the assumption of Theorem 1)

1

n· 4C

α·n∑i=1

∣∣pi(·|Yi)− qi(·|Yi)∣∣2(16)

Entropy contraction

(X n, qn,Γ), X finiteΓ: Gibbs sampler

Corollary 1If qn satisfies the conditions of Theorem 1 then

D(pnΓ||qn) ≤(

1− α

2nC

)·D(pn||qn). (17)

Logarithmic Sobolev inequality

NotationE : Dirichlet form associated with Γ isthe quadratic form

E(f, g) =⟨(Id− Γ)f, g

⟩qn

Definitionqn satisfies a logarithmic Sobolev inequalitywith constant c > 0 if:

c ·D(pn||qn) ≤ E(√

pn

qn,

√pn

qn

)for all pn ∈ P(X n)

Logarithmic Sobolev inequality for Gibbs sampler

(X n, qn,Γ), X finite

Corollary 2Under conditions of Theorem 1the logarithmic Sobolev inequality holds true:

1

n·D(pn||qn) ≤ 4C

α· EΓ

(√pn

qn,

√pn

qn

)=

4C

αn·n∑i=1

E(

1−(∑yi∈X

√pi(yi|Yi

)· qi(yi|Yi

))2).

(18)

=⇒ hypercontractivity

***********************************?????????????????

Application: Gibbs measures with Dobrushin’suniqueness condition

(X n, qn), X finite

Definitionqn satisfies (an L2-version of)Dobrushin’s uniqueness condition with coupling matrix

A =(ak,i)nk,i=1

, ai,i = 0,

if:

(i) max |qi(·|zi)− qi(·|si)| ≤ ak,i, k 6= i,

max : for all zi, si differing only in the k-th coordinate,(19)

and (ii)||A||2 < 1.

(X n, qn), X finite

Theorem 2Assume Dobrushin’s uniqueness condition with coupling matrix

A, ||A||2 < 1.

Then conditions of Theorem 1 hold with

C = 1/(1− ||A||

)2.

Thus

D(pn||qn) ≤ 4

α· 1(

1− ||A||)2 · n∑

i=1

E∣∣pi(·|Yi)− qi(·|Yi)∣∣2

≤ 2

α· 1(

1− ||A||)2 · n∑

i=1

D(pi(·|Yi)||qi(·|Yi)

),

(20)

Cont’d

ProofDobrushin’s uniqueness condition implies that Γ contractsW2-distance with rate

1− 1

n·(

1− ||A||2).

Application: Gibbs measures on Zd

Notation

Zd : d-dimensional lattice, i ∈ Zd: site

ρ(k, i) = max1≤ν≤d |kν − iν |: distance on Zd,

Λ ⊂⊂ Zd: finite set of sites

X finite: spin space

xZd

= (xi : i ∈ Zd) ∈ X Zd: spin configuration

X Zd: configuration space,

For xZd

and Λ ⊂ Zd

xΛ = (xi : i ∈ Λ), xΛ = (xi : i /∈ Λ),

xΛ is called an outside configuration for Λ.

Local specificatons

Definition

qΛ(·|xΛ), Λ ⊂⊂ Zd : conditional distributions on XΛ.Assume compatibility conditions.

There exists at least one probability measure q on the space ofconfigurations:

q = L(X) ∈ P(X Zd)

satisfying

L(XΛ|XΛ = xΛ) = qΛ(·|xΛ), all Λ ⊂⊂ Zd.

qΛ(·|xΛ): local specifications of q.

Finite range interactions

DefinitionThe local specifications have finite range of interactions if:there is an R > 0:

qΛ(·|xΛ) only depends on coordinates k /∈ Λ with

ρ(k,Λ) ≤ R.

q may not be uniquely defined by the local specifications, evenfor finite range interactions.

Dobrushin-Shlosman’s strong mixing condition

Given local specifications qΛ(·|xΛ).

Assumption

There exists a function ϕ(ρ) of the distance such that:(i) ∑

i∈Zd

ϕ(ρ(k, i)

)<∞,

and:(ii) for every

Λ ⊂⊂ Zd, M ⊂ Λ, k /∈ Λ

and everyyΛ, zΛ, differing only at k :∣∣qM (·|yΛ)− qM (·|zΛ)

∣∣ ≤ ϕ(ρ(k,M)).

In case of finite range interactions:

If Dobrushin-Shlosman’s strong mixing condition holds then

ϕ(ρ) = C · exp(−γ · ρ)

can be taken

ρ(k,M) = max length of red segments

|qM (·|yΛ)− qM (·|zΛ)| ≤ ϕ(ρ(k,M))

yΛ, zΛ differ only at k ∈ Λ

Dobrushin-Shlosman’s strong mixing condition

Cont’dMeaning: The influence of the spin at k /∈ Λon the spins in M ⊂ Λ that are far away from kis small.

Essential:

The spins over Λ are fixed in two different ways.

The spins over Λ \M are not fixed.

Logarithmic Sobolev inequality for strongly mixingmeasures

Earlier results for the case of finite range interactions:D. Stroock, B. Zegarlinski 1992, F. Cesi 2001F. Martinelli, E. Olivieri

Theorem 3

(XΛ, qΛ(·|yΛ)) for fixed Λ and yΛ

α = min{qi(xi|xi) : qi(xi|xi) > 0

}.

(Finite range is not assumed.){Dobrushin-Shlosman’s strong mixing condition + {α > 0}

}=⇒conditions of Theorem 1 for qΛ(·|yΛ), with uniform constant=⇒logarithmic Sobolev inequality for qΛ(·|yΛ) with uniform constant

Logarithmic Sobolev inequality for strongly mixingmeasures

Cont’dThe proof uses a Gibbs sampler

updating cubes of size mdepending on the dimension and on the function ϕ(ρ).

We get

W 22

(pΛ, qΛ(·|yΛ)

)≤ Cm ·

∑I:m-sided cube

E∣∣pI∩Λ(·|YI∩Λ)− qI∩Λ(·|YI∩Λ)

∣∣2≤ Cm,α ·

∑i∈Λ

E∣∣pi(·|YΛ\i

)− qi

(·|YΛ\i, yΛ

)∣∣2for an appropriate m that is good enough for any Λ and yΛ.