
Stein’s method, logarithmic Sobolev and transport inequalities

Arturo Jaramillo and HongJuan Zhou

University of Kansas

November 2017


Introduction

For $d \ge 1$, let $\gamma(dx)$ denote the standard Gaussian measure on $\mathbb{R}^d$.

Theorem (Classical logarithmic Sobolev inequality for $\gamma$)
For every probability measure $\nu$ of the form $\nu(dx) = h(x)\gamma(dx)$, with $h : \mathbb{R}^d \to \mathbb{R}_+$, the relative entropy and Fisher information of $\nu$ with respect to $\gamma$, defined by

$$H(\nu|\gamma) := \int_{\mathbb{R}^d} h(x)\log(h(x))\,\gamma(dx), \qquad I(\nu|\gamma) := \int_{\mathbb{R}^d} \frac{|\nabla h(x)|^2}{h(x)}\,\gamma(dx),$$

satisfy

$$H(\nu|\gamma) \le \frac{1}{2}\, I(\nu|\gamma).$$


Introduction

Objective: Prove a sharper version of the logarithmic Sobolev inequality that includes the so-called “Stein discrepancy”, which measures how close a probability measure is to the standard d-dimensional Gaussian distribution.

Note: in the sequel, we will assume that ν(dx) = h(x)γ(dx).


Preliminaries

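A matrix-valued map $\tau_\nu : \mathbb{R}^d \to \mathbb{R}^{d\times d}$ is said to be a Stein kernel for $\nu$ if, for every smooth $\varphi : \mathbb{R}^d \to \mathbb{R}$,

$$\int_{\mathbb{R}^d} x\cdot\nabla\varphi(x)\,\nu(dx) = \int_{\mathbb{R}^d} \langle \tau_\nu(x), \mathrm{Hess}[\varphi](x)\rangle_{HS}\,\nu(dx),$$

where $\langle A, B\rangle_{HS} := \mathrm{tr}(A^{*}B)$ is the Hilbert-Schmidt inner product on $\mathbb{R}^{d\times d}$.

Remark
- The matrix $\tau_\nu(x)$ can be taken to be symmetric.
- In the case where $\nu = \gamma$, we can take $\tau_\nu(x) := I_d$ = identity matrix, since

$$\int_{\mathbb{R}^d} x\cdot\nabla\varphi(x)\,\gamma(dx) = \int_{\mathbb{R}^d} \Delta\varphi(x)\,\gamma(dx) = \int_{\mathbb{R}^d} \langle I_d, \mathrm{Hess}[\varphi](x)\rangle_{HS}\,\gamma(dx).$$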
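The Gaussian case of the Stein identity can be checked numerically. The sketch below is only an illustration in dimension one: the test function phi(x) = cos(x) + x^4, the helper name `gauss_expect`, and the trapezoid quadrature are assumptions made here, not part of the talk.

```python
import math

def gauss_expect(f, lo=-10.0, hi=10.0, n=4000):
    """Approximate the integral of f against the standard Gaussian on R
    with the trapezoid rule on [lo, hi]."""
    step = (hi - lo) / n
    total = 0.0
    for k in range(n + 1):
        x = lo + k * step
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * f(x) * math.exp(-0.5 * x * x)
    return total * step / math.sqrt(2.0 * math.pi)

# Hypothetical test function phi(x) = cos(x) + x**4 and its first two derivatives.
def dphi(x):
    return -math.sin(x) + 4.0 * x ** 3

def d2phi(x):
    return -math.cos(x) + 12.0 * x ** 2

# Left side of the Stein identity: integral of x * phi'(x) against gamma.
lhs = gauss_expect(lambda x: x * dphi(x))
# Right side with Stein kernel tau = 1: integral of phi''(x) against gamma.
rhs = gauss_expect(d2phi)
```

For this choice of phi both sides equal 12 - exp(-1/2), which gives an independent value to compare against.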


Improved log-Sobolev inequality

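Whenever $\tau_\nu$ exists, we define the Stein discrepancy of $\nu$ with respect to $\gamma$ as

$$S(\nu|\gamma) := \left(\int_{\mathbb{R}^d} \|\tau_\nu(x) - I_d\|_{HS}^2\,\nu(dx)\right)^{1/2}.$$

The main result of the talk is the following.

Theorem (Improved logarithmic Sobolev inequality, HSI)

$$H(\nu|\gamma) \le \frac{1}{2}\, S(\nu|\gamma)^2 \log\left(1 + \frac{I(\nu|\gamma)}{S(\nu|\gamma)^2}\right).$$

In the sequel, we will assume that $0 < S(\nu|\gamma),\ I(\nu|\gamma) < \infty$.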
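Since log(1 + x) <= x, the HSI bound never exceeds the classical bound I/2. A minimal sketch of that comparison follows; the function names `lsi_bound` and `hsi_bound` and the sample values are illustrative assumptions.

```python
import math

def lsi_bound(fisher):
    """Classical logarithmic Sobolev bound: H <= I / 2."""
    return 0.5 * fisher

def hsi_bound(fisher, stein):
    """HSI bound: H <= (S^2 / 2) * log(1 + I / S^2)."""
    s2 = stein * stein
    return 0.5 * s2 * math.log1p(fisher / s2)

# Hypothetical values of (I, S): the HSI bound is never larger than the LSI bound.
samples = [(0.5, 0.1), (2.0, 1.0), (100.0, 0.1), (3.0, 50.0)]
comparisons = [hsi_bound(i, s) <= lsi_bound(i) for i, s in samples]
```

When S is small relative to I the gain is large, and when S tends to infinity the HSI bound approaches I/2.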


Basic results of the Stein kernel

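Assume that $\tau_\nu = \{\tau_\nu^{i,j}\}_{1\le i,j\le d}$ exists and is symmetric. Recall that $\tau_\nu$ satisfies

$$\int_{\mathbb{R}^d} x\cdot\nabla\varphi(x)\,\nu(dx) = \int_{\mathbb{R}^d} \langle \tau_\nu(x), \mathrm{Hess}[\varphi](x)\rangle_{HS}\,\nu(dx). \qquad (1)$$

Thus, for $1 \le i, j \le d$ fixed, we can take $\varphi(x) = x_i$ and $\varphi(x) = x_i x_j$ in (1), in order to obtain

$$\int_{\mathbb{R}^d} x\,\nu(dx) = 0 \quad\text{and}\quad \int_{\mathbb{R}^d} x_i x_j\,\nu(dx) = \int_{\mathbb{R}^d} \tau_\nu^{i,j}(x)\,\nu(dx).$$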


The Ornstein-Uhlenbeck semigroup

Let $(P_t)_{t\ge 0}$ denote the Ornstein-Uhlenbeck semigroup on $\mathbb{R}^d$, with infinitesimal generator

$$Lf = \Delta f - x\cdot\nabla f, \quad \text{for } f \in C^2(\mathbb{R}^d;\mathbb{R}).$$

It is well known that $P_t$ can be written as

$$P_t f(x) = \int_{\mathbb{R}^d} f\left(e^{-t}x + \sqrt{1-e^{-2t}}\,y\right)\gamma(dy).$$

This expression is called Mehler's formula. From it, we can easily obtain

$$\nabla P_t f = e^{-t}P_t(\nabla f).$$
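Mehler's formula and the commutation rule can be tried out numerically in dimension one. The following is a rough sketch, assuming a trapezoid quadrature in y and the test function f = sin; the helper name `ou_semigroup` and the values of t and x are hypothetical.

```python
import math

def ou_semigroup(f, t, x, lo=-10.0, hi=10.0, n=4000):
    """P_t f(x) in dimension one via Mehler's formula (trapezoid rule in y)."""
    a = math.exp(-t)
    b = math.sqrt(1.0 - math.exp(-2.0 * t))
    step = (hi - lo) / n
    total = 0.0
    for k in range(n + 1):
        y = lo + k * step
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * f(a * x + b * y) * math.exp(-0.5 * y * y)
    return total * step / math.sqrt(2.0 * math.pi)

t, x, eps = 0.7, 0.3, 1e-5

# Derivative of P_t sin at x, approximated by a central finite difference.
grad_of_pt = (ou_semigroup(math.sin, t, x + eps)
              - ou_semigroup(math.sin, t, x - eps)) / (2.0 * eps)

# e^{-t} times P_t applied to the derivative of sin, which is cos.
pt_of_grad = math.exp(-t) * ou_semigroup(math.cos, t, x)

# Closed form for comparison: P_t sin(x) = sin(e^{-t} x) * exp(-(1 - e^{-2t}) / 2).
closed_form = math.sin(math.exp(-t) * x) * math.exp(-0.5 * (1.0 - math.exp(-2.0 * t)))
```

The two quantities `grad_of_pt` and `pt_of_grad` should agree up to the finite-difference error.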


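By using Mehler's formula, as well as an integration by parts argument, we can show that

$$P_t(\nabla f)(x) = \frac{1}{\sqrt{1-e^{-2t}}}\int_{\mathbb{R}^d} y\, f\left(e^{-t}x + \sqrt{1-e^{-2t}}\,y\right)\gamma(dy).$$

The generator $L$ satisfies the following integration by parts formula:

$$\int_{\mathbb{R}^d} f(x)\,Lg(x)\,\gamma(dx) = -\int_{\mathbb{R}^d} \nabla f(x)\cdot\nabla g(x)\,\gamma(dx).$$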


Formulas for I(ν|γ)

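The Fisher information $I(\nu|\gamma)$ can be written in terms of $L$ as follows:

$$I(\nu|\gamma) = \int_{\mathbb{R}^d} \frac{|\nabla h(x)|^2}{h(x)}\,\gamma(dx) = \int_{\mathbb{R}^d} |\nabla \log h(x)|^2\, h(x)\,\gamma(dx) = -\int_{\mathbb{R}^d} \big(L\log h(x)\big)\, h(x)\,\gamma(dx).$$

Thus, by setting $v := \log h$, we get

$$I(\nu|\gamma) = -\int_{\mathbb{R}^d} Lv(x)\,\nu(dx).$$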


Organization of the proof

Define $v_t := \log P_t h$ and $\nu_t(dx) = P_t h(x)\,\gamma(dx)$. Replacing $h$ by $P_t h$ and using the symmetry of $P_t$ in the previous expressions, we get

$$I(\nu_t|\gamma) = -\int_{\mathbb{R}^d} L P_t v_t(x)\,\nu(dx).$$

To prove the HSI, we use the integrated de Bruijn formula

$$H(\nu|\gamma) = \int_0^\infty I(\nu_t|\gamma)\,dt.$$

The result follows from two different types of bounds for $I_\gamma(P_t h) := I(\nu_t|\gamma)$, depending on whether $t \approx 0$ or $t \approx \infty$.
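The integrated de Bruijn formula can be tested on a case where everything is explicit. The sketch below assumes, purely for illustration, that nu = N(0, var0) in dimension one, so that nu_t = N(0, e^{-2t} var0 + 1 - e^{-2t}), H = (var - 1 - log var)/2 and I = (var - 1)^2 / var. This Gaussian family and all function names are assumptions introduced here.

```python
import math

def variance_at(t, var0):
    """Variance of nu_t when nu = N(0, var0) evolves along the O-U semigroup."""
    return math.exp(-2.0 * t) * var0 + 1.0 - math.exp(-2.0 * t)

def entropy(var):
    """Relative entropy H(N(0, var) | gamma) in dimension one."""
    return 0.5 * (var - 1.0 - math.log(var))

def fisher(var):
    """Fisher information I(N(0, var) | gamma) in dimension one."""
    return (var - 1.0) ** 2 / var

def simpson(f, lo, hi, n=20000):
    """Composite Simpson rule with an even number n of subintervals."""
    step = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(lo + k * step)
    return total * step / 3.0

var0 = 4.0  # hypothetical starting variance
# Integral of I(nu_t | gamma) over [0, 20]; the tail beyond 20 is negligible.
integrated_fisher = simpson(lambda t: fisher(variance_at(t, var0)), 0.0, 20.0)
```

In this example `integrated_fisher` should match `entropy(var0)` up to quadrature error.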


Decay of I(νt |γ) and S(νt |γ)

The following results are the main ingredients for the proof of the HSI.

Theorem
For every $t > 0$,

$$I(\nu_t|\gamma) \le e^{-2t}\, I(\nu_0|\gamma),$$

$$I(\nu_t|\gamma) \le \frac{e^{-4t}}{1-e^{-2t}}\,\|\tau_\nu - I_d\|_{2,\nu}^2 = \frac{e^{-4t}}{1-e^{-2t}}\, S(\nu_0|\gamma)^2.$$

Moreover, the Stein discrepancy satisfies

$$S(\nu_t|\gamma) \le e^{-2t}\, S(\nu_0|\gamma).$$
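The three decay estimates can be seen at work on a one-dimensional Gaussian example. This is a sketch under assumptions not taken from the talk: nu = N(0, var0), whose Stein kernel is the constant var0, so that S = |var0 - 1| and I = (var0 - 1)^2 / var0. The function names are hypothetical.

```python
import math

def variance_at(t, var0):
    """Variance of nu_t when nu = N(0, var0)."""
    return math.exp(-2.0 * t) * var0 + 1.0 - math.exp(-2.0 * t)

def fisher(var):
    """I(N(0, var) | gamma) in dimension one."""
    return (var - 1.0) ** 2 / var

def stein(var):
    """S(N(0, var) | gamma): the Stein kernel is the constant var."""
    return abs(var - 1.0)

def check_decay(var0, t, tol=1e-12):
    """Return the truth values of the three decay bounds at time t."""
    var_t = variance_at(t, var0)
    e2, e4 = math.exp(-2.0 * t), math.exp(-4.0 * t)
    return (
        fisher(var_t) <= e2 * fisher(var0) + tol,
        fisher(var_t) <= e4 / (1.0 - e2) * stein(var0) ** 2 + tol,
        stein(var_t) <= e2 * stein(var0) + tol,
    )

results = [check_decay(v, t) for v in (0.25, 4.0, 9.0) for t in (0.1, 1.0, 3.0)]
```

In this family the bound on the Stein discrepancy holds with equality, which is why a small tolerance is used.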

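Proof of the HSI inequality

Using the previous bounds, we have that for every $u > 0$,

$$H(\nu|\gamma) = \int_0^u I_\gamma(P_t h)\,dt + \int_u^\infty I_\gamma(P_t h)\,dt$$

$$\le I(\nu|\gamma)\int_0^u e^{-2t}\,dt + S(\nu|\gamma)^2\int_u^\infty \frac{e^{-4t}}{1-e^{-2t}}\,dt$$

$$\le \frac{1}{2}\, I(\nu|\gamma)\,(1-e^{-2u}) + \frac{1}{2}\, S(\nu|\gamma)^2\left(-e^{-2u} - \log(1-e^{-2u})\right).$$

Optimizing in $u$ (computations are easier if we define $1 - e^{-2u} = r$), we obtain the result.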
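The optimization in u can be made concrete. With r = 1 - e^{-2u}, the last bound reads (1/2) I r + (1/2) S^2 (r - 1 - log r), and setting its derivative in r to zero gives r = S^2 / (I + S^2). The sketch below compares a grid minimum with the HSI bound; the values of I and S and the function names are illustrative assumptions.

```python
import math

def split_bound(r, fisher, stein):
    """Last line of the proof, rewritten with r = 1 - e^{-2u}, 0 < r < 1."""
    s2 = stein * stein
    return 0.5 * fisher * r + 0.5 * s2 * (r - 1.0 - math.log(r))

def hsi_bound(fisher, stein):
    """(S^2 / 2) * log(1 + I / S^2)."""
    s2 = stein * stein
    return 0.5 * s2 * math.log1p(fisher / s2)

fisher_value, stein_value = 3.0, 1.5  # hypothetical values of I and S

# Critical point of split_bound in r.
r_star = stein_value ** 2 / (fisher_value + stein_value ** 2)
value_at_r_star = split_bound(r_star, fisher_value, stein_value)

# Brute-force minimum over a grid in (0, 1) as a cross-check.
grid_min = min(split_bound(k / 10000.0, fisher_value, stein_value)
               for k in range(1, 10000))
```

Both `value_at_r_star` and `grid_min` should agree with `hsi_bound(fisher_value, stein_value)`, the first exactly and the second up to the grid resolution.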

Bound for I(νt |γ), when t is large

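We have that

$$I_\gamma(P_t h) = -\int_{\mathbb{R}^d} L P_t v_t(x)\,\nu(dx) = -\int_{\mathbb{R}^d} \big[\Delta P_t v_t(x) - x\cdot\nabla P_t v_t(x)\big]\,\nu(dx)$$

$$= \int_{\mathbb{R}^d} \langle \tau_\nu(x) - I_d, \mathrm{Hess}(P_t v_t)(x)\rangle_{HS}\,\nu(dx).$$

To rewrite the Hessian, notice that

$$\partial_{i,j} P_t v_t(x) = e^{-2t} P_t(\partial_{i,j} v_t)(x) = \frac{e^{-2t}}{\sqrt{1-e^{-2t}}}\int_{\mathbb{R}^d} y_i\,\frac{\partial v_t}{\partial x_j}\left(e^{-t}x + \sqrt{1-e^{-2t}}\,y\right)\gamma(dy).$$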

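Bound for I(νt |γ), when t is large

From here it follows that

$$\int_{\mathbb{R}^d} \langle \tau_\nu(x) - I_d, \mathrm{Hess}(P_t v_t)(x)\rangle_{HS}\,\nu(dx) = \frac{e^{-2t}}{\sqrt{1-e^{-2t}}}\int_{\mathbb{R}^d}\int_{\mathbb{R}^d} \Big[(\tau_\nu(x) - I_d)\,y\cdot\nabla v_t\left(e^{-t}x + \sqrt{1-e^{-2t}}\,y\right)\Big]\,\nu(dx)\,\gamma(dy).$$

This implies, after two suitable applications of the Cauchy-Schwarz inequality, that

$$I_\gamma(P_t h) \le \frac{e^{-2t}}{\sqrt{1-e^{-2t}}}\int_{\mathbb{R}^d}\int_{\mathbb{R}^d} |(\tau_\nu(x) - I_d)\,y|\;\big|\nabla v_t\left(e^{-t}x + \sqrt{1-e^{-2t}}\,y\right)\big|\,\nu(dx)\,\gamma(dy)$$

$$\le \frac{e^{-2t}}{\sqrt{1-e^{-2t}}}\left(\int_{\mathbb{R}^d} \|\tau_\nu(x) - I_d\|_{HS}^2\,\nu(dx)\right)^{1/2}\left(\int_{\mathbb{R}^d} P_t|\nabla v_t|^2(x)\,\nu(dx)\right)^{1/2}.$$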

Bound for I(νt |γ), when t is large

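Thus, since

$$\int_{\mathbb{R}^d} P_t|\nabla v_t|^2(x)\,\nu(dx) = \int_{\mathbb{R}^d} P_t|\nabla v_t|^2(x)\,h(x)\,\gamma(dx) = \int_{\mathbb{R}^d} |\nabla v_t(x)|^2\,P_t h(x)\,\gamma(dx) = I_\gamma(P_t h),$$

we get that

$$I_\gamma(P_t h) \le \frac{e^{-2t}}{\sqrt{1-e^{-2t}}}\left(\int_{\mathbb{R}^d} \|\tau_\nu(x) - I_d\|_{HS}^2\,\nu(dx)\right)^{1/2} I_\gamma(P_t h)^{1/2},$$

which, after dividing by $I_\gamma(P_t h)^{1/2}$ and squaring, implies the desired inequality.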

Sketch of the proof of S(νt |γ) ≤ e−2tS(ν0|γ)

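The idea consists in finding a Stein kernel for $\nu_t$. This is obtained using integration by parts, and is given by

$$\tau_{\nu_t}(x) := e^{-2t}\,\frac{P_t(h\tau_\nu)(x)}{P_t h(x)} + (1-e^{-2t})\,I_d.$$

Therefore,

$$\int_{\mathbb{R}^d} \|\tau_{\nu_t}(x) - I_d\|_{HS}^2\,\nu_t(dx) \le e^{-4t}\int_{\mathbb{R}^d} \frac{\|P_t[h(\tau_\nu - I_d)](x)\|_{HS}^2}{P_t h(x)}\,\gamma(dx).$$

By the Cauchy-Schwarz inequality,

$$\|P_t[h(\tau_\nu - I_d)](x)\|_{HS}^2 \le P_t\big[h\,\|\tau_\nu - I_d\|_{HS}^2\big](x)\;P_t h(x).$$

Consequently,

$$\int_{\mathbb{R}^d} \|\tau_{\nu_t}(x) - I_d\|_{HS}^2\,\nu_t(dx) \le e^{-4t}\int_{\mathbb{R}^d} P_t\big[h\,\|\tau_\nu - I_d\|_{HS}^2\big](x)\,\gamma(dx)$$

$$\le e^{-4t}\int_{\mathbb{R}^d} h(x)\,\|\tau_\nu(x) - I_d\|_{HS}^2\,\gamma(dx) = e^{-4t}\int_{\mathbb{R}^d} \|\tau_\nu - I_d\|_{HS}^2\,\nu(dx),$$

which gives the desired inequality.

Remark: The Stein kernel $\tau_{\nu_t}$ admits the probabilistic representation

$$\tau_{\nu_t}(x) = E\big[e^{-2t}\tau_\nu(F) + (1-e^{-2t})\,I_d \,\big|\, F_t = x\big], \quad \nu_t(dx)\text{-a.e.},$$

where, on some probability space $(\Omega, \mathcal{F}, P)$, $F$ has distribution $\nu$ and $F_t := e^{-t}F + \sqrt{1-e^{-2t}}\,Z$, with $Z$ a standard d-dimensional Gaussian vector independent of $F$.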


Introduction

Objective:

- Give an improved form of exponential decay of entropy.
- Apply Stein's discrepancy in deriving concentration inequalities.
- Explore the relationship between transport distances and Stein's discrepancy. The WSH inequality, as an improvement of the Talagrand quadratic transportation cost inequality, provides a sharper bound on the Wasserstein distance W2, which involves Stein's discrepancy and relative entropy.
- Finally, we bound the Wp distance by Stein's discrepancy.

Exponential decay of entropy from HSI

The classical logarithmic Sobolev inequality ensures the exponential decay of the relative entropy

$$H(\nu_t|\gamma) \le e^{-2t}\, H(\nu|\gamma)$$

along the O-U semigroup, i.e., $d\nu_t = P_t h\,d\gamma$.

Now, applying HSI produces a reinforcement of this exponential decay under the finiteness of the Stein discrepancy.

Corollary
Let $\nu$ have Stein discrepancy $S(\nu|\gamma) = S$. For any $t \ge 0$,

$$H(\nu_t|\gamma) \le \frac{e^{-4t}}{e^{-2t} + \frac{1-e^{-2t}}{S^2}\,H(\nu|\gamma)}\; H(\nu|\gamma) \le \frac{e^{-4t}}{1-e^{-2t}}\; S^2(\nu|\gamma).$$
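The chain of inequalities in the corollary can be evaluated on an explicit example. The sketch below again assumes the illustrative one-dimensional Gaussian family nu = N(0, var0), with H = (var - 1 - log var)/2 and S = |var - 1|. The family and the function names are assumptions, not part of the talk.

```python
import math

def variance_at(t, var0):
    """Variance of nu_t when nu = N(0, var0)."""
    return math.exp(-2.0 * t) * var0 + 1.0 - math.exp(-2.0 * t)

def entropy(var):
    """H(N(0, var) | gamma) in dimension one."""
    return 0.5 * (var - 1.0 - math.log(var))

def corollary_chain(var0, t):
    """Return (H(nu_t), middle bound, last bound, classical bound e^{-2t} H)."""
    h0 = entropy(var0)
    s2 = (var0 - 1.0) ** 2          # squared Stein discrepancy of N(0, var0)
    e2, e4 = math.exp(-2.0 * t), math.exp(-4.0 * t)
    middle = e4 * h0 / (e2 + (1.0 - e2) * h0 / s2)
    last = e4 * s2 / (1.0 - e2)
    return entropy(variance_at(t, var0)), middle, last, e2 * h0

chains = [corollary_chain(v, t) for v in (0.25, 4.0, 9.0) for t in (0.1, 0.5, 2.0)]
```

Each tuple should be increasing in its first three entries, and the middle bound should not exceed the classical bound in the fourth entry.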


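Sketch of proof: Applying the HSI inequality to $\nu_t$, together with the decay $S(\nu_t|\gamma) \le e^{-2t}S$, implies that

$$H(\nu_t|\gamma) \le \frac{e^{-4t}S^2}{2}\,\log\left(1 + \frac{e^{4t}\,I(\nu_t|\gamma)}{S^2}\right).$$

Set $U(t) = \frac{e^{4t}}{S^2}\,H(\nu_t|\gamma)$. Then, since $\frac{d}{dt}H(\nu_t|\gamma) = -I(\nu_t|\gamma)$,

$$e^{2U} - 1 - 4U \le -U'.$$

Since $e^{2U} \ge 1 + 2U + 2U^2$, this gives

$$-2U + 2U^2 \le -U'.$$

Setting $V(t) = e^{-2t}U(t)$, we get $2e^{2t}V^2(t) \le -V'(t)$, so that after integration,

$$e^{2t} - 1 \le \frac{1}{V(t)} - \frac{1}{V(0)}.$$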

Stein discrepancy and concentration inequalities

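For the standard Gaussian measure $\gamma$ and any 1-Lipschitz function $u : \mathbb{R}^d \to \mathbb{R}$ with mean zero,

$$\gamma(u \ge r) \le e^{-r^2/2},$$

or equivalently (up to constants), $\|u\|_{p,\gamma} := \left(\int_{\mathbb{R}^d} |u|^p\,d\gamma\right)^{1/p} \le C\sqrt{p}$, $p \ge 1$. Now let $\nu$ have Stein kernel $\tau_\nu$. Do we have a similar result for $\|u\|_{p,\nu}$?

Theorem (Moment bounds and Stein discrepancy)

Let $\nu$ have Stein kernel $\tau_\nu$. There exists a positive constant $C$ such that for any 1-Lipschitz function $u : \mathbb{R}^d \to \mathbb{R}$ with $\int_{\mathbb{R}^d} u\,d\nu = 0$, and every $p \ge 2$,

$$\left(\int_{\mathbb{R}^d} |u|^p\,d\nu\right)^{1/p} \le C\left(S_p(\nu|\gamma) + \sqrt{p}\left(\int_{\mathbb{R}^d} \|\tau_\nu\|_{op}^{p/2}\,d\nu\right)^{1/p}\right).$$

Here, the p-Stein discrepancy is given by

$$S_p(\nu|\gamma) = \left(\int_{\mathbb{R}^d} \|\tau_\nu - I_d\|_{HS}^p\,d\nu\right)^{1/p}.$$

Taking into account that $\|\tau_\nu\|_{op} \le 1 + \|\tau_\nu - I_d\|_{HS}$, we have

$$\|u\|_{p,\nu} \le C\left(S_p(\nu|\gamma) + \sqrt{p} + \sqrt{p}\,\sqrt{S_p(\nu|\gamma)}\right).$$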


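An illustration of the theorem

Consider a centered random variable $X$ on a probability space with values in $\mathbb{R}^d$. Let $X_1, \dots, X_n$ be independent copies of $X$. Assume $X$ has law $\nu$ admitting a Stein kernel $\tau_\nu$. Set $T_n = \frac{1}{\sqrt{n}}\sum_{k=1}^n X_k$. A Stein kernel $\tau_{\nu_n}$ of the law $\nu_n$ of $T_n$ is

$$\tau_{\nu_n}(T_n) = E\left(\frac{1}{n}\sum_{k=1}^n \tau_\nu(X_k)\,\Big|\,T_n\right).$$

Hence,

$$S_p(\nu_n|\gamma) \le \left(E\,\Big\|\frac{1}{n}\sum_{k=1}^n \big(\tau_\nu(X_k) - I_d\big)\Big\|_{HS}^p\right)^{1/p} \le K_p\, n^{-1/2}\, S_p(\nu|\gamma),$$

which follows from Rosenthal's inequality, with $K_p = O(p)$.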


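For any 1-Lipschitz function $u : \mathbb{R}^d \to \mathbb{R}$ such that $E(u(T_n)) = 0$, by Theorem 5,

$$\|u(T_n)\|_{L^p} \le C\sqrt{p}\left(1 + n^{-1/2}\sqrt{p}\,S_p + n^{-1/4}\sqrt{p\,S_p}\right).$$

By Markov's inequality, one can deduce a concentration inequality for $T_n$. For example, if $S_p = O(p^\alpha)$ for some $\alpha > 0$, then $\|u(T_n)\|_{L^p} \le C\sqrt{p}$ for any $p \le n^{\frac{1}{2\alpha+2}}$. Then

$$P(|u(T_n)| \ge r) \le \left(\frac{C\sqrt{p}}{r}\right)^p$$

with $p \sim \frac{r^2}{4C^2}$. Optimizing $p$ gives

$$P(u(T_n) \ge r) \le C' e^{-r^2/C'},$$

for all $0 \le r \le r_n$, where $r_n \sim n^{\frac{1}{4\alpha+4}}$.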
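The step from moment bounds to a Gaussian-type tail can be spelled out with a small computation. The sketch below assumes a moment bound of the form ||u(T_n)||_p <= c sqrt(p) and evaluates the Markov bound at p = r^2 / (4 c^2), where the ratio c sqrt(p) / r equals 1/2. The function names and the sample values are hypothetical.

```python
import math

def markov_tail(r, p, c):
    """Markov bound (c * sqrt(p) / r) ** p, valid when ||u(T_n)||_p <= c * sqrt(p)."""
    return (c * math.sqrt(p) / r) ** p

def gaussian_type_tail(r, c):
    """The same bound evaluated at the choice p = r^2 / (4 c^2)."""
    p = r * r / (4.0 * c * c)
    return markov_tail(r, p, c)

# With this choice of p the bound equals 2^{-p} = exp(-r^2 * log(2) / (4 c^2)).
r, c = 3.0, 1.0
tail = gaussian_type_tail(r, c)
expected = math.exp(-r * r * math.log(2.0) / (4.0 * c * c))
```

The choice of p is only admissible in the range where the moment bound holds, which is what restricts r to the interval [0, r_n] in the statement above.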


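Recall the theorem's conclusion

$$\left(\int_{\mathbb{R}^d} |u|^p\,d\nu\right)^{1/p} \le C\left(S_p(\nu|\gamma) + \sqrt{p}\left(\int_{\mathbb{R}^d} \|\tau_\nu\|_{op}^{p/2}\,d\nu\right)^{1/p}\right).$$

Sketch of the proof: Set $\phi(t) = \int_{\mathbb{R}^d} (P_t u)^{2q}\,d\nu$. Differentiating along the semigroup and using the definition of the Stein kernel $\tau_\nu$ yields

$$\phi'(t) = 2q\int (P_t u)^{2q-1}\,\langle I_d - \tau_\nu, \mathrm{Hess}(P_t u)\rangle_{HS}\,d\nu - 2q(2q-1)\int (P_t u)^{2q-2}\,\langle \tau_\nu, \nabla P_t u\otimes\nabla P_t u\rangle_{HS}\,d\nu.$$

Using $|\nabla u| \le 1$ and $|\nabla P_t u| \le e^{-t}$, we bound

$$-\phi'(t) \le c_1(t,q)\int |P_t u|^{2q-1}\,\|\tau_\nu - I_d\|_{HS}\,d\nu + c_2(t,q)\int |P_t u|^{2q-2}\,\|\tau_\nu\|_{op}\,d\nu.$$

From Young's inequality, one can deduce

$$-\phi'(t) \le C(t)\,\phi(t) + D(t).$$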


Wasserstein distance and Stein discrepancy

Theorem (Wasserstein distance and Stein discrepancy)
For every centered probability measure $\nu$ on $\mathbb{R}^d$,

$$W_2(\nu, \gamma) \le S(\nu|\gamma).$$

Remark: The measure $\nu$ is not assumed to admit a density w.r.t. Lebesgue measure on $\mathbb{R}^d$.

Sketch of proof:
Step 1: Assume $d\nu = h\,d\gamma$, and let $v_t = \log P_t h$ and $d\nu_t = P_t h\,d\gamma$. Then, from a result of Otto and Villani (2000),

$$\frac{d^+}{dt}\, W_2(\nu, \nu_t) \le \left(\int_{\mathbb{R}^d} |\nabla v_t|^2\,d\nu_t\right)^{1/2}.$$


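$$W_2(\nu, \gamma) \le \int_0^\infty \left(\int_{\mathbb{R}^d} |\nabla v_t|^2\,d\nu_t\right)^{1/2} dt \le S(\nu|\gamma)\int_0^\infty \frac{e^{-2t}}{\sqrt{1-e^{-2t}}}\,dt = S(\nu|\gamma).$$

Step 2: For the general case, we do a regularization procedure. Namely, fix $\varepsilon > 0$ and introduce $F_\varepsilon = e^{-\varepsilon}F + \sqrt{1-e^{-2\varepsilon}}\,Z$, where $F$ and $Z$ are independent with laws $\nu$ and $\gamma$.

- The distribution $\nu_\varepsilon$ of $F_\varepsilon$ admits a smooth density $h_\varepsilon$ w.r.t. $\gamma$.
- $\nu_\varepsilon$ has a Stein kernel $\tau_{\nu_\varepsilon}(x) = E\big(e^{-2\varepsilon}\tau_\nu(F) + (1-e^{-2\varepsilon})\,I_d \,\big|\, F_\varepsilon = x\big)$.
- $S(\nu_\varepsilon|\gamma) \le e^{-2\varepsilon}\,S(\nu|\gamma)$.
- As $\varepsilon \to 0$, $F_\varepsilon \to F$ in $L^2$, so $W_2(\nu_\varepsilon, \gamma) \to W_2(\nu, \gamma)$.

$$W_2(\nu, \gamma) = \lim_{\varepsilon\to 0} W_2(\nu_\varepsilon, \gamma) \le \limsup_{\varepsilon\to 0} S(\nu_\varepsilon|\gamma) \le S(\nu|\gamma).$$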
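A quick sanity check of the bound and of the regularization step is possible on centered one-dimensional Gaussians. The sketch assumes nu = N(0, std^2), for which the standard closed forms W2(nu, gamma) = |std - 1| and S(nu|gamma) = |std^2 - 1| hold. This example and the function names are illustrative and not taken from the talk.

```python
import math

def w2_to_standard(std):
    """W2 distance between N(0, std^2) and N(0, 1) in dimension one."""
    return abs(std - 1.0)

def stein_discrepancy(std):
    """S(N(0, std^2) | gamma): the Stein kernel is the constant std^2."""
    return abs(std * std - 1.0)

def regularized_std(std, eps):
    """Standard deviation of F_eps = e^{-eps} F + sqrt(1 - e^{-2 eps}) Z."""
    e2 = math.exp(-2.0 * eps)
    return math.sqrt(e2 * std * std + 1.0 - e2)

stds = (0.2, 0.5, 1.0, 2.0, 5.0)
w2_below_stein = [w2_to_standard(s) <= stein_discrepancy(s) + 1e-15 for s in stds]
```

In this family the regularized Stein discrepancy equals e^{-2 eps} S exactly, and W2 of the regularized law converges to W2 of the original law as eps tends to 0.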


Talagrand inequality and WSH inequality

The Talagrand quadratic transportation cost inequality bounds the Wasserstein distance using relative entropy:

$$W_2^2(\nu, \gamma) \le 2H(\nu|\gamma).$$

Applying the HSI inequality produces

Theorem (Gaussian WSH inequality)
Let $d\nu = h\,d\gamma$ be a centered probability measure on $\mathbb{R}^d$ with smooth density $h$ w.r.t. $\gamma$. Assume that $S(\nu|\gamma)$ and $H(\nu|\gamma)$ are positive and finite. Then

$$W_2(\nu, \gamma) \le S(\nu|\gamma)\,\arccos\left(e^{-\frac{H(\nu|\gamma)}{S^2(\nu|\gamma)}}\right).$$


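Sketch of proof: By the HSI inequality and the decay of Stein's discrepancy,

$$H(\nu_t|\gamma) \le \frac{1}{2}\,S^2(\nu|\gamma)\,\log\left(1 + \frac{I(\nu_t|\gamma)}{S^2(\nu|\gamma)}\right).$$

Exponentiating both sides,

$$\sqrt{I(\nu_t|\gamma)} \le \frac{I(\nu_t|\gamma)}{S(\nu|\gamma)\sqrt{e^{\frac{2H(\nu_t|\gamma)}{S^2(\nu|\gamma)}} - 1}}.$$

By the result of Otto and Villani (2000), that is, the derivative of the Wasserstein distance is bounded by the square root of the Fisher information,

$$\frac{d^+}{dt}\,W_2(\nu, \nu_t) \le \frac{-\frac{d}{dt}H(\nu_t|\gamma)}{S(\nu|\gamma)\sqrt{e^{\frac{2H(\nu_t|\gamma)}{S^2(\nu|\gamma)}} - 1}} = -\frac{d}{dt}\left(S(\nu|\gamma)\,\arccos\left(e^{-\frac{H(\nu_t|\gamma)}{S^2(\nu|\gamma)}}\right)\right).$$

Integrating between $t = 0$ and $t = \infty$ yields the result.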
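The WSH bound can be compared with Talagrand's bound on an explicit example. Since arccos(e^{-r}) <= sqrt(2r) for r >= 0, the WSH bound is never larger than sqrt(2H). The sketch below assumes the illustrative family nu = N(0, var) in dimension one, with W2 = |sqrt(var) - 1|, S = |var - 1| and H = (var - 1 - log var)/2; the function names are hypothetical.

```python
import math

def entropy(var):
    """H(N(0, var) | gamma) in dimension one."""
    return 0.5 * (var - 1.0 - math.log(var))

def wsh_bound(entropy_value, stein):
    """S * arccos(exp(-H / S^2))."""
    return stein * math.acos(math.exp(-entropy_value / (stein * stein)))

def talagrand_bound(entropy_value):
    """sqrt(2 H)."""
    return math.sqrt(2.0 * entropy_value)

def compare(var):
    """Return (W2, WSH bound, Talagrand bound) for nu = N(0, var)."""
    w2 = abs(math.sqrt(var) - 1.0)
    stein = abs(var - 1.0)
    h = entropy(var)
    return w2, wsh_bound(h, stein), talagrand_bound(h)

rows = [compare(v) for v in (0.25, 4.0, 9.0)]
```

Each row is expected to be non-decreasing from left to right.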


HWI Inequality and the comparison with HSI inequality

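Otto and Villani (2000) give the HWI inequality, which states that, for all $d\nu = h\,d\gamma$,

$$H(\nu|\gamma) \le W_2(\nu, \gamma)\sqrt{I(\nu|\gamma)} - \frac{1}{2}\,W_2^2(\nu, \gamma).$$

Q: can we produce an inequality involving $H$, $W_2$, $I$, $S$? Here is a possible way for the computation, writing $\mathrm{Ent}_\gamma(h) = H(\nu|\gamma)$ and taking $0 < u \le t$:

$$\mathrm{Ent}_\gamma(h) = \int_0^t I_\gamma(P_s h)\,ds + \mathrm{Ent}_\gamma(P_t h)$$

$$\le I_\gamma(h)\int_0^u e^{-2s}\,ds + S^2(\nu|\gamma)\int_u^t \frac{e^{-4s}}{1-e^{-2s}}\,ds + \frac{e^{-2t}}{2(1-e^{-2t})}\,W_2^2(\nu, \gamma),$$

following from the proof idea of the HSI inequality and the reverse Talagrand inequality along the semigroup.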


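Evaluating the integrals and setting $\alpha = 1 - e^{-2u} \le 1 - e^{-2t} = \beta$,

$$H(\nu|\gamma) \le \frac{1}{2}\inf_{0<\alpha\le\beta\le 1} \Phi(\alpha, \beta),$$

$$\Phi(\alpha, \beta) = \alpha\,I(\nu|\gamma) + (\alpha - \log\alpha)\,S^2(\nu|\gamma) + \frac{1-\beta}{\beta}\,W_2^2(\nu, \gamma) + (\log\beta - \beta)\,S^2(\nu|\gamma).$$

When $\alpha = \beta$, HWI is obtained. When $\beta = 1$, HSI is obtained.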
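The two special cases can be checked directly from the formula for Phi. The sketch below evaluates Phi at beta = 1 with alpha = S^2 / (I + S^2), and at alpha = beta = W2 / sqrt(I), which requires W2 <= sqrt(I). The numerical values of I, S and W2 are arbitrary illustrative choices and the function names are hypothetical.

```python
import math

def phi(alpha, beta, fisher, stein, w2):
    """Phi(alpha, beta) as defined above, for 0 < alpha <= beta <= 1."""
    s2, w22 = stein * stein, w2 * w2
    return (alpha * fisher + (alpha - math.log(alpha)) * s2
            + (1.0 - beta) / beta * w22 + (math.log(beta) - beta) * s2)

def hsi_bound(fisher, stein):
    """(S^2 / 2) * log(1 + I / S^2)."""
    s2 = stein * stein
    return 0.5 * s2 * math.log1p(fisher / s2)

def hwi_bound(fisher, w2):
    """W2 * sqrt(I) - W2^2 / 2."""
    return w2 * math.sqrt(fisher) - 0.5 * w2 * w2

fisher_value, stein_value, w2_value = 4.0, 1.5, 1.0  # hypothetical values

# beta = 1: minimizing in alpha gives alpha = S^2 / (I + S^2) and the HSI bound.
alpha_hsi = stein_value ** 2 / (fisher_value + stein_value ** 2)
from_beta_one = 0.5 * phi(alpha_hsi, 1.0, fisher_value, stein_value, w2_value)

# alpha = beta: the S^2 terms cancel, and alpha = W2 / sqrt(I) gives the HWI bound.
alpha_hwi = w2_value / math.sqrt(fisher_value)
from_diagonal = 0.5 * phi(alpha_hwi, alpha_hwi, fisher_value, stein_value, w2_value)
```

On the diagonal alpha = beta the terms involving S^2 cancel, which is why the Stein discrepancy disappears from HWI.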

HWI Vs. HSI

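Consider the probability measure $d\nu_n(x) = \rho_n(x)\,dx$, where

$$\rho_n(x) = \frac{1}{\sqrt{2\pi}}\left((1-a_n)\,e^{-x^2/2} + n\,a_n\,e^{-n^2x^2/2}\right),$$

$a_n \in [0, 1]$ and $a_n = o\left(\frac{1}{\log n}\right)$.

A direct computation shows that $H(\nu_n|\gamma) \to 0$. Moreover,

$$I(\nu_n|\gamma) = \int_{\mathbb{R}} \frac{\rho_n'(x)^2}{\rho_n(x)}\,dx - 1 \sim n^2 a_n,$$

$$S^2(\nu_n|\gamma) = \int_{\mathbb{R}} (\tau_n(x) - 1)^2\,\rho_n(x)\,dx \le a_n \to 0,$$

$W_2(\nu_n, \gamma) \le \sqrt{a_n}$. Also, $W_2(\nu_n, \gamma) \ge c\,a_n$ for some constant $c > 0$.

The bound of HWI: $W_2(\nu_n, \gamma)\sqrt{I(\nu_n|\gamma)} - \frac{1}{2}W_2^2(\nu_n, \gamma) \sim n\,a_n^{3/2} \to \infty$.

The bound of HSI: $S^2(\nu_n|\gamma)\,\log\left(1 + \frac{I(\nu_n|\gamma)}{S^2(\nu_n|\gamma)}\right) \sim 2a_n\log n \to 0$.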


Wp distance and Stein discrepancy

Proposition (Wp distance and Stein discrepancy)

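Let $\nu$ be a centered probability measure on $\mathbb{R}^d$ with Stein kernel $\tau_\nu$ in the sense $\int_{\mathbb{R}^d} x\,\phi\,d\nu = \int_{\mathbb{R}^d} \tau_\nu\nabla\phi\,d\nu$ for every smooth test function $\phi$. For every $p \ge 1$, set

$$\|\tau_\nu - I_d\|_{p,\nu} = \left(\sum_{i,j=1}^d \int_{\mathbb{R}^d} |\tau_\nu^{ij} - \delta_{ij}|^p\,d\nu\right)^{1/p}.$$

(1) Let $p \in [1, 2)$. Then

$$W_p(\nu, \gamma) \le C_p\,d^{1-1/p}\,\|\tau_\nu - I_d\|_{p,\nu}.$$

(2) Let $p \in [2, \infty)$. If $\nu$ has finite moments of order $p$, then

$$W_p(\nu, \gamma) \le C_p\,d^{1-2/p}\,\|\tau_\nu - I_d\|_{p,\nu}.$$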

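Idea of the proof: As usual, write $v_t = \log P_t h$ and $d\nu_t = P_t h\,d\gamma$. A version of $\nabla v_t$, $t > 0$, is given by

$$\nabla v_t(x) = \frac{e^{-2t}}{\sqrt{1-e^{-2t}}}\,E\big((\tau_\nu(F) - I_d)\,Z \,\big|\, F_t = x\big) = \frac{e^{-2t}}{\sqrt{1-e^{-2t}}}\left(E\Big(\sum_{j=1}^d \big(\tau_\nu^{ij}(F) - \delta_{ij}\big)Z_j \,\Big|\, F_t = x\Big)\right)_{1\le i\le d},$$

where $F$ and $Z$ are independent with laws $\nu$ and $\gamma$ respectively, and $F_t = e^{-t}F + \sqrt{1-e^{-2t}}\,Z$.

Moreover,

$$W_p(\nu, \gamma) \le \int_0^\infty \left(\int_{\mathbb{R}^d} |\nabla v_t|^p\,d\nu_t\right)^{1/p} dt.$$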


Bibliography

Ledoux, Michel and Nourdin, Ivan and Peccati, Giovanni (2015). Stein's method, logarithmic Sobolev and transport inequalities. Geometric and Functional Analysis 25(1), 256–306.
