Stein’s method, logarithmic Sobolev and transport inequalities

Arturo Jaramillo and HongJuan Zhou

University of Kansas

November 2017

Arturo Jaramillo and HongJuan Zhou (University of Kansas) November 2017 1 / 35

Introduction

For $d \ge 1$, let $\gamma(dx)$ denote the standard Gaussian measure on $\mathbb{R}^d$.

Theorem (Classical logarithmic Sobolev inequality for $\gamma$)
For every probability measure $\nu$ of the form $\nu(dx) = h(x)\gamma(dx)$, with $h : \mathbb{R}^d \to \mathbb{R}_+$, the relative entropy and Fisher information of $\nu$ with respect to $\gamma$, defined by

$$H(\nu|\gamma) := \int_{\mathbb{R}^d} h(x)\log(h(x))\,\gamma(dx), \qquad I(\nu|\gamma) := \int_{\mathbb{R}^d} \frac{|\nabla h(x)|^2}{h(x)}\,\gamma(dx),$$

satisfy

$$H(\nu|\gamma) \le \frac{1}{2}\, I(\nu|\gamma).$$
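
As a sanity check of the inequality, the following sketch evaluates both sides numerically for a one-dimensional Gaussian $\nu = N(m, s^2)$. This example is not from the slides; the helper names (`gauss_pdf`, `integrate`, `entropy_and_fisher`) and the midpoint quadrature are illustrative assumptions. For this family one expects the closed forms $H = \frac12(s^2 + m^2 - 1 - \log s^2)$ and $I = m^2 + (s^2-1)^2/s^2$.

```python
import math

def gauss_pdf(x, mean, var):
    """Density of N(mean, var) with respect to Lebesgue measure."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def integrate(f, a, b, n=4000):
    """Composite midpoint rule on [a, b]."""
    step = (b - a) / n
    return step * sum(f(a + (i + 0.5) * step) for i in range(n))

def entropy_and_fisher(mean, var):
    """Numerically evaluate H(nu|gamma) and I(nu|gamma) for nu = N(mean, var), d = 1."""
    def log_h(x):   # log of the density h = d(nu)/d(gamma)
        return -(x - mean) ** 2 / (2.0 * var) + x * x / 2.0 - 0.5 * math.log(var)
    def dlog_h(x):  # derivative of log h
        return -(x - mean) / var + x
    half_width = 12.0 * math.sqrt(var)
    lo, hi = mean - half_width, mean + half_width
    # H = int h log h dgamma = int log h dnu
    H = integrate(lambda x: log_h(x) * gauss_pdf(x, mean, var), lo, hi)
    # I = int |h'|^2 / h dgamma = int |(log h)'|^2 dnu
    I = integrate(lambda x: dlog_h(x) ** 2 * gauss_pdf(x, mean, var), lo, hi)
    return H, I

for mean, var in [(0.0, 2.0), (1.0, 1.0), (-0.5, 0.3)]:
    H, I = entropy_and_fisher(mean, var)
    print(f"mean={mean}, var={var}: H={H:.6f}, I/2={0.5 * I:.6f}")
```

In the pure shift case (`var = 1`) the two sides coincide, which is consistent with the constant $\frac12$ being optimal.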

Introduction

Objective:
Prove a sharper version of the logarithmic Sobolev inequality that includes the so-called “Stein discrepancy”, a measure of how close a probability measure is to the standard $d$-dimensional Gaussian distribution.

Note: in the sequel, we will assume that $\nu(dx) = h(x)\gamma(dx)$.

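Preliminaries

A matrix-valued map $\tau_\nu : \mathbb{R}^d \to \mathbb{R}^{d\times d}$ is said to be a Stein kernel for $\nu$ if, for every smooth $\varphi : \mathbb{R}^d \to \mathbb{R}$,

$$\int_{\mathbb{R}^d} x\cdot\nabla\varphi(x)\,\nu(dx) = \int_{\mathbb{R}^d} \langle \tau_\nu(x), \mathrm{Hess}[\varphi](x)\rangle_{HS}\,\nu(dx),$$

where $\langle A, B\rangle_{HS} := \mathrm{tr}(A^{*}B)$ is the Hilbert-Schmidt inner product on $\mathbb{R}^{d\times d}$.

Remark
- The matrix $\tau_\nu(x)$ can be taken to be symmetric.
- In the case where $\nu = \gamma$, we can take $\tau_\nu(x) := \mathrm{Id}$ (the identity matrix), since

$$\int_{\mathbb{R}^d} x\cdot\nabla\varphi(x)\,\nu(dx) = \int_{\mathbb{R}^d} \Delta\varphi(x)\,\nu(dx) = \int_{\mathbb{R}^d} \langle \mathrm{Id}, \mathrm{Hess}[\varphi](x)\rangle_{HS}\,\nu(dx).$$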
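
To make the definition concrete, here is a minimal numerical sketch for a non-Gaussian example that does not appear in the slides: the uniform law on $[-\sqrt3, \sqrt3]$ in dimension one, for which $\tau(x) = (3 - x^2)/2$ is a Stein kernel (this follows from one integration by parts). The function names and the choice of test functions are illustrative assumptions.

```python
import math

A = math.sqrt(3.0)  # the uniform law on [-A, A] has mean 0 and variance 1

def tau(x):
    """Stein kernel of the uniform law on [-sqrt(3), sqrt(3)] in d = 1."""
    return (3.0 - x * x) / 2.0

def expect(f, n=20000):
    """E[f(X)] for X uniform on [-A, A], by the composite midpoint rule."""
    step = 2.0 * A / n
    return sum(f(-A + (i + 0.5) * step) for i in range(n)) / n

def stein_gap(dphi, d2phi):
    """E[X phi'(X)] - E[tau(X) phi''(X)]; zero when tau is a Stein kernel."""
    return expect(lambda x: x * dphi(x)) - expect(lambda x: tau(x) * d2phi(x))

gap_poly = stein_gap(lambda x: 4.0 * x ** 3, lambda x: 12.0 * x ** 2)   # phi = x^4
gap_cos = stein_gap(lambda x: -math.sin(x), lambda x: -math.cos(x))     # phi = cos
gap_exp = stein_gap(math.exp, math.exp)                                 # phi = exp
print(gap_poly, gap_cos, gap_exp)
```

All three gaps should vanish up to quadrature error, whereas replacing `tau` by the constant 1 (the Gaussian kernel) would leave a visible gap for these test functions.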

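Improved log-Sobolev inequality

Whenever $\tau_\nu$ exists, we define the Stein discrepancy of $\nu$ with respect to $\gamma$ as

$$S(\nu|\gamma) := \left(\int_{\mathbb{R}^d} \|\tau_\nu(x) - \mathrm{Id}\|_{HS}^2\,\nu(dx)\right)^{1/2}.$$

The main result of the talk is the following:

Theorem (Improved logarithmic Sobolev inequality, HSI)

$$H(\nu|\gamma) \le \frac{1}{2}\, S(\nu|\gamma)^2 \log\left(1 + \frac{I(\nu|\gamma)}{S(\nu|\gamma)^2}\right).$$

In the sequel, we will assume that $0 < S(\nu|\gamma), I(\nu|\gamma) < \infty$.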
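
Since $\log(1+x) \le x$, the HSI bound is never worse than the classical bound $\frac12 I(\nu|\gamma)$. The sketch below compares the two on the centered Gaussian family $\nu = N(0, s^2)$ in dimension one, an example chosen here for illustration (not taken from the slides). It assumes the constant Stein kernel $\tau_\nu = s^2$, so that $S^2 = (s^2-1)^2$, together with the closed forms for $H$ and $I$ stated in the comments.

```python
import math

def lsi_bound(fisher):
    """Right-hand side of the classical log-Sobolev inequality."""
    return 0.5 * fisher

def hsi_bound(stein_sq, fisher):
    """Right-hand side of HSI; stein_sq stands for S(nu|gamma)^2 > 0."""
    return 0.5 * stein_sq * math.log1p(fisher / stein_sq)

def centered_gaussian(var):
    """(H, I, S^2) of nu = N(0, var) relative to gamma, d = 1, var != 1."""
    H = 0.5 * (var - 1.0 - math.log(var))  # relative entropy
    I = (var - 1.0) ** 2 / var             # Fisher information
    S2 = (var - 1.0) ** 2                  # Stein kernel is the constant var
    return H, I, S2

for var in (0.25, 0.5, 2.0, 4.0):
    H, I, S2 = centered_gaussian(var)
    print(f"var={var}: H={H:.4f}  HSI={hsi_bound(S2, I):.4f}  LSI={lsi_bound(I):.4f}")
```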

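Basic results of the Stein kernel

Assume that $\tau_\nu = \{\tau_\nu^{i,j}\}_{1\le i,j\le d}$ exists and is symmetric. Recall that $\tau_\nu$ satisfies

$$\int_{\mathbb{R}^d} x\cdot\nabla\varphi(x)\,\nu(dx) = \int_{\mathbb{R}^d} \langle \tau_\nu(x), \mathrm{Hess}[\varphi](x)\rangle_{HS}\,\nu(dx). \tag{1}$$

Thus, for fixed $1 \le i, j \le d$, we can take $\varphi(x) = x_i$ and $\varphi(x) = x_i x_j$ in (1) to obtain

$$\int_{\mathbb{R}^d} x\,\nu(dx) = 0 \quad\text{and}\quad \int_{\mathbb{R}^d} x_i x_j\,\nu(dx) = \int_{\mathbb{R}^d} \tau_\nu^{i,j}(x)\,\nu(dx).$$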
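
For completeness, the second identity can be checked by hand. The following worked step only uses (1) and the symmetry of $\tau_\nu$; the notation $E_{ij}$ for the elementary matrix with a 1 in position $(i,j)$ is introduced here for convenience.

```latex
% Case i \neq j, with \varphi(x) = x_i x_j:
%   \nabla\varphi(x) = x_j e_i + x_i e_j,  \mathrm{Hess}[\varphi] = E_{ij} + E_{ji}.
\begin{align*}
x \cdot \nabla\varphi(x) &= 2\, x_i x_j, \\
\langle \tau_\nu(x), \mathrm{Hess}[\varphi](x) \rangle_{HS}
  &= \tau_\nu^{i,j}(x) + \tau_\nu^{j,i}(x) = 2\, \tau_\nu^{i,j}(x).
\end{align*}
% Substituting into (1) and dividing by 2:
\[
\int_{\mathbb{R}^d} x_i x_j \,\nu(dx) = \int_{\mathbb{R}^d} \tau_\nu^{i,j}(x)\,\nu(dx).
\]
% Case i = j: \varphi(x) = x_i^2 gives x \cdot \nabla\varphi = 2 x_i^2 and
% \mathrm{Hess}[\varphi] = 2 E_{ii}, which leads to the same conclusion.
```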

The Ornstein-Uhlenbeck semigroup

Let $\{P_t\}_{t\ge 0}$ denote the Ornstein-Uhlenbeck semigroup on $\mathbb{R}^d$, with infinitesimal generator

$$Lf = \Delta f - x\cdot\nabla f, \quad \text{for } f \in C^2(\mathbb{R}^d;\mathbb{R}).$$

It is well known that $P_t$ can be written as

$$P_t f(x) = \int_{\mathbb{R}^d} f\left(e^{-t}x + \sqrt{1-e^{-2t}}\,y\right)\gamma(dy).$$

This expression is called Mehler’s formula. From it, we can easily obtain

$$\nabla P_t f = e^{-t}P_t(\nabla f).$$
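
The commutation rule can be checked numerically from Mehler’s formula. The sketch below works in dimension one with the test function $f = \sin$ (an arbitrary choice made here), approximates the Gaussian integral by a midpoint rule, and compares a finite-difference derivative of $P_t f$ with $e^{-t}P_t f'$. The function names are illustrative assumptions.

```python
import math

def gauss_expect(g, n=4000, cutoff=10.0):
    """E[g(Y)] for Y ~ N(0, 1), by the composite midpoint rule on [-cutoff, cutoff]."""
    step = 2.0 * cutoff / n
    total = 0.0
    for i in range(n):
        y = -cutoff + (i + 0.5) * step
        total += g(y) * math.exp(-0.5 * y * y)
    return total * step / math.sqrt(2.0 * math.pi)

def ou(f, t, x):
    """P_t f(x) via Mehler's formula in dimension d = 1."""
    a = math.exp(-t)
    b = math.sqrt(1.0 - math.exp(-2.0 * t))
    return gauss_expect(lambda y: f(a * x + b * y))

f, df = math.sin, math.cos  # test function and its derivative
t, x, eps = 0.7, 0.3, 1e-4

# d/dx P_t f(x), by a central difference, versus e^{-t} P_t f'(x).
grad_of_Pt = (ou(f, t, x + eps) - ou(f, t, x - eps)) / (2.0 * eps)
Pt_of_grad = math.exp(-t) * ou(df, t, x)

# For f = sin, Mehler's formula gives P_t f(x) = exp(-(1 - e^{-2t}) / 2) * sin(e^{-t} x).
exact = math.exp(-0.5 * (1.0 - math.exp(-2.0 * t))) * math.sin(math.exp(-t) * x)
print(grad_of_Pt, Pt_of_grad, ou(f, t, x), exact)
```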

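The Ornstein-Uhlenbeck semigroup

By using Mehler’s formula, as well as an integration by parts argument, we can show that

$$P_t(\nabla f)(x) = \frac{1}{\sqrt{1-e^{-2t}}}\int_{\mathbb{R}^d} y\, f\left(e^{-t}x + \sqrt{1-e^{-2t}}\,y\right)\gamma(dy).$$

The generator $L$ satisfies the following integration by parts formula:

$$\int_{\mathbb{R}^d} f(x)\,Lg(x)\,\gamma(dx) = -\int_{\mathbb{R}^d} \nabla f(x)\cdot\nabla g(x)\,\gamma(dx).$$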

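Formulas for $I(\nu|\gamma)$

The Fisher information $I(\nu|\gamma)$ can be written in terms of $L$ as follows:

$$\begin{aligned}
I(\nu|\gamma) &= \int_{\mathbb{R}^d} \frac{|\nabla h(x)|^2}{h(x)}\,\gamma(dx) \\
&= \int_{\mathbb{R}^d} |\nabla \log h(x)|^2\, h(x)\,\gamma(dx) \\
&= -\int_{\mathbb{R}^d} \big(L\log h(x)\big)\, h(x)\,\gamma(dx).
\end{aligned}$$

Thus, by setting $v := \log h$, we get

$$I(\nu|\gamma) = -\int_{\mathbb{R}^d} Lv(x)\,\nu(dx).$$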

Organization of the proof

Define $v_t := \log P_t h$ and $\nu_t(dx) = P_t h(x)\,\gamma(dx)$. Replacing $h$ by $P_t h$ and using the symmetry of $P_t$ in the previous expressions, we get

$$I(\nu_t|\gamma) = -\int_{\mathbb{R}^d} L P_t v_t(x)\,\nu(dx).$$

To prove the HSI, we use the integrated de Bruijn formula

$$H(\nu|\gamma) = \int_0^\infty I(\nu_t|\gamma)\,dt.$$

The result follows from two different bounds on $I_\gamma(P_t h) := I(\nu_t|\gamma)$, one suited to $t \approx 0$ and one suited to $t \approx \infty$.
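
The integrated de Bruijn formula can be illustrated on a case where everything is explicit. The sketch below assumes $\nu = N(0, s^2)$ in dimension one (an example chosen here, not taken from the slides), so that $\nu_t$ is the centered Gaussian with variance $1 + e^{-2t}(s^2 - 1)$, and integrates the Fisher information in $t$ with a midpoint rule. The truncation horizon and function names are illustrative assumptions.

```python
import math

def fisher_gaussian(var):
    """I(N(0, var) | gamma) in d = 1."""
    return (var - 1.0) ** 2 / var

def entropy_gaussian(var):
    """H(N(0, var) | gamma) in d = 1."""
    return 0.5 * (var - 1.0 - math.log(var))

def ou_variance(var, t):
    """Variance of nu_t, the law of e^{-t} F + sqrt(1 - e^{-2t}) Z with F ~ N(0, var)."""
    return 1.0 + math.exp(-2.0 * t) * (var - 1.0)

def integrated_fisher(var, horizon=20.0, n=20000):
    """Midpoint-rule approximation of the integral of I(nu_t|gamma) over [0, horizon]."""
    step = horizon / n
    return step * sum(fisher_gaussian(ou_variance(var, (i + 0.5) * step)) for i in range(n))

for var in (0.25, 0.5, 2.0, 4.0):
    print(f"var={var}: integral={integrated_fisher(var):.6f}  H={entropy_gaussian(var):.6f}")
```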

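Decay of $I(\nu_t|\gamma)$ and $S(\nu_t|\gamma)$

The following results are the main ingredients for the proof of the HSI.

Theorem
For every $t > 0$,

$$I(\nu_t|\gamma) \le e^{-2t}\, I(\nu_0|\gamma),$$

and

$$I(\nu_t|\gamma) \le \frac{e^{-4t}}{1-e^{-2t}}\,\|\tau_\nu - \mathrm{Id}\|_{2,\nu}^2 = \frac{e^{-4t}}{1-e^{-2t}}\, S(\nu_0|\gamma)^2.$$

Moreover, the Stein discrepancy satisfies

$$S(\nu_t|\gamma) \le e^{-2t}\,S(\nu_0|\gamma).$$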
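
These three bounds can be tested on an explicit family. The sketch below again assumes $\nu = N(0, s^2)$ in dimension one with constant Stein kernel $s^2$ (an illustrative example, not from the slides) and returns the quantities appearing in the theorem; the function name `decay_quantities` is hypothetical.

```python
import math

def decay_quantities(var, t):
    """For nu = N(0, var), d = 1, return (I_t, e^{-2t} I_0, e^{-4t}/(1-e^{-2t}) S_0^2, S_t, e^{-2t} S_0)."""
    u = math.exp(-2.0 * t)
    var_t = 1.0 + u * (var - 1.0)       # variance of nu_t
    fisher_0 = (var - 1.0) ** 2 / var   # I(nu_0|gamma)
    stein_0 = abs(var - 1.0)            # S(nu_0|gamma)
    fisher_t = (var_t - 1.0) ** 2 / var_t
    stein_t = abs(var_t - 1.0)
    return fisher_t, u * fisher_0, u * u / (1.0 - u) * stein_0 ** 2, stein_t, u * stein_0

for var in (0.25, 4.0):
    for t in (0.05, 0.5, 2.0):
        I_t, small_t_bound, large_t_bound, S_t, S_bound = decay_quantities(var, t)
        print(f"var={var}, t={t}: I_t={I_t:.5f} <= min({small_t_bound:.5f}, {large_t_bound:.5f})")
```

In this example the first bound is the smaller one for small $t$ when $s^2 > 1$, and the Stein discrepancy decays exactly at rate $e^{-2t}$.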

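Proof of the HSI inequality

Using the previous bounds, we have that for every $u > 0$,

$$\begin{aligned}
H(\nu|\gamma) &= \int_0^u I_\gamma(P_t h)\,dt + \int_u^\infty I_\gamma(P_t h)\,dt \\
&\le I(\nu|\gamma)\int_0^u e^{-2t}\,dt + S(\nu|\gamma)^2\int_u^\infty \frac{e^{-4t}}{1-e^{-2t}}\,dt \\
&\le \frac{1}{2}\, I(\nu|\gamma)\,(1-e^{-2u}) + \frac{1}{2}\, S(\nu|\gamma)^2\left(-e^{-2u} - \log(1-e^{-2u})\right).
\end{aligned}$$

Optimizing in $u$ (computations are easier if we set $r = 1 - e^{-2u}$), we obtain the result.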
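
The optimization step can be written out as follows. This is a worked sketch using the shorthand $I = I(\nu|\gamma)$, $S = S(\nu|\gamma)$ and the auxiliary function $B(r)$, which are notations introduced here.

```latex
% With r = 1 - e^{-2u} \in (0, 1), the last bound reads
\[
H(\nu|\gamma) \le B(r) := \tfrac12\, I\, r + \tfrac12\, S^2 \,( r - 1 - \log r ).
\]
% B is convex on (0, 1), and
\[
B'(r) = \tfrac12\, I + \tfrac12\, S^2 \Big( 1 - \frac1r \Big) = 0
\quad\Longleftrightarrow\quad
r^* = \frac{S^2}{S^2 + I} \in (0, 1).
\]
% Using r^* - 1 = -I / (S^2 + I) and -\log r^* = \log(1 + I / S^2):
\begin{align*}
B(r^*) &= \frac12 \frac{I S^2}{S^2 + I} - \frac12 \frac{S^2 I}{S^2 + I}
          + \frac12 S^2 \log\Big( 1 + \frac{I}{S^2} \Big) \\
       &= \frac12 S^2 \log\Big( 1 + \frac{I}{S^2} \Big),
\end{align*}
% which is the right-hand side of HSI.
```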

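Bound for $I(\nu_t|\gamma)$, when $t$ is large

We have that

$$\begin{aligned}
I_\gamma(P_t h) &= -\int_{\mathbb{R}^d} L P_t v_t(x)\,\nu(dx) = -\int_{\mathbb{R}^d} \left[\Delta P_t v_t(x) - x\cdot\nabla P_t v_t(x)\right]\nu(dx) \\
&= \int_{\mathbb{R}^d} \langle \tau_\nu(x) - \mathrm{Id}, \mathrm{Hess}(P_t v_t)(x)\rangle_{HS}\,\nu(dx).
\end{aligned}$$

To rewrite the Hessian, notice that

$$\begin{aligned}
\partial_{i,j} P_t v_t(x) &= e^{-2t}P_t(\partial_{i,j} v_t)(x) \\
&= \frac{e^{-2t}}{\sqrt{1-e^{-2t}}}\int_{\mathbb{R}^d} y_i\,\frac{\partial v_t}{\partial x_j}\left(e^{-t}x + \sqrt{1-e^{-2t}}\,y\right)\gamma(dy).
\end{aligned}$$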

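Bound for $I(\nu_t|\gamma)$, when $t$ is large

From here it follows that

$$\begin{aligned}
&\int_{\mathbb{R}^d} \langle \tau_\nu(x) - \mathrm{Id}, \mathrm{Hess}(P_t v_t)(x)\rangle_{HS}\,\nu(dx) \\
&\qquad= \frac{e^{-2t}}{\sqrt{1-e^{-2t}}}\int_{\mathbb{R}^d}\int_{\mathbb{R}^d} \left[(\tau_\nu(x) - \mathrm{Id})\,y\cdot\nabla v_t\left(e^{-t}x + \sqrt{1-e^{-2t}}\,y\right)\right]\nu(dx)\,\gamma(dy).
\end{aligned}$$

This implies, after two suitable applications of the Cauchy-Schwarz inequality, that

$$\begin{aligned}
I_\gamma(P_t h) &\le \frac{e^{-2t}}{\sqrt{1-e^{-2t}}}\int_{\mathbb{R}^d}\int_{\mathbb{R}^d} |(\tau_\nu(x) - \mathrm{Id})\,y|\,\left|\nabla v_t\left(e^{-t}x + \sqrt{1-e^{-2t}}\,y\right)\right|\nu(dx)\,\gamma(dy) \\
&\le \frac{e^{-2t}}{\sqrt{1-e^{-2t}}}\left(\int_{\mathbb{R}^d} \|\tau_\nu(x) - \mathrm{Id}\|_{HS}^2\,\nu(dx)\int_{\mathbb{R}^d} P_t|\nabla v_t|^2(x)\,\nu(dx)\right)^{1/2}.
\end{aligned}$$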

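Bound for $I(\nu_t|\gamma)$, when $t$ is large

Thus, since

$$\begin{aligned}
\int_{\mathbb{R}^d} P_t|\nabla v_t|^2(x)\,\nu(dx) &= \int_{\mathbb{R}^d} P_t|\nabla v_t|^2(x)\,h(x)\,\gamma(dx) \\
&= \int_{\mathbb{R}^d} |\nabla v_t(x)|^2\,P_t h(x)\,\gamma(dx) = I_\gamma(P_t h),
\end{aligned}$$

we get that

$$I_\gamma(P_t h) \le \frac{e^{-2t}}{\sqrt{1-e^{-2t}}}\left(\int_{\mathbb{R}^d} \|\tau_\nu(x) - \mathrm{Id}\|_{HS}^2\,\nu(dx)\right)^{1/2} I_\gamma(P_t h)^{1/2},$$

which, after dividing by $I_\gamma(P_t h)^{1/2}$ and squaring, implies the desired inequality.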

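Sketch of the proof of $S(\nu_t|\gamma) \le e^{-2t}S(\nu_0|\gamma)$

The idea is to find a Stein kernel for $\nu_t$. It is obtained using integration by parts, and is given by

$$\tau_{\nu_t}(x) := e^{-2t}\,\frac{P_t(h\tau_\nu)(x)}{P_t h(x)} + (1-e^{-2t})\,\mathrm{Id}.$$

Therefore,

$$\int_{\mathbb{R}^d} \|\tau_{\nu_t}(x) - \mathrm{Id}\|_{HS}^2\,\nu_t(dx) \le e^{-4t}\int_{\mathbb{R}^d} \frac{\|P_t[h(\tau_\nu - \mathrm{Id})](x)\|_{HS}^2}{P_t h(x)}\,\gamma(dx).$$

By the Cauchy-Schwarz inequality,

$$\|P_t[h(\tau_\nu - \mathrm{Id})](x)\|_{HS}^2 \le P_t\left[h\,\|\tau_\nu - \mathrm{Id}\|_{HS}^2\right](x)\,P_t h(x).$$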

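Consequently,

$$\begin{aligned}
\int_{\mathbb{R}^d} \|\tau_{\nu_t}(x) - \mathrm{Id}\|_{HS}^2\,\nu_t(dx) &\le e^{-4t}\int_{\mathbb{R}^d} P_t\left[h\,\|\tau_\nu - \mathrm{Id}\|_{HS}^2\right](x)\,\gamma(dx) \\
&\le e^{-4t}\int_{\mathbb{R}^d} h(x)\,\|\tau_\nu(x) - \mathrm{Id}\|_{HS}^2\,\gamma(dx) \\
&= e^{-4t}\int_{\mathbb{R}^d} \|\tau_\nu(x) - \mathrm{Id}\|_{HS}^2\,\nu(dx),
\end{aligned}$$

which gives the desired inequality.

Remark: The Stein kernel $\tau_{\nu_t}$ admits the probabilistic representation

$$\tau_{\nu_t}(x) = \mathbb{E}\left[e^{-2t}\tau_\nu(F) + (1-e^{-2t})\,\mathrm{Id} \,\middle|\, F_t = x\right], \quad \nu_t(dx)\text{-a.e.},$$

where, on some probability space $(\Omega, \mathcal{F}, \mathbb{P})$, $F$ has distribution $\nu$ and $F_t := e^{-t}F + \sqrt{1-e^{-2t}}\,Z$, with $Z$ a standard $d$-dimensional Gaussian vector independent of $F$.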

Introduction

Objective:
- Give an improved form of exponential decay of entropy.
- Apply the Stein discrepancy to derive concentration inequalities.
- Explore the relationship between transport distances and the Stein discrepancy. The WSH inequality, an improvement of Talagrand’s quadratic transportation cost inequality, provides a sharper bound on the Wasserstein distance $W_2$, involving the Stein discrepancy and the relative entropy. Finally, we bound the $W_p$ distance by the Stein discrepancy.

Exponential decay of entropy from HSI

The classical logarithmic Sobolev inequality ensures the exponential decay of the relative entropy

$$H(\nu_t|\gamma) \le e^{-2t}H(\nu|\gamma)$$

along the O-U semigroup, i.e., $d\nu_t = P_t h\,d\gamma$.

Now, applying HSI produces a reinforcement of this exponential decay under the finiteness of the Stein discrepancy.

Corollary
Let $\nu$ have Stein discrepancy $S(\nu|\gamma) = S$. For any $t \ge 0$,

$$H(\nu_t|\gamma) \le \frac{e^{-4t}}{e^{-2t} + \frac{1-e^{-2t}}{S^2}\,H(\nu|\gamma)}\,H(\nu|\gamma) \le \frac{e^{-4t}}{1-e^{-2t}}\,S^2(\nu|\gamma).$$
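
To see how much the corollary gains over the classical decay, the sketch below evaluates both bounds for $\nu = N(0, s^2)$ in dimension one. This example is chosen here for illustration and is not from the slides; it assumes the closed forms $H(N(0,v)|\gamma) = \frac12(v - 1 - \log v)$ and $S^2 = (s^2 - 1)^2$, and the function names are hypothetical.

```python
import math

def entropy_gaussian(var):
    """H(N(0, var) | gamma) in d = 1."""
    return 0.5 * (var - 1.0 - math.log(var))

def entropy_along_ou(var, t):
    """H(nu_t|gamma) for nu = N(0, var): nu_t is Gaussian with variance 1 + e^{-2t}(var - 1)."""
    return entropy_gaussian(1.0 + math.exp(-2.0 * t) * (var - 1.0))

def lsi_decay(var, t):
    """Classical bound e^{-2t} H(nu|gamma)."""
    return math.exp(-2.0 * t) * entropy_gaussian(var)

def hsi_decay(var, t):
    """First bound of the corollary, with S^2 = (var - 1)^2."""
    u = math.exp(-2.0 * t)
    H, S2 = entropy_gaussian(var), (var - 1.0) ** 2
    return u * u * H / (u + (1.0 - u) * H / S2)

for var in (0.25, 4.0):
    for t in (0.1, 0.5, 1.0, 2.0):
        print(f"var={var}, t={t}: H_t={entropy_along_ou(var, t):.5f}"
              f"  HSI decay={hsi_decay(var, t):.5f}  LSI decay={lsi_decay(var, t):.5f}")
```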

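Sketch of proof: Applying the HSI inequality to $\nu_t$, together with the decay of the Stein discrepancy, implies that

$$H(\nu_t|\gamma) \le \frac{e^{-4t}S^2}{2}\log\left(1 + \frac{e^{4t}\,I(\nu_t|\gamma)}{S^2}\right).$$

Set $U(t) = \frac{e^{4t}}{S^2}\,H(\nu_t|\gamma)$. Since $\frac{d}{dt}H(\nu_t|\gamma) = -I(\nu_t|\gamma)$, this gives

$$e^{2U} - 1 - 4U \le -U'.$$

Using $e^{2U} \ge 1 + 2U + 2U^2$, we deduce

$$-2U + 2U^2 \le -U'.$$

Setting $V(t) = e^{-2t}U(t)$, we get $2e^{2t}V^2(t) \le -V'(t)$, so that, after integration,

$$e^{2t} - 1 \le \frac{1}{V(t)} - \frac{1}{V(0)}.$$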

Stein discrepancy and concentration inequalities

For the standard Gaussian measure $\gamma$ and any 1-Lipschitz function $u : \mathbb{R}^d \to \mathbb{R}$ with mean zero,

$$\gamma(u \ge r) \le e^{-r^2/2},$$

or equivalently, $\|u\|_{p,\gamma} := \left(\int_{\mathbb{R}^d} |u|^p\,d\gamma\right)^{1/p} \le C\sqrt{p}$, $p \ge 1$. Now let $\nu$ have Stein kernel $\tau_\nu$. Do we have a similar result for $\|u\|_{p,\nu}$?

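Theorem (Moment bounds and Stein discrepancy)

Let $\nu$ have Stein kernel $\tau_\nu$. There exists a positive constant $C$ such that for any 1-Lipschitz function $u : \mathbb{R}^d \to \mathbb{R}$ with $\int_{\mathbb{R}^d} u\,d\nu = 0$, and every $p \ge 2$,

$$\left(\int_{\mathbb{R}^d}|u|^p\,d\nu\right)^{1/p} \le C\left(S_p(\nu|\gamma) + \sqrt{p}\left(\int_{\mathbb{R}^d}\|\tau_\nu\|_{op}^{p/2}\,d\nu\right)^{1/p}\right).$$

Here, the $p$-Stein discrepancy is given by

$$S_p(\nu|\gamma) = \left(\int_{\mathbb{R}^d}\|\tau_\nu - \mathrm{Id}\|_{HS}^p\,d\nu\right)^{1/p}.$$

Taking into account that $\|\tau_\nu\|_{op} \le 1 + \|\tau_\nu - \mathrm{Id}\|_{HS}$, we have

$$\|u\|_{p,\nu} \le C\left(S_p(\nu|\gamma) + \sqrt{p} + \sqrt{p}\,\sqrt{S_p(\nu|\gamma)}\right).$$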

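An example illustrating the theorem

Consider a centered random variable $X$ on a probability space with values in $\mathbb{R}^d$. Let $X_1, \ldots, X_n$ be independent copies of $X$. Assume that $X$ has law $\nu$ admitting a Stein kernel $\tau_\nu$. Set $T_n = \frac{1}{\sqrt{n}}\sum_{k=1}^n X_k$. A Stein kernel $\tau_{\nu_n}$ of the law $\nu_n$ of $T_n$ is

$$\tau_{\nu_n}(T_n) = \mathbb{E}\left(\frac{1}{n}\sum_{k=1}^n \tau_\nu(X_k) \,\middle|\, T_n\right).$$

Hence,

$$S_p(\nu_n|\gamma) \le \mathbb{E}\left(\left\|\frac{1}{n}\sum_{k=1}^n \left(\tau_\nu(X_k) - \mathrm{Id}\right)\right\|_{HS}^p\right)^{1/p} \le K_p\, n^{-1/2}\, S_p(\nu|\gamma),$$

which follows from Rosenthal’s inequality, with $K_p = O(p)$.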

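For any 1-Lipschitz function $u : \mathbb{R}^d \to \mathbb{R}$ such that $\mathbb{E}(u(T_n)) = 0$, by Theorem 5 (the moment bound), with $S_p = S_p(\nu|\gamma)$,

$$\|u(T_n)\|_{L^p} \le C\sqrt{p}\left(1 + n^{-1/2}\sqrt{p}\,S_p + n^{-1/4}\sqrt{p\,S_p}\right).$$

By Markov’s inequality, one can deduce a concentration inequality for $T_n$. For example, if $S_p = O(p^\alpha)$ for some $\alpha > 0$, then $\|u(T_n)\|_{L^p} \le C\sqrt{p}$ for any $p \le n^{\frac{1}{2\alpha+2}}$. Then

$$\mathbb{P}(|u(T_n)| \ge r) \le \left(\frac{C\sqrt{p}}{r}\right)^p,$$

with $p \sim \frac{r^2}{4C^2}$. Optimizing in $p$ gives

$$\mathbb{P}(u(T_n) \ge r) \le C' e^{-r^2/C'},$$

for all $0 \le r \le r_n$, where $r_n \sim n^{\frac{1}{4\alpha+4}}$.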
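
The role of the choice $p \sim r^2/(4C^2)$ can be made explicit. The following worked step is a sketch added for clarity; the constant $\log 2/(4C^2)$ it produces is only one admissible value for $1/C'$.

```latex
% Markov's inequality applied to |u(T_n)|^p, combined with the moment bound:
\[
\mathbb{P}\big( |u(T_n)| \ge r \big)
  \le \frac{\mathbb{E}\,|u(T_n)|^p}{r^p}
  \le \Big( \frac{C \sqrt{p}}{r} \Big)^p .
\]
% With p = r^2 / (4 C^2) one has C \sqrt{p} / r = 1/2, hence
\[
\Big( \frac{C \sqrt{p}}{r} \Big)^p = 2^{-r^2 / (4 C^2)}
  = \exp\Big( - \frac{\log 2}{4 C^2}\, r^2 \Big).
\]
% This choice is admissible as long as 2 \le p \le n^{1/(2\alpha+2)}, that is,
% for r up to order n^{1/(4\alpha+4)}; small values of r are absorbed in C'.
```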

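Recall the theorem’s conclusion:

$$\left(\int_{\mathbb{R}^d}|u|^p\,d\nu\right)^{1/p} \le C\left(S_p(\nu|\gamma) + \sqrt{p}\left(\int_{\mathbb{R}^d}\|\tau_\nu\|_{op}^{p/2}\,d\nu\right)^{1/p}\right).$$

Sketch of the proof: Set $\phi(t) = \int_{\mathbb{R}^d} (P_t u)^{2q}\,d\nu$. Differentiating along the semigroup and using the definition of the Stein kernel $\tau_\nu$ yields

$$\begin{aligned}
\phi'(t) &= 2q\int_{\mathbb{R}^d} (P_t u)^{2q-1}\,\langle \mathrm{Id} - \tau_\nu, \mathrm{Hess}(P_t u)\rangle_{HS}\,d\nu \\
&\quad - 2q(2q-1)\int_{\mathbb{R}^d} (P_t u)^{2q-2}\,\langle \tau_\nu, \nabla P_t u\otimes\nabla P_t u\rangle_{HS}\,d\nu.
\end{aligned}$$

Using $|\nabla u| \le 1$ and $|\nabla P_t u| \le e^{-t}$, we bound

$$-\phi'(t) \le c_1(t,q)\int_{\mathbb{R}^d} |P_t u|^{2q-1}\,\|\tau_\nu - \mathrm{Id}\|_{HS}\,d\nu + c_2(t,q)\int_{\mathbb{R}^d} |P_t u|^{2q-2}\,\|\tau_\nu\|_{op}\,d\nu.$$

From Young’s inequality, one can deduce

$$-\phi'(t) \le C(t)\,\phi(t) + D(t).$$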

Wasserstein distance and Stein discrepancy

Theorem (Wasserstein distance and Stein discrepancy)
For every centered probability measure $\nu$ on $\mathbb{R}^d$,

$$W_2(\nu,\gamma) \le S(\nu|\gamma).$$

Remark: The measure $\nu$ is not assumed to admit a density w.r.t. Lebesgue measure on $\mathbb{R}^d$.

Sketch of proof:
Step 1: Assume $d\nu = h\,d\gamma$, and let $v_t = \log P_t h$ and $d\nu_t = P_t h\,d\gamma$. Then, from a result of Otto and Villani (2000),

$$\frac{d^+}{dt}\,W_2(\nu,\nu_t) \le \left(\int_{\mathbb{R}^d} |\nabla v_t|^2\,d\nu_t\right)^{1/2}.$$

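Then

$$W_2(\nu,\gamma) \le \int_0^\infty \left(\int_{\mathbb{R}^d} |\nabla v_t|^2\,d\nu_t\right)^{1/2} dt \le S(\nu|\gamma)\int_0^\infty \frac{e^{-2t}}{\sqrt{1-e^{-2t}}}\,dt,$$

and the last integral equals 1.

Step 2: For the general case, we use a regularization procedure. Namely, fix $\varepsilon > 0$ and introduce $F_\varepsilon = e^{-\varepsilon}F + \sqrt{1-e^{-2\varepsilon}}\,Z$, where $F$ and $Z$ are independent with laws $\nu$ and $\gamma$.
- The distribution $\nu_\varepsilon$ of $F_\varepsilon$ admits a smooth density $h_\varepsilon$ w.r.t. $\gamma$.
- $\nu_\varepsilon$ has a Stein kernel $\tau_{\nu_\varepsilon}(x) = \mathbb{E}\left(e^{-2\varepsilon}\tau_\nu(F) + (1-e^{-2\varepsilon})\,\mathrm{Id} \,\middle|\, F_\varepsilon = x\right)$.
- $S(\nu_\varepsilon|\gamma) \le e^{-2\varepsilon}S(\nu|\gamma)$.
- As $\varepsilon \to 0$, $F_\varepsilon \to F$ in $L^2$, so $W_2(\nu_\varepsilon,\gamma) \to W_2(\nu,\gamma)$.

$$W_2(\nu,\gamma) = \lim_{\varepsilon\to 0} W_2(\nu_\varepsilon,\gamma) \le \limsup_{\varepsilon\to 0} S(\nu_\varepsilon|\gamma) \le S(\nu|\gamma).$$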
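
The bound $W_2 \le S$ can be illustrated on a measure that is far from having a smooth density with respect to $\gamma$. The sketch below uses the uniform law on $[-\sqrt3, \sqrt3]$ in dimension one, an example that is not taken from the slides, together with its Stein kernel $\tau(x) = (3 - x^2)/2$. In one dimension $W_2$ is given by the quantile coupling, which is evaluated here with `statistics.NormalDist.inv_cdf`; the grid sizes and function names are illustrative assumptions.

```python
import math
from statistics import NormalDist

A = math.sqrt(3.0)         # uniform law on [-A, A]: centered, unit variance
std_normal = NormalDist()  # standard Gaussian gamma in d = 1

def w2_uniform_vs_gaussian(n=20000):
    """W_2(nu, gamma) from the one-dimensional quantile coupling (midpoint rule in q)."""
    total = 0.0
    for i in range(n):
        q = (i + 0.5) / n
        uniform_quantile = A * (2.0 * q - 1.0)
        total += (uniform_quantile - std_normal.inv_cdf(q)) ** 2
    return math.sqrt(total / n)

def stein_discrepancy_uniform(n=20000):
    """S(nu|gamma) for the Stein kernel tau(x) = (3 - x^2) / 2 of the uniform law."""
    step = 2.0 * A / n
    total = 0.0
    for i in range(n):
        x = -A + (i + 0.5) * step
        total += ((3.0 - x * x) / 2.0 - 1.0) ** 2
    return math.sqrt(total / n)

w2 = w2_uniform_vs_gaussian()
stein = stein_discrepancy_uniform()
print(f"W2 = {w2:.4f}, S = {stein:.4f}")
```

For this example both quantities are explicit: $S^2 = 1/5$ and $W_2^2 = 2 - 2\sqrt{3/\pi}$, so the numerical values can be cross-checked by hand.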

Talagrand inequality and WSH inequality

Talagrand’s quadratic transportation cost inequality bounds the Wasserstein distance by the relative entropy:

$$W_2^2(\nu,\gamma) \le 2H(\nu|\gamma).$$

Applying the HSI inequality produces the following.

Theorem (Gaussian WSH inequality)
Let $d\nu = h\,d\gamma$ be a centered probability measure on $\mathbb{R}^d$ with smooth density $h$ w.r.t. $\gamma$. Assume that $S(\nu|\gamma)$ and $H(\nu|\gamma)$ are positive and finite. Then

$$W_2(\nu,\gamma) \le S(\nu|\gamma)\,\arccos\left(e^{-\frac{H(\nu|\gamma)}{S^2(\nu|\gamma)}}\right).$$
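
Since $\arccos(e^{-r}) \le \sqrt{2r}$, the WSH bound is at least as good as Talagrand’s. The sketch below compares the two on $\nu = N(0, s^2)$ in dimension one, where $W_2(\nu,\gamma) = |s - 1|$, $S = |s^2 - 1|$ and $H = \frac12(s^2 - 1 - \log s^2)$. The example and the function names are assumptions made here for illustration.

```python
import math

def gaussian_quantities(var):
    """(W_2, S, H) for nu = N(0, var) against gamma in d = 1, var != 1."""
    w2 = abs(math.sqrt(var) - 1.0)
    stein = abs(var - 1.0)
    entropy = 0.5 * (var - 1.0 - math.log(var))
    return w2, stein, entropy

def talagrand_bound(entropy):
    """Bound on W_2 from Talagrand's inequality."""
    return math.sqrt(2.0 * entropy)

def wsh_bound(stein, entropy):
    """Bound on W_2 from the WSH inequality."""
    return stein * math.acos(math.exp(-entropy / (stein * stein)))

for var in (0.25, 0.5, 2.0, 4.0):
    w2, stein, entropy = gaussian_quantities(var)
    print(f"var={var}: W2={w2:.4f}  WSH={wsh_bound(stein, entropy):.4f}"
          f"  Talagrand={talagrand_bound(entropy):.4f}")
```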

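Sketch of proof: By the HSI inequality and the decay of the Stein discrepancy,

$$H(\nu_t|\gamma) \le \frac{1}{2}\,S^2(\nu|\gamma)\log\left(1 + \frac{I(\nu_t|\gamma)}{S^2(\nu|\gamma)}\right).$$

Exponentiating both sides,

$$\sqrt{I(\nu_t|\gamma)} \le \frac{I(\nu_t|\gamma)}{S(\nu|\gamma)\sqrt{e^{\frac{2H(\nu_t|\gamma)}{S^2(\nu|\gamma)}} - 1}}.$$

By the result of Otto and Villani (2000), namely that the derivative of the Wasserstein distance is bounded by the square root of the Fisher information,

$$\frac{d^+}{dt}\,W_2(\nu,\nu_t) \le \frac{-\frac{d}{dt}H(\nu_t|\gamma)}{S(\nu|\gamma)\sqrt{e^{\frac{2H(\nu_t|\gamma)}{S^2(\nu|\gamma)}} - 1}} = -\frac{d}{dt}\left(S(\nu|\gamma)\,\arccos\left(e^{-\frac{H(\nu_t|\gamma)}{S^2(\nu|\gamma)}}\right)\right).$$

Integrating between $t = 0$ and $t = \infty$ yields the result.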

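HWI inequality and the comparison with HSI inequality

Otto and Villani (2000) give the HWI inequality, which states that, for all $d\nu = h\,d\gamma$,

$$H(\nu|\gamma) \le W_2(\nu,\gamma)\sqrt{I(\nu|\gamma)} - \frac{1}{2}\,W_2^2(\nu,\gamma).$$

Q: Can we produce an inequality involving $H$, $W_2$, $I$ and $S$? Here is a possible way to carry out the computation, writing $\mathrm{Ent}_\gamma(h) = H(\nu|\gamma)$:

$$\begin{aligned}
\mathrm{Ent}_\gamma(h) &= \int_0^t I_\gamma(P_s h)\,ds + \mathrm{Ent}_\gamma(P_t h) \\
&\le I_\gamma(h)\int_0^u e^{-2s}\,ds + S^2(\nu|\gamma)\int_u^t \frac{e^{-4s}}{1-e^{-2s}}\,ds + \frac{e^{-2t}}{2(1-e^{-2t})}\,W_2^2(\nu,\gamma),
\end{aligned}$$

which follows from the proof idea of the HSI inequality and the reverse Talagrand inequality along the semigroup.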

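Computing the integrals and setting $\alpha = 1 - e^{-2u} \le 1 - e^{-2t} = \beta$,

$$H(\nu|\gamma) \le \frac{1}{2}\inf_{0<\alpha\le\beta\le 1}\Phi(\alpha,\beta),$$

where

$$\Phi(\alpha,\beta) = \alpha\,I(\nu|\gamma) + (\alpha - \log\alpha)\,S^2(\nu|\gamma) + \frac{1-\beta}{\beta}\,W_2^2(\nu,\gamma) + (\log\beta - \beta)\,S^2(\nu|\gamma).$$

When $\alpha = \beta$, HWI is obtained. When $\beta = 1$, HSI is obtained.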
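
Both special cases follow from a one-variable minimization. The worked step below is a sketch using the shorthand $I = I(\nu|\gamma)$, $S = S(\nu|\gamma)$ and $W = W_2(\nu,\gamma)$, which is introduced here.

```latex
% Case \beta = 1: the W-term vanishes and \log\beta - \beta = -1, so
\[
\Phi(\alpha, 1) = \alpha I + (\alpha - 1 - \log\alpha)\, S^2,
\qquad
\inf_{0 < \alpha \le 1} \Phi(\alpha, 1)
  = S^2 \log\Big( 1 + \frac{I}{S^2} \Big)
  \quad \text{at } \alpha = \frac{S^2}{S^2 + I},
\]
% and one half of this quantity is the HSI bound.
%
% Case \alpha = \beta: the two S^2-terms cancel, so
\[
\Phi(\alpha, \alpha) = \alpha I + \frac{1 - \alpha}{\alpha}\, W^2,
\qquad
\inf_{0 < \alpha \le 1} \Phi(\alpha, \alpha) = 2 W \sqrt{I} - W^2
  \quad \text{at } \alpha = \frac{W}{\sqrt{I}},
\]
% and one half of this quantity is the HWI bound W \sqrt{I} - W^2 / 2.
% The minimizer satisfies \alpha \le 1 because W^2 \le 2H \le I
% (Talagrand's inequality followed by the logarithmic Sobolev inequality).
```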

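HWI vs. HSI

Consider the probability measure $d\nu_n(x) = \rho_n(x)\,dx$, where

$$\rho_n(x) = \frac{1}{\sqrt{2\pi}}\left((1-a_n)\,e^{-x^2/2} + n\,a_n\,e^{-n^2x^2/2}\right),$$

$a_n \in [0,1]$ and $a_n = o\left(\frac{1}{\log n}\right)$. A direct computation shows that $H(\nu_n|\gamma) \to 0$. Moreover,

$$I(\nu_n|\gamma) = \int_{\mathbb{R}} \frac{\rho_n'(x)^2}{\rho_n(x)}\,dx - 1 \sim n^2 a_n.$$

$$S^2(\nu_n|\gamma) = \int_{\mathbb{R}} (\tau_n(x) - 1)^2\,\rho_n(x)\,dx \le a_n \to 0.$$

$W_2(\nu_n,\gamma) \le \sqrt{a_n}$. Also, $W_2(\nu_n,\gamma) \ge c\,a_n$ for some constant $c > 0$.

The bound of HWI: $W_2(\nu_n,\gamma)\sqrt{I(\nu_n|\gamma)} - \frac{1}{2}W_2^2(\nu_n,\gamma) \sim n\,a_n^{3/2} \to \infty$.

The bound of HSI: $S^2(\nu_n|\gamma)\log\left(1 + \frac{I(\nu_n|\gamma)}{S^2(\nu_n|\gamma)}\right) \sim 2a_n\log n \to 0$.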

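$W_p$ distance and Stein discrepancy

Proposition ($W_p$ distance and Stein discrepancy)

Let $\nu$ be a centered probability measure on $\mathbb{R}^d$ with Stein kernel $\tau_\nu$, in the sense that $\int_{\mathbb{R}^d} x\,\varphi\,d\nu = \int_{\mathbb{R}^d} \tau_\nu\nabla\varphi\,d\nu$ for every smooth test function $\varphi$. For every $p \ge 1$, set

$$\|\tau_\nu - \mathrm{Id}\|_{p,\nu} = \left(\sum_{i,j=1}^d \int_{\mathbb{R}^d} |\tau_\nu^{ij} - \delta_{ij}|^p\,d\nu\right)^{1/p}.$$

(1) Let $p \in [1,2)$. Then

$$W_p(\nu,\gamma) \le C_p\,d^{1-1/p}\,\|\tau_\nu - \mathrm{Id}\|_{p,\nu}.$$

(2) Let $p \in [2,\infty)$. If $\nu$ has finite moments of order $p$, then

$$W_p(\nu,\gamma) \le C_p\,d^{1-2/p}\,\|\tau_\nu - \mathrm{Id}\|_{p,\nu}.$$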

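Idea of the proof: As usual, write $v_t = \log P_t h$ and $d\nu_t = P_t h\,d\gamma$. A version of $\nabla v_t$, $t > 0$, is given by

$$\begin{aligned}
\nabla v_t(x) &= \frac{e^{-2t}}{\sqrt{1-e^{-2t}}}\,\mathbb{E}\left((\tau_\nu(F) - \mathrm{Id})\,Z \,\middle|\, F_t = x\right) \\
&= \frac{e^{-2t}}{\sqrt{1-e^{-2t}}}\left(\mathbb{E}\left[\sum_{j=1}^d \left(\tau_\nu^{ij}(F) - \delta_{ij}\right)Z_j \,\middle|\, F_t = x\right]\right)_{1\le i\le d},
\end{aligned}$$

where $F$ and $Z$ are independent with laws $\nu$ and $\gamma$ respectively, and $F_t = e^{-t}F + \sqrt{1-e^{-2t}}\,Z$. Moreover,

$$W_p(\nu,\gamma) \le \int_0^\infty \left(\int_{\mathbb{R}^d} |\nabla v_t|^p\,d\nu_t\right)^{1/p} dt.$$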

Bibliography

Ledoux, M., Nourdin, I. and Peccati, G. (2015). Stein’s method, logarithmic Sobolev and transport inequalities. Geometric and Functional Analysis 25(1), 256–306.
