M.Hairer · Advancedstochasticanalysis February29,2016 M.Hairer TheUniversityofWarwick Contents 1...

40
Advanced stochastic analysis February , M. Hairer The University of Warwick Contents Introduction White noise and Wiener chaos The Malliavin derivative and its adjoint Smooth densities Malliavin Calculus for Diffusion Processes Hörmander’s Theorem Hypercontractivity Construction of the Φ 4 2 field Introduction One of the main tools of modern stochastic analysis is Malliavin calculus. In a nutshell, this is a theory providing a way of differentiating random variables defined on a Gaussian probability space (typically Wiener space) with respect to the underlying noise. This allows to develop an “analysis on Wiener space”, an infinite-dimensional generalisation of the usual analytical concepts we are familiar with on R n . (Fourier analysis, Sobolev spaces, etc.) The main goal of this course is to develop this theory with the proof of Hörman- der’s theorem in mind. This was actually the original motivation for the development of the theory and states the following. Consider a stochastic differential equation on R n given by dX j (t ) = V j,0 ( X (t )) dt + m X i =1 V j,i ( X (t )) dW i (t )

Transcript of M.Hairer · Advancedstochasticanalysis February29,2016 M.Hairer TheUniversityofWarwick Contents 1...

Page 1: M.Hairer · Advancedstochasticanalysis February29,2016 M.Hairer TheUniversityofWarwick Contents 1 Introduction 1 2 WhitenoiseandWienerchaos 3 3 TheMalliavinderivativeanditsadjoint

Advanced stochastic analysis

February 29, 2016

M. Hairer

The University of Warwick

Contents

1 Introduction 1

2 White noise and Wiener chaos 3

3 The Malliavin derivative and its adjoint 8

4 Smooth densities 16

5 Malliavin Calculus for Diffusion Processes 18

6 Hörmander’s Theorem 22

7 Hypercontractivity 27

8 Construction of the Φ42 field 34

1 Introduction

One of the main tools of modern stochastic analysis is Malliavin calculus. Ina nutshell, this is a theory providing a way of differentiating random variablesdefined on a Gaussian probability space (typically Wiener space) with respect tothe underlying noise. This allows to develop an “analysis on Wiener space”, aninfinite-dimensional generalisation of the usual analytical concepts we are familiarwith on Rn. (Fourier analysis, Sobolev spaces, etc.)

The main goal of this course is to develop this theory with the proof of Hörman-der’s theorem inmind. This was actually the original motivation for the developmentof the theory and states the following. Consider a stochastic differential equationon Rn given by

dX j(t) = Vj,0(X(t)) dt +m∑

i=1

Vj,i(X(t)) dWi(t)

Page 2: M.Hairer · Advancedstochasticanalysis February29,2016 M.Hairer TheUniversityofWarwick Contents 1 Introduction 1 2 WhitenoiseandWienerchaos 3 3 TheMalliavinderivativeanditsadjoint

Introduction 2

= Vj,0(X) dt +12

m∑i=1

d∑k=1

Vk,i(X)∂kVj,i(X) dt +m∑

i=1

Vj,i(X) dWi(t) ,

where denotes Stratonovich integration. We also write this in the shorthandnotation

dXt = V0(Xt) dt +m∑

i=1

Vi(Xt) dWi(t) , (1.1)

where the Vi are smooth vector fields on Rn with all derivatives bounded. Onemight then ask under what conditions it is the case that the law of Xt has a densitywith respect to Lebesgue measure for t > 0. One clear obstruction is the existenceof a (possibly time-dependent) submanifold of Rn of strictly smaller dimension(say k < n) which is invariant for the solution, at least locally. Indeed, Lebesguemeasure does not charge any such submanifold, thus ruling out that transitionprobabilities are absolutely continuous with respect to it.

If such a submanifold exists, call it say M ⊂ R × Rn, then it must be the casethat the vector fields ∂t − V0 and Vim

i=1 are all tangent to M. This implies inparticular that all Lie brackets between the Vj’s (including j = 0) are tangent toM, so that the vector space spanned by them is of dimension strictly less thann + 1. Since the vector field ∂t − V0 is the only one spanning the “time” direction,we conclude that if such a submanifold exists, then the dimension of the vectorspace V(x) spanned by Vi(x)m

i=1 as well as all the Lie brackets between the Vj’sevaluated at x, is strictly less than n for some values of x.

This suggests the following definition. Define V0 = Vimi=1 and then set

recursively

Vn+1 = Vn ∪ [Vi,V ] : V ∈ Vn, i ≥ 0 , V=⋃n≥0

Vn ,

as well as V(x) = spanV (x) : V ∈ V.Definition 1.1 Given a collection of vector fields as above, we say that it satisfiesthe parabolic Hörmander condition if dim V(x) = n for every x ∈ Rn.

Conversely, Frobenius’s theorem (see for example [Law77]) is a deep theorem inRiemannian geometry which can be interpreted as stating that if dim V(x) = k < nfor all x in some open set OofRn, thenR×O can be foliated into k +1-dimensionalsubmanifolds with the property that ∂t − V0 and Vim

i=1 are all tangent to thisfoliation. This discussion points towards the following theorem.

Theorem 1.2 (Hörmander) Consider (1.1), as well as the vector spaces V(x) ⊂Rn constructed as above. If the parabolic Hörmander condition is satisfied, thenthe transition probabilities for (1.1) have smooth densities with respect to Lebesguemeasure.

Page 3: M.Hairer · Advancedstochasticanalysis February29,2016 M.Hairer TheUniversityofWarwick Contents 1 Introduction 1 2 WhitenoiseandWienerchaos 3 3 TheMalliavinderivativeanditsadjoint

White noise and Wiener chaos 3

The original proof of this result goes back to [Hör67] and relied on purelyanalytical techniques. However, since it has a clear probabilistic interpretation,a more “pathwise” proof of Theorem 1.2 was sought for quite some time. Thebreakthrough came with Malliavin’s seminal work [Mal78], where he laid thefoundations of what is now known as the “Malliavin calculus”, a differentialcalculus in Wiener space, and used it to give a probabilistic proof of Hörmander’stheorem. This new approach proved to be extremely successful and soon anumber of authors studied variants and simplifications of the original proof[Bis81b, Bis81a, KS84, KS85, KS87, Nor86]. Even now, more than three decadesafter Malliavin’s original work, his techniques prove to be sufficiently flexible toobtain related results for a number of extensions of the original problem, includingfor example SDEs with jumps [Tak02, IK06, Cas09, Tak10], infinite-dimensionalsystems [Oco88, BT05, MP06, HM06, HM11], and SDEs driven by Gaussianprocesses other than Brownian motion [BH07, CF10, HP11, CHLT15].

1.1 Original referencesThe material for these lecture notes was taken mostly from the monographs[Nua06, Mal97], as well as from the note [Hai11]. Additional references to someof the original literature can be found at the end.

2 White noise and Wiener chaos

Let H = L2(R+,Rm) (but for the purpose of much of this section, H could be anyseparable Hilbert space), then white noise is a linear isometry W : H→ L2(Ω,P)for some probability space (Ω,P), such that eachW (h) is a centred Gaussian randomvariable. In other words, one has

EW (h) = 0 , EW (h)W (g) = 〈h, g〉 ,and each W (h) is Gaussian. Such a construct can easily be shown to exist.

Indeed, it suffices to take a sequence ξnn≥0 of i.i.d. normal random variablesand an orthonormal basis enn≥0 of H . For h =

∑n≥0 hnen ∈ H , it then suffices to

set W (h) = ∑n≥0 hnξn, with the convergence taking place in L2(Ω,P). Conversely,

given a white noise, it can always be recast in this form by setting ξn = W (en).A white noise determines an m-dimensional Wiener process, which we call

again W , in the following way. Write 1(i)[0,t) for the element of H given by

(1(i)[0,t)) j(s) =1 if s ∈ [0, t) and j = i,0 otherwise, (2.1)

and set Wi(t) = W (1(i)[0,t)). It is then immediate to check that one has indeed

EWi(s)W j(t) = δi j(s ∧ t) ,

Page 4: M.Hairer · Advancedstochasticanalysis February29,2016 M.Hairer TheUniversityofWarwick Contents 1 Introduction 1 2 WhitenoiseandWienerchaos 3 3 TheMalliavinderivativeanditsadjoint

White noise and Wiener chaos 4

so that this is a standard Wiener process. For arbitrary h ∈ H , one then has

W (h) =m∑

i=1

∫ ∞

0hi(s) dWi(s) , (2.2)

with the right hand side being given by the usual Itô integral.Let now Hn denote the nth Hermite polynomial. One way of defining these is

to set H0 = 1 and then recursively by imposing that H′n(x) = nHn−1(x) and that,for n ≥ 1, EHn(X) = 0 for a normal Gaussian random variable X with variance 1.This determines the Hn uniquely, since the first condition determines Hn up to aconstant, with the second condition determining the value of this constant uniquely.The first few Hermite polynomials are given by

H1(x) = x , H2(x) = x2 − 1 , H3(x) = x3 − 3x .

Remark 2.1 Beware that the definition given here differs from the one given in[Nua06] by a factor n!, but coincides with the one given in most other parts ofthe mathematical literature, for example in [Mal97]. In the physical literature,they tend to be defined in the same way, but with X of variance 1/2, so thatthey are orthogonal with respect to the measure with density exp(−x2) rather thanexp(−x2/2).

There is an analogy between expansions in Hermite polynomials and expansionin Fourier series. In this analogy, the factor n! plays the same role as the factor 2πthat appears in Fourier analysis. Just like there, one can shift it around to simplifycertain expressions, but one can never quite get rid of it.

An alternative characterisation of the Hermite polynomials is given by

Hn(x) = exp(−

D2

2

)xn , (2.3)

where D represents differentiation with respect to the variable x. To show that thisis the case, it suffices to verify that EHn(X) = 0 for n ≥ 1 and Hn as in (2.3). Sincethe Fourier transform of xn is cnδ

(n) for some constant cn and since exp(−x2/2) isa fixed point for the Fourier transform, one has for n > 0∫

e−x22 e−

D22 xn dx = cn

∫e−

k22 e

k22 δ(n)(k) dk = cn

∫δ(n)(k) dk = 0 ,

as required.A different recursive relation for the Hn’s is given by

Hn+1(x) = xHn(x) − H′n(x) , n ≥ 0 . (2.4)

Page 5: M.Hairer · Advancedstochasticanalysis February29,2016 M.Hairer TheUniversityofWarwick Contents 1 Introduction 1 2 WhitenoiseandWienerchaos 3 3 TheMalliavinderivativeanditsadjoint

White noise and Wiener chaos 5

To show that (2.4) holds, it suffices to note that

[ f (D), x] = f ′(D) ,so that indeed

Hn+1(x) = exp(−

D2

2

)x xn = xHn(x) +

[exp

(−

D2

2

), x

]xn

= xHn(x) − D exp(−

D2

2

)xn = xHn(x) − H′n(x) .

Combining both recursive characterisations of Hn, we obtain for n,m ≥ 0 theidentity∫

Hn(x)Hm(x)e−x2/2 dx =1

n + 1

∫H′n+1(x)Hm(x)e−x2/2 dx

=1

n + 1

∫Hn+1(x)xHm(x) − H′m(x)

e−x2/2 dx

=1

n + 1

∫Hn+1(x)Hm+1e−x2/2 dx .

Combining this with the fact that EHn(X) = 0 for n ≥ 1 and EH20 (X) = 1, we

immediately obtain the identity

EHn(X)Hm(X) = n!δn,m .

Fix now an orthonormal basis eii∈N of H . For every multiindex k, which weview as a function k : N→ N such that all but finitely many values vanish, we thenset

Φkdef=

∏i∈N

Hki (W (ei)) . (2.5)

It follows immediately from the above that

EΦkΦ` = k!δk,` , k! =∏

i

ki! . (2.6)

Write now H⊗sn for the subspace of H⊗n consisting of symmetric tensors. There isa natural projection Π : H⊗n → H⊗sn given as follows. For any permutation σ of1, . . . , n write Πσ : H⊗n → H⊗n for the linear map given by

Πσh1 ⊗ . . . ⊗ hn

= hσ(1) ⊗ . . . ⊗ hσ(n) .

We then set Π = 1n!

∑σ Πσ, where the sum runs over all permutations.

Page 6: M.Hairer · Advancedstochasticanalysis February29,2016 M.Hairer TheUniversityofWarwick Contents 1 Introduction 1 2 WhitenoiseandWienerchaos 3 3 TheMalliavinderivativeanditsadjoint

White noise and Wiener chaos 6

Writing |k | = ∑i ki, we set ek = Π

⊗i e⊗ki

i , which is an element of H⊗s |k |.Note that the vectors ek are not orthonormal, but that instead one has

〈ek, e`〉 = k!|k |!δk,` .

Comparing this to (2.6), we conclude that the maps

In : ek 7→1√

n!Φk , |k | = n , (2.7)

yield, for every n ≥ 0, an isometry between H⊗sn and some closed subspace Hnof L2(Ω,P). This space is called the nth homogeneous Wiener chaos after theterminology of the original article [Wie38] by Norbert Wiener where a constructionsimilar to this was first introduced, but with quite a different motivation. As amatter of fact, Wiener’s construction was based on the usual monomials instead ofHermite polynomials and, as a consequence, the analogues of the maps In in hiscontext were not isometries. The first construction equivalent to the one presentedhere was given almost two decades later by Irving Segal [Seg56], motivated in partby constructive quantum field theory.

We now show that the isomorphisms In are canonical, i.e. they do not dependon the choice of basis ei. For this, it suffices to show that for any h ∈ H with‖h‖ = 1 one has In(h⊗n) = Hn(W (h)). The main ingredient for this is the followinglemma.

Lemma 2.2 Let x, y ∈ R and a, b with a2 + b2 = 1. Then, one has the identity

Hn(ax + by) =n∑

k=0

(nk

)ak bn−k Hk(x)Hn−k(y) .

Proof. As a consequence of (2.3) we have

Hn(ax + by) = exp(−

D2x

2a2

)(ax + by)n = exp(−

D2x

2

)exp

(−

b2D2x

2a2

)(ax + by)n .

Noting that G(Dx/a)(ax + by)n = G(Dy/b)(ax + by)n, we conclude that

Hn(ax + by) = exp(−

D2x

2

)exp

(−

D2y

2

)(ax + by)n .

Applying the binomial theorem to (ax + by)n concludes the proof.Applying this lemma repeatedly and taking limits, we have

Page 7: M.Hairer · Advancedstochasticanalysis February29,2016 M.Hairer TheUniversityofWarwick Contents 1 Introduction 1 2 WhitenoiseandWienerchaos 3 3 TheMalliavinderivativeanditsadjoint

White noise and Wiener chaos 7

Corollary 2.3 Let a ∈ `2 with∑

a2i = 1 and let xii∈N be such that

∑i ai xi

converges. Then, one has the identity

Hn(∑

i∈N

ai xi)=

∑|k |=n

n!k!

ak∏i∈N

Hki (xi) ,

where ak def=

∏i aki

i . In particular, In(h⊗n) = Hn(W (h)) independently of the choiceof basis used in the definition of In.

It turns out that, as long as Ω contains no other source of randomness thangenerated by the white noise W , then the spaces Hn span all of L2(Ω,P). Moreprecisely, denote byFtheσ-algebra generated byW , namely the smallestσ-algebrasuch that the random variables W (h) are F-measurable for every h ∈ H . Then, onehas

Theorem 2.4 In the above context, one has

L2(Ω,F,P) =⊕n≥0

Hn .

Proof. Denote by HN ⊂ H the subspace generated by ekk≤N , by FN the σ-algebra generated by W (h) : h ∈ HN, and assume that one has

L2(Ω,FN,P) =⊕n≥0

H(N)n , (2.8)

where H(N)n ⊂ Hn is the image of H⊗sn

N under In. Let now X ∈ L2(Ω,F,P) and setXN = E(X |FN ). By Doob’s martingale convergence theorem, one has XN → Xin L2, thus showing by (2.8) that

⊕n≥0 Hn is dense in L2(Ω,F,P) as claimed.

To show (2.8), it only remains to show that if X ∈ L2(Ω,FN,P) satisfiesEXY = 0 for every Y ∈ H

(N)n and every n ≥ 0, then X = 0. We can write

X(ω) = f (W (e1), . . . ,W (eN )) (almost surely) for some square integrable functionf : RN → R. By assumption, one then has

∫f (x) P(x) µ(dx) = 0 for every

polynomial P and for µ the standard Gaussian measure on RN . Since µ hassubexponential tails, it follows from a straightforward approximation argument that∫

f (x) eik x µ(dx) = 0 for every k ∈ RN , whence the claim follows.1

Let us now show what a typical element of Hn looks like. We have already seenthat H0 only contains constants and H1 contains precisely all random variablesof the form (2.2). Write ∆n ⊂ Rn

+ for the cone consisting of points s with0 < s1 < · · · < sn. We then have

1The fact that µ integrates exponentials is crucial here. If this condition is dropped, then thereare counterexamples to the claim that polynomials are dense in L2(µ).

Page 8: M.Hairer · Advancedstochasticanalysis February29,2016 M.Hairer TheUniversityofWarwick Contents 1 Introduction 1 2 WhitenoiseandWienerchaos 3 3 TheMalliavinderivativeanditsadjoint

The Malliavin derivative and its adjoint 8

Lemma 2.5 For n ≥ 1, the space Hn consists of all random variables of the form

In(h) =∑

j1··· jn

∫ ∞

0

∫ sn

0· · ·

∫ s2

0h j1,..., jn (s1, . . . , sn) dW j1(s1) · · · dW jn (sn) ,

with h ∈ L2(∆n,Rmn ).Remark 2.6 We implicitly assume that the indices of h are contracted with thoseof the W ’s in the only “reasonable” way.

Proof. We identify L2(∆n,Rmn ) with a subspace of H⊗n and define the symmetri-sation Π : H⊗n → H⊗n as before. The map

√n!Π is then an isometry between

L2(∆n,Rmn ) and H⊗sn. Setting h =√

n!Π h, we claim that In(h) = In(h), fromwhich the lemma then follows.

Since linear combinations of such elements are dense in L2(∆n), it suffices tohake h of the form

h j1,..., jn (s) = g(1)j1(s1) · · · g(n)jn

(sn) ,where the functions g(i) ∈ H satisfy ‖g(i)‖ = 1 and have the property thatsup supp g(i) < inf supp g(i) for i < j. It then follows from the properties of thesupports and standard properties of Itô integration that

In(h) =n∏

i=1

W (g(i)) .

Since the functions g(i) have disjoint supports and are therefore all orthogonal in H ,we can view them as the first n elements of an orthonormal basis of H . The claimnow follows immediately from the definition (2.7) of In.

3 The Malliavin derivative and its adjoint

One of the goals ofMalliavin calculus is tomake precise the notion of “differentiationwith respect to white noise”. Let us formally write ξi(t) = dWi

dt , which actuallymakes sense as a random distribution. Then, any random variable X measurablewith respect to the filtration generated by the W (h)’s can be viewed as a function ofthe ξi’s.

At the intuitive level, one would like to introduce operators D (i)t which take the

derivative of a random variable with respect to ξi(t). What would natural propertiesof such operators be? On the one hand, one would certainly like to have

D (i)t W (h) = hi(t) , (3.1)

Page 9: M.Hairer · Advancedstochasticanalysis February29,2016 M.Hairer TheUniversityofWarwick Contents 1 Introduction 1 2 WhitenoiseandWienerchaos 3 3 TheMalliavinderivativeanditsadjoint

The Malliavin derivative and its adjoint 9

since, at least formally, one has

W (h) =m∑

i=1

∫ ∞

0hi(t)ξi(t) dt .

On the other hand, one would like these operators to satisfy the chain rule, sinceotherwise they could hardly claim to be “derivatives”:

D (i)t F(X1, . . . , Xn) =

n∑k=1

∂k F(X1, . . . , Xn)D (i)t Xk . (3.2)

Finally, when viewed as a function of t (and of the index i), the right hand side of(3.1) belongs to H , and this property is preserved by the chain rule. It is thereforenatural to ask for an operator D that takes as an argument a sufficiently “nice”random variable and returns an H-valued random variable, such that DW (h) = hand such that (3.2) holds.

Let now W⊂ L2(Ω,P) denote the set of all random variables X such that thereexists N ≥ 0, a function F : RN → R which, together with its derivatives, grows atmost polynomially at infinity, and elements hi ∈ H such that

X = F(W (h1), . . . ,W (hN )) . (3.3)

Given such a random variable, we define an H-valued random variable D X by

D X =N∑

k=1

∂k F(W (h1), . . . ,W (hN )) hk . (3.4)

One can show that D X is well-defined, i.e. does not depend on the choice ofrepresentation (3.3). Indeed, for h ∈ H , one can characterise 〈h,D X〉 as the limitin probability, as ε → 0, of ε−1(τεh X − X), where the translation operator τ isgiven by

τh X

(W ) = X(W +

∫ ·

0h(s) ds

).

This in turn does not depend on the representative of X in L2 since τ∗εhP is equivalent

to P for every h ∈ H as a consequence of the Cameron-Martin theorem, see forexample [Bog98]. Since W∩Hn is dense in Hn for every n, we conclude that Wis dense in L2(Ω,P), so that D is a densely defined unbounded linear operator onL2(Ω,P).

One very important tool in Malliavin calculus is the following integration byparts formula.

Page 10: M.Hairer · Advancedstochasticanalysis February29,2016 M.Hairer TheUniversityofWarwick Contents 1 Introduction 1 2 WhitenoiseandWienerchaos 3 3 TheMalliavinderivativeanditsadjoint

The Malliavin derivative and its adjoint 10

Proposition 3.1 For every X ∈ Wand h ∈ H , one has the identity

E〈D X, h〉 = EXW (h) .

Proof. By Grahm-Schmidt, we can assume that X is of the form (3.3) with the hiorthonormal. One then has

E〈D X, h〉 =N∑

k=1

E∂k F(W (h1), . . . ,W (hN )) 〈hk, h〉

=

N∑k=1

〈hk, h〉(2π)N/2

∫RN

e−|x |2/2∂k F(x1, . . . xk) dx

=

N∑k=1

〈hk, h〉(2π)N/2

∫RN

e−|x |2/2F(x1, . . . xk)xk dx

=

N∑k=1

EXW (hk)〈hk, h〉 = E

XW (h) .

To obtain the last identity, we used the fact that h =∑∞

k=1〈hk, h〉hk for an orthonor-mal basis hk, together with the fact that W (hk) is of mean 0 and independent ofX for every k > N .

Corollary 3.2 For every X,Y ∈ Wand every h ∈ H , one has

EY 〈D X, h〉 = E

XYW (h) − X〈DY, h〉 . (3.5)

Proof. Note that XY ∈ Wand that Leibniz’s rule holds.

As a consequence of the integration by parts formula (3.5), we can show thatthe operator D is closable, which guarantees that it is “well-behaved” from afunctional-analytic point of view.

Proposition 3.3 The operator D is closable. In other words if, for some sequenceXn ∈ W, one has Xn → 0 in L2(Ω,P) and D Xn → Y in L2(Ω,P, H), then Y = 0.

Proof. Let Xn be as in the statement of the proposition and let Z ∈ W, so that inparticular both Z and D Z have moments of all orders. It then immediately followsfrom (3.5) that on has

EZ〈Y, h〉 = lim

n→∞E

Xn ZW (h) − Xn〈D Z, h〉 = 0 .

If Y were non-vanishing, there would exists h such that the real-valued randomvariable 〈Y, h〉 is not identically 0. Since W is dense in L2, this would entail theexistence of some Z ∈ W such that E

Z〈Y, h〉 , 0, yielding a contradiction.

Page 11: M.Hairer · Advancedstochasticanalysis February29,2016 M.Hairer TheUniversityofWarwick Contents 1 Introduction 1 2 WhitenoiseandWienerchaos 3 3 TheMalliavinderivativeanditsadjoint

The Malliavin derivative and its adjoint 11

We henceforth denote by W1,2 the domain of the closure of D (namely thoserandom variables X such that there exists Xn ∈ Wwith Xn → X in L2 and suchthat D Xn converges to some limit D X) and we do not distinguish between D andits closure. We also follow [Nua06] in denoting the adjoint of D by δ. One canof course apply the Malliavin differentiation operator repeatedly, thus yielding anunbounded closed operator D k from L2(Ω,P) to L2(Ω,P, H⊗k). We denote thedomain of this operator by Wk,2. Actually, the exact same proof shows that powersof D are closable as unbounded operators from Lp(Ω,P) to Lp(Ω,P, H⊗k) for everyp ≥ 1. We denote the domain of these operators by Wk,p. Furthermore, for anyHilbert space K , we denote by Wk,p(K) the domain of D k viewed as an operatorfrom Lp(Ω,P, K) to Lp(Ω,P, H⊗k ⊗ K). We call a random variable belonging toWk,p for every k, p ≥ 1 “Malliavin smooth” and we write S=

⋂k,p Wk,p as well

as S(K) = ⋂k,p Wk,p(K). The Malliavin smooth random variables play a role

analogous to that of Schwartz test functions in finite-dimensional analysis.

Remark 3.4 As an immediate consequence of Hölder’s inequality and the Leibnizrule, S is an algebra.

Let us now try to get some feeling for the domain of δ. Recall that, by definitionof the adjoint, the domain of δ is given by those elements u ∈ L2(Ω,P, H) such thatthere exists Y ∈ L2(Ω,P) satisfying the identity

E〈u,D X〉 = E(Y X) ,for every X ∈ W. One then writes Y = δu. Interestingly, it turns out that theoperator δ is an extension of Itô integration! It is therefore also called the Skorokhodintegration operator and, instead of just writing δu, one often writes instead∫ ∞

0u(t) δW (t) .

We now proceed to showing that it is indeed the case that, if u is a square integrablestochastic process that is adapted to the filtration generated by the increments of theunderlying Wiener process W , then u belongs to the domain of δ and δu coincideswith the usual Itô integral of u against W .

To formulate this more precisely, denote by Ft the σ-algebra generated by therandom variables W (h) with supp h ⊂ [0, t]. Consider then the set of elementaryadapted processes, which consist of all processes of the form

u =N∑

k=1

Y (i)k 1(i)[sk,tk ) , (3.6)

for some N ≥ 1, some times sk, tk with 0 ≤ sk < tk < ∞, and some randomvariables Y (i)

k ∈ L2(Ω,Fsk ,P). Summation over i is also implied. We denote

Page 12: M.Hairer · Advancedstochasticanalysis February29,2016 M.Hairer TheUniversityofWarwick Contents 1 Introduction 1 2 WhitenoiseandWienerchaos 3 3 TheMalliavinderivativeanditsadjoint

The Malliavin derivative and its adjoint 12

by L2a(Ω,P, H) ⊂ L2(Ω,P, H) the closure of this set. Recall then that, for an

elementary adapted process of the type (3.6), its Itô integral is given by∫ ∞

0u(t) dW (t) def

=

N∑k=1

Y (i)k

Wi(tk) −Wi(sk) =

N∑k=1

Y (i)k W

1(i)[sk,tk )

. (3.7)

Using Itô’s isometry, this can then be extended to all of L2a.

Theorem 3.5 The space L2a(Ω,P, H) is included in the domain of δ and, on it, δ

coincides with the Itô integration operator.

Proof. Let u be an elementary adapted process of the form (3.6) with each Y (i)k in

W. For X ∈ Wone then has, as a consequence of (3.5),

E〈u,D X〉 =

N∑k=1

EY (i)

k 〈1(i)[sk,tk ),D X〉 (3.8)

=

N∑k=1

E(Y (i)

k X W (1(i)[sk,tk )) − X〈DY (i)k , 1(i)[sk,tk )〉

).

At this stage, we note that since DY (i)k is Fsk -measurable by assumption, it has

a representation of the type (3.3) with each h j satisfying supp h j ∈ [0, sk]. Inparticular, one has 〈h j, 1(i)[sk,tk )〉 = 0 so that, by (3.4), one has 〈DY (i)

k , 1(i)[sk,tk )〉 = 0.Combining this with the above identity and (3.7), we conclude that

E〈u,D X〉 = E

(X

∫ ∞

0u(t) dW (t)) .

Taking limits on both sides of this identity, we conclude that it holds for everyu ∈ L2

a, thus completing the proof.

One also has the following extension of Itô’s isometry.

Theorem 3.6 The space W1,2(H) is included in the domain of δ and, on it, theidentity

E|δu|2 = E∫ ∞

0|u(t)|2 dt + E

∫ ∞

0

∫ ∞

0D (i)

s u j(t)D ( j)t ui(s) ds dt

holds, with summation over repeated indices implied.

Page 13: M.Hairer · Advancedstochasticanalysis February29,2016 M.Hairer TheUniversityofWarwick Contents 1 Introduction 1 2 WhitenoiseandWienerchaos 3 3 TheMalliavinderivativeanditsadjoint

The Malliavin derivative and its adjoint 13

Proof. Consider similarly to before u to be a process of the form u =∑N

i=1 Y (i)h(i)with Y (i) ∈ Wand h(i) ∈ H . It then follows from the same calculation as (3.8) that

δu = Y (i)W (h(i)) − 〈DY (i), h(i)〉 , (3.9)

with summation over i implied, so that

Dhδu = DhY (i) W (h(i))+Y (i)〈h, h(i)〉− 〈D2Y (i), h ⊗ h(i)〉 = δDhu+ 〈h, u〉 . (3.10)(This is nothing but an instance of the “canonical commutation relations” appearingin quantum mechanics.) Integrating by parts, applying (3.10), and then integratingby parts again it follows that

E|δu|2 = E〈u,Dδu〉 = E〈u, u〉 + EY (i)δ(Dh(i)Y( j)h( j))

= E〈u, u〉 + EDh( j)Y(i)Dh(i)Y

( j) ,

with summation over i and j implied. Noting that

Dh( j)Y(i) Dh(i)Y

( j) =∫ ∞

0

∫ ∞

0

D (k)

s Y (i) h( j)k (s)

D (`)t Y ( j) h(i)

`(t) ds dt

=

∫ ∞

0

∫ ∞

0

D (k)

s Y (i) h(i)`(t)

D (`)t Y ( j)h( j)

k (s) ds dt

=

∫ ∞

0

∫ ∞

0D (k)

s u`(t)D (`)t uk(s) ds dt ,

and using the density of the class of processes we considerd concludes the proof.

In a similar way, we can give a very nice characterisation of the “Ornstein-Uhlenbeck operator” δD :

Proposition 3.7 The spaces Hn are invariant for ∆ = δD and one has ∆X = nXfor every X ∈ Hn.

Proof. Fix an orthonormal basis ek of H. Then, by definition, the randomvariables Φk as in (2.5) with |k | = n are dense in Hn. Recalling that DHk(W (h)) =kHk−1(W (h))h, one has

DΦk =∑

i

kiΦk−δi ei ,

where δi is given by δi( j) = δi, j . We now recall that, as in (3.8), one has the identity

δ(X h) = X W (h) − 〈D X, h〉 ,

Page 14: M.Hairer · Advancedstochasticanalysis February29,2016 M.Hairer TheUniversityofWarwick Contents 1 Introduction 1 2 WhitenoiseandWienerchaos 3 3 TheMalliavinderivativeanditsadjoint

The Malliavin derivative and its adjoint 14

for every X ∈ Wand h ∈ H , so that

∆Φk =∑

i

kiΦk−δiW (ei) −∑i, j

ki(k j − δi, j)Φk−δi−δ j 〈ei, e j〉

=∑

i

kiΦk−δiW (ei) − (ki − 1)Φk−2δi

.

Recall now that, by (2.4), one has

Hki−1(x)x − (ki − 1)Hki−2(x) = Hki (x) ,so that one does indeed obtain ∆Φk =

∑i kiΦk = nΦk as claimed.

An important remark to keep in mind is that while δ is an extension of Itôintegration it is not the only such extension, and not even the only “reasonable” one.Actually, one may argue that it is not a “reasonable” extension of Itô’s integral atall since, for a generic random variable X , one has in general∫ ∞

0X u(t) δW (t) , X

∫ ∞

0u(t) δW (t) .

Also, if one considers a one-parameter family of stochastic processes a 7→ u(a, ·)and sets G(a) = ∫ ∞

0 u(a, t) δW (t), then in general one has

G(X) ,∫ ∞

0u(X, t) δW (t) ,

if X is a random variable.It will be useful in the sequel to be able to have a formula for the Malliavin

derivative of a random variable that is already given as a stochastic integral.Consider a random variable X of the type

X =∫ ∞

0u(t) dW (t) , (3.11)

with u ∈ L2a is sufficiently “nice”. At a formal level, one would then expect to have

the identity

D (i)s X = ui(s) +

∫ ∞

0D (i)

s u(t) dW (t) . (3.12)

This is indeed the case, as the following proposition shows.

Proposition 3.8 Let u ∈ L2a(Ω,P, H) be such that ui(t) ∈ W1,2 for almost every t

and∫ ∞0 E‖Dui(t)‖2 dt < ∞. Then (3.12) holds.

Page 15: M.Hairer · Advancedstochasticanalysis February29,2016 M.Hairer TheUniversityofWarwick Contents 1 Introduction 1 2 WhitenoiseandWienerchaos 3 3 TheMalliavinderivativeanditsadjoint

The Malliavin derivative and its adjoint 15

Proof. Take u of the form (3.6) with each Y (i)k in W, so that X =

∑Y (i)

k W1(i)[sk,tk )

.

It then follows from the chain rule that

D X =N∑

k=1

Y (i)

k 1(i)[sk,tk ) +DY (i)k W

1(i)[sk,tk )

= u +

∫ ∞

0Du(t) dW (t) ,

and the claim follows from a simple approximation argument, combined with thefact that D is closed.

Finally, we will use the important fact that the divergence operator δ maps Sinto S. This is a consequence of the following result.

Proposition 3.9 For every p ≥ 2 there exist constants k and C such that, for everyseparable Hilbert space K and every u ∈ S(H ⊗ K), one has the bound

E|δu|p ≤ C∑

0≤`≤k

E|D`u|2p1/2

.

Proof. For p ∈ [1, 2], the bound follows immediately from Theorem 3.6 andJensen’s inequality. Take now p > 2. Using the definition of δ combined with thechain rule for D , Proposition 3.8, and Young’s inequality, we obtain the bound

E|δu|p = (p − 1)E|δu|p−2〈u,Dδu〉 = (p − 1)E|δu|p−2|u|2 + 〈u, δDu〉

≤12E|δu|p + cE

|u|p + |u|p/2 |δDu|p/2,

for some constant c. We now use Hölder’s inequality which yields

E|u|p/2 |δDu|p/2

≤E|u|2p1/4

E|δDu|2p/33/4.

Combining this with the above, we conclude that there exists a constant C such that

E|δu|p ≤ CE|D`u|2p1/2

+E|δDu|2p/33/2

.

The proof is concluded by a simple inductive argument.

Corollary 3.10 The operator δ maps S(H ⊗ K) into S(K).Proof. In order to estimate E|D kδu|p, it suffices to first apply Proposition 3.8 ktimes and then Proposition 3.9.

Remark 3.11 The above argument is very far from being sharp. Actually, it ispossible to show that δ maps Wk,p into Wk−1,p for every p ≥ 2 and every k ≥ 1.This however requires a much more delicate argument.

Page 16: M.Hairer · Advancedstochasticanalysis February29,2016 M.Hairer TheUniversityofWarwick Contents 1 Introduction 1 2 WhitenoiseandWienerchaos 3 3 TheMalliavinderivativeanditsadjoint

Smooth densities 16

4 Smooth densities

In this section, we give sufficient conditions for the law of a random variable X tohave a smooth density with respect to Lebesgue measure. The main ingredient forthis is the following simple lemma.

Lemma 4.1 Let X be anRn-valued random variable for which there exist constantsCk such that |ED(k)G(X)| ≤ Ck ‖G‖∞ for every G ∈ C∞0 and k ≥ 1. Then the lawof X has a smooth density with respect to Lebesgue measure.

Proof. Denoting by µ the law of X n, our assumption can be rewritten as

∫Rn

D(k)G(x) µ(dx) ≤ Ck ‖G‖∞ . (4.1)

Let now s > n/2 so that ‖G‖∞ . ‖G‖H s by Sobolev embedding. By duality andthe density of C∞0 in H s, the assumption then implies that every distributionalderivative of µ belongs to the Sobolev space H−s so that, as a distribution, µbelongs to H` for every ` ∈ R. The result then follows from the fact that H` ⊂ Ck

as soon as ` > k + n2 .

Remark 4.2 If the bound (4.1) only holds for k = 1, then it is still the case that thelaw of X has a density with respect to Lebesgue measure.

The idea now is to make repeated use of the integration by parts formula (3.5)in order to control the expectation of D(k)G(X). Consider first the case k = 1 andwrite 〈u, DG〉 for the directional derivative of G in the direction u ∈ Rn. Ideally,for every i ∈ 1, . . . , n, we would like to find an H-valued random variable Yiindependent of G such that

∂iG(X) = 〈DG(X),Yi〉 , (4.2)

where the second scalar product is taken in H , so that one has

E∂iG(X) = EG(X) δYi

.

If Yi can be chosen in such a way that E|δYi | < ∞ for every i, then the bound(4.1) for k = 1 follows. Since DG(X) = ∑

j ∂jG(X)D X j as a consequence of thechain rule, a random variable Yi as in (4.2) can be found only if D X , viewed as arandom linear map from H to Rn, is almost surely surjective. This suggests thatan important condition will be that of the invertibility of theMalliavin matrix Mdefined by

Mi j = 〈D Xi,D X j〉 , (4.3)

Page 17: M.Hairer · Advancedstochasticanalysis February29,2016 M.Hairer TheUniversityofWarwick Contents 1 Introduction 1 2 WhitenoiseandWienerchaos 3 3 TheMalliavinderivativeanditsadjoint

Smooth densities 17

where the scalar product is taken in H . Assuming that M is invertible, the solutionwith minimal H-norm to the overdetermined system δi = 〈D X,Yi〉 (where δidenotes the ith canonical basis vector in Rn) is given by

Yi = (D X)∗M−1δi .

Assuming a sufficient amount of regularity, this shows that a bound of the typeappearing in the assumption of Lemma 4.1 holds for a random variable X whoseMalliavin matrix M is invertible and whose inverse has a finite moment ofsufficiently high order. The following theorem should therefore not come as asurprise.

Theorem 4.3 Let X be a Malliavin smooth Rn-valued random variable such thatthe Malliavin matrix defined in (4.3) is almost surely invertible and has inversemoments of all orders. Then the law of X has a smooth density with respect toLebesgue measure.

The main ingredient of the proof of this theorem is the following lemma.

Lemma 4.4 Let X be as above and let Z ∈ S. Then, there exists Z ∈ S such thatthe identity

EZ∂iG(X) = E

G(X)Z

, (4.4)

holds for every G ∈ C∞0 .

Proof. Following the calculation above, defining the H-valued random variable Yiby

Yi =

n∑j=1

(D X j)M−1ji ,

we have the identity ∂iG(X) = 〈DG(X),Yi〉. As a consequence, (4.4) holds withZ = δ

Z Yi

.

The claim now follows fromRemark 3.4 and Proposition 3.9, as soon as we can showthat M−1ji ∈ S. This however follows from the chain rule for D and Remark 3.4,since the former shows that D kM−1ji can be written as a polynomial in M−1ji andD`X for ` ≤ k.

Proof of Theorem 4.3. By Lemma 4.1 it suffices to show that under the assumptionsof the theorem, for every multiindex k and every random variable Y ∈ S, thereexists a random variable Z ∈ S such that

EY DkG(X) = E

G(X)Z

. (4.5)

Page 18: M.Hairer · Advancedstochasticanalysis February29,2016 M.Hairer TheUniversityofWarwick Contents 1 Introduction 1 2 WhitenoiseandWienerchaos 3 3 TheMalliavinderivativeanditsadjoint

Malliavin Calculus for Diffusion Processes 18

We proceed by induction, the claim being trivial for k = 0. Assuming that (4.5)holds for some k, we then obtain as a consequence of Lemma 4.4 that

EY Dk∂iG(X) = E

∂iG(X)Z

= EG(X)Z

,

for some Z ∈ S, which is precisely the required bound (4.5), but for k + ei.

5 Malliavin Calculus for Diffusion Processes

We are now in possession of all the abstract tools required to tackle the proof ofHörmander’s theorem. Before we start however, we discuss how Ds Xt can becomputed when Xt is the solution to an SDE of the type (1.1). Recall first that, bydefinition, (1.1) is equivalent to the Itô stochastic differential equation

dXt = V0(Xt) dt +m∑

i=1

Vi(Xt) dWi(t) , (5.1)

with V0 given in coordinates by

(V0)i(x) = (V0)i(x) + 12(∂kVj)i(x)(Vj)k(x) ,

with summation over repeated indices implied. We assume that Vj ∈ C∞b , the spaceof smooth vector fields that are bounded, together with all of their derivatives. Itimmediately follows that, for every initial condition x0 ∈ Rn, (5.1) can be solvedby simple Picard iteration, just like ordinary differential equations, but in the spaceof adapted square integrable processes.

An important tool for our analysis will be the linearisation of (1.1) with respectto its initial condition. This is obtained by simply formally differentiating bothsides of (1.1) with respect to the initial conditions x0. For any s ≥ 0, this yields thenon-autonomous linear equation

dJs,t = DV0(Xt) Js,t dt +m∑

i=1

DVi(Xt) Js,t dWi(t) , Js,s = id , (5.2)

where id denotes the n × n identity matrix. This in turn is equivalent to theStratonovich equation

dJs,t = DV0(Xt) Js,t dt +m∑

i=1

DVi(Xt) Js,t dWi(t) , Js,s = id . (5.3)

Page 19: M.Hairer · Advancedstochasticanalysis February29,2016 M.Hairer TheUniversityofWarwick Contents 1 Introduction 1 2 WhitenoiseandWienerchaos 3 3 TheMalliavinderivativeanditsadjoint

Malliavin Calculus for Diffusion Processes 19

Higher order derivatives J(k)0,t with respect to the initial condition can be definedsimilarly. It is straightforward to verify that this equation admits a unique solution,and that this solution satisfies the identity

Jt,u Js,t = Js,u , (5.4)

for any three times s ≤ t ≤ u.Under our standing assumptions for SDEs, theJacobian has moments of all orders:

Proposition 5.1 If Vi ∈ C∞b for all i, then sups,t≤T E|Js,t |p < ∞ for every T > 0and every p ≥ 1.

Proof. We write |A| for the Frobenius norm of a matrix A. A tedious applicationof Itô’s formula shows that for p ≥ 4 one has

d |Js,t |p = p|Js,t |p−2(〈Js,t, DV0(Xt) Js,t〉 dt +

m∑i=1

〈Js,t, DVi(Xt) Js,t〉 dWi(t))

+p2|Js,t |p−4

m∑i=1

((p − 2)〈DVi(Xt)Js,t, Js,t〉2 + |Js,t |2(tr DVi(Xt)Js,t)2)

dt .

Writing this in integral form, taking expectations on both sides and using theboundedness of the derivatives of the vector fields, we conclude that there exists aconstant C such that

E|Js,t |p ≤ np/2 + C∫ t

sE|Js,r |p dr ,

so that the claim now follows from Gronwall’s lemma. (The np/2comes from theinitial condition, which equals |id|p = np/2.)

As a consequence of (5.4), one has Js,t = J0,t J−10,s . One can also verify that theinverse J−10,t of the Jacobian solves the SDE

dJ−10,t = −J−10,t DV0(Xt) dt −m∑

i=1

J−10,t DVi(Xt) dWi . (5.5)

This follows from the chain rule by noting that if we denote by Ψ(A) = A−1 the mapthat takes the inverse of a square matrix, then we have DΨ(A)H = −A−1H A−1.

On the other hand, we can use (3.12) to, at least on a formal level, take theMalliavin derivative of the integral form of (5.1), which then yields for r ≤ t theidentity

D ( j)r X(t) =

∫ t

rDV0(Xs)D ( j)

r Xs ds +m∑

i=1

∫ t

rDVi(Xs)D ( j)

r Xs dWi(s) + Vj(Xr) .

Page 20: M.Hairer · Advancedstochasticanalysis February29,2016 M.Hairer TheUniversityofWarwick Contents 1 Introduction 1 2 WhitenoiseandWienerchaos 3 3 TheMalliavinderivativeanditsadjoint

Malliavin Calculus for Diffusion Processes 20

We see that, save for the initial condition at time t = r given by Vj(Xr), this equationis identical to the integral form of (5.2)! Using the variation of constants formula,it follows that for s < t one has the identity

D ( j)s Xt = Js,tVj(Xs) . (5.6)

Furthermore, since Xt is independent of the later increments of W , we haveD ( j)

s Xt = 0 for s ≥ t. This formal manipulation can easily be justified a posteriori,thus showing that the random variables Xt belongs to W1,p for every p. In fact, onehas even more than that:

Proposition 5.2 If the vector fields Vi belong to C∞b and X0 ∈ Rn is deterministic,then the solution Xt to (5.1) belongs to S for every t ≥ 0.

Before we prove this, let us recall the following bound on iterated Itô integrals.

Lemma 5.3 Let k ≥ 1 and let v be a stochastic process on Rk with E‖v‖pLp < ∞

for some p ≥ 2. Then, one has the bound

E

∫ t

0· · ·

∫ s2

0v(s1, . . . , sk) dWi1(s1) · · · dWik (sk)

p≤ Ct

k(p−2)2 E‖v‖p

Lp .

Proof. The proof goes by induction over k. For k = 1, it follows from theBurkholder-David-Gundy inequality followed by Hölder’s inequality that

E

∫ t

0v(s) dWi(s)

p≤ CE

∫ t

0|v(s)|2 ds

p/2≤ Ct

p−22 E

∫ t

0|v(s)|p ds , (5.7)

as claimed. In the general case, we set

v(sk) =∫ sk

0· · ·

∫ s2

0v(s1, . . . , sk) dWi1(s1) · · · dWik−1(sk−1) .

Combining (5.7) with the induction hypothesis, we obtain

E

∫ t

0v(sk) dWik (sk)

p≤ Ct

p−22

∫ t

0E|v(sk)|p dsk

≤ Ctp−22 t

(k−1)(p−2)2

∫ t

0E(∫ sk

0· · ·

∫ s1

0|v(s1, . . . , sk)|p ds1 · · · dsk−1

)dsk ,

and the claim follows from Fubini’s theorem.

Page 21: M.Hairer · Advancedstochasticanalysis February29,2016 M.Hairer TheUniversityofWarwick Contents 1 Introduction 1 2 WhitenoiseandWienerchaos 3 3 TheMalliavinderivativeanditsadjoint

Malliavin Calculus for Diffusion Processes 21

Proof of Proposition 5.2. We first derive an equation for the Malliavin derivativeof the Jacobian Js,t . Differentiating the integral form of (5.2) we obtain for r < sthe identity

D ( j)r Js,t =

∫ t

sDV0(Xu)D ( j)

r Js,u du +∫ t

sD2V0(Xu)

Js,u,D( j)r Xu

du

+

m∑i=1

∫ t

sDVi(Xu)D ( j)

r Js,u dWi(u) +∫ t

sD2Vi(Xu)

Js,u,D( j)r Xu

dWi(u) .

Using (5.6), we can rewrite this as

D ( j)r Js,t =

∫ t

sDV0(Xu)D ( j)

r Js,u du +∫ t

sD2V0(Xu)

Js,u, Jr,uVj(Xr) du

+

m∑i=1

∫ t

sDVi(Xu)D ( j)

r Js,u dWi(u)

+

m∑i=1

∫ t

sD2Vi(Xu)

Js,u, Jr,uVj(Xr) dWi(u) .

Once again, we see that this is nothing but an inhomogeneous version of theequation for Js,t itself. The variation of constants formula thus yields

D ( j)r Js,t =

∫ t

sJu,t D2V0(Xu)

Js,u, Jr,uVj(Xr) du

+

m∑i=1

∫ t

sJu,t D2Vi(Xu)

Js,u, Jr,uVj(Xr) dWi(u) .

This allows to show by induction that, for any integer k, the iterated Malliavinderivative D ( j1)

r1 · · ·D( jk )rk X(t) with r1 ≤ · · · ≤ rk can be expressed as a finite sum

of terms consisting of a multiple iterated Wiener / Lebesgue integral with integrandgiven by a finite product of components of the type Jsi,s j with i < j, as well asfunctions in C∞b evaluated at Xsi . This has moments of all orders as a consequenceof Proposition 5.1, combined with Lemma 5.3.

Theorem 5.4 Let x0 ∈ Rn and let Xt be the solution to (1.1). If the vector fieldsVj ⊂ C∞b satisfy the parabolic Hörmander condition, then the law of Xt has asmooth density with respect to Lebesgue measure.

Proof. Denote byA0,t the operatorA0,tv =∫ t0 Js,tV (Xs)v(s) ds, where v is a square

integrable, not necessarily adapted, Rm-valued stochastic process and V is then × m matrix-valued function obtained by concatenating the vector fields Vj for

Page 22: M.Hairer · Advancedstochasticanalysis February29,2016 M.Hairer TheUniversityofWarwick Contents 1 Introduction 1 2 WhitenoiseandWienerchaos 3 3 TheMalliavinderivativeanditsadjoint

Hörmander’s Theorem 22

j = 1, . . . ,m. With this notation, it follows from (5.6) that the Malliavin covariancematrix M0,t of Xt is given by

M0,t = A0,tA∗0,t =

∫ t

0Js,tV (Xs)V ∗(Xs)J∗s,t ds .

It follows from (5.6) that the assumptions of Theorem 4.3 are satisfied for therandom variable Xt , provided that we can show that ‖M−10,t ‖ has bounded momentsof all orders. This in turn follows by combining Lemma 6.2 with Theorem 6.3below.

6 Hörmander’s Theorem

This section is devoted to a proof of the fact that Hörmander’s condition is sufficientto guarantee the invertibility of the Malliavin matrix of a diffusion process. Forpurely technical reasons, it turns out to be advantageous to rewrite the Malliavinmatrix as

M0,t = J0,t C0,t J∗0,t , C0,t =

∫ t

0J−10,s V (Xs)V ∗(Xs)J−10,s

∗ ds ,

where C0,t is the reduced Malliavin matrix of our diffusion process.

Remark 6.1 The reason for considering the reduced Malliavin matrix is that theprocess appearing under the integral in the definition of C0,t is adapted to thefiltration generated by Wt . This allows us to use some tools from stochastic calculusthat would not be available otherwise.

Since we assumed that J0,t has inverse moments of all orders, the invertibilityof M0,t is equivalent to that of C0,t . Note first that since C0,t is a positive definitesymmetric matrix, the norm of its inverse is given by

‖C−10,t ‖ =(inf|η |=1

〈η, C0,tη〉)−1

.

A very useful observation is then the following:

Lemma 6.2 Let M be a symmetric positive semidefinite n×n matrix-valued randomvariable such that E‖M‖p < ∞ for every p ≥ 1 and such that, for every p ≥ 1there exists Cp such that

sup|η |=1

P〈η, Mη〉 < ε

≤ Cpε

p , (6.1)

holds for every ε ≤ 1. Then, E‖M−1‖p < ∞ for every p ≥ 1.

Page 23: M.Hairer · Advancedstochasticanalysis February29,2016 M.Hairer TheUniversityofWarwick Contents 1 Introduction 1 2 WhitenoiseandWienerchaos 3 3 TheMalliavinderivativeanditsadjoint

Hörmander’s Theorem 23

Proof. The non-trivial part of the result is that the supremum over η is taken outsideof the probability in (6.1). For ε > 0, let ηkk≤N be a sequence of vectors with|ηk | = 1 such that for every η with |η | ≤ 1, there exists k such that |ηk − η | ≤ ε2. Itis clear that one can find such a set with N ≤ Cε2−2n for some C > 0 independentof ε. We then have the bound

〈η, Mη〉 = 〈ηk, Mηk〉 + 〈η − ηk, Mη〉 + 〈η − ηk, Mηk〉≥ 〈ηk, Mηk〉 − 2‖M‖ε2 ,

so that

P(inf|η |=1

〈η, Mη〉 ≤ ε) ≤ P(infk≤N

〈ηk, Mηk〉 ≤ 4ε)+ P

(‖M‖ ≥ 1ε

)≤ Cε2−2n sup

|η |=1P(〈η, Mη〉 ≤ 4ε

)+ P

(‖M‖ ≥ 1ε

).

It now suffices to use (6.1) for p large enough to bound the first term and Chebychev’sinequality combined with the moment bound on ‖M‖ to bound the second term.

As a consequence of this, Theorem 5.4 is a corollary of:

Theorem 6.3 Under the assumptions of Theorem 5.4, for every initial conditionx ∈ Rn, we have the bound

sup|η |=1

P〈η, C0,1η〉 < ε

≤ Cpε

p ,

for suitable constants Cp and all p ≥ 1.

Remark 6.4 The choice t = 1 as the final time is of course completely arbitrary.Here and in the sequel, we will always consider functions on the time interval [0, 1].

Before we turn to the proof of this result, we introduce a very useful notationwhich was introduced in [HM11]. Given a family A = Aεε∈(0,1] of eventsdepending on some parameter ε > 0, we say that A is “almost true” if, for everyp > 0 there exists a constant Cp such that P(Aε) ≥ 1 − Cpε

p for all ε ∈ (0, 1].Similarly for “almost false”. Given two such families of events A and B, we saythat “A almost implies B” and we write A ⇒ε B if A \ B is almost false. It isstraightforward to check that these notions behave as expected (almost implicationis transitive, finite unions of almost false events are almost false, etc). Note alsothat these notions are unchanged under any reparametrisation of the form ε 7→ εα

for α > 0. Given two families X and Y of real-valued random variables, we willsimilarly write X ≤ε Y as a shorthand for the fact that Xε ≤ Yε is “almost true”.

Before we proceed, we state the following useful result, where ‖ · ‖∞ denotesthe L∞ norm and ‖ · ‖α denotes the best possible α-Hölder constant.

Page 24: M.Hairer · Advancedstochasticanalysis February29,2016 M.Hairer TheUniversityofWarwick Contents 1 Introduction 1 2 WhitenoiseandWienerchaos 3 3 TheMalliavinderivativeanditsadjoint

Hörmander’s Theorem 24

Lemma 6.5 Let f : [0, 1] → R be continuously differentiable and let α ∈ (0, 1].Then, the bound

‖∂t f ‖∞ = ‖ f ‖1 ≤ 4‖ f ‖∞max1, ‖ f ‖− 1

1+α∞ ‖∂t f ‖ 1

1+αα

holds, where ‖ f ‖α denotes the best α-Hölder constant for f .

Proof. Denote by x0 a point such that |∂t f (x0)| = ‖∂t f ‖∞. It follows from thedefinition of the α-Hölder constant ‖∂t f ‖Cα that |∂t f (x)| ≥ 1

2 ‖∂t f ‖∞ for every xsuch that |x − x0 | ≤ ‖∂t f ‖∞/2‖∂t f ‖Cα 1/α. The claim then follows from the factthat if f is continuously differentiable and |∂t f (x)| ≥ A over an interval I, thenthere exists a point x1 in the interval such that | f (x1)| ≥ A|I |/2.

With these notations at hand, we have the following statement, which isessentially a quantitative version of the Doob-Meyer decomposition theorem.Originally, it appeared in [Nor86], although some form of it was already present inearlier works. The statement and proof given here are slightly different from thosein [Nor86], but are very close to them in spirit.

Lemma 6.6 Let W be an m-dimensional Wiener process and let A and B be R andRm-valued adapted processes such that, for α = 1

3 , one has E‖A‖α + ‖B‖αp

< ∞for every p. Let Z be the process defined by

Zt = Z0 +

∫ t

0As ds +

∫ t

0Bs dW (s) . (6.2)

Then, there exists a universal constant r ∈ (0, 1) such that one has‖Z‖∞ < ε

⇒ε

‖A‖∞ < εr &

‖B‖∞ < εr .

Proof. Recall the exponential martingale inequality [RY99, p. 153], stating that ifM is any continuous martingale with quadratic variation process 〈M〉(t), then

P(supt≤T

|M(t)| ≥ x & 〈M〉(T) ≤ y)≤ 2 exp

−x2/2y

,

for every positive T , x, y. With our notations, this implies that for any q < 1 andany adapted process F, one has the almost implication

‖F‖∞ < ε⇒ε

∫ ·

0Ft dW (t) ∞ < εq

. (6.3)

With this bound in mind, we apply Itô’s formula to Z2, so that

Z2t = Z2

0 + 2∫ t

0Zs As ds + 2

∫ t

0Zs Bs dW (s) +

∫ t

0B2

s ds . (6.4)

Page 25: M.Hairer · Advancedstochasticanalysis February29,2016 M.Hairer TheUniversityofWarwick Contents 1 Introduction 1 2 WhitenoiseandWienerchaos 3 3 TheMalliavinderivativeanditsadjoint

Hörmander’s Theorem 25

Since ‖A‖∞ ≤ε ε−1/4 (or any other negative exponent for that matter) by assumptionand similarly for B, it follows from this and (6.3) that

‖Z‖∞ < ε⇒ε

∫ 1

0As Zs ds ≤ ε

34

&

∫ 1

0Bs Zs dW (s) ≤ ε

23

.

Inserting these bounds back into (6.4) and applying Jensen’s inequality then yields

‖Z‖∞ < ε⇒ε

∫ 1

0B2

s ds ≤ ε12

∫ 1

0|Bs | ds ≤ ε

14

.

We now use the fact that ‖B‖α ≤ε ε−q for every q > 0 and we apply Lemma 6.5with ∂t f (t) = |Bt | (we actually do it component by component), so that

‖Z‖∞ < ε⇒ε

‖B‖∞ ≤ ε 117

,

say. In order to get the bound on A, note that we can again apply the exponentialmartingale inequality to obtain that this “almost implies” the martingale part in(6.2) is “almost bounded” in the supremum norm by ε

118 , so that

‖Z‖∞ < ε⇒ε

∫ ·

0As ds ∞ ≤ ε

118

.

Finally applying again Lemma 6.5 with ∂t f (t) = At , we obtain that‖Z‖∞ < ε

⇒ε

‖A‖∞ ≤ ε1/80,

and the claim follows with r = 1/80.

Remark 6.7 By making α arbitrarily close to 1/2, keeping track of the differentnorms appearing in the above argument, and then bootstrapping the argument, it ispossible to show that

‖Z‖∞ < ε⇒ε

‖A‖∞ ≤ εp&

‖B‖∞ ≤ εq,

for p arbitrarily close to 1/5 and q arbitrarily close to 3/10. This seems to be a verysmall improvement over the exponent 1/8 that was originally obtained in [Nor86],but is certainly not optimal either. The main reason why our result is suboptimal isthat we move several times back and forth between L1, L2, and L∞ norms. (Notefurthermore that our result is not really comparable to that in [Nor86], since Norrisused L2 norms in the statements and his assumptions were slightly different fromours.)

We now have all the necessary tools to prove Theorem 6.3:

Page 26: M.Hairer · Advancedstochasticanalysis February29,2016 M.Hairer TheUniversityofWarwick Contents 1 Introduction 1 2 WhitenoiseandWienerchaos 3 3 TheMalliavinderivativeanditsadjoint

Hörmander’s Theorem 26

Proof of Theorem 6.3. We fix some initial condition x0 ∈ Rn and some unit vectorη ∈ Rn. With the notation introduced earlier, our aim is then to show that

〈η, C0,1η〉 < ε⇒ε ∅ , (6.5)

or in other words that the statement 〈η, C0,1η〉 < ε is “almost false”. As a shorthand,we introduce for an arbitrary smooth vector field F on Rn the process ZF defined by

ZF(t) = 〈η, J−10,t F(xt)〉 ,so that

〈η, C0,1η〉 =m∑

k=1

∫ 1

0|ZVk

(t)|2 dt ≥m∑

k=1

(∫ 1

0|ZVk

(t)| dt)2. (6.6)

The processes ZF have the nice property that they solve the stochastic differentialequation

dZF(t) = Z[F,V0](t) dt +m∑

i=1

Z[F,Vk ](t) dWk(t) , (6.7)

which can be rewritten in Itô form as

dZF(t) =(Z[F,V0](t) +

m∑k=1

12

Z[[F,Vk ],Vk ](t))

dt +m∑

i=1

Z[F,Vk ](t) dWk(t) . (6.8)

Sincewe assumed that all derivatives of theVj grow atmost polynomially, we deducefrom the Hölder regularity of Brownian motion that, provided that the derivativesof F grow at most polynomially fast, ZF does indeed satisfy the assumptions on itsHölder norm required for the application of Norris’s lemma. The idea now is toobserve that, by (6.6), the left hand side of (6.5) states that ZF is “small” for everyF ∈ V0. One then argues that, by Norris’s lemma, if ZF is small for every F ∈ Vkthen, by considering (6.7), it follows that ZF is also small for every F ∈ Vk+1.Hörmander’s condition then ensures that a contradiction arises at some stage, sinceZF(0) = 〈F(x0), ξ〉 and there exists k such that Vk(x0) spans all of Rn.

Let us make this rigorous. It follows from Norris’s lemma and (6.8) that onehas the almost implication

‖ZF ‖∞ < ε⇒ε

‖Z[F,Vk ]‖∞ < εr &

‖ZG‖∞ < εr ,

for k = 1, . . . ,m and for G = [F,V0] + 12∑m

k=1[[F,Vk],Vk]. Iterating this bound asecond time, this time considering the equation for ZG, we obtain that

‖ZF ‖∞ < ε⇒ε

‖Z[[F,Vk ],V`]‖∞ < εr2,

Page 27: M.Hairer · Advancedstochasticanalysis February29,2016 M.Hairer TheUniversityofWarwick Contents 1 Introduction 1 2 WhitenoiseandWienerchaos 3 3 TheMalliavinderivativeanditsadjoint

Hypercontractivity 27

so that we finally obtain the implication‖ZF ‖∞ < ε

⇒ε

‖Z[F,Vk ]‖∞ < εr2, (6.9)

for k = 0, . . . ,m.At this stage, we are basically done. Indeed, combining (6.6) with Lemma 6.5

as above, we see that〈η, C0,1η〉 < ε

⇒ε

‖ZVk‖∞ < ε1/5

.

Applying (6.9) iteratively, we see that for every k > 0 there exists some qk > 0such that 〈η, C0,1η〉 < ε

⇒ε

⋂V∈Vk

‖ZV ‖∞ < εqk.

Since ZV (0) = 〈η,V (x0)〉 and since there exists some k > 0 such that Vk(x0) = Rn,the right hand side of this expression is empty for some sufficiently large value ofk, which is precisely the desired result.

7 Hypercontractivity

The aim of this section is to prove the following result. Let Tt denote the semigroupgenerated by the Ornstein-Uhlenbeck operator ∆ defined in Section 3. In otherwords, one sets Tt = exp(−∆t), which can be defined by functional calculus. Sincewe have an explicit eigenspace decomposition of ∆ by Proposition 3.7, this isequivalent to simply setting Tt X = e−nt X for every X ∈ Hn. The main result ofthis section is the following.

Theorem 7.1 For p, q ∈ (1,∞) and t ≥ 0 with p−1q−1 = e2t , one has

‖Tt X ‖Lp ≤ ‖X ‖Lq , (7.1)

for every X ∈ Lq(Ω,P).Versions of this theorem were first proved by Nelson [Nel66, Nel73] with again

constructive quantum field theory as his motivation. An operator Tt which satisfiesa bound of the type (7.1) for some p > q is called “hypercontractive”. An extremelyimportant feature of this bound is that it holds without the appearance of anyproportionality constant. As we will see in Corollary 7.3 below, this makes it stableunder tensorisation, which is a very powerful property.

Let us provide a simple application of this result. An immediate corollary is that,for any X ∈ Hn and p ≥ 1, one has the very useful “reverse Jensen’s inequality”

EX2p ≤ (2p − 1)npEX2p

. (7.2)

Page 28: M.Hairer · Advancedstochasticanalysis February29,2016 M.Hairer TheUniversityofWarwick Contents 1 Introduction 1 2 WhitenoiseandWienerchaos 3 3 TheMalliavinderivativeanditsadjoint

Hypercontractivity 28

Although we have made no attempt at optimising this statement, it is alreadyremarkably precise: using Stirling’s formula, one can verify from the explicitformula for the moments of a Gaussian distribution that in the case n = 1 (when allelements of Hn have Gaussian distributions) and for large p, one has the asymptoticbehaviour

EX2p ≤ (2p − 1)p√2e

12−p

EX2p.

It is however for n ≥ 2 that the bound (7.2) reveals its full power since the possibledistributions for elements of Hn then cannot de described by a finite-dimensionalfamily anymore.

A crucial ingredient in the proof of Theorem 7.1 is the following tensorisationproperty. Consider bounded linear operators Ti : Lq(Ωi,Pi) → Lp(Ωi,Pi) fori ∈ 1, 22 and define the probability space (Ω,P) by

Ω = Ω1 ×Ω2 , P = P1 ⊗ P2 .

Then, on random variables of the form $X(\omega) = X_1(\omega_1) X_2(\omega_2)$, one defines an operator $T = T_1 \otimes T_2$ by setting $(TX)(\omega) = (T_1 X_1)(\omega_1)\,(T_2 X_2)(\omega_2)$, where we used the notation $\omega = (\omega_1, \omega_2)$. Extending $T$ by linearity, this defines $T$ on a dense subset of $L^q(\Omega, \mathbf{P})$. On that subset one can also write $T = T_2 T_1$, where, by a slight abuse of notation, $T_1$ acts on functions on $\Omega$ by
\[
(T_1 X)(\omega_1, \omega_2) = \bigl(T_1 X(\cdot\,, \omega_2)\bigr)(\omega_1) ,
\]
and similarly for $T_2$. We claim that if each $T_i$ satisfies a hypercontractive bound of the type (7.1), then so does $T$. A key ingredient for proving this is the following lemma, where we write for example $\|X\|_{L^p_1}$ as a shorthand for the function $\omega_2 \mapsto \|X(\cdot\,, \omega_2)\|_{L^p(\Omega_1, \mathbf{P}_1)}$.

Lemma 7.2 If $p \ge q \ge 1$, then one has $\bigl\|\|X\|_{L^q_1}\bigr\|_{L^p_2} \le \bigl\|\|X\|_{L^p_2}\bigr\|_{L^q_1}$.

Proof. The claim holds for $q = 1$ by the triangle inequality, since
\[
\bigl\|\|X\|_{L^1_1}\bigr\|_{L^p_2} = \Bigl\| \int |X(\omega_1, \cdot\,)|\, \mathbf{P}_1(d\omega_1) \Bigr\|_{L^p_2} \le \int \bigl\| |X(\omega_1, \cdot\,)| \bigr\|_{L^p_2}\, \mathbf{P}_1(d\omega_1) = \bigl\|\|X\|_{L^p_2}\bigr\|_{L^1_1} .
\]

Exploiting the fact that $\|X\|_{L^q} = \||X|^q\|_{L^1}^{1/q}$, the general case follows:
\[
\bigl\|\|X\|_{L^q_1}\bigr\|_{L^p_2}
= \Bigl\|\|X\|^q_{L^q_1}\Bigr\|^{1/q}_{L^{p/q}_2}
= \Bigl\|\||X|^q\|_{L^1_1}\Bigr\|^{1/q}_{L^{p/q}_2}
\le \Bigl\|\||X|^q\|_{L^{p/q}_2}\Bigr\|^{1/q}_{L^1_1}
= \Bigl\|\|X\|^q_{L^p_2}\Bigr\|^{1/q}_{L^1_1}
= \bigl\|\|X\|_{L^p_2}\bigr\|_{L^q_1} ,
\]
thus concluding the proof.

² We will always assume our spaces to be standard probability spaces, so that no pathologies arise when considering products and conditional expectations.


Corollary 7.3 In the above setting, if for some $p \ge q \ge 1$ one has $\|T_i X\|_{L^p} \le \|X\|_{L^q}$, then one also has $\|TX\|_{L^p} \le \|X\|_{L^q}$.

Proof. One has
\[
\|TX\|_{L^p} = \bigl\|\|T_1 T_2 X\|_{L^p_1}\bigr\|_{L^p_2}
\le \bigl\|\|T_2 X\|_{L^q_1}\bigr\|_{L^p_2}
\le \bigl\|\|T_2 X\|_{L^p_2}\bigr\|_{L^q_1}
\le \bigl\|\|X\|_{L^q_2}\bigr\|_{L^q_1} = \|X\|_{L^q} ,
\]
as claimed.

Recall now that, in the context of a Gaussian probability space, the space $\mathcal{W}$ consisting of random variables of the type
\[
X = F(\xi_1, \ldots, \xi_n) , \qquad F \in \mathcal{C}^\infty_b , \tag{7.3}
\]
where $\xi_i = W(e_i)$ for an orthonormal basis $\{e_i\}$ of the Cameron-Martin space $H$, is dense in $L^2(\Omega, \mathbf{P})$. Similarly, one can show that it is actually dense in every $L^p(\Omega, \mathbf{P})$ for $p \in [1, \infty)$, and we will assume this in the sequel.

As a consequence, in order to prove Theorem 7.1, it suffices to show that (7.1) holds for random variables of the type (7.3) for any fixed $n$. Note now that, using (3.9) for the evaluation of the Skorokhod integral, one has
\[
\Delta X = \delta D X = \delta \Bigl( \sum_i (\partial_i F)(\xi_1, \ldots, \xi_n)\, e_i \Bigr)
= \sum_i \Bigl( (\partial_i F)(\xi_1, \ldots, \xi_n)\, \xi_i - (\partial_i^2 F)(\xi_1, \ldots, \xi_n) \Bigr) .
\]
In other words, one has $\Delta = \sum_{i=1}^n \Delta_i$, where
\[
\Delta_i = -\partial_i^2 + \xi_i\, \partial_i .
\]
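In the one-variable picture, the eigenfunctions of $-\partial_x^2 + x\partial_x$ are the probabilists' Hermite polynomials, with eigenvalue $n$, mirroring the eigenspace decomposition used above. A minimal symbolic sketch (the use of sympy and the range of $n$ checked are incidental choices):

```python
import sympy as sp

# Symbolic check that the probabilists' Hermite polynomials He_n are
# eigenfunctions of the Ornstein-Uhlenbeck operator -d^2/dx^2 + x d/dx
# with eigenvalue n, mirroring Delta_i = -d_i^2 + xi_i d_i above.
x = sp.symbols('x')
He = [sp.Integer(1), x]                          # He_0 and He_1
for n in range(1, 8):                            # He_{n+1} = x He_n - n He_{n-1}
    He.append(sp.expand(x * He[n] - n * He[n - 1]))

for n, h in enumerate(He):
    L = sp.expand(-sp.diff(h, x, 2) + x * sp.diff(h, x))
    assert sp.simplify(L - n * h) == 0
print("(-d^2/dx^2 + x d/dx) He_n = n He_n verified for n = 0,...,8")
```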

At this stage, we note that the closed subspace of $L^p(\Omega, \mathbf{P})$ given by the closure of the subspace spanned by random variables of the type (7.3) for any fixed $n$ is canonically isomorphic (precisely via (7.3)) to $L^p(\mathbf{R}^n, \mathcal{N}(0, \mathrm{id}))$. Furthermore, $T_t$ maps that space into itself, so let us write $T_t^{(n)}$ for the corresponding operator on $L^p(\mathbf{R}^n, \mathcal{N}(0, \mathrm{id}))$. It follows from the fact that all of the $\Delta_i$ commute that one has
\[
T_t^{(n)} = T_t^{(1)} \otimes \ldots \otimes T_t^{(1)} \quad (n \text{ times}),
\]
so that, as a consequence of Corollary 7.3, Theorem 7.1 follows if we can show the analogous statement for $T_t^{(1)}$.


At this stage, we note that the operator $\partial_x^2 - x\partial_x$ is the generator of the standard one-dimensional Ornstein-Uhlenbeck process given by the solutions to the SDE
\[
dX = -X\, dt + \sqrt{2}\, dB(t) , \qquad X_0 = x , \tag{7.4}
\]
where $B$ is a standard one-dimensional Brownian motion. Note that $B$ has nothing whatsoever to do with the white noise process $W$ that is the start of our discussion! In other words, it follows from Itô's formula that if we define an operator $T_t$ on $L^p(\mathbf{R}, \mathcal{N}(0,1))$ by
\[
T_t \varphi(x) = \mathbf{E}_x \varphi(X_t) ,
\]
then $T_t \varphi$ does indeed solve the equation
\[
\partial_t T_t \varphi = (\partial_x^2 - x\partial_x)\, T_t \varphi .
\]

Since the operator $\partial_x^2 - x\partial_x$ is essentially self-adjoint on $\mathcal{C}^\infty_b$ (see for example [RS72, RS75] for more details), it follows that $T_t$ as defined above does indeed coincide with the operator $T_t^{(1)}$, modulo the isomorphism mentioned above. By the variation of constants formula, the solution to (7.4) is given by
\[
X(t) = e^{-t} x + \sqrt{2} \int_0^t e^{s-t}\, dB(s) .
\]
In law, for any fixed $t$ (not as a function of $t$!), this can be written as
\[
X(t) \stackrel{\mathrm{law}}{=} e^{-t} x + \sqrt{1 - e^{-2t}}\, \theta , \qquad \theta \sim \mathcal{N}(0,1) ,
\]
so that $T_t = Q_{e^{-t}}$, where the operator $Q_\varrho$ is given by
\[
\bigl(Q_\varrho \varphi\bigr)(x) = \mathbf{E}\, \varphi\bigl(\varrho x + \sqrt{1 - \varrho^2}\, \theta\bigr) .
\]
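Before proceeding, here is a minimal numerical sketch comparing the two representations of $T_t$ just described: a direct Euler-Maruyama simulation of (7.4) against the formula $Q_{e^{-t}}$. The test function, step size and sample sizes are all ad hoc choices:

```python
import numpy as np

# E_x phi(X_t) computed two ways: (i) Euler-Maruyama for the OU equation
# dX = -X dt + sqrt(2) dB, and (ii) the representation
# (Q_rho phi)(x) = E phi(rho x + sqrt(1 - rho^2) theta) with rho = e^{-t}.
rng = np.random.default_rng(1)
phi = np.cos                                  # arbitrary bounded test function
x0, t = 0.7, 1.0
n_steps, n_paths = 500, 200_000

dt = t / n_steps
X = np.full(n_paths, x0)
for _ in range(n_steps):                      # (i) simulate the SDE (7.4)
    X += -X * dt + np.sqrt(2 * dt) * rng.standard_normal(n_paths)
print("SDE simulation:", phi(X).mean())

rho = np.exp(-t)                              # (ii) one Gaussian draw suffices
theta = rng.standard_normal(n_paths)
print("Q_rho formula :", phi(rho * x0 + np.sqrt(1 - rho**2) * theta).mean())
```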

Summarising the discussion so far, we have:

Lemma 7.4 Let $\eta, \xi$ be two jointly Gaussian random variables with variance $1$ and covariance $\varrho$. Assume that for $p, q$ with $(q-1) = \varrho^2(p-1)$, one has
\[
\Bigl(\mathbf{E}\bigl|\mathbf{E}(\varphi(\xi) \mid \eta)\bigr|^p\Bigr)^{1/p} \le \bigl(\mathbf{E}|\varphi(\eta)|^q\bigr)^{1/q} , \tag{7.5}
\]
for every $\varrho \in (0, 1)$ and every function $\varphi \in \mathcal{C}^\infty_0$. Then, Theorem 7.1 holds.

At this stage, it turns out that we can again leverage the tensorisation property of the hypercontractive bounds to reduce ourselves to the simplest probability space imaginable, namely that generated by a single Bernoulli random variable! This was first pointed out by Gross [Gro75] and the argument works in the following way.


Let $\{x_i^{(j)}\}_{i \ge 1,\, j \in \{0,1\}}$ be a collection of i.i.d. $\pm 1$-valued fair coin tosses and let $\{r_i\}_{i \ge 1}$ be a $\{0,1\}$-valued i.i.d. sequence with $\mathbf{P}(r_i = 1) = \varrho$. Define furthermore
\[
\eta_n = \frac{1}{\sqrt{n}} \sum_{i=1}^n x_i^{(1)} , \qquad \xi_n = \frac{1}{\sqrt{n}} \sum_{i=1}^n x_i^{(r_i)} . \tag{7.6}
\]

We then have:

Proposition 7.5 The sequence $(\eta_n, \xi_n)$ converges in law to $(\eta, \xi)$ for $\eta$ and $\xi$ as in Lemma 7.4. Furthermore, for any $\eta \in \mathbf{R}$, the law of $\xi_n$, conditional on $\eta_n = 2 n^{-1/2} \lfloor \sqrt{n}\, \eta / 2 \rfloor$ (assume $n$ even), converges to that of a Gaussian random variable with mean $\varrho \eta$ and variance $1 - \varrho^2$, uniformly in the Prokhorov metric for $\eta$ in compact sets.

Proof. The first statement is an immediate consequence of the central limit theorem. The second statement is a little bit more delicate and we only sketch one possible avenue of proof of a slightly stronger statement: conditioning furthermore on $\varrho_n = n^{-1} \sum_{i=1}^n r_i$, we claim that the required conditional convergence still holds, uniformly over any sequence $\varrho_n$ with $|\varrho_n - \varrho| \le n^{-1/4}$ (and of course $n \varrho_n \in \mathbf{N}$). Since $\mathbf{P}(|\varrho_n - \varrho| > n^{-1/4}) \to 0$ as $n \to \infty$ (even exponentially fast), this implies the claim.

Since the distribution of the sequence $\{x_i^{(1)}\}$ conditioned on $\eta_n$ is exchangeable, the distribution of $(\eta_n, \xi_n)$ conditioned on $\eta_n$ and $\varrho_n$ is identical to that of $(\eta_n, \bar\xi_n)$ conditioned on $\eta_n$, where
\[
\bar\xi_n = \frac{1}{\sqrt{n}} \sum_{i=1}^{n\varrho_n} x_i^{(1)} + \frac{1}{\sqrt{n}} \sum_{i = n\varrho_n}^{n-1} x_{i+1}^{(0)} \stackrel{\mathrm{def}}{=} \xi_n^{(1)} + \xi_n^{(0)} .
\]

Since $\xi_n^{(0)}$ is independent of $\eta_n$, it follows from the central limit theorem that, under the above assumption on the $\varrho_n$, its distribution converges to $\mathcal{N}(0, 1 - \varrho)$ as $n \to \infty$, with the required uniformity properties.

It therefore remains to show that the distribution of $\xi_n^{(1)}$, conditional on $\eta_n = n^{-1/2} \lfloor \sqrt{n}\, \eta \rfloor$, converges to $\mathcal{N}(\varrho\eta, \varrho(1-\varrho))$. One elementary way of proving this is to use the explicit expression
\[
\mathbf{P}\bigl(\xi_n^{(1)} = x \;\&\; \eta_n = y\bigr) = 2^{-n} \binom{n\varrho_n}{\frac{n\varrho_n + x\sqrt{n}}{2}} \binom{n(1-\varrho_n)}{\frac{n(1-\varrho_n) + (y-x)\sqrt{n}}{2}} ,
\]
and to take limits using Stirling's formula. Formally, one would like to simply apply Donsker's invariance principle: the rescaled partial sums of the $x_i^{(1)}$ converge to a Brownian motion $B$, and the conditional distribution of $B(\varrho)$ given $B(1) = \eta$ is precisely $\mathcal{N}(\varrho\eta, \varrho(1-\varrho))$. This is however not a proof, since the convergence of distributions does not imply convergence of conditional distributions, which is crucial for our application.
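The statement of Proposition 7.5 is easy to probe numerically. A minimal simulation sketch of the construction (7.6), with ad hoc values of $n$, $\varrho$ and sample sizes, comparing the empirical covariance of $(\eta_n, \xi_n)$ and the conditional law of $\xi_n$ given $\eta_n \approx \eta$ with their Gaussian limits:

```python
import numpy as np

# Simulate (7.6): eta_n, xi_n built from fair ±1 coins x^{(0)}, x^{(1)} and
# selectors r_i with P(r_i = 1) = rho; compare with the Gaussian limit.
rng = np.random.default_rng(2)
n, m, rho, eta0 = 400, 40_000, 0.6, 1.0       # m = number of samples

x0 = 2 * rng.integers(0, 2, size=(m, n), dtype=np.int8) - 1
x1 = 2 * rng.integers(0, 2, size=(m, n), dtype=np.int8) - 1
r = rng.random((m, n)) < rho

eta_n = x1.sum(axis=1) / np.sqrt(n)
xi_n = np.where(r, x1, x0).sum(axis=1) / np.sqrt(n)

print("Cov(eta_n, xi_n) ~", np.cov(eta_n, xi_n)[0, 1], " target:", rho)

sel = np.abs(eta_n - eta0) < 0.1              # crude conditioning on eta_n ~ eta0
print("conditional mean ~", xi_n[sel].mean(), " target:", rho * eta0)
print("conditional var  ~", xi_n[sel].var(), " target:", 1 - rho**2)
```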

It follows immediately from Proposition 7.5 that
\[
\Bigl(\mathbf{E}\bigl|\mathbf{E}(\varphi(\xi) \mid \eta)\bigr|^p\Bigr)^{1/p} = \lim_{n \to \infty} \Bigl(\mathbf{E}\bigl|\mathbf{E}(\varphi(\xi_n) \mid \eta_n)\bigr|^p\Bigr)^{1/p} , \qquad
\bigl(\mathbf{E}|\varphi(\eta)|^q\bigr)^{1/q} = \lim_{n \to \infty} \bigl(\mathbf{E}|\varphi(\eta_n)|^q\bigr)^{1/q} .
\]

We conclude that (7.5) holds if, for every fixed value of $n$, it holds with $(\xi, \eta)$ replaced by $(\xi_n, \eta_n)$. Again, we see that (7.5) is precisely of the form (7.1), this time with $(\Omega, \mathbf{P}) = (\{\pm 1\}^n, \mathbf{P}_2^{\otimes n})$, where $\mathbf{P}_2(+1) = \mathbf{P}_2(-1) = \frac{1}{2}$, and with the operator $T_t$ replaced by $Q_\varrho^{(n)}$, where
\[
\bigl(Q_\varrho^{(n)} \psi\bigr)(x) = \mathbf{E}\, \psi(y) , \qquad y_i = \begin{cases} x_i & \text{if } r_i = 1, \\ z_i & \text{otherwise,} \end{cases}
\]
for $z$ an i.i.d. sequence of Bernoulli random variables and $r_i$ an independent sequence of $\{0,1\}$-valued random variables as in (7.6). We see that one has again $Q_\varrho^{(n)} = Q_\varrho^{\otimes n}$, where $Q_\varrho$ acts on the probability space $(\{\pm 1\}, \mathbf{P}_2)$ by
\[
\bigl(Q_\varrho \psi\bigr)(x) = \varrho\, \psi(x) + \frac{1 - \varrho}{2}\bigl(\psi(1) + \psi(-1)\bigr) .
\]

Note now that on $\{\pm 1\}$ every function is affine, so that $Q_\varrho$ is fully characterised by
\[
Q_\varrho(a + bx) = a + \varrho\, b x ,
\]
where $x$ denotes the "identity map" from $\{\pm 1\}$ to $\mathbf{R}$. It thus follows that the proof of Theorem 7.1 is reduced to verifying that, for every $\varrho \in [0, 1]$, every $f \colon \{\pm 1\} \to \mathbf{R}$, and every $p, q \ge 1$ with $q = 1 + \varrho^2(p-1)$, one has
\[
\|Q_\varrho f\|_{L^p} \le \|f\|_{L^q} . \tag{7.7}
\]
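Since (7.7) is now a statement about a single function on a two-point space, it can be verified numerically by brute force before we prove it. A minimal sketch in which all grids are arbitrary; by homogeneity (for positive $f$) it suffices to take $f(x) = 1 + sx$ with $|s| < 1$:

```python
import numpy as np

# Brute-force check of (7.7) on {±1}: for q = 1 + rho^2 (p - 1), verify
# ||Q_rho f||_{L^p} <= ||f||_{L^q} w.r.t. the uniform measure P_2.
def norm(v_plus, v_minus, p):
    return (0.5 * abs(v_plus)**p + 0.5 * abs(v_minus)**p)**(1.0 / p)

worst = -np.inf
for rho in np.linspace(0.01, 0.99, 50):
    for p in np.linspace(1.01, 10.0, 50):
        q = 1 + rho**2 * (p - 1)
        for s in np.linspace(-0.99, 0.99, 81):   # f(x) = 1 + s x
            # Q_rho f = 1 + rho s x, by the affine characterisation above
            gap = norm(1 + rho * s, 1 - rho * s, p) - norm(1 + s, 1 - s, q)
            worst = max(worst, gap)
print("largest value of ||Q_rho f||_p - ||f||_q found:", worst)  # <= 0
```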

At this stage, we introduce two essential notions. First, the entropy of a positive function $f \colon \{\pm 1\} \to \mathbf{R}$ is given by
\[
\operatorname{Ent} f = \mathbf{E}(f \log f) - \mathbf{E} f \,\log \mathbf{E} f .
\]
Second, the Dirichlet form of two functions is given by
\[
D(f, g) = \tfrac{1}{4}\bigl(f(1) - f(-1)\bigr)\bigl(g(1) - g(-1)\bigr) .
\]


The motivation for this definition is that, as an immediate consequence of the definitions (by homogeneity we restrict ourselves to the case $f(x) = 1 + sx$ and $g(x) = 1 + tx$), one has the identity
\[
\frac{d}{d\varrho}\, \mathbf{E}\bigl(g\, Q_\varrho f\bigr) = \mathbf{E}\bigl((1 + tx)\, s x\bigr) = st = \frac{1}{\varrho}\, D(g, Q_\varrho f) , \tag{7.8}
\]
for any two random variables $f$ and $g$ on $(\{\pm 1\}, \mathbf{P}_2)$. We will make use of the following two facts.

Lemma 7.6 For any positive $f \colon \{\pm 1\} \to \mathbf{R}$ and any $q \ge 1$, one has
\[
D(f, f^{q-1}) \ge \frac{4(q-1)}{q^2}\, D\bigl(f^{q/2}, f^{q/2}\bigr) .
\]

Proof. By homogeneity and symmetry, we can take $f(1) = x$ and $f(-1) = 1$ with $x \ge 1$, so that
\[
D(f^{q/2}, f^{q/2}) = \frac{1}{4}\bigl(x^{q/2} - 1\bigr)^2 = \frac{q^2}{16} \Bigl( \int_1^x y^{q/2 - 1}\, dy \Bigr)^2
\le \frac{q^2}{16} \int_1^x y^{q-2}\, dy \int_1^x dy
= \frac{q^2}{16(q-1)}\bigl(x^{q-1} - 1\bigr)(x - 1) = \frac{q^2}{4(q-1)}\, D(f, f^{q-1}) ,
\]
as claimed.

Lemma 7.7 (log-Sobolev inequality) One has $2D(f, f) \ge \operatorname{Ent}(f^2)$.

Proof. By definition $\operatorname{Ent}(af) = a \operatorname{Ent} f$, so that we can restrict ourselves to the case $f(x) = 1 + tx$, so that $D(f, f) = t^2$ and
\[
\operatorname{Ent} f^2 = \frac{1}{2}(1+t)^2 \log(1+t)^2 + \frac{1}{2}(1-t)^2 \log(1-t)^2 - (1+t^2)\log(1+t^2) .
\]
Setting $\varphi(t) = 2D(f, f) - \operatorname{Ent} f^2$, we have
\[
\varphi'(t) = 4t - (1+t)\log(1+t)^2 - (1+t) + (1-t)\log(1-t)^2 + (1-t) + 2t\log(1+t^2) + 2t
= 2(1-t)\log(1-t) - 2(1+t)\log(1+t) + 2t\log(1+t^2) + 4t ,
\]
\[
\varphi''(t) = -2\log(1-t) - 2\log(1+t) + 2\log(1+t^2) + \frac{4t^2}{1+t^2}
= 2\Bigl( \frac{2t^2}{1+t^2} + \log\Bigl(\frac{1+t^2}{1-t^2}\Bigr) \Bigr) .
\]
Since both of these terms are positive, one has $\varphi'' \ge 0$. Since furthermore $\varphi(0) = \varphi'(0) = 0$, we conclude that one has indeed $\varphi(t) \ge 0$ for $t \ge 0$.
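A quick numerical counterpart to Lemma 7.7, scanning $f(x) = 1 + tx$ on a grid (grid choices arbitrary):

```python
import numpy as np

# Check the two-point log-Sobolev inequality 2 D(f, f) >= Ent(f^2)
# for f(x) = 1 + t x on {±1} with the uniform measure.
t = np.linspace(-0.999, 0.999, 2001)
D = t**2                                   # D(f, f) = t^2
Ef2 = 1 + t**2                             # E f^2
ent = (0.5 * (1 + t)**2 * np.log((1 + t)**2)
       + 0.5 * (1 - t)**2 * np.log((1 - t)**2)
       - Ef2 * np.log(Ef2))                # Ent(f^2)
print("min of 2 D(f,f) - Ent(f^2):", (2 * D - ent).min())  # >= 0 up to rounding
```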


Our aim now is to show that if we set $p(\varrho) = 1 + \varrho^{-2}(q - 1)$, then
\[
\frac{d}{d\varrho} \|Q_\varrho f\|_{L^{p(\varrho)}} \ge 0 . \tag{7.9}
\]
Since $\|Q_1 f\|_{L^{p(1)}} = \|f\|_{L^q}$, this then implies (7.7), which in turn implies Theorem 7.1. Setting $N(\varrho) = \mathbf{E}|Q_\varrho f|^{p(\varrho)}$ and using the fact that $\frac{d}{dx} a^x = a^x \log a$, we get
\[
\frac{d}{d\varrho} N(\varrho)^{1/p(\varrho)} = -\frac{p'}{p^2}\, N(\varrho)^{1/p} \log N(\varrho) + \frac{1}{p}\, N(\varrho)^{\frac{1}{p}-1} N'(\varrho)
= \frac{N(\varrho)^{\frac{1}{p}-1}}{p} \Bigl( N'(\varrho) - \frac{p'}{p}\, N(\varrho) \log N(\varrho) \Bigr) .
\]
Assuming that $f$ is positive, we have
\[
N'(\varrho) = p(\varrho)\, \mathbf{E}\Bigl( (Q_\varrho f)^{p(\varrho)-1}\, \frac{d}{d\varrho} Q_\varrho f \Bigr) + p'(\varrho)\, \mathbf{E}\bigl( (Q_\varrho f)^{p(\varrho)} \log (Q_\varrho f) \bigr)
= \frac{p}{\varrho}\, D\bigl( (Q_\varrho f)^{p-1}, Q_\varrho f \bigr) + \frac{p'}{p}\, \mathbf{E}\bigl( (Q_\varrho f)^p \log (Q_\varrho f)^p \bigr) .
\]
Noting that $p' = -\frac{2}{\varrho}(p - 1)$ and using Lemma 7.6, we conclude that
\[
\frac{d}{d\varrho} N(\varrho)^{1/p(\varrho)} \ge \frac{N(\varrho)^{\frac{1}{p}-1}}{p\varrho} \Bigl( \frac{4(p-1)}{p}\, D\bigl( (Q_\varrho f)^{p/2}, (Q_\varrho f)^{p/2} \bigr) - \frac{2(p-1)}{p}\, \operatorname{Ent}\bigl( (Q_\varrho f)^p \bigr) \Bigr) .
\]
It thus follows from Lemma 7.7 that (7.9) holds, which implies (7.7), and therefore Theorem 7.1 holds.

8 Construction of the $\Phi^4_2$ field

We now sketch the argument given by Nelson in [Nel66], showing how the hypercontractive bounds of the previous section can be used to construct the $\Phi^4_2$ Euclidean field theory. The goal is to build a measure $\mathbf{P}$ on the space $\mathcal{D}'(T^2)$ of distributions on the two-dimensional torus which is formally given by
\[
\mathbf{P}(d\Phi) \propto \exp\Bigl( -\frac{1}{2} \int |\nabla \Phi(x)|^2\, dx - \int |\Phi(x)|^4\, dx \Bigr)\, \text{``}d\Phi\text{''} .
\]

This expression is of course completely nonsensical in many respects, not least because there is no "Lebesgue measure" in infinite dimensions. However, the first part in this description is quadratic and should therefore define a Gaussian measure. Recalling that the Gaussian measure with covariance $C$ has density $\exp(-\frac{1}{2}\langle x, C^{-1} x\rangle)$ with respect to Lebesgue measure, this suggests that we should rewrite $\mathbf{P}$ as
\[
\mathbf{P}(d\Phi) \propto \exp\Bigl( -\int |\Phi(x)|^4\, dx \Bigr)\, \mathbf{Q}(d\Phi) , \tag{8.1}
\]
where $\mathbf{Q}$ is Gaussian with covariance given by the inverse of the Laplacian. In other words, under $\mathbf{Q}$, the Fourier modes of $\Phi$ are distributed as independent Gaussian random variables (besides the reality constraint) with $\hat\Phi(k)$ having variance $1/|k|^2$. In order to simplify things, we furthermore postulate that $\hat\Phi(0) = 0$.

The measure $\mathbf{Q}$ is the law of the free field, which also plays a crucial role in the study of critical phenomena in two dimensions due to its remarkable invariance properties under conformal transformations. However, it turns out that (8.1) is unfortunately still nonsensical. Indeed, for this to make sense, one would like at the very least to have $\Phi \in L^4$ almost surely. It turns out that one does not even have $\Phi \in L^2$ since, at least formally, one has
\[
\mathbf{E}\|\Phi\|_{L^2}^2 = \sum_k \mathbf{E}|\hat\Phi(k)|^2 = \sum_{k \neq 0} \frac{1}{|k|^2} = \infty ,
\]
since we are in two dimensions.
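The logarithmic growth rate of this divergence is easy to see numerically; a minimal sketch tabulating the truncated sum over $k \in \mathbf{Z}^2$ against $2\pi \log N$ (the cutoffs are arbitrary):

```python
import numpy as np

# The truncated sum  sum_{0 < |k| <= N} |k|^{-2}  over k in Z^2 grows like
# 2*pi*log(N) plus a constant, so E||Phi||_{L^2}^2 diverges logarithmically.
for N in [10, 100, 1000]:
    k = np.arange(-N, N + 1)
    k1, k2 = np.meshgrid(k, k)
    sq = k1**2 + k2**2
    mask = (sq > 0) & (sq <= N * N)
    print(f"N={N:4d}:  sum = {(1.0 / sq[mask]).sum():8.3f},"
          f"  2 pi log N = {2 * np.pi * np.log(N):8.3f}")
```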

Denote now by $G$ the Green's function for the Laplacian. One way of defining $G$ is the following. Take a cut-off function $\chi \colon \mathbf{R}^2 \to \mathbf{R}$ which is smooth, positive, compactly supported, and such that $\chi(k) = 1$ for $|k| \le 1$, and set
\[
G(x) = \lim_{N \to \infty} G_N(x) , \qquad G_N(x) = \sum_{k \neq 0} \chi(k/N)\, |k|^{-2}\, e^{ikx} .
\]
It is a standard result that this limit exists, does not depend on the choice of $\chi$, and is such that $G(x) \sim -\frac{1}{2\pi} \log|x|$ for small values of $x$, while being smooth otherwise. Furthermore, the function $G_N$ has the property that $|G_N(x)| \lesssim \log|x|^{-1} \wedge \log N$ for all $x$. Finally, one has
\[
|G(x) - G_N(x)| \lesssim \bigl|\log N|x|\bigr| \wedge \frac{1}{N^2 |x|^2} .
\]

Note now that for every $N$, one can find fields $\Phi_N$ and $\Psi_N$ that are independent and with independent Gaussian Fourier coefficients such that
\[
\mathbf{E}|\hat\Phi_N(k)|^2 = \chi(k/N)\, |k|^{-2} , \qquad \mathbf{E}|\hat\Psi_N(k)|^2 = \bigl(1 - \chi(k/N)\bigr)\, |k|^{-2} .
\]
One then has $\Phi_N + \Psi_N \stackrel{\mathrm{law}}{=} \Phi$ with $\Phi$ a free field. We can furthermore choose $\Phi_N$ to be a function of $\Phi$ by simply setting
\[
\hat\Phi_N(k) = \sqrt{\chi(k/N)}\, \hat\Phi(k) . \tag{8.2}
\]


Furthermore, $\Psi_N$ is "small" in some sense to be made precise, and $\Phi_N$ is almost surely a smooth Gaussian field with covariance $G_N$.

Note however that $C_N^2 \stackrel{\mathrm{def}}{=} \mathbf{E}|\Phi_N(x)|^2 = G_N(0) \sim \log N$ as $N \to \infty$. The idea now is to reinterpret the quantity $\Phi^4$ appearing in (8.1) as a "Wick power", which is defined in terms of the 4th Hermite polynomial by
\[
{:}\Phi_N(x)^4{:} \stackrel{\mathrm{def}}{=} C_N^4\, H_4\bigl(\Phi_N(x)/C_N\bigr) . \tag{8.3}
\]

The point here is that, by the defining properties of the Hermite polynomials, one has $\mathbf{E}\,{:}\Phi_N(x)^4{:} = 0$. Furthermore, and even more importantly, one can easily verify from a simple calculation using Wick's formula that
\[
\mathbf{E}\bigl( {:}\Phi_N(x)^4{:}\; {:}\Phi_N(y)^4{:} \bigr) = 24\, G_N^4(x - y) .
\]
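With the probabilists' normalisation $H_4(x) = x^4 - 6x^2 + 3$ used here, (8.3) reads ${:}\Phi_N(x)^4{:} = \Phi_N(x)^4 - 6 C_N^2 \Phi_N(x)^2 + 3 C_N^4$, and both identities above are easy to confirm by Monte Carlo. A minimal sketch with ad hoc variance, covariance and sample size:

```python
import numpy as np

# Wick power :X^4: = C^4 H_4(X/C) = X^4 - 6 C^2 X^2 + 3 C^4 for X ~ N(0, C^2).
# Check E :X^4: = 0 and E(:X^4: :Y^4:) = 24 Cov(X, Y)^4 by Monte Carlo.
rng = np.random.default_rng(3)
var, cov, m = 1.0, 0.8, 5_000_000             # requires 0 <= cov <= var

def wick4(x, c2):                             # c2 is the variance of x
    return x**4 - 6 * c2 * x**2 + 3 * c2**2

g = rng.standard_normal(m)                    # shared Gaussian gives covariance cov
X = np.sqrt(cov) * g + np.sqrt(var - cov) * rng.standard_normal(m)
Y = np.sqrt(cov) * g + np.sqrt(var - cov) * rng.standard_normal(m)

print("E :X^4:        ~", wick4(X, var).mean(), "  target: 0")
print("E :X^4: :Y^4:  ~", (wick4(X, var) * wick4(Y, var)).mean(),
      "  target:", 24 * cov**4)
```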

In particular, setting
\[
X_N \stackrel{\mathrm{def}}{=} \int_{T^2} {:}\Phi_N(x)^4{:}\, dx ,
\]
one has
\[
\mathbf{E} X_N^2 = 24 \int_{T^2} \int_{T^2} G_N^4(x - y)\, dx\, dy ,
\]
which is uniformly bounded as $N \to \infty$. Furthermore, by (8.3), $X_N$ is an element of the fourth homogeneous Wiener chaos $\mathcal{H}_4$ on the Gaussian probability space generated by $\Phi$. It is now a simple exercise to show that there exists a random variable $X$ belonging to $\mathcal{H}_4$ such that $\lim_{N \to \infty} X_N = X$ in $L^2$, and therefore in every $L^p$ by (7.2). At this stage, we would like to define our measure $\mathbf{P}$ by
\[
\mathbf{P}(d\Phi) \propto \exp(-X(\Phi))\, \mathbf{Q}(d\Phi) , \tag{8.4}
\]
with $X$ given by the finite random variable we just constructed.

The problem with this is that although $|\Phi_N(x)|^4$ is obviously bounded from below, uniformly in $N$, ${:}\Phi_N(y)^4{:}$ is not! Indeed, the best one can do is
\[
{:}\Phi_N(y)^4{:} = \bigl( \Phi_N(y)^2 - 3 C_N^2 \bigr)^2 - 6 C_N^4 \ge -6 C_N^4 \sim -c (\log N)^2 ,
\]
for some constant $c > 0$ (actually $c = \frac{3}{2\pi^2}$), so that
\[
X_N \ge -c (\log N)^2 . \tag{8.5}
\]

In order to make sense of (8.4) however, we would like to show that the random variable $\exp(-X)$ is integrable. This is where the optimal bound (7.2) plays a crucial role. The idea of Nelson is to exploit the decomposition $\Phi_N + \Psi_N \stackrel{\mathrm{law}}{=} \Phi$ together with Lemma 2.2 to write
\[
X \stackrel{\mathrm{law}}{=} X_N + Y_N ,
\]


\[
Y_N \stackrel{\mathrm{def}}{=} 4 \int_{T^2} {:}\Phi_N(x)^3{:}\, {:}\Psi_N(x){:}\, dx + 6 \int_{T^2} {:}\Phi_N(x)^2{:}\, {:}\Psi_N(x)^2{:}\, dx + 4 \int_{T^2} {:}\Phi_N(x){:}\, {:}\Psi_N(x)^3{:}\, dx + \int_{T^2} {:}\Psi_N(x)^4{:}\, dx \stackrel{\mathrm{def}}{=} Y_N^{(1)} + \ldots + Y_N^{(4)} .
\]

Note now that, setting $\bar G_N = G - G_N$, one has
\[
\mathbf{E}|Y_N^{(1)}|^2 = 96 \int_{T^2} \int_{T^2} G_N^3(x - y)\, \bar G_N(x - y)\, dx\, dy \lesssim \frac{|\log N|^4}{N^2} .
\]

Analogous bounds can be obtained for the other $Y_N^{(i)}$. Combining this with (7.2) and the fact that $Y_N$ belongs to the chaos of order $4$, we obtain the existence of finite constants $c$ and $C$ such that
\[
\mathbf{E}|Y_N|^{2p} \le \frac{c^p p^{4p} |\log N|^{4p}}{N^{2p}} \le \frac{C p^{4p}}{N^p} ,
\]
uniformly over $N \ge C$ and $p \ge 1$. In the sequel, the values of the constants $c$ and $C$ are allowed to change from expression to expression.

We conclude that
\[
\mathbf{P}(X < -K) = \mathbf{P}(X_N + Y_N < -K) \le \mathbf{P}\bigl( Y_N \le c(\log N)^2 - K \bigr) \le \mathbf{P}\bigl( |Y_N| \ge K - c(\log N)^2 \bigr)
\le \frac{\mathbf{E}|Y_N|^{2p}}{\bigl(K - c(\log N)^2\bigr)^{2p}} \le \frac{C p^{4p}}{N^p \bigl(K - c(\log N)^2\bigr)^{2p}} ,
\]

provided that $c(\log N)^2 \le K$ and $N \ge C$. We now exploit our freedom to choose both $N$ and $p$. First, we choose $N$ such that $K - c(\log N)^2 \in [1, 2]$ (this is always possible if $K$ is large enough), so that
\[
\mathbf{P}(X < -K) \le \frac{C p^{4p}}{N^p} \le C \bigl( p\, e^{-c\sqrt{K}} \bigr)^{4p} .
\]

We now choose $p = e^{\bar c \sqrt{K}}$ for some $\bar c < c$, so that eventually
\[
\mathbf{P}(X < -K) \le C \exp\bigl( -c\, e^{\bar c \sqrt{K}} \bigr) .
\]

We can rewrite this as
\[
\mathbf{P}\bigl( \exp(-X) > M \bigr) \le C \exp\bigl( -c\, e^{\bar c \sqrt{\log M}} \bigr) .
\]
In particular, the right hand side of this expression is smaller than any inverse power of $M$, so that $\exp(-X)$ is indeed integrable as claimed.


References

[BH07] F. Baudoin and M. Hairer. A version of Hörmander's theorem for the fractional Brownian motion. Probab. Theory Related Fields 139, no. 3-4, (2007), 373–395.

[Bis81a] J.-M. Bismut. Martingales, the Malliavin calculus and Hörmander's theorem. In Stochastic integrals (Proc. Sympos., Univ. Durham, Durham, 1980), vol. 851 of Lecture Notes in Math., 85–109. Springer, Berlin, 1981.

[Bis81b] J.-M. Bismut. Martingales, the Malliavin calculus and hypoellipticity under general Hörmander's conditions. Z. Wahrsch. Verw. Gebiete 56, no. 4, (1981), 469–505.

[Bog98] V. I. Bogachev. Gaussian measures, vol. 62 of Mathematical Surveys and Monographs. American Mathematical Society, Providence, RI, 1998. doi:10.1090/surv/062.

[BT05] F. Baudoin and J. Teichmann. Hypoellipticity in infinite dimensions and an application in interest rate theory. Ann. Appl. Probab. 15, no. 3, (2005), 1765–1777.

[Cas09] T. Cass. Smooth densities for solutions to stochastic differential equations with jumps. Stochastic Process. Appl. 119, no. 5, (2009), 1416–1435.

[CF10] T. Cass and P. Friz. Densities for rough differential equations under Hörmander's condition. Ann. of Math. (2) 171, no. 3, (2010), 2115–2141.

[CHLT15] T. Cass, M. Hairer, C. Litterer, and S. Tindel. Smoothness of the density for solutions to Gaussian rough differential equations. Ann. Probab. 43, no. 1, (2015), 188–239. doi:10.1214/13-AOP896.

[Gro75] L. Gross. Logarithmic Sobolev inequalities. Amer. J. Math. 97, no. 4, (1975), 1061–1083.

[Hai11] M. Hairer. On Malliavin's proof of Hörmander's theorem. Bull. Sci. Math. 135, no. 6-7, (2011), 650–666. doi:10.1016/j.bulsci.2011.07.007.

[HM06] M. Hairer and J. C. Mattingly. Ergodicity of the 2D Navier-Stokes equations with degenerate stochastic forcing. Ann. of Math. (2) 164, no. 3, (2006), 993–1032.

[HM11] M. Hairer and J. C. Mattingly. A theory of hypoellipticity and unique ergodicity for semilinear stochastic PDEs. Electron. J. Probab. 16, (2011), 658–738.

[Hör67] L. Hörmander. Hypoelliptic second order differential equations. Acta Math. 119, (1967), 147–171.

[HP11] M. Hairer and N. Pillai. Ergodicity of hypoelliptic SDEs driven by fractional Brownian motion. Ann. IHP Ser. B (2011). To appear.


[IK06] Y. Ishikawa and H. Kunita. Malliavin calculus on the Wiener-Poisson space and its application to canonical SDE with jumps. Stochastic Process. Appl. 116, no. 12, (2006), 1743–1769.

[KS84] S. Kusuoka and D. Stroock. Applications of the Malliavin calculus. I. In Stochastic analysis (Katata/Kyoto, 1982), vol. 32 of North-Holland Math. Library, 271–306. North-Holland, Amsterdam, 1984.

[KS85] S. Kusuoka and D. Stroock. Applications of the Malliavin calculus. II. J. Fac. Sci. Univ. Tokyo Sect. IA Math. 32, no. 1, (1985), 1–76.

[KS87] S. Kusuoka and D. Stroock. Applications of the Malliavin calculus. III. J. Fac. Sci. Univ. Tokyo Sect. IA Math. 34, no. 2, (1987), 391–442.

[Law77] H. B. Lawson, Jr. The quantitative theory of foliations. American Mathematical Society, Providence, R.I., 1977. Expository lectures from the CBMS Regional Conference held at Washington University, St. Louis, Mo., January 6–10, 1975, Conference Board of the Mathematical Sciences Regional Conference Series in Mathematics, No. 27.

[Mal78] P. Malliavin. Stochastic calculus of variation and hypoelliptic operators. In Proceedings of the International Symposium on Stochastic Differential Equations (Res. Inst. Math. Sci., Kyoto Univ., Kyoto, 1976), 195–263. Wiley, New York-Chichester-Brisbane, 1978.

[Mal97] P. Malliavin. Stochastic analysis, vol. 313 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, 1997. doi:10.1007/978-3-642-15074-6.

[MP06] J. C. Mattingly and É. Pardoux. Malliavin calculus for the stochastic 2D Navier-Stokes equation. Comm. Pure Appl. Math. 59, no. 12, (2006), 1742–1790.

[Nel66] E. Nelson. A quartic interaction in two dimensions. In Mathematical Theory of Elementary Particles (Proc. Conf., Dedham, Mass., 1965), 69–73. M.I.T. Press, Cambridge, Mass., 1966.

[Nel73] E. Nelson. Construction of quantum fields from Markoff fields. J. Functional Analysis 12, (1973), 97–112.

[Nor86] J. Norris. Simplified Malliavin calculus. In Séminaire de Probabilités, XX, 1984/85, vol. 1204 of Lecture Notes in Math., 101–130. Springer, Berlin, 1986.

[Nua06] D. Nualart. The Malliavin calculus and related topics. Probability and its Applications (New York). Springer-Verlag, Berlin, second ed., 2006. doi:10.1007/3-540-28329-3.

[Oco88] D. Ocone. Stochastic calculus of variations for stochastic partial differential equations. J. Funct. Anal. 79, no. 2, (1988), 288–331.

[RS72] M. Reed and B. Simon. Methods of modern mathematical physics. I. Functional analysis. Academic Press, New York-London, 1972.


[RS75] M. Reed and B. Simon. Methods of modern mathematical physics. II. Fourier analysis, self-adjointness. Academic Press [Harcourt Brace Jovanovich, Publishers], New York-London, 1975.

[RY99] D. Revuz and M. Yor. Continuous martingales and Brownian motion, vol. 293 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, third ed., 1999.

[Seg56] I. E. Segal. Tensor algebras over Hilbert spaces. I. Trans. Amer. Math. Soc. 81, (1956), 106–134. doi:10.1090/S0002-9947-1956-0076317-8.

[Tak02] A. Takeuchi. The Malliavin calculus for SDE with jumps and the partially hypoelliptic problem. Osaka J. Math. 39, no. 3, (2002), 523–559.

[Tak10] A. Takeuchi. Bismut–Elworthy–Li-type formulae for stochastic differential equations with jumps. Journal of Theoretical Probability 23, (2010), 576–604.

[Wie38] N. Wiener. The Homogeneous Chaos. Amer. J. Math. 60, no. 4, (1938), 897–936. doi:10.2307/2371268.