
Acta Numerica (2018), pp. 001– © Cambridge University Press, 2018

doi:10.1017/S09624929 Printed in the United Kingdom

Multilevel Monte Carlo methods

Michael B. Giles

Mathematical Institute, University of Oxford

[email protected]

Monte Carlo methods are a very general and useful approach for the estimation of expectations arising from stochastic simulation. However, they can be computationally expensive, particularly when the cost of generating individual stochastic samples is very high, as in the case of stochastic PDEs. Multilevel Monte Carlo is a recently developed approach which greatly reduces the computational cost by performing most simulations with low accuracy at a correspondingly low cost, with relatively few simulations being performed at high accuracy and a high cost.

In this article, we review the ideas behind the multilevel Monte Carlo method and various recent generalisations and extensions, and discuss a number of applications which illustrate the flexibility and generality of the approach and the challenges in developing more efficient implementations with a faster rate of convergence of the multilevel correction variance.

CONTENTS

1 Introduction
2 MLMC theory and extensions
3 MLMC implementation
4 Introduction to applications
5 SDEs
6 Jump diffusion and Lévy processes
7 PDEs and SPDEs
8 Continuous-time Markov chains
9 Nested simulation
10 Other applications
11 Conclusions
References


1. Introduction

1.1. Stochastic modelling and Monte Carlo simulation

Stochastic modelling and simulation is a growing area in applied mathematics and scientific computing. One large application area is in computational finance, in the pricing of financial derivatives and quantitative risk management (Glasserman 2004, Asmussen and Glynn 2007). Another is Uncertainty Quantification in engineering and science, which has led to a new SIAM journal and associated annual conferences. Stochastic modelling is also important in diverse areas such as biochemical reactions (Anderson and Higham 2012) and plasma physics (Rosin, Ricketson, Dimits, Caflisch and Cohen 2014).

When the dimensionality of the uncertainty (or the number of uncertain input parameters) is low, it can be appropriate to model the uncertainty using the Fokker-Planck PDE and use stochastic Galerkin, stochastic collocation or polynomial chaos methods (Xiu and Karniadakis 2002, Babuska, Tempone and Zouraris 2004, Babuska, Nobile and Tempone 2010, Gunzburger, Webster and Zhang 2014). When the level of uncertainty is low, and its effect is largely linear, then moment methods can be an efficient and accurate way in which to quantify the effects of uncertainty (Putko, Taylor, Newman and Green 2002). However, when the uncertainty is high-dimensional and strongly nonlinear, Monte Carlo simulation remains the preferred approach.

At its core, Monte Carlo simulation is extremely simple. To estimate E[P], a simple Monte Carlo estimate is just an average of values P(ω) for N independent samples ω coming from a given probability space (Ω, F, P):

\[
  N^{-1} \sum_{n=1}^{N} P(\omega^{(n)}).
\]

The variance of this estimate is N^{-1} V[P], so the r.m.s. error is O(N^{-1/2}), and an accuracy of ε requires N = O(ε^{-2}) samples. This is the weakness of Monte Carlo simulation; its computational cost can be very high, particularly when each sample P(ω) might require the approximate solution of a PDE, or a computation with many timesteps.

One approach to addressing this high cost is the use of Quasi-Monte Carlo (QMC) methods, in which the samples are not chosen randomly and independently, but are instead selected very carefully to reduce the error. In the best cases, the error may be O(N^{-1}), up to logarithmic terms, giving a very substantial reduction in the number of samples required for a given accuracy. The QMC approach was surveyed in Acta Numerica 2013 (Dick, Kuo and Sloan 2013). In this article, we cover a different approach to improving the computational efficiency, the multilevel Monte Carlo (MLMC) method, and we include a brief discussion of the combination of the two approaches.


1.2. Control variates and two-level MLMC

One of the classic approaches to Monte Carlo variance reduction is through the use of a control variate (Glasserman 2004). Suppose we wish to estimate E[f], and there is a control variate g which is well correlated to f and has a known expectation E[g]. In that case, we can use the following unbiased estimator for E[f] based on N independent samples ω^{(n)}:

\[
  N^{-1} \sum_{n=1}^{N} \Big\{ f(\omega^{(n)}) - \lambda \big( g(\omega^{(n)}) - E[g] \big) \Big\}.
\]

The optimal value for λ is ρ √(V[f]/V[g]), where ρ is the correlation between f and g, and the variance of the control variate estimator is reduced by a factor 1−ρ² compared to the standard estimator.

A two-level version of MLMC is very similar. If we want to estimate E[P1] but it is much cheaper to simulate P0 which approximates P1, then since

E[P1] = E[P0] + E[P1 − P0]

we can use the unbiased two-level estimator

\[
  N_0^{-1} \sum_{n=1}^{N_0} P_0^{(n)} \;+\; N_1^{-1} \sum_{n=1}^{N_1} \big( P_1^{(n)} - P_0^{(n)} \big).
\]

Here P_1^{(n)} − P_0^{(n)} represents the difference between P1 and P0 for the same underlying stochastic sample ω^{(n)}, so that P_1^{(n)} − P_0^{(n)} is small and has a small variance; the precise construction depends on the application and various examples will be shown later. The two key differences from the control variate approach are that the value of E[P0] is not known, so it has to be estimated, and we use λ = 1.

If we define C0 and C1 to be the cost of computing a single sample of P0 and P1−P0, respectively, then the total cost is N0 C0 + N1 C1, and if V0 and V1 are the variances of P0 and P1−P0, then the overall variance is N_0^{-1} V0 + N_1^{-1} V1, assuming that Σ_{n=1}^{N0} P_0^{(n)} and Σ_{n=1}^{N1} (P_1^{(n)} − P_0^{(n)}) use independent samples.

Hence, treating the integers N0, N1 as real variables and performing a constrained minimisation using a Lagrange multiplier, the variance is minimised for a fixed cost by choosing N1/N0 = √(V1/C1) / √(V0/C0).
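As a concrete illustration, here is a minimal MATLAB sketch of the two-level estimator with this sample allocation. The routines coarse_fn (returning independent samples of P0) and diff_fn (returning coupled samples of P1−P0), the pilot sizes and the costs C0, C1 are hypothetical placeholders, not part of the original presentation.

% Minimal two-level sketch: coarse_fn(N) returns N samples of P0,
% diff_fn(N) returns N coupled samples of P1-P0; costs C0, C1 assumed known.
C0 = 1;   C1 = 10;
V0 = var(coarse_fn(1000));               % pilot estimate of V[P0]
V1 = var(diff_fn(1000));                 % pilot estimate of V[P1-P0]
N0 = 1e5;                                % chosen coarse-level sample size
N1 = ceil(N0 * sqrt((V1/C1)/(V0/C0)));   % optimal ratio N1/N0 derived above
Y  = mean(coarse_fn(N0)) + mean(diff_fn(N1));   % unbiased estimate of E[P1]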

1.3. Multilevel Monte Carlo

The multilevel generalisation is quite natural: given a sequence P0, P1, . . . , PL−1 which approximates PL with increasing accuracy, but also increasing cost, we have the simple identity

\[
  E[P_L] = E[P_0] + \sum_{\ell=1}^{L} E[P_\ell - P_{\ell-1}],
\]

and therefore we can use the following unbiased estimator for E[PL],

\[
  N_0^{-1} \sum_{n=1}^{N_0} P_0^{(0,n)} \;+\; \sum_{\ell=1}^{L} \left\{ N_\ell^{-1} \sum_{n=1}^{N_\ell} \big( P_\ell^{(\ell,n)} - P_{\ell-1}^{(\ell,n)} \big) \right\},
\]

with the inclusion of the level ℓ in the superscript (ℓ, n) indicating that independent samples are used at each level of correction.

If we define C0, V0 to be the cost and variance of one sample of P0, and Cℓ, Vℓ to be the cost and variance of one sample of Pℓ−Pℓ−1, then the overall cost and variance of the multilevel estimator are \sum_{\ell=0}^{L} N_\ell C_\ell and \sum_{\ell=0}^{L} N_\ell^{-1} V_\ell, respectively.

For a fixed variance, the cost is minimised by choosing Nℓ to minimise

\[
  \sum_{\ell=0}^{L} \big( N_\ell C_\ell + \mu^2 N_\ell^{-1} V_\ell \big)
\]

for some value of the Lagrange multiplier µ². This gives Nℓ = µ √(Vℓ/Cℓ). To achieve an overall variance of ε² then requires that µ = ε^{-2} \sum_{\ell=0}^{L} \sqrt{V_\ell C_\ell}, and the total computational cost is therefore

\[
  C = \varepsilon^{-2} \left( \sum_{\ell=0}^{L} \sqrt{V_\ell C_\ell} \right)^{2}. \qquad (1.1)
\]

It is important to note whether the product Vℓ Cℓ increases or decreases with ℓ, i.e. whether or not the cost increases with level faster than the variance decreases. If the product increases with level, so that the dominant contribution to the cost comes from VL CL, then we have C ≈ ε^{-2} VL CL, whereas if it decreases and the dominant contribution comes from V0 C0, then C ≈ ε^{-2} V0 C0. This contrasts with the standard MC cost of approximately ε^{-2} V0 CL, assuming that the cost of computing PL is similar to the cost of computing PL−PL−1, and that V[PL] ≈ V[P0].

This shows that in the first case the MLMC cost is reduced by a factor VL/V0, corresponding to the ratio of the variances V[PL−PL−1] and V[PL], whereas in the second case it is reduced by a factor C0/CL, the ratio of the costs of computing P0 and PL−PL−1. If the product Vℓ Cℓ does not vary with level, then the total cost is ε^{-2} L² V0 C0 = ε^{-2} L² VL CL.
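For completeness, the Lagrange multiplier calculation behind (1.1), written out step by step:

\[
  \frac{\partial}{\partial N_\ell} \big( N_\ell C_\ell + \mu^2 N_\ell^{-1} V_\ell \big)
  = C_\ell - \mu^2 N_\ell^{-2} V_\ell = 0
  \quad\Longrightarrow\quad N_\ell = \mu \sqrt{V_\ell / C_\ell},
\]
\[
  \sum_{\ell=0}^{L} N_\ell^{-1} V_\ell = \mu^{-1} \sum_{\ell=0}^{L} \sqrt{V_\ell C_\ell} = \varepsilon^2
  \quad\Longrightarrow\quad \mu = \varepsilon^{-2} \sum_{\ell=0}^{L} \sqrt{V_\ell C_\ell},
\]
\[
  C = \sum_{\ell=0}^{L} N_\ell C_\ell = \mu \sum_{\ell=0}^{L} \sqrt{V_\ell C_\ell}
    = \varepsilon^{-2} \left( \sum_{\ell=0}^{L} \sqrt{V_\ell C_\ell} \right)^{2}.
\]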


1.4. Origins of MLMC method

The first work on multilevel Monte Carlo methods was by Heinrich for parametric integration, the evaluation of functionals arising from the solution of integral equations, and weakly singular integral operators (Heinrich 1998, Heinrich and Sindambiwe 1999, Heinrich 2000, Heinrich 2001, Heinrich 2006). Parametric integration concerns the estimation of E[f(x, λ)], where x is a finite-dimensional random variable and λ is a parameter. In the simplest case in which λ is a scalar real variable in the range [0, 1], having estimated the value of E[f(x, 0)] and E[f(x, 1)], one can use ½ (f(x, 0) + f(x, 1)) as a control variate when estimating the value of E[f(x, ½)]. Heinrich applied this idea recursively on a geometric sequence of levels, leading to a complexity analysis which is very similar to that which will be outlined later.

Although not so clearly related, there are also papers by Brandt et al. (Brandt, Galun and Ron 1994, Brandt and Ilyin 2003) which combine Monte Carlo techniques with multigrid ideas in determining thermodynamic limits in statistical physics applications.

In 2005, Kebaier (2005) developed a two-level approach for path simulation which is very similar to the author's approach (Giles 2008b, Giles 2008a), which was inspired by the multigrid ideas of Brandt and others for the iterative solution of PDE approximations. The differences are that Kebaier used only two levels, and used a general multiplicative factor as in the standard control variate approach. There was also similar independent work at the same time by Li (2007) and Speight (2009).

1.5. Outline of article

Section 2 presents the main theoretical results which underpin MLMC methods, and a number of different generalisations and extensions, many of which have been developed very recently.

Section 3 presents in some detail a MATLAB implementation of MLMC. This is used to generate the numerical results which are presented later, and the full MATLAB code is available online (Giles 2014).

Sections 4-10 cover a wide range of applications, demonstrating the flexibility and generality of the MLMC approach, documenting progress in the numerical analysis of the methods, and illustrating techniques which can be used to improve the variance of the MLMC estimators. Particular attention is paid to the difficulties caused by discontinuities in output functionals, and a variety of remedies which can be employed.

Finally, Section 11 concludes with some thoughts about directions for future research.

Readers who are new to the subject may wish to start with Section 2.1 and then proceed to the application sections most relevant to their interests.


2. MLMC theory and extensions

2.1. Geometric MLMC

In the Introduction, we considered the case of a general multilevel method in which the output PL on the finest level corresponds to the quantity of interest. However, in many infinite-dimensional applications, such as those which involve the simulation of SDEs and SPDEs, the output Pℓ on level ℓ is an approximation to a random variable P which cannot be simulated exactly. If Y is an approximation to E[P], then a standard piece of theory gives the mean square error (MSE) as

\[
  {\rm MSE} \equiv E\big[ (Y - E[P])^2 \big] = V[Y] + \big( E[Y] - E[P] \big)^2. \qquad (2.1)
\]

If Y is now the multilevel estimator

\[
  Y = \sum_{\ell=0}^{L} Y_\ell, \qquad
  Y_\ell = N_\ell^{-1} \sum_{n=1}^{N_\ell} \big( P_\ell^{(\ell,n)} - P_{\ell-1}^{(\ell,n)} \big), \qquad (2.2)
\]

with P−1 ≡ 0, then

\[
  E[Y] = E[P_L], \qquad
  V[Y] = \sum_{\ell=0}^{L} N_\ell^{-1} V_\ell, \qquad
  V_\ell \equiv V[P_\ell - P_{\ell-1}]. \qquad (2.3)
\]

To ensure that the MSE is less than ε², it is sufficient to ensure that V[Y] and (E[PL−P])² are both less than ½ ε². Combining this idea with a geometric sequence of levels, in which the cost increases exponentially with level while both the weak error E[PL−P] and the multilevel correction variance Vℓ decrease exponentially, leads to the following theorem:

Theorem 1. Let P denote a random variable, and let Pℓ denote the corresponding level ℓ numerical approximation.

If there exist independent estimators Yℓ based on Nℓ Monte Carlo samples, each with expected cost Cℓ and variance Vℓ, and positive constants α, β, γ, c1, c2, c3 such that α ≥ ½ min(β, γ) and

i)   |E[Pℓ − P]| ≤ c1 2^{−αℓ},

ii)  E[Yℓ] = E[P0] for ℓ = 0, and E[Yℓ] = E[Pℓ − Pℓ−1] for ℓ > 0,

iii) Vℓ ≤ c2 2^{−βℓ},

iv)  Cℓ ≤ c3 2^{γℓ},

then there exists a positive constant c4 such that for any ε < e^{−1} there are values L and Nℓ for which the multilevel estimator

\[
  Y = \sum_{\ell=0}^{L} Y_\ell
\]

has a mean-square error with bound

\[
  {\rm MSE} \equiv E\big[ (Y - E[P])^2 \big] < \varepsilon^2,
\]

with a computational complexity C with bound

\[
  E[C] \le \begin{cases}
    c_4\, \varepsilon^{-2},                        & \beta > \gamma, \\
    c_4\, \varepsilon^{-2} (\log \varepsilon)^2,   & \beta = \gamma, \\
    c_4\, \varepsilon^{-2-(\gamma-\beta)/\alpha},  & \beta < \gamma.
  \end{cases}
\]

The statement of the theorem is a slight generalisation of the original theorem in (Giles 2008b). It corresponds to the theorem and proof in (Cliffe, Giles, Scheichl and Teckentrup 2011), except for the minor change to expected costs to allow for applications in which the simulation cost of individual samples is itself random. Note that if condition iii) is tightened slightly to be a bound on E[(Pℓ−Pℓ−1)²], which is usually the quantity which is bounded in numerical analysis, then it would follow immediately that α ≥ ½ β.

The essence of the proof is very straightforward. If we have Vℓ = O(2^{−βℓ}) and Cℓ = O(2^{γℓ}), then the analysis in Section 1.3 shows that the optimal number of samples Nℓ on level ℓ is proportional to 2^{−(β+γ)ℓ/2}, and therefore the cost on level ℓ is proportional to 2^{(γ−β)ℓ/2}. The result then follows from the requirement that L is chosen so that (E[Y]−E[P])² < ½ ε², and the constant of proportionality for Nℓ is chosen so that V[Y] < ½ ε². The proof also has to handle the fact that Nℓ must be an integer, so the optimal value is rounded up to the nearest integer.

The result of the theorem merits some discussion. In the case β > γ, the dominant computational cost is on the coarsest levels, where Cℓ = O(1) and O(ε^{−2}) samples are required to achieve the desired accuracy. This is the standard result for a Monte Carlo approach using i.i.d. samples; to do better would require an alternative approach such as the use of Latin hypercube sampling or quasi-Monte Carlo methods.

In the case β < γ, the dominant computational cost is on the finest levels. Because of condition i), we have 2^{−αL} = O(ε), and hence CL = O(ε^{−γ/α}). If β = 2α, which is usually the best that can be achieved since typically V[Pℓ−Pℓ−1] is similar in magnitude to E[(Pℓ−Pℓ−1)²], which is greater than (E[Pℓ−Pℓ−1])², then the total cost is O(CL), corresponding to O(1) samples on the finest level, which is the best that can be achieved.

The dividing case β = γ is the one for which both the computational effort, and the contributions to the overall variance, are spread approximately evenly across all of the levels; the (log ε)² term corresponds to the L² factor in the corresponding discussion at the end of Section 1.3.


One comment on the Theorem is that it assumes lots of properties, and then from these determines relatively easily some conclusions for the efficiency of the MLMC approach. In real applications, the tough challenge is in proving that the assumptions are valid, and in particular determining the values of the parameters α, β, γ. Furthermore, the Theorem assumes knowledge of the constants c1, c2, c3. In practice, c1 and c2 are almost never known, and instead have to be estimated based on empirical estimates of the weak error and the multilevel correction variance.

Collier, Haji-Ali, Nobile, von Schwerin and Tempone (2014) have developed a modified version of the Theorem. Instead of bounding the Mean Square Error, they prefer to use the Central Limit Theorem to construct a confidence interval which bounds E[P] with a user-prescribed confidence. This exploits the fact that the multilevel correction Yℓ on each level is asymptotically Normally distributed, and therefore so is Y.

Haji-Ali, Nobile, von Schwerin and Tempone (2014b) address two other important points. The first is whether a geometric sequence of levels is the best choice. Their analysis proves that under certain conditions there is negligible benefit in using a non-geometric sequence, but later in Section 2.6 we will discuss other applications in which a non-geometric sequence is appropriate. The second point is that the equal split between the error due to E[Pℓ−Pℓ−1] and the error due to the variance V[Y] is definitely not optimal. When β > γ, the dominant computational effort is on the coarsest levels; therefore L can be increased significantly with negligible cost, and so it makes sense to allocate most of the error to the variance term, which means that fewer samples will be required. This can give up to a factor 2× improvement in computational cost compared to the equal split used in (Giles 2008b, Giles 2008a).

Equation (2.2) gives the natural choice for the multilevel correction estimator Yℓ. However, the multilevel theorem allows for the use of other estimators, provided they satisfy the restriction of condition ii) which ensures that E[Y] = E[PL]. Several examples of this will be given later in this article. Some use slightly different numerical approximations for the coarse and fine paths in SDE simulations, resulting in the estimator

\[
  Y_\ell = N_\ell^{-1} \sum_{n=1}^{N_\ell} \big( P_\ell^{f}(\omega^{(n)}) - P_{\ell-1}^{c}(\omega^{(n)}) \big).
\]

Provided we maintain the identity

\[
  E[P_\ell^{f}] = E[P_\ell^{c}] \qquad (2.4)
\]

so that the expectation on level ℓ is the same for the two approximations, then condition ii) is satisfied and no additional bias (other than the bias due to the approximation on the finest level) is introduced into the multilevel estimator.

Some others define an antithetic ω_a^{(n)} with the same distribution as ω^{(n)}, and then use the multilevel estimator

\[
  Y_\ell = N_\ell^{-1} \sum_{n=1}^{N_\ell} \Big\{ \tfrac{1}{2}\big( P_\ell(\omega^{(n)}) + P_\ell(\omega_a^{(n)}) \big) - P_{\ell-1}(\omega^{(n)}) \Big\}.
\]

Since E[Pℓ(ω_a^{(n)})] = E[Pℓ(ω^{(n)})], then again condition ii) is satisfied.

In each case, the objective in constructing a more complex estimator is to achieve a greatly reduced variance V[Yℓ] so that fewer samples are required.

2.2. Randomised MLMC for unbiased estimation

A very interesting extension has been introduced in (Rhee and Glynn 2012, Rhee and Glynn 2013). Rather than choosing the finest level of simulation L based on the desired accuracy, and then using the optimal number of samples on each level based on an estimate of the variance, their "single term" estimator instead uses N samples in total, and for each sample they perform a simulation on level ℓ with probability pℓ.

The estimator is

\[
  Y = \frac{1}{N} \sum_{n=1}^{N} \frac{1}{p_{\ell^{(n)}}} \Big( P^{(n)}_{\ell^{(n)}} - P^{(n)}_{\ell^{(n)}-1} \Big),
\]

with the level ℓ^{(n)} for each sample being selected randomly with the relevant probability. Alternatively, their estimator can be expressed as

\[
  Y = \sum_{\ell=0}^{\infty} \left( \frac{1}{p_\ell N} \sum_{n=1}^{N_\ell} \big( P_\ell^{(n)} - P_{\ell-1}^{(n)} \big) \right),
\]

where Nℓ, the number of samples from level ℓ, is a random variable with

\[
  \sum_{\ell=0}^{\infty} N_\ell = N, \qquad E[N_\ell] = p_\ell N,
\]

and both outer summations can be trivially truncated at the (random) level L beyond which Nℓ = 0. Note that in this form it is very similar in appearance to the standard MLMC estimator, which is

\[
  Y = \sum_{\ell=0}^{L} \left( \frac{1}{N_\ell} \sum_{n=1}^{N_\ell} \big( P_\ell^{(n)} - P_{\ell-1}^{(n)} \big) \right).
\]


The beauty of their estimator is that it is naturally unbiased, since

\[
  E[Y] = E\!\left[ \frac{1}{p_{\ell'}} \big( P_{\ell'} - P_{\ell'-1} \big) \right]
       = \sum_{\ell=0}^{\infty} p_\ell\, E\!\left[ \frac{1}{p_{\ell'}} \big( P_{\ell'} - P_{\ell'-1} \big) \,\Big|\, \ell' = \ell \right]
       = \sum_{\ell=0}^{\infty} E\big[ P_\ell - P_{\ell-1} \big]
       = E[P].
\]

If we define Eℓ = E[Pℓ − Pℓ−1], then the variance of the estimator is

\[
  V[Y] = N^{-1} \left\{ \sum_{\ell=0}^{\infty} \frac{1}{p_\ell} \big( V_\ell + E_\ell^2 \big) - \Big( \sum_{\ell=0}^{\infty} E_\ell \Big)^{\!2} \right\}
  \;\ge\; N^{-1} \sum_{\ell=0}^{\infty} \frac{1}{p_\ell} V_\ell,
\]

due to Jensen's inequality.

The choice of probabilities pℓ is crucial. For both the variance and the expected cost to be finite, it is necessary that

\[
  \sum_{\ell=0}^{\infty} \frac{1}{p_\ell} V_\ell < \infty, \qquad \sum_{\ell=0}^{\infty} p_\ell C_\ell < \infty.
\]

Under the conditions of the MLMC Theorem, this is possible when β > γ by choosing pℓ ∝ 2^{−(γ+β)ℓ/2}, so that

\[
  \frac{1}{p_\ell} V_\ell \propto 2^{-(\beta-\gamma)\ell/2}, \qquad p_\ell C_\ell \propto 2^{-(\beta-\gamma)\ell/2}.
\]

It is not possible when β ≤ γ, and for these cases the estimators constructed by Rhee and Glynn (2013) have infinite expected cost.

Provided β > γ, the optimal choice for pℓ is

\[
  p_\ell = \sqrt{V_\ell / C_\ell} \left( \sum_{\ell'=0}^{\infty} \sqrt{V_{\ell'} / C_{\ell'}} \right)^{-1}.
\]

If E_ℓ² ≪ Vℓ, then the condition that the variance of the estimator is approximately equal to ε² gives

\[
  N \approx \varepsilon^{-2} \sum_{\ell=0}^{\infty} V_\ell / p_\ell
    \approx \varepsilon^{-2} \left( \sum_{\ell=0}^{\infty} \sqrt{V_\ell C_\ell} \right) \left( \sum_{\ell'=0}^{\infty} \sqrt{V_{\ell'} / C_{\ell'}} \right),
\]

and therefore the total cost is

\[
  C = N \sum_{\ell=0}^{\infty} p_\ell C_\ell \approx \varepsilon^{-2} \left( \sum_{\ell=0}^{\infty} \sqrt{V_\ell C_\ell} \right)^{2},
\]

which is the same as the total cost in Eq. (1.1), and half the cost of the standard MLMC algorithm in which only half of the MSE "budget" of ε² is allocated to the variance.

Although it is very elegant to have an unbiased estimator when β > γ, the practical benefits may be limited. In this situation, the dominant cost of the standard MLMC algorithm is on the coarsest levels of simulation, and therefore one can increase the value of L substantially at negligible cost, and at the same time allocate almost all of the Mean Square Error budget of ε² to the variance term in (2.1), leading to approximately the same total cost.
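A minimal MATLAB sketch of the single-term randomised estimator, under the assumption β > γ. The level routine diff_fn(l) (one coupled sample of Pℓ − Pℓ−1, with P−1 ≡ 0), the rates beta and gamma, and the truncation level Lmax are placeholders, and the truncation at Lmax means the sketch is only approximately unbiased.

% Single-term randomised estimator: sample a level l with probability p_l
% proportional to 2^{-(beta+gamma)l/2}, then weight the sample by 1/p_l.
beta = 2;  gamma = 1;  Lmax = 20;  N = 1e5;
p  = 2.^(-(beta+gamma)*(0:Lmax)/2);   p = p/sum(p);
cp = cumsum(p);
Y  = 0;
for n = 1:N
    l = find(rand <= cp, 1) - 1;          % level l with probability p(l+1)
    Y = Y + diff_fn(l) / p(l+1);          % one sample of (P_l - P_{l-1})/p_l
end
Y = Y / N;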

2.3. Multilevel Richardson-Romberg extrapolation

Richardson extrapolation is a very old technique in numerical analysis. Given a numerical approximation Ph, based on a discretisation parameter h, which leads to an error

\[
  P_h - P = a h^{\alpha} + O(h^{2\alpha}),
\]

it follows that

\[
  P_{2h} - P = a (2h)^{\alpha} + O(h^{2\alpha}),
\]

and hence the extrapolated value

\[
  \bar{P} = \frac{2^{\alpha}}{2^{\alpha}-1} P_h - \frac{1}{2^{\alpha}-1} P_{2h}
\]

satisfies

\[
  \bar{P} - P = O(h^{2\alpha}).
\]

At the suggestion of a reviewer, the author's original MLMC paper (Giles 2008b) included results using one level of Richardson extrapolation, both for the multilevel results as well as for the standard Monte Carlo method. Multilevel on its own was significantly better than Richardson extrapolation, but the two together were even better.

Lemaire and Pagès have taken this approach much further, and have also performed a comprehensive error analysis (Lemaire and Pagès 2013). Assuming that the weak error has a regular expansion

\[
  E[P_\ell] - E[P] = \sum_{n=1}^{L} a_n 2^{-n\alpha\ell} + O(2^{-\alpha\ell L}),
\]

they first determine the unique set of weights wℓ, ℓ = 0, 1, . . . , L such that

\[
  \sum_{\ell=0}^{L} w_\ell = 1, \qquad \sum_{\ell=0}^{L} w_\ell\, 2^{-n\alpha\ell} = 0, \quad n = 1, \ldots, L,
\]

so that

\[
  \left( \sum_{\ell=0}^{L} w_\ell\, E[P_\ell] \right) - E[P]
  = \sum_{\ell=0}^{L} w_\ell \big( E[P_\ell] - E[P] \big) = O(2^{-\alpha L^2}).
\]

Next, they rearrange terms to give

\[
  \sum_{\ell=0}^{L} w_\ell\, E[P_\ell] = \sum_{\ell=0}^{L} v_\ell\, E[P_\ell - P_{\ell-1}],
\]

where as usual P−1 ≡ 0, and the coefficients vℓ are defined by wℓ = vℓ − vℓ+1, with vL+1 ≡ 0, and hence

\[
  v_\ell = \sum_{\ell'=\ell}^{L} w_{\ell'}.
\]

This leads to their Multilevel Richardson-Romberg extrapolation estimator,

\[
  Y = \sum_{\ell=0}^{L} Y_\ell, \qquad
  Y_\ell = N_\ell^{-1}\, v_\ell \sum_{n=1}^{N_\ell} \big( P_\ell^{(\ell,n)} - P_{\ell-1}^{(\ell,n)} \big).
\]

Because the remaining error is O(2^{-αL²}), rather than the usual O(2^{-αL}), it is possible to obtain the usual O(ε) weak error with a value of L which is the square root of the usual value. Hence, in the case β = γ they prove that the overall cost is reduced to O(ε^{-2} |log ε|), while for β < γ the cost is reduced much more, to O(ε^{-2}\, 2^{(γ-β)\sqrt{|\log_2 ε|/α}}). This analysis is supported by numerical results which demonstrate considerable savings (Lemaire and Pagès 2013), and therefore this appears to be a very useful extension to the standard MLMC approach when β ≤ γ.
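A small MATLAB sketch of the weight computation, under the conditions on the wℓ stated above (the values of α and L are placeholders); the vℓ are the coefficients which multiply the level-ℓ correction in the estimator.

% ML2R weights: sum_l w_l = 1 and sum_l w_l 2^{-n*alpha*l} = 0 for n = 1,...,L,
% followed by the reverse cumulative sums v_l = sum_{l'>=l} w_l'.
alpha = 1;  L = 4;
l = 0:L;
A = ones(L+1);
for n = 1:L
    A(n+1,:) = 2.^(-n*alpha*l);
end
w = A \ [1; zeros(L,1)];
v = flipud(cumsum(flipud(w)));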

2.4. Multi-Index Monte Carlo

Multi-Index Monte Carlo (MIMC) is probably the most significant extension of the MLMC methodology (Haji-Ali, Nobile and Tempone 2014a). In standard MLMC, there is a one-dimensional set of levels, with a scalar level index ℓ, although in some applications changing ℓ can change more than one aspect of the computation (such as both timestep and spatial discretisation in a parabolic SPDE application, or timestep and number of sub-samples in a nested simulation). MIMC generalises this to "levels" being defined in multiple directions, so that the level "index" ℓ is now a vector of integer indices. This is illustrated in Figure 2.1 for a 2D MIMC application.

In MLMC, if we define the backward difference

\[
  \Delta P_\ell \equiv P_\ell - P_{\ell-1},
\]

with P−1 ≡ 0, as usual, then the telescoping sum which lies at the heart of MLMC is

\[
  E[P] = \sum_{\ell \ge 0} E[\Delta P_\ell].
\]

[Figure 2.1. "Levels" in a 2D multi-index Monte Carlo application: the (ℓ1, ℓ2) index grid, with the four evaluations needed for the cross-difference ∆P_{(5,4)} marked.]

Generalising this to D dimensions, Haji-Ali et al. (2014a) first define a backward difference operator in one particular dimension,

\[
  \Delta_d P_\ell \equiv P_\ell - P_{\ell - e_d},
\]

where e_d is the unit vector in direction d. They then define the cross-difference

\[
  \Delta P_\ell \equiv \left( \prod_{d=1}^{D} \Delta_d \right) P_\ell,
\]

and the telescoping sum becomes

\[
  E[P] = \sum_{\ell \ge 0} E[\Delta P_\ell].
\]

As an example, Figure 2.1 marks the four locations at which Pℓ must be computed to determine the value of ∆P_{(5,4)} in the 2D application.
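In code, the 2D cross-difference is just a signed sum of those four evaluations. Below, p_fn(l1,l2,omega) is a hypothetical routine returning the approximation at multi-index (l1, l2) for the sample ω, with the convention that it returns 0 whenever either index is negative.

% 2D MIMC cross-difference:
% Delta P_(l1,l2) = P_(l1,l2) - P_(l1-1,l2) - P_(l1,l2-1) + P_(l1-1,l2-1)
function dP = cross_diff(p_fn, l1, l2, omega)
  dP = p_fn(l1,l2,omega) - p_fn(l1-1,l2,omega) ...
     - p_fn(l1,l2-1,omega) + p_fn(l1-1,l2-1,omega);
end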

The MIMC Theorem formulated by Haji-Ali et al. (2014a) can be expressed in the following form to match, as closely as possible, the formulation of the MLMC Theorem.

Theorem 2. If there exist independent estimators Yℓ based on Nℓ Monte Carlo samples, each with expected cost Cℓ and variance Vℓ, and positive D-dimensional vectors α, β, γ with αd ≥ ½ βd, and also positive constants c1, c2, c3 such that

i)   |E[Pℓ − P]| → 0 as min_d ℓd → ∞,

ii)  E[Yℓ] = E[∆Pℓ],

iii) |E[Yℓ]| ≤ c1 2^{−α·ℓ},

iv)  Vℓ ≤ c2 2^{−β·ℓ},

v)   Cℓ ≤ c3 2^{γ·ℓ},

then there exists a positive constant c4 such that for any ε < e^{−1} there is a set of levels L, and integers Nℓ, for which the multilevel estimator

\[
  Y = \sum_{\ell \in \mathcal{L}} Y_\ell
\]

has a mean-square error with bound

\[
  {\rm MSE} \equiv E\big[ (Y - E[P])^2 \big] < \varepsilon^2,
\]

with a computational complexity C with bound

\[
  E[C] \le \begin{cases}
    c_4\, \varepsilon^{-2},                               & \eta < 0, \\
    c_4\, \varepsilon^{-2} |\log \varepsilon|^{e_1},      & \eta = 0, \\
    c_4\, \varepsilon^{-2-\eta} |\log \varepsilon|^{e_2}, & \eta > 0,
  \end{cases}
\]

where

\[
  \eta = \max_d \frac{\gamma_d - \beta_d}{\alpha_d}.
\]

When αd > ½ βd for all d, the exponents of the logarithmic terms are

\[
  e_1 = 2 D_2, \qquad e_2 = (D_2 - 1)(2 + \eta),
\]

where D2 is the number of dimensions d for which (γd − βd)/αd = η. The form of the exponents is more complicated when αd = ½ βd for some d (Haji-Ali et al. 2014a).

Notes:

• Condition i) ensures weak convergence to the correct value.

• Condition ii) ensures each Yℓ has the correct expected value; this allows for the possibility of using an estimator which is more complicated than the natural choice Yℓ = ∆Pℓ, if it will lead to a reduced variance.

• Conditions iii) and iv) give the order of convergence of the weak error and the multilevel correction variance, and condition v) gives the rate at which the cost increases with level.

• The proof of the theorem is significantly harder than for the MLMC theorem.


• There is again the difficulty for each new application of determining α, β, γ for conditions iii)-v). A relatively simple example will be given later in Section 9.

[Figure 2.2. Two choices of 2D MIMC summation region L: a rectangular region (left) and a triangular region (right).]

It might seem natural that the summation region L should be rectangular, as illustrated on the left in Figure 2.2, so that

\[
  \sum_{\ell \in \mathcal{L}} E[Y_\ell] = E[P_L],
\]

where L is the outermost point on the rectangle. However, Haji-Ali et al. (2014a) prove that this gives the optimal order of complexity only when η < 0, and under the additional condition that

\[
  \sum_{d} \frac{\gamma_d}{\alpha_d} \le 2.
\]

The optimal choice for L, which yields the complexity bounds given in the theorem, is of the form ℓ·n ≤ L for a particular choice of direction vector n with strictly positive components. In 2D, this corresponds to a triangular region, as illustrated on the right in Figure 2.2. This is very similar to the use of sparse grid methods in high-dimensional PDE approximations (Bungartz and Griebel 2004), and indeed MIMC can be viewed as a combination of sparse grid methods and Monte Carlo sampling.

The benefits of MIMC over standard MLMC can be very substantial. They are perhaps best illustrated by an elliptic PDE or SPDE example, in which D corresponds to the number of spatial dimensions. Using the standard MLMC approach, β, the rate of convergence of the multilevel variance, will usually be independent of D, but γ, the rate of increase in the computational cost, will increase at least linearly with D. Therefore, in a high enough dimension we will have β ≤ γ, and the overall computational complexity will therefore be worse (often much worse) than the optimal O(ε^{-2}). However, using MIMC we can have βd > γd in each direction, and therefore obtain the optimal complexity.

Hence, in the same way that sparse grids offer the possibility of dimension-independent complexity for deterministic PDE applications, MIMC offers the possibility of dimension-independent complexity for SPDEs and other high-dimensional stochastic applications.

2.5. Multi-dimensional output functionals

In Monte Carlo simulation, we start with a sample ω from the random input space, often compute an intermediate solution U from an approximation to an SDE or SPDE, and then determine the final output quantity P:

ω −→ U −→ P

Some of the possibilities are listed in Table 2.1. In Heinrich’s originalmultilevel research on parametric integration (Heinrich 1998, Heinrich andSindambiwe 1999, Heinrich 2000, Heinrich 2001), the sample ω was from afinite-dimensional unit hypercube, there was no intermediate solution, andthe output was an infinite-dimensional function of a parameter λ. In theauthor’s initial research (Giles 2008b, Giles 2008a) ω represented an infinite-dimensional input Brownian path, U was the solution of an SDE, and P wasa scalar functional of that solution U . In the simplest PDE applications,one might have a finite set of parameters which specify uncertain initial orboundary data, and a single scalar output, so only the intermediate PDEsolution is infinite-dimensional. Finally, new research (Giles, Nagapetyanand Ritter 2014) addresses the problem of approximating the cumulativedistribution function (CDF) of exit times associated with the solution of anSDE; in this case, all three spaces are infinite-dimensional.

The extension of the MLMC theory to a finite set of outputs is verynatural. The standard MLMC theorem will apply to each one of the outputs.If there is a desired accuracy εm for the mth output, then a simple approachproposed by Tigran Nagapetyan1 is to define

Vℓ ≡ maxm

Vℓ,m

ε2m,

where Vℓ,m is the variance of the multilevel estimator on level ℓ for the mth

1 Personal communication

Page 17: Multilevel Monte Carlo methods - People | Mathematical ... · PDF fileMonte Carlo simulation; ... The first work on multilevel Monte Carlo methods was by Heinrich for ... case in

Multilevel Monte Carlo methods 17

Table 2.1. Dimensionality of Ω, the input space, U , the intermediate solutionspace, and P , the output space, for some different MLMC applications

Application dim(Ω) dim(U) dim(P)

parametric integration finite — ∞SDE with scalar output ∞ ∞ 1PDE with scalar output finite/∞ ∞ 1SDE with CDF output ∞ ∞ ∞

output. Enforcing the constraint

\[
  \sum_{\ell=0}^{L} N_\ell^{-1} V_\ell \le \tfrac{1}{2}
\]

is then sufficient to ensure that all of the individual variance constraints are satisfied, and the standard Lagrange multiplier approach outlined in Section 1.3 can be used to determine the optimal number of samples to use on each level.
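A minimal MATLAB sketch of this allocation; the array Vlm of estimated variances Vℓ,m, the row vector eps_m of target accuracies, and the column vector Cl of per-sample costs are assumed to be given (they are placeholders, not quantities computed in the text).

% Vlm: (L+1) x M variances V_{l,m};  eps_m: 1 x M accuracies;  Cl: (L+1) x 1 costs.
% Taking V_l = max_m V_{l,m}/eps_m^2 and enforcing sum_l V_l/N_l <= 1/2
% guarantees each output meets its own variance target.
Vl = max(Vlm ./ eps_m.^2, [], 2);
Nl = ceil(2 * sqrt(Vl./Cl) * sum(sqrt(Vl.*Cl)));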

The extension of the theory to infinite-dimensional outputs requires the definition of an appropriate norm ‖·‖, and then to a large extent the theory carries over by making the substitutions:

\[
  \big| E[P_\ell] - E[P] \big| \;\longrightarrow\; \big\| E[P_\ell] - E[P] \big\|,
\]
\[
  V[P_\ell - P_{\ell-1}] \;\longrightarrow\; E\Big[ \big\| P_\ell - P_{\ell-1} - E[P_\ell - P_{\ell-1}] \big\|^2 \Big].
\]

One slight complication is that if a and b are two independent scalar random variables with zero mean, then

\[
  E[(a+b)^2] = E[a^2] + E[b^2].
\]

When using the 2-norm, this extends to independent random vectors a and b, each with zero mean, since

\[
  E[\|a+b\|^2] = E[\|a\|^2] + E[\|b\|^2],
\]

and similarly to random functions with a 2-norm based on an inner product. However, this does not necessarily apply for other norms, and so the variance of the combined multilevel estimator cannot necessarily be expressed in the usual form as

\[
  \sum_{\ell=0}^{L} N_\ell^{-1} V_\ell, \qquad
  V_\ell \equiv E\Big[ \big\| P_\ell - P_{\ell-1} - E[P_\ell - P_{\ell-1}] \big\|^2 \Big].
\]

Nevertheless, the theory can be extended to the use of other norms by using results from Banach space theory for the sums of independent random variables (Ledoux and Talagrand 1991, Heinrich 1998), and in recent research Daun and Heinrich have addressed the challenges of parametric integration in infinite-dimensional Banach spaces (Daun and Heinrich 2013, Daun and Heinrich 2014a, Daun and Heinrich 2014b).

2.6. Non-geometric MLMC

It is very important to repeat, and emphasise, the fact that MLMC does not require the use of a geometric sequence of grids. In many applications, and in most of the existing literature, a geometric sequence will be the appropriate choice, but there are other applications for which it is not. The only assumptions in the analysis of the two-level method in Section 1.2 and the multilevel method in Section 1.3 are that the accuracy and cost increase with level, and the variance of the correction term decreases with level.

When there is no underlying structure to suggest that a geometric sequence is appropriate, a question which arises is how to select the levels. Suppose that one starts with a large collection of potential levels; how does one decide which subset to use?

One approach developed by Vidal-Codina, Nguyen, Giles and Peraire (2014) is to first use a number of trial samples to obtain an estimate of Vℓ = V[Pℓ] and Vℓ1,ℓ2 ≡ V[Pℓ2 − Pℓ1] for all pairs (ℓ1, ℓ2) with ℓ1 < ℓ2, with associated costs Cℓ and Cℓ1,ℓ2. For a particular ordered subset of levels ℓ0, ℓ1, . . . , ℓM, with fixed ℓM = L to achieve a user-specified accuracy, following the analysis in Section 1.3 which led to (1.1), we obtain the cost

\[
  C(\ell_0, \ell_1, \ldots, \ell_M) = \varepsilon^{-2} \left( \sqrt{V_{\ell_0} C_{\ell_0}} + \sum_{m=1}^{M} \sqrt{V_{\ell_m,\ell_{m-1}}\, C_{\ell_m,\ell_{m-1}}} \right)^{2}.
\]

If the maximum number of levels is not too large, it is possible to perform an exhaustive search to find the optimal subset ℓ0, ℓ1, . . . , ℓM which minimises this cost.

Alternatively, suppose that we have a multilevel Monte Carlo implementation with a number of levels ℓ = 0, 1, . . . , L. Looking in particular at one level ℓ, is it computationally more efficient to keep that level, or would it be better to drop it and jump straight from level ℓ−1 to ℓ+1?

If we keep level ℓ, then the contribution to the overall multilevel estimator due to samples involving level ℓ is

\[
  \frac{1}{N_\ell} \sum_{n=1}^{N_\ell} \big( P_\ell^{(n)} - P_{\ell-1}^{(n)} \big)
  + \frac{1}{N_{\ell+1}} \sum_{n=1}^{N_{\ell+1}} \big( P_{\ell+1}^{(n)} - P_\ell^{(n)} \big).
\]

The usual multilevel analysis shows that for optimality Nℓ ∝ √(Vℓ/Cℓ), where Vℓ ≡ V[Pℓ − Pℓ−1] and Cℓ is the cost of computing a single sample of Pℓ − Pℓ−1. Hence we have

\[
  N_\ell = N_{\ell+1} \sqrt{ \frac{V_\ell}{V_{\ell+1}} \, \frac{C_{\ell+1}}{C_\ell} }.
\]

The combined variance from the two terms is

\[
  \frac{1}{N_\ell} V_\ell + \frac{1}{N_{\ell+1}} V_{\ell+1}
  = \frac{V_{\ell+1}}{N_{\ell+1}} \left( 1 + \sqrt{ \frac{V_\ell\, C_\ell}{V_{\ell+1} C_{\ell+1}} } \right),
\]

and the combined cost is

\[
  N_\ell C_\ell + N_{\ell+1} C_{\ell+1}
  = N_{\ell+1} C_{\ell+1} \left( 1 + \sqrt{ \frac{V_\ell\, C_\ell}{V_{\ell+1} C_{\ell+1}} } \right),
\]

and so the product of the combined variance and combined cost is

\[
  V_{\ell+1} C_{\ell+1} \left( 1 + \sqrt{ \frac{V_\ell\, C_\ell}{V_{\ell+1} C_{\ell+1}} } \right)^{\!2}.
\]

On the other hand, if we drop level ℓ and jump straight from level ℓ−1 to ℓ+1, then the replacement estimator is

\[
  \frac{1}{N_{\ell+1}} \sum_{n=1}^{N_{\ell+1}} \big( P_{\ell+1}^{(n)} - P_{\ell-1}^{(n)} \big).
\]

Its variance is V̂ℓ+1/Nℓ+1, where V̂ℓ+1 ≡ V[Pℓ+1 − Pℓ−1], and its cost is Nℓ+1 Ĉℓ+1, where Ĉℓ+1 is the cost of a single sample of Pℓ+1 − Pℓ−1. Hence, the product of the variance and the cost is

\[
  \widehat{V}_{\ell+1}\, \widehat{C}_{\ell+1}.
\]

The question now is whether

\[
  \widehat{V}_{\ell+1}\, \widehat{C}_{\ell+1} \;<\; V_{\ell+1}\, C_{\ell+1} \left( 1 + \sqrt{ \frac{V_\ell\, C_\ell}{V_{\ell+1} C_{\ell+1}} } \right)^{\!2}. \qquad (2.5)
\]

If this test is true, then it is best to drop level ℓ, because for a fixed computational cost this will deliver the lower variance, or for a fixed variance it can be achieved at a lower computational cost.

The main contribution to both Ĉℓ+1 and Cℓ+1 comes from the level ℓ+1 simulation which produces the output Pℓ+1, so we will assume from here on that Ĉℓ+1 ≈ Cℓ+1. The cost ratio Cℓ/Cℓ+1 is usually known a priori, and the variance ratio Vℓ/Vℓ+1 is often known asymptotically as ℓ → ∞, or can be estimated empirically. The key question is how V̂ℓ+1 compares to Vℓ+1. This could also be quantified empirically, by generating a few Monte Carlo samples for Pℓ+1−Pℓ−1. Alternatively, we can perform some analysis to gain some insight into the question.

Using the standard result for the variance of a sum of two random variables, we have

\[
  \widehat{V}_{\ell+1} = V_{\ell+1} + 2\rho \sqrt{V_\ell\, V_{\ell+1}} + V_\ell,
\]

where ρ is the correlation between Pℓ+1−Pℓ and Pℓ−Pℓ−1. We can now consider two extremes, ρ = 1 and ρ = 0.

If ρ = 1, and so we have perfect correlation between the increments at different levels, then

\[
  \widehat{V}_{\ell+1} = V_{\ell+1} \left( 1 + \sqrt{ \frac{V_\ell}{V_{\ell+1}} } \right)^{\!2}.
\]

Since Cℓ < Cℓ+1, it follows that the test (2.5) is never satisfied, and so it is best to retain level ℓ. Near-perfect correlation is likely to arise in PDE applications with finite-dimensional uncertainty in the initial data, boundary data, or PDE coefficients. In this case, the increments Pℓ−Pℓ−1 obtained in going from one level to the next come from the progressive elimination of truncation error due to the discretisation of the PDE.

On the other hand, if ρ = 0, and so the increments at different levels are independent, then V̂ℓ+1 = Vℓ+1 + Vℓ, and so we drop level ℓ if

\[
  1 + \frac{V_\ell}{V_{\ell+1}} \;<\; \left( 1 + \sqrt{ \frac{V_\ell\, C_\ell}{V_{\ell+1} C_{\ell+1}} } \right)^{\!2}.
\]
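A minimal MATLAB sketch of the test (2.5). The scalars Vl, Cl refer to Pℓ−Pℓ−1, Vl1, Cl1 to Pℓ+1−Pℓ, and Vhat, Chat to Pℓ+1−Pℓ−1; all are assumed to have been estimated beforehand.

% Drop level l if the variance-cost product of the direct l-1 -> l+1 coupling
% is smaller than that of the two-step coupling through level l.
keep_product = Vl1*Cl1 * (1 + sqrt((Vl*Cl)/(Vl1*Cl1)))^2;
drop_product = Vhat*Chat;
drop_level   = (drop_product < keep_product);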

2.7. MLQMC

Giles and Waterhouse (2009) developed a variant of MLMC which uses quasi-Monte Carlo samples instead of independent Monte Carlo samples. This was based on extensible rank-1 lattices developed at UNSW (Dick, Pillichshammer and Waterhouse 2007). QMC is known to be most effective for low-dimensional applications, and the numerical results were very encouraging for SDE applications in which the dominant computational cost was on the coarsest levels of resolution. However, there was no supporting theory for this research.

More recently, there has been considerable research on the theoretical foundations for multilevel QMC (MLQMC) methods (Niu, Hickernell, Müller-Gronbach and Ritter 2010, Kuo, Schwab and Sloan 2012, Dick et al. 2013, Baldeaux and Gnewuch 2014, Dick and Gnewuch 2014a, Dick and Gnewuch 2014b). These theoretical developments are very encouraging, showing that under certain conditions they lead to multilevel methods with a complexity which is O(ε^{-p}) with p < 2.


3. MLMC implementation

3.1. MLMC algorithm

Based on the theory in Section 2.1, the geometric MLMC algorithm used for the numerical tests in this paper is:

Algorithm 1 MLMC algorithm

start with L = 2, and initial target of N0 samples on levels ℓ = 0, 1, 2
while extra samples need to be evaluated do
  evaluate extra samples on each level
  compute/update estimates for Vℓ, ℓ = 0, . . . , L
  define optimal Nℓ, ℓ = 0, . . . , L
  test for weak convergence
  if not converged, set L := L+1, and initialise target NL
end while

In the above algorithm, the equation for the optimal Nℓ is

\[
  N_\ell = \left\lceil 2\, \varepsilon^{-2} \sqrt{V_\ell / C_\ell} \left( \sum_{\ell=0}^{L} \sqrt{V_\ell C_\ell} \right) \right\rceil, \qquad (3.1)
\]

where Vℓ is the estimated variance, and Cℓ is the cost of an individual sample on level ℓ. This ensures that the estimated variance of the combined multilevel estimator is less than ½ ε².

The test for weak convergence tries to ensure that |E[P−PL]| < ε/√2, to achieve an MSE which is less than ε², with ε being a user-specified r.m.s. accuracy. If E[Pℓ−Pℓ−1] ∝ 2^{−αℓ}, then the remaining error is

\[
  E[P - P_L] = \sum_{\ell=L+1}^{\infty} E[P_\ell - P_{\ell-1}] = E[P_L - P_{L-1}] \,/\, (2^{\alpha} - 1).
\]

This leads to the convergence test |E[PL−PL−1]| / (2^α − 1) < ε/√2, but for robustness, we extend this check to extrapolate from the previous two data points, E[PL−1−PL−2] and E[PL−2−PL−3], and take the maximum over all three as the estimated remaining error.

It is important to note that this algorithm is heuristic; it is not guaranteed to achieve an MSE which is less than ε². The main MLMC theorem in Section 2.1 does provide a guarantee, but the conditions of the theorem assume a priori knowledge of the constants c1 and c2 governing the weak convergence and the variance convergence as ℓ → ∞. These two constants are in effect being estimated in the numerical algorithm described above.

In addition, the accuracy of the variance estimate at each level depends on the size of the sample set. If it is small, then this variance estimate may be poor; we will see later that the implementation sets a minimum estimate by extrapolation from coarser levels with more samples.

3.2. Overview of MATLAB implementation

The results to be presented later all use the same two routines written in MATLAB:

• mlmc.m: driver code which performs the MLMC calculation, using an application-specific routine to compute Σ_n (P_ℓ^{(n)} − P_{ℓ-1}^{(n)})^p for p = 1, 2 and a specified number of independent samples;

• mlmc_test.m: a program which performs various tests and then calls mlmc.m to perform a number of MLMC calculations.

These two routines, and all of the application-specific routines, can be downloaded from (Giles 2014).

3.3. MLMC test routine

mlmc_test.m first performs a set of calculations using a fixed number of samples on each level of resolution, and produces four plots:

• log2(Vℓ) versus level ℓ
• log2(|E[Pℓ−Pℓ−1]|) versus level ℓ
• consistency check versus level
• kurtosis versus level

The last two of these need some explanation. If a, b, c are estimates for E[P^f_{ℓ−1}], E[P^f_ℓ], E[Yℓ], respectively, then it should be true that a − b + c ≈ 0. The consistency check verifies that this is true, to within the accuracy one would expect due to Monte Carlo sampling error. This is particularly important when P^f_ℓ ≠ P^c_ℓ, because one is using different approximations on the coarse and fine levels.

Since

\[
  \sqrt{V[a-b+c]} \le \sqrt{V[a]} + \sqrt{V[b]} + \sqrt{V[c]},
\]

it computes and plots the ratio

\[
  \frac{|a - b + c|}{3\big( \sqrt{V_a} + \sqrt{V_b} + \sqrt{V_c} \big)},
\]

where Va, Vb, Vc are empirical estimates for the variances of a, b, c. The probability of this ratio being greater than unity is less than 0.3%. Hence, if it is, it indicates a likely programming error, or a mathematical error in the formulation which violates the crucial identity (2.4).

The MLMC approach needs a good estimate for Vℓ = V[Yℓ], but how many samples are needed for this? As few as 10 may be sufficient in many cases for a rough estimate, but many more are needed when there are rare outliers. When the number of samples N is large, the standard deviation of the sample variance for a random variable X with zero mean is approximately √((κ−1)/N) E[X²], where the kurtosis κ is defined as

\[
  \kappa = \frac{E[X^4]}{\big( E[X^2] \big)^2}.
\]

Hence O(κ) samples are required to obtain a reasonable estimate for the variance. As well as plotting the kurtosis κℓ, mlmc_test.m will give a warning if κℓ is very large, as this is an indication that the empirical variance estimate may be poor.
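For reference, a short sketch of how the kurtosis can be estimated from raw moment sums s1 = ΣY, ..., s4 = ΣY⁴ over N samples; this assumes the level routine is extended to return the third and fourth moment sums as well (the two-sum template shown in the next section would need modifying for this).

% Central moments from raw moment sums, then kurtosis = m4 / m2^2.
m1   = s1/N;
m2   = s2/N - m1^2;
m4   = s4/N - 4*(s3/N)*m1 + 6*(s2/N)*m1^2 - 3*m1^4;
kurt = m4 / m2^2;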

An extreme, but important, example is when P always takes the value 0 or 1. In this case we have

\[
  X \equiv P_\ell - P_{\ell-1} = \begin{cases} \;\;\,1, & \text{probability } p, \\ -1, & \text{probability } q, \\ \;\;\,0, & \text{probability } 1-p-q. \end{cases}
\]

If p, q ≪ 1, then E[X] ≈ 0 and κ ≈ (p+q)^{−1} ≫ 1. Therefore, many samples are required for a good estimate of Vℓ; otherwise, we may get all X^{(n)} = 0, which will give an estimated variance of zero. What is more, the kurtosis will become worse as ℓ → ∞, since p, q → 0 due to weak convergence.

After performing these initial tests, mlmc_test.m then uses the driver routine mlmc.m to perform the MLMC computation for a number of values of ε, and plots the results. These plots will be explained later when the first results are presented.

3.4. MLMC driver routine

The first part of the documentation section at the top of mlmc.m is fairly self-explanatory:

% function [P, Nl] = mlmc(N0,eps,mlmc_l, alpha,beta,gamma)
%
% P     = value
% Nl    = number of samples at each level
% N0    = initial number of samples on levels 0,1,2
% eps   = desired accuracy (rms error)
% alpha -> weak error is O(2^-alpha*l)
% beta  -> variance is O(2^-beta*l)
% gamma -> sample cost is O(2^gamma*l)
%
% if alpha, beta are not positive then they will be estimated

The user must supply an application-specific function to perform the calculations to compute Yℓ on level ℓ. This function has to conform to the following template:


% mlmc_l = application-specific function for level l estimator
%          which must conform to the following template:
%
% function sums = mlmc_l(l,N)
% inputs:  l = level
%          N = number of paths
% output:  sums(1) = sum(Y)
%          sums(2) = sum(Y.^2)
% where Y are iid samples with expected value:
%          E[P_0]          on level 0
%          E[P_l - P_l-1]  on level l>0

The first part of the code initialises various parameters, and then uses the application-specific routine mlmc_l to generate the initial N0 samples on each of the first three levels.

function [P, Nl] = mlmc(N0,eps,mlmc_l, alpha_0,beta_0,gamma)

  alpha = max(0, alpha_0);
  beta  = max(0, beta_0);

  L = 2;

  Nl(1:3)       = 0;
  suml(1:2,1:3) = 0;
  dNl(1:3)      = N0;

  % keep looping while additional samples need to be computed
  while sum(dNl) > 0

    % update sample sums on each level
    for l=0:L
      if dNl(l+1) > 0
        sums = feval(mlmc_l,l,dNl(l+1));
        Nl(l+1)     = Nl(l+1)     + dNl(l+1);
        suml(1,l+1) = suml(1,l+1) + sums(1);
        suml(2,l+1) = suml(2,l+1) + sums(2);
      end
    end

The next part computes (or in later passes updates) the estimates for mℓ ≡ |E[Yℓ]| and Vℓ ≡ V[Yℓ]. If the mean and variance are decaying as expected, one would expect mℓ = 2^{-α} mℓ−1, Vℓ = 2^{-β} Vℓ−1. To avoid possible problems from high kurtosis generating poor empirical estimates, the estimates for mℓ and Vℓ are not allowed to decrease by more than a factor ½ relative to this anticipated value. (Haji-Ali et al. (2014a) use an alternative Bayesian approach to address this same problem of estimating the variance.)

    % compute absolute average and variance
    ml = abs(   suml(1,:)./Nl);
    Vl = max(0, suml(2,:)./Nl - ml.^2);

    for l = 3:L+1
      ml(l) = max(ml(l), 0.5*ml(l-1)/2^alpha);
      Vl(l) = max(Vl(l), 0.5*Vl(l-1)/2^beta);
    end

If the user has not supplied the values for α and/or β, they are now estimated by linear regression.

    % use linear regression to estimate alpha, beta if not given
    if alpha_0 <= 0
      x     = [(1:L)' ones(L,1)] \ log2(ml(2:end))';
      alpha = max(0.5,-x(1));
    end
    if beta_0 <= 0
      x    = [(1:L)' ones(L,1)] \ log2(Vl(2:end))';
      beta = max(0.5,-x(1));
    end

Next, the optimal number of samples on each level is computed, which in turn gives the number of additional samples which will need to be generated in the next pass.

    % set optimal number of additional samples
    Cl  = 2.^(gamma*(0:L));
    Ns  = ceil(2 * sqrt(Vl./Cl) * sum(sqrt(Vl.*Cl)) / eps^2);
    dNl = max(0, Ns-Nl);

In the final part of the loop, we test for weak convergence, using the algorithm described in Section 3.1. If a new level is added, its variance is estimated by extrapolation, and the optimal number of samples is updated.

    % if (almost) converged, estimate remaining error and decide
    % whether a new level is required

    if sum( dNl > 0.01*Nl ) == 0
      range = -2:0;
      rem   = max(ml(L+1+range).*2.^(alpha*range)) / (2^alpha - 1);

      if rem > eps/sqrt(2)
        L = L+1;
        Vl(L+1) = Vl(L) / 2^beta;
        Nl(L+1) = 0;
        suml(1:2,L+1) = 0;

        Cl  = 2.^(gamma*(0:L));
        Ns  = ceil(2 * sqrt(Vl./Cl) * sum(sqrt(Vl.*Cl)) / eps^2);
        dNl = max(0, Ns-Nl);
      end
    end

end

The final step after the completion of the main loop is to compute the combined multilevel estimator:

% finally, evaluate multilevel estimator

P = sum(suml(1,:)./Nl);

3.5. MLQMC algorithm

We conclude this chapter with the description of an MLQMC algorithm which is based on that developed by Giles and Waterhouse (2009), but updated to bring it more in line with the MLMC algorithm in Section 3.1.

The key change in MLQMC is that Nℓ is now the size of a set of QMC points used on level ℓ. This set of points is not constructed randomly and independently, but is instead constructed very carefully, using well-established QMC techniques such as rank-1 lattices (Dick et al. 2007) or Sobol sequences (Joe and Kuo 2008), to provide a relatively uniform coverage of a unit hypercube integration region. In the best cases, this results in the approximate numerical integration error being O(N_ℓ^{-1}), rather than the usual O(N_ℓ^{-1/2}) error which comes from Monte Carlo sampling.

Using just one set of Nℓ points gives good accuracy, but no confidence interval. To regain a confidence interval one uses randomised QMC, in which the set of points is given a random shift (for rank-1 lattice rules) or a digital scrambling (for Sobol sequences). Using 32 sets of points, each collectively randomised, yields 32 set averages for the quantity of interest, Yℓ, and from these 32 random independent values the variance of their average, Vℓ, can be estimated in the usual way.
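As a minimal sketch (not the Giles-Waterhouse implementation), independently scrambled Sobol point sets from MATLAB's Statistics Toolbox can be used to produce the 32 randomised set averages; f is a hypothetical map from a row of d quasi-uniform inputs to one sample of the level-ℓ correction Yℓ.

% 32 independently scrambled Sobol point sets of size Nl in dimension d.
d = 4;  Nl = 1024;  M = 32;
Yavg = zeros(M,1);
for m = 1:M
    U = net(scramble(sobolset(d), 'MatousekAffineOwen'), Nl);  % fresh scrambling
    Yavg(m) = mean(f(U));                                      % one set average
end
Yl = mean(Yavg);        % level-l estimate
Vl = var(Yavg) / M;     % estimated variance of the average of the 32 set averages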

Since Vℓ is now defined to be the variance of the average, for a given number of levels L, the aim is to choose the Nℓ in a way which ensures that

\[
  \sum_{\ell=0}^{L} V_\ell \le \tfrac{1}{2} \varepsilon^2. \qquad (3.2)
\]

How can this be achieved most efficiently? Many QMC methods work naturally with Nℓ as a power of 2. Doubling Nℓ will usually eliminate a large fraction of the variance, so the greatest reduction in total variance relative to the additional computational effort is achieved by doubling Nℓ on the level ℓ* given by

\[
  \ell^{*} = \arg\max_{\ell}\; \frac{V_\ell}{N_\ell C_\ell}. \qquad (3.3)
\]

Based on this, and using the same test for weak convergence as in the MLMC algorithm, the MLQMC algorithm is:

Algorithm 2 MLQMC algorithm

start with L = 2, and an initial set size Nℓ = 1 on levels ℓ = 0, 1, 2
while not converged do
  evaluate 32 set averages on any level with new/changed values of Nℓ
  compute corresponding new/changed estimates for Vℓ
  test whether total variance condition (3.2) is satisfied
  if total variance still too big then
    determine ℓ* defined by (3.3) and double Nℓ*
  else
    test for weak convergence
    if not converged then
      set L := L+1, and initialise NL = 1
    end if
  end if
end while


4. Introduction to applications

The remainder of the article presents and discusses a large number of different applications using MLMC methods. In part, the large number is to illustrate the flexibility and generality of the approach, and to stimulate ideas on how it could be used for other applications which the reader may be interested in, but the variety of applications is also chosen to illustrate different responses to the following questions:

1 How are the levels defined?

   This is often quite simple, but in some cases, for example when using adaptive time-stepping, it is not so obvious.

2 How are the coarse and fine samples coupled when computing the difference Pℓ(ω)−Pℓ−1(ω)?

   What does it even mean for Pℓ(ω) and Pℓ−1(ω) to correspond to the same sample ω? In some cases, it is very simple, but in other cases, such as for the continuous-time Markov chains, it requires a key insight to construct a suitable coupling.

3 Are there problems with discontinuous outputs?

   When the output function is a discontinuous function of the intermediate solution U, there is the possibility that a small difference between the intermediate solutions for the coarse and fine samples leads to an O(1) value for Pℓ−Pℓ−1. This usually leads to a larger variance, and hence a lower value for β. Addressing this problem is a critical issue, and we will consider a variety of different techniques. Many of these exploit the flexibility of slightly different approximations for the coarse and fine samples, provided always that the identity (2.4) is respected, so that the telescoping sum is still valid.

4 Is there scope for constructing even more efficient multilevel estimators?

   As with the techniques for handling discontinuous output functions, there can be ways of exploiting the flexibility of different coarse and fine approximations to achieve a higher rate of variance convergence than one would otherwise expect due to the rate of strong convergence of the intermediate solution.

5 What is the current status on supporting numerical analysis to explain the observed behaviour of numerical algorithms?

   For many applications this is the greatest challenge, but great progress has been made in the past 5 years.


5. SDEs

5.1. Euler-Maruyama discretisation

The original multilevel path simulation paper (Giles 2008b) treated stochastic differential equations

    $dS_t = a(S_t, t)\, dt + b(S_t, t)\, dW_t,$

using the simple Euler-Maruyama discretisation with a uniform timestep $h$ and Brownian increments $\Delta W_n$,

    $S_{n+1} = S_n + a(S_n, t_n)\, h + b(S_n, t_n)\, \Delta W_n.$

The multilevel Monte Carlo implementation is very simple. On level $\ell$, the uniform timestep is taken to be $h_\ell = h_0\, M^{-\ell}$, for some integer $M$. The timestep $h_0$ on the coarsest level is often taken to be the interval length $T$, so that there is just one timestep for the entire interval, but this is not required, and in some applications using such a large timestep may lead to numerical results which are so inaccurate that they are not helpful, i.e. they are such a poor control variate for the finer approximations that they do not lead to a reduction in the variance. In such cases, it would be better to start with a smaller value for $h_0$.

The multilevel coupling is achieved by using the same underlying driving Brownian path for the coarse and fine paths; this is accomplished by summing the Brownian increments for the fine path timesteps to obtain the Brownian increments for the coarse timesteps. The multilevel estimator is then the natural one defined in (2.2), with the specific payoff approximation $P_\ell$ depending on the particular application.

Provided the SDE satisfies the usual conditions (see Theorem 10.2.2 in (Kloeden and Platen 1992)), the strong error for the Euler discretisation with timestep $h$ is $O(h^{1/2})$, so that

    $\mathbb{E}\bigl[\, \| S - \widehat{S} \|^2 \bigr] = O(h),$

where $\widehat{S}$ denotes the numerical approximation.

For Lipschitz payoff functions $P$ (such as European, Asian and lookback options in finance) for which

    $\bigl| P(S^1) - P(S^2) \bigr| \;\le\; K\, \| S^1 - S^2 \|,$

we have

    $\mathbb{V}[P - P_\ell] \;\le\; \mathbb{E}\bigl[ (P - P_\ell)^2 \bigr] \;\le\; K^2\, \mathbb{E}\bigl[\, \| S - \widehat{S}_\ell \|^2 \bigr],$

and

    $V_\ell \;\equiv\; \mathbb{V}[P_\ell - P_{\ell-1}] \;\le\; 2\,\bigl( \mathbb{V}[P - P_\ell] + \mathbb{V}[P - P_{\ell-1}] \bigr),$

and hence $V_\ell = O(h_\ell)$.


If $h_\ell = 4^{-\ell} h_0$, as in the numerical examples in (Giles 2008b), then this gives $\alpha=2$, $\beta=2$ and $\gamma=2$. Alternatively, if $h_\ell = 2^{-\ell} h_0$ with twice as many timesteps on each successive level, as used in the numerical examples in this article, then $\alpha = 1$, $\beta = 1$ and $\gamma = 1$. In either case, Theorem 1 gives the complexity to achieve a root-mean-square error of $\varepsilon$ to be $O(\varepsilon^{-2}(\log \varepsilon)^2)$, which is near-optimal as Creutzig, Dereich, Müller-Gronbach and Ritter (2009) prove an $O(\varepsilon^{-2})$ lower bound for the complexity. The same paper also proves that MLMC achieves an $O(\varepsilon^{-2}(\log \varepsilon)^2)$ worst case optimal complexity for a class of Lipschitz path-dependent functions.

Figure 5.3 shows results for a financial call option with a single underlying asset satisfying the Geometric Brownian Motion SDE,

    $dS_t = r\, S_t\, dt + \sigma\, S_t\, dW_t,$

and payoff $P(S) \equiv \exp(-rT)\, \max(S_T - K, 0)$, with $r = 0.05$, $\sigma = 0.2$, $T = 1$, $S_0 = 100$, $K = 100$. The full code is provided in (Giles 2014).
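To make the coarse/fine coupling concrete, the following is a minimal MATLAB sketch (not the accompanying code from (Giles 2014)) of a level-$\ell$ estimator for this example; the function name and the vectorisation strategy are purely illustrative.

  function [Y,V] = eul_call_level(l,N)
  % Sketch: level-l MLMC estimator for a European call under GBM, using the
  % Euler-Maruyama scheme with h_l = T*2^(-l); the coarse path re-uses the
  % fine-path Brownian increments, summed pairwise.
  r = 0.05; sig = 0.2; T = 1; S0 = 100; K = 100;
  nf = 2^l;  hf = T/nf;                    % fine path: 2^l uniform timesteps
  Sf = S0*ones(1,N);  Sc = S0*ones(1,N);  dWc = zeros(1,N);
  for n = 1:nf
    dWf = sqrt(hf)*randn(1,N);
    Sf  = Sf + r*Sf*hf + sig*Sf.*dWf;      % fine path update
    dWc = dWc + dWf;                       % accumulate coarse increment
    if l>0 && mod(n,2)==0
      Sc  = Sc + r*Sc*(2*hf) + sig*Sc.*dWc;  % coarse path: timestep 2*h_f
      dWc = zeros(1,N);
    end
  end
  Pf = exp(-r*T)*max(Sf-K,0);
  if l==0, Pc = zeros(1,N); else, Pc = exp(-r*T)*max(Sc-K,0); end
  Y = mean(Pf-Pc);  V = var(Pf-Pc);
  end

With $h_\ell = 2^{-\ell}\,T$, as here, this sketch sits in the $\alpha=\beta=\gamma=1$ regime described above.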

The top two plots show the variance and absolute mean value for both $P_\ell$ and $P_\ell - P_{\ell-1}$. These are based on an initial sample of $10^5$ paths. The line for $\log_2 \mathbb{V}[P_\ell - P_{\ell-1}]$ in the top left plot has a slope of approximately $-1$, corresponding to a variance proportional to $h_\ell$, as expected. The line for $\log_2 |\mathbb{E}[P_\ell - P_{\ell-1}]|$ also has a slope of approximately $-1$, implying an $O(h_\ell)$ weak convergence.

The middle two plots show the consistency check and kurtosis, as discussed in Section 3.3. These indicate there is no problem with the consistency, and the kurtosis is actually improving slightly with level.

The bottom two plots have the results from 5 separate MLMC calculations, each with a different specified accuracy $\varepsilon$ to be achieved. The bottom left plot shows the number of samples which is used for each level of the MLMC computation. The number of levels increases as $\varepsilon$ decreases, as described in Section 3.1, to achieve the required weak error. The number of samples also varies with level, with many more on the coarsest level with just one timestep. On the finer levels, the optimal allocation of computational effort leads to $N_\ell \propto \sqrt{V_\ell / C_\ell} \propto 2^{-\ell}$.

The bottom right plot displays the total computational cost $C$, multiplied by $\varepsilon^2$. If $C$ were approximately proportional to $\varepsilon^{-2}$ then $\varepsilon^2 C$ would be approximately constant. Indeed this is what is seen, with the slight increase as $\varepsilon$ decreases being due to the $|\log \varepsilon|^2$ term in Theorem 1. For comparison, the bottom right plot also displays the cost using the standard Monte Carlo algorithm on the finest level used by the MLMC algorithm. The cost ratio is roughly 5-12, illustrating the benefits provided by the MLMC algorithm.

Figure 5.4 illustrates the problem with discontinuous payoff functions. The underlying SDE is exactly the same, but the payoff is a digital option with payoff $P(S) \equiv 10\, \exp(-rT)\, H(S_T - K)$, where $H(x)$ is the Heaviside step function. On the finer levels, most samples have fine and coarse paths which end on the same side of the strike $K$, and hence give $P_\ell - P_{\ell-1} = 0$.


[Figure 5.3. Numerical results for a European call option using the Euler-Maruyama discretisation of the GBM SDE. The six panels plot, against level $\ell$: $\log_2$ variance and $\log_2|$mean$|$ of $P_\ell$ and $P_\ell - P_{\ell-1}$, the consistency check, the kurtosis, and the number of samples $N_\ell$; the final panel plots $\varepsilon^2 \times$ cost against accuracy $\varepsilon$ for MLMC and standard MC.]


[Figure 5.4. Numerical results for a digital call option using the Euler-Maruyama discretisation of the GBM SDE (same panel layout as Figure 5.3).]


  option        Euler-Maruyama                    Milstein
                numerics      analysis            numerics      analysis
  ----------------------------------------------------------------------
  Lipschitz     O(h)          O(h)                O(h^2)        O(h^2)
  Asian         O(h)          O(h)                O(h^2)        O(h^2)
  lookback      O(h)          O(h)                O(h^2)        o(h^{2-δ})
  barrier       O(h^{1/2})    o(h^{1/2-δ})        O(h^{3/2})    o(h^{3/2-δ})
  digital       O(h^{1/2})    O(h^{1/2} log h)    O(h^{3/2})    o(h^{3/2-δ})

Table 5.2. Observed and theoretical convergence rates for the multilevel correction variance for scalar SDEs, using the Euler-Maruyama and Milstein discretisations. δ is any strictly positive constant.

However, noting that the strong error is $O(h_\ell^{1/2})$, and that there is a bounded density of paths terminating in the neighbourhood of $K$, there is therefore an $O(h_\ell^{1/2})$ fraction of the samples with the coarse and fine paths on either side of the strike, giving a non-zero $O(1)$ value for $P_\ell - P_{\ell-1}$. This gives $V_\ell = O(h_\ell^{1/2})$, and this lower rate of convergence is clearly visible in the top left plot. Furthermore, $\mathbb{E}[(P_\ell - P_{\ell-1})^4] = O(h_\ell^{1/2})$ and so the kurtosis is $O(h_\ell^{-1/2})$; this increase in kurtosis is obvious in the middle-right plot, which illustrates why this is a helpful quantity to plot. Consequently, this application has $\alpha=1$, $\beta=\tfrac{1}{2}$, $\gamma=1$, leading to the MLMC complexity being $O(\varepsilon^{-2.5})$. This slightly worse complexity is visible in the bottom-right plot, and the computational savings compared to the standard Monte Carlo method are lower.

Table 5.2 summarises the observed variance convergence rate in numerical experiments for a number of different financial options; the Asian option is based on the average value of the underlying asset, the lookback is based on its maximum or minimum value, and the barrier is a discontinuous function of the maximum or minimum. The table also displays the theoretical results which have been obtained; the digital option analysis is due to Avikainen (Avikainen 2009), while the others are due to Giles, Higham and Mao (Giles, Higham and Mao 2009).

5.2. Milstein discretisation

For Lipschitz payoffs, the variance $V_\ell$ for the natural multilevel estimator converges at twice the order of the strong convergence of the numerical approximation of the SDE. This immediately suggests that it would be better to replace the Euler-Maruyama discretisation by the Milstein discretisation (Giles 2008a), since it gives first order strong convergence under certain conditions (see Theorem 10.3.5 in (Kloeden and Platen 1992)).


[Figure 5.5. Numerical results for a European call option using the Milstein discretisation of the GBM SDE (same panel layout as Figure 5.3).]


For a scalar SDE the Milstein discretisation is

    $S_{n+1} = S_n + a(S_n, t_n)\, h + b(S_n, t_n)\, \Delta W_n + \tfrac{1}{2}\, b(S_n, t_n)\, \frac{\partial b}{\partial S}(S_n, t_n)\, \bigl( \Delta W_n^2 - h \bigr),$

and Figure 5.5 demonstrates the improved results obtained for the call option based on a single underlying GBM asset. $V_\ell$ is now $O(h_\ell^2)$, leading to $\alpha=1$, $\beta=2$, $\gamma=1$, and there is a significant reduction in the total computational cost compared to MLMC using the Euler-Maruyama discretisation.
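For the GBM example, $b(S) = \sigma S$ so $b\,\partial b/\partial S = \sigma^2 S$, and one vectorised Milstein timestep is simply the sketch below (variable names illustrative):

  % One Milstein timestep for dS = r*S dt + sig*S dW (GBM), vectorised over
  % the paths; S is a row vector, and dW = sqrt(h)*randn(size(S)).
  S = S + r*S*h + sig*S.*dW + 0.5*sig^2*S.*(dW.^2 - h);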

Because $\beta > \gamma$, the dominant computational cost is on the coarsest levels. Since these are low-dimensional, they are well suited to the use of quasi-Monte Carlo methods, which are particularly effective in lower dimensions. This has been investigated by Giles and Waterhouse (Giles and Waterhouse 2009) using a rank-1 lattice rule to generate the quasi-random numbers, randomisation with 32 independent offsets to obtain confidence intervals, and a standard Brownian Bridge construction of the increments of the driving Brownian process. The numerical results showed that MLMC on its own was better than QMC on its own, but the combination of the two was even better. The QMC treatment greatly reduced the variance per sample for the coarsest levels, resulting in significantly reduced costs overall. In the simplest case of a Lipschitz European payoff, the computational complexity was reduced from $O(\varepsilon^{-2})$ to approximately $O(\varepsilon^{-1.5})$.

Digital options

As with the Euler-Maruyama discretisation, discontinuous payoffs pose a challenge to the multilevel Monte Carlo approach because small differences in the coarse and fine path simulations can lead to an $O(1)$ difference in the payoff function. In the case of a digital option, if we use the natural multilevel estimator then $P_\ell - P_{\ell-1} = O(1)$ for an $O(h_\ell)$ fraction of the paths, giving $V_\ell = O(h_\ell)$, and a kurtosis which is $O(h_\ell^{-1})$. To obtain a multilevel estimator with an improved variance convergence, we exploit the ideas of Section 2.1 and develop different estimators $P^c$ and $P^f$ for the coarse and fine paths, based on one of the following ideas:

  • conditional expectation
  • splitting
  • change of measure

The conditional expectation approach builds on a well-established technique for payoff smoothing which is used for pathwise sensitivity analysis (see, for example, pp. 399-400 in (Glasserman 2004)). We start by considering the fine path simulation, and make a slight change by using the Euler-Maruyama discretisation for the final timestep, instead of the Milstein discretisation. Conditional on the value $S_{N-1}$, which is the numerical approximation of $S_{T-h}$ one timestep before the maturity $T$, the numerical approximation for the final value $S_N$ has a Gaussian distribution, and for a simple digital option the conditional expectation is known analytically. Thus the fine path payoff can be taken to be

approximation for the final value SN has a Gaussian distribution, and fora simple digital option the conditional expectation is known analytically.Thus the fine path payoff can be taken to be

    $P^f_\ell = 25\, \exp(-rT)\; \Phi\!\left( \frac{ S^f_{N-1} + a(S^f_{N-1})\, h_\ell - K }{ b(S^f_{N-1})\, \sqrt{h_\ell} } \right),$

where $\Phi$ is the Normal cumulative distribution function.

A similar treatment is used for the coarse path, except that in the final timestep we re-use the known value of the Brownian increment for the second last fine timestep, which corresponds to the first half of the final coarse timestep. This results in the conditional distribution for the coarse path underlying at maturity matching that of the fine path to within $O(h)$, for both the mean and the standard deviation (Giles, Debrabant and Roßler 2013), and the corresponding coarse path payoff function is

    $P^c_{\ell-1} = 25\, \exp(-rT)\; \Phi\!\left( \frac{ S^c_{N-2} + a(S^c_{N-2})\, h_{\ell-1} + b(S^c_{N-2})\, \Delta W_{N-1} - K }{ b(S^c_{N-2})\, \sqrt{h_\ell} } \right),$

where $\Delta W_{N-1}$ is the re-used Brownian increment.
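In MATLAB, the smoothed fine-path payoff for the GBM example might be evaluated as in the sketch below, using the base-MATLAB identity $\Phi(x) = \tfrac{1}{2}\,\mathrm{erfc}(-x/\sqrt{2})$; here `Sf` is assumed to hold the fine-path values one timestep before maturity, and all names are illustrative.

  % Sketch: conditional-expectation (smoothed) digital payoff for the fine
  % path under GBM, where a(S) = r*S and b(S) = sig*S.
  x   = (Sf + r*Sf*h - K) ./ (sig*Sf*sqrt(h));
  Phi = 0.5*erfc(-x/sqrt(2));
  Pf  = 25*exp(-r*T)*Phi;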

Consequently, the difference in payoff between the coarse and fine paths near the payoff discontinuity is $O(h^{1/2})$, giving a variance which is approximately $O(h^{3/2})$, and a kurtosis which is approximately $O(h^{-1/2})$. This leads to $\alpha=1$, $\beta=3/2$ and $\gamma=1$. Since $\beta > \gamma$, the MLMC complexity is $O(\varepsilon^{-2})$.

This is all illustrated by the numerical results in Figure 5.6. One particularly interesting feature of these results is that there is zero variance on the coarsest level. This is because there is only one timestep on the coarsest level, and therefore the conditional expectation is taken immediately and every sample gives the same payoff.

It is very important in this conditional expectation formulation that $\mathbb{E}[P^c_{\ell-1}] = \mathbb{E}[P^f_{\ell-1}]$, so that the numerical payoff approximation on level $\ell-1$ has the same expected value regardless of whether it is the coarser or finer of the two levels. This ensures that the identity in Equation (2.4) is respected, so that the telescoping summation remains valid.

The conditional expectation technique works well in 1D where there is a known analytic value for the conditional expectation, but in multiple dimensions it may not be known. In this case, one can use the technique of "splitting" (Asmussen and Glynn 2007). Here the conditional expectation is replaced by a numerical estimate, averaging over a number of sub-samples, i.e. for each set of Brownian increments up to one fine timestep before the end, one uses a number of samples of the final Brownian increment to produce an average payoff. If the number of sub-samples is chosen appropriately, the variance is the same, to leading order, without any increase in the computational cost, again to leading order. Because of its simplicity and generality, this is now the author's preferred approach. Furthermore, one can revert to using the Milstein approximation for the final timestep.


[Figure 5.6. Numerical results for a digital call option using the Milstein discretisation of the GBM SDE, and a conditional expectation MLMC estimator (same panel layout as Figure 5.3).]



The change of measure approach is another approximation to the conditional expectation. Using the Euler-Maruyama approximation for the final timestep, the fine and coarse path conditional distributions at maturity are two very similar Gaussian distributions. Instead of following the splitting approach of taking corresponding samples from these two distributions, we can take a sample from a third Gaussian distribution (with a mean and variance perhaps equal to the average of the other two). This leads to the introduction of a Radon-Nikodym derivative for each path, and the difference in the payoffs from the two paths is then due to the difference in their Radon-Nikodym derivatives.

In the specific context of digital options, this is a more complicated method to implement, and the resulting variance is no better. However, in other contexts a similar approach can be very effective.

Lookback and barrier options

Lookback and barrier options are financial options which depend on the minimum (or maximum) values of the underlying asset during the whole simulation interval $[0, T]$. A multilevel estimator based directly on the minimum (or maximum) of the values at the discrete timesteps will have a poor variance. This is because there is an $O(h^{1/2})$ variation in the asset value within each timestep of size $h$, and therefore there is an $O(h_\ell^{1/2})$ difference on average between the minimum (or maximum) values for the coarse and fine paths. This results in an $O(h_\ell)$ variance for lookback options, which are a Lipschitz function of the minimum (or maximum), and an even worse $O(h_\ell^{1/2})$ variance for barrier options, which are a discontinuous function of the minimum (or maximum).

The key to achieving an improved variance is the definition of a Brownian Bridge interpolant based on the approximation that the drift and volatility do not vary within the timestep (Giles 2008a). For the fine path, the interpolant for timestep $[t_n, t_n + h]$ is

    $S^f(t) = S^f_n + \lambda\, (S^f_{n+1} - S^f_n) + b^f_n\, \bigl( W(t) - W_n - \lambda\, (W_{n+1} - W_n) \bigr),$

where $\lambda = (t - t_n)/h$ and $b^f_n \equiv b(S^f_n, t_n)$. This interpolant corresponds to a Brownian motion with constant drift and volatility, and for this there are standard results for the distribution of the minimum or maximum within each fine timestep (Glasserman 2004) which can be used to construct payoff approximations.

A similar construction can be used for the coarse path, but its timestep is twice as large, so the interpolant for the time interval $[t_n, t_n + 2h]$ is

    $S^c(t) = S^c_n + \lambda\, (S^c_{n+2} - S^c_n) + b^c_n\, \bigl( W(t) - W_n - \lambda\, (W_{n+2} - W_n) \bigr),$

where $\lambda = (t - t_n)/(2h)$ and $b^c_n \equiv b(S^c_n, t_n)$. We can now evaluate this interpolant at time $t_{n+1}$, obtaining

    $S^c(t_{n+1}) = \tfrac{1}{2}\, (S^c_n + S^c_{n+2}) + \tfrac{1}{2}\, b^c_n\, \bigl( (W_{n+1} - W_n) - (W_{n+2} - W_{n+1}) \bigr),$

using the Brownian increments $W_{n+1} - W_n$ and $W_{n+2} - W_{n+1}$ already generated for the fine path. The standard Brownian path results can then be used to obtain the coarse path payoff approximations.

The full details are given in (Giles 2008a); the outcome is that $\beta=2$ for lookback options, and $\beta=1.5$ for barrier options. Hence, we remain in the regime where $\beta > \gamma$ and the overall complexity is $O(\varepsilon^{-2})$.

Table 5.2 summarises the convergence behaviour observed numerically for the different financial option types using the Milstein discretisation, and the supporting numerical analysis by Giles et al. (2013).

5.3. Multi-dimensional SDEs

The discussion so far has been for scalar SDEs, but the computational benefits of Monte Carlo methods arise in higher dimensions. For multi-dimensional SDEs satisfying the usual commutativity condition (see, for example, p. 353 in (Glasserman 2004)) the Milstein discretisation requires only Brownian increments for its implementation, and most of the analysis above carries over very naturally.

The only difficulties are with lookback and barrier options, where the classical results for the distribution of the minimum or maximum of a one-dimensional Brownian motion do not extend to the joint distribution of the minima or maxima of two correlated Brownian motions. However, if the financial option depends on the weighted average of a set of underlying assets, then the Brownian interpolation for each individual asset leads naturally to a Brownian interpolation for the average, and then the standard one-dimensional results can be used as usual.

Figures 5.7 and 5.8 show results for lookback and barrier options based on 5 underlying assets. Full details of the numerical construction are given in (Giles 2009) and can also be viewed within the code (Giles 2014). The MLMC variance is $O(h^2)$ for the lookback option, and $O(h^{3/2})$ for the barrier option.

With multi-dimensional SDEs which do not satisfy the commutativity condition, the Milstein discretisation requires the simulation of Lévy areas, iterated Itô integrals which can only be simulated relatively easily in 2D. This is an unavoidable problem; the classical result of Clark and Cameron (1980) proves that $O(h^{1/2})$ strong convergence is the best that can be achieved in general using only Brownian increments.

Nevertheless, Giles and Szpruch (2014) have developed an antithetic treatment which achieves a very low variance despite the $O(h^{1/2})$ strong convergence.


[Figure 5.7. Numerical results for a lookback option on a basket of 5 GBM assets (same panel layout as Figure 5.3).]


[Figure 5.8. Numerical results for a barrier option on a basket of 5 GBM assets (same panel layout as Figure 5.3).]


The estimator which is used is

    $Y_\ell = N_\ell^{-1} \sum_i \Bigl[ \tfrac{1}{2}\, \bigl( P_\ell(\omega_i) + P_\ell(\omega^a_i) \bigr) - P_{\ell-1}(\omega_i) \Bigr].$

Here $\omega_i$ represents the driving Brownian path, and $\omega^a_i$ is an antithetic counterpart defined by a time-reversal of the Brownian path within each coarse timestep. This results in the Brownian increments for the antithetic fine path being swapped relative to the original path. Lengthy analysis proves that the average of the fine and antithetic paths is within $O(h)$ of the coarse path, and hence the multilevel variance is $O(h^2)$ for smooth payoffs, and $O(h^{3/2})$ for the standard European call option.
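In an implementation storing the fine-path Brownian increments column-wise (one column per fine timestep), the antithetic path simply swaps the two increments within each coarse timestep; a sketch with illustrative names:

  % Sketch: antithetic Brownian increments obtained by swapping the pair of
  % fine increments within each coarse timestep (dWf is N x nf, with nf even).
  dWa = dWf;
  dWa(:,1:2:end) = dWf(:,2:2:end);
  dWa(:,2:2:end) = dWf(:,1:2:end);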

This treatment has been extended to handle lookback and barrier options (Giles and Szpruch 2013). This combines sub-sampling of the Brownian path, to approximate the Lévy areas with sufficient accuracy to achieve $O(h^{3/4})$ strong convergence, with an antithetic treatment at the finest level of resolution to ensure that the average of the fine paths is within $O(h)$ of the coarse path.

5.4. Computing sensitivities

In computational finance, it is important to be able to compute sensitivities in addition to option prices. When the option payoff is continuous, the standard Monte Carlo approach is pathwise sensitivity analysis (Broadie and Glasserman 1996) (developed earlier as Infinitesimal Perturbation Analysis (L'Ecuyer 1990)), which considers linearised perturbations to the underlying path evolution, and the consequential perturbation to the payoff.

From a multilevel Monte Carlo point of view, the difficulty that this introduces is that the derivative of a call option payoff function is discontinuous. Hence, computing first order sensitivities for a call option has similar difficulties to computing the option price for a digital option. Sensitivities for digital options can be obtained by first using the conditional expectation approach described previously, and then applying pathwise sensitivity analysis to this.

Full details on how to formulate appropriate MLMC estimators are given by Burgos and Giles (2012), and the numerical analysis of the resulting variance is given by Burgos (2014).

5.5. Exit times and Feynman-Kac formula

The simplest 1D form of the Feynman-Kac formula states that if $u(x,t)$ is the solution of the parabolic PDE

    $\frac{\partial u}{\partial t} + a(x,t)\, \frac{\partial u}{\partial x} + \tfrac{1}{2}\, b^2(x,t)\, \frac{\partial^2 u}{\partial x^2} = 0,$


for $x \in V$, subject to terminal condition $u(x,T) = f(x)$, and boundary condition $u(x,t) = g(x)$ for $x \in \partial V$, then $u(x,t)$ can also be written as the conditional expectation

    $u(x,t) = \mathbb{E}\bigl[\, f(S_T)\, \mathbf{1}_{\tau \ge T} + g(S_\tau)\, \mathbf{1}_{\tau < T} \;\big|\; S_t = x \,\bigr],$

where $S_t$ satisfies the SDE

    $dS_t = a(S_t, t)\, dt + b(S_t, t)\, dW_t,$

and $\tau$ is the exit time

    $\tau = \inf_t \{\, t : S_t \notin V \,\}.$

Because of this formula and its multi-dimensional generalisation, and also because of other applications which are concerned with the time at which particles exit a particular domain, there is considerable interest in the estimation of exit times.

Higham, Mao, Roj, Song and Yin (2013) consider the estimation of $\mathbb{E}[\tau]$ using a multilevel approximation based on an Euler-Maruyama discretisation. They prove that this can be achieved for very general multi-dimensional SDEs at a computational cost which is $O(\varepsilon^{-3} |\log \varepsilon|^{1/2})$. This is better than the $O(\varepsilon^{-4})$ complexity of a standard Monte Carlo simulation using the same numerical approximation of the exit time, but is not better than the complexity achieved by Gobet and Menozzi (2010) using a Monte Carlo simulation with a boundary correction which improves the weak order of convergence to first order.

Earlier research by Primozic (2011) demonstrates an $O(\varepsilon^{-2})$ complexity for a one-dimensional problem. This builds on the Milstein approximation for barrier options developed by Giles (2008a), using the approximate Brownian interpolation to determine the approximate probability that the underlying path crosses the boundary within each timestep. This improves both the weak convergence and the multilevel variance, giving $\alpha=1$, $\beta=3/2$ and $\gamma=1$. This approach is capable of being extended to multi-dimensional SDEs which satisfy the commutativity condition, which allows the use of the Milstein discretisation without requiring the simulation of Lévy areas.
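For a scalar problem with, say, a lower boundary at $B$, the within-timestep crossing probability used in such constructions is the classical Brownian-bridge result; a minimal sketch, with illustrative names (`S1`, `S2` the endpoint values above $B$, `bn` the frozen volatility):

  % Sketch: conditional probability that the Brownian interpolant crosses a
  % lower barrier B within a timestep of size h.
  pcross = exp( -2*max(S1-B,0).*max(S2-B,0) ./ (bn.^2 * h) );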

5.6. Stiff and highly nonlinear SDEs

In the numerical examples presented above, it is possible to use only one timestep on the coarsest level of approximation, and still obtain a sufficiently accurate path simulation so that the resulting payoff approximation is a useful control variate for the next level of approximation.

However, in other applications this may not be true. If the drift term has a characteristic time-scale of $\tau$, as in a mean-reverting drift $(\theta - S_t)/\tau$, then when using the explicit Euler-Maruyama discretisation the timestep $h_0$ on the coarsest level cannot be much larger than $\tau$ without encountering severe numerical stability problems.


This is a natural restriction shared by explicit discretisations of ordinary differential equations.

A related problem is addressed by Hutzenthaler, Jentzen and Kloeden (2013), who are concerned with SDEs such as

    $dS_t = -S_t^3\, dt + dW_t,$

which have a super-linear growth in the drift and/or the volatility. This again leads to numerical instability if a uniform timestep is used. Their solution is to introduce a slight modification to the Euler-Maruyama discretisation which limits the size of the drift term on each level of approximation when $S_t$ is large, to avoid this instability.

Other approaches to these problems include the use of drift-implicit methods (Dereich, Neuenkirch and Szpruch 2012, Higham, Mao and Stuart 2002), or a change of variables equivalent to the use of an integrating factor in ODEs (see the Heston treatment in (Giles 2008a), which was suggested by Mark Broadie).

Another important approach, when the difficulties arise only for extreme values of $S_t$, or for a short time interval, is to use adaptive time-stepping. Hoel, von Schwerin, Szepessy and Tempone (2012) and Hoel, von Schwerin, Szepessy and Tempone (2014) construct a multilevel adaptive timestepping discretisation in which, for each stochastic instance $\omega$, the timesteps used on level $\ell$ are a subdivision of those used on level $\ell-1$, which in turn are a subdivision of those on level $\ell-2$, and so on. By doing this, the payoff $P_\ell$ on level $\ell$ is the same regardless of whether one is computing $P_\ell - P_{\ell-1}$ or $P_{\ell+1} - P_\ell$, and therefore the MLMC telescoping summation is respected. Another notable aspect of their work is the use of adjoint/dual sensitivities to determine the optimal timestep size.

New unpublished work by Giles, Süli, Whittle and Ye implements adaptive time-stepping slightly differently. Instead of the timesteps on one level being a subdivision of those on the level above, it uses a completely independent adaptation on each level of refinement, with an adaptive timestep of the form

    $h_\ell = 2^{-\ell}\, H(S_n),$

where $H(S)$ is independent of level. This results in timesteps which are not naturally nested. It may appear that this would cause difficulties in the MLMC implementation, but Figure 5.9 tries to illustrate that it does not. The underlying Brownian path needs to be sampled at a set of times which are the union of the simulation times used by the coarse and fine path. The independent Brownian increments can be simulated for each time interval, and summed to give $W(t)$ at the required times. An outline algorithm to implement this is given in Algorithm 3.


[Figure 5.9. Generation of Brownian increments for a multilevel simulation with adaptive timestepping; the sampling times of the coarse path and the fine path are marked along the time axis $t$.]

Algorithm 3 Outline of a multilevel algorithm with adaptive timestepping for the time interval [0, T]

  t := 0;  t^c := 0;  t^f := 0
  ΔW^c := 0;  ΔW^f := 0
  while (t < T) do
    t_old := t
    t := min(t^c, t^f)
    h := t − t_old
    ΔW := √h Z
    ΔW^c := ΔW^c + ΔW
    ΔW^f := ΔW^f + ΔW
    if t = t^c then
      update coarse path using ΔW^c
      compute adapted coarse path timestep h^c
      t^c := min(t^c + h^c, T)
      ΔW^c := 0
    end if
    if t = t^f then
      update fine path using ΔW^f
      compute adapted fine path timestep h^f
      t^f := min(t^f + h^f, T)
      ΔW^f := 0
    end if
  end while
  compute P_ℓ − P_{ℓ−1}


5.7. CDF and density estimation

In some applications the objective is an estimation of the cumulative distribution function (CDF) or density of an output quantity $P$, rather than its expected value.

Kebaier and Kohatsu-Higa (2008) addressed the problem of constructing pointwise estimates of the density of the SDE solution $S_T$, using Malliavin calculus and a two-level variance reduction based on Kebaier's earlier work (Kebaier 2005).

Motivated by an interesting engineering application (Iliev, Nagapetyan and Ritter 2013), a multilevel approach to CDF and density estimation for arbitrary output functions has been developed by Giles et al. (2014). The CDF of a scalar output $P$ is represented approximately by a polynomial spline function. Pointwise values of the CDF which are used as knot values in the spline construction correspond to expected values of an indicator function,

    $C(x) \;\equiv\; \mathbb{E}\bigl[ \mathbf{1}_{P < x} \bigr] \;\equiv\; \mathbb{E}\bigl[ H(x - P) \bigr],$

where $H(x)$ is the Heaviside step function. To improve the multilevel variance, this is smoothed to give

    $C_\delta(x) = \mathbb{E}\bigl[\, g((x - P)/\delta) \,\bigr],$

where $g(x)$ is a continuous function with $g(x) = 0$ for $x < -1$, and $g(x) = 1$ for $x > 1$. The parameter $\delta$ controls the degree of smoothing. As $\delta \to 0$, $g(x/\delta) \to H(x)$, and the accuracy improves but the variance of the multilevel estimator increases. Hence, this application has both spline approximation and smoothing errors in addition to the usual discretisation and sampling errors. Balancing these errors gives the best accuracy for a given computational cost. The MLMC approach gives a substantial improvement in the order of complexity, and numerical examples demonstrate the savings which can be achieved.
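One simple concrete choice of $g$ is a piecewise-cubic smooth step; the sketch below (the particular $g$ is just one valid example, not necessarily the one used in (Giles et al. 2014)) estimates $C_\delta(x)$ from a vector of samples `P`:

  % Sketch: smoothed-indicator estimate of C(x) = E[ 1_{P<x} ] from samples P,
  % using g(s) = 1/2 + (3/4)s - (1/4)s^3 on [-1,1], with g = 0 below and 1 above.
  s  = min(max((x - P)/delta, -1), 1);    % clamp (x-P)/delta to [-1,1]
  g  = 0.5 + s.*(0.75 - 0.25*s.^2);       % C^1 smooth step
  Cx = mean(g);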

The same paper also discusses the extension to density estimation. In this case, the density $\rho(x)$ of the scalar output $P$ is given by

    $\rho(x) = \lim_{\delta \to 0}\; \mathbb{E}\bigl[\, \delta^{-1}\, g((x - P)/\delta) \,\bigr],$

where $g(x)$ is a continuous function with $g(x) = 0$ for $|x| > 1$, and

    $\int_{-1}^{1} g(x)\, dx = 1.$

This has similarities with kernel density estimation (Silverman 1986), and can also be generalised to multi-dimensional outputs.


6. Jump diffusion and Lévy processes

6.1. Jump-diffusion processes

With finite activity jump-diffusion processes, such as in the Merton model (Merton 1976), it is natural to simulate each individual jump using a jump-adapted discretisation (Mikulevicius and Platen 1988, Platen and Bruti-Liberati 2010), in which the Brownian diffusion between each jump is approximated using an Euler-Maruyama or Milstein approximation, and then each jump is simulated exactly.

If the jump activity rate is constant, then for each stochastic sample $\omega$ the jumps on the coarse and fine paths will occur at the same time, and therefore the extension of the multilevel method is straightforward, with the coarse and fine paths using the same underlying Brownian paths, and the same random variables to determine the jump times and strengths (Xia and Giles 2012).
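A sketch of the shared jump data for the constant-rate case, assuming the rate `lambda`, horizon `T` and Merton-style lognormal jump parameters `muJ`, `sigJ` are given (all names illustrative):

  % Sketch: draw the jump times and multiplicative jump sizes once, and use
  % them for both the coarse and the fine path of a jump-adapted scheme.
  tj = [];  t = -log(rand)/lambda;        % exponential inter-arrival times
  while t < T
    tj(end+1) = t;
    t = t - log(rand)/lambda;
  end
  J = exp(muJ + sigJ*randn(size(tj)));    % shared lognormal jump sizes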

If the jump rate is path-dependent then the situation is trickier. If there is a known upper bound to the jump rate, then one can use the "thinning" approach of Glasserman and Merener (2004), in which a set of candidate jump times is simulated based on the constant upper bound, and then a subset of these are selected to be real jumps. The problem with the multilevel extension of this is that some candidate jumps will be selected for the coarse path but not for the fine path, or vice versa, leading to an $O(1)$ difference in the paths and hence the payoffs. Xia and Giles (2012) avoid this by using a change of measure to ensure that the jump times are the same for both paths; this introduces a Radon-Nikodym derivative into the payoff evaluation, but significantly reduces the multilevel correction variance.

6.2. More general processes

With infinite activity Lévy processes it is impossible to simulate each jump. One approach is to simulate the large jumps and either neglect the small jumps or approximate their effect by adding a Brownian diffusion term (Dereich 2011, Dereich and Heidenreich 2011, Marxen 2010). Following this approach, the cutoff $\delta_\ell$ for the jumps which are simulated varies with level, and $\delta_\ell \to 0$ as $\ell \to \infty$ to ensure that the bias converges to zero. In the multilevel treatment, when simulating $P_\ell - P_{\ell-1}$ the jumps fall into three categories. The ones which are larger than $\delta_{\ell-1}$ get simulated in both the fine and coarse paths. The ones which are smaller than $\delta_\ell$ are either neglected for both paths, or approximated by the same Brownian increment. The difficulty is in the intermediate range $[\delta_\ell, \delta_{\ell-1}]$, in which the jumps are simulated for the fine path, but neglected or approximated for the coarse path. This is what leads to the difference in path simulations, and contributes to a non-zero value for $P_\ell - P_{\ell-1}$.

Alternatively, for many SDEs driven by a Lévy process it is possible to directly simulate the increments of the Lévy process over a set of uniform timesteps (Schoutens 2003), in exactly the same way as one simulates Brownian increments.


                VG             NIG                α-stable
  -----------------------------------------------------------
  Asian         O(h^2)         O(h^2)             O(h^2)
  lookback      O(h)           O(h |log h|)       O(h^{2/α-δ})
  barrier       o(h^{1-δ})     o(h^{1/2-δ})       o(h^{1/α-δ})

Table 6.3. Numerical analysis convergence rates for the multilevel variance $V_\ell \equiv \mathbb{V}[P_\ell - P_{\ell-1}]$ for Variance-Gamma, NIG, and spectrally-negative α-stable processes and different financial options. δ is any strictly positive constant.

When estimating the price of European options which depend only on the final value of the Lévy process, there is no need for MLMC as one can directly simulate the final value. However, multilevel is still very useful for path-dependent financial options such as Asian, lookback and barrier options, and there has been good progress in the numerical analysis of these applications (Xia 2014, Xia and Giles 2014). Table 6.3 gives the convergence rates obtained from the numerical analysis; numerical experiments support these results but suggest they may not be sharp in some cases.

For other Lévy processes, it may be possible in the future to simulate the increments by constructing near-perfect (i.e. to within the limits of finite precision computer arithmetic) approximations to the inverse of the cumulative distribution function. Where this is possible, it may be the best approach to achieve a perfect coupling between the coarse and fine path simulations, since the increments of the driving Lévy process for the coarse path can be obtained trivially by summing the increments for the fine path.

Finally, we mention the work of Ferreiro-Castilla, Kyprianou, Scheichl and Suryanarayana (2014), who develop a multilevel approach for barrier options using a Wiener-Hopf factorisation technique to sample exactly from the joint distribution of the terminal value of a Lévy process at an exponential random stopping time, and its maximum (or minimum) over the intervening interval. The particularly novel aspect here is the random terminal time; by using multiple shorter random periods it is possible, because of the Central Limit Theorem, to get closer to a desired fixed terminal time, but at an increased cost. In the multilevel coupling, the thinning approach of Glasserman and Merener (2004) is again used to link the coarse and fine path simulations. Numerical analysis for lookback options gives a complexity which is $O(\varepsilon^{-3})$ for processes of bounded variation, and $O(\varepsilon^{-4})$ for processes of unbounded variation. Numerical experiments for barrier options demonstrate a similar complexity.


7. PDEs and SPDEs

After the initial work with MLMC for SDEs, it was immediately clear that it was equally applicable to SPDEs, and indeed the computational savings would be greater because the cost of a single sample increases more rapidly with grid resolution for SPDEs with higher space-time dimension. The first published work, by Graubner (2008), was on parabolic SPDEs, and since then there has been a variety of papers on elliptic (Barth, Schwab and Zollinger 2011, Charrier, Scheichl and Teckentrup 2013, Cliffe et al. 2011, Teckentrup, Scheichl, Giles and Ullmann 2013), parabolic (Barth, Lang and Schwab 2013, Giles and Reisinger 2012) and hyperbolic (Mishra, Schwab and Sukys 2012) SPDEs, as well as for mixed elliptic-hyperbolic systems (Efendiev, Iliev and Kronsbein 2013, Müller, Jenny and Meyer 2013).

In almost all of this work, the construction of the multilevel estimator is quite natural, using a geometric sequence of grids and the usual estimators for $P_\ell - P_{\ell-1}$. It is the numerical analysis of the variance of the multilevel estimator which is often very challenging.

7.1. Two simple examples

The MATLAB codes accompanying this article include two simple PDE examples.

The first is really a boundary-value ODE application, but it can be viewed as a 1D elliptic PDE, with random coefficients and random forcing. The equation is

    $\frac{d}{dx}\!\left( c(x)\, \frac{du}{dx} \right) = -50\, Z^2,$

on $0 < x < 1$, with boundary data $u(0) = u(1) = 0$. $Z$ is a Normal random variable with zero mean and unit variance, and

    $c(x) = 1 + a\, x,$

with $a$ being a uniform random variable on the unit interval $(0,1)$. The output quantity of interest is chosen to be

    $P = \int_0^1 u(x)\, dx.$

The multilevel implementation is very easy. Level $\ell$ uses a uniform grid with spacing $h_\ell = 2^{-(\ell+1)}$, so there is just one interior grid point on the coarsest level. A simple second order central difference approximation is used (equivalent to a finite element approximation with a 1-point quadrature).
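A one-sample sketch of the level-$\ell$ computation, for a given level `l` (illustrative only, not the accompanying code):

  % Sketch: one level-l sample of the 1D elliptic example, using a
  % flux-conservative central difference with midpoint coefficients.
  h  = 2^(-(l+1));  x = (h:h:1-h)';  n = length(x);
  Z  = randn;  a = rand;
  cm = 1 + a*(x - h/2);                  % c at x_{j-1/2}
  cp = 1 + a*(x + h/2);                  % c at x_{j+1/2}
  A  = diag(cm+cp);                      % discrete -(c u')' operator ...
  for j = 1:n-1
    A(j,j+1) = -cp(j);  A(j+1,j) = -cm(j+1);
  end
  A  = A/h^2;
  u  = A \ ( 50*Z^2*ones(n,1) );         % -(c u')' = 50 Z^2, u(0)=u(1)=0
  P  = h*sum(u);                         % quadrature for the output integral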

The uniform second order accuracy means that there is a constant $K$ such that $|P - P_\ell| < K\, h_\ell^2$, and therefore we have $\alpha=2$, $\beta=4$ and $\gamma=1$, resulting in an $O(\varepsilon^{-2})$ complexity.


[Figure 7.10. Numerical results for a 1D elliptic PDE with random coefficients and random forcing (same panel layout as Figure 5.3).]


All of these features can be verified in the numerical results in Figure 7.10.

The second example is a 1D parabolic SPDE, in which $u(x,t)$ satisfies the SPDE

    $du = \frac{\partial^2 u}{\partial x^2}\, dt + 10\, dW,$

on the domain $0 < x < 1$, with boundary data $u(0,t) = u(1,t) = 0$ and initial data $u(x,0) = 0$. The output functional is chosen to be

    $P = \int_0^1 u^2(x, 0.25)\, dx.$

The multilevel implementation is again very easy. An Euler-Maruyama time discretisation with timestep $k$, combined with a second order space discretisation with uniform grid spacing $h$, gives the discrete approximation

    $u^{n+1}_j = u^n_j + \frac{k}{h^2}\, \bigl( u^n_{j+1} - 2 u^n_j + u^n_{j-1} \bigr) + 10\, \Delta W_n.$
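A vectorised sketch of one such explicit timestep, with `u` holding the nodal values including the zero boundary values (names illustrative):

  % Sketch: one explicit Euler-Maruyama timestep of the 1D SPDE scheme;
  % the scalar Brownian increment dW forces every interior node equally.
  dW = sqrt(k)*randn;
  u(2:end-1) = u(2:end-1) + (k/h^2)*( u(3:end) - 2*u(2:end-1) + u(1:end-2) ) + 10*dW;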

The level $\ell$ approximation uses

    $h_\ell = 2^{-(\ell+1)}, \qquad k_\ell = \tfrac{1}{4}\, h_\ell^2.$

Keeping $k_\ell / h_\ell^2 = \tfrac{1}{4}$ ensures the explicit numerical discretisation is stable on all levels. Since the number of grid points doubles on each level, and the number of timesteps increases by factor 4, the cost per sample increases by factor 8, giving $\gamma=3$.

Because the stochastic forcing is additive, i.e. the volatility does not depend on $u(x,t)$, the Euler-Maruyama discretisation is actually equivalent to a Milstein discretisation, and hence the time discretisation errors are $O(k)$. This is of the same order as the spatial discretisation errors, which are $O(h^2)$, and therefore the solution error is $O(2^{-2\ell})$, and hence $\alpha=2$ and $\beta=4$. This leads to the optimal complexity of $O(\varepsilon^{-2})$.

These features are again confirmed in the numerical results shown in Figure 7.11.

7.2. Elliptic SPDE

The largest amount of research on multilevel for SPDEs has been for elliptic PDEs with random coefficients. The PDE typically has the form

    $-\nabla \cdot \bigl( \kappa(x,\omega)\, \nabla p(x,\omega) \bigr) = 0, \qquad x \in D,$

with Dirichlet or Neumann boundary conditions on the boundary $\partial D$. For sub-surface flow problems, such as the modelling of groundwater flow in nuclear waste repositories, the diffusivity (or permeability) $\kappa$ is often modelled as a lognormal random field, i.e. $\log \kappa$ is a Gaussian field with a uniform mean (which we will take to be zero for simplicity) and a covariance function of the general form $R(x, y) = r(x - y)$.


[Figure 7.11. Numerical results for a 1D parabolic SPDE with Brownian forcing (same panel layout as Figure 5.3).]


Samples of $\log \kappa$ are provided by a Karhunen-Loève expansion:

    $\log \kappa(x,\omega) = \sum_{n=0}^{\infty} \sqrt{\theta_n}\; \xi_n(\omega)\, f_n(x),$

where $\theta_n$ are the eigenvalues of $R(x,y)$ in decreasing order, $f_n$ are the corresponding eigenfunctions, and $\xi_n$ are independent unit Normal random variables. However, it is more efficient to generate them using a circulant embedding technique which enables the use of FFTs (Dietrich and Newsam 1997).

The multilevel treatment is straightforward. The spatial grid resolution is doubled on each level. Using the Karhunen-Loève generation, the expansion is truncated after $K_\ell$ terms, with $K_\ell$ increasing with level (Teckentrup et al. 2013); a similar approach has also been used with the circulant embedding generation.

In both cases, $\log \kappa$ is generated using a row-vector of independent unit Normal random variables $\xi$. The variables for the fine level can be partitioned into those for the coarse level $\xi_{\ell-1}$, plus some additional variables $z_\ell$, giving $\xi_\ell = (\xi_{\ell-1}, z_\ell)$.
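With the truncated Karhunen-Loève representation this coupling is immediate: the coarse sample simply re-uses the leading entries of the fine-level coefficient vector. A sketch, assuming the eigenvalues `theta`, the discretised eigenfunctions `fn` (one column per mode) and the truncation levels `Kl`, `Klm1` have been precomputed (all names illustrative):

  % Sketch: coupled fine/coarse samples of log(kappa) from a truncated KL
  % expansion; the coarse level re-uses the first Klm1 Normal coefficients.
  xi     = randn(Kl,1);                                     % xi_l = (xi_{l-1}, z_l)
  logk_f = fn(:,1:Kl)   * ( sqrt(theta(1:Kl))   .* xi );
  logk_c = fn(:,1:Klm1) * ( sqrt(theta(1:Klm1)) .* xi(1:Klm1) );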

The numerical analysis of the multilevel approach for these elliptic SPDE applications is challenging because the diffusivity is unbounded, but Charrier, Scheichl and Teckentrup (Charrier et al. 2013) have successfully analysed it for certain output functionals, and Teckentrup et al. have further developed the analysis for other output functionals and more general log-normal diffusivity fields (Teckentrup et al. 2013, Teckentrup 2013).

7.3. Parabolic SPDE

Giles and Reisinger (Giles and Reisinger 2012) consider an unusual SPDE from credit default modelling,

    $dp = -\mu\, \frac{\partial p}{\partial x}\, dt + \frac{1}{2}\, \frac{\partial^2 p}{\partial x^2}\, dt - \sqrt{\rho}\; \frac{\partial p}{\partial x}\, dM_t, \qquad x > 0,$

subject to boundary condition $p(0,t) = 0$. Here $p(x,t)$ represents the probability density function for firms being a distance $x$ from default at time $t$. The diffusive term is due to idiosyncratic factors affecting individual firms, while the stochastic term due to the scalar Brownian motion $M_t$ corresponds to the systemic movement due to random market effects affecting all firms. The payoff corresponds to different tranches of a credit derivative which depends on the integral $\int_0^\infty p(x,t)\, dx$ at a set of discrete times.

A Milstein time discretisation with timestep $k$, and a central space discretisation of the spatial derivatives with uniform spacing $h$, gives the numerical approximation


    $p^{n+1}_j = p^n_j - \frac{\mu\, k + \sqrt{\rho\, k}\, Z_n}{2h}\, \bigl( p^n_{j+1} - p^n_{j-1} \bigr) + \frac{(1-\rho)\, k + \rho\, k\, Z_n^2}{2h^2}\, \bigl( p^n_{j+1} - 2 p^n_j + p^n_{j-1} \bigr),$

where $p^n_j \approx p(j h, n k)$, and the $Z_n$ are standard Normal random variables, so that $\sqrt{k}\, Z_n$ corresponds to an increment of the driving scalar Brownian motion.

The multilevel implementation is very straightforward. As with the model parabolic example discussed previously, $k_\ell = k_{\ell-1}/4$ and $h_\ell = h_{\ell-1}/2$, due to numerical stability considerations which are analysed in the paper. The coupling between the coarse and fine samples comes from summing the fine path Brownian increments to give the increments for the coarse path, exactly the same as for SDEs. The computational cost increases by factor 8 on each level, and numerical experiments indicate that the variance decreases by factor 8, so the overall computational complexity to achieve an $O(\varepsilon)$ RMS error is again $O(\varepsilon^{-2}(\log \varepsilon)^2)$.

This work has been extended by Bujok and Reisinger (2012) to include a systemic jump-diffusion term. They also switch from continuous monitoring of default, which gives the boundary condition $p(0,t) = 0$ for all $t$, to discrete monitoring, which introduces the boundary condition $p(x,t) = 0$ for $x \le 0$ for each discrete default date $t$.

7.4. Reduced basis approximation

New research by Vidal-Codina et al. (2014) uses a very different approach to SPDE approximation. They start with a high accuracy finite element approximation

    $A(\omega)\, u(\omega) = f(\omega),$

where $\omega$ represents the dependency on the stochastic input parameters. They then use a reduced basis approximation with a fixed set of basis functions $u^k$,

    $u(\omega) \approx \sum_{k=1}^{K} u_k(\omega)\, u^k,$

which leads to a much smaller system of equations

    $A_K(\omega) \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_K \end{pmatrix} = f_K(\omega).$


The output of interest is a function of the solution, so that they are interested in estimating $\mathbb{E}[f(u)]$. There is an accuracy-cost tradeoff here; larger values for $K$ lead to greater accuracy, but at a much larger cost. The MLMC treatment makes $K$ a function of the level $\ell$, so that most simulations are performed with a small value for $K$, and relatively few with a large value.

In this application, geometric MLMC is not appropriate and so they face the question of how to choose the levels. As discussed already in Section 2.6, they start with a set of candidate levels and use a number of trial samples to obtain an estimate of the variances $V_{\ell_1,\ell_2} \equiv \mathbb{V}[P_{\ell_1} - P_{\ell_2}]$ for all pairs of levels. They then use this to determine the optimal subset of levels for the main computation.

8. Continuous-time Markov chains

Anderson and Higham (2012) developed a very interesting application of MLMC to continuous-time Markov Chain simulation. Although they present their work in the context of stochastic chemical reactions, when species concentrations are extremely low and so stochastic effects become significant, they point out that the method has wide applicability in other areas.

When there is just one chemical reaction, the "tau-leaping" method (which is essentially the Euler-Maruyama method, approximating the reaction rate as being constant throughout the timestep) gives the discrete equation

    $x_{n+1} = x_n + P\bigl( h\, \lambda(x_n) \bigr),$

where $h$ is the timestep, $\lambda(x_n)$ is the reaction rate (or propensity function), and $P(t)$ represents a unit-rate Poisson random variable over the time interval $[0,t]$. If this equation defines the fine path in the multilevel simulation, then the coarse path, with double the timestep, is given by

    $x^c_{n+2} = x^c_n + P\bigl( 2h\, \lambda(x^c_n) \bigr),$

for even timesteps $n$.

The question then is how to couple the coarse and fine path simulations. The key observation in (Anderson and Higham 2012), based on earlier work in a different context by Kurtz (1982), is that for any $t_1, t_2 > 0$, the sum of two independent Poisson variates $P(t_1)$, $P(t_2)$ is equivalent in distribution to $P(t_1 + t_2)$. Based on this, the first step is to express the coarse path Poisson variate as the sum of two Poisson variates, $P(h\, \lambda(x^c_n))$, corresponding to the first and second fine path timesteps. For the first of the two fine timesteps, the coarse and fine path Poisson variates are coupled by defining two Poisson variates based on the minimum of the two reaction rates, and the absolute difference,

    $P_1 = P\bigl( h\, \min(\lambda(x_n), \lambda(x^c_n)) \bigr), \qquad P_2 = P\bigl( h\, |\lambda(x_n) - \lambda(x^c_n)| \bigr),$


and then using $P_1$ as the Poisson variate for the path with the smaller rate, and $P_1 + P_2$ for the path with the larger rate. This elegant approach naturally gives a small difference in the Poisson variates when the difference in rates is small, and leads to a very effective multilevel algorithm with a correction variance which is $O(h)$, leading to an $O(\varepsilon^{-2}(\log \varepsilon)^2)$ complexity.
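A sketch of this coupling for a single fine timestep, for one reaction whose firing increments the state by one, using `poissrnd` from the Statistics Toolbox; `lam` is a hypothetical propensity function, the coarse propensity is the one frozen at the start of the coarse timestep, and all names are illustrative:

  % Sketch: coupled Poisson variates for one fine timestep of length h.
  lf = lam(x);   lc = lam(xc_frozen);        % fine and frozen coarse propensities
  P1 = poissrnd( h*min(lf,lc) );
  P2 = poissrnd( h*abs(lf-lc) );
  if lf >= lc
    dNf = P1 + P2;   dNc = P1;               % larger rate receives P1 + P2
  else
    dNf = P1;        dNc = P1 + P2;
  end
  x  = x  + dNf;                             % fine path updates every fine step
  Nc = Nc + dNc;                             % coarse path adds Nc after two fine steps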

In their paper, Anderson and Higham (2012) treat more general systems with multiple reactions. They also include an additional coupling at the finest level to the exact Stochastic Simulation Algorithm developed by Gillespie (1976), which updates the reaction rates after every single reaction. Hence, their overall multilevel estimator is unbiased, unlike the estimators discussed earlier for SDEs, and the complexity is reduced to $O(\varepsilon^{-2})$ because the number of levels remains fixed as $\varepsilon \to 0$. They give a complete numerical analysis of the variance of their multilevel algorithm; this has been further sharpened in more recent work (Anderson, Higham and Sun 2014). Because stochastic chemical simulations typically involve thousands of reactions, the multilevel method is particularly effective in this context, providing computational savings in excess of a factor of 100 (Anderson and Higham 2012).

They also give an interesting numerical example in which an approximate model with fewer reactions/reactants is used as a control variate for the full system. This kind of multilevel modelling is another possibility which could be considered in a wide variety of circumstances.

Moraes, Tempone and Vilanova (2014b) have extended these ideas by adaptively switching between tau-leaping, which is suitable when reaction rates are high, and exact simulation, which is more appropriate when reaction rates are low relative to the timestep. They also develop a more accurate estimator for the multilevel variance, which otherwise suffers due to high kurtosis on the finest grid levels, and a method for controlling the probability that the tau-leaping leads to $x$ becoming negative.

In a second paper, Moraes, Tempone and Vilanova (2014a) generalise the adaptive switching between tau-leaping and exact simulation to treat each reaction separately. This is very beneficial when different reactions have very different propensities. They also introduce a novel control variate for the coarsest simulation level which significantly reduces the total simulation cost. Numerical examples demonstrate impressive cost savings of up to 1000x relative to standard methods.

The non-nested adaptive timestepping approach described in Section 5.6 for SDEs is equally applicable in this setting. As described in (Giles, Lester and Whittle 2015, Lester, Yates, Giles and Baker 2015), the construction is exactly the same as illustrated in Figure 5.9, but with Poisson variates for each time interval instead of Brownian increments. This adaptive timestepping can be very helpful in cases in which propensities vary greatly in time.


9. Nested simulation

9.1. MLMC treatment

In nested simulations we are interested in estimating quantities of the form

\[
\mathbb{E}_Z\Big[ f\big( \mathbb{E}_W[\, g(Z,W) \,] \big) \Big],
\]

where $Z$ is an outer random variable, and $\mathbb{E}_W[g(Z,W)]$ is a conditional expectation with respect to an independent inner random variable $W$.

For example, in some financial applications $Z$ represents different risk scenarios, $\mathbb{E}_W[g(Z,W)]$ represents the exposure conditional on the scenario $Z$, and $f$ might correspond to the loss in excess of the capital reserves, so that $\mathbb{E}_Z\big[ f\big( \mathbb{E}_W[g(Z,W)] \big) \big]$ is the expected shortfall.

This can be simulated using nested Monte Carlo simulation with $N$ outer samples $Z^{(n)}$, $M$ inner samples $W^{(m,n)}$ and a standard Monte Carlo estimator:

\[
Y = N^{-1} \sum_{n=1}^{N} f\Big( M^{-1} \sum_{m=1}^{M} g\big(Z^{(n)}, W^{(m,n)}\big) \Big).
\]

Note that to improve the accuracy of the estimate we need to increase both $M$ and $N$, and this will significantly increase the cost.
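A direct transcription of this estimator is straightforward; the following sketch assumes user-supplied samplers sample_Z and sample_W and functions f and g, all of which are illustrative names rather than anything prescribed by the text.

    import numpy as np

    def nested_mc(f, g, sample_Z, sample_W, N, M, rng):
        # Plain nested Monte Carlo: N outer samples, each with M inner samples.
        total = 0.0
        for _ in range(N):
            Z = sample_Z(rng)
            inner_mean = np.mean([g(Z, sample_W(rng)) for _ in range(M)])
            total += f(inner_mean)
        return total / N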

An MLMC implementation is straightforward; on level $\ell$ we can use $M_\ell = 2^\ell$ inner samples. To construct a low-variance estimate for $\mathbb{E}[P_\ell - P_{\ell-1}]$, where

\[
\mathbb{E}[P_\ell] \equiv \mathbb{E}_Z\Big[ f\Big( M_\ell^{-1} \sum_{m=1}^{M_\ell} g\big(Z, W^{(m)}\big) \Big) \Big],
\]

we use an antithetic approach and split the $M_\ell$ samples of $W$ for the "fine" value into two subsets of size $M_{\ell-1}$ for the "coarse" value:

\[
Y_\ell = N_\ell^{-1} \sum_{n=1}^{N_\ell} \bigg\{
f\Big( M_\ell^{-1} \sum_{m=1}^{M_\ell} g\big(Z^{(n)},W^{(m,n)}\big) \Big)
- \tfrac{1}{2}\, f\Big( M_{\ell-1}^{-1} \sum_{m=1}^{M_{\ell-1}} g\big(Z^{(n)},W^{(m,n)}\big) \Big)
- \tfrac{1}{2}\, f\Big( M_{\ell-1}^{-1} \sum_{m=M_{\ell-1}+1}^{M_\ell} g\big(Z^{(n)},W^{(m,n)}\big) \Big)
\bigg\}.
\]

Note that this has the correct expectation, i.e. $\mathbb{E}[Y_\ell] = \mathbb{E}[P_\ell - P_{\ell-1}]$. If we define

\[
M_{\ell-1}^{-1} \sum_{m=1}^{M_{\ell-1}} g\big(Z^{(n)},W^{(m,n)}\big) = \mathbb{E}\big[g(Z^{(n)},W)\big] + \Delta g^{(n)}_1, \qquad
M_{\ell-1}^{-1} \sum_{m=M_{\ell-1}+1}^{M_\ell} g\big(Z^{(n)},W^{(m,n)}\big) = \mathbb{E}\big[g(Z^{(n)},W)\big] + \Delta g^{(n)}_2,
\]

then if $f$ is twice differentiable a Taylor series expansion gives

\[
Y_\ell \approx -\frac{1}{4 N_\ell} \sum_{n=1}^{N_\ell} f''\big( \mathbb{E}[g(Z^{(n)},W)] \big)\, \big( \Delta g^{(n)}_1 - \Delta g^{(n)}_2 \big)^2.
\]

By the Central Limit Theorem, $\Delta g^{(n)}_1, \Delta g^{(n)}_2 = O(M_\ell^{-1/2})$, and therefore

\[
f''\big( \mathbb{E}[g(Z^{(n)},W)] \big)\, \big( \Delta g^{(n)}_1 - \Delta g^{(n)}_2 \big)^2 = O(M_\ell^{-1}).
\]

It follows that $\mathbb{E}[Y_\ell] = O(M_\ell^{-1})$ and $V_\ell = O(M_\ell^{-2})$. For the MLMC theorem, this corresponds to $\alpha=1$, $\beta=2$, $\gamma=1$, so the complexity is $O(\varepsilon^{-2})$.
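A sketch of the resulting level-$\ell$ estimator is given below, assuming the same user-supplied f, g and samplers as in the previous sketch; on level 0 only the single fine term is used. This is an illustrative transcription of the antithetic splitting, not the authors' code.

    import numpy as np

    def nested_mlmc_level(f, g, sample_Z, sample_W, ell, N_ell, rng):
        # Antithetic level-ell estimator: M_ell = 2**ell inner samples for the
        # "fine" value, re-used as two halves of size 2**(ell-1) for the "coarse" values.
        M = 2 ** ell
        total = 0.0
        for _ in range(N_ell):
            Z = sample_Z(rng)
            gs = np.array([g(Z, sample_W(rng)) for _ in range(M)])
            if ell == 0:
                total += f(gs.mean())
            else:
                total += (f(gs.mean())
                          - 0.5 * f(gs[: M // 2].mean())
                          - 0.5 * f(gs[M // 2:].mean()))
        return total / N_ell

    # the MLMC estimate is the sum of these level estimators over ell = 0, ..., L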

This antithetic approach to nested simulation has been developed independently by several authors (Haji-Ali 2012, Chen and Liu 2012, Bujok, Hambly and Reisinger 2013).

Haji-Ali (2012) used it in a mean field model for the motion of crowds, in which each person is modelled as an independent agent subject to random forcing and an additional force due to the collective influence of the crowd. This same approach is also relevant to mean field problems which arise in plasma physics (Rosin et al. 2014).

Bujok et al. (2013) used multilevel nested simulation for a financial credit derivative application. In their case, the function $f$ was piecewise linear, not twice differentiable, and so the rate of variance convergence was slightly lower, with $\beta = 1.5$. However, this is still sufficiently large to achieve an overall complexity which is $O(\varepsilon^{-2})$.

The pricing of American options is one of the big challenges for Monte Carlo methods in computational finance, and Belomestny, Schoenmakers and Dickmann (2013) have developed a multilevel Monte Carlo method for this purpose, based on Andersen & Broadie's dual simulation method (Andersen and Broadie 2004), in which a key component at each timestep of the simulation is the estimation of a conditional expectation using a number of sub-paths. In their multilevel treatment, Belomestny & Schoenmakers use the same uniform timestep on all levels of the simulation, so the quantity which changes between different levels is the number of sub-samples used to estimate the conditional expectation. Hence, their work is another example of nested simulation.


9.2. MIMC treatment

The analysis in the previous section assumes that we can compute $g(Z,W)$ with $O(1)$ cost, but suppose now that $W$ represents a complete Brownian path, so that $g(Z,W)$ cannot be evaluated exactly; it can only be approximated using some finite number of timesteps. We could continue to use MLMC, and on level $\ell$ we could use $2^\ell$ timesteps. When using the Milstein discretisation (giving first order weak and strong convergence) this would still give $\alpha=1$, $\beta=2$. However, we would now have $\gamma=2$, because the work on successive levels goes up by a factor of 4, due to using twice as many timesteps as well as twice as many inner samples. This then leads to an overall MLMC complexity which is $O(\varepsilon^{-2}(\log\varepsilon)^2)$.

Instead we can use MIMC to get back to the optimal complexity of $O(\varepsilon^{-2})$. We now have a pair of level indices $(\ell_1, \ell_2)$, with the number of inner samples equal to $2^{\ell_1}$ and the number of timesteps proportional to $2^{\ell_2}$. If we use the natural extension of the MLMC estimator to the corresponding MIMC estimator, which means (for $\ell_1 > 0$, $\ell_2 > 0$) using

\[
\begin{aligned}
Y_\ell = N_\ell^{-1} \sum_{n=1}^{N_\ell} \Bigg\{
& \; f\Big( 2^{-\ell_1} \sum_{m=1}^{2^{\ell_1}} g_{\ell_2}\big(Z^{(n)},W^{(m,n)}\big) \Big) \\
& - \tfrac{1}{2}\, f\Big( 2^{-\ell_1+1} \sum_{m=1}^{2^{\ell_1-1}} g_{\ell_2}\big(Z^{(n)},W^{(m,n)}\big) \Big)
  - \tfrac{1}{2}\, f\Big( 2^{-\ell_1+1} \sum_{m=2^{\ell_1-1}+1}^{2^{\ell_1}} g_{\ell_2}\big(Z^{(n)},W^{(m,n)}\big) \Big) \\
& - f\Big( 2^{-\ell_1} \sum_{m=1}^{2^{\ell_1}} g_{\ell_2-1}\big(Z^{(n)},W^{(m,n)}\big) \Big) \\
& + \tfrac{1}{2}\, f\Big( 2^{-\ell_1+1} \sum_{m=1}^{2^{\ell_1-1}} g_{\ell_2-1}\big(Z^{(n)},W^{(m,n)}\big) \Big)
  + \tfrac{1}{2}\, f\Big( 2^{-\ell_1+1} \sum_{m=2^{\ell_1-1}+1}^{2^{\ell_1}} g_{\ell_2-1}\big(Z^{(n)},W^{(m,n)}\big) \Big)
\Bigg\}.
\end{aligned}
\]

The subscript on the g terms denotes the level of timestep approximation.

Carrying out the same analysis as before, performing the Taylor series expansion around $\mathbb{E}[g(Z^{(n)},W)]$, we obtain

\[
Y_\ell \approx -\frac{1}{4 N_\ell} \sum_{n=1}^{N_\ell} f''\big( \mathbb{E}[g(Z^{(n)},W)] \big)
\Big\{ \big( \Delta g^{(n)}_{1,\ell_2} - \Delta g^{(n)}_{2,\ell_2} \big)^2
     - \big( \Delta g^{(n)}_{1,\ell_2-1} - \Delta g^{(n)}_{2,\ell_2-1} \big)^2 \Big\}.
\]

The difference of squares can be re-arranged as

\[
\begin{aligned}
\big( \Delta g^{(n)}_{1,\ell_2} - \Delta g^{(n)}_{2,\ell_2} \big)^2
- \big( \Delta g^{(n)}_{1,\ell_2-1} - \Delta g^{(n)}_{2,\ell_2-1} \big)^2
= & \; \Big( \big(\Delta g^{(n)}_{1,\ell_2} + \Delta g^{(n)}_{1,\ell_2-1}\big) - \big(\Delta g^{(n)}_{2,\ell_2} + \Delta g^{(n)}_{2,\ell_2-1}\big) \Big) \\
& \times \Big( \big(\Delta g^{(n)}_{1,\ell_2} - \Delta g^{(n)}_{1,\ell_2-1}\big) - \big(\Delta g^{(n)}_{2,\ell_2} - \Delta g^{(n)}_{2,\ell_2-1}\big) \Big).
\end{aligned}
\]

Due to the Central Limit Theorem, we have

\[
\Delta g^{(n)}_{1,\ell_2} + \Delta g^{(n)}_{1,\ell_2-1} = O(2^{-\ell_1/2}), \qquad
\Delta g^{(n)}_{2,\ell_2} + \Delta g^{(n)}_{2,\ell_2-1} = O(2^{-\ell_1/2}),
\]

and assuming first order strong convergence we also have

\[
\Delta g^{(n)}_{1,\ell_2} - \Delta g^{(n)}_{1,\ell_2-1} = O(2^{-\ell_1/2-\ell_2}), \qquad
\Delta g^{(n)}_{2,\ell_2} - \Delta g^{(n)}_{2,\ell_2-1} = O(2^{-\ell_1/2-\ell_2}).
\]

Combining these results we obtain

\[
\big( \Delta g^{(n)}_{1,\ell_2} - \Delta g^{(n)}_{2,\ell_2} \big)^2
- \big( \Delta g^{(n)}_{1,\ell_2-1} - \Delta g^{(n)}_{2,\ell_2-1} \big)^2
= O(2^{-\ell_1-\ell_2}),
\]

and therefore $\mathbb{E}[Y_\ell] = O(2^{-\ell_1-\ell_2})$ and $V_\ell = O(2^{-2\ell_1-2\ell_2})$, with a cost per sample which is $O(2^{\ell_1+\ell_2})$. In Theorem 2 this corresponds to $\alpha_1 = \alpha_2 = 1$, $\beta_1 = \beta_2 = 2$ and $\gamma_1 = \gamma_2 = 1$, so the overall complexity is $O(\varepsilon^{-2})$.

Following the analysis in (Bujok et al. 2013), if the function $f$ is continuous and piecewise differentiable, rather than being twice differentiable, then in the MLMC treatment we would get $\beta = 1.5$, and hence an overall complexity which is $O(\varepsilon^{-2.5})$. On the other hand, with MIMC we would have $\beta_1 = \beta_2 = 1.5$ and so the complexity would remain $O(\varepsilon^{-2})$. This illustrates the benefit of the MIMC approach compared to standard MLMC.
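The corresponding level-$(\ell_1, \ell_2)$ increment can be sketched as follows. Here g_approx(Z, W, l) is an assumed user-supplied routine which approximates g(Z, W) using 2^l timesteps while re-using the same driving path W at both timestep levels; the boundary cases ℓ1 = 0 and ℓ2 = 0 simply drop the corresponding difference. This is an illustrative sketch, not the MIMC implementation of any of the cited papers.

    import numpy as np

    def mimc_nested_increment(f, g_approx, sample_Z, sample_W, l1, l2, N, rng):
        # MIMC increment: an antithetic difference over the number of inner
        # samples (2**l1) crossed with a first difference over the timestep index l2.
        M = 2 ** l1

        def antithetic(gs):
            # f of the full inner mean, minus the average of f over the two halves
            if l1 == 0:
                return f(gs.mean())
            return (f(gs.mean())
                    - 0.5 * f(gs[: M // 2].mean())
                    - 0.5 * f(gs[M // 2:].mean()))

        total = 0.0
        for _ in range(N):
            Z = sample_Z(rng)
            Ws = [sample_W(rng) for _ in range(M)]
            inc = antithetic(np.array([g_approx(Z, W, l2) for W in Ws]))
            if l2 > 0:   # subtract the same construction at the coarser timestep level
                inc -= antithetic(np.array([g_approx(Z, W, l2 - 1) for W in Ws]))
            total += inc
        return total / N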

10. Other applications

10.1. Markov chains and limiting distributions

In recent research, Glynn and Rhee (2014) consider the application of their randomised version of MLMC to Markov chains, and in particular have addressed the problem of estimating quantities which are expectations with respect to a limiting distribution.

Figure 10.12. Level $\ell$ and $\ell-1$ paths for contracting Markov chains (the paths start at $n = -N_\ell$ and $n = -N_{\ell-1}$ respectively, and both terminate at $n = 0$).

In their application, the Markov chain $X_n$ in a metric space with metric $d$ is defined by

\[
X_0 = x, \qquad X_{n+1} = \phi_n(X_n), \quad n \ge 0,
\]

where $\phi_n$ is a sequence of i.i.d. random functions. Furthermore, it is assumed that the $\phi$'s are contracting on average, in the sense that

\[
\sup_{x \neq y} \; \mathbb{E}\left[ \left( \frac{d\big(\phi_n(x), \phi_n(y)\big)}{d(x,y)} \right)^{2\gamma} \right] < 1
\]

for some $\gamma \in (0,1)$. Under these conditions, it is known that the distribution of $X_n$ converges weakly to that of a limit random variable $X_\infty$, and their objective is to estimate expectations of the form $\mathbb{E}[f(X_\infty)]$, where $f$ is Hölder continuous with exponent $\gamma$, so that $|f(y) - f(x)| \le d(x,y)^\gamma$.

An example they offer of a chain satisfying the required conditions is

\[
X_0 = 0, \qquad X_{n+1} = \tfrac{1}{2} X_n + \xi_n, \quad n \ge 0,
\]

where $\mathbb{P}(\xi_n = 0) = \mathbb{P}(\xi_n = 1) = \tfrac{1}{2}$. The invariant distribution in this case is the uniform distribution on $[0,2]$.

In the standard MC approach, one would approximate $\mathbb{E}[f(X_\infty)]$ by estimating $\mathbb{E}[f(X_N)]$ for some large value of $N$, but this would be a biased estimate. Glynn and Rhee (2014) circumvent this problem by making $N$ increase with level, starting a level $\ell$ simulation at $n = -N_\ell$ and terminating it at $n = 0$, as illustrated in Figure 10.12. The fact that the different levels terminate at the same time is crucial to the multilevel coupling: the level $\ell$ and $\ell-1$ simulations share the same random $\phi_n$ for $n \ge -N_{\ell-1}$. Because of the contraction property, the effect of the initial evolution of the level $\ell$ path for $n < -N_{\ell-1}$ decays exponentially. This gives a coupling with a multilevel correction variance which decays as $\ell$ increases. Because the decay is exponential in $N_\ell - N_{\ell-1}$, it is appropriate to choose $N_\ell$ to increase linearly with level.
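For the example chain above, the coupling can be sketched as follows; N_of is an assumed user-chosen rule for $N_\ell$ (e.g. linear in $\ell$, as suggested), and the sketch covers levels $\ell \ge 1$, with level 0 using just a single path.

    import numpy as np

    def coupled_chain_increment(f, ell, N_of, rng):
        # Level-ell increment f(X_0 fine) - f(X_0 coarse) for the chain
        # X_{n+1} = X_n/2 + xi_n with xi_n uniform on {0,1}. The fine path runs
        # for N_of(ell) steps and the coarse path for N_of(ell-1) steps, both
        # ending at n = 0 and sharing the xi's over the final N_of(ell-1) steps.
        N_f, N_c = N_of(ell), N_of(ell - 1)
        xi = rng.integers(0, 2, size=N_f)   # xi[-N_c:] drives the final N_c steps
        Xf = 0.0
        for x in xi:                        # fine path: all N_f steps
            Xf = 0.5 * Xf + x
        Xc = 0.0
        for x in xi[-N_c:]:                 # coarse path: shared final N_c steps
            Xc = 0.5 * Xc + x
        return f(Xf) - f(Xc)

    # e.g. N_of = lambda ell: 10 * (ell + 1) gives N_ell increasing linearly with level
    rng = np.random.default_rng(0)
    sample = coupled_chain_increment(np.sin, 3, lambda ell: 10 * (ell + 1), rng)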

A very similar approach can also be used for contracting SDEs which converge to a limiting distribution. For these, the level $\ell$ path will perform a simulation for the time interval $[-T_\ell, 0]$, using timestep $h_\ell$. The coarse and fine paths will share the same driving Brownian path for the overlapping time interval $[-T_{\ell-1}, 0]$, and the contraction property will ensure that the multilevel variance decays with level.


10.2. Variable precision arithmetic

In the latest CPUs from Intel and AMD, each core has a vector unit which can perform 8 single precision or 4 double precision operations with one instruction. Hence, single precision computations can be twice as fast as double precision on CPUs, and the difference can be even greater on GPUs.

As a very simple example of a two-level MLMC application, Giles (2013) suggested using single precision arithmetic for samples on level 0, and double precision arithmetic on level 1. Estimating $V_0$ and $V_1$ as defined in Section 1.2 will automatically lead to the optimal allocation of computational effort between the two levels.
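A minimal two-level sketch of this idea is shown below; the payoff and input sampler are illustrative placeholders, and the point is only that the level-1 correction evaluates the double/single precision difference on the same random inputs, so that the two-level telescoping sum holds.

    import numpy as np

    def two_level_precision_estimate(payoff, sample_inputs, N0, N1, rng):
        # Level 0: cheap single-precision evaluations.
        x0 = sample_inputs(N0, rng).astype(np.float32)
        level0 = np.mean(payoff(x0), dtype=np.float64)
        # Level 1: double-minus-single correction on the *same* inputs.
        x1 = sample_inputs(N1, rng)                       # float64 inputs
        correction = payoff(x1) - payoff(x1.astype(np.float32)).astype(np.float64)
        return level0 + np.mean(correction)

    # illustrative payoff and sampler
    payoff = lambda x: np.maximum(np.exp(x) - 1.0, 0.0)
    sample_inputs = lambda n, rng: rng.standard_normal(n)
    rng = np.random.default_rng(0)
    est = two_level_precision_estimate(payoff, sample_inputs, 10**6, 10**4, rng)

In a full implementation the split between N0 and N1 would be chosen from the estimated variances $V_0$ and $V_1$, exactly as in the standard MLMC sample-size allocation.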

This approach has been generalised in recent research (Brugger, de Schryver, Wehn, Omland, Hefter, Ritter, Kostiuk and Korn 2014), which exploits FPGAs (field-programmable gate arrays), devices which can perform computations with a user-specified number of bits to represent floating-point or fixed-point numbers. Thus, it is possible to implement a multilevel treatment in which the number of bits used increases with level.

The primary challenge is to ensure that the telescoping sum is correctly respected. In their current work, Brugger et al. (2014) use full precision in the generation of the random numbers for the Brownian increments, then perform the rest of the path calculations with the appropriate number of bits, before doing the final summations in full accuracy. This ensures that the Brownian increments computed for the coarser path $\ell-1$ by summing the increments of the finer path $\ell$ are consistent with the increments which would be generated on level $\ell-1$ when it is the finer of the two levels.

This would not be the case if the increments on level $\ell$ were generated with $B_\ell$ bits of accuracy and then summed to give increments for level $\ell-1$, regardless of whether the truncation to the lower accuracy $B_{\ell-1}$ took place before or after the summation. One possible way of correctly generating and using reduced accuracy Brownian increments would be to use a Brownian Bridge construction. Random numbers can be generated in steps by

\[
I_n \;\to\; U_n \;\to\; Z_n,
\]

where $I_n$ is a random integer on a range $[0, I_{\max}]$, $U_n = (I_n + \tfrac{1}{2})/I_{\max}$ is a random, approximately uniformly-distributed variable on the interval $(0,1)$, and $Z_n = \Phi^{-1}(U_n)$ is the corresponding $N(0,1)$ random variable. When using a Brownian Bridge construction, as long as the $I_n$ are generated in exactly the same way for each level, the other two steps can be performed with the level of accuracy appropriate to the level of the path, i.e. with higher precision on level $\ell$ than on level $\ell-1$. The Brownian path being generated for level $\ell$ is then exactly the same, regardless of whether it is the finer or coarser of the two levels being simulated for a particular multilevel correction. Hence, the telescoping sum will be respected.
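The following sketch illustrates the point: the shared integers $I_n$ fix the underlying path, while the uniform and inverse-normal steps are carried out at the precision of the level in question. It is only an assumption-laden illustration: $I_{\max}$ is chosen small enough that the float32 arithmetic is exact, and scipy's double-precision inverse CDF is simply rounded, whereas a real reduced-precision implementation would evaluate $\Phi^{-1}$ itself in the reduced precision.

    import numpy as np
    from scipy.stats import norm

    def normals_from_integers(I, Imax, dtype):
        # Map shared random integers to approximately uniform variables on (0,1)
        # and then to N(0,1) variables, working at the requested precision.
        U = (I.astype(dtype) + dtype(0.5)) / dtype(Imax)
        return norm.ppf(U).astype(dtype)

    rng = np.random.default_rng(1)
    Imax = 2**20                        # small enough for exact float32 arithmetic
    I = rng.integers(0, Imax, size=8)   # generated once, shared by both levels

    Z_coarse = normals_from_integers(I, Imax, np.float32)   # level l-1 precision
    Z_fine   = normals_from_integers(I, Imax, np.float64)   # level l precision
    # Z_coarse and Z_fine differ only by rounding, so a Brownian Bridge built
    # from them describes the same underlying Brownian path on both levels.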


11. Conclusions

In the last few years there has been considerable progress in the theoretical development, application and analysis of multilevel Monte Carlo methods. On the theoretical side, the key extensions are to unbiased randomised estimators for applications with a rapid rate of variance convergence; Richardson-Romberg extrapolation for improved computational efficiency when the rate of variance convergence is low and the weak error has a regular expansion; and multi-index Monte Carlo (MIMC), generalising multilevel to multiple "directions" in which approximations can be refined. On the practical side, the range of applications is growing steadily, now including SDEs, Lévy processes, SPDEs, continuous-time Markov processes, nested simulation, and variable precision arithmetic. Finally, on the numerical analysis side, there has been excellent progress on the variance analysis for the full range of applications, as well as for multilevel QMC algorithms.

This review has attempted to emphasise the conceptual simplicity of the multilevel approach; in essence it is simply a recursive control variate strategy, using cheap inaccurate approximations to some random output quantity as a control variate for more accurate but more costly approximations.

In practice, the challenge is to develop a tight coupling between successive approximation levels, to minimise the variance of the difference in the output obtained from each level. One particular difficulty which can arise in some applications is when the output quantity is a discontinuous function of the intermediate solution, as in the case of a digital option associated with a Brownian diffusion SDE. Three treatments were described to effectively smooth the output and improve the variance in such cases: an analytic conditional expectation, a "splitting" approximation, and a change of measure. Similar treatments have been found to be helpful in other contexts.

It was also shown that techniques such as antithetic variates can be used to improve the variance of the multilevel estimator without improving the strong convergence properties of the numerical discretisation.

Looking to the future, there are many directions for further research. Some areas to highlight are:

• Development and application of algorithms for MLQMC

There has been excellent recent work on the theoretical analysis of MLQMC, but this has not yet been matched by the development and application of improved algorithms.

• New applications using MIMC

MIMC, the multi-index Monte Carlo method, is probably the most exciting new theoretical development in this area, and it offers considerable scope for improved computational efficiency in a range of applications.


• More work on sensitivity analysis

There has been very little research so far on MLMC for sensitivity analysis, but in some application areas, such as mathematical finance, this is very important.

• Work on combinations with other variance reduction techniques, such as Latin Hypercube or importance sampling

It is often helpful to combine different variance reduction approaches to obtain greater overall savings, and it seems likely that MLMC can be helpfully combined with other techniques.

• Nested simulation / mean field games

This looks like a very promising new application area for both MLMC and MIMC. Such simulations can be very expensive using traditional Monte Carlo methods, so MLMC/MIMC may give very substantial computational savings.

The MATLAB codes which produced all of the results in this article are available from (Giles 2014), along with a number of additional MLMC application codes demonstrating adaptive timestepping for SDEs and continuous-time Markov processes, the use of antithetics for multi-dimensional SDEs, and additional SPDE examples.

For further information on multilevel Monte Carlo methods, readers can refer to http://people.maths.ox.ac.uk/gilesm/mlmc_community.html, which lists the research groups working in the area, and their main publications.

Acknowledgements

The author is very grateful to Klaus Ritter and Raul Tempone for their feedback on a draft of this article. Much of the content is the outcome of collaborations and interactions with many researchers, too many to list here but all much appreciated.

REFERENCES

L. Andersen and M. Broadie (2004), 'A primal-dual simulation algorithm for pricing multi-dimensional American options', Management Science 50(9), 1222–1234.
D. Anderson and D. Higham (2012), 'Multi-level Monte Carlo for continuous time Markov chains with applications in biochemical kinetics', SIAM Multiscale Modeling and Simulation 10(1), 146–179.
D. Anderson, D. Higham and Y. Sun (2014), 'Complexity of multilevel Monte Carlo tau-leaping', ArXiv preprint 1310.2676.
A. Asmussen and P. Glynn (2007), Stochastic Simulation, Springer, New York.
R. Avikainen (2009), 'On irregular functionals of SDEs and the Euler scheme', Finance and Stochastics 13(3), 381–401.


I. Babuska, F. Nobile and R. Tempone (2010), 'A stochastic collocation method for elliptic partial differential equations with random input data', SIAM Review 52(2), 317–355.
I. Babuska, R. Tempone and G. Zouraris (2004), 'Galerkin finite element approximations of stochastic elliptic partial differential equations', SIAM Journal on Numerical Analysis 42(2), 800–825.
J. Baldeaux and M. Gnewuch (2014), 'Optimal randomized multilevel algorithms for infinite-dimensional integration on function spaces with ANOVA-type decomposition', SIAM Journal of Numerical Analysis 52(3), 1128–1155.
A. Barth, A. Lang and C. Schwab (2013), 'Multilevel Monte Carlo method for parabolic stochastic partial differential equations', BIT Numerical Mathematics 53(1), 3–27.
A. Barth, C. Schwab and N. Zollinger (2011), 'Multi-level Monte Carlo finite element method for elliptic PDEs with stochastic coefficients', Numerische Mathematik 119(1), 123–161.
D. Belomestny, J. Schoenmakers and F. Dickmann (2013), 'Multilevel dual approach for pricing American style derivatives', Finance and Stochastics 17(4), 717–742.
A. Brandt and V. Ilyin (2003), 'Multilevel Monte Carlo methods for studying large scale phenomena in fluids', Journal of Molecular Liquids 105(2-3), 245–248.
A. Brandt, M. Galun and D. Ron (1994), 'Optimal multigrid algorithms for calculating thermodynamic limits', Journal of Statistical Physics 74(1-2), 313–348.
M. Broadie and P. Glasserman (1996), 'Estimating security price derivatives using simulation', Management Science 42(2), 269–285.
C. Brugger, C. de Schryver, N. Wehn, S. Omland, M. Hefter, K. Ritter, A. Kostiuk and R. Korn (2014), Mixed precision multilevel Monte Carlo on hybrid computing systems, in Proceedings of the Conference on Computational Intelligence for Financial Engineering and Economics, IEEE.
K. Bujok and C. Reisinger (2012), 'Numerical valuation of basket credit derivatives in structural jump-diffusion models', Journal of Computational Finance 15(4), 115–158.
K. Bujok, B. Hambly and C. Reisinger (2013), 'Multilevel simulation of functionals of Bernoulli random variables with application to basket credit derivatives', Methodology and Computing in Applied Probability, 1–26.
H.-J. Bungartz and M. Griebel (2004), 'Sparse grids', Acta Numerica 13, 147–269.
S. Burgos (2014), The computation of Greeks with multilevel Monte Carlo, DPhil thesis, University of Oxford.
S. Burgos and M. Giles (2012), Computing Greeks using multilevel path simulation, in Monte Carlo and Quasi-Monte Carlo Methods 2010 (L. Plaskota and H. Wozniakowski, eds), Springer, pp. 281–296.
J. Charrier, R. Scheichl and A. Teckentrup (2013), 'Finite element error analysis of elliptic PDEs with random coefficients and its application to multilevel Monte Carlo methods', SIAM Journal on Numerical Analysis 51(1), 322–352.
N. Chen and Y. Liu (2012), 'Estimating expectations of functionals of conditional expected via multilevel nested simulation', Presentation at conference on Monte Carlo and Quasi-Monte Carlo Methods, Sydney.


J. Clark and R. Cameron (1980), The maximum rate of convergence of discrete approximations for stochastic differential equations, in Stochastic Differential Systems Filtering and Control (B. Grigelionis, ed.), Vol. 25 of Lecture Notes in Control and Information Sciences, Springer, pp. 162–171.
K. Cliffe, M. Giles, R. Scheichl and A. Teckentrup (2011), 'Multilevel Monte Carlo methods and applications to elliptic PDEs with random coefficients', Computing and Visualization in Science 14(1), 3–15.
N. Collier, A.-L. Haji-Ali, F. Nobile, E. von Schwerin and R. Tempone (2014), 'A continuation multilevel Monte Carlo algorithm', to appear in BIT Numerical Mathematics.
J. Creutzig, S. Dereich, T. Muller-Gronbach and K. Ritter (2009), 'Infinite-dimensional quadrature and approximation of distributions', Foundations of Computational Mathematics 9(4), 391–429.
T. Daun and S. Heinrich (2013), Complexity of Banach space valued and parametric integration, in Monte Carlo and Quasi-Monte Carlo Methods 2012 (J. Dick, F. Kuo, G. Peters and I. Sloan, eds), Springer, pp. 297–316.
T. Daun and S. Heinrich (2014a), 'Complexity of parametric initial value problems in Banach spaces', Journal of Complexity, to appear.
T. Daun and S. Heinrich (2014b), 'Complexity of parametric integration in various smoothness classes', Journal of Complexity, to appear.
S. Dereich (2011), 'Multilevel Monte Carlo algorithms for Levy-driven SDEs with Gaussian correction', Annals of Applied Probability 21(1), 283–311.
S. Dereich and F. Heidenreich (2011), 'A multilevel Monte Carlo algorithm for Levy-driven stochastic differential equations', Stochastic Processes and their Applications 121(7), 1565–1587.
S. Dereich, A. Neuenkirch and L. Szpruch (2012), 'An Euler-type method for the strong approximation of the Cox–Ingersoll–Ross process', Proc. Roy. Soc. A 468(2140), 1105–1115.
J. Dick and M. Gnewuch (2014a), 'Infinite-dimensional integration in weighted Hilbert spaces: anchored decompositions, optimal deterministic algorithms, and higher order convergence', Foundations of Computational Mathematics, 1–51.
J. Dick and M. Gnewuch (2014b), 'Optimal randomized changing dimension algorithms for infinite-dimensional integration on function spaces with ANOVA-type decomposition', Journal of Approximation Theory 184, 111–145.
J. Dick, F. Kuo and I. Sloan (2013), 'High-dimensional integration: the quasi-Monte Carlo way', Acta Numerica 22, 133–288.
J. Dick, F. Pillichshammer and B. Waterhouse (2007), 'The construction of good extensible rank-1 lattices', Math. Comp. 77, 2345–2373.
C. Dietrich and G. Newsam (1997), 'Fast and exact simulation of stationary Gaussian processes through circulant embedding of the covariance matrix', SIAM Journal on Scientific Computing 18(4), 1088–1107.
Y. Efendiev, O. Iliev and C. Kronsbein (2013), 'Multilevel Monte Carlo methods using ensemble level mixed MsFEM for two-phase flow and transport simulations', Computational Geosciences 17(5), 833–850.
A. Ferreiro-Castilla, A. Kyprianou, R. Scheichl and G. Suryanarayana (2014), 'Multi-level Monte-Carlo simulation for Levy processes based on the Wiener-Hopf factorisation', Stochastic Processes and their Applications 124(2), 985–1010.

M. Giles (2008a), Improved multilevel Monte Carlo convergence using the Milstein scheme, in Monte Carlo and Quasi-Monte Carlo Methods 2006 (A. Keller, S. Heinrich and H. Niederreiter, eds), Springer, pp. 343–358.
M. Giles (2008b), 'Multilevel Monte Carlo path simulation', Operations Research 56(3), 607–617.
M. Giles (2009), Multilevel Monte Carlo for basket options, in Proceedings of the 2009 Winter Simulation Conference (M. Rossetti, R. Hill, B. Johansson, A. Dunkin and R. Ingalls, eds), IEEE, pp. 1283–1290.
M. Giles (2013), Multilevel Monte Carlo methods, in Monte Carlo and Quasi-Monte Carlo Methods 2012 (J. Dick, F. Kuo, G. Peters and I. Sloan, eds), Springer, pp. 79–98.
M. Giles (2014), 'Matlab code for multilevel Monte Carlo computations', http://people.maths.ox.ac.uk/gilesm/acta/.
M. Giles and C. Reisinger (2012), 'Stochastic finite differences and multilevel Monte Carlo for a class of SPDEs in finance', SIAM Journal of Financial Mathematics 3(1), 572–592.
M. Giles and L. Szpruch (2013), Antithetic multilevel Monte Carlo estimation for multidimensional SDEs, in Monte Carlo and Quasi-Monte Carlo Methods 2012 (J. Dick, F. Kuo, G. Peters and I. Sloan, eds), Springer, pp. 367–384.
M. Giles and L. Szpruch (2014), 'Antithetic multilevel Monte Carlo estimation for multi-dimensional SDEs without Levy area simulation', Annals of Applied Probability 24(4), 1585–1620.
M. Giles and B. Waterhouse (2009), Multilevel quasi-Monte Carlo path simulation, in Advanced Financial Modelling, Radon Series on Computational and Applied Mathematics, De Gruyter, pp. 165–181.
M. Giles, K. Debrabant and A. Roßler (2013), 'Numerical analysis of multilevel Monte Carlo path simulation using the Milstein discretisation', ArXiv preprint: 1302.4676.
M. Giles, D. Higham and X. Mao (2009), 'Analysing multilevel Monte Carlo for options with non-globally Lipschitz payoff', Finance and Stochastics 13(3), 403–413.
M. Giles, C. Lester and J. Whittle (2015), Simple adaptive timestepping for multilevel Monte Carlo, in Monte Carlo and Quasi-Monte Carlo Methods 2014, Springer, submitted.
M. Giles, T. Nagapetyan and K. Ritter (2014), 'Multilevel Monte Carlo approximation of distribution functions and densities', DFG-SPP 1324 Preprint 157.
D. Gillespie (1976), 'A general method for numerically simulating the stochastic time evolution of coupled chemical reactions', Journal of Computational Physics 22(4), 403–434.
P. Glasserman (2004), Monte Carlo Methods in Financial Engineering, Springer, New York.
P. Glasserman and N. Merener (2004), 'Convergence of a discretization scheme for jump-diffusion processes with state-dependent intensities', Proc. Royal Soc. London A 460, 111–127.


P. Glynn and C.-H. Rhee (2014), 'Exact estimation for Markov chain equilibrium expectations', Journal of Applied Probability.
E. Gobet and S. Menozzi (2010), 'Stopped diffusion processes: overshoots and boundary correction', Stochastic Processes and their Applications 120, 130–162.
S. Graubner (2008), Multi-level Monte Carlo Methoden fur stochastische partielle Differentialgleichungen, Diplomarbeit, TU Darmstadt.
M. Gunzburger, C. Webster and G. Zhang (2014), 'Stochastic finite element methods for partial differential equations with random input data', Acta Numerica 23, 521–650.
A.-L. Haji-Ali (2012), Pedestrian flow in the mean-field limit, MSc thesis, KAUST.
A.-L. Haji-Ali, F. Nobile and R. Tempone (2014a), 'Multi Index Monte Carlo: when sparsity meets sampling', ArXiv preprint: 1405.3757.
A.-L. Haji-Ali, F. Nobile, E. von Schwerin and R. Tempone (2014b), 'Optimization of mesh hierarchies in multilevel Monte Carlo samplers', ArXiv preprint: 1403.2480.
S. Heinrich (1998), 'Monte Carlo complexity of global solution of integral equations', Journal of Complexity 14(2), 151–175.
S. Heinrich (2000), The multilevel method of dependent tests, in Advances in Stochastic Simulation Methods (N. Balakrishnan, V. Melas and S. Ermakov, eds), Springer, pp. 47–61.
S. Heinrich (2001), Multilevel Monte Carlo methods, in Multigrid Methods, Vol. 2179 of Lecture Notes in Computer Science, Springer, pp. 58–67.
S. Heinrich (2006), 'Monte Carlo approximation of weakly singular integral operators', Journal of Complexity 22(2), 192–219.
S. Heinrich and E. Sindambiwe (1999), 'Monte Carlo complexity of parametric integration', Journal of Complexity 15(3), 317–341.
D. Higham, X. Mao and A. Stuart (2002), 'Strong convergence of Euler-type methods for nonlinear stochastic differential equations', SIAM Journal on Numerical Analysis 40(3), 1041–1063.
D. Higham, X. Mao, M. Roj, Q. Song and G. Yin (2013), 'Mean exit times and the multi-level Monte Carlo method', SIAM Journal on Uncertainty Quantification 1(1), 2–18.
H. Hoel, E. von Schwerin, A. Szepessy and R. Tempone (2012), Adaptive multilevel Monte Carlo simulation, in Numerical Analysis of Multiscale Computations (B. Engquist, O. Runborg and Y.-H. Tsai, eds), number 82 in Lecture Notes in Computational Science and Engineering, Springer, pp. 217–234.
H. Hoel, E. von Schwerin, A. Szepessy and R. Tempone (2014), 'Implementation and analysis of an adaptive multilevel Monte Carlo algorithm', Monte Carlo Methods and Applications 20(1), 1–41.
M. Hutzenthaler, A. Jentzen and P. Kloeden (2013), 'Divergence of the multilevel Monte Carlo method', Annals of Applied Probability 23(5), 1913–1966.
O. Iliev, T. Nagapetyan and K. Ritter (2013), Monte Carlo simulation of asymmetric flow field flow fractionation, in Monte Carlo Methods and Applications: Proceedings of the 8th IMACS Seminar on Monte Carlo Methods, de Gruyter, pp. 115–123.


S. Joe and F. Kuo (2008), 'Constructing Sobol sequences with better two-dimensional projections', SIAM Journal on Scientific Computing 30(5), 2635–2654.
A. Kebaier (2005), 'Statistical Romberg extrapolation: a new variance reduction method and applications to options pricing', Annals of Applied Probability 14(4), 2681–2705.
A. Kebaier and A. Kohatsu-Higa (2008), 'An optimal control variance reduction method for density estimation', Stochastic Processes and their Applications 118(2), 2143–2180.
P. Kloeden and E. Platen (1992), Numerical Solution of Stochastic Differential Equations, Springer, Berlin.
F. Kuo, C. Schwab and I. Sloan (2012), 'Multi-level quasi-Monte Carlo finite element methods for a class of elliptic partial differential equations with random coefficients', ArXiv preprint: 1208.6349.
T. Kurtz (1982), Representation and approximation of counting processes, in Advances in filtering and optimal stochastic control, Vol. 42, Springer, pp. 177–191.
P. L'Ecuyer (1990), 'A unified view of the IPA, SF and LR gradient estimation techniques', Management Science 36(11), 1364–1383.
M. Ledoux and M. Talagrand (1991), Probability in Banach spaces: isoperimetry and processes, Springer.
V. Lemaire and G. Pages (2013), 'Multilevel Richardson-Romberg extrapolation', ArXiv preprint: 1401.1177.
C. Lester, C. Yates, M. Giles and R. Baker (2015), 'An adapted multi-level simulation algorithm for stochastic biological systems', Journal of Chemical Physics.
Q. Li (2007), Numerical Approximation for SDE, PhD thesis, University of Edinburgh.
H. Marxen (2010), 'The multilevel Monte Carlo method used on a Levy driven SDE', Monte Carlo Methods and Applications 16(2), 167–190.
R. Merton (1976), 'Option pricing when underlying stock returns are discontinuous', Journal of Finance 3, 125–144.
R. Mikulevicius and E. Platen (1988), 'Time discrete Taylor approximations for Ito processes with jump component', Mathematische Nachrichten 138(1), 93–104.
S. Mishra, C. Schwab and J. Sukys (2012), 'Multi-level Monte Carlo finite volume methods for nonlinear systems of conservation laws in multi-dimensions', Journal of Computational Physics 231(8), 3365–3388.
A. Moraes, R. Tempone and P. Vilanova (2014a), 'A multilevel adaptive reaction-splitting simulation method for stochastic reaction networks', ArXiv preprint 1406.1989.
A. Moraes, R. Tempone and P. Vilanova (2014b), 'Multilevel hybrid Chernoff tau-leap', ArXiv preprint 1403.2943.
F. Muller, P. Jenny and D. Meyer (2013), 'Multilevel Monte Carlo for two phase flow and Buckley-Leverett transport in random heterogeneous porous media', Journal of Computational Physics 250, 685–702.

B. Niu, F. Hickernell, T. Muller-Gronbach and K. Ritter (2010), 'Deterministic multi-level algorithms for infinite-dimensional integration on RN', Journal of Complexity 27(3-4), 331–351.
E. Platen and N. Bruti-Liberati (2010), Numerical Solution of Stochastic Differential Equations With Jumps in Finance, Springer.
T. Primozic (2011), Estimating expected first passage times using multilevel Monte Carlo algorithm, MSc thesis, University of Oxford.
M. Putko, A. Taylor, P. Newman and L. Green (2002), 'Approach for input uncertainty propagation and robust design in CFD using sensitivity derivatives', Journal of Fluids Engineering 124(1), 60–69.
C.-H. Rhee and P. Glynn (2012), A new approach to unbiased estimation for SDEs, in Proceedings of the 2012 Winter Simulation Conference, IEEE.
C.-H. Rhee and P. Glynn (2013), 'Unbiased estimation with square root convergence for SDE models', submitted for publication.
M. Rosin, L. Ricketson, A. Dimits, R. Caflisch and B. Cohen (2014), 'Multilevel Monte Carlo simulation of Coulomb collisions', Journal of Computational Physics 247, 140–157.
W. Schoutens (2003), Levy processes in finance: pricing financial derivatives, Wiley.
B. Silverman (1986), Density Estimation for Statistics and Data Analysis, Chapman & Hall.
A. Speight (2009), 'A multilevel approach to control variates', Journal of Computational Finance 12, 1–25.
A. Teckentrup (2013), Multilevel Monte Carlo methods and uncertainty quantification, PhD thesis, University of Bath.
A. Teckentrup, R. Scheichl, M. Giles and E. Ullmann (2013), 'Further analysis of multilevel Monte Carlo methods for elliptic PDEs with random coefficients', Numerische Mathematik 125(3), 569–600.
F. Vidal-Codina, N. Nguyen, M. Giles and J. Peraire (2014), 'A model and variance reduction method for computing statistical outputs of stochastic elliptic partial differential equations', ArXiv preprint 1409.1526.
Y. Xia (2014), Multilevel Monte Carlo for jump processes, DPhil thesis, University of Oxford.
Y. Xia and M. Giles (2012), Multilevel path simulation for jump-diffusion SDEs, in Monte Carlo and Quasi-Monte Carlo Methods 2010 (L. Plaskota and H. Wozniakowski, eds), Springer, pp. 695–708.
Y. Xia and M. Giles (2014), 'Multilevel Monte Carlo for exponential Levy models', submitted.
D. Xiu and G. Karniadakis (2002), 'The Wiener–Askey polynomial chaos for stochastic differential equations', SIAM Journal on Scientific Computing 24(2), 619–644.