• date post

05-Jul-2020
• Category

## Documents

• view

0

0

Embed Size (px)

### Transcript of Optimal importance sampling using stochastic control › ... › lms2018 ›...

• Optimal importance sampling using

stochastic control Lorenz Richter, Carsten Hartmann

Freie Universität Berlin

The setting

Goal: Compute Eπ [f(X)], f : Rd → R. Model π as stationary distribution of a stochastic process Xt ∈ Rd associated with path measure P and described by

dXt = ∇ log π(Xt)dt+ √ 2dWt

and compute EP[f(XT )].

Objective: Deal with variance and convergence speed.

Idea: Consider the controlled process associated with path measure Qu

dXut = ( ∇ log π(Xut ) +

√ 2u(Xut , t)

) dt+

√ 2dWt

and do importance sampling in path space: EQu [ f(XuT )

dP dQu

] .

Outlook: Exploit ergodicity to find a good control for infinite times:

lim T→∞

1

T

∫ T 0

f(Xt)dt = Eπ [f(X)].

Optimal change of measure

Donsker-Varadhan: Consider W (X0:T ) = ∫ T 0 g(Xt, t)dt+h(XT ).The duality

between sampling and an optimal change of measure

γ(x, t) := − logEP[exp(−W )|Xt = x] = infQu≪P {EQu [W ] + KL(Qu∥P)}

brings a zero-variance estimator, i.e. VarQ∗ ( exp (−W ) dP

dQ∗ ) = 0.

→ γ(x, t) is the value function of a control problem and fulfills the HJB equation

→ the control costs are J(u) = E [∫ T

0

( g(Xut , t) +

1 2 |u(Xut , t)|2

) dt+ h(XuT )

] → the optimal change of measure reads dQ∗ = e

−W

E[e−W ]dP

→ the optimal control is u∗(x, t) = − √ 2∇xγ(x, t)

→ dP dQu is computed via Girsanov’s theorem

Computing the optimal change of measure

Gradient descent Parametrize the control (i.e. the change of measure) in ansatz functions φi:

u(x, t) = − √ 2

m∑ i=1

αi(t)φi(x).

Compute the minimization of the costs J(u) with a gradient descent in α, i.e.

αk+1 = αk − ηk∇αĴ(u(αk))

Different estimators of the gradient scale differently with the time horizon T : – Gfinite differences ∝ T – Gcentered likelihood ratio ∝ T 2

– Glikelihood ratio ∝ T 3

Cross-entropy method It holds J(u) = J(u∗) + KL(Qu∥Q∗), however we minimize KL(Q∗∥Qu) since this is feasible. Again parametrize u(α) in ansatz functions. We then need to minimize the cross-entropy functional for which we get a necessary optimality condition in form of the linear equation Sα = b, which we solve iteratively.

Approximate policy iteration Successive linearization of HJB equation: the cost functional J(u) solves a PDE of the form

A(u)J(u) = l(x, u),

where γ(x, t) = minu J(u;x, t). Start with an initial guess of the control policy and iterate

uk+1(x, t) = − √ 2∇xJ(uk;x, t),

yielding a fixed point iteration in the non-optimal control u.

Numerical simulations

• Sample E[exp(−αXT )] with π = N (0, 1) and u∗(x, t) = − √ 2αet−T ,

0 1 2 3 4 5 t

0

5

10

15

e X t

0 1 2 3 4 5 t

0

2

4

6

8

10

e X

u t d d

x

2 1 0 1 2

t

0 1

2 3

4 5

u * (

x, t)

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7

x

2 1 0 1 2

t

0 1

2 3

4 5

u( x,

t)

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7

• normalized variances with α = 1, T = 5,E[exp(−αXT )] = 1.649:

∆t vanilla with u∗ with û 10−2 4.64 1.70× 10−4 5.69× 10−2 10−3 4.02 1.69× 10−7 3.21× 10−2 10−4 4.55 1.73× 10−8 6.45× 10−2

• Sample E[XT ] with π = N (b, 1), i.e. consider h(x) = − log x, w.r.t.

dXs = (−Xs + b)ds+ √ 2dWs, Xt = x

– then XT ∼ N (b+ (x− b)et−T , 1− e2(t−T ))

u∗(x, t) =

√ 2et−T

b+ (x− b)et−T

• Molecular dynamics: compute transition probabilities, i.e. first hitting times

– consider g = 1, h = 0, i.e. we sample hitting times E[e−τ ]

– optimal change of measure can be inter- preted as a tilting of the potential guiding the dynamics

2.0 1.5 1.0 0.5 0.0 0.5 1.0 1.5 2.0 x

0

2

4

6

8

10

12

14

G (x

)

1 3 5 7 9 11 13 15 optimal

References

 Hartmann, C., Richter, L., Schütte, C., & Zhang, W. (2017). Variational characterization of free energy: theory and algorithms. Entropy, 19(11), 626.  Richter, L. (2016). Efficient statistical estimation using stochastic control and optimization. Master’s thesis, Freie Universität Berlin.  Fleming, W.H., Soner, H.M. (2006). Controlled Markov processes and viscosity solutions. Springer, New York, NY, USA.  Dai Pra, P., Meneghini, L., Runggaldier, W.J. (1996) Connections between stochastic control and dynamic games. Math. Control Signals Syst, 9, 303-326.