
Optimal importance sampling using stochastic control

Lorenz Richter, Carsten Hartmann
Freie Universität Berlin

The setting

Goal: Compute $\mathbb{E}_\pi[f(X)]$ for $f : \mathbb{R}^d \to \mathbb{R}$. Model $\pi$ as the stationary distribution of a stochastic process $X_t \in \mathbb{R}^d$ associated with a path measure $\mathbb{P}$ and described by

$$dX_t = \nabla \log \pi(X_t)\,dt + \sqrt{2}\,dW_t,$$

and compute $\mathbb{E}_{\mathbb{P}}[f(X_T)]$.

Objective: Reduce the variance of the estimator and improve its convergence speed.

Idea: Consider the controlled process associated with path measure $\mathbb{Q}^u$,

$$dX_t^u = \big( \nabla \log \pi(X_t^u) + \sqrt{2}\,u(X_t^u, t) \big)\,dt + \sqrt{2}\,dW_t,$$

and do importance sampling in path space:

$$\mathbb{E}_{\mathbb{Q}^u}\left[ f(X_T^u)\,\frac{d\mathbb{P}}{d\mathbb{Q}^u} \right].$$
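To make the path-space estimator concrete, here is a minimal one-dimensional Euler-Maruyama sketch (the function name importance_sampling_estimate and all numerical defaults are our choices, not from the poster). The Girsanov weight along a controlled path is $\frac{d\mathbb{P}}{d\mathbb{Q}^u} = \exp\big( -\int_0^T u(X_t^u, t)\,dW_t - \tfrac{1}{2}\int_0^T |u(X_t^u, t)|^2\,dt \big)$:

import numpy as np

def importance_sampling_estimate(f, u, grad_log_pi, x0, T=5.0, dt=1e-2,
                                 n_paths=10_000, seed=0):
    # Estimate E_P[f(X_T)] by simulating the controlled SDE
    #   dX^u = (grad log pi(X^u) + sqrt(2) u(X^u, t)) dt + sqrt(2) dW
    # and reweighting each path with the Girsanov likelihood ratio dP/dQ^u.
    rng = np.random.default_rng(seed)
    x = np.full(n_paths, float(x0))
    log_w = np.zeros(n_paths)  # log of dP/dQ^u along each path
    for k in range(int(round(T / dt))):
        dw = rng.normal(scale=np.sqrt(dt), size=n_paths)
        u_val = u(x, k * dt)
        log_w += -u_val * dw - 0.5 * u_val**2 * dt
        x += (grad_log_pi(x) + np.sqrt(2.0) * u_val) * dt + np.sqrt(2.0) * dw
    vals = f(x) * np.exp(log_w)
    return vals.mean(), vals.var()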

Outlook: Exploit ergodicity to find a good control for infinite times:

$$\lim_{T \to \infty} \frac{1}{T} \int_0^T f(X_t)\,dt = \mathbb{E}_\pi[f(X)].$$

Optimal change of measure

Donsker-Varadhan: Consider $W(X_{0:T}) = \int_0^T g(X_t, t)\,dt + h(X_T)$. The duality between sampling and an optimal change of measure,

$$\gamma(x, t) := -\log \mathbb{E}_{\mathbb{P}}[\exp(-W) \mid X_t = x] = \inf_{\mathbb{Q}^u \ll \mathbb{P}} \big\{ \mathbb{E}_{\mathbb{Q}^u}[W] + \mathrm{KL}(\mathbb{Q}^u \,\|\, \mathbb{P}) \big\},$$

yields a zero-variance estimator, i.e.

$$\mathrm{Var}_{\mathbb{Q}^*}\left( \exp(-W)\,\frac{d\mathbb{P}}{d\mathbb{Q}^*} \right) = 0.$$

→ $\gamma(x, t)$ is the value function of a stochastic control problem and fulfills the HJB equation
→ the control costs are $J(u) = \mathbb{E}\left[ \int_0^T \big( g(X_t^u, t) + \tfrac{1}{2}|u(X_t^u, t)|^2 \big)\,dt + h(X_T^u) \right]$
→ the optimal change of measure reads $d\mathbb{Q}^* = \frac{e^{-W}}{\mathbb{E}[e^{-W}]}\,d\mathbb{P}$
→ the optimal control is $u^*(x, t) = -\sqrt{2}\,\nabla_x \gamma(x, t)$
→ $\frac{d\mathbb{P}}{d\mathbb{Q}^u}$ is computed via Girsanov's theorem
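For the dynamics above, the HJB equation can be written out explicitly (this is our derivation via the exponential transform $\psi = e^{-\gamma}$, which linearizes the HJB equation into a backward Kolmogorov equation; the poster references the HJB equation without stating it):

$$\partial_t \gamma + \nabla \log \pi \cdot \nabla_x \gamma + \Delta_x \gamma + \min_u \Big\{ \sqrt{2}\,u \cdot \nabla_x \gamma + \tfrac{1}{2}|u|^2 \Big\} + g = 0, \qquad \gamma(x, T) = h(x),$$

with the minimum attained at $u^* = -\sqrt{2}\,\nabla_x \gamma$, recovering the optimal control stated above.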

Computing the optimal change of measure

Gradient descent
Parametrize the control (i.e. the change of measure) in ansatz functions $\varphi_i$:

$$u(x, t) = -\sqrt{2} \sum_{i=1}^m \alpha_i(t)\,\varphi_i(x).$$

Minimize the costs $J(u)$ by gradient descent in $\alpha$ (see the sketch after this list), i.e.

$$\alpha_{k+1} = \alpha_k - \eta_k \nabla_\alpha \hat{J}(u(\alpha_k)).$$

Different estimators of the gradient scale differently with the time horizon $T$:
– $G_{\text{finite differences}} \propto T$
– $G_{\text{centered likelihood ratio}} \propto T^2$
– $G_{\text{likelihood ratio}} \propto T^3$
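A minimal sketch of this scheme, assuming time-independent coefficients $\alpha_i$, Gaussian ansatz functions, running cost $g = 0$, and a central finite-difference gradient with common random numbers; the helper names (make_control, control_cost, gradient_descent) and all parameters are hypothetical, not from the poster:

import numpy as np

def make_control(alpha, centers, width=0.5):
    # u(x, t) = -sqrt(2) * sum_i alpha_i * phi_i(x), with Gaussian ansatz
    # functions phi_i centered at `centers`; alpha is held constant in t.
    def u(x, t):
        phis = np.exp(-(x[:, None] - centers) ** 2 / (2.0 * width**2))
        return -np.sqrt(2.0) * phis @ alpha
    return u

def control_cost(u, grad_log_pi, h, x0, T=5.0, dt=1e-2, n_paths=2_000, seed=1):
    # Monte Carlo estimate of J(u) = E[int_0^T 1/2 |u|^2 dt + h(X^u_T)],
    # i.e. g = 0; the fixed seed gives common random numbers across calls.
    rng = np.random.default_rng(seed)
    x = np.full(n_paths, float(x0))
    cost = np.zeros(n_paths)
    for k in range(int(round(T / dt))):
        u_val = u(x, k * dt)
        cost += 0.5 * u_val**2 * dt
        x += (grad_log_pi(x) + np.sqrt(2.0) * u_val) * dt \
             + np.sqrt(2.0) * rng.normal(scale=np.sqrt(dt), size=n_paths)
    return (cost + h(x)).mean()

def gradient_descent(J, alpha0, eta=0.1, eps=1e-3, n_iters=50):
    # alpha_{k+1} = alpha_k - eta * grad_alpha J(alpha_k),
    # with the gradient estimated by central finite differences.
    alpha = alpha0.copy()
    for _ in range(n_iters):
        grad = np.zeros_like(alpha)
        for i in range(len(alpha)):
            e = np.zeros_like(alpha)
            e[i] = eps
            grad[i] = (J(alpha + e) - J(alpha - e)) / (2.0 * eps)
        alpha -= eta * grad
    return alpha

For instance, for the first numerical example below ($\pi = \mathcal{N}(0, 1)$, $h(x) = \alpha x$ with $\alpha = 1$):

centers = np.linspace(-2.0, 2.0, 5)
J = lambda a: control_cost(make_control(a, centers), lambda x: -x,
                           lambda x: x, x0=0.0)
alpha_hat = gradient_descent(J, np.zeros(5))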

Cross-entropy method
It holds that $J(u) = J(u^*) + \mathrm{KL}(\mathbb{Q}^u \,\|\, \mathbb{Q}^*)$; however, we minimize $\mathrm{KL}(\mathbb{Q}^* \,\|\, \mathbb{Q}^u)$, since this is feasible. Again parametrize $u(\alpha)$ in ansatz functions. We then need to minimize the cross-entropy functional, for which we obtain a necessary optimality condition in the form of the linear equation $S\alpha = b$, which we solve iteratively.
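With the parametrization above and time-independent coefficients $\alpha_i$, the cross-entropy functional is quadratic in $\alpha$. Writing out the Girsanov density gives (our computation; the signs are tied to the sign convention of the parametrization) $S_{ij} = \mathbb{E}_{\mathbb{P}}\big[ e^{-W} \int_0^T \varphi_i(X_t)\,\varphi_j(X_t)\,dt \big]$ and $b_i = -\frac{1}{\sqrt{2}}\,\mathbb{E}_{\mathbb{P}}\big[ e^{-W} \int_0^T \varphi_i(X_t)\,dW_t \big]$. A sketch of a single iteration that samples under the uncontrolled dynamics (the function name and parameters are hypothetical):

import numpy as np

def cross_entropy_step(phis, grad_log_pi, g, h, x0, T=5.0, dt=1e-2,
                       n_paths=5_000, seed=3):
    # Assemble S and b by Monte Carlo under the uncontrolled dynamics P
    # and solve the linear system S alpha = b.
    rng = np.random.default_rng(seed)
    m = len(phis)
    x = np.full(n_paths, float(x0))
    W = np.zeros(n_paths)           # accumulates int_0^T g(X_t, t) dt
    M = np.zeros((n_paths, m, m))   # int_0^T phi_i phi_j dt, per path
    v = np.zeros((n_paths, m))      # int_0^T phi_i dW, per path
    for k in range(int(round(T / dt))):
        dw = rng.normal(scale=np.sqrt(dt), size=n_paths)
        phi_vals = np.stack([phi(x) for phi in phis], axis=1)  # (n_paths, m)
        M += phi_vals[:, :, None] * phi_vals[:, None, :] * dt
        v += phi_vals * dw[:, None]
        W += g(x, k * dt) * dt
        x += grad_log_pi(x) * dt + np.sqrt(2.0) * dw
    w = np.exp(-(W + h(x)))         # unnormalized weights e^{-W}
    S = np.mean(w[:, None, None] * M, axis=0)
    b = -np.mean(w[:, None] * v, axis=0) / np.sqrt(2.0)
    return np.linalg.solve(S, b)

Subsequent iterations would re-sample under the current control $u(\alpha_k)$ and reweight accordingly.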

Approximate policy iteration
Successive linearization of the HJB equation: the cost functional $J(u)$ solves a PDE of the form

$$A(u)\,J(u) = l(x, u),$$

where $\gamma(x, t) = \min_u J(u; x, t)$. Start with an initial guess of the control policy and iterate

$$u_{k+1}(x, t) = -\sqrt{2}\,\nabla_x J(u_k; x, t),$$

yielding a fixed-point iteration in the non-optimal control $u$.
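A minimal grid-based sketch of this fixed-point iteration, reusing control_cost from the gradient-descent sketch above. Freezing the policy in $t$ and evaluating $J$ only at $t = 0$ is our simplification; the poster does not prescribe a discretization:

import numpy as np

def policy_iteration(h, grad_log_pi, xs, n_iters=5, T=5.0, dt=1e-2, n_paths=500):
    u_grid = np.zeros_like(xs)  # initial guess u_0 = 0 on the grid xs
    for _ in range(n_iters):
        u_k = lambda x, t, g=u_grid: np.interp(x, xs, g)  # freeze current policy
        # policy evaluation: Monte Carlo estimate of J(u_k; x, 0) per grid point
        J_grid = np.array([control_cost(u_k, grad_log_pi, h, x0=x,
                                        T=T, dt=dt, n_paths=n_paths)
                           for x in xs])
        # policy improvement: u_{k+1}(x, 0) = -sqrt(2) * dJ/dx
        u_grid = -np.sqrt(2.0) * np.gradient(J_grid, xs)
    return u_grid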

Numerical simulations

• Sample $\mathbb{E}[\exp(-\alpha X_T)]$ with $\pi = \mathcal{N}(0, 1)$ and $u^*(x, t) = -\sqrt{2}\,\alpha e^{t-T}$; $\hat{u}$ determined by gradient descent.

[Figures: sample paths of $e^{X_t}$ under the uncontrolled dynamics and of $e^{X_t^u}$ under the controlled dynamics for $t \in [0, 5]$, and surface plots of the optimal control $u^*(x, t)$ and the approximation $\hat{u}(x, t)$ over $(x, t) \in [-2, 2] \times [0, 5]$ with values between 0.0 and 0.7.]

• Normalized variances with $\alpha = 1$, $T = 5$, $\mathbb{E}[\exp(-\alpha X_T)] = 1.649$:

$\Delta t$  | vanilla | with $u^*$            | with $\hat{u}$
$10^{-2}$   | 4.64    | $1.70 \times 10^{-4}$ | $5.69 \times 10^{-2}$
$10^{-3}$   | 4.02    | $1.69 \times 10^{-7}$ | $3.21 \times 10^{-2}$
$10^{-4}$   | 4.55    | $1.73 \times 10^{-8}$ | $6.45 \times 10^{-2}$
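This experiment can be rerun with the estimator sketched in "The setting" (the initial condition $x_0 = 0$ is our assumption; for $T = 5$ the process is essentially stationary, so $\mathbb{E}[\exp(-X_T)] \approx e^{1/2} \approx 1.649$):

alpha, T = 1.0, 5.0
for name, u in {
    "vanilla": lambda x, t: np.zeros_like(x),
    "optimal": lambda x, t: -np.sqrt(2.0) * alpha * np.exp(t - T) * np.ones_like(x),
}.items():
    mean, var = importance_sampling_estimate(
        f=lambda x: np.exp(-alpha * x),
        u=u,
        grad_log_pi=lambda x: -x,  # grad log density of N(0, 1)
        x0=0.0, T=T, dt=1e-2,
    )
    print(name, mean, var)  # mean near 1.649; the variance collapses with u*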

• Sample $\mathbb{E}[X_T]$ with $\pi = \mathcal{N}(b, 1)$, i.e. consider $h(x) = -\log x$, w.r.t.

$$dX_s = (-X_s + b)\,ds + \sqrt{2}\,dW_s, \qquad X_t = x.$$

– Then $X_T \sim \mathcal{N}\big( b + (x - b)e^{t-T},\ 1 - e^{2(t-T)} \big)$ and

$$u^*(x, t) = \frac{\sqrt{2}\,e^{t-T}}{b + (x - b)e^{t-T}}.$$
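The second formula follows from the first in one line (this intermediate step is ours): since $e^{-W} = e^{-h(X_T)} = X_T$,

$$\gamma(x, t) = -\log \mathbb{E}[X_T \mid X_t = x] = -\log\big( b + (x - b)e^{t-T} \big), \qquad u^*(x, t) = -\sqrt{2}\,\partial_x \gamma(x, t) = \frac{\sqrt{2}\,e^{t-T}}{b + (x - b)e^{t-T}}.$$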

• Molecular dynamics: compute transition probabilities, i.e. first hitting times
– consider $g = 1$, $h = 0$, i.e. we sample hitting times $\mathbb{E}[e^{-\tau}]$
– the optimal change of measure can be interpreted as a tilting of the potential guiding the dynamics
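For illustration, a vanilla (uncontrolled) estimator of $\mathbb{E}[e^{-\tau}]$; the double-well potential $V(x) = (x^2 - 1)^2$, the start point in the left well, and the target set $\{x \ge 1\}$ are our assumptions for a typical metastable setting, not taken from the poster. The many censored paths are exactly the rare-event problem that the tilted potential addresses:

import numpy as np

def vanilla_hitting_time_estimate(n_paths=1_000, dt=1e-3, t_max=100.0, seed=2):
    # Simulate dX = -V'(X) dt + sqrt(2) dW from the left well of the
    # double well V(x) = (x^2 - 1)^2 until X hits {x >= 1};
    # estimate E[exp(-tau)] from the hitting times tau.
    rng = np.random.default_rng(seed)
    grad_V = lambda x: 4.0 * x * (x**2 - 1.0)
    x = np.full(n_paths, -1.0)
    tau = np.full(n_paths, t_max)  # censor paths that never hit
    alive = np.ones(n_paths, dtype=bool)
    t = 0.0
    while t < t_max and alive.any():
        dw = rng.normal(scale=np.sqrt(dt), size=alive.sum())
        x[alive] += -grad_V(x[alive]) * dt + np.sqrt(2.0) * dw
        hit = alive & (x >= 1.0)
        tau[hit] = t
        alive &= ~hit
        t += dt
    return np.exp(-tau).mean()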

[Figure: tilted potentials $G(x)$ over $x \in [-2, 2]$ for iterations 1 through 15, together with the optimal tilting.]
