
Optimal importance sampling using stochastic control

Lorenz Richter, Carsten Hartmann
Freie Universität Berlin

    The setting

Goal: Compute E_π[f(X)], f : R^d → R. Model π as the stationary distribution of a stochastic process X_t ∈ R^d, associated with the path measure P and described by

dX_t = ∇log π(X_t) dt + √2 dW_t

and compute E_P[f(X_T)].

Objective: reduce the variance of the estimator and speed up its convergence.

Idea: Consider the controlled process, associated with the path measure Q^u,

dX^u_t = (∇log π(X^u_t) + √2 u(X^u_t, t)) dt + √2 dW_t,

and do importance sampling in path space:

E_{Q^u}[ f(X^u_T) dP/dQ^u ],

as illustrated in the sketch below.
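A minimal numerical sketch of this path-space estimator, assuming an Euler-Maruyama discretization and the Gaussian example from the numerics section below (π = N(0, 1), so ∇log π(x) = −x, f(x) = exp(−αx), X_0 = 0); all names and parameters are illustrative choices, and the Girsanov log-weight accumulated here is spelled out in the next section:

```python
import numpy as np

# Minimal sketch: Euler-Maruyama for the controlled process plus Girsanov
# reweighting. Assumptions (ours): pi = N(0, 1), so grad log pi(x) = -x;
# f(x) = exp(-alpha * x), horizon T and X_0 = 0 as in the Gaussian example
# of the numerics section. Names and parameters are illustrative.

def is_estimate(u, f, T=5.0, dt=1e-2, n_paths=10_000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.zeros(n_paths)
    log_ratio = np.zeros(n_paths)                 # accumulates log dP/dQ^u
    for k in range(int(T / dt)):
        uk = u(x, k * dt) * np.ones(n_paths)      # control at current state/time
        dw = np.sqrt(dt) * rng.standard_normal(n_paths)
        log_ratio += -uk * dw - 0.5 * uk**2 * dt  # Girsanov (see next section)
        x += (-x + np.sqrt(2.0) * uk) * dt + np.sqrt(2.0) * dw
    vals = f(x) * np.exp(log_ratio)               # f(X_T^u) dP/dQ^u
    return vals.mean(), vals.var()

alpha, T = 1.0, 5.0
f = lambda x: np.exp(-alpha * x)                  # E_P[f(X_T)] ~ 1.649 here
u_zero = lambda x, t: np.zeros_like(x)            # vanilla estimator
u_star = lambda x, t: -np.sqrt(2.0) * alpha * np.exp(t - T)  # optimal control
print(is_estimate(u_zero, f))                     # high variance
print(is_estimate(u_star, f))                     # near-zero variance
```

With u = 0 this reduces to the vanilla estimator; plugging in the optimal control from the numerics section collapses the variance, as in the table there.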

Outlook: Exploit ergodicity to find a good control for infinite time horizons:

lim_{T→∞} (1/T) ∫_0^T f(X_t) dt = E_π[f(X)].

    Optimal change of measure

Donsker-Varadhan: Consider W(X_{0:T}) = ∫_0^T g(X_t, t) dt + h(X_T). The duality between sampling and an optimal change of measure,

γ(x, t) := −log E_P[exp(−W) | X_t = x] = inf_{Q^u ≪ P} { E_{Q^u}[W] + KL(Q^u ∥ P) },

yields a zero-variance estimator, i.e. Var_{Q*}( exp(−W) dP/dQ* ) = 0.

→ γ(x, t) is the value function of a control problem and fulfills the HJB equation
→ the control costs are J(u) = E[ ∫_0^T ( g(X^u_t, t) + ½|u(X^u_t, t)|² ) dt + h(X^u_T) ]
→ the optimal change of measure reads dQ* = (e^{−W} / E[e^{−W}]) dP
→ the optimal control is u*(x, t) = −√2 ∇_x γ(x, t)
→ dP/dQ^u is computed via Girsanov's theorem
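Concretely, for the controlled dynamics above Girsanov's theorem gives the density in the standard exponential form (with W_t here denoting the Brownian motion driving X^u under Q^u):

dP/dQ^u = exp( −∫_0^T u(X^u_t, t) · dW_t − ½ ∫_0^T |u(X^u_t, t)|² dt ),

which is exactly the log-weight accumulated in the sketch above.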

    Computing the optimal change of measure

Gradient descent: Parametrize the control (i.e. the change of measure) in ansatz functions φ_i:

u(x, t) = −√2 Σ_{i=1}^m α_i(t) φ_i(x).

Minimize the costs J(u) by gradient descent in α, i.e.

α_{k+1} = α_k − η_k ∇_α Ĵ(u(α_k)).

Different estimators of the gradient scale differently with the time horizon T (see the sketch after this list):
– G_{finite differences} ∝ T
– G_{centered likelihood ratio} ∝ T²
– G_{likelihood ratio} ∝ T³
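A minimal sketch of this descent loop, assuming time-independent coefficients α_i, a Gaussian basis, and the Gaussian example of the numerics section (g = 0, h(x) = x, so Ĵ is a plain Monte Carlo estimate); it uses the finite-difference gradient G_{finite differences} with common random numbers. All of these are illustrative choices, not the poster's exact setup:

```python
import numpy as np

centers = np.linspace(-2.0, 2.0, 5)        # hypothetical Gaussian basis phi_i

def u(x, alpha):
    # u(x) = -sqrt(2) * sum_i alpha_i * phi_i(x); alpha_i(t) frozen to constants
    phi = np.exp(-0.5 * (x[:, None] - centers[None, :]) ** 2)
    return -np.sqrt(2.0) * phi @ alpha

def J_hat(alpha, T=5.0, dt=1e-2, n_paths=1_000, seed=0):
    # Monte Carlo estimate of J(u) = E[int_0^T |u|^2/2 dt + h(X_T^u)], h(x) = x
    rng = np.random.default_rng(seed)      # common random numbers stabilize the
    x = np.zeros(n_paths)                  # finite-difference gradient below
    cost = np.zeros(n_paths)
    for k in range(int(T / dt)):
        uk = u(x, alpha)
        cost += 0.5 * uk**2 * dt
        x += (-x + np.sqrt(2.0) * uk) * dt + np.sqrt(2.0 * dt) * rng.standard_normal(n_paths)
    return np.mean(cost + x)

def grad_fd(alpha, eps=1e-3):
    # G_{finite differences}: central differences, coordinate by coordinate
    g = np.zeros_like(alpha)
    for i in range(len(alpha)):
        e = np.zeros_like(alpha); e[i] = eps
        g[i] = (J_hat(alpha + e) - J_hat(alpha - e)) / (2 * eps)
    return g

alpha = np.zeros(len(centers))
for k in range(20):                        # alpha_{k+1} = alpha_k - eta_k * grad
    alpha -= 0.1 * grad_fd(alpha)
```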

Cross-entropy method: It holds that J(u) = J(u*) + KL(Q^u ∥ Q*); however, we minimize KL(Q* ∥ Q^u), since this is computationally feasible. Again parametrize u(α) in ansatz functions. Minimizing the cross-entropy functional then yields a necessary optimality condition in the form of a linear system Sα = b, which we solve iteratively; a sketch of one iteration follows.
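A sketch of assembling and solving Sα = b for the same Gaussian example, sampling under P (i.e. a first iteration). With the linear ansatz, KL(Q* ∥ Q^u) is quadratic in α; the concrete S and b below follow from expanding the Girsanov density under our simplifying conventions (constant α_i, Gaussian basis), so treat the constants as an assumption rather than the poster's exact formulas:

```python
import numpy as np

rng = np.random.default_rng(1)
centers = np.linspace(-2.0, 2.0, 5)
m, T, dt, n_paths = len(centers), 5.0, 1e-2, 5_000

def phi(x):
    return np.exp(-0.5 * (x[:, None] - centers[None, :]) ** 2)   # (n_paths, m)

x = np.zeros(n_paths)
S_path = np.zeros((n_paths, m, m))      # int phi_i phi_j dt along each path
b_path = np.zeros((n_paths, m))         # int phi_i dW  along each path
for k in range(int(T / dt)):
    p = phi(x)
    dw = np.sqrt(dt) * rng.standard_normal(n_paths)
    S_path += p[:, :, None] * p[:, None, :] * dt
    b_path += p * dw[:, None]
    x += -x * dt + np.sqrt(2.0) * dw    # uncontrolled dynamics under P

w = np.exp(-x)                          # e^{-W} with W = h(X_T) = X_T reweights
S = np.einsum('n,nij->ij', w, S_path)   # P-samples of (unnormalized) Q* averages
b = -np.einsum('n,ni->i', w, b_path) / np.sqrt(2.0)
alpha = np.linalg.solve(S, b)           # necessary optimality condition S alpha = b
```

In practice one would re-sample under the current Q^u and reweight accordingly, which is the iterative step mentioned above.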

Approximate policy iteration: Successive linearization of the HJB equation: the cost functional J(u) solves a PDE of the form

A(u) J(u) = l(x, u),

where γ(x, t) = min_u J(u; x, t). Start with an initial guess of the control policy and iterate

u_{k+1}(x, t) = −√2 ∇_x J(u_k; x, t),

yielding a fixed-point iteration in the (non-optimal) control u; a sketch follows.
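A rough sketch of this fixed-point step on a small 1-D grid, evaluating J(u_k; x, t) by plain Monte Carlo (via the Feynman-Kac representation of the linear PDE) and differentiating numerically in x; grid sizes, the example (g = 0, h(x) = x) and the interpolation are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
T, dt = 5.0, 1e-2
xs = np.linspace(-2.0, 2.0, 21)         # small illustrative grid in x
ts = np.linspace(0.0, T, 11)            # ... and in t

def J_mc(u, x0, t0, n_paths=500):
    # Feynman-Kac: J(u; x0, t0) = E[int_{t0}^T |u|^2/2 ds + h(X_T^u)], h(x) = x
    x = np.full(n_paths, x0)
    cost = np.zeros(n_paths)
    for k in range(int(round((T - t0) / dt))):
        us = u(x, t0 + k * dt)
        cost += 0.5 * us**2 * dt
        x += (-x + np.sqrt(2.0) * us) * dt + np.sqrt(2.0 * dt) * rng.standard_normal(n_paths)
    return np.mean(cost + x)

def policy_step(u):
    # tabulate J(u; x, t), then u_{k+1}(x, t) = -sqrt(2) dJ/dx, interpolated
    J = np.array([[J_mc(u, x, t) for x in xs] for t in ts])
    dJdx = np.gradient(J, xs, axis=1)
    def u_next(x, t):
        i = np.clip(np.searchsorted(ts, t, side='right') - 1, 0, len(ts) - 1)
        return -np.sqrt(2.0) * np.interp(x, xs, dJdx[i])
    return u_next

u = lambda x, t: np.zeros_like(x)       # initial guess u_0 = 0
for _ in range(3):                      # fixed-point iteration u_k -> u_{k+1}
    u = policy_step(u)
```

For this Gaussian example, J(0; x, t) = E[X_T | X_t = x] = x e^{t−T}, so the very first step already recovers u*(x, t) = −√2 e^{t−T} up to Monte Carlo and grid error.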

    Numerical simulations

• Sample E[exp(−αX_T)] with π = N(0, 1) and u*(x, t) = −√2 α e^{t−T}; û determined by gradient descent.

[Figures: sample trajectories of e^{−X_t} (vanilla) and of the reweighted e^{−X^u_t} dP/dQ^u (controlled) over t ∈ [0, 5]; surface plots of the optimal control u*(x, t) and the learned control û(x, t) on x ∈ [−2, 2], t ∈ [0, 5].]

• Normalized variances with α = 1, T = 5, E[exp(−αX_T)] = 1.649:

  ∆t        vanilla   with u*         with û
  10^{−2}   4.64      1.70 × 10^{−4}  5.69 × 10^{−2}
  10^{−3}   4.02      1.69 × 10^{−7}  3.21 × 10^{−2}
  10^{−4}   4.55      1.73 × 10^{−8}  6.45 × 10^{−2}

• Sample E[X_T] with π = N(b, 1), i.e. consider h(x) = −log x, w.r.t.

  dX_s = (−X_s + b) ds + √2 dW_s,   X_t = x.

  – Then X_T ∼ N(b + (x − b)e^{t−T}, 1 − e^{2(t−T)}), so γ(x, t) = −log(b + (x − b)e^{t−T}) and

  u*(x, t) = √2 e^{t−T} / (b + (x − b)e^{t−T}).

• Molecular dynamics: compute transition probabilities, i.e. first hitting times.

  – Consider g = 1, h = 0, i.e. we estimate E[e^{−τ}] for the first hitting time τ (see the sketch below).
  – The optimal change of measure can be interpreted as a tilting of the potential that guides the dynamics.

[Figure: tilted potentials G(x) on x ∈ [−2, 2] after iterations 1, 3, 5, …, 15, compared with the optimal tilt.]
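A toy sketch of the vanilla estimator of E[e^{−τ}], assuming a hypothetical 1-D double-well potential V(x) = (x² − 1)² (our choice; the poster does not specify the system), with τ the first hitting time of the right well starting from the left one. It illustrates the rare-event problem that the tilted potential is meant to fix:

```python
import numpy as np

rng = np.random.default_rng(3)
dV = lambda x: 4.0 * x * (x**2 - 1.0)   # V'(x) for the assumed V(x) = (x^2 - 1)^2

def hitting_estimate(n_paths=500, dt=1e-2, t_max=50.0):
    vals = []
    for _ in range(n_paths):
        x, t = -1.0, 0.0                 # start in the left well
        while x < 1.0 and t < t_max:     # tau = first time X_t >= 1
            x += -dV(x) * dt + np.sqrt(2.0 * dt) * rng.standard_normal()
            t += dt
        vals.append(np.exp(-t))          # e^{-tau}; censored paths add ~e^{-t_max}
    return np.mean(vals), np.std(vals) / np.sqrt(n_paths)

print(hitting_estimate())                # large relative error without a control
```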
