
Page 1: Advanced Hamiltonian Monte Carlo: Riemann Manifold HMC and ...cbl.eng.cam.ac.uk/pub/Intranet/MLG/ReadingGroup/AdvancedHMC.pdf · Advanced Hamiltonian Monte Carlo: Riemann Man-ifold

Advanced Hamiltonian Monte Carlo: Riemann Manifold HMC and the No-U-Turn Sampler

Nilesh Tripuraneni, Adam Scibior

[email protected] [email protected]

22/01/2015


B.(HM)C.

- MCMC requires a proposal kernel $T(\theta, \theta')$ to sample from $p(\theta) = e^{-U(\theta)}$.

- Random-walk proposal: $T(\theta, \theta') = \mathcal{N}(\theta' \mid \theta, \sigma)$, agnostic to any details of the target density.

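- Motivated by the SDE of Langevin dynamics, $d\theta(t) = -\tfrac{1}{2}\nabla_\theta U(\theta)\,dt + dW(t)$, discretise with time step $\epsilon^2$ and propose

$$\theta' = \theta - \frac{\epsilon^2}{2}\nabla_\theta U(\theta) + \epsilon\,\mathcal{N}(0, 1)$$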
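The Langevin proposal can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' code: the target (a 1-D standard normal, so $U(\theta) = \theta^2/2$), the function names, and the added Metropolis-Hastings correction are all assumptions made for the example.

```python
import math
import random

def U(theta):
    # Hypothetical target: standard normal, U(theta) = theta^2 / 2
    return 0.5 * theta * theta

def grad_U(theta):
    return theta

def langevin_proposal(theta, eps):
    # theta' = theta - (eps^2 / 2) * grad U(theta) + eps * N(0, 1)
    return theta - 0.5 * eps ** 2 * grad_U(theta) + eps * random.gauss(0.0, 1.0)

def log_q(to, frm, eps):
    # Log density (up to a constant) of proposing `to` when standing at `frm`
    mean = frm - 0.5 * eps ** 2 * grad_U(frm)
    return -((to - mean) ** 2) / (2.0 * eps ** 2)

def mala_step(theta, eps):
    # The proposal is not symmetric, so the Hastings ratio is kept explicitly
    prop = langevin_proposal(theta, eps)
    log_alpha = (-U(prop) + log_q(theta, prop, eps)) - (-U(theta) + log_q(prop, theta, eps))
    if math.log(1.0 - random.random()) < log_alpha:
        return prop
    return theta

def run_mala(n_samples, eps=0.9, seed=1):
    random.seed(seed)
    theta, samples = 0.0, []
    for _ in range(n_samples):
        theta = mala_step(theta, eps)
        samples.append(theta)
    return samples
```

Unlike the random-walk kernel, the proposal mean is shifted downhill in $U$, which is why the Hastings term does not cancel here.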

Figure: From Information Theory, Inference, and Learning Algorithms, D.J.C. MacKay


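Hamiltonian Monte Carlo

- Have density $f(\theta) = e^{-U(\theta)}$, and introduce an auxiliary variable $p \sim \mathcal{N}(p \mid 0, I)$ in order to obtain the joint density

$$f(\theta, p) = \frac{1}{(2\pi)^{d/2}} e^{-\frac{1}{2} p^T p} f(\theta) = \frac{1}{(2\pi)^{d/2}} e^{-\left(\frac{1}{2} p^T p + U(\theta)\right)}$$

- Energy:

$$H(p, \theta) = \underbrace{\tfrac{1}{2}\log(2\pi)^d - \log f(\theta)}_{\text{Potential } U(\theta)} + \underbrace{\tfrac{1}{2} p^T p}_{\text{Kinetic } K(p)}$$

- Integrate out $p$: $\int dp\, \frac{1}{(2\pi)^{d/2}} e^{-\left(\frac{1}{2} p^T p + U(\theta)\right)} = e^{-U(\theta)}$, recovering the correct parameter marginal.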


Hamiltonian Dynamics

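- Hamiltonian: $H = \frac{1}{2} p^T p + U(\theta)$

- Hamilton's equations:

$$\frac{d\theta}{d\tau} = \frac{\partial H}{\partial p} = p \qquad (1)$$

$$\frac{dp}{d\tau} = -\frac{\partial H}{\partial \theta} = -\nabla_\theta U(\theta) \qquad (2)$$

- Naive Euler integrator:

$$\theta(\tau + \epsilon) = \theta(\tau) + \epsilon \frac{d\theta(\tau)}{d\tau} = \theta(\tau) + \epsilon\, p(\tau)$$

$$p(\tau + \epsilon) = p(\tau) + \epsilon \frac{dp(\tau)}{d\tau} = p(\tau) - \epsilon \nabla_\theta U(\theta(\tau))$$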
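To see why the Euler scheme is called naive, consider a small sketch on a hypothetical harmonic oscillator, $U(\theta) = \theta^2/2$ (an assumption chosen because the energy drift can be worked out by hand). Each Euler step multiplies $\theta^2 + p^2$ by exactly $1 + \epsilon^2$, so the energy grows without bound.

```python
def energy(theta, p):
    # H = p^2 / 2 + U(theta) with the hypothetical U(theta) = theta^2 / 2
    return 0.5 * p * p + 0.5 * theta * theta

def euler_step(theta, p, eps):
    # Both updates use the values at time tau (grad U(theta) = theta here)
    return theta + eps * p, p - eps * theta

def simulate_euler(theta, p, eps, n_steps):
    energies = [energy(theta, p)]
    for _ in range(n_steps):
        theta, p = euler_step(theta, p, eps)
        energies.append(energy(theta, p))
    return energies

energies = simulate_euler(theta=1.0, p=0.0, eps=0.1, n_steps=100)
growth = energies[-1] / energies[0]  # equals (1 + eps^2)^n_steps for this U
```

The trajectory spirals outward instead of staying on a level set of $H$, so proposals built from it would be rejected increasingly often.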


Hamiltonian Dynamics cont’d.

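- For separable $H$, "split" $H$ as $H(p, \theta) = U(\theta)/2 + K(p) + U(\theta)/2$ to induce the flow maps on $(p, \theta)$:

$$\Phi_{\frac{\epsilon}{2}, U(\theta)} \circ \Phi_{\epsilon, K(p)} \circ \Phi_{\frac{\epsilon}{2}, U(\theta)}$$

- Reversible, volume-preserving leapfrog integrator:

$$p(\tau + \epsilon/2) = p(\tau) - \epsilon \nabla_\theta U(\theta(\tau))/2$$

$$\theta(\tau + \epsilon) = \theta(\tau) + \epsilon\, p(\tau + \epsilon/2)$$

$$p(\tau + \epsilon) = p(\tau + \epsilon/2) - \epsilon \nabla_\theta U(\theta(\tau + \epsilon))/2$$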
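A minimal sketch of the leapfrog update, using the sign convention $dp/d\tau = -\nabla_\theta U$ from Hamilton's equations. The potential $U(\theta) = \theta^2/2$ and all function names are assumptions for illustration. The two checks at the end illustrate the properties named on the slide: the energy error stays bounded, and negating the momentum and integrating again retraces the path.

```python
def grad_U(theta):
    # Hypothetical potential U(theta) = theta^2 / 2
    return theta

def energy(theta, p):
    return 0.5 * p * p + 0.5 * theta * theta

def leapfrog_step(theta, p, eps):
    p = p - 0.5 * eps * grad_U(theta)   # half step in momentum
    theta = theta + eps * p             # full step in position
    p = p - 0.5 * eps * grad_U(theta)   # half step in momentum
    return theta, p

def integrate(theta, p, eps, n_steps):
    for _ in range(n_steps):
        theta, p = leapfrog_step(theta, p, eps)
    return theta, p

theta0, p0 = 1.0, 0.0
theta1, p1 = integrate(theta0, p0, eps=0.1, n_steps=1000)
energy_error = abs(energy(theta1, p1) - energy(theta0, p0))

# Reversibility: flip the momentum, integrate the same number of steps
theta_back, p_back = integrate(theta1, -p1, eps=0.1, n_steps=1000)
reversal_error = abs(theta_back - theta0) + abs(-p_back - p0)
```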

Figure: (a) Leapfrog, (b) Symplectic Int.

Hamiltonian Monte Carlo

- Where's the sampling? First Gibbs sample the momentum $p$, then do all of the above, iterating the leapfrog integrator $L$ times with step size $\epsilon$ as a deterministic proposal.

- Apply the MH correction step

$$\min\left(1, \frac{e^{-H(p^*, \theta^*)}}{e^{-H(p, \theta)}} \cancelto{1}{\frac{T((p^*, \theta^*) \to (p, \theta))}{T((p, \theta) \to (p^*, \theta^*))}}\right)$$

to correct for integrator bias. Note that our careful construction has resulted in the cancellation of the "Hastings" term.
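Putting the pieces together, one HMC transition might look like the sketch below. It assumes a 1-D standard normal target ($U(\theta) = \theta^2/2$) and identity mass; the names `hmc_step` and `run_chain` are illustrative, not taken from the slides.

```python
import math
import random

def U(theta):
    return 0.5 * theta * theta      # hypothetical target: standard normal

def grad_U(theta):
    return theta

def leapfrog(theta, p, eps, n_steps):
    # L leapfrog steps; interior momentum half steps are merged into full steps
    p -= 0.5 * eps * grad_U(theta)
    for i in range(n_steps):
        theta += eps * p
        if i < n_steps - 1:
            p -= eps * grad_U(theta)
    p -= 0.5 * eps * grad_U(theta)
    return theta, p

def hmc_step(theta, eps, n_steps):
    p = random.gauss(0.0, 1.0)                        # Gibbs sample the momentum
    h_old = U(theta) + 0.5 * p * p
    theta_new, p_new = leapfrog(theta, p, eps, n_steps)
    h_new = U(theta_new) + 0.5 * p_new * p_new
    # MH correction: accept with probability min(1, exp(H_old - H_new))
    if math.log(1.0 - random.random()) < h_old - h_new:
        return theta_new
    return theta

def run_chain(n_samples, eps=0.2, n_steps=10, seed=0):
    random.seed(seed)
    theta, samples = 0.0, []
    for _ in range(n_samples):
        theta = hmc_step(theta, eps, n_steps)
        samples.append(theta)
    return samples
```

Because the leapfrog map is reversible and volume preserving, only the energy difference enters the acceptance test; no proposal density has to be evaluated.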


Hamiltonian Monte Carlo Picture

Figure: From Information Theory, Inference, and Learning Algorithms, D.J.C. MacKay


HMC parameters - L

The number of steps for which Hamilton's equations are simulated in a single MCMC step.

Small L - random walk behaviour

Large L - wasteful re-exploration of the state space


HMC parameters - ε

The size of a discrete step used for numeric integration.

Small ε - larger number of steps required to simulate the same amount of "time"

Large ε - significant discretisation errors, low acceptance rate


Automatically setting L - the U turn

The U turn is a convenient stopping criterion

$$\frac{d}{dt} \frac{(\theta^+ - \theta^-)^2}{2} = (\theta^+ - \theta^-)^T p^+ < 0$$

But what about detailed balance?


Automatically setting L - NUTS

We want

$$\min\left(1, \cancelto{1}{\frac{f(p^*, \theta^*)}{f(p, \theta)} \frac{T((p^*, \theta^*) \to (p, \theta))}{T((p, \theta) \to (p^*, \theta^*))}}\right) = 1$$

Three steps of NUTS:

- Simulate Hamiltonian dynamics until the U turn occurs, generating a trace B.

- Pick a suitable subset C of B as candidate states.

- Pick the new state uniformly from C.


NUTS - slice sampling

Slice sampling makes NUTS simpler.

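$$f(p, \theta, u) \propto \mathbb{I}[u \in [0, f(p, \theta)]]$$

$$f(p, \theta) = \int f(p, \theta, u)\,du$$

$$f(p, \theta \mid u) \propto \mathbb{I}[f(p, \theta) \geq u]$$

$$f(u \mid p, \theta) = \mathrm{Uniform}(0, f(p, \theta))$$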
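The two conditionals above define a Gibbs sampler on the augmented space. As a hedged illustration (not part of NUTS itself), the sketch below applies them to a 1-D unnormalised standard normal, $f(x) = e^{-x^2/2}$, an assumed target chosen because the slice $\{x : f(x) \geq u\}$ is the interval $[-\sqrt{-2\log u}, \sqrt{-2\log u}]$ and can be sampled exactly.

```python
import math
import random

def slice_sample_std_normal(n_samples, seed=0):
    random.seed(seed)
    x, samples = 0.0, []
    for _ in range(n_samples):
        f_x = math.exp(-0.5 * x * x)
        # u | x is uniform on (0, f(x)]
        u = f_x * (1.0 - random.random())
        # x | u is uniform on the slice {x : f(x) >= u}
        half_width = math.sqrt(-2.0 * math.log(u))
        x = random.uniform(-half_width, half_width)
        samples.append(x)
    return samples
```

Marginalising out $u$ leaves draws of $x$ from $f$, which is the property NUTS relies on when it replaces the accept/reject test with a uniform choice among states above the slice level.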


NUTS - generating trajectories

function TRAJECTORY((p, θ))
    B ← {(p, θ)}
    for n = 0 to ∞ do
        direction ← Uniform({left, right})
        B′ ← extend(B, 2^n, direction)
        if STOPPING(B′) then
            break
        end if
        B ← B′
    end for
    return B
end function

function STOPPING(B)
    if |B| = 1 then
        return false
    end if
    (p−, θ−) ← leftmost(B)
    (p+, θ+) ← rightmost(B)
    Uturn ← (θ+ − θ−)^T p− < 0 ∨ (θ+ − θ−)^T p+ < 0
    return Uturn ∨ STOPPING(left(B)) ∨ STOPPING(right(B))
end function


NUTS - full algorithm

function NUTSSTEP((p, θ))
    u ← Uniform(0, f(p, θ))
    B ← TRAJECTORY((p, θ))
    C ← {(p∗, θ∗) ∈ B | f(p∗, θ∗) ≥ u}
    (p∗, θ∗) ← Uniform(C)
    return (p∗, θ∗)
end function
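The pseudocode can be turned into a short runnable sketch. It follows the simplified scheme on these slides rather than the efficient recursive algorithm of Hoffman and Gelman. Several details are assumptions for illustration: a 1-D standard normal target, a fresh momentum drawn at every step, a U-turn test on both ends of the trace, and a `max_depth` cap that guards against unbounded doubling.

```python
import math
import random

def grad_U(theta):
    return theta                      # hypothetical target: standard normal

def H(theta, p):
    return 0.5 * theta * theta + 0.5 * p * p

def leapfrog(theta, p, eps):
    p -= 0.5 * eps * grad_U(theta)
    theta += eps * p
    p -= 0.5 * eps * grad_U(theta)
    return theta, p

def extend(B, n_steps, direction, eps):
    # B is a list of (theta, p) ordered left to right along the trajectory
    theta, p = B[-1] if direction > 0 else B[0]
    new = []
    for _ in range(n_steps):
        theta, p = leapfrog(theta, p, eps if direction > 0 else -eps)
        new.append((theta, p))
    return B + new if direction > 0 else new[::-1] + B

def stopping(B):
    if len(B) < 2:
        return False
    (th_minus, p_minus), (th_plus, p_plus) = B[0], B[-1]
    span = th_plus - th_minus
    if span * p_minus < 0 or span * p_plus < 0:
        return True
    half = len(B) // 2
    return stopping(B[:half]) or stopping(B[half:])

def nuts_step(theta, eps, max_depth=8):
    p = random.gauss(0.0, 1.0)
    # log u with u ~ Uniform(0, f(p, theta)], where f = exp(-H)
    log_u = -H(theta, p) + math.log(1.0 - random.random())
    B = [(theta, p)]
    for n in range(max_depth):
        B_new = extend(B, 2 ** n, random.choice((-1, 1)), eps)
        if stopping(B_new):
            break
        B = B_new
    C = [state for state in B if -H(*state) >= log_u]  # always holds the start
    return random.choice(C)[0]

def run_nuts(n_samples, eps=0.3, seed=0):
    random.seed(seed)
    theta, samples = 0.0, []
    for _ in range(n_samples):
        theta = nuts_step(theta, eps)
        samples.append(theta)
    return samples
```

Storing the whole trace and re-checking every subtree is wasteful; it is kept here only because it mirrors TRAJECTORY and STOPPING line by line.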


NUTS - why it works

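$$\Pr(B \mid (p, \theta)) = \Pr(B \mid (p^*, \theta^*)) \quad \text{if } (p, \theta), (p^*, \theta^*) \in B$$

$$\frac{f(u \mid (p, \theta))}{f(u \mid (p^*, \theta^*))} = \frac{f(p^*, \theta^*)}{f(p, \theta)}$$

C is computed deterministically from u and B.

$$\Pr((p, \theta) \mid C) = \Pr((p^*, \theta^*) \mid C) \quad \text{if } (p, \theta), (p^*, \theta^*) \in C$$

$$\frac{T((p, \theta) \to (p^*, \theta^*))}{T((p^*, \theta^*) \to (p, \theta))} = \frac{f(p^*, \theta^*)}{f(p, \theta)}$$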


Automatically setting ε

asymptotically vanishing adaptation

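$$\epsilon_{t+1} = \epsilon_t + \eta_t H_t$$

$$\sum_t \eta_t = \infty, \qquad \sum_t \eta_t^2 < \infty$$

$$H_t = (\alpha_t - \delta)$$

$\alpha_t$ - Metropolis-Hastings acceptance probability at time $t$

$\delta$ - target acceptance ratio; 0.65 is a reasonable default

$\eta_t$ controls the decay of adaptation; a good choice is $\eta_t = t^{-\kappa}$ with $\kappa \in (0.5, 1]$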
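The update rule can be tried out without running a sampler. In the sketch below the acceptance probability is replaced by a hypothetical deterministic stand-in, $\alpha(\epsilon) = e^{-\epsilon}$ (an assumption for illustration only; a real run would plug in the observed acceptance probability of each HMC step). The iteration should then settle at the step size where $\alpha(\epsilon) = \delta$.

```python
import math

def toy_acceptance(eps):
    # Hypothetical stand-in for the MH acceptance probability alpha_t:
    # decreasing in the step size, equal to 1 as eps -> 0
    return math.exp(-eps)

def adapt_step_size(eps0=1.0, delta=0.65, kappa=0.75, n_iter=2000):
    eps = eps0
    for t in range(1, n_iter + 1):
        eta_t = t ** (-kappa)                 # sum eta = inf, sum eta^2 < inf
        h_t = toy_acceptance(eps) - delta     # H_t = alpha_t - delta
        eps = eps + eta_t * h_t               # grow eps when accepting too often
    return eps

eps_star = adapt_step_size()
```

With this stand-in the fixed point is $\epsilon = -\log \delta$; in practice $\alpha_t$ is noisy, which is why the decaying $\eta_t$ matters.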


Automatically setting ε

dual averaging

$$\epsilon_{t+1} = \mu - \sqrt{\frac{1}{t + t_0}} \sum_{i=0}^{t} H_i$$

$$\bar{\epsilon}_{t+1} = \eta_t \epsilon_{t+1} + (1 - \eta_t)\bar{\epsilon}_t$$


Stan

A C++ based probabilistic programming framework which implements HMC and NUTS.

http://mc-stan.org


Pre-Conditioning

Problem: Sample from a correlated distribution with different scales using an isotropic proposal

Figure: From Information Theory, Inference, and Learning Algorithms, D.J.C. MacKay

- If $\epsilon \sim L$: too many rejections
- If $\epsilon \ll L$: slow mixing along the "L" direction

Solution: Use a "preconditioned" proposal $Q(x, x') \sim \Sigma^{-1}\mathcal{N}(x', \epsilon I)$


HMC with Pre-Conditioning

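- Density $p(\theta)$; introduce auxiliary variable $p \sim \mathcal{N}(p \mid 0, M)$.

- Energy:

$$H(p, \theta) = -\log p(\theta) + \tfrac{1}{2}\log\left((2\pi)^d |M|\right) + \tfrac{1}{2} p^T M^{-1} p$$

- Hamilton's equations, with $U(\theta) = -\log p(\theta)$:

$$\frac{d\theta}{d\tau} = \frac{\partial H}{\partial p} = M^{-1} p \quad \text{and} \quad \frac{dp}{d\tau} = -\frac{\partial H}{\partial \theta} = -\nabla_\theta U$$

- Reversible, volume-preserving leapfrog integrator:

$$p(\tau + \epsilon/2) = p(\tau) - \epsilon \nabla_\theta U(\theta(\tau))/2$$

$$\theta(\tau + \epsilon) = \theta(\tau) + \epsilon M^{-1} p(\tau + \epsilon/2)$$

$$p(\tau + \epsilon) = p(\tau + \epsilon/2) - \epsilon \nabla_\theta U(\theta(\tau + \epsilon))/2$$

- How to set $M$? Requires either the exact covariance of the target or an empirical estimate of it.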
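A minimal sketch of the preconditioned leapfrog step with a diagonal mass matrix, using the sign convention $dp/d\tau = -\nabla_\theta U$. The target (an uncorrelated 2-D Gaussian with standard deviations 1 and 10) and the choice $M = \Sigma^{-1}$ are assumptions for illustration; with that choice both coordinates oscillate at the same frequency, so one step size suits both scales.

```python
SIGMA = (1.0, 10.0)                           # hypothetical target scales
M_DIAG = tuple(1.0 / s ** 2 for s in SIGMA)   # mass = inverse target variance

def grad_U(theta):
    return [t / s ** 2 for t, s in zip(theta, SIGMA)]

def hamiltonian(theta, p):
    # Additive constants dropped
    potential = sum(0.5 * t * t / s ** 2 for t, s in zip(theta, SIGMA))
    kinetic = sum(0.5 * pi * pi / m for pi, m in zip(p, M_DIAG))
    return potential + kinetic

def leapfrog(theta, p, eps, n_steps):
    theta, p = list(theta), list(p)
    for _ in range(n_steps):
        g = grad_U(theta)
        p = [pi - 0.5 * eps * gi for pi, gi in zip(p, g)]
        theta = [ti + eps * pi / m for ti, pi, m in zip(theta, p, M_DIAG)]  # M^{-1} p
        g = grad_U(theta)
        p = [pi - 0.5 * eps * gi for pi, gi in zip(p, g)]
    return theta, p

theta0, p0 = [1.0, 10.0], [1.0, 0.1]
theta1, p1 = leapfrog(theta0, p0, eps=0.1, n_steps=100)
energy_error = abs(hamiltonian(theta1, p1) - hamiltonian(theta0, p0))

# Reversibility check: negate the momentum and integrate back
theta_back, p_back = leapfrog(theta1, [-pi for pi in p1], eps=0.1, n_steps=100)
reversal_error = max(abs(a - b) for a, b in zip(theta_back, theta0))
```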


Global → Local

Global covariance estimates can be badly locally miscalibrated

Figure: From Kameleon MCMC, Gretton et al.


Philosophizing

- Coordinates are for calculations, but geometry should be defined in a "covariant" way.

- From a formal perspective all coordinates are equally good.

- If so, ideally our calculations (gradients, sampling proposals, etc.) should be independent of parametrization.

- Perhaps there are better notions of "closeness" of parameters in probability models, i.e. should we consider $\mathcal{N}(0, 1)$ as being the same distance from $\mathcal{N}(0, 2)$ as $\mathcal{N}(0, 99)$ is from $\mathcal{N}(0, 100)$?


Manifold

- Characterized by an atlas: a consistent collection of open sets $U_i$ and functions $\phi_i$ that bicontinuously map the manifold $M$ to a real vector space $\mathbb{R}^m$, written $(U_i, \phi_i)$.

- Useful to consider curves $c : (a, b) \to M$ and functions $f : M \to \mathbb{R}$ on the manifold in order to define differentiable structure.

Figure: From Geometry, Topology, and Physics, M. Nakahara


Tangent Space

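- If $c : (a, b) \to M$ is a curve and $f : M \to \mathbb{R}$, then

$$\left.\frac{df(c(t))}{dt}\right|_{t=0} = X^\mu \left(\frac{\partial f}{\partial x^\mu}\right) \equiv X[f]$$

where $\partial f / \partial x^\mu$ really means $\frac{\partial (f \circ \phi^{-1}(x))}{\partial x^\mu}$ and $X^\mu = \left.\frac{d\phi^\mu(c(t))}{dt}\right|_{t=0}$.

- The tangent space $T_pM$ is a linearization of the manifold, spanned by directional derivatives at $p$.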


Metric

The tangent space at each point $T_pM$ is endowed with an inner product via the metric tensor $G_p : T_pM \times T_pM \to \mathbb{R}$, giving a notion of distance and angle. It is also:

- Symmetric: $G_p(t_1, t_2) = G_p(t_2, t_1)$

- Bilinear: $G_p(t_1 + t_2, t_3) = G_p(t_1, t_3) + G_p(t_2, t_3)$

- Positive-definite: $G_p(t_1, t_1) > 0$ for $t_1 \neq 0$

If we had a path $\theta(t) : \mathbb{R} \to M$, then the length of the curve is given by:

$$D(t_1, t_2) = \int_{t_1}^{t_2} \sqrt{G_{\theta(t)}\left(\frac{d\theta}{dt}, \frac{d\theta}{dt}\right)}\,dt$$

For example, the metric on $\mathbb{R}^2$ in polar $(r, \theta)$ coordinates is:

$$\begin{bmatrix} 1 & 0 \\ 0 & r^2 \end{bmatrix}$$
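As a quick numerical check of the length formula with the polar metric, the sketch below integrates $\sqrt{G(\dot{x}, \dot{x})}$ with a midpoint rule. The two test curves (a circle of radius 2 and a radial segment) and the function names are assumptions chosen because their Euclidean lengths are known.

```python
import math

def polar_metric(r, theta):
    # Metric on R^2 in polar coordinates: diag(1, r^2)
    return ((1.0, 0.0), (0.0, r * r))

def curve_length(curve, velocity, t1, t2, n=10000):
    """Midpoint-rule approximation of D(t1, t2) for a curve given in (r, theta)."""
    dt = (t2 - t1) / n
    total = 0.0
    for k in range(n):
        t = t1 + (k + 0.5) * dt
        r, th = curve(t)
        v = velocity(t)
        G = polar_metric(r, th)
        speed_sq = sum(G[i][j] * v[i] * v[j] for i in range(2) for j in range(2))
        total += math.sqrt(speed_sq) * dt
    return total

# Circle of radius 2: r(t) = 2, theta(t) = t, so the velocity is (0, 1)
circle_length = curve_length(lambda t: (2.0, t), lambda t: (0.0, 1.0), 0.0, 2.0 * math.pi)

# Radial segment from r = 1 to r = 3 at fixed angle: the velocity is (1, 0)
radial_length = curve_length(lambda t: (1.0 + t, 0.5), lambda t: (1.0, 0.0), 0.0, 2.0)
```

The $r^2$ entry is what turns a unit change in angle into an arc length proportional to the radius.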


Connections and Parallel Transport

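- In Euclidean space $\mathbb{R}^m$, the derivative of the vector field $V = V^\mu e_\mu$ with respect to $x^\nu$ has the $\mu$th component

$$\frac{\partial V^\mu}{\partial x^\nu} = \lim_{\Delta x^\nu \to 0} \frac{V^\mu(\dots, x^\nu + \Delta x^\nu, \dots) - V^\mu(\dots, x^\nu, \dots)}{\Delta x^\nu}$$

- Take an affine connection $\nabla$ and a chart $(U, \phi)$ with coordinates $x = \phi(p)$ on $M$, and define the connection coefficients by $\nabla_{e_\nu} e_\mu = \Gamma^\lambda_{\nu\mu} e_\lambda$, where $e_\mu = \partial/\partial x^\mu$.

- The covariant derivative of $V$ with respect to $x^\nu$ is defined by

$$\lim_{\Delta x^\nu \to 0} \frac{V^\mu(x + \Delta x) - \tilde{V}^\mu(x + \Delta x)}{\Delta x^\nu} \frac{\partial}{\partial x^\mu} = \left(\frac{\partial V^\mu}{\partial x^\nu} + V^\lambda \Gamma^\mu_{\nu\lambda}\right) \frac{\partial}{\partial x^\mu}$$

where $\tilde{V}$ denotes $V(x)$ parallel transported to $x + \Delta x$, and the bracketed component is written $\nabla_\nu V^\mu$.

- This allows us to connect nearby tangent spaces.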


Pictures


Putting It All Together

- A manifold with a metric defines an essentially unique connection called the Levi-Civita connection:

$$\Gamma^k_{ij} = \frac{1}{2} G^{-1}_{km}\left(\partial_i G_{jm} + \partial_j G_{im} - \partial_m G_{ij}\right)$$

- Given a curve $c(t)$ with tangent vector field $V^\mu = \frac{dx^\mu}{dt}$, it is a geodesic if $\nabla_V V = 0$. This corresponds to the notion of "straightness" on a manifold.

- A geodesic is equivalent to a curve that (locally) minimizes the length $D(t_1, t_2)$ defined before.

- Proposals will be geodesics.
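The Levi-Civita formula can be evaluated numerically for any metric given as a function. The sketch below does so by central finite differences for the polar metric $\mathrm{diag}(1, r^2)$, where the non-zero coefficients are known in closed form ($\Gamma^r_{\theta\theta} = -r$, $\Gamma^\theta_{r\theta} = \Gamma^\theta_{\theta r} = 1/r$). Function names and the finite-difference step are assumptions for illustration.

```python
def metric(x):
    # Polar metric on R^2 with coordinates x = (r, theta)
    r = x[0]
    return [[1.0, 0.0], [0.0, r * r]]

def inverse_2x2(G):
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    return [[G[1][1] / det, -G[0][1] / det], [-G[1][0] / det, G[0][0] / det]]

def d_metric(x, m, h=1e-5):
    # Central difference approximation of dG / dx^m
    xp, xm = list(x), list(x)
    xp[m] += h
    xm[m] -= h
    Gp, Gm = metric(xp), metric(xm)
    return [[(Gp[i][j] - Gm[i][j]) / (2.0 * h) for j in range(2)] for i in range(2)]

def christoffel(x):
    """Return gamma[k][i][j] = Gamma^k_{ij} at the point x."""
    G_inv = inverse_2x2(metric(x))
    dG = [d_metric(x, m) for m in range(2)]   # dG[m][i][j] = d_m G_ij
    return [[[0.5 * sum(G_inv[k][m] * (dG[i][j][m] + dG[j][i][m] - dG[m][i][j])
                        for m in range(2))
              for j in range(2)]
             for i in range(2)]
            for k in range(2)]

gamma = christoffel([2.0, 0.3])   # evaluated at r = 2
```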


Information Geometry

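- Describe the set of probability distributions parametrized by $\theta$ as a statistical manifold.

- Denote the expected Fisher information as

$$G(\theta) = \mathrm{Cov}(\nabla_\theta \log f(\theta)) = -\mathbb{E}\left(\frac{\partial^2}{\partial \theta^2} \log f(\theta)\right)$$

- To leading order in $\delta\theta$:

$$D(\theta \,\|\, \theta + \delta\theta) = \int dy\, p(y \mid \theta + \delta\theta) \log \frac{p(y \mid \theta + \delta\theta)}{p(y \mid \theta)} \approx \frac{1}{2}\delta\theta^T G(\theta)\, \delta\theta$$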
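The local quadratic behaviour of the KL divergence can be checked in closed form for a single observation from a normal distribution in $(\mu, \sigma)$ coordinates, where the Fisher information is $\mathrm{diag}(1/\sigma^2, 2/\sigma^2)$. The sketch below compares the exact divergence with $\frac{1}{2}\delta\theta^T G\, \delta\theta$, the standard second-order expansion; the numbers chosen are arbitrary illustrations.

```python
import math

def kl_normal(mu1, s1, mu2, s2):
    """KL( N(mu1, s1) || N(mu2, s2) ), with s the standard deviation."""
    return math.log(s2 / s1) + (s1 ** 2 + (mu1 - mu2) ** 2) / (2.0 * s2 ** 2) - 0.5

def fisher_quadratic(d_mu, d_sigma, sigma, n=1):
    # (1/2) * delta^T G delta with G = diag(n / sigma^2, 2n / sigma^2)
    g_mu = n / sigma ** 2
    g_sigma = 2.0 * n / sigma ** 2
    return 0.5 * (g_mu * d_mu ** 2 + g_sigma * d_sigma ** 2)

mu, sigma = 0.0, 10.0
d_mu, d_sigma = 0.1, 0.05

# The integral on the slide is the KL divergence from theta + delta to theta
exact = kl_normal(mu + d_mu, sigma + d_sigma, mu, sigma)
approx = fisher_quadratic(d_mu, d_sigma, sigma)
relative_gap = abs(exact - approx) / approx
```

The same perturbation counts for less when $\sigma$ is large, which is the sense in which the Fisher metric measures distance between distributions rather than between parameter values.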


N (µ, σ)

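- For $\mathcal{N}(\mu, \sigma)$ the metric in $(\mu, \sigma)$ coordinates is:

$$G = \begin{bmatrix} N/\sigma^2 & 0 \\ 0 & 2N/\sigma^2 \end{bmatrix}, \qquad \frac{\partial G}{\partial \mu} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}, \qquad \frac{\partial G}{\partial \sigma} = \begin{bmatrix} -2N/\sigma^3 & 0 \\ 0 & -4N/\sigma^3 \end{bmatrix}$$

- Riemannian Langevin diffusion: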

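$$d\theta_i(t) = -\tfrac{1}{2}\left[G^{-1}(\theta(t)) \nabla_\theta U(\theta)\right]_i dt + |G(\theta(t))|^{-1/2} \sum_j \frac{\partial}{\partial \theta_j}\left[G^{-1}(\theta(t))_{ij}\, |G(\theta(t))|^{1/2}\right] dt + \left[\sqrt{G^{-1}(\theta(t))}\, dW(t)\right]_i$$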


More Pictures

- With a sample size of $N = 30$ drawn from $\mathcal{N}(\mu = 0, \sigma = 10)$


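Riemannian HMC

- HMC on a manifold metrized with the Fisher information:

$$H(\theta, p) = -L(\theta) + \frac{1}{2}\log\left((2\pi)^D |G(\theta)|\right) + \frac{1}{2} p^T G(\theta)^{-1} p$$

where $L(\theta) = \log f(\theta) = -U(\theta)$ is the log density (not the number of leapfrog steps).

- Integrating out $p$ leaves us with the correct marginal over $\theta$.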


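Riemannian HMC cont'd.

- Hamilton's equations for time evolution are equivalent to the 2nd order geodesic equations, so RHMC proposals are geodesics. With $U(\theta) = -L(\theta)$:

$$\frac{d\theta_i}{dt} = \frac{\partial H}{\partial p_i} = \left(G(\theta)^{-1} p\right)_i \qquad (3)$$

$$\frac{dp_i}{dt} = -\frac{\partial H}{\partial \theta_i} = -\frac{\partial U(\theta)}{\partial \theta_i} - \frac{1}{2}\mathrm{Tr}\left[G(\theta)^{-1} \frac{\partial G(\theta)}{\partial \theta_i}\right] + \frac{1}{2} p^T G(\theta)^{-1} \frac{\partial G(\theta)}{\partial \theta_i} G(\theta)^{-1} p \qquad (4)$$

- Define an implicit, volume-preserving, reversible Generalized Leapfrog Integrator, which is solved iteratively.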
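A hedged 1-D sketch of a generalized leapfrog step in the form given by Girolami and Calderhead: an implicit half step in momentum, an implicit full step in position, and an explicit half step in momentum, with the implicit equations solved by fixed-point iteration. The metric $G(\theta) = 1 + \theta^2$ and potential $U(\theta) = \theta^2/2$ are hypothetical choices made only to have a position-dependent metric; they are not a Fisher information from the slides.

```python
import math

def metric(theta):
    return 1.0 + theta * theta        # hypothetical G(theta)

def d_metric(theta):
    return 2.0 * theta                # dG / dtheta

def hamiltonian(theta, p):
    # H = U + (1/2) log G + (1/2) p^2 / G, with U = theta^2 / 2, constants dropped
    g = metric(theta)
    return 0.5 * theta * theta + 0.5 * math.log(g) + 0.5 * p * p / g

def dH_dtheta(theta, p):
    g, dg = metric(theta), d_metric(theta)
    return theta + 0.5 * dg / g - 0.5 * p * p * dg / (g * g)

def dH_dp(theta, p):
    return p / metric(theta)

def generalized_leapfrog(theta, p, eps, n_fixed_point=50):
    # Implicit half step in momentum: p_half appears on both sides
    p_half = p
    for _ in range(n_fixed_point):
        p_half = p - 0.5 * eps * dH_dtheta(theta, p_half)
    # Implicit full step in position: theta_new appears on both sides
    theta_new = theta
    for _ in range(n_fixed_point):
        theta_new = theta + 0.5 * eps * (dH_dp(theta, p_half) + dH_dp(theta_new, p_half))
    # Explicit half step in momentum
    p_new = p_half - 0.5 * eps * dH_dtheta(theta_new, p_half)
    return theta_new, p_new

def integrate(theta, p, eps, n_steps):
    for _ in range(n_steps):
        theta, p = generalized_leapfrog(theta, p, eps)
    return theta, p

theta0, p0 = 1.0, 0.5
theta1, p1 = integrate(theta0, p0, eps=0.1, n_steps=50)
energy_error = abs(hamiltonian(theta1, p1) - hamiltonian(theta0, p0))
theta_back, p_back = integrate(theta1, -p1, eps=0.1, n_steps=50)
reversal_error = abs(theta_back - theta0) + abs(-p_back - p0)
```

Reversibility holds only to the accuracy of the fixed-point solves, which is why the iteration count here is far more generous than a production sampler would use.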


Riemannian HMC Performance


Riemann SGLD Performance


Citations

- Duane, S., Kennedy, A. D., Pendleton, B. J., and Roweth, D. Hybrid Monte Carlo. Physics Letters B, 195(2):216–222, 1987.

- Neal, Radford M. MCMC using Hamiltonian dynamics. Handbook of Markov Chain Monte Carlo, 54:113–162, 2010.

- Hoffman, Gelman. The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research, 15(1):1593–1623.

- Girolami, Mark and Calderhead, Ben. Riemann manifold Langevin and Hamiltonian Monte Carlo methods. Journal of the Royal Statistical Society: Series B, 73(2):123–214, 2011.

- Nakahara, Mikio. Geometry, Topology, and Physics.
