On Particle Gibbs Sampling (mwl25/mcmski/slides/ss_mcmski14.pdf)

Transcript
Page 1:

On Particle Gibbs Sampling

Sumeetpal S. Singh, Cambridge University Engineering Department

joint work: N. Chopin (CREST-ENSAE)

MCMSki, Chamonix, 7 January 2014

Page 1 of 17

Page 2:

State-space Model

Running example (Yu & Meng 2011):

X_0 ∼ N(µ, σ²),  X_{t+1} − µ = ρ(X_t − µ) + σ ε_t,  ε_t ∼ i.i.d. N(0, 1)

Y_t | X_t = x_t ∼ Poisson(e^{x_t})

In general:

X_t | X_{t−1} = x_{t−1} ∼ m_θ(x_{t−1}, x_t) dx_t

Y_t | X_t = x_t ∼ g_θ(x_t, y_t) dy_t
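As a concrete illustration, the running example is easy to simulate; a minimal sketch, with placeholder parameter values (µ = 0, ρ = 0.9, σ = 0.5) that are not taken from the slides:

```python
import numpy as np

def simulate_ssm(T, mu=0.0, rho=0.9, sigma=0.5, rng=None):
    """Simulate the AR(1) latent chain with Poisson(exp(x_t)) observations."""
    rng = np.random.default_rng(rng)
    x = np.empty(T + 1)
    x[0] = rng.normal(mu, sigma)          # X_0 ~ N(mu, sigma^2), as on the slide
    for t in range(T):
        x[t + 1] = mu + rho * (x[t] - mu) + sigma * rng.normal()
    y = rng.poisson(np.exp(x))            # Y_t | X_t = x_t ~ Poisson(e^{x_t})
    return x, y

x, y = simulate_ssm(T=399, rng=0)         # T = 399 matches the experiments below
```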

Page 2 of 17


Page 4:

Inference Objective

The posterior: p(θ, x_{0:T} | y_{0:T}), with θ = (µ, ρ, σ)

The Gibbs sampler: (θ, x_{0:T}) → (θ′, x′_{0:T})

σ′ | (x_{0:T}, µ, ρ) ∼ Gamma(· · ·)

⋮

µ′ | (x_{0:T}, σ′, ρ′) ∼ Normal(· · ·)

x′_{0:T} | (σ′, µ′, ρ′)

One at a time: x′_i | (x′_{0:i−1}, x_{i+1:T}, σ′, µ′, ρ′)
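Because the Poisson observation makes the single-site conditional non-conjugate, the one-at-a-time update is commonly done by a Metropolis-within-Gibbs step; a minimal sketch for the running example (the random-walk proposal, the step size, and skipping the x_0 update are illustrative choices, not from the slides):

```python
import numpy as np

def log_cond(x_i, x_prev, x_next, y_i, mu, rho, sigma):
    # log p(x_i | x_{i-1}, x_{i+1}, y_i) up to a constant:
    # the AR(1) transitions into and out of x_i, plus the Poisson(e^{x_i}) likelihood
    lp = -0.5 * ((x_i - mu - rho * (x_prev - mu)) / sigma) ** 2
    if x_next is not None:
        lp += -0.5 * ((x_next - mu - rho * (x_i - mu)) / sigma) ** 2
    lp += y_i * x_i - np.exp(x_i)          # Poisson log-likelihood, dropping log(y!)
    return lp

def one_at_a_time(x, y, mu, rho, sigma, step=0.5, rng=None):
    """One sweep of random-walk MH updates of x_1, ..., x_T (x_0 omitted for brevity)."""
    rng = np.random.default_rng(rng)
    x = x.copy()
    for i in range(1, len(x)):
        x_next = x[i + 1] if i + 1 < len(x) else None
        prop = x[i] + step * rng.normal()  # random-walk proposal
        if np.log(rng.uniform()) < (log_cond(prop, x[i - 1], x_next, y[i], mu, rho, sigma)
                                    - log_cond(x[i], x[i - 1], x_next, y[i], mu, rho, sigma)):
            x[i] = prop
    return x
```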

Page 3 of 17


Page 7:

Particle Gibbs kernel (Andrieu, Doucet and Holenstein, 2010):

Replace the Gibbs step for x_{0:T} with

X′_{0:T} | (θ′, x_{0:T}) ∼ P_{θ′,N}(x_{0:T}, dx′_{0:T})

Invariant measure: p(x_{0:T} | θ′, y_{0:T})

The kernel is a randomly chosen output of a particle filter with N particles, one of which is fixed to x_{0:T}

Meant to emulate the Gibbs step (change the whole trajectory)

Typically:

x′_{0:T} ≠ x_{0:T} but x′_i = x_i for small i.

Increasing N fixes this.

Page 4 of 17


Page 9:

Running Particle Gibbs

Sampling p(θ, x_{0:399} | y_{0:399})

[Figure: two panels of update rate vs. t (t = 0, . . . , 400), comparing multinomial, residual, systematic, and multi+BS resampling]

Statistic: proportion of indices i with x′_i ≠ x_i, for i = 0, . . . , 399

Page 5 of 17

Page 10:

Running Particle Gibbs

Sampling p(θ, x_{0:399} | y_{0:399})

[Figure: ACF of X_0 (left panel) and X_399 (right panel) against lag 0–80, comparing multinomial, residual, systematic, and multi+BS resampling]

Statistic: autocorrelation of X_0 and X_399 (200 particles)
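The ACF statistic itself is straightforward to compute from a stored MCMC trace; a small helper (the function name and lag convention are our own, not from the slides):

```python
import numpy as np

def acf(chain, max_lag=80):
    """Empirical autocorrelation of an MCMC trace at lags 0..max_lag."""
    c = np.asarray(chain, dtype=float)
    c = c - c.mean()
    var = np.dot(c, c) / len(c)            # lag-0 autocovariance
    return np.array([np.dot(c[:len(c) - k], c[k:]) / (len(c) * var)
                     for k in range(max_lag + 1)])
```

Applied to the stored samples of X_0 and X_399 across Gibbs iterations, this reproduces the kind of curves plotted above.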

Page 6 of 17

Page 11:

Particle filter

Input: {x^i_{0:t}}_{i=1}^N ≈ p(x_{0:t} | θ, y_{0:t})

Output: {x^i_{0:t+1}}_{i=1}^N ≈ p(x_{0:t+1} | θ, y_{0:t+1})

Append: (x^i_{0:t}, X^i_{t+1}) where X^i_{t+1} ∼ m_θ(x^i_t, dx_{t+1})

Weight: x^i_{0:t+1} gets weight w^i_{t+1} = g_θ(x^i_{t+1}, y_{t+1})

Output: the approximation of p(x_{0:t+1} | θ, y_{0:t+1}) is N independent samples from

∑_{i=1}^N w^i_{t+1} δ_{x^i_{0:t+1}}(dx′_{0:t+1})
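One step of this bootstrap filter, specialised to the AR(1)+Poisson running example with multinomial resampling (a sketch; the array layout and normalised weights are implementation choices):

```python
import numpy as np

def bootstrap_pf_step(paths, y_next, mu, rho, sigma, rng):
    """One bootstrap-filter step for the AR(1)+Poisson running example.

    paths: (N, t+1) array of trajectories x^i_{0:t}.
    Returns an (N, t+2) array of resampled trajectories x^i_{0:t+1}.
    """
    N = paths.shape[0]
    # Append: propagate each particle through the state transition m_theta
    x_new = mu + rho * (paths[:, -1] - mu) + sigma * rng.normal(size=N)
    paths = np.column_stack([paths, x_new])
    # Weight: Poisson(e^{x}) likelihood of the new observation (log scale)
    logw = y_next * x_new - np.exp(x_new)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    # Resample: N multinomial draws from the weighted empirical measure
    idx = rng.choice(N, size=N, p=w)
    return paths[idx]
```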

Page 7 of 17


Page 13:

Sampling from P_{θ,N}(x⋆_{0:T}, dx_{0:T})

Idea: an N-particle system, but force x^1_{0:T} = x⋆_{0:T}

Intermediate step t

Input: {x^i_{0:t}}_{i=1}^N with x^1_{0:t} = x⋆_{0:t}

Output: {x^i_{0:t+1}}_{i=1}^N with x^1_{0:t+1} = x⋆_{0:t+1}

Append particles i = 2, . . . , N:

(x^i_{0:t}, X^i_{t+1}) where X^i_{t+1} ∼ m_θ(x^i_t, dx_{t+1})

Weight: x^i_{0:t+1} gets weight w^i_{t+1} = g_θ(x^i_{t+1}, y_{t+1})

Output: x^1_{0:t+1} = x⋆_{0:t+1} and N − 1 independent samples from

w⋆_{t+1} δ_{x⋆_{0:t+1}}(dx′_{0:t+1}) + ∑_{i=2}^N w^i_{t+1} δ_{x^i_{0:t+1}}(dx′_{0:t+1})
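The conditional version changes very little relative to the plain filter step: particle 1 is pinned to the reference path, and only slots 2, . . . , N are refreshed by resampling from the full weighted mixture (which includes the reference with weight w⋆). A sketch for the running example (an illustrative implementation, not the authors' code):

```python
import numpy as np

def cpf_step(paths, ref_next, y_next, mu, rho, sigma, rng):
    """One conditional-PF step: particle 1 is pinned to the reference path."""
    N = paths.shape[0]
    x_new = mu + rho * (paths[:, -1] - mu) + sigma * rng.normal(size=N)
    x_new[0] = ref_next                      # force x^1_{t+1} = x*_{t+1}
    paths = np.column_stack([paths, x_new])
    logw = y_next * x_new - np.exp(x_new)    # Poisson(e^x) observation weight
    w = np.exp(logw - logw.max())
    w /= w.sum()
    # Resample only slots 2..N; slot 1 keeps the reference trajectory.
    # The draws come from the full mixture, reference included (weight w*).
    idx = np.concatenate([[0], rng.choice(N, size=N - 1, p=w)])
    return paths[idx]
```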

Page 8 of 17


Page 17:

cPF example

[Figure: cPF example, particle trajectories over t = 0, . . . , 10, state values between −2 and 2]

Page 9 of 17


Page 20:

Uniform Ergodicity

Assumption

g_θ(x_t, y_t) ≤ G_{t,θ} (density upper bounded)

Predicted density lower bounded:

∫ m_θ(dx_0) g_θ(x_0, y_0) ≥ 1/G_{0,θ},  ∫ m_θ(x_{t−1}, dx_t) g_θ(x_t, y_t) ≥ 1/G_{t,θ}

For any θ and ε > 0, for N large enough,

|(P_{N,θ} ϕ)(x_{0:T}) − (P_{N,θ} ϕ)(x̌_{0:T})| ≤ ε

for all x_{0:T}, x̌_{0:T}, and ϕ : X^{T+1} → [−1, 1]

⇒ Large N gives samples arbitrarily close to the target p(x_{0:T} | θ, y_{0:T})

⇒ Uniform ergodicity

Page 10 of 17

Page 21:

Proof Idea

Use coupling: define (X⋆_{0:T}, X̌⋆_{0:T}) such that

Law(X⋆_{0:T}) = P_{N,θ}(x_{0:T}, dx⋆_{0:T}),  Law(X̌⋆_{0:T}) = P_{N,θ}(x̌_{0:T}, dx̌⋆_{0:T})

If

P(X⋆_{0:T} ≠ X̌⋆_{0:T}) ≤ ε

then

P_{N,θ}(x_{0:T}, A) − P_{N,θ}(x̌_{0:T}, A) = E{ (I_A(X⋆_{0:T}) − I_A(X̌⋆_{0:T})) I_{X⋆_{0:T} ≠ X̌⋆_{0:T}} } ≤ P(X⋆_{0:T} ≠ X̌⋆_{0:T}) ≤ ε

Page 11 of 17

Page 22:

Coupling cPFs

Aim: couple the outputs of P_{N,θ}(x_{0:T}, dx⋆_{0:T}) and P_{N,θ}(x̌_{0:T}, dx̌⋆_{0:T})

If the cPF outputs {x^i_{0:t}}_{i=1}^N and {x̌^i_{0:t}}_{i=1}^N satisfy

x^i_{0:t} = x̌^i_{0:t} for i ∈ C_t, where C_0 = {2, . . . , N}

the same holds after the append move: (x^i_{0:t}, x^i_{t+1}) = (x̌^i_{0:t}, x̌^i_{t+1}) for i ∈ C_t

Resampling move: select particles in C_t for survival if possible

Coupling probability determined by the law of C_T
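The "select particles in C_t for survival if possible" idea is in the spirit of a maximal coupling of the two resampling distributions, which makes the two systems draw the same index with the largest possible probability. A generic sketch (this is the standard maximal-coupling construction for two categorical laws, shown here only to illustrate the resampling coupling, not the paper's exact scheme):

```python
import numpy as np

def maximally_coupled_indices(w, w_check, rng):
    """Sample indices (I, I_check) with marginals w and w_check,
    maximising P(I == I_check): a maximal coupling of two categorical laws."""
    overlap = np.minimum(w, w_check)
    alpha = overlap.sum()                  # total overlap mass = P(I == I_check)
    if rng.uniform() < alpha:
        i = rng.choice(len(w), p=overlap / alpha)
        return i, i                        # coupled draw: the same index survives
    # Otherwise draw independently from the normalised residual measures
    r = (w - overlap) / (1 - alpha)
    r_check = (w_check - overlap) / (1 - alpha)
    return rng.choice(len(w), p=r), rng.choice(len(w), p=r_check)
```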

Page 12 of 17

Page 23:

Backward Sampling

N. Whiteley (RSS discussion of PMCMC) suggested an extra backward step for the cPF that tries to modify (recursively, backward in time) the ancestry of the selected trajectory.

There is a forward "version" by Lindsten and Schön (2012)
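For the running example, the backward step re-draws each ancestor with weights tilted by the transition density into the already-chosen future state; a sketch assuming the filter stored all particle values and log-weights (an illustrative implementation, not the authors' code):

```python
import numpy as np

def backward_sample(particles, logw, mu, rho, sigma, rng):
    """Draw one trajectory by backward sampling (the extra backward step).

    particles, logw: (T+1, N) arrays of particle values and log-weights,
    stored at every time step of a (conditional) particle filter run.
    """
    T1, N = particles.shape

    def pick(lw):
        w = np.exp(lw - lw.max())
        return rng.choice(N, p=w / w.sum())

    traj = np.empty(T1)
    b = pick(logw[-1])                     # sample the endpoint from final weights
    traj[-1] = particles[-1, b]
    for t in range(T1 - 2, -1, -1):
        # reweight by the AR(1) transition density into the chosen x_{t+1}
        lw = logw[t] - 0.5 * ((traj[t + 1] - mu - rho * (particles[t] - mu)) / sigma) ** 2
        b = pick(lw)
        traj[t] = particles[t, b]
    return traj
```

Each backward draw can change the ancestry of the selected trajectory, which is what breaks the "x′_i = x_i for small i" degeneracy seen earlier.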

Page 13 of 17

Page 24:

Idea of Backward Sampling

[Figure: backward sampling illustrated on particle trajectories over t = 0, . . . , 10, state values between −2 and 2]

Page 14 of 17

Page 25:

Running Backward Sampling

[Figure: left panel, update rate vs. t (t = 0, . . . , 400); right panel, ACF of X_0 against lag 0–80; comparing multinomial, residual, systematic, and multi+BS resampling]

Left statistic: proportion of indices i with x′_i ≠ x_i, for i = 0, . . . , 399

Page 15 of 17

Page 26:

Backward Sampling

Success depends on the state transition law m_θ(x_t, dx_{t+1})

The cPF kernel P_{θ,N} with a BS step dominates the no-BS version in asymptotic efficiency

i.e. it gives rise to a CLT with a smaller asymptotic variance (Idea: self-adjoint operator + lag-1 domination; see Tierney (1998), Mira & Geyer (1999))

The cPF kernel is geometrically ergodic ⇒ the cPF kernel with BS is geometrically ergodic

(Idea: both kernels are positive operators)

Page 16 of 17

Page 27:

Final Remarks

Established uniform ergodicity of cPF using coupling

Dependence on N and T was not given, although in practice N should scale linearly with T
– now proved by (Andrieu, Lee, Vihola) and (Douc, Lindsten, Moulines)

Whiteley's BS is better: in asymptotic efficiency, and it inherits geometric ergodicity

Page 17 of 17