
On Particle Gibbs Sampling

Sumeetpal S. Singh, Cambridge University Engineering Department

Joint work with N. Chopin (CREST-ENSAE)

MCMSKI, Chamonix, 7 January 2014


State-space Model

Running example (Yu & Meng 2011):

$X_0 \sim N(\mu, \sigma^2)$, $\quad X_{t+1} - \mu = \rho\,(X_t - \mu) + \sigma\,\varepsilon_t$, $\quad \varepsilon_t \overset{\text{i.i.d.}}{\sim} N(0, 1)$

$Y_t \mid X_t = x_t \sim \text{Poisson}(e^{x_t})$

In general:

$X_t \mid X_{t-1} = x_{t-1} \sim m_\theta(x_{t-1}, x_t)\,dx_t$

$Y_t \mid X_t = x_t \sim g_\theta(x_t, y_t)\,dy_t$

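As a concrete reference point, here is a minimal NumPy sketch of the running example; the parameter values at the end are illustrative, not taken from the slides.

```python
import numpy as np

def simulate_ssm(T, mu, rho, sigma, rng=None):
    """Simulate the running example (Yu & Meng 2011): an AR(1)
    latent state with Poisson(exp(x_t)) observations."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.empty(T + 1)
    x[0] = rng.normal(mu, sigma)            # X_0 ~ N(mu, sigma^2)
    for t in range(T):
        # X_{t+1} - mu = rho (X_t - mu) + sigma eps_t
        x[t + 1] = mu + rho * (x[t] - mu) + sigma * rng.normal()
    y = rng.poisson(np.exp(x))              # Y_t | X_t = x_t ~ Poisson(e^{x_t})
    return x, y

# Illustrative values, not from the slides:
x, y = simulate_ssm(T=399, mu=0.0, rho=0.9, sigma=0.5)
```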

Inference Objective

The posterior: $p(\theta, x_{0:T} \mid y_{0:T})$, with $\theta = (\mu, \rho, \sigma)$

The Gibbs sampler: $(\theta, x_{0:T}) \to (\theta', x'_{0:T})$

$\sigma' \mid (x_{0:T}, \mu, \rho) \sim \text{Gamma}(\cdots)$, ...

$\mu' \mid (x_{0:T}, \sigma', \rho') \sim \text{Normal}(\cdots)$

$x'_{0:T} \mid (\sigma', \mu', \rho')$

One at a time: $x_i \mid (x'_{0:i-1}, x_{i+1:T}, \sigma', \mu', \rho')$


Particle Gibbs kernel (Andrieu, Doucet and Holenstein, 2010):

Replace Gibbs step for x0:T with

$X'_{0:T} \mid (\theta', x_{0:T}) \sim P_{\theta',N}(x_{0:T}, dx'_{0:T})$

Invariant measure: $p(x_{0:T} \mid \theta', y_{0:T})$

The kernel is a randomly chosen output of a particle filter run with $N$ particles, one of which is fixed to $x_{0:T}$

Meant to emulate the Gibbs step (change the whole trajectory)

Typically:

$x'_{0:T} \neq x_{0:T}$ but $x'_i = x_i$ for small $i$.

Increasing $N$ fixes this.

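To make the sweep concrete, here is a minimal Python skeleton of the resulting sampler; it is a sketch, not the authors' code. `sample_theta_given_x` is a hypothetical stand-in for the Gamma/Normal parameter conditionals elided above, and `cpf_kernel` is the conditional particle filter kernel sketched after the "Sampling from $P_{\theta,N}$" slide below.

```python
def particle_gibbs(y, theta0, x_init, n_iter, N, rng):
    """Skeleton of a Particle Gibbs sweep: exact Gibbs updates for the
    parameters, with the x_{0:T} block updated by the cPF kernel
    P_{theta,N}, which leaves p(x_{0:T} | theta, y_{0:T}) invariant."""
    theta, x = theta0, x_init
    samples = []
    for _ in range(n_iter):
        # Hypothetical helper: draw (sigma', mu', rho') from their full
        # conditionals given the current trajectory x_{0:T}
        theta = sample_theta_given_x(y, x, theta, rng)
        # Replace the exact x_{0:T} Gibbs step with a draw from the
        # cPF kernel, conditioned on the current trajectory
        x = cpf_kernel(y, theta, x, N, rng)
        samples.append((theta, x.copy()))
    return samples
```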

Running Particle Gibbs

Sampling $p(\theta, x_{0:399} \mid y_{0:399})$

[Figure: two panels plotting update rate (0 to 1) against $t = 0, \ldots, 400$, for multinomial, residual, and systematic resampling, and multinomial + BS]

Statistic: the proportion of updates with $x'_i \neq x_i$, for $i = 0, \ldots, 399$


Running Particle Gibbs

Sampling $p(\theta, x_{0:400} \mid y_{0:400})$

[Figure: two panels plotting the ACF of $X_0$ (left) and $X_{399}$ (right) against lag $= 0, \ldots, 80$, for multinomial, residual, and systematic resampling, and multinomial + BS]

Statistic: autocorrelation of $X_0$ and $X_{399}$ (200 particles)


Particle filter

Input: $\{x^i_{0:t}\}_{i=1}^N \approx p(x_{0:t} \mid \theta, y_{0:t})$

Output: $\{x^i_{0:t+1}\}_{i=1}^N \approx p(x_{0:t+1} \mid \theta, y_{0:t+1})$

Append: $(x^i_{0:t}, X^i_{t+1})$ where $X^i_{t+1} \sim m_\theta(x^i_t, dx_{t+1})$

Weight: $x^i_{0:t+1}$ gets weight $w^i_{t+1} = g_\theta(x^i_{t+1}, y_{t+1})$

Output: the approximation of $p(x_{0:t+1} \mid \theta, y_{0:t+1})$ is $N$ independent samples from
$$\sum_{i=1}^{N} w^i_{t+1} \, \delta_{x^i_{0:t+1}}(dx'_{0:t+1})$$

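A minimal NumPy sketch of this step for the running example, with multinomial resampling (one of the schemes compared in the plots); the Poisson$(e^x)$ likelihood is evaluated in log space for numerical stability.

```python
import numpy as np

def pf_step(paths, y_next, theta, rng):
    """One bootstrap particle filter step for the running example.
    paths: (N, t+1) array of trajectories x^i_{0:t}.
    Returns an (N, t+2) array approximating p(x_{0:t+1} | theta, y_{0:t+1})."""
    mu, rho, sigma = theta
    N = paths.shape[0]
    # Append: X^i_{t+1} ~ m_theta(x^i_t, dx_{t+1})
    x_new = mu + rho * (paths[:, -1] - mu) + sigma * rng.normal(size=N)
    paths = np.column_stack([paths, x_new])
    # Weight: w^i_{t+1} = g_theta(x^i_{t+1}, y_{t+1}); the log Poisson(e^x)
    # mass is y*x - e^x up to a constant shared by all particles
    logw = y_next * x_new - np.exp(x_new)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    # Multinomial resampling: N draws from sum_i w^i delta_{x^i_{0:t+1}}
    return paths[rng.choice(N, size=N, p=w)]
```

Iterating `pf_step` from an initial sample of $X_0$ gives the full filter.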

Sampling from $P_{\theta,N}(x^\star_{0:T}, dx_{0:T})$

Idea: an $N$-particle system, but force $x^1_{0:T} = x^\star_{0:T}$

Intermediate step $t$. Input: $\{x^i_{0:t}\}_{i=1}^N$ with $x^1_{0:t} = x^\star_{0:t}$

Output: $\{x^i_{0:t+1}\}_{i=1}^N$ with $x^1_{0:t+1} = x^\star_{0:t+1}$

Append particles $i = 2, \ldots, N$: $(x^i_{0:t}, X^i_{t+1})$ where $X^i_{t+1} \sim m_\theta(x^i_t, dx_{t+1})$

Weight: $x^i_{0:t+1}$ gets weight $w^i_{t+1} = g_\theta(x^i_{t+1}, y_{t+1})$

Output: $x^1_{0:t+1} = x^\star_{0:t+1}$ and $N - 1$ independent samples from
$$w^\star_{t+1} \, \delta_{x^\star_{0:t+1}}(dx'_{0:t+1}) + \sum_{i=2}^{N} w^i_{t+1} \, \delta_{x^i_{0:t+1}}(dx'_{0:t+1})$$

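Putting these steps together, here is a NumPy sketch of the whole kernel $P_{\theta,N}$ for the running example, again with multinomial resampling; the trajectory returned at the end is the "randomly chosen output" of the conditional filter. This is a sketch under those assumptions, not the authors' implementation.

```python
import numpy as np

def cpf_kernel(y, theta, x_star, N, rng):
    """cPF kernel P_{theta,N}(x*, dx'_{0:T}) for the running example:
    run a particle filter with particle 1 pinned to the reference
    trajectory x*, then draw one trajectory from the final weights."""
    mu, rho, sigma = theta
    x0 = mu + sigma * rng.normal(size=N)        # X_0 ~ N(mu, sigma^2)
    x0[0] = x_star[0]                           # force x^1_0 = x*_0
    paths = x0[:, None]
    logw = y[0] * x0 - np.exp(x0)               # log Poisson(e^x) weights
    w = np.exp(logw - logw.max())
    w /= w.sum()
    for t in range(len(y) - 1):
        # Resample ancestors for i = 2, ..., N; particle 1 keeps x*_{0:t}
        idx = np.empty(N, dtype=int)
        idx[0] = 0
        idx[1:] = rng.choice(N, size=N - 1, p=w)
        paths = paths[idx]
        # Append, forcing x^1_{t+1} = x*_{t+1}
        x_new = mu + rho * (paths[:, -1] - mu) + sigma * rng.normal(size=N)
        x_new[0] = x_star[t + 1]
        paths = np.column_stack([paths, x_new])
        # Weight with the Poisson(e^x) likelihood
        logw = y[t + 1] * x_new - np.exp(x_new)
        w = np.exp(logw - logw.max())
        w /= w.sum()
    # The kernel's output: one trajectory drawn from the final weights
    return paths[rng.choice(N, p=w)]
```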

cPF example

[Figure: particle trajectories over $t = 0, \ldots, 10$; vertical axis from $-2$ to $2$]


Uniform Ergodicity

Assumption

$g_\theta(x_t, y_t) \leq G_{t,\theta}$ (density upper bounded)

Predicted density lower bounded:
$$\int m_\theta(dx_0)\, g_\theta(x_0, y_0) \geq \frac{1}{G_{0,\theta}}, \qquad \int m_\theta(x_{t-1}, dx_t)\, g_\theta(x_t, y_t) \geq \frac{1}{G_{t,\theta}}$$

For any $\theta$ and $\epsilon > 0$, for $N$ large enough,
$$\left| (P_{\theta,N}\,\varphi)(x_{0:T}) - (P_{\theta,N}\,\varphi)(\check{x}_{0:T}) \right| \leq \epsilon$$
for all $x_{0:T}$, $\check{x}_{0:T}$, and $\varphi : \mathcal{X}^{T+1} \to [-1, 1]$

$\Rightarrow$ Large $N$ gives samples arbitrarily close to the target $p(x_{0:T} \mid \theta, y_{0:T})$
$\Rightarrow$ Uniform ergodicity


Proof Idea

Use coupling: define $(X^\star_{0:T}, \check{X}^\star_{0:T})$ such that
$$\mathrm{Law}(X^\star_{0:T}) = P_{\theta,N}(x_{0:T}, dx^\star_{0:T}), \qquad \mathrm{Law}(\check{X}^\star_{0:T}) = P_{\theta,N}(\check{x}_{0:T}, d\check{x}^\star_{0:T})$$

If
$$P\left(X^\star_{0:T} \neq \check{X}^\star_{0:T}\right) \leq \epsilon$$
then
$$P_{\theta,N}(x_{0:T}, A) - P_{\theta,N}(\check{x}_{0:T}, A) = E\left\{ \left( I_A(X^\star_{0:T}) - I_A(\check{X}^\star_{0:T}) \right) I_{\{X^\star_{0:T} \neq \check{X}^\star_{0:T}\}} \right\} \leq P\left(X^\star_{0:T} \neq \check{X}^\star_{0:T}\right) \leq \epsilon$$


Coupling cPFs

Aim: couple the outputs of $P_{\theta,N}(x_{0:T}, dx^\star_{0:T})$ and $P_{\theta,N}(\check{x}_{0:T}, d\check{x}^\star_{0:T})$

If the cPF outputs $\{x^i_{0:t}\}_{i=1}^N$ and $\{\check{x}^i_{0:t}\}_{i=1}^N$ satisfy
$$x^i_{0:t} = \check{x}^i_{0:t} \quad \text{for } i \in C_t,$$
where $C_0 = \{2, \ldots, N\}$, the same holds after the append move: $(x^i_{0:t}, x^i_{t+1}) = (\check{x}^i_{0:t}, \check{x}^i_{t+1})$ for $i \in C_t$

Resampling move: select particles in $C_t$ for survival if possible

Coupling probability determined by the law of $C_T$


Backward Sampling

N. Whiteley (RSS discussion of PMCMC) suggested an extra backward step for the cPF that tries to modify (recursively, backward in time) the ancestry of the selected trajectory.

There is a forward "version" by Lindsten and Schön (2012)

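A minimal NumPy sketch of the backward pass for the running example; it assumes the particle positions and filtering log-weights from the forward cPF run were stored (the array layout and names are illustrative, not from the slides).

```python
import numpy as np

def backward_sample(xs, logws, theta, rng):
    """Backward-sampling pass: given particle positions xs[t, i] and
    filtering log-weights logws[t, i] stored during the forward cPF run,
    redraw the output trajectory's ancestry backward in time."""
    mu, rho, sigma = theta
    T = xs.shape[0] - 1
    traj = np.empty(T + 1)
    # Draw the final state from the terminal weights
    w = np.exp(logws[T] - logws[T].max())
    w /= w.sum()
    traj[T] = xs[T, rng.choice(xs.shape[1], p=w)]
    for t in range(T - 1, -1, -1):
        # Reweight by the transition density m_theta(x^i_t, traj[t+1]);
        # for the AR(1) example this is N(mu + rho (x^i_t - mu), sigma^2)
        mean = mu + rho * (xs[t] - mu)
        logw = logws[t] - 0.5 * ((traj[t + 1] - mean) / sigma) ** 2
        w = np.exp(logw - logw.max())
        w /= w.sum()
        traj[t] = xs[t, rng.choice(xs.shape[1], p=w)]
    return traj
```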

Idea of Backward Sampling

[Figure: particle trajectories over $t = 0, \ldots, 10$; vertical axis from $-2$ to $2$]


Running Backward Sampling

[Figure: left panel, update rate against $t = 0, \ldots, 400$; right panel, ACF of $X_0$ against lag $= 0, \ldots, 80$; curves for multinomial, residual, and systematic resampling, and multinomial + BS]

Left-panel statistic: the proportion of updates with $x'_i \neq x_i$, for $i = 0, \ldots, 399$


Backward Sampling

Success depends on the state transition law $m_\theta(x_t, dx_{t+1})$

The cPF kernel $P_{\theta,N}$ with a BS step dominates the no-BS version in asymptotic efficiency,

i.e. it gives rise to a CLT with a smaller asymptotic variance (idea: self-adjoint operator + lag-1 domination; see Tierney (1998), Mira & Geyer (1999))

cPF kernel geometrically ergodic $\Rightarrow$ cPF kernel with BS is geometrically ergodic

(Idea: both kernels are positive operators)


Final Remarks

Established uniform ergodicity of cPF using coupling

Dependence on $N$ and $T$ not given, although in practice $N$ should scale linearly with $T$ (now proved by Andrieu, Lee & Vihola and by Douc, Lindsten & Moulines)

Whiteley's BS is better: it improves asymptotic efficiency and inherits geometric ergodicity
