
• Bayesian Parameter Inference in State-Space Models using Particle MCMC

Arnaud Doucet Department of Statistics, Oxford University

University College London

5th October 2012

A. Doucet (UCL Masterclass Oct. 2012) 5th October 2012 1 / 37

• State-Space Models with Unknown Parameters

In most scenarios of interest, the state-space model contains an unknown static parameter θ ∈ Θ so that

X1 ∼ µθ (·) and Xt | (Xt−1 = xt−1) ∼ fθ ( ·| xt−1) .

The observations {Yt}t≥1 are conditionally independent given {Xt}t≥1 and θ

Yt | (Xt = xt ) ∼ gθ ( ·| xt ) .

Aim: Given observations y1:T , we want to infer θ in a Bayesian framework.

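The generative structure µθ, fθ, gθ can be made concrete in code. The following is a minimal sketch assuming, purely for illustration, a linear-Gaussian model (µθ = N (0, 1), fθ ( ·| x) = N (φx, σ²x ), gθ ( ·| x) = N (x, σ²y )); the slides fix no particular model, so the function name and parameterisation here are hypothetical.

```python
import numpy as np

def simulate_ssm(theta, T, rng):
    """Simulate a toy linear-Gaussian state-space model.

    mu_theta: X1 ~ N(0, 1)
    f_theta:  Xt | (Xt-1 = x) ~ N(phi * x, sigma_x^2)
    g_theta:  Yt | (Xt = x)   ~ N(x, sigma_y^2)
    """
    phi, sigma_x, sigma_y = theta
    x = np.empty(T)
    x[0] = rng.normal(0.0, 1.0)                     # X1 ~ mu_theta
    for t in range(1, T):
        x[t] = rng.normal(phi * x[t - 1], sigma_x)  # f_theta
    y = rng.normal(x, sigma_y)                      # g_theta, conditionally independent given x
    return x, y

rng = np.random.default_rng(0)
x, y = simulate_ssm((0.9, 0.5, 1.0), T=100, rng=rng)
```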


• Bayesian Parameter Inference in State-Space Models

Set a prior p (θ) on θ, so inference now relies on

p ( θ, x1:T | y1:T ) = p (θ, x1:T , y1:T ) / p (y1:T )

where p (θ, x1:T , y1:T ) = p (θ) pθ (x1:T , y1:T )

with

pθ (x1:T , y1:T ) = µθ (x1) ∏_{k=2}^{T} fθ (xk | xk−1) ∏_{k=1}^{T} gθ (yk | xk ) .

Standard approaches rely on MCMC.

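The factorised joint density above is most conveniently evaluated in log-space. A minimal sketch for the same hypothetical linear-Gaussian toy model (the helper names are illustrative):

```python
import numpy as np

def gaussian_logpdf(v, mean, sd):
    """Elementwise log N(v; mean, sd^2)."""
    return -0.5 * np.log(2 * np.pi * sd**2) - 0.5 * ((v - mean) / sd) ** 2

def log_joint(theta, x, y):
    """log p_theta(x_{1:T}, y_{1:T}) for the toy linear-Gaussian model,
    following the factorisation
    mu_theta(x1) * prod_{k=2}^T f_theta(xk|xk-1) * prod_{k=1}^T g_theta(yk|xk).
    """
    phi, sigma_x, sigma_y = theta
    lp = gaussian_logpdf(x[0], 0.0, 1.0)                        # mu_theta(x1)
    lp += gaussian_logpdf(x[1:], phi * x[:-1], sigma_x).sum()   # prod f_theta
    lp += gaussian_logpdf(y, x, sigma_y).sum()                  # prod g_theta
    return lp
```

Adding log p (θ) then gives log p (θ, x1:T , y1:T ) up to the choice of prior.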


• Common MCMC Approaches and Limitations

MCMC Idea: Simulate an ergodic Markov chain {θ (i) , X1:T (i)}i≥0 with invariant distribution p ( θ, x1:T | y1:T )... an infinite number of possibilities.

Typical strategies consist of iteratively updating X1:T conditional upon θ, then θ conditional upon X1:T .

To update X1:T conditional upon θ, use MCMC kernels updating subblocks according to pθ ( xt:t+K−1 | yt:t+K−1, xt−1, xt+K ).

Standard MCMC algorithms are inefficient if θ and X1:T are strongly correlated.

This strategy is impossible to implement when one can only sample from the prior but cannot evaluate it pointwise.

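The subblock-updating kernel can be sketched for the smallest block size, K = 1: a random-walk MH move on each xt targeting pθ ( xt | yt , xt−1, xt+1 ). This again assumes the hypothetical linear-Gaussian toy model; the names and the choice of a random-walk proposal are illustrative, not the author's algorithm.

```python
import numpy as np

def gaussian_logpdf(v, mean, sd):
    return -0.5 * np.log(2 * np.pi * sd**2) - 0.5 * ((v - mean) / sd) ** 2

def update_states_single_site(theta, x, y, rng, step=0.5):
    """One sweep of random-walk MH updates of each x_t given theta
    (block size K = 1), targeting p_theta(x_t | y_t, x_{t-1}, x_{t+1})."""
    phi, sigma_x, sigma_y = theta
    T = len(x)
    for t in range(T):
        prop = x[t] + step * rng.normal()

        def local_logp(v):
            # Terms of the joint density that involve x_t only.
            lp = gaussian_logpdf(y[t], v, sigma_y)                  # g_theta(y_t | v)
            if t > 0:
                lp += gaussian_logpdf(v, phi * x[t - 1], sigma_x)   # f_theta(v | x_{t-1})
            else:
                lp += gaussian_logpdf(v, 0.0, 1.0)                  # mu_theta(v)
            if t < T - 1:
                lp += gaussian_logpdf(x[t + 1], phi * v, sigma_x)   # f_theta(x_{t+1} | v)
            return lp

        if np.log(rng.uniform()) < local_logp(prop) - local_logp(x[t]):
            x[t] = prop
    return x
```

With small K such sweeps mix slowly along the chain of states, and alternating them with a θ-update inherits the θ–X1:T correlation problem noted above.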


• Metropolis-Hastings (MH) Sampling

To bypass these problems, we want to update jointly θ and X1:T .

Assume the current state of our Markov chain is (θ, x1:T ); we propose to update the parameter and the states simultaneously using a proposal

q ( (θ∗, x∗1:T )| (θ, x1:T )) = q ( θ∗| θ) qθ∗ ( x∗1:T | y1:T ) .

The proposal (θ∗, x∗1:T ) is accepted with MH acceptance probability

1 ∧ [ p ( θ∗, x∗1:T | y1:T ) q ( (θ, x1:T )| (θ∗, x∗1:T )) ] / [ p ( θ, x1:T | y1:T ) q ( (θ∗, x∗1:T )| (θ, x1:T )) ]

Problem: Designing a proposal qθ∗ ( x∗1:T | y1:T ) such that the acceptance probability is not extremely small is very difficult.

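To see why the acceptance probability collapses, consider the naive choice qθ∗ ( x∗1:T | y1:T ) = pθ∗ (x∗1:T ), i.e. proposing states from the state prior. With a symmetric random walk on θ and (for this sketch) a flat prior p (θ), the prior terms cancel in the ratio and the acceptance log-ratio reduces to the difference of the observation likelihoods gθ. A hypothetical sketch for the linear-Gaussian toy model, with only φ updated:

```python
import numpy as np

def gaussian_logpdf(v, mean, sd):
    return -0.5 * np.log(2 * np.pi * sd**2) - 0.5 * ((v - mean) / sd) ** 2

def joint_mh_step(theta, x, y, rng, step=0.1):
    """One joint MH update of (theta, x_{1:T}) with q_{theta*}(x*|y) taken
    to be the state prior p_{theta*}(x*).  The prior terms cancel in the
    acceptance ratio, leaving only the observation likelihoods."""
    phi, sigma_x, sigma_y = theta
    phi_p = phi + step * rng.normal()       # symmetric random walk on phi
    # Propose x* ~ p_{theta*}(x*): simulate the state prior under theta*.
    T = len(x)
    x_p = np.empty(T)
    x_p[0] = rng.normal(0.0, 1.0)
    for t in range(1, T):
        x_p[t] = rng.normal(phi_p * x_p[t - 1], sigma_x)
    # log acceptance ratio: sum_t log g(y_t|x*_t) - sum_t log g(y_t|x_t)
    log_alpha = (gaussian_logpdf(y, x_p, sigma_y).sum()
                 - gaussian_logpdf(y, x, sigma_y).sum())
    if np.log(rng.uniform()) < log_alpha:
        return (phi_p, sigma_x, sigma_y), x_p, True
    return theta, x, False
```

For even moderate T, a prior-simulated x∗1:T rarely matches y1:T, so log α is enormously negative and essentially nothing is accepted. Building a proposal qθ∗ ( x∗1:T | y1:T ) that actually uses the observations is what particle MCMC provides.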
