Download - Optimal Proposal Density - meteo.fr · Overview. Simplest particle lter requires very large ensemble size, growing exponentially with the problem size.. Can the use of the optimal

Transcript

Performance Bounds for Particle Filters using

the “Optimal” Proposal Density

0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2

(2 log Ne)1/2

/ τ

E(1

/ma

x w

) -

1

N

e = 32

Ne = 64

Ne = 128

. Chris SnyderNational Center for Atmospheric Research, Boulder Colorado, USA

1

Overview

. Simplest particle filter requires very large ensemble size, growingexponentially with the problem size.

. Can the use of the optimal proposal density fix this?

2

Preliminaries

Notation

. follow Ide et al. (1997) generally, except:. . . dim(x) = Nx, dim(y) = Ny, ensemble size = Ne

. . . superscripts index ensemble members

. ∼ means “distributed as,” e.g. x ∼ N(0, 1). Also used for“asymptotic to”

. state evolution: xk =M(xk−1) + ηk

. observations: yk = H(xk) + εk

Interchangeable terms

. particles ≡ ensemble members

. sample ≡ ensemble

3

Preliminaries: The Simplest Particle Filter

. begin with members xik−1 and weights wik−1 that “represent”

p(xk−1|yok−1)

. compute xik by evolving each member to tk under the system dynamics

. re-weight, given new obs yok: wik ∝ wi

k−1p(yok|xik)

. (resample)

4

Preliminaries: Sequential Importance Sampling

Basic idea

. suppose p(x) is hard to sample from, but π(x) is not.

. draw {xi} from π(x) and approximate

p(x) ≈Ne∑i=1

wiδ(x− xi), where wi ∝ p(xi)/π(xi)

. π(x) is the proposal density

5

Preliminaries: SIS (cont.)

Perform importance sampling sequentially in time

. Given {xik−1} from π(xk−1), wish to sample from p(xk, xk−1|yok)

. choose proposal of the form

π(xk, xk−1|yok) = π(xk|xk−1, yok)π(xk−1)

. update weights using

wik ∝

p(xik, xik−1|yok)

π(xik, xik−1|yok)

=p(yok|xik)p(xik|xik−1)

π(xik|xik−1, yok)

wik−1

6

Preliminaries: SIS (cont.)

PF literature shows that choice of proposal is crucial

Standard proposal: transition density from dynamics

. π(xk, xk−1|yok) = p(xk|xk−1)

. wik ∝ p(yok|xik)

. members at tk generated by evolution under system dynamics,as in EF

7

Preliminaries: SIS (cont.)

“Optimal” proposal: Also condition on most recent obs

. π(xk, xk−1|yok) = p(xk|xk−1, yok)

. wik ∝ p(yok|xik−1)

. members at tk generated, in a sense, by DA

. optimal in sense that it minimizes variance of weights over xik

. several recent PF studies use proposals that either reduce to or arerelated to the optimal proposal (van Leeuwen 2010, Morzfeld et al.2011, Papadakis et al. 2010)

8

Degeneracy of PF Weights

. degeneracy ≡ maxiwik → 1

. common problem, well known in PF literature

. for standard proposal, Bengtsson et al. (2008) and Snyder et al.(2008) show Ne must increase exponentially as problem size increasesin order to avoid degeneracy

. What happens with optimal proposal?

9

A Simple Test Problem

Consider the system

xk = axk−1 + ηk−1, yk = xk + εk

where ηk−1 ∼ N(0, q2I) and εk ∼ N(0, I).

For standard proposal:

yk|xk ∼ N(xk, I), wik ∝ exp

(−12|yk − xik|2

)For optimal proposal:

yk|xk−1 ∼ N(axk−1,

(1 + q2

)I), wi

k ∝ exp

(−|yk − axik−1|2

2 (1 + q2)

)

10

A Simple Test Problem (cont.)

. histograms of maxiwik for Ne = 103, a = q = 1/2. 103 simulations.

. optimal proposal clearly reduces degeneracy

0 0.2 0.4 0.6 0.8 10

200

400

max wi

occurr

ences N

x = 160

0 0.2 0.4 0.6 0.8 10

200

400

max wi

Nx = 160

0 0.2 0.4 0.6 0.8 10

200

400

occurr

ences N

x = 40

0 0.2 0.4 0.6 0.8 10

200

400

Nx = 40

0 0.2 0.4 0.6 0.8 10

200

400

occurr

ences N

x = 10

Standard proposal

0 0.2 0.4 0.6 0.8 10

200

400

Nx = 10

Optimal proposal

11

Behavior of Weights

Define

V (xk, xk−1, yk) = − log(wk/wk−1) =

{− log p(yk|xk), std. proposal− log p(yk|xk−1), opt. proposal

Interested in V as random variable with yk known and xk and xk−1

distributed according to the proposal distribution at tk and tk−1, respectively.

Suppose each component of obs error is independent.

. p(yk|xk), p(yk|xk−1) can be written as products

. V becomes a sum over log likelihoods for each component

. if terms in sum are nearly independent, V → Gaussian as Ny →∞

. infer asymptotic behavior of maxwik from known asymptotics for

sample min of Gaussian

12

Behavior of Weights (cont.)

Let τ2 = var(V ). Then for large Ne and large τ ,

E(1/maxwik) ∼ 1 +

√2 logNe

τ

(Bengtsson et al. 2008, Snyder et al. 2008)

As τ2 increases, Ne must increase as exp(2τ2) to keep E(1/maxwi) fixed.

13

The Linear, Gaussian Case

Analytic results possible for linear, Gaussian case with general R = cov(εk),Q = cov(ηk) and Pk = cov(xk).

Proceed by transformation in obs space that simultaneously diagonalizeseither R and HPkH

T , or R+HQHT and HMPk−1(HM)T .

Then

τ2 =

Ny∑j=1

λ2j (λ2j/2 + y2k,j),

where λ2j are eigenvalues of

A =

{R−1/2HPkH

TR−1/2, std. proposal

(HQHT + R)−1/2HMPk−1(HM)T (HQHT + R)−1/2, opt. proposal.

14

Simple Test Problem, Revisited

V (xk, xk−1, yk) =

{− log p(yk|xk), std. proposal− log p(yk|xk−1), opt. proposal

2V (xk, xk−1, yk) =

∑Ny

j=1(yk,j − xk,j)2, std. proposal(1 + q2

)−1∑Ny

j=1(yk,j − axk−1,j)2, opt. proposal

τ2 = var(V ) =

Ny(a

2 + q2)(32a

2 + 32q

2 + 1), std. proposal

Nya2(32a

2 + q2 + 1)/(q2 + 1)2, opt. proposal

15

Simple Test Problem, Revisited (cont.)

τ2 = var(V ) =

Ny(a

2 + q2)(32a

2 + 32q

2 + 1), std. proposal

Nya2(32a

2 + q2 + 1)/(q2 + 1)2, opt. proposal

. τ2(opt. proposal) always less than or equal to τ2(std. proposal)

. τ2 from the two proposals is equal only when q = 0

. opt. proposal reduces τ2 by an O(1) factor for reasonable values of aand q; a = q = 1/2 implies a factor of 5 reduction in τ2.

16

Simple Test Problem, Revisited (cont.)

. Theoretical prediction for E(1/maxwi) vs. simulations. Expectationis based on 103 realizations.

0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2

(2 log Ne)1/2

/ τ

E(1

/ma

x w

) -

1

N

e = 32

Ne = 64

Ne = 128

17

Simple Test Problem, Revisited (cont.)

. minimum Ne such that E(1/maxwi) ≥ 1/0.8 for standard proposal(circles) and optimal proposal (crosses) for a = q = 1/2.

. ratio of slopes of best-fit lines is 4.6, vs. asymptotic prediction of 5

0 100 200 300 400 500 600 700 800

0.5

1

1.5

2

2.5

3

3.5

Nx, N

y

log

10 N

e

18

Ny, Nx and Problem Size

τ2 = var(log likelihood) measures “problem size” for PF

. as τ2 increases, Ne must increase as exp(2τ2) if E(1/maxwi) fixed.

Related to obs-space dimension

. in simple example, τ2 ∝ Ny

. given by sum over e-values of obs-space covariance in general linear,Gaussian case—like an effective dimension

Analogy of τ2 to dimension is incomplete

. τ2 depends on obs-error statistics, increasing as R decreases

. τ2 depends on proposal

19

Ny, Nx and Problem Size (cont.)

τ2 depends explicitly only on obs-space quantities

How does Nx affect weight degeneracy?

. asymptotic relation of τ2 and E(1/maxwi) requires V (xk, xk−1, yk)to be ∼ Gaussian over xk

. ∼ Gaussianity of V (xk) only if Nx = dim(x) is large and componentsof x are sufficiently independent

20

Ny, Nx and Problem Size (cont.)

. log of pdf of V : Ny = 1, 3, 10, 100; x ∼ N(0, I), H = I, ε ∼ N(0, I)

. recall that max weight depends on left-hand tail of p(V ), whichchanges greatly as Nx increases and V → Gaussian

0 1 2 3 4 5 6

-14

-12

-10

-8

-6

-4

-2

V / Ny

log(

p(V

) )

Ny = 1

Ny = 3

Ny = 10

Ny = 100

21

Summary

. As was the case for the standard proposal, the optimal proposalrequires Ne to increase exponentially with the “problem size” toavoid degeneracy.

. Exponential rate of increase is quantitatively smaller for the optimalproposal; necessary ensemble size may therefore be much smaller ina given problem.

. Benefits of optimal proposal dependent on magnitude of system noise.

. (In some cases, possible to enhance performance of optimal proposalby artificially increasing system noise in forecast model used infiltering.)

22

Recommendation

New PF algorithms intended for high-dimensional systems should beevaluated first on the simple test problem given here.

23

References

Bengtsson, T., P. Bickel and B. Li, 2008: Curse-of-dimensionality revisited: Collapse of the particle filter invery large scale systems. IMS Collections, 2, 316–334. doi: 10.1214/193940307000000518.

Morzfeld, M., X. Tu, E. Atkins and A. J. Chorin, 2011: A random map implementation of implicit filters. J.

Comput. Phys. 231, 2049–2066.

van Leeuwen, P. J., 2010: Nonlinear data assimilation in geosciences: an extremely efficient particle filter.Quart. J. Roy. Meteor. Soc. 136, 1991–1999.

Papadakis, N., E. Memin, A. Cuzol and N. Gengembre, 2010: Data assimilation with the weighted ensembleKalman filter. Tellus 62A, 673–697.

Snyder, C., T. Bengtsson, P. Bickel and J. Anderson, 2008: Obstacles to high-dimensional particle filtering.Monthly Wea. Rev., 136, 4629–4640.

24