Mean-field Methods in Estimation & Control

Transcript of Mean-field Methods in Estimation & Control

Prashant G. Mehta
Coordinated Science Laboratory, Department of Mechanical Science and Engineering
University of Illinois at Urbana-Champaign

EE Seminar, Harvard, Apr. 1, 2011

Introduction / Kuramoto model

Background: Synchronization of coupled oscillators

$$ d\theta_i(t) = \Big( \omega_i + \frac{\kappa}{N}\sum_{j=1}^{N} \sin\big(\theta_j(t)-\theta_i(t)\big) \Big)\,dt + \sigma\, d\xi_i(t), \qquad i = 1,\dots,N $$

ω_i taken from a distribution g(ω) over [1 − γ, 1 + γ]
γ: measures the heterogeneity of the population
κ: measures the strength of coupling

[Figure: phase diagram in the (γ, κ) plane. Synchrony (phase locking) for κ > κ_c(γ); incoherence for κ < κ_c(γ).]

[10] Y. Kuramoto, 1975; [15] Strogatz et al., J. Stat. Phys., 1991

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 2 / 38
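To make the model concrete, here is a minimal Euler-Maruyama simulation of the noisy Kuramoto dynamics above. All parameter values are illustrative assumptions (not from the talk), and the order parameter |r(t)| used to gauge synchrony is the standard one, not defined on the slide.

```python
# Minimal Euler-Maruyama simulation of the noisy Kuramoto model
#   dtheta_i = ( omega_i + (kappa/N) * sum_j sin(theta_j - theta_i) ) dt + sigma dxi_i
# Parameter values are illustrative, not taken from the talk.
import numpy as np

def simulate_kuramoto(N=200, kappa=1.0, gamma=0.05, sigma=0.1,
                      dt=0.01, T=50.0, seed=0):
    rng = np.random.default_rng(seed)
    omega = rng.uniform(1.0 - gamma, 1.0 + gamma, size=N)   # natural frequencies ~ g(omega)
    theta = rng.uniform(0.0, 2.0 * np.pi, size=N)           # random initial phases
    n_steps = int(T / dt)
    order_param = np.empty(n_steps)                          # |r(t)|, a standard synchrony measure
    for k in range(n_steps):
        # mean-field coupling: (kappa/N) * sum_j sin(theta_j - theta_i)
        coupling = (kappa / N) * np.sin(theta[None, :] - theta[:, None]).sum(axis=1)
        theta += (omega + coupling) * dt + sigma * np.sqrt(dt) * rng.standard_normal(N)
        theta %= 2.0 * np.pi
        order_param[k] = np.abs(np.exp(1j * theta).mean())
    return theta, order_param

if __name__ == "__main__":
    _, r = simulate_kuramoto(kappa=1.0)    # strong coupling: |r(t)| grows toward 1 (synchrony)
    _, r0 = simulate_kuramoto(kappa=0.05)  # weak coupling: |r(t)| stays small (incoherence)
    print("mean |r|, strong coupling:", r[-500:].mean())
    print("mean |r|, weak coupling:  ", r0[-500:].mean())
```

Running the two calls side by side reproduces the qualitative phase transition sketched in the figure: above the critical coupling the population locks, below it the phases stay spread out.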

Introduction / Kuramoto model

Movies of incoherence and synchrony solution

[Figure: two animation frames of the oscillator population on the unit circle; left panel: incoherence, right panel: synchrony.]

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 3 / 38

Introduction Oscillator examples

Motivation: Oscillators in biology

Oscillators in biology

Coupled oscillators

Images taken from a Google search.

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 4 / 38

Introduction / Oscillator examples

Hodgkin-Huxley type neuron model

$$ C\frac{dV}{dt} = -g_T\, m_\infty^2(V)\, h\,(V - E_T) - g_h\, r\,(V - E_h) - \cdots $$
$$ \frac{dh}{dt} = \frac{h_\infty(V) - h}{\tau_h(V)}, \qquad \frac{dr}{dt} = \frac{r_\infty(V) - r}{\tau_r(V)} $$

[Figure: neural spike train (voltage vs. time) and the corresponding limit cycle in the (V, h, r) coordinates.]

Normal form (phase) reduction:
$$ \dot\theta_i = \omega_i + u_i \cdot \Phi(\theta_i) $$

[6] J. Guckenheimer, J. Math. Biol., 1975; [2] J. Moehlis et al., Neural Computation, 2004

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 5 / 38

Introduction / Role of Neural Rhythms

Question (Fundamental question in neuroscience): Why is synchrony (neural rhythms) useful? Does it have a functional role?

1. Synchronization: phase transition in a controlled system (motivated by coupled oscillators).
   H. Yin, P. G. Mehta, S. P. Meyn and U. V. Shanbhag, "Synchronization of Coupled Oscillators is a Game," IEEE TAC.

2. Neuronal computations: Bayesian inference; neural circuits as particle filters (Lee & Mumford).
   T. Yang, P. G. Mehta and S. P. Meyn, "A Control-oriented Approach for Particle Filtering," ACC 2011, CDC 2011.

3. Learning: synaptic plasticity via long-term potentiation (Hebbian learning); "Neurons that fire together wire together."
   H. Yin, P. G. Mehta, S. P. Meyn and U. V. Shanbhag, "Learning in Mean-Field Oscillator Games," CDC 2010.

Destexhe & Marder, Nature, 2004; Kopell et al., Neuroscience, 2009; Lee & Mumford, J. Opt. Soc., 2003

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 6 / 38

Part I

Mean-field Control

Collaborators

Huibing Yin Sean P. Meyn Uday V. Shanbhag

“Synchronization of coupled oscillators is a game,” IEEE TAC

“Learning in Mean-Field Oscillator Games,” CDC 2010

“On the Efficiency of Equilibria in Mean-field Oscillator Games,” ACC 2011

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 8 / 38

Derivation of model / Problem statement

Oscillators dynamic game

Dynamics of the i-th oscillator:
$$ d\theta_i = (\omega_i + u_i(t))\,dt + \sigma\, d\xi_i, \qquad i = 1,\dots,N, \quad t \ge 0 $$
u_i(t): control.

The i-th oscillator seeks to minimize
$$ \eta_i(u_i; u_{-i}) = \lim_{T\to\infty}\frac{1}{T}\int_0^T \mathsf{E}\Big[ \underbrace{c(\theta_i;\theta_{-i})}_{\text{cost of anarchy}} + \underbrace{\tfrac{1}{2}R\,u_i^2}_{\text{cost of control}} \Big]\,ds $$
where θ_{-i} = (θ_j)_{j ≠ i}, R is the control penalty, and c(·) is the cost function
$$ c(\theta_i;\theta_{-i}) = \frac{1}{N}\sum_{j\ne i} c^\bullet(\theta_i,\theta_j), \qquad c^\bullet \ge 0. $$

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 9 / 38
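As an illustration of the cost structure, the sketch below evaluates the running cost for one oscillator along an uncontrolled population, using the coupling cost c•(θ, ϑ) = ½ sin²((θ − ϑ)/2) that appears later in the talk. All numerical values (N, γ, σ, R, horizon) are illustrative assumptions.

```python
# Sketch: the running cost of the oscillator game,
#   c(theta_i; theta_{-i}) + (1/2) R u_i^2,   c(theta_i; theta_{-i}) = (1/N) sum_{j!=i} c_bullet(theta_i, theta_j),
# evaluated with c_bullet(x, y) = 0.5*sin((x-y)/2)^2 along an uncontrolled (u_i = 0) population.
import numpy as np

def anarchy_cost(theta, i):
    """c(theta_i; theta_{-i}) = (1/N) * sum_{j != i} 0.5 * sin((theta_i - theta_j)/2)**2."""
    N = theta.size
    diffs = theta[i] - np.delete(theta, i)
    return (0.5 * np.sin(diffs / 2.0) ** 2).sum() / N

def stage_cost(theta, u, i, R=1.0):
    """Running cost c(theta_i; theta_{-i}) + 0.5 * R * u_i^2 for oscillator i."""
    return anarchy_cost(theta, i) + 0.5 * R * u[i] ** 2

def average_cost_uncontrolled(N=100, gamma=0.05, sigma=0.3, dt=0.01, T=200.0, seed=1):
    """Time-averaged cost of oscillator 0 when every u_i = 0 (the uncontrolled benchmark)."""
    rng = np.random.default_rng(seed)
    omega = rng.uniform(1.0 - gamma, 1.0 + gamma, size=N)
    theta = rng.uniform(0.0, 2.0 * np.pi, size=N)
    u = np.zeros(N)
    total, n_steps = 0.0, int(T / dt)
    for _ in range(n_steps):
        total += stage_cost(theta, u, i=0) * dt
        theta = (theta + omega * dt + sigma * np.sqrt(dt) * rng.standard_normal(N)) % (2 * np.pi)
    return total / T

if __name__ == "__main__":
    # With u = 0 the relative phases decorrelate and the average cost approaches
    # (N-1)/N * 1/4, the eta_0 benchmark discussed on the 'incoherence solution' slides.
    print(average_cost_uncontrolled())
```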

Derivation of model / Mean-field model

Mean-field model derivation

$$ d\theta_i = (\omega_i + u_i(t))\,dt + \sigma\, d\xi_i, \qquad
   \eta_i(u_i;u_{-i}) = \lim_{T\to\infty}\frac{1}{T}\int_0^T \mathsf{E}\big[ c(\theta_i;\theta_{-i}) + \tfrac{1}{2}R\,u_i^2 \big]\,ds $$

[Figure: each oscillator influences, and is influenced by, the mass (mean field).]

1. Mean-field approximation:
$$ c(\theta_i;\theta_{-i}(t)) = \frac{1}{N}\sum_{j\ne i} c^\bullet(\theta_i,\theta_j) \;\xrightarrow{\;N\to\infty\;}\; c(\theta_i,t) $$

2. Optimal control of a single oscillator: decentralized control structure.

[8] M. Huang, P. Caines, and R. Malhamé, IEEE TAC, 2007 [HCM]; [13] Lasry & Lions, Japan. J. Math., 2007

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 10 / 38

Derivation of model / Derivation steps

Single oscillator with given cost

Dynamics of the oscillator:
$$ d\theta_i = (\omega_i + u_i(t))\,dt + \sigma\, d\xi_i, \qquad t \ge 0 $$

The cost function is assumed known: the coupling cost c(θ_i; θ_{-i}) is replaced by the given function c(θ_i(s), s), so that
$$ \eta_i(u_i; c) = \lim_{T\to\infty}\frac{1}{T}\int_0^T \mathsf{E}\big[ c(\theta_i(s), s) + \tfrac{1}{2}R\,u_i^2(s) \big]\,ds. $$

HJB equation:
$$ \partial_t h_i + \omega_i\,\partial_\theta h_i = \frac{1}{2R}(\partial_\theta h_i)^2 - c(\theta,t) + \eta_i^* - \frac{\sigma^2}{2}\partial^2_{\theta\theta} h_i $$

Optimal control law:
$$ u_i^*(t) = \varphi_i(\theta,t) = -\frac{1}{R}\,\partial_\theta h_i(\theta,t) $$

[1] D. P. Bertsekas (1995); [14] S. P. Meyn, IEEE TAC, 1997

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 11 / 38
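Why the quoted control law minimizes the Hamiltonian: the only u_i-dependent terms in the average-cost optimality equation are quadratic in u_i, so a completion-of-the-square step (added here for completeness; it is not spelled out on the slide) gives

$$ \min_{u_i}\Big\{ \tfrac{1}{2}R\,u_i^2 + u_i\,\partial_\theta h_i(\theta,t) \Big\} = -\frac{1}{2R}\big(\partial_\theta h_i(\theta,t)\big)^2, \qquad \text{attained at } u_i^* = -\frac{1}{R}\,\partial_\theta h_i(\theta,t), $$

which is the stated optimal control law and also explains the $+\frac{1}{2R}(\partial_\theta h_i)^2$ term on the right-hand side of the HJB equation above.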

Derivation of model / Derivation steps

Single oscillator with optimal control

Dynamics of the oscillator under the optimal control:
$$ d\theta_i(t) = \Big( \omega_i - \frac{1}{R}\,\partial_\theta h_i(\theta_i,t) \Big)\,dt + \sigma\, d\xi_i(t) $$

Kolmogorov forward (FPK) equation for the pdf p(θ, t, ω_i):
$$ \partial_t p + \omega_i\,\partial_\theta p = \frac{1}{R}\,\partial_\theta\big[ p\,(\partial_\theta h) \big] + \frac{\sigma^2}{2}\partial^2_{\theta\theta} p $$

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 12 / 38

Derivation of model / Derivation steps

Mean-field approximation

HJB equation for the population:
$$ \partial_t h + \omega\,\partial_\theta h = \frac{1}{2R}(\partial_\theta h)^2 - c(\theta,t) + \eta(\omega) - \frac{\sigma^2}{2}\partial^2_{\theta\theta} h \quad\Rightarrow\quad h(\theta,t,\omega) $$

Population density:
$$ \partial_t p + \omega\,\partial_\theta p = \frac{1}{R}\,\partial_\theta\big[ p\,(\partial_\theta h) \big] + \frac{\sigma^2}{2}\partial^2_{\theta\theta} p \quad\Rightarrow\quad p(\theta,t,\omega) $$

Enforce cost consistency:
$$ c(\theta,t) = \int_\Omega\int_0^{2\pi} c^\bullet(\theta,\vartheta)\,p(\vartheta,t,\omega)\,g(\omega)\,d\vartheta\,d\omega \;\approx\; \frac{1}{N}\sum_{j\ne i} c^\bullet(\theta,\theta_j) $$

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 13 / 38

Derivation of model / Derivation steps

Mean-field model

$$ d\theta_i = (\omega_i + u_i(t))\,dt + \sigma\, d\xi_i, \qquad
   \eta_i(u_i;u_{-i}) = \lim_{T\to\infty}\frac{1}{T}\int_0^T \mathsf{E}\big[ c(\theta_i;\theta_{-i}) + \tfrac{1}{2}R\,u_i^2 \big]\,ds $$

Letting N → ∞ and assuming c(θ_i; θ_{-i}) → c(θ_i, t), the mean-field model is the coupled system

HJB:
$$ \partial_t h + \omega\,\partial_\theta h = \frac{1}{2R}(\partial_\theta h)^2 - c(\theta,t) + \eta^* - \frac{\sigma^2}{2}\partial^2_{\theta\theta} h \quad\Rightarrow\quad h(\theta,t,\omega) $$
FPK:
$$ \partial_t p + \omega\,\partial_\theta p = \frac{1}{R}\,\partial_\theta\big[ p\,(\partial_\theta h) \big] + \frac{\sigma^2}{2}\partial^2_{\theta\theta} p \quad\Rightarrow\quad p(\theta,t,\omega) $$
Cost consistency:
$$ c(\vartheta,t) = \int_\Omega\int_0^{2\pi} c^\bullet(\vartheta,\theta)\,p(\theta,t,\omega)\,g(\omega)\,d\theta\,d\omega $$

[8] Huang et al., IEEE TAC, 2007; [13] Lasry & Lions, Japan. J. Math., 2007; [16] Weintraub et al., NIPS, 2006

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 14 / 38

Solutions of PDE model / ε-Nash equilibrium

Solution of the PDEs gives an ε-Nash equilibrium

Optimal control law:
$$ u_i^o = -\frac{1}{R}\,\partial_\theta h(\theta(t),t,\omega)\Big|_{\omega=\omega_i} $$

Theorem. For the control law u_i^o above, we have the ε-Nash equilibrium property
$$ \eta_i(u_i^o; u_{-i}^o) \le \eta_i(u_i; u_{-i}^o) + O\Big(\frac{1}{\sqrt{N}}\Big), \qquad i = 1,\dots,N, $$
for any adapted control u_i.

So, we look for solutions of the PDEs.

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 15 / 38

Solutions of PDE model / Synchronization

Main result: Synchronization as a solution of the game

[Figure: phase diagrams side by side. For the game, in the (γ, R^{-1/2}) plane: incoherence for R > R_c(γ), synchrony (locking) otherwise. For the Kuramoto model, in the (γ, κ) plane: incoherence for κ < κ_c(γ), synchrony otherwise.]

Mean-field game:
$$ d\theta_i = (\omega_i + u_i)\,dt + \sigma\, d\xi_i, \qquad
   \eta_i(u_i;u_{-i}) = \lim_{T\to\infty}\frac{1}{T}\int_0^T \mathsf{E}\big[ c(\theta_i;\theta_{-i}) + \tfrac{1}{2}R\,u_i^2 \big]\,ds $$

Kuramoto model:
$$ d\theta_i = \Big( \omega_i + \frac{\kappa}{N}\sum_{j=1}^{N}\sin(\theta_j-\theta_i) \Big)\,dt + \sigma\, d\xi_i $$

Yin et al., ACC 2010; Strogatz et al., J. Stat. Phys., 1991

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 16 / 38

Solutions of PDE model / Synchronization

Incoherence solution (PDE)

Incoherence solution:
$$ h(\theta,t,\omega) = h_0(\theta) := 0, \qquad p(\theta,t,\omega) = p_0(\theta) := \frac{1}{2\pi} $$

This pair solves the mean-field PDE model:
$$ \partial_t h + \omega\,\partial_\theta h = \frac{1}{2R}(\partial_\theta h)^2 - c(\theta,t) + \eta^* - \frac{\sigma^2}{2}\partial^2_{\theta\theta} h $$
$$ \partial_t p + \omega\,\partial_\theta p = \frac{1}{R}\,\partial_\theta\big[ p\,(\partial_\theta h) \big] + \frac{\sigma^2}{2}\partial^2_{\theta\theta} p $$
$$ c(\theta,t) = \int_\Omega\int_0^{2\pi} c^\bullet(\theta,\vartheta)\,p(\vartheta,t,\omega)\,g(\omega)\,d\vartheta\,d\omega $$

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 17 / 38
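A quick check (not spelled out on the slide) that the pair (h_0, p_0) is consistent with all three equations when the coupling cost is the c•(θ, ϑ) = ½ sin²((θ − ϑ)/2) used elsewhere in the talk: with p ≡ 1/(2π),

$$ c(\theta,t) = \int_\Omega\int_0^{2\pi} \tfrac{1}{2}\sin^2\!\Big(\frac{\theta-\vartheta}{2}\Big)\,\frac{1}{2\pi}\,g(\omega)\,d\vartheta\,d\omega = \frac{1}{4}, $$

a constant independent of θ and t. With h ≡ 0 every term of the FPK equation vanishes, and the HJB equation reduces to 0 = −c(θ, t) + η*, i.e. η* = η_0 = 1/4, the incoherence average cost that reappears on the next slide.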

Solutions of PDE model / Synchronization

Incoherence solution (Finite population)

Closed-loop dynamics (with u_i = 0):
$$ d\theta_i = (\omega_i + \underbrace{u_i}_{=0})\,dt + \sigma\, d\xi_i(t) $$

Average cost:
$$ \eta_i = \lim_{T\to\infty}\frac{1}{T}\int_0^T \mathsf{E}\big[ c(\theta_i;\theta_{-i}) + \underbrace{\tfrac{1}{2}R\,u_i^2}_{=0} \big]\,dt
        = \lim_{T\to\infty}\frac{1}{N}\sum_{j\ne i}\frac{1}{T}\int_0^T \mathsf{E}\Big[ \tfrac{1}{2}\sin^2\!\Big(\frac{\theta_i(t)-\theta_j(t)}{2}\Big) \Big]\,dt
        = \frac{1}{N}\sum_{j\ne i}\int_0^{2\pi} \mathsf{E}\Big[ \tfrac{1}{2}\sin^2\!\Big(\frac{\theta_i(t)-\vartheta}{2}\Big) \Big]\,\frac{1}{2\pi}\,d\vartheta
        = \frac{N-1}{N}\,\eta_0 $$

[Figure: snapshot of the incoherent oscillator population on the unit circle.]

ε-Nash property:
$$ \eta_i(u_i^o; u_{-i}^o) \le \eta_i(u_i; u_{-i}^o) + O\Big(\frac{1}{\sqrt{N}}\Big), \qquad i = 1,\dots,N. $$

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 18 / 38

Analysis of phase transition / Numerics

Bifurcation diagram

[Figure: (left) phase diagram in the (γ, R^{-1/2}) plane, incoherence for R > R_c(γ) and synchrony (locking) otherwise; (right) average cost η(ω) vs. R^{-1/2} for ω = 0.95, 1, 1.05, with η(ω) = η_0 on the incoherence branch (R > R_c) and η(ω) < η_0 on the synchrony branch (R < R_c).]

$$ d\theta_i = (\omega_i + u_i)\,dt + \sigma\, d\xi_i, \qquad
   \eta_i(u_i;u_{-i}) = \lim_{T\to\infty}\frac{1}{T}\int_0^T \mathsf{E}\big[ c(\theta_i;\theta_{-i}) + \tfrac{1}{2}R\,u_i^2 \big]\,ds $$

1. Convergence theory (as N → ∞)
2. Bifurcation analysis
3. Analytical and numerical computations

Yin et al., IEEE TAC, to appear; [17] Yin et al., ACC 2010

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 19 / 38

Part II

Mean-field Estimation

Collaborators

Tao Yang Sean P. Meyn

“A Control-oriented Approach for Particle Filtering,” ACC 2011

“Feedback Particle Filter with Mean-field Coupling,” submitted to CDC 2011

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 21 / 38

Problem Definition

Filtering problem

Signal and observation processes:
$$ d\theta(t) = \omega\,dt + \sigma_B\,dB(t) \mod 2\pi \qquad (1) $$
$$ dZ(t) = h(\theta(t))\,dt + \sigma_W\,dW(t) \qquad (2) $$

Particle filter:
$$ d\theta_i(t) = \omega\,dt + \sigma_B\,dB_i(t) + U_i(t) \mod 2\pi, \qquad i = 1,\dots,N. \qquad (3) $$

Objective: choose the control U_i(t) for the purpose of filtering.

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 22 / 38

Problem Definition

Filtering problem

Signal and observation processes:
$$ dX(t) = a(X(t))\,dt + \sigma_B\,dB(t) \qquad (1) $$
$$ dZ(t) = h(X(t))\,dt + \sigma_W\,dW(t) \qquad (2) $$

X(t) ∈ ℝ: state; Z(t) ∈ ℝ: observation.
a(·), h(·): C¹ functions.
B(t), W(t): independent standard Wiener processes, σ_B ≥ 0, σ_W > 0.
X(0) ∼ p_0(·).

Define $\mathcal{Z}_t := \sigma\{Z(s),\, 0 \le s \le t\}$.
Objective: estimate the posterior distribution p* of X(t) given $\mathcal{Z}_t$.

Solution approaches:
Linear dynamics: Kalman filter (R. E. Kalman, 1960)
Nonlinear dynamics: Kushner's equation (H. J. Kushner, 1964)
Numerical methods: particle filtering (N. J. Gordon et al., 1993)

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 23 / 38

Existing Approach / Kalman Filter

Linear case: Kalman filter

Linear Gaussian system:
$$ dX(t) = a\,X(t)\,dt + \sigma_B\,dB(t) \qquad (1) $$
$$ dZ(t) = h\,X(t)\,dt + \sigma_W\,dW(t) \qquad (2) $$

Kalman filter: p* = N(μ(t), Σ(t)) with
$$ d\mu(t) = a\,\mu(t)\,dt + \frac{h\,\Sigma(t)}{\sigma_W^2}\underbrace{\big( dZ(t) - h\,\mu(t)\,dt \big)}_{\text{innovation error}} $$
$$ \frac{d\Sigma(t)}{dt} = 2a\,\Sigma(t) + \sigma_B^2 - \frac{h^2\Sigma^2(t)}{\sigma_W^2} $$

[9] R. E. Kalman, Trans. ASME, Ser. D: J. Basic Eng., 1961

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 24 / 38
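A minimal Euler discretization of the Kalman filter equations above, run against a simulated linear Gaussian signal. The simulated trajectory and all parameter values are illustrative assumptions, not the talk's.

```python
# Euler discretization of the continuous-time Kalman filter stated above:
#   dmu    = a*mu dt + (h*Sigma/sigma_W^2) * (dZ - h*mu dt)
#   dSigma = (2*a*Sigma + sigma_B^2 - h^2*Sigma^2/sigma_W^2) dt
import numpy as np

def kalman_bucy(a=-0.5, h=3.0, sigma_B=1.0, sigma_W=0.5, dt=0.01, T=20.0, seed=0):
    rng = np.random.default_rng(seed)
    n = int(T / dt)
    X = 1.0                      # true (hidden) state
    mu, Sigma = 0.0, 1.0         # filter initialization
    xs, mus = np.empty(n), np.empty(n)
    for k in range(n):
        # simulate signal and observation increments
        dB = np.sqrt(dt) * rng.standard_normal()
        dW = np.sqrt(dt) * rng.standard_normal()
        X += a * X * dt + sigma_B * dB
        dZ = h * X * dt + sigma_W * dW
        # Kalman filter update of the conditional mean and variance
        gain = h * Sigma / sigma_W**2
        mu += a * mu * dt + gain * (dZ - h * mu * dt)
        Sigma += (2 * a * Sigma + sigma_B**2 - (h * Sigma)**2 / sigma_W**2) * dt
        xs[k], mus[k] = X, mu
    return xs, mus

if __name__ == "__main__":
    xs, mus = kalman_bucy()
    print("RMS estimation error:", np.sqrt(np.mean((xs - mus)**2)))
```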

Existing Approach / Nonlinear case: Kushner's equation

Nonlinear case: K-S equation

Signal and observation processes:
$$ dX(t) = a(X(t))\,dt + \sigma_B\,dB(t), \qquad (1) $$
$$ dZ(t) = h(X(t))\,dt + \sigma_W\,dW(t) \qquad (2) $$
B(t), W(t) are independent standard Wiener processes.

Kushner's equation: the posterior distribution p* is the solution of a stochastic PDE,
$$ dp^* = \mathcal{L}^\dagger(p^*)\,dt + (h-\hat h)(\sigma_W^2)^{-1}\big( dZ(t) - \hat h\,dt \big)\,p^* $$
where
$$ \hat h = \mathsf{E}\big[ h(X(t))\,\big|\,\mathcal{Z}_t \big] = \int h(x)\,p^*(x,t)\,dx, \qquad
   \mathcal{L}^\dagger(p^*) = -\frac{\partial\big( p^*\,a(x) \big)}{\partial x} + \frac{1}{2}\sigma_B^2\frac{\partial^2 p^*}{\partial x^2}. $$

No closed-form solution in general. Closure problem.

[11] H. J. Kushner, SIAM J. Control, 1964

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 25 / 38

Existing Approach / Nonlinear case: Kushner's equation

Numerical methods: Particle filter

Idea: approximate the posterior p* in terms of particles {X_i(t)}_{i=1}^N, so that
$$ p^* \approx \frac{1}{N}\sum_{i=1}^{N}\delta_{X_i(t)}(\cdot). $$

Algorithm outline:
Initialization: X_i(0) ∼ p_0^*(·).
At each time step:
  Importance sampling (for weight update)
  Resampling (for variance reduction)

[5] N. J. Gordon, D. J. Salmond, and A. F. M. Smith, IEE Proceedings F Radar and Signal Processing, 1993
[4] A. Doucet, N. de Freitas, N. Gordon, Springer-Verlag, 2001

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 26 / 38
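A minimal bootstrap (sampling-importance-resampling) particle filter matching the outline above, written for the discrete-observation setting Z_n = h(X_n) + W_n used later in the methodology slides. The drift, observation function and noise levels below are illustrative assumptions.

```python
# Bootstrap particle filter for the scalar model dX = a(X) dt + sigma_B dB,
# with discrete observations Z_n = h(X_n) + W_n.
import numpy as np

def bootstrap_pf(a=lambda x: -0.5 * x, h=np.cos,
                 sigma_B=0.5, sigma_W=0.3, dt=0.05, n_obs=400, N=500, seed=0):
    rng = np.random.default_rng(seed)
    X = 1.0
    particles = rng.standard_normal(N)          # X_i(0) ~ p_0
    est, truth = np.empty(n_obs), np.empty(n_obs)
    for n in range(n_obs):
        # simulate the true state and a noisy observation
        X += a(X) * dt + sigma_B * np.sqrt(dt) * rng.standard_normal()
        z = h(X) + sigma_W * rng.standard_normal()
        # 1) propagate particles through the signal model (prior proposal)
        particles += a(particles) * dt + sigma_B * np.sqrt(dt) * rng.standard_normal(N)
        # 2) importance weights from the observation likelihood, then normalize
        w = np.exp(-0.5 * ((z - h(particles)) / sigma_W) ** 2)
        w /= w.sum()
        # 3) multinomial resampling to control weight degeneracy
        particles = particles[rng.choice(N, size=N, p=w)]
        est[n], truth[n] = particles.mean(), X
    return truth, est

if __name__ == "__main__":
    truth, est = bootstrap_pf()
    print("RMS error:", np.sqrt(np.mean((truth - est) ** 2)))
```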

A Control-Oriented Approach to Particle Filtering / Overview

Mean-field filtering

Signal and observation processes:
$$ dX(t) = a(X(t))\,dt + \sigma_B\,dB(t) \qquad (1) $$
$$ dZ(t) = h(X(t))\,dt + \sigma_W\,dW(t) \qquad (2) $$

Controlled system (N particles):
$$ dX_i(t) = a(X_i(t))\,dt + \sigma_B\,dB_i(t) + \underbrace{U_i(t)}_{\text{mean-field control}}, \qquad i = 1,\dots,N \qquad (3) $$
{B_i(t)}_{i=1}^N are independent standard Wiener processes.

Objective: choose the control U_i(t) so that the empirical distribution of the population approximates the posterior distribution p* as N → ∞.

[7] M. Huang, P. E. Caines, and R. P. Malhamé, IEEE TAC, 2007
[18] H. Yin, P. G. Mehta, S. P. Meyn and U. V. Shanbhag, IEEE TAC, to appear
[12] J. Lasry and P. Lions, Japanese Journal of Mathematics, 2007

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 27 / 38

A Control-Oriented Approach to Particle Filtering / Example: Linear case

Example: Linear case

Controlled system, for i = 1, …, N:
$$ dX_i(t) = a\,X_i(t)\,dt + \sigma_B\,dB_i(t) + \underbrace{\frac{h\,\Sigma(t)}{\sigma_W^2}\Big[ dZ(t) - h\,\frac{X_i(t)+\mu(t)}{2}\,dt \Big]}_{U_i(t)} \qquad (3) $$
where
$$ \mu(t) \approx \mu^{(N)}(t) = \frac{1}{N}\sum_{i=1}^{N} X_i(t) \quad (\text{sample mean}), \qquad
   \Sigma(t) \approx \Sigma^{(N)}(t) = \frac{1}{N-1}\sum_{i=1}^{N}\big( X_i(t)-\mu(t) \big)^2 \quad (\text{sample variance}). $$

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 28 / 38

A Control-Oriented Approach to Particle Filtering / Example: Linear case

Example: Linear case

Feedback particle filter:
$$ dX_i(t) = a\,X_i(t)\,dt + \sigma_B\,dB_i(t) + \frac{h\,\Sigma^{(N)}(t)}{\sigma_W^2}\Big[ dZ(t) - h\,\frac{X_i(t)+\mu^{(N)}(t)}{2}\,dt \Big] \qquad (3) $$

Mean-field model: let p denote the conditional distribution of X_i(t) given $\mathcal{Z}_t$ with p(x, 0) = N(μ(0), Σ(0)). Then p = N(μ(t), Σ(t)), where
$$ d\mu(t) = a\,\mu(t)\,dt + \frac{h\,\Sigma(t)}{\sigma_W^2}\big( dZ(t) - h\,\mu(t)\,dt \big), \qquad
   \frac{d\Sigma(t)}{dt} = 2a\,\Sigma(t) + \sigma_B^2 - \frac{h^2\Sigma^2(t)}{\sigma_W^2}. $$

Same as the Kalman filter!

As N → ∞, the empirical distribution approximates the posterior p*.

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 29 / 38
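A sketch of the finite-N feedback particle filter (3) in the linear Gaussian case, with μ(t) and Σ(t) replaced by the particle sample mean and variance as on the previous slide. All numerical parameter values are illustrative assumptions.

```python
# Feedback particle filter, linear Gaussian case:
#   dX_i = a X_i dt + sigma_B dB_i + (h*Sigma_N/sigma_W^2) [ dZ - h*(X_i + mu_N)/2 dt ]
import numpy as np

def linear_fpf(a=-0.5, h=3.0, sigma_B=1.0, sigma_W=0.5, dt=0.01, T=20.0, N=200, seed=0):
    rng = np.random.default_rng(seed)
    n = int(T / dt)
    X = 1.0
    particles = rng.standard_normal(N)             # X_i(0) ~ N(0, 1)
    est, truth = np.empty(n), np.empty(n)
    for k in range(n):
        # true signal and observation increment dZ = h X dt + sigma_W dW
        X += a * X * dt + sigma_B * np.sqrt(dt) * rng.standard_normal()
        dZ = h * X * dt + sigma_W * np.sqrt(dt) * rng.standard_normal()
        # empirical mean and variance play the role of mu(t), Sigma(t)
        mu_N = particles.mean()
        Sigma_N = particles.var(ddof=1)
        gain = h * Sigma_N / sigma_W**2
        # each particle is nudged by the gain times its own innovation error
        dI = dZ - h * (particles + mu_N) / 2.0 * dt
        particles += (a * particles * dt
                      + sigma_B * np.sqrt(dt) * rng.standard_normal(N)
                      + gain * dI)
        est[k], truth[k] = particles.mean(), X
    return truth, est

if __name__ == "__main__":
    truth, est = linear_fpf()
    print("RMS error of particle mean:", np.sqrt(np.mean((truth - est) ** 2)))
```

Because the gain multiplies an innovation rather than a weight, every particle carries equal weight and no resampling step is needed; this is the contrast with the bootstrap filter drawn on the comparison slide that follows.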

A Control-Oriented Approach to Particle Filtering / Example: Linear case

Comparison plots

$$ \mathrm{mse} = \frac{1}{T}\int_0^T \Big( \frac{\Sigma_t^{(N)} - \Sigma_t}{\Sigma_t} \Big)^2 dt $$

[Figure: mse vs. N (log-log) for the feedback particle filter (FPF) and the bootstrap particle filter (BPF), together with sample state-estimate trajectories for each filter.]

Parameters: a ∈ [−1, 1], h = 3, σ_B = 1, σ_W = 0.5, dt = 0.01, T = 50, and N ∈ {20, 50, 100, 200, 500, 1000}.

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 30 / 38

Methodology

Methodology: Variational problem

Continuous-discrete time problem.

Signal and observation processes:
$$ dX(t) = a(X(t))\,dt + \sigma_B\,dB(t), \qquad Z(t_n) = h(X(t_n)) + W(t_n) $$

Particle filter:
$$ \text{Filter: } dX_i(t) = a(X_i(t))\,dt + \sigma_B\,dB_i(t), \qquad
   \text{Control: } X_i(t_n) = X_i(t_n^-) + \underbrace{u(X_i(t_n^-))}_{\text{control}} $$

Belief maps (via Bayes rule):
p_n^*(x): conditional pdf of X(t) given $\mathcal{Z}_t$; p_n(x): conditional pdf of X_i(t) given $\mathcal{Z}_t$.

Variational problem:
$$ \text{K-L metric: } \mathrm{KL}(p_n\,\|\,p_n^*) = \mathsf{E}_{p_n}\Big[ \ln\Big(\frac{p_n}{p_n^*}\Big) \Big], \qquad
   \text{Control design: } u^*(x) = \arg\min_u \mathrm{KL}(p_n\,\|\,p_n^*). $$

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 31 / 38

Feedback particle filter

Feedback particle filter (FPF)

Problem:
$$ dX(t) = a(X(t))\,dt + \sigma_B\,dB(t), \qquad (1) $$
$$ dZ(t) = h(X(t))\,dt + \sigma_W\,dW(t) \qquad (2) $$

FPF:
$$ dX_i(t) = a(X_i(t))\,dt + \sigma_B\,dB_i(t) + v(X_i(t))\,dI_i(t) + \tfrac{1}{2}\sigma_W^2\,v(X_i(t))\,v'(X_i(t))\,dt \qquad (3) $$

Innovation error:
$$ dI_i(t) := dZ(t) - \tfrac{1}{2}\big( h(X_i(t)) + \hat h \big)\,dt, \qquad
   \text{population prediction } \hat h = \frac{1}{N}\sum_{i=1}^{N} h(X_i(t)). $$

Euler-Lagrange equation:
$$ -\frac{\partial}{\partial x}\Big( \frac{1}{p(x,t)}\frac{\partial}{\partial x}\big( p(x,t)\,v(x,t) \big) \Big) = \frac{1}{\sigma_W^2}\,h'(x), \qquad (4) $$
with boundary condition $\lim_{x\to\pm\infty} v(x,t)\,p(x,t) = 0$.

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 32 / 38
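The E-L equation (4) can be integrated once in x; the same step is carried out in the appendix ("Derivation of E-L equation cont. 3"), and rearranging it gives an explicit formula for the gain (the rearranged integral form is written out here for convenience):

$$ -\big( p(x,t)\,v(x,t) \big)' = \frac{h(x)-\hat h}{\sigma_W^2}\,p(x,t)
   \quad\Longrightarrow\quad
   v(x,t) = \frac{1}{\sigma_W^2\,p(x,t)}\int_{-\infty}^{x}\big( \hat h - h(y) \big)\,p(y,t)\,dy, $$

using the boundary condition v(x, t) p(x, t) → 0 as x → −∞. In the linear Gaussian case this integral evaluates to hΣ(t)/σ_W², recovering the constant (Kalman) gain on the slides that follow.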

Consistency result

Consistency result

p* denotes the conditional pdf of X(t) given $\mathcal{Z}_t$:
$$ dp^* = \mathcal{L}^\dagger(p^*)\,dt + (h-\hat h)(\sigma_W^2)^{-1}\big( dZ(t) - \hat h\,dt \big)\,p^* $$

p denotes the conditional pdf of X_i(t) given $\mathcal{Z}_t$:
$$ dp = \mathcal{L}^\dagger p\,dt - \frac{\partial}{\partial x}(vp)\,dZ(t) - \frac{\partial}{\partial x}(up)\,dt + \frac{\sigma_W^2}{2}\frac{\partial^2}{\partial x^2}(pv^2)\,dt $$

Theorem. Consider the two evolution equations for p and p*, defined according to the solution of the Kolmogorov forward equation and the K-S equation, respectively. Suppose that the control function v(x, t) is obtained according to (4). Then, provided p(x, 0) = p*(x, 0), we have for all t ≥ 0,
$$ p(x,t) = p^*(x,t). $$

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 33 / 38

Linear Gaussian case

Linear Gaussian case

Linear system:
$$ dX(t) = a\,X(t)\,dt + \sigma_B\,dB(t) \qquad (1) $$
$$ dZ(t) = h\,X(t)\,dt + \sigma_W\,dW(t) \qquad (2) $$

Gaussian conditional density:
$$ p(x,t) = \frac{1}{\sqrt{2\pi\Sigma(t)}}\exp\Big( -\frac{(x-\mu(t))^2}{2\Sigma(t)} \Big) $$

E-L equation:
$$ -\frac{\partial}{\partial x}\Big( \frac{1}{p}\,\partial_x(pv) \Big) = \frac{h}{\sigma_W^2} \quad\Rightarrow\quad v(x,t) = \frac{h\,\Sigma(t)}{\sigma_W^2} $$

FPF:
$$ dX_i(t) = a\,X_i(t)\,dt + \sigma_B\,dB_i(t) + \frac{h\,\Sigma(t)}{\sigma_W^2}\Big[ dZ(t) - h\,\frac{X_i(t)+\mu(t)}{2}\,dt \Big] \qquad (3) $$

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 34 / 38

Example

Example: Filtering of nonlinear oscillator

Nonlinear oscillator:
$$ d\theta(t) = \omega\,dt + \sigma_B\,dB(t) \mod 2\pi \qquad (1) $$
$$ dZ(t) = \underbrace{\frac{1+\cos\theta(t)}{2}}_{h(\theta(t))}\,dt + \sigma_W\,dW(t) \qquad (2) $$

Controlled system (N particles):
$$ d\theta_i(t) = \omega\,dt + \sigma_B\,dB_i(t) + v(\theta_i(t))\big[ dZ(t) - \tfrac{1}{2}\big( h(\theta_i(t)) + \hat h \big)\,dt \big] + \tfrac{1}{2}\sigma_W^2\,v\,v'\,dt \mod 2\pi, \qquad i = 1,\dots,N, \qquad (3) $$
where v(θ_i(t)) is obtained from the solution of the E-L equation (with h'(θ) = −sin θ / 2):
$$ -\frac{\partial}{\partial\theta}\Big( \frac{1}{p(\theta,t)}\frac{\partial}{\partial\theta}\big( p(\theta,t)\,v(\theta,t) \big) \Big) = -\frac{\sin\theta}{2\sigma_W^2}. \qquad (4) $$

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 35 / 38

Example

Example: Filtering of nonlinear oscillator

Fourier form of p(θ, t):
$$ p(\theta,t) = \frac{1}{2\pi} + P_s(t)\sin\theta + P_c(t)\cos\theta. $$

Approximate solution of the E-L equation (4) (using a perturbation method):
$$ v(\theta,t) = \frac{1}{2\sigma_W^2}\Big\{ -\sin\theta + \frac{\pi}{2}\big[ P_c(t)\sin 2\theta - P_s(t)\cos 2\theta \big] \Big\}, $$
$$ v'(\theta,t) = \frac{1}{2\sigma_W^2}\Big\{ -\cos\theta + \pi\big[ P_c(t)\cos 2\theta + P_s(t)\sin 2\theta \big] \Big\}, $$
where
$$ P_c(t) \approx P_c^{(N)}(t) = \frac{1}{\pi N}\sum_{j=1}^{N}\cos\theta_j(t), \qquad
   P_s(t) \approx P_s^{(N)}(t) = \frac{1}{\pi N}\sum_{j=1}^{N}\sin\theta_j(t). $$

Reminiscent of coupled oscillators (Winfree).

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 36 / 38
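A compact numerical sketch of the oscillator FPF, using the first-harmonic (perturbation) gain formulas exactly as displayed above. The time step, horizon, initialization and the circular-mean phase estimate are implementation assumptions; the model constants (ω = 1, σ_B = 0.5, σ_W = 0.4, N = 100) follow the simulation slide that comes next.

```python
# Feedback particle filter for the oscillator example, with the perturbation
# approximation of the gain v and its derivative v' from the slide above.
import numpy as np

def oscillator_fpf(omega=1.0, sigma_B=0.5, sigma_W=0.4, dt=0.01, T=20.0, N=100, seed=0):
    rng = np.random.default_rng(seed)
    h = lambda th: 0.5 * (1.0 + np.cos(th))
    n = int(T / dt)
    theta = 0.0                                        # true oscillator phase
    particles = rng.uniform(0.0, 2 * np.pi, size=N)
    truth, est = np.empty(n), np.empty(n)
    for k in range(n):
        # signal and observation increment
        theta = (theta + omega * dt + sigma_B * np.sqrt(dt) * rng.standard_normal()) % (2 * np.pi)
        dZ = h(theta) * dt + sigma_W * np.sqrt(dt) * rng.standard_normal()
        # empirical Fourier coefficients P_c, P_s of the particle density
        Pc = np.cos(particles).sum() / (np.pi * N)
        Ps = np.sin(particles).sum() / (np.pi * N)
        # approximate gain and its derivative (perturbation solution of the E-L equation)
        v = (-np.sin(particles)
             + 0.5 * np.pi * (Pc * np.sin(2 * particles) - Ps * np.cos(2 * particles))) / (2 * sigma_W**2)
        vp = (-np.cos(particles)
              + np.pi * (Pc * np.cos(2 * particles) + Ps * np.sin(2 * particles))) / (2 * sigma_W**2)
        # innovation error and FPF update, including the (1/2) sigma_W^2 v v' term from (3)
        h_hat = h(particles).mean()
        dI = dZ - 0.5 * (h(particles) + h_hat) * dt
        particles += (omega * dt + sigma_B * np.sqrt(dt) * rng.standard_normal(N)
                      + v * dI + 0.5 * sigma_W**2 * v * vp * dt)
        particles %= 2 * np.pi
        # circular mean of the particles as the phase estimate
        truth[k], est[k] = theta, np.angle(np.exp(1j * particles).mean()) % (2 * np.pi)
    return truth, est

if __name__ == "__main__":
    truth, est = oscillator_fpf()
    err = np.angle(np.exp(1j * (truth - est)))         # wrapped phase error
    print("RMS phase error (rad):", np.sqrt(np.mean(err ** 2)))
```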

Example

Simulation Results

Signal and observation processes:
$$ d\theta(t) = 1\,dt + 0.5\,dB(t) \qquad (1) $$
$$ dZ(t) = \frac{1+\cos\theta(t)}{2}\,dt + 0.4\,dW(t) \qquad (2) $$

Particle system (N = 100):
$$ d\theta_i(t) = 1\,dt + \sigma_B\,dB_i(t) + U\big(\theta_i(t);\,P_c^{(N)}(t),\,P_s^{(N)}(t)\big) \mod 2\pi \qquad (3) $$

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 37 / 38

Example

Simulation Results

[Figure: (a) estimates of the first and second harmonics vs. time; (b) state θ(t) and conditional mean estimate θ̂_N(t) with error bounds; (c) PDF estimate p(θ) over [0, 2π].]

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 38 / 38

Thank you!

Website: http://www.mechse.illinois.edu/research/mehtapg

Collaborators

Huibing Yin Tao Yang Sean Meyn Uday Shanbhag

“Synchronization of coupled oscillators is a game,” IEEE TAC, ACC 2010
“Learning in Mean-Field Oscillator Games,” CDC 2010
“On the Efficiency of Equilibria in Mean-field Oscillator Games,” ACC 2011
“A Control-oriented Approach for Particle Filtering,” ACC 2011, CDC 2011

Part III

Learning in mean-field game

Learning / Q-function approximation

Approximate DP

Optimality equation:
$$ \min_{u_i}\Big\{ \underbrace{c(\theta;\theta_{-i}(t)) + \tfrac{1}{2}R\,u_i^2 + \mathcal{D}_{u_i} h_i(\theta,t)}_{=:\ H_i(\theta,u_i;\theta_{-i}(t))} \Big\} = \eta_i^* $$

Optimal control law and Kuramoto law:
$$ u_i^* = -\frac{1}{R}\,\partial_\theta h_i(\theta,t), \qquad
   u_i^{(\mathrm{Kur})} = -\frac{\kappa}{N}\sum_{j\ne i}\sin\big(\theta_i - \theta_j(t)\big) $$

Parameterization:
$$ H_i^{(A_i,\phi_i)}(\theta,u_i;\theta_{-i}(t)) = c(\theta;\theta_{-i}(t)) + \tfrac{1}{2}R\,u_i^2 + (\omega_i - 1 + u_i)\,A_i\,S^{(\phi_i)} + \frac{\sigma^2}{2}A_i\,C^{(\phi_i)} $$
where
$$ S^{(\phi)}(\theta,\theta_{-i}) = \frac{1}{N}\sum_{j\ne i}\sin(\theta-\theta_j-\phi), \qquad
   C^{(\phi)}(\theta,\theta_{-i}) = \frac{1}{N}\sum_{j\ne i}\cos(\theta-\theta_j-\phi) $$

Approximate optimal control:
$$ u_i^{(A_i,\phi_i)} = \arg\min_{u_i} H_i^{(A_i,\phi_i)}(\theta,u_i;\theta_{-i}(t)) = -\frac{A_i}{R}\,S^{(\phi_i)}(\theta,\theta_{-i}) $$

Watkins & Dayan, Q-learning, 1992; Bertsekas & Tsitsiklis, NDP, 1996; Mehta & Meyn, CDC 2009

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 41 / 38
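A small sketch of the parameterized control law on this slide, together with a check that for φ_i = 0 it coincides with the Kuramoto law with κ = A_i/R. The numerical values in the demo are illustrative assumptions.

```python
# Parameterized approximate control law:
#   u_i^(A, phi) = -(A/R) * S^(phi)(theta, theta_{-i}),
#   S^(phi)(theta, theta_{-i}) = (1/N) * sum_{j != i} sin(theta - theta_j - phi).
import numpy as np

def S(theta_i, theta_others, phi, N):
    """Mean-field sine statistic S^(phi)(theta, theta_{-i})."""
    return np.sin(theta_i - theta_others - phi).sum() / N

def approx_control(theta, i, A, phi, R=1.0):
    """u_i^(A, phi) = -(A/R) * S^(phi), the minimizer of the parameterized Hamiltonian."""
    others = np.delete(theta, i)
    return -(A / R) * S(theta[i], others, phi, theta.size)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta = rng.uniform(0, 2 * np.pi, size=50)
    # With phi = 0, the learned law reproduces the Kuramoto coupling with kappa = A / R.
    u_learned = approx_control(theta, i=0, A=0.5, phi=0.0, R=1.0)
    kappa = 0.5
    u_kuramoto = -(kappa / theta.size) * np.sin(theta[0] - np.delete(theta, 0)).sum()
    print(u_learned, u_kuramoto)   # identical up to floating point
```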

Learning / Stochastic approx.

Learning algorithm

Bellman error (pointwise):
$$ \mathcal{L}^{(A_i,\phi_i)}(\theta,t) = \min_{u_i} H_i^{(A_i,\phi_i)} - \eta_i^{(A_i^*,\phi_i^*)} $$

Stochastic approximation algorithm:
$$ e^{(A_i,\phi_i)} = \sum_{k=1}^{2}\big| \langle \mathcal{L}^{(A_i,\phi_i)},\,\varphi_k(\theta) \rangle \big|^2, \qquad
   \frac{dA_i}{dt} = -\varepsilon\,\frac{d e^{(A_i,\phi_i)}}{dA_i}, \quad
   \frac{d\phi_i}{dt} = -\varepsilon\,\frac{d e^{(A_i,\phi_i)}}{d\phi_i} $$

Convergence analysis: A_i(t) → A*, φ_i(t) → φ*.

[Figure: trajectories of the learned parameters over time, in the synchrony and incoherence regimes.]

Yin et al., CDC 2010

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 42 / 38

Learning / Bifurcation diagram

Comparison of average cost

$$ d\theta_i = (\omega_i + u_i)\,dt + \sigma\, d\xi_i, \qquad
   \eta_i(u_i;u_{-i}) = \lim_{T\to\infty}\frac{1}{T}\int_0^T \mathsf{E}\big[ c(\theta_i;\theta_{-i}) + \tfrac{1}{2}R\,u_i^2 \big]\,ds $$

[Figure: average cost of the learning-based control overlaid on the mean-field bifurcation diagram.]

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 43 / 38

Part IV

Appendix

Analysis of phase transition / Incoherence solution

Overview of the steps

HJB:
$$ \partial_t h + \omega\,\partial_\theta h = \frac{1}{2R}(\partial_\theta h)^2 - c(\theta,t) + \eta^* - \frac{\sigma^2}{2}\partial^2_{\theta\theta} h \quad\Rightarrow\quad h(\theta,t,\omega) $$
FPK:
$$ \partial_t p + \omega\,\partial_\theta p = \frac{1}{R}\,\partial_\theta\big[ p\,(\partial_\theta h) \big] + \frac{\sigma^2}{2}\partial^2_{\theta\theta} p \quad\Rightarrow\quad p(\theta,t,\omega) $$
Cost consistency:
$$ c(\vartheta,t) = \int_\Omega\int_0^{2\pi} c^\bullet(\vartheta,\theta)\,p(\theta,t,\omega)\,g(\omega)\,d\theta\,d\omega $$

Assume $c^\bullet(\vartheta,\theta) = c^\bullet(\vartheta-\theta) = \tfrac{1}{2}\sin^2\big( \tfrac{\vartheta-\theta}{2} \big)$.

Incoherence solution:
$$ h(\theta,t,\omega) = h_0(\theta) := 0, \qquad p(\theta,t,\omega) = p_0(\theta) := \frac{1}{2\pi} $$

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 45 / 38

Analysis of phase transition / Bifurcation analysis

Linearization and spectra

Linearized PDE (about the incoherence solution), acting on the perturbation z = (h, p):
$$ \partial_t z(\theta,t,\omega) =
   \begin{pmatrix} -\omega\,\partial_\theta h - c - \dfrac{\sigma^2}{2}\partial^2_{\theta\theta} h \\[6pt]
                   -\omega\,\partial_\theta p + \dfrac{1}{2\pi R}\,\partial^2_{\theta\theta} h + \dfrac{\sigma^2}{2}\partial^2_{\theta\theta} p \end{pmatrix}
   =: L_R\, z(\theta,t,\omega) $$

Spectrum of the linear operator:

1. Continuous spectrum {S(k)}_{k=-∞}^{+∞}:
$$ S(k) := \Big\{ \lambda\in\mathbb{C} \;\Big|\; \lambda = \pm\frac{\sigma^2}{2}k^2 - i k\omega \text{ for } \omega\in\Omega \Big\} $$

2. Discrete spectrum, given by the characteristic equation:
$$ \frac{1}{8R}\int_\Omega \frac{g(\omega)}{\big( \lambda - \frac{\sigma^2}{2} + i\omega \big)\big( \lambda + \frac{\sigma^2}{2} + i\omega \big)}\,d\omega + 1 = 0. $$

[Figure: continuous and discrete spectra in the complex plane for γ = 0.1; the discrete eigenvalues (branches labeled k = 1, 2) move as R decreases.]

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 46 / 38
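A numerical sketch for locating candidate discrete eigenvalues from the characteristic equation above, with g uniform on [1 − γ, 1 + γ]. The grid ranges, quadrature resolution and parameter values (γ, σ, R) are implementation assumptions, not part of the talk; candidate eigenvalues are grid points where |F(λ)| is numerically close to zero, and tracking them as R varies is what produces the bifurcation diagram on the next slide.

```python
# Grid scan of the characteristic equation
#   F(lam) = 1/(8R) * Int_Omega g(w) dw / ((lam - sigma^2/2 + i w)(lam + sigma^2/2 + i w)) + 1 = 0
# for the discrete spectrum, with g uniform on [1-gamma, 1+gamma].
import numpy as np

def char_fn(lam, R, gamma=0.1, sigma=0.3, n_quad=4001):
    """Left-hand side F(lam) of the characteristic equation (complex-valued)."""
    omega = np.linspace(1.0 - gamma, 1.0 + gamma, n_quad)
    g = 1.0 / (2.0 * gamma)                                  # uniform density g(omega)
    integrand = g / ((lam - sigma**2 / 2 + 1j * omega) * (lam + sigma**2 / 2 + 1j * omega))
    return np.trapz(integrand, omega) / (8.0 * R) + 1.0

def scan_for_root(R, re=np.linspace(-0.3, 0.3, 61), im=np.linspace(-1.4, -0.6, 81)):
    """Evaluate |F| on a grid around lam ~ -i (first harmonic) and return the minimizer."""
    RE, IM = np.meshgrid(re, im)
    F = np.vectorize(lambda x, y: abs(char_fn(x + 1j * y, R)))(RE, IM)
    k = np.unravel_index(np.argmin(F), F.shape)
    return RE[k] + 1j * IM[k], F[k]

if __name__ == "__main__":
    for R in [60.0, 30.0, 10.0]:
        lam, res = scan_for_root(R)
        print(f"R = {R:5.1f}: candidate lambda = {lam.real:+.3f} {lam.imag:+.3f}j, |F| = {res:.2e}")
```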

Analysis of phase transition / Bifurcation analysis

Bifurcation diagram (Hamiltonian Hopf)

Characteristic equation:
$$ \frac{1}{8R}\int_\Omega \frac{g(\omega)}{\big( \lambda - \frac{\sigma^2}{2} + i\omega \big)\big( \lambda + \frac{\sigma^2}{2} + i\omega \big)}\,d\omega + 1 = 0. $$

Stability proof.

[Figure: (a) continuous spectrum (independent of R) and discrete spectrum (a function of R) in the complex plane, with the bifurcation point marked; (c) the resulting phase diagram, incoherence for R > R_c(γ) and synchrony otherwise.]

[3] Dellnitz et al., Int. Series Num. Math., 1992

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 47 / 38

Analysis of phase transition / Numerics

Numerical solution of PDEs

[Figure: numerically computed solutions of the mean-field PDEs; incoherence for R = 60, synchrony for R = 10.]

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 48 / 38

Analysis of phase transition / Numerics

Bifurcation diagram

[Figure: as on slide 19, the phase diagram in the (γ, R^{-1/2}) plane and the average cost η(ω) vs. R^{-1/2} for ω = 0.95, 1, 1.05, with the incoherence branch (η(ω) = η_0 for R > R_c) and the synchrony branch (η(ω) < η_0 for R < R_c).]

$$ d\theta_i = (\omega_i + u_i)\,dt + \sigma\, d\xi_i, \qquad
   \eta_i(u_i;u_{-i}) = \lim_{T\to\infty}\frac{1}{T}\int_0^T \mathsf{E}\big[ c(\theta_i;\theta_{-i}) + \tfrac{1}{2}R\,u_i^2 \big]\,ds $$

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 49 / 38

Analysis of phase transition / Numerics

Bifurcation diagram with another cost

$$ c^\bullet(\theta,\vartheta) = 0.25\,\big( 1 - \cos(\theta-\vartheta) \big) $$

[Figure: bifurcation diagram in R^{-1/2} and the corresponding stationary densities p(θ) at R = 23.8359.]

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 50 / 38

Analysis of phase transition / Numerics

Bifurcation diagram with another cost

$$ c^\bullet(\theta,\vartheta) = 0.25\,\big( 1 - \cos(\theta-\vartheta) \big) $$

[Figure: bifurcation diagram in R^{-1/2} and the stationary density p(θ) at R = 23.8359.]

$$ c^\bullet(\theta,\vartheta) = 1 - \cos(\theta-\vartheta) - \cos 3(\theta-\vartheta) $$

[Figure: bifurcation diagram in R^{-1/2} and the corresponding stationary densities p(θ) for this cost.]

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 51 / 38

Feedback particle filter

Numerical method: particle filter

Basic algorithm (discretized time):

Step 0: Initialization. For i = 1, …, N, sample $x_0^{(i)} \sim p_{0|0}(x)$ and set n = 0.

Step 1: Importance sampling step. For i = 1, …, N, sample
$$ x_n^{(i)} \sim p^N_{n-1|n-1}K(x_n) = \frac{1}{N}\sum_{k=1}^{N} K\big( x_n\,\big|\,x_{n-1}^{(k)} \big). $$
For i = 1, …, N, evaluate the normalized importance weights
$$ w_n^{(i)} \propto g\big( z_n\,\big|\,x_n^{(i)} \big), \qquad \sum_{i=1}^{N} w_n^{(i)} = 1, \qquad
   p^N_{n|n}(x) = \sum_{i=1}^{N} w_n^{(i)}\,\delta_{x_n^{(i)}}(x). $$

Step 2: Resampling. For i = 1, …, N, sample $x_n^{(i)} \sim p^N_{n|n}(x)$.

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 52 / 38

Feedback particle filter

Calculation of KL divergence

Recall the definition of K-L divergence for densities,
$$ \mathrm{KL}(p_n\,\|\,p_n^*) = \int_{\mathbb{R}} p_n(s)\,\ln\Big( \frac{p_n(s)}{p_n^*(s)} \Big)\,ds. $$

We make the coordinate transformation s = x + u(x) and express the K-L divergence as
$$ \mathrm{KL}(p_n\,\|\,p_n^*) = \int_{\mathbb{R}} \frac{p_n^-(x)}{|1+u'(x)|}\,
   \ln\Big( \frac{p_n^-(x)}{|1+u'(x)|\;p_n^*\big( x+u(x)\,|\,z_n \big)} \Big)\,|1+u'(x)|\,dx. $$

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 53 / 38

Feedback particle filter

Derivation of Euler-Lagrange equation

Denote
$$ \mathsf{L}(x,u,u') = -p_n^-(x)\Big( \ln|1+u'(x)| + \ln\big( p_n^-(x+u)\,p_{Z|X}(z_n\,|\,x+u) \big) \Big). \qquad (4) $$

The optimization problem is thus a calculus-of-variations problem:
$$ \min_u \int \mathsf{L}(x,u,u')\,dx. $$

The minimizer is obtained via the analysis of the first variation, given by the well-known Euler-Lagrange equation:
$$ \frac{\partial \mathsf{L}}{\partial u} = \frac{d}{dx}\Big( \frac{\partial \mathsf{L}}{\partial u'} \Big). $$

Explicitly substituting the expression (4) for L, we obtain the expected result.

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 54 / 38

Feedback particle filter

Simulation Results

When the particle filter does not work well:

[Figure: (a) first and second harmonic estimates; (b) state θ(t) and conditional mean estimate with bounds; (c) PDF estimate.]

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 55 / 38

Feedback particle filter

Derivation of FPK for FPF

Controlled particle system (N particles):
$$ dX_i(t) = a(X_i(t))\,dt + \underbrace{U_i(t)}_{\text{mean-field control}} + \sigma_B\,dB_i(t), \qquad i = 1,\dots,N \qquad (3) $$

Using the ansatz U_i(t) = u(X_i(t))\,dt + v(X_i(t))\,dZ(t), the particle system becomes
$$ dX_i(t) = \big( a(X_i(t)) + u(X_i(t)) \big)\,dt + \sigma_B\,dB_i(t) + v(X_i(t))\,dZ(t), \qquad i = 1,\dots,N. $$

Then the conditional distribution p(x, t) of X_i(t) given the filtration $\mathcal{Z}_t$ satisfies the forward equation
$$ dp = \mathcal{L}^\dagger p\,dt - \frac{\partial}{\partial x}(vp)\,dZ_t - \frac{\partial}{\partial x}(up)\,dt + \sigma_W^2\,\frac{1}{2}\frac{\partial^2}{\partial x^2}(pv^2)\,dt. $$

Note that after a short calculation we have
$$ u(x,t) = v(x,t)\Big( -\frac{1}{2}\big( h(x) + \hat h \big) + \frac{1}{2}\sigma_W^2\,v'(x,t) \Big). \qquad (4) $$

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 56 / 38

Feedback particle filter

Derivation of FPF for linear case

The density p is a function of the stochastic process μ_t. Using Itô's formula,
$$ dp(x,t) = \frac{\partial p}{\partial \mu}\,d\mu_t + \frac{\partial p}{\partial \Sigma}\,d\Sigma_t + \frac{1}{2}\frac{\partial^2 p}{\partial \mu^2}\,d\mu_t^2, $$
where
$$ \frac{\partial p}{\partial \mu} = \frac{x-\mu_t}{\Sigma_t}\,p, \qquad
   \frac{\partial p}{\partial \Sigma} = \frac{1}{2\Sigma_t}\Big( \frac{(x-\mu_t)^2}{\Sigma_t} - 1 \Big)p, \qquad
   \frac{\partial^2 p}{\partial \mu^2} = \frac{1}{\Sigma_t}\Big( \frac{(x-\mu_t)^2}{\Sigma_t} - 1 \Big)p. $$
Substituting these into the forward equation, we obtain a quadratic equation A x² + B x = 0, where
$$ A = d\Sigma_t - \Big( 2a\,\Sigma_t + \sigma_B^2 - \frac{h^2\Sigma_t^2}{\sigma_W^2} \Big)dt, \qquad
   B = d\mu_t - \Big( a\,\mu_t\,dt + \frac{h\,\Sigma_t}{\sigma_W^2}\big( dZ_t - h\,\mu_t\,dt \big) \Big). $$
This leads to the Kalman filter.

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 57 / 38

Feedback particle filter

Derivation of E-L equation

In the continuous-time case, the control is of the form
$$ U_t^i = u(X_t^i,t)\,dt + v(X_t^i,t)\,dZ_t. \qquad (5) $$

Substituting this into the E-L BVP for the continuous-discrete time case, we arrive at the formal equation
$$ \frac{\partial}{\partial x}\Big( \frac{p(x,t)}{1 + u'\,dt + v'\,dZ_t} \Big)
   = p(x,t)\,\frac{\partial}{\partial \bar u}\Big( \ln p\big( x + u\,dt + v\,dZ_t,\,t \big)
   + \ln p_{dZ|X}\big( dZ_t\,\big|\,x + u\,dt + v\,dZ_t \big) \Big), \qquad (6) $$
where
$$ p_{dZ|X}(dZ_t\,|\,\cdot) = \frac{1}{\sqrt{2\pi\sigma_W^2}}\exp\Big( -\frac{(dZ_t - h(\cdot))^2}{2\sigma_W^2} \Big)
   \quad\text{and}\quad \bar u = u\,dt + v\,dZ_t. $$

For notational ease, we use primes to denote partial derivatives with respect to x: p denotes p(x, t), p' := ∂p/∂x (x, t), p'' := ∂²p/∂x² (x, t), u' := ∂u/∂x (x, t), v' := ∂v/∂x (x, t), etc. Note that the time t is fixed.

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 58 / 38

Feedback particle filter

Derivation of E-L equation cont. 1

Step 1: The three terms in (6) are simplified as
$$ \frac{\partial}{\partial x}\Big( \frac{p}{1+u'\,dt+v'\,dZ_t} \Big) = p' - f_1\,dt - (p'v' + pv'')\,dZ_t $$
$$ p\,\frac{\partial}{\partial \bar u}\ln p(x + u\,dt + v\,dZ_t) = p' + f_2\,dt + \Big( p''v - \frac{p'^2 v}{p} \Big)dZ_t $$
$$ p\,\frac{\partial}{\partial \bar u}\ln p_{dZ|X}\big( dZ_t\,|\,x + u\,dt + v\,dZ_t \big) = \frac{p}{\sigma_W^2}\big[ h'\,dZ_t + (h''v - hh')\,dt \big] $$
where we have used Itô's rules dZ_t² = σ_W² dt, dZ_t dt = 0, etc., and
$$ f_1 = (p'u' + pu'') - \sigma_W^2\,\big( p'v'^2 + 2pv'v'' \big), \qquad
   f_2 = \Big( p''u - \frac{p'^2 u}{p} \Big) + \sigma_W^2\,v^2\Big( \frac{1}{2}p''' - \frac{3p'p''}{2p} + \frac{p'^3}{p^2} \Big). $$

Collecting terms in O(dZ_t) and O(dt), after some simplification, leads to the following ODEs:

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 59 / 38

Feedback particle filter

Derivation of E-L equation cont. 2

$$ \mathcal{E}(v) = \frac{1}{\sigma_W^2}\,h'(x) \qquad (7) $$
$$ \mathcal{E}(u) = -\frac{1}{\sigma_W^2}\,h(x)h'(x) + h''(x)\,v + \sigma_W^2\,G(x,t) \qquad (8) $$
where
$$ \mathcal{E}(v) = -\frac{\partial}{\partial x}\Big( \frac{1}{p(x,t)}\frac{\partial}{\partial x}\big( p(x,t)\,v(x,t) \big) \Big), \qquad
   G = -2v'v'' - (v')^2(\ln p)' + \frac{1}{2}v^2(\ln p)'''. $$

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 60 / 38

Feedback particle filter

Derivation of E-L equation cont. 3

Step 2: Suppose (u, v) are admissible solutions of the E-L BVP (7)-(8). Then it is claimed that
$$ -(pv)' = \frac{h-\hat h}{\sigma_W^2}\,p \qquad (9) $$
$$ -(pu)' = -\frac{(h-\hat h)\,\hat h}{\sigma_W^2}\,p - \frac{1}{2}\sigma_W^2\,(pv^2)''. \qquad (10) $$

Recall that admissible here means
$$ \lim_{x\to\pm\infty} p(x,t)\,u(x,t) = 0, \qquad \lim_{x\to\pm\infty} p(x,t)\,v(x,t) = 0. \qquad (11) $$

To show (9), integrate (7) once to obtain
$$ -(pv)' = \frac{1}{\sigma_W^2}\,h\,p + C\,p, $$

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 61 / 38

Feedback particle filter

Derivation of E-L equation cont. 4

where the constant of integration $C = -\hat h/\sigma_W^2$ is obtained by integrating once again from −∞ to ∞ and using the boundary conditions for v (11). This gives (9). To show (10), we denote its right-hand side as R and claim
$$ \Big( \frac{R}{p} \Big)' = -\frac{hh'}{\sigma_W^2} + h''v + \sigma_W^2\,G. \qquad (12) $$
Equation (10) then follows by using the ODE (8) together with the boundary conditions for u (11). The verification of the claim involves a straightforward calculation, where we use (7) to obtain expressions for h' and v''. The details of this calculation are omitted on account of space.

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 62 / 38

Feedback particle filter

Derivation of E-L equation cont. 5

Step 3: The E-L equation for v is given by (7). After a short calculation, we have
$$ u(x,t) = v(x,t)\Big( -\frac{1}{2}\big( h(x) + \hat h \big) + \frac{1}{2}\sigma_W^2\,v'(x,t) \Big). \qquad (13) $$

P. G. Mehta (Illinois) Harvard Apr. 1, 2011 63 / 38

Bibliography

[1] D. P. Bertsekas. Dynamic Programming and Optimal Control, volume 1. Athena Scientific, Belmont, Massachusetts, 1995.

[2] E. Brown, J. Moehlis, and P. Holmes. On the phase reduction and response dynamics of neural oscillator populations. Neural Computation, 16(4):673-715, 2004.

[3] M. Dellnitz, J. E. Marsden, I. Melbourne, and J. Scheurle. Generic bifurcations of pendula. Int. Series Num. Math., 104:111-122, 1992.

[4] A. Doucet, N. de Freitas, and N. Gordon. Sequential Monte-Carlo Methods in Practice. Springer-Verlag, April 2001.

[5] N. J. Gordon, D. J. Salmond, and A. F. M. Smith. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings F Radar and Signal Processing, 140(2):107-113, 1993.

[6] J. Guckenheimer. Isochrons and phaseless sets. J. Math. Biol., 1:259-273, 1975.

[7] M. Huang, P. E. Caines, and R. P. Malhamé. Large-population cost-coupled LQG problems with nonuniform agents: Individual-mass behavior and decentralized ε-Nash equilibria. IEEE Transactions on Automatic Control, 52(9):1560-1571, 2007.

[8] M. Huang, P. E. Caines, and R. P. Malhamé. Large-population cost-coupled LQG problems with nonuniform agents: Individual-mass behavior and decentralized ε-Nash equilibria. IEEE Transactions on Automatic Control, 52(9):1560-1571, 2007.

[9] R. E. Kalman. A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82(1):35-45, 1960.

[10] Y. Kuramoto. International Symposium on Mathematical Problems in Theoretical Physics, volume 39 of Lecture Notes in Physics. Springer-Verlag, 1975.

[11] H. J. Kushner. On the differential equations satisfied by conditional probability densities of Markov processes. SIAM J. Control, 2:106-119, 1964.

[12] J.-M. Lasry and P.-L. Lions. Mean field games. Japanese Journal of Mathematics, 2(2):229-260, 2007.

[13] J.-M. Lasry and P.-L. Lions. Mean field games. Japan. J. Math., 2:229-260, 2007.

[14] S. P. Meyn. The policy iteration algorithm for average reward Markov decision processes with general state space. IEEE Transactions on Automatic Control, 42(12):1663-1680, December 1997.

[15] S. H. Strogatz and R. E. Mirollo. Stability of incoherence in a population of coupled oscillators. Journal of Statistical Physics, 63:613-635, May 1991.

[16] G. Y. Weintraub, L. Benkard, and B. Van Roy. Oblivious equilibrium: A mean field approximation for large-scale dynamic games. In Advances in Neural Information Processing Systems, volume 18. MIT Press, 2006.

[17] H. Yin, P. G. Mehta, S. P. Meyn, and U. V. Shanbhag. Synchronization of coupled oscillators is a game. In Proc. of the 2010 American Control Conference, pages 1783-1790, Baltimore, MD, 2010.

[18] H. Yin, P. G. Mehta, S. P. Meyn, and U. V. Shanbhag. Synchronization of coupled oscillators is a game. IEEE Trans. Automat. Control, to appear.