Optimal control of stochastic multi-agent systems in continuous space and time

Wim Wiegerinck, Bart van den Broek, Bert Kappen
SNN, Radboud University Nijmegen
Presented at UAI 2006

The research reported here is part of the Interactive Collaborative Information Systems (ICIS) project, supported by the Dutch Ministry of Economic Affairs, grant BSIK03024.

W. Wiegerinck - UvA, 9 October 2006

Stochastic multi-agent systems and optimal control

Multi-agent systems (e.g. firemen; see figure) that have to distribute themselves over a number of targets (e.g. fires)
  noisy, non-linear dynamics in continuous space-time
  additive control of the dynamics
Optimal control: minimize total joint cost (= effort cost + end cost)
  to which fire should a fireman go?
  when to decide?

Contents

Stochastic optimal control in continuous space-time, Hamilton-Jacobi-Bellman equation
Transform into a linear PDE
From single-agent single-target systems to multi-agent multi-target systems
MAS control and graphical models
Simulations
Summary, discussion

Stochastic dynamics

We consider a system in continuous space and time. Its state $x \in \mathbb{R}^k$ obeys the stochastic dynamics
\[ dx = (b(x,t) + u)\,dt + d\xi \]
  $b$: drift term modeling the dynamics due to the environment,
  $u$: control to influence the dynamics,
  $d\xi$: a Wiener process (i.e. noise) with $\langle d\xi_i\, d\xi_j \rangle = \nu_{ij}\,dt$.

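
To make the dynamics concrete, here is a minimal 1-D sketch that integrates $dx = (b + u)\,dt + d\xi$ with the Euler-Maruyama scheme. The function name `simulate`, the step count and the parameter values are illustrative assumptions, not taken from the slides; the scheme itself is a standard discretisation, not necessarily the one used by the authors.

```python
import math
import random

def simulate(x0, t0, T, drift, control, nu, n_steps, rng):
    """Euler-Maruyama integration of dx = (b(x,t) + u(x,t)) dt + d(xi) in 1-D.

    drift and control are callables (x, t) -> float; nu is the noise
    variance per unit time, so each increment d(xi) ~ N(0, nu * dt).
    """
    dt = (T - t0) / n_steps
    x, t = x0, t0
    path = [x]
    for _ in range(n_steps):
        noise = rng.gauss(0.0, math.sqrt(nu * dt))
        x += (drift(x, t) + control(x, t)) * dt + noise
        t += dt
        path.append(x)
    return path

rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
zero = lambda x, t: 0.0

# Uncontrolled, drift-free case: pure diffusion started at x = 0.
ends = [simulate(0.0, 0.0, 1.0, zero, zero, 0.1, 100, rng)[-1] for _ in range(2000)]
var_end = sum(e * e for e in ends) / len(ends)
print(var_end)  # sample variance; expected to be near nu * (T - t0) = 0.1
```

With $b = 0$ and $u = 0$ the end-point variance should approach $\nu (T - t)$, which is the Gaussian width that appears in the diffusion-equation solution used later in the talk.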
Control problem

Find the control $u(\cdot)$ that minimizes the expected cost-to-go
\[ C(x_i, t_i, u(\cdot)) = \left\langle \phi(x(T)) + \int_{t_i}^{T} dt \left( \tfrac{1}{2}\, u(x,t)^\top R\, u(x,t) + V(x(t),t) \right) \right\rangle \]
in which
  $x_i$: initial state,
  $t_i$: initial time,
  $T$: fixed end-time,
  $\phi(x(T))$: cost of being in state $x$ at end-time $T$,
  $V(x(t),t)\,dt$: cost of being in state $x$ during time interval $[t, t+dt]$,
  $\tfrac{1}{2}\, u^\top R\, u\,dt$: cost of control during the same time interval,
  $R$ is a constant $k \times k$ matrix (parametrizing the cost of control).

Hamilton-Jacobi-Bellman equation and optimal control

Optimal (expected) cost-to-go
\[ J(x,t) = \min_{u(\cdot)} C(x,t,u(\cdot)). \]
It satisfies the stochastic Hamilton-Jacobi-Bellman (HJB) equation
\[ -\partial_t J = \min_{u} \left( \tfrac{1}{2}\, u^\top R\, u + V + (b+u)^\top \nabla J + \tfrac{1}{2} \mathrm{Tr}\left(\nu \nabla^2 J\right) \right) \]
\[ \phantom{-\partial_t J} = -\tfrac{1}{2}\, \nabla J^\top R^{-1} \nabla J + V + b^\top \nabla J + \tfrac{1}{2} \mathrm{Tr}\left(\nu \nabla^2 J\right) \]
with boundary condition $J(x,T) = \phi(x)$.
The minimization with respect to $u$ yields
\[ u = -R^{-1} \nabla J, \]
which defines the optimal control.

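
The step from the first to the second form of the HJB equation can be spelled out as follows. This is a short worked derivation, assuming $R$ is symmetric and positive definite (an assumption the slide does not state explicitly but which the quadratic control cost suggests).

```latex
\begin{align*}
\frac{\partial}{\partial u}\Big(\tfrac12\, u^\top R\, u + (b+u)^\top \nabla J\Big)
  &= R\,u + \nabla J = 0
  \quad\Longrightarrow\quad u = -R^{-1}\nabla J, \\
\Big[\tfrac12\, u^\top R\, u + u^\top \nabla J\Big]_{u=-R^{-1}\nabla J}
  &= \tfrac12\, \nabla J^\top R^{-1} \nabla J - \nabla J^\top R^{-1} \nabla J
   = -\tfrac12\, \nabla J^\top R^{-1} \nabla J .
\end{align*}
```

The terms $V$, $b^\top \nabla J$ and $\tfrac12 \mathrm{Tr}(\nu \nabla^2 J)$ do not depend on $u$ and pass through the minimization unchanged.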
Log transformation and linear evolution

Assume $\nu = \lambda R^{-1}$, then the non-linear PDE of $J$ can be transformed into a linear one by the log transform (W. Fleming, 1978). Set
\[ J(x,t) = -\lambda \log Z(x,t) \]
with "partition function"
\[ Z(x,t) = \int d^k y\; \rho(y,T|x,t)\, \exp(-\phi(y)/\lambda) \]
in which the 'probability density' $\rho$ satisfies the Fokker-Planck equation
\[ \partial_\vartheta \rho(y,\vartheta|x,t) = -\frac{V}{\lambda}\,\rho - \nabla_y^\top (b\,\rho) + \tfrac{1}{2} \mathrm{Tr}\left(\nu \nabla_y^2 \rho\right). \]
$\rho(y,T|x,t)$: probability of getting at $y$ at time $T$ given initial state $x$ at time $t$,
  following stochastic system dynamics in absence of control, i.e., $u = 0$,
  with annihilation probability $V(x,t)\,dt/\lambda$.

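
For readers who want to see why the condition $\nu = \lambda R^{-1}$ linearizes the equation, the substitution can be carried out explicitly. The following is a worked sketch with scalar $\lambda$ and symmetric $R$ assumed; it starts from the minimized HJB equation $-\partial_t J = -\tfrac12 \nabla J^\top R^{-1} \nabla J + V + b^\top \nabla J + \tfrac12 \mathrm{Tr}(\nu \nabla^2 J)$.

```latex
\begin{align*}
J = -\lambda \log Z \;\Rightarrow\;
\partial_t J &= -\lambda \frac{\partial_t Z}{Z}, \qquad
\nabla J = -\lambda \frac{\nabla Z}{Z}, \qquad
\nabla^2 J = -\lambda \left( \frac{\nabla^2 Z}{Z} - \frac{\nabla Z\, \nabla Z^\top}{Z^2} \right), \\
-\tfrac12\, \nabla J^\top R^{-1} \nabla J
  &= -\frac{\lambda^2}{2}\, \frac{\nabla Z^\top R^{-1} \nabla Z}{Z^2}, \\
\tfrac12\, \mathrm{Tr}\left(\nu \nabla^2 J\right)
  &= -\frac{\lambda}{2}\, \frac{\mathrm{Tr}\left(\nu \nabla^2 Z\right)}{Z}
     + \frac{\lambda}{2}\, \frac{\nabla Z^\top \nu\, \nabla Z}{Z^2}.
\end{align*}
% With nu = lambda R^{-1} the two quadratic terms cancel, leaving a linear PDE:
\begin{equation*}
-\partial_t Z = -\frac{V}{\lambda}\, Z + b^\top \nabla Z + \tfrac12\, \mathrm{Tr}\left(\nu \nabla^2 Z\right),
\qquad Z(x,T) = \exp\left(-\phi(x)/\lambda\right).
\end{equation*}
```

This backward equation in $(x,t)$ is linear in $Z$; the Fokker-Planck equation for $\rho$ is its forward counterpart in $(y,\vartheta)$.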
Linear theory

Expected optimal cost to go
\[ J(x,t) = -\lambda \log Z(x,t) \]
$Z$ is expressed as
\[ Z(x,t) = \int d^k y\; \rho(y,T|x,t)\, \exp(-\phi(y)/\lambda) \]
in which $\rho$ satisfies the Fokker-Planck equation.
The optimal control is given by
\[ u(x,t) = \nu \nabla \log Z(x,t). \]

Example: Quadratic end-cost

$b = 0$, $V = 0$, $R$ and $\nu$ scalars.
Solution of the diffusion equation
\[ \rho(y,T|x,t) = \left(2\pi\nu(T-t)\right)^{-k/2} \exp\left( -\frac{|y-x|^2}{2\nu(T-t)} \right) \]
Assume end-cost: $\phi(x) = \tfrac{\alpha}{2} |x-\mu|^2$.
Then $Z$ follows from convolution with $\exp(-\phi(y)/\lambda)$,
\[ Z(x,t) \propto \exp\left( -\frac{|x-\mu|^2}{2\nu(T-t+R/\alpha)} \right). \]
The optimal control follows from $u = \nu \nabla \log Z$,
\[ u(x,t) = \frac{\mu - x}{T - t + R/\alpha}. \]

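
The closed form can be checked numerically. The sketch below, in 1-D, evaluates $Z$ by trapezoid quadrature and differentiates $\log Z$ by finite differences, then compares with $(\mu - x)/(T - t + R/\alpha)$. It assumes the end-cost normalization $\phi(x) = \tfrac{\alpha}{2}(x-\mu)^2$ and $\lambda = \nu R$; the function names, grid size and parameter values are illustrative choices only.

```python
import math

def Z_numeric(x, t, T, nu, R, alpha, mu, n=4001, half_width=12.0):
    """Z(x,t) = integral dy rho(y,T|x,t) exp(-phi(y)/lambda), 1-D, trapezoid rule.

    Assumes phi(y) = (alpha/2) (y - mu)^2 and lambda = nu * R (scalar case).
    """
    lam = nu * R
    var = nu * (T - t)
    sd = math.sqrt(var)
    lo, hi = x - half_width * sd, x + half_width * sd
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        y = lo + i * h
        rho = math.exp(-(y - x) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
        f = rho * math.exp(-0.5 * alpha * (y - mu) ** 2 / lam)
        total += f * (0.5 if i in (0, n - 1) else 1.0)
    return total * h

def u_numeric(x, t, T, nu, R, alpha, mu, eps=1e-4):
    """u = nu * d/dx log Z, by central finite difference."""
    zp = Z_numeric(x + eps, t, T, nu, R, alpha, mu)
    zm = Z_numeric(x - eps, t, T, nu, R, alpha, mu)
    return nu * (math.log(zp) - math.log(zm)) / (2 * eps)

def u_closed(x, t, T, R, alpha, mu):
    """Closed-form control for the quadratic end-cost."""
    return (mu - x) / (T - t + R / alpha)

params = dict(t=0.2, T=1.0, nu=0.5, R=2.0, alpha=4.0, mu=1.5)
un = u_numeric(0.3, **params)
uc = u_closed(0.3, 0.2, 1.0, 2.0, 4.0, 1.5)
print(un, uc)  # the two values should agree closely
```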
Single target

Back to arbitrary $b$ and $V$.
To enforce an end-state at target $\mu$, we set
\[ \exp(-\phi(y)/\lambda) \propto \delta(y - \mu) \]
This implies
\[ Z(x,t;\mu) \propto \rho(\mu,T|x,t), \]
\[ u(x,t;\mu) = \nu \nabla \log Z(x,t;\mu) = \nu \nabla \log \rho(\mu,T|x,t) \]

Running example

Assume $b = 0$, $V = 0$, $R$ and $\nu$ scalars.
\[ Z(x,t;\mu) \propto \exp\left[ -\frac{|x-\mu|^2}{2\nu(T-t)} \right], \qquad u(x,t;\mu) = \frac{\mu - x}{T - t}. \]
For any $b$ linear in $x$, and $V = 0$, the Fokker-Planck equation can be solved analytically. Its solution is a Gaussian. $Z$ and $u$ are of essentially the same form as in the running example.

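
The single-target control $u = (\mu - x)/(T - t)$ grows as $t \to T$ and pins the end-state at $\mu$. The following sketch illustrates this in 1-D with an Euler-Maruyama discretisation; the function name `run_to_target`, the step count and the parameter values are assumptions made for this illustration.

```python
import math
import random

def run_to_target(x0, mu, T, nu, n_steps, rng):
    """Simulate dx = u dt + d(xi) with u = (mu - x) / (T - t), for b = 0, V = 0."""
    dt = T / n_steps
    x = x0
    for i in range(n_steps):
        t = i * dt
        u = (mu - x) / (T - t)  # single-target optimal control
        x += u * dt + rng.gauss(0.0, math.sqrt(nu * dt))
    return x

rng = random.Random(1)  # fixed seed, chosen arbitrarily
finals = [run_to_target(0.0, 1.0, 1.0, 0.1, 1000, rng) for _ in range(200)]
worst = max(abs(x - 1.0) for x in finals)
print(worst)  # largest end-point deviation from the target mu = 1
```

In the discretised version the last step uses $T - t = dt$, so the state is moved onto $\mu$ up to a single noise increment of standard deviation $\sqrt{\nu\,dt}$.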
Multiple targets

To enforce an end-state at one of $m$ targets $\mu_s$, with target preferences expressed by relative cost $E(s)$,
\[ \exp(-\phi(y)/\lambda) \propto \sum_{s=1}^{m} \exp(-E(s)/\lambda)\, \delta(y - \mu_s). \]
Partition function and optimal control can be expressed as sum of single-target quantities,
\[ Z(x,t) \propto \sum_{s=1}^{m} \exp(-E(s)/\lambda)\, Z(x,t;\mu_s), \]
\[ u(x,t) = \sum_{s=1}^{m} p(s|x,t)\, u(x,t;\mu_s), \]
in which $p(s|x,t)$ is the probability
\[ p(s|x,t) = \frac{\exp(-E(s)/\lambda)\, Z(x,t;\mu_s)}{\sum_{s'=1}^{m} \exp(-E(s')/\lambda)\, Z(x,t;\mu_{s'})}. \]

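
The weighted-sum form of the control follows directly from $u = \nu \nabla \log Z$ and the linearity of $Z$ in the single-target partition functions. A short worked derivation, using only the definitions on this slide:

```latex
\begin{align*}
u(x,t) = \nu \nabla \log Z(x,t)
  &= \nu\, \frac{\sum_{s} \exp(-E(s)/\lambda)\, \nabla Z(x,t;\mu_s)}
               {\sum_{s'} \exp(-E(s')/\lambda)\, Z(x,t;\mu_{s'})} \\
  &= \sum_{s} \frac{\exp(-E(s)/\lambda)\, Z(x,t;\mu_s)}
                   {\sum_{s'} \exp(-E(s')/\lambda)\, Z(x,t;\mu_{s'})}\;
     \nu \nabla \log Z(x,t;\mu_s)
   = \sum_{s} p(s|x,t)\, u(x,t;\mu_s).
\end{align*}
```

The second line uses $\nabla Z(x,t;\mu_s) = Z(x,t;\mu_s)\, \nabla \log Z(x,t;\mu_s)$.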
Example: Multiple targets

In the running example, optimal control with multiple targets is
\[ u(x,t) = \sum_{s} p(s|x,t)\, u(x,t;\mu_s) = \sum_{s} p(s|x,t) \left( \frac{\mu_s - x}{T - t} \right) = \frac{\bar{\mu} - x}{T - t} \]
with $\bar{\mu}$ the 'expected target'
\[ \bar{\mu} = \sum_{s=1}^{m} p(s|x,t)\, \mu_s \]
which is the expected value of the target according to the probability
\[ p(s|x,t) = \frac{\exp(-E(s)/\lambda) \exp\left[ -\frac{|x-\mu_s|^2}{2\nu(T-t)} \right]}{\sum_{s'=1}^{m} \exp(-E(s')/\lambda) \exp\left[ -\frac{|x-\mu_{s'}|^2}{2\nu(T-t)} \right]} \]

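
A minimal 1-D sketch of this formula is given below: it computes $p(s|x,t)$, the expected target $\bar{\mu}$ and the resulting control. The function names and the numerical values (two targets at $\pm 1$, $\nu = \lambda = 1$) are illustrative assumptions.

```python
import math

def target_posterior(x, t, T, nu, lam, targets, costs):
    """p(s|x,t) proportional to exp(-E(s)/lam) * exp(-(x - mu_s)^2 / (2 nu (T - t)))."""
    logw = [-E / lam - (x - mu) ** 2 / (2 * nu * (T - t))
            for mu, E in zip(targets, costs)]
    top = max(logw)  # subtract the max before exponentiating, for numerical stability
    w = [math.exp(l - top) for l in logw]
    total = sum(w)
    return [wi / total for wi in w]

def optimal_control(x, t, T, nu, lam, targets, costs):
    """Return (u, p) with u = (mu_bar - x) / (T - t) and p the target probabilities."""
    p = target_posterior(x, t, T, nu, lam, targets, costs)
    mu_bar = sum(pi * mu for pi, mu in zip(p, targets))  # 'expected target'
    return (mu_bar - x) / (T - t), p

targets, costs = [-1.0, 1.0], [0.0, 0.0]
u_early, p_early = optimal_control(0.2, 0.0, 1.0, 1.0, 1.0, targets, costs)
u_late, p_late = optimal_control(0.2, 0.95, 1.0, 1.0, 1.0, targets, costs)
print(p_early, p_late)
```

Far from the end-time the probabilities are close to uniform and the agent steers toward a mixture of targets; close to $T$ the nearer target dominates. This is one way to read the "when to decide?" question from the introduction.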
Multi-agents, multiple targets

MAS state $\vec{x} = (x_1, \ldots, x_n)$, single agent state $x_a$.
Dynamics non-interactive:
\[ b_a(\vec{x},t) = b_a(x_a,t), \qquad V(\vec{x},t) = \sum_a V_a(x_a,t). \]
Furthermore, $\nu = \lambda R^{-1}$ globally. Therefore $\rho$ factorizes,
\[ \rho(\vec{y},T|\vec{x},t) = \prod_a \rho_a(y_a,T|x_a,t). \]
Coupling via joint task: distribute over targets
\[ \exp(-\phi(\vec{y})/\lambda) = \sum_{\vec{s}} \exp(-E(\vec{s})/\lambda) \prod_a \delta(y_a - \mu_{s_a}). \]
$s_a$: label of target reached by agent $a$.
$E(\vec{s}) = E(s_1, \ldots, s_n)$ is the cost when agent 1 reaches $\mu_{s_1}$, agent 2 reaches $\mu_{s_2}$, etc.

Multi-agents, multiple targets (cont)

Control for agent $a$ (depends on joint state)
\[ u_a(\vec{x},t) = \sum_{s_a=1}^{m} p(s_a|\vec{x},t)\, u_a(x_a,t;\mu_{s_a}). \]
Control involves marginal distribution for agent $a$
\[ p(s_a|\vec{x},t) = \frac{\sum_{\vec{s} \setminus s_a} \exp(-E(\vec{s})/\lambda) \prod_b Z_b(x_b,t;\mu_{s_b})}{\sum_{\vec{s}} \exp(-E(\vec{s})/\lambda) \prod_c Z_c(x_c,t;\mu_{s_c})}, \]

Example: MAS, multiple targets

In the running example, optimal control with multiple targets is
\[ u_a(\vec{x},t) = \frac{\bar{\mu}_a - x_a}{T - t} \]
with $\bar{\mu}_a$ the 'expected target' for agent $a$
\[ \bar{\mu}_a = \sum_{s_a=1}^{m} p(s_a|\vec{x},t)\, \mu_{s_a} \]
which is the expected value of the target according to the probability
\[ p(s_a|\vec{x},t) \propto w(s_a|\vec{x}_{\setminus a},t) \exp\left[ -\frac{|x_a - \mu_{s_a}|^2}{2\nu(T-t)} \right] \]
with
\[ w(s_a|\vec{x}_{\setminus a},t) = \sum_{\vec{s} \setminus s_a} \exp(-E(\vec{s})/\lambda) \exp\left[ -\sum_{b \neq a} \frac{|x_b - \mu_{s_b}|^2}{2\nu(T-t)} \right] \]

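
For a handful of agents the marginals can be computed by brute-force enumeration of all joint target assignments. The sketch below does this in 1-D, assuming pairwise costs of the form $E(\vec{s}) = -\sum_{a<b} c_{ab}\, \delta_{s_a s_b}$; that cost form, the function names and the numbers are assumptions for illustration, and the enumeration costs $m^n$ operations, so it is only usable for small $n$.

```python
import itertools
import math

def joint_cost(s, c):
    """Assumed pairwise cost E(s) = -sum_{a<b} c[a][b] * [s_a == s_b]."""
    n = len(s)
    return -sum(c[a][b] for a in range(n) for b in range(a + 1, n) if s[a] == s[b])

def marginals(xs, t, T, nu, lam, targets, c):
    """Brute-force p(s_a | x, t) for every agent a (enumerates m**n assignments)."""
    n, m = len(xs), len(targets)
    weights = {}
    for s in itertools.product(range(m), repeat=n):
        log_w = -joint_cost(s, c) / lam
        log_w -= sum((xs[a] - targets[s[a]]) ** 2 for a in range(n)) / (2 * nu * (T - t))
        weights[s] = math.exp(log_w)
    total = sum(weights.values())
    p = [[0.0] * m for _ in range(n)]
    for s, w in weights.items():
        for a in range(n):
            p[a][s[a]] += w / total
    return p

def controls(xs, t, T, nu, lam, targets, c):
    """u_a = (mu_bar_a - x_a) / (T - t) with mu_bar_a the expected target of agent a."""
    p = marginals(xs, t, T, nu, lam, targets, c)
    mu_bar = [sum(pa[s] * targets[s] for s in range(len(targets))) for pa in p]
    return [(mb - x) / (T - t) for mb, x in zip(mu_bar, xs)]

targets = [-1.0, 1.0]
c = [[0.0, -2.0], [-2.0, 0.0]]  # negative coupling: sharing a target is costly
xs = [0.0, 0.9]                 # agent 0 is equidistant, agent 1 sits near +1
p = marginals(xs, 0.0, 1.0, 1.0, 1.0, targets, c)
u = controls(xs, 0.0, 1.0, 1.0, 1.0, targets, c)
print(p, u)
```

Although agent 0 is equidistant from both targets, the negative coupling makes it favour the target that agent 1 is unlikely to take, so its control points toward $-1$.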
MAS control and graphical models

MAS control requires inference of $p(s_a|\vec{x},t)$.
Graphical model methods can be exploited if $p(\vec{s})$ is a factor graph, i.e. if the cost can be written
\[ E(\vec{s}) = \sum_{\alpha} E_\alpha(s_\alpha) \]
with $\alpha$ groups of agents. E.g. pairwise costs
\[ E_{ab}(s_a,s_b) = -c_{ab}\, \delta_{s_a s_b}. \]
Boltzmann machine analogy
  $E_{ab}(s_a,s_b)$ plays role of couplings in a Boltzmann machine, constant in the system
  $Z_a(x_a,t;\mu_{s_a})$ (i.e., $\rho(\mu_{s_a},T|x_a,t)$) plays role of bias in a BM, changes over time
In general: graphical structure is preserved over time (unlike discrete time factored MDPs).
Sparse graphs $\Rightarrow$ junction tree algorithm.

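
To illustrate how sparsity helps, the sketch below treats the simplest sparse case, a chain of agents with couplings only between neighbours, where exact inference reduces to forward-backward message passing (a special case of the junction tree algorithm). The chain topology, the function names and the numbers are assumptions for this example; `bias[a][s]` stands in for $Z_a(x_a,t;\mu_s)$ and `coupling[a][s][s2]` for $\exp(-E_{a,a+1}(s,s_2)/\lambda)$. A brute-force enumeration is included only as a cross-check.

```python
import itertools
import math

def chain_marginals(bias, coupling):
    """Sum-product on a chain; cost is O(n m^2) instead of O(m^n).

    The joint is proportional to
    prod_a bias[a][s_a] * prod_a coupling[a][s_a][s_{a+1}].
    """
    n, m = len(bias), len(bias[0])
    fwd = [[1.0] * m for _ in range(n)]  # message arriving from the left
    bwd = [[1.0] * m for _ in range(n)]  # message arriving from the right
    for a in range(1, n):
        fwd[a] = [sum(fwd[a - 1][s] * bias[a - 1][s] * coupling[a - 1][s][s2]
                      for s in range(m)) for s2 in range(m)]
    for a in range(n - 2, -1, -1):
        bwd[a] = [sum(coupling[a][s][s2] * bias[a + 1][s2] * bwd[a + 1][s2]
                      for s2 in range(m)) for s in range(m)]
    out = []
    for a in range(n):
        w = [fwd[a][s] * bias[a][s] * bwd[a][s] for s in range(m)]
        total = sum(w)
        out.append([wi / total for wi in w])
    return out

def brute_marginals(bias, coupling):
    """Reference implementation: enumerate all m**n joint assignments."""
    n, m = len(bias), len(bias[0])
    p = [[0.0] * m for _ in range(n)]
    for s in itertools.product(range(m), repeat=n):
        w = math.prod(bias[a][s[a]] for a in range(n))
        w *= math.prod(coupling[a][s[a]][s[a + 1]] for a in range(n - 1))
        for a in range(n):
            p[a][s[a]] += w
    return [[v / sum(row) for v in row] for row in p]

m = 3
bias = [[0.2, 1.0, 0.5], [0.9, 0.3, 0.4], [0.1, 0.6, 0.8], [0.7, 0.7, 0.2]]
# exp(c * delta / lambda) with c = -1, lambda = 1: same target is penalised.
pair = [[math.exp(-1.0) if s == s2 else 1.0 for s2 in range(m)] for s in range(m)]
coupling = [pair, pair, pair]
fast = chain_marginals(bias, coupling)
slow = brute_marginals(bias, coupling)
print(fast[0], slow[0])
```

Because only the biases change with time while the couplings stay fixed, the same message-passing schedule can be reused at every time step.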
Simulations

$b = 0$, $V = 0$, pairwise relations $c_{ab}$.
1-d 'positions' $x_a$, for plotting purposes. $k$-dimensional would be feasible as well.
'Expected targets' $\bar{\mu}_a = \sum_{s_a} p_a(s_a|\vec{x})\, \mu_{s_a}$.
Only for illustration: $u_a = \dfrac{\bar{\mu}_a - x_a}{T - t}$ is mathematically optimal!

Firemen problem

6 agents (firemen), three targets (fires).
Fully connected graph with $c_{ab} = c$ negative $\Rightarrow$ aim to distribute evenly.
[Figure: two panels plotted against time, 'positions' and 'expected targets'.]
Symmetry breaking as delayed choice (Kappen, 2005)

Holiday resort problem

42 agents, 3 targets (resorts).
$E$ represented by sparse graph: each agent has pairwise relations, $c_{ab} = \pm 1$, with three other agents.
An agent only cares, for its related agents, whether or not they have their holiday in the same resort.
Joint task is to optimally distribute MAS over the resorts.
Clique-size = 7
[Figure: two panels plotted against time, 'positions' and 'expected targets'.]

Summary and discussion

We studied optimal control in stochastic MAS in continuous space-time.
Optimal control can be derived from the solution of Hamilton-Jacobi-Bellman equations.
Under some conditions, the log-transformation transforms the non-linear HJB equations (a non-linear PDE) into a linear PDE.
Under these conditions, a superposition principle holds. This enables us to generate MAS multi-target solutions from single-agent single-target solutions.
Additional computational cost for MAS involves probabilistic inference, which is tractable in sparsely connected systems.
In dense MASs, exact inference is intractable; a natural approach would be approximate inference using message passing algorithms (current study).
In linear models, with $V = 0$ and no agent interactions, optimal control in MAS multi-target systems is solved analytically.
In general, however, even the single agent problem requires computationally intensive approximations (Kappen, 2005).