Post on 24-Sep-2020
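[Figure: a chain of states N, N-1, ..., 3, 2, 1. States are labeled with feature vectors such as (.75, .25, ..., 0, 0), (.5, .5, ..., 0, 0), (0, 0, ..., .25, .75) and (0, 0, ..., 0, 1); transitions are labeled with rewards -3, -3, -3, -2, 0.]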
$(\mathcal{S}, \mathcal{A}, \mathcal{P}^{a}_{ss'}, \mathcal{R}^{a}_{ss'}, \gamma)$
$V^{\pi}(s) = \sum_{a} \pi(s, a) \sum_{s'} \mathcal{P}^{a}_{ss'} \left[ \mathcal{R}^{a}_{ss'} + \gamma V^{\pi}(s') \right]$
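The Bellman equation above can be turned into an iterative policy evaluation routine. The sketch below is illustrative only: the function name `policy_evaluation`, the nested-list layout of `pi`, `P` and `R`, and the two-state example are assumptions, not taken from the source.

```python
def policy_evaluation(pi, P, R, gamma, tol=1e-10, max_sweeps=10_000):
    """Iterate the Bellman equation for a fixed policy until V stops changing.

    pi[s][a]    : probability of taking action a in state s
    P[s][a][s2] : probability of moving from s to s2 under action a
    R[s][a][s2] : expected reward for that transition
    """
    n_states = len(P)
    V = [0.0] * n_states
    for _ in range(max_sweeps):
        biggest_change = 0.0
        for s in range(n_states):
            new_v = 0.0
            for a, p_action in enumerate(pi[s]):
                for s2, p_next in enumerate(P[s][a]):
                    new_v += p_action * p_next * (R[s][a][s2] + gamma * V[s2])
            biggest_change = max(biggest_change, abs(new_v - V[s]))
            V[s] = new_v
        if biggest_change < tol:
            break
    return V


# Hypothetical two-state example: state 0 moves to state 1 with reward -3,
# and state 1 is absorbing with reward 0. There is a single action.
pi = [[1.0], [1.0]]
P = [[[0.0, 1.0]], [[0.0, 1.0]]]
R = [[[0.0, -3.0]], [[0.0, 0.0]]]
V_example = policy_evaluation(pi, P, R, gamma=0.9)
```

This needs the full model (P and R); the temporal difference methods on this page estimate the same quantity from sampled transitions instead.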
$\delta_t(V) = r_{t+1} + \gamma V(s_{t+1}) - V(s_t)$
$\theta_{t+1} = A_t^{-1} b_t$
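As a rough illustration of the least-squares solution above, the sketch below accumulates A and b from observed transitions and solves the linear system directly. The helper names `solve_linear` and `lstd`, the `(phi, r, phi_next)` tuple layout, and the two-step chain with rewards -3 and -2 are assumptions made for the example.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def lstd(transitions, gamma, n):
    """Build A and b from (phi, r, phi_next) samples and return theta = A^-1 b."""
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for phi, r, phi_next in transitions:
        for i in range(n):
            b[i] += phi[i] * r
            for j in range(n):
                A[i][j] += phi[i] * (phi[j] - gamma * phi_next[j])
    return solve_linear(A, b)


# Hypothetical chain: one-hot features for two states, zero features at the end.
chain = [([1.0, 0.0], -3.0, [0.0, 1.0]), ([0.0, 1.0], -2.0, [0.0, 0.0])]
theta_lstd = lstd(chain, gamma=1.0, n=2)
```

The sketch assumes A is invertible; with too few samples it is not, and the division by the pivot fails.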
$O(mn + k^2)$
s ← s0, A ← 0, µ ← 0, t ← 0
Initialize θ arbitrarily
repeat
    Take action according to π and observe r, s'
    t ← t + 1
    ∆b ← φ(s) r
    ∆A ← φ(s)(φ(s) − γφ(s'))^T
    A ← A + ∆A
    µ ← µ + ∆b − (∆A)θ
    for i from 1 to m do
        j ← argmax(|µ_j|)
        θ_j ← θ_j + αµ_j
        µ ← µ − αµ_j A e_j
    end for
    s ← s'
end repeat
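The loop above can be sketched in plain Python as follows. This is a minimal sketch, not the authors' implementation: it uses dense lists rather than the sparse structures the complexity bound relies on, it takes a pre-recorded list of `(phi, r, phi_next)` transitions in place of acting in an environment, and the name `ilstd` and the sample chain are made up for illustration.

```python
def ilstd(transitions, theta, gamma, alpha, m):
    """Process transitions one at a time, updating A, mu and theta incrementally."""
    n = len(theta)
    A = [[0.0] * n for _ in range(n)]
    mu = [0.0] * n
    for phi, r, phi_next in transitions:
        for i in range(n):
            if phi[i] == 0.0:
                continue  # row i of Delta A and entry i of Delta b are zero
            delta_row = [phi[i] * (phi[j] - gamma * phi_next[j]) for j in range(n)]
            # mu <- mu + Delta b - (Delta A) theta, restricted to row i
            mu[i] += phi[i] * r - sum(d * t for d, t in zip(delta_row, theta))
            for j in range(n):
                A[i][j] += delta_row[j]
        for _ in range(m):
            # pick the component with the largest accumulated TD update
            j = max(range(n), key=lambda k: abs(mu[k]))
            step = alpha * mu[j]
            theta[j] += step
            for i in range(n):
                mu[i] -= step * A[i][j]  # subtract step times column j of A
    return theta, A, mu


# Hypothetical chain with one-hot features, replayed five times.
chain = [([1.0, 0.0], -3.0, [0.0, 1.0]), ([0.0, 1.0], -2.0, [0.0, 0.0])]
theta, A, mu = ilstd(chain * 5, [0.0, 0.0], gamma=1.0, alpha=0.1, m=1)
```

Because A grows with the number of samples, a fixed step size such as the 0.1 used here only stays stable for short runs; a real run would need a suitably small or decaying step size.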
$\mu_t(\theta_{t+1}) = \mu_t(\theta_t) - A_t (\Delta\theta_t)$
$\mu_t(\theta_t) = \mu_{t-1}(\theta_t) + \Delta b_t - (\Delta A_t)\theta_t$
$V(s_t) = \phi(s_t)^T \theta$
$\theta_{t+1} = \theta_t + \alpha_t \phi(s_t) \delta_t(V)$
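A single step of this update with a linear value function might look like the sketch below. The function name `td0_update` and the sample numbers are assumptions chosen for illustration.

```python
def td0_update(theta, phi, r, phi_next, gamma, alpha):
    """Return the updated weights and the TD error for one transition."""
    v = sum(f * t for f, t in zip(phi, theta))                 # V(s_t)
    v_next = sum(f * t for f, t in zip(phi_next, theta))       # V(s_{t+1})
    delta = r + gamma * v_next - v                             # TD error
    new_theta = [t + alpha * delta * f for t, f in zip(theta, phi)]
    return new_theta, delta


# Hypothetical transition: reward -3 between two one-hot states.
theta_new, delta = td0_update(
    [0.0, 0.0], phi=[1.0, 0.0], r=-3.0, phi_next=[0.0, 1.0], gamma=1.0, alpha=0.5
)
```

Only the weights of features active in the current state change, which is why each step is cheap but uses each sample just once.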
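$\mu_t(\theta) = \sum_{i=1}^{t} \phi_i \delta_i(V_\theta) = \underbrace{\sum_{i=1}^{t} \phi_i r_{i+1}}_{b_t} - \underbrace{\sum_{i=1}^{t} \phi_i (\phi_i - \gamma \phi_{i+1})^T}_{A_t} \theta$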
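The step between the two sums is the expansion of the TD error under the linear value function. A short worked version, writing $\phi_i$ for $\phi(s_i)$ and assuming $V_\theta(s) = \phi(s)^T \theta$:

```latex
\begin{align*}
\phi_i \, \delta_i(V_\theta)
  &= \phi_i \left( r_{i+1} + \gamma \phi_{i+1}^T \theta - \phi_i^T \theta \right) \\
  &= \phi_i r_{i+1} - \phi_i \left( \phi_i - \gamma \phi_{i+1} \right)^T \theta
\end{align*}
```

Summing over $i = 1, \dots, t$ gives $\mu_t(\theta) = b_t - A_t \theta$, so setting $\mu_t(\theta) = 0$ recovers the least-squares solution $\theta = A_t^{-1} b_t$ whenever $A_t$ is invertible.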