Data Efficient 8 1.1, 11, A b - MIT CSAILpeople.csail.mit.edu/agf/Files/06AAAI-ilstd-poster.pdfData...
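[Figure: chain example with states N, N-1, ..., 3, 2, 1. Edge labels: -3 (repeated along the chain), then -2 and 0. Column vectors shown in the figure: (.75, .25, ..., 0, 0), (1, 0, ..., 0, 0), (0, 0, ..., 0, 0), (.5, .5, ..., 0, 0), (0, 0, ..., .25, .75), (0, 0, ..., 0, 1).]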
$(\mathcal{S}, \mathcal{A}, P^{a}_{ss'}, R^{a}_{ss'}, \gamma)$
$V^{\pi}(s) = \sum_{a} \pi(s, a) \sum_{s'} P^{a}_{ss'} \left[ R^{a}_{ss'} + \gamma V^{\pi}(s') \right]$
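The Bellman equation above can be applied repeatedly as an update rule until the values stop changing. The following is a minimal sketch of that idea, not taken from the poster: the two-state MDP, the dictionaries `P`, `R`, `PI`, and the function names are all hypothetical and chosen only for illustration.

```python
# Hypothetical 2-state, 1-action MDP (an assumption, not from the poster).
# State 0 moves to state 1 with reward 1; state 1 loops on itself with reward 0.
GAMMA = 0.9
STATES = [0, 1]
ACTIONS = [0]
P = {0: {0: {0: 0.0, 1: 1.0}}, 1: {0: {0: 0.0, 1: 1.0}}}   # P[s][a][s2]
R = {0: {0: {0: 0.0, 1: 1.0}}, 1: {0: {0: 0.0, 1: 0.0}}}   # R[s][a][s2]
PI = {0: {0: 1.0}, 1: {0: 1.0}}                            # PI[s][a]


def bellman_backup(V, s):
    """Right-hand side of the Bellman equation for state s."""
    total = 0.0
    for a in ACTIONS:
        for s2 in STATES:
            total += PI[s][a] * P[s][a][s2] * (R[s][a][s2] + GAMMA * V[s2])
    return total


def evaluate_policy(sweeps=200):
    """Iterative policy evaluation: sweep the backup over all states."""
    V = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        V = {s: bellman_backup(V, s) for s in STATES}
    return V


V = evaluate_policy()
```

This requires the model (`P` and `R`) to be known; the temporal-difference methods in the rest of the poster estimate the same quantity from sampled transitions instead.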
$\delta_t(V) = r_{t+1} + \gamma V(s_{t+1}) - V(s_t)$
$\theta_{t+1} = A_t^{-1} b_t$
$O(mn + k^2)$
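As a rough illustration of the batch solution $\theta = A^{-1} b$, the sketch below accumulates $A$ and $b$ from a list of transitions and solves the linear system with plain Gaussian elimination. The function names `solve` and `lstd`, the `(phi, r, phi_next)` tuple layout, and the two-transition example are assumptions made for this example; terminal states are represented by an all-zero feature vector.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("A is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def lstd(transitions, n, gamma):
    """Accumulate A = sum phi (phi - gamma phi')^T and b = sum phi r, then solve."""
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for phi, r, phi_next in transitions:
        for i in range(n):
            b[i] += phi[i] * r
            for j in range(n):
                A[i][j] += phi[i] * (phi[j] - gamma * phi_next[j])
    return solve(A, b)


# Hypothetical episode: state 0 -> state 1 (reward 1), state 1 -> terminal (reward 2).
transitions = [([1.0, 0.0], 1.0, [0.0, 1.0]),
               ([0.0, 1.0], 2.0, [0.0, 0.0])]
theta = lstd(transitions, n=2, gamma=0.9)
```

Solving the dense system from scratch at every time step is the cost that the incremental variant on the poster is designed to avoid.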
s ← s_0, A ← 0, μ ← 0, t ← 0
Initialize θ arbitrarily
repeat
    Take action according to π and observe r, s'
    t ← t + 1
    Δb ← φ(s) r
    ΔA ← φ(s)(φ(s) − γφ(s'))^T
    A ← A + ΔA
    μ ← μ + Δb − (ΔA)θ
    for i from 1 to m do
        j ← argmax_j(|μ_j|)
        θ_j ← θ_j + α μ_j
        μ ← μ − α μ_j A e_j
    end for
end repeat
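The pseudocode above can be sketched in Python roughly as follows. This is a dense, unoptimized illustration under stated assumptions: the function name `ilstd`, the `(phi, r, phi_next)` tuple layout, and the example transitions are hypothetical, θ is initialized to zero as one arbitrary choice, and the sparse-feature bookkeeping needed to reach the stated per-step cost is not implemented.

```python
def ilstd(transitions, n, gamma=0.9, alpha=0.1, m=1, theta=None):
    """Dense sketch of the incremental update loop.

    transitions: iterable of (phi, r, phi_next) with feature lists of length n.
    Returns (theta, mu, A) after processing every transition once.
    """
    theta = [0.0] * n if theta is None else list(theta)
    A = [[0.0] * n for _ in range(n)]
    mu = [0.0] * n
    for phi, r, phi_next in transitions:
        # d = phi(s) - gamma * phi(s'), so that dA = phi d^T.
        d = [phi[c] - gamma * phi_next[c] for c in range(n)]
        d_theta = sum(d[c] * theta[c] for c in range(n))  # d^T theta
        for row in range(n):
            for c in range(n):
                A[row][c] += phi[row] * d[c]              # A <- A + dA
            # mu <- mu + db - (dA) theta, with (dA) theta = phi * (d^T theta)
            mu[row] += phi[row] * r - phi[row] * d_theta
        for _ in range(m):
            j = max(range(n), key=lambda k: abs(mu[k]))   # largest |mu_j|
            step = alpha * mu[j]
            theta[j] += step                              # theta_j <- theta_j + alpha mu_j
            for row in range(n):
                mu[row] -= step * A[row][j]               # mu <- mu - alpha mu_j A e_j
    return theta, mu, A


# Hypothetical data: five repetitions of a two-step episode.
episode = [([1.0, 0.0], 1.0, [0.0, 1.0]),
           ([0.0, 1.0], 2.0, [0.0, 0.0])]
theta, mu, A = ilstd(episode * 5, n=2, gamma=0.9, alpha=0.1, m=1)
```

Note that `step` is computed before `mu` is modified, since `mu[j]` itself changes during the inner update. Because `A` here is a running sum rather than an average, a fixed `alpha` is only reasonable for short runs like this one.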
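$\mu_t(\theta_{t+1}) = \mu_t(\theta_t) - A_t(\Delta\theta_t)$
$\mu_t(\theta_t) = \mu_{t-1}(\theta_t) + \Delta b_t - (\Delta A_t)\theta_t$
$V(s_t) = \phi(s_t)^T \theta$
$\theta_{t+1} = \theta_t + \alpha_t \phi(s_t) \delta_t(V)$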
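For comparison with the incremental least-squares updates, a single TD(0) step with a linear value function can be written as below. The function names `td_error` and `td0_update` and the example numbers are assumptions made for illustration only.

```python
def td_error(theta, phi, r, phi_next, gamma):
    """delta = r + gamma * V(s') - V(s), with V(s) = phi(s)^T theta."""
    v = sum(p * t for p, t in zip(phi, theta))
    v_next = sum(p * t for p, t in zip(phi_next, theta))
    return r + gamma * v_next - v


def td0_update(theta, phi, r, phi_next, gamma, alpha):
    """Return theta + alpha * phi(s) * delta as a new list."""
    delta = td_error(theta, phi, r, phi_next, gamma)
    return [t + alpha * p * delta for t, p in zip(theta, phi)]


# Hypothetical single transition with one-hot features.
new_theta = td0_update([0.0, 0.0], [1.0, 0.0], 1.0, [0.0, 1.0],
                       gamma=0.9, alpha=0.5)
```

Each step touches only the current transition, which makes it cheap but means a sample is used once and then discarded.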
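$\mu_t(\theta) = \sum_{i=1}^{t} \phi_i \delta_i(V_\theta) = \underbrace{\sum_{i=1}^{t} \phi_i r_{i+1}}_{b_t} - \underbrace{\sum_{i=1}^{t} \phi_i (\phi_i - \gamma \phi_{i+1})^T}_{A_t} \theta$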
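The identity above follows from substituting $V_\theta(s) = \phi(s)^T \theta$ into each TD error. As a quick numerical sanity check, the sketch below computes both sides for a small set of transitions; the helper names, the data, and the choice of θ are hypothetical.

```python
def sum_of_td_updates(transitions, theta, gamma):
    """Left-hand side: sum over i of phi_i * delta_i(V_theta)."""
    n = len(theta)
    total = [0.0] * n
    for phi, r, phi_next in transitions:
        v = sum(p * t for p, t in zip(phi, theta))
        v_next = sum(p * t for p, t in zip(phi_next, theta))
        delta = r + gamma * v_next - v
        for i in range(n):
            total[i] += phi[i] * delta
    return total


def b_minus_A_theta(transitions, theta, gamma):
    """Right-hand side: b_t - A_t theta, with A_t and b_t built explicitly."""
    n = len(theta)
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for phi, r, phi_next in transitions:
        for i in range(n):
            b[i] += phi[i] * r
            for j in range(n):
                A[i][j] += phi[i] * (phi[j] - gamma * phi_next[j])
    return [b[i] - sum(A[i][j] * theta[j] for j in range(n)) for i in range(n)]


# Hypothetical transitions with non-one-hot features, and an arbitrary theta.
data = [([1.0, 0.5], 1.0, [0.5, 1.0]),
        ([0.5, 1.0], -3.0, [0.0, 1.0]),
        ([0.0, 1.0], 2.0, [0.0, 0.0])]
theta0 = [0.3, -1.2]
lhs = sum_of_td_updates(data, theta0, gamma=0.9)
rhs = b_minus_A_theta(data, theta0, gamma=0.9)
```

The two vectors agree up to floating-point rounding, which is why μ can be maintained from `A` and `b` increments without revisiting past samples.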