Data Efficient 8 1.1, 11, A b - MIT CSAIL
people.csail.mit.edu/agf/Files/06AAAI-ilstd-poster.pdf

[Plot: LSTD vs. iLSTD; x-axis: Episode (up to 800); logarithmic y-axis (10^0 to 10^3)]

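[Figure: chain of states N, N−1, …, 3, 2, 1, 0. Transitions carry reward −3, except the final one into state 0, which carries −2. Each state is annotated with a feature vector that interpolates linearly between unit vectors, e.g. (1, 0, …, 0), (.75, .25, …, 0, 0), (.5, .5, …, 0, 0), (.25, .75, …, 0, 0), …, (0, …, 0, 1).]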

$$(\mathcal{S}, \mathcal{A}, \mathcal{P}^a_{ss'}, \mathcal{R}^a_{ss'}, \gamma)$$

$$V^\pi(s) = \sum_a \pi(s,a) \sum_{s'} \mathcal{P}^a_{ss'}\left[\mathcal{R}^a_{ss'} + \gamma V^\pi(s')\right]$$
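When the MDP is small and fully known, the Bellman equation above can be solved by repeatedly applying its right-hand side until V^π stops changing (iterative policy evaluation). This sketch assumes a tabular representation with nested lists `P[s][a][s']`, `R[s][a][s']` and `pi[s][a]`. The function `evaluate_policy` and the two-state MDP in the usage lines are hypothetical illustrations and do not come from the poster.

```python
def evaluate_policy(P, R, pi, gamma, tol=1e-10, max_iters=10_000):
    """Iterative policy evaluation via the Bellman expectation equation.

    P[s][a][s2]: transition probability, R[s][a][s2]: reward,
    pi[s][a]: probability of choosing action a in state s (hypothetical layout).
    """
    n_states = len(P)
    V = [0.0] * n_states
    for _ in range(max_iters):
        new_V = []
        for s in range(n_states):
            total = 0.0
            for a, p_a in enumerate(pi[s]):
                for s2, p in enumerate(P[s][a]):
                    # pi(s,a) * P^a_{ss'} * [R^a_{ss'} + gamma * V(s')]
                    total += p_a * p * (R[s][a][s2] + gamma * V[s2])
            new_V.append(total)
        # Stop once the largest change between two sweeps is below tol.
        if max(abs(x - y) for x, y in zip(new_V, V)) < tol:
            return new_V
        V = new_V
    return V


# Hypothetical MDP: state 0 moves to state 1 with reward -1; state 1 loops on itself with reward 0.
P = [[[0.0, 1.0]], [[0.0, 1.0]]]
R = [[[0.0, -1.0]], [[0.0, 0.0]]]
pi = [[1.0], [1.0]]
print(evaluate_policy(P, R, pi, gamma=0.9))  # [-1.0, 0.0]
```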

$$\delta_t(V) = r_{t+1} + \gamma V(s_{t+1}) - V(s_t)$$
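The TD error can be written directly as a helper function, assuming value estimates for s_t and s_{t+1} are already available. The `terminal` flag treats V(s_{t+1}) as 0 at the end of an episode. That flag is an assumption of this sketch and is not part of the formula above.

```python
def td_error(r_next, v_next, v_curr, gamma, terminal=False):
    """delta_t(V) = r_{t+1} + gamma * V(s_{t+1}) - V(s_t).

    terminal=True uses V(s_{t+1}) = 0 (assumption for episodic tasks).
    """
    bootstrap = 0.0 if terminal else gamma * v_next
    return r_next + bootstrap - v_curr


# Values consistent with a reward of -3 per step and gamma = 1 give a zero TD error.
print(td_error(-3.0, -6.0, -9.0, gamma=1.0))  # 0.0
```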

$$\theta_{t+1} = A_t^{-1} b_t$$
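Below is a batch sketch of the LSTD solution θ = A^{-1} b. It accumulates A_t = Σ φ_i(φ_i − γφ_{i+1})^T and b_t = Σ φ_i r_{i+1}, matching the decomposition of μ_t(θ) at the end of this section, and solves the system with plain Gaussian elimination. The function names `lstd` and `solve`, the (φ, r, φ') transition format, and the use of a zero feature vector after termination are all assumptions made for illustration.

```python
def lstd(transitions, gamma):
    """Batch LSTD(0): theta = A^{-1} b.

    transitions: list of (phi, r, phi_next) with equal-length feature lists
    (phi_next all zeros after termination, an assumption of this sketch).
    """
    n = len(transitions[0][0])
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for phi, r, phi_next in transitions:
        for p in range(n):
            b[p] += phi[p] * r                                  # b += phi * r
            for q in range(n):
                A[p][q] += phi[p] * (phi[q] - gamma * phi_next[q])  # A += phi (phi - gamma phi')^T
    return solve(A, b)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(M[row][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("A is singular; more (or more varied) data is needed")
        M[col], M[pivot] = M[pivot], M[col]
        for row in range(col + 1, n):
            factor = M[row][col] / M[col][col]
            for k in range(col, n + 1):
                M[row][k] -= factor * M[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        s = sum(M[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (M[row][n] - s) / M[row][row]
    return x


# Hypothetical two-state chain: reward -3 per step, gamma = 1, one-hot features.
episode = [([1.0, 0.0], -3.0, [0.0, 1.0]), ([0.0, 1.0], -3.0, [0.0, 0.0])]
print(lstd(episode, gamma=1.0))  # [-6.0, -3.0]
```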

$$O(mn + k^2)$$
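This text does not define the symbols in O(mn + k^2). Suppose n is the number of features, k is the largest number of nonzero features in any φ(s), and m is the number of greedy descent steps per time step; these meanings are an assumption here. Under that reading, the k^2 term comes from ΔA = φ(s)(φ(s) − γφ(s'))^T, whose nonzero entries lie in at most k rows and 2k columns. Each of the m descent steps scans μ and one column of A in O(n), which accounts for the mn term. The hypothetical helper below builds ΔA from sparse feature dictionaries and touches only those nonzero entries.

```python
def sparse_delta_A(phi, phi_next, gamma):
    """Nonzero entries of Delta A = phi (phi - gamma * phi_next)^T.

    phi, phi_next: sparse feature vectors as {index: value} dicts (a hypothetical
    representation). Returns {(row, col): value}. The work is O(k^2) for k
    nonzero features and does not depend on the total number of features n.
    """
    diff = dict(phi)  # phi - gamma * phi_next, kept sparse
    for q, v in phi_next.items():
        diff[q] = diff.get(q, 0.0) - gamma * v
    return {
        (p, q): vp * vq
        for p, vp in phi.items() if vp != 0.0
        for q, vq in diff.items() if vq != 0.0
    }


# k = 3 active features per state, out of a hypothetical n = 1000.
delta_A = sparse_delta_A({0: 1.0, 5: 0.5, 9: 0.25}, {1: 1.0, 2: 1.0, 3: 1.0}, gamma=0.9)
print(len(delta_A))  # 18 = 3 rows x 6 columns
```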

s ← s_0, A ← 0, μ ← 0, t ← 0
Initialize θ arbitrarily
repeat
    Take action according to π and observe r, s'
    t ← t + 1
    Δb ← φ(s) r
    ΔA ← φ(s)(φ(s) − γφ(s'))^T
    A ← A + ΔA
    μ ← μ + Δb − (ΔA)θ
    for i from 1 to m do
        j ← argmax_j |μ_j|
        θ_j ← θ_j + α μ_j
        μ ← μ − α μ_j A e_j
    end for
    s ← s'
end repeat
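The iLSTD loop above can be written as a minimal dense sketch in plain Python. The sketch assumes that transitions arrive as (φ(s), r, φ(s')) feature lists, that a zero feature vector follows termination, that θ starts at zero, and that α is constant. The function name `ilstd`, its parameters and the returned (θ, A, μ) triple are hypothetical. A is stored densely, so the sketch does not aim for the sparse per-step cost.

```python
def ilstd(transitions, n, gamma, alpha, m):
    """Dense sketch of incremental LSTD (iLSTD).

    transitions: iterable of (phi, r, phi_next) with length-n feature lists;
    phi_next is all zeros after termination (assumption of this sketch).
    Returns (theta, A, mu), where mu tracks b - A theta incrementally.
    """
    A = [[0.0] * n for _ in range(n)]
    mu = [0.0] * n
    theta = [0.0] * n  # "initialize theta arbitrarily": zeros here
    for phi, r, phi_next in transitions:
        # d = phi(s) - gamma * phi(s'), so Delta A = phi d^T and (Delta A) theta = phi (d . theta)
        d = [phi[q] - gamma * phi_next[q] for q in range(n)]
        d_theta = sum(d[q] * theta[q] for q in range(n))
        for p in range(n):
            if phi[p] != 0.0:  # rows with phi[p] == 0 get no change
                for q in range(n):
                    A[p][q] += phi[p] * d[q]            # A <- A + Delta A
                mu[p] += phi[p] * r - phi[p] * d_theta  # mu <- mu + Delta b - (Delta A) theta
        # m greedy descent steps on the component of mu with the largest magnitude
        for _ in range(m):
            j = max(range(n), key=lambda i: abs(mu[i]))
            step = alpha * mu[j]
            theta[j] += step            # theta_j <- theta_j + alpha mu_j
            for p in range(n):
                mu[p] -= step * A[p][j]  # mu <- mu - alpha mu_j A e_j (column j of A)
    return theta, A, mu


# Hypothetical two-state chain, reward -3 per step, gamma = 1, one-hot features.
episode = [([1.0, 0.0], -3.0, [0.0, 1.0]), ([0.0, 1.0], -3.0, [0.0, 0.0])]
theta, A, mu = ilstd(episode * 50, n=2, gamma=1.0, alpha=0.02, m=10)
print(theta)  # should be close to the LSTD solution [-6.0, -3.0] for this data
```

Each descent step multiplies μ_j by (1 − αA_jj). The diagonal of A grows with t, so a fixed α has to stay small relative to it or the updates overshoot. In this example αA_jj never exceeds 1.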

$$\mu_t(\theta_{t+1}) = \mu_t(\theta_t) - A_t(\Delta\theta_t)$$

$$\mu_t(\theta_t) = \mu_{t-1}(\theta_t) + \Delta b_t - (\Delta A_t)\theta_t$$
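Both μ identities follow from μ_t(θ) = b_t − A_t θ, together with A_t = A_{t−1} + ΔA_t and b_t = b_{t−1} + Δb_t. The check below evaluates each side of both identities numerically. Its helper names are hypothetical, and it uses random 3×3 data with a fixed seed, so the gaps between the two sides should be at floating-point rounding level.

```python
import random


def mat_vec(M, v):
    """Matrix-vector product for lists of lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def mu_of(A, b, theta):
    """mu(theta) = b - A theta."""
    return [bp - ap for bp, ap in zip(b, mat_vec(A, theta))]


def add(u, v):
    return [x + y for x, y in zip(u, v)]


rng = random.Random(0)
n = 3


def rand_vec():
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def rand_mat():
    return [rand_vec() for _ in range(n)]


A_prev, b_prev = rand_mat(), rand_vec()    # A_{t-1}, b_{t-1}
dA, db = rand_mat(), rand_vec()            # Delta A_t, Delta b_t
A_t = [add(r1, r2) for r1, r2 in zip(A_prev, dA)]
b_t = add(b_prev, db)
theta_t, d_theta = rand_vec(), rand_vec()  # theta_t, Delta theta_t
theta_next = add(theta_t, d_theta)         # theta_{t+1}

# Identity 1: mu_t(theta_{t+1}) = mu_t(theta_t) - A_t (Delta theta_t)
lhs1 = mu_of(A_t, b_t, theta_next)
rhs1 = [x - y for x, y in zip(mu_of(A_t, b_t, theta_t), mat_vec(A_t, d_theta))]

# Identity 2: mu_t(theta_t) = mu_{t-1}(theta_t) + Delta b_t - (Delta A_t) theta_t
lhs2 = mu_of(A_t, b_t, theta_t)
rhs2 = [m + d - a for m, d, a in zip(mu_of(A_prev, b_prev, theta_t), db, mat_vec(dA, theta_t))]

max_gap = max(abs(x - y) for x, y in zip(lhs1 + lhs2, rhs1 + rhs2))
print(max_gap)
```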

$$V(s_t) = \phi(s_t)^\top \theta$$

$$\theta_{t+1} = \theta_t + \alpha_t \phi(s_t)\,\delta_t(V)$$
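Here is a sketch of TD(0) with the linear value function V(s) = φ(s)^T θ. It assumes the same (φ(s_t), r_{t+1}, φ(s_{t+1})) transition format and a zero feature vector after termination. To keep it simple, a constant step size α replaces the schedule α_t from the update rule. The function name `linear_td0` is hypothetical.

```python
def linear_td0(transitions, n, gamma, alpha):
    """TD(0) with a linear value function V(s) = phi(s)^T theta.

    transitions: iterable of (phi(s_t), r_{t+1}, phi(s_{t+1})) feature lists;
    phi(s_{t+1}) is all zeros after termination (assumption of this sketch).
    A constant step size alpha stands in for the schedule alpha_t.
    """
    theta = [0.0] * n
    for phi, r, phi_next in transitions:
        v = sum(p * w for p, w in zip(phi, theta))            # V(s_t)
        v_next = sum(p * w for p, w in zip(phi_next, theta))  # V(s_{t+1})
        delta = r + gamma * v_next - v                        # delta_t(V)
        for i in range(n):
            theta[i] += alpha * phi[i] * delta                # theta += alpha phi(s_t) delta
    return theta


# Hypothetical two-state chain, reward -3 per step, gamma = 1, one-hot features.
episode = [([1.0, 0.0], -3.0, [0.0, 1.0]), ([0.0, 1.0], -3.0, [0.0, 0.0])]
print(linear_td0(episode * 500, n=2, gamma=1.0, alpha=0.1))  # approaches [-6.0, -3.0]
```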

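$$\mu_t(\theta) = \sum_{i=1}^{t} \phi_i\,\delta_i(V_\theta) = \underbrace{\sum_{i=1}^{t} \phi_i r_{i+1}}_{b_t} - \underbrace{\sum_{i=1}^{t} \phi_i(\phi_i - \gamma\phi_{i+1})^\top}_{A_t}\,\theta$$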
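The decomposition can be checked numerically. Summing φ_i δ_i(V_θ) along a trajectory should give the same vector as building b_t and A_t explicitly and computing b_t − A_t θ. Each of the two hypothetical helpers below implements one side. The trajectory, rewards and θ are random values with a fixed seed and are not data from the poster.

```python
import random


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def mu_from_td_errors(phis, rewards, theta, gamma):
    """sum_{i=1}^{t} phi_i * delta_i(V_theta), with V_theta(s) = phi(s)^T theta.

    phis holds phi_1..phi_{t+1}; rewards holds r_2..r_{t+1}, so len(phis) == len(rewards) + 1.
    """
    mu = [0.0] * len(theta)
    for i, r in enumerate(rewards):
        delta = r + gamma * dot(phis[i + 1], theta) - dot(phis[i], theta)
        for p in range(len(theta)):
            mu[p] += phis[i][p] * delta
    return mu


def mu_from_A_b(phis, rewards, theta, gamma):
    """b_t - A_t theta with b_t = sum phi_i r_{i+1}, A_t = sum phi_i (phi_i - gamma phi_{i+1})^T."""
    n = len(theta)
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for i, r in enumerate(rewards):
        for p in range(n):
            b[p] += phis[i][p] * r
            for q in range(n):
                A[p][q] += phis[i][p] * (phis[i][q] - gamma * phis[i + 1][q])
    return [b[p] - dot(A[p], theta) for p in range(n)]


# Random trajectory with t = 5 transitions and 4 features (illustrative data only).
rng = random.Random(1)
phis = [[rng.random() for _ in range(4)] for _ in range(6)]
rewards = [rng.uniform(-3.0, 0.0) for _ in range(5)]
theta = [rng.uniform(-1.0, 1.0) for _ in range(4)]
lhs = mu_from_td_errors(phis, rewards, theta, gamma=0.9)
rhs = mu_from_A_b(phis, rewards, theta, gamma=0.9)
print(max(abs(x - y) for x, y in zip(lhs, rhs)))
```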