Data Efficient (title garbled in extraction) - MIT CSAIL
Source: people.csail.mit.edu/agf/Files/06AAAI-ilstd-poster.pdf
[Figure: plot with curves labeled LSTD and iLSTD against Episode (ticks 200, 400, 600, 800); the other axis ticks read 10^3, 10^1, 10^0.]


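[Figure: chain of states labeled N, N-1, ..., 3, 2, 1. Transition labels read -3, except the last, which was extracted as "-20". States are annotated with feature vectors, including (1, 0, ..., 0, 0), (.75, .25, ..., 0, 0), (.5, .5, ..., 0, 0), (0, 0, ..., .25, .75), (0, 0, ..., 0, 1), and (0, 0, ..., 0, 0).]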

$$(S, A, P^a_{ss'}, R^a_{ss'}, \gamma)$$

$$V^\pi(s) = \sum_{a} \pi(s, a) \sum_{s'} P^a_{ss'} \left[ R^a_{ss'} + \gamma V^\pi(s') \right]$$
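The value function of a policy satisfies a recursive equation over actions and successor states, and that equation can be solved by repeated substitution. The following is a minimal sketch of that idea. The dictionary layout for `pi`, `P`, and `R`, the function name `evaluate_policy`, and the two-state example are assumptions made for illustration and do not come from the poster.

```python
def evaluate_policy(states, actions, pi, P, R, gamma, sweeps=200):
    """Iterate V(s) = sum_a pi[s][a] * sum_s2 P[s][a][s2] * (R[s][a][s2] + gamma * V(s2))."""
    V = {s: 0.0 for s in states}
    for _ in range(sweeps):
        # Build the next estimate from the previous one (synchronous sweep).
        V = {
            s: sum(
                pi[s][a]
                * sum(p * (R[s][a][s2] + gamma * V[s2]) for s2, p in P[s][a].items())
                for a in actions
            )
            for s in states
        }
    return V


# Hypothetical example: state 0 moves to state 1 with reward -3;
# state 1 is absorbing with reward 0.
states = [0, 1]
actions = ["go"]
pi = {0: {"go": 1.0}, 1: {"go": 1.0}}
P = {0: {"go": {1: 1.0}}, 1: {"go": {1: 1.0}}}
R = {0: {"go": {1: -3.0}}, 1: {"go": {1: 0.0}}}
V = evaluate_policy(states, actions, pi, P, R, gamma=0.9)
```

This only works when the transition model and rewards are known; the sample-based methods on the poster avoid that requirement.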

$$\delta_t(V) = r_{t+1} + \gamma V(s_{t+1}) - V(s_t).$$

$$\theta_{t+1} = A_t^{-1} b_t.$$
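The LSTD solution sets the parameter vector to the solution of a linear system built from observed transitions. A minimal sketch under stated assumptions: transitions are given as `(phi, r, phi_next)` tuples of plain Python lists, a terminal state has an all-zero feature vector, and the helper names `solve` and `lstd` are hypothetical. The system is solved by Gaussian elimination rather than by forming an explicit inverse.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x


def lstd(transitions, n, gamma):
    """Accumulate A and b over all samples, then solve A theta = b."""
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for phi, r, phi_next in transitions:
        for i in range(n):
            b[i] += phi[i] * r
            for j in range(n):
                A[i][j] += phi[i] * (phi[j] - gamma * phi_next[j])
    return solve(A, b)


# Hypothetical two-state episode with one indicator feature per state.
transitions = [
    ([1.0, 0.0], -3.0, [0.0, 1.0]),
    ([0.0, 1.0], -2.0, [0.0, 0.0]),  # terminal successor: zero features
]
theta = lstd(transitions, n=2, gamma=1.0)
```

With indicator features the result is the undiscounted return from each state; `solve` will fail with a division by zero if `A` is singular, which this sketch does not guard against.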

$$O(mn + k^2)$$

s ← s0, A ← 0, µ ← 0, t ← 0
Initialize θ arbitrarily
repeat
    Take action according to π and observe r, s'
    t ← t + 1
    ∆b ← φ(s) r
    ∆A ← φ(s)(φ(s) − γφ(s'))^T
    A ← A + ∆A
    µ ← µ + ∆b − (∆A)θ
    for i from 1 to m do
        j ← argmax(|µ_j|)
        θ_j ← θ_j + αµ_j
        µ ← µ − αµ_j A e_j
    end for
end repeat
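The loop above can be sketched in Python as follows. This is an illustrative reading of the pseudocode, not the authors' implementation: it consumes a pre-recorded list of `(phi, r, phi_next)` transitions instead of acting in an environment, initializes the parameters to zero, uses dense lists rather than sparse structures, and the name `ilstd` and the example data are hypothetical.

```python
def ilstd(transitions, n, gamma, alpha, m):
    """One pass of iLSTD-style updates over (phi, r, phi_next) samples."""
    theta = [0.0] * n  # "arbitrary" initialization; zeros are an assumption
    A = [[0.0] * n for _ in range(n)]
    mu = [0.0] * n
    for phi, r, phi_next in transitions:
        diff = [phi[j] - gamma * phi_next[j] for j in range(n)]
        diff_theta = sum(diff[j] * theta[j] for j in range(n))
        for i in range(n):
            for j in range(n):
                A[i][j] += phi[i] * diff[j]             # A <- A + dA
            mu[i] += phi[i] * r - phi[i] * diff_theta   # mu <- mu + db - (dA) theta
        for _ in range(m):
            j = max(range(n), key=lambda k: abs(mu[k]))  # largest |mu_j|
            step = alpha * mu[j]
            theta[j] += step
            for i in range(n):
                mu[i] -= step * A[i][j]  # A e_j is column j of A
    return theta, A, mu


# Hypothetical data: the same two-step episode replayed 20 times.
episode = [
    ([1.0, 0.0], -3.0, [0.0, 1.0]),
    ([0.0, 1.0], -2.0, [0.0, 0.0]),  # terminal successor: zero features
]
transitions = episode * 20
theta, A, mu = ilstd(transitions, n=2, gamma=1.0, alpha=0.01, m=1)
```

Because each parameter change is mirrored in `mu`, the vector `mu` always equals the accumulated `b` minus `A` times `theta`, even though `b` itself is never stored. The dense inner loops here do not reproduce the sparse per-step cost stated on the poster.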

$$\mu_t(\theta_{t+1}) = \mu_t(\theta_t) - A_t(\Delta\theta_t)$$

$$\mu_t(\theta_t) = \mu_{t-1}(\theta_t) + \Delta b_t - (\Delta A_t)\theta_t$$

$$V(s_t) = \phi(s_t)^T \theta$$

$$\theta_{t+1} = \theta_t + \alpha_t \phi(s_t)\,\delta_t(V)$$
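With a linear value estimate, a single TD update moves the parameters along the current feature vector, scaled by the step size and the TD error. A minimal sketch of one such step, assuming features are plain Python lists; the function name `td0_update` and the numbers in the example are hypothetical.

```python
def td0_update(theta, phi, r, phi_next, gamma, alpha):
    """Return the updated parameters and the TD error for one transition."""
    v = sum(p * t for p, t in zip(phi, theta))            # V(s_t)
    v_next = sum(p * t for p, t in zip(phi_next, theta))  # V(s_{t+1})
    delta = r + gamma * v_next - v                        # TD error
    new_theta = [t + alpha * delta * p for t, p in zip(theta, phi)]
    return new_theta, delta


# Hypothetical single transition with reward -3 and zero initial parameters.
new_theta, delta = td0_update(
    theta=[0.0, 0.0], phi=[1.0, 0.0], r=-3.0, phi_next=[0.0, 1.0],
    gamma=1.0, alpha=0.1,
)
```

Each sample is used once and then discarded, which keeps the per-step cost low but makes less use of the data than methods that accumulate statistics over all samples.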

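$$\mu_t(\theta) = \sum_{i=1}^{t} \phi_i\,\delta_i(V_\theta) = \underbrace{\sum_{i=1}^{t} \phi_i r_{i+1}}_{b_t} - \underbrace{\sum_{i=1}^{t} \phi_i(\phi_i - \gamma\phi_{i+1})^T}_{A_t}\,\theta$$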
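The sum of feature-weighted TD errors can be split into a reward term and a matrix term applied to the parameters. The following sketch checks that identity numerically on a few made-up transitions. The function names, feature values, and parameter vector are assumptions chosen for illustration only.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mu_from_td_errors(transitions, theta, gamma):
    """Sum of phi_i * delta_i, computing each TD error directly."""
    mu = [0.0] * len(theta)
    for phi, r, phi_next in transitions:
        delta = r + gamma * dot(phi_next, theta) - dot(phi, theta)
        for i in range(len(theta)):
            mu[i] += phi[i] * delta
    return mu


def mu_from_A_b(transitions, theta, gamma):
    """Same quantity computed as b - A theta from accumulated sums."""
    n = len(theta)
    b = [0.0] * n
    A = [[0.0] * n for _ in range(n)]
    for phi, r, phi_next in transitions:
        for i in range(n):
            b[i] += phi[i] * r
            for j in range(n):
                A[i][j] += phi[i] * (phi[j] - gamma * phi_next[j])
    return [b[i] - dot(A[i], theta) for i in range(n)]


# Hypothetical transitions with interpolated two-component features.
transitions = [
    ([1.0, 0.0], -3.0, [0.75, 0.25]),
    ([0.75, 0.25], -3.0, [0.5, 0.5]),
    ([0.5, 0.5], -2.0, [0.0, 0.0]),  # terminal successor: zero features
]
theta = [2.0, -1.0]
direct = mu_from_td_errors(transitions, theta, gamma=0.9)
split = mu_from_A_b(transitions, theta, gamma=0.9)
```

The two results agree up to floating-point rounding for any choice of `theta`, which is what allows the sums to be accumulated once and reused as the parameters change.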