
Sources (2013, 2017):

https://newatlas.com/bae-smartskin/33458/
https://www.nasa.gov/ames/feature/go-go-green-wing-mighty-morphing-materials-in-aircraft-design

http://www.biologyreference.com/Mo-Nu/Neuron.html


[Figure: a single artificial neuron with inputs I1, I2, bias B, weights w1, w2, w3, and output O]

f(x_i, w_i) = Φ(b + Σ_i w_i·x_i)

Φ(x) = 1 if x ≥ 0.5, 0 if x < 0.5

Truth table for P ∧ Q:

P | Q | P ∧ Q
T | T |   T
T | F |   F
F | T |   F
F | F |   F
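A single neuron with the threshold activation above is enough to realize P ∧ Q. The sketch below is a minimal illustration; the weights w = (0.3, 0.3) and bias b = 0 are assumed values chosen to satisfy the truth table, not taken from the slides.

# Minimal perceptron sketch for logical AND (P ∧ Q).
# Weights and bias are illustrative assumptions, not from the slides.
def phi(x):
    # Threshold activation from the slide: 1 if x >= 0.5, else 0.
    return 1 if x >= 0.5 else 0

def perceptron(inputs, weights, bias):
    return phi(bias + sum(w * x for w, x in zip(weights, inputs)))

w, b = [0.3, 0.3], 0.0
for P in (1, 0):
    for Q in (1, 0):
        print(P, Q, perceptron([P, Q], w, b))   # reproduces the P ∧ Q column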

[Figure: the agent–environment interaction loop (Sutton and Barto). At time t the agent observes state S_t and reward R_t, selects action A_t; the environment responds with state S_{t+1} and reward R_{t+1}.]

A state S_t is Markov if and only if:

P[S_{t+1} | S_t] = P[S_{t+1} | S_1, …, S_t]

• The return G_t is the total discounted reward, with discount factor γ ∈ [0, 1]:

G_t = R_{t+1} + γ R_{t+2} + γ² R_{t+3} + … = Σ_{k≥0} γ^k R_{t+k+1}
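As a quick numerical check of the return formula, here is a tiny sketch; the reward sequence and γ are made-up example values.

# Discounted return G_t = sum_k gamma^k * R_{t+k+1}.
# The rewards and gamma below are example values, not from the slides.
def discounted_return(rewards, gamma):
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

print(discounted_return([1, 0, 2], gamma=0.9))   # 1 + 0.9*0 + 0.81*2 = 2.62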

Bellman expectation equations for the state-value and action-value functions under a policy π:

v_π(s) = 𝔼_π[G_t | S_t = s] = 𝔼_π[R_{t+1} + γ v_π(S_{t+1}) | S_t = s]

q_π(s, a) = 𝔼_π[R_{t+1} + γ q_π(S_{t+1}, A_{t+1}) | S_t = s, A_t = a] = ℛ_s^a + γ Σ_{s'∈S} 𝒫_{ss'}^a v_π(s')

[Backup diagram: from state s with value v_π(s), each action a leads to a pair (s, a) with value q_π(s, a); after reward r the environment transitions to a successor state s' with value v_π(s').]

[Example backup diagram: successor values 10, 5, and −3 reached with probabilities .5, .25, .25; the resulting value is 5.5.]

v(s) = 10 × .5 + 5 × .25 + (−3) × .25 = 5.5

[Second example backup diagram, with rewards R = 5 and R = 2 and transition probabilities .5, .4, .5, .1; the resulting value is 4.4.]

v(s) = .5 × 5 + .5 × (.4 × 2 + .5 × 5 + .1 × 4.4) = 4.4
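The same one-step expected backup can be written as a two-line helper; the (probability, value) pairs below are the numbers from the first example above.

# One-step expected backup: v(s) = sum_i p_i * (value of branch i).
# The (probability, value) pairs are taken from the first slide example.
def expected_backup(branches):
    return sum(p * v for p, v in branches)

print(expected_backup([(0.5, 10), (0.25, 5), (0.25, -3)]))   # 5.5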

[Backup diagrams for the Bellman optimality equations: v_*(s) takes a max over actions a from state s; q_*(s, a) looks ahead over reward r and successor states s' (with transition probabilities p), then maximizes over the next action a'.]

[Example: from state s, three actions with rewards R = −1, R = 2, R = 3 lead to states with values 10, 5, and −3.]

v_*(s) = max{−1 + 10, 2 + 5, 3 − 3} = 9

A policy π is better than π′ if v_π(s) ≥ v_π′(s) for all s ∈ S.

The optimal value function: v_*(s) ≡ max_π v_π(s) for all s ∈ S.

Small gridworld (terminal states T in the top-left and bottom-right corners):

 T   1   2   3
 4   5   6   7
 8   9  10  11
12  13  14   T

r_t = −1 on every transition.

Random policy π: π(→|·) = π(↑|·) = π(↓|·) = π(←|·) = .25

k = 0 (initialization): v(s) = 0.00 for every state 1–14; terminal states have value 0. R_t = −1 per step.

k = 0 → k = 1, state 1:

v_{k=1}(1) = .25 × (−1 + v_{k=0}(2))    (→)
           + .25 × (−1 + v_{k=0}(1))    (↑)
           + .25 × (−1 + v_{k=0}(5))    (↓)
           + .25 × (−1 + v_{k=0}(T))    (←)
           = −.25 − .25 − .25 − .25 = −1

Values after the first sweep (k = 0, all 0.00 → k = 1):

 0.00 -1.00 -1.00 -1.00
-1.00 -1.00 -1.00 -1.00
-1.00 -1.00 -1.00 -1.00
-1.00 -1.00 -1.00  0.00

k = 0 → k = 1, state 7:

v_{k=1}(7) = .25 × (−1 + v_{k=0}(7))    (→)
           + .25 × (−1 + v_{k=0}(3))    (↑)
           + .25 × (−1 + v_{k=0}(11))   (↓)
           + .25 × (−1 + v_{k=0}(6))    (←)
           = −.25 − .25 − .25 − .25 = −1

k = 1 → k = 2, state 1:

v_{k=2}(1) = .25 × (−1 + v_{k=1}(2))    (→)
           + .25 × (−1 + v_{k=1}(1))    (↑)
           + .25 × (−1 + v_{k=1}(5))    (↓)
           + .25 × (−1 + v_{k=1}(T))    (←)
           = .25 × (−2 − 2 − 2 − 1) = −1.75

Values after the second sweep (k = 1 → k = 2):

 0.00 -1.75 -2.00 -2.00
-1.75 -2.00 -2.00 -2.00
-2.00 -2.00 -2.00 -1.75
-2.00 -2.00 -1.75  0.00

k = 1 → k = 2, state 7:

v_{k=2}(7) = .25 × (−1 + v_{k=1}(7))    (→)
           + .25 × (−1 + v_{k=1}(3))    (↑)
           + .25 × (−1 + v_{k=1}(11))   (↓)
           + .25 × (−1 + v_{k=1}(6))    (←)
           = .25 × (−2 − 2 − 2 − 2) = −2

k = 2 → k = 3, state 1:

v_{k=3}(1) = .25 × (−1 + v_{k=2}(2))    (→)
           + .25 × (−1 + v_{k=2}(1))    (↑)
           + .25 × (−1 + v_{k=2}(5))    (↓)
           + .25 × (−1 + v_{k=2}(T))    (←)
           = .25 × (−3 − 2.75 − 3 − 1) ≈ −2.43

Values after the third sweep (k = 2 → k = 3):

 0.00 -2.43 -2.93 -3.00
-2.43 -2.93 -3.00 -2.93
-2.93 -3.00 -2.93 -2.43
-3.00 -2.93 -2.43  0.00

k = 2 → k = 3, state 7:

v_{k=3}(7) = .25 × (−1 + v_{k=2}(7))    (→)
           + .25 × (−1 + v_{k=2}(3))    (↑)
           + .25 × (−1 + v_{k=2}(11))   (↓)
           + .25 × (−1 + v_{k=2}(6))    (←)
           = .25 × (−3 − 3 − 2.75 − 3) ≈ −2.93
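These sweeps are easy to reproduce in a few lines of Python. The sketch below is a minimal implementation of iterative policy evaluation for the same 4×4 gridworld (uniform random policy, reward −1 per step, γ = 1, terminals in two corners); it matches the grids above up to rounding. Variable and function names are mine, not from the slides.

# Iterative policy evaluation on the 4x4 gridworld from the slides.
# Uniform random policy, reward -1 per step, gamma = 1, terminals at two corners.
import numpy as np

N = 4
TERMINALS = {(0, 0), (N - 1, N - 1)}
ACTIONS = [(0, 1), (-1, 0), (1, 0), (0, -1)]      # right, up, down, left

def step(state, action):
    """Deterministic move; bumping into a wall leaves the state unchanged."""
    r, c = state[0] + action[0], state[1] + action[1]
    return (r, c) if 0 <= r < N and 0 <= c < N else state

V = np.zeros((N, N))
for k in range(1, 4):                              # the k = 1, 2, 3 sweeps shown above
    new_V = np.zeros_like(V)
    for r in range(N):
        for c in range(N):
            if (r, c) in TERMINALS:
                continue
            # full backup under the uniform random policy (probability .25 per action)
            new_V[r, c] = sum(0.25 * (-1.0 + V[step((r, c), a)]) for a in ACTIONS)
    V = new_V
    print(f"k = {k}:\n{np.round(V, 2)}\n")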

Policy iteration alternates two steps until convergence to π*, V*:

Evaluation: π → v_π
Improvement: π → greedy(V)
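The improvement step is just an argmax over one-step lookahead values. Below is a hedged, standalone sketch; the states, actions, and step arguments stand in for an environment model like the gridworld above, and the fixed reward and γ are assumptions.

# Greedy policy improvement: for each state, pick the action whose one-step
# lookahead value under the current V is largest. The environment model
# (states, actions, step) and the constant reward are illustrative assumptions.
def greedy_policy(V, states, actions, step, reward=-1.0, gamma=1.0):
    return {s: max(actions, key=lambda a: reward + gamma * V[step(s, a)])
            for s in states}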

https://github.com/rlcode/reinforcement-learning

• Suitable for medium-sized problems of just a few million states.

[Diagram: TD(1), TD(2), … backups]

[Q-learning backup: from (s, a), observe reward r and next state s', then take the max over next actions.]

Q(S, A) ← Q(S, A) + α (R + γ max_{a'} Q(S', a') − Q(S, A))
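A minimal tabular version of this update, for reference. The environment interface (reset(), step(), actions) and the ε-greedy hyper-parameters are illustrative assumptions, not from the slides.

# Tabular Q-learning: Q(S,A) <- Q(S,A) + alpha * (R + gamma * max_a' Q(S',a') - Q(S,A)).
# The 'env' interface and all hyper-parameters below are illustrative assumptions.
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    Q = defaultdict(float)                         # Q[(state, action)] -> value
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            if random.random() < epsilon:          # explore
                action = random.choice(env.actions)
            else:                                  # exploit: argmax_a Q(state, a)
                action = max(env.actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            target = reward + gamma * max(Q[(next_state, a)] for a in env.actions)
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = next_state
    return Q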

https://github.com/dbatalov/reinforcement-learning

Rocket Lander Demo | Grid World Demo

https://github.com/rlcode/reinforcement-learning

Check this link for a proof of the theorem:
https://en.wikipedia.org/wiki/Universal_approximation_theorem

David Silver

https://web.stanford.edu/class/psych209/Readings/MnihEtAlHassibis15NatureControlDeepRL.pdf


• The DQN agent achieves >75% of the human score in 29 out of 49 games.

• The DQN agent beats the human score (>100%) in 22 games.

Score% = (Agent Score − Random-play Score) / (Human Score − Random-play Score) × 100
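For reference, the normalization as a one-line function; the scores plugged in below are placeholder numbers, not results from the paper.

# Human-normalized score: 100 * (agent - random_play) / (human - random_play).
# The example scores are placeholders, not actual Atari results.
def normalized_score(agent, human, random_play):
    return 100.0 * (agent - random_play) / (human - random_play)

print(normalized_score(agent=4000, human=5000, random_play=200))   # ≈ 79.2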

https://github.com/apache/incubator-mxnet/tree/master/example/reinforcement-learning/dqn

import mxnet as mx

def dqn_sym_nature(action_num, data=None, name='dqn'):
    """Structure of the Deep Q Network in the Nature 2015 paper
    'Human-level control through deep reinforcement learning'
    (http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html)"""
    if data is None:
        net = mx.symbol.Variable('data')
    else:
        net = data
    # three convolutional layers, each followed by a ReLU
    net = mx.symbol.Convolution(data=net, name='conv1', kernel=(8, 8), stride=(4, 4), num_filter=32)
    net = mx.symbol.Activation(data=net, name='relu1', act_type="relu")
    net = mx.symbol.Convolution(data=net, name='conv2', kernel=(4, 4), stride=(2, 2), num_filter=64)
    net = mx.symbol.Activation(data=net, name='relu2', act_type="relu")
    net = mx.symbol.Convolution(data=net, name='conv3', kernel=(3, 3), stride=(1, 1), num_filter=64)
    net = mx.symbol.Activation(data=net, name='relu3', act_type="relu")
    # fully connected head: 512 hidden units, then one output per action
    net = mx.symbol.Flatten(data=net)
    net = mx.symbol.FullyConnected(data=net, name='fc4', num_hidden=512)
    net = mx.symbol.Activation(data=net, name='relu4', act_type="relu")
    net = mx.symbol.FullyConnected(data=net, name='fc5', num_hidden=action_num)
    net = mx.symbol.Custom(data=net, name=name, op_type='DQNOutput')
    return net

from mxnet import gluon

num_action = 4   # number of discrete actions (environment-specific example value)

DQN = gluon.nn.Sequential()
with DQN.name_scope():
    # first layer
    DQN.add(gluon.nn.Conv2D(channels=32, kernel_size=8, strides=4, padding=0))
    DQN.add(gluon.nn.BatchNorm(axis=1, momentum=0.1, center=True))
    DQN.add(gluon.nn.Activation('relu'))
    # second layer
    DQN.add(gluon.nn.Conv2D(channels=64, kernel_size=4, strides=2))
    DQN.add(gluon.nn.BatchNorm(axis=1, momentum=0.1, center=True))
    DQN.add(gluon.nn.Activation('relu'))
    # third layer
    DQN.add(gluon.nn.Conv2D(channels=64, kernel_size=3, strides=1))
    DQN.add(gluon.nn.BatchNorm(axis=1, momentum=0.1, center=True))
    DQN.add(gluon.nn.Activation('relu'))
    DQN.add(gluon.nn.Flatten())
    # fourth layer
    DQN.add(gluon.nn.Dense(512, activation='relu'))
    # fifth layer: one Q-value per action (no activation, so Q-values may be negative)
    DQN.add(gluon.nn.Dense(num_action))

The fastest, most powerful GPU instances in the cloud:

• Up to eight NVIDIA Tesla V100 GPUs

• 1 PetaFLOP of computational performance – 14x better than P2

• 300 GB/s GPU-to-GPU communication (NVLink) – 9x better than P2

• 16 GB GPU memory with 900 GB/sec peak GPU memory bandwidth

• Get started quickly with easy-to-launch tutorials

• Hassle-free setup and configuration

• Pay only for what you use – no additional charge for the AMI

• Accelerate your model training and deployment

• Support for popular deep learning frameworks

End-to-end machine learning platform: zero setup, flexible model training, pay by the second ($).

Build, train, and deploy machine learning models at scale.

Lots of companies doing machine learning are unable to unlock their business potential because they lack ML expertise.

Amazon ML Lab provides the missing ML expertise – brainstorming, modeling, and teaching – by letting you leverage Amazon experts with decades of ML experience with technologies like Amazon Echo, Amazon Alexa, Prime Air, and Amazon Go.

© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.