Artificial Neural Networks
Introduction to Computational Neuroscience, Ardi Tampuu, 17.10.2016
Artificial neural network
NB! Inspired by biology, not based on biology!
Applications
Automatic speech recognition
Automatic image classification and tagging
Natural language modeling
Learning objectives
How do artificial neural networks work?
What types of artificial neural networks are used for what tasks?
What are the state-of-the-art results achieved with artificial neural networks?
How DO neural networks work?
Part 1
Frank Rosenblatt (1957)
Added a learning rule to the McCulloch-Pitts neuron.

Perceptron
Prediction:
y = 1 if x1·w1 + x2·w2 + b > 0, otherwise 0
[Diagram: inputs x1, x2 and a constant 1 feed a summation unit Σ through weights w1, w2 and bias b, producing the output y]
Perceptron
Learning:
wi ← wi + (t − y)·xi
b ← b + (t − y)
If prediction == target, do nothing
If prediction < target, increase weights of positive inputs, decrease weights of negative inputs
If prediction > target, vice versa
Let's try it out! Learning OR:

X | Y | X OR Y
0 | 0 | 0
0 | 1 | 1
1 | 0 | 1
1 | 1 | 1

Weights A (for X), B (for Y) and bias C. Initialize A, B, C = 0, so the output is 0.
Go over the examples in the table:
1. t = y, so no changes
2. y = 0, t = 1 → A = 0, B = 1, C = 1
3. y = t = 1
4. y = t = 1
5. y = 1, t = 0 → A = 0, B = 1, C = 0
6. y = t = 1
7. y = 0, t = 1 → A = 1, B = 1, C = 0

Learning:
wi ← wi + (t − y)·xi
b ← b + (t − y)
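The worked example above can be sketched in a few lines of Python. The OR truth table and the learning rule come from the slides; the function and variable names are illustrative.

```python
# Perceptron learning rule: w_i <- w_i + (t - y) * x_i, b <- b + (t - y)
def predict(w, b, x):
    # y = 1 if x1*w1 + x2*w2 + b > 0, otherwise 0
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# OR truth table from the slide: (inputs, target)
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

w, b = [0, 0], 0            # initialize weights and bias to 0
for epoch in range(10):     # a few passes over the examples
    for x, t in data:
        err = t - predict(w, b, x)   # 0 when prediction == target
        w = [wi + err * xi for wi, xi in zip(w, x)]
        b += err

print(w, b)                                  # [1, 1] 0
print([predict(w, b, x) for x, _ in data])   # [0, 1, 1, 1] — the X OR Y column
```

After a few passes the weights stop changing, exactly as in the hand-worked table above.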
Perceptron limitations
Perceptron learning algorithm converges only for linearly separable problems (because it only has one layer)
Minsky, Papert, “Perceptrons” (1969)
Multi-layer perceptrons
Add non-linear activation functions
Add hidden layer(s)
Universal approximation theorem (important!): any continuous function can be approximated by a finite feed-forward neural network with one hidden layer.
Forward propagation

y1 = σ(b11 + x1·w11 + x2·w21)
y2 = σ(b12 + x1·w12 + x2·w22)
z = b21 + y1·w21 + y2·w22 (no nonlinearity on the output)

[Diagram: inputs x1, x2 and a bias unit feed two hidden Σ units producing y1 and y2, which together with a bias feed the output unit z; note that the slide reuses the labels w21, w22 for the second-layer weights]
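The forward pass above translates directly into Python. Since the slide reuses the names w21, w22 in both layers, the second-layer weights are renamed v1, v2 here; the weight values are arbitrary illustrations.

```python
import math

def sigmoid(a):
    # the logistic sigmoid used as the hidden-layer nonlinearity
    return 1.0 / (1.0 + math.exp(-a))

def forward(x1, x2, p):
    # hidden layer: two sigmoid units
    y1 = sigmoid(p["b11"] + x1 * p["w11"] + x2 * p["w21"])
    y2 = sigmoid(p["b12"] + x1 * p["w12"] + x2 * p["w22"])
    # output layer: linear, no nonlinearity
    z = p["b21"] + y1 * p["v1"] + y2 * p["v2"]
    return z

# arbitrary example weights
params = {"b11": 0.1, "b12": -0.2, "w11": 0.5, "w12": -0.4,
          "w21": 0.3, "w22": 0.8, "b21": 0.0, "v1": 1.0, "v2": -1.0}
print(forward(1.0, 0.0, params))
```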
Loss function
• Function approximation: L = ½·(t − z)²
• Binary classification: L = −log(z) if t = 1, −log(1 − z) if t = 0
• Multi-class classification: L = −Σj tj·log(zj)
[Plot: loss curves as a function of the prediction z, e.g. −log(z), ½·(10 − z)², −log(1 − z)]
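The three losses above in Python (function names are illustrative; z is the network prediction, t the target):

```python
import math

def squared_error(t, z):
    # function approximation: L = 1/2 (t - z)^2
    return 0.5 * (t - z) ** 2

def binary_cross_entropy(t, z):
    # binary classification: t is 0 or 1, z a probability in (0, 1)
    return -math.log(z) if t == 1 else -math.log(1.0 - z)

def cross_entropy(t, z):
    # multi-class: t is a one-hot target vector, z a probability distribution
    return -sum(tj * math.log(zj) for tj, zj in zip(t, z))

print(squared_error(10, 8))                          # 2.0
print(binary_cross_entropy(1, 0.5))                  # ≈ 0.693
print(cross_entropy([0, 1, 0], [0.25, 0.5, 0.25]))   # ≈ 0.693
```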
Backpropagation

Forward pass:
y1 = σ(b11 + x1·w11 + x2·w21)
y2 = σ(b12 + x1·w12 + x2·w22)
z = b21 + y1·w21 + y2·w22

Backward pass:
dL/dz = ez = z − t
ey1 = ez·w21·σ'(b11 + x1·w11 + x2·w21)
ey2 = ez·w22·σ'(b12 + x1·w12 + x2·w22)

Output layer updates: Δb21 = ez, Δw21 = ez·y1, Δw22 = ez·y2
Hidden layer updates:
Δb11 = ey1, Δw11 = ey1·x1, Δw21 = ey1·x2
Δb12 = ey2, Δw12 = ey2·x1, Δw22 = ey2·x2

Derivative of sigmoid:
σ'(x) = σ(x)·(1 − σ(x))
Gradient Descent
• Gradient descent finds weight values that result in a small loss.
• Gradient descent is guaranteed to find only a local minimum.
• But there are plenty of them, and they are often good enough!
Walking around in the energy (loss) landscape using only local gradient information.
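Putting backpropagation and gradient descent together for the 2-2-1 network from the slides gives one training step like the sketch below. The second-layer weights are renamed v1, v2 (the slide reuses w21/w22 in both layers), and the learning rate and random initialization are illustrative assumptions.

```python
import math, random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def train_step(p, x1, x2, t, lr=0.1):
    # forward pass (same equations as the forward-propagation slide)
    a1 = p["b11"] + x1 * p["w11"] + x2 * p["w21"]
    a2 = p["b12"] + x1 * p["w12"] + x2 * p["w22"]
    y1, y2 = sigmoid(a1), sigmoid(a2)
    z = p["b21"] + y1 * p["v1"] + y2 * p["v2"]
    # backward pass: e_z = dL/dz for L = 1/2 (t - z)^2
    ez = z - t
    ey1 = ez * p["v1"] * y1 * (1.0 - y1)   # sigma'(a) = sigma(a) * (1 - sigma(a))
    ey2 = ez * p["v2"] * y2 * (1.0 - y2)
    # gradient descent: step each weight against its gradient
    p["b21"] -= lr * ez
    p["v1"] -= lr * ez * y1
    p["v2"] -= lr * ez * y2
    p["b11"] -= lr * ey1; p["w11"] -= lr * ey1 * x1; p["w21"] -= lr * ey1 * x2
    p["b12"] -= lr * ey2; p["w12"] -= lr * ey2 * x1; p["w22"] -= lr * ey2 * x2
    return 0.5 * (t - z) ** 2              # loss before the update

random.seed(0)
p = {k: random.uniform(-1, 1) for k in
     ["b11", "b12", "w11", "w12", "w21", "w22", "b21", "v1", "v2"]}
losses = [train_step(p, 1.0, 0.0, 1.0) for _ in range(100)]
print(losses[0], "->", losses[-1])   # the loss shrinks step by step
```

Repeated on one example, each step moves the weights downhill, so the loss decreases toward a (local) minimum.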
Things to remember...
The perceptron was the first artificial neuron model, invented in the late 1950s.
A perceptron can learn only linearly separable classification problems.
Feed-forward networks with non-linear activation functions and hidden layers overcome the limitations of the perceptron.
Multi-layer artificial neural networks are trained using backpropagation and gradient descent.
Neural network taxonomy
Part 2
Simple feed-forward networks
• Architecture:
– Each node is connected to all nodes of the previous layer.
– Information moves in one direction only.
• Used for:
– Function approximation
– Simple classification problems
– Not too many inputs (~100)
[Diagram: INPUT LAYER → HIDDEN LAYER → OUTPUT LAYER]
Convolutional neural networks
Hubel & Wiesel (1959)
• Performed experiments with anesthetized cat.
• Discovered topographical mapping, sensitivity to orientation and hierarchical processing.
Simple cells – convolution
Complex cells – pooling
Convolution in neural nets
Recommending music on Spotify
Convolutional neural networks
• Architecture:
– Convolutional layer: local connections + weight sharing.
– Pooling layer: translation invariance.
• Used for:
– images,
– any other data with a locality property, e.g. adjacent characters make up a word.
[Diagram: a 1-D INPUT LAYER convolved with shared weights (1, 0, −1) produces the CONVOLUTIONAL LAYER, followed by a max POOLING LAYER]
Convolution
Convolution searches for the same pattern over the entire image and calculates a score for each match.
1 0 1
0 1 0
1 0 1
Convolution
Now try this:
1 -1 1
-1 1 -1
1 -1 1

And this:
1 1 1
1 1 -1
1 -1 -1
What do these filters do?

1 1 1
1 1 1
1 1 1

0 1 0
1 -4 1
0 1 0
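A hypothetical few-line 2-D convolution shows what the two filters above do: the all-ones filter sums (averages/blurs) each 3×3 neighbourhood, while the centre −4 filter responds only to changes, so a flat image gives zero everywhere.

```python
# Minimal 2-D convolution (no padding), applied to a tiny constant image
def conv2d(img, kernel):
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(kernel[a][b] * img[i + a][j + b]
                 for a in range(kh) for b in range(kw))
             for j in range(len(img[0]) - kw + 1)]
            for i in range(len(img) - kh + 1)]

blur = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]    # averaging / blur filter
edge = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]   # edge-detection filter

flat = [[5, 5, 5, 5]] * 4                   # constant image: no edges
print(conv2d(flat, edge))   # [[0, 0], [0, 0]] — flat regions give zero response
print(conv2d(flat, blur))   # [[45, 45], [45, 45]] — nine pixels summed
```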
Pooling
Pooling achieves translation invariance by taking maximum of adjacent convolution scores.
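The convolution-then-pooling pipeline can be sketched in 1-D; the weights (1, 0, −1) come from the earlier diagram, while the input values are illustrative.

```python
# Convolution: slide the shared weights over the input, one score per position
def conv1d(xs, w):
    k = len(w)
    return [sum(w[i] * xs[j + i] for i in range(k))
            for j in range(len(xs) - k + 1)]

# Max-pooling: keep only the largest score in each window
def max_pool(xs, size=2):
    return [max(xs[i:i + size]) for i in range(0, len(xs) - size + 1, size)]

scores = conv1d([0, 1, 2, -1, 0, 1], [1, 0, -1])
print(scores)            # [-2, 2, 2, -2]
print(max_pool(scores))  # [2, 2] — small shifts of the pattern give the same output
```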
Example: handwritten digit recognition
Y. LeCun et al., “Handwritten digit recognition: Applications of neural net chips and automatic learning”, 1989.
LeCun et al. (1989)
Recurrent neural networks
• Architecture:
– Hidden layer nodes are connected to each other.
– Allows retaining internal state and memory.
• Used for:
– speech recognition,
– handwriting recognition,
– any "time" series: brain activity, DNA reads.
[Diagram: INPUT LAYER → RECURRENT HIDDEN LAYER → OUTPUT LAYER]
Backpropagation through time
[Diagram: the network unrolled over time — inputs I1…I4 feed hidden states H1…H4 (each Ht also receives Ht−1, starting from H0), producing outputs O1…O4 compared against targets T1…T4; the same weights W are shared across all time steps]
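The unrolling in the diagram can be sketched as a minimal one-unit recurrent forward pass: the same three weights are reused at every time step (all names and values below are illustrative).

```python
import math

# Hidden state update: h_t = tanh(w_in * x_t + w_rec * h_{t-1} + b),
# with the SAME weights shared across all time steps
def rnn_forward(inputs, w_in, w_rec, b, h0=0.0):
    h, states = h0, []
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h + b)
        states.append(h)
    return states

states = rnn_forward([1.0, 0.0, -1.0, 0.5], w_in=0.8, w_rec=0.5, b=0.0)
print(states)  # each state depends on the whole input history so far
```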
Auto-encoders
• Architecture:
– Input and output are the same!
– Hidden layer functions as a "bottleneck".
– Network is trained to reconstruct the input from hidden layer activations.
• Used for:
– image search
– dimensionality reduction
[Diagram: INPUT LAYER → HIDDEN LAYER (bottleneck) → OUTPUT LAYER = INPUT LAYER]
We didn’t talk about...
• Restricted Boltzmann Machines (RBMs)
• Long Short-Term Memory networks (LSTMs)
• Echo State Networks / Liquid State Machines
• Hopfield Network
• Self-organizing maps (SOMs)
• Radial basis function networks (RBFs)
• But we covered the most important ones!
Things to remember...
Simple feed-forward networks are usually used for function approximation, e.g. predicting energy consumption.
Convolutional neural networks are mostly used for images.
Recurrent neural networks are used for speech recognition and language modeling.
Autoencoders are used for dimensionality reduction.
State-of-the-art results
Part 3
Deep Learning
Artificial neural networks and backpropagation have been around since the 1980s. What's all this fuss about "deep learning"?
• What has changed:
– we have much bigger datasets,
– we have much faster computers (think GPUs),
– we have learned a few tricks for training networks with very, very many (150+) layers.
GoogLeNet
ImageNet 2014 winner – 27 layers, 5M weights.
Szegedy et al., “Going Deeper with Convolutions” (2014).
ImageNet classification
Current best: 4.9% error (human error: 5.1%).
Try it yourself: http://www.clarifai.com/#demo
Wu et al., "Deep Image: Scaling up Image Recognition" (2015). Ioffe, Szegedy, "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift" (2015).
Automatic image descriptions
Karpathy, Fei-Fei, “Deep Visual-Semantic Alignments for Generating Image Descriptions” (2014)
Reinforcement learning
Atari games: Pong, Breakout, Space Invaders, Seaquest, Beam Rider, Enduro.
[Diagram: the agent receives the screen and the score from the game and sends back actions]
Mnih et al., "Human-level control through deep reinforcement learning" (2015). Code: https://github.com/tambetm/simple_dqn
Multiagent reinforcement learning
Tampuu, Matiisen et al., "Multiagent Cooperation and Competition with Deep Reinforcement Learning" (2015)
Videos on YouTube about competitive mode and collaborative mode
Program execution
Curriculum learning – learning simple expressions first and more complex ones later – proved to be essential.
Zaremba, Sutskever, “Learning to Execute” (2015).
The future of AI?
● Neural Turing Machines
● Memory Networks – writing to and reading from an external memory (effectively unlimited memory)
For example: Hybrid computing using a neural network with dynamic external memory (Graves, Hassabis et al. 2016)
Thank you!