
Artificial Neural Networks

Introduction to Computational Neuroscience
Ardi Tampuu
17.10.2016

Artificial neural network

NB! Inspired by biology, not based on biology!

Applications

Automatic speech recognition

Automatic image classification and tagging

Natural language modeling

Learning objectives

How do artificial neural networks work?

What types of artificial neural networks are used for what tasks?

What are the state-of-the-art results achieved with artificial neural networks?

How DO neural networks work?

Part 1

Frank Rosenblatt (1957)

Added learning rule to McCulloch-Pitts neuron.

Perceptron

Prediction:

y = 1 if x1·w1 + x2·w2 + b > 0, otherwise y = 0

[Diagram: inputs x1, x2 and a constant 1, multiplied by weights w1, w2 and bias b, are summed in a single unit Σ whose output is y.]

Perceptron

Prediction:

y = 1 if x1·w1 + x2·w2 + b > 0, otherwise y = 0

Learning:

wi ← wi + (t − y)·xi
b ← b + (t − y)

[Diagram: the same perceptron as above, with the learning rule applied to its weights.]

If prediction == target, do nothing

If prediction < target, increase weights of positive inputs, decrease weights of negative inputs

If prediction > target, vice versa

Let’s try it out!

X   Y   X OR Y
0   0   0
0   1   1
1   0   1
1   1   1

Parameters A, B, C (two weights and the bias). Initialize A, B, C = 0, so the output is 0.

Go over the examples in the table:
1. t = y, so no changes
2. y = 0, t = 1 → A = 0, B = 1, C = 1
3. y = t = 1
4. y = t = 1
5. y = 1, t = 0 → A = 0, B = 1, C = 0
6. y = t = 1
7. y = 0, t = 1 → A = 1, B = 1, C = 0

Learning:

wi ← wi + (t − y)·xi
b ← b + (t − y)
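For concreteness, here is a minimal NumPy sketch of this learning rule applied to the OR table above. The names (w for the two input weights, b for the bias) are illustrative, and the intermediate values can differ slightly from the worked example depending on how the updates are applied.

```python
# A minimal sketch of perceptron learning on the OR problem.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # inputs
T = np.array([0, 1, 1, 1])                       # targets: x1 OR x2

w = np.zeros(2)   # input weights
b = 0.0           # bias

for epoch in range(10):                # a few passes over the table
    errors = 0
    for x, t in zip(X, T):
        y = 1 if x @ w + b > 0 else 0  # prediction
        w += (t - y) * x               # w_i <- w_i + (t - y) * x_i
        b += (t - y)                   # b   <- b   + (t - y)
        errors += int(y != t)
    if errors == 0:                    # every example classified correctly
        break

print("weights:", w, "bias:", b)
```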

Perceptron limitations

The perceptron learning algorithm converges only for linearly separable problems (a single-layer perceptron can only represent a linear decision boundary)

Minsky, Papert, “Perceptrons” (1969)

Multi-layer perceptrons

Add non-linear activation functions

Add hidden layer(s)

Universal approximation theorem (important!): any continuous function can be approximated by a finite feed-forward neural network with one hidden layer.

Forward propagation

y1 = σ(b11 + x1·w11 + x2·w21)
y2 = σ(b12 + x1·w12 + x2·w22)
z = b21 + y1·w21 + y2·w22   (no nonlinearity on the output)

[Diagram: inputs x1, x2 and a constant +1 feed two hidden units (weights w11, w12, w21, w22, biases b11, b12); the hidden outputs y1, y2 and a constant +1 feed the output unit z (weights w21, w22, bias b21). Note that the hidden layer and the output layer each have their own w21, w22.]
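As a sanity check, a minimal NumPy sketch of this forward pass. The weight values are placeholders, and the output-layer weights are renamed v21, v22 here to avoid the name clash with the first-layer w21, w22.

```python
# A minimal sketch of the forward pass for the 2-2-1 network above.
import numpy as np

def sigma(a):                      # logistic sigmoid
    return 1.0 / (1.0 + np.exp(-a))

x1, x2 = 0.5, -1.0                           # example inputs
b11, b12, b21 = 0.1, 0.1, 0.1                # placeholder biases
w11, w12, w21, w22 = 0.2, -0.3, 0.4, 0.7     # first-layer weights
v21, v22 = 0.5, -0.6                         # output-layer weights (w21, w22 on the slide)

y1 = sigma(b11 + x1 * w11 + x2 * w21)
y2 = sigma(b12 + x1 * w12 + x2 * w22)
z = b21 + y1 * v21 + y2 * v22                # no nonlinearity on the output
print(z)
```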

Loss function

• Function approximation: L = ½(t − z)²

• Binary classification: L = −log(z) if t = 1; L = −log(1 − z) if t = 0

• Multi-class classification: L = −Σj tj·log(zj)

[Plot: example curves of the squared loss and the log losses −log(z), −log(1 − z) as functions of the output z.]
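A small NumPy sketch of these three losses; the function names are illustrative.

```python
# The three loss functions above; z is the network output, t the target.
import numpy as np

def squared_loss(t, z):            # function approximation
    return 0.5 * (t - z) ** 2

def binary_cross_entropy(t, z):    # binary classification, z in (0, 1)
    return -np.log(z) if t == 1 else -np.log(1 - z)

def cross_entropy(t, z):           # multi-class: t one-hot, z a probability vector
    return -np.sum(t * np.log(z))

print(squared_loss(10.0, 9.5))
print(binary_cross_entropy(1, 0.8))
print(cross_entropy(np.array([0, 1, 0]), np.array([0.2, 0.7, 0.1])))
```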

Backpropagation

Forward pass:
y1 = σ(b11 + x1·w11 + x2·w21)
y2 = σ(b12 + x1·w12 + x2·w22)
z = b21 + y1·w21 + y2·w22

Errors:
dL/dz = ez = z − t
ey1 = ez·w21·σ'(b11 + x1·w11 + x2·w21)
ey2 = ez·w22·σ'(b12 + x1·w12 + x2·w22)

Output layer: Δb21 = ez, Δw21 = ez·y1, Δw22 = ez·y2
Hidden layer: Δb11 = ey1, Δb12 = ey2, Δw11 = ey1·x1, Δw12 = ey2·x1, Δw21 = ey1·x2, Δw22 = ey2·x2

Derivative of the sigmoid: σ'(x) = σ(x)(1 − σ(x))

[Diagram: the same 2-2-1 network as before, with the error terms propagated backwards from z to y1 and y2.]
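Putting the forward and backward passes together, here is a minimal sketch of one training step for this 2-2-1 network. The parameter values are placeholders, the output-layer weights are again renamed v21, v22 to avoid the name clash, and the deltas from the slide are applied with a learning rate and a minus sign, i.e. plain gradient descent.

```python
# One backpropagation + gradient-descent step for the 2-2-1 network above.
import numpy as np

def sigma(a):
    return 1.0 / (1.0 + np.exp(-a))

x1, x2, t = 0.5, -1.0, 1.0                     # example input and target
b11, b12, b21 = 0.1, 0.1, 0.1                  # placeholder biases
w11, w12, w21, w22 = 0.2, -0.3, 0.4, 0.7       # first-layer weights
v21, v22 = 0.5, -0.6                           # output-layer weights

# forward pass
a1 = b11 + x1 * w11 + x2 * w21
a2 = b12 + x1 * w12 + x2 * w22
y1, y2 = sigma(a1), sigma(a2)
z = b21 + y1 * v21 + y2 * v22

# backward pass (error terms)
e_z = z - t                                    # dL/dz for the squared loss
e_y1 = e_z * v21 * y1 * (1 - y1)               # sigma'(a1) = sigma(a1) * (1 - sigma(a1))
e_y2 = e_z * v22 * y2 * (1 - y2)

# gradient-descent update with learning rate lr
lr = 0.1
b21 -= lr * e_z;  v21 -= lr * e_z * y1;  v22 -= lr * e_z * y2
b11 -= lr * e_y1; w11 -= lr * e_y1 * x1; w21 -= lr * e_y1 * x2
b12 -= lr * e_y2; w12 -= lr * e_y2 * x1; w22 -= lr * e_y2 * x2
```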

Gradient Descent

• Gradient descent finds weight values that result in a small loss.

• Gradient descent is only guaranteed to find a local minimum.

• But there are plenty of them, and they are often good enough!

Walking around the energy (loss) landscape using only local gradient information.
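A tiny illustration of the idea on a one-dimensional loss; the function and the learning rate are made up for the example.

```python
# Gradient descent on L(w) = (w - 3)^2: walk downhill using only the local gradient.
w = 0.0
lr = 0.1
for step in range(100):
    grad = 2 * (w - 3)      # dL/dw
    w -= lr * grad          # move against the gradient
print(w)                    # close to the minimum at w = 3
```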

Things to remember...

The perceptron was the first artificial neuron model, invented in the late 1950s.

A perceptron can learn only linearly separable classification problems.

Feed-forward networks with non-linear activation functions and hidden layers can overcome limitations of perceptrons.

Multi-layer artificial neural networks are trained using backpropagation and gradient descent.

Neural networks taxonomy

Part 2

Simple feed-forward networks

• Architecture:
  – Each node connected to all nodes of previous layer.
  – Information moves in one direction only.

• Used for:
  – Function approximation
  – Simple classification problems
  – Not too many inputs (~100)

[Diagram: input layer → hidden layer → output layer.]

Convolutional neural networks

Hubel & Wiesel (1959)

• Performed experiments with an anesthetized cat.

• Discovered topographical mapping, sensitivity to orientation and hierarchical processing.

Simple cells – convolution

Complex cells – pooling

Convolution in neural nets

Recommending music on Spotify

Convolutional neural networks

• Architecture:
  – Convolutional layer: local connections + weight sharing.
  – Pooling layer: translation invariance.

• Used for:
  – images,
  – any other data with the locality property, e.g. adjacent characters make up a word.

[Diagram: a one-dimensional example with an input layer, a convolutional layer computed with the shared weights (1, 0, −1), and a max-pooling layer on top.]

Convolution

Convolution searches for the same pattern over the entire image and calculates a score for each match.

1 0 1
0 1 0
1 0 1

Convolution

Convolution searches for the same pattern over the entire image and calculates a score for each match.

1 -1 1
-1 1 -1
1 -1 1

And this..

1 1 1
1 1 -1
1 -1 -1

Now try this:

What do these filters do?

1 1 1
1 1 1
1 1 1

0 1 0
1 -4 1
0 1 0

Pooling

Pooling achieves translation invariance by taking the maximum of adjacent convolution scores.
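To make convolution and pooling concrete, here is a small sketch using SciPy's convolve2d on a random "image", with the two filters from the quiz above (the all-ones filter averages, i.e. blurs; the second filter is a Laplacian-like edge detector). The image and the pooling size are arbitrary choices for the example.

```python
# 2-D convolution followed by 2x2 max pooling.
import numpy as np
from scipy.signal import convolve2d

image = np.random.rand(8, 8)                   # placeholder "image"
box = np.ones((3, 3)) / 9.0                    # averaging (blur) filter
laplacian = np.array([[0, 1, 0],
                      [1, -4, 1],
                      [0, 1, 0]])              # responds to intensity changes (edges)

def max_pool(x, size=2):                       # keep the strongest response in each block
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

blurred = convolve2d(image, box, mode="valid")
edges = convolve2d(image, laplacian, mode="valid")
pooled = max_pool(edges)
print(blurred.shape, edges.shape, pooled.shape)
```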

Example: handwritten digit recognition

Y. LeCun et al., “Handwritten digit recognition: Applications of neural net chips and automatic learning”, 1989.

LeCun et al. (1989)

Recurrent neural networks

• Architecture:
  – Hidden layer nodes connected to each other.
  – Allows retaining internal state and memory.

• Used for:
  – speech recognition,
  – handwriting recognition,
  – any "time" series – brain activity, DNA reads.

[Diagram: input layer → recurrent hidden layer → output layer.]

Backpropagation through time

[Diagram: the network unrolled in time. Inputs I1–I4 feed hidden states H1–H4 (starting from H0); each hidden state feeds both the next hidden state and an output O1–O4, and the outputs are compared with targets T1–T4. The same weights W are reused at every time step.]
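A minimal sketch of unrolling a simple recurrent layer, reusing the same weights at every time step as in the diagram; the sizes and the weight names W, U, b are arbitrary placeholders.

```python
# Forward pass of a simple recurrent layer, unrolled over 4 time steps.
import numpy as np

def sigma(a):
    return 1.0 / (1.0 + np.exp(-a))

steps, n_in, n_hid = 4, 3, 5
rng = np.random.default_rng(0)
inputs = rng.normal(size=(steps, n_in))   # I1..I4
W = rng.normal(size=(n_hid, n_in))        # input-to-hidden weights
U = rng.normal(size=(n_hid, n_hid))       # hidden-to-hidden (recurrent) weights
b = np.zeros(n_hid)

h = np.zeros(n_hid)                       # H0
for x in inputs:                          # same W, U, b at every step
    h = sigma(W @ x + U @ h + b)          # H_t depends on I_t and H_{t-1}
print(h)
```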

Auto-encoders

• Architecture:
  – Input and output are the same!!
  – Hidden layer functions as a "bottleneck".
  – Network is trained to reconstruct the input from hidden layer activations.

• Used for:
  – image search
  – dimensionality reduction

[Diagram: input layer → hidden (bottleneck) layer → output layer = input layer.]
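A minimal sketch of an autoencoder forward pass and its reconstruction loss. The layer sizes and weight names are placeholders, and training (backpropagating the reconstruction error) is omitted.

```python
# Autoencoder forward pass: encode into a bottleneck, decode, compare to the input.
import numpy as np

def sigma(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
x = rng.random(20)                        # placeholder input vector
W_enc = rng.normal(size=(5, 20))          # encoder: 20 -> 5 (the bottleneck)
W_dec = rng.normal(size=(20, 5))          # decoder: 5 -> 20

code = sigma(W_enc @ x)                   # low-dimensional representation
x_hat = W_dec @ code                      # reconstruction
loss = 0.5 * np.sum((x - x_hat) ** 2)     # trained to make x_hat close to x
print(loss)
```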

We didn’t talk about...

• Restricted Boltzmann Machines (RBMs)

• Long Short-Term Memory networks (LSTMs)

• Echo State Networks / Liquid State Machines

• Hopfield Network

• Self-organizing maps (SOMs)

• Radial basis function networks (RBFs)

• But we covered the most important ones!

Things to remember...

Simple feed-forward networks are usually used for function approximation, e.g. predicting energy consumption.

Convolutional neural networks are mostly used for images.

Recurrent neural networks are used for speech recognition and language modeling.

Autoencoders are used for dimensionality reduction.

State-of-the-art results

Part 3

Deep Learning

Artificial neural networks and backpropagation have been around since the 1980s. What's all this fuss about "deep learning"?

• What has changed:
  – we have much bigger datasets,
  – we have much faster computers (think GPUs),
  – we have learned a few tricks for training networks with very, very many (150) layers.

GoogLeNet

ImageNet 2014 winner – 27 layers, 5M weights.

Szegedy et al., “Going Deeper with Convolutions” (2014).

ImageNet classification

Try it yourself: http://www.clarifai.com/#demo

Wu et al., “Deep Image: Scaling up Image Recognition” (2015). Ioffe, Szegedy, “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift” (2015).

Current best: 4.9% (human error: 5.1%)

Automatic image descriptions

Karpathy, Fei-Fei, “Deep Visual-Semantic Alignments for Generating Image Descriptions” (2014)

Reinforcement learning

[Figure: screenshots of the Atari games Pong, Breakout, Space Invaders, Seaquest, Beam Rider and Enduro. The agent receives the game screen and score as input and outputs actions.]

https://github.com/tambetm/simple_dqn

Mnih et al., "Human-level control through deep reinforcement learning" (2015)

Multiagent reinforcement learning

Tampuu, Matiisen et al., “Multiagent Cooperation and Competition with Deep Reinforcement Learning” (2015)

Videos on YouTube about competitive mode and collaborative mode

Program execution

Curriculum learning – learning simple expressions first and then more complex ones – proved to be essential.

Zaremba, Sutskever, “Learning to Execute” (2015).

The future of AI?

• Neural Turing Machines

• Memory Networks – writing to and reading from external memory (infinite memory)

For example: Hybrid computing using a neural network with dynamic external memory (Graves, Hassabis et al., 2016)

Thank you!