
Introduction to Deep Learning

Some slides and images are taken from:

David Wolfe Corne (https://www.macs.hw.ac.uk/~dwcorne/Teaching/introdl.ppt), Wikipedia, and Geoffrey E. Hinton

Feedforward networks for function approximation and classification

• Feedforward networks
• Deep tensor networks
• ConvNets

[Figure: biological neuron, with dendrites, axon, and terminal branches of the axon]

A single artificial neuron

[Figure: a single artificial neuron. Inputs x1 … xn arrive with weights w1 … wn; a bias node (constant input 1) carries weight w0; the weighted inputs are summed (Σ) and passed through a nonlinear activation function to produce the output / activation of the neuron.]
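In formulas, the figure corresponds to the standard single-neuron equation (a minimal sketch using the labels above, not copied from the slides; the nonlinearity f is any of the activation functions discussed next):

```latex
% Output (activation) of a single artificial neuron with inputs x_1, ..., x_n,
% weights w_1, ..., w_n, bias weight w_0, and nonlinear activation f:
y = f\!\left(w_0 + \sum_{i=1}^{n} w_i x_i\right)
```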

Activation functions (https://en.wikipedia.org/wiki/Activation_function)

Traditionally used activation functions [figure]

Nair and Hinton: Rectified linear units improve restricted Boltzmann machines. ICML'10, 807-814 (2010). Glorot, Bordes and Bengio: Deep sparse rectifier neural networks. PMLR 15:315-323 (2011).

Currently the most widely used activation is the rectified linear unit (ReLU): it is empirically easier to train and results in sparse networks.
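As a quick illustration (a minimal sketch of my own, not from the slides), the classical and the currently dominant activation functions take only a few lines of NumPy:

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid: squashes inputs to (0, 1); traditionally used."""
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    """Hyperbolic tangent: squashes inputs to (-1, 1); traditionally used."""
    return np.tanh(x)

def relu(x):
    """Rectified linear unit: max(0, x); currently the most widely used."""
    return np.maximum(0.0, x)

z = np.linspace(-3, 3, 7)
print(sigmoid(z), tanh(z), relu(z), sep="\n")
```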

Perceptron (Rosenblatt 1957)

[Figure: perceptron with inputs x1 … xn, weights w1 … wn, and bias weight w0]

Used as a binary classifier; as a linear classifier it is closely related to the support vector machine (SVM).
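A minimal sketch of the classic perceptron learning rule (my own illustration; the toy data and variable names are assumptions, not from the slides):

```python
import numpy as np

def train_perceptron(X, y, epochs=20, lr=1.0):
    """Learn weights w and bias w0 for a binary classifier y = sign(w.x + w0).

    X: (n_samples, n_features) inputs; y: labels in {-1, +1}.
    """
    w = np.zeros(X.shape[1])
    w0 = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            # Update only on misclassified examples (Rosenblatt's rule).
            if yi * (np.dot(w, xi) + w0) <= 0:
                w += lr * yi * xi
                w0 += lr * yi
    return w, w0

# Toy linearly separable data: class +1 roughly above the line x1 + x2 = 1.
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.2, 0.3], [0.9, 0.8]])
y = np.array([-1, 1, -1, 1])
w, w0 = train_perceptron(X, y)
print(np.sign(X @ w + w0))  # reproduces y on this toy set
```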

Second-generation neural networks (~1985). Slide credit: Geoffrey Hinton

[Figure: feedforward network with an input vector, hidden layers, and outputs. Compare the outputs with the correct answer to get an error signal; back-propagate the error signal to get derivatives for learning.]

[Figure: the same network; wji denotes the weight on the connection from neuron i to neuron j in the layer above]

The backpropagation recipe, built up over several slides (equations sketched below):
• Error at output for a given example
• Error sensitivity at output neuron j
• Backpropagate the error sensitivity to neuron i
• Sensitivity on weight wji
• Update weight wji, scaled by the learning rate
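The equations themselves appeared as images on the original slides; the following is a standard reconstruction, assuming the usual squared-error loss (notation: zj is the total input to neuron j, yj its output, tj the target, ε the learning rate):

```latex
% Error at output for a given example
E = \tfrac{1}{2} \sum_{j \in \mathrm{output}} (t_j - y_j)^2

% Error sensitivity at output neuron j
\frac{\partial E}{\partial y_j} = -(t_j - y_j), \qquad
\frac{\partial E}{\partial z_j} = \frac{\mathrm{d} y_j}{\mathrm{d} z_j} \, \frac{\partial E}{\partial y_j}

% Backpropagate the error sensitivity to neuron i in the layer below
\frac{\partial E}{\partial y_i} = \sum_j w_{ji} \, \frac{\partial E}{\partial z_j}

% Sensitivity on weight w_{ji}
\frac{\partial E}{\partial w_{ji}} = y_i \, \frac{\partial E}{\partial z_j}

% Update weight w_{ji} with learning rate \varepsilon
\Delta w_{ji} = -\varepsilon \, \frac{\partial E}{\partial w_{ji}}
```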

A decision boundary perspective on learning

[Figure sequence: start from initial random weights; repeatedly present a training instance and adjust the weights; eventually a good decision boundary emerges]

Nonlinear versus linear models

• NNs use a nonlinear f(x), so they can draw complex decision boundaries while keeping the data unchanged.
• Kernel methods only draw straight lines (hyperplanes), but first transform the data in a way that makes it linearly separable.

Universal Representation Theorem

• Networks with a single hidden layer can represent any function F(x) with arbitrary precision in the limit of large hidden-layer size.

• However, that doesn't mean that networks with a single hidden layer are efficient at representing arbitrary functions. For many datasets, deep networks can represent the function F(x) even with narrow layers.
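One common precise form of the first statement is the Cybenko/Hornik universal approximation theorem (this particular phrasing is mine, not from the slides): for a continuous target F on a compact domain and a suitable nonlinearity σ, a single hidden layer with enough units suffices.

```latex
% Universal approximation (one standard formulation):
% For any continuous F on a compact set K \subset \mathbb{R}^n and any
% \epsilon > 0, there exist N, weights w_i \in \mathbb{R}^n and
% v_i, b_i \in \mathbb{R} such that
\left| \, F(x) - \sum_{i=1}^{N} v_i \, \sigma\!\left(w_i^{\top} x + b_i\right) \right| < \epsilon
\quad \text{for all } x \in K .
```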

What does a neural network learn?

Feature detectors

What is this unit doing?

[Figure: the incoming weights of one hidden unit drawn over the input pixels (numbered 1 … 63, ticks at 1 5 10 15 20 25 …): strong +ve weights on the top row of pixels, low/zero weight everywhere else]

Hidden layer units become self-organized feature detectors.

What does this unit detect? It will send a strong signal for a horizontal line in the top row, ignoring everywhere else.

[Figure: a second unit whose strong +ve weights sit on the top-left pixels, with low/zero weight elsewhere]

What does this unit detect? A strong signal for a dark area in the top left corner.
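To make this concrete (my own toy illustration, not from the slides; the 8x8 image size and the weight template are assumptions), a hidden unit with strong positive weights on the top row acts as a dot-product template matcher for horizontal lines in that row:

```python
import numpy as np

H, W = 8, 8                       # assumed toy image size
rng = np.random.default_rng(0)

# Weight template of one hidden unit: strong +ve weights on the top row,
# zero weights everywhere else.
weights = np.zeros((H, W))
weights[0, :] = 1.0

def unit_activation(image, w=weights):
    """Pre-activation of the unit: elementwise product of image and weights, summed."""
    return float(np.sum(image * w))

line_top = np.zeros((H, W)); line_top[0, :] = 1.0   # horizontal line in the top row
line_mid = np.zeros((H, W)); line_mid[4, :] = 1.0   # horizontal line in a middle row
noise    = rng.random((H, W)) * 0.1                 # faint background clutter

print(unit_activation(line_top))   # strong response (about 8)
print(unit_activation(line_mid))   # about 0: the unit ignores other rows
print(unit_activation(noise))      # small response
```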

What features might you expect a good NN to learn, when trained with data like this?

[Figure: example input images]

Horizontal lines

Small circles

But what about position invariance? Our example unit detectors were tied to specific parts of the image.

Deep Networks

[Figure: first-layer units detect lines in specific positions; higher-level detectors combine them ("horizontal line", "RHS vertical line", "upper loop", etc.). What does this top-level unit detect?]

Successive layers can detect higher-level features.

So: multiple layers make sense

Multiple layers are also found in the brain, e.g. visual cortex

But: until recently, deep networks could not be trained efficiently

Convolutional Neural Networks

Convolutional Kernel / Filter

Apply convolutions

Convolutional filters perform image processing
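As a small illustration of "convolutional filters perform image processing" (a minimal sketch; the edge-detection kernel is my own example, not one from the slides):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution of a grayscale image with a small kernel (no padding)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    # Flip the kernel for a true convolution (cross-correlation would skip this).
    k = kernel[::-1, ::-1]
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * k)
    return out

# A simple horizontal-edge detector: responds where the intensity differs
# between the rows above and below the centre of the 3x3 neighbourhood.
edge_kernel = np.array([[ 1.,  1.,  1.],
                        [ 0.,  0.,  0.],
                        [-1., -1., -1.]])

image = np.zeros((6, 6))
image[3:, :] = 1.0            # dark top half, bright bottom half
print(conv2d(image, edge_kernel))   # large values along the horizontal edge
```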

Example: filters in face recognition

MNIST dataset

Misclassified examples
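A minimal ConvNet for MNIST-sized images, sketched in PyTorch (the library choice and the layer sizes are my assumptions for illustration, not the exact architecture from the slides):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallConvNet(nn.Module):
    """Two convolution + pooling stages followed by two fully connected layers."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, padding=1)   # 28x28 -> 28x28
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)  # 14x14 -> 14x14
        self.fc1 = nn.Linear(32 * 7 * 7, 128)
        self.fc2 = nn.Linear(128, num_classes)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)  # -> 16 x 14 x 14
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)  # -> 32 x 7 x 7
        x = x.flatten(1)
        x = F.relu(self.fc1(x))
        return self.fc2(x)                          # class scores (logits)

model = SmallConvNet()
dummy = torch.zeros(1, 1, 28, 28)                   # one grayscale MNIST-sized image
print(model(dummy).shape)                            # torch.Size([1, 10])
```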

Applications to molecular systems

1) Learning to represent (effective) energy function

Behler-Parrinello network

[Figure: Behler-Parrinello network. Cartesian coordinates → internal coordinates → per-atom neural networks (may be shared for the same atom types) → atomic energies → total energy]

Behler and Parrinello, PRL 98, 146401 (2007)
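In equations, the figure corresponds to the Behler-Parrinello decomposition of the total energy into atomic contributions (a standard reconstruction; the symbol G_i for the local-environment descriptor of atom i is my notation, not from the slides):

```latex
% Total energy as a sum of atomic energies, each predicted by a small
% neural network (shared across atoms of the same element) from a
% descriptor G_i of atom i's local environment:
E_{\text{total}} = \sum_{i=1}^{N_{\text{atoms}}} E_i\!\left(G_i\right)
```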

1) Learning to represent (effective) energy function

Schütt, Arbabzadah, Chmiela, Müller & Tkatchenko, Nature Communications 8, 13890 (2017)

2) Generator networks

Gómez-Bombarelli, …, Aspuru-Guzik: Automatic Chemical Design using Variational Autoencoders (2016)


3) VAMPnets

Mardt, Pasquali, Wu & Noé: VAMPnets - deep learning of molecular kinetics (2017)
