Page 1

Neural Networks, Computation Graphs

CMSC 470

Marine Carpuat

Page 2

Binary Classification with a Multi-layer Perceptron

[Figure: a multi-layer perceptron whose input is a bag-of-words feature vector, e.g. φ“A” = 1, φ“site” = 1, φ“,” = 2, φ“located” = 1, φ“in” = 1, φ“Maizuru” = 1, φ“Kyoto” = 1, φ“priest” = 0, φ“black” = 0, plus a constant -1 input.]

Page 3

Example: binary classification with a NN

[Figure: a two-class (X vs. O) dataset that is not linearly separable in the original feature space, with points φ0(x1) = {-1, 1}, φ0(x2) = {1, 1}, φ0(x3) = {-1, -1}, φ0(x4) = {1, -1}. A first layer with weights ±1 over the inputs φ0[0], φ0[1] maps them to φ1(x1) = {-1, -1}, φ1(x2) = {1, -1}, φ1(x3) = {-1, 1}, φ1(x4) = {-1, -1}, which are linearly separable, so a single output node with weights 1, 1 computes φ2[0] = y.]

Page 4

Example: the Final Net

[Figure: the full network. Two hidden nodes φ1[0], φ1[1] apply tanh to weighted sums (weights ±1) of the inputs φ0[0], φ0[1] and a bias of 1; an output node φ2[0] applies tanh to a weighted sum (weights 1, 1) of the hidden nodes and a bias of 1.]

Replace “sign” with a smoother non-linear function (e.g. tanh, sigmoid)

Page 5

Multi-layer Perceptrons are a kind of “Neural Network” (NN)

[Figure: the same multi-layer perceptron as on Page 2, with its components labeled:]

• Input (aka features)

• Output

• Nodes (aka neurons)

• Layers

• Hidden layers

• Activation function (non-linear)

Page 6

Neural Networks as Computation Graphs

Example & figures by Philipp Koehn

Page 7

Computation Graphs Make Prediction Easy: Forward Propagation

Page 8

Computation Graphs Make Prediction Easy: Forward Propagation

Page 9

Neural Networks as Computation Graphs

• Decomposes computation into simple operations over matrices and vectors

• Forward propagation algorithm
  • Produces the network output given an input
  • By traversing the computation graph in topological order (see the sketch below)
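To make the idea concrete, here is a minimal sketch (not from the slides) of a computation graph whose nodes are evaluated in topological order; the Node class and the example network y = tanh(Wx + b) are illustrative choices:

```python
import numpy as np

class Node:
    def __init__(self, op, parents=()):
        self.op = op               # function computing this node's value from its parents
        self.parents = list(parents)
        self.value = None          # filled in by the forward pass

def forward(nodes):
    """Compute every node's value; `nodes` must be in topological order."""
    for node in nodes:
        inputs = [p.value for p in node.parents]
        node.value = node.op(*inputs)
    return nodes[-1].value

# Example: y = tanh(W x + b), decomposed into simple matrix/vector operations
W = np.array([[1.0, 1.0], [-1.0, -1.0]])
b = np.array([1.0, 1.0])
x_node   = Node(lambda: np.array([-1.0, 1.0]))   # input
mul_node = Node(lambda x: W @ x, [x_node])       # matrix-vector product
add_node = Node(lambda z: z + b, [mul_node])     # add bias
out_node = Node(np.tanh, [add_node])             # non-linear activation

print(forward([x_node, mul_node, add_node, out_node]))
```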

Page 10

Neural Networks for Multiclass Classification

Page 11

Multiclass Classification

• The softmax function: the exact same function as in multiclass logistic regression

$$P(y \mid x) = \frac{e^{\mathbf{w} \cdot \phi(x, y)}}{\sum_{y'} e^{\mathbf{w} \cdot \phi(x, y')}}$$

The numerator scores the current class; the denominator sums over all classes.
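A minimal numpy sketch of this function (names are mine, not from the slides); subtracting the maximum score before exponentiating is a standard trick that cancels in the ratio but prevents overflow:

```python
import numpy as np

def softmax(scores):
    # scores[k] plays the role of w · φ(x, y) for class k
    z = scores - np.max(scores)   # stabilize: result is unchanged
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()    # normalize into a probability distribution

print(softmax(np.array([2.0, 1.0, -1.0])))  # approx. [0.71, 0.26, 0.04]
```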

Page 12

Example: A Feedforward Neural Network for 3-way Classification

[Figure: the hidden layer uses the sigmoid function; the output layer uses the softmax function (as in multi-class logistic regression).]

From Eisenstein p66
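As a rough illustration of this architecture, here is a forward pass for a network with a sigmoid hidden layer and a softmax output over 3 classes; the dimensions and random weights are assumptions for the sketch, not values from Eisenstein:

```python
import numpy as np

rng = np.random.default_rng(0)
D, H, K = 5, 4, 3                    # input dim, hidden width, number of classes
W1, b1 = rng.normal(size=(H, D)), np.zeros(H)
W2, b2 = rng.normal(size=(K, H)), np.zeros(K)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(x):
    h = sigmoid(W1 @ x + b1)         # hidden features, each in (0, 1)
    scores = W2 @ h + b2             # one score per class
    exp_s = np.exp(scores - scores.max())
    return exp_s / exp_s.sum()       # softmax: distribution over 3 classes

x = rng.normal(size=D)
p = predict_proba(x)
print(p, p.sum())                    # probabilities sum to 1
```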

Page 13

Designing Neural Networks: Activation Functions

• The hidden layer can be viewed as a set of hidden features

• The output of the hidden layer indicates the extent to which each hidden feature is “activated” by a given input

• The activation function is a non-linear function that determines the range of hidden feature values (see the sketch below)
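A quick numpy sketch of how the choice of activation sets the range of hidden values (ReLU is a common alternative I add for comparison; the slides mention tanh and sigmoid):

```python
import numpy as np

z = np.linspace(-5, 5, 11)           # pre-activation values
print(np.tanh(z))                    # tanh:    outputs in (-1, 1)
print(1.0 / (1.0 + np.exp(-z)))      # sigmoid: outputs in (0, 1)
print(np.maximum(0.0, z))            # ReLU:    outputs in [0, inf)
```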

Page 14

Designing Neural Networks: Network Structure

• 2 key decisions:
  • Width (number of nodes per layer)
  • Depth (number of hidden layers)

• More parameters means that the network can learn more complex functions of the input

Page 15

Neural Networks so far

• Powerful non-linear models for classification

• Predictions are made as a sequence of simple operations:
  • matrix-vector operations
  • non-linear activation functions

• Choices in network structure:
  • Width and depth
  • Choice of activation function

• Feedforward networks (no loops)

• Next: how to train?

Page 16

Training Neural Networks

Page 17

How do we estimate the parameters of (aka “train”) a neural net?

For training, we need:

• Data: (a large number of) examples paired with their correct class (x, y)

• Loss/error function: quantifies how bad our prediction y is compared to the truth t
  • Let's use the squared error: $\text{error}(y, t) = \frac{1}{2}(y - t)^2$

Page 18

Stochastic Gradient Descent

• We view the error as a function of the trainable parameters, on a given dataset

• We want to find parameters that minimize the error

w = 0
for I iterations:
    for each labeled pair (x, y) in the data:
        w = w − μ · d error(w, x, y) / dw

Start with some initial parameter values

Go through the training data one example at a time

Take a step down the gradient
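A runnable version of this loop, as a minimal sketch for a single linear unit trained with the squared error above (the toy dataset, learning rate, and iteration count are made up):

```python
import numpy as np

# Toy data: target equals the first input coordinate
data = [(np.array([ 1.0,  1.0]),  1.0),
        (np.array([-1.0,  1.0]), -1.0),
        (np.array([ 1.0, -1.0]),  1.0)]

w = np.zeros(2)                  # start with some initial parameter values
mu = 0.1                         # learning rate μ
for _ in range(100):             # I iterations
    for x, t in data:            # one example at a time
        y = w @ x                # prediction
        grad = (y - t) * x       # d/dw of 0.5 * (y - t)**2
        w = w - mu * grad        # take a step down the gradient
print(w)                         # converges toward [1, 0]
```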

Page 19

Computation Graphs Make Training Easy: Computing Error

Page 20

Computation Graphs Make Training Easy: Computing Gradients

Page 21

Computation Graphs Make Training Easy: Given Forward Pass + Derivatives for Each Node

Page 22

Computation Graphs Make Training Easy: Computing Gradients

Page 23

Computation Graphs Make Training Easy: Computing Gradients

Page 24

Computation Graphs Make Training Easy: Updating Parameters
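A minimal sketch of what Pages 19-24 walk through, on a one-parameter chain: run the forward pass, then multiply local derivatives backward through the graph (the chain rule) and take a gradient step. The numbers are illustrative:

```python
import numpy as np

# Chain: a = w * x, y = tanh(a), error = 0.5 * (y - t)**2
w, x, t = 0.5, 1.0, 1.0

# Forward pass, saving intermediate values at each node
a = w * x
y = np.tanh(a)
error = 0.5 * (y - t) ** 2

# Backward pass: combine each node's local derivative (chain rule)
d_error_d_y = y - t                       # derivative of the error node
d_y_d_a = 1.0 - y ** 2                    # derivative of tanh
d_error_d_w = d_error_d_y * d_y_d_a * x   # gradient w.r.t. the parameter

w = w - 0.1 * d_error_d_w                 # update the parameter
print(error, d_error_d_w, w)
```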

Page 25

Computation Graph: A Powerful Abstraction

• To build a system, we only need to:
  • Define network structure
  • Define loss
  • Provide data
  • (and set a few more hyperparameters to control training)

• Given the network structure:
  • Prediction is done by a forward pass through the graph (forward propagation)
  • Training is done by a backward pass through the graph (back propagation)
  • Based on simple matrix-vector operations

• Forms the basis of neural network libraries:
  • TensorFlow, PyTorch, MXNet, etc. (see the sketch below)
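For instance, a minimal sketch of the abstraction in PyTorch (the dataset, sizes, and hyperparameters are made up): we only define the structure, the loss, and the data; the library builds the computation graph and handles forward and backward passes:

```python
import torch
import torch.nn as nn

# Define network structure: 2 inputs -> 4 tanh hidden units -> 1 output
net = nn.Sequential(nn.Linear(2, 4), nn.Tanh(), nn.Linear(4, 1))
loss_fn = nn.MSELoss()                                   # define loss
opt = torch.optim.SGD(net.parameters(), lr=0.1)          # a training hyperparameter

# Provide data: the XOR-style example from earlier, labels in {-1, 1}
X = torch.tensor([[1., 1.], [-1., 1.], [1., -1.], [-1., -1.]])
T = torch.tensor([[-1.], [1.], [1.], [-1.]])

for _ in range(500):
    opt.zero_grad()
    loss = loss_fn(net(X), T)   # forward pass through the graph
    loss.backward()             # back propagation
    opt.step()                  # parameter update
print(loss.item())
```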

Page 26

Neural Networks

• Powerful non-linear models for classification

• Predictions are made as a sequence of simple operations:
  • matrix-vector operations
  • non-linear activation functions

• Choices in network structure:
  • Width and depth
  • Choice of activation function

• Feedforward networks (no loops)

• Training with the back-propagation algorithm:
  • Requires defining a loss/error function
  • Gradient descent + chain rule
  • Easy to implement on top of computation graphs