Page 1:

Last lecture summary

Page 2:

• biologically motivated
• synapses

• Neuron accumulates (Σ) positive/negative stimuli from other neurons.

• Then Σ is processed further – f(Σ) – to produce an output, i.e. neuron sends an output signal to neurons connected to it.

Page 3:

Neural Networks for Applied Sciences and Engineering, Samarasinghe

Page 4:

• threshold neuron (McCulloch-Pitts)
 – only binary inputs and output
 – the weights are pre-set, no learning
 – set the threshold so that the classification is correct

x – inputs, w – weights, f(Σ) – activation (transfer) function, y – output

Page 5:

Heaviside (threshold) activation function
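In symbols (the standard step form; θ is the threshold, written explicitly here and folded into the bias w0 on the next page):

$$f(\Sigma) = \begin{cases} 1 & \Sigma \ge \theta \\ 0 & \Sigma < \theta \end{cases}$$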

Page 6:

• Threshold w0 is incorporated as a weight of one additional input with input value x0 = 1.0.

• Such input is called bias.

$$\sum_{j=0}^{2} w_j x_j = w_0 \cdot 1.0 + w_1 x_1 + w_2 x_2$$

Page 7:

• binary classifier, maps its input x (real-valued vector) to f(x) – a binary value (0 or 1)

• f(x) = 1 if w∙x > 0 (including bias), 0 otherwise

• perceptron can adjust its weights (i.e. can learn) – perceptron learning algorithm

Perceptron
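A minimal sketch of the perceptron learning algorithm in Python (the function name, defaults, and error-driven update form are illustrative assumptions, not taken from the slides):

```python
import numpy as np

def perceptron_train(X, y, beta=0.1, epochs=100):
    # Prepend the bias input x0 = 1.0 so the threshold w0
    # is learned like any other weight.
    X = np.hstack([np.ones((len(X), 1)), X])
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, ti in zip(X, y):
            out = 1 if np.dot(w, xi) > 0 else 0  # Heaviside output
            w += beta * (ti - out) * xi          # adjust only on error
    return w

# AND is linearly separable, so the weights converge:
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
print(perceptron_train(X, np.array([0, 0, 0, 1])))
```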

Page 8:

Multiple output perceptron
• for multicategory (i.e. more than 2 classes) classification
• one output neuron for each class

[Figure: multiple-output perceptron with an input layer and an output layer]

Page 9:

Learning

• Learning means there exists an algorithm for setting the neuron’s weights (the threshold w0 is also set).
 – delta rule – gradient descent
 – β – learning rate

$$w_1^{i+1} = w_1^{i} + \Delta w_1 = w_1^{i} + \beta\,(t - y)\,x_1$$

Page 10:

• iterative algorithm, one pass through the whole training set (epoch) is not enough

• online learning – adjust weights after each input pattern presentation
 – weight oscillation may occur

• batch learning
 – obtain the error gradient for each input pattern, average them at the end of the epoch

Page 11:

• Supervised learning using delta rule
1. Transmit an input pattern through connections whose weights are initially set to random values.
2. The weighted inputs are summed, the output is produced, and it is compared with the given target output to determine the error for this pattern.

3. Inputs and target outputs are presented repeatedly, and the weights are adjusted using the delta rule at each iteration or after an epoch until the minimum possible square error is achieved.

4. This usually involves the iterative presentation of the entire training dataset many times.
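A sketch of this loop for a single linear neuron, assuming squared error and online updates; the function name and defaults are hypothetical:

```python
import numpy as np

def delta_rule_epoch(X, t, w, beta=0.01):
    # X already contains the bias input x0 = 1.0 in its first column.
    for xi, ti in zip(X, t):
        y = np.dot(w, xi)          # steps 1-2: weighted sum -> output
        error = ti - y             # step 2: compare with the target
        w = w + beta * error * xi  # delta rule: step down the error gradient
    return w                       # repeat over many epochs (steps 3-4)
```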

Page 12:

New stuff

Finishing perceptron

Page 13:

Perceptron failure

• Please, help me and draw on the blackboard the following functions:
 – AND, OR, XOR (eXclusive OR, true when exactly one of the operands is true, otherwise false)

[Grids for AND, OR, XOR over inputs x1, x2 ∈ {0, 1}:]

x1 x2 | AND | OR | XOR
 0  0 |  0  |  0 |  0
 0  1 |  0  |  1 |  1
 1  0 |  0  |  1 |  1
 1  1 |  1  |  1 |  0

XOR: ???

Page 14:

Play with

http://lcn.epfl.ch/tutorial/english/perceptron/html/index.html

Page 15:

• Perceptron uses a linear activation function, so only linearly separable problems can be solved.

• 1969 – the famous book “Perceptrons” by Marvin Minsky and Seymour Papert showed that it was impossible for these classes of networks to learn an XOR function.

• They conjectured (incorrectly!) that a similar result would hold for a perceptron with three or more layers.

• The often-cited Minsky/Papert text caused a significant decline in interest and funding of neural network research. It took ten more years until neural network research experienced a resurgence in the 1980s.

Page 16:

Play with

http://www.eee.metu.edu.tr/~halici/courses/543java/NNOC/Perceptron.html

Page 17:

Multilayer perceptron

Page 18:

Nonlinear activation functions

• So far we have met the threshold and linear activation functions.

• They are linear, and consequently the problems solved must also be linear.

• The nonlinearity is introduced by using nonlinear activation functions.

Page 19:

logistic (sigmoid, unipolar) vs. tanh (bipolar):

$$f(x) = \frac{1}{1 + e^{-x}} \qquad\qquad \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$
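As a quick reference, both activations in Python together with the derivative forms that backpropagation will need later (a sketch; function names are illustrative):

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))  # unipolar: output in (0, 1)

def logistic_deriv(y):
    return y * (1.0 - y)             # derivative in terms of the output y

# np.tanh is the bipolar counterpart: output in (-1, 1),
# with derivative 1 - y**2 in terms of the output y.
```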

Page 20:

Multilayer perceptron

• MLP, the most famous type of neural network

[Figure: MLP with input layer, hidden layer, output layer]

Page 21:

[Figure: input layer, hidden layer, output layer – the same network described as three-layer vs. two-layer]

Page 22:

Backpropagation training algorithm

• How to train MLP?
• Gradient descent type of algorithm called backpropagation.
• MLP works in two passes:
• forward pass
 – present a training sample to the neural network
 – compare the network's output to the desired output from that sample
 – calculate the error in each output neuron

Page 23:

• backward pass
 – compute the amount ∆w by which the weights should be updated
 – first calculate the gradient for hidden-to-output weights
 – then calculate the gradient for input-to-hidden weights
  • the knowledge of grad(hidden→output) is necessary to calculate grad(input→hidden)
 – update the weights in the network

• It is a gradient descent method
 – learning rate β is used
 – can get trapped in local minima

$$w^{m+1} = w^{m} + \Delta w^{m} = w^{m} - \beta\, d^{m}$$

where d^m is the error gradient at step m.

Page 24:

[Figure: the input signal propagates forward; the error propagates backward]
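A minimal sketch of one forward and backward pass for a one-hidden-layer MLP, assuming logistic activations and squared error; biases are omitted for brevity and all names are illustrative:

```python
import numpy as np

def backprop_step(x, target, W1, W2, beta=0.1):
    # forward pass: the input signal propagates forward
    h = 1.0 / (1.0 + np.exp(-W1 @ x))        # hidden-layer outputs
    y = 1.0 / (1.0 + np.exp(-W2 @ h))        # network outputs
    # backward pass: the error propagates backward
    d_out = (y - target) * y * (1.0 - y)     # hidden-to-output error term
    d_hid = (W2.T @ d_out) * h * (1.0 - h)   # needs d_out, as noted above
    # gradient descent updates with learning rate beta
    W2 -= beta * np.outer(d_out, h)
    W1 -= beta * np.outer(d_hid, x)
    return W1, W2
```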

Page 25:

• online learning vs. batch learning
 – In online learning the weights are changed after each presentation of a training pattern.
  • Weights may oscillate.
  • Suitable for online learning.

– In batch learning, the total gradient for the whole epoch is represented as the sum of the gradients for each of the n patterns.

• Batch learning improves the stability by averaging.

• Another averaging approach providing stability is using the momentum.

Page 26:

• This method basically tags the average of the past weight changes onto the new weight increment at every weight change, thereby smoothing out the net weight change.

• Momentum μ is between 0 and 1.
• It indicates the relative importance of the past weight change ∆w^(m−1) on the new weight increment ∆w^m.

• Thus, the current gradient and the past weight change together decide how much the new weight increment will be.

$$\Delta w^{m} = -(1 - \mu)\,\beta\, d^{m} + \mu\, \Delta w^{m-1}$$
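The same update as a Python step; the (1 − μ) factor is my reading of the garbled slide formula, chosen to match the μ = 0 / μ = 1 behaviour described on the next page:

```python
def momentum_step(w, grad, dw_prev, beta=0.1, mu=0.9):
    # Blend the current gradient step with the previous weight change:
    # delta_w^m = -(1 - mu) * beta * d^m + mu * delta_w^(m-1)
    dw = -(1.0 - mu) * beta * grad + mu * dw_prev
    return w + dw, dw  # new weights and the change to remember
```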

Page 27:

• For example, if μ is equal to 0, momentum does not apply at all, and the past history plays no part.

• If μ is equal to 1, the current change is totally based on the past change.

• Values of μ between 0 and 1 result in a combined response to weight change.

$$\Delta w^{m} = -(1 - \mu)\,\beta\, d^{m} + \mu\, \Delta w^{m-1}$$

Page 28:

• The equation is recursive, so the influence of the past weight change incorporates that of all previous weight changes as well.

• Momentum can be used with both batch and online learning.

• In batch learning, it can provide further stability to the gradient descent.

• Momentum can be especially useful in online learning to minimize oscillations in error after the presentation of each pattern.

$$\Delta w^{m} = -(1 - \mu)\,\beta\, d^{m} + \mu\, \Delta w^{m-1}$$

Page 29:

Delta-Bar-Delta
• In backpropagation the same learning rate β applies to all of the weights.
• More flexibility could be achieved if each weight is adjusted independently.
• This method is called delta-bar-delta (TurboProp).
• Each weight has its own learning rate; they are adjusted as follows (see the sketch after this list):
 – if the direction in which the error decreases at the current point is the same as the direction in which the error has been decreasing recently, then the learning rate is increased.
 – if the opposite is true, the learning rate is decreased.
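A sketch of the rule in Python, in the spirit of Jacobs' delta-bar-delta; the constants kappa, phi, theta and the running gradient average grad_bar are illustrative assumptions:

```python
import numpy as np

def delta_bar_delta_step(w, grad, rates, grad_bar,
                         kappa=0.01, phi=0.5, theta=0.7):
    # Same direction as the recent average: grow that weight's rate;
    # opposite direction: shrink it.
    same_direction = grad * grad_bar > 0
    rates = np.where(same_direction, rates + kappa, rates * phi)
    # Per-weight gradient descent, then refresh the running average.
    w = w - rates * grad
    grad_bar = theta * grad_bar + (1.0 - theta) * grad
    return w, rates, grad_bar
```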

Page 30:

Second order methods

• Surface curvature can be used to guide the descent down the error surface more efficiently.

Page 31:

grad is a vector pointing in the direction of the greatest rate of increase of the function.

How fast does the rate of increase of the function change in a small neighbourhood?

This is given by the derivative of the gradient, i.e. the second derivative.

The second derivatives with respect to all pairs of weights are collected in the Hessian matrix.

$$H = \begin{bmatrix}
\frac{\partial^2 E}{\partial w_1^2} & \frac{\partial^2 E}{\partial w_1 \partial w_2} & \cdots & \frac{\partial^2 E}{\partial w_1 \partial w_n} \\
\frac{\partial^2 E}{\partial w_2 \partial w_1} & \frac{\partial^2 E}{\partial w_2^2} & \cdots & \frac{\partial^2 E}{\partial w_2 \partial w_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 E}{\partial w_n \partial w_1} & \frac{\partial^2 E}{\partial w_n \partial w_2} & \cdots & \frac{\partial^2 E}{\partial w_n^2}
\end{bmatrix}$$

Page 32:

• Common methods using the Hessian
 – QuickProp
 – Gauss-Newton
 – Levenberg-Marquardt (LM)

• These methods are an order of magnitude faster (i.e. they reach minima in far fewer epochs) than first-order (i.e. gradient-based) methods.

• However, the efficiency is gained at a considerable computational cost.
 – Computing and inverting the Hessian for large networks with a large number of training patterns is expensive (large storage requirements) and slow.
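For reference, the textbook second-order updates these methods build on (not taken from the slides): the Newton step and the Levenberg-Marquardt compromise, where J is the Jacobian of the per-pattern errors e and λ blends between gradient descent and Gauss-Newton:

$$\Delta \mathbf{w} = -H^{-1}\,\nabla E \qquad\qquad \Delta \mathbf{w} = -\left(J^{\top} J + \lambda I\right)^{-1} J^{\top}\,\mathbf{e}$$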

Page 33:

Bias-variance

• Just a small reminder
• bias (lack of fit, underfitting) – the model does not fit the data enough, it is not flexible enough (too few parameters)
• variance (overfitting) – the model is too flexible (too many parameters), it fits noise
• bias-variance tradeoff – improving the generalization ability of the model (i.e. finding the correct amount of flexibility)

Page 34:

• Parameters in MLP: weights
• If you use one more hidden neuron, the number of weights increases by how much?
 – # input neurons + # output neurons (a worked count follows below)
• If MLP is used for a regression task, be careful!
• To use MLP statistically correctly, the number of degrees of freedom (i.e. weights) can’t exceed the number of data points.
 – Compare to the polynomial regression example from the 2nd lecture
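A hypothetical count to make this concrete: with 3 inputs and 2 outputs, each extra hidden neuron adds 3 + 2 = 5 weights, so a 3-10-2 network already carries 3·10 + 10·2 = 50 weights and would use up every degree of freedom offered by a 50-point dataset.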

Page 35:

Improving generalization of MLP

• Flexibility comes from hidden neurons.
• Choose such a # of hidden neurons that neither underfitting nor overfitting occurs.
• Three most common approaches:
 – exhaustive search
 – early stopping
 – regularization

Page 36:

Exhaustive search

• Increase the number of hidden units, and monitor the performance on the validation data set.

[Figure: performance on the validation set vs. number of neurons]

Page 37:

Early stopping
• a fixed and large number of neurons is used
• the network is trained while testing its performance on a validation set at regular intervals
• minimum of the validation error – correct weights

[Figure: training and validation error vs. epochs]
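A sketch of the procedure, assuming hypothetical train_one_epoch and validation_error callbacks:

```python
import copy

def early_stopping(net, train_one_epoch, validation_error, max_epochs=1000):
    best_err = float("inf")
    best_net = copy.deepcopy(net)
    for _ in range(max_epochs):
        train_one_epoch(net)               # train for one more epoch
        err = validation_error(net)        # test on the validation set
        if err < best_err:                 # new validation-error minimum:
            best_err = err                 # these are the "correct" weights
            best_net = copy.deepcopy(net)
    return best_net
```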

Page 38:

Weight decay

• Idea: keep the growth of weights to a minimum in such a way that non-important weights are pulled toward zero

• Only the important weights are allowed to grow, others are forced to decay

• regularization

Page 39:

• This is achieved not by minimizing MSE, but by minimizing

$$W = MSE + \delta \sum_{j=1}^{m} w_j^{2}$$

• second term – regularization term
• m – number of weights in the network
• δ – regularization parameter
 – the larger the δ, the more important the regularization
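As a sketch, minimizing W instead of plain MSE only adds the penalty's derivative, 2 δ w_j, to each weight's gradient, which is what pulls unimportant weights toward zero (function name illustrative):

```python
import numpy as np

def weight_decay_update(w, grad_mse, beta=0.1, delta=1e-3):
    # d/dw_j of (MSE + delta * sum w_j^2) = grad_mse_j + 2 * delta * w_j
    return w - beta * (grad_mse + 2.0 * delta * w)
```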

Page 40:

Network pruning

• Both early stopping and weight decay use all weights in the NN. They do not reduce the complexity of the model.

• Network pruning – reduce complexity by keeping only essential weights/neurons.

• Several pruning approaches, e.g.
 – optimal brain damage (OBD)
 – optimal brain surgeon (OBS)
 – optimal cell damage (OCD)

Page 41:

OBD

• Based on sensitivity analysis
 – systematically change parameters in a model to determine the effects of such changes
• Weights that are not important for the input-output mapping are removed.
• The importance (saliency) of a weight is measured based on the cost of setting that weight to zero.
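In the original OBD formulation (LeCun et al.), this cost is approximated from the diagonal of the Hessian, giving the saliency of weight w_j:

$$s_j = \tfrac{1}{2}\, H_{jj}\, w_j^{2}$$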

Page 42:

• How to perform OBD?
1. Train a flexible network in a normal way (i.e. use early stopping, weight decay, …)
2. Compute saliency for each weight. Remove weights with small saliencies.
3. Train the reduced network again with the kept weights. Initialize the training with their values obtained in the previous step.

4. Repeat from step 1.