
CS 1571 Introduction to AI
Lecture 10

Multi-layer neural networks

Milos Hauskrecht
[email protected]
Sennott Square


Linear units

Linear regression:   $f(\mathbf{x}) = \mathbf{w}^T\mathbf{x}$
Logistic regression: $f(\mathbf{x}) = p(y = 1 \mid \mathbf{x}, \mathbf{w}) = g(\mathbf{w}^T\mathbf{x})$

[Figure: a linear unit and a logistic unit. Inputs $1, x_1, x_2, \ldots, x_d$ are weighted by $w_0, w_1, w_2, \ldots, w_d$ and summed into $z$; the linear unit outputs $f(\mathbf{x}) = z$, the logistic unit outputs $p(y = 1 \mid \mathbf{x}) = g(z)$.]

Gradient update (the same for both units):

    $\mathbf{w} \leftarrow \mathbf{w} + \alpha \sum_{i=1}^{n} (y_i - f(\mathbf{x}_i))\,\mathbf{x}_i$

Online update (the same for both units):

    $\mathbf{w} \leftarrow \mathbf{w} + \alpha\,(y - f(\mathbf{x}))\,\mathbf{x}$
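As an illustration (not part of the original slides), here is a minimal NumPy sketch of the online update shared by the two units; it assumes the input vector already carries a leading constant 1 so that the first weight plays the role of $w_0$:

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def online_update(w, x, y, alpha, logistic=False):
    """One online step w <- w + alpha * (y - f(x)) * x.

    x is assumed to include a leading 1 so that w[0] acts as w_0.
    The same rule applies to the linear unit (f(x) = w^T x) and to the
    logistic unit (f(x) = g(w^T x)), as noted on the slide.
    """
    f = w @ x
    if logistic:
        f = sigmoid(f)
    return w + alpha * (y - f) * x

# Example usage on a single (hypothetical) data point:
w = np.zeros(3)                    # weights [w0, w1, w2]
x = np.array([1.0, 0.5, -1.2])     # input with leading bias term 1
y = 1.0
w = online_update(w, x, y, alpha=0.1, logistic=True)
```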


Limitations of basic linear units

Linear regression:   $f(\mathbf{x}) = w_0 + \sum_{j=1}^{d} w_j x_j$
Logistic regression: $f(\mathbf{x}) = p(y = 1 \mid \mathbf{x}, \mathbf{w}) = g\left(w_0 + \sum_{j=1}^{d} w_j x_j\right)$

[Figure: the same two units, with inputs $1, x_1, x_2, \ldots, x_d$, weights $w_0, w_1, w_2, \ldots, w_d$ and summation $z$.]

Function linear in the inputs!! Linear decision boundary!!


Extensions of simple linear units

Linear regression:   $f(\mathbf{x}) = w_0 + \sum_{j=1}^{m} w_j \phi_j(\mathbf{x})$
Logistic regression: $f(\mathbf{x}) = g\left(w_0 + \sum_{j=1}^{m} w_j \phi_j(\mathbf{x})\right)$

[Figure: a unit whose inputs $x_1, \ldots, x_d$ are first transformed by feature functions $\phi_1(\mathbf{x}), \phi_2(\mathbf{x}), \ldots, \phi_m(\mathbf{x})$, which are then weighted by $w_0, w_1, w_2, \ldots, w_m$ and summed.]

$\phi_j(\mathbf{x})$ - an arbitrary function of $\mathbf{x}$

• Use feature (basis) functions to model nonlinearities.
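As an illustration (not part of the lecture), the basis-function idea can be realized by expanding the input with fixed features $\phi_j$ and fitting an ordinary linear or logistic unit on top; here the $\phi_j$ are assumed to be quadratic monomials of a 2-dimensional input, in the spirit of the quadratic examples on the next slides:

```python
import numpy as np

def quadratic_features(x):
    """phi(x) for a 2-D input x = (x1, x2): quadratic basis functions."""
    x1, x2 = x
    return np.array([x1, x2, x1 * x1, x2 * x2, x1 * x2])

def f_linear(w0, w, x):
    """Linear regression on the expanded features: w0 + sum_j w_j phi_j(x)."""
    return w0 + w @ quadratic_features(x)

def f_logistic(w0, w, x):
    """Logistic regression on the same features: g(w0 + sum_j w_j phi_j(x))."""
    return 1.0 / (1.0 + np.exp(-f_linear(w0, w, x)))

# Example with hypothetical weights giving a circle-like boundary x1^2 + x2^2 = 1:
w0, w = -1.0, np.array([0.0, 0.0, 1.0, 1.0, 0.0])
print(f_logistic(w0, w, (0.0, 0.0)) > 0.5)   # prints False: p(y=1|x) < 0.5 at the origin
```

The model stays linear in the weights, but it is non-linear in the inputs, so the induced decision boundary can be quadratic.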


Regression with a quadratic model.


Quadratic decision boundary

[Figure: a quadratic decision boundary separating the two classes in a 2-dimensional input space.]


Multi-layered neural networks

• Offer an alternative way to introduce nonlinearities to regression/classification models

• Idea: cascade several simple logistic regression units.
• Motivation: the neuron and its synaptic connections.


Model of a neuron

[Figure: model of a neuron. The inputs $\mathbf{x}$, together with a constant input 1, are weighted by $w_0, w_1, w_2, \ldots, w_k$ and summed into $z$; a threshold function applied to $z$ produces the output $y$.]
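A minimal sketch of this neuron model (illustrative only; a hard threshold stands in for the activation, as in the figure):

```python
import numpy as np

def neuron(x, w, w0):
    """McCulloch-Pitts style unit: weighted sum z followed by a threshold.

    x  - input vector
    w  - weights w_1, ..., w_k
    w0 - weight on the constant input 1
    """
    z = w0 + w @ x
    return 1.0 if z >= 0.0 else 0.0   # threshold function applied to z
```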


Multilayer neural network

Cascades multiple logistic regression units. Also called a multilayer perceptron (MLP).

[Figure: a network with an input layer ($1, x_1, x_2, \ldots, x_d$), a hidden layer of two logistic units with inputs $z_1^{(1)}, z_2^{(1)}$ and weights $w_{1,0}^{(1)}, \ldots, w_{1,k}^{(1)}, w_{2,0}^{(1)}, \ldots, w_{2,k}^{(1)}$, and an output unit with input $z_1^{(2)}$ and weights $w_{1,0}^{(2)}, w_{1,1}^{(2)}, w_{1,2}^{(2)}$ producing $p(y = 1 \mid \mathbf{x})$.]

Example: a (2-layer) classifier with non-linear decision boundaries.
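A sketch of the forward pass of such a 2-layer classifier with two hidden logistic units (illustrative code, not the lecture's; the weight names mirror the figure's $w^{(1)}$ and $w^{(2)}$):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_classifier(x, W1, b1, w2, b2):
    """2-layer MLP: two hidden logistic units followed by a logistic output unit.

    W1 : (2, d) weights of the two hidden units (w^(1) in the figure)
    b1 : (2,)   hidden bias weights w_{1,0}^(1), w_{2,0}^(1)
    w2 : (2,)   output weights w_{1,1}^(2), w_{1,2}^(2)
    b2 : float  output bias w_{1,0}^(2)
    Returns p(y = 1 | x).
    """
    z1 = W1 @ x + b1          # inputs to the hidden sigmoids
    h = sigmoid(z1)           # hidden-layer outputs
    z2 = w2 @ h + b2          # input to the output sigmoid
    return sigmoid(z2)        # p(y = 1 | x, w)
```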


Multilayer neural network

• Models non-linearities through logistic regression units
• Can be applied to both regression and binary classification problems

[Figure: the same network (input layer $1, x_1, x_2, \ldots, x_d$; hidden layer $z_1^{(1)}, z_2^{(1)}$ with weights $w^{(1)}$; output layer $z_1^{(2)}$ with weights $w^{(2)}$). The output is $f(\mathbf{x}) = p(y = 1 \mid \mathbf{x}, \mathbf{w})$ for the classification option and $f(\mathbf{x}) = f(\mathbf{x}, \mathbf{w})$ for the regression option.]


Multilayer neural network

• Non-linearities are modeled using multiple hidden logistic regression units (organized in layers)

• The output layer determines whether it is a regression or a binary classification problem

[Figure: a network with input layer $x_1, x_2, \ldots, x_d$, several hidden layers, and an output layer producing $f(\mathbf{x}) = p(y = 1 \mid \mathbf{x}, \mathbf{w})$ (classification option) or $f(\mathbf{x}) = f(\mathbf{x}, \mathbf{w})$ (regression option).]


Learning with MLP

• How to learn the parameters of the neural network?
• Gradient descent algorithm.
• On-line version: weight updates are based on the per-example error $J_{\text{online}}(D_i, \mathbf{w})$:

    $w_j \leftarrow w_j - \alpha \frac{\partial}{\partial w_j} J_{\text{online}}(D_i, \mathbf{w})$

• We need to compute gradients for weights in all units
• They can be computed in one backward sweep through the net!
• The process is called back-propagation


Backpropagation

[Figure: three consecutive levels. Unit $j$ on level $k-1$ outputs $x_j(k-1)$; it feeds, through weight $w_{i,j}(k)$, the summed input $z_i(k)$ of unit $i$ on level $k$, whose output $x_i(k)$ feeds, through weight $w_{l,i}(k+1)$, the input $z_l(k+1)$ of unit $l$ on level $k+1$.]

$x_i(k)$ - output of unit $i$ on level $k$
$z_i(k)$ - input to the sigmoid function of unit $i$ on level $k$

    $z_i(k) = w_{i,0}(k) + \sum_j w_{i,j}(k)\, x_j(k-1)$

    $x_i(k) = g(z_i(k))$

$w_{i,j}(k)$ - weight between unit $j$ on level $k-1$ and unit $i$ on level $k$
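In code, the forward computation for one level could look like this sketch (vectorized with NumPy; the matrix and vector names are my own, not the slides'):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_level(x_prev, W_k, b_k):
    """Compute z_i(k) and x_i(k) for all units i on level k.

    x_prev : outputs x_j(k-1) of the previous level
    W_k    : matrix of weights w_{i,j}(k), one row per unit i
    b_k    : vector of bias weights w_{i,0}(k)
    """
    z_k = b_k + W_k @ x_prev   # z_i(k) = w_{i,0}(k) + sum_j w_{i,j}(k) x_j(k-1)
    x_k = sigmoid(z_k)         # x_i(k) = g(z_i(k))
    return z_k, x_k
```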


Backpropagation

Update weight $w_{i,j}(k)$ using a data point $D_u = \langle \mathbf{x}, y \rangle$:

    $w_{i,j}(k) \leftarrow w_{i,j}(k) - \alpha \frac{\partial}{\partial w_{i,j}(k)} J_{\text{online}}(D_u, \mathbf{w})$

Let

    $\delta_i(k) = \frac{\partial}{\partial z_i(k)} J_{\text{online}}(D_u, \mathbf{w})$

Then:

    $\frac{\partial J_{\text{online}}(D_u, \mathbf{w})}{\partial w_{i,j}(k)} = \frac{\partial J_{\text{online}}(D_u, \mathbf{w})}{\partial z_i(k)} \, \frac{\partial z_i(k)}{\partial w_{i,j}(k)} = \delta_i(k)\, x_j(k-1)$

$\delta_i(k)$ is computed from $x_i(k)$ and the next layer's $\delta_l(k+1)$:

    $\delta_i(k) = \left[ \sum_l \delta_l(k+1)\, w_{l,i}(k+1) \right] x_i(k)\,(1 - x_i(k))$

Last unit (the same as for the regular linear units):

    $\delta_i(K) = -(y - f(\mathbf{x}, \mathbf{w}))$

It is the same for classification with the log-likelihood measure of fit and for linear regression with the least-squares error!
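The two $\delta$ formulas translate into the following sketch (again vectorized; `W_next` is assumed to hold the weights $w_{l,i}(k+1)$, one row per unit $l$):

```python
import numpy as np

def delta_output(y, f_x):
    """delta at the last unit K: -(y - f(x, w)).

    Identical for a logistic output with log-likelihood loss and a
    linear output with least-squares loss, as noted on the slide.
    """
    return -(y - f_x)

def delta_hidden(delta_next, W_next, x_k):
    """Back-propagate deltas from level k+1 to level k.

    delta_next : vector of delta_l(k+1)
    W_next     : matrix of weights w_{l,i}(k+1), one row per unit l
    x_k        : outputs x_i(k) of level k
    Implements delta_i(k) = [sum_l delta_l(k+1) w_{l,i}(k+1)] * x_i(k) * (1 - x_i(k)).
    """
    return (W_next.T @ delta_next) * x_k * (1.0 - x_k)
```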


Learning with MLP

• Online gradient descent algorithm
  – Weight update:

    $w_{i,j}(k) \leftarrow w_{i,j}(k) - \alpha \frac{\partial}{\partial w_{i,j}(k)} J_{\text{online}}(D_u, \mathbf{w})$

    $\frac{\partial J_{\text{online}}(D_u, \mathbf{w})}{\partial w_{i,j}(k)} = \frac{\partial J_{\text{online}}(D_u, \mathbf{w})}{\partial z_i(k)} \, \frac{\partial z_i(k)}{\partial w_{i,j}(k)} = \delta_i(k)\, x_j(k-1)$

    $w_{i,j}(k) \leftarrow w_{i,j}(k) - \alpha\, \delta_i(k)\, x_j(k-1)$

  $x_j(k-1)$ - the $j$-th output of the $(k-1)$-th layer
  $\delta_i(k)$ - derivative computed via backpropagation
  $\alpha$ - a learning rate


Online gradient descent algorithm for MLP

Online-gradient-descent (D, number of iterations)
   initialize all weights $w_{i,j}(k)$
   for i = 1 : 1 : number of iterations
      do  select a data point $D_u = \langle \mathbf{x}, y \rangle$ from D
          set $\alpha = 1/i$
          compute outputs $x_j(k)$ for each unit
          compute derivatives $\delta_i(k)$ via backpropagation
          update all weights (in parallel):
              $w_{i,j}(k) \leftarrow w_{i,j}(k) - \alpha\, \delta_i(k)\, x_j(k-1)$
   end for
   return weights $\mathbf{w}$
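One way to render this pseudocode as runnable Python for a network with a single hidden layer (a sketch under assumptions of my own: the loss, the 0/1 targets, and the random initialization are not specified on the slide):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def online_gradient_descent(X, y, n_hidden=2, n_iterations=10000, seed=0):
    """Online gradient descent for a 1-hidden-layer MLP classifier.

    X : (n, d) array of inputs, y : (n,) array of 0/1 targets.
    Returns the learned weights (W1, b1, w2, b2).
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # initialize all weights (small random values)
    W1 = rng.normal(scale=0.5, size=(n_hidden, d))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(scale=0.5, size=n_hidden)
    b2 = 0.0

    for i in range(1, n_iterations + 1):
        alpha = 1.0 / i                   # decreasing learning rate, as in the pseudocode
        u = rng.integers(len(X))          # select a data point D_u = <x, y>
        x, t = X[u], y[u]

        # forward sweep: compute outputs of each unit
        z1 = W1 @ x + b1
        h = sigmoid(z1)                   # hidden outputs
        f = sigmoid(w2 @ h + b2)          # output f(x) = p(y=1|x,w)

        # backward sweep: compute deltas via backpropagation
        delta_out = -(t - f)              # delta at the last unit
        delta_hid = (w2 * delta_out) * h * (1.0 - h)

        # update all weights (in parallel)
        w2 -= alpha * delta_out * h
        b2 -= alpha * delta_out
        W1 -= alpha * np.outer(delta_hid, x)
        b1 -= alpha * delta_hid

    return W1, b1, w2, b2
```

Note that the $\alpha = 1/i$ schedule from the pseudocode decays quickly; a small constant learning rate is a common practical alternative.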


Xor example.

[Figure: scatter plot of the XOR data in a 2-dimensional input space (both axes from -2 to 2).]

• No linear decision boundary


Xor example. Linear unit


Xor example. Neural network with 2 hidden units
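For intuition (not from the slides), two hidden logistic units are indeed enough to represent XOR. The weights below are hand-picked for illustration rather than learned; a network trained with the online procedure above can find a solution of this general shape, up to local optima:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def xor_mlp(x1, x2):
    """2-hidden-unit MLP realizing XOR with hand-picked weights.

    h1 approximates OR(x1, x2), h2 approximates AND(x1, x2); the output
    unit fires when h1 is on and h2 is off, i.e. exactly for XOR.
    """
    h1 = sigmoid(20 * x1 + 20 * x2 - 10)   # ~OR
    h2 = sigmoid(20 * x1 + 20 * x2 - 30)   # ~AND
    return sigmoid(20 * h1 - 20 * h2 - 10)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, round(xor_mlp(a, b)))   # prints 0, 1, 1, 0
```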


Xor example. Neural network with 10 hidden units


Problems with learning MLPs

• The number of units must be decided in advance
• Converges to local optima
• Sensitive to the initial set of weights



MLP in practice

• Optical character recognition - digits (20x20 pixels)
  – Automatic sorting of mail
  – 5-layer network with multiple output functions

  10 outputs (0, 1, ..., 9)

  20x20 = 400 inputs

  Layer   Neurons   Weights
    5          10      3000
    4         300      1200
    3        1200     50000
    2         784      3136
    1        3136     78400