Multi-layer neural networks (lecture slides transcript: people.cs.pitt.edu milos/courses/cs2750-Spring03/lectures/...)
CS 1571 Introduction to AI
Lecture 10

Milos Hauskrecht
[email protected]
Sennott Square

Multi-layer neural networks
Linear units

Linear regression: $f(\mathbf{x}) = \mathbf{w}^{T}\mathbf{x}$
Logistic regression: $f(\mathbf{x}) = p(y=1 \mid \mathbf{x}, \mathbf{w}) = g(\mathbf{w}^{T}\mathbf{x})$

[Diagram: inputs $x_1, x_2, \dots, x_d$ and a constant input 1 enter each unit through weights $w_0, w_1, \dots, w_d$; the weighted sum gives $f(\mathbf{x})$, passed through the sigmoid $g$ in the logistic case to give $p(y=1 \mid \mathbf{x})$.]

The gradient update is the same for both units:
On-line: $\mathbf{w} \leftarrow \mathbf{w} + \alpha\,(y - f(\mathbf{x}))\,\mathbf{x}$
Batch: $\mathbf{w} \leftarrow \mathbf{w} + \alpha \sum_{i=1}^{n} (y_i - f(\mathbf{x}_i))\,\mathbf{x}_i$
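The shared on-line update rule can be sketched in Python (a minimal illustration; `online_update` and its arguments are my naming, not the lecture's):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def online_update(w, x, y, alpha, model="linear"):
    """One on-line gradient step; the update w <- w + alpha*(y - f(x))*x
    has the same form for linear and logistic regression units."""
    f = w @ x if model == "linear" else sigmoid(w @ x)
    return w + alpha * (y - f) * x

# x carries a leading constant 1 so that w[0] acts as the bias w_0
w = np.zeros(3)
x = np.array([1.0, 0.5, -1.0])
w = online_update(w, x, y=1.0, alpha=0.1)
```

Only the form of $f(\mathbf{x})$ changes between the two units; the error term $(y - f(\mathbf{x}))$ and the update direction $\mathbf{x}$ are identical.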
Limitations of basic linear units

Linear regression: $f(\mathbf{x}) = w_0 + \sum_{j=1}^{d} w_j x_j$
Logistic regression: $f(\mathbf{x}) = p(y=1 \mid \mathbf{x}, \mathbf{w}) = g\!\left(w_0 + \sum_{j=1}^{d} w_j x_j\right)$

The function is linear in the inputs!! The decision boundary is linear!!
Extensions of simple linear units

Use feature (basis) functions to model nonlinearities:
$\phi_j(\mathbf{x})$ - an arbitrary function of $\mathbf{x}$

Linear regression: $f(\mathbf{x}) = w_0 + \sum_{j=1}^{m} w_j \phi_j(\mathbf{x})$
Logistic regression: $f(\mathbf{x}) = g\!\left(w_0 + \sum_{j=1}^{m} w_j \phi_j(\mathbf{x})\right)$
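One concrete choice of basis functions is a quadratic expansion, sketched below (the helper names are mine, not the lecture's):

```python
import numpy as np

def quadratic_features(x):
    """phi(x): all linear terms x_j plus all squares and pairwise
    products x_i * x_j -- one choice of basis functions."""
    x = np.asarray(x, dtype=float)
    d = len(x)
    phi = list(x)
    for i in range(d):
        for j in range(i, d):
            phi.append(x[i] * x[j])
    return np.array(phi)

def f_linear(w0, w, x):
    # still linear in the weights: f(x) = w0 + sum_j w_j * phi_j(x)
    return w0 + w @ quadratic_features(x)

phi = quadratic_features([1.0, 2.0])   # -> [1., 2., 1., 2., 4.]
```

The model stays linear in the weights, so the same gradient updates apply; only the decision boundary in the original input space becomes quadratic.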
Regression with a quadratic model.
Quadratic decision boundary
[Figure: plot of a quadratic decision boundary in the plane.]
Multi-layered neural networks
• Offer an alternative way to introduce nonlinearities to regression/classification models
• Idea: cascade several simple logistic regression units.
• Motivation: the neuron and its synaptic connections.
Model of a neuron
[Diagram: inputs $\mathbf{x}$ and a constant input 1 enter the neuron through synaptic weights $w_0, w_1, w_2, \dots, w_k$; their weighted sum $z$ is passed through a threshold function to produce the output $y$.]
Multilayer neural network
Input layer | Hidden layer | Output layer

Cascades multiple logistic regression units. Also called a multilayer perceptron (MLP).

[Diagram: inputs $x_1, x_2, \dots, x_d$ and a constant 1 feed two hidden logistic units (sigmoid inputs $z^{(1)}_1, z^{(1)}_2$) through first-layer weights $w^{(1)}_{0,1}, \dots, w^{(1)}_{k,1}$ and $w^{(1)}_{0,2}, \dots, w^{(1)}_{k,2}$; the hidden outputs and a constant 1 feed the output unit (sigmoid input $z^{(2)}_1$) through second-layer weights $w^{(2)}_{0,1}, w^{(2)}_{1,1}, w^{(2)}_{2,1}$, producing $p(y=1 \mid \mathbf{x})$.]
Example: a (2 layer) classifier with non-linear decision boundaries
Multilayer neural network
• Models non-linearities through logistic regression units
• Can be applied to both regression and binary classification problems

Input layer | Hidden layer | Output layer

[Diagram: the same two-layer network; the output unit computes
$f(\mathbf{x}) = f(\mathbf{x}, \mathbf{w})$ for regression, or (as an option)
$f(\mathbf{x}) = p(y=1 \mid \mathbf{x}, \mathbf{w})$ for classification.]
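The forward computation of such a two-layer network can be sketched as follows (the weight values are made up just to exercise the code, and `mlp_forward` is my naming):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, W1, b1, w2, b2):
    """Two-layer MLP classifier: each hidden unit is a logistic
    regression unit, and the output unit is another logistic unit
    stacked on the hidden outputs."""
    h = sigmoid(W1 @ x + b1)        # hidden-layer outputs
    return sigmoid(w2 @ h + b2)     # p(y = 1 | x, w)

# assumed example weights, just to run the forward pass once
x  = np.array([0.5, -1.0])
W1 = np.array([[1.0, -1.0],
               [0.5,  2.0]])
b1 = np.array([0.0, 0.1])
w2 = np.array([1.0, -1.0])
p  = mlp_forward(x, W1, b1, w2, 0.0)   # a probability in (0, 1)
```

For regression, the outer sigmoid on the output unit would simply be dropped.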
Multilayer neural network
• Non-linearities are modeled using multiple hidden logistic regression units (organized in layers)
• The output layer determines whether it is a regression or a binary classification problem:
  regression: $f(\mathbf{x}) = f(\mathbf{x}, \mathbf{w})$
  classification (option): $f(\mathbf{x}) = p(y=1 \mid \mathbf{x}, \mathbf{w})$
Learning with MLP
• How do we learn the parameters of the neural network?
• Gradient descent algorithm. The on-line version bases each weight update on the per-example error $J_{\mathrm{online}}(D_i, \mathbf{w})$:
  $w_j \leftarrow w_j - \alpha \frac{\partial}{\partial w_j} J_{\mathrm{online}}(D_i, \mathbf{w})$
• We need to compute gradients for the weights in all units
• They can be computed in one backward sweep through the net!!!
• This process is called back-propagation
Backpropagation
[Diagram: unit $i$ on the $k$-th level receives the outputs $x_j(k-1)$ of the $(k-1)$-th level through weights $w_{j,i}(k)$, and its output $x_i(k)$ feeds units $l$ on the $(k+1)$-th level through weights $w_{i,l}(k+1)$.]

$x_i(k)$ - output of unit $i$ on level $k$
$z_i(k)$ - input to the sigmoid function of unit $i$ on level $k$
$w_{j,i}(k)$ - weight between unit $j$ on level $(k-1)$ and unit $i$ on level $k$

$z_i(k) = w_{0,i}(k) + \sum_j w_{j,i}(k)\, x_j(k-1)$
$x_i(k) = g(z_i(k))$
Backpropagation
Update the weight $w_{j,i}(k)$ using a data point $D_u = \langle \mathbf{x}, y \rangle$:
$w_{j,i}(k) \leftarrow w_{j,i}(k) - \alpha \frac{\partial}{\partial w_{j,i}(k)} J_{\mathrm{online}}(D_u, \mathbf{w})$

Let
$\delta_i(k) = \frac{\partial}{\partial z_i(k)} J_{\mathrm{online}}(D_u, \mathbf{w})$

Then:
$\frac{\partial}{\partial w_{j,i}(k)} J_{\mathrm{online}}(D_u, \mathbf{w}) = \frac{\partial J_{\mathrm{online}}(D_u, \mathbf{w})}{\partial z_i(k)} \frac{\partial z_i(k)}{\partial w_{j,i}(k)} = \delta_i(k)\, x_j(k-1)$

such that $\delta_i(k)$ is computed from $x_i(k)$ and the deltas $\delta_l(k+1)$ of the next layer:
$\delta_i(k) = \left[\sum_l \delta_l(k+1)\, w_{i,l}(k+1)\right] x_i(k)\,(1 - x_i(k))$

Last unit (the same as for the regular linear units):
$\delta_i(K) = -(y - f(\mathbf{x}, \mathbf{w}))$

It is the same for classification with the log-likelihood measure of fit and for linear regression with the least-squares error!!!
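The delta recursion can be written directly from these formulas (a sketch; the function and variable names are mine):

```python
import numpy as np

def backprop_deltas(x_out, W_next, delta_next):
    """delta_i(k) = [sum_l delta_l(k+1) w_{i,l}(k+1)] * x_i(k) * (1 - x_i(k))
    x_out:      sigmoid outputs x_i(k) of level k, shape (n_k,)
    W_next:     weights w_{i,l}(k+1), shape (n_k, n_{k+1})
    delta_next: deltas delta_l(k+1) of level k+1, shape (n_{k+1},)"""
    return (W_next @ delta_next) * x_out * (1.0 - x_out)

# last (output) unit: delta(K) = -(y - f(x, w))
y, f = 1.0, 0.8
delta_out = np.array([-(y - f)])

# one hidden unit with output 0.5, linked to the output by weight 2.0
delta_hidden = backprop_deltas(np.array([0.5]), np.array([[2.0]]), delta_out)
```

Starting from the output delta and applying this function once per layer, moving backward, yields all the deltas in a single sweep.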
Learning with MLP
• On-line gradient descent algorithm. Weight update:
$w_{j,i}(k) \leftarrow w_{j,i}(k) - \alpha \frac{\partial}{\partial w_{j,i}(k)} J_{\mathrm{online}}(D_u, \mathbf{w})$

where
$\frac{\partial}{\partial w_{j,i}(k)} J_{\mathrm{online}}(D_u, \mathbf{w}) = \frac{\partial J_{\mathrm{online}}(D_u, \mathbf{w})}{\partial z_i(k)} \frac{\partial z_i(k)}{\partial w_{j,i}(k)} = \delta_i(k)\, x_j(k-1)$

so the update becomes
$w_{j,i}(k) \leftarrow w_{j,i}(k) - \alpha\, \delta_i(k)\, x_j(k-1)$

$x_j(k-1)$ - the $j$-th output of layer $(k-1)$
$\delta_i(k)$ - a derivative computed via back-propagation
$\alpha$ - a learning rate
Online gradient descent algorithm for MLP
Online-gradient-descent (D, number of iterations)
  initialize all weights $w_{j,i}(k)$
  for i = 1 : 1 : number of iterations do
    set the learning rate $\alpha = 1/i$
    select a data point $D_u = \langle \mathbf{x}, y \rangle$ from D
    compute the outputs $x_j(k)$ for each unit
    compute the derivatives $\delta_i(k)$ via back-propagation
    update all weights (in parallel):
      $w_{j,i}(k) \leftarrow w_{j,i}(k) - \alpha\, \delta_i(k)\, x_j(k-1)$
  end for
  return weights $\mathbf{w}$
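A compact runnable sketch of this procedure, trained on the XOR problem with a small 2-hidden-unit network. The fixed learning rate (instead of the $1/i$ schedule), the seed, and the iteration count are my choices; convergence is not guaranteed and depends on the initial weights:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR data: no linear decision boundary exists
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
Y = np.array([0., 1., 1., 0.])

rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 1.0, (2, 2))   # 2 inputs -> 2 hidden units
b1 = np.zeros(2)
w2 = rng.normal(0.0, 1.0, 2)        # 2 hidden units -> 1 output
b2 = 0.0

alpha = 0.5                          # fixed rate (the slides use 1/i)
for it in range(20000):
    x, y = X[it % 4], Y[it % 4]
    # forward sweep: compute the outputs of each unit
    h = sigmoid(W1 @ x + b1)
    f = sigmoid(w2 @ h + b2)
    # backward sweep: deltas via back-propagation
    d_out = -(y - f)                       # output-unit delta
    d_hid = (w2 * d_out) * h * (1.0 - h)   # hidden-unit deltas
    # update all weights (in parallel)
    w2 -= alpha * d_out * h
    b2 -= alpha * d_out
    W1 -= alpha * np.outer(d_hid, x)
    b1 -= alpha * d_hid

preds = [sigmoid(w2 @ sigmoid(W1 @ x + b1) + b2) for x in X]
```

The deltas here follow the lecture's formulas for the log-likelihood measure of fit, which is why the output delta is simply $-(y - f)$.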
Xor Example.
[Figure: the four XOR data points plotted in the plane.]
• No linear decision boundary
Xor example. Linear unit
Xor example. Neural network with 2 hidden units
Xor example. Neural network with 10 hidden units
Problems with learning MLPs
• A decision about the number of units must be made in advance
• Converges to local optima
• Sensitive to the initial set of weights
MLP in practice
• Optical character recognition of digits on a 20x20 grid (20x20 = 400 inputs)
• Automatic sorting of mail
• A 5-layer network with multiple output functions: 10 outputs (0, 1, ..., 9)

Layer | Neurons | Weights
  5   |    10   |   3000
  4   |   300   |   1200
  3   |  1200   |  50000
  2   |   784   |   3136
  1   |  3136   |  78400