Neuro Absolutism
Mgr. Milan Lajtoš [email protected]
A comparison of various activation functions on a linearly non-separable classification problem
Dataset
classification of linearly non-separable patterns
100 randomly generated input-output pairs
xi ~ N(µ, σ²), with µ = 0 and σ² = 1
f(x1, x2) = −1 if x1·x2 > 0, +1 if x1·x2 ≤ 0
[Figure: plot of the dataset over the inputs x1 and x2, showing f(x1, x2)]
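The deck does not show the data-generation code; a minimal sketch with NumPy (the function name make_dataset is illustrative, not from the slides) could look like this:

```python
import numpy as np

def make_dataset(n=100, seed=0):
    """Generate n input-output pairs for the XOR-like target above."""
    rng = np.random.default_rng(seed)
    # Inputs drawn from a standard normal distribution N(0, 1)
    x = rng.standard_normal((n, 2))
    # Label is -1 when x1 and x2 share a sign, +1 otherwise
    y = np.where(x[:, 0] * x[:, 1] > 0, -1.0, 1.0)
    return x, y

inputs, targets = make_dataset(n=100)
```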
Network Architecture
traditional multilayer perceptron with 2 hidden layers
2 → 4 → 8 → 1
hidden layers: linear combination + activation function
output layer: linear combination only
(Bias terms are omitted.)
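The slides do not name a framework; a sketch of this 2 → 4 → 8 → 1 network in PyTorch (an assumed toolkit) might look like this:

```python
import torch.nn as nn

def make_mlp(activation=nn.Tanh):
    """2 -> 4 -> 8 -> 1 perceptron: hidden layers apply a linear combination
    followed by the given activation; the output layer is linear only."""
    return nn.Sequential(
        nn.Linear(2, 4), activation(),
        nn.Linear(4, 8), activation(),
        nn.Linear(8, 1),
    )
```

nn.Linear includes bias terms by default; they are kept here on the assumption that the slide's note only omits them from the diagram.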
Activation Functions
• Tanh: f(x) = tanh(x)
• Sigmoid: f(x) = 1 / (1 + e^(−x))
• SoftPlus: f(x) = log(1 + e^x)
• Rectified Linear Unit (ReLU): f(x) = max(x, 0)
• Abs: f(x) = |x|
• SoftSign: f(x) = x / (1 + |x|)
• LogSigmoid: f(x) = log(1 / (1 + e^(−x)))
• HardTanh: f(x) = 1 if x > 1; −1 if x < −1; x otherwise
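Most of these functions exist as ready-made PyTorch modules; Abs does not, so a small wrapper is sketched below (again assuming PyTorch, which the slides do not name):

```python
import torch
import torch.nn as nn

class Abs(nn.Module):
    """f(x) = |x|; not provided as an nn module, so defined here."""
    def forward(self, x):
        return torch.abs(x)

ACTIVATIONS = {
    "Tanh":       nn.Tanh,
    "Sigmoid":    nn.Sigmoid,
    "SoftPlus":   nn.Softplus,    # log(1 + e^x)
    "ReLU":       nn.ReLU,        # max(x, 0)
    "Abs":        Abs,
    "SoftSign":   nn.Softsign,    # x / (1 + |x|)
    "LogSigmoid": nn.LogSigmoid,  # log(1 / (1 + e^-x))
    "HardTanh":   nn.Hardtanh,    # clamps to [-1, 1]
}
```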
Training
• Stochastic Gradient Descent
• MSE criterion
• fixed learning rate at η=0.01
• no momentum
• 1000 epochs
• 100 trials
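Putting the pieces together, a sketch of one trial under this protocol (reusing the hypothetical make_dataset, make_mlp, and ACTIVATIONS from the earlier snippets; per-sample updates are an assumption about what "stochastic" means here) could be:

```python
import torch

def train_once(activation_cls, inputs, targets, epochs=1000, lr=0.01):
    """Train one network with per-sample SGD (no momentum, no lr decay),
    minimizing MSE; returns the final MSE on the training set."""
    model = make_mlp(activation_cls)
    x = torch.as_tensor(inputs, dtype=torch.float32)
    y = torch.as_tensor(targets, dtype=torch.float32).unsqueeze(1)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.0)
    criterion = torch.nn.MSELoss()
    for _ in range(epochs):
        for i in torch.randperm(len(x)).tolist():  # one update per sample
            optimizer.zero_grad()
            loss = criterion(model(x[i:i + 1]), y[i:i + 1])
            loss.backward()
            optimizer.step()
    with torch.no_grad():
        return criterion(model(x), y).item()

# One trial per activation; the slides report statistics over 100 such trials.
results = {name: train_once(cls, inputs, targets)
           for name, cls in ACTIVATIONS.items()}
```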
MSE on Training Set after 1000 epochs (100 trials; η = 0.01; no η-decay; no momentum)
[Figure: MSE per activation function (Sigmoid, ReLU, Abs, SoftPlus, Tanh, HardTanh, SoftSign, LogSigmoid), showing the best, average, and worst of the 100 trials; MSE axis runs from 0.0 to 1.2]
Results
0.00064178757283512