Neuro Absolutism


Transcript of Neuro Absolutism

Page 1: Neuro Absolutism

Neuro Absolutism

Mgr. Milan Lajtoš [email protected]

Comparison of various activation functions on problem of linearly non-separable function classification

Page 2: Neuro Absolutism

Dataset

classification of linearly non-separable patterns

100 randomly generated input-output pairs

xi ∼ N(µ, σ²), with µ = 0 and σ² = 1

f(x1, x2) = −1 if x1·x2 > 0; +1 if x1·x2 ≤ 0
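The dataset described above can be generated in a few lines of NumPy; the random seed is an assumption, not fixed by the slides:

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen for reproducibility (assumption)

# 100 input pairs, each coordinate drawn from N(0, 1)
X = rng.standard_normal((100, 2))

# target: -1 where x1 and x2 share a sign, +1 otherwise (XOR-like, linearly non-separable)
y = np.where(X[:, 0] * X[:, 1] > 0, -1.0, 1.0)
```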

Page 3: Neuro Absolutism

[Figure: the dataset plotted over x1 and x2, showing the values of f(x1, x2)]

Network Architecture

traditional multilayer perceptron with 2 hidden layers

2 → 4 → 8 → 1

hidden layers: linear combination + activation function

output layer: linear combination only

(Bias terms are omitted.)
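A forward pass through the 2 → 4 → 8 → 1 network can be sketched in NumPy as below; the concrete activation passed in (here tanh) is just one of the eight compared, and biases are omitted as on the slide:

```python
import numpy as np

def mlp_forward(x, params, act=np.tanh):
    """2 -> 4 -> 8 -> 1 MLP: hidden layers apply linear combination
    + activation; the output layer is linear only. Biases omitted."""
    W1, W2, W3 = params          # shapes (2, 4), (4, 8), (8, 1)
    h1 = act(x @ W1)
    h2 = act(h1 @ W2)
    return h2 @ W3               # linear output
```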

Page 4: Neuro Absolutism

Activation Functions

Tanh: f(x) = tanh(x)

Sigmoid: f(x) = 1 / (1 + e^(−x))

SoftPlus: f(x) = log(1 + e^x)

ReLU (Rectified Linear Unit): f(x) = max(x, 0)

Abs: f(x) = |x|

SoftSign: f(x) = x / (1 + |x|)

LogSigmoid: f(x) = log(1 / (1 + e^(−x)))

HardTanh: f(x) = 1 if x > 1; −1 if x < −1; x otherwise
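The eight activation functions above can be written compactly in NumPy, which also makes the comparison experiment easy to loop over:

```python
import numpy as np

activations = {
    "Tanh":       np.tanh,
    "Sigmoid":    lambda x: 1.0 / (1.0 + np.exp(-x)),
    "SoftPlus":   lambda x: np.log1p(np.exp(x)),          # log(1 + e^x)
    "ReLU":       lambda x: np.maximum(x, 0.0),
    "Abs":        np.abs,
    "SoftSign":   lambda x: x / (1.0 + np.abs(x)),
    "LogSigmoid": lambda x: -np.log1p(np.exp(-x)),        # log(1 / (1 + e^-x))
    "HardTanh":   lambda x: np.clip(x, -1.0, 1.0),
}
```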

Page 5: Neuro Absolutism

Training

• Stochastic Gradient Descent

• MSE criterion

• fixed learning rate η = 0.01

• no momentum

• 1000 epochs

• 100 trials
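The training setup above can be sketched end to end in NumPy: per-sample SGD on the 2 → 4 → 8 → 1 network with an MSE criterion, η = 0.01, no momentum, 1000 epochs. The tanh hidden activation, weight-initialization scale, and random seed are illustrative assumptions, not fixed by the slides:

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is an assumption

# dataset from the earlier slide
X = rng.standard_normal((100, 2))
y = np.where(X[:, 0] * X[:, 1] > 0, -1.0, 1.0).reshape(-1, 1)

# 2 -> 4 -> 8 -> 1; tanh hidden layers (one of the eight compared), linear output, no biases
W1 = rng.standard_normal((2, 4)) * 0.5
W2 = rng.standard_normal((4, 8)) * 0.5
W3 = rng.standard_normal((8, 1)) * 0.5
eta = 0.01

def forward(x):
    h1 = np.tanh(x @ W1)
    h2 = np.tanh(h1 @ W2)
    return h1, h2, h2 @ W3

def mse():
    return float(np.mean((forward(X)[2] - y) ** 2))

loss_before = mse()
for epoch in range(1000):
    for i in rng.permutation(len(X)):            # stochastic: one sample per update
        x, t = X[i:i + 1], y[i:i + 1]
        h1, h2, out = forward(x)
        d_out = 2.0 * (out - t)                  # dMSE/d(out)
        d_h2 = (d_out @ W3.T) * (1.0 - h2 ** 2)  # backprop through tanh
        d_h1 = (d_h2 @ W2.T) * (1.0 - h1 ** 2)
        W3 -= eta * h2.T @ d_out
        W2 -= eta * h1.T @ d_h2
        W1 -= eta * x.T @ d_h1
loss_after = mse()
```

Running all 100 trials then amounts to repeating this loop with fresh random weights (and a different activation) each time.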

Page 6: Neuro Absolutism

Results

MSE on Training Set after 1000 epochs (100 trials; η = 0.01; no η-decay; no momentum)

[Bar chart: best, average, and worst training-set MSE per activation function — Sigmoid, ReLU, Abs, SoftPlus, Tanh, HardTanh, SoftSign, LogSigmoid; y-axis: MSE, 0.0 to 1.2. One bar carries the value label 0.00064178757283512.]