Page 1: (Artificial) Neural Network

(Artificial) Neural Network

Putri Wikie Novianti

Reading Group

July 11, 2012

Page 2: (Artificial) Neural Network

Analogy with human brain

Page 3: (Artificial) Neural Network

[Diagram: a single artificial neuron. Inputs X1, X2, ..., XN enter with weights W1, W2, ..., WN and a bias b; the net input Y_in is passed through the activation function F to produce the output Y_out of unit Y.]

Activation functions:

1. Binary step function

2. Bipolar step function

3. Sigmoid function

4. Linear function
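
The slide only names these four functions; a minimal NumPy sketch of each (the threshold theta for the step functions is an assumed parameter, not given on the slide):

```python
import numpy as np

def binary_step(y_in, theta=0.0):
    # Output 1 if the net input reaches the threshold, otherwise 0.
    return np.where(y_in >= theta, 1, 0)

def bipolar_step(y_in, theta=0.0):
    # Output +1 if the net input reaches the threshold, otherwise -1.
    return np.where(y_in >= theta, 1, -1)

def sigmoid(y_in):
    # Smoothly squash the net input into the interval (0, 1).
    return 1.0 / (1.0 + np.exp(-y_in))

def linear(y_in):
    # Identity activation: the output equals the net input.
    return y_in
```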

Page 4: (Artificial) Neural Network
Page 5: (Artificial) Neural Network
Page 6: (Artificial) Neural Network

Perceptron

[Diagram: a single-layer perceptron. Inputs X1 and X2, together with a bias b, are connected by weights W1 and W2 to output units Y1 and Y2; each output unit passes its net input Y_in through the activation function F to produce its output Y_out.]

Initialization:

- Weights (w_i) and bias (b)
- Learning rate (α)
- Maximum number of epochs

Stopping criteria:

- No further weight changes
- Error = 0 on all training patterns
- Maximum epoch reached

Update the weights and bias whenever output unit j misclassifies training pattern k:

w_ij = w_ij + α * t_j * x_ki
b_j = b_j + α * t_j
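
A minimal sketch of the training loop this slide outlines, assuming a single output unit, bipolar inputs and targets, zero initial weights, and illustrative values for the learning rate and epoch limit:

```python
import numpy as np

def train_perceptron(X, t, alpha=0.1, max_epoch=100):
    """Perceptron with a bipolar step activation.

    X: (n_samples, n_inputs) array of training patterns
    t: (n_samples,) array of bipolar targets (+1 / -1)
    """
    n_samples, n_inputs = X.shape
    w = np.zeros(n_inputs)        # weights initialized at zero
    b = 0.0                       # bias

    for epoch in range(max_epoch):
        changed = False
        for k in range(n_samples):
            y_in = b + X[k] @ w                 # net input
            y_out = 1 if y_in >= 0 else -1      # bipolar step activation
            if y_out != t[k]:                   # update only on a misclassification
                w = w + alpha * t[k] * X[k]     # w_i = w_i + alpha * t * x_ki
                b = b + alpha * t[k]            # b = b + alpha * t
                changed = True
        if not changed:                         # stop: no weight changes (error = 0)
            break
    return w, b

# Example: learn the logical AND of two bipolar inputs
X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]])
t = np.array([1, -1, -1, -1])
w, b = train_perceptron(X, t)
```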

Pages 7-17: (Artificial) Neural Network

Backpropagation
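
The backpropagation slides themselves are not captured in the transcript. As a rough illustrative sketch only (a single hidden layer, sigmoid activations, squared-error loss, and plain gradient descent; none of these choices is taken from the slides), one training step looks like:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, t, W1, b1, W2, b2, alpha=0.1):
    """One gradient-descent step for a network with a single hidden layer."""
    # Forward pass
    z1 = W1 @ x + b1            # net input of the hidden layer
    h = sigmoid(z1)             # hidden activations
    z2 = W2 @ h + b2            # net input of the output layer
    y = sigmoid(z2)             # network output

    # Backward pass for the squared-error loss 0.5 * ||y - t||^2
    delta2 = (y - t) * y * (1 - y)            # error signal at the output units
    delta1 = (W2.T @ delta2) * h * (1 - h)    # error signal at the hidden units

    # Gradient-descent weight and bias updates
    W2 -= alpha * np.outer(delta2, h)
    b2 -= alpha * delta2
    W1 -= alpha * np.outer(delta1, x)
    b1 -= alpha * delta1
    return W1, b1, W2, b2
```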

Page 18: (Artificial) Neural Network

Some Issues in Training NN

1. Starting values

Starting values for the weights are usually chosen to be random values near zero. Hence the model starts out nearly linear and becomes nonlinear as the weights increase. Using exact zero weights leads to zero derivatives and perfect symmetry, and the algorithm never moves.

2. Overfitting

Typically we do not want the global minimizer of R(θ), as this is likely to be an overfit solution. Instead, some regularization is needed: this is achieved directly through a penalty term, or indirectly by early stopping.

R(θ): the error as a function of the complete set of weights θ
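
As a sketch of the penalty-term approach (the weight-decay penalty described in Hastie et al. [2]; the exact form of J(θ) and the tuning parameter λ are not given on the slide):

```latex
% Weight decay: minimize the penalized training criterion
% R(theta) + lambda * J(theta), where lambda >= 0 is a tuning
% parameter (larger values shrink the weights more strongly).
\[
  \min_{\theta}\; R(\theta) + \lambda J(\theta),
  \qquad
  J(\theta) = \sum_{j} \theta_j^{2}
\]
```

Early stopping achieves a similar shrinkage effect indirectly, since training is halted while the weights are still relatively small.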

Page 19: (Artificial) Neural Network

Some Issues in Training NN

Page 20: (Artificial) Neural Network

Some Issues in Training NN

Page 21: (Artificial) Neural Network

Some Issues in Training NN

3. Scaling the inputs

- Since the scaling of the inputs determines the effective scaling of the weights in the bottom layer, it can have a large effect on the quality of the final model.
- It is best to standardize all inputs to have mean 0 and standard deviation 1.
- Standardization ensures all inputs are treated equally in the regularization process and allows one to choose a meaningful range for the random starting weights.
- Typical starting weights for standardized inputs: random uniform weights over the range [-0.7, 0.7] (a minimal sketch follows).
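
A minimal sketch of this preprocessing and weight initialization, assuming a NumPy data matrix X with one row per sample (the hidden-layer size and the synthetic data are illustrative only):

```python
import numpy as np

def standardize(X):
    """Scale each input column to mean 0 and standard deviation 1."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return (X - mean) / std, mean, std

def init_weights(n_inputs, n_hidden, low=-0.7, high=0.7, rng=None):
    """Random uniform starting weights over [-0.7, 0.7] for standardized inputs."""
    rng = np.random.default_rng() if rng is None else rng
    W1 = rng.uniform(low, high, size=(n_hidden, n_inputs))   # input -> hidden weights
    b1 = rng.uniform(low, high, size=n_hidden)                # hidden-unit biases
    return W1, b1

X = 3 + 10 * np.random.randn(100, 5)       # raw, unscaled synthetic inputs
X_std, mu, sigma = standardize(X)          # standardized inputs
W1, b1 = init_weights(n_inputs=5, n_hidden=10)
```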

Page 22: (Artificial) Neural Network

Some Issues in Training NN

4. Number of hidden units and layers

- It is better to have too many hidden units than too few.
  Too few hidden units: the model lacks flexibility and finds it hard to capture nonlinearities.
  Too many hidden units: the extra weights can be shrunk towards zero when appropriate regularization is used.
- Typical number of hidden units: somewhere in the range [5, 100].

5. Multiple minima

- The error function R(θ) is non-convex and possesses many local minima.
  As a result, the final solution depends strongly on the choice of starting weights.
- Possible remedies (the first is sketched below):
  * Average the predictions over a collection of networks, trained from different random starting weights, and use this as the final prediction.
  * Average the weights themselves.
  * Bagging: average the predictions of networks trained on randomly perturbed versions of the training data.
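
A minimal sketch of the first remedy, averaging predictions over networks trained from different random starting weights (scikit-learn's MLPRegressor is used here purely for illustration; the network size and iteration count are arbitrary):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def average_networks(X_train, y_train, X_test, n_nets=10, n_hidden=10):
    """Average the predictions of several networks, each trained
    from a different random starting configuration."""
    preds = []
    for seed in range(n_nets):
        net = MLPRegressor(hidden_layer_sizes=(n_hidden,),
                           random_state=seed,     # different random start per network
                           max_iter=2000)
        net.fit(X_train, y_train)
        preds.append(net.predict(X_test))
    # Final prediction = average over the collection of networks.
    # For bagging, refit each network on a bootstrap resample of
    # (X_train, y_train) instead of the original training set.
    return np.mean(preds, axis=0)
```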

Page 23: (Artificial) Neural Network

Example: Simulated Data

Page 24: (Artificial) Neural Network

Example: Simulated Data

Page 25: (Artificial) Neural Network

Example: Simulated Data

Page 26: (Artificial) Neural Network

Example: ZIP Code Data

Page 27: (Artificial) Neural Network

Example: ZIP Code Data

Page 28: (Artificial) Neural Network

Example: ZIP Code Data

Page 29: (Artificial) Neural Network

Bayesian Neural Nets

A classification challenge was held in 2003 by the Neural Information Processing Systems (NIPS) workshop.

The winners, Neal and Zhang (2006), used a series of preprocessing and feature-selection steps, followed by Bayesian neural networks, Dirichlet diffusion trees, and combinations of these methods.

Page 30: (Artificial) Neural Network

Bayesian Neural Nets

Bayesian approach review:
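
The content of the review slide is not captured in the transcript. As a generic reminder of the Bayesian setup it refers to (a sketch only, not taken from the slides; Z denotes the training data and θ the network weights):

```latex
% Posterior over the network weights given the training data Z
\[
  p(\theta \mid Z) \;=\;
  \frac{p(Z \mid \theta)\, p(\theta)}{\int p(Z \mid \theta')\, p(\theta')\, d\theta'}
\]
% Prediction for a new input x: average the network's output
% over the posterior distribution of the weights
\[
  p(y \mid x, Z) \;=\; \int p(y \mid x, \theta)\, p(\theta \mid Z)\, d\theta
\]
```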

Page 31: (Artificial) Neural Network

Bayesian Neural Nets

Page 32: (Artificial) Neural Network

Bayesian Neural Nets

Page 33: (Artificial) Neural Network

Bayesian Neural Nets

Page 34: (Artificial) Neural Network

Bagging and Boosting Neural Nets

Page 35: (Artificial) Neural Network
Page 36: (Artificial) Neural Network

Computational considerations

Page 37: (Artificial) Neural Network

References

[1] Zhang, X. Support Vector Machine. Lecture slides, Data Mining course, Fall 2010. KAUST, KSA.

[2] Hastie, T., Tibshirani, R., Friedman, J. The Elements of Statistical Learning, second edition. New York: Springer, 2009.