(Artificial) Neural Network

Author
putriwikie 
Putri Wikie Novianti
Reading Group
July 11, 2012
Analogy with human brain
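[Figure: a single artificial neuron. Inputs X1, X2, ..., XN, weighted by W1, W2, ..., WN, and a bias b form the net input Y_in; the activation function F turns Y_in into the output Y_out of unit Y.]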
Activation functions:
1. Binary step function
2. Bipolar step function
3. Sigmoid function
4. Linear function
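A minimal sketch of the four activation functions listed above and of the neuron in the figure, assuming a threshold θ = 0 for the step functions (a net input exactly at the threshold maps to the "on" value) and the logistic form 1 / (1 + e^(-x)) for the sigmoid. The function names are illustrative, not taken from the slides.

```python
import math

def binary_step(y_in, theta=0.0):
    """1 if the net input reaches the threshold, else 0."""
    return 1 if y_in >= theta else 0

def bipolar_step(y_in, theta=0.0):
    """1 if the net input reaches the threshold, else -1."""
    return 1 if y_in >= theta else -1

def sigmoid(y_in):
    """Logistic sigmoid, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-y_in))

def linear(y_in):
    """Identity activation: the output equals the net input."""
    return y_in

def neuron_output(x, w, b, F):
    """Y_out = F(Y_in), where Y_in = b + sum_i x_i * w_i."""
    y_in = b + sum(xi * wi for xi, wi in zip(x, w))
    return F(y_in)

# Usage: the same inputs, weights and bias passed through two activations
print(neuron_output([1.0, 0.5], [0.4, -0.2], 0.1, linear))
print(neuron_output([1.0, 0.5], [0.4, -0.2], 0.1, binary_step))
```

Swapping F is the only change needed to move between the step, sigmoid and linear variants; the weighted sum Y_in is the same in every case.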
Perceptron
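[Figure: perceptron with inputs X1 and X2, weights W1 and W2, bias b, activation function F (net input Y_in, output Y_out), and output units Y1 and Y2.]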
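Initialization:
- Weights (w_i) and bias (b)
- Learning rate (α)
- Maximum number of epochs
Stopping criteria:
- Weights no longer change during an epoch
- Error = 0 on every training pattern
- Maximum number of epochs reached
Update weights and bias (only when the output y_j differs from the target t_j):
$w_{ij} \leftarrow w_{ij} + \alpha\, t_j\, x_{ki}, \qquad b_j \leftarrow b_j + \alpha\, t_j$
where $x_{ki}$ is the i-th input of training pattern k and $t_j$ is the target for output unit j.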
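A minimal sketch of the perceptron training loop above for a single output unit, assuming bipolar inputs and targets, a bipolar step activation with threshold 0, zero initial weights and bias, and α = 1. The function and parameter names are illustrative. Training stops when a full epoch produces no weight change or when the maximum epoch is reached.

```python
def bipolar_step(y_in):
    """Bipolar step activation with threshold 0 (assumed)."""
    return 1 if y_in >= 0 else -1

def train_perceptron(X, T, alpha=1.0, max_epoch=100):
    """Perceptron learning rule for one output unit with bipolar targets.

    Returns (weights, bias, epochs_run).
    """
    w = [0.0] * len(X[0])   # initial weights
    b = 0.0                 # initial bias
    for epoch in range(max_epoch):
        changed = False
        for x, t in zip(X, T):
            y_in = b + sum(wi * xi for wi, xi in zip(w, x))
            if bipolar_step(y_in) != t:                      # update only on an error
                w = [wi + alpha * t * xi for wi, xi in zip(w, x)]
                b += alpha * t
                changed = True
        if not changed:                                      # stop: no weight changed this epoch
            return w, b, epoch + 1
    return w, b, max_epoch                                   # stop: maximum epoch reached

# Usage: logical AND with bipolar inputs and targets
X = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
T = [1, -1, -1, -1]
w, b, epochs = train_perceptron(X, T)
print(w, b, epochs)  # prints [1.0, 1.0] -1.0 3
```

Because AND is linearly separable, the loop settles after the third epoch, where every pattern is classified correctly and no weight changes.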
Backpropagation
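A minimal NumPy sketch of back-propagation, assuming one hidden layer of sigmoid units, a linear output layer, the sum of squared errors as R(θ), and plain batch gradient descent. The layout (A, a0) for hidden weights and biases, (B, b0) for output weights and biases, and the names `forward`, `backprop`, `train` and `lr` are illustrative choices, not taken from the slides. The forward pass computes the hidden units and outputs; the backward pass computes output errors, sends them back through the output weights, and turns both error terms into gradients.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def forward(X, A, a0, B, b0):
    """Hidden units Z = sigmoid(X A + a0); linear outputs F = Z B + b0."""
    Z = sigmoid(X @ A + a0)
    F = Z @ B + b0
    return Z, F

def sse(X, Y, A, a0, B, b0):
    """R(theta): sum of squared errors over all samples and outputs."""
    _, F = forward(X, A, a0, B, b0)
    return float(np.sum((Y - F) ** 2))

def backprop(X, Y, A, a0, B, b0):
    """Gradients of R(theta) with respect to (A, a0, B, b0)."""
    Z, F = forward(X, A, a0, B, b0)
    delta = -2.0 * (Y - F)              # output-layer errors (identity output, derivative 1)
    s = (delta @ B.T) * Z * (1.0 - Z)   # hidden-layer errors, sigmoid'(v) = z (1 - z)
    return X.T @ s, s.sum(axis=0), Z.T @ delta, delta.sum(axis=0)

def train(X, Y, n_hidden=4, lr=0.05, epochs=2000, seed=0):
    """Batch gradient descent from small random starting weights."""
    rng = np.random.default_rng(seed)
    p, K = X.shape[1], Y.shape[1]
    params = [rng.uniform(-0.7, 0.7, (p, n_hidden)), np.zeros(n_hidden),
              rng.uniform(-0.7, 0.7, (n_hidden, K)), np.zeros(K)]
    for _ in range(epochs):
        grads = backprop(X, Y, *params)
        params = [w - lr * g for w, g in zip(params, grads)]
    return params

# Usage: fit XOR and report the training error
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
Y = np.array([[0.0], [1.0], [1.0], [0.0]])
params = train(X, Y)
print(sse(X, Y, *params))
```

A finite-difference check of `backprop` against `sse` is a cheap way to confirm that the hand-derived gradients are right before training on real data.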
Some Issues in Training NN
1. Starting values
Usually starting values for weights are chosen to be random values near zero. Since the sigmoid is roughly linear near zero, the model starts out nearly linear and becomes nonlinear as the weights increase.
Using exactly zero weights leads to zero derivatives and perfect symmetry, and the algorithm never moves.
2. Overfitting
Typically we don't want the global minimizer of R(θ), as it is likely to be an overfit solution. Instead, some regularization is needed: this is achieved directly through a penalty term, or indirectly by early stopping.
R(θ): error as a function of the complete set of weights θ
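A minimal sketch of the two regularization routes in item 2, assuming weight decay as the penalty term: R(θ) is replaced by R(θ) + λ J(θ) with J(θ) the sum of squared weights and λ (`lam`) a tuning parameter. The early-stopping rule with a `patience` window is an illustrative choice, as are all function names here.

```python
import numpy as np

def penalized_error(R, weights, lam):
    """Weight decay: R(theta) + lam * J(theta), J = sum of squared weights."""
    J = sum(float(np.sum(w ** 2)) for w in weights)
    return R + lam * J

def weight_decay_grads(grads, weights, lam):
    """The penalty adds 2 * lam * w to the gradient of each weight array."""
    return [g + 2.0 * lam * w for g, w in zip(grads, weights)]

def early_stopping_epoch(val_errors, patience=5):
    """Epoch with the lowest validation error, scanning until no improvement
    has been seen for `patience` consecutive epochs."""
    best, best_epoch = float("inf"), 0
    for epoch, err in enumerate(val_errors):
        if err < best:
            best, best_epoch = err, epoch
        elif epoch - best_epoch >= patience:
            break
    return best_epoch
```

Larger `lam` shrinks the weights more strongly toward zero; early stopping instead keeps the weights from the epoch where error on held-out data was lowest.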
3. Scaling the input
- Since the scaling of the inputs determines the effective scaling of the weights in the bottom layer, it can have a large effect on the final model.
- It is better to standardize all inputs (mean = 0 and standard deviation = 1).
- This ensures all inputs are treated equally in the regularization process and allows one to choose a meaningful range for the random starting weights.
- Typical starting weights for standardized inputs: random uniform weights over the range [-0.7, 0.7]
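A minimal NumPy sketch of item 3, assuming inputs arrive as an (N, p) array: each column is standardized, and starting weights are drawn uniformly from [-0.7, 0.7]. The function names, and leaving constant columns unscaled, are illustrative choices.

```python
import numpy as np

def standardize(X):
    """Scale each input column to mean 0 and standard deviation 1."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0   # leave constant columns unscaled to avoid dividing by zero
    return (X - mean) / std, mean, std

def init_weights(n_in, n_out, rng, limit=0.7):
    """Random uniform starting weights on [-limit, limit]."""
    return rng.uniform(-limit, limit, size=(n_in, n_out))

# Usage: two inputs on very different scales, five hidden units
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
Xs, mean, std = standardize(X)
W = init_weights(X.shape[1], 5, np.random.default_rng(0))
```

The returned `mean` and `std` should be reused to scale any later test inputs, so that training and test data share the same transformation.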
4. Number of hidden units and layers
- It is better to have too many hidden units than too few
* Too few hidden units: the model is less flexible and may fail to capture nonlinearities
* Too many hidden units: the extra weights can be shrunk toward zero if appropriate regularization is used
- Typical number of hidden units: 5 to 100
5. Multiple minima
- The error function R(θ) is nonconvex and has many local minima. As a result, the final solution depends strongly on the choice of starting weights.
- Solutions:
* Average the predictions over a collection of networks (trained from different random starting weights) and use this average as the final prediction
* Average the weights (less attractive: because the model is nonlinear, the averaged network can perform poorly)
* Bagging: average the predictions of networks trained on randomly perturbed versions of the training data
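A minimal sketch of the prediction-averaging remedies in item 5, assuming a generic `fit(X, y, rng)` that trains one model and returns a prediction function. With `bootstrap=True` each model is fitted to a bootstrap resample (bagging); with `bootstrap=False` each model sees the full data and differs only through its random draws, for example different starting weights. The least-squares `fit_linear` is a hypothetical stand-in used only to keep the example short; a neural network trainer would take its place.

```python
import numpy as np

def ensemble_predict(fit, X, y, X_new, n_models=10, bootstrap=True, seed=0):
    """Average the predictions of n_models fitted models (not their weights)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    preds = []
    for _ in range(n_models):
        # bootstrap resample (bagging) or the full training set
        idx = rng.integers(0, n, size=n) if bootstrap else np.arange(n)
        model = fit(X[idx], y[idx], rng)   # shared rng gives each fit different random draws
        preds.append(model(X_new))
    return np.mean(preds, axis=0)

def fit_linear(X, y, rng):
    """Hypothetical stand-in learner (least squares with intercept); rng is unused."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ coef

# Usage: noiseless linear data, so every member recovers y = 2x + 1
X = np.linspace(0.0, 1.0, 20)
y = 2.0 * X + 1.0
print(ensemble_predict(fit_linear, X, y, np.array([0.25, 0.5])))
```

Averaging outputs rather than weights sidesteps the nonlinearity problem: each member stays a valid fitted network, and only their predictions are combined.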
Example: Simulated Data
Example: ZIP Code Data
Bayesian Neural Nets
A classification competition was held in 2003 for a Neural Information Processing Systems (NIPS) workshop.
The winners, Neal and Zhang (2006), used a series of preprocessing and feature-selection steps, followed by Bayesian neural networks, Dirichlet diffusion trees, and combinations of these methods.
Bayesian approach review:
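For reference, the standard Bayesian formulation, in notation assumed here: θ denotes all network weights (as in R(θ)), $\mathbf{X}_{tr}, \mathbf{y}_{tr}$ the training inputs and outputs, and $\Pr(\theta)$ a prior on the weights. The posterior combines prior and likelihood, and predictions average over it:

$$\Pr(\theta \mid \mathbf{X}_{tr}, \mathbf{y}_{tr}) = \frac{\Pr(\theta)\,\Pr(\mathbf{y}_{tr} \mid \mathbf{X}_{tr}, \theta)}{\int \Pr(\theta)\,\Pr(\mathbf{y}_{tr} \mid \mathbf{X}_{tr}, \theta)\, d\theta}$$

$$\Pr(Y_{new} \mid X_{new}, \mathbf{X}_{tr}, \mathbf{y}_{tr}) = \int \Pr(Y_{new} \mid X_{new}, \theta)\,\Pr(\theta \mid \mathbf{X}_{tr}, \mathbf{y}_{tr})\, d\theta$$

In practice the integral is usually approximated by averaging over samples $\theta^{(1)}, \dots, \theta^{(L)}$ drawn from the posterior, for example by Markov chain Monte Carlo:

$$\Pr(Y_{new} \mid X_{new}, \mathbf{X}_{tr}, \mathbf{y}_{tr}) \approx \frac{1}{L}\sum_{l=1}^{L} \Pr(Y_{new} \mid X_{new}, \theta^{(l)})$$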
Bagging and Boosting Neural Nets
Computational considerations