On-Line Learning with Recycled Examples: A Cavity Analysis
Peixun Luo and K. Y. Michael Wong
Hong Kong University of Science and Technology
Formulation
Inputs: $\xi_j$, $j = 1, \dots, N$
Weights: $J_j$, $j = 1, \dots, N$
Activation: $y = \mathbf{J} \cdot \boldsymbol{\xi}$
Output: $S = f(y)$
The Learning of a Task
Given $p = \alpha N$ examples with
inputs: $\xi_j^\mu$, $j = 1, \dots, N$, $\mu = 1, \dots, p$
outputs: $y^\mu$, generated by a teacher network
Learning is done by defining a risk function and minimizing it by gradient descent.
Learning Dynamics
Define a cost function in terms of the examples: $E = \sum_\mu E_\mu$ + regularization terms.
On-line learning: at time $t$, draw an example $\mu(t)$ and update
$\Delta J_j \sim$ gradient with respect to example $\mu(t)$ + weight decay.
Batch learning: at time $t$, update
$\Delta J_j \sim$ average gradient with respect to all examples + weight decay.
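The two update rules can be sketched for a linear student learning from a linear teacher. This is a minimal illustration and not the authors' code: the quadratic cost, the $1/\sqrt{N}$ scaling of activations, the $1/N$ scaling of the weight decay, and all names (`eta`, `lam`, `targets`, `run`) are assumptions made for the example.

```python
import numpy as np

def activations(J, xi):
    """Activations of all examples (assumed scaling: J . xi / sqrt(N))."""
    return xi @ J / np.sqrt(J.size)

def training_error(J, xi, targets):
    """Assumed quadratic cost, averaged over the training examples."""
    return 0.5 * np.mean((targets - activations(J, xi)) ** 2)

def online_step(J, xi, targets, eta, lam, rng):
    """Gradient step on one example drawn with replacement, plus weight decay."""
    N = J.size
    mu = rng.integers(len(targets))          # the example drawn at this step
    delta = targets[mu] - xi[mu] @ J / np.sqrt(N)
    return J + eta * (delta * xi[mu] / np.sqrt(N) - lam * J / N)

def batch_step(J, xi, targets, eta, lam):
    """Gradient step averaged over all examples, plus weight decay."""
    N = J.size
    delta = targets - activations(J, xi)
    avg_grad = xi.T @ delta / (len(targets) * np.sqrt(N))
    return J + eta * (avg_grad - lam * J / N)

def run(N=50, alpha=2.0, eta=0.5, lam=0.01, t_max=20, seed=0):
    """Train both ways on the same p = alpha * N examples; return errors."""
    rng = np.random.default_rng(seed)
    p = int(alpha * N)
    xi = rng.standard_normal((p, N))
    teacher = rng.standard_normal(N)
    targets = activations(teacher, xi)
    J_on, J_batch = np.zeros(N), np.zeros(N)
    e0 = training_error(J_on, xi, targets)
    for _ in range(int(t_max * N)):          # N steps per unit of time
        J_on = online_step(J_on, xi, targets, eta, lam, rng)
        J_batch = batch_step(J_batch, xi, targets, eta, lam)
    return e0, training_error(J_on, xi, targets), training_error(J_batch, xi, targets)

if __name__ == "__main__":
    e0, e_online, e_batch = run()
    print(f"initial {e0:.4f}  on-line {e_online:.4f}  batch {e_batch:.4f}")
```

In this sketch a single on-line step changes the activation of the drawn example by an amount of order `eta`, while a batch step changes every activation by an amount of order `eta / p`.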
Batch vs On-line
Batch learning                           | On-line learning
Same batch of examples for all steps     | An independent example per step
Simple dynamics: no sequence dependence  | Complex dynamics: sequence dependence
Small stepwise changes of examples       | Giant boosts of examples stepwise
Previous analysis: possible              | Previous analysis: limited to infinite sets
Stable but inefficient                   | Efficient but less stable
The Cavity Method
The cavity method has been applied to many complex systems.
It has been applied to steady-state properties of learning.
It uses a self-consistency argument to consider what happens when a set of p examples is expanded to p + 1 examples.
The central quantity is the cavity activation, which is the activation of example 0 in a network which learns examples 1 to p (but never learns example 0).
Since the original network has no information about example 0, the cavity activation obeys a random distribution (e.g. a Gaussian).
Now suppose the network incorporates example 0 at time s. The activation is no longer random.
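The idea of a cavity activation can be illustrated numerically. The sketch below assumes a linear student and teacher, a quadratic cost, no weight decay, and activations scaled by $1/\sqrt{N}$; none of these choices, nor the names used, come from the slides. It trains on examples 1 to p only, measures the activation of inputs the network has never seen, and then shows the deterministic boost produced by learning one such input once.

```python
import numpy as np

def train_online(xi, targets, eta, steps, rng):
    """On-line learning with recycled examples (no weight decay, for simplicity)."""
    p, N = xi.shape
    J = np.zeros(N)
    for _ in range(steps):
        mu = rng.integers(p)
        delta = targets[mu] - xi[mu] @ J / np.sqrt(N)
        J += eta * delta * xi[mu] / np.sqrt(N)
    return J

def cavity_demo(N=100, alpha=1.5, eta=0.3, t=5.0, n_probe=2000, seed=1):
    rng = np.random.default_rng(seed)
    p = int(alpha * N)
    teacher = rng.standard_normal(N)
    xi = rng.standard_normal((p, N))
    targets = xi @ teacher / np.sqrt(N)
    J = train_online(xi, targets, eta, int(t * N), rng)

    # Cavity activations: inputs that were never learned by the network.
    probes = rng.standard_normal((n_probe, N))
    h = probes @ J / np.sqrt(N)

    # Let the network learn one such input ("example 0") exactly once.
    xi0 = probes[0]
    delta0 = xi0 @ teacher / np.sqrt(N) - h[0]
    J_new = J + eta * delta0 * xi0 / np.sqrt(N)
    boost = xi0 @ J_new / np.sqrt(N) - h[0]
    predicted = eta * delta0 * (xi0 @ xi0) / N   # exact for this linear rule
    return h, J, boost, predicted

if __name__ == "__main__":
    h, J, boost, predicted = cavity_demo()
    print("cavity mean", h.mean(), "variance", h.var(), "|J|^2/N", J @ J / J.size)
    print("boost", boost, "predicted", predicted)
```

For Gaussian inputs the cavity activations in this sketch have zero mean and variance $|\mathbf{J}|^2/N$, whereas the activation of an example that has been learned is shifted by a boost that depends on its own error.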
Linear Response
The cavity activation diffuses randomly.
The generic activation, receiving a stimulus at time s, is no longer random.
The background examples also adjust due to the newcomer.
Assuming that the background adjustments are small, we can use linear response theory to superpose the effects due to all previous times s.
Useful Equations
For batch learning:
Generic activation of an example at time t = cavity activation of the example at time t + integral over s of (Green's function from time s to t) × (gradient term at time s).
For on-line learning:
Generic activation of an example at time t = cavity activation of the example at time t + sum over learning instants s of (Green's function from time s to t) × (gradient term at time s).
The learning instants s are Poisson distributed.
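Written symbolically, the two relations take the following schematic form. The symbols are assumptions introduced here for illustration and are not taken from the slides: $x_\mu(t)$ is the generic activation of example $\mu$, $h_\mu(t)$ its cavity activation, $G(t, s)$ the Green's function from time $s$ to $t$, and $F_\mu(s)$ the gradient term at time $s$. Any constant prefactors are taken to be absorbed into $G$.

```latex
% Batch learning: the stimulus acts continuously at all earlier times s.
x_\mu(t) = h_\mu(t) + \int_0^t \mathrm{d}s \, G(t, s) \, F_\mu(s)

% On-line learning: the stimulus acts only at the discrete instants s_1, s_2, ...
% at which example \mu is drawn; these instants are Poisson distributed.
x_\mu(t) = h_\mu(t) + \sum_{s_k < t} G(t, s_k) \, F_\mu(s_k)
```

The only structural difference is that the integral over a continuous stimulus is replaced by a sum over randomly timed kicks, which is the origin of the giant boosts in the on-line case.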
Simulation Results
[Figure: generic activation (with giant boosts) and cavity activation. Line: cavity activation from theory. Dots: simulation with example removed.]
Further Development
[Figure: training error and generalization error.]
Critical Learning Rate (1)
Theory and simulations agree.
Critical Learning Rate (2)
Theory and simulations agree.
Average Learning
Theory and simulations agree.
Conclusion
We have analysed the dynamics of on-line learning with recycled examples using the cavity approach.
The theory reproduces the Poisson-distributed giant boosts of the activations during learning.
Theory and simulations agree well on: the evolution of the training and generalization errors; the critical learning rate at which learning diverges; and the performance of average learning.
Future work: to develop a Monte Carlo sampling procedure for multilayer networks.