• Date posted: 22-Feb-2016

On-Line Learning with Recycled Examples: A Cavity Analysis. Peixun Luo and K. Y. Michael Wong, Hong Kong University of Science and Technology. Formulation. Inputs: ξ_j, j = 1, ..., N. Weights: J_j, j = 1, ..., N. Activation: y = J · ξ. Output: S = f(y).

### Transcript of On-Line Learning with Recycled Examples: A Cavity Analysis

• On-Line Learning with Recycled Examples: A Cavity Analysis

Peixun Luo and K. Y. Michael Wong

Hong Kong University of Science and Technology

• Formulation

Inputs: $\xi_j$, $j = 1, \ldots, N$

Weights: $J_j$, $j = 1, \ldots, N$

Activation: $y = \mathbf{J} \cdot \boldsymbol{\xi}$

Output: $S = f(y)$
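A minimal Python sketch of the formulation above. The choice of `tanh` for the output function `f`, the Gaussian weights and the binary inputs are illustrative assumptions; the slides do not fix any of them.

```python
import numpy as np

def activation(J, xi):
    """Activation y = J . xi for a weight vector J and an input vector xi."""
    return float(np.dot(J, xi))

def output(J, xi, f=np.tanh):
    """Output S = f(y). tanh is only an illustrative choice of f."""
    return float(f(activation(J, xi)))

rng = np.random.default_rng(0)
N = 5
J = rng.normal(size=N) / np.sqrt(N)    # hypothetical weight scaling
xi = rng.choice([-1.0, 1.0], size=N)   # hypothetical binary inputs
y = activation(J, xi)
S = output(J, xi)
```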

• The Learning of a Task

Given $p = \alpha N$ examples with

inputs: $\xi_j^\mu$, $j = 1, \ldots, N$, $\mu = 1, \ldots, p$

outputs: $y^\mu$, generated by a teacher network

Learning is done by defining a risk function and minimizing it by gradient descent.

• Learning Dynamics

Define a cost function in terms of the examples: $E = \sum_\mu E_\mu$ + regularization terms.

On-line learning: at time $t$, draw an example and update $\Delta J_j \sim$ gradient with respect to the drawn example + weight decay.

Batch learning: at time $t$, update $\Delta J_j \sim$ average gradient with respect to all examples + weight decay.
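The two update rules can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' setup: it assumes a linear student, a quadratic per-example cost $\frac{1}{2}(\text{teacher output} - \mathbf{J} \cdot \boldsymbol{\xi})^2$, a linear teacher with hypothetical weights `B`, and inputs scaled so that each example has squared norm close to 1. The names `eta` (learning rate) and `lam` (weight decay) are also hypothetical.

```python
import numpy as np

def online_step(J, xi, y_teacher, eta, lam, rng):
    """One on-line step: gradient of a single randomly drawn example + weight decay."""
    mu = rng.integers(len(xi))               # recycle: draw from the fixed example set
    err = y_teacher[mu] - xi[mu] @ J
    return J + eta * (err * xi[mu] - lam * J)

def batch_step(J, xi, y_teacher, eta, lam):
    """One batch step: gradient averaged over all examples + weight decay."""
    err = y_teacher - xi @ J
    avg_grad = xi.T @ err / len(xi)
    return J + eta * (avg_grad - lam * J)

def training_error(J, xi, y_teacher):
    """Mean quadratic cost over the training set."""
    return 0.5 * np.mean((y_teacher - xi @ J) ** 2)

rng = np.random.default_rng(0)
N, alpha = 50, 2.0
p = int(alpha * N)
xi = rng.normal(size=(p, N)) / np.sqrt(N)    # assumed input scaling
B = rng.normal(size=N)                       # hypothetical teacher weights
y_teacher = xi @ B

J_online = np.zeros(N)
J_batch = np.zeros(N)
E_start = training_error(J_online, xi, y_teacher)
for _ in range(2000):
    J_online = online_step(J_online, xi, y_teacher, eta=0.5, lam=1e-4, rng=rng)
    J_batch = batch_step(J_batch, xi, y_teacher, eta=0.5, lam=1e-4)

E_online = training_error(J_online, xi, y_teacher)
E_batch = training_error(J_batch, xi, y_teacher)
```

Both loops run the same number of steps, which is not the same amount of computation: a batch step touches all `p` examples, an on-line step only one.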

• Batch vs On-line

| Batch learning | On-line learning |
| --- | --- |
| Same batch of examples for all steps | An independent example per step |
| Simple dynamics: no sequence dependence | Complex dynamics: sequence dependence |
| Small stepwise changes of examples | Giant boosts of examples stepwise |
| Previous analysis: possible | Previous analysis: limited to infinite sets |
| Stable but inefficient | Efficient but less stable |

• The Cavity Method

The cavity method has been applied to many complex systems.

It has been applied to steady-state properties of learning.

It uses a self-consistency argument to consider what happens when a set of $p$ examples is expanded to $p + 1$ examples.

The central quantity is the cavity activation, which is the activation of example 0 in a network which learns examples 1 to $p$ (but never learns example 0).

Since the original network has no information about example 0, the cavity activation obeys a random distribution (e.g. a Gaussian).

Now suppose the network incorporates example 0 at time $s$. The activation is no longer random.
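The definition of the cavity activation can be illustrated numerically. The sketch below assumes a linear student, a quadratic cost, a linear teacher with hypothetical weights `B`, and $\alpha = 0.5$; none of these choices come from the slides. One network learns only the background examples 1 to $p$, so its activation on example 0 is the cavity activation. A second network learns all $p + 1$ examples, so its activation on example 0 is the generic activation.

```python
import numpy as np

def train_online(xi, y_teacher, eta, steps, seed):
    """On-line gradient descent on a fixed (recycled) example set, linear student."""
    rng = np.random.default_rng(seed)
    J = np.zeros(xi.shape[1])
    for _ in range(steps):
        mu = rng.integers(len(xi))
        J += eta * (y_teacher[mu] - xi[mu] @ J) * xi[mu]
    return J

rng = np.random.default_rng(1)
N, alpha = 50, 0.5
p = int(alpha * N)
xi = rng.normal(size=(p + 1, N)) / np.sqrt(N)   # row 0 is example 0, rows 1..p are background
B = rng.normal(size=N)                          # hypothetical teacher weights
y_teacher = xi @ B

# Network that never sees example 0.
J_cavity = train_online(xi[1:], y_teacher[1:], eta=0.5, steps=10000, seed=2)
# Network that learns example 0 together with the background examples.
J_generic = train_online(xi, y_teacher, eta=0.5, steps=10000, seed=2)

cavity_activation = xi[0] @ J_cavity
generic_activation = xi[0] @ J_generic
```

In this toy setting the generic activation is pulled toward the teacher output of example 0, while the cavity activation carries no information about that example.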

• Linear Response

The cavity activation diffuses randomly.

The generic activation, receiving a stimulus at time $s$, is no longer random.

The background examples also adjust due to the newcomer.

Assuming that the background adjustments are small, we can use linear response theory to superpose the effects due to all previous times $s$.

• Useful Equations

For batch learning:

Generic activation of an example at time $t$ = cavity activation of the example at time $t$ + integral over $s$ of (Green's function from time $s$ to $t$) × (gradient term at time $s$).

For on-line learning:

Generic activation of an example at time $t$ = cavity activation of the example at time $t$ + sum over $s$ of (Green's function from time $s$ to $t$) × (gradient term at time $s$).

The learning instants $s$ are Poisson distributed.
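In symbols, the two relations might be written schematically as below. The notation is an assumption introduced here for illustration, not taken from the slides: $x(t)$ is the generic activation, $h(t)$ the cavity activation, $G(t, s)$ the Green's function from time $s$ to $t$, and $g(s)$ the gradient term at time $s$, with any learning-rate prefactor absorbed into $g$.

```latex
% Batch learning: the example is learned continuously in time.
x(t) = h(t) + \int_0^t ds \, G(t, s) \, g(s)

% On-line learning: the example is learned only at its learning instants s_1, s_2, \ldots < t.
x(t) = h(t) + \sum_{s_k < t} G(t, s_k) \, g(s_k)
```

The sum in the on-line case runs over the instants at which this particular example is drawn, which is where the Poisson statistics of the learning instants enter.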

• Simulation Results

[Figure: generic activation (with giant boosts) and cavity activation. Line: cavity activation from theory. Dots: simulation with the example removed.]

• Further Development

[Figure panels: training error; generalization error.]

• Critical Learning Rate (1): theory and simulations agree!

• Critical Learning Rate (2): theory and simulations agree!

• Average Learning: theory and simulations agree!

• Conclusion

We have analysed the dynamics of on-line learning with recycled examples using the cavity approach.

Theory is able to reproduce the Poisson-distributed giant boosts of the activations during learning.

Theory and simulations agree well on: the evolution of the training and generalization errors, the critical learning rate at which learning diverges, and the performance of average learning.

Future: to develop a Monte Carlo sampling procedure for multilayer networks.