On-Line Learning with Recycled Examples: A Cavity Analysis

1

Peixun Luo and K. Y. Michael Wong

Hong Kong University of Science and Technology

2

Inputs ξ_j, j = 1, …, N

Weights J_j, j = 1, …, N

Activation y = J · ξ

Output S = f(y)

Formulation

[Figure: diagram labelled with weights J_j and activation y]

3

Given p = αN examples with

inputs ξ_j^μ, j = 1, …, N, μ = 1, …, p

outputs y^μ generated by a teacher network

Learning is done by defining a risk function and minimizing it by gradient descent

The Learning of a Task

[Figure: diagram labelled with weights J_j and activation y]

4

Define a cost function in terms of the examples:
E = Σ_μ E_μ + regularization terms

On-line learning: at time t, draw an example σ(t), and
ΔJ_j ~ gradient with respect to σ(t) + weight decay

Batch learning: at time t,
ΔJ_j ~ average gradient with respect to all examples + weight decay

Learning Dynamics
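The two update rules on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes a linear teacher with hypothetical weights `B`, a quadratic per-example cost E_μ = (y^μ − J · ξ^μ)² / 2, Gaussian inputs of variance 1/N, and the made-up parameter names `eta` (learning rate) and `decay` (weight-decay strength).

```python
import numpy as np

def make_task(N=50, alpha=2.0, seed=0):
    """Draw p = alpha*N random inputs and label them with a linear teacher."""
    rng = np.random.default_rng(seed)
    p = int(alpha * N)
    xi = rng.standard_normal((p, N)) / np.sqrt(N)  # inputs xi[mu, j], variance 1/N (assumption)
    B = rng.standard_normal(N)                     # hypothetical teacher weights
    y = xi @ B                                     # teacher outputs y[mu]
    return xi, y, B

def training_error(J, xi, y):
    """Mean of the assumed quadratic cost E_mu over the training set."""
    return 0.5 * np.mean((y - xi @ J) ** 2)

def batch_step(J, xi, y, eta=0.5, decay=0.01):
    """Step along the gradient averaged over all p examples, plus weight decay."""
    err = y - xi @ J
    grad = -(xi.T @ err) / len(y)
    return J - eta * (grad + decay * J)

def online_step(J, xi, y, rng, eta=0.5, decay=0.01):
    """Draw one example sigma from the same finite set (recycling) and step on it."""
    sigma = rng.integers(len(y))
    err = y[sigma] - xi[sigma] @ J
    grad = -err * xi[sigma]
    return J - eta * (grad + decay * J)

if __name__ == "__main__":
    xi, y, B = make_task()
    rng = np.random.default_rng(1)
    J_batch = np.zeros(xi.shape[1])
    J_online = np.zeros(xi.shape[1])
    for _ in range(2000):
        J_batch = batch_step(J_batch, xi, y)
        J_online = online_step(J_online, xi, y, rng)
    print("batch training error  :", training_error(J_batch, xi, y))
    print("on-line training error:", training_error(J_online, xi, y))
```

The on-line rule touches a single example per step, so its trajectory depends on the order in which examples are drawn, while the batch rule sees the same averaged gradient whatever the ordering.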

5

Batch vs On-line

Batch learning                           | On-line learning
Same batch of examples for all steps     | An independent example per step
Simple dynamics, no sequence dependence  | Complex dynamics, sequence dependence
Small stepwise changes of examples       | Giant boosts of examples stepwise
Previous analysis possible               | Previous analysis limited to infinite sets
Stable but inefficient                   | Efficient but less stable

6

It has been applied to many complex systems

It has been applied to steady-state properties of learning

It uses a self-consistency argument to consider what happens when a set of p examples is expanded to p + 1 examples

The central quantity is the cavity activation, which is the activation of example 0 in a network which learns examples 1 to p (but never learns example 0)

Since the original network has no information about example 0, the cavity activation obeys a random distribution (e.g. a Gaussian)

Now suppose the network incorporates example 0 at time s. The activation is no longer random

The Cavity Method
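A small numerical check of the cavity idea, under stated assumptions: the learned network is replaced by a least-squares fit to examples 1 to p (a stand-in, not the dynamics analysed in the talk), the teacher is linear with hypothetical weights `B` and additive output noise, and inputs are Gaussian with variance 1/N. Examples the network never learned should then give activations that are Gaussian with zero mean and variance |J|²/N, while the learned examples are fitted more closely than fresh ones.

```python
import numpy as np

def cavity_demo(N=50, alpha=2.0, noise=0.5, n_fresh=20000, seed=0):
    rng = np.random.default_rng(seed)
    p = int(alpha * N)
    B = rng.standard_normal(N)                      # hypothetical teacher weights
    xi = rng.standard_normal((p, N)) / np.sqrt(N)   # examples 1..p
    y = xi @ B + noise * rng.standard_normal(p)     # noisy teacher outputs (assumption)

    # Stand-in for the trained network: least-squares fit to examples 1..p.
    J, *_ = np.linalg.lstsq(xi, y, rcond=None)

    # Many independent copies of "example 0", none of which was ever learned.
    xi0 = rng.standard_normal((n_fresh, N)) / np.sqrt(N)
    y0 = xi0 @ B + noise * rng.standard_normal(n_fresh)
    h = xi0 @ J                                     # cavity activations

    return {
        "cavity_mean": h.mean(),
        "cavity_var": h.var(),
        "predicted_var": J @ J / N,                 # variance of J . xi0 when J is independent of xi0
        "train_mse": np.mean((y - xi @ J) ** 2),    # error on learned examples
        "fresh_mse": np.mean((y0 - h) ** 2),        # error on never-learned examples
    }

if __name__ == "__main__":
    for name, value in cavity_demo().items():
        print(f"{name:>14s}: {value:.4f}")
```

The gap between `train_mse` and `fresh_mse` is the numerical trace of the statement that an activation stops being random once its example has been incorporated.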

7

The cavity activation diffuses randomly

The generic activation, receiving a stimulus at time s, is no longer random

The background examples also adjust due to the newcomer

Assuming that the background adjustments are small, we can use linear response theory to superpose the effects due to all previous times s

Linear Response

[Figure: X(t) and h(t) (random diffusion) plotted against time, with the stimulation time s marked]

8

For batch learning:
Generic activation of an example at time t
= cavity activation of the example at time t
+ integral over s of (Green's function from time s to t) × (gradient term at time s)

For on-line learning:
Generic activation of an example at time t
= cavity activation of the example at time t
+ sum over s of (Green's function from time s to t) × (gradient term at time s)
The learning instants s are Poisson distributed

Useful Equations
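Written symbolically, the two statements above take the following schematic form. The notation is an assumption made for illustration: X(t) and h(t) are borrowed from the labels on the Linear Response figure and are taken here to mean the generic and cavity activations, G(t, s) stands for the Green's function, F(s) for the gradient term, and s_k for the learning instants. Prefactors and the explicit form of F are not given on the slide and are left out.

```latex
% Batch learning: the stimulus acts continuously over all earlier times s
X(t) = h(t) + \int_{0}^{t} \mathrm{d}s \; G(t,s)\, F(s)

% On-line learning: the stimulus acts only at the Poisson-distributed
% learning instants s_k at which the example is drawn
X(t) = h(t) + \sum_{k \,:\, s_k < t} G(t,s_k)\, F(s_k)
```

The only structural difference between the two lines is the replacement of the time integral by a sum over discrete learning instants.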

9

Simulation Results

generic activation (with giant boosts)

(line) cavity activation from theory

(dots) simulation with example removed

learning instants (Poisson distributed)

theory and simulations agree
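Why the learning instants are Poisson distributed can be checked directly. In the sketch below each step draws one of the p = αN examples uniformly at random, and one unit of time is taken to be N steps; that time convention, like the function and parameter names, is an assumption for illustration. Over a window of length T a given example is then drawn Binomial(NT, 1/p) times, which for large N approaches a Poisson distribution with mean T/α.

```python
import numpy as np

def learning_instant_counts(N=2000, alpha=2.0, T=4.0, seed=0):
    """Count how often each of the p examples is drawn during T time units."""
    rng = np.random.default_rng(seed)
    p = int(alpha * N)
    steps = int(T * N)                      # one time unit = N steps (assumed convention)
    draws = rng.integers(p, size=steps)     # index of the example learned at each step
    return np.bincount(draws, minlength=p)

if __name__ == "__main__":
    alpha, T = 2.0, 4.0
    counts = learning_instant_counts(alpha=alpha, T=T)
    print("mean number of learning instants:", counts.mean(), "(expected T/alpha =", T / alpha, ")")
    print("variance:", counts.var(), "(a Poisson law has variance equal to its mean)")
```

Each learning instant is a moment at which the example's activation receives a boost, so the boosts arrive at random, Poisson-spaced times.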

10

Further Development

[Figure: training error and generalization error]


11

Critical Learning Rate (1)


critical learning rate at which learning diverges

other approximations

12





13

Average Learning


generalization error drops when the dynamics is averaged over monitoring periods
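A hedged sketch of average learning follows. It interprets "averaged over monitoring periods" as averaging the weight vector J over a window of on-line steps, which is an assumption about the slide's meaning. The setting is again illustrative: a linear teacher with hypothetical weights `B`, noisy outputs, Gaussian inputs of variance 1/N, and generalization error |J − B|² / (2N). Because this error is a convex function of J, the error of the averaged weights can never exceed the average of the instantaneous errors over the same period.

```python
import numpy as np

def average_learning(N=50, alpha=2.0, eta=0.5, noise=0.5,
                     burn_in=5000, period=5000, seed=0):
    rng = np.random.default_rng(seed)
    p = int(alpha * N)
    xi = rng.standard_normal((p, N)) / np.sqrt(N)   # recycled training inputs
    B = rng.standard_normal(N)                      # hypothetical teacher weights
    y = xi @ B + noise * rng.standard_normal(p)     # noisy teacher outputs (assumption)

    J = np.zeros(N)
    J_sum = np.zeros(N)
    eg_sum = 0.0
    for step in range(burn_in + period):
        mu = rng.integers(p)                        # recycle an example from the finite set
        J += eta * (y[mu] - J @ xi[mu]) * xi[mu]    # on-line gradient step, no weight decay
        if step >= burn_in:                         # monitoring period
            J_sum += J
            eg_sum += 0.5 * np.sum((J - B) ** 2) / N

    J_avg = J_sum / period
    eg_of_averaged_weights = 0.5 * np.sum((J_avg - B) ** 2) / N
    mean_instantaneous_eg = eg_sum / period
    return eg_of_averaged_weights, mean_instantaneous_eg

if __name__ == "__main__":
    eg_avg, eg_inst = average_learning()
    print("generalization error of averaged weights  :", eg_avg)
    print("mean instantaneous generalization error   :", eg_inst)
```

The size of the improvement depends on `eta`, `noise` and the length of the monitoring period; the sketch makes no claim about reproducing the curves in the talk.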

14

We have analysed the dynamics of on-line learning with recycled examples using the cavity approach

Theory is able to reproduce the Poisson-distributed giant boosts of the activations during learning

Theory and simulations agree well on:
the evolution of the training and generalization errors
the critical learning rate at which learning diverges
the performance of average learning

Future work: to develop a Monte Carlo sampling procedure for multilayer networks

Conclusion
