1
On-Line Learning with Recycled Examples
A Cavity Analysis
Peixun Luo and K Y Michael Wong
Hong Kong University of Science and Technology
2
Inputs ξj, j = 1 … N
Weights Jj, j = 1 … N
Activation y = J·ξ
Output S = f(y)
Formulation
[Diagram: single-layer network with weights Jj producing activation y]
3
Given p = αN examples with
inputs ξjμ, j = 1 … N, μ = 1 … p,
outputs yμ generated by a teacher network.
Learning is done by defining a risk function and minimizing it by gradient descent.
The Learning of a Task
[Diagram: student network with weights Jj and activation y]
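The teacher-student setup above can be sketched numerically. This is a minimal illustration assuming a linear teacher; the values of N and α are illustrative, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

N = 100               # input dimension (illustrative)
alpha = 2.0           # load parameter (illustrative)
p = int(alpha * N)    # number of examples, p = alpha N

# Teacher network: a fixed weight vector B generates the target outputs
B = rng.standard_normal(N) / np.sqrt(N)

xi = rng.standard_normal((p, N))   # inputs xi_j^mu, mu = 1 .. p
y = xi @ B                         # outputs y^mu from the teacher

# A student with weights J computes an activation J . xi for each example
J = rng.standard_normal(N) / np.sqrt(N)
activations = xi @ J
```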
4
Define a cost function in terms of the examples: E = Σμ Eμ + regularization terms.
On-line learning: at time t, draw an example σ(t), and ΔJj ~ gradient with respect to σ(t) + weight decay.
Batch learning: at time t, ΔJj ~ average gradient with respect to all examples + weight decay.
Learning Dynamics
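The two update rules can be contrasted in code. This is a schematic sketch for a linear student with quadratic per-example cost Eμ = (yμ − J·ξμ)²/2; the learning rate eta and weight-decay strength lam are illustrative choices, not values from the slides:

```python
import numpy as np

def online_step(J, xi, y, eta, lam, rng):
    """On-line: draw one example sigma(t) and follow its gradient."""
    mu = rng.integers(len(y))                  # random example sigma(t)
    grad = (y[mu] - xi[mu] @ J) * xi[mu]       # gradient for that example only
    return J + eta * (grad - lam * J)          # gradient step + weight decay

def batch_step(J, xi, y, eta, lam):
    """Batch: average the gradient over all p examples at each step."""
    grad = ((y - xi @ J) @ xi) / len(y)        # average gradient
    return J + eta * (grad - lam * J)
```

An on-line step gives the single drawn example a large one-shot correction, while a batch step spreads small changes over all examples, which is the contrast drawn on the next slide.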
5
Batch vs On-line
Batch learning:
- Same batch of examples for all steps
- Simple dynamics: no sequence dependence
- Small stepwise changes of examples
- Previous analysis possible
- Stable but inefficient
On-line learning:
- An independent example per step
- Complex dynamics: sequence dependence
- Giant stepwise boosts of examples
- Previous analysis limited to infinite example sets
- Efficient but less stable
6
The cavity method has been applied to many complex systems, and to the steady-state properties of learning. It uses a self-consistency argument: consider what happens when a set of p examples is expanded to p + 1 examples.
The central quantity is the cavity activation: the activation of example 0 in a network that learns examples 1 to p (but never learns example 0).
Since the original network has no information about example 0, the cavity activation obeys a random distribution (e.g. a Gaussian).
Now suppose the network incorporates example 0 at time s. The activation is no longer random.
The Cavity Method
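The claim that the cavity activation is (approximately) Gaussian can be checked numerically. The sketch below uses the ridge-regression fixed point as a stand-in for the converged student weights (an illustrative assumption, not the slides' gradient-descent dynamics): train on examples 1 to p, then record the activation of the held-out example 0 across many independent trials.

```python
import numpy as np

rng = np.random.default_rng(1)
N, p, trials = 50, 100, 2000
lam = 0.1   # weight-decay (ridge) strength, illustrative

cavity = np.empty(trials)
for t in range(trials):
    xi = rng.standard_normal((p + 1, N))        # examples 0 .. p
    B = rng.standard_normal(N) / np.sqrt(N)     # teacher weights
    y = xi @ B
    # Train on examples 1..p only; example 0 never enters the training set
    X, targets = xi[1:], y[1:]
    J = np.linalg.solve(X.T @ X / p + lam * np.eye(N), X.T @ targets / p)
    cavity[t] = xi[0] @ J                       # cavity activation of example 0

# The empirical distribution of `cavity` is close to Gaussian
skew = np.mean((cavity - cavity.mean()) ** 3) / cavity.std() ** 3
```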
7
The cavity activation diffuses randomly. The generic activation, after receiving a stimulus at time s, is no longer random. The background examples also adjust due to the newcomer. Assuming that the background adjustments are small, we can use linear response theory to superpose the effects due to all previous times s.
Linear Response
[Diagram: the generic activation X(t) departs from the randomly diffusing cavity activation h(t) after the stimulation time s]
8
For batch learning:
  xμ(t) = hμ(t) + ∫ ds G(t, s) gμ(s)
i.e. the generic activation of an example at time t equals its cavity activation at time t, plus the integral of the Green's function from time s to t times the gradient term at time s.
For on-line learning:
  xμ(t) = hμ(t) + Σi G(t, si) gμ(si)
i.e. the integral is replaced by a sum over the learning instants si, which are Poisson distributed.
Useful Equations
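The on-line superposition can be evaluated in toy form. The exponentially decaying Green's function and constant gradient term below are illustrative assumptions, not the paper's exact expressions; the point is the structure: a sum of propagated kicks at Poisson-distributed learning instants.

```python
import numpy as np

rng = np.random.default_rng(2)

gamma = 0.5   # decay rate of the toy Green's function (illustrative)
g = 1.0       # constant gradient term (illustrative)
rate = 1.0    # Poisson rate of the learning instants
T = 20.0      # observation time

# Poisson process: exponential inter-arrival times, cumulatively summed
arrivals = np.cumsum(rng.exponential(1.0 / rate, size=200))
instants = arrivals[arrivals < T]

def generic_activation(t, cavity_h):
    """x(t) = h(t) + sum over learning instants s < t of G(t, s) * g."""
    s = instants[instants < t]
    return cavity_h + np.sum(np.exp(-gamma * (t - s)) * g)

# Each learning instant boosts the activation; the boost decays afterwards
x = generic_activation(T, cavity_h=0.0)
```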
9
Simulation Results
[Plot: generic activation, showing giant boosts at Poisson-distributed learning instants. Line: cavity activation from theory. Dots: simulation with the example removed. Theory and simulations agree.]
10
Further Development
[Plots: evolution of the training error and generalization error; theory and simulations agree.]
11
Critical Learning Rate (1)
[Plot: the critical learning rate at which learning diverges; theory and simulations agree, compared against other approximations.]
12
Critical Learning Rate (2)
[Plot: the critical learning rate at which learning diverges; theory and simulations agree, compared against other approximations.]
13
Average Learning
[Plot: theory and simulations agree; the generalization error drops when the dynamics is averaged over monitoring periods.]
14
We have analysed the dynamics of on-line learning with recycled examples using the cavity approach.
The theory reproduces the Poisson-distributed giant boosts of the activations during learning.
Theory and simulations agree well on: the evolution of the training and generalization errors; the critical learning rate at which learning diverges; and the performance of average learning.
Future work: to develop a Monte Carlo sampling procedure for multilayer networks.
Conclusion