On-Line Learning with Recycled Examples: A Cavity Analysis

Transcript
Slide 1: On-Line Learning with Recycled Examples: A Cavity Analysis

Peixun Luo and K. Y. Michael Wong

Hong Kong University of Science and Technology

Slide 2: Formulation

Inputs: ξ_j, j = 1, …, N
Weights: J_j, j = 1, …, N
Activation: y = J · ξ
Output: S = f(y)

[Diagram: a single-layer network mapping the inputs ξ_j through the weights J_j to the activation y]

Slide 3: The Learning of a Task

Given p = αN examples with:
inputs ξ_j^μ, j = 1, …, N, μ = 1, …, p
outputs y_μ generated by a teacher network

Learning is done by defining a risk function and minimizing it by gradient descent.

[Diagram: student network with weights J_j and activation y]
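To make the setup concrete, here is a minimal sketch in Python of one possible teacher-student instance. The Gaussian inputs, the linear teacher with weights B, and the 1/√N scaling are illustrative assumptions; the slides leave these details generic.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 500             # input dimension
alpha = 2.0         # load alpha = p / N (hypothetical value)
p = int(alpha * N)  # p = alpha N examples

B = rng.standard_normal(N) / np.sqrt(N)  # teacher weights (assumed linear teacher)
xi = rng.standard_normal((p, N))         # inputs xi_j^mu, j = 1..N, mu = 1..p
y_teacher = xi @ B                       # outputs y_mu generated by the teacher
```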

Slide 4: Learning Dynamics

Define a cost function in terms of the examples:
E = Σ_μ E^μ + regularization terms

On-line learning: at each time t, draw an example σ(t), and update
ΔJ_j ~ gradient with respect to example σ(t) + weight decay.

Batch learning: at each time t, update
ΔJ_j ~ average gradient with respect to all examples + weight decay.
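Continuing the sketch above, the two update rules might look as follows. The quadratic per-example cost E^μ = ½(y_μ − J·ξ^μ)², the learning rate eta, and the weight-decay strength lam are illustrative choices, not taken from the slides.

```python
eta, lam = 1.0 / N, 1e-3  # O(1/N) learning rate keeps the dynamics stable (assumption)
J_online = np.zeros(N)    # student weights under on-line learning
J_batch = np.zeros(N)     # student weights under batch learning

for t in range(10 * p):
    # On-line: draw one example sigma(t) per step and follow its gradient;
    # this gives that example's activation a giant stepwise boost.
    mu = rng.integers(p)
    grad_mu = (y_teacher[mu] - xi[mu] @ J_online) * xi[mu]
    J_online += eta * (grad_mu - lam * J_online)

    # Batch: average the gradient over the whole recycled training set,
    # so every example changes only slightly per step.
    grad_all = (y_teacher - xi @ J_batch) @ xi / p
    J_batch += eta * (grad_all - lam * J_batch)
```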

Slide 5: Batch vs On-line

| Batch learning                           | On-line learning                           |
|------------------------------------------|--------------------------------------------|
| Same batch of examples for all steps     | An independent example per step            |
| Simple dynamics, no sequence dependence  | Complex dynamics, sequence dependence      |
| Small stepwise changes of examples       | Giant stepwise boosts of examples          |
| Previous analysis possible               | Previous analysis limited to infinite sets |
| Stable but inefficient                   | Efficient but less stable                  |

Slide 6: The Cavity Method

The cavity method has been applied to many complex systems, and to the steady-state properties of learning.

It uses a self-consistency argument to consider what happens when a set of p examples is expanded to p + 1 examples.

The central quantity is the cavity activation: the activation of example 0 in a network which learns examples 1 to p (but never learns example 0).

Since the original network has no information about example 0, the cavity activation obeys a random distribution (e.g. a Gaussian).

Now suppose the network incorporates example 0 at time s. Its activation is no longer random.
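The key step can be written out explicitly; the notation below (h_0 for the cavity activation, J for the weights, ξ^0 for example 0's input) is this transcript's choice, not the slide's.

```latex
% Cavity activation of example 0 at time t, where J(t) has been
% trained on examples 1..p only and is independent of xi^0:
h_0(t) = \sum_{j=1}^{N} J_j(t)\, \xi_j^{0}
% A sum of N weakly correlated terms: Gaussian for large N
% by the central limit theorem, as the slide asserts.
```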

Slide 7: Linear Response

The cavity activation diffuses randomly; the generic activation, receiving a stimulus at time s, is no longer random.

The background examples also adjust due to the newcomer. Assuming that the background adjustments are small, we can use linear response theory to superpose the effects due to all previous stimulation times s.

[Figure: the cavity activation h(t) undergoing random diffusion, and the generic activation X(t) departing from it after a stimulation at time s]

Slide 8: Useful Equations

For batch learning:
generic activation of an example at time t
= cavity activation of the example at time t
+ integral over s of (Green's function from time s to t) × (gradient term at time s).

For on-line learning:
generic activation of an example at time t
= cavity activation of the example at time t
+ sum over the learning instants s of (Green's function from time s to t) × (gradient term at time s),
where the learning instants s are Poisson distributed.
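In symbols, a hedged reconstruction of the two relations, writing x_μ for the generic activation, h_μ for the cavity activation, G for the Green's function, and g_μ for the gradient term (the symbols are assumed; the structure is from the slide):

```latex
% Batch learning: contributions from all past times superposed continuously.
x_\mu(t) = h_\mu(t) + \int_0^{t} \mathrm{d}s \, G(t, s)\, g_\mu(s)

% On-line learning: the example acts only at its learning instants s,
% which are Poisson distributed.
x_\mu(t) = h_\mu(t) + \sum_{s \le t} G(t, s)\, g_\mu(s)
```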

Slide 9: Simulation Results

[Figure: the generic activation of one example, showing giant boosts at its Poisson-distributed learning instants; line: cavity activation from theory; dots: simulation with the example removed]

Theory and simulations agree.
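The "dots" suggest how the cavity activation can be measured directly: train a companion network on the same example sequence with example 0 removed, and track example 0's activation in it. A sketch, reusing the variables from the Python fragments above:

```python
# J_full learns all p examples; J_cavity follows the same draws but
# never learns example 0, so xi[0] @ J_cavity is a cavity activation.
J_full = np.zeros(N)
J_cavity = np.zeros(N)
h0_trace = []

for t in range(10 * p):
    mu = rng.integers(p)
    grad = (y_teacher[mu] - xi[mu] @ J_full) * xi[mu]
    J_full += eta * (grad - lam * J_full)
    if mu != 0:  # skip example 0's (approximately Poisson) learning instants
        grad_c = (y_teacher[mu] - xi[mu] @ J_cavity) * xi[mu]
        J_cavity += eta * (grad_c - lam * J_cavity)
    h0_trace.append(xi[0] @ J_cavity)  # the quantity plotted as dots
```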

Slide 10: Further Development

[Figure: evolution of the training error and the generalization error]

Theory and simulations agree.

Slide 11: Critical Learning Rate (1)

[Figure: the critical learning rate at which learning diverges, compared with other approximations]

Theory and simulations agree.

Slide 12: Critical Learning Rate (2)

[Figure: the critical learning rate at which learning diverges, compared with other approximations]

Theory and simulations agree.

Slide 13: Average Learning

[Figure: the generalization error drops when the dynamics is averaged over monitoring periods]

Theory and simulations agree.
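One plausible reading of average learning, offered here purely as an assumption, is a running time-average of the weight trajectory over a monitoring period (in the spirit of Polyak averaging), which smooths out the giant boosts:

```python
# Hypothetical sketch, reusing the variables from the fragments above.
T_monitor = 5 * p      # length of the monitoring period (hypothetical)
J_avg = np.zeros(N)

for t in range(T_monitor):
    mu = rng.integers(p)
    grad = (y_teacher[mu] - xi[mu] @ J_online) * xi[mu]
    J_online += eta * (grad - lam * J_online)
    J_avg += (J_online - J_avg) / (t + 1)  # running mean of J(t)

# For a linear student the generalization error grows with |J - B|^2,
# and the averaged weights typically give the smaller value.
eps_plain = np.sum((J_online - B) ** 2)
eps_avg = np.sum((J_avg - B) ** 2)
```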

Slide 14: Conclusion

We have analysed the dynamics of on-line learning with recycled examples using the cavity approach.

The theory reproduces the Poisson-distributed giant boosts of the activations during learning.

Theory and simulations agree well on:
  • the evolution of the training and generalization errors
  • the critical learning rate at which learning diverges
  • the performance of average learning

Future work: to develop a Monte Carlo sampling procedure for multilayer networks.
