17. Machine Learning: Radial Basis Functions


  • Neural Networks: Radial Basis Function Networks

    Andres Mendez-Vazquez

    December 10, 2015

  • Outline

    1 Introduction
      Main Idea
      Basic Radial-Basis Functions

    2 Separability
      Cover's Theorem on the separability of patterns
      Dichotomy-separable functions
      The Stochastic Experiment
      The XOR Problem
      Separating Capacity of a Surface

    3 Interpolation Problem
      What is gained?
      Feedforward Network
      Learning Process
      Radial-Basis Functions (RBF)

    4 Introduction
      Description of the Problem
      Well-posed or ill-posed
      The Main Problem

    5 Regularization Theory
      Solving the issue
      Bias-Variance Dilemma
      Measuring the difference between optimal and learned
      The Bias-Variance
      How can we use this?
      Getting a solution
      We still need to talk about...


  • Introduction

    Observation
    The back-propagation algorithm for the design of a multilayer perceptron,
    as described in the previous chapter, may be viewed as the application of
    a recursive technique known in statistics as stochastic approximation.

    Now
    We take a completely different approach by viewing the design of a neural
    network as a curve-fitting (approximation) problem in a high-dimensional
    space.

    Thus
    Learning is equivalent to finding a surface in a multidimensional space
    that provides a best fit to the training data, under a statistical metric.


  • Thus

    In the context of a neural network
    The hidden units provide a set of "functions": a "basis" for the input
    patterns when they are expanded into the hidden space.

    Name of these functions
    Radial-basis functions.

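  • A concrete example of such a function

    For concreteness, the prototypical radial-basis function is the Gaussian;
    the lecture develops specific choices later, so take this as the standard
    textbook definition, with center c_i and width \sigma as free parameters:

        \varphi_i(\mathbf{x}) =
            \exp\left(-\frac{\|\mathbf{x}-\mathbf{c}_i\|^{2}}{2\sigma^{2}}\right)

    Each hidden unit responds only to the radial distance between the input
    \mathbf{x} and its own center \mathbf{c}_i, which is what makes the basis
    "radial".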

  • History

    These functions were first introduced
    As the solution of the real multivariate interpolation problem.

    Right now
    Radial-basis function interpolation is one of the main fields of research
    in numerical analysis.




  • A Basic Structure

    We have the following structure (see the code sketch below)
    1 Input Layer, to connect with the environment.
    2 Hidden Layer, applying a non-linear transformation.
    3 Output Layer, applying a linear transformation.

    Example
    [Network diagram: input nodes feed a layer of nonlinear (radial-basis)
    nodes, whose outputs combine at a single linear output node.]
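  • A code sketch of this structure

    The following is a minimal sketch of the three-layer structure above,
    assuming Gaussian hidden units; the names rbf_forward, centers, sigma,
    and weights are illustrative placeholders, not notation from the lecture.

        import numpy as np

        def rbf_forward(x, centers, sigma, weights):
            """One forward pass through a minimal RBF network."""
            # Input layer: x arrives directly from the environment.
            # Hidden layer: non-linear transformation, one Gaussian unit per
            # center c_i with activation exp(-||x - c_i||^2 / (2 sigma^2)).
            dists = np.linalg.norm(centers - x, axis=1)
            hidden = np.exp(-dists ** 2 / (2.0 * sigma ** 2))
            # Output layer: purely linear combination of hidden activations.
            return weights @ hidden

        # Example: 3 hidden units over 2-dimensional inputs.
        centers = np.array([[0.0, 0.0], [1.0, 1.0], [0.5, 0.0]])
        y = rbf_forward(np.array([0.2, 0.1]), centers,
                        sigma=0.5, weights=np.array([1.0, -1.0, 0.5]))
        print(y)  # a single scalar output

    Note how only the hidden layer is non-linear; the output node merely
    weighs and sums, exactly as the list above describes.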

  • Why the non-linear transformation?

    The justification
    In a paper by Cover (1965), it is shown that a pattern-classification
    problem mapped non-linearly to a high-dimensional space is more likely to
    be linearly separable than in a low-dimensional space.

    Thus
    This is a good reason to make the dimension of the hidden space in a
    Radial-Basis Function (RBF) network high.

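  • An illustrative instance: XOR

    The XOR problem (treated in detail later in the lecture) gives a quick
    feel for Cover's result. The sketch below follows the classic choice of
    two Gaussian units with centers (1,1) and (0,0); the variable names are
    illustrative, not taken from the lecture.

        import numpy as np

        # XOR patterns and labels: NOT linearly separable in the input plane.
        X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
        labels = np.array([0, 1, 1, 0])

        # Map each pattern through two Gaussian hidden units.
        c1, c2 = np.array([1.0, 1.0]), np.array([0.0, 0.0])
        phi = np.column_stack([
            np.exp(-np.sum((X - c1) ** 2, axis=1)),   # phi_1(x)
            np.exp(-np.sum((X - c2) ** 2, axis=1)),   # phi_2(x)
        ])
        print(phi.round(2))
        # Class 0 maps near (0.14, 1.00) and (1.00, 0.14); class 1 maps to
        # (0.37, 0.37) twice, so the line phi_1 + phi_2 = 1 now separates them.
        print(phi.sum(axis=1) > 0.9)   # True exactly where labels == 0

    Two non-linear units are enough here: the mapped problem is linearly
    separable even though the original two-dimensional problem is not.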


  • Cover's Theorem

    The statement, summarized
    A complex pattern-classification problem cast non-linearly in a
    high-dimensional space is more likely to be linearly separable than in a
    low-dimensional space.

    Actually
    The full statement is quite a bit more involved...


  • Some facts

    A fact
    Once we know that a set of patterns is linearly separable, the problem is
    easy to solve.

    Consider
    A family of surfaces, each of which separates the space into two regions.

    In addition
    We have a set of N patterns

        \mathcal{H} = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N\}   (1)



  • Dichotomy (Binary Partition)

    Now
    The pattern set is split into two classes, H_1 and H_2.

    Definition
    A dichotomy (binary partition) of the points is said to be separable with
    respect to the family of surfaces if a surface exists in the family that
    separates the points in class H_1 from those in class H_2.

    Define
    For each pattern \mathbf{x} \in \mathcal{H}, we define a set of
    real-valued measurement functions
    \{\varphi_1(\mathbf{x}), \varphi_2(\mathbf{x}), \ldots, \varphi_{d_1}(\mathbf{x})\}.
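  • In symbols

    Following the standard statement of Cover's theorem (the weight vector
    \mathbf{w} is the usual notation, not yet introduced in these slides),
    the dichotomy {H_1, H_2} is \varphi-separable if there exists a
    d_1-dimensional vector \mathbf{w} such that

        \mathbf{w}^{T}\varphi(\mathbf{x}) > 0  for \mathbf{x} \in H_1
        \mathbf{w}^{T}\varphi(\mathbf{x}) < 0  for \mathbf{x} \in H_2

    where \varphi(\mathbf{x}) =
    [\varphi_1(\mathbf{x}), \ldots, \varphi_{d_1}(\mathbf{x})]^{T}. The
    separating surface in the hidden space is then
    \mathbf{w}^{T}\varphi(\mathbf{x}) = 0.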
