Transcript of Learning From Data, Lecture 15: Reflecting on Our Path
Source: ...magdon/courses/LFD-Slides/SlidesLect15.pdf

Page 1:

Learning From Data

Lecture 15

Reflecting on Our Path - Epilogue to Part I

What We Did

The Machine Learning Zoo

Moving Forward

M. Magdon-Ismail, CSCI 4100/6100

Page 2:

recap: Three Learning Principles

Occam’s razor: simpler is better; a simpler hypothesis is easier to falsify.

[Figure: Scientists 1, 2, and 3 each fit resistivity ρ versus temperature T with different amounts of data; a fit that the data cannot contradict is not falsifiable, the others are falsifiable.]

Sampling bias: ensure that the training and test distributions are the same, or else acknowledge/account for the difference. You cannot sample from one bin and use your estimates for another bin.
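A minimal numerical sketch of the bin metaphor above (the two score distributions and sample sizes are illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Population: a mixture of two "bins" with different means.
bin_a = rng.normal(60, 5, size=10_000)   # the bin we actually sample from
bin_b = rng.normal(80, 5, size=10_000)   # the bin we never see
population = np.concatenate([bin_a, bin_b])

# Biased sampling: the training sample comes only from bin A,
# but the estimate is used for the whole population.
train = rng.choice(bin_a, size=200, replace=False)

biased_estimate = train.mean()   # close to 60, the mean of bin A
true_mean = population.mean()    # close to 70, the population mean
```

The estimate is a fine answer for bin A and a badly wrong answer for the population, which is exactly the "one bin, another bin" warning.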

Data snooping: you are charged for every choice influenced by D. Choose the learning process (usually H) before looking at D.
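A minimal sketch of how snooping sneaks in through preprocessing; the lognormal data and the scaling step are illustrative assumptions, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(1)
D = rng.lognormal(mean=0.0, sigma=1.0, size=1_000)   # the full data set D
train, test = D[:500], D[500:]

# Snooped choice: the normalization constant is influenced by ALL of D,
# including the points we will later "test" on.
snooped_scale = D.std()

# Clean choice: every quantity is fixed using only the training set.
clean_scale = train.std()

# The two pipelines feed different inputs to the final test; the snooped
# one has quietly used the test set, so its measured error is optimistic.
snooped_test = test / snooped_scale
clean_test = test / clean_scale
```

The charge here is small but real: any constant, feature, or H chosen after peeking at D must be paid for when accounting for generalization.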

We know the price of choosing g from H.

[Diagram: the data D, together with your choices, drives the learning process that selects g from h ∈ H.]

© AML Creator: Malik Magdon-Ismail, Reflecting on Our Path: 2/11

Page 3:

Zen Moment


Page 4:

Our Plan

1. What is Learning?

Output g ≈ f after looking at data (xn, yn).

2. Can We do it?

Ein ≈ Eout: simple H, finite dVC, large N

Ein ≈ 0: good H, algorithms

3. How to do it?

Linear models, nonlinear transforms

Algorithms: PLA, pseudoinverse, gradient descent

4. How to do it well?

Overfitting: stochastic & deterministic noise

Cures: regularization, validation.

5. General principles?

Occam’s razor, sampling bias, data snooping

6. Advanced techniques.

7. Other Learning Paradigms.

concepts, theory, practice
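Item 3 of the plan names the pseudoinverse algorithm; a minimal sketch of one-step linear regression via the pseudoinverse, w = X⁺y, on assumed toy data (the target weights and noise level are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data from a linear target plus a little noise; x0 = 1 is the bias coordinate.
N = 100
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
w_true = np.array([0.5, 1.5, -2.0])
y = X @ w_true + 0.01 * rng.normal(size=N)

# One-step linear regression: w = pinv(X) @ y minimizes the squared in-sample error.
w = np.linalg.pinv(X) @ y

in_sample_error = np.mean((X @ w - y) ** 2)   # Ein (squared error)
```

Unlike PLA or gradient descent, no iteration is needed: the pseudoinverse lands on the least-squares solution in one shot.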


Page 5:

Learning From Data: It’s A Jungle Out There

[Word cloud of the field’s terminology:] overfitting, stochastic noise, K-means, stochastic gradient descent, exploration, reinforcement, exploitation, augmented error, ill-posed, Gaussian processes, bootstrapping, Lloyd’s algorithm, deterministic noise, distribution-free learning, data snooping, Q-learning, unlabelled data, expectation-maximization, logistic regression, Rademacher complexity, linear regression, CART, bagging, Bayesian, VC dimension, transfer learning, learning curve, GANs, sampling bias, neural networks, Markov Chain Monte Carlo (MCMC), nonlinear transformation, Mercer’s theorem, support vectors, Gibbs sampling, decision trees, AdaBoost, SVM, graphical models, bioinformatics, linear models, ordinal regression, training versus testing, no free lunch, extrapolation, deep learning, cross validation, HMMs, bias-variance tradeoff, PAC-learning, biometrics, error measures, MDL, multiclass, one versus all, active learning, types of learning, random forests, unsupervised, weak learning, online learning, RBF, is learning feasible?, data contamination, perceptron learning, noisy targets, ranking, momentum, Occam’s razor, conjugate gradients, Levenberg-Marquardt, RKHS, kernel methods, mixture of experts, boosting, ensemble methods, AIC, permutation complexity, multi-agent systems, classification, primal-dual, PCA, LLE, kernel-PCA, collaborative filtering, semi-supervised learning, clustering, regularization, weight decay, Big Data, Boltzmann machine


Page 6:

Navigating the Jungle: Theory

THEORY: VC-analysis, bias-variance, complexity, Bayesian, Rademacher, SRM, ...
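The VC line of analysis in the list above is summarized by the generalization bound from Part I: with probability at least $1-\delta$,

```latex
E_{\text{out}}(g) \;\le\; E_{\text{in}}(g) \;+\; \sqrt{\frac{8}{N}\,\ln\frac{4\,m_{\mathcal H}(2N)}{\delta}},
\qquad
m_{\mathcal H}(N) \;\le\; N^{d_{\mathrm{vc}}} + 1 .
```

A simple H (small $d_{\mathrm{vc}}$) and a large N make the square-root penalty small, which is exactly the "Ein ≈ Eout" half of the plan.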


Page 7:

Navigating the Jungle: Techniques


TECHNIQUES: Models, Methods


Page 8:

Navigating the Jungle: Models


Models: linear, neural networks, SVM, similarity, Gaussian processes, graphical models, bilinear/SVD, ...

Methods


Page 9:

Navigating the Jungle: Methods


Methods: regularization, validation, aggregation, preprocessing, ...
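Among the methods, regularization by weight decay has the closed form $w_{\text{reg}} = (Z^{\mathsf T}Z + \lambda I)^{-1} Z^{\mathsf T} y$ from Part I; a minimal sketch on assumed toy data (the polynomial degree and λ = 1 are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)

# Noisy samples of a simple target, fit with a high-degree polynomial.
N, degree = 20, 12
x = rng.uniform(-1, 1, size=N)
y = x + 0.2 * rng.normal(size=N)           # target f(x) = x, plus noise
Z = np.vander(x, degree + 1)               # polynomial feature matrix

lam = 1.0                                  # regularization strength

# Weight decay / ridge: w_reg = (Z^T Z + lam I)^{-1} Z^T y
w_reg = np.linalg.solve(Z.T @ Z + lam * np.eye(degree + 1), Z.T @ y)
w_unreg = np.linalg.pinv(Z) @ y            # unregularized least squares

# Regularization shrinks the weights, taming the wild high-degree fit.
```

With N = 20 points and 13 parameters, the unregularized fit chases the noise; the penalty term keeps the weight vector small and the fit smooth.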


Page 10:

Navigating the Jungle: Paradigms


PARADIGMS: supervised, unsupervised, reinforcement, active, online, unlabeled, transfer learning, big data, ...


Page 11:

Moving Forward

1. What is Learning?

Output g ≈ f after looking at data (xn, yn).

2. Can We do it?

Ein ≈ Eout: simple H, finite dVC, large N

Ein ≈ 0: good H, algorithms

3. How to do it?

Linear models, nonlinear transforms

Algorithms: PLA, pseudoinverse, gradient descent

4. How to do it well?

Overfitting: stochastic & deterministic noise

Cures: regularization, validation.

5. General principles?

Occam’s razor, sampling bias, data snooping

6. Advanced techniques.

Similarity, neural networks, SVMs, preprocessing & aggregation

7. Other Learning Paradigms.

Unsupervised, reinforcement

concepts, theory, practice
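Item 3 also names PLA; a minimal sketch of the perceptron learning algorithm on separable toy data (the target boundary and iteration cap are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)

# Linearly separable toy data; x0 = 1 is the bias coordinate.
N = 50
X = np.column_stack([np.ones(N), rng.uniform(-1, 1, size=(N, 2))])
w_target = np.array([0.1, 1.0, -1.0])      # an assumed target boundary
y = np.sign(X @ w_target)

# PLA: repeatedly pick a misclassified point and nudge w toward it.
w = np.zeros(3)
for _ in range(100_000):                   # safety cap on iterations
    mis = np.where(np.sign(X @ w) != y)[0]
    if mis.size == 0:
        break                              # Ein = 0: all points classified
    i = mis[0]
    w = w + y[i] * X[i]                    # the PLA update
```

On separable data PLA is guaranteed to stop with Ein = 0, which is the "good H, algorithms" half of item 2.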
