
A Compact Introduction to Machine Learning

Amir SANI, PhD

Université Paris 1 Panthéon-Sorbonne, Centre d'Économie de la Sorbonne,

CNRS and Paris School of Economics

Université Paris II Panthéon-Assas

ISF Masters 2

Paris, France
January 5, 2017

Welcome!

Please go to:

ML4EF.github.io

Click on “Information Sheet” under Class 1

Today’s Class

What is Machine Learning?

Input → Predictor → Output

X → f → y

Binary Classification

“Labels” y ∈ {−1, 1}

(or equivalently y ∈ {0, 1})

Regression

Labels y ∈ R

Supervised Machine Learning

Supervised Machine Learning: Step 1

Given examples,

X_Examples

learn a function mapping to labels

y_Examples

Supervised Machine Learning: Step 1

Learn a function mapping,

X_Examples → f → y_Examples

Supervised Machine Learning: Step 2

Use

f

to map new data

X_Out-of-Sample

to predicted labels

y_Predicted

Supervised Machine Learning: Step 2

Map new data,

X_Out-of-Sample → f → y_Predicted
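A minimal sketch of these two steps in code, assuming Python with scikit-learn (neither is prescribed by the slides); the toy data set and the logistic-regression predictor are illustrative choices:

    # A minimal sketch of Steps 1 and 2, assuming Python with scikit-learn
    # (not prescribed by the slides); data and model are illustrative.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Toy binary-classification data standing in for the examples.
    X, y = make_classification(n_samples=200, n_features=5, random_state=0)
    X_examples, X_out_of_sample, y_examples, _ = train_test_split(
        X, y, test_size=0.25, random_state=0)

    # Step 1: learn a function f mapping X_Examples to y_Examples.
    f = LogisticRegression().fit(X_examples, y_examples)

    # Step 2: use f to map new, out-of-sample data to predicted labels.
    y_predicted = f.predict(X_out_of_sample)
    print(y_predicted[:10])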

Hypothesis Class

The Feature Map

φ(X) ∈ R^d

maps

X

to the d-dimensional space,

φ(X) = [φ_1(X), …, φ_d(X)]

Your Hypothesis Class is defined by,

φ(X) = [φ_1(X), …, φ_d(X)]

“Learning” aims to locate the weight vector

w ∈ R^d

over the feature mapping

φ(X) ∈ R^d

where

w·φ(X) = ∑_{j=1}^{d} w_j φ_j(X)

Performance is measured according to a loss,

l(w, x, φ(·), y),

Standard Loss functions include,

Square Loss: (w·φ(X) − y)^2

Absolute Loss: |w·φ(X) − y|

Learning is the optimization defined by,

min_{w ∈ R^d} loss(w, X_Examples, φ(·), y_Examples)
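To make the pieces concrete, here is a small sketch in Python/NumPy (an assumption; the slides use no particular language). The quadratic feature map and the synthetic data are illustrative choices, and the square loss is minimised in closed form via least squares:

    # A small sketch of learning as loss minimisation over a feature map
    # (NumPy assumed; the quadratic feature map and data are illustrative).
    import numpy as np

    def phi(x):
        """Feature map phi(x) in R^d: here [x, x^2, 1] for scalar inputs."""
        return np.column_stack([x, x ** 2, np.ones_like(x)])

    rng = np.random.default_rng(0)
    x_examples = rng.uniform(-1, 1, size=100)
    y_examples = 2.0 * x_examples ** 2 - x_examples + rng.normal(scale=0.1, size=100)

    # min_w sum_i (w . phi(x_i) - y_i)^2: the square loss, solved in closed form.
    w, *_ = np.linalg.lstsq(phi(x_examples), y_examples, rcond=None)

    # The learned predictor is f_w(x) = w . phi(x).
    print(w, phi(np.array([0.5])) @ w)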

Classification vs. Regression

                      Classification               Regression
Predictor f_w(·)      sign(y_Predicted)            y_Predicted
Distance to y_True    margin(y_Predicted)          residual(y_Predicted)
Loss functions        zero-one, hinge, logistic    square, absolute
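As a hedged illustration, the quantities in the table can be written directly from the score s = w·φ(x); the function names below are mine, not the slides':

    # Illustrative definitions of the quantities in the table (NumPy assumed);
    # y is the true label and s = w . phi(x) is the raw predicted score.
    import numpy as np

    def margin(s, y):            # classification: y in {-1, +1}
        return y * s

    def residual(s, y):          # regression: y in R
        return s - y

    def zero_one_loss(s, y):
        return float(np.sign(s) != y)

    def hinge_loss(s, y):
        return max(0.0, 1.0 - margin(s, y))

    def logistic_loss(s, y):
        return float(np.log1p(np.exp(-margin(s, y))))

    def square_loss(s, y):
        return residual(s, y) ** 2

    def absolute_loss(s, y):
        return abs(residual(s, y))

    print(hinge_loss(0.3, -1), square_loss(0.3, 1.2))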

What is Expressiveness?

Does the Hypothesis class,

F = {f_w : w ∈ R^d}

contain a good predictor?

Consider,

Learning can find predictors defined by a fixed φ(X) and varying weights w

Feature Extraction can potentially access better predictors

What are some common ways to extract features?

Feature Extraction

- Domain Knowledge

- Feature Interactions

- Polynomial Expansions

- Kernel Maps

- (Unsupervised) Feature Learning (e.g. Deep Learning)

- Semi-supervised (Manifold) Learning

- Cluster Analysis (e.g. t-SNE, MDS)

- etc.
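A small sketch of two items from this list, assuming scikit-learn (which the slides do not name): polynomial expansions with interactions, and an approximate kernel map.

    # Sketch of two feature-extraction options, assuming scikit-learn.
    import numpy as np
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.kernel_approximation import Nystroem

    X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

    # Degree-2 polynomial expansion: adds x1^2, x1*x2, x2^2 to x1, x2.
    poly = PolynomialFeatures(degree=2, include_bias=False)
    print(poly.fit_transform(X).shape)        # (3, 5)

    # Approximate RBF kernel map into R^d with d = n_components.
    kernel_map = Nystroem(kernel="rbf", n_components=2, random_state=0)
    print(kernel_map.fit_transform(X).shape)  # (3, 2)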

How can I tell how well my algorithm will perform out of sample?

Generalization

How far is the best predictor (in the smaller circle),

B = arg min_{f ∈ F} Error(f),

from the best possible predictor (in the larger circle)?

Generalization Performance

Generalization Error

Error(A) − Error(C)

Generalization Error

Error(A) − Error(B) + Error(B) − Error(C)

Generalization Error

Approximation vs. Estimation Error

Approximation Error

How far is the Hypothesis Class from the

Optimal Predictor C?

Error(B) − Error(C)

Approximation Error

Determined by Quality of Hypothesis Class

LARGER Hypothesis Class
↓
smaller Approximation Error

Estimation Error

The learning algorithm’s ability to locate the

best “learnable” predictor.

Error(A) − Error(B)

LARGER Hypothesis Class
↓
LARGER Estimation Error

Generalization Error

[Error(A) − Error(B)]  +  [Error(B) − Error(C)]
   Estimation Error          Approximation Error

Learning Objective

Minimize

Estimation Error

&

Approximation Error

“Bias–Variance Trade-off”

Bias–Variance Trade-off

[Error(A) − Error(B)]  +  [Error(B) − Error(C)]
        Variance                   Bias
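A rough numerical illustration of the trade-off, assuming Python with scikit-learn: widening the hypothesis class by raising the polynomial degree typically lowers in-sample error, while out-of-sample error can eventually rise again. The sine data and the degrees 1, 3, 15 are made up for illustration.

    # Rough bias-variance illustration (scikit-learn assumed; data made up).
    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_squared_error

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(200, 1))
    y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # Higher degree = richer hypothesis class: smaller bias, but the variance
    # term can grow, visible as a rising out-of-sample error.
    for degree in (1, 3, 15):
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        model.fit(X_tr, y_tr)
        print(degree,
              mean_squared_error(y_tr, model.predict(X_tr)),   # in-sample
              mean_squared_error(y_te, model.predict(X_te)))   # out-of-sample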

Avoid Over/Under-Fitting with Feature Selection

- Remove Features

- Simplify Features

- Regularization¹

- Avoid Over-Optimization (Data Mining)

¹ Penalize the norm of w by adding penalty terms.

Feature Selection

- Forward Selection

- Backward Selection

- L1 (LASSO)

- Elastic Net

- Boosting

- Many, many more

A lot of useful feature selection algorithms have already been implemented. Use them!
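For instance, the L1 (LASSO) option above is a few lines in scikit-learn (an illustrative sketch; the synthetic data and the penalty strength alpha are assumptions):

    # One of the implemented options above, sketched with scikit-learn's L1
    # (LASSO) estimator; the data and the penalty strength are assumptions.
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Lasso

    # 100 candidate features, only 5 of which carry signal.
    X, y = make_regression(n_samples=200, n_features=100, n_informative=5,
                           noise=1.0, random_state=0)

    lasso = Lasso(alpha=1.0).fit(X, y)

    # The L1 penalty drives most weights exactly to zero, keeping a few features.
    selected = np.flatnonzero(lasso.coef_)
    print(len(selected), "features kept out of", X.shape[1])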


Thank you!