A Compact Introduction to Machine Learningml4ef.github.io/files/Class1.pdfA Compact Introduction to...

45
A Compact Introduction to Machine Learning Amir SANI, PhD Universit´ e Paris 1 Path´ eon-Sorbonne, Centre d’ ´ Economie de la Sorbonne, CNRS and Paris School of Economics Universit´ e Paris II Panth´ eon-Assas ISF Masters 2 Paris, France January 5, 2017

Transcript of A Compact Introduction to Machine Learningml4ef.github.io/files/Class1.pdfA Compact Introduction to...

Page 1: A Compact Introduction to Machine Learningml4ef.github.io/files/Class1.pdfA Compact Introduction to Machine Learning Amir SANI, PhD Universit e Paris 1 Path eon-Sorbonne, Centre d’Economie

A Compact Introduction to Machine Learning

Amir SANI, PhD

Universite Paris 1 Patheon-Sorbonne,Centre d’Economie de la Sorbonne,

CNRS and Paris School of Economics

Universite Paris II Pantheon-Assas

ISF Masters 2

Paris, FranceJanuary 5, 2017

Page 2: A Compact Introduction to Machine Learningml4ef.github.io/files/Class1.pdfA Compact Introduction to Machine Learning Amir SANI, PhD Universit e Paris 1 Path eon-Sorbonne, Centre d’Economie

Welcome!

Page 3: A Compact Introduction to Machine Learningml4ef.github.io/files/Class1.pdfA Compact Introduction to Machine Learning Amir SANI, PhD Universit e Paris 1 Path eon-Sorbonne, Centre d’Economie

Please go to:

ML4EF.github.io

Click on “Information Sheet” under Class 1

Page 4: A Compact Introduction to Machine Learningml4ef.github.io/files/Class1.pdfA Compact Introduction to Machine Learning Amir SANI, PhD Universit e Paris 1 Path eon-Sorbonne, Centre d’Economie

Today’s Class

What is Machine Learning?

Page 5: A Compact Introduction to Machine Learningml4ef.github.io/files/Class1.pdfA Compact Introduction to Machine Learning Amir SANI, PhD Universit e Paris 1 Path eon-Sorbonne, Centre d’Economie

Input → Predictor → Output

Page 6: A Compact Introduction to Machine Learningml4ef.github.io/files/Class1.pdfA Compact Introduction to Machine Learning Amir SANI, PhD Universit e Paris 1 Path eon-Sorbonne, Centre d’Economie

X → f → y

Page 7: A Compact Introduction to Machine Learningml4ef.github.io/files/Class1.pdfA Compact Introduction to Machine Learning Amir SANI, PhD Universit e Paris 1 Path eon-Sorbonne, Centre d’Economie

Binary Classification

“Labels” y ∈ {−1, 1}

(or equivalently y ∈ {0, 1})

Page 8: A Compact Introduction to Machine Learningml4ef.github.io/files/Class1.pdfA Compact Introduction to Machine Learning Amir SANI, PhD Universit e Paris 1 Path eon-Sorbonne, Centre d’Economie

Regression

Labels y ∈ R

Page 9: A Compact Introduction to Machine Learningml4ef.github.io/files/Class1.pdfA Compact Introduction to Machine Learning Amir SANI, PhD Universit e Paris 1 Path eon-Sorbonne, Centre d’Economie

Supervised Machine Learning

Page 10: A Compact Introduction to Machine Learningml4ef.github.io/files/Class1.pdfA Compact Introduction to Machine Learning Amir SANI, PhD Universit e Paris 1 Path eon-Sorbonne, Centre d’Economie

Supervised Machine Learning: Step 1

Given examples,

X Examples

learn a function mapping to labels

yExamples

Page 11: A Compact Introduction to Machine Learningml4ef.github.io/files/Class1.pdfA Compact Introduction to Machine Learning Amir SANI, PhD Universit e Paris 1 Path eon-Sorbonne, Centre d’Economie

Supervised Machine Learning: Step 1

Learn a function mapping,

X Examples → f → yExamples

Page 12: A Compact Introduction to Machine Learningml4ef.github.io/files/Class1.pdfA Compact Introduction to Machine Learning Amir SANI, PhD Universit e Paris 1 Path eon-Sorbonne, Centre d’Economie

Supervised Machine Learning:Step 2

Use

f

to map new data

XOut−of−Sample

to predicted labels

yPredicted

Page 13: A Compact Introduction to Machine Learningml4ef.github.io/files/Class1.pdfA Compact Introduction to Machine Learning Amir SANI, PhD Universit e Paris 1 Path eon-Sorbonne, Centre d’Economie

Supervised Machine Learning:Step 2

Map new data,

XOut−of−Sample → f → yPredicted

Page 14: A Compact Introduction to Machine Learningml4ef.github.io/files/Class1.pdfA Compact Introduction to Machine Learning Amir SANI, PhD Universit e Paris 1 Path eon-Sorbonne, Centre d’Economie

Hypothesis Class

The Feature Map

φ(X ) ∈ Rd

maps

X

to the d -dimensional space,

φ(X ) = [φ1(X ), . . . , φd(X )]

Page 15: A Compact Introduction to Machine Learningml4ef.github.io/files/Class1.pdfA Compact Introduction to Machine Learning Amir SANI, PhD Universit e Paris 1 Path eon-Sorbonne, Centre d’Economie

Your Hypothesis Class is defined by,

φ(X ) = [φ1(X ), . . . , φd(X )]

Page 16: A Compact Introduction to Machine Learningml4ef.github.io/files/Class1.pdfA Compact Introduction to Machine Learning Amir SANI, PhD Universit e Paris 1 Path eon-Sorbonne, Centre d’Economie

“Learning” aims to locate the weight vector

w ∈ Rd

over the feature mapping

φ(X ) ∈ Rd

where

wφ(X ) =d∑

j=1

wjφj(X )

Page 17: A Compact Introduction to Machine Learningml4ef.github.io/files/Class1.pdfA Compact Introduction to Machine Learning Amir SANI, PhD Universit e Paris 1 Path eon-Sorbonne, Centre d’Economie

Performance is measured according to a loss,

l(w , x , φ(·), y),

Page 18: A Compact Introduction to Machine Learningml4ef.github.io/files/Class1.pdfA Compact Introduction to Machine Learning Amir SANI, PhD Universit e Paris 1 Path eon-Sorbonne, Centre d’Economie

Standard Loss functions include,

Square Loss: (wφ(X )− y)2

Absolute Loss: |wφ(X )− y |

Page 19: A Compact Introduction to Machine Learningml4ef.github.io/files/Class1.pdfA Compact Introduction to Machine Learning Amir SANI, PhD Universit e Paris 1 Path eon-Sorbonne, Centre d’Economie

Learning is the optimization defined by,

minw∈Rd

loss(w ,X Examples , φ(·), yExamples)

Page 20: A Compact Introduction to Machine Learningml4ef.github.io/files/Class1.pdfA Compact Introduction to Machine Learning Amir SANI, PhD Universit e Paris 1 Path eon-Sorbonne, Centre d’Economie

Classification vs. Regression

Classification RegressionPredictor fw (·) sign(yPredicted) yPredicted

Distance to yTrue margin(yPredicted) residual(yPredicted)

Loss functionszero-onehingelogistic

squareabsolute

Page 21: A Compact Introduction to Machine Learningml4ef.github.io/files/Class1.pdfA Compact Introduction to Machine Learning Amir SANI, PhD Universit e Paris 1 Path eon-Sorbonne, Centre d’Economie

What is Expressiveness?

Does the Hypothesis class,

F = {fw : w ∈ Rd}

contain a good predictor?

Page 22: A Compact Introduction to Machine Learningml4ef.github.io/files/Class1.pdfA Compact Introduction to Machine Learning Amir SANI, PhD Universit e Paris 1 Path eon-Sorbonne, Centre d’Economie

Consider,

Page 23: A Compact Introduction to Machine Learningml4ef.github.io/files/Class1.pdfA Compact Introduction to Machine Learning Amir SANI, PhD Universit e Paris 1 Path eon-Sorbonne, Centre d’Economie

Learning can find predictors defined by a

fixed θ(X ) and varying weights w

Page 24: A Compact Introduction to Machine Learningml4ef.github.io/files/Class1.pdfA Compact Introduction to Machine Learning Amir SANI, PhD Universit e Paris 1 Path eon-Sorbonne, Centre d’Economie

Feature Extraction can potentially access

better predictors

Page 25: A Compact Introduction to Machine Learningml4ef.github.io/files/Class1.pdfA Compact Introduction to Machine Learning Amir SANI, PhD Universit e Paris 1 Path eon-Sorbonne, Centre d’Economie

What are some common ways to extract features?

Page 26: A Compact Introduction to Machine Learningml4ef.github.io/files/Class1.pdfA Compact Introduction to Machine Learning Amir SANI, PhD Universit e Paris 1 Path eon-Sorbonne, Centre d’Economie

Feature Extraction

I Domain Knowledge

I Feature Interactions

I Polynomial Expansions

I Kernel Maps

I (Unsupervised) Feature Learning (e.g. DeepLearning)

I Semi-supervised (Manifold) Learning

I Cluster Analysis (e.g. T-SNE, MDS)

I etc.

Page 27: A Compact Introduction to Machine Learningml4ef.github.io/files/Class1.pdfA Compact Introduction to Machine Learning Amir SANI, PhD Universit e Paris 1 Path eon-Sorbonne, Centre d’Economie

How can I tell how well my algorithm will performout of sample?

Page 28: A Compact Introduction to Machine Learningml4ef.github.io/files/Class1.pdfA Compact Introduction to Machine Learning Amir SANI, PhD Universit e Paris 1 Path eon-Sorbonne, Centre d’Economie

Generalization

How far is the best predictor (in the smaller circle),

B = arg minf ∈F

Error(f ),

from the best possible predictor(in the larger circle)?

Page 29: A Compact Introduction to Machine Learningml4ef.github.io/files/Class1.pdfA Compact Introduction to Machine Learning Amir SANI, PhD Universit e Paris 1 Path eon-Sorbonne, Centre d’Economie

Generalization Performance

Page 30: A Compact Introduction to Machine Learningml4ef.github.io/files/Class1.pdfA Compact Introduction to Machine Learning Amir SANI, PhD Universit e Paris 1 Path eon-Sorbonne, Centre d’Economie

Generalization Error

Error [A]− Error [C ]

Page 31: A Compact Introduction to Machine Learningml4ef.github.io/files/Class1.pdfA Compact Introduction to Machine Learning Amir SANI, PhD Universit e Paris 1 Path eon-Sorbonne, Centre d’Economie

Generalization Error

Error [A]− Error [B ] + Error [B ]− Error [C ]

Page 32: A Compact Introduction to Machine Learningml4ef.github.io/files/Class1.pdfA Compact Introduction to Machine Learning Amir SANI, PhD Universit e Paris 1 Path eon-Sorbonne, Centre d’Economie

Generalization Error

Approximation vs. Estimation Error

Page 33: A Compact Introduction to Machine Learningml4ef.github.io/files/Class1.pdfA Compact Introduction to Machine Learning Amir SANI, PhD Universit e Paris 1 Path eon-Sorbonne, Centre d’Economie

Approximation Error

How far is the Hypothesis Class from the

Optimal Predictor C?

Error(B)− Error(C )

Page 34: A Compact Introduction to Machine Learningml4ef.github.io/files/Class1.pdfA Compact Introduction to Machine Learning Amir SANI, PhD Universit e Paris 1 Path eon-Sorbonne, Centre d’Economie

Approximation Error

Determined by Quality of Hypothesis Class

Page 35: A Compact Introduction to Machine Learningml4ef.github.io/files/Class1.pdfA Compact Introduction to Machine Learning Amir SANI, PhD Universit e Paris 1 Path eon-Sorbonne, Centre d’Economie

LARGER Hypothesis Class↓

smaller Approximation Error

Page 36: A Compact Introduction to Machine Learningml4ef.github.io/files/Class1.pdfA Compact Introduction to Machine Learning Amir SANI, PhD Universit e Paris 1 Path eon-Sorbonne, Centre d’Economie

Estimation Error

The learning algorithm’s ability to locate the

best “learnable” predictor.

Error(A)− Error(B)

Page 37: A Compact Introduction to Machine Learningml4ef.github.io/files/Class1.pdfA Compact Introduction to Machine Learning Amir SANI, PhD Universit e Paris 1 Path eon-Sorbonne, Centre d’Economie

LARGER Hypothesis Class↓

LARGER Approximation Error

Page 38: A Compact Introduction to Machine Learningml4ef.github.io/files/Class1.pdfA Compact Introduction to Machine Learning Amir SANI, PhD Universit e Paris 1 Path eon-Sorbonne, Centre d’Economie

Generalization Error

Error(A)− Error(B)︸ ︷︷ ︸Estimation Error

+Error(B)− Error(C )︸ ︷︷ ︸Approximation Error

Page 39: A Compact Introduction to Machine Learningml4ef.github.io/files/Class1.pdfA Compact Introduction to Machine Learning Amir SANI, PhD Universit e Paris 1 Path eon-Sorbonne, Centre d’Economie

Learning Objective

Minimize

Estimation Error

&

Approximation Error

Page 40: A Compact Introduction to Machine Learningml4ef.github.io/files/Class1.pdfA Compact Introduction to Machine Learning Amir SANI, PhD Universit e Paris 1 Path eon-Sorbonne, Centre d’Economie

“Bias–Variance Trade-off”

Page 41: A Compact Introduction to Machine Learningml4ef.github.io/files/Class1.pdfA Compact Introduction to Machine Learning Amir SANI, PhD Universit e Paris 1 Path eon-Sorbonne, Centre d’Economie

Bias–Variance Trade-off

Error(A)− Error(B)︸ ︷︷ ︸Variance

+Error(B)− Error(C )︸ ︷︷ ︸Bias

Page 42: A Compact Introduction to Machine Learningml4ef.github.io/files/Class1.pdfA Compact Introduction to Machine Learning Amir SANI, PhD Universit e Paris 1 Path eon-Sorbonne, Centre d’Economie
Page 43: A Compact Introduction to Machine Learningml4ef.github.io/files/Class1.pdfA Compact Introduction to Machine Learning Amir SANI, PhD Universit e Paris 1 Path eon-Sorbonne, Centre d’Economie

Avoid Over/Under-Fitting with Feature Selection

I Remove Features

I Simplify Features

I Regularization1

I Avoid Over-Optimization (Data Mining)

1Penalize norm on w by adding additional penalty terms.

Page 44: A Compact Introduction to Machine Learningml4ef.github.io/files/Class1.pdfA Compact Introduction to Machine Learning Amir SANI, PhD Universit e Paris 1 Path eon-Sorbonne, Centre d’Economie

Feature Selection

I Forward Selection

I Backward Selection

I L1 (LASSO)

I Elastic Net

I Boosting

I Many many more

A lot of useful feature selection algorithms havealready been implemented. Use them!

Page 45: A Compact Introduction to Machine Learningml4ef.github.io/files/Class1.pdfA Compact Introduction to Machine Learning Amir SANI, PhD Universit e Paris 1 Path eon-Sorbonne, Centre d’Economie

Feature Selection

Thank you!