Lecture 4: Regression ctd and multiple classes


Transcript of Lecture 4: Regression ctd and multiple classes

Page 1:

Lecture 4: Regression ctd and multiple classes

C19 Machine Learning Hilary 2015 A. Zisserman

• Regression
• Lasso L1 regularization
• SVM regression and epsilon-insensitive loss

• More loss functions

• Multi-class Classification
  • using binary classifiers
  • random forests
  • neural networks

Page 2:

Regression cost functions

Minimize with respect to $\mathbf{w}$:

$$\sum_{i=1}^{N} \ell\left(f(\mathbf{x}_i, \mathbf{w}),\, y_i\right) + \lambda\, R(\mathbf{w})$$

(first term: loss function; second term: regularization)

• There is a choice of both loss functions and regularization
• So far we have seen "ridge" regression:
  • squared loss: $\sum_{i=1}^{N} \left(y_i - f(\mathbf{x}_i, \mathbf{w})\right)^2$
  • squared regularizer: $\lambda \|\mathbf{w}\|^2$
• Now, consider other losses and regularizers

Page 3:

The "Lasso" or L1 norm regularization

Minimize with respect to $\mathbf{w} \in \mathbb{R}^d$:

$$\sum_{i=1}^{N} \left(y_i - f(\mathbf{x}_i, \mathbf{w})\right)^2 + \lambda \sum_{j=1}^{d} |w_j|$$

(first term: loss function; second term: regularization)

• LASSO = Least Absolute Shrinkage and Selection Operator
• This is a quadratic optimization problem
• There is a unique solution
• p-norm definition: $\|\mathbf{w}\|_p = \left( \sum_{j=1}^{d} |w_j|^p \right)^{1/p}$

Page 4:

Sparsity property of the Lasso

Contour plots for d = 2 of the loss $\sum_{i=1}^{N} \left(y_i - f(\mathbf{x}_i, \mathbf{w})\right)^2$ against the two regularizers: $\lambda\|\mathbf{w}\|^2$ (ridge regression) and $\lambda \sum_{j=1}^{d} |w_j|$ (lasso).

• The minimum lies where the loss contours are tangent to the regularizer's contours
• For the lasso case, minima occur at "corners" of the L1 ball
• Consequently one of the weights is zero
• In high dimensions many weights can be zero

Page 5:

Page 6:

[Figure: learnt weight vectors (values between −1 and 1.5) under Least Squares, Ridge Regression, and LASSO]

Comparison of learnt weight vectors

Linear regression, $y = \mathbf{w}^\top \mathbf{x}$, with N = 100, d = 10, lambda = 1.

Lasso: number of non-zero coefficients = 5 (a reproduction sketch follows)
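A rough reconstruction of this experiment with scikit-learn; a sketch only: the data generation, and the mapping from the slide's lambda to scikit-learn's `alpha`, are assumptions.

```python
# Sketch: compare learnt weights under ridge and lasso regularization.
# Data generation is illustrative, not the slide's actual data.
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
N, d = 100, 10
w_true = rng.normal(size=d)
w_true[rng.choice(d, size=5, replace=False)] = 0.0   # some irrelevant features
X = rng.normal(size=(N, d))
y = X @ w_true + 0.1 * rng.normal(size=N)

ridge = Ridge(alpha=1.0).fit(X, y)   # squared loss + lambda * ||w||^2
lasso = Lasso(alpha=0.1).fit(X, y)   # squared loss + lambda * sum_j |w_j|

print("ridge non-zeros:", np.sum(np.abs(ridge.coef_) > 1e-6))  # typically all 10
print("lasso non-zeros:", np.sum(np.abs(lasso.coef_) > 1e-6))  # typically fewer
```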

Page 7:

Lasso in action

[Figure: coefficient values plotted against the regularization parameter lambda (as a percentage of lambdaMax), for Ridge Regression and for L1-Regularized Least Squares]

Page 8:

Sparse weight vectors

• Weights being zero is a method of "feature selection" – zeroing out the unimportant features

• The SVM classifier also has this property (sparse alpha in the dual representation)

• Ridge regression does not

Page 9:

Other regularizers

$$\sum_{i=1}^{N} \left(y_i - f(\mathbf{x}_i, \mathbf{w})\right)^2 + \lambda \sum_{j=1}^{d} |w_j|^q$$

• For q ≥ 1, the cost function is convex and has a unique minimum. The solution can be obtained by quadratic optimization.
• For q < 1, the problem is not convex, and obtaining the global minimum is more difficult.

Page 10:

SVMs for Regression

Use the ε-insensitive error measure

$$V_\epsilon(r) = \begin{cases} 0 & \text{if } |r| \le \epsilon \\ |r| - \epsilon & \text{otherwise} \end{cases}$$

This can also be written as $V_\epsilon(r) = (|r| - \epsilon)_+$, where $(\cdot)_+$ indicates the positive part.

Or equivalently as $V_\epsilon(r) = \max(|r| - \epsilon,\, 0)$.

[Figure: $V_\epsilon(r)$ compared with the square loss; the cost is zero inside the epsilon "tube"]

Page 11:

• As before, introduce slack variables for points that violate the ε-insensitive error.
• For each data point $\mathbf{x}_i$, two slack variables $\xi_i, \hat{\xi}_i$ are required (depending on whether $f(\mathbf{x}_i)$ is above or below the tube).
• Learning is by the optimization

$$\min_{\mathbf{w} \in \mathbb{R}^d,\; \xi_i, \hat{\xi}_i} \; C \sum_{i=1}^{N} \left( \xi_i + \hat{\xi}_i \right) + \frac{1}{2} \|\mathbf{w}\|^2$$

subject to

$$y_i \le f(\mathbf{x}_i, \mathbf{w}) + \epsilon + \xi_i, \quad y_i \ge f(\mathbf{x}_i, \mathbf{w}) - \epsilon - \hat{\xi}_i, \quad \xi_i \ge 0, \quad \hat{\xi}_i \ge 0 \quad \text{for } i = 1 \dots N$$

• Again, this is a quadratic programming problem
• It can be dualized
• Some of the data points will become support vectors
• It can be kernelized (see the sketch below)
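A minimal sketch of ε-SVR with scikit-learn, which solves exactly this kind of quadratic program in the dual and supports kernels; the toy 1D data here is an assumption for illustration.

```python
# Sketch: epsilon-insensitive SV regression with an RBF (Gaussian) kernel.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(50, 1))        # N = 50 points, as on the next slide
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.normal(size=50)

# The three parameters of the next slides are C, epsilon and sigma
# (scikit-learn parameterizes the kernel width as gamma = 1 / (2 sigma^2)).
svr = SVR(kernel="rbf", C=1000.0, epsilon=0.2, gamma=1.0 / (2 * 0.5**2))
svr.fit(X, y)

# Only points on or outside the epsilon tube become support vectors.
print("support vectors:", len(svr.support_), "of", len(X))
```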

Page 12:

Example: SV regression with Gaussian basis functions

[Figure: noisy 1D data on [0, 1], values between −1.5 and 1.5]

• Regression function – Gaussians centred on the data points:

$$f(x, \mathbf{a}) = \sum_{i=1}^{N} a_i\, e^{-(x - x_i)^2 / 2\sigma^2}, \qquad N = 50$$

• Parameters are: C, epsilon, sigma

Page 13:

[Figure: estimated function, ε-tube and support vectors for an RBF fit with C = 1000, epsilon = 0.2, sigma = 0.5]

As epsilon increases:
• the fit becomes looser
• fewer data points are support vectors

Page 14:

Loss functions for regression

[Figure: the square, ε-insensitive and Huber losses plotted against the residual y − f(x)]

• quadratic (square) loss: $\ell(y, f(x)) = \frac{1}{2}(y - f(x))^2$
• ε-insensitive loss: $\ell(y, f(x)) = \max(|y - f(x)| - \epsilon,\, 0)$
• Huber loss (mixed quadratic/linear; robust to outliers): $\ell(y, f(x)) = h(y - f(x))$ with

$$h(r) = \begin{cases} r^2 & \text{if } |r| \le c \\ 2c|r| - c^2 & \text{otherwise} \end{cases}$$

• all of these are convex (a quick sketch follows)
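All three losses are one-liners in numpy; a minimal sketch (the values of ε and c here are arbitrary).

```python
# Sketch: the three regression losses as functions of the residual r = y - f(x).
import numpy as np

def square_loss(r):
    return 0.5 * r ** 2

def eps_insensitive_loss(r, eps=0.5):
    # zero inside the epsilon "tube", linear outside
    return np.maximum(np.abs(r) - eps, 0.0)

def huber_loss(r, c=1.0):
    # quadratic for |r| <= c, linear (slope 2c) outside: robust to outliers
    return np.where(np.abs(r) <= c, r ** 2, 2 * c * np.abs(r) - c ** 2)

r = np.linspace(-3, 3, 7)
for loss in (square_loss, eps_insensitive_loss, huber_loss):
    print(loss.__name__, loss(r))
```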

Page 15:

Final notes on cost functions

Regressors and classifiers can be constructed by a "mix 'n' match" of loss functions and regularizers to obtain a learning machine suited to a particular application. For example, for a classifier with $f(\mathbf{x}) = \mathbf{w}^\top \mathbf{x} + b$:

• L1-SVM:

$$\min_{\mathbf{w} \in \mathbb{R}^d} \sum_{i=1}^{N} \max(0,\, 1 - y_i f(\mathbf{x}_i)) + \lambda \|\mathbf{w}\|_1$$

• Least squares SVM:

$$\min_{\mathbf{w} \in \mathbb{R}^d} \sum_{i=1}^{N} \left[\max(0,\, 1 - y_i f(\mathbf{x}_i))\right]^2 + \lambda \|\mathbf{w}\|^2$$
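scikit-learn's LinearSVC exposes some of these combinations directly; a hedged sketch (note that it pairs the L1 regularizer with the squared hinge loss, so it approximates rather than matches the L1-SVM above).

```python
# Sketch: mixing and matching loss functions and regularizers for a classifier.
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# squared hinge loss + L1 regularizer: encourages a sparse weight vector w
l1_svm = LinearSVC(penalty="l1", loss="squared_hinge", dual=False, C=1.0).fit(X, y)

# squared hinge loss + L2 regularizer, as in the "least squares SVM" above
l2_svm = LinearSVC(penalty="l2", loss="squared_hinge", C=1.0).fit(X, y)

print("non-zero weights, L1 penalty:", (abs(l1_svm.coef_) > 1e-6).sum())
print("non-zero weights, L2 penalty:", (abs(l2_svm.coef_) > 1e-6).sum())
```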

Page 16:

Multi-class Classification

Page 17:

Multi-Class Classification – what we would like

Assign input vector to one of K classes

Goal: a decision rule that divides input space into K decision regions separated by decision boundaries

Page 18:

Reminder: K Nearest Neighbour (K-NN) Classifier

Algorithm
• For each test point, x, to be classified, find the K nearest samples in the training data
• Classify the point, x, according to the majority vote of their class labels

e.g. K = 3

• naturally applicable to the multi-class case (see the sketch below)
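A minimal sketch with scikit-learn (the iris data is just a convenient 3-class example).

```python
# Sketch: K-NN handles multiple classes "for free" via the majority vote.
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)          # 3 classes
knn = KNeighborsClassifier(n_neighbors=3)  # K = 3, as in the slide
knn.fit(X, y)
print(knn.predict(X[:5]))                  # majority vote of the 3 nearest labels
```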

Page 19:

Build from binary classifiers

• Learn: K two-class 1-vs-the-rest classifiers $f_k(\mathbf{x})$

[Figure: three classes C1, C2, C3 with classifiers "1 vs 2 & 3", "2 vs 1 & 3", "3 vs 1 & 2"; regions where the classifiers disagree are marked "?"]

Page 20:

Build from binary classifiers continued

• Learn: K two-class 1-vs-the-rest classifiers $f_k(\mathbf{x})$
• Classification: choose the class with the most positive score, $\max_k f_k(\mathbf{x})$

[Figure: the same three classifiers for C1, C2, C3; ambiguous regions are resolved by $\max_k f_k(\mathbf{x})$]

Page 21:

Build from binary classifiers continued

• Learn: K two-class 1-vs-the-rest classifiers $f_k(\mathbf{x})$
• Classification: choose the class with the most positive score, $\max_k f_k(\mathbf{x})$

[Figure: the resulting decision regions for C1, C2, C3]

Page 22:

Application: hand-written digit recognition

• Feature vectors: each image is 28 × 28 pixels; rearrange as a 784-vector x
• Training: learn K = 10 two-class 1-vs-the-rest SVM classifiers $f_k(\mathbf{x})$
• Classification: choose the class with the most positive score,

$$f(\mathbf{x}) = \max_k f_k(\mathbf{x})$$
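A sketch of the same scheme; scikit-learn's small 8 × 8 digits set stands in for the 28 × 28 images described above.

```python
# Sketch: 10-class digit recognition from K = 10 one-vs-the-rest linear SVMs.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)    # 8x8 images flattened to 64-vectors
clf = OneVsRestClassifier(LinearSVC(C=1.0, max_iter=10000)).fit(X, y)

# decision_function returns the K scores f_k(x); predict takes argmax_k f_k(x).
scores = clf.decision_function(X[:3])
print(np.argmax(scores, axis=1), clf.predict(X[:3]))
```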

Page 23:

Example

[Figure: hand-drawn digits and the classifications assigned to them]

Page 24:

Why not learn a multi-class SVM directly? For example, for three classes:

• Learn $\mathbf{w} = (\mathbf{w}_1, \mathbf{w}_2, \mathbf{w}_3)^\top$ using the cost function

$$\min_{\mathbf{w}} \|\mathbf{w}\|^2 \quad \text{subject to}$$

$$\mathbf{w}_1^\top \mathbf{x}_i \ge \mathbf{w}_2^\top \mathbf{x}_i \;\;\&\;\; \mathbf{w}_1^\top \mathbf{x}_i \ge \mathbf{w}_3^\top \mathbf{x}_i \quad \text{for } i \in \text{class 1}$$
$$\mathbf{w}_2^\top \mathbf{x}_i \ge \mathbf{w}_3^\top \mathbf{x}_i \;\;\&\;\; \mathbf{w}_2^\top \mathbf{x}_i \ge \mathbf{w}_1^\top \mathbf{x}_i \quad \text{for } i \in \text{class 2}$$
$$\mathbf{w}_3^\top \mathbf{x}_i \ge \mathbf{w}_1^\top \mathbf{x}_i \;\;\&\;\; \mathbf{w}_3^\top \mathbf{x}_i \ge \mathbf{w}_2^\top \mathbf{x}_i \quad \text{for } i \in \text{class 3}$$

• This is a quadratic optimization problem subject to linear constraints, and there is a unique minimum
• Note, a margin can also be included in the constraints

In practice there is little or no improvement over the binary case (see the sketch below).
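A joint multi-class linear SVM of this form (the Crammer–Singer formulation) is available in scikit-learn, so the comparison is easy to run; a sketch.

```python
# Sketch: a single joint multi-class SVM vs. the 1-vs-the-rest construction.
from sklearn.datasets import load_digits
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)

ovr = LinearSVC(multi_class="ovr", max_iter=10000).fit(X, y)
joint = LinearSVC(multi_class="crammer_singer", max_iter=10000).fit(X, y)

# As the slide notes, the two typically perform very similarly in practice.
print("1-vs-the-rest accuracy:    ", ovr.score(X, y))
print("joint multi-class accuracy:", joint.score(X, y))
```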

Page 25:

Random Forests

Page 26:

Random Forest Overview

• A natural multi-class classifier
• Fast at test time
• Start with a single tree, and then move on to forests

Page 27:

Generic trees and decision trees

[Figure, left: a general tree structure with a root node (0), internal (split) nodes, and terminal (leaf) nodes, numbered 0–14]

[Figure, right: a decision tree on features (x1, x2) with split tests x2 > 1 and x1 > 1, whose leaves classify points as the green, blue, or red class]

Page 28:

Testing

[Figure: a test point "?" passed down the decision tree]

• choose the class with the max posterior at the leaf node

Page 29:

Decision forest model: training and information gain

Node training (for discrete, non-parametric distributions): candidate splits of the data arriving at a node are compared by their information gain, computed from Shannon's entropy of the class distributions before and after the split (see the sketch below).

[Figure: class histograms before the split and after two candidate splits, Split 1 and Split 2]
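The slide names the ingredients without giving formulas; the standard definitions (an assumption here, but the usual ones for discrete distributions) are the Shannon entropy $H(S) = -\sum_c p(c)\log_2 p(c)$ and the gain $I = H(S) - \sum_{i \in \{L,R\}} \frac{|S_i|}{|S|} H(S_i)$. A minimal sketch:

```python
# Sketch: scoring candidate splits by information gain over class labels,
# using the standard Shannon-entropy definitions.
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, left, right):
    n = len(parent)
    return (entropy(parent)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))

labels = np.array([0, 0, 0, 1, 1, 1, 2, 2])
# "Split 1" separates the classes well, "Split 2" barely at all.
print(information_gain(labels, labels[:3], labels[3:]))     # ~0.95 bits
print(information_gain(labels, labels[::2], labels[1::2]))  # ~0.06 bits
```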

Page 30:

Trees vs Forests

• A single tree may overfit to the training data
• Instead, train multiple trees and combine their predictions
• Each tree can differ in both training data and node tests
• Achieve this by injecting randomness into the training algorithm

[Figure: the (x1, x2) decision-tree example again, showing the lack of margin in a single tree's decision boundary]

Page 31:

Forests and trees

A forest is an ensemble of trees. The trees are all slightly different from one another.

Page 32:

Classification forest – injecting randomness I

(1) Bagging (randomizing the training set): for forest training, a randomly sampled subset of the full training set is made available to each tree t.

Page 33:

Classification forest – injecting randomness II

(2) Node training: only a random subset of the possible node tests (the weak learner and its test parameters) is made available at each node.

Page 34:

From a single tree classification … what do we do at the leaf?

Prediction model: probabilistic – each leaf stores a posterior distribution over the classes.

Page 35:

How to combine the predictions of the trees?

The ensemble model: the posteriors of the individual trees (t = 1, 2, 3, …) are combined into a single forest output probability.

Page 36:

Random Forests – summary

• Node tests are usually chosen to be cheap to evaluate (weak classifiers), e.g. comparing pairs of feature components (see the usage sketch below)
• At run time, classification is very fast (like AdaBoost)
• The classifier is non-linear in feature space
• Training can be quite fast as well if the node tests are chosen completely randomly
• Many parameters: depth of tree, number of trees, type of node tests, random sampling
• Requires a lot of training data
• Large memory footprint (cf. k-NN)
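A usage sketch with scikit-learn's RandomForestClassifier, which implements the ingredients above (bagging, random node tests, averaged leaf posteriors); the parameter values here are arbitrary.

```python
# Sketch: a random forest combining many randomized trees.
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier

X, y = load_digits(return_X_y=True)

forest = RandomForestClassifier(
    n_estimators=100,     # number of trees
    max_depth=10,         # depth of each tree
    max_features="sqrt",  # random subset of candidate node tests per node
    bootstrap=True,       # bagging: a random subset of the data per tree
    random_state=0,
).fit(X, y)

# The ensemble output is the average of the per-tree class posteriors.
print(forest.predict_proba(X[:2]))
```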

Page 37:

Application: body tracking in Microsoft Kinect for XBox 360

Page 38:

Kinect – random forest classifier

Multi-class classification. Input data for each frame: an RGB image and a depth image. Output: inferred body parts.

Page 39:

Goal: train a random forest classifier to predict body parts (e.g. left elbow, left hand, right shoulder, neck), then fit a stickman model and track the skeleton.

Page 40:

Training/test data

From a motion capture system; e.g. 1 million training examples.

Page 41:

Node tests

Input data point: a pixel position in the 2D image of an input depth image. Feature response: the depth difference between two points. The node's output is computed from this response (see the sketch below).
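The published Kinect feature (Shotton et al., 2011) takes the depth difference between two probe pixels at offsets u and v, scaled by the depth at the pixel itself so the response is depth-invariant; a hedged sketch, where the image layout and out-of-bounds handling are assumptions.

```python
# Sketch: the depth-difference node test from the Kinect body-part forest
# (Shotton et al. 2011); image-handling details are assumptions.
import numpy as np

def depth_feature(depth, x, u, v, background=1e6):
    """Feature response at pixel x = (row, col) for an offset pair (u, v)."""
    def probe(offset):
        # Scale the offset by 1/depth(x) so the feature is depth-invariant.
        p = np.round(np.asarray(x) + np.asarray(offset) / depth[x]).astype(int)
        if 0 <= p[0] < depth.shape[0] and 0 <= p[1] < depth.shape[1]:
            return depth[p[0], p[1]]
        return background              # off-image probes read as "very far"
    return probe(u) - probe(v)         # depth difference between two points

depth = np.full((240, 320), 3.0)       # toy depth image: 3 m everywhere...
depth[100:140, 150:170] = 1.0          # ...with a nearer "body" region
# Probe the pixel itself (u = 0) against a point offset to its left.
print(depth_feature(depth, (120, 160), u=(0.0, 0.0), v=(0.0, -30.0)))  # -2.0
```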

Page 42:

Example result

[Figure: input depth image (background removed), inferred body parts (posterior), and the body parts rendered in 3D]

Page 43:

Neural Networks

Page 44:

Neural Network (NN) Overview

• A natural multi-class classifier
• Start with a single neuron, and then move on to a network

Page 45:

Single neuron – Perceptron [Rosenblatt 57]

Perceptron steps:

1. Map a vector x to a scalar score by an affine projection (w, b)
2. Transform the score monotonically but non-linearly by the sigmoid S(·)

[Figure: inputs x1 … xD weighted by w1 … wD, plus a bias term b on a constant input 1, summed (Σ) and passed through the sigmoid (S) to give P(y = 1 | x, w, b)]

$$S(z) = \frac{1}{1 + e^{-z}}$$

[Figure: the sigmoid S(z) plotted for z from −20 to 20, rising from 0 to 1]
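The two steps map directly to a few lines of numpy; a minimal sketch.

```python
# Sketch: a single neuron, affine projection followed by the sigmoid.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))    # S(z) = 1 / (1 + e^(-z))

def neuron(x, w, b):
    z = w @ x + b                      # step 1: affine score
    return sigmoid(z)                  # step 2: P(y = 1 | x, w, b)

rng = np.random.default_rng(0)
x, w, b = rng.normal(size=3), rng.normal(size=3), 0.1
print(neuron(x, w, b))                 # a value in (0, 1)
```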

Page 46:

Training as a binary classifier – Logistic Regression

With $f(\mathbf{x}) = \mathbf{w}^\top \mathbf{x} + b$, learning is formulated as the optimization problem

$$\min_{\mathbf{w} \in \mathbb{R}^d} \sum_{i=1}^{N} \log\left(1 + e^{-y_i f(\mathbf{x}_i)}\right) + \lambda \|\mathbf{w}\|^2$$

(first term: loss function; second term: regularization)

Comparison with the SVM (both losses viewed as functions of $y_i f(\mathbf{x}_i)$):
• both approximate the 0–1 loss
• both are convex optimizations
• very similar asymptotic behaviour
• the main difference is the smoothness of the LR loss, and that it is non-zero outside the SVM margin

A gradient-descent sketch follows.
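Because the log-loss is smooth, the objective can be minimized by plain gradient descent; a sketch on synthetic data (step size and iteration count are arbitrary).

```python
# Sketch: regularized logistic regression, minimizing
#   sum_i log(1 + exp(-y_i w.x_i)) + lambda ||w||^2,   with y_i in {-1, +1}.
import numpy as np

rng = np.random.default_rng(0)
N, d, lam = 200, 5, 0.01
w_true = rng.normal(size=d)
X = rng.normal(size=(N, d))
y = np.sign(X @ w_true + 0.1 * rng.normal(size=N))

w = np.zeros(d)
for _ in range(500):
    m = y * (X @ w)                # margins y_i f(x_i)
    # gradient: -sum_i y_i x_i / (1 + e^{m_i}) + 2 lambda w
    grad = -(X * (y / (1.0 + np.exp(m)))[:, None]).sum(axis=0) + 2 * lam * w
    w -= 0.01 * grad

print("training accuracy:", np.mean(np.sign(X @ w) == y))
```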

Page 47:

Neural Networks

Assemble neurons into feed-forward networks:
• More parameters and non-linearities
• More capacity for learning – can train a multi-way classifier
• No longer a convex optimization problem – but optimize using steepest descent on w

Deep architectures – multi-layer perceptron

Page 48:

Example: Convolutional Neural Networks (CNN)

• Choice of network architecture and loss function
• "Feed-forward" network – multiple layers from input to output
• In a CNN, the architecture is convolutional
• In this example, consider a loss function for multi-way classification of images

[Figure: data x passes through layer 1 (W1), layer 2 (W2), …, layer N (WN) to produce the output y]

Minimize with respect to w:

$$\sum_{i=1}^{N} \ell\left(f(\mathbf{x}_i, \mathbf{w}),\, y_i\right) + \lambda\, R(\mathbf{w})$$

LeCun, Hinton, Krizhevsky

Page 49:

Convolutional layers

[Figure: one convolutional layer: linear filters (F, b) → non-linear gating with ReLU, max(0, z) → max spatial pooling → sliding l2 channel normalisation → down-sampling; layers c1 … c5 and f6 … f8 map the image to a code]

slide credit: Andrea Vedaldi

Page 50:

Convolution

[Figure: input features x convolved (Σ) with a bank of 2 filters, F1 and F2, giving 2-dimensional output features y; a sketch follows]
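A sketch of this picture with numpy/scipy: one input feature map, a bank of two filters, a two-channel output (the filter values here are arbitrary).

```python
# Sketch: convolving one input feature map with a bank of 2 filters
# produces a 2-channel output feature map.
import numpy as np
from scipy.signal import correlate2d   # CNN "convolution" is cross-correlation

x = np.random.default_rng(0).normal(size=(8, 8))   # input features
bank = [
    np.array([[1.0, 0.0, -1.0]] * 3),              # F1: horizontal-edge filter
    np.array([[1.0, 0.0, -1.0]] * 3).T,            # F2: vertical-edge filter
]

y = np.stack([correlate2d(x, F, mode="same") for F in bank])
print(y.shape)                                     # (2, 8, 8): 2 output channels
```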

Page 51:

Filter bank example

• A bank of 256 filters (learned from data)
• Each filter is 1D (it applies to a grayscale image)
• Each filter is 16 × 16 pixels

Page 52:

Filtering

Each filter generates a feature map.

[Figure: an input image and the feature maps produced by several filters]

Page 53:

CNN components

[Figure: linear 3D filters (F, b), ReLU, max spatial pooling, sliding l2 normalization, and downsampling]

slide credit: Andrea Vedaldi

Page 54:

Convolutional layers

[Figure: the convolutional layer pipeline, repeated from Page 49]

slide credit: Andrea Vedaldi

Page 55:

Learning a CNN

[Figure: the network c1 … f8 with weights w1 … w8; the output is compared with the label ("bike") to give an error E(w)]

Stochastic gradient descent (with momentum, dropout, …) – a sketch follows.

slide credit: Andrea Vedaldi
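A sketch of the named update rule, stochastic gradient descent with momentum; a small least-squares problem stands in for the CNN error E(w).

```python
# Sketch: stochastic gradient descent with momentum.  The velocity v
# accumulates a decaying sum of past gradients; the quadratic loss here
# is a stand-in for the CNN training error E(w).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 10))
y = X @ rng.normal(size=10)

w, v = np.zeros(10), np.zeros(10)
lr, momentum, batch = 0.05, 0.9, 32

for step in range(200):
    idx = rng.choice(len(X), size=batch, replace=False)  # random mini-batch
    grad = 2 * X[idx].T @ (X[idx] @ w - y[idx]) / batch  # batch-loss gradient
    v = momentum * v - lr * grad
    w += v

print("final loss:", np.mean((X @ w - y) ** 2))
```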

Page 56:

Application: recognizing object categories

ImageNet Competition – 1000 categories

Page 57:

ImageNet Competition

• 1K classes
• ~1K training images per class
• ~1M training images
• 100K test images

Page 58:

Supervised CNN – summary

• Learn everything for the task at hand (including the feature vectors)
• All parameters are learnt by back-propagation
• Excellent performance
• Quite fast at test time
• Many parameters: e.g. 500M weights
• Requires a lot of training data, and "tricks" in training such as data augmentation and drop-out
• Not a convex optimization problem
• Training can be slow – days or weeks on a GPU
• Large memory footprint at run time, e.g. more than 1 GB

Page 59:

More …

• Other regressors, e.g. Gaussian process regression, random forests
• Different lassos, e.g. group lasso
• Other loss functions, e.g. ranking loss, dimensionality reduction
• Other NN architectures, e.g. RNN, LSTM

More on the web page: http://www.robots.ox.ac.uk/~az/lectures/ml