Learning From Data Lecture 11 Overfittingmagdon/courses/LFD-Slides/SlidesLect11.pdf · Learning...

24
Learning From Data Lecture 11 Overfitting What is Overfitting When does Overfitting Occur Stochastic and Deterministic Noise M. Magdon-Ismail CSCI 4100/6100

Transcript of Learning From Data Lecture 11 Overfittingmagdon/courses/LFD-Slides/SlidesLect11.pdf · Learning...

Page 1: Learning From Data Lecture 11 Overfittingmagdon/courses/LFD-Slides/SlidesLect11.pdf · Learning From Data Lecture 11 Overfitting What is Overfitting When does Overfitting Occur

Learning From DataLecture 11Overfitting

What is Overfitting

When does Overfitting OccurStochastic and Deterministic Noise

M. Magdon-IsmailCSCI 4100/6100

Page 2: Learning From Data Lecture 11 Overfittingmagdon/courses/LFD-Slides/SlidesLect11.pdf · Learning From Data Lecture 11 Overfitting What is Overfitting When does Overfitting Occur

recap: Nonlinear Transforms

Φ−→

1. Original data

xn ∈ X

2. Transform the data

zn = Φ(xn) ∈ Z

‘Φ−1’←−

4. Classify in X -space

g(x) = g̃(Φ(x)) = sign(w̃tΦ(x))

3. Separate data in Z-space

g̃(z) = sign(w̃tz)

X -space is Rd Z-space is Rd̃

x =

1x1

...xd

z = Φ(x) =

1Φ1(x)

...Φ

d̃(x)

=

1z1...zd̃

x1,x2, . . . ,xN z1, z2, . . . , zN

y1, y2, . . . , yN y1, y2, . . . , yN

no weights w̃ =

w0

w1

...w

dvc = d+ 1 dvc = d+ 1

g(x) = sign(w̃tΦ(x))

c© AML Creator: Malik Magdon-Ismail Overfitting: 2 /24 Superstitions −→

Page 3: Learning From Data Lecture 11 Overfittingmagdon/courses/LFD-Slides/SlidesLect11.pdf · Learning From Data Lecture 11 Overfitting What is Overfitting When does Overfitting Occur

Superstitions – Myth or Reality?

• Paraskevedekatriaphobia – fear of Friday the 13th.

– Are future Friday the 13ths really more dangerous?

• OCD

the subjects performs an action which leads to a good outcome and thereby generalizes it as

cause and effect: the action will always give good results. Having overfit the data, the subject

compulsively engages in that activity.

Humans are overfitting machines, very good at “finding coincidences”.

c© AML Creator: Malik Magdon-Ismail Overfitting: 3 /24 Simple illustration −→

Page 4: Learning From Data Lecture 11 Overfittingmagdon/courses/LFD-Slides/SlidesLect11.pdf · Learning From Data Lecture 11 Overfitting What is Overfitting When does Overfitting Occur

An Illustration of Overfitting on a Simple Example

Quadratic f

5 data points

A little noise (measurement error)

5 data points→ 4th order polynomial fit

x

y

DataTarget

Classic overfitting: simple target with excessively complex H.

The noise did us in. (why?)

c© AML Creator: Malik Magdon-Ismail Overfitting: 4 /24 Classic overfitting −→

Page 5: Learning From Data Lecture 11 Overfittingmagdon/courses/LFD-Slides/SlidesLect11.pdf · Learning From Data Lecture 11 Overfitting What is Overfitting When does Overfitting Occur

An Illustration of Overfitting on a Simple Example

Quadratic f

5 data points

A little noise (measurement error)

5 data points→ 4th order polynomial fit

x

y

DataTargetFit

Classic overfitting: simple target with excessively complex H.

Ein ≈ 0; Eout ≫ 0

The noise did us in. (why?)

c© AML Creator: Malik Magdon-Ismail Overfitting: 5 /24 What is overfitting? −→

Page 6: Learning From Data Lecture 11 Overfittingmagdon/courses/LFD-Slides/SlidesLect11.pdf · Learning From Data Lecture 11 Overfitting What is Overfitting When does Overfitting Occur

What is Overfitting?

Fitting the data more than is warranted

c© AML Creator: Malik Magdon-Ismail Overfitting: 6 /24 Is it bad generalization? −→

Page 7: Learning From Data Lecture 11 Overfittingmagdon/courses/LFD-Slides/SlidesLect11.pdf · Learning From Data Lecture 11 Overfitting What is Overfitting When does Overfitting Occur

Overfitting is Not Just Bad Generalization

in-sample error

out-of-sample error

bad generalization

VC dimension, dvc

Error

VC Analysis:

Covers bad generalization but with lots of slack – the VC bound is loose

c© AML Creator: Malik Magdon-Ismail Overfitting: 7 /24 Beyond bad generalization −→

Page 8: Learning From Data Lecture 11 Overfittingmagdon/courses/LFD-Slides/SlidesLect11.pdf · Learning From Data Lecture 11 Overfitting What is Overfitting When does Overfitting Occur

Overfitting is Not Just Bad Generalization

in-sample error

out-of-sample error

overfitting

VC dimension, dvc

Error

Overfitting:

Going for lower and lower Ein results in higher and higher Eout

c© AML Creator: Malik Magdon-Ismail Overfitting: 8 /24 Case study: simple and complex f −→

Page 9: Learning From Data Lecture 11 Overfittingmagdon/courses/LFD-Slides/SlidesLect11.pdf · Learning From Data Lecture 11 Overfitting What is Overfitting When does Overfitting Occur

Case Study: 2nd vs 10th Order Polynomial Fit

x

y

DataTarget

x

y

DataTarget

10th order f with noise. 50th order f with no noise.

H2: 2nd order polynomial fit

H10: 10th order polynomial fit←− special case of linear models with feature transform x 7→ (1, x, x2, · · · ).

Which model do you pick for which problem and why?

c© AML Creator: Malik Magdon-Ismail Overfitting: 9 /24 H2 versus H10 −→

Page 10: Learning From Data Lecture 11 Overfittingmagdon/courses/LFD-Slides/SlidesLect11.pdf · Learning From Data Lecture 11 Overfitting What is Overfitting When does Overfitting Occur

Case Study: 2nd vs 10th Order Polynomial Fit

x

y

DataTarget

x

y

DataTarget

10th order f with noise. 50th order f with no noise.

H2: 2nd order polynomial fit

H10: 10th order polynomial fit←− special case of linear models with feature transform x 7→ (1, x, x2, · · · ).

Which model do you pick for which problem and why?

c© AML Creator: Malik Magdon-Ismail Overfitting: 10 /24 H2 wins for both cases −→

Page 11: Learning From Data Lecture 11 Overfittingmagdon/courses/LFD-Slides/SlidesLect11.pdf · Learning From Data Lecture 11 Overfitting What is Overfitting When does Overfitting Occur

Case Study: 2nd vs 10th Order Polynomial Fit

replacements

x

y

Data2nd Order Fit10th Order Fit

x

y

Data2nd Order Fit10th Order Fit

simple noisy target

2nd Order 10th Order

Ein 0.050 0.034Eout 0.127 9.00

complex noiseless target

2nd Order 10th Order

Ein 0.029 10−5

Eout 0.120 7680

Go figure:

Simpler H is better even for the more complex target with no noise.

c© AML Creator: Malik Magdon-Ismail Overfitting: 11 /24 Is there really no noise −→

Page 12: Learning From Data Lecture 11 Overfittingmagdon/courses/LFD-Slides/SlidesLect11.pdf · Learning From Data Lecture 11 Overfitting What is Overfitting When does Overfitting Occur

Is there Really “No Noise” with the Complex f?

x

y

DataTarget

x

y

DataTarget

Simple f with noise. Complex f with no noise.

H should match quantity and quality of data, not f

c© AML Creator: Malik Magdon-Ismail Overfitting: 12 /24 Look only at the data −→

Page 13: Learning From Data Lecture 11 Overfittingmagdon/courses/LFD-Slides/SlidesLect11.pdf · Learning From Data Lecture 11 Overfitting What is Overfitting When does Overfitting Occur

Is there Really “No Noise” with the Complex f?

x

y

x

y

Simple f with noise. Complex f with no noise.

H should match quantity and quality of data, not f

c© AML Creator: Malik Magdon-Ismail Overfitting: 13 /24 Learning curves for H2, H10 −→

Page 14: Learning From Data Lecture 11 Overfittingmagdon/courses/LFD-Slides/SlidesLect11.pdf · Learning From Data Lecture 11 Overfitting What is Overfitting When does Overfitting Occur

When is H2 Better than H10?

Learning curves for H2 Learning curves for H10

Number of Data Points, N

Exp

ectedError Eout

Ein

Number of Data Points, N

Exp

ectedError

Eout

Ein

Overfitting:Eout(H10) > Eout(H2)

c© AML Creator: Malik Magdon-Ismail Overfitting: 14 /24 Overfit measure σ2 vs. N −→

Page 15: Learning From Data Lecture 11 Overfittingmagdon/courses/LFD-Slides/SlidesLect11.pdf · Learning From Data Lecture 11 Overfitting What is Overfitting When does Overfitting Occur

Overfit Measure: Eout(H10)− Eout(H2)

Number of Data Points, N

Noise

Level,σ2

80 100 120-0.2

-0.1

0

0.1

0.2

0

1

2

c© AML Creator: Malik Magdon-Ismail Overfitting: 15 /24 Overfit measure Qf vs. N −→

Page 16: Learning From Data Lecture 11 Overfittingmagdon/courses/LFD-Slides/SlidesLect11.pdf · Learning From Data Lecture 11 Overfitting What is Overfitting When does Overfitting Occur

Overfit Measure: Eout(H10)− Eout(H2)

Number of Data Points, N

Noise

Level,σ2

80 100 120-0.2

-0.1

0

0.1

0.2

0

1

2

Number of Data Points, N

TargetCom

plexity,Q

f

80 100 120-0.2

-0.1

0

0.1

0.2

0

25

50

75

100

Number of data points ↑ Overfitting ↓Noise ↑ Overfitting ↑

Target complexity ↑ Overfitting ↑

c© AML Creator: Malik Magdon-Ismail Overfitting: 16 /24 Define ‘noise’ −→

Page 17: Learning From Data Lecture 11 Overfittingmagdon/courses/LFD-Slides/SlidesLect11.pdf · Learning From Data Lecture 11 Overfitting What is Overfitting When does Overfitting Occur

Noise

That part of y we cannot model

it has two sources . . .

c© AML Creator: Malik Magdon-Ismail Overfitting: 17 /24 Stochastic noise −→

Page 18: Learning From Data Lecture 11 Overfittingmagdon/courses/LFD-Slides/SlidesLect11.pdf · Learning From Data Lecture 11 Overfitting What is Overfitting When does Overfitting Occur

Stochastic Noise — Data Error

We would like to learn from ◦:yn = f(xn)

Unfortunately, we only observe ◦:yn = f(xn) + ‘stochastic noise’

↑no one can model this

x

y

y = f(x)

stoch. noise

Stochastic Noise: fluctuations/measurement errors we cannot model.

c© AML Creator: Malik Magdon-Ismail Overfitting: 18 /24 Deterministic noise −→

Page 19: Learning From Data Lecture 11 Overfittingmagdon/courses/LFD-Slides/SlidesLect11.pdf · Learning From Data Lecture 11 Overfitting What is Overfitting When does Overfitting Occur

Deterministic Noise — Model Error

We would like to learn from ◦:yn = h∗(xn)

Unfortunately, we only observe ◦:yn = f(xn)

= h∗(xn) + ‘deterministic noise’

↑H cannot model this

x

y

best approximation to f in H

h∗(x)

y = f(x)

det. noise

Deterministic Noise: the part of f we cannot model.

c© AML Creator: Malik Magdon-Ismail Overfitting: 19 /24 Both hurt learning −→

Page 20: Learning From Data Lecture 11 Overfittingmagdon/courses/LFD-Slides/SlidesLect11.pdf · Learning From Data Lecture 11 Overfitting What is Overfitting When does Overfitting Occur

Stochastic & Deterministic Noise Hurt Learning

Stochastic Noise

x

y

f(x)

y = f(x)+stoch. noise

source: random measurement errors

re-measure ynstochastic noise changes.

change H

stochastic noise the same.

Deterministic Noise

x

y

h∗

y = h∗(x)+det. noise

source: learner’s H cannot model f

re-measure yndeterministic noise the same.

change H

deterministic noise changes.

We have single D and fixed H so we cannot distinguish

c© AML Creator: Malik Magdon-Ismail Overfitting: 20 /24 Stochastic noise and bias-var −→

Page 21: Learning From Data Lecture 11 Overfittingmagdon/courses/LFD-Slides/SlidesLect11.pdf · Learning From Data Lecture 11 Overfitting What is Overfitting When does Overfitting Occur

Noise and the Bias-Variance Decomposition

y = f(x) + ǫ

↑measurement error

E[Eout(x)] = ED,ǫ[(g(x)− f(x)− ǫ)2]

= ED,ǫ[(g(x)− f(x))2 + 2(g(x)− f(x))ǫ + ǫ2]

↓ ↓ ↓

bias + var 0 σ2

c© AML Creator: Malik Magdon-Ismail Overfitting: 21 /24 bias-var-σ2 and noise −→

Page 22: Learning From Data Lecture 11 Overfittingmagdon/courses/LFD-Slides/SlidesLect11.pdf · Learning From Data Lecture 11 Overfitting What is Overfitting When does Overfitting Occur

Noise and the Bias-Variance Decomposition

y = f(x) + ǫ

↑measurement error

E[Eout(x)] = σ2 + bias + var

↑ ↑ ↑stochastic deterministic indirect

noise noise impactof noise

c© AML Creator: Malik Magdon-Ismail Overfitting: 22 /24 Noise causes overfitting −→

Page 23: Learning From Data Lecture 11 Overfittingmagdon/courses/LFD-Slides/SlidesLect11.pdf · Learning From Data Lecture 11 Overfitting What is Overfitting When does Overfitting Occur

Noise is the Culprit

Overfitting is the disease

Noise is the causeLearning is led astray by fitting the noise more than the signal

Cures

Regularization: Putting on the brakes.

Validation: A reality check from peeking at Eout (the bottom line).

c© AML Creator: Malik Magdon-Ismail Overfitting: 23 /24 Regularization teaser −→

Page 24: Learning From Data Lecture 11 Overfittingmagdon/courses/LFD-Slides/SlidesLect11.pdf · Learning From Data Lecture 11 Overfitting What is Overfitting When does Overfitting Occur

Regularization

no regularization regularization!

x

y

Data

Target

Fit

x

y

c© AML Creator: Malik Magdon-Ismail Overfitting: 24 /24