Lecture 4: Model Selection
Tuo Zhao
Schools of ISYE and CSE, Georgia Tech
ISYE/CSE 6740: Computational Data Analysis
Regularization Selection

Given $\lambda_1$ and $\lambda_2$, we solve
$$\hat{\theta}_{\lambda_1} = \mathop{\mathrm{argmin}}_{\theta}\ \mathcal{L}(\theta) + \frac{\lambda_1}{2}\,\|\theta\|_2^2, \qquad \hat{\theta}_{\lambda_2} = \mathop{\mathrm{argmin}}_{\theta}\ \mathcal{L}(\theta) + \frac{\lambda_2}{2}\,\|\theta\|_2^2.$$
Which one is better?

(Continuous) Model Selection

Tuo Zhao — Lecture 4: Model Selection 2/19
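To make the comparison concrete, here is a minimal sketch (data and λ values are made up) of the two regularized fits for a one-dimensional least-squares loss, where the minimizer has a closed form:

```python
# Ridge fit for a 1-D model y ≈ θx with objective
#   L(θ) = ½ Σ (y_i − θ x_i)² + (λ/2) θ²,
# whose minimizer is θ̂_λ = Σ x_i y_i / (Σ x_i² + λ).
def ridge_1d(xs, ys, lam):
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.2, 1.9, 3.1, 4.2]             # made-up data, roughly y = x

theta_1 = ridge_1d(xs, ys, lam=0.1)   # weak regularization
theta_2 = ridge_1d(xs, ys, lam=10.0)  # strong regularization
print(theta_1, theta_2)               # the larger λ shrinks θ̂ toward zero
```

Both fits are legitimate; which λ is "better" is exactly the model-selection question the rest of the lecture addresses.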
Margin?
Tuo Zhao — Lecture 4: Model Selection 3/19
Model Selection

Given a regression problem, we consider two models:
$$Y = \theta_0 + \theta_1 X + \theta_2 X^2 + \theta_3 X^3,$$
$$Y = \theta_0 + \theta_1 X + \theta_2 X^2.$$
Which one is better?

(Discrete) Model Selection.

Tuo Zhao — Lecture 4: Model Selection 4/19
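A quick way to see why training error alone cannot arbitrate between the two models: the cubic nests the quadratic, so its least-squares training error is never larger. A minimal sketch (data are made up; `fit_poly` is an illustrative helper solving the normal equations):

```python
def fit_poly(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations."""
    n = degree + 1
    M = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    for col in range(n):                  # Gaussian elimination with pivoting
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= f * M[col][c]
            b[r] -= f * b[col]
    coef = [0.0] * n
    for r in reversed(range(n)):          # back-substitution
        coef[r] = (b[r] - sum(M[r][c] * coef[c] for c in range(r + 1, n))) / M[r][r]
    return coef

def mse(coef, xs, ys):
    preds = (sum(c * x ** i for i, c in enumerate(coef)) for x in xs)
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [0.1, 0.4, 0.9, 2.3, 3.9, 6.4, 8.8]   # made-up, roughly quadratic data
quad = fit_poly(xs, ys, degree=2)
cubic = fit_poly(xs, ys, degree=3)
# The richer (cubic) model can only match or beat the quadratic on training
# data, so we need held-out data to compare them honestly.
print(mse(quad, xs, ys), mse(cubic, xs, ys))
```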
Residuals?
Tuo Zhao — Lecture 4: Model Selection 5/19
Regularization and Constraint

Constrained Empirical Risk Minimization:
$$\hat{\theta} = \mathop{\mathrm{argmin}}_{\theta}\ \mathcal{L}(\theta) \quad \text{subject to} \quad \|\theta\|_2^2 \le R.$$

Min-Max Problem (Lagrangian form):
$$(\hat{\theta}, \hat{\lambda}) = \mathop{\mathrm{argmin}}_{\theta}\, \max_{\lambda \ge 0}\ \mathcal{L}(\theta) + \lambda\,(\|\theta\|_2^2 - R).$$

Regularized Empirical Risk Minimization:
$$\hat{\theta} = \mathop{\mathrm{argmin}}_{\theta}\ \mathcal{L}(\theta) + \lambda\,\|\theta\|_2^2.$$

Tuo Zhao — Lecture 4: Model Selection 6/19
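The correspondence can be seen numerically in the 1-D ridge example: as λ grows, the squared norm of the regularized solution shrinks monotonically, so every feasible radius R is reached by some λ, and the constrained and regularized problems trace out the same solution path. A sketch with made-up data:

```python
# For the 1-D ridge objective ½Σ(y − θx)² + (λ/2)θ², the minimizer is
# θ̂_λ = Σxy / (Σx² + λ); its squared norm decreases monotonically in λ.
def ridge_1d(xs, ys, lam):
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

xs, ys = [1.0, 2.0, 3.0], [1.1, 2.2, 2.9]   # made-up data
lams = [0.0, 0.5, 2.0, 8.0, 32.0]
radii = [ridge_1d(xs, ys, lam) ** 2 for lam in lams]
print(radii)   # ‖θ̂_λ‖² strictly decreasing in λ
```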
Regularization and Constraint

Constrained Empirical Risk Minimization:
$$\hat{f}_R = \mathop{\mathrm{argmin}}_{f}\ \mathcal{L}(f) \quad \text{subject to} \quad f \in \mathcal{F}_R.$$

Regularized Empirical Risk Minimization:
$$\hat{f}_\lambda = \mathop{\mathrm{argmin}}_{f}\ \mathcal{L}(f) + \mathcal{R}_\lambda(f).$$

One-to-one correspondence: $\mathcal{F}_R$ and $\mathcal{R}_\lambda$.

Tuo Zhao — Lecture 4: Model Selection 7/19
Learn to Generalize

Given a loss function $\ell(f(X), Y)$, we define
$$\mathcal{E}(f) = \mathbb{E}_{X,Y}\,\ell(f(X), Y).$$

Empirical Risk Minimization:
$$\hat{f} = \mathop{\mathrm{argmin}}_{f \in \mathcal{F}_R}\ \hat{\mathcal{E}}(f), \quad \text{where} \quad \hat{\mathcal{E}}(f) = \underbrace{\frac{1}{n}\sum_{i=1}^{n} \ell(f(x_i), y_i)}_{\text{Training Error}}.$$

How do we estimate the testing error $\mathcal{E}(\hat{f})$?

We need an independent data set!

Tuo Zhao — Lecture 4: Model Selection 8/19
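An independent sample gives an unbiased estimate of the testing error. As a toy check (the predictor, data distribution, and sample size are illustrative assumptions): for a fixed predictor $f(x) = x$ on data $Y = X + \varepsilon$ with $\varepsilon \sim N(0,1)$, the true risk under squared loss is $\mathrm{Var}(\varepsilon) = 1$, and the empirical average over fresh data recovers it.

```python
import random

rng = random.Random(0)

# Draw independent data from Y = X + ε, ε ~ N(0, 1).
def draw(n):
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [x + rng.gauss(0, 1) for x in xs]
    return xs, ys

# Empirical risk of f(x) = x on an independent sample: an unbiased estimate
# of the true risk E[(f(X) − Y)²] = Var(ε) = 1.
xs, ys = draw(20000)
est_risk = sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
print(est_risk)   # close to the true risk 1.0
```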
A Simple Note on Learning Theory

Oracle Model:
$$f^* = \mathop{\mathrm{argmin}}_{f \in \mathcal{F}_R}\ \mathcal{E}(f).$$

Generalization Bound:
$$\underbrace{\mathcal{E}(\hat{f})}_{\text{Testing Error}} - \underbrace{\hat{\mathcal{E}}(\hat{f})}_{\text{Training Error}} \le\ ?$$

Excess Risk Bound:
$$\underbrace{\mathcal{E}(\hat{f})}_{\text{Testing Error}} - \underbrace{\mathcal{E}(f^*)}_{\text{Oracle Error}} \le\ ?$$

Tuo Zhao — Lecture 4: Model Selection 9/19
Learning and Validation Sets

We split the whole dataset into two disjoint subsets:

Training Set: $\{(x_1, y_1), \dots, (x_n, y_n)\}$
$$\hat{f}_{R_k} = \mathop{\mathrm{argmin}}_{f \in \mathcal{F}_{R_k}}\ \hat{\mathcal{E}}(f)$$

Validation Set: $\{(\tilde{x}_1, \tilde{y}_1), \dots, (\tilde{x}_m, \tilde{y}_m)\}$
$$\hat{\lambda} = \mathop{\mathrm{argmin}}_{\lambda_k \in \{\lambda_1, \dots, \lambda_K\}}\ \tilde{\mathcal{E}}(\hat{f}_{R_k}), \quad \text{where} \quad \tilde{\mathcal{E}}(\hat{f}_{R_k}) = \frac{1}{m}\sum_{i=1}^{m} \ell(\hat{f}_{R_k}(\tilde{x}_i), \tilde{y}_i)$$

Tuo Zhao — Lecture 4: Model Selection 10/19
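The hold-out selection procedure above can be sketched end to end for the 1-D ridge model (the data, λ grid, and `ridge_1d` helper are illustrative assumptions):

```python
# Fit on the training set for each candidate λ, then pick the λ whose fit
# has the smallest error on the held-out validation set.
def ridge_1d(xs, ys, lam):
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def val_error(theta, xs, ys):
    return sum((theta * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

train_x, train_y = [1.0, 2.0, 3.0, 4.0], [1.3, 1.8, 3.2, 4.1]   # training set
val_x, val_y = [1.5, 2.5, 3.5], [1.6, 2.4, 3.6]                 # validation set

grid = [0.01, 0.1, 1.0, 10.0]
fits = {lam: ridge_1d(train_x, train_y, lam) for lam in grid}
errors = {lam: val_error(fits[lam], val_x, val_y) for lam in grid}
best_lam = min(errors, key=errors.get)
print(best_lam, errors[best_lam])
```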
Cross Validation
Tuo Zhao — Lecture 4: Model Selection 11/19
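The cross-validation scheme this slide illustrates can be written out as a minimal K-fold sketch for the 1-D ridge model (data, λ grid, and the fold assignment are illustrative assumptions):

```python
# K-fold cross-validation: each fold serves once as the validation set, and
# the K validation errors are averaged for each candidate λ.
def ridge_1d(xs, ys, lam):
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def cv_error(xs, ys, lam, k=5):
    n, total = len(xs), 0.0
    for fold in range(k):
        val_idx = set(range(fold, n, k))              # every k-th point
        tr_x = [xs[i] for i in range(n) if i not in val_idx]
        tr_y = [ys[i] for i in range(n) if i not in val_idx]
        theta = ridge_1d(tr_x, tr_y, lam)
        total += sum((theta * xs[i] - ys[i]) ** 2 for i in val_idx) / len(val_idx)
    return total / k

xs = [0.5 * i for i in range(1, 11)]
ys = [x + (-1) ** i * 0.2 for i, x in enumerate(xs)]   # made-up data
scores = {lam: cv_error(xs, ys, lam) for lam in [0.01, 0.1, 1.0]}
print(min(scores, key=scores.get))
```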
Double Cross Validation

Cross validation: a reliable estimate of the testing error?

The optimal λ is selected based on all of the data.

No! The cross-validation error is not obtained from independent data.

Double Cross Validation:
Learning Set: training the model
Validation Set: selecting the model
Testing Set: estimating the testing error

Tuo Zhao — Lecture 4: Model Selection 12/19
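The three-way split can be sketched as follows (data, split sizes, λ grid, and `ridge_1d` are illustrative assumptions): select λ on the validation set, then estimate the testing error on data that played no role in either fitting or selection.

```python
def ridge_1d(xs, ys, lam):
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def err(theta, xs, ys):
    return sum((theta * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

data = [(0.3 * i, 0.3 * i + (-1) ** i * 0.15) for i in range(1, 16)]  # made up
learn, valid, test = data[0:9], data[9:12], data[12:15]   # disjoint splits
lx, ly = zip(*learn); vx, vy = zip(*valid); tx, ty = zip(*test)

grid = [0.01, 0.1, 1.0, 10.0]
# Select λ on the validation set ...
best = min(grid, key=lambda lam: err(ridge_1d(lx, ly, lam), vx, vy))
# ... and only then estimate the testing error on the untouched testing set.
test_err = err(ridge_1d(lx, ly, best), tx, ty)
print(best, test_err)
```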
Double Cross Validation
Tuo Zhao — Lecture 4: Model Selection 13/19
Early Stopping
Tuo Zhao — Lecture 4: Model Selection 14/19
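Early stopping treats the iteration count itself as the model-selection knob: run gradient descent on the training loss and halt once the validation error stops improving. A minimal sketch for the 1-D least-squares model (data, learning rate, and patience are illustrative assumptions):

```python
# Gradient descent on the training loss Σ(θx − y)², halted when the
# validation error fails to improve for `patience` consecutive steps.
train_x, train_y = [1.0, 2.0, 3.0], [1.4, 1.7, 3.3]   # made-up data
val_x, val_y = [1.5, 2.5], [1.5, 2.6]

def val_err(theta):
    return sum((theta * x - y) ** 2 for x, y in zip(val_x, val_y)) / len(val_x)

theta, lr, patience = 0.0, 0.01, 3
best_theta, best_err, bad_steps = theta, val_err(theta), 0
for step in range(1000):
    grad = sum(2 * (theta * x - y) * x for x, y in zip(train_x, train_y))
    theta -= lr * grad
    e = val_err(theta)
    if e < best_err:
        best_theta, best_err, bad_steps = theta, e, 0
    else:
        bad_steps += 1
        if bad_steps >= patience:   # validation error stopped improving
            break
print(best_theta, best_err)
```

Note the stopped iterate, not the fully converged one, is returned: stopping early acts as an implicit form of regularization.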
Grid Search
Tuo Zhao — Lecture 4: Model Selection 15/19
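Grid search simply evaluates every combination of hyperparameter values on a validation criterion and keeps the best; the cost grows multiplicatively with each added dimension. A sketch (the two grids and the `validation_error` criterion are made-up stand-ins for a real validation error):

```python
import itertools

lams = [0.01, 0.1, 1.0]
degrees = [1, 2, 3]

def validation_error(lam, degree):        # hypothetical stand-in criterion
    return (lam - 0.1) ** 2 + (degree - 2) ** 2

# Exhaustively score every (λ, degree) pair on the grid.
candidates = list(itertools.product(lams, degrees))
best = min(candidates, key=lambda p: validation_error(*p))
print(len(candidates), best)   # 9 candidates; best is (0.1, 2)
```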
Climb Hill
Tuo Zhao — Lecture 4: Model Selection 16/19
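Hill climbing replaces the exhaustive grid with a local search: start somewhere on the grid and move to a neighboring value whenever it improves the validation error. A sketch over a 1-D λ grid (the grid and the `validation_error` criterion are made-up stand-ins):

```python
grid = [0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0]

def validation_error(lam):                # hypothetical stand-in criterion
    return (lam - 0.3) ** 2

i = 0                                     # start at the first grid point
while True:
    neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(grid)]
    best_j = min(neighbors, key=lambda j: validation_error(grid[j]))
    if validation_error(grid[best_j]) < validation_error(grid[i]):
        i = best_j                        # climb to the better neighbor
    else:
        break                             # local optimum reached
print(grid[i])   # 0.3
```

This needs far fewer evaluations than a full grid sweep, but it can stall at a local optimum if the validation error is not unimodal along the grid.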
Hyperparameter Optimization

Regularization parameter selection can be viewed as an optimization problem
$$\hat{\theta} = \mathop{\mathrm{argmin}}_{\theta}\ \tilde{\mathcal{E}}(\theta),$$
where $\tilde{\mathcal{E}}(\theta)$ is the validation error on the validation set (here θ denotes the hyperparameters).

Different assumptions on $\tilde{\mathcal{E}}(\theta)$ lead to different algorithms.

Example: Gaussian Process, ....

Tuo Zhao — Lecture 4: Model Selection 17/19
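A toy sketch of the sequential model-based idea behind such algorithms. A Gaussian-process surrogate is the usual choice; for brevity this version substitutes a quadratic surrogate fit through the best observed (λ, validation-error) pairs and evaluates its minimizer next (the criterion, starting points, and surrogate choice are all illustrative assumptions):

```python
def validation_error(lam):                # hypothetical stand-in criterion
    return (lam - 1.7) ** 2 + 0.5

observed = [(0.0, validation_error(0.0)), (1.0, validation_error(1.0)),
            (4.0, validation_error(4.0))]

for _ in range(3):
    # Fit a quadratic surrogate exactly through the three best points so far;
    # divided differences give its leading coefficients in closed form.
    (x0, y0), (x1, y1), (x2, y2) = sorted(observed, key=lambda p: p[1])[:3]
    d0 = y0 / ((x0 - x1) * (x0 - x2))
    d1 = y1 / ((x1 - x0) * (x1 - x2))
    d2 = y2 / ((x2 - x0) * (x2 - x1))
    a = d0 + d1 + d2                      # surrogate is aλ² + bλ + c
    b = -(d0 * (x1 + x2) + d1 * (x0 + x2) + d2 * (x0 + x1))
    if a <= 0:
        break                             # surrogate not convex; stop
    nxt = -b / (2 * a)                    # minimizer of the surrogate
    if any(abs(nxt - x) < 1e-9 for x, _ in observed):
        break                             # already evaluated; converged
    observed.append((nxt, validation_error(nxt)))

best_lam, best_err = min(observed, key=lambda p: p[1])
print(best_lam, best_err)
```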
Random Search
Tuo Zhao — Lecture 4: Model Selection 18/19
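Random search draws hyperparameter candidates at random (often log-uniformly, since regularization parameters span orders of magnitude) and keeps the best. A sketch (the range, sample count, and `validation_error` criterion are made-up stand-ins):

```python
import random

rng = random.Random(0)

def validation_error(lam):                # hypothetical stand-in criterion
    return (lam - 0.5) ** 2

# Sample λ log-uniformly from [1e-3, 1e1] and keep the best candidate.
samples = [10 ** rng.uniform(-3, 1) for _ in range(50)]
best = min(samples, key=validation_error)
print(best)
```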
Random Latin Search
Tuo Zhao — Lecture 4: Model Selection 19/19
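A Latin (hypercube) design improves on plain random search by stratifying: partition each axis into n equal strata and place exactly one sample per stratum per dimension, using an independent random permutation of the strata for every axis. A minimal sketch (the `latin_hypercube` helper and its parameters are illustrative):

```python
import random

def latin_hypercube(n, d, rng):
    """n points in [0, 1)^d with one sample per stratum along each axis."""
    coords = []
    for _ in range(d):
        perm = list(range(n))             # one stratum index per sample
        rng.shuffle(perm)
        coords.append([(p + rng.random()) / n for p in perm])
    return list(zip(*coords))

rng = random.Random(0)
points = latin_hypercube(n=10, d=2, rng=rng)
print(points[:2])
```

Unlike independent uniform draws, this guarantees every marginal stratum is covered, so no region of any single hyperparameter's range is left unexplored.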