### Transcript of "Learning From Data" Lecture 9 - Yaser Abu-Mostafa (work.caltech.edu/slides/slides09.pdf)

Review of Lecture 8

Bias and variance
Expected value of Eout w.r.t. the data set D decomposes as

E_D[Eout(g^(D))] = bias + var

where bias measures how far the average hypothesis ḡ(x) is from the target f(x), and var measures how much g^(D)(x) varies around ḡ(x).

Learning curves: how Ein and Eout vary with N

B-V:  expected Eout = bias + variance
VC:   expected Eout = in-sample error + generalization error

Rule of thumb: N is proportional to the VC dimension.

[Figures: expected error versus number of data points N, under the bias-variance and the VC analyses]

Learning From Data
Yaser S. Abu-Mostafa
California Institute of Technology
Lecture 9: The Linear Model II

Sponsored by Caltech's Provost Office, E&AS Division, and IST - Tuesday, May 1, 2012

Where we are

Linear classification ✓
Linear regression ✓
Logistic regression ?
Nonlinear transforms ✓

AML Creator: Yaser Abu-Mostafa - LFD Lecture 9 2/24

Nonlinear transforms

x = (x0, x1, …, xd)  →  z = (z0, z1, …, zd̃)

Each zi = φi(x); in vector form, z = Φ(x).
Example: z = (1, x1, x2, x1x2, x1², x2²)

Final hypothesis g(x) in X space:
sign(w̃^T Φ(x)) or w̃^T Φ(x)
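As a minimal sketch (NumPy; the function names are mine), the example transform and the final hypothesis might look like:

```python
import numpy as np

def phi(x):
    """The example transform z = (1, x1, x2, x1*x2, x1^2, x2^2)."""
    x1, x2 = x
    return np.array([1.0, x1, x2, x1 * x2, x1**2, x2**2])

def g(w_tilde, x):
    """Final hypothesis in X space: sign(w~^T Phi(x))."""
    return np.sign(w_tilde @ phi(x))
```

Learning happens on the z vectors; the same phi is applied again at prediction time.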


The price we pay

x = (x0, x1, …, xd)  →  z = (z0, z1, …, zd̃)

w  →  w̃

dvc = d + 1  →  dvc ≤ d̃ + 1


Two non-separable cases

[Figures: two data sets that are not linearly separable - one almost separable, one separable by a circle]


First case

[Figure: almost linearly separable data]

Use a linear model in X; accept Ein > 0
or
Insist on Ein = 0; go to a high-dimensional Z


Second case

[Figure: data separable by a circle centered at the origin]

z = (1, x1, x2, x1x2, x1², x2²)

Why not: z = (1, x1², x2²)
or better yet: z = (1, x1² + x2²)
or even: z = (x1² + x2² − 0.6)
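A quick sanity check of the last transform (plain Python; the 0.6 threshold is from the slide, the sample points are mine): sign(x1² + x2² − 0.6) classifies points by their distance from the origin.

```python
def h(x1, x2):
    """Hypothesis using the single feature z = x1^2 + x2^2 - 0.6."""
    return 1 if x1**2 + x2**2 - 0.6 > 0 else -1
```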


Lesson learned

Looking at the data before choosing the model can be hazardous to your Eout

Data snooping


Logistic regression - Outline

• The model
• Error measure
• Learning algorithm


A third linear model

The signal: s = Σ_{i=0}^{d} wi xi

linear classification:  h(x) = sign(s)
linear regression:      h(x) = s
logistic regression:    h(x) = θ(s)

[Figures: three network diagrams, each combining the inputs x0, x1, …, xd into the signal s and applying the corresponding output function to produce h(x)]


The logistic function

The formula:

θ(s) = e^s / (1 + e^s)

[Figure: θ(s) rising from 0 to 1 as s runs from −∞ to +∞]

soft threshold: uncertainty
sigmoid: "flattened-out" s
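A direct transcription of the formula (the branch on the sign of s is my own addition, to keep e^s from overflowing for large s):

```python
import math

def theta(s):
    """Logistic function theta(s) = e^s / (1 + e^s)."""
    if s < 0:
        e = math.exp(s)                    # safe: s < 0 means e^s < 1
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(-s))      # same value, rewritten to avoid overflow
```

The symmetry θ(−s) = 1 − θ(s), used below when writing the likelihood, is easy to verify from the formula.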


Probability interpretation

h(x) = θ(s) is interpreted as a probability.

Example: prediction of heart attacks
Input x: cholesterol level, age, weight, etc.
θ(s): probability of a heart attack
The signal s = w^T x: "risk score"
h(x) = θ(s)

Genuine probability

Data (x, y) with binary y, generated by a noisy target:

P(y | x) = f(x)      for y = +1
           1 − f(x)  for y = −1

The target f : R^d → [0, 1] is the probability.
Learn g(x) = θ(w^T x) ≈ f(x)


Error measure

For each (x, y), y is generated by the probability f(x).
Plausible error measure based on likelihood:
if h = f, how likely is it to get y from x?

P(y | x) = h(x)      for y = +1
           1 − h(x)  for y = −1


Formula for likelihood

P(y | x) = h(x)      for y = +1
           1 − h(x)  for y = −1

Substitute h(x) = θ(w^T x), noting θ(−s) = 1 − θ(s):

P(y | x) = θ(y w^T x)

The likelihood of D = (x1, y1), …, (xN, yN) is

∏_{n=1}^{N} P(yn | xn) = ∏_{n=1}^{N} θ(yn w^T xn)


Maximizing the likelihood

Maximizing the likelihood is equivalent to minimizing

−(1/N) ln( ∏_{n=1}^{N} θ(yn w^T xn) )  =  (1/N) Σ_{n=1}^{N} ln( 1 / θ(yn w^T xn) )      [ θ(s) = 1 / (1 + e^{−s}) ]

Ein(w) = (1/N) Σ_{n=1}^{N} ln( 1 + e^{−yn w^T xn} )

where each term e(h(xn), yn) = ln(1 + e^{−yn w^T xn}) is the "cross-entropy" error.
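The cross-entropy error is easy to evaluate numerically; a sketch (NumPy; np.logaddexp(0, -s) computes ln(1 + e^{−s}) without overflow; the names are mine):

```python
import numpy as np

def cross_entropy_error(w, X, y):
    """E_in(w) = (1/N) sum_n ln(1 + exp(-y_n w^T x_n)).

    X is N x (d+1) with x0 = 1 in each row; y holds +/-1 labels.
    """
    signals = y * (X @ w)                        # y_n * w^T x_n for each n
    return np.mean(np.logaddexp(0.0, -signals))  # stable ln(1 + e^{-s})
```

At w = 0 every term is ln 2, and the error decays toward 0 as the signals yn w^T xn grow positive.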

Logistic regression - Outline

• The model
• Error measure
• Learning algorithm


How to minimize Ein

For logistic regression,

Ein(w) = (1/N) Σ_{n=1}^{N} ln( 1 + e^{−yn w^T xn} )   ←  iterative solution

Compare to linear regression:

Ein(w) = (1/N) Σ_{n=1}^{N} (w^T xn − yn)²   ←  closed-form solution

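For contrast, the linear-regression side of the comparison, the one-step pseudo-inverse solution, can be sketched as (NumPy; the names are mine):

```python
import numpy as np

def linear_regression(X, y):
    """Closed-form least squares: w = X^dagger y = (X^T X)^{-1} X^T y."""
    return np.linalg.pinv(X) @ y
```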

[Figure: in-sample error Ein(w) as a surface over the weights w, with gradient descent rolling downhill]

General method for nonlinear optimization:
start at w(0); take a step along the steepest slope.

Fixed step size: w(1) = w(0) + η v̂

What is the direction v̂?


Formula for the direction v̂

ΔEin = Ein(w(0) + η v̂) − Ein(w(0))
     = η ∇Ein(w(0))^T v̂ + O(η²)
     ≥ −η ‖∇Ein(w(0))‖

Since v̂ is a unit vector, the bound is attained by

v̂ = − ∇Ein(w(0)) / ‖∇Ein(w(0))‖

Fixed-size step?

How η affects the algorithm:

[Figures: Ein versus the weights for three choices of step size - η too small, η too large, variable η: just right]

η should increase with the slope

Instead of

Δw = η v̂ = −η ∇Ein(w(0)) / ‖∇Ein(w(0))‖

have

Δw = −η ∇Ein(w(0))

Fixed learning rate η: the effective step size ‖Δw‖ = η ‖∇Ein(w(0))‖ now increases with the slope, as desired.

Logistic regression algorithm

1: Initialize the weights at t = 0 to w(0)
2: for t = 0, 1, 2, … do
3:     Compute the gradient
           ∇Ein = −(1/N) Σ_{n=1}^{N} yn xn / (1 + e^{yn w^T(t) xn})
4:     Update the weights: w(t + 1) = w(t) − η ∇Ein
5:     Iterate to the next step until it is time to stop
6: Return the final weights w

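Putting the pieces together, a minimal sketch of the algorithm above (NumPy; the fixed iteration budget stands in for step 5's unspecified stopping criterion, and η = 0.1 is an illustrative choice):

```python
import numpy as np

def logistic_regression(X, y, eta=0.1, iters=1000):
    """Batch gradient descent on the cross-entropy error.

    X is N x (d+1) with x0 = 1 in each row; y holds +/-1 labels.
    """
    N, d1 = X.shape
    w = np.zeros(d1)                                 # step 1: w(0) = 0
    for _ in range(iters):                           # steps 2 and 5
        s = y * (X @ w)                              # y_n * w^T(t) x_n
        grad = -(X.T @ (y / (1.0 + np.exp(s)))) / N  # step 3: gradient of E_in
        w = w - eta * grad                           # step 4: w(t+1)
    return w                                         # step 6

def predict_proba(w, X):
    """theta(w^T x): estimated probability that y = +1."""
    return 1.0 / (1.0 + np.exp(-(X @ w)))
```

On a small separable toy set, the learned probabilities fall on the correct side of 1/2 for every training point.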

Summary of Linear Models

Credit analysis examples:

Perceptron           - Approve or deny         - Classification error  - PLA, Pocket, …
Linear regression    - Amount of credit        - Squared error         - Pseudo-inverse
Logistic regression  - Probability of default  - Cross-entropy error   - Gradient descent
