K neural x) = wk exp γ kx µkwork.caltech.edu/slides/slides17.pdf5-1-0.5 0 0.5 1 1.5 2 2.5 3 3.5 4...

23
h(x)= K k =1 w k exp γ x μ k 2 μ k w k unsupervised nearest neighbors neural networks SVM regularization learning RBF Kernel =

Transcript of K neural x) = wk exp γ kx µkwork.caltech.edu/slides/slides17.pdf5-1-0.5 0 0.5 1 1.5 2 2.5 3 3.5 4...

Page 1: K neural x) = wk exp γ kx µkwork.caltech.edu/slides/slides17.pdf5-1-0.5 0 0.5 1 1.5 2 2.5 3 3.5 4 T ... Puzzle 5: Bias via sno oping esting T long-term p rmance erfo of buy and hold

Review of Le ture 16• Radial Basis Fun tions

h(x) =K

k=1

wk exp(

−γ ‖x− µk‖2)

Choose µk's: Lloyd's algorithmChoose wk's: Pseudo-inverse unsupervised

nearest

neighbors

neural

networks

SVM

regularization

learning

RBFKernel =

Page 2: K neural x) = wk exp γ kx µkwork.caltech.edu/slides/slides17.pdf5-1-0.5 0 0.5 1 1.5 2 2.5 3 3.5 4 T ... Puzzle 5: Bias via sno oping esting T long-term p rmance erfo of buy and hold

Learning From DataYaser S. Abu-MostafaCalifornia Institute of Te hnologyLe ture 17: Three Learning Prin iples

Sponsored by Calte h's Provost O� e, E&AS Division, and IST • Tuesday, May 29, 2012

Page 3: K neural x) = wk exp γ kx µkwork.caltech.edu/slides/slides17.pdf5-1-0.5 0 0.5 1 1.5 2 2.5 3 3.5 4 T ... Puzzle 5: Bias via sno oping esting T long-term p rmance erfo of buy and hold

Outline• O am's Razor• Sampling Bias• Data Snooping

© AML Creator: Yaser Abu-Mostafa - LFD Le ture 17 2/22

Page 4: K neural x) = wk exp γ kx µkwork.caltech.edu/slides/slides17.pdf5-1-0.5 0 0.5 1 1.5 2 2.5 3 3.5 4 T ... Puzzle 5: Bias via sno oping esting T long-term p rmance erfo of buy and hold

Re urring theme - simple hypothesesA �quote� by Einstein:An explanation of the data should be made as simple as possible, but no simplerThe razor: symboli of a prin iple set by William of O am

Learning From Data - Le ture 17 3/22

Page 5: K neural x) = wk exp γ kx µkwork.caltech.edu/slides/slides17.pdf5-1-0.5 0 0.5 1 1.5 2 2.5 3 3.5 4 T ... Puzzle 5: Bias via sno oping esting T long-term p rmance erfo of buy and hold

O am's RazorThe simplest model that �ts the data is also the most plausible.

Two questions:1. What does it mean for a model to be simple?2. How do we know that simpler is better?

© AML Creator: Yaser Abu-Mostafa - LFD Le ture 17 4/22

Page 6: K neural x) = wk exp γ kx µkwork.caltech.edu/slides/slides17.pdf5-1-0.5 0 0.5 1 1.5 2 2.5 3 3.5 4 T ... Puzzle 5: Bias via sno oping esting T long-term p rmance erfo of buy and hold

First question: `simple' means?Measures of omplexity - two types: omplexity of h and omplexity of H

Complexity of h: MDL, order of a polynomialComplexity of H: Entropy, VC dimension• When we think of simple, it's in terms of h

• Proofs use simple in terms of H © AM

L Creator: Yaser Abu-Mostafa - LFD Le ture 17 5/22

Page 7: K neural x) = wk exp γ kx µkwork.caltech.edu/slides/slides17.pdf5-1-0.5 0 0.5 1 1.5 2 2.5 3 3.5 4 T ... Puzzle 5: Bias via sno oping esting T long-term p rmance erfo of buy and hold

and the link is . . .

ounting: ℓ bits spe ify h =⇒ h is one of 2ℓ elements of a set HReal-valued parameters? Example: 17th order polynomial - omplex and one of �many�Ex eptions? Looks omplex but is one of few - SVM

Hi

Hi

© AML Creator: Yaser Abu-Mostafa - LFD Le ture 17 6/22

Page 8: K neural x) = wk exp γ kx µkwork.caltech.edu/slides/slides17.pdf5-1-0.5 0 0.5 1 1.5 2 2.5 3 3.5 4 T ... Puzzle 5: Bias via sno oping esting T long-term p rmance erfo of buy and hold

Puzzle 1: Football ora le00000000000000001111111111111111 000000000111111110000000011111111 100001111000011110000111100001111 000110011001100110011001100110011 101010101010101010101010101010101 1

• Letter predi ting game out ome• Good all!• More letters - for 5 weeks• Perfe t re ord!• Want more? $50 harge• Should you pay?

© AML Creator: Yaser Abu-Mostafa - LFD Le ture 17 7/22

Page 9: K neural x) = wk exp γ kx µkwork.caltech.edu/slides/slides17.pdf5-1-0.5 0 0.5 1 1.5 2 2.5 3 3.5 4 T ... Puzzle 5: Bias via sno oping esting T long-term p rmance erfo of buy and hold

Se ond question: Why is simpler better?Better doesn't mean more elegant! It means better out-of-sample performan eThe basi argument: (formal proof under di�erent idealized onditions)

Fewer simple hypotheses than omplex onesmH(N)

⇒ less likely to �t a given data setmH(N)/2N

⇒ more signi� ant when it happensThe postal s am: mH(N) = 1 versus 2N

© AML Creator: Yaser Abu-Mostafa - LFD Le ture 17 8/22

Page 10: K neural x) = wk exp γ kx µkwork.caltech.edu/slides/slides17.pdf5-1-0.5 0 0.5 1 1.5 2 2.5 3 3.5 4 T ... Puzzle 5: Bias via sno oping esting T long-term p rmance erfo of buy and hold

A �t that means nothingPSfrag repla ements

temperature T

resistivityρ

-1012345-1-0.500.511.522.533.54

PSfrag repla ements

temperature T

resistivityρ

-1012345-1-0.500.511.522.533.54

PSfrag repla ements

temperature T

resistivityρ

-1012345-1-0.500.511.522.533.54S ientist A S ientist B �falsi�able�Condu tivity linear in temperature?Two s ientists ondu t experimentsWhat eviden e do A and B provide? © AM

L Creator: Yaser Abu-Mostafa - LFD Le ture 17 9/22

Page 11: K neural x) = wk exp γ kx µkwork.caltech.edu/slides/slides17.pdf5-1-0.5 0 0.5 1 1.5 2 2.5 3 3.5 4 T ... Puzzle 5: Bias via sno oping esting T long-term p rmance erfo of buy and hold

Outline• O am's Razor• Sampling Bias• Data Snooping

© AML Creator: Yaser Abu-Mostafa - LFD Le ture 17 10/22

Page 12: K neural x) = wk exp γ kx µkwork.caltech.edu/slides/slides17.pdf5-1-0.5 0 0.5 1 1.5 2 2.5 3 3.5 4 T ... Puzzle 5: Bias via sno oping esting T long-term p rmance erfo of buy and hold

Puzzle 2: Presidential ele tionIn 1948, Truman ran against Dewey in lose ele tionsA newspaper ran a phone poll of how people votedDewey won the poll de isively - newspaper de lared:

Hi

Hi

Learning From Data - Le ture 17 11/22

Page 13: K neural x) = wk exp γ kx µkwork.caltech.edu/slides/slides17.pdf5-1-0.5 0 0.5 1 1.5 2 2.5 3 3.5 4 T ... Puzzle 5: Bias via sno oping esting T long-term p rmance erfo of buy and hold

On to the vi tory rally . . .

. . . of TrumanIt's not δ's fault:

P [ |Ein − Eout| > ǫ ] ≤ δ

Learning From Data - Le ture 17 12/22

Page 14: K neural x) = wk exp γ kx µkwork.caltech.edu/slides/slides17.pdf5-1-0.5 0 0.5 1 1.5 2 2.5 3 3.5 4 T ... Puzzle 5: Bias via sno oping esting T long-term p rmance erfo of buy and hold

The biasIn 1948, phones were expensive.

If the data is sampled in a biased way, learningwill produ e a similarly biased out ome.Example: normal period in the marketTesting: live trading in real market

© AML Creator: Yaser Abu-Mostafa - LFD Le ture 17 13/22

Page 15: K neural x) = wk exp γ kx µkwork.caltech.edu/slides/slides17.pdf5-1-0.5 0 0.5 1 1.5 2 2.5 3 3.5 4 T ... Puzzle 5: Bias via sno oping esting T long-term p rmance erfo of buy and hold

Mat hing the distributionsHi

Hi

P(x)

training

testing

x

Methods to mat h training and testing distributionsDoesn't work if:Region has P = 0 in training, but P > 0 in testing

© AML Creator: Yaser Abu-Mostafa - LFD Le ture 17 14/22

Page 16: K neural x) = wk exp γ kx µkwork.caltech.edu/slides/slides17.pdf5-1-0.5 0 0.5 1 1.5 2 2.5 3 3.5 4 T ... Puzzle 5: Bias via sno oping esting T long-term p rmance erfo of buy and hold

Puzzle 3: Credit approvalage 23 yearsgender maleannual salary $30,000years in residen e 1 yearyears in job 1 year urrent debt $15,000· · · · · ·

Histori al re ords of ustomersInput: information on redit appli ation:Target: pro�table for the bank

© AML Creator: Yaser Abu-Mostafa - LFD Le ture 17 15/22

Page 17: K neural x) = wk exp γ kx µkwork.caltech.edu/slides/slides17.pdf5-1-0.5 0 0.5 1 1.5 2 2.5 3 3.5 4 T ... Puzzle 5: Bias via sno oping esting T long-term p rmance erfo of buy and hold

Outline• O am's Razor• Sampling Bias• Data Snooping

© AML Creator: Yaser Abu-Mostafa - LFD Le ture 17 16/22

Page 18: K neural x) = wk exp γ kx µkwork.caltech.edu/slides/slides17.pdf5-1-0.5 0 0.5 1 1.5 2 2.5 3 3.5 4 T ... Puzzle 5: Bias via sno oping esting T long-term p rmance erfo of buy and hold

The prin ipleIf a data set has a�e ted any step in the learning pro ess,its ability to assess the out ome has been ompromised.

Most ommon trap for pra titioners - many ways to slip Hi

HiLearning From Data - Le ture 17 17/22

Page 19: K neural x) = wk exp γ kx µkwork.caltech.edu/slides/slides17.pdf5-1-0.5 0 0.5 1 1.5 2 2.5 3 3.5 4 T ... Puzzle 5: Bias via sno oping esting T long-term p rmance erfo of buy and hold

Looking at the data

PSfrag repla ements-1-0.500.51-1.5-1-0.500.511.5

Remember nonlinear transforms?z = (1, x1, x2, x1x2, x

21, x

22)or z = (1, x2

1, x22) or z = (1, x2

1 + x22)Snooping involves D, not other information

© AML Creator: Yaser Abu-Mostafa - LFD Le ture 17 18/22

Page 20: K neural x) = wk exp γ kx µkwork.caltech.edu/slides/slides17.pdf5-1-0.5 0 0.5 1 1.5 2 2.5 3 3.5 4 T ... Puzzle 5: Bias via sno oping esting T long-term p rmance erfo of buy and hold

Puzzle 4: Finan ial fore asting

PSfrag repla ements

DayCumulativePro�t%

no snooping

snooping

0 100 200 300 400 500-100102030Predi t US Dollar versus British Pound

Normalize data, split randomly: Dtrain, Dtest

Train on Dtrain only, test g on Dtest

∆r−20,∆r−19, · · · ,∆r−1 →∆r0 © AML Creator: Yaser Abu-Mostafa - LFD Le ture 17 19/22

Page 21: K neural x) = wk exp γ kx µkwork.caltech.edu/slides/slides17.pdf5-1-0.5 0 0.5 1 1.5 2 2.5 3 3.5 4 T ... Puzzle 5: Bias via sno oping esting T long-term p rmance erfo of buy and hold

Reuse of a data setTrying one model after the other on the same data set, you will eventually `su eed'

If you torture the data long enough, it will onfessVC dimension of the total learning modelMay in lude what others tried!Key problem: mat hing a parti ular data set

© AML Creator: Yaser Abu-Mostafa - LFD Le ture 17 20/22

Page 22: K neural x) = wk exp γ kx µkwork.caltech.edu/slides/slides17.pdf5-1-0.5 0 0.5 1 1.5 2 2.5 3 3.5 4 T ... Puzzle 5: Bias via sno oping esting T long-term p rmance erfo of buy and hold

Two remedies1. Avoid data snoopingstri t dis ipline2. A ount for data snoopinghow mu h data ontamination

© AML Creator: Yaser Abu-Mostafa - LFD Le ture 17 21/22

Page 23: K neural x) = wk exp γ kx µkwork.caltech.edu/slides/slides17.pdf5-1-0.5 0 0.5 1 1.5 2 2.5 3 3.5 4 T ... Puzzle 5: Bias via sno oping esting T long-term p rmance erfo of buy and hold

Puzzle 5: Bias via snoopingTesting long-term performan e of �buy and hold� in sto ks. Use 50 years worth of data

• All urrently traded ompanies in S&P500• Assume you stri tly followed buy and hold• Would have made great pro�t!Sampling bias aused by `snooping'

© AML Creator: Yaser Abu-Mostafa - LFD Le ture 17 22/22