Transcript of Learning From Data, Lecture 26: Kernel Machines
(slides: magdon/courses/LFD-Slides/SlidesLect26.pdf)

Page 1:

Learning From Data

Lecture 26

Kernel Machines

Popular Kernels

The Kernel Measures Similarity

Kernels in Different Applications

M. Magdon-Ismail, CSCI 4100/6100

Page 2:

recap: The Kernel Allows Us to Bypass Z-space

x_n \in \mathcal{X}
\downarrow K(\cdot,\cdot)

g(x) = \mathrm{sign}\left( \sum_{\alpha_n^* > 0} \alpha_n^* y_n K(x_n, x) + b^* \right)

b^* = y_s - \sum_{\alpha_n^* > 0} \alpha_n^* y_n K(x_n, x_s)

(One can compute b^* for several support vectors and average.)

Solve the QP:

minimize_\alpha   \tfrac{1}{2}\alpha^t G \alpha - 1^t \alpha
subject to:       y^t \alpha = 0, \quad C \ge \alpha \ge 0

\longrightarrow \alpha^*

index s: C > \alpha_s^* > 0   (a free support vector)

Overfitting (SVM)                                   Computation (pseudo-inverse → inner products with kernel K(·,·))
high d̃ → complicated separator                     high d̃ → expensive or infeasible computation
few support vectors → low effective complexity      kernel → computationally feasible to go to high d̃
Can go to high (infinite) d̃                        Can go to high (infinite) d̃
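To make the recap concrete, here is a minimal sketch in Python/NumPy (ours, not from the slides) of the decision function g(x) and the averaged bias b*; the names alpha, K, and tol are our own:

```python
import numpy as np

def svm_predict(x, X, y, alpha, b, K):
    # g(x) = sign( sum over alpha_n > 0 of alpha_n * y_n * K(x_n, x) + b )
    sv = np.where(alpha > 0)[0]                     # support vectors
    s = sum(alpha[n] * y[n] * K(X[n], x) for n in sv)
    return np.sign(s + b)

def svm_bias(X, y, alpha, C, K, tol=1e-8):
    # b* = y_s - sum_n alpha_n y_n K(x_n, x_s), averaged over the
    # free support vectors (those with 0 < alpha_s < C).
    sv = np.where(alpha > tol)[0]
    free = np.where((alpha > tol) & (alpha < C - tol))[0]
    return np.mean([y[s] - sum(alpha[n] * y[n] * K(X[n], X[s]) for n in sv)
                    for s in free])
```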


Page 3:

Polynomial Kernel

2nd-Order Polynomial Kernel

\Phi(x) = \left( x_1,\; x_2,\; \dots,\; x_d,\;\; x_1^2,\; x_2^2,\; \dots,\; x_d^2,\;\; \sqrt{2}\,x_1 x_2,\; \sqrt{2}\,x_1 x_3,\; \dots,\; \sqrt{2}\,x_1 x_d,\; \sqrt{2}\,x_2 x_3,\; \dots,\; \sqrt{2}\,x_{d-1} x_d \right)

K(x, x') = \Phi(x)^t \Phi(x')
         = \sum_{i=1}^{d} x_i x_i' + \sum_{i=1}^{d} x_i^2 x_i'^2 + 2 \sum_{i<j} x_i x_j x_i' x_j'   ← O(d²) terms
         = \left( \tfrac{1}{2} + x^t x' \right)^2 - \tfrac{1}{4}   ← computed quickly in X-space, in O(d)

Q-th order polynomial kernel:

K(x, x') = (r + x^t x')^Q   ← inhomogeneous kernel
K(x, x') = (x^t x')^Q       ← homogeneous kernel
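As a sanity check (ours, not on the slide), the O(d) shortcut can be verified numerically against the explicit transform:

```python
import numpy as np

def phi(x):
    # explicit 2nd-order transform: linear, squared, and sqrt(2)-scaled cross terms
    d = len(x)
    cross = [np.sqrt(2) * x[i] * x[j] for i in range(d) for j in range(i + 1, d)]
    return np.concatenate([x, x**2, cross])

rng = np.random.default_rng(0)
x, xp = rng.standard_normal(5), rng.standard_normal(5)
lhs = phi(x) @ phi(xp)                    # O(d^2) inner product in Z-space
rhs = (0.5 + x @ xp) ** 2 - 0.25          # O(d) kernel shortcut in X-space
assert np.isclose(lhs, rhs)
```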


Page 4:

RBF-Kernel

One dimensional RBF-Kernel

\Phi(x) = e^{-x^2} \left( 1,\; \sqrt{\tfrac{2^1}{1!}}\,x,\; \sqrt{\tfrac{2^2}{2!}}\,x^2,\; \sqrt{\tfrac{2^3}{3!}}\,x^3,\; \sqrt{\tfrac{2^4}{4!}}\,x^4,\; \dots \right)

K(x, x') = \Phi(x)^t \Phi(x')
         = e^{-x^2} e^{-x'^2} \sum_{i=0}^{\infty} \frac{(2 x x')^i}{i!}   ← not feasible to compute term by term
         = e^{-x^2} e^{-x'^2} e^{2 x x'}
         = e^{-(x - x')^2}   ← computed quickly in X-space, in O(d)

d-dimensional RBF-Kernel

K(x, x') = e^{-\gamma \|x - x'\|^2} \quad (\gamma > 0)
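The QP matrix G, with entries G_{nm} = y_n y_m K(x_n, x_m), only needs pairwise kernel evaluations, so the RBF kernel matrix can be formed directly in X-space; a minimal NumPy sketch (function and variable names are ours):

```python
import numpy as np

def rbf_kernel_matrix(X, gamma):
    # K[n, m] = exp(-gamma * ||x_n - x_m||^2), computed without explicit loops
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T    # pairwise squared distances
    return np.exp(-gamma * np.maximum(d2, 0))       # clip round-off negatives

X = np.random.default_rng(1).standard_normal((100, 3))
K = rbf_kernel_matrix(X, gamma=0.5)
assert np.allclose(np.diag(K), 1.0)                 # K(x, x) = e^0 = 1
```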

[Figures: Hard Margin (γ = 2000, C = ∞); Soft Margin (γ = 2000, C = 0.25); Soft Margin (γ = 100, C = 0.25)]


Page 5:

Choosing RBF-Kernel Width γ

K(x, x') = e^{-\gamma \|x - x'\|^2}

[Figure panels: Small γ; Medium γ; Large γ]
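To reproduce the qualitative effect of γ yourself, a sketch using scikit-learn (a library choice of ours, not the lecture's) on a synthetic dataset:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
y = np.where(X[:, 0]**2 + X[:, 1]**2 > 1.0, 1, -1)   # circular target function

for gamma in (0.1, 10.0, 1000.0):                    # small, medium, large width
    clf = SVC(kernel="rbf", gamma=gamma, C=0.25).fit(X, y)
    print(f"gamma={gamma}: {clf.n_support_.sum()} SVs, "
          f"in-sample accuracy {clf.score(X, y):.2f}")
```

Larger γ makes each support vector's bump narrower, so the boundary can contort around individual points; smaller γ gives a smoother, more nearly linear separator.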


Page 6:

RBF-Kernel Simulates k-RBF-Network

RBF-Kernel:
  g(x) = \mathrm{sign}\left( \sum_{\alpha_n^* > 0} \alpha_n^* y_n\, e^{-\|x - x_n\|^2} + b^* \right)
  Centers are at the support vectors.
  Number of centers is determined automatically.

k-RBF-Network:
  g(x) = \mathrm{sign}\left( \sum_{j=1}^{k} w_j\, e^{-\|x - \mu_j\|^2} + w_0 \right)
  Centers are chosen to represent the data.
  Number of centers k is an input.


Page 7:

Neural Network Kernel

K(x, x') = \tanh(\kappa \, x^t x' + c)

Neural Network Kernel:
  g(x) = \mathrm{sign}\left( \sum_{\alpha_n^* > 0} \alpha_n^* y_n \tanh(\kappa \, x_n^t x + c) + b^* \right)
  First-layer weights are the support vectors.
  Number of hidden nodes is determined automatically.

2-Layer Neural Network:
  g(x) = \mathrm{sign}\left( \sum_{j=1}^{m} w_j \tanh(v_j^t x) + w_0 \right)
  First-layer weights are arbitrary.
  Number of hidden nodes m is an input.
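As with the other kernels, this one is a single expression in code; a sketch (reusing the hypothetical svm_predict helper from the recap):

```python
import numpy as np

def tanh_kernel(kappa, c):
    # K(x, x') = tanh(kappa * x^t x' + c); plugs into svm_predict above.
    # Caveat: this kernel is not positive semidefinite for all kappa and c.
    return lambda x, xp: np.tanh(kappa * np.dot(x, xp) + c)
```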


Page 8:

The Inner Product Measures Similarity

K(x, x') = z^t z' = \|z\| \cdot \|z'\| \cdot \cos(\theta_{z,z'}) = \|z\| \cdot \|z'\| \cdot \mathrm{CosSim}(z, z')

After normalizing for size, the kernel measures the similarity between the input vectors.
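In code, any kernel can be normalized into this similarity score; a one-line sketch (names are ours):

```python
import numpy as np

def cos_sim(x, xp, K):
    # CosSim(z, z') = K(x, x') / sqrt( K(x, x) * K(x', x') ), a value in [-1, 1]
    return K(x, xp) / np.sqrt(K(x, x) * K(xp, xp))
```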


Page 9:

Designing Kernels

• Construct a similarity measure for the data

• A linear model should be plausible in the feature space that the kernel implicitly defines


Page 10:

String Kernels

Applications: DNA sequences, Text

ACGGTGTCAAACGTGTCAGTGTG

GTCGGGTCAAAACGTGAT

Dear Sir,

With reference to your letter dated 26th March, I want to confirm the Order No. 34-09-10 placed on 3rd March, 2010. I would appreciate it if you could send me the account details where the payment has to be made. As per the invoice, we are entitled to a cash discount of 2%. Can you please let us know whether it suits you if we make a wire transfer instead of a cheque?

Dear Jane,

I am terribly sorry to hear the news of your hip fracture. I can only imagine what a terrible time you must be going through. I hope you and the family are coping well. If there is any help you need, don't hesitate to let me know.

Similar?

Yes, if classifying spam versus non-spam

No, if classifying business versus personal

To design the kernel → measure similarity between strings:

• Bag of words (number of occurrences of each atom)
• Co-occurrence of substrings or subsequences
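A minimal bag-of-words kernel sketch (the whitespace tokenization is a simplifying assumption of ours):

```python
from collections import Counter

def bow_kernel(s, t):
    # inner product of word-count vectors (one coordinate per word)
    cs, ct = Counter(s.lower().split()), Counter(t.lower().split())
    return sum(cs[w] * ct[w] for w in cs.keys() & ct.keys())

print(bow_kernel("dear sir with reference to your letter",
                 "dear jane i am terribly sorry"))   # 1: only "dear" is shared
```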


Page 11:

Graph Kernels

Performing classification on:
• Graph structures (e.g., protein networks for function prediction)
• Graph nodes within a network (e.g., whether or not to advertise to Facebook users)

Similarity between graphs: random walks, degree sequences, connectivity properties, mixing properties.

Measuring similarity between nodes: looking at neighborhoods,

K(v, v') = \frac{|N(v) \cap N(v')|}{|N(v) \cup N(v')|}
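This node kernel is the Jaccard similarity of the two neighborhoods; a sketch assuming an adjacency-list representation:

```python
def node_kernel(adj, v, vp):
    # K(v, v') = |N(v) ∩ N(v')| / |N(v) ∪ N(v')|  (Jaccard similarity)
    Nv, Nvp = set(adj[v]), set(adj[vp])
    return len(Nv & Nvp) / len(Nv | Nvp) if (Nv | Nvp) else 0.0

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(node_kernel(adj, 0, 1))   # N(0)={1,2}, N(1)={0,2}: overlap {2} -> 1/3
```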


Page 12:

Image Kernels

Similar?

Yes, if trying to recognize pictures with faces.

No, if trying to distinguish Malik from Christos.

© AML Creator: Malik Magdon-Ismail