Learning From Data Lecture 26 Kernel Machinesmagdon/courses/LFD-Slides/SlidesLect26.pdftx +c) +b∗...
Transcript of Learning From Data Lecture 26 Kernel Machinesmagdon/courses/LFD-Slides/SlidesLect26.pdftx +c) +b∗...
Learning From Data
Lecture 26
Kernel Machines
Popular Kernels
The Kernel Measures Similarity
Kernels in Different Applications
M. Magdon-IsmailCSCI 4100/6100
recap: The Kernel Allows Us to Bypass Z-space
xn ∈ X
↓ K(·, ·)
g(x) = sign
∑
α∗
n>0
α∗
nynK(xn,x) + b∗
b∗ = ys −∑
α∗
n>0
α∗
nynK(xn,xs)
(One can compute b∗ for several SVs and average)
Solve the QP
minimizeα
12α
tGα− 1tα
subject to: ytα = 0
C ≥ α ≥ 0
−→α
∗
index s : C > α∗
s > 0↑
free support vectors
Overfitting Computation
SVM Pseudo-inverse
Inner products with Kernel
K(·, ·)
high d̃ → complicated separator
few support vectors → low effective complexity
high d̃ → expensive or infeasible computation
kernel → computationally feasible to go to high d̃
Can go to high (infinite) d̃ Can go to high (infinite) d̃
c© AML Creator: Malik Magdon-Ismail Kernel Machines: 2 /12 Polynomial Kernel −→
Polynomial Kernel
2nd-Order Polynomial Kernel
Φ(x) =
x1
x2
...xd
x21
x22
...x2
d√2x1x2√2x1x3
...√2x1xd√2x2x3
...√2xd−1xd
K(x,x′) = Φ(x)tΦ(x′)
=d∑
i=1
xix′
i +d∑
i=1
x2ix
′
i
2+ 2
∑
i<j
xixjx′
ix′
j ← O(d2)
=
(
1
2+ xtx′
)2
− 1
4
↑
computed quicklyin X -space, in O(d)
Q-th order polynomial kernel
K(x,x′) = (r + xtx′)Q ← inhomogeneous kernel
K(x,x′) = (xtx′)Q ← homogeneous kernel
c© AML Creator: Malik Magdon-Ismail Kernel Machines: 3 /12 RBF-Kernel −→
RBF-Kernel
One dimensional RBF-Kernel
Φ(x) = e−x2
1
√
21
1!x
√
22
2!x2
√
23
3!x3
√
24
4!x4
...
K(x,x′) = Φ(x)tΦ(x′)
= e−x2
e−x′2
∞∑
i=0
(2xx′)i
i!← not feasible
= e−x2
e−x′2
e2xx′
= e−(x−x′)2
↑
computed quicklyin X -space, in O(d)
d-dimensional RBF-Kernel
K(x,x′) = e−γ||x−x′ ||2
(γ > 0)
Hard Margin (γ = 2000, C =∞)
Soft Margin (γ = 2000, C = 0.25)
Soft Margin (γ = 100, C = 0.25)
c© AML Creator: Malik Magdon-Ismail Kernel Machines: 4 /12 RBF-Kernel Width −→
Choosing RBF-Kernel Width γ
e−γ||x−x′ ||2
Small γ Medium γ Large γ!
c© AML Creator: Malik Magdon-Ismail Kernel Machines: 5 /12 RBF-Kernel simulates k-RBF-Network −→
RBF-Kernel Simulates k-RBF-Network
RBF-Kernel k-RBF-Network
g(x) = sign
∑
α∗n>0
α∗nyne−||x−xn ||2 + b∗
g(x) = sign
k∑
j=1
wje−||x−µj ||2 + w0
Centers are at support vectors
Number of centers auto-determined
Centers chosen to represent the data
Number of centers k is an input
c© AML Creator: Malik Magdon-Ismail Kernel Machines: 6 /12 Neural Network Kernel −→
Neural Network Kernel
K(x,x′) = tanh(κ · xtx′ + c)
Neural Network Kernel 2 Layer Neural Network
g(x) = sign
∑
α∗
n>0
α∗nyn tanh(κ · xntx+ c) + b∗
g(x) = sign
(
m∑
j=1
wj tanh(vjtx) + w0
)
First layer weights are support vectors
Number of hidden nodes auto-determined
First layer weights arbitrary
Number of hidden nodes m is an input
c© AML Creator: Malik Magdon-Ismail Kernel Machines: 7 /12 Inner product measures similarity −→
The Inner Product Measures Similarity
K(x,x′) = ztz′ = || z || · || z′ || · cos(θz,z′)
= || z || · || z′ || · CosSim(z, z′)
Normalizing for size, Kernel measures similarity between input vectors
c© AML Creator: Malik Magdon-Ismail Kernel Machines: 8 /12 Designing Kernels −→
Designing Kernels
• Construct a similarity measure for the data
• A linear model should be plausible in that transformed space
c© AML Creator: Malik Magdon-Ismail Kernel Machines: 9 /12 String Kernels −→
String Kernels
Applications: DNA sequences, Text
ACGGTGTCAAACGTGTCAGTGTG
GTCGGGTCAAAACGTGAT
Dear Sir,
With reference to your letter dated 26th
March, I want to confirm the Order No.
34-09-10 placed on 3rd March, 2010. I
would appreciate if you could send me
the account details where the payment has
to be made. As per the invoice, we are
entitled to a cash discount of 2%. Can
you please let us know whether it suits
you if we make a wire transfer instead of
a cheque?
Dear Jane,
I am terribly sorry to hear the news of
your hip fracture. I can only imagine
what a terrible time you must be going
through. I hope you and the family are
coping well. If there is any help you
need, don’t hesitate to let me know.
Similar?
Yes, if classifying spam versus non-spam
No, if classifying business versus personal
To design the kernel −→ measure similarity between strings
Bag of words (number of occurences of each atom)
Co-occurrence of substrings or subsequences
c© AML Creator: Malik Magdon-Ismail Kernel Machines: 10 /12 Graph Kernels −→
Graph Kernels
Performing classification on:Graph structures (eg. protein networks for function prediction)
Graph nodes within a network (eg. advertise of not to Facebook users)
Similarity between graphs:random walks degree sequences, connectivity properties, mixing properties.
Measuring similarity between nodes:Looking at neighborhoods, K(v, v′) =
|N(v) ∩N(v′)||N(v) ∪N(v′)|.
c© AML Creator: Malik Magdon-Ismail Kernel Machines: 11 /12 Image Kernels −→
Image Kernels
Similar?
Yes - if trying to regcognize pictures with faces.
No - if trying to distinguish Malik from Christos
c© AML Creator: Malik Magdon-Ismail Kernel Machines: 12 /12