Kernels for kernel-based machine...

Kernels for kernel-based machine learning

Matthias Rupp

Berlin Institute of Technology, Germany

Institute of Pure and Applied MathematicsNavigating Chemical Compound Space for Materials and Bio Design

Los Angeles, California, USA, March 18, 2011

Kernels: Definition

A kernel is a function that corresponds to an inner product

k : X × X → R, k(x , z) = <φ(x), φ(z)> with φ : X → H

Example:

-2 Π -Π 0 Π 2 Π

xφ7→

-2Π -Π Π 2Πx

1sin x

Input space Feature space

Matthias Rupp: Kernels for kernel-based learning 2

Kernels: Geometric aspects

Kernels describe the geometry of the data

I Correspond to inner products: k(x , z) = <φ(x), φ(z)>

I Encode information about length and angle in feature space:k(x , z) = ||φ(x)||2 ||φ(z)||2 cos θ

I Correspond to Euclidean distance:||φ(x)− φ(z)||22 =

k(x , x)− 2k(x , z) + k(z , z)

ÈÈx-zÈÈ2

ÈÈxÈÈ2

ÈÈzÈÈ2

ÈÈ z ÈÈ2 cos Θ

ÈÈ x ÈÈ2Matthias Rupp: Kernels for kernel-based learning 3

Kernels: Characterization

A function k : X × X → R is a kernel if and only iffor all n ∈ N, x1, . . . , xn ∈ X the matrix K = k(xi , xj) is symmetric and

I positive semi-definite, ∀ v ∈ Rn : vTKv ≥ 0

I has only non-negative eigenvalues

I all principal minors are non-negative

I ∃ L ∈ Rn×n : K = LLT ∧ diag(L) ≥ 0 (Cholesky decomposition)

I f (v) = 12v

TKv + aTv + c is convex

Similar characterization for symmetric, positive definite k.

Kernel trick for distances leads to conditionally positive semi-definitekernels, where

∑ni=1 vi = 0.

Kernels: Closure properties

Kernels are closed under

I Addition: k(x , z) = k1(x , z) + k2(x , z)

I Multiplication with non-negative scalar: k(x , z) = γk1(x , z), γ ≥ 0

I Point-wise product: k(x , z) = k1(x , z) · k2(x , z)

I Tensor product k1 ⊗ k2, direct sum k1 ⊕ k2

Linear combination of kernels is a kernel:

k(x , z) = γ1k1(x , z) + γ2k2(x , z) + . . .+ γmkm(x , z)

Specific kernels: Vector data

Let x, z ∈ Rd .

I Linear kernel: k(x, z) = <x, z>

I Polynomial kernel: k(x, z) =(<x, z> +c

)dI Squared exponential kernel (also radial basis function kernel):

k(x, z) = exp

||x− z||2

)Local approximator; free parameter σ (length scale)

-3 -2 -1 1 2 3Input

2Output

-3 -2 -1 1 2 3Input

2Output

-3 -2 -1 1 2 3Input

2Output

Specific kernels: Structured data

I Convolution kernels

(k1 × · · · × kd)(x , x ′) =∑R

d∏i=1

ki (xi , x′i )

I String kernels

I Kernels between probability distributions

I Kernels between graphs

Specific kernels: String kernels

Spectrum kernel

I Compare substrings of length k (k-mers)

I Position-independent

I Kernel is sum of products of counts

Example: k = 3

x AAA C AAA TAAGTAACTAATCTTTTAGAACTTTCAACCATTT. . .

z TACCTAATTATG AAA TT AAA TTTCAGCTGTGG AAA CGGAGA. . .

3-mer AAA AAC . . . TTT# in x 2 4 . . . 3# in z 3 1 . . . 1

k(x , z) = 2 · 3 + 4 · 1 + . . .+ 3 · 1

Specific kernels: String kernels

Weighted degree kernel

I Compare matches at each position

I Position-dependent

I k(x , z) =d∑

βk|x |−k∑i=1

I(uk,l(x) = uk,l(z)

)Example: d = 3

x AAACAAATAAGTAACTAATCTTTTAG# 1-mers . | . | . | | | . | . . | | . | . | . . | | | . | |# 2-mers . . . . . | | . . . . . | . . . . . . . | | . . | .# 3-mers . . . . . | . . . . . . . . . . . . . . | . . . . .

z TACCTAATTATGAAATTAAATTTCAG

k(x , z) = β1 · 15 + β2 · 6 + β3 · 2

Specific kernels: Graph kernels (introduction)

Idea: Define k directly on graphs

I Application to chemical structure graphs

I Allows direct measurement of graph similarity

I Rigorous way to combine graph theory and kernel learning

I Caveat: Complete graph kernels are computationally hard

Specific kernels: Graph kernels (overview)

I Random walk graph kernels, path-based graph kernels

I Tree pattern graph kernels, cyclic pattern graph kernels

I Graphlet kernels

I Optimal assignment graph kernels

CCCOCCCOCCCCCCCOCOCCCCCCCCSOSCSOCCCCCCCCCCC

CCCOCCOCCCCOCOCNCCCCCCCCCCCCCCCCNCCOCNCCCOC

M.Rupp, G. Schneider, Mol. Inf. 29(4): 266–273, 2010.

I Graphlet kernels

Specific kernels: Graph kernels (example)

ISOAK = iterative similarity optimal assignment kernel

Xv ,v ′ = (1−α)kv (v , v ′)+αmaxπ

|v ′|∑{v ,u}∈E

Xu,π(u)ke({v , u}, {v ′, π(u)}

)α controls recursiveness; π assigns neighbors of v to neighbors of v ′

Rupp et al, J. Chem. Inf.Mol.Model. 47(6): 2280, 2007.

Specific kernels: Graph kernels (example)

ISOAK = iterative similarity optimal assignment kernel

Xv ,v ′ = (1−α)kv (v , v ′)+αmaxπ

|v ′|∑{v ,u}∈E

Xu,π(u)ke({v , u}, {v ′, π(u)}

)α controls recursiveness; π assigns neighbors of v to neighbors of v ′

Rupp et al, J. Chem. Inf.Mol.Model. 47(6): 2280, 2007.

Application example: Virtual screening

Target

Peroxisome proliferator-activated receptor γ (PPARγ)

-0.2 0.2 0.4PC 1

Kernel principle component analysiswith ISOAK graph kernel (n = 176)

Rucker et al., Bioorg.Med. Chem. 14(15): 5178, 2006; Rupp, PhD thesis, 2009.Matthias Rupp: Kernels for kernel-based learning 16

Application example: Virtual screening

Methods:

I Graph kernel, vector kernels

I Multiple kernel learning

I Gaussian process regression

Results:

Stereochemistry

0.5 1.0 5.0 10.0 50.0logHc.L

0.20.40.60.81.01.21.4

Dose-response curve

Rupp et al., ChemMedChem 5(2): 191, 2009

Summary

I Kernels correspond to inner products

I Well-characterized function class

I Kernels can be defined on structured data

Literature

I K.-R. Muller, S. Mika, G. Ratsch, K. Tsuda, B. Scholkopf: AnIntroduction to Kernel-Based Learning Algorithms,IEEE Trans. Neural Network. 12(2): 181–201, 2001.22 pages, review, broader introduction

I T. Hofmann, B. Scholkopf, A. Smola: Kernel Methods inMachine Learning, Ann. Stat. 36(6): 1171–1220, 2008.50 pages, review, mathematically oriented

I O. Ivanciuc: Applications of Support Vector Machines inChemistry, ch. 6, p. 291–400. In K. Lipkowitz, T. Cundari,Reviews in Computational Chemistry, vol. 23, Wiley, 2007.110 pages, review

I M. Rupp, G. Schneider: Graph kernels for molecular similarity,Mol. Inf 29(4): 266–273, 2010.8 pages, overview of graph kernels for small structure graphs

Kernels for kernel-based machine...

Documents

Transcript of Kernels for kernel-based machine...

Operating System Architectures Kernel Design Microkernels.

Support Vector Machines for Structured Classification and The Kernel Trick

Kernel Ridge Regression - Rensselaer Polytechnic Institute

CSC 411 Lecture 18: Kernels - Department of Computer ...jlucas/teaching/csc411/lectures/lec18_handout.pdfOther Kernel methods Kernel Logistic regression I We can think of logistic

Machine Learning Dimensionality Reduction · Machine Learning Dimensionality Reduction Gerard Pons-Moll Pons-Moll (Lecture 20, 09.01.2019) Machine Learning 1 / 40

Auger Filling Machine - · PDF fileSemi-auto Auger Type Powder Metering Filling Machine (Bench Model) Model Machine Size Machine Weight Power Required GSM - 2002 250kg 220V, 3φ, 3.5HP,

The Performance of μ -Kernel-Based Systems

Learning From Data Lecture 25 The Kernel Trickmagdon/courses/LFD-Slides/SlidesLect25.pdfLearning From Data Lecture 25 The Kernel Trick Learning with only inner products The Kernel

Moore Machine

Applications of differentiated CAD kernel ... - Autodiff.org

Turing machine

Mealy moore machine model

Kernel methods on £nite groups

Kernel methods on £nite groups - Columbia Universityrisi/papers/KondorNIPS2001.pdf · Kernel methods on £nite groups Risi Imre Kondor Center for Automated Learning and Discovery

Fastfood - Approximating Kernel Expansions in Loglinear …€¦ · Fastfood - Approximating Kernel Expansions in Loglinear Time Quoc Le, Tamas Sarl´os, Alex Smola. ICML-2013 Zoltan

Reproducing Kernel Banach Spaces for Machine Learningjmlr.org/papers/volume10/zhang09b/zhang09b.pdf · REPRODUCING KERNEL BANACH SPACES FOR MACHINE LEARNING where Bis Banach space,

Chapter 5 Phrase-based models - Statistical Machine Translation

Kernel–Based Meshless Methodsnum.math.uni-goettingen.de/schaback/teaching/Appverf_II.pdfaccompanying slides. ... range F, while their domain Ω is an unstructured set. ... deﬁning

Heat kernel analysis for Bessel operators on symmetric conespure.au.dk/.../files/69558245/heat_kernel_analysis_on_symmetric_co… · Heat kernel analysis for Bessel operators on symmetric

Flesh Machine