Download - Kernels for kernel-based machine learninghelper.ipam.ucla.edu/publications/ccstut/ccstut_9749.pdfKernels for kernel-based machine learning Matthias Rupp Berlin Institute of Technology,


Kernels for kernel-based machine learning

Matthias Rupp

Berlin Institute of Technology, Germany

Institute for Pure and Applied Mathematics

Navigating Chemical Compound Space for Materials and Bio Design

Los Angeles, California, USA, March 18, 2011


Kernels: Definition

A kernel is a function that corresponds to an inner product

$k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$, $k(x, z) = \langle \phi(x), \phi(z) \rangle$ with $\phi : \mathcal{X} \to \mathcal{H}$

Example:

[Figure: the feature map $\phi : x \mapsto \sin x$ takes the input space (x from $-2\pi$ to $2\pi$) to the feature space (sin x from -1 to 1).]
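As a minimal sketch of the definition, the following Python snippet builds a kernel from an explicit feature map. The first map is the φ(x) = sin x of the example; the second, a degree-2 monomial map on R^2, is an illustrative assumption and not taken from the slides. All function names are hypothetical.

```python
import math

def phi_sin(x):
    """Feature map of the example: a real number x is mapped to sin x."""
    return (math.sin(x),)

def phi_quadratic(x):
    """Illustrative feature map on R^2 (an assumption, not from the slides)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def inner(u, v):
    """Standard inner product of two equally long tuples."""
    return sum(a * b for a, b in zip(u, v))

def kernel_from_map(phi):
    """Build k(x, z) = <phi(x), phi(z)> from a feature map phi."""
    return lambda x, z: inner(phi(x), phi(z))

k_sin = kernel_from_map(phi_sin)
k_quad = kernel_from_map(phi_quadratic)

x, z = (1.0, 2.0), (3.0, -1.0)
# For the quadratic map, the inner product in feature space equals <x, z>^2,
# so the kernel could also be evaluated without computing phi explicitly.
print(k_quad(x, z), inner(x, z) ** 2)
print(k_sin(0.5, 2.0), math.sin(0.5) * math.sin(2.0))
```

The second print line illustrates that the sine example corresponds to the kernel k(x, z) = sin x · sin z.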

Matthias Rupp: Kernels for kernel-based learning 2


Kernels: Geometric aspects

Kernels describe the geometry of the data

- Correspond to inner products: $k(x, z) = \langle \phi(x), \phi(z) \rangle$

- Encode information about length and angle in feature space: $k(x, z) = \|\phi(x)\|_2 \, \|\phi(z)\|_2 \cos\theta$

- Correspond to Euclidean distance: $\|\phi(x) - \phi(z)\|_2^2 = k(x, x) - 2k(x, z) + k(z, z)$

[Figure: vectors x and z drawn from the origin 0 with angle $\theta$ between them, showing the lengths $\|x\|_2$ and $\|z\|_2$, the distance $\|x - z\|_2$, and the projection $\|z\|_2 \cos\theta$.]
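The three geometric identities can be sketched in a few lines of Python that use kernel evaluations only. The linear kernel is used here so that the results can be checked by hand; the helper names are hypothetical.

```python
import math

def linear_kernel(x, z):
    """k(x, z) = <x, z>; here the feature map is the identity."""
    return sum(a * b for a, b in zip(x, z))

def feature_norm(k, x):
    """||phi(x)||_2 = sqrt(k(x, x))."""
    return math.sqrt(k(x, x))

def feature_distance(k, x, z):
    """||phi(x) - phi(z)||_2 computed from kernel values only."""
    sq = k(x, x) - 2.0 * k(x, z) + k(z, z)
    return math.sqrt(max(sq, 0.0))  # clip tiny negative rounding errors

def feature_angle(k, x, z):
    """Angle theta between phi(x) and phi(z), in radians."""
    cos_theta = k(x, z) / (feature_norm(k, x) * feature_norm(k, z))
    return math.acos(max(-1.0, min(1.0, cos_theta)))

x, z = (3.0, 0.0), (0.0, 4.0)
print(feature_distance(linear_kernel, x, z))  # 5.0 (a 3-4-5 triangle)
print(feature_angle(linear_kernel, x, z))     # pi / 2, the vectors are orthogonal
```

Any other kernel function with the same call signature can be passed in place of `linear_kernel`; the distances and angles then refer to its feature space.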


Kernels: Characterization

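A function $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is a kernel if and only if for all $n \in \mathbb{N}$ and $x_1, \ldots, x_n \in \mathcal{X}$ the matrix $K = (k(x_i, x_j))_{i,j=1}^{n}$ is symmetric and satisfies one of the following equivalent conditions:

- $K$ is positive semi-definite: $\forall v \in \mathbb{R}^n : v^T K v \geq 0$

- $K$ has only non-negative eigenvalues

- All principal minors of $K$ are non-negative

- $\exists L \in \mathbb{R}^{n \times n} : K = L L^T \wedge \operatorname{diag}(L) \geq 0$ (Cholesky decomposition)

- $f(v) = \tfrac{1}{2} v^T K v + a^T v + c$ is convex

Similar characterizations hold for symmetric, positive definite $k$.

The kernel trick for distances leads to conditionally positive semi-definite kernels, for which $v^T K v \geq 0$ is required only for those $v$ with $\sum_{i=1}^{n} v_i = 0$.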
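A minimal numerical sketch of these checks, assuming NumPy is available. The tolerances, the jitter term, and the two test functions are choices made for this illustration, not part of the slides. The negative distance $-|a - b|$ serves as an example of a function that is not a kernel but is conditionally positive semi-definite.

```python
import numpy as np

def gram_matrix(k, xs):
    """Kernel matrix K with K[i, j] = k(xs[i], xs[j])."""
    return np.array([[k(a, b) for b in xs] for a in xs], dtype=float)

def is_psd(K, tol=1e-10):
    """Symmetric with all eigenvalues >= -tol (tolerance for rounding)."""
    if not np.allclose(K, K.T):
        return False
    return bool(np.linalg.eigvalsh(K).min() >= -tol)

def has_cholesky(K, jitter=1e-10):
    """Try K + jitter * I = L L^T. NumPy's routine needs strict definiteness,
    so a small jitter (an assumption of this sketch) is added."""
    try:
        np.linalg.cholesky(K + jitter * np.eye(len(K)))
        return True
    except np.linalg.LinAlgError:
        return False

def is_conditionally_psd(K, tol=1e-10):
    """v^T K v >= 0 for all v with sum(v) = 0, tested on the projected matrix."""
    n = len(K)
    P = np.eye(n) - np.ones((n, n)) / n  # projects onto vectors summing to zero
    return is_psd(P @ K @ P, tol)

xs = [0.0, 0.5, 1.0, 2.0]
K_good = gram_matrix(lambda a, b: np.exp(-0.5 * (a - b) ** 2), xs)
K_dist = gram_matrix(lambda a, b: -abs(a - b), xs)  # symmetric, but not a kernel

print(is_psd(K_good), has_cholesky(K_good))        # True True
print(is_psd(K_dist), is_conditionally_psd(K_dist))  # False True
```

Checking a single sample can only refute the kernel property, never prove it, because the characterization quantifies over all n and all choices of points.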


Kernels: Closure properties

Kernels are closed under

- Addition: $k(x, z) = k_1(x, z) + k_2(x, z)$

- Multiplication with non-negative scalar: $k(x, z) = \gamma k_1(x, z)$, $\gamma \geq 0$

- Point-wise product: $k(x, z) = k_1(x, z) \cdot k_2(x, z)$

- Tensor product $k_1 \otimes k_2$, direct sum $k_1 \oplus k_2$

Hence any linear combination of kernels with non-negative coefficients $\gamma_1, \ldots, \gamma_m \geq 0$ is a kernel:

$k(x, z) = \gamma_1 k_1(x, z) + \gamma_2 k_2(x, z) + \ldots + \gamma_m k_m(x, z)$
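The closure properties can be checked numerically on kernel matrices. The sketch below assumes NumPy; the random data, the coefficients, and the names are illustrative. It also includes a difference of kernels, which the closure properties do not cover, as a counterexample.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))  # six random points in R^3 (toy data)

K_lin = X @ X.T  # linear kernel matrix
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K_rbf = np.exp(-0.5 * sq_dists)  # squared exponential kernel matrix, sigma = 1

def min_eig(K):
    """Smallest eigenvalue of a symmetric matrix."""
    return float(np.linalg.eigvalsh(K).min())

combos = {
    "sum": K_lin + K_rbf,
    "scaled": 2.5 * K_lin,
    "product": K_lin * K_rbf,  # element-wise product of the matrices
    "combination": 0.3 * K_lin + 0.7 * K_rbf,
    "difference": K_lin - K_rbf,  # NOT covered by the closure properties
}

for name, K in combos.items():
    print(f"{name:12s} smallest eigenvalue: {min_eig(K): .3e}")
```

In this setup the first four matrices have no eigenvalue below zero up to rounding, while the difference has a clearly negative one: the linear kernel matrix of six points in R^3 has rank at most three, so subtracting a positive definite matrix makes some directions negative.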


Specific kernels: Vector data

Let $x, z \in \mathbb{R}^d$.

- Linear kernel: $k(x, z) = \langle x, z \rangle$

- Polynomial kernel: $k(x, z) = \left( \langle x, z \rangle + c \right)^d$

- Squared exponential kernel (also radial basis function kernel): $k(x, z) = \exp\left( -\frac{1}{2} \frac{\|x - z\|_2^2}{\sigma^2} \right)$

Local approximator; free parameter $\sigma$ (length scale)

[Figure: three plots of output versus input (input from -3 to 3, output from -2 to 2).]
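The three vector kernels can be sketched in plain Python as follows. The function names and default parameter values are assumptions made for this illustration. The polynomial exponent is called `degree` here to keep it apart from the input dimension.

```python
import math

def linear_kernel(x, z):
    """k(x, z) = <x, z>."""
    return sum(a * b for a, b in zip(x, z))

def polynomial_kernel(x, z, c=1.0, degree=2):
    """k(x, z) = (<x, z> + c)^degree."""
    return (linear_kernel(x, z) + c) ** degree

def squared_exponential_kernel(x, z, sigma=1.0):
    """k(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-0.5 * sq_dist / sigma ** 2)

x, z = (1.0, 2.0), (2.0, 0.0)
print(linear_kernel(x, z))      # 2.0
print(polynomial_kernel(x, z))  # (2 + 1)^2 = 9.0
for sigma in (0.5, 1.0, 5.0):
    # Small sigma: only very close points are similar (local behaviour).
    print(sigma, squared_exponential_kernel(x, z, sigma))
```

The loop shows the role of the length scale: the same pair of points is nearly dissimilar for a small σ and nearly identical for a large one.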


Specific kernels: Structured data

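- Convolution kernels: $(k_1 \times \cdots \times k_d)(x, x') = \sum_{R} \prod_{i=1}^{d} k_i(x_i, x'_i)$, where the sum runs over the decompositions $R$ of $x$ and $x'$ into parts $x_1, \ldots, x_d$ and $x'_1, \ldots, x'_d$

- String kernels

- Kernels between probability distributions

- Kernels between graphs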


Specific kernels: String kernels

Spectrum kernel

- Compare substrings of length k (k-mers)

- Position-independent

- Kernel is sum of products of counts

Example: k = 3

x = AAACAAATAAGTAACTAATCTTTTAGAACTTTCAACCATTT...

z = TACCTAATTATGAAATTAAATTTCAGCTGTGGAAACGGAGA...

3-mer    AAA  AAC  ...  TTT
# in x     2    4  ...    3
# in z     3    1  ...    1

$k(x, z) = 2 \cdot 3 + 4 \cdot 1 + \ldots + 3 \cdot 1$
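A minimal sketch of the spectrum kernel using only the Python standard library. The function names are hypothetical. Because the sequences on the slide are truncated, counts computed on the visible prefixes need not match the table.

```python
from collections import Counter

def kmer_counts(s, k):
    """Count all (overlapping) substrings of length k in s."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))

def spectrum_kernel(x, z, k=3):
    """Sum over all k-mers of (count in x) * (count in z)."""
    cx, cz = kmer_counts(x, k), kmer_counts(z, k)
    # A Counter returns 0 for k-mers that do not occur in z.
    return sum(count * cz[kmer] for kmer, count in cx.items())

x = "AAACAAATAAGTAACTAATCTTTTAG"  # visible prefix of x only
z = "TACCTAATTATGAAATTAAATTTCAG"  # visible prefix of z only
print(kmer_counts(x, 3)["AAA"])   # 2
print(spectrum_kernel(x, z, k=3))
```

The kernel ignores where a k-mer occurs, which is what makes it position-independent.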


Specific kernels: String kernels

Weighted degree kernel

- Compare matches at each position

- Position-dependent

- $k(x, z) = \sum_{k=1}^{d} \beta_k \sum_{i=1}^{|x|-k+1} I\left( u_{k,i}(x) = u_{k,i}(z) \right)$, where $u_{k,i}(x)$ is the k-mer of $x$ starting at position $i$

Example: d = 3

x         AAACAAATAAGTAACTAATCTTTTAG
# 1-mers  .|.|.|||.|..||.|.|..|||.||
# 2-mers  .....||.....|.......||..|.
# 3-mers  .....|..............|.....
z         TACCTAATTATGAAATTAAATTTCAG

$k(x, z) = \beta_1 \cdot 15 + \beta_2 \cdot 6 + \beta_3 \cdot 2$
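The weighted degree kernel can be sketched as follows, using the two sequences of the example. The function names and the weights β = (1.0, 0.5, 0.25) are assumptions made for illustration; the slides do not specify the weights.

```python
def wd_match_counts(x, z, d):
    """For k = 1..d, count positions where the k-mers of x and z coincide."""
    if len(x) != len(z):
        raise ValueError("the weighted degree kernel compares equal-length strings")
    counts = []
    for k in range(1, d + 1):
        matches = sum(
            x[i:i + k] == z[i:i + k] for i in range(len(x) - k + 1)
        )
        counts.append(matches)
    return counts

def weighted_degree_kernel(x, z, betas):
    """Weighted sum of the match counts, one weight per k-mer length."""
    counts = wd_match_counts(x, z, len(betas))
    return sum(b * c for b, c in zip(betas, counts))

x = "AAACAAATAAGTAACTAATCTTTTAG"
z = "TACCTAATTATGAAATTAAATTTCAG"
print(wd_match_counts(x, z, 3))                        # [15, 6, 2]
print(weighted_degree_kernel(x, z, (1.0, 0.5, 0.25)))  # 15 + 3 + 0.5 = 18.5
```

Unlike the spectrum kernel, a k-mer only contributes when it occurs at the same position in both strings.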


Specific kernels: Graph kernels (introduction)

Idea: Define k directly on graphs

- Application to chemical structure graphs

- Allows direct measurement of graph similarity

- Rigorous way to combine graph theory and kernel learning

- Caveat: Complete graph kernels are computationally hard


Specific kernels: Graph kernels (overview)

- Random walk graph kernels, path-based graph kernels

- Tree pattern graph kernels, cyclic pattern graph kernels

- Graphlet kernels

- Optimal assignment graph kernels

[Figure: two chemical structure graphs (atom labels C, N, O, S).]

M. Rupp, G. Schneider, Mol. Inf. 29(4): 266–273, 2010.


Specific kernels: Graph kernels (example)

ISOAK = iterative similarity optimal assignment kernel

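$X_{v,v'} = (1 - \alpha)\, k_v(v, v') + \alpha \max_{\pi} \frac{1}{|v'|} \sum_{\{v,u\} \in E} X_{u,\pi(u)}\, k_e\left( \{v, u\}, \{v', \pi(u)\} \right)$

$\alpha$ controls recursiveness; $\pi$ assigns neighbors of $v$ to neighbors of $v'$

[Figure: two molecular graphs with numbered atoms, one with atoms C1 to C6 and O7, the other with atoms C1 to C7, O8, and O9.]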

Rupp et al., J. Chem. Inf. Model. 47(6): 2280, 2007.


Application example: Virtual screening

Target

Peroxisome proliferator-activated receptor γ (PPARγ)

Data

[Figure: kernel principal component analysis with ISOAK graph kernel (n = 176); axes PC 1 and PC 2.]

Rücker et al., Bioorg. Med. Chem. 14(15): 5178, 2006; Rupp, PhD thesis, 2009.
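Kernel principal component analysis needs only the kernel matrix, which is why it can be combined with a graph kernel. The sketch below assumes NumPy and uses a random toy matrix with a linear kernel in place of the ISOAK kernel matrix of the 176 compounds, which is not available here. The function name is hypothetical.

```python
import numpy as np

def kernel_pca(K, n_components=2):
    """Coordinates of the training points on the leading kernel principal components."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n    # centering matrix
    Kc = H @ K @ H                         # kernel matrix of centered features
    eigvals, eigvecs = np.linalg.eigh(Kc)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    lam = np.clip(eigvals[order], 0.0, None)  # guard against rounding below zero
    # Coordinate of point i on component j: sqrt(lambda_j) * eigvecs[i, j]
    return eigvecs[:, order] * np.sqrt(lam)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))  # toy stand-in data (assumption)
K = X @ X.T                   # any positive semi-definite kernel matrix works here
Z = kernel_pca(K, n_components=2)
print(Z.shape)  # (20, 2): one (PC 1, PC 2) pair per data point
```

With the linear kernel used in this toy example, the result coincides with ordinary principal component analysis up to the sign of each component, which gives a simple way to sanity-check the routine.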


Application example: Virtual screening

Methods:

- Graph kernel, vector kernels

- Multiple kernel learning

- Gaussian process regression

Results:

[Figure: chemical structure (panel "Stereochemistry") and dose-response curve (x-axis log(c) from 0.5 to 50.0, y-axis r. A. from 0.2 to 1.4).]

Rupp et al., ChemMedChem 5(2): 191, 2009


Summary

- Kernels correspond to inner products

- Well-characterized function class

- Kernels can be defined on structured data


Literature

- K.-R. Müller, S. Mika, G. Rätsch, K. Tsuda, B. Schölkopf: An Introduction to Kernel-Based Learning Algorithms, IEEE Trans. Neural Network. 12(2): 181–201, 2001. 22 pages, review, broader introduction

- T. Hofmann, B. Schölkopf, A. Smola: Kernel Methods in Machine Learning, Ann. Stat. 36(6): 1171–1220, 2008. 50 pages, review, mathematically oriented

- O. Ivanciuc: Applications of Support Vector Machines in Chemistry, ch. 6, p. 291–400. In K. Lipkowitz, T. Cundari, Reviews in Computational Chemistry, vol. 23, Wiley, 2007. 110 pages, review

- M. Rupp, G. Schneider: Graph kernels for molecular similarity, Mol. Inf. 29(4): 266–273, 2010. 8 pages, overview of graph kernels for small structure graphs
