Transcript of "Learning in Indefiniteness" - Purushottam Kar (CSE/IITK), August 2, 2010

Page 1: Title

⟨Φ(x), Φ(x′)⟩ = K(x, x′) = C(x, x′) − C(x, x₀) − C(x′, x₀)

Learning in Indefiniteness

Purushottam Kar
Department of Computer Science and Engineering
Indian Institute of Technology Kanpur

August 2, 2010

Pages 2-6: Outline

1 A brief introduction to learning
2 Kernels - Definite and Indefinite
3 Using kernels as measures of distance
   Landmarking based approaches
   Approximate embeddings into Pseudo-Euclidean spaces
   Exact embeddings into Banach spaces
4 Using kernels as measures of similarity
   Approximate embeddings into Pseudo-Euclidean spaces
   Exact embeddings into Kreĭn spaces
   Landmarking based approaches
5 Conclusion

Pages 7-10: Outline

A Quiz

Page 11: Learning

Learning 100

Pages 12-19: Learning

Learning as pattern recognition

Binary classification ✓
Multi-class classification
Multi-label classification
Regression
Clustering
Ranking
...

Pages 20-27: Learning

Binary classification

Learning dichotomies from examples
Learning the distinction between a bird and a non-bird
Main approaches:
   - Generative (Bayesian classification)
   - Predictive
      - Feature based
      - Kernel based ✓
This talk: kernel-based predictive approaches to binary classification

Pages 28-30: Learning

Probably Approximately Correct learning [Kearns and Vazirani, 1997]

Definition
A class of boolean functions F defined on a domain X is said to be PAC-learnable if there exist a class of boolean functions H defined on X, an algorithm A and a function S : R⁺ × R⁺ → N such that for all distributions µ defined on X, all t ∈ F and all ε, δ > 0: A, when given (x_i, t(x_i)), i = 1, …, n, with x_i ∈_R µ (drawn at random from µ) and n = S(1/ε, 1/δ), returns, with probability (taken over the choice of x_1, …, x_n) greater than 1 − δ, a function h ∈ H such that

   Pr_{x ∈_R µ}[h(x) ≠ t(x)] ≤ ε.

t is the Target function, F the Concept Class
h is the Hypothesis, H the Hypothesis Class
S is the Sample Complexity of the algorithm A

Pages 31-38: Learning

Limitations of PAC learning

Most interesting function classes are not PAC-learnable with polynomial sample complexity, e.g. regular languages
Adversarial combinations of target functions and distributions can make learning impossible
Weaker notions of learning:
   - Weak-PAC learning - require only that ε be bounded away from 1/2
   - Restrict oneself to benign distributions (uniform, mixtures of Gaussians)
   - Restrict oneself to benign learning scenarios (target function-distribution pairs that are benign) ✓
   - Vaguely defined in the literature

Page 39: Learning

Weak∗-Probably Approximately Correct learning

Definition
A class of boolean functions F defined on a domain X is said to be weak∗-PAC-learnable if for every t ∈ F and every distribution µ defined on X, there exist a class of boolean functions H defined on X, an algorithm A and a function S : R⁺ × R⁺ → N such that for all ε, δ > 0: A, when given (x_i, t(x_i)), i = 1, …, n, with x_i ∈_R µ and n = S(1/ε, 1/δ), returns, with probability (taken over the choice of x_1, …, x_n) greater than 1 − δ, a function h ∈ H such that

   Pr_{x ∈_R µ}[h(x) ≠ t(x)] ≤ ε.

Page 40: Kernels

Pages 41-42: Kernels

Definition
Given a non-empty set X, a symmetric real-valued (resp. Hermitian complex-valued) function f : X × X → R (resp. f : X × X → C) is called a kernel.

All notions of (symmetric) distance and similarity are kernels
Alternatively, kernels can be thought of as measures of similarity or distance

Pages 43-45: Kernels

Definiteness

Definition
A matrix A ∈ R^{n×n} is said to be positive definite if for all c ∈ R^n, c ≠ 0, cᵀAc > 0.

Definition
A kernel K defined on a domain X is said to be positive definite if for all n ∈ N and all x_1, …, x_n ∈ X, the matrix G = (G_ij) = (K(x_i, x_j)) is positive definite. Alternatively, for every g ∈ L²(X),

   ∫∫_X g(x) g(x′) K(x, x′) dx dx′ ≥ 0.

Definition
A kernel K is said to be indefinite if it is neither positive definite nor negative definite.
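In practice, definiteness is usually probed numerically: sample some points, build the Gram matrix and inspect its eigenvalue spectrum. A minimal numpy sketch (the Gaussian and tanh kernels below are illustrative choices of mine, not from the talk; the tanh "sigmoid" kernel is a standard example that is indefinite for many parameter settings):

```python
import numpy as np

def spectrum(kernel, xs):
    """Eigenvalues (ascending) of the Gram matrix G_ij = K(x_i, x_j)."""
    G = np.array([[kernel(a, b) for b in xs] for a in xs])
    return np.sort(np.linalg.eigvalsh(G))

rng = np.random.default_rng(0)
xs = rng.normal(size=(20, 3))

# Gaussian RBF kernel: positive definite, eigenvalues stay >= 0 (up to rounding).
rbf = lambda a, b: np.exp(-np.sum((a - b) ** 2))
# tanh kernel: indefinite for many parameter choices, so the spectrum
# usually contains strictly negative eigenvalues as well.
sigmoid = lambda a, b: np.tanh(np.dot(a, b) - 1.0)

print(spectrum(rbf, xs)[:3])      # smallest eigenvalues: ~0 or positive
print(spectrum(sigmoid, xs)[:3])  # typically includes negative values
```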

Pages 46-53: Kernels

The Kernel Trick

All PD kernels turn out to be inner products in some Hilbert space
Thus, any algorithm that takes only pairwise inner products as input can be made to work implicitly in such spaces
Results known as Representer Theorems keep any curses of dimensionality at bay ...
Testing the Mercer condition is difficult
Indefinite kernels are known to give good performance
The ability to use indefinite kernels increases the scope of learning-the-kernel algorithms
The learning paradigm sits somewhere between PAC and weak∗-PAC
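To make the second bullet concrete, here is a minimal sketch (mine, not from the talk) of the kernel perceptron: it touches the data only through the Gram matrix K, so the same code runs whether or not K comes from an actual inner product.

```python
import numpy as np

def kernel_perceptron(K, y, epochs=10):
    """Train a kernel perceptron from the Gram matrix K and labels y in {-1, +1}.

    Returns dual coefficients alpha; the decision value on training point j
    is sum_i alpha_i * y_i * K[i, j] -- pairwise kernel values only.
    """
    y = np.asarray(y, dtype=float)
    alpha = np.zeros(len(y))
    for _ in range(epochs):
        for j in range(len(y)):
            # The algorithm never sees feature vectors, only the column K[:, j].
            if y[j] * np.dot(alpha * y, K[:, j]) <= 0:
                alpha[j] += 1.0  # mistake-driven dual update
    return alpha
```

With an indefinite K the arithmetic runs unchanged; what is lost is the Hilbert-space margin and convergence theory, which is the kind of gap the approaches surveyed in the rest of the talk try to close.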

Page 54: Kernels as distances

Pages 55-60: Kernels as distances / Landmarking based approaches

Nearest neighbor classification [Duda et al., 2000]

Learning domain is some distance (possibly metric) space (X, d)
Given T = (x_i, t(x_i)), i = 1, …, n, with x_i ∈ X and t(x_i) ∈ {−1, +1}, split T = T⁺ ∪ T⁻
Classify a new point x as + if d(x, T⁺) < d(x, T⁻), otherwise as −
When will this work?
   - Intuitively, when a large fraction of domain points are closer (according to d) to points of the same label than to points of the other label
   - Pr_{x ∈_R µ}[ d(x, X_{t(x)}) < d(x, X_{−t(x)}) ] ≥ 1 − ε, where X_ℓ denotes the set of points with label ℓ
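A direct sketch of this rule (my code; d(x, S) is taken here to be the distance to the nearest point of S):

```python
import numpy as np

def nn_class_distance(x, T_pos, T_neg, d):
    """Classify x as +1 if it is nearer to the positive set than to the
    negative set, i.e. d(x, T+) < d(x, T-); otherwise -1."""
    d_pos = min(d(x, a) for a in T_pos)
    d_neg = min(d(x, b) for b in T_neg)
    return 1 if d_pos < d_neg else -1

# Tiny usage example with the Euclidean metric.
euclid = lambda u, v: float(np.linalg.norm(np.asarray(u) - np.asarray(v)))
T_pos = [(0.0, 0.0), (0.5, 0.2)]
T_neg = [(3.0, 3.0), (2.5, 2.8)]
print(nn_class_distance((0.4, 0.1), T_pos, T_neg, euclid))  # -> 1
```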

Pages 61-62: Kernels as distances / Landmarking based approaches

What is a good distance function?

Definition
A distance function d is said to be strongly (ε, γ)-good for a learning problem if at least a 1 − ε probability mass of examples x ∈ µ satisfies

   Pr_{x′, x′′ ∈_R µ}[ d(x, x′) < d(x, x′′) | x′ ∈ X_{t(x)}, x′′ ∈ X_{−t(x)} ] ≥ 1/2 + γ.

A smoothed version of the earlier intuitive notion of a good distance function
Correspondingly, the algorithm is also a smoothed version of the classical NN algorithm
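The definition suggests an empirical check: for each sample point x, estimate the probability that a random same-label point lands closer than a random opposite-label point, and count how many points miss the 1/2 + γ threshold. A Monte Carlo sketch (my construction, not from the talk):

```python
import numpy as np

def strong_goodness_profile(X, y, d, gamma, trials=200, seed=0):
    """Empirical strong (eps, gamma)-goodness: for each point x, estimate
    p(x) = Pr[d(x, x') < d(x, x'')] with x' from x's class and x'' from the
    other class; return the fraction of points with p(x) < 1/2 + gamma
    (an empirical epsilon). Assumes each class has at least two points."""
    rng = np.random.default_rng(seed)
    fails = 0
    for i in range(len(X)):
        same = [j for j in range(len(X)) if y[j] == y[i] and j != i]
        diff = [j for j in range(len(X)) if y[j] != y[i]]
        wins = sum(
            d(X[i], X[rng.choice(same)]) < d(X[i], X[rng.choice(diff)])
            for _ in range(trials)
        )
        if wins / trials < 0.5 + gamma:
            fails += 1
    return fails / len(X)
```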

Pages 63-66: Kernels as distances / Landmarking based approaches

Learning with a good distance function

Theorem ([Wang et al., 2007])
Given a strongly (ε, γ)-good distance function, the following classifier h, for any ε, δ > 0, when given n = (1/γ²) lg(1/δ) pairs of positive and negative training points (a_i, b_i), i = 1, …, n, with a_i ∈_R µ⁺ and b_i ∈_R µ⁻, has, with probability greater than 1 − δ, an error of no more than ε + δ:

   h(x) = sgn[f(x)],   f(x) = (1/n) Σ_{i=1}^{n} sgn[d(x, b_i) − d(x, a_i)]

What about the NN algorithm - any guarantees for that?
For metric distances - in a few slides
Note that this is an instance of weak∗-PAC learning
Guarantees for NN on non-metric distances?
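The classifier in the theorem is just a majority vote over landmark pairs; a direct sketch (function and variable names are mine):

```python
import numpy as np

def wang_classifier(pairs, d):
    """Build h(x) = sgn[(1/n) * sum_i sgn(d(x, b_i) - d(x, a_i))] from
    n landmark pairs (a_i, b_i), a_i positive and b_i negative."""
    def h(x):
        votes = [np.sign(d(x, b) - d(x, a)) for a, b in pairs]
        return 1 if np.mean(votes) > 0 else -1
    return h

# Usage: each pair votes + when x is closer to its positive landmark.
euclid = lambda u, v: float(np.linalg.norm(np.asarray(u) - np.asarray(v)))
pairs = [((0.0, 0.0), (3.0, 3.0)), ((0.5, 0.0), (2.5, 3.0))]
h = wang_classifier(pairs, euclid)
print(h((0.3, 0.2)), h((2.9, 2.7)))  # -> 1 -1
```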

Pages 67-69: Kernels as distances / Landmarking based approaches

Other landmarking approaches

[Weinshall et al., 1998] and [Jacobs et al., 2000] investigate algorithms where a (set of) representative(s) is chosen for each label, e.g. the centroid of all training points with that label
[Pekalska and Duin, 2001] consider combining classifiers based on different dissimilarity functions, as well as building classifiers on combinations of different dissimilarity functions
[Weinberger and Saul, 2009] propose methods to learn a Mahalanobis distance that improves NN classification

Pages 70-73: Kernels as distances / Landmarking based approaches

Other landmarking approaches

[Gottlieb et al., 2010] present efficient schemes for NN classifiers (Lipschitz extension classifiers) in doubling spaces (a code sketch follows this slide):

   h(x) = sgn[f(x)],   f(x) = min_{x_i ∈ T} ( t(x_i) + 2 d(x, x_i) / d(T⁺, T⁻) )

   - They make use of approximate nearest neighbor search algorithms
   - They show that the pseudo-dimension of Lipschitz classifiers in doubling spaces is bounded
   - They provide schemes for optimizing the bias-variance trade-off
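A minimal sketch of that Lipschitz extension classifier (my code, following the displayed formula; the sgn convention at f(x) = 0 is arbitrary here):

```python
def lipschitz_extension_classifier(T, labels, d):
    """h(x) = sgn[min_i (t(x_i) + 2 d(x, x_i) / sep)], where
    sep = d(T+, T-) is the smallest distance between opposite-label points."""
    pos = [p for p, t in zip(T, labels) if t > 0]
    neg = [p for p, t in zip(T, labels) if t < 0]
    sep = min(d(p, q) for p in pos for q in neg)  # d(T+, T-)

    def h(x):
        f = min(t + 2.0 * d(x, p) / sep for p, t in zip(T, labels))
        return 1 if f > 0 else -1                 # sgn, with sgn(0) := -1
    return h
```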

Pages 74-78: Kernels as distances / PE space approaches

Data sensitive embeddings

Landmarking based approaches can be seen as implicitly embedding the domain into an n-dimensional feature space
Perform an explicit, isometric embedding of the training data into some vector space and learn a classifier there
Perform (approximately) isometric embeddings of test data into the same vector space to classify them
Exact for transductive problems, approximate for inductive ones
Such techniques have a long history going back to early AI - Multidimensional Scaling

Pages 79-81: Kernels as distances / Pseudo-Euclidean spaces

The Minkowski space-time

Definition
R⁴ = R³ ⊕ R¹ := R^(3,1), endowed with the inner product ⟨(x_1, y_1, z_1, t_1), (x_2, y_2, z_2, t_2)⟩ = x_1x_2 + y_1y_2 + z_1z_2 − t_1t_2, is a 4-dimensional Minkowski space with signature (3,1). The norm imposed by this inner product is ‖(x_1, y_1, z_1, t_1)‖² = x_1² + y_1² + z_1² − t_1².

Vectors can have negative squared length due to the imaginary time coordinate
The definition can be extended to arbitrary R^(p,q) (PE spaces)

Theorem ([Goldfarb, 1984], [Haasdonk, 2005])
Any finite pseudo metric (X, d) with |X| = n can be isometrically embedded in (R^(p,q), ‖·‖²) for some values of p + q < n.
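The signature-(p, q) inner product is essentially a one-liner; a small sketch (names mine):

```python
import numpy as np

def pe_inner(u, v, p, q):
    """Inner product in R^(p,q): positive on the first p coordinates,
    negative on the last q (Minkowski space-time is p = 3, q = 1)."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    return np.dot(u[:p], v[:p]) - np.dot(u[p:p + q], v[p:p + q])

# A "light-like" vector in R^(3,1) has squared norm zero.
x = (1.0, 0.0, 0.0, 1.0)
print(pe_inner(x, x, 3, 1))  # -> 0.0
```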

Page 82: Kernels as distances / Pseudo-Euclidean spaces

The Embedding

Embedding the training set
Given a distance matrix D = (d(x_i, x_j)) ∈ R^{n×n}, find the corresponding inner products in the PE space as G = −(1/2) J D J, where J = I − (1/n) 1 1ᵀ. Do an eigendecomposition G = Q Λ Qᵀ = Q |Λ|^{1/2} M |Λ|^{1/2} Qᵀ, where

   M = [ I_{p×p}      0
            0     −I_{q×q} ]

The representation of the points is X = Q |Λ|^{1/2}.

Embedding a new point
Perform a linear projection into the space found above. Given d = (d(x, x_i)), the vector of distances to the old points, the inner products with all the old points are found as g = −(1/2) (d − (1/n) 1 1ᵀ D) J. Now find the mean-square-error solution to x M Xᵀ = g as x = g X |Λ|⁻¹ M. (A numpy sketch of both steps follows.)
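A numpy sketch of both steps, following the slide's formulas (my code; D is treated as the matrix of squared distances, under which the construction reduces to classical MDS when the data are actually Euclidean):

```python
import numpy as np

def pe_embed_train(D, tol=1e-9):
    """Training-set embedding: G = -1/2 J D J, G = Q Lam Q^T,
    X = Q |Lam|^(1/2). D is the n x n matrix of (squared) distances."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    G = -0.5 * J @ D @ J
    lam, Q = np.linalg.eigh(G)        # symmetric eigendecomposition
    keep = np.abs(lam) > tol          # drop near-null directions
    lam, Q = lam[keep], Q[:, keep]
    X = Q * np.sqrt(np.abs(lam))      # X = Q |Lam|^(1/2)
    sig = np.sign(lam)                # diagonal of M: p ones, q minus-ones
    return X, lam, sig

def pe_embed_new(dvec, D, X, lam, sig):
    """Out-of-sample step: g = -1/2 (d - 1/n 1 1^T D) J, then the
    mean-square-error solution x = g X |Lam|^(-1) M."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    g = -0.5 * (dvec - np.ones(n) @ D / n) @ J
    return (g @ X) / np.abs(lam) * sig
```

Feeding back a training row, dvec = D[i], recovers X[i] exactly; on Euclidean squared distances the signature sig comes out all +1, while genuinely indefinite data shows up as a mix of signs, i.e., a nonzero q.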

Page 83: Learning in Indefiniteness · Purushottam Kar (CSE/IITK) Learning in Indefiniteness August 2, 2010 8 / 60. Learning Limitations of PAC learning Most interesting function classes

Kernels as distances · PE space approaches

Classification in PE spaces

Earliest observations by [Goldfarb, 1984], who realized the link between
landmarking and embedding approaches
[Pekalska and Duin, 2000], [Pekalska et al., 2001], [Pekalska and Duin, 2002]
use this space to learn SVM, LPM, Quadratic Discriminant and Fisher Linear
Discriminant classifiers
[Harol et al., 2006] propose enlarging the PE space to allow for less
distortion when embedding test points
[Duin and Pekalska, 2008] propose refinements to the distance measure by making
modifications to the PE space, allowing for better NN classification
Guarantees for classifiers learned in PE spaces?


Kernels as distances · Banach space approaches

Data insensitive embeddings

Possible if the distance measure can be isometrically embedded into some space
Learn a simple classifier there and interpret it in terms of the distance
measure
Requires algorithms that can work without explicit embeddings
Exact for transductive as well as inductive problems
Recent interest due to the advent of large margin classifiers


Kernels as distances · Banach spaces

Normed Spaces

Definition
Given a vector space V over a field F ⊆ C, a norm is a function ‖·‖ : V → R
such that ∀u, v ∈ V, a ∈ F: ‖av‖ = |a|‖v‖, ‖u + v‖ ≤ ‖u‖ + ‖v‖, and ‖v‖ = 0 if
and only if v = 0. A vector space that is complete with respect to a norm is
called a Banach space.

Theorem ([von Luxburg and Bousquet, 2004])
Given a metric space M = (X, d) and the space Lip(X) of all Lipschitz functions
defined on M, there exist a Banach space B and maps Φ : X → B and
Ψ : Lip(X) → B′, with the operator norm on B′ giving the Lipschitz constant of
each function f ∈ Lip(X), such that both maps can be realized simultaneously as
isomorphic isometries.

The Kuratowski embedding gives a constructive proof
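A hedged sketch of the Kuratowski embedding idea on a finite sample: map x to
the differences d(x, ·) − d(x0, ·) and measure distances in the sup-norm; the
metric, base point, and sample are illustrative.

    import numpy as np

    def kuratowski(points, metric, base):
        # Each row is the embedded image of one point over the sample 'points'.
        return np.array([[metric(x, y) - metric(base, y) for y in points]
                         for x in points])

    # Sanity check with the absolute-value metric on the line: sup-norm
    # distances between embedded rows reproduce the original metric exactly.
    pts = [0.0, 1.5, 4.0, 7.25]
    E = kuratowski(pts, lambda a, b: abs(a - b), base=pts[0])
    for i in range(len(pts)):
        for j in range(len(pts)):
            assert np.isclose(np.max(np.abs(E[i] - E[j])), abs(pts[i] - pts[j]))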


Kernels as distances · Banach spaces

Classification in Banach spaces

[von Luxburg and Bousquet, 2004] propose large margin classification schemes on
Banach spaces, relying on convex hull interpretations of SVM classifiers:

    inf_{p+ ∈ C+, p− ∈ C−} ‖p+ − p−‖                                      (1)

    sup_{T ∈ B′} inf_{p+ ∈ C+, p− ∈ C−} 〈T, p+ − p−〉 / ‖T‖               (2)

    inf_{T ∈ B′, b ∈ R} ‖T‖ = L(T)
    subject to t(xi)(〈T, xi〉 + b) ≥ 1, ∀i = 1, …, n                       (3)

    inf_{T ∈ B′, b ∈ R} L(T) + C Σ_{i=1}^n ξi
    subject to t(xi)(〈T, xi〉 + b) ≥ 1 − ξi, ξi ≥ 0, ∀i = 1, …, n          (4)
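A hedged sketch of Program (4) specialized to the Lipschitz setting, where L(T)
becomes the Lipschitz constant of a function f optimized through its values fi
on the training points (the Lipschitz extension theorem on the next slide
justifies restricting to these values); the toy data and the reduction to a
linear program are illustrative:

    import numpy as np
    from scipy.optimize import linprog

    def lipschitz_classifier_lp(D, t, C=1.0):
        n = len(t)
        # variable vector: [f_1..f_n, L, xi_1..xi_n]; objective L + C * sum(xi)
        c = np.concatenate([np.zeros(n), [1.0], C * np.ones(n)])
        rows, rhs = [], []
        for i in range(n):
            for j in range(n):
                if i != j:                    # f_i - f_j <= L * d(x_i, x_j)
                    r = np.zeros(2 * n + 1)
                    r[i], r[j], r[n] = 1.0, -1.0, -D[i, j]
                    rows.append(r); rhs.append(0.0)
        for i in range(n):                    # t_i * f_i >= 1 - xi_i
            r = np.zeros(2 * n + 1)
            r[i], r[n + 1 + i] = -t[i], -1.0
            rows.append(r); rhs.append(-1.0)
        bounds = [(None, None)] * n + [(0, None)] * (n + 1)
        res = linprog(c, A_ub=np.array(rows), b_ub=np.array(rhs), bounds=bounds)
        return res.x[:n], res.x[n]            # fitted values f_i, Lip constant L

    # Toy example on the line with metric |x - y|:
    x = np.array([0.0, 1.0, 3.0, 4.0]); t = np.array([-1.0, -1.0, 1.0, 1.0])
    D = np.abs(x[:, None] - x[None, :])
    f, L = lipschitz_classifier_lp(D, t)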


Kernels as distances · Banach spaces

Representer Theorems

Lets us escape the curse of dimensionality

Theorem (Lipschitz extension)
Given a Lipschitz function f defined on a finite subset of the domain X, one
can extend f to f′ on the entire domain such that Lip(f′) = Lip(f).

Solution to Program 3 is always of the form

    f(x) = (d(x, T−) − d(x, T+)) / d(T+, T−)

Solution to Program 4 is always of the form

    g(x) = α min_i (t(xi) + L0 d(x, xi)) + (1 − α) max_i (t(xi) − L0 d(x, xi))
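A direct transcription of the two closed forms above (a sketch; the metric,
label, and set names are illustrative):

    def f_program3(x, T_plus, T_minus, metric):
        d_plus = min(metric(x, p) for p in T_plus)    # d(x, T+)
        d_minus = min(metric(x, p) for p in T_minus)  # d(x, T-)
        d_sets = min(metric(p, q) for p in T_plus for q in T_minus)  # d(T+, T-)
        return (d_minus - d_plus) / d_sets

    def g_program4(x, xs, ts, L0, alpha, metric):
        lo = min(t + L0 * metric(x, xi) for xi, t in zip(xs, ts))
        hi = max(t - L0 * metric(x, xi) for xi, t in zip(xs, ts))
        return alpha * lo + (1 - alpha) * hi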


Kernels as distances · Banach spaces

But ...

Not a representer theorem involving distances to individual training points
Shown not to exist in certain cases, but the examples don't seem natural
By restricting oneself to different subspaces of Lip(X) one recovers the SVM,
LPM and NN algorithms
Can one use bi-Lipschitz embeddings instead?
Can one define "distance kernels" that allow one to restrict oneself to
specific subspaces of Lip(X)?


Kernels as distances · Banach spaces

Other Banach Space Approaches

[Hein et al., 2005] consider low distortion embeddings into Hilbert spaces,
giving a re-derivation of the SVM algorithm

Definition
A matrix A ∈ Rn×n is said to be conditionally positive definite if ∀c ∈ Rn with
c⊤1 = 0: c⊤Ac > 0.

Definition
A kernel K defined on a domain X is said to be conditionally positive definite
if ∀n ∈ N, ∀x1, …, xn ∈ X, the matrix G = (Gij) = (K(xi, xj)) is conditionally
positive definite.

Theorem
A metric d is Hilbertian, i.e. it can be isometrically embedded into a Hilbert
space, iff −d² is conditionally positive definite.
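A numerical sanity check of the definition and theorem (a sketch; it tests the
semi-definite version c⊤Ac ≥ 0 up to tolerance, with illustrative data):

    import numpy as np

    def is_cpd(A, tol=1e-10):
        n = A.shape[0]
        # columns of V: an orthonormal basis of the subspace {c : c^T 1 = 0}
        V = np.linalg.svd(np.ones((1, n)))[2][1:].T
        evals = np.linalg.eigvalsh(V.T @ A @ V)   # quadratic form on that subspace
        return np.all(evals > -tol)

    # -d^2 for a Euclidean metric is CPD, consistent with the theorem above:
    x = np.random.randn(6, 3)
    D = np.linalg.norm(x[:, None] - x[None, :], axis=-1)
    assert is_cpd(-D**2)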


Kernels as distances · Banach spaces

Other Banach Space Approaches

[Der and Lee, 2007] consider exploiting the semi-inner product structure
present in Banach spaces to yield SVM formulations
  - Aim for a kernel trick for general metrics
  - Lack of symmetry and bi-linearity for semi-inner products prevents such
    kernel tricks for general metrics
[Zhang et al., 2009] propose Reproducing Kernel Banach Spaces (RKBS), akin to
RKHS, that admit kernel tricks
  - Use a bilinear form on B × B′ instead of B × B
  - No succinct characterizations of what can yield an RKBS
  - For finite domains, any kernel is a reproducing kernel for some RKBS
    (trivial)


Kernels as distances · Banach spaces

Kernel Trick for Distances?

Theorem ([Scholkopf, 2000])
A kernel C defined on some domain X is CPD iff, for some fixed x0 ∈ X, the
kernel K(x, x′) = C(x, x′) − C(x, x0) − C(x′, x0) is PD. Such a C also induces
a Hilbertian metric.

The SVM algorithm is incapable of distinguishing between C and K
[Boughorbel et al., 2005]:

    Σ_{i,j=1}^n αi αj yi yj K(xi, xj) = Σ_{i,j=1}^n αi αj yi yj C(xi, xj)
    subject to Σ_{i=1}^n αi yi = 0

What about higher order CPD kernels and their characterization?
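A numerical check of this invariance (a sketch with illustrative data):
whenever Σ αi yi = 0, the C(·, x0) correction terms cancel out of the quadratic
form, so C and K give the same SVM objective.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 5
    x = rng.standard_normal((n, 2)); x0 = rng.standard_normal(2)
    C = lambda a, b: -np.linalg.norm(a - b) ** 2   # a classic CPD kernel
    Cm = np.array([[C(a, b) for b in x] for a in x])
    Km = np.array([[C(a, b) - C(a, x0) - C(b, x0) for b in x] for a in x])

    y = np.array([1.0, 1.0, -1.0, -1.0, 1.0])
    alpha = rng.random(n)
    alpha -= y * (alpha @ y) / n                   # enforce sum_i alpha_i y_i = 0
    v = alpha * y
    assert np.isclose(v @ Km @ v, v @ Cm @ v)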


Kernels as similarity

The Kernel Trick

Mercer's theorem tells us that a similarity space (X, K) is embeddable in a
Hilbert space iff K is a PSD kernel
Quite similar to what we had for Banach spaces, only with more structure now
Can formulate large margin classifiers as before
Representer Theorem [Scholkopf and Smola, 2001]: solution of the form

    f(x) = Σ_{i=1}^n αi K(x, xi)

Generalization guarantees: method of Rademacher averages [Mendelson, 2003]
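One concrete instance of the representer form (a sketch using kernel ridge
regression rather than the large margin programs above; the kernel, data, and
regularizer are illustrative): the optimal f lies in span{K(·, xi)}, with
coefficients α solving a linear system.

    import numpy as np

    def fit_kernel_ridge(X, y, kernel, lam=1e-2):
        G = np.array([[kernel(a, b) for b in X] for a in X])   # Gram matrix
        alpha = np.linalg.solve(G + lam * np.eye(len(X)), y)   # (G + lam I) a = y
        return lambda x: sum(a * kernel(x, xi) for a, xi in zip(alpha, X))

    rbf = lambda a, b: np.exp(-np.sum((a - b) ** 2))
    X = np.random.randn(20, 2); y = np.sign(X[:, 0])
    f = fit_kernel_ridge(X, y, rbf)                            # f(x) = sum_i a_i K(x, x_i)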


Kernels as similarity · Indefinite Similarity Kernels

The Lazy approaches

Why bother building a theory when one already exists!
  - Use a PD approximation to the given indefinite kernel!
    [Chen et al., 2009]: Spectrum Shift, Spectrum Clip, Spectrum Flip (see the
    sketch after this list)
  - [Luss and d'Aspremont, 2007] fold this process into the SVM algorithm by
    treating an indefinite kernel as a noisy version of a Mercer kernel
  - Tries to handle test points consistently, but no theoretical justification
    of the process
  - Mercer kernels are not dense in the space of symmetric kernels
[Haasdonk and Bahlmann, 2004] propose distance substitution kernels:
substituting distance/similarity measures into kernels of the form
K(‖x − y‖), K(〈x, y〉)
  - These yield PD kernels iff the distance measure is Hilbertian
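A minimal sketch of the three spectrum repairs referenced above, applied to an
indefinite symmetric Gram matrix (function and mode names are illustrative):

    import numpy as np

    def spectrum_repair(G, mode="clip"):
        lam, Q = np.linalg.eigh(G)
        if mode == "clip":          # zero out negative eigenvalues
            lam = np.maximum(lam, 0.0)
        elif mode == "flip":        # reflect negative eigenvalues
            lam = np.abs(lam)
        elif mode == "shift":       # add |min eigenvalue| to the whole spectrum
            lam = lam - min(lam.min(), 0.0)
        return (Q * lam) @ Q.T      # reassemble Q diag(lam) Q^T, now PSD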


Kernels as similarity · PE space approaches

Working with Indefinite Similarities

Embed training sets into PE spaces (Minkowski spaces) as before
[Graepel et al., 1998] propose to learn SVMs in this space, unfortunately not a
large margin formulation
[Graepel et al., 1999] propose LP machines in a ν-SVM-like formulation to
obtain sparse classifiers
[Mierswa, 2006] proposes using evolutionary algorithms to solve non-convex
formulations involving indefinite kernels


Kernels as similarity · PE space approaches

Working with Indefinite Similarities

[Haasdonk, 2005] embeds training data into a PE space and formulates a
ν-SVM-like classifier there
Not a margin maximization formulation
New points are not embedded into this space; rather, the SVM-like
representation is used (without justification)
Optimization not possible since the program formulations are non-convex;
stabilization is used instead
Can any guarantees be given for this formulation?


Kernels as similarity · Kreın space approaches

Kreın spaces

Definition
An inner product space (K, 〈·, ·〉K) is called a Kreın space if there exist two
Hilbert spaces H+ and H− such that K = H+ ⊕ H− and ∀f, g ∈ K,
〈f, g〉K = 〈f, g〉H+ − 〈f, g〉H−.

Definition
Given a domain X, a subset K ⊂ RX is called a Reproducing Kernel Kreın space if
the evaluation functional Tx : f ↦ f(x) is continuous on K with respect to its
strong topology.

Theorem ([Ong et al., 2004])
A kernel K on X is a reproducing kernel for some Kreın space K iff there exist
PD kernels K+ and K− such that K = K+ − K−.
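At the Gram-matrix level, the K = K+ − K− decomposition can be sketched via the
eigendecomposition, splitting an indefinite symmetric matrix into a difference
of two PSD parts (names are illustrative):

    import numpy as np

    def krein_split(G):
        lam, Q = np.linalg.eigh(G)
        G_plus = (Q * np.maximum(lam, 0.0)) @ Q.T    # PSD part from lam > 0
        G_minus = (Q * np.maximum(-lam, 0.0)) @ Q.T  # PSD part from lam < 0
        return G_plus, G_minus                       # G == G_plus - G_minus

    G = np.array([[0.0, 2.0], [2.0, 0.0]])           # indefinite (eigenvalues +-2)
    Gp, Gm = krein_split(G)
    assert np.allclose(G, Gp - Gm)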


Kernels as similarity · Kreın space approaches

Classification in Kreın spaces

[Ong et al., 2004] prove all the necessary results for learning large margin
classifiers
Prove that even stabilization leads to an SVM-like Representer Theorem
No large margin formulations considered due to singularity issues
  - Instead, regularization is performed by truncating the spectrum of K
  - Iterative methods to minimize squared error lead to regularization
Prove generalization error bounds using the method of Rademacher averages


Kernels as similarity · Landmarking based approaches

Landmarking approaches

[Graepel et al., 1999] consider landmarking with indefinite kernels

Perform L1 regularization for the large margin classifier to obtain sparse solutions - yields an LP formulation (sketched below)
Also propose the ν-SVM formulation to gain control over the number of margin violations
Allows one to tune the bias-variance trade-off
However, no guarantees were given - these were provided later by [Hein et al., 2005], [von Luxburg and Bousquet, 2004]
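To see how the LP arises, here is a sketch (a reconstruction under stated assumptions, not the authors' exact program). Minimizing ‖a‖₁ + C ∑ᵢ ξᵢ subject to yᵢ(∑ⱼ aⱼ K(xᵢ, xⱼ) + b) ≥ 1 − ξᵢ, ξᵢ ≥ 0 becomes a linear program once a and b are split into non-negative parts:

    import numpy as np
    from scipy.optimize import linprog

    def lp_machine(K, y, C=1.0):
        # Variables z = [a_plus (n), a_minus (n), b_plus, b_minus, xi (n)],
        # all non-negative; a = a_plus - a_minus, b = b_plus - b_minus.
        n = K.shape[0]
        c = np.concatenate([np.ones(2 * n), [0.0, 0.0], C * np.ones(n)])
        # Margin constraints rewritten for linprog as A_ub z <= b_ub:
        # -y_i (K_i (a_plus - a_minus) + b_plus - b_minus) - xi_i <= -1
        Y = np.diag(y.astype(float))
        A_ub = np.hstack([-Y @ K, Y @ K, -y[:, None], y[:, None], -np.eye(n)])
        b_ub = -np.ones(n)
        res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                      bounds=[(0, None)] * (3 * n + 2))
        z = res.x
        return z[:n] - z[n:2 * n], z[2 * n] - z[2 * n + 1]   # (a, b)

    # Toy usage with an indefinite (sigmoid) similarity matrix
    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 2))
    y = np.where(X[:, 0] > 0, 1, -1)
    G = np.tanh(X @ X.T)
    a, b = lp_machine(G, y, C=1.0)
    f = G @ a + b          # decision values; classify by sign(f)

The L1 objective drives most coordinates of a to zero, which is exactly the sparsity the slide refers to.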

What is a good similarity function?

Definition
A kernel function K is said to be (ε, γ)-kernel good for a learning problem if ∃ β ∈ K_K such that

    Pr_{x ∈_R µ} [ t(x) ⟨β, Φ_K(x)⟩ > γ ] ≥ 1 − ε.

Definition
A kernel function K is said to be strongly (ε, γ)-good for a learning problem if at least a 1 − ε probability mass of the domain satisfies

    E_{x′ ∈_R µ+} [K(x, x′)] > E_{x′ ∈_R µ−} [K(x, x′)] + γ.
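Both definitions can be estimated directly from a sample. The sketch below is a hypothetical plug-in estimate of strong (ε, γ)-goodness; the per-point gap is signed by the label t(x) so that one criterion covers both classes (the convention of [Balcan et al., 2008a]), and the sigmoid similarity is only a stand-in:

    import numpy as np

    def strong_goodness_gaps(K, X, y):
        # For each x: t(x) * ( E_{x'~mu+}[K(x,x')] - E_{x'~mu-}[K(x,x')] ).
        # K is strongly (eps, gamma)-good on the sample if a (1 - eps)
        # fraction of the returned gaps exceeds gamma.
        pos, neg = X[y == 1], X[y == -1]
        gaps = []
        for x, t in zip(X, y):
            e_pos = np.mean([K(x, p) for p in pos])
            e_neg = np.mean([K(x, q) for q in neg])
            gaps.append(t * (e_pos - e_neg))
        return np.array(gaps)

    # Toy usage with an indefinite similarity
    K = lambda u, v: np.tanh(u @ v)
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 3))
    y = np.where(X[:, 0] > 0, 1, -1)
    gamma = 0.1
    eps_hat = np.mean(strong_goodness_gaps(K, X, y) <= gamma)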

Learning with a good similarity function

Theorem ([Balcan et al., 2008a])
Given a strongly (ε, γ)-good similarity function, for any ε, δ > 0, the following classifier h, when given n = (16/γ²) lg(2/δ) pairs of positive and negative training points (aᵢ, bᵢ)ᵢ₌₁ⁿ, aᵢ ∈_R µ+, bᵢ ∈_R µ−, has, with probability greater than 1 − δ, an error of no more than ε + δ:

    h(x) = sgn[f(x)],    f(x) = (1/n) ∑ᵢ₌₁ⁿ K(x, aᵢ) − (1/n) ∑ᵢ₌₁ⁿ K(x, bᵢ)

Have to introduce a weighting function to extend the scope of the algorithm
Can be shown to imply that the landmarking kernel induced by a random sample is a good kernel with high probability
Yet another instance of weak∗-PAC learning
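The classifier in the theorem is only a few lines in practice. The sketch below assumes a generic similarity function K and hypothetical landmark samples A (from µ+) and B (from µ−):

    import numpy as np

    def averaging_classifier(K, A, B):
        # h(x) = sgn( mean_i K(x, a_i) - mean_i K(x, b_i) ),
        # exactly the f and h of the theorem above.
        def h(x):
            f = (np.mean([K(x, a) for a in A])
                 - np.mean([K(x, b) for b in B]))
            return 1 if f > 0 else -1
        return h

    # Toy usage: landmarks drawn from two shifted Gaussians
    K = lambda u, v: np.tanh(u @ v)
    rng = np.random.default_rng(1)
    A = rng.normal(loc=+1.0, size=(20, 3))   # a_i from mu+
    B = rng.normal(loc=-1.0, size=(20, 3))   # b_i from mu-
    h = averaging_classifier(K, A, B)
    label = h(np.array([0.8, 1.1, 0.9]))     # expected: +1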

Kernels as Kernels vs. Kernels as Similarity

Similarity → Kernel : (ε, γ)-good ⇒ (ε + δ, γ/2)-kernel good (see the sketch after this list)
Kernel → Similarity : (ε, γ)-kernel good ⇒ (ε + ε₀, ½(1 − ε)ε₀γ²)-good as a similarity function
[Srebro, 2007] There exist learning instances for which kernels perform better as kernels than as similarity functions
[Balcan et al., 2008b] There exist function classes and distributions such that no kernel performs well on all the functions; however, there exist similarity functions that give optimal performance
Role of the weighting function not investigated
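The Similarity → Kernel direction is constructive: map each point to its similarities with d random landmarks, scaled by 1/√d, and the ordinary dot product of these features is a positive semi-definite kernel (this is the landmarking map of [Balcan et al., 2008a]; the landmark count and scaling here are illustrative choices):

    import numpy as np

    def landmark_feature_map(K, landmarks):
        # phi(x) = ( K(x, l_1), ..., K(x, l_d) ) / sqrt(d)
        d = len(landmarks)
        def phi(x):
            return np.array([K(x, lm) for lm in landmarks]) / np.sqrt(d)
        return phi

    # The induced kernel of two points is simply <phi(x), phi(x')>
    K = lambda u, v: np.tanh(u @ v)
    rng = np.random.default_rng(2)
    landmarks = rng.normal(size=(10, 3))
    phi = landmark_feature_map(K, landmarks)
    x1, x2 = rng.normal(size=3), rng.normal(size=3)
    k12 = phi(x1) @ phi(x2)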

Conclusion

The big picture

Finite-dimensional embeddings (PE, Minkowski spaces)

▶ Work well in transductive settings
▶ Allow for support vector like effects
▶ Not much work on generalization guarantees
▶ Not much known about the distortion incurred when embedding test points
▶ Should work well owing to Representer Theorems

Exact embeddings (Banach, Kreın spaces)

▶ Work well in inductive settings
▶ Allow for support vector like effects
▶ Generalization guarantees well studied
▶ Embeddings are isometric or “isosimilar”
▶ Too much power though ([von Luxburg and Bousquet, 2004], [Ong et al., 2004])

Landmarking approaches

▶ Work well in inductive settings
▶ Don't allow support vector like effects (one has to keep all the landmarks)
▶ Generalization guarantees are available
▶ But how does one find a “good” kernel?

Open questions

Choosing the kernel: still requires one to attend Hogwarts

Existing approaches to learning kernels are pathetic
[Balcan et al., 2008c] proposes learning with multiple similarity functions
Need testable definitions of goodness of kernels

Application of indefinite kernels to other tasks

▶ clustering [Balcan et al., 2008d]
▶ principal components
▶ multi-class classification [Balcan and Blum, 2006]

Analysis of the feature maps induced by embeddings into Banach, Kreın spaces [Balcan et al., 2006]

Bibliography I

Balcan, M.-F. and Blum, A. (2006). On a Theory of Learning with Similarity Functions. In International Conference on Machine Learning, pages 73–80.

Balcan, M.-F., Blum, A., and Srebro, N. (2008a). A Theory of Learning with Similarity Functions. Machine Learning, 71(1-2):89–112.

Balcan, M.-F., Blum, A., and Srebro, N. (2008b). Improved Guarantees for Learning via Similarity Functions. In Annual Conference on Computational Learning Theory, pages 287–298.

Bibliography II

Balcan, M.-F., Blum, A., and Srebro, N. (2008c). Learning with Multiple Similarity Functions. Workshop on Kernel Learning: Automatic Selection of Optimal Kernels, Advances in Neural Information Processing Systems.

Balcan, M.-F., Blum, A., and Vempala, S. (2006). Kernels as Features: On Kernels, Margins, and Low-dimensional Mappings. Machine Learning, 65(1):79–94.

Balcan, M.-F., Blum, A., and Vempala, S. (2008d). A Discriminative Framework for Clustering via Similarity Functions. In ACM Annual Symposium on Theory of Computing, pages 671–680.

Bibliography III

Boughorbel, S., Tarel, J.-P., and Boujemaa, N. (2005). Conditionally Positive Definite Kernels for SVM Based Image Recognition. In IEEE International Conference on Multimedia & Expo, pages 113–116.

Chen, Y., Garcia, E. K., Gupta, M. R., Rahimi, A., and Cazzanti, L. (2009). Similarity-based Classification: Concepts and Algorithms. Journal of Machine Learning Research, 10:747–776.

Der, R. and Lee, D. (2007). Large Margin Classification in Banach Spaces. In International Conference on Artificial Intelligence and Statistics, volume 2 of JMLR Workshop and Conference Proceedings, pages 91–98.

Bibliography IV

Duda, R. O., Hart, P. E., and Stork, D. G. (2000). Pattern Classification. Wiley-Interscience, second edition.

Duin, R. P. W. and Pękalska, E. (2008). On Refining Dissimilarity Matrices for an Improved NN Learning. In International Conference on Pattern Recognition, pages 1–4.

Goldfarb, L. (1984). A Unified Approach to Pattern Recognition. Pattern Recognition, 17(5):575–582.

Gottlieb, L.-A., Kontorovich, A. L., and Krauthgamer, R. (2010). Efficient Classification for Metric Data. In Annual Conference on Computational Learning Theory.

Bibliography V

Graepel, T., Herbrich, R., Bollmann-Sdorra, P., and Obermayer, K. (1998). Classification on Pairwise Proximity Data. In Advances in Neural Information Processing Systems, pages 438–444.

Graepel, T., Herbrich, R., Schölkopf, B., Smola, A., Bartlett, P., Müller, K.-R., Obermayer, K., and Williamson, R. (1999). Classification of Proximity Data with LP Machines. In Ninth International Conference on Artificial Neural Networks, pages 304–309.

Haasdonk, B. (2005). Feature Space Interpretation of SVMs with Indefinite Kernels. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(4):482–492.

Bibliography VI

Haasdonk, B. and Bahlmann, C. (2004). Learning with Distance Substitution Kernels. In Annual Symposium of Deutsche Arbeitsgemeinschaft für Mustererkennung, pages 220–227.

Harol, A., Pękalska, E., Verzakov, S., and Duin, R. P. W. (2006). Augmented Embedding of Dissimilarity Data into (Pseudo-)Euclidean Spaces. In IAPR Workshop on Structural, Syntactic, and Statistical Pattern Recognition, pages 613–621.

Hein, M., Bousquet, O., and Schölkopf, B. (2005). Maximal Margin Classification for Metric Spaces. Journal of Computer and System Sciences, 71(3):333–359.

Bibliography VII

Jacobs, D. W., Weinshall, D., and Gdalyahu, Y. (2000). Classification with Nonmetric Distances: Image Retrieval and Class Representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(6):583–600.

Kearns, M. and Vazirani, U. (1997). An Introduction to Computational Learning Theory. The MIT Press.

Luss, R. and d'Aspremont, A. (2007). Support Vector Machine Classification with Indefinite Kernels. In Advances in Neural Information Processing Systems.

Bibliography VIII

Mendelson, S. (2003). A Few Notes on Statistical Learning Theory. In Mendelson, S. and Smola, A. J., editors, Advanced Lectures on Machine Learning, Machine Learning Summer School, volume 2600 of Lecture Notes in Computer Science. Springer.

Mierswa, I. (2006). Making Indefinite Kernel Learning Practical. Technical report, Collaborative Research Center 475, University of Dortmund.

Ong, C. S., Mary, X., Canu, S., and Smola, A. J. (2004). Learning with Non-Positive Kernels. In International Conference on Machine Learning.

Bibliography IX

Pękalska, E. and Duin, R. P. W. (2000). Classifiers for Dissimilarity-Based Pattern Recognition. In International Conference on Pattern Recognition, pages 2012–2016.

Pękalska, E. and Duin, R. P. W. (2001). On Combining Dissimilarity Representations. In Multiple Classifier Systems, pages 359–368.

Pękalska, E. and Duin, R. P. W. (2002). Dissimilarity Representations Allow for Building Good Classifiers. Pattern Recognition Letters, 23(8):943–956.

Bibliography X

Pękalska, E., Paclík, P., and Duin, R. P. W. (2001). A Generalized Kernel Approach to Dissimilarity-based Classification. Journal of Machine Learning Research, 2:175–211.

Schölkopf, B. (2000). The Kernel Trick for Distances. In Advances in Neural Information Processing Systems, pages 301–307.

Schölkopf, B. and Smola, A. J. (2001). Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. The MIT Press, first edition.

Bibliography XI

Srebro, N. (2007). How Good Is a Kernel When Used as a Similarity Measure? In Annual Conference on Computational Learning Theory, pages 323–335.

von Luxburg, U. and Bousquet, O. (2004). Distance-Based Classification with Lipschitz Functions. Journal of Machine Learning Research, 5:669–695.

Wang, L., Yang, C., and Feng, J. (2007). On Learning with Dissimilarity Functions. In International Conference on Machine Learning, pages 991–998.

Bibliography XII

Weinberger, K. Q. and Saul, L. K. (2009). Distance Metric Learning for Large Margin Nearest Neighbor Classification. Journal of Machine Learning Research, 10:207–244.

Weinshall, D., Jacobs, D. W., and Gdalyahu, Y. (1998). Classification in Non-Metric Spaces. In Advances in Neural Information Processing Systems, pages 838–846.

Zhang, H., Xu, Y., and Zhang, J. (2009). Reproducing Kernel Banach Spaces for Machine Learning. Journal of Machine Learning Research, 10:2741–2775.
