Transcript of A Hybrid Method for Distance Metric Learning · 2011-09-28

A Hybrid Method for Distance Metric Learning

Yi-hao Kao, Benjamin Van Roy, Daniel Rubin, Jiajing Xu, Jessica Faruque, and Sandy Napel

Stanford University

1 Introduction

We consider the problem of learning a measure of distance among feature vectors, and propose a hybrid method that simultaneously learns from similarity ratings and class labels.

Application: information retrieval. Given a new object o as input, the retrieval system returns a list of similar objects o(1), …, o(N) drawn from a database of objects.

2 Problem Formulation

Data:

Pair-wise similarity ratings: a set S of quintuplets (o, o', x, x', σ)

• o, o' : object identifiers

• x, x' ∈ R^K : features of each object

• σ ∈ {1, 2, 3} : dissimilar / neutral / similar

Class labels: a set G of triplets (o, x, c)

• c ∈ {1, 2, …, M} : class

Distance Metric:

d_r^2(x, x') = \sum_{k=1}^{K} r_k (x_k - x'_k)^2

Main Question: how do we learn the coefficients r?
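As a concrete illustration (ours, not the poster's), the Python sketch below implements the weighted squared distance and the retrieval ranking it induces; numpy, the function names, and the toy data are assumptions made for the example.

import numpy as np

def weighted_sq_distance(r, x, xp):
    """d_r^2(x, x') = sum_k r_k * (x_k - x'_k)^2, with nonnegative weights r."""
    return float(np.sum(r * (x - xp) ** 2))

def rank_by_distance(r, query, database, n):
    """Indices of the n database rows closest to the query under d_r^2."""
    dists = np.sum(r * (database - query) ** 2, axis=1)  # d_r^2 to every object
    return np.argsort(dists)[:n]

rng = np.random.default_rng(0)
db = rng.normal(size=(100, 4))                        # 100 objects, K = 4 features
print(rank_by_distance(np.ones(4), db[0], db, n=5))   # object 0 ranks itself first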


3 Conventional Algorithms

Ordinal Regression: We assume

P(\sigma \ge v \mid x, x') = \frac{1}{1 + \exp\left( d_r^2(x, x') - \gamma_v \right)}, \quad v \in \{2, 3\},

where v is the level of similarity, and solve

\max_{r, \gamma} \; \sum_{(x, x', \sigma) \in S} \log P(\sigma \mid x, x') \quad \text{s.t.} \quad r \ge 0, \; \gamma_2 \ge \gamma_3.
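A minimal sketch of this likelihood, assuming the cumulative form above with two thresholds that we name gamma2 >= gamma3; only the model itself comes from the poster, the code organization is ours.

import numpy as np

def level_probs(r, gamma2, gamma3, x, xp):
    """[P(sigma=1), P(sigma=2), P(sigma=3)] under the cumulative model above."""
    d = np.sum(r * (x - xp) ** 2)                # d_r^2(x, x')
    p_ge2 = 1.0 / (1.0 + np.exp(d - gamma2))     # P(sigma >= 2)
    p_ge3 = 1.0 / (1.0 + np.exp(d - gamma3))     # P(sigma >= 3); gamma3 <= gamma2
    return np.array([1.0 - p_ge2, p_ge2 - p_ge3, p_ge3])

def log_likelihood(r, gamma2, gamma3, S):
    """Objective maximized over r >= 0; S holds (x, x', sigma) triples."""
    return sum(np.log(level_probs(r, gamma2, gamma3, x, xp)[s - 1])
               for x, xp, s in S)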

Convex Optimization: We solve

\min_{r} \sum_{(x, x', 3) \in S} d_r^2(x, x') \quad \text{s.t.} \quad \sum_{(x, x', 1) \in S} d_r^2(x, x') \ge 1, \quad r \ge 0.
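Since d_r^2(x, x') is linear in the weights r, this program is in fact a linear program. The sketch below solves it with scipy.optimize.linprog; the helper name and the data layout are our assumptions.

import numpy as np
from scipy.optimize import linprog

def solve_co(similar_diffs, dissimilar_diffs):
    """Each argument: array with one row of (x - x')**2 per rated pair."""
    c = np.sum(similar_diffs, axis=0)       # pull similar pairs together
    a = np.sum(dissimilar_diffs, axis=0)    # keep dissimilar pairs spread out
    res = linprog(c, A_ub=-a[None, :], b_ub=[-1.0], bounds=(0, None))
    return res.x                            # learned weights r >= 0

Each row of the inputs is the vector of squared coordinate differences (x_k - x'_k)^2 for one rated pair, so c·r and a·r are exactly the two sums in the program above.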

Neighborhood Component Analysis: We assume a feature x† is assigned class label c† with probability

P(c^\dagger \mid x^\dagger, G) = \frac{\sum_{(x, c^\dagger) \in G} \exp\left( -d_r^2(x^\dagger, x) \right)}{\sum_{(x', c') \in G} \exp\left( -d_r^2(x^\dagger, x') \right)}

and solve

\max_{r} \sum_{(o, x, c) \in G} \log P(c \mid x, G \setminus (o, x, c)) \quad \text{s.t.} \quad r \ge 0.
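A sketch of the NCA-style class probability and the leave-one-out objective; classes are indexed 0, …, M−1 here rather than 1, …, M, and the function names are illustrative.

import numpy as np

def nca_class_probs(r, x_query, X, y, M, leave_out=None):
    """P(c | x_query, G): class-c mass of exp(-d_r^2) around the query point."""
    w = np.exp(-np.sum(r * (X - x_query) ** 2, axis=1))
    if leave_out is not None:
        w[leave_out] = 0.0                  # evaluate against G \ (o, x, c)
    mass = np.array([w[y == c].sum() for c in range(M)])
    return mass / mass.sum()

def loo_objective(r, X, y, M):
    """Leave-one-out log-likelihood maximized by NCA over r >= 0."""
    return sum(np.log(nca_class_probs(r, X[i], X, y, M, leave_out=i)[y[i]])
               for i in range(len(X)))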




4 A Hybrid Method

Assumptions:

The observed features may not express all relevant information. In other words, there are “missing features.”

Given objects o, o' with observed feature vectors x, x' ∈ R^K and missing feature vectors z, z' ∈ R^J, the underlying distance metric is given by

D^2(o, o') = d_r^2(x, x') + d_{r_\perp}^2(z, z') = \sum_{k=1}^{K} r_k (x_k - x'_k)^2 + \sum_{j=1}^{J} r_{\perp,j} (z_j - z'_j)^2,

where r ∈ R^K and r⊥ ∈ R^J.

x and z are independent conditioned on c.

Given a learning algorithm A that learns conditional class probabilities P(c|x) from class label data, we represent the resulting estimates by a vector u(x) ∈ R^M, defined as

u_m(o) = \hat{P}(m \mid x), \quad m = 1, \ldots, M.

Then we have

E[D^2(o, o') \mid x, x', u(o), u(o')] = d_r^2(x, x') + u(o)^T Q \, u(o'),

where Q ∈ R^{M \times M} is defined as

Q_{c, c'} = E[d_{r_\perp}^2(z, z') \mid c, c'], \quad 1 \le c, c' \le M.

We can plug these results into any learning algorithm B that learns the coefficients of a distance metric from feature differences and similarity ratings.
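A small sketch of the two quantities just defined, assuming u_o and u_op hold algorithm A's class-probability estimates u(o) and u(o'); since Q's entries are expected squared distances, it is symmetric with nonnegative entries.

import numpy as np

def expected_distance(r, Q, x, xp, u_o, u_op):
    """E[D^2(o,o') | x, x', u(o), u(o')] = d_r^2(x, x') + u(o)^T Q u(o')."""
    return float(np.sum(r * (x - xp) ** 2) + u_o @ Q @ u_op)

# Q[c, c'] plays the role of E[d_{r_perp}^2(z, z') | c, c']: the expected
# contribution of the missing features between a class-c and a class-c' object.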


Example: We take A to be a kernel density estimator similar to NCA, and B to be the aforementioned convex optimization:

\min_{r, Q} \sum_{(o, o', x, x', 3) \in S} \left[ d_r^2(x, x') + u(o)^T Q \, u(o') \right]

\text{s.t.} \quad \sum_{(o, o', x, x', 1) \in S} \left[ d_r^2(x, x') + u(o)^T Q \, u(o') \right] \ge 1, \quad r \ge 0, \quad Q \ge 0 \text{ and symmetric.}
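A sketch of this program using cvxpy, a tooling choice of ours rather than the poster's; like the earlier convex optimization it is jointly linear in r and Q, so any LP solver would also do. The (delta, u_o, u_op) data layout is an assumption.

import cvxpy as cp

def solve_hybrid(sim, dis, K, M):
    """sim / dis: lists of (delta, u_o, u_op) with delta = (x - x')**2,
    for pairs rated similar (sigma = 3) and dissimilar (sigma = 1)."""
    r = cp.Variable(K, nonneg=True)
    Q = cp.Variable((M, M), symmetric=True)
    dist = lambda d, u, up: d @ r + u @ Q @ up   # expected distance from Section 4
    objective = cp.Minimize(sum(dist(d, u, up) for d, u, up in sim))
    constraints = [sum(dist(d, u, up) for d, u, up in dis) >= 1, Q >= 0]
    cp.Problem(objective, constraints).solve()
    return r.value, Q.value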

5 Experiments

Synthetic data: We randomly generate 100 datasets and carry out the above algorithms while varying the amount of training data. Figure 1 plots the resulting normalized discounted cumulative gain (NDCG), obtained by normalizing

\mathrm{DCG} = \sum_{p=1}^{10} \frac{2^{\sigma(p)} - 1}{\log_2(1 + p)}

by the DCG of an ideal ordering, where σ(p) is the similarity rating of the p-th retrieved object.

Figure 1. The average NDCG delivered by OR, CO, NCA, and HYB, over different sizes of rating data set. Here K = 60 and M = 3.
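A sketch of this metric in code; the toy ratings at the end are illustrative.

import numpy as np

def dcg(ratings):
    """DCG of a retrieved list's similarity ratings sigma(p), best first, top 10."""
    g = np.asarray(ratings[:10], dtype=float)
    ranks = np.arange(1, len(g) + 1)
    return float(np.sum((2.0 ** g - 1.0) / np.log2(1.0 + ranks)))

def ndcg(ratings):
    """Normalize by the DCG of the ideal reordering of the same ratings."""
    return dcg(ratings) / dcg(sorted(ratings, reverse=True))

print(ndcg([3, 1, 2, 3, 1]))   # ratings in {1, 2, 3}: dissimilar / neutral / similar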

Real data: Our real data set consists of 30 CT scans of liver lesions. Figure 2(a) gives some sample images, and Figure 2(b) plots the NDCG delivered by each algorithm.

Figure 2. (a) Sample images. (b) The average NDCG delivered by OR, CO, NCA, and HYB.

References

• Bar-Hillel, A., Hertz, T., Shental, N., and Weinshall, D. Learning distance functions using equivalence relations. ICML, 2003.

• Cox, T. and Cox, M. A. A. Multidimensional Scaling. Chapman & Hall/CRC, 2000.

• Frome, A., Singer, Y., and Malik, J. Image retrieval and classification using local distance functions. NIPS, 2006.

• Goldberger, J., Roweis, S., Hinton, G., and Salakhutdinov, R. Neighbourhood components analysis. NIPS, 2005.

• Herbrich, R., Graepel, T., and Obermayer, K. Large margin rank boundaries for ordinal regression. Advances in Large Margin Classifiers, 2000.

• McCullagh, P. and Nelder, J. A. Generalized Linear Models (second edition). London: Chapman & Hall, 1989.

• Schultz, M. and Joachims, T. Learning a distance metric from relative comparisons. NIPS, 2004.

• Weinberger, K. Q. and Saul, L. K. Distance metric learning for large margin nearest neighbor classification. JMLR, 2009.

• Weinberger, K. Q. and Tesauro, G. Metric learning for kernel regression. AISTATS, 2007.

• Weinberger, K. Q., Blitzer, J., and Saul, L. K. Distance metric learning for large margin nearest neighbor classification. NIPS, 2006.

• Xing, E. P., Ng, A. Y., Jordan, M. I., and Russell, S. Distance metric learning, with application to clustering with side-information. NIPS, 2002.