Hybrid Discontinuous Galerkin Methods for Fluid Dynamics ...
A Hybrid Method for Distance Metric Learning · 2011-09-28 · A Hybrid Method for Distance Metric...
Transcript of A Hybrid Method for Distance Metric Learning · 2011-09-28 · A Hybrid Method for Distance Metric...
Data:
Pair-wise similarity rating: a set S of quintuplets (o, o’, x, x’, σ)
• o, o’ : object identifier
• x, x’ ∈ RK : features of each object
• σ ∈ { 1, 2, 3} : dissimilar / neutral / similar
Class labels: a set G of triplets (o, x, c)
• c ∈ { 1, 2, …, M} : class
Distance Metric:
Main Question: how to learn coefficients r ?
A Hybrid Method for Distance Metric Learning
Yi-hao Kao, Benjamin Van Roy, Daniel Rubin, Jiajing Xu, Jessica Faruque, and Sandy Napel
Stanford University
Ordinal Regression: We assume
where v is the level of similarity, and solve
Convex Optimization: We solve
Neighborhood Component Analysis: We assume a feature
x† is assigned class label c† with probability
and solve
• Bar-Hillel, A., Hertz, T., Shental, N., and Weinshall, D. Learning distance functions using
equivalence relations. ICML. 2003.
• Cox, T. and Cox, M. A. A. Multidimensional Scaling. Chapman & Hall/CRC, 2000.
• Frome, A., Singer, Y., and Malik, J. Image retrieval and classification using local distance
functions. NIPS. 2006.
• Goldberger, J., Roweis, S., Hinton, G., and Salakhutdinov, R. Neighbourhood components
analysis. NIPS. 2005.
•Herbrich, R., Graepel, T., and Obermayer, K. Large margin rank boundaries for ordinal
regression. Advances in Large Margin Classifiers. 2000.
• McCullagh, P. and Nelder, J. A. Generalized linear models (Second edition). London:
Chapman & Hall,1989.
• Schultz, M. and Joachims, T. Learning a distance metric from relative comparisons. NIPS.
2004.
• Weinberger, K. Q. and Saul, L. K. Distance metric learning for large margin nearest
neighbor classification. JMLR. 2009.
• Weinberger, K. Q. and Tesauro, G. Metric learning for kernel regression. AISTATS. 2007.
• Weinberger, K. Q., Blitzer, J., and Saul, L. K. Distance metric learning for large margin
nearest neighbor classification. NIPS. 2006.
• Xing, E. P., Ng, A. Y., Jordan, M. I., and Russell, S. Distance metric learning, with application
to clustering with side-information. NIPS. 2002.
1 Introduction
2 Problem Formulation
3 Conventional Algorithms
References
K
k
kkkr xxrxxd1
2)'()',(
Retrieval system
Input:
new object o
Output:
a list of similar objects
o(1), …, o(N)
))',(exp(1
1)',|(
2
vr xxdxxvP
21
),',(,
0 s.t.
)',|(logmax
r
xxPSxx
r
0
1)',( s.t.
)',(min
)1,',(
)3,',(
2
r
xxd
xxd
Sxx
r
Sxx
rr
Gcx
r
Gccx
r
xxd
xxd
GxcP
)','(
†2
),(
†2
††
))',(exp(
)),(exp(
),|(†
Gcx
rcxGxcP
),(0
)),(\,|(logmax
Assumptions:
The observed features may not express all relevant information. In
other words, there are “missing features.”
Given objects o, o’ with observed feature vectors x, x’ ∈ RK and
missing feature vectors z, z’ ∈ RJ, the underlying distance metric is
given by
where r ∈ RK and r┴ ∈ RJ.
x and z are independent conditioned on c.
Given a learning algorithm A that learns conditional class
probabilities P(c|x) from class label data, we represent the resulting
estimates by a vector u(x) ∈ RM , defined as
Then we have
where Q ∈ RM×M is defined as
We can plug the above results into any learning algorithm B that
learns the coefficients of a distance metric from feature differences and
similarity ratings.
4 A Hybrid Method
21
22
2
1
1
2
1
2
)',()',(
)'()'()',(
zzdxxd
zzrxxrooD
rr
J
j
jjj
K
k
kkk
)|(ˆ)( xmPoum
)'()()',(
)]'(),(,',|)',([E
2
2
oQuouxxd
ououxxooD
T
r
McccczzdQrcc ',1 ],',|)',([E 2
',
Example: We take A to be a kernel density estimator similar
to NCA, and B to be the aforementioned convex optimization:
symmetric. and 0
0
1)'()()',( s.t.
)'()()',(min
)1,',,',(
2
)3,',,',(
2
Q
r
oQuouxxd
oQuouxxd
Sxxoo
T
T
Sxxoo
rr
r
Synthetic data: We randomly generate 100 datasets and carry
out the above algorithms while varying the amount of training data.
Figure 1 plots the resulting normalized discounted cumulative
gains, defined as
Figure 1. The average NDCG delivered by OR, CO, NCA, and
HYB, over different sizes of rating data set. Here K=60 and M=3.
Real data: Our real data set consists of 30 CT scans of liver
lesion. Figure 2(a) gives some sample images. Figure 2(b) plots
the NDCG delivered by each algorithm.
(a) (b)
Figure 2. (a) Sample images (b) The average NDCG delivered by
OR, CO, NCA, and HYB.
5 Experiments
Objects
database
We consider the problem of learning a measure of distance
among feature vectors, and propose a hybrid method that
simultaneously learns from similarity ratings and class labels.
Application: information retrieval
10
1 2
10)1(log
12DCG
)(
p p
p