Modeling Latent Variable Uncertainty for Loss-based Learning
M. Pawan Kumar, Ben Packer, Daphne Koller
http://cvc.centrale-ponts.fr
http://dags.stanford.edu

Aim: Accurate parameter estimation from weakly supervised datasets

Latent Variable Models
x: input
y: output
h: latent variables (LV)
Values of x and y are known during training; values of h are unknown during training.

Object Detection
Example: x is an image, y = "Deer" is its class, and h is the location of the deer.
Objective:
• Predict the image class y
• Predict the object location h

Latent SVM
Linear prediction rule with parameter w
Test: $(y(w), h(w)) = \arg\max_{y,h} w^\top \Psi(x,y,h)$
Train: $\min_w \sum_i \Delta(y_i, y_i(w), h_i(w))$
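A minimal sketch of the latent SVM test rule and training loss above, assuming output and latent spaces small enough to enumerate. The joint feature map `psi` (copying x into a block indexed by (y, h)), the toy sizes, and the helper names `predict`, `zero_one_loss` and `training_loss` are hypothetical illustrations, not the authors' code.

```python
import numpy as np
from itertools import product

# Hypothetical toy sizes: 2 output classes, 3 latent positions, 4-dim inputs.
NUM_Y, NUM_H, DIM = 2, 3, 4

def psi(x, y, h):
    """Hypothetical joint feature vector: copy x into the block indexed by (y, h)."""
    out = np.zeros(NUM_Y * NUM_H * DIM)
    start = (y * NUM_H + h) * DIM
    out[start:start + DIM] = x
    return out

def predict(w, x):
    """Test rule: (y(w), h(w)) = argmax over all (y, h) of w^T psi(x, y, h)."""
    return max(product(range(NUM_Y), range(NUM_H)),
               key=lambda yh: w @ psi(x, *yh))

def zero_one_loss(y_true, y_pred, h_pred):
    """A restricted-form loss Delta(y_i, y_i(w), h_i(w)); this one ignores h."""
    return float(y_true != y_pred)

def training_loss(w, xs, ys, loss=zero_one_loss):
    """Training objective: sum_i Delta(y_i, y_i(w), h_i(w))."""
    return sum(loss(y, *predict(w, x)) for x, y in zip(xs, ys))

x0 = np.array([1.0, -2.0, 0.5, 3.0])
w = psi(x0, 1, 2)          # weights that favour (y, h) = (1, 2) for x0
print(predict(w, x0), training_loss(w, [x0, x0], [1, 0]))
```

The sketch only evaluates the loss; in practice latent SVM minimizes a regularized upper bound on it, since the loss itself is piecewise constant in w.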
The EM Algorithm
$P_\theta(y,h|x) = \frac{\exp(\theta^\top \Psi(x,y,h))}{Z}$, where $Z$ normalizes over all $(y,h)$
Train: $\max_\theta \sum_i \log \sum_{h_i} P_\theta(y_i,h_i|x_i)$
Test: $\max_{y,h} \theta^\top \Psi(x,y,h)$

EM: ✔ Models uncertainty in LV ✖ Does not model accuracy of LV prediction ✖ Does not employ a user-defined loss function
Latent SVM: ✔ Employs a user-defined loss function (with restricted form) ✖ Does not model uncertainty in LV
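A short sketch of the log-linear model behind the EM baseline, assuming finite output and latent spaces and a hypothetical precomputed score table `scores[y, h]` holding $\theta^\top \Psi(x,y,h)$. It shows the normalized joint, the marginal log-likelihood that EM maximizes, the conditional $P_\theta(h|y,x)$ used in the E-step, and the test rule; the M-step (refitting θ) is omitted, and all function names are illustrative.

```python
import numpy as np

def log_normalize(scores):
    """Return scores - log(sum(exp(scores))), computed stably."""
    m = scores.max()
    return scores - (m + np.log(np.exp(scores - m).sum()))

def log_joint(scores):
    """log P_theta(y, h | x) for a table scores[y, h] = theta^T Psi(x, y, h)."""
    return log_normalize(scores)            # dividing by Z normalizes over all (y, h)

def conditional(scores, y):
    """P_theta(h | y, x): distribution over the LV given the known output y."""
    return np.exp(log_normalize(scores[y]))  # Z cancels; only row y matters

def marginal_log_likelihood(score_tables, labels):
    """EM training objective: sum_i log sum_{h_i} P_theta(y_i, h_i | x_i)."""
    total = 0.0
    for scores, y in zip(score_tables, labels):
        lj = log_joint(scores)[y]            # log P_theta(y_i, h | x_i) for every h
        m = lj.max()
        total += m + np.log(np.exp(lj - m).sum())
    return total

def predict(scores):
    """Test rule: argmax over (y, h) of theta^T Psi(x, y, h)."""
    y, h = np.unravel_index(np.argmax(scores), scores.shape)
    return int(y), int(h)

scores = np.array([[0.0, 1.0], [2.0, 0.0]])  # hypothetical scores for one input
print(conditional(scores, 0), marginal_log_likelihood([scores], [1]), predict(scores))
```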
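Overview
Two distributions for two tasks:
• $P_\theta(h_i|y_i,x_i)$: conditional distribution over the LV $h_i$; models uncertainty in LV
• $P_w(y_i,h_i|x_i)$: delta distribution over $(y_i,h_i)$, concentrated at the prediction $(y_i(w),h_i(w))$; models the predicted output and LV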
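Minimize Rao's Dissimilarity Coefficient between $P_\theta(h_i|y_i,x_i)$ and $P_w(y_i,h_i|x_i)$:
$$\min_{\theta,w} \sum_i \Big[ \sum_h \Delta(y_i,h,y_i(w),h_i(w))\,P_\theta(h|y_i,x_i) - \beta \sum_{h,h'} \Delta(y_i,h,y_i,h')\,P_\theta(h|y_i,x_i)\,P_\theta(h'|y_i,x_i) \Big]$$
The first term encourages prediction with the correct output and a high-probability LV.

Optimization: Block coordinate descent over (w, θ)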
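The per-example term of the objective above can be evaluated directly when h takes a few discrete values. This sketch assumes $P_\theta(h|y_i,x_i)$ is given as an array `p_theta` and uses a hypothetical loss (1 for a wrong output, otherwise 1 for a wrong LV); summing the function over examples gives the full objective.

```python
import numpy as np

def dissimilarity_objective(p_theta, y, y_pred, h_pred, loss, beta):
    """Per-example objective: expected loss of (y(w), h(w)) minus beta * diversity.

    p_theta[h] = P_theta(h | y, x); (y_pred, h_pred) = (y(w), h(w)).
    """
    hs = range(len(p_theta))
    # First term: loss of the delta distribution's prediction, averaged over P_theta.
    expected = sum(loss(y, h, y_pred, h_pred) * p_theta[h] for h in hs)
    # Second term: self-dissimilarity (diversity) of P_theta.
    diversity = sum(loss(y, h, y, h2) * p_theta[h] * p_theta[h2]
                    for h in hs for h2 in hs)
    return expected - beta * diversity

def loss(y, h, y2, h2):
    """Hypothetical loss: 1 for a wrong output, else 1 for a wrong LV."""
    return 1.0 if y != y2 else float(h != h2)

p = np.array([0.5, 0.5])
print(dissimilarity_objective(p, 0, 0, 0, loss, beta=0.5),   # correct output
      dissimilarity_objective(p, 0, 1, 0, loss, beta=0.5))   # incorrect output
```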
Step 1: Fix the delta distribution; optimize the conditional distribution
  Case I: The delta distribution predicts the correct output, y = y(w): increase the probability of the predicted LV h(w)
  Case II: The delta distribution predicts an incorrect output, y ≠ y(w): increase the diversity of the conditional distribution
Step 2: Fix the conditional distribution; optimize the delta distribution
  Predict the correct output and a high-probability LV
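The two cases of the θ-step can be reproduced on a toy problem. This sketch makes two loud assumptions: $P_\theta(h|y,x)$ is a softmax over free logits `z` (no features, unlike the log-linear model), and the loss is hypothetical (1 for a wrong output, otherwise the normalized distance between LVs). With the delta distribution fixed, plain gradient descent on the objective moves mass onto h(w) in Case I and spreads it out in Case II, where the first term is constant.

```python
import numpy as np

K = 5        # hypothetical number of latent values
BETA = 0.1   # hypothetical diversity weight

def loss(y, h, y2, h2):
    """Hypothetical loss: 1 for a wrong output, else normalized LV distance."""
    return 1.0 if y != y2 else abs(h - h2) / (K - 1)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def pairwise(y):
    """Matrix D[h, h2] = loss(y, h, y, h2) used by the diversity term."""
    return np.array([[loss(y, h, y, h2) for h2 in range(K)] for h in range(K)])

def diversity(p, y=0):
    return p @ pairwise(y) @ p

def theta_step(z, y, y_pred, h_pred, steps=2000, lr=1.0):
    """Minimize sum_h a[h] p[h] - BETA * p^T D p with the delta distribution fixed.

    Assumption: P_theta(h | y, x) = softmax(z) with free logits z.
    """
    a = np.array([loss(y, h, y_pred, h_pred) for h in range(K)])
    D = pairwise(y)
    for _ in range(steps):
        p = softmax(z)
        g = a - 2 * BETA * D @ p        # gradient w.r.t. p (D is symmetric)
        z = z - lr * p * (g - p @ g)    # chain rule through the softmax
    return softmax(z)

y, h_w = 0, 2
# Case I: y == y(w); probability mass moves onto the predicted LV h(w).
p1 = theta_step(np.zeros(K), y, y_pred=0, h_pred=h_w)
# Case II: y != y(w); the first term is constant, so only diversity changes.
z0 = np.array([0.0, 0.0, 3.0, 0.0, 0.0])
p2 = theta_step(z0, y, y_pred=1, h_pred=h_w)
print(np.round(p1, 3), diversity(softmax(z0)), diversity(p2))
```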
Property 1: If the loss function is independent of h, we recover latent SVM.
Property 2: If Pθ is modeled as a delta distribution, we recover iterative latent SVM.
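Both properties can be checked numerically on the per-example objective. The sketch below uses two hypothetical losses, one that ignores h and one that also penalizes a wrong LV; Property 2 assumes, as these losses do, that Δ is zero on identical pairs, so the diversity term of a delta distribution vanishes and only the loss against the imputed LV h* remains.

```python
import numpy as np

def objective(p_theta, y, y_pred, h_pred, loss, beta):
    """Per-example dissimilarity objective with P_theta(h | y, x) = p_theta[h]."""
    hs = range(len(p_theta))
    expected = sum(loss(y, h, y_pred, h_pred) * p_theta[h] for h in hs)
    diversity = sum(loss(y, h, y, h2) * p_theta[h] * p_theta[h2]
                    for h in hs for h2 in hs)
    return expected - beta * diversity

def output_only_loss(y, h, y2, h2):
    """Hypothetical loss that ignores the LV (Property 1)."""
    return float(y != y2)

def mixed_loss(y, h, y2, h2):
    """Hypothetical loss that also penalizes a wrong LV; zero for identical pairs."""
    return float(y != y2) + 0.5 * float(h != h2)

p = np.array([0.2, 0.5, 0.3])   # an arbitrary conditional distribution
beta = 0.7                      # hypothetical weight

# Property 1: the objective collapses to Delta(y, y(w)) for any p and beta.
prop1 = [objective(p, 0, y_pred, 1, output_only_loss, beta) for y_pred in (0, 1)]

# Property 2: a delta P_theta at an imputed LV h* leaves Delta(y, h*, y(w), h(w)).
h_star = 1
delta = np.eye(3)[h_star]
prop2 = objective(delta, 0, 0, 2, mixed_loss, beta)
print(prop1, prop2)
```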
Code available at http://cvc.centrale-ponts.fr/personnel/pawan
Ideally, the two learned distributions should match exactly; limited representational power prevents an exact match.
Difference-of-convex upper bound on the expected loss, minimized with an efficient concave-convex procedure (CCCP) similar to that used for latent SVM.
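To illustrate the concave-convex procedure itself (not the poster's specific bound), here is a sketch on a hypothetical one-dimensional difference-of-convex function $f(x) = u(x) - v(x)$ with $u(x) = x^4$ and $v(x) = 2x^2$. Each step linearizes the convex part v at the current point, which yields a convex upper bound on f, and minimizes that bound in closed form; the objective therefore never increases.

```python
import math

def f(x):
    """Toy difference-of-convex objective u(x) - v(x), u = x**4, v = 2 * x**2."""
    return x**4 - 2 * x**2

def cccp_step(x_t):
    """Minimize the convex bound u(x) - [v(x_t) + v'(x_t) * (x - x_t)].

    Setting 4 * x**3 - 4 * x_t to zero gives x = cube_root(x_t).
    """
    return math.copysign(abs(x_t) ** (1.0 / 3.0), x_t)

def cccp(x0, iters=50):
    xs = [x0]
    for _ in range(iters):
        xs.append(cccp_step(xs[-1]))
    return xs

trajectory = cccp(0.3)
values = [f(x) for x in trajectory]
print(trajectory[-1], values[0], values[-1])
```

Starting from x = 0.3, the iterates climb towards the local minimum at x = 1, and each objective value is no larger than the one before.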
Results: bar charts of average test loss for folds 1 to 5, latent SVM (LSVM) vs. our method (Our)
Object Detection: Average 0/1 Test Loss (statistically significant); Average Overlap Test Loss (not statistically significant)
Action Detection: Average 0/1 Test Loss (statistically significant); Average Overlap Test Loss (statistically significant)
Object detection: HOG features; no object scale variation; latent space = all possible pixel positions
Action detection: Poselet features; large object scale variation; latent space = top k person detections
Known ground-truth LV values at test time
Ψ: joint feature vector. Δ: loss function; measures risk.