Test: max_{y,h} w^T Ψ(x,y,h)




Transcript of Test: max_{y,h} w^T Ψ(x,y,h)

Page 1: Test: max_{y,h} w^T Ψ(x,y,h)

Test: max_{y,h} w^T Ψ(x,y,h)

Modeling Latent Variable Uncertainty for Loss-based Learning
M. Pawan Kumar   Ben Packer   Daphne Koller
http://cvc.centrale-ponts.fr   http://dags.stanford.edu

Aim: Accurate parameter estimation from weakly supervised datasets

Latent Variable Models




x: input; y: output; h: latent variables (LV)

[Figure: input image x with latent object location h; output y = "Deer"]

Objective

x, y: values known during training
h: values unknown during training

• Predict the image class y
• Predict the object location h

Object Detection

Latent SVM: linear prediction rule with parameter w

Train: min_w Σ_i Δ(y_i, y_i(w), h_i(w))
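A minimal sketch of the latent SVM test rule max_{y,h} w^T Ψ(x,y,h), assuming a finite label set Y, a finite latent space H, and a user-supplied joint feature map psi(x, y, h) returning a NumPy vector (these names are assumptions, not from the poster):

```python
from itertools import product

def predict(w, x, Y, H, psi):
    """Latent SVM test rule: argmax over (y, h) of the score w . psi(x, y, h)."""
    scores = {(y, h): float(w @ psi(x, y, h)) for y, h in product(Y, H)}
    return max(scores, key=scores.get)
```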

Train: max_θ Σ_i log Σ_{h_i} P_θ(y_i, h_i | x_i)
Test: max_{y,h} θ^T Ψ(x,y,h)

The EM Algorithm: P_θ(y,h|x) = exp(θ^T Ψ(x,y,h)) / Z
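A minimal sketch of this log-linear model as a softmax over all (y, h) pairs, under the same assumed setup (finite Y and H, feature map psi). The conditional P_θ(h | y, x) used later is the same computation with y held fixed.

```python
from itertools import product

import numpy as np

def p_theta(theta, x, Y, H, psi):
    """P_theta(y, h | x) = exp(theta . psi(x, y, h)) / Z, as a dict over (y, h)."""
    pairs = list(product(Y, H))
    scores = np.array([theta @ psi(x, y, h) for y, h in pairs])
    scores -= scores.max()        # subtract the max for numerical stability
    probs = np.exp(scores)
    probs /= probs.sum()          # division by the partition function Z
    return dict(zip(pairs, probs))
```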

The EM algorithm:
✔ Models uncertainty in LV
✖ Does not model accuracy of LV prediction
✖ Does not employ a user-defined loss function

Latent SVM:
✔ Employs a user-defined loss function (with restricted form)
✖ Does not model uncertainty in LV

Overview

Two distributions for two tasks:
• P_θ(h_i | y_i, x_i): models uncertainty in LV
• P_w(y_i, h_i | x_i): delta distribution at the prediction (y_i(w), h_i(w)); models the predicted output and LV

Optimization: block coordinate descent over (w, θ)

Minimize Rao's Dissimilarity Coefficient between P_θ(h_i | y_i, x_i) and P_w(y_i, h_i | x_i):

min_{θ,w} Σ_i [ Σ_h Δ(y_i, h, y_i(w), h_i(w)) P_θ(h | y_i, x_i) − β Σ_{h,h'} Δ(y_i, h, y_i, h') P_θ(h | y_i, x_i) P_θ(h' | y_i, x_i) ]

Encourages prediction with the correct output and a high-probability LV
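A minimal sketch of this per-example objective. The inputs are assumptions, not poster names: delta(y, h, y2, h2) implements the loss Δ, p is a dict with p[h] = P_θ(h | y_i, x_i), (y_w, h_w) is the current prediction (y_i(w), h_i(w)), and beta is the diversity weight.

```python
def rao_dissimilarity(delta, p, y_i, y_w, h_w, beta):
    """One example's term: expected loss w.r.t. the prediction, minus
    beta times the self-dissimilarity (diversity) of the conditional."""
    H = list(p)
    cross = sum(delta(y_i, h, y_w, h_w) * p[h] for h in H)
    diversity = sum(delta(y_i, h, y_i, h2) * p[h] * p[h2] for h in H for h2 in H)
    return cross - beta * diversity
```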

Fix delta distribution; optimize conditional distribution (both steps are sketched below):
• Case I: the delta distribution predicts the correct output, y = y(w) → increase the probability of the predicted LV h(w)
• Case II: the delta distribution predicts an incorrect output, y ≠ y(w) → increase the diversity of the conditional distribution

Fix conditional distribution; optimize delta distribution:
• Predict the correct output and a high-probability LV
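A minimal sketch of the alternating scheme; update_theta and update_w are hypothetical stand-ins for the two steps above, not the paper's concrete solvers.

```python
def block_coordinate_descent(w, theta, update_theta, update_w, n_iters=20):
    """Alternate the two updates described above for a fixed budget."""
    for _ in range(n_iters):
        theta = update_theta(w, theta)  # fix the delta dist. P_w, optimize P_theta
        w = update_w(w, theta)          # fix the conditional P_theta, optimize P_w
    return w, theta
```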


Property 1: If the loss function is independent of h, we recover latent SVM

Property 2: If P_θ is modeled as a delta distribution, we recover iterative latent SVM

Code available at http://cvc.centrale-ponts.fr/personnel/pawan

Ideally, the two learned distributions should match exactly; limited representational power prevents an exact match

Difference-of-convex upper bound on the expected loss; efficient concave-convex procedure similar to latent SVM
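A minimal CCCP sketch on a toy problem (not the paper's actual program): minimize f(w) = u(w) − v(w), with u and v both convex, by repeatedly linearizing v at the current iterate and minimizing the resulting convex upper bound.

```python
import numpy as np

def cccp(grad_u, grad_v, w0, outer=25, inner=200, lr=0.05):
    """Concave-convex procedure for f = u - v via tangent upper bounds."""
    w = np.asarray(w0, dtype=float)
    for _ in range(outer):
        g = grad_v(w)                       # linearize the concave part -v at w
        for _ in range(inner):
            w = w - lr * (grad_u(w) - g)    # minimize the convex bound u(w) - g.w
    return w

# Toy instance: f(w) = w**4 - 2*w**2, whose minima are at w = +/-1.
print(cccp(grad_u=lambda w: 4 * w**3, grad_v=lambda w: 4 * w, w0=[0.3]))
# -> approximately [1.]
```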

Results

[Bar charts: average 0/1 test loss and average overlap test loss across Folds 1–5, comparing LSVM and our method, for object detection and action detection]

Object Detection: 0/1 loss statistically significant; overlap loss not statistically significant
Action Detection: 0/1 loss statistically significant; overlap loss statistically significant

Object detection: HOG features; no object scale variation; latent space = all possible pixel positions
Action detection: Poselet features; large object scale variation; latent space = top k person detections

Known ground-truth LV values at test time

Ψ: joint feature vector; Δ: loss function (measures risk)
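Hypothetical illustrations of these two ingredients for detection (the paper's exact choices may differ); feat and iou are assumed helper functions:

```python
import numpy as np

def psi(x, y, h, n_classes, feat):
    """Joint feature vector: patch features feat(x, h) placed in class y's block."""
    f = feat(x, h)
    out = np.zeros(n_classes * f.size)
    out[y * f.size:(y + 1) * f.size] = f
    return out

def delta(y, h, y_pred, h_pred, iou):
    """0/1 loss if the class is wrong; otherwise 1 - IoU of the latent locations."""
    return 1.0 if y != y_pred else 1.0 - iou(h, h_pred)
```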