
Context Aware Spatial Priors using Entity Relations (CASPER)

Geremy Heitz, Jonathan Laserson

Daphne Koller

December 10th, 2007 – DAGS

Outline

Goal – Scene Understanding
Existing Methods
CASPER
Preliminary Experiments
Future Direction – Going Discriminative

[Figure: street scene with labeled objects – buildings, a tree, and several cars]

Representation

[Figure: the street scene with its representation – class labels (building, tree, car) and object centroids]

l = bag of object categories

ρ = location of centroids

We model P(ρ, l)

Why? Because we use a generative model: P(ρ, l | I) ~ P(ρ, l) P(I | ρ, l)

I = the Image

[Figure: two alternative labelings of the same street scene, side by side – one with sensible object placements, one with an implausible arrangement]

Which one makes more sense?

Does Context matter?

Can it help Object Recognition?

LOOPS

Outline

Goal – Scene Understanding
Existing Methods
CASPER
Preliminary Experiments
Future Direction – Going Discriminative

Fixed Order Model

Each image has the same bag of objects (example: 1 car, 2 buildings, 1 tree)

Object centroids are drawn jointly: P(ρ, l) = 𝟙{l = l_fixed_order} P(ρ | l)
Similar to constellation models (Fergus)
Problem:

We don't always know the exact set of objects

TDP (Sudderth, 2005)

Each image has a different bag of objects
Object centroids are drawn independently: P(ρ, l) = P(l) Π P(ρi | li)
Problems:

This doesn't take pairwise constraints into account

We have lost context

Outline

Goal – Scene Understanding
Existing Methods
CASPER
Preliminary Experiments
Future Direction – Going Discriminative

CASPER

Each image has a different bag of objects
Object centroids are drawn jointly given l: P(ρ, l) = P(l) P(ρ | l)
Questions:

How do we represent P(l)?
How do we represent P(ρ | l)?
How do we learn? How do we infer?

P(l)

Dirichlet Process – we don’t want to get into that now

Other options: Multinomial, Uniform

P(ρ | l) - Desiderata

Correlations between ρ's
Sharing of parameters between l's
Intuitive parameterization
Continuous multivariate distribution
Easy to learn parameters
Easy to evaluate likelihood
Easy to condition

Gaussian?

MV Gaussian - Options

Learn a different Gaussian for every l
Can't share parameters
Large number (∞) of l's

Gaussian Process: ρ(x) ~ GP(μ(x), K(x, x'))

Every finite set of x's produces a Gaussian ρ: [ρ(x1) ρ(x2) … ρ(xk)] ~ Gaussian

xt is a hidden function of the class lt
μ(xt) = A xt
K(xt, xt') = c exp(−||B(xt − xt')||²)

Two objects of the same class -> same x? Is correlation the natural space?
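A minimal numeric sketch of this GP view, with invented values for c, B, and the hidden inputs xt: any finite set of instances induces a joint Gaussian over their centroids.

```python
import numpy as np

def gp_cov(X, B, c=1.0):
    """K[s, t] = c * exp(-||B (x_s - x_t)||^2) for rows x_s, x_t of X."""
    diffs = X[:, None, :] - X[None, :, :]      # (T, T, d) pairwise differences
    proj = np.einsum('ij,stj->sti', B, diffs)  # apply B to each difference
    return c * np.exp(-np.sum(proj ** 2, axis=-1))

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 2))    # hidden inputs for 4 object instances (made up)
K = gp_cov(X, B=np.eye(2))     # covariance over the 4 centroids
# Jitter on the diagonal keeps the sample numerically well-conditioned.
rho = rng.multivariate_normal(np.zeros(4), K + 1e-8 * np.eye(4))
```

Two instances with similar hidden inputs get strongly correlated centroid coordinates, which is exactly the parameter sharing the slide asks about.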


Spatial Distribution - Options

“Singleton Expert” P(ρi | li): Gaussian over the absolute object location

“Pairwise Expert” P(ρi − ρj | li, lj): Gaussian offset between objects. The expert can be one of K mixture components.

[Figure: pairwise offset experts between a tree and several cars, with mixture components k = 1 and k = 2]
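A small sketch of the two expert types, with made-up means and covariances; the pairwise expert here picks the best of K = 2 offset components (e.g. "left of" vs "right of"):

```python
import numpy as np

def gauss_logpdf(x, mu, cov):
    d = x - mu
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d @ np.linalg.solve(cov, d) + logdet + len(x) * np.log(2 * np.pi))

# "Singleton expert": Gaussian over an object's absolute centroid.
def singleton_logp(rho_i, mu, cov):
    return gauss_logpdf(rho_i, mu, cov)

# "Pairwise expert": Gaussian over the offset rho_i - rho_j; the best of
# K mixture components scores the pair (components are invented here).
def pairwise_logp(rho_i, rho_j, components):
    offset = rho_i - rho_j
    return max(gauss_logpdf(offset, mu, cov) for mu, cov in components)

components = [(np.array([-3.0, 0.0]), np.eye(2)),  # component k = 1
              (np.array([+3.0, 0.0]), np.eye(2))]  # component k = 2
lp = pairwise_logp(np.array([1.0, 0.0]), np.array([4.0, 0.0]), components)
```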

CASPER P(ρ | l)

How to use the experts? Introduce an auxiliary variable d: P(ρ | d, l). d tells us which experts are ‘on’.

[Figure: the labeled street scene, with edges between object instances]

For each edge e = (li, lj), de indexes all possible experts for this edge

Default is a uniform expert

P(ρ|d,l) ~ POEd

POEd = Π P(ρi | li) Π P(ρi − ρj | dij, li, lj)

Product of Gaussians is a Gaussian

CASPER P(ρ|d,l)

POEd = Zd N(ρ; μd, Σd)
P(ρ | d, l) = N(ρ; μd, Σd) = (1/Zd) POEd
P(d | l) ~ Zd (Multinomial)
P(ρ, d | l) ~ POEd

[Figure: two configurations of three cars, with different expert assignments d1 and d2]

Example: P(ρ, d | l) ~ P(ρ2 − ρ1 | d12) P(ρ3 − ρ2 | d32)

P(ρ | d1, l) = P(ρ | d2, l), but Zd2 > Zd1, hence POEd2 > POEd1

Learning the Experts

Training set with supervised (ρ,l) pairs (one pair for each image)

Gibbs over the hidden variables de

Loop over edges; update expert sufficient statistics with each update

Does it converge? Not as much as we want it to. Work in progress.

[Figure: the labeled street scene]

Outline

Goal – Scene Understanding
Existing Methods
CASPER
Preliminary Experiments
Future Direction – Going Discriminative

Preliminary Experiments

LabelMe Datasets

STREETS BEDROOMS

[Figure: street image with Harris interest points marked as *]

FEATURES
Harris Interest Operator -> yi
SIFT Descriptor -> wi
Instance membership -> ti

INSTANCES
Centroid -> ρt
Class label -> lt

[Figure: a car instance with centroid ρt; per-feature variables (yi, wi, ti), per-instance variables (ρt, lt)]

Observed: P(I | ρ, l) = P(y, w | ρ, l)

What do the true ρ’s look like?

[Figure: scatter plots of true pairwise offsets for Car -> Car, Lamp -> Lamp, and Bed -> Lamp]

Learning/Inference in Full Model

TDP – Three-stage Gibbs:
Assign features to instances (sample ti for every feature)
Assign expert components (sample de for every edge)
Assign instances to classes (sample lt, ρt for every instance)

Training: supervise the (t, l) variables; Gibbs over d and ρ

Testing: introduce new images; Gibbs over (t, l, d, ρ) of the new images
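The training-time sweep over the d variables might be skeletonized as below; the edge set, expert count K, and the conditional scoring function are invented stand-ins, not the actual model:

```python
import numpy as np

def gibbs_sweep(d, edges, cond_score, rng):
    """One Gibbs sweep: resample each edge's expert index d_e from its
    (unnormalized) conditional given everything else."""
    for e in edges:
        weights = np.array([cond_score(e, k, d) for k in range(K)])
        probs = weights / weights.sum()
        d[e] = rng.choice(K, p=probs)
    return d

K = 3                      # number of experts per edge (assumed)
edges = [(0, 1), (1, 2)]   # toy edge set over three instances
rng = np.random.default_rng(0)
d = {e: 0 for e in edges}
# Stand-in conditional that strongly prefers expert 2 on every edge.
d = gibbs_sweep(d, edges, lambda e, k, d: [0.01, 0.01, 10.0][k], rng)
```

In the real model `cond_score` would evaluate POEd (and hence Zd) with edge e switched to expert k, holding all other d's fixed.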

Independent-TDP: ρ's are independent
CASPER-TDP: ρ's are distributed according to CASPER

Learned Experts

[Figure: learned pairwise experts visualized over interest-point features (yi, wi, ti)]

[Figure: qualitative results – IMAGE, GROUNDTRUTH, IND – N = 0.1, IND – N = 0.5]

Evaluation – Gen Model

           N = 0.1   N = 0.3   N = 0.5
Bed        0.6111    0.6286    0.5882
Lamp       0.3077    0.1667    0.0000
Painting   0.5333    0.3333    0.2857
Window     0.9091    0.7692    0.5455
Table      0.6667    0.4211    0.3529

“Synthetic Appearance”: visual words give a strong indicator for the class

Evaluated on detection performance: Precision/Recall F1 score for centroid and class identification

Results here with Independent TDP
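A sketch of how such an F1 score for joint centroid-and-class identification could be computed; the matching tolerance and the toy detections below are assumptions, not the evaluation protocol used in the talk:

```python
import numpy as np

def detection_f1(pred, truth, tol=20.0):
    """A prediction counts as a true positive if its class matches an
    unmatched ground-truth instance whose centroid is within `tol`."""
    matched = set()
    tp = 0
    for cls_p, rho_p in pred:
        for j, (cls_t, rho_t) in enumerate(truth):
            if j in matched or cls_t != cls_p:
                continue
            if np.linalg.norm(np.asarray(rho_p) - np.asarray(rho_t)) <= tol:
                matched.add(j)
                tp += 1
                break
    prec = tp / len(pred) if pred else 0.0
    rec = tp / len(truth) if truth else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

# Toy example: 2 ground-truth instances, 3 predictions, 1 correct.
truth = [('bed', (100, 200)), ('lamp', (50, 40))]
pred = [('bed', (105, 195)), ('lamp', (200, 200)), ('table', (10, 10))]
f1 = detection_f1(pred, truth)
```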

Can we hope to do this well?

Evaluation - Context

           INDEPENDENT   CASPER
Bed        0.5882        0.5714
Lamp       0.0000        0.0000
Painting   0.2857        0.1333
Window     0.5455        0.4000
Table      0.3529        0.1250

Independent-TDP vs CASPER-TDP N = 0.5

Why isn’t context helping here?

Problems with this Setup

Bad Feelings:

Supervised setting – Detection
Our model is not trained to maximize detection ability
We will lose to many/most discriminative approaches
Context is NOT the main reason why TDP fails

Unsupervised setting
Likelihood? Does anyone care?
Object discovery? Context is a lower-order consideration
How would we show that CASPER > Independent?

Outline

Goal – Scene Understanding
Existing Methods
CASPER
Preliminary Experiments
Future Direction – Going Discriminative

Going Discriminative

Up to now we have been generative:

P(I, ρ, l) = P(I | ρ, l) P(ρ, l)

How do we convert this into a discriminative model?

Include the CASPER distribution over (ρ, l)
Include a term with boosted object detectors
Slap on a partition function

P(ρ, l | I) = 1/Z * CASPER * DETECTORS

Discriminative Framework

Boosted Detectors “Over detect”

Each “candidate” has: location ρt, class variable lt, detection score DI(lt)

P(ρ, l | I) ~ P(ρ, l) Π DI(lt)

Goal: Reassign detection candidates to classes

Respects the “detection strength”
Respects the context between objects

[Figure: two face detection candidates with scores DI(face) = 0.09 and DI(face) = 0.92]
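A toy sketch of the reranking idea: candidates keep their locations, and class assignments are scored by detector scores times a context prior. The detector scores, class set, and the prior itself are all invented for illustration:

```python
import numpy as np
from itertools import product

# Two over-detected candidates; each can be reassigned any class.
cands = [{'rho': np.array([0.0, 0.0]), 'scores': {'face': 0.92, 'bg': 0.08}},
         {'rho': np.array([0.5, 0.0]), 'scores': {'face': 0.60, 'bg': 0.40}}]

def context_prior(labels, rhos):
    # Toy pairwise prior: two faces almost on top of each other are unlikely.
    if labels == ('face', 'face') and np.linalg.norm(rhos[0] - rhos[1]) < 1.0:
        return 0.05
    return 1.0

# Enumerate joint labelings and keep the one maximizing prior * detectors,
# i.e. P(rho, l | I) ~ P(rho, l) * prod_t D_I(l_t).
best = max(product(*[c['scores'] for c in cands]),
           key=lambda ls: context_prior(ls, [c['rho'] for c in cands]) *
                          np.prod([c['scores'][l] for c, l in zip(cands, ls)]))
```

Here the weaker face detection gets suppressed by context even though its detector score alone would keep it.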

Similarities to Steve’s work

“Over detection” using boosted detectors

But some detections don’t make sense in context

3D information allows him to “sort out” which detections are correct

CASPER Learning/Inference

Gibbs Inference: loop over images
Loop over detection candidates t: sample (lt | everything else)
Loop over pairs of candidates: sample (de | everything else)

Training: lt is known, Gibbs over de

Evaluation: Precision/Recall for detections

Possible Datasets

Short Term Plan

Learn the boosted detectors
Determine our baseline performance
Add Gibbs inference
Submit to a conference that is far far away… ICML = Helsinki, Finland

Alternate Names

Spatial Priors for Arbitrary Groups of Objects

Product of Experts – Precision Space View

P1(x) = N(a, A)
P2(x) = N(b, B)
P1(x) P2(x) = Z N(c, C)

Z = N(a; b, A + B)
C⁻¹ = A⁻¹ + B⁻¹
c = C (A⁻¹ a + B⁻¹ b)

What does this mean? The precision matrices of the experts ADD. Even if each expert has a singular precision A⁻¹, the sum is PSD.
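A numeric check of these precision-space formulas for two 2-D Gaussian experts (the parameters a, A, b, B are chosen arbitrarily):

```python
import numpy as np

def gauss(x, mu, cov):
    """Multivariate normal density N(x; mu, cov)."""
    d = x - mu
    k = len(x)
    return np.exp(-0.5 * d @ np.linalg.solve(cov, d)) / np.sqrt(
        (2 * np.pi) ** k * np.linalg.det(cov))

a, A = np.array([0.0, 0.0]), np.eye(2)
b, B = np.array([1.0, 1.0]), 2 * np.eye(2)

Cinv = np.linalg.inv(A) + np.linalg.inv(B)   # precisions add
C = np.linalg.inv(Cinv)
c = C @ (np.linalg.inv(A) @ a + np.linalg.inv(B) @ b)
Z = gauss(a, b, A + B)                       # Z = N(a; b, A + B)

# The product of the two expert densities equals Z times the new Gaussian.
x = np.array([0.3, -0.7])
lhs = gauss(x, a, A) * gauss(x, b, B)
rhs = Z * gauss(x, c, C)
```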