NIPS 2009: Understanding Visual Scenes - Part 2


A car out of context … 

Modeling object co‐occurrences 


What are the hidden objects? 

Chance ~ 1/30000

p(O | I) ∝ p(I | O) p(O)

Here I is the image and O the set of objects: p(I | O) is the object (appearance) model, and p(O) is the context model. The context prior p(O) can be represented as the full joint, approximated through a scene model, or approximated directly (approximate joint). The scene model marginalizes over a scene variable S:

p(O) = Σ_s p(S = s) Π_i p(O_i | S = s)

where the scene categories s are, e.g., office and street, and objects are conditionally independent given the scene.
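A minimal sketch of this scene-based factorization in Python, with made-up scenes, objects, and probability tables (none of the numbers below come from the tutorial):

```python
# Hypothetical tables for p(S = s) and p(O_i = 1 | S = s); all numbers illustrative.
scenes = ["office", "street"]
p_scene = {"office": 0.5, "street": 0.5}
p_obj_given_scene = {
    "office": {"desk": 0.80, "car": 0.05},
    "street": {"desk": 0.05, "car": 0.70},
}

def context_prior(presence):
    """p(O) = sum_s p(S = s) * prod_i p(O_i | S = s), with binary O_i."""
    total = 0.0
    for s in scenes:
        prod = p_scene[s]
        for obj, present in presence.items():
            p1 = p_obj_given_scene[s][obj]
            prod *= p1 if present else (1.0 - p1)
        total += prod
    return total

print(context_prior({"desk": True, "car": False}))  # plausible: an office scene
print(context_prior({"desk": True, "car": True}))   # unlikely under either scene
```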

Pixel labeling using MRFs 

Enforce consistency between neighboring labels, and between labels and pixels 

Carbonetto, de Freitas & Barnard, ECCV’04
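To make the idea concrete, here is a toy sketch, not Carbonetto et al.'s actual model: per-pixel label costs plus a Potts smoothness term between neighbors, minimized with a few sweeps of iterated conditional modes (ICM):

```python
import numpy as np

def mrf_label(unary, beta=1.0, sweeps=5):
    """Label each pixel by minimizing unary costs (-log p(label | pixel)) plus a
    Potts penalty beta for every neighbor with a different label, using ICM."""
    H, W, K = unary.shape
    labels = unary.argmin(axis=2)              # initialize from unaries alone
    for _ in range(sweeps):
        for y in range(H):
            for x in range(W):
                costs = unary[y, x].copy()
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < H and 0 <= nx < W:
                        # penalize labels that disagree with this neighbor
                        costs += beta * (np.arange(K) != labels[ny, nx])
                labels[y, x] = costs.argmin()
    return labels

# Noisy two-label toy problem: smoothing cleans up isolated label flips.
rng = np.random.default_rng(0)
print(mrf_label(rng.random((8, 8, 2)), beta=0.75))
```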


Object-Object Relationships 

Use latent variables to induce long-distance correlations between labels in a Conditional Random Field (CRF) 

He, Zemel & Carreira-Perpinan (04)

Object-Object Relationships 

[Kumar Hebert 2005]

•  Fink & Perona (NIPS 03): use the output of boosting for other objects at previous iterations as input into boosting for the current iteration (see the sketch below).
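A schematic of that loop, assuming two correlated detectors whose previous-round scores are fed to each other as context features; the linear scorer below is only a stand-in for boosting, and all data is synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_scorer(X, y):
    # Stand-in for a boosted classifier: least-squares linear scorer.
    A = np.c_[X, np.ones(len(X))]
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Z: np.c_[Z, np.ones(len(Z))] @ w

# Synthetic detection windows: object b tends to appear where object a does.
X = rng.random((200, 5))                                  # appearance features
y_a = (X[:, 0] > 0.6).astype(float)                       # "object a present"
y_b = ((X[:, 1] + y_a) > 1.2).astype(float)               # co-occurs with a

scores_a = np.zeros(200)                                  # round-0 context
scores_b = np.zeros(200)
for _ in range(3):                                        # boosting-style rounds
    f_a = fit_scorer(np.c_[X, scores_b], y_a)             # context from b
    f_b = fit_scorer(np.c_[X, scores_a], y_b)             # context from a
    scores_a, scores_b = f_a(np.c_[X, scores_b]), f_b(np.c_[X, scores_a])

print(np.corrcoef(scores_b, y_b)[0, 1])                   # context-aided score
```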

Object-Object Relationships 

[Figure: candidate labelings of a coastal scene — "building, boat, motorbike"; "building, boat, person"; "water, sky"; "road". The most consistent labeling according to object co-occurrences and local label probabilities: boat, building, water, road.]

A. Rabinovich, A. Vedaldi, C. Galleguillos, E. Wiewiora and S. Belongie. Objects in Context. ICCV 2007


Objects in Context: Contextual Refinement 

Contextual model based on co-occurrences: find the most consistent labeling, i.e., one with high posterior probability and high mean pairwise interaction; a CRF is used for this purpose. The score combines independent segment classification with the mean interaction of all label pairs, where φ(i, j) is essentially the observed label co-occurrence in the training set.
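A sketch of the scoring idea with hypothetical numbers (the paper optimizes a CRF; here an exhaustive search over a toy problem stands in for inference):

```python
import itertools
import numpy as np

labels = ["boat", "building", "water", "road"]
# phi[i, j]: hypothetical normalized co-occurrence of labels i and j in training data.
phi = np.array([[1.0, 0.8, 0.9, 0.2],
                [0.8, 1.0, 0.6, 0.7],
                [0.9, 0.6, 1.0, 0.3],
                [0.2, 0.7, 0.3, 1.0]])

# Independent per-segment label posteriors (illustrative numbers).
seg_probs = [
    {0: 0.50, 1: 0.20, 2: 0.20, 3: 0.10},   # segment 0 looks like a boat
    {0: 0.10, 1: 0.60, 2: 0.10, 3: 0.20},   # segment 1 looks like a building
]

def score(assignment):
    """Independent segment classification + mean interaction of all label pairs."""
    post = sum(np.log(seg_probs[k][lab]) for k, lab in enumerate(assignment))
    pairs = list(itertools.combinations(assignment, 2))
    interaction = np.mean([phi[a, b] for a, b in pairs]) if pairs else 0.0
    return post + interaction

best = max(itertools.product(range(4), repeat=len(seg_probs)), key=score)
print([labels[i] for i in best])   # -> ['boat', 'building']
```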

Using stuff to find things. Heitz & Koller, ECCV 2008

In this work, there is no labeling for stuff. Instead, they look for clusters of textures and model how each cluster correlates with the target object.

What, where and who? Classifying events by scene and object recognition

L.-J. Li & L. Fei-Fei, ICCV 2007. Slide by Fei-Fei

what who where


Grammars

•  Guzman (SEE), 1968
•  Noton and Stark, 1971
•  Yakimovsky & Feldman, 1973
•  Hansen & Riseman (VISIONS), 1978
•  Barrow & Tenenbaum, 1978
•  Brooks (ACRONYM), 1979
•  Marr, 1982

[Ohta & Kanade 1978]

Grammars for objects and scenes

S.C. Zhu and D. Mumford. A Stochastic Grammar of Images. Foundations and Trends in Computer Graphics and Vision, 2006.

3D scenes

We are wired for 3D: our two eyes sit ~6 cm apart

We cannot shut down 3D perception

(c) 2006 Walt Anthony

3D drives perception of important object attributes

by Roger Shepard ("Turning the Tables")

Depth processing is automatic, and we cannot shut it down…

Manhattan World

Coughlan & Yuille, 2003. Slide by James Coughlan
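A crude sketch of the Manhattan-world idea, assuming we only want the camera's compass rotation from edge orientations (Coughlan & Yuille do full Bayesian inference over edge gradients; the mod-90° folding below is only an approximation of that idea):

```python
import numpy as np

def manhattan_angle(edge_angles):
    """Fold edge orientations modulo 90 degrees (Manhattan scenes have two
    dominant, perpendicular directions) and take the circular mean."""
    folded = np.deg2rad(4.0 * (np.asarray(edge_angles) % 90.0))
    mean = np.arctan2(np.sin(folded).mean(), np.cos(folded).mean())
    return np.rad2deg(mean) / 4.0 % 90.0

# Edges of a scene rotated ~15 degrees: mostly 15 and 105 degrees plus noise.
rng = np.random.default_rng(0)
angles = np.concatenate([15 + 3 * rng.standard_normal(100),
                         105 + 3 * rng.standard_normal(100)])
print(manhattan_angle(angles))   # close to 15
```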

Single view metrology. Criminisi et al., 1999

Need to recover:
•  Ground plane
•  Reference height
•  Horizon line
•  Where objects contact the ground
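For a level camera, these quantities tie together in one simple relation: image distances to the horizon scale like world heights, so an object's height follows from its top row, bottom row, the horizon row, and the camera height. A worked sketch with illustrative numbers:

```python
def object_height(v_top, v_bottom, v_horizon, camera_height):
    """Height of an object resting on the ground plane, for a level camera.
    v_* are image rows measured upward from the image bottom. The horizon row
    corresponds to camera height, so world heights scale like image distances
    to the horizon: h / h_camera = (v_top - v_bottom) / (v_horizon - v_bottom)."""
    return camera_height * (v_top - v_bottom) / (v_horizon - v_bottom)

# Feet at row 100, head at row 310, horizon at row 300, camera 1.6 m high:
print(object_height(310, 100, 300, 1.6))   # -> 1.68 (meters)
```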

3D scene context

[Figure: image vs. world coordinates (in meters), with pedestrians ("Ped") and a car placed on the ground plane]

Hoiem, Efros, Hebert, ICCV 2005

Qualitative Results

Initial: 2 TP / 3 FP Final: 7 TP / 4 FP

Local Detector from [Murphy-Torralba-Freeman 2003]

Car: TP / FP Ped: TP / FP

Slide by Derek Hoiem

3D City Modeling using Cognitive Loops

N. Cornelis, B. Leibe, K. Cornelis, L. Van Gool. CVPR'06

3D from pixel values. D. Hoiem, A.A. Efros, and M. Hebert, "Automatic Photo Pop-up". SIGGRAPH 2005.

A. Saxena, M. Sun, A. Y. Ng. "Learning 3-D Scene Structure from a Single Still Image" In ICCV workshop on 3D Representation for Recognition (3dRR-07), 2007.

Surface Estimation

[Figure: input image with surface labels — support, vertical, sky; vertical subclasses: left, center, right, porous, solid]

[Hoiem, Efros, Hebert ICCV 2005]

Object Support

[Figure: which surface supports the object?]

Slide by Derek Hoiem

Gupta & Davis, ECCV 2008

Qualitative 3D relationships

Large databases: algorithms that rely on millions of images

Human vision
•  Many input modalities
•  Active
•  Supervised, unsupervised, and semi-supervised learning; it can look for supervision.

Robot vision
•  Many poor input modalities
•  Active, but it does not go far

Internet vision
•  Many input modalities
•  It can reach everywhere
•  Tons of data

Data

The two extremes of learning

[Figure: axis of the number of training samples, from 1 to 10^6 and ∞. Few samples: an extrapolation problem — generalization, diagnostic features. Many samples: an interpolation problem — correspondence, finding the differences.]

Transfer learning, classifiers, priors, label transfer.

Input image Nearest neighbors 

Hays & Efros, SIGGRAPH 2007; Russell, Liu, Torralba, Fergus, Freeman, NIPS 2007; Divvala, Efros, Hebert, 2008; Malisiewicz & Efros, 2008; Torralba, Fergus, Freeman, PAMI 2008; Liu, Yuen, Torralba, CVPR 2009 

What the neighbors transfer (see the sketch below):
•  Labels
•  Depth
•  Motion
•  …
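A minimal sketch of the retrieval-and-transfer recipe these papers share: a global descriptor (a crude stand-in for gist here), nearest-neighbor search, and metadata transfer by voting. The dataset, descriptor, and labels below are all synthetic:

```python
import numpy as np

def toy_descriptor(image):
    """Stand-in for a gist descriptor: a coarse 4x4 grid of mean intensities."""
    H, W = image.shape
    gh, gw = H // 4, W // 4
    return np.array([image[i*gh:(i+1)*gh, j*gw:(j+1)*gw].mean()
                     for i in range(4) for j in range(4)])

def nearest_neighbors(query, database, k=5):
    dists = np.linalg.norm(database - query, axis=1)
    return np.argsort(dists)[:k]

rng = np.random.default_rng(1)
images = rng.random((1000, 32, 32))          # pretend collection of 1000 images
descs = np.stack([toy_descriptor(im) for im in images])
labels = rng.choice(["street", "office", "forest"], size=1000)

query = rng.random((32, 32))
nn = nearest_neighbors(toy_descriptor(query), descs, k=5)
# Transfer metadata (here, scene labels) from the neighbors to the query.
transferred, counts = np.unique(labels[nn], return_counts=True)
print(transferred[counts.argmax()])
```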

The power of large collections

Google Street View (controlled image capture); Photo Tourism / Photosynth [Snavely et al., 2006] (register images based on multi-view geometry)

Image completion

Instead, generate proposals using millions of images

Hays, Efros, 2007

Input 16 nearest neighbors (gist+color matching)

output

im2gps: instead of using object labels, the web provides other kinds of metadata associated with large collections of images

Hays & Efros. CVPR 2008

20 million geotagged and geographic text-labeled images

Hays & Efros. CVPR 2008 im2gps

Input image Nearest neighbors Geographic location of the nearest neighbors
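The geolocation step works the same way: let the GPS tags of the retrieved neighbors vote. A toy stand-in for the mode-finding (im2gps uses mean-shift over the neighbors' coordinates; the coarse grid voting below only approximates that idea, with made-up coordinates):

```python
import numpy as np

def geolocate(neighbor_coords, grid=5.0):
    """Estimate a location from the neighbors' (lat, lon): vote on a coarse
    grid, then average within the winning cell."""
    cells = np.floor(neighbor_coords / grid)
    keys, inverse, counts = np.unique(cells, axis=0,
                                      return_inverse=True, return_counts=True)
    best = counts.argmax()
    return neighbor_coords[inverse == best].mean(axis=0)

# Hypothetical GPS tags of retrieved neighbors: most cluster near Paris.
coords = np.array([[48.9, 2.3], [48.8, 2.4], [48.7, 2.2],   # Paris-ish
                   [40.7, -74.0],                           # one outlier (NYC)
                   [48.6, 2.5]])
print(geolocate(coords))   # -> approximately [48.75, 2.35]
```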

Predicting events

C. Liu, J. Yuen, A. Torralba, J. Sivic, and W. T. Freeman, ECCV 2008

[Figure sequence: query image → retrieved video → synthesized video, repeated for several queries]

Datasets and the powers of 10

10^0 images

1972

10^1 images

Marr, 1976

10^2-10^4 images

In 1996, DARPA released 14,000 images from over 1,000 individuals (the FERET face database).

The faces and cars scale

The PASCAL Visual Object Classes  

M. Everingham, L. Van Gool, C. Williams, J. Winn, A. Zisserman, 2007

In 2007, the twenty selected object classes were:

Person: person
Animal: bird, cat, cow, dog, horse, sheep
Vehicle: aeroplane, bicycle, boat, bus, car, motorbike, train
Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor

10^2-10^4 images

10^5 images

Caltech 101 and 256 

Fei-Fei, Fergus, Perona, 2004 (Caltech 101); Griffin, Holub, Perona, 2007 (Caltech 256)

10^5 images

Lotus Hill Research Institute image corpus 

Z.Y. Yao, X. Yang, and S.C. Zhu, 2007

LabelMe: 10^5 images

B.C. Russell, A. Torralba, K.P. Murphy, W.T. Freeman, IJCV 2008. labelme.csail.mit.edu

The tool went online July 1st, 2005; 530,000 object annotations collected.

Extreme labeling

The other extreme of extreme labeling

… things do not always look good…

Creative testing

10^5 images

10^6-10^7 images

Things start getting out of hand

Collecting big datasets

•  ESP game (CMU): Luis von Ahn and Laura Dabbish, 2004

•  LabelMe (MIT): Russell, Torralba, Freeman, 2005

•  StreetScenes (CBCL-MIT): Bileschi, Poggio, 2006

•  WhatWhere (Caltech): Perona et al., 2007

•  PASCAL challenge: 2006, 2007

•  Lotus Hill Institute: Song-Chun Zhu et al., 2007

•  80 million images: Torralba, Fergus, Freeman, 2007

10^6-10^7 images

80,000,000 images; 75,000 non-abstract nouns from WordNet; 7 online image search engines

Google: 80 million images

And after 1 year of downloading images…

A. Torralba, R. Fergus, W.T. Freeman. PAMI 2008

10^6-10^7 images

~10^5+ nodes, ~10^8+ images

[WordNet hierarchy excerpt: animal → shepherd dog, sheep dog → German shepherd, collie]

Deng, Dong, Socher, Li & Fei-Fei, CVPR 2009

10^6-10^7 images

Alexander Sorokin, David Forsyth, "Utility data annotation with Amazon Mechanical Turk", First IEEE Workshop on Internet Vision at CVPR 08.

Labeling for money

1 cent. Task: label one object in this image.

Why do people do this? 

From: John Smith <…@yahoo.co.in>
Date: August 22, 2009 10:18:23 AM EDT
To: Bryan Russell
Subject: Re: Regarding Amazon Mechanical Turk HIT RX5WVKGA9W

Dear Mr. Bryan,
I am awaiting for your HITS. Please help us with more.

Thanks & Regards

10^6-10^7 images

10^8-10^11 images

Canonical Perspective 

From Vision Science, Palmer

Examples of canonical perspective:

In a recognition task, reaction time correlated with the ratings.

Canonical views are recognized faster at the entry level.

3D object categorization 

by Greg Robbins

Although we can categorize all three pictures as views of a horse, they do not look like equally typical views of horses, and they do not seem equally easy to recognize.

Canonical Viewpoint 

Viewpoints are not sampled uniformly (some artificial datasets may contain non-natural viewpoint statistics)

10^8-10^11 images

Interesting biases…

Canonical Viewpoint 

Clocks are preferred in a purely frontal view

10^8-10^11 images

Interesting biases…

>10^11 images

? ?