Viola/Jones: features (pages.cs.wisc.edu/~lizhang/courses/cs766-2012f/...)

Post on 22-Jan-2021


Viola/Jones: features

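“Rectangle filters”: differences between sums of pixels in adjacent rectangles, $h_t(x)$

Weak classifier (a thresholded rectangle filter):

$$y_t(x) = \begin{cases} +1 & \text{if } h_t(x) > \theta_t \\ -1 & \text{otherwise} \end{cases}$$

60,000 × 100 = 6,000,000 unique features

Select 200 by AdaBoost:

$$Y(x) = \sum_t \alpha_t y_t(x)$$

$$\text{Detection} = \begin{cases} \text{face} & \text{if } Y(x) > 0 \\ \text{non-face} & \text{otherwise} \end{cases}$$

Robust Real-Time Face Detection, IJCV 2004, Viola and Jones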
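As a rough illustration of the two definitions above, the sketch below evaluates one two-rectangle filter by brute force, thresholds it into a weak classifier, and combines weak votes into the final decision. The function names, the left-minus-right filter layout, and the toy window are assumptions made for this example, not taken from the slides.

```python
def brute_force_rect_sum(img, top, left, height, width):
    """Sum of the pixels inside one rectangle (slow, but easy to read)."""
    return sum(img[r][c]
               for r in range(top, top + height)
               for c in range(left, left + width))


def two_rect_filter(img, top, left, height, width):
    """h(x): sum of the left rectangle minus the sum of the adjacent right one."""
    return (brute_force_rect_sum(img, top, left, height, width)
            - brute_force_rect_sum(img, top, left + width, height, width))


def weak_classifier(h_value, theta):
    """y_t(x) = +1 if h_t(x) > theta_t, -1 otherwise."""
    return 1 if h_value > theta else -1


def strong_classifier(votes, alphas):
    """Y(x) = sum_t alpha_t * y_t(x); report a face when Y(x) > 0."""
    score = sum(a * y for a, y in zip(alphas, votes))
    return "face" if score > 0 else "non-face"


# Toy 2x4 window: bright on the left, dark on the right.
window = [[9, 9, 1, 1],
          [9, 9, 1, 1]]
response = two_rect_filter(window, 0, 0, 2, 2)   # 36 - 4
vote = weak_classifier(response, theta=10)
```

The brute-force sum is only for readability; it is exactly the part that the integral image makes constant-time.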

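Integral Image (a.k.a. summed-area table)

•  Define the integral image:

$$I'(x, y) = \sum_{x' \le x,\; y' \le y} I(x', y')$$

•  Any rectangular sum can be computed in constant time. With 1, 2, 3, 4 denoting the integral-image values at the four corners of rectangle D (1 = A, 2 = A + B, 3 = A + C, 4 = A + B + C + D):

$$D = 1 + 4 - (2 + 3) = A + (A + B + C + D) - (A + C + A + B) = D$$

•  Rectangle features can be computed as differences between rectangles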
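The following is a minimal sketch of the integral image and the four-corner rectangle sum, written in plain Python. The zero-padded first row and column and the `(top, left, height, width)` argument convention are choices made for this example, not something the slides specify.

```python
def integral_image(img):
    """ii[y+1][x+1] holds the sum of img over all rows <= y and columns <= x."""
    h, w = len(img), len(img[0])
    # One extra zero row and column so border rectangles need no special case.
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of any rectangle from four lookups: D = 4 + 1 - (2 + 3)."""
    bottom, right = top + height, left + width
    return ii[bottom][right] + ii[top][left] - ii[top][right] - ii[bottom][left]


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
# A two-rectangle feature is a difference of two constant-time sums.
feature = rect_sum(ii, 0, 0, 3, 1) - rect_sum(ii, 0, 1, 3, 1)
```

Building the table costs one pass over the image; after that every rectangle sum costs four lookups regardless of its size.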

Feature selection (AdaBoost)

Given training data $\{x_n, t_n\}$, find $\{\alpha_t\}$ for $\{y_t(x)\}$ by minimizing the total error function:

$$E\Big(Y = \sum_{t=1}^{M} \alpha_t y_t(x)\Big) = \sum_{n=1}^{N} \mathrm{error}\big(t_n Y(x_n)\big)$$

The ideal function error(z) = (z > 0 ? 0 : 1) is hard to optimize. Instead use error(z) = exp(−z) to make the optimization convex.

Define

$$f_m(x) = \frac{1}{2} \sum_{l=1}^{m} \alpha_l y_l(x)$$

Basic idea: first find $f_1(x)$ by minimizing $E(f_1)$. Then, given $f_{m-1}(x)$, find $f_m(x)$ by searching for the best $\alpha_m$ and $y_m(x)$.
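To make the choice of error function concrete, the sketch below compares the ideal 0/1 error with the exponential surrogate on a few margins z = t·Y(x). The helper names and the sample values are assumptions for illustration only.

```python
import math


def zero_one_error(z):
    """Ideal error: 0 when the margin z = t * Y(x) is positive, 1 otherwise."""
    return 0.0 if z > 0 else 1.0


def exp_error(z):
    """Smooth surrogate used in place of the 0/1 error."""
    return math.exp(-z)


def total_error(targets, scores, error):
    """E = sum_n error(t_n * Y(x_n))."""
    return sum(error(t * s) for t, s in zip(targets, scores))


targets = [1, -1, 1]
scores = [2.0, -0.5, -1.0]          # the third sample is misclassified
ideal = total_error(targets, scores, zero_one_error)
surrogate = total_error(targets, scores, exp_error)
```

Since exp(−z) is at least 1 for z ≤ 0 and positive elsewhere, the surrogate total is never below the 0/1 total, so driving it down also bounds the number of mistakes.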

Feature selection (AdaBoost)

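$$\begin{aligned}
E(f_m) &= \sum_{n=1}^{N} \mathrm{error}\big(t_n f_m(x_n)\big) = \sum_{n=1}^{N} \exp\big(-t_n f_m(x_n)\big) \\
&= \sum_{n=1}^{N} \exp\big(-t_n f_{m-1}(x_n) - \tfrac{1}{2} t_n \alpha_m y_m(x_n)\big) \\
&= \sum_{n=1}^{N} w_n^{(m)} \exp\big(-\tfrac{1}{2} t_n \alpha_m y_m(x_n)\big)
\end{aligned}$$

$w_n^{(m)} = \exp(-t_n f_{m-1}(x_n))$ is low if $f_{m-1}(x)$ is correct for $x_n$ and high otherwise. Next we want to find $\alpha_m$ and $y_m(x)$ to minimize this weighted error function.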

Feature selection (AdaBoost)

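$$\begin{aligned}
E(f_m) &= \sum_{n=1}^{N} w_n^{(m)} \exp\big(-\tfrac{1}{2} t_n \alpha_m y_m(x_n)\big) \\
&= \sum_{n=1}^{N} w_n^{(m)} \begin{cases} \exp(\tfrac{\alpha_m}{2}) & \text{if } t_n \ne y_m(x_n) \\ \exp(-\tfrac{\alpha_m}{2}) & \text{otherwise} \end{cases} \\
&= \sum_{n=1}^{N} w_n^{(m)} \Big[ \mathrm{True}\big(t_n \ne y_m(x_n)\big) \big( \exp(\tfrac{\alpha_m}{2}) - \exp(-\tfrac{\alpha_m}{2}) \big) + \exp(-\tfrac{\alpha_m}{2}) \Big] \\
&= \big( \exp(\tfrac{\alpha_m}{2}) - \exp(-\tfrac{\alpha_m}{2}) \big) \sum_{n=1}^{N} w_n^{(m)} \mathrm{True}\big(t_n \ne y_m(x_n)\big) + \exp(-\tfrac{\alpha_m}{2}) \sum_{n=1}^{N} w_n^{(m)}
\end{aligned}$$

Recall $t_n \in \{-1, +1\}$ and $y_m(x) \in \{-1, +1\}$. True(·) is 1 when its argument holds and 0 otherwise.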

Feature selection (AdaBoost)

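$$E(f_m) = \big( \exp(\tfrac{\alpha_m}{2}) - \exp(-\tfrac{\alpha_m}{2}) \big) \sum_{n=1}^{N} w_n^{(m)} \mathrm{True}\big(t_n \ne y_m(x_n)\big) + \exp(-\tfrac{\alpha_m}{2}) \sum_{n=1}^{N} w_n^{(m)}$$

Find $y_m(x)$ to minimize

$$\sum_{n=1}^{N} w_n^{(m)} \mathrm{True}\big(t_n \ne y_m(x_n)\big)$$

Calculate the weighted error rate for $y_m(x)$:

$$\varepsilon_m = \frac{\sum_{n=1}^{N} w_n^{(m)} \mathrm{True}\big(t_n \ne y_m(x_n)\big)}{\sum_{n=1}^{N} w_n^{(m)}}$$

Find $\alpha_m$ to minimize

$$\big( \exp(\tfrac{\alpha_m}{2}) - \exp(-\tfrac{\alpha_m}{2}) \big) \varepsilon_m + \exp(-\tfrac{\alpha_m}{2})$$

Setting the derivative with respect to $\alpha_m$ to zero gives

$$\alpha_m = \log \frac{1 - \varepsilon_m}{\varepsilon_m}$$

so $\varepsilon_m < 0.5$ implies $\alpha_m > 0$.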
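The closed form for αm can be checked numerically. The sketch below computes the weighted error rate, evaluates αm = log((1 − εm)/εm), and compares the objective at that value with nearby values. The function names and the four-sample toy data are assumptions for this example.

```python
import math


def weighted_error_rate(weights, targets, predictions):
    """eps_m = sum of weights on misclassified samples / sum of all weights."""
    wrong = sum(w for w, t, y in zip(weights, targets, predictions) if t != y)
    return wrong / sum(weights)


def optimal_alpha(eps):
    """alpha_m = log((1 - eps_m) / eps_m); requires 0 < eps_m < 1."""
    return math.log((1.0 - eps) / eps)


def alpha_objective(alpha, eps):
    """(exp(alpha/2) - exp(-alpha/2)) * eps + exp(-alpha/2)."""
    return (math.exp(alpha / 2) - math.exp(-alpha / 2)) * eps + math.exp(-alpha / 2)


eps = weighted_error_rate([1, 1, 1, 1], [1, 1, -1, -1], [1, 1, 1, -1])  # one of four wrong
alpha = optimal_alpha(eps)
```

With one of four equally weighted samples wrong, εm is 0.25 and αm is log 3, which is positive as expected for εm < 0.5.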

Feature selection (AdaBoost)

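Update weight:

$$\begin{aligned}
w_n^{(m+1)} &= \exp\big(-t_n f_m(x_n)\big) = \exp\big(-t_n f_{m-1}(x_n) - \tfrac{1}{2} t_n \alpha_m y_m(x_n)\big) \\
&= w_n^{(m)} \exp\big(-\tfrac{1}{2} t_n \alpha_m y_m(x_n)\big)
\end{aligned}$$

Note $t_n y_m(x_n) = 1 - 2\,\mathrm{True}\big(y_m(x_n) \ne t_n\big)$, so

$$w_n^{(m+1)} = w_n^{(m)} \exp\big(-\tfrac{\alpha_m}{2}\big) \exp\big(\alpha_m \mathrm{True}(y_m(x_n) \ne t_n)\big) \propto w_n^{(m)} \exp\big(\alpha_m \mathrm{True}(y_m(x_n) \ne t_n)\big)$$

Only need to update the weights of incorrectly classified data.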
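Putting the selection, αm, and weight-update steps together, a compact AdaBoost loop might look like the sketch below. It is an illustration under stated assumptions: the weak learners are hypothetical 1-D threshold stumps rather than rectangle filters, the data are a toy example, and the handling of a perfect stump (εm = 0) is an arbitrary choice made here.

```python
import math


def stump(threshold, polarity):
    """Hypothetical weak learner: y(x) = polarity if x > threshold, else -polarity."""
    return lambda x: polarity if x > threshold else -polarity


def adaboost(xs, ts, candidates, rounds):
    weights = [1.0] * len(xs)        # w_n^(1) = exp(0) = 1
    ensemble = []                    # list of (alpha_m, y_m)
    for _ in range(rounds):
        def weighted_errors(y):
            return sum(w for w, x, t in zip(weights, xs, ts) if y(x) != t)

        y_m = min(candidates, key=weighted_errors)   # best weak classifier
        eps = weighted_errors(y_m) / sum(weights)
        if eps >= 0.5:               # no candidate beats chance: stop
            break
        if eps == 0:                 # perfect stump: alpha would be infinite,
            ensemble.append((1.0, y_m))   # so use an arbitrary finite weight
            break
        alpha = math.log((1 - eps) / eps)
        ensemble.append((alpha, y_m))
        # Only misclassified samples change: w <- w * exp(alpha).
        weights = [w * math.exp(alpha) if y_m(x) != t else w
                   for w, x, t in zip(weights, xs, ts)]
    return ensemble


def predict(ensemble, x):
    """Sign of sum_m alpha_m * y_m(x)."""
    return 1 if sum(a * y(x) for a, y in ensemble) > 0 else -1


xs = list(range(10))
ts = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
candidates = [stump(k + 0.5, p) for k in range(9) for p in (1, -1)]
ensemble = adaboost(xs, ts, candidates, rounds=3)
```

No single stump separates this toy data, but the weighted vote of three stumps does, which is the point of reweighting the hard samples after each round.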

Viola/Jones: handling scale

Smallest Scale

Larger Scale

50,000 Locations/Scales
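One way to enumerate the locations and scales is to slide a square window over the image and grow it by a constant factor, as sketched below. The 24-pixel base size, the 1.25 scale factor, and the stride rule are assumptions chosen for illustration; the slide does not state which parameters yield its count.

```python
def sliding_windows(image_w, image_h, base=24, scale_factor=1.25, step=1.0):
    """Yield (x, y, size) for every square sub-window at every scale."""
    size = float(base)
    while round(size) <= min(image_w, image_h):
        s = int(round(size))
        # Move in larger steps at larger scales (assumed rule).
        stride = max(1, int(round(step * size / base)))
        for y in range(0, image_h - s + 1, stride):
            for x in range(0, image_w - s + 1, stride):
                yield x, y, s
        size *= scale_factor


windows = list(sliding_windows(48, 36))
```

Because rectangle sums cost the same at any size with an integral image, the detector can be scaled instead of resampling the image for each scale.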

Cascaded Classifier

[Figure: cascade diagram. IMAGE SUB-WINDOW → 1 Feature → (50%) → 5 Features → (20%) → 20 Features → (2%) → FACE; a sub-window that fails (F) any stage is labeled NON-FACE.]

•  first classifier: 100% detection, 50% false positives
•  second classifier: 100% detection, 40% false positives (20% cumulative), using data from the previous stage
•  third classifier: 100% detection, 10% false positive rate (2% cumulative)

•  Put cheaper classifiers up front
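The cascade logic and the cumulative rates quoted above can be sketched as follows. The stage functions are placeholders (hypothetical callables returning True for "maybe a face"), and the helper names are made up for this example.

```python
def cascade_classify(window, stages):
    """Run stages in order; reject as soon as any stage says no."""
    for stage in stages:
        if not stage(window):
            return "non-face"        # early exit: later stages never run
    return "face"


def cumulative_false_positive_rates(stage_rates):
    """Running product of per-stage false positive rates."""
    total, out = 1.0, []
    for rate in stage_rates:
        total *= rate
        out.append(total)
    return out


rates = cumulative_false_positive_rates([0.5, 0.4, 0.1])   # roughly 0.5, 0.2, 0.02
```

Since most sub-windows are rejected by the first cheap stage, the expensive stages run on only a small fraction of the image.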

Viola/Jones results:

Run-time: 15 fps (384×288 pixel image on a 700 MHz Pentium III)

Application

Smart cameras: auto focus, red eye removal, auto color correction

Application

Lexus LS600 Driver Monitor System

Pedestrian Detection: Chamfer matching

Gavrila & Philomin ICCV 1999

[Figure panels: Input Image, Edge Detection, Distance Transform, Template, Best Match]

Slides from K. Grauman and B. Leibe
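The pipeline named in the figure (edge detection, distance transform, template, best match) can be sketched in plain Python as below. This is a simplified illustration, not the method of the cited paper: it assumes a binary edge map with at least one edge pixel is already available, uses city-block distance instead of Euclidean distance, and searches a single template exhaustively.

```python
from collections import deque


def distance_transform(edges):
    """City-block distance from each pixel to the nearest edge pixel (multi-source BFS)."""
    h, w = len(edges), len(edges[0])
    dist = [[None] * w for _ in range(h)]
    queue = deque()
    for y in range(h):
        for x in range(w):
            if edges[y][x]:
                dist[y][x] = 0
                queue.append((y, x))
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))
    return dist


def chamfer_score(dist, template_points, oy, ox):
    """Mean distance-transform value under the template's edge points."""
    return sum(dist[oy + dy][ox + dx] for dy, dx in template_points) / len(template_points)


def best_match(dist, template_points, t_h, t_w):
    """Offset (row, col) with the lowest chamfer score."""
    h, w = len(dist), len(dist[0])
    offsets = [(oy, ox) for oy in range(h - t_h + 1) for ox in range(w - t_w + 1)]
    return min(offsets, key=lambda o: chamfer_score(dist, template_points, o[0], o[1]))


edges = [[0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0],
         [0, 0, 1, 1, 0],
         [0, 0, 1, 1, 0],
         [0, 0, 0, 0, 0]]
template = [(0, 0), (0, 1), (1, 0), (1, 1)]     # edge points of a 2x2 template
dist = distance_transform(edges)
match = best_match(dist, template, 2, 2)
```

Computing the distance transform once makes each template placement cheap to score, since the score is just a sum of table lookups.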

Pedestrian Detection: Chamfer matching

Hierarchy of templates

Gavrila & Philomin ICCV 1999

Slides from K. Grauman and B. Leibe

Pedestrian Detection: HOG Feature

Slides from Andrew Zisserman

Pedestrian Detection: HOG Feature

Dalal & Triggs, CVPR 2005

Slides from Andrew Zisserman

HOG: Histogram of Oriented Gradients

Pedestrian Detection: HOG Feature

Dalal & Triggs, CVPR 2005

Map each grid cell in the input window to a gradient-orientation histogram weighted by gradient magnitude.

Code: http://pascal.inrialpes.fr/soft/olt

Slides from K. Grauman and B. Leibe
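A minimal sketch of the per-cell histogram described above is given below. The details are assumptions for illustration: centered-difference gradients on interior pixels only, unsigned orientations over 0 to 180 degrees, 9 bins, and hard binning with no interpolation or block normalization.

```python
import math


def cell_histogram(cell, n_bins=9):
    """Gradient-orientation histogram of one cell, weighted by gradient magnitude."""
    h, w = len(cell), len(cell[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            # Unsigned orientation: fold angles into [0, 180).
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist


vertical_edge = [[0, 0, 1, 1]] * 4              # intensity changes along x
horizontal_edge = [[0] * 4, [0] * 4, [1] * 4, [1] * 4]   # changes along y
hist_v = cell_histogram(vertical_edge)
hist_h = cell_histogram(horizontal_edge)
```

The two toy cells put all of their gradient energy into different bins, which is what lets the descriptor distinguish edge directions within a cell.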


Algorithm

Slides from Andrew Zisserman

Model training using SVM

•  Given $\{x_i \in \mathbb{R}^d,\; y_i \in \{-1, +1\}\}$

•  Find $f(x) = w^T x + b$

•  To minimize

$$\min_{w,b} \; \|w\|^2 + C \sum_{i=1}^{N} \mathrm{error}\big(y_i f(x_i)\big), \qquad \mathrm{error}(z) = \max(0, 1 - z)$$
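The objective above can be minimized in several ways; the sketch below uses plain batch subgradient descent as one simple option. The optimizer, learning rate, epoch count, and toy data are assumptions for illustration, and labels are taken to be −1 or +1 so that the product y·f(x) is a signed margin.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def hinge(z):
    """error(z) = max(0, 1 - z)."""
    return max(0.0, 1.0 - z)


def objective(w, b, xs, ys, C):
    """||w||^2 + C * sum_i error(y_i * f(x_i)) with f(x) = w.x + b."""
    return dot(w, w) + C * sum(hinge(y * (dot(w, x) + b)) for x, y in zip(xs, ys))


def train_svm(xs, ys, C=1.0, lr=0.01, epochs=200):
    w, b = [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [2.0 * wi for wi in w], 0.0     # gradient of ||w||^2
        for x, y in zip(xs, ys):
            if y * (dot(w, x) + b) < 1:                  # sample violates the margin
                grad_w = [g - C * y * xi for g, xi in zip(grad_w, x)]
                grad_b -= C * y
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


xs = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]]
ys = [1, 1, -1, -1]
w, b = train_svm(xs, ys)
predictions = [1 if dot(w, x) + b > 0 else -1 for x in xs]
```

For pedestrian detection, each x would be the HOG descriptor of a window rather than a 2-D point.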

Result

Learned model

Slides from Deva Ramanan

Meaning of negative weights

$$w x > -b \;\Longleftrightarrow\; (w_+ - w_-)\,x > -b \;\Longleftrightarrow\; w_+ x - w_- x > -b$$
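The decomposition above can be checked directly: split w into its positive part and the magnitude of its negative part, and the score becomes a difference of two non-negative-weight templates. The helper name and the sample numbers are assumptions for this example.

```python
def split_weights(w):
    """Return (w_plus, w_minus) with w = w_plus - w_minus and both non-negative."""
    w_plus = [max(wi, 0.0) for wi in w]
    w_minus = [max(-wi, 0.0) for wi in w]
    return w_plus, w_minus


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


w = [0.5, -1.0, 2.0]
x = [1.0, 2.0, 3.0]
w_plus, w_minus = split_weights(w)
score = dot(w, x)
evidence_for = dot(w_plus, x)        # response of the positive template
evidence_against = dot(w_minus, x)   # response of the negative template
```

Read this way, a window is accepted when the positive template outscores the negative one by more than −b.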

Slides from Deva Ramanan

A complete model should have pedestrian, pillar, and doorway models compete

Faces and Pedestrians

•  Relatively easier, but can still be confusing

Slide credit: Lana Lazebnik

More difficult cases

In general

•  classify every pixel