Viola/Jones: features (pages.cs.wisc.edu/~lizhang/courses/cs766-2012f/...)

Transcript of Viola/Jones: features

Page 1:

Viola/Jones: features

“Rectangle filters”: differences between sums of pixels in adjacent rectangles

$$y_t(x) = \begin{cases} +1 & \text{if } h_t(x) > \theta_t \\ -1 & \text{otherwise} \end{cases}$$

60,000 × 100 = 6,000,000 unique features

$$\text{Detection} = \begin{cases} \text{face} & \text{if } Y(x) > 0 \\ \text{non-face} & \text{otherwise} \end{cases}$$

$$Y(x) = \sum_t \alpha_t y_t(x)$$

Robust Real-Time Face Detection, IJCV 2004, Viola and Jones

Select 200 by AdaBoost
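The decision rule on this slide can be sketched in a few lines. This is only an illustration: the filter responses, thresholds, and weights below are made-up numbers, and the helper names `weak_classify` and `strong_classify` are hypothetical.

```python
def weak_classify(feature_value, theta):
    """y_t(x): +1 if the rectangle-filter response h_t(x) exceeds theta_t, else -1."""
    return 1 if feature_value > theta else -1


def strong_classify(feature_values, thetas, alphas):
    """Y(x) = sum_t alpha_t * y_t(x); report 'face' when Y(x) > 0."""
    score = sum(a * weak_classify(h, th)
                for h, th, a in zip(feature_values, thetas, alphas))
    return ("face" if score > 0 else "non-face"), score


# Hypothetical responses h_t(x) of three selected filters on one sub-window,
# with hypothetical thresholds theta_t and weights alpha_t.
label, score = strong_classify([12.0, -3.0, 7.5], [10.0, 0.0, 9.0], [0.9, 0.4, 0.6])
print(label, score)
```

Only the first filter votes +1 here, and its weight is smaller than the combined weight of the two filters voting -1, so the weighted vote is negative.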

Page 2:

Integral Image (aka. summed area table)

•  Define the Integral Image

$$I'(x, y) = \sum_{x' \le x,\; y' \le y} I(x', y')$$

•  Any rectangular sum can be computed in constant time:

$$D = 1 + 4 - (2 + 3) = A + (A + B + C + D) - (A + C + A + B) = D$$

•  Rectangle features can be computed as differences between rectangles
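A minimal sketch of the integral image and the four-lookup rectangle sum follows. It assumes that 1, 2, 3, 4 in the formula denote the integral-image values at the bottom-right corners of regions A, B, C, D (the usual reading of the figure). The zero padding and the function names are choices made for this example.

```python
def integral_image(img):
    """Return ii with ii[y + 1][x + 1] = sum of img[y'][x'] for x' <= x, y' <= y.

    A zero row and column are padded on the top and left so that
    rectangle sums need no boundary checks.
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x0, y0, x1, y1):
    """Sum of pixels with x0 <= x < x1 and y0 <= y < y1, using four lookups."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]


def two_rect_feature(ii, x, y, w, h):
    """Horizontal two-rectangle filter: left half minus right half."""
    half = w // 2
    left = rect_sum(ii, x, y, x + half, y + h)
    right = rect_sum(ii, x + half, y, x + 2 * half, y + h)
    return left - right


img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12]]
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 3, 3), two_rect_feature(ii, 0, 0, 4, 3))
```

The cost of `rect_sum` does not depend on the rectangle size, which is what makes evaluating many rectangle filters per window affordable.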

Page 3:

Feature selection (AdaBoost)

Given training data $\{x_n, t_n\}$, find $\{\alpha_t\}$ for $\{y_t(x)\}$ by minimizing the total error function:

$$E\Big(Y = \sum_{t=1}^{M} \alpha_t y_t(x)\Big) = \sum_{n=1}^{N} \mathrm{error}\big(t_n Y(x_n)\big)$$

The ideal function error(z) = z>0?0:1 is hard to optimize. Instead use error(z) = exp(-z) to make the optimization convex.

Define

$$f_m(x) = \frac{1}{2} \sum_{l=1}^{m} \alpha_l y_l(x)$$

Basic idea: first find $f_1(x)$ by minimizing $E(f_1)$. Then, given $f_{m-1}(x)$, find $f_m(x)$ by searching for the best $\alpha_m$ and $y_m(x)$.
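To make the two error functions concrete, the sketch below evaluates both on a few hypothetical margins z = t_n Y(x_n). The margin values are invented for illustration. It shows that exp(-z) is never smaller than the 0/1 error, so driving down the surrogate also bounds the number of mistakes.

```python
import math


def zero_one_error(z):
    """Ideal error from the slide: 0 when z > 0, else 1."""
    return 0 if z > 0 else 1


def exp_error(z):
    """Surrogate error exp(-z), smooth and convex in z."""
    return math.exp(-z)


def total_error(margins, error):
    """E = sum_n error(t_n * Y(x_n)), given the margins t_n * Y(x_n)."""
    return sum(error(z) for z in margins)


# Hypothetical margins: two confident/correct, two wrong.
margins = [2.0, 0.5, -0.5, -2.0]
print(total_error(margins, zero_one_error), total_error(margins, exp_error))
```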

Page 4:

Feature selection (AdaBoost)

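$$\begin{aligned}
E(f_m) &= \sum_{n=1}^{N} \mathrm{error}\big(t_n f_m(x_n)\big) = \sum_{n=1}^{N} \exp\big(-t_n f_m(x_n)\big) \\
&= \sum_{n=1}^{N} \exp\Big(-t_n f_{m-1}(x_n) - \tfrac{1}{2} t_n \alpha_m y_m(x_n)\Big) \\
&= \sum_{n=1}^{N} w_n^{(m)} \exp\Big(-\tfrac{1}{2} t_n \alpha_m y_m(x_n)\Big)
\end{aligned}$$

$w_n^{(m)} = \exp(-t_n f_{m-1}(x_n))$ is low if $f_{m-1}(x)$ is correct for $x_n$ (so that $t_n f_{m-1}(x_n) > 0$) and high otherwise. Next we want to find $\alpha_m$ and $y_m(x)$ to minimize this weighted error function.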

Page 5:

Feature selection (AdaBoost)

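$$\begin{aligned}
E(f_m) &= \sum_{n=1}^{N} w_n^{(m)} \exp\Big(-\tfrac{1}{2} t_n \alpha_m y_m(x_n)\Big) \\
&= \sum_{n=1}^{N} w_n^{(m)} \Big[ \big(t_n \ne y_m(x_n)\big) \;?\; \exp\big(\tfrac{\alpha_m}{2}\big) : \exp\big(-\tfrac{\alpha_m}{2}\big) \Big] \\
&= \sum_{n=1}^{N} w_n^{(m)} \Big[ \mathrm{True}\big(t_n \ne y_m(x_n)\big) \Big(\exp\big(\tfrac{\alpha_m}{2}\big) - \exp\big(-\tfrac{\alpha_m}{2}\big)\Big) + \exp\big(-\tfrac{\alpha_m}{2}\big) \Big] \\
&= \Big(\exp\big(\tfrac{\alpha_m}{2}\big) - \exp\big(-\tfrac{\alpha_m}{2}\big)\Big) \sum_{n=1}^{N} w_n^{(m)} \mathrm{True}\big(t_n \ne y_m(x_n)\big) + \exp\big(-\tfrac{\alpha_m}{2}\big) \sum_{n=1}^{N} w_n^{(m)}
\end{aligned}$$

Recall $t_n \in \{-1, +1\}$ and $y_m(x) \in \{-1, +1\}$.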

Page 6:

Feature selection (AdaBoost)

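$$E(f_m) = \Big(\exp\big(\tfrac{\alpha_m}{2}\big) - \exp\big(-\tfrac{\alpha_m}{2}\big)\Big) \sum_{n=1}^{N} w_n^{(m)} \mathrm{True}\big(t_n \ne y_m(x_n)\big) + \exp\big(-\tfrac{\alpha_m}{2}\big) \sum_{n=1}^{N} w_n^{(m)}$$

Find $y_m(x)$ to minimize

$$\sum_{n=1}^{N} w_n^{(m)} \mathrm{True}\big(t_n \ne y_m(x_n)\big)$$

Calculate weighted error rate for $y_m(x)$:

$$\epsilon_m = \frac{\sum_{n=1}^{N} w_n^{(m)} \mathrm{True}\big(t_n \ne y_m(x_n)\big)}{\sum_{n=1}^{N} w_n^{(m)}}$$

Find $\alpha_m$ to minimize

$$\Big(\exp\big(\tfrac{\alpha_m}{2}\big) - \exp\big(-\tfrac{\alpha_m}{2}\big)\Big) \epsilon_m + \exp\big(-\tfrac{\alpha_m}{2}\big)$$

$$\alpha_m = \log \frac{1 - \epsilon_m}{\epsilon_m}$$

$$\epsilon_m < 0.5, \quad \alpha_m > 0$$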
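The closed form for alpha_m can be checked numerically. The sketch below computes the weighted error rate for one hypothetical weak classifier, derives alpha_m from it, and evaluates the per-unit-weight objective at and around that value. The labels, predictions, and uniform weights are invented for illustration.

```python
import math


def weighted_error(predictions, targets, weights):
    """epsilon_m = sum_n w_n * [t_n != y_m(x_n)] / sum_n w_n."""
    wrong = sum(w for p, t, w in zip(predictions, targets, weights) if p != t)
    return wrong / sum(weights)


def alpha_from_error(eps):
    """alpha_m = log((1 - eps) / eps); positive whenever eps < 0.5."""
    return math.log((1.0 - eps) / eps)


def objective(alpha, eps):
    """(e^{alpha/2} - e^{-alpha/2}) * eps + e^{-alpha/2}, as on the slide."""
    return (math.exp(alpha / 2) - math.exp(-alpha / 2)) * eps + math.exp(-alpha / 2)


targets     = [+1, +1, -1, -1, +1]
predictions = [+1, -1, -1, -1, +1]   # hypothetical outputs of one candidate y_m
weights     = [0.2, 0.2, 0.2, 0.2, 0.2]

eps = weighted_error(predictions, targets, weights)
alpha = alpha_from_error(eps)
print(eps, alpha, objective(alpha, eps))
```

One of five equally weighted examples is misclassified, so eps is about 0.2 and alpha is about log 4. Nudging alpha in either direction increases the objective.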

Page 7:

Feature selection (AdaBoost)

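Update weight $w_n^{(m+1)} = \exp(-t_n f_m(x_n))$

$$\begin{aligned}
w_n^{(m+1)} &= \exp\big(-t_n f_m(x_n)\big) = \exp\Big(-t_n f_{m-1}(x_n) - \tfrac{1}{2} t_n \alpha_m y_m(x_n)\Big) \\
&= w_n^{(m)} \exp\Big(-\tfrac{1}{2} t_n \alpha_m y_m(x_n)\Big)
\end{aligned}$$

Note $t_n y_m(x_n) = 1 - 2\,\mathrm{True}\big(y_m(x_n) \ne t_n\big)$

$$\begin{aligned}
w_n^{(m+1)} &= w_n^{(m)} \exp\Big(-\frac{\alpha_m}{2}\Big) \exp\Big(\alpha_m \mathrm{True}\big(y_m(x_n) \ne t_n\big)\Big) \\
&\propto w_n^{(m)} \exp\Big(\alpha_m \mathrm{True}\big(y_m(x_n) \ne t_n\big)\Big)
\end{aligned}$$

Only need to update weight for incorrectly classified data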
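Putting the selection, alpha, and weight-update steps together gives a compact AdaBoost loop. The sketch below is a toy under stated assumptions: the weak classifiers are one-dimensional threshold stumps rather than rectangle filters, the data set is invented, and the clamp on eps is a guard added for this example.

```python
import math


def stump(threshold, polarity):
    """Weak classifier: y(x) = polarity if x > threshold, else -polarity."""
    return lambda x: polarity if x > threshold else -polarity


def weighted_error(y, xs, ts, weights):
    """epsilon = sum_n w_n [y(x_n) != t_n] / sum_n w_n."""
    wrong = sum(w for x, t, w in zip(xs, ts, weights) if y(x) != t)
    return wrong / sum(weights)


def adaboost(xs, ts, candidates, rounds):
    weights = [1.0] * len(xs)       # w_n^(1) = exp(-t_n * 0) = 1
    ensemble = []                   # (alpha_m, y_m) pairs
    for _ in range(rounds):
        # Choose the candidate with the smallest weighted error rate.
        y_m = min(candidates, key=lambda y: weighted_error(y, xs, ts, weights))
        eps = weighted_error(y_m, xs, ts, weights)
        if eps >= 0.5:
            break                   # no candidate is better than chance
        eps = max(eps, 1e-12)       # assumption: clamp to avoid log(1/0)
        alpha = math.log((1.0 - eps) / eps)
        ensemble.append((alpha, y_m))
        # Only misclassified examples change: w_n <- w_n * exp(alpha).
        weights = [w * math.exp(alpha) if y_m(x) != t else w
                   for x, t, w in zip(xs, ts, weights)]
    return ensemble


def predict(ensemble, x):
    """Sign of f_M(x) = 1/2 * sum_m alpha_m y_m(x)."""
    f = 0.5 * sum(alpha * y(x) for alpha, y in ensemble)
    return 1 if f > 0 else -1


xs = [1, 2, 3, 4, 5, 6]
ts = [+1, +1, -1, -1, +1, +1]       # not separable by any single stump
candidates = [stump(th + 0.5, pol) for th in range(7) for pol in (+1, -1)]
model = adaboost(xs, ts, candidates, rounds=3)
print([predict(model, x) for x in xs])
```

No single stump fits this data, but three boosted stumps do, because each round reweights the examples the previous stumps got wrong.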

Page 8:

Viola/Jones: handling scale

Smallest Scale

Larger Scale

50,000 Locations/Scales
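The number of sub-windows grows quickly once every location is visited at every scale. The sketch below enumerates square windows by growing the detector window rather than shrinking the image. The base size of 24 pixels, the scale step of 1.25, and the shift rule are illustrative assumptions and are not stated on the slide.

```python
def sliding_windows(img_w, img_h, base=24, scale_step=1.25, shift=1.0):
    """Yield (x, y, size) for square windows at every scale and location."""
    scale = 1.0
    while True:
        size = int(round(base * scale))
        if size > img_w or size > img_h:
            break
        # Assumption: the shift between windows grows with the scale.
        step = max(1, int(round(shift * scale)))
        for y in range(0, img_h - size + 1, step):
            for x in range(0, img_w - size + 1, step):
                yield x, y, size
        scale *= scale_step


windows = list(sliding_windows(48, 48))
print(len(windows), sorted({size for _, _, size in windows}))
```

Calling `sliding_windows(384, 288)` enumerates the windows for an image of the size quoted in the results; the exact count depends on the assumed step and scale parameters.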

Page 9:

Cascaded Classifier

[Figure: IMAGE SUB-WINDOW → 1 Feature → 50% → 5 Features → 20% → 20 Features → 2% → FACE; an F (fail) exit at each stage leads to NON-FACE]

•  first classifier: 100% detection, 50% false positives.

•  second classifier: 100% detection, 40% false positives (20% cumulative), using data from previous stage.

•  third classifier: 100% detection, 10% false positive rate (2% cumulative)

•  Put cheaper classifiers up front
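The cumulative rates quoted above are products of the per-stage false positive rates, and the speed-up comes from rejecting a window at the first failing stage. A minimal sketch of both ideas follows; the stage functions are hypothetical stand-ins (simple score thresholds), not trained classifiers.

```python
def cumulative_false_positive_rates(stage_fp_rates):
    """Fraction of non-face windows still alive after each stage."""
    alive, out = 1.0, []
    for fp in stage_fp_rates:
        alive *= fp
        out.append(alive)
    return out


def cascade_classify(window, stages):
    """Run stages in order; reject as soon as one stage says non-face.

    Returns (is_face, number_of_stages_evaluated).
    """
    for n_evaluated, stage in enumerate(stages, start=1):
        if not stage(window):
            return False, n_evaluated
    return True, len(stages)


rates = cumulative_false_positive_rates([0.5, 0.4, 0.1])

# Hypothetical stages, cheapest first; each "window" is just a score here.
stages = [lambda w: w > 0, lambda w: w > 5, lambda w: w > 10]
print(rates, cascade_classify(-1, stages), cascade_classify(12, stages))
```

A window rejected by the first stage costs a single cheap evaluation, which is why the cheapest classifiers are placed up front.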

Page 10:

Viola/Jones results:

Run-time: 15 fps (384x288 pixel image on a 700 MHz Pentium III)

Page 11:

Application

Smart cameras: auto focus, red eye removal, auto color correction

Page 12:

Application

Lexus LS600 Driver Monitor System

Page 13:

Pedestrian Detection: Chamfer matching

Gavrila & Philomin ICCV 1999

[Figure panels: Input Image, Edge Detection, Distance Transform, Template, Best Match]

Slides from K. Grauman and B. Leibe
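The pipeline named in the figure (edge map, distance transform, template, best match) can be sketched on a toy grid. This is an illustrative simplification, not the Gavrila & Philomin implementation: the distance transform is brute force, the score is the mean distance under the template's edge points, and the edge map and template are invented.

```python
import math


def distance_transform(edges):
    """Brute-force Euclidean distance from every pixel to the nearest edge pixel."""
    pts = [(x, y) for y, row in enumerate(edges) for x, v in enumerate(row) if v]
    return [[min(math.hypot(x - ex, y - ey) for ex, ey in pts)
             for x in range(len(edges[0]))]
            for y in range(len(edges))]


def chamfer_score(dt, template_pts, ox, oy):
    """Mean distance-transform value under the template shifted by (ox, oy)."""
    return sum(dt[y + oy][x + ox] for x, y in template_pts) / len(template_pts)


def best_match(dt, template_pts, tw, th):
    """Offset (ox, oy) with the lowest chamfer score over all valid placements."""
    h, w = len(dt), len(dt[0])
    offsets = [(ox, oy) for oy in range(h - th + 1) for ox in range(w - tw + 1)]
    return min(offsets, key=lambda o: chamfer_score(dt, template_pts, o[0], o[1]))


# Hypothetical 6x6 edge map with a short vertical edge at x = 4, y = 2..4.
edges = [[0] * 6 for _ in range(6)]
for y in (2, 3, 4):
    edges[y][4] = 1

template = [(0, 0), (0, 1), (0, 2)]   # a vertical 3-pixel template
dt = distance_transform(edges)
match = best_match(dt, template, tw=1, th=3)
print(match, chamfer_score(dt, template, *match))
```

Because the distance transform is computed once per image, scoring each template placement only needs lookups at the template's edge points.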

Page 14:

Pedestrian Detection: Chamfer matching

Hierarchy of templates

Gavrila & Philomin ICCV 1999 Slides from K. Grauman and B. Leibe

Page 15:

Pedestrian Detection: HOG Feature

Slides from Andrew Zisserman

Page 16:

Pedestrian Detection: HOG Feature

Dalal & Triggs, CVPR 2005 Slides from Andrew Zisserman

HOG: Histogram of Oriented Gradients

Page 17:

Pedestrian Detection: HOG Feature

Dalal & Triggs, CVPR 2005

Map each grid cell in the input window to a gradient-orientation histogram weighted by gradient magnitude

Code: http://pascal.inrialpes.fr/soft/olt

Slides from K. Grauman and B. Leibe
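The per-cell mapping described above can be sketched as follows. The choices here are assumptions made for the example: 9 unsigned orientation bins over [0, 180) degrees, central differences on interior pixels only, and hard binning without interpolation or block normalization. The function name `cell_histogram` is hypothetical.

```python
import math


def cell_histogram(cell, n_bins=9):
    """Gradient-orientation histogram of one cell, weighted by gradient magnitude."""
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(1, len(cell) - 1):
        for x in range(1, len(cell[0]) - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]      # horizontal central difference
            gy = cell[y + 1][x] - cell[y - 1][x]      # vertical central difference
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0   # unsigned orientation
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist


# Hypothetical 4x4 cell containing a vertical step edge (dark left, bright right).
cell = [[0, 0, 10, 10] for _ in range(4)]
hist = cell_histogram(cell)
print(hist)
```

All four interior pixels see a purely horizontal gradient of magnitude 10, so the whole vote lands in the first orientation bin.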

Page 18:

Pedestrian Detection: HOG Feature

Slides from Andrew Zisserman

Page 19:

Pedestrian Detection: HOG Feature

Slides from Andrew Zisserman

Page 20:

Algorithm

Slides from Andrew Zisserman

Page 21:

Model training using SVM

•  Given $\{x_i \in \mathbb{R}^d,\ y_i \in \{-1, +1\}\}$

•  Find $f(x) = w^T x + b$

•  To minimize

$$\min_{w,b} \; \|w\|^2 + C \sum_{i=1}^{N} \mathrm{error}\big(y_i f(x_i)\big)$$

$$\mathrm{error}(z) = \max(0, 1 - z)$$
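The objective can be evaluated directly for a candidate (w, b). The sketch below assumes labels are encoded as -1/+1, which is what the hinge error on y_i f(x_i) requires. The two data points and the value C = 1 are invented for illustration, and no optimizer is included.

```python
def hinge(z):
    """error(z) = max(0, 1 - z)."""
    return max(0.0, 1.0 - z)


def svm_objective(w, b, xs, ys, C):
    """||w||^2 + C * sum_i hinge(y_i * (w^T x_i + b)), with y_i in {-1, +1}."""
    norm_sq = sum(wj * wj for wj in w)
    loss = sum(hinge(y * (sum(wj * xj for wj, xj in zip(w, x)) + b))
               for x, y in zip(xs, ys))
    return norm_sq + C * loss


# Hypothetical two-point training set in R^2.
xs = [(2.0, 0.0), (-2.0, 0.0)]
ys = [+1, -1]

print(svm_objective((1.0, 0.0), 0.0, xs, ys, C=1.0))
print(svm_objective((0.25, 0.0), 0.0, xs, ys, C=1.0))
```

With w = (1, 0) both points have margin 2, so only the norm term contributes. Shrinking w to (0.25, 0) lowers the norm term but pushes both points inside the margin, so the hinge term takes over.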

Page 22:

Result

Page 23:

Learned model

Slides from Deva Ramanan

Page 24:

Meaning of negative weights

$$w x > -b$$

$$(w_+ - w_-)\, x > -b$$

$$w_+ x - w_- x > -b$$

Slides from Deva Ramanan

Complete model should compete pedestrian/pillar/doorway
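One way to read the decomposition is to take w_+ and w_- as the positive and negative parts of the learned weight vector, so the score becomes a "pedestrian" template response minus an "anti-pedestrian" one. That reading is an assumption of this sketch, and the weights and feature vector are invented.

```python
def split_weights(w):
    """Split w into non-negative parts so that w = w_pos - w_neg."""
    w_pos = [max(wi, 0.0) for wi in w]
    w_neg = [max(-wi, 0.0) for wi in w]
    return w_pos, w_neg


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


w = [0.5, -1.5, 2.0]          # hypothetical learned weights
x = [2.0, 1.0, 0.0]           # hypothetical non-negative feature vector
w_pos, w_neg = split_weights(w)
print(dot(w, x), dot(w_pos, x), dot(w_neg, x))
```

The two responses compete: the window is accepted only when the positive template's response exceeds the negative template's response by more than -b.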

Page 25:

Faces and Pedestrians

•  Relatively easier, but can still be confusing

Slide credit: Lana Lazebnik

Page 26:

More difficult cases

Page 27:

In general

•  classify every pixel