Viola/Jones: features
“Rectangle filters” Differences between sums of pixels in adjacent rectangles
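Weak classifier: $y_t(x) = \begin{cases} +1 & \text{if } h_t(x) > \theta_t \\ -1 & \text{otherwise} \end{cases}$
60,000 × 100 = 6,000,000 unique features
Strong classifier: $Y(x) = \sum_t \alpha_t y_t(x)$
Detection: $\begin{cases} \text{face} & \text{if } Y(x) > 0 \\ \text{non-face} & \text{otherwise} \end{cases}$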
Robust Real-Time Face Detection, IJCV 2004, Viola and Jones
Select 200 features by AdaBoost
Integral Image (aka. summed area table)
• Define the Integral Image
• Any rectangular sum can be computed in constant time:
• Rectangle features can be computed as differences between rectangles
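$$I'(x, y) = \sum_{x' \le x,\; y' \le y} I(x', y')$$
Here 1, 2, 3, 4 denote the integral-image values at the four corners of rectangle D:
$$D = 1 + 4 - (2 + 3) = A + (A + B + C + D) - (A + C + A + B) = D$$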
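The constant-time rectangle sum can be sketched in a few lines of plain Python. This is only an illustration: the function names are hypothetical, the table is padded with an extra zero row and column (so `ii[y + 1][x + 1]` holds the slide's I'(x, y)), and the left-minus-right layout in `two_rect_feature` is just one assumed rectangle filter.

```python
def integral_image(img):
    """Return a padded table: ii[y][x] = sum of img rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle with top-left pixel (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Left rectangle minus the adjacent right rectangle (assumed layout)."""
    return rect_sum(ii, x, y, w, h) - rect_sum(ii, x + w, y, w, h)


img = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
ii = integral_image(img)
center = rect_sum(ii, 1, 1, 2, 2)            # pixels 6, 7, 10, 11
feature = two_rect_feature(ii, 0, 0, 2, 3)   # left two columns minus right two
```

Building the table costs one pass over the image; after that every rectangle sum costs four lookups regardless of rectangle size.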
Feature selection (AdaBoost)
Given training data {xn,tn}, find {αt} for {yt(x)} by minimizing total error function:
$$E\Big(Y = \sum_{t=1}^{M} \alpha_t y_t(x)\Big) = \sum_{n=1}^{N} \text{error}\big(t_n Y(x_n)\big)$$
The ideal 0-1 loss, error(z) = (z > 0 ? 0 : 1), is hard to optimize. Instead use error(z) = exp(-z), which makes the optimization convex.
Define
$$f_m(x) = \frac{1}{2} \sum_{l=1}^{m} \alpha_l y_l(x)$$
Basic idea: first find $f_1(x)$ by minimizing $E(f_1)$. Then, given $f_{m-1}(x)$, find $f_m(x)$ by searching for the best $\alpha_m$ and $y_m(x)$.
Feature selection (AdaBoost)
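$$\begin{aligned}
E(f_m) &= \sum_{n=1}^{N} \text{error}\big(t_n f_m(x_n)\big) = \sum_{n=1}^{N} \exp\big(-t_n f_m(x_n)\big) \\
&= \sum_{n=1}^{N} \exp\Big(-t_n f_{m-1}(x_n) - \tfrac{1}{2} t_n \alpha_m y_m(x_n)\Big) \\
&= \sum_{n=1}^{N} w_n^{(m)} \exp\Big(-\tfrac{1}{2} t_n \alpha_m y_m(x_n)\Big)
\end{aligned}$$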
$w_n^{(m)} = \exp(-t_n f_{m-1}(x_n))$ is low if $f_{m-1}(x)$ is correct for $x_n$ and high otherwise, so misclassified samples carry more weight. Next we want to find $\alpha_m$ and $y_m(x)$ to minimize this weighted error function.
Feature selection (AdaBoost)
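$$\begin{aligned}
E(f_m) &= \sum_{n=1}^{N} w_n^{(m)} \exp\Big(-\tfrac{1}{2} t_n \alpha_m y_m(x_n)\Big) \\
&= \sum_{n=1}^{N} w_n^{(m)} \begin{cases} \exp(\alpha_m/2) & \text{if } t_n \ne y_m(x_n) \\ \exp(-\alpha_m/2) & \text{otherwise} \end{cases} \\
&= \sum_{n=1}^{N} w_n^{(m)} \Big[ \text{True}\big(t_n \ne y_m(x_n)\big) \big(\exp(\alpha_m/2) - \exp(-\alpha_m/2)\big) + \exp(-\alpha_m/2) \Big] \\
&= \big(\exp(\alpha_m/2) - \exp(-\alpha_m/2)\big) \sum_{n=1}^{N} w_n^{(m)}\, \text{True}\big(t_n \ne y_m(x_n)\big) + \exp(-\alpha_m/2) \sum_{n=1}^{N} w_n^{(m)}
\end{aligned}$$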
Recall $t_n \in \{-1, +1\}$ and $y_m(x) \in \{-1, +1\}$
Feature selection (AdaBoost)
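$$E(f_m) = \big(\exp(\alpha_m/2) - \exp(-\alpha_m/2)\big) \sum_{n=1}^{N} w_n^{(m)}\, \text{True}\big(t_n \ne y_m(x_n)\big) + \exp(-\alpha_m/2) \sum_{n=1}^{N} w_n^{(m)}$$
Find $y_m(x)$ to minimize
$$\sum_{n=1}^{N} w_n^{(m)}\, \text{True}\big(t_n \ne y_m(x_n)\big)$$
Calculate weighted error rate for $y_m(x)$
$$\varepsilon_m = \frac{\sum_{n=1}^{N} w_n^{(m)}\, \text{True}\big(t_n \ne y_m(x_n)\big)}{\sum_{n=1}^{N} w_n^{(m)}}$$
Find $\alpha_m$ to minimize
$$\big(\exp(\alpha_m/2) - \exp(-\alpha_m/2)\big)\, \varepsilon_m + \exp(-\alpha_m/2)$$
$$\alpha_m = \log \frac{1 - \varepsilon_m}{\varepsilon_m}$$
When $\varepsilon_m < 0.5$, $\alpha_m > 0$.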
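The closed form for $\alpha_m$ follows from setting the derivative of the objective in $\alpha_m$ to zero. A short sketch of that step, where $g$ is a name introduced here only for the objective:

```latex
\begin{aligned}
g(\alpha_m) &= \big(\exp(\alpha_m/2) - \exp(-\alpha_m/2)\big)\,\varepsilon_m + \exp(-\alpha_m/2) \\
\frac{dg}{d\alpha_m} &= \tfrac{1}{2}\big(\exp(\alpha_m/2) + \exp(-\alpha_m/2)\big)\,\varepsilon_m - \tfrac{1}{2}\exp(-\alpha_m/2) = 0 \\
\Rightarrow\quad \big(\exp(\alpha_m) + 1\big)\,\varepsilon_m &= 1
\quad\Rightarrow\quad \alpha_m = \log\frac{1 - \varepsilon_m}{\varepsilon_m}
\end{aligned}
```

The second line is multiplied by $2\exp(\alpha_m/2)$ to reach the third.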
Feature selection (AdaBoost)
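Update weight $w_n^{(m+1)} = \exp(-t_n f_m(x_n))$
$$\begin{aligned}
w_n^{(m+1)} &= \exp\big(-t_n f_m(x_n)\big) = \exp\Big(-t_n f_{m-1}(x_n) - \tfrac{1}{2} t_n \alpha_m y_m(x_n)\Big) \\
&= w_n^{(m)} \exp\Big(-\tfrac{1}{2} t_n \alpha_m y_m(x_n)\Big)
\end{aligned}$$
Note $t_n y_m(x_n) = 1 - 2\,\text{True}\big(y_m(x_n) \ne t_n\big)$
$$\begin{aligned}
w_n^{(m+1)} &= w_n^{(m)} \exp\Big(-\frac{\alpha_m}{2}\Big) \exp\Big(\alpha_m\, \text{True}\big(y_m(x_n) \ne t_n\big)\Big) \\
&\propto w_n^{(m)} \exp\Big(\alpha_m\, \text{True}\big(y_m(x_n) \ne t_n\big)\Big)
\end{aligned}$$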
Only need to update weight for incorrectly classified data
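One boosting round (pick the weak classifier with the lowest weighted error, compute its error rate and weight, re-weight the misclassified samples) can be sketched as follows. This is a minimal illustration, not the Viola/Jones implementation: the weak learners are assumed to be threshold stumps on precomputed feature values, the exhaustive search, the polarity flag and all function names are hypothetical, and the toy data is made up for the example.

```python
import math


def stump_predict(x, feature, theta, polarity):
    """Weak classifier: +polarity if the feature value exceeds theta, else -polarity."""
    return polarity if x[feature] > theta else -polarity


def best_stump(X, t, w):
    """Exhaustively pick the stump with the smallest weighted error."""
    best = None
    for feature in range(len(X[0])):
        for theta in sorted({x[feature] for x in X}):
            for polarity in (+1, -1):
                err = sum(wn for x, tn, wn in zip(X, t, w)
                          if stump_predict(x, feature, theta, polarity) != tn)
                if best is None or err < best[0]:
                    best = (err, feature, theta, polarity)
    return best


def adaboost(X, t, rounds):
    """Return a list of (alpha, feature, theta, polarity) weak classifiers."""
    w = [1.0] * len(X)                          # w_n^(1) = exp(0) = 1
    model = []
    for _ in range(rounds):
        err, feature, theta, polarity = best_stump(X, t, w)
        eps = err / sum(w)                      # weighted error rate
        eps = min(max(eps, 1e-10), 1 - 1e-10)   # guard against log(0)
        alpha = math.log((1 - eps) / eps)
        model.append((alpha, feature, theta, polarity))
        # Only misclassified samples are re-weighted (up to a common factor).
        w = [wn * math.exp(alpha)
             if stump_predict(x, feature, theta, polarity) != tn else wn
             for x, tn, wn in zip(X, t, w)]
    return model


def strong_classify(model, x):
    """Sign of Y(x) = sum_t alpha_t * y_t(x)."""
    score = sum(a * stump_predict(x, f, th, p) for a, f, th, p in model)
    return 1 if score > 0 else -1


# Toy 1-D data that no single stump can separate.
X = [[0], [1], [2], [3], [4], [5]]
t = [+1, +1, -1, -1, +1, +1]
model = adaboost(X, t, rounds=3)
predictions = [strong_classify(model, x) for x in X]
```

On this toy set each single stump misclassifies at least two samples, while the weighted vote of three stumps classifies all six correctly.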
Viola/Jones: handling scale
Smallest Scale
Larger Scale
50,000 Locations/Scales
Cascaded Classifier
[Figure: cascade diagram. IMAGE SUB-WINDOW → 1 Feature → (50%) → 5 Features → (20%) → 20 Features → (2%) → FACE; a sub-window that fails (F) at any stage is rejected as NON-FACE.]
• first classifier: 100% detection, 50% false positives
• second classifier: 100% detection, 40% false positives (20% cumulative), using data from previous stage
• third classifier: 100% detection, 10% false positive rate (2% cumulative)
• Put cheaper classifiers up front
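The cumulative rates in the bullets above are products of the per-stage false positive rates, and the cascade rejects a sub-window at the first stage that fails. A small sketch of both ideas; the function names are hypothetical, and the stage classifiers here are stand-in threshold tests on a single score rather than real boosted classifiers.

```python
def cumulative_false_positive_rates(stage_fp_rates):
    """Fraction of non-face windows still alive after each stage."""
    rates, alive = [], 1.0
    for fp in stage_fp_rates:
        alive *= fp
        rates.append(alive)
    return rates


def run_cascade(stages, window):
    """Return True only if every stage accepts; reject at the first failure."""
    for stage in stages:
        if not stage(window):
            return False   # NON-FACE: later, costlier stages never run
    return True            # FACE


rates = cumulative_false_positive_rates([0.5, 0.4, 0.1])

# Stand-in stages, ordered from cheapest/loosest to costliest/strictest.
stages = [lambda s: s > 0.1, lambda s: s > 0.5, lambda s: s > 0.9]
is_face = run_cascade(stages, 0.95)
is_rejected_early = not run_cascade(stages, 0.3)
```

Because most sub-windows are non-faces, most of them exit after the first one or two cheap stages, which is why the cheaper classifiers go up front.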
Viola/Jones results:
Run-time: 15 fps (384x288 pixel image on a 700 MHz Pentium III)
Application
Smart cameras: auto focus, red eye removal, auto color correction
Application
Lexus LS600 Driver Monitor System
Pedestrian Detection: Chamfer matching
Gavrila & Philomin ICCV 1999
[Figure panels: Input Image, Edge Detection, Distance Transform, Template, Best Match]
Slides from K. Grauman and B. Leibe
Pedestrian Detection: Chamfer matching
Hierarchy of templates
Gavrila & Philomin ICCV 1999 Slides from K. Grauman and B. Leibe
Pedestrian Detection: HOG Feature
Slides from Andrew Zisserman
Pedestrian Detection: HOG Feature
Dalal & Triggs, CVPR 2005 Slides from Andrew Zisserman
HOG: Histogram of Oriented Gradients
Pedestrian Detection: HOG Feature
Dalal & Triggs, CVPR 2005
Map each grid cell in the input window to a gradient-orientation histogram weighted by gradient magnitude
Code: http://pascal.inrialpes.fr/soft/olt
Slides from K. Grauman and B. Leibe
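The per-cell mapping (gradient orientation histogram, each vote weighted by gradient magnitude) can be sketched for a single cell as below. The details are assumptions made for illustration and are not taken from the slides: 9 bins over unsigned orientations in [0°, 180°), central-difference gradients on interior pixels only, hard binning without interpolation, and no block normalization. The function name is hypothetical.

```python
import math


def cell_histogram(cell, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell."""
    h, w = len(cell), len(cell[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]   # central differences
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist


# Intensity ramps along x or y: all gradients share one orientation.
ramp_x = [[x for x in range(4)] for _ in range(4)]
ramp_y = [[y] * 4 for y in range(4)]
hist_x = cell_histogram(ramp_x)
hist_y = cell_histogram(ramp_y)
```

For the horizontal ramp all the gradient mass lands in the first bin; for the vertical ramp it lands in the bin containing 90°. The full descriptor would concatenate such histograms over all cells of the window.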
Algorithm
Slides from Andrew Zisserman
Model training using SVM
• Given $\{x_i \in \mathbb{R}^d,\ y_i \in \{-1, +1\}\}$
• Find $f(x) = w^T x + b$
• To minimize $\min_{w,b} \|w\|^2 + C \sum_{i=1}^{N} \text{error}\big(y_i f(x_i)\big)$, where $\text{error}(z) = \max(0, 1 - z)$
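The objective can be written out directly and minimized with plain subgradient descent. This is only a sketch under stated assumptions: labels are encoded as -1/+1 (the hinge loss on $y_i f(x_i)$ needs signed labels), the optimizer, learning rate and epoch count are arbitrary choices rather than anything from the slides, and the function names and toy data are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def hinge(z):
    """error(z) = max(0, 1 - z)."""
    return max(0.0, 1.0 - z)


def objective(w, b, X, y, C):
    """||w||^2 + C * sum_i error(y_i * (w.x_i + b))."""
    reg = sum(wj * wj for wj in w)
    loss = sum(hinge(yi * (dot(w, xi) + b)) for xi, yi in zip(X, y))
    return reg + C * loss


def train(X, y, C=1.0, lr=0.01, epochs=200):
    """Plain subgradient descent on the objective (an assumed, simple optimizer)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w = [2.0 * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1.0:     # margin violated: hinge is active
                grad_w = [g - C * yi * xj for g, xj in zip(grad_w, xi)]
                grad_b -= C * yi
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Toy, linearly separable data with labels in {-1, +1}.
X = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]]
y = [1, 1, -1, -1]
w, b = train(X, y)
margins = [yi * (dot(w, xi) + b) for xi, yi in zip(X, y)]
```

In the pedestrian detector each $x_i$ would be the HOG descriptor of a training window; a dedicated SVM solver would normally replace this loop.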
Result
Learned model
Slides from Deva Ramanan
Meaning of negative weights: $w x > -b \;\Leftrightarrow\; (w_+ - w_-)\,x > -b \;\Leftrightarrow\; w_+ x - w_- x > -b$
Slides from Deva Ramanan
Complete model should compete pedestrian/pillar/doorway
Faces and Pedestrians
• Relatively easier, but can still be confusing
Slide credit: Lana Lazebnik
More difficult cases
In general
• classify every pixel