
Face Detection and Recognition
Lecturer: Michele Nappi
biplab.unisa.it
089-963334

The Face: A Biometric at the Limit (?)
Pareidolia (from the Greek εἴδωλον èidōlon, "image", with the prefix παρά parà, "near") is the subconscious illusion that tends to map randomly shaped objects or profiles (natural or artificial) onto familiar forms.

A famous case of pareidolia: the Face on Mars, a Martian rock formation photographed by the Viking 1 probe, which looks like a face under particular lighting conditions and viewing angles.

The Face: A Biometric at the Limit (?)
• Prosopagnosia is an acquired or congenital perceptual deficit of the central nervous system that prevents affected individuals from correctly recognizing people's faces.

Examples of pareidolia

Why the Face
• The two most important factors determining the success of a biometric are Reliability and Acceptability.
• Iris recognition is the most reliable system, but it is also the most intrusive. Fingerprints are more easily accepted, but they cannot be applied to non-cooperative subjects.
• The face has very high acceptability, while its reliability still needs to improve.

Further advantages
• It is natural to recognize a person from the face, and it is common for people to be photographed (fingerprints, by contrast, may carry the unpleasant feeling of being suspected of a crime).
• High recognition rate in controlled conditions.
• Acquisition devices are easy to deploy in the operational environment.
• It can be integrated into remote surveillance applications.
• It can be integrated into logical access control (login).

… but the face is a complex object!
(Facial Animation Parameters)

Comparisons
From http://www.cs.bham.ac.uk/~mdr/teaching/module/security/lectures/biometric.html
Classical properties

Comparisons
From http://www.biometricsintegrated.com/biometric-types-comparison.html/biometric-comparison-chart
Further properties
Note the Hygiene level, which relates to how the trait is acquired (whether physical contact with a sensor is required).

Comparisons
From http://www.ssbs.com/sasolutions/medical-solution/sproducts-client-secure.aspx
It is interesting to notice the position of the face, which shares "good" properties with fingerprints.

Comparisons
Which is the most important dimension?
The face occupies an acceptable position when compared with the other traits.

Market
By Acuity Market Intelligence
(can you guess what their product is?)

Applications
iOS face recognition app –
FaceVault
Forensics
Border control
In a crowd!

Problems
Intra-personal variations.
Piero Fornasetti's template of Cavalieri's face gave birth to hundreds of variations.
http://www.artistdaily.com/blogs/artistdaily/archive/2013/04/22/350-ways-of-drawing-faces.aspx
Inter-personal similarities.
Bradley Cooper & Hrithik Roshan
http://www.mensxp.com/entertainment/gossip/7356-have-we-met-before-bollywoodhollywood-lookalikes.html

In detail: PIE (Pose, Illumination, Expression) variations
http://www.csee.wvu.edu/~gidoretto/courses/2011-fall-cp/assignments/final_project/results/cao_chen/index.html

In detail: PIE (Pose, Illumination, Expression) variations + Ageing = A-PIE

Popular databases
Very popular public face databases:
• AR-Faces
• FERET
• MIT
• ORL
• Harvard
• MIT/CMU
• CMU test set II
• …

Popular databases

Popular databases: FERET

Popular databases: AR-Faces
(1998)
126 people (over 4,000 color images).
Different facial expressions, illumination conditions and
occlusions.
Two sessions per person (2 different days).
1 : Neutral expression
2 : Smile
3 : Anger
4 : Scream
5 : left light on
6 : right light on
7 : all side lights on
8 : wearing sun glasses
9 : wearing sun glasses and left light on
10 : wearing sun glasses and right light on
11 : wearing scarf
12 : wearing scarf and left light on
13 : wearing scarf and right light on
14 to 26 : second session (same as 1 to 13)
http://wwwsipl.technion.ac.il/new/DataBases/Aleix%20Face%20Database.htm

Popular databases: CMU-PIE (2000)
• CMU Pose, Illumination, and Expression (PIE) database.
• 68 subjects.
• About 608 color pictures per subject (resolution 640×486 pixels) = 41,368 images in total.
• Each person is imaged under 13 different poses, 43 different illumination conditions, and with 4 different expressions.

Popular databases: CASIA 3D Face V1 (2004)
4,624 scans of 123 persons acquired with the non-contact 3D digitizer Minolta Vivid 910. Created in 2004.
http://biometrics.idealtest.org/dbDetailForUser.do?id=8

Popular databases: LFW (Labeled Faces in the Wild) (2007)
http://vis-www.cs.umass.edu/lfw/
The data set contains more
than 13,000 images of faces
collected from the web.
Each face has been labeled
with the name of the person
pictured. 1680 of the people
pictured have two or more
distinct photos in the data
set. The only constraint on
these faces is that they were
detected by the Viola-Jones
face detector.

Popular databases: YouTube faces
(2011)
http://www.cs.tau.ac.il/~wolf/ytfaces/
The data set contains 3,425 videos of 1,595 different people. All the videos were
downloaded from YouTube. An average of 2.15 videos are available for each subject. The
shortest clip duration is 48 frames, the longest clip is 6,070 frames, and the average length
of a video clip is 181.3 frames. The creation started by using the 5,749 names of subjects
included in the LFW data set to search YouTube for videos of these same individuals. The
top six results for each query were downloaded.
Number of videos per person:
#videos 1 2 3 4 5 6
#people 591 471 307 167 51 8

Verification and Identification
• Verification: one-to-one comparison. Confirms the identity claimed by an individual
  – The identity is claimed by means of an identity card, a user code, ...
• Identification: one-to-many comparison. Establishes the identity of a subject within a set of enrolled persons
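To make the one-to-one vs. one-to-many distinction concrete, here is a minimal sketch assuming feature vectors (templates) have already been extracted and that Euclidean distance is used as the matching score; the function names and the threshold value are illustrative.

```python
import numpy as np

def verify(probe, claimed_template, threshold=0.6):
    """1:1 comparison: accept or reject the claimed identity."""
    distance = np.linalg.norm(probe - claimed_template)
    return distance <= threshold

def identify(probe, gallery):
    """1:N comparison: return the identity of the closest enrolled template."""
    # gallery is a dict mapping identity -> enrolled template vector
    distances = {identity: np.linalg.norm(probe - template)
                 for identity, template in gallery.items()}
    return min(distances, key=distances.get)
```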

Structure of a Face Recognizer
• The first step of an automatic system is a face detector.
• Starting from the image, the region containing only the face is extracted.
• Finally, the extracted region is segmented, associating a semantic meaning with each area of the face (left eye, right eye, nose, mouth).
[Figure: face detection and face segmentation pipeline]

Face localization
• Problem: given a single image or a video sequence, detect the presence of one or more faces and locate their position within the image.
• Requirements: the method must be independent of position, orientation (pose), scale and expression, which may be DIFFERENT for different subjects in the image, as well as of illumination and cluttered background.
What to do here???

Face localization … possible to hide from it?
“According to Adam Harvey, the key part of the face that computers can read is the "nose bridge," or the area between the eyes. If you can obscure that, you have a good chance of tricking computers into thinking you don't have a face, he said. Another technique is to create an "anti-face," which is less terrifying than it sounds since it just means inverting your face's color scheme. So the black-and-white triangles on the cheeks aim to achieve this effect.” (from CNN - http://whatsnext.blogs.cnn.com/2012/04/29/how-to-hide-from-face-detection-technology/)
Adam Harvey proposes CV Dazzle (Computer Vision Dazzle) to avoid detection of one's face (http://ahprojects.com/projects/)

Face localization: some approaches
Feature-based techniques
These techniques make explicit use of knowledge about the expected face appearance, which is characterized by a set of features at different levels (listed below from lower to higher level):
• Pixel properties:
– Edges
– Skin color
• Face geometry properties:
– Constellation
– Feature searching
• Template matching against a standard model:
– Correlation
– Snakes
– Active Shape Models

Face localization: some approaches
Image-based techniques
These techniques address the localization problem as a generic pattern recognition problem (the class of faces against everything else).
The goal is to learn to recognize a face image from a number of examples.

Face localization: recent
advances
Localizing Parts of Faces Using a Consensus of Exemplars
from: http://neerajkumar.org/projects/face-parts/

Algorithm A: color space transformation
• RGB is not a perceptually uniform space: colors which are close to each other in RGB space may not be perceived as similar.
• Skin model: a set of close colors (a cluster) within the color space.
• It is therefore advisable to perform the detection in a different color space.
RGB color space
YCBCR color space
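As a minimal sketch of chroma-based skin detection under these assumptions, the image can be converted to YCrCb with OpenCV and the chroma channels thresholded; the bounds below are commonly quoted skin ranges, not necessarily the ones used in this algorithm.

```python
import cv2
import numpy as np

def skin_mask(bgr_image):
    """Return a binary mask of likely skin pixels using a YCrCb box model."""
    ycrcb = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2YCrCb)
    # Illustrative chroma bounds for skin (Cr in [133, 173], Cb in [77, 127]);
    # luma (Y) is left unconstrained, which is the point of working in YCbCr.
    lower = np.array([0, 133, 77], dtype=np.uint8)
    upper = np.array([255, 173, 127], dtype=np.uint8)
    mask = cv2.inRange(ycrcb, lower, upper)
    # Morphological opening to remove isolated false positives.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
```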

Color Models: YIQ

Eye/Mouth localization
• The chroma map is enhanced by histogram equalization.
• The two maps are combined through an AND operator.
• The resulting map undergoes dilation, masking and normalization to discard the other face regions and brighten the eyes.
• Further operations allow this map to be refined.
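A hedged sketch of the map-combination step, assuming two 8-bit grayscale eye maps (one chroma-based and one luma-based) have already been computed; binarization thresholds and kernel sizes are illustrative.

```python
import cv2

def combine_eye_maps(chroma_map, luma_map):
    """Combine two 8-bit eye maps (chroma- and luma-based) as outlined above."""
    # Enhance the chroma map with histogram equalization.
    chroma_eq = cv2.equalizeHist(chroma_map)
    # Binarize both maps and combine them with a pixel-wise AND.
    _, chroma_bin = cv2.threshold(chroma_eq, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    _, luma_bin = cv2.threshold(luma_map, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    combined = cv2.bitwise_and(chroma_bin, luma_bin)
    # Dilate to merge nearby blobs, then normalize so eye regions appear bright.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
    dilated = cv2.dilate(combined, kernel)
    return cv2.normalize(dilated, None, 0, 255, cv2.NORM_MINMAX)
```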

Face contour
The algorithm analyzes all the triangles composed of two candidate eyes and a candidate mouth.
Each triangle is verified by checking:
• Luma variations and the average orientation gradient of the blobs containing the eyes and mouth
• Geometry and orientation of the triangle
• Presence of a face contour around the triangle
A score is assigned to each triangle satisfying the conditions; the score also rewards upright orientation and symmetry.
The triangle with the highest score (above a threshold) is selected.
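A hedged geometric sketch of the triangle scoring, assuming candidate eye and mouth centers are given as (x, y) points; the tolerances and the exact scoring terms are illustrative and not those of the original algorithm.

```python
import numpy as np

def triangle_score(left_eye, right_eye, mouth):
    """Score an eyes-mouth triangle, preferring upright, roughly symmetric configurations."""
    left_eye, right_eye, mouth = map(np.asarray, (left_eye, right_eye, mouth))
    eye_dist = np.linalg.norm(right_eye - left_eye)
    if eye_dist < 1e-6:
        return 0.0
    # Roll: the eye line should be close to horizontal for an upright face.
    roll = abs(np.arctan2(right_eye[1] - left_eye[1], right_eye[0] - left_eye[0]))
    # Symmetry: the mouth should lie near the perpendicular bisector of the eyes.
    eye_center = (left_eye + right_eye) / 2.0
    asymmetry = abs(mouth[0] - eye_center[0]) / eye_dist
    # Proportions: the eye-to-mouth distance is roughly comparable to the eye distance.
    proportion_penalty = abs(np.linalg.norm(mouth - eye_center) / eye_dist - 1.0)
    return max(0.0, 1.0 - roll - asymmetry - proportion_penalty)
```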

Some results

Face Detection: Viola-Jones
• Paul Viola and Michael Jones proposed one of the most successful (so far) approaches to object localization (included in OpenCV).
• The algorithm is image-based and can be applied to face detection (but also to eye and mouth detection in a hierarchical strategy).
• The algorithm requires building a classifier that is initially trained using multiple instances of the class to identify (positive examples) and several instances of images that do not contain any object of the class but may cause an error (negative examples).
• Training extracts several features from the examples and selects the most discriminating ones. The statistical model, built incrementally, stores this information.
• Misses (a present object is not detected) and false alarms (an object is detected but is not present) can be reduced by retraining with new, suitable examples (positive or negative).
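Since the trained Viola–Jones face detector ships with OpenCV, a minimal usage sketch looks like the following; the cascade file is the standard frontal-face model distributed with opencv-python, and the image path is a placeholder.

```python
import cv2

# Load the pre-trained frontal-face Haar cascade bundled with OpenCV.
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
detector = cv2.CascadeClassifier(cascade_path)

image = cv2.imread("group_photo.jpg")           # placeholder test image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # detection works on grayscale

# scaleFactor and minNeighbors trade off misses against false alarms,
# mirroring the retraining discussion above.
faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(30, 30))

for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detected.jpg", image)
```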

Face Detection: Viola-Jones (II)
• Haar Features – All human faces share some similar
properties. This knowledge is used to construct certain
features known as Haar Features.
• The properties that are similar for a human face are:
– The eyes region is darker than the upper-cheeks.
– The nose bridge region is brighter than the eyes.
• That is useful domain knowledge:
– Location - Size: eyes & nose bridge region
– Value: darker / brighter
A Haar feature that resembles the bridge of the nose is applied to the face.
A Haar feature that resembles the eye region (which is darker than the upper cheeks) is applied to a face.

Face Detection: Viola-Jones (III)
• The value of any given feature is simply the sum of the pixels within the shaded (black) rectangles minus the sum of the pixels within the clear (white) rectangles.
• Rectangle features:
  – Value = Σ (pixels in black area) - Σ (pixels in white area)
• Three types: two-, three- and four-rectangle features; Viola & Jones used two-rectangle features
• For example: the difference in brightness between the white & black rectangles over a specific area
• Each feature is tied to a specific location in the detection sub-window
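A minimal sketch of how rectangle-feature values are computed in constant time via an integral image (summed-area table); the horizontal two-rectangle feature follows the Σ(black) − Σ(white) convention above, with the left half taken as the shaded area for illustration.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero row/column so lookups need no boundary checks."""
    ii = np.cumsum(np.cumsum(img.astype(np.int64), axis=0), axis=1)
    return np.pad(ii, ((1, 0), (1, 0)))

def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the rectangle with top-left (x, y), width w and height h."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def two_rect_feature_horizontal(ii, x, y, w, h):
    """Black (left half) minus white (right half): Σ black - Σ white."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

Each feature evaluation costs a fixed number of table lookups regardless of the rectangle size, which is what makes scanning every position and scale of the sub-window feasible.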

Example use of rectangular features
https://github.com/atduskgreg/Makematics/blob/master/research/haar_detection/index.md
An example of an early stage
in the Haar cascade. Each
black and white patch
represents a feature that the
algorithm hunts for in the
image.

Note: example of AdaBoost (illustrated step by step over several slides)
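As a hedged illustration of the boosting step (weak classifiers are thresholded single features, and misclassified examples are re-weighted at every round), here is a toy discrete AdaBoost with decision stumps; it is a generic sketch, not the actual cascade training procedure of Viola–Jones.

```python
import numpy as np

def adaboost_stumps(X, y, n_rounds=10):
    """Toy discrete AdaBoost. X: (n_samples, n_features), y in {-1, +1}."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)                         # sample weights
    stumps = []
    for _ in range(n_rounds):
        best = None
        # Exhaustively pick the weighted-error-minimizing (feature, threshold, polarity).
        for j in range(d):
            for thr in np.unique(X[:, j]):
                for polarity in (1, -1):
                    pred = np.where(polarity * (X[:, j] - thr) >= 0, 1, -1)
                    err = np.sum(w[pred != y])
                    if best is None or err < best[0]:
                        best = (err, j, thr, polarity, pred)
        err, j, thr, polarity, pred = best
        err = max(err, 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)        # weak-classifier weight
        w *= np.exp(-alpha * y * pred)               # emphasize misclassified samples
        w /= w.sum()
        stumps.append((alpha, j, thr, polarity))
    return stumps

def adaboost_predict(stumps, X):
    score = sum(a * np.where(p * (X[:, j] - t) >= 0, 1, -1) for a, j, t, p in stumps)
    return np.sign(score)
```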

Structure of a Face Recognizer (1)
Enrollment
• A set of features is associated with the identity of a subject
  – Data collection and feature extraction
• The extracted model (template) is stored in a database or on a portable medium (smart card)
• This process can be carried out individually or in bulk (batch enrollment)
Testing
• A template is extracted from the face of the subject requesting authentication
  – Data collection and feature extraction
• The extracted template is searched for in a database or on a portable medium (smart card)
• A matching criterion establishes whether the subject should be authorized or not.

Systems in the Literature
Image Based
• ICA
• Neural Networks
• Eigenfaces
Feature Based
• Elastic Graph Matching
Hybrid
• Fractals
• Wavelets
3D
• 3D Morphable Models

Eigenfaces – linear systems
Input images → Linearization → Mean over all faces → The mean is subtracted from each vector
By solving a particular linear system (an eigenvalue problem) we extract a set of reference faces.
Each face is then represented as a combination of the reference faces. The coefficients used for this representation constitute its feature vector.

Note (continued)
• Some consequences:
  – Machine learning requires learning a possibly infinite distribution from a finite number of data samples; in a high-dimensional feature space with each feature having a number of possible values, an enormous amount of training data is required to ensure that there are several samples for each combination of values; with a fixed number of training samples, the predictive power decreases as the dimensionality increases (Hughes effect).
  – When a measure such as the Euclidean distance is defined over many coordinates, there is little difference in the distances between different pairs of samples, which causes problems in nearest neighbor search and k-nearest neighbor classification.
• Dimensionality reduction methods such as Principal Component Analysis (PCA) aim at reducing high-dimensional data sets to lower-dimensional ones without significant information loss, by identifying the most informative dimensions (relevant subspaces).

A statistical method in detail: PCA
• Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables (some redundancy is involved) into a set of values of linearly uncorrelated variables (no redundancy) called principal components.
• The number of principal components is less than or equal to the number of original variables: the n-dimensional feature space is projected onto a k-dimensional subspace.
• The transformation is defined in such a way that the first principal component has the largest possible variance (that is, it accounts for as much of the variability in the data as possible), and each succeeding component in turn has the highest possible variance under the constraint that it is orthogonal to (i.e., uncorrelated with) the preceding components.
• For an image space, the principal components are orthogonal because they are the eigenvectors of the covariance matrix, which is symmetric.
• In signal processing it is also often referred to as the discrete Karhunen–Loève transform (KLT).
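A minimal numpy sketch of PCA as described above: center the data, diagonalize the (symmetric) covariance matrix, and keep the k eigenvectors with the largest eigenvalues.

```python
import numpy as np

def pca(X, k):
    """X: (n_samples, n_features). Returns mean, top-k components and projected data."""
    mean = X.mean(axis=0)
    Xc = X - mean                                  # remove the mean
    cov = Xc.T @ Xc / (X.shape[0] - 1)             # covariance matrix (symmetric)
    eigvals, eigvecs = np.linalg.eigh(cov)         # eigendecomposition of a symmetric matrix
    order = np.argsort(eigvals)[::-1][:k]          # largest-variance directions first
    components = eigvecs[:, order]                 # principal components (columns)
    return mean, components, Xc @ components       # coefficients in the subspace
```

For face images the pixel dimension is far larger than the number of samples, so eigenface implementations usually diagonalize the much smaller Gram matrix of the samples instead; the sketch keeps the textbook formulation for clarity.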

A statistical method in detail: PCA (illustrative figures)

A statistical method in detail: PCA
• Sirovich and Kirby were the first to use Principal Component Analysis (PCA) to represent face images.
• They demonstrated that any particular face can be efficiently represented in the eigenpictures coordinate space, and that it can be approximately reconstructed by using just a small collection of eigenpictures and the corresponding projections ('coefficients') along each eigenpicture.
• PCA → image space for face representation

PCA and Eigenfaces
• Starting from Sirovich and Kirby's approach to representation, Turk and Pentland realized that projections along eigenpictures can be used as classification features to recognize faces.
• They developed a face recognition system that builds
eigenfaces, corresponding to eigenvectors associated
with the dominant eigenvalues of the covariance matrix
built with linearized known faces (patterns).
• The system recognizes particular faces by comparing
their projections along the eigenfaces to those of the
face images of the known individuals.
• The eigenfaces define a feature space that drastically
reduces the dimensionality of the original space, and
face identification is carried out in this reduced space.
• PCA → holistic classification

PCA and Eigenfaces
Example training set

PCA and Eigenfaces
Mean image from the training set

PCA and Eigenfaces
Eigenfaces from the training set

Projection

PCA and Eigenfaces
• Basic idea: perform the search in a subspace k-
dimensional KL subspace which has been built after
faces of the training set.
• Each gallery image is projected onto the KL subspace
the projection coefficients make up the feature vector.
• Each probe image is projected onto the subspace to be
recognized, and classification is based on a simple
nearest neighbor criterion : the face to be recognized is
associated with the face of the gallery to which
corresponds the minimum distance in the subspace
(possibly subject to a threshold).
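A hedged end-to-end sketch of this recognition scheme: eigenfaces are built from flattened gallery images (using the small-matrix trick mentioned after the PCA sketch), gallery and probe images are projected onto them, and the nearest gallery feature vector decides the identity; k and the optional threshold are illustrative.

```python
import numpy as np

def build_eigenface_model(gallery_images, labels, k=50):
    """gallery_images: (n, h*w) flattened faces; returns everything needed at test time."""
    X = np.asarray(gallery_images, dtype=np.float64)
    mean = X.mean(axis=0)
    Xc = X - mean
    # Small-matrix trick: eigenvectors of the n x n matrix, mapped back to image space.
    eigvals, small_vecs = np.linalg.eigh(Xc @ Xc.T)
    order = np.argsort(eigvals)[::-1][:k]
    eigenfaces = Xc.T @ small_vecs[:, order]
    eigenfaces /= np.linalg.norm(eigenfaces, axis=0)   # normalize each eigenface
    gallery_coeffs = Xc @ eigenfaces                   # feature vectors of the gallery
    return mean, eigenfaces, gallery_coeffs, np.asarray(labels)

def recognize(probe_image, model, threshold=None):
    """Project the probe onto the eigenfaces and return the nearest gallery identity."""
    mean, eigenfaces, gallery_coeffs, labels = model
    coeffs = (np.asarray(probe_image, dtype=np.float64) - mean) @ eigenfaces
    distances = np.linalg.norm(gallery_coeffs - coeffs, axis=1)
    best = np.argmin(distances)
    if threshold is not None and distances[best] > threshold:
        return None                                    # reject: unknown face
    return labels[best]
```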

PCA and Eigenfaces
Example backprojection

PCA: problems and a solution
• PCA is good for representation, but lacks true discriminative power.
• The scattering that eigenfaces maximize depends not only on inter-class separation (truly useful for classification) but also on intra-class differences …
• When there are significant PIE variations, PCA retains them, so similarity in the face space is not necessarily determined by the identity of the individuals: the classes may not be separated correctly.
• To solve these problems, Belhumeur et al. (1997) proposed Fisherfaces, an application of Fisher's linear discriminant (FLD), more often mentioned in the context of Linear Discriminant Analysis (LDA).
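A hedged sketch of the Fisherfaces idea using scikit-learn as an assumption: PCA is applied first (to make the within-class scatter invertible), then LDA projects onto directions that maximize between-class over within-class scatter, and a nearest-neighbor classifier operates in that space; parameter values are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

def fisherface_classifier(n_identities, n_pca_components=100):
    """PCA -> LDA -> 1-NN, the classic Fisherfaces pipeline (Belhumeur et al., 1997)."""
    # n_pca_components should be at least n_identities - 1 (and at most the
    # number of training samples) for the LDA step below to be well defined.
    return make_pipeline(
        PCA(n_components=n_pca_components),
        LinearDiscriminantAnalysis(n_components=n_identities - 1),
        KNeighborsClassifier(n_neighbors=1),
    )

# Hypothetical usage with flattened face images X (n_samples, h*w) and identity labels y:
# clf = fisherface_classifier(n_identities=len(np.unique(y)))
# clf.fit(X_train, y_train)
# predicted_identities = clf.predict(X_test)
```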

Advantages and Disadvantages
• The identification phase is fast.
• If the eigenvectors are kept, the original information can be reconstructed.
• The training phase is slow.
• If a substantial number of new subjects is added, the system must be retrained.
• High sensitivity to variations in illumination, pose, occlusions, …

Neural Networks
• A neural network simulates the behavior of the neurons in the brain.
• Each neuron is represented by a mathematical function based on probability.
• For face recognition, the ideal would be to use one neuron per pixel; this approach, however, requires too many neurons.
• The solution is to use one network of neurons to "summarize" the image into a smaller vector, while a second network performs the actual recognition (see the sketch below).
Input faces → Linearization → Dimensionality reduction → Classification network → Output classes
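A hedged sketch of this two-network scheme using PyTorch as an assumption (the slides do not prescribe a specific library or architecture): a small encoder compresses the linearized face into a short vector, and a separate classifier maps that vector to identity classes.

```python
import torch
import torch.nn as nn

class CompressionNet(nn.Module):
    """First network: 'summarizes' the linearized face image into a small vector."""
    def __init__(self, n_pixels, code_size=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_pixels, 256), nn.ReLU(),
                                     nn.Linear(256, code_size))

    def forward(self, x):
        return self.encoder(x)

class ClassificationNet(nn.Module):
    """Second network: maps the compressed vector to identity classes."""
    def __init__(self, code_size, n_identities):
        super().__init__()
        self.classifier = nn.Sequential(nn.Linear(code_size, 128), nn.ReLU(),
                                        nn.Linear(128, n_identities))

    def forward(self, code):
        return self.classifier(code)

# Hypothetical wiring for 64x64 grayscale faces and 40 identities:
compress = CompressionNet(n_pixels=64 * 64)
classify = ClassificationNet(code_size=64, n_identities=40)
logits = classify(compress(torch.rand(8, 64 * 64)))   # batch of 8 linearized faces
```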

Neural Networks

Advantages and Disadvantages
• They reduce the ambiguity between subjects belonging to similar classes.
• With suitable precautions, they are robust to occlusions.
• They require more than one image for training.
• Some networks suffer from problems of:
  – Overfitting: the network has the same size as the input.
  – Overtraining: if the images are large, the method converges slowly.
  – Database size: when the number of subjects grows, they become inefficient.

Graph-Based Systems
• A set of reference points is located on the face by means of filters and localization functions.
• These points are connected by weighted edges, yielding a graph.
• A graph is associated with each face, so comparing two faces means comparing two graphs.

Advantages and Disadvantages
• They are robust to pose variations.
• They are robust to illumination variations.
• They do not require retraining of the system.
• The training process is slow.
• The testing process is very slow, because it requires graph matching (an NP-hard problem).

Thermogram
• The face image is acquired with a thermal sensor.
• The sensor detects the temperature variations of the facial skin.
• The image is then segmented and indexed.

Advantages and Disadvantages
• They are robust to illumination variations.
• They are robust to variations over time.
• They work well even in outdoor environments.
• They require expensive acquisition devices.
• The acquisition devices are too sensitive to the subject's movements and offer limited resolution.
• They depend on the subject's emotional state.
• A pane of glass between the subject and the device makes acquisition ineffective.