
FACE RECOGNITION UNDER POSE AND EXPRESSIVITY VARIATION USING THERMAL AND VISIBLE IMAGES

Florin Marius Pop, Mihaela Gordan, Camelia Florea, Aurel Vlaicu Centre for Multimedia Technologies and Distance Education

Technical University of Cluj-Napoca, Romania [email protected], {Mihaela.Gordan, Camelia.Florea, Aurel.Vlaicu}@com.utcluj.ro

ABSTRACT

Many existing works in face recognition are based solely on visible images. The use of bimodal systems based on visible and thermal images is seldom reported in face recognition, despite their advantage of combining the discriminative power of both modalities under expression or pose variations. In this paper, we investigate the combined advantages of thermal and visible face recognition in a Principal Component Analysis (PCA) induced feature space, with PCA applied on each spectrum, on a relatively new thermal/visible face database – OTCBVS – for large pose and expression variations. The recognition is done through two fusion schemes, based on k-Nearest Neighbors classification and on Support Vector Machines. Our findings confirm that, when a suitably chosen classifier fusion is employed, the recognition results improve with the aid of thermal images over the classical approaches on visible images alone.

Keywords: face recognition, fusion scheme, thermal images, PCA, k-NN, SVMs.

1 INTRODUCTION

Face recognition research is experiencing major development due to its potential for integration in multiple applications, such as commercial or military systems. Face recognition applications are used for access control to high security areas or for video surveillance in important security or commercial areas like airports and casinos. The main advantage of biometric systems based on the recognition of the human face is the non-intrusive information acquisition. Other biometric systems based on physiological features (e.g. fingerprint or iris) require the cooperation of the tested subjects, a scenario that is not always possible. Face recognition techniques can be classified into two main categories [1]: analytic or feature based techniques and holistic or appearance based techniques. The analytic techniques extract certain geometrical face features (e.g. the eyes, the nose or the mouth) and compare them. These approaches have the disadvantage of not being robust: their performance is highly affected by facial expressions or other natural changes, and they generally have a high computational cost due to the local feature extraction involved. Thus, the holistic or appearance based methods were proposed; these have the advantage of a lower computational complexity. The holistic methods are based on techniques that transform the face image into a low-dimensional feature space with enhanced discriminatory power.

1.1 Visible spectrum face recognition based on Principal Component Analysis

One of the first holistic methods for face recognition is based on Principal Component Analysis (PCA) for feature extraction [2]; its flowchart is illustrated in Fig. 1. The good results provided by the classical PCA approach proposed by Turk and Pentland [2] have led to other successful PCA-based approaches. Recent improvements over the classical PCA based method in the visible spectrum use the discriminative power of the wavelet transform sub-bands [3, 4], more complex classifiers such as Support Vector Machines (SVMs) [5], neural networks [6] and k-Nearest Neighbors (k-NN) [7], or improved PCA techniques: (2D PCA)² [8] and Weighted Modular PCA [9].

Figure 1: Flowchart of the classical PCA-based face recognition methods

The performance of PCA based methods is mostly affected by intra-personal variations under illumination, pose and/or expressivity, which degrade the recognition performance, sometimes more strongly than the inter-personal variations do [10].

Special issue of The Romanian Educational Network - RoEduNet

UbiCC Journal, Volume 6 662


Most of the improvements over the classical PCA based method in the visible spectrum optimize the computational cost or improve the recognition rate for small intra-personal variations. However, the performance in the visible spectrum is highly affected by illumination, expressivity and pose variations, which explains the current interest in the use of thermal images for face recognition [11, 12, 13].

1.2 Thermal spectrum face recognition based on Principal Component Analysis

Infrared (IR) thermal images represent the patterns of heat emitted by the human body and are considered a unique feature of each individual [14]; it is known that even identical twins have different thermal patterns. IR thermal face images are also nearly invariant to illumination and less sensitive to expressivity than visible images. Earlier approaches determined the thermal shape of the face and used it directly for identification [15]. PCA-based techniques can also be successfully applied to face recognition on IR thermal images [16, 17, 18]. While the visible spectrum eigenfaces contain mostly low-frequency information, the corresponding IR thermal eigenfaces have fewer low-frequency components and many more high-frequency components [15]. Therefore, the majority of the information in IR thermal images is distributed in a lower dimensional subspace. Even if the results using IR thermal images are promising, their performance is negatively affected by the ambient temperature, by the emotional state of the tested subject or by wearing glasses; for example, glasses block most of the thermal energy emitted by the human face. Thus, the IR thermal recognition rates are in most cases lower than those in the visible spectrum [17]. However, since IR thermal imaging and visible imaging bring complementary information to face recognition, the joint use of the two modalities is appealing for face recognition applications.

1.3 Fusion schemes for face recognition based on visible and thermal spectrum images

As presented above, every biometric feature presents advantages, but also disadvantages, in the recognition process. Thus, researchers have recently shown a growing interest in multimodal biometric systems, which have become the second most used biometric systems after those based on the fingerprint [19]. Most multimodal systems combine the advantages of visible face images and the fingerprint [20, 21, 22, 23], which offers superior performance to face-only recognition approaches, due to the accuracy of fingerprints; however, these approaches have the disadvantage of being intrusive, due to the fingerprint acquisition. The need for non-intrusive data acquisition has led to recent approaches [24, 25, 26, 27, 28] that consider fusion schemes based on visible spectrum images and IR thermal images, with the advantage that both biometric features are non-intrusive. These bimodal fusion schemes exploit the advantages of the visible and IR thermal images and try to compensate the individual drawbacks of each single modality. A multimodal system based on 2D images extracted from the visible spectrum, the 3D model of the face and IR thermal images of the same subjects is proposed in [24]. The metric fusion scheme is based on the product rule, obtaining the following recognition rates: 98.7% for the 2D-3D fusion scheme, 96.6% for the 2D-IR fusion scheme and 98% for the 3D-IR fusion scheme. When the product rule is applied to all three biometric features (2D, 3D and IR thermal images), the achieved recognition rate was 100%. The approach was tested on a particular image set, since there is no standard database of 2D, 3D and IR thermal images of the same subjects. In [25], another approach is presented, based on a special type of Convolutional Neural Network, the diabolo network model [29], for automatic feature extraction from both visible and IR thermal images. The recognition rate in the IR spectrum was the lowest, but the rates obtained with the fusion scheme improved over those obtained from the visible or IR thermal images alone. All the experiments in [25] use the “Notre Dame Infrared Face Database (X1 Collection)” [30]. Two fusion schemes were proposed by Singh [27] in order to improve the recognition rates in occlusion scenarios caused by eyeglasses: an image-based fusion performed in the wavelet domain and a feature-based fusion in the PCA induced feature space. The results show improvements in the recognition rate on the “Equinox Infrared Face Database” [31].
A decision fusion scheme based on a voting scheme for IR and visible face recognition was proposed by Shahbe and Hati [26]. In their approach, the eigenface and fisherface classification techniques are applied for extracting the face features, and the experimental results are obtained using the “Equinox Infrared Face Database” and the “OTCBVS Thermal/Visible Face Database” [32]. Also, in [28] an integrated image fusion and match score fusion approach is proposed: the Discrete Wavelet Transform and a 2ν-Granular Support Vector Machine are used for the fusion of the visible and thermal spectrum face images. A 2D log-polar Gabor transform is applied for extracting the global and local facial features, and a match score fusion based on Dezert-Smarandache theory [33] is proposed for improving the results over the classical unimodal approaches and over several classical fusion based face recognition systems. The results were validated using



the “Notre Dame Infrared Face Database (X1 Collection)” and the “Equinox Infrared Face Database”. In this work, we evaluate the performance of a rather classical approach, based on the application of the eigenfaces method on visible and thermal infrared spectrum face images, and we propose two fusion methods. The first method fuses the results of the classical PCA methods in the visible and thermal spectrum through a k-Nearest Neighbor classifier, and the second method fuses the feature vectors in the PCA induced space, using SVMs for classification. All the scenarios tested in our approaches follow the variation of pose and expressivity. Both approaches are tested on the “OTCBVS Thermal/Visible Face Database”. Unlike the method of Shahbe and Hati mentioned above, we propose two new fusion schemes: a new score fusion generation method that uses nearest neighbor classification to increase the confidence in the recognition results, and a features fusion scheme applied directly in the PCA induced space. A complete validation with respect to pose and expression variation in the face images has been performed for the first method, unlike in the previous works on the OTCBVS database. Our approach provides an automatic procedure to select the optimal value of the score fusion weight α between the two modalities, so as to maximize an estimate of the recognition accuracy. For each approach the optimal value of the fusion weight is different, but nearly equally improved results are obtained for a single common value of the weight. The proposed approaches to IR and visible face image fusion and their performance in face recognition are presented in the following sections: in Section 2 we briefly review the basic PCA approach and a few theoretical issues about identity classification using the k-NN and SVM classifiers. Section 3 presents our fusion based approaches.
Section 4 summarizes the experiments and the results obtained. Finally, the last section contains the conclusions.

2 THEORETICAL BACKGROUND

2.1 The basic PCA approach to face recognition

PCA is often used in many forms of data analysis, from neuroscience to computer graphics, being a simple nonparametric method which extracts the relevant information from large data sets. PCA reduces a complex data set to a lower dimensional one that provides a good representation of the information for discriminative feature selection. The advantages of PCA were first explored by Turk and Pentland in their face recognition method [2]. For that, a training face image database is needed. The selection of the training face image set is crucial for the face recognition performance, as the images must be

representative enough for the given classification problem. When we apply PCA, the most significant eigenvectors are extracted from the training database, and they define a lower dimensional subspace. They are also known as “the eigenfaces” [2], due to their graphical representation. After computing the eigenfaces, any face image can be represented by a feature vector in the subspace determined by the eigenfaces. A short summary of the algorithm [2] is presented below. A grey level N × N image is considered as an N²-dimensional vector. Let X be a matrix with N² rows and S columns, where S represents the number of training images; X represents the entire training set of face images:

X = \begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1S} \\
x_{21} & x_{22} & \cdots & x_{2S} \\
\vdots & \vdots & \ddots & \vdots \\
x_{N^2 1} & x_{N^2 2} & \cdots & x_{N^2 S}
\end{bmatrix} \quad (1)

The goal of PCA is to derive a matrix P describing a linear transformation of every column of X (every training face) into the eigenfaces subspace, in the form W = PX, where the columns of W are the projections of the training facial images on the subspace described by the eigenfaces. The rows of P represent the principal components and they are orthonormal. The computation of the matrix P requires the following:

• Find the mean face vector M, as in (2), where Xi (i=1,2,…,S) represents the ith column of X:

M = \frac{1}{S} \sum_{i=1}^{S} X_i \quad (2)

• Subtract the mean face M from each training face Xi:

H_i = X_i - M \quad (3)

• Compute the covariance matrix CA, where A = [H1, H2, …, HS]:

C_A = \frac{1}{S-1} A A^{T} \quad (4)

The Q largest eigenvectors of CA (where Q is determined by a threshold on the eigenvalues) form the best basis vectors for the training set X. Each eigenvector is a row of the Q × N² matrix P. The representation of any



image I (represented as a vector of length N2) in the subspace described by the Q eigenvectors is given by the vector WI (of length Q):

W_I = P (I - M) \quad (5)
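To make the pipeline of Eqs. (1)-(5) concrete, here is a minimal NumPy sketch (illustrative only, not the authors' implementation; the function names and the SVD shortcut for the eigenvectors of C_A are our own choices):

```python
import numpy as np

def train_eigenfaces(X, Q):
    """Compute the mean face and the Q leading eigenfaces.

    X : (N*N, S) array with one vectorized training face per column, Eq. (1).
    Returns the mean face M and a (Q, N*N) matrix P whose rows are
    orthonormal eigenfaces.
    """
    M = X.mean(axis=1, keepdims=True)            # mean face, Eq. (2)
    H = X - M                                    # centered faces H_i, Eq. (3)
    # The left singular vectors of H are the eigenvectors of the
    # covariance matrix C_A = A A^T / (S - 1) of Eq. (4), so a thin SVD
    # avoids forming the huge N^2 x N^2 matrix explicitly.
    U, _, _ = np.linalg.svd(H, full_matrices=False)
    P = U[:, :Q].T                               # Q principal components as rows
    return M, P

def project(P, M, I):
    """Project a vectorized face I onto the eigenface subspace, Eq. (5)."""
    return P @ (I.reshape(-1, 1) - M)
```

For instance, with S training faces stacked column-wise in `X`, `project(P, M, X[:, 0])` yields the Q-dimensional feature vector W_I of the first training face.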

For every image, a projection on the subspace is extracted. In all the classical approaches on the visible spectrum, the projection of the test image is compared with each training projection. The decision process is based on a classifier in the projection space and returns the identity of a person. In the basic approach of Turk and Pentland [2], a simple minimum Euclidean distance classifier is used.

2.2 k-NN classifier

k-NN is a supervised classification method in a given feature space – in our case, the eigenfaces space extracted with PCA. To build the k-NN classifier, one simply needs to define the number of classes C, to form a labeled training set of Ntrn samples in the feature space, with class labels yi = 1, …, C, and to consider the labeled samples in the training set as known prototypes of the C classes. Furthermore, a distance norm d in the feature space must be chosen for the data classification (e.g. the Euclidean distance). Suppose that a vector W from the feature space must be classified into one of the C classes. By k-NN, W is classified by a majority vote of its k nearest neighbors in the sense of the chosen distance norm, being assigned to the class most common amongst them. The algorithm follows these steps:

• Compute the distances d(W, Wt,j) for each prototype Wt,j, j = 1, 2, …, Ntrn, from the training set.

• Sort the distances d(W, Wt,j), j = 1, …, Ntrn, in increasing order, and keep the labels of the first k prototypes (found at the k smallest distances from W): {y1′, y2′, …, yk′}.

• Assign to W the label yl′ that is most frequent in the sorted label array {y1′, y2′, …, yk′}.
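The three steps above can be sketched as follows (an illustrative sketch using the Euclidean distance; the array names are ours):

```python
import numpy as np
from collections import Counter

def knn_classify(W, prototypes, labels, k=3):
    """Majority-vote k-NN: classify feature vector W among the labeled
    prototypes W_t,j using the Euclidean distance."""
    d = np.linalg.norm(prototypes - W, axis=1)   # step 1: distances d(W, W_t,j)
    nearest = np.argsort(d)[:k]                  # step 2: k smallest distances
    votes = Counter(labels[j] for j in nearest)  # step 3: majority vote
    return votes.most_common(1)[0][0]
```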

In this paper, for the score fusion approach, we use a k-NN classifier with the Euclidean distance, and examine four values of k: 1, 3, 5 and 7.

2.3 SVM classifier

Support vector machines (SVMs) are powerful classifiers from the machine learning family, with very good recall and generalization performance, able to learn their decision function from sparse and relatively small training sets [34, 35, 36]. In an SVM, a binary classification problem is solved through the optimal separating hyperplane principle: in the training phase, the SVM “learns” the parameters of a separating hyperplane H, such that H separates the data from the two classes with minimal error and maximal margin. In the SVM formulation, the two classes are called “the positive samples class” and “the negative samples class”. In a pattern classification problem, the positive samples class contains the objects of interest, which we aim to identify, and the label +1 is assigned to these objects. The negative samples class contains anything but the objects of interest, and all these patterns are assigned the label −1. Let {xi, yi}, i = 1, 2, …, Ntrn, denote Ntrn training examples, where xi is an N-dimensional pattern and yi is its class label. We confine ourselves to the two-class pattern recognition problem, that is, yi ∈ {−1, +1}: yi = +1 is assigned to positive examples, whereas yi = −1 is assigned to counter-examples. The data to be classified by the SVM might or might not be linearly separable in their original domain. If they are separable, a simple linear SVM can be used for their classification; when the data cannot be separated by a hyperplane in the original domain, we may use non-linear SVMs, which project the data into a higher-dimensional Hilbert space using kernel functions and linearly separate the samples there. In our experiments, we have used for now a linear SVM, whose decision function f : ℝ^N → ℝ is of the form:

f(x) = \operatorname{sign}\left( \sum_{i=1}^{N_{trn}} \alpha_i y_i \, x_i^{T} x + b \right) \quad (6)

where αi are the nonnegative Lagrange multipliers associated with the quadratic optimization problem that maximizes the distance between the two classes measured in ℝ^N. Our particular task, face recognition, is implicitly a multi-class classification task and therefore requires a multi-class classifier. The two main strategies in the literature for building multi-class SVM classifiers from binary SVM classifiers are one-against-one classification and one-against-all classification [37]. In the one-against-all approach, one binary SVM classifier is constructed for each class, taking the examples of that class as positive and all the other examples in the data set as negative. However, this approach may exhibit the disadvantage of a training set that is unbalanced with respect to the ratio of positive and negative examples. The other approach, one-against-one, avoids this drawback: an SVM classifier is built for each pair of classes, and the class label is decided by a majority vote over all the SVM classifiers. This is the approach we adopted in our face recognition experiments.
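The one-against-one vote can be sketched as follows, assuming the pairwise linear SVMs of Eq. (6) have already been trained (the dictionary layout mapping a class pair to its weight vector and bias is our own illustrative convention):

```python
import numpy as np
from collections import Counter

def ovo_predict(x, binary_svms):
    """One-against-one decision: binary_svms maps a class pair (a, b) to
    the (w, bias) parameters of a trained linear SVM; f(x) = w.x + bias > 0
    votes for class a, otherwise for class b. The majority vote wins."""
    votes = Counter()
    for (a, b), (w, bias) in binary_svms.items():
        votes[a if float(np.dot(w, x) + bias) > 0 else b] += 1
    return votes.most_common(1)[0][0]
```

With C classes this requires C(C−1)/2 binary classifiers, each trained only on the samples of its two classes, which keeps the per-classifier training sets balanced.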



Figure 2: Flowchart of the score fusion scheme proposed. (A) PCA-based feature extraction in visible spectrum; (B) PCA-based feature extraction in thermal spectrum; (C1) Score fusion scheme.

3 PROPOSED BIMODAL FUSION SCHEMES

The main drawback of PCA based face recognition in the visible spectrum is its sensitivity to pose and expression variation, whereas in the IR spectrum it is the sensitivity of the thermal images to the ambient temperature and to the emotional state of the subject. Therefore, we propose two fusion schemes between visible and IR thermal face recognition, to improve the global performance by minimizing these negative effects. Both schemes start by reducing the complex data to a lower dimensional representation in the PCA induced feature space, with a good representation of the information for discriminative feature selection. Two different fusions are then applied: the first method uses a score fusion scheme, while in the second a features fusion scheme is proposed.

3.1 Computing the eigenfaces from the visible images

As illustrated in Fig. 2 and Fig. 3, inside the first main block of the figures (denoted by A), the eigenfaces and the projections of the training and test images in the PCA induced feature space are computed first, both in the score fusion scheme and in the features fusion scheme. Thus, every facial image acquired in the visible spectrum is represented in the subspace of the eigenfaces.

3.2 Computing the eigenfaces from IR thermal images

Using the same principle as in the visible spectrum, the eigenfaces in the IR spectrum are computed, as illustrated in blocks B of Fig. 2 and Fig. 3, and the projections of the IR thermal images on the IR thermal eigenfaces subspace are also computed.

3.3 The fusion schemes

Every multimodal biometric system requires an integrating rule for the fusion of the different types of extracted feature data. The fusion schemes can be classified into: fusion at the image level [23, 27], fusion at the feature level [20, 27], fusion at the matching score level [17, 21, 24, 38] and fusion at the decision level [22]. In our paper, we propose two types of fusion in order to maximize the face recognition efficiency in the PCA induced feature space. The first fusion scheme, illustrated in block C1 of Fig. 2, is a score fusion scheme based on the Euclidean distance and the k-NN classifier. The second fusion scheme, illustrated in block C2 of Fig. 3, is a features fusion scheme based on an SVM classifier. A brief review of the proposed fusion schemes is presented below.

3.3.1 The score fusion scheme

As illustrated in Fig. 2, the first step in our score fusion scheme is to compute the Euclidean distance between the projection of the test image and the projections of every training image in the visible eigenfaces space (block A in Fig. 2), and to repeat the same operations for the projections of the facial images in the thermal spectrum (block B in Fig. 2). The score fusion scheme, illustrated in block C1 of Fig. 2, is applied on the previously computed Euclidean distances, with the purpose of maximizing the face recognition efficiency for all subjects by using the two modalities. We propose a fusion scheme at the matching score level, by introducing a weighted distance between the visible spectrum images and the thermal images. The weighted distance dw(x, xt) between a bimodal pair of test images x = {IV, IIR} and a pair of training face images xt = {It,V, It,IR} is computed as:

d_w(x, x_t) = \alpha \cdot d(I_V, I_{t,V}) + (1 - \alpha) \cdot d(I_{IR}, I_{t,IR}) \quad (7)

where d denotes the Euclidean distance in the PCA induced subspace.
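Eq. (7) is a convex combination of the two per-modality distances; a minimal sketch (the names are illustrative; the arguments stand for the PCA projections of the test and training images in each spectrum):

```python
import numpy as np

def weighted_distance(w_vis, w_ir, t_vis, t_ir, alpha):
    """Score fusion of Eq. (7): alpha weights the visible-spectrum
    distance and (1 - alpha) the IR thermal one."""
    d_v = np.linalg.norm(w_vis - t_vis)    # d(I_V, I_t,V) in the visible eigenspace
    d_ir = np.linalg.norm(w_ir - t_ir)     # d(I_IR, I_t,IR) in the thermal eigenspace
    return alpha * d_v + (1 - alpha) * d_ir
```

Note that alpha = 1 recovers the visible-only classifier and alpha = 0 the thermal-only one.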



Figure 3: Flowchart of the features fusion scheme proposed. (A) PCA-based feature extraction in visible spectrum; (B) PCA-based feature extraction in thermal spectrum; (C2) Features fusion scheme.

The weight α (between 0 and 1) in the computation of dw(x, xt) is determined on a validation set, to maximize the face recognition rate. On the weighted distances, a k-NN classification is applied to obtain the recognition result.

3.3.2 The features fusion scheme

The features fusion scheme is illustrated in Fig. 3, where block A illustrates the feature extraction in the visible spectrum, block B the feature extraction in the thermal spectrum and block C2 the features fusion scheme. The purpose of the scheme is to fuse the projections of the visible facial image IV and the projections of the corresponding thermal spectrum facial image IIR in a β-weighted combination, where the weight β should be chosen to maximize the recognition rate of the multi-class linear SVM classifier over a validation set of examples. Let us denote the recognition rate of this classifier, as a function of β, by rate(β):

\mathrm{rate}(\beta) = \frac{\text{Number of correctly classified instances in the validation set}}{\text{Total number of instances in the validation set}}

Then the feature vector is described by Eq. (8), and the optimal value of the weight β is given by Eq. (9).

x_{FeatureFusion}(\beta) = [\, \beta \cdot I_V \; ; \; (1 - \beta) \cdot I_{IR} \,] \quad (8)

\beta^{*} = \arg\max_{\beta} \mathrm{rate}(\beta) \quad (9)

4 EXPERIMENTS AND RESULTS

In order to test our fusion approaches, we use the OTCBVS benchmark [32]. For both proposed fusion schemes, we established a small training set, a validation set to derive the optimal fusion weight (α or β), and several test sets for expression and pose variations.

4.1 The face database and the evaluation design

In our experiments, we use the OTCBVS benchmark [32], which contains 4228 pairs of visible and IR thermal images. A few samples are illustrated in Fig. 4. The images are acquired under illumination, expressivity (“surprised”, “laughing”, “angry”) and pose variation (11 positions for each type of acquisition, as illustrated in Fig. 5). We selected the OTCBVS benchmark because its images simulate the main variations of real scenarios.

Figure 4: Samples of different subjects from OTCBVS; (A) Visible spectrum images; (B) IR thermal images

Figure 5: Example of a subject acquisition under pose variation; (A) Visible spectrum images; (B) IR thermal images

In our tests, no preprocessing is performed on the images, in order to test the discriminative power of the IR and visible spectra both in the individual PCA based face recognition methods and in our fusion based approaches; preprocessing would indirectly increase the recognition rate. The data used for the experiments is divided into three main sets:



training, validation and test sets. The proposed fusion schemes were tested so as to reflect the performance of real world scenarios with high variations of pose or expressivity. Most of the results reported in the literature are obtained on images with nearly frontal pose, preprocessed (manually or automatically) by localization, scaling or rotation of the human faces, which is not practical in most face recognition applications.

4.2 The training set

We include in the training set a total of 12 nearly frontal images for each subject: 6 images in the IR spectrum and 6 images in the visible spectrum. For each spectrum there are 3 images with the “surprised” expression (poses 4, 6 and 8) and 3 images with the “laughing” expression (also poses 4, 6 and 8). C classes are defined from the training set; each class contains 6 projections in the eigenfaces subspace for each spectrum.

4.3 The validation set

After computing the projections of the training images in the eigenfaces subspace, we must determine the optimal values of the fusion parameters α (weight of the score fusion scheme) and β (weight of the features fusion scheme). This is done by tuning the weights to maximize the recognition performance on a validation set. The validation set includes images with the “angry” expressivity under poses 5 and 7.

Figure 6: Results on the validation set for various α values in the score fusion scheme.

The recognition rates on the validation set for values of α ranging from 0 (IR modality only) to 1 (visible modality only) are illustrated in Fig. 6 for the score fusion scheme. In Fig. 6, the classical PCA-based recognition rate in the visible spectrum is drawn with the dotted blue line and the rate in the IR spectrum with the dotted green line. According to the validation set, the weight of the visible image score in the score fusion scheme is chosen as the smallest weight that maximizes the recognition rate, 82% (i.e. α = 0.82). It is important to remark from Fig. 6 that for a large range of α values, from 0.2 to 0.98, the performance of the bimodal approach is higher than that of the classical PCA approach on visible images, and for every α between 0.01 and 1 the bimodal performance is higher than that of the classical PCA approach on IR thermal images. For the features fusion scheme, many values of β offer better results than a classical PCA-based approach on visible-only or thermal-only images with a linear SVM classifier; values of the features fusion weight that maximize the recognition results are 0.7, 0.72 and 0.8. The value β = 0.8 is used in the performance evaluation tests to compare the results of both proposed fusion schemes.

4.4 Performance evaluation test

The next experiments evaluate the performance of our bimodal fusion-based approaches. To exhibit the superiority of the bimodal systems, the single-modality performances on visible images and on IR thermal images are reported for comparison. The performances are evaluated by means of k-NN classification (with k = 1, 3, 5 and 7) for the score fusion scheme and SVM classification for the features fusion scheme. A particular case is k-NN based only on the first neighbor (k = 1) and only on the visible spectrum images (α = 1), which is the classical PCA-based method [2] applied on the OTCBVS database.
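The per-spectrum eigenface projection underlying both the single-modality baselines and the fusion schemes can be sketched as follows. This is a minimal illustration with random data: scikit-learn's PCA stands in for the implementation, and the image size, sample counts and number of components are assumptions, not values from the paper.

```python
import numpy as np
from sklearn.decomposition import PCA

def fit_eigenfaces(train_images, n_components=20):
    """Fit a PCA (eigenfaces) model on flattened face images of one spectrum."""
    pca = PCA(n_components=n_components)
    pca.fit(train_images)
    return pca

# Hypothetical data: 8 subjects x 6 training images per spectrum, 32x32 pixels.
rng = np.random.default_rng(1)
visible_train = rng.random((48, 32 * 32))
thermal_train = rng.random((48, 32 * 32))

# One PCA model per spectrum; each face is then represented by its
# projection in the corresponding eigenfaces subspace.
pca_vis = fit_eigenfaces(visible_train)
pca_ir = fit_eigenfaces(thermal_train)
vis_features = pca_vis.transform(visible_train)
ir_features = pca_ir.transform(thermal_train)
```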
The first evaluation experiment is an expressivity test. The test set consists of images with the "angry" expressivity, a different expressivity than those in the training set, with the same nearly frontal poses: 4, 6 and 8.

Table 1: Description of the image sets for the performance evaluation tests.

Test     Expressivity                   Poses
Test 1   angry                          4, 6, 8
Test 2   surprised, laughing            3, 5, 7, 9
Test 3   angry                          3, 9
Test 4   surprised, laughing            5, 7
Test 5   surprised, laughing, angry     1, 2, 3, 5, 7, 9, 10, 11
Test 6   surprised, laughing, angry     2, 3, 9, 10
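The test sets of Table 1 can be assembled by filtering the image records on their expressivity/pose combination. The sketch below assumes a hypothetical (subject, expressivity, pose) record format; the specs follow Table 1 and the surrounding text.

```python
# Image records as (subject, expressivity, pose) tuples; each test set of
# Table 1 is obtained by filtering on its expressivity/pose combination.
TEST_SPECS = {
    1: ({"angry"}, {4, 6, 8}),
    2: ({"surprised", "laughing"}, {3, 5, 7, 9}),
    3: ({"angry"}, {3, 9}),
    4: ({"surprised", "laughing"}, {5, 7}),
    5: ({"surprised", "laughing", "angry"}, {1, 2, 3, 5, 7, 9, 10, 11}),
    6: ({"surprised", "laughing", "angry"}, {2, 3, 9, 10}),
}

def select_test_images(records, test_id):
    """Return the records matching the expressivity/pose spec of one test."""
    expressions, poses = TEST_SPECS[test_id]
    return [r for r in records if r[1] in expressions and r[2] in poses]

records = [(0, "angry", 4), (0, "laughing", 5), (1, "surprised", 9)]
print(select_test_images(records, 1))  # only (0, "angry", 4) matches Test 1
```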


Table 2: Recognition rates for the score fusion scheme on OTCBVS (V - visible spectrum; IR - thermal IR spectrum; Bi - bimodal score fusion scheme).

                  1-NN                  3-NN                  5-NN                  7-NN
              V     IR     Bi       V     IR     Bi       V     IR     Bi       V     IR     Bi
Test 1(%)  91.66  73.80  96.42   85.71  73.80  94.04   82.14  70.23  85.71   79.76  65.47  86.90
Test 2(%)  95.08  87.05  98.21   90.62  78.57  94.64   83.48  75.00  91.07   76.33  64.73  83.03
Test 3(%)  89.28  62.50  92.85   82.14  51.78  87.50   73.21  44.64  78.57   64.28  37.50  71.42
Test 4(%)  99.10  92.85  100     99.10  86.60  99.10   97.32  84.82  99.10   89.28  77.67  96.42
Test 5(%)  69.10  53.72  74.85   64.13  48.21  70.53   61.01  43.75  66.51   55.65  36.16  59.07
Test 6(%)  75.89  56.25  82.73   66.36  48.80  75.29   61.30  42.55  69.34   55.65  33.03  58.33

The second evaluation experiment is a pose test: the test set consists of images with the same expressivities as the training set ("surprised" and "laughing") but with different poses (3, 5, 7 and 9). The third test set includes images with a different expressivity ("angry") and also different poses (3 and 9) than those in the training or validation sets. Finally, we evaluate the performance of our approaches on three further test sets that cover the largest expressivity and pose variations in the data set, with extreme poses (such as pose 1 or 11) and all the expression variations. The test image sets for our experiments are described in Table 1.

4.5 Results

The performances of each individual modality and of our fusion-based approaches are given in Table 2 for the score fusion scheme and in Table 3 for the features fusion scheme. As can be seen, in every test and for both fusion schemes, the recognition rate in the IR spectrum is significantly lower than in the visible spectrum, partly because the acquired IR thermal images vary more with respect to pose and even rotation than the visible spectrum images. Preprocessing both the visible and the IR images is expected to partially remove this difference. For the score fusion scheme, our approach with 1-NN classification gives superior results in all the tests, and it is the only one that achieves a 100% recognition rate in some tests (i.e. Test 4). The recognition rate of our approach is lower under expressivity variation than under pose variation, mainly due to the selected training set. For extremely high pose variation (i.e. poses 1 and 11), all the rates are expected to be lower, as in Tests 5 and 6. Another easy observation is the lower recognition performance for larger numbers of nearest neighbors (i.e. k = 5, 7), due to the small number of samples in the training set compared to the number of classes.
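The k-NN decision on the fused scores can be sketched as follows: majority voting among the k nearest gallery projections under the fused distance. The weighted-sum form of the fused distance and the synthetic data are assumptions, not the paper's exact formulation.

```python
import numpy as np
from collections import Counter

def knn_score_fusion(d_vis, d_ir, gallery_labels, alpha=0.82, k=3):
    """Classify each probe by majority vote among its k nearest
    gallery samples under the fused distance alpha*d_vis + (1-alpha)*d_ir."""
    d = alpha * d_vis + (1.0 - alpha) * d_ir
    predictions = []
    for row in d:
        nearest = gallery_labels[np.argsort(row)[:k]]
        predictions.append(Counter(nearest.tolist()).most_common(1)[0][0])
    return np.array(predictions)

# Hypothetical distances for 3 probes against 6 gallery samples (3 classes).
rng = np.random.default_rng(2)
d_vis = rng.random((3, 6))
d_ir = rng.random((3, 6))
gallery_labels = np.array([0, 0, 1, 1, 2, 2])
predictions = knn_score_fusion(d_vis, d_ir, gallery_labels, k=3)
```

With k = 1 and α = 1 this reduces to the classical PCA-based method on visible images, as noted above.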

For the features fusion scheme, the bimodal approach also obtains better results than the classical PCA approach on a single spectrum with a linear SVM classifier. Due to the complexity of the SVM, the single-spectrum results on visible or thermal facial images are superior to those obtained with k-NN in almost all the tests. Moreover, the improvements obtained with the bimodal system in the features fusion scheme are usually larger than those of the score fusion scheme. In Test 4, the rates for the visible spectrum and for our features fusion scheme reach the maximum for the weight β = 0.8. All the results for the features fusion scheme are given in Table 3. As can be observed, the SVM classifier exploits the features from the thermal spectrum more effectively than the k-NN classifier.

Table 3: Recognition rates for the features fusion scheme on OTCBVS (V - visible spectrum; IR - thermal IR spectrum; Bi - bimodal features fusion scheme).

              V      IR     Bi
Test 1(%)  94.04  76.19  98.80
Test 2(%)  96.42  92.85  98.21
Test 3(%)  83.92  66.07  92.85
Test 4(%)  100    99.10  100
Test 5(%)  69.94  55.80  74.55
Test 6(%)  75.29  59.82  81.84
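The feature-level fusion with weight β can be sketched as below. The weighted concatenation of the two PCA projections is an assumption about the fusion rule, since its exact form is defined earlier in the paper, and scikit-learn's LinearSVC stands in for the linear SVM classifier; all data shapes are hypothetical.

```python
import numpy as np
from sklearn.svm import LinearSVC

def fuse_features(vis_feats, ir_feats, beta=0.8):
    """Weighted concatenation of the per-spectrum PCA projections
    (beta = 0.8 is the value used in the paper's evaluation)."""
    return np.hstack([beta * vis_feats, (1.0 - beta) * ir_feats])

# Hypothetical PCA projections: 4 classes x 6 training samples, 10 components.
rng = np.random.default_rng(3)
vis_train, ir_train = rng.random((24, 10)), rng.random((24, 10))
train_labels = np.repeat(np.arange(4), 6)

# Train one linear SVM on the fused feature vectors.
clf = LinearSVC()
clf.fit(fuse_features(vis_train, ir_train), train_labels)

vis_test, ir_test = rng.random((5, 10)), rng.random((5, 10))
predictions = clf.predict(fuse_features(vis_test, ir_test))
```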

For our experiments, we chose a training set with a small number of images in order to simulate a real practical application with difficult conditions: few samples per person and high variations in appearance in the test images. As can be seen in Table 2 and Table 3, for a difficult test covering all the expressions and a large set of poses (even with half of the face hidden), the recognition rate of the classical PCA-based approach drops to 69.10%, and that of the PCA-based approach with an SVM classifier to 69.94%, while our fusion-based approaches improve the recognition rate by almost 6% in both cases.


Figure 7: Performance of the fusion-based approaches.

From the same tests, it can be seen that even if the SVM classifier offers superior results in most of the tests, there are situations where it performs slightly worse than the score fusion approach based on the k-NN classifier. In all the experiments, we found that the performance of our score fusion approach with α = 0.82 and of our features fusion approach with β = 0.8 exceeds the individual performances of the single-modality systems, sometimes by almost 10% (illustrated in Fig. 7 for a direct comparison of the results).

5 CONCLUSIONS

Two fusion-based approaches that substantially improve the performance of the classical PCA-based techniques are proposed in this paper. The first approach is a score fusion system in the PCA-induced feature space with k-NN classification, and the second is a direct features fusion system in the same PCA-induced space with linear SVM classification. The PCA-based techniques in the visible spectrum achieve high recognition rates for frontal images with low intra-personal variation. However, in practical face recognition applications the acquisition conditions cannot always be controlled (e.g. expressivity and pose variation), and the performance of the classical PCA-based approaches is highly affected. In order to minimize the effect of the intra-personal variations of the human faces, we combine the discriminative power of the IR and visible spectra, and provide a principled formulation of a procedure to select optimal values of the fusion weights α and β between the two modalities. The recognition rates could be further improved by classification with a more complex non-linear SVM, or by preprocessing that reduces the scale, rotation and illumination variations.

6 REFERENCES

[1] R. Brunelli and T. Poggio: Face recognition: features versus templates, IEEE Trans. Patt. Anal. Mach. Intell., 15(10), pp. 1042-1052 (1993).

[2] M.A. Turk and A.P. Pentland: Eigenfaces for face recognition, Journal of Cognitive Neuroscience, 3(1), pp. 71-86 (1991).

[3] M.R. Gupta and N.P. Jacobson: Wavelet Principal Component Analysis and its Application to Hyperspectral Images, IEEE Int’l Conf. on Image Processing, pp. 1585-1588 (2006).

[4] W. Hu, O. Farooq and S. Datta: Wavelet Based Sub-space Features for Face Recognition, CISP '08. Congress on Image and Signal Processing, Vol. 3, pp. 426-430, (2008).

[5] H. Wang, S. Yang and W. Liao: An Improved PCA Face Recognition Algorithm Based on the Discrete Wavelet Transform and the Support Vector Machines, Int’l Conf. on Computational Intelligence and Security Workshops, pp. 308-311 (2007).

[6] M. Mazloom and S. Ayat: Combinational Method for Face Recognition: Wavelet, PCA and ANN, Digital Image Computing: Techniques and Applications, pp. 90-95 (2008).

[7] P. Parveen and B. Thuraisungham: Face recognition using multiple classifiers, Proceedings of the 18th International Conference on Tools with Artificial Intelligence, pp. 179-186 (2006).

[8] D. He, L. Zhang and Y. Cui: Face Recognition Using (2D)^2PCA and Wavelet Packet Decomposition, Congress on Image and Signal Processing, vol. 1, pp. 548-553 (2008).

[9] M. Zhao, P. Li and Z. Liu: Face Recognition Based on Wavelet Transform Weighted Modular PCA, CISP’08, vol. 4, pp.589-593 (2008).

[10] G.F. Xu, S.Q. Ding, L. Huang and C.P. Liu: Recognition based on wavelet reconstruction face, International Conference on Machine Learning and Cybernetics, pp. 3005-3020 (2008).

[11] Y. Yoshitomi, T. Miyaura, S. Tomita and S. Kimura: Face Identification Using Thermal Image Processing, 6th IEEE International Workshop in Robot and Human Communication, pp. 374–379 (1997).

[12] D. Socolinsky, A. Selinger and J. Neuheisel: Face Recognition With Visible And Thermal Infrared Imagery, Computer Vision and Image Understanding, Vol. 91, Issue 1-2, pp. 72-114 (2003).

[13] A. Selinger and D. Socolinsky: Appearance-Based Facial Recognition Using Visible And Thermal Imagery: A Comparative Study, Technical Report 02-01, Equinox Corporation (2002).

[14] F.J. Prokoski, R.B. Riedel and J.S. Coffin: Identification of individuals by means of facial thermography, Proceedings of the IEEE International Conference on Security Technology, Crime Countermeasures, pp. 120–125 (1992).

[15] S. G. Kong, J. Heo, B.R. Abidi, J. Paik, M.A. Abidi: Recent advances in visual and infrared face recognition-a review, Computer Vision and Image Understanding, Vol. 97, Issue 1, pp. 103-135 (2005).

[16] X. Chen, P. J. Flynn, K. W. Bowyer: PCA-Based Face Recognition in Infrared Imagery: Baseline and Comparative Studies, International Workshop on Analysis and Modeling of Faces and Gestures, IEEE, Nice, France, (2003).

[17] D.A. Socolinski, A. Selinger: Thermal Face Recognition In An Operational Scenario, CVPR 2004, Vol. 2, pp. II-1012 - II-1019 (2004).

[18] S.W. Jung, Y. Kim, A.B.J Teoh, K.A. Toh: Robust Identity Verification Based on Infrared Face Images, ICCIT’07, pp. 2066-2071 (2007).

[19] A.F. Abate, M. Nappi, D. Riccio, G. Sabatino: 2d And 3d Face Recognition: A Survey, Pattern Recognition Letters, Vol. 28, Issue 14, pp. 1885-1906 (2007).

[20] Y. Yao, X. Jing, H. Wong: Face And Palmprint Feature Level Fusion For Single Sample Biometrics Recognition, Neurocomputing Vol. 70, Issues 7-9, pp. 1582–1586 (2007).

[21] S. Ribaric, I. Fratric: A Biometric Identification System Based On Eigenpalm And Eigenfinger Features, IEEE Trans. on Patt. Anal. and Mach. Intell., Vol. 27, Issue 11, pp. 1698–1709 (2005).

[22] L. Hong and A. Jain: Integrating faces and fingerprints for personal identification, IEEE Transactions on Pattern Analysis and machine Intelligence, Vol. 20, Issue 12, pp. 1295-1307 (1998).

[23] X. Jing, Y. Yao, D. Zhang, M. Li: Face and Palmprint Pixel Level Fusion And Kernel DCV-RBF Classifier For Small Sample Biometrics Recognition, Pattern Recognition Vol. 40, Issue 11, pp. 3209–3224 (2007).

[24] K.I. Chang, K.W. Bowyer, P.J. Flynn, X. Chen: Multi-biometrics Using Facial Appearance, Shape and Temperature, Proceedings Sixth IEEE International Conference on Automatic Face and Gesture Recognition, pp. 43-48 (2004).

[25] P. Buyssens, M. Revenu, O. Lepetit: Fusion of IR and visible light modalities for face recognition, IEEE 3rd International Conference on Biometrics: Theory, Applications, and Systems, pp. 1-6 (2009).

[26] M.D. Shahbe and S. Hati: Decision fusion based on voting scheme for IR and visible face recognition, Computer Graphics, Imaging and Visualisation, pp. 358-364 (2007).

[27] S. Singh, A. Gyaourova, G. Bebis, I. Pavlidis: Infrared and visible image fusion for face recognition, Proceedings of SPIE Defense and Security Symposium, vol. 5404. pp. 585-596 (2004).

[28] R. Singh, M. Vatsa, A. Noore: Integrated Multilevel Image Fusion and Match Score Fusion of Visible and Infrared Face Images for Robust Face Recognition, Pattern Recognition, Vol. 41, Issue 3, pp. 880-893 (2008).

[29] H. Schwenk: The diabolo classifier, Neural Computation, Vol. 10, Issue 8, pp. 2175–2200 (1998).

[30] <Notre Dame Infrared Face Database>: http://www.nd.edu/~cvrl/undbiometricsdatabase.html.

[31] <Equinox Infrared Face Database>: http://www.equinoxsensors.com/products/HID.html

[32] <OTCBVS Thermal/Visible Face Database>: http://www.cse.ohio-state.edu/OTCBVS-BENCH/bench.html

[33] F. Smarandache and J. Dezert: Advances and applications of DSmT for information fusion, American Research Press (2004).

[34] V.N. Vapnik: Statistical Learning Theory, J. Wiley, N.Y. (1998).

[35] M. Gordan, C. Kotropoulos, I. Pitas: A Support Vector Machine-Based Dynamic Network for Visual Speech Recognition Applications, EURASIP JASP, Special Issue on Joint Audio-Visual Speech Processing, Vol. 2002, No. 11 , pp. 1248-1259 (2002).

[36] M. Gordan, A. Georgakis, O. Tsatos, G. Oltean, L. Miclea: Computational Complexity Reduction of the Support Vector Machine Classifiers for Image Analysis Tasks Through the Use of the Discrete Cosine Transform, Proc. of IEEE-TTTC International Conference on Automation, Quality and Testing, Robotics A&QT-R 2006 (THETA 15), Volume 2, pp. 350 – 355 (2006).

[37] J. Milgram, M. Cheriet, R. Sabourin: “One against one” or “one against all”: which one is better for handwriting recognition with SVMs?, Tenth International Workshop on Frontiers in Handwriting Recognition (2006)

[38] C. Lu, J. Wang and M. Qi: Multimodal Biometric Identification Approach Based on Face and Palmprint, Second International Symposium on Electronic Commerce and Security, Vol.2, pp. 44-47 (2009).
