
Multi-feature shape regression for face alignment

August 2018
EURASIP Journal on Advances in Signal Processing 2018(1)


DOI:10.1186/s13634-018-0572-6

License
CC BY

Authors:

Yi-Chen Chen

Jar-Ferr Yang

National Cheng Kung University

Abstract For smart living applications, personal identification as well as behavior and emotion detection are becoming increasingly important in daily life. For identity classification and facial expression detection, facial features extracted from face images are the most popular and low-cost information. The face shape, expressed as landmarks estimated by a face alignment method, can be used for many applications including virtual face animation and real face classification. In this paper, we propose a robust face alignment method based on multi-feature shape regression (MSR), which evolved from the explicit shape regression (ESR) proposed by Cao et al. (Int J Comput Vis, 2014, 107:177–190). The proposed MSR face alignment method utilizes color, gradient, and regional information to increase the accuracy of landmark estimation. For face recognition algorithms, we further suggest a face warping algorithm, which can cooperate with any face alignment algorithm to correct facial pose variations and improve recognition performance. For performance evaluation, the proposed and existing face alignment methods are compared on face alignment databases. Based on the alignment-based face recognition concept, the face alignment methods combined with the proposed face warping method are tested on face databases. Simulation results verify that the proposed MSR face alignment method achieves better performance than the other existing face alignment methods.


RESEARCH Open Access


Wei-Jong Yang, Yi-Chen Chen, Pau-Choo Chung and Jar-Ferr Yang


Keywords: Face alignment, Face warping, Face recognition, Pose variation, Shape regression

1 Introduction

For smart living applications, identifying a person and detecting his or her behavior and emotion are becoming increasingly important in modern daily life. For identity verification and facial expression detection, the facial features extracted from captured images are the most popular and low-cost information. The face shape, expressed as the positions of landmarks, is one of the important features. Once the face shape is extracted, the landmarks can be used for many applications, including face animation for augmented reality (AR) and virtual reality (VR) as well as emotion detection and face recognition for smart living services. Face recognition has been widely investigated in academic and industrial communities due to the extraordinary demands of security control in sensitive areas, device and machine access, secure internet usage, etc. A practical face recognition system, for example, should be low in computation and accurate while operating under various challenges, such as pose variations, illumination changes, and partial occlusions. To overcome the problem of facial pose variations, a suitable face alignment algorithm combined with an appropriate warping method becomes essential for face recognition.

Face alignment, which locates semantic key facial landmarks such as the facial contour, eye and mouth shapes, and nose and chin positions, is a necessary tool for estimating the face contour and key facial characteristics in face images. Given a captured facial image, the goal of face alignment is to minimize the difference between the estimated and ground-truth shapes defined by a set of facial landmarks. Over the past decades, shape estimation along the outer facial contour of a given facial image has been widely investigated for face alignment. The alignment algorithms can generally be categorized into optimization-based and regression-based approaches. The optimization-based algorithms depend on the design of error functions and optimization iterations. The most popular optimization-based algorithms include the active shape models (ASMs) [1, 2] and their extensions, called active appearance models (AAMs) [3–6]. For both ASM and AAM, a point distribution model is trained so that landmark positions generated from rough initial estimations are iteratively refined. The parametric shape models used to enforce shape constraints are not flexible enough to fit faces with large variations and partial occlusions. The regression-based algorithms [7, 8] use regression functions to map the image appearance directly to the target output. Because the complex variation is learnt from a large dataset, the testing process is generally efficient. In 2012, Cao et al. proposed the explicit shape regression (ESR) method [9], which realizes the shape constraint in a non-parametric manner and attains good face alignment.

* Correspondence: jfyang@ee.ncku.edu.tw

Department of Electrical Engineering, Institute of Computer and Communication Engineering, National Cheng Kung University, Tainan, Taiwan

International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Yang et al. EURASIP Journal on Advances in Signal Processing (2018) 2018:51

https://doi.org/10.1186/s13634-018-0572-6

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

As to face recognition, numerous successful algorithms have been proposed [10–14]. Over the past years, subspace projection optimization (SPO) with linear and non-linear approaches has been the main research trend. The principal component analysis (PCA) [10–13] and linear discriminant analysis (LDA) [14], which are linear approaches, seek a low-dimensional subspace for computation reduction and performance improvement. The kernel PCA (KPCA) [15–17] and kernel LDA (KLDA) [18–21], which are non-linear projection approaches, can uncover the underlying structure when the samples lie on a non-linear manifold. The linear regression classification (LRC) proposed in [22] is simple in nature and effective in performance, while the modular linear regression classification (MLRC) can deal with occlusion problems. Simple computation in both training and testing procedures is the advantage of the above methods. Without re-training the existing candidates, the SPO face recognition methods can directly add the hyperplane of any new identity to the system. However, without any assistance, the SPO face recognition methods cannot achieve successful recognition under uncontrolled variations. Recently, some research has focused on contextual information and learning-based algorithms [23–25]. In [23], the context-aware local binary feature achieves better robustness than a local feature descriptor such as LDA. Convolutional neural networks [24, 25] were introduced for face recognition and show better performance than the SPO approaches if a suitable deep network is learnt from a large tagged database. However, the learning approaches, which need intensive computation for training and testing, may not be suitable for real-time applications on current handheld devices.

Fig. 1 Flow diagram of the explicit shape regression method

Fig. 2 Shape-indexed features. a Pixels indexed by global coordinates. b Pixels indexed by local coordinates

The rest of the paper is organized as follows. Section 2 describes the proposed methods. Section 2.1 first reviews the explicit shape regression (ESR) face alignment method. Section 2.2 introduces the proposed multi-feature shape regression (MSR) face alignment method in detail. Section 2.3 suggests alignment-based face recognition with cross face warping to improve the performance of SPO face recognition methods and describes the detailed procedure of the cross warping method. To demonstrate the effectiveness of the methods, Section 3 first evaluates the proposed and existing face alignment methods on well-known face alignment databases. The face recognition performance under pose variations is then demonstrated on face recognition databases by using different SPO face recognition methods and different face alignment algorithms. Finally, Section 4 concludes the paper.

2 Methods

In this paper, we propose a robust face alignment method, which estimates the positions of facial landmarks, and a cross face warping method, which adjusts those positions. Any SPO face recognition method can then be applied to the adjusted face image to achieve better recognition performance. The face alignment method is based on multi-feature shape regression (MSR) to achieve robust landmark estimation. With the estimated landmarks, the cross face warping method reduces the pose variation of facial images so that the SPO face recognition methods obtain better recognition performance.

2.1 Face alignment with shape regression

The face shape is generally defined by the positions of M selected landmarks as

$$S = [x_1, y_1, x_2, y_2, \ldots, x_M, y_M] = [p_1, p_2, \ldots, p_M], \tag{1}$$

where $p_m = (x_m, y_m)$ denotes the position of the mth landmark in the facial image. To estimate the M landmarks from a given facial image, we should design an effective face alignment method to estimate $(x_m, y_m)$, m = 1, 2, …, M. The explicit shape regression (ESR) algorithm [9] is a famous learning-based regression method. Figure 1 shows the basic framework of the ESR algorithm with a boosted regression process [26, 27], which combines T weak regressors, $R^1, R^2, \ldots, R^T$, in an additive manner. Each regressor computes a shape increment $\delta S$ from image features and updates the face shape as

$$S^t = S^{t-1} + R^t(I, S^{t-1}), \quad t = 1, 2, \ldots, T. \tag{2}$$

Given N training data $(I_i, \hat{S}_i)$ for i = 1, 2, …, N, the regressors $R^1, R^2, \ldots, R^T$ are sequentially learnt until the training error no longer decreases. The tth regressor $R^t$ is learnt by minimizing the regression error

$$R^t = \arg\min_{R} \sum_{i=1}^{N} \left\| \hat{S}_i - \left( S_i^{t-1} + R(I_i, S_i^{t-1}) \right) \right\|, \tag{3}$$

where $\hat{S}_i$ denotes the ground truth shape and $S_i^{t-1}$ is the estimated shape obtained from the previous (t−1)th regressor.
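The additive update in Eq. (2) can be illustrated with a minimal sketch. The names `apply_cascade`, `halfway_regressor`, and `TARGET` are hypothetical and chosen only for this illustration; the toy regressor stands in for a learnt $R^t$ and is not the regressor used in the paper.

```python
from typing import Callable, List, Sequence

Shape = List[float]  # flattened landmark vector [x1, y1, ..., xM, yM]
Regressor = Callable[[object, Shape], Shape]  # returns a shape increment


def apply_cascade(image: object, initial_shape: Sequence[float],
                  regressors: Sequence[Regressor]) -> Shape:
    """Apply S^t = S^(t-1) + R^t(I, S^(t-1)) for t = 1..T, as in Eq. (2)."""
    shape = list(initial_shape)
    for regressor in regressors:
        increment = regressor(image, shape)  # delta S for this stage
        shape = [s + d for s, d in zip(shape, increment)]
    return shape


# Toy stand-in for a learnt regressor (an assumption for illustration only):
# it moves the current shape halfway towards a fixed target shape.
TARGET = [10.0, 20.0, 30.0, 40.0]


def halfway_regressor(image: object, shape: Shape) -> Shape:
    return [0.5 * (t - s) for t, s in zip(TARGET, shape)]


final_shape = apply_cascade(None, [0.0, 0.0, 0.0, 0.0], [halfway_regressor] * 3)
print(final_shape)  # [8.75, 17.5, 26.25, 35.0]
```

With three such stages the remaining error is one eighth of the initial error, which mirrors how each stage of a cascade only needs to correct part of what the previous stages left over.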

Fig. 3 Characteristics of the landmark including a pixel value, b regional block, and c gradient magnitude of a pixel

Fig. 4 Eight neighboring pixels in 3 × 3 and 5 × 5 windows around $p_k$

Table 1 Comparisons with different weights for pixel, region, and gradient differences

| Weights ($w_p$, $w_r$, $w_g$) | Landmark error (%) |
|---|---|
| 0.8, 0.1, 0.1 | 3.90 |
| 0.7, 0.2, 0.1 | 3.59 |
| 0.7, 0.1, 0.2 | 4.50 |
| 0.6, 0.3, 0.1 | 3.30 |
| 0.6, 0.2, 0.2 | 3.95 |
| 0.6, 0.1, 0.3 | 4.80 |

However, a single weak regressor has limited ability to reduce the error. For this reason, a two-level cascaded regression with feature selection, as shown in Fig. 1, is used, and each weak regressor $R^t$ is itself learnt by a second-level boosted regression, i.e., $R^t = [r_1^t, r_2^t, \ldots, r_k^t, \ldots, r_K^t]$, where each $r_k^t$ is a fern. The features are extracted by the shape-indexed method at each outer stage. Afterwards, each fern selects F of these features by the correlation-based feature selection method to infer an offset. The fern-based regressor, shape-indexed feature, and correlation-based feature selection are described in detail as follows.

The fern was first applied to classification [26] and later used for regression [27]. In the ESR, each fern is composed of F features and F thresholds. The thresholds divide all the training samples into $2^F$ bins. After all training samples are assigned to bins, the regression output $\delta S_b$ of each bin b minimizes the alignment error over $\Omega_b$, the set of training samples falling into the bin:

$$\delta S_b = \arg\min_{\delta S} \sum_{i \in \Omega_b} \left\| \hat{S}_i - (S_i + \delta S) \right\|, \tag{4}$$

where $S_i$ denotes the estimated shape in the previous step. According to (4), $\delta S_b$ can be estimated by the mean residual of the samples in the bin:

$$\delta S_b = \frac{1}{|\Omega_b|} \sum_{i \in \Omega_b} \left( \hat{S}_i - S_i \right). \tag{5}$$
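The binning and the per-bin output of Eqs. (4) and (5) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, a feature sets its bit when its value is greater than or equal to the threshold (the text does not fix this convention), and the bin output is the plain mean residual without any shrinkage.

```python
from typing import Dict, List, Sequence


def fern_bin(features: Sequence[float], thresholds: Sequence[float]) -> int:
    """Map F feature values to one of 2**F bins, one bit per threshold test.

    Assumption: a bit is 1 when the feature value is >= its threshold.
    """
    index = 0
    for value, threshold in zip(features, thresholds):
        index = (index << 1) | (1 if value >= threshold else 0)
    return index


def bin_outputs(feature_rows: Sequence[Sequence[float]],
                thresholds: Sequence[float],
                residuals: Sequence[Sequence[float]]) -> Dict[int, List[float]]:
    """Return the mean residual (ground truth minus current shape) per bin."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for feats, res in zip(feature_rows, residuals):
        b = fern_bin(feats, thresholds)
        if b not in sums:
            sums[b] = [0.0] * len(res)
            counts[b] = 0
        sums[b] = [acc + r for acc, r in zip(sums[b], res)]
        counts[b] += 1
    return {b: [v / counts[b] for v in sums[b]] for b in sums}


# Three training samples, F = 2 features, residuals of a one-landmark shape.
rows = [[1.0, -1.0], [2.0, -3.0], [-1.0, 5.0]]
residuals = [[2.0, 4.0], [4.0, 0.0], [1.0, 1.0]]
outputs = bin_outputs(rows, [0.0, 0.0], residuals)
print(outputs)  # {2: [3.0, 2.0], 1: [1.0, 1.0]}
```

The first two samples fall into bin 2 and share the averaged increment, while the third sample is alone in bin 1 and receives its own residual.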

The training samples falling into the same bin share the same regression output, $\delta S_b$. Each outer-stage regressor randomly generates P pixels, $I(q_k)$, k = 1, 2, …, P, which are indexed relative to the nearest landmark of the mean shape, as shown in Fig. 2. In total, $P^2$ pixel-difference features,

$$f_{k,j} = I(q_k) - I(q_j), \tag{6}$$

between all possible pairs of pixels are generated. The local features are more discriminative than the global ones. Pixels indexed by the same local coordinates have the same semantic meaning, whereas pixels indexed by the same global coordinates have different semantic meanings due to face pose variation. Most of the useful features are distributed around salient landmarks such as the eyes, nose, and mouth. To form a fern, F out of the $P^2$ features are selected by calculating the correlation between the features and the regression target, which is the difference between the ground truth shape and the current estimated shape. To do so, we generate a random unit vector and project each regression target onto it. We then estimate the correlation coefficient between the feature values and the lengths of the projections and select the most correlated feature.

Fig. 5 Flow diagram of typical alignment-based face recognition. The training path runs from training images through skin color detection, Viola-Jones face detection, face alignment computation, and face warping computation to face collection and training; the testing path runs from testing images through the same four steps to face recognition and its results

Fig. 6 Seven selected key landmarks for face alignment

Fig. 7 Facial image with a seven landmarks retrieved by the face alignment method and b three fitting (HE, HM, and VN) lines obtained by the least square method
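The correlation-based selection described for the ESR fern (project the regression targets onto a random unit vector, then pick the feature most correlated with the projection lengths) can be sketched as below. The names `pearson` and `select_features` are hypothetical, the features are passed as precomputed columns, and the plain Pearson coefficient is used here instead of any accelerated computation; the same feature may be picked more than once in this simplified version.

```python
import math
import random
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 when either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def select_features(feature_columns: Sequence[Sequence[float]],
                    targets: Sequence[Sequence[float]],
                    num_select: int, rng: random.Random) -> List[int]:
    """Pick num_select feature indices, one per random projection."""
    dim = len(targets[0])
    chosen: List[int] = []
    for _ in range(num_select):
        # Random unit vector in the space of regression targets.
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(d * d for d in direction))
        direction = [d / norm for d in direction]
        # Scalar projection of every regression target onto that vector.
        projections = [sum(t * d for t, d in zip(target, direction))
                       for target in targets]
        best = max(range(len(feature_columns)),
                   key=lambda c: abs(pearson(feature_columns[c], projections)))
        chosen.append(best)
    return chosen


# Column 1 is proportional to the (one-dimensional) targets, column 0 is not.
targets = [[1.0], [2.0], [3.0], [4.0]]
columns = [[1.0, -1.0, 1.0, -1.0], [2.0, 4.0, 6.0, 8.0]]
print(select_features(columns, targets, 2, random.Random(0)))  # [1, 1]
```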

2.2 Multi-feature shape regression

The ESR algorithm measures the similarity of landmarks by the intensity difference of pixels, as stated in (6); however, a landmark is characterized by more than its pixel value. As shown in Fig. 3, we should further check the similarity of the surroundings to improve the detection performance. The multi-feature shape regression (MSR) method replaces the pixel-difference feature with multiple features to achieve more robust landmark detection than the ESR method. The first feature set, as shown in Fig. 3a, consists of the color values of the pixel at $p_k$, defined as

$$v_k^p = [r(p_k), g(p_k), b(p_k)], \tag{7}$$

where $r(p_k)$, $g(p_k)$, and $b(p_k)$ denote the red, green, and blue values of the pixel at $p_k$ for the kth selected landmark of the image, respectively. To achieve reliable results, the intensities of eight neighboring pixels in 3 × 3 and 5 × 5 windows, as shown in Fig. 4, can be used for detecting the similarity of the landmarks. Thus, for the second feature set, as shown in Fig. 3b, we use the regional values, $v_k^r$, at $p_k$:

$$v_k^r = [I(z_1), I(z_2), \ldots, I(z_8)]. \tag{8}$$

For the last feature set, as shown in Fig. 3c, we choose the gradient magnitudes at $p_k$, defined as

$$v_k^g = [\nabla_x(I(p_k)), \nabla_y(I(p_k))], \tag{9}$$

where $\nabla_x(I(p_k))$ and $\nabla_y(I(p_k))$ denote the gradients along the x and y directions, computed by the horizontal and vertical Sobel filters, respectively. With pixel, region, and gradient features, the total difference between the jth pixel and the kth landmark at $p_k$ is expressed by

$$d_{k,j}^T = w_p d_{k,j}^p + w_r d_{k,j}^r + w_g d_{k,j}^g, \tag{10}$$

where $w_p$, $w_r$, and $w_g$ are the selected weights for the pixel, region, and gradient differences, respectively. In (10), $d_{k,j}^p$, $d_{k,j}^r$, and $d_{k,j}^g$ are given as

$$d_{k,j}^p = \left\| v_k^p - v_j^p \right\|, \tag{11}$$

$$d_{k,j}^r = \left\| v_k^r - v_j^r \right\|, \tag{12}$$

and

$$d_{k,j}^g = \left\| v_k^g - v_j^g \right\|, \tag{13}$$

which denote the pixel, region, and gradient differences between the kth and jth pixels, respectively. Thus, the feature $f_{k,j}$ stated in (6) for the ESR method is changed to the total difference

$$f_{k,j} = d_{k,j}^T \tag{14}$$

in the proposed MSR method. To determine the weights in (10), Table 1 shows the landmark errors obtained with different sets of weights. The pixel difference, which plays the main role in shape regression, takes the largest weight, while the region and gradient differences, which are used for feature refinement, take smaller weights. By experiments, we found that weights of 0.6, 0.3, and 0.1 for the pixel, region, and gradient differences, respectively, achieve the best performance for shape regression. Note that the MSR concept can be extended to more features and can be applied to landmark estimation for any target object.

Fig. 8 Four major deformations based on detected cross in face images. a Left-tilted face. b Right-tilted face. c Rotation left face. d Rotation right face

Fig. 9 Flow diagram of conditional cross warping for tilted faces
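The weighted total difference of Eqs. (10) to (14) can be sketched as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, each partial difference is taken as the Euclidean norm of the difference of the two feature vectors, and the Sobel responses are computed directly on a small grayscale image given as a list of rows.

```python
import math
from typing import List, Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors (assumed norm)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sobel_gradient(gray: Sequence[Sequence[float]], x: int, y: int) -> List[float]:
    """Horizontal and vertical Sobel responses at an interior pixel (x, y)."""
    gx = ((gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1])
          - (gray[y - 1][x - 1] + 2 * gray[y][x - 1] + gray[y + 1][x - 1]))
    gy = ((gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1])
          - (gray[y - 1][x - 1] + 2 * gray[y - 1][x] + gray[y - 1][x + 1]))
    return [float(gx), float(gy)]


def total_difference(vp_k, vp_j, vr_k, vr_j, vg_k, vg_j,
                     weights=(0.6, 0.3, 0.1)) -> float:
    """Weighted sum of pixel, region, and gradient differences, Eq. (10).

    The default weights are the ones reported as best in Table 1.
    """
    w_p, w_r, w_g = weights
    return (w_p * l2_distance(vp_k, vp_j)
            + w_r * l2_distance(vr_k, vr_j)
            + w_g * l2_distance(vg_k, vg_j))


# A horizontal intensity ramp has a pure x-gradient at its centre pixel.
ramp = [[0, 1, 2], [0, 1, 2], [0, 1, 2]]
gradient = sobel_gradient(ramp, 1, 1)  # [8.0, 0.0]

# Identical colours, regional vectors 5 apart, gradient vectors 8 apart.
feature = total_difference([10, 10, 10], [10, 10, 10],
                           [3.0, 0.0], [0.0, 4.0],
                           gradient, [0.0, 0.0])
print(round(feature, 6))  # 2.3
```

Because the three partial differences live on different numeric scales, the weights also act as a crude normalization; other norms or per-feature scaling would be equally valid choices in such a sketch.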

2.3 Face warping method for alignment-based face recognition

Once the positions of the key landmarks are extracted by a face alignment method, they can be used for many applications such as face animation for augmented reality (AR) and virtual reality (VR) as well as emotion detection and face recognition for smart living services. In this section, we use face alignment to improve the performance of face recognition. Figure 5 shows the flow diagram of typical alignment-based face recognition, which includes four major functions: face detection, face alignment, face warping, and face recognition. The face detection includes skin color detection, morphological operations, and the Viola-Jones face detector [28]. Skin color is a simple and distinct feature that reduces the computation of face detection [29]. Morphological erosion and dilation are used to remove noise from the detected skin areas. After the morphological operations, the final face detection is performed on large connected skin areas by the Viola-Jones face detector. To improve the SPO face recognition methods [10–21], the alignment-based face recognition approach needs a good selection of landmarks and a good warping algorithm to adjust the pose variation of face images.

Fig. 10 Top view and detected cross lines related to the nose point of a right-rotation, b normal, and c left-rotation faces

Fig. 11 Aligned face images. a Facial image after rotating. b Facial image after cropping

Fig. 12 Composition of multiple features for the MSR method

Figure 6 shows the seven selected landmarks, including four eye canthi, one nose tip, and two mouth corners, which are used in the face warping method. After the seven key landmarks are extracted by a face alignment method, we suggest a cross warping method to adjust a facial image with possible pose variation. First, three fitting lines are obtained by the least square method as shown in Fig. 7b. The horizontal eye (HE) line is obtained by fitting the four landmark positions on the canthi of the eyes. The horizontal mouth (HM) line is obtained from the two landmark positions on the mouth corners. The vertical nose (VN) line is found by fitting, in the least square sense, a line that passes through the landmark at the nose and is orthogonal to the HE and HM lines. Figure 8 shows the typical cross shapes, composed of the VN and HE lines, of straight front, left-tilted, right-tilted, left-rotation, and right-rotation faces, which will be used for adjusting the face alignment. The proposed cross warping method is described as follows.
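The least-square fit of the HE line through the four canthus landmarks can be sketched with ordinary least squares on y = m x + c. The function name `fit_line` and the sample coordinates are hypothetical. The sketch covers only the near-horizontal HE and HM lines; a near-vertical line such as VN would be better parameterized as x in terms of y, which is an assumption of this note rather than something stated in the text.

```python
from typing import List, Sequence, Tuple


def fit_line(points: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    s_xx = sum((x - mean_x) ** 2 for x, _ in points)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if s_xx == 0.0:
        raise ValueError("points share one x value; the line is vertical")
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


# Hypothetical canthus positions lying on y = 0.1 x + 40 (a slightly tilted face).
canthi: List[Tuple[float, float]] = [(30.0, 43.0), (45.0, 44.5),
                                     (65.0, 46.5), (80.0, 48.0)]
he_slope, he_intercept = fit_line(canthi)
print(round(he_slope, 6), round(he_intercept, 6))  # 0.1 40.0
```

The returned slope plays the role of $m_1$ in the cross-angle and eye-tilted-angle formulas that follow in the text.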

For general facial images, the deformation can be a mixture of tilt and rotation. The flow diagram of the cross warping method for correcting the face alignment is shown in Fig. 9. Since the estimated landmarks are not always correct, we need to check the reliability of all the landmarks at the same time. It is reasonable to expect the two cross lines to be nearly orthogonal if the estimated landmarks are correct. Thus, the cross angle θ between the HE and VN lines is computed as

$$\theta = \cos^{-1}\left( \frac{\mathbf{m}_1 \cdot \mathbf{m}_2}{\|\mathbf{m}_1\| \, \|\mathbf{m}_2\|} \right) = \cos^{-1}\left( \frac{1 + m_1 m_2}{\left[ (1 + m_1^2)(1 + m_2^2) \right]^{1/2}} \right), \tag{15}$$

where $\mathbf{m}_1 = [1, m_1]$ and $\mathbf{m}_2 = [1, m_2]$ are the slope vectors, which characterize the HE and VN lines with slopes of $m_1$ and $m_2$, respectively. The dot operator in (15) denotes the inner product. Before the warping process, we first compute the eye-tilted and nose-tilted angles. The eye-tilted angle α between the horizontal and the HE line is expressed as

$$\alpha = \tan^{-1}(m_1), \tag{16}$$

while the nose-tilted angle β between the vertical and the VN line is given by

$$\beta = \tan^{-1}(m_2) - 90^\circ. \tag{17}$$

If the cross angle is in the range 80° ≤ θ ≤ 100°, the rotation angle for face alignment is the average of the eye-tilted and nose-tilted angles:

$$\theta_{rot} = (\alpha + \beta)/2. \tag{18}$$

By setting the nose position as the center, the face image is rotated by an affine transform with $\theta_{rot}$ degrees and cropped. If the cross angle is outside the range 80° ≤ θ ≤ 100°, the rotation angle is determined by either the eye-tilted angle or the nose-tilted angle. If there are more than two landmarks on the HE line, the rotation angle is the eye-tilted angle α; otherwise, it is the nose-tilted angle β.
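The decision rule built from Eqs. (15) to (18) can be sketched as follows. The function names and the `landmarks_on_he` parameter are hypothetical. One step is an assumption added here and not stated in the text: β is wrapped into the interval [−90°, 90°), because the arctangent of a negative VN slope would otherwise make Eq. (17) return values near −180° for an almost upright nose line.

```python
import math


def cross_angle(m1: float, m2: float) -> float:
    """Angle in degrees between slope vectors [1, m1] and [1, m2], Eq. (15)."""
    cos_theta = (1.0 + m1 * m2) / math.sqrt((1.0 + m1 ** 2) * (1.0 + m2 ** 2))
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding
    return math.degrees(math.acos(cos_theta))


def wrap_tilt(angle: float) -> float:
    """Wrap an angle in degrees into [-90, 90). Assumption, see lead-in."""
    return (angle + 90.0) % 180.0 - 90.0


def rotation_angle(m1: float, m2: float, landmarks_on_he: int) -> float:
    """Rotation angle in degrees chosen by the cross-warping rule."""
    theta = cross_angle(m1, m2)
    alpha = math.degrees(math.atan(m1))                     # Eq. (16)
    beta = wrap_tilt(math.degrees(math.atan(m2)) - 90.0)    # Eq. (17), wrapped
    if 80.0 <= theta <= 100.0:
        return (alpha + beta) / 2.0                         # Eq. (18)
    # Unreliable cross: fall back to a single line.
    return alpha if landmarks_on_he > 2 else beta


# A face tilted by 5 degrees: HE at 5 degrees, VN perpendicular to it.
m_he = math.tan(math.radians(5.0))
m_vn = math.tan(math.radians(95.0))
print(round(cross_angle(m_he, m_vn), 6))        # 90.0
print(round(rotation_angle(m_he, m_vn, 4), 6))  # 5.0
```

When the two lines are far from orthogonal, for example an HE line at 5° and a VN line at 60°, the cross angle is 55° and the rule returns α if more than two HE landmarks are available and β otherwise.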

As for the rotation-left and rotation-right deformations depicted in Fig. 8c, d, the faces rotate slightly toward the left and right directions, respectively. Figure 10 exhibits three top views of rotated faces. For the normal face, the VN line divides the HE line into two equal arms, as shown in Fig. 10b. However, the right-rotation face produces a longer right arm and a shorter left arm, as shown in Fig. 10a, while the left-rotation face produces a shorter right arm and a longer left arm, as shown in Fig. 10c. For simplicity, we only allow face rotations between −6° and +6°. The true warping angle with respect to the VN line, which actually depends on the camera distance and focal length, is detected as −6°, −3°, 0°, +3°, or +6° from the ratio of the HE segment lengths separated by the VN line. If the face image contains both tilt and rotation variations, we perform the tilt adjustment first and then the rotation warping.

Table 2 Landmark errors and failure rates compared with different features of the MSR method on LFPW database

| Methods | MSR (pd) | MSR (pd + rd) | MSR (pd + rd + gd) |
|---|---|---|---|
| Landmark error (%) | 4.06 | 3.78 | 3.30 |
| Failure rate (%) | 7.27 | 3.63 | 3.60 |

Table 3 Landmark errors and failure rates compared with different features of the MSR method on HELEN database

| Methods | MSR (pd) | MSR (pd + rd) | MSR (pd + rd + gd) |
|---|---|---|---|
| Landmark error (%) | 4.31 | 4.17 | 3.83 |
| Failure rate (%) | 3.63 | 1.52 | 0.91 |

After the warping transform, the facial image is adjusted to become a straight frontal face, as in Fig. 11a. Since the affine transform may yield white (unknown) regions, the images are further cropped to 80% of the face image. Finally, the adjusted face image, as shown in Fig. 11b, is used for face recognition.

3 Experimental results and discussion

To assess the performance of the proposed MSR face alignment, the experiments are divided into two main parts. The first part verifies the alignment performance of the proposed MSR face alignment method, while the second part evaluates the recognition performance of alignment-based face recognition using the proposed MSR face alignment and cross warping methods.

3.1 Experiments for face alignment

In the face alignment experiments, the proposed multi-feature shape regression (MSR), the explicit shape regression (ESR) [9], and the other face alignment methods are compared on the LFPW [29] and HELEN [30] face alignment databases. The LFPW database contains 792 facial images for the training phase and 220 facial images for the testing phase. These facial images were taken with different poses, facial expressions, and head rotations. Each facial image has 68 landmarks, which were annotated manually. The HELEN face database contains 1000 facial images for the training phase and 330 facial images for the testing phase. Each facial image contains 194 landmarks, which were also annotated manually.

To evaluate the performance, the average landmark error and the failure rate are the two important criteria for assessing face alignment algorithms. The average landmark error for all N testing images is defined as

$$\text{error} = \frac{1}{N} \sum_{n=1}^{N} \varepsilon_n, \quad \text{with} \quad \varepsilon_n = \frac{1}{M} \sum_{m=1}^{M} \left[ \left( \frac{x_m^n - \tilde{x}_m^n}{w_n} \right)^2 + \left( \frac{y_m^n - \tilde{y}_m^n}{h_n} \right)^2 \right]^{1/2}, \tag{19}$$

where $(x_m^n, y_m^n)$ and $(\tilde{x}_m^n, \tilde{y}_m^n)$ respectively represent the positions of the mth estimated landmark and the mth ground truth landmark, $(w_n, h_n)$ is the image size of the nth image, and M denotes the number of landmarks. If the average landmark error $\varepsilon_n$ of a testing image is more than 0.1, the image is treated as a fail case, and the number of fail cases, f, is given by

$$f = \sum_{n=1}^{N} \delta_{fail}(n), \quad \delta_{fail}(n) = \begin{cases} 1, & \text{if } \varepsilon_n > 0.1, \\ 0, & \text{if } \varepsilon_n \le 0.1. \end{cases} \tag{20}$$

Thus, the failure rate is defined as

$$\text{failure rate (\%)} = \frac{f}{N} \times 100\%. \tag{21}$$

Fig. 13 Selected face alignment results by using a the ESR method (top row), b the MSR method with pixel and region difference features (middle row), and c the MSR with pixel, region, and gradient difference features (bottom row)

Table 4 Comparisons of different face alignment methods on LFPW database

| Methods | LPCM | ERT | RCPR | SDM | ESR | MSR |
|---|---|---|---|---|---|---|
| Error | 0.040 | 0.038 | 0.035 | 0.035 | 0.040 | 0.033 |

Table 5 Comparisons of different face alignment methods on HELEN database

| Methods | ERT | RCPR | SDM | ESR | MSR |
|---|---|---|---|---|---|
| Error | 0.049 | 0.065 | 0.059 | 0.043 | 0.038 |

In addition, experimental results for face alignment on the AR and FRGC databases are also presented. Since these two databases do not provide ground truth shapes, we can only show some selected samples of facial images and their estimated shapes.

The proposed multi-feature shape regression (MSR) method considers the total difference of the pixel difference (pd), region difference (rd), and gradient difference (gd). The three compositions of the multiple features for MSR are shown in Fig. 12. Tables 2 and 3 show the landmark errors and failure rates obtained with different combinations of the multiple features for the proposed MSR on the LFPW and HELEN databases, respectively. Some selected facial images with the landmarks detected by the proposed MSR methods and the ESR method are also shown in Fig. 13. Since the combination of all three features gives the lowest landmark error and failure rate on both databases, the MSR face alignment method uses the pixel difference (pd), region difference (rd), and gradient difference (gd) with weights of 0.6, 0.3, and 0.1 for the remaining simulations. For the comparison of different face alignment methods, the results of LPCM (Localizing Parts of faces using a Consensus of Exemplars) [29], ERT (Ensemble of Regression Trees) [31], RCPR (Robust Cascaded Pose Regression) [32], and SDM (Supervised Descent Method) [33] are shown in Tables 4 and 5. The results show that the proposed MSR achieves the lowest error among the compared methods: 0.033 against 0.035 for the best competitors (RCPR and SDM) in Table 4, and 0.038 against 0.043 for ESR in Table 5.

3.2 Experiments for face recognition

For face recognition experiments on the AR database [34], we select 100 subjects, as shown in Fig. 14, for performance evaluation. Each subject contains 18 images, where AR1–AR6 are the original images and AR7–AR18 are the synthesized ones. For face recognition experiments on the FRGC database [35], as shown in Fig. 15, we also pick 100 subjects for performance evaluation. Each subject contains 12 images, where FRGC1–FRGC4 are the original images and FRGC5–FRGC12 are the synthesized ones. Each facial image is downsampled to 20 × 20 pixels.

Fig. 14 Face images in AR database (AR1–6) and the synthesized images (AR7–18) for a sample identity

Fig. 15 Face images in FRGC database (FRGC 1–4) and the synthesized images (FRGC 5–12) for a sample identity

To validate the proposed alignment-based face recognition system, the recognition performances achieved by different algorithms are simulated. The face recognition algorithms used in the experiments include principal component analysis (PCA) [10, 11], linear discriminant analysis (LDA) [15], linear regression classification (LRC) [22], modular linear regression-based classification (MLRC) [22], sparse representation classification (SRC) [36, 37], locality preserving projection (LPP) [38], neighborhood preserving embedding (NPE) [39], improved principal component regression (IPCR) [40], unitary regression classification (URC) [41], linear discriminant regression classification (LDRC) [42], and kernel linear regression classification (KLRC) [43] methods. From Fig. 14, four original face images, AR1, AR3, AR4, and AR5, for each identity are used for training, while two original images, AR2 and AR6, and four randomly selected synthesized images are used for testing. From Fig. 15, three original face images, FRGC2, FRGC3, and FRGC4, for each identity are used for training, while the original image FRGC1 and two randomly selected synthesized images are used for testing.
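To make the training and testing protocol concrete, the following sketch shows the usual decision rule of LRC [22], one of the compared classifiers: each class is represented by its own training images, and a probe is assigned to the class whose least-squares reconstruction leaves the smallest residual. This is a minimal sketch, not the code used in the experiments. The names `lrc_predict` and `train_by_class` are illustrative, and random vectors stand in for the 20 × 20 face images.

```python
import numpy as np


def lrc_predict(train_by_class, probe):
    """Return the label whose training subspace best reconstructs the probe.

    train_by_class: dict mapping label -> array of shape (d, n_i), with one
                    vectorized training image per column.
    probe:          array of shape (d,), a vectorized test image.
    """
    best_label, best_residual = None, np.inf
    for label, X in train_by_class.items():
        # Least-squares coefficients using this class's images only.
        beta, *_ = np.linalg.lstsq(X, probe, rcond=None)
        residual = np.linalg.norm(probe - X @ beta)
        if residual < best_residual:
            best_label, best_residual = label, residual
    return best_label


# Synthetic stand-in data: 3 identities, 4 training images each, 20 x 20 pixels.
rng = np.random.default_rng(0)
d = 20 * 20
train = {k: rng.random((d, 4)) for k in range(3)}
# A probe built from identity 1's training images plus small noise.
probe = train[1] @ np.array([0.4, 0.3, 0.2, 0.1]) + 0.01 * rng.standard_normal(d)
predicted = lrc_predict(train, probe)
```

With four training images per identity, as in the AR setting, each class subspace has at most four dimensions, so a probe that is poorly aligned with its own class images can easily be reconstructed no better by its true class than by another one.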

In the face recognition experiments, the abovementioned face recognition algorithms are compared in three categories: (1) without alignment, (2) with ESR alignment, and (3) with MSR alignment. After face alignment by the ESR and MSR methods, the face images in both cases are adjusted by the proposed conditional cross warping method for fair comparison. Figure 16 shows the seven landmarks detected by the ESR and MSR methods on some tested (normal and synthesized) images. The results also show that the MSR method has higher precision than the ESR method in landmark estimation on the AR and FRGC databases.

If the testing face images are the normal face images, Tables 6 and 7 show the recognition performances on the AR (AR2, AR6) and FRGC (FRGC1) databases, respectively. For the normal face images, note that the ESR and the proposed MSR methods, without any prior knowledge, still perform the face alignment and face warping processes. The recognition results show that the proposed alignment-based face recognition systems are quite reliable, while the proposed MSR method performs better than the ESR method. For posed face images (synthesized face images), Tables 8 and 9 show the recognition rates on the AR and FRGC databases, respectively. The simulation results show that the proposed MSR face alignment and

Fig. 16 Face alignment results (seven landmarks) achieved by ESR and MSR methods. a AR database. b FRGC database

Table 6 Recognition performances (%) on AR database (normal faces)

| Method | Without alignment | Alignment by ESR | Alignment by MSR |
|---|---|---|---|
| PCA | 85.00 | 86.67 | 91.00 |
| LDA | 98.75 | 95.00 | 97.00 |
| LRC | 97.00 | 97.50 | 98.00 |
| MLRC | 95.00 | 93.00 | 95.00 |
| SRC | 98.50 | 99.00 | 98.50 |
| LPP | 82.00 | 81.50 | 83.00 |
| NPE | 90.50 | 90.50 | 92.50 |
| IPCR | 97.00 | 96.50 | 96.50 |
| URC | 99.00 | 98.00 | 99.00 |
| LDRC | 97.00 | 97.00 | 96.00 |
| KLRC | 97.00 | 95.00 | 96.00 |
| Average | 94.25 | 93.61 | 94.77 |


conditional cross warping processes can effectively over-

come the problems of pose variations. The proposed MSR

method achieves better performances than the ESR method

not only in face alignment but also in face recognition.

Among all face recognition algorithms, the SRC and URC

methods in conjunction with the proposed alignment-based

face recognition system perform better than the other face

recognition methods.
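The "Average" row of Table 6 can be checked directly from the per-method rates. The short script below recomputes the three column means from the values listed in Table 6. It is a sketch, and the names `table6` and `column_means` are illustrative.

```python
# Recognition rates (%) copied from Table 6 (AR database, normal faces).
# Each tuple is (without alignment, alignment by ESR, alignment by MSR).
table6 = {
    "PCA": (85.00, 86.67, 91.00),
    "LDA": (98.75, 95.00, 97.00),
    "LRC": (97.00, 97.50, 98.00),
    "MLRC": (95.00, 93.00, 95.00),
    "SRC": (98.50, 99.00, 98.50),
    "LPP": (82.00, 81.50, 83.00),
    "NPE": (90.50, 90.50, 92.50),
    "IPCR": (97.00, 96.50, 96.50),
    "URC": (99.00, 98.00, 99.00),
    "LDRC": (97.00, 97.00, 96.00),
    "KLRC": (97.00, 95.00, 96.00),
}


def column_means(table):
    """Return the mean of each column, rounded to two decimals."""
    columns = zip(*table.values())
    return tuple(round(sum(col) / len(table), 2) for col in columns)


means = column_means(table6)
```

The recomputed means agree with the printed averages of 94.25, 93.61, and 94.77. The same function can be applied to the rows of Tables 7 to 9.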

4 Conclusions

In this paper, the multi-feature shape regression (MSR) method, which jointly considers pixel difference, region difference, and gradient difference, is proposed. For face recognition applications, a cross warping method is suggested to achieve alignment-based face recognition. The proposed MSR face alignment method helps to precisely estimate seven key landmarks of face images. Simulation results show that the MSR method, which utilizes more features computed from surrounding pixels, achieves better alignment performance than the explicit shape regression (ESR) algorithm, which only uses pixel difference. With the seven selected key facial landmarks, including four eye canthi, one nose tip, and two mouth corners, we can find a cross shape, which is defined by the estimated horizontal-eye (HE) and vertical-nose (VN) lines. By the cross warping process, we can adjust a tilted face image back to a normal face image to overcome the problem of pose variations in face recognition. The experimental results show that the MSR method performs better than the ESR and other face alignment algorithms on the face alignment databases. For alignment-based face recognition, the MSR face alignment algorithm with the cross warping method helps the SPO face recognition methods to achieve better recognition performances. Overall, the simulation results show that the proposed MSR face alignment method achieves better performance in both face alignment and face recognition than the existing face alignment methods.
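To illustrate how the seven landmarks can define the HE line and be used to level a tilted face, the following sketch fits a line through the four eye canthi and rotates all landmarks about the nose tip until that line is horizontal. It covers only an in-plane rotation and is not the conditional cross warping defined earlier in the paper. The landmark ordering, the function names `he_angle` and `level_landmarks`, and the coordinates are assumptions made for this example.

```python
import numpy as np


def he_angle(canthi):
    """Angle (radians) of a least-squares line through the four eye canthi.

    Assumes the tilt is well below 90 degrees, so the line is not vertical.
    """
    x, y = canthi[:, 0], canthi[:, 1]
    slope, _ = np.polyfit(x, y, 1)
    return np.arctan(slope)


def level_landmarks(landmarks, canthi_idx=(0, 1, 2, 3), nose_idx=4):
    """Rotate landmarks about the nose tip so that the HE line is horizontal."""
    pts = np.asarray(landmarks, dtype=float)
    theta = he_angle(pts[list(canthi_idx)])
    c, s = np.cos(-theta), np.sin(-theta)
    rot = np.array([[c, -s], [s, c]])
    nose = pts[nose_idx]
    return (pts - nose) @ rot.T + nose


# Made-up landmarks for an upright face: 4 canthi, nose tip, 2 mouth corners.
upright = np.array([[30, 40], [45, 40], [55, 40], [70, 40],
                    [50, 60], [38, 80], [62, 80]], dtype=float)
# Tilt the face by 15 degrees about the nose tip, then undo the tilt.
t = np.deg2rad(15.0)
tilt = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
tilted = (upright - upright[4]) @ tilt.T + upright[4]
restored = level_landmarks(tilted)
```

In this toy case the restored landmarks coincide with the upright ones. A full warping step would also resample the image pixels and handle out-of-plane pose, which this sketch does not attempt.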

Abbreviations

AAM: Active appearance model; AR: Augmented reality; ASM: Active shape model; ESR: Explicit shape regression; IPCR: Improved principal component regression; KLDA: Kernel LDA; LDA: Linear discriminant analysis; LPP: Locality preserving projection; LRC: Linear regression classification; MLRC: Modular linear regression classification; MSR: Multi-feature shape regression; NPE: Neighborhood preserving embedding; PCA: Principal component analysis; SPO: Subspace projection optimizations; URC: Unitary regression classification; VR: Virtual reality

Table 7 Recognition performances (%) on FRGC database (normal faces)

| Method | Without alignment | Alignment by ESR | Alignment by MSR |
|---|---|---|---|
| PCA | 98.00 | 97.00 | 98.00 |
| LDA | 99.00 | 99.00 | 99.00 |
| LRC | 98.00 | 96.00 | 98.00 |
| MLRC | 98.00 | 95.00 | 96.00 |
| SRC | 98.00 | 96.00 | 98.00 |
| LPP | 94.00 | 90.00 | 88.00 |
| NPE | 98.00 | 96.00 | 97.00 |
| IPCR | 98.00 | 94.00 | 96.00 |
| URC | 98.00 | 100.00 | 98.00 |
| LDRC | 97.00 | 93.00 | 91.00 |
| KLRC | 98.00 | 97.00 | 98.00 |
| Average | 97.63 | 95.73 | 96.09 |

Table 8 Recognition performances (%) on FRGC database (synthesized faces)

| Method | Without alignment | Alignment by ESR | Alignment by MSR |
|---|---|---|---|
| PCA | 46.25 | 78.29 | 80.00 |
| LDA | 55.00 | 89.08 | 90.75 |
| LRC | 48.00 | 88.75 | 89.50 |
| MLRC | 28.75 | 74.25 | 76.50 |
| SRC | 50.25 | 91.75 | 92.75 |
| LPP | 30.50 | 62.25 | 64.25 |
| NPE | 40.75 | 80.00 | 78.00 |
| IPCR | 43.00 | 84.75 | 85.75 |
| URC | 64.50 | 91.75 | 93.50 |
| LDRC | 27.25 | 76.00 | 76.25 |
| KLRC | 41.50 | 88.00 | 86.25 |
| Average | 43.25 | 82.26 | 83.05 |

Table 9 Recognition rates on FRGC with different face recognition algorithms (synthesized faces)

| Method | Without alignment | Alignment by ESR | Alignment by MSR |
|---|---|---|---|
| PCA | 69.00 | 91.00 | 91.00 |
| LDA | 54.50 | 76.50 | 83.00 |
| LRC | 43.50 | 80.50 | 85.50 |
| MLRC | 50.00 | 60.50 | 62.00 |
| SRC | 61.00 | 85.50 | 84.00 |
| LPP | 22.00 | 47.00 | 50.50 |
| NPE | 46.50 | 79.00 | 80.00 |
| IPCR | 41.00 | 76.00 | 75.50 |
| URC | 49.00 | 81.50 | 85.00 |
| LDRC | 6.50 | 37.00 | 37.50 |
| KLRC | 46.50 | 76.50 | 76.00 |
| Average | 44.50 | 71.91 | 73.64 |


Acknowledgements

The authors thank the Editor, the anonymous Reviewers, and Professor Din-Yuen Chan for their critical comments on the presentation and writing of the manuscript.

Funding

This work was supported by the Ministry of Science and Technology, Taiwan,

under Grant MOST 105-2221-E-006-065-MY3.

Availability of data and materials

The face alignment data is obtained from the LFPW and HELEN face

alignment databases provided in [29,30], respectively. The face recognition data

is retrieved from the AR and FRGC databases delivered in [34,35], respectively.

As for the augmented face images, the datasets generated for the current study are

available from the corresponding author on reasonable request.

Authors’ contributions

W-JY carried out image processing studies, participated in the proposed

system, assembled formulations, and drafted the manuscript. Y-CC carried out

software simulations and face data augmentation by warping parameters. P-CC

and J-FY conceived of the study, participated in its design and coordination,

and helped to draft the manuscript. All authors read and approved the final

manuscript.

Authors’ information

W-J Yang received a B.S. degree in Computer Science from Tunghai

University, Taiwan, in 2012 and an M.S. degree in Computer Science and

Information Engineering from National University of Tainan, Taiwan, in 2015.

Currently, he is a Ph.D. student with the Graduate Institute of Computer and

Communication Engineering in National Cheng Kung University, Taiwan. His

current research interests include pattern recognition, machine learning, and

deep learning for designs of smart systems.

Y-C Chen received a B.S. degree in Electrical Engineering and an M.S. degree

in Computer and Communication Engineering from the National Cheng

Kung University, Tainan, Taiwan, in 2014 and 2016, respectively. Her current

research interests include face recognition and machine learning.

P-C Chung received a Ph.D. degree in Electrical Engineering from Texas Tech

University, Lubbock, TX, USA, in 1991. She was with the Department of

Electrical Engineering, National Cheng Kung University (NCKU), Tainan,

Taiwan, in 1991 and became a Full Professor in 1996. She applies most of

her research results to healthcare and medical applications. Dr. Chung is a

member of the Phi Tau Phi Honor Society, was a member of the Board of

Governors of CAS Society from 2007 to 2009 and from 2010 to 2012, and is

currently an ADCOM Member of the IEEE CIS and the Chair of CIS

Distinguished Lecturer Program. She also is an Associate Editor of IEEE

Transaction on Neural Networks and the Editor of Journal of Information

Science and Engineering, the Guest Editor of Journal of High Speed Network,

the Guest Editor of IEEE Transaction on Circuits and Systems-I, and the Secretary

General of Biomedical Engineering Society of China. She is one of the Co-Founders of Medical Image Standard Association (MISA) in Taiwan and is currently on the Board of Directors of MISA. Her research interests include image/video analysis and pattern recognition, bio signal analysis, computer vision, and

computational intelligence. She is an IEEE fellow.

J-F Yang received a Ph.D. degree in Electrical Engineering from the

University of Minnesota, Minneapolis, MN, USA, in 1988. He joined the

National Cheng Kung University (NCKU), Taiwan, in 1988 and was promoted

to Distinguished Professor in 2004. Dr. Yang was the Distinguished Lecturer

in the Program by the IEEE Circuits and Systems Society (CAS) from 2004 to

2005. He was the Chair of the IEEE CAS Multimedia Systems and Applications

Technical Committee from 2008 to 2009. He was an Associate Editor of IEEE

Transaction on Circuits and Systems for Video Technology and EURASIP

Journal of Advances in Signal Processing. He is an IEEE Fellow. Currently, he

is an Associate Editor of IET Signal Processing. He was a recipient of the NSC

Excellent Research Award in Taiwan in 2008. He has published over 135 journal papers and 216 conference papers. Currently, his research interests include

multimedia processing, coding, and recognition.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in

published maps and institutional affiliations.

Received: 27 November 2017 Accepted: 17 July 2018

References

1. TF Cootes, CJ Taylor, in Proc of the British Machine Vision Conference. Active

shape models—‘smart snakes’(1992), pp. 266–275

2. D Cristinacce, TF Cootes, in Proc of the British Machine Vision Conference.

Boosted regression active shape models (2007)

3. TF Cootes, GJ Edwards, CJ Taylor, in European Conference on Computer

Vision. Active appearance models (1998)

4. I Matthews, S Baker, Active appearance models revisited. Int. J. Comput. Vis.

60(2), 135–164 (2004)

5. P Sauer, TF Cootes, CJ Taylor, in Proc of the British Machine Vision Conference.

Accurate regression procedures for active appearance models (2011)

6. J Saragih, R Goecke, in Proc. of IEEE 11th International Conference on

Computer Vision. A nonlinear discriminative approach to AAM fitting (2007)

7. P Dollár, P Welinder, P Perona, in Proc. of IEEE Conference on Computer Vision

and Pattern Recognition. Cascaded pose regression (2010)

8. M Valstar, B Martinez, X Binefa, in Proc. of IEEE Conference on Computer

Vision and Pattern Recognition. Facial point detection using boosted

regression and graph models (2010)

9. X Cao, Y Wei, F Wen, J Sun, Face alignment by explicit shape regression. Int.

J. Comput. Vis. 107(2), 177–190 (2014)

10. M Turk, A Pentland, Eigenfaces for recognition. J. Cogn. Neurosci. 3(1),

71–86 (1991)

11. PN Belhumeur, JP Hespanha, DJ Kriegman, Eigenfaces vs. fisherfaces: recognition using class specific linear projection. IEEE Trans. Pattern Anal. Mach. Intell. 19(7), 711–720 (1997)

12. B Moghaddam, A Pentland, Probabilistic visual learning for object

representation. IEEE Trans. Pattern Anal. Mach. Intell. 19(7), 696–710 (1997)

13. J Yang, D Zhang, AF Frangi, J-Y Yang, Two-dimensional PCA: a new

approach to appearance-based face representation and recognition. IEEE

Trans. Pattern Anal. Mach. Intell. 26(1), 131–137 (2004)

14. AM Martínez, AC Kak, PCA versus LDA. IEEE Trans. Pattern Anal. Mach. Intell.

23(2), 228–233 (2001)

15. J Shawe-Taylor, N Cristianini, Kernel methods for pattern analysis (Cambridge

University Press, Oxford, 2004)

16. B Schölkopf, A Smola, K-R Müller, Nonlinear component analysis as a kernel

eigenvalue problem. Neural Comput. 10(5), 1299–1319 (1998)

17. M-H Yang, in Proc. of the Fifth International Conference on Automatic Face

and Gesture Recognition. Kernel eigenfaces vs. kernel fisherfaces: face

recognition using kernel methods (2002)

18. B Schölkopf, K-R Müller, in Neural Networks for Signal Processing IX. Fisher discriminant analysis with kernels (1999)

19. G Baudat, F Anouar, Generalized discriminant analysis using a kernel

approach. Neural Comput. 12(10), 2385–2404 (2000)

20. J Lu, KN Plataniotis, AN Venetsanopoulos, Face recognition using kernel direct

discriminant analysis algorithms. IEEE Trans. Neural Netw. 14(1), 117–126 (2003)

21. J Huang, PC Yuen, WS Chen, JH Lai, Choosing parameters of kernel

subspace LDA for recognition of face images under pose and illumination

variations. IEEE Trans. Syst. Man Cybern. B Cybern. 37(4), 847–862 (2007)

22. I Naseem, R Togneri, M Bennamoun, Linear regression for face recognition.

IEEE Trans. Pattern Anal. Mach. Intell. 32(11), 2106–2112 (2010)

23. Y Duan, J Lu, J Feng, J Zhou, Context-aware local binary feature learning for

face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 40(5), (2018)

24. W-Y Liu, Y-D Wen, Z-D Yu, M Li, B Raj, L Song, in IEEE Conference on

Computer Vision and Pattern Recognition (CVPR). SphereFace: deep

hypersphere embedding for face recognition (2017)

25. W Wu, M Kan, X Liu, Y Yang, S Shan, X Chen, in IEEE Conference on

Computer Vision and Pattern Recognition (CVPR). Recursive spatial transformer

(rest) for alignment-free face recognition (2017)

26. N Duffy, D Helmbold, Boosting methods for regression. Mach. Learn. 47(2),

153–200 (2002)

27. JH Friedman, Greedy function approximation: a gradient boosting machine.

Ann. Stat. 29(5), (2001)

28. P Viola, MJ Jones, Robust real-time face detection. Int. J. Comput. Vis. 57(2),

137–154 (2004)


29. PN Belhumeur, DW Jacobs, DJ Kriegman, N Kumar, Localizing parts of faces

using a consensus of exemplars. IEEE Trans. Pattern Anal. Mach. Intell.

35(12), 2930–2940 (2013)

30. V Le, J Brandt, Z Lin, L Bourdev, JS Huang, in Proc. of European Conference

on Computer Vision. Interactive facial feature localization (2012)

31. V Kazemi, J Sullivan, in Proc. of the IEEE Conference on Computer Vision and

Pattern Recognition. One millisecond face alignment with an Ensemble of

Regression Trees (2014)

32. XP Burgos-Artizzu, P Perona, P Dollár, in Proc. of the IEEE International

Conference on Computer Vision. Robust face landmark estimation under

occlusion (2013)

33. X Xiong, F De la Torre, in Proc. of the IEEE Conference on Computer Vision

and Pattern Recognition. Supervised descent method and its applications to

face alignment (2013)

34. AM Martinez, in CVC Technical Report. The AR face database, vol 24 (1998)

35. PJ Phillips, PJ Flynn, T Scruggs, KW Bowyer, J Chang, K Hoffman, J Marques, J Min, W Worek, in Proc. of IEEE Computer Society Conference on Computer

Vision and Pattern Recognition (CVPR'05). Overview of the face recognition

grand challenge (2005)

36. J Wright, A-Y Yang, A Ganesh, Robust face recognition via sparse

representation. IEEE Trans. Pattern Anal. Mach. Intell. 31(2), 210–227 (2009)

37. X Jiang, J Lai, Sparse and dense hybrid representation via dictionary

decomposition for face recognition. IEEE Trans. Pattern Anal. Mach. Intell.

37(5), 1067–1079 (2015)

38. X He, S Yan, Y Hu, P Niyogi, J-J Zhang, Face recognition using

Laplacianfaces. IEEE Trans. Pattern Anal. Mach. Intell. 27(3), 328–340 (2005)

39. X He, D Cai, Y Yang, H-J Zhang, in Proc. of Tenth IEEE International

Conference on Computer Vision (ICCV'05). Neighborhood preserving

embedding, vol 1 (2005)

40. S-M Huang, J-F Yang, Improved principal component regression for face

recognition under illumination variations. IEEE Sig. Process. Lett. 19(4), 179–182 (2012)

41. S-M Huang, J-F Yang, Unitary regression classification with total minimum

projection error for face recognition. IEEE Sig. Process. Lett. 20(5), 443–446 (2013)

42. S-M Huang, J-F Yang, Linear discriminant regression classification for face

recognition. IEEE Sig. Process. Lett. 20(1), 91–94 (2013)

43. Y-T Chou, S-M Huang, J-F Yang, Class-specific kernel linear regression

classification for face recognition under low-resolution and illumination

variation conditions. EURASIP J. Adv. Sig. Process. https://doi.org/10.1186/s13634-016-0328-0


Toward masked face recognition: An effective facial feature extraction and refinement model in multiple scenes

Article

Full-text available

Oct 2022
EXPERT SYST

With the impact of the COVID‐19 epidemic, the demand for masked face recognition technology has increased. In the process of masked face recognition, some problems such as less feature information and poor robustness to the environment are obvious. The current masked face recognition model is not quantified enough for feature extraction, there are large errors for faces with high similarity, and the categories cannot be clustered during the detection process, resulting in poor classification of masks, which cannot be well adapted to changes in multiple environments. To solve current problems, this paper designs a new masked face recognition model, taking improved Single Shot Multibox Detector (SSD) model as a face detector, and replaces the input layer VGG16 of SSD with Deep Residual Network (ResNet) to increase the receptive field. In order to better adapt to the network, we adjust the convolution kernel size of ResNet. In addition, we fine‐tune the Xception network by designing a new fully connected layer, and reduce the training cycle. The weights of the three input samples including anchor, positive and negative are shared and clustered together with triplet network to improve recognition accuracy. Meanwhile, this paper adjusts alpha parameter in triplet loss. A higher value of alpha can improve the accuracy of model recognition. We further adopt a small trick to classify and predict face feature vectors using multi‐layer perceptron (MLP), and a total of 60 neural nodes are set in the three neural layers of MLP to get higher classification accuracy. Moreover, three datasets of MFDD, RMFRD and SMFRD are fused to obtain high‐quality images in different scenes, and we also add data augmentation and face alignment methods for processing, effectively reducing the interference of the external environment in the process of model recognition. 
According to the experimental results, the accuracy of masked face recognition reaches 98.3%, it achieves better results compared with other mainstream models. In addition, the hyper‐parameters tuning experiment is carried out to improve the utilization of computing resources, which shows better results than the indicators of different networks.

A System For Retrieving and Classifying Images Extracted From Video Surveillance Cameras

Thesis

Apr 2021

Sirine Ammar

In this thesis, we present a robust descriptor for background subtraction based on an unsupervised anomaly detection algorithm, called DeepSphere which is able to detect moving objects from video sequences. Unlike conventional background-foreground separation algorithms, this descriptor is less sensitive to noise and detects foreground objects without additional image processing. In addition, our proposal exploits both deep autoencoders and hy-persphere learning methods, having the ability to capture spatio-temporal dependencies between components and through "timesteps", to flexibly learn a non-linear feature representation and reconstruct normal behaviors from potentially anomalous input data. The high quality non-linear representations learned by the autoencoder helps the hypersphere to better distinguish anomalous cases by learning a compact boundary separating normal and ano-malous data. By adapting this algorithm to the background subtraction task, foreground objects are well captured by DeepSphere and the quality of detection of these objects is improved. Once these objects are detected (people/ cars ...), an approach is proposed to classify them using a DCGAN discriminator network in a semi-supervised manner. The discriminator is transformed into a multi-class classifier that uses both a large number of unlabeled data and a very small number of labeled data to compensate the lack of data and the high cost of collecting additional data or labeling all the data. Finally, we have adopted an approach based on FaceNet model to recognize the extracted people through their faces. In addition, we extended our proposal with a data augmentation method based on DCGANs instead of using standard data augmentation methods. This not only increases the accuracy of the model, but also reduces the execution time and the deep neural network learning time by almost half.

From Moving Objects Detection to Classification and Recognition: A Review for Smart Environments

Book

Dec 2020

State-of-the-art in artificial neural network applications: A survey

Article

Full-text available

Nov 2018

This is a survey of neural network applications in the real-world scenario. It provides a taxonomy of artificial neural networks (ANNs) and furnish the reader with knowledge of current and emerging trends in ANN applications research and area of focus for researchers. Additionally, the study presents ANN application challenges, contributions, compare performances and critiques methods. The study covers many applications of ANN techniques in various disciplines which include computing, science, engineering, medicine, environmental, agriculture, mining, technology, climate, business, arts, and nanotechnology, etc. The study assesses ANN contributions, compare performances and critiques methods. The study found that neural-network models such as feedforward and feedback propagation artificial neural networks are performing better in its application to human problems. Therefore, we proposed feedforward and feedback propagation ANN models for research focus based on data analysis factors like accuracy, processing speed, latency, fault tolerance, volume, scalability, convergence, and performance. Moreover, we recommend that instead of applying a single method, future research can focus on combining ANN models into one network-wide application.

3D Concrete Printing: Factors Affecting the US and Portugal

Conference Paper

Oct 2023

Housing starts have never recovered from the Great Recession and therefore the lack of inventory has driven up pricing resulting in a crisis of affordable options in the market. Both the United States (US) and Portuguese housing markets are experiencing this crisis driven by high costs of financing, land, materials, labor, and a limited supply chain. Emergent technologies provide one solution to labor, material and lack of production and 3D concrete printing (3DCP) promises solutions, reduces waste, and improves quality. Yet few studies discuss drivers of and barriers to this emerging technology, especially catered to specific markets. This work aims to develop a common set of attributes affecting the diffusion of 3D printing, comparing international market needs and differing factors that could both hinder and bolster the adoption of 3D concrete printing technology. This paper considers only additive manufacturing when referring to 3DCP and analysis is derived from a distillation of various types of media and the lack of empirical work is a limitation for the industry.

Comparative Analysis of Deep Learning Models for Cotton Leaf Disease Detection

Chapter

Aug 2022

Cotton is the most essential crop and plays an important role in the agricultural economy of the country. Cotton crop is prone to many diseases because of changes in the climatic conditions, insects such as pink bollworm, and many other factors. These diseases decrease crop productivity, and at present farmers, are diagnosing the diseases with their own experience. But these kinds of naked-eye observations do not give accurate results on large plantation areas. Therefore, it is necessary to develop an automatic, accurate, and economic system for detecting leaf diseases. The aim of this work is to detect the infected cotton leaf using Convolutional Neural Network (ConvNet/CNN) which is a deep learning technique. Nearly 519 healthy leaves and 387 diseased leaves are collected from reliable sources and studied. This work focusses on the performance evaluation and comparison of the powerful CNN architectures: AlexNet, InceptionV3, and Residual Network (ResNet) 50, VGG 16, NASNet and Xception in detecting the diseased cotton leaf. Out of these six models, ResNet50 and VGG 16 has shown significant performance with 97.56% of accuracy.KeywordsCotton leaf diseaseDeep learningTransfer learningConvolutional neural network

Automated Face Authentication and Recognition Using Deep Neural Network with SVM Classifier in Cloud Environment

Chapter

Jan 2022

With the development of technology improvement, crime rates are also getting increased nowadays. The number of cameras is installed everywhere, making criminal tracking simpler. The fast-growing demand of surveillance cameras is creating the necessity of real-time analytics, mining, and the need of public clouds for private sectors since cost and scalability are the major issue. With the aim to providing promising solution to the security threat, this research work is proposed to easily identify, recognize, and track the criminals on demand, missing children, etc. In order to meet this requirement, a dynamic neural network approach with classifier SVM is proposed. A sophisticated platform is required for image processing as it is awfully expensive in the context of time needed for computational tasks and storage space. Also, it is very important to acquire economic solutions. The requirement of adopting economic solutions and replacement of traditional systems, that is the large-scale data processing requirements, made us to use cloud computing since it is capable of providing highly required services with wide availability and extensibility. With this proposed model, the minimum error rate of 0.04 is achieved when tested with 1000 data samples.KeywordsCloud computingCyber securityFeature extractionFace recognitionDeep neural networkSVM

Towards Smart World

Chapter

Dec 2020

Vukoman R. Jokanović

Smart healthcare in smart cities

Chapter

Sep 2020

Vukoman R. Jokanović

Medical care is essential for the prosperous growth of smart cities, and providing a high quality of medical service is one of the most challenging goals for a city government. Since medical service depends mostly on the competency of medical staff, which is nearly impossible to control and estimate, smart healthcare technologies can be the right way to solve everyday healthcare problems that are frequently complicated and poorly formalized. In this chapter, the main attention is paid to the improvement of medical service through corresponding examinations and trend analysis, using the Internet of Medical Things (IoMT), electronic health records, Mobile Cloud Computing (MCC), and machine learning applied to vast quantities of miscellaneous information. The main challenges and possibilities of smart and healthy cities are presented, because a healthy city is an essential prerequisite for a successful city and a key consequence of smart cities. Implementing a novel MCC decision approach provides an authoritative and efficient platform for stakeholders to share their online information, enabling more mature decision-making and strengthening the engagement of ordinary people in the community. In the last part of this chapter, a new concept of smart healthcare based on the energetic approach of cell healing is discussed. This platform, well adapted to smart city technologies, presents an entirely original approach to the medical treatment of health problems, relating quantum physics to the behavior of biological cells.

Face Recognition Systems: A Survey

Article

Jan 2020
SENSORS-BASEL

Over the past few decades, interest in theories and algorithms for face recognition has been growing rapidly. Video surveillance, criminal identification, building access control, and unmanned and autonomous vehicles are just a few examples of concrete applications that are gaining traction in industry. Various techniques are being developed, including local, holistic, and hybrid approaches, which describe a face image using either only a few face image features or the whole set of facial features. The main contribution of this survey is to review some well-known techniques for each approach and to give the taxonomy of their categories. A detailed comparison between these techniques is presented by listing the advantages and disadvantages of their schemes in terms of robustness, accuracy, complexity, and discrimination. The paper also discusses the databases used for face recognition: an overview of the most commonly used databases, including those for supervised and unsupervised learning, is given. Numerical results of the most interesting techniques are given along with the context of the experiments and the challenges handled by these techniques. Finally, future directions in terms of techniques to be used for face recognition are discussed.

Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection

Article

Jul 1997

We develop a face recognition algorithm which is insensitive to large variation in lighting direction and facial expression. Taking a pattern classification approach, we consider each pixel in an image as a coordinate in a high-dimensional space. We take advantage of the observation that the images of a particular face, under varying illumination but fixed pose, lie in a 3D linear subspace of the high-dimensional image space, if the face is a Lambertian surface without shadowing. However, since faces are not truly Lambertian surfaces and do indeed produce self-shadowing, images will deviate from this linear subspace. Rather than explicitly modeling this deviation, we linearly project the image into a subspace in a manner which discounts those regions of the face with large deviation. Our projection method is based on Fisher's linear discriminant and produces well-separated classes in a low-dimensional subspace, even under severe variation in lighting and facial expressions. The eigenface technique, another method based on linearly projecting the image space to a low-dimensional subspace, has similar computational requirements. Yet, extensive experimental results demonstrate that the proposed “Fisherface” method has error rates that are lower than those of the eigenface technique for tests on the Harvard and Yale face databases.

SphereFace: Deep Hypersphere Embedding for Face Recognition

Conference Paper

Apr 2017

This paper addresses the deep face recognition (FR) problem under the open-set protocol, where ideal face features are expected to have smaller maximal intra-class distance than minimal inter-class distance under a suitably chosen metric space. However, few existing algorithms can effectively achieve this criterion. To this end, we propose the angular softmax (A-Softmax) loss that enables convolutional neural networks (CNNs) to learn angularly discriminative features. Geometrically, A-Softmax loss can be viewed as imposing discriminative constraints on a hypersphere manifold, which intrinsically matches the prior that faces also lie on a manifold. Moreover, the size of the angular margin can be quantitatively adjusted by a parameter m. We further derive a specific m to approximate the ideal feature criterion. Extensive analysis and experiments on Labeled Faces in the Wild (LFW), YouTube Faces (YTF), and MegaFace Challenge 1 show the superiority of A-Softmax loss in FR tasks.

Class-specific kernel linear regression classification for face recognition under low-resolution and illumination variation conditions

Article

Feb 2016

In this paper, a novel class-specific kernel linear regression classification is proposed for face recognition under very low-resolution and severe illumination variation conditions. Since the low-resolution problem coupled with illumination variations produces an ill-posed data distribution, the nonlinear projection rendered by a kernel function would enhance the modeling capability of linear regression for such a distribution. Explicit knowledge of the nonlinear mapping function can be avoided by using the kernel trick. To reduce nonlinear redundancy, a low-rank-r approximation is suggested to make the kernel projection feasible for classification. With the proposed class-specific kernel projection combined with linear regression classification, the class label can be determined by calculating the minimum projection error. Experiments on 8 × 8 and 8 × 6 images down-sampled from the extended Yale B, FERET, and AR facial databases revealed that the proposed algorithm outperforms the state-of-the-art methods under severe illumination variation and very low-resolution conditions.

Recursive Spatial Transformer (ReST) for Alignment-Free Face Recognition

Conference Paper

Oct 2017

Context-Aware Local Binary Feature Learning for Face Recognition

Article

May 2017

In this paper, we propose a context-aware local binary feature learning (CA-LBFL) method for face recognition. Unlike existing learning-based local face descriptors such as the discriminant face descriptor (DFD) and the compact binary face descriptor (CBFD), which learn each feature code individually, our CA-LBFL exploits the contextual information of adjacent bits by constraining the number of shifts from different binary bits, so that more robust information can be exploited for face representation. Given a face image, we first extract pixel difference vectors (PDVs) in local patches, and learn a discriminative mapping in an unsupervised manner to project each PDV into a context-aware binary vector. Then, we perform clustering on the learned binary codes to construct a codebook, and extract a histogram feature for each face image with the learned codebook as the final representation. In order to exploit local information from different scales, we propose a context-aware local binary multi-scale feature learning (CA-LBMFL) method to jointly learn multiple projection matrices for face representation. To make the proposed methods applicable to heterogeneous face recognition, we present a coupled CA-LBFL (C-CA-LBFL) method and a coupled CA-LBMFL (C-CA-LBMFL) method to reduce the modality gap of corresponding heterogeneous faces at the feature level. Extensive experimental results on four widely used face datasets clearly show that our methods outperform most state-of-the-art face descriptors.

Fisher discriminant analysis with kernels

Article

Jan 1999
SIGNAL PROCESS

Two dimensional PCA: A new approach to appearance based face representation and recognition

Article

Jan 2004

Y.J. Zhang

Active shape models-smart snakes

Article

Jan 2006

Greedy Function Approximation: A Gradient Boosting Machine

Article

Oct 2001
ANN STAT

Jerome H. Friedman

Function estimation/approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions and steepest-descent minimization. A general gradient descent "boosting" paradigm is developed for additive expansions based on any fitting criterion. Specific algorithms are presented for least-squares, least absolute deviation, and Huber-M loss functions for regression, and multiclass logistic likelihood for classification. Special enhancements are derived for the particular case where the individual additive components are regression trees, and tools for interpreting such "TreeBoost" models are presented. Gradient boosting of regression trees produces competitive, highly robust, interpretable procedures for both regression and classification, especially appropriate for mining less than clean data. Connections between this approach and the boosting methods of Freund and Schapire and Friedman, Hastie and Tibshirani are discussed.

Sparse and Dense Hybrid Representation via Dictionary Decomposition for Face Recognition

Article

May 2015

Sparse representation provides an effective tool for classification under the conditions that every class has sufficient representative training samples and the training data are uncorrupted. These conditions may not hold true in many practical applications. Face identification is an example where we have a large number of identities but sufficient representative and uncorrupted training images cannot be guaranteed for every identity. A violation of the two conditions leads to poor performance of sparse representation-based classification (SRC). This paper addresses this critical issue by analyzing the merits and limitations of SRC. A sparse- and dense-hybrid representation (SDR) framework is proposed to alleviate the problems of SRC. We further propose a procedure of supervised low-rank (SLR) dictionary decomposition to facilitate the proposed SDR framework. In addition, the problem of corrupted training data is also alleviated by the proposed SLR dictionary decomposition. The application of the proposed SDR-SLR approach in face recognition verifies its effectiveness and its advancement of the field. Extensive experiments on benchmark face databases demonstrate that it consistently outperforms the state-of-the-art sparse representation-based approaches and that the performance gains are significant in most cases.
