Toward Driver Face Recognition in the Intelligent Traffic Monitoring Systems

Authors: Chang-Hui Hu, Yang Zhang, Fei Wu, Xiao-Bo Lu, Pan Liu and Xiao-Yuan Jing

This work was supported by the National Natural Science Foundation of China (No.61802203, No.61702280), the Natural Science Foundation of Jiangsu Province (No.BK20180761, No.BK20170900), the China Postdoctoral Science Foundation (No.2019M651653), the Postdoctoral Research Funding Program of Jiangsu Province (No.2019K124), the National Postdoctoral Program for Innovative Talents (No.20180146), and NUPTSF (No.NY218119).
C.-H. Hu (corresponding author), F. Wu and X.-Y. Jing are with the College of Automation and College of Artificial Intelligence, Nanjing University of Posts and Telecommunications, Nanjing 210023, China (e-mail: hchnjupt@126.com, wufei_8888@126.com, jingxy_2000@126.com).
C.-H. Hu, Y. Zhang, and X.-B. Lu are with the School of Automation, Southeast University, Nanjing 210096, China (e-mail: hchseu@seu.edu.cn, yang.zhang-1@uts.edu.au, xblu2013@126.com).
P. Liu is with the School of Transportation, Southeast University, Nanjing 210096, China (e-mail: linpan@seu.edu.cn).
Abstract

This paper models the driver face recognition problem in intelligent traffic monitoring systems as severe illumination variation face recognition with the single sample problem. Firstly, noting that the existing illumination invariant unit is derived from the subtraction of two pixels in a face local region, and may therefore be positive or negative, we propose a generalized illumination robust (GIR) model based on positive and negative illumination invariant units to tackle severe illumination variations. Then, the GIR model is used to generate several GIR images based on the local edge-region or the local block-region, which results in the edge-region based GIR (EGIR) image or the block-region based GIR (BGIR) image. For single GIR image based classification, the GIR image utilizes the saturation function and the nearest neighbor classifier, which develops EGIR-face and BGIR-face. For multi GIR image based classification, the GIR images employ the extended sparse representation classification (ESRC) as the classifier, which forms the EGIR image based classification (EGIRC) and the BGIR image based classification (BGIRC). Further, the GIR model is integrated with the pre-trained deep learning (PDL) model to construct the GIR-PDL model. Finally, the performances of the proposed methods are verified on the Extended Yale B, CMU PIE, AR, self-built Driver and VGGFace2 face databases. The experimental results indicate that the proposed methods are efficient in tackling severe illumination variations.
Index Terms: Traffic driver face recognition, severe illumination variations, generalized illumination robust model, single sample problem.
I. INTRODUCTION
Recently, many research works on the traffic driver have been reported [1]-[3], as well as real-time vehicle driver
authentication [4] with an in-vehicle camera, but fewer works have addressed driver face recognition with an out-vehicle camera. Illumination was considered a major problem for in-vehicle face analysis [4], yet the illumination in in-vehicle face analysis is not as severe as that in out-vehicle face analysis.
In the intelligent traffic monitoring systems of China, high-definition cameras are fixed at outdoor traffic intersections, and they capture the frontal view of each passing vehicle, including the driver's face. The driver face images usually have a frontal pose and a natural expression, since the images are taken while people are concentrating on driving. Fig.1 shows the traffic vehicle images and the driver face images in a real intelligent traffic monitoring system. It can be seen that the driver face images exhibit severe illumination variations.
As the number of drivers in the intelligent traffic monitoring systems is huge, it is impossible to record many images or a period of video for every driver due to the limited storage capacity. Recording only one high-definition image is the common practice, which means only one face image is available for each driver. It is significant for the intelligent traffic monitoring systems to automatically identify the correct driver among many by the face images, which results in severe illumination variation face recognition with the single sample problem. Hence, severe illumination variation and the single sample problem are the two main challenges of the driver face image.
Fig.1. The traffic vehicle images and the driver face images.
Illumination variation [5] and the single sample problem [6] are extremely tough in face recognition. Since numerous approaches have been proposed to tackle severe illumination variation and the single sample problem respectively, some representative works are reviewed in this paper.
The illumination recovering approach [7] and the
illumination invariant approach [8]-[16] are two categories of
methods to tackle illumination variations in face recognition.
The illumination recovering approach aims to obtain the
normal lighting version of the illumination contaminated face
image. The illumination invariant approach extracts the
illumination insensitive content from the illumination contaminated face image. As illumination recovering could distort face discriminant information, the illumination invariant approach is more robust for tackling severe illumination variations. In fact, most illumination invariant approaches were developed based on the Lambertian reflectance model [17]. The face reflectance [8]-[10], the high-frequency facial feature [11]-[14], and the face illumination invariant measure [15]-[16] are very efficient in tackling severe illumination variations.
The wavelet transform was utilized to construct multiscale
facial structure (MSF) of the face reflectance [8]. Further, the
face reflectance was tackled by the double-density dual-tree
complex wavelet transform (DD-DTCWT) [9]. Recently, the
weighted variational model was proposed to estimate the
reflectance and the illumination simultaneously [10]. The
discrete cosine transform was early used to extract the
reflectance of the logarithm face image (LOG-DCT) [11]. The
logarithmic total variation (LTV) model [12] was firstly
proposed to extract the small-scale facial structures (i.e. the
high-frequency facial feature) of the illumination contaminated
face image. The illumination normalization based on small-and
large-scale features (SL_LTV) [13] employed LTV and DCT to
construct the combination of the illumination normalized
low-frequency facial feature and the corrected high-frequency
facial feature. The frequency interpretation of the singular value decomposition algorithm was first used to develop the high-frequency facial feature of the illumination contaminated face image (HFSVD-face) [14], where the illumination effects of the face image were strictly constrained. The Gradient-face [15]
used the ratio of y-gradient to x-gradient of the illumination
contaminated face image to construct the illumination invariant
measure. The logarithm gradient histogram (LGH) [16]
combined the gradient feature and the magnitude to develop the histogram of the illumination contaminated face image.
Ideally, the face illumination invariant measure requires that
illumination intensities of neighborhood pixels are
approximately equal in the face local region.
The local pattern descriptor [18]-[23] is a state-of-the-art hand-crafted image feature descriptor. The R-theta local
neighborhood pattern (RTLNP) [21], the local directional
gradient pattern (LDGP) [21], and the centre symmetric
quadruple pattern (CSQP) [23] could effectively recognize the
face image with illumination, pose and expression variations.
Moreover, Supervised-learning based face hallucination [24]
was novel and efficient to tackle low-resolution face images
with severe illumination variations.
The essence of the single sample problem is the lack of face intra and inter class information. The virtual image approach [25]-[26] and the generic image learning approach [6], [27] are two categories of methods to address the single sample problem in face recognition.
images of the single sample to learn the face intra class
information, and the generic image learning approach learns the
face intra and inter class information of the single sample from
available multi samples. Undoubtedly, the deep learning based
approach [28]-[31] is the best to learn the face intra and inter
class information from available massive face images. The
matching/non-matching pairs consisting of 200M internet face
images were used to train Facenet [28]. 2.6M internet face
images (2622 persons and 1000 images per person) were
employed to train VGG [29]. 85742 persons and 5.8M internet
face images were utilized to train ArcFace [31].
In this paper, the generalized illumination robust (GIR) model is proposed to tackle severe illumination variations, and then the GIR model is utilized to generate several GIR images
of the single training sample. For single GIR image based
classification, the saturation function and the nearest neighbor
classifier are used. For multi GIR images based classification,
the extended sparse representation classification (ESRC) [6] is
employed. Further, the GIR model is integrated with the
pre-trained deep learning model.
Compared to the previous works such as [8]-[16] and
[32]-[34], the new contributions of this paper are:
(1) This paper indicates that the existing illumination
invariant unit is derived from the subtraction of two pixels in
the face local region, which may be positive or negative. Based
on this fact, the GIR model is developed to tackle severe
illumination variations.
(2) The GIR model utilizes only one weight rather than
several different weights to generate the illumination invariant
measure from multi face local regions. The GIR model can
easily generate several illumination robust images of the single
training image to address single sample problem.
(3) This paper not only utilizes the saturation function and the template matching approach for single GIR image classification, but also employs ESRC for multi GIR image classification.
(4) Although the available driver face images are insufficient to train a robust deep learning model, this paper integrates the GIR model and the pre-trained deep learning model to tackle severe illumination variation face recognition with the single sample problem.
This paper is organized as follows. The motivation and
related works are reviewed in Section II. Section III elaborates
the generalized illumination robust (GIR) model. Section IV
presents the classification model. Section V gives the
experiments, and Section VI concludes this paper.
II. MOTIVATION AND RELATED WORKS
A. Motivation
The deep learning method is currently the best face recognition approach, since the data-driven deep learning model is trained with large scale labeled face images (i.e. face image pairs or many persons each containing multiple images). However, if the deep learning model does not consider severe illumination variations, it may not extract well discriminative facial features from a face image with severe illumination variations. It can be seen from the experimental results on Extended Yale B [35] in Section V that VGG [29] and ArcFace [31] perform unsatisfactorily under severe illumination variations.
As only one face image is available for each driver in the real
intelligent transportation systems, it is difficult to collect
sufficient driver face image pairs for deep learning training. We
are motivated to research a novel model-driven based
illumination invariant approach, and then we try to integrate the
illumination invariant approach and the pre-trained deep
learning model to tackle severe illumination variation face
recognition with single sample problem.
From the driver face images of the real intelligent traffic monitoring systems shown in Fig.1, the driver face images exhibit severe holistic illumination variations. Our previous
work [14] indicated that the illumination invariant measure
performed better than the high-frequency facial feature under
severe holistic illumination variations (i.e. images of subset 5 of
Extended Yale B [35]), since holistic illumination variations
satisfy that illumination intensities of neighborhood pixels are
approximately equal in the face local region.
B. Related works
The illumination invariant measure aims to eliminate the
illumination of the contaminated face image to form the
reflectance based pattern. The Weber-face [32] constructed a simple reflectance based pattern as the ratio of the difference between the center pixel and its neighbor pixels to the center pixel in a local region of size 3×3. Then, the Weber-face was extended to multi local regions to develop the generalized Weber-face (GWF)
[33]. Recently, the Weber-face was extended to the logarithm
domain. The multiscale logarithm difference edgemaps
(MSLDE) [34] constructed the reflectance based pattern from
multi local edge-regions of the logarithm face. The local near
neighbor face (LNN-face) [14] extracted illumination invariant
measure directly from multi local block-regions of the
logarithm face.
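To make the reflectance based pattern of the Weber-face concrete, the following Python sketch computes a Weber-face style measure over a 3×3 local region. It is only an illustration of the idea reviewed above: the smoothing scale sigma, the magnification factor alpha and the wrap-around border handling are assumptions of this sketch, not the settings of [32].

import numpy as np
from scipy.ndimage import gaussian_filter

def weber_face(img, alpha=2.0, sigma=1.0, eps=1e-6):
    """Sketch of a Weber-face style reflectance pattern (see [32]).

    For every pixel, the differences between the center pixel and its
    eight neighbors in a 3x3 region are divided by the center pixel and
    summed; arctan acts as the saturation function.  alpha, sigma and
    the wrap-around borders (np.roll) are illustrative simplifications.
    """
    I = gaussian_filter(img.astype(np.float64), sigma) + eps
    acc = np.zeros_like(I)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            acc += (I - np.roll(np.roll(I, dy, axis=0), dx, axis=1)) / I
    return np.arctan(alpha * acc)

MSLDE [34] and LNN-face [14] follow the same pattern but operate on the logarithm image and use multiple edge-regions or block-regions instead of a single 3×3 region.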
III. GENERALIZED ILLUMINATION ROBUST MODEL
A. The illumination invariant unit
Our previous work [14] indicated that the illumination
invariant measure of the logarithm image had better tolerance
than that of the pixel image to severe illumination variations.
The illumination invariant unit (IIU) in the logarithm domain is defined as

$IIU_{i,j}(x,y) = \ln I(x,y) - \ln I(x_i,y_j), \quad (x_i,y_j) \in \Omega_k$    (1)

where $I(x,y)$ denotes the pixel intensity at image point $(x,y)$, $(x,y)$ denotes the center point of the local region $\Omega_k$, and $(x_i,y_j)$ denotes a neighbor point of $(x,y)$ in $\Omega_k$. From the Lambertian reflectance model [17], the logarithm image can be presented as $\ln I(x,y) = \ln R(x,y) + \ln L(x,y)$, where $R$ and $L$ are the reflectance and the illumination. If the illumination intensities are equal in $\Omega_k$ (i.e. $\ln L(x,y) = \ln L(x_i,y_j)$), the IIU is the reflectance based pattern, which is regarded as illumination invariant. However, [14] indicated that severe illumination variations could cause high-frequency interference that contaminates the IIU.
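As a quick numeric check of why the IIU in formula (1) is illumination invariant when the illumination is locally constant, the short Python snippet below uses made-up reflectance values and several illumination levels; the IIU is identical for all of them because the shared illumination term cancels in the logarithm difference.

import math

# Lambertian model: I = R * L.  Two neighboring pixels share the same
# (unknown) illumination L but have different reflectance R.
R_center, R_neighbor = 0.8, 0.3          # made-up reflectance values

for L in (0.05, 1.0, 40.0):              # dark, normal, very bright
    I_center, I_neighbor = R_center * L, R_neighbor * L
    iiu = math.log(I_center) - math.log(I_neighbor)
    # iiu == ln(R_center) - ln(R_neighbor) for every L, i.e. it depends
    # only on the reflectance based pattern.
    print(f"L={L:5.2f}  IIU={iiu:.4f}")

All three illumination levels give IIU = ln(0.8) - ln(0.3) ≈ 0.98.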
The sum of all the illumination invariant units (IIUs) in the local region $\Omega_k$ can be presented as

$IIU_k(x,y) = \sum_{(x_i,y_j)\in\Omega_k} IIU_{i,j}(x,y)$    (2)

Hence, the illumination invariant measure (IIM) in the logarithm domain can be presented as

$IIM(x,y) = \sum_{k=1}^{N} \omega_k\, IIU_k(x,y)$    (3)

where $N$ is the number of local regions, and $\omega_k$ is the weight associated with $IIU_k$. If $\Omega_k$ is the local block-region, formula (3) is LNN-face without the sigmoid function [14]. If $\Omega_k$ is the local edge-region, formula (3) is MSLDE without the arc-tangent function [34]. Fig.2 shows some local block-regions and local edge-regions. It can be seen that the combination of N edge-regions is equal to the block-region with k = N.
Fig.2. Some local block-regions (k = 1, 2, 3) and local edge-regions (k = 1, 2, 3).
Fig.3 shows the values of $IIU_k$ of 100 points in different edge-regions. It can be seen that $IIU_k$ of large $k$ is much larger than that of small $k$, since the large edge-region contains more illumination invariant units than the small edge-region, as shown in Fig.2. The same conclusion can also be drawn for $IIU_k$ of the block-region.
Fig.3. The numerical values of IIUk (k=1,2,3,4,5) of 100 points in different
edge-regions. Three images from left to right are the original pixel image, the
logarithm image and the Gauss smoothed logarithm image with Blue Line.
Blue Line consists of 100 points used here.
MSLDE [34] or LNN-face [14] employed multi local regions to develop the illumination invariant measure, since multi local regions not only cover more discriminative information but also mitigate the effects of the high-frequency interference. MSLDE [34] assigned large weights to the $IIU_k$ whose IIUs are close to the center point $(x,y)$, whereas LNN-face [14] assigned large weights to the $IIU_k$ with more discriminative information. The weights recommended by MSLDE and LNN-face are termed the weights of MSLDE and the weights of LNN-face respectively. From the point of view of the weights, MSLDE [34] and LNN-face [14] are tailored to the conditions of particular datasets, and they might be unable to achieve high performance on face databases with different conditions.
B. The generalized illumination invariant model
From Fig.3, it seems that the values of $IIU_k$ of one point have the same sign when $k \ge 2$. Based on our test, $IIU_k$ of $k = 1$ may not share the same sign with that of $k \ge 2$ for some points, since the small local region with fewer IIUs is sensitive to the high-frequency interference.
As a zero IIU contributes nothing to $IIU_k$, from the point of view of numerical value sign, we consider that all the illumination invariant units (IIUs) in the local region $\Omega_k$ can be replaced by the positive illumination invariant units ($IIU^{+} \ge 0$) and the negative illumination invariant units ($IIU^{-} < 0$). Then formula (2) can be re-defined as

$IIU_k(x,y) = \sum_{(x_i,y_j)\in\Omega_k} IIU_{i,j}(x,y) = \sum_{(x_i,y_j)\in\Omega_k} IIU_{i,j}^{+}(x,y) + \sum_{(x_i,y_j)\in\Omega_k} IIU_{i,j}^{-}(x,y) = IIU_k^{+}(x,y) + IIU_k^{-}(x,y)$    (4)

where $IIU_k^{+} \ge 0$ and $IIU_k^{-} \le 0$. The illumination invariant measure in formula (3) can be represented as

$IIM(x,y) = \sum_{k=1}^{N} \omega_k \left[ IIU_k^{+}(x,y) + IIU_k^{-}(x,y) \right]$    (5)
From MSLDE [34] and LNN-face [14], the differences among the weights of MSLDE (or LNN-face) are very small, and the numerical value of $IIU_k$ of large $k$ is much larger than that of $IIU_k$ of small $k$, as shown in Fig.3. The weights of MSLDE or LNN-face cannot change the fact that $IIU_k$ with large $k$ plays the dominant role in the IIM. Hence, the role of the weights in formula (5) is not significant to the IIM.
Based on the assumption of the illumination invariant measure that the illumination intensities are approximately equal in the face local region, all IIUs have the same reflectance based pattern. We consider that each IIU should share the same contribution to the IIM in the face local regions; thus all IIUs should be assigned the same weight in the IIM.
In this paper, we assign $\omega_k = 1$ ($k = 1, \ldots, N$) in formula (5) to treat each IIU in the multi local regions equally, which can be regarded as a generalized strategy. Although $IIU^{+}$ and $IIU^{-}$ are counterpart illumination invariant units, the combination of $IIU^{+}$ and $IIU^{-}$ can mitigate the effects of the high-frequency interference. In formula (3), several weights $\omega_k$ ($k = 1, \ldots, N$) are used to adjust the proportion of $IIU_k$ in the IIM; here we still require to control the proportions of $IIU^{+}$ and $IIU^{-}$ in formula (5), which can be achieved by only one parameter. We propose to directly combine $IIU^{+}$ and $IIU^{-}$ to form the generalized illumination robust (GIR) model as below

$GIR(x,y) = \alpha \sum_{k=1}^{N} IIU_k^{+}(x,y) + \beta \sum_{k=1}^{N} IIU_k^{-}(x,y)$    (6)

where $\alpha$ and $\beta$ are the weights and $\alpha + \beta = 2$. Only one weight $\beta$ (or $\alpha$) controls the generation of the GIR image. When $\alpha = 1$ and $\beta = 1$, formula (6) is equal to formula (5) with $\omega_k = 1$ ($k = 1, \ldots, N$).
In formula (6), only one weight $\beta$ (or $\alpha$) needs to be estimated. Although the GIR model extracts IIUs from multi face local regions, the number of weights of the GIR model is much smaller than that of MSLDE [34] or LNN-face [14].
C. The generation of GIR images
As many weights were involved in MSLDE [34] and LNN-face [14], it is difficult to establish a simple strategy to generate several illumination invariant images based on their weights to tackle the single sample problem. From formula (6), the weight ($\beta$ or $\alpha$) of the GIR model is very simple, so we can generate several GIR images by using different weights.
In formula (6), the GIR image can be generated by either the
local edge-region or the local block-region. As MSLDE6 [34]
utilized 6 edge-regions and LNN-face [14] used 5
block-regions to develop the illumination invariant measure,
the edge-region based GIR (EGIR for brevity) image and the
block-region based GIR (BGIR for brevity) image employ 6
edge-regions and 5 block-regions respectively in this paper.
Fig.4. Some GIR images with different parameters. From top to bottom, 1st row:
original images; 2nd row: logarithm images; 3rd to 7th rows: EGIR images
with parameter β: 0, 0.4, 1.0, 1.6, and 2.0; 8th to 12th rows: BGIR images with
parameter β: 0, 0.4, 1, 1.6, and 2; 13th to 18th rows: difference images of
BGIR and EGIR images with parameter β: 0, 0.4, 1, 1.6, and 2.
Fig.4 shows some GIR images with different weights. It can be seen that the GIR images (i.e. EGIR images and BGIR images) vary from dark to bright as $\beta$ changes from 0 to 2. The EGIR images and BGIR images of the same original image look very similar under the same $\beta$, since they are composed of the same or neighboring illumination invariant units in the face local regions, whereas their numerical values are quite different, which can be seen from their difference images (i.e. the difference images of the BGIR and EGIR images).
In theory, the illumination of the GIR image is eliminated, so the GIR images of the original images of one face are very similar under the same value of $\beta$, as shown in Fig.4. However, from Fig.4, different values of $\beta$ can cause visual differences among the GIR images of the same original image. These visual differences of the GIR images can be regarded as one kind of intra class variation caused by the illumination. The EGIR image and BGIR image algorithms are listed in Table I.
IV. THE CLASSIFICATION MODEL
A. Single GIR image based classification
Previous approaches [14], [32]-[34] usually employed the saturation function to tackle the high-frequency interference in the illumination invariant measure, which can also be conducted on the single GIR image in formula (6) as

$GIR\text{-}face(x,y) = \arctan\big(\lambda\, GIR(x,y)\big) = \arctan\Big(\lambda\Big[(2-\beta)\sum_{k=1}^{N} IIU_k^{+}(x,y) + \beta\sum_{k=1}^{N} IIU_k^{-}(x,y)\Big]\Big)$    (7)

Formula (7) is termed the GIR-face. The edge-region based GIR-face and the block-region based GIR-face are termed EGIR-face and BGIR-face respectively. In formula (7), we employ $\lambda = 4$, which was recommended by previous approaches [32] and [34].
Similar to MSLDE [34] and LNN-face [14], it is essential to estimate the weight $\beta$ for EGIR-face and BGIR-face in formula (7). In this paper, we estimate $\beta$ by experiments on the Yale B face database [35], which covers a wide range of illumination variations. Our experiments are as follows. 1) The first image of each person in Subset 1 is used to form the single training set (i.e. normal training images), and the rest images of Yale B are designated to test. 2) The first image of each person in Subset 5 forms the single training set (i.e. contaminated training images), and the rest images of Yale B are assigned to test. The GIR-faces of the Yale B images are directly used to conduct classification by the nearest neighbor classifier based on Euclidean distance.
Fig.5 and Fig.6 show the recognition rates of EGIR-face and BGIR-face under different values of $\beta$ by using normal training images and contaminated training images respectively. It can be seen that EGIR-face and BGIR-face achieve high recognition rates when $\beta$ is around 0.4. Hence, $\beta = 0.4$ and $\beta = 1.6$ are adopted in formula (7) in this paper. The EGIR-face and BGIR-face algorithms are listed in Table I.
As EGIR-face or BGIR-face aims to tackle severe
illumination variations, the Yale B face database with severe
illumination variations is selected as the particular dataset that
is tailored to EGIR-face or BGIR-face. The weight generated
on this tailored dataset can make EGIR-face or BGIR-face
achieve high performance on the face database with severe
illumination variations, whereas they might not achieve high
performances on the face databases without severe illumination
variations.
Fig.5. Recognition rates of EGIR-face under different values of β.
Fig.6. Recognition rates of BGIR-face under different values of β.
TABLE I
EGIR IMAGE, BGIR IMAGE, EGIR-FACE AND BGIR-FACE ALGORITHMS
Step 1. Input a logarithm image $\ln I(x,y) = \ln R(x,y) + \ln L(x,y)$ with severe illumination variations.
Step 2. Convolve $\ln I(x,y)$ with the Gaussian kernel $G(x,y,\sigma) = \frac{1}{2\pi\sigma^{2}}\exp\!\left(-\frac{x^{2}+y^{2}}{2\sigma^{2}}\right)$ for smoothing.
Step 3. Calculate $IIU^{+}$ and $IIU^{-}$ of the edge-regions and block-regions by formula (4).
Step 4. Obtain the EGIR and BGIR images by formula (6).
Step 5. Obtain EGIR-face and BGIR-face by formula (7).
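The following Python sketch walks through Table I for the block-region case. It is a minimal illustration under stated assumptions: block-region k is approximated by the square neighborhood of radius k (the exact region layouts of Fig.2 may differ), the Gaussian smoothing scale sigma is not specified here and is chosen arbitrarily, and the saturation factor follows the lambda = 4 setting discussed after formula (7).

import numpy as np
from scipy.ndimage import gaussian_filter

def bgir_image(img, beta=0.4, num_regions=5, sigma=1.0, eps=1e-6):
    """Sketch of the block-region based GIR image (Table I, Steps 1-4).

    Block-region k is approximated as the square neighborhood of radius k
    around each pixel; sigma is an assumed smoothing scale and np.roll
    wraps around the borders as a simplification.
    """
    logI = gaussian_filter(np.log(img.astype(np.float64) + eps), sigma)  # Steps 1-2
    pos = np.zeros_like(logI)   # accumulates IIU+ over all block-regions
    neg = np.zeros_like(logI)   # accumulates IIU- over all block-regions
    for k in range(1, num_regions + 1):
        for dy in range(-k, k + 1):
            for dx in range(-k, k + 1):
                if dy == 0 and dx == 0:
                    continue
                iiu = logI - np.roll(np.roll(logI, dy, axis=0), dx, axis=1)  # formula (1)
                pos += np.maximum(iiu, 0.0)   # IIU+, formula (4)
                neg += np.minimum(iiu, 0.0)   # IIU-, formula (4)
    alpha = 2.0 - beta                        # alpha + beta = 2
    return alpha * pos + beta * neg           # formula (6)

def bgir_face(img, beta=0.4, lam=4.0):
    """Sketch of BGIR-face (Step 5): saturate the GIR image with arctan, formula (7)."""
    return np.arctan(lam * bgir_image(img, beta=beta))

An EGIR variant would restrict the inner loops to the ring max(|dy|, |dx|) = k, mirroring the edge-regions of Fig.2; sweeping beta over the values used in Fig.5 and Fig.6 reproduces the kind of parameter study used to select beta = 0.4 and beta = 1.6.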
B. Multi GIR images based classification
As mentioned above, the current illumination invariant
measure utilized the saturation function such as the arc-tangent
function [32]-[34] and the bipolar sigmoidal function [14] to
eliminate the high-frequency interference, and then the
template matching method such as nearest neighbor classifier
was used to conduct the final classification. The saturation
function can really improve the recognition performance of the
illumination invariant measure under template matching
classification. However, the saturation function may cut some
valid information. In fact, the nearest neighbor classifier is
sensitive to noise (i.e. high-frequency interference), whereas
the sparse representation classification (SRC) [36] is robust to
noise.
Here, we employ ESRC [6] to classify multi GIR images to
tackle severe illumination variation face recognition with single
sample problem. Multi GIR images based classification can be
presented as
$\min_{x}\ \Big\| y - [G\ \ V]\begin{bmatrix} x_G \\ x_V \end{bmatrix} \Big\|_2^2 + \lambda\, \Big\| \begin{bmatrix} x_G \\ x_V \end{bmatrix} \Big\|_1$    (8)

$G = [g_1^1, g_1^2, \ldots, g_1^t, \ldots, g_i^j, \ldots, g_n^1, g_n^2, \ldots, g_n^t]$

where $y$ is the GIR image of the testing image, $G$ is the GIR image based training set in which $g_i^j$ is the $j$th GIR image of the $i$th training person, and $V$ is the GIR image based generic intra class variation set. $x = [x_G; x_V]$ is the sparse coefficient vector, where $x_G$ and $x_V$ are the sparse coefficients corresponding to $G$ and $V$ respectively. The classification rule of formula (8) is

$\mathrm{identity}(y) = \arg\min_{i}\ \Big\| y - [G\ \ V]\begin{bmatrix} \delta_i(x_G) \\ x_V \end{bmatrix} \Big\|_2$    (9)

where $\delta_i(x_G)$ is a skeleton vector whose nonzero entries are the entries in $x_G$ that correspond to class $i$. Formulas (8) and (9) are termed multi GIR images based classification (GIRC) in this paper. The edge-region based GIRC and the block-region based GIRC are briefly termed EGIRC and BGIRC. The Homotopy method [37] is employed to solve the L1-minimization problem in formula (8). The GIRC algorithm is listed in Table II.
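A minimal Python sketch of the GIRC classifier in formulas (8) and (9) follows. It uses scikit-learn's Lasso as a stand-in for the Homotopy L1 solver [37] (the Lasso objective is scaled differently from formula (8)), and the construction of the generic intra class variation set V, the regularization weight lam and the function name girc_classify are assumptions of this sketch.

import numpy as np
from sklearn.linear_model import Lasso

def girc_classify(y, G, V, labels, lam=1e-3):
    """Sketch of GIRC (formulas (8)-(9)): ESRC over GIR images.

    y      : (d,) flattened GIR image of the test sample
    G      : (d, n_train) columns are the (multi) GIR images of the single
             training samples, each normalized to unit L2-norm
    V      : (d, n_generic) generic intra class variation set (in ESRC,
             typically generic images minus their class means), unit L2-norm
    labels : (n_train,) class label of each column of G
    lam    : assumed L1 weight, standing in for the Homotopy setting [37]
    """
    D = np.hstack([G, V])
    # Approximate  min ||y - D x||_2^2 + lam ||x||_1  (formula (8))
    solver = Lasso(alpha=lam, fit_intercept=False, max_iter=5000)
    solver.fit(D, y)
    x = solver.coef_
    x_G, x_V = x[:G.shape[1]], x[G.shape[1]:]

    residuals = {}
    for c in np.unique(labels):
        delta = np.where(labels == c, x_G, 0.0)        # skeleton vector delta_i(x_G)
        recon = G @ delta + V @ x_V
        residuals[c] = np.linalg.norm(y - recon)       # class residual, formula (9)
    return min(residuals, key=residuals.get), residuals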
As each training person lacks intra class variation information under the single sample problem, the single training image takes the GIR model to generate multi training GIR images. Multi training GIR images can improve the representation ability of the recognition model in formula (8), due to the fact that more intra class variations of the single training image are covered, as shown in Fig.4. In our experiments, we selected three GIR images with $\beta$ = 0.4, 1, and 1.6 to form the multi training GIR images of each single training image. Based on our test, the performance of formulas (8) and (9) cannot be further improved when the number of training GIR images of each person is increased to five (i.e. all five GIR images in Fig.4).
From Fig.4, the GIR images with $\beta$ = 1 have an appropriate visual appearance and distinguishable features. When $\beta$ = 1, the GIR image is synthesized from the inherent information of the positive and negative illumination invariant units. Hence, $y$ is generated by the GIR image with $\beta$ = 1 in formula (8).
In formula (8), the intra class variation set $V$ is generated by the GIR images with $\beta$ = 1 of the generic images. As faces share similar intra class variations, the generic images, which are outside the training and testing images, are usually used to model the intra class variations of the single training image. Generally, the generic images are available with each person containing multi images, so it is unnecessary to generate multi GIR images, since the multi images of each generic person can produce sufficient face intra class variation information. However, if the generic images are few, the GIR model can also be used to generate multi GIR images of each generic person to model the face intra class variations sufficiently.
C. Multi GIR images and pre-trained deep learning model
based Classification
The aims of the GIR model and the pre-trained deep learning
model are to extract similar facial features of illumination
contaminated images of the same face. Formula (8) can also be
extended to the pre-trained deep learning model. We utilize the
linear combination characteristic of the ESRC model to
integrate the GIR model and the pre-trained deep learning
model. The ESRC residual of the multi GIR images and the
ESRC residual of the pre-trained deep learning features can be
combined to conduct classification. The classification of the
multi GIR images and the pre-trained deep learning features is

$\min_{x,\,x_{dl}}\ \Big\| y - [G\ \ V]\begin{bmatrix} x_G \\ x_V \end{bmatrix} \Big\|_2^2 + \Big\| y_{dl} - [G_{dl}\ \ V_{dl}]\begin{bmatrix} x_{Gdl} \\ x_{Vdl} \end{bmatrix} \Big\|_2^2 + \lambda\, \Big\| \begin{bmatrix} x_G \\ x_V \end{bmatrix} \Big\|_1 + \lambda\, \Big\| \begin{bmatrix} x_{Gdl} \\ x_{Vdl} \end{bmatrix} \Big\|_1$    (10)

where $y_{dl}$ is the pre-trained deep learning feature of the testing image, $G_{dl}$ is the pre-trained deep learning feature set of the training images, and $V_{dl}$ is the pre-trained deep learning feature based generic intra class variation set. $x_{dl} = [x_{Gdl}; x_{Vdl}]$ is the sparse coefficient vector. The classification rule of formula (10) is

$\mathrm{identity}(y, y_{dl}) = \arg\min_{i}\ \bigg( \Big\| y - [G\ \ V]\begin{bmatrix} \delta_i(x_G) \\ x_V \end{bmatrix} \Big\|_2 + \Big\| y_{dl} - [G_{dl}\ \ V_{dl}]\begin{bmatrix} \delta_i(x_{Gdl}) \\ x_{Vdl} \end{bmatrix} \Big\|_2 \bigg)$    (11)

Formulas (10) and (11) are termed multi GIR images and pre-trained deep learning model based classification (GIR-PDL for brevity). In this paper, the pre-trained deep learning models VGG [29] and ArcFace [31] are adopted. Multi EGIR images and VGG (or ArcFace) based classification is briefly termed EGIR-VGG (or EGIR-ArcFace), and multi BGIR images and VGG (or ArcFace) based classification is briefly termed BGIR-VGG (or BGIR-ArcFace). The GIR-PDL algorithm is listed in Table II.
TABLE II
GIRC AND GIR-PDL ALGORITHMS
Step 1. Input the training images with a single sample per person, a test image and multi generic images.
Step 2. Generate the multi GIR images of each single training image, the GIR image of the test image, and the GIR image of each generic image to form $G$, $y$ and $V$.
Step 3. Generate the pre-trained deep learning features of each single training image, the test image, and each generic image to obtain $G_{dl}$, $y_{dl}$ and $V_{dl}$.
Step 4. Normalize each column of $G$, $y$, $V$, $G_{dl}$, $y_{dl}$ and $V_{dl}$ to have unit L2-norm.
Step 5. Obtain GIRC by formulas (8) and (9).
Step 6. Obtain GIR-PDL by formulas (10) and (11).
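Because the two sparse coding problems in formula (10) involve disjoint variables, they can be solved independently and their class-wise residuals summed, as in formula (11). The sketch below reuses the hypothetical girc_classify function from the GIRC sketch above; it is an illustration, not the authors' implementation.

def gir_pdl_classify(y, G, V, y_dl, G_dl, V_dl, labels, lam=1e-3):
    """Sketch of GIR-PDL (formulas (10)-(11)).

    One ESRC problem is solved over the GIR images and one over the
    pre-trained deep learning features; the per-class residuals of the
    two branches are summed for the final decision.
    """
    _, res_gir = girc_classify(y, G, V, labels, lam)          # GIR image branch
    _, res_dl = girc_classify(y_dl, G_dl, V_dl, labels, lam)  # deep feature branch
    combined = {c: res_gir[c] + res_dl[c] for c in res_gir}   # formula (11)
    return min(combined, key=combined.get)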
V. EXPERIMENTS
A. Face databases
This paper focuses on severe illumination variation face images, so several available benchmark illumination variation face databases are employed. The performances of the proposed methods are validated on the Extended Yale B [35], CMU PIE [38], AR [39], our self-built Driver [14] and VGGFace2 [40] face databases.
In our experiments, the large scale VGGFace2 images are automatically cropped and aligned by MTCNN [41]. The small scale Extended Yale B, CMU PIE, AR and Driver images are manually cropped and aligned, since face images with severe illumination variations, such as the Subset 5 images of Extended Yale B, cannot be cropped and output by MTCNN, whereas severe illumination variation face images processed by the logarithm transformation, as shown in Fig.4, can easily be aligned manually. Hence, both automatic alignment and manual alignment are used to tackle this complex illumination variation face alignment task. For fair comparison, all face images exclude the background information, as shown in Fig.7.
Fig.7. Some images from the Extended Yale B (Subsets 1-5), CMU PIE (C27, C29, C09), AR (Sessions 1-2), Driver (indoor and in car) and VGGFace2 face databases.
It is worth noting that illumination variations are linear, whereas pose/expression variations are nonlinear. As most driver face images have a frontal pose, a natural expression and severe illumination variations, proper face alignment is essential for the model-driven illumination processing methods and the linear method SRC [36], although overly strong alignment may cause discriminative information loss.
The compared model-driven based approaches use grayscale
face images, and the data-driven based approaches VGG and
ArcFace utilize color face images. As the real driver face region
is around 50×50 pixels in the intelligent traffic monitoring
systems, all grayscale images are resized to 50×50 pixels for the
compared model-driven based approaches in our experiments.
The Extended Yale B database [35] incorporates grayscale
images of 38 persons. 64 frontal face images of each person are
divided into subsets 1-5 with illumination variations from slight
to severe. Subsets 1-5 consist of 7,12,12,14 and 19 images per
person respectively. As the original Extended Yale B face
images are grayscale, three RGB channels of the color image
used by VGG and ArcFace employ the same grayscale image.
The first 10 persons of Extended Yale B form the Yale B face
database.
The CMU PIE [38] database incorporates color images of 68 persons. 21 images of each person from each of C27 (frontal camera), C29 (horizontal 22.5° camera) and C09 (above camera) in the CMU PIE illum set are selected. CMU PIE face images have slight/moderate/severe illumination variations. From Fig.7, the pose variation of C29 is larger than that of C09.
The AR database [39] incorporates color images of 126
persons in two sessions. 100 persons (50 males and 50 females)
in session 1 and session 2 are selected, and 10 images of each
person are selected, which include variations of expression
(neutral, smile, anger and scream), illumination (left light, right
light and all side lights) and occlusion. Scarf images are
included, whereas sunglass images are excluded.
The self-built Driver database [14] was used to explore the
identity recognition problem for the drivers in the intelligent
transportation systems. 28 individuals with 22 different images
per person are selected. These images are taken under two
scenes (indoor and in car). Each person contains 12 and 10
different images for scene 1 (indoor) and scene 2 (in car).
The VGGFace2 database [40] incorporates 3.31 million
color images of 9131 persons, which are with large variations in
pose, age, illumination, ethnicity and profession. MTCNN [41]
is employed to tackle VGGFace2 images, which results in
3308101 images of 9131 persons, where VGGFace2 train set is
with 8631 persons and 3138924 images, and VGGFace2 test set
is with 500 persons and 169177 images.
In the experiments of Extended Yale B, CMU PIE, AR and
Driver, 8 persons (11th -18th) are selected to make up the
generic set in each dataset as shown in Tables III, IV and V, and
the rest persons are used for validation. For each dataset
excluding the generic persons, the single training set consists of
one image of each validation person, and the rest images of
each validation person are designated to test. Each image of each validation person, from the first to the last, is designated in turn to form the single training set; thus the number of testing rounds for each dataset is equal to the number of images per person in the dataset. It is worth noting that every person has the same number of images in each dataset. Hence, the recognition rates in Tables III, IV and V are average results. The experiments are more challenging for the compared methods than in previous works [8]-[16], since more single training sets are used here. Our experiments can make significant distinctions among many compared methods, as shown in Tables III, IV and V.
In the experiments of VGGFace2, images of the last person
are used to construct the generic set in each dataset as shown in
Tables VI and VII, and the rest persons are used for validation.
For each dataset, the single training set consists of the first
image of each validation person, and the rest images of each
validation person are designated to test.
B. Compared methods
(1) Proposed method. EGIR-face, BGIR-face, EGIRC,
BGIRC, EGIR-VGG/ArcFace and BGIR-VGG/ArcFace. Three
GIR images (i.e. $\beta$ = 0.4, 1, and 1.6) are generated for EGIRC,
BGIRC, EGIR-VGG/ArcFace and BGIR-VGG/ArcFace.
(2) High-frequency facial feature and Local pattern
descriptor. LOG-DCT [11], LTV [12], SL-LTV [13],
HFSVD-face [14], and CSQP [23]. The parameters are the
same as the original paper recommended.
(3) Illumination invariant measure. Gradient-face [15], Weber-face [32], MSLDE [34], LNN-face [14],
MSLDE+ESRC and LNN-face+ESRC. The MSLDE6 in [34]
is adopted. LNN-face+ESRC represents that ESRC is used to
classify the LNN-faces of the face images; the same interpretation also applies to MSLDE+ESRC.
(4) Pre-trained deep learning model. VGG [29] and
ArcFace [31], VGG/ArcFace+ESRC. The 4096D VGG feature
and the 512D ArcFace feature are used. VGG/ArcFace+ESRC
has the same interpretation as LNN-face+ESRC.
(5) Original and LOG. Original and LOG represent the
pixel image without any processing and the logarithm image,
which are directly used as facial features for recognition.
(6) Source code location. The codes of Log-DCT,
Gradient-face and Weber-face were downloaded at http://luks.
fe.uni-lj.si/sl/osebje/vitomir/face_tools/INFace/index.html.The
code of LTV was downloaded at http://www.caam.rice.edu
/~wy1/ParaMaxFlow/2007/06/binarb-code.html. The code of
VGG was downloaded at http://www.robots.ox.ac.uk/_vgg
/software/vgg_face/. The code of ArcFace was downloaded at
https://github.com/deepinsight/ insightface, and the third party
pre-trained model model-r100-ii was adopted and downloaded
at https://pan.baidu.com/s/1wuRTf2YIsKt76TxFufsRNA,
which was trained by MS1MV2 (85742 persons and 5.8M
images). The code of Homotopy [37] was downloaded at
http://www.eecs.berkeley.edu/_yang/software/l1benchmark/,
where the error tolerance $\varepsilon = 0$ is used. The parameters of
Gradient-face, Weber-face, LTV and VGG are the same as the
source codes recommended.
Unless otherwise stated, the compared methods
(Original, LOG, LOG-DCT, LTV, SL-LTV, HFSVD-face,
Weber-face, MSLDE, LNN-face, VGG and ArcFace) employ
the nearest neighbor (NN) classifier with Euclidean distance for
the classification, whereas Gradient-face uses the classifier as
[15] recommended. LOG-DCT, LTV, SL-LTV, HFSVD-face,
Weber-face, MSLDE, LNN-face, EGIR-face and BGIR-face
are termed as the illumination invariant approaches.
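For reference, the nearest neighbor (NN) classification used for the compared methods can be sketched in a few lines of Python; the function name and the flattened-feature representation are illustrative.

import numpy as np

def nearest_neighbor_classify(test_feat, train_feats, train_labels):
    """Minimal NN classifier with Euclidean distance: the features are
    flattened illumination-processed images or deep features, one row of
    train_feats per single training sample."""
    dists = np.linalg.norm(train_feats - test_feat, axis=1)
    return train_labels[int(np.argmin(dists))]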
C. Experiment results
Tables III, IV and V list average recognition rates of the
compared methods on Extended Yale B, CMU PIE, AR and
Driver datasets. Tables VI and VII list recognition rates of some
compared methods on the VGGFace2 train+test set and test set.
(1) Extended Yale B. The Extended Yale B database is with
extremely challenging illumination variations. Face images in
Subsets 1-2 are with slight illumination variations. Face images
in Subset 3 are with small scale cast shadows, and face images
in Subset 4 are with moderate scale cast shadows, whereas face
images in Subset 5 are with large scale cast shadows (or severe
holistic illumination variations). From Table III, we can
conclude some important results as below.
1) EGIRC and BGIRC outperform EGIR-face and
BGIR-face due to multi GIR images and ESRC based classifier.
EGIR-face and BGIR-face perform better than MSLDE and
LNN-face under severe illumination variations, whereas lag
behind MSLDE and LNN-face respectively on Subsets 1-3
with slight illumination variations and small scale cast shadows,
due to the fact that EGIR-face and BGIR-face are tailored to
severe illumination variations as shown in Fig.5 and Fig.6.
2) As VGG/ArcFace performs well on Subsets 1-3 but unsatisfactorily under severe illumination variations, it is easy to see that EGIR-VGG/ArcFace and BGIR-VGG/ArcFace outperform EGIRC and BGIRC respectively on Subsets 1-3, but lag behind EGIRC and BGIRC on the other datasets except Subset 4. Although VGG/ArcFace degrades the performance of the GIR-PDL model under severe illumination variations, VGG/ArcFace can improve the performance of the GIR-PDL model on Subset 4, because the Subset 4 images have moderate cast shadows, which are not as extreme as the Subset 5 images.
3) ArcFace outperforms VGG on all face datasets except on
Subsets 1-3, where ArcFace slightly lags behind VGG, since
ArcFace and VGG can well tackle Subsets 1-3 images with
slight illumination variations and small scale cast shadows,
whereas other face datasets of Extended Yale B contain images
with severe illumination variations. Hence, ArcFace performs
better than VGG under severe illumination variations.
Moreover, ArcFace lags behind the compared illumination
invariant approaches, especially on severe illumination
variation datasets.
TABLE III
THE AVERAGE RECOGNITION RATES (%) OF THE COMPARED METHODS ON THE EXTENDED YALE B FACE DATABASE

Method          Subsets1-3   Subset4   Subset5   Subsets4-5   Total
Original        48.52        19.63     15.79     14.96        20.21
LOG             49.58        32.55     32.51     26.41        22.39
LOG-DCT         83.34        70.79     92.68     81.74        76.41
LTV             78.08        56.90     68.56     59.23        58.32
SL-LTV          79.45        60.97     73.82     64.48        61.85
HFSVD-face      94.92        83.02     97.97     90.18        86.49
Gradient-face   85.57        68.06     95.70     83.78        68.42
Weber-face      87.07        58.66     92.52     77.66        74.21
CSQP            83.13        59.04     87.67     74.55        65.53
MSLDE           81.30        53.35     81.45     66.79        60.27
LNN-face        84.83        61.59     92.02     77.98        70.32
EGIR-face       77.20        61.69     88.12     74.54        66.74
BGIR-face       77.99        70.15     93.27     82.17        72.75
VGG             86.31        47.14     27.67     30.90        45.32
ArcFace         85.56        53.28     30.93     35.49        49.71
MSLDE+ESRC      90.19        66.41     92.17     80.38        75.60
LNN+ESRC        92.55        76.08     97.11     88.36        82.70
VGG+ESRC        94.19        61.90     40.60     43.58        57.75
ArcFace+ESRC    91.35        58.55     36.16     41.78        55.95
EGIRC           95.88        75.62     96.31     86.84        83.59
BGIRC           96.31        78.53     97.30     89.27        86.69
EGIR-VGG        98.33        81.79     84.30     80.69        82.28
EGIR-ArcFace    97.92        79.49     83.13     79.19        78.24
BGIR-VGG        98.42        82.45     82.84     80.19        83.53
BGIR-ArcFace    97.95        79.19     81.73     78.30        78.97
(2) CMU PIE. Some CMU PIE face images are bright (i.e. slight illumination variations), and the other face images are partially dark (i.e. moderate/severe illumination variations). The illumination variations of CMU PIE are not as extreme as those of Extended Yale B. From Table IV, we can attain the following results.
1) The images in each of C27, C29 and C09 have the same pose (i.e. frontal, 22.5° profile and downward respectively), whereas the images in each of C27+C29 and C27+C09 incorporate two face poses (i.e. a frontal pose and a non-frontal pose).
Although VGG/ArcFace cannot achieve the highest recognition
rates under fixed pose and moderate/severe illumination
variations, VGG/ArcFace performs much better than the
illumination invariant approaches under multi face poses and
moderate/severe illumination variations. Moreover,
EGIR-VGG/ArcFace and BGIR-VGG/ArcFace outperform
VGG/ArcFace+ESRC, which illustrates that the GIR model can improve the performance of the GIR-PDL model under illumination and pose variations.
2) On C27+C29 and C27+C09, BGIR-face lags behind
LNN-face, whereas BGIR-face is superior to LNN-face on C27,
C29 and C09, which illustrates that BGIR-face outperforms
LNN-face under fixed face pose such as on C27, C29 or C09,
whereas lags behind LNN-face under multi face poses such as
C27+C29 or C27+C09. Hence, BGIR-face is more sensitive to
pose variations than LNN-face.
3) ArcFace outperforms VGG on all face datasets except C27+C29, where ArcFace slightly lags behind VGG, which illustrates that ArcFace slightly lags behind VGG in tackling frontal and 22.5° profile face images with moderate/severe illumination variations, as shown in Fig.7. Although C27+C09
images also incorporate frontal faces and downward faces, pose
variation of C09 downward face is not as large as that of C29
profile face. Moreover, ArcFace lags behind the illumination
invariant approaches on C27 and C29, whereas ArcFace
outperforms the illumination invariant approaches on C09, and
performs much better than the compared illumination invariant
approaches on C27+C29/C27+C09 due to pose variations.
TABLE IV
THE AVERAGE RECOGNITION RATES (%) OF THE COMPARED METHODS ON THE CMU PIE FACE DATABASE

Method          C27     C29     C09     C27+C29   C27+C09
Original        30.31   30.17   27.52   20.88     19.97
LOG             31.19   30.06   27.04   20.18     19.66
LOG-DCT         93.12   86.88   90.08   46.83     45.69
LTV             87.13   80.46   81.70   46.89     44.45
SL-LTV          88.83   80.95   85.92   47.29     45.31
HFSVD-face      94.50   87.30   91.71   52.82     51.21
Gradient-face   88.26   85.71   87.58   51.64     53.26
Weber-face      89.17   84.00   89.17   49.46     46.42
CSQP            86.36   82.46   83.21   51.97     49.81
MSLDE           81.01   77.57   80.04   46.89     48.41
LNN-face        89.26   84.67   88.29   50.29     51.32
EGIR-face       82.12   83.50   83.33   47.75     47.66
BGIR-face       89.30   89.25   89.72   50.06     49.26
VGG             87.33   76.91   86.67   79.78     83.69
ArcFace         91.90   78.02   97.51   79.57     86.62
MSLDE+ESRC      91.68   88.46   90.46   57.05     58.35
LNN+ESRC        95.09   91.85   94.70   57.08     59.38
VGG+ESRC        95.73   89.02   94.90   91.70     94.00
ArcFace+ESRC    94.89   81.40   97.85   83.48     89.32
EGIRC           92.88   87.86   92.96   60.13     58.12
BGIRC           93.86   89.00   94.21   58.66     55.91
EGIR-VGG        98.88   95.48   98.52   93.95     94.35
EGIR-ArcFace    98.40   93.38   99.07   88.65     89.17
BGIR-VGG        99.08   95.91   98.88   94.40     95.06
BGIR-ArcFace    98.66   93.92   99.37   88.92     89.94
(3) AR and Driver. AR face images are with frontal pose,
slight illumination and moderate/severe expression variations
as well as scarf occlusion. Driver face images are with frontal
faces and moderate/severe illumination variations. Illumination
variations of AR and Driver are not as severe as those of
Extended Yale B and CMU PIE. From Table V, we can obtain
the following results.
1) On AR, for NN based classification, HFSVD-face
outperforms VGG on AR1 and slightly lags behind VGG on
AR2, whereas VGG is superior to HFSVD-face by margins of
over 5% on AR1+AR2, which indicates that VGG is more
robust than the illumination invariant approaches, when the
face dataset is extended. For ESRC based classification,
BGIR-VGG and EGIR-VGG achieve the best performances,
which illustrates that the model-driven approach and the
data-driven approach can be well integrated to tackle face
recognition with various variations.
2) On AR, ArcFace lags behind VGG, which indicates that VGG is superior to ArcFace in addressing frontal face images with moderate/severe expression and slight illumination variations, as shown in Fig.7. ArcFace also lags behind several compared illumination invariant approaches on the AR face datasets.
3) On Driver, ArcFace outperforms all the compared illumination invariant approaches, and performs much better than VGG. The reason can be explained as follows. ArcFace is more efficient than VGG in tackling frontal faces with moderate/severe illumination variations, and Driver face images are more similar to internet face images than the face images from Extended Yale B, CMU PIE and AR, as shown in Fig.7. EGIR-ArcFace achieves the best performance under ESRC based classification on the Driver face database.
TABLE V
THE AVERAGE RECOGNITION RATES (%) OF THE COMPARED METHODS ON THE AR AND DRIVER FACE DATABASES

Method          AR1     AR2     AR1+AR2   Driver
Original        14.11   13.21   13.82     33.95
LOG             19.17   18.08   17.14     36.73
LOG-DCT         38.66   36.01   31.40     40.92
LTV             48.83   47.33   42.08     54.92
SL-LTV          49.19   46.93   42.02     55.11
HFSVD-face      63.76   58.49   53.28     68.83
Gradient-face   57.69   55.14   50.48     69.25
Weber-face      49.15   47.43   42.17     62.89
MSLDE           45.30   43.21   38.99     69.63
CSQP            50.19   47.67   43.81     67.81
LNN-face        50.53   48.76   44.23     71.80
EGIR-face       44.41   42.11   37.02     68.56
BGIR-face       44.35   42.21   37.08     67.67
VGG             60.58   59.71   58.61     66.18
ArcFace         41.41   40.37   39.10     76.46
MSLDE+ESRC      61.61   56.92   54.84     79.16
LNN+ESRC        65.66   61.94   59.31     79.53
VGG+ESRC        75.40   74.20   73.68     77.65
ArcFace+ESRC    47.61   46.00   46.15     81.13
EGIRC           67.69   65.07   60.53     85.12
BGIRC           68.03   64.29   60.47     81.97
EGIR-VGG        83.53   80.93   79.66     91.17
EGIR-ArcFace    71.30   67.85   66.51     91.34
BGIR-VGG        83.51   81.35   80.21     89.02
BGIR-ArcFace    71.49   67.46   66.14     90.28
(4) VGGFace2. VGGFace2 images are composed of bright internet face images with large pose/expression variations, and the illumination of VGGFace2 images is not as severe as that of Extended Yale B and CMU PIE, so this database cannot well validate the proposed illumination invariant approaches. From Tables VI and VII, we can get the following results.
1) For NN based classification, ArcFace outperforms VGG. Besides the different network structures of ArcFace and VGG, another main reason is that ArcFace was trained with 85742 persons and 5.8M images, whereas VGG was trained with 2622 persons and 2.6M images. ArcFace and VGG perform much better than the other compared illumination invariant approaches, since ArcFace and VGG are well trained with large scale internet face images, whereas the illumination invariant approaches do not depend on large scale face images for training.
2) For ESRC based classification, ArcFace+ESRC is only slightly better than ArcFace. The reason is that ArcFace is trained with MS1MV2 face images, which are much more similar to VGGFace2 face images than to Extended Yale B, CMU PIE, AR and Driver face images. ArcFace can extract very discriminative facial features from VGGFace2 face images, so ESRC cannot efficiently improve the performance of the 512D ArcFace feature. However, ESRC can significantly improve the performance of the 4096D VGG feature; thus EGIRC-VGG and BGIRC-VGG achieve the highest recognition rates on the VGGFace2 test set.
The four face databases Extended Yale B, CMU PIE, AR and Driver are small in comparison with the large scale face database VGGFace2, whereas these four face databases contain benchmark illumination variations, which can be used to well validate the performance of the illumination invariant approaches, since the large scale internet face images are without severe illumination variations.
TABLE VI
THE RECOGNITION RATES (%) OF THE COMPARED METHODS ON THE VGGFACE2 TRAIN+TEST SET

MSLDE      LNN-face   EGIR-face
1.00       0.87       0.75
BGIR-face  CSQP       ArcFace
0.65       1.03       22.69
TABLE VII
THE RECOGNITION RATES (%) OF THE COMPARED METHODS ON THE VGGFACE2 TEST SET

MSLDE      LNN-face   EGIR-face   BGIR-face
3.53       3.07       2.93        2.55
CSQP       VGG        ArcFace     ArcFace+ESRC
3.46       28.80      34.84       35.67
EGIRC      BGIRC      EGIRC-VGG   BGIRC-VGG
3.54       3.20       41.98       44.19
D. CMC curves of some compared methods
Fig.8. CMC curves of some compared methods (CSQP, MSLDE, LNN-face, BGIR-face, VGGFace, ArcFace, BGIRC, BGIR-VGG, BGIR-ArcFace) on the Extended Yale B database.
Fig.9. CMC curves of some compared methods (CSQP, MSLDE, LNN-face, BGIR-face, VGGFace, ArcFace, EGIRC, BGIR-VGG, BGIR-ArcFace) on C27+C29 of the CMU PIE database.
The cumulative match characteristic (CMC) curves of some
compared methods in Extended Yale B, CMU PIE, AR, Driver
and VGGFace2 datasets are shown in Fig.8 to Fig.13. These
CMC curves follow the same experiment protocols of the
corresponding datasets in Tables III, IV, V and VII.
The recognition rates at rank = 1 in Fig.8 to Fig.13 are equal to the recognition rates of the corresponding datasets in Tables III, IV, V and VII. The proposed methods show consistent improvement in recognition rate with increasing rank.
Fig.10. CMC curves of some compared methods (CSQP, MSLDE, LNN-face, BGIR-face, VGGFace, ArcFace, EGIRC, BGIR-VGG, BGIR-ArcFace) on C27+C09 of the CMU PIE database.
Fig.11. CMC curves of some compared methods (CSQP, MSLDE, LNN-face, BGIR-face, VGGFace, ArcFace, EGIRC, BGIR-VGG, EGIR-ArcFace) on the AR1+AR2 database.
Fig.12. CMC curves of some compared methods (CSQP, MSLDE, LNN-face, EGIR-face, VGGFace, ArcFace, EGIRC, EGIR-VGG, EGIR-ArcFace) on the Driver database.
Fig.13. CMC curves of some compared methods (CSQP, MSLDE, LNN-face, EGIR-face, VGGFace, ArcFace, EGIRC, ArcFace+ESRC, BGIR-VGG) on the VGGFace2 test set.
Moreover, ESRC requires more than one second to classify one test image under 9131 (i.e. the VGGFace2 train+test set) labeled classes on our PC with an Intel(R) Core(TM) i5-6500 CPU at 3.20GHz, and most of the computational cost comes from calculating the 9131 class representation residuals for the test image. Hence, the recognition rates and CMC curves of the ESRC based methods MSLDE/LNN/VGG/ArcFace+ESRC, EGIRC/BGIRC and EGIR/BGIR-VGG/ArcFace, as well as VGG, are not reported on the VGGFace2 train+test set.
E. The illumination of the driver face images
The driver face images are collected at night or in the daytime under rainy, cloudy, sunny, or clear weather. Extreme weather conditions (such as night or day under heavy rain or snow) may make the driver face image unrecognizable even by humans, which is more challenging than severe illumination variations. However, only a small part of the driver face images is taken under extreme weather conditions, since normal weather conditions occur much more often than extreme ones. Most driver face images are collected under normal weather conditions (i.e. night or day under clear or cloudy weather), and they could suffer from varying illumination rather than good illumination, especially severe illumination variations. In a word, severe illumination variation is one of the main characteristics of the driver face images, and it is one of the main tough issues of driver face recognition.
Although the illumination of a clear day may be brighter than that of a clear night, the illumination of a cloudy day may be darker than that of a clear night (since lighting equipment is usually used at night). The realistic illumination conditions of the driver face images cannot be clearly distinguished according to the weather conditions, such as night or day under clear or cloudy weather.
In fact, face image illumination processing has been studied for several decades in the literature. From the view of visual conditions, the illumination of face images can be roughly divided into good illumination and varying illumination (i.e. slight, moderate and severe illumination variations). Good illumination can improve the performance of face recognition, whereas varying illumination could degrade it. It is proper to assess the driver face image according to the illumination of the face image itself rather than the weather condition under which it was taken. Due to the complexity of illumination variations, no strict and accurate criterion in the literature can be used to assess the illumination condition of a face image, and it is difficult to actually give the illumination levels of various face images. A recent face image illumination level estimation method used the singular values to assess the face image illumination levels [14], whereas it depended on high-quality reference images.
F. The used databases and their illumination
Extended Yale B, CMU PIE and AR are generic biometric databases, but they contain benchmark illumination variations from slight to severe and are widely used and recognized by researchers worldwide. The self-built driver face images [14] are collected under certain illumination conditions, which cannot cover all the real illumination variations of the driver face images in the intelligent traffic monitoring systems. As it is difficult to form a validation database from the real driver face images, it is proper to employ the Extended Yale B, CMU PIE, AR and Driver face databases to verify the performance of a face recognition method under severe illumination variations.
G. The proposed methods
The proposed GIR model is a model-driven illumination processing method. Unlike data-driven deep learning methods, model-driven illumination processing methods such as MSLDE [34] and LNN-face [14] do not depend on large-scale training with face images. GIR-face in formula (7) employs only one parameter, and no parameter is introduced into GIRC in formula (8) (or GIRC-PDL in formula (10)). Hence, the proposed methods do not require a training process.
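To make the "single parameter plus nearest neighbor" structure concrete, the sketch below builds a generic saturated log-difference feature and classifies it with an NN rule. It is an illustrative stand-in, not the paper's formula (7) or the actual GIR construction; the saturation function (tanh), the parameter name alpha, and the neighborhood choice are assumptions.

```python
# Generic illumination-robust stand-in with one saturation parameter,
# classified by nearest neighbor; not the paper's GIR-face.
import numpy as np

def saturated_difference_map(img, alpha=0.1):
    """Saturated differences of log intensities (log damps multiplicative lighting)."""
    log_img = np.log(img.astype(float) + 1.0)
    dx = log_img[:, 1:] - log_img[:, :-1]          # horizontal neighbor differences
    dy = log_img[1:, :] - log_img[:-1, :]          # vertical neighbor differences
    saturate = lambda d: np.tanh(d / alpha)        # one-parameter saturation, assumed here
    return np.concatenate([saturate(dx).ravel(), saturate(dy).ravel()])

def nn_classify(test_img, gallery_imgs, gallery_labels, alpha=0.1):
    """Nearest-neighbor matching (L2 distance) over the saturated-difference features."""
    q = saturated_difference_map(test_img, alpha)
    dists = [np.linalg.norm(q - saturated_difference_map(g, alpha)) for g in gallery_imgs]
    return gallery_labels[int(np.argmin(dists))]
```

The sketch assumes aligned face crops of identical size; only the single saturation parameter would need tuning, mirroring the training-free property described above.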
From Tables III, IV, V and VII, EGIR-VGG/ArcFace and BGIR-VGG/ArcFace achieve the highest recognition rates on all face datasets except on Extended Yale B with severe illumination variations. The reason is that the pre-trained deep learning models are limited on frontal face images with severe illumination variations; nevertheless, this is not enough to deny that EGIR-VGG/ArcFace and BGIR-VGG/ArcFace are the best approaches for tackling driver face recognition.
EGIRC and BGIRC perform well on frontal face images with severe illumination variations, but are unsatisfactory under pose variations. Although EGIR-face and BGIR-face lag behind MSLDE and LNN-face under slight illumination variations, they outperform MSLDE and LNN-face under severe illumination variations.
H. The centre symmetric quadruple pattern
CSQP [23] and the proposed GIR model are both image-pixel-processing based approaches; however, CSQP targets general face recognition, whereas the proposed GIR model aims to address face recognition under severe illumination variations. From experimental
results on Extended Yale B and CMU PIE, CSQP lags behind
the proposed BGIR-face under severe illumination variations
except on Subsets 1-3 of Extended Yale B and
C27+C29/C27+C09 of CMU PIE. It can be seen from Fig.7 that
Subsets 1-3 images of Extended Yale B incorporate slight
illumination variations and small scale cast shadows, and
C27+C29/C27+C09 images of CMU PIE contain pose
variations (i.e. frontal and non-frontal face images) and
moderate/severe illumination variations. From experimental
results on AR and Driver, CSQP outperforms the proposed EGIR-face/BGIR-face, since AR face images exhibit slight illumination variations, moderate/severe expression variations and scarf occlusion, while Driver face images contain frontal faces with moderate/severe illumination variations.
As discussed above, CSQP outperforms the proposed
EGIR-face/BGIR-face under slight/moderate illumination
variations as well as pose variations, whereas BGIR-face is
superior to CSQP under severe illumination variations.
Moreover, CSQP lags behind the proposed EGIRC/BGIRC and
EGIR-PDL/BGIR-PDL (PDL is VGG or ArcFace).
I. The pre-trained deep learning model
VGG was trained on 2.6M internet face images, and ArcFace was trained on 5.8M internet face images. These large-scale internet face images contain large pose/expression variations and slight/moderate illumination variations. From Tables III, IV and V, VGG/ArcFace and VGG/ArcFace+ESRC perform unsatisfactorily under severe illumination variations, and ArcFace outperforms VGG under moderate/severe illumination variations.
From Tables III to VII, ArcFace+ESRC lags behind VGG+ESRC on all face datasets except on CMU PIE C09 and Driver (since ArcFace performs much better than VGG on CMU PIE C09 and Driver), which means ESRC efficiently improves the performance of VGG rather than ArcFace. One possible reason is that ArcFace extracts more discriminative facial features than VGG, so simple template-matching NN classification is already sufficient for ArcFace features, and the robust classifier ESRC cannot improve ArcFace as efficiently as it improves VGG, especially on VGGFace2. Another reason may be that ArcFace and VGG produce 512-D and 4096-D features respectively, and the 4096-D feature may incorporate more recognizable information than the 512-D feature, so ESRC can further and significantly improve the 4096-D VGG features rather than the 512-D ArcFace features.
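For reference, the template-matching NN classification mentioned above reduces to a cosine nearest-neighbor search over pre-extracted embeddings (e.g., 512-D ArcFace or 4096-D VGG descriptors). The sketch below assumes the features have already been extracted elsewhere; names are illustrative.

```python
# Minimal cosine nearest-neighbor matching over deep face embeddings.
import numpy as np

def cosine_nn(query_feat, gallery_feats, gallery_labels):
    """Label of the gallery embedding with the highest cosine similarity to the query."""
    q = query_feat / np.linalg.norm(query_feat)                       # L2-normalize query
    G = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    return gallery_labels[int(np.argmax(G @ q))]                      # one gallery sample per class
```

In contrast, ESRC replaces this matching step with representation-residual classification (as in the earlier sketch), whose per-class cost grows with the number of gallery identities and whose benefit, in the results above, appears larger for the higher-dimensional VGG features.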
VI. CONCLUSION
In driver face recognition systems, severe illumination variation is a tough issue. This paper proposes the GIR model to address severe illumination variations of driver face images. The proposed GIR model is efficient in tackling severe illumination variations. EGIR-face/BGIR-face achieve recognition rates comparable to other illumination-invariant approaches. EGIRC/BGIRC are superior to the illumination-invariant approaches, since multiple GIR images cover more discriminative information of the face image. Moreover, the proposed GIR model is integrated with the pre-trained deep learning model to achieve higher recognition rates for face recognition under illumination variations from slight to severe. Hence, we can conclude that the GIR-PDL model is an efficient recognition approach for driver face images. Even if driver face images can be used to construct a deep learning training set, the GIR-PDL model may still improve the performance of a deep learning model trained on driver face images.
REFERENCES
[1] G. Sikander and S. Anwar, "Driver fatigue detection systems: A review," IEEE Transactions on Intelligent Transportation Systems, vol. 10, no. 6, pp. 2339-2352, Jun. 2018.
[2] B. I. Ahmad, P. M. Langdon, J. Liang, S. J. Godsill, M. Delgado and T. Popham, "Driver and passenger identification from smartphone data," IEEE Transactions on Intelligent Transportation Systems, vol. 20, no. 4, pp. 1278-1288, Apr. 2018.
[3] A. Amodio, M. Ermidoro, D. Maggi, S. Formentin and S. M. Savaresi, "Automatic detection of driver impairment based on pupillary light reflex," IEEE Transactions on Intelligent Transportation Systems, doi: 10.1109/TITS.2018.2871262, 2018.
[4] E. Derman and A. A. Salah, "Continuous real-time vehicle driver authentication using convolutional neural network based face recognition," in Proceedings of the 13th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, May 2018, pp. 577-584.
[5] W. Zhang, X. Zhao, J. M. Morvan and L. Chen, "Improving shadow suppression for illumination robust face recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 3, pp. 611-624, Mar. 2018.
[6] W. Deng, J. Hu and J. Guo, "Extended SRC: Undersampled face recognition via intraclass variant dictionary," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 9, pp. 1864-1870, Sept. 2012.
[7] J. W. Wang, N. T. Le, J. S. Lee and C. C. Wang, "Color face image enhancement using adaptive singular value decomposition in Fourier domain for face recognition," Pattern Recognition, vol. 57, pp. 31-49, Sept. 2016.
[8] T. Zhang, B. Fang, Y. Yuan, Y. Y. Tang, Z. Shang, D. Li and F. Lang, "Multiscale facial structure representation for face recognition under varying illumination," Pattern Recognition, vol. 42, no. 2, pp. 251-258, Feb. 2009.
[9] A. Baradarani, Q. Wu and M. Ahmadi, "An efficient illumination invariant face recognition framework via illumination enhancement and DD-DTCWT filtering," Pattern Recognition, vol. 46, no. 1, pp. 57-72, Jan. 2013.
[10] X. Fu, D. Zeng, Y. Huang, X. Zhang and X. Ding, "A weighted variational model for simultaneous reflectance and illumination estimation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Jun. 2016, pp. 2782-2790.
[11] W. Chen, M. J. Er and S. Wu, "Illumination compensation and normalization for robust face recognition using discrete cosine transform in logarithm domain," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 36, no. 2, pp. 458-466, Apr. 2006.
[12] T. Chen, W. Yin, X. S. Zhou, D. Comaniciu and T. S. Huang, "Total variation models for variable lighting face recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 9, pp. 1519-1527, Sept. 2006.
[13] X. Xie, W. Zheng, J. Lai, P. Yuen and C. Suen, "Normalization of face illumination based on large- and small-scale features," IEEE Transactions on Image Processing, vol. 20, no. 7, pp. 1807-1821, Jul. 2011.
[14] C. Hu, X. Lu, M. Ye and W. Zeng, "Singular value decomposition and local near neighbors for face recognition under varying illumination," Pattern Recognition, vol. 64, pp. 60-83, Apr. 2017.
[15] T. Zhang, Y. Tang, B. Fang, Z. Shang and X. Liu, "Face recognition under varying illumination using gradientfaces," IEEE Transactions on Image Processing, vol. 18, no. 11, pp. 2599-2606, Nov. 2009.
[16] J. Zhu, W. Zheng and J. Lai, "Illumination invariant single face image recognition under heterogeneous lighting condition," Pattern Recognition, vol. 66, pp. 313-327, Jun. 2017.
[17] B. K. P. Horn, Robot Vision. Cambridge, MA: MIT Press, 1997.
[18] S. R. Dubey, S. K. Singh and R. K. Singh, "Multichannel decoded local binary patterns for content-based image retrieval," IEEE Transactions on Image Processing, vol. 25, no. 9, pp. 4018-4032, Sept. 2016.
[19] S. R. Dubey, S. K. Singh and R. K. Singh, "Local bit-plane decoded pattern: A novel feature descriptor for biomedical image retrieval," IEEE Journal of Biomedical and Health Informatics, vol. 20, no. 4, pp. 1139-1147, Jul. 2015.
[20] S. R. Dubey, S. K. Singh and R. K. Singh, "Local wavelet pattern: A new feature descriptor for image retrieval in medical CT databases," IEEE Transactions on Image Processing, vol. 24, no. 12, pp. 5892-5903, Dec. 2015.
[21] S. Chakraborty, S. K. Singh and P. Chakraborty, "R-theta local neighborhood pattern for unconstrained facial image recognition and retrieval," Multimedia Tools and Applications, vol. 78, no. 11, pp. 14799-14822, Jun. 2019.
[22] S. Chakraborty, S. K. Singh and P. Chakraborty, "Local directional gradient pattern: A local descriptor for face recognition," Multimedia Tools and Applications, vol. 76, no. 1, pp. 1201-1216, Jan. 2017.
[23] S. Chakraborty, S. K. Singh and P. Chakraborty, "Centre symmetric quadruple pattern: A novel descriptor for facial image recognition and retrieval," Pattern Recognition Letters, vol. 115, pp. 50-58, Nov. 2018.
[24] W. T. Su, C. C. Hsu, C. W. Lin and W. Lin, "Supervised-learning based face hallucination for enhancing face recognition," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Mar. 2016, pp. 1751-1755.
[25] C. Hu, M. Ye, S. Ji, W. Zeng and X. Lu, "A new face recognition method based on image decomposition for single sample per person problem," Neurocomputing, vol. 160, pp. 287-299, Jul. 2015.
[26] Z. Fan, D. Zhang, X. Wang, Q. Zhu and Y. Wang, "Virtual dictionary based kernel sparse representation for face recognition," Pattern Recognition, vol. 76, pp. 1-13, Apr. 2018.
[27] Y. Gao, J. Ma and A. Yuille, "Semi-supervised sparse representation based classification for face recognition with insufficient labeled samples," IEEE Transactions on Image Processing, vol. 26, no. 5, pp. 2545-2560, May 2017.
[28] F. Schroff, D. Kalenichenko and J. Philbin, "FaceNet: A unified embedding for face recognition and clustering," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Jun. 2015, pp. 815-823.
[29] O. M. Parkhi, A. Vedaldi and A. Zisserman, "Deep face recognition," in Proceedings of the British Machine Vision Conference, 2015, pp. 1-12.
[30] F. Qiu, W. Lin, X. Liu, H. Yu and H. Xiong, "Deep face recognition using adaptively-weighted verification loss function," in Proceedings of the International Forum on Digital TV and Wireless Multimedia Communications, 2017, pp. 182-192.
[31] J. Deng, J. Guo, N. Xue and S. Zafeiriou, "ArcFace: Additive angular margin loss for deep face recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 4690-4699.
[32] B. Wang, W. Li, W. Yang and Q. Liao, "Illumination normalization based on Weber's law with application to face recognition," IEEE Signal Processing Letters, vol. 18, no. 8, pp. 462-465, Aug. 2011.
[33] Y. Wu, Y. Jiang, Y. Zhou, W. Li, Z. Lu and Q. Liao, "Generalized Weber-face for illumination-robust face recognition," Neurocomputing, vol. 136, pp. 262-267, Jul. 2014.
[34] Z. Lai, D. Dai, C. Ren and K. Huang, "Multiscale logarithm difference edgemaps for face recognition against varying lighting conditions," IEEE Transactions on Image Processing, vol. 24, no. 6, pp. 1735-1747, Jun. 2015.
[35] A. S. Georghiades, P. N. Belhumeur and D. Kriegman, "From few to many: Illumination cone models for face recognition under variable lighting and pose," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 6, pp. 643-660, Jun. 2001.
[36] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry and Y. Ma, "Robust face recognition via sparse representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210-227, Feb. 2009.
[37] D. L. Donoho and Y. Tsaig, "Fast solution of L1-norm minimization problems when the solution may be sparse," IEEE Transactions on Information Theory, vol. 54, no. 11, pp. 4789-4812, Nov. 2008.
[38] T. Sim, S. Baker and M. Bsat, "The CMU pose, illumination, and expression database," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 12, pp. 504-507, Dec. 2003.
[39] A. M. Martinez and R. Benavente, "The AR face database," CVC Tech. Rep. #24, Jun. 1998.
[40] Q. Cao, L. Shen, W. Xie, O. M. Parkhi and A. Zisserman, "VGGFace2: A dataset for recognising faces across pose and age," in Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition, May 2018, pp. 67-74.
[41] K. Zhang, Z. Zhang, Z. Li and Y. Qiao, "Joint face detection and alignment using multitask cascaded convolutional networks," IEEE Signal Processing Letters, vol. 23, no. 10, pp. 1499-1503, Oct. 2016.
Chang-Hui Hu received the Ph.D. degree from the School of Automation, Southeast University, Nanjing, China, in 2017. He is currently a lecturer with the College of Automation, Nanjing University of Posts and Telecommunications, and a post-doctoral researcher with Southeast University. His research interests include image processing and pattern recognition. He has received several research awards, including the Best Doctoral Dissertation Award from the China Intelligent Transportation Systems Association in 2017, a prize in the Science and Technology Award of Jiangsu Province in 2017, and the National Scholarship from the Ministry of Education of China in 2015.
Yang Zhang received the B.S. degree in Communication
Engineering from Chongqing University of Posts and
Telecommunications, Chongqing, China, in 2013 and the
M.S. degree in Instrument Engineering from Guilin
University of Electronic Technology, Guilin, China, in
2016. She is currently working toward the Ph.D. degree
with the School of Automation, Southeast University. Her
current research interests include image processing, face
recognition, and pattern recognition.
Fei Wu received the Ph.D. degree in computer science
from Nanjing University of Posts and
Telecommunications, China, in 2016. He is currently
with the College of Automation in NJUPT. He has
authored over thirty scientific papers. His research
interests include pattern recognition, artificial
intelligence, and computer vision.
Xiao-Bo Lu received the Ph.D. degree from Nanjing
University of Aeronautics and Astronautics. He did his
postdoctoral research with Chien-Shiung Wu
Laboratory, Southeast University, from 1998 to 2000.
He is currently a Professor with the School of
Automation and deputy Director of the Detection
Technology and Automation Research Institute,
Southeast University. He is a coauthor of the book An
Introduction to the Intelligent Transportation Systems
(Beijing, China Communications, 2008). His research interests include image
processing, signal processing, pattern recognition, and computer vision. Dr. Lu
has received many research awards, such as the First Prize in Natural Science
Award from the Ministry of Education of China and the prize in the Science and
Technology Award of Jiangsu province.
Pan Liu received the Ph.D. degree in civil engineering
from University of South Florida, Tampa, USA, in 2006.
He is a Professor with the School of Transportation,
Southeast University, Nanjing, China. His research
interests include traffic operations and safety, and
intelligent transportation systems. He was a recipient of the
Outstanding Young Scientist Foundation of NSFC in 2019,
and also a recipient of the Distinguished Young Scientist
Foundation of NSFC in 2013.
Xiao-Yuan Jing received the Ph.D. degree in Pattern Recognition and Intelligent Systems from the Nanjing University of Science and Technology in 1998. He became a Professor with the Department of Computer, Shenzhen Graduate School, Harbin Institute of Technology, in 2005. He is now a Professor with the College of Automation and College of Artificial Intelligence, Nanjing University of Posts and Telecommunications, and with the School of Computer, Wuhan University, China. He has published over 100 scientific papers in international journals and conferences such as TPAMI, TIP, TIFS, TSMCB, TMM, TCSVT, TCB, CVPR, AAAI, IJCAI, ACM MM, etc.