Toward Driver Face Recognition in the Intelligent Traffic Monitoring Systems

Authors: Chang-Hui Hu, Yang Zhang, Fei Wu, Xiao-Bo Lu, Pan Liu and Xiao-Yuan Jing

This work was supported by the National Natural Science Foundation of China (No.61802203, No.61702280), the Natural Science Foundation of Jiangsu Province (No.BK20180761, No.BK20170900), the China Postdoctoral Science Foundation (No.2019M651653), the Postdoctoral Research Funding Program of Jiangsu Province (No.2019K124), the National Postdoctoral Program for Innovative Talents (No.20180146), and NUPTSF (No.NY218119).
C.-H. Hu (corresponding author), F. Wu and X.-Y. Jing are with the College of Automation and College of Artificial Intelligence, Nanjing University of Posts and Telecommunications, Nanjing 210023, China (e-mail: hchnjupt@126.com, wufei_8888@126.com, jingxy_2000@126.com).
C.-H. Hu, Y. Zhang, and X.-B. Lu are with the School of Automation, Southeast University, Nanjing 210096, China (e-mail: hchseu@seu.edu.cn, yang.zhang-1@uts.edu.au, xblu2013@126.com).
P. Liu is with the School of Transportation, Southeast University, Nanjing 210096, China (e-mail: linpan@seu.edu.cn).
Abstract

This paper models the driver face recognition problem in intelligent traffic monitoring systems as severe illumination variation face recognition with the single sample problem. Firstly, noting that the existing illumination invariant unit is derived from the subtraction of two pixels in a face local region, and may therefore be positive or negative, we propose a generalized illumination robust (GIR) model based on positive and negative illumination invariant units to tackle severe illumination variations. Then, the GIR model is used to generate several GIR images based on the local edge-region or the local block-region, which results in the edge-region based GIR (EGIR) image or the block-region based GIR (BGIR) image. For single GIR image based classification, the GIR image utilizes the saturation function and the nearest neighbor classifier, which develops EGIR-face and BGIR-face. For multi GIR image based classification, the GIR images employ the extended sparse representation classification (ESRC) as the classifier, which forms the EGIR image based classification (EGIRC) and the BGIR image based classification (BGIRC). Further, the GIR model is integrated with the pre-trained deep learning (PDL) model to construct the GIR-PDL model. Finally, the performances of the proposed methods are verified on the Extended Yale B, CMU PIE, AR, self-built Driver and VGGFace2 face databases. The experimental results indicate that the proposed methods are efficient in tackling severe illumination variations.
Index Terms: Traffic driver face recognition, severe illumination variations, generalized illumination robust model, single sample problem.
I. INTRODUCTION
Recently, many research works on the traffic driver have been reported [1]-[3], as well as real-time vehicle driver
authentication [4] with an in-vehicle camera, but fewer works have addressed driver face recognition with an out-vehicle camera. Illumination was considered a major problem for in-vehicle face analysis [4], yet the illumination in in-vehicle face analysis is not as severe as that in out-vehicle face analysis.
In the intelligent traffic monitoring systems of China, high-definition cameras are fixed at outdoor traffic intersections, and they capture the frontal view of each passing vehicle, including the driver's face. The driver face images usually have a frontal pose and a natural expression, since the images are taken while people are concentrating on driving. Fig.1 shows the traffic vehicle images and the driver face images in a real intelligent traffic monitoring system. It can be seen that the driver face images exhibit severe illumination variations.
As the number of drivers in the intelligent traffic monitoring systems is huge, it is impossible to record many images or a period of video for every driver due to the limited storage capacity. Recording only one high-definition image is the common practice, which means only one face image is available for each driver. It is significant for the intelligent traffic monitoring systems to automatically identify the correct driver among many by the face images, which results in severe illumination variation face recognition with the single sample problem. Hence, severe illumination variation and the single sample problem are the two main challenges of the driver face image.
Fig.1. The traffic vehicle images and the driver face images.
Illumination variation [5] and the single sample problem [6] are extremely tough in face recognition. Since numerous approaches have been proposed to tackle severe illumination variation and the single sample problem respectively, some representative works are reviewed in this paper.
The illumination recovering approach [7] and the
illumination invariant approach [8]-[16] are two categories of
methods to tackle illumination variations in face recognition.
The illumination recovering approach aims to obtain the
normal lighting version of the illumination contaminated face
image. The illumination invariant approach extracts the
illumination insensitive content from the illumination contaminated face image. As illumination recovering could distort face discriminant information, the illumination invariant approach is more robust for tackling severe illumination variations. In fact, most illumination invariant approaches were developed based on the Lambertian reflectance model [17]. The face reflectance [8]-[10], the high-frequency facial feature [11]-[14], and the face illumination invariant measure [15]-[16] are very efficient in tackling severe illumination variations.
The wavelet transform was utilized to construct multiscale
facial structure (MSF) of the face reflectance [8]. Further, the
face reflectance was tackled by the double-density dual-tree
complex wavelet transform (DD-DTCWT) [9]. Recently, the
weighted variational model was proposed to estimate the
reflectance and the illumination simultaneously [10]. The
discrete cosine transform was early used to extract the
reflectance of the logarithm face image (LOG-DCT) [11]. The
logarithmic total variation (LTV) model [12] was firstly
proposed to extract the small-scale facial structures (i.e. the
high-frequency facial feature) of the illumination contaminated
face image. The illumination normalization based on small-and
large-scale features (SL_LTV) [13] employed LTV and DCT to
construct the combination of the illumination normalized
low-frequency facial feature and the corrected high-frequency
facial feature. The frequency interpretation of the singular value decomposition algorithm was first used to develop the high-frequency facial feature of the illumination contaminated face image (HFSVD-face) [14], where the illumination effects of the face image were strictly constrained. The Gradient-face [15]
used the ratio of y-gradient to x-gradient of the illumination
contaminated face image to construct the illumination invariant
measure. The logarithm gradient histogram (LGH) [16]
combined the gradient feature and the magnitude to develop the histogram of the illumination contaminated face image.
Ideally, the face illumination invariant measure requires that
illumination intensities of neighborhood pixels are
approximately equal in the face local region.
The local pattern descriptor [18]-[23] is a state-of-the-art hand-crafted image feature descriptor. The R-theta local
neighborhood pattern (RTLNP) [21], the local directional
gradient pattern (LDGP) [21], and the centre symmetric
quadruple pattern (CSQP) [23] could effectively recognize the
face image with illumination, pose and expression variations.
Moreover, Supervised-learning based face hallucination [24]
was novel and efficient to tackle low-resolution face images
with severe illumination variations.
The essence of the single sample problem is the lack of face intra and inter class information. The virtual image approach [25]-[26] and the generic image learning approach [6], [27] are two categories of methods to address the single sample problem in face recognition.
images of the single sample to learn the face intra class
information, and the generic image learning approach learns the
face intra and inter class information of the single sample from
available multi samples. Undoubtedly, the deep learning based
approach [28]-[31] is the best to learn the face intra and inter
class information from available massive face images. The
matching/non-matching pairs consisting of 200M internet face
images were used to train Facenet [28]. 2.6M internet face
images (2622 persons and 1000 images per person) were
employed to train VGG [29]. 85742 persons and 5.8M internet
face images were utilized to train ArcFace [31].
In this paper, the generalized illumination robust (GIR) model is proposed to tackle severe illumination variations, and then the GIR model is utilized to generate several GIR images
of the single training sample. For single GIR image based
classification, the saturation function and the nearest neighbor
classifier are used. For multi GIR images based classification,
the extended sparse representation classification (ESRC) [6] is
employed. Further, the GIR model is integrated with the
pre-trained deep learning model.
Compared to the previous works such as [8]-[16] and
[32]-[34], the new contributions of this paper are:
(1) This paper indicates that the existing illumination
invariant unit is derived from the subtraction of two pixels in
the face local region, which may be positive or negative. Based
on this fact, the GIR model is developed to tackle severe
illumination variations.
(2) The GIR model utilizes only one weight rather than
several different weights to generate the illumination invariant
measure from multi face local regions. The GIR model can
easily generate several illumination robust images of the single
training image to address single sample problem.
(3) This paper not only utilizes the saturation function and the template matching approach for single GIR image classification, but also employs ESRC for multi GIR image classification.
(4) Although the available driver face images are insufficient to train a robust deep learning model, this paper integrates the GIR model and the pre-trained deep learning model to tackle severe illumination variation face recognition with the single sample problem.
This paper is organized as follows. The motivation and
related works are reviewed in Section II. Section III elaborates
the generalized illumination robust (GIR) model. Section IV
presents the classification model. Section V gives the
experiments, and Section VI concludes this paper.
II. MOTIVATION AND RELATED WORKS
A. Motivation
The deep learning method is currently the best face recognition approach, since the data-driven deep learning model is trained with large scale labeled face images (i.e. face image pairs or many persons each containing multiple images). However, if the deep learning model does not consider severe illumination variations, it may not extract well discriminative facial features from a face image with severe illumination variations. It can be seen from the experimental results on Extended Yale B [35] in Section V that VGG [29] and ArcFace [31] perform unsatisfactorily under severe illumination variations.
As only one face image is available for each driver in the real
intelligent transportation systems, it is difficult to collect
sufficient driver face image pairs for deep learning training. We
are motivated to research a novel model-driven based
illumination invariant approach, and then we try to integrate the
illumination invariant approach and the pre-trained deep
learning model to tackle severe illumination variation face
recognition with single sample problem.
From the driver face images of the real intelligent traffic monitoring systems shown in Fig.1, the driver face images exhibit severe holistic illumination variations. Our previous
work [14] indicated that the illumination invariant measure
performed better than the high-frequency facial feature under
severe holistic illumination variations (i.e. images of subset 5 of
Extended Yale B [35]), since holistic illumination variations
satisfy that illumination intensities of neighborhood pixels are
approximately equal in the face local region.
B. Related works
The illumination invariant measure aims to eliminate the
illumination of the contaminated face image to form the
reflectance based pattern. The Weber-face [32] constructed a simple reflectance based pattern as the ratio of the difference between the center pixel and its neighbor pixels to the center pixel in a local region of size 3×3. Then, the Weber-face was extended to multi local regions to develop the generalized Weber-face (GWF)
[33]. Recently, the Weber-face was extended to the logarithm
domain. The multiscale logarithm difference edgemaps
(MSLDE) [34] constructed the reflectance based pattern from
multi local edge-regions of the logarithm face. The local near
neighbor face (LNN-face) [14] extracted illumination invariant
measure directly from multi local block-regions of the
logarithm face.
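To make the reflectance based pattern of the Weber-face concrete, the following Python sketch computes a Weber-face style measure over a 3×3 local region. It is only an illustration of the idea reviewed above: the smoothing scale sigma, the magnification factor alpha and the wrap-around border handling are assumptions of this sketch, not the settings of [32].

import numpy as np
from scipy.ndimage import gaussian_filter

def weber_face(img, alpha=2.0, sigma=1.0, eps=1e-6):
    """Sketch of a Weber-face style reflectance pattern (see [32]).

    For every pixel, the differences between the center pixel and its
    eight neighbors in a 3x3 region are divided by the center pixel and
    summed; arctan acts as the saturation function.  alpha, sigma and
    the wrap-around borders (np.roll) are illustrative simplifications.
    """
    I = gaussian_filter(img.astype(np.float64), sigma) + eps
    acc = np.zeros_like(I)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            acc += (I - np.roll(np.roll(I, dy, axis=0), dx, axis=1)) / I
    return np.arctan(alpha * acc)

MSLDE [34] and LNN-face [14] follow the same pattern but operate on the logarithm image and use multiple edge-regions or block-regions instead of a single 3×3 region.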
III. GENERALIZED ILLUMINATION ROBUST MODEL
A. The illumination invariant unit
Our previous work [14] indicated that the illumination
invariant measure of the logarithm image had better tolerance
than that of the pixel image to severe illumination variations.
The illumination invariant unit (IIU) in the logarithm domain is defined as

$IIU_{i,j}(x,y) = \ln I(x,y) - \ln I(x_i,y_j), \quad (x_i,y_j) \in \Omega_k$    (1)

where $I(x,y)$ denotes the pixel intensity at image point $(x,y)$, $(x,y)$ denotes the center point of the local region $\Omega_k$, and $(x_i,y_j)$ denotes a neighbor point of $(x,y)$ in $\Omega_k$. From the Lambertian reflectance model [17], the logarithm image can be presented as $\ln I(x,y) = \ln R(x,y) + \ln L(x,y)$, where $R$ and $L$ are the reflectance and the illumination. If the illumination intensities are equal in $\Omega_k$ (i.e. $\ln L(x,y) = \ln L(x_i,y_j)$), the IIU is the reflectance based pattern, which is regarded as illumination invariant. However, [14] indicated that severe illumination variations could cause high-frequency interference that contaminates the IIU.
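As a quick numeric check of why the IIU in formula (1) is illumination invariant when the illumination is locally constant, the short Python snippet below uses made-up reflectance values and several illumination levels; the IIU is identical for all of them because the shared illumination term cancels in the logarithm difference.

import math

# Lambertian model: I = R * L.  Two neighboring pixels share the same
# (unknown) illumination L but have different reflectance R.
R_center, R_neighbor = 0.8, 0.3          # made-up reflectance values

for L in (0.05, 1.0, 40.0):              # dark, normal, very bright
    I_center, I_neighbor = R_center * L, R_neighbor * L
    iiu = math.log(I_center) - math.log(I_neighbor)
    # iiu == ln(R_center) - ln(R_neighbor) for every L, i.e. it depends
    # only on the reflectance based pattern.
    print(f"L={L:5.2f}  IIU={iiu:.4f}")

All three illumination levels give IIU = ln(0.8) - ln(0.3) ≈ 0.98.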
The sum of all the illumination invariant units (IIUs) in the local region $\Omega_k$ can be presented as

$IIU_k(x,y) = \sum_{(x_i,y_j)\in\Omega_k} IIU_{i,j}(x,y)$    (2)

Hence, the illumination invariant measure (IIM) in the logarithm domain can be presented as

$IIM(x,y) = \sum_{k=1}^{N} \omega_k\, IIU_k(x,y)$    (3)

where $N$ is the number of local regions, and $\omega_k$ is the weight associated with $IIU_k$. If $\Omega_k$ is the local block-region, formula (3) is LNN-face without the sigmoid function [14]. If $\Omega_k$ is the local edge-region, formula (3) is MSLDE without the arc-tangent function [34]. Fig.2 shows some local block-regions and local edge-regions. It can be seen that the combination of N edge-regions is equal to the block-region with k = N.
Fig.2. Some local block-regions (k = 1, 2, 3) and local edge-regions (k = 1, 2, 3).
Fig.3 shows the values of $IIU_k$ of 100 points in different edge-regions. It can be seen that $IIU_k$ of large $k$ is much larger than that of small $k$, since the large edge-region contains more illumination invariant units than the small edge-region, as shown in Fig.2. The same conclusion can also be drawn for $IIU_k$ of the block-region.
Fig.3. The numerical values of IIUk (k=1,2,3,4,5) of 100 points in different
edge-regions. Three images from left to right are the original pixel image, the
logarithm image and the Gauss smoothed logarithm image with Blue Line.
Blue Line consists of 100 points used here.
MSLDE [34] or LNN-face [14] employed multi local regions to develop the illumination invariant measure, since multi local regions not only cover more discriminative information but also mitigate the effects of the high-frequency interference. MSLDE [34] assigned large weights to the $IIU_k$ whose IIUs are close to the center point $(x,y)$, whereas LNN-face [14] assigned large weights to the $IIU_k$ with more discriminative information. The weights recommended by MSLDE and LNN-face are termed the weights of MSLDE and the weights of LNN-face respectively. From the point of view of the weights, MSLDE [34] and LNN-face [14] are tailored to the conditions of particular datasets, and they might be unable to achieve high performance on face databases with different conditions.
B. The generalized illumination invariant model
From Fig.3, it seems that the values of $IIU_k$ of one point have the same sign when $k \ge 2$. Based on our test, $IIU_k$ of $k = 1$ may not share the same sign with that of $k \ge 2$ for some points, since the small local region with fewer IIUs is sensitive to the high-frequency interference.
As a zero IIU contributes nothing to $IIU_k$, from the point of view of numerical value sign, we consider that all the illumination invariant units (IIUs) in the local region $\Omega_k$ can be replaced by the positive illumination invariant units ($IIU^{+} \ge 0$) and the negative illumination invariant units ($IIU^{-} < 0$). Then formula (2) can be re-defined as

$IIU_k(x,y) = \sum_{(x_i,y_j)\in\Omega_k} IIU_{i,j}(x,y) = \sum_{(x_i,y_j)\in\Omega_k} IIU_{i,j}^{+}(x,y) + \sum_{(x_i,y_j)\in\Omega_k} IIU_{i,j}^{-}(x,y) = IIU_k^{+}(x,y) + IIU_k^{-}(x,y)$    (4)

where $IIU_k^{+} \ge 0$ and $IIU_k^{-} \le 0$. The illumination invariant measure in formula (3) can be represented as

$IIM(x,y) = \sum_{k=1}^{N} \omega_k \left[ IIU_k^{+}(x,y) + IIU_k^{-}(x,y) \right]$    (5)
From MSLDE [34] and LNN-face [14], the differences among the weights of MSLDE (or LNN-face) are very small, and the numerical value of $IIU_k$ of large $k$ is much larger than that of $IIU_k$ of small $k$, as shown in Fig.3. The weights of MSLDE or LNN-face cannot change the fact that $IIU_k$ with large $k$ plays the dominant role in the IIM. Hence, the role of the weights in formula (5) is not significant to the IIM.
Based on the assumption of the illumination invariant measure that the illumination intensities are approximately equal in the face local region, all IIUs have the same reflectance based pattern. We consider that each IIU should share the same contribution to the IIM in the face local regions; thus all IIUs should be assigned the same weight in the IIM.
In this paper, we assign $\omega_k = 1$ ($k = 1, \ldots, N$) in formula (5) to treat each IIU in the multi local regions equally, which can be regarded as a generalized strategy. Although $IIU^{+}$ and $IIU^{-}$ are counterpart illumination invariant units, the combination of $IIU^{+}$ and $IIU^{-}$ can mitigate the effects of the high-frequency interference. In formula (3), several weights $\omega_k$ ($k = 1, \ldots, N$) are used to adjust the proportion of $IIU_k$ in the IIM; here we still require to control the proportions of $IIU^{+}$ and $IIU^{-}$ in formula (5), which can be achieved by only one parameter. We propose to directly combine $IIU^{+}$ and $IIU^{-}$ to form the generalized illumination robust (GIR) model as below

$GIR(x,y) = \alpha \sum_{k=1}^{N} IIU_k^{+}(x,y) + \beta \sum_{k=1}^{N} IIU_k^{-}(x,y)$    (6)

where $\alpha$ and $\beta$ are the weights and $\alpha + \beta = 2$. Only one weight $\beta$ (or $\alpha$) controls the generation of the GIR image. When $\alpha = 1$ and $\beta = 1$, formula (6) is equal to formula (5) with $\omega_k = 1$ ($k = 1, \ldots, N$).
In formula (6), only one weight $\beta$ (or $\alpha$) needs to be estimated. Although the GIR model extracts IIUs from multi face local regions, the number of weights of the GIR model is much smaller than that of MSLDE [34] or LNN-face [14].
C. The generation of GIR images
As many weights were involved in MSLDE [34] and LNN-face [14], it is difficult to establish a simple strategy to generate several illumination invariant images based on their weights to tackle the single sample problem. From formula (6), the weight ($\beta$ or $\alpha$) of the GIR model is very simple, so we can generate several GIR images by using different weights.
In formula (6), the GIR image can be generated by either the
local edge-region or the local block-region. As MSLDE6 [34]
utilized 6 edge-regions and LNN-face [14] used 5
block-regions to develop the illumination invariant measure,
the edge-region based GIR (EGIR for brevity) image and the
block-region based GIR (BGIR for brevity) image employ 6
edge-regions and 5 block-regions respectively in this paper.
Fig.4. Some GIR images with different parameters. From top to bottom, 1st row:
original images; 2nd row: logarithm images; 3rd to 7th rows: EGIR images
with parameter β: 0, 0.4, 1.0, 1.6, and 2.0; 8th to 12th rows: BGIR images with
parameter β: 0, 0.4, 1, 1.6, and 2; 13th to 18th rows: difference images of
BGIR and EGIR images with parameter β: 0, 0.4, 1, 1.6, and 2.
Fig.4 shows some GIR images with different weights. It can be seen that the GIR images (i.e. EGIR images and BGIR images) vary from dark to bright as $\beta$ changes from 0 to 2. The EGIR images and BGIR images of the same original image look very similar under the same $\beta$, since they are composed of the same or neighboring illumination invariant units in the face local regions, whereas their numerical values are quite different, which can be seen from their difference images (i.e. the difference images of the BGIR and EGIR images).
In theory, the illumination of the GIR image is eliminated, so the GIR images of the original images of one face are very similar under the same value of $\beta$, as shown in Fig.4. However, from Fig.4, different values of $\beta$ can cause visual differences among the GIR images of the same original image. These visual differences of the GIR images can be regarded as one kind of intra class variation caused by the illumination. The EGIR image and BGIR image algorithms are listed in Table I.
IV. THE CLASSIFICATION MODEL
A. Single GIR image based classification
Previous approaches [14], [32]-[34] usually employed the saturation function to tackle the high-frequency interference in the illumination invariant measure, which can also be conducted on the single GIR image in formula (6) as

$GIR\text{-}face(x,y) = \arctan\big(\lambda\, GIR(x,y)\big) = \arctan\Big(\lambda\Big[(2-\beta)\sum_{k=1}^{N} IIU_k^{+}(x,y) + \beta\sum_{k=1}^{N} IIU_k^{-}(x,y)\Big]\Big)$    (7)

Formula (7) is termed the GIR-face. The edge-region based GIR-face and the block-region based GIR-face are termed EGIR-face and BGIR-face respectively. In formula (7), we employ $\lambda = 4$, which was recommended by previous approaches [32] and [34].
Similar to MSLDE [34] and LNN-face [14], it is essential to estimate the weight $\beta$ for EGIR-face and BGIR-face in formula (7). In this paper, we estimate $\beta$ by experiments on the Yale B face database [35], which covers a wide range of illumination variations. Our experiments are as follows. 1) The first image of each person in Subset 1 is used to form the single training set (i.e. normal training images), and the rest images of Yale B are designated to test. 2) The first image of each person in Subset 5 forms the single training set (i.e. contaminated training images), and the rest images of Yale B are assigned to test. The GIR-faces of the Yale B images are directly used to conduct classification by the nearest neighbor classifier based on Euclidean distance.
Fig.5 and Fig.6 show the recognition rates of EGIR-face and BGIR-face under different values of $\beta$ by using normal training images and contaminated training images respectively. It can be seen that EGIR-face and BGIR-face achieve high recognition rates when $\beta$ is around 0.4. Hence, $\beta = 0.4$ and $\beta = 1.6$ are adopted in formula (7) in this paper. The EGIR-face and BGIR-face algorithms are listed in Table I.
As EGIR-face or BGIR-face aims to tackle severe
illumination variations, the Yale B face database with severe
illumination variations is selected as the particular dataset that
is tailored to EGIR-face or BGIR-face. The weight generated
on this tailored dataset can make EGIR-face or BGIR-face
achieve high performance on the face database with severe
illumination variations, whereas they might not achieve high
performances on the face databases without severe illumination
variations.
Fig.5. Recognition rates of EGIR-face under different values of β.
Fig.6. Recognition rates of BGIR-face under different values of β.
TABLE I
EGIR IMAGE, BGIR IMAGE, EGIR-FACE AND BGIR-FACE ALGORITHMS
Step 1. Input a logarithm image $\ln I(x,y) = \ln R(x,y) + \ln L(x,y)$ with severe illumination variations.
Step 2. Convolve $\ln I(x,y)$ with the Gaussian kernel $G(x,y,\sigma) = \frac{1}{2\pi\sigma^{2}}\exp\!\left(-\frac{x^{2}+y^{2}}{2\sigma^{2}}\right)$ for smoothing.
Step 3. Calculate $IIU^{+}$ and $IIU^{-}$ of the edge-regions and block-regions by formula (4).
Step 4. Obtain the EGIR and BGIR images by formula (6).
Step 5. Obtain EGIR-face and BGIR-face by formula (7).
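The following Python sketch walks through Table I for the block-region case. It is a minimal illustration under stated assumptions: block-region k is approximated by the square neighborhood of radius k (the exact region layouts of Fig.2 may differ), the Gaussian smoothing scale sigma is not specified here and is chosen arbitrarily, and the saturation factor follows the lambda = 4 setting discussed after formula (7).

import numpy as np
from scipy.ndimage import gaussian_filter

def bgir_image(img, beta=0.4, num_regions=5, sigma=1.0, eps=1e-6):
    """Sketch of the block-region based GIR image (Table I, Steps 1-4).

    Block-region k is approximated as the square neighborhood of radius k
    around each pixel; sigma is an assumed smoothing scale and np.roll
    wraps around the borders as a simplification.
    """
    logI = gaussian_filter(np.log(img.astype(np.float64) + eps), sigma)  # Steps 1-2
    pos = np.zeros_like(logI)   # accumulates IIU+ over all block-regions
    neg = np.zeros_like(logI)   # accumulates IIU- over all block-regions
    for k in range(1, num_regions + 1):
        for dy in range(-k, k + 1):
            for dx in range(-k, k + 1):
                if dy == 0 and dx == 0:
                    continue
                iiu = logI - np.roll(np.roll(logI, dy, axis=0), dx, axis=1)  # formula (1)
                pos += np.maximum(iiu, 0.0)   # IIU+, formula (4)
                neg += np.minimum(iiu, 0.0)   # IIU-, formula (4)
    alpha = 2.0 - beta                        # alpha + beta = 2
    return alpha * pos + beta * neg           # formula (6)

def bgir_face(img, beta=0.4, lam=4.0):
    """Sketch of BGIR-face (Step 5): saturate the GIR image with arctan, formula (7)."""
    return np.arctan(lam * bgir_image(img, beta=beta))

An EGIR variant would restrict the inner loops to the ring max(|dy|, |dx|) = k, mirroring the edge-regions of Fig.2; sweeping beta over the values used in Fig.5 and Fig.6 reproduces the kind of parameter study used to select beta = 0.4 and beta = 1.6.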
B. Multi GIR images based classification
As mentioned above, the current illumination invariant
measure utilized the saturation function such as the arc-tangent
function [32]-[34] and the bipolar sigmoidal function [14] to
eliminate the high-frequency interference, and then the
template matching method such as nearest neighbor classifier
was used to conduct the final classification. The saturation
function can really improve the recognition performance of the
illumination invariant measure under template matching
classification. However, the saturation function may cut some
valid information. In fact, the nearest neighbor classifier is
sensitive to noise (i.e. high-frequency interference), whereas
the sparse representation classification (SRC) [36] is robust to
noise.
Here, we employ ESRC [6] to classify multi GIR images to
tackle severe illumination variation face recognition with single
sample problem. Multi GIR images based classification can be
presented as
$\min_{x}\ \Big\| y - [G\ \ V]\begin{bmatrix} x_G \\ x_V \end{bmatrix} \Big\|_2^2 + \lambda\, \Big\| \begin{bmatrix} x_G \\ x_V \end{bmatrix} \Big\|_1$    (8)

$G = [g_1^1, g_1^2, \ldots, g_1^t, \ldots, g_i^j, \ldots, g_n^1, g_n^2, \ldots, g_n^t]$

where $y$ is the GIR image of the testing image, $G$ is the GIR image based training set in which $g_i^j$ is the $j$th GIR image of the $i$th training person, and $V$ is the GIR image based generic intra class variation set. $x = [x_G; x_V]$ is the sparse coefficient vector, where $x_G$ and $x_V$ are the sparse coefficients corresponding to $G$ and $V$ respectively. The classification rule of formula (8) is

$\mathrm{identity}(y) = \arg\min_{i}\ \Big\| y - [G\ \ V]\begin{bmatrix} \delta_i(x_G) \\ x_V \end{bmatrix} \Big\|_2$    (9)

where $\delta_i(x_G)$ is a skeleton vector whose nonzero entries are the entries in $x_G$ that correspond to class $i$. Formulas (8) and (9) are termed multi GIR images based classification (GIRC) in this paper. The edge-region based GIRC and the block-region based GIRC are briefly termed EGIRC and BGIRC. The Homotopy method [37] is employed to solve the L1-minimization problem in formula (8). The GIRC algorithm is listed in Table II.
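A minimal Python sketch of the GIRC classifier in formulas (8) and (9) follows. It uses scikit-learn's Lasso as a stand-in for the Homotopy L1 solver [37] (the Lasso objective is scaled differently from formula (8)), and the construction of the generic intra class variation set V, the regularization weight lam and the function name girc_classify are assumptions of this sketch.

import numpy as np
from sklearn.linear_model import Lasso

def girc_classify(y, G, V, labels, lam=1e-3):
    """Sketch of GIRC (formulas (8)-(9)): ESRC over GIR images.

    y      : (d,) flattened GIR image of the test sample
    G      : (d, n_train) columns are the (multi) GIR images of the single
             training samples, each normalized to unit L2-norm
    V      : (d, n_generic) generic intra class variation set (in ESRC,
             typically generic images minus their class means), unit L2-norm
    labels : (n_train,) class label of each column of G
    lam    : assumed L1 weight, standing in for the Homotopy setting [37]
    """
    D = np.hstack([G, V])
    # Approximate  min ||y - D x||_2^2 + lam ||x||_1  (formula (8))
    solver = Lasso(alpha=lam, fit_intercept=False, max_iter=5000)
    solver.fit(D, y)
    x = solver.coef_
    x_G, x_V = x[:G.shape[1]], x[G.shape[1]:]

    residuals = {}
    for c in np.unique(labels):
        delta = np.where(labels == c, x_G, 0.0)        # skeleton vector delta_i(x_G)
        recon = G @ delta + V @ x_V
        residuals[c] = np.linalg.norm(y - recon)       # class residual, formula (9)
    return min(residuals, key=residuals.get), residuals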
As each training person lacks intra class variation information under the single sample problem, the single training image takes the GIR model to generate multi training GIR images. Multi training GIR images can improve the representation ability of the recognition model in formula (8), due to the fact that more intra class variations of the single training image are covered, as shown in Fig.4. In our experiments, we selected three GIR images with $\beta$ = 0.4, 1, and 1.6 to form the multi training GIR images of each single training image. Based on our test, the performance of formulas (8) and (9) cannot be further improved when the number of training GIR images of each person is increased to five (i.e. all five GIR images in Fig.4).
From Fig.4, the GIR images with $\beta$ = 1 have an appropriate visual appearance and distinguishable features. When $\beta$ = 1, the GIR image is synthesized from the inherent information of the positive and negative illumination invariant units. Hence, $y$ is generated by the GIR image with $\beta$ = 1 in formula (8).
In formula (8), the intra class variation set $V$ is generated by the GIR images with $\beta$ = 1 of the generic images. As faces share similar intra class variations, the generic images, which are outside the training and testing images, are usually used to model the intra class variations of the single training image. Generally, the generic images are available with each person containing multi images, so it is unnecessary to generate multi GIR images, since the multi images of each generic person can produce sufficient face intra class variation information. However, if the generic images are few, the GIR model can also be used to generate multi GIR images of each generic person to model the face intra class variations sufficiently.
C. Multi GIR images and pre-trained deep learning model
based Classification
The aims of the GIR model and the pre-trained deep learning
model are to extract similar facial features of illumination
contaminated images of the same face. Formula (8) can also be
extended to the pre-trained deep learning model. We utilize the
linear combination characteristic of the ESRC model to
integrate the GIR model and the pre-trained deep learning
model. The ESRC residual of the multi GIR images and the
ESRC residual of the pre-trained deep learning features can be
combined to conduct classification. The classification of the
multi GIR images and the pre-trained deep learning features is

$\min_{x,\,x_{dl}}\ \Big\| y - [G\ \ V]\begin{bmatrix} x_G \\ x_V \end{bmatrix} \Big\|_2^2 + \Big\| y_{dl} - [G_{dl}\ \ V_{dl}]\begin{bmatrix} x_{Gdl} \\ x_{Vdl} \end{bmatrix} \Big\|_2^2 + \lambda\, \Big\| \begin{bmatrix} x_G \\ x_V \end{bmatrix} \Big\|_1 + \lambda\, \Big\| \begin{bmatrix} x_{Gdl} \\ x_{Vdl} \end{bmatrix} \Big\|_1$    (10)

where $y_{dl}$ is the pre-trained deep learning feature of the testing image, $G_{dl}$ is the pre-trained deep learning feature set of the training images, and $V_{dl}$ is the pre-trained deep learning feature based generic intra class variation set. $x_{dl} = [x_{Gdl}; x_{Vdl}]$ is the sparse coefficient vector. The classification rule of formula (10) is

$\mathrm{identity}(y, y_{dl}) = \arg\min_{i}\ \bigg( \Big\| y - [G\ \ V]\begin{bmatrix} \delta_i(x_G) \\ x_V \end{bmatrix} \Big\|_2 + \Big\| y_{dl} - [G_{dl}\ \ V_{dl}]\begin{bmatrix} \delta_i(x_{Gdl}) \\ x_{Vdl} \end{bmatrix} \Big\|_2 \bigg)$    (11)

Formulas (10) and (11) are termed multi GIR images and pre-trained deep learning model based classification (GIR-PDL for brevity). In this paper, the pre-trained deep learning models VGG [29] and ArcFace [31] are adopted. Multi EGIR images and VGG (or ArcFace) based classification is briefly termed EGIR-VGG (or EGIR-ArcFace), and multi BGIR images and VGG (or ArcFace) based classification is briefly termed BGIR-VGG (or BGIR-ArcFace). The GIR-PDL algorithm is listed in Table II.
TABLE II
GIRC AND GIR-PDL ALGORITHMS
Step 1. Input the training images with a single sample per person, a test image and multi generic images.
Step 2. Generate the multi GIR images of each single training image, the GIR image of the test image, and the GIR image of each generic image to form $G$, $y$ and $V$.
Step 3. Generate the pre-trained deep learning features of each single training image, the test image, and each generic image to obtain $G_{dl}$, $y_{dl}$ and $V_{dl}$.
Step 4. Normalize each column of $G$, $y$, $V$, $G_{dl}$, $y_{dl}$ and $V_{dl}$ to have unit L2-norm.
Step 5. Obtain GIRC by formulas (8) and (9).
Step 6. Obtain GIR-PDL by formulas (10) and (11).
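Because the two sparse coding problems in formula (10) involve disjoint variables, they can be solved independently and their class-wise residuals summed, as in formula (11). The sketch below reuses the hypothetical girc_classify function from the GIRC sketch above; it is an illustration, not the authors' implementation.

def gir_pdl_classify(y, G, V, y_dl, G_dl, V_dl, labels, lam=1e-3):
    """Sketch of GIR-PDL (formulas (10)-(11)).

    One ESRC problem is solved over the GIR images and one over the
    pre-trained deep learning features; the per-class residuals of the
    two branches are summed for the final decision.
    """
    _, res_gir = girc_classify(y, G, V, labels, lam)          # GIR image branch
    _, res_dl = girc_classify(y_dl, G_dl, V_dl, labels, lam)  # deep feature branch
    combined = {c: res_gir[c] + res_dl[c] for c in res_gir}   # formula (11)
    return min(combined, key=combined.get)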
V. EXPERIMENTS
A. Face databases
This paper focuses on severe illumination variation face images, so several available benchmark illumination variation face databases are employed. The performances of the proposed methods are validated on the Extended Yale B [35], CMU PIE [38], AR [39], our self-built Driver [14] and VGGFace2 [40] face databases.
In our experiments, the large scale VGGFace2 images are automatically cropped and aligned by MTCNN [41]. The small scale Extended Yale B, CMU PIE, AR and Driver images are manually cropped and aligned, since face images with severe illumination variations, such as the Subset 5 images of Extended Yale B, cannot be cropped and output by MTCNN, whereas severe illumination variation face images processed by the logarithm transformation, as shown in Fig.4, can easily be aligned manually. Hence, both automatic alignment and manual alignment are used to tackle this complex illumination variation face alignment task. For fair comparison, all face images exclude the background information, as shown in Fig.7.
Fig.7. Some images from the Extended Yale B (Subsets 1-5), CMU PIE (C27, C29, C09), AR (Sessions 1-2), Driver (indoor and in car) and VGGFace2 face databases.
It is worth noting that illumination variations are linear, whereas pose/expression variations are nonlinear. As most driver face images have a frontal pose, a natural expression and severe illumination variations, proper face alignment is essential for the model-driven illumination processing methods and the linear method SRC [36], although overly strong alignment may cause discriminative information loss.
The compared model-driven based approaches use grayscale
face images, and the data-driven based approaches VGG and
ArcFace utilize color face images. As the real driver face region
is around 50×50 pixels in the intelligent traffic monitoring
systems, all grayscale images are resized to 50×50 pixels for the
compared model-driven based approaches in our experiments.
The Extended Yale B database [35] incorporates grayscale
images of 38 persons. 64 frontal face images of each person are
divided into subsets 1-5 with illumination variations from slight
to severe. Subsets 1-5 consist of 7,12,12,14 and 19 images per
person respectively. As the original Extended Yale B face
images are grayscale, three RGB channels of the color image
used by VGG and ArcFace employ the same grayscale image.
The first 10 persons of Extended Yale B form the Yale B face
database.
The CMU PIE [38] database incorporates color images of 68 persons. 21 images of each person from each of C27 (frontal camera), C29 (horizontal 22.5° camera) and C09 (above camera) in the CMU PIE illum set are selected. CMU PIE face images have slight/moderate/severe illumination variations. From Fig.7, the pose variation of C29 is larger than that of C09.
The AR database [39] incorporates color images of 126
persons in two sessions. 100 persons (50 males and 50 females)
in session 1 and session 2 are selected, and 10 images of each
person are selected, which include variations of expression
(neutral, smile, anger and scream), illumination (left light, right
light and all side lights) and occlusion. Scarf images are
included, whereas sunglass images are excluded.
The self-built Driver database [14] was used to explore the
identity recognition problem for the drivers in the intelligent
transportation systems. 28 individuals with 22 different images
per person are selected. These images are taken under two
scenes (indoor and in car). Each person contains 12 and 10
different images for scene 1 (indoor) and scene 2 (in car).
The VGGFace2 database [40] incorporates 3.31 million
color images of 9131 persons, which are with large variations in
pose, age, illumination, ethnicity and profession. MTCNN [41]
is employed to tackle VGGFace2 images, which results in
3308101 images of 9131 persons, where VGGFace2 train set is
with 8631 persons and 3138924 images, and VGGFace2 test set
is with 500 persons and 169177 images.
In the experiments of Extended Yale B, CMU PIE, AR and
Driver, 8 persons (11th -18th) are selected to make up the
generic set in each dataset as shown in Tables III, IV and V, and
the rest persons are used for validation. For each dataset
excluding the generic persons, the single training set consists of
one image of each validation person, and the rest images of
each validation person are designated to test. Each image of each validation person, from the first to the last, is designated in turn to form the single training set; thus the number of testing rounds for each dataset is equal to the number of images per person in the dataset. It is worth noting that every person has the same number of images in each dataset. Hence, the recognition rates in Tables III, IV and V are average results. The experiments are more challenging for the compared methods than in previous works [8]-[16], since more single training sets are used here. Our experiments can make significant distinctions among many compared methods, as shown in Tables III, IV and V.
In the experiments of VGGFace2, images of the last person
are used to construct the generic set in each dataset as shown in
Tables VI and VII, and the rest persons are used for validation.
For each dataset, the single training set consists of the first
image of each validation person, and the rest images of each
validation person are designated to test.
B. Compared methods
(1) Proposed method. EGIR-face, BGIR-face, EGIRC,
BGIRC, EGIR-VGG/ArcFace and BGIR-VGG/ArcFace. Three
GIR images (i.e. $\beta$ = 0.4, 1, and 1.6) are generated for EGIRC,
BGIRC, EGIR-VGG/ArcFace and BGIR-VGG/ArcFace.
(2) High-frequency facial feature and Local pattern
descriptor. LOG-DCT [11], LTV [12], SL-LTV [13],
HFSVD-face [14], and CSQP [23]. The parameters are the
same as the original paper recommended.
(3) Illumination invariant measure. Gradient-face [15], Weber-face [32], MSLDE [34], LNN-face [14],
MSLDE+ESRC and LNN-face+ESRC. The MSLDE6 in [34]
is adopted. LNN-face+ESRC represents that ESRC is used to
classify the LNN-faces of the face images; the same interpretation also applies to MSLDE+ESRC.
(4) Pre-trained deep learning model. VGG [29] and
ArcFace [31], VGG/ArcFace+ESRC. The 4096D VGG feature
and the 512D ArcFace feature are used. VGG/ArcFace+ESRC
has the same interpretation as LNN-face+ESRC.
(5) Original and LOG. Original and LOG represent the
pixel image without any processing and the logarithm image,
which are directly used as facial features for recognition.
(6) Source code location. The codes of Log-DCT,
Gradient-face and Weber-face were downloaded at http://luks.
fe.uni-lj.si/sl/osebje/vitomir/face_tools/INFace/index.html.The
code of LTV was downloaded at http://www.caam.rice.edu
/~wy1/ParaMaxFlow/2007/06/binarb-code.html. The code of
VGG was downloaded at http://www.robots.ox.ac.uk/_vgg
/software/vgg_face/. The code of ArcFace was downloaded at
https://github.com/deepinsight/ insightface, and the third party
pre-trained model model-r100-ii was adopted and downloaded
at https://pan.baidu.com/s/1wuRTf2YIsKt76TxFufsRNA,
which was trained by MS1MV2 (85742 persons and 5.8M
images). The code of Homotopy [37] was downloaded at
http://www.eecs.berkeley.edu/_yang/software/l1benchmark/,
where the error tolerance $\varepsilon = 0$ is used. The parameters of
Gradient-face, Weber-face, LTV and VGG are the same as the
source codes recommended.
Unless otherwise stated, the compared methods
(Original, LOG, LOG-DCT, LTV, SL-LTV, HFSVD-face,
Weber-face, MSLDE, LNN-face, VGG and ArcFace) employ
the nearest neighbor (NN) classifier with Euclidean distance for
the classification, whereas Gradient-face uses the classifier as
[15] recommended. LOG-DCT, LTV, SL-LTV, HFSVD-face,
Weber-face, MSLDE, LNN-face, EGIR-face and BGIR-face
are termed as the illumination invariant approaches.
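For reference, the nearest neighbor (NN) classification used for the compared methods can be sketched in a few lines of Python; the function name and the flattened-feature representation are illustrative.

import numpy as np

def nearest_neighbor_classify(test_feat, train_feats, train_labels):
    """Minimal NN classifier with Euclidean distance: the features are
    flattened illumination-processed images or deep features, one row of
    train_feats per single training sample."""
    dists = np.linalg.norm(train_feats - test_feat, axis=1)
    return train_labels[int(np.argmin(dists))]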
C. Experiment results
Tables III, IV and V list average recognition rates of the
compared methods on Extended Yale B, CMU PIE, AR and
Driver datasets. Tables VI and VII list recognition rates of some
compared methods on the VGGFace2 train+test set and test set.
(1) Extended Yale B. The Extended Yale B database is with
extremely challenging illumination variations. Face images in
Subsets 1-2 are with slight illumination variations. Face images
in Subset 3 are with small scale cast shadows, and face images
in Subset 4 are with moderate scale cast shadows, whereas face
images in Subset 5 are with large scale cast shadows (or severe
holistic illumination variations). From Table III, we can
conclude some important results as below.
1) EGIRC and BGIRC outperform EGIR-face and
BGIR-face due to multi GIR images and ESRC based classifier.
EGIR-face and BGIR-face perform better than MSLDE and
LNN-face under severe illumination variations, whereas lag
behind MSLDE and LNN-face respectively on Subsets 1-3
with slight illumination variations and small scale cast shadows,
due to the fact that EGIR-face and BGIR-face are tailored to
severe illumination variations as shown in Fig.5 and Fig.6.
2) As VGG/ArcFace performs well on Subsets 1-3 but unsatisfactorily under severe illumination variations, it is easy to see that EGIR-VGG/ArcFace and BGIR-VGG/ArcFace outperform EGIRC and BGIRC respectively on Subsets 1-3, but lag behind EGIRC and BGIRC on the other datasets except Subset 4. Although VGG/ArcFace degrades the performance of the GIR-PDL model under severe illumination variations, VGG/ArcFace can improve the performance of the GIR-PDL model on Subset 4, because the Subset 4 images have moderate cast shadows, which are not as extreme as the Subset 5 images.
3) ArcFace outperforms VGG on all face datasets except on
Subsets 1-3, where ArcFace slightly lags behind VGG, since
ArcFace and VGG can well tackle Subsets 1-3 images with
slight illumination variations and small scale cast shadows,
whereas other face datasets of Extended Yale B contain images
with severe illumination variations. Hence, ArcFace performs
better than VGG under severe illumination variations.
Moreover, ArcFace lags behind the compared illumination
invariant approaches, especially on severe illumination
variation datasets.
TABLE III
THE AVERAGE RECOGNITION RATES (%) OF THE COMPARED METHODS ON THE EXTENDED YALE B FACE DATABASE

Method          Subsets1-3   Subset4   Subset5   Subsets4-5   Total
Original        48.52        19.63     15.79     14.96        20.21
LOG             49.58        32.55     32.51     26.41        22.39
LOG-DCT         83.34        70.79     92.68     81.74        76.41
LTV             78.08        56.90     68.56     59.23        58.32
SL-LTV          79.45        60.97     73.82     64.48        61.85
HFSVD-face      94.92        83.02     97.97     90.18        86.49
Gradient-face   85.57        68.06     95.70     83.78        68.42
Weber-face      87.07        58.66     92.52     77.66        74.21
CSQP            83.13        59.04     87.67     74.55        65.53
MSLDE           81.30        53.35     81.45     66.79        60.27
LNN-face        84.83        61.59     92.02     77.98        70.32
EGIR-face       77.20        61.69     88.12     74.54        66.74
BGIR-face       77.99        70.15     93.27     82.17        72.75
VGG             86.31        47.14     27.67     30.90        45.32
ArcFace         85.56        53.28     30.93     35.49        49.71
MSLDE+ESRC      90.19        66.41     92.17     80.38        75.60
LNN+ESRC        92.55        76.08     97.11     88.36        82.70
VGG+ESRC        94.19        61.90     40.60     43.58        57.75
ArcFace+ESRC    91.35        58.55     36.16     41.78        55.95
EGIRC           95.88        75.62     96.31     86.84        83.59
BGIRC           96.31        78.53     97.30     89.27        86.69
EGIR-VGG        98.33        81.79     84.30     80.69        82.28
EGIR-ArcFace    97.92        79.49     83.13     79.19        78.24
BGIR-VGG        98.42        82.45     82.84     80.19        83.53
BGIR-ArcFace    97.95        79.19     81.73     78.30        78.97
(2) CMU PIE. Some CMU PIE face images are bright (i.e. slight illumination variations), and the other face images are partially dark (i.e. moderate/severe illumination variations). The illumination variations of CMU PIE are not as extreme as those of Extended Yale B. From Table IV, we can attain the following results.
1) The images in each of C27, C29 and C09 have the same pose (i.e. frontal, 22.5° profile and downward respectively), whereas the images in each of C27+C29 and C27+C09 incorporate two face poses (i.e. a frontal pose and a non-frontal pose).
Although VGG/ArcFace cannot achieve the highest recognition
rates under fixed pose and moderate/severe illumination
variations, VGG/ArcFace performs much better than the
illumination invariant approaches under multi face poses and
moderate/severe illumination variations. Moreover,
EGIR-VGG/ArcFace and BGIR-VGG/ArcFace outperform
VGG/ArcFace+ESRC, which illustrates that the GIR model can improve the performance of the GIR-PDL model under illumination and pose variations.
2) On C27+C29 and C27+C09, BGIR-face lags behind
LNN-face, whereas BGIR-face is superior to LNN-face on C27,
C29 and C09, which illustrates that BGIR-face outperforms
LNN-face under fixed face pose such as on C27, C29 or C09,
whereas lags behind LNN-face under multi face poses such as
C27+C29 or C27+C09. Hence, BGIR-face is more sensitive to
pose variations than LNN-face.
3) ArcFace outperforms VGG on all face datasets except C27+C29, where ArcFace slightly lags behind VGG, which illustrates that ArcFace slightly lags behind VGG in tackling frontal and 22.5° profile face images with moderate/severe illumination variations, as shown in Fig.7. Although C27+C09
images also incorporate frontal faces and downward faces, pose
variation of C09 downward face is not as large as that of C29
profile face. Moreover, ArcFace lags behind the illumination
invariant approaches on C27 and C29, whereas ArcFace
outperforms the illumination invariant approaches on C09, and
performs much better than the compared illumination invariant
approaches on C27+C29/C27+C09 due to pose variations.
TABLE IV
THE AVERAGE RECOGNITION RATES (%) OF THE COMPARED METHODS ON THE CMU PIE FACE DATABASE

Method          C27     C29     C09     C27+C29   C27+C09
Original        30.31   30.17   27.52   20.88     19.97
LOG             31.19   30.06   27.04   20.18     19.66
LOG-DCT         93.12   86.88   90.08   46.83     45.69
LTV             87.13   80.46   81.70   46.89     44.45
SL-LTV          88.83   80.95   85.92   47.29     45.31
HFSVD-face      94.50   87.30   91.71   52.82     51.21
Gradient-face   88.26   85.71   87.58   51.64     53.26
Weber-face      89.17   84.00   89.17   49.46     46.42
CSQP            86.36   82.46   83.21   51.97     49.81
MSLDE           81.01   77.57   80.04   46.89     48.41
LNN-face        89.26   84.67   88.29   50.29     51.32
EGIR-face       82.12   83.50   83.33   47.75     47.66
BGIR-face       89.30   89.25   89.72   50.06     49.26
VGG             87.33   76.91   86.67   79.78     83.69
ArcFace         91.90   78.02   97.51   79.57     86.62
MSLDE+ESRC      91.68   88.46   90.46   57.05     58.35
LNN+ESRC        95.09   91.85   94.70   57.08     59.38
VGG+ESRC        95.73   89.02   94.90   91.70     94.00
ArcFace+ESRC    94.89   81.40   97.85   83.48     89.32
EGIRC           92.88   87.86   92.96   60.13     58.12
BGIRC           93.86   89.00   94.21   58.66     55.91
EGIR-VGG        98.88   95.48   98.52   93.95     94.35
EGIR-ArcFace    98.40   93.38   99.07   88.65     89.17
BGIR-VGG        99.08   95.91   98.88   94.40     95.06
BGIR-ArcFace    98.66   93.92   99.37   88.92     89.94
(3) AR and Driver. AR face images are with frontal pose,
slight illumination and moderate/severe expression variations
as well as scarf occlusion. Driver face images are with frontal
faces and moderate/severe illumination variations. Illumination
variations of AR and Driver are not as severe as those of
Extended Yale B and CMU PIE. From Table V, we can obtain
the following results.
1) On AR, for NN based classification, HFSVD-face
outperforms VGG on AR1 and slightly lags behind VGG on
AR2, whereas VGG is superior to HFSVD-face by margins of
over 5% on AR1+AR2, which indicates that VGG is more
robust than the illumination invariant approaches, when the
face dataset is extended. For ESRC based classification,
BGIR-VGG and EGIR-VGG achieve the best performances,
which illustrates that the model-driven approach and the
data-driven approach can be well integrated to tackle face
recognition with various variations.
2) On AR, ArcFace lags behind VGG, which indicates that VGG is superior to ArcFace in addressing frontal face images with moderate/severe expression and slight illumination variations, as shown in Fig.7. ArcFace also lags behind several compared illumination invariant approaches on the AR face datasets.
3) On Driver, ArcFace outperforms all the compared illumination invariant approaches, and performs much better than VGG. The reason can be explained as follows. ArcFace is more efficient than VGG in tackling frontal faces with moderate/severe illumination variations, and Driver face images are more similar to internet face images than the face images from Extended Yale B, CMU PIE and AR, as shown in Fig.7. EGIR-ArcFace achieves the best performance under ESRC based classification on the Driver face database.
TABLE V
THE AVERAGE RECOGNITION RATES (%) OF THE COMPARED METHODS ON THE AR AND DRIVER FACE DATABASES

Method          AR1     AR2     AR1+AR2   Driver
Original        14.11   13.21   13.82     33.95
LOG             19.17   18.08   17.14     36.73
LOG-DCT         38.66   36.01   31.40     40.92
LTV             48.83   47.33   42.08     54.92
SL-LTV          49.19   46.93   42.02     55.11
HFSVD-face      63.76   58.49   53.28     68.83
Gradient-face   57.69   55.14   50.48     69.25
Weber-face      49.15   47.43   42.17     62.89
MSLDE           45.30   43.21   38.99     69.63
CSQP            50.19   47.67   43.81     67.81
LNN-face        50.53   48.76   44.23     71.80
EGIR-face       44.41   42.11   37.02     68.56
BGIR-face       44.35   42.21   37.08     67.67
VGG             60.58   59.71   58.61     66.18
ArcFace         41.41   40.37   39.10     76.46
MSLDE+ESRC      61.61   56.92   54.84     79.16
LNN+ESRC        65.66   61.94   59.31     79.53
VGG+ESRC        75.40   74.20   73.68     77.65
ArcFace+ESRC    47.61   46.00   46.15     81.13
EGIRC           67.69   65.07   60.53     85.12
BGIRC           68.03   64.29   60.47     81.97
EGIR-VGG        83.53   80.93   79.66     91.17
EGIR-ArcFace    71.30   67.85   66.51     91.34
BGIR-VGG        83.51   81.35   80.21     89.02
BGIR-ArcFace    71.49   67.46   66.14     90.28
(4) VGGFace2. VGGFace2 images are composed of bright internet face images with large pose/expression variations, and the illumination of VGGFace2 images is not as severe as that of Extended Yale B and CMU PIE, so this database cannot well validate the proposed illumination invariant approaches. From Tables VI and VII, we can get the following results.
1) For NN based classification, ArcFace outperforms VGG. Besides the different network structures of ArcFace and VGG, another main reason is that ArcFace was trained with 85742 persons and 5.8M images, whereas VGG was trained with 2622 persons and 2.6M images. ArcFace and VGG perform much better than the other compared illumination invariant approaches, since ArcFace and VGG are well trained with large scale internet face images, whereas the illumination invariant approaches do not depend on large scale face images for training.
2) For ESRC based classification, ArcFace+ESRC is only slightly better than ArcFace. The reason is that ArcFace is trained with MS1MV2 face images, which are much more similar to VGGFace2 face images than to Extended Yale B, CMU PIE, AR and Driver face images. ArcFace can extract very discriminative facial features from VGGFace2 face images, so ESRC cannot efficiently improve the performance of the 512D ArcFace feature. However, ESRC can significantly improve the performance of the 4096D VGG feature; thus EGIRC-VGG and BGIRC-VGG achieve the highest recognition rates on the VGGFace2 test set.
The four face databases Extended Yale B, CMU PIE, AR and Driver are small in comparison with the large scale face database VGGFace2, whereas these four face databases contain benchmark illumination variations, which can be used to well validate the performance of the illumination invariant approaches, since the large scale internet face images are without severe illumination variations.
TABLE VI
THE RECOGNITION RATES (%) OF THE COMPARED METHODS ON THE VGGFACE2 TRAIN+TEST SET

MSLDE      LNN-face   EGIR-face
1.00       0.87       0.75
BGIR-face  CSQP       ArcFace
0.65       1.03       22.69
TABLE VII
THE RECOGNITION RATES (%) OF THE COMPARED METHODS ON THE VGGFACE2 TEST SET

MSLDE      LNN-face   EGIR-face   BGIR-face
3.53       3.07       2.93        2.55
CSQP       VGG        ArcFace     ArcFace+ESRC
3.46       28.80      34.84       35.67
EGIRC      BGIRC      EGIRC-VGG   BGIRC-VGG
3.54       3.20       41.98       44.19
D. CMC curves of some compared methods
Fig.8. CMC curves of some compared methods (CSQP, MSLDE, LNN-face, BGIR-face, VGGFace, ArcFace, BGIRC, BGIR-VGG, BGIR-ArcFace) on the Extended Yale B database.
Fig.9. CMC curves of some compared methods (CSQP, MSLDE, LNN-face, BGIR-face, VGGFace, ArcFace, EGIRC, BGIR-VGG, BGIR-ArcFace) on C27+C29 of the CMU PIE database.
The cumulative match characteristic (CMC) curves of some
compared methods in Extended Yale B, CMU PIE, AR, Driver
and VGGFace2 datasets are shown in Fig.8 to Fig.13. These
CMC curves follow the same experiment protocols of the
corresponding datasets in Tables III, IV, V and VII.
The recognition rates at rank = 1 in Fig.8 to Fig.13 are equal to the recognition rates of the corresponding datasets in Tables III, IV, V and VII. The proposed methods show consistent improvement in recognition rate with increasing rank.
Fig.10. CMC curves of some compared methods (CSQP, MSLDE, LNN-face, BGIR-face, VGGFace, ArcFace, EGIRC, BGIR-VGG, BGIR-ArcFace) on C27+C09 of the CMU PIE database.
Fig.11. CMC curves of some compared methods (CSQP, MSLDE, LNN-face, BGIR-face, VGGFace, ArcFace, EGIRC, BGIR-VGG, EGIR-ArcFace) on the AR1+AR2 database.
Fig.12. CMC curves of some compared methods (CSQP, MSLDE, LNN-face, EGIR-face, VGGFace, ArcFace, EGIRC, EGIR-VGG, EGIR-ArcFace) on the Driver database.
Fig.13. CMC curves of some compared methods (CSQP, MSLDE, LNN-face, EGIR-face, VGGFace, ArcFace, EGIRC, ArcFace+ESRC, BGIR-VGG) on the VGGFace2 test set.
Moreover, ESRC requires more than one second to classify one test image under 9131 (i.e. the VGGFace2 train+test set) labeled classes on our PC with an Intel(R) Core(TM) i5-6500 CPU at 3.20GHz, and most of the computational cost comes from calculating the 9131 class representation residuals for the test image. Hence, the recognition rates and CMC curves of the ESRC based methods MSLDE/LNN/VGG/ArcFace+ESRC, EGIRC/BGIRC and EGIR/BGIR-VGG/ArcFace, as well as VGG, are not reported on the VGGFace2 train+test set.
E. The illumination of the driver face images
The driver face images are collected at night or in the daytime under rainy, cloudy, sunny, or clear weather. Extreme weather conditions (such as night or day under heavy rain or snow) may make the driver face image unrecognizable even by humans, which is more challenging than severe illumination variations. However, only a small part of the driver face images is taken under extreme weather conditions, since normal weather conditions occur much more often than extreme ones. Most driver face images are collected under normal weather conditions (i.e. night or day under clear or cloudy weather), and they could suffer from varying illumination rather than good illumination, especially severe illumination variations. In a word, severe illumination variation is one of the main characteristics of the driver face images, and it is one of the main tough issues of driver face recognition.
Although the illumination of a clear day may be brighter than that of a clear night, the illumination of a cloudy day may be darker than that of a clear night (since lighting equipment is usually used at night). The realistic illumination conditions of the driver face images cannot be clearly distinguished according to the weather conditions, such as night or day under clear or cloudy weather.
In fact, face image illumination processing has been studied for several decades in the literature. From the view of visual conditions, the illumination of face images can be roughly divided into good illumination and varying illumination (i.e. slight, moderate and severe illumination variations). Good illumination can improve the performance of face recognition, whereas varying illumination could degrade it. It is proper to assess the driver face image according to the illumination of the face image itself rather than the weather condition under which it was taken. Due to the complexity of illumination variations, no strict and accurate criterion in the literature can be used to assess the illumination condition of a face image, and it is difficult to actually give the illumination levels of various face images. A recent face image illumination level estimation method used the singular values to assess the face image illumination levels [14], whereas it depended on high-quality reference images.
F. The used databases and their illumination
Extended Yale B, CMU PIE and AR are generic biometric databases, but they contain benchmark illumination variations from slight to severe and are widely used and recognized by researchers worldwide. The self-built driver face images [14] are collected under certain illumination conditions, which cannot cover all the real illumination variations of the driver face images in the intelligent traffic monitoring systems. As it is difficult to form a validation database from the real driver face images, it is proper to employ the Extended Yale B, CMU PIE, AR and Driver face databases to verify the performance of a face recognition method under severe illumination variations.
G. The proposed methods
The proposed GIR model is a model-driven illumination processing method. Unlike data-driven deep learning methods, model-driven illumination processing methods such as MSLDE [34] and LNN-face [14] do not depend on large-scale training with face images. GIR-face in formula (7) employs only one parameter, and no parameter is introduced into GIRC in formula (8) (or GIRC-PDL in formula (10)). Hence, the proposed methods do not require a training process.
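To make the "single parameter plus nearest neighbor" structure concrete, the sketch below builds a generic saturated log-difference feature and classifies it with an NN rule. It is an illustrative stand-in, not the paper's formula (7) or the actual GIR construction; the saturation function (tanh), the parameter name alpha, and the neighborhood choice are assumptions.

```python
# Generic illumination-robust stand-in with one saturation parameter,
# classified by nearest neighbor; not the paper's GIR-face.
import numpy as np

def saturated_difference_map(img, alpha=0.1):
    """Saturated differences of log intensities (log damps multiplicative lighting)."""
    log_img = np.log(img.astype(float) + 1.0)
    dx = log_img[:, 1:] - log_img[:, :-1]          # horizontal neighbor differences
    dy = log_img[1:, :] - log_img[:-1, :]          # vertical neighbor differences
    saturate = lambda d: np.tanh(d / alpha)        # one-parameter saturation, assumed here
    return np.concatenate([saturate(dx).ravel(), saturate(dy).ravel()])

def nn_classify(test_img, gallery_imgs, gallery_labels, alpha=0.1):
    """Nearest-neighbor matching (L2 distance) over the saturated-difference features."""
    q = saturated_difference_map(test_img, alpha)
    dists = [np.linalg.norm(q - saturated_difference_map(g, alpha)) for g in gallery_imgs]
    return gallery_labels[int(np.argmin(dists))]
```

The sketch assumes aligned face crops of identical size; only the single saturation parameter would need tuning, mirroring the training-free property described above.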
From Tables III, IV, V and VII, EGIR-VGG/ArcFace and BGIR-VGG/ArcFace achieve the highest recognition rates on all face datasets except on Extended Yale B with severe illumination variations. The reason is that the pre-trained deep learning models are limited on frontal face images with severe illumination variations; nevertheless, this is not enough to deny that EGIR-VGG/ArcFace and BGIR-VGG/ArcFace are the best approaches for tackling driver face recognition.
EGIRC and BGIRC perform well on frontal face images with severe illumination variations, but are unsatisfactory under pose variations. Although EGIR-face and BGIR-face lag behind MSLDE and LNN-face under slight illumination variations, they outperform MSLDE and LNN-face under severe illumination variations.
H. The centre symmetric quadruple pattern
CSQP [23] and the proposed GIR model are both image-pixel-processing based approaches; however, CSQP targets general face recognition, whereas the proposed GIR model aims to address face recognition under severe illumination variations. From experimental
results on Extended Yale B and CMU PIE, CSQP lags behind
the proposed BGIR-face under severe illumination variations
except on Subsets 1-3 of Extended Yale B and
C27+C29/C27+C09 of CMU PIE. It can be seen from Fig.7 that
Subsets 1-3 images of Extended Yale B incorporate slight
illumination variations and small scale cast shadows, and
C27+C29/C27+C09 images of CMU PIE contain pose
variations (i.e. frontal and non-frontal face images) and
moderate/severe illumination variations. From experimental
results on AR and Driver, CSQP outperforms the proposed EGIR-face/BGIR-face, since AR face images exhibit slight illumination variations, moderate/severe expression variations and scarf occlusion, while Driver face images contain frontal faces with moderate/severe illumination variations.
As discussed above, CSQP outperforms the proposed
EGIR-face/BGIR-face under slight/moderate illumination
variations as well as pose variations, whereas BGIR-face is
superior to CSQP under severe illumination variations.
Moreover, CSQP lags behind the proposed EGIRC/BGIRC and
EGIR-PDL/BGIR-PDL (PDL is VGG or ArcFace).
I. The pre-trained deep learning model
VGG was trained on 2.6M internet face images, and ArcFace was trained on 5.8M internet face images. These large-scale internet face images contain large pose/expression variations and slight/moderate illumination variations. From Tables III, IV and V, VGG/ArcFace and VGG/ArcFace+ESRC perform unsatisfactorily under severe illumination variations, and ArcFace outperforms VGG under moderate/severe illumination variations.
From Tables III to VII, ArcFace+ESRC lags behind VGG+ESRC on all face datasets except on CMU PIE C09 and Driver (since ArcFace performs much better than VGG on CMU PIE C09 and Driver), which means ESRC efficiently improves the performance of VGG rather than ArcFace. One possible reason is that ArcFace extracts more discriminative facial features than VGG, so simple template-matching NN classification is already sufficient for ArcFace features, and the robust classifier ESRC cannot improve ArcFace as efficiently as it improves VGG, especially on VGGFace2. Another reason may be that ArcFace and VGG produce 512-D and 4096-D features respectively, and the 4096-D feature may incorporate more recognizable information than the 512-D feature, so ESRC can further and significantly improve the 4096-D VGG features rather than the 512-D ArcFace features.
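For reference, the template-matching NN classification mentioned above reduces to a cosine nearest-neighbor search over pre-extracted embeddings (e.g., 512-D ArcFace or 4096-D VGG descriptors). The sketch below assumes the features have already been extracted elsewhere; names are illustrative.

```python
# Minimal cosine nearest-neighbor matching over deep face embeddings.
import numpy as np

def cosine_nn(query_feat, gallery_feats, gallery_labels):
    """Label of the gallery embedding with the highest cosine similarity to the query."""
    q = query_feat / np.linalg.norm(query_feat)                       # L2-normalize query
    G = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    return gallery_labels[int(np.argmax(G @ q))]                      # one gallery sample per class
```

In contrast, ESRC replaces this matching step with representation-residual classification (as in the earlier sketch), whose per-class cost grows with the number of gallery identities and whose benefit, in the results above, appears larger for the higher-dimensional VGG features.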
VI. CONCLUSION
In driver face recognition systems, severe illumination variation is a tough issue. This paper proposes the GIR model to address severe illumination variations of driver face images. The proposed GIR model is efficient in tackling severe illumination variations. EGIR-face/BGIR-face achieve recognition rates comparable to other illumination-invariant approaches. EGIRC/BGIRC are superior to the illumination-invariant approaches, since multiple GIR images cover more discriminative information of the face image. Moreover, the proposed GIR model is integrated with the pre-trained deep learning model to achieve higher recognition rates for face recognition under illumination variations from slight to severe. Hence, we can conclude that the GIR-PDL model is an efficient recognition approach for driver face images. Even if driver face images can be used to construct a deep learning training set, the GIR-PDL model may still improve the performance of a deep learning model trained on driver face images.
REFERENCES
[1] G. Sikander and S. Anwar, "Driver fatigue detection systems: A review," IEEE Transactions on Intelligent Transportation Systems, vol. 10, no. 6, pp. 2339-2352, Jun. 2018.
[2] B. I. Ahmad, P. M. Langdon, J. Liang, S. J. Godsill, M. Delgado and T. Popham, "Driver and passenger identification from smartphone data," IEEE Transactions on Intelligent Transportation Systems, vol. 20, no. 4, pp. 1278-1288, Apr. 2018.
[3] A. Amodio, M. Ermidoro, D. Maggi, S. Formentin and S. M. Savaresi, "Automatic detection of driver impairment based on pupillary light reflex," IEEE Transactions on Intelligent Transportation Systems, doi: 10.1109/TITS.2018.2871262, 2018.
[4] E. Derman and A. A. Salah, "Continuous real-time vehicle driver authentication using convolutional neural network based face recognition," in Proceedings of the 13th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, May 2018, pp. 577-584.
[5] W. Zhang, X. Zhao, J. M. Morvan and L. Chen, "Improving shadow suppression for illumination robust face recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 3, pp. 611-624, Mar. 2018.
[6] W. Deng, J. Hu and J. Guo, "Extended SRC: Undersampled face recognition via intraclass variant dictionary," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 9, pp. 1864-1870, Sept. 2012.
[7] J. W. Wang, N. T. Le, J. S. Lee and C. C. Wang, "Color face image enhancement using adaptive singular value decomposition in Fourier domain for face recognition," Pattern Recognition, vol. 57, pp. 31-49, Sept. 2016.
[8] T. Zhang, B. Fang, Y. Yuan, Y. Y. Tang, Z. Shang, D. Li and F. Lang, "Multiscale facial structure representation for face recognition under varying illumination," Pattern Recognition, vol. 42, no. 2, pp. 251-258, Feb. 2009.
[9] A. Baradarani, Q. Wu and M. Ahmadi, "An efficient illumination invariant face recognition framework via illumination enhancement and DD-DTCWT filtering," Pattern Recognition, vol. 46, no. 1, pp. 57-72, Jan. 2013.
[10] X. Fu, D. Zeng, Y. Huang, X. Zhang and X. Ding, "A weighted variational model for simultaneous reflectance and illumination estimation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Jun. 2016, pp. 2782-2790.
[11] W. Chen, M. J. Er and S. Wu, "Illumination compensation and normalization for robust face recognition using discrete cosine transform in logarithm domain," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 36, no. 2, pp. 458-466, Apr. 2006.
[12] T. Chen, W. Yin, X. S. Zhou, D. Comaniciu and T. S. Huang, "Total variation models for variable lighting face recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 9, pp. 1519-1527, Sept. 2006.
[13] X. Xie, W. Zheng, J. Lai, P. Yuen and C. Suen, "Normalization of face illumination based on large- and small-scale features," IEEE Transactions on Image Processing, vol. 20, no. 7, pp. 1807-1821, Jul. 2011.
[14] C. Hu, X. Lu, M. Ye and W. Zeng, "Singular value decomposition and local near neighbors for face recognition under varying illumination," Pattern Recognition, vol. 64, pp. 60-83, Apr. 2017.
[15] T. Zhang, Y. Tang, B. Fang, Z. Shang and X. Liu, "Face recognition under varying illumination using gradientfaces," IEEE Transactions on Image Processing, vol. 18, no. 11, pp. 2599-2606, Nov. 2009.
[16] J. Zhu, W. Zheng and J. Lai, "Illumination invariant single face image recognition under heterogeneous lighting condition," Pattern Recognition, vol. 66, pp. 313-327, Jun. 2017.
[17] B. K. P. Horn, Robot Vision. Cambridge, MA: MIT Press, 1997.
[18] S. R. Dubey, S. K. Singh and R. K. Singh, "Multichannel decoded local binary patterns for content-based image retrieval," IEEE Transactions on Image Processing, vol. 25, no. 9, pp. 4018-4032, Sept. 2016.
[19] S. R. Dubey, S. K. Singh and R. K. Singh, "Local bit-plane decoded pattern: A novel feature descriptor for biomedical image retrieval," IEEE Journal of Biomedical and Health Informatics, vol. 20, no. 4, pp. 1139-1147, Jul. 2015.
[20] S. R. Dubey, S. K. Singh and R. K. Singh, "Local wavelet pattern: A new feature descriptor for image retrieval in medical CT databases," IEEE Transactions on Image Processing, vol. 24, no. 12, pp. 5892-5903, Dec. 2015.
[21] S. Chakraborty, S. K. Singh and P. Chakraborty, "R-theta local neighborhood pattern for unconstrained facial image recognition and retrieval," Multimedia Tools and Applications, vol. 78, no. 11, pp. 14799-14822, Jun. 2019.
[22] S. Chakraborty, S. K. Singh and P. Chakraborty, "Local directional gradient pattern: A local descriptor for face recognition," Multimedia Tools and Applications, vol. 76, no. 1, pp. 1201-1216, Jan. 2017.
[23] S. Chakraborty, S. K. Singh and P. Chakraborty, "Centre symmetric quadruple pattern: A novel descriptor for facial image recognition and retrieval," Pattern Recognition Letters, vol. 115, pp. 50-58, Nov. 2018.
[24] W. T. Su, C. C. Hsu, C. W. Lin and W. Lin, "Supervised-learning based face hallucination for enhancing face recognition," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Mar. 2016, pp. 1751-1755.
[25] C. Hu, M. Ye, S. Ji, W. Zeng and X. Lu, "A new face recognition method based on image decomposition for single sample per person problem," Neurocomputing, vol. 160, pp. 287-299, Jul. 2015.
[26] Z. Fan, D. Zhang, X. Wang, Q. Zhu and Y. Wang, "Virtual dictionary based kernel sparse representation for face recognition," Pattern Recognition, vol. 76, pp. 1-13, Apr. 2018.
[27] Y. Gao, J. Ma and A. Yuille, "Semi-supervised sparse representation based classification for face recognition with insufficient labeled samples," IEEE Transactions on Image Processing, vol. 26, no. 5, pp. 2545-2560, May 2017.
[28] F. Schroff, D. Kalenichenko and J. Philbin, "FaceNet: A unified embedding for face recognition and clustering," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Jun. 2015, pp. 815-823.
[29] O. M. Parkhi, A. Vedaldi and A. Zisserman, "Deep face recognition," in Proceedings of the British Machine Vision Conference, 2015, pp. 1-12.
[30] F. Qiu, W. Lin, X. Liu, H. Yu and H. Xiong, "Deep face recognition using adaptively-weighted verification loss function," in Proceedings of the International Forum on Digital TV and Wireless Multimedia Communications, 2017, pp. 182-192.
[31] J. Deng, J. Guo, N. Xue and S. Zafeiriou, "ArcFace: Additive angular margin loss for deep face recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 4690-4699.
[32] B. Wang, W. Li, W. Yang and Q. Liao, "Illumination normalization based on Weber's law with application to face recognition," IEEE Signal Processing Letters, vol. 18, no. 8, pp. 462-465, Aug. 2011.
[33] Y. Wu, Y. Jiang, Y. Zhou, W. Li, Z. Lu and Q. Liao, "Generalized Weber-face for illumination-robust face recognition," Neurocomputing, vol. 136, pp. 262-267, Jul. 2014.
[34] Z. Lai, D. Dai, C. Ren and K. Huang, "Multiscale logarithm difference edgemaps for face recognition against varying lighting conditions," IEEE Transactions on Image Processing, vol. 24, no. 6, pp. 1735-1747, Jun. 2015.
[35] A. S. Georghiades, P. N. Belhumeur and D. Kriegman, "From few to many: Illumination cone models for face recognition under variable lighting and pose," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 6, pp. 643-660, Jun. 2001.
[36] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry and Y. Ma, "Robust face recognition via sparse representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210-227, Feb. 2009.
[37] D. L. Donoho and Y. Tsaig, "Fast solution of L1-norm minimization problems when the solution may be sparse," IEEE Transactions on Information Theory, vol. 54, no. 11, pp. 4789-4812, Nov. 2008.
[38] T. Sim, S. Baker and M. Bsat, "The CMU pose, illumination, and expression database," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 12, pp. 504-507, Dec. 2003.
[39] A. M. Martinez and R. Benavente, "The AR face database," CVC Tech. Rep. #24, Jun. 1998.
[40] Q. Cao, L. Shen, W. Xie, O. M. Parkhi and A. Zisserman, "VGGFace2: A dataset for recognising faces across pose and age," in Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition, May 2018, pp. 67-74.
[41] K. Zhang, Z. Zhang, Z. Li and Y. Qiao, "Joint face detection and alignment using multitask cascaded convolutional networks," IEEE Signal Processing Letters, vol. 23, no. 10, pp. 1499-1503, Oct. 2016.
Chang-Hui Hu received the Ph.D. degree from the School of Automation, Southeast University, Nanjing, China, in 2017. He is currently a lecturer with the College of Automation, Nanjing University of Posts and Telecommunications, and a post-doctoral researcher with Southeast University. His research interests include image processing and pattern recognition. He has received several research awards, including the Best Doctoral Dissertation Award from the China Intelligent Transportation Systems Association in 2017, a prize in the Science and Technology Award of Jiangsu Province in 2017, and the National Scholarship from the Ministry of Education of China in 2015.
Yang Zhang received the B.S. degree in Communication
Engineering from Chongqing University of Posts and
Telecommunications, Chongqing, China, in 2013 and the
M.S. degree in Instrument Engineering from Guilin
University of Electronic Technology, Guilin, China, in
2016. She is currently working toward the Ph.D. degree
with the School of Automation, Southeast University. Her
current research interests include image processing, face
recognition, and pattern recognition.
Fei Wu received the Ph.D. degree in computer science
from Nanjing University of Posts and
Telecommunications, China, in 2016. He is currently
with the College of Automation in NJUPT. He has
authored over thirty scientific papers. His research
interests include pattern recognition, artificial
intelligence, and computer vision.
Xiao-Bo Lu received the Ph.D. degree from Nanjing
University of Aeronautics and Astronautics. He did his
postdoctoral research with Chien-Shiung Wu
Laboratory, Southeast University, from 1998 to 2000.
He is currently a Professor with the School of
Automation and deputy Director of the Detection
Technology and Automation Research Institute,
Southeast University. He is a coauthor of the book An
Introduction to the Intelligent Transportation Systems
(Beijing, China Communications, 2008). His research interests include image
processing, signal processing, pattern recognition, and computer vision. Dr. Lu
has received many research awards, such as the First Prize in Natural Science
Award from the Ministry of Education of China and the prize in the Science and
Technology Award of Jiangsu province.
Pan Liu received the Ph.D. degree in civil engineering
from University of South Florida, Tampa, USA, in 2006.
He is a Professor with the School of Transportation,
Southeast University, Nanjing, China. His research
interests include traffic operations and safety, and
intelligent transportation systems. He was a recipient of the
Outstanding Young Scientist Foundation of NSFC in 2019,
and also a recipient of the Distinguished Young Scientist
Foundation of NSFC in 2013.
Xiao-Yuan Jing received the Ph.D. degree in Pattern Recognition and Intelligent Systems from the Nanjing University of Science and Technology in 1998. He became a Professor with the Department of Computer, Shenzhen Graduate School, Harbin Institute of Technology, in 2005. He is now a Professor with the College of Automation and College of Artificial Intelligence, Nanjing University of Posts and Telecommunications, and with the School of Computer, Wuhan University, China. He has published over 100 scientific papers in international journals and conferences such as TPAMI, TIP, TIFS, TSMCB, TMM, TCSVT, TCB, CVPR, AAAI, IJCAI, ACM MM, etc.