Research Article
SqueezeFace: Integrative Face Recognition Methods with LiDAR Sensors
Journal of Sensors, Volume 2021, Article ID 4312245, 8 pages
https://doi.org/10.1155/2021/4312245
Kyoungmin Ko,1 Hyunmin Gwak,1 Nalinh Thoummala,2 Hyun Kwon,3 and SungHwan Kim1

1 Department of Applied Statistics, Konkuk University, Seoul, Republic of Korea
2 AI Analytics Team, DeepVisions, Seoul, Republic of Korea
3 Department of Electrical Engineering, Korea Military Academy, Seoul, Republic of Korea

Correspondence should be addressed to Hyun Kwon; hkwon.cs@gmail.com and SungHwan Kim; shkim1213@konkuk.ac.kr
Received 20 May 2021; Revised 10 August 2021; Accepted 19 August 2021; Published 28 September 2021
Academic Editor: Haibin Lv
Copyright © 2021 Kyoungmin Ko et al. This is an open access article distributed under the Creative Commons Attribution
License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is
properly cited.
In this paper, we propose a robust and reliable face recognition model that incorporates depth information such as data from point clouds and depth maps into RGB image data to avoid false facial verification caused by face spoofing attacks while increasing the model's performance. The proposed model is driven by the spatially adaptive convolution (SAC) block of SqueezeSegV3; this is the attention block that enables the model to weight features according to the importance of their spatial location. We also utilize large-margin loss instead of softmax loss as a supervision signal for the proposed method, to enforce high discriminatory power. In the experiment, the proposed model, which incorporates depth information, had 99.88% accuracy and an F1 score of 93.45%, outperforming the baseline models, which used RGB data alone.
1. Introduction
LiDAR, short for light detection and ranging, is a remote sensing technology similar to radar. The difference is that radar uses radio waves to detect its surroundings, whereas LiDAR uses laser energy. When a LiDAR sensor directs a laser beam at an object, it can calculate the distance to the object by measuring the delay before the light is reflected back to it, making it possible to extract depth information for an object and display it in the form of a point cloud or depth map. Not only can LiDAR sensors estimate an object's range, but they can also measure its shape with high accuracy and spatial resolution. Furthermore, LiDAR sensors are robust under various lighting conditions (day or night, with or without glare and shadows), thereby overcoming the disadvantages of other sensor types. Because of this superiority, LiDAR has been widely used in a variety of applications, including autonomous vehicles, river surveys, and pollution modeling. Recently, products launched by technology companies often come equipped with a LiDAR scanner, making it more convenient to obtain depth information for objects in the form of 3D point clouds, as shown in Figure 1.
A face recognition system is a computer-assisted application that automatically determines or verifies an individual's identity using digital images. In practice, the system verifies the person's identity by comparing intensity images of the face captured by a camera with prestored images. It can be used for biometric authentication and is emerging as a critical authentication method for information and communications technology (ICT) services. Security-based applications are spreading to various fields; they include employee attendance checks, airport surveillance, and bank transactions. A face recognition system can provide a straightforward yet convenient authentication process, as it can operate using just an RGB image captured from a person's face. However, this simplicity makes it vulnerable to
spoofing attacks [1, 2] because pictures of people's faces can easily be obtained on social media platforms without their consent, and these can be used by someone with malicious intent to steal a person's identity. To prevent such face spoofing attacks, we propose a robust face recognition method that uses both RGB images and depth information, such as that extracted from point clouds and depth maps produced by a LiDAR scanner.
Face recognition based on RGB images is already widely acknowledged for its promising performance. However, the determination of whether a face is real or fake, known as liveness detection, cannot be performed simultaneously. Distinguishing, in terms of liveness, between RGB images captured directly from people's faces using a camera and digital images from other sources used for face spoofing attacks remains challenging, because both are just one type of input used by the recognition system. A point cloud and depth map, however, can be obtained only by direct capture from people's faces using sensors such as LiDAR. In addition, depth information is three-dimensional. In other words, spoofing attacks using 2D digital images are immediately identifiable by their lack of 3D information.
The main feature of the proposed method is a face recognition model that incorporates depth information into RGB images. The method uses a device equipped with a LiDAR sensor to collect the supplementary data. Because the method utilizes point cloud and depth data, it solves the liveness detection problem of the existing 2D face recognition method. We also hypothesize that a deep learning framework using depth information can demonstrate higher performance on the classification model for face recognition systems.
According to the developers of the SqueezeSegV3 model [3], point cloud data present strong spatial priors, and their feature distributions vary according to spatial location. Thus, we built an attention-based deep convolutional model based on SqueezeSegV3, called SqueezeFace. Its architecture is shown in Figure 2.
Based on previous studies [4–9], we additionally adopted large-margin loss as a supervision signal that enables the model to learn highly discriminative deep features for face recognition by maximizing interclass variance and minimizing intraclass variance during the training phase. In the test phase, facial embedding features are extracted using our proposed convolution network for face verification. The method can then verify an identity by calculating the cosine similarity between embedding features. The proposed method delivers performance superior to that of existing methods that use only RGB images. The remainder of this paper is organized as follows. In Section 2, related work is reviewed. The structure of the proposed method is described in detail in Section 3. The experimental results are discussed in Section 4. Finally, we conclude the paper in Section 5.
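To make the verification step concrete, the sketch below shows how an identity decision could be made by thresholding the cosine similarity between two embedding vectors. It is a minimal illustration with hypothetical names and a hypothetical threshold value (the best thresholds actually measured in our experiments are reported in Table 2), not the released implementation.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def verify(probe_emb: np.ndarray, enrolled_emb: np.ndarray, threshold: float = 0.8) -> bool:
    """Accept the probe identity if the similarity exceeds the decision threshold."""
    return cosine_similarity(probe_emb, enrolled_emb) >= threshold

# Toy example with random 512-dimensional embeddings (512 is the embedding size
# produced by the ResNet34 backbone used in this paper).
rng = np.random.default_rng(0)
print(verify(rng.normal(size=512), rng.normal(size=512)))
```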
2. Related Work
Convolutional neural networks (CNNs) are powerful models that play an essential role in learning feature representations that best describe the given domain while maintaining the spatial information of an image. Because of their excellence in learning important patterns, CNNs have achieved breakthroughs on a variety of computer vision tasks such as those involved in image classification, object detection, and semantic segmentation [10–16].
Attention-based CNNs in particular have attracted considerable interest and have been extensively exploited to improve a model's performance on numerous computer vision tasks by integrating attention modules with the existing CNN architecture [3, 17–20]. The attention module allows the model to selectively emphasize important features and discard less informative ones. Hu et al. [17] proposed the Squeeze-and-Excitation (SE) block, which learns the relationship between the channels of its convolutional features and adaptively recalibrates channel weights according to the relationship learned. Specifically, the SE block extracts a representative scalar value for each channel using global average pooling (GAP) and assigns a weight to each channel based on the interdependency between channels through the excitation process. Park et al. [19] introduced the simple yet efficient Bottleneck Attention Module (BAM), which generates attention maps by separating the process of inferring an attention map into a channel attention module and a spatial attention module and configures them in parallel. Woo et al. [20] presented the lightweight Convolutional Block Attention Module (CBAM), which sequentially applies channel and spatial attention modules to emphasize important elements in both the channel and spatial axes.
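As a rough illustration of the channel-recalibration idea behind the SE block described above, the following is a minimal PyTorch sketch (our own simplification for exposition, not the cited authors' code):

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: squeeze each channel with global average pooling,
    then excite with a small MLP that outputs per-channel weights in (0, 1)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        s = x.mean(dim=(2, 3))            # squeeze: (b, c) channel descriptors
        w = self.fc(s).view(b, c, 1, 1)   # excitation: per-channel weights
        return x * w                      # recalibrate the feature maps

out = SEBlock(64)(torch.rand(2, 64, 56, 56))   # same shape as the input
```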
Exploiting face representation embedding features extracted using a deep CNN is one of several methods used in face recognition tasks [9, 21–24]. Face recognition using a deep CNN involves two essential preprocessing steps: face detection and face alignment. These two tasks should be performed jointly because they are inherently correlated [25].
Figure 1: Capture of RGB image, depth, and point cloud data using a LiDAR scanner-equipped device (panels: the real subject captured with an Apple device, the 2D RGB image, the depth data, and the point cloud data).
Softmax loss [26] is commonly used as a loss function to supervise the face recognition model and was used in DeepID [21] and DeepFace [22]. However, recent studies have indicated that softmax loss is not suitable for face recognition tasks owing to its inability to optimize the feature embedding to enforce strong similarity within positive class samples and diversity across negative class samples, which can deteriorate model performance on face recognition. Suggested alternatives included functions based on Euclidean distance, such as contrastive loss, triplet loss, and center loss, to alleviate such constraints while strengthening discriminative features.
Contrastive loss was proposed as the loss function in DeepID2 [21] and DeepID3 [27]. Generally, this loss requires pairs of inputs, and it adjusts the distances between embedding features differently depending on whether the pair belongs to the positive class (an intraclass pair) or the negative class (an interclass pair). To increase the learning efficiency of contrastive loss, triplet loss was proposed in FaceNet [23]. Unlike contrastive loss, triplet loss requires three inputs, two of which are in the same class while the third belongs to a different class. This loss function reduces the distance between the intraclass pairs and increases the distance between the interclass pairs. Despite being used in many metric learning methods because of its excellent performance, triplet loss requires an expensive preprocessing step in constructing input data for the distance comparison. Thus, center loss was proposed to learn the centroid of the features of each class and penalize the distances between the centroids and their corresponding class features. This loss not only avoids the complicated input data preprocessing step but also boosts performance.
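For concreteness, the triplet constraint described above can be written as a short sketch (an illustrative snippet with a hypothetical margin value, not taken from any of the cited implementations):

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor: torch.Tensor, positive: torch.Tensor,
                 negative: torch.Tensor, margin: float = 0.2) -> torch.Tensor:
    """Pull each anchor toward its same-class sample and push it away from the
    different-class sample until the gap exceeds the margin."""
    d_pos = F.pairwise_distance(anchor, positive)   # intraclass distance
    d_neg = F.pairwise_distance(anchor, negative)   # interclass distance
    return torch.clamp(d_pos - d_neg + margin, min=0.0).mean()

loss = triplet_loss(torch.randn(8, 512), torch.randn(8, 512), torch.randn(8, 512))
```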
In addition to the losses described above, there exists a series of losses that incorporate a large angular margin to strengthen discriminatory power on classification, decrease the distance between features within the same class, and increase the distance between features from different classes [7–9]. We discuss these losses in detail in Section 3.
Traditional face recognition methods utilize only RGB data as the input. Such methods perform relatively well, but they present a disadvantage with regard to liveness in that the model cannot distinguish whether an image has been captured directly from a person's face or is a digital image obtained from other sources. This characteristic makes such methods vulnerable to face spoofing attacks. Recent studies have sought to mitigate this problem by adding depth information in the form of point cloud and depth data as inputs. FuseSeg [28], FuseNet [29], and ChiNet [30] have been proposed for boosting model performance by effectively fusing such data collected from various sensors. Each model has different methods for data fusion, and each embedding feature created is fused at the layer level.
3. Proposed Method
In this section, we describe the proposed face recognition method, which uses not only RGB images but also depth and point cloud data (3D coordinates) extracted from LiDAR sensors. We constructed the proposed model with a data integration network that processes data serially from different sensors. Because it is imperative to emphasize features that will influence the model's performance, the attention mechanism was adopted to allow the model to capture and best exploit important features from the point cloud.
For the operational technique, we incorporated the spatially adaptive convolution (SAC) block of SqueezeSegV3 into a data integration network to process our data and extract features from them.

Figure 2: Integration of the attention block in the SqueezeFace architecture. (The figure shows the seven-channel input built by stacking the RGB channels with the point cloud and depth channels, the SAC block in which a 7×7 kernel followed by a sigmoid produces an attention map that is element-wise multiplied with the input, the ResNet34 feature extraction stage, and the additive angular margin penalty applied before the softmax-based classification layer.)
In addition, we replaced softmax loss with large-margin loss for supervising the feature embedding process to increase similarity within the same class and discrepancy between different classes. We discuss in detail the construction of the proposed data integration network and the large-margin loss function in Sections 3.1 and 3.2, respectively.
3.1. SqueezeSegV3. Most face recognition models are based on deep convolutional neural networks (DCNNs) to have discriminatory power for classification. Facial feature representations can be extracted with standard convolution as
\[
Y[m, p, q] = \sigma\!\left( \sum_{i,j,n} W[m, n, i, j] \times X\big[n, p+\hat{i}, q+\hat{j}\big] \right), \tag{1}
\]
where $Y \in \mathbb{R}^{O \times S \times S}$ and $X \in \mathbb{R}^{I \times S \times S}$ are the output and input tensors; $W \in \mathbb{R}^{O \times I \times K \times K}$ is the convolutional weight matrix, in which $K$ is the convolutional kernel size; $O$ and $I$ are the output and input channel sizes; $S$ represents the image size; and $\sigma(\cdot)$ is a nonlinear activation function such as ReLU [31]. In this method, $\hat{i}$ and $\hat{j}$ are defined as $\hat{i} = i - \lfloor K/2 \rfloor$ and $\hat{j} = j - \lfloor K/2 \rfloor$. As mentioned with regard to the SqueezeSegV3 model [3], standard convolution is based on the assumption that the distribution of visual features is invariant to the spatial location of the image. This assumption is largely true in the case of RGB images; thus, a convolution uses the same weight for all input locations. However, this assumption cannot be applied to point cloud data, as the 3D-coordinate point cloud data present very strong spatial priors, and the feature distribution of the point cloud varies substantially at different locations. In consideration of this fact, the SAC block, which is designed to be spatially adaptive and content aware using the 3D coordinates of a point cloud, is proposed to apply different weights for different image locations as follows:
\[
Y[m, p, q] = \sigma\!\left( \sum_{i,j,n} W(X^{0})[m, n, p, q, i, j] \times X\big[n, p+\hat{i}, q+\hat{j}\big] \right). \tag{2}
\]
In SqueezeSegV3 [3], $W(\cdot) \in \mathbb{R}^{O \times I \times S \times S \times K \times K}$ is a spatially adaptive function of the raw input $X^{0}$, which depends on the location $(p, q)$. In this method, $X^{0}$ is only the raw input point cloud. $W(\cdot)$, the spatially adaptive function of SqueezeFace, is shown in detail in the lower part of Figure 2.
To process our data, which are gathered from different sources, an appropriate data fusion model is required. Seven channels are constructed for the input data by stacking RGB, depth, and point cloud data, which are collected from different sensors and possess different characteristics. To obtain attention map $A$, the point cloud data are fed into a 7×7 convolution followed by a sigmoid function. Next, this attention map $A$ is combined with the input tensor $X$. Then, a standard convolution with weight $W$ is applied to the adapted input. For the embedding network, we employ the well-known ResNet34 architecture [32]. The ResNet model reduces the image size as it passes through each layer. The downsampling process for the point cloud has difficulty in properly utilizing spatial coordinate information because of the small size of our dataset. Therefore, the SAC block is used at the initial layer, as shown in Figure 2. The network successfully maps the face input to face representation embedding features, combining the three types of data.
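The sketch below outlines one way the fusion and SAC-style attention just described could be wired together in PyTorch. It is a simplified illustration under our own assumptions (a single-channel attention map, 64 output filters, 112×112 inputs), not the released SqueezeFace implementation.

```python
import torch
import torch.nn as nn

class SACBlock(nn.Module):
    """Spatial attention from the point cloud reweights the stacked 7-channel input
    before a standard convolution, in the spirit of Eq. (2)."""
    def __init__(self, in_channels: int = 7, out_channels: int = 64):
        super().__init__()
        # Attention map A: 7x7 convolution over the raw point cloud, then a sigmoid.
        self.attn = nn.Sequential(nn.Conv2d(3, 1, kernel_size=7, padding=3), nn.Sigmoid())
        # Standard convolution W applied to the adapted (reweighted) input.
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)

    def forward(self, rgb, depth, xyz):
        x = torch.cat([rgb, depth, xyz], dim=1)   # (B, 3 + 1 + 3 = 7, H, W) fused input
        a = self.attn(xyz)                        # (B, 1, H, W) spatial attention map
        return self.conv(x * a)                   # adapted input -> standard convolution

# Toy example: one face sample; the resulting features would then enter ResNet34.
rgb, depth, xyz = torch.rand(1, 3, 112, 112), torch.rand(1, 1, 112, 112), torch.rand(1, 3, 112, 112)
features = SACBlock()(rgb, depth, xyz)            # shape (1, 64, 112, 112)
```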
3.2. Large-Margin Loss. The face recognition task is a multiclass classification, defined as the problem of classifying images into one of certain classes. The most commonly used loss for multiclass classification is softmax loss, which is a softmax activation function followed by cross-entropy loss [33]. The softmax activation function outputs the probability for each class, whose sum is one, and the cross-entropy loss is the sum of the negative logarithms of these probabilities, defined as
\[
L_1 = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{e^{W_{y_i}^{T} x_i + b_{y_i}}}{\sum_{j=1}^{n} e^{W_j^{T} x_i + b_j}}, \tag{3}
\]
where $x_i$ is the feature vector of sample data, $y_i$ represents the truth class corresponding to $x_i$, and $W$ and $b$ are weight and bias terms, respectively. Despite being widely used, softmax loss has some limitations, as it does not strictly enforce higher similarity within the same class and discrepancy between different classes. Thus, traditional softmax loss may create a performance gap for face recognition when intraclass variation is high because of factors such as age gaps, differences in facial expression, and variations in pose (left, right, or frontal). To enable the model to circumvent this problem, A-softmax loss was proposed as a reformulation of the traditional softmax loss in SphereFace [5] as follows:
\[
L_2 = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{e^{\|x_i\| \cos(m\theta_{y_i,i})}}{e^{\|x_i\| \cos(m\theta_{y_i,i})} + \sum_{j \neq y_i} e^{\|x_i\| \cos(\theta_{j,i})}}, \tag{4}
\]
where $m$ is the angular margin and $\theta_{y_i,i}$ is the angle between the vectors $W_{y_i}$ and $x_i$. A-softmax loss adopts $W_{y_i}^{T} x_i$ as the linear form, which is expressed as $\|W_{y_i}\| \|x_i\| \cos(\theta_{y_i,i})$. This loss enables metric learning by constraining the norm of the classification weights to 1 through normalization, setting the bias to 0, and incorporating the angular margin adjusted via parameter $m$ to capture discriminative features with a clear geometric interpretation.
Then, the CosFace model [8] was proposed, which includes a large-margin cosine loss function that normalizes both weights and features by L2 normalization to eliminate radial variations and adds a quantitative $m$ value, a fixed parameter used to control the magnitude of the cosine margin. The overall loss function can be expressed as
\[
L_3 = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{e^{s(\cos(\theta_{y_i,i}) - m)}}{e^{s(\cos(\theta_{y_i,i}) - m)} + \sum_{j \neq y_i} e^{s \cos(\theta_{j,i})}}, \tag{5}
\]
where $s$ is a rescale parameter, used by the loss function to rescale the weights and features after normalizing them. ArcFace [9] adds an additive angular margin penalty $m$ between weights and features. This penalty is equal to the geodesic distance margin penalty in the normalized hypersphere and thus is named ArcFace. The loss function is formulated as follows:
\[
L_4 = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{e^{s \cos(\theta_{y_i,i} + m)}}{e^{s \cos(\theta_{y_i,i} + m)} + \sum_{j \neq y_i} e^{s \cos(\theta_{j,i})}}. \tag{6}
\]
Thus, we can supervise our model using an additive angular margin loss that combines the margin penalties of SphereFace [5], CosFace [8], and ArcFace [9], which demonstrates the best performance, as follows:
\[
L_5 = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{e^{s(\cos(m_1 \theta_{y_i,i} + m_2) - m_3)}}{e^{s(\cos(m_1 \theta_{y_i,i} + m_2) - m_3)} + \sum_{j \neq y_i} e^{s \cos(\theta_{j,i})}}, \tag{7}
\]
where $m_1$, $m_2$, and $m_3$ are the angular margin parameters, each represented as $m$ in the loss functions described above. Our main task is to identify a class for each input identity. By adopting the proposed additive angular margin loss, the proposed model can increase the similarity of positive classes and enforce a wide diversity of negative classes in metric learning. The proposed large-margin loss can generate high-quality embedding features from our data, enabling high-accuracy classification with both the training dataset and the unseen test dataset.
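For illustration, the combined-margin supervision of Eq. (7) can be sketched as follows. This is a minimal PyTorch rendering with hypothetical hyperparameter values (s = 64, m1 = 1.0, m2 = 0.5, m3 = 0.0, which reduces to the ArcFace setting), not a reproduction of our training code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CombinedMarginLoss(nn.Module):
    """Large-margin softmax of Eq. (7): the SphereFace (m1), ArcFace (m2), and
    CosFace (m3) margins are applied to the ground-truth class logit only."""
    def __init__(self, feat_dim=512, num_classes=83, s=64.0, m1=1.0, m2=0.5, m3=0.0):
        super().__init__()
        self.W = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.s, self.m1, self.m2, self.m3 = s, m1, m2, m3

    def forward(self, x, labels):
        # cos(theta_{j,i}) between L2-normalized features and class weights.
        cos = F.linear(F.normalize(x), F.normalize(self.W)).clamp(-1 + 1e-7, 1 - 1e-7)
        theta = torch.acos(cos)
        target = torch.cos(self.m1 * theta + self.m2) - self.m3   # margined logit
        one_hot = F.one_hot(labels, cos.size(1)).bool()
        logits = self.s * torch.where(one_hot, target, cos)
        return F.cross_entropy(logits, labels)

# Toy batch of four 512-D embeddings for the 83 identities in our dataset.
loss = CombinedMarginLoss()(torch.randn(4, 512), torch.tensor([0, 5, 10, 82]))
```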
4. Numerical Experiments
4.1. Datasets. The face dataset consisted of 784 face scans from 83 Korean individuals. The face data were captured using Apple's latest device equipped with a LiDAR scanner. Specifically, the device was equipped with three cameras (main, wide, and telephoto) and a LiDAR scanner for capturing both the RGB image and depth information. ARKit can be used to connect with the scanner on the Apple device and process the depth and point cloud (3D coordinate) data. ARKit recently introduced a new depth API, available only for devices equipped with a LiDAR scanner, that provides several methods to access depth information collected from LiDAR scanners. The LiDAR scanner allows this API to obtain per-pixel depth information of a person's face and generate 3D coordinates of the point cloud by setting the parameters for the device. We modified ARKit's sample code and set up the application to simultaneously store RGB and point cloud data within one scene. We installed this modified app on the device and collected data through the app.
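To illustrate how each capture could be turned into the seven-channel input described in Section 3.1, a small preprocessing sketch is shown below. It reflects our own assumptions about array shapes (per-pixel depth and point cloud maps aligned with the RGB image); the released collection app and preprocessing pipeline may organize the data differently.

```python
import numpy as np

def build_input(rgb: np.ndarray, depth: np.ndarray, xyz: np.ndarray) -> np.ndarray:
    """Stack RGB (H, W, 3), depth (H, W), and per-pixel point cloud coordinates
    (H, W, 3) into one channels-first (7, H, W) array for the fusion network."""
    assert rgb.shape[:2] == depth.shape[:2] == xyz.shape[:2]
    stacked = np.concatenate([rgb, depth[..., None], xyz], axis=-1)   # (H, W, 7)
    return np.transpose(stacked, (2, 0, 1)).astype(np.float32)

# Dummy capture; the actual resolution delivered by the device may differ.
h, w = 192, 256
sample = build_input(np.random.rand(h, w, 3), np.random.rand(h, w), np.random.rand(h, w, 3))
print(sample.shape)  # (7, 192, 256)
```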
4.2. Experiment Setup. We trained three different models to compare their performance. The first model used only RGB data. The second model used three types of sensor data (RGB, depth, and point cloud) with three different characteristics, and the third model was the SqueezeFace model, which uses the SAC block on the three types of sensor data. All three models used the ResNet34 architecture [32] and large-margin loss [6]. The ResNet34 model was pretrained using a facial image dataset of 400 Korean individuals, provided by AI Hub (https://aihub.or.kr/). For the three-sensor-data models, pretrained weights from ResNet34 were used as the weights of the RGB data, and the weights for the point cloud and depth data were initialized using the Xavier initializer.
4.3. Experiment Results. We split our face dataset into a training set and a test set, and the sensor data were configured as three types (RGB, depth, and point cloud). In addition, to evaluate the face verification performance, we constructed a face verification dataset with pairs of face images from the test set. Accuracy, precision, and recall were used as metrics to measure the model's performance for face verification. Accuracy is the ratio of the number of correct predictions to the total number of inputs. Precision is the ratio of the number of true positive predictions to the total number of the model's predicted positive values, and recall is the ratio of the number of true positive predictions to the number of all positive samples. These three definitions are represented as
\[
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \tag{8}
\]

where TP, TN, FP, and FN denote true positive, true negative, false positive, and false negative, respectively. For the face verification dataset, the number of interclass combinations was much greater than the number of intraclass combinations. Because the intraclass and interclass counts were considerably imbalanced, the F1 score, the harmonic mean of precision and recall, was used as the evaluation metric for face verification:

\[
\text{F1 score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}. \tag{9}
\]

Table 1: Comparison of models' performance on the test set.

Model                      Accuracy   F1 score
RGB-only, ResNet34         0.9979     0.8995
Our data, ResNet34         0.9980     0.9056
Our data + SqueezeFace     0.9988     0.9345

Table 2: Face verification performance comparison on three-shot learning between the RGB-only model and the proposed three-sensor-data-type model.

Statistic                   RGB      RGB + depth + point cloud
Number of output classes    83       83
Number of training images   248      248
Number of testing images    536      536
Testing accuracy            0.9973   0.9977
Testing F1 score            0.8884   0.9036
Best threshold              0.7255   0.8026
4.3.1. Analysis of Face Verification Results of the Proposed Method. According to the experimental results, shown in Table 1, the model using the three types of sensor data outperformed the model using only RGB data, demonstrating that employing depth information can enhance rich facial representation. More importantly, the proposed SqueezeFace model, with the added SAC attention block, achieved the best accuracy and F1 score. This result shows that the proposed model learned the highly important face points well by actively utilizing the point cloud data, whose distributions differ according to spatial location. The intraclass variance due to pose variations and age gaps significantly increases the angle between positive pairs and therefore can increase the best threshold for face verification on test data. However, if the training data for each identity are limited, making the intraclass variance small, it is difficult to increase the best threshold for face verification on test data. A low threshold used in the evaluation of face verification indicates a low reliability of the model. The proposed model addresses this problem by adding point cloud and depth data to the RGB data.
The results for face verification performance on three-shot learning are compared in Table 2. Three-shot learning is learning that takes place using only three training samples. The best threshold is the threshold with the maximum F1 score. The model using the three types of sensor data shows higher accuracy, a higher F1 score, and a higher best threshold than the RGB-images-only model. This demonstrates that by making use of supplementary information such as point cloud and depth data, the proposed model can increase intraclass variance and, as a result, increase the best threshold for face verification.
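The best threshold reported above is simply the cosine-similarity cutoff that maximizes the F1 score over the verification pairs. The sweep can be sketched as follows (our own illustration of the selection rule, not the evaluation script used for the experiments):

```python
import numpy as np

def best_threshold(similarities: np.ndarray, same_identity: np.ndarray):
    """Return the cosine-similarity threshold with the highest F1 score.
    `similarities` holds one cosine similarity per verification pair and
    `same_identity` is a boolean array marking the positive (intraclass) pairs."""
    best_t, best_f1 = 0.0, 0.0
    for t in np.linspace(0.0, 1.0, 201):
        pred = similarities >= t
        tp = np.sum(pred & same_identity)
        fp = np.sum(pred & ~same_identity)
        fn = np.sum(~pred & same_identity)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```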
4.3.2. Analysis of Cosine Similarity on Three-Shot Learning of the Proposed Method. We examined the cosine similarity for various facial expressions on three-shot learning, with results as shown in Table 3. The proposed model produced better similarity values between positive pairs than the RGB-images-only model, even with a variety of facial expressions. Because the proposed method uses more facial information by adding depth and point cloud data, the intraclass variance of the model can increase the angle between positive pairs. Therefore, the model can increase the cosine similarity, and the higher cosine similarity can increase the best threshold for face verification. This result demonstrates that adding depth and point cloud data enables the model to learn important facial features for face verification more effectively than the model with only RGB data. In addition, despite the differences between images of the same identity due to pose variations, the proposed method can distinguish the identity well in the test data by adding depth and point cloud data.

Table 3: Cosine similarity for various facial expressions.

Description                  Pair 1   Pair 2   Pair 3   Pair 4   Pair 5
RGB                          0.9118   0.7450   0.7734   0.5164   0.5558
RGB + depth + point cloud    0.9664   0.9050   0.8781   0.8258   0.8484
5. Conclusion
This paper has proposed a face recognition approach that considers depth information using point cloud data. By using depth information, false facial verification using a face
photo or video of an authorized person can be avoided, thereby increasing the reliability of the face recognition system. The method incorporates the SAC block based on the attention mechanism to capture important features and weight them to enhance model performance. In addition, we used a modified loss function constructed by adding a large margin to reinforce high discriminatory power for face recognition applications [34]. The proposed method delivers a considerable performance improvement over the baseline models and uses a higher threshold for face verification when subjected to an increase in intraclass variance.
Data Availability
All source codes are available online at https://github.com/kyoungmingo/Fusion_face (authors' webpage).
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This research was supported by the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (NRF-2020R1C1C1A01005229 and NRF-2021R1A4A5032622).
References
[1] S. Kumar, S. Singh, and J. Kumar, "A comparative study on face spoofing attacks," in 2017 International Conference on Computing, Communication and Automation (ICCCA), pp. 1104–1108, Greater Noida, India, 2017.
[2] T. Girdler and V. G. Vassilakis, "Implementing an intrusion detection and prevention system using software-defined networking: defending against ARP spoofing attacks and blacklisted MAC addresses," Computers & Electrical Engineering, vol. 90, p. 106990, 2021.
[3] C. Xu, B. Wu, Z. Wang et al., "SqueezeSegV3: spatially adaptive convolution for efficient point-cloud segmentation," in European Conference on Computer Vision, pp. 1–19, Springer, 2020.
[4] J. Deng, Y. Zhou, and S. Zafeiriou, "Marginal loss for deep face recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 60–68, Honolulu, HI, USA, 2017.
[5] W. Liu, Y. Wen, Z. Yu, M. Li, B. Raj, and L. Song, "SphereFace: deep hypersphere embedding for face recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 212–220, Honolulu, HI, USA, 2017.
[6] W. Liu, Y. Wen, Z. Yu, and M. Yang, "Large-margin softmax loss for convolutional neural networks," ICML, vol. 2, p. 7, 2016.
[7] F. Wang, J. Cheng, W. Liu, and H. Liu, "Additive margin softmax for face verification," IEEE Signal Processing Letters, vol. 25, no. 7, pp. 926–930, 2018.
[8] H. Wang, Y. Wang, Z. Zhou et al., "CosFace: large margin cosine loss for deep face recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5265–5274, Salt Lake City, UT, USA, 2018.
[9] J. Deng, J. Guo, N. Xue, and S. Zafeiriou, "ArcFace: additive angular margin loss for deep face recognition," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4690–4699, Long Beach, CA, USA, 2019.
[10] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: towards real-time object detection with region proposal networks," 2015, https://arxiv.org/abs/1506.01497.
[11] G. Algan and I. Ulusoy, "Image classification with deep learning in the presence of noisy labels: a survey," Knowledge-Based Systems, vol. 215, article 106771, 2021.
[12] K. He, G. Gkioxari, P. Dollár, and R. Girshick, "Mask R-CNN," in Proceedings of the IEEE International Conference on Computer Vision, pp. 2961–2969, 2017.
[13] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, "Encoder-decoder with atrous separable convolution for semantic image segmentation," in Proceedings of the European Conference on Computer Vision (ECCV), pp. 801–818, 2018.
[14] W. Wang, Q. Lai, H. Fu, J. Shen, H. Ling, and R. Yang, "Salient object detection in the deep learning era: an in-depth survey," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021.
[15] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241, Cham, 2015.
[16] S. Asgari Taghanaki, K. Abhishek, J. P. Cohen, J. Cohen-Adad, and G. Hamarneh, "Deep semantic segmentation of natural and medical images: a review," Artificial Intelligence Review, vol. 54, no. 1, pp. 137–178, 2021.
[17] J. Hu, L. Shen, and G. Sun, "Squeeze-and-excitation networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141, 2018.
[18] J. Hu, L. Shen, S. Albanie, G. Sun, and A. Vedaldi, "Gather-excite: exploiting feature context in convolutional neural networks," 2018, https://arxiv.org/abs/1810.12348.
[19] J. Park, S. Woo, J.-Y. Lee, and I. S. Kweon, "BAM: bottleneck attention module," 2018, https://arxiv.org/abs/1807.06514.
[20] S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, "CBAM: convolutional block attention module," in Proceedings of the European Conference on Computer Vision (ECCV), pp. 3–19, 2018.
[21] Y. Sun, X. Wang, and X. Tang, "Deep learning face representation by joint identification-verification," 2014, https://arxiv.org/abs/1406.4773.
[22] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf, "DeepFace: closing the gap to human-level performance in face verification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1701–1708, 2014.
[23] F. Schroff, D. Kalenichenko, and J. Philbin, "FaceNet: a unified embedding for face recognition and clustering," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 815–823, 2015.
[24] O. M. Parkhi, A. Vedaldi, and A. Zisserman, Deep Face Recognition, British Machine Vision Association, 2015.
[25] K. Zhang, Z. Zhang, Z. Li, and Y. Qiao, "Joint face detection and alignment using multitask cascaded convolutional networks," IEEE Signal Processing Letters, vol. 23, no. 10, pp. 1499–1503, 2016.
[26] X. Wang, S. Zhang, S. Wang, T. Fu, H. Shi, and T. Mei, "Mis-classified vector guided softmax loss for face recognition," Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 7, pp. 12241–12248, 2020.
[27] Y. Sun, D. Liang, X. Wang, and X. Tang, "DeepID3: face recognition with very deep neural networks," 2015, https://arxiv.org/abs/1502.00873.
[28] G. Krispel, M. Opitz, G. Waltner, H. Possegger, and H. Bischof, "FuseSeg: LiDAR point cloud segmentation fusing multi-modal data," in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 1874–1883, 2020.
[29] C. Hazirbas, L. Ma, C. Domokos, and D. Cremers, "FuseNet: incorporating depth into semantic segmentation via fusion-based CNN architecture," in Asian Conference on Computer Vision, pp. 213–228, Springer, 2016.
[30] V. John, M. Nithilan, S. Mita et al., "Sensor fusion of intensity and depth cues using the ChiNet for semantic segmentation of road scenes," in 2018 IEEE Intelligent Vehicles Symposium (IV), pp. 585–590, Changshu, China, 2018.
[31] A. F. Agarap, "Deep learning using rectified linear units (ReLU)," 2018, https://arxiv.org/abs/1803.08375.
[32] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.
[33] Z. Zhang and M. R. Sabuncu, "Generalized cross entropy loss for training deep neural networks with noisy labels," 2018, https://arxiv.org/abs/1805.07836.
[34] W. Ali, W. Tian, S. U. Din, D. Iradukunda, and A. A. Khan, "Classical and modern face recognition approaches: a complete review," Multimedia Tools and Applications, vol. 80, no. 3, pp. 4825–4880, 2021.