Research Article

SqueezeFace: Integrative Face Recognition Methods with LiDAR Sensors

Kyoungmin Ko,¹ Hyunmin Gwak,¹ Nalinh Thoummala,² Hyun Kwon,³ and SungHwan Kim¹

¹Department of Applied Statistics, Konkuk University, Seoul, Republic of Korea
²AI Analytics Team, DeepVisions, Seoul, Republic of Korea
³Department of Electrical Engineering, Korea Military Academy, Seoul, Republic of Korea

Correspondence should be addressed to Hyun Kwon (hkwon.cs@gmail.com) and SungHwan Kim (shkim1213@konkuk.ac.kr).

Journal of Sensors, Volume 2021, Article ID 4312245, 8 pages, https://doi.org/10.1155/2021/4312245
Received 20 May 2021; Revised 10 August 2021; Accepted 19 August 2021; Published 28 September 2021
Academic Editor: Haibin Lv

Copyright © 2021 Kyoungmin Ko et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
In this paper, we propose a robust and reliable face recognition model that incorporates depth information such as data from
point clouds and depth maps into RGB image data to avoid false facial verification caused by face spoofing attacks while
increasing the model’s performance. The proposed model is driven by the spatially adaptive convolution (SAC) block of SqueezeSegv3, an attention block that enables the model to weight features according to their importance at each spatial location. We also utilize large-margin loss instead of softmax loss as a supervision signal for the proposed method, to enforce high discriminatory power. In the experiment, the proposed model, which incorporates depth information, achieved 99.88% accuracy and an F1 score of 93.45%, outperforming the baseline models, which used RGB data alone.
1. Introduction
LiDAR, short for light detection and ranging, is a remote
sensing technology similar to radar. The difference is that
radar uses radio waves to detect its surroundings, whereas
LiDAR uses laser energy. When a LiDAR sensor directs a
laser beam at an object, it can calculate the distance to the
object by measuring the delay before the light is reflected
back to it, making it possible to extract depth information
for an object and display it in the form of a point cloud or
depth map. Not only can LiDAR sensors estimate an object’s range, but they can also measure its shape with high accuracy
and spatial resolution. Furthermore, LiDAR sensors are
robust under various lighting conditions (day or night, with
or without glare and shadows), thereby overcoming the dis-
advantages of other sensor types. Because of its superiority,
LiDAR has been widely used in a variety of applications,
including autonomous vehicles, river surveys, and pollution
modeling. Recently, products launched by technology com-
panies often come equipped with a LiDAR scanner, making
it more convenient to obtain depth information for objects
in the form of 3D point clouds, as shown in Figure 1.
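The ranging principle described above follows directly from the speed of light: the laser pulse travels to the object and back, so the one-way distance is half the round-trip delay times the speed of light. A minimal sketch (the function name and the round-trip convention are our illustration, not from the paper):

```python
# Time-of-flight ranging: distance = (speed of light x echo delay) / 2.
C = 299_792_458.0  # speed of light in m/s

def lidar_range(round_trip_seconds: float) -> float:
    """Distance to the object in meters from the measured echo delay."""
    return C * round_trip_seconds / 2.0

# A round-trip delay of about 6.67 ns corresponds to an object roughly 1 m away.
```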
A face recognition system is a computer-assisted applica-
tion that automatically determines or verifies an individual’s
identity using digital images. In practice, the system verifies
the person’s identity by comparing intensity images of the
face captured by a camera with prestored images. It can be
used for biometric authentication and is emerging as a crit-
ical authentication method for information and communi-
cations technology (ICT) services. Security-based
applications are spreading to various fields; they include
employee attendance checks, airport surveillance, and bank
transactions. A face recognition system can provide a
straightforward yet convenient authentication process, as it
can operate using just an RGB image captured from a per-
son’s face. However, this simplicity makes it vulnerable to
spoofing attacks [1, 2] because pictures of people’s faces can
easily be obtained on social media platforms without their
consent, and these can be used by someone with malicious
intent to steal a person’s identity. To prevent such face
spoofing attacks, we propose a robust face recognition
method that uses both RGB images and depth information
such as those extracted from point clouds and depth maps
produced by a LiDAR scanner.
Face recognition based on RGB images is already widely
acknowledged for its promising performance. However, the
determination of whether a face is real or fake, known as
liveness detection, cannot be performed simultaneously.
Distinguishing in terms of liveness between RGB images
captured directly from people’s faces using a camera and
digital images from other sources used for face spoofing
attacks remains challenging because the two images are just
one type of input used by the recognition system. A point
cloud and depth map, however, can be obtained only by
direct capture from people’s faces using sensors such as
LiDAR. In addition, depth information is three-
dimensional. In other words, spoofing attacks using 2D dig-
ital images are immediately identifiable by their lack of 3D
information.
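One way to see why depth defeats 2D replays: a printed photo or a screen is nearly planar, so its depth map shows almost no relief, whereas a real face spans several centimeters from nose tip to cheeks. A hedged illustration (the function, threshold, and sample values are ours, not the paper's):

```python
import numpy as np

def looks_planar(depth_map: np.ndarray, min_relief_m: float = 0.01) -> bool:
    """Flag a depth map as a likely 2D spoof when its depth relief
    (spread between near and far points, robust percentiles) is tiny."""
    relief = float(np.percentile(depth_map, 95) - np.percentile(depth_map, 5))
    return relief < min_relief_m

# A real face: depth values differ by a few centimeters across the face.
real_face = np.array([[0.40, 0.43], [0.41, 0.45]])  # meters
flat_photo = np.full((2, 2), 0.40)                   # a flat print
```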
The main feature of the proposed method is a face recog-
nition model that incorporates depth information into RGB
images. The method uses a device equipped with a LiDAR
sensor to collect the supplementary data. Because the
method utilizes point cloud and depth data, it solves the live-
ness detection problem of the existing 2D face recognition
method. We also hypothesize that a deep learning frame-
work using depth information can demonstrate higher per-
formance on the classification model for face recognition
systems.
According to the developers of the SqueezeSegv3 model [3], point cloud data present strong spatial priors, and their feature distributions vary according to spatial location. Thus, we built an attention-based deep convolutional model based on SqueezeSegv3, called SqueezeFace. Its architecture is
shown in Figure 2.
Based on previous studies [4–9], we additionally adopted
large-margin loss as a supervision signal that enables the
model to learn highly discriminative deep features for face
recognition by maximizing interclass variance and minimiz-
ing intraclass variance during the training phase. In the test
phase, facial embedding features are extracted using our
proposed convolution network for face verification. The
method can then verify an identity by calculating the cosine
similarity between embedding features. The proposed
method delivers performance superior to that of existing
methods that use only RGB images. The remainder of this
paper is organized as follows. In Section 2, related work is
reviewed. The structure of the proposed method is described
in detail in Section 3. The experimental results are discussed
in Section 4. Finally, we conclude the paper in Section 5.
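The verification step described above, comparing embedding features by cosine similarity, can be sketched as follows; the acceptance threshold is illustrative, not a value from the paper:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def same_identity(emb1: np.ndarray, emb2: np.ndarray, threshold: float = 0.8) -> bool:
    """Accept the pair as the same person if similarity exceeds the threshold."""
    return cosine_similarity(emb1, emb2) >= threshold
```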
2. Related Work
Convolutional neural networks (CNNs) are powerful models
that play an essential role in learning feature representations
that best describe the given domain while maintaining the
spatial information of an image. Because of their excellence
in learning important patterns, CNNs have achieved break-
throughs on a variety of computer vision tasks such as those
involved in image classification, object detection, and
semantic segmentation [10–16].
Attention-based CNNs in particular have attracted con-
siderable interest and have been extensively exploited to
improve a model’s performance on numerous computer
vision tasks by integrating attention modules with the exist-
ing CNN architecture [3, 17–20]. The attention module
allows the model to selectively emphasize important features
and discard less informative ones. Hu et al. [17] proposed
the Squeeze-and-Excitation (SE) block, which learns the
relationship between the channels of its convolutional fea-
tures and adaptively recalibrates channel weights according
to the relationship learned. Specifically, the SE block extracts
a representative scalar value for each channel using global
average pooling (GAP) and assigns a weight for each chan-
nel based on the interdependency between channels through
the excitation process. Park et al. [19] introduced the simple
yet efficient Bottleneck Attention Module (BAM), which
generates attention maps by separating the process of inferring an attention map into a channel attention module and a
spatial attention module and configures them in parallel.
Woo et al. [20] presented the lightweight Convolutional
Block Attention Module (CBAM), which sequentially
applies channel and spatial attention modules to emphasize
important elements in both the channel and spatial axes.
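The squeeze-and-excitation recalibration described above (global average pooling per channel, a small excitation network, then channel reweighting) can be sketched in NumPy; the toy dimensions and two-layer excitation with reduction ratio 2 are our simplification of the SE block:

```python
import numpy as np

def se_block(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Squeeze-and-Excitation on a (C, H, W) feature map.
    Squeeze: global average pooling -> one scalar per channel.
    Excitation: two dense layers (ReLU then sigmoid) -> per-channel weights.
    Recalibration: scale each channel by its learned weight."""
    squeezed = x.mean(axis=(1, 2))                    # (C,) channel descriptors
    hidden = np.maximum(0.0, w1 @ squeezed)           # ReLU, (C // r,)
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))    # sigmoid, (C,) in (0, 1)
    return x * weights[:, None, None]                 # reweight each channel

# Toy shapes: C = 4 channels, reduction ratio r = 2.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
w1 = rng.standard_normal((2, 4))
w2 = rng.standard_normal((4, 2))
y = se_block(x, w1, w2)
```

Because the sigmoid keeps every channel weight strictly between 0 and 1, the block can only attenuate channels, never amplify them.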
Exploiting face representation embedding features
extracted using a deep CNN is one of several methods used
in face recognition tasks [9, 21–24]. Face recognition using a
deep CNN involves two essential preprocessing steps: face
detection and face alignment. These two tasks should be per-
formed jointly because they are inherently correlated [25].
Figure 1: Capture of RGB image, depth, and point cloud data using a LiDAR scanner-equipped device.
Softmax loss [26] is commonly used as a loss function to
supervise the face recognition model and was used in Dee-
pID [21] and DeepFace [22]. However, recent studies have
indicated that softmax loss is not suitable for face recogni-
tion tasks owing to its inability to optimize the feature
embedding to enforce strong similarity within positive class
samples and diversity across negative class samples, which
can deteriorate model performance on face recognition. Sug-
gested alternatives included functions based on Euclidean
distance, such as contrastive loss, triplet loss, and center loss,
to alleviate such constraints while strengthening discrimina-
tive features.
Contrastive loss was proposed as the loss function in
DeepID2 [21] and DeepID3 [27]. Generally, this loss
requires pairs of inputs, and it will adjust the distances
between embedding features differently depending on
whether the pair belongs to the positive class (for an intra-
class pair) or the negative class (for an interclass pair). To
increase the learning efficiency of contrastive loss, triplet loss
was proposed in FaceNet [23]. Unlike contrastive loss, triplet
loss requires three inputs, two of which are in the same class
and the third belongs to a different class. This loss function
reduces the distance between the intraclass pairs and
increases the distance between the interclass pairs. Despite
being used in many metric learning methods because of
its excellent performance, triplet loss requires an expen-
sive preprocessing step in constructing input data for
the distance comparison. Thus, center loss was proposed
to learn the centroid of the features of each class and
penalize the distances between the centroids and their
corresponding class features. This loss not only handles
the complicated input data preprocessing step but also
boosts performance.
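The triplet objective described above (pull the anchor toward its positive and push it from the negative by at least a margin) can be sketched for a single triplet as follows; the margin value is illustrative:

```python
import numpy as np

def triplet_loss(anchor: np.ndarray, positive: np.ndarray,
                 negative: np.ndarray, margin: float = 0.2) -> float:
    """max(0, d(a,p)^2 - d(a,n)^2 + margin): the loss reaches zero once the
    negative is at least `margin` farther (in squared distance) than the
    positive, so only violating triplets contribute gradients."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return float(max(0.0, d_pos - d_neg + margin))
```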
In addition to the losses described above, there exists a
series of losses that incorporate a large angular margin to
strengthen discriminatory power on classification, decrease
the distance between features within the same class, and
increase the distance between features from different classes
[7–9]. We discuss these losses in detail in Section 3.
Traditional face recognition methods utilize only RGB
data as the input. Such methods perform relatively well,
but they present a disadvantage with regard to liveness in
that the model cannot distinguish whether an image has
been captured directly from a person’s face or is a digital
image obtained from other sources. This characteristic
makes such methods vulnerable to face spoofing attacks.
Recent studies have sought to mitigate this problem by add-
ing depth information in the form of point cloud and depth
data as inputs. Fuseseg [28], Fusenet [29], and Chinet [30]
have been proposed for boosting model performance by
effectively fusing such data collected from various sensors.
Each model has different methods for data fusion, and each
embedding feature created is fused at the layer level.
3. Proposed Method
In this section, we describe the proposed face recognition
method, which uses not only RGB images but also depth
and point cloud data (3D coordinates) extracted from
LiDAR sensors. We constructed the proposed model with a
data integration network that processes data serially from
different sensors. Because it is imperative to emphasize fea-
tures that will influence the model’s performance, the atten-
tion mechanism was adopted to allow the model to capture
and best exploit important features from the point cloud.
For the operational technique, we incorporated the spatially adaptive convolution (SAC) block of SqueezeSegv3 into a data integration network to process our data and extract features from them.

Figure 2: Integration of the attention block in the SqueezeFace architecture.
In addition, we replaced softmax loss with large-margin
loss for supervising the feature embedding process to
increase similarity within the same class and discrepancy
between different classes. We discuss in detail the construc-
tion of the proposed data integration network and the large-
margin loss function in Sections 3.1 and 3.2, respectively.
3.1. SqueezeSegv3. Most face recognition models are based
on deep convolutional neural networks (DCNNs) to have
discriminatory power for classification. Facial feature repre-
sentations can be extracted with standard convolution as
$$Y[m, p, q] = \sigma\!\left( \sum_{i,j,n} W[m, n, i, j] \times X[n, p + \hat{i}, q + \hat{j}] \right), \qquad (1)$$

where $Y \in \mathbb{R}^{O \times S \times S}$ and $X \in \mathbb{R}^{I \times S \times S}$ are the output and input tensors; $W \in \mathbb{R}^{O \times I \times K \times K}$ is the convolutional weight matrix, in which $K$ is the convolutional kernel size; $O$ and $I$ are the output and input channel sizes; $S$ represents the image size; and $\sigma(\cdot)$ is a nonlinear activation function such as ReLU [31]. In this method, $\hat{i}$ and $\hat{j}$ are defined as $\hat{i} = i - \lfloor K/2 \rfloor$ and $\hat{j} = j - \lfloor K/2 \rfloor$. As mentioned with regard to the SqueezeSegv3
model [3], standard convolution is based on the assumption
that the distribution of visual features is invariant to the spa-
tial location of the image. This assumption is largely true in
the case of RGB images; thus, a convolution uses the same
weight for all input locations. However, this assumption
cannot be applied to point cloud data as X-coordinate point
cloud data present very strong spatial priors, and the feature
distribution of the point cloud varies substantially at differ-
ent locations. In consideration of this fact, the SAC block,
which is designed to be spatially adaptive and content aware
using 3D coordinates of a point cloud, is proposed to apply
different weights for different image locations as follows:
$$Y[m, p, q] = \sigma\!\left( \sum_{i,j,n} W(X^0)[m, n, p, q, i, j] \times X[n, p + \hat{i}, q + \hat{j}] \right). \qquad (2)$$

In SqueezeSegv3 [3], $W(\cdot) \in \mathbb{R}^{O \times I \times S \times S \times K \times K}$ is a spatially adaptive function of the raw input $X^0$, which depends on the location $(p, q)$. In this method, $X^0$ is only the raw input point cloud. $W(\cdot)$, the spatially adaptive function of SqueezeFace, is shown in detail in the lower part of Figure 2.
To process our data, which are gathered from different
sources, an appropriate data fusion model is required. Seven
channels are constructed for the input data by stacking RGB,
depth, and point cloud data, which are collected from differ-
ent sensors and possess different characteristics. To obtain
attention map A, the point cloud data are fed into a 7×7
convolution followed by a sigmoid function. Next, this
attention map $A$ is combined with the input tensor $X$. Then, a standard convolution with weight $W$ is applied to the
adapted input. For the embedding network, we employ the
well-known ResNet34 architecture [32]. The ResNet model
reduces the image size as it passes through each layer. The
downsampling process for the point cloud has difficulty in
properly utilizing spatial coordinate information because of
the small size of our dataset. Therefore, the SAC block is
used at the initial layer, as shown in Figure 2. The network
successfully maps the face input to face representation
embedding features, combining the three types of data.
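The fusion path described above can be sketched as follows: the seven input channels are stacked, an attention map is derived from the point-cloud channels with a convolution and a sigmoid, and the stacked input is reweighted elementwise. The single-output-channel convolution and "same" zero padding are our simplifications of the SAC block, not the paper's exact layer sizes:

```python
import numpy as np

def conv2d_same(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Naive 2D convolution with zero 'same' padding.
    x: (C, H, W), kernel: (C, K, K) -> single-channel output (H, W)."""
    c, h, w = x.shape
    k = kernel.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(xp[:, i:i + k, j:j + k] * kernel)
    return out

def sac_attention(rgb, depth, xyz, kernel):
    """Toy spatially adaptive attention: sigmoid(conv(point cloud)) yields a
    per-location weight that rescales the stacked 7-channel input."""
    stacked = np.concatenate([rgb, depth, xyz], axis=0)       # (7, H, W)
    attn = 1.0 / (1.0 + np.exp(-conv2d_same(xyz, kernel)))    # (H, W) in (0, 1)
    return stacked * attn[None, :, :]                         # reweighted input

rng = np.random.default_rng(1)
rgb = rng.standard_normal((3, 6, 6))
depth = rng.standard_normal((1, 6, 6))
xyz = rng.standard_normal((3, 6, 6))        # point-cloud coordinate channels
kernel = rng.standard_normal((3, 7, 7)) * 0.1
fused = sac_attention(rgb, depth, xyz, kernel)
```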
3.2. Large-Margin Loss. The face recognition task is a multi-
class classification, defined as the problem of classifying
images into one of certain classes. The most commonly used
loss for multiclass classification is softmax loss, which is a
softmax activation function followed by cross-entropy loss
[33]. The softmax activation function outputs the probability
for each class, whose sum is one, and the cross-entropy loss
is the sum of the negative logarithms of these probabilities,
defined as
$$L_1 = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{e^{W_{y_i}^T x_i + b_{y_i}}}{\sum_{j=1}^{n} e^{W_j^T x_i + b_j}}, \qquad (3)$$

where $x_i$ is the feature vector of sample data, $y_i$ represents the truth class corresponding to $x_i$, and $W$ and $b$ are weight and bias terms, respectively. Despite being widely used, softmax loss has some limitations, as it does not strictly enforce
higher similarity within the same class and discrepancy
between different classes. Thus, traditional softmax loss
may create a performance gap for face recognition when
intraclass variation is high because of factors such as age
gaps, differences in facial expression, and variations in pose
(left, right, or frontal). To enable the model to circumvent
this problem, A-softmax loss was proposed as a reformula-
tion of the traditional softmax loss in SphereFace [5] as fol-
lows:
$$L_2 = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{e^{\|x_i\| \cos(m \theta_{y_i, i})}}{e^{\|x_i\| \cos(m \theta_{y_i, i})} + \sum_{j \neq y_i} e^{\|x_i\| \cos(\theta_{j, i})}}, \qquad (4)$$

where $m$ is the angular margin and $\theta_{y_i, i}$ is the angle between the vectors $W_{y_i}$ and $x_i$. A-softmax loss adopts $W_{y_i}^T x_i$ in the linear form, which is expressed as $\|W_{y_i}\| \|x_i\| \cos(\theta_{y_i, i})$. This loss enables metric learning by constraining the classification weight's norm to 1 through normalization, setting the bias to 0, and incorporating the angular margin adjusted via parameter $m$ to capture discriminative features with a clear geometric interpretation.
Then, the CosFace model [8] was proposed, which includes a large-margin cosine loss function that normalizes both weights and features by L2 normalization to eliminate radial variations and adds a quantitative $m$ value, a fixed parameter used to control the magnitude of the cosine margin. The overall loss function can be expressed as

$$L_3 = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{e^{s(\cos(\theta_{y_i, i}) - m)}}{e^{s(\cos(\theta_{y_i, i}) - m)} + \sum_{j \neq y_i} e^{s \cos(\theta_{j, i})}}, \qquad (5)$$

where $s$ is a rescale parameter, used by the loss function to rescale the weights and features after normalizing them.
ArcFace [9] adds an additive angular margin penalty $m$ between weights and features. This penalty is equal to the geodesic distance margin penalty in the normalized hypersphere and thus is named ArcFace. The loss function is formulated as follows:

$$L_4 = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{e^{s \cos(\theta_{y_i, i} + m)}}{e^{s \cos(\theta_{y_i, i} + m)} + \sum_{j \neq y_i} e^{s \cos(\theta_{j, i})}}. \qquad (6)$$
Thus, we can supervise our model using an additive angular margin loss that combines the margin penalties of SphereFace [5], CosFace [8], and ArcFace [9], which demonstrates the best performance, as follows:

$$L_5 = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{e^{s(\cos(m_1 \theta_{y_i, i} + m_2) - m_3)}}{e^{s(\cos(m_1 \theta_{y_i, i} + m_2) - m_3)} + \sum_{j \neq y_i} e^{s \cos(\theta_{j, i})}}, \qquad (7)$$

where $m_1$, $m_2$, and $m_3$ are the angular margin parameters, each represented as $m$ in the loss functions described above.
Our main task is to identify a class for each input identity.
By adopting the proposed additive angular margin loss, the
proposed model can increase the similarity of positive clas-
ses and enforce a wide diversity of negative classes in metric
learning. The proposed large-margin loss can generate high-
quality embedding features from our data, enabling high-
accuracy classification with both the training dataset and
the unseen test dataset.
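Under the definitions above, the combined-margin loss can be sketched for a single sample as follows; the NumPy implementation and the scale and margin values are our illustration of equation (7), not the paper's training code:

```python
import numpy as np

def combined_margin_loss(embedding, weights, label,
                         s=64.0, m1=1.0, m2=0.5, m3=0.0):
    """Large-margin softmax loss for one sample, combining the SphereFace (m1),
    ArcFace (m2), and CosFace (m3) margin penalties on the target angle.
    embedding: (d,) feature; weights: (n_classes, d) class weight vectors."""
    x = embedding / np.linalg.norm(embedding)                       # normalize feature
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)    # normalize weights
    cos = w @ x                                        # cosine to every class
    theta = np.arccos(np.clip(cos[label], -1.0, 1.0))  # angle to the true class
    logits = s * cos
    logits[label] = s * (np.cos(m1 * theta + m2) - m3) # penalized target logit
    logits -= logits.max()                             # numerically stable softmax
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(-np.log(probs[label]))
```

With `m1 = 1`, `m2 = 0`, and `m3 = 0`, this reduces to a plain normalized softmax loss; a positive margin makes the target logit harder to satisfy, which is what forces tighter intraclass clusters during training.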
4. Numerical Experiments
4.1. Datasets. The face dataset consisted of 784 face scans
from 83 Korean individuals. The face data were captured
using Apple’s latest device equipped with a LiDAR scanner.
Specifically, the device was equipped with three cameras
(main, wide, and telephoto) and a LiDAR scanner for cap-
turing both RGB image and depth information. ARKit can
be used to connect with the scanner on the Apple device
and process the depth and point cloud (3D coordinate) data.
ARKit recently introduced a new depth API available only
for devices equipped with a LiDAR scanner and provides
several methods to access depth information collected from
LiDAR scanners. The LiDAR scanner allows this API to
obtain per-pixel depth information of a person’s face and
generate 3D coordinates of the point cloud by setting the
parameters for the device. We modified ARKit’s sample code
and set up the application to simultaneously store RGB and
point cloud data within one scene. We installed this modi-
fied app on the device and collected data through the app.
4.2. Experiment Setup. We trained three different models to
compare their performance. The first model used only
RGB data. The second model used three types of sensor data
(RGB, depth, and point cloud) with three different charac-
teristics, and the third model was the SqueezeFace model
that uses the SAC block on the three types of sensor data.
All three models used the ResNet34 architecture [32] and
large-margin loss [6]. The ResNet34 model is pretrained
using a facial image dataset of 400 Korean individuals, pro-
vided by AI Hub (https://aihub.or.kr/). For the three sensor
data models, pretrained weights from ResNet34 were used
as the weights of the RGB data, and the weights for the point
cloud and depth data were initialized using the Xavier
initializer.
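The initialization described above (reusing pretrained RGB filters and Xavier-initializing the new depth and point-cloud channels) can be sketched as follows. The shapes and the Xavier-uniform bound are standard, but the helper and its exact fan computation are our assumptions, not the paper's code:

```python
import numpy as np

def inflate_first_conv(pretrained_w: np.ndarray, new_in_channels: int = 7,
                       seed: int = 0) -> np.ndarray:
    """Expand a first-layer conv weight from 3 RGB input channels to
    `new_in_channels` by keeping the pretrained RGB slices and drawing the
    extra (depth + point cloud) slices from a Xavier-uniform distribution."""
    out_c, in_c, k, _ = pretrained_w.shape            # e.g. (64, 3, 7, 7)
    extra = new_in_channels - in_c
    fan_in = new_in_channels * k * k
    fan_out = out_c * k * k
    limit = np.sqrt(6.0 / (fan_in + fan_out))         # Xavier-uniform bound
    rng = np.random.default_rng(seed)
    new_slices = rng.uniform(-limit, limit, size=(out_c, extra, k, k))
    return np.concatenate([pretrained_w, new_slices], axis=1)

w_rgb = np.zeros((64, 3, 7, 7))     # stand-in for pretrained ResNet34 conv1
w_full = inflate_first_conv(w_rgb)  # (64, 7, 7, 7)
```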
4.3. Experiment Results. We split our face dataset into a
training set and a test set, and the sensor data were config-
ured as three types (RGB, depth, and point cloud). In addi-
tion, to evaluate the face verification performance, we
constructed a face verification dataset with pairs of face
images from the test set. Accuracy, precision, and recall were
used as metrics to measure the model’s performance for face
verification. Accuracy is the ratio of the number of correct
predictions to the total number of inputs. Precision is the
ratio of the number of true positive predictions to the total
number of the model’s predicted positive values, and recall
is the ratio of the number of true positive predictions to
the number of all positive samples. These three definitions
are represented as
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \quad \text{Precision} = \frac{TP}{TP + FP}, \quad \text{Recall} = \frac{TP}{TP + FN}, \qquad (8)$$

where TP, TN, FP, and FN denote true positive, true negative, false positive, and false negative, respectively.

Table 1: Comparison of models' performance on the test set.

Model                    Accuracy   F1 score
RGB-only, ResNet34       0.9979     0.8995
Our data, ResNet34       0.9980     0.9056
Our data + SqueezeFace   0.9988     0.9345

Table 2: Face verification performance comparison on three-shot learning between the RGB-only model and the proposed three-sensor-data-type model.

Statistic                   RGB      RGB + depth + point cloud
Number of output classes    83       83
Number of training images   248      248
Number of testing images    536      536
Testing accuracy            0.9973   0.9977
Testing F1 score            0.8884   0.9036
Best threshold              0.7255   0.8026

For the face verification dataset, the number of interclass combinations was much greater than the number of intraclass
combinations. Because the intraclass and interclass counts were considerably imbalanced, the F1 score, the harmonic mean of precision and recall, was used as the evaluation metric for face verification:
$$F_1\ \text{score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}. \qquad (9)$$
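The metrics above can be computed directly from the confusion counts; the sample counts below are illustrative, not the paper's results:

```python
def verification_metrics(tp: int, tn: int, fp: int, fn: int):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# With heavy class imbalance (many more interclass than intraclass pairs),
# accuracy can stay high while F1 exposes weaker positive-pair performance.
acc, prec, rec, f1 = verification_metrics(tp=80, tn=900, fp=10, fn=10)
```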
4.3.1. Analysis of Face Verification Results of the Proposed
Method. According to the experimental results, shown in
Table 1, the model using the three types of sensor data out-
performed the model using only RGB data, demonstrating
that employing depth information can enhance rich facial
representation. More importantly, the proposed SqueezeFace model, with the added SAC attention block, achieved the best accuracy and F1 score. This result shows that the proposed model learned the facial points of high importance well by actively utilizing the point cloud data, whose distributions differ according to spatial location. The
intraclass variance due to pose variations and age gaps sig-
nificantly increases the angle between positive pairs and
therefore can increase the best threshold for face verification
on test data. However, if the training data for each identity are limited, making the intraclass variance small, it is difficult to increase the best threshold for face verification on test data.
A low threshold used in the evaluation of face verification
indicates a low reliability of the model. The proposed model
addresses this problem by adding point cloud and depth
data to the RGB data.
The results for face verification performance on three-
shot learning are compared in Table 2. Three-shot learning
is learning that takes place using only three training samples.
The best threshold is the threshold with the maximum F1
score. The model using the three types of sensor data shows higher accuracy, a higher F1 score, and a higher threshold than the RGB-images-only model. This demonstrates that by making use of supplementary information
such as point cloud and depth data, the proposed model
can increase intraclass variance and, as a result, increase
the best threshold for face verification.
4.3.2. Analysis of Cosine Similarity on Three-Shot Learning of
the Proposed Method. We examined the cosine similarity for
various facial expressions on three-shot learning, with
results as shown in Table 3. The proposed model produced
better similarity values between positive pairs than the RGB-
images-only model, even with a variety of facial expressions.
Because the proposed method uses more information about the face by adding depth and point cloud data, the intraclass variance of the model can increase the angle between positive pairs.
Therefore, the model can increase the cosine similarity, and
the higher cosine similarity can increase the best threshold
on face verification. This result demonstrates that adding
depth and point cloud data enables the model to learn impor-
tant facial features for face verification more effectively than
the model with only RGB data. In addition, despite the differ-
ence between the same identities according to pose variations,
the proposed method can distinguish the identity well in the
test data by adding depth and point cloud data.
Table 3: Cosine similarity for various facial expressions.

Description                 Pair 1   Pair 2   Pair 3   Pair 4   Pair 5
RGB                         0.9118   0.7450   0.7734   0.5164   0.5558
RGB + depth + point cloud   0.9664   0.9050   0.8781   0.8258   0.8484

5. Conclusion

This paper has proposed a face recognition approach that considers depth information using point cloud data. By using depth information, false facial verification using a face photo or video of an authorized person can be avoided, thereby increasing the reliability of the face recognition system. The method incorporates the SAC block based on the
attention mechanism to capture important features and
weight them to enhance model performance. In addition,
we used a modified loss function constructed by adding a
large margin to reinforce high discriminatory power for face
recognition applications [34]. The proposed method delivers
a considerable performance improvement over the baseline
models and uses a higher threshold for face verification
when subjected to an increase in intraclass variance.
Data Availability
All source codes are available online at https://github.com/kyoungmingo/Fusion_face (author’s webpage).
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This research was supported by the National Research Foun-
dation of Korea (NRF) funded by the Ministry of Education,
Science and Technology (NRF-2020R1C1C1A01005229 and
NRF-2021R1A4A5032622).
References
[1] S. Kumar, S. Singh, and J. Kumar, “A comparative study on
face spoofing attacks,”in 2017 International Conference on
Computing, Communication and Automation (ICCCA),
pp. 1104–1108, Greater Noida, India, 2017.
[2] T. Girdler and V. G. Vassilakis, “Implementing an intrusion
detection and prevention system using software-defined net-
working: defending against ARP spoofing attacks and black-
listed MAC addresses,”Computers & Electrical Engineering,
vol. 90, p. 106990, 2021.
[3] C. Xu, B. Wu, Z. Wang et al., “Squeezesegv3: spatially adaptive
convolution for efficient point-cloud segmentation,”in Euro-
pean Conference on Computer Vision, pp. 1–19, Springer,
2020.
[4] J. Deng, Y. Zhou, and S. Zafeiriou, “Marginal loss for deep face
recognition,”in Proceedings of the IEEE Conference on Com-
puter Vision and Pattern Recognition Workshops, pp. 60–68,
Honolulu, HI, USA, 2017.
[5] W. Liu, Y. Wen, Z. Yu, M. Li, B. Raj, and L. Song, “Sphereface:
deep hypersphere embedding for face recognition,”in Proceed-
ings of the IEEE conference on computer vision and pattern rec-
ognition, pp. 212–220, Honolulu, HI, USA, 2017.
[6] W. Liu, Y. Wen, Z. Yu, and M. Yang, “Large-margin softmax
loss for convolutional neural networks,”ICML, vol. 2, p. 7,
2016.
[7] F. Wang, J. Cheng, W. Liu, and H. Liu, “Additive margin soft-
max for face verification,”IEEE Signal Processing Letters,
vol. 25, no. 7, pp. 926–930, 2018.
[8] H. Wang, Y. Wang, Z. Zhou et al., “CosFace: large margin
cosine loss for deep face recognition,”in Proceedings of the
IEEE conference on computer vision and pattern recognition,
pp. 5265–5274, Salt Lake City, UT, USA, 2018.
[9] J. Deng, J. Guo, N. Xue, and S. Zafeiriou, “ArcFace: additive
angular margin loss for deep face recognition,”in Proceedings
of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition, pp. 4690–4699, Long Beach, CA, USA, 2019.
[10] S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: towards
real-time object detection with region proposal networks,”
2015, https://arxiv.org/abs/1506.01497.
[11] G. Algan and I. Ulusoy, “Image classification with deep learn-
ing in the presence of noisy labels: a survey,”Knowledge-Based
Systems, vol. 215, article 106771, 2021.
[12] K. He, G. Gkioxari, P. Dollar, and R. Girshick, “Mask r-cnn,”
in Proceedings of the IEEE international conference on com-
puter vision, pp. 2961–2969, 2017.
[13] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam,
“Encoder-decoder with atrous separable convolution for
semantic image segmentation,”in Proceedings of the European
conference on computer vision (ECCV), pp. 801–818, 2018.
[14] W. Wang, Q. Lai, H. Fu, J. Shen, H. Ling, and R. Yang, “Salient
object detection in the deep learning era: an in-depth survey,”
in IEEE Transactions on Pattern Analysis and Machine Intelli-
gence, 2021.
[15] O. Ronneberger, P. Fischer, and T. Brox, “U-net: convolutional
networks for biomedical image segmentation,”in Interna-
tional Conference on Medical image computing and
computer-assisted intervention, pp. 234–241, Cham, 2015.
[16] S. Asgari Taghanaki, K. Abhishek, J. P. Cohen, J. Cohen-Adad,
and G. Hamarneh, “Deep semantic segmentation of natural
and medical images: a review,”Artificial Intelligence Review,
vol. 54, no. 1, pp. 137–178, 2021.
[17] J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,”
in Proceedings of the IEEE conference on computer vision and
pattern recognition, pp. 7132–7141, 2018.
[18] J. Hu, L. Shen, S. Albanie, G. Sun, and A. Vedaldi, “Gather-
excite: exploiting feature context in convolutional neural net-
works,”2018, https://arxiv.org/abs/1810.12348.
[19] J. Park, S. Woo, J.-Y. Lee, and I. S. Kweon, “BAM: bottleneck
attention module,”2018, https://arxiv.org/abs/1807.06514.
[20] S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, “CBAM: convolu-
tional block attention module,”in Proceedings of the European
conference on computer vision (ECCV), pp. 3–19, 2018.
[21] Y. Sun, X. Wang, and X. Tang, “Deep learning face representa-
tion by joint identification-verification,”2014, https://arxiv
.org/abs/1406.4773.
[22] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf, “Deepface:
closing the gap to human-level performance in face verifica-
tion,” in Proceedings of the IEEE conference on computer vision
and pattern recognition, pp. 1701–1708, 2014.
[23] F. Schroff, D. Kalenichenko, and J. Philbin, “Facenet: a unified
embedding for face recognition and clustering,”in Proceedings
of the IEEE conference on computer vision and pattern recogni-
tion, pp. 815–823, 2015.
[24] O. M. Parkhi, A. Vedaldi, and A. Zisserman, Deep Face Recog-
nition, British Machine Vision Association, 2015.
[25] K. Zhang, Z. Zhang, Z. Li, and Y. Qiao, “Joint face detection
and alignment using multitask cascaded convolutional net-
works,”IEEE Signal Processing Letters, vol. 23, no. 10,
pp. 1499–1503, 2016.
[26] X. Wang, S. Zhang, S. Wang, T. Fu, H. Shi, and T. Mei, “Mis-
classified vector guided softmax loss for face recognition,”Pro-
ceedings of the AAAI Conference on Artificial Intelligence,
vol. 34, no. 7, pp. 12241–12248, 2020.
[27] Y. Sun, D. Liang, X. Wang, and X. Tang, “Deepid3: face recog-
nition with very deep neural networks,”2015, https://arxiv
.org/abs/1502.00873.
[28] G. Krispel, M. Opitz, G. Waltner, H. Possegger, and H. Bischof,
“Fuseseg: Lidar point cloud segmentation fusing multi-modal
data,”in Proceedings of the IEEE/CVF Winter Conference on
Applications of Computer Vision, pp. 1874–1883, 2020.
[29] C. Hazirbas, L. Ma, C. Domokos, and D. Cremers, “Fusenet:
incorporating depth into semantic segmentation via fusion-
based CNN architecture,”in Asian Conference on Computer
Vision, pp. 213–228, Springer, 2016.
[30] V. John, M. Nithilan, S. Mita et al., “Sensor fusion of intensity
and depth cues using the chinet for semantic segmentation of
road scenes,”in 2018 IEEE Intelligent Vehicles Symposium
(IV), pp. 585–590, Changshu, China, 2018.
[31] A. F. Agarap, “Deep learning using rectified linear units
(ReLU),”2018, https://arxiv.org/abs/1803.08375.
[32] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for
image recognition,” in Proceedings of the IEEE conference on
computer vision and pattern recognition, pp. 770–778, 2016.
[33] Z. Zhang and M. R. Sabuncu, “Generalized cross entropy loss
for training deep neural networks with noisy labels,”2018,
https://arxiv.org/abs/1805.07836.
[34] W. Ali, W. Tian, S. U. Din, D. Iradukunda, and A. A. Khan,
“Classical and modern face recognition approaches: a com-
plete review,”Multimedia Tools and Applications, vol. 80,
no. 3, pp. 4825–4880, 2021.