Depth images captured by different devices or under different conditions exhibit different quality levels. From left to right: facial depth images captured by (a) the SCU 3D scanner [17] (in our collected database), (b) the Konica Minolta Vivid 910 [18] and (c) 3dMD [19] in the lab, (d) Kinect II in the lab [20], and (e) RealSense in the lab (in our collected database) and (f) in the wild [14].

Source publication
Article
Face recognition using depth data has attracted increasing attention from both academia and industry in the past five years. Previous works show a huge performance gap between high-quality and low-quality depth data. Due to the lack of databases and reasonable evaluations on data quality, very few researchers have focused on boosting depth-based fa...

Contexts in source publication

Context 1
... the accuracy achieved by using depth images captured by low-cost RGB-D sensors [11][12][13][15] is still much lower than that achieved by using 3D faces captured by 3D scanners [1,3]. This is attributable to the generally poor quality of the depth images captured by low-cost RGB-D sensors; we call such data low-quality depth data (see Figure 1). In contrast to the aforementioned high-quality depth-based FR, we categorize these methods as low-quality depth-based FR. ...
Context 2
... contrast to the aforementioned high-quality depth-based FR, we categorize these methods as low-quality depth-based FR. Figure 1 shows depth images captured by different sensors, which naturally differ in resolution and precision. Here, resolution (also known as density) refers to the density of the 3D face point clouds, defined by the number of points used to represent the 3D faces. ...
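To make that definition concrete, here is a minimal sketch of how resolution could be measured on a facial point cloud. The cropping of the face to a sphere around the nose tip, and the 90 mm radius, are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def face_resolution(points: np.ndarray, nose_tip: np.ndarray, radius: float = 90.0) -> int:
    """Resolution (density) as defined above: the number of points
    representing the 3D face. Points are cropped to a sphere around
    the nose tip (the radius in mm is an assumed convention)."""
    dist = np.linalg.norm(points - nose_tip, axis=1)
    return int(np.count_nonzero(dist <= radius))

# Example: a scanner-grade cloud vs. a consumer-sensor cloud.
rng = np.random.default_rng(0)
hq = rng.normal(scale=40.0, size=(100_000, 3))  # stand-in for a high-quality scan
lq = rng.normal(scale=40.0, size=(5_000, 3))    # stand-in for a low-cost RGB-D capture
nose = np.zeros(3)
print(face_resolution(hq, nose), face_resolution(lq, nose))
```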

Citations

... This method eliminates data preprocessing and allows the data to be scaled up, but its face recognition accuracy is not yet satisfactory, leaving much room for improvement. Hu et al. [27] collected a high-quality database, Extended-Multi-Dim, which includes color images, depth images, and 3D point clouds for each subject. Through a standard protocol for fully utilizing features and image information, they demonstrated that enhancing the quality of depth information improves the accuracy of depth FR, i.e., that high-quality data can boost the performance of models based on low-quality data. ...
Article
Face recognition, as a convenient, natural, and widely applied emerging technology, has achieved many significant research results in recent years. 2D face recognition has been studied extensively, but it is too sensitive to variations in features such as facial expressions. To overcome this shortcoming, more attention has been paid to the optimization of algorithms, stronger computational capabilities, and fusion strategies, which have contributed greatly to the accuracy of face recognition. Compared with existing methods, RGB-D images tend to be more robust and reliable. Based on different ways of processing RGB-D 3D face data, researchers have proposed numerous 3D face recognition methods, such as 3D reconstruction from monocular RGB-D images, methods based on point cloud data, and methods based on depth map data. This paper focuses mainly on the depth map method, analyzing its rich development history and its unique advantages and disadvantages in RGB-D 3D face recognition. Additionally, we introduce some common RGB-D face datasets and analyze their data collection methods.
... Hu et al. [12] also developed a 3D face recognition system using a multimodal database of 902 subjects. However, the system did not run completely in real time. ...
... Hu et al. [12] used a low-cost depth sensor to create a live face recognition system based on deep learning. With ResNet-18, the maximum recognition rate obtained was 96.9%. ...
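As an illustration of the kind of model referenced here, below is a hypothetical ResNet-18 setup for depth-based identification. The single-channel input adaptation and the 902-class head (matching the subject count cited above) are assumptions of this sketch, not the authors' exact configuration:

```python
import torch
import torch.nn as nn
from torchvision import models

def depth_resnet18(num_identities: int) -> nn.Module:
    """Hypothetical ResNet-18 for depth-based face recognition:
    single-channel depth input, identity-classification head."""
    net = models.resnet18(weights=None)
    # Depth maps have one channel, not three.
    net.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
    net.fc = nn.Linear(net.fc.in_features, num_identities)
    return net

model = depth_resnet18(num_identities=902)   # 902 subjects, as in [12]
logits = model(torch.randn(4, 1, 224, 224))  # a batch of 4 depth crops
print(logits.shape)                          # torch.Size([4, 902])
```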
Article
Capturing and developing a real-time 3D face recognition system is a major challenge and has attracted great interest. In most security systems employing face recognition, it is crucial to capture live images and recognize them, which saves both time and cost and is acceptable across all areas of security concern. This paper presents an entirely new 3D face recognition system, developed at Jadavpur University, with six subjects captured fully in real time. The main problem addressed is how issues such as camera calibration, camera alignment, and the distance of the camera from the subject affect the facial recognition rate. We also analyze how facial registration helps recover the recognition rate degraded by these factors. The problem is relevant because capturing subjects and predicting correctly in a real-time system is always a challenge, and the issues involved in setting up such a system merit investigation by researchers in the face recognition domain. Our proposed method captures subjects in real time while taking all calibration issues into consideration, including camera alignment and camera-to-subject distance, and we discuss how these factors affect the recognition rate. Once the subjects are captured, we examine how their recognition rates vary with changes in calibration, alignment, and other conditions. To improve performance, we propose a new 3D face registration algorithm, termed FaRegAvFM8, which was tested on subjects from our database acquired in real time as well as on the Frav3D and GavabDB databases. Our system attained a recognition rate of 95.83% after registration on frontal subjects using Haar wavelets as the feature extraction method, which demonstrates its robustness. Moreover, we improved the recognition rate to 96% using a Deep Convolutional Neural Network (DCNN).
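As a rough illustration of Haar wavelet feature extraction of the kind mentioned in this abstract, here is a sketch using the PyWavelets library. The decomposition level and the use of the coarsest approximation band as the feature vector are assumptions of this sketch, not the paper's exact procedure:

```python
import numpy as np
import pywt  # PyWavelets

def haar_features(face: np.ndarray, levels: int = 2) -> np.ndarray:
    """Multi-level Haar wavelet decomposition of a registered face image;
    the approximation coefficients at the coarsest level serve as a
    compact feature vector (a common choice, assumed here)."""
    coeffs = pywt.wavedec2(face, wavelet='haar', level=levels)
    approx = coeffs[0]  # coarsest low-frequency band
    return approx.ravel()

face = np.random.rand(128, 128)  # stand-in for a registered range image
feat = haar_features(face)
print(feat.shape)                # (1024,) after two Haar levels on 128x128
```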
... A second CNN performs the feature extraction and recognition tasks. Another DL approach is presented by Hu et al. [13], who boost recognition from low-quality data by employing high-quality samples. This method is restricted by the scarcity of datasets that include both low- and high-resolution images. ...
Chapter
Image-based assistive solutions raise concerns about the privacy of the individuals being monitored, particularly when such technology is used in medical institutions to protect patients' health and support the personnel. These devices are installed in facilities and process images containing personal and behavioral data throughout the day. To maintain privacy in this type of application, image types other than RGB, such as depth images, are used; in the majority of publications, depth cameras are considered privacy-protective. This paper discusses the issue of privacy in vision-based applications using the depth modality and presents the factors affecting privacy in depth images. The main property that makes an image non-private is that the subject's face allows identification. This paper compares the Face Recognition (FR) technique between RGB and depth images. In the experimental part, a state-of-the-art model for FR in depth images is developed and used to establish the boundary conditions under which a person can be recognized. The FR performance of the two modalities is compared on two existing datasets containing images in both versions, including the training process. The study aims to determine under which conditions depth cameras preserve privacy and how much privacy they reveal.
... The authors of [12] boost recognition from low-quality data using high-quality samples, proposing three different techniques in which the model is enhanced during training with higher-resolution images. ...
... In [25], several pre-processing steps are applied to depth images, including hole filling (to reduce areas with invalid depth values), depth-range normalization (based on nose-tip detection), and outlier removal. Hu et al. [26] present a method for boosting depth-based face recognition through the combined use of high-quality depth data acquired by a 3D scanner and depth images. In [27], a Siamese network that processes pairs of facial depth images is proposed, without exploiting any specific image pre-processing algorithms. ...
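A minimal sketch of the three pre-processing steps attributed to [25], under simplifying assumptions: nose-tip detection is reduced to taking the closest valid pixel, hole filling uses nearest-neighbor propagation, and the 100 mm working range is illustrative rather than taken from the paper:

```python
import numpy as np
from scipy import ndimage

def preprocess_depth(depth: np.ndarray, face_span_mm: float = 100.0) -> np.ndarray:
    """Sketch of the pre-processing described in [25]: hole filling,
    nose-tip-based depth normalization, and outlier removal."""
    d = depth.astype(np.float32)
    valid = d > 0  # zero depth = invalid sensor reading
    # Hole filling: copy each invalid pixel from its nearest valid neighbor.
    idx = ndimage.distance_transform_edt(~valid, return_distances=False,
                                         return_indices=True)
    d = d[tuple(idx)]
    # Nose tip assumed to be the closest point of the face to the camera.
    nose_depth = d.min()
    # Outlier removal: clamp points far behind the face (background spikes).
    d = np.minimum(d, nose_depth + face_span_mm)
    # Depth-range normalization relative to the nose tip, into [0, 1].
    return (d - nose_depth) / face_span_mm

depth = np.random.uniform(600, 700, size=(240, 320)).astype(np.float32)
depth[50:60, 50:60] = 0  # simulate sensor holes
norm = preprocess_depth(depth)
print(norm.min(), norm.max())
```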
Article
Nowadays, we are witnessing the wide diffusion of active depth sensors. However, the generalization capabilities and performance of the deep face recognition approaches that are based on depth data are hindered by the different sensor technologies and the currently available depth-based datasets, which are limited in size and acquired through the same device. In this paper, we present an analysis on the use of depth maps, as obtained by active depth sensors and deep neural architectures for the face recognition task. We compare different depth data representations (depth and normal images, voxels, point clouds), deep models (two-dimensional and three-dimensional Convolutional Neural Networks, PointNet-based networks), and pre-processing and normalization techniques in order to determine the configuration that maximizes the recognition accuracy and is capable of generalizing better on unseen data and novel acquisition settings. Extensive intra- and cross-dataset experiments, which were performed on four public databases, suggest that representations and methods that are based on normal images and point clouds perform and generalize better than other 2D and 3D alternatives. Moreover, we propose a novel challenging dataset, namely MultiSFace, in order to specifically analyze the influence of the depth map quality and the acquisition distance on the face recognition accuracy.
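One of the representations compared in this work, the normal image, can be derived from a depth map by finite differences. The sketch below is a generic conversion, not the paper's exact pipeline; metric scaling of the gradients is omitted for brevity:

```python
import numpy as np

def depth_to_normals(depth: np.ndarray) -> np.ndarray:
    """Convert a depth map to a normal image via finite differences."""
    dz_dy, dz_dx = np.gradient(depth.astype(np.float32))
    # The surface normal of z = f(x, y) is proportional to (-df/dx, -df/dy, 1).
    n = np.dstack((-dz_dx, -dz_dy, np.ones_like(depth, dtype=np.float32)))
    n /= np.linalg.norm(n, axis=2, keepdims=True)
    # Map unit normals from [-1, 1] to [0, 1] for storage as an image.
    return 0.5 * (n + 1.0)

# A synthetic tilted plane as a stand-in for a face depth map.
depth = np.fromfunction(lambda y, x: 600 + 0.1 * x + 0.05 * y, (240, 320))
normals = depth_to_normals(depth)
print(normals.shape)  # (240, 320, 3)
```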
... Multiple modalities of face data can be used in face recognition, such as near-infrared images, depth images, and Red Green Blue (RGB) images. Compared with near-infrared and depth images [4], RGB images carry more information and suit broader application scenarios. Over the past decades, many RGB-based face recognition methods have been proposed and great progress has been made, especially with the development of deep learning [5][6][7][8]. ...
Article
Face recognition using a single sample per person is a challenging problem in computer vision. In this scenario, due to the lack of training samples, it is difficult to distinguish between inter-class variations caused by identity and intra-class variations caused by external factors such as illumination, pose, etc. To address this problem, we propose a scheme to improve the recognition rate by both generating additional samples to enrich the intra-variation and eliminating external factors to extract invariant features. Firstly, a 3D face modeling module is proposed to recover the intrinsic properties of the input image, i.e., 3D face shape and albedo. To obtain the complete albedo, we come up with an end-to-end network to estimate the full albedo UV map from incomplete textures. The obtained albedo UV map not only eliminates the influence of the illumination, pose, and expression, but also retains the identity information. With the help of the recovered intrinsic properties, we then generate images under various illuminations, expressions, and poses. Finally, the albedo and the generated images are used to assist single sample per person face recognition. The experimental results on Face Recognition Technology (FERET), Labeled Faces in the Wild (LFW), Celebrities in Frontal-Profile (CFP) and other face databases demonstrate the effectiveness of the proposed method.
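The relighting step described here can be illustrated with a simple Lambertian model, I = albedo * max(0, n·l). The paper's actual renderer is not specified in this abstract, so the sketch below is only an assumed approximation of how images under various illuminations might be generated from the recovered intrinsics:

```python
import numpy as np

def relight(albedo: np.ndarray, normals: np.ndarray, light: np.ndarray) -> np.ndarray:
    """Render a face under a new illumination from recovered intrinsics,
    assuming a Lambertian model: I = albedo * max(0, n . l)."""
    l = light / np.linalg.norm(light)
    shading = np.clip(normals @ l, 0.0, None)  # per-pixel n . l
    return albedo * shading[..., None]

albedo = np.random.rand(128, 128, 3)  # stand-in for a recovered albedo map
normals = np.dstack((np.zeros((128, 128, 2)), np.ones((128, 128, 1))))
for direction in ([0, 0, 1], [0.5, 0, 1], [-0.5, 0.3, 1]):  # illumination sweep
    sample = relight(albedo, normals, np.asarray(direction, dtype=float))
```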
Chapter
Research in service robotics strives to have a positive impact on people's quality of life through the introduction of robotic helpers for everyday activities. From this ambition arises the need to enable natural communication between robots and ordinary people. For this reason, Human-Robot Interaction (HRI) is an extensively investigated topic, going beyond language-based exchange of information to include all the relevant facets of communication. Each aspect of communication (e.g. hearing, sight, touch) comes with its own peculiar strengths and limits, so they are often combined to improve robustness and naturalness. In this contribution, an HRI framework is presented, based on pointing gestures as the preferred interaction strategy. Pointing gestures are selected because they are an innate behavior for directing another's attention, and thus represent a natural way to request a service from a robot. To complement the visual information, the user can be prompted to give voice commands to resolve ambiguities and prevent the execution of unintended actions. The two-layer (perceptive and semantic) architecture of the proposed HRI system is described. The perceptive layer is responsible for object mapping, action detection, and assessment of the indicated direction; it also listens for users' voice commands. To avoid privacy issues and not burden the computational resources of the robot, the interaction is triggered by a wake-word detection system. The semantic layer receives the information processed by the perceptive layer and determines which actions are available for the selected object; the decision is based on the object's characteristics and contextual information, and the user's vocal feedback is exploited to resolve ambiguities. A pilot implementation of the semantic layer is detailed, and qualitative results are shown. The preliminary findings on the validity of the proposed system, as well as on the limitations of a purely vision-based approach, are discussed.
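To illustrate the perceptive layer's assessment of the indicated direction, here is a minimal sketch under a common convention: the pointing ray runs from elbow to wrist, and the mapped object with the smallest angular offset from the ray is taken as the intended one. The keypoint choice and selection rule are assumptions, not the chapter's method:

```python
import numpy as np

def pointed_object(elbow, wrist, objects):
    """Select the mapped object closest (in angle) to the pointing ray."""
    elbow, wrist = np.asarray(elbow, float), np.asarray(wrist, float)
    ray = (wrist - elbow) / np.linalg.norm(wrist - elbow)
    best, best_angle = None, np.inf
    for name, center in objects.items():
        to_obj = np.asarray(center, float) - wrist
        cos = np.dot(to_obj, ray) / np.linalg.norm(to_obj)
        angle = np.arccos(np.clip(cos, -1.0, 1.0))
        if angle < best_angle:
            best, best_angle = name, angle
    return best, np.degrees(best_angle)

objects = {"cup": [1.0, 0.1, 0.8], "book": [0.2, 1.2, 0.8]}  # mapped objects (m)
print(pointed_object([0.0, 0.0, 1.3], [0.2, 0.05, 1.2], objects))
```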
Chapter
Additive Manufacturing (AM) techniques have attracted great interest in high-value sectors such as medtech. The AM of continuous fiber-reinforced composites is a technology that, although still in its initial development, appears very promising, and could be particularly interesting for the production of prosthetic devices for sports (ESAR devices such as feet or foils). In this work, the potential of additive manufacturing in lower-limb prosthetics is explored through a multidisciplinary approach involving several competences: clinical evaluation, integration of biomechanics, development of new composite and hybrid materials based on polymers, carbon fibres, and metal inserts, and implementation of optical sensors integrated into the composites for the continuous monitoring of prosthetic devices and their durability.
Article
The popularity of face-authentication systems has also generated interest in the study of malicious authentication attempts, such as face spoofing attacks. In this study we investigate two dynamic face-authentication challenges: the camera close-up and head-rotation paradigms. For each paradigm we developed an ML-based face-authentication system that performs the tasks of liveness detection and face verification. To generate structured data representations from videos collected in the wild, we designed feature representations that extract three-dimensional and spatial characteristics of a face while also capturing the particular liveness cues of the requested challenge-based movements. Furthermore, a set of neural-network models employing Convolutional Neural Network and Siamese Neural Network architectures is proposed. To train and test our models we collected a dataset of 177 live videos recorded by 41 different subjects and a set of 243 attack attempts in uncontrolled scenarios. The resulting NN models yield good performance against multiple types of media-based attacks (printed attacks, screen attacks, 2D masks, videos acquired from public social media, deep fakes). The camera close-up system achieved an overall liveness detection accuracy of 97.7% and a face verification accuracy of 97.6%, while the head-rotation system achieved a liveness detection accuracy of 92.4% and a face verification accuracy of 98.1%. Face authentication methods based on dynamic user challenges constitute a scalable approach that does not require specialized hardware to increase the security of face authentication systems in realistic usage scenarios. The proposed methods not only make it harder for attackers to generate spoofing attacks, but also constitute a practical complement to static biometric authentication systems.
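A minimal sketch of the Siamese verification idea mentioned above: a shared embedding network scores a pair of face crops by embedding distance. The tiny backbone and the decision threshold are placeholders for illustration, not the authors' architecture:

```python
import torch
import torch.nn as nn

class SiameseVerifier(nn.Module):
    """Siamese face-verification sketch: a shared embedding network
    scores a pair by the distance between their embeddings."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.embed = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, dim))

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        za, zb = self.embed(a), self.embed(b)
        return nn.functional.pairwise_distance(za, zb)  # small = same identity

model = SiameseVerifier()
pair = torch.randn(2, 1, 3, 112, 112)  # two face crops
dist = model(pair[0], pair[1])
print(dist < 1.0)  # verify against an assumed decision threshold
```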