Fig. 2: Synthetic-based still-to-video face recognition system

Source publication
Conference Paper
Full-text available
In still-to-video face recognition (FR), faces captured with surveillance cameras are matched against reference stills of target individuals enrolled in the system. FR is a challenging problem in video surveillance due to uncontrolled capture conditions (variations in pose, expression, illumination, blur, scale, etc.) and the limited number of...

Context in source publication

Context 1
... overall block diagram of the proposed FR system is shown in Figure 2. During enrollment, a set of non-target videos is collected, their ROIs are extracted, and the head pose of each ROI in each frame is estimated. ROIs with face pose angles of less than 3° are selected. Then, GLQ_i(x, y) and GCQ_i(x, y) between each ROI isolated from a still and the video ROIs of various non-target individuals are measured. Next, clustering is performed on the normalized GLQ_i(x, y) and GCQ_i(x, y) in 2D space using K-means, and the representative image of each cluster is determined. The optimal number of clusters, obtained using the Dunn index, is typically around k = 4. This process is repeated 3 times for 10 different sets of non-target videos, so that 12 non-target images are selected for each watchlist individual. Each watchlist individual is then morphed with the decomposed large-scale layers. In total, 12 synthetic face images with diverse illumination and contrast are generated for each reference still and added to the watchlist gallery to create a new gallery.

During the design phase, each face of the gallery is segmented and its ROI is scaled to a common size of 48 × 48 to limit processing time. Each ROI representation is then divided into 3 × 3 = 9 uniform non-overlapping patches. Next, uniform-pattern local binary pattern features (59 per patch) are extracted from the single reference ROI and the corresponding synthetic ones to generate diverse face representations. The extracted features are normalized to the range [0, 1] and assembled into an ROI pattern of features for matching, which is then stored as a template in the gallery. The enrollment phase produces a template gallery with 13 templates per watchlist person (the original image plus 12 synthetic images).

During the operational phase, frames undergo the same processing steps as during enrollment; template matching is then applied, matching the facial models of probes against the models stored in the gallery. Each matcher provides a similarity score between every patch of the input vector and the corresponding patch template in the gallery via Euclidean distance. Output scores from the matchers are fed into the fusion module after score normalization. A face tracker also regroups the faces of each person and accumulates positive predictions over time for robust spatio-temporal recognition. A positive prediction is produced if a matching score surpasses an individual-specific threshold. Finally, the decision function combines the tracks and matching predictions to recognize the most likely individuals in the scene.

To assess the transaction-level performance of the proposed FR system, the partial area under the ROC curve (pAUC), the area under the precision-recall curve (AUPR), and the F1-measure at a desired false positive rate of 1% are considered. Prior to each replication, 5 persons are randomly selected as target watchlist individuals, and the remaining individuals are used in the operational phase as non-target subjects. This process is repeated 5 times. To validate the performance achieved by the developed FR system for watchlist screening applications, the ChokePoint video dataset has been employed. The dataset was recorded with an array of three cameras placed above two portals to capture subjects walking through them; it contains video sequences of 25 subjects in portal 1 and 29 subjects in portal 2.
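As a concrete reference, the patch-based template extraction and matching described above can be sketched as follows. This is a minimal sketch assuming scikit-image and NumPy; the function names, the sum-based histogram normalization, and the averaging score fusion are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np
from skimage.feature import local_binary_pattern
from skimage.transform import resize

PATCH_GRID = 3    # 3 x 3 = 9 non-overlapping patches
ROI_SIZE = 48     # ROIs are rescaled to 48 x 48
N_BINS = 59       # uniform LBP with P=8 has 59 distinct patterns

def extract_template(roi_gray):
    """Build an ROI pattern: one 59-bin uniform-LBP histogram per patch."""
    roi = resize(roi_gray, (ROI_SIZE, ROI_SIZE), anti_aliasing=True)
    # 'nri_uniform' gives the 59 non-rotation-invariant uniform codes for P=8
    lbp = local_binary_pattern(roi, P=8, R=1, method="nri_uniform")
    step = ROI_SIZE // PATCH_GRID
    patches = []
    for i in range(PATCH_GRID):
        for j in range(PATCH_GRID):
            block = lbp[i * step:(i + 1) * step, j * step:(j + 1) * step]
            hist, _ = np.histogram(block, bins=N_BINS, range=(0, N_BINS))
            hist = hist.astype(float)
            hist /= hist.sum() + 1e-8     # features normalized to [0, 1]
            patches.append(hist)
    return np.stack(patches)              # shape: (9, 59)

def match_score(probe, template):
    """Per-patch Euclidean distances, fused here by simple averaging."""
    dists = np.linalg.norm(probe - template, axis=1)  # one distance per patch
    return -dists.mean()  # negated so that higher scores mean better matches
```

A probe would then be accepted for a given watchlist individual when its best score over that individual's 13 templates surpasses the individual-specific threshold.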
Captured face frames have variations in illumination conditions, pose, and sharpness, as well as misalignment due to automatic face localization/detection [25]. Figure 3 presents examples of the images generated with the assistance of the proposed synthesizing algorithm. Results in Table 1 present the average transaction-level performance (pAUC(20%), AUPR and F1-measure) of the baseline FR system (BFR), with only one sample per person, and of the FR system with extra images under various illumination conditions; Table 1 also compares the results obtained with and without patching. As shown in Table 1, the recognition system with extra samples under varying illumination outperforms the baseline system because of its robustness to illumination variations. Furthermore, the patch-based technique provides a higher level of performance than the baseline system, since extracting features from each patch exploits more discriminant information and consequently yields better matching performance.

Table 2 compares the average transaction-level performance of the synthetic FR system as a function of the number of synthetic samples. It can be concluded that the number of images added to the gallery has a direct impact on the recognition rate and time complexity: increasing the number of synthetic images enhances system performance but reduces time efficiency. There should therefore be a trade-off between performance and the computational cost associated with an increased number of samples. It can also be observed that the results vary across watchlist individuals; for instance, adding extra images improves the performance of individual ID#01 from pAUC=0.118 to pAUC=0.378, but that of individual ID#04 only from pAUC=0.319 to pAUC=0.339.

Given the challenges of still-to-video FR in video surveillance applications, a new approach is proposed in this paper to generate multiple synthetic face images per reference still based on camera-specific capture conditions. The approach exploits the abundance of diverse facial ROIs from non-target individuals that appear in a specific camera viewpoint. An extension of image morphing generates a set of diverse images with a smooth transition of illumination, accurately conveying a range of synthetic face images with diverse illumination and contrast. Experimental results on the ChokePoint dataset show that the proposed approach is effective at improving representativeness under the illumination and contrast conditions found in many video surveillance applications, for instance in watchlist screening, where only one reference face still, captured under controlled conditions, is available during enrollment. It is worth mentioning that this method can be generalized to transfer other appearance variations, such as shadow and blur, to any object for a wide range of applications. In order to design a more robust still-to-video FR system, future research should include methods to generate even more synthetic faces based on variations in pose and expression of a target ...
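For reference, the three transaction-level metrics can be approximated with scikit-learn as in the hedged sketch below; the paper does not specify its implementation, and note that roc_auc_score with max_fpr returns a standardized (McClish-corrected) partial AUC, which may differ in scale from the raw pAUC(20%) values reported here.

```python
import numpy as np
from sklearn.metrics import (average_precision_score, f1_score,
                             roc_auc_score, roc_curve)

def transaction_metrics(y_true, scores):
    """y_true: 1 for target individuals, 0 otherwise; scores: match scores."""
    pauc_20 = roc_auc_score(y_true, scores, max_fpr=0.20)  # pAUC(20%)
    aupr = average_precision_score(y_true, scores)         # PR-space area
    # operating point: the highest threshold whose FPR does not exceed 1%
    fpr, _, thresholds = roc_curve(y_true, scores)
    t = thresholds[np.searchsorted(fpr, 0.01, side="right") - 1]
    f1 = f1_score(y_true, scores >= t)
    return pauc_20, aupr, f1
```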

Citations

... Multiple techniques and algorithms focus on tasks like face detection [1][2], identity verification [3][4], face capturing [5], and face hallucination [6][7]. However, this type of image processing presents several challenges, including multiple noise sources, motion blur [8], environmental disturbances due to illumination changes and contrast [9], and insufficient sensor density [10]. All these contribute to image degradation, significantly affecting capture quality [11]. ...
Article
Full-text available
This document details the implementation of a sub-pixel convolutional neural network designed to enhance the resolution of face images. The model uses a series of filters to progressively increase the number of pixels, estimating the information needed for the new pixels from the original image, and is trained on 22,000 synthetic images produced by adversarial neural networks. Within the context of surveillance and related applications, the trained convolutional network exhibits beneficial characteristics. For instance, it can be deployed within a device to achieve higher-resolution images than the physical camera can produce. This research underscores the feasibility of such a device through the implementation and evaluation of the network on the NVIDIA Jetson TX2 embedded system. The findings demonstrate the model's practicality for real-time surveillance applications and its ability to produce superior-quality images compared to several interpolation methods, as determined by an exhaustive testing process measuring various attributes of the generated images.
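For context, a sub-pixel convolutional upscaler of the general kind this article describes can be sketched in a few lines of PyTorch; the layer widths and the 4x scale are illustrative assumptions, not the cited model.

```python
import torch.nn as nn

class SubPixelSR(nn.Module):
    """Upscale by producing scale**2 channels, then rearranging them into pixels."""
    def __init__(self, scale=4, channels=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv2d(64, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels * scale ** 2, kernel_size=3, padding=1),
            nn.PixelShuffle(scale),  # (B, C*s^2, H, W) -> (B, C, H*s, W*s)
        )

    def forward(self, x):
        return self.body(x)
```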
... Among the main reasons to generate synthetic images are low cost, high efficiency, and privacy during testing. Researchers do not need to depend on real-world data; they can work on synthetic data [14]. A generator model is learned over training images to generate synthetic images. ...
Conference Paper
Full-text available
A new method for synthetic palm image generation is proposed in this paper based on StyleGAN2-ADA, a specialized GAN architecture. This method is based on the modification of the styles of the palm, such as principal lines, secondary lines, wrinkles, etc. The model was trained on 3500 palm images, combined from two public datasets. The quality of the synthetic images, generated by the proposed model, is evaluated by a Scale Invariant Feature Transform (SIFT)-based custom algorithm where the features of the synthetic images (for example, principal lines) are compared with reference palm images. The synthetic images having lower quality metrics, below the threshold, are discarded. This quality assessment algorithm shows that 95 percent of the generated synthetic images are acceptable and have enough diversity to be employed for further biometric research. This research is significant as it can address the scarcity of biometric data especially of the palm image which is a relatively new research domain with lots of potential to be a robust identification and verification system.
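A quality gate of the kind described, comparing SIFT features of a synthetic image against a reference and discarding low scorers, might look like the following sketch using OpenCV; the ratio test, the score definition, and the threshold are illustrative assumptions, not the authors' algorithm.

```python
import cv2

def sift_quality(synth_gray, ref_gray, ratio=0.75):
    """Fraction of synthetic-image keypoints with a good match in the reference."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(synth_gray, None)
    kp2, des2 = sift.detectAndCompute(ref_gray, None)
    if des1 is None or des2 is None:
        return 0.0
    matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
    good = [p for p in matches
            if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    return len(good) / max(len(kp1), 1)

# Synthetic images scoring below a chosen threshold would be discarded.
```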
... These technologies aim to increase intra-class variations and the resilience of facial models. In some displays, different patches and facial descriptors are used [8], and artificial facial images are synthesised using 2D morphing or 3D reconstructions [10]. ...
... In some displays, different patches and facial descriptors are used [8], and artificial facial images are synthesised using 2D morphing or 3D reconstructions [10]. A generic auxiliary dataset comprising other people's faces can be used to modify domains [11] and to classify displays through dictionary training [12]. ...
Article
Full-text available
Smart surveillance systems are becoming a vital application in streets and houses. Many streets are prone to misbehavior such as ATM theft, robbery, and fights, and hence it is necessary to detect and analyse crime scenes to find suspects. However, most surveillance systems suffer from poor detection of objects due to poor camera resolution, absence of light, and other factors. In order to improve the detection of faces after detecting objects using ResNet, it is necessary to adopt advanced devices for image capturing and analysis. In this paper, an Internet of Things (IoT) based ESP32-CAM WiFi and Bluetooth module with a 2 MP OV2640 camera module is used for image acquisition to capture better images of the scenes. The study uses a dense convolutional network, namely DenseNet, to detect the faces present in crime scenes after object detection. The deep learning module is trained on selected crime scenes to train the classifier. A simulation is further conducted to validate the model against other variants of deep learning.
... Shao et al. [24] introduced an FR method that depends on Sparse Representation-based Classification (SRC), which augments the dictionary with a collection of artificial images created by measuring the difference between a pair of images. The authors in [25] augmented the gallery of references by producing a collection of artificial images under camera-specific illumination conditions to build an FR system that is reliable under surveillance conditions. Blanz and Vetter [26] introduced a 3D Morphable Model (3DMM) for reconstructing a 3D image from a 2D image and subsequently synthesizing new face photos. ...
Article
Full-text available
Face Recognition (FR) is one of the significant fields in computer vision. FR is used to identify faces that appear across cameras distributed over a network. The face recognition problem can be divided into two categories: the first is recognition with more than one sample per person, which can be called the traditional face recognition problem; the second is recognition of faces using only a Single Sample Per Person (SSPP). The efficiency of face recognition systems decreases because of limited references, especially with SSPP, and because faces captured in the Operational Domain (OD) differ from faces in the Enrollment Domain (ED) in illumination, pose, resolution, and blurriness. This paper proposes a method that deals with all problems related to face recognition with SSPP. 3D face reconstruction is used to augment the reference gallery set with different poses and to generate a design-domain dictionary to overcome the problem of limited references. Besides, the design-domain dictionary is used to feed different deep learning models. Face illumination transfer techniques are utilized to overcome the illumination problem. The Labeled Faces in the Wild (LFW) dataset is used to train a Super-Resolution Generative Adversarial Network (SRGAN) to overcome the low-resolution problem. A Deblur Generative Adversarial Network (DeblurGAN) is trained on the LFW dataset to overcome the problem of blurriness. The proposed method is evaluated using the ChokePoint and COX-S2V datasets. The final results confirm an overall enhancement in accuracy compared to techniques that use SSPP for face recognition (generic learning and face synthesizing approaches). The proposed method also outperforms the accuracy of the Traditional and Deep Learning (TDL) method, which uses SSPP for face recognition.
... These techniques were applied to mitigate the loss of classification performance due to changes in facial appearance. Mokhayeri et al. [74] proposed an approach that generates multiple synthetic face images per person on a camera to address the low-quality image problem caused by illumination variations. Weyrauch et al. [75] presented a face recognition approach invariant to illumination and pose by incorporating component-based recognition and 3D morphable models. ...
... However, the criteria for the selected videos were not stated. Mokhayeri et al. [13] proposed an approach that generates multiple synthetic face images per person on a camera to address low-quality image problems caused by variations in illumination. The ChokePoint dataset was used to evaluate the performance of the proposed method. ...
... In face recognition, the shape of the face modality is important, as it provides richer information to represent the features [35]. We deploy a preprocessing pipeline with feature transformations based on [10] and [50], resulting in textures extracted from f and p that are used to develop a frequency-domain procedure. ...
Article
Full-text available
Although there is an abundance of current research on facial recognition, it still faces significant challenges related to variations in factors such as aging, poses, occlusions, resolution, and appearances. In this paper, we propose a Multi-feature Deep Learning Network (MDLN) architecture that uses modalities from the facial and periocular regions, with the addition of texture descriptors, to improve recognition performance. Specifically, MDLN is designed as a feature-level fusion approach that correlates the multimodal biometric data with the texture descriptor, creating a new feature representation. The proposed MDLN model therefore provides more information via the feature representation to achieve better performance, while overcoming the limitations that persist in existing unimodal deep learning approaches. The proposed model has been evaluated on several public datasets, and through our experiments we show that MDLN improves biometric recognition performance under challenging conditions, including variations in illumination, appearances, and pose misalignments.
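Feature-level fusion of the kind MDLN performs, concatenating modality features with a texture descriptor before learning a joint representation, can be illustrated with the short PyTorch sketch below; the branch dimensions and the single fusion layer are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class FeatureLevelFusion(nn.Module):
    """Concatenate per-modality features, then learn a fused representation."""
    def __init__(self, d_face=256, d_periocular=256, d_texture=59, d_out=128):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(d_face + d_periocular + d_texture, d_out),
            nn.ReLU(),
        )

    def forward(self, f_face, f_periocular, f_texture):
        return self.fuse(torch.cat([f_face, f_periocular, f_texture], dim=1))
```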
... This creates more capability for matching partial faces [6,7]. In addition, the rapid growth of camera use in social networks, surveillance, and smartphones arguably increases interest in periocular recognition [8,9]. For all these reasons, periocular recognition has become an area of intense study in the biometrics and computer vision communities. ...
Article
Full-text available
Periocular recognition remains challenging for deployments in unconstrained environments. Therefore, this paper proposes an RGB-OCLBCP dual-stream convolutional neural network, which accepts an RGB ocular image and a colour-based texture descriptor, namely the Orthogonal Combination-Local Binary Coded Pattern (OCLBCP), for periocular recognition in the wild. The proposed network aggregates the RGB image and the OCLBCP descriptor using two distinct late-fusion layers. We demonstrate that the proposed network benefits from both the RGB image and the OCLBCP descriptor to achieve better recognition performance. A new database, namely an Ethnic-ocular database of periocular images in the wild, is introduced and shared for benchmarking. In addition, three publicly accessible databases, namely AR, CASIA-iris distance and UBIPr, have been used to evaluate the proposed network. When compared against several competing networks on these databases, the proposed network achieved better performance in both recognition and verification tasks.
... These techniques seek to enhance the robustness of face models to intra-class variations. In multiple representations, different patches and face descriptors are employed [2,4], while 2D morphing or 3D reconstructions are used to synthesize artificial face images [16,22]. A generic auxiliary dataset containing faces of other persons can be exploited to perform domain adaptation [20] and sparse representation classification through dictionary learning [36]. ...
Chapter
Full-text available
Face recognition (FR) systems for video surveillance (VS) applications attempt to accurately detect the presence of target individuals over a distributed network of cameras. In video-based FR systems, facial models of target individuals are designed a priori during enrollment using a limited number of reference still images or video data. These facial models are not typically representative of faces being observed during operations due to large variations in illumination, pose, scale, occlusion, blur, and camera interoperability. Specifically, in still-to-video FR application, a single high-quality reference still image captured with still camera under controlled conditions is employed to generate a facial model to be matched later against lower-quality faces captured with video cameras under uncontrolled conditions. Current video-based FR systems can perform well on controlled scenarios, while their performance is not satisfactory in uncontrolled scenarios mainly because of the differences between the source (enrollment) and the target (operational) domains. Most of the efforts in this area have been toward the design of robust video-based FR systems in unconstrained surveillance environments. This chapter presents an overview of recent advances in still-to-video FR scenario through deep convolutional neural networks (CNNs). In particular, deep learning architectures proposed in the literature based on triplet-loss function (e.g., cross-correlation matching CNN, trunk-branch ensemble CNN and HaarNet) and supervised autoencoders (e.g., canonical face representation CNN) are reviewed and compared in terms of accuracy and computational complexity.
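As a reference point for the triplet-loss architectures reviewed in this chapter, the standard triplet loss they build on can be written in a few lines of PyTorch; the margin value is an illustrative assumption.

```python
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    """anchor/positive share an identity; negative is a different person."""
    # squared Euclidean distances between L2-normalized embeddings
    d_ap = (F.normalize(anchor) - F.normalize(positive)).pow(2).sum(dim=1)
    d_an = (F.normalize(anchor) - F.normalize(negative)).pow(2).sum(dim=1)
    # hinge: push negatives at least `margin` farther away than positives
    return F.relu(d_ap - d_an + margin).mean()
```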