Figure - available from: Signal, Image and Video Processing
Visualization of neurons generated from two distinctive emotion classes, happy and sad, at the conv layer before the second stage. (a) depicts the five most effective neuron responses and (b) presents the mean responses of all neurons. The regions of interest (highlighted by the dotted blocks) show the capability of the DCFA-CNN network to capture the inter-class disparities between emotions.

Source publication
Article
Full-text available
In this paper, we propose a novel CNN-based model named the Deep Cross Feature Adaptive Network (DCFA-CNN) for facial expression recognition. The proposed DCFA-CNN model holds two major components: a shape feature (ShFeat) block and a texture feature (TexFeat) block. The ShFeat block is responsible for extracting high-level responses, which le...
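As a rough illustration of the two-branch idea sketched in this abstract, below is a minimal PyTorch sketch of a generic shape/texture dual-branch network with concatenation-based fusion; the layer widths, depths, and fusion scheme are illustrative assumptions, not the authors' exact DCFA-CNN design.

```python
# Minimal sketch of a generic two-branch (shape/texture) CNN for FER.
# All layer sizes and the concatenation fusion are assumptions.
import torch
import torch.nn as nn

class TwoBranchFER(nn.Module):
    def __init__(self, num_classes: int = 7):
        super().__init__()
        # Shape branch: deeper stack aimed at high-level, class-discriminative responses.
        self.shape_branch = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Texture branch: no early pooling, aimed at fine local variations.
        self.texture_branch = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64 + 64, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s = self.shape_branch(x).flatten(1)    # (B, 64) high-level shape responses
        t = self.texture_branch(x).flatten(1)  # (B, 64) fine texture responses
        return self.classifier(torch.cat([s, t], dim=1))

logits = TwoBranchFER()(torch.randn(4, 1, 48, 48))  # e.g. 48x48 grayscale faces
print(logits.shape)  # torch.Size([4, 7])
```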

Similar publications

Article
Full-text available
Humans recognise reddish-coloured faces as angry. However, does facial colour also affect “implicit” facial expression perception of which humans are not explicitly aware? In this study, we investigated the effects of facial colour on implicit facial expression perception. The experimental stimuli were “hybrid faces”, in which the low-frequency com...
Article
Full-text available
With the development of AI (Artificial Intelligence), facial expression recognition (FER) is a hot topic in computer vision tasks. Many existing works employ a single label for FER; therefore, the label distribution problem has not been considered for FER. In addition, some discriminative features cannot be captured well. To overcome these problem...
Article
Full-text available
Determining whether micro-expressions are present is accorded a high priority in most settings, because despite a person's best attempts, these expressions will always expose the genuine sentiments buried under the surface. The purpose of this study is to provide a novel approach to t...
Preprint
Full-text available
The limited scale of expression samples generally causes performance degradation for facial expression recognition methods. Also, the correlation between different expressions is often ignored during the feature extraction process. Given the above, we propose a novel approach that develops multi-class differentiation feature...
Article
Full-text available
Facial expressions of emotion can convey information about the world and disambiguate elements of the environment, thus providing direction to other people’s behavior. However, the functions of facial expressions from the perspective of learning patterns over time remain elusive. This study investigated how the feedback of facial expressions influe...

Citations

... The designed IAG-DCNN with 2.93M parameters and 2.77 GFLOPs has low model and computational complexity, unlike recent state-of-the-art neural networks such as the effective multi-head parallel channel-spatial attention network (MPCSAN) [23], region adaptive correlation deep network (RACN) [29], multi-relations aware network (MRAN) [30], enhanced discriminative global-local feature learning with priority (EDGL-FLP) [31], the multi-task joint learning network with a constrained fusion (CFNet) [32], etc. MPCSAN uses MS-Celeb-1M pre-trained ResNet18 as the backbone CNN and has 25.97M parameters and 2.35 GFLOPs. RACN uses VGG19 pre-trained on ImageNet as the backbone CNN and has more than 22M parameters. ...
... The edge-aware feedback convolutional neural network (E-FCNN) proposed by Shao and Cheng [34] can recognize facial expressions in tiny images captured by surveillance cameras. The deep cross-feature adaptive network (DCFA-CNN) presented for FER by Reddy et al. [35] embeds a two-branch cross-relationship to collect complementary information from a shape feature (ShFeat) block and texture feature (TexFeat) block. Such a scheme helps DCFA-CNN boost the discriminability of the network. ...
Article
Full-text available
Facial expression recognition (FER) in real-world unconstrained conditions is a challenging and active field of research among the pattern recognition and computer vision community. FER systems have immense use in advanced applications based on human-computer interaction (HCI) and human-robot interaction (HRI). Most of these applications heavily rely on the manifestation of hidden emotions of the individuals. However, recognizing facial expressions in complex real-world conditions is difficult for a computer, unlike humans. Over the decades, researchers developed numerous methods for FER in static images. Most of these methods fail to effectively characterize the feature differences among facial expressions and attain desired robustness in real-world scenarios. Besides, these methods are compute-intensive, and their parameters are too large to realize real-time classification of facial expressions on low-cost embedded devices. Hence, to mitigate these challenges and develop a robust and compute-efficient scheme for FER in the wild, this paper proposes a novel integrated attention-guided deep convolutional neural network (IAG-DCNN). The IAG-DCNN model integrates and fine-tunes two lightweight attention-guided DCNNs (AG-DCNNs) initially pre-trained on the FER datasets. The AG-DCNNs use channel attention blocks (CABs) to select relevant convolutional filters, thus alleviating feature map redundancy. Also, the CABs suppress useless information and improve the classification accuracy of the AG-DCNNs. To test the effectiveness of the designed IAG-DCNN model, we performed experiments on the FER2013, RAF-DB, and SFEW datasets. The proposed IAG-DCNN with 2.93M parameters, 2.77 GFLOPs, and 11.50MB of memory storage size attains competitive recognition accuracy on the benchmark FER in the wild datasets. Also, the lightweight IAG-DCNN model with an inference time of 2.96ms significantly improves the overall classification time.
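One common way to realise the channel attention blocks (CABs) described above is a squeeze-and-excitation style gate; the sketch below assumes that interpretation. The reduction ratio and gating layout are generic choices, not necessarily the IAG-DCNN authors' exact design.

```python
# Minimal squeeze-and-excitation style channel attention block (assumed CAB form).
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),  # squeeze: global spatial average per channel
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Re-weight each feature map, suppressing redundant/useless channels.
        return x * self.gate(x)

x = torch.randn(2, 64, 12, 12)
print(ChannelAttention(64)(x).shape)  # torch.Size([2, 64, 12, 12])
```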
... Hariprasad et al. [53] proposed the Deep Cross Feature Adaptive Network (DCFA-CNN), which contains two main components: a shape feature (ShFeat) component used for extracting high-level features that are discriminative across different expressions, and a texture feature (TexFeat) component used for extracting the minute variations that differ from expression to expression. ...
Article
Full-text available
Facial expressions are an important form of non-verbal communication as they directly reflect the internal emotions of a person. The primary task of automated Facial Expression Recognition (FER) systems lies in extracting salient features related to facial expressions. In this paper, a Cross Connected Convolutional Neural Network (CC-CNN) has been proposed for extracting the facial features. The proposed CC-CNN model contains two levels of input for extracting the features related to facial expressions. The Cyclopentane Feature Descriptor (CyFD), inspired by cyclopentane's structure, has been proposed to extract significant features. The feature response map generated by the CyFD method is given as input to the first level, and in the second level, the features are extracted directly from the facial image. The input images from both levels are passed through a series of convolutional layers with cross connections for extracting the fused (local and global) features related to facial expressions. Finally, the CC-CNN method fuses all the features extracted from both levels. To validate the efficiency of the proposed CC-CNN method, experiments have been performed on benchmark FER datasets such as CK+, MUG, RAF, FER2013 and FERG. The comparison results from the experimental analysis revealed that the proposed model outperformed the recent FER methods.
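The cross-connection idea described above, two input streams exchanging intermediate features before fusion, can be sketched as follows. The channel sizes and concatenation-based cross links are assumptions for illustration, not the exact CC-CNN configuration.

```python
# Minimal sketch of two cross-connected streams (descriptor map + raw face).
# Channel widths and the concatenation cross links are assumptions.
import torch
import torch.nn as nn

class CrossConnectedCNN(nn.Module):
    def __init__(self, num_classes: int = 7):
        super().__init__()
        self.a1 = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU())
        self.b1 = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU())
        # Second stage of each stream sees both streams' first-stage features.
        self.a2 = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
                                nn.AdaptiveAvgPool2d(1))
        self.b2 = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
                                nn.AdaptiveAvgPool2d(1))
        self.fc = nn.Linear(128, num_classes)

    def forward(self, desc_map: torch.Tensor, face: torch.Tensor) -> torch.Tensor:
        a, b = self.a1(desc_map), self.b1(face)
        # Cross connections: each stream's next stage gets its own and the
        # other stream's features, concatenated along channels.
        fa = self.a2(torch.cat([a, b], dim=1)).flatten(1)
        fb = self.b2(torch.cat([b, a], dim=1)).flatten(1)
        return self.fc(torch.cat([fa, fb], dim=1))

x = torch.randn(2, 1, 48, 48)
print(CrossConnectedCNN()(x, x).shape)  # torch.Size([2, 7])
```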
... The MRIPP model is a relatively simple network compared to the aforementioned existing DNN models for FER. The authors of [18] proposed a CNN architecture called the deep cross-feature adaptive network (DCFA-CNN). DCFA-CNN consists of two components: a shape feature (ShFeat) block and a texture feature (TexFeat) block. ...
... The accuracies of the proposed EFL-LCNN model on individual datasets are compared with the existing CNN-based FER models discussed in the literature (Tables 5, 6). Models such as [15], VGG19 [12], DCMA-CNNs [25], and DCFA-CNN [18] used SIFT features, network-specific features, and texture features; these models extract features from a facial image that includes background noise. The background noise in the face regions located by the existing models reduces their FER performance, since irrelevant information is used in training to recognize the facial expression. ...
Article
Full-text available
Facial expression is an inevitable aspect of human communication, and hence facial emotion recognition (FER) has become the basis for many machine vision applications. Many deep learning based FER models have been developed and have shown good results on emotion recognition. However, FER using deep learning still suffers from illumination conditions, noise around the face such as hair and background, and other ambience conditions. To mitigate such issues and improve the performance of FER, we propose the Enhanced Face Localization augmented Light Convolution Neural Network (EFL-LCNN). EFL-LCNN incorporates three-phase pre-processing and a Light CNN, a trimmed VGG16 model. The three-phase pre-processing includes face detection, enhanced face region cropping for ambience noise removal, and image enhancement using CLAHE for addressing illumination problems. The three-phase pre-processing is followed by the Light CNN to improve FER performance with reduced complexity. The EFL-LCNN is rigorously tested on four publicly available benchmark datasets: JAFFE, CK, MUG and KDEF. It is observed from the empirical results that the EFL-LCNN boosted recognition accuracies significantly when compared with the state-of-the-art.
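The three-phase pre-processing described above (face detection, face region cropping, CLAHE enhancement) can be approximated with standard OpenCV calls. The Haar cascade detector and CLAHE parameters below are common defaults assumed for illustration, not the paper's exact settings.

```python
# Minimal OpenCV sketch: detect a face, crop it, then apply CLAHE.
import cv2

def preprocess_face(img_path: str, size: int = 96):
    gray = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None                            # no face found in the image
    x, y, w, h = faces[0]                      # crop the first detected face
    face = cv2.resize(gray[y:y + h, x:x + w], (size, size))
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return clahe.apply(face)                   # contrast-limited equalisation
```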
... Emotions related to e-learning, like boredom, confusion, contempt, curiosity, disgust, eureka, delight, and frustration were mainly identified in recent literature [32][33][34][35][36][37][38][39]. Deep learning models, mainly convolutional neural networks, are used for emotion classification. ...
... Different deep learning models such as VGGNet [34,39] and ResNet [35] are used for the implementation. A variant of CNN, DCFA-CNN [36], has been tested with different image datasets and achieved excellent classification results. Yolcu et al. [40] present a deep learning-based system for customer behavior monitoring applications. ...
... Therefore, a single 36 × 1 matrix or four 9 × 1 matrices result when creating a 16 × 16 block. Mathematically, the given vector V has 36 rows, represented as V = [r1, r2, ..., r36]. ...
Article
Full-text available
Human emotion recognition from videos involves accurately interpreting facial features, including face alignment, occlusion, and shape illumination problems. Dynamic emotion recognition is more important, and the situation becomes more challenging with multiple persons and speedy movement of faces. In this work, the ensemble max rule method is proposed. For obtaining the results of the ensemble method, three primary methods, CNNHOG-KLT, CNNHaar-SVM, and CNNPATCH, are developed in parallel to detect human emotions from the vital frames extracted from videos. The first method uses the HoG and KLT algorithms for face detection and tracking. The second method uses a Haar cascade and SVM to detect the face. Template matching is used for face detection in the third method. A convolutional neural network (CNN) is used for emotion classification in CNNHOG-KLT and CNNHaar-SVM. To handle occluded images, a patch-based CNN is introduced for emotion recognition in CNNPATCH. Finally, all three methods are ensembled based on the max rule. The CNNENSEMBLE for emotion classification results in 92.07% recognition accuracy by considering both occluded and nonoccluded facial videos.
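The max-rule fusion at the core of the ensemble described above reduces to taking the per-class maximum across member probability vectors. A minimal NumPy sketch with placeholder member outputs (the real inputs would be the three CNNs' predictions):

```python
# Minimal max-rule ensemble: fused class score = max over ensemble members.
import numpy as np

def max_rule(*member_probs: np.ndarray) -> int:
    fused = np.max(np.stack(member_probs), axis=0)  # per-class maximum
    return int(np.argmax(fused))                    # winning emotion index

p1 = np.array([0.1, 0.7, 0.2])   # placeholder for CNNHOG-KLT output
p2 = np.array([0.2, 0.5, 0.3])   # placeholder for CNNHaar-SVM output
p3 = np.array([0.6, 0.1, 0.3])   # placeholder for CNNPATCH output
print(max_rule(p1, p2, p3))      # -> 1 (class with the largest max score)
```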
... In this context, deep learning models such as Convolutional Neural Networks (CNNs) and other machine learning techniques have been applied to the area of computer vision, giving state-of-the-art results in several research works with the availability of big data, including object identification, face detection [15], and emotion detection [11,[16][17][18][19][20][21][22][23][24][25][26]]. The huge volume of data generated and the rapid advancement of artificial intelligence (AI) techniques in different fields, including human-computer interaction (HCI) and advanced driver assistance systems (ADASs), have led to an increasing interest in facial expression recognition. ...
Article
Full-text available
Human facial emotion recognition (FER) has attracted interest from the scientific community for its prospective uses. The fundamental goal of FER is to match distinct facial expressions to different emotional states. Recent state-of-the-art studies have generally adopted more complex methods to achieve this aim, such as large-scale deep learning models or multi-model analysis referring to multiple sub-models. Unfortunately, performance degradation happens in these approaches because of poor layer selection in the convolutional neural network (CNN) architecture. To resolve this problem, and unlike these models, the present work proposes a deep CNN-based intelligent computer vision system capable of recognizing facial emotions. To do so, we propose, first, a deep CNN architecture using a Transfer Learning (TL) approach for constructing a highly accurate FER system, in which a pre-trained deep CNN model is adopted by substituting its dense upper layers with layers suitable for FER, and the model is fine-tuned with facial expression data. Second, we propose an improved ResNet18 model due to its high performance in terms of recognition accuracy compared with the state-of-the-art studies. The improved model is then trained and tested on two benchmark datasets, FER2013 and CK+. The improved ResNet18 model achieves FER accuracies of 98% and 83% on the CK+ and FER2013 test sets, respectively. The obtained results show that the suggested FER system based on the improved model outperforms the deep TL techniques in terms of both emotion detection accuracy and evaluation metrics.
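The transfer-learning recipe described above (take a pre-trained backbone, replace the dense upper layers with an FER head, fine-tune) can be sketched in PyTorch/torchvision as follows. The head width and the backbone-freezing policy are assumptions, not the paper's exact configuration.

```python
# Minimal transfer-learning sketch: ImageNet ResNet18 with a new FER head.
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
for p in model.parameters():
    p.requires_grad = False                 # optionally freeze the backbone
model.fc = nn.Sequential(                   # new dense upper layers for FER
    nn.Linear(model.fc.in_features, 256), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(256, 7),                      # 7 basic expression classes
)
# Fine-tune model.fc on expression data; later, optionally unfreeze the backbone.
```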
... Facial identification enhances overall security in applications such as e-banking, e-commerce, forensics, airport security, etc. [1,2]. Face recognition aims to give a computer system the ability to quickly and precisely recognize human faces in images or videos [3,4]. Numerous algorithms and methods, including recently proposed deep learning models, have been proposed to improve face recognition performance [5][6][7]. ...
Article
Full-text available
Face recognition has grown in popularity due to the ease with which most recognition systems can find and recognize human faces in images and videos. However, the accuracy of the face recognition system is critical in ascertaining the success of a person's identification. A lack of sufficiently large training datasets is one of the significant challenges that limit the accuracy of face recognition systems. Meanwhile, machine learning (ML) algorithms, particularly those used for image based face recognition, require large training data samples to achieve a high degree of face recognition accuracy. Based on the above challenge, this research proposes a method for improving face recognition precision and accuracy by employing a hybrid approach of the Gabor filter and a stacked sparse autoencoder (SSAE) deep neural network. The face image datasets from the Olivetti Research Laboratory (OLR) and the Extended Yale-B databases were used to evaluate the proposed hybrid model's performance. All face image datasets used in our experiments are grayscale images with a resolution of 92 × 112 for the OLR database and a resolution of 192 × 168 for the Extended Yale-B database. Our experimental results showed that the proposed method improved face recognition accuracy by approximately 100% for the two databases used, at a significantly reduced feature extraction time compared to the current state-of-the-art face recognition methods for all test cases. The SSAE approach can explore large and complex datasets with minimal computation time. In addition, the algorithm minimizes the false acceptance rate and improves recognition accuracy. This implies that the proposed method is promising and has the potential to enhance the performance of face recognition systems.
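The Gabor front end of the hybrid approach described above can be sketched with OpenCV's built-in Gabor kernels; the bank size and kernel parameters below are illustrative assumptions. The resulting response vector is what would feed the stacked sparse autoencoder.

```python
# Minimal Gabor filter bank: concatenated responses as an SSAE input vector.
import cv2
import numpy as np

def gabor_features(gray: np.ndarray, n_orientations: int = 4) -> np.ndarray:
    feats = []
    for k in range(n_orientations):
        theta = k * np.pi / n_orientations      # filter orientation
        kern = cv2.getGaborKernel((21, 21), sigma=4.0, theta=theta,
                                  lambd=10.0, gamma=0.5, psi=0.0)
        feats.append(cv2.filter2D(gray, cv2.CV_32F, kern).ravel())
    return np.concatenate(feats)                # one long feature vector

img = np.random.randint(0, 256, (92, 112), dtype=np.uint8)  # OLR-sized face
print(gabor_features(img).shape)  # (4 * 92 * 112,) = (41216,)
```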
... Akhand et al. [6] proposed a deep CNN (DCNN) model through transfer learning (TL), modifying the dense upper layers to be compatible with FER and fine-tuning the model with facial emotion data in order to enhance FER accuracy. Reddy et al. [7] proposed a Deep Cross Feature Adaptive Network (DCFA-CNN) that incorporates two complementary features, shape features and texture features, in order to boost FER performance. The ShFeat block is used to discriminate features from different expressive regions, and the TexFeat block is used to hold micro variations that define structural differences in the expressive regions. ...
Article
Full-text available
Emotion is an important aspect of effective human communication, and hence, facial emotion recognition (FER) has become essential in human–computer interaction systems. The automation of FER has been carried out by many researchers using ML/DL techniques. Among different FER approaches, the models developed using convolutional neural networks (CNNs) have achieved high recognition accuracies. Despite its high performance, a CNN fails to encode different orientation features, since the pooling operations used in CNNs for feature extraction omit vital information. Owing to this omission, performance is reduced when recognizing emotions from facial images with different orientations. Subsequently, to reduce such problems of CNNs, namely encoding different orientation features and increased training time, Capsule Networks (CapsNet) were developed. CapsNet is capable of storing 8 such feature vectors with the incorporation of dynamic routing approaches and squashing in place of pooling operations to mitigate the issue of rotational invariance. Hence, in this paper, we propose CapsNet for FER in order to enhance accuracy. However, the facial images considered for training contain unwanted information that is not essential for FER, which delays convergence and requires more iterations to train on facial images. Hence, face localization (FL) is incorporated with CapsNet in our model to eliminate the background noise, or unwanted information, from the facial images for an effective training process. The proposed FL-CapsNet is rigorously tested on benchmark datasets such as JAFFE, CK+, and FER2013 to evaluate the generalization of the proposed model, and it is evidenced that FL-CapsNet outperformed the existing CapsNet-based FER models.
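The squashing non-linearity that CapsNet uses in place of pooling, mentioned above, has a standard closed form (Sabour et al.): it shrinks each capsule vector's length into [0, 1) while preserving its orientation. A minimal sketch:

```python
# Minimal CapsNet "squash": v = (|s|^2 / (1 + |s|^2)) * (s / |s|).
import torch

def squash(s: torch.Tensor, dim: int = -1, eps: float = 1e-8) -> torch.Tensor:
    sq_norm = (s ** 2).sum(dim=dim, keepdim=True)
    scale = sq_norm / (1.0 + sq_norm)               # length mapped into (0, 1)
    return scale * s / torch.sqrt(sq_norm + eps)    # direction is preserved

u = torch.randn(32, 10, 16)          # 32 samples, 10 capsules of dimension 16
v = squash(u)
print(v.norm(dim=-1).max() < 1.0)    # tensor(True): all lengths below 1
```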
... Machine learning (ML) techniques have achieved remarkable results on various tasks, such as image recognition [1], face detection [2] and object detection [3]. In general, building an effective machine learning model is a complex and time-consuming process that involves determining an appropriate algorithm and obtaining the optimal model architecture by tuning its hyperparameters. ...
Article
Full-text available
Machine learning algorithms are sensitive to hyperparameters, and hyperparameter optimization techniques are often computationally expensive, especially for complex deep neural networks. In this paper, we use Q-learning algorithm to search for good hyperparameter configurations for neural networks, where the learning agent searches for the optimal hyperparameter configuration by continuously updating the Q-table to optimize hyperparameter tuning strategy. We modify the initial states and termination conditions of Q-learning to improve search efficiency. The experimental results on hyperparameter optimization of a convolutional neural network and a bidirectional long short-term memory network show that our method has higher search efficiency compared with tree of Parzen estimators, random search and genetic algorithm and can find out the optimal or near-optimal hyperparameter configuration of neural network models with minimum number of trials.
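At the core of the approach described above is the tabular Q-learning update Q(s,a) ← Q(s,a) + α(r + γ·max_a' Q(s,a') − Q(s,a)). The sketch below applies it to a toy one-state hyperparameter space; the states, actions, and the reward stand-in are illustrative assumptions, not the paper's formulation.

```python
# Minimal tabular Q-learning over a toy hyperparameter space (learning rate).
import random
from collections import defaultdict

random.seed(0)
lr_choices = [1e-4, 1e-3, 1e-2]           # toy hyperparameter: learning rate
Q = defaultdict(float)                     # Q[(state, action)] -> value
alpha, gamma, epsilon = 0.5, 0.9, 0.2

def reward(action: int) -> float:          # stand-in for validation accuracy
    return {1e-4: 0.80, 1e-3: 0.90, 1e-2: 0.70}[lr_choices[action]]

state = 0                                  # single-state, bandit-style search
for _ in range(200):
    # Epsilon-greedy action selection over the three configurations.
    a = (random.randrange(3) if random.random() < epsilon
         else max(range(3), key=lambda i: Q[(state, i)]))
    r = reward(a)
    best_next = max(Q[(state, i)] for i in range(3))
    Q[(state, a)] += alpha * (r + gamma * best_next - Q[(state, a)])

best = max(range(3), key=lambda i: Q[(state, i)])
print(lr_choices[best])                    # 0.001, the best-rewarded choice
```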
... Kalsum et al. (2021) proposed fusion of global (HoG) and local based feature descriptors (Local Intensity Order Pattern (LIOP)) for facial emotion recognition. Reddy et al. (2021) proposed Deep Cross Feature Adaptive Network (DCFA-CNN) for extracting both high level responses as well as minute variations from a facial image. Verma et al. (2022) proposed Cross-Centroid Ripple Pattern (CRIP) for encoding the image features by using inter radial ripples. ...
Article
Full-text available
Facial expressions can convey the internal emotions of a person within a certain scenario and play a major role in the social interaction of human beings. In automatic Facial Expression Recognition (FER) systems, the method applied for feature extraction plays a major role in determining the performance of a system. In this regard, by drawing inspiration from the Swastik symbol, three texture based feature descriptors named Symbol Patterns (SP1, SP2 and SP3) have been proposed for facial feature extraction. SP1 generates one pattern value by comparing eight pixels within a 3 × 3 neighborhood, whereas SP2 and SP3 generate two pattern values each by comparing twelve and sixteen pixels within a 5 × 5 neighborhood, respectively. In this work, the proposed Symbol Patterns (SP) have been evaluated with natural, fibonacci, odd, prime, squares and binary weights for determining the optimal recognition accuracy. The proposed SP methods have been tested on the MUG, TFEID, CK+, KDEF, FER2013 and FERG datasets, and the results from the experimental analysis demonstrated an improvement in the recognition accuracy when compared to the existing FER methods.
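The SP1 descriptor described above (one pattern value from eight pixel comparisons in a 3 × 3 neighborhood) is structurally similar to a local binary pattern. The sketch below assumes the generic LBP comparison rule and binary weights; the paper's Swastik-inspired sampling order and weight choices may differ.

```python
# Minimal LBP-style descriptor: 8 neighbour comparisons weighted into one value.
import numpy as np

def sp1_like(img: np.ndarray) -> np.ndarray:
    # Offsets of the 8 neighbours, clockwise from the top-left corner.
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = img.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    centre = img[1:-1, 1:-1]
    for bit, (dy, dx) in enumerate(offs):
        neigh = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        out |= (neigh >= centre).astype(np.uint8) << bit  # binary weight 2^bit
    return out  # one pattern value per interior pixel

img = np.random.randint(0, 256, (48, 48), dtype=np.uint8)
print(sp1_like(img).shape)  # (46, 46)
```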
... Facial expression recognition (FER) has been developed as an important research topic owing to its wide application [1][2][3][4]. Although deep learning models achieve competitive performance in FER, they mostly rely on vast amounts of annotated samples during the training stage. ...
Preprint
Full-text available
The limited scale of expression samples generally causes performance degradation for facial expression recognition methods. Also, the correlation between different expressions is often ignored when performing the feature extraction process. Given the above, we propose a novel approach that develops multi-class differentiation feature representation guided joint dictionary learning for FER. The proposed approach mainly includes two steps: firstly, we construct multi-class differentiation feature dictionaries corresponding to different expressions of training samples, aiming to enlarge the inter-expression distance to mitigate the problem of nonlinear distribution in training samples. Secondly, we jointly learn the multiple feature dictionaries by optimizing the resolutions of each feature dictionary, aiming to establish a strong relationship and enhance the representation ability among multiple feature dictionaries. To sum up, the proposed approach has more discriminative ability from the representation perspective. Comprehensive experiments carried out on three public datasets, including the JAFFE, CK+, and KDEF datasets, demonstrate that the proposed approach has strong performance for small-scale samples compared to several state-of-the-art methods.
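The per-class dictionary idea in the first step above can be approximated with off-the-shelf dictionary learning: one dictionary per expression class, with classification by smallest reconstruction residual. The joint optimisation across dictionaries, which is the paper's core contribution, is deliberately omitted in this sketch, and the toy data stands in for real expression features.

```python
# Minimal per-class dictionary learning with residual-based classification.
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
X_by_class = {c: rng.normal(size=(40, 64)) for c in range(3)}  # toy features

# One small dictionary per expression class (jointly learning them is omitted).
dicts = {c: DictionaryLearning(n_components=8, max_iter=20,
                               transform_algorithm="lasso_lars",
                               random_state=0).fit(X)
         for c, X in X_by_class.items()}

def classify(x: np.ndarray) -> int:
    errs = {}
    for c, d in dicts.items():
        code = d.transform(x[None, :])          # sparse code of x
        recon = code @ d.components_            # reconstruction from dictionary c
        errs[c] = np.linalg.norm(x - recon[0])  # per-class residual
    return min(errs, key=errs.get)              # class whose dictionary fits best

print(classify(rng.normal(size=64)))
```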