Chapter

Deep Learning Approaches for Facial Emotion Recognition: A Case Study on FER-2013

Authors: Giannopoulos, Perikos, and Hatzilygeroudis

Abstract

Emotions constitute an innate and important aspect of human behavior that shapes the way humans communicate. The accurate analysis and interpretation of the emotional content of human facial expressions is essential for a deeper understanding of human behavior. Although a human can detect and interpret faces and facial expressions naturally, with little or no effort, accurate and robust facial expression recognition by computer systems is still a great challenge. Analyzing the characteristics of the human face and recognizing its emotional states are considered very difficult tasks. The main difficulties come from the non-uniform nature of the human face and from variations in conditions such as lighting, shadows, facial pose and orientation. Deep learning approaches have been examined as a stream of methods that achieve robustness and provide the necessary scalability on new types of data. In this work, we examine the performance of two well-known deep learning architectures (GoogLeNet and AlexNet) on facial expression recognition, specifically on recognizing whether emotional content is present and on recognizing the exact emotional content of facial expressions. The results collected from the study are quite interesting.
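As a rough illustration of the kind of pipeline the chapter examines (not the authors' own code), the sketch below fine-tunes ImageNet-pretrained AlexNet and GoogLeNet heads for seven-class FER-2013 classification with PyTorch/torchvision; the data path, hyperparameters, and the recent-torchvision `weights="DEFAULT"` API are assumptions.

```python
# Hedged sketch (not the authors' code): fine-tuning pretrained AlexNet and
# GoogLeNet heads for 7-class facial expression recognition on FER-2013-style
# 48x48 grayscale images. Assumes PyTorch/torchvision and an ImageFolder layout.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

NUM_CLASSES = 7  # angry, disgust, fear, happy, sad, surprise, neutral

# FER-2013 images are 48x48 grayscale; replicate to 3 channels and resize
# to the 224x224 input both ImageNet architectures expect.
preprocess = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

def build_model(name: str) -> nn.Module:
    if name == "alexnet":
        net = models.alexnet(weights="DEFAULT")
        net.classifier[6] = nn.Linear(4096, NUM_CLASSES)  # replace the 1000-way head
    else:
        net = models.googlenet(weights="DEFAULT")
        net.fc = nn.Linear(1024, NUM_CLASSES)
    return net

if __name__ == "__main__":
    train_set = datasets.ImageFolder("fer2013/train", transform=preprocess)  # hypothetical path
    loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)
    model = build_model("alexnet")
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
    criterion = nn.CrossEntropyLoss()
    model.train()
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```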


... Accuracy comparison (%): Attentional CNN [27] 70.02; VGG + SVM [49] 66.31; GoogleNet [50] 65.20; VGG backbone [51] 75.00; VGG-19 [48] 65.41; Proposed model 84.30 ...
... Accuracy comparison (%): Attentional CNN [27] 92.80; LBP + ORG features [49] 88.50; Deep features + HOG [50] 90.58; CNN + SVM [51] 95.31; VGG-19 [48] 99.47; Proposed model 95.41. ... in social encounters, and examining facial expressions in forensic investigations. CNN-10, if properly deployed, can supplement conventional techniques of behavior analysis by providing impartial and automated evaluations. ...
Article
Full-text available
The importance of facial expressions in nonverbal communication is significant because they help better represent the inner emotions of individuals. Emotions can depict the state of health and internal wellbeing of individuals. Facial expression detection has been a hot research topic in the last couple of years. The motivation for applying the convolutional neural network-10 (CNN-10) model for facial expression recognition stems from its ability to detect spatial features, manage translation invariance, understand expressive feature representations, gather global context, and achieve scalability, adaptability, and interoperability with transfer learning methods. This model offers a powerful instrument for reliably detecting and comprehending facial expressions, supporting usage in recognition of emotions, interaction between humans and computers, cognitive computing, and other areas. Earlier studies have developed different deep learning architectures to offer solutions to the challenge of facial expression recognition. Many of these studies perform well on datasets of images taken under controlled conditions, but they fall short on more difficult datasets with more image diversity and incomplete faces. This paper applied CNN-10 and ViT models for facial emotion classification. The performance of the proposed models was compared with that of VGG19 and INCEPTIONV3. The CNN-10 outperformed the other models on the CK+ dataset with a 99.9% accuracy score, on FER-2013 with an accuracy of 84.3%, and on JAFFE with an accuracy of 95.4%.
... Using Python programming language and the FER-2013 [3] dataset, we offer a FEs recognition model in the current study. The model, which we trained using a convolutional neural network (CNN) [4] architecture, has a validation accuracy greater than 70 percent. ...
Conference Paper
Full-text available
Facial Emotion Recognition is one of the in-demand and rapidly growing research topics in the domain of Computer Vision (CV) and artificial intelligence (AI). The ability to identify or detect human emotions from real-time facial expressions (FEs) has vast conceivable applications in different domains, such as sentiment analysis, human-computer interaction, human resource management, security, and human psychology. In this paper, a Convolutional Neural Network (CNN) based deep learning model is trained with a Haar-cascade classifier to recognize real-time FEs. The suggested model is specially trained to categorize FEs into one of seven emotion categories, namely the six basic emotions (sad, happy, angry, surprised, disgusted, fear) and a neutral emotion. It includes several convolutional layers followed by max-pooling layers, fully connected neurons, and a soft-max activation function for the corresponding seven classes. ReLU activation functions are used along with various kernels to enhance filtering depth and the extraction of facial features. The FER-2013 dataset is used for experimentation purposes. To improve the classification performance and model accuracy, a data augmentation technique is used for rescaling and horizontal flipping. The proposed model outperforms previous related works by achieving a validation accuracy of 71.96% and a training accuracy above 90%, with fewer epochs.
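A minimal sketch of the kind of setup this abstract describes (a small CNN on FER-2013 with rescaling and horizontal-flip augmentation), written with Keras; the directory layout, layer sizes, and epoch count are illustrative assumptions rather than the paper's exact configuration.

```python
# Hedged sketch, not the paper's exact network: a small Keras CNN for 7-class
# FER-2013 with the augmentation the abstract mentions (rescaling + horizontal flips).
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_gen = ImageDataGenerator(rescale=1.0 / 255, horizontal_flip=True)
train_data = train_gen.flow_from_directory(
    "fer2013/train",            # hypothetical path, one subfolder per emotion
    target_size=(48, 48), color_mode="grayscale", class_mode="categorical")

model = models.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(48, 48, 1)),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(7, activation="softmax"),   # seven emotion classes
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(train_data, epochs=30)
```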
... The paper on "Deep Learning Approaches for Emotion Recognition," authored by Smith and Johnson, underscores the pivotal role of deep learning in emotion recognition (Giannopoulos et al., 2018). Their study explores the effectiveness of deep neural networks in extracting nuanced emotional cues from images, fostering a paradigm shift in the field. ...
Article
In the era of the twenty-first century, an era characterized by the proliferation of digital technology, big data and so on, the ability to identify human emotions through visual content from images has gained much importance, and its popularity is increasing worldwide. This project deals with the task of detecting emotions from images using deep learning techniques, with a specific emphasis on MobileNet-based architectures. We start the project by preparing a dataset of various images showing diverse emotions. The MobileNet architecture, a powerful convolutional neural network, is fine-tuned with a custom dense layer to classify emotions into seven distinct categories. Data augmentation techniques such as zooming, shearing and horizontal flipping are incorporated to enhance robustness and prevent overfitting. The training dataset is preprocessed and normalized, while a segregated validation dataset ensures stringent evaluation. During training we implemented early stopping and model checkpoint mechanisms to get optimal performance while avoiding overfitting. After training, the analysis of accuracy and loss metrics provides insight into the model's trajectory. For practical applicability, we use the trained model to predict emotion from single images, showcasing its potential in various domains, including digital marketing, healthcare, and user experience design. In today's digital landscape the project findings hold relevance for a wide spectrum of applications, promising advancements in human-computer interaction and emotion-aware systems.
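The following is a hedged Keras sketch of the MobileNet transfer-learning recipe outlined above (frozen MobileNet base, custom dense head, zoom/shear/flip augmentation, early stopping and checkpointing); the paths, layer widths, and callback settings are assumptions, not the project's reported values.

```python
# Hedged sketch of a MobileNet transfer-learning setup for 7-class emotion recognition.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import MobileNet
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.preprocessing.image import ImageDataGenerator

aug = ImageDataGenerator(rescale=1.0 / 255, zoom_range=0.2,
                         shear_range=0.2, horizontal_flip=True)
train = aug.flow_from_directory("emotions/train", target_size=(224, 224),
                                class_mode="categorical")        # hypothetical path
val = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
    "emotions/val", target_size=(224, 224), class_mode="categorical")

base = MobileNet(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False                          # train only the custom head
model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),       # custom dense layer
    layers.Dense(7, activation="softmax"),      # seven emotion categories
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(train, validation_data=val, epochs=50,
          callbacks=[EarlyStopping(patience=5, restore_best_weights=True),
                     ModelCheckpoint("best_mobilenet.h5", save_best_only=True)])
```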
... In recent years, convolutional-based deep FER models have been shown to consistently outperform SIFT algorithms when it comes to scalability and generalizability in FER tasks (Goodfellow et al., 2013). Moreover, recent state-of-the-art advancements in deep learning models for videobased facial expression recognition analysis allowed us to classify the participants' facial emotions on a wider spectrum on a continuous range, which allows more flexible and robust frameworks for multi-class emotion classification by modification of the SoftMax classification layer (Giannopoulos et al., 2018). On top of that, there have also been innovations in deep learning-based FER in the wild research, which provided the ability to analyze human emotions directly from video recordings, without the need for explicit lab-control environments. ...
Article
Full-text available
Artificial intelligence (AI) has been recognised as a promising technology for methodological progress and theoretical advancement in learning sciences. However, there remains few empirical investigations into how AI could be applied in learning sciences research. This study aims to utilize AI facial recognition to inform the learning regulation behaviors in synchronous online collaborative learning environments. By studying groups of university students (N = 36) who participated in their online classes under the COVID-19 social distancing mandates, we strive to understand the interrelation between individual affective states and their collaborative group members. Theoretically underpinned by the socially shared regulation of learning framework, our research features a cutting-edge insight into how learners socially shared regulation in group-based tasks. Findings accentuate fundamental added values of AI application in education, whilst indicating further interesting patterns about student self-regulation in the collaborative learning environment. Implications drawn from the study hold strong potential to provide theoretical and practical contributions to the exploration of AI supportive roles in designing and personalizing learning needs, as well as fathom the motion and multiplicity of collaborative learning modes in higher education.
... The elucidation of the underlying mechanisms through which complex physiological and psychological activities give rise to human emotions continues to be a focal point in contemporary research, holding a rich tradition of multifaceted inquiry. This domain substantially intersects with the development and refinement of emotion recognition and analytic methodologies, leveraging indicators such as facial expressions [1,2] and vocal and/or speech patterns [3,4] as pertinent vectors for investigation. Such efforts represent an ongoing commitment to deconstructing the intricate puzzle that is the genesis and manifestation of human emotional responses. ...
Article
Full-text available
A kernel attention module (KAM) is presented for the task of EEG-based emotion classification using neural network based models. In this study, it is shown that the KAM method can lead to more efficient and accurate models using only a single-parameter design. This additional parameter can be leveraged as an interpretable scalar quantity for examining the overall amount of attention needed during deep feature refinement. Extensive experiments are analyzed on both the SEED and DEAP datasets to demonstrate the module's performance on subject-dependent classification tasks. From these benchmark studies, it is shown that KAM is able to boost the backbone model's mean prediction accuracy by more than 3% on some subjects and by more than 1%, on average, across 15 subjects in the SEED dataset for subject-dependent tasks. In the DEAP dataset, the improvement is more significant, achieving greater than 3% improvement in the overall mean accuracy versus the no-attention case, and more than 1–2% when benchmarked against various other state-of-the-art attention modules. In addition, the predictive dependencies of KAM with respect to its single parameter are numerically examined up to first order. Accompanying analyses and visualization techniques are also proposed for interpreting the KAM attention module's effects and its interaction with the backbone model's predictive behaviors. These quantitative results can be explored in greater depth to identify correlations with pertinent clinical neuroscientific observations. Finally, a formal mathematical proof of KAM's permutation equivariance property is included.
... Giannopoulos, et al. [23] intended to analyze human characteristics based on facial expressions with the help of deep learning architectures such as AlexNet and GoogleNet. Also, the performance of widely used deep learning algorithms was compared in order to select the most suitable methodology for facial expression detection, including CNN, Deep Belief Network (DBN), and Deep Boltzmann Machines (DBM). ...
Article
Full-text available
Facial emotion recognition plays a vital role in the field of human-computer interaction, since communication is significantly influenced by emotion. Conventional deep learning techniques suffer from computational burden, high resource requirements, and system complexity, so deep learning has had relatively limited application. Therefore, the proposed work intends to utilize a novel and computationally effective deep learning model for an automated facial expression recognition system. Here, an Automatic Direct Face Filtering (ADFF) method is used to filter, remove noise, and improve the quality of the face image. By using the components of the face's features, a unique Weighted Deep Convolution Model (WDCM) technique is used to precisely predict the emotion from the face image. Furthermore, an African Vultures Optimization Algorithm (AVOA) is used to optimize the number of features in order to streamline the recognition process with minimal computational load and time, improving prediction accuracy. The inclusion of ADFF and AVOA is the major reason for the better classification performance of the proposed model, because it boosts the training and testing performance of the WDCM with low time consumption and high processing speed by providing the best optimal features for recognition. Moreover, the performance and results of the proposed WDCM-AVOA technique are validated and compared using the popular JAFFE and CK+ datasets. Using the proposed framework, the overall average recognition accuracy is improved up to 99% and the average prediction is boosted to 99.2% for both datasets.
... Bing images: We use the Bing API service to query for images using each of the emotional categories as search query parameters. FER2013 [21]: The FER2013 dataset has seven emotion categories, i.e., Angry, Disgust, Fear, Happy, Sad, Surprise, and Neutral. Among them, we use the images of the Fear, Happy & Sad categories. ...
Conference Paper
Abstract—When we search for the word 'sad' on an online search engine, we can find different types of images like broken hearts, rainy days, sad emoticons, & people sitting with their heads down. These pictures are distinct, yet they express the same emotion: 'sadness'. Different researchers have concentrated on emotion detection from images using facial expressions, but only a few studies address emotion detection from both face & non-face images. We create a dataset by collecting data from various social media. We propose a method for emotion classification from images where we adopt the deep learning concept along with a few pre-processing steps including data augmentation & filtering. Our approach classifies images into five different emotional categories. Our approach shows promising experimental results in emotion classification. The accuracy of our proposed approach is 85 percent.
... For facial emotion analysis, we decided on PyTorch for deep learning. We combined virtual character data for seven fundamental emotions (surprise, sadness, neutral, happiness, disgust, fear, and anger) with the FER2013 database to create our dataset (Giannopoulos et al., 2018). Machine learning training yielded a pth model, which we subsequently converted to the ONNX model format using PyTorch. ...
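A minimal sketch of the export step mentioned in the excerpt above, converting a trained PyTorch checkpoint to ONNX with `torch.onnx.export`; the model file names, input size, and opset version are assumptions rather than the cited project's settings.

```python
# Hedged sketch: load a trained PyTorch checkpoint (.pth) and convert it to ONNX.
import torch

model = torch.load("fer_model.pth", map_location="cpu")  # or load a state_dict into your model class
model.eval()

dummy = torch.randn(1, 1, 48, 48)       # one 48x48 grayscale face, FER-2013 style
torch.onnx.export(
    model, dummy, "fer_model.onnx",
    input_names=["face"], output_names=["emotion_logits"],
    dynamic_axes={"face": {0: "batch"}},  # allow variable batch size at inference
    opset_version=13)
```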
Article
Full-text available
This study delved into the realm of facial emotion recognition within virtual reality (VR) environments. Using a novel system with MobileNet V2, a lightweight convolutional neural network, we tested emotion detection on 15 university students. High recognition rates were observed for emotions like "Neutral", "Happiness", "Sadness", and "Surprise". However, the model struggled with "Anger" and "Fear", often confusing them with "Neutral". These discrepancies might be attributed to overlapping facial indicators, limited training samples, and the precision of the devices used. Nonetheless, our research underscores the viability of using facial emotion recognition technology in VR and recommends model improvements, the adoption of advanced devices, and a more holistic approach to foster the future development of VR emotion recognition.
... All images and videos were captured against a constant background. The FER 2013 dataset is widely utilized for facial emotion recognition and comprises 35,887 grayscale images collected from the internet and labeled by crowdsourced workers [26]. Further datasets for facial emotion recognition encompass EmoReact, MMI, and RAF-DB [21,22]. ...
Article
Full-text available
Facial expression recognition (FER) poses a complex challenge due to diverse factors such as facial morphology variations, lighting conditions, and cultural nuances in emotion representation. To address these hurdles, specific FER algorithms leverage advanced data analysis for inferring emotional states from facial expressions. In this study, we introduce a universal validation methodology assessing any FER algorithm's performance through a web application where subjects respond to emotive images. We present the labelled database, FeelPix, generated from facial landmark coordinates during FER algorithm validation. FeelPix is available to train and test generic FER algorithms, accurately identifying users' facial expressions. A testing algorithm classifies emotions based on FeelPix data, ensuring its reliability. Designed as a computationally lightweight solution, it finds applications in online systems. Our contribution improves facial expression recognition, enabling the identification and interpretation of emotions associated with facial expressions, offering profound insights into individuals' emotional reactions. This contribution has implications for healthcare, security, human-computer interaction, and entertainment.
... A screen displaying a Virtual Reality game implemented in Unity provides visual feedback about the stick model of the arm by exploiting the computed joint angles and shows demonstration videos recorded by physiotherapists. The COGI module estimates the emotional status of the participants from the FE using a convolutional neural network trained on FER-2013 dataset images to recognize three discrete emotions, i.e., "Happy," "Neutral," and "Sad" [41]. The images retrieved from the RGB camera integrated into the TIAGo head are exploited. ...
Article
Full-text available
The communication channels between physiotherapists and patients are many and varied. Rehabilitation robots are able to deliver intensive treatments and improve the patient's quality of life. However, rehabilitation robots in the literature do not integrate physical manipulation with natural verbal communication yet. This article proposes an innovative integrated system for motor rehabilitation based on the combination of physical and cognitive components to emulate the natural interaction between physiotherapists and patients. The proposed approach was validated in a laboratory setting with 20 healthy subjects. The cognitive system's ability to interact linguistically as well as the participants' kinematic performance and the emotional impact generated by two different robotic systems were assessed. The former integrates advanced linguistic capabilities and the latter lacks any verbal communication. The results showed that the presence of linguistic interaction promotes the quality of interaction, leading to improvements both in the execution of movements and in emotional terms.
... Accuracy (%): GoogleNet [36] 65.20; Deep Emotion [37] 70.02; Inception [38] 71.60; ConvNeXt-Tiny 71.99; Ad-Corre [16] 72.03; ConvNeXt-Small 72.34; SE-Net50 [39] 72.50; ResNet50 [39] 73. With notable progress, EmoNeXt-Small exhibits enhanced performance compared to EmoNeXt-Tiny, attaining an accuracy of 74.33%. This achievement surpasses advanced architectures like the Residual Masking Network (74.14%) and LHC-NetC (74.28%), as well as the last two sizes of the original ConvNeXt (Large and XLarge). ...
Conference Paper
Full-text available
Facial expressions play a crucial role in human communication serving as a powerful and impactful means to express a wide range of emotions. With advancements in artificial intelligence and computer vision, deep neural networks have emerged as effective tools for facial emotion recognition. In this paper, we propose EmoNeXt, a novel deep learning framework for facial expression recognition based on an adapted ConvNeXt architecture network. We integrate a Spatial Transformer Network (STN) to focus on feature-rich regions of the face and Squeeze-and-Excitation blocks to capture channel-wise dependencies. Moreover, we introduce a self-attention regularization term, encouraging the model to generate compact feature vectors. We demonstrate the superiority of our model over existing state-of-the-art deep learning models on the FER2013 dataset regarding emotion classification accuracy.
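One ingredient named in the abstract, the Squeeze-and-Excitation block, can be sketched in a few lines of PyTorch; the reduction ratio and placement below are generic assumptions rather than EmoNeXt's exact design.

```python
# Hedged sketch of a Squeeze-and-Excitation (SE) block that reweights the channels
# of a feature map; not EmoNeXt's implementation.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = x.mean(dim=(2, 3))            # squeeze: global average pool -> (b, c)
        w = self.fc(w).view(b, c, 1, 1)   # excite: per-channel weights in [0, 1]
        return x * w                      # rescale channels of the feature map

# Example: reweight a ConvNeXt-style feature map with 96 channels.
features = torch.randn(2, 96, 56, 56)
print(SEBlock(96)(features).shape)        # torch.Size([2, 96, 56, 56])
```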
... Figure 4 indicates that the training accuracy curve becomes steady when the epochs approach 2000. We fine-tune the YOLOv3 parameters by referring to previous research [36][37][38]. First, we set the batch size to 64, momentum to 0.9, and weight decay to 0.0005, following a standard object detection setup. ...
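A minimal sketch of those quoted hyperparameters applied to a PyTorch SGD optimizer; the stand-in module and learning rate are assumptions (the excerpt only fixes batch size, momentum, and weight decay).

```python
# Hedged sketch: the momentum and weight-decay values quoted above on an SGD optimizer.
import torch
import torch.nn as nn

model = nn.Conv2d(3, 16, 3)            # stand-in for a YOLOv3-style backbone layer
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3,
                            momentum=0.9, weight_decay=5e-4)  # values from the excerpt
# The excerpt's batch size of 64 would be set on the DataLoader feeding this optimizer.
```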
Article
Full-text available
Deep-learning based facial emotion recognition (FER) has potential in the service industry. To solve conventional emotion recognition problems, this study proposes a deep learning-based model to achieve a highly efficient FER system. The proposed FER model consists of two stages. The first stage uses a multitask convolutional neural network to determine precise face-bounding-box positions, whereas the second adopts a deep-learning network to achieve real-time recognition of emotions and features. By training our model using three global FER datasets, its accuracy indicates that the proposed model outperforms existing FER models. The study illustrates the model from three aspects. First, a massive facial database is investigated for model feasibility with a variety of service scenarios. Second, we demonstrate practical examples in the restaurant and retailing service industries. Third, the model provides advice by monitoring the player's emotions while assembling Lego. The model can analyze human emotions in the service industry to identify customer satisfaction with products and/or services, fatigue in working domains, and/or safety in jobs. The model will help improve customer relationships, provide pleasant transaction solutions, and even help to broaden product offerings and promotions. Such facial recognition technology can further motivate new digital business models and change customer-server dynamics.
... Nevertheless, multiple challenges are encountered when incorporating human emotion recognition for safe driving on roads. Figure 2 represents the challenges to facial emotion recognition as identified by the authors in the reviewed literature (Giannopoulos, Perikos, and Hatzilygeroudis, 2018) (Verma and Choudhary, 2018) (Bhattacharya and Gupta, 2019) (Theagarajan et al., 2017). Figure 3 depicts the challenges to speech emotion recognition (Basu, Chakraborty, Bag and Aftabuddin, 2017). ...
Conference Paper
Full-text available
Accidents on roads have been a serious issue for decades around the world. As a solution to this issue, driver emotion recognition has gained much attention, where the affective states of drivers are monitored. In the context of driver emotion recognition, both physiological and non-physiological signals are utilised in identifying the emotional states of drivers. Among the approaches taken by researchers to determine driver emotional states, facial emotions, speech emotions, Galvanic Skin Response (GSR), Electrocardiogram (ECG) signals, Electroencephalography (EEG) signals, etc., are the most prominent. Nevertheless, physiological signals are a valuable asset in identifying emotional states, since non-physiological signals such as facial emotion recognition, which is mainly used to detect driver affective states, can be misleading. This study aims to review the literature related to driver emotion recognition aimed at ensuring the safety of road users. Furthermore, the approaches taken by the researchers in the reviewed literature have been briefly discussed, and the challenges to these approaches have been further discussed to enhance the safety of road users and future research in the paradigm of driver emotion recognition.
... There is also much related work in affect identification. For example, convolution-based models are typical in image tasks, and can be applied to affect identification specifically (Giannopoulos, Perikos, and Hatzilygeroudis 2018). The very deep convolutional neural network (VGG) architecture achieves 73.28% accuracy on the 8-class FER2013 dataset (Khaireddin and Chen 2021). ...
Article
Full-text available
It is common to listen to songs that match one's mood. Thus, an AI music recommendation system that is aware of the user's emotions is likely to provide a superior user experience to one that is unaware. In this paper, we present an emotion-aware music recommendation system. Multiple models are discussed and evaluated for affect identification from a live image of the user. We propose two models: DRViT, which applies dynamic routing to vision transformers, and InvNet50, which uses involution. All considered models are trained and evaluated on the AffectNet dataset. Each model outputs the user's estimated valence and arousal under the circumplex model of affect. These values are compared to the valence and arousal values for songs in a Spotify dataset, and the top-five closest-matching songs are presented to the user. Experimental results of the models and user testing are presented.
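The song-matching step described above reduces to a nearest-neighbour search in valence-arousal space; the sketch below uses a made-up song table purely for illustration, not the paper's Spotify data.

```python
# Hedged sketch: match a predicted (valence, arousal) point to the five closest songs.
import numpy as np

songs = {                      # illustrative stand-ins for per-song valence/arousal
    "Song A": (0.80, 0.70), "Song B": (0.20, 0.30), "Song C": (0.65, 0.55),
    "Song D": (0.10, 0.90), "Song E": (0.50, 0.50), "Song F": (0.75, 0.60),
}

def top_five(valence: float, arousal: float) -> list[str]:
    user = np.array([valence, arousal])
    dists = {name: np.linalg.norm(user - np.array(va)) for name, va in songs.items()}
    return sorted(dists, key=dists.get)[:5]   # five closest in valence-arousal space

print(top_five(0.7, 0.6))
```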
... The authors of [39] created an identity-aware CNN (IA-CNN) that uses identity- and expression-sensitive contrastive loss to reduce the variation of expression-related information during identity learning. Similarly, they have developed a network architecture with a focal model called end-to-end network architecture [40]. To minimize uncertainty and prevent unclear face images (caused by labeling noise) from overfitting the deeper network, the authors in [41] devised a quick and efficient self-repair technique (SCN). ...
Article
Full-text available
Driver emotion classification is an important topic that can raise awareness of driving habits because many drivers are overconfident and unaware of their bad driving habits. Drivers will acquire insight into their poor driving behaviors and be better able to avoid future accidents if their behavior is automatically identified. In this paper, we use different models, such as convolutional neural networks, recurrent neural networks, and multi-layer perceptron classifiers, to construct an ensemble convolutional neural network-based enhanced driver facial expression recognition model. First, the faces of the drivers are detected using the faster region-based convolutional neural network (R-CNN) model, which can recognize faces in real-time and offline video reliably and effectively. The feature-fusing technique is utilized to integrate the features extracted from three CNN models, and the fused features are then used to train the suggested ensemble classification model. To increase the accuracy and efficiency of face detection, a new convolutional neural network block (InceptionV3) replaces the improved Faster R-CNN feature-learning block. On the face detection and driver facial expression recognition (DFER) datasets, we achieved accuracies of 98.01%, 99.53%, 99.27%, 96.81%, and 99.90% on the JAFFE, CK+, FER-2013, AffectNet, and custom-developed datasets, respectively. The custom-developed dataset has been recorded as the best among all under the simulation environment.
... Mini-Xception [26], which is one of the state-of-the-art methods for extracting emotion from facial expressions, was used for this study. The Mini-Xception model has achieved an accuracy of around 95.60% on the FER-2013 dataset [27]. This convolutional neural network (CNN) model has been used to build various real-time systems. ...
Article
Full-text available
Students' affective states describe their engagement, concentration, attitude, motivation, happiness, sadness, frustration, off-task behavior, and confusion level in learning. In online learning, students' affective states are determinative of the learning quality. However, measuring various affective states and what influences them is exceedingly challenging for the lecturer without having real interaction with the students. Existing studies primarily use self-reported data to understand students' affective states, while this paper presents a novel learning analytics system called MOEMO (Motion and Emotion) that can measure online learners' affective states of engagement and concentration using emotion data. Therefore, the novelty of this research is to visualize online learners' affective states on lecturers' screens in real-time using an automated emotion detection process. In real-time and offline, the system extracts emotion data by analyzing facial features from the lecture videos captured by the typical built-in web camera of a laptop computer. The system determines online learners' five types of engagement ("strong engagement", "high engagement", "medium engagement", "low engagement", and "disengagement") and two types of concentration levels ("focused" and "distracted"). Furthermore, the dashboard is designed to provide insight into students' emotional states and the clusters of engaged and disengaged students, to assist with intervention, to create an after-class summary report, and to configure the automation parameters to adapt to the study environment.
... AffectNet [22], FER2013 [23], CK+ [24], KDEF [25]) as well as datasets of cartoon characters (FERG dataset [26]). FER studies typically aim to reach the best classification on one or several of these datasets [27,28,29,30,31]. Others investigate transfer learning (domain adaptation) [32,33] from a source (human) to a target dataset (another human). ...
Preprint
Full-text available
People can innately recognize human facial expressions in unnatural forms, such as when depicted on the unusual faces drawn in cartoons or when applied to an animal's features. However, current machine learning algorithms struggle with out-of-domain transfer in facial expression recognition (FER). We propose a biologically-inspired mechanism for such transfer learning, which is based on norm-referenced encoding, where patterns are encoded in terms of difference vectors relative to a domain-specific reference vector. By incorporating domain-specific reference frames, we demonstrate high data efficiency in transfer learning across multiple domains. Our proposed architecture provides an explanation for how the human brain might innately recognize facial expressions on varying head shapes (humans, monkeys, and cartoon avatars) without extensive training. Norm-referenced encoding also allows the intensity of the expression to be read out directly from neural unit activity, similar to face-selective neurons in the brain. Our model achieves a classification accuracy of 92.15% on the FERG dataset with extreme data efficiency. We train our proposed mechanism with only 12 images, including a single image of each class (facial expression) and one image per domain (avatar). In comparison, the authors of the FERG dataset achieved a classification accuracy of 89.02% with their FaceExpr model, which was trained on 43,000 images.
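A toy sketch of the norm-referenced encoding idea, assuming random stand-in feature vectors: a pattern is expressed as a difference vector from a domain-specific reference (e.g., the neutral face of one avatar), its direction selects the expression class, and its length reads out intensity. This is an illustration of the principle, not the paper's architecture.

```python
# Hedged sketch of norm-referenced encoding with made-up 128-dimensional features.
import numpy as np

rng = np.random.default_rng(0)
reference = rng.normal(size=128)                   # neutral face of one domain
expression_dirs = {                                # "difference" directions per class
    "happy": rng.normal(size=128),
    "angry": rng.normal(size=128),
    "fear": rng.normal(size=128),
}
expression_dirs = {k: v / np.linalg.norm(v) for k, v in expression_dirs.items()}

def classify(face_vec: np.ndarray) -> tuple[str, float]:
    diff = face_vec - reference                    # norm-referenced difference vector
    intensity = np.linalg.norm(diff)               # expression-intensity read-out
    direction = diff / (intensity + 1e-9)
    scores = {k: float(direction @ d) for k, d in expression_dirs.items()}
    return max(scores, key=scores.get), intensity

sample = reference + 2.0 * expression_dirs["happy"] + 0.1 * rng.normal(size=128)
print(classify(sample))                            # ~('happy', about 2.0)
```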
... Multiple human senses can be stimulated to develop emotions through the use of audio-visual information employed in multisensory media studies. The examination of facial expressions or neuro-physiological signals has been the primary focus of databases for the research of affect recognition based on visual modalities [22][23][24][25][26]. Yet, despite the fact that eye movements have been shown to be valuable indicators of affective response [18], few researchers have concentrated on the creation of relevant databases. ...
Article
Full-text available
Affective state estimation is a research field that has gained increased attention from the research community in the last decade. Two of the main catalysts for this are the advancement in the data analysis using artificial intelligence and the availability of high-quality video. Unfortunately, benchmarks and public datasets are limited, thus making the development of new methodologies and the implementation of comparative studies essential. The current work presents the eSEE-d database, which is a resource to be used for emotional State Estimation based on Eye-tracking data. Eye movements of 48 participants were recorded as they watched 10 emotion-evoking videos, each of them followed by a neutral video. Participants rated four emotions (tenderness, anger, disgust, sadness) on a scale from 0 to 10, which was later translated in terms of emotional arousal and valence levels. Furthermore, each participant filled three self-assessment questionnaires. An extensive analysis of the participants' answers to the questionnaires' self-assessment scores as well as their ratings during the experiments is presented. Moreover, eye and gaze features were extracted from the low-level eye-recorded metrics, and their correlations with the participants' ratings are investigated. Finally, we take on the challenge to classify arousal and valence levels based solely on eye and gaze features, leading to promising results. In particular, the Deep Multilayer Perceptron (DMLP) network we developed achieved an accuracy of 92% in distinguishing positive valence from non-positive and 81% in distinguishing low arousal from medium arousal. The dataset is made publicly available.
Article
Facial emotion recognition is a vital area within computer vision and artificial intelligence, with significant applications in human-computer interaction, security, and healthcare. This research presents a novel approach for identifying facial emotions through the use of Convolutional Neural Networks (CNNs). We provide a comprehensive overview of the CNN architecture, the dataset utilized, the preprocessing techniques employed, the training methodology, and the results achieved. Our approach demonstrates exceptional accuracy in detecting a range of emotions, including happiness, sadness, anger, and surprise. Additionally, this study explores the implications of our findings and suggests potential improvements and future research directions to enhance the performance and applicability of facial emotion recognition systems.
Article
With the rapid development of the Internet, the number of social media and e-commerce platforms has increased dramatically. Users from all over the world sharing their comments and sentiments on the Internet has become a new tradition. Applying natural language processing technology to analyze text on the Internet and mine its emotional tendencies has become the main way of monitoring social public opinion and gathering after-sale feedback for manufacturers. Thus, the study of text sentiment analysis has important social significance and commercial value. Sentiment analysis has been a hot research topic in the fields of natural language processing and data mining over the last ten years. The paper starts with the topic of "Sentiment Analysis using a CNN-BiLSTM deep model based on attention mechanism classification". First, it conducts an in-depth investigation of the current research status and commonly used algorithms, and briefly introduces and analyzes the current mainstream sentiment analysis methods. As a direction of machine learning, deep learning has become a hot research topic in emotion classification in the field of natural language processing. This paper uses deep learning models to study the sentiment classification problem for both short-text and long-text classification tasks. The main research contents are as follows. First, traditional neural-network-based short-text sentiment classification algorithms are error-prone: the feature dimension is too high, and feature information is lost in the pooling layer, which leads to the loss of details of the emotion vocabulary. To solve this problem, the Word Vector Model (Word2vec), Bidirectional Long Short-Term Memory networks (BiLSTM) and convolutional neural networks (CNN) are combined on the Quora dataset. The experiment shows that the CNN-BiLSTM model combined with Word2vec word embeddings achieved an accuracy of 91.48%. This shows that the hybrid network model performs better than a single-structure neural network on short text. Convolutional neural network (CNN) models use convolutional layers and max pooling or max-over-time pooling layers to extract higher-level features, while LSTM models can capture long-term dependencies between words and hence are better suited for text classification. However, even with the hybrid approach that leverages the powers of these two deep-learning models, the number of features to remember for classification remains huge, hindering the training process. Second, we propose an attention-based CNN-BiLSTM hybrid model that capitalizes on the advantages of LSTM and CNN, with an additional attention mechanism, on the IMDB movie reviews dataset. In the experiment, controlling for data volume and number of epochs, the proposed hybrid model was compared against CNN, LSTM and CNN-LSTM on long text using various indicators, including recall, precision, F1 score and accuracy. When the data size was 13 k, the proposed model had the highest accuracy at 0.908, and its F1 score also showed the highest performance at 0.883. The number of epochs needed to reach optimal accuracy was 10 for CNN, 14 for LSTM, 5 for MLP and 15 for CNN-LSTM, which took the longest learning time. The proposed model also showed the best F1 score at 0.906, and its accuracy was the highest at 0.929.
Finally, the experimental results show that the bidirectional long short-term memory convolutional neural network (BiLSTM-CNN) model based on an attention mechanism can effectively improve sentiment classification performance when processing long-text sentiment classification tasks.
Keywords: sentiment analysis, CNN, BiLSTM, attention mechanism, text classification
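A hedged Keras sketch of a CNN-BiLSTM model with a simple additive attention head, in the spirit of the pipeline described above; vocabulary size, dimensions, and the attention formulation are illustrative assumptions, not the paper's exact configuration.

```python
# Hedged sketch: CNN for local n-gram features, BiLSTM for long-range context,
# and a simple additive attention head for binary sentiment classification.
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB, MAXLEN, EMB = 20000, 200, 128

inputs = layers.Input(shape=(MAXLEN,))
x = layers.Embedding(VOCAB, EMB)(inputs)               # Word2vec-style embeddings (learned here)
x = layers.Conv1D(64, 5, activation="relu", padding="same")(x)
x = layers.MaxPooling1D(2)(x)
x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)

# Additive attention: score each timestep, softmax over time, weighted sum.
scores = layers.Dense(1, activation="tanh")(x)
weights = layers.Softmax(axis=1)(scores)
context = layers.Lambda(lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([x, weights])

outputs = layers.Dense(1, activation="sigmoid")(context)  # positive vs negative sentiment
model = models.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```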
Chapter
Machine learning is growing every day, with improvements in existing algorithms making them more applicable to real-life scenarios. It has made such a huge impact on everyday life that the majority of mobile applications have some kind of machine learning algorithm integrated in their structure. Convolutional neural networks are a great tool for image processing, and in this work we developed a model for emotion detection. Every machine learning algorithm requires a dataset in order to be trained and validated. For this specific case, we utilized the publicly available FER-2013 dataset, which contains seven emotions: angry, disgust, fear, happy, neutral, sad and surprise. The ratio between training and validation images is 80:20. For model optimization we used the Adam optimizer, alongside other techniques for preventing overfitting during training and saving the best weights. The model achieved a training accuracy of 71.55% and a validation accuracy of 61.4%. We utilized an NVIDIA GeForce MX350 GPU to train and validate the model, resulting in a much shorter training time of approximately 20 minutes.
Keywords: Mobile Applications, Machine Learning, Image Processing, Deep Learning, Convolutional Neural Networks, Adam Optimizer, FER-2013 Dataset, Emotion Detection, GPU
Chapter
In recent years, the field of facial expression recognition (FER) has become increasingly challenging and active. To improve recognition accuracy, facial expression recognition based on deep learning models has attracted much attention from academia and industry. Convolutional neural networks (CNN) combined with attention mechanisms show great advantages in image processing and other tasks. In this study, we propose a facial multi-region feature recognition model, which extracts facial emotion features based on a CNN and an attention mechanism and fuses the outputs for facial emotion recognition. The method proposed in this paper not only extracts the overall feature information of the face but also extracts the local feature information of the eye and mouth regions and performs feature fusion. Therefore, the proposed method can obtain better recognition accuracy. We used the JAFFE, CK+, RAF-DB, and FER-2013 datasets to validate the proposed method. The experimental results indicate that the proposed method is effective for facial expression recognition.
Keywords: Facial expression recognition, CNN, Attention mechanism, Feature fusion
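A short PyTorch sketch of the multi-region idea described above, assuming fixed eye and mouth crops instead of landmark-driven regions: separate small CNNs encode the whole face and the two regions, and their features are concatenated before classification. Layer sizes and crop coordinates are rough assumptions.

```python
# Hedged sketch of multi-region feature extraction and fusion for 48x48 face images.
import torch
import torch.nn as nn

def small_cnn(out_dim: int = 64) -> nn.Module:
    return nn.Sequential(
        nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(4), nn.Flatten(),
        nn.Linear(16 * 4 * 4, out_dim), nn.ReLU())

class MultiRegionFER(nn.Module):
    def __init__(self, num_classes: int = 7):
        super().__init__()
        self.face_net, self.eye_net, self.mouth_net = small_cnn(), small_cnn(), small_cnn()
        self.head = nn.Linear(3 * 64, num_classes)

    def forward(self, face: torch.Tensor) -> torch.Tensor:
        # Fixed crops stand in for landmark-based eye/mouth regions (48x48 input assumed).
        eyes = face[:, :, 8:24, 4:44]
        mouth = face[:, :, 28:44, 12:36]
        fused = torch.cat([self.face_net(face), self.eye_net(eyes), self.mouth_net(mouth)], dim=1)
        return self.head(fused)

logits = MultiRegionFER()(torch.randn(2, 1, 48, 48))
print(logits.shape)   # torch.Size([2, 7])
```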
Article
Facial expression recognition is a vital research topic in fields ranging from artificial intelligence and gaming to human-computer interaction (HCI) and psychology. This paper proposes a hybrid model for facial expression recognition, which comprises a deep convolutional neural network (DCNN) and a Haar Cascade deep learning architecture. The objective is to classify real-time and digital facial images into one of the seven facial emotion categories considered. The DCNN employed in this research has more convolutional layers, ReLU activation functions, and multiple kernels to enhance filtering depth and facial feature extraction. In addition, a Haar Cascade model is also used to detect facial features in real-time images and video frames. Grayscale images from the Kaggle repository (FER2013) were used, and graphics processing unit (GPU) computation was exploited to expedite the training and validation process. Pre-processing and data augmentation techniques are applied to improve training efficiency and classification performance. The experimental results show a significantly improved classification performance compared to state-of-the-art (SoTA) experiments and research. Also, compared to other conventional models, this paper validates that the proposed architecture is superior in classification performance, with an improvement of up to 6%, totaling up to 70% accuracy, and with a lower execution time of 2,098.8 s.
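The Haar Cascade detection stage named above can be sketched with OpenCV's bundled frontal-face cascade; the input path is hypothetical and the downstream emotion classifier call is omitted.

```python
# Hedged sketch: detect faces with OpenCV's Haar cascade and cut out 48x48
# grayscale crops that a FER2013-trained classifier could consume.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

frame = cv2.imread("group_photo.jpg")                 # hypothetical input image
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
faces = cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)

for (x, y, w, h) in faces:
    crop = cv2.resize(gray[y:y + h, x:x + w], (48, 48))  # FER-2013 input size
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detections.jpg", frame)
```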
Article
Full-text available
With the continuous development of facial expression technology, especially the development of deep learning, and the establishment of in-the-wild datasets in recent years, in-the-wild facial expression recognition has become a hot research area within facial expression recognition. Unlike traditional facial expression recognition (FER), in-the-wild facial expression recognition suffers from recognition accuracy problems caused by illumination, occlusion, or low image resolution. Therefore, in order to solve these problems, new methods have been put forward continuously in recent years. In this paper, we first summarize the widely used datasets, and then summarize the in-the-wild facial expression recognition methods proposed in the past two years.
Article
Full-text available
Detecting facial emotion expression is a classic research problem in image processing. Face expression detection can be used to help human users monitor their stress levels. Perceiving an individual's failure to communicate specific looks might help analyze early psychological disorders. Several issues, like lighting changes, rotations, occlusions, and accessories, persist. Beyond these traditional image processing issues, facial action units also make collecting facial expression data and classifying the expressions difficult. In this study, we use the Xception convolutional neural network (CNN), which easily focuses on key parts such as the face, and the Visual Geometry Group (VGG-19) network to extract facial features, using the OpenCV framework to classify the image into one of the basic facial emotions. The NVIDIA Jetson Nano has a high video-processing frame rate, achieving better precision than recently developed models in software. On the standard CK+ dataset, the accuracy on the NVIDIA Jetson Nano is 97.1% for the Xception model, 98.4% for VGG-19, and 95.6% in a real-time environment using OpenCV.
Preprint
Full-text available
Automated human emotion recognition from facial expressions is a well-studied problem and still remains a very challenging task. Some efficient or accurate deep learning models have been presented in the literature. However, it is quite difficult to design a model that is both efficient and accurate at the same time. Moreover, identifying the minute feature variations in facial regions for both macro- and micro-expressions requires expertise in network design. In this paper, we propose to search for a highly efficient and robust neural architecture for both macro- and micro-level facial expression recognition. To the best of our knowledge, this is the first attempt to design a NAS-based solution for both macro- and micro-expression recognition. We produce lightweight models with a gradient-based architecture search algorithm. To maintain consistency between macro- and micro-expressions, we utilize dynamic imaging and convert micro-expression sequences into a single frame, preserving the spatiotemporal features in the facial regions. EmoNAS has been evaluated on 13 datasets (7 macro-expression datasets: CK+, DISFA, MUG, ISED, OULU-VIS CASIA, FER2013, RAF-DB, and 6 micro-expression datasets: CASME-I, CASME-II, CAS(ME)2, SAMM, SMIC, MEGC2019 challenge). The proposed models outperform the existing state-of-the-art methods and perform very well in terms of speed and space complexity.
Article
Facial Emotion Recognition (FER) has gained popularity in recent years due to its many applications, including biometrics, detection of mental illness, understanding of human behavior, and psychological profiling. However, developing an accurate and robust FER pipeline is still challenging because multiple factors make it difficult to generalize across different emotions. The factors that challenge a promising FER pipeline include pose variation, heterogeneity of the facial structure, illumination, occlusion, low resolution, and aging factors. Many approaches were developed to overcome the above problems, such as the Histogram of Oriented Gradients (HOG) and Local Binary Pattern (LBP) histogram. However, these methods require manual feature selection. Convolutional Neural Networks (CNN) overcame this manual feature selection problem. CNN has shown great potential in FER tasks due to its unique feature extraction strategy compared to regular FER models. In this paper, we propose a novel CNN architecture by interfacing U-Net segmentation layers in-between Visual Geometry Group (VGG) layers to allow the network to emphasize more critical features from the feature map, which also controls the flow of redundant information through the VGG layers. Our model achieves state-of-the-art (SOTA) single network accuracy compared with other well-known FER models on the FER-2013 dataset.
Article
Full-text available
The ability to recognize facial expressions automatically enables novel applications in human-computer interaction and other areas. Consequently, there has been active research in this field, with several recent works utilizing Convolutional Neural Networks (CNNs) for feature extraction and inference. These works differ significantly in terms of CNN architectures and other factors. Based on the reported results alone, the performance impact of these factors is unclear. In this paper, we review the state of the art in image-based facial expression recognition using CNNs and highlight algorithmic differences and their performance impact. On this basis, we identify existing bottlenecks and consequently directions for advancing this research field. Furthermore, we demonstrate that overcoming one of these bottlenecks - the comparatively basic architectures of the CNNs utilized in this field - leads to a substantial performance increase. By forming an ensemble of modern deep CNNs, we obtain a FER2013 test accuracy of 75.2%, outperforming previous works without requiring auxiliary training data or face registration.
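The ensembling step reported above amounts to averaging the member networks' softmax outputs and taking the argmax; the sketch below uses untrained stand-in models purely to show the mechanics, not the paper's architectures.

```python
# Hedged sketch: average softmax outputs of several CNNs and take the argmax.
import torch
import torch.nn as nn

def member() -> nn.Module:
    return nn.Sequential(nn.Flatten(), nn.Linear(48 * 48, 7))   # stand-in for a deep CNN

ensemble = [member() for _ in range(3)]
faces = torch.randn(4, 1, 48, 48)                                # a batch of face crops

with torch.no_grad():
    probs = torch.stack([torch.softmax(m(faces), dim=1) for m in ensemble]).mean(dim=0)
pred = probs.argmax(dim=1)
print(pred)          # ensemble predictions for the 4 faces
```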
Conference Paper
Full-text available
Emotions are important and meaningful aspects of human behaviour. Analyzing facial expressions and recognizing their emotional state is a challenging task with wide-ranging applications. In this paper, we present an emotion recognition system, which recognizes basic emotional states in facial expressions. Initially, it detects human faces in images using the Viola-Jones algorithm. Then, it locates and measures characteristics of specific regions of the facial expression, such as the eyes, eyebrows and mouth, and extracts proper geometrical characteristics from each region. These extracted features represent the facial expression, and based on them a classification schema, which consists of a Support Vector Machine (SVM) and a Multilayer Perceptron Neural Network (MLPNN), recognizes each expression's emotional content. The classification schema initially recognizes whether the expression is emotional and then recognizes the specific emotions conveyed. The evaluation, conducted on the JAFFE and Cohn-Kanade databases, revealed very encouraging results.
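A hedged scikit-learn sketch of the two-stage classification schema described above, with random placeholder features: an SVM first decides whether the geometric features carry emotion at all, and an MLP then names the specific emotion. Feature dimensions and classifier settings are assumptions.

```python
# Hedged sketch of a two-stage emotional/non-emotional then emotion-label classifier.
import numpy as np
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))                 # geometric features (eyes, brows, mouth)
is_emotional = rng.integers(0, 2, size=200)    # stage-1 labels: neutral vs emotional
emotion = rng.integers(0, 6, size=200)         # stage-2 labels: six basic emotions

stage1 = SVC(kernel="rbf").fit(X, is_emotional)
stage2 = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500).fit(
    X[is_emotional == 1], emotion[is_emotional == 1])

sample = X[:5]
flagged = stage1.predict(sample)
labels = np.where(flagged == 1, stage2.predict(sample), -1)   # -1 marks "neutral"
print(labels)
```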
Article
Full-text available
Affective Computing aims at improving the naturalness of human-computer interactions by integrating the socio-emotional component in the interaction. The use of embodied conversational agents (ECAs) – virtual characters interacting with humans – is a key answer to this issue. On the one hand, the ECA has to take into account the human emotional behaviours and social attitudes. On the other hand, the ECA has to display socio-emotional behaviours with relevance. In this paper, we provide an overview of computational methods used for user’s socio-emotional behaviour analysis and of human-agent interaction strategies by questioning the ambivalent status of surprise. We focus on the computational models and on the methods we use to detect user’s emotion through language and speech processing and present a study investigating the role of surprise in the ECA’s answer.
Article
Full-text available
Research by the eminent psychologist Paul Ekman in the field of expression of emotions in humans tells us that emotions are biologically determined and universal to human culture. However, research has shown conclusively that even when subjects try to conceal their emotions, they leak out for a minuscule amount of time in the form of "microexpressions". A microexpression is an involuntary facial expression that is displayed on the face of humans depending upon their emotional state. Microexpressions usually last a very short span of time. On the other hand, "macroexpressions" are the expressions we see in our daily interactions. Since they are not suppressed, they last longer. The aim of this paper is to develop a system that will help us recognize macroexpressions in humans.
Article
Full-text available
Current models for automated emotion recognition are developed under the assumption that emotion expressions are distinct expression patterns for basic emotions. Thereby, these approaches fail to account for the emotional processes underlying emotion expressions. We review the literature on human emotion processing and suggest an alternative approach to affective computing. We postulate that the generalizability and robustness of these models can be greatly increased by three major steps: (1) modeling emotional processes as a necessary foundation of emotion recognition; (2) basing models of emotional processes on our knowledge about the human brain; (3) conceptualizing emotions based on appraisal processes and thus regarding emotion expressions as expressive behavior linked to these appraisals rather than fixed neuro-motor patterns. Since modeling emotional processes after neurobiological processes can be considered a long-term effort, we suggest that researchers should focus on early appraisals, which evaluate intrinsic stimulus properties with little higher cortical involvement. With this goal in mind, we focus on the amygdala and its neural connectivity pattern as a promising structure for early emotional processing. We derive a model for the amygdala-visual cortex circuit from the current state of neuroscientific research. This model is capable of conditioning visual stimuli with body reactions to enable rapid emotional processing of stimuli consistent with early stages of psychological appraisal theories. Additionally, amygdala activity can feed back to visual areas to modulate attention allocation according to the emotional relevance of a stimulus. The implications of the model considering other approaches to automated emotion recognition are discussed.
Article
Full-text available
Erratum to: Multimed Tools Appl, DOI 10.1007/s11042-014-1869-6. The authors cited incorrect data in Tables 6, 7, and 9 in the paper entitled "A survey on facial expression recognition in 3D video sequences" by Antonios Danelakis, Theoharis Theoharis, Ioannis Pratikakis, published online first 1 February 2014. Additionally, every reference to "László et al." should become "Jeni et al.". The authors apologize for these errors. In Table 6, in the row concerning Yin et al. and in the column titled "Classification Accuracy" (2nd column), the word 'N/A' should be changed to '80.20 %'. Table 7 appears correctly with references listed below: [4] Berretti S, Del Bimbo A, Pala P (2012) Real-time expression recognition from dynamic sequences of 3D facial scans. In: EG Workshop on 3D Object Retrieval, pp. 85-92; [8] Canavan SJ, Sun Y, Zhang X, Yin L (2012) A dynamic curvature based approach for facial activity analysis in 3D space. In: CVPR Workshops, pp. 14-19; [14] Drira H, Ben Amor B, Daoudi M, ...
Article
Full-text available
In this invited paper, my overview material on the same topic as presented in the plenary overview session of APSIPA-2011 and the tutorial material presented in the same conference [1] are expanded and updated to include more recent developments in deep learning. The previous and the updated materials cover both theory and applications, and analyze its future directions. The goal of this tutorial survey is to introduce the emerging area of deep learning or hierarchical learning to the APSIPA community. Deep learning refers to a class of machine learning techniques, developed largely since 2006, where many stages of non-linear information processing in hierarchical architectures are exploited for pattern classification and for feature learning. In the more recent literature, it is also connected to representation learning, which involves a hierarchy of features or concepts where higher-level concepts are defined from lower-level ones and where the same lower-level concepts help to define higher-level ones. In this tutorial survey, a brief history of deep learning research is discussed first. Then, a classificatory scheme is developed to analyze and summarize major work reported in the recent deep learning literature. Using this scheme, I provide a taxonomy-oriented survey on the existing deep architectures and algorithms in the literature, and categorize them into three classes: generative, discriminative, and hybrid. Three representative deep architectures – deep autoencoders, deep stacking networks with their generalization to the temporal domain (recurrent networks), and deep neural networks (pretrained with deep belief networks) – one in each of the three classes, are presented in more detail. Next, selected applications of deep learning are reviewed in broad areas of signal and information processing including audio/speech, image/vision, multimodality, language modeling, natural language processing, and information retrieval. Finally, future directions of deep learning are discussed and analyzed.
Article
Full-text available
Automatic affect analysis has attracted great interest in various contexts including the recognition of action units and basic or non-basic emotions. In spite of major efforts, there are several open questions on what the important cues to interpret facial expressions are and how to encode them. In this paper, we review the progress across a range of affect recognition applications to shed light on these fundamental questions. We analyse the state-of-the-art solutions by decomposing their pipelines into fundamental components, namely face registration, representation, dimensionality reduction and recognition. We discuss the role of these components and highlight the models and new trends that are followed in their design. Moreover, we provide a comprehensive analysis of facial representations by uncovering their advantages and limitations; we elaborate on the type of information they encode and discuss how they deal with the key challenges of illumination variations, registration errors, head-pose variations, occlusions, and identity bias. This survey allows us to identify open issues and to define future directions for designing real-world affect recognition systems.
Article
Full-text available
Recent work in unsupervised feature learning and deep learning has shown that being able to train large models can dramatically improve performance. In this paper, we consider the problem of training a deep network with billions of parameters using tens of thousands of CPU cores. We have developed a software framework called DistBelief that can utilize computing clusters with thousands of machines to train large models. Within this framework, we have developed two algorithms for large-scale distributed training: (i) Downpour SGD, an asynchronous stochastic gradient descent procedure supporting a large number of model replicas, and (ii) Sandblaster, a framework that supports a variety of distributed batch optimization procedures, including a distributed implementation of L-BFGS. Downpour SGD and Sandblaster L-BFGS both increase the scale and speed of deep network training. We have successfully used our system to train a deep network 30x larger than previously reported in the literature, achieving state-of-the-art performance on ImageNet, a visual object recognition task with 16 million images and 21k categories. We show that these same techniques dramatically accelerate the training of a more modestly-sized deep network for a commercial speech recognition service. Although we focus on and report performance of these methods as applied to training large neural networks, the underlying algorithms are applicable to any gradient-based machine learning algorithm.
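The asynchronous update scheme at the heart of Downpour SGD can be illustrated with a toy, single-machine sketch: several threads act as model replicas that pull parameters from, and push gradients to, a shared parameter server. This is only a conceptual analogue on a tiny linear model, not the distributed DistBelief system itself.

```python
# Toy sketch of asynchronous data-parallel SGD in the spirit of Downpour SGD.
import threading
import numpy as np

class ParameterServer:
    def __init__(self, dim):
        self.w = np.zeros(dim)
        self.lock = threading.Lock()

    def pull(self):
        with self.lock:
            return self.w.copy()

    def push(self, grad, lr=0.01):
        with self.lock:
            self.w -= lr * grad

def worker(server, X, y, steps=200):
    for _ in range(steps):
        w = server.pull()                   # fetch current parameters
        i = np.random.randint(len(X))
        grad = (X[i] @ w - y[i]) * X[i]     # gradient of squared error on one sample
        server.push(grad)                   # asynchronous update, no replica sync

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
true_w = np.arange(5, dtype=float)
y = X @ true_w
server = ParameterServer(dim=5)
threads = [threading.Thread(target=worker, args=(server, X, y)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("learned:", np.round(server.pull(), 2))
```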
Article
Full-text available
Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models. The framework is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures. Caffe fits industry and internet-scale media needs by CUDA GPU computation, processing over 40 million images a day on a single K40 or Titan GPU (approx 2 ms per image). By separating model representation from actual implementation, Caffe allows experimentation and seamless switching among platforms for ease of development and deployment from prototyping machines to cloud environments. Caffe is maintained and developed by the Berkeley Vision and Learning Center (BVLC) with the help of an active community of contributors on GitHub. It powers ongoing research projects, large-scale industrial applications, and startup prototypes in vision, speech, and multimedia.
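A typical inference call through Caffe's Python bindings looks roughly like the sketch below; the prototxt and weight file paths and the blob names 'data' and 'prob' are model-specific assumptions, not fixed parts of the API.

```python
# Hedged pycaffe inference sketch; file paths and blob names are placeholders.
import numpy as np
import caffe

caffe.set_mode_gpu()                    # or caffe.set_mode_cpu()
net = caffe.Net('deploy.prototxt',      # network definition (placeholder path)
                'weights.caffemodel',   # trained parameters (placeholder path)
                caffe.TEST)

image = np.random.rand(1, 3, 224, 224).astype(np.float32)  # stand-in input
net.blobs['data'].reshape(*image.shape)
net.blobs['data'].data[...] = image
output = net.forward()
print(output['prob'].argmax())          # predicted class index
```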
Article
Full-text available
This paper presents a framework (FILTWAM (Framework for Improving Learning Through Webcams And Microphones)) for real-time emotion recognition in e-learning by using webcams. FILTWAM offers timely and relevant feedback based upon the learner's facial expressions and verbalizations. FILTWAM's facial expression software module has been developed and tested in a proof-of-concept study. The main goal of this study was to validate the use of webcam data for a real-time and adequate interpretation of facial expressions into extracted emotional states. The software was calibrated with 10 test persons. They received the same computer-based tasks, in which each of them was requested 100 times to mimic specific facial expressions. All sessions were recorded on video. For the validation of the face emotion recognition software, two experts annotated and rated participants' recorded behaviours. Expert findings were contrasted with the software results and showed an overall kappa value of 0.77. The overall accuracy of our software, based on the requested and the recognized emotions, is 72%. Whereas existing software only allows non-real-time, discontinuous, and obtrusive facial detection, our software continuously and unobtrusively monitors learners' behaviours and converts them directly into emotional states. This paves the way for enhancing the quality and efficacy of e-learning by including the learner's emotional states.
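The reported agreement analysis can be reproduced in outline with scikit-learn's Cohen's kappa and accuracy metrics; the label lists below are illustrative stand-ins for the expert annotations and the software's outputs.

```python
# Sketch of comparing software-recognized emotions against expert annotations.
from sklearn.metrics import cohen_kappa_score, accuracy_score

expert_labels   = ["happy", "sad", "angry", "happy", "neutral", "surprise"]
software_labels = ["happy", "sad", "happy", "happy", "neutral", "surprise"]

kappa = cohen_kappa_score(expert_labels, software_labels)   # chance-corrected agreement
acc = accuracy_score(expert_labels, software_labels)        # raw agreement
print(f"kappa={kappa:.2f}, accuracy={acc:.2f}")
```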
Conference Paper
Full-text available
The huge research effort in the field of face expression recognition (FER) technology is justified by the potential applications in multiple domains: computer science, engineering, psychology, neuroscience, to name just a few. Obviously, this generates an impressive number of scientific publications. The aim of this paper is to identify key representative approaches for facial expression recognition research in the past ten years (2003-2012).
Conference Paper
Full-text available
Emotional expressions of virtual agents are widely believed to enhance the interaction with the user by utilizing more natural means of communication. However, as a result of the current technology virtual agents are often only able to produce facial expressions to convey emotional meaning. The presented research investigates the effects of unimodal vs. multimodal expressions of emotions on the users' recognition of the respective emotional state. We found that multimodal expressions of emotions yield the highest recognition rates. Additionally, emotionally neutral cues in one modality, when presented together with emotionally relevant cues in the other modality, impair the recognition of the correct emotion category as well as intense emotional states.
Article
Full-text available
The ICML 2013 Workshop on Challenges in Representation Learning (http://deeplearning.net/icml2013-workshop-competition) focused on three challenges: the black box learning challenge, the facial expression recognition challenge, and the multimodal learning challenge. We describe the datasets created for these challenges and summarize the results of the competitions. We provide suggestions for organizers of future challenges and some comments on what kind of knowledge can be gained from machine learning competitions.
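The facial expression recognition challenge data is commonly distributed as a fer2013.csv file with columns emotion, pixels, and Usage, where each row holds 2304 space-separated pixel values of a 48x48 grayscale face; a minimal loading sketch (the file path being an assumption) is given below.

```python
# Sketch of loading the FER-2013 challenge data from its usual CSV distribution.
import numpy as np
import pandas as pd

df = pd.read_csv("fer2013.csv")                       # placeholder path
X = np.stack(df["pixels"].apply(
    lambda p: np.array(p.split(), dtype=np.uint8).reshape(48, 48)))
y = df["emotion"].to_numpy()                          # 7 emotion classes (0..6)
train = df["Usage"] == "Training"                     # remaining rows: public/private test
print(X[train].shape, y[train].shape)
```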
Chapter
Full-text available
This work approaches the problem of recognizing emotional facial expressions in static images, focusing on three preprocessing techniques for feature extraction: Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and Gabor filters. These methods are commonly used for face recognition, and the novelty consists in combining the features they provide in order to improve the performance of an automatic procedure for recognizing emotional facial expressions. Testing was performed and recognition accuracy measured on the Japanese Female Facial Expression (JAFFE) database, using a Multi-Layer Perceptron (MLP) neural network as the classifier. The best classification accuracy on variations of facial expressions included in the training set was obtained by combining PCA and LDA features (93% correct recognition rate), whereas combining PCA, LDA, and Gabor filter features gave 94% correct classification on facial expressions of subjects not included in the training set.
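A rough sketch of the feature-combination strategy, using scikit-learn in place of the authors' implementation, might look as follows; the component counts and MLP size are illustrative, and the face vectors are assumed to be loaded elsewhere (e.g. from the JAFFE images).

```python
# Sketch: project the same face vectors with PCA and LDA, concatenate, classify with an MLP.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neural_network import MLPClassifier

def combined_features(X_train, y_train, X_test, n_pca=40):
    pca = PCA(n_components=n_pca).fit(X_train)
    lda = LinearDiscriminantAnalysis().fit(X_train, y_train)   # at most n_classes-1 dims
    train_feats = np.hstack([pca.transform(X_train), lda.transform(X_train)])
    test_feats = np.hstack([pca.transform(X_test), lda.transform(X_test)])
    return train_feats, test_feats

# Usage (X_train, y_train, X_test, y_test assumed):
# F_train, F_test = combined_features(X_train, y_train, X_test)
# clf = MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000).fit(F_train, y_train)
# print(clf.score(F_test, y_test))
```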
Chapter
Full-text available
In this paper a fully automatic face verification system is presented. A face is characterized by a vector (jet) of coefficients obtained by applying a bank of Gabor filters at 19 automatically localized facial fiducial points. The identity claimed by a subject is accepted or rejected depending on a similarity measure computed between the jet characterizing the subject and those corresponding to the subjects in the gallery. The performance of the system has been quantified according to the Lausanne evaluation protocol for authentication.
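The verification decision described here, accepting or rejecting a claimed identity from jet similarity, can be sketched as below; jet extraction itself (for example with a Gabor bank) is assumed to happen upstream, and the 0.8 threshold is purely illustrative.

```python
# Sketch of threshold-based verification from jet similarity.
import numpy as np

def jet_similarity(probe_jets, gallery_jets):
    # Mean normalised dot product over corresponding fiducial points.
    sims = [np.dot(p, g) / (np.linalg.norm(p) * np.linalg.norm(g) + 1e-12)
            for p, g in zip(probe_jets, gallery_jets)]
    return float(np.mean(sims))

def verify(probe_jets, claimed_gallery_jets, threshold=0.8):
    return jet_similarity(probe_jets, claimed_gallery_jets) >= threshold

# probe = [np.random.rand(40) for _ in range(19)]     # 19 fiducial points, stand-in jets
# gallery = [np.random.rand(40) for _ in range(19)]
# print(verify(probe, gallery))
```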
Conference Paper
Full-text available
This work investigates the use of a point distribution model to detect prominent features in a face (eyes, brows, mouth, etc) and the subsequent facial feature extraction and facial expression classification into seven categories (anger, fear, surprise, happiness, disgust, neutral and sadness). A multi-scale and multi-orientation Gabor filter bank, designed in such a way so as to avoid redundant information, is used to extract facial features at selected locations of the prominent features of a face (fiducial points). A region based approach is employed at the location of the fiducial points using different region sizes to allow some degree of flexibility and avoid artefacts due to incorrect automatic discovery of these points. A feed forward back propagation Artificial Neural Network is employed to classify the extracted feature vectors. The methodology is evaluated by forming 7 different regions and the feature vector is extracted at the location of 20 fiducial points.
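A hedged sketch of a multi-scale, multi-orientation Gabor bank evaluated around fiducial points, using OpenCV, is given below; the fiducial coordinates, kernel sizes, and region size are placeholders rather than the paper's settings, and the fiducial detector itself is assumed to exist separately.

```python
# Sketch: Gabor filter bank responses sampled in regions around fiducial points.
import cv2
import numpy as np

def gabor_bank(scales=(7, 11, 15), orientations=8):
    kernels = []
    for ksize in scales:
        for i in range(orientations):
            theta = i * np.pi / orientations
            kernels.append(cv2.getGaborKernel((ksize, ksize), sigma=ksize / 3.0,
                                              theta=theta, lambd=ksize / 2.0,
                                              gamma=0.5, psi=0))
    return kernels

def features_at_points(gray, points, kernels, half=4):
    # Filter the image once per kernel, then average the response in a small
    # region around every fiducial point.
    responses = [cv2.filter2D(gray.astype(np.float32), -1, k) for k in kernels]
    feats = []
    for (x, y) in points:
        for r in responses:
            patch = r[y - half:y + half + 1, x - half:x + half + 1]
            feats.append(patch.mean())
    return np.array(feats)

# gray = cv2.imread("face.png", cv2.IMREAD_GRAYSCALE)   # placeholder path
# fiducial = [(30, 40), (60, 40), (45, 70)]             # placeholder points
# vec = features_at_points(gray, fiducial, gabor_bank())
```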
Article
Full-text available
Using emotion detection technologies based on biophysical signals, this study explored how emotion evolves during the learning process and how emotion feedback could be used to improve learning experiences. This article also described a cutting-edge pervasive e-Learning platform used in a Shanghai online college and proposed an affective e-Learning model, which combined learners' emotions with the Shanghai e-Learning platform. The study was guided by Russell's circumplex model of affect and Kort's learning spiral model. Emotion recognition from physiological signals achieved a best-case accuracy of 86.3% for four types of learning emotions. Results from the emotion evolution study showed that engagement and confusion were the most important and most frequently occurring emotions in learning, which is consistent with findings from the AutoTutor project. No evidence from this study validated Kort's learning spiral model. An experimental prototype of the affective e-Learning model was built to help improve students' learning experience by customizing learning material delivery based on students' emotional state. Experiments indicated the superiority of the emotion-aware approach over the non-emotion-aware one, with a performance increase of 91%.
Article
Full-text available
Facial expression classification has been an interesting research problem in recent years, and many methods have been proposed to solve it. In this research, we propose a novel approach using Canny edge detection, Principal Component Analysis (PCA), and an Artificial Neural Network. First, in the preprocessing phase, we use Canny for local region detection on facial images. The features of each local region are then represented using Principal Component Analysis (PCA). Finally, an Artificial Neural Network (ANN) is applied for facial expression classification. We apply our proposed method (Canny_PCA_ANN) to the recognition of six basic facial expressions on the JAFFE database, consisting of 213 images posed by 10 Japanese female models. The experimental results show the feasibility of our proposed method.
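The Canny, PCA, and ANN chain can be sketched with OpenCV and scikit-learn as follows; the edge thresholds, component count, and hidden layer size are illustrative, not the authors' choices, and the face images and labels are assumed to be loaded elsewhere.

```python
# Sketch of a Canny -> PCA -> neural network expression classifier.
import cv2
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

def canny_features(gray, low=50, high=150):
    # Edge map of the face image, flattened into a feature vector.
    edges = cv2.Canny(gray, low, high)
    return edges.flatten().astype(np.float32) / 255.0

# X = np.array([canny_features(img) for img in face_images])   # face_images assumed
# model = make_pipeline(PCA(n_components=30),
#                       MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000))
# model.fit(X, labels)                                          # labels assumed
```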
Article
Full-text available
In this paper, a new technique coined two-dimensional principal component analysis (2DPCA) is developed for image representation. As opposed to PCA, 2DPCA is based on 2D image matrices rather than 1D vectors so the image matrix does not need to be transformed into a vector prior to feature extraction. Instead, an image covariance matrix is constructed directly using the original image matrices, and its eigenvectors are derived for image feature extraction. To test 2DPCA and evaluate its performance, a series of experiments were performed on three face image databases: ORL, AR, and Yale face databases. The recognition rate across all trials was higher using 2DPCA than PCA. The experimental results also indicated that the extraction of image features is computationally more efficient using 2DPCA than PCA.
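Because 2DPCA is defined directly on image matrices, it fits in a few lines of NumPy; the sketch below builds the image covariance matrix, extracts its leading eigenvectors, and projects each image, with the component count chosen arbitrarily for illustration.

```python
# Compact NumPy sketch of 2DPCA: covariance built from 2D images, no vectorisation.
import numpy as np

def fit_2dpca(images, n_components=5):
    # images: array of shape (M, h, w)
    mean = images.mean(axis=0)
    centered = images - mean
    # Image covariance (scatter) matrix of size (w x w), averaged over samples.
    G = np.einsum('ikj,ikl->jl', centered, centered) / len(images)
    eigvals, eigvecs = np.linalg.eigh(G)
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvecs[:, order]          # projection matrix (w x n_components)

def transform_2dpca(images, mean, proj):
    return (images - mean) @ proj           # each image -> (h x n_components)

rng = np.random.default_rng(0)
imgs = rng.random((20, 32, 32))             # stand-in face images
mean, proj = fit_2dpca(imgs, n_components=4)
feats = transform_2dpca(imgs, mean, proj)
print(feats.shape)                          # (20, 32, 4)
```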
Conference Paper
Full-text available
A completely automatic face recognition system is presented. The method works on color and gray level images: after having localized the face and the facial features, it determines 16 facial fiducial points, and characterizes them by applying a bank of filters which extract the peculiar texture around them (jets). Recognition is realized by measuring the similarity between the different jets. The system is inspired by the elastic bunch graph method, but the fiducial point localization does not require any manual setting or operator intervention.
Article
Full-text available
A fully automated, multistage system for real-time recognition of facial expression is presented. The system uses facial motion to characterize monochrome frontal views of facial expressions and is able to operate effectively in cluttered and dynamic scenes, recognizing the six emotions universally associated with unique facial expressions, namely happiness, sadness, disgust, surprise, fear, and anger. Faces are located using a spatial ratio template tracker algorithm. Optical flow of the face is subsequently determined using a real-time implementation of a robust gradient model. The expression recognition system then averages facial velocity information over identified regions of the face and cancels out rigid head motion by taking ratios of this averaged motion. The motion signatures produced are then classified using Support Vector Machines as either nonexpressive or as one of the six basic emotions. The completed system is demonstrated in two simple affective computing applications that respond in real-time to the facial expressions of the user, thereby providing the potential for improvements in the interaction between a computer user and technology.
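A simplified version of the motion-signature idea, dense optical flow averaged over a fixed grid of face regions and classified with an SVM, can be sketched as follows; the Farneback parameters and grid layout are assumptions, and face localisation and rigid-motion cancellation are omitted.

```python
# Sketch: dense optical flow averaged per face region, classified with an SVM.
import cv2
import numpy as np
from sklearn.svm import SVC

def motion_signature(prev_gray, next_gray, grid=(4, 4)):
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = prev_gray.shape
    gh, gw = h // grid[0], w // grid[1]
    sig = []
    for r in range(grid[0]):
        for c in range(grid[1]):
            cell = flow[r * gh:(r + 1) * gh, c * gw:(c + 1) * gw]
            sig.extend(cell.mean(axis=(0, 1)))   # mean dx, dy per region
    return np.array(sig)

# signatures = np.array([motion_signature(f0, f1) for f0, f1 in frame_pairs])
# clf = SVC().fit(signatures, labels)            # frame_pairs, labels assumed
```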
Article
Full-text available
Humans detect and interpret faces and facial expressions in a scene with little or no effort. Still, development of an automated system that accomplishes this task is rather difficult. There are several related problems: detection of an image segment as a face, extraction of the facial expression information, and classification of the expression (e.g., in emotion categories). A system that performs these operations accurately and in real time would form a big step in achieving a human-like interaction between man and machine. The paper surveys the past work in solving these problems. The capability of the human visual system with respect to these problems is discussed, too. It is meant to serve as an ultimate goal and a guide for determining recommendations for development of an automatic facial expression analyzer.
Conference Paper
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called dropout that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
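An AlexNet-style architecture with the ingredients listed in this abstract can be written compactly in PyTorch; the sketch below follows common torchvision-style layer sizes, which differ in detail from the original two-GPU layout.

```python
# Sketch of an AlexNet-like network: five conv layers, max pooling, three FC layers,
# dropout, and a 1000-way output (softmax is applied implicitly by the loss).
import torch
import torch.nn as nn

class AlexNetLike(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 11, stride=4, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2),
            nn.Conv2d(64, 192, 5, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2),
            nn.Conv2d(192, 384, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True),
            nn.Dropout(), nn.Linear(4096, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))

print(AlexNetLike()(torch.randn(1, 3, 224, 224)).shape)   # torch.Size([1, 1000])
```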
Chapter
In this chapter we consider the problem of automatic facial expression analysis. Our take on this is that the field has reached a point where it needs to move away from considering experiments and applications under in-the-lab conditions, and move towards so-called in-the-wild scenarios. We assume throughout this chapter that the aim is to develop technology that can be deployed in practical applications under unconstrained conditions. While some first efforts in this direction have been reported very recently, it is still unclear what the right path to achieving accurate, informative, robust, and real-time facial expression analysis will be. To illuminate the journey ahead, we first provide in Sect. 1 an overview of the existing theories and specific problem formulations considered within the computer vision community. Then we describe in Sect. 2 the standard algorithmic pipeline which is common to most facial expression analysis algorithms. We include suggestions as to which of the current algorithms and approaches are most suited to the scenario considered. In Sect. 3 we describe our view of the remaining challenges, and the current opportunities within the field. This chapter is thus not intended as a review of different approaches, but rather a selection of what we believe are the most suitable state-of-the-art algorithms, and a selection of exemplars chosen to characterise a specific approach. We review in Sect. 4 some of the exciting opportunities for the application of automatic facial expression analysis to everyday practical problems and current commercial applications being exploited. Section 5 ends the chapter by summarising the major conclusions drawn.
Conference Paper
As one of the non-contact biometrics, face representation has been widely used in many circumstances. However, conventional methods can no longer satisfy current demands, due to their low recognition accuracy and the restrictions of many application scenarios. In this paper, we present a deep learning method to achieve facial landmark detection and unrestricted face recognition. To solve the facial landmark detection problem, this paper proposes a layer-by-layer training method that helps a deep convolutional neural network to converge, together with a sample transformation method to avoid over-fitting. This method reached an accuracy of 91% on the ORL face database. To solve the face recognition problem, this paper proposes a Siamese convolutional neural network which is trained on different parts and scales of a face and concatenates the resulting face representations. The face recognition algorithm reached an accuracy of 91% on the ORL and 81% on the LFW face database.
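The Siamese training idea, one shared embedding network applied to two inputs and a contrastive loss on their distance, can be sketched in PyTorch as follows; the layer sizes, input resolution, and margin are illustrative rather than those of the cited system.

```python
# Sketch of a Siamese embedding network trained with a contrastive loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Embedder(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(), nn.Linear(32 * 13 * 13, 64),
        )

    def forward(self, x):
        return self.net(x)

def contrastive_loss(z1, z2, same, margin=1.0):
    # Pull same-identity pairs together, push different-identity pairs apart.
    d = F.pairwise_distance(z1, z2)
    return (same * d.pow(2) + (1 - same) * F.relu(margin - d).pow(2)).mean()

embed = Embedder()
a, b = torch.randn(8, 1, 64, 64), torch.randn(8, 1, 64, 64)   # stand-in face pairs
same = torch.randint(0, 2, (8,)).float()                      # 1 = same person, 0 = different
loss = contrastive_loss(embed(a), embed(b), same)
loss.backward()
```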
Conference Paper
Recognizing the emotional state of a human from his/her facial gestures is a very challenging task with wide ranging applications in everyday life. In this paper, we present an emotion detection system developed to automatically recognize basic emotional states from human facial expressions. The system initially analyzes the facial image, locates and measures distinctive human facial deformations such as eyes, eyebrows and mouth and extracts the proper features. Then, a multilayer neural network is used for the classification of the facial expression to the proper emotional states. The system was evaluated on images of human faces from the JAFFE database and the results gathered indicate quite satisfactory performance. © IFIP International Federation for Information Processing 2014.
Article
In this study, Polynomial-based Radial Basis Function Neural Networks (pRBFNNs) are proposed as the recognition module of an overall face recognition system that consists of two parts: a preprocessing part and a recognition part. The design methodology and procedure of the proposed pRBFNNs are presented to address high-dimensional pattern recognition problems. In the data preprocessing part, Principal Component Analysis (PCA), which is generally used in face recognition, serves to reduce the amount of data while maintaining the recognition rate. However, because PCA operates on the whole face image, it cannot guarantee a stable detection rate under changes of viewpoint. To compensate for this limitation, Linear Discriminant Analysis (LDA) is used to enhance the separation between different classes. In this paper, we combine the PCA and LDA algorithms and design optimized pRBFNNs for the recognition module. The proposed pRBFNN architecture consists of three functional modules, the condition part, the conclusion part, and the inference part, arranged as fuzzy rules in 'if-then' format. In the condition part of the fuzzy rules, the input space is partitioned with Fuzzy C-Means clustering. In the conclusion part, the connection weights of the pRBFNNs are represented by two kinds of polynomials, constant and linear, whose coefficients are identified with back-propagation using gradient descent. The output of the pRBFNN model is obtained by a fuzzy inference method in the inference part. The essential design parameters of the networks (including the learning rate, momentum coefficient, and fuzzification coefficient) are optimized by means of Differential Evolution. The proposed pRBFNNs are applied to face image datasets (e.g., Yale, AT&T) and evaluated in terms of output performance and recognition rate.
Article
Contents:
1. Introduction: The study of emotion; Types of evidence for theories of emotion; Some goals for a cognitive theory of emotion
2. Structure of the theory: The organisation of emotion types; Basic emotions; Some implications of the emotions-as-valenced-reactions claim
3. The cognitive psychology of appraisal: The appraisal structure; Central intensity variables
4. The intensity of emotions: Global variables; Local variables; Variable-values, variable-weights, and emotion thresholds
5. Reactions to events I, the well-being emotions: Loss emotions and fine-grained analyses; The fortunes-of-others emotions; Self-pity and related states
6. Reactions to events II, the prospect-based emotions: Shock and pleasant surprise; Some interrelationships between prospect-based emotions; Suspense, resignation, hopelessness, and other related states
7. Reactions to agents: The attribution emotions; Gratitude, anger, and some other compound emotions
8. Reactions to objects: The attraction emotions; Fine-grained analyses and emotion sequences
9. The boundaries of the theory: Emotion words and cross-cultural issues; Emotion experiences and unconscious emotions; Coping and the function of emotions; Computational tractability
Article
A Deep Boltzmann Machine is described for learning a generative model of data that consists of multiple and diverse input modalities. The model can be used to extract a unified representation that fuses modalities together. We find that this representation is useful for classification and information retrieval tasks. The model works by learning a probability density over the space of multimodal inputs. It uses states of latent variables as representations of the input. The model can extract this representation even when some modalities are absent by sampling from the conditional distribution over them and filling them in. Our experimental results on bi-modal data consisting of images and text show that the Multimodal DBM can learn a good generative model of the joint space of image and text inputs that is useful for information retrieval from both unimodal and multimodal queries. We further demonstrate that this model significantly outperforms SVMs and LDA on discriminative tasks. Finally, we compare our model to other deep learning methods, including autoencoders and deep belief networks, and show that it achieves noticeable gains.
Article
While we have known for centuries that facial expressions can reveal what people are thinking and feeling, it is only recently that the face has been studied scientifically for what it can tell us about internal states, social behavior, and psychopathology. Today's widely available, sophisticated measuring systems have allowed us to conduct a wealth of new research on facial behavior that has contributed enormously to our understanding of the relationship between facial expression and human psychology. The chapters in this volume present the state-of-the-art in this research. They address key topics and questions, such as the dynamic and morphological differences between voluntary and involuntary expressions, the relationship between what people show on their faces and what they say they feel, whether it is possible to use facial behavior to draw distinctions among psychiatric populations, and how far research on automating facial measurement has progressed. © 1997, 2005 by Oxford University Press, Inc. All rights reserved.
Article
Much of the work on embodied conversational agents is concerned with building computational models of nonverbal behaviors that can generate the right behavior in the appropriate context. In this paper, we discuss, from a linguistic and a conversation theoretic point of view, how nonverbal behaviors in conversations work. We look particularly at gaze and head movements. These play a variety of functions in face-to-face interactions. We show how these functions are structured by general principles governing cooperative actions and symbolic communication.
Article
In this study, polynomial-based radial basis function neural networks are proposed as one of the functional components of the overall face recognition system. The system consists of a preprocessing and a recognition module. The design methodology and resulting procedure of the proposed P-RBF NNs are presented. The structure helps construct a solution to high-dimensional pattern recognition problems. In the data preprocessing part, principal component analysis (PCA), which is generally used in face recognition, is useful in reducing the dimensionality of the feature space. However, because it is concerned with the overall face image, it cannot guarantee the same classification rate when the viewpoint changes. To compensate for these limitations, linear discriminant analysis (LDA) is used to enhance the separation between different classes. In this paper, we elaborate on the PCA-LDA algorithm and design optimal P-RBF NNs for the recognition module.
Article
Recently, fully-connected and convolutional neural networks have been trained to achieve state-of-the-art performance on a wide variety of tasks such as speech recognition, image classification, natural language processing, and bioinformatics. For classification tasks, most of these "deep learning" models employ the softmax activation function for prediction and minimize cross-entropy loss. In this paper, we demonstrate a small but consistent advantage of replacing the softmax layer with a linear support vector machine. Learning minimizes a margin-based loss instead of the cross-entropy loss. While there have been various combinations of neural nets and SVMs in prior art, our results using L2-SVMs show that simply replacing softmax with linear SVMs gives significant gains on popular deep learning datasets MNIST, CIFAR-10, and the ICML 2013 Representation Learning Workshop's face expression recognition challenge.
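The substitution described here can be sketched in PyTorch by keeping a feature extractor and training the final linear layer with a multi-class squared hinge loss (an L2-SVM objective) instead of softmax cross-entropy; the backbone, class count, and hyperparameters below are stand-ins.

```python
# Sketch: deep features with a linear SVM head trained via squared hinge loss.
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Flatten(), nn.Linear(48 * 48, 256), nn.ReLU())
svm_head = nn.Linear(256, 7)                   # 7 expression classes (stand-in)
criterion = nn.MultiMarginLoss(p=2)            # squared hinge (L2-SVM) loss
opt = torch.optim.SGD(list(backbone.parameters()) + list(svm_head.parameters()),
                      lr=0.01, weight_decay=1e-4)   # weight decay acts as the SVM regulariser

x = torch.randn(32, 1, 48, 48)                 # stand-in batch of 48x48 faces
y = torch.randint(0, 7, (32,))
scores = svm_head(backbone(x))
loss = criterion(scores, y)
opt.zero_grad(); loss.backward(); opt.step()
```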
Chapter
Contents: Introduction; The Characteristics That Distinguish Basic Emotions; Does Any One Characteristic Distinguish the Basic Emotions?; The Value of the Basic Emotions Position; Acknowledgements; References.
Article
We present a new learning algorithm for Boltzmann machines that contain many layers of hidden variables. Data-dependent expectations are estimated using a variational approximation that tends to focus on a single mode, and data-independent expectations are approximated using persistent Markov chains. The use of two quite different techniques for estimating the two types of expectation that enter into the gradient of the log-likelihood makes it practical to learn Boltzmann machines with multiple hidden layers and millions of parameters. The learning can be made more efficient by using a layer-by-layer "pre-training" phase that allows variational inference to be initialized with a single bottom-up pass. We present results on the MNIST and NORB datasets showing that deep Boltzmann machines learn good generative models and perform well on handwritten digit and visual object recognition tasks.
Conference Paper
In this paper we attempt to recognize facial expressions using the well-known eigenspaces approach for face recognition. In our approach we identify the user's facial expressions from the input images using a method modified from eigenface recognition. The simplicity of matching the input images with the database images for the various facial expressions makes the system suitable for near real-time applications. We have implemented this scheme and experimental results show its effectiveness.
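An eigenspace matcher in this spirit reduces to PCA followed by nearest-neighbour assignment in the projected space; the sketch below uses scikit-learn, with the component count and 1-NN choice as illustrative assumptions.

```python
# Sketch: eigenspace (PCA) projection plus nearest-neighbour expression matching.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

def build_eigenspace_matcher(X_train, y_train, n_components=30):
    # X_train: (n_samples, n_pixels) flattened face images; y_train: expression labels.
    pca = PCA(n_components=n_components, whiten=True).fit(X_train)
    knn = KNeighborsClassifier(n_neighbors=1).fit(pca.transform(X_train), y_train)
    return pca, knn

def predict_expression(pca, knn, face_vec):
    return knn.predict(pca.transform(face_vec.reshape(1, -1)))[0]

# pca, knn = build_eigenspace_matcher(X_train, y_train)   # training data assumed
# print(predict_expression(pca, knn, X_test[0]))
```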
Article
image analysis. There are two major approaches: local-feature-based and image-vector-based. We propose a hybrid of these two approaches. Our method uses Higher-order Local Auto-Correlation (HLAC) features and Fisher weight maps. HLAC features are computed at each pixel in an image. These features are integrated with a weight map to obtain a feature vector. The optimal weight map, called a Fisher weight map, is found by maximizing the Fisher criterion of the feature vectors. Fisher discriminant analysis is used to recognize an image from the feature vector. Our experiments on facial expression recognition demonstrate the effectiveness of Fisher weight maps for objectively quantifying the importance of each facial area for classification of expressions.
Facial Expressions of Emotions for Virtual Characters. The Oxford Handbook of Affective Computing
  • M Ochs
  • R Niewiadomski
  • C Pelachaud
Ochs, M., Niewiadomski, R., & Pelachaud, C. (2014). Facial Expressions of Emotions for Virtual Characters. The Oxford Handbook of Affective Computing, 261.
Facial emotion recognition for intelligent tutoring environment
  • K O Akputu
  • K P Seng
  • Y L Lee
Akputu, K. O., Seng, K. P., & Lee, Y. L. (2013). Facial Emotion Recognition for Intelligent Tutoring Environment. In 2nd International Conference on Machine Learning and Computer Science (IMLCS'2013), pp. 9-13.