Figure 1 - uploaded by Hatice Gunes
Examples of gestures from [20].  

Source publication
Conference Paper
Full-text available
To be able to develop and test robust affective multimodal systems, researchers need access to novel databases containing representative samples of human multi-modal expressive behavior. The creation of such databases requires a major effort in the definition of representative behaviors, the choice of expressive modalities, and the collection and l...

Context in source publication

Context 1
... hand gestures, mostly for command entry purposes (i.e. [19][20][21]) only, and do not take into consideration the relationship between body parts (i.e. between hands; hands and the face; hands, face and shoulders, etc.) [19][20][21]. For instance, the Massey Hand Gesture Database is one such database [21]. Examples of its images are shown in Fig. 1. In general, these databases lack expressiveness of the body and its parts and therefore cannot be used for the analysis of human nonverbal affective ...

Similar publications

Article
Full-text available
Globally, terrorism continues to destroy people's lives. Identifying a terrorist among other people is very difficult, if not impossible. This exploratory study aims to investigate the effects of terrorism on recognizing emotions. The current paper presents a view-based approach to the representation and recognition of human facial expre...
Article
Full-text available
Major theories of hemisphere asymmetries in facial expression processing predict right hemisphere dominance for negative facial expressions of disgust, fear, and sadness; however, some studies observe left hemisphere dominance for one or more of these expressions. Research suggests that tasks requiring the identification of six basic emotional faci...
Conference Paper
Full-text available
Facial expressions of human emotions play an essential role in gaining insights into human cognition. They are crucial for designing human-computer interaction models. Although human emotional states are not limited to basic emotions such as happiness, sadness, anger, fear, disgust, and surprise, most current research focuses on those...
Preprint
Full-text available
Over the past decades the machine and deep learning community has celebrated great achievements in challenging tasks such as image classification. The deep architecture of artificial neural networks together with the plenitude of available data makes it possible to describe highly complex relations. Yet, it is still impossible to fully capture what...
Article
Full-text available
In order to describe uncertain and fuzzy emotions in recognizing facial expressions, a PAD regression model based on a deep convolutional neural network is built to quantify the emotions, by which facial expressions can be mapped to the emotional space of PAD. Then the emotional membership function is proposed to describe the uncertainty an...
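The PAD mapping sketched in the abstract above, a CNN that regresses continuous pleasure-arousal-dominance values from a face image, could look roughly like the following. The backbone, layer sizes, and the [-1, 1] output range are assumptions for illustration, not details taken from the cited paper.

# Minimal PAD regression sketch (assumptions: ResNet-18 backbone, PAD scaled to [-1, 1]).
import torch
import torch.nn as nn
from torchvision import models

class PADRegressor(nn.Module):
    """CNN that maps a face image to a 3-D (pleasure, arousal, dominance) vector."""
    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights=None)   # hypothetical backbone choice
        backbone.fc = nn.Identity()                # keep the 512-D pooled features
        self.backbone = backbone
        self.head = nn.Sequential(
            nn.Linear(512, 128), nn.ReLU(),
            nn.Linear(128, 3), nn.Tanh(),          # PAD assumed to lie in [-1, 1]
        )

    def forward(self, x):
        return self.head(self.backbone(x))

model = PADRegressor()
faces = torch.randn(4, 3, 224, 224)                # dummy batch of face crops
pad = model(faces)                                 # shape (4, 3): P, A, D per face
loss = nn.MSELoss()(pad, torch.zeros_like(pad))    # regression loss against PAD labels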

Citations

... The stimulus set [40] from Tilburg University collects photographs of 50 actors performing different emotions. The FABO database [41] is a pioneering work that proposed using video clips of intentionally posed prototypical gestures for emotion recognition. The videos in the dataset were tagged with six basic emotions and four states: neutral, anxiety, boredom, and uncertainty. ...
... Early work on automatic modeling of emotional gestures relied heavily on hand-crafted features [40,41,53]. Many recent works have introduced neural networks to accomplish gesture/action recognition. ...
Preprint
In this work, we focus on a special group of human body language -- the micro-gesture (MG), which differs from the range of ordinary illustrative gestures in that they are not intentional behaviors performed to convey information to others, but rather unintentional behaviors driven by inner feelings. This characteristic introduces two novel challenges regarding micro-gestures that are worth rethinking. The first is whether strategies designed for other action recognition are entirely applicable to micro-gestures. The second is whether micro-gestures, as supplementary data, can provide additional insights for emotional understanding. In recognizing micro-gestures, we explored various augmentation strategies that take into account the subtle spatial and brief temporal characteristics of micro-gestures, often accompanied by repetitiveness, to determine more suitable augmentation methods. Considering the significance of temporal domain information for micro-gestures, we introduce a simple and efficient plug-and-play spatiotemporal balancing fusion method. We not only studied our method on the considered micro-gesture dataset but also conducted experiments on mainstream action datasets. The results show that our approach performs well in micro-gesture recognition and on other datasets, achieving state-of-the-art performance compared to previous micro-gesture recognition methods. For emotional understanding based on micro-gestures, we construct complex emotional reasoning scenarios. Our evaluation, conducted with large language models, shows that micro-gestures play a significant and positive role in enhancing comprehensive emotional understanding. The scenarios we developed can be extended to other micro-gesture-based tasks such as deception detection and interviews. We confirm that our new insights contribute to advancing research in micro-gesture and emotional artificial intelligence.
... (3) The experimental results demonstrate that the domain adaptive method based on MJDDAN exhibits significant advantages in the task of facial expression recognition. This is validated through experiments conducted on the RAVDESS [19], FABO [20], and eNTERFACE [21] datasets. In comparison to other methods, MJDDAN achieves superior implementation outcomes. ...
... The experiments in this paper encompass three distinct databases, providing a comprehensive validation of the proposed methodology. These databases include the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) [19], the Bimodal Face and Body Gesture Database (FABO) [20], and the eNTERFACE '05 Audio-Visual Emotion Database (eNTERFACE) [21]. ...
... Each video contains 2-4 fully displayed emotional expressions of the same type. The database recorded nine emotional expressions, including surprise, anger, disgust, fear, boredom, anxiety, and uncertainty [20,[45][46][47][48]. Figure 4 shows a sample of the expressions in the FABO database. ...
Article
Full-text available
In order to obtain more fine-grained information from multiple sub-feature spaces for domain adaptation, this paper proposes a novel multi-representation joint dynamic domain adaptation network (MJDDAN) and applies it to achieve cross-database facial expression recognition. The MJDDAN uses a hybrid structure to extract multi-representation features and maps the original facial expression features into multiple sub-feature spaces, aligning the expression features of the source domain and target domain in multiple sub-feature spaces from different angles to extract features more comprehensively. Moreover, the MJDDAN proposes the Joint Dynamic Maximum Mean Difference (JD-MMD) model to reduce the difference in feature distribution between different subdomains by simultaneously minimizing the maximum mean difference and local maximum mean difference in each substructure. Three databases, including eNTERFACE, FABO, and RAVDESS, are used to design a large number of cross-database transfer learning facial expression recognition experiments. The accuracy of emotion recognition experiments with eNTERFACE, FABO, and RAVDESS as target domains reach 53.64%, 43.66%, and 35.87%, respectively. Compared to the best comparison method chosen in this article, the accuracy rates were improved by 1.79%, 0.85%, and 1.02%, respectively.
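The discrepancy term at the core of JD-MMD builds on the standard Maximum Mean Discrepancy between source- and target-domain features. The sketch below shows a plain single-kernel MMD as a rough illustration; the Gaussian bandwidth and the omission of the local, per-subdomain terms are simplifications, not the authors' implementation.

# Minimal MMD sketch (assumption: a single Gaussian kernel; the cited JD-MMD
# additionally weights local/class-conditional terms, which is omitted here).
import torch

def gaussian_kernel(x, y, sigma=1.0):
    # Pairwise Gaussian kernel values between rows of x and rows of y.
    d2 = torch.cdist(x, y) ** 2
    return torch.exp(-d2 / (2 * sigma ** 2))

def mmd2(source, target, sigma=1.0):
    """Squared MMD between two feature batches of shape (n, d) and (m, d)."""
    k_ss = gaussian_kernel(source, source, sigma).mean()
    k_tt = gaussian_kernel(target, target, sigma).mean()
    k_st = gaussian_kernel(source, target, sigma).mean()
    return k_ss + k_tt - 2 * k_st

src = torch.randn(32, 256)        # e.g. expression features from the source database
tgt = torch.randn(32, 256)        # features from the target database
alignment_loss = mmd2(src, tgt)   # added to the task loss during adaptation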
... In Table 1, the most widely used benchmarking corpora of emotional speech are listed with their main characteristics. The FABO [87] is a bimodal face and body gesture corpus for the automatic analysis of human nonverbal affective behavior. The corpus contains approximately 1900 videos of emotional facial expressions recorded with two cameras simultaneously. ...
Article
Full-text available
In this article, we present a novel approach for emotional speech lip-reading (EMOLIPS). This two-level approach to emotional speech-to-text recognition based on visual data processing is motivated by human perception and the recent developments in multimodal deep learning. The proposed approach uses visual speech data to determine the type of speech emotion. The speech data are then processed using one of the emotional lip-reading models trained from scratch. This essentially resolves the multi-emotional lip-reading issue associated with most real-life scenarios. We implemented these models as a combination of an EMO-3DCNN-GRU architecture for emotion recognition and a 3DCNN-BiLSTM architecture for automatic lip-reading. We evaluated the models on the CREMA-D and RAVDESS emotional speech corpora. In addition, this article provides a detailed review of recent advances in automated lip-reading and emotion recognition developed over the last 5 years (2018–2023). In comparison to existing research, we mainly focus on the valuable progress brought by the introduction of deep learning to the field and skip the description of traditional approaches. The EMOLIPS approach significantly improves the state-of-the-art accuracy for phrase recognition, reaching up to 91.9% and 90.9% for RAVDESS and CREMA-D, respectively, by considering emotional features of the pronounced audio-visual speech. Moreover, we present an extensive experimental investigation that demonstrates how different emotions (happiness, anger, disgust, fear, sadness, and neutral), valence (positive, neutral, and negative) and binary (emotional and neutral) states affect automatic lip-reading.
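The two-level idea, first recognizing the emotion from the visual speech and then routing the clip to an emotion-specific lip-reading model, can be outlined as below. The class names and the fallback behavior are placeholders; the actual EMO-3DCNN-GRU and 3DCNN-BiLSTM networks are not reproduced here.

# Sketch of the two-stage routing idea (placeholder models, hypothetical names).
from typing import Dict

EMOTIONS = ["happiness", "anger", "disgust", "fear", "sadness", "neutral"]

class EmotionClassifier:
    def predict(self, lip_frames) -> str:
        # Placeholder for the emotion-recognition network (EMO-3DCNN-GRU in the paper).
        return "neutral"

class LipReader:
    def transcribe(self, lip_frames) -> str:
        # Placeholder for one emotion-specific lip-reading network (3DCNN-BiLSTM).
        return ""

def two_stage_lipreading(lip_frames,
                         emotion_model: EmotionClassifier,
                         readers: Dict[str, LipReader]) -> str:
    """Stage 1: detect the spoken emotion; Stage 2: use the matching lip-reader."""
    emotion = emotion_model.predict(lip_frames)
    reader = readers.get(emotion, readers["neutral"])  # assumed fallback to the neutral model
    return reader.transcribe(lip_frames)

readers = {e: LipReader() for e in EMOTIONS}
text = two_stage_lipreading(lip_frames=[], emotion_model=EmotionClassifier(), readers=readers)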
... Emotion Datasets. Numerous efforts have been made to collect datasets and analyze human emotions across different modalities, encompassing facial expression [38], body gestures [32,23,45,21], text [53,17,10], music [8,20,24,14] and visual art [5,37]. However, most of them were limited in size or concerned with single modalities. ...
Preprint
Full-text available
We introduce Affective Visual Dialog, an emotion explanation and reasoning task as a testbed for research on understanding the formation of emotions in visually grounded conversations. The task involves three skills: (1) Dialog-based Question Answering (2) Dialog-based Emotion Prediction and (3) Affective emotion explanation generation based on the dialog. Our key contribution is the collection of a large-scale dataset, dubbed AffectVisDial, consisting of 50K 10-turn visually grounded dialogs as well as concluding emotion attributions and dialog-informed textual emotion explanations, resulting in a total of 27,180 working hours. We explain our design decisions in collecting the dataset and introduce the questioner and answerer tasks that are associated with the participants in the conversation. We train and demonstrate solid Affective Visual Dialog baselines adapted from state-of-the-art models. Remarkably, the responses generated by our models show promising emotional reasoning abilities in response to visually grounded conversations. Our project page is available at https://affective-visual-dialog.github.io.
... The FABO (FAce and BOdy) database is a bimodal collection of face and body gestures intended for automated analysis of nonverbal human affective behavior using vision-based methods [116]. It consists of recordings of integrated face and body expressions captured simultaneously. ...
... Overview of the listed databases (body parts / number of emotions / number of subjects): FABO [116]: face and body, 10 emotions, 23 subjects; GEMEP [117]: face and body, 18 emotions, 10 subjects; HUMAINE [118]: face and body, 8 emotions, 10 subjects; EMILYA [119]: body, 8 emotions, 11 subjects; LIRIS-ACCEDE [120]: face and body, 6 emotions, 61 subjects. ...
... The bimodal face and body gesture (FABO) database [18] is used to fine-tune the emotion recognition model in this study. The FABO database captured the facial expressions and upper body movements via two cameras and annotated them with the affective states including happiness, surprise, anger, fear, sadness, disgust, boredom, puzzlement, uncertainty and anxiety. ...
... Previous works are mainly based on one-gesture-one-emotion assumptions with two kinds of emotional modeling theories (Noroozi et al. 2018): the categorical and dimensional models. In the categorical model-based methods (Ginevra et al. 2008; Gunes and Piccardi 2006; Mahmoud et al. 2011), each emotion was associated with a meaningful gesture, and participants were asked to act out those emotions with their body gestures. Recently, some researchers have explored the possibility of analyzing bodily expression with a dimensional model (Kipp and Martin 2009; Luo et al. 2020). ...
... Compared to regular human gesture analysis, such as body pose, action, or sign language recognition, research efforts devoted to using gestural behaviors to interpret human emotion or affect are relatively few (Noroozi et al. 2018). Pioneering work on gesture-based emotion recognition in the computer vision field dates back more than 20 years (Ginevra et al. 2008; Gunes and Piccardi 2006; Schindler et al. 2008; Wallbott 1998). Wallbott (1998) collected 224 videos, in each of which an actor performed a body gesture representing an emotional state, following a scenario approach. ...
... In the work of Schindler et al. (2008), an image-based dataset was collected in which emotions were displayed by body language in front of a uniform background and different poses could express the same emotion. Gunes and Piccardi (2006) introduced a bimodal face and body gesture database, called FABO, including facial and gestural modalities. Different from the above laboratory settings, Kipp and Martin (2009) analyzed two movie versions of the play Death of a Salesman, trying to explore the correlations between basic gesture attributes and emotion. ...
Article
Full-text available
We explore using body gestures for hidden emotional state analysis. As an important non-verbal communicative fashion, human body gestures are capable of conveying emotional information during social communication. In previous works, efforts have been made mainly on facial expressions, speech, or expressive body gestures to interpret classical expressive emotions. Differently, we focus on a specific group of body gestures, called micro-gestures (MGs), used in the psychology research field to interpret inner human feelings. MGs are subtle and spontaneous body movements that are proven, together with micro-expressions, to be more reliable than normal facial expressions for conveying hidden emotional information. In this work, a comprehensive study of MGs is presented from the computer vision aspect, including a novel spontaneous micro-gesture (SMG) dataset with two emotional stress states and a comprehensive statistical analysis indicating the correlations between MGs and emotional states. Novel frameworks are further presented together with various state-of-the-art methods as benchmarks for automatic classification, online recognition of MGs, and emotional stress state recognition. The dataset and methods presented could inspire a new way of utilizing body gestures for human emotion understanding and bring a new direction to the emotion AI community. The source code and dataset are made available: https://github.com/mikecheninoulu/SMG.
... In addition, the program's movement exercises consist of actions with characteristics similar to gestures associated with the expression of positive emotions (joy and happiness). The body motions associated with expressions of joy and happiness have the characteristics of expansiveness, wherein the upper extremities and upper body move upward, with the arms extending laterally [52][53][54][55]. This typically involves raising both arms and reaching them outward to the sides of the body while also lifting the chest and head upward to create an expansive, uplifting posture. ...
Article
Full-text available
Background: Depression is a substantial global health problem, affecting >300 million people and resulting in 12.7% of all deaths. Depression causes various physical and cognitive problems, leading to a 5-year to 10-year decrease in life expectancy compared with the general population. Physical activity is known to be an effective, evidence-based treatment for depression. However, people generally have difficulties with participating in physical activity owing to limitations in time and accessibility. Objective: To address this issue, this study aimed to contribute to the development of alternative and innovative intervention methods for depression and stress management in adults. More specifically, we attempted to investigate the effectiveness of a mobile phone-based physical activity program on depression, perceived stress, psychological well-being, and quality of life among adults in South Korea. Methods: Participants were recruited and randomly assigned to the mobile phone intervention or waitlist group. Self-report questionnaires were used to assess variables before and after treatment. The treatment group used the program around 3 times per week at home for 4 weeks, with each session lasting about 30 minutes. To evaluate the program's impact, a 2 (condition) × 2 (time) repeated-measures ANOVA was conducted, considering pretreatment and posttreatment measures along with group as independent variables. For a more detailed analysis, paired-samples 2-tailed t tests were used to compare pretreatment and posttreatment measurements within each group. Independent-samples 2-tailed t tests were conducted to assess intergroup differences in pretreatment measurements. Results: The study included a total of 68 adults aged between 18 and 65 years, who were recruited both through web-based and offline methods. Of these 68 individuals, 41 (60%) were randomly assigned to the treatment group and 27 (40%) to the waitlist group. The attrition rate was 10.2% after 4 weeks. The findings indicated a significant main effect of time (F(1,60)=15.63; P=.003; ηp²=0.21) on participants' depression scores, indicating that depression levels changed across time. No significant changes were observed in perceived stress (P=.25), psychological well-being (P=.35), or quality of life (P=.07). Furthermore, depression scores significantly decreased in the treatment group (from 7.08 to 4.64; P=.03; Cohen d=0.50) but not in the waitlist group (from 6.72 to 5.08; P=.20; Cohen d=0.36). The perceived stress score of the treatment group also significantly decreased (from 2.95 to 2.72; P=.04; Cohen d=0.46) but not in the waitlist group (from 2.82 to 2.74; P=.55; Cohen d=0.15). Conclusions: This study provided experimental evidence that a mobile phone-based physical activity program significantly affects depression. By exploring the potential of mobile phone-based physical activity programs as a treatment option, this study sought to improve accessibility and encourage participation in physical activity, ultimately promoting better mental health outcomes for individuals with depression and stress.
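The analysis plan above, a 2 (group) × 2 (time) design followed by within- and between-group t tests, corresponds roughly to the sketch below on made-up scores; the column names, data, and the choice of the pingouin package for the mixed ANOVA are assumptions, not the study's actual code.

# Illustrative analysis of a 2 (group) x 2 (time) design on hypothetical scores.
import numpy as np
import pandas as pd
import pingouin as pg          # assumption: pingouin available for the mixed ANOVA
from scipy import stats

rng = np.random.default_rng(0)
n = 30                          # hypothetical number of participants
df = pd.DataFrame({
    "id": np.repeat(np.arange(n), 2),
    "group": np.repeat(["treatment"] * (n // 2) + ["waitlist"] * (n // 2), 2),
    "time": ["pre", "post"] * n,
    "score": rng.normal(6, 2, size=2 * n),
})

# 2 x 2 mixed ANOVA: time is within-subject, group is between-subject.
anova = pg.mixed_anova(data=df, dv="score", within="time", subject="id", between="group")

# Follow-up: paired t test within the treatment group (pre vs. post) ...
treat = df[df.group == "treatment"]
t_within = stats.ttest_rel(treat[treat.time == "pre"].score,
                           treat[treat.time == "post"].score)

# ... and independent t test between groups at pretreatment.
pre = df[df.time == "pre"]
t_between = stats.ttest_ind(pre[pre.group == "treatment"].score,
                            pre[pre.group == "waitlist"].score)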
... In this section, the datasets used for the emotion recognition task are briefly explained. The different datasets used for emotion detection are comparatively analyzed in Table 13, where FABO [46] contains the largest number of "Emotion States" compared to the other datasets. In terms of subjects (participants), the SEMAINE [97] database contains the largest number of subjects, i.e., 150 (57 male and 93 female), recorded for emotion detection tasks. ...
... We can discuss the performance of these models on the widely used MS-COCO dataset. A comparative analysis of the MMID model's experimental results on benchmark evaluation metrics is shown in the corresponding charts, which compare it against CRNN [38], R-LSTM [39], RFNet [40], He [42], Feng [43], GET [44], Wang [45], FCN-LSTM [46], Bag-LSTM [47], Stack-VS [48], VSR [49], GLA [50], Up-Down [51], MAGAN [55], and MGAN [56]. ...
Article
Full-text available
Deep learning has been applied to a wide range of applications and has become increasingly popular in recent years. The goal of multimodal deep learning (MMDL) is to create models that can process and link information using various modalities. Despite the extensive development made for unimodal learning, it still cannot cover all the aspects of human learning. Multimodal learning helps to understand and analyze better when various senses are engaged in the processing of information. This paper focuses on multiple types of modalities, i.e., image, video, text, audio, body gestures, facial expressions, physiological signals, flow, RGB, pose, depth, mesh, and point cloud. A detailed analysis of the baseline approaches and an in-depth study of recent advancements during the last five years (2017 to 2021) in multimodal deep learning applications have been provided. A fine-grained taxonomy of various multimodal deep learning methods is proposed, elaborating on different applications in more depth. Lastly, main issues are highlighted separately for each domain, along with their possible future research directions.
... Facial expressions and upper body movements were recorded with several cameras. The Bimodal Face and Body Gesture Database (FABO) [20] is composed of 206 videos related to upper body features in which facial expression and body information are combined to represent 10 emotions. In the HUMAINE [21] database 10 people acted 8 emotions: anger, despair, interest, pleasure, sadness, irritation, joy, and pride. ...
... were more inclined to have a static pose with the addition of gestures and postures that could reveal their emotion. In line with the literature, four experts annotated the videos with the perceived emotion, valence, and arousal [20][10]. The annotations were collected through a Matlab application in which the observers could view the same video an unlimited number of times. ...
Conference Paper
Recognizing emotions from body movements represents a challenge in affective computing. Most methods in the literature focus on analyzing speech features and facial expressions; yet, even considering body postures and motions can help in identifying emotions. To this end, datasets have been designed to assess upper limb movement and hand gestures. However, even the lower body (legs and feet) can be used to reveal information about the user's attitude. In this paper a new video database for emotion recognition is presented. 16 non-professional actors express four emotions (happiness, interest, disgust, and boredom). The videos have been acquired by using four GoPro cameras to record whole body movements in two different scenarios: observational and interaction with another person. 14 body joints are extracted from each frame of each video and they are used to derive features to be used for emotion identification and recognition.