The high-level architecture diagram of the proposed model. From left to right: (a) EEG topograph inputs, (b) convolutional variational encoders, (c) lateralized embeddings, (d) bi-lateral feature concatenation and lateral convolution, (e) subject-independent dense embeddings, (f) emotion classifier, and (g) adversarial subject classifier.


Source publication
Article
Full-text available
Two of the biggest challenges in building models for detecting emotions from electroencephalography (EEG) devices are the relatively small amount of labeled samples and the strong variability of signal feature distributions between different subjects. In this study, we propose a context-generalized model that tackles the data constraints and subjec...

Contexts in source publication

Context 1
... overview of the architecture of the proposed model is shown in Figure 2. Similar to Li et al. [14], we adopt a bi-lateral architecture to emulate the lateralized brain theories of emotion [43], which are frequently used in neuroscience and neuroimaging studies. ...
Context 2
... generate the visualizations, we train the respective algorithms with a random 70% batch from the datasets. As for the inputs to the visualization algorithms, in the case of the deep models, we use the embedding layer outputs of the models, or in other words, the data embeddings positioned right before the final set of fully-connected classification layers, as shown in the right side of Figure 2. For VMD and MIDA, we use the raw features generated by the algorithms. ...
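The snippet above describes taking the embedding-layer outputs (the features just before the final dense classifier) as inputs to the visualization algorithms. A minimal sketch of that extraction, using a hypothetical two-layer network with made-up dimensions (16-dimensional input, 8-dimensional embedding, 3 classes) and a random 70% batch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-layer network: the "embedding layer" output is the
# activation right before the final fully-connected classification layer.
W1 = rng.normal(size=(16, 8))   # input dim 16 -> embedding dim 8 (assumed)
W2 = rng.normal(size=(8, 3))    # embedding dim 8 -> 3 classes (assumed)

def embed(x):
    """Return the embedding-layer output (pre-classifier features)."""
    return np.tanh(x @ W1)

def classify(x):
    """Full forward pass: embedding followed by the dense classifier."""
    return embed(x) @ W2

# A random 70% batch, as used for training the visualization algorithms.
X = rng.normal(size=(100, 16))
idx = rng.permutation(len(X))[: int(0.7 * len(X))]
embeddings = embed(X[idx])      # these would be fed to t-SNE or similar
print(embeddings.shape)         # (70, 8)
```

For the deep models these embeddings replace the raw features that VMD and MIDA pass to the visualizer directly.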

Similar publications

Article
Full-text available
Background Patients with type 2 diabetes require recommendations for self-management education and support. Objective In this study, we aim to design the Diabetes Engagement and Activation Platform (DEAP)—an automated patient education tool integrated into primary care workflow—and examine its implementation and effectiveness. Methods We invited...

Citations

... Strong individual differences might be caused by subjects' sensory responses to the stimulus materials [9], clinical settings, and other factors. If the feature extractor is cumbersome and complex, it is difficult for the emotion classification loss and domain loss to update the weights of a large number of parameters, due to the limited data quantity and strong individual differences [10]. On the other hand, it is also difficult for a simple feature extractor to extract discriminative features from the complex temporal dynamics and spatial correlations of EEG. ...
... ŷ = σ(Wh) (10), where ŷ is the prediction of the teacher model, W is the weight of the output layer, and σ(·) denotes the sigmoid function. ...
... TCN has a strong ability to learn temporal context, but spatial correlations among the electrodes could not be captured as well as with TSFIN. In Ref. [10], bi-hemisphere asymmetry learning and DANN are combined to learn domain-invariant features. Different from the proposed method, variational autoencoders, which are trained on labeled and unlabeled samples, are adopted to deal with the limited-EEG-data problem. ...
Preprint
Individual differences in electroencephalogram (EEG) signals can cause a domain shift that significantly degrades the performance of cross-subject strategies. Domain adversarial neural networks (DANN), where the classification loss and domain loss jointly update the parameters of the feature extractor, are adopted to deal with the domain shift. However, limited EEG data quantity and strong individual differences are challenges for a DANN with a cumbersome feature extractor. In this work, we propose a knowledge distillation (KD) based lightweight DANN to enhance cross-subject EEG-based emotion recognition. Specifically, a teacher model with strong context-learning ability is utilized to learn the complex temporal dynamics and spatial correlations of EEG, and a robust lightweight student model is guided by the teacher model to learn the more difficult domain-invariant features. In the feature-based KD framework, a transformer-based hierarchical temporal-spatial learning model serves as the teacher model. The student model, which is composed of Bi-LSTM units, is a lightweight version of the teacher model. Hence, the student model can be supervised to mimic the robust feature representations of the teacher model by leveraging complementary latent temporal and spatial features. In the DANN-based cross-subject emotion recognition, we combine the obtained student model and a lightweight temporal-spatial feature interaction module as the feature extractor. The aggregated features are fed to the emotion classifier and domain classifier for domain-invariant feature learning. To verify the effectiveness of the proposed method, we conduct subject-independent experiments on the public DEAP dataset with arousal and valence classification. The outstanding performance and t-SNE visualization of latent features verify the advantage and effectiveness of the proposed method.
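The feature-based KD described above supervises the student to mimic the teacher's latent features while still fitting the emotion labels. A minimal sketch of such a combined loss, with a hypothetical weighting `alpha` and random stand-in features (not the paper's actual architecture or loss weights):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def kd_feature_loss(student_feats, teacher_feats, student_logits, labels,
                    alpha=0.5):
    """Feature-based KD: pull the student toward the teacher's latent
    features (MSE) while fitting the emotion labels (cross-entropy).
    alpha is a hypothetical weighting between the two terms."""
    mimic = np.mean((student_feats - teacher_feats) ** 2)
    probs = softmax(student_logits)
    ce = -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))
    return alpha * mimic + (1.0 - alpha) * ce

rng = np.random.default_rng(0)
t_feats = rng.normal(size=(4, 8))                    # teacher latent features
s_feats = t_feats + 0.1 * rng.normal(size=(4, 8))    # near-mimicking student
logits = rng.normal(size=(4, 2))                     # binary arousal/valence logits
labels = np.array([0, 1, 0, 1])
print(kd_feature_loss(s_feats, t_feats, logits, labels))
```

With `alpha=1.0` the loss reduces to pure feature mimicking and vanishes when student and teacher features match exactly.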
... One trend in the literature is to adapt DA methods originally proposed in one context (e.g., image classification) to another (e.g., EEG emotion recognition). For example, Hagad et al. (2021) propose methods to adapt DA strategies from image classification to EEG emotion classification. However, each context has its own characteristics and peculiarities, making it non-trivial to adapt a DA method from one task to another. ...
... Adversarial methods for DA are widely applied in different real-life applications, such as bearing fault diagnosis (Ghorvei et al., 2023) and cross-machine fault diagnosis (Yu et al., 2021). They are also widely used in several studies on EEG data recognition, such as Tzeng et al. (2017), Bao et al. (2020), Li et al. (2019b), Roy (2022), Ding et al. (2021), Hagad et al. (2021) and He et al. (2022). The literature also offers applications of deep DA and shallow DA used together. ...
... In Liu et al. (2021) a similar strategy is adopted, but in a Domain Adaptation context. Hagad et al. (2021) combine BiDANN and a Variational Autoencoder (VAE), obtaining a subject-invariant Bi-lateral Variational Domain Adversarial Neural Network (BiVDANN). VAEs are generative neural networks able to learn embeddings of data constrained to a Gaussian distribution. ...
... For these reasons, newer studies have tried to overcome the Dataset Shift problem in EEG-based BCIs [32]. In particular, Domain Adaptation (DA) strategies try to construct models able to generalize to unseen data by exploiting knowledge from available unlabelled data. ...
Article
Full-text available
This work addresses the employment of Machine Learning (ML) and Domain Adaptation (DA) in the framework of Brain-Computer Interfaces (BCIs) based on Steady-State Visually Evoked Potentials (SSVEPs). Currently, the state-of-the-art classification strategies do not consider the high non-stationarity typical of brain signals. This can lead to poor performance, especially when short-time signals have to be considered to allow real-time human-environment interaction. In this regard, ML and DA techniques can represent a suitable strategy to enhance the performance of SSVEP classification pipelines. In particular, the employment of a two-step DA technique is proposed: first, the data are standardized per subject by exploiting a part of the unlabeled test data during the training stage; second, a similarity measure between subjects is considered in the selection of the validation sets. The proposal was applied to three classifiers to verify the statistical significance of the improvements over the standard approaches. These classifiers were validated and comparatively tested on a well-known public benchmark dataset. An appropriate validation method was used in order to simulate real-world usage. The experimental results show that the proposed approach significantly improves the classification accuracy of SSVEPs. In fact, accuracy of up to 62.27% was achieved even for short-time signals (i.e., 1.0 s). This represents a further confirmation of the suitability of advanced ML to improve the performance of BCIs for daily-life applications.
... In Liu et al. (2021) a similar strategy is adopted, but in a Domain Adaptation context. Hagad et al. (2021) combine BiDANN and a Variational Autoencoder (VAE), obtaining a subject-invariant Bi-lateral Variational Domain Adversarial Neural Network (BiVDANN). VAEs are generative neural networks able to learn embeddings of data constrained to a Gaussian distribution. ...
Preprint
Full-text available
A systematic review of machine-learning strategies for improving generalizability (cross-subject and cross-session) in electroencephalography (EEG)-based emotion classification was conducted. In this context, the non-stationarity of EEG signals is a critical issue and can lead to the Dataset Shift problem. Several architectures and methods have been proposed to address this issue, mainly based on transfer learning methods. 418 papers were retrieved from the Scopus, IEEE Xplore and PubMed databases through a search query focusing on modern machine learning techniques for generalization in EEG-based emotion assessment. Among these papers, 75 were found eligible based on their relevance to the problem. Studies lacking a specific cross-subject and cross-session validation strategy or making use of other biosignals as support were excluded. On the basis of the selected papers' analysis, a taxonomy of the studies employing Machine Learning (ML) methods was proposed, together with a brief discussion of the different ML approaches involved. The studies with the best results in terms of average classification accuracy were identified, supporting the view that transfer learning methods perform better than other approaches. A discussion is proposed on the impact of (i) the emotion theoretical models and (ii) psychological screening of the experimental sample on classifier performance.
... In [190], Hagad et al. employ a DANN, consisting of a domain classifier and an emotion classifier, alongside a beta-VAE [191], treating each of the multiple sources as a single domain and feeding the DANN with the outputs of bi-lateral convolutions on the concatenated VAE outputs of the two hemispheres. [192] proposes a generalization approach that theoretically guarantees a generalization bound on unseen domains. ...
Preprint
Machine learning algorithms have revolutionized different fields, including natural language processing, computer vision, signal processing, and medical data processing. Despite the excellent capabilities of machine learning algorithms in various tasks and areas, the performance of these models deteriorates markedly when there is a shift between the test and training data distributions. This gap occurs due to the violation of the fundamental assumption that the training and test data are independent and identically distributed (i.i.d.). In real-world scenarios where collecting data from all possible domains for training is costly or even impossible, the i.i.d. assumption can hardly be satisfied. The problem is even more severe in the case of medical images and signals because collecting data, even for a single domain, requires either expensive equipment or a meticulous experimental setup. Additionally, the decrease in performance may have severe consequences in the analysis of medical records. As a result of such problems, the ability to generalize and adapt under distribution shifts (domain generalization (DG) and domain adaptation (DA)) is essential for the analysis of medical data. This paper provides the first systematic review of DG and DA on functional brain signals, filling the gap left by the absence of a comprehensive study in this area. We provide detailed explanations and categorizations of the datasets, approaches, and architectures used in DG and DA on functional brain images. We further address promising future directions in this field.
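The DANN setups recurring in these citations rely on a gradient reversal layer: an identity map in the forward pass whose gradient is negated (and scaled) in the backward pass, so the feature extractor learns to confuse the domain classifier. A minimal sketch of that mechanism with an assumed scaling factor `lam` (frameworks implement this via a custom autograd function; the explicit backward here is for illustration only):

```python
import numpy as np

def grl_forward(x):
    """Identity in the forward pass."""
    return x

def grl_backward(grad, lam=1.0):
    """Negated, scaled gradient in the backward pass, so the feature
    extractor is updated to *confuse* the domain classifier while the
    domain classifier itself still minimizes its own loss."""
    return -lam * grad

# Toy check: the domain-loss gradient w.r.t. the features is flipped
# before it reaches the feature extractor.
g = np.array([0.5, -2.0])
print(grl_backward(g, lam=0.3))   # [-0.15  0.6 ]
```

In PyTorch this would typically be wrapped in a `torch.autograd.Function` whose `backward` returns the negated gradient.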
... These methods motivate the development of EEG-based emotion recognition, but the handcrafted features would potentially lose the detailed EEG fluctuation information which is beneficial to the performance. For example, the β band (13–30 Hz) and γ band (30–47 Hz) are common choices in PSD and DE extraction. The fairly long bandwidth would cause dynamic temporal information loss. ...
Preprint
Both the temporal dynamics and the spatial correlations of electroencephalogram (EEG) signals, which contain discriminative emotion information, are essential for emotion recognition. However, some redundant information within the EEG signals degrades performance. Specifically, the subjects reach the prospective intense emotions for only a fraction of the stimulus duration. Besides, it is a challenge to extract discriminative features from the complex spatial correlations among a large number of electrodes. To deal with these problems, we propose a transformer-based model to robustly capture the temporal dynamics and spatial correlations of EEG. In particular, temporal feature extractors that share weights among all the EEG channels are designed to adaptively extract dynamic context information from raw signals. Furthermore, the multi-head self-attention mechanism within the transformers can adaptively localize the vital EEG fragments and emphasize the essential brain regions that contribute to the performance. To verify the effectiveness of the proposed method, we conduct experiments on two public datasets, DEAP and MAHNOB-HCI. The results demonstrate that the proposed method achieves outstanding performance on arousal and valence classification.
... Fourier analysis reflects the frequency content using sums of trigonometric functions and then distributes the average power into the PSD. As shown in Figure 1a, the PSD of a human EEG signal is approximately divided into several ranges, including the theta (4–7 Hz), alpha (8–12 Hz), beta (13–32 Hz), and gamma (>32 Hz) bands. These frequency bands reflect brain activities through the strength of variation. ...
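Distributing average power into the band-limited PSD ranges described above can be sketched as follows, using a simple periodogram as a stand-in for a full Welch PSD estimate and a synthetic 10 Hz test tone (the sampling rate and signal are assumptions for illustration):

```python
import numpy as np

fs = 200                                  # assumed sampling rate in Hz
t = np.arange(0, 2.0, 1 / fs)
x = np.sin(2 * np.pi * 10 * t)            # pure 10 Hz (alpha-band) signal

# Simple periodogram; scipy.signal.welch would be used in practice.
freqs = np.fft.rfftfreq(len(x), 1 / fs)
psd = np.abs(np.fft.rfft(x)) ** 2 / len(x)

bands = {"theta": (4, 7), "alpha": (8, 12),
         "beta": (13, 32), "gamma": (32, 47)}

# Sum the PSD bins falling inside each band.
power = {name: psd[(freqs >= lo) & (freqs <= hi)].sum()
         for name, (lo, hi) in bands.items()}
print(max(power, key=power.get))          # alpha dominates for a 10 Hz tone
```

Band-power features of this kind are what PSD-based emotion pipelines typically feed to the classifier.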
... Theta (4–7 Hz) is a slow wave associated with the subconscious mind, deep relaxation, and meditation. Changes in alpha (8–12 Hz) and beta (13–32 Hz) waves are the most discriminative for emotional states [12,18]. Gamma (>33 Hz) is a hyperactivity wave associated with problem-solving and concentration and is related to positive and negative emotions but on different sides; left for negative and right for positive [19]. ...
... Accordingly, we specifically adopted four kernels of length 25 ms (=5 samples under the sampling rate of 200 Hz), 50 ms (=10 samples), 100 ms (=20 samples), and 200 ms (=40 samples). These choices were based on four frequency bands, namely theta (4–7 Hz), alpha (8–12 Hz), beta (13–32 Hz), and gamma (>33 Hz), extensively used to characterize brain states/activities. The sampling rate is also considered so that wavelengths can be captured on the same time scale, even across different datasets. ...
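The kernel lengths quoted above follow directly from the sampling rate: samples = duration in ms × fs / 1000. A quick sketch of that conversion:

```python
fs = 200  # sampling rate in Hz, as stated in the snippet

def kernel_samples(length_ms, fs):
    """Convert a temporal kernel length in milliseconds to samples."""
    return int(length_ms * fs / 1000)

for ms in (25, 50, 100, 200):
    print(ms, "ms ->", kernel_samples(ms, fs), "samples")
# 25 ms -> 5, 50 ms -> 10, 100 ms -> 20, 200 ms -> 40
```

Expressing kernels in milliseconds first, then converting per dataset, keeps the time scale consistent across recordings with different sampling rates.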
Article
Full-text available
Deep learning using an end-to-end convolutional neural network (ConvNet) has been applied to several electroencephalography (EEG)-based brain–computer interface tasks to extract feature maps and classify the target output. However, EEG analysis remains challenging since it requires consideration of various architectural design components that influence the representational ability of the extracted features. This study proposes an EEG-based emotion classification model called the multi-kernel temporal and spatial convolution network (MultiT-S ConvNet). The model uses multi-scale kernels to learn various time resolutions, and separable convolutions to find related spatial patterns. In addition, we enhanced both the temporal and spatial filters with a lightweight gating mechanism. To validate the performance and classification accuracy of MultiT-S ConvNet, we conduct subject-dependent and subject-independent experiments on the EEG-based emotion datasets DEAP and SEED. Compared with existing methods, MultiT-S ConvNet achieves higher accuracy with fewer trainable parameters. Moreover, the proposed multi-scale module in temporal filtering enables extracting a wide range of EEG representations, covering short- to long-wavelength components. This module could be further implemented in any EEG-based convolutional network, and its ability potentially improves the model's learning capacity.
... VAEs often employ the Kullback-Leibler (KL) divergence, a measure of how the probability distribution of the latent space differs from the prior distribution it is sampled from [20]. A special version of the VAE was proposed in [21], focused on learning a generalised model of emotion by concurrently optimizing the goals of learning normally distributed and subject-independent feature representations, via the use of spectral topography data. The ultimate objective was to maximize dataset inter-compatibility, improve robustness to localized electrode noise, and provide a more generally applicable method within neuroscience. ...
Article
Full-text available
Dimensionality reduction and the automatic learning of key features from electroencephalographic (EEG) signals have always been challenging tasks. Variational autoencoders (VAEs) have been used for EEG data generation and augmentation, denoising, and automatic feature extraction. However, investigations of the optimal shape of their latent space have been neglected. This research sought the minimal size of the latent space of convolutional VAEs, trained with spectral topographic EEG head-maps of different frequency bands, that leads to the maximum reconstruction capacity of the input and maximum utility for classification tasks. Head-maps are generated employing a sliding-window technique with a 125 ms shift. Person-specific convolutional VAEs are trained to learn latent spaces of varying dimensions, while a dense neural network is trained to investigate their utility on a classification task. The empirical results suggest that when VAEs are deployed on spectral topographic maps of shape 32 x 32, derived from 2 seconds of cerebral activity over 32 electrodes, they are capable of reducing the input by almost 99%, with a latent space of 28 means and standard deviations. This did not compromise the salient information, as confirmed by the structural similarity index and the mean squared error between the input and reconstructed maps. Additionally, 28 means maximized the utility of the latent spaces in the classification task, with an average accuracy of 0.93. This study contributes to the body of knowledge by offering a pipeline for effective dimensionality reduction of EEG data employing convolutional variational autoencoders.
... One trend in the literature is to adapt DA methods originally proposed in one context (e.g., image classification) to another (e.g., EEG emotion recognition). For example, Hagad et al. (2021) propose methods to adapt DA strategies from image classification to EEG emotion classification. However, each context has its own characteristics and peculiarities, making it non-trivial to adapt a DA method from one task to another. ...
... Adversarial methods for DA are widely applied in different real-life applications, such as bearing fault diagnosis (Ghorvei et al., 2023) and cross-machine fault diagnosis (Yu et al., 2021). They are also widely used in several studies on EEG data recognition, such as Tzeng et al. (2017), Bao et al. (2020), Li et al. (2019b), Roy (2022), Ding et al. (2021), Hagad et al. (2021) and He et al. (2022). The literature also offers applications of deep DA and shallow DA used together. ...
Preprint
In the Machine Learning (ML) literature, a well-known problem is the Dataset Shift problem where, differently from the standard ML hypothesis, the data in the training and test sets can follow different probability distributions, leading ML systems toward poor generalisation performance. This problem is intensely felt in the Brain-Computer Interface (BCI) context, where bio-signals such as electroencephalographic (EEG) signals are often used. In fact, EEG signals are highly non-stationary both over time and between different subjects. To overcome this problem, several proposed solutions are based on recent transfer learning approaches such as Domain Adaptation (DA). In several cases, however, the actual causes of the improvements remain ambiguous. This paper focuses on the impact of data normalisation, or standardisation, strategies applied together with DA methods. In particular, using the SEED, DEAP, and BCI Competition IV 2a EEG datasets, we experimentally evaluated the impact of different normalisation strategies applied with and without several well-known DA methods, comparing the obtained performance. The results show that the choice of the normalisation strategy plays a key role in the classifier performance in DA scenarios and, interestingly, in several cases the use of only an appropriate normalisation schema outperforms the DA technique.
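The per-subject standardisation that this abstract evaluates against DA methods can be sketched as a z-score computed with each subject's own statistics (the data shapes and subject layout here are illustrative assumptions):

```python
import numpy as np

def standardize_per_subject(X, subject_ids):
    """Z-score each subject's trials with that subject's own mean and
    standard deviation, a common normalisation step applied before
    (or, per the abstract, sometimes instead of) DA."""
    X = X.astype(float).copy()
    for s in np.unique(subject_ids):
        m = subject_ids == s
        mu = X[m].mean(axis=0)
        sd = X[m].std(axis=0) + 1e-8     # guard against zero variance
        X[m] = (X[m] - mu) / sd
    return X

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(5, 2, (20, 3)),      # subject 0: shifted
                    rng.normal(-3, 0.5, (20, 3))])  # subject 1: other scale
subjects = np.repeat([0, 1], 20)
Z = standardize_per_subject(X, subjects)
# After standardisation each subject's features are zero-mean, unit-variance,
# removing the gross between-subject distribution shift.
```

Whether this simple schema or a full DA method wins is exactly the comparison the paper runs.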