Figure - available from: Frontiers in Neuroscience
Simplified representation of the dual stream prediction model (DSPM) for imagined speech. The dorsal stream is shown in yellow boxes and the ventral stream in blue boxes. The red circle represents the truncation of information at the primary motor cortex in the case of speech imagery. pSTG, posterior superior temporal gyrus; STS, superior temporal sulcus. The primary auditory cortex lies in the superior temporal gyrus and extends into Heschl's gyri. Although Heschl's gyri are involved in speech perception, they are not activated during speech imagery.


Source publication
Article
Full-text available
Over the past decade, many researchers have come up with different implementations of systems for decoding covert or imagined speech from EEG (electroencephalogram). These implementations differ from each other in several aspects, from data acquisition to machine learning algorithms, which often makes a comparison between them difficult. T...

Similar publications

Preprint
Full-text available
In the field of brain-computer interface (BCI) research, the availability of high-quality open-access datasets is essential to benchmark the performance of emerging algorithms. The existing open-access datasets from past competitions mostly deal with healthy individuals' data, while the major application area of BCI is in the clinical domain. Thus...

Citations

... To assess different selection strategies, manual selection based on literature and an automated method using Common Spatial Pattern (CSP) analysis were compared. The manual approach selected channels close to areas known for speech processing, including the Post-central gyrus, Wernicke's area, Pre-motor cortex, Broca's area, and auditory cortices [70]. CSP analysis, implemented via the MNE Python toolkit [71], identified channels contributing most to discriminative spatial patterns, optimizing the decoding task's channel set for each dataset and class. ...
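For context, a minimal sketch of how such a CSP-based channel ranking can be implemented with the MNE Python toolkit is shown below. The data variables, number of components, and ranking heuristic (summing absolute pattern weights) are illustrative assumptions and not the cited study's exact procedure.

```python
# Hypothetical sketch: ranking EEG channels by their contribution to CSP patterns.
# Assumes `epochs_data` (n_epochs, n_channels, n_times), integer `labels`, and a
# list of channel names `ch_names` are already loaded from the dataset at hand.
import numpy as np
from mne.decoding import CSP

def rank_channels_by_csp(epochs_data, labels, ch_names, n_components=4):
    """Fit CSP and score each channel by its weight in the leading spatial patterns."""
    csp = CSP(n_components=n_components, log=True)
    csp.fit(epochs_data, labels)
    # Each row of `patterns_` is a spatial pattern over channels; large absolute
    # weights mark channels that contribute strongly to the discriminative patterns.
    scores = np.abs(csp.patterns_[:n_components]).sum(axis=0)
    order = np.argsort(scores)[::-1]
    return [ch_names[i] for i in order], scores[order]

# Usage (illustrative): keep the 16 highest-ranked channels.
# selected, scores = rank_channels_by_csp(epochs_data, labels, ch_names)
# print(selected[:16])
```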
Preprint
Neurological disorders affecting speech production adversely impact quality of life for over 7 million individuals in the US. Traditional speech interfaces like eyetracking devices and P300 spellers are slow and unnatural for these patients. An alternative solution, speech Brain-Computer Interfaces (BCIs), directly decodes speech characteristics, offering a more natural communication mechanism. This research explores the feasibility of decoding speech features using non-invasive EEG. Nine neurologically intact participants were equipped with a 63-channel EEG system with additional sensors to eliminate eye artifacts. Participants read aloud sentences displayed on a screen selected for phonetic similarity to the English language. Deep learning models, including Convolutional Neural Networks and Recurrent Neural Networks with/without attention modules, were optimized with a focus on minimizing trainable parameters and utilizing small input window sizes. These models were employed for discrete and continuous speech decoding tasks, achieving above-chance participant-independent decoding performance for discrete classes and continuous characteristics of the produced audio signal. A frequency sub-band analysis highlighted the significance of certain frequency bands (delta, theta, and gamma) for decoding performance, and a perturbation analysis identified crucial channels. Assessed channel selection methods did not significantly improve performance, but they still outperformed chance levels, suggesting a distributed representation of speech information encoded in the EEG signals. Leave-One-Out training demonstrated the feasibility of utilizing common speech neural correlates, reducing data collection requirements from individual participants.
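As a rough illustration of the kind of compact, parameter-light convolutional decoder this preprint describes, here is a hedged PyTorch sketch; the channel count, window length, class count, and layer sizes are assumptions chosen for illustration, not the authors' architecture.

```python
# Minimal 1-D CNN for windowed multichannel EEG classification (illustrative only).
import torch
import torch.nn as nn

class SmallEEGCNN(nn.Module):
    def __init__(self, n_channels=63, n_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=7, padding=3),
            nn.BatchNorm1d(32), nn.ELU(), nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=7, padding=3),
            nn.BatchNorm1d(64), nn.ELU(),
            nn.AdaptiveAvgPool1d(1))                 # collapse the time axis
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x):                            # x: (batch, n_channels, n_samples)
        return self.classifier(self.features(x).squeeze(-1))

# Usage (illustrative): logits = SmallEEGCNN()(torch.randn(8, 63, 256))  # -> (8, 4)
```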
... Imagined speech paradigms will activate neuromotor signals, although the signals may be less robust than when words or sounds are mouthed or spoken. These paradigms are characterized by a participant imagining the articulatory movements involved in the generation of different target phonemes and words (Panachakel and Ramakrishnan, 2021), which capitalizes on the fact that imagining a movement will still lead to activation of the areas of the brain involved in generating movements. An even more subtle signal is generated by inner speech (also called internal speech or covert self-talk), which involves participants thinking about specific words, but without reconstructing the required articulation to generate them. ...
... The rapid expansion of deep-learning and signal processing methods has led to promising state-of-the-art speech decoding methods, with results that go far beyond near-chance level findings (Pressel Coretto et al., 2017), as shown in Tables 1, 2 (for detailed reviews, see Panachakel and Ramakrishnan, 2021; Lopez-Bernal et al., 2022; Shah et al., 2022). However, comparing these methods and assessing how generalizable they are is currently a difficult task because few papers make their data or code publicly available (Shah et al., 2022). ...
... The two databases analyzed in this work were selected based on the ease of access to their data online, their popularity within the speech decoding community for benchmarking purposes (Panachakel and Ramakrishnan, 2021), and the close correspondence between their stimuli sets in terms of the types of classification problems that are typically posed. ...
Article
Full-text available
Speech decoding from non-invasive EEG signals can achieve relatively high accuracy (70–80%) for strictly delimited classification tasks, but for more complex tasks non-invasive speech decoding typically yields a 20–50% classification accuracy. However, decoder generalization, or how well algorithms perform objectively across datasets, is complicated by the small size and heterogeneity of existing EEG datasets. Furthermore, the limited availability of open access code hampers a comparison between methods. This study explores the application of a novel non-linear method for signal processing, delay differential analysis (DDA), to speech decoding. We provide a systematic evaluation of its performance on two public imagined speech decoding datasets relative to all publicly available deep learning methods. The results support DDA as a compelling alternative or complementary approach to deep learning methods for speech decoding. DDA is a fast and efficient time-domain open-source method that fits data using only a few strong features and does not require extensive preprocessing.
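To make the idea of delay differential analysis (DDA) more concrete, the following is a hedged NumPy sketch: a short delay differential model is least-squares-fitted to an EEG window, and the few fitted coefficients plus the residual error serve as features. The specific monomial form and delays are illustrative assumptions, not the model used in the cited study.

```python
# Illustrative DDA-style feature extraction for a single EEG window `x` (1-D array).
import numpy as np

def dda_features(x, tau1=7, tau2=12):
    """Fit dx/dt ~ a1*x(t-tau1) + a2*x(t-tau2) + a3*x(t-tau1)**2 (delays in samples)."""
    dx = np.gradient(x)                      # numerical derivative of the window
    t0 = max(tau1, tau2)
    target = dx[t0:]
    X = np.column_stack([x[t0 - tau1:len(x) - tau1],
                         x[t0 - tau2:len(x) - tau2],
                         x[t0 - tau1:len(x) - tau1] ** 2])
    coeffs, *_ = np.linalg.lstsq(X, target, rcond=None)
    rho = np.sqrt(np.mean((target - X @ coeffs) ** 2))   # fitting error
    return np.append(coeffs, rho)            # features: (a1, a2, a3, error)
```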
... This application could be of particular interest to subjects with reduced communication abilities. Generating speech from recorded EEG signals has been demonstrated to be an extremely challenging task, given the complexity of human speech and the lack of a clear understanding of the mechanisms that could map EEG signals to speech [66]. However, GAI techniques have been shown to provide a framework that could succeed in this task. ...
Article
Full-text available
Since their inception more than 50 years ago, Brain-Computer Interfaces (BCIs) have held promise to compensate for functions lost by people with disabilities through allowing direct communication between the brain and external devices. While research throughout the past decades has demonstrated the feasibility of BCI to act as a successful assistive technology, the widespread use of BCI outside the lab is still beyond reach. This can be attributed to a number of challenges that need to be addressed for BCI to be of practical use including limited data availability, limited temporal and spatial resolutions of brain signals recorded non-invasively and inter-subject variability. In addition, for a very long time, BCI development has been mainly confined to specific simple brain patterns, while developing other BCI applications relying on complex brain patterns has been proven infeasible. Generative Artificial Intelligence (GAI) has recently emerged as an artificial intelligence domain in which trained models can be used to generate new data with properties resembling that of available data. Given the enhancements observed in other domains that possess similar challenges to BCI development, GAI has been recently employed in a multitude of BCI development applications to generate synthetic brain activity; thereby, augmenting the recorded brain activity. Here, a brief review of the recent adoption of GAI techniques to overcome the aforementioned BCI challenges is provided demonstrating the enhancements achieved using GAI techniques in augmenting limited EEG data, enhancing the spatiotemporal resolution of recorded EEG data, enhancing cross-subject performance of BCI systems and implementing end-to-end BCI applications. GAI could represent the means by which BCI would be transformed into a prevalent assistive technology, thereby improving the quality of life of people with disabilities, and helping in adopting BCI as an emerging human-computer interaction technology for general use.
... The brain's neurons generate electrical signals in the 0–100 Hz frequency range. These signals are categorized into different bands based on their frequency: delta (0–4 Hz), theta (4–7 Hz), alpha (8–12 Hz), beta (12–30 Hz), and gamma (30 Hz up to 100 Hz). Figure 2 displays multiple EEG signal bands [4]. ...
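As a concrete companion to the band definitions quoted above, the sketch below estimates power in each canonical band for a single EEG channel using Welch's method from SciPy; the exact band edges and the sampling rate are assumptions for illustration.

```python
# Band-power estimation for one EEG channel (illustrative sketch).
import numpy as np
from scipy.signal import welch

BANDS = {"delta": (0.5, 4), "theta": (4, 7), "alpha": (8, 12),
         "beta": (12, 30), "gamma": (30, 100)}

def band_powers(x, fs):
    """Integrate the Welch PSD of signal `x` (sampling rate `fs`) over each band."""
    freqs, psd = welch(x, fs=fs, nperseg=min(len(x), int(2 * fs)))
    df = freqs[1] - freqs[0]
    return {name: psd[(freqs >= lo) & (freqs < hi)].sum() * df
            for name, (lo, hi) in BANDS.items()}

# Example: band_powers(eeg_channel, fs=256) -> {'delta': ..., 'theta': ..., ...}
```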
... Since speech is the fundamental mode of communication, imagined speech is a natural stimulus for a BCI system. This study [12] presents a comprehensive framework that brings together the relevant research on decoding imagined speech from EEG conducted over the last decade. Every essential element involved in the development of such a system is thoroughly analyzed, including the choice of words to be imagined, the number of recording electrodes, temporal and spatial filtering, feature extraction, and the classifier. ...
Preprint
Full-text available
Artificial Intelligence (AI) and Machine Learning have brought significant attention to the human brain, making it a prominent research area in engineering, technology, and other non-medical sciences. Electroencephalogram (EEG) signals are one of many biological signals produced by the human brain. EEG signals contain electrical properties and have frequencies ranging between 0 and 100 Hz. Features are the various attributes of the recorded signals that are associated with the state of the human brain. The data comprise values that correspond to the frequencies of EEG signals, specifically delta, theta, alpha, beta, and gamma. Additionally, they include information about the level of attention, the level of meditation, and the frequency of eye blinking of the subject. This research gives a notion of how an imagined digit is classified from an EEG signal using machine learning algorithms. We have performed the analysis using models such as k-Nearest Neighbor (kNN), Convolutional Neural Network (CNN) and Genetic Programming (GP). An original EEG dataset is created for the digits 0–9 using a non-invasive single-electrode (single-channel) EEG device. The obtained accuracy for kNN is 66.8%, for the Convolutional Neural Network it is 73.1%, and for GP it is 82%. If the accuracy of low-channel devices improves further, they may one day replace bulky higher-channel devices. As single-channel or low-channel EEG devices are portable and easy to use, this work may in future find a variety of applications in biomedical engineering, smart healthcare, personal assistance and automation.
... Among the several techniques that can be used to record brain signals, electroencephalogram (EEG) is the preferred choice for most of the research works that have explored inner speech recognition (Panachakel and Ramakrishnan, 2021). ...
Article
Full-text available
Objective. In recent years, EEG-based Brain-Computer Interfaces (BCIs) applied to inner speech classification have gathered attention for their potential to provide a communication channel for individuals with speech disabilities. However, existing methodologies for this task fall short in achieving acceptable accuracy for real-life implementation. This paper concentrated on exploring the possibility of using inter-trial coherence (ITC) as a feature extraction technique to enhance inner speech classification accuracy in EEG-based BCIs. Approach. To address the objective, this work presents a novel methodology that employs ITC for feature extraction within a complex Morlet time-frequency representation. The study involves a dataset comprising EEG recordings of four different words for ten subjects, with three recording sessions per subject. The extracted features are then classified using k-Nearest-Neighbors (kNN) and Support Vector Machine (SVM). Main results. The average classification accuracy achieved using the proposed methodology is 56.08% for kNN and 59.55% for SVM. These results demonstrate comparable or superior performance in comparison to previous works. The exploration of inter-trial phase coherence as a feature extraction technique proves promising for enhancing accuracy in inner speech classification within EEG-based BCIs. Significance. This study contributes to the advancement of EEG-based BCIs for inner speech classification by introducing a feature extraction methodology using ITC. The obtained results, on par or superior to previous works, highlight the potential significance of this approach in improving the accuracy of BCI systems. The exploration of this technique lays the groundwork for further research toward inner speech decoding.
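A hedged sketch of the kind of inter-trial coherence (ITC) feature extraction this abstract describes is given below, using MNE's complex Morlet time-frequency decomposition. The frequency grid, cycle heuristic, and flattening of the ITC map into a feature vector are assumptions; the cited study's exact trial grouping and its kNN/SVM pipeline are not reproduced here.

```python
# ITC features from epoched EEG via a complex Morlet decomposition (illustrative).
import numpy as np
import mne

def itc_features(epochs, fmin=4.0, fmax=40.0, n_freqs=20):
    """Return the inter-trial coherence map of an mne.Epochs object as a flat vector."""
    freqs = np.linspace(fmin, fmax, n_freqs)
    n_cycles = freqs / 2.0                    # common heuristic: cycles grow with frequency
    power, itc = mne.time_frequency.tfr_morlet(
        epochs, freqs=freqs, n_cycles=n_cycles, return_itc=True, average=True)
    # itc.data has shape (n_channels, n_freqs, n_times); flatten into one feature vector.
    return itc.data.reshape(-1)
```

Because ITC is defined across trials, each call yields a single map for a set of epochs; in practice such maps would be computed per condition or per trial subset and then passed to a classifier such as scikit-learn's KNeighborsClassifier or SVC.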
... In EEG signals, different frequency components play different roles in decoding motor imagery. Generally, the µ band (8–13 Hz) and the β band (13–30 Hz) are related to motor execution and imagery. The activities in these frequency bands often exhibit event-related synchronization and desynchronization during motor imagery tasks [11] and present specific spatial patterns in the EEG, mainly concentrated in the sensorimotor cortex area [12]. ...
... Such models are more reliable in practical applications. However, the collection of EEG signals [13][14][15][16] is influenced by many factors, leading to the limited availability of training data. In addition, EEG signals themselves exhibit non-stationarity and high individual variability [17,18], which limits the applicability of traditional data augmentation methods [19][20][21][22][23][24][25] such as interpolation in the brain-computer interface field. ...
Article
Full-text available
Motor imagery electroencephalography (EEG) signals have garnered attention in brain–computer interface (BCI) research due to their potential in promoting motor rehabilitation and control. However, the limited availability of labeled data poses challenges for training robust classifiers. In this study, we propose a novel data augmentation method utilizing an improved Deep Convolutional Generative Adversarial Network with Gradient Penalty (DCGAN-GP) to address this issue. We transformed raw EEG signals into two-dimensional time–frequency maps and employed a DCGAN-GP network to generate synthetic time–frequency representations resembling real data. Validation experiments were conducted on the BCI IV 2b dataset, comparing the performance of classifiers trained with augmented and unaugmented data. Results demonstrated that classifiers trained with synthetic data exhibit enhanced robustness across multiple subjects and achieve higher classification accuracy. Our findings highlight the effectiveness of utilizing DCGAN-GP-generated synthetic EEG data to improve classifier performance in distinguishing different motor imagery tasks. Thus, the proposed data augmentation method based on a DCGAN-GP offers a promising avenue for enhancing BCI system performance, overcoming data scarcity challenges, and bolstering classifier robustness, thereby providing substantial support for the broader adoption of BCI technology in real-world applications.
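For readers unfamiliar with the "GP" in DCGAN-GP, the PyTorch snippet below sketches the standard gradient-penalty term (as introduced for WGAN-GP) that such a model adds to the discriminator loss; the discriminator `D`, the real and generated time-frequency batches, and the penalty weight are assumptions for illustration.

```python
# Gradient penalty on random interpolations between real and generated samples.
import torch

def gradient_penalty(D, real, fake, lambda_gp=10.0):
    """Penalize deviations of the discriminator's gradient norm from 1."""
    batch = real.size(0)
    eps = torch.rand(batch, 1, 1, 1, device=real.device)   # per-sample mixing weights
    mixed = eps * real + (1.0 - eps) * fake.detach()        # points between real and fake maps
    mixed.requires_grad_(True)
    scores = D(mixed)
    grads = torch.autograd.grad(outputs=scores, inputs=mixed,
                                grad_outputs=torch.ones_like(scores),
                                create_graph=True)[0]
    grad_norm = grads.view(batch, -1).norm(2, dim=1)
    return lambda_gp * ((grad_norm - 1.0) ** 2).mean()
```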
... Understanding the current density and potential development in specific brain areas is crucial for analyzing various activities related to synapses, membranes, fluids, and tissues. In particular, the STG, also known as Wernicke's area, plays a vital role in speech comprehension [5]. Figure 3 highlights the STG area, which is responsible for speech comprehension. ...
... Therefore, it becomes imperative to accurately map the precise location of the brain activity [9,10]. The exact localization of the primary and secondary speech areas is still under investigation [5]. The design should be user-friendly, allowing precise electrode placement on the scalp with a minimal number of electrodes. ...
Article
Full-text available
Individuals facing verbal communication impairments resulting from brain disorders like paralysis or autism encounter significant challenges when unable to articulate speech. This research proposes the design and development of a wearable system capable of decoding imagined speech using electroencephalogram (EEG) signals obtained during the mental process of speech generation. The system’s main objective is to offer an alternative communication method for individuals who can hear and think but face challenges in articulating their thoughts verbally. The design suggested includes user-friendliness, wearability, and comfort for seamless integration into daily life. A minimal number of electrodes are strategically placed on the scalp to minimize invasiveness. Achieving precise localization of the cortical areas responsible for generating the EEG patterns during imagined speech is vital for accurate decoding. Literature studies are utilized to determine the cortical positions associated with speech processing. Due to the inherent limitations in EEG spatial resolution, meticulous experiments are conducted to map the scalp positions onto their corresponding cortical counterparts. Specifically, we focus on identifying the scalp location over the superior temporal gyrus (T3) using the internationally recognized 10-20 electrode placement system by employing a circular periphery movement with a 2 cm distance increment. Our research involves nine subjects spanning various age groups, with the youngest being 23 and the oldest 65. Each participant undergoes ten iterations, during which they imagine six Marathi syllables. Our work contributes to the development of wearable assistive technology, enabling mute individuals to communicate effectively by translating their imagined speech into actionable commands. This innovation ultimately enhances their social participation and overall well-being.
... We use the EEG dataset prepared by Nieto et al. (Nieto, Peterson, Rufiner, Kamienkowski and Spies, 2022) to evaluate the proposed framework for low-resource multi-class inner speech classification. Inner speech is defined as the internalized process in which an individual thinks in pure meanings, generally associated with auditory imagery (Panachakel and Ramakrishnan, 2021; Nieto et al., 2022). Inner speech plays an important role in building BCIs with an intuitive and fluid communication-control paradigm for users (Martin, Iturrate, Millán, Knight and Pasley, 2018). ...
... These systems use machine learning techniques to convert the EEG signals captured during vowel, word or digit imagery [7] into the corresponding text [11]. The identification of imagined vowels or alphabets using multichannel EEG (MCEEG) signals has been a subject of interest for several research approaches in the last decade [12], [13], [14] (Table 11). We propose a new dataset collected using a NeuroSky Mindwave Mobile2 single-electrode (single-channel) device. In this work, the vowels /a/, /e/, /i/, /o/, /u/ are imagined rather than pronounced, and the corresponding EEG signals are collected to form the dataset. ...
Preprint
Full-text available
Electroencephalogram (EEG) signals are produced by neurons of the human brain and contain frequency and electrical properties. It is easy for a Brain-Computer Interface (BCI) system to record EEG signals using non-invasive methods. Speech imagery (SI) can be used to convert imagined speech into text; research done so far on SI has made use of multichannel devices. In this work, we propose an EEG signal dataset for the imagined vowels a/e/i/o/u collected from 5 participants using a NeuroSky Mindwave Mobile2 single-channel device. Decision Tree (DT), Random Forest (RF) and Genetic Algorithm (GA) Machine Learning (ML) classifiers are trained with the proposed dataset. For the proposed dataset, the average classification accuracy of DT is found to be lower than that of RF and GA. GA shows better performance for the vowels e/o/u, with accuracies of 80.8%, 82.36%, 81.8% for the 70–30 data partition, 80.2%, 81.9%, 80.6% for the 60–40 partition, and 79.8%, 81.12%, 78.36% for the 50–50 partition. RF shows improved classification accuracy for a/i, which is 83.44%, 81.6% for the 70–30 partition, 82.2%, 81.2% for the 60–40 partition and 81.4%, 80.2% for the 50–50 partition. Some other performance parameters such as minimum and maximum accuracy, standard deviation, sensitivity, specificity, precision, F1 score, false positive rate and receiver operating characteristics are also evaluated and analysed. Research has shown that brain function remains normal in patients with vocal disorders. Completely disabled patients could be equipped with such technology, as it may be one of the best ways for them to access essential day-to-day requirements.
... Despite these challenges, previous studies have delved into the complexities of covert speech decoding, using both intracranial recordings [14][15][16] and non-invasive techniques such as EEG [17][18][19][20][21], employing the imagery of different types of speech units such as vowels [22,23], syllables [24,25], words [16,[26][27][28], and sentences [29]. Only two previous BCI studies based on surface EEG recordings have so far attempted to decode imagined speech in real time, with limited effectiveness [18,30]. ...
... Given the success of adaptive classifiers in motor imagery BCIs, similar benefits can be expected in speech imagery BCIs [17], where decoding is notably difficult. ...
Article
Full-text available
Brain-Computer Interfaces (BCIs) aim to establish a pathway between the brain and an external device without the involvement of the motor system, relying exclusively on neural signals. Such systems have the potential to provide a means of communication for patients who have lost the ability to speak due to a neurological disorder. Traditional methodologies for decoding imagined speech directly from brain signals often deploy static classifiers, that is, decoders that are computed once at the beginning of the experiment and remain unchanged throughout the BCI use. However, this approach might be inadequate to effectively handle the non-stationary nature of electroencephalography (EEG) signals and the learning that accompanies BCI use, as parameters are expected to change, and all the more in a real-time setting. To address this limitation, we developed an adaptive classifier that updates its parameters based on the incoming data in real time. We first identified optimal parameters (the update coefficient, UC) to be used in an adaptive Linear Discriminant Analysis (LDA) classifier, using a previously recorded EEG dataset, acquired while healthy participants controlled a binary BCI based on imagined syllable decoding. We subsequently tested the effectiveness of this optimization in a real-time BCI control setting. Twenty healthy participants performed two BCI control sessions based on the imagery of two syllables, using a static LDA and an adaptive LDA classifier, in randomized order. As hypothesized, the adaptive classifier led to better performances than the static one in this real-time BCI control task. Furthermore, the optimal parameters for the adaptive classifier were closely aligned in both datasets, acquired using the same syllable imagery task. These findings highlight the effectiveness and reliability of adaptive LDA classifiers for real-time imagined speech decoding. Such an improvement can shorten the training time and favor the development of multi-class BCIs, representing a clear interest for non-invasive systems notably characterized by low decoding accuracies.
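To illustrate the general idea of an adaptive LDA driven by an update coefficient (UC), here is a minimal NumPy sketch in which the class means and a pooled covariance estimate are updated after each labelled trial; the names, initialization, and exact update rule are illustrative assumptions and not the cited study's implementation.

```python
# Two-class adaptive LDA with exponentially weighted running estimates (illustrative).
import numpy as np

class AdaptiveLDA:
    def __init__(self, n_features, uc=0.05):
        self.uc = uc                              # update coefficient (UC)
        self.means = np.zeros((2, n_features))    # per-class feature means
        self.cov = np.eye(n_features)             # pooled covariance estimate

    def predict(self, x):
        w = np.linalg.solve(self.cov, self.means[1] - self.means[0])
        b = -0.5 * w @ (self.means[0] + self.means[1])
        return int(w @ x + b > 0)

    def update(self, x, label):
        """Blend the new trial into the running estimates, weighted by UC."""
        self.means[label] = (1 - self.uc) * self.means[label] + self.uc * x
        d = x - self.means[label]
        self.cov = (1 - self.uc) * self.cov + self.uc * np.outer(d, d)
```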