Figure 2
Mind map illustrating basic principles of individual acoustic speech features. MFCC = Mel-frequency cepstral coefficients. 


Source publication
Article
For generations, the evaluation of speech abnormalities in neurodegenerative disorders such as Parkinson’s disease (PD) has been limited to perceptual tests or user-controlled laboratory analysis based upon rather small samples of human vocalizations. Our study introduces a fully automated method that yields significant features related to respirat...

Contexts in source publication

Context 1
... precise and robust classification is ensured if the individual speech classes are estimated sequentially with respect to the corresponding traits in the most differentiated parameters. Sequential separation was executed via unique recognition steps, where each recognition step separated the previous distributions into two fractions (Supplementary Fig. S2). ...
Context 2
... separation. The principle of sequential separation consisted of the step-by-step recognition of the most differentiated components of speech (Supplementary Fig. S2). Speech was separated in the following order: voiced speech (Supplementary Fig. S2A), unvoiced speech (Supplementary Fig. S2B), and respiration (Supplementary Fig. S2C). ...
Context 3
... The principle of sequential separation consisted of the step-by-step recognition of the most differentiated components of speech (Supplementary Fig. S2). Speech was separated in the following order: voiced speech (Supplementary Fig. S2A), unvoiced speech (Supplementary Fig. S2B), and respiration (Supplementary Fig. S2C). The recognition step was executed inside a sliding recognition window. ...
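A minimal sketch of this sequential-separation idea, assuming simple short-time energy and zero-crossing-rate cues inside a sliding window; the detectors and thresholds are illustrative placeholders, not the published algorithm:

```python
import numpy as np

def frame_energy(x, win, hop):
    """Short-time energy per sliding window."""
    n = 1 + (len(x) - win) // hop
    return np.array([np.sum(x[i*hop:i*hop+win]**2) for i in range(n)])

def zero_crossing_rate(x, win, hop):
    """Zero-crossing rate per sliding window."""
    n = 1 + (len(x) - win) // hop
    return np.array([np.mean(np.abs(np.diff(np.sign(x[i*hop:i*hop+win]))) > 0)
                     for i in range(n)])

def sequential_separation(x, fs):
    """Label frames step by step: voiced, then unvoiced, then respiration."""
    win, hop = int(0.03 * fs), int(0.01 * fs)   # 30 ms window, 10 ms hop
    e = frame_energy(x, win, hop)
    z = zero_crossing_rate(x, win, hop)
    labels = np.full(len(e), "pause", dtype=object)
    # Step 1: voiced speech -- high energy, low zero-crossing rate.
    voiced = (e > np.percentile(e, 60)) & (z < np.median(z))
    labels[voiced] = "voiced"
    # Step 2: among the rest, unvoiced speech -- noisy frames with high ZCR.
    rest = ~voiced
    unvoiced = rest & (z > np.percentile(z[rest], 70))
    labels[unvoiced] = "unvoiced"
    # Step 3: among what remains, respiration -- weak but non-silent frames.
    rem = rest & ~unvoiced
    labels[rem & (e > np.percentile(e[rem], 80))] = "respiration"
    return labels
```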
Context 6
... Figure S1: Standardized, phonetically balanced Czech text of 80 words. Figure S2: Flow chart of the automated algorithm describing the full process of segmenting the speech signal into its basic physiological sources, including voiced speech, unvoiced speech, pause, and respiration. The signal was decimated and high-pass-filtered in a preprocessing step. ...
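As a rough illustration of this preprocessing step, the following sketch decimates and high-pass-filters a signal with SciPy; the target sampling rate and cut-off frequency are assumptions, not values from the paper:

```python
from scipy.signal import decimate, butter, sosfiltfilt

def preprocess(x, fs, target_fs=16000, hp_cutoff=60.0):
    """Decimate to a lower rate, then high-pass filter."""
    # Decimation (anti-aliasing filtering is built into scipy's decimate).
    factor = int(round(fs / target_fs))
    if factor > 1:
        x = decimate(x, factor)
        fs = fs / factor
    # High-pass filter to suppress low-frequency rumble and DC offset.
    sos = butter(4, hp_cutoff, btype="highpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x), fs
```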

Citations

... Hence, quantitative acoustic evaluation in Mandarin also holds significant importance for DBS candidates. Notably, the foundation of this study lies in the methodologies developed by Jan Rusz's lab [14,36,37] and the MSP Program. Although the proposed speech analyses were newly adapted to the Mandarin tonal language and thus were performed under supervision to achieve full control over the quality of the processing, we hypothesize that the final solution can be computerized, since the Dysarthria Analyzer [14] as well as the MSP Program are already fully automated. ...
Article
Approximately 90% of patients with Parkinson's disease (PD) suffer from dysarthria. However, there is currently a lack of research on acoustic measurements and speech impairment patterns among Mandarin-speaking individuals with PD. This study aims to assess the feasibility of diagnosis and disease monitoring in Mandarin-speaking PD patients through the speech paradigm recommended for non-tonal languages, and to explore the anatomical and functional substrates. We examined a total of 160 native Mandarin-speaking Chinese participants, consisting of 80 PD patients, 40 healthy controls (HC), and 40 MRI controls. We screened for the optimal acoustic metric combination for PD diagnosis. Finally, we used the objective metrics to predict the patients' motor status using a Naïve Bayes model and analyzed the correlations between cortical thickness, subcortical volumes, functional connectivity, and network properties. Comprehensive acoustic screening based on prosodic, articulation, and phonation abnormalities allows differentiation between HC and PD with an area under the curve of 0.931. Patients with slowed reading exhibited atrophy of the fusiform gyrus (FDR p = 0.010, R = 0.391), reduced functional connectivity between the fusiform gyrus and motor cortex, and increased nodal local efficiency (NLE) and nodal efficiency (NE) in the bilateral pallidum. Patients with prolonged pauses demonstrated atrophy in the left hippocampus, along with decreased NLE and NE. The acoustic assessment in Mandarin proves effective for diagnosis and disease monitoring in Mandarin-speaking PD patients, generalizing standardized acoustic guidelines beyond non-tonal languages. The speech impairment in Mandarin-speaking PD patients involves not only motor aspects of speech but also the cognitive processes underlying language generation.
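As a hedged illustration of the modeling approach named in this abstract (a Naïve Bayes classifier evaluated by area under the curve), the following scikit-learn sketch uses random placeholder data; the feature matrix and labels are hypothetical:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import roc_auc_score

# X: rows = speakers, columns = acoustic metrics (prosody, articulation,
# phonation); y: 1 = PD, 0 = healthy control. Placeholder random data.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 8))
y = rng.integers(0, 2, size=120)

# Out-of-fold predicted probabilities give an honest AUC estimate.
proba = cross_val_predict(GaussianNB(), X, y, cv=5, method="predict_proba")[:, 1]
print(f"cross-validated AUC: {roc_auc_score(y, proba):.3f}")
```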
... The key features of speech in PwPD are monopitch, reduced stress, monoloudness, imprecise consonants, inappropriate silences, short rushes of speech, harsh and breathy voice, low pitch, and variable rate (Darley et al., 1969a, 1969b). An additional and potential marker of dysarthria in PwPD is reduced loudness or hypophonia (Becker et al., 2002; Canter, 1963; Fox & Ramig, 1997; Hlavnička et al., 2017; Ho et al., 1999; Liotti et al., 2003; Rusz et al., 2013). ...
Article
Purpose Cross-language studies suggest more similarities than differences in how dysarthria affects the speech of people with Parkinson's disease (PwPD) who speak different languages. In this study, we aimed to identify the relative contribution of acoustic variables to distinguish PwPD from controls who spoke varieties of two Romance languages, French and Portuguese. Method This bi-national, cross-sectional, and case-controlled study included 129 PwPD and 124 healthy controls who spoke French or Portuguese. All participants underwent the same clinical examinations, voice/speech recordings, and self-assessment questionnaires. PwPD were evaluated off and on optimal medication. Inferential analyses included Disease (controls vs. PwPD) and Language (French vs. Portuguese) as factors, and random decision forest algorithms identified relevant acoustic variables able to distinguish participants: (a) by language (French vs. Portuguese) and (b) by clinical status (PwPD on and off medication vs. controls). Results French-speaking and Portuguese-speaking individuals were distinguished from each other with over 90% accuracy by five acoustic variables: the mean fundamental frequency and the shimmer of the sustained vowel /a/ production, the oral diadochokinesis performance index, and the relative sound pressure level and its standard deviation from the text reading. A distinct set of parameters discriminated between controls and PwPD: for men, maximum phonation time and the oral diadochokinesis speech proportion were the most significant variables; for women, variables calculated from the oral diadochokinesis task were the most discriminative. Conclusions Acoustic variables related to phonation and voice quality distinguished between speakers of the two languages. Variables related to pneumophonic coordination and articulation rate were the most effective in distinguishing PwPD from controls. Thus, our research findings support that respiration and diadochokinesis tasks appear to be the most appropriate to pinpoint signs of dysarthria, which are largely homogeneous and language-universal. In contrast, identifying language-specific variables with the speech tasks and acoustic variables studied was less conclusive.
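The random-decision-forest step described in this abstract might look like the following scikit-learn sketch, where the variable names echo the abstract but the data are random placeholders:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
features = ["mean_f0", "shimmer", "ddk_performance_index",
            "relative_spl", "relative_spl_sd"]
X = pd.DataFrame(rng.normal(size=(100, len(features))), columns=features)
y = rng.integers(0, 2, size=100)   # e.g., 0 = French, 1 = Portuguese

# Fit the forest and rank acoustic variables by their importance.
forest = RandomForestClassifier(n_estimators=500, random_state=1).fit(X, y)
for name, imp in sorted(zip(features, forest.feature_importances_),
                        key=lambda t: -t[1]):
    print(f"{name:24s} importance = {imp:.3f}")
```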
... Automatic systems have also been proposed for the detection of PD using articulatory and phonatory features extracted with signal processing techniques, reporting accuracies below 90% in the most methodologically robust cases [27], [28]. These systems aim for a binary categorisation of PD vs. controls, with no detailed analysis of the discrimination capabilities of the distinct Manner Classes (MC) of phonemes (i.e., categories with the same manner of articulation), a crucial aspect for future system developments, for adequately selecting the speech tasks to be employed, and for designing appropriate speech rehabilitation techniques. ...
Preprint
Parkinson's disease significantly impacts speech, particularly affecting phonemic groups like stop-plosives, fricatives, and affricates. However, its objective impact on the different phonemic groups has been only briefly addressed in the past. This study introduces a new model, called MARTA, built upon a Gaussian Mixture Variational AutoEncoder with metric learning to measure the disease's impact on the phonemic groups automatically and objectively. MARTA was trained on normophonic speech before being adapted to parkinsonian speech. The model effectively clusters phonemic groups without supervision and demonstrates enhanced discriminative power when supervised using forced-aligned labels. Our findings reveal that beyond the traditionally affected phonemes, Parkinson's disease not only affects stop-plosives, voiced-plosives, and nasals, but also significantly impacts liquids, vowels, and fricatives, with the model achieving a benchmark 91% ± 9% discrimination capability. An in-depth evaluation of the impact of the disease on the different phonemic groups represents an advance in the current knowledge of its effects on speech, and has clear implications for the speech therapy of people with Parkinson's disease. Moreover, regardless of the specific application domain presented, the model introduced has potential downstream utility in assessing the manner of articulation, whether influenced by other medical conditions or certain dialectal variations.
... The pause intervals were determined from the speech signal using an automatic segmentation tool for connected speech [21]. NLP techniques were utilized to conduct linguistic analysis on each word in the transcribed recordings and assign them their respective word types. To achieve this, spaCy [22] was used to analyze English, French, German, and Italian transcriptions, and MorphoDiTa [23] to analyze Czech transcriptions. ...
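A minimal sketch of the word-type assignment with spaCy (MorphoDiTa would play the analogous role for Czech); the pipeline name and sample sentence are illustrative assumptions:

```python
import spacy

# Load a language pipeline, e.g., the small English model for English
# transcriptions (assumed here; French, German, and Italian analogues exist).
nlp = spacy.load("en_core_web_sm")

doc = nlp("The patient told the story slowly but clearly.")
for token in doc:
    # Coarse part-of-speech tag and fine-grained tag per word.
    print(f"{token.text:10s} {token.pos_:6s} {token.tag_}")
```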
... NSR is defined as the number of syllables, extracted using hyphenation techniques, divided by the length of speech after removing all pauses longer than 30 ms [24]. Prolonged pauses were assessed using the duration of pause intervals (DPI), which captures inappropriate silence and is defined as the median length of pause intervals equal to or longer than 250 ms [21]. A threshold of 200 to 250 ms has been widely adopted in the literature to determine pauses associated with cognitive decline [30]. No correlation was found between the acoustic features (Pearson: r = −0.09). ...
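The two metrics defined above can be written down directly; in this sketch, the pause intervals and the syllable count are assumed to come from upstream tools (the segmentation tool and hyphenation, respectively):

```python
import numpy as np

def nsr(n_syllables, total_duration_s, pause_intervals_s):
    """Net speech rate: syllables per second after removing pauses > 30 ms."""
    pause_time = sum(p for p in pause_intervals_s if p > 0.030)
    return n_syllables / (total_duration_s - pause_time)

def dpi(pause_intervals_s):
    """Duration of pause intervals: median length of pauses >= 250 ms."""
    long_pauses = [p for p in pause_intervals_s if p >= 0.250]
    return float(np.median(long_pauses)) if long_pauses else 0.0

# Illustrative values: pause lengths in seconds from a read passage.
pauses = [0.02, 0.31, 0.45, 0.12, 0.27]
print(nsr(n_syllables=95, total_duration_s=40.0, pause_intervals_s=pauses))
print(dpi(pauses))
```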
Article
Objective This study assessed the relationship between speech and language impairment and outcome in a multicenter cohort of isolated/idiopathic rapid eye movement sleep behavior disorder (iRBD). Methods Patients with iRBD from 7 centers speaking Czech, English, German, French, and Italian underwent a detailed speech assessment at baseline. Story-tale narratives were transcribed and linguistically annotated using fully automated methods based on automatic speech recognition and natural language processing algorithms, leading to 3 distinctive linguistic and 2 acoustic patterns of language deterioration and associated composite indexes of their overall severity. Patients were then prospectively followed and assessed for parkinsonism or dementia during follow-up. Cox proportional hazards analysis was performed to evaluate the predictive value of language patterns for phenoconversion over a follow-up period of 5 years. Results Of 180 patients free of parkinsonism or dementia, 156 provided follow-up information. After a mean follow-up of 2.7 years, 42 (26.9%) patients developed a neurodegenerative disease. Patients with higher severity of linguistic abnormalities (hazard ratio [HR] = 2.35) and acoustic abnormalities (HR = 1.92) were more likely to develop a defined neurodegenerative disease, with converters having lower content richness (HR = 1.74), slower articulation rate (HR = 1.58), and prolonged pauses (HR = 1.46). Dementia-first (n = 16) and parkinsonism-first with mild cognitive impairment (n = 9) converters had higher severity of linguistic abnormalities than parkinsonism-first converters with normal cognition (n = 17). Interpretation Automated language analysis might provide a predictor of phenoconversion from iRBD into synucleinopathy subtypes with cognitive impairment, and thus can be used to stratify patients for neuroprotective trials.
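A minimal sketch of the survival analysis named in this abstract, using the lifelines implementation of Cox proportional hazards on hypothetical follow-up data; column names and values are placeholders:

```python
import pandas as pd
from lifelines import CoxPHFitter

# Hypothetical cohort: time to phenoconversion (or censoring), event flag,
# and two composite severity indexes, echoing the abstract's terminology.
df = pd.DataFrame({
    "followup_years":   [2.1, 4.8, 1.3, 5.0, 3.2, 2.7, 4.1, 1.9],
    "converted":        [1,   0,   1,   0,   1,   1,   0,   0],
    "linguistic_index": [1.8, 0.4, 2.2, 0.9, 1.1, 0.7, 1.5, 0.3],
    "acoustic_index":   [1.2, 0.6, 1.9, 0.3, 0.8, 1.5, 0.4, 1.0],
})

cph = CoxPHFitter()
cph.fit(df, duration_col="followup_years", event_col="converted")
cph.print_summary()   # hazard ratios per severity index
```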
... Hence, relatively few research studies, such as [28], aim to automate PD classification/monitoring using connected speech. Rusz et al. showed significant progress in utilizing connected speech for automatic PD classification [42,43]. They used multiple types of speech tasks and observed that features extracted from monologue (connected speech) were sufficiently sensitive, whereas tasks like sustained phonation and DDK were not optimal for capturing impairments due to prodromal PD. ...
Article
Parkinson's disease (PD) classification through speech has been an advancing field of research because of the ease of speech acquisition and processing. The minimal infrastructure requirements of such systems have also made them suitable for telemonitoring applications. Researchers have studied the effects of PD on speech from various perspectives using different speech tasks. Typical speech deficits due to PD include voice monotony (e.g., monopitch), breathy or rough voice quality, and articulatory errors. In connected speech, these symptoms are more pronounced, which is also the basis for speech assessment in popular rating scales used for PD, like the Unified Parkinson's Disease Rating Scale (UPDRS) and Hoehn and Yahr (HY). The current study introduces an innovative framework that integrates pitch-synchronous segmentation and an optimized set of features to investigate and analyze continuous speech from both PD patients and healthy controls (HC). Comparison of the proposed framework against existing methods has shown its superiority in classification performance and mitigation of overfitting in machine learning models. A set of optimal classifiers with unbiased decision-making was identified after comparing several machine learning models. The outcomes yielded by the classifiers demonstrate that the framework effectively learns the intrinsic characteristics of PD from connected speech, which can potentially offer valuable assistance in clinical diagnosis.
... In a second step, a qualitative and quantitative error analysis could be performed independently of the listener, complementing the work of clinical specialists. Furthermore, subtle deviations in voice stability, prosody, and breathing patterns could be captured, which constitute sensitive diagnostic markers even in early stages of disease [15]. ...
... The application of AI-based speech analysis in neurology offers numerous advantages, including potentially earlier and more precise diagnosis of diseases for which no suitable biomarkers exist. It is capable of capturing subtle changes in speech that point to neurological disorders, enabling potential early detection [3,5,15,17,29]. ...
... Growing evidence suggests that changes in speech may be among the first symptoms in the early stage of Parkinson's disease. Research data have shown that patients with REM sleep behavior disorder (RBD), a condition with a high conversion rate of up to 80% to idiopathic Parkinson's disease [27], already exhibit signs of speech changes [15]. Affected individuals may speak more quietly, more monotonously, and less expressively. ...
Article
SUMMARY Subject and aim The article addresses the possible applications of AI-supported speech analysis in neurodegenerative diseases. The aim is to provide an overview of the speech abnormalities in various diseases and to show how AI-based methods can be used for diagnosis and treatment. Material and methods Neurodegenerative diseases and their specific speech and language disorders are presented. Traditional methods of speech analysis for neurological diseases are explained, and possibilities of AI-supported analysis are discussed. Results AI-based speech analysis represents a promising approach to the early detection and diagnosis of neurological diseases. Through automatic transcripts and error analyses, subtle changes in speech and language can be detected and objectified. AI-based speech analysis enables an accurate and quantifiable assessment of speech deficits and can provide clinical specialists with additional information. Conclusion AI-based speech analysis offers new possibilities for the early detection and monitoring of neurological diseases. It can detect subtle speech changes early and enable timely intervention. Nevertheless, it should be regarded as a supporting tool and not as a replacement for the expertise of clinical specialists. AI-based speech analysis can save resources, improve diagnostic accuracy, and enable continuous monitoring of disease progression. Clinical relevance AI-based speech analysis can help detect neurodegenerative diseases early and initiate targeted treatment. It offers an objectifiable method for assessing speech deficits and can support diagnosis.
... To process the speech signal, pause intervals (>30 ms) were identified and eliminated using an automatic segmentation tool designed for connected speech [16]. Only the segments containing speech were retained for further analysis, including the estimation of the LTAS and its derived moments. ...
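A hedged sketch of the LTAS-and-moments computation, assuming pause-free speech segments are already available from the segmentation tool; Welch averaging and these moment definitions are common choices, not necessarily those of the cited study:

```python
import numpy as np
from scipy.signal import welch

def ltas_moments(speech_segments, fs):
    """Long-term average spectrum of pause-free speech and its moments."""
    x = np.concatenate(speech_segments)          # pauses already removed
    f, pxx = welch(x, fs=fs, nperseg=4096)       # LTAS estimate
    p = pxx / np.sum(pxx)                        # normalize to a distribution
    centroid = np.sum(f * p)                     # 1st moment: mean frequency
    spread = np.sqrt(np.sum(((f - centroid) ** 2) * p))   # 2nd: spectral SD
    skew = np.sum(((f - centroid) / spread) ** 3 * p)     # 3rd: skewness
    kurt = np.sum(((f - centroid) / spread) ** 4 * p)     # 4th: kurtosis
    return f, pxx, (centroid, spread, skew, kurt)
```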
... To avoid the results being influenced by unequally long pauses in the utterances, a voice activity detector was used to eliminate the parts of the acoustic signals in which no speech was present. In addition, unvoiced parts of the speech were also removed using an automatic segmentation tool for connected speech [15], as they in most cases negatively affect the computation of the voice quality features. Three acoustic features describing voice quality were extracted from the recordings. ...
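As a stand-in for the cited segmentation tool, the following sketch keeps only voiced samples using librosa's pYIN voicing decision; the file name and pitch range are assumptions:

```python
import librosa
import numpy as np

y, sr = librosa.load("utterance.wav", sr=None)   # placeholder file name
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr)

# Expand per-frame voicing flags to a per-sample mask (pyin's default hop).
hop = 512
mask = np.repeat(voiced_flag, hop)[: len(y)]
voiced_only = y[mask.astype(bool)]   # signal with pauses/unvoiced parts removed
```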
... Analysis of the acquired speech data and potential pathology is primarily interpreted using physiological speech patterns describing vocal tract abilities, such as articulation, pitch variability, loudness, rhythm, and phonation [8]. However, speech can also be parameterized by feature sets with low interpretability but high performance, such as Mel-frequency cepstral coefficients (MFCCs) and their derivatives [9,10,11], Relative Spectral Transform - Perceptual Linear Prediction (RASTA-PLP) parameters [12], and deep neural network embeddings [13]. ...
... Nonetheless, such an assumption has never been validated, while MFCCs can be easily influenced by other factors such as age, gender, speaking style, or recording procedure/microphone quality [19]. Most recently, in Roche's PD Mobile application designed for clinical trial measures in PD [5,20], the speech performance of the patients was analyzed on a sustained phonation task using only the second coefficient, MFCC2, representing a low-to-high frequency energy ratio [8]. ...
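A minimal sketch of the MFCC parametrization discussed above, computed with librosa under its default settings (an assumption, not the cited framework), including the MFCC2 coefficient interpreted as a low-to-high frequency energy ratio:

```python
import librosa
import numpy as np

y, sr = librosa.load("sustained_phonation.wav", sr=None)  # placeholder file
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # 13 coefficients x frames
delta = librosa.feature.delta(mfcc)                 # first derivatives
delta2 = librosa.feature.delta(mfcc, order=2)       # second derivatives

# Which row is "MFCC2" depends on whether the 0th (energy-like) coefficient
# is counted; here we assume MFCC1 = row 1, so MFCC2 = row 2.
mfcc2_mean = float(np.mean(mfcc[2]))
print(f"mean MFCC2 over the phonation: {mfcc2_mean:.2f}")
```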
... To link the MFCCs to key dysarthria elements of PD, five acoustic variables with well-known pathophysiological interpretation were extracted from the speech waveform using the framework developed in [8]. ...
... The detection of PD (i.e., healthy vs. parkinsonian) from speech has been investigated in many studies [5][6][7][8][9][10][11][12][13]. More details about the various types of features and approaches used in the literature can be found in [14][15][16][17][18]. The present study focuses on speech-based severity level classification of PD (i.e., healthy vs. mild vs. severe). ...