Article

The Speech Chain: The Physics and Biology of Spoken Language

Authors: Peter B. Denes and Elliot N. Pinson

... The human speech chain [33] is an essential mechanism for communication. We communicate by expressing our thoughts and listening to others. ...
... Human speech chain [33] and the corresponding machine speech chain [5]. Source: Adapted from [33]. ...
Article
The phenomenon where a speaker mixes two or more languages within the same conversation is called code-switching (CS). Handling CS is challenging for automatic speech recognition (ASR) and text-to-speech (TTS) because it requires coping with multilingual input. Although CS text or speech may be found in social media, the datasets of CS speech and corresponding CS transcriptions are hard to obtain even though they are required for supervised training. This work adopts a deep learning-based machine speech chain to train CS ASR and CS TTS with each other with semisupervised learning. After supervised learning with monolingual data, the machine speech chain is then carried out with unsupervised learning of either the CS text or speech. The results show that the machine speech chain trains ASR and TTS together and improves performance without requiring the pair of CS speech and corresponding CS text. We also integrate language embedding and language identification into the CS machine speech chain in order to handle CS better by giving language information. We demonstrate that our proposed approach can improve the performance on both a single CS language pair and multiple CS language pairs, including the unknown CS excluded from training data.
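As a rough illustration of the training loop this abstract describes, the following Python fragment sketches one unsupervised speech-chain cycle with stand-in ASR/TTS callables. All names and the toy string-based "models" are hypothetical, not the authors' implementation.

# Minimal sketch of the machine speech chain cycle: ASR and TTS generate
# pseudo-pairs for each other from unpaired code-switching text or speech.
# Supervised pre-training on paired monolingual data is assumed to have
# already happened and is not shown.

from typing import Callable, List, Tuple

def speech_chain_step(asr: Callable[[str], str],
                      tts: Callable[[str], str],
                      unpaired_text: List[str],
                      unpaired_speech: List[str]) -> List[Tuple[str, str]]:
    """One unsupervised cycle: TTS->ASR for text-only data, ASR->TTS for speech-only data."""
    pseudo_pairs = []
    for text in unpaired_text:              # text-only CS data
        synth = tts(text)                   # synthesize speech from the text
        pseudo_pairs.append((synth, text))  # would drive an ASR update
    for speech in unpaired_speech:          # speech-only CS data
        hyp = asr(speech)                   # transcribe the speech
        pseudo_pairs.append((speech, hyp))  # would drive a TTS update
    return pseudo_pairs

# Toy stand-ins so the sketch runs end to end.
asr_stub = lambda speech: speech.replace("wav:", "txt:")
tts_stub = lambda text: text.replace("txt:", "wav:")

pairs = speech_chain_step(asr_stub, tts_stub,
                          unpaired_text=["txt:hello dunia"],
                          unpaired_speech=["wav:selamat morning"])
print(pairs)

In the actual system each pseudo-pair would feed a gradient update of the receiving model, which is how ASR and TTS improve each other without paired code-switching data.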
... show that the addition of visual cues from the speaker's face not only facilitates communication, but may influence the auditory perception of speech and, in some cases, may even contribute to language evolution and change. Denes and Pinson's (1993) Speech Chain depicting the progression of a speech message from the brain of the speaker to the brain of the listener through the sound waves generated by the speaker's vocal movements. ...
... List of items removed from the full version of the thesis for copyright reasons (illustrations, figures, images): Denes and Pinson's (1993) Speech Chain depicting the progression of a speech message from the brain of the speaker to the brain of the listener through the sound waves generated by the speaker's vocal movements. The geographical distribution of rhoticity based on data from the Survey of English Dialects from the 1950s (left) (Orton & Dieth, 1962) and the English Dialects App from 2016 (right) (from Leemann et al., 2018, p. 12). ...
... Pinson's (1993) Speech Chain of language processing. ... oriented sound change scenarios according to Ohala (1981), including hypocorrection and hypercorrection. ...
Thesis
Full-text available
Articulatory variation is well-documented in post-alveolar approximant realisations of /r/ in rhotic Englishes, which present a diverse array of tongue configurations. However, the production of /r/ remains enigmatic, especially concerning non-rhotic Englishes and the accompanying labial gesture, both of which tend to be overlooked in the literature. This thesis attempts to account for them both, in which we consider the production and perception of /r/ in the non-rhotic variety of English spoken in England, ‘Anglo-English’. This variety is of particular interest because non-lingual labiodental articulations of /r/ are rapidly gaining currency, which may be due to the visual prominence of the lips, although a detailed phonetic description of this change in progress has yet to be undertaken. Three production and perception experiments were conducted to investigate the role of the lips in Anglo-English /r/. The results indicate that the presence of labiodental /r/ has caused auditory ambiguity with /w/ in Anglo-English. In order to maintain a perceptual contrast between /r/ and /w/, it is argued that Anglo-English speakers use their lips to enhance the perceptual saliency of /r/ in both the auditory and visual domains. The results indicate that visual cues of the speaker's lips are more prominent than the auditory ones and that these visual cues dominate the perception of the contrast when the auditory and visual cues are mismatched. The results have theoretical implications for the nature of speech perception in general, as well as for the role of visual speech cues in diachronic sound change.
... The relation between production and acoustics has been explored extensively in previous research, while the relation between perception and acoustics is relatively less understood. As production and perception are processed by different physical systems of the human body [1], the knowledge involved in controlling acoustics through each of them should differ considerably. For example, in tone production, it was found that human articulators limit the maximum speed of pitch change [2]; in tone perception, the same tone contour can be perceived as a reversed target if the context is different [3]. ...
... In addition to the approximated tone contours for individual syllables, the inter-syllable transition was also approximated with a linear movement. In Test 3, the practicality of the approximations was evaluated with a few long ... (The Cantonese syllables are transcribed with Jyut Ping symbols.) ...
... The simple act of speaking requires a great deal of improvisation because the mind goes to its own thought and creates its impromptu delivery in words, sounds, and gestures, forming unpredictable statements that further fuel the thought process (the interpreter as well as the listener), creating an enriched process that is no different from instant composition with a given set or repertoire of elements (Denes & Pinson, 1966). When improvisation is intended to solve a problem temporarily, and the "proper" solution is not available at the time, it can be known as an "interim", a notion that also applies in the field of engineering (Ludovice, Lefton & Catrambone, 2010). ...
Preprint
Full-text available
ÁNIMA is a performative, transdisciplinary creation and expresses itself through contact improvisation, cooperating with musical improvisation. Established in contemporary dance, music, photography, and sculptures made of fabric and sponges, it is inspired by specific parts of the body and was raised during the period of the pandemic by six international women, artists from the fields of illustration, production, sculpture, music, and dance. Throughout the process of investigating, sharing, and deciding the concept of the performance, they got the opportunity of the first performance of ÁNIMA at Sporting Club in Russafa, Valencia. This is a powerful, pure, feminine, transparent, honest, and creative proposal. Because it has space to develop in every act, it comes with a different presentation each time; that's why it has renewable energy. This strength is based on the study of dance movements, gestures, voice effects, different colors of musical instruments, and dialogue with the sculptures. The performance is very grounded, but it also has the capacity for improvisation moments between performers. With this creation, which is based on various expressions of contemporary art, we want to show that full concentration is a great skill that we have to practice in different ways in our lives. To evolve the spirit, if we surround ourselves with merits in various fields, physically connect with nature, educate our brain, open our perspective to novelties, and act for good, all changes will flow easily. We can transform and be reborn at every moment. Without fear of the new, with full consciousness.
... Below in Figure 1 is a modified version of the production half of Denes and Pinson's (1973) speech chain. The figure portrays a closed loop between intention and the feedback that talkers hear of their own speech. ...
... The Speech Chain (Denes and Pinson, 1973). ...
Article
Full-text available
Sensory information, including auditory feedback, is used by talkers to maintain fluent speech articulation. Current models of speech motor control posit that speakers continually adjust their motor commands based on discrepancies between the sensory predictions made by a forward model and the sensory consequences of their speech movements. Here, in two within-subject design experiments, we used a real-time formant manipulation system to explore how reliant speech articulation is on the accuracy or predictability of auditory feedback information. This involved introducing random formant perturbations during vowel production that varied systematically in their spatial location in formant space (Experiment 1) and temporal consistency (Experiment 2). Our results indicate that, on average, speakers’ responses to auditory feedback manipulations varied based on the relevance and degree of the error that was introduced in the various feedback conditions. In Experiment 1, speakers’ average production was not reliably influenced by random perturbations that were introduced every utterance to the first (F1) and second (F2) formants in various locations of formant space that had an overall average of 0 Hz. However, when perturbations were applied that had a mean of +100 Hz in F1 and −125 Hz in F2, speakers demonstrated reliable compensatory responses that reflected the average magnitude of the applied perturbations. In Experiment 2, speakers did not significantly compensate for perturbations of varying magnitudes that were held constant for one and three trials at a time. Speakers’ average productions did, however, significantly deviate from a control condition when perturbations were held constant for six trials. Within the context of these conditions, our findings provide evidence that the control of speech movements is, at least in part, dependent upon the reliability and stability of the sensory information that it receives over time.
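A small numerical sketch of the two perturbation regimes described above: zero-mean random F1/F2 shifts versus shifts whose average is +100 Hz in F1 and -125 Hz in F2. The ±150 Hz spread and the trial count are invented for illustration, not taken from the study.

# Illustrative generation of per-trial formant perturbations (not the authors' code).
import numpy as np

rng = np.random.default_rng(0)
n_trials = 100

# Zero-mean condition: random F1/F2 offsets that average to roughly 0 Hz.
zero_mean = rng.uniform(low=[-150.0, -150.0], high=[150.0, 150.0], size=(n_trials, 2))

# Biased condition: same spread, but centred on (+100 Hz F1, -125 Hz F2).
biased = zero_mean + np.array([100.0, -125.0])

print("zero-mean condition average (F1, F2):", zero_mean.mean(axis=0).round(1))
print("biased condition average (F1, F2):  ", biased.mean(axis=0).round(1))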
... Humans maintain their speech quality in various situations by simultaneously listening to their speech, a mechanism that is also known as the speech chain [1]. The auditory feedback produced from the self-evaluation inside this ...
... Construction of incremental TTS (ITTS) in a machine speech chain framework, shown in Fig. 2(b). It incrementally synthesizes the speech by progressively taking ... (The initial part of this work was presented in [24]; the previous work only focused on non-incremental Lombard TTS in static noises.) ...
Article
Full-text available
Recent end-to-end text-to-speech synthesis (TTS) systems have successfully synthesized high-quality speech. However, TTS speech intelligibility degrades in noisy environments because most of these systems were not designed to handle noisy environments. Several works attempted to address this problem by using offline fine-tuning to adapt their TTS to noisy conditions. Unlike machines, humans never perform offline fine-tuning. Instead, they speak with the Lombard effect in noisy places, where they dynamically adjust their vocal effort to improve the audibility of their speech. This ability is supported by the speech chain mechanism, which involves auditory feedback passing from speech perception to speech production. This paper proposes an alternative approach to TTS in noisy environments that is closer to the human Lombard effect. Specifically, we implement Lombard TTS in a machine speech chain framework to synthesize speech with dynamic adaptation. Our TTS performs adaptation by generating speech utterances based on the auditory feedback that consists of the automatic speech recognition (ASR) loss as the speech intelligibility measure and the speech-to-noise ratio (SNR) prediction as power measurement. Two versions of TTS are investigated: non-incremental TTS with utterance-level feedback and incremental TTS (ITTS) with short-term feedback to reduce the delay without significant performance loss. Furthermore, we evaluate the TTS systems in both static and dynamic noise conditions. Our experimental results show that auditory feedback enhanced the TTS speech intelligibility in noise.
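The feedback loop described here can be caricatured in a few lines: a toy TTS raises its "vocal effort" until an ASR-loss proxy derived from an SNR estimate falls below a threshold. The stub functions, thresholds, and parameter names are illustrative assumptions, not the paper's architecture.

# Schematic sketch of Lombard-style adaptation driven by auditory feedback.
def synthesize(text: str, effort: float) -> dict:
    return {"text": text, "power": 1.0 + effort}           # stub TTS output

def asr_loss(utt: dict, noise_power: float) -> float:
    snr = utt["power"] / noise_power                        # crude SNR estimate
    return max(0.0, 1.0 - 0.5 * snr)                        # stub intelligibility loss

def lombard_tts(text: str, noise_power: float, steps: int = 20) -> float:
    effort = 0.0
    for _ in range(steps):
        utt = synthesize(text, effort)
        loss = asr_loss(utt, noise_power)                   # auditory feedback
        if loss < 0.1:                                      # intelligible enough: stop adapting
            break
        effort += 0.5                                       # otherwise speak with more effort
    return effort

print("effort in quiet:", lombard_tts("hello", noise_power=0.5))
print("effort in noise:", lombard_tts("hello", noise_power=4.0))

The incremental variant in the paper applies the same idea on short speech segments rather than whole utterances, trading some feedback accuracy for lower delay.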
... The simple act of speaking requires a great deal of improvisation because the mind turns to its own thought and creates its unrehearsed expression in words, sounds, and gestures, forming unpredictable statements that further feed the thought process (the performer as listener), creating an enriched process that is no different from instant composition with a given set or repertoire of elements (Denes & Pinson, 2015). ...
Conference Paper
Full-text available
The Art of Musical Improvisation in an Interdisciplinary Approach is a reflective piece of writing about my research and production in the doctoral program. It is centered on musical improvisation and explores it in a composition of audiovisual material and contemporary dance. It explains the process of the artistic production, which is inspired by jazz harmony through the essences of Celtic, flamenco, and tango folk music, and presents these colors with audiovisuals projected on a large background screen. This production is presented as a live viola performance with a modern dancer who improvises in the moment and creates her own choreography. It is an artistic exploration that aims to make moments of improvisation more enriching as an expressive channel and to reflect them in different modes of the performing arts. It shows that a musician/dancer/artist has the power to creatively express their life journey, demonstrating that we all have a story; what matters is how these stories are turned into art. Art has significant potential to make us think, become aware, and transform ourselves, and when this becomes a form of improvisation, it is even more magical to see that the human being is an endless world of substance. Keywords: musical improvisation, audiovisual, contemporary dance, production, research.
... We consider the act of speaking to be simple, but it also requires a great deal of improvisation because the mind turns to its own thought and creates its unrehearsed expression in words, sounds, and gestures, forming unpredictable statements that further feed the thought process (the performer as listener), creating an enriched process that is no different from an instant texture with a given set or theme of elements (Denes & Pinson, 2015). ...
Article
Full-text available
The Art of Musical Improvisation with an Interdisciplinary Approach is a reflective piece of writing about my research and production within the framework of the doctoral program. This project is centered on musical improvisation on piano (recorded music) and viola (live music) combined with audiovisual composition (my own material and material by others), including contemporary dance in the videos. The article briefly explains the importance of improvisation and the methods applied in the performing arts, and shares the process of the artistic production, in which I was inspired by jazz harmony through the essences of folk music: Celtic, flamenco, and tango. These harmonies were linked with audiovisuals on the large screen.
... One might suspect that Korean speakers' pronunciation errors are not perceived as errors by native Japanese speakers because the errors are identified with acoustical features, not perceptually, in this study. Although several studies (e.g., Baese-Berk, 2019; Flege & Bohn, 2021) argued for a weak or no correlation between speech production and perception, other studies (e.g., Amano & Hirata, 2010; Amano & Hirata, 2015; Denes & Pinson, 1993) claimed that speech production and perception are closely related and that the production and perceptual boundaries of phonemes are expected to coincide. In line with this expectation, the coincidence of the production and perceptual boundaries has indeed been confirmed by experimental studies. ...
... If speech production and perception have a close relationship (e.g., Amano & Hirata, 2010; Amano & Hirata, 2015; Denes & Pinson, 1993), the perception of /s/ and /ts/ may have similar characteristics to their production observed in this study. Namely, Korean speakers might misperceive /ts/ as /s/ more frequently than /s/ as /ts/ as a consequence of a perceptual boundary shift to the origin. ...
... During conversation, speakers and listeners exchange information, sending signals from one brain to another via the medium of speech. Consistent with the idea of the speech chain (Denes & Pinson, 1993), the speaker's brain activity should lead the listener's due to transmission delays from the speaker to the listener, as well as other physical limitations that mediate speech communication. Among these, transmission delays of ~5 ms from the speaker to the listener are expected given the speed of sound for conversations over 1-2 m of distance. ...
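For reference, the ~5 ms figure follows directly from the speed of sound (about 343 m/s) over a conversational distance of roughly 1.7 m:

\[ t = d/c \approx 1.7\,\mathrm{m} / 343\,\mathrm{m\,s^{-1}} \approx 4.96\,\mathrm{ms} \approx 5\,\mathrm{ms} \]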
... After speech production, though, neural responses to the self-generated sensory consequences of speaking are attenuated for approximately 200 ms (Toyomura, Miyashiro, Kuriki, & Sowman, 2020;Wang et al., 2014). Taken together, these results of maximal speech tracking after auditory presentation during perception and before vocalisation during speech production support recent fMRI findings indicating alignment of the speaker's articulatory system and the listener's auditory system (Liu et al., 2020); however, by providing EEG estimates of the temporal dynamics of brain signals linked to the physical acoustic characteristics of the speech signal, we confirm that the timing of neural responses is in line with accounts of communication based on the idea of a speech chain linking the brains of speakers and listeners (Denes & Pinson, 1993). ...
Article
This study investigates the dynamics of speech envelope tracking during speech production, listening and self listening. We use a paradigm in which participants listen to natural speech (Listening), produce natural speech (Speech Production), and listen to the playback of their own speech (Self-Listening), all while their neural activity is recorded with EEG. After time-locking EEG data collection and auditory recording and playback, we used a Gaussian copula mutual information measure to estimate the relationship between information content in the EEG and auditory signals. In the 2–10 Hz frequency range, we identified different latencies for maximal speech envelope tracking during speech production and speech perception. Maximal speech tracking takes place approximately 110 ms after auditory presentation during perception and 25 ms before vocalisation during speech production. These results describe a specific timeline for speech tracking in speakers and listeners in line with the idea of a speech chain and hence, delays in communication.
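A compact sketch of the kind of analysis described above: a Gaussian-copula mutual information estimate between a synthetic speech envelope and a synthetic EEG channel across candidate lags. The sampling rate, noise level, and the 110 ms toy delay are assumptions chosen to mirror the reported latency; this is not the authors' data or pipeline.

# Gaussian-copula MI between a toy speech envelope and a toy EEG signal at several lags.
import numpy as np
from scipy.stats import rankdata, norm

def copnorm(x: np.ndarray) -> np.ndarray:
    """Rank-transform to uniform, then map to standard normal (Gaussian copula)."""
    return norm.ppf(rankdata(x) / (len(x) + 1.0))

def gcmi_1d(x: np.ndarray, y: np.ndarray) -> float:
    """Gaussian-copula mutual information (in bits) between two 1-D signals."""
    cx, cy = copnorm(x), copnorm(y)
    r = np.corrcoef(cx, cy)[0, 1]
    return -0.5 * np.log2(1.0 - r ** 2)

fs = 100                                                     # Hz, toy sampling rate
t = np.arange(0, 60, 1 / fs)
rng = np.random.default_rng(1)
envelope = np.abs(rng.standard_normal(t.size)).cumsum() % 1.0   # toy speech envelope
eeg = np.roll(envelope, 11) + 0.5 * rng.standard_normal(t.size) # envelope delayed ~110 ms + noise

lags_ms = np.arange(-50, 201, 10)
mi = [gcmi_1d(np.roll(envelope, int(l * fs / 1000)), eeg) for l in lags_ms]
print("peak MI at lag (ms):", lags_ms[int(np.argmax(mi))])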
... The second type of auditory parameter is based on the instrumental (computer) analysis of the auditory nerve activity during speech perception (Delgutte 1999). The auditory nerve is a part of the internal auditory system which is connected to the cochlea, and it contains about 30,000 neurons (Denes – Pinson 1963: 92-107, Brosnahan – Malmberg 1970: 160-170, Clark – Yallop 1995: 303-305, Pavlík 2003). There are at least four main auditory (auditory-nerve) parameters that can be identified: ...
... It may be caused by the depletion of neurotransmitter at the synapses between hair cells and ANFs in the cochlea (cf. Denes – Pinson 1963: 92-107, Smith 1979). Adaptation plays several roles in the processing of speech in the auditory nerve (Delgutte 1999: 512): ...
Chapter
Full-text available
The paper is concerned with the approaches currently used in the description and analysis of speech continuum and speech variation. The main goal of the paper is to offer a brief overview of parametric, segmental, and gestural approaches used in phonetics and phonology, and, at the same time, to show how these three approaches are interrelated and interdependent.
... It is cost-effective and widely accessible, as it is integrated into all smartphones, enabling passive voice recording within natural living environments of patients. Furthermore, due to the intricate interplay of numerous neuromotor (Denes and Pinson, 1963) and neurolinguistic (Kröger et al., 2020) processes involved in speech production, it is sensitive to a wide spectrum of pathologies (Fagherazzi et al., 2021). These advantages have garnered significant attention from the voice and speech processing community, especially in the context of using voice recordings to detect psychiatric disorders. ...
Conference Paper
Full-text available
Voice biomarkers hold the promise of improving access to care and therapeutic follow-up for people with psychiatric disorders, tackling the issues raised by their high prevalence and the significant diagnostic delays and difficulties in patient follow-up. Yet, despite many years of successful research in the field, none of these voice biomarkers are implemented in clinical practice. Beyond the reductive explanation of the lack of explainability of the involved machine learning systems, we look for arguments in the epistemology and sociology of psychiatry. We show that the estimation of diagnoses, the major task in the literature, is of little interest to both clinicians and patients. After tackling the common misbeliefs about diagnosis in psychiatry in a didactic way, we propose a paradigm shift towards the estimation of clinical symptoms and signs, which not only addresses the limitations raised against diagnosis estimation but also enables the formulation of new machine learning tasks. We hope that this paradigm shift will empower the use of vocal biomarkers in clinical practice. It is however conditional on a change in database labeling practices, but also on a profound change in the speech processing community's practices towards psychiatry.
... These two components need to work in tandem to establish "signal parity" between the produced and perceived representations for successful bidirectional communication (Liberman and Mattingly, 1989; Massaro, 2014). The entire system of speech production and perception forms a dynamic process with two cooperating sides, constructing a stable and effective "speech chain" for effective verbal communication (Denes and Pinson, 1963). ...
Article
Full-text available
The study of spoken communication has long been entrenched in a debate surrounding the interdependence of speech production and perception. This mini review summarizes findings from prior studies to elucidate the reciprocal relationships between speech production and perception. We also discuss key theoretical perspectives relevant to speech perception-production loop, including hyper-articulation and hypo-articulation (H&H) theory, speech motor theory, direct realism theory, articulatory phonology, the Directions into Velocities of Articulators (DIVA) and Gradient Order DIVA (GODIVA) models, and predictive coding. Building on prior findings, we propose a revised auditory-motor integration model of speech and provide insights for future research in speech perception and production, focusing on the effects of impaired peripheral auditory systems.
... Pronunciation instruction with an emphasis on comprehensibility can contribute to EFL/ESL students being understood when using a second language (Levis, 2018). We define the intelligibility of a speaker or of a speech utterance in the classical, rather narrow, sense as the degree to which a listener is able to recognize the linguistic units (e.g., morphemes, words) in the stream of sounds and to establish the order in which they were spoken (e.g., Denes & Pinson, 1963; Smith & Rafiqzad, 1979; Gooskens & van Heuven, 2021). When a sufficient number of words are recognized in the correct order, the listener will be able to reconstruct the speaker's meaning and intention. ...
Article
Full-text available
This study examines the effect of native vs. non-native prosody instruction on developing interpreter trainees' speech comprehensibility in English as a foreign language (EFL) using a pretest-posttest-delayed posttest design. Twenty-three groups of 28 interpreter trainees at a University in Iran (six different branches) took part in the study, all groups receiving the same amount of instruction (9 hours over 3 weeks). Three control groups listened to/viewed authentic audio recordings and movies in English, discussed their contents, and completed a variety of speaking tasks but received no specific prosody instruction. Twenty experimental groups spent part of the instruction time on theoretical explanation of, and practical exercises with, English prosody by thirteen nonnative instructors, and seven native instructors. Three experts evaluated the comprehensibility of the trainees in elicited speech samples collected during the pretest, immediate posttest and delayed posttest, and subsequently presented in random order. The findings revealed that the experimental groups gained between 1 and 2 points on the 0 to 10 comprehensibility scale, and lost little in the delayed posttest; however, hardly any changes were observed in the control groups. We conclude that native and non-native English instructors' prosody teaching were equally effective in enhancing EFL students' speech comprehensibility.
... Speech communication is the result of the coordination of over a hundred different muscles and neurobiological processes [8]. Acoustic measurement of speech can be used to observe the impacts of abnormalities on these neurobiological processes. ...
Article
Full-text available
Speech is a promising biomarker for schizophrenia spectrum disorder (SSD) and major depressive disorder (MDD). This proof of principle study investigates previously studied speech acoustics in combination with a novel application of voice pathology features as objective and reproducible classifiers for depression, schizophrenia, and healthy controls (HC). Speech and voice features for classification were calculated from recordings of picture descriptions from 240 speech samples (20 participants with SSD, 20 with MDD, and 20 HC each with 4 samples). Binary classification support vector machine (SVM) models classified the disorder groups and HC. For each feature, the permutation feature importance was calculated, and the top 25% most important features were used to compare differences between the disorder groups and HC including correlations between the important features and symptom severity scores. Multiple kernels for SVM were tested and the pairwise models with the best performing kernel (3-degree polynomial) were highly accurate for each classification: 0.947 for HC vs. SSD, 0.920 for HC vs. MDD, and 0.932 for SSD vs. MDD. The relatively most important features were measures of articulation coordination, number of pauses per minute, and speech variability. There were moderate correlations between important features and positive symptoms for SSD. The important features suggest that speech characteristics relating to psychomotor slowing, alogia, and flat affect differ between HC, SSD, and MDD.
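A minimal sketch of the analysis style reported above, using synthetic data: a polynomial-kernel SVM classifier on a handful of invented speech features, followed by permutation feature importance. The feature names, group means, and sample sizes are illustrative assumptions, not the study's data.

# Binary SVM classification of toy "speech features" plus permutation importance.
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
n_per_group = 80
# Invented features: pauses per minute, articulation coordination, F0 variability.
hc  = rng.normal([10.0, 0.8, 1.0], 0.3, size=(n_per_group, 3))   # "healthy controls"
mdd = rng.normal([14.0, 0.6, 0.7], 0.3, size=(n_per_group, 3))   # "MDD" group
X = np.vstack([hc, mdd])
y = np.array([0] * n_per_group + [1] * n_per_group)

clf = make_pipeline(StandardScaler(), SVC(kernel="poly", degree=3))
print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean().round(3))

clf.fit(X, y)
imp = permutation_importance(clf, X, y, n_repeats=20, random_state=0)
for name, score in zip(["pauses/min", "articulation coord.", "F0 variability"],
                       imp.importances_mean.round(3)):
    print(name, score)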
... In communication, speakers and listeners are the two sides of the information sending and receiving coin (Denes & Pinson, 1993), and production and comprehension involve two opposite pathways for processing inputs and outputs. Speakers produce sentences during which conceptual ideas are transferred into serial sounds, while listeners comprehend sentences during which linguistic representations are converted into meaningful information. ...
Thesis
Full-text available
Full text available here: https://theses.lib.polyu.edu.hk/handle/200/12295 Children tend to experience an asymmetry in production and comprehension as their language develops. However, it remains unclear whether older adults exhibit such production/comprehension asymmetry (PCA) in a retrogenic manner. The overarching target of the current study is to examine whether PCA exists in the syntactic and semantic abilities of older adults. As language ability is correlated with various cognitive abilities, we further probe whether PCA is associated with the declarative and procedural memory, since semantic and syntactic abilities are considered to be subserved by declarative and procedural memory, based on the declarative/procedural model. The declearn task was used and the serial reaction time task was administered in order to measure declarative memory and procedural memory, respectively. It was found that both declarative memory and procedural memory deteriorated in older adults. Our results also demonstrate that the erasure of items in declarative memory follows the retrogenesis theory. As adults age, they tend to remember more real objects and forget more made-up objects; this pattern is the reverse of that in childhood. Using production tasks and comprehension tasks, we systematically investigated the syntactic and semantic processing patterns in the Chinese population, especially in older Chinese people. The results indicated that although older adults were able to express relevant information with intact syntactic and semantic complexity, they required a greater amount of planning time to initiate sentence production than younger adults. It was determined that older adults had a significantly lower level of semantic comprehension compared to younger adults because of their relatively low accuracy rate, as well as the absence of the N400 effect. On the other hand, it was shown that older adults were capable of reaching a similar accuracy rate as younger adults when judging syntactic correctness. Despite this, older adults failed to exhibit the anterior negative effect that could be observed in the neural potential of younger individuals. Thus, there are behavioral and neural differences in the receptive syntactic abilities of older adults. According to our findings, the semantic performance of older adults fluctuated in terms of receptive modality and expressive modality, resulting in a semantic PCA similar to that of children. The findings are in accordance with the retrogenesis theory, which posits that the decline of language reverses the trajectory of its development. Additionally, we found that the asymmetry between semantic production and comprehension emerged after both behavioral and neural declines in semantic ability. It is also important to note that the asymmetry may be hidden at the behavioral level and only visible at the level of neural activity. Furthermore, we found that declarative/procedural memory was associated with semantic/syntactic performance in adults of all ages. In addition, the original declarative/procedural model has been extended in the current study, as the two memory systems tend to be unequally linked to language abilities in the expressive modality and receptive modality. It is likely that the lifelong unequal associations contribute to PCA. The findings of our study provide a comprehensive picture of age-related memory deficits and language attrition, as well as behavioral and neural mechanisms responsible for these declines. 
The results of our research on retrogenic production and comprehension asymmetries indicate that we have a responsibility to treat the elderly with the same care and attention offered to our children.
... This raises important new questions about the cognitive mechanisms brought into play in the two speakers when orchestrating a conversational interaction. Traditional models of spoken communication (e.g., Denes & Pinson, 1963) rested on a well-established division of roles between the two interactants: when one speaks, the other listens. Numerous studies on the dynamics of conversation have since shown the strong limitations of this approach. ...
Article
Full-text available
The dynamic deployment of talk-in-interaction has been studied mainly from the perspective of collaboration and/or convergence. In both linguistics and psycholinguistics, the authors have mainly tried to show that due to a strong predictability (psycholinguistics) or projection/projectability (Conversational Analysis, Interactional Linguistics) of the utterances, achieved in particular on the linguistic level thanks to projection cues allowing to anticipate what is going to happen, the conversation is a joint activity in which the search for convergence (alignment) proves to be central. This search for convergence would result in an almost continuous collaboration during the interaction. Various phenomena (feedback items, collaborative statements, among others) tend to support this conception according to which the conversation is first and foremost collaboration and the explicit manifestation of this collaboration to his/her partner. The partners would therefore constantly be aligning themselves in order to achieve an optimal mutual understanding. While it is true that the collaboration of participants greatly facilitates conversation, or even that conversation is 'so easy' because we are constantly aligning onto each other, through the most automatic processes possible (Garrod & Pickering, 2004), this collaborative vision of conversation needs to be nuanced. Indeed, the analysis of various excerpts from conversational corpora shows that this notion of collaboration does not make it possible to account for the richness of the conversation, which is linked to its intrinsic dynamism. Thus, we can sometimes observe a superficial collaboration or even an absence of collaboration. Thus, while an interaction calls upon a large repertoire of practices (repetition, statement completion, thematic transition, feedback, humor, etc.), we wish to show that the collaboration of the participants is only one of the possible ways of exploiting this large repertoire, and that it coexists with other, less collaborative, or even non-collaborative forms, on which this article will focus. Through the study of non-expected (non-predictable) utterances, characterized as disaligned and/or disaffiliated in some of our work (Bertrand & Priego-Valverde, 2017, Priego-Valverde, 2021), we wish to emphasize the impact they can have on interactional dynamics. Thus, some of these utterances may simply be refused or ignored because taking them into account would lead to a change in the interactional trajectory initiated by one of the participants. On the other hand, other utterances, although unexpected, may be accepted and integrated into the ongoing conversation. It is therefore crucial to uncover the constraints to which the participants must adhere (“what to do”) turn by turn to allow the “successful” continuation of the interaction. We therefore also wish to show that conversations are not necessarily doomed to failure if collaboration in the strict sense is not total. Thus, these unexpected utterances would be the occasion to introduce new possibilities, allowing us to think that conversation would be made of both these already present (codified) forms/patterns and new emerging 'possibilities', in line with the work of Grammar in Interaction (Ochs et al., 1996; Auer, 2005).We propose to 'track' the emergence of these 'new possibilities': when do they appear? What is their nature? What are the consequences for the interaction in progress? 
To this end, we analyze five extracts from conversational corpora through the prism of the interactional trajectories that the participants take. Thus, the first example we will analyze perfectly illustrates this notion of collaborative activity accepted in many works on dialogue and inter-individual interactions (Sacks et al., 1974; Clark, 1996, Sidnell & Stivers, 2013; Couper-Kuhlen & Selting, 2018) to qualify the conversation. While we do not contest this notion of collaboration which is inherent to interactions, we aim to show that underneath this relatively consensual notion, 'lurk' diverse practices and processes, which ultimately make it a relatively fuzzy notion. In support of the other examples, we will show in this article that concepts such as progressivity, interactional trajectories, alignment or affiliation, allow us to better understand what we consider as a successful interactional achievement, which goes through different moments during which the participants can be more or less collaborative. The detailed analysis of these different examples will allow us to show that the dynamism of interactions is a crucial entry point for revealing their functioning in all its complexity. Furthermore, it encourages us to reflect on the paradigms that can be used to study the impact of these unexpected forms on conversational trajectories in an experimental framework.
... To address the issue of the origin of negative attitudes toward stuttering, it is instructive to consider the acts of normal speaking and listening in what speech scientists (Denes & Pinson, 1963) many years ago termed the "speech chain". In any natural language, detailed rules govern the expected phonological, syntactic, semantic, and pragmatic nature of what is spoken both for the speaker and for the listener. ...
... Attempts to identify biological determinants of language acquisition in »homo sapiens« have rested primarily on studies of critical periods for acquisition (Lenneberg [1967]) and investigations of aphasia where explicit language deficits are related to specific brain injury (Geschwind [1972]). Investigation of the peripheral mechanisms involved in the processes of language comprehension and production, i.e. the functioning of the ear and the vocal apparatus, demonstrates how the forms of language fit perceptual and motor skills which indisputably are governed by important genetic factors (Denes & Pinson [1973]; Liberman et al. [1967]). It has even been possible to demonstrate that some of these peripheral mechanisms are shared with other species (Eimas [1975]). ...
Article
Language acquisition research experienced a boom following the Chomskyian revolution. The focus of attention centred primarily on the English child's acquisition of syntax. In the seventies, the range of problem areas in language acquisition began to diversify and alternative perspectives (non-nativist) on how the child acquires language began to emerge. It is argued that socio-cognitive approaches to language acquisition, though providing an important prerequisite for the acquisition of linguistic structure, cannot themselves account for the acquisition of the complex mapping relation between grammar and meaning that is required for full-blooded linguistic communication. Recent trends in language acquisition research including Learnability theory, Individual differences and Cross-linguistic approaches are reviewed. The article concludes with speculation about the future role of non-nativist approaches in language acquisition research. Although much current detailed work would seem to point to the existence of a Language Acquisition Device that is specifically tuned to the processing of linguistic information, it is premature to conclude that a more general cognitive learning mechanism that is able to account for both universal and particular properties of linguistic development, cannot provide a more parsimonious explanation of acquisition.
... In speech production, the articulatory movement creates an intermediate representation between neuro-motor planning (high level) and speech acoustics (low level) (Whiteside et al., 1993). Neuro-motor planning takes place in the human brain, and its central role is to express linguistic information. ...
Article
Full-text available
Speech might be one of the best inventions of human beings due to its critical communicative role in individuals' daily lives. Hence any study about it is valuable. To our knowledge, merely three studies focused on brain regions' associations with speech production were published more than eighteen years ago; furthermore, research on the brain areas associated with speech production is currently insufficient. The present review aims to provide information about all brain areas contributing to speech production to update the knowledge of brain areas related to speech production. The current study confirms earlier claims about activating some brain areas in the process; however, the previous studies were not comprehensive, and not all brain areas were mentioned. Three cerebral lobes are involved in the process, namely, the frontal, parietal and temporal lobes. The regions involved include the left superior parietal lobe, Wernicke's area, Heschl's gyri, primary auditory cortex, left posterior superior temporal gyrus (pSTG), Broca's area, and premotor cortex. In addition, regions of the lateral sulcus (anterior insula and posterior superior temporal sulcus), basal ganglia (putamen), and forebrain (thalamus) showed participation in the process. However, there was a different brain activation of overt and covert or silent speech (Broca's and Wernicke's areas). Moreover, mouth position and breathing style showed a difference in speech mechanism. In terms of speech development, the early postnatal years are important for speech development, as well as identifying three crucial stages of speech development: the pre-verbal stage, transition to active speech, and refinement of speech. In addition, during the early years of speech development, auditory and motor brain regions showed involvement in the process.
... Speech and language therapy training is dominated by biological, neuropsychological, and linguistic theories of speech and language. Theories include linear, transactional models of communication, for example, the 'communication chain' (Denes & Pinson, 1993). In this model messages are physical (speech, writing, sign) consisting of semiotic signals in which meaning is inherently contained (words in language). ...
Article
Full-text available
As speech and language therapists, we explored theories of communication and voice that are familiar to our profession and found them an inadequate basis on which to generate deep and rich analysis of the qualitative data from people who have communication difficulties and who use augmentative and alternative communication. Expanding our conceptual toolkit to include the work of John Shotter allowed us to reconceptualise voice and where it is emergent in dialogue. Reimagining voice will inform clinical and research praxis with people who have communication difficulties as it allows practitioners to attend more closely to the complexity and nuance inherent in interactions with this population. Our proposition is exemplified with excerpts from a single participant who has communication difficulties to illustrate the value of dialogic theory in praxis. This article presents a provocation for the wider academy of qualitative health research; do we have the concepts and tools to develop meaning with people whose lived experiences may also be hard to voice in monologues?
... Therefore, self-monitoring involves the comprehension and analysis of what the individual has produced, and detects whether the other processes have been implemented effectively. Denes and Pinson [9] refer to this self-monitoring process as the Feedback Link which enables the speaker to hear himself, comprehend his own speech and immediately make necessary corrections wherever required. In written language, the writer is able to make such corrections through proofreading and editing what has been written. ...
... To address the issue of the origin of negative attitudes toward stuttering, it is instructive to consider the acts of normal speaking and listening in what speech scientists (Denes & Pinson, 1963) many years ago termed the "speech chain". In any natural language, detailed rules govern the expected phonological, syntactic, semantic, and pragmatic nature of what is spoken both for the speaker and for the listener. ...
Article
Full-text available
Introduction: Research has shown that adults who stutter have reacted with increased skin conductance and lower heart rates when confronted with videos of severe stuttering compared to videos of fluent speech. It has not been clearly established how these physiological indices or autonomic arousals are related to stuttering attitudes. The current study sought to compare physiological and psychometric measures of anxiety with stuttering attitudes. Method: In a multiple-baseline design, 18 normal hearing university students listened to short samples of stuttered, masked, and normally fluent speech while their skin conductance and heart rate variability were being monitored by an Empatica E4 wristband device. Pre-experimentally and after each speech condition, they rated their comfort level on a 1-9 scale. Participants filled out the State-Trait Anxiety Inventory (STAI) (Spielberger, 1977) prior to the physiological measures, followed by a short state anxiety inventory; finally, they completed the Public Opinion Survey of Human Attributes-Stuttering (POSHA-S). Results: No significant main effects were observed for either autonomic measure for the three speech conditions, but interactions were significant. Individual participant analysis revealed that every respondent reacted differently in terms of skin conductance or heart rate variability. By contrast, mean subjective comfort ratings were more often lower after hearing stuttered or masked speech and higher after hearing fluent speech. Correlations between all the measures and the POSHA-S summary scores revealed little relationship between the autonomic measures and stuttering attitudes, but higher levels of state or trait anxiety were associated with more positive beliefs about people who stutter. In contrast, lower levels of anxiety tended to be associated with more positive self-reactions to those who stutter. Conclusion: This study did not replicate previous reports of heightened autonomic reactions to stuttering among nonstuttering adults, although psychometric measures suggest a relationship between anxiety and stuttering attitudes. Further research should explore these relationships, especially with young children. Keywords: attitudes toward stuttering, anxiety, autonomic measures, psychometric measures, POSHA-S.
... This last part of the process is what we call speech understanding or comprehension. The sequence of events sketched here is known as the speech chain (Denes and Pinson 1963), and it has been the blueprint of Levelt's (1989) model of speech production and Cutler's (2012) model of native listening. The intelligibility of a speaker, or of a speech utterance, is the degree to which a listener is able to recognize the linguistic units in the stream of sounds and to establish the order in which they were spoken. ...
... In daily communication, speech consists of continuous acoustic cues that are highly variable. Speech perception is an essential skill that children need to acquire to communicate with others (Denes & Pinson, 2015). One aspect of speech perception is the need for individuals to decode acoustic cues into discrete phonemes via so-called categorical perception (CP; Liberman et al., 1957). ...
Article
Full-text available
Although children develop categorical speech perception at a very young age, the maturation process remains unclear. A cross-sectional study in Mandarin-speaking 4-, 6-, and 10-year-old children, 14-year-old adolescents, and adults (n = 104, 56 males, all Asians from mainland China) was conducted to investigate the development of categorical perception of four Mandarin phonemic contrasts: lexical tone contrast Tone 1-2, vowel contrast /u/−/i/, consonant aspiration contrast /p/−/ph/, and consonant formant transition contrast /p/−/t/. The results indicated that different types of phonemic contrasts, and even the identification and discrimination of the same phonemic contrast, matured asynchronously. The observation that tone and vowel perception are achieved earlier than consonant perception supports the phonological saliency hypothesis.
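As a sketch of how categorical perception of such contrasts is typically quantified, the fragment below fits a logistic identification function along a toy 9-step tone continuum and reads off the category boundary and slope. The stimulus steps and response proportions are invented for illustration, not drawn from the study.

# Fit a logistic identification function to toy identification data.
import numpy as np
from scipy.optimize import curve_fit

def logistic(step, boundary, slope):
    return 1.0 / (1.0 + np.exp(-slope * (step - boundary)))

steps = np.arange(1, 10)                                   # 9-step tone continuum
p_tone2 = np.array([0.02, 0.05, 0.08, 0.20, 0.55, 0.85, 0.95, 0.97, 0.99])

(boundary, slope), _ = curve_fit(logistic, steps, p_tone2, p0=[5.0, 1.0])
print(f"category boundary at step {boundary:.2f}, slope {slope:.2f}")

Sharper slopes and stable boundaries across identification and discrimination tasks are the usual markers of mature categorical perception.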
... Speech perception plays an important role in social communication. Within the speech chain, people understand others' speech and monitor their own speech via speech perception (Denes & Pinson, 2015). Once speech perception is degraded, people have difficulty engaging in social communication, as is often the case in seniors. ...
Article
Full-text available
Purpose: This study aims to investigate the different degeneration processes of categorical perception (CP) of Mandarin lexical tones in the normal aging population and the pathological aging population with mild cognitive impairment (MCI). Method: In Experiment I, we compared the identification and discrimination of Tone 1 and Tone 2 across young adults, seniors aged 60-65 years, and older seniors aged 75-80 years with normal cognitive abilities. In Experiment II, we compared lexical tone identification and discrimination across young adults, healthy seniors, and age-matched seniors with MCI. Results: In Experiment I, tone perception was intact in seniors aged below 65 years. Those aged above 75 years also maintained normal tone identification but showed poorer tone discrimination, which correlated with age-related hearing decline. In Experiment II, healthy seniors showed normal CP of Mandarin tones. Tone identification was also normal in those with MCI, whereas their tone discrimination had significantly degenerated. Conclusions: In the normal aging population, age-related hearing loss decreased signal audibility, accounting for poorer discrimination of Mandarin lexical tones in seniors above 75 years. In the pathological aging population with MCI, the poorer discrimination of lexical tones may be attributed to the additive effect of age, hearing loss, and cognitive impairment (e.g., impaired working memory and long-term phonological memory). This study uncovered the roles of low-level sensory processing and high-level cognitive processing in lexical tone perception in the Chinese aging population.
... A necessary but not sufficient intermediate stage in the transformation process is the recognition of (a sufficient number of) words in the order in which they were spoken (e.g. Denes and Pinson 1963; Gooskens and Van Heuven 2021; Smith and Nelson 1985). Listening comprehension skills enable foreign-language (FL) learners to make sense of language input and facilitate the emergence of other language skills (Vandergrift and Goh 2012; Yenkimaleki and Van Heuven 2016). ...
Article
The present study investigated the efficacy of segmental/suprasegmental vs. holistic pronunciation instruction in the development of listening comprehension skills by EFL learners, using a pre-test post-test design. Six groups of 20 intermediate EFL learners at a university in Iran took part in the study, all groups receiving the same amount of instruction (10 hours over 5 weeks). The control group listened to/viewed authentic audio recordings and movies in English, discussed their contents, and completed a variety of listening comprehension tasks but received no pronunciation instruction. Four experimental groups completed similar activities but during one third of the teaching time (20 minutes per class), received an explanation of segmental or suprasegmental features followed by production-focused or perception-focused practice. The final experimental group received holistic pronunciation instruction with mixed perception/production-focused practice for 20 minutes during each hour-long class. Versions of Longman’s TOEFL English proficiency test (paper-based) were used to assess listening comprehension at pre-test, immediate post-test and delayed post-test. The findings revealed that the holistic pronunciation instruction enhanced the listening comprehension skills of Iranian EFL learners more than separate segmental or suprasegmental training, with either perception or production-focused practice.
... For example, online hearing data [13] show the sound levels, measured with a sound-level meter in decibels, for three classes of guns: handguns, shotguns, and rifles (see Table 1). In this gunfire sound level reference chart, all measured levels fall between 152 and 163 dB, far exceeding the threshold of pain (about 120 dB; conversational speech is around 60 dB [14]). It is clear that the manufacturer of the gun (specific across handguns, etc.), the caliber, and the barrel length for shotguns or rifles will all impact the acoustic sound level response in dB. ...
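As a rough guide to what these figures imply, sound pressure level is logarithmic, so a difference in dB corresponds to a large ratio of pressures or intensities. The short Python sketch below uses only the standard conversion formulas and the levels quoted above; it is an illustration, not part of the cited chart.

```python
import math  # not strictly needed; kept for clarity if extending the sketch

def pressure_ratio(db_diff: float) -> float:
    """Ratio of sound pressures corresponding to a difference in dB SPL."""
    return 10 ** (db_diff / 20)

def intensity_ratio(db_diff: float) -> float:
    """Ratio of sound intensities corresponding to a difference in dB SPL."""
    return 10 ** (db_diff / 10)

# Reference levels quoted above (dB SPL)
conversation = 60
pain_threshold = 120
gunshot_low, gunshot_high = 152, 163

for level in (gunshot_low, gunshot_high):
    print(f"{level} dB vs. speech at {conversation} dB: "
          f"{pressure_ratio(level - conversation):,.0f}x the sound pressure")
print(f"{gunshot_low} dB vs. {pain_threshold} dB pain threshold: "
      f"{intensity_ratio(gunshot_low - pain_threshold):,.0f}x the intensity")
```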
Conference Paper
Full-text available
Many communities experiencing increased gun violence are turning to acoustic gunshot detection systems (GSDS) with the hope that their deployment would provide increased 24/7 monitoring and the potential for more rapid response by law enforcement to the scene. In addition to real-time monitoring, data collected by gunshot detection systems have been used alongside witness testimonies in criminal prosecutions. Because of their potential benefit, it would be appropriate to ask: how effective are GSDS in both lab/controlled settings vs. deployed real-world city scenarios? How reliable are outputs produced by GSDS? What is the system performance trade-off in gunshot detection vs. source localization of the gunshot? Should they be used only for early alerts or can they be relied upon in courtroom settings? What negative consequences are there for directing law enforcement to locations when a false positive event occurs? Are resources spent on GSDS operational costs well utilized or could these resources be better invested to improve community safety? This study does not attempt to address many of these questions, including the social or economic questions surrounding GSDS, but provides a reflective survey of hardware and algorithmic operations of the technology to better understand its potential as well as its limitations. Specifically, challenges are discussed regarding environmental and other mismatch conditions, with emphasis on the validation procedures used and their expected reliability. Many concepts discussed in this paper are general and will likely be utilized in, or have an impact on, any gunshot detection technology. For this study, we refer to the ShotSpotter system to provide specific examples of system infrastructure and validation procedures.
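As a purely illustrative aside, the simplest algorithmic building block of acoustic gunshot detection is flagging impulsive rises in short-time energy. The sketch below is a naive, hypothetical detector written for this summary; it is not the ShotSpotter pipeline, which additionally involves multi-sensor arrays, time-difference-of-arrival localization, classification, and human review.

```python
import numpy as np

def detect_impulses(signal, sr, frame_ms=10.0, threshold_db=20.0):
    """Flag frames whose short-time energy rises far above the estimated background.

    A deliberately naive sketch: it only thresholds frame energy against a
    median noise-floor estimate and returns candidate onset times in seconds.
    """
    frame = int(sr * frame_ms / 1000)
    n_frames = len(signal) // frame
    energy = np.array([np.sum(signal[i * frame:(i + 1) * frame] ** 2)
                       for i in range(n_frames)]) + 1e-12
    energy_db = 10 * np.log10(energy)
    background = np.median(energy_db)              # crude noise-floor estimate
    hit_frames = np.where(energy_db > background + threshold_db)[0]
    return [i * frame / sr for i in hit_frames]

# Example: a short synthetic burst embedded in low-level noise
sr = 16000
audio = 0.01 * np.random.randn(2 * sr)
audio[sr:sr + 80] += np.hanning(80)                # impulsive event near t = 1 s
print(detect_impulses(audio, sr))
```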
... 15.6. The transmission theory: communication as transfer. In the 1960s, a book entitled The Speech Chain (Denes & Pinson, 1963) was often used as an introduction to the subject of speech communication. It was written during a largely behaviorist era, with an interest in speech as "transmission"*. ...
Book
Languaging in Real Life: an introduction to dialogical perspectives on language, thinking and communication. This book is a comprehensive introduction to a dialogical perspective on language, languaging, thinking, communication and culture. It builds upon social psychology, social and dialogical philosophy (phenomenology), interactional linguistics, and humanistic ideas of thinking and communication. Dialogical terms comprise, for example, dialogue, dialectics, dynamics, extended dialogism, external dialogue, partial holism, situations, contexts and activities, partial and partially shared understandings, participation, appropriation, and meaning-making, interpenetrations of concepts, e.g. persons and culture. According to dialogical theory, a great deal revolves around the assumption that the making of meaning and social order in human activities and cultures is usually and initially built on interactions and relations between Self and Others. Basic properties of contributions to external dialogues are relations between initiatives and responses. Categories of responsive actions include minimal, short (“elliptical”) and expanded (“full”) responses. An important distinction is that between situated and sociohistorical contexts; we can talk about “double dialogicality” (situated vs. sociohistorical). An explanatory context theory must also distinguish between co-textual, other situation-based, and cultural (non-local) types. Pragma-semantic categories are linguistic means and situated (“participants’”) meanings; a parallel distinction is that between meaning potentials (of words and constructions) and message potentialities (of situated utterances). Communication comprises cognitive, emotional and volitional aspects, and involves partial (and partially shared) understandings, and relations of power and respect. Utterances are characterised by responsivity, addressivity, incrementation, and relatively frequent re(tro)constructions of ongoing processes. The book includes separate sections on evolution and ontogenesis, dialogue and thinking, individual and collective aspects of language and languaging, activity types, multimodality of utterances, conditions of production, reception and understanding of utterances, and also some traditional – but partially misguided – ideas in the theorisation of language and communication. This includes a discussion of the “written language bias in linguistics”, which is a historical feature of the language sciences despite their shift from “practical” to “theoretical” concerns. All chapters are designed to highlight dialogical aspects. The last two chapters contain discussions of general dialogical ideas as well as of phenomenology as an overarching framework. The differences between natural-science and humanistic approaches to the mind and mental capacities of man conclude the book. Several arguments build upon earlier work by the author, such as Linell (1998, 2005, 2009) and numerous papers such as Linell & Marková (1993) and Linell (2016, 2020a, 2021a). A list of major sources of inspiration is given in Appendix 1.
... This has been recognized within linguistics, in investigating speech acts that were shown to require "a good deal of improvisation because the mind is addressing its own thought and creating its unrehearsed delivery in words, sounds and gestures, forming unpredictable statements that feed back into the thought process (the performer as listener), creating an enriched process that is not unlike instantaneous composition with a given set or repertoire of elements" (Denes & Pinson, 1993). Thus, these processes could even rely on some common psychological mechanisms that trigger and drive them. ...
Article
Full-text available
The aim of this study was to gain insight into the phenomenon of improvisation, how it is manifested in communication, and to conceptualize the process of improvisation in general. I aimed to construct a model for use in teaching and further analysis of training programs that target and develop improvisation skills in communication. The ability to communicate is part and parcel of psychologists’ work. I develop and supervise interactive classes and training programs to promote improvisation and communication skills, using the grounded theory of improvisation in communication under conditions of high uncertainty. The improvisation sessions were videotaped, transcribed, and analyzed. Applying the qualitative method and working with grounded theory methodology, I studied five sessions. Here I report on the major categories that condition the improvisational process: Level of Anxiety, Coping, Communicative Skills, Imagination, and Spontaneity. I also outline the markers of spontaneous behavior: strange combinations (oxymorons), humor, and rapid topic switching.
... Since speech requires the coordination of more than 100 muscles (Denes & Pinson, 1963) as well as numerous psycho- and neurolinguistic processes (Brown & Hagoort, 2000), neurodegenerative pathologies, but also mental disorders, are now detected using vocal biomarkers, with performance that allows their use in clinical settings [for reviews of pathology detection in the voice, see (Fagherazzi et al., 2021) and (Low et al., 2020)]. ...
Conference Paper
Full-text available
[English version below] L'utilisation de biomarqueurs vocaux est une des technologies les plus prometteuses pour l'implémentation de systèmes de santé numérique en conditions écologiques. En effet, de nombreuses et diverses pathologies sont maintenant diagnostiquées automatiquement de manière fiable grâce à des marqueurs vocaux. Cet article étudie et discute la faisabilité d'annoter des textes -- qui seront ensuite lus à voix haute par des patients hypersomniaques -- afin de concevoir des descripteurs basés sur la naturalité des pauses faites par ceux-ci. Pour cela, trois spécialistes ont annoté six textes extraits du Petit prince. Nous étudions à la fois à travers des mesures statistiques, mais aussi sous le prisme du rapport des annotateurs, deux axes : les différences entre les méthodes d'annotation ; et les lieux de désaccord dans les textes. Enfin, nous concluons quant à la fiabilité d'utiliser de telles annotations comme vérité terrain dans un système automatique. [Is it possible to annotate the naturalness of pauses made during reading out loud?] The use of voice and speech biomarkers is one of the most promising technologies to implement digital health in ecological conditions. Indeed, numerous and diversified pathologies are now accurately diagnosed by autonomous systems based on voice features. This article investigates and discusses the feasibility of annotating texts -- that are then read out loud by hypersomniac patients -- in order to design new features based on the naturalness of the pauses made by the patients. To do so, three specialists annotated six texts extracted from Le Petit Prince. We investigate two axes through statistics but also from the perspective of the annotators' reports: differences in annotation methods; and disagreement locations in texts. Finally, we conclude on the robustness of using such annotations as ground truth in an automated pipeline.
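The abstract reports agreement analyses among three annotators without naming the statistic; one common choice for multiple raters is Fleiss' kappa. The sketch below, with hypothetical binary pause labels, shows how such an agreement score could be computed; it is not the authors' actual procedure.

```python
import numpy as np

def fleiss_kappa(counts: np.ndarray) -> float:
    """Fleiss' kappa for an items x categories matrix of rating counts."""
    n_raters = counts.sum(axis=1)[0]           # assumes the same number of raters per item
    p_cat = counts.sum(axis=0) / counts.sum()  # overall category proportions
    p_item = ((counts ** 2).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))
    p_bar, p_e = p_item.mean(), (p_cat ** 2).sum()
    return float((p_bar - p_e) / (1 - p_e))

# Hypothetical example: 3 annotators label each pause as natural (col 0) or unnatural (col 1)
ratings = np.array([
    [3, 0],
    [2, 1],
    [0, 3],
    [1, 2],
    [3, 0],
])
print(f"Fleiss' kappa = {fleiss_kappa(ratings):.2f}")
```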
... This last part of the process is what we call speech understanding or comprehension. The sequence of events sketched here is known as the speech chain (Denes and Pinson 1963), and it has been the blueprint of Levelt's (1989) model of speech production and Cutler's (2012) model of native listening. ...
Chapter
Full-text available
In this tutorial paper, I explain what we mean by prosody in language and speech. I then show that, generally, prosody is highly redundant, i.e., it can be omitted from the speech signal without causing speech to become unintelligible or incomprehensible. Yet, it has often been mentioned, also in recent publications, that getting the prosody right should have priority in the acquisition of the phonology of a foreign language. This seems a contradiction. Why should a generally redundant feature of a code deserve priority in teaching? It is the purpose of this paper to resolve the paradox. [Entire e-book can be downloaded in open access from https://www.letraria.net/prosodia-e-bilinguismo/ ]
... This subdivision of phonetics, given its closer connection to aspects of linguistic processing and the bridge it establishes between the acoustic signal and linguistic representations, is perhaps not as separable from phonology as articulatory phonetics or acoustic phonetics. On this division of phonetics into these three domains and the relation each of them establishes with other areas of study, see, e.g., Denes & Pinson (1993). See also notes 3 and 10 for further information on this topic. ...
Preprint
Full-text available
2015. This text corresponds to unpublished material by the author and is intended for pedagogical use at the Faculdade de Letras da Universidade do Porto. It may be used by third parties provided the source is explicitly indicated, using the following bibliographic reference: VELOSO, João. 2015. Introdução à Fonologia. Nível fonético e nível fonológico. Porto: Faculdade de Letras da Universidade do Porto (ms.). (These notes will shortly be complemented by other texts of a similar nature on other specific topics in phonology. Comments, notes, and suggestions are welcome and should be sent directly to the author: jveloso@letras.up.pt.)
... Speech perception is an important stage in the human speech chain. It enables the speaker to understand their speech as well as that of others, which is a foundation of social communication (Denes & Pinson, 2015). As an active cognitive procedure, the rapid processing of continuous speech information places a high demand on resources such as focused attention and the efficient manipulation of information held in working memory (Heald & Nusbaum, 2014). ...
Article
Full-text available
Purpose: This study investigated the effect of cognitive load (CL) on the categorical perception (CP) of Mandarin lexical tones to discuss the application of the generalized pulse-skipping hypothesis. This hypothesis assumes that listeners might miss/skip temporal pulses and lose essential speech information due to CL, which consequently affects both the temporal and spectral dimensions of speech perception. Should CL decrease listeners’ pitch sensitivity and impair the distinction of tone categories, this study would support the generalized pulse-skipping hypothesis. Method: Twenty-four native Mandarin-speaking listeners were recruited to complete a dual-task experiment where they were required to identify or discriminate tone stimuli while concurrently memorizing six Chinese characters or graphic symbols. A no-load condition without a memory recall task was also included as a baseline condition. The position of categorical boundary, identification slope, between-/within-category discrimination, and discrimination peakedness were compared across the three conditions to measure the impact of CL on tone perception. The recall accuracy of Chinese characters and graphic symbols was used to assess the difficulty of memory recall. Results: Compared to the no-load condition, both load conditions showed a boundary shift to Tone 3, shallower identification slope, poorer between-category discrimination, and lower discrimination peakedness. Within-category discrimination was negatively affected by CL in the graphic symbol condition only, not in the Chinese character condition. Conclusions: CL degraded listeners’ sensitivity to subtle F0 changes and impaired CP of Mandarin lexical tones. This provides support for the generalized pulse-skipping hypothesis. Besides, the involvement of lexical information modulated the effect of CL.
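Boundary position and identification slope are typically estimated by fitting a logistic function to identification responses along the stimulus continuum. The sketch below illustrates that generic procedure on hypothetical proportions of "Tone 2" responses; the continuum steps and values are invented, not the study's data.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, boundary, slope):
    """Proportion of 'Tone 2' responses along the stimulus continuum."""
    return 1.0 / (1.0 + np.exp(-slope * (x - boundary)))

# Hypothetical identification data: 7-step F0 continuum, proportion of Tone 2 responses
steps = np.arange(1, 8)
p_tone2 = np.array([0.02, 0.05, 0.15, 0.55, 0.90, 0.97, 0.99])

(boundary, slope), _ = curve_fit(logistic, steps, p_tone2, p0=[4.0, 1.0])
print(f"categorical boundary at step {boundary:.2f}, identification slope {slope:.2f}")
# A boundary shift or a shallower identification function under cognitive load would
# appear as a different boundary estimate or a smaller slope when refitting this model.
```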
Conference Paper
Full-text available
Despite the continuous innovation in voice biomarkers domain for more than a decade and the apparent need for clinicians to have objective diagnostic tools, no device has yet been implemented in real clinical settings or widely adopted by clinicians. After giving a short overview of the literature, we argue that in addition to the factors usually mentioned in the literature (low performance, database sizes, transparency, etc.), an underestimated but crucial factor preventing the use of such systems is the therapeutic relationship. We also discuss the “objectivity” of such systems, and the place of diagnosis in clinical practice and its conceptual limitations. In order to shape useful and relevant voice biomarkers, we propose to estimate symptoms instead of diagnosis, and draw perspectives related to this paradigm, which will require databases annotated with patients’ symptoms rather than only their pathological status.
Article
The developmental trajectory of audiovisual speech perception in Mandarin‐speaking children remains understudied. This cross‐sectional study in Mandarin‐speaking 3‐ to 4‐year‐old, 5‐ to 6‐year‐old, 7‐ to 8‐year‐old children, and adults from Xiamen, China ( n = 87, 44 males) investigated this issue using the McGurk paradigm with three levels of auditory noise. For the identification of congruent stimuli, 3‐ to 4‐year‐olds underperformed older groups whose performances were comparable. For the perception of the incongruent stimuli, a developmental shift was observed as 3‐ to 4‐year‐olds made significantly more audio‐dominant but fewer audiovisual‐integrated responses to incongruent stimuli than older groups. With increasing auditory noise, the difference between children and adults widened in identifying congruent stimuli but narrowed in perceiving incongruent ones. The findings regarding noise effects agree with the statistically optimal hypothesis.
Chapter
In this chapter, we introduce some basics of spoken language processing (including both speech and natural language), which are fundamental to text-to-speech synthesis. Since speech and language are studied in the discipline of linguistics, we first overview some basic knowledge in linguistics and discuss a key concept called the speech chain that is closely related to TTS. Then, we introduce speech signal processing, which covers the topics of digital signal processing, speech processing in the time and frequency domain, cepstrum analysis, linear prediction analysis, and speech parameter estimation. Finally, we overview some typical speech processing tasks. Keywords: Spoken language processing; Linguistics; Speech chain; Speech signal processing
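Two of the topics listed, cepstrum analysis and linear prediction, can be illustrated compactly. The sketch below computes a real cepstrum and LPC coefficients (autocorrelation method with the Levinson-Durbin recursion) for a synthetic frame; it is a generic textbook-style example, not code from the chapter.

```python
import numpy as np

def real_cepstrum(frame: np.ndarray) -> np.ndarray:
    """Real cepstrum: inverse FFT of the log magnitude spectrum of a windowed frame."""
    spectrum = np.fft.rfft(frame * np.hamming(len(frame)))
    return np.fft.irfft(np.log(np.abs(spectrum) + 1e-12))

def lpc(frame: np.ndarray, order: int) -> np.ndarray:
    """LPC coefficients a[0..order] (a[0] = 1) via the Levinson-Durbin recursion."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:len(frame) + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                 # reflection coefficient
        a[1:i] += k * a[i - 1:0:-1]    # update previous coefficients
        a[i] = k
        err *= (1.0 - k * k)           # prediction error power
    return a

# Synthetic voiced-like frame: two damped sinusoids standing in for formant resonances
sr = 16000
t = np.arange(400) / sr
frame = np.exp(-40 * t) * (np.sin(2 * np.pi * 500 * t) + 0.5 * np.sin(2 * np.pi * 1500 * t))
print("first cepstral coefficients:", np.round(real_cepstrum(frame)[:4], 3))
print("LPC coefficients (order 8):", np.round(lpc(frame, 8), 3))
```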
Article
Full-text available
Improvisación en el acto ÁNIMA sonograma.org/2023/01/improvisación-en-el-acto-anima/
Article
Full-text available
Voice is a major means of communication for humans, non-human mammals and many other vertebrates like birds and anurans. The physical and physiological principles of voice production are described by two theories: the MyoElastic-AeroDynamic (MEAD) theory and the Source-Filter Theory (SFT). While MEAD employs a multiphysics approach to understand the motor control and dynamics of self-sustained vibration of vocal folds or analogous tissues, SFT predominantly uses acoustics to understand spectral changes of the source via linear propagation through the vocal tract. Because the two theories focus on different aspects of voice production, they are often applied distinctly in specific areas of science and engineering. Here, we argue that the MEAD and the SFT are linked integral aspects of a holistic theory of voice production, describing a dynamically coupled system. The aim of this manuscript is to provide a comprehensive review of both the MEAD and the source-filter theory with its nonlinear extension, the latter of which suggests a number of conceptual similarities to sound production in brass instruments. We discuss the application of both theories to voice production of humans as well as of animals. An appraisal of voice production in the light of non-linear dynamics supports the notion that voice production can best be described with a systems view, considering coupled systems rather than isolated contributions of individual sub-systems.
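The source-filter view lends itself to a very small synthesis sketch: an idealized glottal impulse train passed through second-order resonators standing in for formants. The Python fragment below is such a linear SFT caricature with invented parameter values; it deliberately omits the MEAD side (self-sustained vocal fold oscillation) and any nonlinear source-filter coupling.

```python
import numpy as np
from scipy.signal import lfilter

def formant_filter(freq_hz: float, bandwidth_hz: float, sr: int):
    """Second-order all-pole resonator approximating one formant."""
    r = np.exp(-np.pi * bandwidth_hz / sr)
    theta = 2 * np.pi * freq_hz / sr
    # H(z) = 1 / (1 - 2 r cos(theta) z^-1 + r^2 z^-2)
    return [1.0], [1.0, -2 * r * np.cos(theta), r ** 2]

sr, f0, dur = 16000, 120, 0.5
source = np.zeros(int(sr * dur))
source[::sr // f0] = 1.0                      # idealized glottal impulse train (linear SFT view)

signal = source
for freq, bw in [(700, 80), (1100, 90)]:      # rough /a/-like formants F1 and F2, invented values
    b, a = formant_filter(freq, bw, sr)
    signal = lfilter(b, a, signal)
signal /= np.abs(signal).max()                # normalized synthetic vowel-like waveform
```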
Chapter
Perceiving and understanding spoken language is something that most listeners take for granted, at least in favorable listening conditions. Yet, decades of research have demonstrated that speech is variable and ambiguous, meaning listeners must constantly engage in active hypothesis testing of what was said. Within this framework, even relatively minor challenges imposed on speech recognition must be understood as requiring the interaction of perceptual, cognitive, and linguistic factors. This chapter provides a systematic review of the various ways in which listening environments may be considered adverse, with a dual focus on the cognitive and neural systems that are thought to improve speech recognition in these challenging situations. Although a singular mechanism or construct cannot entirely explain how listeners cope with adversity in speech recognition, overcoming listening adversity is an attentionally guided process. Neurally, many adverse listening conditions appear to depend on higher-order (rather than primary) representations of speech in cortex, suggesting that more abstract linguistic knowledge and context become particularly important for comprehension when acoustic input is compromised. Additionally, the involvement of the cinguloopercular (CO) network, particularly the anterior insula, in a myriad of adverse listening situations may indicate that this network reflects a general indication of cognitive effort. In discussing the various challenges faced in the perception and understanding of speech, it is critically important to consider the interaction of the listener’s cognitive resources (knowledge and abilities) with the specific challenges imposed by the listening environment.
Chapter
Successful speech understanding relies not only on the auditory pathway, but on cognitive processes that act on incoming sensory information. One area in which the importance of cognitive factors is particularly striking during speech comprehension is when the acoustic signal is made more challenging, which might happen due to background noise, talker characteristics, or hearing loss. This chapter focuses on the interaction between hearing and cognition in hearing loss in older adults. The chapter begins with a review of common age-related changes in hearing and cognition, followed by summary evidence from pupillometric, behavioral, and neuroimaging paradigms that elucidate the interplay between hearing and cognition. Across a variety of experimental paradigms, there is compelling evidence that when listeners process acoustically challenging speech, additional cognitive effort is required compared to acoustically clear speech. This increase in cognitive effort is associated with specific brain networks, with the clearest evidence implicating cingulo-opercular and executive attention networks. Individual differences in hearing and cognitive ability thus determine the cognitive demand faced by a particular listener, and the cognitive and neural resources needed to aid in speech perception. Keywords: Listening effort; Background noise; Speech perception; Cognitive aging; Sentence comprehension; Neuroimaging; Cingulo-opercular network; Executive attention; Pupillometry; fMRI
Article
Drawing on data from well-known actors in popular films and TV shows, this reference guide surveys the representation of accent in North American film and TV over eight decades. It analyzes the speech of 180 film and television performances from the 1930s to today, looking at how that speech has changed; how it reflects the regional backgrounds, gender, and ethnic ancestry of the actors; and how phonetic variation and change in the 'real world' have been both portrayed in, and possibly influenced by, film and television speech. It also clearly explains the technical concepts necessary for understanding the phonetic analysis of accents. Providing new insights into the role of language in the expression of North American cultural identity, this is essential reading for researchers and advanced students in linguistics, film, television and media studies, and North American studies, as well as the larger community interested in film and television.
Article
Full-text available
The paper presents the results of a study on the audition of a Dutch tense vowel [e:] and the diphthong [ɛi] by Polish native speakers. It was hypothesized that Polish native speakers may pronounce the Dutch [e:] as a combination of [ɛ] and [j] because they generally fail to distinguish the [e:] from the diphthong [ɛi], as both sounds lie acoustically close to each other (especially those produced by speakers of Dutch from the Netherlands); moreover, they are absent from the Polish phonetic system. Instead, the experiment has shown that Polish native speakers are very good at differentiating the isolated Dutch [e:] and [ɛi]. It seems that the pronunciation of the Dutch [e:] as a combination of [ɛ] and [j] is rather a matter of articulation than audition, but further experiments are needed to examine the problem.
Article
To clarify the acoustic variables for predicting and classifying Japanese singleton and geminate consonants, raw and logarithmic durations of the consonants and their related segments were examined using 12 minimal pair words that were pronounced in a carrier sentence at various speaking rates by 20 native Japanese speakers. Regression and discriminant analyses revealed that the logarithmic durations were better at predicting and classifying Japanese singleton and geminate consonants than the raw durations used in many previous studies. Specifically, the best acoustic variables were the logarithmic duration of the consonant's closure or frication and the logarithmic average duration of the mora in the preceding carrier phrase. These results suggest that logarithmic durations are relational invariant acoustic variables that can cope with the durational variations of singleton and geminate consonants in a wide range of speaking rates.
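The comparison of raw versus logarithmic durations can be mimicked with a generic discriminant analysis on synthetic data, as in the sketch below. The duration values, rate factors, and noise levels are invented for illustration; the sketch shows the form of such an analysis, not the study's results.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 400

# Synthetic data: a per-token speaking-rate factor scales both the consonant closure
# duration and the average mora duration of the preceding phrase multiplicatively.
rate = rng.uniform(0.6, 1.8, size=2 * n)
mora = 120 * rate * rng.lognormal(0, 0.05, 2 * n)                     # ms, rate proxy
closure = np.where(np.arange(2 * n) < n, 65, 160) * rate * rng.lognormal(0, 0.08, 2 * n)
labels = (np.arange(2 * n) >= n).astype(int)                          # 0 = singleton, 1 = geminate

raw = np.column_stack([closure, mora])
logged = np.log(raw)                                                   # log durations
for name, X in [("raw durations", raw), ("log durations", logged)]:
    acc = cross_val_score(LinearDiscriminantAnalysis(), X, labels, cv=5).mean()
    print(f"{name}: mean CV accuracy {acc:.3f}")
```

In log space the difference between the two features removes the multiplicative rate factor, which is the intuition behind using logarithmic durations as relationally invariant predictors; the toy numbers here are only meant to show the mechanics.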
Chapter
In recent years, attention has been focused on information presentation methods that take into account the user’s situation by utilizing wearable computing technology. Most of the existing information presentation methods present users with information that gives them a choice of actions, and encourage them to take action. However, users may not be able to control their own actions appropriately. It is important not only to encourage users to act, but also to forcibly control their behavior. Such behavioral control methods use actuators (e.g., displays, speakers) to stimulate senses such as vision and hearing, and then control behavior by reversing the reaction to maintain consistency with the usual senses. However, most of these studies have not yet examined how to determine the intensity of the stimuli when introducing behavioral control methods into the real world. To solve this problem, we focused on the influence of the presence of others on human behavior control. We propose a method to determine the amount of stimuli to be fed back to the user based on the results of simulating the sensations of others. We define this method as Pseudo Human Sense in the Loop (referred to as “PHSIL” in this paper), and conduct cognitive psychology experiments on presentations, applying PHSIL to both auditory and visual stimuli, to verify the effectiveness of PHSIL.