Article

The Speech Chain: The Physics and Biology of Spoken Language

Authors: Peter B. Denes and Elliot N. Pinson

... The human speech chain [33] is an essential mechanism for communication. We communicate by expressing our thoughts and listening to others. ...
... Human speech chain [33] and the corresponding machine speech chain [5]. Source: Adapted from [33]. ...
Article
The phenomenon where a speaker mixes two or more languages within the same conversation is called code-switching (CS). Handling CS is challenging for automatic speech recognition (ASR) and text-to-speech (TTS) because it requires coping with multilingual input. Although CS text or speech may be found in social media, the datasets of CS speech and corresponding CS transcriptions are hard to obtain even though they are required for supervised training. This work adopts a deep learning-based machine speech chain to train CS ASR and CS TTS with each other with semisupervised learning. After supervised learning with monolingual data, the machine speech chain is then carried out with unsupervised learning of either the CS text or speech. The results show that the machine speech chain trains ASR and TTS together and improves performance without requiring the pair of CS speech and corresponding CS text. We also integrate language embedding and language identification into the CS machine speech chain in order to handle CS better by giving language information. We demonstrate that our proposed approach can improve the performance on both a single CS language pair and multiple CS language pairs, including the unknown CS excluded from training data.
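As a rough illustration of the training loop this abstract describes, the following Python fragment sketches one unsupervised speech-chain cycle with stand-in ASR/TTS callables. All names and the toy string-based "models" are hypothetical, not the authors' implementation.

# Minimal sketch of the machine speech chain cycle: ASR and TTS generate
# pseudo-pairs for each other from unpaired code-switching text or speech.
# Supervised pre-training on paired monolingual data is assumed to have
# already happened and is not shown.

from typing import Callable, List, Tuple

def speech_chain_step(asr: Callable[[str], str],
                      tts: Callable[[str], str],
                      unpaired_text: List[str],
                      unpaired_speech: List[str]) -> List[Tuple[str, str]]:
    """One unsupervised cycle: TTS->ASR for text-only data, ASR->TTS for speech-only data."""
    pseudo_pairs = []
    for text in unpaired_text:              # text-only CS data
        synth = tts(text)                   # synthesize speech from the text
        pseudo_pairs.append((synth, text))  # would drive an ASR update
    for speech in unpaired_speech:          # speech-only CS data
        hyp = asr(speech)                   # transcribe the speech
        pseudo_pairs.append((speech, hyp))  # would drive a TTS update
    return pseudo_pairs

# Toy stand-ins so the sketch runs end to end.
asr_stub = lambda speech: speech.replace("wav:", "txt:")
tts_stub = lambda text: text.replace("txt:", "wav:")

pairs = speech_chain_step(asr_stub, tts_stub,
                          unpaired_text=["txt:hello dunia"],
                          unpaired_speech=["wav:selamat morning"])
print(pairs)

In the actual system each pseudo-pair would feed a gradient update of the receiving model, which is how ASR and TTS improve each other without paired code-switching data.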
... show that the addition of visual cues from the speaker's face not only facilitates communication, but may influence the auditory perception of speech and, in some cases, may even contribute to language evolution and change. Denes and Pinson's (1993) Speech Chain depicting the progression of a speech message from the brain of the speaker to the brain of the listener through the sound waves generated by the speaker's vocal movements. ...
... List of items removed from the full version of the thesis for copyright reasons (illustrations, figures, images): Denes and Pinson's (1993) Speech Chain depicting the progression of a speech message from the brain of the speaker to the brain of the listener through the sound waves generated by the speaker's vocal movements. The geographical distribution of rhoticity based on data from the Survey of English Dialects from the 1950s (left) (Orton & Dieth, 1962) and the English Dialects App from 2016 (right) (from Leemann et al., 2018, p. 12). ...
... Pinson's (1993) Speech Chain of language processing. ... oriented sound change scenarios according to Ohala (1981), including hypocorrection and hypercorrection. ...
Thesis
Full-text available
Articulatory variation is well-documented in post-alveolar approximant realisations of /r/ in rhotic Englishes, which present a diverse array of tongue configurations. However, the production of /r/ remains enigmatic, especially concerning non-rhotic Englishes and the accompanying labial gesture, both of which tend to be overlooked in the literature. This thesis attempts to account for them both, in which we consider the production and perception of /r/ in the non-rhotic variety of English spoken in England, ‘Anglo-English’. This variety is of particular interest because non-lingual labiodental articulations of /r/ are rapidly gaining currency, which may be due to the visual prominence of the lips, although a detailed phonetic description of this change in progress has yet to be undertaken. Three production and perception experiments were conducted to investigate the role of the lips in Anglo-English /r/. The results indicate that the presence of labiodental /r/ has caused auditory ambiguity with /w/ in Anglo-English. In order to maintain a perceptual contrast between /r/ and /w/, it is argued that Anglo-English speakers use their lips to enhance the perceptual saliency of /r/ in both the auditory and visual domains. The results indicate that visual cues of the speaker's lips are more prominent than the auditory ones and that these visual cues dominate the perception of the contrast when the auditory and visual cues are mismatched. The results have theoretical implications for the nature of speech perception in general, as well as for the role of visual speech cues in diachronic sound change.
... The relation between production and acoustics has been explored extensively in previous research, while the relation between perception and acoustics is relatively less understood. As production and perception are processed by different physical systems of the human body [1], the knowledge involved in controlling acoustics through each of them should differ considerably. For example, in tone production, it was found that human articulators limit the maximum speed of pitch change [2]; in tone perception, the same tone contour can be perceived as a reversed target if the context is different [3]. ...
... In addition to the approximated tone contours for individual syllables, the inter-syllable transition was also approximated with a linear movement. In Test 3, the practicality of the approximations was evaluated with a few long ... (The Cantonese syllables are transcribed with Jyut Ping symbols.) ...
... The simple act of speaking requires a great deal of improvisation because the mind goes to its own thought and creates its impromptu delivery in words, sounds, and gestures, forming unpredictable statements that further fuel the thought process (the interpreter as well as the listener), creating an enriched process that is no different from instant composition with a given set or repertoire of elements (Denes & Pinson, 1966). When improvisation is intended to solve a problem temporarily, and the "proper" solution is not available at the time, it can be known as an "interim", a notion that also applies in the field of engineering (Ludovice, Lefton & Catrambone, 2010). ...
Preprint
Full-text available
ÁNIMA is a performative, transdisciplinary creation and expresses itself through contact improvisation, cooperating with musical improvisation. Established in contemporary dance, music, photography, and sculptures made of fabric and sponges, it is inspired by specific parts of the body and was raised during the period of the pandemic by six international women, artists from the fields of illustration, production, sculpture, music, and dance. Throughout the process of investigating, sharing, and deciding the concept of the performance, they got the opportunity of the first performance of ÁNIMA at Sporting Club in Russafa, Valencia. This is a powerful, pure, feminine, transparent, honest, and creative proposal. Because it has space to develop in every act, it comes with a different presentation each time; that's why it has renewable energy. This strength is based on the study of dance movements, gestures, voice effects, different colors of musical instruments, and dialogue with the sculptures. The performance is very grounded, but it also has the capacity for improvisation moments between performers. With this creation, which is based on various expressions of contemporary art, we want to show that full concentration is a great skill that we have to practice in different ways in our lives. To evolve the spirit, if we surround ourselves with merits in various fields, physically connect with nature, educate our brain, open our perspective to novelties, and act for good, all changes will flow easily. We can transform and be reborn at every moment. Without fear of the new, with full consciousness.
... Below in Figure 1 is a modified version of the production half of Denes and Pinson's (1973) speech chain. The figure portrays a closed loop between intention and the feedback that talkers hear of their own speech. ...
... The Speech Chain (Denes and Pinson, 1973). ...
Article
Full-text available
Sensory information, including auditory feedback, is used by talkers to maintain fluent speech articulation. Current models of speech motor control posit that speakers continually adjust their motor commands based on discrepancies between the sensory predictions made by a forward model and the sensory consequences of their speech movements. Here, in two within-subject design experiments, we used a real-time formant manipulation system to explore how reliant speech articulation is on the accuracy or predictability of auditory feedback information. This involved introducing random formant perturbations during vowel production that varied systematically in their spatial location in formant space (Experiment 1) and temporal consistency (Experiment 2). Our results indicate that, on average, speakers’ responses to auditory feedback manipulations varied based on the relevance and degree of the error that was introduced in the various feedback conditions. In Experiment 1, speakers’ average production was not reliably influenced by random perturbations that were introduced every utterance to the first (F1) and second (F2) formants in various locations of formant space that had an overall average of 0 Hz. However, when perturbations were applied that had a mean of +100 Hz in F1 and −125 Hz in F2, speakers demonstrated reliable compensatory responses that reflected the average magnitude of the applied perturbations. In Experiment 2, speakers did not significantly compensate for perturbations of varying magnitudes that were held constant for one and three trials at a time. Speakers’ average productions did, however, significantly deviate from a control condition when perturbations were held constant for six trials. Within the context of these conditions, our findings provide evidence that the control of speech movements is, at least in part, dependent upon the reliability and stability of the sensory information that it receives over time.
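A small numerical sketch of the two perturbation regimes described above: zero-mean random F1/F2 shifts versus shifts whose average is +100 Hz in F1 and -125 Hz in F2. The ±150 Hz spread and the trial count are invented for illustration, not taken from the study.

# Illustrative generation of per-trial formant perturbations (not the authors' code).
import numpy as np

rng = np.random.default_rng(0)
n_trials = 100

# Zero-mean condition: random F1/F2 offsets that average to roughly 0 Hz.
zero_mean = rng.uniform(low=[-150.0, -150.0], high=[150.0, 150.0], size=(n_trials, 2))

# Biased condition: same spread, but centred on (+100 Hz F1, -125 Hz F2).
biased = zero_mean + np.array([100.0, -125.0])

print("zero-mean condition average (F1, F2):", zero_mean.mean(axis=0).round(1))
print("biased condition average (F1, F2):  ", biased.mean(axis=0).round(1))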
... Humans maintain their speech quality in various situations by simultaneously listening to their speech, a mechanism that is also known as the speech chain [1]. The auditory feedback produced from the self-evaluation inside this ...
... Construction of incremental TTS (ITTS) in a machine speech chain framework, shown in Fig. 2(b). It incrementally synthesizes the speech by progressively taking ... (The initial part of this work was presented in [24]; the previous work only focused on non-incremental Lombard TTS in static noises.) ...
Article
Full-text available
Recent end-to-end text-to-speech synthesis (TTS) systems have successfully synthesized high-quality speech. However, TTS speech intelligibility degrades in noisy environments because most of these systems were not designed to handle noisy environments. Several works attempted to address this problem by using offline fine-tuning to adapt their TTS to noisy conditions. Unlike machines, humans never perform offline fine-tuning. Instead, they speak with the Lombard effect in noisy places, where they dynamically adjust their vocal effort to improve the audibility of their speech. This ability is supported by the speech chain mechanism, which involves auditory feedback passing from speech perception to speech production. This paper proposes an alternative approach to TTS in noisy environments that is closer to the human Lombard effect. Specifically, we implement Lombard TTS in a machine speech chain framework to synthesize speech with dynamic adaptation. Our TTS performs adaptation by generating speech utterances based on the auditory feedback that consists of the automatic speech recognition (ASR) loss as the speech intelligibility measure and the speech-to-noise ratio (SNR) prediction as power measurement. Two versions of TTS are investigated: non-incremental TTS with utterance-level feedback and incremental TTS (ITTS) with short-term feedback to reduce the delay without significant performance loss. Furthermore, we evaluate the TTS systems in both static and dynamic noise conditions. Our experimental results show that auditory feedback enhanced the TTS speech intelligibility in noise.
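The feedback loop described here can be caricatured in a few lines: a toy TTS raises its "vocal effort" until an ASR-loss proxy derived from an SNR estimate falls below a threshold. The stub functions, thresholds, and parameter names are illustrative assumptions, not the paper's architecture.

# Schematic sketch of Lombard-style adaptation driven by auditory feedback.
def synthesize(text: str, effort: float) -> dict:
    return {"text": text, "power": 1.0 + effort}           # stub TTS output

def asr_loss(utt: dict, noise_power: float) -> float:
    snr = utt["power"] / noise_power                        # crude SNR estimate
    return max(0.0, 1.0 - 0.5 * snr)                        # stub intelligibility loss

def lombard_tts(text: str, noise_power: float, steps: int = 20) -> float:
    effort = 0.0
    for _ in range(steps):
        utt = synthesize(text, effort)
        loss = asr_loss(utt, noise_power)                   # auditory feedback
        if loss < 0.1:                                      # intelligible enough: stop adapting
            break
        effort += 0.5                                       # otherwise speak with more effort
    return effort

print("effort in quiet:", lombard_tts("hello", noise_power=0.5))
print("effort in noise:", lombard_tts("hello", noise_power=4.0))

The incremental variant in the paper applies the same idea on short speech segments rather than whole utterances, trading some feedback accuracy for lower delay.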
... The simple act of speaking requires a great deal of improvisation because the mind turns to its own thought and creates its unrehearsed expression in words, sounds, and gestures, forming unpredictable statements that further feed the thought process (the performer as listener), creating an enriched process that is no different from instant composition with a given set or repertoire of elements (Denes & Pinson, 2015). ...
Conference Paper
Full-text available
The Art of Musical Improvisation in an Interdisciplinary Approach is a reflective piece of writing about my research and production in the doctoral program. It is centered on musical improvisation and explores it in a composition of audiovisual material and contemporary dance. It explains the process of the artistic production, which is inspired by jazz harmony through the essences of Celtic, flamenco, and tango folk music, and presents these colors with audiovisuals projected on a large background screen. This production is presented as a live viola performance with a modern dancer who improvises in the moment and creates her own choreography. It is an artistic exploration that aims to make moments of improvisation more enriching as an expressive channel and to reflect them in different modes of the performing arts. It shows that a musician/dancer/artist has the power to creatively express their life journey, demonstrating that we all have a story; what matters is how these stories are turned into art. Art has significant potential to make us think, become aware, and transform ourselves, and when this becomes a form of improvisation, it is even more magical to see that the human being is an endless world of substance. Keywords: musical improvisation, audiovisual, contemporary dance, production, research.
... We consider the act of speaking to be simple, but it also requires a great deal of improvisation because the mind turns to its own thought and creates its unrehearsed expression in words, sounds, and gestures, forming unpredictable statements that further feed the thought process (the performer as listener), creating an enriched process that is no different from an instant texture with a given set or theme of elements (Denes & Pinson, 2015). ...
Article
Full-text available
The Art of Musical Improvisation with an Interdisciplinary Approach is a reflective piece of writing about my research and production within the framework of the doctoral program. This project is centered on musical improvisation on piano (recorded music) and viola (live music) combined with audiovisual composition (my own material and material by others), including contemporary dance in the videos. The article briefly explains the importance of improvisation and the methods applied in the performing arts, and shares the process of the artistic production, in which I was inspired by jazz harmony through the essences of folk music: Celtic, flamenco, and tango. These harmonies were linked with audiovisuals on the large screen.
... One might suspect that Korean speakers' pronunciation errors are not perceived as errors by native Japanese speakers because the errors are identified with acoustical features, not perceptually, in this study. Although several studies (e.g., Baese-Berk, 2019; Flege & Bohn, 2021) argued for a weak or no correlation between speech production and perception, other studies (e.g., Amano & Hirata, 2010; Amano & Hirata, 2015; Denes & Pinson, 1993) claimed that speech production and perception are closely related and that the production and perceptual boundaries of phonemes are expected to coincide. In line with this expectation, the coincidence of the production and perceptual boundaries has indeed been confirmed by experimental studies. ...
... If speech production and perception have a close relationship (e.g., Amano & Hirata, 2010; Amano & Hirata, 2015; Denes & Pinson, 1993), the perception of /s/ and /ts/ may have similar characteristics to their production observed in this study. Namely, Korean speakers might misperceive /ts/ as /s/ more frequently than /s/ as /ts/ as a consequence of a perceptual boundary shift to the origin. ...
... During conversation, speakers and listeners exchange information, sending signals from one brain to another via the medium of speech. Consistent with the idea of the speech chain (Denes & Pinson, 1993), the speaker's brain activity should lead the listener's due to transmission delays from the speaker to the listener, as well as other physical limitations that mediate speech communication. Among these, transmission delays of ~5 ms from the speaker to the listener are expected given the speed of sound for conversations over 1-2 m of distance. ...
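For reference, the ~5 ms figure follows directly from the speed of sound (about 343 m/s) over a conversational distance of roughly 1.7 m:

\[ t = d/c \approx 1.7\,\mathrm{m} / 343\,\mathrm{m\,s^{-1}} \approx 4.96\,\mathrm{ms} \approx 5\,\mathrm{ms} \]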
... After speech production, though, neural responses to the self-generated sensory consequences of speaking are attenuated for approximately 200 ms (Toyomura, Miyashiro, Kuriki, & Sowman, 2020;Wang et al., 2014). Taken together, these results of maximal speech tracking after auditory presentation during perception and before vocalisation during speech production support recent fMRI findings indicating alignment of the speaker's articulatory system and the listener's auditory system (Liu et al., 2020); however, by providing EEG estimates of the temporal dynamics of brain signals linked to the physical acoustic characteristics of the speech signal, we confirm that the timing of neural responses is in line with accounts of communication based on the idea of a speech chain linking the brains of speakers and listeners (Denes & Pinson, 1993). ...
Article
This study investigates the dynamics of speech envelope tracking during speech production, listening and self listening. We use a paradigm in which participants listen to natural speech (Listening), produce natural speech (Speech Production), and listen to the playback of their own speech (Self-Listening), all while their neural activity is recorded with EEG. After time-locking EEG data collection and auditory recording and playback, we used a Gaussian copula mutual information measure to estimate the relationship between information content in the EEG and auditory signals. In the 2–10 Hz frequency range, we identified different latencies for maximal speech envelope tracking during speech production and speech perception. Maximal speech tracking takes place approximately 110 ms after auditory presentation during perception and 25 ms before vocalisation during speech production. These results describe a specific timeline for speech tracking in speakers and listeners in line with the idea of a speech chain and hence, delays in communication.
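A compact sketch of the kind of analysis described above: a Gaussian-copula mutual information estimate between a synthetic speech envelope and a synthetic EEG channel across candidate lags. The sampling rate, noise level, and the 110 ms toy delay are assumptions chosen to mirror the reported latency; this is not the authors' data or pipeline.

# Gaussian-copula MI between a toy speech envelope and a toy EEG signal at several lags.
import numpy as np
from scipy.stats import rankdata, norm

def copnorm(x: np.ndarray) -> np.ndarray:
    """Rank-transform to uniform, then map to standard normal (Gaussian copula)."""
    return norm.ppf(rankdata(x) / (len(x) + 1.0))

def gcmi_1d(x: np.ndarray, y: np.ndarray) -> float:
    """Gaussian-copula mutual information (in bits) between two 1-D signals."""
    cx, cy = copnorm(x), copnorm(y)
    r = np.corrcoef(cx, cy)[0, 1]
    return -0.5 * np.log2(1.0 - r ** 2)

fs = 100                                                     # Hz, toy sampling rate
t = np.arange(0, 60, 1 / fs)
rng = np.random.default_rng(1)
envelope = np.abs(rng.standard_normal(t.size)).cumsum() % 1.0   # toy speech envelope
eeg = np.roll(envelope, 11) + 0.5 * rng.standard_normal(t.size) # envelope delayed ~110 ms + noise

lags_ms = np.arange(-50, 201, 10)
mi = [gcmi_1d(np.roll(envelope, int(l * fs / 1000)), eeg) for l in lags_ms]
print("peak MI at lag (ms):", lags_ms[int(np.argmax(mi))])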
... The second type of auditory parameter is based on the instrumental (computer) analysis of the auditory nerve activity during speech perception (Delgutte 1999). The auditory nerve is a part of the internal auditory system which is connected to the cochlea, and it contains about 30,000 neurons (Denes – Pinson 1963: 92-107, Brosnahan – Malmberg 1970: 160-170, Clark – Yallop 1995: 303-305, Pavlík 2003). There are at least four main auditory (auditory-nerve) parameters that can be identified: ...
... It may be caused by the depletion of neurotransmitter at the synapses between hair cells and ANFs in the cochlea (cf. Denes – Pinson 1963: 92-107, Smith 1979). Adaptation plays several roles in the processing of speech in the auditory nerve (Delgutte 1999: 512): ...
Chapter
Full-text available
The paper is concerned with the approaches currently used in the description and analysis of speech continuum and speech variation. The main goal of the paper is to offer a brief overview of parametric, segmental, and gestural approaches used in phonetics and phonology, and, at the same time, to show how these three approaches are interrelated and interdependent.
... It is cost-effective and widely accessible, as it is integrated into all smartphones, enabling passive voice recording within natural living environments of patients. Furthermore, due to the intricate interplay of numerous neuromotor (Denes and Pinson, 1963) and neurolinguistic (Kröger et al., 2020) processes involved in speech production, it is sensitive to a wide spectrum of pathologies (Fagherazzi et al., 2021). These advantages have garnered significant attention from the voice and speech processing community, especially in the context of using voice recordings to detect psychiatric disorders. ...
Conference Paper
Full-text available
Voice biomarkers hold the promise of improving access to care and therapeutic follow-up for people with psychiatric disorders, tackling the issues raised by their high prevalence and the significant diagnostic delays and difficulties in patient follow-up. Yet, despite many years of successful research in the field, none of these voice biomarkers are implemented in clinical practice. Beyond the reductive explanation of the lack of explainability of the involved machine learning systems, we look for arguments in the epistemology and sociology of psychiatry. We show that the estimation of diagnoses, the major task in the literature, is of little interest to both clinicians and patients. After tackling the common misbeliefs about diagnosis in psychiatry in a didactic way, we propose a paradigm shift towards the estimation of clinical symptoms and signs, which not only addresses the limitations raised against diagnosis estimation but also enables the formulation of new machine learning tasks. We hope that this paradigm shift will empower the use of vocal biomarkers in clinical practice. It is however conditional on a change in database labeling practices, but also on a profound change in the speech processing community's practices towards psychiatry.
... These two components need to work in tandem to establish "signal parity" between the produced and perceived representations for successful bidirectional communication (Liberman and Mattingly, 1989; Massaro, 2014). The entire system of speech production and perception forms a dynamic process with two cooperating sides, constructing a stable and effective "speech chain" for effective verbal communication (Denes and Pinson, 1963). ...
Article
Full-text available
The study of spoken communication has long been entrenched in a debate surrounding the interdependence of speech production and perception. This mini review summarizes findings from prior studies to elucidate the reciprocal relationships between speech production and perception. We also discuss key theoretical perspectives relevant to speech perception-production loop, including hyper-articulation and hypo-articulation (H&H) theory, speech motor theory, direct realism theory, articulatory phonology, the Directions into Velocities of Articulators (DIVA) and Gradient Order DIVA (GODIVA) models, and predictive coding. Building on prior findings, we propose a revised auditory-motor integration model of speech and provide insights for future research in speech perception and production, focusing on the effects of impaired peripheral auditory systems.
... Pronunciation instruction with an emphasis on comprehensibility can contribute to EFL/ESL students being understood when using a second language (Levis, 2018). We define the intelligibility of a speaker or of a speech utterance in the classical, rather narrow, sense as the degree to which a listener is able to recognize the linguistic units (e.g., morphemes, words) in the stream of sounds and to establish the order in which they were spoken (e.g., Denes & Pinson, 1963; Smith & Rafiqzad, 1979; Gooskens & van Heuven, 2021). When a sufficient number of words are recognized in the correct order, the listener will be able to reconstruct the speaker's meaning and intention. ...
Article
Full-text available
This study examines the effect of native vs. non-native prosody instruction on developing interpreter trainees' speech comprehensibility in English as a foreign language (EFL) using a pretest-posttest-delayed posttest design. Twenty-three groups of 28 interpreter trainees at a University in Iran (six different branches) took part in the study, all groups receiving the same amount of instruction (9 hours over 3 weeks). Three control groups listened to/viewed authentic audio recordings and movies in English, discussed their contents, and completed a variety of speaking tasks but received no specific prosody instruction. Twenty experimental groups spent part of the instruction time on theoretical explanation of, and practical exercises with, English prosody by thirteen nonnative instructors, and seven native instructors. Three experts evaluated the comprehensibility of the trainees in elicited speech samples collected during the pretest, immediate posttest and delayed posttest, and subsequently presented in random order. The findings revealed that the experimental groups gained between 1 and 2 points on the 0 to 10 comprehensibility scale, and lost little in the delayed posttest; however, hardly any changes were observed in the control groups. We conclude that native and non-native English instructors' prosody teaching were equally effective in enhancing EFL students' speech comprehensibility.
... Speech communication is the result of the coordination of over a hundred different muscles and neurobiological processes [8]. Acoustic measurement of speech can be used to observe the impacts of abnormalities on these neurobiological processes. ...
Article
Full-text available
Speech is a promising biomarker for schizophrenia spectrum disorder (SSD) and major depressive disorder (MDD). This proof of principle study investigates previously studied speech acoustics in combination with a novel application of voice pathology features as objective and reproducible classifiers for depression, schizophrenia, and healthy controls (HC). Speech and voice features for classification were calculated from recordings of picture descriptions from 240 speech samples (20 participants with SSD, 20 with MDD, and 20 HC each with 4 samples). Binary classification support vector machine (SVM) models classified the disorder groups and HC. For each feature, the permutation feature importance was calculated, and the top 25% most important features were used to compare differences between the disorder groups and HC including correlations between the important features and symptom severity scores. Multiple kernels for SVM were tested and the pairwise models with the best performing kernel (3-degree polynomial) were highly accurate for each classification: 0.947 for HC vs. SSD, 0.920 for HC vs. MDD, and 0.932 for SSD vs. MDD. The relatively most important features were measures of articulation coordination, number of pauses per minute, and speech variability. There were moderate correlations between important features and positive symptoms for SSD. The important features suggest that speech characteristics relating to psychomotor slowing, alogia, and flat affect differ between HC, SSD, and MDD.
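A minimal sketch of the analysis style reported above, using synthetic data: a polynomial-kernel SVM classifier on a handful of invented speech features, followed by permutation feature importance. The feature names, group means, and sample sizes are illustrative assumptions, not the study's data.

# Binary SVM classification of toy "speech features" plus permutation importance.
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
n_per_group = 80
# Invented features: pauses per minute, articulation coordination, F0 variability.
hc  = rng.normal([10.0, 0.8, 1.0], 0.3, size=(n_per_group, 3))   # "healthy controls"
mdd = rng.normal([14.0, 0.6, 0.7], 0.3, size=(n_per_group, 3))   # "MDD" group
X = np.vstack([hc, mdd])
y = np.array([0] * n_per_group + [1] * n_per_group)

clf = make_pipeline(StandardScaler(), SVC(kernel="poly", degree=3))
print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean().round(3))

clf.fit(X, y)
imp = permutation_importance(clf, X, y, n_repeats=20, random_state=0)
for name, score in zip(["pauses/min", "articulation coord.", "F0 variability"],
                       imp.importances_mean.round(3)):
    print(name, score)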
... In communication, speakers and listeners are the two sides of the information sending and receiving coin (Denes & Pinson, 1993), and production and comprehension involve two opposite pathways for processing inputs and outputs. Speakers produce sentences during which conceptual ideas are transferred into serial sounds, while listeners comprehend sentences during which linguistic representations are converted into meaningful information. ...
Thesis
Full-text available
Full text available here: https://theses.lib.polyu.edu.hk/handle/200/12295 Children tend to experience an asymmetry in production and comprehension as their language develops. However, it remains unclear whether older adults exhibit such production/comprehension asymmetry (PCA) in a retrogenic manner. The overarching target of the current study is to examine whether PCA exists in the syntactic and semantic abilities of older adults. As language ability is correlated with various cognitive abilities, we further probe whether PCA is associated with the declarative and procedural memory, since semantic and syntactic abilities are considered to be subserved by declarative and procedural memory, based on the declarative/procedural model. The declearn task was used and the serial reaction time task was administered in order to measure declarative memory and procedural memory, respectively. It was found that both declarative memory and procedural memory deteriorated in older adults. Our results also demonstrate that the erasure of items in declarative memory follows the retrogenesis theory. As adults age, they tend to remember more real objects and forget more made-up objects; this pattern is the reverse of that in childhood. Using production tasks and comprehension tasks, we systematically investigated the syntactic and semantic processing patterns in the Chinese population, especially in older Chinese people. The results indicated that although older adults were able to express relevant information with intact syntactic and semantic complexity, they required a greater amount of planning time to initiate sentence production than younger adults. It was determined that older adults had a significantly lower level of semantic comprehension compared to younger adults because of their relatively low accuracy rate, as well as the absence of the N400 effect. On the other hand, it was shown that older adults were capable of reaching a similar accuracy rate as younger adults when judging syntactic correctness. Despite this, older adults failed to exhibit the anterior negative effect that could be observed in the neural potential of younger individuals. Thus, there are behavioral and neural differences in the receptive syntactic abilities of older adults. According to our findings, the semantic performance of older adults fluctuated in terms of receptive modality and expressive modality, resulting in a semantic PCA similar to that of children. The findings are in accordance with the retrogenesis theory, which posits that the decline of language reverses the trajectory of its development. Additionally, we found that the asymmetry between semantic production and comprehension emerged after both behavioral and neural declines in semantic ability. It is also important to note that the asymmetry may be hidden at the behavioral level and only visible at the level of neural activity. Furthermore, we found that declarative/procedural memory was associated with semantic/syntactic performance in adults of all ages. In addition, the original declarative/procedural model has been extended in the current study, as the two memory systems tend to be unequally linked to language abilities in the expressive modality and receptive modality. It is likely that the lifelong unequal associations contribute to PCA. The findings of our study provide a comprehensive picture of age-related memory deficits and language attrition, as well as behavioral and neural mechanisms responsible for these declines. 
The results of our research on retrogenic production and comprehension asymmetries indicate that we have a responsibility to treat the elderly with the same care and attention offered to our children.
... This raises important new questions about the cognitive mechanisms brought into play in the two speakers when orchestrating a conversational interaction. Traditional models of spoken communication (e.g., Denes & Pinson, 1963) rested on a well-established division of roles between the two interactants: when one speaks, the other listens. Numerous studies on the dynamics of conversation have since shown the strong limitations of this approach. ...
Article
Full-text available
The dynamic deployment of talk-in-interaction has been studied mainly from the perspective of collaboration and/or convergence. In both linguistics and psycholinguistics, the authors have mainly tried to show that due to a strong predictability (psycholinguistics) or projection/projectability (Conversational Analysis, Interactional Linguistics) of the utterances, achieved in particular on the linguistic level thanks to projection cues allowing to anticipate what is going to happen, the conversation is a joint activity in which the search for convergence (alignment) proves to be central. This search for convergence would result in an almost continuous collaboration during the interaction. Various phenomena (feedback items, collaborative statements, among others) tend to support this conception according to which the conversation is first and foremost collaboration and the explicit manifestation of this collaboration to his/her partner. The partners would therefore constantly be aligning themselves in order to achieve an optimal mutual understanding. While it is true that the collaboration of participants greatly facilitates conversation, or even that conversation is 'so easy' because we are constantly aligning onto each other, through the most automatic processes possible (Garrod & Pickering, 2004), this collaborative vision of conversation needs to be nuanced. Indeed, the analysis of various excerpts from conversational corpora shows that this notion of collaboration does not make it possible to account for the richness of the conversation, which is linked to its intrinsic dynamism. Thus, we can sometimes observe a superficial collaboration or even an absence of collaboration. Thus, while an interaction calls upon a large repertoire of practices (repetition, statement completion, thematic transition, feedback, humor, etc.), we wish to show that the collaboration of the participants is only one of the possible ways of exploiting this large repertoire, and that it coexists with other, less collaborative, or even non-collaborative forms, on which this article will focus. Through the study of non-expected (non-predictable) utterances, characterized as disaligned and/or disaffiliated in some of our work (Bertrand & Priego-Valverde, 2017, Priego-Valverde, 2021), we wish to emphasize the impact they can have on interactional dynamics. Thus, some of these utterances may simply be refused or ignored because taking them into account would lead to a change in the interactional trajectory initiated by one of the participants. On the other hand, other utterances, although unexpected, may be accepted and integrated into the ongoing conversation. It is therefore crucial to uncover the constraints to which the participants must adhere (“what to do”) turn by turn to allow the “successful” continuation of the interaction. We therefore also wish to show that conversations are not necessarily doomed to failure if collaboration in the strict sense is not total. Thus, these unexpected utterances would be the occasion to introduce new possibilities, allowing us to think that conversation would be made of both these already present (codified) forms/patterns and new emerging 'possibilities', in line with the work of Grammar in Interaction (Ochs et al., 1996; Auer, 2005).We propose to 'track' the emergence of these 'new possibilities': when do they appear? What is their nature? What are the consequences for the interaction in progress? 
To this end, we analyze five extracts from conversational corpora through the prism of the interactional trajectories that the participants take. Thus, the first example we will analyze perfectly illustrates this notion of collaborative activity accepted in many works on dialogue and inter-individual interactions (Sacks et al., 1974; Clark, 1996, Sidnell & Stivers, 2013; Couper-Kuhlen & Selting, 2018) to qualify the conversation. While we do not contest this notion of collaboration which is inherent to interactions, we aim to show that underneath this relatively consensual notion, 'lurk' diverse practices and processes, which ultimately make it a relatively fuzzy notion. In support of the other examples, we will show in this article that concepts such as progressivity, interactional trajectories, alignment or affiliation, allow us to better understand what we consider as a successful interactional achievement, which goes through different moments during which the participants can be more or less collaborative. The detailed analysis of these different examples will allow us to show that the dynamism of interactions is a crucial entry point for revealing their functioning in all its complexity. Furthermore, it encourages us to reflect on the paradigms that can be used to study the impact of these unexpected forms on conversational trajectories in an experimental framework.
... To address the issue of the origin of negative attitudes toward stuttering, it is instructive to consider the acts of normal speaking and listening in what speech scientists (Denes & Pinson, 1963) many years ago termed the "speech chain". In any natural language, detailed rules govern the expected phonological, syntactic, semantic, and pragmatic nature of what is spoken both for the speaker and for the listener. ...
... Attempts to identify biological determinants of language acquisition in »homo sapiens« have rested primarily on studies of critical periods for acquisition (Lenneberg [1967]) and investigations of aphasia where explicit language deficits are related to specific brain injury (Geschwind [1972]). Investigation of the peripheral mechanisms involved in the processes of language comprehension and production, i.e. the functioning of the ear and the vocal apparatus, demonstrates how the forms of language fit perceptual and motor skills which indisputably are governed by important genetic factors (Denes & Pinson [1973]; Liberman et al. [1967]). It has even been possible to demonstrate that some of these peripheral mechanisms are shared with other species (Eimas [1975]). ...
Article
Language acquisition research experienced a boom following the Chomskyian revolution. The focus of attention centred primarily on the English child's acquisition of syntax. In the seventies, the range of problem areas in language acquisition began to diversify and alternative perspectives (non-nativist) on how the child acquires language began to emerge. It is argued that socio-cognitive approaches to language acquisition, though providing an important prerequisite for the acquisition of linguistic structure, cannot themselves account for the acquisition of the complex mapping relation between grammar and meaning that is required for full-blooded linguistic communication. Recent trends in language acquisition research including Learnability theory, Individual differences and Cross-linguistic approaches are reviewed. The article concludes with speculation about the future role of non-nativist approaches in language acquisition research. Although much current detailed work would seem to point to the existence of a Language Acquisition Device that is specifically tuned to the processing of linguistic information, it is premature to conclude that a more general cognitive learning mechanism that is able to account for both universal and particular properties of linguistic development, cannot provide a more parsimonious explanation of acquisition.
... In speech production, the articulatory movement creates an intermediate representation between neuro-motor planning (high level) and speech acoustics (low level) (Whiteside et al., 1993). Neuro-motor planning takes place in the human brain, and its central role is to express linguistic information. ...
Article
Full-text available
Speech might be one of the best inventions of human beings due to its critical communicative role in individuals' daily lives. Hence any study about it is valuable. To our knowledge, merely three studies focused on brain regions' associations with speech production were published more than eighteen years ago; furthermore, research on the brain areas associated with speech production is currently insufficient. The present review aims to provide information about all brain areas contributing to speech production to update the knowledge of brain areas related to speech production. The current study confirms earlier claims about activating some brain areas in the process; however, the previous studies were not comprehensive, and not all brain areas were mentioned. Three cerebral lobes are involved in the process, namely, the frontal, parietal and temporal lobes. The regions involved include the left superior parietal lobe, Wernicke's area, Heschl's gyri, primary auditory cortex, left posterior superior temporal gyrus (pSTG), Broca's area, and premotor cortex. In addition, regions of the lateral sulcus (anterior insula and posterior superior temporal sulcus), basal ganglia (putamen), and forebrain (thalamus) showed participation in the process. However, there was a different brain activation of overt and covert or silent speech (Broca's and Wernicke's areas). Moreover, mouth position and breathing style showed a difference in speech mechanism. In terms of speech development, the early postnatal years are important for speech development, as well as identifying three crucial stages of speech development: the pre-verbal stage, transition to active speech, and refinement of speech. In addition, during the early years of speech development, auditory and motor brain regions showed involvement in the process.
... Speech and language therapy training is dominated by biological, neuropsychological, and linguistic theories of speech and language. Theories include linear, transactional models of communication, for example, the 'communication chain' (Denes & Pinson, 1993). In this model messages are physical (speech, writing, sign) consisting of semiotic signals in which meaning is inherently contained (words in language). ...
Article
Full-text available
As speech and language therapists, we explored theories of communication and voice that are familiar to our profession and found them an inadequate basis on which to generate deep and rich analysis of the qualitative data from people who have communication difficulties and who use augmentative and alternative communication. Expanding our conceptual toolkit to include the work of John Shotter allowed us to reconceptualise voice and where it is emergent in dialogue. Reimagining voice will inform clinical and research praxis with people who have communication difficulties as it allows practitioners to attend more closely to the complexity and nuance inherent in interactions with this population. Our proposition is exemplified with excerpts from a single participant who has communication difficulties to illustrate the value of dialogic theory in praxis. This article presents a provocation for the wider academy of qualitative health research; do we have the concepts and tools to develop meaning with people whose lived experiences may also be hard to voice in monologues?
... Therefore, self-monitoring involves the comprehension and analysis of what the individual has produced, and detects whether the other processes have been implemented effectively. Denes and Pinson [9] refer to this self-monitoring process as the Feedback Link which enables the speaker to hear himself, comprehend his own speech and immediately make necessary corrections wherever required. In written language, the writer is able to make such corrections through proofreading and editing what has been written. ...
... To address the issue of the origin of negative attitudes toward stuttering, it is instructive to consider the acts of normal speaking and listening in what speech scientists (Denes & Pinson, 1963) many years ago termed the "speech chain". In any natural language, detailed rules govern the expected phonological, syntactic, semantic, and pragmatic nature of what is spoken both for the speaker and for the listener. ...
Article
Full-text available
Introduction: Research has shown that adults who stutter have reacted with increased skin conductance and lower heart rates when confronted with videos of severe stuttering compared to videos of fluent speech. It has not been clearly established how these physiological indices or autonomic arousals are related to stuttering attitudes. The current study sought to compare physiological and psychometric measures of anxiety with stuttering attitudes. Method: In a multiple-baseline design, 18 normal hearing university students listened to short samples of stuttered, masked, and normally fluent speech while their skin conductance and heart rate variability were being monitored by an Empatica E4 wristband device. Pre-experimentally and after each speech condition, they rated their comfort level on a 1-9 scale. Participants filled out the State-Trait Anxiety Inventory (STAI) (Spielberger, 1977) prior to the physiological measures, followed by a short state anxiety inventory; finally, they completed the Public Opinion Survey of Human Attributes-Stuttering (POSHA-S). Results: No significant main effects were observed for either autonomic measure for the three speech conditions, but interactions were significant. Individual participant analysis revealed that every respondent reacted differently in terms of skin conductance or heart rate variability. By contrast, mean subjective comfort ratings were more often lower after hearing stuttered or masked speech and higher after hearing fluent speech. Correlations between all the measures and the POSHA-S summary scores revealed little relationship between the autonomic measures and stuttering attitudes, but higher levels of state or trait anxiety were associated with more positive beliefs about people who stutter. In contrast, lower levels of anxiety tended to be associated with more positive self-reactions to those who stutter. Conclusion: This study did not replicate previous reports of heightened autonomic reactions to stuttering among nonstuttering adults, although psychometric measures suggest a relationship between anxiety and stuttering attitudes. Further research should explore these relationships, especially with young children. Keywords: attitudes toward stuttering, anxiety, autonomic measures, psychometric measures, POSHA-S.
... This last part of the process is what we call speech understanding or comprehension. The sequence of events sketched here is known as the speech chain (Denes and Pinson 1963), and it has been the blueprint of Levelt's (1989) model of speech production and Cutler's (2012) model of native listening. The intelligibility of a speaker, or of a speech utterance, is the degree to which a listener is able to recognize the linguistic units in the stream of sounds and to establish the order in which they were spoken. ...
... In daily communication, speech consists of continuous acoustic cues that are highly variable. Speech perception is an essential skill that children need to acquire to communicate with others (Denes & Pinson, 2015). One aspect of speech perception is the need for individuals to decode acoustic cues into discrete phonemes via so-called categorical perception (CP; Liberman et al., 1957). ...
Article
Full-text available
Although children develop categorical speech perception at a very young age, the maturation process remains unclear. A cross-sectional study in Mandarin-speaking 4-, 6-, and 10-year-old children, 14-year-old adolescents, and adults (n = 104, 56 males, all Asians from mainland China) was conducted to investigate the development of categorical perception of four Mandarin phonemic contrasts: lexical tone contrast Tone 1-2, vowel contrast /u/−/i/, consonant aspiration contrast /p/−/ph/, and consonant formant transition contrast /p/−/t/. The results indicated that different types of phonemic contrasts, and even the identification and discrimination of the same phonemic contrast, matured asynchronously. The observation that tone and vowel perception are achieved earlier than consonant perception supports the phonological saliency hypothesis.
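As a sketch of how categorical perception of such contrasts is typically quantified, the fragment below fits a logistic identification function along a toy 9-step tone continuum and reads off the category boundary and slope. The stimulus steps and response proportions are invented for illustration, not drawn from the study.

# Fit a logistic identification function to toy identification data.
import numpy as np
from scipy.optimize import curve_fit

def logistic(step, boundary, slope):
    return 1.0 / (1.0 + np.exp(-slope * (step - boundary)))

steps = np.arange(1, 10)                                   # 9-step tone continuum
p_tone2 = np.array([0.02, 0.05, 0.08, 0.20, 0.55, 0.85, 0.95, 0.97, 0.99])

(boundary, slope), _ = curve_fit(logistic, steps, p_tone2, p0=[5.0, 1.0])
print(f"category boundary at step {boundary:.2f}, slope {slope:.2f}")

Sharper slopes and stable boundaries across identification and discrimination tasks are the usual markers of mature categorical perception.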
... Speech perception plays an important role in social communication. Within the speech chain, people understand others' speech and monitor their own speech via speech perception (Denes & Pinson, 2015). Once speech perception is degraded, people have difficulty engaging in social communication, as is often the case in seniors. ...
Article
Full-text available
Purpose: This study aims to investigate the different degeneration processes of categorical perception (CP) of Mandarin lexical tones in the normal aging population and the pathological aging population with mild cognitive impairment (MCI). Method: In Experiment I, we compared the identification and discrimination of Tone 1 and Tone 2 across young adults, seniors aged 60-65 years, and older seniors aged 75-80 years with normal cognitive abilities. In Experiment II, we compared lexical tone identification and discrimination across young adults, healthy seniors, and age-matched seniors with MCI. Results: In Experiment I, tone perception was intact in seniors aged below 65 years. Those aged above 75 years also maintained normal tone identification but showed poorer tone discrimination, which correlated with age-related hearing decline. In Experiment II, healthy seniors showed normal CP of Mandarin tones. Tone identification was also normal in those with MCI, whereas their tone discrimination had significantly degenerated. Conclusions: In the normal aging population, age-related hearing loss decreased signal audibility, accounting for poorer discrimination of Mandarin lexical tones in seniors above 75 years. In the pathological aging population with MCI, the poorer discrimination of lexical tones may be attributed to the additive effect of age, hearing loss, and cognitive impairment (e.g., impaired working memory and long-term phonological memory). This study uncovered the roles of low-level sensory processing and high-level cognitive processing in lexical tone perception in the Chinese aging population.
... A necessary but not sufficient intermediate stage in the transformation process is the recognition of (a sufficient number of) words in the order in which they were spoken (e.g. Denes and Pinson 1963; Gooskens and Van Heuven 2021; Smith and Nelson 1985). Listening comprehension skills enable foreign-language (FL) learners to make sense of language input and facilitate the emergence of other language skills (Vandergrift and Goh 2012; Yenkimaleki and Van Heuven 2016). ...
Article
The present study investigated the efficacy of segmental/suprasegmental vs. holistic pronunciation instruction in the development of listening comprehension skills by EFL learners, using a pre-test post-test design. Six groups of 20 intermediate EFL learners at a university in Iran took part in the study, all groups receiving the same amount of instruction (10 hours over 5 weeks). The control group listened to/viewed authentic audio recordings and movies in English, discussed their contents, and completed a variety of listening comprehension tasks but received no pronunciation instruction. Four experimental groups completed similar activities but during one third of the teaching time (20 minutes per class), received an explanation of segmental or suprasegmental features followed by production-focused or perception-focused practice. The final experimental group received holistic pronunciation instruction with mixed perception/production-focused practice for 20 minutes during each hour-long class. Versions of Longman’s TOEFL English proficiency test (paper-based) were used to assess listening comprehension at pre-test, immediate post-test and delayed post-test. The findings revealed that the holistic pronunciation instruction enhanced the listening comprehension skills of Iranian EFL learners more than separate segmental or suprasegmental training, with either perception or production-focused practice.
... For example, online hearing data [13] show the sound levels, measured with a sound-level meter in decibels, for three classes of guns: handguns, shotguns, and rifles (see Table 1). In this gunfire sound level reference chart, all measured levels fall between 152 and 163 dB, far exceeding the threshold of pain (about 120 dB; conversational speech is around 60 dB [14]). It is clear that the manufacturer of the gun (specific across handguns, etc.), the caliber, and the barrel length for shotguns or rifles will all impact the acoustic sound level response in dB. ...
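As a rough guide to what these figures imply, sound pressure level is logarithmic, so a difference in dB corresponds to a large ratio of pressures or intensities. The short Python sketch below uses only the standard conversion formulas and the levels quoted above; it is an illustration, not part of the cited chart.

```python
import math  # not strictly needed; kept for clarity if extending the sketch

def pressure_ratio(db_diff: float) -> float:
    """Ratio of sound pressures corresponding to a difference in dB SPL."""
    return 10 ** (db_diff / 20)

def intensity_ratio(db_diff: float) -> float:
    """Ratio of sound intensities corresponding to a difference in dB SPL."""
    return 10 ** (db_diff / 10)

# Reference levels quoted above (dB SPL)
conversation = 60
pain_threshold = 120
gunshot_low, gunshot_high = 152, 163

for level in (gunshot_low, gunshot_high):
    print(f"{level} dB vs. speech at {conversation} dB: "
          f"{pressure_ratio(level - conversation):,.0f}x the sound pressure")
print(f"{gunshot_low} dB vs. {pain_threshold} dB pain threshold: "
      f"{intensity_ratio(gunshot_low - pain_threshold):,.0f}x the intensity")
```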
Conference Paper
Full-text available
Many communities experiencing increased gun violence are turning to acoustic gunshot detection systems (GSDS) with the hope that their deployment would provide increased 24/7 monitoring and the potential for more rapid response by law enforcement to the scene. In addition to real-time monitoring, data collected by gunshot detection systems have been used alongside witness testimonies in criminal prosecutions. Because of their potential benefit, it would be appropriate to ask: how effective are GSDS in both lab/controlled settings vs. deployed real-world city scenarios? How reliable are outputs produced by GSDS? What is the system performance trade-off in gunshot detection vs. source localization of the gunshot? Should they be used only for early alerts or can they be relied upon in courtroom settings? What negative consequences are there for directing law enforcement to locations when a false positive event occurs? Are resources spent on GSDS operational costs well utilized or could these resources be better invested to improve community safety? This study does not attempt to address many of these questions, including the social or economic questions surrounding GSDS, but provides a reflective survey of hardware and algorithmic operations of the technology to better understand its potential as well as its limitations. Specifically, challenges are discussed regarding environmental and other mismatch conditions, with emphasis on the validation procedures used and their expected reliability. Many concepts discussed in this paper are general and will likely be utilized in, or have an impact on, any gunshot detection technology. For this study, we refer to the ShotSpotter system to provide specific examples of system infrastructure and validation procedures.
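As a purely illustrative aside, the simplest algorithmic building block of acoustic gunshot detection is flagging impulsive rises in short-time energy. The sketch below is a naive, hypothetical detector written for this summary; it is not the ShotSpotter pipeline, which additionally involves multi-sensor arrays, time-difference-of-arrival localization, classification, and human review.

```python
import numpy as np

def detect_impulses(signal, sr, frame_ms=10.0, threshold_db=20.0):
    """Flag frames whose short-time energy rises far above the estimated background.

    A deliberately naive sketch: it only thresholds frame energy against a
    median noise-floor estimate and returns candidate onset times in seconds.
    """
    frame = int(sr * frame_ms / 1000)
    n_frames = len(signal) // frame
    energy = np.array([np.sum(signal[i * frame:(i + 1) * frame] ** 2)
                       for i in range(n_frames)]) + 1e-12
    energy_db = 10 * np.log10(energy)
    background = np.median(energy_db)              # crude noise-floor estimate
    hit_frames = np.where(energy_db > background + threshold_db)[0]
    return [i * frame / sr for i in hit_frames]

# Example: a short synthetic burst embedded in low-level noise
sr = 16000
audio = 0.01 * np.random.randn(2 * sr)
audio[sr:sr + 80] += np.hanning(80)                # impulsive event near t = 1 s
print(detect_impulses(audio, sr))
```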
... 15.6. The transmission theory: communication as transfer. In the 1960s, a book entitled The Speech Chain (Denes & Pinson, 1963) was often used as an introduction to the subject of speech communication. It was written during a largely behaviorist era, with an interest in speech as "transmission"*. ...
Book
Languaging in Real Life: an introduction to dialogical perspectives on language, thinking and communication. This book is a comprehensive introduction to a dialogical perspective on language, languaging, thinking, communication and culture. It builds upon social psychology, social and dialogical philosophy (phenomenology), interactional linguistics, and humanistic ideas of thinking and communication. Dialogical terms comprise, for example, dialogue, dialectics, dynamics, extended dialogism, external dialogue, partial holism, situations, contexts and activities, partial and partially shared understandings, participation, appropriation, and meaning-making, interpenetrations of concepts, e.g. persons and culture. According to dialogical theory, a great deal revolves around the assumption that the making of meaning and social order in human activities and cultures is usually and initially built on interactions and relations between Self and Others. Basic properties of contributions to external dialogues are relations between initiatives and responses. Categories of responsive actions include minimal, short (“elliptical”) and expanded (“full”) responses. An important distinction is that between situated and sociohistorical contexts; we can talk about “double dialogicality” (situated vs. sociohistorical). An explanatory context theory must also distinguish between co-textual, other situation-based, and cultural (non-local) types. Pragma-semantic categories are linguistic means and situated (“participants’”) meanings; a parallel distinction is that between meaning potentials (of words and constructions) and message potentialities (of situated utterances). Communication comprises cognitive, emotional and volitional aspects, and involves partial (and partially shared) understandings, and relations of power and respect. Utterances are characterised by responsivity, addressivity, incrementation, and relatively frequent re(tro)constructions of ongoing processes. The book includes separate sections on evolution and ontogenesis, dialogue and thinking, individual and collective aspects of language and languaging, activity types, multimodality of utterances, conditions of production, reception and understanding of utterances, and also some traditional – but partially misguided – ideas in the theorisation of language and communication. This includes a discussion of the “written language bias in linguistics”, which is a historical feature of the language sciences despite their shift from “practical” to “theoretical” concerns. All chapters are designed to highlight dialogical aspects. The last two chapters contain discussions of general dialogical ideas as well as of phenomenology as an overarching framework. The differences between natural-science and humanistic approaches to the mind and mental capacities of man conclude the book. Several arguments build upon earlier work by the author, such as Linell (1998, 2005, 2009) and numerous papers such as Linell & Marková (1993) and Linell (2016, 2020a, 2021a). A list of major sources of inspiration is given in Appendix 1.
... This has been recognized within linguistics, in investigating speech acts that were shown to require "a good deal of improvisation because the mind is addressing its own thought and creating its unrehearsed delivery in words, sounds and gestures, forming unpredictable statements that feed back into the thought process (the performer as listener), creating an enriched process that is not unlike instantaneous composition with a given set or repertoire of elements" (Denes & Pinson, 1993). Thus, these processes could even rely on some common psychological mechanisms that trigger and drive them. ...
Article
Full-text available
The aim of this study was to gain insight into the phenomenon of improvisation, how it is manifested in communication, and to conceptualize the process of improvisation in general. I aimed to construct a model for use in teaching and further analysis of training programs that target and develop improvisation skills in communication. The ability to communicate is part and parcel of psychologists’ work. I develop and supervise interactive classes and training programs to promote improvisation and communication skills, using the grounded theory of improvisation in communication under conditions of high uncertainty. The improvisation sessions were videotaped, transcribed, and analyzed. Applying the qualitative method and working with grounded theory methodology, I studied five sessions. Here I report on the major categories that condition the improvisational process: Level of Anxiety, Coping, Communicative Skills, Imagination, and Spontaneity. I also outline the markers of spontaneous behavior: strange combinations (oxymorons), humor, and rapid topic switching.
... Since speech requires the coordination of more than 100 muscles (Denes & Pinson, 1963) as well as numerous psycho- and neurolinguistic processes (Brown & Hagoort, 2000), neurodegenerative pathologies, but also mental disorders, are now detected using vocal biomarkers, with performance that allows their use in clinical settings [for reviews of pathology detection in the voice, see (Fagherazzi et al., 2021) and (Low et al., 2020)]. ...
Conference Paper
Full-text available
[English version below] L'utilisation de biomarqueurs vocaux est une des technologies les plus prometteuses pour l'implémentation de systèmes de santé numérique en conditions écologiques. En effet, de nombreuses et diverses pathologies sont maintenant diagnostiquées automatiquement de manière fiable grâce à des marqueurs vocaux. Cet article étudie et discute la faisabilité d'annoter des textes -- qui seront ensuite lus à voix haute par des patients hypersomniaques -- afin de concevoir des descripteurs basés sur la naturalité des pauses faites par ceux-ci. Pour cela, trois spécialistes ont annoté six textes extraits du Petit prince. Nous étudions à la fois à travers des mesures statistiques, mais aussi sous le prisme du rapport des annotateurs, deux axes : les différences entre les méthodes d'annotation ; et les lieux de désaccord dans les textes. Enfin, nous concluons quant à la fiabilité d'utiliser de telles annotations comme vérité terrain dans un système automatique. [Is it possible to annotate the naturalness of pauses made during reading out loud?] The use of voice and speech biomarkers is one of the most promising technologies to implement digital health in ecological conditions. Indeed, numerous and diversified pathologies are now accurately diagnosed by autonomous systems based on voice features. This article investigates and discusses the feasibility of annotating texts -- that are then read out loud by hypersomniac patients -- in order to design new features based on the naturalness of the pauses made by the patients. To do so, three specialists annotated six texts extracted from Le Petit Prince. We investigate two axes through statistics but also from the perspective of the annotators' reports: differences in annotation methods; and disagreement locations in texts. Finally, we conclude on the robustness of using such annotations as ground truth in an automated pipeline.
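The abstract reports agreement analyses among three annotators without naming the statistic; one common choice for multiple raters is Fleiss' kappa. The sketch below, with hypothetical binary pause labels, shows how such an agreement score could be computed; it is not the authors' actual procedure.

```python
import numpy as np

def fleiss_kappa(counts: np.ndarray) -> float:
    """Fleiss' kappa for an items x categories matrix of rating counts."""
    n_raters = counts.sum(axis=1)[0]           # assumes the same number of raters per item
    p_cat = counts.sum(axis=0) / counts.sum()  # overall category proportions
    p_item = ((counts ** 2).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))
    p_bar, p_e = p_item.mean(), (p_cat ** 2).sum()
    return float((p_bar - p_e) / (1 - p_e))

# Hypothetical example: 3 annotators label each pause as natural (col 0) or unnatural (col 1)
ratings = np.array([
    [3, 0],
    [2, 1],
    [0, 3],
    [1, 2],
    [3, 0],
])
print(f"Fleiss' kappa = {fleiss_kappa(ratings):.2f}")
```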
... This last part of the process is what we call speech understanding or comprehension. The sequence of events sketched here is known as the speech chain (Denes and Pinson 1963), and it has been the blueprint of Levelt's (1989) model of speech production and Cutler's (2012) model of native listening. ...
Chapter
Full-text available
In this tutorial paper, I explain what we mean by prosody in language and speech. I then show that, generally, prosody is highly redundant, i.e., it can be omitted from the speech signal without causing speech to become unintelligible or incomprehensible. Yet, it has often been mentioned, also in recent publications, that getting the prosody right should have priority in the acquisition of the phonology of a foreign language. This seems a contradiction. Why should a generally redundant feature of a code deserve priority in teaching? It is the purpose of this paper to resolve the paradox. [Entire e-book can be downloaded in open access from https://www.letraria.net/prosodia-e-bilinguismo/ ]
... This subdivision of phonetics, given its closer connection to aspects of linguistic processing and the bridge it establishes between the acoustic signal and linguistic representations, is perhaps not as separable from phonology as articulatory phonetics or acoustic phonetics. On this division of phonetics into these three domains and the relation each of them establishes with other areas of study, see, e.g., Denes & Pinson (1993). See also notes 3 and 10 for further information on this topic. ...
Preprint
Full-text available
2015. This text corresponds to unpublished material by the author and is intended for pedagogical use at the Faculdade de Letras da Universidade do Porto. It may be used by third parties provided the source is explicitly indicated, using the following bibliographic reference: VELOSO, João. 2015. Introdução à Fonologia. Nível fonético e nível fonológico. Porto: Faculdade de Letras da Universidade do Porto (ms.). (These notes will shortly be complemented by other texts of a similar nature on other specific topics in phonology. Comments, notes, and suggestions are welcome and should be sent directly to the author: jveloso@letras.up.pt.)
... Speech perception is an important stage in the human speech chain. It enables the speaker to understand their speech as well as that of others, which is a foundation of social communication (Denes & Pinson, 2015). As an active cognitive procedure, the rapid processing of continuous speech information places a high demand on resources such as focused attention and the efficient manipulation of information held in working memory (Heald & Nusbaum, 2014). ...
Article
Full-text available
Purpose: This study investigated the effect of cognitive load (CL) on the categorical perception (CP) of Mandarin lexical tones to discuss the application of the generalized pulse-skipping hypothesis. This hypothesis assumes that listeners might miss/skip temporal pulses and lose essential speech information due to CL, which consequently affects both the temporal and spectral dimensions of speech perception. Should CL decrease listeners’ pitch sensitivity and impair the distinction of tone categories, this study would support the generalized pulse-skipping hypothesis. Method: Twenty-four native Mandarin-speaking listeners were recruited to complete a dual-task experiment where they were required to identify or discriminate tone stimuli while concurrently memorizing six Chinese characters or graphic symbols. A no-load condition without a memory recall task was also included as a baseline condition. The position of categorical boundary, identification slope, between-/within-category discrimination, and discrimination peakedness were compared across the three conditions to measure the impact of CL on tone perception. The recall accuracy of Chinese characters and graphic symbols was used to assess the difficulty of memory recall. Results: Compared to the no-load condition, both load conditions showed a boundary shift to Tone 3, shallower identification slope, poorer between-category discrimination, and lower discrimination peakedness. Within-category discrimination was negatively affected by CL in the graphic symbol condition only, not in the Chinese character condition. Conclusions: CL degraded listeners’ sensitivity to subtle F0 changes and impaired CP of Mandarin lexical tones. This provides support for the generalized pulse-skipping hypothesis. Besides, the involvement of lexical information modulated the effect of CL.
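Boundary position and identification slope are typically estimated by fitting a logistic function to identification responses along the stimulus continuum. The sketch below illustrates that generic procedure on hypothetical proportions of "Tone 2" responses; the continuum steps and values are invented, not the study's data.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, boundary, slope):
    """Proportion of 'Tone 2' responses along the stimulus continuum."""
    return 1.0 / (1.0 + np.exp(-slope * (x - boundary)))

# Hypothetical identification data: 7-step F0 continuum, proportion of Tone 2 responses
steps = np.arange(1, 8)
p_tone2 = np.array([0.02, 0.05, 0.15, 0.55, 0.90, 0.97, 0.99])

(boundary, slope), _ = curve_fit(logistic, steps, p_tone2, p0=[4.0, 1.0])
print(f"categorical boundary at step {boundary:.2f}, identification slope {slope:.2f}")
# A boundary shift or a shallower identification function under cognitive load would
# appear as a different boundary estimate or a smaller slope when refitting this model.
```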
Conference Paper
Full-text available
Despite the continuous innovation in voice biomarkers domain for more than a decade and the apparent need for clinicians to have objective diagnostic tools, no device has yet been implemented in real clinical settings or widely adopted by clinicians. After giving a short overview of the literature, we argue that in addition to the factors usually mentioned in the literature (low performance, database sizes, transparency, etc.), an underestimated but crucial factor preventing the use of such systems is the therapeutic relationship. We also discuss the “objectivity” of such systems, and the place of diagnosis in clinical practice and its conceptual limitations. In order to shape useful and relevant voice biomarkers, we propose to estimate symptoms instead of diagnosis, and draw perspectives related to this paradigm, which will require databases annotated with patients’ symptoms rather than only their pathological status.
Article
The developmental trajectory of audiovisual speech perception in Mandarin‐speaking children remains understudied. This cross‐sectional study in Mandarin‐speaking 3‐ to 4‐year‐old, 5‐ to 6‐year‐old, 7‐ to 8‐year‐old children, and adults from Xiamen, China ( n = 87, 44 males) investigated this issue using the McGurk paradigm with three levels of auditory noise. For the identification of congruent stimuli, 3‐ to 4‐year‐olds underperformed older groups whose performances were comparable. For the perception of the incongruent stimuli, a developmental shift was observed as 3‐ to 4‐year‐olds made significantly more audio‐dominant but fewer audiovisual‐integrated responses to incongruent stimuli than older groups. With increasing auditory noise, the difference between children and adults widened in identifying congruent stimuli but narrowed in perceiving incongruent ones. The findings regarding noise effects agree with the statistically optimal hypothesis.
Chapter
In this chapter, we introduce some basics of spoken language processing (including both speech and natural language), which are fundamental to text-to-speech synthesis. Since speech and language are studied in the discipline of linguistics, we first overview some basic knowledge in linguistics and discuss a key concept called the speech chain that is closely related to TTS. Then, we introduce speech signal processing, which covers the topics of digital signal processing, speech processing in the time and frequency domain, cepstrum analysis, linear prediction analysis, and speech parameter estimation. Finally, we overview some typical speech processing tasks. Keywords: Spoken language processing; Linguistics; Speech chain; Speech signal processing
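Two of the topics listed, cepstrum analysis and linear prediction, can be illustrated compactly. The sketch below computes a real cepstrum and LPC coefficients (autocorrelation method with the Levinson-Durbin recursion) for a synthetic frame; it is a generic textbook-style example, not code from the chapter.

```python
import numpy as np

def real_cepstrum(frame: np.ndarray) -> np.ndarray:
    """Real cepstrum: inverse FFT of the log magnitude spectrum of a windowed frame."""
    spectrum = np.fft.rfft(frame * np.hamming(len(frame)))
    return np.fft.irfft(np.log(np.abs(spectrum) + 1e-12))

def lpc(frame: np.ndarray, order: int) -> np.ndarray:
    """LPC coefficients a[0..order] (a[0] = 1) via the Levinson-Durbin recursion."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:len(frame) + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                 # reflection coefficient
        a[1:i] += k * a[i - 1:0:-1]    # update previous coefficients
        a[i] = k
        err *= (1.0 - k * k)           # prediction error power
    return a

# Synthetic voiced-like frame: two damped sinusoids standing in for formant resonances
sr = 16000
t = np.arange(400) / sr
frame = np.exp(-40 * t) * (np.sin(2 * np.pi * 500 * t) + 0.5 * np.sin(2 * np.pi * 1500 * t))
print("first cepstral coefficients:", np.round(real_cepstrum(frame)[:4], 3))
print("LPC coefficients (order 8):", np.round(lpc(frame, 8), 3))
```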
Article
Full-text available
Improvisación en el acto ÁNIMA sonograma.org/2023/01/improvisación-en-el-acto-anima/
Article
Full-text available
Voice is a major means of communication for humans, non-human mammals and many other vertebrates like birds and anurans. The physical and physiological principles of voice production are described by two theories: the MyoElastic-AeroDynamic (MEAD) theory and the Source-Filter Theory (SFT). While MEAD employs a multiphysics approach to understand the motor control and dynamics of self-sustained vibration of vocal folds or analogous tissues, SFT predominantly uses acoustics to understand spectral changes of the source via linear propagation through the vocal tract. Because the two theories focus on different aspects of voice production, they are often applied distinctly in specific areas of science and engineering. Here, we argue that the MEAD and the SFT are linked integral aspects of a holistic theory of voice production, describing a dynamically coupled system. The aim of this manuscript is to provide a comprehensive review of both the MEAD and the source-filter theory with its nonlinear extension, the latter of which suggests a number of conceptual similarities to sound production in brass instruments. We discuss the application of both theories to voice production of humans as well as of animals. An appraisal of voice production in the light of non-linear dynamics supports the notion that voice production can best be described with a systems view, considering coupled systems rather than isolated contributions of individual sub-systems.
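The source-filter view lends itself to a very small synthesis sketch: an idealized glottal impulse train passed through second-order resonators standing in for formants. The Python fragment below is such a linear SFT caricature with invented parameter values; it deliberately omits the MEAD side (self-sustained vocal fold oscillation) and any nonlinear source-filter coupling.

```python
import numpy as np
from scipy.signal import lfilter

def formant_filter(freq_hz: float, bandwidth_hz: float, sr: int):
    """Second-order all-pole resonator approximating one formant."""
    r = np.exp(-np.pi * bandwidth_hz / sr)
    theta = 2 * np.pi * freq_hz / sr
    # H(z) = 1 / (1 - 2 r cos(theta) z^-1 + r^2 z^-2)
    return [1.0], [1.0, -2 * r * np.cos(theta), r ** 2]

sr, f0, dur = 16000, 120, 0.5
source = np.zeros(int(sr * dur))
source[::sr // f0] = 1.0                      # idealized glottal impulse train (linear SFT view)

signal = source
for freq, bw in [(700, 80), (1100, 90)]:      # rough /a/-like formants F1 and F2, invented values
    b, a = formant_filter(freq, bw, sr)
    signal = lfilter(b, a, signal)
signal /= np.abs(signal).max()                # normalized synthetic vowel-like waveform
```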
Chapter
Perceiving and understanding spoken language is something that most listeners take for granted, at least in favorable listening conditions. Yet, decades of research have demonstrated that speech is variable and ambiguous, meaning listeners must constantly engage in active hypothesis testing of what was said. Within this framework, even relatively minor challenges imposed on speech recognition must be understood as requiring the interaction of perceptual, cognitive, and linguistic factors. This chapter provides a systematic review of the various ways in which listening environments may be considered adverse, with a dual focus on the cognitive and neural systems that are thought to improve speech recognition in these challenging situations. Although a singular mechanism or construct cannot entirely explain how listeners cope with adversity in speech recognition, overcoming listening adversity is an attentionally guided process. Neurally, many adverse listening conditions appear to depend on higher-order (rather than primary) representations of speech in cortex, suggesting that more abstract linguistic knowledge and context become particularly important for comprehension when acoustic input is compromised. Additionally, the involvement of the cinguloopercular (CO) network, particularly the anterior insula, in a myriad of adverse listening situations may indicate that this network reflects a general indication of cognitive effort. In discussing the various challenges faced in the perception and understanding of speech, it is critically important to consider the interaction of the listener’s cognitive resources (knowledge and abilities) with the specific challenges imposed by the listening environment.
Chapter
Successful speech understanding relies not only on the auditory pathway, but on cognitive processes that act on incoming sensory information. One area in which the importance of cognitive factors is particularly striking during speech comprehension is when the acoustic signal is made more challenging, which might happen due to background noise, talker characteristics, or hearing loss. This chapter focuses on the interaction between hearing and cognition in hearing loss in older adults. The chapter begins with a review of common age-related changes in hearing and cognition, followed by summary evidence from pupillometric, behavioral, and neuroimaging paradigms that elucidate the interplay between hearing and cognition. Across a variety of experimental paradigms, there is compelling evidence that when listeners process acoustically challenging speech, additional cognitive effort is required compared to acoustically clear speech. This increase in cognitive effort is associated with specific brain networks, with the clearest evidence implicating cingulo-opercular and executive attention networks. Individual differences in hearing and cognitive ability thus determine the cognitive demand faced by a particular listener, and the cognitive and neural resources needed to aid in speech perception. Keywords: Listening effort; Background noise; Speech perception; Cognitive aging; Sentence comprehension; Neuroimaging; Cingulo-opercular network; Executive attention; Pupillometry; fMRI
Article
Drawing on data from well-known actors in popular films and TV shows, this reference guide surveys the representation of accent in North American film and TV over eight decades. It analyzes the speech of 180 film and television performances from the 1930s to today, looking at how that speech has changed; how it reflects the regional backgrounds, gender, and ethnic ancestry of the actors; and how phonetic variation and change in the 'real world' have been both portrayed in, and possibly influenced by, film and television speech. It also clearly explains the technical concepts necessary for understanding the phonetic analysis of accents. Providing new insights into the role of language in the expression of North American cultural identity, this is essential reading for researchers and advanced students in linguistics, film, television and media studies, and North American studies, as well as the larger community interested in film and television.
Article
Full-text available
The paper presents the results of a study on the audition of a Dutch tense vowel [e:] and the diphthong [ɛi] by Polish native speakers. It was hypothesized that Polish native speakers may pronounce the Dutch [e:] as a combination of [ɛ] and [j] because they generally fail to distinguish the [e:] from the diphthong [ɛi], as both sounds lie acoustically close to each other (especially those produced by speakers of Dutch from the Netherlands); moreover, they are absent from the Polish phonetic system. Instead, the experiment has shown that Polish native speakers are very good at differentiating the isolated Dutch [e:] and [ɛi]. It seems that the pronunciation of the Dutch [e:] as a combination of [ɛ] and [j] is rather a matter of articulation than audition, but further experiments are needed to examine the problem.
Article
To clarify the acoustic variables for predicting and classifying Japanese singleton and geminate consonants, raw and logarithmic durations of the consonants and their related segments were examined using 12 minimal pair words that were pronounced in a carrier sentence at various speaking rates by 20 native Japanese speakers. Regression and discriminant analyses revealed that the logarithmic durations were better at predicting and classifying Japanese singleton and geminate consonants than the raw durations used in many previous studies. Specifically, the best acoustic variables were the logarithmic duration of the consonant's closure or frication and the logarithmic average duration of the mora in the preceding carrier phrase. These results suggest that logarithmic durations are relational invariant acoustic variables that can cope with the durational variations of singleton and geminate consonants in a wide range of speaking rates.
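The comparison of raw versus logarithmic durations can be mimicked with a generic discriminant analysis on synthetic data, as in the sketch below. The duration values, rate factors, and noise levels are invented for illustration; the sketch shows the form of such an analysis, not the study's results.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 400

# Synthetic data: a per-token speaking-rate factor scales both the consonant closure
# duration and the average mora duration of the preceding phrase multiplicatively.
rate = rng.uniform(0.6, 1.8, size=2 * n)
mora = 120 * rate * rng.lognormal(0, 0.05, 2 * n)                     # ms, rate proxy
closure = np.where(np.arange(2 * n) < n, 65, 160) * rate * rng.lognormal(0, 0.08, 2 * n)
labels = (np.arange(2 * n) >= n).astype(int)                          # 0 = singleton, 1 = geminate

raw = np.column_stack([closure, mora])
logged = np.log(raw)                                                   # log durations
for name, X in [("raw durations", raw), ("log durations", logged)]:
    acc = cross_val_score(LinearDiscriminantAnalysis(), X, labels, cv=5).mean()
    print(f"{name}: mean CV accuracy {acc:.3f}")
```

In log space the difference between the two features removes the multiplicative rate factor, which is the intuition behind using logarithmic durations as relationally invariant predictors; the toy numbers here are only meant to show the mechanics.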
Chapter
In recent years, attention has been focused on information presentation methods that take into account the user’s situation by utilizing wearable computing technology. Most of the existing information presentation methods present users with information that gives them a choice of actions, and encourage them to take action. However, users may not be able to control their own actions appropriately. It is important not only to encourage users to act, but also to forcibly control their behavior. Such behavioral control methods use actuators (e.g., displays, speakers) to stimulate senses such as vision and hearing, and then control behavior by reversing the reaction to maintain consistency with the usual senses. However, most of these studies have not yet examined how to determine the intensity of the stimuli when introducing behavioral control methods into the real world. To solve this problem, we focused on the influence of the presence of others on human behavior control. We propose a method to determine the amount of stimuli to be fed back to the user based on the results of simulating the sensations of others. We define this method as Pseudo Human Sense in the Loop (referred to as “PHSIL” in this paper), and conduct cognitive psychology experiments on presentations, applying PHSIL to both auditory and visual stimuli, to verify the effectiveness of PHSIL.