Table 1. Descriptions and selected examples of phonological features


Source publication
Article
Full-text available
A fundamental goal of the human auditory system is to map complex acoustic signals onto stable internal representations of the basic sound patterns of speech. Phonemes and the distinctive features that they comprise constitute the basic building blocks from which higher-level linguistic representations, such as words and sentences, are formed. Alth...

Contexts in source publication

Context 1
... sounds were played at a comfortable volume for participants through electrodynamic MR-Confon headphones. As depicted in Table 1, the stimuli spanned the major distinctive features. ...
Context 2
... the fact that, for example, /m/ is more confusable with /n/ than it is with /k/ is both reliable and informative: it tells us something about the way in which our brains process speech sounds. Following Shepard (1972), nonmetric multidimensional scaling [MDS; using isoMDS from the MASS (Modern Applied Statistics with S) package in the R programming language; Venables and Ripley, 2002] was applied to the phonemic confusion data taken directly from Tables 1-6 in Miller and Nicely's (1955) report (noise levels: −18, −12, −6, 0, +6, +12 dB). The acoustic confusion data were first converted to distances using a logarithmic transformation of the normalized confusion probabilities (Shepard, 1972). ...
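To make the pipeline described in this excerpt concrete, here is a minimal R sketch of nonmetric MDS on confusion data. It assumes an invented 4-phoneme confusion matrix and a simple negative-log transform of the normalized, symmetrized confusion probabilities; it is not the authors' script and not the Miller and Nicely (1955) data.

```r
# Minimal sketch: nonmetric MDS (isoMDS from MASS) on distances derived from
# a log transform of normalized confusion probabilities. The 4x4 matrix below
# is a made-up toy example (rows = spoken, columns = heard).
library(MASS)

conf <- matrix(c(200,  60,  10,   5,
                  55, 210,   8,   7,
                  12,   9, 190,  70,
                   6,   8,  65, 205),
               nrow = 4, byrow = TRUE,
               dimnames = list(c("m", "n", "k", "g"),
                               c("m", "n", "k", "g")))

p <- conf / rowSums(conf)       # normalize rows to confusion probabilities
p <- (p + t(p)) / 2             # symmetrize

d <- as.dist(-log(p))           # log transform: more confusable -> smaller distance
fit <- isoMDS(d, k = 2)         # nonmetric multidimensional scaling

plot(fit$points, type = "n", xlab = "Dimension 1", ylab = "Dimension 2")
text(fit$points, labels = rownames(conf))
```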

Similar publications

Article
Full-text available
Bilateral cochlear implantation aims to restore binaural hearing, important for spatial hearing, to children who are deaf. Improvements over unilateral implant use are attributed largely to the detection of interaural level differences (ILDs) but emerging evidence of impaired sound localization and binaural fusion suggest that these binaural cues a...
Article
Full-text available
Neuroimaging research in emotion regulation reveals a decrease of amygdala response to affective stimuli when the stimuli are perceived during the performance of a cognitive task. Two types of tasks are usually used to investigate this effect: one distracts attention from the emotional content of stimuli and another directly addresses the emotional...
Article
Full-text available
Understanding others’ speech while individuals simultaneously produce speech utterances implies neural competition and requires specific mechanisms for a neural resolution given that previous studies proposed opposing signal dynamics for both processes in the auditory cortex (AC). We here used neuroimaging in humans to investigate this neural compe...
Article
Full-text available
Purpose: To identify potential differences in resting-state networks according to laterality of tinnitus using resting-state functional MRI (fMRI). Materials and Methods: A total of 83 age-matched subjects consisting of 19 patients with right-sided tinnitus (Rt-T), 22 patients with left-sided tinnitus (Lt-T), 22 patients with bilateral tinnitus (Bil...
Article
Full-text available
We investigated the association between the left planum temporale (PT) surface area or asymmetry and the hemispheric or regional functional asymmetries during language production and perception tasks in 287 healthy adults (BIL&GIN) who were matched for sex and handedness. The measurements of the PT surface area were performed after manually delinea...

Citations

... While it is not surprising that phonology activated left superior temporal areas, we also report recruitment of the right STG/MTG area, which backs earlier findings on both temporal cortices being involved in phonological feature perception (Arsenault & Buchsbaum, 2015) and responding to higher phonological working memory load during language processing (Perrachione et al., 2017). ...
Article
Full-text available
Language is a key human faculty for communication and interaction that provides invaluable insight into the human mind. Previous work has dissected different linguistic operations, but the large-scale brain networks involved in language processing are still not fully uncovered. Particularly, little is known about the subdomain-specific engagement of brain areas during semantic, syntactic, phonological, and prosodic processing and the role of subcortical and cerebellar areas. Here, we present the largest coordinate-based meta-analysis of language processing including 403 experiments. Overall, language processing primarily engaged bilateral fronto-temporal cortices, with the highest activation likelihood in the left posterior inferior frontal gyrus (IFG). Whereas we could not detect any syntax-specific regions, semantics specifically engaged left posterior temporal areas (left fusiform and occipitotemporal cortex) and the left frontal pole. Phonology showed highest subdomain-specificity in bilateral auditory and left postcentral regions, whereas prosody engaged specifically the right amygdala and the right IFG. Across all subdomains and modalities, we found strong bilateral subcortical and cerebellar contributions. Especially the right cerebellum was engaged during various processes, including speech production, visual, and phonological tasks. Collectively, our results emphasize consistent recruitment and high functional modularity for general language processing in bilateral domain-specific (temporo-frontal) and domain-general (medial frontal/anterior cingulate cortex) regions but also a high specialization of different subareas for different linguistic subdomains. Our findings refine current neurobiological models of language by adding novel insight into the general sensitivity of the language network and subdomain-specific functions of different brain areas and highlighting the role of subcortical and cerebellar regions for different language operations.
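The study above is a coordinate-based meta-analysis. As a rough illustration of the general logic behind such analyses (reported peaks blurred with Gaussians, per-experiment maps combined as a union of probabilities), the toy one-dimensional R sketch below uses invented peak coordinates and an arbitrary FWHM; it is not the authors' pipeline or software.

```r
# Toy 1-D sketch of the logic behind a coordinate-based (ALE-style)
# meta-analysis; coordinates, FWHM, and combination rule are illustrative
# placeholders only.
x <- seq(0, 100, by = 1)                          # a 1-D stand-in for brain space
experiments <- list(c(30, 32), c(31, 70), c(29))  # hypothetical peak coordinates
fwhm  <- 10
sigma <- fwhm / (2 * sqrt(2 * log(2)))

# Modeled-activation map for one experiment: at each location, take the
# maximum of the Gaussians centred on its reported peaks
ma_map <- function(foci) {
  Reduce(pmax, lapply(foci, function(f) exp(-(x - f)^2 / (2 * sigma^2))))
}
ma <- sapply(experiments, ma_map)                 # locations x experiments

# Activation likelihood: probability that at least one experiment
# "activates" a given location
ale <- 1 - apply(1 - ma, 1, prod)
plot(x, ale, type = "l", xlab = "Location (a.u.)", ylab = "ALE")
```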
... First, we find that glimpse encoding initially occurs for both talkers in HG and later with an effect of attention in STG. The observed HG encoding is in line with past studies showing that neural responses in HG encode phonetic information [52], do so at sites that respond with a greater latency [53], and are minimally modulated by attention [17]. The observed STG encoding suggests that non-target glimpses are suppressed but not eliminated from this later stage of encoding. ...
Article
Full-text available
Humans can easily tune in to one talker in a multitalker environment while still picking up bits of background speech; however, it remains unclear how we perceive speech that is masked and to what degree non-target speech is processed. Some models suggest that perception can be achieved through glimpses, which are spectrotemporal regions where a talker has more energy than the background. Other models, however, require the recovery of the masked regions. To clarify this issue, we directly recorded from primary and non-primary auditory cortex (AC) in neurosurgical patients as they attended to one talker in multitalker speech and trained temporal response function models to predict high-gamma neural activity from glimpsed and masked stimulus features. We found that glimpsed speech is encoded at the level of phonetic features for target and non-target talkers, with enhanced encoding of target speech in non-primary AC. In contrast, encoding of masked phonetic features was found only for the target, with a greater response latency and distinct anatomical organization compared to glimpsed phonetic features. These findings suggest separate mechanisms for encoding glimpsed and masked speech and provide neural evidence for the glimpsing model of speech perception.
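The temporal response function models mentioned in this abstract are, in essence, lagged linear encoding models. The R sketch below fits one to simulated data with a fixed ridge penalty; the lag range, penalty, and signals are placeholder assumptions, not the study's recordings or analysis code.

```r
# Minimal sketch of a lagged linear (ridge) encoding model, i.e. a temporal
# response function predicting a neural signal from stimulus features.
# All data here are simulated placeholders.
set.seed(1)
n    <- 2000                      # time samples
lags <- 0:20                      # stimulus lags (in samples) entering the model
stim <- rnorm(n)                  # e.g., a glimpsed phonetic-feature time series

true_trf <- 5 * dnorm(lags, mean = 8, sd = 3)               # ground-truth kernel
neural <- as.numeric(stats::filter(stim, true_trf, sides = 1))
neural[is.na(neural)] <- 0
neural <- neural + rnorm(n, sd = 0.5)                       # add measurement noise

# Design matrix: column l+1 holds the stimulus shifted by l samples
X <- sapply(lags, function(l) c(rep(0, l), stim[1:(n - l)]))

# Ridge regression (closed form); lambda would normally be cross-validated
lambda <- 10
beta <- solve(t(X) %*% X + lambda * diag(ncol(X)), t(X) %*% neural)

# Model fit: correlation between predicted and observed neural activity
pred <- X %*% beta
cor(pred, neural)
```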
... For example, place of articulation can determine 1) labial consonants with ascending formant transition towards the vowel formants, 2) alveolar consonants with descending formant transitions, and 3) palatovelar consonants with descending F2 and ascending F3 transition (Nearey, 1998). Many of the corresponding articulatory neural patterns could be mapped with multivariate approaches contributing to a greater understanding of speech processing (Aglieri et al., 2021; Archila-Meléndez et al., 2018; Arsenault & Buchsbaum, 2015; Bonte et al., 2014; Chang et al., 2010; Correia et al., 2014; Formisano et al., 2008; Gardumi et al., 2016; Kilian-Hutten et al., 2011; Lee et al., 2012; Levy & Wilson, 2020; Ley et al., 2012; Obleser et al., 2010; Preisig et al., 2022; Rampinini et al., 2017; Zhang et al., 2016). In general, vowel representations seem to be more robustly identified with MVPA than consonant representations, probably because vowels can stand for themselves while consonants unfold their full articulative spectrum only in combination with a vowel and are influenced by coarticulation and therefore more variable. ...
... Especially in studies investigating consonant representations, the question around the involvement of sensorimotor areas yielded some controversies. On the one hand, neural patterns from the temporal cortex, but not the motor or premotor cortex, reflected the three distinct articulatory features (Arsenault & Buchsbaum, 2015). On the other hand, activity from premotor and motor areas could be used to distinguish between manner of articulation (i.e. ...
Article
Full-text available
Speech perception is heavily influenced by our expectations about what will be said. In this review, we discuss the potential of multivariate analysis as a tool to understand the neural mechanisms underlying predictive processes in speech perception. First, we discuss the advantages of multivariate approaches and what they have added to the understanding of speech processing from the acoustic-phonetic form of speech, over syllable identity and syntax, to its semantic content. Second, we suggest that using multivariate techniques to measure informational content across the hierarchically organised speech-sensitive brain areas might enable us to specify the mechanisms by which prior knowledge and sensory speech signals are combined. Specifically, this approach might allow us to decode how different priors, e.g. about a speaker's voice or about the topic of the current conversation, are represented at different processing stages and how incoming speech is as a result differently represented.
... These findings link to a range of existing functional imaging evidence for engagement of motor regions during active speech perception [77,78] and representation of abstract linguistic information (e.g. articulatory gestures or phonemes) in these brain regions [79][80][81]. If the motor system-or at least its part that is involved in temporal predictions-is indeed more active during perception of intelligible speech, then it represents a promising candidate for future investigations of anatomical and functional substrates of speech rhythm perception. ...
Article
Full-text available
Auditory rhythms are ubiquitous in music, speech, and other everyday sounds. Yet, it is unclear how perceived rhythms arise from the repeating structure of sounds. For speech, it is unclear whether rhythm is solely derived from acoustic properties (e.g., rapid amplitude changes), or if it is also influenced by the linguistic units (syllables, words, etc.) that listeners extract from intelligible speech. Here, we present three experiments in which participants were asked to detect an irregularity in rhythmically spoken speech sequences. In each experiment, we reduce the number of possible stimulus properties that differ between intelligible and unintelligible speech sounds and show that these acoustically-matched intelligibility conditions nonetheless lead to differences in rhythm perception. In Experiment 1, we replicate a previous study showing that rhythm perception is improved for intelligible (16-channel vocoded) as compared to unintelligible (1-channel vocoded) speech, despite near-identical broadband amplitude modulations. In Experiment 2, we use spectrally-rotated 16-channel speech to show the effect of intelligibility cannot be explained by differences in spectral complexity. In Experiment 3, we compare rhythm perception for sine-wave speech signals when they are heard as non-speech (for naïve listeners), and subsequent to training, when identical sounds are perceived as speech. In all cases, detection of rhythmic regularity is enhanced when participants perceive the stimulus as speech compared to when they do not. Together, these findings demonstrate that intelligibility enhances the perception of timing changes in speech, which is hence linked to processes that extract abstract linguistic units from sound.
... This is especially interesting for the present study, given the inherently low-dimensional parameterization of speech that is given by articulatory features, which are a candidate characterisation of responses to speech in human auditory cortex. [Bottleneck layers which are trained alongside the other layers in a model have been shown to be superior to other methods of lowering dimensions, such as simple PCA (Grézl et al., ).] Recent electrocorticography (ECoG: Mesgarani et al., 2008, 2014; Chang et al., 2010; Di Liberto et al., 2015; Moses et al., 2016, 2018) and functional magnetic resonance imaging (fMRI: Arsenault and Buchsbaum, 2015; Correia et al., 2015) studies in humans show differential responses to speech sounds exhibiting different articulatory features in superior temporal speech areas. Heschl's gyrus (HG) and surrounding areas of the bilateral superior temporal cortices (STC) have also shown selective sensitivity to perceptual features of speech sounds earlier in the recognition process (Chan et al., 2014; Moerel et al., 2014; Saenz and Langers, 2014; Su et al., 2014; Thwaites et al., 2016). ...
Article
Full-text available
Introduction: In recent years, machines powered by deep learning have achieved near-human levels of performance in speech recognition. The fields of artificial intelligence and cognitive neuroscience have finally reached a similar level of performance, despite their huge differences in implementation, and so deep learning models can—in principle—serve as candidates for mechanistic models of the human auditory system. Methods: Utilizing high-performance automatic speech recognition systems, and advanced non-invasive human neuroimaging technology such as magnetoencephalography and multivariate pattern-information analysis, the current study aimed to relate machine-learned representations of speech to recorded human brain representations of the same speech. Results: In one direction, we found a quasi-hierarchical functional organization in human auditory cortex qualitatively matched with the hidden layers of deep artificial neural networks trained as part of an automatic speech recognizer. In the reverse direction, we modified the hidden layer organization of the artificial neural network based on neural activation patterns in human brains. The result was a substantial improvement in word recognition accuracy and learned speech representations. Discussion: We have demonstrated that artificial and brain neural networks can be mutually informative in the domain of speech recognition.
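One common way to relate network-layer representations to brain representations, as described above, is representational similarity analysis: compare the pairwise dissimilarity structure of the two systems over the same stimuli. The R sketch below does this with simulated activation patterns; it is a generic illustration, not the study's MEG or speech-recognizer pipeline, and all sizes and names are assumptions.

```r
# Generic representational-similarity sketch: correlate the dissimilarity
# structure of a (simulated) network layer with that of a (simulated) brain
# region, across the same set of stimuli.
set.seed(1)
n_stim <- 40

layer_act <- matrix(rnorm(n_stim * 100), nrow = n_stim)   # stimuli x model units
brain_act <- layer_act[, 1:60] +
             matrix(rnorm(n_stim * 60, sd = 2), nrow = n_stim)  # stimuli x channels

# Representational dissimilarity matrices (1 - correlation between patterns)
rdm_layer <- 1 - cor(t(layer_act))
rdm_brain <- 1 - cor(t(brain_act))

# Compare the two geometries with a rank (Spearman) correlation of the
# lower-triangular entries
lt <- lower.tri(rdm_layer)
cor(rdm_layer[lt], rdm_brain[lt], method = "spearman")
```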
... While the present results suggest that the right STS plays an important role in allowing listeners to integrate phonetic information and talker information, future work will be needed to examine the generalizability of these results. For instance, previous work has shown that phonetic features such as voicing, manner, and place of articulation are distributed across the superior temporal lobe (Arsenault & Buchsbaum, 2015;Mesgarani, Cheung, Johnson, & Chang, 2014), so it will be important to examine whether the right STS plays a similar role in conditioning phonetic identity on talker information for speech sound distinctions that rely on other featural differences. ...
Article
Full-text available
Though the right hemisphere has been implicated in talker processing, it is thought to play a minimal role in phonetic processing, at least relative to the left hemisphere. Recent evidence suggests that the right posterior temporal cortex may support learning of phonetic variation associated with a specific talker. In the current study, listeners heard a male talker and a female talker, one of whom produced an ambiguous fricative in /s/-biased lexical contexts (e.g., epi?ode) and one who produced it in /∫/-biased contexts (e.g., friend?ip). Listeners in a behavioral experiment (Experiment 1) showed evidence of lexically guided perceptual learning, categorizing ambiguous fricatives in line with their previous experience. Listeners in an fMRI experiment (Experiment 2) showed differential phonetic categorization as a function of talker, allowing for an investigation of the neural basis of talker-specific phonetic processing, though they did not exhibit perceptual learning (likely due to characteristics of our in-scanner headphones). Searchlight analyses revealed that the patterns of activation in the right superior temporal sulcus (STS) contained information both about who was talking and what phoneme they produced. We take this as evidence that talker information and phonetic information are integrated in the right STS. Functional connectivity analyses suggested that the process of conditioning phonetic identity on talker information depends on the coordinated activity of a left-lateralized phonetic processing system and a right-lateralized talker processing system. Overall, these results clarify the mechanisms through which the right hemisphere supports talker-specific phonetic processing.
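A searchlight analysis of the kind reported here classifies a condition label from local activation patterns at each location. The R sketch below shows the idea on simulated data, using a one-dimensional voxel axis and MASS::lda with leave-one-out cross-validation rather than 3-D spheres and the study's actual classifier; the trial counts, radius, and "talker" labels are illustrative assumptions.

```r
# Schematic searchlight sketch: for each voxel, classify "talker" from the
# activation pattern in its local neighbourhood, with leave-one-out CV.
# Data are simulated; only voxels 25-35 carry talker information.
library(MASS)
set.seed(1)

n_trials <- 80
n_voxels <- 60
talker   <- factor(rep(c("male", "female"), each = n_trials / 2))

patterns <- matrix(rnorm(n_trials * n_voxels), nrow = n_trials)
patterns[talker == "male", 25:35] <- patterns[talker == "male", 25:35] + 1

radius <- 3
accuracy <- sapply(seq_len(n_voxels), function(v) {
  idx <- max(1, v - radius):min(n_voxels, v + radius)        # local neighbourhood
  fit <- lda(patterns[, idx], grouping = talker, CV = TRUE)  # leave-one-out CV
  mean(fit$class == talker)
})

plot(accuracy, type = "l", xlab = "Voxel (searchlight centre)",
     ylab = "Classification accuracy")
abline(h = 0.5, lty = 2)   # chance level
```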
... This is especially interesting for the present study, given the inherently low-dimensional parametrisation of speech that is given by articulatory features, which are a candidate characterisation of responses to speech in human auditory cortex. Recent electrocorticography (ECoG; [9,18,37,38,43,44]) and functional magnetic resonance imaging (fMRI; [1,13]) studies in humans show differential responses to speech sounds exhibiting different articulatory features in superior temporal speech areas. Heschl's gyrus (HG) and surrounding areas of the bilateral superior temporal cortices (STC) have also shown selective sensitivity to perceptual features of speech sounds earlier in the recognition process [8, 40, 50, 56, 57]. ...
Preprint
Full-text available
How the human brain supports speech comprehension is an important question in neuroscience. Studying the neurocomputational mechanisms underlying human language is not only critical to understand and develop treatments for many human conditions that impair language and communication but also to inform artificial systems that aim to automatically process and identify natural speech. In recent years, intelligent machines powered by deep learning have achieved near human level of performance in speech recognition. The fields of artificial intelligence and cognitive neuroscience have finally reached a similar phenotypical level despite their huge differences in implementation, and so deep learning models can—in principle—serve as candidates for mechanistic models of the human auditory system. Utilizing high-performance automatic speech recognition systems, and advanced noninvasive human neuroimaging technology such as magnetoencephalography and multivariate pattern-information analysis, the current study aimed to relate machine-learned representations of speech to recorded human brain representations of the same speech. In one direction, we found a quasi-hierarchical functional organisation in human auditory cortex qualitatively matched with the hidden layers of deep neural networks trained in an automatic speech recognizer. In the reverse direction, we modified the hidden layer organization of the artificial neural network based on neural activation patterns in human brains. The result was a substantial improvement in word recognition accuracy and learned speech representations. We have demonstrated that artificial and brain neural networks can be mutually informative in the domain of speech recognition. Author summary: The human capacity to recognize individual words from the sound of speech is a cornerstone of our ability to communicate with one another, yet the processes and representations underlying it remain largely unknown. Software systems for automatic speech-to-text provide a plausible model for how speech recognition can be performed. In this study, we used an automatic speech recogniser model to probe recordings from the brains of participants who listened to speech. We found that the parts of the dynamic, evolving representations inside the machine system were a good fit for representations found in the brain recordings, both showing similar hierarchical organisations. Then, we observed where the machine’s representations diverged from the brain’s, and made experimental adjustments to the automatic recognizer’s design so that its representations might better fit the brain’s. In so doing, we substantially improved the recognizer’s ability to accurately identify words.
... It is known that different consonants are processed by discrete regions of the superior temporal gyrus (STG) of the auditory cortex, and an ECoG study showed that separate electrode clusters responded to phonemes differing in their manner of articulation and voicing (Mesgarani et al., 2014). However, an fMRI study also showed that there is substantial overlap between regions responsible for the processing of different consonant groups (Arsenault and Buchsbaum, 2015). Furthermore, the STG, which is the area of the brain showing discrete processing of phonemes, is also responsible for envelope encoding (Hamilton et al., 2020) and comprehension (Binder et al., 2000). ...
Article
Full-text available
Neural entrainment to speech appears to rely on syllabic features, especially those pertaining to the acoustic envelope of the stimuli. It has been proposed that the neural tracking of speech depends on the phoneme features. In the present electroencephalography experiment, we examined data from 25 participants to investigate neural entrainment to near-isochronous stimuli comprising syllables beginning with different phonemes. We measured the inter-trial phase coherence of neural responses to these stimuli and assessed the relationship between this coherence and acoustic properties of the stimuli designed to quantify their “edginess.” We found that entrainment was different across different classes of the syllable-initial phoneme and that entrainment depended on the amount of “edge” in the sound envelope. In particular, the best edge marker and predictor of entrainment was the latency of the maximum derivative of each syllable.
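Inter-trial phase coherence, as measured in this study, is the length of the mean unit phase vector across trials at a frequency of interest. The R sketch below computes it for simulated trials at an assumed 4 Hz syllable rate; the sampling rate, epoch length, and signals are placeholders rather than the study's EEG data or preprocessing.

```r
# Minimal sketch of inter-trial phase coherence (ITPC) at one frequency:
# take the phase of each trial at that frequency and measure how tightly
# the phases cluster across trials. Data are simulated.
set.seed(1)
fs      <- 250                       # sampling rate (Hz)
time_s  <- seq(0, 1 - 1/fs, by = 1/fs)
f0      <- 4                         # assumed syllable-rate frequency (Hz)
n_trial <- 50

# Simulated trials: a 4 Hz component with a consistent phase plus noise
trials <- replicate(n_trial,
                    cos(2 * pi * f0 * time_s + 0.2) + rnorm(length(time_s), sd = 1))

# Complex Fourier coefficient of each trial at f0
bin   <- f0 + 1                      # 1-s epoch -> 1 Hz frequency resolution
coefs <- apply(trials, 2, function(x) fft(x)[bin])

# ITPC: magnitude of the mean of the unit-length phase vectors
itpc <- Mod(mean(coefs / Mod(coefs)))
itpc                                 # near 1 = strong entrainment; near 0 = none
```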
... Another interpretation is that the identified patterns reflect the categorical perception of syllables. In line with existing data, we found that the bilateral pSTG/STS (Formisano et al., 2008; Chang et al., 2010; Kilian-Hütten et al., 2011; Mesgarani et al., 2014; Arsenault and Buchsbaum, 2015; Yi et al., 2019; Levy and Wilson, 2020), left SMG (Caplan et al., 1995; Dehaene-Lambertz et al., 2005; Zevin and McCandliss, 2005; Raizada and Poldrack, 2007; for a meta-analysis see Turkeltaub and Branch Coslett, 2010), and left IFG (Hasson et al., 2007; Myers et al., 2009; Lee et al., 2012; Chevillet et al., ...) discriminate between syllable reports in ambiguous and unambiguous stimuli. We found that all these areas also discriminate between different syllable reports when stimulus acoustics are kept constant. ...
Article
Full-text available
Which processes in the human brain lead to the categorical perception of speech sounds? Investigation of this question is hampered by the fact that categorical speech perception is normally confounded by acoustic differences in the stimulus. By using ambiguous sounds, however, it is possible to dissociate acoustic from perceptual stimulus representations. Twenty-seven normally hearing individuals took part in an fMRI study in which they were presented with an ambiguous syllable (intermediate between /da/ and /ga/) in one ear and with disambiguating acoustic feature (third formant, F3) in the other ear. Multi-voxel pattern searchlight analysis was used to identify brain areas that consistently differentiated between response patterns associated with different syllable reports. By comparing responses to different stimuli with identical syllable reports and identical stimuli with different syllable reports, we disambiguated whether these regions primarily differentiated the acoustics of the stimuli or the syllable report. We found that BOLD activity patterns in left perisylvian regions (STG, SMG), left inferior frontal regions (vMC, IFG, AI), left supplementary motor cortex (SMA/pre-SMA), and right motor and somatosensory regions (M1/S1) represent listeners’ syllable report irrespective of stimulus acoustics. Most of these regions are outside of what is traditionally regarded as auditory or phonological processing areas. Our results indicate that the process of speech sound categorization implicates decision-making mechanisms and auditory-motor transformations.
... These findings link to a range of existing functional imaging evidence for engagement of motor regions during active speech perception [75,76] and representation of abstract linguistic information (e.g. articulatory gestures or phonemes) in these brain regions [77][78][79]. ...
Preprint
Full-text available
Auditory rhythms are ubiquitous in music, speech, and other everyday sounds. Yet, it is unclear how perceived rhythms arise from the repeating structure of sounds. For speech, it is unclear whether rhythm is solely derived from acoustic properties (e.g., rapid amplitude changes), or if it is also influenced by the linguistic units (syllables, words, etc.) that listeners extract from intelligible speech. Here, we present three experiments in which participants were asked to detect an irregularity in rhythmic speech sequences. In each experiment, we reduce the number of possible stimulus properties that differ between intelligible and unintelligible speech sounds and show that these acoustically-matched intelligibility conditions nonetheless lead to differences in rhythm perception. In Experiment 1, we replicate a previous study showing that rhythm perception is improved for intelligible (16-channel vocoded) as compared to unintelligible (1-channel vocoded) speech – despite near-identical broadband amplitude modulations. In Experiment 2, we use spectrally-rotated 16-channel speech to show the effect of intelligibility cannot be explained by differences in spectral complexity. In Experiment 3, we compare rhythm perception for sine-wave speech signals when they are heard as non-speech (for naïve listeners), and subsequent to training, when identical sounds are perceived as speech. In all cases, detection of rhythmic regularity is enhanced when participants perceive the stimulus as speech compared to when they do not. Together, these findings demonstrate that intelligibility enhances the perception of speech rhythm, which is hence linked to processes that extract abstract linguistic units from sound.