Article

Merging information in speech recognition: Feedback is never necessary

Authors:
Dennis Norris, James M. McQueen, and Anne Cutler

Abstract

Top-down feedback does not benefit speech recognition; on the contrary, it can hinder it. No experimental data imply that feedback loops are required for speech recognition. Feedback is accordingly unnecessary and spoken word recognition is modular. To defend this thesis, we analyse lexical involvement in phonemic decision making. TRACE (McClelland & Elman 1986), a model with feedback from the lexicon to prelexical processes, is unable to account for all the available data on phonemic decision making. The modular Race model (Cutler & Norris 1979) is likewise challenged by some recent results, however. We therefore present a new modular model of phonemic decision making, the Merge model. In Merge, information flows from prelexical processes to the lexicon without feedback. Because phonemic decisions are based on the merging of prelexical and lexical information, Merge correctly predicts lexical involvement in phonemic decisions in both words and nonwords. Computer simulations show how Merge is able to account for the data through a process of competition between lexical hypotheses. We discuss the issue of feedback in other areas of language processing and conclude that modular models are particularly well suited to the problems and constraints of speech recognition.
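As a concrete, purely illustrative sketch of the architecture the abstract describes, the following Python fragment caricatures Merge-style phonemic decision making: activation flows strictly bottom-up from prelexical units to the lexicon, and decision nodes merge the two information sources and then compete. All names, weights, and activations are invented; this is a sketch of the idea, not the authors' simulation.

# Toy caricature of Merge: strictly bottom-up flow, with decision nodes
# that merge prelexical and lexical activation. All values are invented.
prelexical = {"b": 0.5, "p": 0.5}   # ambiguous final sound of "jo_"
words = {"job": "b"}                 # "job" is a word; "jop" is not

# Lexical units are activated bottom-up by the prelexical evidence that
# matches them; nothing ever flows back to the prelexical level.
lexical = {w: prelexical[ph] for w, ph in words.items()}

# Decision nodes merge prelexical and lexical activation (invented weights).
merged = {ph: 0.6 * act + 0.4 * sum(a for w, a in lexical.items() if words[w] == ph)
          for ph, act in prelexical.items()}

# Crude stand-in for competition (lateral inhibition) between decision nodes.
total = sum(merged.values())
decision = {ph: act / total for ph, act in merged.items()}
print(decision)  # {'b': 0.625, 'p': 0.375}: lexical involvement without feedback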

... Note that a model like this could be used to investigate many aspects of word recognition. In fact, the Merge model (Norris et al., 2000) has this structure (as well as lateral inhibition), and can simulate many important aspects of spoken word recognition, despite being unable to encode order or repeated elements. Avoiding these challenges can only be a temporary simplifying assumption, however. ...
... Any comprehensive model of spoken word recognition must be able to account for top-down effects, and feedback allows TRACE to plausibly simulate many such effects (McClelland & Elman, 1986). As discussed above, though, at least some effects considered to be "top-down" can be simulated without feedback (Norris et al., 2000). However, graceful degradation is another important motivation for feedback in interactive activation models (Dell, Chang & Griffin, 1999; McClelland & Elman, 1986 [e.g., pp. ...
... Simulation 5 is particularly critical with respect to theoretical debates in spoken word recognition. Norris, Cutler and McQueen (2000) have argued that there is no logical reason to include feedback in models of spoken word recognition. The crucial aspects of their argument are that (a) a system with feedback is more complex than one without, (b) any result that can be simulated with feedback can be simulated in a purely feedforward ("autonomous") system, and therefore (c) there can be no general benefit of feedback; the best a system can do is tune its feedforward connections to provide the best estimate of the probability of each phoneme given some stretch of input, and appealing to lexical knowledge cannot improve recognition. ...
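Point (c) has a compact Bayesian reading: an optimal feedforward system outputs P(phoneme | input), and any lexical statistics can be compiled off-line into that mapping as a prior, with no on-line feedback loop. A toy numeric illustration in Python (all probabilities invented):

likelihood = {"b": 0.5, "p": 0.5}  # P(input | phoneme): acoustically ambiguous
prior = {"b": 0.7, "p": 0.3}       # phoneme prior in this context, tuned
                                   # off-line from lexical statistics

posterior = {ph: likelihood[ph] * prior[ph] for ph in likelihood}
z = sum(posterior.values())
posterior = {ph: round(p / z, 2) for ph, p in posterior.items()}
print(posterior)  # {'b': 0.7, 'p': 0.3}: a lexically biased estimate from
                  # purely feedforward ("autonomous") computation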
Article
Full-text available
The Time-Invariant String Kernel (TISK) model of spoken word recognition (Hannagan, Magnuson & Grainger, 2013; You & Magnuson, 2018) is an interactive activation model with many similarities to TRACE (McClelland & Elman, 1986). However, by replacing most time-specific nodes in TRACE with time-invariant open-diphone nodes, TISK uses orders of magnitude fewer nodes and connections than TRACE. Although TISK performed remarkably similarly to TRACE in simulations reported by Hannagan et al., the original TISK implementation did not include lexical feedback, precluding simulation of top-down effects, and leaving open the possibility that adding feedback to TISK might fundamentally alter its performance. Here, we demonstrate that when lexical feedback is added to TISK, it gains the ability to simulate top-down effects without losing the ability to simulate the fundamental phenomena tested by Hannagan et al. Furthermore, with feedback, TISK demonstrates graceful degradation when noise is added to input, although parameters can be found that also promote (less) graceful degradation without feedback. We review arguments for and against feedback in cognitive architectures, and conclude that feedback provides a computationally efficient basis for robust constraint-based processing.
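The contrast at stake (the same network with and without lexical feedback) can be sketched as an interactive-activation settling loop with a single top-down gain; zero gain gives the feedforward, autonomous variant. This toy Python sketch uses invented parameters and a two-word lexicon, and is not the published TISK or TRACE implementation:

# Minimal settling loop: bottom-up phoneme -> word activation, with an
# optional top-down feedback term. Update rule and parameters are invented.
def settle(inputs, words, feedback=0.0, steps=20, decay=0.1):
    phon = dict(inputs)                  # phoneme-level activations
    lex = {w: 0.0 for w in words}        # word-level activations
    for _ in range(steps):
        for w, phs in words.items():     # bottom-up: phonemes -> words
            lex[w] += 0.2 * sum(phon[p] for p in phs) / len(phs) - decay * lex[w]
        for p in phon:                   # top-down: words -> phonemes
            support = sum(a for w, a in lex.items() if p in words[w])
            phon[p] += feedback * support - decay * phon[p]
    return {w: round(a, 3) for w, a in lex.items()}

words = {"cat": ("k", "a", "t"), "cap": ("k", "a", "p")}
noisy = {"k": 0.9, "a": 0.9, "t": 0.4, "p": 0.35}   # degraded final phoneme
print(settle(noisy, words, feedback=0.0))    # autonomous settling
print(settle(noisy, words, feedback=0.05))   # with lexical feedback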
... This feedback loop to phonemes from the words they occur in acts to increase their activation faster than the same phonemes in nonwords. By contrast, in the Merge model [2] information flow through the system is entirely bottom-up, from phonetic to lexical processing, never in the reverse direction. Phonemic decisions are made by a dedicated mechanism which is sensitive both to the continuous stream of incoming information from phonetic processing, and to higher-level information, e.g. from the lexicon. ...
... As described above, either top-down models such as TRACE [1] or strictly bottom-up models such as Merge [2] can account for lexically induced effects of orthography when such a task is performed with words. The models differ, though, in whether they actually predict such effects. ...
... As laid out in the introduction, the appearance of effects of orthography with words but not with nonwords is consistent with the phonemic decision-making model Merge [2], but runs counter to the predictions of the top-down model TRACE [1]. In TRACE, top-down connections operate automatically, so such a dissociation should not appear. ...
... the Motor Theory (Liberman & Mattingly, 1985), the Direct Realist Theory (Fowler, 1986), the TRACE theory (McClelland & Elman, 1986), the Fuzzy Logical Model (Massaro, 1989), the Shortlist Model (Norris, 1994), NAM (the Neighborhood Activation Model) (Goldinger et al., 1989), MERGE (Norris et al., 2000), the Dual Stream Model (Hickok & Poeppel, 2007), and the Neurocomputational Model (Kröger et al., 2009), among others. Each of these models has its own mode of operation and use. ...
... For example, in the TRACE model (McClelland & Elman, 1983, 1986), the input is mapped first onto a level of feature representations, then onto a phoneme level, and finally onto a word level. In the Cohort (Marslen-Wilson & Welsh, 1978), NAM (Goldinger et al., 1989), Shortlist (Norris, 1994), and MERGE (Norris et al., 2000) models, by contrast, the first sound heard by the listener triggers the activation of all similar-sounding entries in the mental lexicon. All activated items compete with one another until only a single entry remains and reaches threshold. ...
... Speech Perception and Listener-Specific Causes: the initial sound of the word with which this "mondegreen" begins does not match the initial sound of the corresponding word in the original lyrics. In the Cohort (Marslen-Wilson & Welsh, 1978), NAM (Goldinger et al., 1989), Shortlist (Norris, 1994), and MERGE (Norris et al., 2000) models, the first sound heard by the listener triggers the activation of all similar-sounding entries in the mental lexicon. All activated items compete with one another until only a single entry remains and reaches threshold. ...
Article
Full-text available
"Mondegreens" are word sequences that result from the mishearing or misinterpretation of a phrase or song lyric. Various linguistic studies have examined mondegreens in many languages other than Turkish and have described their phonetic or phonological properties. This study examines Turkish mondegreens from a linguistic perspective and evaluates the phonetic and phonological causes of their emergence with respect to the Turkish language. To this end, the literature on mondegreens is summarized, the method and aims of the study are explained, and numerous phonetic and phonological observations are offered on a range of mondegreen examples selected from Turkish. In this way, the reasons for the emergence of Turkish mondegreens and their linguistic properties are brought to attention, and various findings are presented concerning the structure in Turkish of mishearings, which can be regarded as an obstacle to successful communication. Describing the linguistic structures of Turkish mondegreens and the reasons for their emergence will allow linguistic comparison between mondegreen structures in Turkish and in other languages, thereby enabling shared inferences about the characteristic properties of mishearings across languages and contributing to our understanding of the elements of successful communication.
... In this way, the statistical model captures an earlier snapshot of the recognition process than when the characteristics are calculated at the stimulus offset. If the effects trend in the same direction in the first and second models, this is evidence that pseudoword recognition uses the same mechanisms as real word recognition, as suggested by Luce (1986), Luce and Pisoni (1998), and Norris et al. (2000), especially since the same mechanisms are obligatorily in use before the uniqueness point. Following the fitting of the model, the results are presented and discussed. ...
... Marslen-Wilson & Tyler, 1980; W. D. Marslen-Wilson & Welsh, 1978), the Neighborhood Activation Model (Luce, 1986; Luce & Pisoni, 1998), TRACE (McClelland & Elman, 1986), TISK (You & Magnuson, 2018), naive and linear discriminative learning (Arnold et al., 2017; Baayen et al., 2011, 2019; Chuang et al., 2019, 2021), Shortlist B (Norris & McQueen, 2008), MERGE (Norris et al., 2000), and the cohort-like DIANA (ten Bosch et al., 2013; ten Bosch, Boves, Tucker, et al., 2015). Observations are also made about the nature of the auditory lexical decision task. ...
... The notion of recognising a pseudoword has historically been nebulously defined. The closest descriptions of this process come from the aforementioned neighbourhood activation model (Luce & Pisoni, 1998) and the Merge model (Norris et al., 2000). In the neighbourhood activation model, a failure to activate a lexical candidate sufficiently results in the identification of a pseudoword. ...
Article
Pseudowords are used as stimuli in many psycholinguistic experiments, yet they remain largely under-researched. To better understand the cognitive processing of pseudowords, we analysed the pseudoword responses in the Massive Auditory Lexical Decision megastudy data set. Linguistic characteristics that influence the processing of real English words (namely, phonotactic probability, phonological neighbourhood density, uniqueness point, and morphological complexity) were also found to influence the processing time of spoken pseudowords. Subsequently, we analysed how the linguistic characteristics of non-unique portions of pseudowords influenced processing time. We again found that the named linguistic characteristics affected processing time, highlighting the dynamicity of activation and competition. We argue these findings also speak to learning new words and spoken word recognition generally. We then discuss what aspects of pseudoword recognition a full model of spoken word recognition must account for. We finish with a re-description of the auditory lexical decision task in light of our results.
... However, the relative contribution of bottom-up and top-down cues to solving the segmentation problem is still a matter of debate (Cutler, 2012; Eysenck & Keane, 2020). In particular, the exact role of contextual cues is yet to be determined, especially with some researchers strongly claiming that no contextual feedback is ever necessary (Cutler, 2012; Norris et al., 2000). ...
... In particular, the relative contribution of bottom-up and top-down cues to L2 online segmentation is still unclear. The role of contextual top-down cues is a source of debate, with some researchers attributing no effect to context (Cutler, 2012; Norris et al., 2000). Evidence exists on the use of different segmentation cues by native and non-native speakers (e.g. ...
Article
Full-text available
The relative contribution of bottom-up (i.e., acoustic-phonetic) and top-down (i.e., contextual) cues for successful L2 online segmentation is still a matter of debate. This study used the gating paradigm to investigate the segmentation processes of adult L2 English listeners with different proficiency levels, by looking at the type of cues they exploit and how they revise their hypotheses as connected speech is progressively revealed. Twenty-one French and Tunisian undergraduates were selected from a larger pool (n = 226) and identified as skilled (n = 11) and unskilled (n = 10) listeners based on their scores on standardized English listening and vocabulary tests. Descriptive statistics, analysis of variance and qualitative analysis were performed on the obtained data. Overall, this study provides supporting L2 evidence for the hierarchical nature of multiple speech segmentation cues (Mattys et al., 2005). The results indicated an early effect of context on segmentation independent of L2 proficiency when the context is constraining. In non-constraining contexts, successful segmentation is delayed for both groups, with unskilled L2 listeners needing far more bottom-up information to process input and revise their segmentation hypotheses. We conclude that, in online L2 speech segmentation, what distinguishes proficient from non-proficient listeners is their efficient processing of bottom-up cues. Pedagogical implications are provided in the hope of helping L2 English teachers (and materials developers) focus on bottom-up training to improve their learners' real-time comprehension competence.
... During spoken word recognition, multiple factors affect how quickly or accurately a target word is selected. One of the key debates for theories of word recognition concerns the extent to which top-down information interacts with bottom-up information to influence word recognition (Magnuson et al., 2018;Norris et al., 2000). Regarding the effects of top-down information, there is ample evidence that predictability affects word recognition, such that predictable words are processed more efficiently than unpredictable words (Kutas & Hillyard, 1984;Staub, 2015). ...
Article
Full-text available
We investigated the predictability effects of pitch accent on word recognition using the sandhi rule in Kansai Japanese (KJ). Native KJ speakers and native Tokyo Japanese (TJ) speakers (control group) saw four objects while hearing modifier + noun phrases in a speeded image-selection task. The register tone of the noun’s initial mora was predictable or unpredictable based on the tone of the modifier’s final mora in KJ but not in TJ. Experiment 1 found faster reaction times in the predictable vs. unpredictable condition in KJ speakers but only when the modifier had an all-low tone. This suggests that the modifier ending that changes following the sandhi rule functions as a reliable cue to constrain an upcoming tone, whereas the modifier ending that remains the same does not (although the next tone is predictable). Unexpectedly, we found the same but weaker effect in TJ speakers. Experiment 2 replicated this effect and additionally showed that the facilitation effect was not because TJ speakers were exposed to KJ well enough to become familiar with the KJ sandhi rule. We speculate that the effect in TJ speakers is related to a language-universal constraint against a sequence of low tones without a high tone within a phonological word, which may urge listeners to listen for a high tone in the upcoming input.
... The original LP model (Escudero, 2005, 2009) held a sequential view where perception precedes recognition, that is, the outcome of perception is faithfully passed on to recognition. According to this view, the lexical influences on perception are explained by offline (i.e., post hoc) learning from the lexicon (see the Merge model; Norris, McQueen, & Cutler, 2000). In contrast, the revised LP model (van Leussen & Escudero, 2015) allows for an interactive view as well, in which the lexicon can influence lower-level representations during the online (i.e., ad hoc) processing of speech (see the TRACE model; McClelland & Elman, 1986). ...
Preprint
In this chapter, we thoroughly describe the L2LP model, its five ingredients to explain speech development from first contact with a language or dialect (initial state) to proficiency comparable to a native speaker of the language or dialect (ultimate attainment), and its empirical, computational, and statistical method. We present recent studies comparing different types of bilinguals (simultaneous and sequential) and explaining their differential levels of ultimate attainment in different learning scenarios. We also show that although the model has the word "perception" in its name, it was designed to also explain phonological development in general, including lexical development, speech production, and orthographic effects. The studies reviewed in the chapter include new methods for examining lexical development and speech production, via implicit word learning and corpus-based analyses respectively, as well as a novel suprasegmental example of the L2LP SUBSET problem, which was conceptualized as the reverse of the more common NEW scenario, in which L2 learners are faced with target contrasts that do not exist in their L1. We also review a recent study on the effect of bidialectalism on L2 acquisition, showing that the L2LP model's explanations apply not only to speakers of multiple languages but also to speakers of multiple dialects. Finally, we present other topics and future directions, including phonetic training, going beyond segmental phonology, and the formalisation of orthographic effects in phonological development. All in all, the chapter demonstrates that the L2LP model can be regarded as a comprehensive theoretical, computational, and probabilistic model or framework for explaining how we learn the phonetics and phonology of multiple languages (sequentially or simultaneously) with variable levels of language input throughout the life span.
... Specifically, for auditory perception, is feedback used during perception of speech, or can the speech perceptual process be feedforward? [15][16][17][18] One potential argument regarding feedback versus feedforward systems is whether bottom-up computations are affected by top-down signals. For instance, in a recent paper, RTs for visual popout targets were shorter when the trial "n" target feature was the same feature as in the preceding trial. ...
Article
Reaction times for correct vowel identification were measured to determine the effects of intertrial intervals, vowel, and cue type. Thirteen adults with normal hearing, aged 20–38 years, participated. Stimuli included three naturally produced syllables (/ba/ /bi/ /bu/) presented whole or segmented to isolate the formant transition or static formant center. Participants identified the vowel presented via loudspeaker by mouse click. Results showed a significant effect of intertrial intervals, no significant effect of cue type, and a significant vowel effect, suggesting that feedback occurs, that vowel identification may depend on cue duration, and that vowel bias may stem from focal structure.
... If human listeners are able to update higher-order triphone probabilities in response to distributional evidence, this would provide evidence that listeners encode such higher-order probabilities (cf. discussion of this point in Norris et al., 2000; Newman, 2000). Further, gradient BP effects of the sort evaluated here vary in the degree to which they are supported by native language experience. ...
Article
This study evaluates the malleability of adults' perception of probabilistic phonotactic (biphone) probabilities, building on a body of literature on statistical phonotactic learning. It was first replicated that listeners categorize phonetic continua as sounds that create higher-probability sequences in their native language. Listeners were also exposed to skewed distributions of biphone contexts, which resulted in the enhancement or reversal of these effects. Thus, listeners dynamically update biphone probabilities (BPs) and bring this to bear on perception of ambiguous acoustic information. These effects can override long-term BP effects rooted in native language experience.
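To make the mechanics concrete, biphone probabilities can be estimated from counts of adjacent phoneme pairs, and the dynamic updating reported here can be mimicked by re-estimating after skewed exposure. A minimal Python sketch over an invented mini-corpus (one character per phoneme):

from collections import Counter

def biphone_probs(words):
    # Relative frequencies of adjacent phoneme pairs across a word list.
    counts = Counter()
    for w in words:
        counts.update(zip(w, w[1:]))
    total = sum(counts.values())
    return {bp: c / total for bp, c in counts.items()}

corpus = ["kat", "kap", "tak"]                  # stand-in long-term experience
long_term = biphone_probs(corpus)
updated = biphone_probs(corpus + ["kap"] * 10)  # skewed short-term exposure

# Exposure raises the probability of the /ap/ biphone, mimicking the
# dynamic updating of phonotactic knowledge described in the abstract.
print(round(long_term[("a", "p")], 3), "->", round(updated[("a", "p")], 3))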
... signals with top-down knowledge and context, a longstanding debate concerns whether integration occurs within perceptual pathways (e.g., via feedback from lexical to sublexical representations) or post-perceptually. Norris, McQueen, and Cutler (2000) claimed that all valid top-down effects could be explained by post-perceptual integration, and that feedback is not necessary and could not help word recognition. Space does not permit a full treatment of this topic, so we refer readers to recent publications in this debate that include substantial reviews (Magnuson et al., 2018 vs. ...
Article
Psycholinguists define spoken word recognition (SWR) as, roughly, the processes intervening between speech perception and sentence processing, whereby a sequence of speech elements is mapped to a phonological wordform. After reviewing points of consensus and contention in SWR, we turn to the focus of this review: considering the limitations of theoretical views that implicitly assume an idealized (neurotypical, monolingual adult) and static perceiver. In contrast to this assumption, we review evidence that SWR is plastic throughout the life span and changes as a function of cognitive and sensory changes, modulated by the language(s) someone knows. In highlighting instances of plasticity at multiple timescales, we are confronted with the question of whether these effects reflect changes in content or in processes, and we consider the possibility that the two are inseparable. We close with a brief discussion of the challenges that plasticity poses for developing comprehensive theories of spoken language processing.
... In this account, speech perception is modeled as the process by which a continuous stream of speech is mapped onto lexical representations, without a mediating level of phonemically categorized representations. Implementations of this theory using recurrent neural networks have been shown to simulate the effect of subcategorical mismatch on the lexical access process (Gaskell & Marslen-Wilson, 1997b; Marslen-Wilson & Warren, 1994; however, see Norris, McQueen, & Cutler, 2000). They have also been able to simulate effects of lexical competition and bottom-up mismatch without directly implemented inhibitory connections between lexical units (Gaskell & Marslen-Wilson, 1999). ...
Article
Full-text available
Two gating studies, a forced-choice identification study and 2 series of cross-modal repetition priming experiments, traced the time course of recognition of words with onset embeddings (captain) and short words in contexts that match (cap tucked) or mismatch (cap looking) with longer words. Results suggest that acoustic differences in embedded syllables assist the perceptual system in discriminating short words from the start of longer words. The ambiguity created by embedded words is therefore not as severe as predicted by models of spoken word recognition based on phonemic representations. These additional acoustic cues combine with post-offset information in identifying onset-embedded words in connected speech.
... In contrast, context sensitivity to previous words and sentences is widely documented in language-related responses such as the N400 (Kutas et al., 1980, 2011), which is prominent in centroparietal electrodes ~400 ms post word onset. Despite evidence on either side (McClelland and Elman, 1986; Norris et al., 2000; Davis and Johnsrude, 2007; Travis et al., 2013), one prevailing view is that high-level linguistic context predictively shapes pre-lexical representation (Kuperberg and Jaeger, 2016), which has received some electrophysiological support in natural speech comprehension (e.g. Broderick et al. 2019). ...
Preprint
Full-text available
To transform speech into words, the human brain must accommodate variability across utterances in intonation, speech rate, volume, accents and so on. A promising approach to explaining this process has been to model electroencephalogram (EEG) recordings of brain responses to speech. Contemporary models typically invoke speech categories (e.g. phonemes) as an intermediary representational stage between sounds and words. However, such categorical models are typically hand-crafted and therefore incomplete because they cannot speak to the neural computations that putatively underpin categorization. By providing end-to-end accounts of speech-to-language transformation, new deep-learning systems could enable more complete brain models. We here model EEG recordings of audiobook comprehension with the deep-learning system Whisper. We find that (1) Whisper provides an accurate, self-contained EEG model of speech-to-language transformation; (2) EEG modeling is more accurate when including prior speech context, which pure categorical models do not support; (3) EEG signatures of speech-to-language transformation depend on listener attention.
... Phonotactic probability is a statistical encoding of gradient well-formedness that captures what words typically 'sound' like in a language, based on the extent to which subsequences of phonemes are evidenced across a range of words in that language. A body of work has established that speakers, listeners, and learners of a language have detailed statistical knowledge of phonotactic probabilities, which they can leverage for tasks such as segmenting words from a stream of speech [19][20][21][22][23][24] and judging the well-formedness of non-words [25][26][27][28][29]. Results from the speech perception literature show that phonotactic probability can affect lexical access [30][31][32][33]. ...
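As a concrete illustration of one of these measures, phonological neighbourhood density is commonly operationalised as the number of lexical entries differing from a target by a single phoneme substitution, deletion, or addition. A minimal Python sketch over an invented mini-lexicon (one character per phoneme):

def is_neighbour(a, b):
    # True if a and b differ by one substitution, deletion, or addition.
    if a == b:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    short, long_ = sorted((a, b), key=len)
    if len(long_) - len(short) != 1:
        return False
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))

lexicon = ["kat", "bat", "kit", "at", "kats", "dog"]
print([w for w in lexicon if is_neighbour("kat", w)])  # ['bat', 'kit', 'at', 'kats']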
Article
Full-text available
Most non-Māori-speaking New Zealanders are regularly exposed to Māori throughout their lives without seeming to build any extensive Māori lexicon; at best, they know a small number of words which are frequently used and sometimes borrowed into English. Here, we ask how many Māori words non-Māori-speaking New Zealanders know, in two ways: how many can they identify as real Māori words, and how many can they actively define? We show that non-Māori-speaking New Zealanders can readily identify many more Māori words than they can define, and that the number of words they can reliably define is quite small. This result adds crucial support to the idea presented in earlier work that non-Māori-speaking New Zealanders have implicit form-based (proto-lexical) knowledge of many Māori words, but explicit semantic (lexical) knowledge of few. Building on this distinction, we further ask how different levels of word knowledge modulate effects of phonotactic probability on the accessing of that knowledge, across both tasks and participants. We show that participants' implicit word knowledge leads to effects of phonotactic probability (and related effects of neighbourhood density) in a word/non-word discrimination task, but not in a more explicit task that requires the active definition of words. Similarly, we show that the effects of phonotactic probability on word/non-word discrimination are strong among participants who appear to lack explicit word knowledge, as indicated by their weak discrimination performance, but absent among participants who appear to have explicit word knowledge, as indicated by their strong discrimination performance. Together, these results suggest that phonotactic probability plays its strongest roles in the absence of explicit semantic knowledge.
... Certainly, there are advantages to studying these topics in isolation: the lack of invariance problem is far more tractable when considered independently from auditory attention, and it is far more practical to study the cocktail party problem without the added complications of talker-specific phonetic variation. Furthermore, extant models of speech perception (e.g., Kleinschmidt & Jaeger, 2015; Magnuson et al., 2020; McClelland & Elman, 1986; Norris et al., 2000) and of auditory attention (Cusack et al., 2004; Holmes et al., 2021; Shamma et al., 2011) have explained a considerable amount of empirical data. Nevertheless, given that some phenomena, such as the emergence of multitalker processing costs, may be driven by both speech perception and auditory attention mechanisms, it is useful to attempt to consider how these mechanisms might interact. ...
Article
Full-text available
Though listeners readily recognize speech from a variety of talkers, accommodating talker variability comes at a cost: Myriad studies have shown that listeners are slower to recognize a spoken word when there is talker variability compared with when talker is held constant. This review focuses on two possible theoretical mechanisms for the emergence of these processing penalties. One view is that multitalker processing costs arise through a resource-demanding talker accommodation process, wherein listeners compare sensory representations against hypothesized perceptual candidates and error signals are used to adjust the acoustic-to-phonetic mapping (an active control process known as contextual tuning). An alternative proposal is that these processing costs arise because talker changes involve salient stimulus-level discontinuities that disrupt auditory attention. Some recent data suggest that multitalker processing costs may be driven by both mechanisms operating over different time scales. Fully evaluating this claim requires a foundational understanding of both talker accommodation and auditory streaming; this article provides a primer on each literature and also reviews several studies that have observed multitalker processing costs. The review closes by underscoring a need for comprehensive theories of speech perception that better integrate auditory attention and by highlighting important considerations for future research in this area.
... In a later section, we review some of the fundamental discoveries that have followed from attempts to solve the lack-ofinvariance problem. We also review research based on other sources of variation in the signal and the problem of how the perceiver manages to bind meaningless units of perception into larger, meaningful ones (McClelland & Elman, 1986;Norris et al., 2000;Yi et al., 2019). Exemplar-based theory sidesteps the lack of invariance problem by embracing variability in the form of detailed lexical representations (Casserly & Pisoni, 2010). ...
Chapter
Acoustic theories assume that speech perception begins with an acoustic signal transformed by auditory processing. In classical acoustic theory, this assumption entails perceptual primitives that are akin to those identified in the spectral analyses of speech. The research objective is to link these primitives with phonological units of traditional descriptive linguistics via sound categories and then to understand how these units/categories are bound together in time to recognize words. Achieving this objective is challenging because the signal is replete with variation, making the mapping of signal to sound category nontrivial. Research that grapples with the mapping problem has led to many basic findings about speech perception, including the importance of cue redundancy to category identification and of differential cue weighting to category formation. Research that grapples with the related problem of binding categories into words for speech processing motivates current neuropsychological work on speech perception. The central focus on the mapping problem in classical theory has also led to an alternative type of acoustic theory, namely, exemplar-based theory. According to this type of acoustic theory, variability is critical for processing talker-specific information during speech processing. The problems associated with mapping acoustic cues to sound categories is not addressed because exemplar-based theories assume that perceptual traces of whole words are perceptual primitives. Smaller units of speech sound representation, as well as the phonology as a whole, are emergent from the word-based representations. Yet, like classical acoustic theories, exemplar-based theories assume that production is mediated by a phonology that has no inherent motor information. The presumed disconnect between acoustic and motor information during perceptual processing distinguishes acoustic theories as a class from other theories of speech perception.
... The extra cognitive effort and time provide an opportunity for orthographic information to be utilized to reinforce the correct phonological representation. In other words, the type of information that comes into play varies depending on task demands (Cutler et al., 2010; Norris et al., 2000), and orthographic information may be assigned less weight than phonological information as a processing cue because it is accessed indirectly, requiring an additional step (i.e., orthographic information is accessed via phonological representation). As a result, orthographic information does not play an important role when the processing effort is low. ...
Article
Full-text available
The relationship between the ways in which words are pronounced and spelled has been shown to affect spoken word processing, and a consistent relationship between pronunciation and spelling has been reported as a possible cause of unreduced pronunciations being easier to process than reduced counterparts although reduced pronunciations occur more frequently. In the present study, we investigate the effect of pronunciation-to-spelling consistency for reduced and unreduced pronunciations in L1 and L2 listeners of a logographic language. More precisely, we compare L1 and L2 Japanese listeners to probe whether they use orthographic information differently when processing reduced and unreduced speech. Using pupillometry, the current study provides evidence that extends the hypothesis about the role of orthography in the processing of reduced speech. Orthographic realization matters in processing for L1 and L2 advanced listeners. More specifically, how consistent the orthographic realization is with its phonological form (phonology-to-orthography consistency) modulates the extent to which reduced pronunciation induces additional processing costs. The results are further discussed in terms of their implications for how listeners process reduced speech and the role of the orthographic form in speech processing.
... This approach has been supported by many scholars in speech science and psycholinguistics (see e.g., Cutler et al. 1987). Some studies even go beyond arguing for bottom-up processing to stress that speech processing is monodirectional in a strictly bottom-up manner, such that top-down processing could hinder word recognition and speech intelligibility (e.g., Norris et al. 2000). On this thesis, the flow of information from segments to words (or even sentences) is always necessary for word recognition, whereas backward feedback from words to segments is unnecessary, and perhaps implausible. ...
Article
We examined the contributions of segment type (consonants vs. vowels) and segment ratio to word recognition in Arabic sentences, a language that has a nonconcatenative morphological system in which consonants indicate semantic information, while vowels indicate structural information. In two experiments (with a balanced vowel-to-consonant ratio in Experiment 1 and an imbalanced ratio in Experiment 2), we presented participants with spoken sentences in Modern Standard Arabic, in which either consonants or vowels had been replaced by silence, and asked them to report what they could understand. The results indicate that consonants play a much greater role than vowels, both for balanced and also imbalanced sentences. The results also show greater word recognition for stimuli that contained a higher ratio of consonants to vowels. These results support and supplement previous findings on the role of consonantal roots in word recognition in Semitic languages, but clearly differ from those previously reported for non-Semitic languages which highlight the role of vowels in word recognition at the sentence level. We interpret this within the framework of root-and-pattern morphology, and further argue that segmental effects on word recognition and speech processing are crucially modulated by morphological structure.
... This shows that there was less shared information between stimulus pairs; hence, once prior knowledge is available, expected sensory information can be cancelled out when the stimulus is presented (Blank & Davis, 2016). This finding, which was replicated by Sohoglu and Davis (2020), rules out both a pure bottom-up account (without influence of prior knowledge; see Norris et al., 2000) and sharpening theories of speech perception, according to which effects of prior knowledge and sensory quality should be additive (McClelland & Elman, 1986). Similarly, misperception induced by partially deviating prior information (written words preceding degraded spoken words) was explained by multivariate representations of prediction errors (Blank et al., 2018). ...
Article
Full-text available
Speech perception is heavily influenced by our expectations about what will be said. In this review, we discuss the potential of multivariate analysis as a tool to understand the neural mechanisms underlying predictive processes in speech perception. First, we discuss the advantages of multivariate approaches and what they have added to the understanding of speech processing from the acoustic-phonetic form of speech, over syllable identity and syntax, to its semantic content. Second, we suggest that using multivariate techniques to measure informational content across the hierarchically organised speech-sensitive brain areas might enable us to specify the mechanisms by which prior knowledge and sensory speech signals are combined. Specifically, this approach might allow us to decode how different priors, e.g. about a speaker's voice or about the topic of the current conversation, are represented at different processing stages and how incoming speech is as a result differently represented.
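For readers unfamiliar with the approach, the core of such multivariate analyses is decoding: testing whether condition identity can be read out from multi-channel response patterns. A minimal sketch using scikit-learn on simulated data (not real EEG; all sizes and effect strengths invented):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_channels = 100, 32
pattern = rng.normal(0, 1, n_channels)        # invented condition-specific pattern
X = rng.normal(0, 1, (2 * n_trials, n_channels))
y = np.repeat([0, 1], n_trials)               # e.g., expected vs. unexpected speech
X[y == 1] += 0.3 * pattern                    # condition 1 weakly carries the pattern

# Cross-validated decoding accuracy above chance (0.5) indicates that the
# multivariate response contains information about the condition.
acc = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
print(f"decoding accuracy: {acc:.2f}")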
... a) 有 一只 鹰 在 天上 飞 (You3 yi4 zhi1 ying1 zai4 tian1 shang4 fei1), "There is an eagle flying in the sky" (low-surprisal sentence); b) * 有 一只 鹰 在 天上 肥 (You3 yi4 zhi1 ying1 zai4 tian1 shang4 fei2), "There is an eagle gaining weight in the sky" (high-surprisal sentence). The path from the acoustic signal to units of perception is many-to-many. Speech (segmental) processing frameworks have identified two high-level mechanisms: bottom-up processing and top-down processing. Many early models assumed bottom-up processing as a first attempt, e.g. the Cohort Model (Marslen-Wilson 1978, 1987), Direct Perception (Gibson 1954) and Direct Realism (Fowler 1986), and acoustic-landmark accounts (Stevens 2002, 2008), with Shortlist (Norris 1994) and Merge (Norris, Cutler and McQueen 2000) on the opposing side. Current models tend to incorporate both top-down and bottom-up processes in speech perception, but the relative weighting and integration of these sources of information remains unclear. ...
... To achieve this feat, listeners must map the incoming speech signal onto abstract phonological representations. A hallmark of theories of native (e.g., Luce & Pisoni, 1998; Magnuson et al., 2020; Norris & McQueen, 2008; Norris et al., 2000) and non-native (e.g., Shook & Marian, 2013) spoken-word recognition is that activation cascades from sub-lexical to lexical levels, where words consistent with the incoming signal compete for recognition. ...
Article
Full-text available
Listeners frequently recognize spoken words in the presence of background noise. Previous research has shown that noise reduces phoneme intelligibility and hampers spoken-word recognition, especially for non-native listeners. In the present study, we investigated how noise influences lexical competition in both the non-native and the native language, reflecting the degree to which both languages are co-activated. We recorded the eye movements of native Dutch participants as they listened to English sentences containing a target word while looking at displays containing four objects. On target-present trials, the visual referent depicting the target word was present, along with three unrelated distractors. On target-absent trials, the target object (e.g., wizard) was absent. Instead, the display contained an English competitor, overlapping with the English target in phonological onset (e.g., window), a Dutch competitor, overlapping with the English target in phonological onset (e.g., wimpel, 'pennant'), and two unrelated distractors. Half of the sentences were masked by speech-shaped noise; the other half were presented in quiet. Compared to speech in quiet, noise delayed fixations to the target objects on target-present trials. For target-absent trials, we observed that the likelihood of fixation biases towards the English and Dutch onset competitors (over the unrelated distractors) was larger in noise than in quiet. Our data thus show that the presence of background noise increases lexical competition in the task-relevant non-native (English) and in the task-irrelevant native (Dutch) language. The latter reflects stronger interference of one's native language during non-native spoken-word recognition under adverse conditions.
... On its surface, this type of task seems an ideal way to assess speech categorization: it is simple and directly captures listeners' identification of the speech stimulus. Indeed, this task has clear, uncontroversial applications in studies that assess changes in category boundaries, like in perceptual learning (Kraljic and Samuel, 2005; Kraljic et al., 2008; Norris et al., 2000), accent adaptation (Reinisch and Holt, 2014; Sumner, 2011), talker normalization (Johnson et al., 1999; Strand and Johnson, 1996), and context effects (Coady et al., 2007; Holt, 2006). Moreover, the fairly straightforward nature of the task makes it amenable to use with diverse populations [e.g., young children (Hazan and Barrett, 2000; Slawinski and Fitzgerald, 1998), people with language impairments (Robertson et al., 2009; Serniclaes, 2006; Sussman, 1993), and non-native speakers (Aoyama et al., 2004; Goriot et al., 2020; Sebastián-Gallés, 2011; Sebastián-Gallés and Bosch, 2002)]. ...
Article
Research on speech categorization and phoneme recognition has relied heavily on tasks in which participants listen to stimuli from a speech continuum and are asked to either classify each stimulus (identification) or discriminate between them (discrimination). Such tasks rest on assumptions about how perception maps onto discrete responses that have not been thoroughly investigated. Here, we identify critical challenges in the link between these tasks and theories of speech categorization. In particular, we show that patterns that have traditionally been linked to categorical perception could arise despite continuous underlying perception and that patterns that run counter to categorical perception could arise despite underlying categorical perception. We describe an alternative measure of speech perception using a visual analog scale that better differentiates between processes at play in speech categorization, and we review some recent findings that show how this task can be used to better inform our theories.
... Yet, systematic short-term deviations from these norms are as commonplace as chatting with a talker with an unexpected accent or recognizing a spouse's voice despite their head cold. There is substantial evidence that speech comprehension initially suffers when speech input shifts away from long-term norms, but comprehension improves with exposure to expectation-violating input like a foreign accent (Bertelson et al., 2003; Bradlow & Bent, 2008; Clarke & Garrett, 2004; Greenspan et al., 1988; Hervais-Adelman et al., 2011; Idemaru & Holt, 2011; Maye et al., 2008; Norris et al., 2000; Samuel & Kraljic, 2009; Schwab et al., 1985; Vroomen et al., 2007). Yet, most of what we know has come from behavioral paradigms for which distinct levels of auditory and decisional processes are unavoidably intertwined and impossible to dissociate. ...
Preprint
Full-text available
Speech perception presents an exemplary model of how neurobiological systems flexibly adjust when input departs from the norm. Dialects, accents, and even head colds can negatively impact comprehension by shifting speech from listeners’ expectations. Comprehension improves with exposure to shifted speech regularities, but there is no neurobiological model of this rapid learning. We used electroencephalography to examine human auditory cortical responses to utterances that varied only in fundamental frequency (F0, perceived as voice pitch) as we manipulated the statistical distributions of speech acoustics across listening contexts. Participants overtly categorized speech sampled across two acoustic dimensions that signal /b/ from /p/ (voice onset time [VOT] and F0) to model typical English speech regularities or an expectation-violating accent. These blocks were interleaved with passive exposure to two F0-distinguished test stimuli presented in an oddball ratio to elicit a cortical mismatch negativity (MMN) response. F0 robustly influenced speech categorization when short-term regularities aligned with English, but F0 exerted no influence in the context of the accent. The short-term regularities modulated event-related potentials evoked by F0-distinguished test stimuli across both N1 and P3 temporal windows and, for P3 amplitude, there was a strong correlation with perceptual down-weighting of F0. The influence of the short-term regularities persisted to impact MMN in interleaved passive listening blocks when regularities mirrored English, but was absent when regularities conveyed the accent. Thus, cortical response is modulated as a function of statistical regularities of the listening context, likely reflecting both early dimension encoding and later categorization.

Significance Statement
Speech perception is a quintessential example of how neurobiological systems flexibly adjust when input departs from the norm. Perception is well-tuned to native-language speech patterns. Yet it adjusts when speech diverges from expectations, as with a foreign accent. We observe that the effectiveness of specific cues in speech, like the pitch of a voice, in signaling phonemes like /b/ versus /p/ is dynamically re-weighted when speech violates native-language expectations. We find that this re-weighting is mirrored both in cortical responses that reflect early acoustic dimension encoding and in later responses linked to phoneme categorization. The results implicate dynamic adjustments in the mapping of speech to cortical representations, as modulated by statistical regularities experienced across local speech input.
... A similar conclusion is reached by Solé (2002, p. 682), who argues that trills are perceptually salient because their trilled manner of articulation results in "a clearly modulated signal, distinct from other speech segments". If we additionally follow the assumption of Norris et al. (2000), that phonetic-categorization tasks can lead to the generation of meta-linguistic decision nodes for that task, we can start formulating how our results might be explained. It might be that such decision nodes are activated more strongly by salient trills than by less salient fricatives, which in turn might explain their strong effects on selective adaptation. ...
Article
Full-text available
In three experiments, we examined selective adaptation of German /r/ depending on the positional and allophonic overlap between adaptors and targets. A previous study had shown that selective adaptation effects with /r/ in Dutch require allophonic overlap between adaptor and target. We aimed at replicating this finding in German, which also has many allophones of /r/. German post-vocalic /r/ is often vocalized, and pre-vocalic /r/ can occur in at least three forms: uvular fricative [ʁ], uvular trill [ʀ] and alveolar trill [r]. We tested selective adaptation between these variants. The critical questions were whether allophonic overlap is necessary for adaptation or whether phonemic overlap is sufficient to generate an adaptation effect. Surprisingly, our results show that both assumptions are wrong: adaptation does not require allophonic overlap between adaptors and target, and neither is phonemic overlap sufficient. Even more surprisingly, trilled adaptors led to more adaptation for a uvular-fricative target than uvular-fricative adaptors themselves. We suggest that the perceptual salience of the adaptors may be a hitherto underestimated influence on selective adaptation.
... All the models shared the assumption that speech perception involves activation of lexical representations via sublexical features, as originally proposed by the Cohort model (Marslen-Wilson, 1975, 1987). However, they differed in whether the process is entirely bottom-up, as stated in the Merge model (Norris et al., 2000) and the fuzzy logical model (Massaro, 1989), or interactive, as maintained by the TRACE model (McClelland and Elman, 1986) and Adaptive Resonance Theory (Grossberg, 1980). The TRACE model was identified in our co-citation analysis as an important research cohort, and Norris D., the originator of a series of models including the Merge model, was identified in the citation analysis as an impactful researcher. ...
Article
Full-text available
Based on 6,407 speech perception research articles published between 2000 and 2020, a bibliometric analysis was conducted to identify leading countries, research institutes, researchers, research collaboration networks, high-impact research articles, and central research themes and trends in speech perception research. Analysis of highly cited articles and researchers indicated three foundational theoretical approaches to speech perception, namely the motor theory, direct realism and the computational approach, as well as four non-native speech perception models, namely the Speech Learning Model, the Perceptual Assimilation Model, the Native Language Magnet model, and the Second Language Linguistic Perception model. Citation networks, term frequency analysis and co-word networks revealed several central research topics: audio-visual speech perception, spoken word recognition, and bilingual and infant/child speech perception and learning. Two directions for future research were also identified: (1) speech perception by clinical populations, such as children with hearing loss who have cochlear implants, and speech perception across the lifespan, including infants and the aged population; (2) application of neurocognitive techniques to investigate the activation of different brain regions during speech perception. Our bibliometric analysis can facilitate research advancements and future collaborations among linguists, psychologists and brain scientists by offering a bird's-eye view of this interdisciplinary field.
... However, subsequent extensions of these theories have taken the influence of top-down information into consideration (e.g., TRACE [5], Acoustic Landmarks and Distinctive Features [6]). Others have eschewed any role for top-down processing (e.g., Shortlist [7]; Merge [8]). Current models tend to incorporate both top-down and bottom-up processes in speech perception, but the relative weighting and integration of these sources of information remains unclear. ...
... However, speech perception models diverge in their assumptions about whether higher-level lexical information extends downward into prelexical perceptual mechanisms. An autonomous framework posits a unidirectional flow of activation where, for example, the lexical representations do not affect processing in the prelexical representations that feed up to them (e.g., Norris et al. 2000). In contrast, an interactive model conceives of bidirectional flows of information between different levels of activation nodes, which allows lexical information to reshape the perception of prelexical units (McClelland and Elman 1986). ...
Article
Idiosyncratic perceptual compensation behaviors are considered to have a bearing on the perceptual foundation of sound change. We investigate how compensation processes driven by lexical and coarticulatory contexts simultaneously affect listeners’ perception of a single segment, and the individual differences in the compensation patterns. Sibilants on an /s-ʃ/ continuum were embedded into four lexical frames that differed in whether the lexical context favored /s/ or /ʃ/ perceptually and whether the vocalic context favored /s/ or not. Forty-two participants completed a lexical decision task in which they decided whether each stimulus was a word or not. They also completed the autism-spectrum quotient questionnaire. The aggregate results of the lexical decision task show the coexistence of lexically induced and coarticulatorily induced perceptual shifts in parallel. A negative correlation was found between the two kinds of perceptual shifts for individual listeners in lexical decisions, lending support to a potential trade-off between compensation magnitudes on different levels of cue integration. No correlation was found between the perceptual shifts of individuals and the results of the autism-spectrum quotient questionnaire.
Article
We recently reported strong, replicable (i.e., replicated) evidence for lexically mediated compensation for coarticulation (LCfC; Luthra et al., 2021), whereby lexical knowledge influences a prelexical process. Critically, evidence for LCfC provides robust support for interactive models of cognition that include top‐down feedback and is inconsistent with autonomous models that allow only feedforward processing. McQueen, Jesse, and Mitterer (2023) offer five counter‐arguments against our interpretation; we respond to each of those arguments here and conclude that top‐down feedback provides the most parsimonious explanation of extant data.
Article
We used eye-tracking during natural reading to study how semantic control and representation mechanisms interact for the successful comprehension of sentences, by manipulating sentence context and single-word meaning. Specifically, we examined whether a word’s semantic characteristic (concreteness) affects first fixation and gaze durations (FFDs and GDs) and whether it interacts with the predictability of a word. We used a linear mixed effects model including several possible psycholinguistic covariates. We found a small but reliable main effect of concreteness and replicated a predictability effect on FFDs, but we found no interaction between the two. The results parallel previous findings of additive effects of predictability (context) and frequency (lexical level) in fixation times. Our findings suggest that the semantics of a word and the context created by the preceding words additively influence early stages of word processing in natural sentence reading.
Chapter
This book is a definitive reference source for the growing, increasingly more important, and interdisciplinary field of computational cognitive modeling, that is, computational psychology. It combines breadth of coverage with definitive statements by leading scientists in this field. Research in computational cognitive modeling explores the essence of cognition and various cognitive functionalities through developing detailed, process-based understanding by specifying computational mechanisms, structures, and processes. Given the complexity of the human mind and its manifestation in behavioral flexibility, process-based computational models may be necessary to explicate and elucidate the intricate details of the mind. The key to understanding cognitive processes is often in fine details. Computational models provide algorithmic specificity: detailed, exactly specified, and carefully thought-out steps, arranged in precise yet flexible sequences. These models provide both conceptual clarity and precision at the same time. This book substantiates this approach through overviews and many examples.
Article
In six experiments we explored how biphone probability and lexical neighborhood density influence listeners' categorization of vowels embedded in nonword sequences. We found independent effects of each. Listeners shifted categorization of a phonetic continuum to create a higher probability sequence, even when neighborhood density was controlled. Similarly, listeners shifted categorization to create a nonword from a denser neighborhood, even when biphone probability was controlled. Next, using a visual world eye-tracking task, we determined that biphone probability information is used rapidly by listeners in perception. In contrast, task complexity and irrelevant variability in the stimuli interfere with neighborhood density effects. These results support a model in which both biphone probability and neighborhood density independently affect word recognition, but only biphone probability effects are observed early in processing.
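As a rough illustration of the two statistics at issue, the sketch below computes simplified versions of biphone probability and neighborhood density over a hypothetical five-word mini-lexicon; the definitions are deliberately reduced (mean biphone relative frequency; neighbors by one substitution, insertion, or deletion), whereas published norms are position-specific and corpus-derived.

```python
# Toy versions of the two statistics: biphone probability (sublexical)
# and neighborhood density (lexical). Lexicon entries are hypothetical
# phoneme strings, and both definitions are simplified assumptions.
from collections import Counter

LEXICON = ["kat", "bat", "kab", "rat", "kit"]

def biphones(word):
    return [word[i:i + 2] for i in range(len(word) - 1)]

counts = Counter(bp for w in LEXICON for bp in biphones(w))
total = sum(counts.values())

def biphone_probability(word):
    """Mean relative frequency of the word's biphones in the lexicon."""
    bps = biphones(word)
    return sum(counts[bp] / total for bp in bps) / len(bps)

def is_neighbor(a, b):
    """One substitution, insertion, or deletion apart (not identical)."""
    if a == b:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) == 1:
        short, long_ = sorted((a, b), key=len)
        return any(long_[:i] + long_[i + 1:] == short
                   for i in range(len(long_)))
    return False

def neighborhood_density(word):
    return sum(is_neighbor(word, w) for w in LEXICON)

print(biphone_probability("kat"), neighborhood_density("kat"))
```

The dissociation reported above corresponds to these two quantities pulling in opposite directions: higher biphone probability facilitates processing, while a denser neighborhood feeds lexical competition.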
Article
Selective auditory attention has been shown to modulate the cortical representation of speech. This effect has been well documented in acoustically challenging environments. However, the influence of top-down factors, in particular topic familiarity, on this process remains unclear, despite evidence that semantic information can promote speech-in-noise perception. Moreover, beyond the individual features that make up a static listening condition, dynamic and irregular changes of auditory scenes (volatile listening environments) have been less studied. To address these gaps, we explored the influence of topic familiarity and volatile listening on the selective auditory attention process during dichotic listening, using electroencephalography. When stories with unfamiliar topics were presented, participants' comprehension was severely degraded. However, their cortical activity selectively tracked the speech of the target story well. This implies that topic familiarity hardly influences the speech-tracking neural index, possibly when bottom-up information is sufficient. However, when the listening environment was volatile and listeners had to re-engage with new speech whenever the auditory scene changed, the neural correlates of the attended speech were degraded. In particular, the cortical response to the attended speech and the spatial asymmetry of the response to left versus right attention were significantly attenuated around 100-200 ms after speech onset. These findings suggest that volatile listening environments can adversely affect the modulatory effect of selective attention, possibly by hampering proper attention due to increased perceptual load.
Thesis
Full-text available
If words were, fortunately, completely different from one another, the subject would only have to make the effort, perhaps enormous, of remembering as many of them as needed to communicate with others; if, on the contrary, and to ease the work of memory, words shared certain symbols, or were even distinguished from one another by the order taken by a reduced set of symbols, the subject would have to make an effort to recognize how they combine, identifying the elements of the pattern and contrasting the resulting segments with other strings of symbols known from experience. As at other levels of linguistic analysis, alphabetic writing is a compromise between the economy gained by repeating a finite set of symbols in different combinations and redundancy, the use of a finite series of symbols without exhausting all combinatorial possibilities, given a phono-articulatory grammar made explicit in oral language. Because not all combinatorial possibilities are exhausted, the reader has redundant cues with which to identify a word and discriminate it from similar words in the lexicon. Given the presence in language of homophones and homographs, of homonyms and synonyms, the reader cannot rely on a single type of cue, namely the symbols of an alphabet, without contrasting that input with other entries in the lexicon in a context of use, under the particular demands of the task. Segmental information in phonemes and letters does not provide sufficient cues to the identity of a word. The provision of supplementary orthographic cues, such as accentuation and punctuation, is no more than a resource for minimizing the ambiguity of an alphabetic writing system, and a sign of its intrinsic ambiguity.
Article
This book explores dialects and social differences in language computationally, examining topics such as how (and how much) linguistic differences impede intelligibility, how national borders accelerate and direct change, how opinion and hearsay shape perceptions of language differences, the role of intonation (melody), the differences between variation in pronunciation and vocabulary, and techniques for recognising structure in larger collections of linguistic data. The computational investigations engage more traditional work deeply, and a panel discussion focuses on the opportunities and risks of pursuing humanities research using computational science. There is also an introduction that attempts to sketch perspectives from which to approach the individual contributions.
Article
Research suggests that individuals differ in the degree to which they rely on lexical information to support speech perception. However, the locus of these differences is not yet known; nor is it known whether these individual differences reflect a context-dependent “state” or a stable listener “trait.” Here we test the hypothesis that individual differences in lexical reliance are a stable trait that is linked to individuals' relative weighting of lexical and acoustic-phonetic information for speech perception. At each of two sessions, listeners (n = 73) completed a Ganong task, a phonemic restoration task, and a locally time-reversed speech task – three tasks that have been used to demonstrate a lexical influence on speech perception. Robust lexical effects on speech perception were observed for each task in the aggregate. Individual differences in lexical reliance were stable across sessions; however, relationships among the three tasks in each session were weak. For the Ganong and locally time-reversed speech tasks, increased reliance on lexical information was associated with weaker reliance on acoustic-phonetic information. Collectively, these results (1) provide some evidence to suggest that individual differences in lexical reliance for a given task are a stable reflection of the relative weighting of acoustic-phonetic and lexical cues for speech perception in that task, and (2) highlight the need for a better understanding of the psychometric characteristics of tasks used in the psycholinguistic domain to build theories that can accommodate individual differences in mapping speech to meaning.
Article
Full-text available
This paper focuses on the question of the representation of nasality as well as speakers’ awareness and perceptual use of phonetic nasalisation by examining surface nasalisation in two types of vowels in Bengali: underlying nasal vowels (CṼC) and nasalised vowels before a nasal consonant (CVN). A series of three cross-modal forced-choice experiments was used to investigate the hypothesis that only unpredictable nasalisation is stored and that this sparse representation governs how listeners interpret vowel nasality. Visual full-word targets were preceded by auditory primes consisting of CV segments of CVC words with nasal vowels ([tʃɑ̃] for [tʃɑ̃d] ‘moon’), oral vowels ([tʃɑ] for [tʃɑl] ‘unboiled rice’) or nasalised oral vowels ([tʃɑ̃(n)] for [tʃɑ̃n] ‘bath’) and reaction times and errors were measured. Some targets fully matched the prime while some matched surface or underlying representation only. Faster reaction times and fewer errors were observed after CṼC primes compared to both CVC and CVN primes. Furthermore, any surface nasality was most frequently matched to a CṼC target unless no such target was available. Both reaction times and error data indicate that nasal vowels are specified for nasality leading to faster recognition compared to underspecified oral vowels, which cannot be perfectly matched with incoming signals.
Preprint
Research on speech categorization and phoneme recognition has relied heavily on tasks in which participants listen to stimuli from a speech continuum, and are asked to either classify each stimulus (identification) or discriminate between them (discrimination). Such tasks rest on assumptions about how perception maps onto discrete responses – assumptions that have not been thoroughly investigated. Here we identify critical challenges in the link between these tasks and theories of speech perception. In particular, we show that patterns that have traditionally been linked to categorical perception could arise despite continuous underlying perception; and that patterns that run counter to categorical perception could arise despite underlying categorical perception. We describe an alternative measure of speech perception using a Visual Analogue Scale that better differentiates between processes at play in speech perception, and review some recent findings that show how this task can be used to better inform our theories.
Thesis
Human listeners achieve quick and effortless speech comprehension in daily life and can easily adopt new words into their vocabulary. However, the mechanisms underlying spoken word recognition and learning remain to be better understood. This thesis examines the neural and functional mechanisms of spoken word recognition and memory encoding using a competitor priming paradigm: prior presentation (priming) of a competitor spoken word (e.g. hijack) is followed by the presentation of a similar-sounding word sharing the same initial segments (e.g. hygiene). Consistent with Bayes' rule, the prior probability of the competitor word is increased by the earlier exposure, which can in turn change the perception or memory encoding of the target word. The MEG study described in Chapter 2 examined the neural implementation of spoken word recognition by testing two distinct implementations of Bayesian perceptual inference. Competitive-selection accounts (e.g. TRACE) propose direct competition between lexical units, such that inhibition of irrelevant candidates leads to selection of critical words, while predictive-selection accounts (e.g. Predictive Coding) suggest that computation of prediction error, by comparing heard and predicted speech sounds, drives the update of lexical probabilities that is crucial to word recognition. The results indicated that MEG signals localised to the superior temporal gyrus (STG) showed greater neural responses evoked by competitor-primed words than unprimed words after the point at which they were uniquely identified (after /haidʒ/ in hygiene), and these stronger neural signals correlated with the longer response times caused by competitor priming. These findings were more in line with predictive neural mechanisms. Chapter 3 reports studies that investigated lexical and sub-lexical processing during spoken word recognition, specifically whether changes in lexical prediction that give rise to the competitor priming effect (longer response times) continue to be observed even when word recognition is not required for task performance. Here, a pause detection task was compared with a lexical decision task in a set of experiments to direct participants' attention to phonological or lexical processing, respectively, during the perception of prime or target items. The findings showed opposite effects of these two kinds of processing: the competitor priming effect was observable only when participants' attention was on lexical processing, while a facilitatory phonological effect was observed when the pause detection task was used and the prime item was presented with a pause inserted. These results were in accordance with the Predictive Coding account and the Distributed Cohort Model, both of which support inhibitory lexical processing and facilitatory sub-lexical processing in their respective structures. Chapter 4 describes tasks and analyses that examined the effect of competitor priming on spoken word memory encoding, using additional recognition memory data collected from the same experiments as reported in Chapters 2 and 3. Participants' memory performance was measured by how accurately they could distinguish previously heard items from foils. The findings indicated that enhanced prediction error caused by competitor priming facilitated memory encoding of words when the encoding was repeated multiple times and involved deeper lexical-semantic processing.
These findings were consistent with the PIMMS framework, which proposes that prediction error caused by the competitor priming effect should improve memory encoding. Moreover, subsequent memory analyses of the MEG data (reported in Chapter 2) showed pseudoword encoding localised to the medial temporal lobe, consistent with the initial rapid encoding stage of novel word learning in the complementary learning systems framework. In conclusion, the thesis provides evidence for a unified account in which computations of prediction error support both spoken word recognition and memory encoding, while also showing that the effects of lexical and sub-lexical processing are dissociated during these two processes.
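A toy worked example of the Bayesian reading of competitor priming described above, with invented probability values: raising the prior of the primed competitor (hijack) necessarily pulls posterior probability away from the similar-sounding target (hygiene) for the same ambiguous input.

```python
# Bayes' rule with invented numbers: posterior(word) is proportional to
# prior(word) * likelihood(input | word). Priming is modeled as a
# raised prior for the competitor; all values are hypothetical.

def posterior(priors, likelihoods):
    unnorm = {w: priors[w] * likelihoods[w] for w in priors}
    z = sum(unnorm.values())
    return {w: p / z for w, p in unnorm.items()}

# Input /haɪdʒ.../ fits both candidates equally well up to this point.
likelihoods = {"hygiene": 0.5, "hijack": 0.5}

before = posterior({"hygiene": 0.5, "hijack": 0.5}, likelihoods)
after = posterior({"hygiene": 0.4, "hijack": 0.6}, likelihoods)  # primed

print(before["hygiene"], after["hygiene"])  # 0.5 -> 0.4: slower target
```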
Article
Successful spoken-word recognition relies on an interplay between lexical and sublexical processing. Previous research demonstrated that listeners readily shift between more lexically-biased and more sublexically-biased modes of processing in response to the situational context in which language comprehension takes place. Recognizing words in the presence of background noise reduces the perceptual evidence for the speech signal and – compared to the clear – results in greater uncertainty. It has been proposed that, when dealing with greater uncertainty, listeners rely more strongly on sublexical processing. The present study tested this proposal using behavioral and electroencephalography (EEG) measures. We reasoned that such an adjustment would be reflected in changes in the effects of variables predicting recognition performance with loci at lexical and sublexical levels, respectively. We presented native speakers of Dutch with words featuring substantial variability in (1) word frequency (locus at lexical level), (2) phonological neighborhood density (loci at lexical and sublexical levels) and (3) phonotactic probability (locus at sublexical level). Each participant heard each word in noise (presented at one of three signal-to-noise ratios) and in the clear and performed a two-stage lexical decision and transcription task while EEG was recorded. Using linear mixed-effects analyses, we observed behavioral evidence that listeners relied more strongly on sublexical processing when speech quality decreased. Mixed-effects modelling of the EEG signal in the clear condition showed that sublexical effects were reflected in early modulations of ERP components (e.g., within the first 300 ms post word onset). In noise, EEG effects occurred later and involved multiple regions activated in parallel. Taken together, we found evidence – especially in the behavioral data – supporting previous accounts that the presence of background noise induces a stronger reliance on sublexical processing.
Article
Full-text available
Attempts a rapprochement between J. J. Gibson's (1961) ecological optics and a conviction that perceiving, imagining, thinking, and dreaming are similarly guided by internalizations of long-enduring constraints in the external world. Phenomena of apparent motion illustrate how alternating presentations of 2 views of an object in 3-dimensional space induce the experience of the simplest rigid twisting motion prescribed by kinematic geometry, provided that times and distances fall within certain lawfully related limits on perceptual integration. Resonance is advanced as a metaphor for not only how internalized constraints such as those of kinematic geometry operate in perception, imagery, apparent motion, dreaming, hallucination, and creative thinking, but also how such constraints can continue to operate despite structural damage to the brain.
Article
Full-text available
Quantitative predictions are made from a model for word recognition. The model has as its central feature a set of "logogens," devices which accept information relevant to a particular word response irrespective of the source of this information. When more than a threshold amount of information has accumulated in any logogen, that particular response becomes available for responding. The model is tested against data available on (1) the effect of word frequency on recognition, (2) the effect of limiting the number of response alternatives, (3) the interaction of stimulus and context, and (4) the interaction of successive presentations of stimuli. Implications of the underlying model are largely upheld. Other possible models for word recognition are discussed as are the implications of the logogen model for theories of memory.
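A minimal sketch of the logogen mechanism as described in this abstract, with hypothetical threshold and evidence values: a counter accumulates information irrespective of its source, and the word becomes available for responding once the count crosses its threshold (the word-frequency effect would correspond to lower thresholds for frequent words).

```python
# Sketch of a logogen: an evidence counter with a response threshold.
# Threshold and evidence values are invented for illustration.

class Logogen:
    def __init__(self, word, threshold):
        self.word = word
        self.threshold = threshold  # lower for high-frequency words
        self.count = 0.0

    def accept(self, evidence):
        """Accumulate information regardless of source (stimulus or
        context) and report whether the response is now available."""
        self.count += evidence
        return self.count >= self.threshold

detector = Logogen("table", threshold=3.0)
for cue in (1.2, 0.9, 1.1):  # e.g., successive stimulus and context cues
    available = detector.accept(cue)
print(available)  # True: threshold crossed, "table" can be reported
```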
Article
Full-text available
A radial grating stimulus was used to assess the effect of stimulation of the region beyond the classical surround of monkey lateral geniculate nucleus (LGN) receptive fields. The effect was measured by the differences in the responsiveness of the LGN cell center to small flashing spots between two conditions: (1) grating stationary or (2) grating rotating. The grating was present only in regions beyond the classical center and surround. The rotating grating produced changes in the flash-evoked spike response but not in the spontaneous activity in about half of the X cells and all of the Y cells. The direction of the effect was independent of the sign of the receptive field center. In a control experiment, cryogenic blockade of striate cortex reversed the effect in all cells tested. The grating effect was still present for cells having fields in that part of visual space beyond the region represented by the cooled cortical area. The effect was not a result of activation of classical extra-receptive field influences, since cells showing the effect did not exhibit shift or periphery effects or outer disinhibitory surrounds. The effect was not seen in recordings from intrageniculate retinal axons. We conclude that the radial grating affects LGN cell responsivity by activation of the corticogeniculate pathway.
Article
Full-text available
When an extraneous sound (such as a cough or tone) completely replaces a speech sound in a recorded sentence, listeners believe they hear the missing sound. The extraneous sound seems to occur during another portion of the sentence without interfering with the intelligibility of any phoneme. If silence replaces a speech sound, the gap is correctly localized and the absence of the speech sound detected.
Article
Full-text available
Theoretical considerations and diverse empirical data from clinical, psycholinguistic, and developmental studies suggest that language comprehension processes are decomposable into separate subsystems, including distinct systems for semantic and grammatical processing. Here we report that event-related potentials (ERPs) to syntactically well-formed but semantically anomalous sentences produced a pattern of brain activity that is distinct in timing and distribution from the patterns elicited by syntactically deviant sentences, and further, that different types of syntactic deviance produced distinct ERP patterns. Forty right-handed young adults read sentences presented at 2 words/sec while ERPs were recorded from over several positions between and within the hemispheres. Half of the sentences were semantically and grammatically acceptable and were controls for the remainder, which contained sentence medial words that violated (1) semantic expectations, (2) phrase structure rules, or (3) WH-movement constraints on Specificity and (4) Subjacency. As in prior research, the semantic anomalies produced a negative potential, N400, that was bilaterally distributed and was largest over posterior regions. The phrase structure violations enhanced the N125 response over anterior regions of the left hemisphere, and elicited a negative response (300-500 msec) over temporal and parietal regions of the left hemisphere. Violations of Specificity constraints produced a slow negative potential, evident by 125 msec, that was also largest over anterior regions of the left hemisphere. Violations of Subjacency constraints elicited a broadly and symmetrically distributed positivity that onset around 200 msec. The distinct timing and distribution of these effects provide biological support for theories that distinguish between these types of grammatical rules and constraints and more generally for the proposal that semantic and grammatical processes are distinct subsystems within the language faculty.
Article
Full-text available
Using the headturn preference procedure (HPP), this study extended Jusczyk and Aslin (1995) by familiarizing 7.5-month-olds with nonwords (e.g., [beyp]) and then presenting test passages containing words that differ from the nonwords in their final consonant (e.g., 'bike', [beyk]) along with new unfamiliar words. The infants did not false alarm to the test passages containing the similar-sounding words. Our results suggest that infants have detailed phonetic representations of familiar sound patterns.
Article
Full-text available
Four experiments investigated the relationship between syntactic and semantic processing. The first two experiments, which used a word-by-word reading paradigm with a makes-sense judgement, demonstrated that verb argument structure is used to construct provisional interpretations at points in a sentence where the syntactic structure is ambiguous, and that resolution of syntactic ambiguity occurs even when it is not necessary for interpretation of the input. The last two experiments used a cross-modal integration paradigm and found evidence that multiple syntactic representations are accessed or constructed at points of syntactic ambiguity just as multiple meanings are accessed at points of lexical ambiguity. The experimental results are evaluated with regard to serial autonomous models, strongly and weakly interactive models, and a hybrid model proposed here.
Article
Full-text available
Recent experimental results in the visual cortex of cats and monkeys have suggested an important role for synchronization of neuronal activity on a millisecond time scale. Synchronization has been found to occur selectively between neuronal responses to related image components. This suggests that not only the firing rates of neurons but also the relative timing of their action potentials is used as a coding dimension. Thus, a powerful relational code would be available, in addition to the rate code, for the representation of perceptual objects. This could alleviate difficulties in the simultaneous representation of multiple objects. In this article we present a set of theoretical arguments and predictions concerning the mechanisms that could group neurons responding to related image components into coherently active aggregates. Synchrony is likely to be mediated by synchronizing connections; we introduce the concept of an interaction skeleton to refer to the subset of synchronizing connections that are rendered effective by a particular stimulus configuration. If the image is segmented into objects, these objects can typically be segmented further into their constituent parts. The synchronization behavior of neurons that represent the various image components may accurately reflect this hierarchical clustering. We propose that the range of synchronizing interactions is a dynamic parameter of the cortical network, so that the grain of the resultant grouping process may be adapted to the actual behavioral requirements. It can be argued that different aspects of purposeful behavior rely on separable processes by which sensory input is transformed into adjustments of motor activity. Indeed, neurophysiological evidence has suggested separate processing streams originating in the primary visual cortex for object identification and sensorimotor coordination. However, such a separation calls for a mechanism that avoids interference effects in the presence of multiple objects, or when multiple motor programs are simultaneously prepared. In this article we suggest that synchronization between responses of neurons in both the visual cortex and in areas that are involved in response selection and execution might allow for a selective routing of sensory information to the appropriate motor program.
Article
Full-text available
The influence of phonology on visual word perception tasks is often indexed by the presence or absence of consistency effects. Consistency concerns whether there exists more than one way to pronounce a spelling body (e.g., _INT as in HINT and PINT versus _EAP as in HEAP and LEAP). The present study considers a similar factor. Feedback consistency concerns whether there is more than one way to spell a pronunciation body (e.g., /_ip/ as in HEAP and DEEP versus /_Ob/ as in PROBE and GLOBE). Two experiments demonstrate a robust feedback consistency effect in visual lexical decision. Words with phonologic bodies that can be spelled more than one way (e.g., _EAP as in HEAP) produce slower correct "yes" responses than words with phonologic bodies that can be spelled only one way (e.g., _OBE as in PROBE). This result constitutes strong support for feedback, top-down models of performance in word perception tasks. Furthermore, the data suggest that previous tests of consistency effects may be misleading because they did not take into account feedback consistency.
Article
Previous work in which we compared English infants, English adults, and Hindi adults on their ability to discriminate two pairs of Hindi (non-English) speech contrasts has indicated that infants discriminate speech sounds according to phonetic category without prior specific language experience (Werker, Gilbert, Humphrey, & Tees, 1981), whereas adults and children as young as age 4 (Werker & Tees, in press) may lose this ability as a function of age and/or linguistic experience. The present work was designed to (a) determine the generalizability of such a decline by comparing adult English, adult Salish, and English infant subjects on their perception of a new non-English (Salish) speech contrast, and (b) delineate the time course of the developmental decline in this ability. The results of these experiments replicate our original findings by showing that infants can discriminate non-native speech contrasts without relevant experience, and that there is a decline in this ability during ontogeny. Furthermore, data from both cross-sectional and longitudinal studies show that this decline occurs within the first year of life, and that it is a function of specific language experience.
Article
In recent years, many new cortical areas have been identified in the macaque monkey. The number of identified connections between areas has increased even more dramatically. We report here on (1) a summary of the layout of cortical areas associated with vision and with other modalities, (2) a computerized database for storing and representing large amounts of information on connectivity patterns, and (3) the application of these data to the analysis of hierarchical organization of the cerebral cortex. Our analysis concentrates on the visual system, which includes 25 neocortical areas that are predominantly or exclusively visual in function, plus an additional 7 areas that we regard as visual-association areas on the basis of their extensive visual inputs. A total of 305 connections among these 32 visual and visual-association areas have been reported. This represents 31% of the number of pathways possible if each area were connected with all others (305 of the 32 × 31 = 992 possible directed pathways). The actual degree of connectivity is likely to be closer to 40%. The great majority of pathways involve reciprocal connections between areas. There are also extensive connections with cortical areas outside the visual system proper, including the somatosensory cortex, as well as neocortical, transitional, and archicortical regions in the temporal and frontal lobes. In the somatosensory/motor system, there are 62 identified pathways linking 13 cortical areas, suggesting an overall connectivity of about 40%. Based on the laminar patterns of connections between areas, we propose a hierarchy of visual areas and of somatosensory/motor areas that is more comprehensive than those suggested in other recent studies. The current version of the visual hierarchy includes 10 levels of cortical processing. Altogether, it contains 14 levels if one includes the retina and lateral geniculate nucleus at the bottom as well as the entorhinal cortex and hippocampus at the top. Within this hierarchy, there are multiple, intertwined processing streams, which, at a low level, are related to the compartmental organization of areas V1 and V2 and, at a high level, are related to the distinction between processing centers in the temporal and parietal lobes. However, there are some pathways and relationships (about 10% of the total) whose descriptions do not fit cleanly into this hierarchical scheme for one reason or another. In most instances, though, it is unclear whether these represent genuine exceptions to a strict hierarchy rather than inaccuracies or uncertainties in the reported assignment.
Article
Familiar melodic phrases were played repetitively with note durations ranging from 40 msec to 3.6 sec. Recognition required note durations approximating those normally used for playing melodic themes (roughly 150 msec to 1 sec per note). Additional experiments with nonmelodic sequences of tones indicated that different rules applied for nonmelodic patterns: Permuted orders of the same items could be distinguished from each other at all durations employed (10 msec to 5 sec per item). Recognition of different arrangements occurred not only when each tone differed in pitch, but also when all tones had the same pitch but differed in timbre. It was concluded that the durational limits for melodic recognition are not based on perceptual limits applicable to tonal patterns in general, but rather reflect special rules governing melodic organization. Hypotheses concerning the bases for these rules are suggested.
Chapter
This chapter describes a chunk of an ongoing research program. The theoretical framework guided and informed research that developed a solution to a fundamental problem of cognitive psychology. The problem is to make models perform interesting cognition without being overly sensitive to perturbations in the input. Achieving this goal depends on two basic design principles: robustness requires compensatory integration, and complex computation requires structural diversity. The classic form of continuous integration is addition. FuzzyProp allows (1) that the fundamental or primitive concepts/features are fuzzy predicates that may hold more or less in a given situation and (2) that the connective operators are also fuzzy in that they preserve the fuzziness introduced by the primitives.
Chapter
This chapter reveals that the stubborn rejection of phonology in the prevailing theories of reading cannot be sustained within a consistent theory of language processing that accommodates all of the facts, not just those that are convenient. The bulk of research on word identification using English language materials has been taken to implicate the dominance of a visual access route with, perhaps, an optional but not preferred phonological route. Data on word identification using Serbo-Croatian language materials point unequivocally to a nonoptional phonological access route. The basic mechanism of written language processing is assumed to be the same for all languages. Different data patterns among languages, therefore, are to be taken as evidence of the ways in which that mechanism can be fine-tuned by the structure of a particular language. Some differences and similarities among Serbo-Croatian, English, and Hebrew are used to elucidate possible features of a written language processing mechanism that would allow such patterns to arise. Given the nature of the data that have been obtained with Serbo-Croatian, such a mechanism must allow for automatic prelexical phonology. The chapter discusses the assumption that all writing systems are phonological: they provide a system for transcribing phonologically any possible word of the language. The various orthographies do this in more or less straightforward ways, resulting in their being phonologically shallow or deep.
Article
This article suggests a psychophysiological foundation for cognitive theory, and more generally for goal-oriented or purposive behavior. Of all my articles, this is the one which drives deepest into uncharted territory. I say this partly because new implications of my own concepts and constructions in the article are still crystallizing in my mind.
Article
The experiment reported in this paper used a delayed same/different sentence matching task with concurrent measurement of eye movements to investigate three questions: whether pragmatic plausibility effects are restricted to certain phrasal environments; how rapidly such effects are shown in on-line sentence processing; and whether they are a product of optional, high-level, inferential processes. The results clearly show that plausibility effects are not restricted to low-level phrasal units and that they appear to arise as a necessary consequence of the process responsible for deriving basic sentence meaning. The rapid and highly localized nature of the effects supports a view of sentence processing involving incremental interpretation of the earliest available syntactic representations. We argue that the apparently mandatory nature of plausibility effects, coupled with their insensitivity to repetition context, presents difficulties for both modular and interactive views of sentence processing.
Article
Recent studies demonstrating that multiple meanings of an ambiguous word are initially accessed even when only one reading is syntactically appropriate with the preceding context can be criticized on at least two grounds. First, many of the syntactic contexts used were not truly restrictive, and, secondly, subjects may not have had time to integrate the context before processing the ambiguous word. In the present study, subjects listened to a sentence ending in an ambiguous word and then made a lexical decision to a target related to either the appropriate or inappropriate reading. Contexts were completely restrictive, and a pause was introduced between the context and the ambiguous word. Multiple access still obtained, providing further support for the claim that lexical access is not guided by syntactic context.
Article
A pervasive characteristic of casual (normal conversation) English is the apparent deletion of unstressed vowels like the first vowel in the word "support." One might suppose that if the first vowel in "support" were deleted, "support" would become homophonous with "sport." Acoustic and physiological data are reported which suggest that in fact when speakers appear to be deleting an unstressed schwa, they are often actually omitting only the oral gestures for the vowel. The glottal gestures stay much as they are in careful speech and are tied to the remaining oral gestures much as they are in careful speech. Also reported are perceptual data which suggest that listeners are sensitive to the acoustic consequences of these residual patterns, and can use this information to distinguish between "sport" and reduced versions of "support" [see also J. Fokes and Z. S. Bond, J. Acoust. Soc. Am. Suppl. 1 85, S85 (1989); J. Fokes and Z. Bond, Proc. XII Int. Congr. Phon. Sci. 4, 58-61 (1991)]. These results will be discussed in terms of the more general issue of deletion and recovery of phonetic information. [Work supported by NSF.]
Chapter
This chapter presents the pervasive influence of sentence-level context. Because sentence-level context effects are seen in the same experimental tasks that yield associative context and frequency effects, the question arises as to which, if any, of these effects originates in the lexicon. This leads to the more general question concerning the constituents of a "lexical entry": abstract orthographic and/or phonemic information only, or also the syntactic category of the word and some basic semantic information. The process that yields frequency effects for words presented in isolation is neither mandatory nor immune to sentence-level context. The influence of sentence-level context can be as powerful, and act as early, as that of a single lexically associated word, and sentence context can be used to pick out the appropriate core meaning of an ambiguous word without first passing through an early stage of indiscriminate semantic activation.
Article
The role of vowel context and consonant labeling in the selective adaptation of voiceless fricatives was examined in three experiments. This approach was designed to determine whether selective adaptation effects occurred with voiceless fricative stimuli and to determine whether any such effects had a linguistic basis as opposed to a purely auditory basis. Two synthetic fricative-vowel continua were used; one ranged from [si] to [ʃi] and the other from [su] to [ʃu]. Identification of the consonant portion of the syllables in these continua, as either [s] or [ʃ], depended on both the frequency of the friction noise and on the vowel quality. In experiment 1, the end points of the continua were used as adaptors, and the identification boundary shifted toward the adapting stimulus. In experiment 2, an ambiguous frication noise (that was identified as [s] before [u] and as [ʃ] before [i]) adapted the identification boundary in opposite directions, depending on which of the two vowels followed the noise. Thus the direction of adaptation depended on the perceptual identity of the consonant. In the final experiment, the isolated [i] and [u] vowels, and the isolated ambiguous frication noise, were demonstrated to be ineffective adaptors. The selective adaptation effects observed in these experiments were not determined by the acoustical information in the consonant or the vowel alone, but rather by the context-conditioned percept of the fricative. These results extend the reports of other research that has attempted to dissociate auditory from linguistic adaptation, provide further evidence that selective adaptation effects have multiple loci, and establish for the first time a selective adaptation effect which is unambiguously not acoustically based.
Article
Phonemic restoration is a powerful auditory illusion. When part of an utterance is replaced by another sound (e.g. white noise), listeners report that the utterance sounds intact: they perceptually restore the missing speech. Several paradigms have been used to measure this illusion, and to explore its bottom-up and top-down bases. These studies have shown that acoustic properties of the replacement sound (especially its psychoacoustic match to the speech it replaced) strongly affect the illusion. The effect also depends on listener-based factors, such as the amount of lexical activation of the tested word. The current report summarises the results of the restoration literature.
Article
We tested the predictions concerning the pronunciation of monosyllabic homographs (words such as bass that have one regular and one irregular pronunciation) made by the dual-route model, the analogy model, and the distributed model if some level of independence between processes is assumed. In Experiment 1, we found that the naming latency of monosyllabic homographs was longer than the naming latency of regular control words that were half the printed frequency of the homographs. (We also found a longer naming latency for multisyllabic homographs that differed in both stress assignment and phonemic composition (e.g., project) compared to control words, but no difference in naming latency for multisyllabic homographs that differed only in stress assignment (e.g., insult)). In addition, we found that the more frequent pronunciation was the irregular pronunciation for the majority of the monosyllabic homographs, and the naming latency for the more frequent pronunciation for these homographs was the same as or longer than the naming latency for the less frequent pronunciation. In Experiment 2, we carried out a delayed naming control experiment and did not find any naming latency differences between homographs and controls. In Experiment 3, we compared homographs with exceptions matched in frequency and found that the naming latency difference between homographs and their controls was larger than that between exceptions and their controls. Thus, the longer naming latency for monosyllabic homographs found in Experiment 1 is not simply a production effect or an exception effect. Finally, in Experiment 4, we found that for all three classes of homographs, the proportion of a given pronunciation was highly correlated with the subjective familiarity of that pronunciation. In the discussion, we argue that these results can only be supported by naming models in which the entire input string dominates sublexical constraints. Moreover, we argue that the counterintuitive data in which the more frequent pronunciation has a longer latency than the less frequent pronunciation requires two different constraints or processes that have different time courses.
Article
The TRACE model (McClelland & Elman, 1986) predicts lexically-driven inhibition at the phonemic level. This is due to the combination of top-down excitatory connections from the lexical to the phonemic level, and inhibitory connections between competing units within the phonemic level. Frauenfelder, Segui, and Dijkstra (1990, Experiment 2) tested this prediction in French and found no evidence of such inhibition. Experiment 1 of the current study replicated their results with English stimuli: Instead of having longer reaction times (RTs), targets in Inhibiting Nonwords (INWs) were detected just as fast as targets in control nonwords. Our Experiment 2 improved the design of the original experiment by adding a more appropriate control condition, increasing the number of critical items, and employing balanced target locations and conditional target probabilities. Under these conditions, RTs to INWs were significantly faster than baseline RTs, an effect opposite in direction to the hypothesized inhibition. Experiment 3 used a dual-task paradigm to examine the attentional demands of processing different types of nonwords. In addition to performing the phoneme monitoring task as before, subjects also monitored a pure tone for frequency modulations. The RT advantage for INWs was replicated in this experiment for both phoneme and modulation targets. In Experiment 4 we replicated the INW advantage for both phoneme and modulation targets, and found that the advantage disappeared for stimuli that carried both types of targets. The results suggest that both lexical inhibition and attentional allocation affect phoneme perception; their interaction can mask the effect of each.
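To make the tested prediction explicit, here is a schematic illustration with invented gains (not actual TRACE parameters): an active word node excites one of its own phonemes top-down, and within-level inhibition from that phoneme then suppresses the competing phoneme actually present in the input.

```python
# Schematic of lexically-driven inhibition: the word "vocabulary" feeds
# back to its phoneme /l/, and /l/ laterally inhibits the /t/ heard in
# the inhibiting nonword "vocabutary". All values are illustrative.

top_down_gain = 0.4       # lexical -> phoneme excitation
lateral_inhibition = 0.3  # phoneme <-> phoneme inhibition

word_act = 0.8  # "vocabulary", activated by the matching onset
t_act = 0.6     # bottom-up support for /t/ (present in the input)
l_act = 0.1     # little bottom-up support for /l/

for _ in range(3):
    l_act += top_down_gain * word_act    # feedback boosts /l/
    t_act -= lateral_inhibition * l_act  # /l/ suppresses /t/
    t_act = max(t_act, 0.0)

print(round(t_act, 3))  # /t/ suppressed: the predicted (but, per the
                        # experiments above, unobserved) inhibition
```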
Article
Levinson's book presents a theory of generalized conversational implicature (GCI), and makes the central claim that this theory necessitates a "new view of the architecture of the theory of meaning" (p. 9). Levinson claims that to account for GCI (and other types of presumptive meanings, or preferred interpretations), it is necessary to distinguish a new level of utterance-type meaning from sentence-meaning and speaker-meaning: "This level is to capture the suggestions that the use of an expression of a certain type generally or normally carries, by default" (p. 71). The book belongs to the genre of linguistic argumentation. Expanding upon the Gricean notion of GCI (Grice 1975), the author provides numerous examples of GCI and classifies them into three categories, each category representing a different licensing heuristic. Then he discusses the implications of the theory: first, for the interface between semantics and pragmatics, and second, for syntactic theory. Throughout the presentation, the author addresses in great detail potential objections and counterarguments from alternative theories of meaning. According to the author, GCIs are defeasible inferences triggered by the speaker's choice of utterance form and lexical items because of three heuristics mutually assumed by speaker and hearer. The heuristics, which can be related to Grice's maxims, are these:
Article
Recent work (Vitevitch & Luce, 1998) investigating the role of phonotactic information in spoken word recognition suggests the operation of two levels of representation, each having distinctly different consequences for processing. The lexical level is marked by competitive effects associated with similarity neighborhood activation, whereas increased probabilities of segments and sequences of segments facilitate processing at the sublexical level. We investigated the two proposed levels in six experiments using monosyllabic and specially constructed bisyllabic words and nonwords. The results of these studies provide further support for the hypothesis that the processing of spoken stimuli is a function of both facilitatory effects associated with increased phonotactic probabilities and competitive effects associated with the activation of similarity neighborhoods. We interpret these findings in the context of Grossberg, Boardman, and Cohen's (1997) adaptive resonance theory of speech perception.
Article
Previous consistency research investigated inconsistency only in the mapping of spelling to phonology (feedforward inconsistency). The present experiments investigated whether inconsistency in the mapping of phonology to spelling (feedback inconsistency) would also affect visual word perception. In Experiment 1, we replicated the basic feedback consistency effect previously obtained by Stone, Vanhoy, and Van Orden (1997) in a lexical-decision task. Lexical-decision latencies and errors were increased when a word's phonological rhyme could be spelled in multiple ways. In Experiment 2, we showed that both feedforward and feedback consistency affected lexical-decision performance to the same extent. In Experiment 3, feedback consistency effects persisted in immediate naming, but they were smaller and less reliable than feedforward consistency effects. A portion of the feedforward consistency effects persisted in delayed naming. Our results suggest that a part of both feedforward and feedback consistency effects seem to be modulated by task specific properties (e.g., spelling check in lexical decision or ease of generating articulatory programs in naming). However, another part seems to uncover a task-independent basic principle underlying word perception: the bidirectional coupling of orthography and phonology.
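To illustrate the construct, the sketch below classifies words in a hypothetical four-word lexicon by whether their pronunciation body maps onto one spelling or several; the entries echo the HEAP/DEEP versus PROBE/GLOBE examples from the abstract but are otherwise illustrative assumptions.

```python
# Feedback (phonology -> spelling) consistency over a toy lexicon:
# a word is feedback-inconsistent if its pronunciation body can be
# spelled in more than one way. Entries are illustrative.
from collections import defaultdict

LEXICON = {  # word -> (spelling body, pronunciation body)
    "heap": ("EAP", "/ip/"),
    "deep": ("EEP", "/ip/"),
    "probe": ("OBE", "/ob/"),
    "globe": ("OBE", "/ob/"),
}

spellings_per_rhyme = defaultdict(set)
for spelling, rhyme in LEXICON.values():
    spellings_per_rhyme[rhyme].add(spelling)

def feedback_consistent(word):
    _, rhyme = LEXICON[word]
    return len(spellings_per_rhyme[rhyme]) == 1

print(feedback_consistent("heap"))   # False: /ip/ can be EAP or EEP
print(feedback_consistent("probe"))  # True: /ob/ is spelled OBE only
```

Feedforward consistency would be computed the same way in the opposite direction, counting pronunciations per spelling body; the experiments above found that both kinds of inconsistency slow lexical decision.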
Article
Current theories of spoken-word recognition posit two levels of representation and process: lexical and sublexical. By manipulating probabilistic phonotactics and similarity-neighborhood density, we attempted to determine if these two levels of representation have dissociable effects on processing. Whereas probabilistic phonotactics have been associated with facilitatory effects on recognition, increases in similarity-neighborhood density typically result in inhibitory effects on recognition arising from lexical competition. Our results demonstrated that when the lexical level is invoked using real words, competitive effects of neighborhood density are observed. However, when strong lexical effects are removed by the use of nonsense word stimuli, facilitatory effects of phonotactics emerge. These results are consistent with a two-level framework of process and representation embodied in certain current models of spoken-word recognition.
Article
This paper shows that models of sentence comprehension based on “weak interaction” between syntax and interpretive processes are theoretically well-founded. According to this theory, local syntactic ambiguities can be resolved as soon as they are encountered in a single left-to-right pass through the sentence, by distinguishing the alternative partial analyses on the basis of their semantic and referential appropriateness to the context of utterance. This “incremental-interactive” theory is compared with an alternative serial structural strategy-based model proposed by Frazier and others, including Clifton and Ferreira. Particular attention is paid to a component of this theory that has been called the “Thematic Processor”. This component has hitherto remained somewhat under-specified, but we examine a number of possible interpretations. We argue that the only reasonable interpretation of the thematic processor is as a mechanism identical in every respect with the incremental weakly interactive processor that we ourselves propose. We conclude that the additional postulation of parsing strategies, in particular Minimal Attachment, may be unnecessary.