Article

Lexical Activation Produces Potent Phonemic Percepts

Authors:
Arthur Samuel

Abstract

Theorists disagree about whether auditory word recognition is a fully bottom-up, autonomous process, or whether there is top-down processing within a more interactive architecture. The current study provides evidence for top-down lexical to phonemic activation. In several experiments, listeners labeled members of a /bI/-/dI/ test series, before and after listening to repeated presentations of various adapting sounds. Real English words (containing either a /b/ or a /d/) produced reliable adaptation shifts in labeling of the /bI/-/dI/ syllables. Critically, so did words in which the /b/ or /d/ was perceptually restored (when noise replaced the /b/ or /d/). Several control conditions demonstrated that no adaptation occurred when no phonemic restoration occurred. Similarly, no independent role in adaptation was found for lexical representations themselves. Thus, the results indicate that lexical activation can cause the perceptual process to synthesize a highly functional phonemic code. This result provides strong evidence for interactive models of word recognition.


... Thus, either lexical context modulates phonetic perception (the interactive or feedback assumption), or it has a post-perceptual influence on responses (the feedforward assumption). Another fundamental top-down effect in spoken word recognition is phoneme restoration (Samuel, 1981a, 1981b, 1996, 1997; Warren, 1970). If a phoneme in a word is replaced by silence, it leaves a salient gap, and participants have no trouble reporting that the word is not intact and can identify which phoneme is missing. ...
... In Simulation 4, we turn to another classic top-down effect using an analog to the phoneme restoration paradigm (Samuel, 1981a, 1981b, 1996, 1997; Warren, 1970). In a phoneme restoration paradigm, a phoneme is replaced either with noise or with silence (typically in a lexical context where there is only one possible completion for the replaced phoneme, e.g., #uxury or _uxury [where # indicates noise and _ indicates silence] can only be restored as luxury). ...
... If a phoneme is replaced by silence, the gap is salient, and listeners can report the precise location of the silence and which specific phoneme is missing. Another difference is that noise-replaced phonemes can drive selective adaptation (Samuel, 1997), as though the actual phoneme had been repeated, but silence cannot. The interpretation of this pattern is that noise provides sufficient bottom-up activation that the missing phoneme is "filled in" by feedback. ...
Article
Full-text available
The Time-Invariant String Kernel (TISK) model of spoken word recognition (Hannagan, Magnuson & Grainger, 2013; You & Magnuson, 2018) is an interactive activation model with many similarities to TRACE (McClelland & Elman, 1986). However, by replacing most time-specific nodes in TRACE with time-invariant open-diphone nodes, TISK uses orders of magnitude fewer nodes and connections than TRACE. Although TISK performed remarkably similarly to TRACE in simulations reported by Hannagan et al., the original TISK implementation did not include lexical feedback, precluding simulation of top-down effects, and leaving open the possibility that adding feedback to TISK might fundamentally alter its performance. Here, we demonstrate that when lexical feedback is added to TISK, it gains the ability to simulate top-down effects without losing the ability to simulate the fundamental phenomena tested by Hannagan et al. Furthermore, with feedback, TISK demonstrates graceful degradation when noise is added to input, although parameters can be found that also promote (less) graceful degradation without feedback. We review arguments for and against feedback in cognitive architectures, and conclude that feedback provides a computationally efficient basis for robust constraint-based processing.
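To make the feedback architecture concrete, here is a minimal, hypothetical interactive-activation sketch (not the TISK or TRACE implementation; the word list, parameter values, and update rule are simplifying assumptions) showing how top-down connections from word units can activate a phoneme unit that receives no bottom-up support, as in phonemic restoration:

```python
# Toy interactive-activation loop with lexical feedback (illustrative only;
# the structure and parameters are simplified assumptions, not TISK/TRACE code).

WORDS = {"alphabet": ["ae", "l", "f", "ax", "b", "eh", "t"]}

def run(input_support, n_cycles=30, bottom_up=0.10, top_down=0.05, decay=0.02):
    phon = {p: 0.0 for ps in WORDS.values() for p in ps}
    word = {w: 0.0 for w in WORDS}
    for _ in range(n_cycles):
        # Bottom-up: acoustic evidence drives phoneme units.
        for p in phon:
            phon[p] += bottom_up * input_support.get(p, 0.0)
        # Phonemes drive the words that contain them.
        for w, ps in WORDS.items():
            word[w] += bottom_up * sum(phon[p] for p in ps) / len(ps)
        # Top-down: active words feed activation back to all of their phonemes,
        # including one with no acoustic support (the "restored" segment).
        for w, ps in WORDS.items():
            for p in ps:
                phon[p] += top_down * word[w]
        # Passive decay keeps activations bounded.
        phon = {p: max(0.0, a * (1 - decay)) for p, a in phon.items()}
        word = {w: max(0.0, a * (1 - decay)) for w, a in word.items()}
    return phon, word

# The /b/ of "alphabet" is replaced by noise: every other phoneme gets support.
support = {p: 1.0 for p in WORDS["alphabet"] if p != "b"}
phon, word = run(support)
print(round(phon["b"], 3))  # > 0: the missing /b/ becomes active via feedback
```

With the feedback weight set to zero, the noise-replaced segment stays at zero activation; that contrast is the kind of top-down effect the adaptation experiments are designed to detect.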
... It has been shown repeatedly that subjects cannot distinguish reliably between the perception of a word with a real phoneme (plus white noise) and perception of the same word in which the phoneme has been replaced by noise (e.g., between "legislature" with # and "legi#lature") (Samuel, 1997, 2001). (The effect is strongest for stops (/p/, /t/, /k/, /b/, /d/, /g/) and fricatives (/f/, /s/, /v/, /z/) near the ends of long words.) ...
... Similar conclusions can be drawn from a behavioral psychological paradigm involving adaptation. Samuel (1997, 2001) reasoned that if the phonemic restoration effect is a perceptual effect, the "restored" phonemes should have the same adaptational effect on subsequent stimuli that real phonemes have. Just as seeing many dots causes a subsequent set of dots to look like fewer dots, hearing repeated /d/ sounds makes an ambiguous stimulus sound like /b/. ...
... In the version used in Samuel (1997), four-syllable words were used in which the target phoneme was in the third syllable. (I'll give examples based on the contrast between /b/ and /d/. ...
Book
Full-text available
This book argues that there is a joint in nature between seeing and thinking, perception, and cognition. Perception is constitutively iconic, nonconceptual, and nonpropositional, whereas cognition does not have these properties constitutively. The book does not appeal to “intuitions,” as is common in philosophy, but to empirical evidence, including experiments in neuroscience and psychology. The book argues that cognition affects perception, i.e., that perception is cognitively penetrable, but that this does not impugn the joint in nature. A key part of the argument is that we perceive not only low-level properties like colors, shapes, and textures but also high-level properties such as faces and causation. Along the way, the book explains the difference between perception and perceptual memory, the differences between format and content, and whether perception is probabilistic despite our lack of awareness of probabilistic properties. The book argues for perceptual categories that are not concepts, that perception need not be singular, that perceptual attribution and perceptual discrimination are equally fundamental, and that basic features of the mind known as “core cognition” are not a third category in between perception and cognition. The chapter on consciousness leverages these results to argue against some of the most widely accepted theories of consciousness. Although only one chapter is about consciousness, much of the rest of the book repurposes work on consciousness to isolate the scientific basis of perception.
... As a perceptual aftereffect, selective adaptation has long been used as an indirect measure of perceptual processing, allowing researchers to draw inferences about perceptually-relevant features of stimuli through a task that minimizes the risk of decision bias. For example, Samuel (1997) measured selective adaptation resulting from a lexically supported speech illusion, the phonemic restoration effect. Phonemic restoration was first reported by Warren (1970), who removed a segment from an utterance and replaced this segment with noise (e.g., Warren replaced the central "s" of "legislatures" with a coughing sound). ...
... This led us to reevaluate some of the selective adaptation results that originally motivated the account. To briefly restate these selective adaptation findings: McGurk adaptors produce selective adaptation to the unperceived auditory stimulus (e.g., Roberts & Summerfield, 1981;Saldaña & Rosenblum, 1994), while phonemic restoration (Samuel, 1997) and Ganong stimuli (Samuel, 2001;Samuel & Frost, 2015) support selective adaptation to a lexically-determined segment that is perceived but not present in the stimulus. These results suggest that selective adaptation follows perception when that perception is determined by lexical information, but not when it is determined by multisensory information. ...
... Notably, the distributed learning account offered by Kleinschmidt and Jaeger (2011, 2016) strongly predicts that multisensory contexts should support selective adaptation. In contrast, the lexical context selective adaptation effects have been found with: (a) the phonemic restoration effect (Samuel, 1997), in which the adapting phoneme is absent and replaced with noise; and (b) the Ganong effect (Samuel, 2001; Samuel & Frost, 2015), in which the adapting phoneme is acoustically ambiguous. In both of these cases, lexical context effects have been observed with stimuli that contain unclear (ambiguous) auditory segments devoid of any simultaneous competing information. ...
Article
Full-text available
Speech selective adaptation is a phenomenon in which repeated presentation of a speech stimulus alters subsequent phonetic categorization. Prior work has reported that lexical, but not multisensory, context influences selective adaptation. This dissociation suggests that lexical and multisensory contexts influence speech perception through separate and independent processes (see Samuel & Lieblich, 2014). However, this dissociation is based on results reported by different studies using different stimuli. This leaves open the possibility that the divergent effects of multisensory and lexical contexts on selective adaptation may be the result of idiosyncratic differences in the stimuli rather than separate perceptual processes. The present investigation used a single stimulus set to compare the selective adaptation produced by lexical and multisensory contexts. In contrast to the apparent dissociation in the literature, we find that multisensory information can in fact support selective adaptation. (PsycInfo Database Record (c) 2021 APA, all rights reserved).
... It has proven very difficult to empirically distinguish between these views, but one type of test provides a clear separation: If a lexical influence on phonemic perception is found using a measure that does not require overt identification, interactive models are supported over autonomous ones, because the latter focus on a decision-stage effect that is precluded when such decisions are not made. Samuel (1997, 2001) provided two such tests. In one, the lexical items supported phonetic perception through phonemic restoration; in the second, the Ganong effect provided the lexical support. ...
... In multiple experiments, the predicted adaptation shifts were found. Comparable findings were obtained using lexically driven phonemic restoration (Samuel, 1997). ...
... The average difference in "sh" reports as a function of adaptor condition was then tested with within-subjects paired t-tests. Adaptation effects are typically largest near the category boundary, making this metric a sensitive one (Samuel, 1997, 2001). As Fig. 1 shows, there was no significant shift for the native Hebrew speakers [t(22) = 1.165, n.s.]. ...
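As a concrete illustration of this analysis step, here is a minimal sketch of how a per-participant adaptation shift near the category boundary could be computed and tested with a within-subjects paired t-test. The data, variable names, and condition labels are hypothetical, not taken from the study.

```python
# Hypothetical adaptation-shift analysis (illustrative data only):
# rows = participants, columns = proportion of "sh" responses on boundary-region
# test items after the "s"-biased adaptor vs. after the "sh"-biased adaptor.
import numpy as np
from scipy import stats

prop_sh = np.array([
    [0.62, 0.41],
    [0.55, 0.48],
    [0.70, 0.52],
    [0.58, 0.44],
    [0.66, 0.50],
])

# Adaptation shift: repeated "sh" adaptors should reduce subsequent "sh" reports,
# so the shift is computed within participants as the condition difference.
shift = prop_sh[:, 0] - prop_sh[:, 1]
t, p = stats.ttest_rel(prop_sh[:, 0], prop_sh[:, 1])
print(f"mean shift = {shift.mean():.3f}, t({len(shift) - 1}) = {t:.2f}, p = {p:.4f}")
```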
Article
Full-text available
Second language comprehension is generally not as efficient and effective as native language comprehension. In the present study, we tested the hypothesis that lower-level processes such as lexical support for phonetic perception are a contributing factor to these differences. For native listeners, it has been shown that the perception of ambiguous acoustic–phonetic segments is driven by lexical factors (Samuel Psychological Science, 12, 348-351, 2001). Here, we tested whether nonnative listeners can use lexical context in the same way. Native Hebrew speakers living in Israel were tested with American English stimuli. When subtle acoustic cues in the stimuli worked against the lexical context, these nonnative speakers showed no evidence of lexical guidance of phonetic perception. This result conflicts with the performance of native speakers, who demonstrate lexical effects on phonetic perception even with conflicting acoustic cues. When stimuli without any conflicting cues were used, the native Hebrew subjects produced results similar to those of native English speakers, showing lexical support for phonetic perception in their second language. In contrast, native Arabic speakers, who were less proficient in English than the native Hebrew speakers, showed no ability to use lexical activation to support phonetic perception, even without any conflicting cues. These results reinforce previous demonstrations of lexical support of phonetic perception and demonstrate how proficiency modulates the use of lexical information in driving phonetic perception.
... Firestone and Scholl (2015) have emphasized the importance of disentangling "post-perceptual judgment from actual online perception" (p. 48), a point raised previously by Norris, McQueen, and Cutler (2000); see Samuel (1997, 2001) for studies that have done this in the area of spoken word recognition. ...
... Fig. 9: Accentedness ratings of eight-step continua as a function of whether the adaptor was native versus accented (error bars represent the standard error of the mean). The logic of Experiment 5B is similar to the logic Samuel (1997, 2001) has used to demonstrate that lexical context can drive the perception of phonetic segments within a word. Samuel (1997) tested whether a phonetic segment produced by phonemic restoration has the same adapting properties as a phonetic segment that is acoustically present in a word. In phonemic restoration, a segment is deleted from a word and replaced by another sound, such as white noise. ...
Article
Prior studies have reported that seeing an Asian face makes American English sound more accented. The current study investigates whether this effect is perceptual, or if it instead occurs at a later decision stage. We first replicated the finding that showing static Asian and Caucasian faces can shift people's reports about the accentedness of speech accompanying the pictures. When we changed the static pictures to dubbed videos, reducing the demand characteristics, the shift in reported accentedness largely disappeared. By including unambiguous items along with the original ambiguous items, we introduced a contrast bias and actually reversed the shift, with the Asian-face videos yielding lower judgments of accentedness than the Caucasian-face videos. By changing to a mixed rather than blocked design, so that the ethnicity of the videos varied from trial to trial, we eliminated the difference in accentedness rating. Finally, we tested participants' perception of accented speech using the selective adaptation paradigm. After establishing that an auditory-only accented adaptor shifted the perception of how accented test words are, we found that no such adaptation effect occurred when the adapting sounds relied on visual information (Asian vs. Caucasian videos) to influence the accentedness of an ambiguous auditory adaptor. Collectively, the results demonstrate that visual information can affect the interpretation, but not the perception, of accented speech.
... For instance, recent behavioral work has shown that lexical status can influence whether incoming speech is perceived as a unified auditory stream or two segregated streams (Billig, Davis, Deeks, Monstrey, & Carlyon, 2013), consistent with earlier work demonstrating lexical influences on a listener's interpretation of ambiguous speech sounds (Ganong, 1980; Warren, 1970). One particularly compelling set of behavioral results comes from a set of studies by Samuel (1997, 2001), who, like Elman and McClelland (1988), tested for top-down effects by examining whether a lexically restored phoneme could mediate a separate sublexical process. However, instead of leveraging compensation for coarticulation, Samuel leveraged the sublexical process of selective adaptation (Eimas & Corbit, 1973), a phenomenon in which exposure to a repeatedly presented stimulus (e.g., a clearly produced /d/) leads listeners to make fewer responses of that category on a subsequent test (e.g., fewer /d/ responses on a subsequent /d/-/t/ continuum). ...
... However, instead of leveraging compensation for coarticulation, Samuel leveraged the sublexical process of selective adaptation (Eimas & Corbit, 1973), a phenomenon in which exposure to a repeatedly presented stimulus (e.g., a clearly produced /d/) leads listeners to make fewer responses of that category on a subsequent test (e.g., fewer /d/ responses on a subsequent /d/-/t/ continuum). Samuel (1997) found that after repeated exposure to a lexically restored phoneme (e.g., the /d/ in arma?illo, where ? indicates a phoneme replaced by white noise), listeners were less likely to report hearing that phoneme on a subsequent test continuum (e.g., fewer /d/ responses on a /b/-/d/ continuum); that is, a lexically restored phoneme had the same influence on the selective adaptation process as an unambiguous phoneme did. ...
Article
Full-text available
A long‐standing question in cognitive science is how high‐level knowledge is integrated with sensory input. For example, listeners can leverage lexical knowledge to interpret an ambiguous speech sound, but do such effects reflect direct top‐down influences on perception or merely postperceptual biases? A critical test case in the domain of spoken word recognition is lexically mediated compensation for coarticulation (LCfC). Previous LCfC studies have shown that a lexically restored context phoneme (e.g., /s/ in Christma#) can alter the perceived place of articulation of a subsequent target phoneme (e.g., the initial phoneme of a stimulus from a tapes‐capes continuum), consistent with the influence of an unambiguous context phoneme in the same position. Because this phoneme‐to‐phoneme compensation for coarticulation is considered sublexical, scientists agree that evidence for LCfC would constitute strong support for top–down interaction. However, results from previous LCfC studies have been inconsistent, and positive effects have often been small. Here, we conducted extensive piloting of stimuli prior to testing for LCfC. Specifically, we ensured that context items elicited robust phoneme restoration (e.g., that the final phoneme of Christma# was reliably identified as /s/) and that unambiguous context‐final segments (e.g., a clear /s/ at the end of Christmas) drove reliable compensation for coarticulation for a subsequent target phoneme. We observed robust LCfC in a well‐powered, preregistered experiment with these pretested items (N = 40) as well as in a direct replication study (N = 40). These results provide strong evidence in favor of computational models of spoken word recognition that include top–down feedback.
... Yet there is much less work linking video as a computer science and multimedia construct with its linguistics perspective as a language delivery mechanism. For example, research in the linguistics domain has shown that humans do not need complete information about a word in order to recognize or understand it [29,30,38,39]. However, the literature on perceptual video compression and streaming has yet to take rich cognitive perception information into account [25]. ...
... Works such as [14,20] indicate that this effect is cross-linguistic, and that these statistical regularities arise for biological reasons [15,26] such as the inertia of moving speech articulators like the lips and tongue. Using this effect, some research studies have shown that adding noise [38,39] in place of some speech units primes humans to perceive the missing units. All these results from the psycholinguistics and cognitive science domains have found little place in the multimedia field. ...
Preprint
Full-text available
Speech as a natural signal is composed of three parts: visemes (the visual part of speech), phonemes (the spoken part of speech), and language (the imposed structure). However, video as a medium for the delivery of speech and as a multimedia construct has mostly ignored the cognitive aspects of speech delivery. For example, video applications like transcoding and compression have until now ignored how speech is delivered and heard. To close the gap between speech understanding and multimedia video applications, in this paper we present initial experiments that model the perception of visual speech and show a use case in video compression. On the other hand, in the visual speech recognition domain, existing studies have mostly modeled the task as a classification problem, ignoring the correlations between views, phonemes, visemes, and speech perception. This results in solutions that are further away from how human perception works. To bridge this gap, we propose a view-temporal attention mechanism to model both the view dependence and the visemic importance in speech recognition and understanding. We conduct experiments on three public visual speech recognition datasets. The experimental results show that our proposed method outperforms existing work by 4.99% in terms of the viseme error rate. Moreover, we show that there is a strong correlation between our model's understanding of multi-view speech and human perception. This characteristic benefits downstream applications such as video compression and streaming, where a significant number of less important frames can be compressed or eliminated while maximally preserving human speech understanding and a good user experience.
... Step 1 vowels and consonants at a surface level. For example, in my demonstrations that lexical activation can support perception of phonemes (e.g., Samuel, 1997, 2001), my point was that if a listener hears "aboli?", with the "?" representing an ambiguous mixture of "s" and "sh", the listener will perceive the consonant "sh" (whereas the percept will be of "s" if the same "?" mixture occurs in "malpracti?"). There was no intent to be precise about whether that perceived consonant should be thought of as a phoneme or allophone (or some other linguistic unit); it was simply a consonant (which psycholinguists typically consider to be a phoneme, but not in a linguistic, formal, sense). ...
... If "phoneme" were instead to be viewed as simply a vowel or consonant of the language, as the term is typically used in psycholinguistic research, then it would be easy to identify many uses in the literature. I have already mentioned one such use -the idea that lexical context can generate a "phoneme" from noise (Samuel, 1997) or from an ambiguous segment (Samuel, 2001). The most commonly invoked models of speech perception and word recognition (e.g., McClelland & Elman's (1986) TRACE model, and various models by Norris and his colleagues, e.g., Norris, McQueen, & Cutler, 2000) have phoneme-like units. ...
Article
The current study has empirical, methodological, and theoretical components. It draws heavily on two recent papers: Bowers et al. (2016) (JML, 87, 71–83) used results from selective adaptation experiments to argue that phonemes play a critical role in speech perception. Mitterer et al. (2018) (JML, 98, 77–92) responded with their own adaptation experiments to advocate instead for allophones. These studies are part of a renewed use of the selective adaptation paradigm. Empirically, the current study reports results that demonstrate that the Bowers et al. findings were artifactual. Methodologically, the renewed use of adaptation in the field is a positive development, but many recent studies suffer from a lack of knowledge of prior adaptation findings. As the use of selective adaptation grows, it will be important to draw on the considerable existing knowledge base (this literature is also relevant to the currently popular research on phonetic recalibration). Theoretically, for a half century there has been a recurring effort to demonstrate the psychological reality of various linguistic units, such as the phoneme or the allophone. The evidence is that listeners will use essentially any pattern that has been experienced often enough, not just the units that are well-suited to linguistic descriptions of language. Thus, rather than trying to identify any special perceptual status for linguistic units, psycholinguists should focus their efforts on more productive issues.
... This inefficacy contrasts with adaptors in which the critical sound is generated by lexical context. Samuel (1997) constructed adaptors in which a "b" (e.g., in "alphabet") or a "d" (e.g., in "armadillo") was replaced by white noise. The lexical context in such words caused listeners to perceive the missing "b" or "d" (via phonemic restoration: Samuel, 1981; Warren, 1970), and these lexically-driven sounds successfully generated adaptation shifts. ...
... Samuel and Lieblich (2014) tried to strengthen the audiovisual percept by making it lexical (e.g., pairing a visual "armagillo" with an auditory "armibillo" to generate the perceived adaptor "armadillo"), but the results were entirely like those from the studies using simple syllables as adaptors: the shifts were always determined by the auditory component, not by the perceived stimulus. The results from all of these studies contrast with those from multiple experiments in which the perceived identity of the adaptor was determined by lexical context, either through phonemic restoration (Samuel, 1997) or via the Ganong effect (Samuel, 2001). In the lexical cases, the perceived identity of the adaptors matched the observed adaptation shifts. ...
Data
Full-text available
... Phonemes are detected more quickly in words than nonwords (the word superiority effect; Rubin, Turvey, & Van Gelder, 1976). Listeners report hearing phonemes consistent with lexical or sentential context in locations completely replaced with noise (the phoneme restoration effect; e.g., Warren, 1970; Samuel, 1981, 1997). If a phoneme continuum is attached to a context that makes one endpoint a word and the other a nonword (e.g., /t/-/d/ attached to -ash or -ask), categorical perception boundaries shift such that more steps are identified as consistent with the lexical endpoint (Ganong, 1980; a bias is also found in word-word contexts with a frequency differential; Fox, 1984). ...
... Instead, the appropriate TP context seems to be an n-phone of dynamic length, where n resolves to word length, and thus the knowledge driving mediated compensation for coarticulation seems to be lexical. Further evidence for feedback comes from selective adaptation to restored phonemes. Samuel (1997, 2001a) has shown that "restored" phonemes (phonemes replaced with noise, but which subjects report hearing in a manner consistent with lexical or larger contexts) can drive the selective adaptation found with fully articulated phonemes. If a segment at one end of a categorical perception continuum is repeated many times, the boundary shifts toward that stimulus, such that a smaller step toward the opposite end of the continuum leads to a change in perception. ...
Chapter
Full-text available
Spoken word recognition is a distinct subsystem providing the interface among low-level perception and cognitive processes of retrieval, parsing, and interpretation. The narrowest conception of the process of recognizing a spoken word is that it starts from a string of phonemes, establishes how these phonemes should be grouped to form words, and passes these words onto the next level of processing. Some theories, though, take a broader view and blur the distinctions among speech perception, spoken word recognition, and sentence processing. The broader view of spoken word recognition has empirical and theoretical motivations. One consideration is that by assuming that the input to spoken word recognition is a string of abstract, phonemic category labels, one implicitly assumes that the nonphonemic variability carried on the speech signal is not relevant for spoken word recognition and higher levels of processing. However, if this variability and detail is not random but is lawfully related to linguistic categories, the simplifying assumption that the output of speech perception is a string of phonemes may actually be a complicating assumption.
... Block focuses on work by Arthur Samuel (1997) that well demonstrates how adaptation can settle difficult cases. Samuel replicated phoneme adaptation in the course of extending the phenomenon to phonemic restoration. ...
Article
Full-text available
1. Introduction Ned Block’s (2022)The Border Between Seeing and Thinking synthesizes a vast array of experimental results to argue that there is a ‘joint’ – a fundamental explanatory difference – between perception and cognition. Perceptual states, on his view, are constitutively iconic, non-conceptual and non-propositional; cognitive states do not possess these features constitutively, to the extent they possess them at all. It’s argued that these constitutive features mesh with perception’s function to quickly but reliably represent aspects of the here and now and help explain why clear cases of perception exhibit various empirical marks, such as adaptation, rivalry, pop-out and illusory contours.¹ Notably, Block’s characterization does not preclude (limited) top-down effects of cognition on perception. But where does language fall in relation to this joint? Might it challenge the distinction? Language – more specifically, for our purposes, utterance comprehension – is complex, involving multiple kinds of processing and representations. Where some aspects or stages fall is fairly clear: the extraction of phonemes, on the perceptual side; judgements of what was asserted, on the cognitive. But things are less clear with others. Representations of syntax in comprehension are, like typical cognitive states, amodal and not iconic. But, like typical perceptual processes, parsing is stimulus-driven, fast, (nearly) automatic and to some extent modular (Ferreira and Nye 2017). Fodor (1983: 44) thus grouped parsing (and some other aspects of linguistic comprehension) with ‘traditional’ cases of perception, arguing that these ‘input systems’ form an important natural kind. Some go further: they claim that utterance comprehension more generally is perceptual, even the assignment of meaning or content. Brogaard (2017, 2020) supports this by adverting to the kind of empirical marks – susceptibility to pop-out, adaptation etc. – that Block argues are diagnostic of clear cases of perception (Gross (Forthcoming) critically discusses empirical arguments for ‘perceiving meaning’).
... However, Dorsi et al. (2021) recently reported selective adaptation effects which were consistent with a multisensory integration-supported phonemic restoration effect (i.e. visual speech + auditory noise results in participants hearing the noise as speech; see also Samuel, 1997;Warren, 1970), suggesting that selective adaptation may be sensitive to multisensory integration in some contexts. Further research should investigate the contrasting conclusions produced by these studies, as the temporal relationship between selective adaptation and multisensory integration remains an open question. ...
Article
The McGurk effect is an illusion in which visible articulations alter the perception of auditory speech (e.g., video ‘da’ dubbed with audio ‘ba’ may be heard as ‘da’). To test the timing of the multisensory processes that underlie the McGurk effect, Ostrand et al. Cognition 151, 96–107, 2016 used incongruent stimuli, such as auditory ‘bait’ + visual ‘date’ as primes in a lexical decision task. These authors reported that the auditory word, but not the perceived (visual) word, induced semantic priming, suggesting that the auditory signal alone can provide the input for lexical access, before multisensory integration is complete. Here, we conceptually replicate the design of Ostrand et al. (2016), using different stimuli chosen to optimize the success of the McGurk illusion. In contrast to the results of Ostrand et al. (2016), we find that the perceived (i.e., visual) word of the incongruent stimulus usually induced semantic priming. We further find that the strength of this priming corresponded to the magnitude of the McGurk effect for each word combination. These findings suggest, in contrast to the findings of Ostrand et al. (2016), that lexical access makes use of integrated multisensory information which is perceived by the listener. These findings further suggest that which unimodal signal of a multisensory stimulus is used in lexical access is dependent on the perception of that stimulus.
... Listeners may leverage lexical information to accomplish this task by mapping sounds to categories that create real words in their lexicon (Luthra, Guediche, Blumstein, & Myers, 2019; Pisoni & Tash, 1974; Wedel & Fatkullin, 2017). The theory that the lexicon plays a role in allowing listeners to process phonetically variable speech is well established (e.g., Connine & Clifton Jr, 1987; Marslen-Wilson, 1984; McClelland & Elman, 1986; Samuel, 1996, 1997, 2001). Early work by Ganong (1980) established that lexical context disambiguates ambiguous speech sounds. ...
Article
Full-text available
Listeners use lexical information and the speech signal to categorize sounds and recognize words despite substantial acoustic-phonetic variation in natural speech. In diachronic mergers, where systematic variation acts to neutralize lexical contrasts, the role of the lexicon becomes less clear. We examined how lexical competition structures phonetic variability of (merging) lexical tone categories in Cantonese using three experiments. Listeners categorized tokens from lexical tone continua generated from minimal pairs (Experiment 1: Word identification) and categorized tokens from tone continua generated from word-nonword pairs (Experiment 2: Lexical decision). The presence of a lexical competitor at both continuum endpoints in Experiment 1 maintained more discrete categorization functions for non-merging tone pairs than in Experiment 2 where only one endpoint was a word. In the merging tone pairs, categorization was less discrete and the effect of lexical competition was different. Exploratory data from a goodness rating task, Experiment 3, suggest that lexical competition affects internal category structure for merging tones, but not non-merging tones. Overall, these data provide evidence that tone mergers affect phonetic category boundaries and internal category structure in the lexicon, and that, for non-merging tones, the range of acceptable phonetic variation is constrained by the presence of a lexical competitor.
... In a top-down model, on the other hand, higher levels of information (i.e., contextual information) are processed to recognize words and comprehend speech, even with limited segmental information (see, e.g., Samuel, 1997). For example, if the listener believes she has recognized the word "thrift", this higher-level information influences her perception of lower-level phonemes, biasing her to believe she has perceived the segments /θ/, /ɹ/, /ɪ/, /f/, and /t/. ...
Article
We examined the contributions of segment type (consonants vs. vowels) and segment ratio to word recognition in Arabic sentences, a language that has a nonconcatenative morphological system in which consonants indicate semantic information, while vowels indicate structural information. In two experiments (with a balanced vowel-to-consonant ratio in Experiment 1 and an imbalanced ratio in Experiment 2), we presented participants with spoken sentences in Modern Standard Arabic, in which either consonants or vowels had been replaced by silence, and asked them to report what they could understand. The results indicate that consonants play a much greater role than vowels, both for balanced and also imbalanced sentences. The results also show greater word recognition for stimuli that contained a higher ratio of consonants to vowels. These results support and supplement previous findings on the role of consonantal roots in word recognition in Semitic languages, but clearly differ from those previously reported for non-Semitic languages which highlight the role of vowels in word recognition at the sentence level. We interpret this within the framework of root-and-pattern morphology, and further argue that segmental effects on word recognition and speech processing are crucially modulated by morphological structure.
... The performance profile of the healthy control group here demonstrates phonemic restoration relatively greater for real spoken words than for pseudowords (as evidenced by a bias towards hearing interpolating noise bursts as overlaying rather than interrupting spoken words). This profile of retained phonemic restoration modulated by top-down lexical context effects is in line both with prevailing models of auditory word processing 35 and with previous work in older listeners using alternative phonemic restoration paradigms. 12,16,17 However, the nature and extent of phonemic restoration differed for the dementia groups. ...
Article
Full-text available
Phonemic restoration – perceiving speech sounds that are actually missing – is a fundamental perceptual process that ‘repairs’ interrupted spoken messages during noisy everyday listening. As a dynamic, integrative process, phonemic restoration is potentially affected by neurodegenerative pathologies but this has not been clarified. Here we studied this phenomenon in five patients with typical Alzheimer’s disease and four patients with semantic dementia, relative to twenty-two age-matched healthy controls. Participants heard isolated sounds, spoken real words and pseudowords in which noise bursts either overlaid a consonant or replaced it; a tendency to hear replaced (missing) speech sounds as present signified phonemic restoration. All groups perceived isolated noises normally and showed phonemic restoration of real words, most marked in Alzheimer patients. For pseudowords, healthy controls showed no phonemic restoration, while Alzheimer patients showed marked suppression of phonemic restoration and patients with semantic dementia contrastingly showed phonemic restoration comparable to real words. Our findings provide the first evidence that phonemic restoration is preserved or even enhanced in neurodegenerative diseases, with distinct syndromic profiles that may reflect the relative integrity of bottom-up phonological representation and top-down lexical disambiguation mechanisms in different diseases. This work has theoretical implications for predictive coding models of language and neurodegenerative disease, and for understanding cognitive ‘repair’ processes in dementia. Future research should expand on these preliminary observations with larger cohorts.
... This result indicates that predictive mechanisms are used in the PAC, and that they involve top-down processing from associative representations constraining the phonological processing of the verb. In general, top-down processing during spoken language processing is not surprising, as argued by Samuel (1981, 1997), who investigated phoneme restoration: lexical representations helped the recognition of phonemes that were replaced by another sound during spoken word processing. As word recognition is guided by lexical knowledge, the recognition of a verb inflection during subject-verb agreement is constrained by the associative information between subject and verb inflection that is obtained after hearing the subject pronoun. ...
Thesis
This thesis is an attempt to contribute more information about subject-verb agreement in spoken language processing. Subject-verb agreement contains thematic role information that informs the listener who performs the action and how many people are involved. To understand the meaning of a sentence, it is therefore essential to recognize words sharing subject-verb relationships. By recording brain activity with the electroencephalography (EEG) method, it has been found that abstract morphosyntactic features (number, person, and gender) were accessed and separately used during agreement processing in reading when morphosyntactic violations were introduced and abstract morphosyntactic features were manipulated. Despite this, we know little about the nature of the representations and processes involved in agreement processing. Using brain measures, this thesis investigates the nature of the representations operating in subject-verb agreement processing by examining two levels of representations (abstract and associative), the flexibility in accessing these two levels of representations, and the role of prediction in the computation of subject-verb dependencies. To this end, we examine subject-verb agreement in spoken language processing in French. To achieve these three aims, we conducted three studies in which we manipulated both the nature of agreement violations in terms of abstract features (single violation of the person feature, single violation of the number feature, and double violation of the person and number features) and associative representations, by contrasting pronouns that had either a high co-occurrence frequency with one verbal inflection in French language use (high associative frequency) or a low co-occurrence frequency (low associative frequency). Our ERP results elicited by spoken verbs preceded by pronouns confirmed access to abstract feature representations in spoken language as soon as the verbal inflection is recognized. Moreover, it was found that associative representations were also used in the processing of subject-verb agreement. By using the associative representations after hearing the pronoun, the cognitive system actively makes a prediction about the upcoming verbal inflection, which affects verbal processing at low levels, from the verb's initial phoneme. For the second aim, we also manipulated the task demands in two EEG experiments by using either the lexical decision task (LDT) or the noun categorization task. Our ERP results time-locked to verbs preceded by pronouns showed that there is flexibility in accessing the abstract representations, such that their access was enhanced by the lexical decision task. In contrast, sensitivity to associative representations between the pronominal subject and the verbal inflection was observed regardless of the task demands, in both the lexical decision and noun categorization tasks. Regarding the third aim, we conducted a magnetoencephalographic (MEG) study with the same stimuli as in our previous EEG experiments. In line with our previous findings, MEG data time-locked to verb onset showed an influence of associative frequency in the early stage of verbal processing, at the phonological level, in the primary auditory cortex. This suggests that higher-level representations such as associative representations were used to preactivate information related to the upcoming verbal inflection immediately after the recognition of pronouns, causing low-level processing of new information to be affected. This prediction in subject-verb agreement was also associated with activation of the inferior frontal cortex and motor areas. Overall, this thesis makes a strong contribution to the understanding of subject-verb agreement by showing flexible access to different representational levels and the role of prediction from statistical information in language use.
... Expectation effects were also found by Samuel (1981), who showed that words with a syllable replaced by noise were more likely to be reported as intact words if those words were incorporated into a sentence. Samuel (1997) later showed that phonemic restoration introduced adaptation effects similar to those predicted by previous top-down models (e.g., the TRACE model; McClelland and Elman, 1986). More recently, it has been shown that the phonemic restoration effect remains intact despite voice discontinuities before and after the noise gap. ...
Article
Full-text available
It has become widely accepted that humans use contextual information to infer the meaning of ambiguous acoustic signals. In speech, for example, high-level semantic, syntactic, or lexical information shape our understanding of a phoneme buried in noise. Most current theories to explain this phenomenon rely on hierarchical predictive coding models involving a set of Bayesian priors emanating from high-level brain regions (e.g., prefrontal cortex) that are used to influence processing at lower-levels of the cortical sensory hierarchy (e.g., auditory cortex). As such, virtually all proposed models to explain top-down facilitation are focused on intracortical connections, and consequently, subcortical nuclei have scarcely been discussed in this context. However, subcortical auditory nuclei receive massive, heterogeneous, and cascading descending projections at every level of the sensory hierarchy, and activation of these systems has been shown to improve speech recognition. It is not yet clear whether or how top-down modulation to resolve ambiguous sounds calls upon these corticofugal projections. Here, we review the literature on top-down modulation in the auditory system, primarily focused on humans and cortical imaging/recording methods, and attempt to relate these findings to a growing animal literature, which has primarily been focused on corticofugal projections. We argue that corticofugal pathways contain the requisite circuitry to implement predictive coding mechanisms to facilitate perception of complex sounds and that top-down modulation at early (i.e., subcortical) stages of processing complement modulation at later (i.e., cortical) stages of processing. Finally, we suggest experimental approaches for future studies on this topic.
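For readers unfamiliar with the basic mechanism these models assume, the following is a minimal, hypothetical Bayesian sketch of how a higher-level (e.g., lexical) prior could disambiguate an acoustically ambiguous segment; the numbers and the two-category setup are illustrative assumptions, not a model from the reviewed work.

```python
# Minimal Bayesian illustration of a top-down lexical prior disambiguating
# an acoustically ambiguous /b/-/d/ segment. All numbers are illustrative.

def posterior(likelihood, prior):
    unnorm = {k: likelihood[k] * prior[k] for k in likelihood}
    z = sum(unnorm.values())
    return {k: v / z for k, v in unnorm.items()}

# The acoustics are ambiguous: nearly equal support for /b/ and /d/.
likelihood = {"b": 0.52, "d": 0.48}

# A lexical context such as "arma_illo" strongly favors /d/.
lexical_prior = {"b": 0.1, "d": 0.9}
flat_prior = {"b": 0.5, "d": 0.5}

print(posterior(likelihood, flat_prior))     # stays ambiguous without context
print(posterior(likelihood, lexical_prior))  # posterior shifts decisively to /d/
```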
... Such effects do not end at the level of individual speech sounds. Our experience with grouping certain combinations of speech sounds into larger units (such as words) causes us to perceive the same sounds differently, depending on which word they are embedded in [17,18]. Learning to read has profound impacts on a large part of our visual cortex (e.g., [19]); the consequences can be readily appreciated by comparing the experience of looking at a familiar versus an unfamiliar writing system. ...
Article
Does language change what we perceive? Does speaking different languages cause us to perceive things differently? We review the behavioral and electrophysiological evidence for the influence of language on perception, with an emphasis on the visual modality. Effects of language on perception can be observed both in higher-level processes such as recognition and in lower-level processes such as discrimination and detection. A consistent finding is that language causes us to perceive in a more categorical way. Rather than being fringe or exotic, as they are sometimes portrayed, we discuss how effects of language on perception naturally arise from the interactive and predictive nature of perception.
... Still other work proposes that cognition emerges from coupling of the brain, body, and environment (e.g., Aydede & Robbins, 2009; Barsalou, Dutriaux, & Scheepers, 2018; Dutriaux, Clark, Papies, Scheepers, & Barsalou, 2019; Gibson, 1966, 1979; Hutchins, 1995; Newen, Bruin, & Gallagher, 2018; Thompson, 2010; Varela, Thompson, & Rosch, 2016). Finally, considerable empirical work demonstrates that higher-level cognitive processes penetrate deeply into the activity of perceptual systems, affecting their computations (Clark, 2013; Marslen-Wilson & Tyler, 1980; McClelland & Rumelhart, 1981; Muckli et al., 2015; Muckli & Petro, 2017; Murray, Boyaci, & Kersten, 2006; Rumelhart & McClelland, 1982; Samuel, 1997; Smith & Muckli, 2010). From the perspective of all this work, it appears increasingly difficult to defend the position that an autonomous impenetrable module in the brain implements cognition. ...
Article
Full-text available
According to the grounded perspective, cognition emerges from the interaction of classic cognitive processes with the modalities, the body, and the environment. Rather than being an autonomous impenetrable module, cognition incorporates these other domains intrinsically into its operation. The Situated Action Cycle offers one way of understanding how the modalities, the body, and the environment become integrated to ground cognition. Seven challenges and opportunities are raised for this perspective: (1) How does cognition emerge from the Situated Action Cycle and in turn support it? (2) How can we move beyond simply equating embodiment with action, additionally establishing how embodiment arises in the autonomic, neuroendocrine, immune, cardiovascular, respiratory, digestive, and integumentary systems? (3) How can we better understand the mechanisms underlying multimodal simulation, its functions across the Situated Action Cycle, and its integration with other representational systems? (4) How can we develop and assess theoretical accounts of symbolic processing from the grounded perspective (perhaps using the construct of simulators)? (5) How can we move beyond the simplistic distinction between concrete and abstract concepts, instead addressing how concepts about the external and internal worlds pattern to support the Situated Action Cycle? (6) How do individual differences emerge from different populations of situational memories as the Situated Action Cycle manifests itself differently across individuals? (7) How can constructs from grounded cognition provide insight into the replication and generalization crises, perhaps from a quantum perspective on mechanisms (as exemplified by simulators).
... When listeners rely on contextual information to resolve ambiguous speech acoustic input, they learn in a manner that influences how they later categorize acoustically-ambiguous speech even when the disambiguating context is no longer present (Bertelson, Vroomen, & de Gelder, 2003; Guediche, Fiez, & Holt, 2016; Idemaru & Holt, 2011; Kraljic & Samuel, 2005; Maye, Aslin, & Tanenhaus, 2008; Norris, McQueen, & Cutler, 2003). For example, listeners exposed to distorted (Davis, Johnsrude, Hervais-Adelman, Taylor, & McGettigan, 2005; Guediche, Holt, Laurent, Lim, & Fiez, 2014; Hervais-Adelman, Davis, Johnsrude, & Carlyon, 2008; Norris et al., 2003; Samuel, 1997) or accented (Bradlow & Bent, 2008; Maye et al., 2008) speech exhibit rapid perceptual learning of details of speech acoustics, such that later encounters with similarly distorted or accented speech exhibit accommodation of the experienced variability (see Guediche et al., 2014; Samuel & Kraljic, 2009), observed as increases in intelligibility or a shift in speech categorization. ...
Article
Full-text available
Speech is notoriously variable, with no simple mapping from acoustics to linguistically-meaningful units like words and phonemes. Empirical research on this theoretically central issue establishes at least two classes of perceptual phenomena that accommodate acoustic variability: normalization and perceptual learning. Intriguingly, perceptual learning is supported by learning across acoustic variability, but normalization is thought to counteract acoustic variability leaving open questions about how these two phenomena might interact. Here, we examine the joint impact of normalization and perceptual learning on how acoustic dimensions map to vowel categories. As listeners categorized nonwords as setch or satch, they experienced a shift in short-term distributional regularities across the vowels' acoustic dimensions. Introduction of this ‘artificial accent’ resulted in a shift in the contribution of vowel duration in categorization. Although this dimension-based statistical learning impacted the influence of vowel duration on vowel categorization, the duration of these very same vowels nonetheless maintained a consistent influence on categorization of a subsequent consonant via duration contrast, a form of normalization. Thus, vowel duration had a duplex role consistent with normalization and perceptual learning operating on distinct levels in the processing hierarchy. We posit that whereas normalization operates across auditory dimensions, dimension-based statistical learning impacts the connection weights among auditory dimensions and phonetic categories.
... Such effects do not end at the level of individual speech sounds. Our experience with grouping certain combinations of speech sounds into larger units (such as words) causes us to perceive the same sounds differently, depending on which word they are embedded in [17,18]. Learning to read has profound impacts on a large part of our visual cortex (e.g., [19]); the consequences can be readily appreciated by comparing the experience of looking at a familiar versus an unfamiliar writing system. ...
Preprint
Does language “reach into” perception to change what we perceive? Does speaking different languages cause us to perceive things differently? We review the behavioral and electrophysiological evidence that visual perception is shaped by both long-term experience with language and its rapid involvement in-the-moment. These effects can be observed both in higher-level processes such as recognition, and lower-level processes such as discrimination and detection. A consistent finding is that language causes us to perceive in a more categorical way. Rather than being fringe or exotic, as they are sometimes portrayed, we discuss how effects of language on perception naturally arise from the interactive and predictive nature of perception.
... This result is not unexpected given differences between the current experiment and paradigms used to study selective adaptation. Experiments on selective adaptation tend to repeat the typical pronunciation many dozens of times (e.g., Bowers, Kazanina, & Andermane, 2016; Samuel, 1989, 1997; Vroomen et al., 2007). By contrast, in the current experiment, the repeated typical sounds totaled only 24 tokens. ...
Article
Full-text available
Perceptual recalibration allows listeners to adapt to talker-specific pronunciations, such as atypical realizations of specific sounds. Such recalibration can facilitate robust speech recognition. However, indiscriminate recalibration following any atypically pronounced words also risks interpreting pronunciations as characteristic of a talker that are in reality due to incidental, short-lived factors (such as a speech error). We investigate whether the mechanisms underlying perceptual recalibration involve inferences about the causes for unexpected pronunciations. In 5 experiments, we ask whether perceptual recalibration is blocked if the atypical pronunciations of an unfamiliar talker can also be attributed to other incidental causes. We investigated 3 types of incidental causes for atypical pronunciations: the talker is intoxicated, the talker speaks unusually fast, or the atypical pronunciations occur only in the context of tongue twisters. In all 5 experiments, we find robust evidence for perceptual recalibration, but little evidence that the presence of incidental causes blocks perceptual recalibration. We discuss these results in light of other recent findings that incidental causes can block perceptual recalibration.
... According to Barsalou's (2003) analysis, exemplar models assume that categorization is modular, stable, and implicitly taxonomic in organization; however, the empirical evidence suggests that categorization is non-modular (Reed & Vinson, 1996; Samuel, 1997; Stevens, Fonlupt, Shiffrar, & Decety, 2000), dynamical (McCloskey & Glucksberg, 1978; Smith & Samuelson, 1997), and with an organization that emerges as a consequence of goal-directed behavior (Barsalou, 1991, 2003; Vallee-Tourangeau, Anthony, & Austin, 1998). These limitations of exemplar models pose a challenge to the view of moral categorization proposed by Harman et al. (2010). ...
Preprint
Full-text available
Observed variability and complexity of judgments of 'right' and 'wrong' cannot currently be readily accounted for within extant approaches to understanding moral judgment. In response to this challenge we present a novel perspective on categorization in moral judgment. Moral judgment as categorization (MJAC) incorporates principles of category formation research while addressing key challenges to existing approaches to moral judgment. People develop skills in making context-relevant categorizations. That is, they learn that various objects (events, behaviors, people, etc.) can be categorized as morally ‘right’ or ‘wrong’. Repetition and rehearsal result in reliable, habitualized categorizations. According to this skill formation account of moral categorization, the learning and habitualization of the forming of moral categories occur within goal-directed activity that is sensitive to various contextual influences. By allowing for the complexity of moral judgments, MJAC offers greater explanatory power than existing approaches, while also providing opportunities for a diverse range of new research questions.
... However, the phonetic-phonological correspondences differ across languages (e.g., Lisker & Abramson, 1964), and thus the informative variability in talker-specific phonetic idiosyncrasies may be more opaque to listeners when they are identifying foreign-language voices. Higher-level linguistic structure, such as words, guides both the perception and interpretation of ambiguous phonetic information (Getz & Toscano, 2019; Samuel, 1997, 2001) and can facilitate phonetic processing even in an unfamiliar language (Samuel & Frost, 2015). Correspondingly, by providing listeners with higher-level linguistic representations through which they can interpret the ambiguous phonetics of foreign language speech, known lexical content may give listeners a scaffold upon which they can extract more information about talker-specific phonetic variation and thus facilitate foreign-language talker identification. ...
Article
Listeners identify talkers more accurately when listening to their native language compared to an unfamiliar, foreign language. This language-familiarity effect in talker identification has been shown to arise from familiarity with both the sound patterns (phonetics and phonology) and the linguistic content (words) of one's native language. However, it has been unknown whether these two sources of information contribute independently to talker identification abilities, particularly whether hearing familiar words can facilitate talker identification in the absence of familiar phonetics. To isolate the contribution of lexical familiarity, we conducted three experiments that tested listeners’ ability to identify talkers saying familiar words, but with unfamiliar phonetics. In two experiments, listeners identified talkers from recordings of their native language (English), an unfamiliar foreign language (Mandarin Chinese), or “hybrid” speech stimuli (sentences spoken in Mandarin, but which can be convincingly coerced to sound like English when presented with subtitles that prime plausible English-language lexical interpretations based on the Mandarin phonetics). In a third experiment, we explored natural variation in lexical-phonetic congruence as listeners identified talkers with varying degrees of a Mandarin accent. Priming listeners to hear English speech did not improve their ability to identify talkers speaking Mandarin, even after additional training, and talker identification accuracy decreased as talkers’ phonetics became increasingly dissimilar to American English. Together, these experiments indicate that unfamiliar sound patterns preclude talker identification benefits otherwise afforded by familiar words. These results suggest that linguistic representations contribute hierarchically to talker identification; the facilitatory effect of familiar words requires the availability of familiar phonological forms.
... Consider two points that follow from this fact. On the one hand, it is known that lexical information can modulate perceptual categorization (Ganong, 1980; Samuel, 1997, 2001). Therefore, it would not be surprising if a study found that bilinguals exploit lexical information to activate language-specific perceptual routines. ...
Article
In the present study, Spanish-English bilinguals’ perceptual boundaries between voiced and voiceless stops (a /b/-/p/ continuum including pre-voiced, voiceless unaspirated, and voiceless aspirated tokens) are shown to be modulated by whether participants are “led to believe” they are classifying Spanish or English sounds. In Experiment 1, simultaneous Spanish-English bilinguals and beginner second-language learners of Spanish labeled the same acoustic continuum in two experimental sessions (Spanish mode, English mode), and both groups were found to display language-specific perceptual boundaries (or session effects). In Experiment 2, early bilinguals and late second-language learners of various levels of proficiency participated in a single session in which, in random order, they labeled nonwords that were designed to prime either Spanish or English language modes. Early bilinguals and relatively proficient second-language learners, but not less proficient learners, displayed mode-specific perceptual normalization criteria even in conditions of rapid, random mode switching. Along with similar ones, the experiments reported here demonstrate that bilinguals are able to exploit language-specific perceptual processes (or norms) when processing speech sounds, which entails some degree of separation between their sound systems.
... McDonald and Shillcock 2003; Demberg and Keller 2008; Arnon and Snider 2010; Frank and Bod 2011; Smith and Levy 2013) and more accurate recognition in noise or when parts of the signal are missing (e.g. phoneme restoration, Grosjean 1980; Samuel 1997; Groppe 2010; for review, see Samuel 2010; see also Baese-Berk et al. 2018). These benefits of predictability are also reflected neurally in reduced surprisal responses (Frank et al. 2015; Willems et al. 2016; see also DeLong et al. 2005; Van Berkum et al. 2005; for discussion, see also Yan et al. 2017). ...
Article
A diverse set of empirical findings indicate that word predictability in context influences the fine-grained details of both speech production and comprehension. In particular, lower predictability relative to similar competitors tends to be associated with phonetic enhancement, while higher predictability is associated with phonetic reduction. We review evidence that these in-the-moment biases can shift the prototypical pronunciations of individual lexical items, and that over time, these shifts can promote larger-scale phonological changes such as phoneme mergers. We argue that predictability-associated enhancement and reduction effects are based on predictability at the level of meaning-bearing units (such as words) rather than at sublexical levels (such as segments) and present preliminary typological evidence in support of this view. Based on these arguments, we introduce a Bayesian framework that helps generate testable predictions about the type of enhancement and reduction patterns that are more probable in a given language.
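To make the kind of prediction such a Bayesian framing affords more concrete, here is a minimal sketch (not the authors' model; all probabilities below are invented for illustration): the listener's posterior over candidate words combines a contextual prior with the acoustic likelihood of the pronunciation, so a highly predictable word survives phonetic reduction while an unpredictable word requires enhancement to be recovered.

```python
# Illustrative Bayesian sketch; probabilities are invented and this is not the
# authors' actual framework.

def posterior_target(prior_target, lik_target, lik_competitor):
    """P(target word | signal) for a two-word candidate set."""
    unnorm_t = prior_target * lik_target
    unnorm_c = (1 - prior_target) * lik_competitor
    return unnorm_t / (unnorm_t + unnorm_c)

reduced = (0.4, 0.3)    # (likelihood given target, given competitor) for a reduced form
enhanced = (0.8, 0.1)   # same likelihoods for a phonetically enhanced form

for prior in (0.9, 0.2):   # predictable vs. unpredictable target word
    print(prior,
          round(posterior_target(prior, *reduced), 2),
          round(posterior_target(prior, *enhanced), 2))
# Output: 0.9 0.92 0.99 and 0.2 0.25 0.67 -- a predictable word survives reduction,
# whereas an unpredictable word needs enhancement to be recovered reliably.
```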
... In fact, there is a close concordance between TRACE's tendency to "hallucinate" lexically consistent phonemes and the tendencies in misperceptions by human listeners in a study of lexically induced phoneme inhibition (Mirman et al., 2005). This finding is consistent with other examples of context-induced misperceptions, such as failures to detect mispronunciations (Cole, 1973; Marslen-Wilson and Welsh, 1978), phoneme restoration (Samuel, 1981, 1996, 1997), and related findings in other domains, such as illusory contours in visual perception (Lee and Nguyen, 2001). It is crucial to recognize the distinction between optimal performance (the best possible under particular conditions) and perfect performance. ...
Article
Full-text available
Human perception, cognition, and action require fast integration of bottom-up signals with top-down knowledge and context. A key theoretical perspective in cognitive science is the interactive activation hypothesis: forward and backward flow in bidirectionally connected neural networks allows humans and other biological systems to approximate optimal integration of bottom-up and top-down information under real-world constraints. An alternative view is that online feedback is neither necessary nor helpful; purely feed forward alternatives can be constructed for any feedback system, and online feedback could not improve processing and would preclude veridical perception. In the domain of spoken word recognition, the latter view was apparently supported by simulations using the interactive activation model, TRACE, with and without feedback: as many words were recognized more quickly without feedback as were recognized faster with feedback. However, these simulations used only a small set of words and did not address a primary motivation for interaction: making a model robust in noise. We conducted simulations using hundreds of words, and found that the majority were recognized more quickly with feedback than without. More importantly, as we added noise to inputs, accuracy and recognition times were better with feedback than without. We follow these simulations with a critical review of recent arguments that online feedback in interactive activation models like TRACE is distinct from other potentially helpful forms of feedback. We conclude that in addition to providing the benefits demonstrated in our simulations, online feedback provides a plausible means of implementing putatively distinct forms of feedback, supporting the interactive activation hypothesis.
... Nonwords were used because most of the /u/-/y/ minimal pairs in German are morphologically related, which we tried to avoid for reasons of clarity (e.g., Bruder 'brother'-Brüder 'brothers'). Importantly, in previous literature the use of nonwords is common (Eimas & Corbit, 1973; Samuel, 2001; Tartter & Eimas, 1975, among others) and no difference has been reported on how selective adaptation operates on nonword and word stimuli (Samuel, 1997). ...
Article
Full-text available
Phonological features have frequently been singled out as the units of perception, especially for vowels. Evidence of the use of features has been provided for vowel height and vowel position, which have one acoustic correlate only. However, findings on acoustically complex features such as tenseness are less clear. The present study assessed the role of phonological features in perception using the selective adaptation paradigm. Selective adaptation effects on German vowel contrasts differing in vowel height (Experiment 1), position (Experiment 2) and tenseness (Experiment 3) were examined. We tested how the categorization of each vowel contrast was affected by adaptation to words containing vowels that differently resembled or diverged from the vowels in the critical contrast acoustically and in terms of their phonological feature specifications. Results showed that selective adaptation patterns could be predicted by the vowels’ phonological features for the height and position contrasts, but not for the tenseness contrast. However, adaptation patterns for the latter can be explained by the relationship between adaptors and continuum endpoints in each of the relevant acoustic cues to the contrast. This suggests that vowel perception may be dependent on these acoustic cues rather than phonological features.
... There are several reasons to believe that comprehenders sometimes reconstruct language structure. First, they reconstruct missing sounds (Samuel, 1997). Second, they reanalyze or repair their initial analysis of grammatical sentences upon evidence that this analysis is incompatible with the actual input (Ferreira & Clifton, 1986; Fodor and Inoue, 1994; Frazier & Rayner, 1982; Sturt, Pickering, & Crocker, 1999; Trueswell, Tanenhaus, & Garnsey, 1994). ...
Article
Full-text available
We frequently experience and successfully process anomalous utterances. Here we examine whether people do this by ‘correcting’ syntactic anomalies to yield well-formed representations. In two structural priming experiments, participants’ syntactic choices in picture description were influenced as strongly by previously comprehended anomalous (missing-verb) prime sentences as by well-formed prime sentences. Our results suggest that comprehenders can reconstruct the constituent structure of anomalous utterances – even when such utterances lack a major structural component such as the verb. These results also imply that structural alignment in dialogue is unaffected if one interlocutor produces anomalous utterances.
... Participants presented with the adaptors with unambiguous VOT values were not expected to show a shift in categorization as a function of the adaptor set (/b/-words, /p/-words) because (i) SA is not presumed to be affected by lexical influences itself (Samuel, 1997), and (ii) the unambiguous acoustic information should be enough to determine the identity of the first consonant in each case (Connine and Clifton, 1987), regardless of the lexicality of the resulting adaptors (e.g., posit vs *bosit). In the ambiguous VOT condition, however, lexical effects on SA were put to test. ...
Article
Full-text available
Limited exposure to ambiguous auditory stimuli results in perceptual recalibration. When unambiguous stimuli are used instead, selective adaptation (SA) effects have been reported, even after few adaptor presentations. Crucially, selective adaptation by an ambiguous sound in biasing lexical contexts had previously been found only after massive adaptor repetition [Samuel (2001). Psychol. Sci. 12(4), 348–351]. The present study shows that extensive exposure is not necessary for lexically driven selective adaptation to occur. Lexically driven selective adaptation can arise after as few as nine adaptor presentations. Additionally, build-up course inspection reveals several parallelisms with the time course observed for SA with unambiguous stimuli.
... This overstates the potential for hallucination in TRACE (as the "trace" preserves details of malformed input, and model behavior differs greatly given well- and malformed input; McClelland & Elman, 1986, e.g., Figures 7-11). In addition, the "hallucination" claim is typically described as a thought-experiment that falsifies interactive feedback, but this underestimates actual human misperception: in lexical inhibition tests (Mirman et al., 2005), listeners exhibited a tendency toward lexically-induced misperception and this finding is consistent with other contextually-appropriate but illusory perceptions of speech such as failures to detect mispronunciations (Cole, 1973; Marslen-Wilson & Welsh, 1978), hearing noise-replaced phonemes ("phoneme restoration": Samuel, 1981, 1996, 1997; Warren, 1970), and similar findings from other modalities, such as illusory visual contours (Lee & Nguyen, 2001). In sum, the pattern of phoneme identification phenomena in the literature, including lexically-induced delays and errors, is consistent with direct feedback from lexical to pre-lexical processing. ...
... The question of what information is modulated at the level of selective adaptation is made more complicated by reports of changes in auditory speech perception following adaptation to illusory phonetic information resolved from lexical context (e.g., Samuel, 1997, 2001; Samuel & Lieblich, 2014). For example, adaptation to auditory word-utterances containing a critical consonant (i.e. ...
Article
Research suggests that selective adaptation in speech is a low-level process dependent on sensory-specific information shared between the adaptor and test-stimuli. However, previous research has only examined how adaptors shift perception of unimodal test stimuli, either auditory or visual. In the current series of experiments, we investigated whether adaptation to cross-sensory phonetic information can influence perception of integrated audio-visual phonetic information. We examined how selective adaptation to audio and visual adaptors shift perception of speech along an audiovisual test continuum. This test-continuum consisted of nine audio-/ba/-visual-/va/ stimuli, ranging in visual clarity of the mouth. When the mouth was clearly visible, perceivers "heard" the audio-visual stimulus as an integrated "va" percept 93.7% of the time (e.g., McGurk & MacDonald, 1976). As visibility of the mouth became less clear across the nine-item continuum, the audio-visual "va" percept weakened, resulting in a continuum ranging in audio-visual percepts from /va/ to /ba/. Perception of the test-stimuli was tested before and after adaptation. Changes in audiovisual speech perception were observed following adaptation to visual-/va/ and audiovisual-/va/, but not following adaptation to auditory-/va/, auditory-/ba/, or visual-/ba/. Adaptation modulates perception of integrated audio-visual speech by modulating the processing of sensory-specific information. The results suggest that auditory and visual speech information are not completely integrated at the level of selective adaptation.
... Others, on the contrary, pointed to the involvement of the high-level cognitive processes, based on the influence that the context and the type of speech materials used had on the perception of interrupted speech (Miller and Licklider 1950; Warren and Sherman 1974; Bashford and Warren 1979; Warren 1983; Bashford et al. 1992; Sivonen et al. 2006b; Grossberg and Kazerounian 2011). Recent studies that showed a deficit in restoration benefit with (real or simulated) hearing impairment implied that the restoration may actually be governed by a combination of the bottom-up peripheral and top-down cognitive processes (Nelson and Jin 2004; Chatterjee et al. 2010; Başkent 2010; Başkent and Chatterjee 2010; Başkent 2012; Bhargava and Başkent 2012), in agreement with general high-level speech and sound perception mechanisms in complex listening environments (Bronkhorst et al. 1993; Samuel 1997; Alain et al. 2001; Winkler et al. 2005; Davis and Johnsrude 2007; Janse and Ernestus 2011). Hence, the consensus from recent studies is that cognitive processes are involved in the phonemic restoration mechanism, but up to what degree is still not clear. ...
Thesis
Full-text available
MECHANISMS OF TOP-DOWN RESTORATION OF DEGRADED SPEECH
All results combined, I conclude that both top-down and bottom-up processes play an important role in the restoration of interrupted speech. Especially high-level linguistic mechanisms seem to have a large influence on the restoration of interrupted speech. Receptive vocabulary and verbal intelligence are shown to be significant predictors of successful restoration of interrupted sentences without spectral degradations. These top-down restoration mechanisms are shown to be less effective if the bottom-up auditory signal is of insufficient quality (as occurs in CI speech processing).
IMPLICATIONS FOR CI USERS
Our overall results suggest that better perception of interrupted speech can indeed be achieved via training, even with spectrotemporal degradations of CI speech transmission. Since linguistic skills play an important role in the restoration of spectrally degraded interrupted speech, CI users can possibly be trained to improve their linguistic skills by reading books or solving crossword puzzles. Furthermore, providing relatively simple feedback, even the text of the sentence, seems to be an effective feedback to lead to successful learning. Finally, lip-reading aids in speech perception of interrupted speech and is often available in daily speech communication for CI users.
... McClelland and Elman, 1986) or whether the effects arise later, after the initial perceptual mapping (modular/feed-forward accounts, Fodor, 1983; Norris et al., 2000). For example, there is some evidence for lexical influence on phoneme level processing (Samuel, 1990; Samuel, 1996; Samuel, 1997), but the question remains controversial (e.g. Massaro and Cohen, 1991). ...
... In particular, knowledge of what strings of phonemes are words in one's language affects the perception of phonemes. Samuel (1997) used the phenomenon of phonemic restoration, in which people still report hearing a sound (e.g., the middle /s/ in legislature) if it is removed and replaced by noise, to show that knowledge about words affects how much information listeners need about a sound to identify it. However, autonomous models, which do not allow top-down processes (an effect of word-knowledge on phoneme perception is one example of such a process), have had some success in accounting for such findings in other ways. ...
Chapter
To produce and comprehend words and sentences, people use their knowledge of language structure; their knowledge of the situation they are in, including the previous discourse and the local situation; and their cognitive abilities, including memory, attention, and motor control. In this chapter, we explore how competent adult language users bring such knowledge and abilities to bear on the tasks of comprehending spoken and written language and producing spoken language. We emphasize experimental data collected using the tools of cognitive psychology, touching only briefly on language development, disordered language, and the neural basis of language. We also review some of the major theoretical controversies that have occupied the field of psycholinguistics, including the role that linguistic analyses of language structure should play and the debate between modular and interactive views. We also present some of the theoretical positions that have proven successful in guiding our understanding of language processing. We conclude by discussing the need to integrate studies of language comprehension and language production and pointing to emerging research topics. Keywords: psycholinguistics; auditory word recognition; reading; lexical access; sentence comprehension; word production; sentence production
... However, because this work only examines the output of processing, it is unclear whether this combined influence reflects interactive processing (direct lexical influences on perceptual processing), or the mutual effects of lexical and acoustic-phonetic analyses on post-perceptual response selection (Norris, McQueen, & Cutler, 2000). Several studies have addressed this limitation by examining the influence of lexically "restored" phonemes on low-level perceptual adaptation and context effects (Elman & McClelland, 1988; Magnuson et al., 2003a; Magnuson et al., 2003b; Samuel, 1997, 2001; Samuel & Pitt, 2003). In these studies, phonemes are typically excised from recordings of words, and replaced by noise. ...
Article
Phonotactic frequency effects play a crucial role in a number of debates over language processing and representation. It is unclear, however, whether these effects reflect prelexical sensitivity to phonotactic frequency, or lexical "gang effects" in speech perception. In this paper, we use Granger causality analysis of MR-constrained MEG/EEG data to understand how phonotactic frequency influences neural processing dynamics during auditory lexical decision. Effective connectivity analysis showed weaker feedforward influence from brain regions involved in acoustic–phonetic processing (superior temporal gyrus) to lexical areas (supramarginal gyrus) for high phonotactic frequency words, but stronger top-down lexical influence for the same items. Low entropy nonwords (nonwords judged to closely resemble real words) showed a similar pattern of interactions between brain regions involved in lexical and acoustic–phonetic processing. These results contradict the predictions of a feedforward model of phonotactic frequency facilitation, but support the predictions of a lexically mediated account.
Article
I argue that results in perception science do not support the claim that there is semantic perception or that typical, unreflective utterance comprehension is a perceptual process. Phenomena discussed include evidence-insensitivity, the Stroop effect, pop-out, and adaptation – as well as how these phenomena might relate to the function, format, and structure of perceptual representations. An emphasis is placed on non-inferential transitions from perceptual to conceptual representations, which are important for debates about the admissible contents of perception more generally.
Article
Full-text available
Five theories of spoken word production that differ along the discreteness–interactivity dimension are evaluated. Specifically examined is the role that cascading activation, feedback, seriality, and interaction domains play in accounting for a set of fundamental observations derived from patterns of speech errors produced by normal and brain-damaged individuals. After reviewing the evidence from normal speech errors, case studies of 3 brain-damaged individuals with acquired naming deficits are presented. The patterns these individuals exhibit provide important constraints on theories of spoken naming. With the help of computer simulations of the 5 theories, the authors evaluate the extent to which the error patterns predicted by each theory conform with the empirical facts. The results support a theory of spoken word production that, although interactive, places important restrictions on the extent and locus of interactivity.
Chapter
This chapter reviews the computational processes that are responsible for recognizing word forms in the speech stream. We outline the different stages in a processing hierarchy from the extraction of general acoustic features, through speech‐specific prelexical processes, to the retrieval and selection of lexical representations. We argue that two recurring properties of the system as a whole are abstraction and adaptability. We also present evidence for parallel processing of information on different timescales, more specifically that segmental material in the speech stream (its consonants and vowels) is processed in parallel with suprasegmental material (the prosodic structures of spoken words). We consider evidence from both psycholinguistics and neurobiology wherever possible, and discuss how the two fields are beginning to address common computational problems. The challenge for future research in speech perception will be to build an account that links these computational problems, through functional mechanisms that address them, to neurobiological implementation.
Article
Sven Mattys is a Reader in Psychology of Language at the University of Bristol, UK. He earned a Ph.D. in psychology at the State University of New York (Stony Brook) in 1997. He subsequently worked on language acquisition at the Johns Hopkins University and on visual speech recognition at the House Ear Institute, Los Angeles. He currently works on the perceptual, cognitive, and physiological mechanisms underlying language development and speech recognition.
Article
An argument that the way we listen to speech is shaped by our experience with our native language.
Article
Full-text available
Although a lot of information is available from our environment at every moment, only a small part gives rise to a conscious percept. It is therefore legitimate to ask which mechanisms are involved in perception. On the basis of which processes will a sensory stimulus be perceived consciously? What happens to stimuli that are not consciously perceived? The work presented in this thesis aims to provide some answers to these two questions in the auditory modality. Through different behavioral and electroencephalographic studies, we suggest that knowledge can have a top-down facilitatory influence on high-level as well as on low-level (e.g., detection) processing of complex auditory stimuli. Stimuli we have some knowledge about (phonological or semantic) are more easily detected than stimuli that contain neither phonological nor semantic information. We also show that the activation of this knowledge influences the perception of subsequent stimuli, even when the context is not perceived consciously. This is evidenced by a subliminal semantic priming effect and by modifications of the neural oscillations in the beta frequency band associated with lexical processing of stimuli that were not consciously categorized. Hence, auditory perception can be considered the product of the continuous interaction between the context set by the environment and the knowledge one has about specific stimuli. Such an interaction would lead listeners to preferentially perceive what they already know.
Article
Full-text available
When an extraneous sound (such as a cough or tone) completely replaces a speech sound in a recorded sentence, listeners believe they hear the missing sound. The extraneous sound seems to occur during another portion of the sentence without interfering with the intelligibility of any phoneme. If silence replaces a speech sound, the gap is correctly localized and the absence of the speech sound detected.
Article
Full-text available
A critical issue in modeling speech perception is whether lexical representations can affect lower level (e.g., phonemic) processing. Phonemic restoration studies have provided support for such top-down effects, but there have also been a number of failures to find them. A methodology is introduced that provides good approximations to the underlying distributions of perceived intactness that are assumed in signal detection analyses of restoration. This methodology provides a sensitive means to determine the necessary conditions for lexical feedback to occur. When these conditions are created, a reliable lexical influence on phonemic perception results. The experiments thus show that lexical activation does influence lower level processing, and that these influences are fragile. The theoretical implications of real but fragile lexical effects are discussed.
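The signal-detection machinery referred to here can be illustrated with the standard d-prime and criterion computation (a minimal sketch under the usual equal-variance Gaussian assumptions; the response counts below are invented): listeners discriminate items in which noise was added to an intact phoneme from items in which noise replaced the phoneme, and restoration shows up as reduced discriminability (d') together with a bias (criterion) toward reporting items as intact.

```python
# Minimal equal-variance signal-detection sketch; response counts are invented.
from scipy.stats import norm

def dprime_criterion(hits, misses, false_alarms, correct_rejections):
    """d' and criterion c for discriminating noise-ADDED from noise-REPLACED items."""
    h = hits / (hits + misses)                                # P("added" | noise added)
    fa = false_alarms / (false_alarms + correct_rejections)   # P("added" | noise replaced)
    return norm.ppf(h) - norm.ppf(fa), -0.5 * (norm.ppf(h) + norm.ppf(fa))

# Hypothetical counts: restoration lowers d' (added and replaced items sound alike).
print(dprime_criterion(30, 18, 22, 26))   # strong restoration: low discriminability
print(dprime_criterion(40, 8, 12, 36))    # weak restoration: higher discriminability
```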
Article
Full-text available
A fundamental goal of an information-processing approach to speech perception is to specify the levels of analysis between the initial sensory coding of the signal and the recognition of the phonetic sequence that it conveys. A series of experiments provides evidence for at least 3 qualitatively different levels of analysis involved in the perception of speech. Several properties for the representations of each level are described, including a locus (peripheral or monaurally driven vs. central or binaurally driven), a stimulus domain, and the mechanisms involved in response adjustment as a function of repeated stimulation. The stimulus domains for the 3 levels are (a) processes that deal with simple acoustic patterns, (b) processes that integrate more complex acoustic patterns, and (c) processes that represent categorical or phonetic information. The convergence among several different approaches used to determine levels of analysis supports the 3-level model.
Article
Full-text available
In seven experiments reaction time to detect the initial phoneme of words and nonwords was measured. Reaction time advantages for words over nonwords come and go according to the particular characteristics of the experimental situation. One relevant characteristic is degree of task monotony, an effect which is most parsimoniously explained by attention shifting between levels of processing. General classes of models of the relationship between levels of processing in comprehension are discussed in light of the results. Serial models incorporate an attention shift explanation of the monotony effect more elegantly than do interactive models. Alternative serial models are available in the literature in this area. One recent model, which allows only a single outlet point for phoneme detection responses, and hence requires that apparent reaction time advantages for words are artefactual, can be unambiguously rejected on the basis of the present data. It is argued that a serial model involving competition between target detection based on a prelexical representation and detection based on a lexical representation most satisfactorily accounts for the overall pattern of results.
Article
Full-text available
Two experiments are reported that demonstrate contextual effects on identification of speech voicing continua. Experiment 1 demonstrated the influence of lexical knowledge on identification of ambiguous tokens from word-nonword and nonword-word continua. Reaction times for word and non-word responses showed a word advantage only for ambiguous stimulus tokens (at the category boundary); no word advantage was found for clear stimuli (at the continua endpoints). Experiment 2 demonstrated an effect of a postperceptual variable, monetary payoff, on nonword-nonword continua. Identification responses were influenced by monetary payoff, but reaction times for bias-consistent and bias-inconsistent responses did not differ at the category boundary. An advantage for bias-consistent responses was evident at the continua endpoints. The contrasting patterns of reaction-time data in the two experiments indicate different underlying mechanisms. We argue that the lexical status effect is attributable to a mechanism in which lexical knowledge directly influences perceptual processes.
Article
Full-text available
The intelligibility of word lists subjected to various types of spectral filtering has been studied extensively. Although words used for communication are usually present in sentences rather than lists, there has been no systematic report of the intelligibility of lexical components of narrowband sentences. In the present study, we found that surprisingly little spectral information is required to identify component words when sentences are heard through narrow spectral slits. Four hundred twenty listeners (21 groups of 20 subjects) were each presented with 100 bandpass filtered CID ("everyday speech") sentences; separate groups received center frequencies of 370, 530, 750, 1100, 1500, 2100, 3000, 4200, and 6000 Hz at 70 dBA SPL. In Experiment 1, intelligibility of single 1/3-octave bands with steep filter slopes (96 dB/octave) averaged more than 95% for sentences centered at 1100, 1500, and 2100 Hz. In Experiment 2, we used the same center frequencies with extremely narrow bands (slopes of 115 dB/octave intersecting at the center frequency, resulting in a nominal bandwidth of 1/20 octave). Despite the severe spectral tilt for all frequencies of this impoverished spectrum, intelligibility remained relatively high for most bands, with the greatest intelligibility (77%) at 1500 Hz. In Experiments 1 and 2, the bands centered at 370 and 6000 Hz provided little useful information when presented individually, but in each experiment they interacted synergistically when combined. The present findings demonstrate the adaptive flexibility of mechanisms used for speech perception and are discussed in the context of the LAME model of opportunistic multilevel processing.
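For readers who want the band arithmetic spelled out, a 1/3-octave band around a center frequency fc spans fc·2^(-1/6) to fc·2^(1/6). The sketch below (an illustration using scipy; the eighth-order Butterworth filter is only a rough stand-in for the steep 96 dB/octave slopes described above) extracts such a band around 1500 Hz.

```python
# Illustrative 1/3-octave band-pass filtering of a speech waveform.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def third_octave_band(x, fs, fc):
    """Band-pass x around center frequency fc with a nominal 1/3-octave bandwidth."""
    low, high = fc * 2 ** (-1 / 6), fc * 2 ** (1 / 6)   # band edges: fc * 2^(+/- 1/6)
    sos = butter(8, [low, high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

fs = 16000
signal = np.random.randn(fs)                 # stand-in for one second of recorded speech
band_1500 = third_octave_band(signal, fs, fc=1500)
```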
Chapter
A trend in modern psychology is to assume that during perception complex stimulus events are first analyzed by neuronal structures or feature detectors into their component properties or features. The final percept is assumed to be a result of the recoding of these features according to the rules of operation of some higher, integrative level of processing. This type of perceptual model has had wide application in modern psychology, having been proffered in a variety of forms to explain such phenomena as the perception of visual patterns, including geometric shapes, letters, and words (e.g., Neisser, 1966), and the perception of the segmental units of speech (e.g., Abbs and Sussman, 1971).
Article
When part of an utterance is deleted and replaced by an extraneous noise, listeners typically report that the utterance sounded intact; they appear to restore the deleted phoneme. The present study investigated whether this illusion differs when only one word can legally be restored (e.g., when the “l” in “lesion” is replaced), versus when more than one (restored) phoneme will create an English word (e.g., when the “l” in “legion” is replaced). The data suggest that listeners are sensitive to this manipulation. Signal‐detection analyses of the data indicate that this “lexical uniqueness” affects both true perceptual restoration and the bias toward reporting utterances as intact. Control pseudowords, created by splicing together pieces of the test words, showed no effects of the lexical uniqueness of their components, supporting the lexical basis of the effects found with the words. The results are used to postulate a model of the word‐recognition process.
Article
The role of vowel context and consonant labeling in the selective adaptation of voiceless fricatives was examined in three experiments. This approach was designed to determine whether selective adaptation effects occurred with voiceless fricative stimuli and to determine whether any such effects had a linguistic basis as opposed to a purely auditory basis. Two synthetic fricative-vowel continua were used; one ranged from [si] to [ʃi] and the other from [su] to [ʃu]. Identification of the consonant portion of the syllables in these continua, as either [s] or [ʃ], depended on both the frequency of the friction noise and on the vowel quality. In experiment 1, the end points of the continua were used as adaptors, and the identification boundary shifted toward the adapting stimulus. In experiment 2, an ambiguous frication noise (that was identified as [s] before [u] and as [ʃ] before [i]) adapted the identification boundary in opposite directions, depending on which of the two vowels followed the noise. Thus the direction of adaptation depended on the perceptual identity of the consonant. In the final experiment, the isolated [i] and [u] vowels, and the isolated ambiguous frication noise, were demonstrated to be ineffective adaptors. The selective adaptation effects observed in these experiments were not determined by the acoustical information in the consonant or the vowel alone, but rather by the context-conditioned percept of the fricative. These results extend the reports of other research that has attempted to dissociate auditory from linguistic adaptation, provide further evidence that selective adaptation effects have multiple loci, and establish for the first time a selective adaptation effect which is unambiguously not acoustically based.
Article
Using a selective adaptation procedure, evidence was obtained for the existence of linguistic feature detectors, analogous to visual feature detectors. These detectors are each sensitive to a restricted range of voice onset times, the physical continuum underlying the perceived phonetic distinctions between voiced and voiceless stop consonants. The sensitivity of a particular detector can be reduced selectively by repetitive presentation of its adequate stimulus. This results in a shift in the locus of the phonetic boundary separating the voiced and voiceless stops.
Article
Six cross-modal priming experiments were conducted that investigated whether representations of spoken words in memory (base word) may be activated by similar-sounding nonwords. The experiments demonstrated that nonwords that differed in one or two linguistic features from a base word resulted in significant priming effects for semantic associates to the base word. Nonwords that deviated by more linguistic features from a base word showed no priming effects. Finally, nonwords in which either the medial or initial portions were altered showed comparable priming effects. This result held for a set of two-syllable words and for a group of longer words (at least three syllables). A model of auditory word recognition is discussed in which partial acoustic-phonetic information in a spoken word is mapped onto a lexical representation in memory based on goodness-of-fit. It is argued that this mapping process affords no particular status to word-initial phonemes.
Book
Auditory scene analysis (ASA) is defined and the problem of partitioning the time-varying spectrum resulting from mixtures of individual acoustic signals is described. Some basic facts about ASA are presented. These include causes and effects of auditory organization (sequential, simultaneous, and the old-plus-new heuristic). Processes employing different cues collaborate and compete in determining the final organization of the mixture. These processes take advantage of regularities in the mixture that give clues about how to parse it. There are general regularities that apply to most types of sound, as well as regularities in particular types of sound. The general ones are hypothesized to be used by innate processes, and the ones specific to restricted environments to be used by learned processes in humans and possibly by innate ones in animals. The use of brain recordings and the study of nonhuman animals is discussed.
Article
Selective adaptation with a syllable-initial consonant fails to affect perception of the same consonant in syllable-final position, and vice versa. One account of this well-replicated result invokes a cancellation explanation: with the place-of-articulation stimuli used, the pattern of formant transitions switches according to syllabic position, allowing putative phonetic-level effects to be opposed by putative acoustic-level effects. Three experiments tested the cancellation hypothesis by preempting the possibility of acoustic countereffects. In Experiment 1, the test syllables and adaptors were /r/-/l/ CVs and VCs, which do not produce cancelling formant patterns across syllabic position. In Experiment 2, /b/-/d/ continua were used in a paired-contrast procedure, believed to be sensitive to phonetic, but not acoustic, identity. In Experiment 3, cross-ear adaptation, also believed to tap phonetic rather than acoustic processes, was used. All three experiments refuted the cancellation hypothesis. Instead, it appears that the perceptual process treats syllable-initial consonants and syllable-final ones as inherently different. These results provide support for the use of demisyllabic representations in speech perception.
Article
In a series of 16 experiments, the reaction time to decide which of two stop consonants, /b/ or /p/, for example, was the initial phoneme of a single target-bearing item was measured. In half of the experiments word frequency was varied and in the other half, lexical status (words vs. nonwords) was varied. When listeners had only to monitor the target-bearing items for the initial consonant, nonsignificant effects of word frequency and lexical status were found. However, with the addition of a secondary task that focused attention on the consequences of lexical processing by requiring listeners to indicate whether the target-bearing item was a noun or verb or a word or nonword, highly reliable lexical effects were obtained: phoneme monitoring times were faster for high than for low frequency words and faster for words than for nonwords. A secondary task that required a judgment regarding item duration did not yield reliable effects of frequency or lexical status. Finally, degrading the targets with pink noise also resulted in reliable lexical effects. An explanation of the results was offered that rested on the following assumptions: (1) competition (i.e., a race) normally exists between which representation, the pre- or postlexical, will be the basis for phoneme decisions, with the prelexical code the usual winner, but that (2) attention to postlexical representations may at times override the consequences of the competition between representations.
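The race account in assumption (1) can be illustrated with a toy Monte Carlo simulation (all latency distributions below are invented): the phoneme decision is driven by whichever code finishes first, so lexical status barely matters while the prelexical route usually wins, but a secondary task that prioritizes the postlexical route lets word/nonword differences surface in the observed reaction times.

```python
# Toy race-model simulation; all latency parameters are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)

def mean_rt(postlexical_mean_ms, n=100_000):
    """Observed RT = finishing time of whichever code wins the race."""
    prelexical = rng.normal(400, 60, n)                    # lexically blind code
    postlexical = rng.normal(postlexical_mean_ms, 60, n)   # lexically sensitive code
    return np.minimum(prelexical, postlexical).mean()

# Postlexical code slow (no attention to the lexical level): word vs. nonword barely differ.
print(mean_rt(550), mean_rt(560))
# A secondary task prioritizes the postlexical code (made faster here): the word (420 ms)
# vs. nonword (480 ms) difference now shows up in the winning times.
print(mean_rt(420), mean_rt(480))
```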
Article
Previous work has shown a back-propagation network with recurrent connections can successfully model many aspects of human spoken word recognition (Norris, 1988, 1990, 1992, 1993). However, such networks are unable to revise their decisions in the light of subsequent context. TRACE (McClelland & Elman, 1986), on the other hand, manages to deal appropriately with following context, but only by using a highly implausible architecture that fails to account for some important experimental results. A new model is presented which displays the more desirable properties of each of these models. In contrast to TRACE the new model is entirely bottom-up and can readily perform simulations with vocabularies of tens of thousands of words.
Article
Two experiments employing an auditory priming paradigm were conducted to test predictions of the Neighborhood Activation Model of spoken word recognition (Luce & Pisoni, 1989, Neighborhoods of words in the mental lexicon. Manuscript under review). Acoustic-phonetic similarity, neighborhood densities, and frequencies of prime and target words were manipulated. In Experiment 1, priming with low frequency, phonetically related spoken words inhibited target recognition, as predicted by the Neighborhood Activation Model. In Experiment 2, the same prime-target pairs were presented with a longer inter-stimulus interval and the effects of priming were eliminated. In both experiments, predictions derived from the Neighborhood Activation Model regarding the effects of neighborhood density and word frequency were supported. The results are discussed in terms of competing activation of lexical neighbors and the dissociation of activation and frequency in spoken word recognition.
Article
An important question in language processing is whether higher-level processes are able to interact directly with lower-level processes, as assumed by interactive models such as the TRACE model of speech perception. This issue is addressed in the present study by examining whether putative interlevel phenomena can trigger the operation of intralevel processes at lower levels. The intralevel process involved the perceptual compensation for the coarticulatory influences of one speech sound on another. TRACE predicts that this compensation can be triggered by illusory phonemes which are perceived as a result of top-down, lexical influences. In Experiment 1, we confirm this prediction. Experiments 2 to 4 replicate this finding and fail to support several potential alternative explanations of the results of Experiment 1. The basic finding that intralevel phenomena can be triggered by interlevel processes argues against the view that aspects of speech perception are encapsulated in a module impervious to influences from higher levels. Instead, it supports a central premise of interactive models, in which basic aspects of perceptual processing are subject to influences from higher levels.
Article
We describe a model called the TRACE model of speech perception. The model is based on the principles of interactive activation. Information processing takes place through the excitatory and inhibitory interactions of a large number of simple processing units, each working continuously to update its own activation on the basis of the activations of other units to which it is connected. The model is called the TRACE model because the network of units forms a dynamic processing structure called “the Trace,” which serves at once as the perceptual processing mechanism and as the system's working memory. The model is instantiated in two simulation programs. TRACE I, described in detail elsewhere, deals with short segments of real speech, and suggests a mechanism for coping with the fact that the cues to the identity of phonemes vary as a function of context. TRACE II, the focus of this article, simulates a large number of empirical findings on the perception of phonemes and words and on the interactions of phoneme and word perception. At the phoneme level, TRACE II simulates the influence of lexical information on the identification of phonemes and accounts for the fact that lexical effects are found under certain conditions but not others. The model also shows how knowledge of phonological constraints can be embodied in particular lexical items but can still be used to influence processing of novel, nonword utterances. The model also exhibits categorical perception and the ability to trade cues off against each other in phoneme identification. At the word level, the model captures the major positive feature of Marslen-Wilson's COHORT model of speech perception, in that it shows immediate sensitivity to information favoring one word or set of words over others. At the same time, it overcomes a difficulty with the COHORT model: it can recover from underspecification or mispronunciation of a word's beginning. TRACE II also uses lexical information to segment a stream of speech into a sequence of words and to find word beginnings and endings, and it simulates a number of recent findings related to these points. The TRACE model has some limitations, but we believe it is a step toward a psychologically and computationally adequate model of the process of speech perception.
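The interactive-activation update at the heart of TRACE-style models can be written compactly. The sketch below is schematic (it follows the general interactive-activation update rule rather than the actual TRACE implementation, and the units, weights, and parameters are invented): each unit's activation grows toward a maximum when its net input is positive and shrinks toward a minimum when negative, with decay toward a resting level, and lexical feedback is simply the word-to-phoneme entries of the weight matrix.

```python
# Schematic interactive-activation update; not the actual TRACE code or parameters.
import numpy as np

def ia_step(act, W, ext, rest=-0.1, decay=0.1, a_min=-0.2, a_max=1.0):
    """One update: weighted input from active units, growth toward max/min, decay to rest."""
    net = W @ np.clip(act, 0, None) + ext          # only positive activations propagate
    grow = np.where(net > 0, (a_max - act) * net, (act - a_min) * net)
    return np.clip(act + grow - decay * (act - rest), a_min, a_max)

# Units: phoneme /b/, phoneme /d/, word "big". Row i holds the weights INTO unit i;
# W[0, 2] is the lexical feedback from the word onto /b/.
W = np.array([[ 0.0, -0.3,  0.5],   # /b/ <- inhibition from /d/, feedback from "big"
              [-0.3,  0.0,  0.0],   # /d/ <- inhibition from /b/
              [ 0.6, -0.6,  0.0]])  # "big" <- excited by /b/, inhibited by /d/
act = np.full(3, -0.1)
ext = np.array([0.06, 0.05, 0.0])   # slightly /b/-biased bottom-up input
for _ in range(30):
    act = ia_step(act, W, ext)
print(act)  # /b/ ends well above /d/; setting W[0, 2] = 0 removes the lexical
            # feedback and shrinks the /b/ advantage.
```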
Article
Recent accounts of selective adaptation in speech perception have proposed that either one or two levels of processing are adapted. Most of the previous experimental results can, however, be accounted for by either type of model. In the present experiments, two aspects of the selective adaptation paradigm were manipulated: the spectral (frequency) overlap between adapting and test syllables, and the interaural arrangement of presentation (adapting in one ear, testing in the other). The results indicated that adaptors drawn from the test series and adaptors with no spectral overlap with the test series both produced significant changes in subjects' ratings of the test stimuli. However, the identical adaptors caused significantly more adaptation than the nonoverlapping adaptors. Moreover, the nonoverlapping adaptors produced 100% interaural transfer of adaptation, indicating a central locus of this effect, whereas the identical adaptors drawn from the test series showed approximately 50% interaural transfer. Taken together, these results strongly suggest that two levels of processing are involved in selective adaptation to place of articulation in stop consonants: a peripheral level that is relatively frequency specific, and a central level that integrates information over a wider frequency (spectral) range.
Article
This study investigated whether the apparent completeness of the acoustic speech signal during phonemic restoration derives from a process of auditory induction (Warren, 1984) or segregation, or whether it is an auditory illusion that accompanies the completion of an abstract phonological representation. Specifically, five experiments tested the prediction of the auditory induction (segregation) hypothesis that active perceptual restoration of an [s] noise that has been replaced with an extraneous noise would use up a portion of that noise's high-frequency energy and consequently change the perceived pitch (timbre, brightness) of the extraneous noise. Listeners were required to compare the pitch of a target noise, which replaced a fricative noise in a sentence, with that of a probe noise preceding or following the speech. In the first two experiments, a significant tendency was found in favor of the auditory induction hypothesis, although the effect was small and may have been caused by variations in acoustic context. In the following three experiments, a larger variety of stimuli were used and context was controlled more carefully; this yielded negative results. Phoneme identification responses collected in the same experiments, as well as informal observations about the quality of the restored phoneme, suggested that restoration of a fricative phone distinct from the extraneous noise did not occur; rather, the spectrum of the extraneous noise itself influenced phoneme identification. These results suggest that the apparent auditory restoration which accompanies phonemic restoration is illusory, and that the schema-guided process of phoneme restoration does not interact with auditory processing.
Article
The categorization of word-final phonemes provides a test to distinguish between an interactive and an autonomous model of speech recognition. Word-final lexical effects ought to be stronger than word-initial lexical effects, and the models make different reaction time (RT) predictions only for word-final decisions. A first experiment found no lexical shifts between the categorization functions of word-final fricatives in pairs such as fish-fiss and kish-kiss. In a second experiment, with stimuli degraded by low-pass filtering, reliable lexical shifts did emerge. Both models need revision to account for this stimulus-quality effect. Stimulus quality rather than stimulus ambiguity per se determines the extent of lexical involvement in phonetic categorization. Furthermore, the lexical shifts were limited to fast RT ranges, contrary to the interactive model's predictions. These data therefore favor an autonomous bottom-up model of speech recognition.
Article
Selective adaptation with a syllable-initial consonant fails to affect perception of the same consonant in syllable-final position, and vice versa. One account of this well-replicated result invokes a cancellation explanation: with the place-of-articulation stimuli used, the pattern of formant transitions switches according to syllabic position, allowing putative phonetic-level effects to be opposed by putative acoustic-level effects. Three experiments tested the cancellation hypothesis by preempting the possibility of acoustic countereffects. In Experiment 1, the test syllables and adaptors were /r/-/l/ CVs and VCs, which do not produce cancelling formant patterns across syllabic position. In Experiment 2, /b/-/d/ continua were used in a paired-contrast procedure, believed to be sensitive to phonetic, but not acoustic, identity. In Experiment 3, cross-ear adaptation, also believed to tap phonetic rather than acoustic processes, was used. All three experiments refuted the cancellation hypothesis. Instead, it appears that the perceptual process treats syllable-initial consonants and syllable-final ones as inherently different. These results provide support for the use of demisyllabic representations in speech perception.
Article
The TRACE model of speech perception (McClelland & Elman, 1986) is contrasted with a fuzzy logical model of perception (FLMP) (Oden & Massaro, 1978). The central question is how the models account for the influence of multiple sources of information on perceptual judgment. Although the two models can make somewhat similar predictions, the assumptions underlying the models are fundamentally different. The TRACE model is built around the concept of interactive activation, whereas the FLMP is structured in terms of the integration of independent sources of information. The models are tested against the results of an experiment involving the independent manipulation of bottom-up and top-down sources of information. Using a signal detection framework, sensitivity and bias measures of performance can be computed. The TRACE model predicts that top-down influences from the word level influence sensitivity at the phoneme level, whereas the FLMP does not. The empirical results of a study involving the influence of phonological context and segmental information on the perceptual recognition of a speech segment are best described without any assumed changes in sensitivity. To date, not only is a mechanism of interactive activation unnecessary to describe speech perception; it is shown to be wrong when instantiated in the TRACE model.
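For concreteness, the FLMP's integration assumption for a two-alternative phoneme decision can be stated in closed form; the symbols below are illustrative labels rather than Oden and Massaro's own notation. If s is the degree of segmental (bottom-up) support for one alternative and c is the degree of contextual (top-down) support for it, a common statement of the model's prediction is

P(\text{alternative} \mid s, c) = \frac{s\,c}{s\,c + (1 - s)(1 - c)}.

Because s and c enter independently and multiplicatively, context in this scheme shifts response bias without altering sensitivity to the segmental information, which is exactly the contrast the signal detection analysis described above is designed to detect.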
Article
The selective adaptation paradigm was used extensively for about 5 years following its introduction to speech research in 1973. During the next few years, its use dropped dramatically; it is now little used. Several reasons for the abandonment of the paradigm are discussed, and theoretical and empirical justification is provided for rejecting these reasons. Experiment 1 demonstrates that “acoustic similarity” of an adapting sound and test items cannot account for the observed results. Experiments 2–4 demonstrate that adaptation effects are not equivalent to simple contrast effects. These experiments indicate that selective adaptation produces robust reaction time effects—items in the adapted category are identified more slowly than unadapted items. The effects found in a simple paired-contrast procedure differ from those found with selective adaptation. Most strikingly, contrast effects are extremely ear dependent—much larger effects occur if testing is conducted in the right ear than in the left; adaptation effects are relatively symmetrical with respect to ear. The empirical and theoretical analyses suggest that the selective adaptation paradigm can be a powerful tool for investigating the perception of complex acoustic stimuli like speech.
Article
When portions of a signal are masked in noisy environments, perceptual restoration can be accomplished through auditory induction (AI). There are 2 classes of AI: (a) temporal induction (TI), which restores contextually appropriate segments of a signal masked at both ears by transient noises; and (b) contralateral induction (CI), which restores a signal masked at one ear when it is heard at the other. TI can prevent fragmentation of steady sounds and can permit comprehension of speech that would otherwise be unintelligible, while CI can prevent mislocalization of a sound source to the side of the unmasked ear. Both classes of AI are subtractive processes requiring that the neural units corresponding to the perceptually restored sound be among those stimulated by the louder interrupting sound. The rules governing AI provide information concerning general principles underlying perceptual organization in hearing.
Article
The results of experiments using selective adaptation with stop consonants have been interpreted in terms of auditory feature detector fatigue, phonetic feature detector fatigue, and response contrast. In the present studies, both a selective adaptation procedure and a procedure involving paired comparisons between successively presented stimuli were used to sort out these explanations. A fricative-stop-vowel syllable ([spa]) was constructed using an [s], followed by 75 msec of silence, followed by a 10-msec voice onset time [ba]. The perceived phonetic identity of this syllable was [p], even though the spectral structure of the stop-vowel portion of this syllable was identical to a stimulus from the [ba] end of a [ba]-[pʰa] test series. As adaptors, the [spa] and [ba] endpoint syllables had identical effects. In the paired-comparison procedure, the [spa] caused an ambiguous test item to be labeled "B," whereas the [ba] caused the test item to be labeled "P." Results of these experiments indicate that neither response contrast nor phonetic feature detector fatigue is involved in the selective adaptation effects found for a voicing stop-consonant series. Results are interpreted as supporting the position that selective adaptation effects arise at an early, auditory level of processing that is responsive to the spectral overlap between adaptor and test items.
Article
Presents a new methodology that calculates the signal detection parameters of discriminability and bias by comparing stimuli in which a phoneme is replaced by a sound (e.g., noise) with stimuli in which that sound is added to the intact phoneme. In 3 experiments with 160 native-English-speaking Ss, the method was used to test the hypothesis that restoration depends on the "bottom-up" confirmation of expectations generated at higher levels. Results indicate that speech perception depended on the interaction of both the top-down expectations generated by the Ss' knowledge and the bottom-up confirmation provided by the acoustics of the signal.
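The following is a minimal sketch of the signal detection computation this methodology implies, assuming the standard equal-variance Gaussian model; the "added"/"replaced" mapping follows the description above, while the variable names, the criterion measure c, and the example proportions are illustrative assumptions rather than the original analysis.

from statistics import NormalDist

def dprime_and_criterion(hit_rate, false_alarm_rate):
    # d' = z(H) - z(F); criterion c = -(z(H) + z(F)) / 2 under the
    # equal-variance Gaussian signal detection model.
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate), -(z(hit_rate) + z(false_alarm_rate)) / 2

# H: proportion of intact-plus-noise ("added") trials judged "added"
# F: proportion of noise-replaced trials judged "added"
# A d' near zero means replaced items cannot be distinguished from intact
# ones, i.e., strong perceptual restoration; the criterion tracks overall
# bias toward reporting "added".
print(dprime_and_criterion(0.80, 0.65))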
Article
We used a selective adaptation procedure to investigate the possibility that differences in the degree to which stimuli within a phonetic category are considered to be good exemplars of the category--that is, differences in perceived category goodness--have a basis at a prephonetic, auditory level of processing. For three different phonetic contrasts (/b-p/, /d-g/, /b-w/), we assessed the relative magnitude of adaptation along a stimulus continuum produced by a variety of stimuli from the continuum belonging to a given phonetic category. For all three phonetic contrasts, nonmonotonic adaptation functions were obtained: As the adaptor moved away from the category boundary, there was an initial increase in adaptation, followed by a subsequent decrease. On the assumption that selective adaptation taps a prephonetic, auditory level of processing, these findings permit the following conclusions. First, at an auditory level there is a limit on the range of stimuli along a continuum that is treated as relevant to a given contrast; that is, the stimuli along a continuum are effectively grouped into auditory categories. Second, stimuli within an auditory category vary in their effectiveness as category members, providing an internal structure to the categories. Finally, this internal category structure at the auditory level, revealed by the adaptation procedure, may provide a basis for differences in perceived category goodness at the phonetic level.
Article
Recent studies that used Ganong's (1980) identification task have produced discrepant results. The present study sought to resolve these discrepancies by examining the influence of methodological factors on phoneme identification and differences in data analysis techniques. Three factors were examined across 2 experiments: position of target phoneme, phonetic contrast, and 2 task conditions in which stimulus quality (S/N ratio) or cognitive load varied. A meta-analysis was then performed on the results from all identification studies, including the present one, in an effort to obtain additional insight on factors that influence the task. The experiments and meta-analysis identified the importance of several methodological factors in affecting identification, most notably position of the target phoneme.
Article
The current investigation manipulated subjects' attention to adaptor tokens in five selective adaptation experiments. All stimuli were synthetic consonant-vowel syllables, with the consonant varying from [b] to [d] by formant frequency transitions. Two distractor conditions (auditory and visual) were compared to a more typical endpoint-[dɑ] adaptor condition. Distraction from endpoint-[dɑ] adaptors to phonetically distinct [si] and [ʃi] was used to observe whether smaller adaptation effects would result when attention was not focused on adaptor stimuli. In contrast, a focused attention condition required subjects to whisper [bɑ] adaptors right after they were heard. Performance in the focused attention condition was compared to a more typical endpoint-[bɑ] adaptation condition. Results indicated that focused attention did not affect the size of the adaptation effect. Asymmetrical adaptation results for [dɑ] vs. [bɑ] adaptors, and a larger amount of adaptation with the presence of contralateral "distractor" syllables, resembled findings in psychoacoustic studies of discrimination and loudness adaptation. These results suggest that two levels of auditory processing (not special to speech perception) were responsible for the observed adaptation effects.
Garrison, L. F. (1992). Pattern processing of stop consonant place: The role of formant relationships and vowel environment (Doctoral dissertation, State University of New York at Buffalo, 1991). Dissertation Abstracts International, 53-B, 1629.
Marslen-Wilson, W. D. (1984). Function and process in spoken word recognition: A tutorial review. In H. Bouma & D. G. Bouwhuis (Eds.), Attention and performance: Vol. 10. Control of language processes. Hillsdale, NJ: Erlbaum.
Marslen-Wilson, W. D., & Welsh, A. (1978). Processing interactions and lexical access during word recognition in continuous speech. Cognitive Psychology, 10, 29–63.
Norris, D. G. (1993). Bottom-up connectionist models of "interaction". In G. Altmann & R. Shillcock (Eds.), Cognitive models of speech processing: The second Sperlonga Meeting (pp. 211–234). Hillsdale, NJ: Erlbaum.
Samuel, A. G., & Kat, D. (in preparation). Selective adaptation requires no cognitive resources.
Sawusch, J. R. (1986). Auditory and phonetic coding of speech. In E. Schwab & H. Nusbaum (Eds.), Pattern recognition by humans and machines (pp. 51–88). Orlando, FL: Academic Press.
Sawusch, J. R., & Jusczyk, P. W. (1981). Adaptation and contrast in the perception of voicing. Journal of Experimental Psychology: Human Perception and Performance, 7, 408–421.
Shillcock, R., Lindsey, G., Levy, J., & Chater, N. (1992). A phonologically motivated input representation for the modelling of auditory word perception in continuous speech. In Proceedings of the Fourteenth Annual Conference of the Cognitive Science Society (pp. 408–413). Hillsdale, NJ: Erlbaum.
Tabossi, P. (1993). Connections, competitions, and cohorts: Comments on the chapters by Marslen-Wilson; Norris; and Bard & Shillcock. In G. Altmann & R. Shillcock (Eds.), Cognitive models of speech processing: The second Sperlonga Meeting (pp. 277–294). Hillsdale, NJ: Erlbaum.