Electrophysiological Correlates of Phonological Processing: A Cross-linguistic Study
G. Dehaene-Lambertz
CNRS UMR 8554 and EHESS, Paris, France, and Centre Hospitalier Universitaire Bicêtre, France
E. Dupoux and A. Gout
CNRS UMR 8554 and EHESS, Paris, France
Abstract
It is well known that speech perception is deeply affected by the phoneme categories of the native language. Recent studies have found that phonotactics, i.e., constraints on the cooccurrence of phonemes within words, also have a considerable impact on speech perception routines. For example, Japanese does not allow (nonnasal) coda consonants. When presented with stimuli that violate this constraint, as in /ebzo/, Japanese adults report that they hear a /u/ between the consonants, i.e., /ebuzo/. We examine this phenomenon using event-related potentials (ERPs) in French and Japanese participants in order to study how and when the phonotactic properties of the native language affect speech perception routines. Trials of four similar precursor stimuli were presented, followed by a test stimulus that was either identical or different depending on the presence or absence of an epenthetic vowel /u/ between two consonants (e.g., "ebuzo ebuzo ebuzo ebzo"). Behavioral results confirm that Japanese participants, unlike French participants, are not able to discriminate between identical and deviant trials. In the ERPs, three mismatch responses were recorded in French participants. These responses were either absent or significantly weaker in Japanese participants. In particular, a component similar in latency and topography to the mismatch negativity (MMN) was recorded for French, but not for Japanese, participants. Our results suggest that the impact of phonotactics takes place early in speech processing and support models of speech perception which postulate that the input signal is directly parsed into the native-language phonological format. We speculate that such a fast computation of a phonological representation should facilitate lexical access, especially in degraded conditions.
INTRODUCTION
Humans use complex sounds to communicate and convey meaning. The way in which the mapping between the signal and concepts is realized, however, is heavily language dependent. For instance, some languages use only six consonants to construct words, others more than 80. Some use three vowels, others more than 20. The particular phoneme inventory of the native language has a strong influence on speech discrimination capacities in adults. For instance, adult Japanese monolinguals have great difficulty distinguishing English /r/ from /l/, because both are perceived as a single /R/. However, language sound systems differ in ways other than the repertoire of phonemes. They also differ in how particular phonemes can combine in a sequence, i.e., their phonotactic properties. In Japanese, only a rather strict alternation of vowels and consonants is allowed, whereas English allows clusters of several consonants (e.g., strengths). A lot of research has been devoted to the effect of the phoneme inventory on speech perception, but the role played by higher-order properties of linguistic signals is only starting to be explored. The focus of this paper is to study, using event-related potential (ERP) methodology and a cross-linguistic design, whether phonotactic properties have effects on speech perception routines that are as profound as those triggered by differences in phoneme inventories.
The acquisition of the language's phoneme inventory is very quick. At 6 months, infants have established prototypes for the vowels used in their language (Kuhl, Williams, Lacerda, Stevens, & Lindblom, 1992) and start to lose sensitivity to nonnative vowels (Polka & Werker, 1994). At 12 months, they lose the capacity to discriminate nonnative consonantal contrasts, or at least those that can be assimilated to native categories (Best, McRoberts, & Sithole, 1988; Werker & Tees, 1984). After that, the capacity to perceive foreign phonetic contrasts seems remarkably stable and poor, although discrimination remains possible depending on whether the foreign phonemes are closer to or further from the prototypes in the subjects' native language. This is also true if the foreign phonemes are not easily assimilated to any phoneme in the native language. Pallier, Bosch, and Sebastian (1997) have shown that even very fluent bilinguals who acquired a second language after age 5, and have used it extensively thereafter, have difficulties with vowel contrasts in the second language.

© 2000 Massachusetts Institute of Technology. Journal of Cognitive Neuroscience, 12:4, pp. 635–647
Electrophysiological studies suggest that the phoneme inventory of a particular language affects very early speech processing: A phonetic representation dependent on the subject's native language is coded in echoic memory. Näätänen et al. (1997) presented Estonian and Finnish participants with series of the vowel /e/ in which items were randomly replaced by a "deviant" vowel (/ö/, /õ/, or /o/). All these vowels exist in Estonian, but /õ/ does not exist in Finnish. In Estonian participants, a mismatch negativity (MMN) response was elicited by all the deviant vowels (100 to 240 msec after vowel onset). Its amplitude increased with the acoustic distance between the deviant and the standard vowel. In contrast, in Finnish participants, a significant drop in MMN amplitude was found for the nonnative /õ/ vowel. Such a drop for the nonnative vowel was also documented in the magnetic equivalent of the MMN. Dehaene-Lambertz (1997) used a similar paradigm to study consonant contrasts. She presented French participants with streams of CV syllables in which acoustic deviants were introduced that either crossed a phonetic boundary or remained within the same category. Two phonetic boundaries, one present and the other absent in the subjects' native language, were explored. A large MMN was induced 280 msec after syllable onset by deviants that crossed the native boundary, but not by nonnative or within-category deviants. In these experiments, the phonological mismatch was not preceded by an acoustical mismatch, suggesting that phonetic categorization is computed very early in speech processing (from 100 to 280 msec after stimulus onset). Moreover, this categorization appears to be highly dependent on the subject's native language. These results are consistent with the view that speech perception involves an early processing stage of phonemic categorization. Such a stage removes all of the irrelevant phonetic details from speech, but retains the linguistically relevant contrasts in the subject's native language. Consequently, two sounds that fall in the same native category are extremely difficult to distinguish (Best et al., 1988).
However, as we said above, languages' phonologies differ in properties other than, and more abstract than, the inventory of phonemes. Notably, the ways in which phonemes can cooccur within words differ. For instance, languages like Japanese have a very limited set of syllable shapes: V, VN, CV, CVN. Japanese syllables cannot have complex onsets (except for consonant-glide onsets) and cannot have codas (except for nasal consonants). In contrast, other languages like French or English allow much more complex and varied syllable types: CCV, VCC, CCVCC, etc. These contrasting phonotactic properties affect the way in which foreign words are incorporated into a language. For instance (see Table 1), Japanese borrows words from languages with complex syllabic structures by inserting an "epenthetic" vowel [u] or [o] inside consonant clusters or after final consonants, in such a way that the outcome conforms to the phonotactics of Japanese.
Does this process of vowel epenthesis occur in the input system, yielding the perception of illusory segments? Or does it reflect much later processing levels, such as the influence of orthographic notation or problems in speech production? Several studies have claimed that phonotactic properties directly affect perceptual processes (Dupoux, Kakehi, Hirose, Pallier, & Mehler, 1999; Halle, Segui, Frauenfelder, & Meunier, 1998; Pitt, 1998; Massaro & Cohen, 1983). However, this issue remains controversial. In the following sections, we review these studies and propose to test the current competing theoretical interpretations using electrophysiological methods.
Phonotactic Effects in Perception
Massaro and Cohen (1983) presented English subjects with syllables starting with an obstruent consonant followed by a liquid. The liquids were synthesized along a continuum between [r] and [l]. The clusters were either legal in English ([sl], [tr], [pl], [pr]) or illegal ([sr], [tl], [vl], [vr]). The results show that listeners tend to perceive ambiguous items as legal clusters rather than as illegal ones. Note, though, that these studies were run only on a population of English-speaking subjects, leaving open the possibility that part of the observed effect is due to universal acoustic/phonetic factors. Pitt (1998) recently replicated Massaro and Cohen's results and found that similar shifts in consonant identification are still observed when the illegal clusters are presented within words. However, given that these clusters are legal word-internally (crossroad, maudlin, Atlantic), no such shift should have been found. Therefore, it is possible that part of the original effect was due to some factor other than phonotactics.

Table 1. Original Words From a Non-Japanese Language and the Corresponding Japanese Adaptations (From Itô & Mester, 1995)

  Original Word    Japanese Adaptation
  "Fight"          faito
  "Festival"       fesutibaru
  "Sphinx"         sufiNkusu
  "Zeitgeist"      tsaitogaisuto
More recently, Halle et al. (1998) studied the perception of word-initial clusters in French. The clusters *[dl] and *[tl] are illegal word onsets, contrary to [gl] and [kl]. These illegal clusters tend to be perceived as [gl] and [kl] in an open-response paradigm as well as in a forced-choice paradigm. A gating study, however, demonstrated that during the initial portions of the stimuli, subjects perceive the phonemes as dental stops, but as information for the liquid becomes available in larger gates, perception switches and subjects identify the initial phoneme as velar. Such misperception is also found in a speeded phoneme detection study on nonwords beginning with the cluster [dl]: /d/ is missed in 69% of the cases, whereas /g/ is detected in 80%. As in Massaro and Cohen's study, one of the shortcomings of this work is that it is not cross-linguistic; hence, it is difficult to evaluate whether the observed effects are truly language dependent or are simply universal acoustic/phonetic effects.
Dupoux et al. (1999) studied the phenomenon of vowel epenthesis in Japanese using a cross-linguistic design. They used a continuum of stimuli ranging from ebzo (no vowel between the consonants) to ebuzo (a full vowel between the consonants). The Japanese, but not the French, participants reported the presence of a vowel [u] between the consonants, even in stimuli containing no acoustic correlate of a vowel. French participants, in contrast, had problems discriminating items with different vowel lengths (ebuzo versus ebuuzo), a distinctive contrast in Japanese, but not in French. These authors also used a speeded ABX discrimination paradigm and found that Japanese participants had trouble discriminating between the endpoint VCCV and VCuCV stimuli. The confusion between ebuzo and ebzo in Japanese was found even in participants who were quite proficient in French. These results suggest that phonotactics play a role important enough to produce the illusory perception of segments.
Finally, in terms of language acquisition, Jusczyk, Friederici, Wessels, Svenkerud, and Jusczyk (1993) and Jusczyk and Luce (1994) found that between 6 and 9 months, infants develop a preference for phoneme sequences that are typical of their native language. They prefer to listen to lists of native words over foreign words, even when the two languages are as close as Dutch and English. The main differences between these two languages come from the phoneme inventory and phonotactics, while word prosody is rather similar in both languages. Although this does not constitute direct proof that phonotactics play an early role in speech perception, it at least shows that the phonotactic characteristics of the native language may be in place during the first year of life, as early as the phoneme inventory.
In brief, many experiments suggest that phonotactics play a role both in perception and in acquisition. However, it is not clear at which processing level phonotactics interfere with speech perception. The next section examines the various options at hand.
Phonotactics in Models of Speech Perception
In the following, we distinguish three classes of models that make contrasting claims about the effects of high-order phonological properties such as phonotactics.
Segmental Models
Segmental models claim that speech sounds are first categorized in terms of discrete segments (phonemes or bundles of features) (Marslen-Wilson & Warren, 1994; McClelland & Elman, 1986; Eimas & Corbit, 1973). A representation in terms of a sequence of segments is derived and then used to retrieve word forms in the lexicon. In such models, whenever a listener is presented with a sequence of phonemes that belong to the phonetic inventory of the language in question, a stable representation should be obtained, regardless of whether or not that particular sequence is used in the language. The only way these models could account for phonotactic effects would be to claim that the lexicon provides feedback, or that the effects occur in another part of the processing system. McClelland and Elman (1986) claim that the phonotactic effects found by Massaro and Cohen (1983) can be modeled by appealing to top-down word-to-phoneme influences during perception (but see a reply in Massaro & Cohen, 1991). Marslen-Wilson and Warren (1994) claim that nonwords are perceived through the activation of the lexicon; in such a view, phonotactic effects emerge from a rather late process of finding nonwords through analogy. In brief, in this first class of models, phonotactics can only have a rather late effect, one that reflects either the joint activation of many words in the lexicon or a postaccess mechanism.
Hierarchical Models
As above, these models propose that a segmental representation is derived first. In addition, however, this representation is used to construct a hierarchical phonological representation that contains higher-order units such as morae, syllables, and feet (Pallier, 1994; Church, 1987; Frazier, 1987). Such a phonological representation is obtained from the segments using language-specific rules. Pallier, Sebastian-Gallés, Felguera, Christophe, and Mehler (1993) found evidence that listeners build a structured representation containing segments and syllables on-line (see also Pallier, 1994). In such models, one could then propose that incorrect or illegal phonological forms are automatically regularized by the parsing device. Hence, an illegal nonword like "ebzo" would, for Japanese speakers, be corrected by the parser to "ebuzo" via the insertion of a vowel. In such a model, because of the time it takes for the parser to operate, some delay in the detection (or correction) of an illegal form is predicted. Hence, one should expect an early segmental representation level and a somewhat later "regularized" phonological representation level.
Coarse Coding Models
Coarse coding models postulate that the input signal is directly parsed into large processing units. For example, Mehler, Dupoux, and Segui (1990) have proposed SARAH, a model based on an array of syllable detectors. In this model, speech sounds are categorized into syllable-sized units, the repertoire of syllables comprising all the syllables used in the language. Similar proposals have been made for triphones (Wickelgren, 1969), diphones (Klatt, 1979), and semisyllables (Dupoux, 1993; Fujimura, 1976). In such a view, an account of phonotactically based assimilation goes as follows: Faced with a foreign language, the perceptual system tries to parse the signal using, say, the available native syllabic categories. In Japanese, there are no syllable categories containing consonant clusters or coda consonants. A stimulus like /ebzo/ therefore activates categories for "e" and "zo". It also activates, to a lesser extent, all syllables that start with /b/: "bu," "ba," "be," "bi," and "bo". The "bu" interpretation is favored, perhaps because in Japanese the [u] vowel is frequently shortened or devoiced and shows considerable allophonic variation (see Keating & Huffman, 1984; Beckman, 1982). Hence, the prototype for "bu" is fairly tolerant and should emerge as the best match. Phonotactics, in this final class of models, probably have a very early effect, one that is not distinguishable from that of phoneme inventories.
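This best-match account can be illustrated with a minimal sketch. The syllable inventory, the epenthesis costs, and the greedy left-to-right parse below are our own illustrative assumptions, not part of SARAH; the low cost assigned to [u] stands in for its devoiced, allophonically variable realization in Japanese.

```python
# Toy sketch of a coarse-coding account of vowel epenthesis.
# Inventory and costs are illustrative assumptions, not the model itself.
JAPANESE_SYLLABLES = {"e", "i", "o", "zo", "mo", "go",
                      "bu", "ba", "be", "bi", "bo", "gu"}

# Assumed cost of hearing a vowel that was absent from the signal:
# "u" is cheap because a vowel-less [b] is an acceptable token of "bu".
EPENTHESIS_COST = {"u": 1, "a": 3, "e": 3, "i": 3, "o": 3}

def parse(segments):
    """Greedily cover the segment string with native syllable categories,
    inserting the cheapest epenthetic vowel after a stray consonant."""
    out, cost, i = [], 0, 0
    while i < len(segments):
        if segments[i:i + 2] in JAPANESE_SYLLABLES:   # prefer a CV syllable
            out.append(segments[i:i + 2]); i += 2
        elif segments[i] in JAPANESE_SYLLABLES:       # lone vowel syllable
            out.append(segments[i]); i += 1
        else:                                         # stray consonant
            vowel = min(EPENTHESIS_COST, key=EPENTHESIS_COST.get)
            out.append(segments[i] + vowel)
            cost += EPENTHESIS_COST[vowel]
            i += 1
    return "".join(out), cost
```

On this toy account, /ebzo/ and /ebuzo/ map onto the same syllable sequence, differing only in a small match cost, which is one way to picture why the two are confusable for Japanese listeners.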
As we saw above, these three classes of models make rather clear-cut predictions with respect to the time-course of processing for phonotactic information. We propose to test these hypotheses by using high-density ERPs, which are an ideal tool for exploring the time-course of cerebral processing.
Hypotheses and Design
The goal of this experiment was to find out when the phonotactic properties of one's native language influence speech perception. In order to study this, we used vowel epenthesis, as described above, in a cross-linguistic design. We used a mismatch detection task in which a series of four similar precursor stimuli was presented, followed by a fifth stimulus either similar to the previous stimuli (control condition) or different (deviant condition) (see Table 2). The stimuli were minimal pairs of nonwords, where the only difference was the presence or absence of the epenthetic vowel /u/ between two consonants (igumo versus igmo). The behavioral predictions based on Dupoux et al. (1999) were that French subjects would detect the deviant stimulus, whereas Japanese subjects would not. In order to prevent acoustic information from influencing the detection of the deviant stimulus, acoustic variability was introduced in the precursors: These stimuli were randomly drawn from a large set of stimuli recorded by six different female Japanese speakers. A male voice was used for the test stimulus. In order to control for the temporal characteristics of the test item, the duration of the individual phonemes was matched across trials by using resynthesized versions recorded by a Japanese male speaker. Finally, we introduced distractor trials in which the test item was deviant for both the French and the Japanese subjects (igimo). These trials were introduced so that the Japanese subjects would encounter clear cases of a "different" response during the experiment.
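The trial structure just described can be sketched as follows. The speaker labels, function name, and tuple encoding are our assumptions for illustration only; the items and conditions are those given in the text and Table 2.

```python
import random

FEMALE_SPEAKERS = ["F1", "F2", "F3", "F4", "F5", "F6"]  # precursor voices
MALE_SPEAKER = "M1"                                     # test-item voice

def make_trial(standard, condition, rng=random):
    """Return four precursors in randomly drawn female voices followed by
    a male-voice test item: identical (control), epenthesis contrast
    (deviant), or clearly different for both groups (distractor)."""
    test = {"control": standard,
            "deviant": "igmo" if standard == "igumo" else "igumo",
            "distractor": "igimo"}[condition]
    precursors = [(standard, rng.choice(FEMALE_SPEAKERS)) for _ in range(4)]
    return precursors + [(test, MALE_SPEAKER)]
```

Drawing the precursor voices at random while reserving the male voice for the test item reflects the design choice above: any purely acoustic mismatch is present on every trial, so only a phonological change distinguishes deviant from control trials.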
ERPs were recorded during the behavioral task. To detect when Japanese phonotactics modify the electrophysiological responses in the Japanese as compared to the French participants, we compared the ERPs in the control versus deviant conditions for both groups.
In this type of experimental paradigm, where a deviant item is presented after a succession of similar items, a mismatch process between the features of the novel stimulus and the neural traces of the preceding stimuli in sensory memory leads to an early discrimination effect (the MMN) (Näätänen, 1990). If a separate phonological coding, independent of acoustical format, is carried out, as suggested by Dehaene-Lambertz (1997) and Näätänen et al. (1997), French subjects should display an MMN response time-locked to the point of deviance between the context and the test stimuli (the third phoneme). It was also expected that French subjects would show a deviance effect in the time-window of the late positive component (LPC). Indeed, this component is sensitive to the conscious detection of a deviant item and to the decision process involved in making a response. It correlates directly with behavioral performance, which should show that French subjects have no problem discriminating /igmo/ from /igumo/.

Table 2. Experimental Conditions and Predictions for the Behavioral Responses in Japanese and French Participants

  Condition   Precursor Items            Test Item  Predictions: Japanese  Predictions: French
  Control     igumo igumo igumo igumo    igumo      Same                   Same
              igmo igmo igmo igmo        igmo
  Deviant     igumo igumo igumo igumo    igmo       Same                   Different
              igmo igmo igmo igmo        igumo
  Distractor  igumo igumo igumo igumo    igimo      Different              Different
              igmo igmo igmo igmo        igimo
After isolating the discrimination responses in French subjects, we examined how they are affected by the subjects' native language. Here, the predictions are quite straightforward. If the phonotactic effect takes place late, early electrical components should be identical for French and Japanese subjects. Indeed, we should find an early MMN for the "ebuzo" versus "ebzo" contrast in Japanese and French subjects, even if later processes mask this mismatch effect and prevent Japanese subjects from consciously detecting the change. If, on the other hand, the phonotactic effect takes place very early on, a very reduced MMN response, or none at all, should be found for Japanese subjects.

In brief, in order to find an electrical component sensitive to a specific language, we first isolated the time-windows for which the control versus deviant comparison was significant in French subjects. We then tested whether or not the effect for Japanese subjects was significant in this time-window. Furthermore, we computed the condition (control versus deviant) × language (French versus Japanese) interaction. If the electrical component tested is indeed language-specific, the language × condition interaction should be significant.
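The window-isolation step can be sketched as below. The data layout, the fixed t threshold, and the greedy run detection are our illustrative assumptions; the authors' actual analysis inspected two-dimensional reconstructions of t-test values across the scalp.

```python
from math import sqrt

def significant_windows(control, deviant, t_crit=2.20):
    """control, deviant: per-subject amplitude time series (lists of lists).
    Return (start, end) index pairs of contiguous time points at which the
    paired t statistic exceeds t_crit (~ two-tailed p < .05 for n = 12)."""
    n = len(control)
    n_times = len(control[0])
    windows, start = [], None
    for j in range(n_times):
        diffs = [deviant[s][j] - control[s][j] for s in range(n)]
        mean = sum(diffs) / n
        var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
        t = mean / sqrt(var / n) if var > 0 else 0.0
        if abs(t) > t_crit and start is None:
            start = j                      # a significant run begins
        elif abs(t) <= t_crit and start is not None:
            windows.append((start, j))     # the run ends before index j
            start = None
    if start is not None:
        windows.append((start, n_times))
    return windows
```

The interaction test described above would then be run only within the windows this step returns for the French group, rather than at every time point.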
RESULTS
Behavioral Responses
The analysis of the percentage of "different" responses showed a main effect of language (F(1,22) = 382.8, p < .0001), a main effect of condition (F(1,22) = 2,026.9, p < .0001), and a significant language × condition interaction (F(1,22) = 1,814.1, p < .0001) (Table 3). A post hoc analysis revealed that this interaction is mostly due to the deviant condition. Indeed, French subjects perceived a change in 95.15% of the trials, whereas Japanese subjects perceived it in only 8.88% of the trials (F(1,22) = 1,045.8, p < .0001). In the control condition, performance was nearly identical for French and Japanese subjects, although the Japanese subjects made more errors than the French (F(1,22) = 5.13, p = .034). French subjects responded more slowly than the Japanese (1,174 versus 1,023 msec; F(1,22) = 4.31, p = .05). In both populations, subjects were slower in the deviant condition (1,174 msec) than in the control condition (1,023 msec; F(1,22) = 13.5, p = .001). The interaction between language and condition was significant (F(1,22) = 4.5, p = .045).
Electrophysiological Results
At the vertex, the classical auditory components N1 and P2 were recorded, followed by two negativities, N2 and N3, due to the multisyllabic structure of the test item. Then a slow positivity, the LPC, due to the active response, was recorded. Because the control and deviant items began to diverge around 237 msec after item onset,¹ i.e., at the introduction of the third phoneme, we indicate the latencies of the electrical components relative to this time. Inspection of the time-course of two-dimensional reconstructions of t test values in the comparison of deviant with control trials isolated three time-windows in which significant differences were present in French subjects: 139 to 283 msec (376 to 520 msec post item onset), including N2 and N3, for the first response; 291 to 419 msec (528 to 656 msec post item onset) for the second response; and 523 to 651 msec (760 to 888 msec post item onset) for the third response (peak of the LPC). In Japanese subjects, deviant minus control differences were weaker in size and duration than in French subjects. When differences were observed, they occurred during the time-windows isolated in French subjects.
First Response: 139 to 283 msec Postdeviance Onset
The first condition effect in French subjects consisted of a sharper negativity in deviant than in control trials at 147 msec (N2) and 251 msec (N3) above frontal electrodes. The polarity of this effect was reversed above temporal electrodes. As illustrated in Figure 1, the cartography of the difference between deviant and control trials (t test) was very close to the topography of an MMN as described in the literature, i.e., a negativity above the fronto-central region with a reversal of polarity at the mastoids. In Japanese subjects, no difference between conditions was evident.

To study this first response, we chose a central pair and the mastoid electrodes, where the deviant minus control difference was respectively negative and positive (Figure 1). For both pairs of electrodes, the condition effect was significant in French (central pair: F(1,11) = 7.55, p = .019, and mastoid pair: F(1,11) = 5.23, p = .043), but not in Japanese subjects (central pair: F(1,11) = 1.23, p = .291, and mastoid pair: F(1,11) < 1). The language × condition interaction was significant for the central electrodes (F(1,22) = 8.17, p = .009), but not for the mastoid electrodes (F(1,22) = 1.46, p = .240). There was neither a main effect of hemisphere nor any significant interaction of hemisphere with any of the other factors. However, as illustrated by Figure 1, the condition effect in French subjects was stronger for the left mastoid (F(1,11) = 5.63, p = .037) than for the right (F(1,11) = 1.75, p = .417).
Second Response: 291 to 419 msec Postdeviance Onset
A second response was present around 300 msec in French subjects and lasted about 130 msec: While a posterior positivity began in the control condition, it was delayed in the deviant condition. The deviant minus control subtraction was very asymmetric in French subjects, with a marked positivity above the right-frontal region and a medial-posterior negativity that was more pronounced on the right side (Figure 2). Although the voltage cartographies in Japanese subjects seemed similar to those in French subjects, the t test maps showed much weaker effects in size and duration.
An inferior-frontal pair (at the maximum of the positivity) and an occipital pair (at the maximum of the negativity) were analyzed for this second time-window. In French subjects, there was no main effect of condition for the infero-frontal electrodes, but there was a significant hemisphere × condition interaction (F(1,11) = 10.28, p = .008) due to a condition effect present only at the right infero-frontal electrode (F(1,11) = 19.17, p = .001 for the right electrode; F(1,11) < 1 for the left). In Japanese subjects, there was no effect of condition for this pair (F(1,11) = 1.62, p = .125), nor any significant hemisphere × condition interaction (F(1,11) < 1). The language × condition × hemisphere interaction was significant (F(1,22) = 5.43, p = .029) due to a significant condition × language interaction for the right electrode alone (F(1,22) = 8.40, p = .008 and F(1,22) < 1 over the right and left electrodes, respectively). Over the occipital pair, there was a condition effect in both the Japanese and the French subjects (F(1,11) = 14.37, p = .003 in French and F(1,11) = 4.9, p = .049 in Japanese). The language × condition interaction was again significant (F(1,22) = 5.62, p = .027). For this occipital pair, there was no significant interaction of hemisphere with any other factor.

Table 3. "Different" Responses and Reaction Times in Japanese and French Participants

  Experimental Condition        Control  Deviant  Distractor
  Japanese participants
    "Different" responses (%)       6.1      8.9       99.5
    Reaction times (msec)         1,030    1,016        996
  French participants
    "Different" responses (%)       1.7     95.1       99.2
    Reaction times (msec)         1,134    1,215      1,128

Figure 1. ERPs to the last item of the trials in French (left) and Japanese (right) participants. Top: ERPs from two symmetrical central electrodes. The first bar indicates the item onset and the second bar the onset of the deviance between control and deviant items. Bottom: Maps of evoked responses to control and deviant conditions at 164 msec following deviance onset (arrow on the waveforms) and maps of statistical significance (t test) of deviant versus control items at the same time. In French participants, a larger negativity for deviant than for control items is recorded at the fronto-central site, with a polarity reversal over the temporal regions, whereas no difference between conditions is present in Japanese participants.
Third Response: 523 to 651 msec Postdeviance Onset
Finally, the third response in French subjects was related to a larger and longer LPC for the deviant than for the control condition. As with the second response, it was a relatively slow response, lasting around 220 msec. In Japanese subjects, the LPC was weaker than in French subjects, with little difference between the deviant and control conditions (Figure 3).

For this last response, we chose the central and inferior-frontal pairs already analyzed in the preceding time-windows. The voltage amplitude was higher in French than in Japanese subjects, yielding a main effect of language for the infero-frontal electrodes (F(1,22) = 18.63, p < .001) and a trend for the central pair (F(1,22) = 3.05, p = .095). In French subjects, there was a condition effect for the central (F(1,11) = 12.27, p = .005) and the infero-frontal pairs (F(1,11) = 11.95, p = .005). In Japanese subjects, there was no condition effect for either pair. The language × condition interaction was significant for both pairs (central pair: F(1,22) = 4.44, p = .047, and infero-frontal pair: F(1,22) = 6.75, p = .016). There was also, in French subjects, a significant condition × hemisphere interaction for the central pair (F(1,11) = 5.83, p = .034) due to a predominant response over the right side (F(1,11) = 15.52, p = .002 over the right electrode and F(1,11) = 2.24, p = .163 over the left). Above this same right side, there was a condition effect in Japanese subjects (F(1,11) = 6.67, p = .025), with no significant hemisphere × condition interaction. However, the language × condition interaction was again significant (F(1,22) = 5.91, p = .024) for this right central electrode.
DISCUSSION
In this experiment, a deviant stimulus was introduced
after the presentation of four similar stimuli. Because
Figure 2. ERPs to the last item of the trials in French (left) and Japanese (right) participants. Top: ERPs from two symmetrical infero-frontal electrodes. The first bar indicates the item onset and the second bar the onset of the deviance between control and deviant items. Bottom: Maps of evoked responses to control and deviant conditions at 315 msec following deviance onset (arrow on the top waveforms) and maps of statistical significance (t test) of deviant versus control item at the same time. In both groups, while there is a beginning of posterior positivity for the control condition, it is delayed in the deviant condition. However, the condition effect is much larger for French than for Japanese subjects, yielding a significant condition by language interaction.
our goal was to study linguistic representations independent of acoustical representations, we used precursor items spoken by four different voices. The similarity between items could thus be computed only at a linguistic level. This experimental situation is very close to a natural situation where listeners have to normalize speech across different speakers. The behavioral results demonstrated that Japanese and French subjects do not react similarly to the same stimuli, and that performance is strongly influenced by the subjects' native language. There was a huge language by condition interaction. Japanese subjects almost never heard a difference between items like /igmo/ and items like /igumo/. These results clearly indicate that the phonotactic rules of the native language deeply modify speech perception and confirm that vocalic epenthesis in Japanese is not only a production phenomenon, but also a by-product of perception processes. This is confirmed by the electrophysiological results that we discuss below.
For French subjects, the deviant versus control comparisons produced three responses related to the introduction of a novel linguistic item. The statistical analyses showed that these responses were either not present in Japanese or were of shorter duration and weaker. Moreover, the language × condition interaction was significant for all three responses. We will first consider the meaning of these three responses in French subjects in the context of the existing literature. We will then discuss the cross-linguistic evidence (electrophysiological results) and their significance for speech perception models.
In the literature, there are reports of a mismatch negativity or MMN that is elicited whenever there is a mismatch between the features of a perceived stimulus and the representation in the sensory memory left by the stimuli immediately preceding it. This component is specific to the auditory modality and involves generators that are predominantly located in the planum temporale (Giard, Perrin, Pernier, & Bouchet, 1990). Recent experiments have shown that the MMN is not only elicited when an acoustical mismatch is detected, but also when the differences between the precursors and the test item concern more abstract properties, like phonetic categories (Dehaene-Lambertz, 1997; Näätänen et al., 1997), or expectations built by subjects during the experimental run (Cowan, Winkler, Teder, & Näätänen, 1993).
Because of its latency (139 to 283 msec postdeviance
Figure 3. ERPs to the last item of the trials in French (left) and Japanese (right) participants. Top: ERPs from the two same symmetrical central electrodes presented in Figure 1. The first bar indicates the item onset and the second bar the onset of the deviance between control and deviant items. Bottom: Maps of evoked responses to control and deviant conditions at 531 msec following deviance onset (arrow on the top waveforms) and maps of statistical significance (t test) of deviant versus control item at the same time. A larger and longer LPC was elicited for the deviant than for the control condition, especially in French participants.
onset) and its topography (centro-frontal negativity synchronous with a temporal positivity, which tends to be asymmetric in favor of the left mastoid), the first significant response observed in French subjects seems very close to an MMN, as it is described in the literature. Note, however, the differences between the procedure used in our experiment and that of classical MMN experiments. First, our participants were required to make an active comparison between the test item and the precursors, whereas the MMN is usually recorded in a passive listening paradigm. Second, our stimuli were more complex than the stimuli usually used in experiments in which MMNs are elicited: The context was highly variable, and the stimuli were multisyllabic. It is, in fact, quite surprising to find such similarities between our first response and the MMN, and it would be interesting to determine whether this response is subserved by the same generators as those classically involved in the MMN: Attention to the stimuli in our experiment may have enhanced automatic processes, and selectively favored one type of processing rather than another. In any event, our results demonstrate that across different speakers, different speech rates, and, hence, across important acoustic differences, French subjects were able to establish a more abstract representation that is used to compare test and precursor items. Such comparison was done at an early latency, indeed a latency comparable to those found in comparisons of simple acoustical dimensions.
A second response was evident in French subjects between 291 and 419 msec postdeviance onset. Its topography is quite peculiar, with positivity over the right-frontal region and negativity over the medial-posterior regions. The frontal asymmetry is very clear, as shown by Figure 2 and by the highly significant hemisphere × condition interaction. Such an effect has not been reported previously in experiments where the MMN was recorded. However, because of the relative complexity of our comparison task, it is possible that in order to respond, subjects not only used the phonetic representation stored in the sensory memory, but also relied on other processing in order to do a second check before responding. The difference in topography between the first and second response (see Figures 1 and 2) eliminates a second loop in the same process. In the following, we speculate that this second effect could be related to higher-level processes that mediate, e.g., metaphonological awareness. Several experiments have indeed described an electrical response related to phonological processing, called phonological mismatch or PMM (Connolly & Phillips, 1994). This response has been elicited for pairs of words or nonwords in metaphonological tasks like rhyme judgement (Perez-Abalo, Rodriguez, Bobes, Gutierrez, & Valdes-Sosa, 1994) or when subjects' expectations about the word at the end of a sentence were violated (Connolly & Phillips, 1994). A phonological mismatch response has also been recorded after visual presentation of words or pictures (Perez-Abalo et al., 1994). Although it is difficult to judge from the single electrode presented in that paper whether it is the same or a close neural network that is involved under the visual and auditory conditions, the functional similarities of the responses in the two modalities suggest that the PMM may reflect a phonological representation independent of the modality of the stimulus presentation. The PMM latency is longer than that of the MMN (270–300 msec in Connolly & Phillips, 1994; 250–450 msec in Praamstra, Meyer, & Levelt, 1994, on alliteration of pairs of words) and compatible with our second response (290 to 400 msec). The topographies are more difficult to compare across experiments because of the small number of electrodes used in the recording systems of these experiments, and the different choices of reference. However, none of these experiments has reported the frontal asymmetry that we found. More data are needed to better define the process that elicits a PMM and whether our second effect is similar to what is described in the literature as a PMM.
Finally, we found a late positive complex (LPC), which is known to be due to the conscious detection of a less frequent stimulus and is modulated by the response decision. Because the behavioral results showed a major effect of deviance detection in French subjects, a significant difference between deviant and control was expected at this level. Indeed, there was a larger and longer positivity over the central regions for the deviant condition as compared to the control.

In conclusion, we have identified three electrical responses related to the detection of a phonemic deviance in complex items in French subjects. We will now see how native language modifies these responses and discuss the Japanese results as related to the French data.
ERPs demonstrated that native language interacts with phonological representation and goes as deep as sensory memory: For all three responses identified in French subjects, a significant language × condition interaction was found. First, no early MMN effect was found in Japanese subjects, even with instructions focussing subjects' attention towards the auditory stimuli and a deviance detection task. This suggests that phonotactics play a very early role that probably goes back to the coding of phonetic properties. Second, at the time-window of the second response, Japanese and French voltage cartographies look similar, and a weak but significant condition effect is present over the occipital electrodes in Japanese. Finally, there was also a weak condition effect at the third time-window in this group.
It might seem paradoxical that the first response is sensitive to the subjects' native language while the following responses appear to be dependent on universal coding. Two possibilities could explain this paradox. First, since ERPs rely on the precise timing of a response related to input, a variable response might not appear in the average. The mismatch response is probably not an all-or-nothing process. It is therefore possible that from time to time, deviant items are effectively coded as deviant in the sensory memory of Japanese subjects, but this computation is neither as specific nor as automatic as that of French subjects and would disappear through the averaging process. However, the second response may amplify differences between deviant and control by summing up the previous deviance computations across a longer time-window and thus demonstrate a condition effect. The other possibility is that the second response processes information coming from different subsystems, e.g., a prototypicality system, which may emit an error signal when illegal clusters are presented. There is also the possibility of a phonetic system that keeps track of the phonemes presented, but this process may well be encapsulated and not easily accessible for mismatch processes. Although each subsystem on its own would emit a signal too weak to be noticed, together they would be strong enough to create a deviance effect at the second time-window. In any case, the difference in sensitivity of the first two responses to native language confirms the fact that the phonological representations tagged by these responses are different.
What are the theoretical implications of these results for speech perception models? The important result here is that at the time of the first mismatch response in French subjects, Japanese subjects show no evidence of a deviance effect. We could thus conclude that a fast and automatic coding of the speech input exists that relies mainly on the formats authorized by the native language. These results are compatible with Coarse Coding models like Sarah proposed by Mehler et al. (1990), but are difficult to explain using Segmental or Hierarchical models. However, the condition effect observed in Japanese during the second response suggests that some information about the input could be recovered from other processing systems.
From a behavioral point of view, it has already been demonstrated that the Japanese are able to access this information. For example, Dupoux et al. (1999) found in Experiments 1 and 2 that, although Japanese listeners tend to report that they hear a vowel /u/ in stimuli like /igmo/, they do so at a significantly lower rate than when the /u/ is really present (65–70% versus 95%). Similarly, in Experiment 3 they found that Japanese listeners make 32% errors in an ABX task involving stimuli like /igmo/ and /igumo/. Although such an error rate was significantly higher than for control French listeners (6%), it was still better than chance, suggesting that Japanese subjects have residual abilities to distinguish a real from an illusory vowel. What our current study suggests is that this residual capacity is slower, and that it relies on a different network than the one used with native contrasts.
In conclusion, language phonotactics deeply affect speech perception. Such fast computation of a phonological representation might be useful in a noisy environment or in the event of mispronunciation in order to reconstruct the correct item and to facilitate lexical access. Because infants are sensitive to the phonotactic constraints of their native language around 9 months, we need to study how these rules are implemented during language learning. Kuhl et al. (1992) have suggested that exposure to a particular language, and thus to particular phonemes, increases responses to the language's prototype phonemes while decreasing responses to the adjacent phonemes that are not present in the environment. If we extend this concept from phoneme category to phoneme combinations, the learning brain may in fact compute larger units that include several phonemes. This would accelerate automatic responses to frequent combinations of phonemes, but would prevent the stabilization of the representation of phoneme combinations that are never encountered.
METHOD
Subjects
Twelve Japanese and 12 French subjects were recruited in Paris and tested individually, after giving written informed consent. They were all right-handed according to self-report and the Edinburgh inventory. None of them had a history of neurological or psychiatric disease, or a hearing deficit. At the end of the experiment, they filled out a detailed biographical questionnaire about their experience with foreign languages.
Japanese Subjects
Three men and nine women (age: 20 to 36, median 28.4) were paid for participating in the experiment. They had all begun to study English after the age of 12, mostly by reading (according to the questionnaire, 80% of the teaching was in written mode and 20% was spoken). They had all begun to study French after the age of 18 (except one subject at 16), mostly by reading (according to the questionnaire, 60% of the teaching was written and 40% was spoken).
French Subjects
Three men and nine women (age: 18 to 39, median 25)
were volunteers and were not paid for their participa-
tion. None spoke Japanese. They had started studying
English after the age of 12. Some also knew German,
Spanish or Italian.
Stimuli
The stimuli for this experiment consisted of 18 items or six triplets of the form V1C1C2V2 (igmo), V1C1UC2V2 (igumo), and V1C1IC2V2 (igimo). Each triplet was uniquely defined by the particular combination of the V1C1C2V2, which we call a radical. There were six radicals: igmo, igna, ikno, ikma, okna, and ogma.2 All 18 resulting stimuli were nonwords in both Japanese and French.
The stimuli were presented in blocks of five items.
The first four items constituted the precursors and the
last the test item. The material therefore consisted in
two sets of stimuli: one for the precursor items and one
for the test items. In both sets, stimuli were selected
from a larger group by three French phoneticians and
six naive Japanese subjects. We only retained those
stimuli that were intelligible and that sounded reason-
ably natural in both languages.
The precursor set consisted in 72 different, naturally produced items (six radicals by two forms (igmo or igumo) by six speakers). They were selected from a set of 648 stimuli produced by six female Japanese speakers. For each speaker, nine utterances for each radical and both forms /igumo/ and /igmo/ were recorded in a sound-attenuated room. The stimuli were digitized at 16 kHz/16 bits on an OROS AU 22 board and processed on a waveform editor. Six items for each of the six radicals and for the V1C1UC2V2 (igumo) and the V1C1C2V2 (igmo) forms were selected from this large set, depending on the distribution of Japanese speakers in both groups. In the V1C1UC2V2 (igumo) stimuli set, we selected the items where the /u/ vowel was consistent with the French prototype. Indeed, in these stimuli, the production of the Japanese /u/ vowel varied between the /u/ and the /y/ French prototypes. In the V1C1C2V2 stimuli, although Japanese speakers knew foreign languages, they could not be prevented from inserting a very short /u/ vowel into the consonant cluster. Stimuli were edited with a waveform editor, and the vocalic part was progressively removed until a French judge found that the /u/ vowel could no longer be perceived.
The test items consisted in 18 synthetic items, one item for each of the six radicals in each of the three types: V1C1UC2V2 (igumo), V1C1C2V2 (igmo), and V1C1IC2V2 (igimo). They were synthesized with a MBROLA speech synthesizer (Dutoit, Pagel, Pierret, Bataille, & Vreken, 1996), using the natural productions of a male Japanese speaker as a model. The algorithm for the speech synthesis used a male voice and a French diphone database. The test stimuli were edited with a speech editor to make sure there was no "schwa" vowel inserted in the consonant cluster; any evidence of a schwa was removed to obtain an unambiguous consonant cluster. The phonemes of the resynthesized stimuli had the same duration and the same pitch as the original natural ones. Across the six radicals, the durations of the first three phonemes were measured. The first phoneme, V1, started 9 to 11 msec after the onset of the file. The second phoneme, C1, started 130 to 136 msec after the onset of the file. The third phoneme (U, C2, or I, depending on the form of the stimulus) started 236 to 240 msec after the onset of the file. Mean stimulus durations were: 746 msec for V1C1UC2V2 stimuli, 653 msec for V1C1C2V2, and 647 msec for V1C1IC2V2.
Procedure
Trials consisted in blocks of five items separated by 600 msec of silence: Four precursor items were followed by one test item. The precursors could either be in the igmo or in the igumo form, the test item defining the condition. Six types of trials were randomly presented. In the control condition, the test item was identical to the precursors (igmo . . . → igmo or igumo . . . → igumo). In the deviant condition, the test item was different from the precursors (igmo . . . → igumo or igumo . . . → igmo). In the distractor condition, the test item was always of the igimo type (igumo . . . → igimo or igmo . . . → igimo). For each trial, a radical was randomly selected and then the four precursors were randomly chosen from the six possibilities in their radical group. The 36 different trials (six radicals × two forms × three conditions) were repeated 10 times each, in different random order. Reaction times were measured from the onset of the test item with a maximum response delay allowed of 3 sec. The next trial was presented 1,200 msec after the behavioral response. The entire set of 360 trials was divided into 20 trial blocks separated by a short pause to allow subjects to relax. The total duration of the experiment was 1 hr.
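As a concrete illustration, the trial structure just described (36 trial types, 10 repetitions each, precursors drawn at random within a radical group) can be sketched in Python. This is our own sketch, not the original experiment code (which used the EXPE package); the names are illustrative, and sampling precursor tokens without replacement is an assumption, since the text only says the four precursors were randomly chosen from the six possibilities.

```python
import itertools
import random

# Design constants taken from the text: six radicals, two precursor forms,
# three test conditions, and ten repetitions of each of the 36 trial types.
RADICALS = ["igmo", "igna", "ikno", "ikma", "okna", "ogma"]
FORMS = ["cluster", "vowel"]            # igmo-type vs. igumo-type precursors
CONDITIONS = ["control", "deviant", "distractor"]
REPEATS = 10
N_TOKENS = 6                            # recorded tokens per radical and form
N_PRECURSORS = 4

def build_trial_list(seed=0):
    """Return the 360 trials as (radical, form, condition) tuples in random order."""
    rng = random.Random(seed)
    trials = list(itertools.product(RADICALS, FORMS, CONDITIONS)) * REPEATS
    rng.shuffle(trials)
    return trials

def draw_precursors(rng):
    """Pick four of the six recorded tokens for the selected radical and form.

    Sampling without replacement is an assumption; the text only states that
    the precursors were randomly chosen among the six possibilities.
    """
    return rng.sample(range(N_TOKENS), N_PRECURSORS)
```

Running `build_trial_list` yields each of the 36 trial types exactly 10 times, in a shuffled order, matching the 360-trial session described above.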
Stimuli and trial presentation, randomization, and response measurement were carried out using the EXPE software package on a PC-compatible computer with a ProAudio Spectrum 16 D/A board (Pallier & Dupoux, 1997). Stimuli were presented binaurally via loudspeakers at 65 dB.
Subjects were tested individually in a quiet room. Following electrode application, they were seated in a comfortable chair and were instructed to move as little as possible and to fixate a point in front of them during each block of recordings. The subjects were informed that they would hear lists of five stimuli, each of which would be made up of Japanese nonwords. The first four items would be identical and uttered by female Japanese speakers, while the fifth would be spoken by a male voice. They were instructed to indicate if the last item was different by making a bimanual same–different response. The side associated with the "same" response was changed in the middle of the experiment, and the order counterbalanced across subjects.
Before starting the ERP recording, there was a training session of 12 randomly chosen trials, consisting of six controls and six distractors. No deviant trial was presented. It was considered that both Japanese and French subjects would respond "same" in the control condition and "different" in the distractor condition. Subjects received visual feedback about whether they had made a correct response or not. Training session results were excluded from the data analysis.
Recording System
ERPs were collected using a 128-channel geodesic electrode net (Tucker, 1993) referenced to the vertex. This device consists of 128 Ag/AgCl electrodes encased in sponges moistened with a salty solution. The net was applied in anatomical reference to the vertex and the cantho-meatal line. Vertical eye movements and blinks were monitored via two frontal and two infra-orbital electrodes, and two canthal electrodes were used to check for horizontal eye movements.

Scalp voltages were recorded during the entire experiment, amplified, filtered between 0.1 and 39.2 Hz, and digitized at 125 Hz. Then, the EEG was segmented into epochs starting 1500 msec before the onset of each test stimulus and ending 1500 msec after it. These epochs were automatically edited to reject trials contaminated by significant eye movements (deviation higher than 70 µV on the horizontal and vertical para-ocular electrodes) or body movements (local deviation higher than 70 µV and global deviation higher than 100 µV). The artifact-free trials were then averaged for each subject across the three experimental conditions (control, deviant, distractor). Averages were corrected through a 150-msec baseline (1500 to 1350 msec before stimulus onset), transformed into reference-independent values using the average reference method, and then digitally filtered between 0.5 and 20 Hz. Two-dimensional reconstructions of scalp voltage at each time-step were computed using a spherical spline interpolation (Perrin, Pernier, Bertrand, & Echallier, 1989).
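The segmentation, artifact-rejection, baseline-correction, and average-reference steps described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the original pipeline: the function names are ours, the rejection rule is reduced to a single peak-to-peak threshold per channel rather than the separate ocular and body-movement criteria, and the baseline is taken as the first 150 msec of each epoch.

```python
import numpy as np

FS = 125  # sampling rate in Hz, as in the recording description

def epoch(eeg, onsets, pre_s=1.5, post_s=1.5, fs=FS):
    """Cut continuous EEG (channels x samples) into (trials x channels x time)."""
    pre, post = int(pre_s * fs), int(post_s * fs)
    return np.stack([eeg[:, o - pre:o + post] for o in onsets])

def reject_artifacts(epochs, thresh_uv=70.0):
    """Drop epochs where any channel's peak-to-peak amplitude exceeds the
    threshold (a simplified stand-in for the paper's rejection criteria)."""
    ptp = epochs.max(axis=2) - epochs.min(axis=2)   # trials x channels
    return epochs[ptp.max(axis=1) < thresh_uv]

def baseline_and_average_reference(epochs, fs=FS, base_len_s=0.15):
    """Subtract the mean of the first 150 msec of each epoch per channel,
    then re-reference every sample to the mean across channels."""
    n_base = int(base_len_s * fs)
    out = epochs - epochs[:, :, :n_base].mean(axis=2, keepdims=True)
    return out - out.mean(axis=1, keepdims=True)    # average reference
```

After average referencing, the voltage summed over channels is zero at every time point, which is what makes the resulting values reference-independent.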
Data Analysis
Behavioral D ata
The "ikuma" test stimulus was removed when results were analyzed. Because of its acoustical quality, it was not perceived in the same way as the other test items in the vowel condition: When "ikuma" was the test stimulus, there were 60% errors in the control condition in Japanese subjects. In order to keep a fully factorial design, all "ikm" radical trials in both populations were removed. Trials in which subjects did not respond before the deadline were excluded from the analysis (less than 1% of the trials).

Two separate analyses of variance (ANOVA) were carried out for the percentage of "different" responses and for reaction times, with language (French or Japanese) as the between-subject factor and condition ("deviant" or "control") as the within-subject factor.
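For illustration, the key test in this design, a 2 (language, between-subjects) × 2 (condition, within-subjects) mixed ANOVA, can be computed directly from sums of squares as sketched below. This is a generic textbook decomposition, not the authors' analysis code; it assumes equal group sizes, as in the experiment (12 subjects per language group), and tests the condition main effect and the language × condition interaction against the condition-by-subjects-within-groups error term.

```python
import numpy as np

def mixed_anova_2x2(group_a, group_b):
    """F ratios for a 2 (group, between) x 2 (condition, within) mixed ANOVA.

    Each argument is an (n_subjects, 2) array, one column per condition.
    Returns (F_condition, F_interaction), both with df = (1, N - 2).
    Assumes equal cell sizes, as in the experiment.
    """
    groups = [np.asarray(group_a, float), np.asarray(group_b, float)]
    N = sum(g.shape[0] for g in groups)
    all_data = np.concatenate(groups)
    grand = all_data.mean()

    # Main effect of condition (within-subject factor)
    cond_means = all_data.mean(axis=0)
    ss_cond = N * ((cond_means - grand) ** 2).sum()

    # Interaction: cell variability not explained by the two main effects
    ss_cells = sum(g.shape[0] * ((g.mean(axis=0) - grand) ** 2).sum() for g in groups)
    ss_group = 2 * sum(g.shape[0] * (g.mean() - grand) ** 2 for g in groups)
    ss_inter = ss_cells - ss_group - ss_cond

    # Error term: condition x subjects within groups
    ss_err = sum(
        ((g - g.mean(axis=0) - g.mean(axis=1, keepdims=True) + g.mean()) ** 2).sum()
        for g in groups
    )
    ms_err = ss_err / (N - len(groups))  # df = N - 2
    return ss_cond / ms_err, ss_inter / ms_err
```

With the behavioral data arranged as one (12, 2) array per language group (percentage of "different" responses in the deviant and control columns), the second returned value is the language × condition interaction F reported in the Results.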
Electrophysiological Results
The goal of this experiment was to detect when Japanese phonotactics modify the electrophysiological responses of Japanese as opposed to French subjects. We thus adopted the following strategy to analyze the ERPs. First, the inspection of the time-course of two-dimensional reconstructions of t test values in the comparison of deviant with control trials was used to isolate the time-windows for which significant differences were present in French subjects. Second, we checked if significant effects were also present in Japanese subjects using similar t test value maps. Third, for each selected time-window, two pairs of symmetrical electrodes were chosen, one pair at the maximum of the positivity and the second at the maximum of the negativity of the difference between deviant and control. For each time-window, the average voltage was computed for the pair of electrodes selected and for the two conditions (deviant versus control) and submitted to an analysis of variance (ANOVA) with condition (deviant and control) and hemisphere (left and right) as within-subject factors and language (Japanese or French) as the between-subject factor. Only the analyses done on the trials in which the behavioral responses were the dominant ones for each condition and each population are reported here.3
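The window-averaging step that feeds these ANOVAs can be sketched as follows. The epoch origin, the 237-msec deviance onset (taken from Note 1), and the electrode indices are illustrative assumptions; the actual sensor pairs are those described in the Results.

```python
import numpy as np

FS = 125                  # sampling rate, Hz
EPOCH_START_S = -1.5      # epochs begin 1500 msec before test-item onset
DEVIANCE_ONSET_S = 0.237  # deviance begins ~237 msec after item onset (Note 1)

def window_average(evoked, win_ms, pair, fs=FS):
    """Mean voltage over a postdeviance latency window for one electrode pair.

    evoked : (channels, time) subject-average ERP, epoch starting at -1.5 s.
    win_ms : (start, end) in msec relative to deviance onset, e.g. (139, 283).
    pair   : (left_channel, right_channel) indices; illustrative here.
    Returns one value per hemisphere, i.e. one ANOVA cell per condition
    and subject.
    """
    t0 = DEVIANCE_ONSET_S - EPOCH_START_S + win_ms[0] / 1000.0
    t1 = DEVIANCE_ONSET_S - EPOCH_START_S + win_ms[1] / 1000.0
    i0, i1 = int(round(t0 * fs)), int(round(t1 * fs))
    left, right = pair
    return {"left": float(evoked[left, i0:i1].mean()),
            "right": float(evoked[right, i0:i1].mean())}
```

Applying this once per subject, condition, time-window, and electrode pair produces the condition × hemisphere × language table that is then submitted to the ANOVA described above.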
Acknowledgments
This study was supported by the Ministère Français de la Santé et de la Recherche, PHRC 1995 No. AOM95011, the Groupement d'Intérêt Scientifique Sciences de la Cognition No. PO 9004, the Fondation Evian, and the McDonnell Foundation.

Reprint requests should be sent to Ghislaine Dehaene-Lambertz, Laboratoire de Sciences Cognitives et Psycholinguistique, 54 boulevard Raspail, 75270 Paris cedex 06, France. Phone: (33) 1-49-54-22-62. Fax: (33) 1-45-44-98-35. Email: ghis@lscp.ehess.fr.

The color figures can be found at www.ehess.fr/centres/ilscp/persons/ghis/buzofig.pdf.
Notes
1. The speech synthesizer was programmed to begin the third phoneme at 237 msec after item onset. However, because of coarticulation, differences could already be seen in the sonograms and were perceptible 50 msec earlier. Nevertheless, even when differences were audible, the identity of the third phoneme was not perceptible on average before 239 msec. These values give a bracket for deviance onset.
2. These radicals were selected on the basis of a reanalysis of previous results (Dupoux et al., 1999). We chose from the consonants that had given the most robust effects in the past.
3. The same analyses were also computed for all trials independent of the behavioral response. Because there were few trials for which the behavioral response was different from what was expected (around 9% in Japanese and 5% in French subjects), averaged voltages and statistical results were almost identical to the results presented here. In particular, no early significant difference between control and deviant conditions could be identified in Japanese subjects, and the language × condition interaction was still significant for the central pair (F(1,22) = 7.6, p = .011) for the first time-window.
REFERENCES
Beckman, M. E. (1982). Segmental duration and the ‘mora’’ in
Jap anese. Phonetica, 39, 113135.
Best, C. T., McRoberts, G. W., & Sithole, N. M. (1988). Exam-
i nation of the perceptual reorganization for speechcontrasts:
Zulu click discriminati on by English-speaking adults and
i nfants. Journal of Experimental Psychology: Human
P erception and Performance, 3, 345–360.
Ch urch, K. W. (1987). Phonological parsing and lexical retrie-
val. Cognition, 25, 53–69.
Connol ly, J. F., & Phillips, N. A. (1994) . Event-related potential
components reflect phonological and semantic processing
of the terminal word of spoken sentences. Journal of Cog-
ni tive Neuroscience, 6, 256–266.
Cow an, N. , Winkler, I., Teder, W., & Na¨a¨ta¨nen, R. (1993).
Memory prerequisites of mismatch negativity in the auditory
event-related potential (ERP). Journal of Experimental
P sychology: Learning, Memory and Cognition, 19, 909–921.
Dehaene-Lambertz, G. (1997). Electrophysiological correlates
of categorical phoneme perception in adults. NeuroReport,
8, 919–924.
Dup oux, E. (1993). Prelexical processing: The syllabic hypoth-
esis revisited. In G. T. M. Altmann & R. Shillcock (Eds.),
Cog nitive models of speech processing: The second sper-
l onga meeting (pp. 81–114). Hove East Sussex, UK: Lea &
Febiger.
Dup oux, E. , Kakehi, K., Hirose, Y., Pallier, C., & Mehler, J.
( 1999). Epenthetic vowels in Japanese: A perceptual illusion?
Journal of Experimental Psychology: Human Perception
and Performance, 25, 1568 –1578.
Dutoit, T., Pagel, V., Pierret, N., Bataille, F., & Vreken, O. V. D. (1996). The MBROLA project: Towards a set of high-quality speech synthesizers free of use for non-commercial purposes. Paper presented at ICSLP'96, Philadelphia.
Eimas, P. D., & Corbit, J. D. (1973). Selective adaptation of linguistic feature detectors. Cognitive Psychology, 4, 99–109.
Frazier, L. (1987). Structure in auditory word recognition. Cognition, 25, 157–187.
Fujimura, O. (1976). Syllables as concatenated demisyllables and affixes. Journal of the Acoustical Society of America, 59, S55.
Giard, M. H., Perrin, F., Pernier, J., & Bouchet, P. (1990). Brain generators implicated in the processing of auditory stimulus deviance: A topographic event-related potential study. Psychophysiology, 27, 627–640.
Hallé, P., Segui, J., Frauenfelder, U., & Meunier, C. (1998). Processing of illegal consonant clusters: A case of perceptual assimilation? Journal of Experimental Psychology: Human Perception and Performance, 24, 592–608.
Itô, J., & Mester, A. (1995). Japanese phonology. In J. Goldsmith (Ed.), The handbook of phonological theory (pp. 817–838). Oxford, UK: Blackwell.
Jusczyk, P. W., Friederici, A., Wessels, J., Svenkerud, V., & Jusczyk, A. (1993). Infants' sensitivity to the sound pattern of native language words. Journal of Memory and Language, 32, 402–420.
Jusczyk, P. W., & Luce, P. A. (1994). Infants' sensitivity to phonotactic patterns in the native language. Journal of Memory and Language, 33, 630–645.
Keating, P., & Hoffman, M. (1984). Vowel variation in Japanese. Phonetica, 41, 191–207.
Klatt, D. H. (1979). Speech perception: A model of acoustic phonetic analysis and lexical access. Journal of Phonetics, 7, 279–312.
Kuhl, P. K., Williams, K. A., Lacerda, F., Stevens, K. N., & Lindblom, B. (1992). Linguistic experiences alter phonetic perception in infants by 6 months of age. Science, 255, 606–608.
Marslen-Wilson, W., & Warren, P. (1994). Levels of perceptual representation and process in lexical access: Words, phonemes and features. Psychological Review, 101, 653–675.
Massaro, D. W., & Cohen, M. M. (1983). Phonological constraints in speech perception. Perception and Psychophysics, 34, 338–348.
Massaro, D. W., & Cohen, M. M. (1991). Integration versus interactive activation: The joint influence of stimulus and context in perception. Cognitive Psychology, 23, 558–614.
McClelland, J. L., & Elman, J. L. (1986). The TRACE model of speech perception. Cognitive Psychology, 18, 1–86.
Mehler, J., Dupoux, E., & Segui, J. (1990). Constraining models of lexical access: The onset of word recognition. In G. T. M. Altmann (Ed.), Cognitive models of speech processing: Psycholinguistic and computational perspectives (pp. 236–262). Cambridge: MIT Press.
Näätänen, R. (1990). The role of attention in auditory information processing as revealed by event-related potentials and other brain measures of cognitive function. Behavioral and Brain Sciences, 13, 201–288.
Näätänen, R., Lehtokoski, A., Lennes, M., Cheour, M., Huotilainen, M., Iivonen, A., Vainio, M., Alku, P., Ilmoniemi, R. J., Luuk, A., Allik, J., Sinkkonen, J., & Alho, K. (1997). Language-specific phoneme representations revealed by electric and magnetic brain responses. Nature, 385, 432–434.
Pallier, C. (1994). Rôle de la syllabe dans la perception de la parole: études attentionnelles [The role of the syllable in speech perception: attentional studies]. Thèse de doctorat, École des Hautes Études en Sciences Sociales, Paris [available from the author].
Pallier, C., Bosch, L., & Sebastian, N. (1997). A limit on behavioral plasticity in speech perception. Cognition, 64, B9–B17.
Pallier, C., & Dupoux, E. (1997). EXPE: An expandable programming language for on-line psychological experiments. Behavior Research Methods, Instruments, and Computers, 29, 322–327.
Pallier, C., Sebastián-Gallés, N., Felguera, T., Christophe, A., & Mehler, J. (1993). Attentional allocation within the syllabic structure of spoken words. Journal of Memory and Language, 32, 373–389.
Perez-Abalo, M. C., Rodriguez, R., Bobes, M. A., Gutierrez, J., & Valdes-Sosa, M. (1994). Brain potentials and the availability of semantic and phonological codes over time. NeuroReport, 5, 2173–2177.
Perrin, F., Pernier, J., Bertrand, O., & Echallier, J. F. (1989). Spherical splines for scalp potential and current density mapping. Electroencephalography and Clinical Neurophysiology, 72, 184–187.
Pitt, M. (1998). Phonological processes and the perception of phonotactically illegal consonant clusters. Perception and Psychophysics, 60, 941–951.
Polka, L., & Werker, J. F. (1994). Developmental changes in perception of non-native vowel contrasts. Journal of Experimental Psychology: Human Perception and Performance, 20, 421–435.
Praamstra, P., Meyer, A. S., & Levelt, W. J. M. (1994). Neurophysiological manifestations of phonological processing: Latency variation of a negative ERP component timelocked to phonological mismatch. Journal of Cognitive Neuroscience, 6, 204–219.
Tucker, D. (1993). Spatial sampling of head electrical fields: The geodesic electrode net. Electroencephalography and Clinical Neurophysiology, 87, 154–163.
Werker, J. F., & Tees, R. C. (1984). Cross-language speech perception: Evidence for perceptual reorganization during the first year of life. Infant Behavior and Development, 7, 49–63.
Wickelgren, W. A. (1969). Context-sensitive coding, associative memory, and serial order in (speech) behavior. Psychological Review, 76, 1–15.
Dehaene-Lambertz et al. 647
Figure 1. [ERP waveforms (−400 to 800 ms; µV scale) to control and deviant trials in French and Japanese participants, with t-test scalp maps (±2.5 µV; p thresholds .05, .01, .001).]
Figure 2. [Same layout as Figure 1: control vs. deviant ERPs and t-test scalp maps for French and Japanese participants.]
Figure 3. [Same layout as Figure 1: control vs. deviant ERPs and t-test scalp maps for French and Japanese participants.]
... Lastly, such an extension is consistent with the latest neuropsychological research, which shows that human speech perception involves information integration on different timescales: a short timescale for processing phonemesized representations and a long timescale for processing syllable-sized representations (Hickok, 2014;Hickok & Poeppel, 2004. In other words, speech perception is not strictly phoneme-based, and indeed, it has been argued elsewhere that syllables, instead of single phonemes, are basic units in speech processing (e.g., Dehaene-Lambertz et al., 2000;Massaro, 1974;Massaro & Cohen, 1983;Strange, 2011). ...
... This assumption is consistent with PAM/PAM-L2 (Best, 1995;Best & Tyler, 2007) and Articulatory Phonology (Browman & Goldstein, 1986, 1992, because the gestural scores for C and V tend to be in an in-phase coupling relation, i.e., the onset consonant and the vowel can have very similar initiation times, and listeners can therefore process C and V simultaneously. Potentially, nonnative (L2) speech perception involves not only processing speech input at the phonemic level but also involves "coarse-coding" mechanisms (Dehaene-Lambertz et al., 2000), i.e., the input signal is directly parsed into large processing units such as diphones, triphones, or syllables. Furthermore, the dual-stream neuropsychological model of speech perception (Hickok & Poeppel, 2004 also suggests that speech processing requires information integration on different timescales: a short processing window (20-50 ms) is required for processing phoneme-level representations, while a long window (150-300 ms) is used for processing syllable-level representations, and optimal processing occurs when phonemic and syllabic representations are consistently integrated. ...
... Therefore, CV syllables can be seen as complex constellations or may be regarded as gestural compounds (i.e., combinations of molecules) in nonnative speech perception, and accordingly, the assimilation principles can be extended from single phonemes to phonemic sequences for making predictions and explanations. Such an extension also echoes the call that syllables but not single phonemes should be treated as the basic processing units in speech perception (e.g., Dehaene-Lambertz et al., 2000;Dupoux, 1993;Massaro, 1974;Strange, 2011). At the same time, this extension is still compatible with a phoneme-based approach (especially when phonotactics and contextual effects are not of interest), because phonemes form constituents of syllables in the prosody hierarchy, i.e., 98 phoneme substitution occurs (e.g., /iː/ → /ei/) and thus an unattested syllable is assimilated to an attested syllable (e.g., */wiː/ → /wei/). ...
Thesis
Full-text available
This thesis investigates how native language (L1) segmental phonology and phonotactics interfere with the perception of phonological structures in a nonnative or second language (L2). Previous research has shown that nonnative listeners at times modify the phonological structures of an L2 based on the regularities of their L1 phonology, and that they sometimes experience perceptual difficulties in distinguishing L2 contrasts. The present thesis presents a total of five case studies focusing on different kinds of L1-L2 phonological mismatch, all of which trigger different corresponding modification strategies, including neutralisation (contrastive phonemes become non-contrastive), substitution (replacing a target phoneme with a different segment), epenthesis (perceiving an illusory segment when there is no target), and deletion (failing to perceive a target segment). These perceptual modification strategies are investigated through a series of psycholinguistic experiments relying on established methods in the field (e.g., categorisation, identification, and discrimination), and newer methods such as mouse tracking for triangulating on the cognitive processes involved in perceptual modification strategies. The thesis also explores whether extensive experience with the target language and especially the expansion of the L2 vocabulary, is predictive of L2 listeners’ ability to accurately perceive novel phonotactic structures. The findings of the present thesis have strong implications for extending the prevalent theories of nonnative speech perception, including the Perceptual Assimilation Model (PAM, and its extension, PAM-L2; Best, 1995; Best & Tyler, 2007), the Vocabulary Model of Rephonologisation (Vocab; Bundgaard-Nielsen et al., 2011a, 2011b, 2012), and the Automatic Selective Perception Model (ASP; Strange & Shafer, 2008; Strange, 2011). 
On the basis of the experiments conducted, this thesis argues that while perceptual assimilation is the fundamental mechanism for understanding nonnative (L2) segmental perception, the current frameworks must be extended to also allow cross-language category mapping for more complex phonological structures (i.e., phonemic sequences) in order to understand how L1 phonotactic expectations interfere with segmental perception in L2 speech perception.
... For example, the phonological sequence /pt/ never occurs in word onset position in English (Vitevitch and Luce, 2004), but it does occur in this position in other languages such as Polish (e.g., the common Polish word "ptak" means "bird"). Cross-linguistic studies have demonstrated that cortical processing is modulated by experience with phonotactic patterns (Dehaene-Lambertz et al., 2000;Wagner et al., 2012Wagner et al., , 2013Wagner et al., , 2022 and these modulatory influences are established during childhood (Jusczyk et al., 1993;Ortiz-Mantilla et al., 2016;Werker and Hensch, 2015). Sensory processing that is selective to native-language phonotactic patterns is streamlined and may facilitate language comprehension (Hisagi et al., 2010;Noordenbos and Serniclaes, 2015;Strange, 2011;Wagner et al., 2012). ...
... The pattern of results in the current study adds to a literature that has demonstrated an effect of experience with speech sound sequences and their associated stress pattern on sensory processing for perception (Dehaene-Lambertz et al., 2000;Dupoux et al., 1999;Li et al., 2021;Wagner et al., 2013Wagner et al., , 2017Wagner et al., , 2022. Within the context of a hierarchical coordinated system of speech perception (Ghitza et al., 2013;Giraud and Poeppel, 2012), bidirectional signaling transmitted between cognitive-linguistic networks and the left hemisphere STG through the alpha and beta bands may modulate sensory processing of spoken nonword forms for perception (Bhaya-Grossman and Chang, 2022;Hamilton et al., 2021). ...
Article
Full-text available
The phonotactic patterns of one's native language are established within cortical network processing during development. Sensory processing of native language phonotactic patterns established in memory may be modulated by top-down signals within the alpha and beta frequency bands. To explore sensory processing of phonotactic patterns in the alpha and beta frequency bands, electroencephalograms (EEGs) were recorded from native Polish and native English-speaking adults as they listened to spoken nonwords within same and different nonword pairs. The nonwords contained three phonological sequence onsets that occur in the Polish and English languages (/pət/,/st/,/sət/) and one onset sequence/pt/, which occurs in Polish but not in English onsets. Source localization modeling was used to transform 64-channel EEGs into brain source-level channels. Spectral power values in the low frequencies (2-29 Hz) were analyzed in response to the first nonword in nonword pairs within the context of counterbalanced listening-task conditions, which were presented on separate testing days. For the with-task listening condition, participants performed a behavioral task to the second nonword in the pairs. For the without-task condition participants were only instructed to listen to the stimuli. Thus, in the with-task condition, the first nonword served as a cue for the second nonword, the target stimulus. The results revealed decreased spectral power in the beta frequency band for the with-task condition compared to the without-task condition in response to native language phonotactic patterns. In contrast, the task-related suppression effects in response to the non-native phonotactic pattern/pt/for the English listeners extended into the alpha frequency band. These effects were localized to source channels in left auditory cortex, the left anterior temporal cortex and the occipital pole. 
This exploratory study revealed a pattern of results that, if replicated, suggests that native language speech perception is supported by modulations in the alpha and beta frequency bands.
... However, they must develop other abilities to reach adult levels of phonological perception. Within the first 2 months of life, normal infants show a precognitive detection of syllable length (33); at 4 months they begin to distinguish between tones and syllables (34). At 6 months old, they establish prototypes of vowels in their native language (31,32), and around 10 months old, infants have prototypes of consonants (35). ...
... Development of phonemic perception requires infants to detect differences in acoustic features and phonological categories, leading to the use of experimental auditory oddball paradigms in which stimuli including differences between acoustic features, frequency or phonological categories are especially useful in assessing brain electrical activity associated with the acquisition of language (32)(33)(34)(58)(59)(60)(61). From studies in children and adults, the expectation is that the amplitude of P1-N1-P2-N2 complex will be greater for uncommon than common repetitive stimuli (50), due to the fact that neuronal responses habituate to repeated presentation of the same stimulus, while a new, unusual stimulus will produce a large amplitude response (30,62). ...
Article
Full-text available
Introduction Infancy is a stage characterized by multiple brain and cognitive changes. In a short time, infants must consolidate a new brain network and develop two important properties for speech comprehension: phonemic normalization and categorical perception. Recent studies have described diet as an essential factor in normal language development, reporting that breastfed infants show an earlier brain maturity and thus a faster cognitive development. Few studies have described a long-term effect of diet on phonological perception. Methods To explore that effect, we compared the event-related potentials (ERPs) collected during an oddball paradigm (frequent /pa/80%, deviant/ba/20%) of infants fed with breast milk (BF), cow-milk-based formula (MF), and soy-based formula (SF), which were assessed at 3, 6, 9, 12, and 24 months of age [Mean across all age groups: 127 BF infants, Mean (M) 39.6 gestation weeks; 121 MF infants, M = 39.16 gestation weeks; 116 SF infants, M = 39.16 gestation weeks]. Results Behavioral differences between dietary groups in acoustic comprehension were observed at 24-months of age. The BF group displayed greater scores than the MF and SF groups. In phonological discrimination task, the ERPs analyses showed that SF group had an electrophysiological pattern associated with difficulties in phonological-stimulus awareness [mismatch negativity (MMN)-2 latency in frontal left regions of interest (ROI) and longer MMN-2 latency in temporal right ROI] and less brain maturity than BF and MF groups. The SF group displayed more right-lateralized brain recruitment in phonological processing at 12-months old. Discussion We conclude that using soy-based formula in a prolonged and frequent manner might trigger a language development different from that observed in the BF or MF groups. The soy-based formula’s composition might affect frontal left-brain area development, which is a nodal brain region in phonological-stimuli awareness.
... This assumption is consistent with PAM/PAM-L2 (Best, 1995;Best & Tyler, 2007) and Articulatory Phonology (Browman & Goldstein, 1986, 1992, because the gestural scores for C and V tend to be in an in-phase coupling relation, i.e., the onset consonant and the vowel can have very similar initiation times, and listeners can therefore process C and V simultaneously. Potentially, nonnative (L2) speech perception involves not only processing speech input at the phonemic level but also involves coarse-coding mechanisms (Dehaene-Lambertz et al., 2000), i.e., the input signal is directly parsed into large processing units such as diphones, triphones, or syllables. Furthermore, the dual-stream neuropsychological model of speech perception (Hickok & Poeppel, 2004, 2007 also suggests that speech processing requires information integration on different timescales: a short processing window (20-50 ms) is required for processing phoneme-level representations, while a long window (150-300 ms) is used for Licit-licit Licit-illicit Licit-illicit Phonological type processing syllable-level representations, and optimal processing occurs when phonemic and syllabic representations are consistently integrated. ...
... Therefore, CV syllables can be seen as complex constellations or may be regarded as gestural compounds (i.e., combinations of molecules) in nonnative speech perception, and accordingly, the assimilation principles can be extended from single phonemes to phonemic sequences for making predictions and explanations. Such an extension also echoes the call that syllables but not single phonemes should be regarded as the basic processing units in speech perception (e.g., Dehaene-Lambertz et al., 2000;Dupoux, 1993;Massaro, 1974;Strange, 2011). At the same time, this extension is still compatible with a phoneme-based approach (especially when phonotactics and contextual effects are not of interest) because phonemes form constituents of syllables in the prosody hierarchy, i.e., phoneme substitution occurs (e.g., /iː/ → /ei/) and thus an unattested syllable is assimilated to an attested syllable (e.g., */wiː/ → /wei/). ...
Article
Full-text available
The study presented here examines how adult L2 listeners' L1 phonotactics interfere with L2 vowel perception in different consonantal contexts. We examined Mandarin listeners' perception of the English /ei/-/iː/ vowel contrast in three onset consonantal contexts, /p f w/, which represent different phonotactic scenarios with respect to the permissibility of Mandarin phonology. L1 Mandarin listeners (N = 42) completed a series of three tasks: a categorisation task, a vowel identification task, and an AXB discrimination task. The results show that English /ei/-/iː/ are perceived as highly contrastive in the /p/ context because both /pei/ and /piː/ constitute a licit sequence in Mandarin phonology. However, participants experience substantial /ei/-/iː/ category confusions in the /f/ and /w/ contexts, where Mandarin listeners repair perceptually by modifying the vowel quality in illicit (unattested) consonant-vowel sequences, i.e., */fiː/ → /fei/ and */wiː/ → /wei/. Further exploratory analyses indicate that L2 listeners' vowel perception in unfamiliar phonotactic contexts is associated with their target language experience, typically indicated by their L2 vocabulary size. The findings thus suggest that the acquisition of novel phonotactic regularities is tied to increased experience with the L2 lexicon.
... As a result, the percept is no longer faithful to the L2 input but commensurate with the listener's L1 expectations (Kilpatrick et al. 2020). For this reason, nonnative and L2 listeners often fail to discriminate cluster sequences from the same sequences in which the expected epenthetic vowel is actually present (e.g., /ebzo/-/ebuzo/) at both the behavioural and neurophysiological level (Dehaene-Lambertz et al. 2000;Wagner et al. 2012). The inverse process to epenthesis is perceptual deletion, i.e., X → Ø, where a phonotactically unattested sequence is repaired by eliding a segment in the sequence for rendering a perceptual representation which complies with L1 phonological regularities. ...
Article
Nonnative or second language (L2) perception of segmental sequences is often characterised by perceptual modification processes, which may "repair" a nonnative sequence that is phonotactically illegal in the listeners' native language (L1) by transforming the sequence into a sequence that is phonotactically legal in the L1. Often repairs involve the insertion of phonetic materials (epenthesis), but we focus, here, on the less-studied phenomenon of perceptual deletion of nonna-tive phonemes by testing L1 Mandarin listeners' perception of post-vocalic laterals in L2 English using the triangulating methods of a cross-language goodness rating task, an AXB task, and an AX task. The data were analysed in the framework of the Perceptual Assimilation Model (PAM/PAM-L2), and we further investigated the role of L2 vocabulary size on task performance. The experiments indicate that perceptual deletion occurs when the post-vocalic lateral overlaps with the nucleus vowel in terms of tongue backness specification. In addition, Mandarin listeners' discrimination performance in some contexts was significantly correlated with their English vocabulary size, indicating that continuous growth of vocabulary knowledge can drive perceptual learning of novel L2 segmental sequences and phonotactic structures.
Article
Full-text available
Cognitive reserve is the ability to actively cope with brain deterioration and delay cognitive decline in neurodegenerative diseases. It operates by optimizing performance through differential recruitment of brain networks or alternative cognitive strategies. We investigated cognitive reserve using Huntington’s disease (HD) as a genetic model of neurodegeneration to compare premanifest HD, manifest HD, and controls. Contrary to manifest HD, premanifest HD behave as controls despite neurodegeneration. By decomposing the cognitive processes underlying decision making, drift diffusion models revealed a response profile that differs progressively from controls to premanifest and manifest HD. Here, we show that cognitive reserve in premanifest HD is supported by an increased rate of evidence accumulation compensating for the abnormal increase in the amount of evidence needed to make a decision. This higher rate is associated with left superior parietal and hippocampal hypertrophy, and exhibits a bell shape over the course of disease progression, characteristic of compensation.
Article
In this EEG study, we examined the ability of French listeners to perceive and use the position of stress in a discrimination task. Event-Related-Potentials (ERPs) were recorded while participants performed a same-different task. Different stimuli diverged either in one phoneme (e.g., /ʒy'ʁi/-/ʒy'ʁɔ̃/) or in stress position (e.g., /ʒy'ʁi/-/'ʒyʁi/). Although participants reached 93% of correct responses, ERP results indicated that a change in stress position was not detected while a change in one phoneme elicited a MisMatchNegativity (MMN) response. It results that in the early moments of speech processing, stimuli that are phonemically identical but that differ in stress position are perceived as being strictly similar. We concluded that the good performance observed in behavioral responses on stress position contrasts are due to attentional/decisional processes linked to discrimination tasks, and not to automatic and unconscious processes involved in stress position processing.
Preprint
Full-text available
Cognitive reserve is the ability to actively cope with brain deterioration and delay cognitive decline in neurodegenerative diseases. We combined computational modelling (drift diffusion models, DDMs) and neuroanatomical analysis using Huntington’s disease (HD) as a genetic model of neurodegenerative disease to study compensation in premanifest mutation carriers (preHDs). Twenty preHDs, 28 early-stage HD patients (earlyHDs), and 45 controls performed a discrimination task. We used DDMs to investigate underlying cognitive performances and explored the relationship with neuroanatomical substrates. Compared with controls, earlyHDs performed less and preHDs performed similarly. DDMs showed a progressive increase in the amount of evidence needed to take a decision from controls to preHDs and earlyHDs. This increase in response threshold predicted an increase in the rate of evidence accumulation. In preHDs, the higher rate was associated with left parietal and hippocampal hypertrophy, and showed an inversed U-shaped pattern over the course of disease progress, characteristic of compensation.
Chapter
Neural oscillations have emerged as a paradigm of reference for EEG and MEG research. In this chapter, we highlight some the possibilities and limits of modelling the dynamics of complex stimulus perception as being shaped by internal oscillators. The reader is introduced to the main physiological tenets underpinning the use of neural oscillations in cognitive neuroscience. The concepts of entrainment and neural tracking are illustrated with particular reference to speech and language processes.Key wordsNeural oscillationsNeural entrainmentCortical trackingSynchronySpeechLanguage
Chapter
The chapter examines the impact of experience-dependent factors (cross-linguistic similarities between the first and second languages, age of acquisition, proficiency, and quality of language exposure) on second-language phonological, semantic, and syntactic processing. Event-related potential (ERP) studies on second-language analysis are examined and summarized for each level of sentence analysis and each experiential factor. The overview provided here points to a largely qualitative distinction between experience-dependent effects observed in phonology/syntax and those reported in semantics, with experience having a stronger impact on the first two domains. The chapter also highlights novel research directions to be pursued and invites to reflection on methodological choices made in bilingual research literature.Key wordsBilingualismERPSyntaxSemanticsPhonologyL1–L2 similarityAoAProficiencyImmersion
Article
Full-text available
The language environment modifies the speech perception abilities found in early development. In particular, adults have difficulty perceiving many nonnative contrasts that young infants discriminate. The underlying perceptual reorganization apparently occurs by 10–12 months. According to one view, it depends on experiential effects on psychoacoustic mechanisms. Alternatively, phonological development has been held responsible, with perception influenced by whether the nonnative sounds occur allophonically in the native language. We hypothesized that a phonemic process appears around 10–12 months that assimilates speech sounds to native categories whenever possible; otherwise, they are perceived in auditory or phonetic (articulatory) terms. We tested this with English-speaking listeners by using Zulu click contrasts. Adults discriminated the click contrasts; performance on the most difficult (80% correct) was not diminished even when the most obvious acoustic difference was eliminated. Infants showed good discrimination of the acoustically modified contrast even by 12–24 months. Together with earlier reports of developmental change in perception of nonnative contrasts, these findings support a phonological explanation of language-specific reorganization in speech perception. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Full-text available
Defines the problem of serial order in noncreative behavior in much the same manner as K. S. Lashley, and examines several theories of serial order. Lashley's rejection of associative-chain theories of serial order is shown to apply to 1 particular theory, and to be invalid as applied to other associative theories. The most plausible theory is the context-sensitive associative theory, which assumes that serial order is encoded by means of associations between context-sensitive elementary motor responses. In speech, this means that a word, e.g., sleep, is assumed to be coded allophonically rather than phonemically. This theory handles the pronunciation of single words and even phrases in a certain sense. (19 ref.) (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Full-text available
Abstract An event-related brain potential (ERP) reflecting the acoustic-phonetic process in the phonological stage of word processing was recorded to the terminal words of spoken sentences. The peak latency of this negative-going response occurred between 270 and 300 msec after the onset of the terminal word. The independence of this response (the phonological mismatch negativity, PMN) from the ERP component known to be sensitive to semantic violations (N400) was demonstrated by manipulating sentence endings so that phonemic and semantic violations occurred together or separately. Four conditions used sentences that ended with (1) the highest Cloze probability word (e.g., "The piano was out of tune."), (2) a word having the same initial phoneme of the highest Cloze probability word but that was, in fact, semantically anomalous (e.g., "The gambler had a streak of bad luggage."), (3) a word having an initial phoneme different from that of the highest Cloze probability word but that was, in fact, semantically appropriate (e.g., "Don caught the ball with his glove."), or (4) a word that was semantically anomalous and, therefore, had an initial phoneme that was totally unexpected given the sentence's context (e.g., "The dog chased our cat up the queen"). Neither the PMN nor the N400 was found in the first condition. Only an N400 was observed in the second condition while only a PMN was seen in the third. Both responses were elicited in the last condition. Finally, a delayed N400 occurred to semantic violations in the second condition where the initial phoneme was identical to that of the highest Cloze probability ending. Results are discussed with regard to the Cohort model of word recognition.
Article
An important question in speech perception research is whether or not listeners have information about the degree to which an acoustic feature is present in speech. Evidence from traditional experimental studies of categorical perception suggests a negative answer for some speech sounds. In the present experiments, listeners were asked for continuous rather than discrete judgments in order to provide a more direct answer to this question. Subjects were asked to rate speech sounds according to where they fell on a particular speech continuum. The continua consisted of stop consonants varying in place (/ba/ to /da/) or voicing (/ba/ to /pa/) or a vowel continuum varying from /i/ to /I/. The rating responses of individual subjects were used to test quantitative models of “categorical” and “continuous” perception of acoustic features in speech. [Work supported by NIMH.]
Article
Previous work in which we compared English infants, English adults, and Hindi adults on their ability to discriminate two pairs of Hindi (non-English) speech contrasts has indicated that infants discriminate speech sounds according to phonetic category without prior specific language experience (Werker, Gilbert, Humphrey, & Tees, 1981), whereas adults and children as young as age 4 (Werker & Tees, in press), may lose this ability as a function of age and or linguistic experience. The present work was designed to (a) determine the generalizability of such a decline by comparing adult English, adult Salish, and English infant subjects on their perception of a new non-English (Salish) speech contrast, and (b) delineate the time course of the developmental decline in this ability. The results of these experiments replicate our original findings by showing that infants can discriminate non-native speech contrasts without relevant experience, and that there is a decline in this ability during ontogeny. Furthermore, data from both cross-sectional and longitudinal studies shows that this decline occurs within the first year of life, and that it is a function of specific language experience. © 2002 Published by Elsevier Science Inc.
An experiment was conducted to test two predictions entailed by the hypothesis that Japanese morae have constant durations. The first prediction is that a segment’s duration will conform to its moraic status in an utterance; a consonant will be longer when it is syllabic than when it occurs in a CV mora, and phonemically long consonants will have durations that reflect their moraic structure. The second prediction is that segment lengths will vary to compensate for the intrinsic durations of adjacent segments in order to equalize mora lengths. Neither of these predictions was borne out.
The role of the syllable during on-line speech perception was explored using a variant of the phoneme detection task developed by Pitt and Samuel (1990). In their task, listeners’ attention to phonemes in different serial positions inside word or nonword stimuli was manipulated by varying the probability that a target phoneme occurred in the various positions. In our experiments, French and Spanish subjects had to detect targets that appeared either in the coda of the first syllable or in the onset of the second syllable of carrier words. Subjects’ expectations about the structural position of the target were manipulated. In a series of five experiments (two using a decision paradigm and three using a detection paradigm), these expectations were shown to influence response latencies: that is, subjects who attended to the coda of the first syllable were faster when the target appeared in this position rather than in the onset of the second syllable; the reverse pattern was observed when subjects attended to the onset of the second syllable. This result held regardless of the serial position of the target. These results were equally valid for French and Spanish. Moreover, syllabification was present when Spanish pseudowords were used as carriers. The fact that subjects could focus their attention on a syllabically defined position, even when processing nonwords, suggests that syllabic information is specified at a prelexical level of representation. The phoneme detection task in which attention is manipulated provides us with an interesting new technique for exploring prelexical representations.
Two experiments examined phonological priming effects on reaction times, error rates, and event-related brain potential (ERP) measures in an auditory lexical decision task. In Experiment 1 related prime-target pairs rhymed, and in Experiment 2 they alliterated (i.e., shared the consonantal onset and vowel). Event-related potentials were recorded in a delayed response task. Reaction times and error rates were obtained both for the delayed and an immediate response task. The behavioral data of Experiment 1 provided evidence for phonological facilitation of word, but not of nonword decisions. The brain potentials were more negative to unrelated than to rhyming word-word pairs between 450 and 700 msec after target onset. This negative enhancement was not present for word-nonword pairs. Thus, the ERP results match the behavioral data. The behavioral data of Experiment 2 provided no evidence for phonological facilitation. However, between 250 and 450 msec after target onset, i.e., considerably earlier than in Experiment 1, brain potentials were more negative for unrelated than for alliterating word-word and word-nonword pairs. It is argued that the ERP effects in the two experiments could be modulations of the same underlying component, possibly the N400. The difference in the timing of the effects is likely to be due to the fact that the shared segments in related stimulus pairs appeared in different word positions in the two experiments.