
PRIMIR: A Developmental Framework of Infant Speech Processing

Janet F. Werker
Department of Psychology
University of British Columbia
Suzanne Curtin
Departments of Linguistics and Psychology
University of Pittsburgh
Over the past few years, there has been an increasing emphasis on studying the link
between infant speech perception and later language acquisition. This research has
yielded some seemingly contradictory findings: In some studies infants appear to use
phonetic and indexical detail that they fail to use in other studies. In this article we
present a new, unified framework for accounting for these divergent findings.
PRIMIR (a developmental framework for Processing Rich Information from Multi-
dimensional Interactive Representations) assumes there is rich information available
in the speech input and that the child picks up and organizes this information along a
number of multidimensional interactive planes. Use of this rich information depends
on the joint activity of 3 dynamic filters. These filters—the initial biases, the develop-
mental level of the child, and requirements of the specific language task the child is
facing—work together to differentially direct attention to 1 (or more) plane. In this
article we outline the contradictory data that need to be explained, elucidate PRIMIR,
including its underlying assumptions and overall architecture, and compare it to ex-
isting frameworks. We conclude by presenting core predictions of PRIMIR.
LANGUAGE LEARNING AND DEVELOPMENT, 1(2), 197–234
Copyright © 2005, Lawrence Erlbaum Associates, Inc.
Requests for reprints should be sent to Janet F. Werker, The University of British Columbia,
Department of Psychology, 2136 West Mall, Vancouver, BC V6T 1Z4. E-mail: jwerker@psych.ubc.ca
Research on speech perception and word learning in infancy has produced a number
of divergent findings that are difficult to reconcile within any of the existing models.
In illustration of the complexity of the data generated, it has been known for some
time that infants show categorical perception and treat variable instances of the same
phonetic category as equivalent (for reviews, see Jusczyk, 1987; Werker, 1995).
They are able to segment words from the speech stream (Jusczyk & Aslin, 1995),
recognize familiar words (Halle & de Boysson-Bardies, 1996), and recognize famil-
iar voices (DeCasper & Fifer, 1980). Across the first year of life, these perceptual
processes become language-specific, with infants paying particular attention to
those values that are important in their native language (Werker & Tees, 1984).
At the same time that they are showing attention to category and word-level
properties, infants also detect within-phonetic-category differences (Eimas &
Miller, 1992; Kuhl, 1983a; McMurray & Aslin, 2005), show attention to contex-
tual effects such as speaking rate (Eimas & Miller, 1992), utilize subcategorical in-
formation such as coarticulatory cues (Curtin, Mintz, & Byrd, 2001), and utilize
stress (Curtin, Mintz, & Christiansen, in press; Johnson & Jusczyk, 2001; Mattys,
Jusczyk, Luce, & Morgan, 1999; Thiessen & Saffran, 2003). In counting tasks, in-
fants utilize syllables and not individual phonetic segments (Bijeljac-Babic,
Bertoncini, & Mehler, 1993), but in other tasks, same aged infants can and do ac-
cess phonetic and other subsyllabic information. This is seen in preference
(Jusczyk, Goodman, & Baumann, 1999), discrimination (Dupoux & Peperkamp,
2002; J. L. Miller & Eimas, 1996), and segmentation (Christophe, Dupoux,
Bertoncini, & Mehler, 1994; Jusczyk, Hohne, & Bauman, 1999) tasks. Moreover,
when encoding word forms, young infants remember not only the word, but also
specific indexical properties such as speaker (Houston & Jusczyk, 2000), stress,
amplitude, and affect (Singh, Bortfeld, & Morgan, 2002). When they first start ac-
quiring grammar (Naigles, 2002) and mapping sound onto meaning, infants some-
times ignore the phonetic differences they can perceive in simple discrimination
tasks (Stager & Werker, 1997). Conversely, when they first start pulling out word
forms (devoid of meaning) infants will sometimes fail to recognize a previously fa-
miliarized word if it is presented in a new voice (Houston & Jusczyk, 2003). Only
as they advance in lexical knowledge do infants learn that it is the phonetic quality
that specifies identity in the word and that indexical features such as individual
voice can be ignored in some tasks. Essentially, the same speech input can be per-
ceived categorically, gradiently, or ignored.
To account for these results, we propose PRIMIR, a developmental framework
for Processing Rich Information from Multidimensional Interactive Representa-
tions encoded from the speech signal. Processing in PRIMIR relies on three dy-
namic filters: the initial biases, the developmental level of the child, and the re-
quirements of the specific language task the child is facing. These filters enhance
or diminish the raw physical saliency (acoustic, phonetic, gestural, visual, etc.) of
the information in the signal. Representations in PRIMIR rest on the fact that there
is rich information available in the speech input and that the system picks up and
organizes this information along a number of different dimensions. The simulta-
neous representation of multiple dimensions of the signal results in a number of
emergent planes. The planes are not “given” at birth. Rather, they are the joint
product of the biases the infant brings, the regularities in the input, and statistical
learning. These three forces coalesce to help ensure that only linguistically possi-
ble combinations are learned. The planes allow for utilization of different informa-
tion for different language tasks, with some types of information more easily ac-
cessible at different times in development. However, as will become apparent,
neither the processing aspects of PRIMIR nor the representational aspects can be
understood without the other. We argue that it is only this type of framework that
can account for the otherwise incommensurate findings in infant speech process-
ing. Although not yet a formal model, this conceptual framework serves as a
springboard for investigating the conditions under which infants use some rather
than other of the rich information in the speech signal and serves as a foundation
for better understanding how a phonological system is constructed.
SPEECH PERCEPTION IN INFANCY
The signal is rich with information. Infants are born with a number of biases that
selectively direct their attention to some aspects of the signal over others. Infants
have a propensity to listen to speech over nonspeech from birth (Vouloumanos &
Werker, 2004) and continue to show this preference over the next several months
(Vouloumanos & Werker, 2004). Moreover, from the first days of life, speech activates
specialized areas in the brain (Dehaene-Lambertz, Dehaene, & Hertz-Pannier,
2002; Dehaene-Lambertz & Peña, 2001; Peña et al., 2003) and facilitates
processing of rhythmical patterns (Nazzi, Bertoncini, & Mehler, 1998; Ramus,
Hauser, Miller, Morris, & Mehler, 2000). Infants discriminate languages from dif-
ferent rhythmical classes at birth (Mehler et al., 1988). By 5 months old infants can
even discriminate their native language from unfamiliar languages within the same
rhythmical class (e.g., English infants can discriminate English from German or
British from American English; Nazzi, Jusczyk, & Johnson, 2000). Bilin-
gual-learning Spanish–Catalan infants can discriminate their two, rhythmically
similar, native languages at 4 months of age (Bosch & Sebastián-Gallés, 1997).
At birth, infants perceive consonant differences categorically (Bertoncini,
Bijeljac-Babic, Blumstein, & Mehler, 1987), especially if they are in well-formed
syllables (Bertoncini & Mehler, 1981). Newborns show a preference for listening
to the point vowels and use point vowels as reference points in discrimination tasks
(see Polka & Bohn, 2003, for a review). They also show a preference for infant-
directed speech (Cooper & Aslin, 1990).
PHONETIC AND INDEXICAL INFORMATION
Some of the rich information in the signal directly contributes to phonetic process-
ing, whereas some is implicated in indexical processing. It is difficult, perhaps im-
possible, to always make a clear-cut distinction between these two, but tradition-
ally, distinctions have been made.
Phonetic Information
Phonetic categories are classically understood to be bundles of acoustic or
articulatory features. Infants perceive consonant differences categorically. The classic
demonstration from Eimas, Siqueland, Jusczyk, and Vigorito (1971) showed that
infants 1 to 4 months old discriminate a 20 msec difference in voice onset time (VOT;
the interval between the release of a consonant and the onset of voicing) in stimuli
from two sides of an adult phonemic category boundary (e.g., /ba/ from /pa/) but not
two equally distinct stimuli from within the /ba/ or within the /pa/ categories (see also
Aslin, Pisoni, Hennessy, & Perey, 1981; Eimas et al., 1971). Categorical perception
was subsequently demonstrated across a number of consonant dimensions and con-
trasts (see Jusczyk, 1981, for a review). Moreover, event-related potential studies in-
dicate greater left-hemisphere involvement in the discrimination of consonants
(Dehaene-Lambertz & Baillet, 1998; Dehaene-Lambertz & Gliga, in prep).
Vowels presented in isolation are perceived more continuously than are con-
sonants, and vowels presented in a consonant (C)–vowel (V)–consonant (C) con-
text are perceived more categorically than are vowels presented in isolation
(Swoboda, Kass, Morse, & Leavitt, 1978). Although vowels in isolation are not
perceived categorically, there is an internal organization to vowel categories.
This is seen in studies with adults revealing a prototype organization with
agreed-upon “best” exemplars (by speakers of a language group; e.g., Kuhl,
1991) and by studies with infants showing asymmetries in perception with dis-
crimination of central versus peripheral instances of the category different from
discrimination of peripheral versus central (Kuhl, Williams, Lacerda, Stevens, &
Lindblom, 1992). The organization of vowel categories allows infants to treat
vowels as equivalent when spoken by different talkers at different ages and dif-
ferent genders (Kuhl, 1983a) even though the infants show clear evidence of dis-
criminating the voices of these talkers.
There is a change across the first year of life from sensitivity to both lan-
guage-general and language-specific phonetic differences among consonants to an
exclusive focus on language-specific phonetic detail (Werker & Tees, 1984). At 6
to 8 months of age English infants discriminate both English bilabial and alveolar
place distinctions (/ba/-/da/) and Hindi retroflex and dental place distinctions (/Ta/
-/ta/; Werker, Gilbert, Humphrey, & Tees, 1981), whereas by 10 to 12 months Eng-
lish infants act like English adults and no longer discriminate the cues distinguish-
ing the two different Hindi “t” sounds (Werker & Tees, 1984). Of interest, discrim-
ination of non-native contrasts can be detected even through adulthood. When
tested in tasks with a low memory load, such as a shortened interstimulus interval
(ISI), adults show continued sensitivity even to the most difficult non-native con-
trast (Werker & Logan, 1985).
Selective perception of language-specific information in vowels is evident at an
even earlier age (Kuhl et al., 1992; Polka & Werker, 1994). Even though they can-
not categorize at any arbitrary point in a continuum (Werker & Lalonde, 1988), at 6
months of age infants are sensitive to some phonetic differences that do not con-
tribute to a category distinction in the language (Pegg & Werker, 1997). They can
discriminate initial position [d] from the unaspirated, voiceless [t] extracted from
the “st” cluster (which adults hear as a /d/). By 10 to 12 months of age, English in-
fants no longer respond to this difference (Pegg & Werker, 1997).
There is increasing evidence that there are subtle properties of the speech signal
that can contribute to the formation of a phonetic category that are different from
those in typical featural systems (see Pierrehumbert, 2003a). For example, VOT is
a feature associated with the voiced–voiceless phonetic category distinction. It is a
gradient property that is perceived categorically, with voiced segments having a
long lead and voiceless segments having a long lag. This feature is reliable for dis-
tinguishing obstruents (such as b, d, g vs. p, t, k) that occur in syllable-initial posi-
tion. Other cues to voicing include change in fundamental frequency, closure dura-
tion, duration of the preceding vowel, and onset of the first formant. Independent
of one another, the cues might not result in a phonetic distinction within the lan-
guage; however, the grouping of these cues results in the distinction between
voiced and voiceless segments.
While VOT is perceived categorically, within-category variability is also de-
tectable under appropriate testing conditions (e.g., Pisoni & Tash, 1974). Recent
work shows that this sensitivity to subcategorical variation is not restricted to
meaningless phonetic perception tasks. By monitoring participants’ eye move-
ments while they chose among four pictures upon hearing the name of the object,
McMurray, Tanenhaus, and Aslin (2002) reported gradient increases in fixations to
a cross-boundary competitor as a function of 5 msec differences in VOT. The clas-
sic categorical perception finding has recently been nuanced by research showing
that infants as young as 3 to 4 months, like adults, show graded perception of VOT
under appropriate testing conditions (J. L. Miller & Eimas, 1996), with evidence of
within-category discrimination clearly provided in a recent study by McMurray
and Aslin (2005).
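One way to picture how a gradient cue such as VOT can yield category-like responding while still leaving within-category differences detectable is a steep identification function. The Python sketch below is an illustration only, not a model proposed in the studies cited: the logistic form, the 25 msec boundary, and the slope are assumed values chosen for the example.

```python
import math

# Hypothetical parameters (assumptions for illustration, not from the source).
BOUNDARY_MS = 25.0   # assumed location of the voiced/voiceless boundary
SLOPE = 0.5          # assumed steepness of the identification function


def p_voiceless(vot_ms):
    """Probability of labelling a token as voiceless, given its VOT in msec."""
    return 1.0 / (1.0 + math.exp(-SLOPE * (vot_ms - BOUNDARY_MS)))


def label_change(vot_a, vot_b):
    """Absolute difference in labelling probability between two tokens."""
    return abs(p_voiceless(vot_a) - p_voiceless(vot_b))


# Two pairs separated by the same 20 msec step.
across_boundary = label_change(15.0, 35.0)   # straddles the assumed boundary
within_category = label_change(-5.0, 15.0)   # both on the voiced side

print(f"across boundary: {across_boundary:.4f}")
print(f"within category: {within_category:.4f}")
```

Under these assumptions the cross-boundary pair differs far more in labelling than the within-category pair, yet the within-category difference is small rather than zero, which is the pattern a sufficiently sensitive task could pick up.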
The availability of gradient information to a statistical learning mechanism may
contribute to the modifiability of phonetic categories. Maye, Werker, and Gerken
(2002) empirically demonstrated that infants use distributional information to
modify phonetic categories. In this work, two groups of infants were familiarized
to different distributions of eight tokens of /da/ spanning a continuum from an ini-
tial position [da] to the unaspirated, voiceless [ta]. All infants heard all eight to-
kens, but one group heard more instances of Stimuli 2 and 7, corresponding to a bi-
modal distribution of input, whereas the other group heard more instances of
Stimuli 4 and 5, corresponding to a unimodal distribution of input. Following fa-
miliarization, infants in the bimodal group were better able to discriminate the end-
point stimuli, 1 and 8, than were infants in the unimodal group even though both
groups of infants had heard an equal number of repetitions of Stimuli 1 and 8 in the
familiarization phase. What is not known is whether gradient information can be
used to modify phonetic categories or whether infants are constrained in just what
within-category information they can use (see Maye & Weiss, 2003, for a discus-
sion).
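The two familiarization conditions can be made concrete with a small Python sketch. The presentation counts below are hypothetical and are not the frequencies used by Maye et al. (2002); only the qualitative shape is taken from the description above (peaks at Stimuli 2 and 7 versus Stimuli 4 and 5, with equal exposure to Stimuli 1 and 8 in both groups).

```python
# Hypothetical presentation counts for the eight continuum tokens.
# Index 0 corresponds to Stimulus 1, index 7 to Stimulus 8.
BIMODAL = [1, 4, 2, 1, 1, 2, 4, 1]
UNIMODAL = [1, 1, 2, 4, 4, 2, 1, 1]


def peaks(counts):
    """Return the 1-based stimulus numbers that were presented most often."""
    top = max(counts)
    return [i + 1 for i, c in enumerate(counts) if c == top]


def count_modes(counts):
    """Count local maxima, treating a run of equal values as one plateau."""
    modes = 0
    i = 0
    n = len(counts)
    while i < n:
        j = i
        # Extend j to the end of the current run of equal values.
        while j + 1 < n and counts[j + 1] == counts[i]:
            j += 1
        left = counts[i - 1] if i > 0 else float("-inf")
        right = counts[j + 1] if j + 1 < n else float("-inf")
        if counts[i] > left and counts[i] > right:
            modes += 1
        i = j + 1
    return modes


for name, counts in (("bimodal", BIMODAL), ("unimodal", UNIMODAL)):
    print(name, "peaks at", peaks(counts), "modes:", count_modes(counts))
```

Both hypothetical distributions contain the same total number of tokens and the same number of endpoint tokens, so any group difference in discriminating Stimuli 1 and 8 would have to come from the shape of the distribution rather than from raw exposure to the endpoints.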
Anderson, Morgan, and White (2003) showed another role of statistical learn-
ing in phonetic perception. They asked whether frequency of exposure to native
sounds would influence the rate of decline. In support of this, they found that al-
though English infants of 6 months could discriminate two non-native contrasts
(the Hindi dental-retroflex /t/-/T/ and the Salish glottalized velar-uvular /’k/-/’q/),
at the intermediate age of 9 months they were no longer discriminating the /t/-/T/
contrast but were still discriminating the /’k/-/’q/, presumably because the English
[t] is more frequent in the input than is the English [k] (however, see Best et al.,
1997, for an alternative theory). Taken together, the Maye et al. (2002) and Ander-
son et al. (2003) studies showed that infants use both distributional and frequency
information in the input to restructure their phonetic categories.
By the time they reach 10 months of age, infants are also sensitive to other sta-
tistical phonetic regularities. For example, at this age, but not before, they show a
preference for listening to lists of words that correspond to the acceptable
phonotactics of the native language. An English-learning infant, for example, will
listen longer to words beginning with the acceptable “str” and to words ending
with the English “rts” than to words that violate this language-specific regularity
(Jusczyk, Friederici, Wessels, Svenkerud, & Jusczyk, 1993). Furthermore, when
presented with lists of words, all of which follow acceptable phonotactic patterns
of the native language, infants show a preference for listening to the words with
high frequency phonotactics (Jusczyk, Luce, & Charles-Luce, 1994). By 16.5
months these preferences can be manipulated by exposure to novel phonotactic
regularities (Chambers, Onishi, & Fisher, 2003). This shows that the preference is
based on the frequency of occurrence of these sequences.
Frequent patterns occur along a number of dimensions. In addition to
phonotactics, there are acoustic correlates to grammatical category, to syllable pat-
tern, and to word co-occurrences, to name a few. In stress-accent languages, funda-
mental frequency, together with intensity and duration, results in the perception of
stress (Beckman, 1986) and thus demarcates strong versus weak syllables. The
rhythmic characteristics of English words are overwhelmingly strong-weak (Cut-
ler & Carter, 1987). Thus, English-raised infants are likely exposed to more tro-
chaic patterns in the language than to other rhythmic patterns. Given this exposure,
they might begin organizing speech based on the trochaic pattern. At 9, but not 6
months of age, English infants do, in fact, demonstrate a preference for listening to
lists of strong first syllable, weak second syllable (SW) words, demonstrating the
emergence of a trochaic bias (Jusczyk, Cutler, & Redanz, 1993). Moreover, be-
tween 7 and 9 months of age infants begin to treat trochaic patterns as a cohesive
unit (Echols, Crowhurst, & Childers, 1997).
The information allowing discrimination of different segments and of different
rhythmical patterns seems to be available to infants from very early in life, but in-
fants appear to require a period of learning and development in order to integrate
rhythmic and segmental information. In illustration, Morgan and Saffran (1995)
presented infants with sequences of syllables in a specific rhythm. To determine
whether the infants had grouped the test syllables into a unit, infants were tested on
their latency to detect a buzz (as evidenced by a conditioned head-turn response)
that was inserted at a position that either maintained or violated the grouping. At 9
months infants more rapidly detect the buzz if the rhythm and sequential syllabic
information both correspond to that which they had learned (e.g., treat GOka as a
unit, whether in a tiGOka, deGOka, GOkati, or GOkade context). At 6 months in-
fants only show this latency difference when the rhythmic pattern remains the
same, regardless of whether the syllable sequences change or not. This suggests
that only the older infants were able to pick up both sets of cues in the short training
phase and integrate them.
Indexical Information
Indexical information, sometimes called “paralinguistic,” refers to properties in
the speech input that carry information about gender, affect, speaker identity, age,
emphasis, and so on. To date, there have not been many studies of indexical per-
ception in infancy. A number of studies have examined perception of acoustic cues
that are not used in speech (e.g., Colombo & Horowitz, 1986, for frequency dis-
crimination), but our knowledge of how infants perceive indexical cues in speech
is only beginning to emerge. We know that even neonates use differences in pitch
to discriminate words (Nazzi, Floccia, & Bertoncini, 1998). They can discriminate
individual voices (DeCasper & Prescott, 1984; Floccia, Nazzi, & Bertoncini,
2000) and show a preference for their mother’s voice (DeCasper & Fifer, 1980). At
4 months (Fernald, 1984; Werker & McLeod, 1989) and even before (Cooper &
Aslin, 1994), infants selectively listen to infant-directed speech.
Even less is known about infants’ ability to categorize stimuli on the basis of
indexical information. One study has shown categorization of voices by 4 to 6
months on the basis of gender (C. L. Miller, 1983). Other studies have shown that
infants can match both gender (Patterson & Werker, 2002; Walker-Andrews,
Bahrick, Raglioni, & Diaz, 1991) and age and affect (Bahrick, Netto, & Hernandez
Reif, 1998) in the face and voice by 6 or 8 months of age, an ability that may pre-
suppose categorization.
Although it is useful to make clear-cut distinctions between phonetic and
indexical perception, it is also necessary to examine the interplay between the two.
Early investigations focused on the question of whether one was more important
than the other. In illustration, Kuhl (1983b) showed that infants can ignore varia-
tions in speaker gender and speaker age yet still discriminate stimuli on the basis of
vowel category, suggesting that in this case, phonetic information is more impor-
tant. Similarly, Patterson and Werker (2002) showed that infants can match pho-
netic information in the face and voice at a younger age (2 months) than they can
gender information (6 to 8 months).
The focus has broadened to explore how indexical and phonetic information
might be mutually informative. The first demonstration in infant speech perception
was provided by Karzon (1985), who showed that infants better discriminate the
middle syllable in “marana” versus “malana” if it is emphasized with infant-directed
prosody. Infant-directed speech also facilitates infant discrimination of non-native
phonetic contrasts, as does the use of a female rather than a male voice (Panneton
Cooper & Ostroff, 2003). Acoustic analyses of infant-directed speech reveal that
both voicing contrasts (Ratner & Luberoff, 1984) and vowel space (Kuhl et al., 1997;
Ratner, 1984) are exaggerated in infant-directed speech. This exaggeration of the
vowel space is unique to infant-directed speech and is not evident in the exaggerated
speech directed to family pets (Burnham, Kitamura, & Vollmer-Conna, 2002).
Indexical information may not simply aid in performance in experimental tasks of
phonetic perception but may contribute to a more robust phonetic representation.
For example, maternal clarification of the vowel space is correlated with superior
speech discrimination in infants of 6 to 12 months (Liu, Kuhl, & Tsao, 2003).
This brief and selective review illustrates that infants are sensitive to detailed in-
formation in the signal and that they can organize it in such a way as to form appro-
priate groupings. Some of these groupings represent phonetic categories (both
acoustically and articulatorily defined), others indexical. In some tasks infants attend
to gradient information, whereas in others they ignore it. This was shown in the categorical
(Eimas et al., 1971) and the gradient (McMurray & Aslin, 2005) perception of
VOT. Similarly, although numerous studies were reviewed showing that infants are
sensitive to gender information in the voice (see also Walker-Andrews et al.,
1991), it was also shown that there are conditions under which gender information
is ignored in favour of phonetic information when categorizing vowels (Kuhl,
1983b) and when matching lips and voice (Patterson & Werker, 2002).
WORDS
What are words? Ultimately words are recognizable forms that have meanings
shared by speakers of a community or language group. Fully mature words have
reference, in that they stand, on their own, for objects, events, properties, ideas, and
so on, that are of interest to other members of the community of speakers. Some
words stand for entire categories and others for only specific instances. Any partic-
ular word is a member of a particular grammatical category or categories. Each
word may ultimately support different types of morphological alternations, and
will have rules for its usage. Well-nuanced words may be used in novel grammati-
cal situations or new semantic contexts for specific communicative acts, such as in
poetry or metaphor. However, in order to occupy any of these roles, words need to
have an agreed-upon phonological form that allows production and comprehen-
sion by speakers and listeners.
The ability to learn words requires that infants be able to recognize a familiar se-
quence of phonetic segments and to treat it as a cohesive unit. Because not all words
are spoken in isolation, it is also necessary that infants have the ability to pull these
cohesive units out of the continuous speech stream. As illustrated in the previous section,
infants are sensitive to numerous properties of the signal. In this section we illustrate
how infants use this information to extract and remember word forms and
discuss how the information is applied in the acquisition of word meaning.
Word Forms: Segmentation, Recognition, and Mechanisms
Word forms are cohesive units that have been extracted from the speech stream
without being tied to meaning. Although parents may say some words in isolation
when speaking to their infants (Aslin, Woodward, LaMendola, & Bever, 1996;
Brent & Siskind, 2001), the task of learning to recognize word forms is more effi-
cient when the infant begins to parse words out of the continuous stream of speech.
Although rudimentary word segmentation is evident shortly after birth
(Christophe et al., 1994), infants become progressively more proficient at seg-
menting word forms as they learn more about the phonetic properties of words in
the native language, and as they become able to integrate multiple cues (Jusczyk,
1997a).
In a series of studies using the Head-Turn Preference Procedure (HPP), Jusczyk
and Aslin (1995) found that infants of 7.5, but not 6 months of age, can success-
fully segment words from ongoing speech. They first familiarized infants to two
words (e.g., cup, dog) and then tested them for their listening preference for pas-
sages that contained the familiar words over passages containing a different pair of
words. Infants showed a preference for familiar words even over minimal pair foils
(e.g., cup, tup). This shows that at this age, infants recognize the exact word from
familiarization when they hear it in a sentence context. Moreover, Jusczyk (1997b)
reviews studies showing this effect in the reverse direction as well. When familiar-
ized first to passages containing multiple instances of words, infants show a prefer-
ence for listening to those words in isolation over unfamiliar words.
When infants first demonstrate the ability to recognize familiar word forms, their
recognition performance reflects the degree of similarity between the word forms
being compared. For example, it has been shown that if the word is first learned as
produced by one speaker, an infant of 7.5 months will show diminished recognition
if a new speaker with a very different voice (e.g., male to female) reproduces the
word (Houston & Jusczyk, 2000). More recent work has revealed that it is not the
male versus female difference that interferes with recognition; rather it is the overall
magnitude of the multidimensional differences in the voices that best predicts inter-
ference (Houston & Jusczyk, 2003). Similarly 7- to 8-month-old infants show better
word form recognition when speaker affect, focused versus nonfocused stress,
speech rate, or pitch match the original exposure (Morgan, 2002). Infants 11 months
and older, however, no longer show the same degree of disruption in word form rec-
ognition across variations in voice, affect, speech rate, and pitch.
After being exposed to a speech stream, infants only recognize the words they
have segmented if they agree in coarticulation information (Curtin, Mintz, & Byrd,
2001). Coarticulation refers to the fact that the production of speech sounds is influenced
by the neighbouring sounds. For example, in the English words ‘coo’ [ku]
and ‘key’ [ki], the [k] sounds are produced differently (Ladefoged, 1993). Curtin et
al. familiarized 7-month-old infants to strings of syllable sequences that were ar-
ticulated appropriately for the context in which they were presented. Following fa-
miliarization, one group of infants was tested on their preference for familiar over
unfamiliar syllables that were properly coarticulated. A second group was tested
with syllables that were miscoarticulated across syllable boundaries. Appropriate
coarticulation was implemented by selecting items from the same context in which
they were originally produced. Miscoarticulation was implemented by splicing to-
gether syllables that were not initially produced adjacent to one another. Infants
showed recognition for only the appropriately coarticulated familiar items.
Shifting the stress to another syllable in segmentally equivalent word forms also
disrupts recognition. Curtin et al. (in press) familiarized 7-month-old infants to an
artificial language of CV syllables where every third syllable was stressed. Infants
pulled out sequences corresponding to a trochaic initial parse and then demon-
strated a listening preference for items that were identical in segmental and stress
information (DObita, DObita) over ones that were segmentally the same but with
stress shifted to an adjacent syllable (doBIta, DObita). This study, together with
the two preceding studies, reveals that indexical, phonetic, and stress information
are stored in word forms.
At the same time that infants are paying attention to the detailed information in
word forms, they are using language-specific properties to aid in their segmenta-
tion. When infants first start segmenting words at 7.5 months, their language-spe-
cific rhythmic biases guide segmentation. English infants at this age successfully
segment only strong-weak (SW) words, showing a trochaic bias (Jusczyk, Houston,
& Newsome, 1999), whereas Canadian French infants segment only
weak-strong (WS) words, showing a language-appropriate iambic bias (Polka,
Sundara, & Blue, 2002). The strength of the language-specific rhythmic bias is
demonstrated when English infants of 7.5 months misparse WS items. They treat
the S syllable as a word-initial syllable and parse the stream accordingly. Thus, in a
word like “guiTAR,” they segment “TAR.” If “TAR” is consistently followed by an
unstressed word, such as “is,” infants in this age group treat “TAR” and “is” as a
single unit and pull out “TARis” (Jusczyk, Houston, et al., 1999). However, if two
strong syllable words are next to each other, as in “COLD ICE” or “PACK ASH,”
infants this age do not mis-segment (Mattys & Jusczyk, 2001). They pull out
“COLD” and “PACK” and they also pull out “ICE” and “ASH” as individual items.
By 11 months, English-learning infants no longer mis-segment WS words, pre-
sumably because by this age they can combine their knowledge of the metrical
properties of the native language with their knowledge of both the phonotactics
and the position-specific phonetic variability (Jusczyk, 1997a).
How do infants segment word forms from the speech stream, and what allows
them to improve in their use of appropriate criteria for word segmentation? One
possible mechanism that infants have available is statistical learning, and one statistic
that appears to be useable is transitional probabilities (Saffran, Aslin, & Newport,
1996). Although this learning mechanism is not unique to speech (Saffran, Johnson,
Aslin, & Newport, 1999), to human infants (Hauser, Newport, & Aslin, 2001), or even
to the auditory domain (see Kirkham, Slemmer, & Johnson, 2002, for statistical
learning of visual shapes in infancy), humans are able to use it with
speech as well. Saffran et al. (1996; see also Aslin, Saffran, & Newport, 1998) examined
whether infants could use such statistical information to discriminate syllable
sequences that form words and sequences that do not. They created an artificial
language of trisyllabic nonwords. The statistical information available to the
infants was the transitional probabilities between successive syllables (1.0 word
internally = “words” and .33 across word boundaries = “part-words”). After 2 min
of exposure to the language, infants aged 8 months showed a novelty preference
for “part-words” over “words,” indicating appropriate segmentation based only on
transitional probabilities.
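To make the statistic concrete, the following is a minimal sketch in Python of how forward transitional probabilities, TP(B | A) = frequency(AB) / frequency(A), could be computed over a syllable stream. The four nonwords, the stream length, the random seed, and the function names are assumptions chosen for illustration; they are not the stimuli or the analysis of Saffran et al. (1996). The constraint that no word immediately repeats is likewise an assumption, adopted because with four words it yields probabilities of 1.0 within words and approximately .33 across word boundaries.

```python
import random
from collections import Counter

# Hypothetical trisyllabic nonwords; each syllable occurs in exactly one word.
WORDS = [
    ("tu", "pi", "ro"),
    ("go", "la", "bu"),
    ("bi", "da", "ku"),
    ("pa", "do", "ti"),
]


def make_stream(n_words, seed=0):
    """Concatenate randomly chosen words; a word never follows itself."""
    rng = random.Random(seed)
    stream, prev = [], None
    for _ in range(n_words):
        word = rng.choice([w for w in WORDS if w != prev])
        stream.extend(word)
        prev = word
    return stream


def transitional_probabilities(stream):
    """Return TP(b | a) = count(a followed by b) / count(a) for adjacent pairs."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])  # the final syllable has no successor
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}


stream = make_stream(3000)
tp = transitional_probabilities(stream)
within = tp[("tu", "pi")]  # syllable pair inside a word
across = tp[("ro", "go")]  # syllable pair spanning a word boundary
print(f"within-word TP = {within:.2f}, across-boundary TP = {across:.2f}")
```

On this toy stream the within-word value is exactly 1.0, whereas the across-boundary value fluctuates around one third, which is the contrast the infants are assumed to exploit.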
The convergence of studies showing tuning in the first year of life to the proper-
ties of the native language, together with the demonstration of statistical learning
in the laboratory, has led to studies directly assessing whether infants use statistical
regularities picked up over the course of listening to language in the “real” world to
direct their learning in the laboratory. Probabilistic information in the form of seg-
mental co-occurrences (phonotactics) is one type of regularity readily available to
infants for use in segmentation by 9 months of age. When presented with CVC se-
quences embedded in highly probable word-boundary contexts (i.e., following a C
that would create an illegal CCVC syllable), they were better able to segment the
CVC than if it had been embedded in a CCVC context in which the CC is a more
common within-word cluster. This was true for phonotactic cues (e.g., CC se-
quences) at both the beginning and the end of the CVC sequence (Mattys &
Jusczyk, 2001).
A more subtle but equally important influence of statistics on word segmenta-
tion is that of frequency. The question that has been addressed here is whether
highly frequent words aid in segmentation of upcoming words. The evidence is in-
creasing that they do. In one study, Bortfeld, Rathbun, Morgan, and Golinkoff
(2003) showed that infants were better able to segment novel words when these
words were preceded by their name, a highly frequent (and presumably, highly fa-
miliar) word. Similarly, Shi, Werker, and Cutler (2003) showed that infants of 8
months are better able to segment and remember novel words when they are preceded
by a highly frequent function word such as “the.”
It is not necessarily the case that infants always use one of these strategies to
parse the speech stream. Rather, it is likely that some cues are more salient than
others and that statistics are calculated over a variety of cues, features, segments,
and suprasegmental properties. We have seen that infants are sensitive to stress and
coarticulatory information and that they compute statistics over the speech input.
To ascertain whether properties such as coarticulation and stress could override the
raw statistical information, Johnson and Jusczyk (2001) pitted these cues against
each other. Their results show that at 8 months both the coarticulatory cues and the
stress information override transitional probabilities, indicating that both of these
properties can change the parsing strategy. A more recent article showed that for
infants of a slightly younger age, when stress and transitional probabilities are pit-
ted against each other, transitional probabilities override stress (Thiessen &
Saffran, 2003).
Modeling provides a way of examining and combining probabilistic cues in or-
der to determine if this information can be used to segment words out of the speech
stream (Brent & Cartwright, 1996; Christiansen, Allen, & Seidenberg, 1998). For
example, using information from n-grams (e.g., bigrams, trigrams, etc.), Cairns,
Shillcock, Chater, and Levy (1997) examined whether a connectionist network
could segment the speech stream presented in the form of strings of phoneme units.
The input to the model was a noise-free 10,000-segment stretch of corpus. The network’s
performance peaked with correct identification of about one in five boundaries
in the test corpus. As a result of this performance, they checked to see if the
errors the model was making were plausible guesses and found that it posited
phonotactically well-formed boundaries. With the addition of pause information,
the network performed above chance. They concluded that the phonotactics of
English contain sufficient information for a bottom-up processor to work for
one-third of word boundaries and that adding pauses improves performance. The
importance of pause information has also been demonstrated behaviourally with
infants (Peña, Bonatti, Nespor, & Mehler, 2002).
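The general idea of positing boundaries from sequential probabilities can be sketched with a far simpler device than a connectionist network. The Python sketch below is not the model of Cairns et al. (1997): it substitutes a plain threshold on forward bigram probabilities, operates on syllables rather than phoneme units, and uses a hypothetical three-word lexicon, a fixed word order, and an arbitrary threshold of 0.75, all of which are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical lexicon; every syllable belongs to exactly one word.
LEXICON = {
    "A": ("tu", "pi", "ro"),
    "B": ("go", "la", "bu"),
    "C": ("bi", "da", "ku"),
}
# Fixed order in which each word is followed by both of the other words.
ORDER = ["A", "B", "C", "A", "C", "B"] * 20


def forward_tp(seq):
    """Forward bigram probability: count(a, b) / count(a)."""
    pairs = Counter(zip(seq, seq[1:]))
    firsts = Counter(seq[:-1])
    return {pair: n / firsts[pair[0]] for pair, n in pairs.items()}


def segment(seq, tp, threshold):
    """Posit a boundary wherever the bigram probability falls below threshold."""
    words, current = [], [seq[0]]
    for a, b in zip(seq, seq[1:]):
        if tp.get((a, b), 0.0) < threshold:
            words.append("".join(current))
            current = []
        current.append(b)
    words.append("".join(current))
    return words


stream = [syllable for key in ORDER for syllable in LEXICON[key]]
tp = forward_tp(stream)
words = segment(stream, tp, threshold=0.75)
print(words[:3])
```

Because the toy input is noise free and the lexicon has no shared syllables, every boundary is recovered here; the partial success reported for natural corpora reflects how much less tidy real phonotactic statistics are.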
Cairns et al. (1997) also wanted to know if the network’s performance sup-
ported the metrical segmentation strategy (MSS; Cutler & Norris, 1988). The net-
work was able to segment correctly without the presence of initial strong syllables,
suggesting these are not necessary in the later stages of word learning. Instead, this
model suggests that the MSS is useful in segmentation in the early stages of devel-
opment, and sensitivity to phonotactic information represents a viable part of later
development. These results mirror the behavioural data that demonstrate that
younger infants rely on rhythm and older infants integrate stress and sequential in-
formation (Morgan & Saffran, 1995).
In summary, the studies presented here demonstrate that the signal is rich with
information and that infants utilize much of this rich information to segment word
forms from the speech stream. Changing perceptual sensitivities and powerful sta-
tistical learning mechanisms guide segmentation. In recognizing parsed word
forms, younger infants require a greater degree of similarity than do older infants
in dimensions such as voice, gender, coarticulation, and stress. The studies re-
viewed in this section provide powerful evidence that, long before they understand
their meaning, infants recognize and store familiar word forms.
Word Meaning
Infants can recognize their own names (Mandel, Jusczyk, & Pisoni, 1995) and
other highly frequent word forms by as early as 4 months of age (Tincoff &
Jusczyk, 1999). How do children learn meaningful words? The classic approach to
word learning begins with the child developing a concept and then searching for a
label (Katz, Baker, & Macnamara, 1974; Nelson, 1974). However, as elucidated by
Jusczyk (1997b), stored word units can also initiate word learning by propelling
the search for a concept. Neither of these processes may be operative at the begin-
ning stages of word learning, however, when the child does not yet have the notion
that “things have names.” The transition from recognizing word forms to full referential
understanding likely involves several steps (see Hirsh-Pasek & Golinkoff,
1996; Nazzi & Bertoncini, 2003; Werker & Tees, 1999). One of the earliest steps is
learning arbitrary associative links between words and objects or events in the
world. Although this kind of “goes with” understanding falls short, on a number of
dimensions, of full referential understanding (Bloom, 1999; Merriman & Bow-
man, 1989), it is an essential step toward meaning. In naturalistic parent–child en-
counters, associative word learning is supported by a number of contextual and in-
tentional cues. For example, a parent might say “Can you get your shoes” when
standing in front of the door preparing to go out and pointing to the child’s shoes.
In this case, both the situation and the parent’s pointing behaviour help the child
establish the link. However, it is difficult to study the phonetic and indexical detail
picked up by the child in those situations.
In a series of studies over the past 5 years, the Werker lab has investigated the
phonetic detail infants use in an associative Switch word-learning task in a labora-
tory situation wherein infants must solve the word–object associative link on their
own (Werker, Cohen, Lloyd, Casasola, & Stager, 1998). We began this series of
studies to see if the phonetic categories established in the first year of life would
guide word learning. To our surprise, in the first series of experiments we discov-
ered that infants apparently fail to use discriminable phonetic detail when linking
novel words to novel objects (Stager & Werker, 1997).
In the Switch task, infants are shown two word–object pairings for a number of
trials until their looking time declines to a preset habituation criterion and then are
tested on their ability to detect a violation (a “switch”) in the word–object pairing
using two trial types. The “same” trial includes a familiar word and familiar object
in the familiar pairing, whereas the “switch” trial includes a familiar word and fa-
miliar object but in a new pairing (e.g., Object A is now paired with Word B). If in-
fants have formed the associative link, they will be surprised by the “switch” trial
and should look longer than on the “same” trial.
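The logic of the procedure can be sketched in a few lines of Python. The block size of three trials, the 65% criterion, the looking times, and the function names below are hypothetical values chosen for illustration; the text above specifies only that habituation continues until looking time declines to a preset criterion and that the test compares a “same” trial with a “switch” trial.

```python
def trials_to_habituation(looking_times, block=3, criterion=0.65):
    """Return the number of trials needed to reach criterion, or None.

    Assumed rule: habituation is reached when the mean looking time over the
    most recent `block` trials falls below `criterion` times the mean of the
    first `block` trials.
    """
    if len(looking_times) < 2 * block:
        return None
    baseline = sum(looking_times[:block]) / block
    for end in range(2 * block, len(looking_times) + 1):
        recent = sum(looking_times[end - block:end]) / block
        if recent < criterion * baseline:
            return end
    return None


def switch_effect(same_trial, switch_trial):
    """Looking-time difference; positive values suggest the switch was noticed."""
    return switch_trial - same_trial


# Invented looking times in seconds for one infant.
habituation = [12.0, 11.0, 10.0, 9.0, 8.0, 7.0, 6.0, 5.0]
n_trials = trials_to_habituation(habituation)
effect = switch_effect(same_trial=5.0, switch_trial=9.5)
print(n_trials, effect)
```

In an actual study the difference between “switch” and “same” trials would be evaluated across a group of infants rather than for a single child.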
Infants of 14 months, but not younger, can learn the associative link between
word and object during the short laboratory procedure and look longer to the
“switch” trial. Infants as young as 8 months detect a change in the word or object if
habituated to only a single word–object pairing, but they are unable to detect a
switch in the pairing when two words and two objects are used. It is the ability to
form arbitrary links between words and objects that is the hallmark of this initial
step in word learning. Using the Switch task and its simplified, single-object variant,
it was found that although infants of 14 months can learn to map two phonetically
dissimilar words (“lif” and “neem”) onto two different objects and detect a
“switch” when the word–object pairing is violated (Werker et al., 1998), they fail at
this task when the phonetically similar words “bih” and “dih” are used, even
though they discriminate these words in a simple speech perception task at both 8
and 14 months of age (Stager & Werker, 1997; see Figure 1).
FIGURE 1 Summary of word learning studies from Stager and Werker (1997) and Werker,
Fennell, Corcoran, and Stager (2002).
The finding that infants have difficulty learning minimally different words has
been challenged by tests with other procedures. Using a latency response in an
online word recognition task, Swingley and Aslin reported that infants of 18 to 23
months (Swingley & Aslin, 2000) and of 14 months (Swingley & Aslin, 2002) can
distinguish a mispronunciation of a known word from the word itself (e.g., “baby” vs.
“vaby”; see also Bailey & Plunkett, 2002). Importantly, these studies not only
used a different task than the Switch task, but they also used words and pictures of
objects already known by the child.
To explore why infants succeeded in the Swingley and Aslin (2000, 2002) stud-
ies and failed in the Switch studies, Fennell and Werker (2003) tested infants of 14
months on their ability to distinguish the well-known minimal pair words “ball”
and “doll” in the Switch task.1 Infants of 14 months were successful. In a follow-up
study, Fennell and Werker (2004) tested infants in a version of the Switch task that
is more compatible with the mispronunciation task used by Swingley; infants were
tested on their ability to distinguish the known word “doll” from the mispronuncia-
tion “goll.” Infants were also successful at this task. These results help explain the
different results obtained using different tasks and suggest that previous knowl-
edge of the word is helpful across tasks in allowing infants to use the phonetic de-
tail. Of interest, another manipulation showed that knowledge of only the object,
and not of the word, is not sufficient to allow infants of 14 months to succeed in the
Switch task (Fennell, 2004).
The difficulty in mapping phonetically similar novel words onto novel objects
is short-lived. By 17 months of age, infants are successful at learning phonetically
similar words in the Switch task (Werker, Fennell, Corcoran, & Stager, 2002). The
same pattern of results is obtained for Hindi-learning infants as well. Using the
Switch procedure, Werker, Ladhar, and Corcoran (2005) tested Hindi-learning in-
fants and found that by 17 months of age infants were able to link words differing
in the Hindi retroflex versus dental contrast onto two different objects. Of interest,
the English-learning controls did not succeed, showing that it is only native con-
trasts that can be used.
Not all non-native contrasts are equally difficult. There are some acoustically
salient non-native distinctions that remain discriminable across the life span (e.g.,
Best, 1994). Given the physical salience of length, one might hypothesize that
vowel length distinctions may be among the group of distinctions that remain
discriminable. An interesting question is whether infants do indeed continue to
discriminate a non-native vowel length distinction and, if they do, whether they
can use it in a word learning task. Dietrich, Swingley, and Werker (2004) compared
English- and Dutch-learning infants of 10 to 12 months of age (the age by
which infants typically no longer discriminate non-native contrasts) on their ability to
discriminate a vowel length distinction that is phonemic in Dutch but not in English.
Infants in both language groups could still discriminate the Dutch (non-English)
vowel length distinction (e.g., “tam” vs. “taam”). However, when tested in the
Switch associative word-learning task at 18 months of age, only the Dutch infants
succeeded. Inclusion of a control study with a vowel quality distinction that is used
in both Dutch and English confirmed that when a native distinction is used, both
English and Dutch infants of 18 months can succeed in the Switch task.
1 The words “doll” and “ball” are minimal pairs in Western Canadian English.
As demonstrated by the previous findings, there are a number of results that re-
quire a coherent explanation. For example, infants of 14 months succeed at linking
“lif” and “neem” to two different objects but fail with phonetically similar words
such as “bih” and “dih” (or “bin” and “din”; Pater, Stager, & Werker, 2004). This
difficulty is gone by 17 months of age. If familiar words are used, infants as young
as 14 months succeed. Moreover, they succeed with exactly the same distinction
that they had failed to use when confronted with novel words. In all cases, success-
ful performance in word learning is restricted to native phonetic distinctions.
In our review of infant speech perception through word learning, we identified a
number of other findings that are difficult to accommodate within any of the exist-
ing frameworks. A common finding is that infants show different patterns of dis-
crimination across different tasks. For example, in speech perception studies in-
fants typically show categorical perception, but under the right testing conditions
they can show gradient perception. In the case of word segmentation, infants ap-
pear to pay attention to different cues at different ages. In addition, in word form
recognition, indexical information plays a more important role for younger than
for older infants. Phonetic information is more discriminable in novel word forms
than it is in newly learned meaningful words. The differences in performance
across all these studies can be accounted for by a careful analysis of how perfor-
mance varies in infants of different ages as a function of task.
PRIMIR
There is a vast amount of information in the speech signal that the infant must
make sense of and apply to a range of language learning tasks in order to success-
fully enter language acquisition. The overall pattern of developmental change is
one of advancement, but this is not always apparent from the infant’s performance
in each and every task. Each piece of the puzzle requires an understanding of the
whole developmental manifold. Thus, understanding performance changes in any
one task requires consideration of performance in other tasks at the same and at
different times in development. PRIMIR provides such a unified framework.
Our starting assumption is that the infant brings to the language learning task
evolutionary and epigenetically based biases. These biases, which include a pref-
erence for speech, infant-directed speech, point vowels, proper syllable form, and
the ability to process rhythmical patterns in speech, act as a filter. The developmen-
tal level of the infant and the specific language-learning task also act as filters. We
assume that these filters are coupled with general learning mechanisms that calcu-
late statistics across many different aspects of the input simultaneously. Regular-
ities in the input and the filters collaborate to ensure that only linguistically possi-
ble combinations are learned. These filters and the learning mechanisms that drive
processing interact with the raw physical saliency (acoustic, phonetic, gestural, vi-
sual, etc.) of the information in the signal to form representations.
Representations in PRIMIR are organized along multiple dimensions. The si-
multaneous representation of multiple dimensions of the signal results in an archi-
tecture that allows for utilization of different information for different language
tasks, with some types of information more easily accessible at different times in
development. In PRIMIR the infant simultaneously has access to both the informa-
tion necessary for defining phonetic segments and to the information that groups
segments together into larger units. PRIMIR also allows for the inclusion of
indexical information, such as speaker identity, that is not necessarily criterial in
defining a phonetic category but might be useful for other language tasks. In other
words, regularities are summarized over a number of dimensions simultaneously.
The summary representations are essential for productivity and efficiency in lan-
guage processing. However, as will become apparent, neither the processing as-
pects of PRIMIR nor the representational aspects can be understood without the
other.
PRIMIR’s multidimensional, interactive representations allow for the grouping
of information on the basis of similarity, co-occurrence, and other statistical regu-
larities. The information is grouped at three multidimensional planes. These in-
clude a General Perceptual plane, a Word Form plane, as well as an emerging Pho-
nemic plane. PRIMIR also allows for expansion to an Orthographic and to a
Grammatical plane. Importantly, the planes are not necessarily hierarchically or-
dered with respect to one another but do summarize different kinds of regularities.
Access to these different planes is not mutually exclusive, and formation (and ref-
ormation) of the planes can occur in parallel.
Multidimensional Interactive Representational Planes
The General Perceptual plane refers to all the information that is in the signal. This
plane includes those properties that are specifically phonetic. It also includes those
indexical (paralinguistic) properties that do not play a central role in any specific
linguistic category but nevertheless contribute to processing (as in phonetic or lexi-
cal categories being easier to recognize when in the same voice). The information
itself can be categorized or can be incorporated in the categories at other planes of
representation. The language-specific categories that emerge at this plane aid in
the formation of the Word Form plane—extracted sequences that form cohesive
units. Word Forms are stored as individual exemplars that cluster in multidimen-
sional space. Distinct Word Forms are predicted to correspond to the lan-
guage-specific phonetic categories established at the General Perceptual level.
Linking Word Forms to concepts results in meaningful words. Once the infant has
established a sufficient number and density of meaningful words, generalization of
commonalities occurs, leading to the emergence of the Phonemic plane. Phonemes
reflect those properties that actually define the phonetic variation used in distin-
guishing meaning among lexical units. In later development, learning to read will
sharpen phonemes. Each of these planes informs the others, and each directs information
pick-up and attentional allocation (see Figure 2).
It is important to note that the information in each of these planes is relational in
nature. Be it an indexical cluster such as gender or emotion, or a phonetic cluster
such as a particular vowel or a feature such as stress, the values are not absolute.
Male voices differ from female voices along a number of dimensions, each of
which is defining only because it distinguishes the two gender clusters. Without
the contrasting cluster “female,” the cluster “male” would not exist. Similarly, the
vowel [a] differs from the vowel [e] in acoustic and articulatory features that are
meaningful for this plane only because they contribute to this type of distinction.
The “representation” available to the listener simultaneously focuses attention on
those relations made prominent by a particular task, while also containing all of the
information available in the other clusters that interact with, and contribute to, the
cluster of attentional focus. In this way the planes available in PRIMIR allow the
listener the advantages of generalization that categorization makes possible
(Shepard, 1987) while at the same time permitting access to subcategorical detail,
in some cases all the way back to the exemplars that stimulate the formation of
similarity clusters.
FIGURE 2 PRIMIR’s multidimensional planes.
General Perceptual plane.
The phonetic and indexical information is in
the speech environment and is organized in the General Perceptual plane. In
PRIMIR, General Perceptual features are the bases of phonetic and indexical categories.
These categories can best be described as clusters of exemplar-like distributions.
These distributions store information about the different instantiations encountered.
This results in context-sensitive clusterings. For example, the
distribution of [pʰ] in syllable-initial position has different values than do word-final
or medial [p] distributions (see also Pierrehumbert, 2003b). These distributions
also enter into neighbourhood clusterings on the basis of similarity of features.
Thus stops, for example, which share several acoustic or articulatory features,
likely form clusterings that are distinct from other manner classes. Or, as suggested
by gestural phonology, segments that share gestures might cluster together into
natural classes (Goldstein & Fowler, 2003). There is less work on the organization
of features of indexical categories, but the work to date suggests that some fea-
tures, such as gender in the face and voice, are categorized by 6 months of age. We
propose that the same distributional organization occurs here.
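One way to picture context-sensitive, exemplar-like distributions is as a store in which each encountered instance is filed under both its segment and its position. The Python sketch below is an illustrative assumption rather than a specification of PRIMIR: the class name, the single acoustic dimension (think of an aspiration duration in milliseconds), and all numerical values are invented.

```python
from collections import defaultdict
from statistics import mean


class ExemplarStore:
    """Toy store of exemplars keyed by (segment, position)."""

    def __init__(self):
        self._clusters = defaultdict(list)

    def add(self, segment, position, value):
        """File one encountered instance under its context-sensitive cluster."""
        self._clusters[(segment, position)].append(value)

    def centre(self, segment, position):
        """Summary value (mean) of the exemplars stored for one context."""
        return mean(self._clusters[(segment, position)])


store = ExemplarStore()
# Invented values on one hypothetical acoustic dimension (ms).
for value in (60.0, 70.0, 65.0):
    store.add("p", "initial", value)
for value in (10.0, 15.0, 20.0):
    store.add("p", "medial", value)

initial_centre = store.centre("p", "initial")
medial_centre = store.centre("p", "medial")
print(initial_centre, medial_centre)
```

The two clusters share a segment label yet have different centres, while the individual exemplars remain available, which is the combination of summary and detail that the exemplar-like description is meant to capture.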
Some clusterings, for instance phonetic categories, may emerge early from re-
lational features that have privileged salience or special status and that our initial
biases perceive as integral. One illustration of integral features comes from the
work on “trading relations” (as defined by Repp, 1982). Briefly, the many studies
in this area show that in speech perception some acoustic or phonetic features that
are necessary consequences of production work together to specify a distinction.
For example, two acoustic cues that pattern together to help signal a difference in
the presence or absence of a stop consonant are the duration of the silent interval
following the release of the consonant and the starting frequency of the second for-
mant transition. When these are changed together in a way that is consistent with
how the articulators change (increase in silent interval and elevation of second for-
mant), the resulting phonetic segments are more discriminable than they are if
these two acoustic cues are changed in opposite directions (increase in silent inter-
val and decrease in the second formant). These two cues are said to “trade”: They
work together as integral features to specify phonetic distinctions such as that in
“split” versus “slit” (Fitch, Halwes, Erickson, & Liberman, 1980). Trading rela-
tions are not limited to acoustic features. Articulatory features that co-occur and
share gestural properties can also be integral in the way they contribute to category
formation (see Mann & Repp, 1980).
Information picked up from the environment may contribute simultaneously to
both phonetic and indexical clusterings. Pitch information is used in voice recogni-
tion but is also a cue to voicing distinctions (Whalen, Abramson, Lisker, & Mody,
1993) and vowel colour (Whalen & Levitt, 1995). Pitch may also contribute to
word meaning, and vowel colour may contribute to affective valence (Ohala,
1983). PRIMIR allows for their joint contribution in a way that most other concep-
tual frameworks do not.
By including both phonetic and indexical information in the similarity
clusterings, PRIMIR may also ultimately be extended to adult speech processing.
For example, PRIMIR can account for the finding of superior word recognition in
familiar voices by adults (Palmeri, Goldinger, & Pisoni, 1993) and the demonstra-
tion that specifically phonetic information (sine waves) in the absence of any iden-
tifying voice signature cues can lead to the identification of familiar speakers
(Fellowes, Remez, & Rubin, 1997). Our framework holds open the possibility that
the features that comprise any clustering may differ within the allowable con-
straints between individuals and across languages. Moreover, it allows for the con-
tribution of perceptual information from other domains such as the visual domain.
Word Forms and meaningful words.
The Word Form plane in PRIMIR
consists of extracted units without meaning attached. To account for the fact that
extracted Word Forms encode indexical as well as phonetic properties, we assume
an exemplar-based representation (Jusczyk, 1997a; Vihman, 2002) at this plane.
Similar exemplars overlapping along phonetic or indexical dimensions form clus-
ters. Essentially, neighbourhoods emerge in this plane in the absence of meaning
(Hollich, Jusczyk, & Luce, 2002).
Associating a Word Form to a concept is computationally demanding. The in-
fant has to figure out what the referent is and which of its properties are defining
(Hall, 1993; Markman, 1990). At the same time, the infant has to figure out
which variations (indexical and phonetic) are acceptable and which are irrele-
vant in establishing the linkages. This requires a step beyond the similarity
clusterings that emerge at the General Perceptual and the Word Form planes.
The infant’s task is to link Word Forms with concepts and to hold this linkage in
memory, resulting in meaningful words.
At the beginning stages of word learning when infants are just forming associa-
tive links, the statistics of the input drive the process. Infants have available all the
information stored in both their Word Forms and their object knowledge and must
forge appropriate links. At the same time, they must pull out and attend to just that
information that is criterial to the words and to their referents. In a naturalistic
word-learning situation, the child will hear the word over and over again in differ-
ent contexts spoken by different people and will see different instances of the ob-
ject category, again in different contexts. This serves to bring into relief, to make
salient, the criterial information and facilitates an error-free mapping. In a labora-
tory setting, such as exists in the Switch task, the familiarization conditions do not
highlight the critical differences. Thus, if the words to be learned overlap on a
number of different features, it may be more difficult to pull those words apart in
the mapping process. If the words are nonoverlapping, it may be easier to associate
the two words to two different objects. The nonsense words “bih” and “dih” as pre-
sented in the experiment overlap on all phonetic and indexical (e.g., speaker, into-
nation) dimensions except for the place of articulation of the initial consonant. To
map those two words onto the two different objects presented, the infant would
have to home in on just that phonetic detail that distinguishes the words.
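The degree of overlap between two word forms can be illustrated by listing the dimensions on which they differ. In the Python sketch below, the choice of dimensions and the feature codings for “bih,” “dih,” “lif,” and “neem” are simplified assumptions made for illustration; they are not the stimulus descriptions used in the experiments.

```python
def differing_dimensions(a, b):
    """Return the sorted names of shared dimensions on which two forms differ."""
    return sorted(k for k in a.keys() & b.keys() if a[k] != b[k])


# Hypothetical codings; speaker and intonation stand in for indexical detail.
bih = {
    "onset_place": "labial",
    "onset_manner": "stop",
    "vowel": "ɪ",
    "coda": "none",
    "speaker": "A",
    "intonation": "rise-fall",
}
dih = {**bih, "onset_place": "coronal"}
lif = {**bih, "onset_place": "coronal", "onset_manner": "lateral", "coda": "f"}
neem = {**lif, "onset_manner": "nasal", "vowel": "i", "coda": "m"}

similar_pair = differing_dimensions(bih, dih)
dissimilar_pair = differing_dimensions(lif, neem)
print(similar_pair, dissimilar_pair)
```

Under these codings the similar pair is separated by a single dimension, so success depends on attending to exactly that dimension, whereas the dissimilar pair can be told apart on several dimensions at once.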
Initially, the infant may not attend to the defining features in either the word or
the referent and, in so doing, err in the mapping. The difficulty in forming and
holding the linkage is illustrated in Figure 3 (left), and the difficulty in establishing
a correct mapping using the criterial features is illustrated in Figure 3 (right).
According to PRIMIR, all the relevant information is available in the word forms
that children attempt to link to concepts. What is difficult is directing attention to
just that information that is criterial for distinguishing the words.
In PRIMIR, these word–object associations are initially stored as exemplars
and allow little generalizability. As in a standard exemplar model, they cluster in
multidimensional space. Words that share phonetic and indexical features will
eventually form overlapping clusters, and concepts that share features will also
form overlapping clusters. Already established clusterings will affect the spacing
of new items. This allows neighbourhoods to form along language-specific pho-
netic dimensions. Neighbourhoods also emerge among concepts and meanings
based on similarities of semantic properties.
Changing neighbourhood density is another important property of the emerging
planes. In recent work it has been shown that at 17 months, prior exposure to a
dense neighbourhood of words (identical in rhyme and coda, differing in only the
initial consonant) inhibits learning a new word–object linkage in comparison to
prior exposure to a sparse neighbourhood (Hollich et al., 2002). A prediction from
PRIMIR is that after phonemes have emerged, dense neighbourhoods will facili-
tate rather than inhibit learning new word-object linkages.
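The contrast between a dense and a sparse neighbourhood can be made concrete with a minimal sketch. It assumes, purely for illustration, that each stored Word Form is coded as an (onset, rest) pair and that two forms are neighbours when they match in everything except the initial consonant, as in the manipulation described above. The function name, the toy lexicons, and the string coding are hypothetical and are not part of PRIMIR, in which exemplars carry far richer phonetic and indexical detail.

```python
def onset_neighbours(target, lexicon):
    """Return stored forms that differ from `target` only in the initial consonant."""
    onset, rest = target
    return [word for word in lexicon if word[1] == rest and word[0] != onset]


# Hypothetical prior exposure sets: (onset, rest-of-word) pairs.
dense = [("b", "ih"), ("d", "ih"), ("g", "ih"), ("p", "ih")]
sparse = [("b", "ih"), ("m", "oo"), ("k", "ah"), ("s", "ee")]

new_word = ("t", "ih")
dense_count = len(onset_neighbours(new_word, dense))
sparse_count = len(onset_neighbours(new_word, sparse))
print(dense_count, sparse_count)
```

The sketch only counts neighbours. Whether a high count inhibits or facilitates learning a new linkage is the empirical question addressed by the prediction above.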
Phonemic plane.
As the vocabulary expands and more words with over-
lapping features are added, higher order regularities emerge from the multidi-
mensional clusters. These higher order regularities gradually coalesce into a sys-
tem of contrastive phonemes (see Remez, 2000). What do we mean by
phonemes? Traditionally, phonemes are viewed as abstract units that are used to
contrast meaning (Trubetzkoy, 1939/1969). In PRIMIR, phonemes summarize
across the context-sensitive variations that are represented as language-specific
phonetic categories at the General Perceptual plane and are included in the
exemplars at the Word Form plane. Phonemic categories coalesce once a criterial
number of Word Forms and meaning linkages is present (Beckman & Edwards,
2000). Once they have coalesced, “phonemes” can begin to direct information
pick-up even during word learning. As such they serve to focus the infant’s
attention on just that information in the Word Form that is essential in learning a
new word–object link. This accounts for why the infant with a larger vocabulary
is successful in learning phonetically similar words and an infant at the early
stages of word learning is not (Werker, Fennell, Corcoran, & Stager, 2002).
FIGURE 3 The figure on the left illustrates difficulty mapping two distinct Word Forms onto
a similar concept. The figure on the right illustrates difficulty mapping two similar Word Forms
onto distinct concepts.
Phonemes become more and more firmly established as the lexicon grows, to
the point that they become resistant to change. At the beginning, the phonemes as
viewed in PRIMIR may not correspond completely to the adult inventory. With uti-
lization of phonemes not just for word learning and word recognition but also for
rhyming, alliteration, and reading, they come to more closely mirror the adult sys-
tem and become increasingly solidified. In this way, the phonemes that emerge in
PRIMIR eventually come to resemble those abstract categories envisioned in the
traditional view.
PROCESSING RICH INFORMATION
PRIMIR is a conceptual framework comprised of multidimensional interactive
representational planes (General Perceptual, Word Form, and Phoneme) and dy-
namic filters (initial biases, task demands, and developmental level). Initial biases
get things started and without them it is difficult to imagine how speech perception
could become organized and ultimately link to language acquisition. It is likely
that their contribution gradually becomes less important across development. Task
demands and developmental level play increasingly important roles as will be
elaborated next.
At its most basic level, PRIMIR asserts that in tasks that are most like straight-
forward auditory processing, any discriminable differences at the General Percep-
tual plane will be detected. These include both phonetic and indexical information.
In perceptual tasks that require use of categorical information infants use the lan-
guage and culturally specific categories developed during the first year of life. The
discrimination and categorization of language-specific phonetic categories is ro-
bustly evident in older infants. The use of culturally specific indexical information
is seen in tasks requiring matching of culturally specific display attributes, such as
gender information in the face and voice (Patterson & Werker, 2002). In adults, the
use of language-specific phonetic information is evident in phonetic identification
tasks and in discrimination tasks with long ISIs (Pisoni & Tash, 1974; Werker &
Logan, 1985) and in tasks that require matching the phonetic information in the
face and voice (Werker, Frost, & McGurk, 1992). In discrimination with shorter
ISIs the use of both language-general phonetic information (i.e., non-native con-
trasts) and subphonetic acoustic differences is evident (Pisoni & Tash, 1974;
Werker & Logan, 1985).
This language- and cultural-specific categorical information facilitates the seg-
mentation seen at the Word Form plane (Mattys & Jusczyk, 2001). Indeed, all the
information available at the General Perceptual plane is encoded in the Word
Forms. This is what accounts for faster recognition or memory of words in the
same voice or gender as originally experienced both in infants (Houston &
Jusczyk, 2000) and adults (Nygaard, Sommers, & Pisoni, 1995) and for the gradi-
ent effects seen in online word recognition tasks wherein the speed and accuracy of
recognition is influenced by the precise VOT of the test items for both adults
(McMurray et al., 2002) and infants (McMurray & Aslin, 2005).
In tasks that require postprocessing decisions about the identity of a known, meaningful word,
PRIMIR predicts that the Phoneme plane will have priority and that attention will
therefore be directed more to those features that contribute to the phonemic system
of the language than to the phonetic and indexical detail available at the Word
Form plane.
However, how does PRIMIR account for the fact that children of 14 months can
distinguish well-known minimally different words in both the Switch (Fennell &
Werker, 2004) and the two-choice Visual Fixation (Swingley & Aslin, 2002) tasks
but fail when learning novel words? In PRIMIR a developmental filter is proposed to
interact with the task filter to direct attention. At the beginning, the infant has only a
few words. As the vocabulary expands and more words with overlapping features are
added, phonemes emerge. Thus infants who are more advanced in lexical develop-
ment have an emerging Phonemic plane to guide information pick-up. Indeed, in our
work there is evidence that even at 14 months, those infants with larger vocabularies
are better able to use the phonetic detail when learning novel words in the Switch
word-learning task (Werker et al., 2002; see also Beckman & Edwards, 2000, for a
model predicting a relation between vocabulary size and phonetic detail).
In sum, initial biases, task demands, and developmental level act as dynamic fil-
ters, directing attention to only some of the available information. In Figure 4, the
ball represents the three representational interactive planes. The square represents
the summary contributions of the three dynamic filters to focusing attention. In the
figure on the left, the filters work together to direct attention to a small amount of
information in one representational plane. The figure in the middle illustrates how
different filter configurations can direct attention to multiple planes. Notice as
well, that the amount of information actively processed can be increased or de-
creased. This will depend on the biases, the task, and the developmental level. The
figure on the right demonstrates how attention can be directed to multiple types of
information across multiple planes.
One of the well-known benefits of any higher order category is the increase in
processing efficiency it affords. In PRIMIR this gain in efficiency is seen from the
very first emergence of phonemes. It allows for attentional focus on just those
properties that are necessary to distinguish one possible word from another (as
seen in the wider variety of tasks in which infants can utilize phonetic detail in
word-learning situations). This suggests that the emergence of the Phoneme plane
may be one of the key elements that allow for an explosion in word learning. We
propose that this is the case and include this as one of the predictions of PRIMIR.
Once infants have a Phonemic plane established, it will guide them in informa-
tion pick-up and act as a beacon in directing attention to just that information that is
criterial in distinguishing phonetically similar words such as “bih” and “dih.” Preliminary
evidence in support of this prediction is presented at the end of this section.
Without a set of phonemes, however, the child is faced with the task of wading
through all the overlap and ascertaining where the difference lies. We argue that
this is the situation faced by the novice word learner. They do not yet have available
a repertoire of phonemes and must thus figure out, item by item, what is irrelevant
and what is criterial and must do so while engaged in the difficult task of linking
words to concepts. In the case of words such as “duck” and “ball,” there is virtually
no overlap in the phonetic information, creating a situation that makes it
easier to link the two items to two different concepts (see Figure 3). If infants have
exposure under conditions where the noncriterial information is not overlapping,
their attention might be more clearly focused on what is criterial. That infants of 14
months failed not only on “bih” and “dih,” but also on words that differ in only
voicing (“pin” vs. “bin”) or even voicing plus place (“pin” vs. “din”; Pater et al.,
2004) may be because the overlap is still nearly complete (see Figure 3). In addi-
tion, these novel words differ in features that are physically quite similar. Recall
one of the assumptions of PRIMIR is that raw physical saliency also plays a role. A
prediction from PRIMIR then is that if the nonsense words differ in some feature
that is more salient than place or voicing of the initial consonant, the infant could
more easily pull the words apart. We are currently testing this prediction in two
studies: one involving words that differ in stress but not in phonetic features
(Curtin & Werker, 2005), and one involving words that differ in the vowel (Curtin,
Fennell, & Werker, 2005). As phonemes become more robust, raw physical saliency
may play a lesser role at least for a period of time in development.
FIGURE 4 Summary of interaction of the three dynamic filters in directing attention.
PRIMIR was developed to provide a unified account of infant speech percep-
tion and word learning and to better demonstrate their relatedness. We argued that
any comprehensive account of infant speech perception and word learning must
address at least two fundamental issues. The first is that speech perception is both
categorical and gradient. The second is that performance in speech perception and
word learning tasks is influenced by two time lines: ontogenetic development and
online processing. PRIMIR addresses these two issues by providing a cohesive
conceptual framework. There is rich information in the speech signal. In PRIMIR,
this information is organized along multidimensional planes. Attention to, and utilization
of, this information varies as a function of the interaction of initial biases,
developmental level, and task demands. We argued that the infant begins life with
initial biases that make some aspects of the rich information more perceptually sa-
lient than others and guide information pick-up. Importantly, even for the newborn
infant, perceptual salience varies as a function of task. Perceptual salience oscillates
across ontogenetic development, and its role is magnified as the infant faces
new types of linguistic tasks. In this way, initial biases, developmental level, and
task demands act as dynamic filters to guide information pick-up.
There are statistical regularities in the input. The infant has available general
learning mechanisms that allow detection of these statistical regularities. The dy-
namic filters enhance, attenuate, or transform the available information. Together,
the dynamic filters that direct information uptake and the sensitivity to regularities
in the input propel the emergence of clusterings that function like categories. As
these similarity clusters coalesce, they further drive information uptake in an itera-
tive manner. In this way processing and representation are intimately tied in
PRIMIR. Processing influences representations while representations simulta-
neously drive processing.
ALTERNATIVE MODELS CONSIDERED
Now that we have presented PRIMIR, we will consider alternative accounts of in-
fant speech perception development and word learning and evaluate PRIMIR in
comparison to these alternatives. Four primary infant speech perception models
will be included. Although this is not an exhaustive overview of all existing models
(see, e.g., Boersma, Escudero, & Hayes, 2003; Peperkamp & Dupoux, 2002), it
serves to illustrate and evaluate PRIMIR.
Both Best (1994) and Kuhl (1993) have proposed models to account for age-re-
lated changes in infant speech perception that, like PRIMIR, are based on similar-
ity-detecting mechanisms (see also Burnham, Tyler, & Horlyck, 2002). Best’s Per-
ceptual Assimilation Model (PAM) provides a “direct realist” (Gibsonian) account
of native and non-native speech perception. According to this account, speech is
perceived categorically in young infants because they recover from the acoustic
signal information about the distal object, in this case, the vocal tract producing the
sounds. Using gestural phonology as the best descriptor of the kind of information
that can be recovered in speech perception, PAM predicts non-native speech seg-
ments will be perceived according to how they might be assimilated to native cate-
gories (see Best & McRoberts, 2003, for an extension of PAM). Non-native sounds
that are assimilated to two different native language phonological categories re-
main discriminable, whereas those assimilated to a single intermediate category do
not. Other assimilation patterns, such as “category goodness,” show intermediate
levels of discriminability.
Kuhl’s (1993) Native Language Magnet (NLM) model uses acoustic cues rather
than gestural phonology as the source of information available to the listener in
speech perception studies. The primary tenets of the theory are that categories are
defined by clusterings in multidimensional space and that repeated experience lis-
tening to speech helps create a category structure defined by good and poor exem-
plars. This emerging category structure leads first to asymmetries in discrimina-
tion and later to a collapsing of non-native categories into native ones. The essence
of NLM is that an asymmetry in perception is predicted, with poorer performance
if the central instance is used as the standard in discrimination studies than if a pe-
ripheral instance is used as the standard.
Both of these models focus on speech perception. They are not designed to link
speech perception to word learning. Moreover, neither explicitly addresses how
indexical information is integrated into speech perception (although Kuhl, 1993,
does compare the use of phonetic to nonphonetic information). Finally, although
both models are based on similarity-matching metrics, neither provides an explicit
account of what the underlying psychological or learning process might be that allows
the child to evaluate similarity. PRIMIR proposes that the learning mechanism
is statistical learning. As noted earlier, PRIMIR points to the distributional learning
demonstrated by Maye et al. (2002) to account for the assimilation patterns
described in both PAM and NLM. Moreover, PRIMIR allows for differences
across contrasts in the impact of such distributional learning as a function of the
frequency of occurrence of the precise features in the input (see Anderson et al.,
2003). Both PAM and NLM take an explicit stance on the form of information that
contributes to the phonetic percept: It is gestural in PAM and acoustic in NLM.
PRIMIR is not explicit on this count because it is designed to allow all forms of
available information to be used.
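The logic of distributional learning can be illustrated with a small sketch. It assumes, for illustration only, that tokens fall at discrete steps along a single phonetic continuum and that the learner's task reduces to deciding whether the frequency distribution over those steps has one peak or two. The function name and the frequency counts are hypothetical, and counting strict local maxima is a deliberately simple stand-in for whatever statistical mechanism infants actually use.

```python
def count_modes(freqs):
    """Count strict local maxima in token frequencies along a continuum."""
    modes = 0
    for i, f in enumerate(freqs):
        left = freqs[i - 1] if i > 0 else -1
        right = freqs[i + 1] if i < len(freqs) - 1 else -1
        if f > left and f > right:
            modes += 1
    return modes


# Hypothetical token counts at eight steps along one phonetic continuum.
bimodal = [1, 4, 2, 1, 1, 2, 4, 1]   # tokens cluster near both ends
unimodal = [1, 2, 3, 5, 4, 2, 1, 1]  # tokens cluster near the middle

bimodal_modes = count_modes(bimodal)
unimodal_modes = count_modes(unimodal)
print(bimodal_modes, unimodal_modes)
```

On this toy reading, a two-peaked input supports two categories along the continuum and a one-peaked input supports a single category. The frequency-sensitivity point made above corresponds to how many tokens contribute to each peak.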
Jusczyk (1992, 1997a) proposed Word Recognition and Phonetic Structure Ac-
quisition (WRAPSA) to provide a unified account of how native language experi-
ence affects both infant speech perception and word recognition. Initially, as the
signal enters the auditory system, a set of “auditory analyzers” provides a descrip-
tion of the signal. This is a general process that describes all auditory input, not just
speech. PRIMIR is not restricted to auditory information, but also accepts other
types of sensory and motor input. In PRIMIR, it is assumed that some information
may be privileged due to epigenetically given perceptual biases. At this level,
WRAPSA treats all auditory input as the same, whereas in PRIMIR we argue there
is value in retaining a distinction between phonetic and indexical sources of infor-
mation while recognizing that they are often inseparable.
In WRAPSA, the output of the auditory analyzer is weighted later in the acquisi-
tion process to give prominence to the features that are most crucial in forming a
meaning contrast between words. The properties of the sounds themselves in con-
junction with their distribution are critical in determining the weighting scheme. Ac-
cording to WRAPSA, once the signal has been weighted, the output of this process is
then subject to pattern extraction that refines the description of the processed signal
and attempts to segment the signal into word-size units. The representation that is
formed during this process is global in that it provides a description that temporally
groups together prominent features into syllabic units. These units, however, are not
further broken down into phonetic segments. Indeed, in WRAPSA it is assumed that
across development infant listeners first have access to prosodic information, then
syllabic, and only later phonetic. PRIMIR assumes that the same general statistical
learning mechanisms are operating over the different levels of analysis simulta-
neously. Thus, prosodic analysis, segmentation of the speech stream, extraction of
syllables, formation of phonetic categories, and storage of Word Forms happen simultaneously,
with each plane further influencing the category formation and the information
pick-up at every other plane. In PRIMIR, the convergence of the child’s
developmental level and the requirements of the task determine which information
in which planes will be given priority.
WRAPSA accounts for word recognition through the use of representations as
probes to the mental lexicon. What this means is that representations are matched
against existing representations of known words already stored in memory
(traces). If a close match is obtained between the probe and the existing set of
stored traces, then a word is recognized and its meaning (if represented) is ac-
cessed. If no match for the probe can be found, either the probe will be reprocessed
or it will be stored as a new entry with or without meaning. PRIMIR stores exem-
plar-based Word Forms in a similar fashion to WRAPSA. The accounts diverge in
their view on the formation of phonemic categories. WRAPSA assumes that pho-
nemes emerge from the weighting scheme prior to the extraction of potential Word
Forms. PRIMIR proposes that phonemic categories form only after distinguishing
properties are detected across a number of Word Form–concept linkages.
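The probe-matching account of word recognition described above can be sketched as a nearest-trace lookup. This is a minimal illustration under stated assumptions, not WRAPSA's actual procedure: representations are assumed to be short numeric vectors, closeness is assumed to be Euclidean distance, and the threshold, the labels, and the function name are hypothetical. Only the "store as a new entry" branch is shown; the reprocessing option is omitted.

```python
import math


def recognise(probe, traces, threshold):
    """Match a probe against stored traces; store it as a new entry if none is close."""
    best_label, best_dist = None, math.inf
    for label, vector in traces.items():
        d = math.dist(probe, vector)  # Euclidean distance (an assumption)
        if d < best_dist:
            best_label, best_dist = label, d
    if best_dist <= threshold:
        return best_label  # close match: the word is recognised
    new_label = f"new_{len(traces)}"  # hypothetical label for an unmatched probe
    traces[new_label] = tuple(probe)
    return new_label


# Hypothetical stored traces in a two-dimensional feature space.
traces = {"ball": (0.9, 0.1), "duck": (0.1, 0.8)}
hit = recognise((0.85, 0.15), traces, threshold=0.2)
miss = recognise((0.5, 0.5), traces, threshold=0.2)
print(hit, miss, len(traces))
```

In this sketch every dimension of the stored trace contributes equally to the match. The two accounts differ in what happens beyond this step, namely when and how phonemic categories come to weight the comparison.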
Morgan’s (2002) Dimensionally Reduced Item-Based Lexical Recognition
(DRIBBLER) model extends WRAPSA and provides a computational implemen-
tation. It differs from WRAPSA in only a few ways but provides a more formalized
computational model than was ever proposed for WRAPSA. One way that
DRIBBLER and WRAPSA differ is that WRAPSA assumes that phonemes
emerge from the weighting schemes applied to the auditory analyzers prior to the
advent of word segmentation. DRIBBLER, like PRIMIR, argues that segmenta-
tion can be seen at an earlier stage in development. DRIBBLER focuses on de-
scribing how the rich speech input is processed without specifying whether there
are representations other than the word form exemplars. It argues that all the kinds
of information described in PRIMIR are stored in each word form exemplar and
that utilization of different information results from probing the relevant dimen-
sions in the exemplar. One processing attribute of DRIBBLER that has been most
extensively described is the covariance detector. This is highly sensitive to fre-
quency. DRIBBLER argues that over frequent input, the dimensionality encoded
in the representation is reduced from phonetic to phonemic by extracting
covariations and dropping irrelevant information. PRIMIR differs from
DRIBBLER in that irrelevant information is not dropped; it is simply not accessed
or used for that particular task. PRIMIR and DRIBBLER both assume some kind
of similarity clustering but differ in that PRIMIR allows similarity clusterings to be
conceptualized as different planes, whereas DRIBBLER seems not to. Both none-
theless argue that something like phonemes is required to account for efficiency in
processing in speech comprehension. Only PRIMIR takes the strong stance that
abstract phonemes exist.
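The difference between dropping irrelevant information and simply not accessing it can be shown in a few lines. The sketch assumes, for illustration, that an exemplar is a mapping from named dimensions to values. The dimension names and both function names are hypothetical, and the reduction function is not DRIBBLER's covariance detector, only a stand-in for its end result.

```python
def reduce_dimensions(exemplar, relevant):
    """Reduction: return a copy that keeps only the relevant dimensions."""
    return {k: v for k, v in exemplar.items() if k in relevant}


def access(exemplar, task_dims):
    """Selective access: read only task-relevant dimensions; the store is unchanged."""
    return {k: exemplar[k] for k in task_dims if k in exemplar}


# Hypothetical exemplar with phonetic and indexical dimensions.
stored = {"place": "labial", "voicing": "voiced",
          "speaker": "female", "affect": "happy"}

reduced = reduce_dimensions(stored, {"place", "voicing"})
word_task = access(stored, ["place", "voicing"])   # word-identity task
voice_task = access(stored, ["speaker"])           # talker-matching task
lost = access(reduced, ["speaker"])                # nothing left to read
print(word_task == reduced, voice_task, lost)
```

Both routes give the same result for the word-identity task. They diverge only when a later task calls for the indexical detail, which remains readable in the intact exemplar but is gone from the reduced one.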
TESTING THE PREDICTIONS OF PRIMIR
There are a number of predictions that follow from PRIMIR. The strongest predic-
tion is that if information has been shown to be available at one point in develop-
ment, then it should be accessible at every subsequent point in development under
at least one task condition. It may be difficult to find a task that reveals an earlier
sensitivity given the dynamic changes in the developmental level filter, but there
should be some combination that will allow access to this information. For exam-
ple, information at the General Perceptual plane that has been accessed at one
point in development should be accessible again given the right task.
A second prediction is that infants at the same developmental level should be
able to access different types of information depending on the task. Given the
multidimensional interactive structure of PRIMIR, this should be found across at
least three different comparisons: across planes (e.g., phonemic and phonetic), be-
tween clusters within planes (e.g., indexical and phonetic), and within clusters
(e.g., categorical and graded).
A third prediction is that infants at different stages of development should access
different types of information when tested in the same task. This pattern was seen
with both word learning and word recognition studies. Recall that infants of 17
months can access detail when learning words in the Switch task that infants of 14
months do not use (Werker et al., 2002). Recall as well that in word recognition tasks,
infants of 7.5 months attend to indexical information, such as affect, that infants of 10
months ignore (Singh et al., 2002). Moreover, at 10 to 12 months infants can attend to
non-native phonetic differences if they are especially salient but still cannot use
these nonphonemic differences in a word learning task at 18 to 20 months (Dietrich
et al., 2004). Although supported now by a number of studies, this prediction is at the
heart of PRIMIR and thus needs to be more broadly explored.
There are corollaries to these predictions. It is not necessarily the case that every
change in a task will lead to a different pattern of results for same-aged infants. In-
deed, many times evidence for the use of precisely the same information will be
seen across many tasks. PRIMIR would predict this pattern for planes that are
more readily available for a child at a certain age or developmental level. For ex-
ample, once the Phoneme plane is firmly established, phonemes should be readily
utilized across a variety of tasks. We have argued that this is why the older child is
more successful at accessing phonetic detail when learning novel words.
Another corollary is that the developmental level of the infant may be more im-
portant than exact chronological age. A third is that these same predictions should
hold for both children and adults. One domain that has yet to be explored within
the context of PRIMIR is proficiency in second language acquisition. We hope to
consider this more fully in the future, but for now, PRIMIR does not take a stand on
whether level of proficiency in second language acquisition will, like developmen-
tal level, act as a dynamic filter for directing attention.
In addition to the predictions, there are also key assumptions built into
PRIMIR that bear testing. The assumption that the infant has, at birth, perceptual
biases predicts that certain properties of the speech stream will be privileged.
This will be evident in both neonatal preferences and in categorical organization.
Such preferences and categorical percepts are already abundantly substantiated
in the infant speech perception literature, whereas others that also fall directly
from the architecture of PRIMIR have yet to be tested. For example, we predict
that infants will be able to learn the correlation between articulatorily possible
phonetic features more easily than between features that cannot co-occur
articulatorily. The assumption that there are constraints on learning contributes
to this predicted data pattern.
We have argued that a criterial number of meaningful words (Word
Form–concept linkages) is required before the Phonemic plane emerges. However,
the number of lexical items required is not yet known. Moreover, it remains to be
established whether phonemes emerge in an across-the-board fashion or in a stag-
gered fashion with denser lexical neighbourhoods yielding earlier appearing pho-
nemes. PRIMIR would predict the latter.
Throughout we have referred to word learning studies that have involved only
associative word–object learning. There is now evidence that infants 18 months
and older infer intentional understanding when taught words in an associative
fashion (Preissler & Carey, 2004), but no such evidence exists for younger infants.
We believe there is a distinction between the associative word knowledge of a
14-month-old and the referential word knowledge of an older infant. PRIMIR does
not explicitly make this distinction (see also Nazzi & Bertoncini, 2003). PRIMIR
does, however, predict that raw salience plays a more important role in guiding dis-
crimination in younger infants and perceptual salience (defined along a number of
dimensions that have become organized in the representational planes) plays a
larger role across development. This opens the possibility that raw salience plays a
more important role in word discrimination for infants who are only capable of as-
sociative knowledge than for infants who have referential understanding. In future
work we hope to explore a possible link between the movement to referential word
learning and the emergence of the Phonemic plane.
An active part of our current research programs involves refining and testing
these predictions and evaluating the assumptions. It is hoped that this will allow for
the further elaboration of PRIMIR and lead to its evolution from a conceptual
framework to a formalized model.
We began this enterprise with the goal of providing a unified framework de-
signed to account for the seemingly contradictory findings in infant speech percep-
tion and word learning. PRIMIR is such a framework. We addressed in our frame-
work two fundamental issues in infant speech perception: that speech perception is
both categorical and gradient and that performance is influenced by both
ontogenetic development and online processing. PRIMIR’s multidimensional
planes allow categorical and gradient representations, and PRIMIR allows for con-
tinually changing and flexible performance as a function of age and task. It does so
by recognizing that processing and representations are inextricably entwined. As
we have demonstrated, it is only this type of conceptual framework that can ac-
count for the complexities of infant language development.
ACKNOWLEDGMENTS
This framework was developed with assistance from a Discovery Grant to Janet F.
Werker from Canada’s Natural Science & Engineering Research Council, with
funding from the Human Frontiers Science Program, and with a postdoctoral
fellowship to Suzanne Curtin from Canada’s Social Sciences and Humanities Re-
search Council. We also gratefully acknowledge the support of the Canada Re-
search Chair Program. We thank James Morgan, Chris Fennell, Laurel Fais, Dan
Hufnagle, Aniruddh Patel, George Hollich, and Athena Vouloumanos for detailed
comments on this and earlier versions of the article. We thank LouAnn Gerken, Su-
san Goldin-Meadow, and two anonymous reviewers for their helpful suggestions
and Marie Jetté and Vashti Garcia for their help editing the manuscript. To Anne
Cutler, we extend thanks for pointing out that the name should say it all, and to
Robert Remez and David Pisoni for inspiration. Finally, we want to acknowledge
how much this manuscript benefited from the enormous contributions Peter
Jusczyk made to the field.
REFERENCES
Anderson, J. L., Morgan, J. L., & White, K. S. (2003). A statistical basis for speech sound discrimina-
tion. Language and Speech, 46(2–3), 155–182.
Aslin, R. N., Pisoni, D. B., Hennessy, B. L., & Perey, A. J. (1981). Discrimination of voice onset time by
human infants: New findings and implications for the effects of early experience. Child Develop-
ment, 52(4), 1135–1145.
Aslin, R. N., Saffran, J. R., & Newport, E. L. (1998). Computation of conditional probability statistics
by 8-month-old infants. Psychological Science, 9(4), 321–324.
Aslin, R. N., Woodward, J. Z., LaMendola, N. P., & Bever, T. G. (1996). Models of word segmentation
in fluent maternal speech to infants. In J. L. Morgan & K. Demuth (Eds.), Signal to syntax: Boot-
strapping from speech to grammar in early acquisition (pp. 117–134). Mahwah, NJ: Lawrence
Erlbaum Associates, Inc.
Bahrick, L. E., Netto, D., & Hernandez Reif, M. (1998). Intermodal perception of adult and child faces
and voices by infants. Child Development, 69(5), 1263–1275.
Bailey, T. M., & Plunkett, K. (2002). Phonological specificity in early words. Cognitive Development,
17(2), 1267–1284.
Beckman, M. E. (1986). Stress and non-stress accent. In Netherlands Phonetic Archives No. 7.
Dordrecht, The Netherlands: Foris.
Beckman, M. E., & Edwards, J. (2000). The ontogeny of phonological categories and the primacy of
lexical learning in linguistic development. Child Development, 71(1), 240–249.
Bertoncini, J., Bijeljac-Babic, R., Blumstein, S., & Mehler, J. (1987). Discrimination of very short CV
syllables by neonates. Journal of the Acoustical Society of America, 82, 31–37.
Bertoncini, J., & Mehler, J. (1981). Syllables as units in infant speech perception. Infant Behavior and
Development, 4(3), 247–260.
Best, C. T., Calderon, J., Jones, C., Avery, R., Correa, L., & McRoberts, G. (1997, April). Preferences of
4-6 and 8-10 month olds for infant-directed vs. adult-directed utterances in two non-native languages.
Poster session presented at the Society of Research in Child Development, Washington, DC.
Best, C. T., & McRoberts, G. W. (2003). Infant perception of non-native consonant contrasts that adults
assimilate in different ways. Language and Speech, 46(2–3), 183–216.
Best, C. T. (1994). The emergence of native-language phonological influences in infants: A percep-
tual assimilation model. In J. C. Goodman & H. C. Nusbaum (Eds.), The development of speech
perception: The transition from speech sounds to spoken words (pp. 167–224). Cambridge, MA:
MIT Press.
Bijeljac-Babic, R., Bertoncini, J., & Mehler, J. (1993). How do 4-day-old infants categorize
multisyllabic utterances? Developmental Psychology, 29(4), 711–721.
Bloom, P. (1999). Theories of word learning: Rationalist alternatives to associationism. In W. C.
Ritchie & T. K. Bhatia (Eds.), Handbook of child language acquisition (pp. 249–278). San Diego,
CA: Academic.
Boersma, P., Escudero, P., & Hayes, R. A. (2003). Learning abstract phonological from auditory pho-
netic categories: An integrated model for the acquisition of language-specific sound categories. In
Proceedings of the 15th International Congress of Phonetic Sciences (pp. 1013–1016). Adelaide,
Australia: Causal Productions.
Bortfeld, H., Rathbun, K., Morgan, J., & Golinkoff, R. (2003). What’s in a name? Highly familiar items
anchor infants’ segmentation of fluent speech. In B. Beachley, A. Brown, & F. Conlin (Eds.), Pro-
ceedings of the 27th Annual Boston University Conference on Language Development (Vol. 1, pp.
162–172). Somerville, MA: Cascadilla.
Bosch, L., & Sebastián-Gallés, N. (1997). Native-language recognition abilities in 4-month-old infants
from monolingual and bilingual environments. Cognition, 65(1), 33–69.
Brent, M. R., & Cartwright, T. A. (1996). Distributional regularity and phonotactic constraints are use-
ful for segmentation. Cognition, 61(1–2), 93–125.
Brent, M. R., & Siskind, J. M. (2001). The role of exposure to isolated words in early vocabulary devel-
opment. Cognition, 81(2), B33–B44.
Burnham, D., Kitamura, C., & Vollmer-Conna, U. (2002). What’s new, pussycat? On talking to babies
and animals. Science, 296(5572), 1435.
Burnham, D., Tyler, M., & Horlyck, S. (2002). Periods of speech perception development and their ves-
tiges in adulthood. In P. Burmeister, T. Piske, & A. Rohde (Eds.), An integrated view of language de-
velopment: Papers in honor of Henning Wode (pp. 281–300). Trier, Germany: Wissenschaftlicher
Verlag Trier.
Cairns, P., Shillcock, R., Chater, N., & Levy, J. (1997). Bootstrapping word boundaries: A bottom-up
corpus-based approach to speech segmentation. Cognitive Psychology, 33(2), 111–153.
Chambers, K. E., Onishi, K. H., & Fisher, C. (2003). Infants learn phonotactic regularities from brief
auditory experiences. Cognition, 87(2), B69–B77.
Christiansen, M. H., Allen, J., & Seidenberg, M. S. (1998). Learning to segment speech using multiple
cues: A connectionist model. Language and Cognitive Processes, 13(2–3), 221–268.
Christophe, A., Dupoux, E., Bertoncini, J., & Mehler, J. (1994). Do infants perceive word boundaries?
An empirical study of the bootstrapping of lexical acquisition. Journal of the Acoustical Society of
America, 95(3), 1570–1580.
Colombo, J., & Horowitz, F. D. (1986). Infants’ attentional responses to frequency modulated sweeps.
Child Development, 57(2), 287–291.
Cooper, R. P., & Aslin, R. N. (1990). Preference for infant-directed speech in the first month after birth.
Child Development, 61(5), 1584–1595.
Cooper, R. P., & Aslin, R. N. (1994). Developmental differences in infant attention to the spectral prop-
erties of infant-directed speech. Child Development, 65(6), 1663–1677.
Curtin, S., Fennell, C. T., & Werker, J. F. (2005). Can saliency explain the detection of minimal pairs
during word learning? Manuscript in preparation.
Curtin, S., Mintz, T. H., & Byrd, D. (2001). Coarticulatory cues enhance infants’ recognition of syllable
sequences in speech. In A. H. J. Do, L. Domínguez, & A. Johansen (Eds.), Proceedings of the 25th
Annual Boston University Conference on Language Development (pp. 190–201). Somerville, MA:
Cascadilla.
Curtin, S., Mintz, T. H., & Christiansen, M. H. (in press). Stress changes the representational landscape:
Evidence from word segmentation. Cognition.
Curtin, S., & Werker, J. F. (2005). The developmental progression of criterial information in early word
learning. Manuscript in preparation.
Cutler, A., & Carter, D. M. (1987). The predominance of strong initial syllables in the English vocabulary.
Computer Speech and Language, 2(3–4), 133–142.
Cutler, A., & Norris, D. (1988). The role of strong syllables in segmentation for lexical access. Journal
of Experimental Psychology: Human Perception and Performance, 14(1), 113–121.
DeCasper, A. J., & Fifer, W. P. (1980). Of human bonding: Newborns prefer their mothers’ voices.
Science, 208(4448), 1174–1176.
DeCasper, A. J., & Prescott, P. (1984). Human newborns’ perception of male voices: Preference,
discrimination and reinforcing value. Developmental Psychobiology, 17, 481–491.
Dehaene-Lambertz, G., & Baillet, S. (1998). A phonological representation in the infant brain.
Neuroreport, 9(8), 1885–1888.
Dehaene-Lambertz, G., Dehaene, S., & Hertz-Pannier, L. (2002). Functional neuroimaging of speech
perception in infants. Science, 298(5600), 2013–2015.
Dehaene-Lambertz, G., & Gliga, T. (in prep). Common neural basis for phoneme processing in infants
and adults. Journal of Cognitive Neuroscience.
Dehaene-Lambertz, G., & Peña, M. (2001). Electrophysiological evidence for automatic phonetic
processing in neonates. Neuroreport, 12(14), 3155–3158.
Dietrich, C., Swingley, D., & Werker, J. F. (2004, November). Phonetic information in infant
word-learning. Presented at the Boston University Conference on Language Development, Boston.
Dupoux, E., & Peperkamp, S. (2002, May). The phonetic filter hypothesis: How phonology impacts
speech perception and vice versa. Poster session presented at the Second International Conference
on Contrast in Phonology, University of Toronto, Ontario, Canada.
Echols, C. H., Crowhurst, M. J., & Childers, J. B. (1997). The perception of rhythmic units in speech by
infants and adults. Journal of Memory and Language, 36, 202–225.
Eimas, P. D., & Miller, J. L. (1992). Organization in the perception of speech by young infants. Psycho-
logical Science, 3(6), 340–345.
Eimas, P. D., Siqueland, E. R., Jusczyk, P., & Vigorito, J. (1971). Speech perception in infants. Science,
171(968), 303–306.
Fellowes, J. M., Remez, R. E., & Rubin, P. E. (1997). Perceiving the sex and identity of a talker without
natural vocal timbre. Perception and Psychophysics, 59(6), 839–849.
Fennell, C. T. (2004). Infant attention to phonetic detail in word forms: Knowledge and familiarity ef-
fects. Unpublished doctoral dissertation, University of British Columbia, Vancouver.
Fennell, C. T., & Werker, J. F. (2003). Early word learners’ ability to access phonetic detail in
well-known words. Language and Speech, 46(2–3), 245–264.
Fennell, C. T., & Werker, J. F. (2004). Infant attention to phonetic detail: Knowledge and familiarity ef-
fects. In A. Brugos, L. Micciulla, & C. E. Smith (Eds.), Proceedings of the 28th Annual Boston Uni-
versity Conference on Language Development (Vol. 1, pp. 165–176). Somerville, MA: Cascadilla.
Fernald, A. (Ed.). (1984). The perceptual and affective salience of mothers’ speech to infants. Norwood,
NJ: Ablex.
Fitch, H., Halwes, T., Erickson, D., & Liberman, A. (1980). Perceptual equivalence of two acoustic
cues for stop-consonant manner. Perception & Psychophysics, 27, 343–350.
Floccia, C., Nazzi, T., & Bertoncini, J. (2000). Unfamiliar voice discrimination for short stimuli in new-
borns. Developmental Science, 3(3), 333–343.
Goldstein, L. M., & Fowler, C. (2003). Articulatory phonology: A phonology for public language use.
In A. S. Meyer & N. O. Schiller (Eds.), Phonetics and phonology in language comprehension and
production: Differences and similarities (pp. 159–207). Berlin, Germany: Mouton de Gruyter.
Hall, D. G. (1993). Assumptions about word meaning: Individuation and basic-level kinds. Child De-
velopment, 64, 1550–1570.
Hallé, P. A., & de Boysson-Bardies, B. (1996). The format of representation of recognized words in
infants’ early receptive lexicon. Infant Behavior and Development, 19, 463–481.
Hauser, M. D., Newport, E. L., & Aslin, R. N. (2001). Segmentation of the speech stream in a non-hu-
man primate: Statistical learning in cotton-top tamarins. Cognition, 78(3), B53–B64.
Hirsh-Pasek, K., & Golinkoff, R. M. (1996). The intermodal preferential looking paradigm: A win-
dow onto emerging language comprehension. In D. McDaniel, C. McKee, & H. S. Cairns (Eds.),
Methods for assessing children’s syntax (pp. 105–124). Cambridge, MA: MIT Press.
Hollich, G., Jusczyk, P. W., & Luce, P. (2002). Lexical neighbourhood effects in 17-month-old word
learning. In Proceedings of the 26th Annual Boston University Conference on Language Development
(pp. 314–323). Somerville, MA: Cascadilla.
Houston, D. M., & Jusczyk, P. W. (2000). The role of talker-specific information in word segmentation
by infants. Journal of Experimental Psychology: Human Perception and Performance, 26(5),
1570–1582.
Houston, D. M., & Jusczyk, P. W. (2003). Infants’ long-term memory for the sound patterns of words and
voices. Journal of Experimental Psychology: Human Perception and Performance, 29(6), 1143–1154.
Johnson, E. K., & Jusczyk, P. W. (2001). Word segmentation by 8-month-olds: When speech cues count
more than statistics. Journal of Memory and Language, 44(4), 548–567.
Jusczyk, P. W. (1981). Infant speech perception: A critical appraisal. In P. D. Eimas & J. L. Miller
(Eds.), Perspectives on the study of speech (pp. 113–164). Mahwah, NJ: Lawrence Erlbaum Associ-
ates, Inc.
Jusczyk, P. W. (1987). Implications from speech studies on the unit of perception. In M. E. H.
Schouten (Ed.), The psychophysics of speech perception (pp. 433–443). Dordrecht, The Nether-
lands: Nijhoff.
Jusczyk, P. W. (1992). Developing phonological categories from the speech signal. In C. A. Ferguson,
L. Menn, & C. Stoel-Gammon (Eds.), Phonological development: Models, research, implications
(pp. 17–64). Parkton, MD: York Press.
Jusczyk, P. W. (1997a). The discovery of spoken language. Cambridge, MA: MIT Press.
Jusczyk, P. W. (1997b). Finding and remembering words: Some beginnings by English-learning in-
fants. Current Directions in Psychological Science, 6(6), 170–174.
Jusczyk, P. W., & Aslin, R. N. (1995). Infants’ detection of the sound patterns of words in fluent speech.
Cognitive Psychology, 29, 1–23.
Jusczyk, P. W., Cutler, A., & Redanz, N. J. (1993). Infants’ preference for the predominant stress pat-
terns of English words. Child Development, 64, 675–687.
Jusczyk, P. W., Friederici, A. D., Wessels, J. M., Svenkerud, V. Y., & Jusczyk, A. M. (1993). Infants’
sensitivity to the sound patterns of native language words. Journal of Memory and Language, 32,
402–420.
Jusczyk, P. W., Goodman, M. B., & Baumann, A. (1999). Nine-month-olds’ attention to sound
similarities in syllables. Journal of Memory and Language, 40(1), 62–82.
Jusczyk, P. W., Hohne, E. A., & Bauman, A. (1999). Infants’ sensitivity to allophonic cues to word
segmentation. Perception & Psychophysics, 61, 1465–1476.
Jusczyk, P. W., Houston, D. M., & Newsome, M. (1999). The beginnings of word segmentation in Eng-
lish-learning infants. Cognitive Psychology, 39(3–4), 159–207.
Jusczyk, P. W., Luce, P. A., & Charles-Luce, J. (1994). Infants’ sensitivity to phonotactic patterns in the
native language. Journal of Memory and Language, 33(5), 630–645.
Karzon, R. G. (1985). Discrimination of polysyllabic sequences by one to four-month-old infants.
Journal of Experimental Child Psychology, 39, 326–342.
Katz, N., Baker, E., & Macnamara, J. (1974). What’s in a name? A study of how children learn common
and proper names. Child Development, 45(2), 469–473.
Kirkham, N. Z., Slemmer, J. A., & Johnson, S. P. (2002). Visual statistical learning in infancy: Evidence
for a domain general learning mechanism. Cognition, 83(2), B35–B42.
Kuhl, P. K. (1983a). Perception of auditory equivalence classes for speech in early infancy. Infant Be-
havior and Development, 6(3), 263–285.
Kuhl, P. K. (1983b). The perception of speech in early infancy: Four phenomena. In S. E. Gerber & G.
T. Mencher (Eds.), The development of auditory behavior (pp. 187–217). New York: Grune &
Stratton.
Kuhl, P. K. (1991). Human adults and human infants show a “perceptual magnet effect” for the proto-
types of speech categories, monkeys do not. Perception & Psychophysics, 50(2), 93–107.
Kuhl, P. K. (1993). Innate predispositions and the effects of experience in speech perception: The native
language magnet theory. In B. de Boysson-Bardies, S. de Schonen, P. Jusczyk, P. MacNeilage, & J.
Morton (Eds.), Developmental neurocognition: Speech and face processing in the first year of life
(pp. 259–274). Dordrecht, The Netherlands: Kluwer Academic.
Kuhl, P. K., Andruski, J. E., Chistovich, I. A., Chistovich, L. A., Kozhevnikova, E. V., Ryskina, V. L., et
al. (1997). Cross-language analysis of phonetic units in language addressed to infants. Science, 277,
684–686.
Kuhl, P. K., Williams, K. A., Lacerda, F., Stevens, K. N., & Lindblom, B. (1992). Linguistic experience
alters phonetic perception in infants by 6 months of age. Science, 255, 606–608.
Ladefoged, P. (1993). A course in phonetics (3rd ed.). New York: Harcourt Brace.
Liu, H.-M., Kuhl, P. K., & Tsao, F.-M. (2003). An association between mothers’ speech clarity and
infants’ speech discrimination skills. Developmental Science, 6(3), F1–F10.
Mandel, D. R., Jusczyk, P. W., & Pisoni, D. B. (1995). Infants’ recognition of the sound patterns of their
own names. Psychological Science, 6(5), 315–318.
Mann, V. A., & Repp, B. H. (1980). Influence of vocalic context on the [s]-[sh] distinction. Perception
and Psychophysics, 28, 213–228.
Markman, E. M. (1990). Constraints children place on word meanings. Cognitive Science, 14(1), 57–77.
Mattys, S. L., & Jusczyk, P. W. (2001). Do infants segment words or recurring contiguous patterns?
Journal of Experimental Psychology: Human Perception and Performance, 27(3), 644–655.
Mattys, S. L., Jusczyk, P. W., Luce, P. A., & Morgan, J. L. (1999). Phonotactic and prosodic effects on
word segmentation in infants. Cognitive Psychology, 38(4), 465–494.
Maye, J., & Weiss, D. (2003). Statistical cues facilitate infants’ discrimination of difficult phonetic
contrasts. In B. Beachley, A. Brown, & F. Conlin (Eds.), Proceedings of the 27th Annual Boston
University Conference on Language Development (Vol. 2, pp. 508–518). Somerville, MA:
Cascadilla.
Maye, J., Werker, J. F., & Gerken, L. (2002). Infant sensitivity to distributional information can affect
phonetic discrimination. Cognition, 82(3), B101–B111.
McMurray, B., & Aslin, R. N. (2005). Infants are sensitive to within-category variation in speech per-
ception. Cognition, 95, B15–B26.
McMurray, B., Tanenhaus, M. K., & Aslin, R. N. (2002). Gradient effects of within-category phonetic
variation on lexical access. Cognition, 86, B33–B42.
Mehler, J., Jusczyk, P., Lambertz, G., Halsted, N., Bertoncini, J., & Amiel-Tison, C. (1988). A precur-
sor of language acquisition in young infants. Cognition, 29, 143–178.
Merriman, W. E., & Bowman, L. L. (1989). The mutual exclusivity bias in children’s word learning.
Monographs of the Society for Research in Child Development, 4(220), 1–123.
Miller, C. L. (1983). Developmental changes in male/female voice classification by infants. Infant
Behavior and Development, 6, 313–330.
Miller, J. L., & Eimas, P. D. (1996). Internal structure of voicing categories in early infancy. Perception
and Psychophysics, 58(8), 1157–1167.
Morgan, J. (2002, June). Word recognition and phonetic structure acquisition: Possible relations. Paper
presented at The 143rd Meeting of the Acoustical Society of America: Special Session in Memory of
Peter Jusczyk, Pittsburgh, Pennsylvania.
Morgan, J. L., & Saffran, J. R. (1995). Emerging integration of sequential and suprasegmental
information in preverbal speech segmentation. Child Development, 66, 911–936.
Naigles, L. R. (2002). Form is easy, meaning is hard: Resolving a paradox in early child language. Cog-
nition, 86(2), 157–199.
Nazzi, T., & Bertoncini, J. (2003). Before and after the vocabulary spurt: Two modes of word acquisi-
tion? Developmental Science, 6(2), 136–142.
Nazzi, T., Bertoncini, J., & Mehler, J. (1998). Language discrimination by newborns: Toward an
understanding of the role of rhythm. Journal of Experimental Psychology: Human Perception and
Performance, 24(3), 756–766.
Nazzi, T., Floccia, C., & Bertoncini, J. (1998). Discrimination of pitch contours by neonates. Infant
Behavior and Development, 21(4), 779–784.
Nazzi, T., Jusczyk, P. W., & Johnson, E. K. (2000). Language discrimination by English-learning
5-month-olds: Effects of rhythm and familiarity. Journal of Memory and Language, 43(1), 1–19.
Nelson, K. (1974). Concept, word, and sentence: Interrelations in acquisition and development. Psy-
chological Review, 81(4), 267–285.
Nygaard, L. C., Sommers, M. S., & Pisoni, D. B. (1995). Effects of stimulus variability on perception
and representation of spoken words in memory. Perception and Psychophysics, 57(7), 989–1001.
Ohala, J. J. (1983). Cross-language use of pitch: An ethological view. Phonetica, 40, 1–18.
Palmeri, T. J., Goldinger, S. D., & Pisoni, D. B. (1993). Episodic encoding of voice attributes and
recognition memory for spoken words. Journal of Experimental Psychology: Learning, Memory, and
Cognition, 19(2), 309–328.
Panneton Cooper, R., & Ostroff, W. (2003, May). Task-specific influences on attention to infant-di-
rected speech and non-native speech during infancy. Paper presented at the University of Melbourne,
Australia.
Pater, J., Stager, C. L., & Werker, J. F. (2004). The lexical acquisition of phonological contrasts. Lan-
guage, 80(3), 361–379.
Patterson, M. L., & Werker, J. F. (2002). Infants’ ability to match dynamic phonetic and gender
information in the face and voice. Journal of Experimental Child Psychology, 81(1), 93–115.
Pegg, J. E., & Werker, J. F. (1997). Adult and infant perception of two English phones. Journal of the
Acoustical Society of America, 102(6), 3742–3753.
Peña, M., Bonatti, L. L., Nespor, M., & Mehler, J. (2002). Signal-driven computations in speech
processing. Science, 298(5593), 604–607.
Peña, M., Maki, A., Kovacic, D., Dehaene-Lambertz, G., Koizumi, H., Bouquet, F., et al. (2003).
Sounds and silence: An optical topography study of language recognition at birth. Proceedings of the
National Academy of Sciences, 100(20), 11702–11705.
Peperkamp, S., & Dupoux, E. (2002). Coping with phonological variation in early lexical acquisition. In I.
Lasser (Ed.), The process of language acquisition (pp. 359–385). Berlin, Germany: Peter Lang Verlag.
Pierrehumbert, J. (2003a). Phonetic diversity, statistical learning, and acquisition of phonology. Lan-
guage and Speech, 46(2–3), 115–154.
Pierrehumbert, J. (2003b). Probabilistic phonology: Discrimination and robustness. In R. Bod, J. Hay, &
S. Jannedy (Eds.), Probability theory in linguistics (pp. 177–228). Cambridge, MA: MIT Press.
Pisoni, D. B., & Tash, J. (1974). Reaction times to comparisons within and across phonetic categories.
Perception & Psychophysics, 15(2), 285–290.
Polka, L., & Bohn, O.-S. (2003). Asymmetries in vowel perception. Speech Communication, 41(1),
221–231.
Polka, L., Sundara, M., & Blue, S. (2002, June). The role of language experience in word segmentation:
A comparison of English, French, and bilingual infants. Paper presented at the 143rd Meeting of the
Acoustical Society of America: Special Session in Memory of Peter Jusczyk, Pittsburgh,
Pennsylvania.
Polka, L., & Werker, J. F. (1994). Developmental changes in perception of nonnative vowel contrasts.
Journal of Experimental Psychology: Human Perception and Performance, 20(2), 421–435.
Preissler, M. A., & Carey, S. (2004). Do both pictures and words function as symbols for 18- and
24-month-old children? Journal of Cognition & Development, 5(2), 185–212.
Ramus, F., Hauser, M. D., Miller, C., Morris, D., & Mehler, J. (2000). Language discrimination by hu-
man newborns and by cotton-top tamarin monkeys. Science, 288(5464), 349–351.
Ratner, N. B. (1984). Patterns of vowel modification in mother–child speech. Journal of Child Lan-
guage, 11, 557–578.
Ratner, N. B., & Luberoff, A. (1984). Cues to post-vocalic voicing in mother–child speech. Journal of
Phonetics, 12, 285–289.
Remez, R. E. (2000). Speech spoken and represented. In E. Dietrich & A. B. Markman (Eds.), Cogni-
tive dynamics: Conceptual and representational change in humans and machines (pp. 93–115).
Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Repp, B. (1982). Phonetic trading relations and context effects: New experimental evidence for a
speech mode of perception. Psychological Bulletin, 92, 81–110.
Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old infants. Sci-
ence, 274(5294), 1926–1928.
Saffran, J. R., Johnson, E. K., Aslin, R. N., & Newport, E. L. (1999). Statistical learning of tone se-
quences by human infants and adults. Cognition, 70(1), 27–52.
Shepard, R. N. (1987). Toward a universal law of generalization for psychological science. Science,
237, 1317–1323.
Shi, R., Werker, J. F., & Cutler, A. (2003). Function words in early speech perception. In Proceedings of
the 15th International Congress of Phonetic Sciences (pp. 3009–3012). Adelaide, Australia:
Causal Productions.
Singh, L., Bortfeld, H., & Morgan, J. (2002). Effects of variability on infant word recognition. In A. H.
J. Do, L. Domínguez, & A. Johansen (Eds.), Proceedings of the 26th Annual Boston University Con-
ference on Language Development (pp. 608–619). Somerville, MA: Cascadilla.
Stager, C. L., & Werker, J. F. (1997). Infants listen for more phonetic detail in speech perception than in
word-learning tasks. Nature, 388(6640), 381–382.
Swingley, D., & Aslin, R. N. (2000). Spoken word recognition and lexical representation in very young
children. Cognition, 76(2), 147–166.
Swingley, D., & Aslin, R. N. (2002). Lexical neighborhoods and the word-form representations of
14-month-olds. Psychological Science, 13(5), 480–484.
Swoboda, P. J., Kass, J., Morse, P. A., & Leavitt, L. A. (1978). Memory factors in vowel discrimination
of normal and at-risk infants. Child Development, 49(2), 332–339.
Thiessen, E. D., & Saffran, J. R. (2003). When cues collide: Use of stress and statistical cues to word
boundaries by 7- to 9-month-old infants. Developmental Psychology, 39(4), 706–716.
Tincoff, R., & Jusczyk, P. W. (1999). Some beginnings of word comprehension in 6-month-olds. Psy-
chological Science, 10(2), 172–175.
Trubetzkoy, N. S. (1969). Principles of phonology (C. Baltaxe, Trans.). Berkeley: University of California
Press. (Original work published 1939)
Vihman, M. M. (2002). Getting started without a system: From phonetics to phonology in bilingual de-
velopment. International Journal of Bilingualism, 6, 239–254.
Vouloumanos, A., & Werker, J. F. (2004). Tuned to the signal: The privileged status of speech for young
infants. Developmental Science, 7(3), 270–276.
Vouloumanos, A., & Werker, J. F. (2004, May). Listening to speech at birth. Paper presented at the 14th
Biennial International Conference on Infant Studies, Chicago.
Walker-Andrews, A. S., Bahrick, L. E., Raglioni, S. S., & Diaz, I. (1991). Infants’ bimodal perception
of gender. Ecological Psychology, 3(2), 55–75.
Werker, J. F. (1995). Exploring developmental changes in cross-language speech perception. In L. R.
Gleitman & M. Liberman (Eds.), Language: An invitation to cognitive science, Part 1 (2nd ed., pp.
87–106). Cambridge, MA: MIT Press.
Werker, J. F., Cohen, L. B., Lloyd, V. L., Casasola, M., & Stager, C. L. (1998). Acquisition of word-ob-
ject associations by 14-month-old infants. Developmental Psychology, 34(6), 1289–1309.
Werker, J. F., Fennell, C. T., Corcoran, K. M., & Stager, C. L. (2002). Infants’ ability to learn phoneti-
cally similar words: Effects of age and vocabulary size. Infancy, 3(1), 1–30.
Werker, J. F., Frost, P. E., & McGurk, H. (1992). La langue et les lèvres: Cross-language influences on
bimodal speech perception. Canadian Journal of Psychology, 46(4), 551–568.
Werker, J. F., Gilbert, J. H., Humphrey, K., & Tees, R. C. (1981). Developmental aspects of cross-lan-
guage speech perception. Child Development, 52(1), 349–355.
Werker, J. F., Ladhar, N., & Corcoran, K. M. (2005). Language-specific phonetic categories direct word
learning. Manuscript in preparation.
Werker, J. F., & Lalonde, C. E. (1988). Cross-language speech perception: Initial capabilities and devel-
opmental change. Developmental Psychology, 24(5), 672–683.
Werker, J. F., & Logan, J. S. (1985). Cross-language evidence for three factors in speech perception.
Perception and Psychophysics, 37(1), 35–44.
Werker, J. F., & McLeod, P. J. (1989). Infant preference for both male and female infant-directed talk: A
developmental study of attentional and affective responsiveness. Canadian Journal of Psychology,
43(2), 230–246.
Werker, J. F., & Tees, R. C. (1984). Cross-language speech perception: Evidence for perceptual reorga-
nization during the first year of life. Infant Behavior and Development, 7(1), 49–63.
Werker, J. F., & Tees, R. C. (1999). Influences on infant speech processing: Toward a new synthesis.
Annual Review of Psychology, 50, 509–535.
Whalen, D. H., Abramson, A. S., Lisker, L., & Mody, M. (1993). F0 gives voicing information even with
unambiguous voice onset times. Journal of the Acoustical Society of America, 93(4), 2152–2159.
Whalen, D. H., & Levitt, A. G. (1995). The universality of intrinsic F0 of vowels. Journal of Phonetics,
23, 349–366.
... Die Idee des statistischen Lernens als Mechanismus, der das perceptual attunement vorantreibt, findet sich daher in vielen Theorien der frühen Sprachwahrnehmung (z. B. Werker & Curtin, 2005;Kuhl et al., 2008). ...
... Eine daran anschließende Frage ist, inwieweit sich lautsprachliche Repräsentationen grundlegend durch den Erwerb eines Lexikons verändern (siehe lexical restructuring model, Metsala & Walley, 1998). Das PRIMIR-Modell (Werker & Curtin, 2005) geht davon aus, dass sich die lautliche Information verändert, die im Laufe der sprachlichen Entwicklung genutzt wird. Diese Annahme basiert auf scheinbar widersprüchlichen Befunden, dass Säuglinge zwar bereits im ersten Lebensjahr verschiedenste Lautkontraste unterscheiden können, im zweiten Lebensjahr aber Schwierigkeiten haben, dieses lautliche Detail beim Wortlernen zu nutzen (Stager & Werker, 1997). ...
Article
Full-text available
Zusammenfassung: Der Artikel gibt einen Überblick über die Entwicklung der Sprachwahrnehmung im ersten Lebensjahr. Er zeichnet den Entwicklungsverlauf von den frühesten Hörerfahrungen bis zum ersten gesprochenen Wort nach und geht darauf ein: (i) wie vorgeburtliche Erfahrungen das Hören bei Neugeborenen prägen; (ii) wie die Sprachwahrnehmung sich im Säuglingsalter verändert und auf die Umgebungssprache einstellt; (iii) welche Faktoren die Veränderung in der Sprachwahrnehmung beeinflussen; (iv) wie theoretische Modelle diese Veränderungen erklären; und (v) welche Rolle die perzeptuelle Entwicklung im ersten Lebensjahr für die weitere Sprachentwicklung spielt.
... In the top-down case, early language would be bootstrapped by grounding of holistic phrase-like speech patterns with other multimodal and/or embodied experiences such as visual input. In this case, phonemic and lexical representations would not be proximal targets of learning, but gradually emerge through analysis and decomposition of the situated language patterns into constituents that enable more efficient encoding of the language (cf., e.g., PRIMIR theory by Werker & Curtin, 2005; see also Khorrami & Räsänen, 2021;Merkx, 2022;Tomasello, 2000; see also Räsänen & Rasilo, 2015, for a discussion on the bottom-up and top-down strategies). ...
... Hence, one should not equate the emergence of different language capabilities at different developmental milestones with a cascade of specific learning processes targeting at those capabilities. Thereby, the present results are equally compatible with the usage-based accounts of language acquisition, where gradual emergence of lexical and sub-lexical structure results from holistic "top-down" meaning-centered learning (see Chrupała, 2022;Khorrami & Räsänen, 2021;or Merkx, 2022, for a recent discussion, models and references; cf. also Werker & Curtin, 2005). ...
Conference Paper
Full-text available
Previous computational models of early language acquisition have shown how linguistic structure of speech can be acquired using auditory or audiovisual learning mechanisms. However, real infants have sustained access to both uni-and multimodal sensory experiences. Therefore, it is of interest how the uni-and multimodal learning mechanisms could operate in concert, and how their interplay might affect the acquisition dynamics of different linguistic representations. This paper explores these questions with a computational model capable of simultaneous auditory and audiovisual learning from speech and images. We study how the model's latent representations reflect phonemic, lexical, and semantic knowledge as a function of language experience. We also test how the findings vary with differential emphasis on the two learning mechanisms. As a result, we find phonemic learning always starting to emerge before lexical learning, followed by semantics. However, there is also notable overlap in their development. The same pattern emerges irrespectively of the emphasis on auditory or audiovisual learning. The result illustrates how the acquisition dynamics of linguistic representations are decoupled from the primary learning objectives (mechanisms) of the learner, and how the emergence of phonemes and words can be facilitated by both auditory and audiovisual learning in a synergetic manner.
... Empirical evidence suggests that children's language development in a minority language is predicted by the number of native speakers around them (Place & Hoff, 2016). A plausible explanation is that input from various speakers helps tease linguistic information apart from speaker information and establish robust representations of speech categories (Werker & Curtin, 2005). Furthermore, the excerpt from Teacher Hsu indicated teachers' perspective on the roles of home language policies in heritage language maintenance and their effort to encourage parents' use of home language(s), which was in line with evidence on heritage language attrition with reduced home language input (Chang et al., 2011). ...
Article
Full-text available
Despite an increasing interest in pronunciation instruction in English as a majority language or international lingua franca, less is known about pronunciation learning in non-English minority languages, especially among child learners. Bilingual education programs provide a unique context to address this research gap, as they involve immersive education in minority languages. Teachers in these programs thus are insightful informants. The current study focuses on the context of a Mandarin-English bilingual program in Canada and addresses two research questions: What factors do teachers believe influence students’ Mandarin pronunciation learning? What are teachers’ strategies and needs when teaching Mandarin pronunciation? Semi-structured interviews were conducted with twelve Chinese teachers with diverse language backgrounds. The teachers discussed multifaceted factors that influenced bilingual students’ pronunciation learning, including speech targets, individual factors, and language environments at school and in society. Teachers shared a wide array of pronunciation teaching techniques, although they expressed concerns related to policies and resources. This study demonstrates the complexity of teaching the pronunciation of a minority language, whose speech system is distinctly different from English, in a bilingual classroom setting. It shares teaching strategies among bilingual teachers and identifies future directions for policymaking and research.
... More specifically, infants develop and then maintain their sensitivity to native language phonetic contrasts (Kuhl et al., 2006;Werker & Tees, 1984) whilst at the same time, their sensitivity to most (but not all, see Best & McRoberts, 2003) non-native contrasts declines. This developmental evolution is informed by patterns and variability in an infant's early language environment (Best et al., 2016;Kuhl et al., 2006;Werker & Curtin, 2005). However, the field has focused primarily on trajectories of perceptual narrowing in monolingual children who are learning one language and thus one set of phonetic rules. ...
... Researchers have conducted extensive investigations to understand the initial state and process of language acquisition, providing insights into how environmental and genetic factors interact to fashion language and cognitive function, and the mechanisms underlying brain plasticity (Barkat et al., 2011;Weaver et al., 2004;Werker & Hensch, 2015;Werker & Tees, 2005). It is now widely accepted that both genetic and experiential factors contribute to language acquisition (Gervain & Mehler, 2010;Werker & Curtin, 2005), and researchers are interested in understanding how these factors interact during human development. ...
Preprint
Exposure to maternal speech during the prenatal period shapes speech perception and linguistic preferences, allowing neonates to recognize stories heard frequently in utero and to show an enhanced preference for their mother’s voice and native language. Yet, with a high prevalence of bilingualism worldwide, it remains an open question whether monolingual and bilingual maternal speech during pregnancy differently influence the fetus’ neural mechanisms underlying speech sound encoding. In the present study, the frequency-following response (FFR), an auditory evoked potential that reflects the complex spectrotemporal dynamics of speech sounds, was recorded to a two-vowel /oa/ stimulus in a sample of 131 healthy term neonates within 1-3 days after birth. Newborns were divided into two groups according to maternal language usage during the last trimester of gestation (monolingual; bilingual). Spectral amplitudes and spectral signal-to-noise ratios (SNR) at the stimulus fundamental (F0) and first formant (F1) frequencies of each vowel were taken as measures of pitch and formant structure neural encoding, respectively. Our results reveal that while spectral amplitudes at F0 did not differ between groups, neonates from bilingual mothers exhibited a lower spectral SNR. Additionally, monolingually exposed neonates exhibited a higher spectral amplitude and SNR at F1 frequencies. We interpret our results under the consideration that bilingual maternal speech, as compared to monolingual, is characterized by a greater complexity in the speech sound signal, rendering newborns from bilingual mothers more sensitive to a wider range of speech frequencies without generating a particularly strong response at any of them.
Our results contribute to an expanding body of research indicating the influence of prenatal experiences on language acquisition and underscore the necessity of including prenatal language exposure in developmental studies on language acquisition, a variable often overlooked yet capable of influencing research outcomes.
... Consequently, an early developmental challenge faced by infants is acquiring the phonemic categories of their language. It is widely accepted that these are at least partially acquired by around 18 months (Werker & Curtin, 2005), a time when infants cannot produce much speech themselves, know a limited number of words, and may receive little explicit feedback on whether they are perceiving speech accurately. Consequently, the canonical view is that the most important mechanisms for these earliest phases of development rely on sensitivity to information in the speech stimuli (e.g., the statistical structure of speech cues). ...
Article
Purpose: Talkers adapt their speech according to the demands of their listeners and the communicative context, enhancing the properties of the signal (pitch, intensity) and/or properties of the code (enhancement of phonemic contrasts). This study asked how mothers adapt their child-directed speech (CDS) in ways that might serve the immediate goals of increasing intelligibility, as well as long-term goals of supporting speech and language development in their children. Method: Mothers (N = 28) participated in a real-time interactive speech production/perception paradigm, in which mothers instructed their young (3- to 5-year-old) children, or an adult listener, to select the picture corresponding to a target word. The task was performed at low and high levels (56 vs. 75 dB SPL) of background noise to examine the Lombard effects of decreased audibility on speech production. Results: Acoustic–phonetic analyses of CDS and adult-directed speech (ADS) productions of target words and carrier phrase (e.g., “Find pig”) revealed that mothers significantly enhanced the mean pitch, pitch variability, and intensity of target words in CDS, particularly at higher background noise levels and for younger children. Mothers produce CDS with a higher signal-to-noise ratio than ADS. However, limited evidence was found for phonetic enhancement of the segmental properties of speech. Although increased category separation was found in the voice onset time of stop consonants, decreased vowel category separation (an anti-enhancement effect) was observed in CDS. Conclusions: Mothers readily enhance the suprasegmental signal properties of their speech in CDS, but not the acoustic–phonetic properties of phonemes. This study fails to provide evidence of phonetic enhancement in preschool children in a dyadic communication task under noisy listening conditions. Supplemental Material: https://doi.org/10.23641/asha.24645423
... The speech signal is rich in information and it can be encoded based on a number of different dimensions. Infants' representation of these dimensions emerges from their initial perceptual biases, the acoustic-perceptual saliency of the information, the regularities present in the input language and the implicit learning mechanisms that are at work [2]. Phonetic learning in the first year of life has been characterized as a transition from language-general to language-specific speech perception resulting from experience with the native input. ...
Conference Paper
Infants' perceptual narrowing to native speech sounds has been mainly described from auditory paradigms. Previous results from fricative place of articulation discrimination in infancy offer a non-convergent view on the factors modulating discrimination of noise contrasts. The present research focuses on infants' capacity to discriminate a native voiceless fricative place of articulation contrast, [s]-[f]. The contribution of audiovisual (AV) cues in reaching successful discrimination was explored. Three groups of 6-month-old infants were tested with versions of the familiarization-preference procedure involving two auditory-only conditions (adult-directed and infant-directed speech) and an AV presentation. Discrimination was absent and not favored by the AV format. Positive evidence was only obtained in an older group of 12-month-olds from the auditory-only version. The developmental pattern emerging from these data deviates from the expected maintenance of early language-general skills and suggests a late emergence of this fricative contrast with limited contribution of AV cues.
Article
Although each of us was once a baby, infant consciousness remains mysterious and there is no received view about when, and in what form, consciousness first emerges. Some theorists defend a ‘late-onset’ view, suggesting that consciousness requires cognitive capacities which are unlikely to be in place before the child’s first birthday at the very earliest. Other theorists defend an ‘early-onset’ account, suggesting that consciousness is likely to be in place at birth (or shortly after) and may even arise during the third trimester. Progress in this field has been difficult, not just because of the challenges associated with procuring the relevant behavioral and neural data, but also because of uncertainty about how best to study consciousness in the absence of the capacity for verbal report or intentional behavior. This review examines both the empirical and methodological progress in this field, arguing that recent research points in favor of early-onset accounts of the emergence of consciousness.
Article
There is a current 'theory crisis' in language acquisition research, resulting from fragmentation both at the level of the approaches and the linguistic level studied. We identify a need for integrative approaches that go beyond these limitations, and propose to analyse the strengths and weaknesses of current theoretical approaches to language acquisition. In particular, we advocate that language learning simulations, if they integrate realistic input and multiple levels of language, have the potential to contribute significantly to our understanding of language acquisition. We then review recent results obtained through such language learning simulations. Finally, we propose some guidelines for the community to build better simulations.
Book
This volume contains the proceedings of a NATO Advanced Research Workshop (ARW) on the topic of "Changes in Speech and Face Processing in Infancy: A glimpse at Developmental Mechanisms of Cognition", which was held in Carry-le-Rouet (France) at the Vacanciel "La Calanque", from June 29 to July 3, 1992. For many years, developmental researchers have been systematically exploring what is concealed by the blooming and buzzing confusion (as William James described the infant's world). Much research has been carried out on the mechanisms by which organisms recognize and relate to their conspecifics, in particular with respect to language acquisition and face recognition. Given this background, it seems worthwhile to compare not only the conceptual advances made in these two domains, but also the methodological difficulties faced in each of them. In both domains, there is evidence of sophisticated abilities right from birth. Similarly, researchers in these domains have focused on whether the mechanisms underlying these early competences are modality-specific, object-specific or otherwise.
Chapter
One of the great overlooked tasks of language acquisition is how the language learner develops a representation of the sound properties of a word that allows for the recognition of words in fluent speech. Thus, while child phonologists have given a great deal of attention to the nature of the representations that underlie the child’s earliest productions of speech (e.g. Ferguson, 1986; Macken, 1980; Menn, 1980; Vihman, 1978), and other child language specialists have focused on the structure of the child’s semantic categories, much less is known about what sort of representation of the sound structure of words permits their comprehension by the child (cf. Jusczyk, 1985). Yet, one could argue that learning the essential phonemic characteristics that distinguish between one name and another in one’s native language is at least as important as distinguishing the boundaries of potential referents for the names. Clearly, to be a fluent speaker-hearer of a language, one needs to develop the appropriate categories for both the sounds and the meanings of words.
Chapter
Developmental theories of face perception and speech perception have similar goals. Theorists in both domains seek to explain infants’ early sophistication with regard to the detection and/or discrimination of facial and speech stimuli and to determine whether infants’ early abilities are due to mechanisms dedicated to the processing of specific biologically relevant stimuli or more general sensory/cognitive mechanisms. In addition, theorists in both domains seek to explain how experience with specific faces and speech sounds modifies infants’ perception. In this chapter, studies showing enhanced discriminability at phonetic boundaries, as well as studies on the perception of phonetic prototypes, exceptionally good instances representing the centers of phonetic categories, are described. The studies show that although phonetic boundary effects are common to monkey and man, prototype effects are not. For human listeners prototypes play a unique role in speech perception. They function like “perceptual magnets,” attracting nearby members of the category. By 6 months of age the prototype’s perceptual magnet effect is language-specific. Exposure to a specific language thus alters infants’ perception prior to the acquisition of word meaning and linguistic contrast. These results support a new theory, the Native Language Magnet (NLM) theory, which describes how innate factors and early experience with language interact in the development of speech perception.