Journal of Experimental Psychology:
Learning, Memory, and Cognition
1993, Vol. 19, No. 2, 309-328
Copyright 1993 by the American Psychological Association, Inc.
0278-7393/93/$3.00
Episodic Encoding of Voice Attributes and Recognition Memory
for Spoken Words
Thomas J. Palmeri, Stephen D. Goldinger, and David B. Pisoni
Recognition memory for spoken words was investigated with a continuous recognition memory
task. Independent variables were number of intervening words (lag) between initial and
subsequent presentations of a word, total number of talkers in the stimulus set, and whether words were
repeated in the same voice or a different voice. In Experiment 1, recognition judgments were
based on word identity alone. Same-voice repetitions were recognized more quickly and
accurately than different-voice repetitions at all values of lag and at all levels of talker variability. In
Experiment 2, recognition judgments were based on both word identity and voice identity.
Subjects recognized repeated voices quite accurately. Gender of the talker affected voice
recognition but not item recognition. These results suggest that detailed information about a talker's
voice is retained in long-term episodic memory representations of spoken words.
The speech signal varies substantially across individual
talkers as a result of differences in the shape and length of
the vocal tract (Carrell, 1984; Fant, 1973; Summerfield &
Haggard, 1973), glottal source function (Carrell, 1984),
positioning and control of articulators (Ladefoged, 1980), and
dialect. According to most contemporary theories of speech
perception, acoustic differences between talkers constitute
noise that must be somehow filtered out or transformed so
that the symbolic information encoded in the speech signal
may be recovered (e.g., Blandon, Henton, & Pickering, 1984;
Disner, 1980; Gerstman, 1968; Green, Kuhl, Meltzoff,
& Stevens, 1991; Summerfield & Haggard, 1973). In
these theories, some type of "talker-normalization"
mechanism, either implicit or explicit, is assumed to compensate
for the inherent talker variability¹ in the speech signal (e.g.,
Joos, 1948). Although many theories attempt to describe
how idealized or abstract phonetic representations are
recovered from the speech signal (see Johnson, 1990, and
Nearey, 1989, for reviews), little mention is made of the fate
of voice information after lexical access is complete. The
talker-normalization hypothesis is consistent with current
views of speech perception wherein acoustic-phonetic
invariances are sought, redundant surface forms are quickly
forgotten, and only semantic information is retained in
long-term memory (see Pisoni, Lively, & Logan, 1992).
Thomas J. Palmeri and David B. Pisoni, Department of Psychology,
Indiana University; Stephen D. Goldinger, Department of
Psychology, Arizona State University.
This research was supported by National Institutes of Health
Research Grant DC-00111-16 to Indiana University, Bloomington.
We thank Fergus Craik, Edward Geiselman, Leah Light, Scott
Lively, Lynne Nygaard, Mitch Sommers, and Richard Shiffrin for
their valuable comments and criticisms. We also thank Kristin
Lively for collecting data in Experiment 2.
Correspondence concerning this article should be addressed to
Thomas J. Palmeri or David B. Pisoni, Department of Psychology,
Indiana University, Bloomington, Indiana 47405.
According to the traditional view of speech perception,
detailed information about a talker's voice is absent from
the representations of spoken utterances in memory. In fact,
evidence from a variety of tasks suggests that the surface
forms of both auditory and visual stimuli are retained in
memory. Using a continuous recognition memory task
(Shepard & Teghtsoonian, 1961), Craik and Kirsner (1974)
found that recognition memory for spoken words was better
when words were repeated in the same voice as that in
which they were originally presented. The enhanced
recognition of same-voice repetitions did not deteriorate over
increasing delays between repetitions. Moreover, subjects
were able to recognize whether a word was repeated in the
same voice as in its original presentation. When words were
presented visually, Kirsner (1973) found that recognition
memory was better for words that were presented and
repeated in the same typeface. In a parallel to the auditory
data, subjects were also able to recognize whether a word
was repeated in the same typeface as in its original
presentation. Kirsner and Smith (1974) found similar results when
the presentation modalities of words, either visual or
auditory, were repeated.
Long-term memory for surface features of text has also
been demonstrated in several studies by Kolers and his
colleagues. Kolers and Ostry (1974) observed greater
savings in reading times when subjects reread passages of
inverted text that were presented in the same inverted form
as an earlier presentation than when the same text was
presented in a different inverted form. This savings in
reading time was found even 1 year after the original
presentation of the inverted text, although recognition memory for
the semantic content of the passages was reduced to chance
(Kolers, 1976). Together with the data from Kirsner and
colleagues, these findings suggest that physical forms of
auditory and visual stimuli are not filtered out during
encoding but instead remain part of long-term memory
representations. In the domain of spoken language processing,
1 Talker variability refers to differences between talkers. All
references to talker variability and voice differences throughout
this article refer to such between-talker differences. Differences
between words produced by the same talker are not implied by this
term.