Recognition of Speech Produced in Noise

Andrea L. Pittman
Terry L. Wiley
University of Wisconsin–Madison

A two-part study examined recognition of speech produced in quiet and in noise
by normal hearing adults. In Part I 5 women produced 50 sentences consisting of
an ambiguous carrier phrase followed by a unique target word. These sentences
were spoken in three environments: quiet, wide band noise (WBN), and meaning-
ful multi-talker babble (MMB). The WBN and MMB competitors were presented
through insert earphones at 80 dB SPL. For each talker, the mean vocal level,
long-term average speech spectra, and mean word duration were calculated for
the 50 target words produced in each speaking environment. Compared to quiet,
the vocal levels produced in WBN and MMB increased an average of 14.5 dB.
The increase in vocal level was characterized by increased spectral energy in the
high frequencies. Word duration also increased an average of 77 ms in WBN
and MMB relative to the quiet condition. In Part II, the sentences produced by one
of the 5 talkers were presented to 30 adults in the presence of multi-talker babble
under two conditions. Recognition was evaluated for each condition. In the first
condition, the sentences produced in quiet and in noise were presented at equal
signal-to-noise ratios (SNR_E). This served to remove the vocal level differences
between the speech samples. In the second condition, the vocal level differences
were preserved (SNR_P). For the SNR_E condition, recognition of the speech
produced in WBN and MMB was on average 15% higher than that for the
speech produced in quiet. For the SNR_P condition, recognition increased an
average of 69% for these same speech samples relative to speech produced in
quiet. In general, correlational analyses failed to show a direct relation between
the acoustic properties measured in Part I and the recognition measures in Part II.
KEY WORDS: speech perception, speech acoustics, background noise,
competing message
Journal of Speech, Language, and Hearing Research • Vol. 44 • 487–496 • June 2001 • © American Speech-Language-Hearing Association

The presence of a competing acoustic signal during communication
often interferes with the perception of speech. This is particularly
true for persons with sensorineural hearing loss who are more
susceptible to the deleterious effects of noise (Walden, Prosek, &
Worthington, 1975). Studies have shown that word recognition in noise
or competing message can differ for listeners with and without hearing
loss as well as for listeners with different types and degrees of hearing
loss (Beattie, 1989; Walden, Demorest, & Hepler, 1984; Wilson, Zizz,
Shanks, & Causey, 1990). Although these differences represent the in-
fluence of noise on the perception of speech, they do not clarify the influ-
ence of noise on the production of speech or the subsequent perception of
that speech. It is well-established that the acoustic properties of speech
produced in noise are significantly different from those produced in quiet
(Amazi & Garber, 1982; Junqua, 1993; Letowski, Frank, & Caravella,
1993; Summers, Pisoni, Bernacki, Pedlow, & Stokes, 1988; Tartter, Gomes,
& Litwin, 1993; Webster & Klumpp, 1962). The relation between these
properties and the perception of speech, however, re-
mains unclear.
Speech Production in Noise
The acoustic characteristics of speech produced in
noise typically are determined by recording speech
stimuli from a single talker in a quiet environment and
again in the presence of noise or competing message.
Although many acoustic characteristics have been ex-
amined, increases in vocal level, changes in spectral
composition, and increases in word duration have been
reported most consistently. Summers et al. (1988) ex-
amined the acoustic properties of the digits “one” through
“nine” produced by two men in quiet and in three levels
of noise. Significant increases in vocal levels were ob-
served for each talker in each noise level (an average of
4.5, 6.0, and 6.9 dB in 80, 90, and 100 dB SPL noise,
respectively). The slope of a regression line, fitted to the
data points of an amplitude-by-frequency analysis of the
speech stimuli, was significantly steeper for speech pro-
duced in noise than in quiet. The steeper slope indicated
a significant increase in amplitude for higher frequen-
cies relative to the lower frequencies for speech spoken
in noise. The mean word duration for each talker also
increased significantly, from 461 ms in quiet to 524 ms
in 80 dB SPL white noise, and increased further with
each increase in noise level.
Tartter et al. (1993) examined the vocal levels of
two women who produced the digits “zero” through
“nine” in the presence of white noise. They reported an
average increase of 1.0, 2.6, and 3.7 dB in 35, 60, and 80
dB SPL noise, respectively. As in the Summers et al.
(1988) study, the slope of an amplitude-by-frequency
analysis was calculated for each of the speech samples.
Significant increases in high-frequency energy were re-
ported for the 60 dB SPL noise condition relative to the
lower noise level of 35 dB SPL. A significant increase in
word duration (from an average of 343 ms in quiet to
530 ms in 80 dB SPL noise) also was reported.
Junqua (1993) examined the vocal levels of five men
and five women who produced several subsets of speech
materials (digits, monosyllabic words, bisyllabic words,
and letters) in 85 dB SPL white noise. Average vocal
level increases of 18.2 and 12.6 dB were reported for the
men and women talkers, respectively. No significant
shifts in spectral composition were found. An increase
in phoneme duration also was reported, although no
values were provided.
Letowski et al. (1993) evaluated the vocal levels of
running speech produced by five men and five women in
the presence of multi-talker babble, traffic noise, and wide
band noise presented at 70 and 90 dB SPL. They reported
significant increases in vocal level between quiet and both
noise levels; however, no significant differences in vocal
levels were found across the three noise types. An analy-
sis of the amplitude of 20 frequencies taken from the
long-term spectrum of speech indicated significantly
larger increases in amplitude for frequencies ≥ 630 Hz.
A measure of words per minute revealed no significant
differences in speech rate between the quiet and the
three noise conditions. Although no significant differ-
ences in the acoustic characteristics of running speech
were found for the speech produced in each competitor,
long-term spectral analyses may not have been sensi-
tive to changes in individual words—particularly those
important for perception. Materials produced in com-
petitors that differ in spectral and semantic content (e.g.,
wideband noise vs. multi-talker babble) may not differ
acoustically over the long term, although it is possible
that differences may be observed for individual words.
If so, the perception of speech produced in noise may be
influenced by these acoustic changes.
The results of these studies suggest that (a) both men
and women increase their vocal levels as a function of
noise level; (b) the amplitude of mid- to high-frequency
energy increases more than that for lower frequencies;
(c) speech rates of men and women are similar in noise;
and (d) vocal level, spectral composition, and word dura-
tion do not appear to be influenced by the spectral con-
tent of the noise when measured over the long term.
Perception of Speech Produced
in Noise
The recognition of speech produced in quiet and in
noise has been compared in only a few published stud-
ies and with conflicting results (Dreher & O’Neill, 1957;
Junqua, 1993; Summers et al. 1988). Junqua (1993) re-
ported significant decreases in the recognition of digits,
monosyllabic words, bisyllabic words, and letters pro-
duced in 85 dB SPL white noise relative to the same
stimuli produced in quiet. The Junqua report, however,
did not provide details regarding how the stimulus levels
were set for various conditions.
Dreher and O’Neill (1957), on the other hand, re-
ported significantly higher recognition scores (an aver-
age of 27%) for spondees spoken in 70 dB SPL white
noise than for the same spondees spoken in quiet. Sum-
mers et al. (1988) also reported significantly higher rec-
ognition scores (an average of 6%) for monosyllabic dig-
its produced in 90 dB SPL white noise than for the same
digits produced in quiet. It is important to note that
stimulus levels in the Dreher and O’Neill (1957) study
were not equalized before presentation, unlike the stimu-
lus levels in the Summers et al. (1988) report. This may
account, in part, for the difference in recognition scores
across these two studies.
In summary, there are few data available regarding
the recognition of speech spoken in noise even though a
considerable portion of everyday communication takes
place in the presence of a competitor. Further, the out-
comes of these studies have been ignored in terms of
clinical applications. If speech recognition is influenced
by speech production, which is in turn influenced by a
competing noise, this would be an important consider-
ation for the face validity of speech-recognition measures
used clinically. Although previous studies describe the
differences in recognition between speech produced in
quiet and in noise, it is not clear whether the magni-
tude of the differences warrants the use of environment-
specific speech materials in a clinical setting. The most
important consideration is whether recognition in noise
is significantly underestimated using currently avail-
able speech materials. The studies reviewed above indi-
cate that recognition of speech produced in quiet is gen-
erally poorer than for speech produced in noise.
Unfortunately, those differences were evaluated only for
a small number of stimuli not typically used in an au-
diological evaluation and were limited to a noise back-
ground not typical of everyday communication.
This study determined the recognition of speech
produced in quiet and in two types of noise. In Part I,
speech samples spoken in quiet and in two noise condi-
tions were used to determine if the type of noise signifi-
cantly influenced production. In Part II, the speech
samples from one talker (exhibiting the average acous-
tic characteristics of speech spoken in noise) were se-
lected and presented to a group of listeners under two
listening conditions. In the first condition, the vocal level
differences between the samples were removed by pre-
senting each at a signal-to-noise ratio (SNR) that
equated the overall presentation level. This determined
the degree to which recognition may be underestimated
in clinically derived measures. In the second condition,
the same speech samples were presented with the vocal
level differences preserved. Recognition in this condi-
tion may more accurately reflect the perception of speech
in noisy environments. The recognition scores from these
two conditions were then analyzed with respect to the
acoustic characteristics measured in Part I to determine
the influence of these characteristics on perception.
Part I: Development of Speech
Material
Method
Participants
Five women between the ages of 19 and 28 years par-
ticipated as talkers. All had hearing thresholds ≤ 20 dB
HL at audiometric frequencies 0.25, 0.5, 1, 2, 4, and 8 kHz
in each ear and normal middle-ear function as determined
by tympanometry (Roup, Wiley, Safady, & Stoppenbach,
1998). All five women were native speakers of American
English with no noticeable regional dialects.
Materials
Fifty low-predictability (LP) sentences from the
Speech Perception in Noise (SPIN) Test were used (Kalikow,
Stevens, & Elliott, 1977). These sentences consisted of a
unique, although ambiguous, carrier phrase (e.g., “He
would not think about the…”) followed by a unique tar-
get word (crack). The structure of the sentences provided
no contextual information with which to predict the fi-
nal target word during the recognition task (Part II).
This required the listener to rely on the acoustic infor-
mation rather than the semantic content of the sentence.
Five practice sentences began each list to allow the talker
to adjust to each speaking environment. Three random-
izations of the 50 sentences were constructed, one for
each speaking environment.
Procedure
Each talker was seated in a sound-treated room with
a head-worn microphone (Shure, SM10A) placed 1 inch
from the lips, out of the breath stream. Each talker read
the 50 LP sentences first in quiet, then in the presence of
wide band noise (WBN), and again in the presence of
meaningful multi-talker babble (MMB).¹ The competi-
tors were delivered binaurally at 80 dB SPL through in-
sert earphones (Etymotic, ER-3A). The WBN was gener-
ated by an audiometer (GSI, 16), and the MMB was
routed through the audiometer from a cassette tape
player (Nakamichi, CR-2A). The presentation level and
spectra of each competitor were confirmed for both insert
earphones with acoustic measures in a 2-cc coupler. The
earphones were removed for the quiet speaking environ-
ment. The overall noise level in the sound-treated room
in the quiet environment was 16 dB SPL.
To encourage each talker to speak in a manner that
would maximize recognition, an assistant wearing head-
phones was seated outside the window of the sound booth
and instructed to write the final word of each sentence.
Each talker was told that the listener was unable to see
the features of her face and was instructed to speak
clearly, to read the sentences in order, and to wait for
the listener to look up from the response sheet before
proceeding. The talker was unable to see the written
responses. Unknown to the talker, all sentences were
¹ The MMB competitor contained independent conversations of three men
and three women recorded separately and then mixed to produce a multi-
talker competitor. Semantic information was preserved in that portions of
each conversation could be selectively followed. It was produced by G.
Donald Causey in 1979 at the Biocommunications Laboratory at the
University of Maryland.
digitally recorded (Tascam, DA-P1) at a 44.1 kHz sam-
pling rate for later analyses. It was felt that the talker
might artificially alter her vocal effort if she were aware
of the recording. Each talker was informed of the re-
cording at the completion of the session, and each agreed
to have her speech samples included in the study.
Acoustic Analyses
The sentences produced by each talker in each
speaking environment were low-pass filtered at 10 kHz
and digitized using a 16-bit A/D converter. The target
word within each of the 50 sentences was extracted, con-
catenated, and saved in 15 separate speech samples
(5 talkers × 3 speaking environments). The boundaries
of each target word were visually determined using a
digital audio editor (Syntrillium Software Corp.,
CoolEdit). Using digital signal processing techniques,
long-term-average speech spectra (LTASS) were measured in 1/3-octave bands for each 50-word speech sample.
A 1000-Hz reference tone of a known SPL and voltage
was pre-recorded on each digital audiotape and used to
calculate the level of each 1/3-octave band as well as the
overall vocal level (in dB SPL). To describe the spectral
composition of each speech sample with a single num-
ber, the slope of a regression line (in dB SPL/kHz) was
fitted to 14 of 15 data points representing the ampli-
tude of each 1/3-octave band frequency. The 15th fre-
quency band was not included because of the limited
bandwidth of the earphones used in Part II of this study.
The average duration (in ms) was calculated for the 50
words measured in each speech sample.
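To make the slope metric concrete, the short sketch below power-sums a set of 1/3-octave band levels into an overall level and fits a least-squares line in dB SPL/kHz, as described above. The band center frequencies and levels are illustrative assumptions only, not values from the study.

```python
import numpy as np

# Illustrative 1/3-octave band levels (dB SPL) for one 50-word speech sample.
# Fourteen bands are fitted, mirroring the study's use of 14 of 15 points;
# the frequencies and levels here are assumed, not measured.
center_freqs_khz = np.array([0.2, 0.25, 0.315, 0.4, 0.5, 0.63, 0.8, 1.0,
                             1.25, 1.6, 2.0, 2.5, 3.15, 4.0])
band_levels_db = np.array([68, 67, 66, 65, 63, 62, 60, 58,
                           56, 54, 52, 50, 48, 46])

# Overall vocal level: power-sum of the band levels.
overall_db = 10 * np.log10(np.sum(10 ** (band_levels_db / 10)))

# Spectral slope in dB SPL/kHz: least-squares line through the band levels.
slope_db_per_khz, _intercept = np.polyfit(center_freqs_khz, band_levels_db, 1)

print(f"overall level = {overall_db:.1f} dB SPL, "
      f"slope = {slope_db_per_khz:.1f} dB SPL/kHz")
```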
Results and Discussion
The LTASS for each talker and speaking environ-
ment, as well as the spectra calculated for all five talk-
ers (lower right panel), are shown in Figure 1. The solid,
dashed, and dotted lines represent speech produced in
quiet, WBN, and MMB, respectively. The bottom right
panel shows the average spectra of each speaking envi-
ronment for all five talkers. In general, the spectra for
speech produced in both WBN and MMB exhibited
higher overall levels than the spectra for speech pro-
duced in quiet. Talkers 4 and 5 exhibited the smallest
differences in level between the speaking environments,
whereas Talkers 2 and 3 exhibited the largest differ-
ences. Small differences between the spectra for the
WBN and MMB speaking environments are apparent
for Talkers 2, 3, and 5, but not for Talkers 1 and 4.
Vocal Levels
The vocal levels for each talker and speaking condi-
tion are shown in the top panel of Figure 2. Also shown
are the mean vocal levels (and one standard deviation)
for each speaking environment. Relative to quiet, vo-
cal levels increased an average of 14.5 dB in noise. A
one-way ANOVA with repeated measures and planned
orthogonal contrasts confirmed that the vocal levels of
the speech spoken in noise were significantly higher
than those spoken in quiet [F(2, 8) = 17.7, p = 0.001, ω² = 0.47; Quiet vs. WBN: t²Dunn(3, 8) = 26.4, p < 0.001; Quiet vs. MMB: t²Dunn(3, 8) = 26.4, p < 0.001]. The vocal
levels of speech produced in WBN and MMB did not
differ significantly. These results are consistent with
those of Junqua (1993), who reported an average in-
crease of 15 dB for speech produced in 85 dB SPL white
noise. These results are somewhat higher than the 4.6
dB increase in 80 dB SPL white noise reported by Sum-
mers et al. (1988) and the 3.7 dB increase reported by
Tartter et al. (1993). The reason for these differences in
vocal level is unclear but may reflect individual differ-
ences among talkers. It is possible that the two talkers in
each study exhibited small increases in vocal level simi-
lar to Talkers 4 and 5 in the present study (9 dB in 80 dB
SPL WBN).
The absolute vocal levels found in the present study
also are somewhat higher than those reported in previ-
ous studies. This is likely due to differences in the dis-
tance of the talker from the recording microphone. For
example, the microphone in the present study was posi-
tioned 1 inch from the talker’s mouth, whereas in
Letowski et al. (1993) and Summers et al. (1988), the
microphones were 12 and 4 inches from the talkers, re-
spectively. Using the inverse square law to estimate vocal
levels at a microphone distance of 1 inch, the levels in
quiet for the Letowski and Summers studies are equiva-
lent to 84 and 71 dB SPL, respectively, which are some-
what similar to the average vocal level of 82 dB SPL in
the present study.
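The distance adjustment described here is simply the inverse square law applied as a level offset (6 dB per halving of distance). The sketch below shows the arithmetic; the starting levels are back-calculated from the 84 and 71 dB SPL equivalents quoted above and are not values reported in the cited studies.

```python
import math

def level_at_distance(level_db, measured_at_inches, target_inches):
    """Estimate SPL at a new microphone distance using the inverse square law
    (free-field point-source assumption)."""
    return level_db + 20 * math.log10(measured_at_inches / target_inches)

# Back-calculated, illustrative quiet-condition levels at the original
# microphone distances (12 in. and 4 in.), adjusted to 1 inch:
print(round(level_at_distance(62.4, 12, 1)))  # ~84 dB SPL
print(round(level_at_distance(59.0, 4, 1)))   # ~71 dB SPL
```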
Spectral Composition
The slope values for each speaking condition are
shown in the middle panel of Figure 2 as a function of
talker. Also shown are the mean slope values (and one
standard deviation) for each speaking environment. The
positive slope values for the two noise environments
indicate an increase in high-frequency energy for speech
produced in noise. For example, the mean amplitude at
2.5 kHz for all talkers shown in Figure 1 increased an
average of 18 dB compared to an average increase of
only 7 dB at 0.2 kHz. A one-way ANOVA with repeated
measures revealed a significant difference between the
slope values for the speech samples produced in each
speaking environment [F(2, 8) = 19.338, p < 0.001, ω² = 0.48]. Planned orthogonal contrasts revealed a significant difference between the values for speech produced in quiet and in both noise environments [Quiet vs. WBN: t²Dunn(3, 8) = 33.375, p < 0.001; Quiet vs. MMB: t²Dunn(3, 8) = 23.838, p < 0.001]. However, no difference was found between the slope values for the speech produced in WBN and in MMB [t²Dunn(3, 8) = 0.801, p = 0.528].

Figure 1. Long-term average spectra of speech spoken in quiet, in wide band noise (WBN), and in meaningful multi-talker babble (MMB) for each of the five talkers, with the combined spectra of all five talkers in the lower right panel.

These
results are consistent with Summers et al. (1988) and
Tartter et al. (1993), who also reported significant in-
creases in high-frequency energy for speech spoken in
noise. The lack of differences between WBN and MMB
in vocal level and slope are consistent with Letowski et
al. (1993), who suggested that the overall level of a com-
petitor, rather than its spectral content, determines
changes in the acoustic characteristics of speech.
Target Word Duration
The average target word durations for each speak-
ing environment are shown by talker in the bottom
panel of Figure 2. Also shown are the mean word dura-
tions (and one standard deviation) for each speaking
environment. Relative to speech spoken in quiet, tar-
get word duration increased an average of 88 and 65
ms in WBN and in MMB, respectively. A one-way
ANOVA with repeated measures revealed significant
differences in word duration for the speech spoken in
quiet and in noise [F(2, 8) = 6.7, p = 0.021, ω² = 0.20].
Planned orthogonal contrasts revealed significantly
longer target word durations for the speech spoken in
noise relative to speech in quiet [Quiet vs. WBN: t²Dunn(3, 8) = 12.2, p = 0.002; Quiet vs. MMB: t²Dunn(3, 8) = 6.7, p = 0.014], although no difference in word duration was observed between WBN and MMB [t²Dunn(3, 8) = 0.8, p =
0.527]. These results are consistent with Summers et
al. (1988), who reported an average increase of 60 ms in
word duration under similar conditions, but are some-
what shorter than the 185-ms increase reported by
Tartter et al. (1993).
To determine whether the insert earphones caused
an occlusion effect that was not present in the quiet
speaking environment, Talker 1 was asked to return for
further testing. She read 10 of the original 50 sentences
under three conditions: (1) wearing the insert earphones
with no noise input, (2) wearing the insert earphones
with 80 dB SPL of WBN, and (3) in quiet without the
insert earphones. The sentences were analyzed as de-
scribed above. Although the acoustic characteristics of
the speech spoken in 80 dB SPL noise were similar to
those measured previously for this talker, no significant
differences were found between the speech spoken in
the two quiet environments. This suggests that the in-
sert earphones did not create an occlusion effect that
might have affected speech production.
In summary, the acoustic characteristics of the
speech materials in the present study appear to be
consistent with those of previous studies. Relative to
speech spoken in quiet, speech spoken in noise demon-
strated significant increases in vocal level, spectral slope,
and word duration. The speech samples of Talker 1 were
used for a recognition task described in Part II because
the acoustic characteristics of her speech were closest
to the average of all five talkers. In addition, the sen-
tences produced by this talker contained no errors,
whereas the other four talkers occasionally misread one
or two nontarget words.
Part II: Recognition
Part II of this study compared the recognition of
speech produced in quiet and in noise. Like previous
studies of this kind, the speech samples produced in quiet
and in noise were presented at SNRs that equated over-
all vocal levels (SNR_E). Unlike previous studies, the speech samples also were presented at SNRs that preserved these vocal-level differences (SNR_P). In this way,
the influence of the spectral and temporal changes in
the speech stimuli could be evaluated independent of,
and then in combination with, the additional contribu-
tion of increased vocal level. Based on the work of Sum-
mers et al. (1988) and Dreher and O’Neill (1957), one
would expect higher recognition scores for the speech
produced in WBN and MMB than for the speech pro-
duced in quiet. In addition, one would expect no differ-
ence in recognition between the speech produced in WBN
and MMB, because no significant acoustic differences
were observed.
Method
Participants
Twenty-seven women and 3 men between the ages
of 18 and 30 years served as listeners. Each participant
had hearing thresholds in the test ear ≤ 10 dB HL at audiometric frequencies 0.25, 0.5, 1, 2, and 4 kHz and ≤ 15 dB HL at 8 kHz. Hearing levels in the nontest ear were ≤ 20 dB HL at audiometric frequencies 0.25 through 8 kHz. The ear with the lowest thresholds was chosen as the test ear. In cases of equal thresholds in both ears, the test ear was alternated across listeners. All listeners exhibited normal middle ear function bilaterally based on tympanometry (Roup et al., 1998).

Figure 2. Mean vocal levels in dB SPL (top panel), spectral slope in dB SPL/kHz (middle panel), and word duration in ms (bottom panel) for the 50 target words produced in quiet, in wide-band noise (WBN), and in meaningful multi-talker babble (MMB) for each talker. Group means and ±1 standard deviation are shown for each speaking environment at the right.
Speech Materials
The 50 sentences produced in quiet and in the two
noise conditions by Talker 1 were digitally extracted from
the original recording to remove extraneous utterances.
The sentences within each condition were randomized
and recorded onto a compact disk at a sampling rate of
22.05 kHz. A 4-s gap was inserted between each sen-
tence to allow time for a written response. Two 1-kHz
calibration tones also were recorded. The first was equal
in average RMS level to the sentences produced in quiet,
and the second was equal to the average RMS level of
the speech produced in WBN and MMB. Separate cali-
bration tones were not necessary for the WBN and MMB
sentences, because the overall level of the two samples
differed by less than 1 dB. No attempt was made to
equalize the RMS levels of the target words within each
sentence. This enabled preservation of the acoustic char-
acteristics unique to each speaking environment, includ-
ing variations in vocal level.
Procedure
Each 50-sentence speech sample was presented with
the MMB competitor at 0, –5, and –10 dB SNRs. The
level of the speech remained constant, and the level of
the competitor changed according to the SNR. There were
two listening conditions. In the first condition (SNR_E),
the levels of the three 50-sentence speech samples were
equated by presenting each at the same SNR. The rec-
ognition scores would therefore reflect the influence of
all acoustic differences between the samples, except vo-
cal level. In the second condition (SNR_P), the level dif-
ferences between the three 50-sentence speech samples
were preserved so that recognition scores would reflect
the influence of all the acoustic differences, including
vocal level. This was accomplished by setting the noise
level equal to that of the speech produced in quiet and
then presenting the speech produced in WBN and MMB
11 dB higher, which is equivalent to the increase in vo-
cal level for this talker. Presentation of the speech ma-
terial at 0 and –5 dB SNRs was discontinued for the
SNR_P condition after the results of the first five listen-
ers revealed ceiling effects.
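The effect of the two level-setting schemes on the SNR the listener actually experiences can be sketched as below. The 70 dB SPL speech level is a hypothetical placeholder (presentation was actually at 60 dB SL re the 1-kHz threshold); only the 11 dB vocal-level offset comes from the text.

```python
def babble_level(speech_level_db, snr_db):
    """Competitor level needed for a given SNR, where SNR = speech - babble."""
    return speech_level_db - snr_db

quiet_speech = 70.0  # hypothetical level of the quiet-speech sample (dB SPL)
vocal_rise = 11.0    # this talker's vocal-level increase in WBN/MMB (dB)

for nominal_snr in (0, -5, -10):
    # SNR_E: the babble tracks each sample's own level, so every sample is
    # heard at the same nominal SNR and vocal-level differences vanish.
    # SNR_P: the babble is fixed against the quiet sample; the WBN/MMB
    # samples are presented 11 dB higher, so their effective SNR improves
    # by the size of the vocal-level rise.
    babble = babble_level(quiet_speech, nominal_snr)
    noise_sample_snr = (quiet_speech + vocal_rise) - babble
    print(f"nominal {nominal_snr:>3} dB | SNR_E: all samples at {nominal_snr} dB | "
          f"SNR_P: quiet at {nominal_snr} dB, noise samples at {noise_sample_snr:+.0f} dB")
```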
Each 50-sentence sample was presented monau-
rally through a TDH-50 earphone at 60 dB SL relative
to the pure-tone threshold at 1 kHz. Each participant
was instructed to write the final word of each of the
sentences. Testing was conducted in 2 one-hour sessions.
The first five listeners responded to a total of 18 lists of
50 sentences each (3 speech samples × 3 SNRs × 2 lis-
tening conditions). The number of lists was reduced to
12 when the 0 and –5 dB SNRs were discontinued from
the SNR_P condition. To reduce learning effects, the
speech samples, SNRs, and listening conditions were
randomized; and each participant was familiarized with
the sentences at a +30 dB SNR before testing. Recogni-
tion scores were examined to confirm that performance
did not improve significantly between the first and last
presentation of the sentences.
Results and Discussion
Part II was conducted to determine if speech pro-
duced in quiet and in the two noise environments dif-
fered in terms of recognition. Mean scores for the SNR_E and SNR_P conditions are shown in Figure 3 as a func-
tion of SNR and listening condition. The data from one
of the 30 listeners were corrupted for the 0 and –10 dB
SNR_E conditions and could not be used (indicated with
asterisks). Before statistical analyses, scores were trans-
formed into rationalized arcsine units (RAU) so that the
variances would be homogeneous across the range of
scores (Studebaker, 1985).
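The rationalized arcsine transform can be written directly from Studebaker (1985). The sketch below assumes scores are counts of correct target words out of 50; the constants are those of the published transform.

```python
import math

def rau(num_correct, num_items):
    """Rationalized arcsine units (Studebaker, 1985): an arcsine transform
    rescaled so values roughly track percent correct with near-uniform variance."""
    theta = (math.asin(math.sqrt(num_correct / (num_items + 1))) +
             math.asin(math.sqrt((num_correct + 1) / (num_items + 1))))
    return (146.0 / math.pi) * theta - 23.0

print(round(rau(40, 50), 1))  # 40/50 correct (80%) -> about 79 RAU
```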
For the SNR_E condition, recognition of speech pro-
duced in noise was an average of 15% higher (at –5 dB
SNR) than that of the speech produced in quiet.

Figure 3. Mean recognition in percent correct (and ±1 standard deviation) for speech produced in quiet, in wide-band noise (WBN), and in meaningful multi-talker babble (MMB) presented under two listening conditions: SNR_P, for which vocal level differences were preserved, and SNR_E, for which vocal level differences were removed. Asterisks indicate those conditions with only 29 listeners; all other values were obtained for 30 listeners.

A one-way
ANOVA with repeated measures revealed significant dif-
ferences in recognition for the speech samples at each
SNR [0 dB SNR: F(2, 56) = 7.4, p = 0.001, ω² = 0.07; –5 dB SNR: F(2, 58) = 27.5, p < 0.001, ω² = 0.11; –10 dB SNR: F(2, 56) = 21.0, p < 0.001, ω² = 0.13]. Planned or-
thogonal contrasts, listed in Table 1, revealed signifi-
cantly better recognition for both samples of speech spo-
ken in noise compared to speech spoken in quiet (p <
0.01). No significant differences were found for the
speech spoken in WBN and MMB at 0 and –10 dB SNR.
However, recognition of MMB was significantly higher
than that of WBN at –5 dB SNR (p = 0.05).
In the SNR_P condition, differences in recognition
performance were greatest at –10 dB SNR; scores for
speech spoken in noise were an average of 69% higher
than those for speech spoken in quiet. A one-way ANOVA
with repeated measures revealed significant differences
among the three speech samples [F(1.7, 48.2) = 529.7,
p < 0.001, ω² = 0.78; degrees of freedom were adjusted to compensate for a lack of sphericity²]. Planned orthogo-
nal contrasts revealed significantly higher recognition
scores for speech spoken in WBN and MMB than for speech spoken in quiet [Quiet vs. WBN: t²Dunn(3, 58) = 753.5, p < 0.001; Quiet vs. MMB: t²Dunn(3, 58) = 833.3, p < 0.001], although no differences in recognition scores were found between WBN and MMB [t²Dunn(3, 58) = 2.0, p = 0.123].
Because recognition scores were higher for speech
spoken in WBN and MMB than for speech spoken in
quiet, additional analyses were performed to determine
if the acoustic characteristics measured in Part I influ-
enced performance. Specifically, the percentage of lis-
teners able to correctly identify each target word was
calculated for the quiet, WBN, and MMB speech
samples. The differences in percent between the two
noise conditions and the quiet condition were calculated
(WBN-quiet and MMB-quiet, respectively). These val-
ues quantified the magnitude of improvement between
the target words spoken in noise and in quiet. In the same
way, difference values also were calculated for the peak
RMS level (dB SPL), spectral slope (dB SPL/kHz), and
duration (ms) of each target word. This was done for the
0, –5, and –10 dB SNR_E conditions. Recall that the SNRs of the speech samples in the SNR_E condition were equated
so that the large vocal level differences would be removed.
However, because no attempt was made to equalize the
RMS levels of each target word for the recognition task,
some level differences between the words remained. Cor-
relation coefficients were computed to determine the re-
lation between the changes in performance and the
changes in the three acoustic characteristics.
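Computationally, each entry in Table 2 reduces to a Pearson correlation between two 50-element difference vectors (one value per target word). The sketch below illustrates the calculation with simulated numbers, since the per-word data are not reproduced here.

```python
import numpy as np
from scipy.stats import pearsonr

# Simulated per-word difference scores (noise minus quiet) for 50 target
# words; these stand in for the recognition and acoustic differences
# computed in the study and are not real data.
rng = np.random.default_rng(0)
recognition_gain_pct = rng.normal(15, 10, 50)  # change in % of listeners correct
peak_rms_gain_db = rng.normal(2, 1.5, 50)      # change in peak RMS level (dB)

r, p = pearsonr(peak_rms_gain_db, recognition_gain_pct)
print(f"r = {r:.2f}, p = {p:.3f}")
```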
Pearson’s product-moment correlations are listed in
Table 2. In separate analyses, the increased vocal level
and spectral slope observed for the speech produced in
noise (WBN and MMB) correlated significantly with
increases in recognition (p < 0.01). However, the effects
were small for all but the peak RMS level for the WBN
speech sample at –5 dB SNR. Interestingly, this highest
correlation conflicts with the results of the recognition
task described earlier, for which significantly better per-
formance was observed for the MMB speech sample than
for the WBN speech sample at –5 dB SNR. Overall, these
results suggest that increases in vocal level and spec-
tral composition do not completely account for the ob-
served increases in recognition.
Table 1. Planned orthogonal contrasts (t²Dunn) by SNR for the speech produced in quiet, in wide-band noise (WBN), and in meaningful multi-talker babble (MMB) presented in the SNR_E condition. Asterisks indicate significant contrasts (p ≤ 0.05).

SNR    Contrast          df      t       p
0      Quiet vs. WBN     3, 56    6.5     0.001*
       Quiet vs. MMB     3, 56   14.4    <0.001*
       WBN vs. MMB       3, 56    1.5     0.214
–5     Quiet vs. WBN     3, 58   24.2    <0.001*
       Quiet vs. MMB     3, 58   52.8    <0.001*
       WBN vs. MMB       3, 58   52.7     0.05*
–10    Quiet vs. WBN     3, 56   26.8    <0.001*
       Quiet vs. MMB     3, 56   35.5    <0.001*
       WBN vs. MMB       3, 56    0.5     0.613
Table 2. Pearson’s product-moment correlation coefficients (r) relating the differences in recognition and acoustic characteristics for each target word produced in quiet and in wideband noise (WBN-quiet) and meaningful multi-talker babble (MMB-quiet) presented at 0, –5, and –10 dB SNR in the SNR_E condition. Significant correlations (p < 0.01) are indicated by asterisks.

                     Speaking condition
                  WBN-quiet    MMB-quiet
Peak RMS levels
  0 dB SNR           0.57*        0.16
  –5 dB SNR          0.74*        0.40*
  –10 dB SNR         0.61*        0.33*
Spectral slope
  0 dB SNR           0.45*        0.26
  –5 dB SNR          0.51*        0.32*
  –10 dB SNR         0.41*        0.24
Word duration
  0 dB SNR           0.11         0.32*
  –5 dB SNR         –0.02         0.01
  –10 dB SNR        –0.03         0.11
² A lack of sphericity indicates that the variances of all possible compari-
sons (quiet, WBN, MMB scores) were not equal, which may inflate the
Type I error rate. An adjustment to the degrees of freedom using the
Greenhouse-Geisser method was made to maintain a rejection rate of 5%.
General Discussion
In this study, speech produced in quiet and in two
noise conditions (Part I) was presented to listeners in a
recognition paradigm using two SNR conditions (Part
II). The acoustic analyses of speech produced in the two
noise types revealed significant increases in vocal level,
spectral composition, and word duration as compared
to speech produced in quiet. Interestingly, the acoustic
analyses revealed no differences between the speech
produced in the two noise types (WBN and MMB) de-
spite the spectral and semantic differences of these
competitors. In terms of recognition, scores were an av-
erage of 69% higher for the speech produced in noise
when the vocal level differences between the speech
samples were preserved (SNR_P) and an average of 15% higher when the vocal level differences were removed (SNR_E). No significant differences in recognition were
found for the speech produced in WBN and in MMB ex-
cept at –5 dB SNR_E, where a significant increase of 6%
was observed for speech produced in MMB. In general,
these results suggest that the recognition of speech pro-
duced in noise was significantly better than that for
speech produced in quiet and that the spectral and se-
mantic content of the WBN and MMB competitors did
not appear to differentially influence the production of
speech or the subsequent perception of that speech.
The results of this study are consistent with those
of Summers et al. (1988). In that study, digits produced
in broadband noise and in quiet were presented in a
paradigm similar to the –10 dB SNR_E condition in the
present study for an average increase in recognition of
6%. Dreher and O’Neill (1957) reported increases of 27%
for spondees produced in white noise and presented at
+4 dB SNR. Junqua (1993), on the other hand, reported
significant decreases in the perception of digits and
monosyllabic and bisyllabic words produced in noise for
some listeners. Unfortunately, these results cannot be
compared directly because of insufficient information
provided by Junqua regarding methodology and statis-
tical significance. Recall, however, that Junqua reported
no significant differences in the spectral composition of
his speech materials, which may explain the discrep-
ancy between the results of that study and those of the
present study.
In general, higher recognition scores were observed
for speech produced in noise relative to speech produced
in quiet in the SNR_P condition. The increased perfor-
mance was likely due to the improved signal-to-noise
ratio provided by the large increases in vocal level. Al-
though the effects were smaller, significant increases in
recognition also were observed when the overall vocal
level of the speech samples was equated (SNR_E condi-
tion). The residual difference in performance for this
condition suggests that additional variations in the
acoustic characteristics of each target word (other than
overall vocal level) may have contributed to the observed
differences in performance. Further analyses of each
target word revealed significant correlations between
performance and two acoustic characteristics (vocal level
and spectral composition). However, the effects were
small and inconsistent. Overall, these results suggest
that there is not a simple relation between vocal level
or spectral composition of individual words and recog-
nition. Rather, recognition is more likely the result of
complex interactions between these and other acoustic
characteristics that were not examined.
Implications
The results of this study have implications for at
least two areas of clinical audiology. First, Wiley and
Page (1997) argued that, among other things, speech
perception tasks should provide results that can be ap-
plied to rehabilitation efforts, such as amplification, and
the prediction of communication difficulties in everyday
listening situations. The results of Part I suggest that
the acoustic characteristics of speech spoken in noise
are significantly different from those for speech spoken
in quiet. These characteristics, therefore, should be con-
sidered when using hearing aid prescriptive procedures.
For example, many hearing aid prescriptive methods
use the long-term spectrum of speech produced in quiet
as a reference for all incoming signals (Byrne & Dillon,
1986; Cox & Moore, 1988; Schwartz, Lyregaard, &
Lundh, 1988). Hearing aid manufacturers and others
recommend a decrease in low-frequency gain and an
increase in high-frequency gain for the best perception
of speech in noisy environments (Martin, 1996). Al-
though this practice may reduce the effects of upward
spread of masking, the results of this study suggest that
smaller adjustments may be necessary. Talkers will
naturally speak louder in noisy conditions and there-
fore reduce low-frequency and increase high-frequency
energy. If the parameters of a hearing aid are set with-
out this consideration, the acoustic properties of speech
may be overcorrected and, in some cases, perception may
actually be degraded (e.g., the hearing aid may be forced
to operate in saturation). It is important to remember,
however, that the talkers in the present study were spe-
cifically instructed to speak clearly to a listener. Whether
this is fully representative of speech in a typical noise
environment is unknown.
The results of Part II suggest that speech-recogni-
tion tasks used clinically are of limited value for predict-
ing communication difficulties in everyday situations that
involve noise or competing speech because these tasks
use speech samples recorded in quiet. The absence of a
relation between recognition and the most robust acous-
tic differences between these speech samples suggests
that it may not be possible to accurately predict speech
recognition in noise through simple modifications of
speech produced in quiet (e.g., increasing the SNR or
shaping the frequency response). Rather, these results
suggest the need to develop speech samples for recogni-
tion tests that incorporate the acoustic characteristics
of actual speaking environments, including those with
background noise. In this way, the effects of hearing loss
on speech recognition can be determined more accurately
by closely imitating common communication environ-
ments under controlled conditions.
Acknowledgments
We would like to acknowledge Ray Kent, Dolores Vetter,
Keith Kluender, and Cynthia Fowler for their insightful
contributions to this project and Patricia Stelmachowicz for
her many helpful comments on earlier versions of this
manuscript. This work was funded in part by a grant from
the Ventry and Friedrich Memorial Funds of the American
Speech-Language-Hearing Foundation.
References
Amazi, D. K., & Garber, S. R. (1982). The Lombard Sign as
a function of age and task. Journal of Speech and Hearing
Research, 25, 581–585.
Avaaz Innovations. (1995). Computerized Speech Research
Environment (CSRE) v4.5 Users Guide. London, Ontario,
Canada: Author.
Beattie, R. (1989). Word recognition functions for the CID
W-22 Test in multi-talker noise for normally hearing and
hearing-impaired subjects. Journal of Speech and Hearing
Disorders, 54, 20–32.
Byrne, D., & Dillon, H. (1986). The National Acoustic
Laboratories’ (NAL) new procedure for selecting the gain
and frequency response of a hearing aid. Ear and Hearing,
7, 257–265.
Cox, R. M., & Moore, J. N. (1988). Composite speech
spectrum for hearing aid gain prescriptions. Journal of
Speech and Hearing Research, 31, 102–107.
Dreher, J. J., & O’Neill, J. J. (1957). Effects of ambient
noise on speaker intelligibility for words and phrases.
Journal of the Acoustical Society of America, 29,
1320–1323.
Junqua, J. C. (1993). The Lombard reflex and its role on
human listeners and automatic speech recognizers.
Journal of the Acoustical Society of America, 93, 510–524.
Kalikow, D. N., Stevens, K. N., & Elliott, L. L. (1977).
Development of a test of speech intelligibility in noise using
sentence materials with controlled word predictability.
Journal of the Acoustical Society of America, 61, 1337–1351.
Kryter, K. D. (1962). Methods for calculation and use of the
articulation index. Journal of the Acoustical Society of
America, 34, 1689–1697.
Letowski, T., Frank, T., & Caravella, J. (1993). Acoustical
properties of speech produced in noise presented through
supra-aural earphones. Ear and Hearing, 14, 332–338.
Martin, R. L. (1996). How to shape amplified sound to help
patients hear in background noise. The Hearing Journal,
49, 49–50.
Roup, C. M., Wiley, T. L., Safady, S. H., & Stoppenbach,
D. T. (1998). Tympanometric screening norms for adults.
American Journal of Audiology, 7, 55–60.
Schwartz, D. M., Lyregaard, P. E., & Lundh, P. (1988,
February). Hearing aid selection for severe-to-profound
hearing loss. The Hearing Journal, 13–17.
Studebaker, G. A. (1985). A rationalized arcsine transform.
Journal of Speech and Hearing Research, 28, 455–462.
Summers, W. V., Pisoni, D. B., Bernacki, R. H., Pedlow,
R. I., & Stokes, M. A. (1988). Effects of noise on speech
production: Acoustical and perceptual analyses. Journal of
the Acoustical Society of America, 84, 917–928.
Tartter, V. C., Gomes, H., & Litwin, E. (1993). Some
acoustic effects of listening to noise on speech production.
Journal of the Acoustical Society of America, 94,
2437–2440.
Walden, B. E., Prosek, R. A., & Worthington, D. W.
(1975). The prevalence of hearing loss within selected U.S.
Army branches (Interagency No. IAO 4745, August, 31, 1-
95). Washington, DC: U.S. Army Medical Research and
Development Command.
Webster, J. C., & Klumpp, R. G. (1962). Effects of ambient
noise and nearby talkers on a face-to-face communication
task. Journal of the Acoustical Society of America, 34,
936–941.
Wiley, T. L., & Page, A. L. (1997). Summary: Current and
future perspectives on speech perception tests. In L. L.
Mendel & J. L. Danhauer (Eds.), Audiologic evaluation
and management and speech perception assessment (pp.
201–210). San Diego, CA: Singular Publishing.
Wilson, R. H., Zizz, C. A., Shanks, J. E., & Causey, G. D.
(1990). Normative data in quiet, broadband noise, and
competing message for the Northwestern University Auditory
Test No. 6 by a female speaker. Journal of Speech and
Hearing Disorders, 55, 771–778.
Received May 9, 2000
Accepted March 1, 2001
DOI: 10.1044/1092-4388(2001/038)
Contact author: Andrea Pittman, PhD, Boys Town National Research Hospital, 555 North 30th Street, Omaha, NE 68131. Email: pittmana@boystown.org
... Perceptual studies have shown that all these acoustic and articulatory changes have a positive impact on speech audiovisual intelligibility [11,7]. A significant perceptual benefit of Lombard speech, compared to conversation speech produced in quiet, has already been found in the audio domain alone, for the comprehension of words and sentences [1, 12,13,14,4,6] and more specifically, of vowels and voiced consonants [2]. Furthermore, comparison of this perceptual benefit in audiovisual and purely auditory conditions showed that visible articulatory changes in Lombard speech also contribute, in most cases, to its improved intelligibility in the visual domain (three participants in [15]; hard listening condition in [11]). ...
... At the global word and sentence level, previous studies have shown that Lombard speech is more intelligible than normal speech in both audio and audio-visual modalities [13,1,2,14,12,11,4,6,7]. This study confirms this tendency at the more specific level of vowel recognition. ...
... The most common changes found in healthy, vocally untrained subjects describe adaptations of increasing the voice sound pressure level (SPL) and fundamental frequency (ƒ o ) [26][27][28] as well as additional effects on the acoustic properties of voice production, such as pronounced amplitude modulations [29] and increases in duration and intensity of vowel production [27,30]. In particular, changes in the duration of vowels and voiced consonants were found to be typical for Lombard speech [24,31,32]. Generally, lengthened vocalizations in noisy environments were found to be interrelated with the voice SPL, particularly on stressed words [24]. ...
... The results showed that the voice SPL of the teachers increased by about 8.4 dB(A) and the mean voicing duration increased by 58 ms. The stretching of the voiced words in this study was therefore slightly longer than in Garnier and Henrich [28], but also shorter than the word duration increase of 77 ms found in Pittman and Wiley [31]. These findings indicate that the longer phonation time was mainly caused by the lengthening of the voiced parts of the words. ...
Article
Full-text available
(1) Objective: Teaching is a particularly voice-demanding occupation. Voice training provided during teachers’ education is often insufficient and thus teachers are at risk of developing voice disorders. Vocal demands during teaching are not only characterized by speaking for long durations but also by speaking in noisy environments. This provokes the so-called Lombard effect, which intuitively leads to an increase in voice intensity, pitch and phonation time in laboratory studies. However, this effect has not been thoroughly investigated in realistic teaching scenarios. (2) Methods: This study thus examined how 13 experienced, but vocally untrained, teachers behaved when reading in a noisy compared to quiet background environment. The quiet and noisy conditions were provided by a live audience either listening quietly or making noise by talking to each other. By using a portable voice accumulator, the fundamental frequency, sound pressure level of the voice and the noise as well as the phonation time were recorded in both conditions. (3) Results: The results showed that the teachers mainly responded according to the Lombard effect. In addition, analysis of phonation time revealed that they failed to increase inhalation time and appeared to lose articulation through the shortening of voiceless consonants in the noisy condition. (4) Conclusions: The teachers demonstrated vocally demanding behavior when speaking in the noisy condition, which can lead to vocal fatigue and cause dysphonia. The findings underline the necessity for specific voice training in teachers’ education, and the content of such training is discussed in light of the results.
... Consequently, the primary goal of the current study was to determine whether those forms of speech that are well-recognized in noise lead to a reduction in listening effort compared to forms that are more challenging to understand. Four forms of speech were compared: (i) "plain" speech, i.e., an unmodified natural form of speech; (ii) Lombard speech (Lombard, 1911), a naturally-enhanced form of speech resulting from speaking in the presence of noise, known to be substantially more intelligible than plain speech when presented at the same SNR (Dreher and O'Neill, 1957;Pittman and Wiley, 2001;Marcoux et al., . /fnins. . ...
Article
Full-text available
Listeners are routinely exposed to many different types of speech, including artificially-enhanced and synthetic speech, styles which deviate to a greater or lesser extent from naturally-spoken exemplars. While the impact of differing speech types on intelligibility is well-studied, it is less clear how such types affect cognitive processing demands, and in particular whether those speech forms with the greatest intelligibility in noise have a commensurately lower listening effort. The current study measured intelligibility, self-reported listening effort, and a pupillometry-based measure of cognitive load for four distinct types of speech: (i) plain i.e. natural unmodified speech; (ii) Lombard speech, a naturally-enhanced form which occurs when speaking in the presence of noise; (iii) artificially-enhanced speech which involves spectral shaping and dynamic range compression; and (iv) speech synthesized from text. In the first experiment a cohort of 26 native listeners responded to the four speech types in three levels of speech-shaped noise. In a second experiment, 31 non-native listeners underwent the same procedure at more favorable signal-to-noise ratios, chosen since second language listening in noise has a more detrimental effect on intelligibility than listening in a first language. For both native and non-native listeners, artificially-enhanced speech was the most intelligible and led to the lowest subjective effort ratings, while the reverse was true for synthetic speech. However, pupil data suggested that Lombard speech elicited the lowest processing demands overall. These outcomes indicate that the relationship between intelligibility and cognitive processing demands is not a simple inverse, but is mediated by speech type. The findings of the current study motivate the search for speech modification algorithms that are optimized for both intelligibility and listening effort.
... This can be found in Lombard speech, naturally occurring in noisy environments (Brumm and Zollinger, 2011), or in the case of clear speech style, where the speaker pays particular attention to their language production, for instance when interacting with a hearing-impaired interlocutor (Krause and Braida, 2004). Both of these compensatory mechanisms have been shown to increase intelligibility for the listener (Picheny et al., 1985;Uchanski et al., 1996;Pittman and Wiley, 2001), and could alleviate masks' potential impact on intelligibility (Smiljanic et al., 2021). ...
Article
Full-text available
Due to the global COVID-19 pandemic, covering the mouth region with a face mask became pervasive in many regions of the world, potentially impacting how people communicate with and around children. To explore the characteristics of this masked communication, we asked nursery school educators, who have been at the forefront of daily masked interaction with children, about their perception of daily communicative interactions while wearing a mask in an online survey. We collected data from French and Japanese nursery school educators to gain an understanding of commonalities and differences in communicative behavior with face masks given documented cultural differences in pre-pandemic mask wearing habits, face scanning patterns, and communicative behavior. Participants (177 French and 138 Japanese educators) reported a perceived change in their own communicative behavior while wearing a mask, with decreases in language quantity and increases in language quality and non-verbal cues. Comparable changes in their team members’ and children’s communicative behaviors were also reported. Moreover, our results suggest that these changes in educators’ communicative behaviors are linked to their attitudes toward mask wearing and their potential difficulty in communicating following its use. These findings shed light on the impact of pandemic-induced mask wearing on children’s daily communicative environment.
Article
Control of speech fulfilled by cooperation between feedforward control and feedback control. Feedforward control activates program of articulation, whereas feedback control carries acoustic and sensorimotor information about pronounced utterance. Their complementary speech control function described by the DIVA model, which based on adjustment of auditory and proprioceptive signals relatively to program of articulation in nerve centers. The inconsistency between the sensory information received via feedback and the presentation of the acoustic signal in the auditory nucleus causes corrective commands. Auditory feedback is necessary for the correct development of children’s articulatory skills, i.e. forming feedforward control. For this reason, prelingually deafened adults have significant articulation impairments due to immature articulatory skills. In postlingual deafness, the previously forming feedforward control allows pronounce phonemes successfully. However, in people with sensorineural hearing loss, control of phonation and articulation through the auditory feedback deteriorates, which expressed by an increase of voice intensity, changes in the speech spectral characteristics and instability in frequency and amplitude. Similar speech changes are found in speakers with normal hearing in the presence of noise that masks the speaker’s voice (Lombard effect). In noise, voice intensity increase, spectral characteristics of speech shift to the high-frequency region, and increase the amplitude and speed of articulatory movements (hyperarticulation). This speech reorganization is an adaptation of the speaker’s own voice to background noise, which purpose is to unmask the speech and restore auditory feedback control.
Article
Purpose Speech motor control changes underlying louder speech are poorly understood in children with cerebral palsy (CP). The current study evaluates changes in the oral articulatory and laryngeal subsystems in children with CP and their typically developing (TD) peers during louder speech. Method Nine children with CP and nine age- and sex-matched TD peers produced sentence repetitions in two conditions: (a) with their habitual rate and loudness and (b) with louder speech. Lip and jaw movements were recorded with optical motion capture. Acoustic recordings were obtained to evaluate vocal fold articulation. Results Children with CP had smaller jaw movements, larger lower lip movements, slower jaw speeds, faster lip speeds, reduced interarticulator coordination, reduced low-frequency spectral tilt, and lower cepstral peak prominences (CPP) in comparison to their TD peers. Both groups produced louder speech with larger lip and jaw movements, faster lip and jaw speeds, increased temporal coordination, reduced movement variability, reduced spectral tilt, and increased CPP. Conclusions Children with CP differ from their TD peers in the speech motor control of both the oral articulatory and laryngeal subsystems. Both groups alter oral articulatory and vocal fold movements when cued to speak loudly, which may contribute to the increased intelligibility associated with louder speech. Supplemental Material https://doi.org/10.23641/asha.24970302
Article
Full-text available
Background: There are few hearing tests in Spanish that assess speech discrimination in noise in the adult population while taking the Lombard effect into account. This study presents the design and development of a Spanish hearing test for speech in noise, the Prueba Auditiva de Habla en Ruido en Español (PAHRE). The pattern of the Quick Speech in Noise test was followed when drafting sentences with five key words each, grouped in lists of six sentences; it was necessary to take into account the differences between English and Spanish. Methods: A total of 61 people (24 men and 37 women) with an average age of 46.9 years (range 18-84 years) participated in the study. The work was carried out in two phases. In the first phase, a list of Spanish sentences was drafted and subjected to a familiarity test based on the semantic and syntactic characteristics of the sentences; as a result, a list of sentences was selected for the final test. In the second phase, the selected sentences were recorded with and without the Lombard effect, the equivalence between both lists was analysed, and the test was applied to a first reference population. Results: The results obtained allow us to affirm that the test is representative of the Spanish spoken in peninsular Spain. Conclusions: These results also point to the usefulness of the PAHRE test in assessing speech in noise by maintaining a fixed speech intensity while varying the intensity of the multi-speaker background noise. Incorporating the Lombard effect in the test reveals discrimination differences at the same signal-to-noise ratio compared to the test without the Lombard effect.
Article
Full-text available
Speakers tend to speak clearly in noisy environments, while they tend to conserve effort by shortening word duration in predictable contexts. It is unclear how these two communicative demands are met. The current study investigates the acoustic realizations of syllables in predictable vs unpredictable contexts across different background noise levels. Thirty-eight German native speakers produced 60 CV syllables in two predictability contexts in three noise conditions (reference = quiet, 0 dB and −10 dB signal-to-noise ratio). Duration, intensity (average and range), F0 (median), and vowel formants of the target syllables were analysed. The presence of noise yielded significantly longer duration, higher average intensity, larger intensity range, and higher F0. Noise levels affected intensity (average and range) and F0. Low-predictability syllables exhibited longer duration and a larger intensity range. However, no interaction was found between noise and predictability. This suggests that noise-related modifications might be independent of predictability-related changes, with implications for including channel-based and message-based formulations in speech production.
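As a rough illustration of how duration and intensity measures of this kind can be extracted from a recording, the sketch below frames a mono signal and computes frame-wise RMS intensity; the frame length, silence floor, and dB-re-full-scale reference are assumptions, and F0 or formant estimation would normally be delegated to a dedicated tool such as Praat.

```python
import numpy as np

def intensity_stats(x, fs, frame_ms=25.0, floor_db=-60.0):
    """Frame-wise RMS intensity (dB re full scale): returns mean, range, and duration.

    Frame length and the silence floor are illustrative choices, not values taken
    from the study; dB here is relative to digital full scale, not SPL.
    """
    frame_len = int(fs * frame_ms / 1000)
    n_frames = len(x) // frame_len
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    db = 20 * np.log10(np.maximum(rms, 1e-12))
    voiced = db > floor_db                        # crude speech/silence split
    duration_s = voiced.sum() * frame_ms / 1000   # duration of above-floor frames
    return db[voiced].mean(), db[voiced].max() - db[voiced].min(), duration_s

# Example with a synthetic 1 s "syllable" that ramps up in level
fs = 16000
t = np.arange(fs) / fs
x = np.linspace(0.05, 0.5, fs) * np.sin(2 * np.pi * 150 * t)
mean_db, range_db, dur = intensity_stats(x, fs)
print(round(mean_db, 1), round(range_db, 1), dur)
```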
Article
Full-text available
The purpose of this study was to reexamine the Margolis and Heller (1987) normative tympanometric data (also the American Speech-Language-Hearing Association [ASHA], 1990, interim norms) using strict control over subject age and gender. Normative values for peak, compensated static acoustic admittance (Peak Ytm), acoustic equivalent volume (Vea), and tympanometric width (TW) were determined for 102 young adults with normal hearing. Relative to the Margolis and Heller normative values, significant differences were found for Vea and TW. Although statistically significant, these differences were small and of little clinical importance. However, significant and clinically important gender differences in young adults were observed for each of the tympanometric measures. Compared to males, females had lower Peak Ytm values, smaller Vea values, and higher TW values.
Article
Acoustical analyses were carried out on a set of utterances produced by two male speakers talking in quiet and in 80, 90, and 100 dB SPL of masking noise. In addition to replicating previous studies demonstrating increases in amplitude, duration, and vocal pitch while talking in noise, these analyses also found reliable differences in the formant frequencies and short‐term spectra of vowels. Perceptual experiments were also conducted to assess the intelligibility of utterances produced in quiet and in noise when they were presented at equal S/N ratios for identification. In each experiment, utterances originally produced in noise were found to be more intelligible than utterances produced in the quiet. The results of the acoustic analyses showed clear and consistent differences in the acoustic–phonetic characteristics of speech produced in quiet versus noisy environments. Moreover, these acoustic differences produced reliable effects on intelligibility. The findings are discussed in terms of: (1) the nature of the acoustic changes that take place when speakers produce speech under adverse conditions such as noise, psychological stress, or high cognitive load; (2) the role of training and feedback in controlling and modifying a talker’s speech to improve performance of current speech recognizers; and (3) the development of robust algorithms for recognition of speech in noise.
Article
The Lombard, or voice reflex, effect results in speech with characteristics different from those of speech that is normally produced. This change in characteristics can be demonstrated to be an effective way to combat noise interference during reception. It also demonstrates the advisability of control of the production of speech by the speaker himself, in addition to that offered by equipment, during audiological evaluations. Fifteen naive speakers read words and sentences while noise was being delivered to their headsets. There were five noise conditions. Their speech was recorded, with the noise being kept out of the recording channel, and then limited. Noise was then added to the recording in such a way as to produce a constant speech-to-noise ratio. The result was played to 200 American listeners. Results indicate that, at a constant speech-to-noise ratio of reception, speech produced by a talker with masking noise in his ears becomes more intelligible as the masking level rises to a given value. The change in intelligibility throughout the range investigated suggests an application to audiological testing as well as a device for use in voice communication.
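The mixing step described here (adding noise to a clean recording so that every item is presented at the same speech-to-noise ratio) can be sketched as follows; the RMS-based level estimate and the small guard constant are implementation assumptions, not details from the study.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise ratio of the mix equals `snr_db`."""
    noise = np.resize(noise, speech.shape)               # loop/trim noise to match length
    speech_rms = np.sqrt(np.mean(speech ** 2))
    noise_rms = np.sqrt(np.mean(noise ** 2)) + 1e-12     # guard against silent noise
    target_noise_rms = speech_rms / (10 ** (snr_db / 20))
    return speech + noise * (target_noise_rms / noise_rms)

# Example: present every recording at the same +5 dB speech-to-noise ratio,
# regardless of the vocal level at which it was originally produced.
rng = np.random.default_rng(0)
speech = 0.1 * rng.standard_normal(16000)   # stand-in for a recorded word
babble = rng.standard_normal(16000)         # stand-in for the masking noise
mixed = mix_at_snr(speech, babble, snr_db=5.0)
```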
Article
From 1 to 5 talker-listener pairs, talkers seated shoulder-to-shoulder on one side of a table with listeners on the other, communicated word lists in conditions of quiet and ambient thermal noise levels of 65, 75, and 85 dB. Each talker read one word at a time to his listener-partner, who repeated back each word for verification by the talker. Talker-listener pairs were instructed to maintain an accuracy of 90% or better. For the lower ambient levels, the speech level of a central pair increased about 5 dB for an additional 10 dB of noise or for each doubling of the number of pairs around them. The rate of utterance decreased with noise but showed no clear-cut pattern of change as the number of additional talkers was varied. Accuracy of communication was, on the average, 94% and was never below 84%. Communication errors defy simple description, but in general (1) for a constant noise level, increasing the number of talkers results in increasing errors; and (2) for 3 or fewer talker-listener pairs, percent error does not increase until the ambient-noise level reaches 85 dB.
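A toy calculation of the reported trend (roughly 5 dB more vocal output per additional 10 dB of noise or per doubling of the number of pairs) is sketched below; the baseline level and the additive combination of the two effects are assumptions made purely for illustration.

```python
import math

def predicted_speech_level(noise_db, n_pairs, base_db=60.0, ref_noise_db=65.0):
    """Illustrative extrapolation of the reported ~5 dB-per-10 dB / per-doubling trend."""
    noise_term = 0.5 * max(noise_db - ref_noise_db, 0)   # ~5 dB per 10 dB of added noise
    pair_term = 5.0 * math.log2(max(n_pairs, 1))         # ~5 dB per doubling of pairs
    return base_db + noise_term + pair_term

# One pair in 65 dB noise vs. four pairs in 85 dB noise
print(predicted_speech_level(65, 1))   # 60.0 dB (assumed baseline)
print(predicted_speech_level(85, 4))   # 60 + 10 + 10 = 80.0 dB
```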
Article
Speech-intelligibility testing is an expensive and time-consuming operation that requires laboratory test conditions. In an attempt to short-cut or make unnecessary this type of testing, a procedure was developed by French and Steinberg [J. Acoust. Soc. Am. 19, 90–119 (1947)] for calculating, from physical and acoustical measurements made on a communication system, a measure that is indicative of the intelligibility scores that would be obtained for that system under actual test conditions. This measure is called the "Articulation Index" (AI). Methods of calculating AI have been improved and elaborated to the point where several methods for its calculation, herein reported, can be proposed for use in the evaluation of most speech communication systems.
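A simplified sketch of a band-weighted Articulation Index computation in the spirit of this procedure is shown below; it is not the full Kryter method, and the example band levels, equal importance weights, and 30 dB audibility range are placeholder assumptions.

```python
import numpy as np

def articulation_index(speech_band_db, noise_band_db, weights=None):
    """Sum of band-importance weights times per-band audibility.

    Audibility in each band is the speech-to-noise ratio clipped to a 0-30 dB
    range and scaled to 0-1. Real AI procedures use tabulated band-importance
    weights and speech-peak corrections; equal weights here are a placeholder.
    """
    speech = np.asarray(speech_band_db, dtype=float)
    noise = np.asarray(noise_band_db, dtype=float)
    if weights is None:
        weights = np.full(speech.shape, 1.0 / speech.size)
    audibility = np.clip(speech - noise, 0.0, 30.0) / 30.0
    return float(np.sum(weights * audibility))

# Example: five octave-band levels (dB) for speech and a masking noise
speech_bands = [62, 60, 55, 48, 40]
noise_bands = [50, 52, 50, 45, 42]
ai = articulation_index(speech_bands, noise_bands)   # ~0.19 on a 0-1 scale
```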
Article
The purpose of this investigation was to derive estimates of the prevalence of hearing loss within U.S. Army branches suspected to be high-risk with regard to hearing loss. Questionnaire data were obtained from high-risk personnel concerning their opinions of their hearing ability, hearing protective devices, and exposure to hazardous noises. Audiometric and questionnaire data were obtained from 3000 enlisted men representing three combat branches (i.e., infantry, armor, artillery) and five time-in-service categories. Subjects were selected at random, in proportion to population sizes, from ten Army posts. All of the data gathering was accomplished by the Audiology Officer(s) assigned to each post. The results suggest that the prevalence of hearing loss is approximately the same in the infantry, armor, and artillery branches. In contrast, there are substantial differences in the prevalence of hearing loss according to length of time in service. Further, the problem of premature hearing loss among U.S. Army troops affects only the mid- to high-frequency range in the majority of soldiers, with speech-reception thresholds and speech discrimination in quiet frequently remaining within normal limits even in advanced cases of noise-induced hearing loss. A comparison of reported profiles and profiles based upon the audiometric data suggests that many soldiers do not appear to carry the appropriate profile for hearing.
Article
This paper describes a test of everyday speech reception, in which a listener's utilization of the linguistic-situational information of speech is assessed, and is compared with the utilization of acoustic-phonetic information. The test items are sentences which are presented in babble-type noise, and the listener response is the final word in the sentence (the key word) which is always a monosyllabic noun. Two types of sentences are used: high-predictability items for which the key word is somewhat predictable from the context, and low-predictability items for which the final word cannot be predicted from the context. Both types are included in several 50-item forms of the test, which are balanced for intelligibility, key-word familiarity and predictability, phonetic content, and length. Performance of normally hearing listeners for various signal-to-noise ratios shows significantly different functions for low- and high-predictability items. The potential applications of this test, particularly in the assessment of speech reception in the hearing impaired, are discussed.