Andrea L. Pittman
Terry L. Wiley
University of Wisconsin–Madison
A two-part study examined recognition of speech produced in quiet and in noise
by normal-hearing adults. In Part I, 5 women produced 50 sentences consisting of
an ambiguous carrier phrase followed by a unique target word. These sentences
were spoken in three environments: quiet, wide band noise (WBN), and meaning-
ful multi-talker babble (MMB). The WBN and MMB competitors were presented
through insert earphones at 80 dB SPL. For each talker, the mean vocal level,
long-term average speech spectra, and mean word duration were calculated for
the 50 target words produced in each speaking environment. Compared to quiet,
the vocal levels produced in WBN and MMB increased an average of 14.5 dB.
The increase in vocal level was characterized by increased spectral energy in the
high frequencies. Word duration also increased an average of 77 ms in WBN
and MMB relative to the quiet condition. In Part II, the sentences produced by one
of the 5 talkers were presented to 30 adults in the presence of multi-talker babble
under two conditions. Recognition was evaluated for each condition. In the first
condition, the sentences produced in quiet and in noise were presented at equal
signal-to-noise ratios (SNR_E). This served to remove the vocal level differences
between the speech samples. In the second condition, the vocal level differences
were preserved (SNR_P). For the SNR_E condition, recognition of the speech
produced in WBN and MMB was on average 15% higher than that for the
speech produced in quiet. For the SNR_P condition, recognition increased an
average of 69% for these same speech samples relative to speech produced in
quiet. In general, correlational analyses failed to show a direct relation between
the acoustic properties measured in Part I and the recognition measures in Part II.
KEY WORDS: speech perception, speech acoustics, background noise,
competing message
Recognition of Speech Produced
in Noise
Journal of Speech, Language, and Hearing Research
• Vol. 44 • 487–496 • June 2001 • ©American Speech-Language-Hearing Association
1092-4388/01/4403-0487
The presence of a competing acoustic signal during communication
often interferes with the perception of speech. This is particularly
true for persons with sensorineural hearing loss who are more
susceptible to the deleterious effects of noise (Walden, Prosek, &
Worthington, 1975). Studies have shown that word recognition in noise
or competing message can differ for listeners with and without hearing
loss as well as for listeners with different types and degrees of hearing
loss (Beattie, 1989; Walden, Demorest, & Hepler, 1984; Wilson, Zizz,
Shanks, & Causey, 1990). Although these differences represent the in-
fluence of noise on the perception of speech, they do not clarify the influ-
ence of noise on the production of speech or the subsequent perception of
that speech. It is well-established that the acoustic properties of speech
produced in noise are significantly different from those produced in quiet
(Amazi & Garber, 1982; Junqua, 1993; Letowski, Frank, & Caravella,
1993; Summers, Pisoni, Bernacki, Pedlow, & Stokes, 1988; Tartter, Gomes,
& Litwin, 1993; Webster & Klumpp, 1962). The relation between these
properties and the perception of speech, however, re-
mains unclear.
Speech Production in Noise
The acoustic characteristics of speech produced in
noise typically are determined by recording speech
stimuli from a single talker in a quiet environment and
again in the presence of noise or competing message.
Although many acoustic characteristics have been ex-
amined, increases in vocal level, changes in spectral
composition, and increases in word duration have been
reported most consistently. Summers et al. (1988) ex-
amined the acoustic properties of the digits “one” through
“nine” produced by two men in quiet and in three levels
of noise. Significant increases in vocal levels were ob-
served for each talker in each noise level (an average of
4.5, 6.0, and 6.9 dB in 80, 90, and 100 dB SPL noise,
respectively). The slope of a regression line, fitted to the
data points of an amplitude-by-frequency analysis of the
speech stimuli, was significantly steeper for speech pro-
duced in noise than in quiet. The steeper slope indicated
a significant increase in amplitude for higher frequen-
cies relative to the lower frequencies for speech spoken
in noise. The mean word duration for each talker also
increased significantly, from 461 ms in quiet to 524 ms
in 80 dB SPL white noise, and increased further with
each increase in noise level.
Tartter et al. (1993) examined the vocal levels of
two women who produced the digits “zero” through
“nine” in the presence of white noise. They reported an
average increase of 1.0, 2.6, and 3.7 dB in 35, 60, and 80
dB SPL noise, respectively. As in the Summers et al.
(1988) study, the slope of an amplitude-by-frequency
analysis was calculated for each of the speech samples.
Significant increases in high-frequency energy were re-
ported for the 60 dB SPL noise condition relative to the
lower noise level of 35 dB SPL. A significant increase in
word duration (from an average of 343 ms in quiet to
530 ms in 80 dB SPL noise) also was reported.
Junqua (1993) examined the vocal levels of five men
and five women who produced several subsets of speech
materials (digits, monosyllabic words, bisyllabic words,
and letters) in 85 dB SPL white noise. Average vocal
level increases of 18.2 and 12.6 dB were reported for the
men and women talkers, respectively. No significant
shifts in spectral composition were found. An increase
in phoneme duration also was reported, although no
values were provided.
Letowski et al. (1993) evaluated the vocal levels of
running speech produced by five men and five women in
the presence of multi-talker babble, traffic noise, and wide
band noise presented at 70 and 90 dB SPL. They reported
significant increases in vocal level between quiet and both
noise levels; however, no significant differences in vocal
levels were found across the three noise types. An analy-
sis of the amplitude of 20 frequencies taken from the
long-term spectrum of speech indicated significantly
larger increases in amplitude for frequencies ≥630 Hz.
A measure of words per minute revealed no significant
differences in speech rate between the quiet and the
three noise conditions. Although no significant differ-
ences in the acoustic characteristics of running speech
were found for the speech produced in each competitor,
long-term spectral analyses may not have been sensi-
tive to changes in individual words—particularly those
important for perception. Materials produced in com-
petitors that differ in spectral and semantic content (e.g.,
wideband noise vs. multi-talker babble) may not differ
acoustically over the long term, although it is possible
that differences may be observed for individual words.
If so, the perception of speech produced in noise may be
influenced by these acoustic changes.
The results of these studies suggest that (a) both men
and women increase their vocal levels as a function of
noise level; (b) the amplitude of mid- to high-frequency
energy increases more than that for lower frequencies;
(c) speech rates of men and women are similar in noise;
and (d) vocal level, spectral composition, and word dura-
tion do not appear to be influenced by the spectral con-
tent of the noise when measured over the long term.
Perception of Speech Produced
in Noise
The recognition of speech produced in quiet and in
noise has been compared in only a few published stud-
ies and with conflicting results (Dreher & O’Neill, 1957;
Junqua, 1993; Summers et al., 1988). Junqua (1993) reported significant decreases in the recognition of digits,
monosyllabic words, bisyllabic words, and letters pro-
duced in 85 dB SPL white noise relative to the same
stimuli produced in quiet. The Junqua report, however,
did not provide details regarding how the stimulus levels
were set for the various conditions.
Dreher and O’Neill (1957), on the other hand, re-
ported significantly higher recognition scores (an aver-
age of 27%) for spondees spoken in 70 dB SPL white
noise than for the same spondees spoken in quiet. Sum-
mers et al. (1988) also reported significantly higher rec-
ognition scores (an average of 6%) for monosyllabic dig-
its produced in 90 dB SPL white noise than for the same
digits produced in quiet. It is important to note that
stimulus levels in the Dreher and O’Neill (1957) study
were not equalized before presentation, unlike the stimu-
lus levels in the Summers et al. (1988) report. This may
account, in part, for the difference in recognition scores
across these two studies.
In summary, there are few data available regarding
the recognition of speech spoken in noise even though a
considerable portion of everyday communication takes
place in the presence of a competitor. Further, the out-
comes of these studies have been ignored in terms of
clinical applications. If speech recognition is influenced
by speech production, which is in turn influenced by a
competing noise, this would be an important consideration for the face validity of speech-recognition measures
used clinically. Although previous studies describe the
differences in recognition between speech produced in
quiet and in noise, it is not clear whether the magni-
tude of the differences warrants the use of environment-
specific speech materials in a clinical setting. The most
important consideration is whether recognition in noise
is significantly underestimated using currently avail-
able speech materials. The studies reviewed above indi-
cate that recognition of speech produced in quiet is gen-
erally poorer than for speech produced in noise.
Unfortunately, those differences were evaluated only for
a small number of stimuli not typically used in an au-
diological evaluation and were limited to a noise back-
ground not typical of everyday communication.
This study determined the recognition of speech
produced in quiet and in two types of noise. In Part I,
speech samples spoken in quiet and in two noise condi-
tions were used to determine if the type of noise signifi-
cantly influenced production. In Part II, the speech
samples from one talker (exhibiting the average acous-
tic characteristics of speech spoken in noise) were se-
lected and presented to a group of listeners under two
listening conditions. In the first condition, the vocal level
differences between the samples were removed by pre-
senting each at a signal-to-noise ratio (SNR) that
equated the overall presentation level. This determined
the degree to which recognition may be underestimated
in clinically derived measures. In the second condition,
the same speech samples were presented with the vocal
level differences preserved. Recognition in this condi-
tion may more accurately reflect the perception of speech
in noisy environments. The recognition scores from these
two conditions were then analyzed with respect to the
acoustic characteristics measured in Part I to determine
the influence of these characteristics on perception.
Part I: Development of Speech
Material
Method
Participants
Five women between the ages of 19 and 28 years par-
ticipated as talkers. All had hearing thresholds ≤20 dB
HL at audiometric frequencies 0.25, 0.5, 1, 2, 4, and 8 kHz
in each ear and normal middle-ear function as determined
by tympanometry (Roup, Wiley, Safady, & Stoppenbach,
1998). All five women were native speakers of American
English with no noticeable regional dialects.
Materials
Fifty low-predictability (LP) sentences from the
Speech in Noise (SPIN) Test were used (Kalikow,
Stevens, & Elliot, 1977). These sentences consisted of a
unique, although ambiguous, carrier phrase (e.g., “He
would not think about the…”) followed by a unique tar-
get word (crack). The structure of the sentences provided
no contextual information with which to predict the fi-
nal target word during the recognition task (Part II).
This required the listener to rely on the acoustic infor-
mation rather than the semantic content of the sentence.
Five practice sentences began each list to allow the talker
to adjust to each speaking environment. Three random-
izations of the 50 sentences were constructed, one for
each speaking environment.
Procedure
Each talker was seated in a sound-treated room with
a head-worn microphone (Shure, SM10A) placed 1 inch
from the lips, out of the breath stream. Each talker read
the 50 LP sentences first in quiet, then in the presence of
wide band noise (WBN), and again in the presence of
meaningful multi-talker babble (MMB).¹ The competitors were delivered binaurally at 80 dB SPL through insert earphones (Etymotic, ER-3A). The WBN was generated by an audiometer (GSI, 16), and the MMB was
routed through the audiometer from a cassette tape
player (Nakamichi, CR-2A). The presentation level and
spectra of each competitor were confirmed for both insert
earphones with acoustic measures in a 2-cc coupler. The
earphones were removed for the quiet speaking environ-
ment. The overall noise level in the sound-treated room
in the quiet environment was 16 dB SPL.
To encourage each talker to speak in a manner that
would maximize recognition, an assistant wearing head-
phones was seated outside the window of the sound booth
and instructed to write the final word of each sentence.
Each talker was told that the listener was unable to see
the features of her face and was instructed to speak
clearly, to read the sentences in order, and to wait for
the listener to look up from the response sheet before
proceeding. The talker was unable to see the written
responses. Unknown to the talker, all sentences were
¹The MMB competitor contained independent conversations of three men
and three women recorded separately and then mixed to produce a multi-
talker competitor. Semantic information was preserved in that portions of
each conversation could be selectively followed. It was produced by G.
Donald Causey in 1979 at the Biocommunications Laboratory at the
University of Maryland.
digitally recorded (Tascam, DA-P1) at a 44.1 kHz sam-
pling rate for later analyses. It was felt that the talker
might artificially alter her vocal effort if she were aware
of the recording. Each talker was informed of the re-
cording at the completion of the session, and each agreed
to have her speech samples included in the study.
Acoustic Analyses
The sentences produced by each talker in each
speaking environment were low-pass filtered at 10 kHz
and digitized using a 16-bit A/D converter. The target
word within each of the 50 sentences was extracted, con-
catenated, and saved in 15 separate speech samples
(5 talkers × 3 speaking environments). The boundaries
of each target word were visually determined using a
digital audio editor (Syntrillium Software Corp.,
CoolEdit). Using digital signal processing techniques,
long-term-average speech spectra (LTASS) were measured in 1/3-octave bands for each 50-word speech sample. A 1000-Hz reference tone of a known SPL and voltage was pre-recorded on each digital audiotape and used to calculate the level of each 1/3-octave band as well as the overall vocal level (in dB SPL). To describe the spectral composition of each speech sample with a single number, the slope of a regression line (in dB SPL/kHz) was fitted to 14 of 15 data points representing the amplitude of each 1/3-octave band frequency. The 15th frequency band was not included because of the limited bandwidth of the earphones used in Part II of this study. The average duration (in ms) was calculated for the 50 words measured in each speech sample.
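The band-level and slope analysis described above can be sketched in a few lines of numpy. This is a minimal illustration, not the authors' actual toolchain: the FFT-based band summation and the 15-band set (roughly 0.2 to 5 kHz) are assumptions, since the paper does not list its band centers.

```python
import numpy as np

def third_octave_levels(x, fs):
    """Sum FFT power within 1/3-octave bands and return band levels in dB.
    Band centers follow the base-2 convention around 1 kHz; the 15-band
    span (~0.2-5 kHz) is an assumption, not taken from the paper."""
    spec = np.abs(np.fft.rfft(x)) ** 2 / len(x)          # power spectrum
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    centers = 1000.0 * 2.0 ** (np.arange(-7, 8) / 3.0)   # 15 bands
    levels = []
    for fc in centers:
        lo, hi = fc * 2 ** (-1 / 6), fc * 2 ** (1 / 6)   # band edges
        band = spec[(freqs >= lo) & (freqs < hi)]
        levels.append(10 * np.log10(band.sum() + 1e-20))
    return centers, np.array(levels)

def spectral_slope(centers_hz, levels_db, n_bands=14):
    """Slope (dB/kHz) of a regression line fitted to the lowest n_bands
    band levels -- the paper's single-number spectral summary, which
    drops the 15th band."""
    f_khz = centers_hz[:n_bands] / 1000.0
    slope, _intercept = np.polyfit(f_khz, levels_db[:n_bands], 1)
    return float(slope)
```

A 1-kHz calibration tone of known SPL would be analyzed the same way to convert these relative band levels to absolute dB SPL.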
Results and Discussion
The LTASS for each talker and speaking environ-
ment, as well as the spectra calculated for all five talk-
ers (lower right panel), are shown in Figure 1. The solid,
dashed, and dotted lines represent speech produced in
quiet, WBN, and MMB, respectively. The bottom right
panel shows the average spectra of each speaking envi-
ronment for all five talkers. In general, the spectra for
speech produced in both WBN and MMB exhibited
higher overall levels than the spectra for speech pro-
duced in quiet. Talkers 4 and 5 exhibited the smallest
differences in level between the speaking environments,
whereas Talkers 2 and 3 exhibited the largest differ-
ences. Small differences between the spectra for the
WBN and MMB speaking environments are apparent
for Talkers 2, 3, and 5, but not for Talkers 1 and 4.
Vocal Levels
The vocal levels for each talker and speaking condi-
tion are shown in the top panel of Figure 2. Also shown
are the mean vocal levels (and one standard deviation)
for each speaking environment. Relative to quiet, vo-
cal levels increased an average of 14.5 dB in noise. A
one-way ANOVA with repeated measures and planned
orthogonal contrasts confirmed that the vocal levels of
the speech spoken in noise were significantly higher
than those spoken in quiet [F(2, 8) = 17.7, p = 0.001, ω² = 0.47; Quiet vs. WBN: t²Dunn(3, 8) = 26.4, p < 0.001; Quiet vs. MMB: t²Dunn(3, 8) = 26.4, p < 0.001]. The vocal
levels of speech produced in WBN and MMB did not
differ significantly. These results are consistent with
those of Junqua (1993), who reported an average in-
crease of 15 dB for speech produced in 85 dB SPL white
noise. These results are substantially higher than the 4.5 dB increase in 80 dB SPL white noise reported by Summers et al. (1988) and the 3.7 dB increase reported by
Tartter et al. (1993). The reason for these differences in
vocal level is unclear but may reflect individual differ-
ences among talkers. It is possible that the two talkers in
each study exhibited small increases in vocal level simi-
lar to Talkers 4 and 5 in the present study (9 dB in 80 dB
SPL WBN).
The absolute vocal levels found in the present study
also are somewhat higher than those reported in previ-
ous studies. This is likely due to differences in the dis-
tance of the talker from the recording microphone. For
example, the microphone in the present study was posi-
tioned 1 inch from the talker’s mouth, whereas in
Letowski et al. (1993) and Summers et al. (1988), the
microphones were 12 and 4 inches from the talkers, re-
spectively. Using the inverse square law to estimate vocal
levels at a microphone distance of 1 inch, the levels in
quiet for the Letowski and Summers studies are equiva-
lent to 84 and 71 dB SPL, respectively, which are some-
what similar to the average vocal level of 82 dB SPL in
the present study.
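The inverse-square adjustment works out to adding 6 dB per halving of microphone distance. The helper below is a hypothetical illustration of that rule under a free-field assumption, not code from the study:

```python
import math

def level_at_distance(level_db, d_measured, d_target):
    """Estimate SPL at d_target from an SPL measured at d_measured,
    assuming free-field inverse-square behavior (+6 dB per halving
    of distance)."""
    return level_db + 20.0 * math.log10(d_measured / d_target)

# Correction added when referring a 12-inch measurement to 1 inch:
print(round(20.0 * math.log10(12.0 / 1.0), 1))  # → 21.6 dB
```

Applying the same correction to a 4-inch measurement adds about 12 dB, which is how the Letowski and Summers quiet levels were made comparable to the 1-inch levels here.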
Spectral Composition
The slope values for each speaking condition are
shown in the middle panel of Figure 2 as a function of
talker. Also shown are the mean slope values (and one
standard deviation) for each speaking environment. The
positive slope values for the two noise environments
indicate an increase in high-frequency energy for speech
produced in noise. For example, the mean amplitude at
2.5 kHz for all talkers shown in Figure 1 increased an
average of 18 dB compared to an average increase of
only 7 dB at 0.2 kHz. A one-way ANOVA with repeated
measures revealed a significant difference between the
slope values for the speech samples produced in each
speaking environment [F(2, 8) = 19.338, p < 0.001, ω² = 0.48].

Figure 1. Long-term average spectra of speech spoken in quiet, in wide band noise (WBN), and in meaningful multi-talker babble (MMB) for each of the five talkers, with the combined spectra of all five talkers in the lower right panel.

Planned orthogonal contrasts revealed a significant difference between the values for speech produced in quiet and in both noise environments [Quiet vs. WBN: t²Dunn(3, 8) = 33.375, p < 0.001; Quiet vs. MMB: t²Dunn(3, 8) = 23.838, p < 0.001]. However, no difference was found between the slope values for the speech produced in WBN and in MMB [t²Dunn(3, 8) = 0.801, p = 0.528]. These
results are consistent with Summers et al. (1988) and
Tartter et al. (1993), who also reported significant in-
creases in high-frequency energy for speech spoken in
noise. The lack of differences between WBN and MMB
in vocal level and slope are consistent with Letowski et
al. (1993), who suggested that the overall level of a com-
petitor, rather than its spectral content, determines
changes in the acoustic characteristics of speech.
Target Word Duration
The average target word durations for each speak-
ing environment are shown by talker in the bottom
panel of Figure 2. Also shown are the mean word dura-
tions (and one standard deviation) for each speaking
environment. Relative to speech spoken in quiet, tar-
get word duration increased an average of 88 and 65
ms in WBN and in MMB, respectively. A one-way
ANOVA with repeated measures revealed significant
differences in word duration for the speech spoken in
quiet and in noise [F(2, 8) = 6.7, p = 0.021, ω² = 0.20].
Planned orthogonal contrasts revealed significantly
longer target word durations for the speech spoken in
noise relative to speech in quiet [Quiet vs. WBN: t²Dunn(3, 8) = 12.2, p = 0.002; Quiet vs. MMB: t²Dunn(3, 8) = 6.7, p = 0.014], although no difference in word duration was observed between WBN and MMB [t²Dunn(3, 8) = 0.8, p = 0.527]. These results are consistent with Summers et
al. (1988), who reported an average increase of 60 ms in
word duration under similar conditions, but are some-
what shorter than the 185-ms increase reported by
Tartter et al. (1993).
To determine whether the insert earphones caused
an occlusion effect that was not present in the quiet
speaking environment, Talker 1 was asked to return for
further testing. She read 10 of the original 50 sentences
under three conditions: (1) wearing the insert earphones
with no noise input, (2) wearing the insert earphones
with 80 dB SPL of WBN, and (3) in quiet without the
insert earphones. The sentences were analyzed as de-
scribed above. Although the acoustic characteristics of
the speech spoken in 80 dB SPL noise were similar to
those measured previously for this talker, no significant
differences were found between the speech spoken in
the two quiet environments. This suggests that the in-
sert earphones did not create an occlusion effect that
might have affected speech production.
In summary, the acoustic characteristics of the
speech materials in the present study appear to be
consistent with those of previous studies. Relative to
speech spoken in quiet, speech spoken in noise demon-
strated significant increases in vocal level, spectral slope,
and word duration. The speech samples of Talker 1 were
used for a recognition task described in Part II because
the acoustic characteristics of her speech were closest
to the average of all five talkers. In addition, the sen-
tences produced by this talker contained no errors,
whereas the other four talkers occasionally misread one
or two nontarget words.
Part II: Recognition
Part II of this study compared the recognition of
speech produced in quiet and in noise. Like previous
studies of this kind, the speech samples produced in quiet
and in noise were presented at SNRs that equated over-
all vocal levels (SNR
E
). Unlike previous studies, the
speech samples also were presented at SNRs that pre-
served these vocal-level differences (SNR
P
). In this way,
the influence of the spectral and temporal changes in
the speech stimuli could be evaluated independent of,
and then in combination with, the additional contribu-
tion of increased vocal level. Based on the work of Sum-
mers et al. (1988) and Dreher and O’Neill (1957), one
would expect higher recognition scores for the speech
produced in WBN and MMB than for the speech pro-
duced in quiet. In addition, one would expect no differ-
ence in recognition between the speech produced in WBN
and MMB, because no significant acoustic differences
were observed.
Method
Participants
Twenty-seven women and 3 men between the ages of 18 and 30 years served as listeners. Each participant had hearing thresholds in the test ear ≤10 dB HL at audiometric frequencies 0.25, 0.5, 1, 2, and 4 kHz and ≤15 dB HL at 8 kHz. Hearing levels in the nontest ear were ≤20 dB HL at audiometric frequencies 0.25 through 8 kHz. The ear with the lowest thresholds was chosen as the test ear. In cases of equal thresholds in both ears, the test ear was alternated across listeners. All listeners exhibited normal middle ear function bilaterally based on tympanometry (Roup et al., 1998).

Figure 2. Mean vocal levels in dB SPL (top panel), spectral slope in dB SPL/kHz (middle panel), and word duration in ms (bottom panel) for the 50 target words produced in quiet, in wide-band noise (WBN), and in meaningful multi-talker babble (MMB) for each talker. Group means and ±1 standard deviation are shown for each speaking environment at the right.
Speech Materials
The 50 sentences produced in quiet and in the two
noise conditions by Talker 1 were digitally extracted from
the original recording to remove extraneous utterances.
The sentences within each condition were randomized
and recorded onto a compact disk at a sampling rate of
22.05 kHz. A 4-s gap was inserted between each sen-
tence to allow time for a written response. Two 1-kHz
calibration tones also were recorded. The first was equal
in average RMS level to the sentences produced in quiet,
and the second was equal to the average RMS level of
the speech produced in WBN and MMB. Separate cali-
bration tones were not necessary for the WBN and MMB
sentences, because the overall level of the two samples
differed by less than 1 dB. No attempt was made to
equalize the RMS levels of the target words within each
sentence. This enabled preservation of the acoustic char-
acteristics unique to each speaking environment, includ-
ing variations in vocal level.
Procedure
Each 50-sentence speech sample was presented with
the MMB competitor at 0, –5, and –10 dB SNRs. The
level of the speech remained constant, and the level of
the competitor changed according to the SNR. There were
two listening conditions. In the first condition (SNR_E),
the levels of the three 50-sentence speech samples were
equated by presenting each at the same SNR. The rec-
ognition scores would therefore reflect the influence of
all acoustic differences between the samples, except vocal level. In the second condition (SNR_P), the level differences between the three 50-sentence speech samples
were preserved so that recognition scores would reflect
the influence of all the acoustic differences, including
vocal level. This was accomplished by setting the noise
level equal to that of the speech produced in quiet and
then presenting the speech produced in WBN and MMB
11 dB higher, which is equivalent to the increase in vo-
cal level for this talker. Presentation of the speech ma-
terial at 0 and –5 dB SNRs was discontinued for the
SNR_P condition after the results of the first five listeners revealed ceiling effects.
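The mixing procedure (speech level held fixed, competitor scaled to reach each target SNR) can be sketched as follows. `mix_at_snr` is a hypothetical helper based on overall RMS levels, which is an assumption about how SNR was computed:

```python
import numpy as np

def rms(x):
    """Root-mean-square level of a signal."""
    return np.sqrt(np.mean(np.square(x)))

def mix_at_snr(speech, noise, snr_db):
    """Return speech + noise with the noise scaled so that
    20*log10(rms(speech)/rms(noise)) equals snr_db. The speech level
    stays fixed, as in the procedure (competitor level set per SNR)."""
    gain = rms(speech) / (rms(noise) * 10.0 ** (snr_db / 20.0))
    return speech + gain * noise
```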
Each 50-sentence sample was presented monau-
rally through a TDH-50 earphone at 60 dB SL relative
to the pure-tone threshold at 1 kHz. Each participant
was instructed to write the final word of each of the
sentences. Testing was conducted in 2 one-hour sessions.
The first five listeners responded to a total of 18 lists of
50 sentences each (3 speech samples × 3 SNRs × 2 lis-
tening conditions). The number of lists was reduced to
12 when the 0 and –5 dB SNRs were discontinued from
the SNR_P condition. To reduce learning effects, the
speech samples, SNRs, and listening conditions were
randomized; and each participant was familiarized with
the sentences at a +30 dB SNR before testing. Recogni-
tion scores were examined to confirm that performance
did not improve significantly between the first and last
presentation of the sentences.
Results and Discussion
Part II was conducted to determine if speech pro-
duced in quiet and in the two noise environments dif-
fered in terms of recognition. Mean scores for the SNR_E and SNR_P conditions are shown in Figure 3 as a function of SNR and listening condition. The data from one of the 30 listeners were corrupted for the 0 and –10 dB SNR_E conditions and could not be used (indicated with
asterisks). Before statistical analyses, scores were trans-
formed into rationalized arcsine units (RAU) so that the
variances would be homogenous across the range of
scores (Studebaker, 1985).
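Studebaker's (1985) rationalized arcsine transform can be written directly from the published formula; a minimal sketch:

```python
import math

def rau(num_correct, num_items):
    """Rationalized arcsine units (Studebaker, 1985): an arcsine
    transform rescaled so values roughly track percent correct while
    homogenizing variance near 0% and 100%."""
    theta = (math.asin(math.sqrt(num_correct / (num_items + 1)))
             + math.asin(math.sqrt((num_correct + 1) / (num_items + 1))))
    return (146.0 / math.pi) * theta - 23.0

print(round(rau(25, 50), 1))  # 50% correct on a 50-item list → 50.0 RAU
```

Near the middle of the range, RAU values are close to percent correct; the transform mainly stretches the extremes, where raw percentage scores are compressed.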
Figure 3. Mean recognition in percent correct (and ±1 standard deviation) for speech produced in quiet, in wide-band noise (WBN), and in meaningful multi-talker babble (MMB) presented under two listening conditions: SNR_P, for which vocal level differences were preserved, and SNR_E, for which vocal level differences were removed. Asterisks indicate those conditions with only 29 listeners; all other values were obtained for 30 listeners.

For the SNR_E condition, recognition of speech produced in noise was an average of 15% higher (at –5 dB SNR) than that of the speech produced in quiet. A one-way
ANOVA with repeated measures revealed significant dif-
ferences in recognition for the speech samples at each
SNR [0 dB SNR: F(2, 56) = 7.4, p = 0.001, ω² = 0.07; –5 dB SNR: F(2, 58) = 27.5, p < 0.001, ω² = 0.11; –10 dB SNR: F(2, 56) = 21.0, p < 0.001, ω² = 0.13]. Planned orthogonal contrasts, listed in Table 1, revealed significantly better recognition for both samples of speech spoken in noise compared to speech spoken in quiet (p <
0.01). No significant differences were found for the
speech spoken in WBN and MMB at 0 and –10 dB SNR.
However, recognition of MMB was significantly higher
than that of WBN at –5 dB SNR (p = 0.05).
In the SNR_P condition, differences in recognition
performance were greatest at –10 dB SNR; scores for
speech spoken in noise were an average of 69% higher
than those for speech spoken in quiet. A one-way ANOVA
with repeated measures revealed significant differences
among the three speech samples [F(1.7, 48.2) = 529.7,
p < 0.001, ω² = 0.78; degrees of freedom were adjusted to compensate for a lack of sphericity²]. Planned orthogonal contrasts revealed significantly higher recognition scores for speech spoken in WBN and MMB than for speech spoken in quiet [Quiet vs. WBN: t²Dunn(3, 58) = 753.5, p < 0.001; Quiet vs. MMB: t²Dunn(3, 58) = 833.3, p < 0.001], although no differences in recognition scores were found between WBN and MMB [t²Dunn(3, 58) = 2.0, p = 0.123].
Because recognition scores were higher for speech spoken in WBN and MMB than for speech spoken in quiet, additional analyses were performed to determine whether the acoustic characteristics measured in Part I influenced performance. Specifically, the percentage of listeners able to correctly identify each target word was calculated for the quiet, WBN, and MMB speech samples. The differences in percent between the two noise conditions and the quiet condition were calculated (WBN-quiet and MMB-quiet, respectively). These values quantified the magnitude of improvement between the target words spoken in noise and in quiet. In the same way, difference values also were calculated for the peak RMS level (dB SPL), spectral slope (dB SPL/kHz), and duration (ms) of each target word. This was done for the 0, –5, and –10 dB SNR_E conditions. Recall that the SNRs of the speech samples in the SNR_E condition were equated so that the large vocal level differences would be removed. However, because no attempt was made to equalize the RMS levels of each target word for the recognition task, some level differences between the words remained. Correlation coefficients were computed to determine the relation between the changes in performance and the changes in the three acoustic characteristics.
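The per-word difference-score analysis described above can be sketched in a few lines of code. This is a minimal illustration only, not the study's analysis pipeline: the word-level lists below (pct_correct_*, rms_*) are invented stand-ins, not data from the study.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def difference_scores(noise_vals, quiet_vals):
    """Per-word differences (e.g., WBN-quiet) for any measure:
    percent correct, peak RMS level, spectral slope, or duration."""
    return [n - q for n, q in zip(noise_vals, quiet_vals)]

# Illustrative per-word values for a handful of hypothetical target words.
pct_correct_wbn   = [80.0, 65.0, 90.0, 70.0, 85.0]  # % listeners correct, WBN speech
pct_correct_quiet = [60.0, 55.0, 70.0, 65.0, 60.0]  # % listeners correct, quiet speech
rms_wbn   = [72.0, 70.5, 74.0, 71.0, 73.5]          # peak RMS (dB SPL), WBN speech
rms_quiet = [66.0, 65.5, 67.0, 66.5, 66.0]          # peak RMS (dB SPL), quiet speech

# Difference scores quantify the per-word improvement (noise minus quiet),
# and the correlation relates acoustic change to recognition change.
d_recognition = difference_scores(pct_correct_wbn, pct_correct_quiet)
d_rms = difference_scores(rms_wbn, rms_quiet)
r = pearson_r(d_rms, d_recognition)
print(f"r(WBN-quiet RMS change vs. recognition gain) = {r:.2f}")
```

The same computation would be repeated for each acoustic measure and each SNR_E condition to fill a table of coefficients like Table 2.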
Pearson’s product-moment correlations are listed in Table 2. In separate analyses, the increased vocal level and spectral slope observed for the speech produced in noise (WBN and MMB) correlated significantly with increases in recognition (p < 0.01). However, the effects were small for all but the peak RMS level for the WBN speech sample at –5 dB SNR. Interestingly, this highest correlation conflicts with the results of the recognition task described earlier, for which significantly better performance was observed for the MMB speech sample than for the WBN speech sample at –5 dB SNR. Overall, these results suggest that increases in vocal level and spectral composition do not completely account for the observed increases in recognition.
Table 1. Planned orthogonal contrasts (t²_Dunn) by SNR for the speech produced in quiet, in wide-band noise (WBN), and in meaningful multi-talker babble (MMB) presented in the SNR_E condition. Asterisks indicate significant contrasts (p ≤ 0.05).

SNR   Contrast         df       t       p
  0   Quiet vs. WBN    3, 56    6.5     0.001*
      Quiet vs. MMB    3, 56   14.4    <0.001*
      WBN vs. MMB      3, 56    1.5     0.214
 –5   Quiet vs. WBN    3, 58   24.2    <0.001*
      Quiet vs. MMB    3, 58   52.8    <0.001*
      WBN vs. MMB      3, 58   52.7     0.05*
–10   Quiet vs. WBN    3, 56   26.8    <0.001*
      Quiet vs. MMB    3, 56   35.5    <0.001*
      WBN vs. MMB      3, 56    0.5     0.613
Table 2. Pearson’s product-moment correlation coefficients (r) relating the differences in recognition and acoustic characteristics for each target word produced in quiet and in wide-band noise (WBN-quiet) and meaningful multi-talker babble (MMB-quiet) presented at 0, –5, and –10 dB SNR in the SNR_E condition. Significant correlations (p < 0.01) are indicated by asterisks.

                      Speaking condition
                  WBN-quiet    MMB-quiet
Peak RMS level
  0 dB SNR          0.57*        0.16
  –5 dB SNR         0.74*        0.40*
  –10 dB SNR        0.61*        0.33*
Spectral slope
  0 dB SNR          0.45*        0.26
  –5 dB SNR         0.51*        0.32*
  –10 dB SNR        0.41*        0.24
Word duration
  0 dB SNR          0.11         0.32*
  –5 dB SNR        –0.02         0.01
  –10 dB SNR       –0.03         0.11
² A lack of sphericity indicates that the variances of all possible comparisons (quiet, WBN, and MMB scores) were not equal, which may inflate the Type I error rate. An adjustment to the degrees of freedom using the Greenhouse-Geisser method was made to maintain a rejection rate of 5%.
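The Greenhouse-Geisser adjustment works by scaling both degrees of freedom by a sphericity estimate ε̂ (which ranges from 1/(k–1), maximal violation, to 1, perfect sphericity). A minimal sketch follows; the ε value of 0.831 is back-computed here so that the adjusted df reproduce the reported F(1.7, 48.2), and is an assumption rather than a value reported in the study.

```python
def greenhouse_geisser_df(k, n, epsilon):
    """Scale repeated-measures ANOVA degrees of freedom by a
    Greenhouse-Geisser epsilon estimate (1/(k-1) <= epsilon <= 1)."""
    df_effect = epsilon * (k - 1)            # numerator (effect) df
    df_error = epsilon * (k - 1) * (n - 1)   # denominator (error) df
    return df_effect, df_error

# k = 3 speech samples (quiet, WBN, MMB); n = 30 listeners.
# epsilon = 0.831 is an assumed value chosen to match the reported
# adjusted df; the unadjusted df would be F(2, 58).
df1, df2 = greenhouse_geisser_df(k=3, n=30, epsilon=0.831)
print(f"F({df1:.1f}, {df2:.1f})")
```

With ε = 1 (no sphericity violation) the function returns the familiar unadjusted df of (k–1) and (k–1)(n–1).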
General Discussion
In this study, speech produced in quiet and in two noise conditions (Part I) was presented to listeners in a recognition paradigm using two SNR conditions (Part II). The acoustic analyses of speech produced in the two noise types revealed significant increases in vocal level, spectral composition, and word duration as compared to speech produced in quiet. Interestingly, the acoustic analyses revealed no differences between the speech produced in the two noise types (WBN and MMB) despite the spectral and semantic differences of these competitors. In terms of recognition, scores were an average of 69% higher for the speech produced in noise when the vocal level differences between the speech samples were preserved (SNR_P) and an average of 15% higher when the vocal level differences were removed (SNR_E). No significant differences in recognition were found for the speech produced in WBN and in MMB except at –5 dB SNR_E, where a significant increase of 6% was observed for speech produced in MMB. In general, these results suggest that the recognition of speech produced in noise was significantly better than that for speech produced in quiet and that the spectral and semantic content of the WBN and MMB competitors did not appear to differentially influence the production of speech or the subsequent perception of that speech.
The results of this study are consistent with those of Summers et al. (1988). In that study, digits produced in broadband noise and in quiet were presented in a paradigm similar to the –10 dB SNR_E condition in the present study, for an average increase in recognition of 6%. Dreher and O’Neill (1957) reported increases of 27% for spondees produced in white noise and presented at +4 dB SNR. Junqua (1993), on the other hand, reported significant decreases in the perception of digits and monosyllabic and bisyllabic words produced in noise for some listeners. Unfortunately, these results cannot be compared directly because of insufficient information provided by Junqua regarding methodology and statistical significance. Recall, however, that Junqua reported no significant differences in the spectral composition of his speech materials, which may explain the discrepancy between the results of that study and those of the present study.
In general, higher recognition scores were observed for speech produced in noise relative to speech produced in quiet in the SNR_P condition. The increased performance was likely due to the improved signal-to-noise ratio provided by the large increases in vocal level. Although the effects were smaller, significant increases in recognition also were observed when the overall vocal level of the speech samples was equated (SNR_E condition). The residual difference in performance for this condition suggests that additional variations in the acoustic characteristics of each target word (other than overall vocal level) may have contributed to the observed differences in performance. Further analyses of each target word revealed significant correlations between performance and two acoustic characteristics (vocal level and spectral composition). However, the effects were small and inconsistent. Overall, these results suggest that there is not a simple relation between the vocal level or spectral composition of individual words and recognition. Rather, recognition is more likely the result of complex interactions between these and other acoustic characteristics that were not examined.
Implications
The results of this study have implications for at least two areas of clinical audiology. First, Wiley and Page (1997) argued that, among other things, speech perception tasks should provide results that can be applied to rehabilitation efforts, such as amplification, and the prediction of communication difficulties in everyday listening situations. The results of Part I suggest that the acoustic characteristics of speech spoken in noise are significantly different from those for speech spoken in quiet. These characteristics, therefore, should be considered when using hearing aid prescriptive procedures. For example, many hearing aid prescriptive methods use the long-term spectrum of speech produced in quiet as a reference for all incoming signals (Byrne & Dillon, 1986; Cox & Moore, 1988; Schwartz, Lyregaard, & Lundh, 1988). Hearing aid manufacturers and others recommend a decrease in low-frequency gain and an increase in high-frequency gain for the best perception of speech in noisy environments (Martin, 1996). Although this practice may reduce the effects of upward spread of masking, the results of this study suggest that smaller adjustments may be necessary. Talkers will naturally speak louder in noisy conditions and therefore reduce low-frequency and increase high-frequency energy. If the parameters of a hearing aid are set without this consideration, the acoustic properties of speech may be overcorrected and, in some cases, perception may actually be degraded (e.g., the hearing aid may be forced to operate in saturation). It is important to remember, however, that the talkers in the present study were specifically instructed to speak clearly to a listener. Whether this is fully representative of speech in a typical noise environment is unknown.
The results of Part II suggest that speech-recognition tasks used clinically are of limited value for predicting communication difficulties in everyday situations that involve noise or competing speech, because these tasks use speech samples recorded in quiet. The absence of a relation between recognition and the most robust acoustic differences between these speech samples suggests that it may not be possible to accurately predict speech recognition in noise through simple modifications of speech produced in quiet (e.g., increasing the SNR or shaping the frequency response). Rather, these results suggest the need to develop speech samples for recognition tests that incorporate the acoustic characteristics of actual speaking environments, including those with background noise. In this way, the effects of hearing loss on speech recognition can be determined more accurately by closely imitating common communication environments under controlled conditions.
Acknowledgments
We would like to acknowledge Ray Kent, Dolores Vetter,
Keith Kluender, and Cynthia Fowler for their insightful
contributions to this project and Patricia Stelmachowicz for
her many helpful comments on earlier versions of this
manuscript. This work was funded in part by a grant from
the Ventry and Friedrich Memorial Funds of the American
Speech-Language-Hearing Foundation.
References
Amazi, D. K., & Garber, S. R. (1982). The Lombard Sign as
a function of age and task. Journal of Speech and Hearing
Research, 25, 581–585.
Avaaz Innovations. (1995). Computerized Speech Research Environment (CSRE) v4.5 User’s Guide. London, Ontario, Canada: Author.
Beattie, R. (1989). Word recognition functions for the CID
W-22 Test in multi-talker noise for normally hearing and
hearing-impaired subjects. Journal of Speech and Hearing
Disorders, 54, 20–32.
Byrne, D., & Dillon, H. (1986). The National Acoustic Laboratories’ (NAL) new procedure for selecting the gain and frequency response of a hearing aid. Ear and Hearing, 7, 257–265.
Cox, R. M., & Moore, J. N. (1988). Composite speech
spectrum for hearing aid gain prescriptions. Journal of
Speech and Hearing Research, 31, 102–107.
Dreher, J. J., & O’Neill, J. J. (1957). Effects of ambient
noise on speaker intelligibility for words and phrases.
Journal of the Acoustical Society of America, 29,
1320–1323.
Junqua, J. C. (1993). The Lombard reflex and its role on
human listeners and automatic speech recognizers.
Journal of the Acoustical Society of America, 93, 510–524.
Kalikow, D. N., Stevens, K. N., & Elliot, L. L. (1977).
Development of a test of speech intelligibility in noise using
sentence materials with controlled word predictability.
Journal of the Acoustical Society of America, 61, 1337–1351.
Kryter, K. D. (1962). Methods for calculation and use of the
articulation index. Journal of the Acoustical Society of
America, 34, 1689–1697.
Letowski, T., Frank, T., & Caravella, J. (1993). Acoustical properties of speech produced in noise presented through supra-aural earphones. Ear and Hearing, 14, 332–338.
Martin, R. L. (1996). How to shape amplified sound to help patients hear in background noise. The Hearing Journal, 49, 49–50.
Roup, C. M., Wiley, T. L., Safady, S. H., & Stoppenbach,
D. T. (1998). Tympanometric screening norms for adults.
American Journal of Audiology, 7, 55–60.
Schwartz, D. M., Lyregaard, P. E., & Lundh, P. (1988,
February). Hearing aid selection for severe-to-profound
hearing loss. The Hearing Journal, 13–17.
Studebaker, G. A. (1985). A rationalized arcsine transform.
Journal of Speech and Hearing Research, 28, 455–462.
Summers, W. V., Pisoni, D. B., Bernacki, R. H., Pedlow, R. I., & Stokes, M. A. (1988). Effects of noise on speech production: Acoustical and perceptual analyses. Journal of the Acoustical Society of America, 84, 917–928.
Tartter, V. C., Gomes, H., & Litwin, E. (1993). Some
acoustic effects of listening to noise on speech production.
Journal of the Acoustical Society of America, 94,
2437–2440.
Walden, B. E., Prosek, R. A., & Worthington, D. W.
(1975). The prevalence of hearing loss within selected U.S.
Army branches (Interagency No. IAO 4745, August, 31, 1-
95). Washington, DC: U.S. Army Medical Research and
Development Command.
Webster, J. C., & Klumpp, R. G. (1962). Effects of ambient
noise and nearby talkers on a face-to-face communication
task. Journal of the Acoustical Society of America, 34,
936–912.
Wiley, T. L., & Page, A. L. (1997). Summary: Current and future perspectives on speech perception tests. In L. L. Mendel & J. L. Danhauer (Eds.), Audiologic evaluation and management and speech perception assessment (pp. 201–210). San Diego, CA: Singular Publishing.
Wilson, R. H., Zizz, C. A., Shanks, J. E., & Causey, G. D. (1990). Normative data in quiet, broadband noise, and competing message for the Northwestern University Auditory Test No. 6 by a female speaker. Journal of Speech and Hearing Disorders, 55, 771–778.
Received May 9, 2000
Accepted March 1, 2001
DOI: 10.1044/1092-4388(2001/038)
Contact author: Andrea Pittman, PhD, Boys Town National Research Hospital, 555 North 30th Street, Omaha, NE 68131. Email: pittmana@boystown.org