Speech intelligibility and spatial release from masking in young children(a)

Ruth Y. Litovsky(b)
Waisman Center, University of Wisconsin—Madison, 1500 Highland Avenue, Madison, Wisconsin 53705

(Received 13 August 2003; accepted for publication 27 January 2005)
Children between the ages of 4 and 7 and adults were tested in free field on speech intelligibility using a four-alternative forced choice paradigm with spondees. Target speech was presented from front (0°); speech or modulated speech-shaped-noise competitors were either in front or on the right (90°). Speech reception thresholds were measured adaptively using a three-down/one-up algorithm. The primary difference between children and adults was seen in elevated thresholds in children in quiet and in all masked conditions. For both age groups, masking was greater with the speech-noise versus the speech competitor and with two versus one competitor(s). Masking was also greater when the competitors were located in front compared with on the right. The amount of masking did not differ across the two age groups. Spatial release from masking was similar in the two age groups, except in the one-speech condition, where it was greater in children than in adults. These findings suggest that, like adults, young children are able to utilize spatial and/or head shadow cues to segregate sounds in noisy environments. The potential utility of these measures for studying hearing-impaired children is also discussed. © 2005 Acoustical Society of America.
[DOI: 10.1121/1.1873913]
PACS numbers: 43.66.Pn, 43.66.Qp, 43.71.Ft [AK] Pages: 3091–3099
I. INTRODUCTION
Children spend numerous hours every day in complex
auditory environments, such as classrooms, where multiple
sounds that vary in content and direction typically co-occur.
In addition to voices of adults and children, instructional
aids, environmental sounds, and reverberation are standard
aspects of acoustic environments in classrooms. Some work
indicates that children learn best in relatively quiet environments, and often have difficulty hearing speech in the presence of distracting sounds (Crandell, 1993; Yacullo and Hawkins, 1987; Papso and Blood, 1989). Psychophysical studies in which stimuli were presented over headphones have shown that, compared with adults, preschool listeners exhibit poorer attentional selectivity on auditory tasks (e.g., Stellmack et al., 1997; Oh et al., 2001) and reduced unmasking for tone detection under dichotic conditions (Wightman et al., 2003; Hall et al., 2004).
Also under headphones, it has been found that in the presence of two-talker maskers speech reception thresholds are higher in children than in adults, and for both age groups thresholds are higher in the presence of two-talker maskers than with speech-shaped noise maskers (Hall et al., 2002). Headphone stimulus presentation is limited, however, because spatial cues that are known to be important for sound segregation in realistic environments are missing. Studies with adults have shown that the ability to segregate target speech from competing speech and/or noise is determined by a complex set of auditory computations that involve both monaural and binaural processes (Hawley et al., 1999, 2004; Bronkhorst, 2000; Culling et al., 2004). Spatial cues in particular play a key role in facilitating source segregation. Speech intelligibility improves by up to 12 dB when the target speech and competing sounds are spatially separated, resulting in "spatial release from masking" (Plomp and Mimpen, 1981; Bronkhorst and Plomp, 1992; Nilsson et al., 1994; Koehnke and Besing, 1996; Peissig and Kollmeier, 1997; Hawley et al., 1999, 2004; Shinn-Cunningham et al., 2001; Litovsky et al., 2002).
The extent to which children demonstrate spatial release
from masking for speech is poorly understood. Of particular
interest in the present study is the effect of number of
maskers, as well as their content, on the extent to which
young children experience spatial release from masking. In
adult listeners spatial release from masking is especially large for multiple (two or more) maskers that carry linguistic content or context (i.e., speech or reversed speech), and relatively small for a single, nonspeech masker such as speech-shaped noise [Hawley et al. (2004); see also Bronkhorst (2000) for a review]. The authors of those works have concluded that release from masking as provided by spatial cues is particularly effective when the auditory environment is complex. The concept of "informational masking" has been invoked to explain this phenomenon, whereby, in the presence of maskers that are harder to ignore, spatial cues become important for sound source segregation. In this case, maskers that are multiple in number and/or that carry information resembling that contained in the target result in greater spatial release from masking (e.g., Brungart, 2001; Freyman et al., 2001; Arbogast et al., 2002; Durlach et al., 2003).
(a) Select portions of these data were presented at the 143rd Meeting of the Acoustical Society of America, Pittsburgh, PA, and at the 24th Meeting of the Association for Research in Otolaryngology, Tampa, FL.
(b) Electronic mail: litovsky@waisman.wisc.edu
J. Acoust. Soc. Am. 117 (5), May 2005. 0001-4966/2005/117(5)/3091/9/$22.50. © 2005 Acoustical Society of America.

Several studies have reported that speech masking in children depends on the masker type (Papso and Blood, 1989; Hall et al., 2002, 2004). However, the effect of masker number and spatial cues, and the possible contribution of these stimulus parameters to spatial release from masking, remain
poorly understood. Binaural abilities in children are adultlike on measures of binaural masking level differences (Nozza et al., 1988; Moore et al., 1991) and minimum audible angle (Litovsky, 1997). Since spatial cues are known to play a key role in speech understanding for adults, it is important to understand how young children comprehend speech in realistic, multi-source acoustic environments, and the conditions that enable them to benefit from spatial cues. The research paradigm used here may ultimately also be useful in evaluating performance of hearing-impaired children. Noisy environments are particularly problematic for children with a history of otitis media (e.g., Hall et al., 2003; Moore et al., 2003; Roberts et al., 2004) and for hearing aid and cochlear implant users (e.g., Dawson et al., 2004; Eisenberg et al., 2004; Litovsky et al., 2004). Because the important task of hearing speech in noise can be a daily struggle for many of these children, ultimately their performance on these measures can assist with diagnosis and fitting strategies.
In the present study the task involved a four-alternative forced-choice (4AFC) word discrimination paradigm. Subjects selected a picture that matched the speech target from an array of four pictures that appeared on a computer monitor. Other tests such as the HINT-C (Nilsson et al., 1994) may be usable for measuring speech intelligibility in noise in children as young as 6 years, but are difficult to implement with younger children. The test protocol described here was specifically designed to enable the study of speech intelligibility in noise in children as young as 4 years old, an age at which many children begin to spend a significant number of hours in noisy environments such as preschool classrooms.
II. METHODS
A. Subjects
A total of 36 volunteer children (14 males and 22 females) were recruited from local public schools and the general community, and all subjects completed testing on the three required conditions. Subjects ranged in age from 4.5 to 7.5 years (average and standard deviation = 5.5 ± 1 years; see also Table I).(1) All were native speakers of English with no known auditory dysfunction or other cognitive disorders. According to the parents' report, none of the children were on medication or had a known illness or ear infection on the day of testing, and none had a known history of hearing loss. Total testing time for each listener was approximately 45 min.
Nine paid adult volunteers, with normal hearing as veri-
fied by standard audiometric testing for frequencies between
250 and 8000 Hz, and English as their first language, were
also tested. Since testing was much less time consuming with
adults than with children, a within-subject design was used
whereby each subject participated in all conditions that per-
tained to the four groups of children.
B. Testing chamber, materials, and apparatus
Testing was conducted in a single-walled sound booth (3.6 × 4 m) with carpeting. This room had a reverberation time (T60) of 250 ms and ambient noise levels averaging 35 dB SPL. During testing, subjects were always seated in the center of the room, with loudspeakers (Radio Shack Minimus 7) placed 15.24 cm above ear level for children (at ear level for adults) and at a distance of 1.67 m from the center of the subject's head. All stimuli were prerecorded, digitized, and stored on a laptop computer (Winbook). In the one-competitor conditions, the target and competing sound were fed to separate channels of a two-channel soundcard (Digigram VX Pocket), amplified (Crown D-75), and presented to separate loudspeakers. When both target and competitor were presented from the front position, the speakers were placed next to one another with their medial walls nearly touching and their centers at ±2°. Because each loudspeaker subtended 4° in the horizontal dimension, strictly speaking the speaker centers were separated by 4°. In the two-competitor condition, when both competitors occurred from the front, they were presented from the same loudspeaker. Target stimulus selection, level controls, and output, as well as response acquisition, were achieved using Matlab. A picture book containing four target pictures per page was placed on a small table in front of the subject.
C. Stimuli
Stimuli consisted of target words and competing sentences. Targets comprised a closed set of 25 spondaic words from CID W-1, obtained from Auditech and spoken by a male talker. Although a larger set of words is available, the subset chosen for the present study consisted of words that were easily represented with a visual illustration and readily recognized as such during pilot testing of 20 children aged 4 to 5 years (a list of the target words used is shown in the Appendix). The root-mean-square levels of all target words were equalized using Matlab software. The competitors were either speech or modulated speech-shaped noise. Competing sentences were taken from the Harvard IEEE list (Rothauser et al., 1969) and recorded with a female voice. Examples of sentences are "Glue the sheet to the dark blue background,"
TABLE I. List of conditions tested for children (nine subjects per condition).

Group   No. of competitors   Age (years.months ± SD)   Competitor type   Conditions
1       1                    5.4 ± 1.1                 Speech            Quiet, 1 front, 1 right
2       1                    5.6 ± 1.2                 Speech-noise      Quiet, 1 front, 1 right
3       2                    5.8 ± 1                   Speech            Quiet, 2 front, 2 right
4       2                    5.6 ± 1                   Speech-noise      Quiet, 2 front, 2 right
‘‘Two blue fish swam in the tank,’’ and ‘‘The meal was
cooked before the bell rang.’’ Ten such sentences were used,
and these were presented in a random order during testing.
The speech-noise competitors were generated from the ten competitor sentences and also played in a random order during testing.
These interferers were filtered to match the long-term spec-
trum of the speech competitors, calculated for each talker
separately. The noise samples were scaled to the same root-
mean-square value and cut to the same length as the match-
ing speech competitor. The envelope was then extracted from
the speech competitor and was used to modulate the noise
tokens, giving the same coarse temporal structure as the
speech. The envelope of running speech was extracted using a method similar to that described by Festen and Plomp (1990), in which a rectified version of the waveform is low-pass filtered. A first-order Butterworth low-pass filter was used with a 3-dB cutoff at 40 Hz.
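The envelope-modulation step above can be sketched in pure Python. This is a minimal illustration only, not the original processing code: the function names, the list-based signal representation, and the bilinear-transform realization of the first-order Butterworth filter are my assumptions; the spectral-matching step that precedes modulation is omitted.

```python
import math
import random

def first_order_lowpass(x, fc, fs):
    """First-order Butterworth low-pass (bilinear transform), 3-dB cutoff at fc Hz."""
    c = math.tan(math.pi * fc / fs)
    b0 = c / (1.0 + c)          # feedforward coefficients (b0 == b1)
    a1 = (c - 1.0) / (c + 1.0)  # feedback coefficient
    y, x_prev, y_prev = [], 0.0, 0.0
    for xn in x:
        yn = b0 * xn + b0 * x_prev - a1 * y_prev
        y.append(yn)
        x_prev, y_prev = xn, yn
    return y

def modulate_noise_with_speech_envelope(speech, noise, fs, cutoff_hz=40.0):
    """Impose the coarse temporal envelope of `speech` onto `noise`:
    rectify the speech, low-pass it at `cutoff_hz`, then multiply sample-wise."""
    envelope = first_order_lowpass([abs(s) for s in speech], cutoff_hz, fs)
    n = min(len(noise), len(envelope))
    return [noise[i] * envelope[i] for i in range(n)]
```

The resulting noise token shares the speech competitor's slow amplitude fluctuations (below roughly 40 Hz) while retaining a flat, speech-shaped fine structure.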
D. Design
The target words were always presented from the front (0°). Competitors were presented from either the front or the side (90°). Four groups of children with nine subjects per group were tested (see Table I). The side condition was always with competitor(s) on the right. Each child subject was randomly assigned to a group that was tested on one combination of type (speech or speech-noise) and number (1 or 2) of competitor(s). The subject was then tested on three conditions: (1) quiet: no competitor(s); (2) front: target and competitor(s) in front; and (3) right: target in front and competitor(s) at 90° on the right. The order of conditions was randomized using a Latin-square design. For the adult group, testing was conducted in a single 2-h session, with the order of the nine conditions randomized for each listener.
For each condition one adaptive track was measured.
When two competitors were presented they were of the same
type, but different samples were used for the two sources; in
the two-speech conditions the same female voice was pre-
sented, speaking two different sentences, and in the two-
speech-noise conditions two different segments of the noise
were presented.
E. Familiarization
The present study was not aimed at testing children's vocabulary, but rather their speech intelligibility for known words. The 25 words were selected from the spondee list after pilot testing indicated that 20 children aged 4 to 5 years were either familiar with the words or could easily ascertain their meaning after one presentation. For each of the 25 words, a commissioned artist-drawn picture was used to visually represent the meaning of the word. Prior to testing, subjects underwent a familiarization session (approximately 5 min in duration) in which they were presented with the picture-word combinations and tested to ensure that they associated each picture with its intended auditory target.
F. Speech reception threshold estimation
The test involved a single-interval 4AFC discrimination procedure. On each trial, the child viewed a set of four pictures from the set of 25 picture-word matches. A word matching one of the pictures was randomly selected and presented from the front speaker. A leading phrase such as "Point to the picture of the..." or "Where is the..." preceded each target word. The child was asked to select the picture matching the heard word, and to guess if not sure or if the word was not audible. The randomization process ensured that for every subject, on average, all 25 words were selected an equal number of times. The experimenter entered the child's response into the computer. Following correct responses, feedback was provided in the form of 3-s musical clips from popular children's music; approximately 20 clips were digitized and stored on the computer, and randomly selected on correct-feedback trials. Following incorrect responses, feedback was provided in the form of a brief phrase such as "Let's try another one" or "That must have been difficult"; five such phrases were digitized and stored on the computer, and randomly selected on incorrect-feedback trials.
An adaptive tracking method was used to vary the level of the target signal, such that correct responses result in a level decrement and incorrect responses result in a level increment. The algorithm includes the following rules: (1) Level is initially reduced in steps of 8 dB, until the first incorrect response. (2) Following the first incorrect response, a three-down/one-up rule is used, whereby level is decremented following three consecutive correct responses and incremented following a single incorrect response. (3) Following each reversal, the step size is halved. (4) The minimum step size is 2 dB. (5) A step size that has been used twice in a row in the same direction is doubled. For instance, if the level was decreased from 40 to 36 dB (step = 4) and then again from 36 to 32 dB (step = 4), a continued decrease in level would result in the next level being 24 dB (step = 8). (6) After three consecutive incorrect responses, a "probe" trial is presented at the original level of 60 dB. If the probe results in a correct response, the algorithm resumes at the last trial before the probe was presented. If more than three consecutive probes are required, testing is terminated and the subject's data are not included in the final sample. (7) Testing is terminated following five reversals.
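The level-update rules can be sketched as follows. This is a simplified Python sketch of rules (1)-(5) only; the probe-trial rule (6) and the five-reversal stopping rule (7) are not implemented, and the class and attribute names are illustrative rather than the study's actual Matlab code.

```python
class AdaptiveTrack:
    """Simplified sketch of the three-down/one-up level track (rules 1-5)."""

    def __init__(self, start_level=60.0, start_step=8.0, min_step=2.0):
        self.level = start_level
        self.step = start_step
        self.min_step = min_step
        self.had_first_error = False
        self.correct_run = 0        # consecutive correct responses so far
        self.last_direction = None  # 'up' or 'down'
        self.same_dir_count = 0     # times current step used in this direction
        self.reversals = 0

    def _move(self, direction):
        if direction == self.last_direction:
            # Rule 5: a step used twice in a row in one direction is doubled.
            if self.same_dir_count >= 2:
                self.step *= 2.0
                self.same_dir_count = 1
            else:
                self.same_dir_count += 1
        else:
            if self.last_direction is not None:
                # Rules 3-4: halve the step after each reversal, floor at 2 dB.
                self.reversals += 1
                self.step = max(self.step / 2.0, self.min_step)
            self.same_dir_count = 1
        self.level += self.step if direction == 'up' else -self.step
        self.last_direction = direction

    def respond(self, correct):
        """Update the track after one trial."""
        if not self.had_first_error:
            # Rule 1: descend in 8-dB steps until the first incorrect response.
            if correct:
                self.level -= self.step
                self.last_direction = 'down'
                self.same_dir_count = 1
            else:
                self.had_first_error = True
                self._move('up')
            return
        # Rule 2: three-down/one-up.
        if correct:
            self.correct_run += 1
            if self.correct_run == 3:
                self.correct_run = 0
                self._move('down')
        else:
            self.correct_run = 0
            self._move('up')
```

Feeding the paper's worked example into `_move` reproduces it: two consecutive downward moves at step 4 are followed by a doubled downward step of 8.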
For each subject, speech reception thresholds (SRTs) were measured for each condition. At the start of each SRT measurement, the level of the target was initially 60 dB SPL. When competitors were present (non-quiet conditions), the level of each competitor was fixed at 60 dB SPL, such that the overall level of the competitors was approximately 3 dB higher when two competitors were presented than in the one-competitor conditions. Thus, the adaptive track began at a signal-to-noise ratio of 0 dB in the one-competitor cases and −3 dB in the two-competitor cases.
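The ~3-dB figure follows from power summation of two equal-level, uncorrelated sources; a one-line check (the function name is illustrative):

```python
import math

def combined_level_db(levels_db):
    """Overall level (dB SPL) of uncorrelated sources, via power summation."""
    total_power = sum(10.0 ** (l / 10.0) for l in levels_db)
    return 10.0 * math.log10(total_power)

# Two competitors fixed at 60 dB SPL each: power doubles, level rises ~3 dB.
print(round(combined_level_db([60.0, 60.0]) - 60.0, 2))  # prints 3.01
```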
Results were analyzed using a constrained maximum-likelihood method of parameter estimation outlined by Wichmann and Hill (2001a, b). All the data from each experimental run for each participant were fit to a logistic function. Thresholds were calculated by taking the inverse of the function at a specific probability level. In our 4AFC task, using an adaptive three-down/one-up procedure, the lower bound of the psychometric function was fixed at the level of chance performance, 0.25, and the threshold level corresponded to the point on the psychometric function where performance was approximately 79.4% correct. Biased estimates of threshold can occur; bias can be introduced by the sampling scheme used and by lapses in listener attention. Wichmann and Hill (2001a, b) demonstrated that bias associated with lapses is easily overcome by introducing a highly constrained parameter to control the upper bound of the psychometric function. This approach was used to assess our data: the upper bound of the psychometric function was constrained within a narrow range (0.06), as suggested by Wichmann and Hill (2001b). As the authors note, under some circumstances bias introduced by the sampling scheme may be more problematic to avoid, even when a hundred trials are obtained per level visited. The possibility of biased threshold estimates due to our sampling scheme was assessed by comparing the thresholds obtained using the constrained maximum-likelihood method with traditional threshold estimates based on the last three reversals in each experimental run. A repeated-measures t-test on quiet thresholds for the 36 children tested revealed no statistically significant difference between the estimated threshold values obtained using the ML approach versus the traditional approach [t(35) = 1.37, p > 0.05, two-tailed].
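The 79.4% figure is the convergence point of a three-down/one-up rule: the track is stationary where three consecutive correct responses are as likely as not, p^3 = 0.5, so p = 0.5^(1/3) ≈ 0.794. The constrained fit itself can be sketched with a coarse grid search; this is an illustrative stand-in under stated assumptions, not the Wichmann and Hill implementation (their method uses continuous optimization and bootstrap error estimation), and the parameter grids and function names are mine.

```python
import math

def logistic_p(level, midpoint, slope, lapse, guess=0.25):
    """4AFC psychometric function: guess rate fixed at 0.25, upper bound 1 - lapse."""
    core = 1.0 / (1.0 + math.exp(-slope * (level - midpoint)))
    return guess + (1.0 - guess - lapse) * core

def fit_psychometric(levels, n_correct, n_trials,
                     midpoints, slopes, lapses=(0.0, 0.02, 0.04, 0.06)):
    """Grid-search maximum-likelihood fit; the lapse rate is constrained to a
    narrow range (0-0.06 here), in the spirit of the constrained ML approach."""
    best, best_ll = None, -float('inf')
    for m in midpoints:
        for s in slopes:
            for lam in lapses:
                ll = 0.0  # binomial log-likelihood over all levels visited
                for x, k, n in zip(levels, n_correct, n_trials):
                    p = min(max(logistic_p(x, m, s, lam), 1e-9), 1.0 - 1e-9)
                    ll += k * math.log(p) + (n - k) * math.log(1.0 - p)
                if ll > best_ll:
                    best_ll, best = ll, (m, s, lam)
    return best

def threshold_from_fit(midpoint, slope, lapse, target_p=0.794, guess=0.25):
    """Invert the fitted function at the three-down/one-up convergence point."""
    core = (target_p - guess) / (1.0 - guess - lapse)
    return midpoint - math.log(1.0 / core - 1.0) / slope
```

With the lower bound pinned at chance and the lapse parameter tightly constrained, the only free quantities are the midpoint and slope, and the threshold is read off the fitted curve at 79.4% correct.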
III. RESULTS
SRTs were statistically analyzed for the child groups using a mixed-design analysis of variance (ANOVA) with two between-subjects variables (number of competitors, competitor type) and one within-subjects variable (condition). Significant main effects of number [F(1,32) = 4.05; p < 0.05] and condition [F(2,32) = 119.57; p < 0.0001] were found, but there was no effect of type. Significant interactions were found for condition with number [F(2,64) = 66.50; p < 0.03] and condition with type [F(2,64) = 162.01; p < 0.001]. Scheffe's post hoc contrasts (significance value p < 0.05) showed that SRTs in quiet were significantly lower than SRTs in either front or right. Children tested with two competitors had significantly higher SRTs than those tested with one competitor for the front and right conditions (further comparisons between front and right are described below with regard to spatial release from masking). Finally, for reasons that are not clear, SRTs in the quiet conditions were lower in the two speech-noise groups than in the groups tested with the speech competitors. Adult data were analyzed with a one-way ANOVA for the nine conditions, which revealed a significant main effect [F(8,8) = 3.77; p < 0.05]. Scheffe's post hoc contrasts (p < 0.01) revealed that quiet SRTs were lower than SRTs on all other conditions. Child and adult SRTs were compared with independent t-tests for each of the nine conditions; since the quiet condition was tested for each of the child groups, a total of 12 comparisons were conducted. The Bonferroni adjustment for multiple comparisons as described by Uitenbroek (1997) was applied (df = 16, criterion of t > 3.34 and p < 0.004). Significant differences were found for all 12 comparisons, indicating that adults' SRTs were lower than those of children for all conditions tested.
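The p < 0.004 criterion is simply the family-wise alpha divided by the number of comparisons; a one-line sketch (the function name is illustrative):

```python
def bonferroni_criterion(family_alpha, n_comparisons):
    """Per-comparison significance criterion under a Bonferroni correction."""
    return family_alpha / n_comparisons

# 12 child-vs-adult comparisons at a family-wise alpha of 0.05:
print(round(bonferroni_criterion(0.05, 12), 4))  # prints 0.0042
```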
Figure 1 shows group means (±SD) for masking (differences between masked and quiet SRTs). For each subject, masking amounts for front and right were obtained by subtracting quiet SRTs from front and right SRTs, respectively. To place the masking values into context, average (±SD) SRTs for all groups and conditions are listed in Table II. Statistical analyses on the amount of masking for the child groups were conducted with a three-way mixed-design ANOVA treating condition (front minus quiet, right minus quiet) as the within-subjects variable and competitor type and number as the between-subjects variables. A significant effect of condition [F(1,32) = 29.13; p < 0.0001] suggests
FIG. 1. Average (±SD, dB SPL) differences between speech reception thresholds (SRTs) in the masked and quiet conditions. Data are plotted for front (top panels) and right (bottom panels) conditions, for children (left panels) and adults (right panels). Each panel compares difference values for the speech and speech-noise competitors when the number of competitor(s) was either one (black bars) or two (gray bars).
TABLE II. Mean (±SD) speech reception thresholds (in dB SPL).(a)

Group            Quiet          Front          Right
Children
1 speech         26.02 (3.81)   41.81 (6.31)   36.64 (6.48)
2 speech         27.32 (5.25)   47.75 (6.30)   40.33 (6.29)
1 speech-noise   23.25 (5.56)   44.37 (6.50)   40.13 (3.89)
2 speech-noise   21.45 (3.3)    48.01 (2.07)   44.41 (7.18)
Adults
1 speech          3.84 (3.18)   16.71 (5.66)   16.86 (3.84)
2 speech                        23.35 (4.41)   20.43 (4.01)
1 speech-noise                  27.39 (5.28)   22.25 (4.82)
2 speech-noise                  32.82 (4.40)   27.60 (8.65)

(a) It is important to recall that each child was tested on three conditions (quiet, front, right) for one masker type, and that each adult was tested on all nine conditions; hence there is only one entry in Table II for adult quiet thresholds.
that masking in the front minus quiet condition was higher than in right minus quiet. Significant effects of type [F(1,32) = 15.51; p < 0.0001] and number [F(1,32) = 6.95; p < 0.013] further suggest that masking was greater for two competitors than one, and greater for the speech-noise competitor compared with speech. There were no significant interactions. For the adult subjects, a three-way repeated-measures ANOVA (condition × type × number) suggested, similar to the children, that masking was greater in the front versus right conditions [F(1,8) = 27.72; p < 0.001], greater with speech-noise than speech [F(1,8) = 30.72; p < 0.001], and greater for two compared with one competitor [F(1,8) = 16.71; p < 0.004]. Masking data for the child and adult groups were compared with independent t-tests for each competitor location/type/number combination, and the Bonferroni correction for eight comparisons was applied (Uitenbroek, 1997). None of the comparisons yielded a significant difference in masking between the child and adult groups, and none of the interactions were significant.
Spatial release from masking was defined as the differ-
ence between front masking 共front minus quiet兲 and right
masking 共right minus quiet兲. Figure 2 shows individual
points for right minus quiet plotted versus front minus quiet
for all subjects and conditions tested. If no spatial release
from masking occurred, the points would be expected to fall
along the diagonal. Points falling below the diagonal would
be indicative of spatial release from masking. Alternatively,
points falling above the diagonal would represent cases in
which thresholds were higher when the competitors were on
the right rather than in front. The majority of individual data
points in Fig. 2 are below the diagonal, and average points
for all but one group are also indicative of spatial release
from masking.
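These definitions reduce to simple arithmetic; the following is a minimal sketch (function names are illustrative) using the one-speech child group means from Table II as a worked example:

```python
def masking_amount(masked_srt, quiet_srt):
    """Masking (dB) = masked SRT minus quiet SRT."""
    return masked_srt - quiet_srt

def spatial_release(front_srt, right_srt, quiet_srt):
    """Spatial release from masking (dB) = front masking minus right masking.
    The quiet SRT cancels, so this reduces to front SRT minus right SRT."""
    return masking_amount(front_srt, quiet_srt) - masking_amount(right_srt, quiet_srt)

# Worked example: one-speech child group means from Table II (dB SPL).
quiet, front, right = 26.02, 41.81, 36.64
print(round(masking_amount(front, quiet), 2))        # prints 15.79
print(round(spatial_release(front, right, quiet), 2))  # prints 5.17
```

A positive value corresponds to a point below the diagonal in Fig. 2, i.e., less masking when the competitors are moved to the side.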
Figure 3 summarizes the findings for spatial release from masking. For children, group average values are between 3.6 and 7.5 dB; the overall average for all 36 children is 5.25 dB. For adults, group averages range from 0 to 5.2 dB, with an overall average of 3.34 dB. Children's data were analyzed with a two-way between-subjects ANOVA (type × number), revealing no significant main effects or interactions. This lack of an effect may not be surprising given the large intersubject variability, which is notable in Fig. 3(A); while some children had spatial release from masking values greater than 10 dB, other children had values near 0, and a small number had negative values. Adult data were analyzed with a two-way repeated-measures ANOVA (type × number), also revealing no significant effects or interactions. Finally,
FIG. 2. Masking amounts (differences between masked and quiet thresholds) for the Right minus Quiet conditions are plotted vs Front minus Quiet conditions. Panels (A) and (C) show data for children and adults, respectively; each symbol denotes data from an individual subject, and the four different symbols refer to the type/number combination of competitor(s). The diagonal lines denote equality between the two variables. Panels (B) and (D) show average group data from (A) and (C), respectively, for the four conditions tested.
to compare spatial release from masking for children and adults, independent t-tests were conducted for each type/number combination, with the Bonferroni correction for four contrasts applied (Uitenbroek, 1997). The only significant difference between groups was for the one-speech competitor condition, in which the average spatial release from masking in adults was 0 dB, compared with an average value of 5.7 dB for the child group.
IV. DISCUSSION
Speech intelligibility in quiet and in the presence of competing sounds, and the ability to benefit from spatial separation of the speech and competitor(s), were investigated in children and adults. Although extensively studied in adults, to date this area of research has received minimal attention in children. This study may therefore be helpful toward improving our understanding of children's ability to hear and learn in noisy and reverberant environments, especially given that such abilities are known to be compromised compared with abilities measured under quiet conditions (e.g., ANSI, 2002; Yacullo and Hawkins, 1987; Knecht et al., 2002). The results can be summarized as follows: (1) Adults' SRTs were lower than those of the children for all conditions. (2) For both age groups, masking was significantly greater with speech-noise than with speech, and with two competitors compared with one. (3) The amount of masking did not differ across the two age groups. (4) The amount of spatial release from masking was similar for children and adults on all but one condition. (5) The number and type of competitor did not affect the size of spatial release from masking for either age group.
A. SRTs and masking amount
The primary age difference was that of higher SRTs in children than adults, in quiet and in all masked conditions. This age effect is consistent with the existing developmental psychoacoustic literature, which has shown that children ages 4 to 7 typically have higher tone detection thresholds compared with adults (e.g., Buss et al., 1999; Oh et al., 2001). Similarly, recognition of spondee words such as those used here in temporally modulated noise has been shown to produce higher thresholds in 5 to 10 year-old children than in adults (Hall et al., 2002).
The age effect found here can be attributed to a combination of peripheral and central mechanisms. Peripherally, frequency resolution is highly similar to that of adults by 5 years of age (Allen et al., 1989; Hall and Grose, 1991; Veloso et al., 1990). However, young children appear to integrate auditory information over a greater number of auditory channels than adults, suggesting that their ability to extract auditory cues, and in the present study to identify target words at low signal levels, is likely to be still developing (e.g., Hall et al., 1997; Buss et al., 1999; Hartley et al., 2000; Oh et al., 2001). Immaturity of central auditory processes and the adoption of listening strategies that are nonoptimal or less efficient than those of adults (Allen and Wightman, 1994; Lutfi et al., 2003) may also have affected SRTs. Finally, differences in thresholds may represent age-related differences in the ability to take advantage of hearing partial word segments and to "fill in" the remainder of the target word. Anecdotal reports from adults suggest that they relied heavily on this strategy at low signal levels. The ability to adopt this strategy can most likely be attributed to adults' having more experience and better-developed language skills, including the ability to parse phonetic, semantic, and lexical aspects of speech (Fletcher and MacWhinney, 1995).
Of interest is the lack of an age effect for the amount of masking. Previous studies have typically shown that adults experience reduced masking compared with children (e.g., Buss et al., 1999; Oh et al., 2001; Papso and Blood, 1989; Hall et al., 2002). Although this explanation may not be entirely satisfying, the lack of an age-related masking effect may be attributed to the task itself. In the current study, using the 4AFC task, quiet thresholds were extremely low in adults. In contrast, adults tested on the same measure using identical stimuli, but with a 25AFC task, did not show such low SRTs in quiet, but continued to show lower masked SRTs; the amount of masking in the 25AFC task was therefore lower in adults than in children (Johnstone and Litovsky, 2005). When task difficulty is increased for adults, a more realistic picture of age-related masking differences may emerge, underscoring the importance of equating task difficulty when comparing perceptual abilities across age groups.

FIG. 3. Spatial release from masking values are shown for children and adults in panels (A) and (B), respectively. Each panel shows values grouped by competitor type/number condition (in the x-axis labels, SP and Sp-Ns refer to the speech and speech-noise conditions, respectively). Individual values appear as gray circles, and group averages (±SD) are shown as black circles. When necessary to avoid overlap of data points, some points were shifted slightly along the x axis.
B. Competitor type
SRTs did not differ for the two types of competitors for
children, but were higher with speech-noise than speech for
the adults, which may be in part due to greater statistical
power in the adult within-subjects comparisons. For both age
groups, masking was greater with speech-noise than speech.
These findings are consistent with other findings in adults in
a one-masker paradigm, whereby greater amounts of mask-
ing were reported in the presence of speech-noise compared
with speech (e.g., Hawley et al., 2004). This has been attributed
to greater overlap between the energies of the speech-noise
masker and the target, resulting in reduced F0 discrimination.
However, in previous work, as the number of maskers increased,
speech became a more potent masker; an explanation involving
informational masking and linguistic interference from multiple
speech maskers was invoked to account for the increased
interference from speech (e.g., Bronkhorst, 2000; Hawley et al.,
2004). Here, there was no interaction of competitor type and
number, which may be explained by stimulus differences across studies.
Studies such as that of Hawley et al. (2004) typically use
male voices for both the target and competitors, whereas here
the target was a male voice and the competitor was spoken
by a female. Differences in voice pitch and quality, together
with ongoing F0 differences, provided a robust cue for source
segregation in the presence of speech competitors, regardless of
the number of competitors. The speech-noise competitor,
having momentary dips in amplitude but no ongoing changes
in frequency, served as the more potent masker, with an effect
greater than that of speech. With same-gender competitors,
it is highly likely that speech would have produced masking
at least as great as, if not greater than, that of the speech-noise
competitor (e.g., Brungart et al., 2001). Finally, the differences
in masking amounts between the child groups may be accounted
for by the fact that, for reasons that are not entirely clear but
are probably due to random variation within the population,
quiet-condition SRTs were lower in the two speech-noise
groups than in the groups tested with the speech competitors.
C. Number of competitors
For both children and adults, masking was significantly
greater with two competitors than with one, and the
interactions of number with location (front versus right) were
not significant. Averaged over all competitor types and numbers,
the addition of a second competing sound increased masking
by 4.7 dB for children and 4.8 dB for adults.
Two interpretations can be considered here. First, in the presence
of competitors with envelope modulations such as those
used here, listeners may be better able to take advantage of
the modulations and "listen in the gaps" when a single
competitor is present. As a second competitor is added, the signal
contains fewer gaps, thereby decreasing opportunities for
"gap listening" (e.g., Festen and Plomp, 1990; Hawley
et al., 2004). Second, consider the possible role of "informational"
masking. In recent years this term has been used
extensively in the auditory literature to explain masking
phenomena that cannot be attributed solely to peripheral auditory
mechanisms (e.g., Neff and Green, 1987; Lutfi, 1990;
Kidd et al., 2003). In the speech intelligibility literature, one
of the conditions under which informational masking has
been thought to occur is when the addition of a second
masker elevates thresholds by more than the 3 dB expected
simply from the added energy in the presence of a second
masker (e.g., Brungart et al., 2001; Hawley et al., 2004;
Durlach et al., 2003). This threshold elevation may result
from the increased complexity of the listening environment,
possibly due to uncertainty on the part of the listener as to
what aspects of the stimulus to ignore and what aspects to
pay attention to. Although difficult to evaluate numerically,
this component of masking may have been present here to
some extent, and more direct tests of the effect in children
would be important to pursue in future studies.
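As a point of reference, the 3-dB benchmark mentioned above follows from simple energy addition: a second, independent masker of power equal to the first doubles the total masker power, so purely energetic considerations predict a threshold elevation of

```latex
\Delta L = 10 \log_{10}\!\left(\frac{2P_m}{P_m}\right) = 10 \log_{10} 2 \approx 3.0~\text{dB},
```

where \(P_m\) denotes the power of a single masker; elevations appreciably beyond this value are taken as evidence of informational masking.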
D. Spatial release from masking
Measures of spatial release from masking did not statistically
differ across age groups, nor were there effects of
competitor type and number. The one exception was the absence
of spatial release from masking in the one-speech condition in
adults, compared with 5.7 dB of release in children. The adult
data differ from other free-field studies in adults, in which
spatial release from masking for speech was reported to be at
least 3 dB for a single competing talker and as high as 12 dB
for multiple talkers (Bronkhorst, 2000; Hawley et al., 2004).
The lack of release from masking found here with the one-speech
competitor is likely due to the nature of the task and
stimuli; the use of a fairly easy 4AFC task in combination
with different-gender talkers for the target and competitor
most likely created a relatively simple listening situation for
adults.
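As a reminder of the measure under discussion, spatial release from masking is quantified as the difference between speech reception thresholds measured with the competitor(s) co-located with the target in front and with the competitor(s) on the right:

```latex
\mathrm{SRM} = \mathrm{SRT}_{\text{front}} - \mathrm{SRT}_{\text{right}},
```

so that positive values indicate a benefit of spatial separation, and a value near zero, as in the adult one-speech condition, indicates no measurable benefit.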
Spatial cues are thought to be especially useful in challenging
conditions when nonspatial cues are difficult to access
(Peissig and Kollmeier, 1997; Bronkhorst, 2000;
Durlach et al., 2003; Freyman et al., 2004). In the adult
group tested here, spatial cues were beneficial in the conditions
that created greater amounts of front masking (two-speech,
one-speech-noise, and two-speech-noise). The lack of
a location effect in the one-speech condition is likely due to
the general ease of listening to spondees when the competitor
consists of a single, different-gender talker. In that case,
spatial cues did not further reduce masking in the right
condition, since masking there was already relatively small.
In contrast with adults, for children the one-speech
front condition did present a challenging situation, probably
because children are less able to take advantage of the
different-gender competitor to hear the target speech. Thus,
spatial cues were indeed relevant to the children, producing
a robust improvement in the right condition compared with
the front. These findings suggest that, while more complex
tasks, using sentence materials and/or same-gender stimuli,
may be more appropriate for measuring spatial release from
masking in adults, the task used here is a good tool for
measuring the ability of young children to negotiate complex
auditory environments.
The finding that, overall, spatial release from masking in
children is similar to that in adults is consistent with work
showing that preschool-age children perform similarly to
adults on measures of binaural masking level differences
(Nozza et al., 1988; Moore et al., 1991) and minimum
audible angle [Litovsky (1997); for a review see Litovsky and
Ashmead (1997)]. This finding implies that, for a simple
closed-set task, young children are able to utilize spatial
and/or head shadow cues to the same extent as adults in
order to segregate sounds in noisy environments. That is not
to say that children would be expected to perform similarly to
adults on all measures of speech intelligibility in noise.
Given recent findings that children exhibit poorer attentional
selectivity on auditory tasks (e.g., Oh et al., 2001) and reduced
unmasking for tone detection under dichotic conditions
(Wightman et al., 2003; Hall et al., 2004), the possibility
remains that age differences would be seen under more
demanding conditions, such as an open-set test or with same-gender
target and competitors. Those differences, however,
would not be attributable to age-dependent binaural abilities,
but rather to other central processes such as auditory attention.
E. Conclusions
Young children require higher signal levels than adults
to identify spondees in a simple 4AFC task, and these age-
related differences may be mediated by both peripheral and
central auditory processes. The fact that young children can
benefit from spatial separation of the target speech and com-
peting sources suggests that in a complex acoustic environ-
ment, such as a noisy classroom, they might find it easier to
attain information if the source of interest is spatially segre-
gated from noise sources. Although, the extent to which this
is true with real-world sounds may depend on duration, com-
plexity and type of sounds, and the demand on attentional
resources that various sounds may require. Finally, the test
used here 共developed by Litovsky, 2003兲 is designed to also
be used in pediatric clinical settings where young children
are often fitted with hearing aids or cochlear implants, with
little knowledge about the efficacy of the fittings in noisy
environments. This test may offer a way to evaluate the abili-
ties in children with hearing aids and cochlear implants to
function in noisy environments, and may, for example, be
useful in assessing the extent to which children obtain a ben-
efit from bilateral fitting strategies 共Litovsky et al., 2004兲.
ACKNOWLEDGMENTS
The author is grateful to Aarti Dalal and Gerald Ng for
assistance with programming and data collection, and to Patti
Johnstone and Shelly Godar for helping with data analysis.
The author is also grateful to Dr. Joseph Hall for initially
suggesting the use of spondees in a forced choice paradigm,
and to Dr. Adelbert Bronkhorst and an anonymous reviewer
for helpful suggestions during the review process. This work
was supported by NIDCD (Grant Nos. DC00100 and
DC0055469), the National Organization for Hearing Research,
and the Deafness Research Foundation. Portions of the data
were collected while R. Litovsky was at Boston University,
Hearing Research Center.
APPENDIX: LIST OF SPONDEE WORDS USED IN THE
PRESENT EXPERIMENT
Hotdog
Ice Cream
Birdnest
Cowboy
Dollhouse
Barnyard
Scarecrow
Railroad
Sidewalk
Rainbow
Cupcake
Birthday
Airplane
Eyebrow
Shoelace
Toothbrush
Hairbrush
Highchair
Necktie
Playground
Football
Baseball
Bluejay
Bathtub
Bedroom
1The lower limit of 4.5 years is slightly conservative, and was based on pilot
testing which suggested that by that age all children were familiar with the
majority of the target words. The upper limit of 7.5 years is somewhat smaller
than the 10-year limit used in a number of other works (e.g., Oh et al.,
2001; Hall et al., 2002), but similar to that used in studies on auditory
attention in young children, in which there do not appear to be developmental
effects within this age range (e.g., Stellmack et al., 1997; Oh et al., 2001).
Allen, P., and Wightman, F. (1994). "Psychometric functions for children's detection of tones in noise," J. Speech Hear. Res. 37, 205–215.
Allen, P., Wightman, F., Kistler, D., and Dolan, T. (1989). "Frequency resolution in children," J. Speech Hear. Res. 32, 317–322.
American National Standards Institute (2002). "Standard for acoustical characteristics of classrooms in the United States," ANSI S12.60.
Arbogast, T. L., Mason, C. R., and Kidd, G. (2002). "The effect of spatial separation on informational and energetic masking of speech," J. Acoust. Soc. Am. 112, 2086–2098.
Bronkhorst, A. (2000). "The cocktail party phenomenon: A review of research on speech intelligibility in multiple-talker conditions," Acta Acust. Acust. 86, 117–128.
Bronkhorst, A. W., and Plomp, R. (1992). "Effect of multiple speechlike maskers on binaural speech recognition in normal and impaired hearing," J. Acoust. Soc. Am. 92, 3132–3139.
Brungart, D. S., Simpson, B. D., Ericson, M. A., and Scott, K. R. (2001). "Informational and energetic masking effects in the perception of multiple talkers," J. Acoust. Soc. Am. 110, 2527–2538.
Buss, E., Hall, III, J. W., Grose, J. H., and Dev, M. B. (1999). "Development of adult-like performance in backward, simultaneous, and forward masking," J. Speech Lang. Hear. Res. 42, 844–849.
Crandell, C. C. (1993). "Speech recognition in noise by children with minimal degrees of sensorineural hearing loss," Ear Hear. 14, 210–216.
Culling, J. F., Hawley, M. L., and Litovsky, R. Y. (2004). "The role of head-induced interaural time and level differences in the speech reception threshold for multiple interfering sound sources," J. Acoust. Soc. Am. 116, 1057–1065.
Dawson, P. W., Decker, J. A., and Psarros, C. E. (2004). "Optimizing dynamic range in children using the Nucleus cochlear implant," Ear Hear. 25, 230–241.
Durlach, N. I., Mason, C. R., Shinn-Cunningham, B. G., Arbogast, T. L., Colburn, H. S., and Kidd, Jr., G. (2003). "Informational masking: Counteracting the effects of stimulus uncertainty by decreasing target-masker similarity," J. Acoust. Soc. Am. 114, 368–379.
Eisenberg, L. S., Kirk, K. I., Martinez, A. S., Ying, E. A., and Miyamoto, R. T. (2004). "Communication abilities of children with aided residual hearing: Comparison with cochlear implant users," Arch. Otolaryngol. Head Neck Surg. 130, 563–569.
Festen, J. M., and Plomp, R. (1990). "Effects of fluctuating noise and interfering speech on the speech-reception threshold for impaired and normal hearing," J. Acoust. Soc. Am. 88, 1725–1736.
Fletcher, P., and MacWhinney, B. (1995). Handbook of Child Language (Blackwell, Oxford, UK).
Freyman, R. L., Balakrishnan, U., and Helfer, K. S. (2001). "Spatial release from informational masking in speech recognition," J. Acoust. Soc. Am. 109, 2112–2122.
Freyman, R. L., Balakrishnan, U., and Helfer, K. S. (2004). "Effect of number of masking talkers and auditory priming on informational masking in speech recognition," J. Acoust. Soc. Am. 115, 2246–2256.
Hall, J. W., Buss, E., Grose, J. H., and Dev, M. B. (2004). "Developmental effects in the masking-level difference," J. Speech Lang. Hear. Res. 47, 13–20.
Hall, III, J. W., and Grose, J. H. (1991). "Notched-noise measures of frequency selectivity in adults and children using fixed-masker-level and fixed-signal-level presentation," J. Speech Hear. Res. 34, 651–660.
Hall, III, J. W., Grose, J. H., and Dev, M. B. (1997). "Auditory development in complex tasks of comodulation masking release," J. Speech Lang. Hear. Res. 40, 946–954.
Hall, III, J. W., Grose, J. H., Buss, E., and Dev, M. B. (2002). "Spondee recognition in a two-talker masker and a speech-shaped noise masker in adults and children," Ear Hear. 23, 159–165.
Hall, III, J. W., Grose, J. H., Buss, E., Dev, M. B., Drake, A. F., and Pillsbury, H. C. (2003). "The effect of otitis media with effusion on perceptual masking," Arch. Otolaryngol. Head Neck Surg. 129, 1056–1062.
Hartley, D. E., Wright, B. A., Hogan, S. C., and Moore, D. R. (2000). "Age-related improvements in auditory backward and simultaneous masking in 6- to 10-year-old children," J. Speech Lang. Hear. Res. 43, 1402–1415.
Hawley, M. L., Litovsky, R. Y., and Colburn, H. S. (1999). "Speech intelligibility and localization in complex environments," J. Acoust. Soc. Am. 105, 3436–3448.
Hawley, M. L., Litovsky, R. Y., and Culling, J. F. (2004). "The benefit of binaural hearing in a cocktail party: Effect of location and type of interferer," J. Acoust. Soc. Am. 115, 833–843.
Johnstone, P., and Litovsky, R. Y. (2005). "Speech intelligibility and spatial release from masking in children and adults for various types of interfering sounds," J. Acoust. Soc. Am. (in press).
Kidd, Jr., G., Mason, C. R., and Richards, V. M. (2003). "Multiple bursts, multiple looks, and stream coherence in the release from informational masking," J. Acoust. Soc. Am. 114, 2835–2845.
Koehnke, J., and Besing, J. M. (1996). "A procedure for testing speech intelligibility in a virtual listening environment," Ear Hear. 17, 211–217.
Knecht, H. A., Nelson, P. B., Whitelaw, G. M., and Feth, L. L. (2002). "Background noise levels and reverberation times in unoccupied classrooms: Predictions and measurements," Am. J. Audiol. 11, 65–71.
Litovsky, R. (1997). "Developmental changes in the precedence effect: Estimates of minimal audible angle," J. Acoust. Soc. Am. 102, 1739–1745.
Litovsky, R. (2003). "Method and system for rapid and reliable testing of speech intelligibility in children," U.S. Patent No. 6,584,440.
Litovsky, R., and Ashmead, D. (1997). "Developmental aspects of binaural and spatial hearing," in Binaural and Spatial Hearing, edited by R. H. Gilkey and T. R. Anderson (Erlbaum, Hillsdale, NJ), pp. 571–592.
Litovsky, R. Y., Fligor, B., and Tramo, M. (2002). "Functional role of the human inferior colliculus in binaural hearing," Hear. Res. 165, 177–188.
Litovsky, R. Y., Parkinson, A., Arcaroli, J., Peters, R., Lake, J., Johnstone, P., and Yu, G. (2004). "Bilateral cochlear implants in adults and children," Arch. Otolaryngol. Head Neck Surg. 130, 648–655.
Lutfi, R. A. (1990). "How much masking is informational masking?" J. Acoust. Soc. Am. 88, 2607–2610.
Lutfi, R. A., Kistler, D. J., Oh, E. L., Wightman, F. L., and Callahan, M. R. (2003). "One factor underlies individual differences in auditory informational masking within and across age groups," Percept. Psychophys. 65, 396–406.
Moore, D. R., Hutchings, M., and Meyer, S. (1991). "Binaural masking level differences in children with a history of otitis media," Audiology 30, 91–101.
Moore, D. R., Hartley, D. E., and Hogan, S. C. (2003). "Effects of otitis media with effusion (OME) on central auditory function," Int. J. Pediatr. Otorhinolaryngol. 67, S63–S67.
Neff, D. L., and Green, D. M. (1987). "Masking produced by spectral uncertainty with multicomponent maskers," Percept. Psychophys. 41, 409–415.
Nilsson, M., Soli, S. D., and Sullivan, J. A. (1994). "Development of the Hearing in Noise Test for the measurement of speech reception thresholds in quiet and in noise," J. Acoust. Soc. Am. 95, 1085–1099.
Nozza, R. J., Wagner, E. F., and Crandell, M. A. (1988). "Binaural release from masking for a speech sound in infants, preschoolers, and adults," J. Speech Hear. Res. 31, 212–218.
Oh, E. L., Wightman, F., and Lutfi, R. A. (2001). "Children's detection of pure-tone signals with random multiple maskers," J. Acoust. Soc. Am. 109, 2888–2895.
Papso, C. F., and Blood, I. M. (1989). "Word recognition skills of children and adults in background noise," Ear Hear. 10, 337–338.
Peissig, J., and Kollmeier, B. (1997). "Directivity of binaural noise reduction in spatial multiple noise-source arrangements for normal and impaired listeners," J. Acoust. Soc. Am. 101, 1660–1670.
Plomp, R., and Mimpen, A. M. (1981). "Effect of the orientation of the speaker's head and the azimuth of a noise source on the speech reception threshold for sentences," Acustica 48, 325–328.
Roberts, J., Hunter, L., Gravel, J., Rosenfeld, R., Berman, S., Haggard, M., Hall, III, J., Lannon, C., Moore, D., Vernon-Feagans, L., and Wallace, I. (2004). "Otitis media, hearing loss, and language learning: Controversies and current research," J. Dev. Behav. Pediatr. 25, 110–122.
Rothauser, E. H., Chapman, W. D., Guttman, N., Nordby, K. S., Silbiger, H. R., Urbanek, G. E., and Weinstock, M. (1969). "IEEE recommended practice for speech quality measurements," IEEE Trans. Audio Electroacoust. 17, 227–246.
Shinn-Cunningham, B. G., Schickler, J., Kopco, N., and Litovsky, R. (2001). "Spatial unmasking of nearby speech sources in a simulated anechoic environment," J. Acoust. Soc. Am. 110, 1118–1129.
Stellmack, M. A., Willihnganz, M. S., Wightman, F. L., and Lutfi, R. A. (1997). "Spectral weights in level discrimination by preschool children: Analytic listening conditions," J. Acoust. Soc. Am. 101, 2811–2821.
Uitenbroek, D. G. (1997). "SISA Binomial," Southampton: D. G. Uitenbroek. Retrieved 1 January 2004 from http://home.clara.net/sisa/binomial.htm.
Veloso, K., Hall, III, J. W., and Grose, J. H. (1990). "Frequency selectivity and comodulation masking release in adults and in 6-year-old children," J. Speech Hear. Res. 33, 96–102.
Wichmann, F. A., and Hill, N. J. (2001a). "The psychometric function: I. Fitting, sampling, and goodness of fit," Percept. Psychophys. 63, 1293–1313.
Wichmann, F. A., and Hill, N. J. (2001b). "The psychometric function: II. Bootstrap-based confidence intervals and sampling," Percept. Psychophys. 63, 1314–1329.
Wightman, F. L., Callahan, M. R., Lutfi, R. A., Kistler, D. J., and Oh, E. (2003). "Children's detection of pure-tone signals: Informational masking with contralateral maskers," J. Acoust. Soc. Am. 113, 3297–3305.
Yacullo, W. S., and Hawkins, D. B. (1987). "Speech recognition in noise and reverberation by school-age children," Audiology 26, 235–246.