Speech intelligibility and spatial release from masking in young
children^a)

Ruth Y. Litovsky^b)
Waisman Center, University of Wisconsin—Madison, 1500 Highland Avenue, Madison, Wisconsin 53705
(Received 13 August 2003; accepted for publication 27 January 2005)
Children between the ages of 4 and 7 and adults were tested in free field on speech intelligibility
using a four-alternative forced choice paradigm with spondees. Target speech was presented from
front (0°); speech or modulated speech-shaped-noise competitors were either in front or on the right
(90°). Speech reception thresholds were measured adaptively using a three-down/one-up algorithm.
The primary difference between children and adults was seen in elevated thresholds in children in
quiet and in all masked conditions. For both age groups, masking was greater with the speech-noise
versus speech competitor and with two versus one competitor(s). Masking was also greater when the
competitors were located in front compared with the right. The amount of masking did not differ
across the two age groups. Spatial release from masking was similar in the two age groups, except
for in the one-speech condition, when it was greater in children than adults. These findings suggest
that, similar to adults, young children are able to utilize spatial and/or head shadow cues to segregate
sounds in noisy environments. The potential utility of the measures used here for studying
hearing-impaired children is also discussed. © 2005 Acoustical Society of America.
DOI: 10.1121/1.1873913
PACS numbers: 43.66.Pn, 43.66.Qp, 43.71.Ft [AK] Pages: 3091–3099
I. INTRODUCTION
Children spend numerous hours every day in complex
auditory environments, such as classrooms, where multiple
sounds that vary in content and direction typically co-occur.
In addition to voices of adults and children, instructional
aids, environmental sounds, and reverberation are standard
aspects of acoustic environments in classrooms. Some work
indicates that children learn best in relatively quiet environ-
ments, and often have difficulty hearing speech in the pres-
ence of distracting sounds (Crandell, 1993; Yacullo and
Hawkins, 1987; Papso and Blood, 1989). Psychophysical
studies in which stimuli were presented over headphones
have shown that, compared with adults, preschool listeners
exhibit poorer attentional selectivity on auditory tasks (e.g.,
Stellmack et al., 1997; Oh et al., 2001) and reduced unmask-
ing for tone detection under dichotic conditions (Wightman
et al., 2003; Hall et al., 2004).
Also under headphones, it has been found that in the
presence of two-talker maskers speech reception thresholds
are higher in children than adults, and for both age groups
thresholds are higher in the presence of two-talker maskers
than with speech-shaped noise maskers (Hall et al., 2002).
Headphone stimulus presentation is limited, however, be-
cause spatial cues that are known to be important for sound
segregation in realistic environments are missing. Studies
with adults have shown that the ability to segregate target
speech from competing speech and/or noise is determined by
a complex set of auditory computations that involve both
monaural and binaural processes (Hawley et al., 1999, 2004;
Bronkhorst, 2000; Culling et al., 2004). Spatial cues in par-
ticular play a key role in facilitating source segregation.
Speech intelligibility improves by up to 12 dB when the
target speech and competing sounds are spatially separated,
resulting in "spatial release from masking" (Plomp and
Mimpen, 1981; Bronkhorst and Plomp, 1992; Nilsson et al.,
1994; Koehnke and Besing, 1996; Peissig and Kollmeier,
1997; Hawley et al., 1999, 2004; Shinn-Cunningham et al.,
2001; Litovsky et al., 2002).
The extent to which children demonstrate spatial release
from masking for speech is poorly understood. Of particular
interest in the present study is the effect of number of
maskers, as well as their content, on the extent to which
young children experience spatial release from masking. In
adult listeners spatial release from masking is especially
large for multiple (two or more) maskers that carry linguistic
content or context (i.e., speech or reversed speech), and rela-
tively small for a single, nonspeech masker such as speech-
shaped noise (Hawley et al., 2004; see also Bronkhorst,
2000, for a review). The authors of those works have con-
cluded that release from masking as provided by spatial cues
is particularly effective when the auditory environment is
complex. The concept of "informational masking" has been
invoked to explain this phenomenon, whereby, in the pres-
ence of maskers that are harder to ignore, spatial cues be-
come important for sound source segregation. In this case,
maskers that are multiple in number and/or that carry infor-
mation resembling that contained in the target result in
greater spatial release from masking (e.g., Brungart, 2001;
Freyman et al., 2001; Arbogast et al., 2002; Durlach et al.,
2003).
a) Select portions of these data were presented at the 143rd Meeting of the
Acoustical Society of America, Pittsburgh, PA, and at the 24th Meeting of
the Association for Research in Otolaryngology, Tampa, FL.
b) Electronic mail: litovsky@waisman.wisc.edu

J. Acoust. Soc. Am. 117 (5), May 2005. 0001-4966/2005/117(5)/3091/9/$22.50 © 2005 Acoustical Society of America

Several studies have reported that speech masking in
children depends on the masker type (Papso and Blood,
1989; Hall et al., 2002, 2004). However, the effect of number
and spatial cues, and the possible contribution of these
stimulus parameters to spatial release from masking, remain
poorly understood. Binaural abilities in children are adultlike
on measures of binaural masking level differences (Nozza
et al., 1988; Moore et al., 1991) and minimum audible angle
(Litovsky, 1997). Since spatial cues are known to play a key
role in speech understanding for adults, it is important to
understand how young children comprehend speech in real-
istic, multi-source acoustic environments, and the conditions
that enable them to benefit from spatial cues. The research
paradigm used here may ultimately also be useful in evalu-
ating performance of hearing-impaired children. Noisy envi-
ronments are particularly problematic for children with a his-
tory of otitis media (e.g., Hall et al., 2003; Moore et al.,
2003; Roberts et al., 2004) and for hearing aid and cochlear
implant users (e.g., Dawson et al., 2004; Eisenberg et al.,
2004; Litovsky et al., 2004). Because the important task of
hearing speech in noise can be a daily struggle for many of
these children, ultimately their performance on these mea-
sures can assist with diagnosis and fitting strategies.
In the present study the task involved a four-alternative
forced-choice (4AFC) word discrimination paradigm. Sub-
jects selected a picture that matched the speech target from
an array of four pictures that appeared on a computer moni-
tor. Other tests such as the HINT-C (Nilsson et al., 1994)
may be usable for measuring speech intelligibility in noise in
children as young as 6 years, but are difficult to implement
with younger children. The test protocol described here was
specifically designed to enable the study of speech intelligi-
bility in noise in children as young as 4 years old, an age at
which many children begin to spend a significant number of
hours in noisy environments such as preschool classrooms.
II. METHODS
A. Subjects
A total of 36 volunteer children were recruited from lo-
cal public schools and the general community (14 males and
22 females), and all subjects completed testing on the three
required conditions. Subjects ranged in age from 4.5 to 7.5
years (average ± standard deviation = 5.5 ± 1 years; see also
Table I).¹ All were native speakers of English with no known
auditory dysfunction or other cognitive disorders. According
to the parents’ report, none of the children were on medica-
tion or had known illness or ear infections on the day of
testing, and none of the children had a known history of
hearing loss. Total testing time for each listener was approxi-
mately 45 min.
Nine paid adult volunteers, with normal hearing as veri-
fied by standard audiometric testing for frequencies between
250 and 8000 Hz, and English as their first language, were
also tested. Since testing was much less time consuming with
adults than with children, a within-subject design was used
whereby each subject participated in all conditions that per-
tained to the four groups of children.
B. Testing chamber, materials, and apparatus

Testing was conducted in a single-walled sound booth
(3.6 × 4 m) with carpeting. This room had a reverberation
time (T60) of approximately 250 ms and ambient noise levels
averaging 35 dB SPL. During testing, subjects were always
seated in the center of the room, with loudspeakers (Radio
Shack Minimus 7) placed 15.24 cm above ear level for chil-
dren (ear level for adults) and at a distance of 1.67 m from the center
of the subject's head. All stimuli were prerecorded, digitized,
and stored on a laptop computer (Winbook). In the one-
competitor conditions, the target and competing sound were
fed to separate channels of a two-channel soundcard (Digi-
gram VX Pocket), amplified (Crown D-75), and presented to
separate loudspeakers. When both target and competitor
were presented from the front position, the speakers were
placed next to one another, with their centers at ±2° and
their medial walls nearly touching. Each loudspeaker sub-
tended 4° in the horizontal dimension; hence, strictly speak-
ing, speakers were separated by 4°. In the two-competitor
condition, when both occurred from the front, they were pre-
sented from the same loudspeaker. Target stimulus selection,
level controls, and output as well as response acquisition
were achieved using Matlab. A picture book containing four
target pictures per page was placed on a small table in front
of the subject.
C. Stimuli
Stimuli consisted of target words and competing sen-
tences. Targets comprised a closed set of 25 spondaic words
from CID W-1 (obtained from Auditech) and spoken by a male
talker. Although a larger set of words is available, the subset
chosen for the present study consisted of words that were
easily represented with a visual illustration and readily rec-
ognized as such during pilot testing of 20 four- to five-year-old
children (a list of the target words used is shown in the
Appendix). The root-mean-square levels were equalized for
all target words using Matlab software. The competitors were
either speech or modulated speech-shaped noise. Competing
sentences were taken from the Harvard IEEE list (Rothauser
et al., 1969) and recorded with a female voice. Examples of
sentences are "Glue the sheet to the dark blue background,"
"Two blue fish swam in the tank," and "The meal was
cooked before the bell rang." Ten such sentences were used,
and these were presented in a random order during testing.

TABLE I. List of conditions tested for children (nine subjects per condition).

Group  No. of competitors  Age, years.months (±SD)  Competitor type  Conditions
1      1                   5.4 (±1.1)               Speech           Quiet, 1 front, 1 right
2      1                   5.6 (±1.2)               Speech-noise     Quiet, 1 front, 1 right
3      2                   5.8 (±1)                 Speech           Quiet, 2 front, 2 right
4      2                   5.6 (±1)                 Speech-noise     Quiet, 2 front, 2 right
Speech-noise was made based on the ten competitor sen-
tences and also played in a random order during testing.
These interferers were filtered to match the long-term spec-
trum of the speech competitors, calculated for each talker
separately. The noise samples were scaled to the same root-
mean-square value and cut to the same length as the match-
ing speech competitor. The envelope was then extracted from
the speech competitor and was used to modulate the noise
tokens, giving the same coarse temporal structure as the
speech. The envelope of running speech was extracted using
a method similar to that described by Festen and Plomp
(1990), in which a rectified version of the waveform is low-
pass filtered. A first-order Butterworth low-pass filter was
used with a 3-dB cutoff at 40 Hz.
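The noise-construction steps above (shaping noise to the talker's long-term spectrum, rms scaling, envelope extraction by rectification and first-order Butterworth low-pass filtering at 40 Hz, and modulation) can be sketched as follows. This is an illustrative Python reconstruction rather than the original Matlab code; the function name, sample rate, and FFT-based spectral shaping are assumptions:

```python
import numpy as np
from scipy.signal import butter, lfilter

def modulated_speech_noise(speech, fs):
    """Make noise with the long-term spectrum and coarse envelope of `speech`."""
    n = len(speech)
    # Shape white noise to match the talker's long-term magnitude spectrum.
    noise = np.random.randn(n)
    spec = np.fft.rfft(noise)
    shaped = np.fft.irfft(spec / np.abs(spec) * np.abs(np.fft.rfft(speech)), n)
    # Scale to the same root-mean-square value as the speech competitor.
    shaped *= np.sqrt(np.mean(speech**2) / np.mean(shaped**2))
    # Envelope: rectify, then first-order Butterworth low-pass, 3-dB cutoff 40 Hz.
    b, a = butter(1, 40 / (fs / 2))
    envelope = lfilter(b, a, np.abs(speech))
    return shaped * envelope

fs = 16000
speech = np.random.randn(fs)          # stand-in for a recorded sentence
out = modulated_speech_noise(speech, fs)
```

The modulation step imposes the speech competitor's coarse temporal structure on the spectrally matched noise, as described in the text.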
D. Design
The target words were always presented from the front
(0°). Competitors were presented from either the front or the side
(90°). Four groups of children with nine subjects per group
were tested (see Table I). The side condition was always with
competitors on the right. Each child subject was randomly
assigned to a group that was tested on one combination of
type (speech or speech-noise) and number (1 or 2) of com-
petitors. The subject was then tested on three conditions:
(1) quiet: no competitors; (2) front: target and competitors
in front; and (3) right: target in front and competitor(s) at 90°
on the right; the order of conditions was randomized using a
Latin-square design. For the adult group, testing was con-
ducted in a single 2-h session, with the order of the nine
conditions randomized for each listener.
For each condition one adaptive track was measured.
When two competitors were presented they were of the same
type, but different samples were used for the two sources; in
the two-speech conditions the same female voice was pre-
sented, speaking two different sentences, and in the two-
speech-noise conditions two different segments of the noise
were presented.
E. Familiarization
The present study was not aimed at testing children’s
vocabulary, but rather their speech intelligibility for known
words. The 25 words were selected from the spondee list
after pilot testing indicated that 20 four- to five-year-old children
were either familiar with the words or could easily ascertain
their meaning after one presentation. For each of the 25
words a commissioned artist-drawn picture was used to vi-
sually represent the meaning of the word. Prior to testing,
subjects underwent a familiarization session (approximately
5 min in duration) in which they were presented with the
picture-word combinations and tested to ensure that they as-
sociated each of the pictures with their intended auditory
target.
F. Speech reception threshold estimation
The test involved a single-interval 4AFC discrimination
procedure. On each trial, the child viewed a set of four pic-
tures from the set of 25 picture-word matches. A word
matching one of the pictures was randomly selected and pre-
sented from the front speaker. A leading phrase such as
"Point to the picture of the…" or "Where is the…" preceded
each target word. The child was asked to select the picture
matching the heard word, and to guess if not sure or if the
word was not audible. The randomization process ensured
that for every subject, on average, all 25 words were selected
an equal number of times. The experimenter entered the
child’s response into the computer. Following correct re-
sponses, feedback was provided in the form of 3-s musical
clips from popular children’s music. Approximately 20 clips
were digitized and stored on the computer, and randomly
selected on correct-feedback trials. Following incorrect re-
sponses, feedback was provided in the form of a brief phrase
such as "Let's try another one" or "That must have been
difficult." Five such phrases were digitized and stored on the
computer, and randomly selected on incorrect-feedback tri-
als.
An adaptive tracking method was used to vary the level
of the target signal, such that correct responses result in level
decrement and incorrect responses result in level increment.
The algorithm includes the following rules: (1) Level is ini-
tially reduced in steps of 8 dB, until the first incorrect re-
sponse. (2) Following the first incorrect response a three-
down/one-up rule is used, whereby level is decremented
following three consecutive correct responses and level is
incremented following a single incorrect response. (3) Fol-
lowing each reversal the step size is halved. (4) The mini-
mum step size is 2 dB. (5) A step size that has been used
twice in a row in the same direction is doubled. For instance,
if the level was decreased from 40 to 36 (step = 4) and then
again from 36 to 32 (step = 4), continued decrease in level
would result in the next level being 24 (step = 8). (6) After
three consecutive incorrect responses a "probe" trial is pre-
sented at the original level of 60 dB. If the probe results in a
correct response the algorithm resumes at the last trial before
the probe was presented. If more than three consecutive
probes are required, testing is terminated and the subject's
data are not included in the final sample. (7) Testing is ter-
minated following five reversals.
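A simplified sketch of these tracking rules, covering the initial 8-dB descent, the three-down/one-up stage, step halving at reversals, the 2-dB floor, and termination after five reversals, might look like the following. The probe-trial and step-doubling rules are omitted for brevity, and the function is a hypothetical illustration, not the original implementation:

```python
def run_staircase(respond, start_level=60.0):
    """Track level with 8-dB steps until the first error, then 3-down/1-up.

    `respond(level)` returns True for a correct response. The step size is
    halved after each reversal, with a 2-dB minimum; the track stops after
    five reversals and returns the list of levels visited.
    """
    level, step = start_level, 8.0
    levels, correct_run, reversals = [], 0, 0
    last_direction = -1            # track starts descending
    had_error = False
    while reversals < 5:
        levels.append(level)
        if respond(level):
            if not had_error:
                level -= step      # initial descent in 8-dB steps
                continue
            correct_run += 1
            if correct_run == 3:   # three in a row -> decrement
                correct_run = 0
                if last_direction == +1:
                    reversals += 1
                    step = max(step / 2, 2.0)
                last_direction = -1
                level -= step
        else:
            had_error = True
            correct_run = 0
            if last_direction == -1:
                reversals += 1
                step = max(step / 2, 2.0)
            last_direction = +1
            level += step          # any error -> increment
    return levels

# Deterministic simulated listener: correct whenever the level exceeds 30 dB.
track = run_staircase(lambda lv: lv > 30)
```

With this simulated listener the track descends from 60 dB in 8-dB steps, begins reversing near 30 dB, and terminates after the fifth reversal.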
For each subject, speech reception thresholds (SRTs)
were measured for each condition. At the start of each SRT
measurement, the level of the target was initially 60 dB SPL.
When competitors were present (non-quiet conditions), the
level of each competitor was fixed at 60 dB SPL, such that
the overall level of the competitors was increased by ap-
proximately 3 dB when two competitors were presented
compared with the one-competitor conditions. Thus, the
adaptive track began with a signal-to-noise ratio of 0 dB in
the one-competitor cases and −3 dB in the two-competitor
cases.
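The approximately 3-dB increase follows from power addition of two uncorrelated sources at equal level, as this small check illustrates:

```python
import math

competitor_db = 60.0
# Two uncorrelated 60-dB competitors add in power: +10*log10(2) ≈ +3 dB.
overall_db = competitor_db + 10 * math.log10(2)
snr_two = 60.0 - overall_db          # target also starts at 60 dB SPL
print(round(overall_db, 2), round(snr_two, 2))   # 63.01 -3.01
```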
Results were analyzed using a constrained maximum-
likelihood method of parameter estimation outlined by Wich-
mann and Hill (2001a, b). All the data from each experimen-
tal run for each participant were fit to a logistic function.
Thresholds were calculated by taking the inverse of the func-
tion at a specific probability level. In our 4AFC task, using
an adaptive three-down/one-up procedure, the lower bound
of the psychometric function was fixed at the level of chance
performance, 0.25, and the threshold level corresponded to
the point on the psychometric function where performance
was approximately 79.4% correct. Biased estimates of
threshold can occur. Bias can be introduced by the sampling
scheme used and lapses in listener attention. Wichmann and
Hill (2001a, b) demonstrated that bias associated with lapses
was easily overcome by introducing a highly constrained pa-
rameter to control the upper bound of the psychometric func-
tion. This approach was used to assess our data. The upper
bound of the psychometric function was constrained within a
narrow range (0–0.06), as suggested by Wichmann and Hill
(2001b). As the authors suggest, under some circumstances,
bias introduced by the sampling scheme may be more prob-
lematic to avoid even when a hundred trials are obtained per
level visited. The possibility of biased threshold estimates
due to our sampling scheme was assessed by comparing the
thresholds obtained using the constrained maximum-
likelihood method with traditional threshold estimates based
on the last three reversals in each experimental run. A re-
peated-measures t-test on quiet thresholds for the 36 children
tested revealed no statistically significant difference
between the estimated threshold values obtained using the
ML approach versus the traditional approach [t(35) = 1.37,
p > 0.05, two-tailed].
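The fitting approach can be sketched as follows: a logistic psychometric function with the lower bound fixed at chance (0.25) and the lapse parameter constrained near zero is fit by maximum likelihood, and threshold is read off at 79.4% correct. This Python sketch using `scipy.optimize` illustrates the general method, not the authors' code; the synthetic data and starting values are assumptions:

```python
import numpy as np
from scipy.optimize import minimize

GAMMA = 0.25                         # chance level in a 4AFC task

def p_correct(x, alpha, beta, lam):
    """Logistic psychometric function with guess rate GAMMA and lapse rate lam."""
    return GAMMA + (1 - GAMMA - lam) / (1 + np.exp(-(x - alpha) / beta))

def fit_threshold(levels, correct, target=0.794):
    """Constrained ML fit; returns the level at `target` proportion correct."""
    levels, correct = np.asarray(levels, float), np.asarray(correct, float)

    def nll(params):
        alpha, beta, lam = params
        p = np.clip(p_correct(levels, alpha, beta, lam), 1e-9, 1 - 1e-9)
        return -np.sum(correct * np.log(p) + (1 - correct) * np.log(1 - p))

    res = minimize(nll, x0=[np.mean(levels), 2.0, 0.01],
                   bounds=[(None, None), (0.1, 20.0), (0.0, 0.06)])
    alpha, beta, lam = res.x
    # Invert the fitted function at the target probability.
    frac = (target - GAMMA) / (1 - GAMMA - lam)
    return alpha + beta * np.log(frac / (1 - frac))

# Synthetic track: responses from a listener with true threshold near 30 dB.
rng = np.random.default_rng(0)
lv = np.repeat(np.arange(20, 42, 2), 20)
resp = rng.random(lv.size) < p_correct(lv, 30.0, 2.0, 0.01)
thr = fit_threshold(lv, resp)
```

Note that the lapse bound (0 to 0.06) mirrors the constraint on the upper asymptote described above.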
III. RESULTS
SRTs were statistically analyzed for the child groups
using a mixed-design analysis of variance (ANOVA) with
two between-subjects variables (number of competitors,
competitor type) and one within-subjects variable (condi-
tion). Significant main effects of number [F(1,32) = 4.05;
p < 0.05] and condition [F(2,32) = 119.57, p < 0.0001]
were found, but there was no effect of type. Significant in-
teractions were found for condition with number [F(2,64)
= 66.50; p < 0.03] and condition with type [F(2,64)
= 162.01; p < 0.001]. Scheffé's post hoc contrasts (signifi-
cance value p < 0.05) showed that SRTs in quiet were sig-
nificantly lower than SRTs in either front or right. Children
tested with two competitors had significantly higher SRTs
than those tested with one competitor for the front and right
conditions (further comparisons between front and right are
described below with regard to spatial release from mask-
ing). Finally, for reasons that are not clear, SRTs in the quiet
conditions were lower in the two speech-noise groups than in
the groups tested with the speech competitors. Adult
data were analyzed with a one-way ANOVA for the nine
conditions, which revealed a significant main effect
[F(8,8) = 3.77; p < 0.05]. Scheffé's post hoc contrasts
(p < 0.01) revealed that quiet SRTs were lower than
SRTs on all other conditions. Child and adult SRTs were
compared with independent t-tests for each of the nine con-
ditions; since the quiet condition was tested for each of the
child groups, a total of 12 comparisons were conducted. The
Bonferroni adjustment for multiple comparisons as described
by Uitenbroek (1997) was applied (df = 16, criterion of
t ≥ 3.34 and p ≤ 0.004). Significant differences were found for
all 12 comparisons, suggesting that adults' SRTs were lower
than those of children for all conditions tested.
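The quoted criterion values follow from dividing the familywise alpha of 0.05 by the 12 comparisons and taking the two-tailed critical t at df = 16. A quick check (a sketch using `scipy.stats`, not part of the original analysis):

```python
from scipy.stats import t as t_dist

alpha, n_comparisons, df = 0.05, 12, 16
alpha_adj = alpha / n_comparisons              # 0.05/12 ≈ 0.0042 per comparison
# Two-tailed critical t for the adjusted alpha at df = 16.
t_crit = t_dist.ppf(1 - alpha_adj / 2, df)
print(round(alpha_adj, 4), round(t_crit, 2))
```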
Figure 1 shows group means (±SD) for masking (dif-
ferences between masked and quiet SRTs). For each subject,
masking amounts for front and right were obtained by sub-
tracting quiet SRTs from front and right SRTs, respectively.
To place the masking values into context, average (±SD)
SRTs for all groups and conditions are listed in Table II.
Statistical analyses on the amount of masking for the child
groups were conducted with a three-way mixed-design
ANOVA treating condition (front minus quiet, right minus
quiet) as the within-subjects variable and competitor type
and number as the between-subjects variables. A significant
effect of condition [F(1,32) = 29.13; p < 0.0001] suggests
that masking in the front minus quiet condition was higher
than in right minus quiet. Significant effects of type
[F(1,32) = 15.51; p < 0.0001] and number [F(1,32)
= 6.95; p = 0.013] further suggest that masking was greater
for two competitors than one, and greater for the speech-
noise competitor compared with speech. There were no sig-
nificant interactions. For the adult subjects, a three-way re-
peated-measures ANOVA (condition × type × number)
suggested, similar to the children, that masking was greater
in the front versus right conditions [F(1,8) = 27.72;
p < 0.001], greater with speech-noise than speech [F(1,8)
= 30.72; p < 0.001], and greater for two compared with one
competitor [F(1,8) = 16.71; p = 0.004]. Masking data for
child and adult groups were compared with independent
t-tests for each competitor location/type/number combina-
tion, and the Bonferroni correction for eight comparisons
was applied (Uitenbroek, 1997). None of the comparisons
yielded a significant difference in masking between the child
and adult groups, and none of the interactions were signifi-
cant.

FIG. 1. Average (±SD, dB SPL) differences between speech reception
thresholds (SRTs) in the masked and quiet conditions. Data are plotted for
front (top panels) and right (bottom panels) conditions, for children (left
panels) and adults (right panels). Each panel compares difference values for
the speech and speech-noise competitors when the number of competitors
was either one (black bars) or two (gray bars).

TABLE II. Mean (SD) speech reception thresholds in dB SPL.^a

Group             Quiet          Front          Right
Children
  1 speech        26.02 (3.81)   41.81 (6.31)   36.64 (6.48)
  2 speech        27.32 (5.25)   47.75 (6.30)   40.33 (6.29)
  1 speech-noise  23.25 (5.56)   44.37 (6.50)   40.13 (3.89)
  2 speech-noise  21.45 (3.3)    48.01 (2.07)   44.41 (7.18)
Adults
  1 speech         3.84 (3.18)   16.71 (5.66)   16.86 (3.84)
  2 speech                       23.35 (4.41)   20.43 (4.01)
  1 speech-noise                 27.39 (5.28)   22.25 (4.82)
  2 speech-noise                 32.82 (4.40)   27.60 (8.65)

^a It is important to recall that each child was tested on three conditions
(quiet, front, right) for one masker type, and that each adult was tested on
all nine conditions, hence only one entry in Table II for adult quiet thresh-
olds.
Spatial release from masking was defined as the differ-
ence between front masking (front minus quiet) and right
masking (right minus quiet). Figure 2 shows individual
points for right minus quiet plotted versus front minus quiet
for all subjects and conditions tested. If no spatial release
from masking occurred, the points would be expected to fall
along the diagonal. Points falling below the diagonal would
be indicative of spatial release from masking. Alternatively,
points falling above the diagonal would represent cases in
which thresholds were higher when the competitors were on
the right rather than in front. The majority of individual data
points in Fig. 2 are below the diagonal, and average points
for all but one group are also indicative of spatial release
from masking.
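Because the quiet SRT cancels in the difference, each subject's spatial release from masking reduces to the front SRT minus the right SRT; and since a mean of differences equals the difference of means, group values can be recovered from the Table II means. A minimal sketch:

```python
# Group-mean SRTs (dB SPL) for the child groups, from Table II.
table2_children = {
    "1 speech":       {"front": 41.81, "right": 36.64},
    "2 speech":       {"front": 47.75, "right": 40.33},
    "1 speech-noise": {"front": 44.37, "right": 40.13},
    "2 speech-noise": {"front": 48.01, "right": 44.41},
}

def spatial_release(front, right):
    """(front - quiet) - (right - quiet) simplifies to front - right."""
    return front - right

srm = {g: round(spatial_release(v["front"], v["right"]), 2)
       for g, v in table2_children.items()}
print(srm)
```

The resulting child-group values (roughly 3.6 to 7.4 dB) are consistent with the 3.6 to 7.5 dB range of group averages reported for the children.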
Figure 3 summarizes the findings for spatial release
from masking. For children, group average values are be-
tween 3.6 and 7.5 dB; the overall average for all 36 children
is 5.25 dB. For adults, group averages range from 0 to 5.2 dB
with an overall average of 3.34 dB. Children’s data were
analyzed with a two-way between-subjects ANOVA type
number, revealing no significant main effects or interac-
tions. This lack of an effect may not surprising given the
large intersubject variability, which is notable in Fig. 3A;
while some children had spatial release from masking values
greater than 10 dB, other children had values near 0, and a
small number had negative values. Adult data were analyzed
with a two-way repeated measures ANOVA typenumber,
also revealing no significant effects or interactions. Finally,
FIG. 2. Masking amounts (differences
between masked and quiet thresholds)
for the Right minus Quiet conditions
are plotted vs. Front minus Quiet con-
ditions. Panels (A) and (C) show data
for children and adults, respectively;
each symbol denotes data from an in-
dividual subject, and the four different
symbols refer to the type/number com-
bination of competitors. The diago-
nal lines denote equality between the
two variables. Panels (B) and (D)
show average group data from (A) and
(C), respectively, for the four condi-
tions tested.
to compare spatial release from masking for children and
adults independent t-tests were conducted for each type/
number combination, with the Bonferroni correction for four
contrasts applied (Uitenbroek, 1997). The only significant
difference between groups was for the one-speech competi-
tor condition, in which the average spatial release from
masking in adults is 0 dB, compared with an average value of
5.7 dB for the child group.
IV. DISCUSSION
Speech intelligibility in quiet and in the presence of
competing sounds (and the ability to benefit from spatial
separation of the speech and competitors) were investigated
in children and adults. Although extensively studied in
adults, to date this area of research has been minimal in
children. This study may therefore be helpful towards im-
proving our understanding of children's ability to hear and
learn in noisy and reverberant environments, especially
given that such abilities are known to be compromised com-
pared with abilities measured under quiet conditions (e.g.,
ANSI, 2002; Yacullo and Hawkins, 1987; Knecht et al.,
2002). The results can be summarized as follows: (1) Adults'
SRTs were lower than those of the children for all conditions.
(2) For both age groups masking was significantly greater
with speech-noise than with speech, and with two competi-
tors compared with one. (3) The amount of masking did not
differ across the two age groups. (4) The amount of spatial
release from masking was similar for children and adults on
all but one condition. (5) The number or type of competitor
did not affect the size of spatial release from masking for
either age group.
A. SRTs and masking amount
The primary age difference was that of higher SRTs in
children than adults, in quiet and in all masked conditions.
This age effect is consistent with existing developmental
psychoacoustic literature, which has shown that children
ages 4 to 7 typically have higher tone detection thresholds
compared with adults (e.g., Buss et al., 1999; Oh et al.,
2001). Similarly, recognition of spondee words such as those
used here in temporally modulated noise has been shown to
produce higher thresholds in 5 to 10 year-old children than in
adults (Hall et al., 2002).
The age effect found here can be attributed to a combi-
nation of peripheral and central mechanisms. Peripherally,
frequency resolution is highly similar to that of adults by 5
years of age (Allen et al., 1989; Hall and Grose, 1991;
Veloso et al., 1990). However, young children appear to in-
tegrate auditory information over a greater number of audi-
tory channels than adults, suggesting that their ability to ex-
tract auditory cues, and in the present study to identify target
words at low signal levels, is likely to be still developing
(e.g., Hall et al., 1997; Buss et al., 1999; Hartley et al.,
2000; Oh et al., 2001). Immaturity of central auditory pro-
cesses and the adoption of listening strategies that are non-
optimal or less efficient than adults' (Allen and Wightman,
1994; Lutfi et al., 2003) may have also affected SRTs. Fi-
nally, differences in thresholds may represent age-related dif-
ferences in the ability to take advantage of hearing partial
word segments and to "fill in" the remainder of the target
word. Anecdotal reports from adults suggest that they relied
heavily on this strategy at low signal levels. The ability to
adopt this strategy can most likely be attributed to adults'
having more experience and better-developed language
skills, including the ability to parse phonetic, semantic, and
lexical aspects of speech (Fletcher and MacWhinney, 1995).
Of interest is the lack of an age effect for the amount of
masking. Previous studies have typically shown that adults
experience reduced masking compared with children (e.g.,
Buss et al., 1999; Oh et al., 2001; Papso and Blood, 1989;
Hall et al., 2002). Although this explanation may not be en-
tirely satisfying, the lack of an age-related masking effect
may be attributed to the task itself. In the current study, using
the 4AFC task, quiet thresholds were extremely low in
adults. In contrast, adults tested on the same measure with
identical stimuli, but with a 25AFC task, did not reveal such low
SRTs in quiet, yet continued to show lower masked SRTs.
The amount of masking in the 25AFC task was therefore
lower in adults than in children (Johnstone and Litovsky, 2005).

FIG. 3. Spatial release from masking values are shown for children and
adults in panels (A) and (B), respectively. Each panel shows values grouped
by competitor type/number condition on the x axis; the labels Sp and Sp-Ns
refer to the speech and speech-noise conditions, respectively. Individual
values appear in gray circles, and group averages (±SD) are shown in black
circles. Where necessary to avoid overlap, some data points are shifted
slightly along the x axis.

3096 J. Acoust. Soc. Am., Vol. 117, No. 5, May 2005 Ruth Y. Litovsky: Speech intelligibility in young children
When task difficulty is increased for adults, a more realistic
picture of age-related masking differences may
emerge, underscoring the importance of equating task difficulty
when comparing perceptual abilities across age
groups.
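The role of the quiet baseline in these masking estimates can be made concrete with a short sketch. The numbers below are illustrative only, not data from the study; they show how the same masked SRT yields a larger masking estimate when the quiet SRT is very low, as in the adults' 4AFC data.

```python
def amount_of_masking(masked_srt_db, quiet_srt_db):
    """Amount of masking (dB): elevation of the masked speech
    reception threshold relative to the quiet threshold."""
    return masked_srt_db - quiet_srt_db

# Same masked SRT, different quiet baselines (illustrative values only):
masked = 40.0
print(amount_of_masking(masked, quiet_srt_db=10.0))  # very low quiet SRT -> 30.0 dB
print(amount_of_masking(masked, quiet_srt_db=20.0))  # higher quiet SRT  -> 20.0 dB
```

This is why an easier task, by lowering adults' quiet thresholds, can inflate their apparent masking and erase the usual adult advantage.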
B. Competitor type
SRTs did not differ for the two types of competitors for
children, but were higher with speech-noise than with speech for
the adults, which may be due in part to the greater statistical
power of the adult within-subjects comparisons. For both age
groups, masking was greater with speech-noise than with speech.
These findings are consistent with other findings in adults in
a one-masker paradigm, whereby greater amounts of masking
were reported in the presence of speech-noise compared
with speech (e.g., Hawley et al., 2004). This has been attributed
to greater overlap between the energies of the
speech-noise masker and the target, resulting in reduced
F0 discrimination. In previous work, however, as the
number of maskers increased, speech became a more potent
masker; an explanation involving informational masking and
linguistic interference from multiple speech maskers was invoked
to account for the increased interference from speech
(e.g., Bronkhorst, 2000; Hawley et al., 2004). Here, there
was no interaction of type and number of competitors, which
may be explained by stimulus differences across studies.
Studies such as those of Hawley et al. (2004) typically use
male voices for both the target and competitors, whereas here
the target was a male voice and the competitor was spoken
by a female. The differences in voice pitch and quality, and the
ongoing F0 differences, provided a robust cue for source segregation
in the presence of speech competitors, regardless of
the number of competitors. The speech-noise competitor,
having momentary dips in amplitude but no ongoing changes
in frequency, served as a more potent masker whose effect
was greater than that of speech. With same-gender competitors
it is highly likely that speech would have produced
masking at least as great as, if not greater than, the speech-noise
competitor (e.g., Brungart et al., 2001). Finally, the differences
in masking amounts for the child groups may be accounted
for by the fact that, for reasons that are not entirely
clear, but probably due to random variation within the population,
SRTs in the quiet conditions were lower in the two
speech-noise groups than in the groups tested with the
speech competitors.
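The "momentary dips in amplitude" that characterize a modulated noise masker can be sketched with a few lines of Python. This is a toy stand-in only: a raised-cosine envelope on Gaussian noise, whereas the study's maskers were speech-shaped noise modulated by a real speech envelope. The sketch counts the proportion of low-amplitude samples, i.e., the epochs a listener could exploit.

```python
import math
import random

def masker(n, mod_hz=None, fs=8000.0, seed=0):
    """Toy masker: Gaussian noise, optionally under a slow raised-cosine
    envelope (an illustrative assumption, not the actual stimuli)."""
    rng = random.Random(seed)
    out = []
    for i in range(n):
        env = 1.0
        if mod_hz is not None:
            env = 0.5 * (1.0 - math.cos(2.0 * math.pi * mod_hz * i / fs))
        out.append(env * rng.gauss(0.0, 1.0))
    return out

def gap_fraction(signal, floor=0.1):
    """Fraction of samples whose magnitude falls below a 'gap' floor."""
    return sum(abs(s) < floor for s in signal) / len(signal)

steady = masker(8000, seed=1)
modulated = masker(8000, mod_hz=4.0, seed=1)
# Envelope dips create low-energy epochs absent from steady noise:
print(gap_fraction(modulated) > gap_fraction(steady))  # True
```

The point of the demo is only that an amplitude-modulated masker, unlike a steady one, contains brief low-energy epochs, even though it carries no ongoing frequency changes that could serve as a segregation cue.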
C. Number of competitors
For both children and adults, masking was significantly
greater for two competitors than for one, and the
interactions of number with location (front versus right) were
not significant. Averaged over all competitor types and numbers,
the addition of a second competing sound resulted in
increased masking of 4.7 dB for children and 4.8 dB for adults.
Two interpretations can be considered here. First, in the presence
of competitors with envelope modulations such as those
used here, listeners may be better able to take advantage of
the modulations and "listen in the gaps" in the presence of a
single competitor. As a second competitor is added, the signal
contains fewer gaps, thereby decreasing the opportunities for
"gap listening" (e.g., Festen and Plomp, 1990; Hawley
et al., 2004). Second, consider the possible role of "informational"
masking. In recent years this term has been used
extensively in the auditory literature to explain masking phenomena
that cannot be attributed solely to peripheral auditory
mechanisms (e.g., Neff and Green, 1987; Lutfi, 1990;
Kidd et al., 2003). In the speech intelligibility literature, one
of the conditions under which informational masking has
been thought to occur is when the addition of a second
masker elevates thresholds by more than the 3 dB expected
simply from the added energy of the second
masker (e.g., Brungart et al., 2001; Hawley et al., 2004;
Durlach et al., 2003). This threshold elevation may result
from the increased complexity of the listening environment,
possibly due to uncertainty on the part of the listener as to
which aspects of the stimulus to ignore and which to
attend to. Although difficult to evaluate numerically,
this component of masking may have been present here to
some extent, and more direct tests of the effect in children
would be important to pursue in future studies.
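The 3-dB benchmark invoked above is simply the level increase from doubling masker energy. A short sketch makes the decomposition explicit; the 4.7-dB figure echoes the children's average reported above, while reading the remainder as an "excess" (potentially informational) component is the same hedged interpretation given in the text.

```python
import math

def energetic_prediction_db(n_maskers):
    """Threshold elevation expected if n equal-level, independent
    maskers simply sum in energy, relative to a single masker."""
    return 10.0 * math.log10(n_maskers)

def excess_masking_db(observed_elevation_db, n_maskers=2):
    """Elevation beyond the energetic prediction; when positive, it is
    often attributed, at least in part, to informational masking."""
    return observed_elevation_db - energetic_prediction_db(n_maskers)

print(round(energetic_prediction_db(2), 2))  # 3.01 dB from a second masker
print(round(excess_masking_db(4.7), 2))      # 1.69 dB beyond the energetic prediction
```

As the text notes, this numerical split is only suggestive: the excess could equally reflect the loss of gap-listening opportunities rather than informational masking per se.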
D. Spatial release from masking
Measures of spatial release from masking did not differ
statistically across age groups, nor were there effects of
competitor type and number. The only exception was the lack of
spatial release from masking in the one-speech condition in
adults, compared with 5.7 dB in children. The adult data
differ from other free-field studies in adults, in which spatial
release from masking for speech was reported to be at least 3
dB for a single competing talker and as high as 12 dB for
multiple talkers (Bronkhorst, 2000; Hawley et al., 2004).
The lack of release from masking found here with the one-speech
competitor is likely due to the nature of the task and
stimuli; the use of a fairly easy 4AFC task in combination
with different-gender talkers for the target and competitor
most likely created a relatively simple listening situation for
adults.
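The spatial release measure itself is just a difference of SRTs between the co-located and separated configurations. A minimal sketch with invented numbers, echoing the pattern described above rather than the study's raw data:

```python
def spatial_release_db(srt_front_db, srt_separated_db):
    """Spatial release from masking (dB): improvement in the speech
    reception threshold when competitors move from the front (0 deg)
    to the side (90 deg). Positive values indicate a benefit."""
    return srt_front_db - srt_separated_db

# Invented illustrative values for the one-speech condition:
# children show release, adults do not.
child_srm = spatial_release_db(srt_front_db=30.0, srt_separated_db=24.3)
adult_srm = spatial_release_db(srt_front_db=22.0, srt_separated_db=22.0)
print(round(child_srm, 1), round(adult_srm, 1))  # 5.7 0.0
```

Note that a near-zero release can arise either from an inability to use spatial cues or, as argued here for adults, from a front condition that was already easy, leaving little masking to release.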
Spatial cues are thought to be especially useful in challenging
conditions, when nonspatial cues are difficult to access
(Peissig and Kollmeier, 1997; Bronkhorst, 2000;
Durlach et al., 2003; Freyman et al., 2004). In the adult
group tested here, spatial cues were beneficial in the conditions
that created greater amounts of front masking (two-speech,
one-speech-noise, and two-speech-noise). The lack of
a location effect in the one-speech condition is likely due to
the general ease of listening to spondees when the competitor
consists of a single, different-gender talker. In that condition,
spatial cues did not help to reduce masking in the right condition,
since masking was already relatively small. In contrast
with adults, for children the one-speech
front condition did present a challenging situation, probably
because children are less able to take advantage of the
different-gender competitor to hear the target speech. Thus,
spatial cues were indeed relevant to the children and
produced a robust improvement in the right condition compared
with the front. These findings suggest that, while more complex
tasks, using sentence materials and/or same-gender stimuli,
may be more appropriate for measuring spatial release
from masking in adults, the task used here is a
good tool for measuring the ability of young children to
negotiate complex auditory environments.
The finding that, overall, spatial release from masking in
children is similar to that in adults is consistent with work
showing that preschool-age children perform similarly to
adults on measures of binaural masking level differences
(Nozza et al., 1988; Moore et al., 1991) and minimum audible
angle (Litovsky, 1997; for a review see Litovsky and
Ashmead, 1997). This finding implies that, for a simple
closed-set task, young children are able to utilize spatial
and/or head shadow cues to the same extent as adults in
order to segregate sounds in noisy environments. That is not
to say that children would be expected to perform similarly to
adults on all measures of speech intelligibility in noise.
Given recent findings that children exhibit poorer attentional
selectivity on auditory tasks (e.g., Oh et al., 2001) and reduced
unmasking for tone detection under dichotic conditions
(Wightman et al., 2003; Hall et al., 2004), the possibility
remains that age differences would be seen under more
demanding conditions, such as an open-set test or with same-gender
target and competitors. Those differences, however,
would not be attributable to age-dependent binaural abilities,
but rather to other central processes such as auditory attention.
E. Conclusions
Young children require higher signal levels than adults
to identify spondees in a simple 4AFC task, and these age-related
differences may be mediated by both peripheral and
central auditory processes. The fact that young children can
benefit from spatial separation of the target speech and competing
sources suggests that in a complex acoustic environment,
such as a noisy classroom, they might find it easier to
attain information if the source of interest is spatially segregated
from noise sources. The extent to which this
is true with real-world sounds may depend, however, on the duration,
complexity, and type of the sounds, and on the attentional
resources that various sounds may require. Finally, the test
used here (developed by Litovsky, 2003) is also designed
for use in pediatric clinical settings, where young children
are often fitted with hearing aids or cochlear implants with
little knowledge about the efficacy of the fittings in noisy
environments. This test may offer a way to evaluate the ability
of children with hearing aids and cochlear implants to
function in noisy environments, and may, for example, be
useful in assessing the extent to which children benefit
from bilateral fitting strategies (Litovsky et al., 2004).
ACKNOWLEDGMENTS
The author is grateful to Aarti Dalal and Gerald Ng for
assistance with programming and data collection, and to Patti
Johnstone and Shelly Godar for helping with data analysis.
The author is also grateful to Dr. Joseph Hall for initially
suggesting the use of spondees in a forced choice paradigm,
and to Dr. Adelbert Bronkhorst and an anonymous reviewer
for helpful suggestions during the review process. This work
was supported by NIDCD Grant Nos. DC00100 and
DC0055469, National Organization for Hearing Research,
and the Deafness Research Foundation. Portions of the data
were collected while R. Litovsky was at Boston University,
Hearing Research Center.
APPENDIX: LIST OF SPONDEE WORDS USED IN THE
PRESENT EXPERIMENT
Hotdog
Ice Cream
Birdnest
Cowboy
Dollhouse
Barnyard
Scarecrow
Railroad
Sidewalk
Rainbow
Cupcake
Birthday
Airplane
Eyebrow
Shoelace
Toothbrush
Hairbrush
Highchair
Necktie
Playground
Football
Baseball
Bluejay
Bathtub
Bedroom
1. The lower limit of 4.5 years is slightly conservative, and was based on pilot
testing which suggested that by that age all children were familiar with the
majority of the target words. The upper limit of 7.5 years is somewhat smaller
than the 10-year limit used in a number of other works (e.g., Oh et al.,
2001; Hall et al., 2002), but similar to that used in studies on auditory
attention in young children, in which there do not appear to be developmental
effects within this age range (e.g., Stellmack et al., 1997; Oh et al., 2001).
Allen, P., and Wightman, F. (1994). "Psychometric functions for children's
detection of tones in noise," J. Speech Hear. Res. 37, 205–215.
Allen, P., Wightman, F., Kistler, D., and Dolan, T. (1989). "Frequency resolution
in children," J. Speech Hear. Res. 32, 317–322.
American National Standards Institute (2002). "Standard for acoustical
characteristics of classrooms in the United States," ANSI S12.60.
Arbogast, T. L., Mason, C. R., and Kidd, G. (2002). "The effect of spatial
separation on informational and energetic masking of speech," J. Acoust.
Soc. Am. 112, 2086–2098.
Bronkhorst, A. (2000). "The cocktail party phenomenon: A review of research
on speech intelligibility in multiple-talker conditions," Acta Acust.
Acust. 86, 117–128.
Bronkhorst, A. W., and Plomp, R. (1992). "Effect of multiple speechlike
maskers on binaural speech recognition in normal and impaired hearing,"
J. Acoust. Soc. Am. 92, 3132–3139.
Brungart, D. S., Simpson, B. D., Ericson, M. A., and Scott, K. R. (2001).
"Informational and energetic masking effects in the perception of multiple
talkers," J. Acoust. Soc. Am. 110, 2527–2538.
Buss, E., Hall, J. W., III, Grose, J. H., and Dev, M. B. (1999). "Development
of adult-like performance in backward, simultaneous, and forward masking,"
J. Speech Lang. Hear. Res. 42, 844–849.
Crandell, C. C. (1993). "Speech recognition in noise by children with minimal
degrees of sensorineural hearing loss," Ear Hear. 14, 210–216.
Culling, J. F., Hawley, M. L., and Litovsky, R. Y. (2004). "The role of
head-induced interaural time and level differences in the speech reception
threshold for multiple interfering sound sources," J. Acoust. Soc. Am. 116,
1057–1065.
Dawson, P. W., Decker, J. A., and Psarros, C. E. (2004). "Optimizing dynamic
range in children using the Nucleus cochlear implant," Ear Hear. 25,
230–241.
Durlach, N. I., Mason, C. R., Shinn-Cunningham, B. G., Arbogast, T. L.,
Colburn, H. S., and Kidd, G., Jr. (2003). "Informational masking: counteracting
the effects of stimulus uncertainty by decreasing target-masker
similarity," J. Acoust. Soc. Am. 114, 368–379.
Eisenberg, L. S., Kirk, K. I., Martinez, A. S., Ying, E. A., and Miyamoto, R.
T. (2004). "Communication abilities of children with aided residual hearing:
comparison with cochlear implant users," Arch. Otolaryngol. Head
Neck Surg. 130, 563–569.
Festen, J. M., and Plomp, R. (1990). "Effects of fluctuating noise and interfering
speech on the speech-reception threshold for impaired and normal
hearing," J. Acoust. Soc. Am. 88, 1725–1736.
Fletcher, P., and MacWhinney, B. (1995). Handbook of Child Language
(Blackwell, Oxford, UK).
Freyman, R. L., Balakrishnan, U., and Helfer, K. S. (2001). "Spatial release
from informational masking in speech recognition," J. Acoust. Soc. Am.
109, 2112–2122.
Freyman, R. L., Balakrishnan, U., and Helfer, K. S. (2004). "Effect of
number of masking talkers and auditory priming on informational masking
in speech recognition," J. Acoust. Soc. Am. 115, 2246–2256.
Hall, J. W., Buss, E., Grose, J. H., and Dev, M. B. (2004). "Developmental
effects in the masking-level difference," J. Speech Lang. Hear. Res. 47,
13–20.
Hall, J. W., III, and Grose, J. H. (1991). "Notched-noise measures of frequency
selectivity in adults and children using fixed-masker-level and
fixed-signal-level presentation," J. Speech Hear. Res. 34, 651–660.
Hall, J. W., III, Grose, J. H., and Dev, M. B. (1997). "Auditory development
in complex tasks of comodulation masking release," J. Speech Lang. Hear.
Res. 40, 946–954.
Hall, J. W., III, Grose, J. H., Buss, E., and Dev, M. B. (2002). "Spondee
recognition in a two-talker masker and a speech-shaped noise masker in
adults and children," Ear Hear. 23, 159–165.
Hall, J. W., III, Grose, J. H., Buss, E., Dev, M. B., Drake, A. F., and Pillsbury,
H. C. (2003). "The effect of otitis media with effusion on perceptual
masking," Arch. Otolaryngol. Head Neck Surg. 129, 1056–1062.
Hartley, D. E., Wright, B. A., Hogan, S. C., and Moore, D. R. (2000).
"Age-related improvements in auditory backward and simultaneous masking
in 6- to 10-year-old children," J. Speech Lang. Hear. Res. 43, 1402–1415.
Hawley, M. L., Litovsky, R. Y., and Colburn, H. S. (1999). "Speech intelligibility
and localization in complex environments," J. Acoust. Soc. Am.
105, 3436–3448.
Hawley, M. L., Litovsky, R. Y., and Culling, J. F. (2004). "The benefit of
binaural hearing in a cocktail party: Effect of location and type of interferer,"
J. Acoust. Soc. Am. 115, 833–843.
Johnstone, P., and Litovsky, R. Y. (2005). "Speech intelligibility and spatial
release from masking in children and adults for various types of interfering
sounds," J. Acoust. Soc. Am. (in press).
Kidd, G., Jr., Mason, C. R., and Richards, V. M. (2003). "Multiple bursts,
multiple looks, and stream coherence in the release from informational
masking," J. Acoust. Soc. Am. 114, 2835–2845.
Koehnke, J., and Besing, J. M. (1996). "A procedure for testing speech
intelligibility in a virtual listening environment," Ear Hear. 17, 211–217.
Knecht, H. A., Nelson, P. B., Whitelaw, G. M., and Feth, L. L. (2002).
"Background noise levels and reverberation times in unoccupied classrooms:
predictions and measurements," Am. J. Audiol. 11, 65–71.
Litovsky, R. (1997). "Developmental changes in the precedence effect: Estimates
of minimal audible angle," J. Acoust. Soc. Am. 102, 1739–1745.
Litovsky, R. (2003). "Method and system for rapid and reliable testing of
speech intelligibility in children," U.S. Patent No. 6,584,440.
Litovsky, R., and Ashmead, D. (1997). "Developmental aspects of binaural
and spatial hearing," in Binaural and Spatial Hearing, edited by R. H.
Gilkey and T. R. Anderson (Erlbaum, Hillsdale, NJ), pp. 571–592.
Litovsky, R. Y., Fligor, B., and Tramo, M. (2002). "Functional role of the
human inferior colliculus in binaural hearing," Hear. Res. 165, 177–188.
Litovsky, R. Y., Parkinson, A., Arcaroli, J., Peters, R., Lake, J., Johnstone, P.,
and Yu, G. (2004). "Bilateral cochlear implants in adults and children,"
Arch. Otolaryngol. Head Neck Surg. 130, 648–655.
Lutfi, R. A. (1990). "How much masking is informational masking?" J.
Acoust. Soc. Am. 88, 2607–2610.
Lutfi, R. A., Kistler, D. J., Oh, E. L., Wightman, F. L., and Callahan, M. R.
(2003). "One factor underlies individual differences in auditory informational
masking within and across age groups," Percept. Psychophys. 65,
396–406.
Moore, D. R., Hutchings, M., and Meyer, S. (1991). "Binaural masking
level differences in children with a history of otitis media," Audiology 30,
91–101.
Moore, D. R., Hartley, D. E., and Hogan, S. C. (2003). "Effects of otitis
media with effusion (OME) on central auditory function," Int. J. Pediatr.
Otorhinolaryngol. 67, S63–S67.
Neff, D. L., and Green, D. M. (1987). "Masking produced by spectral uncertainty
with multicomponent maskers," Percept. Psychophys. 41, 409–415.
Nilsson, M., Soli, S. D., and Sullivan, J. A. (1994). "Development of the
Hearing in Noise Test for the measurement of speech reception thresholds
in quiet and in noise," J. Acoust. Soc. Am. 95, 1085–1099.
Nozza, R. J., Wagner, E. F., and Crandell, M. A. (1988). "Binaural release
from masking for a speech sound in infants, preschoolers, and adults," J.
Speech Hear. Res. 31, 212–218.
Oh, E. L., Wightman, F., and Lutfi, R. A. (2001). "Children's detection of
pure-tone signals with random multiple maskers," J. Acoust. Soc. Am.
109, 2888–2895.
Papso, C. F., and Blood, I. M. (1989). "Word recognition skills of children
and adults in background noise," Ear Hear. 10, 337–338.
Peissig, J., and Kollmeier, B. (1997). "Directivity of binaural noise reduction
in spatial multiple noise-source arrangements for normal and impaired
listeners," J. Acoust. Soc. Am. 101, 1660–1670.
Plomp, R., and Mimpen, A. M. (1981). "Effect of the orientation of the
speaker's head and the azimuth of a noise source on the speech reception
threshold for sentences," Acustica 48, 325–328.
Roberts, J., Hunter, L., Gravel, J., Rosenfeld, R., Berman, S., Haggard, M.,
Hall, J., III, Lannon, C., Moore, D., Vernon-Feagans, L., and Wallace, I.
(2004). "Otitis media, hearing loss, and language learning: controversies
and current research," J. Dev. Behav. Pediatr. 25, 110–122.
Rothauser, E. H., Chapman, W. D., Guttman, N., Nordby, K. S., Silbiger, H.
R., Urbanek, G. E., and Weinstock, M. (1969). "IEEE recommended
practice for speech quality measurements," IEEE Trans. Audio Electroacoust.
17, 227–246.
Shinn-Cunningham, B. G., Schickler, J., Kopco, N., and Litovsky, R. (2001).
"Spatial unmasking of nearby speech sources in a simulated anechoic
environment," J. Acoust. Soc. Am. 110, 1118–1129.
Stellmack, M. A., Willihnganz, M. S., Wightman, F. L., and Lutfi, R. A.
(1997). "Spectral weights in level discrimination by preschool children:
analytic listening conditions," J. Acoust. Soc. Am. 101, 2811–2821.
Uitenbroek, D. G. (1997). "SISA Binomial," Southampton: D. G. Uitenbroek.
Retrieved 1 January 2004 from the World Wide Web: http://
home.clara.net/sisa/binomial.htm.
Veloso, K., Hall, J. W., III, and Grose, J. H. (1990). "Frequency selectivity
and comodulation masking release in adults and in 6-year-old children," J.
Speech Hear. Res. 33, 96–102.
Wichmann, F. A., and Hill, N. J. (2001a). "The psychometric function: I.
Fitting, sampling, and goodness of fit," Percept. Psychophys. 63, 1293–1313.
Wichmann, F. A., and Hill, N. J. (2001b). "The psychometric function: II.
Bootstrap-based confidence intervals and sampling," Percept. Psychophys.
63, 1314–1329.
Wightman, F. L., Callahan, M. R., Lutfi, R. A., Kistler, D. J., and Oh, E.
(2003). "Children's detection of pure-tone signals: informational masking
with contralateral maskers," J. Acoust. Soc. Am. 113, 3297–3305.
Yacullo, W. S., and Hawkins, D. B. (1987). "Speech recognition in noise
and reverberation by school-age children," Audiology 26, 235–246.
3099J. Acoust. Soc. Am., Vol. 117, No. 5, May 2005 Ruth Y. Litovsky: Speech intelligibility in young children
... Even with normal hearing, young children have greater difficulty understanding speech in noisy environments than adults (Goldsworthy & Markle, 2019;Litovsky, 2005). Speech understanding in noise follows a prolonged developmental process, and this process is even longer when the masker is competing speech (Brown et al., 2010;. ...
Article
Purpose This study aimed to conduct a scoping review of research exploring the effects of slight hearing loss on auditory and speech perception in children. Method A comprehensive search conducted in August 2023 identified a total of 402 potential articles sourced from eight prominent bibliographic databases. These articles were subjected to rigorous evaluation for inclusion criteria, specifically focusing on their reporting of speech or auditory perception using psychoacoustic tasks. The selected studies exclusively examined school-age children, encompassing those between 5 and 18 years of age. Following rigorous evaluation, 10 articles meeting these criteria were selected for inclusion in the review. Results The analysis of included articles consistently shows that even slight hearing loss in school-age children significantly affects their speech and auditory perception. Notably, most of the included articles highlighted a common trend, demonstrating that perceptual deficits originating due to slight hearing loss in children are particularly observable under challenging experimental conditions and/or in cognitively demanding listening tasks. Recent evidence further underscores that the negative impacts of slight hearing loss in school-age children cannot be solely predicted by their pure-tone thresholds alone. However, there is limited evidence concerning the effect of slight hearing loss on the segregation of competing speech, which may be a better representation of listening in the classroom. Conclusion This scoping review discusses the perceptual consequences of slight hearing loss in school-age children and provides insights into an array of methodological issues associated with studying perceptual skills in school-age children with slight hearing losses, offering guidance for future research endeavors.
... 객관적 공간청취검 사가 어려운 경우 신뢰도와 타당도가 검증된 자가보고 설 문 도구를 통해 난청인의 공간청취 어려움을 주관적으로 평 가할 수 있다 ( Jung et al., 2023b;Noble et al., 2008;Perreau et al., 2014;Zhang et al., 2015). 다수의 선행 연 구에서 위치분리가 이루어지지 않은(spatially co-located) 조건에 비해 위치분리된 조건에서 얼마나 소음하 어음인지 가 향상하는지를 통해 위치분리로 인한 차폐감소(spatial release from masking, SRM) 혹은 위치분리혜택(spatial separation benefit) 정도를 도출하였다 (Hawley et al., 2004;King et al., 2020;Litovsky, 2005;Yost, 2017). SRM 이 클수록 더 나은 공간청취능력을 가졌다고 간주하며, 건 청인의 경우 음향차폐(energetic masker) 조건보다 의미차 폐(informational masker) 조건에서 더 큰 SRM을 보였다 (Arbogast et al., 2002;Rothpletz et al., 2012). ...
Article
Purpose: The ability to benefit from spatial separation between target and masker signals is important in multi-sound source listening environments. The goal of this study was to measure the spatial release from masking (SRM) in unilateral cochlear implant (CI) users with bilateral profound hearing loss. We also determined the relationships between the SRMs and the self-reported spatial hearing abilities.Methods: Fourteen unilateral CI users with bilateral profound hearing loss participated in this study. The target sentence was always presented to the front of the listener, and the nonfluctuating speech-shaped noise (SSN) or fluctuating speech noise was either co-located with the target (speech at 0°, noise at 0°, S0N0) or spatially separated at ± 90°. The SRM was quantified as the difference between speech recognition thresholds (SRTs) in the co-located and spatially separated conditions. The self-reported spatial hearing abilities were also measured using validated subjective questionnaires.Results: Overall, the SRTs were lower (better) with SSN than with fluctuating speech noise. When the noise was presented to the non-CI ear (speech at 0°, noise at non-CI ear, S0Nnonci), speech-in-noise recognition was the greatest due to head shadow or better-ear listening effect, resulting in the SRMs of approximately 5~6 dB regardless of noise type. When the noise was given to the CI ear (speech at 0°, noise at CI ear, S0Nci), some individuals exhibited positive SRMs (3~8 dB), while others showed negative SRMs, leading to little SRMs overall. When the SSN was given, subjects with less SRMs (less spatial separation benefits on the objective test) reported greater subjective spatial hearing difficulties.Conclusion: The spatial hearing of unilateral CI users varied by the position of the sound source. Listeners' spatial hearing abilities, which are unpredictable from clinical routine tests, need to be assessed by either objective or subjective measures.
... À 6 ans, lorsque l'enfant entre à l'école primaire, son système auditif est déjà hautement fonctionnel (Werner 2007). Néanmoins, pour certaines capacités de traitement auditif central, il atteindra sa maturation bien après l'adolescence (Litovsky 2005). L. A. Werner a distingué trois étapes dans le développement auditif central des enfants (2007 : 276-280) : -la première concerne la maturation du codage du son qui se poursuit jusqu'à environ 6 mois après la naissance. ...
Article
Écouter un orateur dans des conditions acoustiques défavorables reste un défi pour l’enfant apprenant. À l’école, les élèves sont exposés à différents bruits, dont le niveau de pression sonore peut vite se révéler critique. Ils peuvent aussi écouter un enseignant porteur d’un trouble vocal. Cet article compile les résultats de quatre années de recherche menées dans l’Unité logopédie de la voix à l’Université de Liège. Isabel Schiller a dévolu son doctorat à l’étude des effets isolés et combinés du bruit ambiant et de la qualité vocale du locuteur sur le traitement du langage oral des enfants d’environ 6 ans. L’objectif était d’explorer la manière dont ils perçoivent et comprennent la parole dans le bruit, lorsqu’elle est transmise par un locuteur à la voix dégradée. Le bruit en classe et la voix altérée d’un locuteur réduisent la performance des enfants en classe et augmentent leur effort d’écoute.
... Although there have been laboratory-based attempts at designing these types of tests (e.g. CRISP (Litovsky, 2005)), none have been sufficiently practical to administer in a clinical setting (see Schafer, 2010 for a review). Other tests that probe different aspects of speech understanding besides perception, such as reading, vocabulary, or language tests may be more appropriate (Archbold et al., 2008;Baldassari et al., 2009;Johnson & Goswami, 2010). ...
Article
Full-text available
Objectives: The purpose of this study was to determine the prevalence of ceiling effects for commonly used speech perception tests in a large population of children who received a cochlear implant (CI) before the age of four. A secondary goal was to determine the demographic factors that were relevant for predicting which children were more likely to reach ceiling level performance. We hypothesize that ceiling effects are highly prevalent for most tests. Design: Retrospective chart review of children receiving a CI between 2002 and 2014. Results: 165 children were included. Median scores were above ceiling levels (≥90% correct) for the majority of speech perception tests and all distributions of scores were highly skewed. Children who were implanted earlier, received two implants, and were oral communicators were more likely to reach ceiling-level performance. Age and years of CI listening experience at time of test were negatively correlated with performance, suggesting a non-random assignment of tests. Many children were re-tested on tests for which they had already scored at ceiling. Conclusions: Commonly used speech perception tests for children with CIs are prone to ceiling effects and may not accurately reflect how a child performs in everyday listening situations.
... The loudspeaker-based sound reproduction allows for seamless integration of participant movement and head rotations with high fidelity, as it removes the need for measured individualized head-related transfer functions (HRTFs) and head tracking necessary for real-time headphone-based auralization (Seeber et al., 2010). Headphone-free testing is also useful for hearing research with young children (Litovsky, 2005;McCartney, 2013) and for hearing aid or cochlear implant users who cannot wear headphones (Kerber and Seeber, 2013b). With free-field audiometry, i.e., realizing audiometric measurements via loudspeakers in the free field instead of with headphones, the benefit of hearing aids can be measured directly on the listener (Shulberg, 1980). ...
Article
The use of virtual acoustic environments has become a key element in psychoacoustic and audiologic research, as loudspeaker-based reproduction offers many advantages over headphones. However, sound field synthesis methods have mostly been evaluated numerically or perceptually in the center, yielding little insight into the achievable accuracy of the reproduced sound field over a wider reproduction area with loudspeakers in a physical, laboratory-standard reproduction setup. Deviations from the ideal free-field and point-source concepts, such as non-ideal frequency response, non-omnidirectional directivity, acoustic reflections, and diffraction on the necessary hardware, impact the generated sound field. We evaluate reproduction accuracy in a 61-loudspeaker setup, the Simulated Open Field Environment, installed in an anechoic chamber. A first measurement following the ISO 8253-2:2009 standard for free-field audiology shows that the required accuracy is reached with critical-band-wide noise. A second measurement characterizes the sound pressure reproduced with the higher-order Ambisonics basic decoder, with and without max rE weighting, vector base amplitude panning, and nearest loudspeaker mapping on a 187 cm × 187 cm reproduction area. We show that the sweet-spot size observed in measured sound fields follows the rule kr≤N/2 rather than kr≤N but is still large enough to avoid compromising psychoacoustic experiments.
... For the relatively small ITDs we focussed on in this study, we observed significant effects of envelope width (or, equivalently, modulation rate) and pulse rate only on PT ITD sensitivity, but not on ENV ITD sensitivity. The effects on PT ITD sensitivity (greater sensitivity at 900 than 4500 pps and for more rapidly, rather than slowly, rising envelopes) are entirely in line with expectations from previously published studies (van Hoesel and Tyler 2003;Litovsky 2005;van Hoesel et al. 2009;Buck et al. 2023). ENV ITD sensitivity was uniformly low and similar for all envelope widths, envelope rates, and pulse rates tested. ...
Preprint
Cochlear implants (CIs) have restored enough of a sense of hearing to around one million severely hearing impaired patients to enable speech understanding in quiet. However, several aspects of hearing with CIs remain very poor. This includes a severely limited ability of CI patients to make use of interaural time difference (ITD) cues for spatial hearing and noise reduction. A major cause for this poor ITD sensitivity could be that current clinical devices fail to deliver ITD information in a manner that is accessible to the auditory pathway. CI processors measure the envelopes of incoming sounds and then stimulate the auditory nerve with electrical pulse trains which are amplitude modulated to reflect incoming sound envelopes. The timing of the pulses generated by the devices is largely or entirely independent of the incoming sounds. Consequently, bilateral CIs (biCIs) provide veridical envelope (ENV) ITDs but largely or entirely replace the “fine structure” ITDs that naturally occur in sounds with completely arbitrary electrical pulse timing (PT) ITDs. To assess the extent to which this matters, we devised experiments that measured the sensitivity of deafened rats to precisely and independently controlled PT and ENV ITDs for a variety of different CI pulse rates and envelope shapes. We observed that PT ITDs completely dominate ITD perception, while the sensitivity to ENV ITDs was almost negligible in comparison. This strongly suggests that the confusing yet powerful PT ITDs that contemporary clinical devices deliver to biCI patients may be a major cause of poor binaural hearing outcomes with biCIs. Significance Statement CIs deliver spectro-temporal envelopes, including speech formants, to severely deaf patients, but they do little to cater to the brain’s ability to process temporal sound features with sub-millisecond precision. 
CIs “sample” sound envelope signals rapidly and accurately, and thus provide information which should make it possible in principle for CI listeners to detect envelope ITDs in a similar way to normal listeners. However, here we demonstrate through behavioral experiments on CI implanted rats trained to detect sub-millisecond ITDs that pulse timing ITDs completely dominate binaural hearing. This provides the strongest confirmation to date that the arbitrary pulse timing widely used in current clinical CIs is a critical obstacle to good binaural hearing through prosthetic devices.
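The processing scheme this abstract describes (envelope sampling, with pulse timing set by the device clock rather than the sound) can be illustrated in a few lines. The sketch below is a minimal single-channel toy, not an actual clinical coding strategy; the rectify-and-smooth envelope extraction, the 50 Hz envelope cutoff, and the 900 pps rate are simplifying assumptions chosen only to show how fine-structure timing is discarded:

```python
import numpy as np

def ci_channel_pulses(signal, fs, pulse_rate=900.0, env_cutoff=50.0):
    """Toy CI-style channel: extract the slowly varying envelope of a band
    signal, then amplitude-modulate a fixed-rate pulse train with it.
    Pulse times come from the device clock, so any fine-structure timing
    in `signal` never reaches the output."""
    # Crude, dependency-free envelope: rectify, then moving-average low-pass
    rectified = np.abs(signal)
    win = max(1, int(fs / env_cutoff))
    env = np.convolve(rectified, np.ones(win) / win, mode="same")
    # Pulse instants are fixed by the clock, independent of the waveform
    n = len(signal)
    step = int(fs / pulse_rate)
    pulse_idx = np.arange(0, n, step)
    pulses = np.zeros(n)
    pulses[pulse_idx] = env[pulse_idx]  # amplitude follows the envelope only
    return pulses

fs = 16000
t = np.arange(fs) / fs
# 1 kHz carrier with a 4 Hz amplitude modulation
tone = np.sin(2 * np.pi * 1000 * t) * (0.5 + 0.5 * np.sin(2 * np.pi * 4 * t))
out = ci_channel_pulses(tone, fs)
```

Shifting the carrier's fine structure in `tone` leaves `out`'s pulse times unchanged, which is the point the abstract makes about PT versus ENV ITDs.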
... If the use of hearing assistive devices such as hearing aids or cochlear implants alone does not improve a hearing-impaired person's ability to communicate in noise, auditory training can be attempted (Gohari et al., 2023; Nanjundaswamy et al., 2018) (Ching et al., 2011; Gorodensky et al., 2019; Johnstone & Litovsky, 2006; Litovsky, 2005; Misurelli & Litovsky, 2012; Misurelli & Litovsky, 2015). Although the training designs and protocols of previous studies that implemented spatial auditory training differed (Cameron & Dillon, 2011; Coudert et al., 2023; Jarollahi et al., 2019; Lotfi et al., 2016; Tyler et al., 2010; Valzolgher et al., 2023) ...
Article
Purpose: This study aimed to determine whether auditory spatial training with real-life environmental noise would improve the speech-in-noise intelligibility of hearing-impaired children. Methods: Thirteen children with hearing loss participated in this study. We conducted an 8-week in-laboratory auditory spatial training. During the training, the target sentence and pre-recorded real-life environmental noise were spatially separated by 90°, and uncertainty about the location of the target and noise was given. To evaluate the efficacy of the training, sentence recognition with fluctuating and non-fluctuating noises was measured in a free sound-field condition, where the speech and noise sources were spatially separated and also co-located. The pre-training tests of sentence-in-noise recognition were performed twice with an interval of 6 weeks. The sentence-in-noise recognition test was also measured right after the 8-week training (post-training test) as well as 1 month after the completion of the training (retention test). In addition to the objective tests, the parents completed a subjective questionnaire on auditory behavior in everyday life before and after training. Results: There were no significant differences between the results of the two pre-training tests. The auditory spatial training significantly enhanced sentence-in-noise recognition in both spatially separated and co-located conditions at all signal-to-noise ratios, and the training efficacy was maintained until 1 month after the completion of the training. The parental subjective responses also showed positive changes after the training. Conclusion: An 8-week auditory spatial training could effectively enhance the speech-in-noise intelligibility of hearing-impaired children in spatialized as well as non-spatialized conditions.
... Binaural hearing promotes speech understanding in noise, enabling these cues to segregate target speech from background noise [8][9][10][11][12]. Studies on the maturation of binaural hearing are needed as the knowledge of typical development would help diagnose such deficits during a child's formative years, which can directly affect academic performance [13]. ...
Article
Objectives: Binaural hearing is the interplay of acoustic cues (interaural time differences: ITD, interaural level differences: ILD, and spectral cues) and cognitive abilities (e.g., working memory, attention). The current study investigated the effect of developmental age on auditory binaural resolution and working memory and the association between them (if any) in school-going children. Methods: Fifty-seven normal-hearing school-going children aged 6-15 y were recruited for the study. The participants were divided into three groups: Group 1 (n = 17; Mage = 7.1 ± 0.72 y), Group 2 (n = 23; Mage = 10.2 ± 0.8 y), and Group 3 (n = 17; Mage = 14.1 ± 1.3 y). Group 4, with normal-hearing young adults (n = 20; Mage = 21.1 ± 3.2 y), was included for comparing the maturational changes in the former groups with adult values. Tests of binaural resolution (ITD and ILD thresholds) and auditory working memory (forward and backward digit span and 2n-back digit) were administered to all the participants. Results: Results indicated a main effect of age on spatial resolution and working memory, with the medians of the lower age groups (Group 1 & Group 2) being significantly poorer (p < 0.01) than those of the higher age groups (Group 3 & Group 4). Groups 2, 3, and 4 performed significantly better than Group 1 (p < 0.001) on the forward span and ILD tasks. Groups 3 and 4 had significantly better ITD (p = 0.04), backward span (p = 0.02), and 2n-back scores than Group 2. A significant correlation between scores on working memory tasks and spatial resolution thresholds was also found. On discriminant function analysis, backward span and ITD emerged as sensitive measures for segregating the older groups (Group 3 & Group 4) from the younger groups (Group 1 & Group 2). Conclusions: The present study showed that ILD thresholds and forward digit span mature by nine years. However, the backward digit span score continued to mature beyond 15 y. This finding can be attributed to the influence of auditory attention (a working memory process) on binaural resolution, which is reported to mature until late adolescence.
Article
This study examined the role of visual speech in providing release from perceptual masking in children by comparing visual speech benefit across conditions with and without a spatial separation cue. Auditory-only and audiovisual speech recognition thresholds in a two-talker speech masker were obtained from 21 children with typical hearing (7–9 years of age) using a color–number identification task. The target was presented from a loudspeaker at 0° azimuth. Masker source location varied across conditions. In the spatially collocated condition, the masker was also presented from the loudspeaker at 0° azimuth. In the spatially separated condition, the masker was presented from the loudspeaker at 0° azimuth and a loudspeaker at –90° azimuth, with the signal from the –90° loudspeaker leading the signal from the 0° loudspeaker by 4 ms. The visual stimulus (static image or video of the target talker) was presented at 0° azimuth. Children achieved better thresholds when the spatial cue was provided and when the visual cue was provided. Visual and spatial cue benefit did not differ significantly depending on the presence of the other cue. Additional studies are needed to characterize how children's preferential use of visual and spatial cues varies depending on the strength of each cue.
Article
Full-text available
The psychometric function relates an observer’s performance to an independent variable, usually some physical quantity of a stimulus in a psychophysical task. This paper, together with its companion paper (Wichmann & Hill, 2001), describes an integrated approach to (1) fitting psychometric functions, (2) assessing the goodness of fit, and (3) providing confidence intervals for the function’s parameters and other estimates derived from them, for the purposes of hypothesis testing. The present paper deals with the first two topics, describing a constrained maximum-likelihood method of parameter estimation and developing several goodness-of-fit tests. Using Monte Carlo simulations, we deal with two specific difficulties that arise when fitting functions to psychophysical data. First, we note that human observers are prone to stimulus-independent errors (or lapses). We show that failure to account for this can lead to serious biases in estimates of the psychometric function’s parameters and illustrate how the problem may be overcome. Second, we note that psychophysical data sets are usually rather small by the standards required by most of the commonly applied statistical tests. We demonstrate the potential errors of applying traditional χ² methods to psychophysical data and advocate use of Monte Carlo resampling techniques that do not rely on asymptotic theory. We have made available the software to implement our methods.
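The constrained maximum-likelihood approach this abstract describes can be sketched compactly. The example below is a toy illustration under assumed specifics (a logistic two-alternative forced-choice function with a 0.5 guess rate, a lapse rate constrained to [0, 0.06], and made-up data), not the authors' published software:

```python
import numpy as np
from scipy.optimize import minimize

def psych(x, alpha, beta, lam, gamma=0.5):
    """2AFC psychometric function: guess rate gamma, lapse rate lam,
    threshold alpha, slope beta."""
    return gamma + (1.0 - gamma - lam) / (1.0 + np.exp(-beta * (x - alpha)))

def neg_log_lik(params, x, n_correct, n_trials):
    """Binomial negative log-likelihood of the observed counts."""
    alpha, beta, lam = params
    p = np.clip(psych(x, alpha, beta, lam), 1e-9, 1.0 - 1e-9)
    return -np.sum(n_correct * np.log(p) + (n_trials - n_correct) * np.log(1.0 - p))

# Made-up data: stimulus levels, trials per level, correct responses
x = np.array([-8.0, -4.0, 0.0, 4.0, 8.0])
n_trials = np.full(5, 40)
n_correct = np.array([21, 26, 33, 38, 39])

# Constrained ML fit: bounding the lapse rate is the key step the
# paper argues for, preventing lapses from biasing alpha and beta
res = minimize(neg_log_lik, x0=[0.0, 0.5, 0.02],
               args=(x, n_correct, n_trials),
               bounds=[(-10, 10), (0.01, 5), (0.0, 0.06)],
               method="L-BFGS-B")
alpha_hat, beta_hat, lam_hat = res.x
```

Refitting with the lapse parameter pinned to zero and comparing the estimates is a quick way to see the bias the paper warns about.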
Article
This study investigated the development of auditory frequency and temporal resolution using simultaneous and backward masking of a tone by a noise. The participants were 6- to 10-year-old children and adults. On the measure of frequency resolution (the difference in the detection threshold for a tone presented either in a bandpass noise or in a spectrally notched noise), 6-year-old children performed as well as adults. However, for the backward masking task, 6-year-olds had, on average, 34 dB higher thresholds than adults. A negative exponential decay function fitted to the backward masking data for subjects of all ages indicated that adult-like temporal resolution may not be reached until about 11 years of age. These results show that, measured by masking, frequency resolution has reached adult-like performance by 6 years of age, whereas temporal resolution develops beyond 10 years of age. Six-year-old children were also assessed with tests of cognitive ability. Improvements in both frequency and temporal resolution were found with increasing IQ score.
Article
Three experiments investigated factors that influence the creation of and release from informational masking in speech recognition. The target stimuli were nonsense sentences spoken by a female talker. In experiment 1 the masker was a mixture of three, four, six, or ten female talkers, all reciting similar nonsense sentences. Listeners' recognition performance was measured with both target and masker presented from a front loudspeaker (F-F) or with a masker presented from two loudspeakers, with the right leading the front by 4 ms (F-RF). In the latter condition the target and masker appear to be from different locations. This aids recognition performance for one- and two-talker maskers, but not for noise. As the number of masking talkers increased to ten, the improvement in the F-RF condition diminished, but did not disappear. The second experiment investigated whether hearing a preview (prime) of the target sentence before it was presented in masking improved recognition for the last key word, which was not included in the prime. Marked improvements occurred only for the F-F condition with two-talker masking, not for continuous noise or F-RF two-talker masking. The third experiment found that the benefit of priming in the F-F condition was maintained if the prime sentence was spoken by a different talker or even if it was printed and read silently. These results suggest that informational masking can be overcome by factors that improve listeners' auditory attention toward the target. (C) 2004 Acoustical Society of America.
Article
Three experiments investigated the roles of interaural time differences (ITDs) and level differences (ILDs) in spatial unmasking in multi-source environments. In experiment 1, speech reception thresholds (SRTs) were measured in virtual-acoustic simulations of an anechoic environment with three interfering sound sources of either speech or noise. The target source lay directly ahead, while three interfering sources were (1) all at the target's location (0°, 0°, 0°), (2) at locations distributed across both hemifields (−30°, 60°, 90°), (3) at locations in the same hemifield (30°, 60°, 90°), or (4) co-located in one hemifield (90°, 90°, 90°). Sounds were convolved with head-related impulse responses (HRIRs) that were manipulated to remove individual binaural cues. Three conditions used HRIRs with (1) both ILDs and ITDs, (2) only ILDs, and (3) only ITDs. The ITD-only condition produced the same pattern of results across spatial configurations as the combined cues, but with smaller differences between spatial configurations. The ILD-only condition yielded similar SRTs for the (−30°, 60°, 90°) and (0°, 0°, 0°) configurations, as expected for best-ear listening. In experiment 2, pure-tone BMLDs were measured at third-octave frequencies against the ITD-only, speech-shaped noise interferers of experiment 1. These BMLDs were 4-8 dB at low frequencies for all spatial configurations. In experiment 3, SRTs were measured for speech in diotic, speech-shaped noise. Noises were filtered to reduce the spectrum level at each frequency according to the BMLDs measured in experiment 2. SRTs were as low or lower than those of the corresponding ITD-only conditions from experiment 1. Thus, an explanation of speech understanding in complex listening environments based on the combination of best-ear listening and binaural unmasking (without involving sound localization) cannot be excluded. (C) 2004 Acoustical Society of America.
Article
The IEEE Recommended Practice for Speech Quality Measurement is the product of roughly six years of study, discussion, writing, and rewriting by a diligent team of scientists representing a broad range of disciplines and research institutions. We will attempt to recognize here all those who participated, ranging from the dogged individuals who were with the committee for its entire life to those who participated in at least one meeting.
Article
The speech reception threshold (SRT) for sentences in a free field condition was investigated as a function of (1) the orientation of the speaker's head, and (2) the azimuth of a noise source, with the listener always looking in the direction of the speaker. A miniature electric microphone with a flat frequency response was glued to the upper lip of a pseudospeaker standing 2 m from a loudspeaker. For each of ten directional orientations of the pseudospeaker, corresponding to angles of 0°, 22.5°, 45°, 90°, and 180° relative to the loudspeaker, a list of 13 sentences was reproduced and rerecorded through the microphone. Results show that (1) the maximum effect of direction in speech radiation from the mouth of the speaker is, in terms of signal-to-noise ratio, 6 dB, and (2) the maximum effect of the azimuth of the noise source is 10 dB.
Article
To determine the effect of otitis media with effusion (OME) on perceptual masking (a phenomenon in which spondee threshold for a 2-talker masker is poorer than for a speech-shaped noise masker). Longitudinal testing over a 1-year period following insertion of tympanostomy tubes, using clinical and normal-hearing control groups. Forty-seven children having a history of OME were tested. Possible testing intervals were just before the placement of tympanostomy tubes, and up to 3 separate occasions after the placement of the tubes. An age-matched control group of 19 children was tested. A perceptual masking paradigm was used to measure the ability of the listener to recognize a spondee in either a speech-shaped noise or a 2-talker masker background. The masker was either continuous or gated on and off with the target spondee. In gated masking conditions, children with a history of normal hearing showed only slight perceptual masking, but the children with a history of OME showed relatively great perceptual masking before surgery and up to 6 months following surgery. In continuous masking conditions, both groups of children showed relatively great perceptual masking and did not differ significantly from each other in this respect either before or after surgery. However, before surgery, the OME group showed higher thresholds in both the 2-talker and speech-shaped noise maskers. In agreement with previous psychoacoustical findings, the relatively great perceptual masking in gated conditions shown by children with OME history may reflect a general deficit in complex auditory processing.