Modeling auditory processing of amplitude modulation. II. Spectral and temporal integration

A multi-channel model, describing the effects of spectral and temporal integration in amplitude-modulation detection for a stochastic noise carrier, is proposed and validated. The model is based on the modulation filterbank concept which was established in the accompanying paper [Dau et al., J. Acoust. Soc. Am. 102, 2892-2905 (1997)] for modulation perception in narrow-band conditions (single-channel model). To integrate information across frequency, the detection process of the model linearly combines the channel outputs. To integrate information across time, a kind of "multiple-look" strategy is realized within the detection stage of the model. Both data from the literature and new data are used to validate the model. The model predictions agree with the results of Eddins [J. Acoust. Soc. Am. 93, 470-479 (1993)] that the "time constants" associated with the temporal modulation transfer functions (TMTF) derived for narrow-band stimuli do not vary with carrier frequency region and that they decrease monotonically with increasing stimulus bandwidth. The model is able to predict masking patterns in the modulation-frequency domain, as observed experimentally by Houtgast [J. Acoust. Soc. Am. 85, 1676-1680 (1989)]. The model also accounts for the finding by Sheft and Yost [J. Acoust. Soc. Am. 88, 796-805 (1990)] that the long "effective" integration time constants derived from the data are two orders of magnitude larger than the time constants derived from the cutoff frequency of the TMTF. Finally, the temporal-summation properties of the model allow the prediction of data in a specific temporal paradigm used earlier by Viemeister and Wakefield [J. Acoust. Soc. Am. 90, 858-865 (1991)]. The combination of the modulation filterbank concept and the optimal decision algorithm proposed here appears to present a powerful strategy for describing modulation-detection phenomena in narrow-band and broadband conditions.
Torsten Dau b) and Birger Kollmeier
Carl von Ossietzky Universität Oldenburg, Graduiertenkolleg Psychoakustik, AG Medizinische Physik, D-26111 Oldenburg, Germany
Armin Kohlrausch
IPO Center for Research on User-System Interaction, P.O. Box 513, 5600 MB Eindhoven, The Netherlands
(Received 28 June 1996; accepted for publication 4 August 1997)
© 1997 Acoustical Society of America. [S0001-4966(97)05711-1]
PACS numbers: 43.66.Ba, 43.66.Dc, 43.66.Mk [JWH]
INTRODUCTION
One of the most interesting and fundamental questions
in psychophysical research is how the auditory system
‘‘trades’’ spectral and temporal resolution. One problem in
this field is the question of how peripheral filtering affects
the ability to detect modulation. It is often postulated that
peripheral filtering does not limit the ability to detect modulation and that in "temporal resolution" tasks such as modulation detection the observer broadens his "effective" bandwidth (e.g., Viemeister, 1979; Berg, 1996). With such an assumption, the experimental data can be simulated by a model using a "predetection" filter as broad as several critical bandwidths to account for some kind of peripheral filtering (Viemeister, 1979). If, in contrast, peripheral filtering in
terms of critical bands has any influence on modulation de-
tection, the question arises how the applied modulation
analysis depends on center frequency. In this vein, some au-
thors have postulated that the TMTF of a broadband noise is
actually determined by the information in the highest critical
band excited by the stimulus (e.g., Maiwald, 1967a, b; Van Zanten, 1980). Eddins (1993) examined spectral integration
in amplitude-modulation detection, independently varying
stimulus bandwidth and spectral region. He found that the
cutoff frequency of the TMTF does not depend on carrier
frequency region, but increases significantly with increasing
carrier bandwidth. As recently pointed out by Strickland and
Viemeister (1997), this latter observation might be an artifact
of the stimulus generation, which included a rectangular
bandlimitation after modulation of the carrier.
In Dau (1996) and Dau et al. (1997), experiments on amplitude-modulation detection were described using narrow-band noise as the carrier at a high center frequency (5 kHz). By using these conditions, effects of any peripheral or "predetection" filtering were minimized and, in addition, the spectral region that was being used to detect the modulation was restricted to one critical band. A model of the effective signal processing in the auditory system, including a modulation filterbank, was derived to analyze the temporal envelope of the stimuli. This model will be called the "modulation filterbank model" throughout this paper.
a) Part of this research was presented at the 131st meeting of the Acoustical Society of America [T. Dau, B. Kollmeier, and A. Kohlrausch, "A quantitative prediction of modulation masking with an optimal-detector model," J. Acoust. Soc. Am. 99, 2565(A) (1996)].
b) Corresponding author. Electronic mail: torsten@medi.physik.uni-oldenburg.de
J. Acoust. Soc. Am. 102 (5), Pt. 1, November 1997. 0001-4966/97/102(5)/2906/14/$10.00. © 1997 Acoustical Society of America.
To get more insight into the processing of modulation,
several experiments are described here that investigate the
effects of spectral integration in amplitude-modulation detec-
tion, examining in particular the transition between narrow-
band and broadband conditions. To test the capabilities of
the modulation filterbank model in conditions of spectral in-
tegration, some critical experiments are performed and
model predictions are compared with data from the literature
(Eddins, 1993; Sheft and Yost, 1990) and the new experimental data. To compare results from experiments and simulations with as close a similarity in the experimental details as possible, the experiments by Eddins (1993) and those by Sheft and Yost (1990) were replicated in our laboratory with a slightly different threshold estimation procedure.
Another aspect of modulation perception is the phenom-
enon of temporal integration in amplitude-modulation detec-
tion. Temporal integration or temporal summation refers to
the well-known fact that over a range of durations thresholds
decrease with increasing signal duration. Several models
have been suggested in the literature to describe this phe-
nomenon. However, differences in the modeling strategies
occur that reflect the "resolution-integration conflict" (de Boer, 1985). In the case of temporal integration in modulation detection, Sheft and Yost (1990) found that time constants associated with temporal integration are much larger
than those indicated by the ‘‘resolution data.’’ The present
study examines the ability of the modulation filterbank
model to account for these long effective time constants
found in the data.
As before, both our own experimental data from ‘‘criti-
cal’’ experiments as well as data from the literature were
compared with model predictions. It should be emphasized at
this point that the same parameter set was used for all model
predictions throughout this paper. These parameters and in
particular the scaling of the modulation filters were also the
same as in the accompanying paper (Dau et al., 1997), where the filterbank parameters were adjusted to predict amplitude modulation-detection thresholds for narrow-band noise carriers.¹
I. MULTI-CHANNEL MODEL
All simulations discussed here were performed on the
basis of the modulation filterbank model that was initially
developed as a single-channel model, as described in Dau et al. (1997). Figure 1 shows the structure of the multi-channel model, which was extended from the single-channel model by performing the modulation analysis in parallel on the output of each stimulated peripheral auditory filter. This is motivated in part by results from physiological studies of the representation of amplitude modulation within the inferior colliculus (IC) of the cat, where it was found that modulation frequencies are represented in a systematic way orthogonally to the tonotopical organization of the IC (Schreiner and Langner, 1988).
The model contains the same stages of signal processing along the auditory pathway as proposed in the single-channel model. The main features are described briefly here; for further details see Dau et al. (1996a, 1997). To simulate the peripheral bandpass characteristics of the basilar membrane, the gammatone filterbank of Patterson et al. (1987) is used. The filters overlap at the −3-dB points of their transfer functions. The summation of all filter transfer functions would result in a flat transfer characteristic across frequency. The stimulus at the output of each peripheral filter is half-wave rectified and low-pass filtered at 1 kHz. This stage essentially preserves the envelope of the signal for high center frequencies. Effects of adaptation are simulated by nonlinear adaptation circuits (Püschel, 1988; Dau et al., 1996a). The parameters for these circuits were the same as in Dau et al. (1996a, b). The adapted signal is further analyzed by a linear modulation filterbank. The parameters of the modulation filters agree with those in Dau et al. (1997) and we refer to that paper for a detailed description. It is assumed within the present study that these parameters do not vary across frequency, i.e., the same modulation filterbank is applied to analyze the signal's envelope at the output of each critical band. Limitations of resolution are simulated by adding internal noise with a constant variance to each modulation filter output. The decision device is realized as an optimal detector in the same way as described in Dau et al. (1996a, 1997). Within the multi-channel model, the internal representation of the stimuli has four dimensions, namely amplitude, time, modulation center frequency (as before in the single-channel model), and (audio) center frequency as the additional axis. In the simulations the internal representations of the different peripheral channels are appended one after another in one large array. The decision process works in exactly the same way as described in the single-channel analysis, with the extension that all auditory filters with center frequencies within the spectral range of the stimulus were included in the analysis. Decisions were then based on the cross correlation between the (four-dimensional) actual internal representation of the stimulus and the normalized suprathreshold (four-dimensional) template representation. Using such a model, a prediction of the average subject's performance is possible on a trial-by-trial basis. Simulations of the experimental runs can thus be performed by implementing on the computer the same threshold estimating procedures within the model as those being used in the experiments with human observers.
FIG. 1. Block diagram of the psycho-acoustical model for describing modulation detection data with an optimal detector as the decision device. The signals are preprocessed, subjected to adaptation, filtered by a modulation filterbank and finally added to internal noise; this processing transforms the signals into their "internal representations."
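The decision stage described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the array layout (audio channels by modulation filters by time samples), the unit-norm template, and the subtraction of a reference representation before correlating are all assumptions made for illustration. The sketch only shows how per-channel representations can be appended into one array and correlated with a template.

```python
import numpy as np

def add_internal_noise(rep, sigma, rng):
    """Add constant-variance Gaussian noise to every modulation-filter output."""
    return rep + sigma * rng.standard_normal(rep.shape)

def correlate_with_template(rep, template):
    """Correlate a multi-channel representation with a unit-norm template.

    Both inputs have shape (audio channels, modulation filters, time samples).
    ravel() appends the audio channels one after another in one long array.
    """
    flat_template = np.ravel(template)
    flat_template = flat_template / np.linalg.norm(flat_template)
    return float(np.dot(np.ravel(rep), flat_template))

def choose_interval(interval_reps, reference_rep, template):
    """Pick the interval whose deviation from the reference best matches the template.

    Subtracting a reference representation is an assumption of this sketch.
    """
    scores = [correlate_with_template(rep - reference_rep, template)
              for rep in interval_reps]
    return int(np.argmax(scores))
```

Because the channels are simply concatenated, adding more audio channels changes only the length of the flattened arrays, not the decision rule itself.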
II. METHOD
Modulation thresholds were measured and simulated using an adaptive 3-interval forced-choice (3IFC) procedure. The experimental procedure and stimulus generation were the same as described in the accompanying paper (Dau et al., 1997). Also, the same five subjects participated in this study.
Several experiments on modulation detection were per-
formed. The subject’s task was to detect a sinusoidal test
modulation of a Gaussian noise carrier. The test modulation
was applied with zero onset phase. Unless explicitly stated, a
carrier level of 65 dB SPL was used. During the experiments
with a noise carrier, an independent sample of the noise was
presented in each interval. Filtering of the noise stimuli was
done in the frequency domain by Fourier transforming the
whole noise waveform, setting the respective frequency
samples to zero, and transforming the signal back into the
time domain. Unless stated otherwise, the filtering was ap-
plied prior to modulation. The rms level of the modulated
signal was always equated to the rms level employed in the
unmodulated trials.
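The stimulus generation described above (frequency-domain band limiting, sinusoidal modulation with zero onset phase, and rms equalization) might be sketched as follows. This is an illustrative sketch under stated assumptions, not the original laboratory code: the function names are hypothetical, the modulator is taken to be 1 + m sin(2π f_mod t), and the sampling rate used in any call is the caller's choice rather than a value given in the text.

```python
import numpy as np

def bandlimited_noise(n_samples, fs, f_lo, f_hi, rng):
    """Gaussian noise band limited by zeroing FFT bins outside [f_lo, f_hi]."""
    noise = rng.standard_normal(n_samples)
    spectrum = np.fft.rfft(noise)                   # transform the whole waveform
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    spectrum[(freqs < f_lo) | (freqs > f_hi)] = 0.0  # set out-of-band samples to zero
    return np.fft.irfft(spectrum, n=n_samples)      # back to the time domain

def rms(x):
    """Root-mean-square value of a waveform."""
    return float(np.sqrt(np.mean(np.square(x))))

def modulated_stimulus(carrier, fs, f_mod, m):
    """Apply sinusoidal AM (zero onset phase), then restore the carrier's rms level."""
    t = np.arange(carrier.size) / fs
    modulated = carrier * (1.0 + m * np.sin(2.0 * np.pi * f_mod * t))
    return modulated * (rms(carrier) / rms(modulated))
```

Filtering before modulation, as in this sketch, corresponds to the default order stated in the text; conditions that filter after modulation would swap the two steps.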
III. RESULTS
A. Modulation analysis within and beyond one critical band
In the first experiment, the effect of carrier bandwidth on
the detectability of a certain signal modulation was examined
for bandwidths within and beyond one critical band. In the
accompanying study by Dau et al. (1997) only carrier bandwidths smaller than a critical band were applied and in the corresponding simulations the modulation analysis was performed only in one peripheral channel. The present experiment was designed to illustrate the limitations of the "single-channel" model in conditions of spectral integration in modulation detection, and to show the necessity of the extension of the model towards a "multi-channel" model.
A noise carrier centered at 5 kHz was sinusoidally
modulated with a modulation rate of 5 Hz. Modulation depth
at threshold was measured as a function of the carrier band-
width, which had one of the following values: 10, 25, 50,
100, 250, 500, 1000, 1500, 2500, 5000, or 10 000 Hz. The
duration of the carrier and the modulation was 500 ms in-
cluding 50-ms cosine-squared rise and decay ramps. The
bandwidths cover the range from less than the critical band-
width at 5 kHz to much greater than the critical bandwidth.
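As a small illustration of the gating described above, the following sketch builds a 500-ms window with 50-ms cosine-squared rise and decay ramps. The function name and the sampling rate in the usage note are assumptions; the text does not state how the ramps were implemented.

```python
import numpy as np

def cosine_squared_gate(n_samples, fs, ramp_s):
    """Unit-amplitude window with cosine-squared onset and offset ramps."""
    n_ramp = int(round(ramp_s * fs))
    gate = np.ones(n_samples)
    # sin^2 rises smoothly from 0 towards 1 over the ramp duration
    rise = np.sin(0.5 * np.pi * np.arange(n_ramp) / n_ramp) ** 2
    gate[:n_ramp] = rise
    gate[n_samples - n_ramp:] = rise[::-1]  # mirrored decay ramp
    return gate
```

For example, `cosine_squared_gate(16000, 32000, 0.050)` would gate a 500-ms stimulus sampled at an assumed 32 kHz; the gate is multiplied sample by sample with the stimulus waveform.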
The left panel of Fig. 2 shows the experimental results of three subjects (open symbols). The ordinate indicates modulation depth at threshold and the abscissa shows the carrier bandwidth. Thresholds decrease monotonically with increasing bandwidth. This decrease is due to changes in the spectrum of the inherent fluctuations, whose level at low modulation frequencies decreases with increasing bandwidth of the carrier (Lawson and Uhlenbeck, 1950; see Dau et al., 1997). As a consequence, less energy of the random envelope fluctuations of the carrier leaks into the transfer range of the modulation filters near the test modulation frequency. In the model, this leads to decreasing thresholds with increasing carrier bandwidth. Simulated thresholds obtained with the single-channel model as described in Dau et al. (1997) are plotted in the left panel of Fig. 2 (filled symbols). In this single-channel simulation, the modulation analysis was carried out using the peripheral filter tuned to 5 kHz. In conditions with carrier bandwidths Δf < 1000 Hz, simulated and measured data are in good agreement. For bandwidths Δf > 1000 Hz, however, systematic differences between simulations and experimental data occur, which increase with increasing bandwidth. For the bandwidth Δf = 10 kHz, for example, the difference between the experimental results and simulations amounts to more than 5 dB. This discrepancy is expected, since the simulations were carried out using only the information about the signal modulation available at the output of one peripheral filter. The filled box in the left panel of Fig. 2 represents the simulated threshold for a carrier bandwidth of 10 kHz, where the modulation analysis was performed using the highest critical band excited by the stimulus. Even in this "optimized" single-channel simulation, there remains a discrepancy of nearly 5 dB between the simulated and the measured threshold. This does not support the hypothesis by Maiwald (1967a, b) and Van Zanten (1980) that sufficient information about the signal modulation is available in the highest excited frequency region and that the detection strategy of the subject is to monitor the "internal" filters in this high-frequency region.² The right panel of Fig. 2 shows the simulated data from the multi-channel filterbank model together with the experimental data already shown in the left panel. By combining information from all excited peripheral channels, the model can account for the continuing decrease in thresholds over the whole range of carrier bandwidths.
FIG. 2. Modulation-detection thresholds of a 5-Hz modulation as a function of the carrier bandwidth. Open symbols indicate measured data of three subjects (same in the left and the right panel). Simulated thresholds with the single-channel model are represented in the left panel by the filled circles. Simulated thresholds with the multi-channel model are shown in the right panel (filled circles). The filled box represents the simulated threshold for Δf = 10 kHz, where the modulation analysis was performed using the highest critical band excited by the stimulus. Center frequency: 5 kHz; Carrier and modulation duration: 500 ms. Level: 65 dB SPL. Subjects: JV (□); TD (○); PD (◇); optimal detector (●, ■).
B. Effects of bandwidth and frequency region
In the next experiment, effects of absolute bandwidth
and frequency region on spectral integration in modulation
detection were studied in the same way as by Eddins (1993).
Modulation thresholds for narrow-band noise carriers were
measured as a function of modulation frequency for the fol-
lowing conditions: The stimulus bandwidth was either 200,
400, 800, or 1600 Hz and the frequency region was varied by
adjusting the high-frequency cutoff of the noise to be either
600, 2200, or 4400 Hz. The purpose was to measure TMTFs while independently varying stimulus bandwidth and frequency region, and thereby to determine the influence of these parameters on modulation detection. As in Eddins (1993), the
modulated stimuli were generated by bandpass filtering after
amplitude modulation of wide-band noise to avoid the pos-
sibility that the detection of modulation would be based on
spectral cues in the signal interval rather than changes in the
temporal waveform. In this way, the bandwidth of the
narrow-band stimuli was the same in the presence or absence
of modulation. In addition, the stimuli were adjusted to have
equal energy in each interval of the forced-choice trial to
prevent detection of modulation based on overall level rather
than on the presence or absence of modulation.
Figures 3 and 4 show the experimental data of two subjects for several stimulus conditions. Modulation depth m at threshold is plotted as a function of modulation frequency. The transfer functions reflect a bandpass characteristic that is similar to data from previous studies (Rodenburg, 1972, 1977; Viemeister, 1977, 1979; Formby and Muir, 1988). The data are in very good agreement with the results of Eddins, in spite of slight differences in the threshold estimation procedure between both studies. Increasing stimulus bandwidth results in a corresponding increase in sensitivity to modulation. This systematic increase is seen for each of the three cutoff frequencies tested. Eddins fitted his data, for each subject and condition, with the transfer function of a simple low-pass filter and derived the time constant associated with the −3-dB cutoff frequency of the specific transfer function. Two interpretations emerged from his analysis: First, the time constants associated with the TMTFs do not vary with changing frequency region. Eddins (1993) concluded from the data that temporal acuity is independent of frequency region, assuming that temporal acuity and derived time constants from the data are directly related. Second, the time constants associated with TMTFs decrease monotonically with increasing stimulus bandwidth.
FIG. 3. Modulation-detection thresholds for one subject (BG) as a function of modulation frequency with upper cutoff frequency as the parameter. Each panel represents a different bandwidth condition: upper left: 200 Hz, upper right: 400 Hz, lower left: 800 Hz, lower right: 1600 Hz. The upper cutoff frequency is either 4400 Hz (△); 2200 Hz (◇); or 600 Hz (○).
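The link between a −3-dB cutoff frequency and a time constant can be made explicit with a short worked relation. It assumes that the "simple low-pass filter" is a first-order low-pass, which this section does not state; the symbols f_m (modulation frequency), f_c (cutoff frequency), and τ (time constant) are introduced here for illustration only.

```latex
|H(f_m)| = \frac{1}{\sqrt{1 + (2\pi f_m \tau)^2}}, \qquad
|H(f_c)| = \frac{1}{\sqrt{2}}
\;\Longrightarrow\;
f_c = \frac{1}{2\pi\tau}, \qquad
\tau = \frac{1}{2\pi f_c}.
```

Under this assumption, an illustrative cutoff of 64 Hz (not a value taken from the data) would correspond to τ = 1/(2π · 64 Hz) ≈ 2.5 ms, so a higher cutoff frequency implies a shorter time constant.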
Figure 5 shows the corresponding model predictions obtained with the modulation filterbank model. Modulation depth at threshold is plotted as a function of modulation frequency. Modulation analysis was performed using those peripheral channels with center frequencies within the spectral width of the noise carrier. The shape of the simulated threshold patterns agrees well with the corresponding data, but there is a constant absolute deviation of 2-5 dB between model predictions and individual experimental data. That is, the form of the simulated TMTF does not depend on the frequency region of the stimuli but depends on the stimulus bandwidth. In some extreme conditions, e.g., for Δf = 400 Hz at the upper cutoff frequency of 600 Hz and for Δf = 800 Hz at the upper cutoff frequency of 2200 Hz, the slope of the simulated TMTF between the penultimate and the last modulation frequency is steeper than in the case of other cutoffs at these bandwidths. The same tendency is seen in most of the corresponding experimental conditions, e.g., in Fig. 3 for 800 Hz and in Fig. 4 for 400 and 800 Hz.
Within the model the "low-frequency" and the "high-frequency" conditions require quite different decision strategies to determine the detection threshold. Consider, for example, the bandwidth of Δf = 400 Hz and the upper cutoff frequencies of 600 and 4400 Hz, representing the low-frequency and the high-frequency conditions, respectively. In the low-frequency condition, the modulation analysis is performed in parallel in several peripheral filters. In the high-frequency condition, the modulation analysis is carried out in only one peripheral channel because of the poor spectral resolution of the auditory system at high frequencies. The results of the simulations in the modulation detection task are the same for both conditions (except for the highest modulation frequency), indicating that, apparently, the poor temporal resolution in the low-frequency condition is compensated for by the greater number of "observations" across frequency compared to the high-frequency condition. This compensation will break down at high modulation frequencies, because these will be strongly attenuated if many narrow auditory filters cover the carrier spectrum. In this case, a second limitation for detecting modulation comes into play, namely the absolute threshold for modulation detection. Such a threshold was introduced within the present model by the addition of internal noise to the output of each modulation filter. For a sinusoidal carrier this internal noise results in a threshold of detectability for a low-frequency amplitude modulation of about −27 dB. If, in the case of a noise carrier, the imposed modulation is strongly attenuated within one or more peripheral filters, the detectability of the modulation is not determined solely by the inherent statistics of the noise carrier but is also determined by the internal noise. As a consequence, in such a condition, some of the (low-frequency) peripheral filters do not contribute to the information about the signal modulation, leading to an increased detection threshold. Therefore, in an "extreme" condition as described above, namely in the case of the highest imposed modulation frequency in the lowest-frequency region, the width of the peripheral filter has an influence on modulation detection. This influence is indeed seen in the model predictions and in the experimental data for some, but not all, subjects.
FIG. 4. Same as Fig. 3 for subject TD.
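To give a feel for the absolute threshold quoted above, the following sketch converts between a modulation threshold in dB and the linear modulation depth m. It assumes the common convention that thresholds are expressed as 20 log10(m); this section does not spell the convention out, and the function names are hypothetical.

```python
import math

def depth_from_db(level_db):
    """Linear modulation depth m for a threshold given as 20*log10(m) in dB."""
    return 10.0 ** (level_db / 20.0)

def db_from_depth(m):
    """Threshold in dB (20*log10(m)) for a linear modulation depth m."""
    return 20.0 * math.log10(m)
```

Under this convention a threshold of −27 dB corresponds to a modulation depth of roughly 4.5%, i.e., `depth_from_db(-27.0)` is about 0.045.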
C. Influence of filter shape and spacing on spectral integration
In some additional simulations we investigated the influ-
ence of filter shape and filter spacing on spectral integration.
These simulations were performed for a low modulation fre-
quency of 8 Hz and involved high-frequency auditory chan-
nels of a broadband running noise carrier. Thresholds are
based on 20 repeated estimates of the model.
The gain resulting from the analysis of coherent modulation in several filters is a consequence of combining (partially) independent observations. As long as external fluctuations are the limiting factor for detection, the independence of these observations will depend on the correlation between the carrier waveforms in these channels. The correlation depends in turn on the shape and spacing of the filters. In a first simulation, we implemented nonoverlapping rectangular filters in the model instead of filters with a Roex shape. In this situation, the carrier waveforms in the various filters are completely independent. This condition results in a threshold decrease of 3 dB per doubling of the number of analyzed filters. This is the same amount which will later be described and discussed in the context of temporal integration (see Sec. IV B). In a similar simulation with Roex filters overlapping as in the present model, the gain was somewhat smaller and amounted to about 2.5 dB per doubling of the number of filters (8-dB threshold change by going from one to nine filters). This result shows that the remaining correlation between adjacent Roex filters has a small, but "measurable" effect on the threshold prediction.
In a second simulation, we compared predictions for a single high-frequency filter with those for the combination of five adjacent Roex filters. These five filters either had a spacing as in the present model (narrow spacing) or they were separated twice as much (wide spacing). In agreement with the previous simulation, the combination of the narrowly spaced filters resulted in a threshold 5.5 dB below the prediction for a single filter. In the wide-spaced condition, thresholds were an additional 1.25 dB lower. Compared with the single-filter result, the combination of widely spaced Roex filters led to a 3-dB effect per doubling of the number of filters. We thus can conclude that by allowing some overlap between adjacent filters, the observations in these filters are not statistically independent. By using a wider spacing, the information gain from combining a certain number of filters is larger. However, with a wider spacing, the number of filters that could be placed within a given spectral range would decrease. This decrease in the number of observations would affect thresholds more strongly than the gain from the statistical independence, so that the spacing used in the present model makes better use of the available information than a wider spacing would.
FIG. 5. Simulated modulation-detection thresholds as a function of modulation frequency for the same conditions as in Figs. 3 and 4. Thresholds are indicated by filled symbols.
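The per-doubling figures quoted in this section can be cross-checked with simple arithmetic. The sketch below assumes that the threshold gain grows linearly with the base-2 logarithm of the number of combined filters; the function names are hypothetical and the code only restates numbers given in the text.

```python
import math

def total_gain_db(n_filters, db_per_doubling):
    """Threshold gain in dB when combining n_filters at a fixed gain per doubling."""
    return db_per_doubling * math.log2(n_filters)

def gain_per_doubling_db(total_db, n_filters):
    """Gain per doubling implied by a total gain over n_filters filters."""
    return total_db / math.log2(n_filters)

# One to nine filters at about 2.5 dB per doubling: about 7.9 dB,
# consistent with the 8-dB change quoted for the overlapping Roex filters.
nine_filter_gain = total_gain_db(9, 2.5)

# Five narrowly spaced filters, 5.5 dB total: about 2.4 dB per doubling.
narrow_rate = gain_per_doubling_db(5.5, 5)

# Five widely spaced filters, 5.5 + 1.25 = 6.75 dB total: about 2.9 dB per doubling.
wide_rate = gain_per_doubling_db(5.5 + 1.25, 5)
```

Under this assumption the wide-spacing result comes out close to the 3 dB per doubling found for fully independent rectangular filters, while the narrow-spacing result matches the roughly 2.5 dB reported for the overlapping filters.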
Finally, we checked whether an even narrower spacing
of filters would change the model predictions. This was done
for one of the conditions from Eddins (1993) described in the previous section. The carrier was a bandlimited noise ranging from 2800 to 4400 Hz. In the first simulation, modulation analysis was performed with Roex filters overlapping at their −3-dB points. In a second simulation, the spectral density
and thus the total number of analyzed filters was increased
by a factor of 3. Both simulations led to the same threshold
prediction.
In summary, the additional simulations show that the
chosen overlap between adjacent filters allows a close to op-
timal detection of modulation imposed on a broadband noise
carrier with a minimal computational load (i.e., the smallest number of analyzed auditory channels).
D. Predictions for modulation masking using broadband noise carriers
Houtgast (1989) adopted a classical masking paradigm
for investigating frequency selectivity in the modulation-
frequency domain: the detectability of test modulation in the
presence of masker modulation, as a function of the spectral
difference between test and masker modulation.
The carrier in all his experiments was a pink noise with a spectrum level in the 1-kHz region of about 25 dB SPL. After applying the modulation, the carrier was bandpass filtered between 1 and 4 kHz and added to unmodulated pink noise with a complementary (bandstop) spectrum. The masking patterns (of the first experiment) were obtained for each of three 1/2-octave-wide bands of noise as the modulation masker. The carrier noise was multiplied both with the masker and the target modulator waveform. Center frequencies for the masker modulation were 4, 8, or 16 Hz. The left panel of Fig. 6 shows the resulting masking patterns from that study. The three curves show a peaked characteristic: The amount of masking decreases for increasing frequency difference between test modulation and masker modulation. The lower curve in the figure shows the unmasked modulation detection threshold level as a function of modulation frequency. For details about the experimental setup, stimuli and procedure, see Houtgast (1989).
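For orientation, the edges of a 1/2-octave-wide masker band can be computed from its center frequency. The sketch assumes that each band is centered geometrically, so that the edges lie a quarter octave below and above the center; the text gives only the center frequencies and the bandwidth, and the function name is hypothetical.

```python
def half_octave_edges(f_center):
    """Lower and upper edge (Hz) of a 1/2-octave band centered geometrically on f_center."""
    return f_center * 2.0 ** -0.25, f_center * 2.0 ** 0.25

# Band edges for the three masker center frequencies used in the experiment.
masker_bands = {fc: half_octave_edges(fc) for fc in (4.0, 8.0, 16.0)}
```

Under this assumption the 4-Hz band spans roughly 3.4 to 4.8 Hz, the 8-Hz band roughly 6.7 to 9.5 Hz, and the 16-Hz band roughly 13.5 to 19.0 Hz, so the three masker bands do not overlap.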
The right panel of Fig. 6 shows results from simulations obtained with the multi-channel version of the modulation filterbank model. Instead of using notched noise surrounding the carrier, we forced the model to only analyze auditory filters with center frequencies between 1 and 4 kHz, which should have the same effect with respect to off-frequency listening. The simulated thresholds for the unmasked condition are systematically lower than the experimental data from Houtgast. This might be caused by the different presentation of the masker and by differences in the applied threshold estimation procedure. Apart from this systematic difference of about 2–5 dB, the unmasked and the masked thresholds in the two panels agree very well with each other. The simulated masking patterns show the same peaked characteristic, with less masking for increasing frequency difference between test and masker modulation.
An additional observation can be made from the unmasked modulation threshold data in Fig. 6. The difference between the experimental thresholds at 4 and 64 Hz amounts to about 11 dB. This is much more than observed in corresponding measurements with a white-noise carrier (cf. Figs. 3 and 4 in Sec. III B). Interestingly, a similar difference between white and pink noise is also observed in simulations. When the experimental condition of Houtgast is simulated with a white-noise carrier, thresholds at 4-Hz modulation are the same as for pink noise. The increase between 4 and 64 Hz is, however, smaller for white noise and amounts to about 7 dB. At present there is no simple explanation for this influence of the masker's spectral shape on the TMTF.
FIG. 6. Modulation thresholds for a sinusoidal test modulation and a pink noise ranging from 1 to 4 kHz as carrier. The lower curve in both panels shows the unmasked modulation detection thresholds as a function of modulation frequency. Each of the three peaked curves shows the masked-modulation threshold pattern for one of the 1/2-octave-wide masker-modulation bands centered at 4 Hz (△), 8 Hz (◇), or 16 Hz (▽). The left panel shows data from Houtgast (1989). The right panel shows predictions with the multi-channel model.
2912 J. Acoust. Soc. Am., Vol. 102, No. 5, Pt. 1, November 1997. Dau et al.: Spectral and temporal integration
As in the study of Houtgast (1989), the effect of a reduction of the masker-modulation level was investigated for one of the conditions from Fig. 6 (the 8-Hz modulation frequency for the noise band centered at 8 Hz). The masker-modulation level used to obtain the middle threshold function in Fig. 6 was reduced by 5 and 10 dB, respectively. The effect on the model prediction is shown in Fig. 7 (right panel). The reduction of the threshold level amounts to 9 dB and is slightly smaller than the reduction of the masker-modulation level (10 dB) and 2 dB smaller than the effect observed in the data by Houtgast (11 dB).
To further test the ability of the modulation filterbank model to account for modulation masking data, the effect of varying the bandwidth of the modulation masker was investigated. Thresholds were obtained for a test modulation frequency of 8 Hz for various values of the bandwidth of the masker modulation. The center frequency and the spectral density within the passband were kept constant. Figure 8 shows results from the study of Houtgast (1989) (left panel) and model predictions obtained with the present model (right panel).
Houtgast (1989) found that for small bandwidths, the modulation thresholds increased by 3 dB for each doubling of the masker-modulation bandwidth, whereas for large bandwidths, the threshold remained constant. He proposed that the modulation-detection threshold is associated with a constant signal-to-noise ratio within a filter centered on the test-modulation frequency. To a first approximation, indicated by the two straight lines in the left panel of Fig. 8, he suggested a width of the modulation filter of 1/2–1 oct.³
In the present model, the modulation filter centered at 8 Hz has a bandwidth of 5 Hz. This bandwidth lies within the range of 1/2–1 octave that was suggested by Houtgast. The model predicts an increase in threshold of about 6 dB between 1/8 and 1/2 octave of modulation-masker bandwidth. For modulation-masker bandwidths larger than 1/2 oct, it predicts an almost constant threshold. This is in agreement with the data of Houtgast. Thus modulation masking is only effective within a “critical” band around the test-modulation frequency.
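The constant signal-to-noise account described above can be illustrated with a minimal numerical sketch. It assumes an idealized rectangular modulation filter (a simplification, not the filter shape used in the model), a masker of constant spectral density centered on that filter, and a threshold that is proportional to the masker power passed by the filter. The function name `threshold_shift_db` and its parameters are hypothetical; only the 5-Hz default bandwidth is taken from the text.

```python
import math


def threshold_shift_db(masker_bw_hz, filter_bw_hz=5.0):
    """Threshold change (dB) relative to a masker as wide as the filter.

    Assumes a rectangular modulation filter and a masker of constant
    spectral density, both centered on the test-modulation frequency.
    The masker power passed by the filter is then proportional to
    min(masker_bw_hz, filter_bw_hz).
    """
    passed_bw = min(masker_bw_hz, filter_bw_hz)
    return 10.0 * math.log10(passed_bw / filter_bw_hz)


# Illustrative masker bandwidths in Hz (not the values used by Houtgast).
for bw in (0.625, 1.25, 2.5, 5.0, 10.0, 20.0):
    print(f"{bw:6.3f} Hz -> {threshold_shift_db(bw):6.2f} dB")
```

Under these assumptions each doubling of the masker bandwidth below the filter bandwidth raises the threshold by about 3 dB, and the threshold stays constant once the masker is wider than the filter, which is the two-line approximation described for the left panel of Fig. 8.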
FIG. 7. 8-Hz modulation-detection thresholds as a function of the relative masker-modulation level with the masker-modulation band centered at 8 Hz. Left panel: data from Houtgast (1989). Right panel: corresponding predictions of the modulation filterbank model. The dashed lines represent the unmasked detection threshold of the test modulation.
FIG. 8. 8-Hz modulation-detection thresholds as a function of the bandwidth of the masker-modulation noise centered at 8 Hz. Left panel: data from Houtgast (1989). Right panel: corresponding predictions of the modulation filterbank model.
The results from these simulations support the hypothesis that the envelope fluctuations of the stimuli are processed by modulation-frequency selective channels.
E. Temporal integration in modulation detection
Thresholds for detecting sinusoidal amplitude modulation of a wide-band noise carrier (low-pass filtered at 6 kHz) were measured and simulated as a function of the duration of the modulating signal. The experimental design was chosen according to the study of Sheft and Yost (1990). The carrier was gated with a duration that exceeded the duration of modulation by the combined stimulus rise and fall times. Stimuli were shaped by a 25.6-ms rise–fall time. The combinations of modulation frequency and numbers of modulation cycles for each condition are listed in Table I in Sheft and Yost (1990). Because modulation was restricted to the constant-amplitude portion of the carrier (thus excluding the ramps), stimulus duration was always 51.2 ms longer than the modulation duration listed in the table (see Sheft and Yost, 1990).
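The timing of such a stimulus can be sketched as follows. This is only an illustration under stated assumptions: the ramp shape (raised cosine), the sampling rate, the sine starting phase of the modulator, and the name `gated_sam_noise` are not specified in the text, and the 6-kHz low-pass filtering and level calibration are omitted.

```python
import numpy as np


def gated_sam_noise(f_mod, mod_dur, m, fs=44100.0, ramp=0.0256, seed=0):
    """Gaussian-noise carrier with sinusoidal AM confined to the plateau.

    The carrier lasts mod_dur + 2 * ramp seconds. Raised-cosine ramps
    (an assumed shape) of length `ramp` shape its onset and offset, and
    the modulator 1 + m * sin(2*pi*f_mod*t) is applied only between
    the two ramps.
    """
    rng = np.random.default_rng(seed)
    n_ramp = int(round(ramp * fs))
    n_mod = int(round(mod_dur * fs))
    carrier = rng.standard_normal(n_mod + 2 * n_ramp)

    # Modulator: unity during the ramps, sinusoidal AM on the plateau.
    modulator = np.ones_like(carrier)
    t = np.arange(n_mod) / fs
    modulator[n_ramp:n_ramp + n_mod] = 1.0 + m * np.sin(2 * np.pi * f_mod * t)

    # Raised-cosine onset ramp and its mirror image as offset ramp.
    rise = 0.5 * (1.0 - np.cos(np.pi * np.arange(n_ramp) / n_ramp))
    gate = np.ones_like(carrier)
    gate[:n_ramp] = rise
    gate[-n_ramp:] = rise[::-1]
    return carrier * modulator * gate


# 400 ms of 10-Hz modulation: the stimulus is 51.2 ms longer than that.
x = gated_sam_noise(f_mod=10.0, mod_dur=0.4, m=0.5, fs=10000.0)
print(len(x) / 10000.0)
```

With two 25.6-ms ramps, the total duration exceeds the modulation duration by 51.2 ms, as stated above.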
Figure 9 shows the experimental data of two subjects. The modulation depth at threshold is plotted as a function of the modulation frequency. The parameter is the modulation duration. The data agree very well with those from Sheft and Yost (1990). The curves represent the “classical” broadband TMTF often described in other studies (e.g., Viemeister, 1979; Formby and Muir, 1988). The data show somewhat increased thresholds for the two lowest modulation frequencies, $f_{\mathrm{mod}} = 2.5$ and 5 Hz. This is caused by the “gating” of the carrier in this experiment in contrast to experiments where the carrier was presented continuously, as discussed in previous studies (e.g., Viemeister, 1979; Sheft and Yost, 1990).⁴ For modulation frequencies between 5 and 40 Hz, the thresholds are roughly constant. They increase slightly between 40 and 80 Hz and at a rate of approximately 3 dB/octave for higher modulation frequencies.
Figure 10 shows the corresponding simulated thresholds obtained with the modulation filterbank model. The ordinate indicates modulation depth at threshold and the abscissa represents modulation frequency. The simulated threshold pattern is in very good agreement with the pattern found in the measured data. It shows increased thresholds for the two lowest modulation frequencies, 2.5 and 5 Hz, leading to a bandpass characteristic of the threshold function for the two greatest durations of 200 and 400 ms. This is caused by the dynamic properties of the adaptation model (Püschel, 1988; Dau et al., 1996a). The feedback loops of the adaptation model produce a considerable overshoot at the carrier onset that decreases the detectability of the signal modulation, especially at very low modulation rates (see discussion in Viemeister, 1979; Sheft and Yost, 1990).⁵
The time constants derived from temporal integration per unit duration are in very good agreement with those found in the measured data. The model therefore accounts for the long effective integration time constants whose values are much larger than the time constants indicated by the “resolution data” (Sheft and Yost, 1990; Viemeister, 1979).
FIG. 9. Measured TMTFs of two subjects using broadband noise as a carrier. Parameter is the duration of the imposed test modulation. Carrier: 0–6 kHz; Level: 65 dB SPL; Modulation duration: ○: 400 ms; ◇: 200 ms; △: 100 ms; □: 50 ms; ▽: 25 ms; : 12.5 ms; ×: 6.25 ms; Subjects: TD (left panel); AS (right panel).
FIG. 10. Simulated TMTFs obtained with the current model for the same conditions as in Fig. 9.
IV. DISCUSSION
In this study the performance of the modulation filterbank model was described with respect to spectral and temporal integration in amplitude modulation detection. Several “critical” experiments were performed or taken from the literature and model predictions were compared with experimental data. As an extension to the single-channel model proposed in Dau et al. (1997), in which the concept of the modulation filterbank was established and the parameters of the modulation filters were fitted to data for narrow-band carriers, the modulation analysis was applied to broadband conditions. In this multi-channel version of the model, the temporal envelopes of the stimuli were processed by the same modulation filterbank in parallel at the output of each stimulated peripheral filter and the decision device combined all filter outputs linearly.
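The effect of a linear combination across channels can be illustrated with a minimal sketch. It assumes, purely for illustration, that each channel contributes a noise-free template of the signal-induced change in its internal representation and that the internal noise is white, Gaussian, independent across channels, and of equal variance. The name `combined_dprime` and its arguments are hypothetical and not part of the published model. Under these assumptions, a cross-correlation detector operating on the concatenated channels reaches a d′ equal to the root-sum-square of the single-channel values.

```python
import numpy as np


def combined_dprime(channel_templates, noise_sd=1.0):
    """d' of a cross-correlation detector over concatenated channels.

    Each entry of channel_templates is the noise-free, signal-induced
    change in one channel's internal representation. With independent
    Gaussian internal noise of equal variance in every sample, the
    matched-filter d' is the Euclidean norm of the concatenated
    template divided by the noise standard deviation.
    """
    full = np.concatenate([np.ravel(c) for c in channel_templates])
    return float(np.linalg.norm(full) / noise_sd)


# One channel versus four identical channels (illustrative template).
template = 0.1 * np.ones(100)
print(combined_dprime([template]))
print(combined_dprime([template] * 4))
```

In this idealized case four identical channels double the single-channel d′. Because thresholds in the model are mainly limited by the inherent fluctuations of the noise carrier rather than by the internal noise, the gain obtained from additional channels may be smaller than this bound.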
A. Spectral integration
In the modulation-detection conditions considered in this study using stochastic noise carriers and signal modulation, which is coherent across auditory filters, peripheral filtering generally does not limit the ability to detect modulation. This has been interpreted in some former studies as the observers' ability to increase their effective bandwidth in detecting wideband signals (Green, 1960; Bos and de Boer, 1966; Schacknow and Raab, 1976). As already discussed by Viemeister (1979), the mechanism behind such a combination and the stage of processing where it occurs are still unclear. It has been suggested in the literature (Green and Swets, 1966) that in detection experiments with multi-component signals, combination probably occurs at a high level: The observer can combine information nearly optimally from widely spaced critical bands. This model approach was adopted within the current study and may be denoted as an “auditory-filter-based approach.” In contrast, in models for modulation-detection conditions, it has been previously assumed that the combination of peripheral filter outputs occurs at a very early stage of processing (Viemeister, 1979; Berg, 1996). In this context, the model proposed by Viemeister (1979), which includes a wide predetection bandpass filter ($\Delta f = 2000$ Hz) followed by a nonlinearity and a low-pass filter, can account for modulation detection data using (broadband) noise as the carrier. This model approach may be denoted as a “predetection-filter approach.” For low modulation frequencies and broadband carriers, it is difficult to discriminate between such a predetection-filter approach and the auditory-filter-based approach.
Furthermore, it is not possible to discriminate whether there is one “large” modulation filterbank behind the combined outputs of all peripheral channels (cf. Yost et al., 1989) or, alternatively, whether there is a modulation filterbank that separately analyzes the output of each peripheral channel before the information is combined. An argument in favor of the auditory-filter-based approach is that the bandwidth of the predetection filter of the Viemeister model is larger than the critical band estimates for most of the auditory range. The predetection-filter approach therefore fails to describe the data in an appropriate way for conditions in which spectral resolution of the auditory system plays any role. The present model gives a more general description of the processing of modulation in the auditory system and is also applicable to spectral-masking conditions, as was shown in Verhey and Dau (1996).
One of the main results of this study is that all the data could be described in terms of masking phenomena in the modulation-frequency domain. Analogous to the results described in Dau et al. (1997), thresholds are mainly determined by the amount of the inherent modulation power of the specific noise carrier that falls into the transfer range of the modulation filter tuned to the test modulation frequency. The model can therefore account for the experimental findings of Eddins that time constants derived from the TMTF do not vary with changing frequency region and decrease with increasing stimulus bandwidth. Note, however, that this model does not support the notion of one resolution time constant derivable from the data because the low-pass characteristic in the modulation data for broadband noise is not caused by a corresponding “low-pass weighting” of fast envelope fluctuations by the auditory system. Instead, the data are explained in terms of the interaction of stimulus power in the modulation-frequency region and the scaling of the modulation filters.⁶ We want to add that, recently, Strickland and Viemeister (1997) have pointed out that the effective change of time constant with carrier bandwidth in Eddins' experiment might be an artifact caused by the stimulus generation procedure. They showed experimentally that without the bandlimitation after filtering, the dependence of the time constant on the carrier bandwidth is strongly reduced.
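The role of the inherent modulation power of the carrier, discussed at the beginning of this paragraph, can be made concrete with a small numerical sketch. It is not the model's preprocessing: a rectangular 5-Hz band around 8 Hz merely stands in for the modulation filter, the carrier is an ideal band of Gaussian noise centered at 2 kHz, and all names and parameter values (`envelope_power_in_band`, the bandwidths of 50 and 1000 Hz) are assumptions chosen for illustration.

```python
import numpy as np


def envelope_power_in_band(carrier_bw, f_lo, f_hi, fc=2000.0,
                           fs=16000.0, dur=4.0, n_avg=8, seed=0):
    """Mean envelope power between f_lo and f_hi (Hz), relative to dc.

    A Gaussian noise band of width carrier_bw centered at fc is built
    in the frequency domain using positive frequencies only, so the
    inverse FFT directly yields the analytic signal. Its magnitude is
    the Hilbert envelope, whose power spectrum is summed over the
    requested modulation band and divided by the dc power.
    """
    rng = np.random.default_rng(seed)
    n = int(fs * dur)
    freqs = np.fft.fftfreq(n, 1.0 / fs)
    in_band = np.abs(freqs - fc) <= carrier_bw / 2.0
    n_bins = int(in_band.sum())
    env_freqs = np.fft.rfftfreq(n, 1.0 / fs)
    sel = (env_freqs >= f_lo) & (env_freqs <= f_hi)

    total = 0.0
    for _ in range(n_avg):
        spec = np.zeros(n, dtype=complex)
        spec[in_band] = (rng.standard_normal(n_bins)
                         + 1j * rng.standard_normal(n_bins))
        envelope = np.abs(np.fft.ifft(spec))
        env_power = np.abs(np.fft.rfft(envelope)) ** 2
        total += env_power[sel].sum() / env_power[0]
    return total / n_avg


narrow = envelope_power_in_band(50.0, 5.5, 10.5)
wide = envelope_power_in_band(1000.0, 5.5, 10.5)
print(10.0 * np.log10(narrow / wide))  # dB difference, narrow re wide
```

In this sketch the narrow carrier places considerably more relative envelope power in the band around 8 Hz than the wide carrier does. This is the kind of stimulus-inherent masking that, according to the account given above, determines the thresholds.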
B. Temporal integration
The phenomenon of temporal integration refers to the fact that, over a range of durations, thresholds decrease with increasing signal duration. Several models have been suggested in the literature that describe the phenomenon of temporal integration. For example, to describe the threshold function observed in typical test-tone integration data, an integration process is assumed that occurs over relatively long time periods, i.e., of the order of several hundred milliseconds. The most prominent approach is the energy detection model (Green, 1960; Green and Swets, 1966) in which decisions are based on the power of the input integrated over a fixed time period. Another class of models assumes a shorter integration time to account for temporal resolution, such as modulation detection (Viemeister, 1979), gap detection (Forrest and Green, 1987), and temporal aspects of nonsimultaneous masking (Moore et al., 1988; Oxenham and Moore, 1994). The discrepancy between these two modeling strategies is often described as illustrating the “resolution-integration” paradox (de Boer, 1985).
In recent studies, however, it has been argued that the disparity between the integration and the resolution time constants is not a real problem (Viemeister and Wakefield, 1991). These authors pointed out that the observation of a 3-dB decrease in threshold for each doubling of duration (as seen in typical test-tone integration data) means that the auditory system behaves as if perfect power integration occurs but that the system is not necessarily performing the operation of mathematical integration. Therefore it might be important to distinguish between the phenomenon of temporal integration and the process that accounts for the phenomenon. Viemeister and Wakefield (1991) provided evidence that integration with a long time constant, such as proposed by the classical models, does not occur in all situations of auditory processing. They showed that the threshold for a pair of short pulses yields classic power integration only for pulse separations of less than 5–10 ms. For longer separations, the thresholds did not change with separation and the pulses appeared to be processed independently (cf. Zwislocki et al., 1962). In a second experiment, Viemeister and Wakefield (1991) showed that the threshold for a pair of tone pulses was lower than for a single pulse, indicating some type of integration, but was not affected by level changes of the noise which was presented between the two pulses. The experimental results from that study are plotted in the left panel of Fig. 11. It shows the average thresholds for the first pulse alone (squares), the second pulse alone (circles), and for the pulse pair (triangles) as a function of the relative level of the intervening noise. The thresholds for the first pulse alone do not depend on the noise level. There is a slight increase in threshold for the second pulse reflecting forward masking. The thresholds for the pulse pair are about 2.5 dB lower than those for either pulse alone and do not depend on the level of the intervening noise (for details see Viemeister and Wakefield, 1991). These data cannot be explained by long-term integration.
Furthermore, as discussed by Viemeister and Wakefield (1991), the results of this second experiment are also inconsistent with the model proposed by Penner (1978). Penner showed that a compressive nonlinearity followed by a short-time constant integration can result in long effective integration. However, such a model, as an example of the class of “single-look” integration models, would predict a certain change in threshold depending on the energy of the noise between the two pulses, since the lower threshold for a pulse pair compared to that for the single pulse requires integration at least over the time of separation of the two pulses.
To account for the data, Viemeister and Wakefield (1991) proposed a “multiple-look” model. With such a model, “looks” or samples from a short-time constant process are stored in memory and can be processed “selectively,” depending on the task. The short-time constant allows the model, in principle, to account for temporal resolution data. The combination of a short-time constant and selective processing allows the model to also account for the data from the pulse pair experiment described above. The authors suggested that in temporal integration tasks, the long effective time constants may result from the combination of information from different looks, not from true long-term integration. However, there are some open questions with regard to the multiple-look model proposed by Viemeister and Wakefield (1991). A very basic question is concerned with what is meant by a “look.” The authors discuss the question of whether a look may be considered as a sample from the envelope-like waveform from the short temporal window that defines a look. But what is the shape of such a window? What about the correlation between successive looks or samples and how are these looks or samples combined to arrive at a decision statistic? The predicted temporal integration function described in the study of Viemeister and Wakefield (1991) depends strongly on how the looks are weighted and combined.
FIG. 11. Thresholds of a pair of tones separated by 100 ms as a function of the relative level of the intervening noise (cf. Viemeister and Wakefield, 1991). Thresholds for either the first or the second tone pulse alone are included for comparison. Squares: first pulse only; circles: second pulse only; triangles: pulse pair. Pulse(s): 10 ms, 1 kHz, presented during 10-ms gaps in a steady-state noise masker. During the 50-ms interval centered between the gaps, the noise level was either incremented by 6 dB, decremented by 6 dB, or left unchanged. Noise level: 40-dB spectrum level (measured at 1 kHz). Left panel: experimental data from Viemeister and Wakefield (1991). Right panel: corresponding simulated thresholds on the basis of the present model.
Nevertheless, the basic concept of the multiple-looks approach is to take into account that the observer attempts to use all the samples from the observation interval. For the detection of a tone, for example, an increase in the duration of the tone increases the number of samples and results in an improvement in performance. The model proposed in the present study contains an optimal detector as a decision device. The detection process can be considered as a “matched filtering” process as already described in Dau et al. (1996a). This implies that a variable time constant is available that is matched to the signal duration, dependent on the specific task, i.e., the model has at its disposal a continuum of time constants. The integration of the cross correlator is similar to the classic notion of temporal integration, but no fixed integration time constant is necessary for long-term integration. It is the temporal extension of the template which automatically determines the weighting of stimuli across time. Thus our implementation is effectively close to the “multiple-look” strategy discussed by Viemeister and Wakefield (1991). Time constants that are related to the “hard-wired” part of signal processing within the model represent a lower limit of temporal acuity. (The term “hard-wired” is used in the sense that this part of signal processing is assumed to be independent of the specific experimental task.) The modulation filterbank represents a set of time constants that are, however, too short to account for the long-term integration data. Thus it is the decision device that inherently accounts for the long effective time constants. This agrees well with the considerations by Viemeister and Wakefield (1991) that different strategies are probably being employed, and different capabilities tapped, in resolution and integration tasks. The resolution task seems to use more “peripheral” processes whereas temporal integration may require higher level processes such as multiple sampling and probability summation. To that extent, assuming that the decision process occurs at a higher level of auditory processing, there is a certain correspondence between the modeling strategy of the present model and that suggested in the “multiple-look” model. The current approach may therefore be considered as an alternative strategy to the multiple-look model.
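How a template that grows with the signal duration produces integration without a fixed time constant can be illustrated with a generic matched-filter simulation. The sketch below is not the model itself: it replaces the full preprocessing by a sinusoidal signal in white Gaussian internal noise, and the function name `simulate_dprime` and all parameter values are assumptions made for illustration.

```python
import numpy as np


def simulate_dprime(dur, f_mod=8.0, depth=0.2, fs=1000.0,
                    noise_sd=1.0, n_trials=4000, seed=0):
    """Monte Carlo d' of a cross-correlation detector.

    The template is the noise-free signal itself (normalized to unit
    energy), so its temporal extent follows the signal duration. The
    decision variable on each trial is the inner product of the noisy
    observation with the template.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(int(round(dur * fs))) / fs
    signal = depth * np.sin(2 * np.pi * f_mod * t)
    template = signal / np.linalg.norm(signal)

    noise_only = rng.normal(0.0, noise_sd, (n_trials, t.size))
    with_signal = signal + rng.normal(0.0, noise_sd, (n_trials, t.size))

    dv_noise = noise_only @ template
    dv_signal = with_signal @ template
    pooled_sd = np.sqrt(0.5 * (dv_noise.var() + dv_signal.var()))
    return float((dv_signal.mean() - dv_noise.mean()) / pooled_sd)


d_short = simulate_dprime(0.25)
d_long = simulate_dprime(0.5)
print(d_long / d_short)  # close to sqrt(2) for a doubling of duration
```

Under these assumptions d′ grows with the square root of the signal duration, so that each doubling of duration permits a 3-dB reduction in signal power at constant d′. The growth results only from the length of the template, not from an integrator with a long time constant.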
This correspondence is supported by the model predictions for the experiment of Viemeister and Wakefield, which are shown in the right panel of Fig. 11. Our model predicts a 3-dB decrease in threshold for the pulse pair compared to the threshold for a single pulse. Such a result is expected if the psychometric function for detection of pure tones in noise and in quiet can be described by $d' = m(E/N_0)^k$, where $d'$ is the normalized sensitivity index, $E$ is signal power, $N_0$ is the spectral noise power density, and $k$ and $m$ represent “individual” parameters. For ideal observers, the parameter $k$ has the value 0.5 (cf. Green and Swets, 1966), leading to a 3-dB decrease in threshold for the pulse pair compared to that for a single pulse. According to Egan et al. (1969), the psychometric function for normal hearing subjects can be described more accurately with a value $k \approx 1$ instead of 0.5. Such a value would lead to a 1.5-dB instead of 3-dB lower threshold for the pulse pair than for a single pulse, as was discussed by Viemeister and Wakefield (1991).
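The step from the exponent $k$ to the 3-dB and 1.5-dB values can be made explicit. Assuming, as in standard signal detection theory, that two independent looks combine optimally so that the pair yields $\sqrt{2}$ times the single-look $d'$, holding $d'$ fixed at threshold requires $E$ to drop by a factor $2^{1/(2k)}$. The short sketch below evaluates this; the function name `pulse_pair_advantage_db` and the generalization to `n_looks` are illustrative assumptions.

```python
import math


def pulse_pair_advantage_db(k, n_looks=2):
    """Threshold reduction (dB) for n_looks independent, equal looks.

    Assumes d' = m * (E / N0) ** k for a single look and optimal
    combination of independent looks, d'_n = sqrt(n_looks) * d'_1.
    Keeping d' constant at threshold then requires E to decrease by a
    factor n_looks ** (1 / (2 * k)).
    """
    return 10.0 * math.log10(n_looks) / (2.0 * k)


for k in (0.5, 1.0):
    print(f"k = {k}: {pulse_pair_advantage_db(k):.2f} dB")
```

For $k = 0.5$ the advantage of the pulse pair is about 3 dB and for $k = 1$ about 1.5 dB, in line with the values quoted above.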
Long effective time constants occur both in typical test-tone integration and in the modulation integration examined in the present study. Whereas the decision device is responsible for the shift of the TMTF with changing signal duration, from which an integration time constant can be derived, it is the scaling of the modulation filterbank that determines the form of the TMTF, from which a resolution time constant is commonly derived. As discussed in the section about spectral integration in modulation detection, the threshold is determined mainly by the portion of the modulation power of the broadband noise carrier that is processed by the modulation filters tuned to the signal modulation frequency.
C. Future extensions of the model
Generally, all predicted thresholds shown in this paper lie between 2 and 5 dB lower than the experimental data. This shift indicates that there must be a certain loss of information in the auditory processing of modulation (independent of modulation rate) that is not at present accounted for in the model. Further modeling efforts are required to understand this discrepancy, which cannot be explained by simply increasing the variance of the internal noise, since thresholds are mainly determined by the external statistics of the stimuli.⁷
The present model does not cover conditions which require some processes of across-channel comparison. Such across-critical-band processing of temporally modulated complex stimuli might occur in, for instance, comodulation masking release (CMR) (Hall, 1987), comodulation detection difference (CDD) (McFadden, 1987; Cohen and Schubert, 1987), and modulation detection interference (MDI) (Yost and Sheft, 1989). In these cases the auditory system seems to be looking across frequency channels that contain temporally modulated stimuli. Concerning conditions of MDI, Yost et al. (1989) suggested a “large” modulation filterbank in which information about modulations of comparable rates is combined across the whole frequency range. Such an approach appears interesting. However, to simultaneously ensure the possibility of predicting spectral masking, it appears necessary to process a certain low-frequency part of the modulation spectrum including the dc-component separately within each peripheral channel, because this component represents the energy. Higher rate modulation may be combined within one large modulation filterbank. This, of course, should be tested in further studies in this field.
In order to predict performance for these types of experiments (as well as binaural masking experiments), additional stages would have to be included that calculate, for instance, the correlation between the envelopes of different frequency regions (or the two ears). Such stages have not been systematically evaluated within the framework of this study (for a first result on binaural masking, see Holube et al., 1995a, b). However, using the present model as a preprocessing circuit, it might be easier to find realistic across-channel processing stages that also allow correct prediction for conditions in which across-channel processing is not needed.
V. CONCLUSIONS
The multi-channel modulation filterbank model described in this study can predict a wide variety of experimental conditions, including spectral and temporal integration of modulation detection and modulation masking with broadband carriers.
(1) Spectral integration is accounted for by combining the detection cues from all auditory filters with an optimal decision statistic.
(2) Temporal integration is accounted for by the variable length of the template that forms the basis of the optimal detector incorporated in the model.
(3) The combination of the modulation filterbank concept and the optimal decision algorithm presents a powerful strategy for describing modulation detection phenomena in narrow-band and broadband conditions.
ACKNOWLEDGMENTS
We would like to thank all our colleagues of the Graduiertenkolleg Psychoakustik at the University of Oldenburg for fruitful discussions on the content of this paper. We also thank Brian Moore, Jesko Verhey, Andrew Oxenham, Stefan Münkner, and Ralf Fassel for their comments and suggestions concerning this study and for their critical reading of earlier versions of this paper. The two anonymous reviewers also provided very constructive criticism.
¹The implementation described in this paper represents one of two slightly different modulation filterbank models that were developed and tested in parallel at the universities of Oldenburg and Göttingen. While this paper concentrates on conditions relevant in the context of modulation perception, the paper by Münkner and Kohlrausch (1997) focuses on other measures of temporal processing such as gap detection and forward masking.
²This statement from Maiwald (1967b) is in contrast to some data shown in his thesis (Maiwald, 1966, Fig. 26). This figure shows that modulation-detection thresholds obtained with a noise carrier ranging from 0 to 16 kHz are about 3 dB lower than those for a carrier ranging from 6.5 to 16 kHz [modulation-detection thresholds expressed as 20 log(m)].
³These results of Houtgast are in contrast with data from a later study by Grantham and Bacon (1991) who were not able to replicate Houtgast's band-widening results, despite increasing the bandwidth by two octaves. The authors concluded from their study that the effects observed in modulation-masking experiments may be explained on the basis of some kind of temporal pattern discrimination and not on a critical modulation band filtering process. As the authors state, however, it is possible that different methods of producing the noise modulators and of combining noise and signal may account for the discrepancy between their results and those of Houtgast (1989).
⁴It is assumed that adaptation is responsible for this effect, reflecting the temporal relationship between the carrier onset and the onset of modulation. The carrier onset produces a change in the level of the internal representation that is large relative to the changes produced by the sinusoidal modulation and might therefore interfere with the detection of the modulation, especially at low modulation frequencies (Viemeister, 1979).
⁵Sheft and Yost (1990) found that even when presenting the carrier with a fixed duration that included a 500-ms carrier fringe preceding the onset of modulation, threshold patterns exhibit a bandpass characteristic. The adaptation model of Püschel (1988) would not account for such a long-term adaptation effect.
⁶If frozen-noise carriers were used, simulated thresholds obtained with the present model would be mainly determined by the statistics of the internal noise (added after the nonlinear processing). This would lead to significantly lower thresholds than for running-noise carriers. Threshold differences between both types of carriers observed in our own experimental data appear to depend on carrier bandwidth, with a large effect at very small bandwidths and a decreasing difference with increasing bandwidth. Studies on capabilities and limitations of the present model in conditions with frozen-noise carriers are currently in progress.
⁷The amount of the variance of the internal noise was determined in a simulation on intensity discrimination with deterministic stimuli to satisfy Weber's law. A simple increase of the amount of internal noise necessary to compensate the observed 2–5 dB discrepancy in the present study would lead to a completely unrealistic threshold prediction in the calibration condition.
Berg, B. G. (1996). “On the relation between comodulation masking release and temporal modulation transfer functions,” J. Acoust. Soc. Am. 100, 1013–1024.
Boer, E. de (1985). “Auditory Time Constants: A Paradox?” in Time Resolution in Auditory Systems, edited by A. Michelsen (Springer-Verlag, Berlin), pp. 141–158.
Bos, C. E., and Boer, E. de (1966). “Masking and discrimination,” J. Acoust. Soc. Am. 39, 708–715.
Cohen, M. F., and Schubert, E. T. (1987). “The effect of cross-spectrum correlation on the detectability of a noise band,” J. Acoust. Soc. Am. 81, 721–723.
Dau, T. (1996). “Modeling auditory processing of amplitude modulation,” Doctoral thesis, University of Oldenburg.
Dau, T., Püschel, D., and Kohlrausch, A. (1996a). “A quantitative model of the ‘effective’ signal processing in the auditory system: I. Model structure,” J. Acoust. Soc. Am. 99, 3615–3622.
Dau, T., Kollmeier, B., and Kohlrausch, A. (1997). “Modeling auditory processing of amplitude modulation: I. Detection and masking with narrowband carriers,” J. Acoust. Soc. Am. 102, 2892–2905.
Dau, T., Püschel, D., and Kohlrausch, A. (1996b). “A quantitative model of the ‘effective’ signal processing in the auditory system: II. Simulations and measurements,” J. Acoust. Soc. Am. 99, 3623–3631.
Eddins, D. (1993). "Amplitude modulation detection of narrow-band noise: Effects of absolute bandwidth and frequency region," J. Acoust. Soc. Am. 93, 470-479.
Egan, J. P., Lindner, W. A., and McFadden, D. (1969). "Masking-level differences and the form of the psychometric function," Percept. Psychophys. 6, 209-215.
Formby, C., and Muir, K. (1988). "Modulation and gap detection for broadband and filtered noise signals," J. Acoust. Soc. Am. 84, 545-550.
Forrest, T. G., and Green, D. M. (1987). "Detection of partially filled gaps in noise and the temporal modulation transfer function," J. Acoust. Soc. Am. 82, 1933-1943.
Grantham, D. W., and Bacon, S. P. (1991). "Binaural modulation masking," J. Acoust. Soc. Am. 89, 1340-1349.
Green, D. M. (1960). "Auditory detection of a noise signal," J. Acoust. Soc. Am. 32, 121-131.
Green, D. M., and Swets, J. A. (1966). Signal Detection Theory and Psychophysics (Wiley, New York).
Hall, J. W., III (1987). "Experiments on comodulation masking release," in Auditory Processing of Complex Sounds, edited by W. A. Yost and C. S. Watson (Erlbaum, Hillsdale, NJ).
Holube, I., Colburn, H. S., van de Par, S., and Kohlrausch, A. (1995a). "Model simulations of masked thresholds for tones in dichotic noise maskers," J. Acoust. Soc. Am. 97, 3411-3412.
Holube, I., Colburn, H. S., van de Par, S., and Kohlrausch, A. (1995b). "Simulationen der Mithörschwellen von Testtönen in dichotischen Rauschmaskierern," Fortschritte der Akustik, DAGA '95, pp. 783-786.
Houtgast, T. (1989). "Frequency selectivity in amplitude-modulation detection," J. Acoust. Soc. Am. 85, 1676-1680.
Lawson, J. L., and Uhlenbeck, G. E. (1950). Threshold Signals, Volume 24 of Radiation Laboratory Series (McGraw-Hill, New York).
Maiwald, D. (1966). "Zusammenhang zwischen Mithörschwellen und Modulationsschwellen," Doctoral thesis, Technical University of München.
Maiwald, D. (1967a). "Ein Funktionsschema des Gehörs zur Beschreibung der Erkennbarkeit kleiner Frequenz- und Amplitudenänderungen," Acustica 18, 81-92.
Maiwald, D. (1967b). "Die Berechnung von Modulationsschwellen mit Hilfe eines Funktionsschemas," Acustica 18, 193-207.
McFadden, D. (1987). "Comodulation detection differences using noise-band signals," J. Acoust. Soc. Am. 81, 1519-1527.
Moore, B. C. J., Glasberg, B. R., Plack, C. J., and Biswas, A. K. (1988). "The shape of the ear's temporal window," J. Acoust. Soc. Am. 83, 1102-1116.
Münkner, S., and Kohlrausch, A. (1997). "Simulations of temporal resolution and integration experiments using a modulation filterbank model" (in preparation).
Oxenham, A. J., and Moore, B. C. J. (1994). "Modeling the additivity of nonsimultaneous masking," Hearing Res. 80, 105-118.
Patterson, R. D., Nimmo-Smith, I., Holdsworth, J., and Rice, P. (1987). "An efficient auditory filterbank based on the gammatone function," in Paper presented at a meeting of the IOC Speech Group on Auditory Modelling at RSRE, 14-15 December.
Penner, M. J. (1978). "A power law transformation resulting in a class of short-term integrators that produce time-intensity trades for noise bursts," J. Acoust. Soc. Am. 63, 195-201.
Püschel, D. (1988). "Prinzipien der zeitlichen Analyse beim Hören," Doctoral thesis, University of Göttingen.
Rodenburg, M. (1972). "Sensitivity of the auditory system to differences in intensity," Doctoral thesis, Medical Faculty of Rotterdam.
Rodenburg, M. (1977). "Investigations of temporal effects with amplitude modulated signals," in Psychophysics and Physiology of Hearing, edited by E. F. Evans and J. P. Wilson (Academic, London), pp. 429-437.
Schacknow, P. N., and Raab, D. H. (1976). "Noise-intensity discrimination: Effects of bandwidth conditions and mode of masker presentation," J. Acoust. Soc. Am. 60, 893-905.
Schreiner, C., and Langner, G. (1988). "Periodicity coding in the inferior colliculus of the cat. I. Neuronal mechanism," J. Neurophysiol. 60, 1799-1822.
Sheft, S., and Yost, W. (1990). "Temporal integration in amplitude modulation detection," J. Acoust. Soc. Am. 88, 796-805.
Strickland, E. A., and Viemeister, N. F. (1997). "The effects of frequency region and bandwidth on the temporal modulation transfer function," J. Acoust. Soc. Am. 102, 1799-1810.
van Zanten, G. A. (1980). "Temporal modulation transfer functions for intensity modulated noise bands," in Psychophysical and Behavioral Studies in Hearing, edited by G. van den Brink and F. A. Bilsen (Delft U.P., Delft, The Netherlands), pp. 206-209.
Verhey, J. L., and Dau, T. (1996). "Simulations of spectral masking with a model incorporating an optimal decision strategy," in Psychoacoustics, Speech, and Hearing Aids, edited by B. Kollmeier (World Scientific, Singapore), pp. 29-34.
Viemeister, N. F. (1977). "Temporal factors in audition: a system analysis approach," in Psychophysics and Physiology of Hearing, edited by E. F. Evans and J. P. Wilson (Academic, London), pp. 419-427.
Viemeister, N. F. (1979). "Temporal modulation transfer functions based upon modulation thresholds," J. Acoust. Soc. Am. 66, 1364-1380.
Viemeister, N. F., and Wakefield, G. H. (1991). "Temporal integration and multiple looks," J. Acoust. Soc. Am. 90, 858-865.
Yost, W. A., and Sheft, S. (1989). "Across critical band processing of amplitude modulated tones," J. Acoust. Soc. Am. 85, 848-857.
Yost, W. A., Sheft, S., and Opie, J. (1989). "Modulation interference in detection and discrimination of amplitude modulation," J. Acoust. Soc. Am. 86, 2138-2147.
Zwislocki, J. J., Hellman, R. P., and Verrillo, R. T. (1962). "Threshold of audibility for short pulses," J. Acoust. Soc. Am. 34, 1648-1652.
... The specific tuning characteristics of these AM filters (i.e., their bandwidth) explain masking or interference effects demonstrated repeatedly in the AM domain [3]. The ability to detect and combine these slow and fast AM cues over time (also called 'temporal integration') by human listeners is currently understood as resulting from the limitations introduced by internal variability (also called 'internal noise') and the operation of (late) decision-making mechanisms based on template matching [4]. The relative importance of slow and fast AM for speech perception in adult listeners has been addressed in a wealth of studies. ...
... Importantly, age-related changes were also observed on AM detection consistency. To better understand these developmental trends in AM detection, we developed a series of computational models of human auditory processing based on the modulation-filterbank and template-matching concepts [3], [4], [13], [14], to simulate the AM detection thresholds of children and adults in each condition. Two sets of model parameters were manipulated to distinguish the role of sensory factors (i.e., selectivity of modulation filters) and processing efficiency: 1) the effects of internal-noise sources modelled as additional Gaussian noises at the output of modulation filters, and 2) sub-optimal decision strategies modelled as template matching operating on the noisy output of modulation filters. ...
... In this group of English-speaking children aged from 5 to 11 years of age, only the integration scores in the 32-Hz condition, representing how much children's AM detection thresholds improve from 2 to 8 cycles, were predictive of the identification thresholds in noise obtained in a condition contrasting fricative consonants on the place contrast (adjusted R² = 5%). Temporal integration for AM has been interpreted as reflecting the properties of the correlation operation of the decisional mechanism based on template matching [4,11,15]. Further work is warranted to explain why a relationship was specifically observed for relatively fast (32 Hz) AM cues and place of articulation identification of fricative consonants. ...
Conference Paper
Speech sounds convey relatively slow Amplitude Modulation cues whose processing plays a crucial role for speech comprehension. However, the development of AM processing and its interaction with speech intelligibility remains unclear. Previous studies suggested that AM processing development relates to changes in the central filtering of AM cues or in 'processing efficiency' (i.e., a reduction in internal noise and/or improvements in the optimality of decision making). Here, we explored the contribution of (i) the ability to combine AM cues over time (temporal integration) and (ii) response consistency for AM detection to children's in-noise consonant discrimination. Temporal integration developed until 11 years. Response consistency in AM detection also increased with age. Temporal integration at higher AM rates and AM detection consistency were statistically related to identification thresholds in noise for a subset of the tested consonants. Children's vocabulary was not a better predictor of speech intelligibility compared to the measures of AM processing. Overall, the development of AM processing and its interaction with speech intelligibility may result from changes in (central) processing efficiency for AM.
... The auditory system demonstrates a remarkable sensitivity to AMs, as evidenced by perceptual studies (e.g., Viemeister, 1979) and physiological investigations (see Joris et al., 2004 for a comprehensive review). Similar to spectral masking effects in the audio-frequency domain, masking effects also occur in the AM frequency domain, resulting in reduced sensitivity to a target AM in the presence of a masking AM (Bacon and Grantham, 1989; Houtgast, 1989; Dau et al., 1997a, 1997b; Ewert and Dau, 2000; Sek and Moore, 2003). Specifically, AM masking patterns provide evidence for a frequency-selective process, where the amount of AM masking decreases as the spectral distance between the masker and the target increases (Bacon and Grantham, 1989; Houtgast, 1989; Conroy et al., 2023). ...
... conditions (Dau et al., 1997a, 1997b; Verhey et al., 1999; Ewert and Dau, 2000; Relaño-Iborra and Dau, 2022). Furthermore, the modulation filterbank is conceptually consistent with the temporal dimension of a 'two-dimensional' spectro-temporal modulation filterbank, inspired by neural responses to spectro-temporally varying stimuli in the auditory cortex of ferrets (Kowalski et al., 1996; Depireux et al., 2001) and supported by data from perceptual learning and masking conditions ... Overall, the versatility of the modulation filterbank model suggests that AM frequency selectivity is an essential auditory processing feature for quantitatively predicting perceptual data obtained with dynamically varying sounds. ...
Preprint
The processing and perception of amplitude modulations (AMs) in the auditory system reflect a frequency-selective process, often described as a modulation filterbank. Previous studies on perceptual AM masking reported similar results for older listeners with hearing impairment (HI) and young listeners with normal hearing (NH), suggesting no effects of age nor hearing loss on AM frequency selectivity. However, recent evidence has shown that age, independently of hearing loss, is detrimental to AM frequency selectivity. Hence, the present study aimed to disentangle the effects of hearing loss and age. A simultaneous AM masking paradigm was employed, utilizing a sinusoidal carrier at 2.8 kHz, narrow-band noise modulation maskers, and target modulation frequencies of 4, 16, 64, and 128 Hz. The results obtained from older (n=10, 63-77 years) and young HI listeners (n=3, 24-30 years) were compared to data from young and older NH listeners. Notably, the HI listeners generally exhibited lower (unmasked) AM detection thresholds and greater AM frequency selectivity compared to their NH counterparts of similar age. These findings suggest that age negatively affects AM frequency selectivity in both NH and HI listeners, while hearing loss improves AM detection and AM selectivity, likely due to the loss of peripheral compression.
... These models encompass cochlear mechanics, inner hair cells (IHC), and auditory nerve and brainstem signal processing. Dau et al. [25], for instance, proposed an auditory perception model to emulate signal processing in the human auditory system. In this model, temporal modulation cues are obtained using auditory filtering of the speech signal and modulation filtering of the temporal amplitude envelope in a cascade manner. ...
Article
Dimensional emotion can better describe rich and fine-grained emotional states than categorical emotion. In the realm of human–robot interaction, the ability to continuously recognize dimensional emotions from speech empowers robots to capture the temporal dynamics of a speaker’s emotional state and adjust their interaction strategies in real-time. In this study, we present an approach to enhance dimensional emotion recognition through modulation-filtered cochleagram and parallel attention recurrent neural network (PA-net). Firstly, the multi-resolution modulation-filtered cochleagram is derived from speech signals through auditory signal processing. Subsequently, the PA-net is employed to establish multi-temporal dependencies from diverse scales of features, enabling the tracking of the dynamic variations in dimensional emotion within auditory modulation sequences. The results obtained from experiments conducted on the RECOLA dataset demonstrate that, at the feature level, the modulation-filtered cochleagram surpasses other assessed features in its efficacy to forecast valence and arousal. Particularly noteworthy is its pronounced superiority in scenarios characterized by a high signal-to-noise ratio. At the model level, the PA-net attains the highest predictive performance for both valence and arousal, clearly outperforming alternative regression models. Furthermore, the experiments carried out on the SEWA dataset demonstrate the substantial enhancements brought about by the proposed method in valence and arousal prediction. These results collectively highlight the potency and effectiveness of our approach in advancing the field of dimensional speech emotion recognition.
... Another important limitation of the critical-band model is the inability to explain masking in the modulation domain, which is masking of amplitude-modulated signals by envelope fluctuations of similar frequency that occur in the noise background (Bacon and Grantham, 1989;Houtgast, 1989). Human listeners show band-pass modulation masking patterns consistent with a modulation filterbank processing strategy (Hose et al., 1987;Dau et al., 1997a;Dau et al., 1997b;Ewert et al., 2002) thought to occur in the central nervous system. Similar "modulation tuning" of neural responses has also been observed at the forebrain level in the mynah bird (Hose et al., 1987). ...
Article
Anthropogenic noise and its impact on wildlife has recently received considerable attention. Research interest began to increase at the turn of the century and the number of publications investigating the effects of anthropogenic noise has been growing steadily ever since. Songbirds have been a major focus in the study of anthropogenic noise effects, with a significant portion of the literature focusing on the changes in singing behavior in noise. Many of these studies have found increases in the amplitude or frequency of song, or changes in the temporal patterning of song production, putatively due to the masking effects of noise. Implicit in the masking hypothesis is the assumption that all species process sounds in noise similarly and will therefore be subject to similar masking effects. However, the emerging comparative literature on auditory processing in birds suggests that there may be significant differences in how different species process sound, both in quiet and in noise. In this paper we will (1) briefly review the literature on anthropogenic noise and birds, (2) provide a mechanistic overview of how noise impacts auditory processing, (3) review what is known about the comparative avian auditory processing in noise, and (4) discuss the implications of species level differences in auditory processing for behavioral and physiological responses to anthropogenic noise.
... Altogether, Figure 4 illustrates the algorithm of the simulation model. It follows the structure of auditory models proposed by Dau and his colleagues [30][31][32][33]. Since not every change in vowel formant frequency can be perceptually detected, a change in the auditory metric has to be greater than or equal to a decision statistic to detect a formant frequency shift. ...
Article
As formant frequencies of vowel sounds are critical acoustic cues for vowel perception, human listeners need to be sensitive to formant frequency change. Numerous studies have found that formant frequency discrimination is affected by many factors like formant frequency, speech level, and fundamental frequency. Theoretically, to perceive a formant frequency change, human listeners with normal hearing may need a relatively constant change in the excitation and loudness pattern, and this internal change in auditory processing is independent of vowel category. Thus, the present study examined whether such metrics could explain the effects of formant frequency and speech level on formant frequency discrimination thresholds. Moreover, a simulation model based on the auditory excitation-pattern and loudness-pattern models was developed to simulate the auditory processing of vowel signals and predict thresholds of vowel formant discrimination. The results showed that predicted thresholds based on auditory metrics incorporating auditory excitation or loudness patterns near the target formant showed high correlations and low root-mean-square errors with human behavioral thresholds in terms of the effects of formant frequency and speech level. In addition, the simulation model, which particularly simulates the spectral processing of acoustic signals in the human auditory system, may be used to evaluate the auditory perception of speech signals for listeners with hearing impairments and/or different language backgrounds.
Article
The processing and perception of amplitude modulation (AM) in the auditory system reflect a frequency-selective process, often described as a modulation filterbank. Previous studies on perceptual AM masking reported similar results for older listeners with hearing impairment (HI listeners) and young listeners with normal hearing (NH listeners), suggesting no effects of age or hearing loss on AM frequency selectivity. However, recent evidence has shown that age, independently of hearing loss, adversely affects AM frequency selectivity. Hence, this study aimed to disentangle the effects of hearing loss and age. A simultaneous AM masking paradigm was employed, using a sinusoidal carrier at 2.8 kHz, narrowband noise modulation maskers, and target modulation frequencies of 4, 16, 64, and 128 Hz. The results obtained from young (n = 3, 24–30 years of age) and older (n = 10, 63–77 years of age) HI listeners were compared to previously obtained data from young and older NH listeners. Notably, the HI listeners generally exhibited lower (unmasked) AM detection thresholds and greater AM frequency selectivity than their NH counterparts in both age groups. Overall, the results suggest that age negatively affects AM frequency selectivity for both NH and HI listeners, whereas hearing loss improves AM detection and AM selectivity, likely due to the loss of peripheral compression. Copyright (2024) Author(s). This article is distributed under a Creative Commons Attribution (CC BY) License.
Article
Auditory detection of the Amplitude Modulation (AM) of sounds, crucial for speech perception, improves until 10 years of age. This protracted development may not only be explained by sensory maturation, but also by improvements in processing efficiency: the ability to make efficient use of available sensory information. This hypothesis was tested behaviorally on 86 6-to-9-year-olds and 15 adults using AM-detection tasks assessing absolute sensitivity, masking, and response consistency in the AM domain. Absolute sensitivity was estimated by the detection thresholds of a sinusoidal AM applied to a pure-tone carrier; AM masking was estimated as the elevation of AM-detection thresholds produced when replacing the pure-tone carrier by a narrowband noise; response consistency was estimated using a double-pass paradigm where the same set of stimuli was presented twice. Results showed that AM sensitivity improved from childhood to adulthood, but did not change between 6 and 9 years. AM masking did not change with age, suggesting that the selectivity of perceptual AM filters was adult-like by 6 years. However, response consistency increased developmentally, supporting the hypothesis of reduced processing efficiency in early childhood. At the group level, double-pass data of children and adults were well simulated by a model of the human auditory system assuming a higher level of internal noise for children. At the individual level, for both children and adults, double-pass data were better simulated when assuming a suboptimal decision strategy in addition to differences in internal noise. In conclusion, processing efficiency for AM detection is reduced in childhood. Moreover, worse AM detection was linked to both systematic and stochastic inefficiencies, in both children and adults.
Article
The neural mechanisms underlying the exogenous coding and neural entrainment to repetitive auditory stimuli have seen a recent surge of interest. However, few studies have characterized how parametric changes in stimulus presentation alter entrained responses. We examined the degree to which the brain entrains to repeated speech (i.e., /ba/) and nonspeech (i.e., click) sounds using phase-locking value (PLV) analysis applied to multichannel human electroencephalogram (EEG) data. Passive cortico-acoustic tracking was investigated in N = 24 normal young adults utilizing EEG source analyses that isolated neural activity stemming from both auditory temporal cortices. We parametrically manipulated the rate and periodicity of repetitive, continuous speech and click stimuli to investigate how speed and jitter in ongoing sound streams affect oscillatory entrainment. Neuronal synchronization to speech was enhanced at 4.5 Hz (the putative universal rate of speech) and showed a differential pattern to that of clicks, particularly at higher rates. PLV to speech decreased with increasing jitter but remained superior to clicks. Surprisingly, PLV entrainment to clicks was invariant to periodicity manipulations. Our findings provide evidence that the brain's neural entrainment to complex sounds is enhanced and more sensitized when processing speech-like stimuli, even at the syllable level, relative to nonspeech sounds. The fact that this specialization is apparent even under passive listening suggests a priority of the auditory system for synchronizing to behaviorally relevant signals.
Article
This study investigated word recognition for sentences temporally filtered within and across acoustic–phonetic segments providing primarily vocalic or consonantal cues. Amplitude modulation was filtered at syllabic (0–8 Hz) or slow phonemic (8–16 Hz) rates. Sentence-level modulation properties were also varied by amplifying or attenuating segments. Participants were older adults with normal or impaired hearing. Older adult speech recognition was compared to groups of younger normal-hearing adults who heard speech unmodified or spectrally shaped with and without threshold matching noise that matched audibility to hearing-impaired thresholds. Participants also completed cognitive and speech recognition measures. Overall, results confirm the primary contribution of syllabic speech modulations to recognition and demonstrate the importance of these modulations across vowel and consonant segments. Group differences demonstrated a hearing loss–related impairment in processing modulation-filtered speech, particularly at 8–16 Hz. This impairment could not be fully explained by age or poorer audibility. Principal components analysis identified a single factor score that summarized speech recognition across modulation-filtered conditions; analysis of individual differences explained 81% of the variance in this summary factor among the older adults with hearing loss. These results suggest that a combination of cognitive abilities and speech glimpsing abilities contribute to speech recognition in this group.
Chapter
Three types of experiments on temporal discrimination are considered. The first is threshold detection of tone bursts with the duration of the burst as the independent variable. Results can be described by a model incorporating a “leaky integrator” which operates on the intensity of the signal. The appropriate time constant is 200 msec. Next comes the detection of amplitude modulation. The threshold found as a function of modulation frequency (for low modulation frequencies) can again be described by a “leaky integrator” model, this time equipped with a time constant of 20 msec. It is argued why these two models are incompatible: the auditory system appears to be able to detect modulations far easier than the 200 msec time constant model would allow. More directly aimed at temporal phenomena are experiments on forward masking and the detection of gaps in a continuous signal. Interpretation of the findings is rather difficult because there is a peculiar nonlinearity in the temporal persistence of masking and because the “ringing” of the auditory filters is difficult to separate from the persistence of masking. It is argued why, on the basis of experimental results, one should come to the conclusion that the auditory system is capable of detecting a 2–4 dB decrement of intensity in approx. 10 ms. This is compatible with what we can infer about the ringing of the filters. However, it is incompatible with the 200 msec time constant model of temporal integration: the auditory system is far less capable of detecting increments than decrements. The conclusion is that each type of experiment leads to a model describing only the results of that experiment. The three models described should be regarded as “ad hoc” models because they cannot be united into one model.
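The gap between the two time constants mentioned in this abstract can be made concrete with a short calculation. The sketch below assumes that the "leaky integrator" is a first-order low-pass filter, which the abstract does not state explicitly; under that assumption the -3 dB cutoff is 1/(2πτ). The function names `cutoff_hz` and `attenuation_db` are illustrative only.

```python
import math


def cutoff_hz(tau_s: float) -> float:
    """-3 dB cutoff (Hz) of a first-order leaky integrator with time constant tau_s (s)."""
    return 1.0 / (2.0 * math.pi * tau_s)


def attenuation_db(fm_hz: float, tau_s: float) -> float:
    """Attenuation in dB (>= 0) applied to a modulation at fm_hz.

    Computed as -20*log10(|H|) with |H| = 1/sqrt(1 + (2*pi*fm*tau)^2),
    the magnitude response of a first-order low-pass (an assumed filter shape).
    """
    return 10.0 * math.log10(1.0 + (2.0 * math.pi * fm_hz * tau_s) ** 2)


if __name__ == "__main__":
    # The two time constants quoted in the abstract: 200 ms and 20 ms.
    for tau in (0.200, 0.020):
        print(
            f"tau = {tau * 1000:.0f} ms: cutoff = {cutoff_hz(tau):.2f} Hz, "
            f"attenuation at 16 Hz = {attenuation_db(16.0, tau):.1f} dB"
        )
```

Under this assumption the 200-ms integrator has a cutoff below 1 Hz and the 20-ms integrator a cutoff near 8 Hz, so a single first-order stage cannot reproduce both data sets, in line with the incompatibility argued in the abstract.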
Article
This article examines the idea that the temporal resolution of the auditory system can be modeled using a temporal window (an intensity weighting function) analogous to the auditory filter measured in the frequency domain. To estimate the shape of the hypothetical temporal window, threshold was measured for a brief sinusoidal signal presented in a temporal gap between two bursts of noise. The duration of the gap was systematically varied and the signal was placed both symmetrically and asymmetrically within the gap. The data were analyzed by assuming that the temporal window had the form of a simple mathematical expression with a small number of free parameters. The values of the parameters were adjusted to give the best fit to the data. The analysis assumed that, for each condition, the temporal window was centered at the time giving the highest signal-to-masker ratio, and that threshold corresponded to a fixed ratio of signal energy to masker energy at the output of the window. The data were fitted well by modeling each side of the window as the sum of two rounded-exponential functions. The window was highly asymmetric, having a shallower slope for times before the center than for times after. The equivalent rectangular duration (ERD) of the window was typically about 8 ms. The ERD increased slightly when the masker level was decreased, but did not differ significantly for signal frequencies of 500 and 2000 Hz. The temporal-window model successfully accounts for the data from a variety of experiments measuring temporal resolution. However, it fails to predict certain aspects of forward masking and of the detection of amplitude modulation at high rates.
Article
Thresholds and psychometric functions for the detection of amplitude modulation were measured as a function of modulation frequency under several stimulus conditions. The first experiment investigated the relative importance of stimulus bandwidth and frequency region for amplitude-modulation detection. The stimulus bandwidth was either 200, 400, 800, or 1600 Hz. The frequency region was varied by adjusting the high-frequency cutoff of the noise to be either 600, 2200, or 4400 Hz. Temporal modulation transfer functions demonstrated the typical low-pass filter characteristic, with sensitivity to modulation decreasing with increasing modulation frequency. Time constants associated with the transfer functions were derived from low-pass filter functions fitted to the data. The time constants varied inversely with noise bandwidth (≤ 1600 Hz) and were independent of frequency region. These results are consistent with estimates of temporal acuity based on previous studies of gap detection for narrow-band noise as well as estimates of temporal acuity using deterministic stimuli. In a second experiment, psychometric functions, plotted with modulation depth in dB, demonstrated somewhat steeper slopes as modulation frequency increased. The estimated slope values did not vary greatly with frequency region or noise bandwidth.
Article
An important property of the auditory system is the way this system processes changes in stimulus intensity. Although this problem has been discussed many times since Weber, it is rather surprising that no generally accepted hypothesis about the underlying mechanism has emerged. In the present study the sensitivity of the ear to intensity differences of noise signals is studied. In general two methods of investigation are used for the study of perception: psychophysics and electrophysiology. Fechner founded psychophysics in 1860 (Boring, 1950, 1966) as a study of the relation between stimulus and sensation. In psychophysical experiments a stimulus is presented, and an observer makes some response (either by saying or by pressing a button). The system to be studied contains the peripheral sense-organ as well as the neural pathways that process the information elicited in the peripheral organ. Considering the stimulus as the input of the system and the response of the observer as the output, it is possible to measure input-output relations by varying the parameters of the input stimulus. The investigator makes assumptions about how the system functions, and these assumptions have to account for the input-output relations. All the assumptions together constitute a model. Such a model of the system can suggest new experiments, and the results of such experiments lead to refinement or rejection of the model. Psychophysical experiments can never prove, however, that the system works like the model.