Research Report
Crossmodal interactions and multisensory integration in the
perception of audio-visual motion – A free-field study
Kristina Schmiedchen
, Claudia Freigang, Ines Nitsche, Rudolf Rübsamen
Faculty of Biosciences, Pharmacy and Psychology, University of Leipzig, Talstrasse 33, 04103 Leipzig, Germany
Corresponding author: K. Schmiedchen, k.schmiedchen@uni-leipzig.de, Fax: +49 341 9736848
Brain Research 1466 (2012) 99–111, doi:10.1016/j.brainres.2012.05.015
ARTICLE INFO

Article history:
Accepted 6 May 2012
Available online 14 May 2012

Keywords:
Audio-visual
Motion perception
Dynamic capture
Perceived unity
Trajectory
Attention

ABSTRACT

Motion perception can be altered by information received through multiple senses. So far, the interplay between the visual and the auditory modality in peripheral motion perception is scarcely described. The present free-field study investigated audio-visual motion perception for different azimuthal trajectories in space. To disentangle effects related to crossmodal interactions (the influence of one modality on signal processing in another modality) and multisensory integration (binding of bimodal streams), we manipulated the subjects' attention in two experiments on a single set of moving audio-visual stimuli. Acoustic and visual signals were either congruent or spatially and temporally disparate at motion offset. (i) Crossmodal interactions were studied in a selective attention task. Subjects were instructed to attend to either the acoustic or the visual stream and to indicate the perceived final position of motion. (ii) Multisensory integration was studied in a divided attention task in which subjects were asked to report whether they perceived unified or separated audio-visual motion offsets. The results indicate that crossmodal interactions in motion perception do not depend on the integration of the audio-visual stream. Furthermore, in the crossmodal task, both visual and auditory motion perception were susceptible to modulation by irrelevant streams, provided that temporal disparities did not exceed a critical range. Concurrent visual streams modulated auditory motion perception in the central field, whereas concurrent acoustic streams attracted visual motion information in the periphery. Differential abilities between the visual and auditory system when attempting to accurately track positional information along different trajectories account for the observed biasing effects.

© 2012 Elsevier B.V. All rights reserved.
1. Introduction
Objects that surround us usually stimulate more than a single
sense. Various streams of information that belong to the same
object are integrated in the brain, while simultaneously those
streams that belong to different objects are separated. Over
the past decades, a wide range of studies dealing with the
perception of stationary events has led to a substantial
improvement of our understanding of how multisensory
inputs are combined into meaningful percepts (Calvert et al.,
2001; Ernst and Bülthoff, 2004; McGurk and MacDonald, 1976;
Senkowski et al., 2007; Teder-Sälejärvi et al., 2005; Werner and
Noppeney, 2011). However, the specifics of the interplay, i.e.
the mutual influence between different senses in the percep-
tion of motion, are described in much less detail.
Motion perception requires the dynamic integration of spatial
changes over time. In vision, a moving light pattern consecu-
tively activates adjacent locations on the retina, which in turn
have to be associated by motion detectors in the visual cortex
(Albright and Stoner, 1995). The auditory system, in contrast, has
to track dynamically changing interaural time differences (ITDs)
and/or interaural intensity differences (IIDs) in order to infer
acoustic motion (Middlebrooks and Green, 1991). As such,
dynamic location coding achieved by a direct, retinotopic based
representation in the visual system and an indirect, recon-
structed representation in the auditory system fundamentally
differs between both modalities (Wilson and O'Neill, 1998).
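As an illustration of the auditory location cue mentioned above (not part of the original study), the azimuth dependence of the ITD can be approximated with Woodworth's spherical-head formula; the head radius and speed of sound in the sketch below are assumed values.

```python
# Illustrative sketch, not taken from the study: Woodworth's spherical-head
# approximation of the interaural time difference (ITD) as a function of
# source azimuth. The head radius and speed of sound are assumed values.
import numpy as np

def woodworth_itd(azimuth_deg, head_radius_m=0.0875, speed_of_sound_m_s=343.0):
    """Approximate far-field ITD in seconds for a source at the given azimuth."""
    theta = np.radians(azimuth_deg)
    return (head_radius_m / speed_of_sound_m_s) * (theta + np.sin(theta))

# A source moving from the midline to 38 deg sweeps roughly 0 to ~0.33 ms of ITD.
for azimuth in (0, 10, 20, 38):
    print(f"{azimuth:>3} deg -> ITD = {woodworth_itd(azimuth) * 1e3:.3f} ms")
```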
A wide range of studies making use of various methodologies
such as single-cell recordings in animals, human behavior, EEG
and fMRI recordings or statistic modeling, all seem to indicate
that the combination of multisensory inputs requires the
analysis of spatial and/or temporal features between different
streams of sensory information. Asynchronous inputs are linked
as long as they fall within a defined temporal window (Colonius
and Diederich, 2010; Lewald and Guski, 2003; Meredith et al.,
1987; Navarra et al., 2005; Slutsky and Recanzone, 2001).
Similarly, spatial disparities between acoustic and visual inputs
are tolerated to a certain degree in static events (Bertelson and
Radeau, 1981; Hairston et al., 2003; Thurlow and Jack, 1973; Welch
and Warren, 1980) and moving events (Alink et al., 2008; Meyer
and Wuerger, 2001; Soto-Faraco et al., 2002; Stekelenburg and
Vroomen, 2009). Other studies have highlighted the crucial role
of the co-analysis of spatial and temporal features in multisen-
sory perception (Bolognini et al., 2005; Lewald and Guski, 2003;
Meyer et al., 2005; Recanzone, 2009; Soto-Faraco et al., 2004; Stein
and Meredith, 1993).
Importantly, multisensory integration refers to the binding
of stimuli perceived through multiple senses, whereas cross-
modal interactions describe the direct influence of one modality
on signal processing in another modality without necessarily
integrating information (Spence et al., 2009). In particular, the
term "crossmodal localization bias" is used to describe the
displacement of the perceived location of a signal in the
attended modality towards the location of a concurrent, but
ignored signal in another modality (Bertelson et al., 2000;
Hairston et al., 2003).
A well described crossmodal interaction phenomenon is the
ventriloquist illusion. Ventriloquism refers to a bias of per-
ceived information towards the information of a competing
sense that has either a more precise spatial or temporal
resolution. The first studies on ventriloquism in motion
perception supported the notion of a visual dominance (Allen
and Kolers, 1981; Mateeff et al., 1985), as the visual system was
found to provide more salient cues in perceiving motion than
the auditory or the somatosensory system. Crossmodal dy-
namic capture of auditory motion was also demonstrated by
more recent studies (Kitajima and Yamashita, 1999; Sanabria et
al., 2007b; Soto-Faraco et al., 2002; Strybel and Vatakis, 2004),
where an alteration of the perceived direction of auditory
motion towards that of the visual signal was attributed to a
"capture" by the visual system, which has superior spatial
resolution in the central field. In contrast, laterally moving
sounds were shown to induce motion perception of static visual
flashes (Hidaka et al., 2011), which demonstrates that the
auditory system can bias the visual system in the spatial
domain in instances where the visual resolution is inferior to
the auditory resolution. This finding is consistent with previous
studies employing stationary events (Alais and Burr, 2004;
Hairston et al., 2003). The unrestricted dominance of the visual
modality in motion perception was further challenged by
additional reports of modulatory effects of a moving sound on
the perceived direction of visual motion (Alink et al., 2012;
Brooks et al., 2007; Jain et al., 2008) or in resolving ambiguous
visual motion displays (Sanabria et al., 2007a). Also, the timing
of a stationary sound has been shown to affect the detection or
the perceived direction of moving visual signals (Freeman and
Driver, 2008; Getzmann, 2007; Kafaligonul and Stoner, 2010).
Taken together, the results of these studies suggest context-
dependent crossmodal interactions between the visual and the
auditory modality in motion perception and support the
hypothesis that perception is dominated by the modality
which provides the most reliable information (Battaglia et al.,
2003; Burr and Alais, 2006; Ernst and Banks, 2002).
Unlike crossmodal interactions, multisensory integration
necessarily implicates the binding of various streams of
information and yields a coherent percept. Ernst and
Bülthoff (2004) pointed out that the integration of multisen-
sory inputs reduces the variance between the individual
sensory estimates and leads to enhanced reliability of the
combined percept. Integration effects have been observed at
very distinct processing stages, from low-level structures
such as the superior colliculus (Meredith and Stein, 1983;
Meredith et al., 1987) to higher-order cortical areas such as
the superior temporal sulcus (Calvert et al., 2001; Werner and
Noppeney, 2010). Consequently, Meyer et al. (2011) suggested
that these different stages are responsible for the integration
of various contents of bimodal stimuli. That is, sensory
integration (e.g. spatial and temporal features) is likely
reflected in low-level structures, whereas semantic integra-
tion (e.g. linguistic contents) is assigned to higher-order
structures. Psychophysical studies investigating low-level
integration of motion features found improved speed estima-
tion (Wuerger et al., 2003) and an increase in the detection
rate (Meyer et al., 2005) in the bimodal condition. An fMRI
study by Lewis and Noppeney (2010) on higher-level integra-
tion revealed that audio-visual synchrony facilitates motion
discrimination. Furthermore, both behavioral and neurophys-
iological studies have shown that the integration of two or
more streams of information into a unified percept is optimal
for spatially and temporally aligned features (Stein and
Meredith, 1993; Wallace et al., 2004). In conflicting audio-
visual situations, the integration or separation of several
streams depends on the magnitude of the spatial and/or
temporal disparities, as previously suggested by the findings
of behavioral studies employing stationary events (Bertelson
and Radeau, 1981; Lewald and Guski, 2003; Slutsky and
Recanzone, 2001; Wallace et al., 2004).
A limitation of most of the previous studies is that moving
stimuli were mainly presented in the central visual field. To date,
interactions between the visual and the auditory system in
perceiving motion in the periphery, where the interplay between the two senses might differ, have been seldom investigated.
The present free-field study aimed at better understanding
how crossmodal interactions and multisensory integration
contribute to audio-visual motion perception in central and
peripheral space. To this end, a single set of audio-visual
stimuli was employed in two different experimental tasks. The
subjects' attention was either (i) selectively directed to a single
modality or (ii) simultaneously directed to both modalities.
The first experiment of this study aimed at elucidating the role
of crossmodal interactions in the perception of motion in
space. Moving audio-visual stimuli and moving unimodal
stimuli (acoustic or visual), traveling along different prede-
fined trajectories at a stimulus duration of either 0.5 s or 2.0 s,
were randomly presented in the free-field (Fig. 1A). Audio-
visual stimuli were either congruent or spatially and tempo-
rally disparate at motion offset (Figs. 1B and C). In a blocked
design, subjects were instructed to selectively attend to either
the acoustic or the visual stream and to localize the final
position of the attended stimulus. At central motion offset
locations, we expected a shift of the perceived final position of
acoustic trajectories towards the displaced final position of
visual trajectories. However, as positional accuracy in visual motion perception, and thus the reliability of visual signals, decreases towards the peripheral field (McKee and Nakayama, 1984), we hypothesized that auditory motion perception
would be less influenced by concurrent visual information at
lateral motion offset locations.
In a second experiment, we studied multisensory integra-
tion of motion stimuli. Using the same audio-visual stimuli as
in the first experiment, subjects were instructed to simulta-
neously attend to both streams and to judge after each trial
whether the visual and the acoustic motion offsets were
congruent or incongruent. We hypothesized that increasing
spatio-temporal disparities between the visual and the
acoustic motion offset would lead to increased rates of
separation of both streams. Furthermore, we predicted that
the rate of perceived unity in conflicting trials would be reduced at
central motion offset locations due to enhanced stimulus
reliability in both modalities.
2. Results
2.1. Experiment 1: audio-visual localization task
Visual and auditory localization performances in the selective
attention task were contrasted with each other to evaluate the influence of concurrent visual streams on auditory motion perception and vice versa.

Fig. 1 – Experimental setup and stimulus conditions. (A) Array of 47 loudspeakers and 188 LEDs (dots in front of the loudspeaker symbols) mounted in an azimuthal, semicircular arrangement at a distance of 2.35 m from the subject's head. Locations in the left and right hemifield are denoted by negative and positive signs, respectively. Arrows indicate the trajectories of motion stimuli, which move towards the target positions from either side. Motion offset locations in the constant modality are shown by blue loudspeaker symbols and yellow LED symbols. Motion offset locations of the concurrent signal in the second modality (not indicated) during bimodal stimulation varied with respect to a predefined spatio-temporal disparity (panels B and C). (B) Uppermost row: unimodal stimuli (left: acoustic only; right: visual only) traveling an angular range of 38°. Left side: acoustic constant; right side: visual constant. In the audio-visual localization task (experiment 1) two different attentional conditions were tested. The attended signal traveled a constant angular range (38°), while the unattended signal, which concurrently started from the same position and traveled with the same angular speed, stopped before, simultaneously with, or after motion offset of the signal in the attended modality. Spatial disparities between the attended and unattended signals were systematically varied between −15° and +15° in steps of 5°. Note that spatial disparities at motion offset co-varied with temporal disparities (panel C). The same set of stimuli, with the exception of the unimodal stimulus conditions, was also employed in the task on detecting spatio-temporal disparities (experiment 2). (C) Relationship of spatial and temporal disparities at motion offset between signals in the constant and the variable modality. The employed spatial disparities were identical for 0.5 s and 2.0 s signals, but temporal disparities co-varied both with spatial disparity and signal duration.

In Fig. 2, localization performance at
different motion offset locations in space is plotted as a
function of spatial disparity between the attended and
unattended modality. For reasons of clarity, the respective
temporal disparities are not indicated in the following figures.
Note, however, that temporal offset disparities varied both
as a function of spatial disparity and overall signal duration
(Fig. 1C, Table 1). Mean values are displayed for normalized
data, i.e. relative to the respective unimodal localization
performance. For the statistical comparison between visual
and auditory localization performance, data of the two
stimulus durations (0.5 s, 2.0 s) were submitted to separate
four-way repeated measures analyses of variance (ANOVA), including the within-subject factors (i) attended modality ("attend auditory", "attend visual"), (ii) spatial location of motion offset (8°, 38°, 60°), (iii) motion direction (towards the periphery, towards the midline) and (iv) spatio-temporal disparity (for 0.5 s signal duration: −15° (−195 ms), −10° (−130 ms), −5° (−65 ms), 0° (0 ms), +5° (+65 ms), +10° (+130 ms), +15° (+195 ms); for 2.0 s signal duration: −15° (−780 ms), −10° (−520 ms), −5° (−260 ms), 0° (0 ms), +5° (+260 ms), +10° (+520 ms), +15° (+780 ms)).
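The authors' analysis scripts are not reproduced in the paper; the following minimal Python sketch (with assumed column names and random placeholder data) merely illustrates how a four-way repeated-measures ANOVA of this design could be set up, here using statsmodels.

```python
# Illustrative sketch only (not the authors' analysis code): a four-way
# repeated-measures ANOVA on the normalized localization bias, mirroring the
# design described above. Column names and the random placeholder data are
# assumptions made for this example.
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)

subjects = range(1, 27)                              # 26 subjects
modalities = ["attend auditory", "attend visual"]
locations = [8, 38, 60]                              # central, paracentral, lateral (deg)
directions = ["towards periphery", "towards midline"]
disparities = [-15, -10, -5, 0, 5, 10, 15]           # spatial disparity (deg)

rows = []
for s in subjects:
    for m in modalities:
        for loc in locations:
            for d in directions:
                for disp in disparities:
                    rows.append({
                        "subject": s, "modality": m, "location": loc,
                        "direction": d, "disparity": disp,
                        # placeholder for the measured, normalized bias (deg)
                        "bias": 0.2 * disp + rng.normal(scale=2.0),
                    })
data = pd.DataFrame(rows)

anova = AnovaRM(data, depvar="bias", subject="subject",
                within=["modality", "location", "direction", "disparity"],
                aggregate_func="mean").fit()
print(anova.anova_table)
```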
2.1.1. Signal duration 0.5 s
Localization performance in both the "attend auditory" and the "attend visual" conditions was affected by concurrent motion streams in the respective unattended modality (Fig. 2A). Auditory motion perception mainly varied as a function of the displaced visual stimulus at central motion offset locations. The mean values indicate that the magnitude of localization bias towards the respective final position of visual trajectories increased with increasing spatio-temporal disparities. This trend was confirmed by a highly significant main effect of spatio-temporal disparity (F(6,150) = 25.55; p < 0.001, η² = 0.505). The localization biases were noticeably reduced, but still significant, when trajectories with the same disparities between the visual and the acoustic signal terminated at paracentral (F(6,150) = 9.63; p < 0.001, η² = 0.278) and lateral motion offset locations (F(6,150) = 2.51; p = 0.044, η² = 0.091).
Opposite effects were observed for the "attend visual" condition. Concurrent acoustic streams had the strongest influence on visual motion perception at lateral motion offset locations (F(6,150) = 53.12; p < 0.001, η² = 0.680). That is, the perceived final positions of visual trajectories were shifted towards the respective final positions of acoustic trajectories. This biasing effect was reduced, though still significant, at paracentral (F(6,150) = 29.58; p < 0.001, η² = 0.542) and central motion offset locations (F(6,150) = 7.76; p < 0.001, η² = 0.237).

Fig. 2 – Mean localization bias in the attended modality. Blue symbols: attend auditory; yellow symbols: attend visual. Localization performance is given for central, paracentral and lateral motion offset locations (data are collapsed across both hemifields). The respective trajectories are indicated on top of the graph. Red circles denote motion offset locations in the attended modality. Normalized data are plotted as a function of spatial disparity between the acoustic and visual motion offset. Note that temporal disparities are not indicated in the graph. The dotted grey lines refer to the performance in localizing the respective unimodal control stimulus. Mean values deviating from zero indicate a localization bias towards the non-attended modality. Lower and upper error bars represent the 25th and 75th percentiles, respectively. (A) 0.5 s signal duration in the attended modality. (B) 2.0 s signal duration in the attended modality.
The comparison of visual and auditory localization performance in a four-way repeated measures ANOVA revealed a main effect of spatio-temporal disparity (F(6,150) = 82.99; p < 0.001, η² = 0.768). Localization biases in the attended modality were correlated with spatio-temporal disparities, i.e. the largest negative shifts were observed for disparities of −15° (−195 ms), whereas the largest positive shifts occurred for disparities of +15° (+195 ms). The overall biasing effect of concurrent acoustic streams on visual motion perception was larger than the reverse effect for disparities of −10° (−130 ms), +10° (+130 ms) and +15° (+195 ms), which is confirmed by an interaction of attended modality and spatio-temporal disparity (F(6,150) = 2.70; p = 0.038, η² = 0.097). The inverted pattern of mutual influences between modalities at central and lateral motion offset locations is reflected in the significant interaction of attended modality, spatial location of motion offset and spatio-temporal disparity (F(12,300) = 8.31; p < 0.001, η² = 0.249). When subjects indicated the final position of moving acoustic signals that terminated at central locations, the magnitude of localization bias varied as a function of the displaced visual motion offset, whereas the localization of visual motion offsets in the central field was much less affected by the displaced acoustic motion offset. However, at lateral motion offset locations, it was visual motion perception that was biased in the presence of concurrent acoustic streams, and the magnitude of the respective localization bias depended on the displaced acoustic motion offset. Concurrent visual streams, in contrast, were not as effective in biasing the perceived final position of acoustic trajectories in the periphery. Finally, at paracentral motion offset locations, comparable biasing effects towards the unattended modality were observed between the "attend auditory" and "attend visual" conditions.
2.1.2. Signal duration 2.0 s
For longer signal durations, localization performance in the target modality was less affected or even unaffected by concurrent motion streams in the unattended modality (Fig. 2B). Specifically, mean values in the "attend visual" condition were largely comparable to unimodal localization performance at each spatial location of motion offset. This indicates that concurrent acoustic motion streams hardly affected the perceived final position of visual motion. Biasing effects were observed in the "attend auditory" condition at paracentral (F(6,150) = 5.73; p = 0.001, η² = 0.186) and lateral motion offset locations (F(6,150) = 6.55; p < 0.001, η² = 0.208). Surprisingly, the localization biases were towards the opposite direction (i.e. away from the final position of visual motion).
The comparison of visual and auditory localization performance in a four-way repeated measures ANOVA revealed a main effect of attended modality (F(1,25) = 5.75; p = 0.024, η² = 0.187), which was due to the biasing effects in the "attend auditory" condition. More pronounced biasing effects were observed for signals moving towards the periphery (F(1,25) = 10.19; p = 0.004, η² = 0.290). Furthermore, the magnitude of localization bias depended on the spatio-temporal disparity (F(6,150) = 12.72; p < 0.001, η² = 0.337), i.e. localization of the acoustic signal was increasingly biased away from the location of the visual motion offset with increasing spatio-temporal disparities. The interaction of attended modality and spatio-temporal disparity (F(6,150) = 7.25; p < 0.001, η² = 0.225) is explained by the fact that the magnitude of the localization bias towards the opposite direction varied as a function of spatio-temporal disparity, particularly in the "attend auditory" condition. Additionally, the significant interaction of spatial location of motion offset and spatio-temporal disparity (F(12,300) = 3.15; p = 0.004, η² = 0.112) is due to the fact that localization performance varied as a function of spatio-temporal disparity at paracentral and lateral motion offset locations, but was not affected at central motion offset locations.
2.1.3. Comparing localization performance for 0.5 s and 2.0 s
signals
The employed spatial disparities at motion offset were
identical for short and long signal durations. However, the
presentation of the same spatial offset disparity in signals
that differ in their overall duration implicated a different
temporal offset disparity between the attended and unat-
tended signal, a feature inherent to the variation in the
duration of moving signals. This temporal variation led to
notable differences in localization performance between both
signal durations. Effects of unattended motion streams on
localization performance in the attended modality were
predominantly observed for signals at a duration of 0.5 s, in
which spatial disparities in steps of 5° between −15° and +15° co-varied with temporal disparities in steps of 65 ms between −195 ms and +195 ms. However, the covariation of spatial and temporal disparities differed in signals at a duration of 2.0 s. Spatial disparities between −15° and +15° were accompanied by larger temporal disparities, in steps of 260 ms between −780 ms and +780 ms, resulting in a considerably reduced influence of the unattended signal on localization performance in the target modality.
2.2. Experiment 2: integration and separation of audio-
visual motion streams
Reports of perceived unity or separation of moving audio-
visual stimuli (Fig. 3) were classified into four groups,
according to the respective constant signal (acoustic or visual)
and stimulus duration (0.5 s or 2.0 s). Statistical analysis was
based on separate log-linear models for the data of the two
stimulus durations. The included variables were the same as
in experiment 1 (see Section 2.1).
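As above, the original analysis code is not shown; the sketch below (with assumed column names, placeholder counts and an illustrative model formula) indicates how a log-linear model of the unity/separation counts could be fitted as a Poisson GLM.

```python
# Illustrative sketch only (not the authors' analysis code): fitting a
# log-linear (Poisson) model to the table of "unity"/"separate" report counts.
# Column names, the placeholder counts and the particular model formula are
# assumptions for this example.
import itertools
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
levels = {
    "report": ["unity", "separate"],
    "disparity": ["-15", "-10", "-5", "+5", "+10", "+15"],
    "constant_modality": ["acoustic", "visual"],
}
table = pd.DataFrame(
    [dict(zip(levels, combo)) for combo in itertools.product(*levels.values())]
)
table["count"] = rng.integers(5, 60, size=len(table))   # placeholder cell counts

# Associations between the report variable and the design factors correspond
# to interaction terms in the Poisson model for the cell counts.
fit = smf.glm("count ~ report * disparity + report * constant_modality"
              " + disparity * constant_modality",
              data=table, family=sm.families.Poisson()).fit()
print(fit.summary())
```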
2.2.1. Signal duration 0.5 s
Subjects predominantly reported perceived unity for congruent audio-visual signals or for signals with spatio-temporal disparities of ±5° (±65 ms) between the acoustic and the visual motion offset (Fig. 3A). With increasing spatio-temporal disparities, the probability of integrating both signals steadily decreased, as reflected in a one-way association of spatio-temporal disparity (χ²(0.95,6) = 101.7; p < 0.001). A two-way association of spatio-temporal disparity and constant modality (χ²(0.95,6) = 60.90; p < 0.001) is explained by an increased probability of separating the signals when the visual trajectory exceeded the acoustic trajectory in terms of motion path and duration. That is, for the constant acoustic trajectory, the percentage of reported unity, in particular for spatial disparities of +10° (+130 ms) and +15° (+195 ms) (visual trajectory longer than the acoustic trajectory), was considerably smaller compared to spatio-temporal disparities of −10° (−130 ms) and −15° (−195 ms) (visual trajectory shorter than the acoustic trajectory). Likewise, when the visual trajectory was constant, the probability of integrating both signals was lower for spatio-temporal disparities of −10° (−130 ms) and −15° (−195 ms) (visual trajectory longer than the acoustic trajectory) compared to spatio-temporal disparities of +10° (+130 ms) and +15° (+195 ms) (visual trajectory shorter than the acoustic trajectory). No one-way or higher-order associations were found for the factors spatial location of motion offset or motion direction. This indicates that the ratios of integration and separation are similar for stimulus combinations with identical spatio-temporal disparities but traveling along different trajectories.
2.2.2. Signal duration 2.0 s
Reports of perceived unity were mainly observed for congruent trials. A lower percentage of audio-visual signals was integrated at spatio-temporal disparities of ±5° (±260 ms), and signals were almost exclusively perceived as separated for spatio-temporal disparities of ±10° (±520 ms) and ±15° (±780 ms) (Fig. 3B). A one-way association of spatio-temporal disparity (χ²(0.95,6) = 858.1; p < 0.001) confirms the increased probability of stimulus separation with increasing disparities between both streams. Furthermore, the percentage of reported integration was lower when the moving visual signal's termination point exceeded that of its acoustic counterpart. For example, for trials in which the acoustic trajectory was constant, the percentage of reported signal unity was about 70% when the visual trajectory was shorter than the acoustic trajectory (−5°, −260 ms), but considerably lower (about 20%) when the visual trajectory exceeded the acoustic trajectory by +5° (+260 ms). Response behavior was similar for trials in which the visual trajectory was constant. This finding was confirmed by a two-way association of spatio-temporal disparity and constant modality (χ²(0.95,6) = 100.19; p < 0.001). Again, there were no one-way or higher-order associations including the factors spatial location of motion offset or motion direction, indicating that stimulus combinations with identical spatio-temporal disparities were equally likely to be integrated regardless of the traveled trajectory.
2.2.3. Comparing discordance detection for 0.5 s and 2.0 s
signals
The rate of perceived unity showed a pronounced asymmetry
between short (Fig. 3A) and long signal durations (Fig. 3B). As
described for experiment 1, one has to take into account that temporal disparities at motion offset covary both with spatial disparity and overall signal duration, which notably influenced performance in detecting incongruent trials. The same spatial disparities, ranging between −15° and +15°, were accompanied by much larger temporal disparities in 2.0 s signals, which is reflected in overall lower probabilities to integrate both signals.
Fig. 3 – Percentage of reports of perceived unity in the constant modality. Percentages are given for each trajectory and each spatial disparity between the acoustic and visual motion offset (data are collapsed across both hemifields). Note that temporal disparities are not indicated in the graph. Different colors indicate different trajectories, as illustrated on top of the graph and specified in the legend. Black circles denote motion offset locations in the constant modality. (A) 0.5 s signal duration in the constant modality. (B) 2.0 s signal duration in the constant modality.
Still, there were two general principles that applied to both
signal durations: when the visual trajectory exceeded the
acoustic trajectory, the probability of perceiving separate
signals was increased compared to signals in which the
acoustic trajectory exceeded the visual trajectory. Furthermore,
the proportion of perceived unity and separation for identical
spatio-temporal disparities was not affected by the traveled
trajectory of the audio-visual stimulus.
3. Discussion
The goal of the current experiments was to disentangle the
contributions of crossmodal interactions and multisensory
integration to the perception of audio-visual motion in space.
So far, spatial effects have not been investigated in detail. In
the present study, the location of the trajectories' termination
point (central vs. lateral) has proven to be a crucial factor for
the crossmodal interplay between the visual and the auditory
system. However, the location of the trajectories' termination
point did not influence the ratio of perceived unity and
separation of audio-visual motion streams.
We found that, in particular for short signal durations, the
magnitude of crossmodal localization bias towards the
location of motion offset in the non-attended modality varied
as a function of spatio-temporal disparity between the visual
and acoustic motion offset. However, when the task required
a decision on whether the audio-visual stimuli were spatio-
temporally aligned or not, the percentage of reported unity
decreased with increasing spatio-temporal disparities. This
indicates that crossmodal interactions do not depend on the
integration of the audio-visual motion stream. Moreover, the
results suggest that subjects benefited jointly from spatial and
temporal cues in the perception of audio-visual motion and
that temporal disparities at motion offset likely explain the
observed asymmetries in performance between short and
long signal durations in both experiments.
3.1. Crossmodal interactions between the visual and the
auditory system depend on the trajectory in space
The findings of the audio-visual localization task indicate
mutual crossmodal interactions between the visual and the
auditory system in the perception of motion. Thereby, the
influence of the unattended signal on localization performance
in the attended modality crucially depended on the specific
trajectory as well as on the spatio-temporal disparity between
the visual and the acoustic motion offset. Due to the fact that
the stimuli were initially congruent, the modality determining
the percept in conflicting trials could only emerge at the final
part of the moving stimulus. Crossmodal interactions between
the visual and the auditory modality were mainly observed for
short stimulus durations (0.5 s), but were found to be much
weaker for longer signal durations (2.0 s). Even though the
predefined spatial conflicts were identical for both signal
durations (±5°, ±10°, ±15°), moving audio-visual stimuli differed with respect to temporal offset disparities (±65 ms to ±195 ms in short signals and ±260 ms to ±780 ms in long signals),
which likely explains the discrepancy in localization perfor-
mance between short and long signal durations. For 0.5 s
signals, visual motion perception was most susceptible to be
influenced by concurrent acoustic streams at lateral motion
offset locations, whereas auditory motion perception was
biased towards visual information mainly at central motion
offset locations. The results do not support the notion of an
existing asymmetry between modalities in the perception of
motion (Soto-Faraco et al., 2004), i.e. a visual dominance, but
rather highlight the crucial role of the auditory modality when
the reliability of the visual stimulus is reduced (Alais and Burr,
2004; Hidaka et al., 2011). Our results might therefore reflect
differential abilities between the visual and the auditory system
when attempting to accurately track dynamically changing
location cues along different trajectories in space. The process-
ing of visual motion is most accurate in foveal regions, and this
accuracy in detecting positional changes decreases remarkably
in the peripheral visual field (Finlay, 1982; McKee and
Nakayama, 1984; To et al., 2011). Studies on spatial acuity in
the auditory modality have shown that the minimum auditory
movement angle (MAMA) is smallest in the central field
(Grantham, 1986). However, Perrott et al. (1987, 1990) pointed
out that the resolution of the auditory system across space is
roughly homogeneous in relation to the visual resolution. That
is, optimal visual resolution is only possible in the fovea,
whereas the acuity in perceiving sound sources is much more
constant across space. This crucial discrepancy between both
modalities is likely the key underlying feature that accounts for
the results of the present study. On the one hand, our results
show that concurrent visual motion streams "capture" dynamic
auditory information at central motion offset locations, thus
confirming the results of previous studies on audio-visual
motion perception (e.g. Allen and Kolers, 1981; Mateeff et al.,
1985; Soto-Faraco et al., 2002; Strybel and Vatakis, 2004). Visual
motion perception, in contrast, was only slightly biased towards
the final position of the acoustic trajectory in the central field.
This suggests that foveal tracking of motion features in the
visual system is more precise than dynamic tracking of
interaural differences at midline positions in the auditory
system. Indeed, during dynamic visual capture, Alink et al.
(2008) demonstrated reduced activity in the auditory motion
complex (AMC) and simultaneously enhanced activity in the
human middle temporal area. The authors concluded that
altered activity in early auditory motion areas reflects a
neuronal correlate of visual dominance.
On the other hand, our results also demonstrate that
concurrent acoustic streams "capture" visual motion informa-
tion at lateral motion offset locations, whereas concurrent
visual streams were not as effective in biasing auditory motion
perception in the periphery. We hypothesize that superior
peripheral tracking of dynamically changing location cues in
the auditory system accounts for the observed biasing effects
on visual motion perception. This is in line with the notion that spatial acuity in the periphery declines more in the visual system than in the auditory system, as previously
proposed by Perrott et al. (1987, 1990). Concerning the study by
Alink et al. (2008), the question arises whether an inverted
activation shift between auditory and visual motion areas
would be observed, once the auditory modality dominates
perception. That is, enhanced activity in the respective cortical
auditory areas could directly affect motion processing in visual
areas. The hypothesisof a gradual shift in the activation pattern
is also supported by the fact that the influence of the respective
unattended modality on visual or acoustic localization perfor-
mance was equivalent at paracentral motion offset locations in
the present study. Thus, the visual and the auditory system
seem to be equal in their capabilities to track dynamic
information at intermediate spatial locations.
For 2.0 s signals, crossmodal interactions were only found
for the "attend auditory" condition. The biasing effects were
much smaller and, interestingly, negatively correlated with
spatio-temporal disparity. Wallace et al. (2004) also found a
negative localization bias for static audio-visual events that
were either spatially or temporally disparate. In the present
study, concurrent acoustic and visual signals moved in the
same direction and differed only with regard to spatial and
temporal motion offsets. Specifically, in the case of prolonged
stimulus durations, we assume that multisensory predictions
about the trajectory of an ongoing motion are established in
the respective brain areas which are then violated when one
signal ceases abruptly. Van Wanrooij et al. (2010) demonstrat-
ed that multisensory expectations based on the spatial
congruence or incongruence of audio-visual stimuli of previ-
ous trials have an influence on reaction times and accuracy in
orienting towards the current audio-visual stimulus. Further-
more, the violation of multisensory expectations has been
shown to affect the dynamics of oscillatory activity in the
brain and reflects the detection of an intermodal conflict
(Arnal et al., 2011), which in turn might interfere with our
ability to properly localize motion stimuli. In the current
study, however, the detection of a discrepancy between both
streams seems to have an impact on auditory, but not on
visual localization performance. However, it is debatable
whether the consequence of a putative prediction error can
be interpreted in terms of a crossmodal localization bias or
rather reflects a distinct phenomenon. Additional research is
needed to clarify this aspect.
3.2. Spatio-temporal analysis is crucial for the integration
and separation of audio-visual motion streams
In the second experiment we found that the performance in
detecting conflicting audio-visual trials improved with increasing
spatio-temporal disparities between the acoustic and the visual
motion offset. The results are therefore in line with previous
studies using stationary audio-visual events (Bertelson and
Radeau, 1981; Lewald and Guski, 2003; Slutsky and Recanzone,
2001; Wallace et al., 2004). As conflicting trials only differed at
motion offset, one can assume that the decision regarding the
integration or separation of both streams depended on the final
part of the audio-visual stimulus. Furthermore, we demonstrat-
ed the crucial role of the co-analysis of spatial and temporal
features in integrating multisensory information. Though the
probability of stimulus separation generally increased with
increasing spatial disparities both for 0.5 s and 2.0 s signals,
subjects' performance showed pronounced asymmetries with
respect to the absolute rate of stimulus separation. Inherent to
the prolongation of the overall signal duration were larger
temporal offset disparities between the acoustic and visual
stream in 2.0 s signals which likely account for the observed
differences. It can be assumed that the temporal disparities in
2.0 s signals did not fall within the temporal window of
integration (Colonius and Diederich, 2010). Consequently,
these signals were more likely to be perceived as separated
compared to those at a duration of 0.5 s.
Unexpectedly, the proportion of perceived unity and
separation for stimulus combinations with identical spatio-
temporal disparities was comparable across trajectories. This
finding applied to both signal durations. It can be reasoned
that subjects may have additionally relied on a comparison of
temporal offsets between the visual and acoustic signal. Thus,
the combined use of spatial and temporal cues seems to
provide redundant information at each spatial location of
motion offset.
Another important finding was the role of the temporal
order of visual and acoustic signal offsets for perceived signal
unity. Visual motion offsets prior to acoustic motion offsets
resulted in a higher percentage of reports of perceived unity
than vice versa. In other words, segregation of both streams
occurred more frequently when the visual trajectory exceeded
the acoustic trajectory. This finding is likely the direct result
of the fact that light travels more quickly than sound (King,
2005; Recanzone, 2009), i.e. the visual information is assumed
to reach the sensory receptors earlier than the associated
acoustic information. Therefore, events violating this fact, e.g.
when sound precedes light or when the visual trajectory
exceeds the acoustic trajectory, are not considered to origi-
nate from the same object (Morein-Zamir et al., 2003) and thus
the streams tend to be separated.
3.3. Crossmodal interactions and multisensory integration
in motion perception are independent processes
Crossmodal interactions between the visual and the auditory
system were predominantly observed for short signal dura-
tions. Depending on the specific traveled trajectory in space,
both the perceived auditory and visual motion information
were subject to a "capture" by the unattended modality.
Thereby, the magnitude of localization bias varied as a
function of spatio-temporal disparity between the visual and
the acoustic motion offset. Biasing effects were still observed
for the largest spatio-temporal disparities of ±15° (±195 ms). In contrast, a reverse trend was observed for the
subjects' reports of perceived integration or separation of both
sensory streams. Increasing spatio-temporal disparities led to
more reliable separations. Thus, it can be concluded that
integration of both streams is not a necessary prerequisite for
a perceptual bias of information towards the irrelevant
modality as previously suggested by Bertelson and Radeau
(1981). The results of the present study stand in contrast to
those of Wallace et al. (2004), where subjects were presented
with congruent and conflicting stationary audio-visual
events. The subjects were instructed to localize the position
of the acoustic signal and to indicate at the same time
whether the visual and the acoustic signal were perceived as
unified or separated. Hence, Wallace et al. (2004) did not
separately investigate crossmodal interactions and multisen-
sory integration. They showed that perceived unity was
correlated with localization of the stationary auditory signal
at or very near the location of the stationary visual signal. This
is not surprising, given the fact that subjects may have relied
on the non-ignored visual signal instead of using auditory
cues for localization. However, the assumption of unity
implies integration into a coherent percept whereby different
streams of information are perceived as emanating from the
same spatial location. Localization in a selective attention
task, in contrast, rather reflects crossmodal interactions
between senses without necessarily assuming that various
streams belong to the same object. In the present study,
localization biases towards the non-attended signal were
observed at intermediate positions between the visual and the acoustic motion offset locations. Thus, the
results support the existence of genuine localization biases
and do not reflect pure decisional strategies.
Furthermore, our data provide clear evidence for a co-
analysis of spatial and temporal cues in the perception of
bimodal motion streams. Though one cannot dissociate the
respective contribution of spatial and temporal cues to
behavioral performance for a given spatio-temporal disparity,
the comparison between short and long signal durations
revealed that multisensory interactions only occur when both
cues are provided within a specific range of tolerance. Both
interaction and integration effects were much weaker for 2.0 s
signals. Although these signals were presented with identical
predefined spatial disparities as those at a duration of 0.5 s,
temporal disparities were much larger and obviously exceeded
the temporal window for binding of various streams of informa-
tion. Temporal disparities are therefore likely to explain the
observed asymmetries both in the magnitude of localization bias
and the rate of perceived unity between 0.5 s and 2.0 s signals.
Previously, audio-visual interactions in motion perception
have mainly been studied in tasks in which subjects were
asked to discriminate the direction of motion in one modality
while ignoring a crossmodal distractor in another modality
(Allen and Kolers, 1981; Mateeff et al., 1985; Soto-Faraco et al.,
2002; Strybel and Vatakis, 2004). In the present study, cross-
modal interactions and multisensory integration were studied
with concurrently presented acoustic and visual signals that
moved in identical directions but were conflicting at motion
offset. The results support the notion that the visual modality
dominates motion perception in the central field, which is
most likely due to superior tracking of positional information
at the midline (Soto-Faraco et al., 2004). However, tracking of
dynamically changing location cues in the periphery seems to
be more accurate in the auditory system as concurrent
acoustic motion streams biased visual motion perception at
lateral motion offset locations. The results of the current
study confirm that degraded reliability of the visual signal in
motion perception can be compensated for by the auditory
modality as previously proposed by Hidaka et al. (2011).
Taken together, the present findings indicate that different
trajectories in space alter the perceived quality of moving
visual and moving acoustic signals and thus their suscepti-
bility to capture by the other modality. The crossmodal
localization bias towards the location of the final position in
the non-attended modality still occurs even when the moving
audio-visual signal would be perceived as separated. This
finding implicates a more restricted range of tolerance for
spatial and/or temporal disparities in multisensory integra-
tion compared to crossmodal interactions, an aspect that
needs to be further considered in future studies investigating
multisensory interactions.
4. Conclusion
Our data provide evidence that the interplay of the visual and
the auditory modality in motion perception crucially depends
on the trajectory. The findings indicate that positional
information in the central field is more accurately tracked by
the visual system, since concurrent visual streams biased
auditory motion perception mainly at central motion offset
locations. The auditory system, in contrast, seems to be
superior to the visual system in tracking positional informa-
tion in the peripheral space as visual localization performance
was biased towards the final position of the acoustic
trajectory mainly at lateral motion offset locations. The
magnitude of localization bias thereby varied as a function
of the spatio-temporal disparity between the visual and the
acoustic stream. Importantly, the interplay between modali-
ties was only observed when temporal conflicts at motion
offset did not exceed a critical range. The results furthermore
suggest that crossmodal interactions occur independently
from the integration of the audio-visual motion stream.
5. Experimental procedure
5.1. Subjects
Twenty-six subjects (14 females, 12 males; 4 left-handed;
mean age: 26.4 years; age range 20–33 years) with normal or
corrected-to-normal vision and normal hearing abilities
participated both in experiment 1 and experiment 2. None of
the subjects reported any neurological disorder. All subjects
gave informed written consent and were compensated for
their participation. This study conformed to The Code of
Ethics of the World Medical Association and was approved by
the local Ethics Committee of the University of Leipzig.
5.2. Setup and stimuli
The experiments were conducted in an anechoic, sound-attenuated free-field laboratory (40 m², Industrial Acoustics Company [IAC]). Forty-seven broad-band loudspeakers (Visaton FRS 8, 4 Ω) were mounted in an azimuthal, semicircular array
at ear level (Fig. 1A). A comfortable, fixed chair was positioned
in the middle of the semicircle at a constant distance of 2.35 m
from the loudspeakers such that subjects were aligned
straight ahead to the central speaker at 0°. The loudspeaker
array covered an azimuthal plane from −98° to the left to +98°
to the right. The angular distance between two loudspeaker
membranes was 4.3°. Each loudspeaker was calibrated indi-
vidually. For this, the transmission spectrum was measured
using the Brüel & Kjær measuring amplifier (B&K 2610), a
microphone (B&K 2669, pre-amplifier B&K 4190) and a real-
time signal processor (RP 2.1, System3, Tucker Davis Technol-
ogies, TDT). For each loudspeaker a calibration file was
generated in Matlab 6.1 (The MathWorks Inc, Natick, USA)
and later used for presentation of acoustic stimuli with flat
spectra across the frequency range of the stimulus.
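The exact calibration computation is not detailed beyond the measurement chain; the following sketch shows one plausible way to flatten a measured magnitude response by inverse weighting, with an assumed sampling rate and a synthetic speaker response.

```python
# Illustrative sketch of one possible calibration step (assumed approach, not
# the authors' code): the stimulus spectrum is divided by the loudspeaker's
# measured magnitude response within the stimulus band so that the radiated
# spectrum is approximately flat. The sampling rate and the synthetic
# "measured" response are placeholders.
import numpy as np

FS = 48_000                                    # assumed sampling rate (Hz)

def flatten_spectrum(stimulus, measured_magnitude, band=(250, 1000)):
    """Scale the stimulus spectrum by the inverse of the measured response."""
    spectrum = np.fft.rfft(stimulus)
    freqs = np.fft.rfftfreq(len(stimulus), d=1 / FS)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    correction = np.ones_like(measured_magnitude)
    correction[in_band] = 1.0 / np.maximum(measured_magnitude[in_band], 1e-6)
    return np.fft.irfft(spectrum * correction, n=len(stimulus))

# Example with 0.5 s of noise and a gently sloping synthetic speaker response.
noise = np.random.default_rng(2).normal(size=FS // 2)
freqs = np.fft.rfftfreq(len(noise), d=1 / FS)
synthetic_response = 1.0 + 0.5 * np.sin(freqs / 500.0)     # placeholder curve
calibrated = flatten_spectrum(noise, synthetic_response)
```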
The speaker array was combined with an array of 188
white light emitting diodes (LED) mounted in azimuthal steps
of 1° at eye-level. The LEDs were controlled by a set of 51
printed circuit boards (PCB), which were interfaced with a
desktop PC. Each PCB was assembled with four infra-red (IR)
sensitive phototransistors for the registration of pointing
directions. The phototransistors were arranged with the
same angular distances as the LEDs, but extended beyond
the loudspeaker- and LED array by 8° to both the left and the
right. In combination with the IR-sensitive phototransistors,
the LED array was also used to provide visual feedback of the
angular position pointed to by the subjects. A customized IR-
torch served as pointing device (Solarforce L2 with 3W NVG
LED). The subtended angle of the IR-light beam covered a
maximum of 8° at the level of the LEDs. The mean position
across all activated IR-sensitive phototransistors was com-
puted online and the corresponding LED flashed up as a visual
feedback for the subject.
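The pointing registration can be summarized by the following assumed reconstruction; the sensor and LED coordinates are approximations, not the authors' acquisition code.

```python
# Assumed reconstruction of the registration logic (not the authors'
# acquisition code): the pointed-to azimuth is taken as the mean angle of all
# currently activated IR-sensitive phototransistors, and the closest LED is
# lit as feedback. The sensor and LED positions below are approximations.
import numpy as np

sensor_angles = np.arange(-106.0, 107.0, 1.0)   # 1 deg spacing, ~8 deg beyond the array
led_angles = np.arange(-93.5, 94.5, 1.0)        # 188 LEDs at 1 deg spacing (extent assumed)

def pointed_angle(activated):
    """Mean azimuth across all activated sensors (None if no sensor is hit)."""
    hit = sensor_angles[activated]
    return None if hit.size == 0 else float(hit.mean())

def feedback_led(angle):
    """Index of the LED closest to the computed pointing angle."""
    return int(np.argmin(np.abs(led_angles - angle)))

# Example: the ~8 deg wide IR beam activates the sensors between +20 and +27 deg.
beam = (sensor_angles >= 20) & (sensor_angles <= 27)
angle = pointed_angle(beam)
print(angle, feedback_led(angle))               # -> 23.5 and the LED nearest +23.5 deg
```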
The loudspeakers and LEDs were hidden behind acousti-
cally transparent gauze, which did not affect visibility of the
LEDs. Thus, subjects were unable to make use of landmarks
during the localization and detection tasks. An infrared
camera was installed in the test chamber to monitor the subjects'
performance during the experimental sessions. Custom
written MATLAB scripts (R2007b, The MathWorks Inc., Natick,
USA) were used to control stimulus presentation and data
acquisition. Visual and acoustic signals were digitally gener-
ated using RPvdsEx (Real Time Processor Visual Design Studio,
Tucker Davis Technologies, TDT) and delivered to two multi-
channel signal processors (RX8, System3, TDT).
Acoustic stimuli were low-frequency Gaussian noise bursts
(250–1000 Hz) that were presented at 40 dB SL (sensation level).
Sound localization in this low-frequency range is primarily
based on the processing of interaural time differences (ITDs).
Sound motion was simulated by successive activation of
adjacent loudspeakers. To obtain a continuous motion, the
ratio of sound intensity between two adjacent loudspeakers
was adjusted by linear cross-fading of the output voltage. The
level roving (variability in sound intensity around presentation
level) was set to ±3 dB to avoid adaptation to loudness-related
localization cues. Visual stimuli were light spots at a luminance
of 2.5 lux. Moving visual signals were simulated by successive
activation of adjacent LEDs. The small distance of 1.0° between
two adjacent LEDs was sufficient to generate an apparent
motion percept.
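The cross-fading principle can be sketched as follows; the speaker grid and the gain law below are an assumed reconstruction of the description above, not the RPvdsEx implementation.

```python
# Assumed implementation sketch (the actual stimuli were generated with
# TDT/RPvdsEx hardware): smooth horizontal sound motion is simulated by
# linearly cross-fading the output gain between the two loudspeakers that
# bracket the current virtual source position.
import numpy as np

speaker_angles = np.linspace(-98.0, 98.0, 47)    # 47 speakers, ~4.3 deg spacing

def speaker_gains(source_angle):
    """Linear panning gains; only the two bracketing loudspeakers are active."""
    gains = np.zeros_like(speaker_angles)
    right = int(np.searchsorted(speaker_angles, source_angle))
    if right == 0:
        gains[0] = 1.0                           # source at or beyond the left edge
        return gains
    if right >= len(speaker_angles):
        gains[-1] = 1.0                          # source at or beyond the right edge
        return gains
    left = right - 1
    frac = (source_angle - speaker_angles[left]) / (speaker_angles[right] - speaker_angles[left])
    gains[left], gains[right] = 1.0 - frac, frac # linear cross-fade of amplitudes
    return gains

# Example: a virtual source at -6 deg is shared between two adjacent speakers.
g = speaker_gains(-6.0)
active = np.nonzero(g)[0]
print(active, g[active])
```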
Stimuli were unimodal (acoustic only/visual only) and
concurrent audio-visual motion streams. The constant signal
in moving audio-visual stimuli (attended modality in exper-
iment 1 or constant modality in experiment 2) traveled an
angular range of 38°. The speed of motion, however, varied
with the signal duration of the presented sequence (either
0.5 s or 2.0 s, including 10 ms rise and decay times). The final
positions of the constant signals were located at −8°, −38°, −60°, +8°, +38°, and +60°, respectively (Fig. 1A). Motion offset locations at ±8° were defined as central, at ±38° as paracentral and at ±60° as lateral. The concurrent signal in
the second modality finished either at a congruent location or
spatially and temporally displaced with respect to the final
position of the constant signal (Fig. 1B). Note that the predefined spatial disparities (±5°, ±10°, ±15°) for 0.5 s
and 2.0 s signals were identical, but temporal disparities co-
varied both with spatial disparity and signal duration (Fig. 1C,
Table 1). Control stimuli (acoustic only or visual only) were
identical to the respective constant signal of audio-visual
signals in terms of trajectory and signal duration. To examine
the possible effect of motion direction on localization perfor-
mance, unimodal and audio-visual stimuli moved towards
the final position from either side (i.e. towards the midline
and towards the periphery, Fig. 1A).
5.3. Study design and procedure
Subjects were tested in complete darkness. Prior to experimen-
tal testing the detection threshold for moving sounds was
obtained for each subject to adjust the presentation level for
acoustic signals during the tests to 40 dB SL. Employing a heard/
not-heard paradigm, the subjects were asked to indicate by a
button press on a response box whether they detected an
acoustic signal (Gaussian noise bursts, 250–1000 Hz) moving from −38° to 0° in the left hemifield. The initial sound level of
the moving stimulus was set to 63 dB SPL. When the subjects
detected the stimulus, sound level was decreased in steps of
2.5 dB. Otherwise, sound level was increased in equal step sizes.
When the subjects were confident about their individual hearing
threshold, i.e. the minimum sound pressure level they
required to detect the moving stimulus, they confirmed their
decision by a button press. This reference value was used to set
the acoustic stimulus at 40 dB SL.
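In pseudocode terms, this adjustment procedure amounts to a simple up-down tracking loop. The sketch below is a minimal Python illustration, assuming hypothetical callbacks for stimulus presentation and for the subject's confirmation; it is not the original MATLAB implementation.

def estimate_detection_threshold(present_and_ask, confirmed,
                                 start_db=63.0, step_db=2.5):
    # present_and_ask(level_db) -> True if the subject reports hearing the
    # moving noise; confirmed() -> True once the subject accepts the current
    # level as their threshold (both callbacks are hypothetical)
    level_db = start_db                      # initial sound level (dB SPL)
    while not confirmed():
        if present_and_ask(level_db):
            level_db -= step_db              # heard: present a softer stimulus
        else:
            level_db += step_db              # not heard: present a louder one
    return level_db

# presentation level in the main experiments: threshold + 40 dB (i.e. 40 dB SL)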
Table 1 – Spatial and temporal disparities at motion offset. The indications refer to disparities between the constant signal (either a moving acoustic or a moving visual signal) and the corresponding variable signal of audio-visual stimuli as presented in the audio-visual localization task (experiment 1) and the task on detecting spatio-temporal disparities (experiment 2). Signal duration in the constant modality was either (A) 0.5 s or (B) 2.0 s. Note that the employed spatial disparities were identical for both signal durations, but temporal disparities covaried both with spatial disparity and signal duration. In experiment 1 the indications for the constant modality and the variable modality correspond to the attended modality and the non-attended modality, respectively.

(A) 0.5 s
Spatial disparity                    −15°       −10°       −5°        0°       +5°        +10°       +15°
Signal duration, constant modality   0.5 s      0.5 s      0.5 s      0.5 s    0.5 s      0.5 s      0.5 s
Signal duration, variable modality   0.305 s    0.37 s     0.435 s    0.5 s    0.565 s    0.63 s     0.695 s
Temporal disparity                   −0.195 s   −0.13 s    −0.065 s   0 s      +0.065 s   +0.13 s    +0.195 s

(B) 2.0 s
Spatial disparity                    −15°       −10°       −5°        0°       +5°        +10°       +15°
Signal duration, constant modality   2.0 s      2.0 s      2.0 s      2.0 s    2.0 s      2.0 s      2.0 s
Signal duration, variable modality   1.22 s     1.48 s     1.74 s     2.0 s    2.26 s     2.52 s     2.78 s
Temporal disparity                   −0.78 s    −0.52 s    −0.26 s    0 s      +0.26 s    +0.52 s    +0.78 s
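The temporal disparities in Table 1 follow from the geometry of the stimuli: the constant signal covers 38° in either 0.5 s or 2.0 s, so a given spatial offset of the variable signal translates into a temporal offset equal to the spatial disparity divided by the angular speed. The short Python sketch below (illustrative only) reproduces the tabulated values to within a few milliseconds of rounding.

spatial_disparities_deg = [-15, -10, -5, 0, 5, 10, 15]
for constant_duration_s in (0.5, 2.0):
    speed_deg_per_s = 38.0 / constant_duration_s        # angular speed
    for disp_deg in spatial_disparities_deg:
        temporal_disparity_s = disp_deg / speed_deg_per_s
        variable_duration_s = constant_duration_s + temporal_disparity_s
        print(f"{constant_duration_s} s, {disp_deg:+d} deg -> "
              f"dt = {temporal_disparity_s:+.3f} s, "
              f"variable duration = {variable_duration_s:.3f} s")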
Each subject participated in two experiments: (i) an audio-
visual localization task and (ii) a detection task in which
moving audio-visual stimuli had to be judged as spatio-
temporally congruent or incongruent.
Both experiments were divided into blocks. Short breaks
were allowed after completion of each block. Each block
started with the presentation of two stationary stimuli that
could be ignored by subjects. An additional five stationary stimuli
per block (acoustic, visual or audio-visual) were randomly
interspersed between moving stimuli to avoid adaptation to
motion. Subjects were instructed to look straight ahead
during the trials and not to pursue the moving signals
with either their eyes or their head. Unlike in a previous
free-field study in which subjects were asked to fixate an LED
during stimulus presentation (Hofbauer et al., 2004), no
fixation point was provided in the present study to exclude
the possibility that subjects could make use of it as a
reference for the midline position in the darkened test
chamber. The subjects' position was continuously monitored
by the experimenter via video stream from the test chamber.
Trials in which subjects moved their head were excluded
from data analysis.
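A block of trials in either experiment can thus be thought of as a small sequence-building routine: two ignored stationary stimuli, the randomized moving stimuli of that block, and five additional stationary stimuli inserted at random positions. The sketch below is a hypothetical Python illustration of this structure, not the original control code.

import random

def build_block(moving_stimuli, n_interspersed=5, n_leading=2):
    # two stationary stimuli at the start of each block (ignored by subjects)
    block = ["stationary"] * n_leading
    trials = list(moving_stimuli)
    random.shuffle(trials)                   # randomized presentation order
    # randomly intersperse additional stationary stimuli (acoustic, visual
    # or audio-visual) to avoid adaptation to motion
    for _ in range(n_interspersed):
        trials.insert(random.randrange(len(trials) + 1), "stationary")
    return block + trials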
5.3.1. Audio-visual localization task
Each subject completed a practice run consisting of 30 stimuli
to become familiar with the task and the infra-red pointing
device. During two test sessions, a total of 384 moving
stimulus combinations (336 audio-visual, 24 acoustic, 24
visual) were presented in 16 experimental blocks. An exper-
imental session consisted of 8 blocks. Each stimulus combi-
nation was presented twice. Stimuli within an experimental
block were presented in randomized order. Additionally, the
order of the 16 experimental blocks was counterbalanced
across subjects. In a blocked design, subjects were instructed
to selectively focus their attention on either the acoustic
(attend auditory) or the visual (attend visual) component of
the moving audio-visual stimuli. Instruction on the respective
to-be-attended-modality was given prior to each test block.
Moving unimodal stimuli within an experimental block were
only presented in the modality that corresponded to the
attended modality. Subjects were asked to indicate the
perceived final position of the moving targets in the attended
modality by pointing with an IR-torch. Visual feedback on the
indicated angular position was provided by flashing up the
corresponding LED. To confirm their response, subjects had to
release the button on the IR-torch whereby the designated
LED flashed three times, signaling successful registration
of the angular position. Responses were automatically stored
for subsequent data analyses. The next trial started after an
intertrial-interval (ITI) of 3.0 s, such that subjects were able to
re-orient towards the midline position. A response to each
moving target was required before the next trial could begin.
No feedback on the correct angular position was given at any
time of the test session. Fig. 4 illustrates the experimental
procedure.
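Assuming that the stimulus set is the full factorial combination of the factors introduced in Section 5.2 (two signal durations, six final positions, seven spatial disparities, two motion directions) crossed with the two attention conditions, a short enumeration recovers the reported counts. This factorization is an inference from the reported totals, not an explicit statement in the text.

from itertools import product

durations = (0.5, 2.0)                           # s
final_positions = (-60, -38, -8, 8, 38, 60)      # degrees azimuth
disparities = (-15, -10, -5, 0, 5, 10, 15)       # degrees, variable vs. constant
directions = ("towards midline", "towards periphery")
attended = ("auditory", "visual")

audio_visual = list(product(attended, durations, final_positions,
                            disparities, directions))
unimodal_per_modality = list(product(durations, final_positions, directions))

print(len(audio_visual))              # 336 audio-visual combinations
print(len(unimodal_per_modality))     # 24 acoustic and 24 visual controls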
5.3.2. Integration and separation of audio-visual motion streams
Each subject completed a practice run consisting of 20 stimuli to
become familiar with the task. Only audio-visual stimulus
combinations (the same as in the audio-visual localization task)
were presented in randomized order in eight experimental
blocks. The order of the eight blocks was counterbalanced across
subjects. Subjects were asked to simultaneously attend to the
acoustic and the visual part of the moving stimulus and to judge
whether motion offsets were spatio-temporally aligned or not
(same-different judgments). After each trial subjects indicated
their response by pressing the corresponding button on a
response box. The ITI between button press and the onset of
the next trial was 2.0 s. A response to each moving audio-visual
stimulus was required before the next trial could begin. No
feedback on the correct response was given at any time of the
test session.
5.4. Data analysis
Subjects' performance did not differ between the two hemifields,
so for each experiment the data were collapsed across the
corresponding trajectories in the left and right hemifield.
5.4.1. Audio-visual localization task
Localization bias in the attended modality was quantified as the
difference between the indicated final position and the actual
final position of motion for each stimulus combination. Addi-
tionally, data were normalized with respect to individual unim-
odal localization performance. Normalization compensated for
the overestimation of the final position which is inherent
to motion perception (Hubbard, 2005). Displaying normal-
ized data allowed for the direct comparison of the
magnitude of localization bias between different trajecto-
ries. Stimulus repetitions were averaged for each subject.
[Fig. 4 graphic: an example block sequence of stationary and moving acoustic, visual, and audio-visual stimuli with signal durations of 0.5 s or 2.0 s, each moving stimulus followed by a response (R) and an intertrial interval (ITI).]
Fig. 4 – Experimental procedure. The symbols illustrate a sequence of different types of moving and stationary stimuli that were presented in randomized order within an experimental block. The stimulus durations of signals in the attended modality (audio-visual localization task) and in the constant modality (detection task on spatio-temporal disparities) are indicated below the symbols. R = response (experiment 1: localization of the final position of motion in the attended modality using an infra-red torch; experiment 2: button press in the task on detecting spatio-temporal disparities), ITI = intertrial interval.
The results for the respective stimulus combinations were
grouped according to both the attended modality and the
stimulus duration. Statistical analyses were based on
normalized data. Data were submitted to a four-way
repeated measures ANOVA. The included within-subject
factors are specified in Section 2.1. Data of the two
stimulus durations (0.5 s, 2.0 s) were analyzed in separate
ANOVAs.
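As a concrete illustration of the bias measure and the normalization step, the following Python sketch computes the localization bias for a single subject and trajectory and then subtracts the mean unimodal bias; the numbers are made up and the variable names are assumptions, not the authors' code.

import numpy as np

indicated = np.array([41.5, 40.0, 42.3])       # pointed final positions (deg)
actual = 38.0                                  # actual final position (deg)
unimodal_indicated = np.array([40.2, 39.6])    # unimodal control trials (deg)

bias = indicated - actual                      # raw localization bias
unimodal_bias = unimodal_indicated - actual    # overestimation inherent to
                                               # motion perception
normalized_bias = bias - unimodal_bias.mean()  # bias attributable to the
                                               # irrelevant concurrent stream
mean_bias = normalized_bias.mean()             # averaged over repetitions

The normalized, subject-wise means would then enter the repeated measures ANOVAs described above, separately for the 0.5 s and 2.0 s durations.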
5.4.2. Integration and separation of audio-visual motion streams
Response behavior was evaluated with regard to the rate of
perceived unity for each stimulus combination. The results
for the respective stimulus combinations were grouped
according to both the constant modality and the stimulus
duration. Data (same/different responses) were nominal variables
and were therefore submitted to a log-linear analysis.
Log-linear models predict the expected frequency counts in a
contingency table for designs with two or more factors.
Differences between observed frequencies
and expected frequencies are expressed in one-way associa-
tions (main effects) and two-way and higher-order associa-
tions (interactions). The log-linear data analysis was based on
the number of subjects that reported perceived unity for a
given stimulus combination. Included variables were the
same as in the audio-visual localization task (see Section 2.1).
Separate analyses were conducted for the data of the two
stimulus durations (0.5 s and 2.0 s).
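In practice, a log-linear analysis of this kind can be fitted as a Poisson regression on the cell counts of the contingency table. The sketch below is a hedged Python illustration using statsmodels with synthetic counts and assumed factor names; it mirrors the logic described above (main effects as one-way associations, interaction terms as two-way and higher-order associations) but is not the authors' analysis script.

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# hypothetical long-format contingency table: number of subjects reporting
# perceived unity per cell, for one stimulus duration
rng = np.random.default_rng(0)
cells = pd.MultiIndex.from_product(
    [["auditory", "visual"],                       # constant modality
     ["central", "paracentral", "lateral"],        # motion offset region
     [-15, -10, -5, 0, 5, 10, 15]],                # spatial disparity (deg)
    names=["modality", "position", "disparity"]).to_frame(index=False)
cells["n_unity"] = rng.integers(1, 13, size=len(cells))   # e.g. 12 subjects

# log-linear model fitted as a Poisson GLM on the counts
model = smf.glm("n_unity ~ C(modality) * C(position) * C(disparity)",
                data=cells, family=sm.families.Poisson()).fit()
print(model.summary())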
Acknowledgments
This work was supported by the Deutsche Forschungsge-
meinschaft (DFG), graduate program "Function of attention in
cognition". We wish to thank two reviewers for valuable
comments and suggestions. We are also thankful to Ingo
Kannetzky, Jörg Eckebrecht and Matthias Freier for technical
assistance and to Patrice Voss for proofreading of the manuscript.
REFERENCES
Alais, D., Burr, D., 2004. The ventriloquist effect results from near-optimal bimodal integration. Curr. Biol. 14 (3), 257–262.
Albright, T.D., Stoner, G.R., 1995. Visual motion perception. Proc. Natl. Acad. Sci. U. S. A. 92 (7), 2433–2440.
Alink, A., Singer, W., Muckli, L., 2008. Capture of auditory motion by vision is represented by an activation shift from auditory to visual motion cortex. J. Neurosci. 28 (11), 2690–2697.
Alink, A., Euler, F., Galeano, E., Krugliak, A., Singer, W., Kohler, A., 2012. Auditory motion capturing ambiguous visual motion. Front. Psychol. 2, 391.
Allen, P.G., Kolers, P.A., 1981. Sensory specificity of apparent motion. J. Exp. Psychol. Hum. Percept. Perform. 7 (6), 1318–1328.
Arnal, L.H., Wyart, V., Giraud, A., 2011. Transitions in neural oscillations reflect prediction errors generated in audiovisual speech. Nat. Neurosci. 14 (6), 797–801.
Battaglia, P.W., Jacobs, R.A., Aslin, R.N., 2003. Bayesian integration of visual and auditory signals for spatial localization. J. Opt. Soc. Am. A Opt. Image Sci. Vis. 20 (7), 1391–1397.
Bertelson, P., Radeau, M., 1981. Cross-modal bias and perceptual fusion with auditory-visual spatial discordance. Percept. Psychophys. 29 (6), 578–584.
Bertelson, P., Vroomen, J., de Gelder, B., Driver, J., 2000. The ventriloquist effect does not depend on the direction of deliberate visual attention. Percept. Psychophys. 62 (2), 321–332.
Bolognini, N., Frassinetti, F., Serino, A., Làdavas, E., 2005. "Acoustical vision" of below threshold stimuli: interaction among spatially converging audiovisual inputs. Exp. Brain Res. 160 (3), 273–282.
Brooks, A., van der Zwan, R., Billard, A., Petreska, B., Clarke, S., Blanke, O., 2007. Auditory motion affects visual biological motion processing. Neuropsychologia 45 (3), 523–530.
Burr, D., Alais, D., 2006. Combining visual and auditory information. Prog. Brain Res. 155, 243–258.
Calvert, G.A., Hansen, P.C., Iversen, S.D., Brammer, M.J., 2001. Detection of audio-visual integration sites in humans by application of electrophysiological criteria to the BOLD effect. Neuroimage 14 (2), 427–438.
Colonius, H., Diederich, A., 2010. The optimal time window of visual–auditory integration: a reaction time analysis. Front. Integr. Neurosci. 4, 11.
Ernst, M.O., Banks, M.S., 2002. Humans integrate visual and haptic information in a statistically optimal fashion. Nature 415 (6870), 429–433.
Ernst, M.O., Bülthoff, H.H., 2004. Merging the senses into a robust percept. Trends Cogn. Sci. 8 (4), 162–169.
Finlay, D., 1982. Motion perception in the peripheral visual field. Perception 11 (4), 457–462.
Freeman, E., Driver, J., 2008. Direction of visual apparent motion driven solely by timing of a static sound. Curr. Biol. 18 (16), 1262–1266.
Getzmann, S., 2007. The effect of brief auditory stimuli on visual apparent motion. Perception 36 (7), 1089–1103.
Grantham, D.W., 1986. Detection and discrimination of simulated motion of auditory targets in the horizontal plane. J. Acoust. Soc. Am. 79 (6), 1939–1949.
Hairston, W.D., Wallace, M.T., Vaughan, J.W., Stein, B.E., Norris, J.L., Schirillo, J.A., 2003. Visual localization ability influences cross-modal bias. J. Cogn. Neurosci. 15 (1), 20–29.
Hidaka, S., Teramoto, W., Sugita, Y., Manaka, Y., Sakamoto, S., Suzuki, Y., 2011. Auditory motion information drives visual motion perception. PLoS One 6 (3), e17499.
Hofbauer, M., Wuerger, S.M., Meyer, G.F., Roehrbein, F., Schill, K., Zetzsche, C., 2004. Catching audiovisual mice: predicting the arrival time of auditory–visual motion signals. Cogn. Affect. Behav. Neurosci. 4 (2), 241–250.
Hubbard, T.L., 2005. Representational momentum and related displacements in spatial memory: a review of the findings. Psychon. Bull. Rev. 12 (5), 822–851.
Jain, A., Sally, S.L., Papathomas, T.V., 2008. Audiovisual short-term influences and aftereffects in motion: examination across three sets of directional pairings. J. Vis. 8 (15), 7.1–7.13.
Kafaligonul, H., Stoner, G.R., 2010. Auditory modulation of visual apparent motion with short spatial and temporal intervals. J. Vis. 10 (12), 31.
King, A.J., 2005. Multisensory integration: strategies for synchronization. Curr. Biol. 15 (9), R339–R341.
Kitajima, N., Yamashita, Y., 1999. Dynamic capture of sound motion by light stimuli moving in three-dimensional space. Percept. Mot. Skills 89 (3 Pt 2), 1139–1158.
Lewald, J., Guski, R., 2003. Cross-modal perceptual integration of spatially and temporally disparate auditory and visual stimuli. Brain Res. Cogn. Brain Res. 16 (3), 468–478.
Lewis, R., Noppeney, U., 2010. Audiovisual synchrony improves motion discrimination via enhanced connectivity between early visual and auditory areas. J. Neurosci. 30 (37), 12329–12339.
Mateeff, S., Hohnsbein, J., Noack, T., 1985. Dynamic visual capture: apparent auditory motion induced by a moving visual target. Perception 14 (6), 721–727.
McGurk, H., MacDonald, J., 1976. Hearing lips and seeing voices. Nature 264 (5588), 746–748.
McKee, S.P., Nakayama, K., 1984. The detection of motion in the peripheral visual field. Vision Res. 24 (1), 25–32.
Meredith, M.A., Stein, B.E., 1983. Interactions among converging sensory inputs in the superior colliculus. Science 221 (4608), 389–391.
Meredith, M.A., Nemitz, J.W., Stein, B.E., 1987. Determinants of multisensory integration in superior colliculus neurons. I. Temporal factors. J. Neurosci. 7 (10), 3215–3229.
Meyer, G.F., Wuerger, S.M., 2001. Cross-modal integration of auditory and visual motion signals. Neuroreport 12 (11), 2557–2560.
Meyer, G.F., Wuerger, S.M., Röhrbein, F., Zetzsche, C., 2005. Low-level integration of auditory and visual motion signals requires spatial co-localisation. Exp. Brain Res. 166 (3–4), 538–547.
Meyer, G.F., Greenlee, M., Wuerger, S., 2011. Interactions between auditory and visual semantic stimulus classes: evidence for common processing networks for speech and body actions. J. Cogn. Neurosci. 23 (9), 2291–2308.
Middlebrooks, J.C., Green, D.M., 1991. Sound localization by human listeners. Annu. Rev. Psychol. 42, 135–159.
Morein-Zamir, S., Soto-Faraco, S., Kingstone, A., 2003. Auditory capture of vision: examining temporal ventriloquism. Brain Res. Cogn. Brain Res. 17 (1), 154–163.
Navarra, J., Vatakis, A., Zampini, M., Soto-Faraco, S., Humphreys, W., Spence, C., 2005. Exposure to asynchronous audiovisual speech extends the temporal window for audiovisual integration. Brain Res. Cogn. Brain Res. 25 (2), 499–507.
Perrott, D.R., Ambarsoom, H., Tucker, J., 1987. Changes in head position as a measure of auditory localization performance: auditory psychomotor coordination under monaural and binaural listening conditions. J. Acoust. Soc. Am. 82 (5), 1637–1645.
Perrott, D.R., Saberi, K., Brown, K., Strybel, T.Z., 1990. Auditory psychomotor coordination and visual search performance. Percept. Psychophys. 48 (3), 214–226.
Recanzone, G.H., 2009. Interactions of auditory and visual stimuli in space and time. Hear. Res. 258 (1–2), 89–99.
Sanabria, D., Lupiañez, J., Spence, C., 2007a. Auditory motion affects visual motion perception in a speeded discrimination task. Exp. Brain Res. 178 (3), 415–421.
Sanabria, D., Spence, C., Soto-Faraco, S., 2007b. Perceptual and decisional contributions to audiovisual interactions in the perception of apparent motion: a signal detection study. Cognition 102 (2), 299–310.
Senkowski, D., Talsma, D., Grigutsch, M., Herrmann, C.S., Woldorff, M.G., 2007. Good times for multisensory integration: effects of the precision of temporal synchrony as revealed by gamma-band oscillations. Neuropsychologia 45 (3), 561–571.
Slutsky, D.A., Recanzone, G.H., 2001. Temporal and spatial dependency of the ventriloquism effect. Neuroreport 12 (1), 7–10.
Soto-Faraco, S., Lyons, J., Gazzaniga, M., Spence, C., Kingstone, A., 2002. The ventriloquist in motion: illusory capture of dynamic information across sensory modalities. Brain Res. Cogn. Brain Res. 14 (1), 139–146.
Soto-Faraco, S., Spence, C., Lloyd, D., Kingstone, A., 2004. Moving multisensory research along: motion perception across sensory modalities. Curr. Dir. Psychol. Sci. 13, 29–32.
Spence, C., Senkowski, D., Röder, B., 2009. Crossmodal processing. Exp. Brain Res. 198 (2–3), 107–111.
Stein, B.E., Meredith, M.A., 1993. The Merging of the Senses. MIT Press, Cambridge, MA.
Stekelenburg, J.J., Vroomen, J., 2009. Neural correlates of audiovisual motion capture. Exp. Brain Res. 198 (2–3), 383–390.
Strybel, T.Z., Vatakis, A., 2004. A comparison of auditory and visual apparent motion presented individually and with crossmodal moving distractors. Perception 33 (9), 1033–1048.
Teder-Sälejärvi, W.A., Di Russo, F., McDonald, J.J., Hillyard, S.A., 2005. Effects of spatial congruity on audio-visual multimodal integration. J. Cogn. Neurosci. 17 (9), 1396–1409.
Thurlow, W.R., Jack, C.E., 1973. Certain determinants of the "ventriloquism effect". Percept. Mot. Skills 36 (3), 1171–1184.
To, M.P.S., Regan, B.C., Wood, D., Mollon, J.D., 2011. Vision out of the corner of the eye. Vision Res. 51 (1), 203–214.
Van Wanrooij, M.M., Bremen, P., John Van Opstal, A., 2010. Acquired prior knowledge modulates audiovisual integration. Eur. J. Neurosci. 31 (10), 1763–1771.
Wallace, M.T., Roberson, G.E., Hairston, W.D., Stein, B.E., Vaughan, J.W., Schirillo, J.A., 2004. Unifying multisensory signals across time and space. Exp. Brain Res. 158 (2), 252–258.
Welch, R.B., Warren, D.H., 1980. Immediate perceptual response to intersensory discrepancy. Psychol. Bull. 88 (3), 638–667.
Werner, S., Noppeney, U., 2010. Superadditive responses in superior temporal sulcus predict audiovisual benefits in object categorization. Cereb. Cortex 20 (8), 1829–1842.
Werner, S., Noppeney, U., 2011. The contributions of transient and sustained response codes to audiovisual integration. Cereb. Cortex 21 (4), 920–931.
Wilson, W.W., O'Neill, W.E., 1998. Auditory motion induces directionally dependent receptive field shifts in inferior colliculus neurons. J. Neurophysiol. 79 (4), 2040–2062.
Wuerger, S.M., Hofbauer, M., Meyer, G.F., 2003. The integration of auditory and visual motion signals at threshold. Percept. Psychophys. 65 (8), 1188–1196.