Deconstructing Mental Rotation

A random walk model of the classical mental rotation task is explored in two experiments. By assuming that a mental rotation is repeated until sufficient evidence for a match/mismatch is obtained, the model accounts for the approximately linearly increasing reaction times (RTs) on positive trials, flat RTs on negative trials, false alarms and miss rates, effects of complexity, and for the number of eye movement switches between stimuli as functions of angular difference in orientation. Analysis of eye movements supports key aspects of the model and shows that initial processing time is roughly constant until the first saccade switch between stimulus objects, while the duration of the remaining trial increases approximately linearly as a function of angular discrepancy. The increment results from additive effects of (a) a linear increase in the number of saccade switches between stimulus objects, (b) a linear increase in the number of saccades on a stimulus, and (c) a linear increase in the number and in the duration of fixations on a stimulus object. The fixation duration increment was the same on simple and complex trials (about 15 ms per 60°), which suggests that the critical orientation alignment takes place during fixations at very high speed. (PsycINFO Database Record (c) 2014 APA, all rights reserved).
Journal of Experimental Psychology: Human
Perception and Performance
Deconstructing Mental Rotation
Axel Larsen
Online First Publication, February 10, 2014. http://dx.doi.org/10.1037/a0035648
CITATION
Larsen, A. (2014, February 10). Deconstructing Mental Rotation. Journal of Experimental
Psychology: Human Perception and Performance. Advance online publication.
http://dx.doi.org/10.1037/a0035648
University of Copenhagen
Keywords: mental rotation, eye movements, visual working memory, random walk
Our ability to determine that objects have the same shape
despite differences in orientation or size is a classical problem in
visual perception. It was thoroughly discussed at the turn of the
19th century by Mach (1902) and since has been treated by
numerous authors (e.g., Biederman, 1987; Dodwell, 1970; Edel-
man, 1995; Furmanski & Engel, 2000; Gibson, 1969; Graf, 2006;
Hebb, 1949; Hodgetts, Hahn, & Chater, 2009; Köhler, 1929;
Lashley, 1942; Pitts & McCulloch, 1947; Rock, 1956; Tarr &
Gauthier, 1998).
In their landmark study on mental rotation some 40 years ago,
R. N. Shepard and Metzler (1971) opened a new line of attack that
was followed by related studies on visual transformations of size
(Bundesen & Larsen, 1975; Sekuler & Nash, 1972), mental scanning
of images maintained in visual short-term memory (Kosslyn,
1973; Kosslyn, 1980), and mental translation of visual images
(Larsen & Bundesen, 1998). R. N. Shepard and Metzler displayed
projections of two unfamiliar three-dimensional figures on a com-
puter screen and recorded the time subjects needed to decide
whether the objects had the same shape as a function of their
angular difference in orientation. They found that reaction time
(RT) costs increased linearly as a function of angular discrepancy
at about 1 s per 60 degrees, and that this rate was roughly the same
for rotations in the picture plane and rotations in depth.
R. N. Shepard and Metzler (1971; see also, Metzler & Shepard,
1974) offered an interpretation of their findings that was straight-
forward and intuitively compelling, but also radically different
from previous attempts to understand orientation invariance (e.g.,
Selfridge, 1959; Sutherland, 1968). The interpretation was essen-
tially based on introspection: All subjects claimed that they imag-
ined one of the figures rotated into the same orientation as the
other one and that they could carry out this mental rotation at no
greater than a certain limiting rate. The notion that mental rotation
should be conceived in close analogy to actually perceiving a
rotating physical object was developed and extensively tested in
further studies (R. N. Shepard & Cooper, 1982). Of particular note
is R. N. Shepard and Judd’s study (1976) on stroboscopic motion
in which they displayed the very same three-dimensional objects in
sequential alternation (zero ISI). With a suitable stimulus onset
asynchrony (SOA), participants reported vivid impressions of a
rigid three-dimensional object rotating back and forth in the pic-
ture plane or in depth. The critical SOA at which the impression of
rigid rotational motion broke down increased approximately lin-
early as a function of angular difference in orientation with nearly
the same slopes for rotations in the picture plane or in depth. The
linear increase suggests that there is an upper limit to the velocity
of the rotational movement. Farrell, Larsen, and Bundesen (1982)
showed that the limit relates to angular velocity, and not the linear
velocity of the fastest moving subpattern.
Further evidence on the close relationship between transforma-
tion of visual images in mental rotation and visual motion percep-
tion comes from studies of the motion aftereffect (MAE), which
show that the MAE interferes with mental rotation (see, e.g.,
Corballis & McLaren, 1982; Heil, Bajric, Rösler, & Henning-
hausen, 1997; Jolicoeur, Corballis, & Lawson, 1998; Seurinck, de
Lange, Achten, & Vingerhoets, 2011). In line with the investigations
in visual psychophysics, a meta-analysis of 32 investigations
of brain activations during mental rotation (Zacks, 2008) reveals
that all experiments using transformation-specific contrasts (i.e.,
within-task comparisons of effects of mental rotation, e.g., comparing
large rotations with small rotations) have found activations
located about (47.5, 59.5, 10.0, in Talairach space) that
corresponds to the visual motion area (V5/MT).
Axel Larsen, Center for Visual Cognition, Department of Psychology,
University of Copenhagen, Denmark.
This research was financially supported by the Nordic Council (NOS-S)
to the Nordic Center of Excellence in Cognitive Control. I thank Claus
Bundesen for critical comments and helpful suggestions, and Martin Lange
for programming synchronization protocols between Eyelink II and the PC
graphic display system.
Correspondence concerning this article should be addressed to Axel
Larsen, Center for Visual Cognition, Department of Psychology, University
of Copenhagen, Øster Farimagsgade 2A, DK–1353 Copenhagen K,
Denmark. E-mail: Larsen.axel@psy.ku.dk
This document is copyrighted by the American Psychological Association or one of its allied publishers.
This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.
Journal of Experimental Psychology:
Human Perception and Performance
© 2014 American Psychological Association
2014, Vol. 40, No. 2, 000
0096-1523/14/$12.00 DOI: 10.1037/a0035648
R. N. Shepard and Metzler’s (1971) interpretation of mental
rotation as a unitary or holistic visual process has been disputed
in numerous studies ever since (e.g., Anderson, 1978; Folk &
Luce, 1987; Liesefeld & Zimmer, 2013; Pylyshyn, 1973, 2003).
Besides the philosophical and theoretical aspects of the dispute, the
experimental support for Shepard and Metzler’s original claim also
has been questioned. In particular it has turned out that estimates
of the rate of mental rotation vary tremendously as a function of
stimulus complexity, stimulus familiarity, training, and similarity
within negative stimulus pairs (e.g., Bethell-Fox & Shepard, 1988;
Cohen & Kubovy, 1993; Dahlstrom-Hakki, Pollatsek, Fisher,
Miller, & Rayner, 2008; Folk & Luce, 1987; Förster, Gebhardt,
Lindlar, Siemann, & Delius, 1996; Pylyshyn, 1979; Yuille &
Steiger, 1982; but see also, Cooper & Podgorny, 1976). In addi-
tion, the systematic deviation from linear RT functions in some
studies (Bundesen, Larsen, & Farrell, 1981; Cooper & Shepard,
1973) is puzzling and difficult to reconcile with R. N. Shepard and
Metzler’s original interpretation (but see, Searle & Hamm, 2012).
This paper has two main goals. The first goal is to demonstrate
a principle that, in part, may explain the highly inconsistent esti-
mates of mental rotation velocity. This is done by an explicit
computational model of performance in a typical mental rotation
task. The model assumes that limited capacity in visual short-term
memory (VSTM) severely constrains the amount of information
that is encoded from one of the stimuli as a (more or less degraded)
visual image, and hence that when the transformed visual image is
matched against the other stimulus the evidence does not suffice to
meet the requirement to respond as quickly as possible, while
keeping errors low. Therefore this process of encoding, transfor-
mation, and match, is repeated until the evidence sampled suffices
to meet instructions. Unlike structural models (e.g., Biederman,
1987), the random walk model is based on well-established prop-
erties of visual short-term memory.
Visual short-term memory (STM) capacity is modest. Sperling
(1960) showed that the number of letters we can read off from a
visual image of a briefly exposed stimulus display is about four or
five. Later studies (e.g., Bundesen, 1990; Bundesen, Pedersen, &
Larsen, 1984; Luck & Vogel, 1997; Pashler, 1988; Shibuya &
Bundesen, 1988) generally reported a somewhat lower VSTM
capacity of three or four alphanumeric items. The capacity to
retain colored visual shapes (Todd & Marois, 2004; Vogel &
Machizawa, 2004) is also about three or four, but when the
composition of stimuli in terms of component features gets more
complex, capacity, that is, number of objects retained, seems to
decrease (e.g., Alvarez & Cavanagh, 2004; Sørensen & Kyllings-
bæk, 2012; Wheeler & Treisman, 2002).
Is the VSTM store hypothesized in these studies identical to the
store in which a few subpatterns (or features) of a stimulus are
encoded and subsequently transformed with respect to orientation?
Or, to put it more directly, is mental rotation of visual images done
in VSTM? To answer the question it is instructive to note that the
successive stimulus presentation in the popular change detection
paradigm (Luck & Vogel, 1997; Pashler, 1988; Phillips, 1974),
which is often used to estimate VSTM capacity, is not essentially
different from the stimulus presentation in the successive matching
paradigm (see later) that is used to invoke mental rotation when the
stimuli differ with respect to orientation. Basically the two para-
digms only differ with respect to the task participants are requested
to solve. In change detection, the task is to report if the second
stimulus is changed relative to the first one. Instructions in
successive matching are also to determine whether the second stimulus
has changed relative to the first, except for irrelevant
changes in orientation. It may be that different aspects of the first
stimulus are encoded in the two paradigms, but it is natural to
assume that stimulus encoding in each case is done into VSTM.
Two recent studies (Hyun & Luck, 2007; Prime & Jolicoeur,
2010) also suggested an affirmative answer. Hyun and Luck used
a dual task paradigm and found interference between storage of
colors supposedly maintained in VSTM and mental rotation, but
no interference on mental rotation with non-VSTM short-term
storage of positions in space. Prime and Jolicoeur (2010) moni-
tored evoked potentials released by mental rotation and showed
that the duration of the electrophysiological signature that is cou-
pled to maintenance of information in VSTM correlated with
angular difference, and thus by hypothesis to the duration of
mental rotation.
The second goal is to map the contributions and possible
organization of the various processing components that generate
the systematic linear increase in response times as a function of
angular difference in orientation between stimuli. This is done in
Experiment 2 by monitoring saccades and fixations during task
performance, and by systematically relating their number and
duration to the difference in the orientation.
Experiment 1
In this experiment the principal goal was to explore if visual
performance in the classical R. N. Shepard and Metzler (1971)
paradigm may emerge as a result of repeatedly executing a mental
rotation of a more or less schematic visual image of one of the
stimulus objects followed by a comparison of the transformed
visual image with the other stimulus. To achieve this goal mental
rotation was investigated with two different paradigms: Simulta-
neous matching in which two stimuli in different orientation are
presented side by side at the same time, and successive matching
in which stimuli in different orientation are shown one at a time in
succession. To solve the successive matching task the first stimu-
lus must be encoded and maintained as a visual image in VSTM
until the onset of the second stimulus, and then brought into
alignment with the second stimulus. A mental rotation can only be
made once in this case. In contrast, multiple mental rotations may
be done in the simultaneous matching task. In particular, it should
be possible to determine whether performance in the simultaneous
matching task can be modeled as repetitions of the mental rotation
that is done but once in the successive matching task. For conve-
nience, the mental rotation that is done but once in successive
matching and the mental rotation that by hypothesis is done
repeatedly in simultaneous matching are both labeled as simple
mental rotations. In both cases, mental rotation is presumably done
while the observer by one or more fixations and saccades inspects
one stimulus object at a time.
The notion that mental rotation inferred from linearly increasing
RTs as a function of angular discrepancy may be based on repeated
application of a simple mental rotation is not new, but it has never
been pursued in detail, let alone rigorously specified in a compu-
tational model. It derives from observations made by several
researchers (e.g., Carpenter & Just, 1978; Metzler & Shepard,
1974) that when (fairly complex) stimuli are displayed side by side
at the same time, participants may often go back and forth between
stimuli several times before reaching a decision. Presumably they
do this because their VSTM capacity only allows them to encode,
mentally rotate, and compare a schematic image or a few frag-
ments of a stimulus at a time.
A natural formalization of these observations is to model visual
performance in the classical mental rotation task with two simul-
taneously displayed tri-dimensional objects as a random walk with
a constant step size between two boundaries (thresholds) that
represent the evidence needed for reaching a positive (match) or
negative (mismatch) decision. The walk continues one step at a
time until one of the boundaries is reached. A step toward one of
the boundaries cancels a step toward the other boundary. Boundary
values and the probability of stepping toward the positive or
negative threshold are treated as free parameters.
For each random step then, a global processing module is
executed. For positive stimulus pairs the module comprises encoding
a schematic visual image of one of the objects, a simple mental
rotation of the visual image of the object, and a subsequent test for
a match against the other object. Each time the module is executed,
one unit of perceptual evidence is accumulated. On trials with
congruent stimulus pairs the probability of collecting a unit of
positive evidence is represented by the parameter $p^{+}$. The probability
of collecting a unit of negative evidence is $1 - p^{+}$. The
processing module is repeated until the accumulated evidence
exceeds a positive ($\theta^{+}$) or a negative threshold ($\theta^{-}$).
For the incongruent (negative) stimulus pairs, the same process-
ing module of stimulus encoding, mental rotation, and comparison
is executed. The probability of drifting one unit toward the negative
threshold is then set to $p^{-}$, and the probability for drifting
toward the positive threshold to $1 - p^{-}$. When the drift parameters
$p^{+}$ and $p^{-}$ and thresholds $\theta^{+}$ and $\theta^{-}$ are known, the general theory
of random walks provides explicit formulas for calculating false
alarm and miss rates and mean number of steps to reach upper and
lower thresholds (see, e.g., Bundesen, 1982; Feller, 1970). Mean
number of steps multiplied by the time to execute the encode-
rotate-compare processing module added to a base RT was used to
predict positive and negative RTs.
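The verbal model above can be illustrated with a small simulation. This is a minimal sketch and not the author's implementation: the function names, the whole-number thresholds, and the rule that the walk stops as soon as a boundary is reached are assumptions made for illustration.

```python
import random


def simulate_trial(p_pos, theta_pos, theta_neg, rng):
    """Simulate one trial of the two-boundary random walk.

    p_pos     -- probability that one encode-rotate-compare cycle yields
                 a unit of positive (match) evidence
    theta_pos -- units of net evidence needed for a "match" response
    theta_neg -- units of net evidence needed for a "mismatch" response
    rng       -- a random.Random instance (passed in for reproducibility)
    Returns (decision, n_steps).
    """
    evidence = 0  # the walk starts at zero evidence
    n_steps = 0
    while -theta_neg < evidence < theta_pos:
        # A positive unit cancels a negative unit, so evidence is a net count.
        evidence += 1 if rng.random() < p_pos else -1
        n_steps += 1
    decision = "match" if evidence >= theta_pos else "mismatch"
    return decision, n_steps


def mean_steps(p_pos, theta_pos, theta_neg, n_trials=10_000, seed=0):
    """Monte Carlo estimate of the mean number of steps per decision type."""
    rng = random.Random(seed)
    totals = {"match": [0, 0], "mismatch": [0, 0]}  # [sum of steps, count]
    for _ in range(n_trials):
        decision, n = simulate_trial(p_pos, theta_pos, theta_neg, rng)
        totals[decision][0] += n
        totals[decision][1] += 1
    return {k: (s / c if c else float("nan")) for k, (s, c) in totals.items()}
```

On a simulated positive trial, the share of "mismatch" decisions plays the role of the miss rate, and the mean step count for "match" decisions is the quantity that is multiplied by the module time and added to the base RT.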
Studies of mental rotation generally show that error rates increase
as a function of angular difference, which suggests that response
criteria change as a function of angular difference. A convenient and
straightforward way to accommodate this in a random walk model
framework is to let positive and negative thresholds vary as a linear
function of angular difference in orientation.
Method
Participants. Three paid undergraduates and one graduate
student between 24 and 29 years participated in the study. The
participants were males, naïve with respect to the purpose of this
experiment, but otherwise fairly well-trained experimental partic-
ipants. All had normal or corrected-to-normal vision.
Stimuli and apparatus. The stimulus material was a subset of
the original stimuli¹ used by R. N. Shepard and Metzler (1971).
The set comprised five three-dimensional prototypes portrayed
from seven different views, from which any angular difference
between 0° and 180° in steps of 20° could be constructed. The
stimuli were displayed on a computer monitor. Each view was
inscribed in a circle with a diameter of 14 cm on the screen and
viewing distance was 60 cm. There were two conditions: A simul-
taneous condition in which two views of the same prototype were
shown at the center of the screen side by side at the same time until
the participant responded, and a successive condition in which the
two views were displayed one at a time at the center of the screen.
In the simultaneous condition the shortest distance between the
perimeters of the inscribing circles was 3 cm. In the successive
condition the first view was displayed for 1,500 ms, and the second
view remained visible until the participant responded. The inter-
stimulus interval between the first and the second view was 1,500
ms. Intertrial intervals were 1,250 ms in the simultaneous condition
and 1,750 ms in the successive condition.
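For readers who prefer visual angle, the sizes reported above can be converted with the standard formula 2·atan(size / (2·distance)). The sketch below is an added illustration, not part of the original method; the function name is hypothetical, and a flat screen viewed head-on is assumed.

```python
import math


def visual_angle_deg(size_cm, distance_cm):
    """Visual angle (degrees) subtended by an extent centered on the line of sight."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


# Circle inscribing each view: 14 cm at a viewing distance of 60 cm.
stimulus_angle = visual_angle_deg(14.0, 60.0)   # roughly 13.3 degrees
# Shortest gap between the two circles in the simultaneous condition: 3 cm.
gap_angle = visual_angle_deg(3.0, 60.0)         # roughly 2.9 degrees
```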
In the simultaneous matching condition the angular difference
between the views of the prototypes was 0°, 20°, 40°, 60°, 80°,
100°, 120°, 140°, 160°, and 180° about the vertical axis. For each
prototype and for each angular difference in orientation there was one
positive and one negative trial. Negative trials were identical to
positive trials except for a replacement of one pattern with its mirror
image with respect to the frontal plane. A block of simultaneous
matching trials thus comprised 50 positive and 50 negative trials.
The composition of stimulus pairs in the successive matching
condition was the same as in the simultaneous matching condition.
Because error rates approached 40 to 50% at large differences in
orientation in pilot experiments, only a restricted range (0°, 20°,
40°, 60°, 80°) of five angular differences in orientation was investigated.
A block of successive matching trials thus comprised 25
positive and 25 negative trials. The direction of shortest rotation
path was always the same: In the successive matching condition
the leftmost parts of the stimuli were always rotated about the
vertical axis away from the viewer into the screen; in the simul-
taneous condition the shortest path was to turn the leftmost parts of
the left stimulus about the vertical axis into the screen. There were
eight blocks of successive matching trials interleaved with eight
blocks of simultaneous matching trials in an ABAB . . . sequence
starting with the simultaneous matching condition (two partici-
pants) or with the successive matching condition (two partici-
pants). Within each block the sequence of trials was randomized
anew for each participant.
Procedure. The participants were tested individually and
were asked to determine if the two patterns in a pair were congru-
ent, disregarding differences in angular departure, if any, and to
respond quickly with only a few errors. Accuracy was fed back
after each trial in the lower left corner of the screen. Participants
practiced for about a quarter of an hour and were informed of the
possible changes of the direction and magnitude (up to 180°) of the
orientation of the stimuli. The experiment was self-paced between
blocks and took about 3 to 4 hr to complete, including breaks.
¹ Roger Shepard and Michael Tarr have made copies of the original
tri-dimensional stimuli publicly available. The stimulus material was
retrieved from http://www.cog.brown.edu/~tarr/stimuli.html#sh
Results
Subjective reports. The participants were debriefed after
completing the experiment. They reported that they focused on one
or more characteristic subpatterns on the first stimulus object in the
successive matching condition and at the same time tried to keep
the general spatial layout of the stimulus in mind. At the onset of
the second stimulus, they looked for the memorized characteristic
subpatterns and mentally rotated the image to fit the corresponding
subpatterns in the stimulus.
When the stimuli were displayed side by side at the same time,
the participants used a similar strategy of encoding one of the
stimuli with particular attention to prominent details or subpat-
terns. Corresponding details in the other stimulus were then iden-
tified and confirmed (or falsified) after mentally rotating the image
to fit the other stimulus. In many cases this process was repeated,
sometimes after encoding new characteristic features. In some
cases the participants claimed that they immediately realized that
stimulus patterns were mirror image pairs.
Response latencies. All RTs for correct responses were ana-
lyzed. Figures 1 and 2 illustrate group mean RTs and response
accuracies for congruent and incongruent pairs, respectively. The
angular difference within negative pairs in Figure 2 corresponds to
the angle between the congruent stimuli prior to the replacement of
one of the stimulus patterns by its reflection in the frontal plane.
To determine effects of mode of presentation (simultaneous vs.
successive matching) angular difference was only analyzed at the
five levels (0°, 20°, 40°, 60°, 80°) common to both modes. The
overall effects of mode of presentation, F(1, 3) = 22.88, p = .02,
$\eta_p^2$ = .88, type of stimulus pair (positive vs. negative), F(1, 3) =
15.45, p = .03, $\eta_p^2$ = .84, and angular difference in orientation,
F(4, 12) = 10.65, p = .02, $\eta_p^2$ = .78, were significant.² The
interaction between angular difference in orientation and type
(positive or negative) of stimulus pair, F(4, 12) = 6.97, p = .03,
$\eta_p^2$ = .70, was also significant.
Regardless of mode of presentation, the effect of angular difference
was only significant on positive trials, F(4, 12) = 25.79,
p = .01, $\eta_p^2$ = .90 and F(4, 12) = 8.80, p = .03, $\eta_p^2$ = .75, in the
simultaneous and successive matching task, respectively. There
was a reliable linear component in both tasks, F(1, 3) = 17.22, p =
.025, $\eta_p^2$ = .85, successive matching, and F(1, 3) = 45.86, p = .01,
$\eta_p^2$ = .94, simultaneous matching, respectively. The interaction
between mode of presentation (see Figure 1) and angular difference
was reliable, F(4, 12) = 13.53, p = .02, $\eta_p^2$ = .82.
Random walk model. Modeling the visual behavior of each
participant in the simultaneous matching condition is based on the
assumption that a subset of the cognitive procedures that can only
be executed once in the successive matching condition is repeat-
edly executed (serially) in the simultaneous matching condition
until the accumulated evidence for a match or mismatch exceeds a
fixed threshold. The reciprocal of the rate of mental rotation, $\alpha$, is
treated as a free parameter in the random walk model. For comparison
purposes this rate, and the inverse of the rate of mental
rotation estimated by the linear slope constants in least chi-square
fits to successive and simultaneous matching RTs ($\alpha_{\mathrm{Succ}}$ and $\alpha_{\mathrm{Sim}}$,
respectively), are displayed in Table 1.
Let the duration of the encoding process in the simultaneous
matching condition be $t_{\mathrm{Encode}}$, let the time taken to mentally rotate a
visual image through the angle $v$ be $\alpha v$, where $\alpha$ is a constant,³ and let
the duration of the comparison process be $t_{\mathrm{Compare}}$. Then the time
taken to execute these processes once equals $t_{\mathrm{Encode}} + \alpha v + t_{\mathrm{Compare}}$,
and the total RT when repeating them $n$ times is given by

$$RT = RT_{\mathrm{Baseline}}^{+} + n\,(T_{\mathrm{EncodeCompare}} + \alpha v), \quad \text{for positive responses,} \tag{1a}$$

$$RT = RT_{\mathrm{Baseline}}^{-} + n\,T_{\mathrm{EncodeCompare}}, \quad \text{for negative responses,} \tag{1b}$$

where $RT_{\mathrm{Baseline}}^{+}$ and $RT_{\mathrm{Baseline}}^{-}$ represent base RTs for positive and
negative responses, and $t_{\mathrm{Encode}} + t_{\mathrm{Compare}}$ is collapsed into the
parameter $T_{\mathrm{EncodeCompare}}$. It seems likely that participants also
encode, mentally rotate, and match a schematic visual image on
negative trials. Presumably this adds a constant latency regardless
of angular difference to every negative response. This constant
latency may be represented as a component of the $RT_{\mathrm{Baseline}}^{-}$ parameter.
In Experiment 1, I made the simplifying assumption that
participants on average on the negative pairs did perform a mental
rotation corresponding to an angular difference of 90°. Thus,
Equation 1b was replaced by Equation 1c.

$$RT = RT_{\mathrm{Baseline}}^{-} + n\,(T_{\mathrm{EncodeCompare}} + \alpha \cdot 90). \tag{1c}$$
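As a worked illustration of Equations 1a and 1c, the sketch below computes a predicted RT from a step count and a set of timing parameters. The function names and all numeric values are hypothetical; they are not the estimates reported in Table 1.

```python
def predicted_rt(n_steps, rt_baseline, t_encode_compare, alpha, angle_deg):
    """RT = baseline + n * (T_EncodeCompare + alpha * angle), as in Eq. 1a/1c."""
    return rt_baseline + n_steps * (t_encode_compare + alpha * angle_deg)


def predicted_rt_negative(n_steps, rt_baseline_neg, t_encode_compare, alpha):
    """Eq. 1c: negative trials are assumed to involve a 90-degree rotation."""
    return predicted_rt(n_steps, rt_baseline_neg, t_encode_compare, alpha, 90.0)


# Hypothetical parameter values (ms, ms, ms per degree); not estimates from Table 1.
example = predicted_rt(n_steps=2.0, rt_baseline=500.0,
                       t_encode_compare=100.0, alpha=5.0, angle_deg=60.0)
```

With these made-up values, two repetitions at 60° cost 2 × (100 + 300) ms on top of the 500 ms base, so `example` is 1300.0 ms. Because the step count enters multiplicatively, the apparent rotation rate depends on how often the module is repeated.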
The theory of random walks (see Feller, 1970) provides the mathematical
foundation for computing the number of repetitions $n$ to
reach a specified state of evidence, when the probability of gathering
one unit of evidence as a result of the comparison is known.
To model visual performance, let the initial evidence for a positive
or negative response at the beginning of a trial be zero, and let the
probability of moving toward the positive threshold $\theta^{+}$ and accumulating
one unit of evidence in favor of a match be $p$. Then, the
probability of moving toward the negative threshold $\theta^{-}$ and accumulating
one unit of evidence for a mismatch equals $1 - p$. Evidence is
accumulated such that one unit of evidence favoring a match cancels
one unit of evidence favoring a mismatch. As proven by Feller (1970,
p. 353; see also Bundesen, 1982; Larsen, McIlhagga, & Bundesen,
1999) the probability ($u_n$) of reaching the negative threshold after just
$n$ steps is given by

$$u_n = a^{-1}\, 2^{n}\, p^{(n-z)/2}\, (1-p)^{(n+z)/2} \sum_{i=1}^{a-1} \cos^{n-1}\!\left(\frac{\pi i}{a}\right) \sin\!\left(\frac{\pi i}{a}\right) \sin\!\left(\frac{\pi z i}{a}\right), \quad \text{for } n > z, \tag{2}$$

where $a$ is the distance between $\theta^{+}$ and $\theta^{-}$, and $z$ is the distance from
$\theta^{-}$ to zero. If $n < z$, then $u_n = 0$, and if $n = z$, then $u_n = (1 - p)^{n}$. The
probability ($U_n$) of reaching the negative threshold after $n$ steps or
less is given by

$$U_n = \sum_{i=1}^{n} u_i. \tag{3}$$
² Repeated-measures analysis of variance with alpha level equal to .05
throughout, and with Greenhouse-Geisser adjustment when applicable ($\eta_p^2$
represents partial eta-squared).
³ The reciprocal of the linear slope constant $\alpha$ is usually interpreted as
the velocity of mental rotation. However, as will be clear from the analysis
of eye movements, $\alpha$ represents the combined effect of saccades and
fixations.
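By substituting $1 - p$ for $p$ and $a - z$ for $z$ in Equation 2, the
probability of reaching the positive threshold after $n$ steps or less
is given by

$$V_n = \sum_{i=1}^{n} u_i. \tag{3a}$$

The probability of reaching the negative threshold $\theta^{-}$ as $n$ tends to
infinity (see Feller, 1970, p. 345) ($U_\infty$) is given by

$$U_\infty = 1 - \frac{z}{a}, \quad \text{if } p = .5, \tag{4a}$$

and

$$U_\infty = \frac{\left(\frac{1-p}{p}\right)^{a} - \left(\frac{1-p}{p}\right)^{z}}{\left(\frac{1-p}{p}\right)^{a} - 1}, \quad \text{if } p \neq .5. \tag{4b}$$

For positive stimulus pairs $1 - U_\infty$ estimates the probability of
reaching the positive threshold $\theta^{+}$ (hits). For negative stimulus
pairs $U_\infty$ is a direct estimate of accuracy (correct rejections).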
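The limiting probability in Equations 4a and 4b is the classical gambler's-ruin result and is easy to compute directly. The following is a minimal sketch; the function name and the tolerance used to detect p = .5 are assumptions.

```python
def prob_reach_negative(p, a, z):
    """Probability of eventually reaching the negative threshold (Eqs. 4a/4b).

    p -- probability of a step toward the positive threshold (0 < p < 1)
    a -- distance between the two thresholds
    z -- distance from the negative threshold to the starting point (0 < z < a)
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if abs(p - 0.5) < 1e-12:
        return 1.0 - z / a                                   # Eq. 4a
    ratio = (1.0 - p) / p
    return (ratio ** a - ratio ** z) / (ratio ** a - 1.0)    # Eq. 4b
```

For a positive pair, `1 - prob_reach_negative(p, a, z)` gives the predicted hit rate. For a negative pair, where the drift probability is defined toward the negative threshold, the same function would be called with one minus that probability as `p`, and the result is the predicted rate of correct rejections.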
The mean number of steps to reach the positive threshold on
positive trials was obtained by increasing $n$ until $V_\infty - V_n < 10^{-8}$
and for each step size, $n$, adding the product of $n$ by the corresponding
probability given by Equation 2, in which $1 - p$ is
substituted for $p$ and $a - z$ for $z$. Likewise mean number of steps
to the negative threshold on negative trials was computed by
increasing $n$ until $U_\infty - U_n < 10^{-8}$ and for each step size, $n$, adding
the product of $n$ by the corresponding probability given by Equation 2.
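The summation procedure just described can be sketched as follows. This is an illustration under stated assumptions, not the original program: the function names are hypothetical, `a` and `z` are taken to be integers, and `max_steps` is an added safeguard against floating-point overflow in the `2 ** n` factor.

```python
import math


def first_passage_prob(n, p, a, z):
    """Eq. 2: probability of first reaching the negative threshold at step n."""
    if n < z:
        return 0.0
    q = 1.0 - p
    if n == z:
        return q ** n
    series = sum(
        math.cos(math.pi * i / a) ** (n - 1)
        * math.sin(math.pi * i / a)
        * math.sin(math.pi * z * i / a)
        for i in range(1, a)
    )
    return (2.0 ** n / a) * p ** ((n - z) / 2) * q ** ((n + z) / 2) * series


def prob_reach_negative(p, a, z):
    """Eqs. 4a/4b: probability of ever reaching the negative threshold."""
    if abs(p - 0.5) < 1e-12:
        return 1.0 - z / a
    ratio = (1.0 - p) / p
    return (ratio ** a - ratio ** z) / (ratio ** a - 1.0)


def weighted_step_sum(p, a, z, tol=1e-8, max_steps=1000):
    """Accumulate n * u_n until U_inf - U_n < tol, as described in the text.

    Returns (sum of n * u_n, U_n at the stopping point).
    """
    u_inf = prob_reach_negative(p, a, z)
    cumulative = 0.0
    weighted = 0.0
    n = 0
    while u_inf - cumulative >= tol and n < max_steps:
        n += 1
        u_n = first_passage_prob(n, p, a, z)
        cumulative += u_n
        weighted += n * u_n
    return weighted, cumulative
```

Calling `weighted_step_sum(1 - p, a, a - z)` applies the substitution given in the text for the positive threshold. The text does not say whether the sum was normalized; dividing the first returned value by the second would give the mean number of steps conditional on reaching that threshold.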
Random walk model fit. The model has 10 free parameters:
two baseline RTs (one for each type of response: $RT_{\mathrm{Baseline}}^{+}$ and
$RT_{\mathrm{Baseline}}^{-}$), and one common encode and comparison time parameter
($T_{\mathrm{EncodeCompare}}$) for positive and negative stimulus pairs; two
linear threshold functions $\theta^{+}$ and $\theta^{-}$ of angular difference in
orientation with zero intercepts $\theta_{0}^{+}$ and $\theta_{0}^{-}$, respectively, and slope
constants $\theta_{\mathrm{slope}}^{+}$ and $\theta_{\mathrm{slope}}^{-}$, respectively; two drift parameters, the
probability, $p^{+}$, of drifting toward the positive threshold for positive
pairs, and the probability, $p^{-}$, of drifting toward the negative
threshold for negative pairs; and the processing time increment
with angular difference in orientation ($\alpha$), for each run of the
mental rotation module.
Noninteger threshold values were treated as probability mix-
tures such that a threshold of 2.1, for example, was computed as a
mixture of a threshold of two with probability .9 and a threshold of
three with a probability of .1.
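The mixture rule for noninteger thresholds, together with the linear threshold functions, can be sketched as below. The function names are hypothetical, and the sketch handles a single threshold at a time; how the two thresholds were combined when both were noninteger is not specified in the text.

```python
import math


def linear_threshold(intercept, slope, angle_deg):
    """Threshold as a linear function of angular difference (in degrees)."""
    return intercept + slope * angle_deg


def threshold_mixture(theta):
    """Split a noninteger threshold into (integer threshold, weight) pairs."""
    lower = math.floor(theta)
    weight_upper = theta - lower
    if weight_upper == 0.0:
        return [(lower, 1.0)]
    return [(lower, 1.0 - weight_upper), (lower + 1, weight_upper)]


def mixture_prediction(theta, predict):
    """Weight predict(k), computed at integer thresholds k, by the mixture."""
    return sum(weight * predict(k) for k, weight in threshold_mixture(theta))
```

For example, `threshold_mixture(2.1)` yields a threshold of 2 with weight about .9 and a threshold of 3 with weight about .1, matching the example in the text up to floating-point rounding.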
For each participant the random walk model was fitted to mean
RTs and response accuracies (1.0) by minimizing the total
chi-square deviation between predicted and observed values. For
each data point a chi-square deviation with one degree of freedom
was computed by squaring the deviation between observed and
predicted means and dividing the result by the square of t he standard
Reaction Time (ms)
500
1000
1500
2000
2500
3000
3500
Simultaneous
Successive
Angular Difference in Orientation
0 20 40 60 80 100 120 140 160 180
Accuracy
0.75
0.80
0.85
0.90
0.95
1.00
Figure 1. Experiment 1: Top panel shows group mean reaction times (RTs) for correct responses to congruent
(positive) stimulus pairs as a function of angular difference in orientation with stimulus presentation mode
(successive vs. simultaneous) as a parameter. Successive matching RTs are fitted by a least squares straight line.
Dashed lines show theoretical fits by the random walk model, solid lines show fits to successive matching data.
Bottom panel: Proportion of correct responses. The solid curved line is a spline interpolated fit to accuracy data
in the successive matching condition. Vertical bars around each symbol show the standard error of that mean
based on the corresponding means from the four individual participants.
error of the mean. By summing over all computable chi-square
deviations, goodness of fit was then measured as the sum of
chi-squares with the degrees of freedom equal to the number of
data points (40), minus the number of free parameters in the
random walk model (10), minus the number of data points for
which the standard error of the mean could not be computed.
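The goodness-of-fit measure described here can be sketched as below. The function name and the convention of marking an uncomputable standard error as `None`, NaN, or a nonpositive value are assumptions made for illustration; the minimization over model parameters is not shown.

```python
import math

def chi_square_fit(observed, predicted, sem, n_free_params):
    """Sum of one-df chi-square deviations and the associated degrees of freedom.

    Each deviation is ((observed - predicted) / SEM) ** 2. Data points whose
    SEM cannot be computed are skipped and subtracted from the df.
    """
    chi2 = 0.0
    n_skipped = 0
    for obs, pred, se in zip(observed, predicted, sem):
        if se is None or math.isnan(se) or se <= 0:
            n_skipped += 1          # SEM not computable: point is not scored
            continue
        chi2 += ((obs - pred) / se) ** 2
    df = len(observed) - n_free_params - n_skipped
    return chi2, df
```

For instance, with 40 data points, 10 free parameters, and two points without a computable standard error, the function returns 28 degrees of freedom.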
Table 1 shows model parameter values and goodness of fit for
each participant, and in Figures 1 and 2 the dashed lines fitted to
the filled circles show the fit of the model to the data. The fits were
acceptable for each participant per se as well as for the group of
participants. The mean correlations between predicted and observed
data, based on individual fits for each participant, were .91 (RTs)
and .80 (accuracy). For observed and predicted grand means, the
corresponding correlations were higher: .96 (RTs) and .89 (accuracy).
Discussion
The patterns of RTs and rates of correct responses agree with
previous findings: an approximately linear increase in positive
response latencies as a function of angular discrepancy (e.g., Borst,
Kievit, Thompson, & Kosslyn, 2011; Just & Carpenter, 1976;
Larsen, 1985; R. N. Shepard & Cooper, 1982; R. N. Shepard &
Metzler, 1971; S. Shepard & Metzler, 1988). The linear slope
constant was largest in the simultaneous condition, in agreement
with the computational model, according to which the encode, mental
rotation, and match module is usually run more than once on a
trial. This difference between successive and simultaneous
presentation has been established previously by Cohen and Kubovy
(1993; for related data, see also S. Shepard & D. Metzler, 1988)
and parallels an analogous result for mental transformations of size
(Larsen et al., 1999).
Random walk model. Discounting paradigms in which observers, in
response to a cue, imagine a stimulus in the cued orientation (e.g.,
Bethell-Fox & Shepard, 1988; Cooper, 1975, 1976), most reported
research on mental rotation tends to dismiss negative response
latencies and error rates. In contrast, the random walk model accounts
for both positive and negative latencies as well as miss and
false-alarm rates. The theoretical fits show that the basic principles
embodied in the model provide a plausible account of both positive and
negative response latencies as well as errors. Overall, the drift and
baseline parameters seem reasonable. The estimated positive baseline
response time of some 105 ms for participant MO does not seem
realistic, however. A comparison of participants MO and MA suggests
that the model to some extent allows for a trade-off between baseline
RTs and the TEncodeCompare parameter.
The random walk model supports the notion that repeated ap-
plication of an assembly of a few elementary processes captures
the main aspects of performance on positive trials in the classical
mental rotation task. The model also explains performance on
negative trials, both RTs and false alarms, by the very same
principles, namely, by assuming that a mental rotation of the image
[Figure 2 graphic. Top panel: Reaction Time (ms) as a function of Angular Difference in Orientation (0 to 180), plotted separately for Simultaneous and Successive presentation. Bottom panel: Accuracy (0.75 to 1.00) over the same range.]
Figure 2. Experiment 1: Top panel shows group mean reaction times (RTs) for correct responses to incon-
gruent (negative) stimulus pairs as a function of angular difference in orientation with stimulus presentation
mode (successive vs. simultaneous) as a parameter. Successive matching RTs are fitted by a least squares straight
line. Dashed lines show theoretical fits by the random walk model, solid lines show fits to successive matching
data. Bottom panel: Proportion of correct responses. The solid curved line is a spline interpolated fit to accuracy
data in the successive matching condition. Vertical bars around each symbol show the standard error of that mean
based on the corresponding means from the four individual participants.
of one of the stimuli, followed by a match of the transformed image
with the other stimulus, is repeated until the accumulated evidence
crosses a positive or negative threshold. In particular, negative RTs
were fitted by assuming that participants on average did a mental
rotation of 90° (cf. Equation 1c).
The angular difference between negative stimulus pairs is not
well defined. Another way to process negative stimulus pairs
would be to gradually rotate and match the encoded image against
the other stimulus until the image had been found to mismatch in all
orientations around the half-circle, and then respond “different.”
This exhaustive search strategy is not very likely. It implies that
negative RTs should exceed positive RTs at 180°, which is clearly
not the case (see Figures 1 and 2), and it goes against the
assumption (cf. Larsen & Bundesen, 2009) that the trajectory of the
visual image in VSTM is computed prior to the actual transformation.
Consider next the hypothesis that visual behavior in the simultaneous
matching task is a partial replication of visual behavior in the
successive matching task; in particular, that the simple mental
rotation in the successive matching task is repeated in the
simultaneous matching task. On this account one may expect that the
slope of the RT function in the successive matching task (the Succ
slope in Table 1) is close to the slope of the simple mental rotation
function. Table 1 does not provide support for this idea, and the
findings in Experiment 2 show that the hypothesis is not tenable.
In sum, the random walk model framework for achieving ori-
entation invariance accounts for RTs and errors. The framework
integrates a serially ordered triad of processes comprising encod-
ing into VSTM, transformation of the content in VSTM, and match
of the transformed image with a stimulus object. On the assump-
tion that the sequence of eye movements mirrors the sequence of
this triad, the random walk model predicts the number of eye
movement switches back and forth between stimulus objects dur-
ing task performance. Thus, a very strong test of the random walk
model may be obtained by testing predictions of RTs, errors, and
eye-movement switches back and forth between patterns.
Experiment 2
Experiment 2 had two main goals: First, to test whether the
comprehensive account of behavioral data by the random walk
model could be replicated, when stimulus complexity was varied,
and when the model was constrained to account for eye movement
switches too. Second, to make progress in determining the tem-
poral location and nature of the sources that generate the widely
diverging estimates of the rate of mental rotation reported in the
literature (see, e.g., Bethell-Fox & Shepard, 1988; Folk & Luce,
1987; Förster et al., 1996; Pylyshyn, 1979; Yuille & Steiger, 1982,
Experiment 2).
Method
Participants. Six male and two female students, 17 to 29
years of age, were paid to participate. All had normal or corrected-
to-normal vision. Two of the participants (both male) were ex-
cluded. Both felt ill at ease after a few minutes of wearing the eye
movement recording equipment and were unable to complete the
session.
Stimuli. Stimuli were filled green random two-dimensional
polygons with either eight or 20 vertices displayed on a black
background. They were constructed by defining a certain point as
the center of the polygon and by letting the endpoints of either
eight or 20 imaginary half lines originating from the center
determine the vertices of the polygons. The orientation of the first
half line was chosen at random and the succeeding seven (19) half lines
were then rotated counterclockwise in steps of 45° (18°) such that
the angle between any half line and the preceding one was 45° (18°).
The length of each half line was selected at random to subtend
between 0.48° and 4.8° of visual angle from the center, subject to
the constraint that the shortest half line was less than 0.96° and
the longest was greater than 3.8°.
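The construction of a stimulus polygon can be sketched as follows. The text says only that half-line lengths were "selected at random". The uniform distribution, the rejection sampling used to satisfy the shortest/longest constraint, and all names below are assumptions made for illustration. Lengths are in degrees of visual angle.

```python
import math
import random

def random_polygon(n_vertices, rng, r_min=0.48, r_max=4.8,
                   shortest_below=0.96, longest_above=3.8):
    """Return (radii, angles_deg, vertices) for one random polygon.

    Half lines are spaced 360 / n_vertices degrees apart (45 for eight
    vertices, 18 for 20), counterclockwise from a random first orientation.
    """
    step = 360.0 / n_vertices
    while True:
        # Assumed: uniform lengths, redrawn until the constraint holds.
        radii = [rng.uniform(r_min, r_max) for _ in range(n_vertices)]
        if min(radii) < shortest_below and max(radii) > longest_above:
            break
    start = rng.uniform(0.0, 360.0)
    angles = [(start + i * step) % 360.0 for i in range(n_vertices)]
    vertices = [(r * math.cos(math.radians(a)), r * math.sin(math.radians(a)))
                for r, a in zip(radii, angles)]
    return radii, angles, vertices

simple_polygon = random_polygon(8, random.Random(1))
complex_polygon = random_polygon(20, random.Random(2))
```

Passing an explicit `random.Random` instance keeps the sketch reproducible, which would also make it straightforward to redo the randomization for each participant.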
The polygons were displayed in pairs of two simple polygons
with eight vertices or two complex polygons with 20 vertices. The
mode of presentation was either successive, in which case the
polygons were presented one by one approximately at the center of
the screen, or simultaneous, in which case the centers of the polygons
were displayed 6.5° to the left and 6.5° to the right of the center of
the screen. Figure 3 shows examples of positive and negative stimulus
pairs. Positive pairs comprised two congruent polygons. Negative
pairs comprised two almost congruent polygons, one of which was
a slight perturbation of the other. The perturbation was made by
increasing the shortest half line by a factor of two (simple polygon
pairs) or five (complex polygon pairs), and decreasing the longest half
line by a factor of two (simple polygon pairs) or five (complex
polygon pairs). The different perturbation factors were based on
pilot studies with two participants in an attempt to equalize error
rates. Using the same perturbation factors for both levels of
complexity made same/different discriminations much more difficult
for complex stimulus pairs, in which only 10% of the vertices
(compared to 25% of the vertices in the simple patterns) were
Table 1
Best Fitting Random Walk Model Parameters in Experiment 1

                                            Participants
Parameters                          JC      JE      MA      MO       M
RTBaseline (positive)           629.01  891.07  841.78  105.50  616.84
RTBaseline (negative)           500.00  756.99  676.09  467.53  600.15
TEncodeCompare                  224.40  147.43   22.14  337.90  182.97
p (positive)                       .73     .83     .74     .97     .82
p (negative)                       .93     .96     .96     .82     .92
Threshold intercept (positive)    1.39    1.14    1.81    2.84    1.80
Threshold intercept (negative)    2.56    2.03    3.14    1.58    2.33
Threshold slope (positive)        0.18    0.15    0.81    0.92    0.52
Threshold slope (negative)        0.65    0.24    0.28    0.03    0.03
Processing time increment         5.31    7.87    2.4     3.93    4.88
Succ                              2.71    5.50    3.27    1.13    3.15
Sim                              11.92    9.94    8.62    4.84    8.83
Summary
χ²                               35.74   18.97   27.57   31.62  114.06
df                                  28      24      23      23      96
p                                 .149    .754    .191    .081    .101

Note. The RTBaseline parameters represent base RTs for positive and
negative responses, and tEncode + tCompare is collapsed into the
parameter TEncodeCompare. The threshold slope parameters estimate the
rate of change of thresholds per 180° in the simultaneous matching
condition, and the threshold intercept parameters estimate the
corresponding zero intercepts. Parameters Succ and Sim represent the
linear slope constants (ms/degree) in the least squares fits to the RTs
in the successive and simultaneous matching condition, respectively.
Random walk model parameters in italics. RT = reaction time.
changed in different pairs. Despite these attempts it turned out that
error rates were still much higher for complex stimulus pairs.
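The perturbation that turns a polygon into its negative counterpart, and the resulting proportion of changed vertices, can be sketched as follows. Reading "decreasing by a factor of k" as division by k is an assumption, as are the function names. The sketch operates on a list of half-line lengths.

```python
def perturb(radii, factor):
    """Lengthen the shortest half line and shorten the longest by `factor`.

    Assumption: "increasing by a factor of k" multiplies the length by k and
    "decreasing by a factor of k" divides it by k.
    """
    out = list(radii)
    i_short = radii.index(min(radii))
    i_long = radii.index(max(radii))
    out[i_short] = radii[i_short] * factor
    out[i_long] = radii[i_long] / factor
    return out

def proportion_changed(n_vertices, n_changed=2):
    """Share of vertices that differ between the two polygons of a negative pair."""
    return n_changed / n_vertices

# Factor 2 for simple (8-vertex) pairs, factor 5 for complex (20-vertex) pairs.
simple_negative = perturb([0.6, 2.0, 3.0, 4.0, 1.5, 2.5, 3.5, 1.0], 2)
```

Because exactly two half lines change, the proportion is 2/8 = 25% for simple polygons and 2/20 = 10% for complex ones, matching the percentages given in the text.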
Design. In the simultaneous condition the two stimuli were
presented simultaneously, side by side, and type of stimulus pair
(same or different), complexity (simple or complex), and angular
difference in orientation (0°, 30°, 60°, 90°, 120°, 150°, or 180°)
were varied orthogonally. Each of the 28 combinations of variables
was repeated 15 times, which yielded 420 trials per participant.
In the successive condition the orthogonal variation of type of
stimulus pair, complexity, and four levels of angular difference (0°,
60°, 120°, or 180°) gave 16 combinations of variables that were
repeated 24 times, which yielded a total of 384 trials for each
participant. The right field stimulus in the simultaneous condition
was always rotated clockwise relative to the left field stimulus.
Likewise, the second stimulus in a successive pair was always
rotated clockwise relative to the first stimulus.
Randomization of polygons and sequence of trials was done
anew for each participant.
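The trial counts follow directly from the factorial designs, as the short enumeration below shows. The factor labels and the function name are illustrative only, and the per-participant shuffle is left out.

```python
from itertools import product

PAIR_TYPES = ("same", "different")
COMPLEXITY = ("simple", "complex")

def build_design(angles, repetitions):
    """Cross pair type x complexity x angle, then repeat each cell."""
    cells = list(product(PAIR_TYPES, COMPLEXITY, angles))
    trials = [cell for cell in cells for _ in range(repetitions)]
    return cells, trials

# Simultaneous condition: 7 angles (0 to 180 in steps of 30), 15 repetitions.
sim_cells, sim_trials = build_design(range(0, 181, 30), 15)
# Successive condition: 4 angles (0 to 180 in steps of 60), 24 repetitions.
succ_cells, succ_trials = build_design(range(0, 181, 60), 24)

print(len(sim_cells), len(sim_trials))    # 28 420
print(len(succ_cells), len(succ_trials))  # 16 384
```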
Procedure. Participants were seated in front of the display
screen at a viewing distance of 60 cm in a semidarkened room.
They were informed of the direction and magnitude of the change in
orientation, if any, and practiced the tasks for about 30 min, after
which a block of 192 trials in the successive matching condition
was run. After a short pause, the eye movement recording equipment
(Eyelink II, SR Research Ltd, Ontario, Canada) was mounted
and calibrated, after which the whole block of 420 simultaneous
matching trials was run. Sampling rate was 250 Hz and calibration
of the eye tracker was done automatically every 15th trial. After
dismounting the eye recording equipment and a 10 min pause the
remaining 192 successive matching trials were completed.
In each condition, a fixation cross in the center of the screen
signaled the onset of a stimulus pair for 500 ms and instructed
participants to fixate the cross until the onset of the trial. The cross
and the stimuli remained on the screen in the simultaneous matching
condition until the participant responded. In the successive
matching task, the fixation cross went off and the first stimulus in
the pair was then displayed for 2,000 ms. After a latency of 500 ms the
second stimulus was displayed and remained on the screen until
the participant responded. The location of the center of the second
stimulus was randomly displaced horizontally up to 1.4° of
visual angle relative to the first stimulus. In either condition
accuracy was then fed back for 1,000 ms in the lower left part of
the screen in blue print if the response was correct, otherwise in red
print. After a latency of 500 ms, the appearance of the fixation
cross at the center of the screen signaled the next trial. The
simultaneous matching condition was self-paced after each block
of 85 trials, and the successive condition after each block of 48
trials.
Results
The participants’ remarks on their strategies for solving the
tasks were similar to the reports of the participants that served in
Experiment 1. In a few cases the recalibration of the eye-
movement equipment failed or saccades landed outside the screen
area (altogether 45 trials or 1.8%). Except for responses on these
trials, all correct responses were analyzed.
Response latencies.
Successive matching. All RTs for correct responses were analyzed.
Effects of complexity, F(1, 5) = 12.05, p = .02, ηp² = .71,
and angular difference in orientation, F(3, 15) = 9.64, p = .007,
ηp² = .66, were reliable on positive trials, and the effect of angular
difference had a significant linear component, F(1, 5) = 17.64,
p = .008, ηp² = .78. Angular difference had no effect on negative
trials, F(1, 5) < 1, but there was a significant effect of complexity,
F(1, 5) = 22.00, p = .005, ηp² = .82.
Simultaneous matching. All RTs for correct responses were
analyzed. The effect of angular difference in orientation,
F(6, 30) = 5.91, p = .003, ηp² = .54, was reliable, and the effect of
complexity nearly so, F(1, 5) = 6.42, p = .052, ηp² = .56. There
was a significant linear component, F(1, 5) = 7.57, p = .04,
ηp² = .60, in the effect of angular difference.
There was no effect of angular difference on negative trials,
F(1, 5) < 1, but there was a significant effect of complexity,
F(1, 5) = 35.76, p = .002, ηp² = .88.
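The effect sizes reported alongside each F ratio are consistent with partial eta squared, which can be recovered from an F ratio and its degrees of freedom by a standard identity. The sketch below applies that identity to three of the reported tests; the function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F ratio: F*df1 / (F*df1 + df2)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Successive matching, complexity on positive trials: F(1, 5) = 12.05
print(round(partial_eta_squared(12.05, 1, 5), 2))   # 0.71
# Successive matching, angular difference: F(3, 15) = 9.64
print(round(partial_eta_squared(9.64, 3, 15), 2))   # 0.66
# Simultaneous matching, complexity on negative trials: F(1, 5) = 35.76
print(round(partial_eta_squared(35.76, 1, 5), 2))   # 0.88
```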
Random walk model. The two-level complexity factor aside,
the basic designs of Experiments 1 and 2 were identical, and the
number of free parameters in the random walk model thus doubled
to 20. Figure 4 displays simultaneous matching group mean RTs
and response accuracies for congruent stimulus pairs as a function
of angular difference in orientation with complexity as a parame-
ter. Figure 5 shows corresponding data for incongruent stimulus
Figure 3. Examples of the stimulus polygons used in Experiment 2.
Panels A and B show simple polygons; panels C and D show complex polygons.
A and D show positive stimulus pairs, B and C negative pairs.
pairs, and Figure 6 the number of saccade switches between
stimuli. As can be seen from the fits in the figures, and Table 2, the
random walk model successfully accommodates effects of com-
plexity and overall provides a fairly good fit to positive and
negative RTs, false alarms and misses, and eye movement
switches.
The mean correlations between predicted and observed data, based on
individual fits for each participant, were .84 (RTs), .37
(accuracy), and .81 (eye movement switches). For observed and
predicted grand means, the corresponding correlations
were .98 (RTs), .82 (accuracy), and .97 (saccade switches).
Analysis of eye movements. With a few modifications the
classification of eye movements was done according to the proce-
dure described by Nakatani and Pollatsek (2004), who used a
related, but more complex task in which the orientation of visual
scenes and the orientation of individual objects within a scene
were manipulated independently. The sequence of eye-movements
was grouped into an initial latency, first pass, saccade switches,
and second pass. The initial latency is defined as the time from
stimulus onset until the first saccade off the central fixation point
lands on one of the stimulus objects (Object A). The first pass
comprises the total time of saccades and fixations spent on A
terminated by the onset of the first saccade switch to the second
stimulus (Object B). The duration of the remaining trial, accumulated
across fixations and saccades on A and B, but disregarding the
separately recorded durations of saccade switches between stimulus
A and B, defines the second pass.
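The partition of a trial into initial latency, first pass, saccade switches, and second pass can be sketched as a single pass over a list of gaze events. The event representation (`GazeEvent`, the labels "center", "left", and "right") is hypothetical and is not the Eyelink data format. The sketch follows the definitions given in the text and ignores the modifications to the Nakatani and Pollatsek (2004) procedure, which are not specified here.

```python
from dataclasses import dataclass

STIMULI = {"left", "right"}

@dataclass
class GazeEvent:
    kind: str    # "fixation" or "saccade"
    src: str     # gaze location at event onset: "center", "left", or "right"
    dst: str     # gaze location at event offset (equals src for a fixation)
    dur: float   # duration in ms

def partition_trial(events):
    """Sum event durations per eye movement phase and count saccade switches."""
    parts = {"initial_latency": 0.0, "first_pass": 0.0,
             "switches": 0.0, "second_pass": 0.0}
    n_switches = 0
    phase = "initial_latency"
    for ev in events:
        is_switch = (ev.kind == "saccade" and ev.src in STIMULI
                     and ev.dst in STIMULI and ev.src != ev.dst)
        if phase == "initial_latency":
            parts[phase] += ev.dur
            # The latency ends when a saccade lands on a stimulus object.
            if ev.kind == "saccade" and ev.dst in STIMULI:
                phase = "first_pass"
        elif is_switch:
            # Switch durations are recorded separately from both passes.
            parts["switches"] += ev.dur
            n_switches += 1
            phase = "second_pass"
        else:
            parts[phase] += ev.dur
    return parts, n_switches
```

On this reading, everything after the first switch that is not itself a switch saccade is accumulated into the second pass.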
The main goal of the analysis of eye movements was to locate
the sources of the increase in response time as a function of
angular difference in orientation. This was done in two steps. First,
by fitting a minimum chi-square straight line as a function of
angular difference to the mean duration of initial latencies, eye
movement switches, and first and second pass durations (see Table 3).
Second, by fitting a minimum chi-square straight line to saccades
and fixations in those eye movement phases in which a reliable
linear increase was established.
The fits were done for each participant, each eye movement
category (initial latency, etc.), and each level of angular difference
by computing a chi-square deviation with one degree of freedom.
The chi-square was computed by squaring the deviation between
observed and predicted means and dividing the result by the square
of the standard error of the mean. For each participant, goodness of
fit for a particular type of data was then estimated as the sum of
chi-squares, with the associated degrees of freedom equal to the
number of data points (7) minus the number of parameters in
the linear fit. Overall goodness of fit was then obtained by
summing chi-squares and degrees of freedom across participants.
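A minimum chi-square straight line of this kind is equivalent to a weighted least squares fit with weights 1/SEM², for which a closed-form solution exists. The sketch below uses that closed form. The function name and the example numbers in the comments are illustrative assumptions, not values from the experiments.

```python
def fit_line_min_chi2(x, y, sem):
    """Weighted least squares line y = intercept + slope * x.

    Weights are 1 / SEM**2, so the returned chi2 is the minimized sum of
    squared standardized deviations; df = number of points - 2.
    """
    w = [1.0 / s ** 2 for s in sem]
    s_w = sum(w)
    s_x = sum(wi * xi for wi, xi in zip(w, x))
    s_y = sum(wi * yi for wi, yi in zip(w, y))
    s_xx = sum(wi * xi * xi for wi, xi in zip(w, x))
    s_xy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = s_w * s_xx - s_x ** 2
    slope = (s_w * s_xy - s_x * s_y) / delta
    intercept = (s_xx * s_y - s_x * s_xy) / delta
    chi2 = sum(((yi - (intercept + slope * xi)) / si) ** 2
               for xi, yi, si in zip(x, y, sem))
    return intercept, slope, chi2, len(x) - 2

# Seven angular differences give 7 - 2 = 5 df per fitted line.
angles = [0, 30, 60, 90, 120, 150, 180]
```

With seven angular differences each line has five degrees of freedom, so summing over six participants and two levels of complexity gives 60.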
Negative responses. Negative response latencies could be fitted
as a constant (one parameter) function of angular difference for
each participant and each level of complexity,
χ²((7 − 1) × 6 × 2 = 72) = 70.13, p = .54. The remaining analyses
were thus confined to the eye movement partitions of positive responses.
Initial latency and first pass duration. Table 3 suggests that
the RT increment for positive responses is almost exclusively due
to saccade switches between stimulus objects and processes in the
second pass. This was confirmed by an analysis of initial latencies
and first pass durations that could be fitted for each participant and
[Figure 4 graphic. Top panel: Reaction time (ms) as a function of Angular Difference in Orientation (0 to 180 in steps of 30), plotted separately for Simple and Complex stimuli. Bottom panel: Accuracy (0.75 to 1.00) over the same range.]
Figure 4. Positive responses in the simultaneous matching condition in Experiment 2. Top panel: Group mean
reaction times (RTs) for correct responses to positive stimulus pairs as a function of angular difference in
orientation with stimulus complexity as a parameter. Bottom panel: Proportion of correct responses. Solid and
dashed lines represent fits by the 20 parameter random walk model. Vertical bars around each symbol show
standard errors of group means.
each level of complexity as constant functions of angular
difference in orientation, χ²(144) = 162.83, p = .13.
Saccade switches. For each participant and each level of
complexity, total saccade switch duration as a function of angular
difference was fitted by two straight lines with different slopes and
different intercepts. The fit was excellent, χ²(60) = 54.73, p = .67.
Slope constants were 8.9 ms per 60° in the simple condition and
23.4 ms per 60° in the complex condition (see also Table 3). The
linear increase in switch duration is essentially based on a linear
increase in the number of switches as a function of angular
difference, χ²(60) = 56.86, p = .59 (cf. Figure 6). The average
increase in the number of switches was 0.12 per 60° in the simple
condition and 0.34 per 60° in the complex condition. In contrast,
duration of saccades per se was nearly a constant function of angle.
The duration of saccade switches as a function of angular
departure could be fitted by two straight lines, χ²(60) = 75.70,
p = .08, with different intercepts (46.10 ms and 51.77 ms) and
different, but very small, slope constants (4.53 ms and 1.14 ms per
180°, in the simple and complex conditions, respectively).
Second pass. Second pass durations for the group of participants
increased approximately linearly as a function of angular
discrepancy, χ²(10) = 7.12, p = .71, and with a steeper slope
in the complex condition (about 304.1 ms and 101.3 ms per 60°
in the complex and simple condition, respectively). The linear
fit for five of the participants was good, χ²(50) = 52.58, p = .37,
while the deviation of second pass durations from a straight
line for one participant (ML) was significant, χ²(10) = 32.66,
p < .001. For ML, the correlation between angular discrepancy and
second pass duration was .34 (simple condition) and .76
(complex condition).
Number and duration of saccades. Across participants, the
time to execute a saccade was essentially a constant function of
angular difference, χ²(12) = 8.61, p = .74, but with some random
fluctuations within participants, χ²(71) = 113.03, p = .001. The
grand mean duration of saccades was equal to 26.8 ms, and
increased by 1.32 ms (range: −7.44 ms, 8.84 ms) from 0° to 180°.
On average, the increase in the number of saccades from one
location to the next on one of the objects in the second pass was
small (about 0.07 per 60°), but nevertheless significantly different
from zero, χ²(72) = 170.80, p < 0.001.
Fixations. The duration of fixations in the second pass
increased roughly linearly (see Figure 7) as a function of angular
difference with a common slope (8.8 ms per 60°) on simple and
complex trials, χ²(66) = 76.90, p = .17, and almost identical
intercepts (207.1 ms and 208.2 ms in the simple and complex
condition, respectively).
The linear increment as a function of angular difference of
saccade and fixation durations is 0.44 ms per 60° and 8.8 ms per
60°, respectively. The difference is statistically reliable (paired
t test, p = .03).
Integrating local and global analyses of mental rotation.
The random walk model may be a reasonable first approximation,
but the interpretation of the 20 parameters in the model is obvi-
ously an issue. The eye movement analysis offers an interesting
opportunity to evaluate the model against a new set of data because
the partitioning of trials into initial latencies, saccade switches, and
[Figure 5 graphic. Top panel: Reaction Time (ms) as a function of Angular Difference in Orientation (0 to 180 in steps of 30), plotted separately for Simple and Complex stimuli. Bottom panel: Accuracy (0.65 to 0.95) over the same range.]
Figure 5. Negative responses in the simultaneous matching condition in Experiment 2. Top panel: Group mean
reaction times (RTs) for correct responses to negative stimulus pairs as a function of angular difference in
orientation with stimulus complexity as a parameter. Bottom panel: Proportion of correct responses. Solid and
dashed lines represent fits by 20 parameter random walk model. Vertical bars around each symbol show standard
errors of group means.
first and second pass makes allowance for direct measures of key
parameters in the model.
Thus, to constrain the model, baseline RTs (cf. Equations 1a and
1b) for each participant, each level of complexity, and each type
of response were calculated from the sum of the average
initial latency and the average first pass duration (see Table 3),
in agreement with the assumption that the source of the increase
in RT lies in the second pass and in the eye movement switches.
The first four baseline parameters in the model were thus treated as
constants.
The model was further constrained by computing four encode/
compare constants (one for each response type and complexity)
and the linear increment in the two (simple and complex) simple
mental rotation functions. For positive responses this was done by
estimating the best linear fit to the mean gaze duration on one
stimulus object in the second pass (i.e., the sum of durations of
fixations and saccades) as a function of angular difference. The
slope for each level of complexity, which determines the rate of the
simple mental rotation, thus replaces the slope parameter in
Equation 1a, while the intercept enters as a component of the
encode/compare processes (TSimEncode, TComEncode; cf. Equations 1a
and 1b).
The duration of one saccade switch between stimulus objects
was almost the same irrespective of response category, complexity,
and angular difference. Thus, for positive responses and each
level of complexity, the duration of the encode/compare process
was set to the intercept plus the mean duration (t) of one saccade
switch, measured across response type, complexity, and angular
difference. For negative responses and each level of complexity,
the encode/compare parameter (TSimEncode, TComEncode) was set to t
plus the mean gaze duration across angular difference in orientation.
In sum, the number of free parameters reduces to four drift
parameters (one for each level of complexity and response type),
and two parameters in each of four linear equations that for each
level of complexity specify upper thresholds for positive responses
as a function of angular difference, and correspondingly lower
thresholds for negative responses, yielding a total of 12 free
parameters.
For Participants RG and JO the partition of RTs into durations
of eye movement phases (see note to Table 3) was incomplete
and left fairly large RT residuals. On average RG’s RTs were
142.3 ms higher and JO’s RTs 27.5 ms higher than could be
accounted for by summing the durations of eye movement
partitions. Thus, for RG and JO, RTs on trials that could not be fully
partitioned were removed, and errors, saccade switches, and
remaining RTs were then modeled by treating the constants
based on eye recordings as partially free parameters; that is, the
constants were constrained to vary within an envelope of ±33%
of their original values, which left 22 free parameters to model
RG’s and JO’s data. For RG the mean numerical deviation
between constrained parameters and eye movement based
constants was 15% and for JO 24%. In Table 4 all data for JO and
RG represent parameter values.
The constrained fit is generally weaker, χ²(383) = 454.60,
p = .077, than the unconstrained fit with 20 free parameters. The
correlations between predicted and observed data, based on
individual fits for each participant, were also marginally weaker: .82
(RTs), .36 (accuracy), and .77 (eye movement switches). Still,
grand mean correlations between predicted and observed RTs,
[Figure 6 graphic. Mean Number of Switch Saccades (1.0 to 3.5) as a function of Angular Difference in Orientation (0 to 180 in steps of 30), plotted separately for Simple and Complex stimuli. Left panel: Positive trials. Right panel: Negative trials.]
Figure 6. Experiment 2: Number of eye movement switches in the simultaneous matching task. Group
mean eye movement switches as a function of angular difference in orientation with stimulus complexity
as a parameter. Left panel: Positive stimulus pairs. Right panel: Negative stimulus pairs. Solid and dashed
curves represent fits by 20 parameter random walk model. Vertical bars show standard errors of group
means.
accuracy, and eye movement switches were high, namely .96, .92,
and .91, respectively.
Discussion
Experiment 2 confirmed and substantially extended the random
walk model. The model and the eye movement data suggest that
we decide that two stimuli portray the same object in different
orientations by first encoding one of the stimuli in one or more
fixations, and next switching our gaze to the second stimulus to
compare and align the encoded stimulus (the image) with the
second stimulus in one or more fixations on the second stimulus.
Finally, if sufficient evidence is obtained, a response is issued;
otherwise, these processes are repeated until sufficient evidence for a
decision has been acquired. The initial encoding that is done in the
first pass and the prior computation of the layout of the visual
scene during initial latency are roughly constant functions of
angular difference.
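The repeat-until-sufficient-evidence loop described in this paragraph can be illustrated with a small simulation. This is a sketch under loudly stated assumptions and is not the fitted model: Equations 1a to 1c are not reproduced here, so the RT composition (a baseline plus, for every run, an encode/compare time and an increment proportional to angle) and every parameter name are hypothetical.

```python
import random

def simulate_trial(p_toward_same, upper, lower, baseline_ms,
                   encode_compare_ms, increment_ms_per_deg, angle_deg, rng):
    """Simulate one trial as a +/-1 random walk between two thresholds.

    Each run of encode, rotate, and compare moves the evidence one step
    toward "same" with probability p_toward_same, otherwise one step toward
    "different". Assumed RT: baseline + runs * (encode/compare time +
    increment * angle).
    """
    evidence = 0
    runs = 0
    while -lower < evidence < upper:
        runs += 1
        evidence += 1 if rng.random() < p_toward_same else -1
    response = "same" if evidence >= upper else "different"
    rt_ms = baseline_ms + runs * (encode_compare_ms
                                  + increment_ms_per_deg * angle_deg)
    return response, runs, rt_ms

rng = random.Random(0)
trials = [simulate_trial(0.8, 2, 2, 400.0, 500.0, 1.0, 120, rng)
          for _ in range(1000)]
mean_runs = sum(t[1] for t in trials) / len(trials)
```

In such a sketch the number of runs is the quantity that would be expected to track the number of gaze switches between the two stimuli, which is why switch counts offer an additional test of the model.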
The account is formalized in the constrained random walk
model. The model seems to capture major aspects of visual
behavior in the classical mental rotation task and it takes major steps
toward linking global and local processes invoked in achieving
orientation invariance by mental rotation (cf. Table 5). It predicts
to a good first approximation effects of angular difference and
complexity on positive and negative RTs, errors, and eye movement
switches with fairly few (12) free parameters for four
participants, and with 10 additional free parameters constrained to
vary within bounds that make good sense for two participants (JO,
RG). Note also that the constrained model is not optimally
configured. This is because the constants in the model (i.e., baselines,
compare/encode constants, and slope constants) are estimates, and
because several assumptions, although convenient, are too simple.
For example, it is assumed that the duration of a switch saccade is
independent of prior switches on a trial, and that the duration of a
switch saccade is the same regardless of complexity and response
type.
Interpretation of Negative Response Times in the
Random Walk Model
Experiments 1 and 2 were not designed to throw light on how
we come to realize that objects are different with respect to shape.
Indeed, the basic function of negative stimulus pairs is not to
elucidate negative response times at all, but to experimentally
isolate the process, in casu mental rotation, by which orientation
invariance may be achieved. For example, it is well known that the
composition of negative stimulus pairs is highly critical for invok-
ing mental rotation (see, e.g., Förster et al., 1996; Takano, 1989),
and that little or no evidence of mental rotation is observed if negative stimulus pairs are too different from positive stimulus pairs.

Table 2
Experiment 2: Best Fitting Random Walk Model Parameters

Parameters       ML       RG       LL       TS       JO       PH       M
RTSimBaseline    159.77   744.91   355.42   350.97   297.71   500.36   401.52
RTComBaseline    365.68   948.58   469.10   761.92   471.23   358.56   562.51
RTSimBaseline    356.34   622.11   411.71   511.55   440.02   286.70   438.07
RTComBaseline    504.72   1076.40  544.45   896.97   1016.61  850.49   814.94
T SimEncode      636.73   855.40   641.63   684.84   523.70   779.91   687.03
T ComEncode      551.02   530.60   455.32   477.03   486.84   617.96   519.79
Simple           0.09     1.04     0.14     0.54     1.47     0.30     0.72
Complex          0.92     1.10     0.26     1.43     2.80     0.97     1.48
pSimple          .96      .50      .75      .85      0.92     .50      .75
pComplex         .64      .81      .89      .73      0.87     .66      .77
pSimple          .72      .99      .92      .87      0.74     .96      .87
pComplex         .87      .74      .55      .91      0.62     .68      .73
0 SimpleSlope    1.60     0.27     1.04     0.76     1.28     0.72     0.94
0 Complex        0.84     0.69     1.69     0.79     2.51     0.85     1.23
0 Simple         1.00     2.73     1.25     1.71     1.21     2.73     1.77
0 Complex        2.05     1.32     1.09     2.04     1.00     2.14     1.61
SimpleSlope      0.70     0.09     0.33     0.23     0.47     0.54     0.21
ComplexSlope     0.72     0.45     0.15     0.81     0.59     0.18     0.38
SimpleSlope      0.21     0.02     0.55     0.68     0.28     0.46     0.18
ComplexSlope     0.46     0.16     0.08     0.63     0.06     0.87     0.20
Summary
χ²               59.41    53.75    45.40    62.38    74.88    68.86    364.67
df               58       56       61       55       57       64       353
p                .424     .561     .932     .230     .043     .316     0.323

Note. Random walk model: Parameter values for each participant. Goodness of fit in bottom rows. $RT^{+}_{Baseline}$ and $RT^{-}_{Baseline}$ represent base RTs for positive and negative responses, and $t_{Encode} + t_{Compare}$ is collapsed into the parameter $T_{EncodeCompare}$. Prefix Sim and Com in baseline subscripts designate simple and complex condition, respectively. The parameters $slope^{+}$ and $slope^{-}$ estimate rate of change of thresholds per 180° in the simultaneous matching condition, and $\theta_{0}^{+}$ and $\theta_{0}^{-}$ estimate corresponding zero intercepts. Parameters $\beta_{Succ}$ and $\beta_{Sim}$ represent the linear slope constants (ms/degree) in the least square fits to the RTs in the successive and simultaneous matching condition, respectively. Random walk model parameters in italics. RT = reaction time.
There are presumably several cases, also noted by some participants, where it is immediately obvious that stimuli cannot share the same shape and where mental rotation is not invoked. In fact, threshold parameter estimates between −1 and 1 (see Table 2) imply that some responses cannot be based on mental rotation. Nevertheless, it is worth noting that the random walk model framework in which mental rotations are repeated can explain, in principle at least, the time course of some proportion of (presumably long) negative RTs. One way to see this is to assume that observers treat negative stimulus pairs just like they treat positive stimulus pairs that result in a miss; namely, by basing correct rejections of negative stimuli on a sequence of mental rotations in which the number of mismatches exceeds the number of matches by a fixed threshold value. Because there is no well-defined angular difference between incongruent stimuli, and because the whole range of angular departures from 0° to 180° is known to the participants, it appears natural to assume that the angular difference through which the mental rotation is done is about 90° on average.
Negative response latencies in Experiment 1 were thus modeled as if participants make a mental rotation of 90° (see Equation 1c).
By the same reasoning, the $T^{-}_{SimEncode}$ parameter (see Table 4) that estimates encode and compare processing time on negative trials in the second pass in Experiment 2 should be equal to $T^{+}_{SimEncode}$, except for the hypothetical contribution due to mental rotation. Assuming that participants on average execute a mental rotation of 90° on negative trials with simple stimuli, $T^{-}_{SimEncode}$ should then equal $T^{+}_{SimEncode} + 90\,\beta_{Simple}$. On negative trials with complex stimuli, $T^{-}_{ComEncode}$ should likewise equal $T^{+}_{ComEncode} + 90\,\beta_{Complex}$. The correlation between predicted encoding and comparison times on negative trials based on these assumptions and the observed data (see Table 4) is .79. In five cases, predictions were too high, in six cases too low, and in one case almost exact.
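The prediction rule for negative trials amounts to adding the time for an assumed 90° rotation to the positive-trial encode-and-compare time. A minimal sketch follows; the function name and the input values are hypothetical illustrations, not estimates from the tables.

```python
def predicted_negative_time(t_positive_ms, slope_ms_per_degree, assumed_rotation_deg=90.0):
    """Predicted encode-and-compare time on negative trials (ms).

    Assumes, as in the text, that a negative trial costs the positive-trial
    time plus one simple mental rotation through assumed_rotation_deg.
    """
    return t_positive_ms + assumed_rotation_deg * slope_ms_per_degree


if __name__ == "__main__":
    # Hypothetical inputs: 350 ms positive-trial time, slope of 0.5 ms/degree.
    print(predicted_negative_time(350.0, 0.5))
```

Comparing such predictions with the observed negative-trial times, participant by participant, is what the reported correlation summarizes.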
Table 3
Grand Mean Response Time Components in Experiment 2

Angular difference   RT        Initial latency   Switch duration   First pass   Second pass   Not classified
Positive responses
Simple stimuli
0°                   982.1     200.9             73.9              269.3        434.2         3.8
30°                  1,288.9   211.1             87.2              309.7        670.1         10.8
60°                  1,332.9   221.3             90.6              298.9        714.4         7.7
90°                  1,304.1   226.0             94.8              297.5        683.4         2.4
120°                 1,375.4   225.2             91.1              334.1        723.4         1.6
150°                 1,422.2   215.7             103.8             301.3        797.4         4.1
180°                 1,375.3   221.5             105.9             292.4        748.4         7.1
Complex stimuli
0°                   1,298.4   211.8             96.5              320.0        662.3         7.8
30°                  1,536.8   204.3             116.5             334.0        870.2         11.9
60°                  1,773.9   194.2             129.5             329.1        1,111.4       9.7
90°                  1,792.9   208.3             138.9             389.2        1,054.1       2.5
120°                 2,021.8   204.9             132.9             364.9        1,315.9       3.2
150°                 2,182.4   211.1             161.2             354.2        1,448.5       7.4
180°                 2,285.0   197.9             163.8             347.6        1,553.8       21.8
Negative responses
Simple stimuli
0°                   1,189.3   242.6             80.2              306.1        552.1         8.3
30°                  1,366.4   225.2             96.9              299.1        742.5         2.7
60°                  1,229.8   222.3             84.1              320.6        597.0         5.8
90°                  1,240.1   222.0             78.7              329.1        605.6         4.7
120°                 1,286.2   225.7             91.6              289.1        662.8         17.0
150°                 1,286.8   255.1             94.7              307.5        625.1         4.4
180°                 1,246.9   240.1             89.3              304.1        601.9         11.5
Complex stimuli
0°                   1,790.4   195.1             129.9             352.5        1,108.9       4.0
30°                  1,927.9   201.3             140.1             351.2        1,234.7       0.6
60°                  1,791.2   197.1             120.7             328.4        1,142.3       2.7
90°                  1,814.6   214.4             121.5             339.7        1,133.9       5.1
120°                 1,865.9   201.5             115.0             363.4        1,184.5       1.5
150°                 1,845.4   212.6             123.2             346.0        1,151.3       12.3
180°                 1,874.4   193.3             126.3             350.0        1,196.4       8.4

Note. The table shows correct response latencies partitioned into the duration of eye movement phases. Initial latency includes the duration (approximately 30 ms) of the first saccade. Switch duration designates the sum of the duration of saccade switches. First and second pass represent the sum of the duration of saccades and fixations in the first and second pass, respectively. RT = reaction time; not classified = response-terminated events, usually saccades that were initiated but not completed before trial termination, eye blinks, or saccades to areas outside stimuli.
Mental rotation may thus be a component in some proportion of negative responses in Experiments 1 and 2. To be sure, negative responses may rely on other processes, which have yet to be revealed.
Nonlinear Predictions
Figures 4 and 6 show a few dips in the fits. This is because the
mean number of steps to a threshold as a function of angular differ-
ence is not linearly related to the linear change in threshold settings.
For example, a linear increase (or decrease) in positive thresholds (see
Tables 2 or 4) as a function of angular discrepancy leads to a positive
(or negative) acceleration of the mean number of steps to reach the
positive threshold. Thus, by averaging across participants the resulting fits
may approach straight lines but will rarely be strictly linear.
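The nonlinearity can be illustrated with the textbook result for a simple random walk between two absorbing barriers. The sketch below is a simplification, not the fitted model: it assumes integer thresholds `a` (positive) and `b` (negative), unit steps, and a fixed match probability `p`, whereas the thresholds estimated in Tables 2 and 4 are not integers. The function names are hypothetical.

```python
def hit_probability(a, b, p):
    """Probability of reaching +a before -b, starting at 0, with P(step = +1) = p."""
    q = 1.0 - p
    if abs(p - q) < 1e-12:
        return b / (a + b)
    r = q / p
    return (1.0 - r ** b) / (1.0 - r ** (a + b))


def mean_steps(a, b, p):
    """Expected number of steps until either threshold is reached."""
    q = 1.0 - p
    if abs(p - q) < 1e-12:
        return float(a * b)
    prob_pos = hit_probability(a, b, p)
    # Wald's identity: expected displacement = expected steps * mean step size.
    return (a * prob_pos - b * (1.0 - prob_pos)) / (p - q)


if __name__ == "__main__":
    # Positive threshold rising linearly (1, 2, 3, 4) with b and p held fixed.
    steps = [mean_steps(a, 2, 0.8) for a in (1, 2, 3, 4)]
    increments = [later - earlier for earlier, later in zip(steps, steps[1:])]
    print(steps)
    print(increments)  # the increments are not constant
```

With these hypothetical settings the successive increments grow slightly, so equal steps in the threshold do not produce equal steps in the mean number of cycles.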
Orientation invariance may be achieved without using mental rotation, for example by using verbal descriptions, or by fast detection of common features such as similar vertices, which may result in many errors but nevertheless be used. I find it interesting that the random walk model can, at least to some extent, account for positive trials in which mental rotation is not used. Tables 2 and 4 show that for some participants (e.g., RG) thresholds are less than 1 and greater than −1, which implies that some of the (presumably fast) responses from these participants cannot be based on mental rotation.
Mental Rotation Is Done During Fixations
The duration of saccades between stimuli, and saccades within stimuli in the second pass, is essentially a constant function of angular difference in orientation, but the number of saccades both within and between stimulus objects increases approximately linearly as a function of angular difference. Together, this adds a substantial linear component to overall RT, which presumably reflects processes that prepare for the basic orientation alignment of the encoded stimulus, rather than the alignment itself, which seems solely to be done during fixations in the second pass.

[Figure 7: two panels (Second Pass, First Pass); y-axis: Mean Fixation Duration; x-axis: Angular Difference in Orientation (0° to 180°); legend: Simple, Complex.]
Figure 7. Experiment 2: Group means of first and second pass mean fixation duration on positive trials as a function of angular difference in orientation with stimulus complexity as a parameter. Top panel: Second pass fixations fitted by a least chi-square straight line with the same intercept and slope constant on simple and complex trials, χ²(72) = 92.13, p = .06. Bottom panel: First pass fixations. Solid and dashed lines represent averaged minimum chi-square zero slope straight lines to the data points, χ²(72) = 90.01, p = .07. Vertical bars around each symbol show standard errors of group means.
Three lines of evidence converge on this interpretation. First, the
overall effect of angular difference on the duration of switch
saccades and saccades on stimuli in the second pass is negligible
and significantly different from the effect of angular difference on
the duration of fixations. Second, there is evidence that mental
rotation is not done (or suppressed) during saccades (Irwin &
Carlson-Radvansky, 1996; Irwin & Brockmole, 2000; but see also,
Jonikaitis, Deubel, & de’Sperati, 2009).
Third, in line with a number of studies that support the general
hypothesis that visuospatial processing is confined to eye fixations
(see the review in Irwin, 2004), fixation duration in the second
pass increased approximately linearly as a function of angular
difference (see Figure 7).
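As a rough check on which eye movement phase carries the effect of angular difference, one can fit straight lines to the grand means in Table 3. The sketch below uses ordinary least squares on the group means for positive trials with complex stimuli; this is an assumption made for illustration and not the minimum chi-square procedure used for the fits in the article. The first row of each block in Table 3 is taken to be 0°, and the function name is hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Table 3, positive responses, complex stimuli (ms).
ANGLES = [0, 30, 60, 90, 120, 150, 180]
INITIAL_LATENCY = [211.8, 204.3, 194.2, 208.3, 204.9, 211.1, 197.9]
SECOND_PASS = [662.3, 870.2, 1111.4, 1054.1, 1315.9, 1448.5, 1553.8]

if __name__ == "__main__":
    latency_slope, _ = least_squares_line(ANGLES, INITIAL_LATENCY)
    second_pass_slope, _ = least_squares_line(ANGLES, SECOND_PASS)
    print(f"initial latency: {latency_slope:.3f} ms/degree")
    print(f"second pass:     {second_pass_slope:.3f} ms/degree")
```

Note that the second pass slope here refers to the summed duration of all fixations and saccades in the second pass, not to the mean duration of a single fixation plotted in Figure 7.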
Integrating Local and Global Analyses
of Mental Rotation
The linear slope constant for the duration of fixations in the
second pass is the same on simple and complex trials, which
suggests that the basic orientation alignment during a fixation is
done at the same velocity regardless of visual complexity. It seems
natural to assume therefore that there is a basic mental rotation
velocity that is reflected in the linear increase in fixation durations
as a function of angular difference. It is fast (perhaps about 15 ms per 60°), and orders of magnitude faster than the velocity (about 1,000 ms per 60°) originally reported by R. N. Shepard and Metzler (1971). Nevertheless, by coupling the partitioning of eye movements into initial latency, first pass, saccade switches, and second pass to the parameters in the random walk model it is possible to bridge this gap.

Table 4
Experiment 2: Constrained Random Walk Model

Constants/parameters   ML       RG       LL       TS       JO       PH       M
RTSimBaseline          360.94   549.23   474.20   579.70   599.78   449.76   502.27
RTComBaseline          394.50   743.63   369.86   642.23   642.32   529.06   553.60
RTSimBaseline          368.49   602.88   543.01   578.12   559.32   423.99   512.62
RTComBaseline          385.81   557.02   378.82   599.88   654.08   535.41   518.50
TSimEncode             326.08   449.51   323.85   329.05   352.68   396.76   362.99
TSimEncode             348.87   455.40   365.43   449.7    498.7    406.42   420.75
TComEncode             386.08   365.77   327.85   440.34   443.06   415.62   396.45
TComEncode             459.58   596.80   394.87   534.05   620.08   551.89   526.21
Simple                 0.00     1.03     0.00     0.64     1.03     0.37     0.51
Complex                0.61     0.65     0.16     1.37     1.00     0.49     0.71
pSimple                .83      .92      .81      .86      .84      .60      .81
pComplex               .84      .84      .89      .78      .93      .67      .82
pSimple                .72      .72      .78      .77      .81      .87      .78
pComplex               .65      .65      .50      .81      .51      .64      .63
0 Simple               1.58     1.63     1.73     1.26     0.95     1.25     1.40
0 Complex              1.68     1.48     2.55     1.09     3.55     1.06     1.90
0 Simple               1.35     1.09     1.23     1.55     1.72     2.42     1.56
0 Complex              1.62     1.22     1.08     2.10     0.78     2.12     1.49
SimpleSlope            0.68     0.05     0.07     0.29     0.36     0.41     0.17
ComplexSlope           0.74     0.14     0.20     0.90     0.83     0.29     0.42
SimpleSlope            0.05     0.07     0.29     0.27     0.58     0.36     0.13
ComplexSlope           0.06     0.04     0.02     0.76     0.38     0.72     0.20
Summary
χ²                     79.1     67.04    54.99    67.42    81.97    72.61    423.12
df                     66       56       69       63       57       72       383
p                      .129     .148     .890     .329     .017     .458     .077

Note. The upper part of the table shows constants derived from direct measures of eye movements. Free parameter values in midsection and goodness of fit in bottom rows. Constants for RG and JO were treated as partially free parameters (see text and note to Table 2).

Table 5
Experiment 2: The Velocity of Mental Rotation

Parameters       ML     RG     LL     TS      JO      PH     M
Basic            0.03   0.34   0.20   0.08    0.07    0.04   0.12
SecPassSimple    0.00   1.03   0.00   0.64    1.03    0.37   0.51
SecPassComplex   0.61   0.65   0.16   1.37    1.00    0.49   0.71
SuccSimple       0.79   1.25   0.25   2.25    1.47    1.50   1.10
SuccComplex      1.03   1.93   0.43   2.34    2.80    3.40   1.71
SimSimple        0.89   1.48   0.22   2.62    4.74    0.68   1.99
SimComplex       3.59   2.67   1.54   10.71   12.08   1.54   6.12

Note. For abbreviations and conventions see text and Table 1. Prefixes Sim and Succ designate the simultaneous and successive condition, respectively. The estimates in the three upper rows are grand means based on the analysis of eye movements. The first row represents the linear increase (ms/degree) in the duration of fixations in the second pass across simple and complex trials. The second and third rows represent the linear increments in the simple mental rotation functions as a function of complexity. The four lower rows represent grand means based on RTs. The correlation between the simple slopes in the second pass and the simple slopes in successive matching is .81. RT = reaction time.
This is illustrated in detail for each participant in Table 5, which in the first row shows the linear increment in the mean duration of fixations in the second pass, and next (Table 5, rows 2 and 3) the linear increment in visual processing time, measured by the mean duration of fixations and saccades on a stimulus in the second pass (i.e., the linear increment in the simple mental rotation in the random walk model). The bottom rows show slope constants based on linear fits to RTs. As can be seen, Participant JO, for example, has a mental rotation rate based on RTs of about 720 ms per 60°, fairly close to the rate (1,000 ms per 60°) reported by R. N. Shepard and Metzler (1971). This rate is based on the contribution from saccade switches (shown for grand means in Table 3) and on JO's (simple) mental rotation rate (roughly between 100 and 180 ms per 60°), which in turn is based on the average duration, as a function of angular difference, for which JO looked at one object in the second pass. This simple mental rotation slope constant may be further decomposed into the linear combination of constant saccades and linearly increasing duration of fixations. The increase in the duration of fixations in the second pass is small, implying a rapid basic orientation alignment rate (for JO about 4 ms per 60°).
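The rates quoted per 60° follow from the slope constants in Table 5, which are given in ms/degree. A minimal sketch of the conversion, using two of JO's entries from Table 5 (the function and dictionary names are hypothetical):

```python
def ms_per_60_degrees(slope_ms_per_degree):
    """Convert a slope constant in ms/degree to ms per 60 degrees."""
    return 60.0 * slope_ms_per_degree


# Participant JO, Table 5 (ms/degree).
JO_SLOPES = {"Basic": 0.07, "SimComplex": 12.08}

if __name__ == "__main__":
    for name, slope in JO_SLOPES.items():
        print(f"{name}: {ms_per_60_degrees(slope):.1f} ms per 60 degrees")
```

The SimComplex entry corresponds to the rate of about 720 ms per 60° based on RTs, and the Basic entry to the alignment rate of about 4 ms per 60°.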
It is interesting that, using a more complex task with visual identification of two-dimensional projections of cubes portrayed in different orientations, Dahlstrom-Hakki et al. (2008) also found that response times ranging from about 10,000 ms (no difference in orientation) to about 22,500 ms (angular difference of 270°) related to repeated applications of a simple mental rotation function. This very large span in RTs characterized a group of slow male subjects, but even with much faster subjects (RT range 3,500–6,000 ms), and groups in between the very fast and very slow, a reduction of RT functions to repeated application of a simple mental rotation function appears to make very good sense. For the slow group of males, the simple mental rotation (of 0°) was repeated about 10 times with no difference in orientation between stimulus objects. When angular difference was 270°, the mental rotation was repeated about 40 times. Dahlstrom-Hakki et al. (2008) defined gaze duration on a face (of a stimulus cube) as the sum of the fixations on that cube before another face was fixated. The slope of the gaze duration as a function of angular difference in orientation (i.e., the slope of the simple mental rotation function) was roughly the same for slow and fast subjects, and across subjects about 0.41 ms/degree.
In Experiment 2, the corresponding slope constants (of about
0.50 and 0.79 ms/degree, see Table 5) include the contribution
from the linearly increasing number of saccades as a function of
angular difference, which may be elicited by a need to check more
subpatterns because objects at large angular differences look more
and more dissimilar. However, this will at most amount to about 0.03 ms/degree on the assumption that saccades on stimulus objects last about 27 ms. Also note that more than three or four saccades on an object in the present context seem highly unlikely in view of the limited VSTM capacity of three or four items or less.
Keeping this in mind, the estimates of the simple mental rotation
slopes in Dahlstrom-Hakki et al. (2008) and Experiment 2 roughly
appear of the same order of magnitude.
Successive Matching
A major reason for running the successive matching task was to
test the idea that the linear increase in the duration of the gaze (i.e.,
the mean of the sum of the duration of saccades and fixations) on
stimuli in the second pass equaled the linear increase in successive
matching RTs. Table 5 illustrates that this idea is contradicted by the data. As expected, the correlation between the slope constants in successive matching (Table 5, rows 4 and 5) and corresponding slope constants of the simple mental rotation functions (Table 5, rows 2 and 3) is fairly high (.81). However, in 11 of 12 comparisons the slope constant was larger in the successive matching task than the corresponding slope constant for the simple mental rotation in simultaneous matching (p = .003, cf. Table 5).
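The reported probability for 11 of 12 comparisons is what a one-sided sign test would give. The article does not name the test, so the choice of a one-sided binomial sign test here is an assumption; the function name is hypothetical.

```python
from math import comb


def sign_test_one_sided(successes, n):
    """P(X >= successes) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, k) for k in range(successes, n + 1)) / 2 ** n


if __name__ == "__main__":
    # 11 of 12 comparisons in the predicted direction.
    print(round(sign_test_one_sided(11, 12), 4))
```

Under the null hypothesis that either direction is equally likely, 11 or more of 12 outcomes in one direction occur with probability 13/4096.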
In the simultaneous matching task the participant has all the
information needed for a decision until the trial is terminated by a
response. Thus, in the simultaneous matching task, in which the
participant fully controls exposure duration, it is always possible to
go back and re-encode a stimulus. In response to the experimental
conditions in the successive matching task on the other hand, it
makes sense to encode the first stimulus thoroughly by more
fixations (and saccades), and possibly encode and retain some
features (e.g., verbally) in nonvisual buffers during the 2,000 ms
presentation of the first stimulus. Mental rotation of this presum-
ably richer encoded stimulus image in the successive matching
condition, in which more features may be aligned with respect to
orientation, should tend to generate larger slope constants in the
successive matching task.
Repeating Mental Rotation
The random walk model accounts for saccade switches between
stimulus objects in the simultaneous matching task (see Figure 6),
but offers no insight into the details of the underlying processes.
Two possibilities seem rather straightforward, however. (a) The
participants may solve the task piece by piece; that is, by first
encoding a segment of a stimulus, switching to the other stimulus
to make the comparison after a simple mental rotation, switching
back to the first stimulus, encoding a new (or the same) segment,
again switching to the other stimulus to compare the mentally
rotated encoded segment and so forth until the evidence sufficient
for a match/mismatch decision has been accumulated.
If participants stick to this procedure then it follows that the last
saccade switch on a trial should always be odd (1, 3, 5, etc.), never
even. Furthermore, because re-encoding (following even num-
bered switches) would not entail mental rotation, only fixations
following odd switches reflect the basic orientation alignment. (b)
After the first switch (or generally any odd numbered switch),
observers may happen to note characteristic features in the stim-
ulus they currently study, and switch back and mentally rotate the
image of these features in the opposite direction in order to test for
a match.
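Hypothesis (a) makes a simple testable prediction about the parity of the last saccade switch on a trial. A minimal sketch of how that prediction could be checked is given below; the function name and the example counts are hypothetical and are not data from Experiments 1 or 2.

```python
def proportion_odd_final_switch(switch_counts):
    """Proportion of trials whose total number of saccade switches is odd.

    Under hypothesis (a) the response always follows an odd-numbered
    switch, so this proportion should be close to 1.
    """
    if not switch_counts:
        raise ValueError("need at least one trial")
    odd_trials = sum(1 for count in switch_counts if count % 2 == 1)
    return odd_trials / len(switch_counts)


if __name__ == "__main__":
    # Hypothetical switch counts for eight trials.
    print(proportion_odd_final_switch([1, 3, 1, 5, 2, 3, 1, 4]))
```

A proportion well below 1 would speak against strict piece-by-piece processing and leave room for hypothesis (b).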
Experiments 1 and 2 were not designed to elucidate these hypotheses. Because the fine-grained processing underlying the switches is uncertain, the estimates of the basic mental rotation velocity that are based on the duration of fixations in the second pass (see Table 5) may be too high, and should at any rate be viewed with some caution.
Is MT/V5 the Neural Mechanism That Aligns the
Orientation of Visual Images?
In a meta-analysis of 32 investigations of brain activations during mental rotation, Zacks (2008) found strong evidence for the hypothesis that the human motion area V5/MT is strongly implicated in mental rotation. Zacks reported that all experiments using transformation-specific contrasts (i.e., within-task comparisons of effects of mental rotation, e.g., comparing large rotations with small rotations) discovered activations about (47.5, 59.5, 10.0, in Talairach space), which corresponds to the visual motion area (V5/MT).
It is interesting that studies on the visual psychophysics of motion and mental rotation seem to explain why V5/MT is implicated in mental rotation. One group of studies documents that the motion aftereffect (MAE) interferes with mental rotation (see, e.g., Corballis & McLaren, 1982; Heil, Bajric, Rösler, & Henninghausen, 1997; Jolicoeur et al., 1998; Seurinck et al., 2011).
Another group of investigations reports the remarkably close functional relationships between online visual motion perception of stroboscopic stimuli that differ in orientation or size, and visual identification of stimuli that portray the same objects in different orientations or in different sizes (see Bundesen, Larsen, & Farrell, 1981; Bundesen, Larsen, & Farrell, 1983; Farrell et al., 1982; Larsen, 1985; Larsen & Bundesen, 2009; R. N. Shepard & Judd, 1976). In the studies of visual apparent motion, observers view two stimuli that differ in orientation, size, or both orientation and size. The stimuli are presented in sequential alternation (zero ISI), which, provided suitable timing of the stimuli, generates vivid impressions of revolving shape-preserving motion when the stimuli differ in orientation; impressions of an object that keeps distal size while moving back and forth in visual space when the stimuli differ in size; and impressions of screw-like helical motion in depth when the stimuli differ with respect to orientation and size. The minimum SOA at which shape-preserving motion breaks down increases as a linear function of angular difference, and as a linear function of (s − 1)/(s + 1), where s represents the size ratio⁴ between stimuli. When the stroboscopic stimuli differ with respect to orientation and size, SOA thresholds combine additively as a joint function of angular difference and (s − 1)/(s + 1). These results for the SOA dependencies in visual motion have direct analogues in visual identification latencies of objects that differ in orientation, size, or both orientation and size (Bundesen, Larsen, & Farrell, 1981; Larsen, 1985; Sekuler & Nash, 1972).
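The additive combination described above can be written as a small function. This is only a sketch of the functional form: the intercept and the two coefficients are hypothetical placeholders, not estimates from the cited studies.

```python
def soa_threshold(angle_deg, size_ratio, base_ms, ms_per_degree, ms_per_size_unit):
    """Minimum SOA (ms) as an additive function of angle and (s - 1)/(s + 1).

    All three coefficients are hypothetical illustration values supplied
    by the caller.
    """
    size_term = (size_ratio - 1.0) / (size_ratio + 1.0)
    return base_ms + ms_per_degree * angle_deg + ms_per_size_unit * size_term


if __name__ == "__main__":
    # Additivity: the cost of 60 degrees is the same at size ratio 1 and 2.
    coeffs = dict(base_ms=50.0, ms_per_degree=1.0, ms_per_size_unit=300.0)
    cost_at_s1 = soa_threshold(60, 1.0, **coeffs) - soa_threshold(0, 1.0, **coeffs)
    cost_at_s2 = soa_threshold(60, 2.0, **coeffs) - soa_threshold(0, 2.0, **coeffs)
    print(cost_at_s1, cost_at_s2)
```

Additivity means that the increment due to angular difference does not depend on the size ratio, and vice versa.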
The remarkable functional similarity between visual perception
and visual imagery notwithstanding, there is a puzzling difference
in the magnitude of temporal effects between SOA thresholds in
visual motion perception and RTs in visual object identification.
For example, R. N. Shepard and Judd (1976) observed that while the slope in apparent form-preserving revolving motion was of the order of 60 ms per 60°, the corresponding slope in mental rotation was about 1,000 ms per 60°. R. N. Shepard and Judd argued that the difference probably related to mental rotation being “inner” driven and visual motion driven externally.
However, the data in Table 5 (rows 2 and 3) point to a direct link at comparable time scales between visual motion perception and mental rotation. As can be seen from Table 5, simple mental rotations are done at a speed of about 30 or 40 ms per 60°, which fits pretty well with the reported findings for online perception of apparent rotational motion (Bundesen et al., 1983; Farrell et al., 1982; R. N. Shepard & Judd, 1976).
Perspectives
Individual differences in mental rotation proficiency are an active research area and have been investigated in numerous studies, for example, in children as a function of age and gender (e.g., Jansen, Schmelter, Quaiser-Pohl, Neuburger, & Heil, 2013), as a function of training (e.g., Heil, Rösler, Link, & Bajric, 1998; Moreau, 2013), and in neurological and clinical syndromes (e.g., Fiorio, Tinazzi, & Aglioti, 2006; Rogers et al., 2002).
The analysis and modeling of mental rotation in this article may
be very useful in unraveling the nature of the effects on mental
rotation in many of these studies. For instance, are effects of
training due to a speed-up of the simple (or basic) mental rotation
rate, or due to a reduction in saccade switches back and forth
between stimulus objects, or a more efficient assembly of the triad
of local components in the global module? Is the nature of the
developmental change in the ability to identify congruent objects
in different orientation related to the concomitant increase in
VSTM capacity?
Concluding Remarks
For each of 10 participants a random walk model accounts for
the approximately linearly increasing RTs on positive trials, flat
RTs on negative trials, and false alarms and miss rates as functions
of angular difference. In addition the model also predicted effects
of complexity and the number of eye movement switches between
stimuli as functions of angular difference in orientation for the six
participants in Experiment 2. The model assumes that a global
module, comprising encoding one of the stimulus objects into
VSTM, a simple mental rotation of the encoded image to fit the
other object, and match of image and object, is repeated due to the
limited VSTM capacity until the accumulated evidence for a
response has been sampled. The number of repetitions to reach the
evidence needed for reaching a decision is then treated as a random
walk (Feller, 1970).
The analysis of eye movements supports key aspects of the
model by replacing free parameters in the random walk model with
measures derived directly from partitioning RTs by the sequence
of eye movements. The eye movement analysis shows that pro-
cessing time is roughly a constant function of angular difference
until the first saccade switch between stimulus objects is com-
menced, while the duration of the remaining trial increases ap-
proximately linearly as a function of angular discrepancy. This
overall linear increase results from the additive effects of (a) a
linear increase in the number, but not the duration of saccades
between stimulus objects, (b) a linear increase in the number of
saccades of approximately constant duration on a stimulus, and (c)
a linear increase in the number and in the duration of fixations on a stimulus object. The slope constants for the duration of fixations on trials with simple and complex stimuli were small (about 15 ms per 60°), but they do not seem different.

⁴ In general SOA is a linear function of (s − 1)/(s + 1), where s represents the size ratio between stimulus objects. In special viewing conditions (see Larsen & Bundesen, 2009), however, SOA is just a linear function of s − 1.
The approximately constant duration of saccades as a function of angular difference, the findings that mental rotation may not be done (is suppressed) during saccades (Irwin & Brockmole, 2000; Irwin & Carlson-Radvansky, 1996), and the observation that the duration of fixations (in the second pass) increases approximately linearly as a function of angular difference, all support the hypothesis that the critical orientation alignment takes place during fixations. The small slope constant suggests that this basic orientation alignment of a visual image (or parts thereof) takes place at very high speed (perhaps about 15 ms per 60°), which together with converging evidence from brain imaging and visual psychophysics suggests that the alignment is done by mechanisms developed for online motion perception.
In conclusion, deciding whether objects that appear in different orientations are identical may be surprisingly time consuming, sometimes taking 4 or 5 s, occasionally even 10 or 15 s. The paper presents and tests aspects of a general computational framework that explains visual behavior in the classic mental rotation task. The framework links (a) the underlying processes running on a millisecond time scale revealed by eye movements to (b) local processes measured on a time scale in hundreds of milliseconds, and at a higher level, (c) local processes to a global process measured in seconds.
References
Alvarez, G. A., & P. Cavanagh, P. (2004). The capacity of visual short-
term memory is set both by visual information load and by number of
objects. Psychological Science, 15, 106 –111. doi:10.1111/j.0963-7214
.2004.01502006.x
Anderson, J. R. (1978). Arguments concerning representations for mental
imagery. Psychological Review, 85, 249 –277. doi:10.1037/0033-295X
.85.4.249
Bethell-Fox, C. E., & Shepard, R. N. (1988). Mental rotation: Effects of
stimulus complexity and familiarity. Journal of Experimental Psychol-
ogy: Human Perception and Performance, 14, 12–23. doi:10.1037/
0096-1523.14.1.12
Biederman, I. (1987). Recognition-by-components: A theory of human
image understanding. Psychological Review, 94, 115–147. doi:10.1037/
0033-295X.94.2.115
Borst, G., Kievit, R. A., Thompson, W. L., & Kosslyn, S. M. (2011).
Mental rotation is not easily cognitively penetrable. Journal of Cognitive
Psychology, 23, 60 –75. doi:10.1080/20445911.2011.454498
Bundesen, C. (1982). Item recognition with automatized performance.
Scandinavian Journal of Psychology, 23, 173–192. doi:10.1111/j.1467-
9450.1982.tb00431.x
Bundesen, C. (1990). A theory of visual attention. Psychological Review,
97, 523–547. doi:10.1037/0033-295X.97.4.523
Bundesen, C., & Larsen, A. (1975). Visual transformation of size. Journal
of Experimental Psychology: Human Perception and Performance, 1,
214 –220. doi:10.1037/0096-1523.1.3.214
Bundesen, C., Larsen, A., & Farrell, J. E. (1981). Mental transformations
of size and orientation. In J. Long & A. Baddeley (Eds.), Attention and
performance IX (pp. 279 –294). Hillsdale, NJ: Erlbaum.
Bundesen, C., Larsen, A., & Farrell, J. E. (1983). Visual apparent movement:
Transformations of size and orientation. Perception, 12, 549–558.
doi:10.1068/pp.120549
Bundesen, C., Pedersen, L. F., & Larsen, A. (1984). Measuring efficiency
of selection from briefly exposed visual displays: A model for partial
report. Journal of Experimental Psychology: Human Perception and
Performance, 10, 329–339. doi:10.1037/0096-1523.10.3.329
Carpenter, P. A., & Just, M. A. (1978). Eye fixations during mental
rotation. In J. W. Senders, D. F. Fisher, & R. A. Monty (Eds.), Eye
movements and the psychological functions (pp. 115–133). Hillsdale,
NJ: Erlbaum.
Cohen, D. J., & Kubovy, M. (1993). Mental rotation, mental representa-
tion, and flat slopes. Cognitive Psychology, 25, 351–382. doi:10.1006/
cogp.1993.1009
Cooper, L. A. (1975). Mental rotation of random two-dimensional shapes.
Cognitive Psychology, 7, 20–43. doi:10.1016/0010-0285(75)90003-1
Cooper, L. A. (1976). Demonstration of a mental analog of an external
rotation. Perception & Psychophysics, 19, 296–302. doi:10.3758/
BF03204234
Cooper, L. A., & Podgorny, P. (1976). Mental transformations and visual
comparison processes. Journal of Experimental Psychology: Human
Perception and Performance, 2, 503–514. doi:10.1037/0096-1523.2.4
.503
Cooper, L. A., & Shepard, R. N. (1973). Chronometric studies of the
rotation of mental images. In W. G. Chase (Ed.), Visual information
processing (pp. 75–176). Oxford, England: Academic.
Corballis, M. C., & McLaren, R. (1982). Interaction between perceived and
imagined rotation. Journal of Experimental Psychology: Human Per-
ception and Performance, 8, 215–224. doi:10.1037/0096-1523.8.2.215
Dahlstrom-Hakki, I., Pollatsek, A., Fisher, D. L., Miller, B., & Rayner, K.
(2008). Eye movements and individual differences in mental rotation. In
K. Rayner, D. Shen, X. Bai, & G. Yan (Eds.), Cognitive and cultural
influences on eye movements (pp. 209–232). Hove, England: Psychology Press.
Dodwell, P. C. (1970). Visual pattern recognition. New York, NY: Holt,
Rinehart & Winston.
Edelman, S. (1995). Class similarity and viewpoint invariance in the
recognition of 3D objects. Biological Cybernetics, 72, 207–220. doi:
10.1007/BF00201485
Farrell, J. E., Larsen, A., & Bundesen, C. (1982). Velocity constraints on
apparent rotational movement. Perception, 11, 541–546. doi:10.1068/pp
.110541
Feller, W. (1970). An introduction to probability theory and its applica-
tions. New York, NY: Wiley.
Fiorio, M., Tinazzi, M., & Aglioti, S. M. (2006). Selective impairment of
hand mental rotation in patients with focal hand dystonia. Brain, 129,
47–54. doi:10.1093/brain/awh630
Folk, M. D., & Luce, R. D. (1987). Effects of stimulus complexity on
mental rotation rate of polygons. Journal of Experimental Psychology:
Human Perception and Performance, 13, 395–404. doi:10.1037/0096-1523.13.3.395
Förster, B., Gebhardt, R-P., Lindlar, K., Siemann, M., & Delius, J. D.
(1996). Mental-rotation effect: A function of elementary stimulus dis-
criminability? Perception, 25, 1301–1316. doi:10.1068/pp.251301
Furmanski, C. S., & Engel, S. A. (2000). Perceptual learning in object
recognition: Object specificity and size invariance. Vision Research, 40,
473–484. doi:10.1016/S0042-6989(99)00134-0
Gibson, E. J. (1969). Principles of perceptual learning and development.
East Norwalk, CT: Appleton-Century-Crofts.
Graf, M. (2006). Coordinate transformations in object recognition.
Psychological Bulletin, 132, 920–945. doi:10.1037/0033-2909.132.6.920
Hebb, D. O. (1949). The organization of behavior. New York, NY: Wiley.
Heil, M., Bajric, J., Rösler, F., & Hennighausen, E. (1997). A rotation
aftereffect changes both the speed and the preferred direction of mental
rotation. Journal of Experimental Psychology: Human Perception and
Performance, 23, 681–692. doi:10.1037/0096-1523.23.3.681
This document is copyrighted by the American Psychological Association or one of its allied publishers.
This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.
Heil, M., Rösler, F., Link, M., & Bajric, J. (1998). What is improved if a
mental rotation task is repeated—The efficiency of memory access, or
the speed of a transformation routine? Psychological Research, 61,
99–106. doi:10.1007/s004260050016
Hodgetts, C. J., Hahn, U., & Chater, N. (2009). Transformation and
alignment in similarity. Cognition, 113, 62–79. doi:10.1016/j.cognition
.2009.07.010
Hyun, J.-S., & Luck, S. J. (2007). Visual working memory as the substrate
for mental rotation. Psychonomic Bulletin & Review, 14, 154–158.
doi:10.3758/BF03194043
Irwin, D. (2004). Fixation location and fixation duration as indices of
cognitive processing. In J. M. Henderson & F. Ferreira (Eds.), The
interface of language, vision, and action: Eye movements and the visual
world (pp. 105–134). New York, NY: Psychology Press.
Irwin, D. E., & Brockmole, J. R. (2000). Mental rotation is suppressed
during saccadic eye movements. Psychonomic Bulletin & Review, 7,
654–661. doi:10.3758/BF03213003
Irwin, D. E., & Carlson-Radvansky, L. A. (1996). Cognitive suppression
during saccadic eye movements. Psychological Science, 7, 83–88.
doi:10.1111/j.1467-9280.1996.tb00334.x
Jansen, P., Schmelter, A., Quaiser-Pohl, C., Neuburger, S., & Heil, M.
(2013). Mental rotation performance in primary school age children: Are
there gender differences in chronometric tests? Cognitive Development,
28, 51–62. doi:10.1016/j.cogdev.2012.08.005
Jolicoeur, P., Corballis, M. C., & Lawson, R. (1998). The influence of
perceived rotary motion on the recognition of rotated objects. Psychonomic
Bulletin & Review, 5, 140–146. doi:10.3758/BF03209470
Jonikaitis, D., Deubel, H., & de’Sperati, C. (2009). Time gaps in mental
imagery introduced by competing saccadic tasks. Vision Research, 49,
2164–2175. doi:10.1016/j.visres.2009.05.021
Just, M. A., & Carpenter, P. A. (1976). Eye fixations and cognitive
processes. Cognitive Psychology, 8, 441–480. doi:10.1016/0010-0285(76)90015-3
Köhler, W. (1929). Gestalt psychology. Oxford, England: Liveright.
Kosslyn, S. M. (1973). Scanning visual images: Some structural implications.
Perception & Psychophysics, 14, 90–94. doi:10.3758/BF03198621
Kosslyn, S. M. (1980). Image and mind. Cambridge, MA: Harvard Uni-
versity Press.
Larsen, A. (1985). Pattern matching: Effects of size ratio, angular difference
in orientation, and familiarity. Perception & Psychophysics, 38,
63–68. doi:10.3758/BF03202925
Larsen, A., & Bundesen, C. (1998). Effects of spatial separation in visual
pattern matching: Evidence on the role of mental translation. Journal of
Experimental Psychology: Human Perception and Performance, 24,
719–731. doi:10.1037/0096-1523.24.3.719
Larsen, A., & Bundesen, C. (2009). Common mechanisms in apparent
motion perception and visual pattern matching. Scandinavian Journal of
Psychology, 50, 526–534. doi:10.1111/j.1467-9450.2009.00782.x
Larsen, A., McIlhagga, W., & Bundesen, C. (1999). Visual pattern match-
ing: Effects of size ratio, complexity, and similarity in visual pattern
matching. Psychological Research/Psychologische Forschung, 62,
280–288. doi:10.1007/s004260050058
Lashley, K. S. (1942). The problem of cerebral organization in vision. In
H. Klüver (Ed.), Biological symposia: Visual mechanisms (pp. 301–
322). Lancaster, PA: Cattell Press.
Liesefeld, H. R., & Zimmer, H. D. (2013). Think spatial: The representa-
tion in mental rotation is nonvisual. Journal of Experimental Psychol-
ogy: Learning, Memory, and Cognition, 39, 167–182. doi:10.1037/
a0028904
Luck, S. J., & Vogel, E. K. (1997). The capacity of visual working memory
for features and conjunctions. Nature, 390, 279 –281. doi:10.1038/36846
Mach, E. (1902). Die Analyse der Empfindungen und das Verhältnis des
Physischen zum Psychischen [The Analysis of Sensations and the Relation of
the Physical to the Psychical] (3rd ed.). Jena, Germany: Gustav Fischer.
Metzler, J., & Shepard, R. N. (1974). Transformational studies of the
internal representation of three-dimensional objects. In R. L. Solso (Ed.),
Theories in cognitive psychology: The Loyola Symposium (pp. 146–201).
Oxford, England: Erlbaum.
Moreau, D. (2013). Differentiating two- from three-dimensional mental
rotation training effects. Quarterly Journal of Experimental Psychology,
66, 1399–1413. doi:10.1080/17470218.2012.744761
Nakatani, C., & Pollatsek, A. (2004). An eye movement analysis of
“mental rotation” of simple scenes. Perception & Psychophysics, 66,
1227–1245. doi:10.3758/BF03196848
Pashler, H. (1988). Familiarity and visual change detection. Perception &
Psychophysics, 44, 369–378. doi:10.3758/BF03210419
Phillips, W. A. (1974). On the distinction between sensory storage and
short-term visual memory. Perception & Psychophysics, 16, 283–290.
doi:10.3758/BF03203943
Pitts, W., & McCulloch, W. S. (1947). How we know universals: The
perception of auditory and visual forms. Bulletin of Mathematical Bio-
physics, 9, 127–147. doi:10.1007/BF02478291
Prime, D. J., & Jolicoeur, P. (2010). Mental rotation requires visual
short-term memory: Evidence from human electric cortical activity.
Journal of Cognitive Neuroscience, 22, 2437–2446. doi:10.1162/jocn
.2009.21337
Pylyshyn, Z. W. (1973). What the mind’s eye tells the mind’s brain: A
critique of mental imagery. Psychological Bulletin, 80, 1–24. doi:
10.1037/h0034650
Pylyshyn, Z. W. (1979). The rate of mental rotation of images: A test of a
holistic analogue hypothesis. Memory & Cognition, 7, 19–28.
doi:10.3758/BF03196930
Pylyshyn, Z. (2003). Return of the mental image: “Are there really pictures
in the brain?” Trends in Cognitive Sciences, 7, 113–118. doi:
10.1016/S1364-6613(03)00003-2
Rock, I. (1956). The orientation of forms on the retina and in the environ-
ment. The American Journal of Psychology, 69, 513–528. doi:10.2307/
1419077
Rogers, M. A., Bradshaw, J. L., Phillips, J. G., Chiu, E., Mileshkin, C., &
Vaddadi, K. (2002). Mental rotation in unipolar major depression. Jour-
nal of Clinical and Experimental Neuropsychology, 24, 101–106. doi:
10.1076/jcen.24.1.101.974
Searle, J. A., & Hamm, J. P. (2012). Individual differences in the mixture
ratio of rotation and nonrotation trials during rotated mirror/normal letter
discriminations. Memory & Cognition, 40, 594–613. doi:10.3758/s13421-011-0172-2
Sekuler, R., & Nash, D. (1972). Speed of size scaling in human vision.
Psychonomic Science, 27, 93–94. doi:10.3758/BF03328898
Selfridge, O. G. (1959). Pandemonium: A paradigm for learning. In
Mechanisation of thought processes (pp. 511–526). London, England:
Her Majesty’s Stationery Office.
Seurinck, R., de Lange, F. P., Achten, E., & Vingerhoets, G. (2011).
Mental rotation meets the motion aftereffect: The role of hV5/MT in
visual mental imagery. Journal of Cognitive Neuroscience, 23, 1395–
1404. doi:10.1162/jocn.2010.21525
Shepard, R. N., & Cooper, L. A. (Eds.). (1982). Mental images and their
transformations. Cambridge, MA: MIT Press.
Shepard, R. N., & Judd, S. A. (1976). Perceptual illusion of rotation of
three-dimensional objects. Science, 191, 952–954. doi:10.1126/science
.1251207
Shepard, R. N., & Metzler, J. (1971). Mental rotation of three-dimensional
objects. Science, 171, 701–703. doi:10.1126/science.171.3972.701
Shepard, S., & Metzler, D. (1988). Mental rotation: Effects of dimensionality
of objects and type of task. Journal of Experimental Psychology:
Human Perception and Performance, 14, 3–11. doi:10.1037/0096-1523.14.1.3
Shibuya, H., & Bundesen, C. (1988). Visual selection from multi-element
displays: Measuring and modeling effects of exposure duration. Journal
of Experimental Psychology: Human Perception and Performance, 14,
591–600. doi:10.1037/0096-1523.14.4.591
Sørensen, T. A., & Kyllingsbæk, S. (2012). Short-term storage capacity for
visual objects depends on expertise. Acta Psychologica, 140, 158–163.
doi:10.1016/j.actpsy.2012.04.004
Sperling, G. (1960). The information available in brief visual presentations.
Psychological Monographs: General and Applied, 74, 1–29. doi:
10.1037/h0093759
Sutherland, N. S. (1968). Outlines of a theory of visual pattern recognition
in animals and man. Proceedings of the Royal Society of London: Series
B, 171, 297–317. doi:10.1098/rspb.1968.0072
Takano, Y. (1989). Perception of rotated forms: A theory of information
types. Cognitive Psychology, 21, 1–59. doi:10.1016/0010-
0285(89)90002-9
Tarr, M. J., & Gauthier, I. (1998). Do viewpoint-dependent mechanisms
generalize across members of a class? Cognition, 67, 73–110. doi:
10.1016/S0010-0277(98)00023-7
Todd, J. J., & Marois, R. (2004). Capacity limit of visual short-term
memory in human posterior parietal cortex. Nature, 428, 751–754.
doi:10.1038/nature02466
Vogel, E. K., & Machizawa, M. G. (2004). Neural activity predicts indi-
vidual differences in visual working memory capacity. Nature, 428,
748 –751. doi:10.1038/nature02447
Wheeler, M. E., & Treisman, A. M. (2002). Binding in short-term visual
memory. Journal of Experimental Psychology: General, 131, 48–64.
doi:10.1037/0096-3445.131.1.48
Yuille, J. C., & Steiger, J. H. (1982). Nonholistic processing in mental
rotation: Some suggestive evidence. Perception & Psychophysics, 31,
201–209. doi:10.3758/BF03202524
Zacks, J. M. (2008). Neuroimaging studies of mental rotation: A meta-
analysis and review. Journal of Cognitive Neuroscience, 20, 1–19.
doi:10.1162/jocn.2008.20013
Received June 25, 2013
Revision received December 9, 2013
Accepted December 9, 2013
... We selected this task because it is one of the most heavily researched tasks of visuospatial cognitive ability that allows both accuracy and response time to be measured in a trial-by-trial fashion. Mental rotation was also an attractive option for the current study because it has previously been studied in adults using evidence accumulation models similar to the diffusion model (Larsen, 2014; Provost & Heathcote, 2015). If specific difficulties in the mental rotation process exist, they would be captured by a significant ADHD × Rotation interaction for Ter, where group differences increase with angle of rotation. ...
... There are of course limitations as well. First, although the mental rotation task is among the most widely accepted tests of visuospatial cognition and has been previously evaluated with evidence accumulation models (Larsen, 2014; Provost & Heathcote, 2015), it does not capture all elements of spatial cognition. Future studies are encouraged to evaluate the generalizability of our findings by applying the DM to other spatial processes such as perspective taking, spatial memory, or other forms of mental transformation (e.g., 3-dimensional rotation, mental folding, brittle transformations) (Bayliss, Jarrold, Baddeley, Gunn, & Leigh, 2005; Kozhevnikov & Hegarty, 2001; Moreau, 2013; Resnick & Shipley, 2013; Rump & McNamara, 2013). ...
... Thus, our findings for the influence of rotation on boundary separation or Ter might be questioned. However, in this particular task, stimuli are actively rotated before the onset of the accumulation stage, such that knowledge of rotation angle could theoretically be used to determine the appropriate level of caution for each trial (Larsen, 2014). Likewise, since a significant portion of Ter is rotation time, which necessarily varies by angular discrepancy, trial-by-trial differences are to be expected. ...
Objectives Multiple studies have found evidence of task non-specific slow drift rate in ADHD, and slow drift rate has rapidly become one of the most visible cognitive hallmarks of the disorder. In this study, we use the diffusion model to determine whether atypicalities in visuospatial cognitive processing exist independently of slow drift rate. Methods Eight- to twelve-year-old children with ( n = 207) and without ADHD ( n = 99) completed a 144-trial mental rotation task. Results Performance of children with ADHD was less accurate and more variable than non-ADHD controls, but there were no group differences in mean response time. Drift rate was slower, but nondecision time was faster for children with ADHD. A Rotation × ADHD interaction for boundary separation was also found in which children with ADHD did not strategically adjust their response thresholds to the same degree as non-ADHD controls. However, the Rotation × ADHD interaction was not significant for nondecision time, which would have been the primary indicator of a specific deficit in mental rotation per se . Conclusions Poorer performance on the mental rotation task was due to slow rate of evidence accumulation, as well as relative inflexibility in adjusting boundary separation, but not to impaired visuospatial processing specifically. We discuss the implications of these findings for future cognitive research in ADHD.
... As individuals' fixations maintain the gaze on a single location, MR is closely related to our ability to visually encode spatially distributed information [17,18]. While there are studies [7,16,19–22] that have reported changes in gaze metrics during MR tasks, saccadic characteristics still require the attention of researchers. Irwin et al. [20] suggested that there is suppression of the MR process during the saccadic eye movements. ...
... A negative correlation was found between saccade duration and fixation count that means higher fixation count in reflex angle range than convex. Since the number of saccades between stimuli decreases with experience, participants encode more complete representations, requiring fewer encode-rotate-compare iterations [21,40]. The cognitive emphasis on the angles or major feature of the visual stimulus has significant contribution in the creation of an internal representation of the stimulus, and when an original stimulus is perceived and interpreted by matching, similar fixation pattern is used on the matched stimuli. ...
Mental rotation (MR) is an important aspect of cognitive processing in gaming since transformation and manipulation of visuospatial information are necessary in order to execute a gaming task. This study provides insights on saccadic characteristics in gaming task performance that involves 2D and 3D isomorphic objects with varying angular disparity. Healthy participants (N = 60) performed MR gaming task. Each participant was tested individually in an acoustic treated lab environment. Gaze behavior data of all participants were recorded during task execution and analyzed to find the changes in spatiotemporal characteristics of saccades associated with the variation in angular disparity and dimensionality. There were four groups with unique combination of angular disparity and dimensionality, each with fifteen participants randomly assigned. Results indicate that the spatial characteristics of the object affect the temporal aspect of saccade (duration), whereas the spatial aspect of the saccade (amplitude) is influenced by the objects’ dimension. A longer saccade duration indicates a prolonged suppression of spatial information processing during the MR tasks with objects at convex range angular disparities. Therefore, the MR tasks with convex angular disparity become more complex to process compared to the tasks with reflex angular disparity. MR process is faster and more accurate with 3D objects compared to the 2D objects. There is an interaction between angular disparity and dimensionality in terms of mental demand, such that the MR processing with 2D objects in reflex angular disparity was more mental demanding than that of convex angular disparity; however, this trend was absent in case of 3D objects. Hence, during the MR task, the longer saccade duration implies that the tasks with convex angular disparities become comparatively more challenging. 
Also, the lower saccadic amplitude for 2D objects indicates difficulties in processing due to deficient visual features. The findings could help in framing the computer-based game (or video game) concerning MR abilities for training or rehabilitation purposes.
... In their original study, Shepard and Metzler viewed the linear relationship between rotation angle and reaction time as evidence against conceptual or propositional processing of visual information [7,43]. Later research, which investigated the process of rotation itself, revealed that both a holistic and a piecemeal approach were used to align the figures [16,42,44]. When processing visual figures, motion parallax allows for lateral head movements, which could be used to decrease the rotation angle between the figures by changing perspectives. ...
Mental rotation is the ability to rotate mental representations of objects in space. Shepard and Metzler’s shape-matching tasks, frequently used to test mental rotation, involve presenting pictorial representations of 3D objects. This stimulus material has raised questions regarding the ecological validity of the test for mental rotation with actual visual 3D objects. To systematically investigate differences in mental rotation with pictorial and visual stimuli, we compared data of N = 54 university students from a virtual reality experiment. Comparing both conditions within subjects, we found higher accuracy and faster reaction times for 3D visual figures. We expected eye tracking to reveal differences in participants’ stimulus processing and mental rotation strategies induced by the visual differences. We statistically compared fixations (locations), saccades (directions), pupil changes, and head movements. Supplementary Shapley values of a Gradient Boosting Decision Tree algorithm were analyzed, which correctly classified the two conditions using eye and head movements. The results indicated that with visual 3D figures, the encoding of spatial information was less demanding, and participants may have used egocentric transformations and perspective changes. Moreover, participants showed eye movements associated with more holistic processing for visual 3D figures and more piecemeal processing for pictorial 2D figures.
... This showed a longer ter on large than small rotations, a well-documented finding across various studies on the mental rotation task (Feldman & Huang-Pollock, 2021; Provost & Heathcote, 2015; Tlauka et al., 2018). In this context, the longer ter could be attributed to slower visual stimulus encoding in larger rotation angles, a lengthier stimulus rotation process, and/or reduced motor preparation and execution speed (Feldman & Huang-Pollock, 2021; Larsen, 2014; Provost & Heathcote, 2015). These model-based analysis results underscore the importance of distinguishing performance-related factors from decisional and non-decisional processes in exploring conflict processing abilities during mental rotation. ...
The conflict processing mechanisms in mental imagery within one spatial location have been conceptualized by the dual-stage two-phase model. However, it remains unclear whether the imagery conflict can arise across disparate spatial locations and how these models can account for decision-making across such locations. Our study addresses these questions by introducing a novel mental-rotation flanker task, comprising both a central gaze judgment task and a flanking gaze judgment task. As expected, we found a decreased imagery conflict effect with increasing mental rotation, even between different spatial locations. Interestingly, our cross-experimental analysis revealed enhanced processing of stimuli in the fovea compared to the peripheral visual field, alongside modulation of processing advantages in central vision by rotation angles. Furthermore, we successfully fitted three conflict drift-diffusion models to the experimental data using a hierarchical Bayesian method, revealing a two-phase conflict processing mechanism in mental imagery for both tasks. Specifically, in the central task, the model-based analysis revealed a reduced duration of non-decision process for small rotations, a more conservative response strategy, and enhanced processing of imagery conflict in the decision process. Conversely, in the flanking task, participants adopted a similar response strategy across various rotations, accompanied by a decreased rate of evidence accumulated rate in the second phase for small rotations. Overall, our results suggested that angle-related conflict processing across disparate spatial locations was driven by both decisional and non-decisional components. We discuss the implications of these findings for future research on the cognitive intricacies of mental rotation and conflict tasks.
... Visual literacy tasks with an average level of difficulty, such as mental rotation, assess our ability to determine whether objects have the same shape despite differences in orientation or size. This form of assessment is a classical visual perception problem in which participants have to imagine one of the figures rotated in the same orientation as the other to successfully mentally rotate in the given time (Larsen, 2014). Mental rotation calls for the visual review and spatial thinking to visualize the rotation of an object in space (Hegarty & Waller, 2005) and has been found to be a core component of scientific reasoning (Xu & Franconeri, 2015). ...
The rapid development of other nations' science and technology makes it more difficult to stay competitive internationally without concentrating on how science is taught in US classes. Representative competence, the capacity to generate, transform, interpret and clarify representations, is the primary obstacle to visual literacy in science, technology, engineering and mathematics (STEM) fields and although the relationship between the fundamental visual literacy and domain-specific science literacy is known, how visual science literacy is achieved through science learning is still not fully understood. In order to improve student representational competence skills, the hierarchy of component visualization skills required to interpret these science representations needs to be identified in order to evaluate an individual’s level of visual science literacy and to provide the resources to enable the individual to reach the next literacy level. This involves the construction of instruments capable of assessing visual science literacy as well as a Rasch analysis to rank complexities of science visuals. This research investigates modern methods of assessing visual science literacy with a focus on using artificial neural networks (ANN) to analyze neurocognitive measurements captured during science content related tasks and requiring varying predictable levels of visual science literacy. The method of developing this machine learning tool will be detailed by investigating the ANN, successfully made using the Gradient Boosted Trees algorithm to analyze functional Near-Infrared Spectroscopy (fNIR) data. With an autonomic, neurocognitive, and quantitative scientific literacy assessment, educators and curriculum designers will have the ability to create more targeted classroom resources to enhance the visual and spatial cognitive processes behind visual science literacy. 
Keywords: visual science literacy, science literacy, visual literacy, artificial neural network, machine learning, artificial intelligence, mental rotation
... Processing efficiency, indexed by drift rate, was measured by applying the diffusion model to a perceptual decision task and a mental rotation task previously analyzed using evidence accumulation models (Larsen, 2014; Provost & Heathcote, 2015). ...
Slow drift rate has become one of the most salient cognitive deficits among children with ADHD, and has repeatedly been found to explain slow, variable, and error-prone performance on tasks of executive functioning (EF). The present study applies the diffusion model to determine whether slow drift rate better predicts parent and teacher ratings of ADHD than standard EF metrics. 201 children aged 8–12 completed two tests of speeded decision-making analyzed with the diffusion model and two traditionally scored tests of EF. Latent EF and drift rate factors each independently predicted the general ADHD factor in a bifactor model of ADHD, with poor EF and slow drift rate associated with greater ADHD symptomology. When both EF and drift rate were entered into the model, slow drift rate (but not EF) continued to predict elevated symptomology. These findings suggest that using drift rate to index task performance improves upon conventional approaches to measuring and conceptualizing cognitive dysfunction in ADHD. Implications for future cognitive research in ADHD are discussed.
... Encoding a representation of the first fixated object is followed by rotation of the representation to determine if it matches the second object. Afterward, a comparison of the rotated mental image to this second object precipitates a decision (Larsen 2014). The process of encoding is related to the early visual attention process, thereby impacting score accuracy in the classical mental rotation task (Ruggeri et al. 2020). ...
The present study aimed to explore the influence of the chronotype on mental rotation performance in university students. Using the Morningness-Eveningness Questionnaire (MEQ), 24 healthy volunteers were categorized as either early chronotype (ECT) or late chronotype (LCT). Participants completed a chronometric mental rotation task with three-dimensional stimuli at different times of day (8 AM and 8 PM). ECT participants showed a shorter reaction time in the morning trial than in the evening (p = 0.003), whereas LCT participants showed a shorter reaction time (p = 0.001) and increased accuracy (p = 0.031) in the evening compared to the morning session. Additionally, the MEQ score was positively correlated with the difference in reaction time between morning and evening trials (r = −0.589, p = 0.002). Two-way analysis of variance revealed an interaction between time and chronotype for the parameter reaction time in the evening trials (F(1, 22) = 28.27, p < 0.001). LCT participants showed higher speed and increased accuracy during their optimal time compared to ECT participants. This study explored diurnal alterations of visual-spatial abilities assessed as mental rotation performance, and the possible implications for certain life skills such as sports, car driving, and manual labor are discussed.
Mental rotation (MR) of character letters requires participants to mentally rotate the letter in their minds' eyes through a process akin to the physical rotation of the stimulus. It has been suggested that different cognitive processes are engaged during such MR of both canonical and mirror-reversed letters. In addition to the planar rotation of the canonical letters, an additional "flip-over" process (non-planar rotation) has been assumed during the MR of mirror-reversed letters. However, the temporal relationship between planar and non-planar rotation has not been systematically investigated. In this study, the occurrence of both planar and non-planar rotations were examined through the analysis of the event-related brain potentials (ERPs) elicited by canonical or mirror-reversed letters presented at different rotation angles between 300 and 1000 ms post-stimulus onset over consecutive 50ms time-windows. For smaller rotation angles (30° and 60°), non-planar rotation preceded planar rotation. For letters rotated by 90°, planar and non-planar rotation occurred at the same time. For larger angles (120° and 150°), the letter was first rotated within the plane (planar rotation) and afterwards it was also rotated out-of-the-plane (non-planar rotation) until it was fully canonicalized. Thus, the temporal relationship between planar and non-planar rotation differed for each rotation angle, with the non-planar rotation occurring at increasingly later intervals for different points in time for progressively larger rotation angles. These findings have relevant methodological implications for studies investigating the psychophysiological correlates of the mental rotation of mirror letters.
Article
The history of Danish neuroscience starts with an account of impressive contributions made in the 17th century. Thomas Bartholin was the first Danish neuroscientist, and his disciple Nicolaus Steno became internationally one of the most prominent neuroscientists in this period. From the start, Danish neuroscience was linked to clinical disciplines. This continued in the 19th and first half of the 20th centuries with new initiatives linking basic neuroscience to clinical neurology and psychiatry in the same scientific environment. Subsequently, from the middle of the 20th century, basic neuroscience was developing rapidly within the preclinical university sector. Clinical neuroscience continued and was even reinforced during this period with important translational research and a close co-operation between basic and clinical neuroscience. To distinguish 'history' from 'present time' is not easy, as many historical events continue in present time. Therefore, we decided to consider 'History' as new major scientific developments in Denmark, which were launched before the end of the 20th century. With this aim, scientists mentioned will have been born, with a few exceptions, no later than the early 1960s. However, we often refer to more recent publications in documenting the developments of initiatives launched before the end of the last century. In addition, several scientists have moved to Denmark after the beginning of the present century, and they certainly are contributing to the present status of Danish neuroscience, but, again, this is not the History of Danish neuroscience.
Article
As our viewpoint changes, the whole scene around us rotates coherently. This allows us to predict how one part of a scene (e.g., an object) will change by observing other parts (e.g., the scene background). While human object perception is known to be strongly context-dependent, previous research has largely focused on how scene context can disambiguate fixed object properties, such as identity (e.g., a car is easier to recognize on a road than on a beach). It remains an open question whether object representations are updated dynamically based on the surrounding scene context, for example across changes in viewpoint. Here, we tested whether human observers dynamically and automatically predict the appearance of objects based on the orientation of the background scene. In three behavioral experiments (N = 152), we temporarily occluded objects within scenes that rotated. Upon the objects' reappearance, participants had to perform a perceptual discrimination task, which did not require taking the scene rotation into account. Performance on this orthogonal task strongly depended on whether objects reappeared rotated coherently with the surrounding scene or not. This effect persisted even when a majority of trials violated this real-world contingency between scene and object, showcasing the automaticity of these scene-based predictions. These findings indicate that contextual information plays an important role in predicting object transformations in structured real-world environments.
Article
In a partial-report experiment, subjects reported the digits from a circular array of digits and letters terminated by a pattern mask. Individual frequency distributions of the number of correctly reported digits were analyzed as functions of number of digits (2, 4, or 6) and number of letters (0, 2, 4, 6, or 8) at nine exposure durations ranging from 10 to 200 ms. The distributions (hundreds of data points per subject) were accurately predicted by a four-parameter fixed-capacity independent race model that assumes exponentially distributed processing times, limitations in both processing capacity and storage capacity, and time-invariant selectivity. Estimated from the data, processing capacity C was 45 items/s, selectivity α (ratio between the amount of processing capacity devoted to a distractor and the amount devoted to a target) was 0.48, short-term storage capacity K was 3.5 items, and the longest ineffective exposure duration t0 was 18 ms.
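The race model summarized above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' implementation. The per-item rates follow from the abstract's definitions (a fixed total capacity C shared so that each distractor receives α times the capacity of a target). Three details are assumptions made here for illustration: distractors that finish processing also occupy storage slots, the non-integer K = 3.5 is treated as a trial-by-trial mixture of 3 and 4 slots, and the function names `simulate_trial` and `mean_score` are hypothetical.

```python
import random


def simulate_trial(n_targets, n_distractors, exposure_ms,
                   C=45.0, alpha=0.48, K=3.5, t0_ms=18.0, rng=random):
    """Return the number of targets stored on one simulated trial."""
    # Effective exposure in seconds: nothing is processed before t0.
    effective_s = max(0.0, (exposure_ms - t0_ms) / 1000.0)

    # Fixed capacity: n_targets * v_t + n_distractors * v_d = C, with v_d = alpha * v_t.
    denom = n_targets + alpha * n_distractors
    v_target = C / denom
    v_distractor = alpha * v_target

    # Assumption: non-integer K is a probability mixture of floor(K) and floor(K) + 1.
    slots = int(K) + (1 if rng.random() < K - int(K) else 0)

    # Independent exponential processing times for every item in the display.
    finish = [(rng.expovariate(v_target), True) for _ in range(n_targets)]
    finish += [(rng.expovariate(v_distractor), False) for _ in range(n_distractors)]

    # Items that finish within the effective exposure enter the store in
    # order of completion until the slots run out (assumption: distractors
    # compete for slots too).
    finished = sorted(item for item in finish if item[0] < effective_s)
    stored = finished[:slots]
    return sum(1 for _, is_target in stored if is_target)


def mean_score(n_targets, n_distractors, exposure_ms, trials=2000, seed=0):
    """Average number of reported targets over many simulated trials."""
    rng = random.Random(seed)
    total = sum(simulate_trial(n_targets, n_distractors, exposure_ms, rng=rng)
                for _ in range(trials))
    return total / trials
```

Calling `mean_score(6, 0, 200)` and `mean_score(6, 0, 50)` should show the simulated score growing with exposure duration and levelling off near the storage limit. Adding distractors, as in `mean_score(6, 8, 200)`, lowers the per-target rate and therefore the score.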
Article
In Exp I, 48 17–42 yr olds with normal or corrected vision performed a mental-rotation task in which they were timed as they decided whether rotated letters were normal or backwards. Between presentations of the letters, Ss watched a rotating textured disk that induced an aftereffect of rotary movement on the letters. The function relating RTs to orientation was influenced asymmetrically by the aftereffect, suggesting that perceived movement interacts with imagined movement. Exp II, with 8 20–30 yr olds, showed that the aftereffect produced a negligible influence on perceived orientation, suggesting that the influence of the aftereffect on mental rotation was not caused by changes in the perceived orientations of the letters. Analysis of the mental-rotation functions suggested that the aftereffect may sometimes have induced Ss to rotate letters through the larger rather than the smaller angle back to the upright where the aftereffect was in the appropriate direction. (22 ref)
Article
The present study contrasts 3 theories that provide explanations for performance improvement in mental rotation tasks. Wallace and Hofelich conjectured that the process as such may be executed more rapidly after training, while Bethell-Fox and Shepard attributed practice effects to the fact that images may be transformed, first elementwise but later as a Gestalt. In contrast, Tarr and Pinker assumed that a transformation of an image will no longer be computed after training but simply be retrieved from memory. Thirty-seven subjects participated in 3 test sessions in which they had to decide on the parity of 3-D block figures presented from different perspectives. Experimental group subjects underwent 4 additional practice sessions in which a subset of the figures and a subset of perspective views were used. Tests adapted to the predictions of the 3 theories revealed specific learning effects but no transfer, neither to old objects presented in new perspectives nor to new objects. This supports an instance-based explanation of practice effects which states that objects are represented in multiple perspective views.
Article
Six Ss (college students and staff) were required to discriminate previously learned "standard" versions of angular shapes from randomly perturbed "distractor" versions that varied in similarity to the standard. Advance information concerning the identity and the orientation of the test form was provided. Ss were instructed to prepare for the presentation of the test form by mentally rotating an internal representation of the designated standard form (identity cue) into the designated orientation. The time needed to prepare for the presentation of the test form increased linearly with the angular departure of the indicated orientation from a previously learned position. This finding suggests that, in accordance with instructions, Ss performed a mental rotation in preparing for the upcoming test shape. Rate of preparation was not affected by the complexity of the standard form presented as the identity cue. Discriminative reaction time was not affected by either test-form complexity or angular departure of the test form from the learned orientation. In addition, striking individual differences in the pattern of discriminative reaction times were found. (16 ref) (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
In contrast to the well documented male advantage in psychometric mental rotation tests, gender differences in chronometric experimental designs are still under dispute. Therefore, a systematic investigation of gender differences in mental rotation performance in primary-school children is presented in this paper. A chronometric mental rotation task was used to test 449 second and fourth graders. The children were tested in three separate groups each with different stimulus material (animal drawings, letters, or cube figures). The results show that chronometric mental rotation tasks with cube figures – even rotated in picture plane only – were too difficult for children in both age groups. Further analyses with animal drawings and letters as stimuli revealed an overall gender difference in response time (RT) favoring males, an increasing RT with increasing angular disparity for all children, and faster RTs for fourth graders compared to second graders. This is the first study which has shown consistent gender differences in chronometric mental rotation with primary school aged children regarding reaction time and accuracy while considering appropriate stimuli.