Two selection stages provide efficient object-based
attentional control for dynamic vision
Gerriet Backer
Krauss Software GmbH
Cremlingen, Germany
Gerriet.Backer@krauss-software.de
Bärbel Mertsching
AG IMA, Department of Computer Science
University Hamburg, Germany
mertsching@informatik.uni-hamburg.de
Abstract—In this paper, we introduce semiattentive computations as the result of replacing the usual single selection stage of visual attention by two consecutive selection stages. They
are motivated by shortcomings of conventional attention models
and correlate well to findings in human attention. The first
selection stage employs preattentive saliency computations for
the complete available input, and selects a small number of
discrete items. These are subject to the semiattentive processes
of tracking and information accumulation. The second stage
selects a single element from the result of the first selection
stage for the conventional focus of attention. The implementation
and efficiency of this scheme is demonstrated in this paper.
Its main advantage is the efficient selection and inhibition of
objects in dynamic scenes. It allows the serialized accumulation
of information for a changing environment and provides an up-
to-date world model. The focus of this paper is on the quality of
the computed world model and the object-related computations.
I. INTRODUCTION
Attentional mechanisms are mainly used to reduce the
amount of data for complex computations. They employ a
method of determining important, salient objects or areas and
select them, one after another, to be subjected to these
computations. Thus attention is a general method for seri-
alizing complex operations. Computations in those schemes
are either preattentively applied to the complete input data in
parallel or attentively and serially only to the selected area.
Complex accurate computations are usually done attentively
while simple computations are assigned to the preattentive
part.
The spotlight metaphor describes this behavior: at each time
only one region of space is illuminated. This area is the focus
of attention (FOA) and as such the place where complex
operations are applied. The spotlight can move to include other
regions. It is moved to regions of high saliency. The saliency
of an area can be determined by either data-driven bottom-up
information or model-driven top-down information. The focus
of computational attention models is mostly on the data-driven
information. The predominant selection unit for attention is
space. Only a few models deviate from this view and use
either features or objects as selection unit.
The rest of the article is organized as follows. A short review of the predominant attention models in section II leads us to an analysis of their drawbacks when operating in dynamic environments. Section III proposes necessary modifications that lead to our new architecture. An implementation of this architecture which outlines the object-based aspects follows. After an analysis and comparison of the properties of our model in section IV, we conclude with an outlook on further developments (section V).
II. PREVIOUS WORK
A. Conventional models of visual attention
A classic model of visual attention was proposed by Koch
and Ullman in 1985 [1]. With its parallel feature extraction stage, a master map of attention that integrates the saliency, a WTA process applied to this map, and an inhibition map that allows the scanning of maxima, it already provided many of the aspects present in today's attention models. The
model (outlined in figure 1) is closely related to models of
human visual attention like the Feature Integration Theory by
Treisman [2] or the Guided Search model by Wolfe [3], [4].
[Figure: preattentive stage with feature maps integrated into a saliency map; attentive stage with a WTA process, an inhibition map, and the resulting FOA.]
Fig. 1. Simplified version of conventional attention models.
Computer models that build on this architecture were in-
troduced by e.g. Milanese et al. [5], Leavers [6], Itti et al.
[7], and Maki [8]. Other models are more concerned with the
transformation of a scene part into a constant reference frame
like the routing circuits of Olshausen [9] or the inhibitory beam
of Tsotsos [10].
Attentional control is of special importance in active vision
[11], where the activity of the system, mostly in the form of directing a camera, has to be determined according to the
properties of the environment and the state and goal of the
system. Active vision is a form of overt visual attention that
is closely related to covert attention by selection from internal
representations.
Applications of visual attention in computer vision include
object recognition methods [12], control of vehicles [13], and
navigation [14]. Object recognition especially profits from the availability of a single segmented object in contrast to a cluttered scene.
B. Beyond the spotlight
Accounts that go beyond the spotlight metaphor are mainly
found in models of natural visual attention. Pylyshyn [15],
[16] proposed the so-called FINST-theory to give an account
of findings from various experimental paradigms. He was able
to show that one can keep track of a small number (about 4
or 5) of independently moving objects among other identical
objects [17]. Accounts of a fast serial scanning of the objects
by a single focus of attention could be ruled out due to the
necessary speed of the focus. There is also an attention-related
limit on the fast, parallel and error-free counting (so-called
subitizing) of a small number of items (about 4 or 5). This
led to the assumption that some indices are available pointing
to moving objects and sticking to them without the need for
focal attention. Indexed items are more easily available for
focal attention.
Object-based theories of visual attention [18], [19] challenge
the predominant spatial accounts of attention. According to
them, objects are the meaningful units in visual selection.
The partitioning of the scene into objects determines the
assignment of attention. The empirical evidence comes from
experiments in which, for identical spatial layouts, the suggested grouping into objects caused additional costs associated with the processing of multiple objects. Some recent empirical
findings point towards an integration of object-based effects
in spatial selection. A possible compromise suggests that
although attention selects a spatial part of the scene, the space
is determined by a fast object-based segmentation of the scene,
or by grouping effects.
Examples of the successful incorporation of object-based
approaches into computer models of visual attention have
been demonstrated at various levels by Fellenz [20], Maki et
al. [21], and Dickinson et al. [22]. We aim to contribute to
these achievements with a special focus on dynamic aspects
of controlling attention.
C. Limitations of conventional models
By applying conventional models (see section II-A) to dynamic scenes, we identify three major problems:
• Inhibition of return is bound to static locations instead of moving objects.
• Extracted information cannot be bound to moving objects.
• Selection and feature integration do not take the dynamic environment into account.
By inhibiting recently selected locations, inhibition of return
(IOR) allows the scanning of a scene by a serial process. The
area with maximal activation in the master map of attention is
marked in the inhibition map with high activity. The activity in
the inhibition map is slowly decaying and inhibits the master
map of attention, so that another area will show the highest
activity. Using this static inhibition map, it is not possible to
inhibit moving objects. Imagine a scene with a highly salient
moving object and a number of salient static objects. After the
moving object is selected and processed, it is marked in the
inhibition map. As soon as it moves out of the inhibited area
it becomes the most salient object and is selected again. This
prevents the system from scanning the scene and selecting
among the static objects. For human visual attention Tipper et
al. [23] have demonstrated that the IOR is in fact bound to
moving objects instead of static locations.
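To make this failure mode concrete, here is a minimal sketch (ours, not taken from the paper; grid size, decay factor, and inhibition radius are illustrative) of the conventional selection loop with a static inhibition map. A moving object escapes its own inhibition mask and wins the competition again, so the static objects are never scanned:

```python
import numpy as np

def static_ior_step(saliency, inhibition, decay=0.9, radius=4):
    """One selection step of a conventional attention model with a
    static inhibition map: select the maximum of (saliency - inhibition),
    then inhibit a fixed spatial neighborhood of the winner."""
    inhibition *= decay                        # inhibition slowly fades
    y, x = np.unravel_index(np.argmax(saliency - inhibition), saliency.shape)
    ys, xs = np.ogrid[:saliency.shape[0], :saliency.shape[1]]
    mask = (ys - y) ** 2 + (xs - x) ** 2 <= radius ** 2
    inhibition[mask] = saliency.max()          # mark the selected area
    return (y, x), inhibition

inhibition = np.zeros((32, 32))
pos = np.array([4, 4])
for t in range(6):
    saliency = np.zeros((32, 32))
    saliency[tuple(pos)] = 2.0                 # highly salient moving object
    saliency[20, 20] = 1.0                     # salient static object
    sel, inhibition = static_ior_step(saliency, inhibition)
    print(t, sel)   # the mover is re-selected once it has left its mask
    pos += 2                                   # object moves 2 px per frame
```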
By serializing the high-level computations, their results (e.g., object identities or classifications) are bound to the location an object occupied at the moment it was selected. In the case of moving objects, this information soon becomes outdated. Without expensive re-checking, the system cannot provide an up-to-date world model that binds identities to current locations.
Itti and Koch [24] identified the spatiotemporal integration
of saliency information as an important step in the control
of visual attention. Thus the saliency information has to be
computed taking into account the previous saliency data. This
can lead to problems if there is no knowledge about the movement of the objects that raise the saliency values.
In the following we will discuss how these problems can
be overcome in a new model of visual attention.
III. MODELING VISUAL ATTENTION
A. Consequences for modeling visual attention in dynamic
environments
From the above analysis it is clear that we need a method
of binding the saliency information to moving objects. For
the problem of dynamic IOR, we have to bind saliency
information to a small number of recently selected moving
objects. This also provides us with the binding of attentively
computed information to these objects and is thus a solution
for the first two problems.
This binding is necessary for the already selected objects
as well as salient objects that have not been selected yet.
This determines the need for a model-free tracking mechanism.
Objects that have never been selected for focal attention are not
recognized and can thus not be tracked based on knowledge
of their identity. Nevertheless, it is not necessary to track all objects in the scene. Only those that are salient enough to be candidates for focal selection are relevant. This indicates a
close connection between selection and tracking.
Determining the saliency of these objects is a necessary first step. Spatial and temporal integration of saliency is important in order to reflect the properties of the environment and to account for the inherent inaccuracies of the feature computations, which have to be carried out preattentively for the complete input images. The integration has to compensate for the fact that the objects may be moving and that the speed constraints on the preattentive feature computations impose limits on their accuracy and reliability.
B. Model architecture
The processes that have been classified as essential in the previous section can neither be assigned to the preattentive part nor to the attentive part of the selection. We therefore define
an additional semiattentive stage, where a small number of
discrete items is represented. These items have to be selected
by a first selection stage. This first selection stage selects a
small number of items according to their saliency integrated
over space and time. It should be robust and show hysteresis:
selected items remain selected for some time, even if other
items become more salient. Tracking is integrated into this first
selection stage, which makes it possible to bind extracted information to moving objects and to inhibit them from being selected by focal attention.
[Figure: the input image sequence feeds preattentive feature computations and a saliency representation; the semiattentive neural field selects activity clusters that are tracked in object files (the world model); behavior control selects the FOA for attentive object recognition.]
Fig. 2. Architecture of the attention model outlining the three computation stages. Inside the neural field, three-dimensional activity clusters are displayed.
Among the semiattentive algorithms is the generation of
symbolic descriptions for each selected item. These contain
information about the position, size, and trajectories as well
as histories of object selection, mean feature values, and
the results of high-level computations like object recognition.
They are stored in so-called object files, which constitute
the world model of the system. The notion of object files is
borrowed from psychophysical modeling [25], [26], [27] and
emphasizes the symbolic reference to an object preceding the computation of identity information.
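As an illustration, such an object file can be represented by a small record. The fields follow the enumeration in the text; the names and types are our own and merely a sketch:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ObjectFile:
    """Symbolic description of one semiattentively tracked item."""
    position: tuple                 # current (x, y[, z]) of the activity cluster
    size: float                     # spatial extent of the cluster
    trajectory: list = field(default_factory=list)     # history of positions
    selections: list = field(default_factory=list)     # times the item held the FOA
    mean_features: dict = field(default_factory=dict)  # e.g. mean color, disparity
    identity: Optional[str] = None  # result of attentive object recognition, if any
```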
For the focal selection of a single item, a second selection
stage is needed. This stage selects among the items that were
the result of the first selection stage. Second-stage selection
is subject to behaviors. It operates on the symbolic data
associated with an object and can include top-down influences.
The behavior is responsible for controlling the system; it can, e.g., initiate camera movements to foveate an object.
Figure 2 depicts the model architecture. Its implementation is
explained in the following section.
C. Model implementation
1) Feature computations: For the computation of saliency
we employ a number of features designed for fast object-related information extraction. To achieve robust behavior across different environments, these features use very different aspects of the visual information, including edges, areas, color information, and stereo information where available. The use of multi-scale computations ensures fast computations and robust results. We aimed at a more object-based behavior than simple filter operations could achieve.
Symmetry: To extract edge information in a biologically plausible way, Gabor filters of different scales and orientations are applied to the input. The energy of the Gabor filters orthogonal to circles of different radii at different scales is accumulated to compute the strength of rotational symmetry at every pixel. Symmetry is a strong cue for artificial objects as well as biological forms and points toward their center.
Eccentricity: A grey-level segmentation of the image,
consisting of a fast initial segmentation into many small
segments followed by a dilation and integration procedure,
provides area-based information on homogeneous object or object-part candidates. The saliency of segments is evaluated by computing the segments' eccentricity.
Color contrast: The image is first transformed into the
MTM color space [28] to achieve human-like processing of
colors. There, a segmentation takes place. The saliency of each
segment is computed according to the mean color contrast
to its neighboring segments weighted by the length of the
common border.
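A minimal sketch of this computation, assuming the segmentation is given as a label image and the color image has already been transformed (e.g., into the MTM space); function and variable names are ours:

```python
import numpy as np
from collections import defaultdict

def color_contrast_saliency(labels, color):
    """Per-segment saliency: mean color contrast to neighboring segments,
    weighted by the length of the common border."""
    ids = np.unique(labels)
    mean = {i: color[labels == i].mean(axis=0) for i in ids}
    border = defaultdict(int)                       # (i, j) -> shared border length
    for a, b in ((labels[:, :-1], labels[:, 1:]),   # horizontal neighbor pairs
                 (labels[:-1, :], labels[1:, :])):  # vertical neighbor pairs
        diff = a != b
        for i, j in zip(a[diff], b[diff]):
            border[tuple(sorted((i, j)))] += 1
    saliency = {}
    for i in ids:
        contrast, total = 0.0, 0
        for (p, q), n in border.items():
            if i in (p, q):
                j = q if p == i else p
                contrast += np.linalg.norm(mean[i] - mean[j]) * n
                total += n
        saliency[i] = contrast / total if total else 0.0
    return saliency
```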
Depth: Gabor filters with vertical components form the
basis for this feature. For different orientations, a modified
cross correlation is applied to the filter energies of two stereo
images using multiple scales. Results from the lower resolution
scales limit the correlation range. A voting scheme selects the
most probable disparity from the correlation results for each
location. It takes into account the results of neighboring pixels,
different orientations and scales. According to the heuristic
that a system should first react to close objects, the saliency
is monotonic with the disparity.
The segmentation results as well as clues from depth and
symmetry can be used to identify visual objects and segment
them. The features have been described in detail in [29], [30],
[31]. The feature saliency is integrated into a representation by first rewarding exclusivity (a single red area is more salient than a large number of identically colored areas) and then superposing the feature values. In case stereo
information is available, a 3D representation is created. The
saliency representation provides the information necessary for
the first selection stage.
2) Dynamic neural fields: The close integration of robust
selection and model-free tracking suggests the use of dynamic
neural fields (DNF) as proposed by Amari [32]. Their selection
characteristic is robust, shows hysteresis and spatiotemporal
integration, which makes them the perfect candidates for
this stage as shown in [33]. Neural fields are simulations of
laterally connected cortical areas. Their topology corresponds
to the input they receive. The connections inside the field
are homogeneous, depending only on the distance between the neurons. The dynamics of a neuron's activity $u(\mathbf{x}, t)$ at position $\mathbf{x}$ and time $t$ are defined by the following differential equation:

$$\tau \, \frac{\partial u(\mathbf{x}, t)}{\partial t} = -u(\mathbf{x}, t) + h + \int w(\mathbf{x} - \mathbf{x}') \, f(u(\mathbf{x}', t)) \, d\mathbf{x}' + s(\mathbf{x}, t) \qquad (1)$$

Herein, $h$ is a (negative) resting value, $w$ is the weight function for the connections between the neurons, $f$ is a sigmoid function, and $s$ denotes the input function. The weights
for a DNF are excitatory in a local neighborhood and become inhibitory for distant neurons. Different implementations use
either connections in a local neighborhood (local inhibition
type) or simulate a completely interconnected neural field
(global inhibition type). While the first type has stable states
with multiple activity clusters, the latter shows no more than
one such cluster. The weights are typically defined by a DoG-
function (for the local inhibition type) or standard distribu-
tions with a constant negative term (for the global inhibition
type). The distinct clusters of positive activity develop at
locations with sustained high input values and follow this
input. Hysteresis and spatiotemporal integration are important
mathematically proven properties of neural fields [32], [34].
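A minimal one-dimensional simulation of equation (1), using Euler integration and a DoG weight kernel (local inhibition type), illustrates these properties; all parameter values are our illustrative choices, not those used in the paper:

```python
import numpy as np

def dog(dist, sig_e=2.0, sig_i=6.0, a_e=2.0, a_i=1.0):
    """Difference-of-Gaussians lateral weights (local inhibition type):
    excitation from close neighbors, inhibition from distant ones."""
    return (a_e * np.exp(-dist ** 2 / (2 * sig_e ** 2))
            - a_i * np.exp(-dist ** 2 / (2 * sig_i ** 2)))

def simulate_field(s, steps=300, dt=0.1, tau=1.0, h=-0.5):
    """Euler integration of eq. (1):
    tau * du/dt = -u + h + integral w(x - x') f(u(x')) dx' + s(x)."""
    n = len(s)
    x = np.arange(n, dtype=float)
    w = dog(np.abs(x[:, None] - x[None, :]))        # weight matrix w(x - x')
    u = np.full(n, h)                               # start at the resting level
    f = lambda v: 1.0 / (1.0 + np.exp(-4.0 * v))    # sigmoid output
    for _ in range(steps):
        u = u + dt / tau * (-u + h + w @ f(u) + s)
    return u

# Two sustained input peaks produce two distinct positive activity
# clusters; slowly moving peaks are followed (tracking) and clusters
# outlast brief input drops (hysteresis).
s = np.zeros(100)
s[20:25] = 1.5
s[60:65] = 1.2
u = simulate_field(s)
print(np.where(u > 0)[0])    # indices inside the activity clusters
```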
We have realized different architectures of neural fields
reflecting characteristics of the saliency representation that
is used as input for the neural field [33]. They include
systems of interconnected global inhibition 2D neural fields for
individually weighted superpositions of the saliency features
as well as a single local inhibition 3D DNF. Using these
architectures we aim at integrating saliency only for objects
and use the cues of (three-dimensional) neighborhood or the
homogeneity of an object. All these architectures show a small
number of distinct activity clusters (connected areas of positive
activity) that denote locations of high saliency and follow the
movement of such areas in their input.
These activity clusters are the result of the first selection
stage. They correspond to areas of sustained high saliency in
the input. For each of these clusters, a symbolic description is
created, a so-called object file. The underlying hypothesis is
that each of the clusters corresponds to a basic visual object,
a meaningful part of an object or a collection of objects. The
correspondence between object files and activity clusters is
constantly updated, a process that is easily implemented due to
the well-defined behavior of DNF, which shows spatial limits
for integration, tracking, and inhibition of different objects.
These thresholds determine spatial boundaries beyond which
no correspondence is sought. Inside the boundaries, spatial
distance and similarity of features inside the activity areas
determine the correspondence of object files to activity clusters
and therefore the continuity of object files.
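A sketch of such a correspondence update, using greedy nearest-neighbor matching; the fixed spatial threshold and the cost weighting are our simplifications of the DNF-derived boundaries described above:

```python
import numpy as np

def update_object_files(files, clusters, max_dist=10.0, w_feat=0.5):
    """Greedy correspondence between existing object files and the activity
    clusters of the current frame (files and clusters are lists of dicts
    with 'position' and 'features' entries)."""
    unmatched = set(range(len(clusters)))
    for f in files:
        best, best_cost = None, float('inf')
        for c in unmatched:
            d = np.linalg.norm(np.subtract(f['position'], clusters[c]['position']))
            if d > max_dist:      # beyond the spatial boundary: no correspondence
                continue
            cost = d + w_feat * np.linalg.norm(
                np.subtract(f['features'], clusters[c]['features']))
            if cost < best_cost:
                best, best_cost = c, cost
        if best is not None:
            f['position'] = clusters[best]['position']   # object file follows cluster
            f['features'] = clusters[best]['features']
            unmatched.discard(best)
    # remaining clusters are newly selected items -> open new object files
    return files + [dict(clusters[c]) for c in unmatched]
```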
3) Second selection stage: The second selection stage is
subject to top-down influences and can be implemented in a
problem-specific way. Its operation is encapsulated in behav-
iors that take into account the object-file information as well as the state and goal of the system. The main task of a behavior
is the selection of one of the object files, and thereby the
corresponding activity cluster in the neural field, for focal
attention. The area corresponding to the activity cluster in
the input image is then subjected to high-level computations
like object recognition. It can also be foveated by saccadic
camera movements, so that the system shows overt attention
by controlling an active vision system [31].
The default exploration behavior is achieved by assigning
priority levels to the object files according to the time they
were last selected. Unselected items receive the highest prior-
ity. Within a priority level the object files are ordered by their
saliency. Dynamic IOR for moving objects is implicit in this behavior and can be achieved by other behaviors in a similar way. Examples of other behaviors we have implemented
include an alarm system, integrated searching and tracking of
a defined object, and the simulation of visual search.
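A sketch of this default exploration behavior, assuming the ObjectFile record from above with its selections history and a stored saliency value (the field layout is our assumption):

```python
def explore_select(object_files, now):
    """Default exploration behavior: never-selected items get the highest
    priority, then the least recently selected; ties are broken by saliency.
    A just-selected item drops to the end of the order, which realizes the
    dynamic IOR implicitly."""
    def priority(of):
        last = of.selections[-1] if of.selections else -1   # -1: never selected
        return (last, -of.mean_features.get('saliency', 0.0))
    target = min(object_files, key=priority)
    target.selections.append(now)    # record the selection time
    return target
```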
Due to the symbolic computation on a small number of
simple data structures, the modification of behaviors and the
implementation of additional behaviors (possibly using exist-
ing behaviors) is easily achieved. In operating on individual items, the behaviors are related to Ullman's visual routines [35].
The indexed items from the visual routine model correspond
to the object files in our model. The models differ in that ours uses a single behavior for the main system control whereas Ullman's uses a collection of visual routines, but it would be possible to replace the monolithic behavior by such an approach. An important aspect agreed on by Ullman [35]
and Pylyshyn [16] is that indices to a number of items are
important for relational operations. This is also achieved by
the first selection stage of our architecture. Behaviors can use
notions of objects that are “behind”, “higher” or “larger” than
something else.
IV. RESULTS
A. First selection stage and semiattentive computations
Feature computations on a static image, their integration, and the selection by a DNF are depicted in figure 3. The variant shown uses
the stereo information computed during the determination
of stereo saliency to create a 3D representation of overall
saliency. The neural field used is a single three-dimensional
local inhibition type DNF. We used some modifications [30]
to the neural field to realize fast computations in spite of the
high dimensionality and the large number of neurons.
The tracking performance of the neural fields is demonstrated in [33]. The feature saliency reflects the properties of the environment. The features have been analyzed further in [30], [31].
Fig. 3. Example of the feature computations (from left to right: symmetry, eccentricity, color contrast, and depth), saliency integration into the 2D master map, superposition in the 3D master map, and neural field activation. The activation clusters in the neural field are colored. The 3D representations are ordered by increasing distance in reading order. The colored background reflects the architectural distinction of preattentive and semiattentive components as shown in fig. 2.

B. World model quality

In order to compare the quality of the new approach with more conventional modeling of attentional control, we designed an experiment involving the exploration of a scene by simulated recognition of objects. This allowed us to abstract from specific aspects, such as feature computation quality for different inputs, and to concentrate on the architectural design. The goal to be achieved was to compute a world model containing as many objects as possible while maintaining accurate position
information for these objects. A number of simple objects
(squares of 5 by 5 pixels) were either stationary or moving
on a straight path (they moved at most 2 pixels in x- and y-
direction between consecutive frames). Noise was added with
half the amplitude of the objects. This data was used as a
simulated 2D master map of attention. Figure 4 shows three
consecutive frames of such a scene.
To these scenes we applied a conventional attention algo-
rithm with a static inhibition map as well as our attention
model. A simulated object recognition was the high-level
algorithm carried out at the focus of attention. The recognition was assumed to take three frames. For our model, we added a fourth frame to the recognition duration to compensate for the additional computations necessary for the neural fields. We
compared the resulting world models (identified objects and
their positions) to ground truth and computed the mean number of recognized objects and the position error. Whenever the position was off by more than 20 pixels, the object was counted as not recognized.

Fig. 4. Three consecutive frames used in the experiment for comparing the attention models. Two of the objects are static, while three of them are dynamic.
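The input generation can be sketched as follows; the quoted values (5-by-5 squares, at most 2 px per axis per frame, noise at half the object amplitude, 40 frames) come from the text, while the grid size, the noise law, and the omitted minimum-distance constraint are our simplifications:

```python
import numpy as np

def make_master_maps(n_static, n_dynamic, frames=40, size=(64, 64),
                     amplitude=1.0, seed=0):
    """Simulated 2D master maps: 5x5 squares, either static or moving on a
    straight path (at most 2 px per axis between frames), plus uniform
    noise with half the amplitude of the objects."""
    rng = np.random.default_rng(seed)
    n = n_static + n_dynamic
    pos = rng.integers(5, np.array(size) - 10, size=(n, 2)).astype(float)
    vel = np.zeros((n, 2))
    vel[n_static:] = rng.integers(-2, 3, size=(n_dynamic, 2))  # straight paths
    maps, truth = [], []
    for _ in range(frames):
        frame = rng.uniform(0, amplitude / 2, size)            # half-amplitude noise
        for y, x in pos.astype(int):
            frame[y:y + 5, x:x + 5] += amplitude
        maps.append(frame)
        truth.append(pos.copy())                               # ground-truth positions
        pos = np.clip(pos + vel, 0, np.array(size) - 5)
    return maps, truth
```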
For this experiment, we used the simplest variant of our model with a single 2D neural field of local inhibition
type. The choice was made to achieve as much comparability
between the two models as possible. As conventional models
use a 2D representation of overall saliency, we decided to use
the same representation for our neural field model. This ruled
out the use of the more advanced 3D neural field and the
system of global inhibition 2D fields with weighted features
(see [33] for a comparison).
The conventional model was mainly derived from the Koch
and Ullman [1] model. By abstracting from the feature com-
putations as well as the WTA-process, we tried to capture the
essential selection and inhibition scheme of the conventional
attention algorithms that we analyzed in section II-C. The
localization and selection were achieved by blurring the input
(mimicking the selection by neural fields and finding the
center of the input) and selecting the maximum value after
applying inhibition. We used an inhibition map with activity
slowly decaying by a factor of 0.8 after each frame. An object
was marked in the inhibition map using an area of 8 by
8 pixels, taking into account that the distance between two
objects was at least 14 pixels at each moment, so that there
was no danger of inhibiting a different object. The large size and slow decay were chosen to give the conventional system a small additional advantage: objects stay inhibited for a long time, while the large distance ensures that this inhibition does not affect other moving objects. Under real-world circumstances, the classical model
would perform worse than in our experiment, while our model
could still be improved by using more advanced neural field
architectures and saliency representations.
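A sketch of one selection step of this baseline; the 0.8 decay and the 8-by-8 inhibition area are from the text, while the box blur is our stand-in for the unspecified smoothing:

```python
import numpy as np

def box_blur(img, k=3):
    """Simple box blur standing in for the localization smoothing."""
    pad = np.pad(img, k // 2, mode='edge')
    out = np.zeros_like(img)
    for dy in range(k):
        for dx in range(k):
            out += pad[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / k ** 2

def conventional_step(master_map, inhibition):
    """One selection step of the baseline: blur, subtract inhibition,
    pick the maximum, mark an 8x8 area, decay the map by 0.8 per frame."""
    inhibition *= 0.8
    blurred = box_blur(master_map)
    y, x = np.unravel_index(np.argmax(blurred - inhibition), blurred.shape)
    inhibition[max(0, y - 4):y + 4, max(0, x - 4):x + 4] = blurred.max()
    return (y, x), inhibition
```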
One run of the experiment consisted of the preparation of 40
input frames (master maps) with the desired number of static
and dynamic objects. Ground truth of the identity and location
of the objects was computed. Both models were presented with
the simulated master maps, selected one location/item for focal
attention and started the simulated object recognition. After
three or four frames (depending on the model), the identity
of the selected object was returned by the simulated object
recognition. The identity was transferred to the internal world
model. For the conventional algorithm, it was connected to the
position, where it was selected. Our model used the object files
to connect the identity with the current position of the activity cluster. The mean number of recognized objects in the world model (computed over all frames and all runs) as well as the mean position error for the recognized objects were computed.

[Figure: three panels for 1, 3, and 5 static objects; each plots the number of recognized objects (0-6) and the position error in pixels (0-14) against the number of dynamic objects (0-5), for the NeuralField and the Conventional model.]
Fig. 5. Comparison of world model quality for two approaches of visual attention and varied numbers of static and dynamic objects. Depicted are the number of recognized objects and the position error. See text for details.
Figure 5 shows the results for different numbers of static
and dynamic objects. We refer to the conventional model
as “Conventional” and to our two-stage selection model as
“NeuralField”. Each data point is based on 50 runs of 40
frames. The mean number of recognized objects is always
smaller than the number of objects present because the mean
is taken over the complete run. The systems need a number of
frames until every object is recognized. Take the runs with five
dynamic and five static objects. The neural field model needs
at least all 40 frames until all 10 objects are recognized (four
frames per object). Therefore its optimal result would be a mean
of five recognized objects. It nearly reaches this optimum.
We find that for every number of static objects, the neural
field model scales much better with the number of dynamic
objects. The advantage of faster simulated recognition only benefits the conventional model when no object is moving. In all other cases the new model is superior. This is especially
true for the position error. In every condition, the mean
position error is smaller than 0.5 pixels for our new model
while the conventional model shows an error between 0.5 and
5 pixels. We conclude that even with the higher computational
demands associated with the neural fields, the new approach
provides a more efficient way of scanning a scene and keeping
the extracted information up to date than conventional models
of visual attention.
C. System performance
The exploration of a scene by a specified behavior is
depicted in figure 6. It shows the input frames, with object
files marked by bounding boxes, together with the area of
the FOA (below the input frame). Note the selection of
mainly meaningful areas (ball, picture, and robot) due to the
object-related feature computations and the movement of the
bounding box together with the moving objects (ball and
robot). The first re-checking of an item occurs only after all objects have been subjected to focal attention. In this respect, the system has found an optimal dynamic scan path.

Fig. 6. Exploration of a scene. For 15 frames (in reading order), the current view of the scene is annotated with the bounding boxes and numbers of the object files. The currently selected OF is white, with an arrow pointing from the center towards it; OFs with already recognized objects are blue, and OFs not yet selected are red. For each frame, the area of the FOA is depicted separately.
The experiment was carried out using the simulation envi-
ronment Orbital 3D, which was implemented in our workgroup
[36]. Its suitability for evaluating vision algorithms is due to
the fact that controllable environments of different qualities
can be used to provide reproducible dynamic experiments with
ground truth [37].
D. Correspondence to natural visual attention
We have incorporated some advanced aspects of natural
visual attention models into our attention architecture. It is
therefore natural to ask whether the architecture can serve to
explain additional empirical findings on attention. Besides the
simultaneous tracking of multiple objects and the binding of
IOR to moving objects that are inherent to the model, we take
a look at further effects in natural attention that are known to
be difficult to explain.
The two selection stages contribute to the old debate on
early and late selection. The core problem here is that while
under some circumstances there is evidence for complex com-
putations outside the focus of attention, others find that even
simple computations need attention. By using two selection
stages, the dichotomy of attentive and preattentive processing
is replaced by three stages, adding semiattentive computations.
By shifting computations between the attentive and the semi-
attentive stage in accordance with the computational load,
the complexity of the task, and the state of the system, the
observed variants of serial and parallel processing could be
produced.
Accounts of multiple foci of attention [38] or the striking
effects of flanker compatibility [39] can also be explained
by our model as being related to semiattentive computations.
Take the experiments by Kramer and Hahn [38]. They showed that it is possible to quickly compare objects at two positions without identifying distractors lying in between them. This ruled out the possibility of one large spotlight of attention. The presentation speed ruled out a possible "jump" of the focus of attention from one object to the other. In our model, the explanation would not involve multiple foci of attention but just the semiattentive selection and comparison of both items.
The flanker compatibility effects [40] demonstrate the pro-
cessing and recognition of items at positions that are known
to be irrelevant (distractors) when the task is to classify one
item (the target) at a previously known position. At first glance, this seems to be just what should be avoided by selective
attention. The typical displays to demonstrate this effect show
a small number of items that are easily recognized (like digits
or letters). The distractors are of the same type as the target.
Applying our model to such displays, each item would be
selected by the first selection stage due to the small number
of overall present items and their similarity to the target. The
identification processes are rather simple and could operate on
the semiattentive stage. Focal attention is then just needed to
bind the correct result to the target and the reaction. Although
it may be more efficient to suppress the computation of letter
and target identities, the Stroop effect [41] suggests that they
are too automated to be suppressed whenever an item is
selected.
V. CONCLUSIONS
The novel architecture of two selection stages in visual attention, providing an additional semiattentive computation stage, was motivated by the problems that conventional approaches to visual attention reveal in dynamic scenes. The object-based
computations in every stage of the model allow us to refer
to meaningful entities of the environment. This improves the
selection process itself and simplifies high-level computations
like object recognition. Especially in dynamic environments,
the operation on moving objects is an improvement over purely
spatial approaches.
By creating object files for the discrete activity clusters in
the neural field, the model shows a well-defined transition from
subsymbolic computations to the symbolic domain, where
single visual objects are the subjects of manipulation. The
implementation of behaviors for the second selection stage al-
lows an encapsulation of top-down influences on the operation
characteristic of the system.
Specialization for applications is achieved by additional
features that allow the localization and selection of objects
relevant to the task at hand. The modification or implemen-
tation of behaviors allows the integration into a larger vision
system, as well as interaction with other system components,
and the inclusion of specific knowledge. Note that the system
does not depend on such knowledge, but it can be augmented
and specialized whenever it is available. To provide even better
object candidates by the first selection stage, a segmentation
process based on the feature computations would be a natural extension of the model.
REFERENCES
[1] C. Koch and S. Ullman, "Shifts in selective visual attention: Towards the underlying neural circuitry," Human Neurobiology, vol. 4, pp. 219–227, 1985.
[2] A. Treisman and G. Gelade, “A feature integration theory of attention,”
Cognitive Psychology, vol. 12, pp. 97–136, 1980.
[3] J. Wolfe, K. R. Cave, and S. L. Franzel, “Guided search: An alternative
to the feature integration model for visual search,” Journal of Exper-
imental Psychology: Human Perception and Performance, vol. 15, pp.
419–433, 1989.
[4] J. Wolfe, “Guided search 2.0: A revised model of visual search,”
Psychonomic Bulletin and Review, vol. 1, no. 2, pp. 202–238, 1994.
[5] R. Milanese, H. Wechsler, S. Gil, J. Bost, and T. Pun, “Integration
of bottom-up and top-down cues for visual attention using non-linear
relaxation,” in Proceedings, of the IEEE Conference on Computer Vision
and Pattern Recognition (Seattle, 1994), 1994, pp. 781–785.
[6] V. Leavers, “Preattentive computer vision - towards a 2-stage computer
vision system for the extraction of qualitative descriptors and the cues
for focus of attention,” Image and Vision Computing, vol. 12, no. 9, pp.
583–599, 1994.
[7] L. Itti and C. Koch, "A saliency-based search mechanism for overt and covert shifts of visual attention," Vision Research, vol. 40, no. 10-12, pp. 1489–1506, 2000.
[8] A. Maki, P. Nordlund, and J.-O. Eklundh, “A computational model of
depth-based attention,” in Proc. 13th Int. Conf. on Pattern Recognition,
vol. 4, 1996, pp. 734–738.
[9] B. Olshausen, C. Anderson, and D. Van Essen, "A multiscale dynamic routing circuit for forming size- and position-invariant object representations," Journal of Computational Neuroscience, vol. 2, no. 1, pp. 45–62, 1995.
[10] J. K. Tsotsos, “An inhibitory beam for attentional selection,” in Spatial
visions in humans and robots, L. Harris and M. Jenkins, Eds., 1993.
[11] Y. Aloimonos, I. Weiss, and A. Bandopadhay, “Active vision,” in
Proceedings of the first International Conference on Computer Vision,
1987, pp. 35–54.
[12] L. Pessoa and S. Exel, “Attentional strategies for object recognition,”
in Proceedings of the IWANN, Alicante, Spain 1999, ser. Lecture Notes
in Computer Science, J. Mira and J. Sachez-Andres, Eds., vol. 1606.
Springer, 1999, pp. 850–859.
[13] D. Reece and S. Shafer, "Control of perceptual attention in robot driving," Artificial Intelligence, vol. 78, pp. 397–430, 1995.
[14] A. Abbott, “A survey of selective fixation control for machine vision,”
in IEEE Control Systems, 1992, pp. 25–31.
[15] Z. Pylyshyn, J. Burkell, B. Fisher, C. Sears, W. Schmidt, and L. Trick,
“Multiple parallel access in visual attention,” Canadian Journal of
Experimental Psychology, vol. 48, no. 2, pp. 260–283, 1994.
[16] Z. Pylyshyn, "Visual indexes in spatial vision and imagery," in Visual Attention, ser. Vancouver Studies in Cognitive Science, R. Wright, Ed. Oxford University Press, 1998, no. 8, pp. 215–231.
[17] Z. W. Pylyshyn and R. Storm, “Tracking multiple independent targets:
evidence for a parallel tracking mechanism,” Spatial Vision, vol. 3, no. 3,
1988.
[18] S. Vecera and M. Farah, "Does visual attention select objects or locations?" Journal of Experimental Psychology: General, vol. 123, pp. 146–160, 1994.
[19] S. Tipper and B. Weaver, "The medium of attention: Location-based, object-centered, or scene-based?" in Visual Attention, R. Wright, Ed. Oxford University Press, 1998, pp. 77–107.
[20] W. Fellenz and G. Hartmann, "Preattentive grouping and attentive selection for early visual computation," in Proc. 13th International Conference on Pattern Recognition (ICPR), Vienna, August 1996.
[21] A. Maki, P. Nordlund, and J.-O. Eklundh, “Attentional scene segmen-
tation: Integrating depth and motion,” Computer Vision and Image
Understanding, vol. 78, pp. 351–373, 2000.
[22] S. Dickinson, H. Christensen, J. Tsotsos, and G. Olofsson, “Active ob-
ject recognition integrating attention and viewpoint control,” Computer
Vision and Image Understanding, vol. 67, no. 3, pp. 239–260, 1997.
[23] S. P. Tipper, J. Driver, and B. Weaver, “Object-centered inhibition of re-
turn of visual attention,” Quarterly Journal of Experimental Psychology,
vol. 43A, no. 2, pp. 289–298, May 1991.
[24] L. Itti and C. Koch, “Feature combination strategies for saliency-based
visual attention systems,” Journal of Electronic Imaging, vol. 10, no. 1,
2001.
[25] A. Treisman, “Representing visual objects,” in Attention and Perfor-
mance, D. Meyer and S. Kornblum, Eds. Hillsdale, NJ: Erlbaum, 1991,
vol. 14.
[26] D. Kahneman, A. Treisman, and B. Gibbs, “The reviewing of object
files: object-specific integration of information,” Cognitive Psychology,
vol. 24, no. 2, pp. 175–210, 1992.
[27] J. Wolfe and S. Bennett, “Preattentive object files: Shapeless bundles of
basic features,” Vision Research, vol. 37, pp. 25–43, 1997.
[28] M. Miyahara and Y. Yoshida, "Mathematical transform of (R, G, B) color data to Munsell (H, V, C) color data," Visual Communication and Image Processing, vol. 1001, pp. 650–657, 1988.
[29] B. Mertsching, M. Bollmann, R. Hoischen, and S. Schmalz, "The neural active vision system NAVIS," in Handbook of Computer Vision and Applications, Vol. 3 (Systems and Applications), B. Jähne, H. Haußecker, and P. Geißler, Eds. Academic Press, 1999, pp. 543–568.
[30] G. Backer and B. Mertsching, “Integrating time and depth into the at-
tentional control of an active vision system,” in Dynamische Perzeption.
Workshop der GI-Fachgruppe 1.0.4 Bildverstehen, Ulm, November 2000,
G. Baratoff and H. Neumann, Eds., 2000, pp. 69–74.
[31] G. Backer, B. Mertsching, and M. Bollmann, “Data- and model-driven
gaze control for an active-vision system,” IEEE Transactions on Pattern
Analysis and Machine Intelligence, vol. 23, no. 12, pp. 1415–1429, 2001.
[32] S.-I. Amari, "Dynamics of pattern formation in lateral-inhibition type neural fields," Biological Cybernetics, vol. 27, pp. 77–87, 1977.
[33] G. Backer and B. Mertsching, “Using neural field dynamics in the
context of attentional control,” in Proceedings of the ICANN 2002, 2002,
pp. 1237–1242.
[34] K. Kishimoto and S.-I. Amari, "Existence and stability of local excitations in homogeneous neural fields," Journal of Mathematical Biology, vol. 7, pp. 303–318, 1979.
[35] S. Ullman, “Visual routines,” Cognition, vol. 18, pp. 97–159, 1984.
[36] M. Bungenstock, A. Baudry, J. Bitterling, and B. Mertsching, “Devel-
opment of a simulation framework for mobile robots,” in Proceedings
of the EUROIMAGE ICAV3D 2001, 2001, pp. 89–92.
[37] G. Backer and B. Mertsching, "Evaluation of attentional control in active vision systems using a 3D simulation framework," in Journal of the WSCG - 10th International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision, vol. 10, 2002, pp. 32–39.
[38] A. Kramer and S. Hahn, “Splitting the beam: Distribution of attention
over noncontiguous regions of the visual field,” Psychological Science,
vol. 6, no. 6, pp. 381–386, 1995.
[39] J. Miller, "The flanker compatibility effect as a function of visual angle, attention focus, visual transients, and perceptual load: A search for boundary conditions," Perception and Psychophysics, vol. 49, pp. 270–288, 1991.
[40] C. Eriksen and J. Hoffman, “The extent of processing of noise ele-
ments during selective encoding from visual displays,” Perception and
Psychophysics, vol. 14, pp. 155–160, 1973.
[41] J. Stroop, “Studies of interference in serial verbal reactions,” Journal of
Experimental Psychology, vol. 18, pp. 643–662, 1935.
... Backer and Mertsching present an innovative model of attention divided into two selection stages [Backer and Mertsching, 2000;Backer et al., 2001;Backer and Mertsching, 2003]. Furthermore, while the previous methods compute saliency pixel by pixel, the authors introduce a region-based model that performs a pixel clustering prior to the saliency computation. ...
... The general two-stages structure of the attention mechanism is related to a previous proposal [Backer and Mertsching, 2003]. In the pre-attentive stage, the different proto-objects in the image are extracted using a perceptual segmentation algorithm based on a hierarchical framework . ...
... Tomando como base estos estudios, se han ido desarrollando distintos modelos computacionales de atención visual durante los últimos 30 años. Por ejemplo, los modelos desarrollados por Koch and Ullman [1985], Itti et al. [1998], Backer and Mertsching [2003], Navalpakkam and Itti [2006], Frintrop [2006] o Kouchaki and Nasrabadi [2012]. En contraposición a los modelos anteriores, basados en regiones, que computan la relevancia a nivel de píxel, los modelos basados en objetos ponen de manifiesto el hecho de que las habilidades perceptivas deben optimizarse para interaccionar con conjuntos coherentes de píxeles y no con meras regiones espaciales desestructuradas [Duncan, 1984]. ...
Thesis
Full-text available
This Ph. D. Thesis presents a novel attention-based cognitive architecture for social robots. The architecture aims to join perception and reasoning considering a double and simultaneous imbrication: the ongoing task biases the perceptual process to obtain only useful elements whereas perceived items determine the behaviours to be accomplished. Therefore, the proposed architecture represents a bidirectional solution to the perception-reasoning-action loop closing problem. The basis of the architecture is an Object-Based Visual Attention model. This perception system draws attention over perceptual units of visual information, called proto-objects. In order to highlight relevant elements, not only several intrinsic basic features (such as colour, location or shape) but also the constraints provided by the ongoing behaviour and context are considered. The proposed architecture is divided into two levels of performance. The lower level is concerned with quantitative models of execution, namely tasks that are suitable for the current work conditions, whereas a qualitative framework that describes and defines tasks relationships and coverages is placed at the top level. Perceived items determine the tasks that can be executed in each moment, following a need-based approach. Thereby, the tasks that better fit the perceived environment are more likely to be executed. Finally, the cognitive architecture has been tested using a real and unrestricted scenario that involves a real robot, time-varying tasks and daily life situations, in order to demonstrate that the proposal is able to efficiently address time- and behaviour-varying environments, overcoming the main drawbacks of already existing models. - See more at: http://riuma.uma.es/xmlui/handle/10630/8174#sthash.n8yRGDHU.dpuf
... Although these models have good performance in static environments, they cannot in principle handle dynamic environments due to their impossibility to take into account the motion and the occlusions of the objects in the scene. In order to solve this problem, an attention control mechanism must integrate depth and motion information to be able to track moving objects [1]. Thus, Maki et al. [7] propose an attention mechanism which incorporates depth and motion as features for the computation of saliency. ...
... Thus, Maki et al. [7] propose an attention mechanism which incorporates depth and motion as features for the computation of saliency. Baker and Mertsching [1] also compute depth as a feature, but use dynamic neural fields to track the most salient regions of the saliency map in a semiattentive stage. The method is reported to take 30 seconds per frame, which makes its application to real-time, interactive systems unfeasible. ...
... The presented work is centered in the task-independent stage of a feature integration approach. Our method is related to the recent proposal of Backer and Mertsching [1] in several aspects. The first is the use of a preattentive stage in which parallel features are computed and integrated into a saliency map. ...
Article
Full-text available
Biological-plausible attention mechanisms are general approaches that permit a social robot to extract only relevant information from the huge amount of input data. In this paper an attention mechanism based on the feature integration theory is proposed. The aim of this attention mechanism is to provide to higher-level modules of the vision system the most relevant regions in fast, dynamic scenarios where interaction with humans can occur. The proposed system integrates bottom-up (data-driven) and top-down (model-driven) processing. The bottom-up component determines and selects salient image regions by computing a num- ber of different features. The top-down component makes use of object templates to filter out data and track significant objects. The proposed system has three steps: parallel computation of feature maps, feature integration and simultaneous tracking of the most salient regions. Its main characteristic is that the mechanism integrates the tracking of the most salient regions, which allows to handle changing environments with moving objects where occlusions can oc- cur.
... Although these models have good performance in static environments, they cannot in principle handle dynamic environments due to their impossibility to take into account the motion and the occlusions of the objects in the scene. In order to solve this problem, an attentional control mechanism must integrate depth and motion information to be able to track moving objects [6]. Thus, Maki et al. [7] propose an attention mechanism which incorporates depth and motion as features for the computation of saliency. ...
... Thus, Maki et al. [7] propose an attention mechanism which incorporates depth and motion as features for the computation of saliency. Baker and Mertsching [6] also compute depth as a feature, but use dynamic neural fields to track the most salient regions of the saliency map in a semiattentive stage. The method is reported to take 30 seconds per frame, which makes its application to real-time, interactive systems unfeasible. ...
... The presented work is centered in the task-independent stage of a feature integration approach. Our method is related to the recent proposal of Backer and Mertsching [6] in several aspects. The first is the use of a preattentive stage in which parallel features are computed and integrated into a saliency map. ...
... In the above formula, p(i) is the probability of corresponding degree of grayness and H(A) is the entropy of the image. Obtain the target entropy through Maximum Entropy Criteria [14,[17][18]. Theory of Maximum Entropy is usually used in image segmentation. It assumes that the changes in grayness are similar in target region and background region, and the fluctuation of grayness is smooth. ...
Article
Full-text available
In this article, we propose an adaptive image segmentation method based on saliency. First of all, we obtain the saliency map of an image via four bottom-layer feature tunnels, i.e. color, intensity, direction and energy. The energy tunnel helps to describe the outline of objects better in the saliency map. Then, we construct the target detection masks according to the greyness of pixels in the saliency map. Each mask is applied to the original image as the result of pre-segmentation, then corresponding image entropy is calculated. Predict the expected entropy according to maximum entropy criteria and select the optimal segmentation according to the entropies of pre-segmented images and the expected entropy. A large number of experiments have proved the effectiveness and advantages of this algorithm.
... A 'preattentive object' catches the attention if it differs from its immediate surrounding. In contrast with other previous works [5,38] which only compute one saliency map in the preattentive stage, the proposed approach computes two different saliency maps associated to the set of 'proto-objects' previously extracted. ...
Article
Full-text available
This paper describes a visual perception system for a social robot. The central part of this system is an artificial attention mechanism that discriminates the most relevant information from all the visual information perceived by the robot. It is composed by three stages. At the preattentive stage, the concept of saliency is implemented based on ‘proto-objects’ [37]. From these objects, different saliency maps are generated. Then, the semiattentive stage identifies and tracks significant items according to the tasks to accomplish. This tracking process allows to implement the ‘inhibition of return’. Finally, the attentive stage fixes the field of attention to the most relevant object depending on the behaviours to carry out. Three behaviours have been implemented and tested which allow the robot to detect visual landmarks in an initially unknown environment, and to recognize and capture the upper-body motion of people interested in interact with it.
... The semi-atttentive stage makes use of object specific properties to filter out data and only track significant objects. Thus, the system is related to the Backer and Mertsching's proposal [5] in several aspects. The first is the use of a pre-attentive stage in which parallel features are computed and integrated into a saliency map. ...
Article
Full-text available
This paper describes a visual perception system which allows a social robot to conduct several tasks. The central part of this system is an artificial attention mechanism which is able to discriminate the most relevant information from all the visual information perceived by the robot. This attention mechanism is composed by three modules or stages. At the preattentive stage, a set of uniforms blobs or 'pre-attentive objects' is obtained. Once the most salient objects are obtained, the semiattentive stage identifies and tracks some of them according to the tasks to accomplish. This tracking process allows to implement the `inhibition of return', avoiding revisiting an attended object. Finally, the attentive stage also fixes the field of attention to the most relevant object depending on the behaviours to accomplish. Three behaviours have been implemented which allow the robot to detect visual landmarks in an initially unknown environment and to recognize and capture the upper-body motion of people interested in interact with it.
... hof, 2003), taking into account the three closely interrelated aspects of saliency, scale, and content. The detector is translation, rotation, and scale invariant. In current machine attention, bottom-up selection plays an important role in providing early cues in a multistage competetive scheme of attention processing (Navalpakkam and Itti, 2002). Backer and Mertsching (2003) introduced a cascaded computation by selecting a small number of discrete items in a preattentive phase analyzing symmetry, eccentricity, color contrast, and depth, and then applied smiattentive processes of tracking and information accumulation until a single cue of interest could be more efficiently selected. ...
Chapter
A key property of neural processing in higher mammals is the capability to focus resources, by selectively directing attention towards the most important sensory inputs of the moment. Attention research has shown rapid growth over the past two decades, as new techniques have become available to study higher brain function in humans, non-human primates, and other mammals. Neurobiology of Attention is the first encyclopedic volume to summarize the latest developments in attention research. An authoritative collection of 111 concise articles organized into thematic sections provides both broad coverage and access to focused, up-to-date research findings. The volume presents a state-of-the-art multidisciplinary perspective on psychological, physiological and computational approaches to understanding the neurobiology of attention. Ideal for students, as a reference handbook, or for rapid browsing, the book has a wide appeal to anybody interested in attention research.
Chapter
The purpose of this chapter is both to review some of the most representative visual attention models, both theoretical and practical, that have been proposed to date, and to introduce the authors’ attention model, which has been successfully used as part of the control system of a robotic platform. The chapter has three sections: in the first section, an introduction to visual attention is given. In the second section, relevant state of art in visual attention is reviewed. This review is organised in three areas: psychological based models, connectionist models, and features-based models. In the last section, the authors’ attention model is presented.
Article
Full-text available
Abstract: In Artificial Intelligence, the important role played by spatial reasoning is widely accepted. On the one hand, qualitative spatial reasoning can be handled by purely existential theories in the form of constraint-satisfaction systems over a set of mutually exclusive and disjoint relations. On the other hand, modal and first-order logics provide greater expressive power at the cost of worse computational behaviour. In this article, a modal logic called Spatial Propositional Neighborhood Logic (SpPNL) is compared with the Rectangle Algebra, the two-dimensional extension of Allen's Interval Algebra for temporal reasoning. It is shown how the consistency of a Rectangle Algebra constraint network can be checked by means of an SpPNL formula. It is also shown how this logic can express certain intuitive spatial constraints that cannot be represented in the Rectangle Algebra.
Article
Abstract: In recent years we have witnessed the important role that fuzzy logic has played in the development of sophisticated software applications in fields as diverse as expert systems, artificial intelligence, industrial control, medicine, etc. To ease the development of such applications, interest has more recently arisen in designing declarative languages (functional, logic, and/or fuzzy) that incorporate the natural handling of imprecise information among their expressive resources while remaining efficiently executable, for example by means of lazy evaluation mechanisms. For this to be feasible, an appropriate notion of equality must first be defined that naturally integrates all these features in a uniform framework. In this work we propose a powerful notion of equality in this sense, one that accounts for lazy and fuzzy properties in an integrated declarative setting and that can easily be implemented at a high level of abstraction. Keywords: Equality, Declarative Programming, Fuzzy Logic
Article
Full-text available
Bottom-up or saliency-based visual attention allows primates to detect nonspecific conspicuous targets in cluttered scenes. A classical metaphor, derived from electrophysiological and psychophysical studies, describes attention as a rapidly shiftable 'spotlight'. We use a model that reproduces the attentional scan paths of this spotlight. Simple multi-scale 'feature maps' detect local spatial discontinuities in intensity, color, and orientation, and are combined into a unique 'master' or 'saliency' map. The saliency map is sequentially scanned, in order of decreasing saliency, by the focus of attention. We here study the problem of combining feature maps, from different visual modalities (such as color and orientation), into a unique saliency map. Four combination strategies are compared using three databases of natural color images: (1) simple normalized summation, (2) linear combination with learned weights, (3) global nonlinear normalization followed by summation, and (4) local nonlinear competition between salient locations followed by summation. Performance was measured as the number of false detections before the most salient target was found. Strategy (1) always yielded the poorest performance and (2) the best performance, with a threefold to eightfold improvement in time to find a salient target. However, (2) yielded specialized systems with poor generalization. Interestingly, strategy (4) and its simplified, computationally efficient approximation (3) yielded significantly better performance than (1), with up to fourfold improvement, while preserving generality. © 2001 SPIE and IS&T. (DOI: 10.1117/1.1333677)
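Two of the four strategies compared in this study are easy to sketch. The NumPy illustration below is a rough approximation, assuming each feature map is a 2-D array in arbitrary units; the global nonlinear normalization follows the commonly cited (M - m̄)² weighting, where M is a map's global maximum and m̄ the mean of its other local maxima, here crudely approximated by coarse block maxima rather than true local maxima.

```python
import numpy as np

def normalize01(m):
    """Rescale a feature map to the fixed range [0, 1]."""
    lo, hi = m.min(), m.max()
    return (m - lo) / (hi - lo) if hi > lo else np.zeros_like(m, dtype=float)

def combine_summation(maps):
    """Strategy (1): simple normalized summation."""
    return sum(normalize01(m) for m in maps)

def combine_global_nonlinear(maps, block=16):
    """Strategy (3): global nonlinear normalization followed by
    summation. Maps with one dominant peak are promoted; maps with
    many comparable peaks are suppressed."""
    out = np.zeros_like(maps[0], dtype=float)
    for m in maps:
        n = normalize01(m)
        # Approximate the local maxima by the maxima of coarse blocks.
        h, w = n.shape
        blocks = [n[i:i + block, j:j + block].max()
                  for i in range(0, h, block)
                  for j in range(0, w, block)]
        M = n.max()                       # global maximum (1 after rescaling)
        others = [b for b in blocks if b < M]
        m_bar = np.mean(others) if others else 0.0
        out += n * (M - m_bar) ** 2       # (M - m̄)² promotion/suppression
    return out
```

Under this weighting, a map whose block maxima are all close to its global maximum contributes almost nothing, which is exactly the suppression of 'many equally salient peaks' the strategy is designed to achieve.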
Article
Most models of visual search, whether involving overt eye movements or covert shifts of attention, are based on the concept of a saliency map, that is, an explicit two-dimensional map that encodes the saliency or conspicuity of objects in the visual environment. Competition among neurons in this map gives rise to a single winning location that corresponds to the next attended target. Inhibiting this location automatically allows the system to attend to the next most salient location. We describe a detailed computer implementation of such a scheme, focusing on the problem of combining information across modalities, here orientation, intensity and color information, in a purely stimulus-driven manner. The model is applied to common psychophysical stimuli as well as to a very demanding visual search task. Its successful performance is used to address the extent to which the primate visual system carries out visual search via one or more such saliency maps and how this can be tested.
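In its simplest form, the attend-and-inhibit cycle described here amounts to repeatedly selecting the maximum of the saliency map and suppressing its neighbourhood. The sketch below is an illustrative approximation under that reading: it replaces the model's neural winner-take-all dynamics with a plain argmax and uses a fixed-radius circular inhibition region, both of which are assumptions.

```python
import numpy as np

def scan_path(saliency, n_fixations=5, inhibition_radius=20):
    """Generate a sequence of attended locations from a saliency map:
    attend to the global maximum, then inhibit a disk around it so the
    next most salient location wins the following iteration."""
    s = saliency.astype(float).copy()
    ys, xs = np.indices(s.shape)
    fixations = []
    for _ in range(n_fixations):
        y, x = np.unravel_index(np.argmax(s), s.shape)
        fixations.append((int(y), int(x)))
        # Inhibition of return: zero out a circular region around the winner.
        s[(ys - y) ** 2 + (xs - x) ** 2 <= inhibition_radius ** 2] = 0.0
    return fixations
```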
Article
For the study of new-generation color image coding, it is very effective (1) to code signals in the space of the inherent tri-attributes of human color perception, and (2) to relate a coding error to the perceptual degree of deterioration. For these purposes, we have adopted the Munsell Renotation System, in which color signals for the tri-attributes of human color perception (Hue, Value, and Chroma) and psychometric color differences are defined. In the Munsell Renotation System, however, the intertransformation between (RGB) data and the corresponding color data is very cumbersome, because it depends on a look-up table. This article presents a new method of mathematical transformation, obtained by multiple regression analysis of 250 color samples sampled uniformly from the whole color range that a conventional NTSC color TV camera can present. The new method transforms (RGB) data to Munsell Renotation System data far better than the conventional method based on the CIE (1976) L*a*b* space.
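As a rough illustration of a regression-based transformation of this kind, the following sketch fits an ordinary least-squares model from (RGB) to (Hue, Value, Chroma). Everything in it is an assumption: the quadratic polynomial basis is not the paper's actual regression form, and the training data are random placeholders standing in for the 250 Munsell samples.

```python
import numpy as np

def design_matrix(rgb):
    """Quadratic polynomial basis in R, G, B (an assumed form; the
    paper's exact regression terms are not reproduced here)."""
    r, g, b = rgb[:, 0], rgb[:, 1], rgb[:, 2]
    return np.column_stack([np.ones_like(r), r, g, b,
                            r * g, r * b, g * b, r**2, g**2, b**2])

# Placeholder training data standing in for the 250 Munsell samples:
# rows are (R, G, B) in [0, 1], targets are (Hue, Value, Chroma).
rng = np.random.default_rng(0)
rgb_samples = rng.uniform(size=(250, 3))
hvc_samples = rng.uniform(size=(250, 3))   # fabricated stand-in values

# Fit one multiple regression per Munsell attribute (shared basis).
X = design_matrix(rgb_samples)
coeffs, *_ = np.linalg.lstsq(X, hvc_samples, rcond=None)

def rgb_to_munsell(rgb):
    """Map (R, G, B) rows to estimated (Hue, Value, Chroma)."""
    return design_matrix(np.atleast_2d(rgb)) @ coeffs
```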
Article
In an effort to examine the flexibility with which attention can be allocated in visual space, we investigated whether subjects could selectively attend to multiple noncontiguous locations in the visual field. We examined this issue by precuing two separate areas of the visual field and requiring subjects to decide whether the letters that appeared in these locations matched or mismatched, while distractors that primed either the match or the mismatch response were presented between the cued locations. If the distractors had no effect on performance, it would provide evidence that subjects can divide attention over noncontiguous areas of space. Subjects were able to ignore the distractors when the targets and distractors were presented as non-onset stimuli (i.e., when premasks were changed into the targets and distractors). In contrast, when the targets and distractors were presented as sudden-onset stimuli, subjects were unable to ignore the distractors. These results begin to define the conditions under which attention can be flexibly deployed to multiple noncontiguous locations in the visual field. © 1995, Association for Psychological Science. All rights reserved.
Article
Reviews experiments conducted in the authors' laboratory that have found support for a flexible attention system that can gain access to different forms of internal representation. This evidence comes from 3 main research areas: Negative Priming, Inhibition of Return, and Visual Neglect. This variety of research approaches provides converging evidence for the existence of object-based mechanisms of selective attention. The authors propose that the typical role of attention is not necessarily to facilitate perceptual processes, although it can do that. Rather, attention is necessary to achieve particular behavioural goals via selection of specific internal representations of individual objects from the complex internal representation of a scene. (PsycINFO Database Record (c) 2012 APA, all rights reserved)