Two selection stages provide efficient object-based
attentional control for dynamic vision
Gerriet Backer
Krauss Software GmbH
Cremlingen, Germany
Gerriet.Backer@krauss-software.de
Bärbel Mertsching
AG IMA, Department of Computer Science
University Hamburg, Germany
mertsching@informatik.uni-hamburg.de
Abstract— In this paper, we introduce semiattentive computations as the result of replacing the usual single selection stage of visual attention by two consecutive selection stages. They are motivated by shortcomings of conventional attention models and correlate well with findings on human attention. The first selection stage employs preattentive saliency computations for the complete available input and selects a small number of discrete items. These are subject to the semiattentive processes of tracking and information accumulation. The second stage selects a single element from the result of the first selection stage for the conventional focus of attention. The implementation and efficiency of this scheme are demonstrated in this paper. Its main advantage is the efficient selection and inhibition of objects in dynamic scenes. It allows the serialized accumulation of information in a changing environment and provides an up-to-date world model. The focus of this paper is on the quality of the computed world model and the object-related computations.
I. INTRODUCTION
Attentional mechanisms are mainly used to reduce the
amount of data for complex computations. They determine
important, salient objects or areas and select them, one after
another, to be subjected to these computations. Thus attention
is a general method for serializing complex operations. Computations in those schemes
are either preattentively applied to the complete input data in
parallel or attentively and serially only to the selected area.
Complex accurate computations are usually done attentively
while simple computations are assigned to the preattentive
part.
The spotlight metaphor describes this behavior: at each time
only one region of space is illuminated. This area is the focus
of attention (FOA) and as such the place where complex
operations are applied. The spotlight can move to include other
regions. It is moved to regions of high saliency. The saliency
of an area can be determined by either data-driven bottom-up
information or model-driven top-down information. The focus
of computational attention models is mostly on the data-driven
information. The predominant selection unit for attention is
space. Only a few models deviate from this view and use
either features or objects as selection unit.
The rest of the article is organized as follows. A short review
of the predominant attention models in chapter II leads us to
an analysis of their drawbacks when operating in dynamic
environments. Chapter III proposes necessary modifications
that lead to our new architecture. An implementation of this
architecture, highlighting its object-based aspects, follows.
After an analysis and comparison of the properties of our
model in chapter IV, we conclude with an outlook on further
developments (chapter V).
II. PREVIOUS WORK
A. Conventional models of visual attention
A classic model of visual attention was proposed by Koch
and Ullman in 1985 [1]. With its parallel feature
extraction stage, a master map of attention for integrating the
saliency, a WTA process applied to this map, and the
scanning of maxima by means of an inhibition map, it already
provided many aspects present in today's attention models. The
model (outlined in figure 1) is closely related to models of
human visual attention like the Feature Integration Theory by
Treisman [2] or the Guided Search model by Wolfe [3], [4].
Fig. 1. Simplified version of conventional attention models. (Block diagram: in the preattentive stage, feature maps computed from the input data are integrated into a saliency map; in the attentive stage, a WTA process with an inhibition map selects the FOA.)
Computer models that build on this architecture were in-
troduced by e.g. Milanese et al. [5], Leavers [6], Itti et al.
[7], and Maki [8]. Other models are more concerned with the
transformation of a scene part into a constant reference frame
like the routing circuits of Olshausen [9] or the inhibitory beam
of Tsotsos [10].
Attentional control is of special importance in active vision
[11], where the activity of the system - mostly in the form
of directing a camera - has to be determined according to the
properties of the environment and the state and goal of the
system. Active vision is a form of overt visual attention that
is closely related to covert attention by selection from internal
representations.
Applications of visual attention in computer vision include
object recognition methods [12], control of vehicles [13], and
navigation [14]. Especially object recognition profits from
the availability of a segmented single object in contrast to
a cluttered scene.
B. Beyond the spotlight
Accounts that go beyond the spotlight metaphor are mainly
found in models of natural visual attention. Pylyshyn [15],
[16] proposed the so-called FINST-theory to give an account
of findings from various experimental paradigms. He was able
to show that one can keep track of a small number (about 4
or 5) of independently moving objects among other identical
objects [17]. Accounts of a fast serial scanning of the objects
by a single focus of attention could be ruled out due to the
necessary speed of the focus. There is also an attention-related
limit on the fast, parallel and error-free counting (so-called
subitizing) of a small number of items (about 4 or 5). This
led to the assumption that some indices are available pointing
to moving objects and sticking to them without the need for
focal attention. Indexed items are more easily available for
focal attention.
Object-based theories of visual attention [18], [19] challenge
the predominant spatial accounts of attention. According to
them, objects are the meaningful units in visual selection.
The partitioning of the scene into objects determines the
assignment of attention. The empirical evidence comes from
experiments in which, for identical spatial layouts, the suggested
grouping into objects caused additional costs associated with
the processing of multiple objects. Some recent empirical
findings point towards an integration of object-based effects
in spatial selection. A possible compromise suggests that
although attention selects a spatial part of the scene, the space
is determined by a fast object-based segmentation of the scene,
or by grouping effects.
Examples of the successful incorporation of object-based
approaches into computer models of visual attention have
been demonstrated at various levels by Fellenz [20], Maki et
al. [21], and Dickinson et al. [22]. We aim to contribute to
these achievements with a special focus on dynamic aspects
of controlling attention.
C. Limitations of conventional models
By applying conventional models (see section II-A) to
dynamic scenes we identify three major problems:
- Inhibition of return is bound to static locations instead of moving objects.
- Extracted information cannot be bound to moving objects.
- Selection and feature integration do not take into account the dynamic environment.
By inhibiting recently selected locations, inhibition of return
(IOR) allows the scanning of a scene by a serial process. The
area with maximal activation in the master map of attention is
marked in the inhibition map with high activity. The activity in
the inhibition map is slowly decaying and inhibits the master
map of attention, so that another area will show the highest
activity. Using this static inhibition map, it is not possible to
inhibit moving objects. Imagine a scene with a highly salient
moving object and a number of salient static objects. After the
moving object is selected and processed, it is marked in the
inhibition map. As soon as it moves out of the inhibited area
it becomes the most salient object and is selected again. This
prevents the system from scanning the scene and selecting
among the static objects. For human visual attention Tipper et
al. [23] have demonstrated that the IOR is in fact bound to
moving objects instead of static locations.
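The failure mode described above can be illustrated with a small sketch (hypothetical code, not part of the original system): a static inhibition map marks only the cell where the object was selected, so a moving object wins the competition again as soon as it has left that cell.

```python
import numpy as np

def select_with_static_ior(saliency, inhibition, decay=0.9):
    """One selection step of the conventional scheme: decay the static
    inhibition map, pick the maximum of the inhibited saliency, and
    mark the winning location."""
    inhibition *= decay
    masked = saliency * (1.0 - np.clip(inhibition, 0.0, 1.0))
    y, x = np.unravel_index(np.argmax(masked), masked.shape)
    inhibition[y, x] = 1.0          # inhibition is bound to this static cell
    return (int(y), int(x))

# A salient moving object at (2, 2) and a static one at (5, 5).
sal = np.zeros((8, 8)); sal[2, 2] = 1.0; sal[5, 5] = 0.5
inh = np.zeros((8, 8))
first = select_with_static_ior(sal, inh)    # the moving object is selected
sal[2, 2] = 0.0; sal[2, 6] = 1.0            # it moves out of the inhibited cell
second = select_with_static_ior(sal, inh)   # ... and is selected again
```

The static object at (5, 5) is never reached, which is exactly the scanning failure described above.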
When the high-level computations are serialized, their results, e.g.
object identities or classifications, are bound to the location an
object occupied at the moment it was selected. In the case
of moving objects, this information is soon outdated. Without
expensive re-checking, the system is not able to provide an up-
to-date world model, binding the identities to actual locations.
Itti and Koch [24] identified the spatiotemporal integration
of saliency information as an important step in the control
of visual attention. Thus the saliency information has to be
computed taking into account the previous saliency data. This
can lead to problems if there is no knowledge about the
movement of objects raising the saliency values.
In the following we will discuss how these problems can
be overcome in a new model of visual attention.
III. MODELING VISUAL ATTENTION
A. Consequences for modeling visual attention in dynamic
environments
From the above analysis it is clear that we need a method
of binding the saliency information to moving objects. For
the problem of dynamic IOR, we have to bind saliency
information to a small number of recently selected moving
objects. This also provides us with the binding of attentively
computed information to these objects and is thus a solution
for the first two problems.
This binding is necessary for the already selected objects
as well as salient objects that have not been selected yet.
This determines the need for a model-free tracking mechanism.
Objects that have never been selected for focal attention are not
recognized and can thus not be tracked based on knowledge
of their identity. Nevertheless it is not necessary to track all
objects in the scene. Just those who are salient enough to be
candidates for focal selection are relevant. This indicates a
close connection between selection and tracking.
Determining the saliency for these objects is a necessary
first step. To reflect the properties of the environment and
to account for the inherent inaccuracies of the feature
computations, which have to be performed preattentively for the
complete input images, spatial and temporal integration of saliency is
important. It has to compensate for the fact that the objects may be moving
and that the speed constraints on the preattentive feature
computations impose limits on their accuracy and reliability.
B. Model architecture
The processes that have been identified as essential in the
previous chapter can be assigned neither to the preattentive
part nor to the attentive part of the selection. We therefore define
an additional semiattentive stage, in which a small number of
discrete items is represented. These items have to be selected
by a first selection stage, which selects a small number of items
according to their saliency integrated over space and time.
It should be robust and show hysteresis:
selected items remain selected for some time, even if other
items become more salient. Tracking is integrated into this first
selection stage, which allows extracted information to be bound to
moving objects and allows these objects to be inhibited from being
selected by focal attention.
Fig. 2. Architecture of the attention model outlining the three computation stages. Inside the neural field, three-dimensional activity clusters are displayed. (Block diagram: the preattentive stage computes features from the input image sequence and integrates them into a saliency representation; the semiattentive stage contains the neural field and the object files 1, 2, 3, ... that constitute the world model; behavior control and object recognition operate on the FOA in the attentive stage.)
Among the semiattentive algorithms is the generation of
symbolic descriptions for each selected item. These contain
information about the position, size, and trajectories as well
as histories of object selection, mean feature values, and
the results of high-level computations like object recognition.
They are stored in so-called object files, which constitute
the world model of the system. The notion of object files is
borrowed from psychophysical modeling [25], [26], [27] and
emphasizes the symbolic reference to an object preceding the
computation of identity information.
For the focal selection of a single item, a second selection
stage is needed. This stage selects among the items that were
the result of the first selection stage. Second-stage selection
is subject to behaviors. It operates on the symbolic data
associated with an object and can include top-down influences.
The behavior is responsible for controlling the system, it
can e.g. initiate camera movements for foveating an object.
Figure 2 depicts the model architecture. Its implementation is
explained in the following section.
C. Model implementation
1) Feature computations: For the computation of saliency
we employ a number of features designed for fast object-related
information extraction. To achieve robust behavior
in different environments, these features use very different
aspects of the visual information, including edges, areas, color
information, and stereo information where available. The use of
multi-scale computations ensures fast computations and robust
results. We tried to realize a more object-based behavior than
simple filter operations could achieve.
Symmetry: To extract edge information in a biologically
plausible way, Gabor filters of different scales and orientations
are applied to the input. The energy of the Gabor filters
orthogonal to circles of different radii at different scales is
accumulated to compute the strength of rotation symmetry at
every pixel. Symmetry is a strong cue for artificial objects as
well as biological forms and points toward their center.
Eccentricity: A grey-level segmentation of the image,
consisting of a fast initial segmentation into many small
segments followed by a dilation and integration procedure,
provides area-based information on homogeneous object or
object-part candidates. The saliency of segments is evaluated
by a computation of the segments' eccentricity.
Color contrast: The image is first transformed into the
MTM color space [28] to achieve human-like processing of
colors. There, a segmentation takes place. The saliency of each
segment is computed according to the mean color contrast
to its neighboring segments weighted by the length of the
common border.
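The color-contrast computation can be sketched as follows: per-segment saliency as the border-length-weighted mean color difference to the neighboring segments. The segment representation and the city-block color difference are our own simplifying assumptions, not details given in the paper.

```python
def color_contrast_saliency(mean_color, neighbors):
    """Saliency of one segment: mean color contrast to its neighboring
    segments, weighted by the length of the common border.

    mean_color -- mean color of the segment, one value per channel
    neighbors  -- list of (neighbor_mean_color, shared_border_length)
    """
    total_border = sum(length for _, length in neighbors)
    if total_border == 0:
        return 0.0
    weighted = 0.0
    for color, length in neighbors:
        # city-block color difference as a simple contrast measure
        diff = sum(abs(a - b) for a, b in zip(mean_color, color))
        weighted += diff * length
    return weighted / total_border

# A red segment with a short border to a black segment and a long border
# to another red segment: the shared color dominates, so contrast is low.
saliency = color_contrast_saliency(
    (1.0, 0.0, 0.0),
    [((0.0, 0.0, 0.0), 10.0), ((1.0, 0.0, 0.0), 30.0)])
```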
Depth: Gabor filters with vertical components form the
basis for this feature. For different orientations, a modified
cross correlation is applied to the filter energies of two stereo
images using multiple scales. Results from the lower resolution
scales limit the correlation range. A voting scheme selects the
most probable disparity from the correlation results for each
location. It takes into account the results of neighboring pixels,
different orientations and scales. According to the heuristic
that a system should first react to close objects, the saliency
is monotonic with the disparity.
The segmentation results as well as clues from depth and
symmetry can be used to identify visual objects and segment
them. The features have been described in detail in [29], [30],
[31]. The feature saliency is integrated into a representation
by first honoring exclusivity (one single red area is more
salient than a large number of identically colored areas)
followed by a superposition of the feature values. In case stereo
information is available, a 3D representation is created. The
saliency representation provides the information necessary for
the first selection stage.
2) Dynamic neural fields: The close integration of robust
selection and model-free tracking suggests the use of dynamic
neural fields (DNF) as proposed by Amari [32]. Their selection
characteristic is robust, shows hysteresis and spatiotemporal
integration, which makes them the perfect candidates for
this stage as shown in [33]. Neural fields are simulations of
laterally connected cortical areas. Their topology corresponds
to the input they receive. The connections inside the field
are homogeneous, dependent only on the distance between the
neurons. The dynamics of a neuron's activity \(u(x,t)\) at position \(x\) and time \(t\) is defined by the following differential equation:

\[
\tau\,\frac{\partial u(x,t)}{\partial t} = -u(x,t) + h + \int w(x-x')\,f(u(x',t))\,dx' + s(x,t) \tag{1}
\]

Herein, \(h\) is a (negative) resting value, \(w\) is the weight function for the connections between the neurons, \(f\) is a sigmoid function, and \(s\) denotes the input function. The weights
for a DNF are excitatory in a local neighborhood and become
inhibitory for distant neurons. Different implementations use
either connections in a local neighborhood (local inhibition
type) or simulate a completely interconnected neural field
(global inhibition type). While the first type has stable states
with multiple activity clusters, the latter shows no more than
one such cluster. The weights are typically defined by a DoG-
function (for the local inhibition type) or standard distribu-
tions with a constant negative term (for the global inhibition
type). The distinct clusters of positive activity develop at
locations with sustained high input values and follow this
input. Hysteresis and spatiotemporal integration are important
mathematically proven properties of neural fields [32], [34].
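The selection behavior described above can be illustrated with a minimal one-dimensional Euler-integrated simulation of equation (1) using a DoG weight function (local inhibition type). All parameter values here are illustrative assumptions, not those of the actual system.

```python
import numpy as np

def dog_kernel(half_width, sigma_e=2.0, sigma_i=6.0, a_e=2.0, a_i=1.0):
    """Difference-of-Gaussians weights: local excitation, broad inhibition."""
    x = np.arange(-half_width, half_width + 1, dtype=float)
    return (a_e * np.exp(-x**2 / (2.0 * sigma_e**2))
            - a_i * np.exp(-x**2 / (2.0 * sigma_i**2)))

def simulate_dnf(stimulus, steps=200, dt=0.1, tau=1.0, h=-1.0):
    """Euler integration of equation (1) for a 1D local inhibition field."""
    n = len(stimulus)
    u = np.full(n, h)                    # start at the resting level
    w = dog_kernel(n // 2 - 1)           # odd-length kernel, centered
    for _ in range(steps):
        f_u = 1.0 / (1.0 + np.exp(-u))             # sigmoid output
        lateral = np.convolve(f_u, w, mode="same") # lateral interaction
        u = u + dt / tau * (-u + h + lateral + stimulus)
    return u

# Sustained input at one location produces a single stable activity cluster;
# everywhere else the field stays below zero.
s = np.zeros(64); s[30:34] = 3.0
u = simulate_dnf(s)
```

If the input peak is shifted slowly between calls, the cluster follows it, which is the tracking behavior exploited by the first selection stage.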
We have realized different architectures of neural fields
reflecting characteristics of the saliency representation that
is used as input for the neural field [33]. They include
systems of interconnected global inhibition 2D neural fields for
individually weighted superpositions of the saliency features
as well as a single local inhibition 3D DNF. Using these
architectures we aim at integrating saliency only for objects
and use the cues of (three-dimensional) neighborhood or the
homogeneity of an object. All these architectures show a small
number of distinct activity clusters (connected areas of positive
activity) that denote locations of high saliency and follow the
movement of such areas in their input.
These activity clusters are the result of the first selection
stage. They correspond to areas of sustained high saliency in
the input. For each of these clusters, a symbolic description is
created, a so-called object file. The underlying hypothesis is
that each of the clusters corresponds to a basic visual object,
a meaningful part of an object or a collection of objects. The
correspondence between object files and activity clusters is
constantly updated, a process that is easily implemented due to
the well-defined behavior of DNF, which shows spatial limits
for integration, tracking, and inhibition of different objects.
These thresholds determine spatial boundaries beyond which
no correspondence is sought. Inside the boundaries, spatial
distance and similarity of features inside the activity areas
determine the correspondence of object files to activity clusters
and therefore the continuity of object files.
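The correspondence update can be sketched under the simplifying assumption that matching is done by greedy nearest-neighbor search on cluster centroids with a fixed distance threshold; the actual system additionally uses feature similarity inside the activity areas. Function and variable names are our own illustration.

```python
import math

def update_object_files(object_files, clusters, max_dist=10.0):
    """Update the correspondence between object files and activity clusters.

    object_files -- {id: (x, y)}: last known position per object file
    clusters     -- [(x, y)]: current cluster centroids in the neural field
    Beyond max_dist no correspondence is sought; clusters without a
    matching file start a new object file.
    """
    unmatched = list(range(len(clusters)))
    next_id = max(object_files, default=-1) + 1
    updated = {}
    for oid, (ox, oy) in object_files.items():
        best, best_d = None, max_dist
        for i in unmatched:
            d = math.hypot(clusters[i][0] - ox, clusters[i][1] - oy)
            if d < best_d:
                best, best_d = i, d
        if best is not None:                  # continuity of the object file
            updated[oid] = clusters[best]
            unmatched.remove(best)
    for i in unmatched:                       # newly salient item
        updated[next_id] = clusters[i]
        next_id += 1
    return updated

files = {0: (10.0, 10.0), 1: (40.0, 40.0)}
clusters = [(12.0, 11.0), (60.0, 60.0)]   # file 0 moved; file 1 disappeared
files = update_object_files(files, clusters)
```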
3) Second selection stage: The second selection stage is
subject to top-down influences and can be implemented in a
problem-specific way. Its operation is encapsulated in behaviors
that take into account the object file information as well
as the state and goal of the system. The main task of a behavior
is the selection of one of the object files, and thereby the
corresponding activity cluster in the neural field, for focal
attention. The area corresponding to the activity cluster in
the input image is then subjected to high-level computations
like object recognition. It can also be foveated by saccadic
camera movements, so that the system shows overt attention
by controlling an active vision system [31].
The default exploration behavior is achieved by assigning
priority levels to the object files according to the time they
were last selected. Unselected items receive the highest prior-
ity. Within a priority level the object files are ordered by their
saliency. Dynamic IOR for moving objects is implicit to this
behavior and can be achieved by other behaviors in a similar
way. Examples of other behaviors we have implemented
include an alarm system, integrated searching and tracking of
a defined object, and the simulation of visual search.
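The default exploration behavior can be stated compactly. The sketch below uses our own data layout and function name; only the ordering rule (never-selected items first, then least recently selected, ties broken by saliency) comes from the text.

```python
def explore_select(object_files):
    """Default exploration behavior: items that were never selected get
    the highest priority; among previously selected items, the least
    recently selected comes first; ties are broken by saliency."""
    def priority(of):
        never = of["last_selected"] is None
        last = -1 if never else of["last_selected"]
        return (0 if never else 1, last, -of["saliency"])
    return min(object_files, key=priority)

files = [
    {"id": 0, "last_selected": 5,    "saliency": 0.9},
    {"id": 1, "last_selected": None, "saliency": 0.3},
    {"id": 2, "last_selected": None, "saliency": 0.7},
]
chosen = explore_select(files)   # a never-selected item wins despite id 0's saliency
```

Because the object files track moving objects, ordering by last selection time implements dynamic IOR without any spatial inhibition map.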
Due to the symbolic computation on a small number of
simple data structures, the modification of behaviors and the
implementation of additional behaviors (possibly using exist-
ing behaviors) is easily achieved. In operating on individual
items, the behaviors are related to Ullman's visual routines [35].
The indexed items from the visual routine model correspond
to the object files in our model. They differ in that a single
behavior is used for the main system control and a collection
of visual routines is used in the Ullman model, but it would
be possible to replace the monolithic behavior by such an
approach. An important aspect agreed on by Ullman [35]
and Pylyshyn [16] is that indices to a number of items are
important for relational operations. This is also achieved by
the first selection stage of our architecture. Behaviors can use
notions of objects that are “behind”, “higher” or “larger” than
something else.
IV. RESULTS
A. First selection stage and semiattentive computations
Static image feature computations, integration, and selection
by a DNF are depicted in figure 3. The variant shown uses
the stereo information computed during the determination
of stereo saliency to create a 3D representation of overall
saliency. The neural field used is a single three-dimensional
local inhibition type DNF. We used some modifications [30]
to the neural field to realize fast computations in spite of the
high dimensionality and the large number of neurons.
The tracking performance by neural fields is demonstrated
in [33]. The feature saliency reflects the environment proper-
ties. The features have been analyzed further in [30], [31].
B. World model quality
In order to compare the quality of the new approach to more
conventional modeling of attentional control we designed an
experiment involving the exploration of a scene by simulated
Fig. 3. Example of the feature computations (from left to right: symmetry,
eccentricity, color contrast, and depth), saliency integration into the 2D master map,
superposition in the 3D master map, and neural field activation. The activation
clusters in the neural field are colored. The 3D representations are ordered
by increasing distance in reading order. The colored background reflects
the architectural distinction of preattentive and semiattentive components as
shown in fig. 2.
recognition of objects. This allowed us to abstract from special
aspects like feature computation qualities for different inputs
and to concentrate on the architectural design. The goal to
be achieved was to compute a world model containing as
many objects as possible while maintaining accurate position
information for these objects. A number of simple objects
(squares of 5 by 5 pixels) were either stationary or moving
on a straight path (they moved at most 2 pixels in x- and y-
direction between consecutive frames). Noise was added with
half the amplitude of the objects. This data was used as a
simulated 2D master map of attention. Figure 4 shows three
consecutive frames of such a scene.
To these scenes we applied a conventional attention algo-
rithm with a static inhibition map as well as our attention
model. A simulated object recognition was the high-level
algorithm carried out at the focus of attention. The recognition
should take three frames. For our model, we added another
fourth frame to the recognition duration to compensate for the
additional computations necessary for the neural fields. We
compared the resulting world models (identified objects and
their positions) to ground truth and computed the mean number
of recognized objects and the position error. Whenever the
position was off by more than 20 pixels, the object was
counted as not being recognized.

Fig. 4. Three consecutive frames used in the experiment for comparing
the attention models. Two of the objects are static, while three of them are
dynamic.
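The scoring rule can be written down directly; the function below is our own sketch of it, not code from the paper.

```python
import math

def world_model_score(world_model, ground_truth, max_error=20.0):
    """Score one frame of the world model: an object counts as recognized
    only if its stored position is within max_error pixels of the true
    position; also return the mean position error of recognized objects.

    world_model, ground_truth -- {object_id: (x, y)}
    """
    errors = []
    for oid, (tx, ty) in ground_truth.items():
        if oid not in world_model:
            continue
        mx, my = world_model[oid]
        err = math.hypot(mx - tx, my - ty)
        if err <= max_error:
            errors.append(err)
    mean_error = sum(errors) / len(errors) if errors else 0.0
    return len(errors), mean_error

truth = {0: (10.0, 10.0), 1: (50.0, 50.0)}
model = {0: (11.0, 10.0), 1: (90.0, 50.0)}   # object 1 drifted out of range
recognized, error = world_model_score(model, truth)
```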
For this experiment, we used the simplest variant of
our model with a single 2D neural field of local inhibition
type. The choice was made to achieve as much comparability
between the two models as possible. As conventional models
use a 2D representation of overall saliency, we decided to use
the same representation for our neural field model. This ruled
out the use of the more advanced 3D neural field and the
system of global inhibition 2D fields with weighted features
(see [33] for a comparison).
The conventional model was mainly derived from the Koch
and Ullman [1] model. By abstracting from the feature com-
putations as well as the WTA-process, we tried to capture the
essential selection and inhibition scheme of the conventional
attention algorithms that we analyzed in section II-C. The
localization and selection was achieved by blurring the input
(mimicking the selection by neural fields and finding the
center of the input) and selecting the maximum value after
applying inhibition. We used an inhibition map with activity
slowly decaying by a factor of 0.8 after each frame. An object
was marked in the inhibition map using an area of 8 by
8 pixels, taking into account that the distance between two
objects was at least 14 pixels at each moment, so that there
was no danger of inhibiting a different object. The large size
and slow decay were chosen to give the conventional system
a small additional advantage: the long inhibition of objects
that would not inhibit other moving objects due to the large
distance. Under real world circumstances, the classical model
would perform worse than in our experiment, while our model
could still be improved by using more advanced neural field
architectures and saliency representations.
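The baseline's selection step, with the decay factor of 0.8 and the 8-by-8 inhibition marking described above, can be sketched as follows (function names and the marking amplitude are our own illustration):

```python
import numpy as np

def conventional_step(master_map, inhibition, decay=0.8, mark=8):
    """One selection step of the simulated conventional model: decay the
    inhibition map by 0.8 per frame, select the maximum of the inhibited
    master map, and mark an 8-by-8 area around the winner."""
    inhibition *= decay
    masked = master_map - inhibition
    y, x = np.unravel_index(np.argmax(masked), masked.shape)
    half = mark // 2
    inhibition[max(0, y - half):y + half,
               max(0, x - half):x + half] = master_map.max()
    return (int(y), int(x))

m = np.zeros((32, 32)); m[4, 4] = 1.0; m[20, 20] = 0.6
inh = np.zeros((32, 32))
a = conventional_step(m, inh)   # the most salient object is selected first
b = conventional_step(m, inh)   # then the second, since (4, 4) is inhibited
```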
One run of the experiment consisted of the preparation of 40
input frames (master maps) with the desired number of static
and dynamic objects. Ground truth of the identity and location
of the objects was computed. Both models were presented with
the simulated master maps, selected one location/item for focal
attention and started the simulated object recognition. After
three or four frames (depending on the model), the identity
of the selected object was returned by the simulated object
recognition. The identity was transferred to the internal world
model. For the conventional algorithm, it was connected to the
position, where it was selected. Our model used the object files
[Figure 5: three panels for 1, 3, and 5 static objects; each plots the number of recognized objects (0-6) and the position error (0-14 pixels) against the number of dynamic objects (0-5), with curves NeuralField-objects, Conventional-objects, NeuralField-position, and Conventional-position.]
Fig. 5. Comparison of world model quality for two approaches of visual attention and varied numbers of static and dynamic objects. Depicted are the number
of recognized objects and the position error. See text for details.
to connect the identity with the actual position of the activity
cluster. The mean number of recognized objects in the world
model (computed over all frames and all runs) as well as the
mean position error for the recognized objects was computed.
Figure 5 shows the results for different numbers of static
and dynamic objects. We refer to the conventional model
as “Conventional” and to our two-stage selection model as
“NeuralField”. Each data point is based on 50 runs of 40
frames. The mean number of recognized objects is always
smaller than the number of objects present because the mean
is taken over the complete run. The systems need a number of
frames until every object is recognized. Take the runs with five
dynamic and five static objects. The neural field model needs
at least all 40 frames until all 10 objects are recognized (four
frames per object). Therefore its optimal result would be a mean
of five recognized objects. It nearly reaches this optimum.
We find that for every number of static objects, the neural
field model scales much better with the number of dynamic
objects. The advantage of faster simulated recognition is only
exploited by the conventional model when no object is moving.
In all other cases the new model is superior. This is especially
true for the position error. In every condition, the mean
position error is smaller than 0.5 pixels for our new model
while the conventional model shows an error between 0.5 and
5 pixels. We conclude that even with the higher computational
demands associated with the neural fields, the new approach
provides a more efficient way of scanning a scene and keeping
the extracted information up to date than conventional models
of visual attention.
C. System performance
The exploration of a scene by a specified behavior is
depicted in figure 6. It shows the input frames, with object
files marked by bounding boxes, together with the area of
the FOA (below the input frame). Note the selection of
mainly meaningful areas (ball, picture, and robot) due to the
object-related feature computations and the movement of the
bounding box together with the moving objects (ball and
robot). The first re-checking of an item occurs only after
all objects were subjected to focal attention. In this respect, the
system has found an optimal dynamic scanpath.
The experiment was carried out using the simulation envi-
ronment Orbital 3D, which was implemented in our workgroup
[36]. Its suitability for evaluating vision algorithms is due to
the fact that controllable environments of different qualities
can be used to provide reproducible dynamic experiments with
ground truth [37].
D. Correspondence to natural visual attention
We have incorporated some advanced aspects of natural
visual attention models into our attention architecture. It is
therefore natural to ask whether the architecture can serve to
explain additional empirical findings on attention. Besides the
simultaneous tracking of multiple objects and the binding of
IOR to moving objects that are inherent to the model, we take
a look at further effects in natural attention that are known to
be difficult to explain.
The two selection stages contribute to the old debate on
early and late selection. The core problem here is that while
under some circumstances there is evidence for complex com-
putations outside the focus of attention, others find that even
simple computations need attention. By using two selection
stages, the dichotomy of attentive and preattentive processing
is replaced by three stages, adding semiattentive computations.
By shifting computations between the attentive and the semi-
attentive stage in accordance with the computational load,
the complexity of the task, and the state of the system, the
observed variants of serial and parallel processing could be
produced.
Accounts of multiple foci of attention [38] or the striking
effects of flanker compatibility [39] can also be explained
by our model as being related to semiattentive computations.
Take the experiments by Kramer and Hahn [38]. They showed
that it is possible to quickly compare objects at two positions
without identifying distractors lying in between them.

Fig. 6. Exploration of a scene. For 15 frames (in reading order), the current view of the scene is annotated with the bounding boxes and numbers of the object files. The currently selected OF is white with an arrow pointing from the center towards it; OF with already recognized objects are blue, OF unselected so far are red. For each frame, the area of the FOA is depicted separately.

This
ruled out the possibility of one large spotlight of attention. The
presentation speed ruled out a possible “jump” of the focus of
attention from one object to the other. Using our model, the
explanation would not involve multiple foci of attention but
just semiattentive selection and comparison of both items.
The flanker compatibility effects [40] demonstrate the pro-
cessing and recognition of items at positions that are known
to be irrelevant (distractors) when the task is to classify one
item (the target) at a previously known position. At first
glance, this seems to be exactly what selective attention should
prevent. The typical displays used to demonstrate this effect show
a small number of items that are easily recognized (like digits
or letters). The distractors are of the same type as the target.
Applying our model to such displays, each item would be
selected by the first selection stage due to the small overall
number of items present and their similarity to the target. The
identification processes are rather simple and could operate on
the semiattentive stage. Focal attention is then just needed to
bind the correct result to the target and the reaction. Although
it may be more efficient to suppress the computation of letter
and target identities, the Stroop effect [41] suggests that they
are too automated to be suppressed whenever an item is
selected.
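The walkthrough above can be simulated with a toy version of the model; the function name, the response rule, and the treatment of "conflict" as a count are illustrative assumptions, not experimental claims:

```python
def run_flanker_trial(display, target_pos, k=5):
    """Toy two-stage account of a flanker trial.

    Stage 1 selects every item (the displays are sparse and all items
    resemble the target), the semiattentive stage identifies each one,
    and focal attention only binds the identity at the known target
    position to the response.
    """
    # stage 1: all items pass the first selection (few, similar items)
    object_files = dict(list(display.items())[:k])
    # semiattentive stage: identities are computed for every selected
    # item, including the to-be-ignored flankers (the source of the effect)
    identities = {pos: item.upper() for pos, item in object_files.items()}
    # stage 2: focal attention binds only the target's identity
    response = identities[target_pos]
    # flankers whose identity conflicts with the target slow the response
    conflicting = [p for p, ident in identities.items()
                   if p != target_pos and ident != response]
    return response, len(conflicting)

# incompatible flankers: target 'h' flanked by 's'
resp, n_conflicts = run_flanker_trial({-1: "s", 0: "h", 1: "s"}, target_pos=0)
print(resp, n_conflicts)  # 'H' with 2 conflicting flankers
```

On this account the flanker identities are available whether or not they are wanted, which is exactly the automaticity that the Stroop analogy points to.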
V. CONCLUSIONS
The novel architecture of two selection stages in visual
attention, which provides an additional semiattentive computation
stage, was motivated by the problems that conventional approaches
to visual attention exhibit in dynamic scenes. The object-based
computations in every stage of the model allow us to refer
to meaningful entities of the environment. This improves the
selection process itself and simplifies high-level computations
like object recognition. Especially in dynamic environments,
the operation on moving objects is an improvement over purely
spatial approaches.
By creating object files for the discrete activity clusters in
the neural field, the model shows a well-defined transition from
subsymbolic computations to the symbolic domain, where
single visual objects are the subjects of manipulation. The
implementation of behaviors for the second selection stage
allows an encapsulation of top-down influences on the operating
characteristics of the system.
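The subsymbolic-to-symbolic transition can be illustrated as follows; the plain thresholding and flood fill below are a simplified stand-in for the Amari-type field dynamics [32] actually used by the model, and all names are assumptions:

```python
import numpy as np

def activity_clusters(field_activity, threshold=0.5):
    """Label connected supra-threshold regions of a 2D neural field.

    Returns one (centroid, peak_activity) pair per discrete activity
    cluster -- the candidates that become symbolic object files.
    Uses a simple 4-neighborhood flood fill.
    """
    active = field_activity > threshold
    labels = np.zeros(active.shape, dtype=int)
    current = 0
    for start in zip(*np.nonzero(active)):
        if labels[start]:
            continue
        current += 1
        stack = [start]
        while stack:
            y, x = stack.pop()
            if not (0 <= y < active.shape[0] and 0 <= x < active.shape[1]):
                continue
            if not active[y, x] or labels[y, x]:
                continue
            labels[y, x] = current
            stack += [(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)]
    clusters = []
    for lab in range(1, current + 1):
        ys, xs = np.nonzero(labels == lab)
        clusters.append(((ys.mean(), xs.mean()),
                         field_activity[ys, xs].max()))
    return clusters

# two separated bumps of activity -> two object-file candidates
field = np.zeros((8, 8))
field[1:3, 1:3] = 0.9   # first bump
field[5:7, 5:7] = 0.7   # second bump
print(len(activity_clusters(field)))  # 2
```

Each returned cluster is a discrete, addressable entity, so everything downstream (tracking, inhibition, the second selection stage) can manipulate single visual objects rather than raw activity maps.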
Specialization for applications is achieved by additional
features that allow the localization and selection of objects
relevant to the task at hand. The modification or implementation
of behaviors allows the integration of the model into a larger
vision system, interaction with other system components, and the
inclusion of specific knowledge. Note that the system
does not depend on such knowledge, but it can be augmented
and specialized whenever it is available. To provide even better
object candidates in the first selection stage, a segmentation
process based on the feature computations would be a promising
extension of the model.
REFERENCES
[1] C. Koch and S. Ullman, “Shifts in selective visual attention: Towards the
underlying neural circuitry,” Human Neurobiology, vol. 4, pp. 219–227,
1985.
[2] A. Treisman and G. Gelade, “A feature integration theory of attention,”
Cognitive Psychology, vol. 12, pp. 97–136, 1980.
[3] J. Wolfe, K. R. Cave, and S. L. Franzel, “Guided search: An alternative
to the feature integration model for visual search,” Journal of Exper-
imental Psychology: Human Perception and Performance, vol. 15, pp.
419–433, 1989.
[4] J. Wolfe, “Guided search 2.0: A revised model of visual search,”
Psychonomic Bulletin and Review, vol. 1, no. 2, pp. 202–238, 1994.
[5] R. Milanese, H. Wechsler, S. Gil, J. Bost, and T. Pun, “Integration
of bottom-up and top-down cues for visual attention using non-linear
relaxation,” in Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition, Seattle, 1994, pp. 781–785.
[6] V. Leavers, “Preattentive computer vision - towards a 2-stage computer
vision system for the extraction of qualitative descriptors and the cues
for focus of attention,” Image and Vision Computing, vol. 12, no. 9, pp.
583–599, 1994.
[7] L. Itti and C. Koch, “A saliency-based search mechanism for overt and
covert shifts of visual attention,” Vision Research, vol. 40, no. 10–12,
pp. 1489–1506, 2000.
[8] A. Maki, P. Nordlund, and J.-O. Eklundh, “A computational model of
depth-based attention,” in Proc. 13th Int. Conf. on Pattern Recognition,
vol. 4, 1996, pp. 734–738.
[9] B. Olshausen, C. Anderson, and D. Van Essen, “A multiscale dynamic
routing circuit for forming size- and position-invariant object representa-
tions,” Journal of Computational Neuroscience, vol. 2, no. 1, pp. 45–62,
1995.
[10] J. K. Tsotsos, “An inhibitory beam for attentional selection,” in Spatial
visions in humans and robots, L. Harris and M. Jenkins, Eds., 1993.
[11] Y. Aloimonos, I. Weiss, and A. Bandopadhay, “Active vision,” in
Proceedings of the first International Conference on Computer Vision,
1987, pp. 35–54.
[12] L. Pessoa and S. Exel, “Attentional strategies for object recognition,”
in Proceedings of the IWANN, Alicante, Spain, 1999, ser. Lecture Notes
in Computer Science, J. Mira and J. Sánchez-Andrés, Eds., vol. 1606.
Springer, 1999, pp. 850–859.
[13] D. Reece and S. Shafer, “Control of perceptual attention in robot
driving,” Artificial Intelligence, vol. 78, pp. 397–430, 1995.
[14] A. Abbott, “A survey of selective fixation control for machine vision,”
in IEEE Control Systems, 1992, pp. 25–31.
[15] Z. Pylyshyn, J. Burkell, B. Fisher, C. Sears, W. Schmidt, and L. Trick,
“Multiple parallel access in visual attention,” Canadian Journal of
Experimental Psychology, vol. 48, no. 2, pp. 260–283, 1994.
[16] Z. Pylyshyn, “Visual indexes in spatial vision and imagery,” in Visual
Attention, ser. Vancouver Studies in Cognitive Science, R. Wright, Ed.
Oxford University Press, 1998, no. 8, pp. 215–231.
[17] Z. W. Pylyshyn and R. Storm, “Tracking multiple independent targets:
evidence for a parallel tracking mechanism,” Spatial Vision, vol. 3, no. 3,
1988.
[18] S. Vecera and M. Farah, “Does visual attention select objects or
locations?” Journal of Experimental Psychology: General, vol. 123, pp.
146–160, 1994.
[19] S. Tipper and B. Weaver, “The medium of attention: Location-based,
object-centered, or scene-based?” in Visual Attention, R. Wright, Ed.
Oxford University Press, 1998, pp. 77–107.
[20] W. Fellenz and G. Hartmann, “Preattentive grouping and attentive
selection for early visual computation,” in Proceedings of the 13th
International Conference on Pattern Recognition (ICPR), Vienna, Austria,
August 25–30, 1996.
[21] A. Maki, P. Nordlund, and J.-O. Eklundh, “Attentional scene segmen-
tation: Integrating depth and motion,” Computer Vision and Image
Understanding, vol. 78, pp. 351–373, 2000.
[22] S. Dickinson, H. Christensen, J. Tsotsos, and G. Olofsson, “Active ob-
ject recognition integrating attention and viewpoint control,” Computer
Vision and Image Understanding, vol. 67, no. 3, pp. 239–260, 1997.
[23] S. P. Tipper, J. Driver, and B. Weaver, “Object-centered inhibition of re-
turn of visual attention,” Quarterly Journal of Experimental Psychology,
vol. 43A, no. 2, pp. 289–298, May 1991.
[24] L. Itti and C. Koch, “Feature combination strategies for saliency-based
visual attention systems,” Journal of Electronic Imaging, vol. 10, no. 1,
2001.
[25] A. Treisman, “Representing visual objects,” in Attention and Perfor-
mance, D. Meyer and S. Kornblum, Eds. Hillsdale, NJ: Erlbaum, 1991,
vol. 14.
[26] D. Kahneman, A. Treisman, and B. Gibbs, “The reviewing of object
files: object-specific integration of information,” Cognitive Psychology,
vol. 24, no. 2, pp. 175–210, 1992.
[27] J. Wolfe and S. Bennett, “Preattentive object files: Shapeless bundles of
basic features,” Vision Research, vol. 37, pp. 25–43, 1997.
[28] M. Miyahara and Y. Yoshida, “Mathematical transform of (R, G, B) color
data to Munsell (H, V, C) color data,” Visual Communication and Image
Processing, vol. 1001, pp. 650–657, 1988.
[29] B. Mertsching, M. Bollmann, R. Hoischen, and S. Schmalz, “The neural
active vision system NAVIS,” in Handbook of Computer Vision and
Applications, Vol. 3 (Systems and Applications), B. Jähne, H. Haußecker,
and P. Geißler, Eds. Academic Press, 1999, pp. 543–568.
[30] G. Backer and B. Mertsching, “Integrating time and depth into the at-
tentional control of an active vision system,” in Dynamische Perzeption.
Workshop der GI-Fachgruppe 1.0.4 Bildverstehen, Ulm, November 2000,
G. Baratoff and H. Neumann, Eds., 2000, pp. 69–74.
[31] G. Backer, B. Mertsching, and M. Bollmann, “Data- and model-driven
gaze control for an active-vision system,” IEEE Transactions on Pattern
Analysis and Machine Intelligence, vol. 23, no. 12, pp. 1415–1429, 2001.
[32] S.-I. Amari, “Dynamics of pattern formation in lateral-inhibition type
neural fields,” Biological Cybernetics, vol. 27, pp. 77–87, 1977.
[33] G. Backer and B. Mertsching, “Using neural field dynamics in the
context of attentional control,” in Proceedings of the ICANN 2002, 2002,
pp. 1237–1242.
[34] K. Kishimoto and S.-I. Amari, “Existence and stability of local excita-
tions in homogeneous neural fields,” Journal of Mathematical Biology,
vol. 7, pp. 303–318, 1979.
[35] S. Ullman, “Visual routines,” Cognition, vol. 18, pp. 97–159, 1984.
[36] M. Bungenstock, A. Baudry, J. Bitterling, and B. Mertsching, “Devel-
opment of a simulation framework for mobile robots,” in Proceedings
of the EUROIMAGE ICAV3D 2001, 2001, pp. 89–92.
[37] G. Backer and B. Mertsching, “Evaluation of attentional control in active
vision systems using a 3d simulation framework,” Journal of the
WSCG - 10th International Conference in Central Europe on Computer
Graphics, Visualization and Computer Vision, vol. 10, 2002, pp. 32–39.
[38] A. Kramer and S. Hahn, “Splitting the beam: Distribution of attention
over noncontiguous regions of the visual field,” Psychological Science,
vol. 6, no. 6, pp. 381–386, 1995.
[39] J. Miller, “The flanker compatibility effect as a function of visual angle,
attention focus, visual transients, and perceptual load: A search for
boundary conditions,” Perception and Psychophysics, vol. 49, pp. 270–
288, 1991.
[40] C. Eriksen and J. Hoffman, “The extent of processing of noise ele-
ments during selective encoding from visual displays,” Perception and
Psychophysics, vol. 14, pp. 155–160, 1973.
[41] J. Stroop, “Studies of interference in serial verbal reactions,” Journal of
Experimental Psychology, vol. 18, pp. 643–662, 1935.