Saliency Detection via Graph-Based Manifold Ranking
Chuan Yang¹, Lihe Zhang¹, Huchuan Lu¹, Xiang Ruan², and Ming-Hsuan Yang³
¹Dalian University of Technology  ²OMRON Corporation  ³University of California at Merced
Abstract
Most existing bottom-up methods measure the foreground saliency of a pixel or region based on its contrast within a local context or the entire image, whereas a few methods focus on segmenting out background regions and thereby salient objects. Instead of considering the contrast between the salient objects and their surrounding regions, we consider both foreground and background cues in a different way. We rank the similarity of the image elements (pixels or regions) with foreground cues or background cues via graph-based manifold ranking. The saliency of the image elements is defined based on their relevances to the given seeds or queries. We represent the image as a close-loop graph with superpixels as nodes. These nodes are ranked based on the similarity to background and foreground queries, based on affinity matrices. Saliency detection is carried out in a two-stage scheme to extract background regions and foreground salient objects efficiently. Experimental results on two large benchmark databases demonstrate that the proposed method performs well against the state-of-the-art methods in terms of accuracy and speed. We also create a more difficult benchmark database containing 5,172 images to test the proposed saliency model, and make this database publicly available with this paper for further studies in the saliency field.
1. Introduction
The task of saliency detection is to identify the most important and informative part of a scene. It has been applied to numerous vision problems including image segmentation [11], object recognition [28], image compression [16], and content-based image retrieval [8], to name a few. Saliency methods in general can be categorized as either bottom-up or top-down approaches. Bottom-up methods [1, 2, 6, 7, 9-12, 14, 15, 17, 21, 24, 25, 27, 32, 33, 37] are data-driven and pre-attentive, while top-down methods [23, 36] are task-driven and entail supervised learning with class labels. We note that saliency models have been developed for eye fixation prediction [6, 14, 15, 17, 19, 25, 33] and salient object detection [1, 2, 7, 9, 23, 24, 32]. The former focuses on identifying a few human fixation locations on natural images, which is important for understanding human attention. The latter is to accurately detect where the salient object should be, which is useful for many high-level vision tasks. In this paper, we focus on the bottom-up salient object detection task.
Salient object detection algorithms usually generate bounding boxes [7, 10], binary foreground and background segmentation [12, 23, 24, 32], or saliency maps which indicate the saliency likelihood of each pixel. Liu et al. [23] propose a binary saliency estimation model by training a conditional random field to combine a set of novel features. Wang et al. [32] analyze multiple cues in a unified energy minimization framework and use a graph-based saliency model [14] to detect salient objects. In [24], Lu et al. develop a hierarchical graph model and utilize concavity context to compute weights between nodes, from which the graph is bi-partitioned for salient object detection. On the other hand, Achanta et al. [1] compute the saliency likelihood of each pixel based on its color contrast to the entire image. Cheng et al. [9] consider the global region contrast with respect to the entire image and the spatial relationships across regions to extract the saliency map. In [11], Goferman et al. propose a context-aware saliency algorithm to detect the image regions that represent the scene based on four principles of human visual attention. The contrast of the center and surround distribution of features is computed based on the Kullback-Leibler divergence for salient object detection [21]. Xie et al. [35] propose a novel model for bottom-up saliency within the Bayesian framework by exploiting low and mid level cues. Sun et al. [30] improve Xie's model by introducing boundary and soft-segmentation cues. Recently, Perazzi et al. [27] show that the complete contrast and saliency estimation can be formulated in a unified way using high-dimensional Gaussian filters. In this work, we generate a full-resolution saliency map for each input image.

Most of the above-mentioned methods measure saliency using local center-surround contrast and the rarity of features over the entire image. In contrast, Gopalakrishnan et al. [12] formulate the object detection problem as a binary segmentation or labelling task on a graph.
Figure 1. Diagram of our proposed model.

The most salient seed and several background seeds are identified by the behavior of random walks on a complete graph and a k-regular graph. Then, a semi-supervised learning technique is used to infer the binary labels of the unlabelled nodes. Recently, a method that exploits background priors was proposed for saliency detection [34]. The main observation is that the distance between a pair of background regions is shorter than that between a region from the salient object and a region from the background. The node labelling task (either salient object or background) is formulated as an energy minimization problem based on this criterion.
We observe that the background often presents local or global appearance connectivity with each of the four image boundaries, and the foreground presents appearance coherence and consistency. In this work, we exploit these cues to compute pixel saliency based on the ranking of superpixels. For each image, we construct a close-loop graph where each node is a superpixel. We model saliency detection as a manifold ranking problem and propose a two-stage scheme for graph labelling. Figure 1 shows the main steps of the proposed algorithm. In the first stage, we exploit the boundary prior [13, 22] by using the nodes on each side of the image as labelled background queries. From each labelled result, we compute the saliency of nodes based on their relevances (i.e., ranking) to those queries as background labels. The four labelled maps are then integrated to generate a saliency map. In the second stage, we apply binary segmentation on the resulting saliency map from the first stage, and take the labelled foreground nodes as salient queries. The saliency of each node is computed based on its relevance to the foreground queries for the final map.
To fully capture intrinsic graph structure information and incorporate local grouping cues in graph labelling, we use manifold ranking techniques to learn a ranking function, which is essential to learning an optimal affinity matrix [20]. Different from [12], the proposed saliency detection algorithm with manifold ranking requires seeds from only one class, which are initialized with either the boundary priors or foreground cues. The boundary priors are inspired by recent work on human fixations on images [31], which shows that humans tend to gaze at the center of images. These priors have also been used in image segmentation and related problems [13, 22, 34]. In contrast, the semi-supervised method [12] requires both background and salient seeds, and generates a binary segmentation. Furthermore, it is difficult to determine the number and locations of salient seeds as they are generated by random walks, especially for scenes with multiple salient objects. This is a known problem with graph labelling, where the results are sensitive to the selected seeds. In this work, all the background and foreground seeds can be easily generated via background priors and by ranking against background queries (or seeds). As our model incorporates local grouping cues extracted from the entire image, the proposed algorithm generates well-defined boundaries of salient objects and uniformly highlights the whole salient regions. Experimental results using large benchmark data sets show that the proposed algorithm performs efficiently and favorably against the state-of-the-art saliency detection methods.
2. Graph-Based Manifold Ranking
The graph-based ranking problem is described as follows: given a node as a query, the remaining nodes are ranked based on their relevances to the given query. The goal is to learn a ranking function, which defines the relevance between unlabelled nodes and queries.
2.1. Manifold Ranking
Figure 2. Our graph model. The red line along the four sides indicates that all the boundary nodes are connected with each other.

In [39], a ranking method that exploits the intrinsic manifold structure of data (such as images) for graph labelling is proposed. Given a dataset $X = \{x_1, \dots, x_l, x_{l+1}, \dots, x_n\} \in \mathbb{R}^{m \times n}$, some data points are labelled queries and the rest need to be ranked according to their relevances to the queries. Let $f : X \to \mathbb{R}^n$ denote a ranking function which assigns a ranking value $f_i$ to each point $x_i$; $f$ can be viewed as a vector $f = [f_1, \dots, f_n]^T$. Let $y = [y_1, y_2, \dots, y_n]^T$ denote an indication vector, in which $y_i = 1$ if $x_i$ is a query and $y_i = 0$ otherwise. Next, we define a graph $G = (V, E)$ on the dataset, where the nodes $V$ are the dataset $X$ and the edges $E$ are weighted by an affinity matrix $W = [w_{ij}]_{n \times n}$. Given $G$, the degree matrix is $D = \mathrm{diag}\{d_{11}, \dots, d_{nn}\}$, where $d_{ii} = \sum_j w_{ij}$. Similar to the PageRank and spectral clustering algorithms [5, 26], the optimal ranking of queries is computed by solving the following optimization problem:
$$f^* = \arg\min_{f} \; \frac{1}{2} \left( \sum_{i,j=1}^{n} w_{ij} \left\| \frac{f_i}{\sqrt{d_{ii}}} - \frac{f_j}{\sqrt{d_{jj}}} \right\|^2 + \mu \sum_{i=1}^{n} \| f_i - y_i \|^2 \right), \quad (1)$$
where the parameter $\mu$ controls the balance of the smoothness constraint (the first term) and the fitting constraint (the second term). That is, a good ranking function should not change too much between nearby points (smoothness constraint) and should not differ too much from the initial query assignment (fitting constraint). The minimum solution is computed by setting the derivative of the above function to zero. The resulting ranking function can be written as:

$$f^* = (I - \alpha S)^{-1} y, \quad (2)$$

where $I$ is an identity matrix, $\alpha = 1/(1 + \mu)$, and $S$ is the normalized Laplacian matrix, $S = D^{-1/2} W D^{-1/2}$.
The ranking algorithm [39] is derived from work on semi-supervised learning for classification [38]. Essentially, manifold ranking can be viewed as a one-class classification problem [29], where only positive examples or negative examples are required. We can obtain another ranking function by using the unnormalized Laplacian matrix in Eq. 2:

$$f^* = (D - \alpha W)^{-1} y. \quad (3)$$

We compare the saliency results using Eq. 2 and Eq. 3 in the experiments, and the latter achieves better performance (see Figure 8). Hence, we adopt Eq. 3 in this work.
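To make the two ranking functions concrete, the following is a minimal NumPy sketch of Eq. 2 and Eq. 3; it assumes a precomputed affinity matrix W and a binary indicator vector y, and the function names are ours rather than the authors':

```python
import numpy as np

def rank_normalized(W, y, alpha=0.99):
    """Eq. (2): f* = (I - alpha*S)^(-1) y, with S = D^(-1/2) W D^(-1/2)."""
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))  # element-wise D^(-1/2) W D^(-1/2)
    return np.linalg.solve(np.eye(len(y)) - alpha * S, y)

def rank_unnormalized(W, y, alpha=0.99):
    """Eq. (3): f* = (D - alpha*W)^(-1) y, the variant adopted in this paper."""
    D = np.diag(W.sum(axis=1))
    return np.linalg.solve(D - alpha * W, y)
```

Solving the linear system is preferable to forming the inverse explicitly when only one query vector is ranked; the paper inverts $(D - \alpha W)$ once because the same matrix is reused for several query vectors.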
2.2. Saliency Measure
Given an input image represented as a graph and some salient query nodes, the saliency of each node is defined as its ranking score computed by Eq. 3, which is rewritten as $f^* = Ay$ to facilitate analysis. The matrix $A$ can be regarded as a learnt optimal affinity matrix which is equal to $(D - \alpha W)^{-1}$. The ranking score $f^*(i)$ of the $i$-th node is the inner product of the $i$-th row of $A$ and $y$. Because $y$ is a binary indicator vector, $f^*(i)$ can also be viewed as the sum of the relevances of the $i$-th node to all the queries.

Figure 3. Graph labelling results using the top boundary prior. Left: input images. Center: results without enforcing the geodesic distance constraints. Right: results with geodesic distance constraints.

In conventional ranking problems, the queries are manually labelled with the ground-truth. However, as the queries for saliency detection are selected by the proposed algorithm, some of them may be incorrect. Thus, we need to compute a degree of confidence (i.e., the saliency value) for each query, which is defined as its ranking score as ranked by the other queries (excluding itself). To this end, we set the diagonal elements of $A$ to 0 when computing the ranking scores by Eq. 3. We note that this seemingly insignificant step has a great effect on the final results. If we compute the saliency of each query without setting the diagonal elements of $A$ to 0, its ranking value in $f^*$ will contain the relevance of this query to itself, which is meaningless and often abnormally large, so that it severely weakens the contributions of the other queries to the ranking score. Lastly, we measure the saliency of nodes using the normalized ranking score $\bar{f}^*$ when salient queries are given, and using $1 - \bar{f}^*$ when background queries are given.
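A minimal sketch of this learnt affinity matrix with the diagonal-zeroing step, again in NumPy and with our own function name:

```python
import numpy as np

def learned_affinity(W, alpha=0.99):
    """A = (D - alpha*W)^(-1) with its diagonal set to 0, so each query
    is ranked by the other queries rather than by its own (abnormally
    large) self-relevance."""
    D = np.diag(W.sum(axis=1))
    A = np.linalg.inv(D - alpha * W)
    np.fill_diagonal(A, 0.0)
    return A
```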
3. Graph Construction
We construct a single-layer graph $G = (V, E)$ as shown in Figure 2, where $V$ is a set of nodes and $E$ is a set of undirected edges. In this work, each node is a superpixel generated by the SLIC algorithm [3]. As neighboring nodes are likely to share similar appearance and saliency values, we use a k-regular graph to exploit the spatial relationship. First, each node is not only connected to the nodes neighboring it, but also to the nodes sharing common boundaries with its neighboring nodes (see Figure 2). By extending the scope of node connection with the same degree k, we effectively utilize local smoothness cues. Second, we enforce that the nodes on the four sides of the image are connected, i.e., any pair of boundary nodes are considered adjacent. Thus, we denote the graph as a close-loop graph. This close-loop constraint significantly improves the performance of the proposed method, as it tends to reduce the geodesic distance of similar superpixels and thereby improves the ranking results. Figure 3 shows some examples of the ranking results with and without these constraints. We note that these constraints work well when the salient objects appear near the image boundaries or some of the background regions are not the same.

Figure 4. Saliency maps using different queries. From left to right: input image, result of using all the boundary nodes together as queries, result of integrating the four maps from each side, result of ranking with foreground queries.
With the constraints on edges, it is clear that the constructed graph is sparsely connected. That is, most elements of the affinity matrix $W$ are zero. In this work, the weight between two nodes is defined by

$$w_{ij} = e^{-\frac{\| c_i - c_j \|}{\sigma^2}}, \quad i, j \in V, \quad (4)$$

where $c_i$ and $c_j$ denote the means of the superpixels corresponding to the two nodes in the CIE LAB color space, and $\sigma$ is a constant that controls the strength of the weight. The weights are computed based on the distance in the color space, as it has been shown to be effective in saliency detection [2, 4].
By ranking the nodes on the constructed graph, the inverse matrix $(D - \alpha W)^{-1}$ in Eq. 3 can be regarded as a complete affinity matrix, i.e., there exists a nonzero relevance value between any pair of nodes on the graph. This matrix naturally captures spatial relationship information. That is, the relevance between nodes increases as their spatial distance decreases, which is an important cue for saliency detection [9].
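The graph construction can be sketched as follows, assuming scikit-image for the SLIC superpixels and CIE LAB conversion. The adjacency extension and close-loop constraint follow the description above, but the function name, 4-connectivity test, and other details are our own simplifications, not the released implementation:

```python
import numpy as np
from skimage.color import rgb2lab
from skimage.segmentation import slic

def build_graph(image, n_segments=200, sigma2=0.1):
    """Build the close-loop, k-regular-extended superpixel graph
    and its sparse weight matrix W (Eq. 4)."""
    labels = slic(image, n_segments=n_segments, start_label=0)
    n = labels.max() + 1
    lab = rgb2lab(image)
    # mean CIE LAB color of each superpixel
    colors = np.array([lab[labels == i].mean(axis=0) for i in range(n)])
    # first-ring adjacency: superpixels touching horizontally or vertically
    adj = np.zeros((n, n), dtype=bool)
    for a, b in ((labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])):
        m = a != b
        adj[a[m], b[m]] = True
        adj[b[m], a[m]] = True
    # extend each node's scope to the neighbors of its neighbors
    a_int = adj.astype(np.int32)
    adj = (a_int + a_int @ a_int) > 0
    # close-loop constraint: all boundary superpixels are mutually connected
    border = np.unique(np.concatenate(
        (labels[0], labels[-1], labels[:, 0], labels[:, -1])))
    adj[np.ix_(border, border)] = True
    np.fill_diagonal(adj, False)
    # edge weights from color distance on connected pairs only, Eq. (4)
    dist = np.linalg.norm(colors[:, None] - colors[None, :], axis=2)
    W = np.where(adj, np.exp(-dist / sigma2), 0.0)
    return W, labels
```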
4. Two-Stage Saliency Detection
In this section, we detail the proposed two-stage scheme for bottom-up saliency detection using ranking with background and foreground queries.
4.1. Ranking with Background Queries
Based on the attention theories of early works on visual saliency [17], we use the nodes on the image boundary as background seeds, i.e., the labelled data (query samples), to rank the relevances of all the other regions. Specifically, we construct four saliency maps using boundary priors and then integrate them for the final map, which is referred to as the separation/combination (SC) approach.

Taking the top image boundary as an example, we use the nodes on this side as the queries and the other nodes as the unlabelled data. Thus, the indicator vector $y$ is given, and all the nodes are ranked based on Eq. 3 in $f^*$, which is an $N$-dimensional vector ($N$ is the total number of nodes of the graph). Each element in this vector indicates the relevance of a node to the background queries, and its complement is the saliency measure. We normalize this vector to the range between 0 and 1, and the saliency map using the top boundary prior, $S_t$, can be written as:

$$S_t(i) = 1 - \bar{f}^*(i), \quad i = 1, 2, \dots, N, \quad (5)$$

where $i$ indexes a superpixel node on the graph, and $\bar{f}^*$ denotes the normalized vector.

Figure 5. Examples in which the salient objects appear at the image boundary. From left to right: input images, saliency maps using all the boundary nodes together as queries, four side-specific maps, integration of the four saliency maps, the final saliency map after the second stage.
Similarly, we compute the other three maps $S_b$, $S_l$ and $S_r$, using the bottom, left and right image boundaries as queries. We note that the saliency maps are computed with different indicator vectors $y$ while the weight matrix $W$ and the degree matrix $D$ are fixed. That is, we need to compute the inverse of the matrix $(D - \alpha W)$ only once for each image. Since the number of superpixels is small, the matrix inverse in Eq. 3 can be computed efficiently, so the overall computational load for the four maps is low. The four saliency maps are integrated by the following process:

$$S_{bq}(i) = S_t(i) \times S_b(i) \times S_l(i) \times S_r(i). \quad (6)$$
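A sketch of this first stage, reusing the hypothetical learned_affinity and build_graph helpers from the earlier sketches (names and normalization details are ours): each side's superpixels serve as queries, the normalized score is complemented per Eq. 5, and the four side-specific maps are multiplied per Eq. 6.

```python
import numpy as np

def stage1_background_saliency(A, labels):
    """Rank against each of the four boundary-query sets separately
    (Eq. 5) and integrate the side-specific maps by their product (Eq. 6)."""
    n = A.shape[0]
    S_bq = np.ones(n)
    for side in (labels[0], labels[-1], labels[:, 0], labels[:, -1]):
        y = np.zeros(n)
        y[np.unique(side)] = 1.0  # superpixels on this side as queries
        f = A @ y                 # ranking scores, Eq. (3)
        f = (f - f.min()) / (f.max() - f.min() + 1e-12)  # normalize to [0, 1]
        S_bq *= 1.0 - f           # complement (Eq. 5), product (Eq. 6)
    return S_bq
```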
There are two reasons for using the SC approach to generate saliency maps. First, the superpixels on different sides are often dissimilar and should thus have large distance. If we simultaneously use all the boundary superpixels as queries (i.e., indicating that these superpixels are similar), the labelled results are usually less optimal, as these nodes are not compatible (see Figure 4). Note that the geodesic distance constraint we use in Section 3 can be considered as weakly labelled, as only a few superpixels are involved (i.e., only the superpixels with low color distance from the sides are considered similar), whereas the case with all boundary superpixels as queries can be considered as strongly labelled (i.e., all the nodes from the sides are considered similar). Second, it reduces the effect of imprecise queries, i.e., ground-truth salient nodes that are inadvertently selected as background queries. As shown in the second column of Figure 5, the saliency maps generated using all the boundary nodes are poor: due to the imprecise labelling, the pixels of the salient objects have low saliency values. However, objects are often compact "things" (such as a person or a car) as opposed to incompact "stuff" (such as grass or sky), and therefore rarely occupy three or all sides of the image, so the proposed SC approach ensures that at least two saliency maps are effective (third column of Figure 5). By integrating the four saliency maps, some salient parts of the object can be identified (although the whole object is not uniformly highlighted), which provides sufficient cues for the second-stage detection process.

Figure 6. An example in which imprecise salient queries are selected in the second stage. From left to right: input image, saliency map of the first stage, binary segmentation, the final saliency map.
While most regions of the salient objects are highlighted in the first stage, some background nodes may not be adequately suppressed (see Figure 4 and Figure 5). To alleviate this problem and improve the results, especially when objects appear near the image boundaries, the saliency maps are further improved via ranking with foreground queries.
4.2. Ranking with Foreground Queries
The saliency map of the first stage is binary segmented (i.e., into salient foreground and background) using an adaptive threshold, which facilitates selecting the nodes of the foreground salient objects as queries. We expect the selected queries to cover the salient object regions as much as possible (i.e., with high recall). Thus, the threshold is set to the mean saliency over the entire saliency map.
Once the salient queries are given, an indicator vector $y$ is formed to compute the ranking vector $f^*$ using Eq. 3. As in the first stage, the ranking vector $f^*$ is normalized to the range between 0 and 1 to form the final saliency map:

$$S_{fq}(i) = \bar{f}^*(i), \quad i = 1, 2, \dots, N, \quad (7)$$

where $i$ indexes a superpixel node on the graph, and $\bar{f}^*$ denotes the normalized vector.
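Under the same assumptions as the previous sketches (NumPy, our own helper names, A from learned_affinity), the second stage reduces to a few lines:

```python
import numpy as np

def stage2_foreground_saliency(S_bq, A):
    """Adaptive threshold at the mean saliency of the stage-1 map,
    then re-rank with the foreground queries (Eq. 7)."""
    y = (S_bq > S_bq.mean()).astype(float)  # bi-segmentation -> salient queries
    f = A @ y
    return (f - f.min()) / (f.max() - f.min() + 1e-12)  # normalized, Eq. (7)
```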
We note that there are cases where nodes may be incorrectly selected as foreground queries in this stage. Despite such imprecise labelling, salient objects can still be well detected by the proposed algorithm, as shown in Figure 6. This can be explained as follows. The salient object regions are usually relatively compact (in terms of spatial distribution) and homogeneous in appearance (in terms of feature distribution), while background regions are the opposite. In other words, the intra-object relevance (i.e., between two nodes of the salient objects) is statistically much larger than the object-background and intra-background relevances, which can be inferred from the affinity matrix $A$. To show this phenomenon, we compute the average intra-object, intra-background and object-background relevance values in $A$ for each of 300 images sampled from a dataset with ground truth labels [2], as shown in Figure 7.

Figure 7. Analysis of the learned relevances between nodes in the affinity matrix $A$: the average relevance between superpixels (intra-object, intra-background and object-background) plotted for each image.

Therefore, the sum of the relevance values of object nodes to the ground-truth salient queries is considerably larger than that of background nodes to all the queries. That is, background saliency can be suppressed effectively (fourth column of Figure 6). Similarly, although the saliency maps after the first stage in Figure 5 are not precise, salient objects can be well detected by the saliency maps after ranking with the foreground queries in the second stage. The main steps of the proposed salient object detection algorithm are summarized in Algorithm 1.
Algorithm 1: Bottom-up Saliency based on Manifold Ranking
Input: An image and the required parameters.
1: Segment the input image into superpixels, construct a graph $G$ with superpixels as nodes, and compute its degree matrix $D$ and weight matrix $W$ by Eq. 4.
2: Compute $(D - \alpha W)^{-1}$ and set its diagonal elements to 0.
3: Form indicator vectors $y$ with the nodes on each side of the image as queries, and compute the corresponding side-specific maps by Eq. 3 and Eq. 5. Then, compute the saliency map $S_{bq}$ by Eq. 6.
4: Bi-segment $S_{bq}$ to form the salient foreground queries and an indicator vector $y$. Compute the saliency map $S_{fq}$ by Eq. 3 and Eq. 7.
Output: A saliency map $S_{fq}$ representing the saliency value of each superpixel.
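Stitching the earlier sketches together, a hypothetical end-to-end run of Algorithm 1 might look like the following; the input path and all helper names are ours, not the authors' released MATLAB code:

```python
from skimage import io

img = io.imread('example.jpg')                  # hypothetical input image
W, labels = build_graph(img)                    # step 1: graph and weights (Eq. 4)
A = learned_affinity(W)                         # step 2: (D - alpha*W)^(-1), zeroed diagonal
S_bq = stage1_background_saliency(A, labels)    # step 3: background (boundary) queries
S_fq = stage2_foreground_saliency(S_bq, A)      # step 4: foreground queries
saliency_map = S_fq[labels]                     # full-resolution, per-pixel saliency
```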
5. Experimental Results
We evaluate the proposed method on three datasets. The first one is the MSRA dataset [23], which contains 5,000 images with the ground truth of salient regions marked by bounding boxes. The second one is the MSRA-1000 dataset, a subset of the MSRA dataset, which contains 1,000 images provided by [2] with accurate human-labelled masks for salient objects. The last one is the proposed DUT-OMRON dataset, which contains 5,172 images carefully labeled by five users. The source images, ground truth labels and a detailed description of this dataset can be found at http://ice.dlut.edu.cn/lu/DUT-OMRON/Homepage.htm. We compare our method with fourteen state-of-the-art saliency detection algorithms: the IT [17], GB [14], MZ [25], SR [15], AC [1], Gof [11], FT [2], LC [37], RC [9], SVO [7], SF [27], CB [18], GS_SP [34] and XIE [35] methods.

Figure 8. Precision-recall curves on the MSRA-1000 dataset with different design options of the proposed algorithm. From left to right: ranking with normalized and unnormalized Laplacian matrices, graph construction, the SC approach, results generated by each stage.
Experimental Setup: We set the number of superpixel nodes to N = 200 in all the experiments. There are two parameters in the proposed algorithm: the edge weight $\sigma$ in Eq. 4 and the balance weight $\alpha$ in Eq. 3. The parameter $\sigma$ controls the strength of the weight between a pair of nodes, and the parameter $\alpha$ balances the smoothness and fitting constraints in the regularization function of the manifold ranking algorithm. These two parameters are chosen empirically as $\sigma^2 = 0.1$ and $\alpha = 0.99$ for all the experiments.
Evaluation Metrics: We evaluate all methods by precision, recall and F-measure. The precision value corresponds to the ratio of correctly assigned salient pixels to all the pixels of the extracted regions, while the recall value is defined as the percentage of detected salient pixels with respect to the ground-truth number. As in prior works, the precision-recall curves are obtained by binarizing the saliency map using thresholds in the range of 0 to 255. The F-measure is the overall performance measure computed as the weighted harmonic mean of precision and recall:

$$F_\beta = \frac{(1 + \beta^2) \cdot \mathrm{Precision} \times \mathrm{Recall}}{\beta^2 \cdot \mathrm{Precision} + \mathrm{Recall}}, \quad (8)$$

where we set $\beta^2 = 0.3$ to emphasize precision [2].
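For concreteness, a minimal sketch of these metrics, assuming saliency maps scaled to [0, 255] and boolean ground-truth masks (the function names are ours):

```python
import numpy as np

def pr_curve(saliency_maps, gt_masks):
    """Precision-recall over the 256 fixed binarization thresholds,
    accumulated over a dataset."""
    precision, recall = [], []
    for t in range(256):
        tp = fp = fn = 0
        for sal, gt in zip(saliency_maps, gt_masks):
            pred = sal >= t
            tp += np.sum(pred & gt)
            fp += np.sum(pred & ~gt)
            fn += np.sum(~pred & gt)
        precision.append(tp / max(tp + fp, 1))
        recall.append(tp / max(tp + fn, 1))
    return np.array(precision), np.array(recall)

def f_measure(precision, recall, beta2=0.3):
    """Eq. (8) with beta^2 = 0.3 to emphasize precision."""
    return (1 + beta2) * precision * recall / (beta2 * precision + recall + 1e-12)
```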
5.1. MSRA-1000
We first examine the design options of the proposed algorithm in detail. The ranking results using the normalized (Eq. 2) and unnormalized (Eq. 3) Laplacian matrices are analyzed first. Figure 8(a) shows that the ranking results with the unnormalized Laplacian matrix are better, and it is therefore used in all the experiments. Next, we demonstrate the merits of the proposed graph construction scheme. We compute four precision-recall curves for four cases of node connection on the graph: the close-loop constraint without the k-regular extension, the k-regular extension without the close-loop constraint, neither of them, and both of them. Figure 8(b) shows that using both the close-loop constraint and the k-regular graph performs best. The effect of the SC approach in the first stage is also evaluated. Figure 8(c) shows that our approach of integrating the saliency maps generated from the different boundary priors performs better in the first stage. We further compare the performance of each stage of the proposed algorithm. Figure 8(d) demonstrates that the second stage using the foreground queries further improves the performance of the first stage with background queries.
We evaluate the performance of the proposed method against fourteen state-of-the-art bottom-up saliency detection methods. Figure 9 shows the precision-recall curves of all methods. We note that the proposed method outperforms SVO [7], Gof [11], CB [18] and RC [9], which are top-performing methods for saliency detection in a recent benchmark study [4]. In addition, the proposed method significantly outperforms the GS_SP [34] method, which is also based on boundary priors. We also compute the precision, recall and F-measure with an adaptive threshold proposed in [2], defined as twice the mean saliency of the image. The rightmost plot of Figure 9 shows that the proposed algorithm achieves the highest precision and F-measure values. Overall, the results using the three metrics demonstrate that the proposed algorithm outperforms the state-of-the-art methods. Figure 10 shows a few saliency maps of the evaluated methods. We note that the proposed algorithm uniformly highlights the salient regions and preserves finer object boundaries than the other methods.

Figure 9. Left, middle: precision-recall curves of different methods. Right: precision, recall and F-measure using an adaptive threshold. All results are computed on the MSRA-1000 dataset. The proposed method performs well in all these metrics.

Figure 10. Saliency detection results of different methods. The proposed algorithm consistently generates saliency maps close to the ground truth.
5.2. MSRA
We further evaluate the proposed algorithm on the MSRA dataset, in which the images are annotated with nine bounding boxes by different users. To compute precision and recall values, we first fit a rectangle to the binary saliency map and then use the output bounding box for the evaluation. Similar to the experiments on the MSRA-1000 database, we also binarize the saliency maps using the threshold of twice the mean saliency to compute the precision, recall and F-measure bars. Figure 11 shows that the proposed model performs better than the other methods on this large dataset. We note that the Gof [11] and FT [2] methods have extremely large recall values, since they tend to select large attention regions, at the expense of low precision.

Method   | Ours  | CB [18] | Gof [11] | SVO [7]
Time (s) | 0.256 | 2.146   | 38.896   | 79.861
Table 1. Comparison of average run time (seconds per image).

Figure 11. Left: precision-recall curves of different methods. Right: precision, recall and F-measure using an adaptive threshold. All results are computed on the MSRA dataset.
5.3. DUT-OMRON
We test the proposed model on the DUT-OMRON dataset, in which images are annotated with bounding boxes by five users. Similar to the experiments on the MSRA database, we also fit a rectangle to the binary saliency map and then evaluate our model with both fixed and adaptive thresholding. Figure 12 shows that the proposed dataset is more challenging (all the models perform more poorly on it), and thus provides more room for improvement in future work.

Figure 12. Left: precision-recall curves of different methods. Right: precision, recall and F-measure using an adaptive threshold. All results are computed on the DUT-OMRON dataset.
5.4. Run Time
The average run times of the currently top-performing methods, using MATLAB implementations on the MSRA-1000 database, are presented in Table 1, based on a machine with an Intel Dual Core i3-2120 3.3 GHz CPU and 2GB RAM. Our run time is much faster than that of the other saliency models. Specifically, the superpixel generation by the SLIC algorithm [3] takes 0.165 s (about 64%), and the actual saliency computation takes 0.091 s. The MATLAB implementation of the proposed algorithm is available at http://ice.dlut.edu.cn/lu/publications.html, or http://faculty.ucmerced.edu/mhyang/pubs.html.
6. Conclusion
We propose a bottom-up method to detect salient regions in images through manifold ranking on a graph, which incorporates local grouping cues and boundary priors. We adopt a two-stage approach with background and foreground queries for ranking to generate the saliency maps. We evaluate the proposed algorithm on large datasets and demonstrate promising results with comparisons to fourteen state-of-the-art methods. Furthermore, the proposed algorithm is computationally efficient. Our future work will focus on the integration of multiple features with applications to other vision problems.
Acknowledgements
C. Yang and L. Zhang are supported by the Fundamental Research Funds for the Central Universities (DUT12JS05). H. Lu is supported by the Natural Science Foundation of China #61071209 and #61272372. M.-H. Yang is supported in part by NSF CAREER Grant #1149783 and NSF IIS Grant #1152576.
References
[1] R. Achanta, F. Estrada, P. Wils, and S. Susstrunk. Salient region
detection and segmentation. In ICVS, 2008. 1,6
[2] R. Achanta, S. Hemami, F. Estrada, and S. Susstrunk. Frequency-
tuned salient region detection. In CVPR, 2009. 1,4,5,6,7
[3] R. Achanta, K. Smith, A. Lucchi, P. Fua, and S. Susstrunk. Slic
superpixels. Technical report, EPFL, Tech.Rep. 149300, 2010. 3
[4] A. Borji, D. Sihite, and L. Itti. Salient object detection: A bench-
mark. In ECCV, 2012. 4,6
[5] S. Brin and L. Page. The anatomy of a large-scale hypertextual web
search engine. Computer networks and ISDN systems, 30(1):107–
117, 1998. 2
[6] N. Bruce and J. Tsotsos. Saliency based on information maximiza-
tion. In NIPS, 2005. 1
[7] K.-Y. Chang, T.-L. Liu, H.-T. Chen, and S.-H. Lai. Fusing generic
objectness and visual saliency for salient object detection. In ICCV,
2011. 1,6,7
[8] T. Chen, M. Cheng, P. Tan, A. Shamir, and S. Hu. Sketch2photo:
Internet image montage. ACM Trans. on Graphics, 2009. 1
[9] M. M. Cheng, G. X. Zhang, N. J. Mitra, X. Huang, and S. M. Hu.
Global contrast based salient region detection. In CVPR, 2011. 1,4,
6
[10] J. Feng, Y. Wei, L. Tao, C. Zhang, and J. Sun. Salient object detection
by composition. In ICCV, 2011. 1
[11] S. Goferman, L. Zelnik-Manor, and A. Tal. Context-aware saliency
detection. In CVPR, 2010. 1,6,7
[12] V. Gopalakrishnan, Y. Hu, and D. Rajan. Random walks on graphs
for salient object detection in images. IEEE TIP, 2010. 1,2
[13] L. Grady, M. Jolly, and A. Seitz. Segmenation from a box. In ICCV,
2011. 2
[14] J. Harel, C. Koch, and P. Perona. Graph-based visual saliency. In
NIPS, 2006. 1,6
[15] X. Hou and L. Zhang. Saliency detection: A spectral residual ap-
proach. In CVPR, 2007. 1,6
[16] L. Itti. Automatic foveation for video compression using a neurobi-
ological model of visual attention. IEEE TIP, 2004. 1
[17] L. Itti, C. Koch, and E. Niebur. A model of saliency-based visual
attention for rapid scene analysis. IEEE PAMI, 1998. 1,4,6
[18] H. Jiang, J. Wang, Z. Yuan, T. Liu, N. Zheng, and S. Li. Auto-
matic salient object segmentation based on contex and shape prior.
In BMVC, 2011. 6,7
[19] T. Judd, K. Ehinger, F. Durand, and A. Torralba. Learning to predict
where humans look. In ICCV, 2009. 1
[20] T. H. Kim, K. M. Lee, and S. U. Lee. Learning full pairwise affinities
for spectral segmentation. In CVPR, 2010. 2
[21] D. Klein and S. Frintrop. Center-surround divergence of feature
statistics for salient object detection. In ICCV, 2011. 1
[22] V. Lempitsky, P. Kohli, C. Rother, and T. Sharp. Image segmentation
with a bounding box prior. In ICCV, 2009. 2
[23] T. Liu, Z. Yuan, J. Sun, J. Wang, N. Zheng, X. Tang, and H. Shum.
Learning to detect a salient object. IEEE PAMI, 2011. 1,5
[24] Y. Lu, W. Zhang, H. Lu, and X. Y. Xue. Salient object detection
using concavity context. In ICCV, 2011. 1
[25] Y. Ma and H. Zhang. Contrast-based image attention analysis by
using fuzzy growing. ACM Multimedia, 2003. 1,6
[26] A. Ng, M. Jordan, Y. Weiss, et al. On spectral clustering: Analysis
and an algorithm. In NIPS, pages 849–856, 2002. 2
[27] F. Perazzi, P. Krahenbuhl, Y. Pritch, and A. Hornung. Saliency filters:
Contrast based filtering for salient region detection. In CVPR, 2012.
1,6
[28] U. Rutishauser, D. Walther, C. Koch, and P. Perona. Is bottom-up
attention useful for object recognition? In CVPR, 2004. 1
[29] B. Scholkopf, J. Platt, J. Shawe-Taylor, A. Smola, and
R. Williamson. Estimating the support of a high-dimensional dis-
tribution. Neural Computation, 2001. 3
[30] J. Sun, H. C. Lu, and S. F. Li. Saliency detection based on integration
of boundary and soft-segmentation. In ICIP, 2012. 1
[31] B. Tatler. The central fixation bias in scene viewing: Selecting an
optimal viewing position independently of motor biases and image
feature distributions. Journal of Vision, 2007. 2
[32] L. Wang, J. Xue, N. Zheng, and G. Hua. Automatic salient object
extraction with contextual cue. In ICCV, 2011. 1
[33] W. Wang, Y. Wang, Q. Huang, and W. Gao. Measuring visual
saliency by site entropy rate. In CVPR, 2010. 1
[34] Y. C. Wei, F. Wen, W. J. Zhu, and J. Sun. Geodesic saliency using
background priors. In ECCV, 2012. 2,6
[35] Y. L. Xie, H. C. Lu, and M. H. Yang. Bayesian saliency via low and
mid level cues. IEEE TIP, 2013. 1,6
[36] J. Yang and M. Yang. Top-down visual saliency via joint crf and
dictionary learning. In CVPR, 2012. 1
[37] Y. Zhai and M. Shah. Visual attention detection in video sequences
using spatiotemporal cues. ACM Multimedia, 2006. 1,6
[38] D. Zhou, O. Bousquet, T. Lal, J. Weston, and B. Scholkopf. Learning
with local and global consistency. In NIPS, 2003. 3
[39] D. Zhou, J. Weston, A. Gretton, O. Bousquet, and B. Scholkopf.
Ranking on data manifolds. In NIPS, 2004. 2,3