Article

Saliency Detection Based on Conditional Random Field and Image Segmentation

Abstract

The saliency regions detected by current popular saliency detection methods suffer from two problems: their boundaries are sparse and unclear, and their interiors are uneven and non-compact. To address these problems, a method called saliency detection based on conditional random field and image segmentation is proposed. The method comprehensively utilizes boundary, local, and global information to extract a variety of salient features from an image. By fusing these features within a conditional random field framework, a coarse detection of the saliency region is realized by labeling regions as saliency or background; a fine detection is then realized by combining the labeling result with an interactive image segmentation method. Experimental results show that the proposed approach clearly and accurately extracts saliency regions and improves detection precision.
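
A minimal sketch of this two-stage idea is given below, assuming pre-computed feature maps as input. The weighted fusion stands in for the paper's CRF-based coarse labeling, and OpenCV's GrabCut stands in for the interactive segmentation used in the fine stage; the function names and thresholds are illustrative, not the authors'.

```python
import cv2
import numpy as np

def fuse_features(feature_maps, weights):
    """Coarse stage: weighted linear fusion of normalized feature maps.
    In the paper the fusion is learned within a CRF; the fixed weights
    here are purely illustrative."""
    s = sum(w * f for w, f in zip(weights, feature_maps))
    return (s - s.min()) / (s.max() - s.min() + 1e-8)

def refine_with_grabcut(image_bgr, coarse_saliency, lo=0.3, hi=0.7, iters=5):
    """Fine stage: turn the coarse map into a GrabCut trimap and refine.
    image_bgr: HxWx3 uint8; coarse_saliency: HxW float in [0, 1]."""
    mask = np.full(coarse_saliency.shape, cv2.GC_PR_BGD, np.uint8)
    mask[coarse_saliency < 0.1] = cv2.GC_BGD      # confident background
    mask[coarse_saliency > lo] = cv2.GC_PR_FGD    # probable foreground
    mask[coarse_saliency > hi] = cv2.GC_FGD       # confident foreground
    bgd = np.zeros((1, 65), np.float64)           # GMM state buffers
    fgd = np.zeros((1, 65), np.float64)
    cv2.grabCut(image_bgr, mask, None, bgd, fgd, iters, cv2.GC_INIT_WITH_MASK)
    return np.isin(mask, (cv2.GC_FGD, cv2.GC_PR_FGD)).astype(np.uint8)
```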

... Segmentation through the combined use of high- and low-level feature maps is a common method of medical segmentation (Qian et al., 2015; Zhou et al., 2018; Gu et al., 2019; Fan et al., 2020). However, because of their large size, low-level feature maps require more resources, and their effect on performance improvement is not obvious (Wu et al., 2019). ...
Article
Full-text available
In recent years, an increasing number of people in China have myopia, especially the younger generation. Common myopia may develop into high myopia, which causes visual impairment and blindness. Parapapillary atrophy (PPA) is a typical retinal pathology related to high myopia and also a basic clue for diagnosing it. Therefore, accurate segmentation of the PPA is essential for high myopia diagnosis and treatment. In this study, we propose an optimized Unet (OT-Unet) to solve this important task. OT-Unet uses one of the pre-trained models Visual Geometry Group (VGG), ResNet, and Res2Net as a backbone and is combined with edge attention, parallel partial decoder, and reverse attention modules to improve segmentation accuracy. In general, using pre-trained models can improve accuracy with fewer samples. The edge attention module extracts contour information, the parallel partial decoder module combines the multi-scale features, and the reverse attention module integrates high- and low-level features. We also propose an augmented loss function that increases the weight of complex pixels, enabling the network to segment more complex lesion areas. On a dataset containing 360 images (including 26 provided by PALM), the proposed OT-Unet achieves a high AUC (area under the curve) of 0.9235, a significant improvement over the original Unet (0.7917).
Article
Full-text available
Image segmentation in medical imaging has long been a problem in radiological image processing. Most image segmentation methods among traditional vision algorithms struggle to achieve high-resolution segmentation because of their computational complexity. This paper proposes an image segmentation method based on an optimized cellular neural network. The method introduces a non-linear template and data quantization on top of the basic network model, which greatly reduces the computational complexity while maintaining segmentation accuracy. We then applied the method to a computer-aided system to classify tumor lesions in mammograms. Finally, we propose an FPGA-based multilevel optimization architecture for energy-efficient cellular neural networks. The optimization scheme spans three levels: system level, module level, and design space. The solution improves computing performance by increasing system parallelism, using data-reuse techniques to fully utilize loading bandwidth, and using data quantization to reduce computational redundancy. It also introduces pipeline and dual-cache structures to optimize memory access, and analyzes the limited resources through the Roofline model to tune the system for best performance. The experimental results show that the FPGA accelerator in this paper improves unit performance by 34% compared with other existing work. The nonlinear quantized cellular neural network proposed in this paper reduces LUT resource consumption by 74% and energy consumption by 48.2%. Compared with the original network, the segmentation results for the two projection positions of the mammogram lose only 1.5% and 0.6% accuracy, respectively.
Chapter
In order to effectively manage photos in personal photo albums and improve the efficiency of re-finding photos, the visualization of photo albums has received attention. The most popular and reasonable visualization method is to display a representative photo of each photo cluster. We studied the characteristics of representative photos and then proposed a method for selecting representative photos from a set of photos related to a specific event. The method mainly considers two aspects of photos: aesthetic quality and memorable factors. Aesthetic quality covers the area and location of the salient region and the sharpness of the photo; memorable factors cover the salient people and text information. The experimental data sets are real-world personal photo collections, including more than 7,000 photos and more than 2,000 specific events. The experimental results show the efficiency and reliability of selecting representative photos for photo album visualization.
Article
Full-text available
Named entity recognition (NER) is an indispensable and very important part of many natural language processing technologies, such as information extraction, information retrieval, and intelligent Q&A. This paper describes the development of the AL-CRF model, a NER approach based on active learning (AL). The algorithmic sequence of the processes performed by the AL-CRF model is the following: first, the samples are clustered using the k-means approach. Then, stratified sampling is performed on the produced clusters in order to obtain initial samples, which are used to train the basic conditional random field (CRF) classifier. The next step initiates the selection process, which uses the criterion of entropy: samples having the highest entropy values are added to the training set. Afterwards, the learning process is repeated, and the CRF classifier is retrained on the augmented training set. The learning and selection processes of the AL run iteratively until the harmonic mean F stabilizes and the final NER model is obtained. Several NER experiments are performed on legislative and medical cases in order to validate the AL-CRF performance. The testing data include Chinese judicial documents and Chinese electronic medical records (EMRs). Testing indicates that our proposed algorithm has better recognition accuracy and recall rate compared to the conventional CRF model. Moreover, the main advantage of our approach is that it requires fewer manually labelled training samples while being more effective, resulting in a more cost-effective and more reliable process.
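
The entropy-based selection loop can be sketched compactly. In the sketch below, a scikit-learn LogisticRegression stands in for the CRF sequence model so the active-learning mechanics stay visible; the cluster count, sample sizes, and round count are illustrative values, not the paper's.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def active_learning(X_pool, y_pool, n_init=50, n_query=20, rounds=10):
    # Cluster the pool, then stratified-sample the initial training set.
    clusters = KMeans(n_clusters=5, n_init=10).fit_predict(X_pool)
    init = np.concatenate([np.where(clusters == c)[0][: n_init // 5]
                           for c in range(5)])
    labeled = set(init.tolist())
    for _ in range(rounds):
        idx = np.fromiter(labeled, dtype=int)
        clf = LogisticRegression(max_iter=1000).fit(X_pool[idx], y_pool[idx])
        proba = clf.predict_proba(X_pool)
        # Entropy criterion: query the most uncertain unlabeled samples.
        entropy = -(proba * np.log(proba + 1e-12)).sum(axis=1)
        entropy[idx] = -np.inf            # never re-query labeled samples
        labeled |= set(np.argsort(entropy)[-n_query:].tolist())
    return clf, labeled
```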
Conference Paper
Laser lines emitted by a laser level are mostly detected manually, and laser particle and optical effects also make measurement difficult. In this paper, we design a detection system for the five-line laser level and propose a laser line measurement method based on machine vision. Image processing is divided into two stages: in the first stage, we use the random sample consensus (RANSAC) algorithm combined with the Hough transform to fit the laser axis and obtain its position information. In the second stage, a laser edge extraction method based on conditional random fields (CRFs) is proposed, and the sub-pixel width of the laser line is obtained by a spline interpolation algorithm. The results confirm that the proposed laser level detection method meets the required detection precision.
Conference Paper
Full-text available
In this paper, we propose a visual saliency detection algorithm from the perspective of reconstruction errors. The image boundaries are first extracted via superpixels as likely cues for background templates, from which dense and sparse appearance models are constructed. For each image region, we first compute dense and sparse reconstruction errors. Second, the reconstruction errors are propagated based on the contexts obtained from K-means clustering. Third, pixel-level saliency is computed by an integration of multi-scale reconstruction errors and refined by an object-biased Gaussian model. We apply the Bayes formula to integrate saliency measures based on dense and sparse reconstruction errors. Experimental results show that the proposed algorithm performs favorably against seventeen state-of-the-art methods in terms of precision and recall. In addition, the proposed algorithm is demonstrated to be more effective in highlighting salient objects uniformly and robust to background noise.
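
The dense-reconstruction-error step admits a short sketch: build a PCA basis from boundary (background) templates and score each region by how poorly that basis reconstructs it. The sketch below assumes precomputed region descriptors and omits the sparse model, the propagation, and the refinement stages.

```python
import numpy as np

def dense_reconstruction_error(features, bg_templates, n_components=8):
    """features: NxD region descriptors; bg_templates: MxD background rows."""
    mean = bg_templates.mean(axis=0)
    # PCA basis of the background appearance via SVD.
    _, _, Vt = np.linalg.svd(bg_templates - mean, full_matrices=False)
    basis = Vt[:n_components].T                      # D x k basis
    centered = features - mean
    recon = centered @ basis @ basis.T + mean        # project and reconstruct
    return np.linalg.norm(features - recon, axis=1)  # high error => salient
```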
Conference Paper
Full-text available
Salient object detection has been attracting a lot of interest, and recently various heuristic computational models have been designed. In this paper, we regard saliency map computation as a regression problem. Our method, which is based on multi-level image segmentation, uses the supervised learning approach to map the regional feature vector to a saliency score, and finally fuses the saliency scores across multiple levels, yielding the saliency map. The contributions are twofold. One is that we show our approach, which integrates the regional contrast, regional property, and regional backgroundness descriptors together to form the master saliency map, is able to produce saliency maps superior to existing algorithms, most of which combine saliency maps heuristically computed from different types of features. The other is that we introduce a new regional feature vector, backgroundness, to characterize the background, which can be regarded as a counterpart of the objectness descriptor [2]. The performance evaluation on several popular benchmark data sets validates that our approach outperforms the existing state-of-the-art methods.
Conference Paper
Full-text available
Most existing bottom-up methods measure the foreground saliency of a pixel or region based on its contrast within a local context or the entire image, whereas a few methods focus on segmenting out background regions and thereby salient objects. Instead of considering the contrast between the salient objects and their surrounding regions, we consider both foreground and background cues in a different way. We rank the similarity of the image elements (pixels or regions) with foreground cues or background cues via graph-based manifold ranking. The saliency of the image elements is defined based on their relevance to the given seeds or queries. We represent the image as a close-loop graph with superpixels as nodes. These nodes are ranked based on their similarity to background and foreground queries, using affinity matrices. Saliency detection is carried out in a two-stage scheme to extract background regions and foreground salient objects efficiently. Experimental results on two large benchmark databases demonstrate that the proposed method performs well against the state-of-the-art methods in terms of accuracy and speed. We also create a more difficult benchmark database containing 5,172 images to test the proposed saliency model and make this database publicly available with this paper for further studies in the saliency field.
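
The ranking step has a simple closed form, f* = (I − αS)⁻¹ y, with S the symmetrically normalized affinity matrix. A sketch under that formulation follows; superpixel graph construction is omitted and alpha is an illustrative value.

```python
import numpy as np

def manifold_rank(W, query_mask, alpha=0.99):
    """W: NxN nonnegative affinity matrix; query_mask: N bools (seed nodes)."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d + 1e-12))
    S = D_inv_sqrt @ W @ D_inv_sqrt          # normalized affinities
    y = query_mask.astype(float)             # query indicator vector
    return np.linalg.solve(np.eye(len(W)) - alpha * S, y)

# Two-stage use as described above: rank against each image border as a
# background query, combine the complemented maps, threshold the result,
# then re-rank once more with that result as the foreground query.
```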
Article
Full-text available
Salient areas in natural scenes are generally regarded as areas which the human eye will typically focus on, and finding these areas is the key step in object detection. In computer vision, many models have been proposed to simulate the behavior of eyes such as SaliencyToolBox (STB), Neuromorphic Vision Toolkit (NVT), and others, but they demand high computational cost and computing useful results mostly relies on their choice of parameters. Although some region-based approaches were proposed to reduce the computational complexity of feature maps, these approaches still were not able to work in real time. Recently, a simple and fast approach called spectral residual (SR) was proposed, which uses the SR of the amplitude spectrum to calculate the image's saliency map. However, in our previous work, we pointed out that it is the phase spectrum, not the amplitude spectrum, of an image's Fourier transform that is key to calculating the location of salient areas, and proposed the phase spectrum of Fourier transform (PFT) model. In this paper, we present a quaternion representation of an image which is composed of intensity, color, and motion features. Based on the principle of PFT, a novel multiresolution spatiotemporal saliency detection model called phase spectrum of quaternion Fourier transform (PQFT) is proposed in this paper to calculate the spatiotemporal saliency map of an image by its quaternion representation. Distinct from other models, the added motion dimension allows the phase spectrum to represent spatiotemporal saliency in order to perform attention selection not only for images but also for videos. In addition, the PQFT model can compute the saliency map of an image under various resolutions from coarse to fine. Therefore, the hierarchical selectivity (HS) framework based on the PQFT model is introduced here to construct the tree structure representation of an image. With the help of HS, a model called multiresolution wavelet domain foveation (MWDF) is proposed in this paper to improve coding efficiency in image and video compression. Extensive tests on videos, natural images, and psychological patterns show that the proposed PQFT model is more effective in saliency detection and can predict eye fixations better than other state-of-the-art models in previous literature. Moreover, our model requires low computational cost and, therefore, can work in real time. Additional experiments on image and video compression show that the HS-MWDF model can achieve a higher compression rate than the traditional model.
Conference Paper
Full-text available
Reliable estimation of visual saliency allows appropriate processing of images without prior knowledge of their contents, and thus remains an important step in many computer vision tasks including image segmentation, object recognition, and adaptive compression. We propose a regional contrast based saliency extraction algorithm, which simultaneously evaluates global contrast differences and spatial coherence. The proposed algorithm is simple, efficient, and yields full resolution saliency maps. Our algorithm consistently outperformed existing saliency detection methods, yielding higher precision and better recall rates, when evaluated using one of the largest publicly available data sets. We also demonstrate how the extracted saliency map can be used to create high quality segmentation masks for subsequent image processing.
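
A much-simplified sketch of the region-contrast idea follows: a superpixel's saliency is its color distance to all other superpixels, down-weighted by spatial distance. The use of SLIC and the sigma value are illustrative choices, not the paper's exact formulation.

```python
import numpy as np
from skimage.segmentation import slic

def region_contrast_saliency(image, n_segments=200, sigma_sq=0.4):
    """image: HxWx3 float RGB in [0, 1]. Returns a per-pixel saliency map."""
    labels = slic(image, n_segments=n_segments, start_label=0)
    n = labels.max() + 1
    ys, xs = np.mgrid[0:image.shape[0], 0:image.shape[1]]
    pos = np.stack([ys / image.shape[0], xs / image.shape[1]], axis=-1)
    mean_color = np.array([image[labels == i].mean(axis=0) for i in range(n)])
    mean_pos = np.array([pos[labels == i].mean(axis=0) for i in range(n)])
    # Pairwise color contrast, attenuated by normalized spatial distance.
    color_d = np.linalg.norm(mean_color[:, None] - mean_color[None], axis=2)
    spatial_d = np.linalg.norm(mean_pos[:, None] - mean_pos[None], axis=2)
    sal = (np.exp(-spatial_d ** 2 / sigma_sq) * color_d).sum(axis=1)
    sal = (sal - sal.min()) / (sal.max() - sal.min() + 1e-8)
    return sal[labels]                       # paint region scores back in
```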
Conference Paper
Full-text available
A new bottom-up visual saliency model, Graph-Based Visual Saliency (GBVS), is proposed. It consists of two steps: first forming activation maps on certain feature channels, and then normalizing them in a way which highlights conspicuity and admits combination with other maps. The model is simple, and biologically plausible insofar as it is naturally parallelized. This model powerfully predicts human fixations on 749 variations of 108 natural images, achieving 98% of the ROC area of a human-based control, whereas the classical algorithms of Itti & Koch ([2], [3], [4]) achieve only 84%.
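
The activation step can be sketched as a Markov chain over feature-map locations: edges are weighted by dissimilarity times spatial proximity, and the chain's stationary distribution concentrates mass at conspicuous nodes. This toy version builds dense matrices, so it is only meant for small, downsampled maps; the parameters are illustrative.

```python
import numpy as np

def gbvs_activation(feature_map, sigma=5.0, iters=200):
    """feature_map: small 2-D array (e.g. 32x24); returns activation map."""
    h, w = feature_map.shape
    f = feature_map.astype(float).ravel()
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    dist2 = ((coords[:, None] - coords[None]) ** 2).sum(axis=2)
    # Edge weight: feature dissimilarity modulated by spatial proximity.
    W = np.abs(f[:, None] - f[None]) * np.exp(-dist2 / (2 * sigma ** 2))
    P = W / (W.sum(axis=0, keepdims=True) + 1e-12)   # column-stochastic
    v = np.full(len(f), 1.0 / len(f))
    for _ in range(iters):                           # power iteration
        v = P @ v
        v /= v.sum()
    return v.reshape(h, w)
```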
Conference Paper
Full-text available
Detecting visually attentive regions of an image is a challenging but useful issue in many multimedia applications. In this paper, we describe a method to extract visually attentive regions in images using subspace estimation and analysis techniques. The image is represented in a 2D space using a polar transformation of its features so that each region in the image lies in a 1D linear subspace. A new subspace estimation algorithm based on Generalized Principal Component Analysis (GPCA) is proposed. The robustness of subspace estimation is improved by using weighted least squares approximation, where weights are calculated from the distribution of K nearest neighbors to reduce the sensitivity to outliers. A new region attention measure is then defined to calculate the visual attention of each region by considering both feature contrast and the geometric properties of the regions. Experiments show that the method is effective and able to overcome the scale dependency of other methods. Compared with existing visual attention detection methods, it directly measures the global visual contrast at the region level, as opposed to pixel-level contrast, and can correctly extract the attentive region.
Conference Paper
Full-text available
The ability of the human visual system to detect visual saliency is extraordinarily fast and reliable. However, computational modeling of this basic intelligent behavior still remains a challenge. This paper presents a simple method for visual saliency detection. Our model is independent of features, categories, or other forms of prior knowledge of the objects. By analyzing the log-spectrum of an input image, we extract the spectral residual of the image in the spectral domain, and propose a fast method to construct the corresponding saliency map in the spatial domain. We test this model on both natural pictures and artificial images such as psychological patterns. The results indicate fast and robust saliency detection by our method.
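
The method is short enough to sketch directly: the spectral residual is the log-amplitude spectrum minus its local average, and saliency is the smoothed inverse transform of that residual combined with the original phase. The filter sizes below are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, uniform_filter

def spectral_residual_saliency(gray, avg_size=3, blur_sigma=2.5):
    """gray: 2-D float array (typically downsampled to ~64x64 first)."""
    F = np.fft.fft2(gray)
    log_amp = np.log(np.abs(F) + 1e-12)
    phase = np.angle(F)
    # Residual = log amplitude minus its local average.
    residual = log_amp - uniform_filter(log_amp, size=avg_size)
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    return gaussian_filter(sal, blur_sigma)
```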
Conference Paper
Full-text available
Salient areas in natural scenes are generally regarded as the candidates of attention focus in human eyes, which is the key stage in object detection. In computer vision, many models have been proposed to simulate the behavior of eyes, such as SaliencyToolBox (STB), the Neuromorphic Vision Toolkit (NVT), and others, but they demand high computational cost and their remarkable results mostly rely on the choice of parameters. Recently a simple and fast approach based on the Fourier transform, called spectral residual (SR), was proposed, which uses the SR of the amplitude spectrum to obtain the saliency map. The results are good, but the reason is questionable. In this paper, we propose that it is the phase spectrum, not the amplitude spectrum, of the Fourier transform that is key to obtaining the location of salient areas, and we call the resulting approach the phase spectrum of Fourier transform (PFT). We provide some examples to show that PFT achieves better results than SR and requires less computational complexity as well. Furthermore, PFT can be easily extended from a two-dimensional Fourier transform to a quaternion Fourier transform (QFT) if the value of each pixel is represented as a quaternion composed of intensity, color and motion features. The added motion dimension allows the phase spectrum to represent spatio-temporal saliency in order to engage in attention selection for videos as well as images. Extensive tests on videos, natural images and psychological patterns show that the proposed method is more effective than other models. Moreover, it is very robust against white-colored noise and meets real-time requirements, showing great potential in engineering applications.
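
The PFT variant is even simpler than the spectral residual sketch above: keep the phase, discard the amplitude entirely.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def pft_saliency(gray, blur_sigma=2.5):
    """Phase-only saliency: invert a unit-amplitude spectrum."""
    phase = np.angle(np.fft.fft2(gray))
    sal = np.abs(np.fft.ifft2(np.exp(1j * phase))) ** 2
    return gaussian_filter(sal, blur_sigma)
```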
Article
Full-text available
The problem of efficient, interactive foreground/background segmentation in still images is of great practical importance in image editing. Classical image segmentation tools use either texture (colour) information, e.g. Magic Wand, or edge (contrast) information, e.g. Intelligent Scissors. Recently, an approach based on optimization by graph-cut has been developed which successfully combines both types of information. In this paper we extend the graph-cut approach in three respects. First, we have developed a more powerful, iterative version of the optimisation. Secondly, the power of the iterative algorithm is used to simplify substantially the user interaction needed for a given quality of result. Thirdly, a robust algorithm for "border matting" has been developed to estimate simultaneously the alpha-matte around an object boundary and the colours of foreground pixels. We show that for moderately difficult examples the proposed method outperforms competitive tools.
Article
Full-text available
In this paper, we study the salient object detection problem for images. We formulate this problem as a binary labeling task where we separate the salient object from the background. We propose a set of novel features, including multiscale contrast, center-surround histogram, and color spatial distribution, to describe a salient object locally, regionally, and globally. A conditional random field is learned to effectively combine these features for salient object detection. Further, we extend the proposed approach to detect a salient object from sequential images by introducing the dynamic salient features. We collected a large image database containing tens of thousands of carefully labeled images by multiple users and a video segment database, and conducted a set of experiments over them to demonstrate the effectiveness of the proposed approach.
Conference Paper
Full-text available
Classification of images in many category datasets has rapidly improved in recent years. However, systems that perform well on particular datasets typically have one or more limitations such as a failure to generalize across visual tasks (e.g., requiring a face detector or extensive retuning of parameters), insufficient translation invariance, inability to cope with partial views and occlusion, or significant performance degradation as the number of classes is increased. Here we attempt to overcome these challenges using a model that combines sequential visual attention using fixations with sparse coding. The model's biologically-inspired filters are acquired using unsupervised learning applied to natural image patches. Using only a single feature type, our approach achieves 78.5% accuracy on Caltech-101 and 75.2% on the 102 Flowers dataset when trained on 30 instances per class and it achieves 92.7% accuracy on the AR Face database with 1 training instance per person. The same features and parameters are used across these datasets to illustrate its robust performance.
Article
Full-text available
We propose a new type of saliency -- context-aware saliency -- which aims at detecting the image regions that represent the scene. This definition differs from previous definitions whose goal is to either identify fixation points or detect the dominant object. In accordance with our saliency definition, we present a detection algorithm which is based on four principles observed in the psychological literature. The benefits of the proposed approach are evaluated in two applications where the context of the dominant objects is just as essential as the objects themselves. In image retargeting we demonstrate that using our saliency prevents distortions in the important regions. In summarization we show that our saliency helps to produce compact, appealing, and informative summaries.
Article
Full-text available
As you drive into the centre of town, cars and trucks approach from several directions, and pedestrians swarm into the intersection. The wind blows a newspaper into the gutter and a pigeon does something unexpected on your windshield. This would be a demanding and stressful situation, but you would probably make it to the other side of town without mishap. Why is this situation taxing, and how do you cope?
Article
Full-text available
A visual attention system, inspired by the behavior and the neuronal architecture of the early primate visual system, is presented. Multiscale image features are combined into a single topographical saliency map. A dynamical neural network then selects attended locations in order of decreasing saliency. The system breaks down the complex problem of scene understanding by rapidly selecting, in a computationally efficient manner, conspicuous locations to be analyzed in detail.
Chapter
This chapter presents an algorithm for inverse color map computation. An inverse color map is used to translate full (RGB) colors into a limited set of colors. It might be used to drive an 8-bit color display, to perform error-propagation dithering (for example, Floyd–Steinberg) in color, or for the output phase of a color quantization algorithm. Published methods for computing such color maps seem to be few and far between, and are either relatively inefficient or inexact. This chapter describes a simple and efficient method for computing inverse color maps. A representative color is one of the colors in the limited set, and there are n representative colors. RGB colors are quantized to k bits by taking the top k bits of each primary as an integer in the range 0 to 2^k − 1. In this chapter, two versions of the algorithm are described. The first is simple and illustrates the basic principle well. It is also fairly efficient, taking approximately 24 s to completely fill an inverse color map (k = 5 and n = 256) on a SUN 3/60 computer. The second version is significantly more complex, but much more efficient for large color maps, taking approximately 3.2 s to fill the same map.
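
A brute-force sketch of the basic principle: for every k-bit RGB cell, exhaustively find the nearest representative color. This vectorized version trades memory for simplicity and is not the chapter's optimized incremental algorithm.

```python
import numpy as np

def build_inverse_color_map(palette, k=5):
    """palette: nx3 uint8 array of representative colors (n <= 256).
    Returns a (2^k)^3 cube mapping each quantized RGB cell to the index
    of its nearest representative color."""
    levels = (np.arange(2 ** k) << (8 - k)).astype(np.int32)  # cell origins
    r, g, b = np.meshgrid(levels, levels, levels, indexing="ij")
    cells = np.stack([r, g, b], axis=-1).reshape(-1, 1, 3)
    # Squared distance from every cell to every representative color.
    d2 = ((cells - palette[None].astype(np.int32)) ** 2).sum(axis=2)
    return d2.argmin(axis=1).reshape((2 ** k,) * 3).astype(np.uint8)

# Lookup for a full-color pixel (k = 5): idx = inv_map[r >> 3, g >> 3, b >> 3]
```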
Article
Salient region detection is a key research field in image processing. An approach for calculating salient regions using multi-scale analysis in the frequency domain was proposed. First, low-level features of the image at different scales were extracted. By analyzing the amplitude spectra and phase spectra of feature maps in the frequency domain, a corresponding saliency map was constructed in the spatial domain. Using the saliency map, the salient regions in an image can be identified. This method was tested on both natural images and artificial images. The experimental results showed that this approach quickly extracted salient regions which were consistent with human visual perception. The method performed well even in images with a high density of noise.
Conference Paper
Several salient object detection approaches have been published which have been assessed using different evaluation scores and datasets resulting in discrepancy in model comparison. This calls for a methodological framework to compare existing models and evaluate their pros and cons. We analyze benchmark datasets and scoring techniques and, for the first time, provide a quantitative comparison of 35 state-of-the-art saliency detection models. We find that some models perform consistently better than the others. Saliency models that intend to predict eye fixations perform lower on segmentation datasets compared to salient object detection algorithms. Further, we propose combined models which show that integration of the few best models outperforms all models over other datasets. By analyzing the consistency among the best models and among humans for each scene, we identify the scenes where models or humans fail to detect the most salient object. We highlight the current issues and propose future research directions.
Conference Paper
Generic object level saliency detection is important for many vision tasks. Previous approaches are mostly built on the prior that "appearance contrast between objects and backgrounds is high". Although various computational models have been developed, the problem remains challenging and huge behavioral discrepancies between previous approaches can be observed. This suggests that the problem may still be highly ill-posed when using this prior only. In this work, we tackle the problem from a different viewpoint: we focus more on the background instead of the object. We exploit two common priors about backgrounds in natural images, namely boundary and connectivity priors, to provide more clues for the problem. Accordingly, we propose a novel saliency measure called geodesic saliency. It is intuitive, easy to interpret and allows fast implementation. Furthermore, it is complementary to previous approaches, because it benefits more from background priors while previous approaches do not. Evaluation on two databases validates that geodesic saliency achieves superior results and outperforms previous approaches by a large margin, in both accuracy and speed (2 ms per image). This illustrates that appropriate prior exploitation is helpful for the ill-posed saliency detection problem.
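
The geodesic measure can be sketched on a regular grid of patches: a patch's saliency is the cost of the cheapest path to the image boundary, so background regions, being well connected to the boundary, score low. The patch granularity and edge weighting here are illustrative simplifications.

```python
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.csgraph import dijkstra

def geodesic_saliency(patch_colors):
    """patch_colors: HxWx3 mean colors of a coarse grid of patches."""
    h, w, _ = patch_colors.shape
    n = h * w
    graph = lil_matrix((n, n))
    def nid(y, x): return y * w + x
    # 4-connected grid; edge cost = color difference (epsilon keeps
    # zero-cost edges visible to the sparse solver).
    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, 1), (1, 0)):
                yy, xx = y + dy, x + dx
                if yy < h and xx < w:
                    cost = np.linalg.norm(patch_colors[y, x]
                                          - patch_colors[yy, xx])
                    graph[nid(y, x), nid(yy, xx)] = cost + 1e-6
    boundary = [nid(y, x) for y in range(h) for x in range(w)
                if y in (0, h - 1) or x in (0, w - 1)]
    d = dijkstra(graph.tocsr(), directed=False, indices=boundary)
    return d.min(axis=0).reshape(h, w)  # distance to nearest boundary patch
```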
Conference Paper
Detection of the visual salient regions is a challenging and significant problem in computer vision. In this paper, we propose a boundary based prior map and a soft-segmentation based convex hull to improve the saliency detection. First, we present to utilize the boundary information to obtain the coarse prior map. Then a convex hull improved by soft-segmentation is proposed to form the observation likelihood map. Finally, the Bayes formula is applied to combine these two maps. Experiments on a publicly available database show that our augmented framework performs favorably against the state-of-the-art algorithms.
Conference Paper
We introduce a saliency model based on two key ideas. The first one is considering local and global image patch rarities as two complementary processes. The second one is based on our observation that for different images, one of the RGB and Lab color spaces outperforms the other in saliency detection. We propose a framework that measures patch rarities in each color space and combines them in a final map. For each color channel, first, the input image is partitioned into non-overlapping patches and then each patch is represented by a vector of coefficients that linearly reconstruct it from a learned dictionary of patches from natural scenes. Next, two measures of saliency (local and global) are calculated and fused to indicate the saliency of each patch. Local saliency is the distinctiveness of a patch from its surrounding patches. Global saliency is the inverse of a patch's probability of occurring over the entire image. The final saliency map is built by normalizing and fusing local and global saliency maps of all channels from both color systems. Extensive evaluation over four benchmark eye-tracking datasets shows the significant advantage of our approach over 10 state-of-the-art saliency models.
Conference Paper
When dealing with objects with complex structures, saliency detection confronts a critical problem - namely that detection accuracy could be adversely affected if salient foreground or background in an image contains small-scale high-contrast patterns. This issue is common in natural images and forms a fundamental challenge for prior methods. We tackle it from a scale point of view and propose a multi-layer approach to analyze saliency cues. The final saliency map is produced in a hierarchical model. Different from varying patch sizes or downsizing images, our scale-based region handling is by finding saliency values optimally in a tree model. Our approach improves saliency detection on many images that cannot be handled well traditionally. A new dataset is also constructed.
Conference Paper
Salient object detection is not a pure low-level, bottom-up process. Higher-level knowledge is important even for task-independent image saliency. We propose a unified model to incorporate traditional low-level features with higher-level guidance to detect salient objects. In our model, an image is represented as a low-rank matrix plus sparse noises in a certain feature space, where the non-salient regions (or background) can be explained by the low-rank matrix, and the salient regions are indicated by the sparse noises. To ensure the validity of this model, a linear transform for the feature space is introduced and needs to be learned. Given an image, its low-level saliency is then extracted by identifying those sparse noises when recovering the low-rank matrix. Furthermore, higher-level knowledge is fused to compose a prior map, and is treated as a prior term in the objective function to improve the performance. Extensive experiments show that our model comfortably achieves performance comparable to the existing methods even without the help of high-level knowledge. The integration of top-down priors further improves the performance, achieving the state-of-the-art. Moreover, the proposed model can be considered a prototype framework not only for general salient object detection, but also for potential task-dependent saliency applications.
Article
We propose a novel image-importance model for content-aware image resizing. In contrast to the previous gradient magnitude-based approaches, we focus on the excellence of gradient domain statistics. The proposed scheme originates from a well-known property of the human visual system that the human visual perception is highly adaptive and sensitive to structural information in images rather than nonstructural information. We do not model the image structure explicitly, because there are diverse aspects of image structure and they cannot be easily modeled from cluttered natural images. Instead, our method obtains the structural information in an image by exploiting the gradient domain statistics in an implicit manner. Extensive tests on a variety of cluttered natural images show that the proposed method is more effective than the previous content-aware image-resizing methods and it is very robust to images with a cluttered background, unlike the previous schemes.
Article
In this work we present discriminative random fields (DRFs), a discriminative framework for the classification of image regions by incorporating neighborhood interactions in the labels as well as the observed data. The discriminative random fields offer several advantages over the conventional Markov random field (MRF) framework. First, the DRFs allow one to relax the strong assumption of conditional independence of the observed data generally used in the MRF framework for tractability. This assumption is too restrictive for a large number of applications in vision. Second, the DRFs derive their classification power by exploiting probabilistic discriminative models instead of the generative models used in the MRF framework. Finally, all the parameters in the DRF model are estimated simultaneously from the training data, unlike the MRF framework, where likelihood parameters are usually learned separately from the field parameters. We illustrate the advantages of the DRFs over the MRF framework in an application of man-made structure detection in natural images taken from the Corel database.
Article
A content-based image retrieval system normally returns retrieval results according to the similarity between features extracted from the query image and candidate images. In certain circumstances, however, users may be more concerned about salient regions in an image of interest and only wish to retrieve images containing the relevant salient regions, ignoring irrelevant ones (such as the background or other regions and objects). Although how to represent local image properties is still one of the most active research issues, much previous work on image retrieval does not examine salient regions in an image. In this paper, we propose an improved salient point detector based on the wavelet transform; it can extract salient points in an image more accurately. Salient points are then segmented into different salient regions according to their spatial distribution. Colour moments and Gabor features of these different salient regions are computed and form a feature vector to index the image. We test the proposed scheme using a wide range of image samples from the Corel Image Library. The experimental results indicate that the method produces promising results.
Conference Paper
We present conditional random fields, a framework for building probabilistic models to segment and label sequence data. Conditional random fields offer several advantages over hidden Markov models and stochastic grammars for such tasks, including the ability to relax strong independence assumptions made in those models. Conditional random fields also avoid a fundamental limitation of maximum entropy Markov models (MEMMs) and other discriminative Markov models based on directed graphical models, which can be biased towards states with few successor states. We present iterative parameter estimation algorithms for conditional random fields and compare the performance of the resulting models to HMMs and MEMMs on synthetic and natural-language data.
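
For a concrete feel of CRF sequence labeling, here is a tiny usage sketch with the third-party sklearn-crfsuite package (not part of this paper); the features, data, and hyperparameters are toy values.

```python
import sklearn_crfsuite

def token_features(sentence, i):
    """Per-token feature dict; CRFs condition freely on the observations."""
    word = sentence[i]
    return {"lower": word.lower(), "is_title": word.istitle(),
            "prev": sentence[i - 1].lower() if i > 0 else "<s>"}

train_sents = [["John", "lives", "in", "Boston"]]
train_labels = [["B-PER", "O", "O", "B-LOC"]]
X = [[token_features(s, i) for i in range(len(s))] for s in train_sents]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1,
                           max_iterations=100)
crf.fit(X, train_labels)
print(crf.predict(X))   # e.g. [['B-PER', 'O', 'O', 'B-LOC']]
```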
Conference Paper
In this paper we introduce a new salient object segmentation method, which is based on combining a saliency measure with a conditional random field (CRF) model. The proposed saliency measure is formulated using a statistical framework and local feature contrast in illumination, color, and motion information. The resulting saliency map is then used in a CRF model to define an energy minimization based segmentation approach, which aims to recover well-defined salient objects. The method is efficiently implemented by using the integral histogram approach and graph cut solvers. Compared to previous approaches the introduced method is among the few which are applicable to both still images and videos including motion cues. The experiments show that our approach outperforms the current state-of-the-art methods in both qualitative and quantitative terms.
Conference Paper
Detection of visually salient image regions is useful for applications like object segmentation, adaptive compression, and object recognition. Recently, full-resolution saliency maps that retain well-defined boundaries have attracted attention. In these maps, boundaries are preserved by retaining substantially more frequency content from the original image than older techniques. However, if the salient regions comprise more than half the pixels of the image, or if the background is complex, the background gets highlighted instead of the salient object. In this paper, we introduce a method for salient region detection that retains the advantages of such saliency maps while overcoming their shortcomings. Our method exploits features of color and luminance, is simple to implement, and is computationally efficient. We compare our algorithm to six state-of-the-art salient region detection methods using publicly available ground truth. Our method outperforms the six algorithms by achieving both higher precision and better recall. We also show an application of our saliency maps in an automatic salient object segmentation scheme using graph cuts.
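
The flavor of the approach, comparing each pixel against the mean of its largest border-limited symmetric surround, can be sketched with an integral image over Lab color. The blur kernel and the plain Python loop are illustrative simplifications.

```python
import cv2
import numpy as np

def mss_saliency(image_bgr):
    """image_bgr: HxWx3 uint8. Returns a normalized HxW saliency map."""
    lab = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2Lab).astype(np.float64)
    blurred = cv2.GaussianBlur(lab, (5, 5), 0)
    integral = cv2.integral(lab)                 # (H+1) x (W+1) x 3
    h, w = lab.shape[:2]
    sal = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            # Largest surround that stays symmetric about (y, x).
            off_y, off_x = min(y, h - 1 - y), min(x, w - 1 - x)
            y0, y1 = y - off_y, y + off_y + 1
            x0, x1 = x - off_x, x + off_x + 1
            area = (y1 - y0) * (x1 - x0)
            mean = (integral[y1, x1] - integral[y0, x1]
                    - integral[y1, x0] + integral[y0, x0]) / area
            sal[y, x] = np.sum((mean - blurred[y, x]) ** 2)
    return sal / (sal.max() + 1e-12)
```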
Conference Paper
In this paper an improved, macroblock (MB) level, visual saliency algorithm, aimed at video compression, is presented. A relevance vector machine (RVM) is trained over feature vectors pertaining to global, local and rarity measures of conspicuity to yield probabilistic values which form the saliency map. These saliency values are used for non-uniform bit allocation over video frames. A video compression architecture for the propagation of saliency values, saving a tremendous amount of computation, is also proposed.
Conference Paper
For many applications in graphics, design, and human computer interaction, it is essential to understand where humans look in a scene. Where eye tracking devices are not a viable option, models of saliency can be used to predict fixation locations. Most saliency approaches are based on bottom-up computation that does not consider top-down image semantics and often does not match actual eye movements. To address this problem, we collected eye tracking data of 15 viewers on 1003 images and use this database as training and testing examples to learn a model of saliency based on low, middle and high-level image features. This large database of eye tracking data is publicly available with this paper.
Article
Attention exhibits characteristic neural signatures in brain regions that process sensory signals. An important area of future research is to understand the nature of top-down signals that facilitate attentional guidance towards behaviorally relevant locations and features. In this review, we discuss recent studies that have made progress towards understanding: (i) the brain structures and circuits involved in attentional allocation; (ii) top-down attention pathways, particularly as elucidated by microstimulation and lesion studies; (iii) top-down modulatory influences involving subcortical structures and reward systems; (iv) plausible substrates and embodiments of top-down signals; and (v) information processing and theoretical constraints that might be helpful in guiding future experiments. Understanding top-down attention is crucial for elucidating the mechanisms by which we can filter sensory information to pay attention to the most behaviorally relevant events.
Article
When we look at a scene our scanning eye movements are not random [1]. Remarkably, different observers look at similar points in a given image. One explanation is that our understanding of the scene controls the paths our eyes take - so called 'top-down' control. An alternative possibility is that the visual system uses low-level 'bottom-up' features, such as edges, contrast or boundaries, to determine where the eyes land [2-4]. Fixated locations have been shown to contain higher values of 'low-level' visual features than non-fixated ones [2,3,5]. Moreover, biologically-plausible, low-level computational saliency maps produce scanpaths similar to those traced by human eye movements [4]. However, there is controversy about the role of bottom-up versus top-down control of eye movements [6,7]. To test between these possibilities, we measured the eye movements of two patients with visual agnosia who are severely impaired at recognizing objects or scenes, and therefore diverge from healthy volunteers in their understanding of the scene. Despite this, we found that, when inspecting a picture, their eyes look at the same locations as healthy individuals for the first few fixations. Initial eye movements, during a recognition task, therefore, are not affected by an impaired explicit understanding of the scene.
Article
Psychophysical and physiological evidence indicates that the visual system of primates and humans has evolved a specialized processing focus moving across the visual scene. This study addresses the question of how simple networks of neuron-like elements can account for a variety of phenomena associated with this shift of selective visual attention. Specifically, we propose the following: (1) A number of elementary features, such as color, orientation, direction of movement, disparity etc. are represented in parallel in different topographical maps, called the early representation. (2) There exists a selective mapping from the early topographic representation into a more central non-topographic representation, such that at any instant the central representation contains the properties of only a single location in the visual scene, the selected location. We suggest that this mapping is the principal expression of early selective visual attention. One function of selective attention is to fuse information from different maps into one coherent whole. (3) Certain selection rules determine which locations will be mapped into the central representation. The major rule, using the conspicuity of locations in the early representation, is implemented using a so-called Winner-Take-All network. Inhibiting the selected location in this network causes an automatic shift towards the next most conspicuous location. Additional rules are proximity and similarity preferences. We discuss how these rules can be implemented in neuron-like networks and suggest a possible role for the extensive back-projection from the visual cortex to the LGN.
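
The winner-take-all selection with inhibition of return reduces to a short loop: take the maximum of the saliency map, record it, suppress its neighborhood, and repeat. The fixation count and inhibition radius below are illustrative.

```python
import numpy as np

def attention_scanpath(saliency, n_fixations=5, inhibition_radius=15):
    """saliency: 2-D array. Returns the first n_fixations (y, x) winners."""
    sal = saliency.astype(float).copy()
    ys, xs = np.mgrid[0:sal.shape[0], 0:sal.shape[1]]
    fixations = []
    for _ in range(n_fixations):
        y, x = np.unravel_index(np.argmax(sal), sal.shape)  # winner-take-all
        fixations.append((y, x))
        # Inhibition of return: zero out a disc around the winner.
        sal[(ys - y) ** 2 + (xs - x) ** 2 <= inhibition_radius ** 2] = 0
    return fixations
```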
Article
A new hypothesis about the role of focused attention is proposed. The feature-integration theory of attention suggests that attention must be directed serially to each stimulus in a display whenever conjunctions of more than one separable feature are needed to characterize or distinguish the possible objects presented. A number of predictions were tested in a variety of paradigms including visual search, texture segregation, identification and localization, and using both separable dimensions (shape and color) and local elements or parts of figures (lines, curves, etc. in letters) as the features to be integrated into complex wholes. The results were in general consistent with the hypothesis. They offer a new set of criteria for distinguishing separable from integral features and a new rationale for predicting which tasks will show attention limits and which will not.
Article
Five important trends have emerged from recent work on computational models of focal visual attention that emphasize the bottom-up, image-based control of attentional deployment. First, the perceptual saliency of stimuli critically depends on the surrounding context. Second, a unique 'saliency map' that topographically encodes for stimulus conspicuity over the visual scene has proved to be an efficient and plausible bottom-up control strategy. Third, inhibition of return, the process by which the currently attended location is prevented from being attended again, is a crucial element of attentional deployment. Fourth, attention and eye movements tightly interplay, posing computational challenges with respect to the coordinate system used to control attention. And last, scene understanding and object recognition strongly constrain the selection of attended locations. Insights from these five key areas provide a framework for a computational and neurobiological understanding of visual attention.
Article
We evaluate the applicability of a biologically-motivated algorithm to select visually-salient regions of interest in video streams for multiply-foveated video compression. Regions are selected based on a nonlinear integration of low-level visual cues, mimicking processing in primate occipital and posterior parietal cortex. A dynamic foveation filter then blurs every frame, increasingly with distance from salient locations. Sixty-three variants of the algorithm (varying the number and shape of virtual foveas, maximum blur, and saliency competition) are evaluated against an outdoor video scene, using MPEG-1 and constant-quality MPEG-4 (DivX) encoding. Additional compression ratios of 1.1 to 8.5 are achieved by foveation. Two variants of the algorithm are validated against eye fixations recorded from four to six human observers on a heterogeneous collection of 50 video clips (over 45,000 frames in total). Significantly higher overlap than expected by chance is found between human and algorithmic foveations. With both variants, foveated clips are, on average, approximately half the size of unfoveated clips, for both MPEG-1 and MPEG-4. These results suggest a general-purpose usefulness of the algorithm in improving compression ratios of unconstrained video.
Conference Paper
The use of interest points in content-based image retrieval allows the image index to represent local properties of the image. Classic corner detectors can be used for this purpose. However, they have drawbacks when applied to various natural images for image retrieval, because visual features need not be corners and corners may gather in small regions. We present a salient point detector that extracts points where variations occur in the image, whether they are corner-like or not. The detector is based on the wavelet transform to detect global variations as well as local ones. The wavelet-based salient points are evaluated for image retrieval with a retrieval system using texture features. In this experiment, our method provides better retrieval performance compared with other point detectors.
Achanta R, Hemami S, Estrada F, Susstrunk S. Frequency-tuned salient region detection. In: Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami Beach, Florida, USA: IEEE, 2009. 1597−1604
Achanta R, Susstrunk S. Saliency detection using maximum symmetric surround. In: Proceedings of the 2010 IEEE International Conference on Image Processing. Hong Kong, China: IEEE, 2010. 2653−2656
Jiang H Z, Wang J D, Yuan Z J, Liu T, Zheng N N. Automatic salient object segmentation based on context and shape prior. In: Proceedings of the 2011 British Machine Vision Conference. Dundee, Scotland, UK: BMVA Press, 2011. 1−12
Teuber H L. Physiological psychology. Annual Review of Psychology, 1955, 6: 267−296