Figure 3 - uploaded by Qi Ma
Selection results of five models in the test video. 


Source publication
Conference Paper
Full-text available
Salient areas in natural scenes are generally regarded as the candidates of attention focus in human eyes, which is the key stage in object detection. In computer vision, many models have been proposed to simulate the behavior of eyes, such as SaliencyToolBox (STB), Neuromorphic Vision Toolkit (NVT), etc., but they demand high computational c...

Context in source publication

Context 1
... use a video (988 images in total) captured at 15 f/s with a resolution of 640 × 480 to test the performance of each method. Fig. 3 shows the selection results and orders of the five methods in six frames. ...

Similar publications

Article
Recently, we proposed marginal space learning (MSL) as a generic approach for automatic detection of 3D anatomical structures in many medical imaging modalities. To accurately localize a 3D object, we need to estimate nine parameters (three for position, three for orientation, and three for anisotropic scaling). Instead of uniformly searching the o...
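The stage-wise search described above can be illustrated with a back-of-the-envelope count. The grid sizes and pruning threshold below are made-up placeholders, not values from the MSL paper; the point is only the order-of-magnitude gap between a joint 9-D scan and a marginal, stage-by-stage scan.

```python
# Hypothetical illustration of the search-space reduction behind marginal
# space learning (MSL): candidates are estimated stage by stage (position,
# then position-orientation, then the full similarity transform) instead
# of scanning all nine pose parameters jointly.

n_pos = 1000     # candidate positions after coarse scanning (assumed)
n_orient = 100   # orientation hypotheses per position candidate (assumed)
n_scale = 100    # anisotropic-scale hypotheses per pose candidate (assumed)
kept = 50        # candidates retained after each marginal stage (assumed)

# Exhaustive joint search: every combination is evaluated.
full_search = n_pos * n_orient * n_scale

# Marginal search: each stage prunes to `kept` candidates before the
# next parameter group is added.
marginal_search = n_pos + kept * n_orient + kept * n_scale

print(full_search, marginal_search)   # 10000000 11000
print(full_search // marginal_search) # 909
```

Under these toy numbers the marginal scheme evaluates roughly 900 times fewer hypotheses, which is the intuition behind searching the parameter groups sequentially rather than uniformly.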

Citations

... Looking at the available saliency models for stationary images, we have Itti's model [1], which is regarded as the most widely used model for stationary images. Other models include [2], which uses Fourier transformation along the lines of the phase spectrum, and [3], which uses frequency tuning for saliency detection. The commonality among the aforementioned models is the employment of the bottom-up visual attention mechanism. ...
... We shall use B_M ∈ ℝ^(3×bn) and F_M ∈ ℝ^(3×fn) to represent the background model and foreground appearance model, with bn and fn being the sizes of the background and foreground models, respectively; their job is to track the i-th superpixel's RGB (Red, Green, Blue) history in all regions. Then, we follow (1) and (2). ...
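A minimal sketch of such per-superpixel appearance models is shown below. The sliding-window update rule (drop the oldest column, append the newest RGB sample) is an assumption for illustration, not the cited paper's equations (1) and (2); the history sizes bn and fn are arbitrary.

```python
import numpy as np

# B_M and F_M store RGB history columns for the background and foreground
# appearance models of one superpixel; bn and fn are hypothetical sizes.
bn, fn = 5, 3
B_M = np.zeros((3, bn))   # background RGB history
F_M = np.zeros((3, fn))   # foreground RGB history

def update_history(model, rgb):
    """Shift the history one column left and append the newest RGB sample."""
    model[:, :-1] = model[:, 1:]
    model[:, -1] = rgb
    return model

B_M = update_history(B_M, np.array([120, 130, 140]))
F_M = update_history(F_M, np.array([200, 40, 40]))
```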
Article
Video saliency detection is a growing field with relatively few contributions to it. The general method available today is to conduct frame-wise saliency detection, and this leads to several complications, including an incoherent pixel-based saliency map, making it not very useful. This paper provides a novel solution to saliency detection and mapping with its custom spatio-temporal fusion method, which uses frame-wise overall motion-colour saliency along with pixel-based consistent spatio-temporal diffusion for temporal uniformity. The proposed method section discusses how the video is fragmented into groups of frames and each frame undergoes diffusion and integration in a temporal fashion for the colour saliency mapping to be computed. Then the inter-group frames are used to form the pixel-based saliency fusion, after which the features, that is, the fusion of pixel saliency and colour information, guide the diffusion of the spatio-temporal saliency. The result has been tested with 5 publicly available global saliency evaluation metrics, and it is concluded that the proposed algorithm performs better than several state-of-the-art saliency detection methods, with an increase in accuracy by a good margin. All the results display robustness, reliability, versatility and accuracy.
... It can give us a numerical value (see Figure 4) as well as one-dimensional and multi-dimensional outcomes for the low- and high-frequency components of our images (see Figures 5 & 6). The low-frequency components represent luminosity, and the higher components represent the lumen distribution, such as the standard deviation of the luminosity of an image and the three-dimensional edges and structure of the assessed image (see Guo et al., 2008). We do not necessarily need to compile code to acquire outcomes for these characteristics; open-source software is available for this task (see Figure 4; for open code and open-source software resources see https://osf.io/xuvtd/). ...
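The low/high-frequency decomposition mentioned above can be sketched with an FFT mask. This is a generic illustration, not the cited software: the cutoff radius is an arbitrary choice, and real tools typically use smoother filters than a hard circular mask.

```python
import numpy as np

def frequency_split(img, cutoff=8):
    """Split an image into low- and high-frequency components via an FFT mask."""
    F = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    dist = np.hypot(yy - h / 2, xx - w / 2)   # distance from the DC bin
    low_mask = dist <= cutoff                 # hard circular low-pass mask
    low = np.fft.ifft2(np.fft.ifftshift(F * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(F * ~low_mask)).real
    return low, high

rng = np.random.default_rng(0)
img = rng.random((64, 64))
low, high = frequency_split(img)
# By linearity of the FFT, the two components add back up to the original.
print(np.allclose(low + high, img))  # True
```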
Preprint
In this manuscript, we discourse on concepts, issues and resolutions that call for conscious awareness in research on the unconscious. We discuss subjects that contribute to our knowledge of the unconscious as a concept. We discuss seminal historical episodes and episodes of controversial experimentation that are formative rallying points for understanding the polarity of contemporary attitudes to the unconscious in contemporary psychological science. We acknowledge the impact of historical controversy on contemporary polarisation, and proceed to show how the latter has resulted in methodological scrutiny, topical concerns and contributing resolutions. We provide empirical illustrations concerning how purportedly established and currently debated topical issues bias contemporary experimental research, and we provide empirical illustrations of how we can resolve several biases in contemporary experimental research. We examine the issues that have been resolved, the issues that remain unresolved, and most critically whether and the extent to which we employ and apply our conceptual findings and empirical advances in this area. We argue for the importance of scholarly and scientific awareness for progressing to further topical developments.
... In fact, characterizing this uncertainty using the gradient of the image might be more appropriate. This is because the gradient represents only the absolute difference between adjacent pixels in an image, without reflecting its frequency content [16]. This implies that we can compute the sum of image variations from the differences between neighboring pixels using the image gradient, thereby avoiding the misleading effects of low-frequency signals. ...
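The idea above (summing absolute differences between neighboring pixels from the image gradient, an anisotropic total-variation measure) can be sketched directly:

```python
import numpy as np

def total_variation(img):
    """Sum of absolute differences between adjacent pixels along both axes."""
    dy = np.abs(np.diff(img, axis=0)).sum()   # vertical neighbor differences
    dx = np.abs(np.diff(img, axis=1)).sum()   # horizontal neighbor differences
    return dy + dx

flat = np.full((4, 4), 0.5)                   # smooth patch: no local variation
checker = np.indices((4, 4)).sum(axis=0) % 2  # checkerboard: maximal variation
print(total_variation(flat), total_variation(checker))  # 0.0 24
```

The smooth patch scores zero even though its mean intensity is nonzero, which is exactly why the gradient-based measure is not misled by low-frequency content.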
... However, it is infeasible to derive the closed-form expectation of diff_max by Equation (16). Then, we can give a complete induction to calculate B by enumerating all possible cases of I in the extreme value space, i.e., {0, 1}. ...
... Following this, the phase spectrum of the Fourier transform (PFT) was presented, which achieved nearly the same performance as the SR. Guo et al. [36] proposed a method called the phase spectrum of the quaternion Fourier transform (PQFT) to calculate spatiotemporal saliency maps, which are used to detect salient objects in both natural images and videos. They believed the phase spectrum is critical to the saliency map because the saliency map can be easily obtained when the amplitude spectrum of the image is set to any nonzero constant value. ...
Article
In the non-deep learning-based salient object detection methods known so far, the detection effect and robustness based on the background detection method are good. However, results are not desirable in small objects and complex scene images. This paper proposes a salient object detection algorithm, which employs a fusion framework to fuse background and frequency domain features to improve the accuracy of salient object detection. First, an improved background model is proposed for salient object detection to extract the background feature of the image. Simultaneously, the frequency domain features are obtained by the proposed frequency domain algorithm, which combines global information and local details by the Gaussian pyramid algorithm and different filters. Then, within our fusion framework, the fusion operations are guided by the self-attention mechanism to fuse background and multi-scale frequency domain features to obtain the self-attention maps. Finally, this paper introduces a fusion algorithm to derive the final saliency map from the self-attention maps. The results demonstrate that the proposed method consistently outperforms state-of-the-art approaches in four evaluation metrics on six challenging and complicated datasets and improves the accuracy of salient object detection in complex and small object scene images.
... For video saliency detection, the spatial-temporal information contained in the video frames is critical. Similar to image saliency detection, traditional video saliency detection methods capture saliency cues based on hand-crafted spatial-temporal features [46,47], but low-level hand-crafted features could not deliver satisfactory performance for modeling dynamic saliency. Recently, many deep learning-based models have been proposed that adopt different ways of acquiring temporal information. ...
Article
Saliency detection plays an important role in computer vision and scene understanding, which has attracted increasing attention in recent years. Compared to the widely studied image saliency prediction, there are still many problems to be solved in the area of video saliency. Different from images, effectively describing and utilizing the motion information contained in video data is a critical issue. In this paper, we propose a spatial and motion dual-stream framework for video saliency detection. The coarse motion features extracted from optical flow are fine-tuned with higher-level semantic spatial features via a residual cross-connection. A hierarchical fusion structure is proposed to maintain contextual information by integrating spatial and motion features in each level. To model the inter-frame correlation in the video, the convolutional gated recurrent unit (convGRU) is used to retain global consistency of the saliency area between neighbor frames. Experimental results on four widely used datasets demonstrate the effectiveness of the proposed method compared with other state-of-the-art methods. Our source codes can be acquired at https://github.com/banhuML/MFHF.
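A toy, single-channel convGRU step, sketching how such a unit carries a hidden state across neighboring frames, could look as follows. This is not the paper's implementation: real convGRUs use learned multi-channel kernels, and the 3×3 kernels here are random placeholders.

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
# Six random 3x3 kernels standing in for the learned input/recurrent weights.
Wz, Uz, Wr, Ur, Wh, Uh = (0.1 * rng.standard_normal((3, 3)) for _ in range(6))
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
conv = lambda a, k: convolve2d(a, k, mode="same")

def convgru_step(x, h):
    """One convGRU update: gates and candidate state are all convolutional."""
    z = sigmoid(conv(x, Wz) + conv(h, Uz))          # update gate
    r = sigmoid(conv(x, Wr) + conv(h, Ur))          # reset gate
    h_tilde = np.tanh(conv(x, Wh) + conv(r * h, Uh))  # candidate state
    return (1.0 - z) * h + z * h_tilde              # new hidden state

h = np.zeros((8, 8))           # hidden state, same spatial size as the frames
for _ in range(3):             # three consecutive "frames"
    frame = rng.random((8, 8))
    h = convgru_step(frame, h)
print(h.shape)  # (8, 8)
```

Because the hidden state keeps the spatial layout of the frame, the recurrence can smooth the saliency area across time without flattening it into a vector, which is the design motivation for convGRU over a plain GRU.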
... Itti's model [1] is one of the most researched and most prominent models for image saliency. Fourier transformation is used with the help of the phase spectrum in [2], and [3] addresses image saliency using frequency tuning. They have used the principles of inhibition of return and winner-take-all, which are inspired by the visual nervous system [4], [5]. ...
Article
There have been several studies in the field of image saliency but not as many in video saliency. Our proposed solution aims to increase precision and accuracy during compression and to reduce coding complexity, time consumption, and memory allocation problems. It is a modified High Efficiency Video Coding (HEVC) pixel-based consistent spatiotemporal diffusion with temporal uniformity. It involves taking apart the video into groups of frames, computing colour saliency, integrating temporal fusion, and conducting pixel saliency fusion; colour information then guides the diffusion process for the spatiotemporal mapping with the help of a permutation matrix. The proposed solution is tested on a publicly available extensive dataset with five global saliency evaluation metrics and is compared with several other state-of-the-art saliency detection methods. The results display an overall best performance amongst all candidates.
... Hou and Zhang [16] demonstrated a straightforward approach for generating the corresponding saliency map in the spatial domain by analyzing the log-spectrum of an input image. Guo [17] proposed a fast method that uses the spectral residual of the amplitude spectrum to build the saliency map. Schauerte and Stiefelhagen [18] first proposed employing Eigenaxes and Eigenangles for models of spectral saliency dependent on the Fourier transform. ...
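The log-spectrum analysis attributed to Hou and Zhang [16] can be sketched as follows: subtract a local average from the log amplitude spectrum, recombine the residual with the original phase, and invert. Details here (log1p instead of log to avoid log(0), 3×3 averaging window, Gaussian smoothing width) are illustrative choices, not the paper's exact constants.

```python
import numpy as np
from scipy.ndimage import uniform_filter, gaussian_filter

def spectral_residual_saliency(img):
    """Spectral-residual-style saliency map (sketch)."""
    F = np.fft.fft2(img)
    log_amp = np.log1p(np.abs(F))          # log amplitude spectrum
    phase = np.angle(F)                    # phase spectrum is kept as-is
    residual = log_amp - uniform_filter(log_amp, size=3)  # remove local trend
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    return gaussian_filter(sal, sigma=2)   # smooth the raw response

rng = np.random.default_rng(1)
img = np.zeros((64, 64))
img[20:30, 20:30] = 1.0                    # a block on a flat background
sal = spectral_residual_saliency(img + 0.01 * rng.random((64, 64)))
print(sal.shape)  # (64, 64)
```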
Article
Visual attention is one of the most significant characteristics for selecting and understanding the redundant outside world. The human vision system cannot process all information simultaneously due to the visual information bottleneck. In order to reduce the redundant input of visual information, the human visual system mainly focuses on dominant parts of scenes. This is commonly known as visual saliency map prediction. This paper proposes a new psychophysically oriented saliency prediction architecture inspired by the multi-channel model of visual cortex functioning in humans. The model consists of opponent color channels, wavelet transform, a wavelet energy map, and a contrast sensitivity function for extracting low-level image features and providing a maximum approximation to the low-level human visual system. The proposed model is evaluated using several datasets, including the MIT1003, MIT300, TORONTO, SID4VAM, and UCF Sports datasets. We also quantitatively and qualitatively compare the saliency prediction performance with that of other state-of-the-art models. Our model achieved stable and better performance with different metrics on natural images, psychophysical synthetic images and dynamic videos. Additionally, we suggest that Fourier- and spectral-inspired saliency prediction models outperform other state-of-the-art non-neural-network and even deep neural network models on psychophysical synthetic images. In the meantime, we suggest that deep neural networks need specific architectures and goals to be able to predict saliency on psychophysical synthetic images better and more reliably. Finally, the proposed model could be used as a computational model of the primate low-level vision system and help us understand the mechanism of the primate low-level vision system. The project page is available at: https://sinodanishspain.github.io/HVS_SaliencyModel/.
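The wavelet energy map stage mentioned in the abstract can be sketched with a one-level Haar decomposition in plain numpy. This omits the opponent color channels and the contrast sensitivity function, and the scaling of the Haar subbands is one common convention among several.

```python
import numpy as np

def haar_energy_map(img):
    """One-level 2x2 Haar decomposition; return the detail-subband energy."""
    a = img[0::2, 0::2]; b = img[0::2, 1::2]   # top-left / top-right pixels
    c = img[1::2, 0::2]; d = img[1::2, 1::2]   # bottom-left / bottom-right
    lh = (a - b + c - d) / 2.0                 # horizontal detail
    hl = (a + b - c - d) / 2.0                 # vertical detail
    hh = (a - b - c + d) / 2.0                 # diagonal detail
    return lh**2 + hl**2 + hh**2               # energy of the detail subbands

img = np.zeros((8, 8))
img[:, 3:] = 1.0                  # vertical edge inside the second 2x2 column
energy = haar_energy_map(img)
print(energy.shape)  # (4, 4)
```

The energy map responds only where local intensity changes, so the edge column lights up while the flat regions stay at zero, which is what makes wavelet energy a usable low-level saliency feature.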
... In recent years, there has been great scope in the development of methods for detecting saliency in the spatial and frequency domains. Wavelet transforms are used in some techniques to extract frequency-based features for detecting salient objects [14][15][16][17][18]. Wavelet transformation is based on multiscale analysis of an image, which considers both spatial and frequency-based details simultaneously. ...
Article
Today, salient object detection has caught the interest of numerous researchers for a variety of applications in computer vision. Most deep learning-based algorithms for SOD tasks produce excellent results but require a lot of data availability and large computational and structural complexities. Also, many methods provide excellent outcomes but are unable to preserve the complete boundaries of the objects in images. The paper discusses an innovative integration method for salient object detection of superpixel segmented images to address these issues. It deals with the integration of saliency maps generated by image decomposition based on non-sub-sampled contourlet transform (NSCT) and by machine learning technique based on the random forest regression using a parameter adaptive pulse coupled neural network (PA-PCNN). The PA-PCNN adaptively estimates each of the free parameters of the pulse-coupled neural network (PCNN). The utilization of PA-PCNN considers neighborhood pixel variances and aids in maintaining object details without fuzziness or distortions. The proposed method restores the edges and boundaries of the objects effectively as PA-PCNN aids in maintaining the perceptually similar attributes of the saliency maps. The results of this study are evaluated using three widely used datasets for detecting salient objects, which show the potential of the proposed system to precisely locate the salient objects in various imaging circumstances like complex background images, images with multiple objects, etc. The quantitative and qualitative experimental results validate a substantial advancement in various evaluation parameters for salient object detection with better boundary preservation of objects.
... Hull shapes, as a crucial parameter for wake detection, were directly extracted using a traditional coarse-to-fine method. Candidate hulls were quickly located through a visual saliency detection method called the phase spectrum of Fourier transform (PFT) (Guo, Ma, and Zhang 2008). A hull refining module was executed to generate accurate shapes of candidate hulls, and false alarms were eliminated according to shape features. ...
... Therefore, PFT was employed to obtain the saliency map given its high processing efficiency. Detailed steps are summarized below (Guo, Ma, and Zhang 2008): P(F) represents the phase spectrum of the transformed image F; g(x, y) is a 2D Gaussian filter; ‖·‖ denotes the modulus operation. Salient targets were extracted as candidate hulls through threshold segmentation. ...
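These PFT steps (keep only the phase spectrum, invert, take the squared modulus, smooth with a 2D Gaussian) can be sketched in a few lines. The smoothing width and the test image are illustrative choices, not values from the cited work.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def pft_saliency(img, sigma=3):
    """PFT-style saliency map: phase-only reconstruction, squared, smoothed."""
    F = np.fft.fft2(img)                      # F: transformed image
    phase = np.angle(F)                       # P(F): phase spectrum
    recon = np.fft.ifft2(np.exp(1j * phase))  # amplitude spectrum set to 1
    return gaussian_filter(np.abs(recon) ** 2, sigma=sigma)  # g(x, y) smoothing

img = np.zeros((64, 64))
img[10:20, 40:50] = 1.0                       # a bright patch as the "target"
sal = pft_saliency(img)
print(sal.shape)  # (64, 64)
```

Thresholding `sal` then yields the candidate regions, matching the threshold-segmentation step described above.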
Article
Satellite remote sensing provides a cost- and time-effective tool for ship monitoring at sea. Most existing approaches focused on extraction of ship locations using either hull or wake. In this paper, a method of cascaded detection of ship hull and wake was proposed to locate and classify ships using high-resolution satellite imagery. Candidate hulls were quickly located through the phase spectrum of the Fourier transform. A hull refining module was then executed to acquire accurate shapes of candidate hulls. False alarms were removed through the shape features and textures of candidate hulls. The probability that a candidate hull is determined as a real one increased with the presence of wakes. After true ships were determined, ship classification was conducted using a fuzzy classifier combining both hull and wake information. The proposed method was applied to Gaofen-1 panchromatic and multispectral (PMS) imagery and showed good performance for ship detection with recall, precision, overall accuracy, and specificity of 90.1%, 88.1%, 98.8%, and 99.3%, respectively, better than other state-of-the-art coarse-to-fine ship detection methods. Ship classification was successfully achieved for ships with detected wakes. The accuracy of correct classification was 83.8% while the proportion of false classification was 1.0%. Factors influencing the accuracy of the developed method, including texture features, classifier combinations, and key parameters of the method, were also discussed.
... Traditional methods to study users' visual attention have historically focused on images. However, recently more complex visual stimuli have been considered, such as videos [1][2][3][4][5], virtual reality environments [6,7], egocentric videos [8,9], and websites. Unfortunately, the literature on users' visual attention on websites is generally based on salience maps computed for static Web interfaces [10][11][12][13], where the website structure is always known in advance [14]. ...
Article
Understanding users’ visual attention on websites is paramount to enhance the browsing experience, such as providing emergent information or dynamically adapting Web interfaces. Existing approaches to accomplish these challenges are generally based on the computation of salience maps of static Web interfaces, while websites increasingly become more dynamic and interactive. This paper proposes a method and provides a proof-of-concept to predict user’s visual attention on specific regions of a website with dynamic components. This method predicts the regions of a user’s visual attention without requiring a constant recording of the current layout of the website, but rather by knowing the structure it presented in a past period. To address this challenge, the concept of visit intention is introduced in this paper, defined as the probability that a user, while browsing, will fixate their gaze on a specific region of the website in the next period. Our approach uses the gaze patterns of a population that browsed a specific website, captured via an eye-tracker device, to aid personalized prediction models built with individual visual kinetics features. We show experimentally that it is possible to conduct such a prediction through multilabel classification models using a small number of users, obtaining an average area under curve of 84.3%, and an average accuracy of 79%. Furthermore, the user’s visual kinetics features are consistently selected in every set of a cross-validation evaluation.