Conference Paper

Structural Saliency: The Detection of Globally Salient Structures Using a Locally Connected Network


Abstract

Not Available


... When someone looks at a scene, certain features attract his/her attention; these are called salient structures [130]. In computer vision systems, their detection is useful for performing high-level tasks such as object recognition or classification, 3D reconstruction, measuring, navigation, or decision-making processes in perception and action systems. ...
... When humans look at a scene, there are certain structures that immediately attract our attention; these are composed of features that are grouped together by our visual system. The process of grouping features arising from a common underlying cause is called perceptual organization [16], [130]. The Gestalt psychology theory provides a theoretical framework for grouping and extracting features that are perceptually salient. ...
... [130]. After this process is done, we use the DBSCAN algorithm for grouping the votes; thereafter we select the cluster with the highest perceptual saliency. Since this voting process is executed for each edge pixel, the output of this level is a set of symmetry axes, where each element of these sets has an associated saliency value. ...
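The vote-grouping step described in this excerpt can be illustrated with a small sketch: a minimal, stdlib-only DBSCAN over 2D symmetry votes, after which the cluster with the highest accumulated saliency is kept. The vote format and the `eps` and `min_pts` values are illustrative assumptions, not parameters from the cited work.

```python
def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns a cluster label per point (-1 = noise)."""
    labels = [None] * len(points)
    dist = lambda a, b: ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
    region = lambda i: [j for j in range(len(points))
                        if dist(points[i], points[j]) <= eps]
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = -1                   # provisionally noise
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster          # noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = region(j)
            if len(more) >= min_pts:         # core point: expand the cluster
                queue.extend(more)
        cluster += 1
    return labels

def most_salient_cluster(votes, eps=2.0, min_pts=2):
    """votes: list of ((x, y), saliency) symmetry-axis votes.
    Returns the label of the cluster with the highest total saliency."""
    labels = dbscan([p for p, _ in votes], eps, min_pts)
    totals = {}
    for lab, (_, s) in zip(labels, votes):
        if lab != -1:
            totals[lab] = totals.get(lab, 0.0) + s
    return max(totals, key=totals.get) if totals else None
```

Summing the per-vote saliency within each cluster, rather than counting votes, lets a few strong votes outweigh many weak ones, matching the "highest perceptual saliency" selection described above.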
... In [60], the concepts of link saliency and contour saliency are introduced, which are used to identify smooth closed contours bounding objects of unknown shape in real images. In [60], a method named stochastic completion fields [63,64] is adopted for the calculation of the transition probability. The method [63,64] models the Gestalt laws of proximity and continuity by the distribution of smooth curves traced by particles that move with constant speed in directions undergoing Brownian motion. ...
... The method is not claimed as a saliency measure, but it is easy and natural to use it to compute saliency. In addition, different saliency measures have been proposed [64][65][66][67][68]. Each of the measures is a function of a set of affinity values assigned to pairs of edges, incorporating the Gestalt principles of good continuation and proximity in some form. ...
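The particle model behind stochastic completion fields, as described above, can be sketched with a Monte Carlo toy: particles move at constant speed while their direction undergoes Brownian motion, and the fraction of particles passing near a target edge serves as an affinity/saliency value between two edges. All parameter values here (`sigma`, `decay`, `tol`, sample counts) are illustrative assumptions, not those of the cited works.

```python
import math
import random

def trace_particle(x, y, theta, steps, speed=1.0, sigma=0.15,
                   decay=0.02, rng=random):
    """Trace one particle whose heading diffuses (Brownian motion in
    direction). Returns the path, or None if the particle 'decays' first;
    decay models the preference for short (proximal) completions."""
    path = [(x, y)]
    for _ in range(steps):
        if rng.random() < decay:           # particle dies: penalizes long paths
            return None
        theta += rng.gauss(0.0, sigma)     # Brownian step in the direction
        x += speed * math.cos(theta)
        y += speed * math.sin(theta)
        path.append((x, y))
    return path

def completion_strength(src, src_dir, dst, n=2000, steps=30, tol=1.5, seed=0):
    """Monte Carlo estimate of how likely a smooth random curve leaving `src`
    with heading `src_dir` passes near `dst` -- a stand-in for the transition
    probability used as an affinity/saliency value between two edges."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        path = trace_particle(src[0], src[1], src_dir, steps, rng=rng)
        if path and any(math.dist(p, dst) <= tol for p in path):
            hits += 1
    return hits / n
```

As expected from the good-continuation law, a target collinear with the source heading receives a much higher strength than a target at 90 degrees off-axis.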
Article
Full-text available
Object contour plays an important role in fields such as semantic segmentation and image classification. However, the extraction of contours is a difficult task, especially when a contour is incomplete or unclosed. In this paper, the existing contour detection approaches are reviewed and roughly divided into three categories: pixel-based, edge-based, and region-based. In addition, since the traditional contour detection approaches have reached a high degree of sophistication and deep convolutional neural networks (DCNNs) perform well in image recognition, DCNN-based contour detection approaches are also covered in this paper. Moreover, the future development of contour detection is analyzed and predicted.
... In this paper, we propose an improved Saliency Network to automatically delineate the contour of the glottis from HSV data. Our approach is based on the original work on the Saliency Network introduced by Sha'ashua and Ullman [13], which is considered one of the top methods for extracting salient structures with gap completion. The saliency measure of the Saliency Network, referred to as the SU measure, is formulated from both local saliency and structural saliency. ...
... The Saliency Network aims at extracting salient contours from images and has been considered one of the top bottom-up methods of perceptual grouping. In the Saliency Network, image pixels, also called elements [13], are classified into active and virtual elements. The elements that lie on edges are referred to as active elements; otherwise they are referred to as virtual elements. ...
... If element i is active, ρ_i is set to a value smaller than or equal to 1, and if it is virtual, it is set smaller than the active value. In [13], ρ_i is set to 1 if element i is active, otherwise it is set to 0.7. C_i,j aims to measure the shape of the curve; it is inversely related to the total curvature of the curve and is defined as: ...
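A minimal sketch of the kind of recurrence the Saliency Network excerpts above describe: each element combines its local saliency σ_i with an attenuated (ρ_i) best continuation, where the coupling term rewards low curvature. The specific coupling `f_ij = exp(-|angle|)`, the iteration count, and the toy graph structure are illustrative assumptions, not the exact formulation of [13].

```python
import math

def saliency_network(n, active, neighbors, angles, iters=20):
    """Iterate a Sha'ashua-Ullman-style saliency recurrence.

    n         -- number of elements (edge pixels and gap elements)
    active    -- set of indices that lie on detected edges
    neighbors -- neighbors[i] = list of indices j that can extend element i
    angles    -- angles[(i, j)] = turning angle between elements i and j
    """
    # Local saliency sigma and attenuation rho: 1.0 for active elements,
    # 0.7 for virtual gap elements, following the values quoted above.
    sigma = [1.0 if i in active else 0.0 for i in range(n)]
    rho = [1.0 if i in active else 0.7 for i in range(n)]
    phi = sigma[:]                        # phi^(0) = local saliency
    for _ in range(iters):
        new_phi = []
        for i in range(n):
            best = 0.0
            for j in neighbors[i]:
                # Curvature coupling: smoother continuations score higher.
                f_ij = math.exp(-abs(angles[(i, j)]))
                best = max(best, f_ij * phi[j])
            new_phi.append(sigma[i] + rho[i] * best)
        phi = new_phi
    return phi
```

On a straight chain of active elements the saliency accumulates along the smooth continuation, so elements with a long smooth curve ahead of them score highest, which is the gap-completing behavior the excerpts describe.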
Article
Full-text available
In recent years, high-speed videoendoscopy (HSV) has significantly aided the diagnosis of voice pathologies and furthered the understanding of voice production. As the first step of these studies, automatic segmentation of glottal images still presents a major challenge for this technique. In this paper, we propose an improved Saliency Network that automatically delineates the contour of the glottis from HSV image sequences. Our proposed additional saliency measure, Motion Saliency (MS), improves upon the original Saliency Network by using the velocities of defined edges. In our results and analysis, we demonstrate the effectiveness of our approach and discuss its potential applications for computer-aided assessment of voice pathologies and understanding of voice production.
... The hieroglyph represented in Figure 1a was not properly detected because there are no perfect circles in the image. The curvature and structural salience method tries to extract the regions of special interest [17]. This method was tested on edges of cartouches for extracting the hieroglyphs, but the results were similar to applying a threshold to the edges, as can be seen in Figure 1g. ...
Article
Full-text available
This work presents a novel strategy to decipher fragments of Egyptian cartouches identifying the hieroglyphs of which they are composed. A cartouche is a drawing, usually inside an oval, that encloses a group of hieroglyphs representing the name of a monarch. Aiming to identify these drawings, the proposed method is based on several techniques frequently used in computer vision and consists of three main stages: first, a picture of the cartouche is taken as input and its contour is localized. In the second stage, each hieroglyph is individually extracted and identified. Finally, the cartouche is interpreted: the sequence of the hieroglyphs is established according to a previously generated benchmark. This sequence corresponds to the name of the king. Although this method was initially conceived to deal with both high and low relief writing in stone, it can be also applied to painted hieroglyphs. This approach is not affected by variable lighting conditions, or the intensity and the completeness of the objects. This proposal has been tested on images obtained from the Abydos King List and other Egyptian monuments and archaeological excavations. The promising results give new possibilities to recognize hieroglyphs, opening a new way to decipher longer texts and inscriptions, being particularly useful in museums and Egyptian environments. Additionally, devices used for acquiring visual information from cartouches (i.e., smartphones), can be part of a navigation system for museums where users are located in indoor environments by means of the combination of WiFi Positioning Systems (WPS) and depth cameras, as unveiled at the end of the document.
... Knowledge of object shape: In contrast to contributions like [35], it is assumed that the objects to be found are of known shapes whose models are given to the algorithm a priori. The category of algorithms like the one in [35] does not assume any knowledge about the shape and tries to extract salient structure from the image, hoping to retrieve the main object(s) in that image. This is beyond the scope of this work. ...
Article
We introduce a simple and effective concept for localizing objects in densely cluttered edge images based on shape information. The shape information is characterized by a binary template of the object's contour, provided to search for object instances in the image. We adopt a segment-based search strategy, in which the template is divided into a set of segments. In this work, we propose our own segment representation that we call One-Pixel Segment (OPS), in which each pixel in the template is treated as a separate segment. This is done to achieve high flexibility that is required to account for intra-class variations. OPS representation can also handle scale changes effectively. A dynamic programming algorithm uses the OPS representation to realize the search process, enabling a detailed localization of the object boundaries in the image. The concept's simplicity is reflected in the ease of implementation, as the paper's title suggests. The algorithm works directly with very noisy edge images extracted using the Canny edge detector, without the need for any preprocessing or learning steps. We present our experiments and show that our results outperform those of very powerful, state-of-the-art algorithms.
... The major advantage of the method is its capability of integrating multiple constraints (or even high-level knowledge) globally with an explicit mathematical model, by which the deformation of the contour from the initial state to the final result can be implemented by energy minimization. As perceptual organization is an old issue in computer vision, energy-based and similar methods have been explored (Shaashua and Ullman, 1988; McCafferty, 1990; Williams and Jacobs, 1995; Guy and Medioni, 1996). Cohen and Deschamps (2001) developed a method for finding a set of contours as minimal paths between end points using the fast marching algorithm. ...
... The strategy developed in this paper combines perceptual grouping and energy minimization for linking road primitives. Global constraints introduced by the energy method improve the reliability of grouping (Shaashua, A., and S. Ullman, 1988). In each iteration step, carrying out grouping at the most salient primitive keeps the computational cost affordable, as only the limited set of primitives likely to be linked to it is considered. ...
Article
Roads are one of the core components of urban infrastructure and an important information layer for GIS. One of the key problems in automatic road extraction from remotely sensed images is to link the detected primitives (usually the fragmental line segments) to the road lines correctly and completely. In this paper we present a perceptual grouping method which is based on energy minimization. The method is an iterative process. By iteration, the primitives are gradually grouped into long co-curvilinear road lines. The road lines are the optimal paths satisfying the criterion of energy minimization. An incremental search and a route evaluation algorithm are used for energy minimization and optimal route selection respectively. We applied the technique in highway extraction from high resolution imagery where the lane markings can serve as the primary cues for the existence of roads. The experimental results demonstrate that our approach can group the fragmental primitives into highway track lines reliably. The method implemented is a flexible framework for automatic road extraction, and will be useful for other automatic road extraction scenarios.
... Early approaches detected shapes in images using perceptual saliency criteria to group image edgels [108,109,110]. An iterative optimization method to group them based on local curvature and curvature variation was proposed in [110]. Similar ideas were explored in [108,109] using measures such as co-curvilinearity and co-circularity for perceptual grouping. ...
Article
Object detection is a significant challenge in Computer Vision and has received a lot of attention in the field. One such challenge addressed in this thesis is the detection of polygonal objects, which are prevalent in man-made environments. Shape analysis is an important cue to detect these objects. We propose a contour-based object detection framework to deal with the related challenges, including how to efficiently detect polygonal shapes and how to exploit them for object detection. First, we propose an efficient component tree segmentation framework for stable region extraction and a multi-resolution line segment detection algorithm, which form the bases of our detection framework. Our component tree segmentation algorithm explores the optimal threshold for each branch of the component tree, and achieves a significant improvement over image thresholding segmentation, and comparable performance to more sophisticated methods but only at a fraction of computation time. Our line segment detector overcomes several inherent limitations of the Hough transform, and achieves a comparable performance to the state-of-the-art line segment detectors. However, our approach can better capture dominant structures and is more stable against low-quality imaging conditions. Second, we propose a global shape analysis measurement for simple polygon detection and use it to develop an approach for real-time landing site detection in unconstrained man-made environments. Since the task of detecting landing sites must be performed in a few seconds or less, existing methods are often limited to simple local intensity and edge variation cues. By contrast, we show how to efficiently take into account the potential sites’ global shape, which is a critical cue in man-made scenes. Our method relies on component tree segmentation algorithm and a new shape regularity measure to look for polygonal regions in video sequences. 
In this way we enforce both temporal consistency and geometric regularity, resulting in reliable and consistent detections. Third, we propose a generic contour grouping based object detection approach by exploring promising cycles in a line fragment graph. Previous contour-based methods are limited to use additive scoring functions. In this thesis, we propose an approximate search approach that eliminates this restriction. Given a weighted line fragment graph, we prune its cycle space by removing cycles containing weak nodes or weak edges, until the upper bound of the cycle space is less than the threshold defined by the cyclomatic number. Object contours are then detected as maximally scoring elementary circuits in the pruned cycle space. Furthermore, we propose another more efficient algorithm, which reconstructs the graph by grouping the strongest edges iteratively until the number of the cycles reaches the upper bound. Our approximate search approaches can be used with any cycle scoring function. Moreover, unlike other contour grouping based approaches, our approach does not rely on a greedy strategy for finding multiple candidates and is capable of finding multiple candidates sharing common line fragments. We demonstrate that our approach significantly outperforms the state-of-the-art.
... Early contour grouping models are different versions of the local search. One of the first attempts to collect edges using the recurrent optimization method is [54]. An early work [55] applies the Monte Carlo particle filter to contour tracking. ...
Article
Full-text available
Segmentation of tumors in the ultrasound (US) images of the breast is a critical problem in medical imaging. Due to the poor quality of the US images and varying specifications of the US machines, the segmentation and classification of the abnormalities present difficulties even for trained radiologists. Nevertheless, the US remains one of the most reliable and inexpensive tests. Recently, an artificial life (ALife) model based on tracing agents and fusion of the US and the elasticity images (F-ALife) has been proposed and analyzed. Under certain conditions, F-ALife outperforms the state of the art, including selected deep learning (DL) models, deformable models, machine learning, contour grouping, and superpixels. Apart from the improved accuracy, F-ALife requires smaller training sets. The strongest competitors of F-ALife are hybrids of the DL with conventional models. However, the current DL methods require a large amount of data (thousands of annotated images), which often is not available. Moreover, the hybrids require that the conventional model is properly integrated into the DL. Therefore, we offer a new DL-based hybrid with ALife. It is characterized by a high accuracy, requires a relatively small dataset, and is capable of handling previously unseen data. The new ideas include (1) a special image mask to guide ALife, generated using DL and the distance transform; (2) a modification of ALife for segmentation of the US images providing a high accuracy (these ideas are motivated by the “vehicles” of Braitenberg (Vehicles, experiments in synthetic psychology, MIT Press, Cambridge, 1984) and the ALife proposed in Karunanayake et al. (Pattern Recognit 108838, 2022)); (3) a two-level genetic algorithm which includes training by an individual image and by the entire set of images. The training employs an original categorization of the images based on the properties of the edge maps. The efficiency of the algorithm is demonstrated on complex tumors.
The method combines the strengths of the DL neural networks with the speed and interpretability of ALife. The tests based on the characteristics of the edge map and complexity of the tumor shape show the advantages of the proposed DL-ALife. The model outperforms 14 state-of-the-art algorithms applied to the US images characterized by a complex geometry. Finally, the novel classification allows us to test and analyze the limitations of the DL for the processing of the unseen data. The code is applicable to breast cancer diagnostics (Automated Breast Ultra Sound), US-guided biopsies as well as to projects related to automatic breast scanners. A video demo is at https://tinyurl.com/3xthedff.
... We remark that the evaluation of results is here only qualitative due to the lack of ground-truth images. Recalling reference works in imaging and vision such as [60,61], the minimal property that should be guaranteed by any inpainting method is the so-called good connection property, i.e. the ability of connecting separated pieces of a curve (here, image level lines) in a coherent way. The approaches considered do satisfy this minimal property at least whenever the inpainting domain is sufficiently small. (Fig. 16: text inpainting comparison on a detail from the Venanson chapel.) ...
Article
Full-text available
The unprecedented success of image reconstruction approaches based on deep neural networks has revolutionised both the processing and the analysis paradigms in several applied disciplines. In the field of digital humanities, the task of digital reconstruction of ancient frescoes is particularly challenging due to the scarce amount of available training data caused by ageing, wear, tear and retouching over time. To overcome these difficulties, we consider the Deep Image Prior (DIP) inpainting approach which computes appropriate reconstructions by relying on the progressive updating of an untrained convolutional neural network so as to match the reliable piece of information in the image at hand while promoting regularisation elsewhere. In comparison with state-of-the-art approaches (based on variational/PDEs and patch-based methods), DIP-based inpainting reduces artefacts and better adapts to contextual/non-local information, thus providing a valuable and effective tool for art historians. As a case study, we apply such approach to reconstruct missing image contents in a dataset of highly damaged digital images of medieval paintings located into several chapels in the Mediterranean Alpine Arc and provide a detailed description on how visible and invisible (e.g., infrared) information can be integrated for identifying and reconstructing damaged image regions.
... Subsequently efforts are made to improve results by a global linking process that seeks to exploit curvilinear continuity. Examples include dynamic programming [6], relaxation approaches [7], saliency networks [8], and stochastic completion [9]. A third dimension on which different segmentation schemes can be compared is the class of images for which they are applicable. ...
Article
Full-text available
A method for texture image segmentation based on classification of contour elements and logical addition of classes is presented. The essence of the method consists in contouring the image; determining the position of contour elements of different types (points, lines, and shapes) in the image; converting closely spaced similar contour elements into binary areal objects; and binary-coding the mutual position of the obtained areal objects within the boundaries of the image, resulting in a code matrix. Index Terms: texture image segmentation, classification of contour elements.
... Edge-based approaches, as the name implies, use edge detection and edge linking to segment an object. The linking of edges is usually based on psychological and gestalt properties (Shashua & Ullman, 1988). A contour-based approach would frequently include an active contour paradigm (Caselles et al., 1997). ...
Article
Full-text available
We developed a top-down and bottom-up segmentation of objects using shape contours through a two-stage procedure. First, the object was identified using an edge-based contour feature, and then the object contour was obtained using a constraint optimization procedure based on the results from the earlier identified contours. The initial object detection provides object-category-specific information for the contour completion to be effected. We argue that the top-down/bottom-up interaction architecture has plausible neurological correlates. This method has the advantage that it does not require learning boundaries with large datasets.
... There are many applications that use visual interest or salient objects, such as automatic image cropping [24], adaptive image display on small devices [7], image/video compression, advertising design [16], and image collection browsing [23]. Various approaches to detecting the salient areas of objects have been developed, including structural salient object detection [25], A Model of Saliency-Based Visual Attention for Rapid Scene Analysis [17], Saliency Based on Information Maximization [4], graph-based visual saliency [12], a spectral residual approach to saliency detection [15], the multi-task saliency pursuit method [20], frequency-tuned salient region detection [1], highlighting sparse salient regions [14], adaptive clustering saliency [5], outlier detection saliency [6], soft image abstraction salient detection [9], and several optimizations such as Robust Background Detection Optimization [26] and global contrast based optimization [8]. ...
... The role that contextual modulation plays in cortical function remains an open question. Some consider such interactions to be directly involved in image processing, such as the detection and enhancement of smooth, spatially extended contours [22][23][24][25][26][27][28][29][30][31][32][33][34][35][36][37] . Others argue that the fundamental goal of contextual modulation is to generate a sparse, efficient representation of natural images 6,[38][39][40][41][42][43][44][45] . ...
Article
Full-text available
The normalization model provides an elegant account of contextual modulation in individual neurons of primary visual cortex. Understanding the implications of normalization at the population level is hindered by the heterogeneity of cortical neurons, which differ in the composition of their normalization pools and semi-saturation constants. Here we introduce a geometric approach to investigate contextual modulation in neural populations and study how the representation of stimulus orientation is transformed by the presence of a mask. We find that population responses can be embedded in a low-dimensional space and that an affine transform can account for the effects of masking. The geometric analysis further reveals a link between changes in discriminability and bias induced by the mask. We propose the geometric approach can yield new insights into the image processing computations taking place in early visual cortex at the population level while coping with the heterogeneity of single cell behavior.
... Many high-profile models of human visual information processing, appearing over the past 3 decades, include some variant of early selection within a feedforward visual processing stream [36], [37], [38], [39], [40], [41], [42], [43], [44], [45], [46]. These not only claim biological inspiration but also biological realism. ...
Article
Full-text available
The current dominant visual processing paradigm in both human and machine research is the feedforward, layered hierarchy of neural-like processing elements. Within this paradigm, visual saliency is seen by many to have a specific role, namely that of early selection. Early selection is thought to enable very fast visual performance by limiting processing to only the most salient candidate portions of an image. This strategy has led to a plethora of saliency algorithms that have indeed improved processing time efficiency in machine algorithms, which in turn have strengthened the suggestion that human vision also employs a similar early selection strategy. However, at least one set of critical tests of this idea has never been performed with respect to the role of early selection in human vision. How would the best of the current saliency models perform on the stimuli used by experimentalists who first provided evidence for this visual processing paradigm? Would the algorithms really provide correct candidate sub-images to enable fast categorization on those same images? Do humans really need this early selection for their impressive performance? Here, we report on a new series of tests of these questions whose results suggest that it is quite unlikely that such an early selection process has any role in human rapid visual categorization.
... Shashua detected the global saliency region of 2D images from 3D structure [65]. Ullman analyzed 2D ...
Article
Full-text available
The research on 3D scene viewpoints has been a frontier problem in computer graphics and virtual reality technology. In pioneering studies, it has been extensively used in virtual scene understanding, image-based modeling, and visualization computing. With the development of computer graphics and human-computer interaction, viewpoint evaluation becomes more significant for the comprehensive understanding of complex scenes. High-quality viewpoints can navigate observers to the region of interest, help subjects seek the hidden relations of a hierarchical structure, and improve the efficiency of virtual exploration. These studies later contributed to research such as robot vision, dynamic scene planning, virtual driving, and artificial intelligence navigation. The introduction of visual perception contributed to the inspiration of viewpoint research, and its combination with machine learning made significant progress in viewpoint selection. Viewpoint research has also been significant in the optimization of global lighting, visualization calculation, 3D supervised rendering, and reconstruction of virtual scenes. Additionally, it has huge potential in novel fields such as 3D model retrieval, virtual tactile analysis, human visual perception research, salient point calculation, ray tracing optimization, molecular visualization, and intelligent scene computing. Keywords: Viewpoint, Three-dimensional scene, Visual perception, Mesh saliency, Curvature
... Many high-profile models of human visual information processing, appearing over the past 3 decades, include some variant of early selection within a feedforward visual processing stream, including Sha'ashua & Ullman (1988), Olshausen et al. (1993), Itti & Koch (2001), Walther et al. (2002), Z. Li (2002, 2014), Deco & Rolls (2004), Itti (2005), Chikkerur et al. (2010), Zhang et al. (2011), Buschman & Kastner (2015), and more. These not only claim biological inspiration but also biological realism. ...
Preprint
Full-text available
The current dominant visual processing paradigm in both human and machine research is the feedforward, layered hierarchy of neural-like processing elements. Within this paradigm, visual saliency is seen by many to have a specific role, namely that of early selection. Early selection is thought to enable very fast visual performance by limiting processing to only the most relevant candidate portions of an image. Though this strategy has indeed led to improved processing time efficiency in machine algorithms, at least one set of critical tests of this idea has never been performed with respect to the role of early selection in human vision. How would the best of the current saliency models perform on the stimuli used by experimentalists who first provided evidence for this visual processing paradigm? Would the algorithms really provide correct candidate sub-images to enable fast categorization on those same images? Here, we report on a new series of tests of these questions whose results suggest that it is quite unlikely that such an early selection process has any role in human rapid visual categorization.
... This rule will generally lead to unordered sets of edges that do not respect the 1D nature of contours. An alternative is to measure the grouping likelihoods only between successive pairs of edges on the contour (Elder & Goldberg 2002) and invoke a Markov assumption: The likelihood of the contour is given by the product of the likelihoods of each local pairwise association between adjacent edges (Zucker et al. 1977, Sha'ashua & Ullman 1988, Elder & Zucker 1996a, Williams & Jacobs 1997, Movahedi & Elder 2013, Almazen et al. 2017). This assumption respects the 1D nature of the contour but greatly simplifies the probabilistic model: The local pairwise grouping probabilities are now sufficient statistics for computing maximum probability contours. ...
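The Markov assumption described in this excerpt — contour likelihood as a product of local pairwise grouping probabilities between adjacent edges — can be sketched as follows, working in log space and searching a tiny successor graph exhaustively. The graph, probability values, and helper names are illustrative assumptions, not taken from the cited works.

```python
import math

def contour_log_likelihood(edges, pair_ll):
    """Markov contour model: the log-likelihood of an ordered edge sequence
    is the sum of local pairwise log-probabilities (i.e. their product in
    probability space). pair_ll[(a, b)] = log P(b follows a)."""
    return sum(pair_ll[(a, b)] for a, b in zip(edges, edges[1:]))

def best_contour(start, successors, pair_ll, length):
    """Exhaustive search for the maximum-likelihood simple contour of a given
    number of edges -- fine for tiny graphs; real systems would use dynamic
    programming or shortest-path machinery over the same pairwise terms."""
    best_seq, best_ll = None, -math.inf

    def extend(seq, ll):
        nonlocal best_seq, best_ll
        if len(seq) == length:
            if ll > best_ll:
                best_seq, best_ll = seq, ll
            return
        for nxt in successors.get(seq[-1], []):
            if nxt not in seq:               # keep the contour simple
                extend(seq + [nxt], ll + pair_ll[(seq[-1], nxt)])

    extend([start], 0.0)
    return best_seq, best_ll
```

Because the pairwise terms are sufficient statistics under the Markov assumption, the search never needs to score a contour as a whole: it only accumulates local terms, which is exactly the simplification the excerpt highlights.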
Article
The human visual system reliably extracts shape information from complex natural scenes in spite of noise and fragmentation caused by clutter and occlusions. A fast, feedforward sweep through ventral stream involving mechanisms tuned for orientation, curvature, and local Gestalt principles produces partial shape representations sufficient for simpler discriminative tasks. More complete shape representations may involve recurrent processes that integrate local and global cues. While feedforward discriminative deep neural network models currently produce the best predictions of object selectivity in higher areas of the object pathway, a generative model may be required to account for all aspects of shape perception. Research suggests that a successful model will account for our acute sensitivity to four key perceptual dimensions of shape: topology, symmetry, composition, and deformation.
... Displays of this kind also allowed us to manipulate the constituent elements of the display without changing the global shape percept. Displays based on groupings of dots have been used by other investigators to explore a variety of grouping and detection phenomena (Lezama, 2015; Pizlo, Salach-Golyska, & Rosenfeld, 1997; Sha'ashua & Ullman, 1988; Smits & Vos, 1987; Uttal, 1973). ...
Article
Full-text available
The ability to form shape representations from visual input is crucial to perception, thought, and action. Perceived shape is abstract, as evidenced when we can see a contour specified only by discrete dots, when a cloud appears to resemble a fish, or when we match shapes across transformations of scale and orientation. Surprisingly little is known about the formation of abstract shape representations in biological vision. We report experiments that demonstrate the existence of abstract shape representations in visual perception and identify the time course of their formation. In Experiment 1, we varied stimulus exposure time in a task that required abstract shape and found that it emerges about 100 ms after stimulus onset. The results also showed that abstract shape representations are invariant across certain transformations and that they can be recovered from spatially separated dots. Experiment 2 found that encoding of basic visual features, such as dot locations, occurs during the first 30 ms after stimulus onset, indicating that shape representations require processing time beyond that needed to extract spatial features. Experiment 3 used a convergent method to confirm the timing and importance of abstract shape representations. Given sufficient time, shape representations form automatically and obligatorily, affecting performance even in a task in which neither instructions nor accurate responding involved shape. These results provide evidence for the existence, emergence, and functional importance of abstract shape representations in visual perception. We contrast these results with “deep learning” systems and with proposals that deny the importance of abstract representations in human perception and cognition.
... Humans can easily identify the salient (or relevant) parts of an image mainly due to the attention mechanism of the human visual system. "Visual saliency", coined by Ullman and Sha'ashua [6] was extended by Itti et al. [7] towards the development of a computational architecture. Computational models of saliency take images as input and generate a topographical map of how salient or attention grabbing each area of the image can be to a human observer. ...
Article
Full-text available
Medical image processing has become a major player in the world of automatic tumour region detection, and underpins the incipient stages of computer-aided detection. Saliency detection is a crucial application of medical image processing, and serves as a potential aid to medical practitioners by making the affected area stand out in the foreground from the rest of the background image. The algorithm developed here is a new approach to the detection of saliency in a three-dimensional multi-channel MR image sequence for glioblastoma multiforme (a form of malignant brain tumour). First we enhance the three channels, FLAIR (Fluid Attenuated Inversion Recovery), T2 and T1C (contrast enhanced with gadolinium), to generate a pseudo-coloured RGB image. This is then converted to the CIE L*a*b* color space. Processing on cubes of sizes k = 4, 8, 16, the L*a*b* 3D image is then compressed into volumetric units, each representing the neighbourhood information of the surrounding 64 voxels for k = 4, 512 voxels for k = 8 and 4096 voxels for k = 16, respectively. The spatial distances of these voxels are then compared along the three major axes to generate the novel 3D saliency map of a 3D image, which unambiguously highlights the tumour region. The algorithm operates along the three major axes to maximise computation efficiency while minimising loss of valuable 3D information. Thus the 3D multi-channel MR image saliency detection algorithm is useful in generating a uniform and logistically correct 3D saliency map with pragmatic applicability in Computer Aided Detection (CADe). Assignment of uniform importance to all three axes proves to be an important factor in volumetric processing, which helps in noise reduction and reduces the possibility of compromising essential information. The effectiveness of the algorithm was evaluated over the BRATS MICCAI 2015 dataset of 274 glioma cases, consisting of both high-grade and low-grade GBM.
The results were compared with those of the 2D saliency detection algorithm taken over the entire sequence of brain data. For all comparisons, the Area Under the receiver operator characteristic (ROC) Curve (AUC) has been found to be more than 0.99 ± 0.01 over various tumour types, structures and locations.
... Perceptual organization is the process of grouping features arising from a common underlying cause [1], [2]. From a computer science perspective, a set of tokens that share a certain property is the support of a structure. ...
... The main contribution of this paper is to promote Gestalt laws as a means for embedding certain computer vision tasks into new or existing computer games or game concepts. The idea of using these laws as a connecting element arises from the fact that Gestalt laws play an important role not only in game design (Betts 2011), but also in computer vision (see Poggio et al. 1985; Sha'ashua and Ullman 1988). One example of Gestalt laws being actively used for game design is Geometry Wars: Retro Evolved (Bizarre Creations 2005). ...
Article
Full-text available
Crowdsourcing, the process of gathering financial support or services from an online community, has become extremely popular throughout the last decade. Playsourcing is a particular form of crowdsourcing that requires embedding human intelligence-based tasks in computer games. We present a general framework for embedding tasks such as edge detection into computer games. Our proposed framework is based on Gestalt principles, and is a seamless and invisible means of embedding into existing computer games, which could make such tasks by the player more popular. A case study is presented in order to demonstrate the general principle as well as the potential of this approach.
... We briefly summarize some common approaches to object segmentation; they can generally be grouped into three categories, namely edge-based [1], contour-based and region-based. Edge-based methods are generally built on edge detection and edge linking based on psychological and Gestalt properties [2]. Common contour-based methods are based on the active contour paradigm [3]. ...
... This grouping process is enhanced if the elements have orientations that align with the path (Field, Hayes, & Hess, 1993; Hess & Field, 1999; Kovács, 1996; Li & Gilbert, 2002). Further, knowledge of the global form of the path contributes to local integration, such as the form's closure (Kovács, 1996; Kovács & Julesz, 1993) and smoothness (Pettet, 1999; Pettet, McKee, & Grzywacz, 1998), and the global knowledge is often necessary to disambiguate competing local groupings in cluttered scenes (Ullman & Sha'ashua, 1988). ...
Article
Full-text available
It has been shown that early visual areas are involved in contour processing. However, it is not clear how local and global context interact to influence responses in those areas, nor has the interarea coordination that yields coherent structural percepts been fully studied, especially in human observers. In this study, we used functional magnetic resonance imaging (fMRI) to measure activity in early visual cortex while observers performed a contour detection task in which alignment of Gabor elements and background clutter were manipulated. Six regions of interest (two regions, containing either the cortex representing the target or the background clutter, in each of areas V1, V2, and V3) were predefined using separate target versus background functional localizer scans. The first analysis using a general linear model showed that in the presence of background clutter, responses in V1 and V2 target regions of interest were significantly stronger to aligned than unaligned contours, whereas when background clutter was absent, no significant difference was observed. The second analysis using interarea correlations showed that with background clutter, there was an increase in V1-V2 coordination within the target regions when perceiving aligned versus unaligned contours; without clutter, however, correlations between V1 and V2 were similar no matter whether aligned contours were present or not. Both the average response magnitude and the connectivity analysis suggest different mechanisms support contour processing with or without background distractors. Coordination between V1 and V2 may play a major role in coherent structure perception, especially with complex scene organization.
... Computational models. Numerous computer vision algorithms for grouping make use of the Gestalt factors described above (e.g., Elder et al., 2003; Estrada & Elder, 2006; Jacobs, 1996; Lowe, 1985; Sha'ashua & Ullman, 1988; Stahl & Wang, 2008). Most of these use the local Gestalt principles of proximity, good continuation, and similarity to group long chains under an explicit or implicit Markov assumption, while additional global factors of convexity (Jacobs, 1996), closure (Elder et al., 2003), and symmetry (Stahl & Wang, 2008) have been employed to further condition the search. ...
... Model of Sha'ashua and Ullman [SU88]. Also known as the structural saliency model; although it is not related to any early-vision model, it is related to visual search. ...
... Global methods utilize local measurements and embed them into a framework which minimizes a global cost over all disjoint pairs of patches. Early methods in this line of work include those of Shashua and Ullman [18] and Elder and Zucker [19]. The paper of Shashua and Ullman used a simple dynamic programming approach to compute closed, smooth contours from local, disjoint edge fragments. ...
Article
Detecting boundaries between semantically meaningful objects in visual scenes is an important component of many vision algorithms. In this paper, we propose a novel method for detecting such boundaries based on a simple underlying principle: pixels belonging to the same object exhibit higher statistical dependencies than pixels belonging to different objects. We show how to derive an affinity measure based on this principle using pointwise mutual information, and we show that this measure is indeed a good predictor of whether or not two pixels reside on the same object. Using this affinity with spectral clustering, we can find object boundaries in the image – achieving state-of-the-art results on the BSDS500 dataset. Our method produces pixel-level accurate boundaries while requiring minimal feature engineering.
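The affinity principle in that abstract can be sketched with pointwise mutual information over a toy co-occurrence table; the real method estimates densities over colour features of neighbouring pixels, so the discrete `pmi` helper below is an illustrative simplification:

```python
import math
from collections import Counter

def pmi(pair_counts, total_pairs):
    """Pointwise mutual information for each (a, b) pair of feature values:
    PMI(a, b) = log[ p(a, b) / (p(a) p(b)) ].  High PMI means the two
    values co-occur more often than chance, suggesting the two pixels lie
    on the same object."""
    marg = Counter()
    for (a, b), c in pair_counts.items():
        marg[a] += c
        marg[b] += c
    n = sum(marg.values())  # each pair contributes two marginal observations
    return {
        (a, b): math.log((c / total_pairs) / ((marg[a] / n) * (marg[b] / n)))
        for (a, b), c in pair_counts.items()
    }

# Toy co-occurrence table of neighbouring pixel intensities: 'dark' pixels
# mostly sit next to other 'dark' pixels, so ('dark', 'dark') scores high
# and the rare cross-boundary pair ('dark', 'light') scores low.
counts = {('dark', 'dark'): 90, ('dark', 'light'): 10,
          ('light', 'light'): 90, ('light', 'dark'): 10}
scores = pmi(counts, sum(counts.values()))
assert scores[('dark', 'dark')] > scores[('dark', 'light')]
```

An affinity matrix built from such scores can then be fed to spectral clustering, with low-affinity pixel pairs marking likely object boundaries.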
... The probabilistic expression of this model has been supported by studies of the ecological statistics of contour grouping, which have also focused principally upon first-order cues [7,15,21,35]. Similarly, many computer vision algorithms for contour grouping have employed a Markov assumption and have focused on first-order cues [3,4,8,11,18,24,28,33,41,43]. However, these first-order Markov algorithms have generally not performed well unless augmented by additional problem-domain knowledge [8] or user interaction [5]. ...
Chapter
Humans are very good at rapidly detecting salient objects such as animals in complex natural scenes, and recent psychophysical results suggest that the fastest mechanisms underlying animal detection use contour shape as a principal discriminative cue. How does our visual system extract these contours so rapidly and reliably? While the prevailing computational model represents contours as Markov chains that use only first-order local cues to grouping, computer vision algorithms based on this model fall well below human levels of performance. Here we explore the possibility that the human visual system exploits higher-order shape regularities in order to segment object contours from cluttered scenes. In particular, we consider a recurrent architecture in which higher areas of the object pathway generate shape hypotheses that condition grouping processes in early visual areas. Such a generative model could help to guide local bottom-up grouping mechanisms toward globally consistent solutions. In constructing an appropriate theoretical framework for recurrent shape processing, a central issue is to ensure that shape topology remains invariant under all actions of the feedforward and feedback processes. This can be achieved by a promising new theory of shape representation based upon a family of local image deformations called formlets, shown to outperform alternative contour-based generative shape models on the important problem of visual shape completion.
... Perceptual grouping has often been formulated as an energy minimization problem, e.g. [12,30,32,7,34,16,26], yielding a single region or (possibly) closed contour, or a partition into regions. In the more recent context of generating region proposals, a parametric energy minimization problem is often formulated (e.g. ...
... The subjective assessment of an image depends heavily on identifying the salient region within it. The term "visual saliency" was coined by Ullman and Sha'ashua [5], and extended by Itti et al. [6] towards the development of a computational architecture. The human visual system is sensitive to the salient regions in an image, due to their high discriminative features, thereby resulting in early visual arousal. ...
Article
Full-text available
The automatic computerized detection of regions of interest (ROI) is an important step in the process of medical image processing and analysis. The reasons are many, and include an increasing amount of available medical imaging data, existence of inter-observer and inter-scanner variability, and to improve the accuracy in automatic detection in order to assist doctors in diagnosing faster and on time. A novel algorithm, based on visual saliency, is developed here for the identification of tumor regions from MR images of the brain. The GBM saliency detection model is designed by taking cue from the concept of visual saliency in natural scenes. A visually salient region is typically rare in an image, and contains highly discriminating information, with attention getting immediately focused upon it. Although color is typically considered as the most important feature in a bottom-up saliency detection model, we circumvent this issue in the inherently gray scale MR framework. We develop a novel pseudo-coloring scheme, based on the three MRI sequences, viz. FLAIR, T2 and T1C (contrast enhanced with Gadolinium). A bottom-up strategy, based on a new pseudo-color distance and spatial distance between image patches, is defined for highlighting the salient regions in the image. This multi-channel representation of the image and saliency detection model help in automatically and quickly isolating the tumor region, for subsequent delineation, as is necessary in medical diagnosis. The effectiveness of the proposed model is evaluated on MRI of 80 subjects from the BRATS database in terms of the saliency map values. Using ground truth of the tumor regions for both high- and low- grade gliomas, the results are compared with four highly referred saliency detection models from literature. In all cases the AUC scores from the ROC analysis are found to be more than 0.999 ± 0.001 over different tumor grades, sizes and positions.
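The bottom-up strategy described above, contrasting a feature (pseudo-colour) distance against a spatial distance between image patches, can be sketched as follows. The `alpha` weighting and the exact way the two distances are combined are illustrative assumptions, not the paper's formula:

```python
import math

def patch_saliency(patches, alpha=3.0):
    """Bottom-up saliency sketch: a patch is salient when it is far, in
    feature space, from the patches around it, with spatial distance
    down-weighting the influence of far-away patches.  Each patch is a
    (position, feature) pair with a 2-D position and a 3-D feature."""
    sal = []
    for i, (pi, fi) in enumerate(patches):
        s = sum(math.dist(fi, fj) / (1.0 + alpha * math.dist(pi, pj))
                for j, (pj, fj) in enumerate(patches) if j != i)
        sal.append(s / (len(patches) - 1))
    return sal

# A 3x3 grid of similar "background" patches plus one feature outlier at
# the centre: the outlier receives the highest saliency score.
patches = [((x, y), (0.2, 0.2, 0.2)) for x in range(3) for y in range(3)]
patches[4] = ((1, 1), (0.9, 0.1, 0.1))   # distinctive centre patch
sal = patch_saliency(patches)
assert max(range(len(sal)), key=lambda i: sal[i]) == 4
```

Because a tumour region is rare and feature-distinct, such a contrast measure concentrates saliency on it, after which thresholding the map yields a candidate region for delineation.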
... Previous models examined the contributions of horizontal connections (e.g. within area V1) to perceptual grouping by enhancing the responses elicited by collinear contour elements [129][130][131]. Other models aimed to explain how multiple cortical modules interact to enable the correct interpretation of a visual scene [132], or how the modification of long-range horizontal connections by top-down interaction can account for such findings [22]. ...
Article
Full-text available
The processing of a visual stimulus can be subdivided into a number of stages. Upon stimulus presentation there is an early phase of feedforward processing where the visual information is propagated from lower to higher visual areas for the extraction of basic and complex stimulus features. This is followed by a later phase where horizontal connections within areas and feedback connections from higher areas back to lower areas come into play. In this later phase, image elements that are behaviorally relevant are grouped by Gestalt grouping rules and are labeled in the cortex with enhanced neuronal activity (object-based attention in psychology). Recent neurophysiological studies revealed that reward-based learning influences these recurrent grouping processes, but it is not well understood how rewards train recurrent circuits for perceptual organization. This paper examines the mechanisms for reward-based learning of new grouping rules. We derive a learning rule that can explain how rewards influence the information flow through feedforward, horizontal and feedback connections. We illustrate the efficiency with two tasks that have been used to study the neuronal correlates of perceptual organization in early visual cortex. The first task is called contour-integration and demands the integration of collinear contour elements into an elongated curve. We show how reward-based learning causes an enhancement of the representation of the to-be-grouped elements at early levels of a recurrent neural network, just as is observed in the visual cortex of monkeys. The second task is curve-tracing where the aim is to determine the endpoint of an elongated curve composed of connected image elements. If trained with the new learning rule, neural networks learn to propagate enhanced activity over the curve, in accordance with neurophysiological data. We close the paper with a number of model predictions that can be tested in future neurophysiological and computational studies.
... Also founded-similarly to tensor voting-on the principles of Gestalt psychology (cf. [12]) the work on structural saliency in [39] aims at detecting global structures from local (image) features. An overview on further perceptual organisation methods subdivided in different categories can be found in [25]. ...
Chapter
Full-text available
Perceptual organisation techniques aim at mimicking the human visual system for extracting salient information from noisy images. Tensor voting has been one of the most versatile of those methods, with many different applications both in computer vision and medical image analysis. Its strategy consists in propagating local information encoded through tensors by means of perception-inspired rules. Although it has been used for more than a decade, there are still many unsolved theoretical issues that have made it challenging to apply it to more problems, especially in analysis of medical images. The main aim of this chapter is to review the current state of the research in tensor voting, to summarise its present challenges, and to describe the new trends that we foresee will drive the research in this field in the next few years. Also, we discuss extensions of tensor voting that could lead to potential performance improvements and that could make it suitable for further medical applications.
Article
Full-text available
Automatic Image Cropping is a challenging task with many practical downstream applications. The task is often divided into sub-problems - generating cropping candidates, finding the visually important regions, and determining aesthetics to select the most appealing candidate. Prior approaches model one or more of these sub-problems separately, and often combine them sequentially. We propose a novel convolutional neural network (CNN) based method to crop images directly, without explicitly modeling image aesthetics, evaluating multiple crop candidates, or detecting visually salient regions. Our model is trained on a large dataset of images cropped by experienced editors and can simultaneously predict bounding boxes for multiple fixed aspect ratios. We consider the aspect ratio of the cropped image to be a critical factor that influences aesthetics. Prior approaches for automatic image cropping, did not enforce the aspect ratio of the outputs, likely due to a lack of datasets for this task. We, therefore, benchmark our method on public datasets for two related tasks - first, aesthetic image cropping without regard to aspect ratio, and second, thumbnail generation that requires fixed aspect ratio outputs, but where aesthetics are not crucial. We show that our strategy is competitive with or performs better than existing methods in both these tasks. Furthermore, our one-stage model is easier to train and significantly faster than existing two-stage or end-to-end methods for inference. We present a qualitative evaluation study, and find that our model is able to generalize to diverse images from unseen datasets and often retains compositional properties of the original images after cropping. We also find that the model can generate crops with better aesthetics than the ground truth in the MIRThumb dataset for image thumbnail generation with no fine tuning. 
Our results demonstrate that explicitly modeling image aesthetics or visual attention regions is not necessarily required to build a competitive image cropping algorithm.
Preprint
Full-text available
We introduce a geometric approach to study the representation of orientation by populations of neurons in primary visual cortex in the presence and absence of an additive mask. Despite heterogeneous effects at the single cell level, a simple geometric model explains how population responses are transformed by the mask and reveals how changes in discriminability and bias relate to each other. We propose that studying the geometry of neural populations can yield insights into the role of contextual modulation in the processing of sensory signals.
Article
Full-text available
This thesis proposes a deep learning approach to bone segmentation in abdominal CNN+PG. Segmentation is a common initial step in medical image analysis, often fundamental for computer-aided detection and diagnosis systems. The extraction of bones in PG is a challenging task which, if done manually by experts, requires a time-consuming process, and which today has no broadly recognized automatic solution. The method presented is based on a convolutional neural network, inspired by the U-Net and trained end-to-end, that performs a semantic segmentation of the data. The training dataset is made up of 21 abdominal PG+CNN, each one containing between 0 and 255 2D transversal images. Those images are in full resolution, 4*4*50 voxels, and each voxel is classified by the network into one of the following classes: background, femoral bones, hips, sacrum, sternum, spine and ribs. The output is therefore a bone mask in which the bones are recognized and divided into six different classes. On the testing dataset, labeled by experts, the best model achieves a Dice coefficient, averaged over all bone classes, of 0.8980. This work demonstrates, to the best of my knowledge for the first time, the feasibility of automatic bone segmentation and classification for PG using a convolutional neural network.
Thesis
Active contour models, also called snakes and originally introduced by Kass, Witkin and Terzopoulos, provide a unified framework, based on the use of potential energies, for solving many vision problems globally. However, difficulties in choosing the coefficients of the potential model, a strong dependence on initialization, and numerical instabilities still hinder the proper functioning of this method. After a study of these problems, we propose a method, called snake growing, that proceeds by successive elongations of the snake and whose main advantages are improved convergence of the process and greater independence from the initialization conditions. We then develop an extended snake model that makes retractions and expansions possible. We consider a dynamic method for applying such deformations in order to achieve higher-quality detection.
Chapter
Clustering refers to the process of extracting maximally coherent groups from a set of objects using pairwise, or high-order, similarities. Traditional approaches to this problem are based on the idea of partitioning the input data into a predetermined number of classes, thereby obtaining the clusters as a by-product of the partitioning process. In this chapter, we provide a brief review of our recent work which offers a radically different view of the problem and allows one to work directly on non-(geo)metric data. In contrast to the classical approach, in fact, we attempt to provide a meaningful formalization of the very notion of a cluster in the presence of non-metric (even asymmetric and/or negative) (dis)similarities and show that game theory offers an attractive and unexplored perspective that serves well our purpose. To this end, we formulate the clustering problem in terms of a non-cooperative “clustering game” and show that a natural notion of a cluster turns out to be equivalent to a classical (evolutionary) game-theoretic equilibrium concept. Besides the game-theoretic perspective, we exhibit also characterizations of our cluster notion in terms of optimization theory and graph theory. As for the algorithmic issues, we describe two approaches to find equilibria of a clustering game. The first one is based on the classical replicator dynamics from evolutionary game theory, the second one is a novel class of dynamics inspired by infection and immunization processes which overcome their limitations. Finally, we show applications of the proposed framework to matching problems, where we aim at finding correspondences within a set of elements. In particular, we address the problems of point-pattern matching and surface registration.
Conference Paper
A method for texture image segmentation based on the classification of contour elements and the logical addition of classes is presented. The essence of the method consists in contouring the image, determining the positions of contour elements of different types (points, lines, and shapes), converting closely spaced similar contour elements into binary areal objects, and binary-coding the mutual positions of the obtained areal objects within the image boundaries, resulting in a code matrix.
Article
Active contour models, or snakes, are effective and robust in contour extraction. In most papers on snakes, an initialization close to the desired contour is assumed to be provided, which is inappropriate in many cases. The ziplock snake model presented by Neuenschwander et al. (1997), however, needs only two user-supplied endpoints. The optimization process for a ziplock snake starts from the two endpoints and progresses towards the center of the snake. In this paper, we present a method to combine a grammatical model that encodes a priori shape information with the ziplock snakes. A competing mechanism is adopted to take advantage of the shape models without inducing excessive computation. The resulting model-based ziplock snakes have many advantages over the original model: They can accurately locate contour features, produce more refined results, and deal with multiple contours, missing image cues and noise
Article
Contour completion plays an important role in visual perception, where the goal is to group fragmented low-level edge elements into perceptually coherent and salient contours. Most existing methods for contour completion have focused on pixelwise detection accuracy. In contrast, fewer methods have addressed the global contour closure effect, despite psychological evidence of its importance. This paper proposes a purely contour-based higher-order CRF model to achieve contour closure through local connectedness approximation. This leads to a simplified problem structure, where our higher-order inference problem can be transformed into an integer linear program (ILP) and solved efficiently. Compared with methods based on the same bottom-up edge detector, our method achieves a superior contour grouping ability (measured by Rand index), comparable precision-recall performance, and more visually pleasing results. Our results suggest that contour closure can be effectively achieved in the contour domain, in contrast to the popular view that segmentation is essential for this purpose.
Chapter
A number of techniques have been presented so far that perform a range of tasks of varying complexity; some are specific to raw images, such as edge detection or the more elaborate region splitting and merging algorithms. Others are more abstract (or general purpose), such as the studies of graphical representations and pattern recognition techniques. What has been overlooked hitherto, though, is the (perhaps obvious) observation that the best known vision system, our own, is geared specifically to dealing with the 3D world and as yet the gap between images and the real world of 3D objects, with all their problems of relative depth, occlusion etc. has not been seriously examined.
Conference Paper
We propose a multi-stage approach to curve extraction where the curve fragment search space is iteratively reduced by removing unlikely candidates using geometric constraints, but without affecting recall, to a point where the application of an objective functional becomes appropriate. The motivation for using multiple stages is to avoid the drawback of using a global functional directly on edges, which can result in non-salient but high-scoring curve fragments that arise from non-uniformly distributed edge evidence. The process progresses in stages from local to global: (i) edges, (ii) curvelets, (iii) unambiguous curve fragments, (iv) resolving ambiguities to generate a full set of curve fragment candidates, (v) merging curve fragments based on learned photometric and geometric cues as well as a novel lateral edge sparsity cue, and (vi) the application of a learned objective functional to obtain a final selection of curve fragments. The resulting curve fragments are typically visually salient and have been evaluated in two ways. First, we measure the stability of curve fragments when images undergo visual transformations such as changes in viewpoint, illumination, and noise, a critical factor for curve fragments to be useful to later visual processes but one often ignored in evaluation. Second, we use a more traditional comparison against human annotation, but using the CFGD dataset and CFGD evaluation strategy rather than the standard BSDS counterpart, which is shown to be inappropriate for evaluating curve fragments. Under both evaluation schemes our results are significantly better than those of state-of-the-art algorithms whose implementations are publicly available.
Article
Clustering refers to the process of extracting maximally coherent groups from a set of objects using pairwise, or high-order, similarities. Traditional approaches to this problem are based on the idea of partitioning the input data into a predetermined number of classes, thereby obtaining the clusters as a by-product of the partitioning process. In this chapter, we provide a brief review of our recent work which offers a radically different view of the problem. In contrast to the classical approach, in fact, we attempt to provide a meaningful formalization of the very notion of a cluster and we show that game theory offers an attractive and unexplored perspective that serves well our purpose. To this end, we formulate the clustering problem in terms of a non-cooperative "clustering game" and show that a natural notion of a cluster turns out to be equivalent to a classical (evolutionary) game theoretic equilibrium concept. We prove that the problem of finding the equilibria of our clustering game is equivalent to locally optimizing a polynomial function over the standard simplex, and we provide a discrete-time dynamics to perform this optimization, based on the Baum-Eagon inequality. Experiments on real-world data are presented which show the superiority of our approach over the state of the art.
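The discrete-time dynamics mentioned above can be sketched as the standard replicator update over the simplex, a payoff-monotonic rule of the kind the Baum-Eagon inequality justifies for polynomial objectives; the toy similarity matrix `A` and the 0.01 support threshold are purely illustrative:

```python
def replicator(A, x, n_iters=200):
    """Discrete-time replicator dynamics on the standard simplex:
    x_i <- x_i * (Ax)_i / (x^T A x), where A[i][j] is the similarity
    (payoff) between objects i and j.  Fixed points with coherent support
    correspond to equilibria of the clustering game."""
    n = len(x)
    for _ in range(n_iters):
        Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        xAx = sum(x[i] * Ax[i] for i in range(n))
        x = [x[i] * Ax[i] / xAx for i in range(n)]
    return x

# Two tight groups {0,1,2} and {3,4} plus weak cross-similarities: started
# from the barycentre, mass concentrates on one coherent cluster.
A = [[0, 1, 1, .1, .1],
     [1, 0, 1, .1, .1],
     [1, 1, 0, .1, .1],
     [.1, .1, .1, 0, 1],
     [.1, .1, .1, 1, 0]]
x = replicator(A, [1 / 5] * 5)
cluster = {i for i, xi in enumerate(x) if xi > 0.01}
assert cluster == {0, 1, 2}
```

Objects in the extracted cluster can then be removed and the dynamics restarted on the remainder, so clusters are peeled off one at a time rather than fixed in number beforehand.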
Article
Traditional methods for detecting geometric entities such as lines and circles resort to the Hough transform and tensor voting schemes. In this work, the authors extend these approaches using representations in terms of k-vectors of the Conformal Geometric Algebra. Of interest is the detection of lines and circles in images, and of planes, circles, and spheres in 3-D visual space; for that, we use the randomized Hough transform, and by means of k-blades we code such geometric entities. Motivated by tensor voting, we have generalized this approach to any kind of geometric entity or geometric flag, formulating the perceptual saliency function in terms of k-vectors. The experiments using real images show the performance of the algorithms.
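The randomized Hough idea for circles can be sketched without the conformal-geometric-algebra encoding of the paper: sample random triplets of edge points, compute the circle through each, and vote in a coarse accumulator. The bin size, trial count, and collinearity tolerance below are arbitrary choices of this sketch.

```python
import numpy as np

def circle_through(p1, p2, p3):
    """Center (ux, uy) and radius of the circle through three points,
    or None if they are (near-)collinear."""
    ax, ay = p1; bx, by = p2; cx, cy = p3
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    ux = ((ax**2 + ay**2) * (by - cy) + (bx**2 + by**2) * (cy - ay)
          + (cx**2 + cy**2) * (ay - by)) / d
    uy = ((ax**2 + ay**2) * (cx - bx) + (bx**2 + by**2) * (ax - cx)
          + (cx**2 + cy**2) * (bx - ax)) / d
    return ux, uy, float(np.hypot(ax - ux, ay - uy))

def randomized_hough_circle(points, trials=300, seed=0):
    """Vote for (cx, cy, r) hypotheses from random point triplets and
    return the accumulator cell with the most votes."""
    rng = np.random.default_rng(seed)
    pts = np.asarray(points, float)
    votes = {}
    for _ in range(trials):
        i, j, k = rng.choice(len(pts), size=3, replace=False)
        c = circle_through(pts[i], pts[j], pts[k])
        if c is not None:
            key = tuple(np.round(c, 1))   # coarse accumulator bin
            votes[key] = votes.get(key, 0) + 1
    return max(votes, key=votes.get)

# points sampled on the circle of radius 5 centered at (3, 4)
ts = np.linspace(0.0, 2.0 * np.pi, 12, endpoint=False)
pts = np.stack([3 + 5 * np.cos(ts), 4 + 5 * np.sin(ts)], axis=1)
best = randomized_hough_circle(pts)
```

Sampling triplets rather than enumerating all parameter cells is what makes the randomized variant cheap; the k-blade formulation of the paper generalizes the same voting idea to other entities.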
Article
Full-text available
We propose a novel approach to the grouping of dot patterns by the good continuation law. Our model is based on local symmetries, and the non-accidentalness principle to determine perceptually relevant configurations. A quantitative measure of non-accidentalness is proposed, showing a good correlation with the visibility of a curve of dots. A robust, unsupervised and scale-invariant algorithm for the detection of good continuation of dots is derived. The results of the proposed method are illustrated on various datasets, including data from classic psychophysical studies. An online demonstration of the algorithm allows the reader to directly evaluate the method.
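The quantitative non-accidentalness measure in such a contrario methods is commonly phrased as an expected Number of False Alarms. The binomial-tail form below is a standard a contrario choice, sketched here for illustration; it is not necessarily the exact measure of this paper.

```python
from math import comb

def binomial_tail(n, k, p):
    """P[X >= k] for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))

def nfa(num_tests, n, k, p):
    """Number of False Alarms: the expected number of chance events at
    least as good as the observed one.  A configuration is declared
    epsilon-meaningful (non-accidental) when NFA <= epsilon."""
    return num_tests * binomial_tail(n, k, p)

# 10 of 10 dots falling on a candidate curve when each does so with
# probability 0.5 by chance, over 100 tested candidates:
score = nfa(100, 10, 10, 0.5)
```

A low NFA means the configuration is unlikely to arise by accident, which is exactly the criterion correlated with curve visibility in the abstract above.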
Article
Full-text available
This thesis studies two mathematical models for an elementary visual task: theperceptual grouping of dot patterns. The first model handles the detection ofperceptually relevant arrangements of collinear dots. The second model extendsthis framework to the more general case of good continuation of dots. In bothcases, the proposed models are scale invariant and unsupervised. They aredesigned to be robust to noise, up to the point where the structures to detectbecome mathematically indistinguishable from noise. The experiments presentedshow a good match of our detection theory with the unmasking processes takingplace in human perception, supporting their perceptual plausibility.The proposed models are based on the a contrario framework, a formalization ofthe non-accidentalness principle in perception theory. This thesis makes twocontributions to the a contrario methodology. One is the introduction ofadaptive detection thresholds that are conditional to the structure's localsurroundings. The second is a new refined strategy for resolving the redundancyof multiple meaningful detections. Finally, the usefulness of the collinear point detector as a general patternanalysis tool is demonstrated by its application to a classic problem incomputer vision: the detection of vanishing points. The proposed dot alignmentdetector, used in conjunction with standard tools, produces improved resultsover the state-of-the-art methods in the literature.Aiming at reproducible research, all methods are submitted to the IPOL journal,including detailed descriptions of the algorithms, commented reference sourcecodes, and online demonstrations for each one.
Article
Full-text available
The properties of isotropy, smoothness, minimum curvature and locality suggest the shape of filled-in contours between two boundary edges. The contours are composed of the arcs of two circles tangent to the given edges, meeting smoothly, and minimizing the total curvature. It is shown that shapes meeting all the above requirement can be generated by a network which performs simple, local computations. It is suggested that the filling-in process plays an important role in the early processing of visual information.
Conference Paper
This paper describes a collection of multiresolution, or “pyramid”, techniques for rapidly extracting global structures (features, regions, patterns) from an image. If implemented in parallel on suitable cellular pyramid hardware, these techniques require processing times on the order of the logarithm of the image diameter.
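The logarithmic-time claim is easy to see in code: each reduction halves the image side, so a global statistic reaches the apex after a number of steps logarithmic in the image diameter. A minimal sketch of the simplest reduction pyramid, assuming a square power-of-two image (the 2x2 averaging rule is the simplest choice, not the only one the paper considers):

```python
import numpy as np

def build_pyramid(img):
    """Repeated 2x2 block averaging; for an N x N image (N a power of
    two) the pyramid has log2(N) + 1 levels, so a global statistic is
    available after O(log N) parallel reduction steps."""
    levels = [np.asarray(img, dtype=float)]
    while levels[-1].shape[0] > 1:
        a = levels[-1]
        levels.append((a[0::2, 0::2] + a[1::2, 0::2]
                       + a[0::2, 1::2] + a[1::2, 1::2]) / 4.0)
    return levels

pyramid = build_pyramid(np.arange(16).reshape(4, 4))
```

On cellular pyramid hardware each level would be computed in one parallel step; here the apex cell holds the global mean of the image.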
Article
This report describes some simple techniques for smoothly filling in gaps in object contours. The first technique considered was recently proposed by Ullman; it constructs the completion of the contour using two arcs of circles that are tangent to the gap ends and to each other, and that have minimum total curvature. An analysis of this technique is presented, and examples of its use are given. A second technique uses cubic polynomial completions; when suitably constrained, this technique yields very reasonable completions.
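The two-tangent-arc construction is somewhat involved, but the cubic polynomial completion is short: the unique cubic matching the positions and tangent vectors at the two gap ends. A sketch using the Hermite basis (the parameterization and sampling density are our choices; the report's constrained variant adds further conditions):

```python
import numpy as np

def hermite_completion(p0, t0, p1, t1, num=51):
    """Cubic completion of a contour gap: the unique cubic matching the
    endpoint positions p0, p1 and endpoint tangent vectors t0, t1."""
    p0, t0, p1, t1 = (np.asarray(v, float) for v in (p0, t0, p1, t1))
    s = np.linspace(0.0, 1.0, num)[:, None]
    h00 = 2 * s**3 - 3 * s**2 + 1
    h10 = s**3 - 2 * s**2 + s
    h01 = -2 * s**3 + 3 * s**2
    h11 = s**3 - s**2
    return h00 * p0 + h10 * t0 + h01 * p1 + h11 * t1

# collinear ends with matching tangents reproduce a straight segment
gap = hermite_completion((0, 0), (1, 0), (1, 0), (1, 0))
```

Because the cubic interpolates both positions and both tangents, the completed contour joins the gap ends smoothly, which is the requirement both techniques in the report aim to satisfy.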
Article
We assume that edge detection is the task of measuring and localizing changes of light intensity in the image. As discussed by V. Torre and T. Poggio (1984, “On Edge Detection,” AI Memo 768, MIT AI Lab), edge detection, when defined in this way, is a problem of numerical differentiation, which is ill posed. This paper shows that simple regularization methods lead to filtering the image prior to an appropriate differentiation operation. In particular, we prove (1) that the variational formulation of Tikhonov regularization leads to a convolution filter, (2) that the form of this filter is similar to the Gaussian filter, and (3) that the regularizing parameter λ in the variational principle effectively controls the scale of the filter.
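The link between regularization and filtering can be seen directly by solving the Tikhonov problem and inspecting its impulse response. This sketch uses a discrete first-difference stabilizer, a simplification of the paper's variational setting; the resulting operator is a fixed low-pass convolution whose scale grows with λ.

```python
import numpy as np

def tikhonov_smooth(y, lam):
    """Minimize ||x - y||^2 + lam ||D x||^2 with D the first-difference
    operator; the closed-form solution (I + lam D'D)^{-1} y applies a
    fixed linear filter to the data."""
    n = len(y)
    D = np.diff(np.eye(n), axis=0)           # (n-1) x n first differences
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, np.asarray(y, float))

# the impulse response of the solver *is* the equivalent kernel:
impulse = np.zeros(21)
impulse[10] = 1.0
kernel = tikhonov_smooth(impulse, 4.0)
```

The recovered kernel is symmetric, unit-mass, and bell-shaped around the impulse, consistent with the paper's claim that regularized differentiation amounts to smoothing with a Gaussian-like filter before differentiating.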
Article
This paper presents a method for detecting edges and contours in noisy pictures. The properties of an edge are embedded in a figure of merit and the edge detection problem becomes the problem of minimizing the given figure of merit. This problem can be represented as a shortest path problem on a graph and can be solved using well-known graph search algorithms. The relations between this representation of the minimization problem and a dynamic programming approach are discussed, showing that the graph search method can lead to substantial improvements in computing time. Moreover, if heuristic search methods are used, the computing time will depend on the amount of noise in the picture. Some experimental results are given; these show how various information about the shape of the contour of an object can be embedded in the figure of merit, thus allowing the extraction of contours from noisy pictures and the separation of touching objects.
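The reduction of figure-of-merit minimization to a shortest-path problem can be sketched with Dijkstra's algorithm on an 8-connected pixel grid. The cost model below, in which each step pays the target pixel's figure of merit, is a simplification for illustration; the paper's heuristic search variants differ in how the graph is explored.

```python
import heapq

def min_cost_contour(cost, start, goal):
    """Dijkstra search on the 8-connected pixel grid; the path cost is
    the summed figure of merit of the pixels it visits."""
    rows, cols = len(cost), len(cost[0])
    dist = {start: cost[start[0]][start[1]]}
    prev = {}
    heap = [(dist[start], start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) == goal:                      # reconstruct the contour
            path = [(r, c)]
            while path[-1] != start:
                path.append(prev[path[-1]])
            return d, path[::-1]
        if d > dist.get((r, c), float("inf")):
            continue                            # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1),
                       (1, 1), (1, -1), (-1, 1), (-1, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + cost[nr][nc]
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = (r, c)
                    heapq.heappush(heap, (nd, (nr, nc)))
    return float("inf"), []

# low cost marks strong edge evidence along the middle row
cost = [[9, 9, 9],
        [1, 1, 1],
        [9, 9, 9]]
total, path = min_cost_contour(cost, (1, 0), (1, 2))
```

The optimal path follows the low-cost row, i.e. the edge evidence, even though many noisy alternatives exist; this is the mechanism the paper exploits to pull contours out of noisy pictures.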
Article
A technique for recognizing systems of lines is presented. In this technique the heuristic of the problem is not embedded in the recognition algorithm but is expressed in a figure of merit. A multistage decision process is then able to recognize in the input picture the optimal system of lines according to the given figure of merit. Due to the global approach, greater flexibility and adequacy to the particular problem are achieved. The relation between the structure of the figure of merit and the complexity of the optimization process is then discussed. The method described is suitable for parallel processing because the operations relative to each state can be computed in parallel, and the number of stages is equal to the length N of the curves (or to log2 (N) if the approximate method is used). Key Words and Phrases: picture processing, picture recognition, picture description, curve detection, line
Book
A computational model is presented for the visual recognition of three-dimensional objects based upon their spatial correspondence with two-dimensional features in an image. A number of components of this model are developed in further detail and implemented as computer algorithms. At the highest level, a verification process has been developed which can determine exact values of viewpoint and object parameters from hypothesized matches between three-dimensional object features and two-dimensional image features. This provides a reliable quantitative procedure for evaluating the correctness of an interpretation, even in the presence of noise or occlusion. Given a reliable method for final evaluation of correspondence, the remaining components of the system are aimed at reducing the size of the search space which must be covered. Unlike many previous approaches, this recognition process does not assume that it is possible to directly derive depth information from the image. Instead, the primary descriptive component is a process of perceptual organization, in which spatial relations are detected directly among two-dimensional image features. A basic requirement of the recognition process is that perceptual organization should accurately distinguish meaningful groupings from those which arise by accident of viewpoint or position.
Article
This paper examines the problem of shape-based object recognition, and proposes a new approach, the alignment of pictorial descriptions. The first part of the paper reviews general approaches to visual object recognition, and divides these approaches into three broad classes: invariant properties methods, object decomposition methods, and alignment methods. The second part presents the alignment method. In this approach the recognition process is divided into two stages. The first determines the transformation in space that is necessary to bring the viewed object into alignment with possible object models. This stage can proceed on the basis of minimal information, such as the object's dominant orientation, or a small number of corresponding feature points in the object and model. The second stage determines the model that best matches the viewed object. At this stage, the search is over all the possible object models, but not over their possible views, since the transformation has already been determined uniquely in the alignment stage. The proposed alignment method also uses abstract description, but unlike structural description methods it uses them pictorially, rather than in symbolic structural descriptions.
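The alignment stage's key point, that a small number of correspondences fixes the transformation so no search over views is needed, can be illustrated for a 2-D similarity, which two point pairs determine exactly. This is a sketch of that geometric fact, not the paper's full 3-D alignment procedure; the function names are ours.

```python
import numpy as np

def align_from_two_points(src, dst):
    """Recover the 2-D similarity (scale s, rotation R, translation t)
    mapping two model points onto two image points; two correspondences
    determine it uniquely."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    v_src, v_dst = src[1] - src[0], dst[1] - dst[0]
    s = np.linalg.norm(v_dst) / np.linalg.norm(v_src)
    ang = np.arctan2(v_dst[1], v_dst[0]) - np.arctan2(v_src[1], v_src[0])
    R = np.array([[np.cos(ang), -np.sin(ang)],
                  [np.sin(ang),  np.cos(ang)]])
    t = dst[0] - s * R @ src[0]
    return s, R, t

def apply_transform(s, R, t, pts):
    return s * (np.asarray(pts, float) @ R.T) + t

# a scale-2, 90-degree-rotated, translated view of a model edge
s, R, t = align_from_two_points([(0, 0), (1, 0)], [(1, -1), (1, 1)])
pred = apply_transform(s, R, t, [(0, 1)])      # predict a third point
```

Once the transformation is pinned down, every other model feature can be projected and verified against the image, which is exactly the two-stage split (align, then match) described above.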
Article
Psychophysical and physiological evidence indicates that the visual system of primates and humans has evolved a specialized processing focus moving across the visual scene. This study addresses the question of how simple networks of neuron-like elements can account for a variety of phenomena associated with this shift of selective visual attention. Specifically, we propose the following: (1) A number of elementary features, such as color, orientation, direction of movement, disparity etc. are represented in parallel in different topographical maps, called the early representation. (2) There exists a selective mapping from the early topographic representation into a more central non-topographic representation, such that at any instant the central representation contains the properties of only a single location in the visual scene, the selected location. We suggest that this mapping is the principal expression of early selective visual attention. One function of selective attention is to fuse information from different maps into one coherent whole. (3) Certain selection rules determine which locations will be mapped into the central representation. The major rule, using the conspicuity of locations in the early representation, is implemented using a so-called Winner-Take-All network. Inhibiting the selected location in this network causes an automatic shift towards the next most conspicuous location. Additional rules are proximity and similarity preferences. We discuss how these rules can be implemented in neuron-like networks and suggest a possible role for the extensive back-projection from the visual cortex to the LGN.
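The selection rules can be caricatured without a neural implementation: repeatedly pick the most conspicuous location (the winner-take-all step) and inhibit it, which produces the automatic shift to the next most conspicuous location. The grid of saliency values and the inhibition radius below are arbitrary illustrative choices.

```python
import numpy as np

def attention_scanpath(saliency, num_shifts, inhibit_radius=1):
    """Pick the most conspicuous location (winner-take-all), inhibit a
    neighborhood around it, and repeat: inhibiting the selected
    location yields the automatic shift to the next-best location."""
    s = np.asarray(saliency, float).copy()
    visited = []
    for _ in range(num_shifts):
        r, c = np.unravel_index(int(np.argmax(s)), s.shape)
        visited.append((int(r), int(c)))
        r0, r1 = max(r - inhibit_radius, 0), min(r + inhibit_radius + 1, s.shape[0])
        c0, c1 = max(c - inhibit_radius, 0), min(c + inhibit_radius + 1, s.shape[1])
        s[r0:r1, c0:c1] = -np.inf              # inhibition of return
    return visited

scan = attention_scanpath([[5, 0, 0],
                           [0, 0, 0],
                           [0, 0, 9]], num_shifts=2)
```

The returned scanpath visits locations in decreasing order of conspicuity, a discrete stand-in for the dynamics of the Winner-Take-All network described above.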
Article
Research with texture pairs having identical second-order statistics has revealed that the pre-attentive texture discrimination system cannot globally process third- and higher-order statistics, and that discrimination is the result of a few local conspicuous features, called textons. It seems that only the first-order statistics of these textons have perceptual significance, and the relative phase between textons cannot be perceived without detailed scrutiny by focal attention.
Article
A new hypothesis about the role of focused attention is proposed. The feature-integration theory of attention suggests that attention must be directed serially to each stimulus in a display whenever conjunctions of more than one separable feature are needed to characterize or distinguish the possible objects presented. A number of predictions were tested in a variety of paradigms including visual search, texture segregation, identification and localization, and using both separable dimensions (shape and color) and local elements or parts of figures (lines, curves, etc. in letters) as the features to be integrated into complex wholes. The results were in general consistent with the hypothesis. They offer a new set of criteria for distinguishing separable from integral features and a new rationale for predicting which tasks will show attention limits and which will not.
Article
We describe a hierarchic computer procedure for the detection of nodular tumors in a chest radiograph. The radiograph is scanned and consolidated into several resolutions which are enhanced and analyzed by a hierarchic tumor recognition process. The hierarchic structure of the tumor recognition process has the form of a ladder-like decision tree. The major steps in the decision tree are: 1) find the lung regions within the chest radiograph, 2) find candidate nodule sites (potential tumor locations) within the lung regions, 3) find boundaries for most of these sites, 4) find nodules from among the candidate nodule boundaries, and 5) find tumors from among the nodules. The first three steps locate potential nodules in the radiograph. The last two steps classify the potential nodules into nonnodules, nodules which are not tumors, and nodules which are tumors.