Article

Fast Object Recognition Using Dynamic Programming from a Combination of Salient Line Groups


Abstract

This paper presents a new method of grouping and matching line segments to recognize objects. We propose a dynamic programming-based formulation for extracting salient line patterns, built on a robust and stable geometric representation grounded in perceptual organization. Using endpoint proximity, we detect junctions among the image lines and then search for junction groups using a collinearity constraint between the junctions. The scene is searched for junction groups similar to the model on the basis of local comparisons. A DP-based search algorithm reduces the time complexity of locating the model lines in the scene, and the system is able to find reasonable line groups in a short time.
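The abstract stops short of the DP recurrence itself. As a rough illustration only, assuming each junction is reduced to a scalar descriptor and compared by a hypothetical `local_cost` function, a sequence search of this flavor might look like:

```python
import math

def dp_match(model, scene, local_cost, skip_penalty=1.0):
    """Align an ordered sequence of model junctions to scene junctions.

    model, scene  : lists of junction descriptors (e.g. junction angles)
    local_cost(m, s): dissimilarity between a model and a scene junction
    Skipping a scene junction costs skip_penalty; returns the minimal
    total cost of matching the whole model sequence somewhere in the scene.
    """
    n, m = len(model), len(scene)
    INF = math.inf
    # cost[i][j] = best cost of matching model[:i], ending at scene junction j
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        cost[0][j] = 0.0  # an empty model prefix can start anywhere
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = cost[i - 1][j - 1] + local_cost(model[i - 1], scene[j - 1])
            skip = cost[i][j - 1] + skip_penalty  # skip a spurious scene junction
            cost[i][j] = min(match, skip)
    return min(cost[n][j] for j in range(1, m + 1)) if n else 0.0
```

Skipping a scene junction at a fixed penalty is one simple way to model the spurious junctions that a local comparison must tolerate.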


... Line-based efficient object recognition was adopted as the method for recognizing the landmark's lines [12] [13]. Estimating the landmark's position requires knowledge of the landmark's vertices and line intersections. ...
Article
This paper addresses an underwater landmark for updating UUV positioning information. A method is proposed in which the landmark's cubic shape and edges are recognized. The reliability, installation load, and manageability of the landmark design were taken into consideration in order to assess practical applications of the landmark. Landmark recognition was based on topological features. Straight-line recognition confirmed the landmark's location and enabled a UUV to accurately estimate its underwater position with respect to the landmark. An efficient recognition method is proposed, which provides real-time processing with limited UUV computing power. An underwater experiment was conducted in order to evaluate the proposed method's reliability and accuracy.
... In this paper, the noise model and the error propagation for the collinearity test between lines are proposed. A Gaussian noise distribution for the two endpoints of a line is an effective and general approach, as in Haralick [25] and Kang [26], among others. In this section, we use the Gaussian noise model to compute error propagation for the collinearity of two lines obtained from the micro-tensile image, and we derive a threshold from the error-variance test to decide whether the micro-tensile specimen that provides the evaluation lines in the camera-sensor image is well aligned. ...
Article
Full-text available
This paper proposes a new system for verifying the alignment of loading fixtures and test specimens during tensile testing of micrometer-scale thin films through direct imaging. A novel and reliable image recognition system for evaluating the misalignment between the load train and the specimen axes during tensile tests of thin films was developed using digital image processing with a CCD. The decision of whether the alignment of the tensile specimen is acceptable is based on a probabilistic analysis of edge features extracted from the digital images. In order to verify the performance of the proposed system and investigate the effect of specimen misalignment on tensile properties, tensile tests were performed under displacement control in air at room temperature on a metal thin film, a beryllium copper (BeCu) alloy. For metal thin films, bending stresses caused by misalignment are insignificant because the films bend easily during tensile tests, relieving the bending stresses. For the BeCu thin film, the stress gradient caused by twisting under in-plane misalignment produced little effect or scatter in the tensile properties, and the effects and scatter were likewise insignificant under out-of-plane misalignment.
... Line grouping based on graph search and optimisation techniques enforcing line continuity and smoothness has been applied to integrate line evidence [13,23], but segmentation of objects based on linear segments requires relevant local segment configurations that capture the objects' shape characteristics [22]. Shape modelling that assumes evenly distributed landmark points along the nematode body has proved a complex issue; although non-linear systems have been devised [10], the complete range of nematode body configurations is still far from being modelled. ...
Conference Paper
Full-text available
In this paper we study how shape information encoded in contour energy components can be used for detection of microscopic organisms in population images. We propose features based on shape and geometric statistics obtained from samples of optimized contour lines, integrated in a Bayesian inference framework for recognition of individual specimens. Compared with common geometric features, the results show that patterns present in the image allow better detection of a considerable number of individuals, even in cluttered regions, when sufficient shape information is retained. This provides an alternative to building a specific shape model or imposing specific constraints on the interaction of overlapping objects. Department of telecommunication and information processing, Ghent University, St-Pieters Nieuwstraat 41, B-9000, Ghent, Belgium; Centro de Vision y Robotica, Facultad de Ingenieria en Electricidad y Computación, ESPOL University, Km 30.5 via perimetral, 09015863, Guayaquil, Ecuador
... Lowe presents an algorithm that iteratively refines an initially guessed view point via Newton's method [11]. Kang et al. propose a technique to efficiently extract topology information, i.e. line junctions, within an image [9] and use it to perform object recognition and pose estimation [10]. Vacchetti et al. present in [15] a marker-less registration method based on the combination of image feature points and edge tracking. ...
Article
The availability of large geospatial data from different sources has dramatically increased, but using such data in geo-mashup or context-aware systems requires a data fusion component. To solve the integration issue, classifiers are obtained by supervised training, with feature vectors derived from textual and geospatial attributes. In an application example, a coherent part of Germany was annotated by humans and used for supervised learning. Annotation by humans is not free of errors, which decreases the performance of the classifier. We show how visual analytics techniques can be used to efficiently detect such false annotations. The textual features in particular introduce high-dimensional feature vectors, where visual analytics becomes important and helps to understand and improve the trained classifiers. Particular technical components used in our system are scatterplots, multiple coordinated views, and interactive data drill-down.
Chapter
Pose estimation between a camera and an object is a central element of computer vision and its applications. In this paper, we present an approach to the problem of estimating the camera's 3-D location and orientation from a matched set of 3-D model and 2-D image features. We derive an error equation using roll-pitch-yaw angles to represent the rotation matrix and directly calculate the partial derivatives of the Jacobian matrix for the estimation parameters of the nonlinear error equation, without resorting to numerical methods. Because the proposed method does not use a numerical method to derive the partial derivatives, it is very fast, and thus adequate for real-time pose estimation, and it is insensitive to the choice of initial values when solving the nonlinear equation. The method is validated with real-image experiments, and a comparison with a numerical estimation method is presented.
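The chapter's exact error equation is not reproduced here. The following stdlib-only sketch assumes the common Rz(yaw)·Ry(pitch)·Rx(roll) convention (which may differ from the authors') and shows the key idea: since only Rx depends on roll, the partial derivative of the rotation matrix with respect to roll can be written analytically rather than by finite differences.

```python
import math

def matmul3(A, B):
    """3x3 matrix product over plain nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rpy_rotation(roll, pitch, yaw):
    """R = Rz(yaw) @ Ry(pitch) @ Rx(roll) -- one common RPY convention."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    Rx = [[1, 0, 0], [0, cr, -sr], [0, sr, cr]]
    Ry = [[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]]
    Rz = [[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]]
    return matmul3(Rz, matmul3(Ry, Rx))

def d_rotation_d_roll(roll, pitch, yaw):
    """Analytic dR/d(roll): differentiate Rx only, keep Ry and Rz fixed."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    dRx = [[0, 0, 0], [0, -sr, -cr], [0, cr, -sr]]
    Ry = [[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]]
    Rz = [[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]]
    return matmul3(Rz, matmul3(Ry, dRx))
```

The analytic derivative can be checked against a central finite difference, which is exactly the numerical step the chapter's method avoids.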
Article
As salient object extraction is of great importance in computer vision and multimedia information retrieval, this paper concentrates on the problem of salient object recognition using local features. Because a circular region has much better rotational invariance, we use a circular region in place of a rectangular one. To implement salient object detection, visual object classes are first constructed from a training image dataset through SIFT feature clustering. For a test image, the object class to which it belongs can then be detected by interest-point matching. The SIFT feature clustering and local feature matching are implemented through the proposed hybrid SVM-QPSO model: to improve the quality of parameter selection for the SVM, we use the quantum-behaved particle swarm optimization technique to select suitable SVM parameters. Finally, experiments are conducted on the MSRC dataset. Experimental results show that, compared with other methods, the proposed algorithm can effectively detect salient objects in terms of both detection precision and computing efficiency.
Conference Paper
Full-text available
We present a method that can dramatically accelerate object detection with part-based models. The method is based on the observation that the cost of detection is likely to be dominated by the cost of matching each part to the image, and not by the cost of computing the optimal configuration of the parts, as commonly assumed. Accelerating detection therefore requires minimizing the number of part-to-image comparisons. To this end we propose a multiple-resolution hierarchical part-based model and a corresponding coarse-to-fine inference procedure that recursively eliminates unpromising part placements from the search space. The method yields a ten-fold speedup over the standard dynamic programming approach and is complementary to the cascade-of-parts approach. Compared to the latter, our method has no parameters to be determined empirically, which simplifies its use during training of the model. Most importantly, the two techniques can be combined to obtain a very significant speedup, of two orders of magnitude in some cases. We evaluate our method extensively on the PASCAL VOC and INRIA datasets, demonstrating a very large increase in detection speed with little degradation of accuracy.
Conference Paper
In this paper, we present a new approach to the problem of estimating the camera's 3-D location and orientation from a matched set of 3-D model and 2-D image features. An iterative least-squares method is used to solve for rotation and translation simultaneously, because conventional methods that solve for rotation first and then translation do not provide good solutions. We derive an error equation using roll-pitch-yaw angles to represent the rotation matrix. From this model of the error equation, we analytically extract the partial derivatives for the estimation parameters of the nonlinear error equation. To minimize the error equation, the Levenberg-Marquardt algorithm is used with a uniform sampling strategy over the rotation space to avoid getting stuck in local minima. Experimental results using real images are presented.
Article
Full-text available
Motris, an integrated system for model-based tracking research, has been designed modularly to study the effects of algorithmic variations on tracking results. Motris attempts to avoid introducing bias into the relative assessment of alternative approaches. Such a bias may be caused by differences of implementation and parameterization if the component approaches are evaluated in separate testing environments. Tracking results are evaluated automatically on a significant test sample in order to quantify the effects of different combinations of alternatives. The Motris system environment thus allows an in-depth comparison between the so-called ‘Edge-Element Association’ approach documented in Haag and Nagel (1999) and the more recent ‘Expectation-Maximization’ approach reported by Pece and Worrall (2002).
Article
Full-text available
In this paper we present a formalism for the formation of self-consistent, hierarchical, "low-level" groupings of pairs of straight line segments from which all higher-level groupings may be derived. Additionally, each low-level grouping is associated with a "quality" factor, based on evidential reasoning, which reflects how much the grouping differs from a mathematically perfect one. This formalism has been incorporated into algorithms within the "LPEG" software package produced at the University of Surrey. LPEG was developed as part of the Vision As Process (Crowley et al., 1989) project. We present results of the application of these algorithms to sets of line segments extracted from a test image.
Article
Full-text available
A new method has been designed to identify and locate objects lying on a flat surface. The merit of the approach is its strong robustness to partial occlusions (due, for instance, to uneven lighting conditions, shadows, highlights, and touching or overlapping objects), thanks to a local and compact description of object boundaries and to a new fast recognition method involving generation and recursive evaluation of hypotheses, named HYPER (HYpotheses Predicted and Evaluated Recursively). The method has been integrated within a vision system coupled to an industrial robot arm to provide automatic picking and repositioning of partially overlapping industrial parts.
Article
Full-text available
Machine vision systems that perform inspection tasks must be capable of making measurements. A vision system measures an image to determine a measurement of the object being viewed. The image measurement depends on several factors, including sensing, image processing, and feature extraction. We consider the error that can occur in measuring the distance between two corner points of the 2D image. We analyze the propagation of the uncertainty in edge point position to the 2D measurements made by the vision system, from 2D curve extraction, through point determination, to measurement. We extend earlier work on the relationship between random perturbation of edge point position and variance of the least squares estimate of line parameters and analyze the relationship between the variance of 2D points.
Article
Full-text available
Dynamic programming is discussed as an approach to solving variational problems in vision. Dynamic programming ensures global optimality of the solution, is numerically stable, and allows for hard constraints to be enforced on the behavior of the solution within a natural and straightforward structure. As a specific example of the approach's efficacy, applying dynamic programming to the energy-minimizing active contours is described. The optimization problem is set up as a discrete multistage decision process and is solved by a time-delayed discrete dynamic programming algorithm. A parallel procedure for decreasing computational costs is discussed
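The discrete multistage decision process can be sketched for an open contour, assuming each contour point chooses among a finite set of candidate positions; `ext_energy` and `smooth` stand in for the image and internal energy terms and are illustrative names, not the paper's.

```python
def snake_dp(candidates, ext_energy, smooth):
    """One DP pass for a discrete (open) active contour.

    candidates[i] : list of possible positions for contour point i
    ext_energy(p) : external (image) energy at position p
    smooth(p, q)  : internal energy between consecutive positions
    Returns (globally optimal total energy, optimal position sequence).
    """
    n = len(candidates)
    # best[i][j]: minimal energy of points 0..i with point i at candidates[i][j]
    best = [[ext_energy(p) for p in candidates[0]]]
    back = []  # back-pointers for recovering the optimal path
    for i in range(1, n):
        row, ptr = [], []
        for p in candidates[i]:
            costs = [best[i - 1][k] + smooth(q, p)
                     for k, q in enumerate(candidates[i - 1])]
            k_min = min(range(len(costs)), key=costs.__getitem__)
            row.append(costs[k_min] + ext_energy(p))
            ptr.append(k_min)
        best.append(row)
        back.append(ptr)
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][j]
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return total, [candidates[i][path[i]] for i in range(n)]
```

Hard constraints of the kind the paper emphasizes can be enforced simply by omitting forbidden candidates or assigning them infinite `smooth` cost.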
Article
This paper discusses how local measurements of positions and surface normals may be used to identify and locate overlapping objects. The objects are modeled as polyhedra (or polygons) having up to six degrees of positional freedom relative to the sensors. The approach operates by examining all hypotheses about pairings between sensed data and object surfaces and efficiently discarding inconsistent ones by using local constraints on: distances between faces, angles between face normals, and angles (relative to the surface normals) of vectors between sensed points. The method described here is an extension of a method for recognition and localization of nonoverlapping parts previously described in [18] and [15].
Article
This paper presents a stereo matching algorithm using the dynamic programming technique. The stereo matching problem, that is, obtaining a correspondence between right and left images, can be cast as a search problem. When a pair of stereo images is rectified, pairs of corresponding points can be searched for within the same scanlines. We call this search intra-scanline search. This intra-scanline search can be treated as the problem of finding a matching path on a two-dimensional (2D) search plane whose axes are the right and left scanlines. Vertically connected edges in the images provide consistency constraints across the 2D search planes. Inter-scanline search in a three-dimensional (3D) search space, which is a stack of the 2D search planes, is needed to utilize this constraint. Our stereo matching algorithm uses edge-delimited intervals as elements to be matched, and employs the above mentioned two searches: one is inter-scanline search for possible correspondences of connected edges in right and left images and the other is intra-scanline search for correspondences of edge-delimited intervals on each scanline pair. Dynamic programming is used for both searches which proceed simultaneously: the former supplies the consistency constraint to the latter while the latter supplies the matching score to the former. An interval-based similarity metric is used to compute the score. The algorithm has been tested with different types of images including urban aerial images, synthesized images, and block scenes, and its computational requirement has been discussed.
Article
A large class of problems can be formulated in terms of the assignment of labels to objects. Frequently, processes are needed which reduce ambiguity and noise, and select the best label among several possible choices. Relaxation labeling processes are just such a class of algorithms. They are based on the parallel use of local constraints between labels. This paper develops a theory to characterize the goal of relaxation labeling. The theory is founded on a definition of consistency in labelings, extending the notion of constraint satisfaction. In certain restricted circumstances, an explicit functional exists that can be maximized to guide the search for consistent labelings. This functional is used to derive a new relaxation labeling operator. When the restrictions are not satisfied, the theory relies on variational calculus. It is shown that the problem of finding consistent labelings is equivalent to solving a variational inequality. A procedure nearly identical to the relaxation operator derived under restricted circumstances serves in the more general setting. Further, a local convergence result is established for this operator. The standard relaxation labeling formulas are shown to approximate our new operator, which leads us to conjecture that successful applications of the standard methods are explainable by the theory developed here. Observations about convergence and generalizations to higher order compatibility relations are described.
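One step of the standard relaxation-labeling update that this theory analyzes (the Rosenfeld-Hummel-Zucker form, with compatibilities assumed to lie in [-1, 1]) can be sketched as:

```python
def relaxation_step(p, r):
    """One standard relaxation-labeling update.

    p[i][l]        : current probability that object i carries label l
    r[i][j][l][lp] : compatibility of (i, l) with (j, lp), in [-1, 1]
    Returns the updated, renormalized probability table.
    """
    n, L = len(p), len(p[0])
    new_p = []
    for i in range(n):
        # support q[l]: averaged compatibility-weighted evidence from neighbors
        q = [sum(r[i][j][l][lp] * p[j][lp]
                 for j in range(n) if j != i
                 for lp in range(L)) / max(n - 1, 1)
             for l in range(L)]
        raw = [p[i][l] * (1.0 + q[l]) for l in range(L)]
        z = sum(raw) or 1.0  # renormalize so labels of object i sum to 1
        new_p.append([v / z for v in raw])
    return new_p
```

With mutually supportive compatibilities, repeated application sharpens the initially ambiguous labeling toward a consistent one.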
Article
This paper presents an efficient model-based recognition method to recognize 2-D objects and to obtain correspondences between models and scene boundaries with a subpixel positioning error. As a shape signature for a contour, we propose a descriptor consisting of five-point invariants, which are used to index a hash table. Also, we propose a projective refinement as a verification method to compute exact correspondences between models and scene contour points. This method repeatedly computes a projective transformation using a weighted pseudo-inverse. We present an error model for the five-point invariants, which is used to define a similarity between two descriptors, to determine a search bound for indexing, and to obtain the weights in the projective refinement. In experiments using seriously distorted images of forty models, this method led to the recognition of planar curved objects. A transformation using the correspondence between the model and scene points on contours was also obtained.
Article
A single intensity image of a three-dimensional (3D) object obtained under perspective projection can be reduced to a two-dimensional (2D) line drawing which contains patterns characteristic of the object and of its pose. These patterns can provide clues to narrow down the search for possible objects and poses when comparing 2D view-class models of 3D objects against images. This paper describes a general way of representing 2D line patterns and of using the patterns to consistently label 2D models of 3D objects. The representation is based on groups of three line segments that are likely to be found in most images containing man-made objects, but are unlikely to occur by accident. Experimental results using representative patterns to match 2D view-class models of 3D objects against real images of the objects are included.
Article
A computer vision system has been implemented that can recognize three-dimensional objects from unknown viewpoints in single gray-scale images. Unlike most other approaches, the recognition is accomplished without any attempt to reconstruct depth information bottom-up from the visual input. Instead, three other mechanisms are used that can bridge the gap between the two-dimensional image and knowledge of three-dimensional objects. First, a process of perceptual organization is used to form groupings and structures in the image that are likely to be invariant over a wide range of viewpoints. Second, a probabilistic ranking method is used to reduce the size of the search space during model-based matching. Finally, a process of spatial correspondence brings the projections of three-dimensional models into direct correspondence with the image by solving for unknown viewpoint and model parameters. A high level of robustness in the presence of occlusion and missing data can be achieved through full application of a viewpoint consistency constraint. It is argued that similar mechanisms and constraints form the basis for recognition in human vision.
Article
An optimization approach to invariant matching is proposed. In this approach, an object or a pattern is invariantly represented by an object-centred description called an attributed relational structure (ARS) embedding invariant properties and relations between the primitives of the pattern such as line segments and points. Noise effect is taken into account such that a scene can consist of noisy sub-parts of a model. The matching is then to find the optimal mapping between the ARSs of the scene and the model. A gain functional is formulated to measure the goodness of fit and is to be maximized by using the relaxation labelling method. Experiments are shown to illustrate the matching algorithm and to demonstrate that the approach is truly invariant to arbitrary translations, rotations, and scale changes under noise.
Article
Finding line segments in an intensity image has been one of the most fundamental issues in computer vision. In complex scenes, it is hard to detect the locations of point features; line features are more robust and provide greater positional accuracy. In this paper we present a robust line feature extraction algorithm which extracts line features in a single pass, without using any assumptions or constraints. Our algorithm consists of six steps: (1) edge extraction, (2) edge scanning, (3) edge normalization, (4) line-blob extraction, (5) line-feature computation, and (6) line linking. By using edge scanning, the computational complexity caused by the large number of edge pixels is drastically reduced. Edge normalization reduces the local quantization error induced by partitioning the gradient space and minimizes perturbations of edge orientation. We also analyze the effects of edge processing, and of the least squares-based and principal axis-based methods, on the computation of line orientation. We show the algorithm's efficiency on real images.
Article
A new method for the recognition of arbitrary two-dimensional (2D) shapes is described. It is based on string edit distance computation. The recognition method is invariant under translation, rotation, scaling and partial occlusion. A set of experiments are described demonstrating the robustness and reliability of the proposed approach.
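The paper's exact edit costs are not given; a standard Levenshtein distance over, say, quantized boundary-angle codes illustrates the mechanism. Rotation invariance can then be obtained by minimizing the distance over cyclic shifts of one string, and partial occlusion shows up as a localized run of insertions/deletions rather than a global mismatch.

```python
def edit_distance(a, b, sub_cost=1, indel_cost=1):
    """Levenshtein distance between two shape strings
    (e.g. sequences of quantized boundary-angle codes)."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel_cost
    for j in range(1, m + 1):
        d[0][j] = j * indel_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + indel_cost,  # delete a[i-1]
                d[i][j - 1] + indel_cost,  # insert b[j-1]
                d[i - 1][j - 1] + (0 if a[i - 1] == b[j - 1] else sub_cost),
            )
    return d[n][m]
```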
Article
A stereo algorithm is presented that optimizes a maximum likelihood cost function. The maximum likelihood cost function assumes that corresponding features in the left and right images are normally distributed about a common true value and consists of a weighted squared error term if two features are matched or a (fixed) cost if a feature is determined to be occluded. The stereo algorithm finds the set of correspondences that maximize the cost function subject to ordering and uniqueness constraints. The stereo algorithm is independent of the matching primitives. However, for the experiments described in this paper, matching is performed on the individual pixel intensities. Contrary to popular belief, the pixel-based stereo appears to be robust for a variety of images. It also has the advantages of (i) providing a dense disparity map, (ii) requiring no feature extraction, and (iii) avoiding the adaptive windowing problem of area-based correlation methods. Because feature extraction and windowing are unnecessary, a very fast implementation is possible. Experimental results reveal that good stereo correspondences can be found using only ordering and uniqueness constraints, i.e., without local smoothness constraints. However, it is shown that the original maximum likelihood stereo algorithm exhibits multiple global minima. The dynamic programming algorithm is guaranteed to find one, but not necessarily the same one for each epipolar scanline, causing erroneous correspondences which are visible as small local differences between neighboring scanlines. Traditionally, regularization, which modifies the original cost function, has been applied to the problem of multiple global minima. We developed several variants of the algorithm that avoid classical regularization while imposing several global cohesiveness constraints.
We believe this is a novel approach that has the advantage of guaranteeing that solutions minimize the original cost function and preserve discontinuities. The constraints are based on minimizing the total number of horizontal and/or vertical discontinuities along and/or between adjacent epipolar lines, and local smoothing is avoided. Experiments reveal that minimizing the sum of the horizontal and vertical discontinuities provides the most accurate results. A high percentage of correct matches and very little smearing of depth discontinuities are obtained. An alternative to imposing cohesiveness constraints to reduce the correspondence ambiguities is to use more than two cameras. We therefore extend the two-camera maximum likelihood approach to N cameras. The N-camera stereo algorithm determines the "best" set of correspondences between a given pair of cameras, referred to as the principal cameras. Knowledge of the relative positions of the cameras allows the 3D point hypothesized by an assumed correspondence of two features in the principal pair to be projected onto the image plane of the remaining N − 2 cameras. These N − 2 points are then used to verify proposed matches. Not only does the algorithm explicitly model occlusion between features of the principal pair, but the possibility of occlusions in the N − 2 additional views is also modeled. Previous work did not model this occlusion process, the benefits and importance of which are experimentally verified. Like other multiframe stereo algorithms, the computational and memory costs of this approach increase linearly with each additional view. Experimental results are shown for two outdoor scenes. It is clearly demonstrated that the number of correspondence errors is significantly reduced as the number of views/cameras is increased.
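The core two-camera recurrence with a fixed occlusion cost can be sketched for a single scanline; using the squared intensity difference as the match cost is an illustrative choice, not necessarily the paper's weighting.

```python
def scanline_stereo(left, right, occ=1.0):
    """Scanline matching with a fixed occlusion cost, by dynamic programming.

    left, right : 1-D intensity sequences for one rectified epipolar scanline
    occ         : fixed cost of declaring a pixel occluded
    The DP recurrence itself enforces the ordering constraint; each pixel
    is either matched once or occluded, giving uniqueness. Returns the
    minimal total cost.
    """
    n, m = len(left), len(right)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i and j:  # match left[i-1] with right[j-1]
                D[i][j] = min(D[i][j], D[i - 1][j - 1] + (left[i - 1] - right[j - 1]) ** 2)
            if i:        # left pixel occluded
                D[i][j] = min(D[i][j], D[i - 1][j] + occ)
            if j:        # right pixel occluded
                D[i][j] = min(D[i][j], D[i][j - 1] + occ)
    return D[n][m]
```

A full implementation would keep back-pointers to recover the disparity map; only the cost is computed here for brevity.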
Conference Paper
Object recognition from sensory data involves, in part, determining the pose of a model with respect to a scene. A common method for finding an object's pose is the generalized Hough transform, which accumulates evidence for possible coordinate transformations in a parameter space whose axes are the quantized transformation parameters. Large clusters of similar transformations in that space are taken as evidence of a correct match. A theoretical analysis of the behavior of such methods is presented. The authors derive bounds on the set of transformations consistent with each pairing of data and model features, in the presence of noise and occlusion in the image. Bounds are provided on the likelihood of false peaks in the parameter space, as a function of noise, occlusion, and tessellation effects. It is argued that haphazardly applying such methods to complex recognition tasks is risky, as the probability of false positives can be very high
Article
Authors offer a combined descriptive scheme and decision metric which is general, intuitively satisfying, and which has led to promising experimental results, and present an algorithm which takes the above descriptions, together with a matrix representing the intensities of the actual photograph, and then finds the described object in the matrix. The algorithm uses a procedure similar to dynamic programming in order to cut down on the vast amount of computation otherwise necessary. A new programming system does not need to be written for every new description; instead, one just specifies descriptions in terms of a certain set of primitives and parameters. There are many areas of application: scene analysis and description, map matching for navigation and guidance, optical tracking, stereo compilation, and image change detection.
Article
Object recognition from sensory data involves, in part, determining the pose of a model with respect to a scene. A common method for finding an object's pose is the generalized Hough transform, which accumulates evidence for possible coordinate transformations in a parameter space whose axes are the quantized transformation parameters. Large clusters of similar transformations in that space are taken as evidence of a correct match. In this article, we provide a theoretical analysis of the behavior of such methods. We derive bounds on the set of transformations consistent with each pairing of data and model features, in the presence of noise and occlusion in the image. We also provide bounds on the likelihood of false peaks in the parameter space, as a function of noise, occlusion, and tessellation effects. We argue that blithely applying such methods to complex recognition tasks is a risky proposition, as the probability of false positives can be very high.
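For the translation-only case, the voting scheme under analysis can be sketched as follows. Clutter pairings scatter single votes across the parameter space, which is exactly the false-peak risk the article bounds; the bin size plays the role of the tessellation parameter.

```python
from collections import Counter

def hough_translation(model_pts, scene_pts, bin_size=1.0):
    """Translation-only generalized Hough transform.

    Every pairing of a model point with a scene point votes for the
    translation that would align them; the bin with the most votes is
    taken as the pose hypothesis.
    Returns ((tx, ty), vote_count) for the winning bin.
    """
    votes = Counter()
    for mx, my in model_pts:
        for sx, sy in scene_pts:
            tx = round((sx - mx) / bin_size)  # quantize into parameter bins
            ty = round((sy - my) / bin_size)
            votes[(tx, ty)] += 1
    (tx, ty), count = votes.most_common(1)[0]
    return (tx * bin_size, ty * bin_size), count
```

In the example below, three model points shifted by (5, 3) plus one clutter point produce a clear peak of three votes; with heavier clutter and coarser bins, accidental peaks of comparable height become likely, as the analysis warns.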
Article
This paper describes an algorithm that robustly locates convex collections of line segments in an image. The algorithm is guaranteed to find all convex sets of line segments in which the length of the line segments accounts for at least some fixed proportion of the length of their convex hull. This enables the algorithm to find convex groups whose contours are partially occluded or missing due to noise. We perform an expected-case analysis of the algorithm's performance that shows that its run time is O(n² log(n) + nm) when we wish to find the m most salient groups in an image with n line segments. We support this analysis with experiments on real data. Our analysis not only reveals the circumstances under which our algorithm is efficient, but also tells us when the groups found are unlikely to occur at random, and so are likely to capture the underlying structure of a scene. We also demonstrate the grouping system as a module in an efficient recognition system that combines group...
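A minimal version of the salience criterion, assuming each candidate group is given as a set of segments (endpoint pairs): the ratio of summed segment length to convex-hull perimeter approaches 1 as the segments trace most of a convex contour, and falls toward 0 for sparse or accidental groupings.

```python
import math

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in CCW order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def salience(segments):
    """Ratio of summed segment length to the perimeter of the convex hull
    of all segment endpoints (1.0 = segments cover the whole hull)."""
    length = sum(math.dist(a, b) for a, b in segments)
    hull = convex_hull([p for seg in segments for p in seg])
    perim = sum(math.dist(hull[i], hull[(i + 1) % len(hull)])
                for i in range(len(hull)))
    return length / perim
```

The paper's algorithm searches for groups exceeding a fixed salience threshold; this sketch only evaluates the criterion for a given group.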