Figure 4 - uploaded by Armand Joulin
An example of a curve expansion move for two nodes, b and r. The blue curve corresponds to the nodes (n_b^λ)_{1 ≤ λ ≤ N_l} ob-

Source publication
Conference Paper
Full-text available
This paper addresses the problem of category-level image classification. The underlying image model is a graph whose nodes correspond to a dense set of regions, and edges reflect the underlying grid structure of the image and act as springs to guarantee the geometric consistency of nearby regions during matching. A fast approximate algorithm for ma...

Context in source publication

Context 1
... can be solved optimally using Ishikawa’s method. We then solve a sequence of vertical and horizontal moves. The local minimum obtained by this procedure is lower than 2(√N_l)^(N_n) configurations, where N_l is the number of labels and N_n the number of nodes. By comparison, the minimum obtained by alpha expansion is only guaranteed to be lower than N_l·2^(N_n) other configurations [5]. The procedure proposed in the previous section only allows vertical and horizontal moves. Let us now show how to extend it to more complicated moves. Ishikawa’s method reaches the global minimum of functions verifying condition (4). It can be extended to more general binary terms by replacing (4) with a more general condition; this is a direct consequence of the proof in [20]. With this condition, we can handle binary functions which do not only depend on pairwise label differences. This allows us to use more complicated moves than horizontal or vertical displacements. We thus propose the following algorithm: at each step t, we consider an ordered list of P_t possible distortions D̃^t = [d̃^1, d̃^2, ..., d̃^(P_t)]. Given the nodes n with current distortions d_n^t, we update these nodes by solving the corresponding labeling problem for 1 ≤ n ≤ N. The updated distortion of node n is then d_n^(t+1) ← d_n^t + d̃_n^t. For example, the vertical move, Eq. (6), consists of distortions d̃ of the form (dx, dy) = (0, k) for k ∈ {−(K−1)/2, ..., (K−1)/2}. In practice, we construct a graph inspired by the one constructed for Ishikawa’s method. An example is shown in Figure 4 with two nodes. Note that the set of all possible distortions D can be different for each node, which gives N different D_n’s. The only constraint is that all the (D_n)_{1 ≤ n ≤ N} should be increasing (or decreasing) in y and increasing (or decreasing) in x.

The proposed approach to graph matching has been implemented in C++. In this section we compare its running time to competing algorithms, before presenting image matching results and a comparative evaluation with the state of the art in image classification tasks on standard benchmarks (Caltech 101, Caltech 256 and Scenes).

We compare here the running times of our curve expansion algorithm to alpha expansion and TRW-S. For alpha expansion, we use the C++ implementation of Boykov et al. [5, 25]. For TRW-S, we use the C++ implementation of Kolmogorov [24]. For the multi-step curve expansion we use four different moves (horizontal, vertical and diagonal moves). All experiments are performed on a single 2.4 GHz processor with 4 GB of RAM. We take 100 random pairs of images from Caltech 101 and run the four algorithms on increasing grid sizes. The results of our comparison are shown in Figure 5. The 2-step and multi-step curve expansions are much faster than alpha expansion and TRW-S for grids with up to 1000 nodes or so. However, empirically, their complexity in the number of nodes is higher than alpha expansion’s, which makes them slower for graphs with more than 4000 nodes. In terms of average minimization performance, 2-step curve expansion is similar to alpha expansion, whereas the multi-step curve expansion with 4 moves and TRW-S improve the results by 2% and 5%, respectively. However, these improvements have empirically little influence on the overall process. Indeed, for categorization, a coarse matching seems to be enough to obtain high categorization performance.
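To make the structure of the move-making loop concrete, here is a minimal C++ sketch of the multi-step curve expansion driver described above. It is an illustration under stated assumptions only: the names (Distortion, UnaryCost, springCost, solveMoveGreedy, curveExpansion), the L1 spring cost, and the greedy ICM-style per-move update are placeholders invented for this sketch; the actual method solves each move exactly with an Ishikawa-style graph construction.

```cpp
// Sketch of the multi-step curve expansion driver (illustration only).
// The exact per-move subproblem is solved with an Ishikawa-style graph cut
// in the paper; here a greedy ICM-style update stands in for it.
#include <cmath>
#include <cstdlib>
#include <functional>
#include <vector>

struct Distortion { int dx = 0, dy = 0; };

// Hypothetical unary cost: appearance dissimilarity of node n displaced by d.
using UnaryCost = std::function<double(int /*node*/, Distortion /*d*/)>;

// Pairwise "spring" cost between the distortions of two neighboring nodes.
double springCost(const Distortion& a, const Distortion& b, double lambda) {
  return lambda * (std::abs(a.dx - b.dx) + std::abs(a.dy - b.dy));
}

// Greedy stand-in for one move: each node picks, in turn, the offset from
// the move set that most lowers its local energy (neighbors held fixed).
void solveMoveGreedy(std::vector<Distortion>& d,
                     const std::vector<std::vector<int>>& neighbors,
                     const std::vector<Distortion>& moveSet,
                     const UnaryCost& unary, double lambda) {
  for (int n = 0; n < static_cast<int>(d.size()); ++n) {
    Distortion best = d[n];
    double bestE = 1e300;
    for (const Distortion& m : moveSet) {
      Distortion cand{d[n].dx + m.dx, d[n].dy + m.dy};
      double e = unary(n, cand);
      for (int nb : neighbors[n]) e += springCost(cand, d[nb], lambda);
      if (e < bestE) { bestE = e; best = cand; }
    }
    d[n] = best;  // d_n^(t+1) <- d_n^t + chosen move
  }
}

// Multi-step curve expansion: alternate horizontal, vertical and diagonal
// move sets with offsets k in {-(K-1)/2, ..., (K-1)/2}.
void curveExpansion(std::vector<Distortion>& d,
                    const std::vector<std::vector<int>>& neighbors,
                    const UnaryCost& unary, int K, double lambda, int steps) {
  std::vector<std::vector<Distortion>> moveSets(4);
  for (int k = -(K - 1) / 2; k <= (K - 1) / 2; ++k) {
    moveSets[0].push_back({k, 0});   // horizontal
    moveSets[1].push_back({0, k});   // vertical
    moveSets[2].push_back({k, k});   // diagonal
    moveSets[3].push_back({k, -k});  // anti-diagonal
  }
  for (int t = 0; t < steps; ++t)
    solveMoveGreedy(d, neighbors, moveSets[t % 4], unary, lambda);
}
```

In the paper's setting each such move is solved exactly rather than greedily, which is why running time, discussed next, is the practical concern.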
Thus, the real issue in this context is time, and we prefer to use the 2-step curve expansion, which matches two images in less than 0.04 seconds for 500 nodes. Other methods have been developed for approximate graph matching [2, 23, 27], but their running time is prohibitive for our application. Berg et al. [2] match two graphs with 50 nodes in 5 seconds, and Leordeanu et al. [27] match 130 points in 9 seconds. Kim and Grauman [23] propose a string matching algorithm which takes around 10 seconds for 4800 nodes and around 1 second to match 500 nodes.

To illustrate image matching, we use a finer grid than in our actual image classification experiments to show the level of localization accuracy that can be achieved. We fix a 30 × 40 grid with maximum allowed displacement K = 15. In Figure 6, we show the influence of the parameter μ, which penalizes crossings between matches (a toy version of such a penalty is sketched after this excerpt). On the left panel of the figure, where crossings are not penalized, some parts of the image are duplicated, whereas, when crossings are forbidden (right panel), the deformed picture retains the original image structure yet still matches the model well. For our categorization experiments, we choose a value of this parameter in between (middle panel). Figure 7 shows some matching results for images of the same category, similar to Figure 1 (more examples can be seen in the supplementary material).

We test our algorithm on the publicly available Caltech 101, Caltech 256, and Scenes datasets. We fix λ = 0.1, μ = 0.5 and K = 11 for all our experiments on all the databases. We have tried two grid sizes, 18 × 24 and 24 × 32, and have consistently obtained better results (by 1 to 2%) using the coarser grid, so we only show the corresponding results in this section. Our algorithm is robust to the choice of K (as long as K is at least 11). The values of λ and μ have been chosen by looking at the resulting matching on a single pair of images. Obviously, using parameters adapted to each database and selected by cross-validation would have led to better performance. Since time is a major issue when dealing with large databases, we use the 2-step curve expansion instead of the multi-step version.

Caltech 101. Like others, we report results for two different training set sizes (15 or 30 images), and report the average performance over 20 random splits. Our results are compared to those of other methods based on graph matching [1, 2, 23] in Table 1, which shows that we obtain classification rates that are better by more than 12%. We also compare our results to the state of the art on Caltech 101 in Table 2. Our algorithm outperforms all competing methods for 15 training examples, and is the third performer overall, behind Yang et al. [36] and Todorovic and Ahuja [33], for 30 examples. Note that our method is the top performer among algorithms using a single type of feature for both 15 and 30 training examples.

Caltech 256. Our results are compared with the state of the art for this dataset in Table 3. They are similar to those obtained by methods using a single feature [3, 23], but not as good as those using multiple features ([3] with 5 descriptors, [33]).

Scenes. A comparison with the state of the art on this dataset is given in Table 4. Our method is the second best performer, behind Boureau et al. [4]. This result is expected since our method is designed to recognize objects with a fairly consistent spatial layout (at least for some range of viewpoints).
In contrast, scenes are composed of many different elements that move freely in space. We have presented a new approach to object categorization that formulates image matching as an energy optimization problem defined over graphs associated with a coarse image grid, presented an efficient algorithm for optimizing this energy function and constructing the corresponding image comparison kernel, and demonstrated results that ...
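The parameter μ in the excerpt above penalizes crossings between the matches of nearby grid nodes. The sketch below shows one plausible way such a penalty could be computed; the crossing test, the neighbor edge set, and the function names are assumptions made for illustration, since the excerpt does not give the paper's exact definition.

```cpp
// Sketch of a crossing penalty between matches of neighboring grid nodes
// (illustration only; the exact definition is not given in the excerpt).
#include <utility>
#include <vector>

struct Match {
  int x, y;    // node position in the source grid
  int dx, dy;  // displacement assigned to this node
};

// Two matches "cross" along an axis if their order is reversed after
// displacement, e.g. x_i < x_j but x_i + dx_i > x_j + dx_j.
bool crosses(const Match& a, const Match& b) {
  const bool xFlip = (a.x - b.x) * ((a.x + a.dx) - (b.x + b.dx)) < 0;
  const bool yFlip = (a.y - b.y) * ((a.y + a.dy) - (b.y + b.dy)) < 0;
  return xFlip || yFlip;
}

// Total crossing penalty: mu charged for each crossing pair of neighbors.
double crossingPenalty(const std::vector<Match>& matches,
                       const std::vector<std::pair<int, int>>& edges,
                       double mu) {
  double penalty = 0.0;
  for (const auto& e : edges)
    if (crosses(matches[e.first], matches[e.second])) penalty += mu;
  return penalty;
}
```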

Similar publications

Article
Full-text available
In this paper, a graph-based handwritten Tifinagh character recognition system is presented. In preprocessing, the Zhang-Suen algorithm is enhanced. For feature extraction, a novel key point extraction algorithm is presented. Images are then represented by adjacency matrices defining graphs where nodes represent feature points extracted by a novel alg...
Article
Full-text available
The sample mean is one of the most fundamental concepts in statistics. Properties of the sample mean that are well-defined in Euclidean spaces become unclear in graph spaces. This paper proposes conditions under which the following properties are valid: existence, uniqueness, and consistency of means, the midpoint property, necessary conditions of...
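For reference, the sample mean of a set of graphs is usually posed as a Fréchet mean with respect to a graph metric; the sketch below is that standard formulation, which the existence, uniqueness, and consistency questions above presumably concern (the paper's exact definition may differ).

```latex
% Frechet (sample) mean of graphs X_1, ..., X_n in a graph space (\mathcal{G}, d),
% where d is a graph metric such as a graph edit distance:
\bar{X} \in \operatorname*{arg\,min}_{Z \in \mathcal{G}} \; \frac{1}{n} \sum_{i=1}^{n} d(X_i, Z)^2
% Existence, uniqueness, and consistency of the sample mean are precisely
% questions about this minimizer.
```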

Citations

... Graph matching has been discussed in pattern recognition and computer vision for decades [38]–[41]. In recent years, research on deep learning-based graph matching has attracted increasing attention. ...
Preprint
The current point cloud registration methods are mainly based on geometric information and usually ignore the semantic information in the point clouds. In this paper, we treat the point cloud registration problem as a semantic instance matching and registration task, and propose a deep semantic graph matching method for large-scale outdoor point cloud registration. Firstly, the semantic category labels of 3D point clouds are obtained by a large-scale point cloud semantic segmentation network. Adjacent points with the same category labels are then clustered together using the Euclidean clustering algorithm to obtain semantic instances. Secondly, a semantic adjacency graph is constructed based on the spatial adjacency relations of the semantic instances. Three kinds of high-dimensional features, including geometric shape features, semantic categorical features and spatial distribution features, are learned through a graph convolutional network and enhanced with an attention mechanism. Thirdly, the semantic instance matching problem is modeled as an optimal transport problem and solved through an optimal matching layer. Finally, according to the matched semantic instances, the geometric transformation matrix between the two point clouds is first obtained by the SVD algorithm and then refined by the ICP algorithm. The experiments are conducted on the KITTI Odometry dataset, and the average relative translation error and average relative rotation error of the proposed method are 6.6 cm and 0.229° respectively.
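The closed-form "SVD algorithm" step mentioned at the end of this abstract usually refers to the standard Kabsch/Umeyama construction. Below is a hedged C++ sketch of that generic step using Eigen, not the authors' code; the function name and interface are invented for illustration, and ICP would then refine the result.

```cpp
// Standard SVD-based (Kabsch) estimate of the rigid transform aligning
// matched point pairs (p_i -> q_i); a sketch of the usual closed form,
// not the cited paper's implementation. Requires Eigen.
#include <Eigen/Dense>
#include <vector>

void rigidFromCorrespondences(const std::vector<Eigen::Vector3d>& P,
                              const std::vector<Eigen::Vector3d>& Q,
                              Eigen::Matrix3d& R, Eigen::Vector3d& t) {
  // Centroids of the two matched sets.
  Eigen::Vector3d cp = Eigen::Vector3d::Zero(), cq = Eigen::Vector3d::Zero();
  for (size_t i = 0; i < P.size(); ++i) { cp += P[i]; cq += Q[i]; }
  cp /= double(P.size());
  cq /= double(Q.size());

  // Cross-covariance of the centered correspondences.
  Eigen::Matrix3d H = Eigen::Matrix3d::Zero();
  for (size_t i = 0; i < P.size(); ++i)
    H += (P[i] - cp) * (Q[i] - cq).transpose();

  // SVD of H; the reflection check keeps R a proper rotation (det = +1).
  Eigen::JacobiSVD<Eigen::Matrix3d> svd(H, Eigen::ComputeFullU | Eigen::ComputeFullV);
  Eigen::Matrix3d D = Eigen::Matrix3d::Identity();
  D(2, 2) = (svd.matrixV() * svd.matrixU().transpose()).determinant() < 0 ? -1.0 : 1.0;
  R = svd.matrixV() * D * svd.matrixU().transpose();
  t = cq - R * cp;
}
```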
... Traditional methods [71,21,104,53,55,28] explicitly incorporated hand-designed prior terms, such as total variation (TV) or discontinuity-aware smoothness, to achieve smoother correspondence. However, the formulation of the hand-crafted prior term is notoriously challenging and may vary depending on the specific dense correspondence task, such as geometric matching [55,23,44] or optical flow [103,74]. ...
Preprint
Full-text available
The objective for establishing dense correspondence between paired images consists of two terms: a data term and a prior term. While conventional techniques focused on defining hand-designed prior terms, which are difficult to formulate, recent approaches have focused on learning the data term with deep neural networks without explicitly modeling the prior, assuming that the model itself has the capacity to learn an optimal prior from a large-scale dataset. The performance improvement was obvious; however, these methods often fail to address inherent ambiguities of matching, such as textureless regions, repetitive patterns, and large displacements. To address this, we propose DiffMatch, a novel conditional diffusion-based framework designed to explicitly model both the data and prior terms. Unlike previous approaches, this is accomplished by leveraging a conditional denoising diffusion model. DiffMatch consists of two main components: a conditional denoising diffusion module and a cost injection module. We stabilize the training process and reduce memory usage with a stage-wise training strategy. Furthermore, to boost performance, we introduce an inference technique that finds a better path to the accurate matching field. Our experimental results demonstrate significant performance improvements of our method over existing approaches, and the ablation studies validate our design choices along with the effectiveness of each component. Project page is available at https://ku-cvlab.github.io/DiffMatch/.
... Finding visual correspondences [42], [43] across related images is a fundamental task in computer vision. It has seen a variety of applications in areas such as scene understanding [44], object detection [45], and semantic correspondence [46]–[49]. ...
Preprint
Few-shot semantic segmentation is the task of learning to locate each pixel of the novel class in the query image with only a few annotated support images. The current correlation-based methods construct pair-wise feature correlations to establish the many-to-many matching because the typical prototype-based approaches cannot learn fine-grained correspondence relations. However, the existing methods still suffer from the noise contained in naive correlations and the lack of context semantic information in correlations. To alleviate the problems mentioned above, we propose a Feature-Enhanced Context-Aware Network (FECANet). Specifically, a feature enhancement module is proposed to suppress the matching noise caused by inter-class local similarity and enhance the intra-class relevance in the naive correlation. In addition, we propose a novel correlation reconstruction module that encodes extra correspondence relations between foreground and background and multi-scale context semantic features, significantly boosting the encoder to capture a reliable matching pattern. Experiments on PASCAL-$5^i$ and COCO-$20^i$ datasets demonstrate that our proposed FECANet leads to remarkable improvement compared to previous state-of-the-art methods, demonstrating its effectiveness.
... Thereby, each object's type label is deduced from those of its spatial neighbors. Duchenne et al. [11] propose a kernel machine for labeling object categories derived from graph matching. Lin et al. [40] propose a semantic parsing algorithm based on an object-aware parse graph. ...
Article
Full-text available
Accurately assigning labels to high-resolution images is an essential task in remote sensing. In this paper, we propose a novel image classification model that represents each aerial image by optimally encoding a gaze shifting path (GSP), while allowing incorrect semantic labels to be discarded. More specifically, for each aerial image, we detect visually/semantically salient object regions. To encode their appearance attributes, we build a small graph composed of spatially adjacent salient regions and extract GSPs on it using an active learning algorithm. A GSP can accurately capture human perception over aerial image regions as attention shifts within each image. Subsequently, a deep learning framework is proposed to exploit the semantics of these GSPs, in which three components are seamlessly integrated: i) label noise reduction, ii) appearance-invariant semantics, and iii) adaptive data graph updates. The proposed framework is solved iteratively, with each graphlet reformulated into a basis. Finally, the GSP-based summaries of each aerial image are quantized into vectors for visual understanding. We qualitatively and quantitatively assess how GSPs affect aerial image classification, and observe that 1) the proposed classification model is more accurate than its competitors, and 2) the GSPs produced by Alzheimer's patients are distinguishable from those produced by typical observers, making the classification competitive.
... Graph matching has been widely studied in computer vision and pattern recognition [54], [55], [56], [57], [58]. Recently, learning-based graph matching has attracted considerable research interest [27], [59], [60], but, to the best of our knowledge, there is no research on using learning-based graph matching to solve the point cloud registration problem. ...
Preprint
3D point cloud registration is a fundamental problem in computer vision and robotics. Recently, learning-based point cloud registration methods have made great progress. However, these methods are sensitive to outliers, which lead to more incorrect correspondences. In this paper, we propose a novel deep graph matching-based framework for point cloud registration. Specifically, we first transform point clouds into graphs and extract deep features for each point. Then, we develop a module based on deep graph matching to calculate a soft correspondence matrix. By using graph matching, not only the local geometry of each point but also its structure and topology in a larger range are considered in establishing correspondences, so that more correct correspondences are found. We train the network with a loss directly defined on the correspondences, and in the test stage the soft correspondences are transformed into hard one-to-one correspondences so that registration can be performed by a correspondence-based solver. Furthermore, we introduce a transformer-based method to generate edges for graph construction, which further improves the quality of the correspondences. Extensive experiments on object-level and scene-level benchmark datasets show that the proposed method achieves state-of-the-art performance. The code is available at: https://github.com/fukexue/RGM.
... Graph matching is a classical problem in computer graphics and computer vision, which aims at computing the optimal correspondences between the nodes of two graphs. It is widely used in many applications, such as shape retrieval [2], object recognition [9] and stereo matching [20]. ...
Article
Full-text available
Graph matching is a fundamental NP-hard problem in computer graphics and computer vision. In this work, we present an approximate graph matching method. Given two graphs to be matched, the proposed method first constructs an association graph to convert the problem of graph matching into a problem of selecting nodes on the constructed graph. The nodes of the association graph represent candidate correspondences between the two original graphs. An affinity matrix is then computed based on the local, intermediate and global information of the original graphs’ nodes, each element of which is used to measure the mutual consistency of a correspondence pair within the association graph. Updating the affinity of each correspondence pair with the affinities of relevant correspondences, our method then utilizes the reweighted random walks strategy to simulate random walks on the association graph and to iteratively obtain a quasi-stationary distribution. Finally, our method applies the Hungarian algorithm to discretize the distribution. Experimental results on four common datasets verify the effectiveness of the proposed method.
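To make the association-graph idea concrete, here is a simplified C++ sketch: a plain power iteration over the pairwise affinity matrix followed by a greedy one-to-one discretization. It deliberately omits the reweighting step and the Hungarian discretization described in the abstract, and the function names are illustrative only.

```cpp
// Simplified association-graph matching sketch (illustration only):
// power iteration over an affinity matrix, then greedy one-to-one
// discretization. The reweighting and Hungarian steps described in the
// abstract are omitted.
#include <algorithm>
#include <cmath>
#include <utility>
#include <vector>

// x[a] is the confidence of candidate correspondence a (a node of the
// association graph); M[a][b] is the affinity between correspondences a and b.
std::vector<double> powerIteration(const std::vector<std::vector<double>>& M,
                                   int iters = 50) {
  const size_t n = M.size();
  std::vector<double> x(n, 1.0 / double(n)), y(n, 0.0);
  for (int it = 0; it < iters; ++it) {
    double norm = 0.0;
    for (size_t a = 0; a < n; ++a) {
      y[a] = 0.0;
      for (size_t b = 0; b < n; ++b) y[a] += M[a][b] * x[b];
      norm += y[a] * y[a];
    }
    norm = std::sqrt(norm);
    for (size_t a = 0; a < n; ++a) x[a] = (norm > 0.0) ? y[a] / norm : 0.0;
  }
  return x;
}

// Greedy discretization: repeatedly accept the highest-scoring candidate
// whose endpoints are still unmatched. cand[a] = {i, j} matches node i of
// graph 1 to node j of graph 2.
std::vector<std::pair<int, int>> greedyDiscretize(
    const std::vector<double>& x,
    const std::vector<std::pair<int, int>>& cand) {
  int maxRow = 0, maxCol = 0;
  for (const auto& c : cand) {
    maxRow = std::max(maxRow, c.first);
    maxCol = std::max(maxCol, c.second);
  }
  std::vector<bool> used(x.size(), false);
  std::vector<bool> rowTaken(maxRow + 1, false), colTaken(maxCol + 1, false);
  std::vector<std::pair<int, int>> result;
  while (true) {
    int best = -1;
    for (size_t a = 0; a < x.size(); ++a)
      if (!used[a] && !rowTaken[cand[a].first] && !colTaken[cand[a].second] &&
          (best < 0 || x[a] > x[best]))
        best = static_cast<int>(a);
    if (best < 0) break;
    used[best] = true;
    rowTaken[cand[best].first] = true;
    colTaken[cand[best].second] = true;
    result.push_back(cand[best]);
  }
  return result;
}
```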
... Graph matching (GM) aims to find a mapping between the nodes of two graphs, such that nodes or edges with similar attributes are likely to be matched. GM is widely used to establish feature correspondence among sets of graph-structured data to accomplish various computer vision and pattern recognition tasks [45,7,30,36], such as shape analysis [3], image retrieval [43], object categorization [10,34], and structure from motion [38]. In these applications, a graph represents real-world data (e.g., speeded up robust features descriptor and shape context) as nodes that are connected by edges. ...
... In the literature, the formulation of an objective function ...
Article
Graph matching (GM) plays a vital role in solving various computer vision tasks by establishing a node-to-node correspondence between graph-structured data. In this study, we propose a novel method based on an affinity matrix for solving GM tasks. Specifically, a pruning strategy is proposed to remove inconsistent edges from an association graph, which formulates the GM problem as a node ranking and selection problem. Moreover, edge context information with respect to a cross-graph reference structure is proposed and combined with local distance information to compute the affinities among the graphs’ nodes. Then, the re-weighted random walk technique is utilized to simulate random walks on the association graph and compute a quasi-stationary distribution in an iterative manner. Finally, the Hungarian algorithm is used to discretize the distribution and obtain an approximate matching between two graphs. The experimental results indicate that our proposed GM method is able to produce promising results in the presence of outliers and deformation noise as compared with other methods.
... Sdika [19] proposed a new non-rigid unimodal image registration algorithm based on B-splines, and adopted a method combining the multiplier method and the L-BFGS algorithm to effectively handle the large number of variables and constraints in 3D image registration. Some papers similar to [20][21][22] solve the category-level classification problem, but this type of method still relies on traditional, non-trainable classic methods to guide matching. The classic approach to finding correspondences extracts local key points and matches local descriptors computed around the extracted points [2][3][4][5][6]. ...
Article
Full-text available
In view of the multi-scale changes and the influence of lighting and viewing angle in the image matching process, it is quite difficult to realize intelligent image registration using convolutional neural networks. Existing image matching algorithms have the following problems in practice: shallow feature extraction models lose much effective feature information and have low recognition accuracy, and image registration methods based on deep learning are not robust and accurate enough. Therefore, an image registration method based on an additive margin cosine loss is proposed in this paper. In the twin network architecture, a cosine loss is used to map Euclidean space into an angular space, which eliminates the influence of feature magnitude and improves registration accuracy. The matching cost is directly calculated from the angle between two vectors in the embedding space, where the size of the angular margin can be quantitatively adjusted through the parameter m. We further derive a specific m to quantitatively adjust the loss. In addition, an anti-rotation attention mechanism is added to the network to enhance feature extraction and adjust the position information of feature vectors, reducing mismatches caused by image rotation.
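As an illustration of how an additive margin m and a scale s can act on cosine similarities, here is a hedged C++ sketch of a CosFace-style objective over match candidates. The cited paper's exact loss is not given in the abstract, so the formulation, function name, and default values below are assumptions.

```cpp
// CosFace-style additive-margin cosine loss over match candidates
// (an illustration of how a margin m and scale s act on cosine
// similarities; not necessarily the cited paper's exact formulation).
#include <cmath>
#include <vector>

// cosSims[k] is the cosine similarity between the query descriptor and
// candidate k (descriptors assumed L2-normalized); `target` is the index
// of the correct match. Returns the negative log-likelihood.
double additiveMarginCosineLoss(const std::vector<double>& cosSims,
                                int target, double m = 0.35, double s = 30.0) {
  double denom = 0.0, targetLogit = 0.0;
  for (size_t k = 0; k < cosSims.size(); ++k) {
    // Subtract the margin only from the correct candidate's cosine,
    // forcing it to win by at least m before scaling by s.
    double logit = s * (static_cast<int>(k) == target ? cosSims[k] - m
                                                      : cosSims[k]);
    if (static_cast<int>(k) == target) targetLogit = logit;
    denom += std::exp(logit);
  }
  return -(targetLogit - std::log(denom));
}
```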
... Semantic correspondence is one of the fundamental problems in computer vision, with many applications in object recognition (Duchenne et al. 2011; Liu et al. 2011), ... (Dale et al. 2009), semantic segmentation (Kim et al. 2013), and scene parsing (Zhou et al. 2015a), to name a few. The goal is to establish dense correspondences across images containing objects or scenes of the same category. ...
Article
Full-text available
Dense correspondence across semantically related images has been extensively studied, but still faces two challenges: 1) large variations in appearance, scale and pose exist even for objects from the same category, and 2) labeling pixel-level dense correspondences is labor intensive and infeasible to scale. Most existing methods focus on designing various matching modules using fully-supervised ImageNet pretrained networks. On the other hand, while a variety of self-supervised approaches are proposed to explicitly measure image-level similarities, correspondence matching at the pixel level remains under-explored. In this work, we propose a multi-level contrastive learning approach for semantic matching, which does not rely on any ImageNet pretrained model. We show that image-level contrastive learning is a key component to encourage the convolutional features to find correspondence between similar objects, while the performance can be further enhanced by regularizing cross-instance cycle-consistency at intermediate feature levels. Experimental results on the PF-PASCAL, PF-WILLOW, and SPair-71k benchmark datasets demonstrate that our method performs favorably against the state-of-the-art approaches. The source code and trained models will be made available to the public.
... Among these digital features, we find fingerprints [22,161], retina [71] and the face [140]. In computer vision, graph matching is used for image analysis and image databases for both indexing and retrieval [30,2,19,19,64]. Moreover, it is also used in video analysis for object tracking [123,95,150,159,56]. ...
Thesis
Graph Pattern Matching (GPM), usually evaluated through subgraph isomorphism, finds subgraphs of a large data graph that are similar to an input query graph. It has many applications, such as pattern recognition and finding communities in social networks. However, besides its NP-completeness, the strict constraints of subgraph isomorphism make it impractical for GPM in the context of big data. As a result, relaxed GPM models such as graph simulation emerged as they yield interesting results in polynomial time. Moreover, massive graphs, generated mostly by social networks, require distributed storage and processing of the data over multiple machines. Therefore, the existing algorithms for relaxed GPM need to be revised for this context by adopting new paradigms for big graph processing, e.g. Think-Like-A-Vertex and its derivatives. In this thesis, we investigate the use of distributed graph processing paradigms and systems in the evaluation of GPM queries. Our goal is to identify the programming models that are best suited for this problem. Furthermore, we study the existing GPM approaches, with more emphasis on the relaxed ones, with the aim of proposing new parallel and distributed algorithms for relaxed GPM that guarantee linear scalability. Our contributions are summarized as follows. First, we propose a taxonomy of prior work on distributed GPM based on multiple criteria, such as the GPM model and the programming paradigm. Next, we introduce BDSim as a new model that captures more semantic similarities compared to the existing models while being feasible in cubic time. Besides, we design distributed vertex-centric algorithms that are adapted to the context of massive graphs for evaluating BDSim. Furthermore, we propose the first fully distributed and scalable approach for strong simulation, a relaxed GPM model that strikes a balance between flexibility and tractability. Finally, we propose the first efficient parallel edge-centric approach for evaluating graph simulation and dual simulation in distributed graphs. We validate the effectiveness and efficiency of our approaches through theoretical guarantees and reliable testing over synthetic and real-world graphs. We confirmed in this thesis that different paradigms can be used in designing distributed GPM algorithms depending on the GPM model adopted. Indeed, algorithms for neighborhood-based models such as subgraph isomorphism and strong simulation perform better with a vertex-centric or subgraph-centric paradigm as the latter involves some data locality, while the most efficient algorithms for graph simulation and dual simulation are edge-based and offer linear scalability guarantees.
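To illustrate the relaxed GPM model referred to here, below is a hedged, single-machine C++ sketch of plain graph simulation (the classical fixpoint refinement). It is not the distributed BDSim, strong simulation, or dual simulation variants developed in the thesis, and the type and function names are illustrative.

```cpp
// Minimal single-machine graph simulation (fixpoint refinement) sketch.
// For every query edge (u, u'), each surviving candidate v of u must have
// at least one successor in the candidate set of u'. Not the distributed
// BDSim / strong simulation variants developed in the thesis.
#include <set>
#include <string>
#include <vector>

struct Graph {
  std::vector<std::string> label;            // label of each node
  std::vector<std::vector<int>> successors;  // adjacency lists (directed)
};

// Returns sim[u] = set of data nodes that simulate query node u
// (sets may end up empty if the query has no match).
std::vector<std::set<int>> graphSimulation(const Graph& query, const Graph& data) {
  const int nq = static_cast<int>(query.label.size());
  const int nd = static_cast<int>(data.label.size());
  std::vector<std::set<int>> sim(nq);

  // Initialize candidates by label equality.
  for (int u = 0; u < nq; ++u)
    for (int v = 0; v < nd; ++v)
      if (query.label[u] == data.label[v]) sim[u].insert(v);

  // Refine to a fixpoint: drop candidates that lack a required successor.
  bool changed = true;
  while (changed) {
    changed = false;
    for (int u = 0; u < nq; ++u) {
      for (int uPrime : query.successors[u]) {
        for (auto it = sim[u].begin(); it != sim[u].end();) {
          bool hasWitness = false;
          for (int vPrime : data.successors[*it])
            if (sim[uPrime].count(vPrime)) { hasWitness = true; break; }
          if (!hasWitness) { it = sim[u].erase(it); changed = true; }
          else ++it;
        }
      }
    }
  }
  return sim;
}
```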