The graph matching problem

Abstract

In this paper, we survey the state of the art of the graph matching problem, regarded as the most important element in the definition of inductive inference engines for graph-based pattern recognition applications. We review both methodological and algorithmic results, focusing on inexact graph matching procedures. We consider different classes of graphs, roughly distinguished by the complexity of the labels defined on vertices and edges. Emphasis is given to the methodological aspects underlying each identified research branch. A selection of inexact graph matching algorithms is presented and concisely described, with the aim of illustrating significant instances of each graph matching methodology commonly considered in the technical literature.
... A spectral framework for computing GGDs. Before computing GGDs, it is necessary to establish the node-to-node correspondence between two graphs. This can be achieved by leveraging existing graph-matching techniques [33,15,8]. In this work, we will leverage a recent spectral graph matching framework that has been shown to recover accurate matching with high probability [16]. ...
... Computing the GGD metric between two input graphs requires solving a graph matching problem in advance. Graph matching techniques can be used to establish node-to-node correspondence by seeking a bijection between node sets to maximize the alignment of edge sets [33,15,8]. This combinatorial optimization problem can be cast into a Quadratic Assignment Problem, which is NP-hard to solve or approximate [16,53]. ...
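The bijection-based formulation in the excerpt above can be illustrated with a minimal sketch. It assumes two small undirected graphs given as 0/1 adjacency matrices of equal size and searches every bijection exhaustively, which is only feasible for a handful of nodes. The function names and the toy graphs are hypothetical and are not taken from the cited works.

```python
from itertools import permutations


def edge_overlap(adj_a, adj_b, perm):
    """Quadratic-assignment-style objective: sum of A[i][j] * B[perm[i]][perm[j]]."""
    n = len(adj_a)
    return sum(
        adj_a[i][j] * adj_b[perm[i]][perm[j]]
        for i in range(n)
        for j in range(n)
    )


def match_graphs(adj_a, adj_b):
    """Try every bijection between node sets and keep the best-scoring one.

    The search visits n! permutations, so this is an illustration only.
    """
    n = len(adj_a)
    best = max(
        permutations(range(n)),
        key=lambda p: edge_overlap(adj_a, adj_b, p),
    )
    return best, edge_overlap(adj_a, adj_b, best)


# Hypothetical toy input: the same 3-node path with two different labelings.
PATH_A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # node 1 is the middle node
PATH_B = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]  # node 0 is the middle node

if __name__ == "__main__":
    perm, score = match_graphs(PATH_A, PATH_B)
    print(perm, score)
```

In the returned tuple, `perm[i]` is the node of the second graph matched to node `i` of the first. For symmetric adjacency matrices each aligned edge is counted twice in the score.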
Preprint
This paper presents a spectral framework for assessing the generalization and stability of Graph Neural Networks (GNNs) by introducing a Graph Geodesic Distance (GGD) metric. For two different graphs with the same number of nodes, our framework leverages a spectral graph matching procedure to find node correspondence so that the geodesic distance between them can be subsequently computed by solving a generalized eigenvalue problem associated with their Laplacian matrices. For graphs with different sizes, a resistance-based spectral graph coarsening scheme is introduced to reduce the size of the bigger graph while preserving the original spectral properties. We show that the proposed GGD metric can effectively quantify dissimilarities between two graphs by encapsulating their differences in key structural (spectral) properties, such as effective resistances between nodes, cuts, the mixing time of random walks, etc. Through extensive experiments comparing with the state-of-the-art metrics, such as the latest Tree-Mover's Distance (TMD) metric, the proposed GGD metric shows significantly improved performance for stability evaluation of GNNs especially when only partial node features are available.
... However, decoding introduces node order differences between G and G′. Thus, a graph matching procedure (NP-hard (Livi and Rizzi 2013)) between the two is necessary. ...
Article
Counterfactual Explanation (CE) techniques have garnered attention as a means to provide insights to the users engaging with AI systems. While extensively researched in domains such as medical imaging and autonomous vehicles, Graph Counterfactual Explanation (GCE) methods have been comparatively under-explored. GCEs generate a new graph similar to the original one, with a different outcome grounded on the underlying predictive model. Among these GCE techniques, those rooted in generative mechanisms have received relatively limited investigation despite demonstrating impressive accomplishments in other domains, such as artistic styles and natural language modelling. The preference for generative explainers stems from their capacity to generate counterfactual instances during inference, leveraging autonomously acquired perturbations of the input graph. Motivated by the rationales above, our study introduces RSGG-CE, a novel Robust Stochastic Graph Generator for Counterfactual Explanations able to produce counterfactual examples from the learned latent space considering a partially ordered generation sequence. Furthermore, we undertake quantitative and qualitative analyses to compare RSGG-CE's performance against SoA generative explainers, highlighting its increased ability to engender plausible counterfactual candidates.
... Pattern recognition is an important research area (Foggia, Percannella, and Vento 2014) due to its numerous applications ranging from detecting bad code patterns (Piotrowski and Madeyski 2020) and software analysis (Park et al. 2010; Singh et al. 2021; Zou et al. 2020) in general, to computational biology (Carletti, Foggia, and Vento 2013; Zaslavskiy, Bach, and Vert 2009). One fundamental problem in pattern recognition is graph matching (Livi and Rizzi 2013). Two common approaches are: (i) graph isomorphism (Cordella et al. 1999; Dahm et al. 2012; Ullmann 1976, 2010; Larrosa and Valiente 2002; Zampelli, Deville, and Solnon 2010) and (ii) approximated graph matching (Bunke 1997; Raymond and Willett 2002; Sanfeliu and Fu 1983). ...
Article
Graph matching is a fundamental problem in pattern recognition, with many applications such as software analysis and computational biology. One well-known type of graph matching problem is graph isomorphism, which consists of deciding if two graphs are identical. Despite its usefulness, the properties that one may check using graph isomorphism are rather limited, since it only allows strict equality checks between two graphs. For example, it does not allow one to check complex structural properties such as if the target graph is an arbitrary length sequence followed by an arbitrary size loop. We propose a generalization of graph isomorphism that allows one to check such properties through a declarative specification. This specification is given in the form of a Regular Graph Pattern (ReGaP), a special type of graph, inspired by regular expressions, that may contain wildcard nodes that represent arbitrary structures such as variable-sized sequences or subgraphs. We propose a SAT-based algorithm for checking if a target graph matches a given ReGaP. We also propose a preprocessing technique for improving the performance of the algorithm and evaluate it through an extensive experimental evaluation on benchmarks from the CodeSearchNet dataset.
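The strict equality check that the abstract attributes to graph isomorphism can be sketched as follows. This is a brute-force illustration, assuming simple undirected graphs without self-loops, with nodes numbered from 0 to n-1. It is not the SAT-based algorithm proposed in the paper, and the function name is hypothetical.

```python
from itertools import permutations


def are_isomorphic(edges_a, edges_b, n):
    """Return True if some relabeling of the n nodes maps edge set A onto edge set B.

    Edges are (u, v) pairs of an undirected simple graph (no self-loops).
    The search visits up to n! relabelings, so it only suits tiny graphs.
    """
    set_a = {frozenset(e) for e in edges_a}
    set_b = {frozenset(e) for e in edges_b}
    if len(set_a) != len(set_b):
        return False  # different edge counts can never match
    for perm in permutations(range(n)):
        mapped = {frozenset((perm[u], perm[v])) for u, v in edges_a}
        if mapped == set_b:
            return True
    return False


if __name__ == "__main__":
    cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
    relabeled_cycle = [(0, 2), (2, 1), (1, 3), (3, 0)]
    star = [(0, 1), (0, 2), (0, 3)]
    path = [(0, 1), (1, 2), (2, 3)]
    print(are_isomorphic(cycle, relabeled_cycle, 4))
    print(are_isomorphic(star, path, 4))
```

The check answers only yes or no for an exact structural match, which is the limitation that motivates the more expressive patterns described in the abstract.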
... The topic of self-correspondence of network nodes is also related to the problem of graph matching or graph correspondence [3]. This challenge encompasses identifying the most accurate, or even perfect, node matching within the same network or across multiple networks. ...
Preprint
Several interesting situations involve two networks containing the same set of nodes, but which are interconnected in potentially distinct manners. The two networks can be the same network, a network and a modified version of it, or two related but inherently distinct networks, such as a citation network and a co-authorship network that share the same node labels (authors) but are not necessarily interconnected in the same manner. A particularly interesting problem implied by these situations concerns the quantification of the similarity between the topological measurements of corresponding nodes in the two networks. When a unique maximum similarity is observed between a node in one network and its corresponding node in the other network, we say that the node self-corresponds. We approach this issue in terms of the coincidence similarity, which is a multiset enhancement of the Jaccard similarity index allowing enhanced selectivity and sensitivity. Surprising results are reported for several related effects observed in both model-theoretical and real-world networks (world trade over the years), concerning topological changes as well as dependence on the network parameters. The presented concepts, methods and results pave the way to many theoretical and applied developments, including the characterization of the robustness of networked structures.
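The self-correspondence test described above can be sketched as follows. The abstract does not give the formula of the coincidence similarity, so this sketch substitutes the plain multiset Jaccard index (sum of element-wise minima over sum of element-wise maxima) as a stand-in. The feature vectors and function names are hypothetical, and each node is assumed to be described by a vector of non-negative topological measurements.

```python
def multiset_jaccard(x, y):
    """Multiset Jaccard index of two non-negative vectors of equal length."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    numerator = sum(min(a, b) for a, b in zip(x, y))
    denominator = sum(max(a, b) for a, b in zip(x, y))
    # Two all-zero vectors are treated as identical.
    return 1.0 if denominator == 0 else numerator / denominator


def self_corresponding_nodes(features_a, features_b):
    """Indices i whose counterpart i in the other network is the unique best match."""
    result = []
    for i, fa in enumerate(features_a):
        sims = [multiset_jaccard(fa, fb) for fb in features_b]
        top = max(sims)
        if sims[i] == top and sims.count(top) == 1:
            result.append(i)
    return result


if __name__ == "__main__":
    # Hypothetical per-node measurements, e.g. (degree, number of triangles).
    net_a = [[1, 2], [3, 1], [2, 2]]
    net_b = [[1, 2], [2, 2], [2, 2]]
    print(self_corresponding_nodes(net_a, net_b))
```

A node whose best match is tied with another node is not counted, which follows the requirement of a unique maximum in the text.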
Book
This book of proceedings gathers the contributions presented at the 6th URV Doctoral Workshop in Computer Science and Mathematics. The main aim of this workshop is to promote the dissemination of the ideas, methods and results developed in the doctoral theses of the students of this doctoral program, and to promote knowledge sharing, collaboration and discussion between their respective research groups.
Article
Classical graph matching aims to find a node correspondence between two unlabeled graphs of known topologies. This problem has a wide range of applications, from matching identities in social networks to identifying similar biological network functions across species. However, when the underlying graphs are unknown, the use of conventional graph matching methods requires inferring the graph topologies first, a process that is highly sensitive to observation errors. In this paper, we tackle the blind graph matching problem with unknown underlying graphs directly using observations of graph signals, which are generated from graph filters applied to graph signal excitations. We propose to construct sample covariance matrices from the observed signals and match the nodes based on the selected sample eigenvectors. Our analysis shows that the blind matching outcome converges to the result obtained with known graph topologies when the signal sampling size is large and the signal noise is small. Numerical results showcase the performance improvement of the proposed algorithm compared to matching two estimated underlying graphs learned from the graph signals.
Chapter
Characterizations in terms of fractals are typically employed for systems with complex and multiscale descriptions. A prominent example of such systems is provided by the human brain, which can be idealized as a complex dynamical system made of many interacting subunits. The human brain can be modeled in terms of observable variables together with their spatio-temporal-functional relations. Computational intelligence is a research field bridging many nature-inspired computational methods, such as artificial neural networks, fuzzy systems, and evolutionary and swarm intelligence optimization techniques. Typical problems faced by means of computational intelligence methods include those of recognition, such as classification and prediction. Although historically conceived to operate in some vector space, such methods have been recently extended to the so-called nongeometric spaces, considering labeled graphs as the most general example of such patterns. Here, we suggest that fractal analysis and computational intelligence methods can be exploited together in neuroscience research. Fractal characterizations can be used to (i) assess scale-invariant properties and (ii) offer numeric, feature-based representations to complement the usually more complex pattern structures encountered in neurosciences. Computational intelligence methods could be used to exploit such fractal characterizations, considering also the possibility to perform data-driven analysis of nongeometric input spaces, thereby overcoming the intrinsic limits related to Euclidean geometry.
Book
Numerical Optimization presents a comprehensive and up-to-date description of the most effective methods in continuous optimization. It responds to the growing interest in optimization in engineering, science, and business by focusing on the methods that are best suited to practical problems. For this new edition the book has been thoroughly updated throughout. There are new chapters on nonlinear interior methods and derivative-free methods for optimization, both of which are used widely in practice and the focus of much current research. Because of the emphasis on practical methods, as well as the extensive illustrations and exercises, the book is accessible to a wide audience. It can be used as a graduate text in engineering, operations research, mathematics, computer science, and business. It also serves as a handbook for researchers and practitioners in the field. The authors have strived to produce a text that is pleasant to read, informative, and rigorous - one that reveals both the beautiful nature of the discipline and its practical side.
Article
The support-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimension feature space. In this feature space a linear decision surface is constructed. Special properties of the decision surface ensure high generalization ability of the learning machine. The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors. We here extend this result to non-separable training data. High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated. We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
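The idea of mapping inputs non-linearly into a feature space can be made concrete with a small sketch. It assumes two-dimensional inputs and a homogeneous degree-2 polynomial transformation, chosen here only for illustration, and shows that the dot product in the explicit feature space equals the squared dot product of the original inputs. All names are hypothetical.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def poly2_feature_map(x):
    """Explicit degree-2 feature map for a 2-D input (x1, x2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly2_kernel(x, y):
    """Homogeneous polynomial kernel of degree 2, evaluated in input space."""
    return dot(x, y) ** 2


if __name__ == "__main__":
    x, y = (1.0, 2.0), (3.0, 4.0)
    explicit = dot(poly2_feature_map(x), poly2_feature_map(y))
    implicit = poly2_kernel(x, y)
    print(explicit, implicit, math.isclose(explicit, implicit))
```

Because the two quantities agree, a linear decision surface in the feature space can be built from kernel evaluations alone, without ever forming the mapped vectors.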
Article
The tutorial starts with an overview of the concepts of VC dimension and structural risk minimization. We then describe linear Support Vector Machines (SVMs) for separable and non-separable data, working through a non-trivial example in detail. We describe a mechanical analogy, and discuss when SVM solutions are unique and when they are global. We describe how support vector training can be practically implemented, and discuss in detail the kernel mapping technique which is used to construct SVM solutions which are nonlinear in the data. We show how Support Vector machines can have very large (even infinite) VC dimension by computing the VC dimension for homogeneous polynomial and Gaussian radial basis function kernels. While very high VC dimension would normally bode ill for generalization performance, and while at present there exists no theory which shows that good generalization performance is guaranteed for SVMs, there are several arguments which support the observed high accuracy of SVMs, which we review. Results of some experiments which were inspired by these arguments are also presented. We give numerous examples and proofs of most of the key theorems. There is new material, and I hope that the reader will find that even old material is cast in a fresh light.
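How a kernel yields a solution that is nonlinear in the data can be sketched with the usual kernel decision function, f(x) = sum_i alpha_i * y_i * K(s_i, x) + b, using a Gaussian radial basis function kernel. The support vectors, multipliers, bias and kernel width below are hypothetical values picked by hand for illustration. They are not the result of training and do not come from the tutorial.

```python
import math


def rbf_kernel(x, y, sigma=1.0):
    """Gaussian RBF kernel: exp(-||x - y||^2 / (2 * sigma^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def decision_value(x, support_vectors, labels, alphas, bias, sigma=1.0):
    """Kernel expansion: sum of alpha_i * y_i * K(s_i, x), plus the bias."""
    return bias + sum(
        alpha * label * rbf_kernel(sv, x, sigma)
        for sv, label, alpha in zip(support_vectors, labels, alphas)
    )


def classify(x, support_vectors, labels, alphas, bias, sigma=1.0):
    """Predict +1 or -1 from the sign of the decision value."""
    value = decision_value(x, support_vectors, labels, alphas, bias, sigma)
    return 1 if value >= 0 else -1


# Hypothetical, hand-picked model parameters (not obtained by training).
SUPPORT_VECTORS = [(0.0, 0.0), (2.0, 2.0)]
LABELS = [1, -1]
ALPHAS = [1.0, 1.0]
BIAS = 0.0

if __name__ == "__main__":
    for point in [(0.1, 0.1), (1.9, 2.0)]:
        print(point, classify(point, SUPPORT_VECTORS, LABELS, ALPHAS, BIAS))
```

In a trained machine the multipliers and bias would come from solving the optimization problem that the tutorial describes. Only the form of the decision function is shown here.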