Conference Paper

Image Webs: Computing and Exploiting Connectivity in Image Collections


Abstract

The widespread availability of digital cameras and ubiquitous Internet access have facilitated the creation of massive image collections. These collections can be highly interconnected through implicit links between image pairs viewing the same or similar objects. We propose building graphs called Image Webs to represent such connections. While earlier efforts studied local neighborhoods of such graphs, we are interested in understanding global structure and exploiting connectivity at larger scales. We show how to efficiently construct Image Webs that capture the connectivity in an image collection using spectral graph theory. Our technique can link together tens of thousands of images in a few minutes using a computer cluster. We also demonstrate applications for exploring collections based on global topological analysis.
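As a rough illustration of the connectivity analysis the abstract describes, the sketch below builds a graph from already-verified matching image pairs and inspects it with two standard spectral-graph quantities: connected components and the algebraic connectivity. The scipy-based implementation and the toy pair list are illustrative choices, not the authors' actual pipeline.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import connected_components, laplacian

def build_image_web(num_images, verified_pairs):
    """Adjacency matrix of an Image Web from verified matching image pairs."""
    rows, cols = zip(*verified_pairs)
    data = np.ones(len(verified_pairs))
    adj = sp.coo_matrix((data, (rows, cols)), shape=(num_images, num_images))
    adj = (adj + adj.T).tocsr()   # undirected links
    adj.data[:] = 1.0             # binarize duplicate entries
    return adj

def algebraic_connectivity(adj):
    """Second-smallest Laplacian eigenvalue; zero iff the web is disconnected."""
    lap = laplacian(adj).toarray()            # dense is fine at toy scale
    return np.sort(np.linalg.eigvalsh(lap))[1]

pairs = [(0, 1), (1, 2), (2, 0), (3, 4)]      # two connected components
adj = build_image_web(5, pairs)
n_comp, labels = connected_components(adj, directed=False)
print(n_comp, labels, algebraic_connectivity(adj))
```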
... These graphs can identify which pairs of images are likely to match. On the other hand, many image search algorithms, including state-of-the-art ones, follow the classical bag of visual words (BoVW) approach [13]-[23]; these algorithms are also used in large-scale SfM pipelines [7], [11], [24]. BoVW is a quantization method that represents an image by a certain number of visual words. ...
Article
Selecting visually overlapping image pairs without any prior information is an essential task of large-scale structure from motion (SfM) pipelines. To address this problem, many state-of-the-art image retrieval systems adopt the idea of bag of visual words (BoVW) for computing image-pair similarity. In this paper, we present a method for improving image pair selection using BoVW. Our method combines a conventional vector-based approach and a set-based approach. For the set similarity, we introduce a modified version of the Simpson (m-Simpson) coefficient. We show the advantage of this measure over three typical set similarity measures and demonstrate that the combination of vector similarity and the m-Simpson coefficient effectively reduces false positives and increases accuracy. To examine the choice of vocabulary construction, we prepared both a sampled vocabulary on an evaluation dataset and a basic pre-trained vocabulary on a training dataset. In addition, we tested our method on vocabularies of different sizes. Our experimental results show that the proposed method dramatically improves precision scores, especially on the sampled vocabulary, and performs better than the state-of-the-art methods that use pre-trained vocabularies. We further introduce a method to determine the k value of top-k relevant searches for each image and show that it obtains higher precision at the same recall.
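The abstract does not spell out the m-Simpson modification, so the sketch below uses the plain Simpson (overlap) coefficient together with cosine similarity over BoVW histograms; the fusion weight alpha and the helper names are hypothetical.

```python
import numpy as np

def cosine_sim(h1, h2):
    """Vector similarity of (e.g., tf-idf weighted) BoVW histograms."""
    return float(h1 @ h2) / (np.linalg.norm(h1) * np.linalg.norm(h2) + 1e-12)

def simpson(words_a, words_b):
    """Overlap coefficient: |A intersect B| / min(|A|, |B|)."""
    a, b = set(words_a), set(words_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

def pair_score(hist_a, hist_b, words_a, words_b, alpha=0.5):
    # Hypothetical fusion: weighted sum of vector and set similarity.
    return alpha * cosine_sim(hist_a, hist_b) + (1 - alpha) * simpson(words_a, words_b)
```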
... Several works have utilized it in the past for exploring and navigating image collections. For example, "Image Webs" (Heath et al. 2010) discovers corresponding regions between images and uses spectral graph theory to capture the connectivity in an image dataset. The discovered connectivity is then used for revealing global structures in the dataset (such as paths linking images), and for supporting Photo-Tourism-style navigation (Snavely et al. 2006). ...
Article
We present a principled framework for inferring pixel labels in weakly-annotated image datasets. Most previous example-based approaches to computer vision rely on a large corpus of densely labeled images. However, for large, modern image datasets, such labels are expensive to obtain and are often unavailable. We establish a large-scale graphical model spanning all labeled and unlabeled images, then solve it to infer pixel labels jointly for all images in the dataset while enforcing consistent annotations over similar visual patterns. This model requires significantly less labeled data and assists in resolving ambiguities by propagating inferred annotations from images with stronger local visual evidence to images with weaker local evidence. We apply our proposed framework to two computer vision problems, namely image annotation with semantic segmentation, and object discovery and co-segmentation (segmenting multiple images containing a common object). Extensive numerical evaluations and comparisons show that our method consistently outperforms the state of the art in automatic annotation and semantic labeling, while requiring significantly less labeled data. In contrast to previous co-segmentation techniques, our method manages to discover and segment objects well even in the presence of substantial amounts of noise images (images not containing the common object), as is typical for datasets collected from Internet search.
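A toy analogue of the propagation idea, assuming a precomputed affinity matrix between images or regions: this is generic graph label propagation, not the authors' full joint graphical model.

```python
import numpy as np

def propagate_labels(W, Y, n_iter=100, clamp=None):
    """W: (n, n) affinity matrix; Y: (n, k) one-hot seeds, zero rows where unlabeled.
    clamp: optional boolean mask of labeled rows that stay fixed."""
    D_inv = 1.0 / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)
    P = D_inv * W                   # row-stochastic transition matrix
    F = Y.astype(float).copy()
    for _ in range(n_iter):
        F = P @ F                   # diffuse labels to similar nodes
        if clamp is not None:
            F[clamp] = Y[clamp]     # keep seed labels fixed
    return F.argmax(axis=1)         # inferred label per node
```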
... Arguably the most popular topological summary is the persistence diagram (PD), which is a multi-set of points in a 2D plane that quantifies the birth and death times of topological features such as k-dimensional holes or sub-level sets of a function defined on a point cloud [15]. This simple summary has resulted in the adoption of topological methods for various applications [34,47,8,11,10,23,42,48,31]. However, TDA methods suffer from two major limitations. ...
Conference Paper
Topological features such as persistence diagrams and their functional approximations like persistence images (PIs) have been showing substantial promise for machine learning and computer vision applications. This is greatly attributed to the robustness topological representations provide against different types of physical nuisance variables seen in real-world data, such as view-point, illumination, and more. However, key bottlenecks to their large-scale adoption are computational expenditure and the difficulty of incorporating them into a differentiable architecture. We take an important step in this paper to mitigate these bottlenecks by proposing a novel one-step approach to generate PIs directly from the input data. We design two separate convolutional neural network architectures, one designed to take in multi-variate time series signals as input and another that accepts multi-channel images as input. We call these networks Signal PI-Net and Image PI-Net respectively. To the best of our knowledge, we are the first to propose the use of deep learning for computing topological features directly from data. We explore the use of the proposed PI-Net architectures on two applications: human activity recognition using tri-axial accelerometer sensor data and image classification. We demonstrate the ease of fusing PIs into supervised deep learning architectures and a speed-up of several orders of magnitude for extracting PIs from data. Our code is available at https://github.com/anirudhsom/PI-Net.
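A minimal sketch of what an Image PI-Net-style model might look like in PyTorch: a small CNN regressing a flattened persistence image from a multi-channel input image. Layer sizes and the sigmoid output are assumptions; the authors' repository linked above contains the actual architecture.

```python
import torch
import torch.nn as nn

class ImagePINet(nn.Module):
    def __init__(self, in_channels=3, pi_side=50):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
        )
        self.head = nn.Sequential(
            nn.Flatten(), nn.Linear(64 * 16, pi_side * pi_side), nn.Sigmoid()
        )

    def forward(self, x):
        # Output shape: (batch, pi_side * pi_side), a flattened persistence image
        return self.head(self.features(x))

model = ImagePINet()
pi = model(torch.randn(2, 3, 64, 64))  # trained with, e.g., MSE against real PIs
```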
... The core algorithm of the time-evolving summarization solves a "connecting the dots" problem, which has appeared in the literature before in a variety of guises in entity networks [16,30], image collections [27], social networks [15], and document collections [13,37,44,62]. While some of these efforts can be adapted to our problem context, the time-evolving summarization of events emphasizes amalgamating heterogeneous information (text and image features) in the summarization process, whereas the above efforts typically require a stronger connecting thread between nodes of a homogeneous network. ...
Article
A wide variety of publicly available heterogeneous data has provided us with an opportunity to meander through contextual snippets relevant to a particular event or persons of interest. One example of a heterogeneous source is online news articles, where images and text descriptions may co-exist in documents. Many of the images in a news article may contain faces of people, and the names of many of those faces may not appear in the text. An expert on the topic may be able to identify people in images or at least recognize the context of faces who are not widely known. However, it is both difficult and expensive to employ topic experts to label every face in a massive news archive. In this paper, we describe an approach named F2ConText that helps analysts build contextual information, e.g., the named-entity context and geographical context of facial images found within news articles. Our approach extracts facial features of the faces detected in the images of publicly available news articles and learns probabilistic mappings between the features and the contents of the articles in an unsupervised manner. Afterward, it translates the mappings to geographical distributions and generates a contextual template for every face detected in the collection. This paper presents three empirical studies, related to the construction of a context-based genealogy of events, the tracking of a contextual phenomenon over time, and the creation of contextual clusters of faces, to evaluate the effectiveness of the generated contexts.
... For a point (b, d) in the PD, a homological feature appears at scale b and disappears at scale d. Due to the simplicity of PDs, there has been a surge of interest in using persistent homology to summarize high-dimensional complex data, which has resulted in its successful application in several research areas [49,63,14,19,15,31,57,66]. However, applying machine learning (ML) techniques on the space of PDs has always been a challenging task. ...
Chapter
Topological methods for data analysis present opportunities for enforcing certain invariances of broad interest in computer vision, including view-point in activity analysis, articulation in shape analysis, and measurement invariance in non-linear dynamical modeling. The increasing success of these methods is attributed to the complementary information that topology provides, as well as the availability of tools for computing topological summaries such as persistence diagrams. However, persistence diagrams are multi-sets of points, and hence it is not straightforward to fuse them with features used by contemporary machine learning tools like deep nets. In this paper we present theoretically well-grounded approaches to develop novel perturbation-robust topological representations, with the long-term view of making them amenable to fusion with contemporary learning architectures. We term the proposed representations Perturbed Topological Signatures; they live on a Grassmann manifold and hence can be efficiently used in machine learning pipelines. We explore the use of the proposed descriptor on three applications: 3D shape analysis, view-invariant activity analysis, and non-linear dynamical modeling. We show favorable results in both high-level recognition performance and time-complexity when compared to other baseline methods.
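A rough numerical sketch of the stated idea, under assumed parameters: perturb the points of a persistence diagram several times, rasterize each perturbed copy, and keep an orthonormal basis of the stacked rasters as a point on a Grassmann manifold (to be compared, e.g., via principal angles). Grid size, noise scale, and subspace dimension are all illustrative.

```python
import numpy as np

def perturbed_signature(pd_points, grid=32, sigma=0.02, n_perturb=16, dim=5, rng=None):
    """pd_points: (m, 2) array of (birth, death) pairs, assumed scaled to [0, 1]^2."""
    rng = np.random.default_rng(rng)
    maps = []
    for _ in range(n_perturb):
        pts = pd_points + rng.normal(scale=sigma, size=pd_points.shape)
        hist, _, _ = np.histogram2d(pts[:, 0], pts[:, 1],
                                    bins=grid, range=[[0, 1], [0, 1]])
        maps.append(hist.ravel())
    X = np.stack(maps, axis=1)                 # (grid*grid, n_perturb)
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :dim]                          # orthonormal basis = Grassmann point

pd = np.random.rand(40, 2)                     # toy persistence diagram
signature = perturbed_signature(pd)            # compare via principal angles
```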
... A fundamental problem in VLSI circuit design [17] deals with designing a suitable network topology (i.e., the interconnects between the components) such that the specified performance objective is realized. The same problem also appears in disparate disciplines such as coding theory [20], image webs [21], air traffic management [22,23], and free-space optical and communication networks [24], [25]. ...
Preprint
This article deals with a simpler version of an open problem in system realization theory that has several important engineering applications: given a collection of masses, a set of linear springs with specified cost and stiffness, and a resource constraint in the form of a budget on the total cost, determine an optimal connection of masses and springs so that the resulting structure is as stiff as possible, i.e., the structure is connected and its smallest non-zero natural frequency is as large as possible. In this article, algebraic connectivity, or its mechanical analog, the smallest non-zero natural frequency of a connected structure, was chosen as the performance objective. Algebraic connectivity determines the convergence rate of consensus protocols and error attenuation in Unmanned Aerial Vehicle (UAV) formations, and it can be viewed as a measure of robustness of UAV communication networks to random node failures. Underlying the mechanical and UAV network synthesis problems is a Mixed Integer Semi-Definite Program (MISDP), an NP-hard problem. The novel contributions of this article lie in developing efficient algorithms to solve the MISDPs, based on an iterative primal-dual algorithm, cutting-plane methods for polyhedral outer-approximation, capturing the feasible set of MISDPs using Fiedler vectors or Laplacians with equivalency guarantees, and efficient neighborhood-search-based heuristic algorithms.
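Evaluating the chosen performance objective for a fixed candidate design is straightforward; the sketch below computes the algebraic connectivity (second-smallest eigenvalue of the weighted Laplacian) of a given mass-spring connection. Solving the MISDP itself is beyond a few lines and is not attempted here.

```python
import numpy as np

def algebraic_connectivity(n, edges):
    """edges: list of (i, j, stiffness); returns lambda_2 of the weighted Laplacian."""
    L = np.zeros((n, n))
    for i, j, k in edges:
        L[i, i] += k
        L[j, j] += k
        L[i, j] -= k
        L[j, i] -= k
    return np.sort(np.linalg.eigvalsh(L))[1]

design = [(0, 1, 2.0), (1, 2, 1.0), (2, 3, 1.5), (3, 0, 1.0)]
print(algebraic_connectivity(4, design))  # larger = stiffer / more robust design
```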
Article
This paper presents an approach for Structure from Motion (SfM) for unorganized complex image sets. To achieve high accuracy and robustness, image triplets are employed and an (approximate) internal camera calibration is assumed to be known. The complexity of an image set is determined by the camera configurations, which may include wide as well as weak baselines. Wide baselines occur, for instance, when terrestrial images and images from small Unmanned Aerial Systems (UAS) are combined. The resulting large (geometric/radiometric) distortions between images make image matching difficult, possibly leading to an incomplete result. Weak baselines denote an insufficient distance between cameras compared to the distance to the observed scene and give rise to critical camera configurations. Inappropriate handling of such configurations may lead to various problems in triangulation-based SfM, up to total failure. The focus of our approach lies on the complete linking of images even in the case of wide or weak baselines. We do not rely on any additional information such as camera configurations, Global Positioning System (GPS), or an Inertial Navigation System (INS). As a basis for generating suitable triplets to link the images, an iterative graph-based method is employed, formulating image linking as the search for a terminal Steiner minimum tree in the line graph. SIFT (Lowe, 2004) descriptors are embedded into Hamming space for fast image similarity ranking. This is employed to limit the number of pairs to be geometrically verified by a computationally more complex wide-baseline matching method (Mayer et al., 2012). Critical camera configurations which are not suitable for geometric verification are detected by means of classification (Michelini and Mayer, 2019). Additionally, we propose a graph-based approach for optimizing the hierarchical merging of triplets to efficiently generate larger image subsets. By this means, a complete 3D reconstruction of the scene is obtained. Experiments demonstrate that the approach is able to produce reliable orientations for large image sets comprising wide as well as weak baseline configurations.
Chapter
The Gapminder project set out to use statistics to dispel simplistic notions about global development. In the same spirit, we use persistent homology, a technique from computational algebraic topology, to explore the relationship between country development and geography. For each country, four indicators were used to quantify development: gross domestic product per capita, average life expectancy, infant mortality, and gross national income per capita. Two analyses were performed. The first considers clusters of the countries based on these indicators, and the second uncovers cycles in the data when combined with geographic border structure. Our analysis is a multi-scale approach that reveals similarities and connections among countries at a variety of levels. We discover localized development patterns that are invisible to standard statistical methods.
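A hedged sketch of the first analysis: persistence diagrams computed from standardized indicator vectors, with H0 capturing cluster merges and H1 capturing cycles. The ripser package is one common choice, and the data below is synthetic, since the chapter's actual indicator tables are not reproduced here.

```python
import numpy as np
from ripser import ripser

# rows = countries; columns = GDP per capita, life expectancy,
# infant mortality, GNI per capita (standardized) -- synthetic stand-in data
X = np.random.randn(60, 4)
diagrams = ripser(X, maxdim=1)["dgms"]
h0, h1 = diagrams[0], diagrams[1]   # H0 merge events ~ clusters; H1 ~ cycles
print(len(h0), len(h1))
```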
Conference Paper
This paper improves recent methods for large-scale image search. State-of-the-art methods build on the bag-of-features image representation. We first analyze bag-of-features in the framework of approximate nearest neighbor search. This shows the sub-optimality of such a representation for matching descriptors and leads us to derive a more precise representation based on 1) Hamming embedding (HE) and 2) weak geometric consistency constraints (WGC). HE provides binary signatures that refine the matching based on visual words. WGC filters matching descriptors that are not consistent in terms of angle and scale. HE and WGC are integrated within the inverted file and are efficiently exploited for all images, even in the case of very large datasets. Experiments performed on a dataset of one million images show a significant improvement due to the binary signature and the weak geometric consistency constraints, as well as their efficiency. Estimation of the full geometric transformation, i.e., a re-ranking step on a short list of images, is complementary to our weak geometric consistency constraints and allows us to further improve the accuracy.
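An illustrative version of the HE step: descriptors assigned to the same visual word are refined with a binary signature obtained by projecting the descriptor and thresholding against per-word median values, then filtered by Hamming distance. The 64-bit size and the random orthogonal projection follow common practice for HE, but the training of the medians is omitted and the helper names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Fixed 64 x 128 projection: first 64 rows of a random orthogonal matrix
P = np.linalg.qr(rng.normal(size=(128, 128)))[0][:64]

def he_signature(desc, medians):
    """desc: (128,) SIFT descriptor; medians: (64,) thresholds learned per visual word."""
    return (P @ desc > medians).astype(np.uint8)

def he_match(sig_a, sig_b, max_dist=24):
    """Keep a visual-word match only if the Hamming distance is small enough."""
    return int(np.count_nonzero(sig_a != sig_b)) <= max_dist
```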
Article
We propose a novel hashing scheme for image retrieval, clustering, and automatic object discovery. Unlike commonly used bag-of-words approaches, the spatial extent of image features is exploited in our method. The geometric information is used both to construct repeatable hash keys and to increase the discriminability of the description. Each hash key combines visual appearance (visual words) with semi-local geometric information. Compared with the state-of-the-art min-hash, the proposed method has both higher recall (probability of collision for hashes on the same object) and lower false positive rates (random collisions). The advantages of the geometric min-hashing approach are most pronounced in the presence of viewpoint and scale change, significant occlusion, or small physical overlap of the viewing fields. We demonstrate the power of the proposed method on small object discovery in a large unordered collection of images and on a large-scale image clustering problem.
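A toy rendition of combining min-hash with semi-local geometry: each sketch key pairs the min-hashed visual word with the word of its spatially nearest feature, so two images collide only when appearance and local geometry agree. The hash construction here is illustrative, not the paper's exact scheme.

```python
import numpy as np

def min_hash(words, seed, vocab_size=2**16):
    """Min-hash of a set of visual word IDs under a seeded random permutation."""
    perm = np.random.default_rng(seed).permutation(vocab_size)
    return min(words, key=lambda w: perm[w])

def geometric_key(words, positions, seed):
    """words: list of word IDs; positions: (n, 2) feature coordinates."""
    w = min_hash(words, seed)
    i = words.index(w)
    d = np.linalg.norm(positions - positions[i], axis=1)
    d[i] = np.inf
    neighbor = words[int(np.argmin(d))]   # word of the spatially closest feature
    return (w, neighbor)                  # collision requires appearance AND geometry
```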
Article
Persistent homology is an algebraic tool for measuring topological features of shapes and functions. It casts the multi-scale organization we frequently observe in nature into a mathematical formalism. Here we give a record of the short history of persistent homology and present its basic concepts. Besides the mathematics, we focus on algorithms and mention the various connections to applications, including to biomolecules, biological networks, data analysis, and geometric modeling.
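The survey's central definition can be stated compactly; the following LaTeX fragment is a standard formulation, with notation chosen for this note.

```latex
% Given a filtration of a simplicial complex
%   \emptyset = K_0 \subseteq K_1 \subseteq \dots \subseteq K_n = K,
% the inclusions K_i \hookrightarrow K_j induce maps on p-th homology:
\[
  f_p^{i,j} \colon H_p(K_i) \longrightarrow H_p(K_j), \qquad 0 \le i \le j \le n .
\]
% Persistent Betti numbers count the classes alive from step i to step j:
\[
  \beta_p^{i,j} = \operatorname{rank}\bigl(\operatorname{im} f_p^{i,j}\bigr).
\]
% A class born at K_i and dying at K_j is recorded as the point (i, j) of the
% persistence diagram; its persistence is j - i.
```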
Article
We present a system for interactively browsing and exploring large unstructured collections of photographs of a scene using a novel 3D interface. Our system consists of an image-based modeling front end that automatically computes the viewpoint of each photograph as well as a sparse 3D model of the scene and image-to-model correspondences. Our photo explorer uses image-based rendering techniques to smoothly transition between photographs, while also enabling full 3D navigation and exploration of the set of images and world geometry, along with auxiliary information such as overhead maps. Our system also makes it easy to construct photo tours of scenic or historic locations, and to annotate image details, which are automatically transferred to other relevant images. We demonstrate our system on several large personal photo collections as well as images gathered from Internet photo sharing sites.
Article
When a scene is photographed many times by different people, the viewpoints often cluster along certain paths. These paths are largely specific to the scene being photographed, and follow interesting regions and viewpoints. We seek to discover a range of such paths and turn them into controls for image-based rendering. Our approach takes as input a large set of community or personal photos, reconstructs camera viewpoints, and automatically computes orbits, panoramas, canonical views, and optimal paths between views. The scene can then be interactively browsed in 3D using these controls or with six-degree-of-freedom free-viewpoint control. As the user browses the scene, nearby views are continuously selected and transformed, using control-adaptive reprojection techniques.
Conference Paper
This paper introduces a novel colour-based affine covariant region detector. Our algorithm is an extension of the maximally stable extremal region (MSER) to colour. The extension to colour is done by looking at successive time-steps of an agglomerative clustering of image pixels. The selection of time-steps is stabilised against intensity scalings and image blur by modelling the distribution of edge magnitudes. The algorithm contains a novel edge significance measure based on a Poisson image noise model, which we show performs better than the commonly used Euclidean distance. We compare our algorithm to the original MSER detector and a competing colour-based blob feature detector, and show through a repeatability test that our detector performs better. We also extend the state of the art in feature repeatability tests by using scenes consisting of two planes where one is piecewise transparent. This new test is able to evaluate how stable a feature is against changing backgrounds.