Conference Paper

Image Webs: Computing and Exploiting Connectivity in Image Collections


Abstract

The widespread availability of digital cameras and ubiquitous Internet access have facilitated the creation of massive image collections. These collections can be highly interconnected through implicit links between image pairs viewing the same or similar objects. We propose building graphs called Image Webs to represent such connections. While earlier efforts studied local neighborhoods of such graphs, we are interested in understanding global structure and exploiting connectivity at larger scales. We show how to efficiently construct Image Webs that capture the connectivity in an image collection using spectral graph theory. Our technique can link together tens of thousands of images in a few minutes using a computer cluster. We also demonstrate applications for exploring collections based on global topological analysis.
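As a rough illustration of the connectivity analysis the abstract describes, the sketch below builds a graph from already-verified matching image pairs and inspects it with two standard spectral-graph quantities: connected components and the algebraic connectivity. The scipy-based implementation and the toy pair list are illustrative choices, not the authors' actual pipeline.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import connected_components, laplacian

def build_image_web(num_images, verified_pairs):
    """Adjacency matrix of an Image Web from verified matching image pairs."""
    rows, cols = zip(*verified_pairs)
    data = np.ones(len(verified_pairs))
    adj = sp.coo_matrix((data, (rows, cols)), shape=(num_images, num_images))
    adj = (adj + adj.T).tocsr()   # undirected links
    adj.data[:] = 1.0             # binarize duplicate entries
    return adj

def algebraic_connectivity(adj):
    """Second-smallest Laplacian eigenvalue; zero iff the web is disconnected."""
    lap = laplacian(adj).toarray()            # dense is fine at toy scale
    return np.sort(np.linalg.eigvalsh(lap))[1]

pairs = [(0, 1), (1, 2), (2, 0), (3, 4)]      # two connected components
adj = build_image_web(5, pairs)
n_comp, labels = connected_components(adj, directed=False)
print(n_comp, labels, algebraic_connectivity(adj))
```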
... These graphs can identify which pairs of images are likely to match. On the other hand, many image search algorithms, including state-of-the-art ones, follow the classical bag of visual words (BoVW) approach [13]-[23]; these algorithms are also used in large-scale SfM pipelines [7], [11], [24]. BoVW is a quantization method that represents an image by a certain number of visual words. ...
Article
Selecting visually overlapping image pairs without any prior information is an essential task of large-scale structure from motion (SfM) pipelines. To address this problem, many state-of-the-art image retrieval systems adopt the idea of bag of visual words (BoVW) for computing image-pair similarity. In this paper, we present a method for improving image pair selection using BoVW. Our method combines a conventional vector-based approach and a set-based approach. For the set similarity, we introduce a modified version of the Simpson (m-Simpson) coefficient. We show the advantage of this measure over three typical set similarity measures and demonstrate that the combination of vector similarity and the m-Simpson coefficient effectively reduces false positives and increases accuracy. To examine the choice of vocabulary construction, we prepared both a sampled vocabulary on an evaluation dataset and a basic pre-trained vocabulary on a training dataset. In addition, we tested our method on vocabularies of different sizes. Our experimental results show that the proposed method dramatically improves precision scores, especially on the sampled vocabulary, and performs better than the state-of-the-art methods that use pre-trained vocabularies. We further introduce a method to determine the k value of top-k relevant searches for each image and show that it obtains higher precision at the same recall.
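The abstract does not spell out the m-Simpson modification, so the sketch below uses the plain Simpson (overlap) coefficient together with cosine similarity over BoVW histograms; the fusion weight alpha and the helper names are hypothetical.

```python
import numpy as np

def cosine_sim(h1, h2):
    """Vector similarity of (e.g., tf-idf weighted) BoVW histograms."""
    return float(h1 @ h2) / (np.linalg.norm(h1) * np.linalg.norm(h2) + 1e-12)

def simpson(words_a, words_b):
    """Overlap coefficient: |A intersect B| / min(|A|, |B|)."""
    a, b = set(words_a), set(words_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

def pair_score(hist_a, hist_b, words_a, words_b, alpha=0.5):
    # Hypothetical fusion: weighted sum of vector and set similarity.
    return alpha * cosine_sim(hist_a, hist_b) + (1 - alpha) * simpson(words_a, words_b)
```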
... Several works have utilized it in the past for exploring and navigating image collections. For example, "Image Webs" (Heath et al. 2010) discovers corresponding regions between images and uses spectral graph theory to capture the connectivity in an image dataset. The discovered connectivity is then used for revealing global structures in the dataset (such as paths linking images), and for supporting Photo-Tourism-style navigation (Snavely et al. 2006). ...
Article
We present a principled framework for inferring pixel labels in weakly-annotated image datasets. Most previous example-based approaches to computer vision rely on a large corpus of densely labeled images. However, for large, modern image datasets, such labels are expensive to obtain and are often unavailable. We establish a large-scale graphical model spanning all labeled and unlabeled images, then solve it to infer pixel labels jointly for all images in the dataset while enforcing consistent annotations over similar visual patterns. This model requires significantly less labeled data and assists in resolving ambiguities by propagating inferred annotations from images with stronger local visual evidence to images with weaker local evidence. We apply our proposed framework to two computer vision problems, namely image annotation with semantic segmentation, and object discovery and co-segmentation (segmenting multiple images containing a common object). Extensive numerical evaluations and comparisons show that our method consistently outperforms the state of the art in automatic annotation and semantic labeling, while requiring significantly less labeled data. In contrast to previous co-segmentation techniques, our method manages to discover and segment objects well even in the presence of substantial amounts of noise images (images not containing the common object), as is typical for datasets collected from Internet search.
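A toy analogue of the propagation idea, assuming a precomputed affinity matrix between images or regions: this is generic graph label propagation, not the authors' full joint graphical model.

```python
import numpy as np

def propagate_labels(W, Y, n_iter=100, clamp=None):
    """W: (n, n) affinity matrix; Y: (n, k) one-hot seeds, zero rows where unlabeled.
    clamp: optional boolean mask of labeled rows that stay fixed."""
    D_inv = 1.0 / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)
    P = D_inv * W                   # row-stochastic transition matrix
    F = Y.astype(float).copy()
    for _ in range(n_iter):
        F = P @ F                   # diffuse labels to similar nodes
        if clamp is not None:
            F[clamp] = Y[clamp]     # keep seed labels fixed
    return F.argmax(axis=1)         # inferred label per node
```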
... Arguably the most popular topological summary is the persistence diagram (PD), which is a multi-set of points in a 2D plane that quantifies the birth and death times of topological features such as k-dimensional holes or sub-level sets of a function defined on a point cloud [15]. This simple summary has resulted in the adoption of topological methods for various applications [34,47,8,11,10,23,42,48,31]. However, TDA methods suffer from two major limitations. ...
Conference Paper
Topological features such as persistence diagrams and their functional approximations like persistence images (PIs) have been showing substantial promise for machine learning and computer vision applications. This is greatly attributed to the robustness topological representations provide against different types of physical nuisance variables seen in real-world data, such as view-point, illumination, and more. However, key bottlenecks to their large-scale adoption are computational expenditure and the difficulty of incorporating them into a differentiable architecture. We take an important step in this paper to mitigate these bottlenecks by proposing a novel one-step approach to generate PIs directly from the input data. We design two separate convolutional neural network architectures, one designed to take in multi-variate time series signals as input and another that accepts multi-channel images as input. We call these networks Signal PI-Net and Image PI-Net respectively. To the best of our knowledge, we are the first to propose the use of deep learning for computing topological features directly from data. We explore the use of the proposed PI-Net architectures on two applications: human activity recognition using tri-axial accelerometer sensor data and image classification. We demonstrate the ease of fusing PIs into supervised deep learning architectures and a speed-up of several orders of magnitude for extracting PIs from data. Our code is available at https://github.com/anirudhsom/PI-Net.
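A minimal sketch of what an Image PI-Net-style model might look like in PyTorch: a small CNN regressing a flattened persistence image from a multi-channel input image. Layer sizes and the sigmoid output are assumptions; the authors' repository linked above contains the actual architecture.

```python
import torch
import torch.nn as nn

class ImagePINet(nn.Module):
    def __init__(self, in_channels=3, pi_side=50):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
        )
        self.head = nn.Sequential(
            nn.Flatten(), nn.Linear(64 * 16, pi_side * pi_side), nn.Sigmoid()
        )

    def forward(self, x):
        # Output shape: (batch, pi_side * pi_side), a flattened persistence image
        return self.head(self.features(x))

model = ImagePINet()
pi = model(torch.randn(2, 3, 64, 64))  # trained with, e.g., MSE against real PIs
```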
... The core algorithm of the time-evolving summarization solves a "connecting the dots" problem, which has appeared in the literature before in a variety of guises in entity networks [16,30], image collections [27], social networks [15], and document collections [13,37,44,62]. While some of these efforts can be adapted to our problem context, the time-evolving summarization of events emphasizes amalgamating heterogeneous information (text and image features) in the summarization process, whereas the above efforts typically require a stronger connecting thread between nodes of a homogeneous network. ...
Article
A wide variety of publicly available heterogeneous data has provided us with an opportunity to meander through contextual snippets relevant to a particular event or persons of interest. One example of a heterogeneous source is online news articles, where images and text descriptions may co-exist in documents. Many of the images in a news article may contain faces of people, and the names of many of those faces may not appear in the text. An expert on the topic may be able to identify people in images or at least recognize the context of faces who are not widely known. However, it is both difficult and expensive to employ topic experts to label every face in a massive news archive. In this paper, we describe an approach named F2ConText that helps analysts build contextual information, e.g., the named-entity context and geographical context of facial images found within news articles. Our approach extracts facial features of the faces detected in the images of publicly available news articles and learns probabilistic mappings between the features and the contents of the articles in an unsupervised manner. Afterward, it translates the mappings to geographical distributions and generates a contextual template for every face detected in the collection. This paper presents three empirical studies, related to the construction of a context-based genealogy of events, the tracking of a contextual phenomenon over time, and the creation of contextual clusters of faces, to evaluate the effectiveness of the generated contexts.
... For a point (b, d) in the PD, a homological feature appears at scale b and disappears at scale d. Due to the simplicity of PDs, there has been a surge of interest in using persistent homology to summarize high-dimensional complex data, which has resulted in its successful application in several research areas [49,63,14,19,15,31,57,66]. However, applying machine learning (ML) techniques on the space of PDs has always been a challenging task. ...
Chapter
Topological methods for data analysis present opportunities for enforcing certain invariances of broad interest in computer vision, including view-point in activity analysis, articulation in shape analysis, and measurement invariance in non-linear dynamical modeling. The increasing success of these methods is attributed to the complementary information that topology provides, as well as the availability of tools for computing topological summaries such as persistence diagrams. However, persistence diagrams are multi-sets of points, and hence it is not straightforward to fuse them with features used by contemporary machine learning tools like deep nets. In this paper we present theoretically well-grounded approaches to develop novel perturbation-robust topological representations, with the long-term view of making them amenable to fusion with contemporary learning architectures. We term the proposed representations Perturbed Topological Signatures; they live on a Grassmann manifold and hence can be efficiently used in machine learning pipelines. We explore the use of the proposed descriptor on three applications: 3D shape analysis, view-invariant activity analysis, and non-linear dynamical modeling. We show favorable results in both high-level recognition performance and time-complexity when compared to other baseline methods.
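A rough numerical sketch of the stated idea, under assumed parameters: perturb the points of a persistence diagram several times, rasterize each perturbed copy, and keep an orthonormal basis of the stacked rasters as a point on a Grassmann manifold (to be compared, e.g., via principal angles). Grid size, noise scale, and subspace dimension are all illustrative.

```python
import numpy as np

def perturbed_signature(pd_points, grid=32, sigma=0.02, n_perturb=16, dim=5, rng=None):
    """pd_points: (m, 2) array of (birth, death) pairs, assumed scaled to [0, 1]^2."""
    rng = np.random.default_rng(rng)
    maps = []
    for _ in range(n_perturb):
        pts = pd_points + rng.normal(scale=sigma, size=pd_points.shape)
        hist, _, _ = np.histogram2d(pts[:, 0], pts[:, 1],
                                    bins=grid, range=[[0, 1], [0, 1]])
        maps.append(hist.ravel())
    X = np.stack(maps, axis=1)                 # (grid*grid, n_perturb)
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :dim]                          # orthonormal basis = Grassmann point

pd = np.random.rand(40, 2)                     # toy persistence diagram
signature = perturbed_signature(pd)            # compare via principal angles
```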
... A fundamental problem in VLSI circuit design [17] deals with designing a suitable network topology (i.e., the interconnects between the components) such that the specified performance objective is realized. The same problem also appears in disparate disciplines such as coding theory [20], image webs [21], air traffic management [22,23], and free-space optical and communication networks [24], [25]. ...
Preprint
This article deals with a simpler version of an open problem in system realization theory that has several important engineering applications: given a collection of masses, a set of linear springs with specified cost and stiffness, and a resource constraint in the form of a budget on the total cost, determine an optimal connection of masses and springs so that the resulting structure is as stiff as possible, i.e., the structure is connected and its smallest non-zero natural frequency is as large as possible. In this article, algebraic connectivity, or its mechanical analog, the smallest non-zero natural frequency of a connected structure, was chosen as the performance objective. Algebraic connectivity determines the convergence rate of consensus protocols and error attenuation in Unmanned Aerial Vehicle (UAV) formations, and it can be viewed as a measure of robustness of UAV communication networks to random node failures. Underlying the mechanical and UAV network synthesis problems is a Mixed Integer Semi-Definite Program (MISDP), an NP-hard problem. The novel contributions of this article lie in developing efficient algorithms to solve the MISDPs, based on an iterative primal-dual algorithm, cutting-plane methods for polyhedral outer-approximation, capturing the feasible set of MISDPs using Fiedler vectors or Laplacians with equivalency guarantees, and efficient neighborhood-search-based heuristic algorithms.
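Evaluating the chosen performance objective for a fixed candidate design is straightforward; the sketch below computes the algebraic connectivity (second-smallest eigenvalue of the weighted Laplacian) of a given mass-spring connection. Solving the MISDP itself is beyond a few lines and is not attempted here.

```python
import numpy as np

def algebraic_connectivity(n, edges):
    """edges: list of (i, j, stiffness); returns lambda_2 of the weighted Laplacian."""
    L = np.zeros((n, n))
    for i, j, k in edges:
        L[i, i] += k
        L[j, j] += k
        L[i, j] -= k
        L[j, i] -= k
    return np.sort(np.linalg.eigvalsh(L))[1]

design = [(0, 1, 2.0), (1, 2, 1.0), (2, 3, 1.5), (3, 0, 1.0)]
print(algebraic_connectivity(4, design))  # larger = stiffer / more robust design
```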
Article
This paper presents an approach for Structure from Motion (SfM) for unorganized complex image sets. To achieve high accuracy and robustness, image triplets are employed and an (approximate) internal camera calibration is assumed to be known. The complexity of an image set is determined by the camera configurations, which may include wide as well as weak baselines. Wide baselines occur, for instance, when terrestrial images and images from small Unmanned Aerial Systems (UAS) are combined. The resulting large (geometric/radiometric) distortions between images make image matching difficult, possibly leading to an incomplete result. Weak baselines denote an insufficient distance between cameras compared to the distance to the observed scene and give rise to critical camera configurations. Inappropriate handling of such configurations may lead to various problems in triangulation-based SfM, up to total failure. The focus of our approach lies on the complete linking of images even in the case of wide or weak baselines. We do not rely on any additional information such as camera configurations, Global Positioning System (GPS), or an Inertial Navigation System (INS). As a basis for generating suitable triplets to link the images, an iterative graph-based method is employed, formulating image linking as the search for a terminal Steiner minimum tree in the line graph. SIFT (Lowe, 2004) descriptors are embedded into Hamming space for fast image similarity ranking. This is employed to limit the number of pairs to be geometrically verified by a computationally more complex wide-baseline matching method (Mayer et al., 2012). Critical camera configurations which are not suitable for geometric verification are detected by means of classification (Michelini and Mayer, 2019). Additionally, we propose a graph-based approach for optimizing the hierarchical merging of triplets to efficiently generate larger image subsets. By this means, a complete 3D reconstruction of the scene is obtained. Experiments demonstrate that the approach is able to produce reliable orientations for large image sets comprising wide as well as weak baseline configurations.
Chapter
The Gapminder project set out to use statistics to dispel simplistic notions about global development. In the same spirit, we use persistent homology, a technique from computational algebraic topology, to explore the relationship between country development and geography. For each country, four indicators were used to quantify development: gross domestic product per capita, average life expectancy, infant mortality, and gross national income per capita. Two analyses were performed. The first considers clusters of the countries based on these indicators, and the second uncovers cycles in the data when combined with geographic border structure. Our analysis is a multi-scale approach that reveals similarities and connections among countries at a variety of levels. We discover localized development patterns that are invisible to standard statistical methods.
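A hedged sketch of the first analysis: persistence diagrams computed from standardized indicator vectors, with H0 capturing cluster merges and H1 capturing cycles. The ripser package is one common choice, and the data below is synthetic, since the chapter's actual indicator tables are not reproduced here.

```python
import numpy as np
from ripser import ripser

# rows = countries; columns = GDP per capita, life expectancy,
# infant mortality, GNI per capita (standardized) -- synthetic stand-in data
X = np.random.randn(60, 4)
diagrams = ripser(X, maxdim=1)["dgms"]
h0, h1 = diagrams[0], diagrams[1]   # H0 merge events ~ clusters; H1 ~ cycles
print(len(h0), len(h1))
```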
Conference Paper
This paper improves recent methods for large-scale image search. State-of-the-art methods build on the bag-of-features image representation. We first analyze bag-of-features in the framework of approximate nearest neighbor search. This shows the sub-optimality of such a representation for matching descriptors and leads us to derive a more precise representation based on 1) Hamming embedding (HE) and 2) weak geometric consistency constraints (WGC). HE provides binary signatures that refine the matching based on visual words. WGC filters matching descriptors that are not consistent in terms of angle and scale. HE and WGC are integrated within the inverted file and are efficiently exploited for all images, even in the case of very large datasets. Experiments performed on a dataset of one million images show a significant improvement due to the binary signature and the weak geometric consistency constraints, as well as their efficiency. Estimation of the full geometric transformation, i.e., a re-ranking step on a short list of images, is complementary to our weak geometric consistency constraints and allows us to further improve the accuracy.
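An illustrative version of the HE step: descriptors assigned to the same visual word are refined with a binary signature obtained by projecting the descriptor and thresholding against per-word median values, then filtered by Hamming distance. The 64-bit size and the random orthogonal projection follow common practice for HE, but the training of the medians is omitted and the helper names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Fixed 64 x 128 projection: first 64 rows of a random orthogonal matrix
P = np.linalg.qr(rng.normal(size=(128, 128)))[0][:64]

def he_signature(desc, medians):
    """desc: (128,) SIFT descriptor; medians: (64,) thresholds learned per visual word."""
    return (P @ desc > medians).astype(np.uint8)

def he_match(sig_a, sig_b, max_dist=24):
    """Keep a visual-word match only if the Hamming distance is small enough."""
    return int(np.count_nonzero(sig_a != sig_b)) <= max_dist
```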
Article
We propose a novel hashing scheme for image retrieval, clustering, and automatic object discovery. Unlike commonly used bag-of-words approaches, the spatial extent of image features is exploited in our method. The geometric information is used both to construct repeatable hash keys and to increase the discriminability of the description. Each hash key combines visual appearance (visual words) with semi-local geometric information. Compared with the state-of-the-art min-hash, the proposed method has both higher recall (probability of collision for hashes on the same object) and lower false positive rates (random collisions). The advantages of the geometric min-hashing approach are most pronounced in the presence of viewpoint and scale change, significant occlusion, or small physical overlap of the viewing fields. We demonstrate the power of the proposed method on small object discovery in a large unordered collection of images and on a large-scale image clustering problem.
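A toy rendition of combining min-hash with semi-local geometry: each sketch key pairs the min-hashed visual word with the word of its spatially nearest feature, so two images collide only when appearance and local geometry agree. The hash construction here is illustrative, not the paper's exact scheme.

```python
import numpy as np

def min_hash(words, seed, vocab_size=2**16):
    """Min-hash of a set of visual word IDs under a seeded random permutation."""
    perm = np.random.default_rng(seed).permutation(vocab_size)
    return min(words, key=lambda w: perm[w])

def geometric_key(words, positions, seed):
    """words: list of word IDs; positions: (n, 2) feature coordinates."""
    w = min_hash(words, seed)
    i = words.index(w)
    d = np.linalg.norm(positions - positions[i], axis=1)
    d[i] = np.inf
    neighbor = words[int(np.argmin(d))]   # word of the spatially closest feature
    return (w, neighbor)                  # collision requires appearance AND geometry
```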
Article
Persistent homology is an algebraic tool for measuring topological features of shapes and functions. It casts the multi-scale organization we frequently observe in nature into a mathematical formalism. Here we give a record of the short history of persistent homology and present its basic concepts. Besides the mathematics, we focus on algorithms and mention the various connections to applications, including to biomolecules, biological networks, data analysis, and geometric modeling.
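The survey's central definition can be stated compactly; the following LaTeX fragment is a standard formulation, with notation chosen for this note.

```latex
% Given a filtration of a simplicial complex
%   \emptyset = K_0 \subseteq K_1 \subseteq \dots \subseteq K_n = K,
% the inclusions K_i \hookrightarrow K_j induce maps on p-th homology:
\[
  f_p^{i,j} \colon H_p(K_i) \longrightarrow H_p(K_j), \qquad 0 \le i \le j \le n .
\]
% Persistent Betti numbers count the classes alive from step i to step j:
\[
  \beta_p^{i,j} = \operatorname{rank}\bigl(\operatorname{im} f_p^{i,j}\bigr).
\]
% A class born at K_i and dying at K_j is recorded as the point (i, j) of the
% persistence diagram; its persistence is j - i.
```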
Article
We present a system for interactively browsing and exploring large unstructured collections of photographs of a scene using a novel 3D interface. Our system consists of an image-based modeling front end that automatically computes the viewpoint of each photograph as well as a sparse 3D model of the scene and image-to-model correspondences. Our photo explorer uses image-based rendering techniques to smoothly transition between photographs, while also enabling full 3D navigation and exploration of the set of images and world geometry, along with auxiliary information such as overhead maps. Our system also makes it easy to construct photo tours of scenic or historic locations, and to annotate image details, which are automatically transferred to other relevant images. We demonstrate our system on several large personal photo collections as well as images gathered from Internet photo sharing sites.
Article
When a scene is photographed many times by different people, the viewpoints often cluster along certain paths. These paths are largely specific to the scene being photographed, and follow interesting regions and viewpoints. We seek to discover a range of such paths and turn them into controls for image-based rendering. Our approach takes as input a large set of community or personal photos, reconstructs camera viewpoints, and automatically computes orbits, panoramas, canonical views, and optimal paths between views. The scene can then be interactively browsed in 3D using these controls or with six-degree-of-freedom free-viewpoint control. As the user browses the scene, nearby views are continuously selected and transformed, using control-adaptive reprojection techniques.
Conference Paper
This paper introduces a novel colour-based affine covariant region detector. Our algorithm is an extension of the maximally stable extremal region (MSER) to colour. The extension to colour is done by looking at successive time-steps of an agglomerative clustering of image pixels. The selection of time-steps is stabilised against intensity scalings and image blur by modelling the distribution of edge magnitudes. The algorithm contains a novel edge significance measure based on a Poisson image noise model, which we show performs better than the commonly used Euclidean distance. We compare our algorithm to the original MSER detector and a competing colour-based blob feature detector, and show through a repeatability test that our detector performs better. We also extend the state of the art in feature repeatability tests by using scenes consisting of two planes where one is piecewise transparent. This new test is able to evaluate how stable a feature is against changing backgrounds.