Illustration of pixel selection. Areas shaded with green indicate an pixels with absolute gradient magnitude greater than 5 resulting in 220422 pixels (47.3% of total pixels). Selected pixels are shown in the second row resulting in 7573 (1.62% of total pixels).

Source publication

Direct Disparity Space: Robust and Real-time Visual Odometry

Technical Report

Full-text available

Oct 2014

We present a direct visual odometry formulation using a warping function in dis- parity space. In disparity space measurement noise is well-modeled by a Gaussian distribution, in contrast to the heteroscedastic noise in 3D space. In addition, the Ja- cobian of the warp separates the rotation and translation terms, enabling motion to be estimated fr...

Context 1

... applications with pose tracking, Dellaert and Collins [1999] propose a method that se- lects the pixels that constrain each degree of freedom the most. However, the method requires An example of selected pixels using our approach is shown in fig. 3. The figure compares the number of pixels that would be detected by absolute gradient magnitude thresholding (with threshold set to five pixels) versus our method. As we will show in section 4, this pixel selection scheme is sufficient for accurate pose tracking and real-time performance using only 1.0 − 3.0% of the usable image ...

View in full-text

Efficient Online Surface Correction for Real-time Large-Scale 3D Reconstruction

Conference Paper

Full-text available

Sep 2017

State-of-the-art methods for large-scale 3D reconstruction from RGB-D sensors usually reduce drift in camera tracking by globally optimizing the estimated camera poses in real-time without simultaneously updating the reconstructed surface on pose changes. We propose an efficient on-the-fly surface correction method for globally consistent dense 3D...

Fast Multi-frame Stereo Scene Flow with Motion Segmentation

Conference Paper

Full-text available

Jul 2017

We propose a new multi-frame method for efficiently computing scene flow (dense depth and optical flow) and camera ego-motion for a dynamic scene observed from a moving stereo camera rig. Our technique also segments out moving objects from the rigid scene. In our method, we first estimate the disparity map and the 6-DOF camera motion using stereo matching and visual odometry. We then identify regions inconsistent with the estimated camera motion and compute per-pixel optical flow only at these regions. This flow proposal is fused with the camera motion-based flow proposal using fusion moves to obtain the final optical flow and motion segmentation. This unified framework benefits all four tasks - stereo, optical flow, visual odometry and motion segmentation leading to overall higher accuracy and efficiency. Our method is currently ranked third on the KITTI 2015 scene flow benchmark. Furthermore, our CPU implementation runs in 2-3 seconds per frame which is 1-3 orders of magnitude faster than the top six methods. We also report a thorough evaluation on challenging Sintel sequences with fast camera and object motion, where our method consistently outperforms OSF [Menze and Geiger, 2015], which is currently ranked second on the KITTI benchmark.

Discrete Inference Approaches to Image Segmentation and Dense Correspondence

Thesis

Full-text available

Mar 2017

Tatsunori Taniai

We consider discrete inference approaches to image segmentation and dense correspondence. The two problems cover diverse tasks such as image segmentation, binarization, cosegmentation, motion segmentation, binocular stereo vision, optical flow and general dense correspondence, which are addressed sorely or jointly in this work as energy minimization problems on Markov random fields (MRFs). Discrete inference approaches are employed to effectively optimize inherently discrete functions or highly non-convex continuous functions. The contributions of this work are two folds: proposal of novel joint frameworks of image segmentation and dense correspondence problems, and development of new inference techniques for sole or joint tasks. Specifically, we comprehensively address three challenges of discrete inference, that is, label space size, higher-order energy, and non-submodular energy, which are posed in various forms in the following tasks that we tackle. First, we study inference problems on non-submodular and higher-order MRFs that have binary variables. Such problems naturally appear in low-level computer vision tasks such as image segmentation and binarization of gray images. They are also imposed as subproblems in estimation of more general multi-valued or continuous-valued variables. For such fundamental inference problems, we develop a new theoretical insight into several existing optimization methods and propose a new method by unifying them. The proposed method has a mechanism to better avoid bad local minimums of non-submodular functions, and is thus more robust to initializations compared to existing methods. The proposed method was evaluated on image segmentation and binarization tasks and was shown to outperform state-of-the-art methods. Second, we propose an efficient and accurate binocular stereo matching method, whose model and inference both favor piecewise planar surfaces. We formulate the stereo problem by a model of per-pixel local 3D surface planes with piecewise planar smoothness regularization, which forms a pairwise MRF with a continuous 3D label space. In order to efficiently infer this rich model, we propose a new inference technique that extends the well-known expansion move algorithm by incorporating the spatial propagation and randomization search mechanisms of PatchMatch inference. Unlike conventional fusion-based approaches, the proposed method does not require solution proposals and also produces submodular energies that are optimally minimized by graph cuts during the inference. The computations can be easily accelerated by parallelization and using fast cost-map filtering. The proposed method achieved the state-of-the-art performance on the Middlebury stereo benchmark among more than 160 stereo algorithms. Third, we propose a unified framework of general dense correspondence and cosegmentation for two images, where common “foreground” regions in the two images are segmented and aligned to each other. Our method is formulated using a hierarchical MRF model with joint labels of segmentation and correspondence. The correspondence field is parameterized using similarity transformations (4-DOF) assigned on superpixels. The hierarchy is used to evaluate correspondence across various coarseness of superpixels, which brings high robustness when aligning objects with different appearances. Unlike prior hierarchical methods which assume that the structure is given, we dynamically recover the structure along with the correspondence and segmentation labeling. This joint inference is performed in an energy minimization framework using iterated graph cuts. The proposed method was quantitatively evaluated on a new dataset and it outperformed state-of-the-art methods designed specifically for either cosegmentation or correspondence estimation. Finally, we propose a fast scene flow method for stereo image sequences that simultaneously recovers motion segmentation of moving objects as well as camera ego-motion. This framework unifies four tasks ―stereo, optical flow, motion segmentation and visual odometry providing rich information of disparity, 2D flow and binary segmentation of moving objects at every pixel along with camera motion. The inference is carried out through a multi-staged pipeline where the solution to one task benefits others, leading to overall higher accuracy and efficiency. The proposed method was evaluated on the KITTI 2015 scene flow benchmark and was ranked third. Furthermore, our CPU implementation processed each frame in 2-3 seconds, which was 1-3 orders of magnitude faster than the top six methods that took 1-50 minutes per frame. Our method was also thoroughly evaluated on challenging Sintel sequences having fast camera and object motion, where our method consistently outperformed the method ranked second on the KITTI benchmark.

Direct Visual Odometry in Low Light Using Binary Descriptors

Article

Dec 2016

Feature descriptors are powerful tools for photometrically and geometrically invariant image matching. To date, however, their use has been tied to sparse interest point detection, which is susceptible to noise under adverse imaging conditions. In this work, we propose to use binary feature descriptors in a direct tracking framework without relying on sparse interest points. This novel combination of feature descriptors and direct tracking is shown to achieve robust and efficient visual odometry with applications to poorly lit subterranean environments.

Direct Visual Odometry using Bit-Planes

Article

Full-text available

Apr 2016

Feature descriptors, such as SIFT and ORB, are well-known for their robustness to illumination changes, which has made them popular for feature-based VSLAM\@. However, in degraded imaging conditions such as low light, low texture, blur and specular reflections, feature extraction is often unreliable. In contrast, direct VSLAM methods which estimate the camera pose by minimizing the photometric error using raw pixel intensities are often more robust to low textured environments and blur. Nonetheless, at the core of direct VSLAM is the reliance on a consistent photometric appearance across images, otherwise known as the brightness constancy assumption. Unfortunately, brightness constancy seldom holds in real world applications. In this work, we overcome brightness constancy by incorporating feature descriptors into a direct visual odometry framework. This combination results in an efficient algorithm that combines the strength of both feature-based algorithms and direct methods. Namely, we achieve robustness to arbitrary photometric variations while operating in low-textured and poorly lit environments. Our approach utilizes an efficient binary descriptor, which we call Bit-Planes, and show how it can be used in the gradient-based optimization required by direct methods. Moreover, we show that the squared Euclidean distance between Bit-Planes is equivalent to the Hamming distance. Hence, the descriptor may be used in least squares optimization without sacrificing its photometric invariance. Finally, we present empirical results that demonstrate the robustness of the approach in poorly lit underground environments.

Direct Visual Odometry using Bit-Planes

Working Paper

Full-text available

Jan 2016

A Two-Stage Attention Based Efficient Second-Order Minimization Network for Planar Object Tracking

Conference Paper

Oct 2023

Illustration of pixel selection. Areas shaded with green indicate an pixels with absolute gradient magnitude greater than 5 resulting in 220422 pixels (47.3% of total pixels). Selected pixels are shown in the second row resulting in 7573 (1.62% of total pixels).

Context in source publication

Similar publications

Citations