June 2023
This page lists the scientific contributions of an author, who either does not have a ResearchGate profile, or has not yet added these contributions to their profile.
It was generated automatically by ResearchGate to record this author's body of work. We create such pages to advance our goal of building and maintaining the most comprehensive scientific repository possible; in doing so, we process publicly available (personal) data relating to the author as a member of the scientific community.
March 2023
We present a new pipeline for acquiring a textured mesh in the wild with a single smartphone that offers access to images, depth maps, and valid poses. Our method first introduces RGBD-aided structure from motion, which yields filtered depth maps and refines camera poses guided by the corresponding depths. We then adopt a neural implicit surface reconstruction method, which allows for high-quality meshes, and develop a new training process that applies regularization provided by classical multi-view stereo methods. Moreover, we apply differentiable rendering to fine-tune incomplete texture maps and generate textures that are perceptually closer to the original scene. Our pipeline can be applied to common objects in the real world without the need for either in-the-lab environments or accurate mask images. We demonstrate results on captured objects with complex shapes and validate our method numerically against existing 3D reconstruction and texture mapping methods.
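The cross-view depth-filtering idea can be sketched minimally. This is a hypothetical helper, not the paper's SfM-integrated filter: it assumes metric depth maps and that a neighboring view's depth has already been reprojected into the current frame, and it keeps only pixels where the two views agree.

```python
import numpy as np

def filter_depth(depth, reproj_depth, rel_thresh=0.05):
    """Keep depth values that agree with depth reprojected from a
    neighboring view; mark the rest invalid (0).

    depth, reproj_depth : (H, W) arrays of metric depths (0 = missing).
    rel_thresh          : allowed relative disagreement between views.
    """
    valid = (depth > 0) & (reproj_depth > 0)
    rel_err = np.zeros_like(depth)
    rel_err[valid] = np.abs(depth[valid] - reproj_depth[valid]) / depth[valid]
    return np.where(valid & (rel_err < rel_thresh), depth, 0.0)
```

Pixels that are missing in either view or inconsistent across views are simply dropped, which is the usual trade-off in such filters: a sparser but cleaner depth map for downstream surface reconstruction.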
May 2022
Visual localization, i.e., camera pose estimation in a known scene, is a core component of technologies such as autonomous driving and augmented reality. State-of-the-art localization approaches often rely on image retrieval techniques for one of two purposes: (1) providing an approximate pose estimate or (2) determining which parts of the scene are potentially visible in a given query image. It is common practice to use state-of-the-art image retrieval algorithms for both purposes. These algorithms are often trained to retrieve the same landmark under a large range of viewpoint changes, a goal that often differs from the requirements of visual localization. To investigate the consequences for visual localization, this paper focuses on understanding the role of image retrieval for multiple visual localization paradigms. First, we introduce a novel benchmark setup and compare state-of-the-art retrieval representations on multiple datasets using localization performance as the metric. Second, we investigate several definitions of "ground truth" for image retrieval. Using these definitions as upper bounds for the visual localization paradigms, we show that there is still significant room for improvement. Third, using these tools and in-depth analysis, we show that retrieval performance on classical landmark retrieval or place recognition tasks correlates with localization performance for only some of the paradigms. Finally, we analyze the effects of blur and dynamic scenes in the images. We conclude that there is a need for retrieval approaches specifically designed for localization paradigms. Our benchmark and evaluation protocols are available at https://github.com/naver/kapture-localization. Preprint: https://arxiv.org/abs/2205.15761
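Purpose (1), a coarse pose estimate from retrieval, can be illustrated with a small sketch. The helper below is hypothetical (not the benchmark's code): it assumes global image descriptors that can be L2-normalised and known database camera positions, and it simply averages the positions of the top-k retrieved images.

```python
import numpy as np

def coarse_pose_by_retrieval(query_desc, db_descs, db_poses, k=3):
    """Approximate a query camera position as the mean position of the
    top-k retrieved database images (cosine similarity on L2-normalised
    global descriptors). Returns (approx_position, top-k indices)."""
    q = query_desc / np.linalg.norm(query_desc)
    d = db_descs / np.linalg.norm(db_descs, axis=1, keepdims=True)
    sims = d @ q                     # cosine similarity to every DB image
    topk = np.argsort(-sims)[:k]     # indices of the k most similar images
    return db_poses[topk].mean(axis=0), topk
```

Note that this only works as well as the retrieval step itself, which is exactly the coupling between retrieval quality and localization accuracy that the paper's benchmark measures.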
May 2022
International Journal of Computer Vision
March 2022
Global registration using 3D point clouds is a crucial technology for mobile platforms to achieve localization or handle loop-closing situations. In recent years, numerous researchers have proposed global registration methods that cope with a large number of outlier correspondences. Unfortunately, the degeneracy problem, i.e., the phenomenon in which the number of estimated inliers falls below three, remains potentially unavoidable. To tackle this problem, we propose a degeneracy-robust, decoupling-based global registration method called Quatro. In particular, our method employs quasi-SO(3) estimation by leveraging the Atlanta world assumption in urban environments to avoid degeneracy in rotation estimation; thus, the minimum degree of freedom (DoF) of our method is reduced from three to one. As verified on indoor and outdoor 3D LiDAR datasets, our proposed method yields robust global registration performance compared with other global registration methods, even for distant point cloud pairs. Furthermore, the experimental results confirm the applicability of our method as a coarse alignment. Our code is available at https://github.com/url-kaist/quatro.
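The decoupling idea, rotation reduced to a single yaw angle under an Atlanta-world-like assumption, can be sketched as follows. This is an illustrative robust estimator of my own construction, not Quatro's actual quasi-SO(3) solver: it derives yaw candidates from pairwise direction angles (which cancel the unknown translation) and uses medians for outlier robustness.

```python
import numpy as np

def yaw_only_registration(src, dst):
    """1-DoF rotation + translation from corresponding 3D points,
    assuming the true rotation is yaw-only. src, dst: (N, 3) arrays."""
    # Yaw candidates from consecutive point differences (translation cancels).
    d_src = np.diff(src[:, :2], axis=0)
    d_dst = np.diff(dst[:, :2], axis=0)
    cands = np.arctan2(d_dst[:, 1], d_dst[:, 0]) - np.arctan2(d_src[:, 1], d_src[:, 0])
    cands = np.mod(cands + np.pi, 2 * np.pi) - np.pi  # wrap to [-pi, pi)
    yaw = np.median(cands)                            # robust to some outliers
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    t = np.median(dst - src @ R.T, axis=0)            # robust translation
    return R, t
```

Because only one angle is estimated, a handful of inlier correspondences suffices, which is the intuition behind reducing the minimum DoF from three to one.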
March 2022
Monocular depth estimation in the wild inherently predicts depth only up to an unknown scale. To resolve this scale ambiguity, we present a learning algorithm that leverages monocular simultaneous localization and mapping (SLAM) with proprioceptive sensors. Such monocular SLAM systems can provide metrically scaled camera poses. Given these metric poses and monocular sequences, we propose a self-supervised learning method that enables pre-trained supervised monocular depth networks to estimate metrically scaled depth. Our approach is based on a teacher-student formulation that guides our network to predict high-quality depths. We demonstrate that our approach is useful for various applications, such as mobile robot navigation, and is applicable to diverse environments. Our full system shows improvements over recent self-supervised depth estimation and completion methods on the EuRoC, OpenLORIS, and ScanNet datasets.
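A common way to exploit metrically scaled SLAM output is per-image median-ratio rescaling of a scale-ambiguous prediction; the reader should take this as an illustrative building block, not the paper's teacher-student training scheme.

```python
import numpy as np

def metric_rescale(pred_depth, slam_depth):
    """Align a scale-ambiguous monocular depth map to metric scale using
    sparse metric depths from SLAM (hypothetical helper): per-image
    median ratio over pixels where SLAM provides a point (0 = no point).
    Returns (rescaled depth map, estimated scale)."""
    valid = slam_depth > 0
    scale = np.median(slam_depth[valid] / pred_depth[valid])
    return scale * pred_depth, scale
```

The median makes the scale estimate robust to a few bad SLAM points, while the dense prediction supplies depth everywhere the sparse SLAM map does not.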
October 2021
·
12 Reads
·
4 Citations
August 2021
We present a novel approach for estimating depth from a monocular camera as it moves through complex and crowded indoor environments, e.g., a department store or a metro station. Our approach predicts absolute scale depth maps over the entire scene consisting of a static background and multiple moving people, by training on dynamic scenes. Since it is difficult to collect dense depth maps from crowded indoor environments, we design our training framework without requiring depths produced from depth sensing devices. Our network leverages RGB images and sparse depth maps generated from traditional 3D reconstruction methods to estimate dense depth maps. We use two constraints to handle depth for non-rigidly moving people without tracking their motion explicitly. We demonstrate that our approach offers consistent improvements over recent depth estimation methods on the NAVERLABS dataset, which includes complex and crowded scenes.
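The idea of supervising dense predictions only where sparse reconstruction-based depth exists, while excluding moving people, can be sketched as a masked loss. This is a hypothetical simplification; the paper's actual constraints for handling non-rigidly moving people are more involved.

```python
import numpy as np

def sparse_depth_loss(pred, sparse_gt, people_mask):
    """L1 loss on pixels that have a sparse reconstructed depth AND are
    not covered by a detected person.

    pred, sparse_gt : (H, W) depth maps; 0 in sparse_gt = no point.
    people_mask     : (H, W) bool, True where a person was detected.
    """
    valid = (sparse_gt > 0) & ~people_mask
    if not valid.any():
        return 0.0
    return float(np.mean(np.abs(pred[valid] - sparse_gt[valid])))
```

Masking out people keeps unreliable, non-rigid geometry from corrupting the supervision signal, while the sparse static points anchor the absolute scale of the scene.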
... Subsequently, a fine-tuning method has been established to estimate depth in a metrically accurate manner with a self-supervised learning scheme. To resolve the scale-ambiguity issue of single-image depth estimation in the wild, or in any rough or diverse environment, an algorithm termed SelfTune has been introduced in "SelfTune: Metrically Scaled Monocular Depth Estimation through Self-Supervised Learning" [126], which makes use of SLAM (Simultaneous Localization and Mapping) with proprioceptive sensors. Such SLAM techniques can provide metrically scaled camera poses. ...
May 2022
... Among the relevant redundancy-minimizing methods is [23]. Therein, the authors show that fewer correspondences are better in global registration, given that the correspondences are accurate. ...
May 2022
... A coarse initial estimate can be obtained via image retrieval [3,80] against a database of reference images. The pose(s) of the top-retrieved image(s) then provide an approximation of the pose of the query image [32,79]. A more efficient alternative to image retrieval is to directly regress the camera pose using a neural network [5,13,17,18,35,36,73,74,84]. ...
May 2022
International Journal of Computer Vision
... However, these depth completion methods are vulnerable to noisy depth values from SLAM and varying sparse point distributions [33], [35]. Thus, we aim to leverage the learning-based depth estimation from a single image to predict depth with a metric scale [14]- [16]. ...
October 2021
... Over the last decade, many visual localisation methods have been proposed, including feature matching-based approaches [11,21,27,30,41], scene coordinate regression [2][3][4] and absolute pose regressors (APRs) [17,18,37]. Much of this progress has been driven by the availability of diverse datasets and benchmarks [6,8,10,18,19,29,31,36,38,40,41,43,44]. However, most of these datasets present limitations that affect their application to XR. ...
June 2021
... This algorithm was extended to depth completion, introducing improvements like the use of aligned color as a guiding factor for the weight function and to define the order of computations [33], and the use of a pixel-wise confidence factor [30]. Sparse depth maps, generally captured with LiDAR sensors, suffer especially from large patches of missing depth data and have particular time limitations, as they are commonly linked with autonomous driving. Thus, more advanced techniques have been developed, relying on both supervised [29] and self-supervised [36,11,16] deep convolutional neural networks assisted by color information to fill large depth gaps. ...
May 2021
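The color-guided depth completion described in the snippet above, using aligned color as a guiding factor for the weights, can be sketched in a brute-force, joint-bilateral style. Everything here is illustrative: `sigma_s`, `sigma_c`, and the single-channel color guide are assumptions, and real systems use far more efficient formulations.

```python
import numpy as np

def guided_fill(depth, color, sigma_s=3.0, sigma_c=10.0):
    """Fill missing depths (0) with a color-guided weighted average of
    known depths (brute force, suitable only for tiny images).

    depth : (H, W) sparse depth map, 0 = missing.
    color : (H, W) single-channel guide image aligned with depth.
    """
    ky, kx = np.nonzero(depth > 0)                  # known pixels
    out = depth.astype(float).copy()
    for y, x in zip(*np.nonzero(depth == 0)):
        ds2 = (ky - y) ** 2 + (kx - x) ** 2         # spatial distance^2
        dc2 = (color[ky, kx] - color[y, x]) ** 2    # color distance^2
        w = np.exp(-ds2 / (2 * sigma_s**2) - dc2 / (2 * sigma_c**2))
        out[y, x] = np.sum(w * depth[ky, kx]) / np.sum(w)
    return out
```

Pixels whose color matches a nearby known pixel borrow its depth most strongly, which is what lets such filters avoid smearing depth across object boundaries.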
... However, a sequence of images increases the accuracy of the method. Consequently, some papers have begun to leverage image sequences to estimate VBL (Lee et al. (2021); Brahmbhatt et al. (2018); Valada et al. (2018); Xue et al. (2019); Li et al. (2019)). In this paper, we assume that sequential data is provided, and we combine deep learning methods with traditional tracking methods to localize. ...
January 2021
... Indoor place recognition represents an important yet relatively less explored area. The SpoxelNet (Chang et al. 2020) neural network architecture was proposed as a 3D-PCPR method tailored for crowded indoor spaces. SpoxelNet effectively encodes input voxels into global descriptor vectors. ...
October 2020
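The notion of encoding voxels into a global descriptor vector, which SpoxelNet learns with a neural network, can be caricatured with plain occupancy counting. The grid size and bounds below are arbitrary assumptions, and the result is only a toy stand-in for a learned descriptor.

```python
import numpy as np

def global_descriptor(points, grid=(8, 8, 4),
                      bounds=((-10, 10), (-10, 10), (-2, 2))):
    """Toy place-recognition descriptor: voxel occupancy counts over a
    fixed grid, flattened and L2-normalised. points: (N, 3) array."""
    lo = np.array([b[0] for b in bounds], dtype=float)
    hi = np.array([b[1] for b in bounds], dtype=float)
    g = np.array(grid)
    idx = ((points - lo) / (hi - lo) * g).astype(int)   # voxel index per point
    keep = np.all((idx >= 0) & (idx < g), axis=1)       # drop out-of-bounds points
    hist = np.zeros(grid)
    np.add.at(hist, tuple(idx[keep].T), 1)              # occupancy counts
    v = hist.ravel()
    n = np.linalg.norm(v)
    return v / n if n > 0 else v
```

Two scans of the same place yield similar descriptors (high dot product), so nearest-neighbour search over such vectors gives a simple place-recognition baseline.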