Donghwan Lee's research while affiliated with Naver and other places

What is this page?


This page lists the scientific contributions of an author who either does not have a ResearchGate profile or has not yet added these contributions to their profile.

It was automatically generated by ResearchGate to provide a record of this author's body of work. We create such pages to advance our goal of creating and maintaining the most comprehensive scientific repository possible. In doing so, we process publicly available (personal) data relating to the author as a member of the scientific community.


Publications (17)


TMO: Textured Mesh Acquisition of Objects with a Mobile Device by using Differentiable Rendering
  • Conference Paper

June 2023 · 10 Reads
Dongki Jung · Taejae Lee · [...] · Donghwan Lee

TMO: Textured Mesh Acquisition of Objects with a Mobile Device by using Differentiable Rendering

March 2023 · 23 Reads

We present a new pipeline for acquiring a textured mesh in the wild with a single smartphone, which offers access to images, depth maps, and valid poses. Our method first introduces RGBD-aided structure from motion, which yields filtered depth maps and refines camera poses guided by the corresponding depth. We then adopt a neural implicit surface reconstruction method, which produces high-quality meshes, and develop a new training process that applies regularization provided by classical multi-view stereo methods. Moreover, we apply differentiable rendering to fine-tune incomplete texture maps and generate textures that are perceptually closer to the original scene. Our pipeline can be applied to common objects in the real world without the need for either in-the-lab environments or accurate mask images. We demonstrate results on captured objects with complex shapes and validate our method numerically against existing 3D reconstruction and texture mapping methods.
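The texture fine-tuning stage lends itself to a short illustration. Below is a minimal sketch, in PyTorch, of optimizing a texture map through a differentiable rendering step: the "renderer" here is a stand-in (a fixed per-pixel UV lookup via grid_sample, standing in for rasterizing the mesh under a known camera pose), and all tensor shapes, names, and data are illustrative assumptions rather than the paper's actual pipeline.

```python
# Minimal sketch of texture fine-tuning via differentiable rendering.
# The renderer is a stand-in: it samples the texture with fixed per-pixel
# UV coordinates, as a real rasterizer would after projecting the mesh.
# Shapes, names, and data are illustrative, not the paper's code.
import torch
import torch.nn.functional as F

H, W, TEX = 128, 128, 256            # render size, texture resolution
texture = torch.rand(1, 3, TEX, TEX, requires_grad=True)

# Fixed UV lookup per output pixel (in a real pipeline this comes from
# rasterizing the mesh under the camera pose); values in [-1, 1].
uv = torch.rand(1, H, W, 2) * 2 - 1

target = torch.rand(1, 3, H, W)      # captured photo (synthetic here)

opt = torch.optim.Adam([texture], lr=1e-2)
for step in range(200):
    opt.zero_grad()
    rendered = F.grid_sample(texture, uv, align_corners=True)
    loss = F.l1_loss(rendered, target)   # photometric loss vs. the photo
    loss.backward()                      # gradients flow into the texture
    opt.step()
```

Because the sampling operation is differentiable, the photometric loss can update the texture directly; the same pattern extends to jointly refining poses or geometry.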


Figure 2: InLoc 3D map generated by assigning a 3D point to each local feature in the training images (viewed in COLMAP).
Figure 3: Top view of the 3D reconstruction of GangnamStation B2.
Figure 4: Paradigm 1 (pose approximation). Results obtained with pose interpolation methods where the weights are obtained using EWB, BDI, and CSI (rows) for different datasets (columns). For datasets with available retrieval GT (see Sec. 3.6.3), we show results obtained using the GT rankings (dashed lines) with the EWB weighting scheme. These results can be understood as upper bounds on the localization performance. The best upper bound can be obtained with the distance-based ranking. The best pose approximation results are obtained with CSI, and simply using the top-retrieved pose works best in many cases (except for NetVLAD and for all on Aachen). There is no clear winning global representation for all weighting schemes.
Figure 9: Landmark retrieval per-image linear correlation. Pearson coefficients computed for each query image individually (directly using the pose error as the localization metric) and visualized as violin plots. The columns show the localization paradigms and the rows show different datasets. For pose approximation, the Pearson coefficients are densely sampled in the upper part of the violins, meaning good linear correlation, whereas for pose estimation, they are sampled on both sides and in the middle (RobotCar), meaning high, inverse, and low correlation. We can observe similar behaviour for all feature types.

Investigating the Role of Image Retrieval for Visual Localization -- An exhaustive benchmark
  • Preprint
  • File available

May 2022 · 91 Reads

Visual localization, i.e., camera pose estimation in a known scene, is a core component of technologies such as autonomous driving and augmented reality. State-of-the-art localization approaches often rely on image retrieval techniques for one of two purposes: (1) provide an approximate pose estimate or (2) determine which parts of the scene are potentially visible in a given query image. It is common practice to use state-of-the-art image retrieval algorithms for both. These algorithms are often trained to retrieve the same landmark under a large range of viewpoint changes, a goal which often differs from the requirements of visual localization. In order to investigate the consequences for visual localization, this paper focuses on understanding the role of image retrieval for multiple visual localization paradigms. First, we introduce a novel benchmark setup and compare state-of-the-art retrieval representations on multiple datasets using localization performance as the metric. Second, we investigate several definitions of "ground truth" for image retrieval. Using these definitions as upper bounds for the visual localization paradigms, we show that there is still significant room for improvement. Third, using these tools and in-depth analysis, we show that retrieval performance on classical landmark retrieval or place recognition tasks correlates with localization performance only for some but not all paradigms. Finally, we analyze the effects of blur and dynamic scenes in the images. We conclude that there is a need for retrieval approaches specifically designed for localization paradigms. Our benchmark and evaluation protocols are available at https://github.com/naver/kapture-localization. Preprint: https://arxiv.org/abs/2205.15761


Investigating the Role of Image Retrieval for Visual Localization: An Exhaustive Benchmark

May 2022 · 137 Reads · 22 Citations
International Journal of Computer Vision

Visual localization, i.e., camera pose estimation in a known scene, is a core component of technologies such as autonomous driving and augmented reality. State-of-the-art localization approaches often rely on image retrieval techniques for one of two purposes: (1) provide an approximate pose estimate or (2) determine which parts of the scene are potentially visible in a given query image. It is common practice to use state-of-the-art image retrieval algorithms for both. These algorithms are often trained to retrieve the same landmark under a large range of viewpoint changes, a goal which often differs from the requirements of visual localization. In order to investigate the consequences for visual localization, this paper focuses on understanding the role of image retrieval for multiple visual localization paradigms. First, we introduce a novel benchmark setup and compare state-of-the-art retrieval representations on multiple datasets using localization performance as the metric. Second, we investigate several definitions of "ground truth" for image retrieval. Using these definitions as upper bounds for the visual localization paradigms, we show that there is still significant room for improvement. Third, using these tools and in-depth analysis, we show that retrieval performance on classical landmark retrieval or place recognition tasks correlates with localization performance only for some but not all paradigms. Finally, we analyze the effects of blur and dynamic scenes in the images. We conclude that there is a need for retrieval approaches specifically designed for localization paradigms. Our benchmark and evaluation protocols are available at https://github.com/naver/kapture-localization.
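As a concrete illustration of the "pose approximation" paradigm studied in this benchmark, the sketch below blends the poses of the top-k retrieved database images into a query pose estimate, using the simplest equal-weight (EWB-style) scheme. The function name, descriptor and pose arrays, and the naive quaternion blend are all illustrative assumptions, not the benchmark's code.

```python
# Hedged sketch of pose approximation via image retrieval: the query pose
# is a similarity-weighted blend of the top-k retrieved database poses.
import numpy as np

def approximate_pose(q_desc, db_descs, db_t, db_q, k=3):
    """q_desc: (D,) query global descriptor; db_descs: (N, D) database
    descriptors; db_t: (N, 3) translations; db_q: (N, 4) unit quaternions."""
    # Cosine similarity between the query and every database image.
    sims = db_descs @ q_desc / (
        np.linalg.norm(db_descs, axis=1) * np.linalg.norm(q_desc) + 1e-12)
    top = np.argsort(-sims)[:k]        # indices of the top-k neighbours
    w = np.ones(k) / k                 # EWB-style equal weights
    t = (w[:, None] * db_t[top]).sum(axis=0)
    # Naive quaternion blend: flip signs into the first neighbour's
    # hemisphere, average, renormalize. Adequate for nearby poses only;
    # proper rotation averaging would be more principled.
    d = db_q[top] @ db_q[top[0]]
    qs = db_q[top] * np.where(d < 0.0, -1.0, 1.0)[:, None]
    q = (w[:, None] * qs).sum(axis=0)
    return t, q / np.linalg.norm(q)
```

With k=1 this degenerates to simply adopting the top-retrieved pose, which, per Figure 4 above, already works well in many cases.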




Fig. 1. (T-B, L-R): Before and after the application of our proposed method, Quatro, on the KITTI dataset [7] when two distant and partially overlapped point clouds, i.e. source (cyan) and target (yellow), are given. As the distance between the two viewpoints of source and target grows, the ratio of outliers within the putative correspondences increases while the number of inliers simultaneously decreases, which in general degrades the performance of correspondence-based global registration methods [23], [26], [28]. Under these circumstances, our proposed method shows robust performance, overcoming both the effect of outliers and the degeneracy issue. The red and green lines denote outlier and inlier correspondences, respectively (best viewed in color).
Fig. 3. Illustration of Quatro in a degeneracy case when two distant and partially overlapped source (cyan) and target (yellow) clouds are given. (a) Spurious correspondences. (b) The output of the MCIS heuristic: most outliers are initially filtered. (c)-(e) An example of quasi-SO(3) estimation via GNC. (c) First, all weights w_k^(0) of the TIMs are set to one. (d) During the optimization, GNC sometimes unexpectedly leaves fewer than three pairs by assigning near-zero values to some w_k^(t) (red dashed rectangle). (e) Even in this degeneracy case, quasi-SO(3) estimation succeeds because the DoF of R+ is one, so it can be estimated even when a single pair of TIMs is left. (f) Before and after the application of COTE. In (a), (b), and (f), the definite outliers, inliers, and quasi-inliers are represented by the red, green, and blue lines, respectively (best viewed in color).
A Single Correspondence Is Enough: Robust Global Registration to Avoid Degeneracy in Urban Environments

March 2022 · 122 Reads

Global registration using 3D point clouds is a crucial technology for mobile platforms to achieve localization or manage loop-closing situations. In recent years, numerous researchers have proposed global registration methods to address large numbers of outlier correspondences. Unfortunately, the degeneracy problem, i.e., the phenomenon in which the number of estimated inliers falls below three, is still potentially inevitable. To tackle this problem, a degeneracy-robust, decoupling-based global registration method called Quatro is proposed. In particular, our method employs quasi-SO(3) estimation by leveraging the Atlanta world assumption in urban environments to avoid degeneracy in rotation estimation. Thus, the minimum degree of freedom (DoF) of our method is reduced from three to one. As verified on indoor and outdoor 3D LiDAR datasets, our proposed method yields robust global registration performance compared with other global registration methods, even for distant point cloud pairs. Furthermore, the experimental results confirm the applicability of our method as a coarse alignment. Our code is available at https://github.com/url-kaist/quatro.
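The core 1-DoF idea can be shown compactly: under a gravity-aligned Atlanta-world assumption, rotation estimation reduces to a single yaw angle, which is why even one surviving TIM (translation-invariant measurement) pair constrains it. The sketch below is a hedged illustration of that reduction, using plain circular averaging in place of Quatro's GNC machinery; function and variable names are illustrative.

```python
# Minimal sketch of the "quasi-SO(3)" (1-DoF) idea: with gravity-aligned
# clouds, the rotation is yaw-only, so one matched TIM pair already
# determines it. Circular averaging stands in for GNC-based robustness.
import numpy as np

def estimate_yaw_rotation(src_tims, dst_tims):
    """src_tims, dst_tims: (K, 3) matched TIM vectors; only the
    horizontal (x, y) components constrain the yaw angle."""
    # Per-pair yaw: the angle rotating each source TIM onto its target.
    ang = (np.arctan2(dst_tims[:, 1], dst_tims[:, 0])
           - np.arctan2(src_tims[:, 1], src_tims[:, 0]))
    # Circular mean handles wrap-around; with K = 1 this is just ang[0].
    yaw = np.arctan2(np.sin(ang).mean(), np.cos(ang).mean())
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])
```

This is why the degeneracy case in Fig. 3(e) is recoverable: a single remaining pair still determines a 1-DoF rotation, whereas a full SO(3) estimate would need at least three.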


SelfTune: Metrically Scaled Monocular Depth Estimation through Self-Supervised Learning

March 2022 · 36 Reads

Monocular depth estimation in the wild inherently predicts depth only up to an unknown scale. To resolve this scale ambiguity, we present a learning algorithm that leverages monocular simultaneous localization and mapping (SLAM) with proprioceptive sensors. Such monocular SLAM systems can provide metrically scaled camera poses. Given these metric poses and monocular sequences, we propose a self-supervised learning method for pre-trained supervised monocular depth networks that enables metrically scaled depth estimation. Our approach is based on a teacher-student formulation which guides our network to predict high-quality depths. We demonstrate that our approach is useful for various applications, such as mobile robot navigation, and is applicable to diverse environments. Our full system shows improvements over recent self-supervised depth estimation and completion methods on the EuRoC, OpenLORIS, and ScanNet datasets.
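A common baseline for the scale-recovery step described here is to align an up-to-scale depth map against the sparse, metrically scaled landmarks a SLAM system provides. The median-ratio sketch below illustrates that idea under stated assumptions; it is a standard technique, not necessarily the paper's exact formulation.

```python
# Hedged sketch: recover metric scale for a relative depth map using
# sparse metric depths from SLAM (median-ratio alignment). Names and
# shapes are illustrative assumptions.
import numpy as np

def metric_align(pred_depth, slam_depth):
    """pred_depth: (H, W) up-to-scale depth; slam_depth: (H, W) sparse
    metric depths, 0 where the SLAM system has no landmark."""
    mask = slam_depth > 0
    ratio = slam_depth[mask] / np.maximum(pred_depth[mask], 1e-6)
    scale = np.median(ratio)    # median is robust to SLAM outliers
    return scale * pred_depth   # depth map now in metric units
```

The median keeps a handful of bad SLAM landmarks from corrupting the global scale, which matters because monocular SLAM points are often noisy near occlusions.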



DnD: Dense Depth Estimation in Crowded Dynamic Indoor Scenes

August 2021 · 15 Reads

We present a novel approach for estimating depth from a monocular camera as it moves through complex and crowded indoor environments, e.g., a department store or a metro station. Our approach predicts absolute scale depth maps over the entire scene consisting of a static background and multiple moving people, by training on dynamic scenes. Since it is difficult to collect dense depth maps from crowded indoor environments, we design our training framework without requiring depths produced from depth sensing devices. Our network leverages RGB images and sparse depth maps generated from traditional 3D reconstruction methods to estimate dense depth maps. We use two constraints to handle depth for non-rigidly moving people without tracking their motion explicitly. We demonstrate that our approach offers consistent improvements over recent depth estimation methods on the NAVERLABS dataset, which includes complex and crowded scenes.
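The training signal described here, dense predictions supervised by sparse depth from classical 3D reconstruction, implies a masked regression loss over only the observed pixels; a minimal sketch follows (names and shapes are assumptions, not the paper's code).

```python
# Hedged sketch of a masked sparse-depth loss: the dense prediction is
# penalized only where the classical reconstruction produced a depth
# value. Assumes at least one valid pixel per batch.
import torch

def sparse_depth_loss(pred, sparse_gt):
    """pred, sparse_gt: (B, 1, H, W); sparse_gt is 0 at pixels with no
    reconstructed depth point."""
    mask = sparse_gt > 0                 # supervise only observed pixels
    return torch.abs(pred[mask] - sparse_gt[mask]).mean()
```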


Citations (8)


... Subsequently, a fine-tuning method has been established to estimate depth in a metrically accurate manner with the self-supervised learning scheme. To resolve the issue of scale ambiguity in single-image depth estimation in the wild or any rough or diverse environment, an algorithm termed SelfTune has been introduced in "SelfTune: Metrically Scaled Monocular Depth Estimation through Self-Supervised Learning" [126], which makes use of SLAM (Simultaneous Localization And Mapping) using proprioceptive sensors. These SLAM techniques can provide poses of cameras which are metrically scaled. ...

Reference: Deep Learning-Based Stereopsis and Monocular Depth Estimation Techniques: A Review
SelfTune: Metrically Scaled Monocular Depth Estimation through Self-Supervised Learning
  • Citing Conference Paper
  • May 2022

... A coarse initial estimate can be obtained via image retrieval [3,80] against a database of reference images. The pose(s) of the top-retrieved image(s) then provide an approximation of the pose of the query image [32,79]. A more efficient alternative to image retrieval is to directly regress the camera pose using a neural network [5,13,17,18,35,36,73,74,84]. ...

Investigating the Role of Image Retrieval for Visual Localization: An Exhaustive Benchmark

International Journal of Computer Vision

... However, these depth completion methods are vulnerable to noisy depth values from SLAM and varying sparse point distributions [33], [35]. Thus, we aim to leverage the learning-based depth estimation from a single image to predict depth with a metric scale [14]- [16]. ...

DnD: Dense Depth Estimation in Crowded Dynamic Indoor Scenes
  • Citing Conference Paper
  • October 2021

... Over the last decade, many visual localisation methods have been proposed, including feature matching-based approaches [11,21,27,30,41], scene coordinate regression [2][3][4] and absolute pose regressors (APRs) [17,18,37]. Much of this progress has been driven by the availability of diverse datasets and benchmarks [6,8,10,18,19,29,31,36,38,40,41,43,44]. However, most of these datasets present limitations that affect their application to XR. ...

Large-scale Localization Datasets in Crowded Indoor Spaces
  • Citing Conference Paper
  • June 2021

... This algorithm was extended to depth completion, introducing improvements like the use of aligned color as a guiding factor for the weight function and to define the order of computations [33], and the use of a pixel-wise confidence factor [30]. Sparse depth maps, generally captured with LiDAR sensors, suffer especially from large patches of missing depth data and have particular time limitations, as they are commonly linked with autonomous driving. Thus, more advanced techniques have been developed, relying on both supervised [29] and self-supervised [36,11,16] deep convolutional neural networks assisted by color information to fill large depth gaps. ...

SelfDeco: Self-Supervised Monocular Depth Completion in Challenging Indoor Environments
  • Citing Conference Paper
  • May 2021

... However, a sequence of images increases the accuracy of the method. Consequently, some papers have begun to leverage image sequences to estimate VBL (Lee et al. (2021); Brahmbhatt et al. (2018); Valada et al. (2018); Xue et al. (2019); Li et al. (2019)). In this article, we assume that sequential data is available, and we combine deep learning methods with traditional tracking methods for localization. ...

Local to Global: Efficient Visual Localization for a Monocular Camera
  • Citing Conference Paper
  • January 2021

... Indoor place recognition represents an important yet relatively less explored area. The SpoxelNet (Chang et al. 2020) neural network architecture was proposed as a 3D-PCPR method tailored for crowded indoor spaces. SpoxelNet effectively encodes input voxels into global descriptor vectors. ...

SpoxelNet: Spherical Voxel-based Deep Place Recognition for 3D Point Clouds of Crowded Indoor Spaces
  • Citing Conference Paper
  • October 2020