Fig. 4
Normalized influence area of the points. Given a set of points in Ω, notice how the influence expands around areas of similar local structure. First column: RGB image with the points of Ω labeled in different colors. Second column: influence areas computed by our method. Notice how this influence expands across areas that share the same local structure, but can be misled where points are lacking or where the neural network's estimate is not accurate enough. Figure best viewed in color.
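The caption describes normalized influence weights that spread out from the sparse points of Ω according to local image structure. The sketch below is a minimal, illustrative formulation, assuming the influence of each point is a softmax over feature-space distances between a per-pixel descriptor map (e.g. a CNN feature map) and the descriptor at that point; the feature extractor and the temperature `tau` are assumptions, not the authors' exact definition.

```python
import numpy as np

def influence_maps(features, omega, tau=0.1):
    """Normalized influence of each sparse point over every pixel.

    features : (H, W, C) per-pixel descriptors (e.g. from a CNN) -- assumed input
    omega    : list of (row, col) sparse points with known depth
    tau      : softness of the normalization (assumed hyper-parameter)

    Returns (len(omega), H, W) weights that sum to 1 over the points axis.
    """
    dists = np.stack([
        np.linalg.norm(features - features[r, c], axis=-1)  # feature distance to point k
        for (r, c) in omega
    ])                                                       # (K, H, W)
    logits = -dists / tau                                    # closer in feature space -> larger weight
    logits -= logits.max(axis=0, keepdims=True)              # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=0, keepdims=True)                  # normalize over the K points

# toy usage: random features, three seed points
feats = np.random.rand(48, 64, 8).astype(np.float32)
weights = influence_maps(feats, omega=[(10, 10), (24, 40), (40, 20)])
print(weights.shape, weights.sum(axis=0).round(3).min())     # (3, 48, 64) 1.0
```

The softmax normalization guarantees that, at every pixel, the influences of the points in Ω sum to one, which is what allows them to be used directly as fusion weights.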

Source publication
Article
Full-text available
Dense 3D mapping from a monocular sequence is a key technology for several applications and still an open research problem. This paper leverages recent results on single-view CNN-based depth estimation and fuses them with multi-view depth estimation. Both approaches present complementary strengths. Multi-view depth estimation is highly accurate but only...

Context in source publication

Context 1
... normalized weights expand the local influence to the whole image (see Fig. 4 and Fig. 5 for a more detailed view). Notice how the influence expands along planes even if the points in Ω do not reach the end of the plane, and how it is sharply reduced when the local structure changes. Once these influence weights have been calculated and normalized, the fusion depth estimation, f, for each point (i, j) is a combination ...
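The excerpt stops before the combination itself, so the sketch below only illustrates one plausible way the normalized weights could drive the fusion: scale corrections between the accurate multi-view depths and the single-view prediction are measured at the points of Ω and spread across the image with the weights. The ratio-based correction and all variable names are assumptions for illustration, not the paper's exact equation.

```python
import numpy as np

def fuse_depth(d_single, d_multi_at_omega, omega, weights):
    """Hypothetical fusion: propagate multi-view/single-view scale corrections
    measured at the sparse points of Omega through the normalized influence weights.

    d_single         : (H, W) dense single-view CNN depth
    d_multi_at_omega : (K,) accurate multi-view depths at the Omega points
    omega            : list of K (row, col) points
    weights          : (K, H, W) normalized influence maps (sum to 1 over K)
    """
    # per-point correction factor between the two estimates (assumed form)
    ratios = np.array([d_multi_at_omega[k] / d_single[r, c]
                       for k, (r, c) in enumerate(omega)])          # (K,)
    # spread the corrections with the influence weights and rescale the dense map
    correction = np.tensordot(ratios, weights, axes=(0, 0))          # (H, W)
    return correction * d_single

d_single = np.full((48, 64), 2.0)                                    # toy single-view depth
omega = [(10, 10), (24, 40), (40, 20)]
weights = np.full((3, 48, 64), 1 / 3)                                # uniform toy weights
fused = fuse_depth(d_single, np.array([1.8, 2.2, 2.0]), omega, weights)
print(fused.shape, fused.mean().round(3))
```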

Similar publications

Thesis
Full-text available
This work presents the development of a project whose objective is the MATLAB implementation of a SLAM algorithm for a commercial mobile robot operating in structured, static environments, using the robot's odometry and its model for localization and a laser range sensor for mapping; in addition, a...
Article
Full-text available
To solve the SLAM (simultaneous localization and map building) problem for mobile robots, a particle-filter-based multi-robot SLAM algorithm with communication in unknown environments is proposed. In the standard particle filter, an incremental map construction method based on point-line consistency is introduced to preserve the hypothesis of the line seg...
Article
Full-text available
Visual odometry in the field of computer vision and robotics is a well-known approach with which the position and orientation of an agent can be obtained using only images from one or more cameras. In most traditional point-feature-based visual odometry, one important assumption, and also an ideal condition, is that the scene remains static....
Chapter
Full-text available
Multi-robot systems have recently been in the spotlight for their efficiency in performing tasks. However, if there is no map of the working environment, each robot must perform SLAM, which simultaneously localizes the robot and maps its surroundings. To operate multi-robot systems efficiently, the individual maps should be ac...
Conference Paper
Full-text available
Abstract: To reduce the feature matching time in vision-based multi-robot simultaneous localization and mapping (SLAM), a feature matching algorithm based on the map environment is proposed in this paper. This algorithm differs from previously proposed methods in that it establishes feature libraries by classifying features collected in the mobile...

Citations

... Despite its complexity, mastering single-view depth estimation is critical for scenarios where multi-view or motion-based methods are impractical or impossible, e.g., in endoscopy. As depth estimation algorithms from single views mature in precision, they also offer potential for integration into multi-view 3D reconstruction pipelines [Facil et al., 2017]. ...
Preprint
Full-text available
Single-view depth estimation refers to the ability to derive three-dimensional information per pixel from a single two-dimensional image. Single-view depth estimation is an ill-posed problem because there are multiple depth solutions that explain the 3D geometry from a single view. While deep neural networks have been shown to be effective at capturing depth from a single view, the majority of current methodologies are deterministic in nature. Accounting for uncertainty in the predictions can avoid disastrous consequences when applied to fields such as autonomous driving or medical robotics. We have addressed this problem by quantifying the uncertainty of supervised single-view depth with Bayesian deep neural networks. There are scenarios, especially in medicine with endoscopic images, where such annotated data is not available. To alleviate the lack of data, we present a method that improves the transition from the synthetic to the real domain. We introduce an uncertainty-aware teacher-student architecture that is trained in a self-supervised manner, taking into account the teacher's uncertainty. Given the vast amount of unannotated data and the challenges associated with capturing annotated depth in medical minimally invasive procedures, we advocate a fully self-supervised approach that only requires RGB images and the geometric and photometric calibration of the endoscope. In endoscopic imaging, the camera and light sources are co-located at a small distance from the target surfaces. This setup implies that brighter areas of the image are nearer to the camera, while darker areas are further away. Building on this observation, we exploit the fact that, for any given albedo and surface orientation, pixel brightness is inversely proportional to the square of the distance. We propose the use of illumination as a strong single-view self-supervisory signal for deep neural networks.
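The last sentences of the abstract state the inverse-square relation between pixel brightness and distance for a co-located camera and light source. The snippet below only illustrates that relation as a relative depth cue; in the paper it is used as a self-supervisory signal for training a network rather than as a direct depth formula, and the normalization shown here is an assumption.

```python
import numpy as np

def depth_from_brightness(image_gray, eps=1e-6):
    """Relative depth cue from the inverse-square light fall-off described above:
    with a co-located camera/light and fixed albedo and orientation, brightness b
    is proportional to 1/d**2, so d is proportional to 1/sqrt(b) (up to scale)."""
    b = np.clip(image_gray.astype(np.float64), eps, None)
    d_rel = 1.0 / np.sqrt(b)
    return d_rel / d_rel.max()          # normalize: 1.0 = farthest observed point

img = np.random.rand(32, 32) + 0.1      # toy brightness image
print(depth_from_brightness(img).min(), depth_from_brightness(img).max())
```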
... Moreover, it is found in [7] that although the overall performance of monocular models is poorer than that of binocular ones, monocular models still perform better on some special local regions, e.g., the occluded regions around objects that can only be seen from a single view. Inspired by this finding, some monocular (or binocular) models employed a separate binocular (or monocular) model to boost their performance on their own task [1,7,9,15,40,42,49]. All the above issues naturally raise the following question: is it feasible to explore a general model that can not only handle the two tasks compatibly, but also improve the prediction accuracy? ...
Preprint
Full-text available
Monocular and binocular self-supervised depth estimation are two important and related tasks in computer vision, which aim to predict scene depth from single images and stereo image pairs respectively. In the literature, the two tasks are usually tackled separately by two different kinds of models: binocular models generally fail to predict depth from single images, while the prediction accuracy of monocular models is generally inferior to that of binocular models. In this paper, we propose a Two-in-One self-supervised depth estimation network, called TiO-Depth, which can not only handle the two tasks compatibly but also improve the prediction accuracy. TiO-Depth employs a Siamese architecture, and each of its sub-networks can be used as a monocular depth estimation model. For binocular depth estimation, a Monocular Feature Matching module is proposed for incorporating the stereo knowledge between the two images, and the full TiO-Depth is used to predict depths. We also design a multi-stage joint-training strategy for improving the performance of TiO-Depth in both tasks by combining their relative advantages. Experimental results on the KITTI, Cityscapes, and DDAD datasets demonstrate that TiO-Depth outperforms both the monocular and binocular state-of-the-art methods in most cases, and further verify the feasibility of a two-in-one network for monocular and binocular depth estimation. The code is available at https://github.com/ZM-Zhou/TiO-Depth_pytorch.
... Most promising, however, are those systems that combine deep learning with standard geometric constraints ([16]-[22]). It was shown in [23] that learning-based and geometry-based approaches have a complementary nature, as learning-based systems tend to perform better on the interior points of objects but blur edges, whereas geometry-based systems typically do well in areas with a high image gradient but perform poorly on interior points that may lack texture. ...
Preprint
The best way to combine the results of deep learning with standard 3D reconstruction pipelines remains an open problem. While systems that pass the output of traditional multi-view stereo approaches to a network for regularisation or refinement currently seem to get the best results, it may be preferable to treat deep neural networks as separate components whose results can be probabilistically fused into geometry-based systems. Unfortunately, the error models required to do this type of fusion are not well understood, with many different approaches being put forward. Recently, a few systems have achieved good results by having their networks predict probability distributions rather than single values. We propose using this approach to fuse a learned single-view depth prior into a standard 3D reconstruction system. Our system is capable of incrementally producing dense depth maps for a set of keyframes. We train a deep neural network to predict discrete, nonparametric probability distributions for the depth of each pixel from a single image. We then fuse this "probability volume" with another probability volume based on the photometric consistency between subsequent frames and the keyframe image. We argue that combining the probability volumes from these two sources will result in a volume that is better conditioned. To extract depth maps from the volume, we minimise a cost function that includes a regularisation term based on network predicted surface normals and occlusion boundaries. Through a series of experiments, we demonstrate that each of these components improves the overall performance of the system.
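The abstract describes fusing a network-predicted probability volume with a photometric-consistency volume over discrete depth hypotheses. A minimal sketch of that idea, assuming the two sources can be treated as independent per-pixel distributions that are multiplied and renormalized (the paper's regularisation with predicted surface normals and occlusion boundaries is omitted):

```python
import numpy as np

def fuse_probability_volumes(p_single, p_photo, depth_bins):
    """Illustrative fusion: treat the two sources as independent per-pixel
    distributions over discrete depth bins, multiply them (add log-probabilities),
    renormalize, and take the most likely bin."""
    log_p = np.log(p_single + 1e-12) + np.log(p_photo + 1e-12)       # (H, W, D)
    log_p -= log_p.max(axis=-1, keepdims=True)                        # numerical stability
    fused = np.exp(log_p)
    fused /= fused.sum(axis=-1, keepdims=True)
    return depth_bins[np.argmax(fused, axis=-1)]                      # (H, W) depth map

H, W, D = 24, 32, 64
bins = np.linspace(0.5, 10.0, D)
p1 = np.random.dirichlet(np.ones(D), size=(H, W))                     # toy network volume
p2 = np.random.dirichlet(np.ones(D), size=(H, W))                     # toy photometric volume
print(fuse_probability_volumes(p1, p2, bins).shape)                   # (24, 32)
```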
... While many of these approaches advocate a completely end-to-end framework (for example, [14]-[19]), there has been some work demonstrating the benefit of combining both geometric constraints and learned priors. As shown in [20], geometry-based systems perform best on areas of high image gradient (usually on the edges of objects) but struggle with interior areas of low texture, whereas learning-based systems typically do reasonably well on interior points but blur the edges of objects. Despite the evidence of their complementary nature, however, the best approach to combining learning and geometry remains an open problem. ...
... For this reason, a number of approaches that take network depth predictions and refine them with geometric constraints have been proposed. In [20], the authors compute a network depth prediction for each keyframe and update a semi-dense multiview stereo depth map with each new frame. The two depth estimates are then interpolated based on a set of tunable weights related to the image structure. ...
Preprint
While the keypoint-based maps created by sparse monocular simultaneous localisation and mapping (SLAM) systems are useful for camera tracking, dense 3D reconstructions may be desired for many robotic tasks. Solutions involving depth cameras are limited in range and to indoor spaces, and dense reconstruction systems based on minimising the photometric error between frames are typically poorly constrained and suffer from scale ambiguity. To address these issues, we propose a 3D reconstruction system that leverages the output of a convolutional neural network (CNN) to produce fully dense depth maps for keyframes that include metric scale. Our system, DeepFusion, is capable of producing real-time dense reconstructions on a GPU. It fuses the output of a semi-dense multiview stereo algorithm with the depth and gradient predictions of a CNN in a probabilistic fashion, using learned uncertainties produced by the network. While the network only needs to be run once per keyframe, we are able to optimise for the depth map with each new frame so as to constantly make use of new geometric constraints. Based on its performance on synthetic and real-world datasets, we demonstrate that DeepFusion is capable of performing at least as well as other comparable systems.
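DeepFusion fuses a semi-dense multi-view stereo estimate with CNN depth and gradient predictions using learned uncertainties. The sketch below shows only the textbook inverse-variance (Gaussian product) fusion of two depth estimates per pixel, as an assumption-level stand-in for the system's full probabilistic optimisation:

```python
import numpy as np

def inverse_variance_fusion(d_mvs, var_mvs, d_cnn, var_cnn):
    """Per-pixel probabilistic fusion of two depth estimates with uncertainties,
    following the standard Gaussian product; the actual system also uses predicted
    gradients and optimises over whole keyframes."""
    w_mvs = 1.0 / var_mvs
    w_cnn = 1.0 / var_cnn
    d_fused = (w_mvs * d_mvs + w_cnn * d_cnn) / (w_mvs + w_cnn)
    var_fused = 1.0 / (w_mvs + w_cnn)
    return d_fused, var_fused

d, v = inverse_variance_fusion(np.array([2.0]), np.array([0.04]),
                               np.array([2.5]), np.array([0.25]))
print(d.round(3), v.round(4))   # estimate pulled toward the more certain source
```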
... I. INTRODUCTION Monocular depth estimation is an important problem in robotics and computer vision. Depth maps can be used to understand the 3D structure and relative positions of objects in a scene for applications including autonomous driving [1], visual odometry [2], [3], augmented reality [4], sensor fusion [5], and many others. Estimating depth from a monocular image is an inherently ill-posed problem, since 3D information is irretrievably lost when the camera projects to a 2D image. ...
Preprint
Full-text available
Estimating depth from a monocular image is an ill-posed problem: when the camera projects a 3D scene onto a 2D plane, depth information is inherently and permanently lost. Nevertheless, recent work has shown impressive results in estimating 3D structure from 2D images using deep learning. In this paper, we put on an introspective hat and analyze state-of-the-art monocular depth estimation models in indoor scenes to understand these models' limitations and error patterns. To address errors in depth estimation, we introduce a novel Depth Error Detection Network (DEDN) that spatially identifies erroneous depth predictions in the monocular depth estimation models. By experimenting with multiple state-of-the-art monocular indoor depth estimation models on multiple datasets, we show that our proposed depth error detection network can identify a significant number of errors in the predicted depth maps. Our module is flexible and can be readily plugged into any monocular depth prediction network to help diagnose its results. Additionally, we propose a simple yet effective Depth Error Correction Network (DECN) that iteratively corrects errors based on our initial error diagnosis.
... It is also possible to train stereo networks without ground truth supervision [98,82,1,36], but these models are typically outperformed by supervised variants. Some works fuse conventional matching-based stereo estimation with monocular depth cues [71,60,17]. In contrast, we do not require stereo pairs during training or testing. ...
Conference Paper
Self-supervised monocular depth estimation networks are trained to predict scene depth using nearby frames as a supervision signal during training. However, for many applications, sequence information in the form of video frames is also available at test time. The vast majority of monocular networks do not make use of this extra signal, thus ignoring valuable information that could be used to improve the predicted depth. Those that do, either use computationally expensive test-time refinement techniques or off-the-shelf recurrent networks, which only indirectly make use of the geometric information that is inherently available. We propose ManyDepth, an adaptive approach to dense depth estimation that can make use of sequence information at test time, when it is available. Taking inspiration from multi-view stereo, we propose a deep end-to-end cost volume based approach that is trained using self-supervision only. We present a novel consistency loss that encourages the network to ignore the cost volume when it is deemed unreliable, e.g. in the case of moving objects, and an augmentation scheme to cope with static cameras. Our detailed experiments on both KITTI and Cityscapes show that we outperform all published self-supervised baselines, including those that use single or multiple frames at test time.
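ManyDepth builds a cost volume from deep features warped with camera poses and depth hypotheses. The simplified sketch below replaces pose-based warping with plain horizontal shifts over candidate disparities (and uses SciPy's uniform_filter for patch aggregation), so it only conveys the general cost-volume idea, not the method itself:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def sad_cost_volume(ref, src, max_disp=16, patch=3):
    """Simplified cost volume over horizontal disparities using patch-averaged
    absolute differences; plain shifts stand in for pose-based feature warping."""
    volume = np.stack([
        uniform_filter(np.abs(ref - np.roll(src, d, axis=1)), size=patch)
        for d in range(max_disp)
    ])                                               # (max_disp, H, W)
    return volume, np.argmin(volume, axis=0)         # winner-take-all disparity

ref = np.random.rand(32, 48)
src = np.roll(ref, -4, axis=1)                       # source offset by 4 pixels
_, disp = sad_cost_volume(ref, src)
print(np.bincount(disp.ravel()).argmax())            # most pixels recover disparity 4
```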
... Maybe the only exception is end-to-end pose regression (e.g., [28]), for which [29] shows a worse performance than geometric methods. [30], [31] are early works combining single-view depth with multiview approaches for mapping and tracking respectively. [32] proposed a convolutional model for camera tracking and incremental mapping, in this case tightly integrating multiview optimization within the network model. ...
Preprint
Full-text available
Estimating a scene reconstruction and the camera motion from in-body videos is challenging due to several factors, e.g. the deformation of in-body cavities or the lack of texture. In this paper we present Endo-Depth-and-Motion, a pipeline that estimates the 6-degrees-of-freedom camera pose and dense 3D scene models from monocular endoscopic videos. Our approach leverages recent advances in self-supervised depth networks to generate pseudo-RGBD frames, then tracks the camera pose using photometric residuals and fuses the registered depth maps in a volumetric representation. We present an extensive experimental evaluation in the public dataset Hamlyn, showing high-quality results and comparisons against relevant baselines. We also release all models and code for future comparisons.
... The proposed depth refinement algorithm can restore missing depth information by combining learning-based monocular depth estimation and MVS methods. Similarly, Fácil et al. [58] fused CNN-based single-view and multi-view depth to improve the depth of low-parallax image sequences. Martins et al. [59] demonstrated that fusing monocular depth estimates with stereo depth leads to higher performance. ...
Article
Full-text available
Image-based rendering (IBR) attempts to synthesize novel views using a set of observed images. Some IBR approaches (such as light fields) have yielded impressive high-quality results on small-scale scenes with dense photo capture. However, available wide-baseline IBR methods are still restricted by the low geometric accuracy and completeness of multi-view stereo (MVS) reconstruction on low-textured and non-Lambertian surfaces. The issues become more significant in large-scale outdoor scenes due to challenging scene content, e.g., buildings, trees, and sky. To address these problems, we present a novel IBR algorithm that consists of two key components. First, we propose a novel depth refinement method that combines MVS depth maps with monocular depth maps predicted via deep learning. A lookup table remap is proposed for converting the scale of the monocular depths to be consistent with the scale of the MVS depths. Then, the rescaled monocular depth is used as the constraint in the minimum spanning tree (MST)-based nonlocal filter to refine the per-view MVS depth. Second, we present an efficient shape-preserving warping algorithm that uses superpixels to generate the warped images and blend expected novel views of scenes. The proposed method has been evaluated on public MVS and view synthesis datasets, as well as newly captured large-scale outdoor datasets. In comparison with state-of-the-art methods, the experimental results demonstrated that the proposed method can obtain more complete and reliable depth maps for the challenging large-scale outdoor scenes, thereby resulting in more promising novel view synthesis.
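The abstract mentions a lookup-table remap that brings the scale of the monocular depths into agreement with the MVS depths. As a hedged illustration of that kind of scale alignment (not the paper's actual table construction), the sketch below matches the quantiles of the two depth maps over pixels where the MVS depth is valid and interpolates:

```python
import numpy as np

def remap_mono_to_mvs_scale(d_mono, d_mvs, valid, n_bins=64):
    """Illustrative scale alignment: build a lookup table from the quantiles of
    the monocular depths to the quantiles of the MVS depths over valid pixels,
    then apply it to the dense monocular prediction by interpolation."""
    q = np.linspace(0.0, 1.0, n_bins)
    mono_q = np.quantile(d_mono[valid], q)           # lookup-table keys
    mvs_q = np.quantile(d_mvs[valid], q)             # lookup-table values
    return np.interp(d_mono, mono_q, mvs_q)          # remapped dense monocular depth

d_mono = np.random.rand(32, 32) + 0.5                # relative-scale prediction
d_mvs = 3.0 * d_mono + 0.2                           # toy "metric" MVS depth
valid = np.random.rand(32, 32) > 0.5                 # sparse validity mask
d_aligned = remap_mono_to_mvs_scale(d_mono, d_mvs, valid)
print(np.abs(d_aligned - d_mvs).mean().round(3))     # small residual after remapping
```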
... In that case, we could use a depth sensor, such as an RGB-D camera, or a stereo camera to estimate the depth. Alternatively, some works estimate depth purely from monocular information [64]. As a proof of concept, we found that the probability score of the detection network was correlated with the level of occlusion of each object. ...
Article
Full-text available
Prosthetic vision is being applied to partially recover the retinal stimulation of visually impaired people. However, the phosphenic images produced by the implants have very limited information bandwidth due to the poor resolution and lack of color or contrast. The ability of object recognition and scene understanding in real environments is severely restricted for prosthetic users. Computer vision can play a key role to overcome the limitations and to optimize the visual information in the prosthetic vision, improving the amount of information that is presented. We present a new approach to build a schematic representation of indoor environments for simulated phosphene images. The proposed method combines a variety of convolutional neural networks for extracting and conveying relevant information about the scene such as structural informative edges of the environment and silhouettes of segmented objects. Experiments were conducted with normal sighted subjects with a Simulated Prosthetic Vision system. The results show good accuracy for object recognition and room identification tasks for indoor scenes using the proposed approach, compared to other image processing methods.