Figure - available from: Autonomous Robots
Block diagram of the visual odometry software system

Source publication
Article
Full-text available
Visual odometry can be augmented by depth information such as provided by RGB-D cameras, or from lidars associated with cameras. However, such depth information can be limited by the sensors, leaving large areas in the visual images where depth is unavailable. Here, we propose a method to utilize the depth, even if sparsely available, in recovery o...

Citations

... Many studies propose methods that combine and complement the strengths of vision sensors and LiDAR sensors, which have contrasting characteristics. The approaches suggested in [15][16][17][18] involve extracting visual features from the vision sensor and measuring depth with the LiDAR sensor. Although these methods leverage the advantages of both sensors, the point cloud generated by the LiDAR sensor is much sparser than the image from the vision sensor, resulting in 3D-2D depth association errors. ...
... The loosely coupled approach includes LiDAR-assisted Visual SLAM and visual-assisted LiDAR SLAM. LiDAR-assisted Visual SLAM [15][16][17][18] addresses one of the main issues in Visual SLAM, namely the depth inaccuracy of image feature points, by correcting them with LiDAR data. While this greatly enhances the accuracy of the feature points, the relatively low resolution of LiDAR data can lead to errors in the 3D-2D mapping. ...
... Previous studies [15][16][17][18] have utilized LiDAR-assisted visual odometry based on image feature points. However, as illustrated in Figure 3, due to the density difference between vision sensors and LiDAR sensors, not all image feature points can be matched with LiDAR 3D points. ...
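The 3D-2D depth association these excerpts describe can be illustrated with a minimal sketch: LiDAR points are projected into the image with assumed extrinsics and intrinsics, and each tracked feature takes the depth of the nearest projected point within a pixel radius. All names, the fixed radius, and the thresholds below are illustrative assumptions, not code from any of the cited systems.

```python
import numpy as np

def associate_depth(features_uv, lidar_xyz, K, T_cam_lidar, max_px_dist=3.0):
    """Assign a depth to each 2D image feature from a sparse LiDAR scan.

    features_uv : (N, 2) pixel coordinates of tracked features
    lidar_xyz   : (M, 3) LiDAR points in the LiDAR frame
    K           : (3, 3) camera intrinsic matrix
    T_cam_lidar : (4, 4) extrinsic transform from the LiDAR to the camera frame
    Returns an (N,) array of depths; NaN where no LiDAR point projects close enough.
    """
    # Transform the LiDAR points into the camera frame and keep points in front of it.
    pts_h = np.hstack([lidar_xyz, np.ones((lidar_xyz.shape[0], 1))])
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]
    pts_cam = pts_cam[pts_cam[:, 2] > 0.1]

    depths = np.full(len(features_uv), np.nan)
    if len(pts_cam) == 0:
        return depths

    # Project the remaining points into the image plane.
    uvw = (K @ pts_cam.T).T
    uv = uvw[:, :2] / uvw[:, 2:3]

    for i, f in enumerate(np.asarray(features_uv, dtype=float)):
        # Nearest projected LiDAR point; with a sparse cloud this is exactly
        # where the 3D-2D association errors mentioned above creep in.
        d2 = np.sum((uv - f) ** 2, axis=1)
        j = np.argmin(d2)
        if d2[j] <= max_px_dist ** 2:
            depths[i] = pts_cam[j, 2]
    return depths
```

Because a projected LiDAR scan covers only a small fraction of the image pixels, many features receive no depth at all, which is the sparsity issue the excerpts point out.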
Article
Full-text available
In this study, we enhanced odometry performance by integrating vision sensors with LiDAR sensors, which exhibit contrasting characteristics. Vision sensors provide extensive environmental information but are limited in precise distance measurement, whereas LiDAR offers high accuracy in distance metrics but lacks detailed environmental data. By utilizing data from vision sensors, this research compensates for the inadequate descriptors of LiDAR sensors, thereby improving LiDAR feature matching performance. Traditional fusion methods, which rely on extracting depth from image features, depend heavily on vision sensors and are vulnerable under challenging conditions such as rain, darkness, or light reflection. Utilizing vision sensors as primary sensors under such conditions can lead to significant mapping errors and, in the worst cases, system divergence. Conversely, our approach uses LiDAR as the primary sensor, mitigating the shortcomings of previous methods and enabling vision sensors to support LiDAR-based mapping. This maintains LiDAR Odometry performance even in environments where vision sensors are compromised, thus enhancing performance with the support of vision sensors. We adopted five prominent algorithms from the latest LiDAR SLAM open-source projects and conducted experiments on the KITTI odometry dataset. This research proposes a novel approach by integrating a vision support module into the top three LiDAR SLAM methods, thereby improving performance. By making the source code of VA-LOAM publicly available, this work enhances the accessibility of the technology, fostering reproducibility and transparency within the research community.
... Zhang et al. [25,72] utilized LiDAR depth information to enhance visual odometry in DEMO. They used the estimated camera pose to register a depth map, to which new points from the point clouds in front of the camera are added. ...
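A rough sketch of the depth-map registration idea mentioned in this excerpt, assuming a simple world-frame point buffer that is re-expressed in the current camera frame with the estimated pose and pruned to points in front of the camera; the class and its parameters are hypothetical, not DEMO's actual data structure.

```python
import numpy as np

class DepthMap:
    """Accumulates point-cloud points in the world frame and exposes the subset
    currently lying in front of the camera (illustrative sketch only)."""

    def __init__(self, max_points=200_000):
        self.points_world = np.empty((0, 3))
        self.max_points = max_points

    def add_scan(self, scan_xyz_cam, T_world_cam):
        # Register a new scan (given in the camera frame) into the world frame
        # using the estimated camera pose T_world_cam (4x4).
        pts_h = np.hstack([scan_xyz_cam, np.ones((len(scan_xyz_cam), 1))])
        new_world = (T_world_cam @ pts_h.T).T[:, :3]
        self.points_world = np.vstack([self.points_world, new_world])[-self.max_points:]

    def points_in_front(self, T_world_cam, min_depth=0.1):
        # Re-express the map in the current camera frame and keep points ahead of it.
        T_cam_world = np.linalg.inv(T_world_cam)
        pts_h = np.hstack([self.points_world, np.ones((len(self.points_world), 1))])
        pts_cam = (T_cam_world @ pts_h.T).T[:, :3]
        return pts_cam[pts_cam[:, 2] > min_depth]
```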
Article
In recent years, Simultaneous Localization And Mapping (SLAM) technology has prevailed in a wide range of applications, such as autonomous driving, intelligent robots, Augmented Reality (AR), and Virtual Reality (VR). Multi-sensor fusion using the three most popular types of sensors (e.g., visual sensors, LiDAR sensors, and IMUs) is becoming ubiquitous in SLAM, in part because of their complementary sensing capabilities and the inevitable shortcomings (e.g., low precision and long-term drift) of any stand-alone sensor in challenging environments. In this article, we survey thoroughly the research efforts taken in this field and strive to provide a concise but complete review of the related work. Firstly, a brief introduction to the state estimator formulation in SLAM is presented. Secondly, the state-of-the-art multi-sensor fusion algorithms are reviewed. Then we analyze the deficiencies associated with the reviewed approaches and formulate some future research considerations. This paper can be considered a brief guide for newcomers and a comprehensive reference for experienced researchers and engineers to explore new research directions.
... The visual-LiDAR SLAM systems [42,45,57] provide a low-cost, low-compute, high-precision approach to mapping and odometry. DEMO [54,55] was the first to associate LiDAR depth measurements with Harris corner features. However, due to the lack of loop closure, the accumulated error cannot be optimized and eliminated. ...
Preprint
Full-text available
The mobile robot relies on SLAM (Simultaneous Localization and Mapping) to provide autonomous navigation and task execution in complex and unknown environments. However, it is hard to develop a dedicated algorithm for mobile robots in dynamic and challenging situations, such as poor lighting conditions and motion blur. To tackle this issue, we propose a tightly-coupled LiDAR-visual SLAM based on geometric features, which includes two sub-systems (LiDAR and monocular visual SLAM) and a fusion framework. The fusion framework associates the depth and semantics of the multi-modal geometric features to complement the visual line landmarks and to add direction optimization in Bundle Adjustment (BA), which further constrains the visual odometry. On the other hand, the entire line segments detected by the visual subsystem overcome the limitation of the LiDAR subsystem, which can only compute geometric features locally; they adjust the direction of linear feature points and filter out outliers, leading to a more accurate odometry system. Finally, we employ a module that monitors the subsystems' operation and provides the LiDAR subsystem's output as a complementary trajectory when visual tracking fails. The evaluation results on the public dataset M2DGR, gathered by ground robots across various indoor and outdoor scenarios, show that our system achieves more accurate and robust pose estimation compared to current state-of-the-art multi-modal methods.
... Independent use of LiDAR SLAM in various scenarios, such as long tunnels or environments with occlusions, remains challenging due to degradation and outliers. Therefore, the combination of cameras and LiDAR in SLAM shows more robust capabilities. DEMO [ZKS14, ZKS17] was the first to associate LiDAR depth measurements with Harris corner features. It builds two different constraints, for features with and without depth, that are used together for pose optimization. ...
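A minimal sketch of the two constraint types mentioned in this excerpt, assuming normalized image coordinates and a relative motion (R, t): a reprojection residual for features with known depth and a depth-free epipolar residual for features without depth. These are generic forms chosen here for illustration, not DEMO's exact equations.

```python
import numpy as np

def reprojection_residual(x1, x2, depth, R, t):
    """Constraint for a feature with known depth: back-project it in frame 1,
    transform with the relative motion (R, t), and compare the reprojection
    against the observation x2 in frame 2 (normalized image coordinates)."""
    p1 = depth * np.array([x1[0], x1[1], 1.0])
    p2 = R @ p1 + t
    return p2[:2] / p2[2] - np.asarray(x2, dtype=float)

def epipolar_residual(x1, x2, R, t):
    """Constraint for a feature without depth: scale-free epipolar residual
    x2^T [t]_x R x1, which involves only the relative motion."""
    tx = np.array([[0.0, -t[2], t[1]],
                   [t[2], 0.0, -t[0]],
                   [-t[1], t[0], 0.0]])  # skew-symmetric matrix of t
    E = tx @ R                           # essential matrix
    x1h = np.array([x1[0], x1[1], 1.0])
    x2h = np.array([x2[0], x2[1], 1.0])
    return float(x2h @ E @ x1h)
```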
Thesis
Full-text available
Providing stable navigation for the visually impaired in various unknown scenarios is a significant challenge. Simultaneous Localization And Mapping (SLAM) technology is essential for navigation with precise and reliable real-time map and position. It helps visually impaired people reach their destinations smoothly. Several SLAM algorithms are currently designed for such navigation. However, tracking in unfamiliar environments without prior information poses a significant challenge for achieving reliable results. Single-sensor SLAM algorithms face challenges in improving robustness due to sensor limitations. Multi-modal SLAM has a higher potential for improving robustness. Multi-modal SLAM systems supplement the information in different dimensions by combining the different sensors. For example, cameras capture rich texture information such as color and brightness but provide only 2D information. Depth, an essential parameter for map reconstruction, is missing. On the other hand, the LiDAR sensor captures depth information in full 360 degrees, but the resulting point clouds are massive, unordered, and lack semantic information. We also found that algorithms based on point features are susceptible to noise. Geometric features, such as linear and planar features, are more robust and are receiving more attention in this field. In this master thesis, we propose a tightly-coupled LiDAR-visual SLAM based on geometric features. It consists of two parallel subsystems - a LiDAR and a visual subsystem. These subsystems independently generate rich geometric features. While the system is running, we construct a fusion framework that acquires geometric features from the front end of both subsystems. As the dimensions of the geometric features generated by the two subsystems are inconsistent, we establish a spherical coordinate system as a fusion reference system. It ensures the spatial and temporal consistency of the geometric features within a unified system. Our LiDAR-visual system, also known as a multi-modal SLAM, employs geometric information provided by the visual subsystem, such as 2D detected lines, and linear and planar features with depth and direction supplied by the LiDAR subsystem. Through neighborhood search, projection, and computation of the feature's direction, our multi-modal framework generates high-quality and diverse spatial lines. These lines return as new optimization terms to both subsystems. The LiDAR subsystem uses 2D line features from the multi-modal frame to optimize the direction of its linear feature points. In the monocular visual subsystem, the reconstructed lines with depth and direction contribute as new optimization terms to the visual odometry estimation and back-end optimization, thus improving the accuracy of the multi-modal subsystem. Finally, we employ a module to detect whether the subsystem is working. We choose the visual subsystem output as the final result of our algorithm since visual SLAM has higher accuracy, even though it may have lower robustness. If the detection module detects a failure in the visual odometry tracking, we employ the LiDAR subsystem's trajectory and map as the final result. With our selected dataset M2DGR, we completed experiments in various scenes, including narrow and spacious indoor and outdoor environments with varying lighting conditions. These scenes match the diverse, complex, and unfamiliar navigational requirements of the visually impaired. 
We executed the proposed SLAM algorithm and analyzed the results qualitatively and quantitatively. We found that the feature fusion in our multi-modal framework was effective in various scenarios, and our algorithm achieved higher accuracy than its predecessor subsystems. Furthermore, our algorithm produced complete trajectories and maps in every scene. This proves the robustness of our algorithm. In conclusion, our system explores the fusion of geometric features in visual-LiDAR multi-modal SLAM and has made significant progress in this area. It provides navigation systems with more accurate position and environment information in unknown scenarios. Our system also adapts to different scenes and provides stable and reliable performance.
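The spherical coordinate system used as a fusion reference in the thesis above can be sketched with a standard Cartesian-to-spherical conversion; the function name and the exact angle conventions are illustrative assumptions, not the thesis' implementation.

```python
import numpy as np

def to_spherical(points_xyz):
    """Convert Cartesian points (N, 3) in a common sensor frame to spherical
    coordinates (range, azimuth, elevation), which can serve as a shared
    reference for LiDAR and visual geometric features."""
    x, y, z = points_xyz[:, 0], points_xyz[:, 1], points_xyz[:, 2]
    r = np.sqrt(x**2 + y**2 + z**2)
    azimuth = np.arctan2(y, x)  # angle in the x-y plane
    elevation = np.arcsin(np.clip(z / np.maximum(r, 1e-9), -1.0, 1.0))
    return np.stack([r, azimuth, elevation], axis=1)
```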
... However, this method performs sensor fusion in a limited space, did not perform accurately in experiments on real-time datasets, and was not compared with other methods. The visual odometry DEMO [18] proposed by Zhang et al. combines monocular vision and depth information, classifying feature points into those with depth associated from LiDAR, those with depth obtained by triangulation, and those without depth information, and then combines them for pose estimation. The experiments were conducted on the KITTI Visual Odometry dataset, and the results show that the positioning accuracy is even higher than that of some stereo methods, but DEMO is more sensitive to features at specific angles. ...
... We compared our method with other multi-sensor fusion SLAM systems. In Table 3, we compare the average translation error of our method with DEMO [18] and DVL-SLAM [19] on the KITTI Visual Odometry dataset. Out of 11 sequences, we outperform DEMO in 7 sequences and DVL-SLAM in 6 sequences. ...
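The three-way feature classification that the excerpt above attributes to DEMO can be sketched as follows; the depth cutoff and return labels are hypothetical, only the categorization logic follows the description.

```python
def classify_feature(lidar_depth, triangulated_depth, max_depth=80.0):
    """Sort a tracked feature point into one of the three categories named in
    the excerpt above. Inputs may be None when the corresponding depth source
    is unavailable; the max_depth cutoff is an arbitrary illustrative value."""
    if lidar_depth is not None and 0.0 < lidar_depth < max_depth:
        return "depth_from_lidar", lidar_depth
    if triangulated_depth is not None and triangulated_depth > 0.0:
        return "depth_from_triangulation", triangulated_depth
    return "no_depth", None
```

Features in the first two categories can contribute depth-based constraints, while the last category contributes only depth-free constraints, which is how the combined pose estimation described above uses all three.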
Article
Full-text available
Monocular cameras and LiDAR are the two most commonly used sensors in unmanned vehicles. Combining the advantages of the two is a current research focus of SLAM and semantic analysis. In this paper, we propose an improved SLAM and semantic reconstruction method based on the fusion of LiDAR and monocular vision. We fuse the semantic image with the low-resolution 3D LiDAR point clouds and generate dense semantic depth maps. In the visual odometry, ORB feature points with depth information are selected to improve positioning accuracy. Our method uses parallel threads to aggregate 3D semantic point clouds while positioning the unmanned vehicle. Experiments are conducted on the public CityScapes and KITTI Visual Odometry datasets, and the results show that, compared with ORB-SLAM2 and DynaSLAM, our positioning error is reduced by approximately 87%; compared with DEMO and DVL-SLAM, our positioning accuracy improves in most sequences. Our 3D reconstruction quality is better than that of DynSLAM and contains semantic information. The proposed method has engineering application value in the unmanned vehicle field.
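A small sketch of how ORB feature points might be assigned depth from the dense semantic depth map described in this abstract; the depth-map format and the validity checks below are assumptions, not the paper's implementation.

```python
import numpy as np

def select_keypoints_with_depth(keypoints_uv, depth_map, max_range=100.0):
    """Keep only keypoints that fall on valid pixels of a dense depth map.

    keypoints_uv : (N, 2) array of (u, v) pixel coordinates (e.g., ORB keypoints)
    depth_map    : (H, W) array of metric depths; 0 or inf marks invalid pixels
    Returns the kept (u, v) coordinates and their depths.
    """
    h, w = depth_map.shape
    uv = np.round(np.asarray(keypoints_uv, dtype=float)).astype(int)
    in_bounds = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    uv = uv[in_bounds]
    d = depth_map[uv[:, 1], uv[:, 0]]
    valid = np.isfinite(d) & (d > 0.0) & (d < max_range)
    return uv[valid], d[valid]
```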
... At present, there are many methods for calibrating the extrinsic parameters between LiDAR and camera [37][38][39]. These calibration methods are mainly divided into two categories: one is based on dynamic calibration, and the other is based on point, line, and plane calibration. ...
Article
Full-text available
The sensing system consisting of Light Detection and Ranging (LiDAR) and a camera provides complementary information about the surrounding environment. To take full advantage of the multi-source data provided by different sensors, an accurate fusion of multi-source sensor information is needed. Time synchronization and space registration are the key technologies that affect the fusion accuracy of multi-source sensors. Due to the difference in data acquisition frequency and the deviation in startup time between the LiDAR and the camera, asynchronous data acquisition can easily occur, which has a significant influence on subsequent data fusion. Therefore, a time synchronization method for multi-source sensors based on frequency self-matching is developed in this paper. Without changing the sensor frequencies, the sensor data are processed to obtain the same number of data frames with matching ID numbers, so that the LiDAR and camera data correspond one-to-one. Finally, the data frames are merged into new data packets to realize time synchronization between the LiDAR and the camera. Building on the time synchronization, spatial synchronization is achieved with a nonlinear optimization of the joint calibration parameters, which effectively reduces the reprojection error during sensor spatial registration. The accuracy of the proposed time synchronization method is 99.86% and the space registration accuracy is 99.79%, which is better than that of the MATLAB calibration toolbox.
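The frame-matching step of such a time-synchronization scheme can be sketched roughly as below, assuming both sensor streams carry timestamps; pairing by nearest timestamp and assigning a shared ID is one illustrative reading of the abstract, not the authors' exact procedure.

```python
def pair_frames(lidar_stamps, camera_stamps, max_offset=0.05):
    """Pair LiDAR and camera frames so the two streams correspond one-to-one.

    lidar_stamps, camera_stamps : sorted lists of timestamps in seconds
    Returns a list of (shared_id, lidar_index, camera_index) tuples.
    """
    pairs, j = [], 0
    for i, tl in enumerate(lidar_stamps):
        # Advance the camera index to the timestamp closest to this LiDAR frame.
        while j + 1 < len(camera_stamps) and abs(camera_stamps[j + 1] - tl) <= abs(camera_stamps[j] - tl):
            j += 1
        if camera_stamps and abs(camera_stamps[j] - tl) <= max_offset:
            pairs.append((len(pairs), i, j))
    return pairs
```

Paired frames sharing an ID can then be merged into a single data packet, which is the packaging step the abstract describes.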
... Visual navigation systems have been widely used in autonomous mobile robots. For visual navigation based on a monocular camera, the camera motion can be recovered except for the translation scale [1]. An inertial measurement unit (IMU) can be incorporated to construct a visual-inertial navigation system (VINS) [2], [3] and retrieve the scale. ...
... LiDAR can directly obtain long-distance, centimeter-level depth measurements. In recent years, it has been incorporated into visual navigation systems to provide depth estimates for visual landmarks [1], [10]-[16]. A spinning 3D LiDAR achieves a 360° horizontal field of view (FOV) but a much smaller vertical FOV, such as 30° for the Velodyne VLP-16. ...
... Using a LiDAR with more laser beams can mitigate this effect. In [1], [10]-[12], a 64-beam LiDAR is adopted to provide depth for feature-based visual navigation systems. However, the more laser beams a LiDAR has, the more expensive it is. ...
Article
Accurate and long-distance depth estimation for visual landmarks is challenging in visual-inertial navigation systems (VINS). In visually degraded scenes with illumination changes, moving objects, or weak texture, depth estimation becomes even more difficult, resulting in poor robustness and accuracy. For low-speed robot navigation, we present a solid-state-LiDAR-enhanced VINS (LE-VINS) to improve system robustness and accuracy in challenging environments. The point clouds from the solid-state LiDAR are projected to the visual keyframe with the inertial navigation system (INS) pose for depth association, while compensating for motion distortion. A robust depth-association method with an effective plane-checking algorithm is proposed to estimate the landmark depth. With the estimated depth, we present a LiDAR depth factor to construct accurate depth measurements for visual landmarks in factor graph optimization (FGO). The visual feature, LiDAR depth, and IMU measurements are tightly fused within the FGO framework to achieve maximum a posteriori estimation. Field tests were conducted on a low-speed robot in large-scale challenging environments. The results demonstrate that the proposed LE-VINS yields significantly improved robustness and accuracy compared to the original VINS. Besides, LE-VINS exhibits superior accuracy compared to a state-of-the-art LiDAR-visual-inertial navigation system. LE-VINS also outperforms the existing LiDAR-enhanced method, benefiting from the robust depth-association algorithm and the effective LiDAR depth factor.
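A rough sketch of the plane-checking idea behind such a depth association: fit a plane to the LiDAR points whose projections lie near a visual feature and accept the interpolated depth only if the neighborhood is sufficiently planar. All thresholds and names are assumptions for illustration, not LE-VINS internals.

```python
import numpy as np

def depth_with_plane_check(feature_ray, neighbor_pts_cam, max_rms=0.05):
    """Estimate a landmark depth from nearby LiDAR points with a planarity check.

    feature_ray      : (3,) unit bearing vector of the visual feature in the camera frame
    neighbor_pts_cam : (K, 3) LiDAR points (camera frame) whose projections are near the feature
    Returns the depth along the ray, or None if the local patch is not planar enough.
    """
    if len(neighbor_pts_cam) < 3:
        return None
    centroid = neighbor_pts_cam.mean(axis=0)
    # Plane normal = direction of smallest spread of the local patch.
    _, _, vt = np.linalg.svd(neighbor_pts_cam - centroid)
    normal = vt[-1]
    # Planarity check: RMS distance of the points to the fitted plane.
    rms = np.sqrt(np.mean(((neighbor_pts_cam - centroid) @ normal) ** 2))
    if rms > max_rms:
        return None
    # Intersect the feature ray with the plane: depth * (n . ray) = n . centroid.
    denom = normal @ feature_ray
    if abs(denom) < 1e-6:
        return None
    depth = (normal @ centroid) / denom
    return depth if depth > 0 else None
```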
... In order to estimate the disturbance during hover conditions, the VINS filter should either maintain feature depths as part of the state [13] or have additional depth-information feedback [14]. For GPS-denied dynamic flights, depth-enhanced visual navigation is proposed in [15], because relying on monocular or stereo estimated depth at high altitudes becomes unreliable [16]. The work in [15] proposes a depth-enhanced update rule that has shown stable results in full-scale vertical take-off and landing (VTOL) applications. ...
... For GPS-denied dynamic flights, depth-enhanced visual navigation is proposed in [15], because relying on monocular or stereo estimated depth at high altitudes becomes unreliable [16]. The work in [15] proposes a depth-enhanced update rule that has shown stable results in full-scale vertical take-off and landing (VTOL) applications. However, disturbance estimation and the related observability limitations were not studied. ...
... where g_ij is the j-th column of the matrix g_i of system (15); the value of the matrix is omitted for brevity. The matrix Ξ was found to be of full column rank; therefore, the observability properties of the system can be found using the matrix B in (16), which contains the gradients of the basis set, where ...
... Loosely coupled methods of this type usually extract depth from LiDAR to enhance visual odometry. DEMO [24] was the first to integrate the depth information of point clouds into visual SLAM. LIMO-PL [10] utilizes line features with depths from LiDAR. ...
Preprint
The ability of a moving agent to localize itself in its environment is a basic requirement for emerging applications such as autonomous driving. Many existing methods based on multiple sensors still suffer from drift. We propose a scheme that fuses a map prior and vanishing points from images to establish an energy term constrained only on rotation, called the direction projection error. We then embed these direction priors into a visual-LiDAR SLAM system that integrates camera and LiDAR measurements in a tightly coupled way at the back end. Specifically, our method generates visual reprojection errors and point-to-Implicit-Moving-Least-Squares (IMLS) surface constraints from the scans, and solves them jointly with the direction projection error in a global optimization. Experiments on KITTI, KITTI-360, and Oxford Radar RobotCar show that we achieve lower localization error, measured as Absolute Pose Error (APE), than the prior map, which validates the effectiveness of our method.
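A minimal sketch of a rotation-only direction error in the spirit of the direction projection error described above: a known 3D direction from the map prior is rotated into the camera frame and compared with the direction implied by a detected vanishing point. All quantities and the exact error form are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def direction_projection_error(R_cam_world, map_dir_world, vp_uv, K):
    """Angular error between a map-prior direction and a detected vanishing point.

    R_cam_world   : (3, 3) rotation taking world-frame vectors into the camera frame
    map_dir_world : (3,) unit direction of a dominant structure (e.g., road) in the world frame
    vp_uv         : (2,) pixel coordinates of the corresponding vanishing point
    K             : (3, 3) camera intrinsics
    The residual depends only on rotation, not on translation.
    """
    # Vanishing point back-projected to a bearing direction in the camera frame.
    d_vp = np.linalg.inv(K) @ np.array([vp_uv[0], vp_uv[1], 1.0])
    d_vp /= np.linalg.norm(d_vp)
    # Map-prior direction expressed in the camera frame.
    d_map = R_cam_world @ map_dir_world
    d_map /= np.linalg.norm(d_map)
    # Sign-invariant angular residual (a vanishing point defines a line direction).
    return float(np.arccos(np.clip(abs(d_vp @ d_map), -1.0, 1.0)))
```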
... In 2014, Zhang et al. presented DEMO [132,133], which used depth information from LiDAR to recover camera motion. Harris corner features are extracted and tracked from RGB images; if the corresponding depth is available from the laser data, it is directly associated with the feature, otherwise the depth is triangulated using the previously estimated motion. ...
Article
Full-text available
Simultaneous Localization and Mapping (SLAM) has been widely studied in recent years for autonomous vehicles. SLAM achieves its purpose by constructing a map of the unknown environment while keeping track of the vehicle's location. A major challenge, which is paramount during the design of SLAM systems, lies in the efficient use of onboard sensors to perceive the environment. The most widely applied algorithms are camera-based SLAM and LiDAR-based SLAM. Recent research focuses on the fusion of camera-based and LiDAR-based frameworks, which shows promising results. In this paper, we present a study of commonly used sensors and the fundamental theories behind SLAM algorithms. The study then presents the hardware architectures used to process these algorithms and, where available, the performance obtained. Secondly, we highlight state-of-the-art methodologies in each modality and in the multi-modal framework. A brief comparison is then given, followed by future challenges. Additionally, we provide insights into possible fusion approaches that can increase the robustness and accuracy of modern SLAM algorithms, thereby enabling the hardware-software co-design of embedded systems that accounts for algorithmic complexity, embedded architectures, and real-time constraints.