Figure - available from: Machine Vision and Applications
The camera locations estimated by the relocalization module (the blue dots) and the tracking module (the red dots). a Camera locations of V1. b Camera locations of V2. c Camera locations of V3 (color figure online)


Source publication
Article
A visual simultaneous localization and mapping (SLAM) system usually contains a relocalization module to recover the camera pose after tracking failure. The core of this module is to establish correspondences between map points and key points in the image, which is typically achieved by local image feature matching. Since recently emerged binary fe...
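To make the matching step concrete, here is a minimal sketch of binary-descriptor (ORB) matching between the current frame and descriptors stored with the map points, using OpenCV. The file names and the brute-force matcher are placeholders for illustration; the paper's actual contribution, an online-learned binary feature index, would replace the exhaustive search to reach real-time speed.

```python
import cv2
import numpy as np

# Hypothetical inputs: the current camera frame and the binary (ORB)
# descriptors previously stored with the 3D map points.
frame = cv2.imread("current_frame.png", cv2.IMREAD_GRAYSCALE)
map_descriptors = np.load("map_descriptors.npy")  # shape (M, 32), dtype uint8

orb = cv2.ORB_create(nfeatures=1000)
keypoints, frame_descriptors = orb.detectAndCompute(frame, None)

# Brute-force Hamming matching with Lowe's ratio test; an indexing structure
# would stand in for this exhaustive search in a real-time relocalizer.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
matches = matcher.knnMatch(frame_descriptors, map_descriptors, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
print(f"{len(good)} tentative 2D-3D correspondences")
```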

Similar publications

Conference Paper
In the highly active research field of Simultaneous Localization And Mapping (SLAM), RGB-D images have been a major interest to use. Real-time SLAM for RGB-D images is of great importance since dense methods using all the depth and intensity values showed superior performance in the past. Due to development of GPU and CPU technologies, the real-tim...

Citations

... To meet this requirement, simultaneous localization and mapping has emerged as a solution [1,2]. Visual simultaneous localization and mapping (VSLAM) refers to an intelligent agent that starts from an unknown location [3]. By observing its surroundings through a camera as it moves, the agent determines its own position from environmental cues and incrementally constructs a map [4]. ...
... In order to solve for the other two Euler angles $\theta_1$, $\theta_3$, define the factorization: ...
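The excerpt is cut off before the factorization itself. The following is only a generic Z-Y-X Euler decomposition, written under the assumption that the cited preprint uses a convention of this form, to illustrate how the remaining two angles are typically recovered from a rotation matrix $R = (r_{ij})$:

```latex
\[
R = R_z(\theta_3)\,R_y(\theta_2)\,R_x(\theta_1) =
\begin{pmatrix}
c_3 c_2 & c_3 s_2 s_1 - s_3 c_1 & c_3 s_2 c_1 + s_3 s_1 \\
s_3 c_2 & s_3 s_2 s_1 + c_3 c_1 & s_3 s_2 c_1 - c_3 s_1 \\
-s_2    & c_2 s_1               & c_2 c_1
\end{pmatrix},
\qquad c_i = \cos\theta_i,\; s_i = \sin\theta_i,
\]
so that once $\theta_2 = -\arcsin r_{31}$ is known, the remaining two angles follow as
\[
\theta_1 = \operatorname{atan2}(r_{32}, r_{33}), \qquad
\theta_3 = \operatorname{atan2}(r_{21}, r_{11}).
\]
```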
Preprint
Feature-based visual odometry has difficulty with feature extraction and matching in weak-texture environments, resulting in substantial inter-frame pose estimation errors. Meanwhile, the computation and matching of feature-point descriptors can be time-consuming and computationally expensive. To address these issues encountered by traditional ORB-SLAM odometry in texture-poor regions, an enhanced method for visual odometry estimation is proposed. First, the quadtree technique is employed to extract ORB feature points with a uniform distribution and an adequate number. Subsequently, when processing non-keyframes, the optical flow method is utilized to predict the locations of the feature points, circumventing the need for feature matching. Following this, the random sample consensus (RANSAC) method is applied to eliminate mismatched points in optical flow tracking, ensuring that only high-quality inliers are retained. Afterwards, a system of nonlinear equations is solved using the AP3P method to estimate the precise pose of the camera. Finally, the trajectory is optimized by the Dogleg algorithm to achieve accurate and stable tracking and positioning. The experimental results demonstrate that the improved algorithm outperforms the mainstream ORB-SLAM3 algorithm in terms of operating efficiency and positioning accuracy across multiple experimental scenarios. This method effectively addresses the challenges of low tracking accuracy and poor real-time performance commonly encountered by traditional visual odometry in weak-texture environments. As a result, combining the feature-based method with the optical flow method significantly enhances the applicability of visual odometry in complex environments by improving tracking stability, motion estimation accuracy, and real-time performance.
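A rough sketch of the tracking step this abstract describes, using OpenCV's pyramidal LK optical flow, a fundamental-matrix RANSAC filter, and the AP3P solver inside solvePnPRansac. All file names, array shapes, and the specific RANSAC model are illustrative assumptions rather than details taken from the paper.

```python
import cv2
import numpy as np

# Hypothetical inputs: previous/current grayscale frames, the 2D feature
# locations tracked in the previous frame, their associated 3D map points,
# and the camera intrinsic matrix K.
prev_img = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
curr_img = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)
prev_pts = np.load("prev_pts.npy").astype(np.float32)      # (N, 1, 2)
points_3d = np.load("points_3d.npy").astype(np.float32)    # (N, 3)
K = np.load("intrinsics.npy")

# 1. Predict feature locations in the current frame with pyramidal LK optical
#    flow instead of descriptor matching (the non-keyframe case above).
curr_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_img, curr_img, prev_pts, None)
ok = status.ravel() == 1
p2d, p3d = curr_pts[ok].reshape(-1, 2), points_3d[ok]

# 2. RANSAC (here with a fundamental-matrix model) rejects flow mismatches.
_, inlier_mask = cv2.findFundamentalMat(prev_pts[ok].reshape(-1, 2), p2d,
                                        cv2.FM_RANSAC, 1.0)
inliers = inlier_mask.ravel() == 1

# 3. AP3P inside a RANSAC loop estimates the camera pose from the surviving
#    2D-3D correspondences.
_, rvec, tvec, _ = cv2.solvePnPRansac(p3d[inliers], p2d[inliers], K, None,
                                      flags=cv2.SOLVEPNP_AP3P)
```

The Dogleg trajectory refinement mentioned at the end of the abstract would then run in a nonlinear least-squares backend (e.g. g2o or Ceres) over the resulting poses.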
... Although the PnP problem dates back to the '80s, some recent papers attest to the enduring interest in this topic and its applications. In [9], a PnP module is used in a vision-based SLAM algorithm to recover the camera pose after a tracking failure, and the EPnP method presented in [10] is used to solve that problem. In [11], the problem of an unmanned aerial vehicle (UAV) landing on a runway is addressed, and it is shown that a PnP algorithm using landmark points on the runway outperforms a landing procedure based on GPS only. ...
Article
In this work, we address the problem of estimating the relative position and orientation of a camera and an object, when both are equipped with inertial measurement units (IMUs) and the object exhibits a set of n landmark points with known coordinates (the so-called pose estimation or PnP problem). We present two algorithms that fuse the information provided by the camera and the IMUs to solve the PnP problem with good accuracy. These algorithms only use the measurements given by the IMUs' inclinometers, as the magnetometers usually give inaccurate estimates of the Earth's magnetic field vector. The effectiveness of the proposed methods is assessed by numerical simulations and experimental tests. The results of the tests are compared with the most recent methods proposed in the literature.
... Compared with laser SLAM, monocular SLAM has a lower cost and provides richer information. Moreover, monocular SLAM supports relocalization, which pure odometry-based localization does not [1,2]. Therefore, some household sweeping robots have begun to use monocular cameras for localization and mapping. ...
... Camera relocalization plays a critical role in robotics applications such as mobile robot navigation, simultaneous localization and mapping (SLAM), and autonomous driving [1][2][3][4][5]. Given a pre-built map or a series of images with known positions, it aims to determine the 6D pose of the current camera view. ...
... Recent research on camera relocalization roughly falls into two classes: appearance-based and deep-learning-based methods. In the former [1][2][3][4][5][6][7][8], on the one hand, place recognition techniques such as the bag-of-words (BoW) model [9] are used to retrieve images similar to the current view, and thus a coarse pose of the current view can be obtained. On the other hand, to compute an accurate pose, hand-crafted keypoint features such as ORB [10] are extracted first, and then feature matching methods are applied to match points between the 2D current view and the 3D point cloud [11][12][13]. ...
... However, these schemes only provide a coarse camera pose of the current view without achieving accurate camera localization. To obtain an accurate 6D pose of the camera, relocalization is addressed based on a given 3D map of the scene [1][2][3][4][5][6][7][8]. In this pipeline, with keypoints such as SIFT [31] or ORB [10] extracted in the image, place recognition techniques are utilized to retrieve the images similar to the current one. ...
Article
Camera relocalization is a challenging task, especially when based on a sparse 3D map or keyframes. In this paper, we present an accurate method for RGB camera relocalization in the case of a very sparse 3D map built from limited keyframes. The core of our approach is a top-to-down feature matching strategy that provides a set of accurate 2D-to-3D matches. Specifically, we first use a landmark-based place recognition method to generate from the keyframes the images similar to the current view, along with the set of pairwise matched landmarks. This step constrains the 3D model points that can be matched with the current view. Then, the points are matched within the landmark pairs and combined afterwards. This is in contrast to conventional feature matching methods, which typically match points between entire images and the whole 3D map and, as a result, may not be robust to large viewpoint changes, the main challenge of relocalization based on a sparse map. After feature matching, the camera pose is calculated by an efficient novel Perspective-n-Point (PnP) algorithm. We conduct experiments on challenging datasets to demonstrate that the camera poses estimated by our method from the sparse 3D point cloud are more accurate than those of classical methods using a dense map or a large number of training images.
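A toy sketch of the top-to-down matching idea: each query keypoint is only compared against the 3D points observed through its paired landmark rather than against the whole map. All data-structure names here (query_landmarks, retrieved_keyframes, point_desc, ...) are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def hamming(a, b):
    """Hamming distance between two uint8 binary descriptors."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

def match_within_landmarks(query_landmarks, retrieved_keyframes, max_dist=50):
    """Top-to-down matching sketch.

    query_landmarks: {landmark_id: (query_keypoint_indices, query_descriptors)}
    retrieved_keyframes: list of dicts with keys 'landmark_id', 'point_ids',
        'point_desc' describing the 3D points seen through each matched landmark.
    Returns tentative (query_keypoint_index, map_point_id) correspondences.
    """
    matches = []
    for kf in retrieved_keyframes:
        if kf["landmark_id"] not in query_landmarks:
            continue  # only match within paired landmarks
        kp_indices, kp_desc = query_landmarks[kf["landmark_id"]]
        for qi, qd in zip(kp_indices, kp_desc):
            dists = [hamming(qd, md) for md in kf["point_desc"]]
            if not dists:
                continue
            best = int(np.argmin(dists))
            if dists[best] < max_dist:
                matches.append((qi, kf["point_ids"][best]))
    return matches
```

The combined matches would then feed the PnP step described in the abstract.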
... (iii) Local matching methods match points in camera space with known points in world space, pass the correspondences to the Perspective-n-Point (PnP) algorithm [28] or the Kabsch algorithm [31] to generate a number of initial camera pose hypotheses, and then refine these down to a final pose using some variant of RANSAC [20]. Many approaches match the descriptors of sparse keypoints to perform this matching [71,41,19,59]; some approaches that perform dense matching also exist [61]. Local matching methods tend to generalise better to novel poses than image retrieval methods, since individual points are often easier to match from novel angles than are whole images. ...
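Since the excerpt contrasts PnP with the Kabsch algorithm for aligning 3D-3D correspondences, here is a minimal, self-contained Kabsch implementation (the standard SVD-based rigid alignment; the function name and interface are my own, not taken from the cited works):

```python
import numpy as np

def kabsch(P, Q):
    """Return rotation R and translation t that best align points P onto Q
    (both (N, 3) arrays) in the least-squares sense, i.e. Q ~= R @ P + t."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)                  # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cq - R @ cp
    return R, t
```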
... Global regression methods generally require significant offline training on the target scene; moreover, their comparatively poor accuracy makes them unattractive for applications like interactive dense SLAM [53] that require precise poses (their main niche is large-scale, RGB-only relocalisation scenarios in which coarse poses are acceptable). Local matching methods can generally be used online [71,19], but because most rely on detecting/matching sufficient sparse keypoints in the image, their robustness can suffer in textureless parts of the scene. By contrast, local regression methods avoid the need to detect keypoints explicitly, making them appealing for robust relocalisation in small/medium-scale scenes. ...
Preprint
Many applications require a camera to be relocalised online, without expensive offline training on the target scene. Whilst both keyframe and sparse keypoint matching methods can be used online, the former often fail away from the training trajectory, and the latter can struggle in textureless regions. By contrast, scene coordinate regression (SCoRe) methods generalise to novel poses and can leverage dense correspondences to improve robustness, and recent work has shown how to adapt SCoRe forests between scenes, allowing their state-of-the-art performance to be leveraged online. However, because they use features hand-crafted for indoor use, they do not generalise well to harder outdoor scenes. Whilst replacing the forest with a neural network and learning suitable features for outdoor use is possible, the techniques used to adapt forests between scenes are unfortunately harder to transfer to a network context. In this paper, we address this by proposing a novel way of leveraging a network trained on one scene to predict points in another scene. Our approach replaces the appearance clustering performed by the branching structure of a regression forest with a two-step process that first uses the network to predict points in the original scene, and then uses these predicted points to look up clusters of points from the new scene. We show experimentally that our online approach achieves state-of-the-art performance on both the 7-Scenes and Cambridge Landmarks datasets, whilst running in under 300ms, making it highly effective in live scenarios.
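A rough sketch of the two-step lookup this abstract describes, in which a network trained on one scene is reused for another by replacing its outputs with clusters of points from the new scene. The file names, the KD-tree lookup, and `predict_scene_a` are assumed details for illustration, not the authors' implementation.

```python
import numpy as np
from scipy.spatial import cKDTree

# Assumed inputs: centroids of the point clusters the scene-A network regresses
# to, and, for each centroid, the 3D points gathered from the new scene B.
cluster_centres_a = np.load("centres_a.npy")                 # (K, 3)
clusters_b = np.load("clusters_b.npy", allow_pickle=True)    # K arrays of (n_k, 3)
tree = cKDTree(cluster_centres_a)

def scene_b_candidates(patch, predict_scene_a):
    """Two-step lookup: predict a scene-A point for the patch, then return the
    scene-B points stored in the nearest scene-A cluster."""
    p_a = predict_scene_a(patch)      # 3D point predicted in the original scene
    _, idx = tree.query(p_a)          # nearest cluster in the original scene
    return clusters_b[idx]            # candidate 3D points in the new scene
```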
... Campbell [64] presented a method for globally optimal inlier set maximization for simultaneous camera pose and feature correspondence. Real-time SLAM relocalization with online learning of binary feature indexing was proposed by [65]. Wu et al. [66] proposed CNNs for camera relocalization. ...
Article
Virtual reality, augmented reality, robotics, and autonomous driving have recently attracted much attention from both academic and industrial communities, in which image-based camera localization is a key task. However, there has not been a complete review of image-based camera localization, and it is urgent to survey this topic so that newcomers can enter the field quickly. In this paper, an overview of image-based camera localization is presented. A new and complete classification of image-based camera localization approaches is provided, and the related techniques are introduced. Trends for future development are also discussed. This will be useful not only to researchers, but also to engineers and other individuals interested in this field.
... It must be noticed that loop closure in our problem cannot be handled as in keypoint-based SLAM. When using keypoints, loop closure can be detected after tracking using several approaches such as [27,28,29,30]. Once detected, ...
Article
SLAM is generally addressed using natural landmarks such as keypoints or texture, but this poses some limitations, such as the need for sufficiently textured environments and high computational demands. In some cases, it is preferable to sacrifice the flexibility of such methods for an increase in speed and robustness by using artificial landmarks. The recent work [1] proposes an off-line method to obtain a map of squared planar markers in large indoor environments. By freely distributing a set of markers printed on pieces of paper, the method estimates the marker poses from a set of images, given that at least two markers are visible in each image. Afterwards, camera localization can be performed at the correct scale. However, an off-line process has several limitations. First, errors cannot be detected until the whole process is finished, e.g., an insufficient number of markers in the scene or markers not properly spotted in the capture stage. Second, the method is not incremental, so if the map needs to be expanded, the whole process must be repeated from scratch. Finally, the method cannot be employed in real-time systems with limited computational resources such as mobile robots or UAVs. To overcome these limitations, this work proposes a real-time solution to the problems of simultaneously localizing the camera and building a map of planar markers. This paper contributes a number of solutions to the problems arising when solving SLAM from squared planar markers, coining the term SPM-SLAM. The experiments carried out show that our method can be more robust, precise, and fast than visual SLAM methods based on keypoints or texture.
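For context, a minimal sketch of the per-frame measurement such a marker-based system builds on: detecting a squared planar marker and recovering the camera pose relative to it. This assumes the classic cv2.aruco API from opencv-contrib; the dictionary, marker size, and file names are illustrative choices, not values from the paper.

```python
import cv2
import numpy as np

K = np.load("intrinsics.npy")         # camera matrix (assumed pre-calibrated)
dist = np.zeros(5)                    # assume undistorted images for brevity
marker_len = 0.15                     # marker side length in metres (assumed)
s = marker_len / 2.0
obj_pts = np.array([[-s,  s, 0], [ s,  s, 0],
                    [ s, -s, 0], [-s, -s, 0]], dtype=np.float32)

gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)
dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
corners, ids, _ = cv2.aruco.detectMarkers(gray, dictionary)

# One PnP solve per detected marker gives the camera pose relative to that
# marker; a marker-map SLAM system fuses such observations over time.
for c, marker_id in zip(corners, ids.ravel() if ids is not None else []):
    ok, rvec, tvec = cv2.solvePnP(obj_pts, c.reshape(-1, 2).astype(np.float32),
                                  K, dist, flags=cv2.SOLVEPNP_IPPE_SQUARE)
    print(marker_id, rvec.ravel(), tvec.ravel())
```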
Article
In this paper, we propose a sparse semantic map building method and an outdoor relocalization strategy based on this map. Most existing semantic mapping approaches focus on improving the semantic understanding of single frames and retain a large amount of environmental data. Instead, rather than locating the UGV precisely, we use imprecise environmental information to determine the general position of the UGV in a large-scale environment, much as humans do. For this purpose, we divide the environment into environment nodes according to the result of scene understanding. The semantic map of the outdoor environment is obtained by generating topological relations between the environment nodes. In the semantic map, only the information of the nodes is saved, so that the storage requirement stays very small as the environment grows. When the UGV receives a new local semantic map, we evaluate the similarity between the local map and the global map to determine the possible position of the UGV, according to the categories of the left and right nodes and the distances between the current position and the nodes. To validate the proposed approach, experiments have been conducted in a large-scale outdoor environment with a real UGV. Using the semantic map, the UGV can relocalize itself from different starting points.
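A toy sketch of the node-level matching idea this abstract outlines: score a short local node sequence against windows of the global topological map using node categories and inter-node distances. The Node fields, the scoring rule, and the tolerance are all assumptions for illustration, not the paper's formulation.

```python
from dataclasses import dataclass

@dataclass
class Node:
    category: str        # semantic class of the node, e.g. "building", "tree"
    dist_to_next: float  # distance to the next node along the route (metres)

def sequence_score(local, window, dist_tol=2.0):
    """Score how well a local node sequence matches a window of the global map:
    categories must agree, and inter-node distances should be close."""
    score = 0.0
    for ln, gn in zip(local, window):
        if ln.category != gn.category:
            return 0.0
        score += max(0.0, 1.0 - abs(ln.dist_to_next - gn.dist_to_next) / dist_tol)
    return score / len(local)

def localize(local, global_map):
    """Return the index in the global map where the local map fits best."""
    return max(range(len(global_map) - len(local) + 1),
               key=lambda i: sequence_score(local, global_map[i:i + len(local)]))
```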