Evaluation of Image Feature Detection and Matching Algorithms
Yiwen Ou
School of Information Science and Engineering
Fujian University of Technology
Fuzhou, China
e-mail: 1172704941@qq.com
Zhiming Cai*
National Demonstration Center for Experimental
Electronic Information and Electrical Technology
Education,
Fujian University of Technology
Fuzhou, China
Corresponding author, e-mail: caizm@fjut.edu.cn
Jian Lu
School of Information Science and Engineering
Fujian University of Technology
Fuzhou, China
e-mail: 573843470@qq.com
Jian Dong
School of Information Science and Engineering
Fujian University of Technology,
Fuzhou, China
e-mail: 2711713088@qq.com
Yufeng Ling
School of Information Science and Engineering
Fujian University of Technology,
Fuzhou, China
e-mail: 1504662829@qq.com
Abstract—Image feature detection and matching algorithms play an important role in the field of machine vision. The computational efficiency and robustness of the feature detector and descriptor chosen by an algorithm have a great impact on the accuracy and time consumption of image matching. This paper comprehensively evaluates six typical algorithms: SIFT, SURF, ORB, BRISK, KAZE, and AKAZE. The Oxford dataset is used to compare the robustness of these algorithms under illumination, rotation, scale, blur, and viewpoint transformations. Jittery video is also used to compare their anti-jitter ability. The indicators compared include: feature detection time, image matching time, total running time, number of detected feature points, accuracy, number of repeated feature points, and repetition rate. Experimental results show that, under different transformations, each algorithm has its own advantages and disadvantages.
Keywords-feature detection and matching; comprehensive evaluation; robustness
I. INTRODUCTION
Feature point detection and matching algorithms have been widely used in many machine vision fields, such as real-time localization and 3D reconstruction [1], pose estimation [2], object recognition [3], intelligent device applications [4], SLAM (simultaneous localization and mapping) [5, 6], automatic driving, robot navigation [7], and AR [8].
The algorithms can be classified into two categories.
A. Algorithms Based on Blob Detection
The scale-invariant feature transform (SIFT) algorithm [9] was proposed by David G. Lowe in 1999 and improved in 2004 [10]. The SURF (Speeded-Up Robust Features) algorithm, a robust local feature detection algorithm, was first proposed by Bay et al. [11] in 2006 and improved in 2008 [12]. KAZE [13], a feature detection algorithm more stable than SIFT, appeared at ECCV 2012. In 2013, P. F. Alcantarilla et al. presented the Accelerated-KAZE (AKAZE) algorithm [14], which adopts nonlinear diffusion filtering. AKAZE improves repeatability and distinctiveness compared with SIFT and SURF.
B. Algorithms Based on Corner Detection
ORB (Oriented FAST and Rotated BRIEF) was proposed by Rublee et al. [15] in 2011. The BRISK (Binary Robust Invariant Scalable Keypoints) method was proposed by Leutenegger et al. [16] in 2011; it realizes the detection, description, and matching of image feature points.
Image feature detection and matching algorithms usually involve the following steps: 1) detection and description of feature points; 2) matching of feature points; 3) rough matching of feature points, followed by the RANSAC method to "purify" the matches (that is, remove outliers); 4) final matching of the features obtained from step 3 (good matches).

978-1-7281-6136-5/20/$31.00 ©2020 IEEE
In this paper, the SIFT, SURF, ORB, BRISK, KAZE, and AKAZE algorithms are evaluated with the Oxford dataset, which provides image sequences for rotation, scale, illumination, blur, and viewpoint transformations. We also use a jittery video stream to verify anti-jitter robustness.
II. FUNDAMENTALS OF EVALUATION
A. Experimental Setup
OpenCV 3.4 was used for the experiments presented in this paper. The computer system used has an Intel(R) Core(TM) i5-4210U CPU @ 1.70 GHz (up to 2.40 GHz) and 4.00 GB RAM.
B. Datasets
Two groups of experiments are performed: evaluating the robustness of the algorithms under different transformations, and evaluating their anti-jitter ability.
The Oxford dataset (http://www.robots.ox.ac.uk/~vgg/research/affine/) is used as the first set of experimental data to evaluate the robustness of each algorithm under illumination, blur, scale, rotation, and viewpoint transformations. The leuven and graf image packages are used as-is. For the other image packages, the first image is kept and the others are deleted. For the bikes package, the first image is filtered with 5 × 5 mean, Gaussian, and median filters, and all resulting images are placed in one package, denoted the updated bikes package. For the boats package, the first image is downsampled by factors of 0.2 and 0.5 and upsampled by factors of 1.5 and 2. For the bark package, the first image is rotated by 15°, 30°, 45°, 60°, 90°, and 180°, respectively. All resulting images are placed in the corresponding packages.
The second set of experimental data is a video suffering from strong rolling-shutter artifacts (http://web.cecs.pdx.edu/~fliu/project/subspace_stabilization/). Two frames of the video are extracted to verify the anti-jitter performance of each algorithm.
This paper uses the following indicators to describe the performance of the algorithms: 1) feature detection time; 2) image matching time; 3) total running time; 4) number of detected feature points; 5) accuracy (the number of feature points retained by RANSAC divided by the number of feature points after rough matching); 6) correspondence (the number of repeated feature point pairs in feature point detection); 7) repeatability (correspondence divided by the minimum number of feature points detected in the two pictures).
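Indicators 5) and 7) reduce to simple ratios; a minimal sketch (function and variable names are our own, not the paper's):

```python
def accuracy(ransac_inliers, rough_matches):
    """Indicator 5): matches surviving RANSAC over rough matches."""
    return ransac_inliers / rough_matches

def repeatability(correspondence, detected_1, detected_2):
    """Indicator 7): repeated pairs over the smaller detected count."""
    return correspondence / min(detected_1, detected_2)

# E.g. 160 of 200 rough matches survive RANSAC; 90 repeated pairs
# between images with 300 and 450 detected feature points.
acc = accuracy(160, 200)           # 0.8
rep = repeatability(90, 300, 450)  # 0.3
```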
III. EVALUATION OF ROBUSTNESS OF VARIOUS
ALGORITHMS
To evaluate the robustness of each algorithm, the first image in each updated package is matched with the remaining images one by one. As some algorithms may fail under certain transformations, this paper takes the average over the successfully matched image pairs as the experimental result. To obtain reasonable results, each pair of images is matched five times, and the detection and matching times are averaged.
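The five-run averaging could be implemented as follows (a hedged sketch; `mean_time` and the timed callable are illustrative, not the authors' code):

```python
import time

def mean_time(fn, repeats=5):
    """Run fn `repeats` times; return the average wall-clock seconds."""
    total = 0.0
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        total += time.perf_counter() - t0
    return total / repeats

# E.g. average a detection step over five runs (detector/img assumed):
# detect_time = mean_time(lambda: detector.detectAndCompute(img, None))
```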
A. Evaluate the Indicators of Each Algorithm under Each
Transformation
1) Number of features detected and correspondence
As shown in Table I, the numbers of feature points detected by SIFT, SURF, and BRISK are several times those of the other three algorithms, and ORB detects the fewest.
Except under scale transformation, the number of repeated feature points detected by SURF is the highest; ORB has the fewest repeated feature points.
TABLE I. THE NUMBER OF FEATURES DETECTED AND CORRESPONDENCE OF VARIOUS ALGORITHMS
[Table I lists, for each of SIFT, SURF, ORB, BRISK, KAZE, and AKAZE, (a) the number of features detected and (b) the correspondence, under six conditions: illumination transformation (original leuven package), blur transformation (updated bikes package), scale transformation (updated boats package), rotation transformation (updated bark package), viewpoint transformation (original graf package), and jitter video. The numeric entries are not recoverable from this extraction.]
Note: a—number of features detected; b—correspondence.
2) Feature detection time, matching time, and total time
The time indicators of all algorithms evaluated on the different datasets are shown in Figures 1-5. The figures show that, under every transformation, ORB and AKAZE take much less time to detect and match features than the other algorithms, while KAZE consumes the most time under every transformation. BRISK is similar to SIFT. For all tests, ORB spends the least time on detecting and matching features. In most situations SIFT outperforms SURF, except under scale transformation. The figures also show that detection costs more time than matching, except under scale transformation. KAZE and AKAZE have nearly equal matching times. Overall, the ORB and AKAZE algorithms have the best detection and matching times.
Figure 1. Illumination transformation.
Figure 2. Blur transformation.
Figure 3. Scale transformation.
Figure 4. Rotate transformation.
Figure 5. Viewpoint transformation.
3) Accuracy and repeatability
The accuracy and repeatability evaluated on the different datasets are shown in Figures 6-10. They show that the accuracy of each algorithm is very high under blur transformation. However, the accuracy is generally low under viewpoint transformation, where KAZE is the lowest. Under illumination transformation, the accuracy of SIFT is the lowest, while that of AKAZE is the highest. Under rotation transformation, KAZE has the lowest accuracy.
It can be concluded from Figures 6-10 that: 1) under the illumination, scale, and blur transformations, the repeatability of AKAZE is the highest; 2) under the rotation and viewpoint transformations, KAZE has the highest repeatability; 3) under the illumination and scale transformations, ORB achieves the lowest repeatability.
Based on the above performance indicators, we find that some algorithms are robust under certain transformations. For instance, SURF performs well under the illumination, blur, scale, and rotation transformations, and SIFT performs well under the scale and rotation transformations. In addition, under viewpoint transformation, the BRISK and AKAZE algorithms also perform well.
Figure 6. Illumination transformation.
Figure 7. Blur transformation.
Figure 8. Scale transformation.
Figure 9. Rotate transformation.
Figure 10. Viewpoint transformation.
4) Anti-jitter performance
The jitter-video rows of Table I and Figure 11 show the experimental results of each algorithm under video jitter.
a) Number of features detected: BRISK > SIFT > SURF > KAZE > ORB > AKAZE;
b) Feature detection time: KAZE > BRISK > SURF > SIFT > ORB > AKAZE;
c) Feature matching time: BRISK > SIFT > SURF ≈ KAZE > ORB > AKAZE;
d) Total time: KAZE > BRISK > SURF > SIFT > ORB > AKAZE;
e) Accuracy: excluding failed matches, all the algorithms achieve superior results, with the accuracy reaching 1;
f) Repeatability: KAZE > ORB > BRISK > SURF > AKAZE > SIFT;
g) Correspondence: BRISK > KAZE > SIFT > SURF > ORB > AKAZE.
From the above anti-jitter results, we can see that the KAZE algorithm takes the most time but has the highest repetition rate, while the ORB algorithm takes less time but detects fewer feature points and fewer repeated feature points. The SIFT algorithm detects more feature points, but its repetition rate is the lowest. The BRISK algorithm takes longer, but its number of feature points, repetition rate, and number of repeated feature points are all high. Although the AKAZE algorithm spends less time detecting feature points, its number of feature points is small and its repetition rate is low. In terms of the number of detected feature points, time consumption, repetition rate, and correspondence, the SURF algorithm has only average performance.
In summary, although the BRISK algorithm is time-consuming, it performs well on the other indicators. The anti-jitter performance of the BRISK and SURF algorithms is therefore relatively good.
(a) Feature detection and matching time
(b) Accuracy and repeatability of all algorithms
Figure 11. Anti-jitter performance.
IV. CONCLUSION
In this paper, a large number of experiments are performed to evaluate several feature detection and matching algorithms (SIFT, SURF, ORB, BRISK, KAZE, AKAZE). Several robustness indicators are used to measure the performance of the algorithms. The experimental results show that under the illumination and blur transformations, the SURF algorithm is more robust; under the scale and rotation transformations, the SIFT algorithm performs better; and under the viewpoint transformation, the BRISK and AKAZE algorithms perform better. The second set of experiments shows that the BRISK and SURF algorithms have better anti-jitter performance.
We can see from the experimental data that some algorithms are more robust than others under certain transformations. But they share a common trait: either the accuracy is lower when less time is taken, or the accuracy is higher when more time is taken. In other words, none of these algorithms suits applications that demand both low time consumption and high accuracy, and achieving both at the same time remains a challenge. In future work, we should therefore pay attention not only to the time consumption of the algorithms but also to improving their accuracy.
REFERENCES
[1] Mouragnon E, Dekeyser F, Sayd P, et al. Real Time Localization and
3D Reconstruction. IEEE Computer Society Conference on
Computer Vision & Pattern Recognition, 2006. 363-370.
[2] Fleer D, Möller R. Comparing holistic and feature-based visual
methods for estimating the relative pose of mobile robots. Robotics
and Autonomous Systems, 2017, 89: 51-74.
[3] Pillai S, Leonard J. Monocular SLAM Supported Object Recognition.
Computer Science, 2015.
[4] Hu Z, Jiang Y. An improved ORB, gravity-ORB for target detection
on mobile devices. 2016 12th World Congress on Intelligent Control
and Automation (WCICA); 12-15 June 2016, 2016. 1708-1713.
[5] Mur-Artal R, Montiel J M M, Tardos J D. ORB-SLAM: A Versatile
and Accurate Monocular SLAM System. IEEE Transactions on
Robotics, 2015, 31(5): 1147-1163.
[6] Mur-Artal R, Tardos J D. ORB-SLAM2: An Open-Source SLAM
System for Monocular, Stereo, and RGB-D Cameras. IEEE
Transactions on Robotics, 2017, 33(5): 1255-1262.
[7] Geiger A, Lenz P, Stiller C, et al. Vision meets robotics: the KITTI
dataset. The International Journal of Robotics Research, 2013, 32:
1231-1237.
[8] Marchand E, Uchiyama H, Spindler F. Pose estimation for
augmented reality: a hands-on survey. IEEE Transactions on
Visualization & Computer Graphics, 2016, 22(12): 2633-2651.
[9] Lowe D G. Object recognition from local scale-invariant features. Proceedings of the Seventh IEEE International Conference on Computer Vision; 20-27 Sept. 1999. 1150-1157, vol. 2.
[10] Lowe D G. Distinctive Image Features from Scale-Invariant
Keypoints. International Journal of Computer Vision, 2004, 60(2):
91-110.
[11] Bay H, Tuytelaars T, Van Gool L. SURF: Speeded Up Robust
Features. Berlin, Heidelberg: Springer Berlin Heidelberg, 2006. 404-
417.
[12] Bay H, Ess A, Tuytelaars T, et al. Speeded-Up Robust Features
(SURF). Computer Vision and Image Understanding, 2008, 110(3):
346-359.
[13] Alcantarilla P F, Bartoli A, Davison A J. KAZE Features. European
Conference on Computer Vision, 2012. 214-227.
[14] Alcantarilla P F, Nuevo J, Bartoli A. Fast Explicit Diffusion for Accelerated Features in Nonlinear Scale Spaces. British Machine Vision Conference (BMVC), 2013.
[15] Rublee E, Rabaud V, Konolige K, et al. ORB: an efficient alternative to SIFT or SURF. International Conference on Computer Vision, 2011.
[16] Leutenegger S, Chli M, Siegwart R Y. BRISK: Binary Robust Invariant Scalable Keypoints. International Conference on Computer Vision, 2011.
www.engineeringvillage.com
Detailed results: 1
Downloaded: 7/29/2020
Content provided by Engineering Village. Copyright 2020 Page 1 of 1
1. Evaluation of image feature detection and matching Algorithms
Accession number: 20202808915403
Authors: Ou, Yiwen (1); Cai, Zhiming (2); Lu, Jian (1); Dong, Jian (1); Ling, Yufeng (1)
Author affiliation: (1) Fujian University of Technology, School of Information Science and Engineering, Fuzhou,
China; (2) Fujian University of Technology, Natl. Demonstration Ctr. for Experimental Electronic Information and
Electrical Technology Education, Fuzhou, China
Corresponding author: Cai, Zhiming(caizm@fjut.edu.cn)
Source title: 2020 5th International Conference on Computer and Communication Systems, ICCCS 2020
Abbreviated source title: Int. Conf. Comput. Commun. Syst., ICCCS
Part number: 1 of 1
Issue title: 2020 5th International Conference on Computer and Communication Systems, ICCCS 2020
Issue date: May 2020
Publication year: 2020
Pages: 220-224
Article number: 9118480
Language: English
ISBN-13: 9781728161365
Document type: Conference article (CA)
Conference name: 5th International Conference on Computer and Communication Systems, ICCCS 2020
Conference date: May 15, 2020 - May 18, 2020
Conference location: Shanghai, China
Conference code: 161227
Publisher: Institute of Electrical and Electronics Engineers Inc.
Abstract: Image features detection and matching algorithms play an important role in the field of machine vision.
Among them, the computational efficiency and robust performance of the features detector descriptor selected by the
algorithm have a great impact on the accuracy and time consumption of image matching. This paper comprehensively
evaluates typical SIFT, SURF, ORB, BRISK, KAZE, AKAZE algorithms. The Oxford dataset is used to compare the
robustness of various algorithms under illumination transformation, rotation transformation, scale transformation,
blur transformation, and viewpoint transformation. Jitter video is also used to compare the anti-jitter ability for these
algorithms. The indicators compared include: time of detecting features, time of matching images, total running time,
number of detected feature points, accuracy, number of repeated feature points, and repetition rate. Experimental
results show that, Under different transformations, each algorithm has its own advantages and disadvantages. © 2020
IEEE.
Number of references: 16
Main heading: Feature extraction
Controlled terms: Computational efficiency - Jitter
Uncontrolled terms: Image features - Matching algorithm - Repetition rate - Robust performance - Rotation
transformation - Scale transformation - Time consumption - Viewpoint transformation
DOI: 10.1109/ICCCS49078.2020.9118480
Compendex references: YES
Database: Compendex
Compilation and indexing terms, Copyright 2020 Elsevier Inc.
Data Provider: Engineering Village
... The BRISK has several advantages such as fast keypoint detection, description, matching, rotation invariant, scale-invariant, high quality, and reduce computational cost [9,10]. The bisecting K-means is an extended version of K-means clustering algorithm. ...
Article
Full-text available
Due to the exponential growth of video data, aided by rapid advancements in multimedia technologies. It became difficult for the user to obtain information from a large video series. The process of providing an abstract of the entire video that includes the most representative frames is known as static video summarization. This method resulted in rapid exploration, indexing, and retrieval of massive video libraries. We propose a framework for static video summary based on a Binary Robust Invariant Scalable Keypoint (BRISK) and bisecting K-means clustering algorithm. The current method effectively recognizes relevant frames using BRISK by extracting keypoints and the descriptors from video sequences. The video frames’ BRISK features are clustered using a bisecting K-means, and the keyframe is determined by selecting the frame that is most near the cluster center. Without applying any clustering parameters, the appropriate clusters number is determined using the silhouette coefficient. Experiments were carried out on a publicly available open video project (OVP) dataset that contained videos of different genres. The proposed method’s effectiveness is compared to existing methods using a variety of evaluation metrics, and the proposed method achieves a trade-off between computational cost and quality.
... In Figure 5.23 I exemplify the differences in features detected if we use different diffusion functions. From the evaluation done in [236], [237], [252]- [257] we can observe that A-KAZE features spends less time to detect feature points but on the down side the number of feature points is small and the repetition rate is low. One important positive aspect is that A-KAZE features have proven to be more rotation invariant than other compared feature detectors. ...
Thesis
Full-text available
Landmarks are typically defined from two perspectives: one as an object or structure that is easy visible and to recognize, and the second as a building or place that has an important historical importance. Landmarks in an urban area serve as “spatial magnet” in which cultural, civic, or economical activities take place. In this sense they have become an important aspect in multiple domains related to tourism and culture. Identifying and locating of an urban landmark is an activity that naturally blends several research domains like image signal processing (ISP), computer vision (CV), augmented reality (AR). This blending of multiple domains was the first trigger that caused me to choose this research topic for the thesis. As a result of this thesis, I wish to offer an urban landmark detection solution, from street view perspective, that can be utilized in a mobile solution for an AR tourism application. This direction desires to exploit the continuous development of user applications aimed for Timişoara European Capital of Culture 2023. In this thesis I will attempt to answer the following research questions: 1. What is the state of the art in urban landmark detection using mobile cameras imaging? 2. What should a simulation framework offer to be considered as a suitable solution for processing systems of this nature? 3. What ISP algorithms enhance the image to obtain a better detection in this case? 4. What are the challenges in creating an urban landmark detection solution tailored for the Timişoara use-case? The thesis is structured in several chapters that are described below. Chapter 1 is an exposition of my motivation towards choosing the subject of this thesis. With the brief exposure I wish to explain the interconnections of multiple domains that founded the decision of choosing this research topic. 
Chapter 2 offers an overview of the urban landmark detection domain, from general aspects focusing on the end to a specific sub-domain of content-based image retrieval system. The chapter focuses on presenting the domain ecosystem with all the challenges and solutions that literature has to offer. Chapter 3 aims to present my chosen simulation system. The capability of offline simulating a system is an important one with considerable benefits in the development direction. End-to-End Computer Vision Framework (EECVF) is an open source, python-based framework with the goal to offer a flexible and dynamic tool for researching. Chapter 4 presents a proposed image sharpening algorithm that is low computational and based on dilated filters. The proposed algorithm is evaluated on several use-cases that can appear in landmark detection system to better understand the benefits. Chapter 5 presents the proposed landmark detection algorithm with a deep dive in each constructing block of it. I tried for each architectural decision inside the algorithm to explain and justify it in our given use-case context. The evaluation of the proposed landmark detection algorithm using popular dataset, presented in Chapter 2, plus the Timişoara specific dataset that was created for this scope. Chapter 6 is the concluding part of the thesis. I start with some general conclusions regarding the research that I have done. Afterwards, I continue with enumerating theoretical and practical contributions that this thesis brings in the scientific fields. My thesis can be summarized as a proposal for a landmark detection scheme tailored for Timişoara’s urban environment. From the evaluations presented in the thesis we observe a performance of a value of 99.13% Top1 on ZuBuD dataset and 92.05% on TMBuD v3_N dataset. This complex algorithm can be integrated in a mobile application that can offer tourists the chance to better discover the urban scenario of our city.
... In many applications based on the use of artificial vision, the process of image pairing is common, which includes five main stages: first, feature recognition and description; and second, determining the correspondence between the features of the images, rejecting atypical features, deriving the transformation function, and reconstructing the images [25]. For the detection and description process, the following algorithms can be used: SIFT [26], SURF [27], KAZE [28], AZAKE [29], ORB [30], and BRISK [31]. ...
Article
Full-text available
Rice grain production is important for the world economy. Determining the moisture content of the grains, at several stages of production, is crucial for controlling the quality, safety, and storage of the grain. This work inspects how well rice images from global and local descriptors work for determining the moisture content of the grains using artificial vision and intelligence techniques. Three sets of images of rice grains from the INIAP 12 variety (National Institute of Agricultural Research of Ecuador) were captured with a mobile camera. The first one with natural light and the other ones with a truncated pyramid-shaped structure. Then, a set of global descriptors (color, texture) and a set of local descriptors (AZAKE, BRISK, ORB, and SIFT) in conjunction with the dominate technique bag of visual words (BoVW) were used to analyze the content of the image with classification and regression algorithms. The results show that detecting humidity through images with classification and regression algorithms is possible. Finally, f1-score values of at least 0.9 were accomplished for global color descriptors and of 0.8 for texture descriptors, in contrast to the local descriptors (AKAZE, BRISK, and SIFT) that reached up to an f1-score of 0.96.
... Image2 Image3 Image4 Image5 Image6 Image1 Image2 Image3 Image4 Image5 Image6 [5,28,29]. However, there is no consensus on a universally optimal detector for all possible image geometrical and photometric variations [23]. ...
Article
Full-text available
The repeatability rate is an important measure for evaluating and comparing the performance of keypoint detectors. Several repeatability rate measurements were used in the literature to assess the effectiveness of keypoint detectors. While these repeatability rates are calculated for pairs of images, the general assumption is that the reference image is often known and unchanging compared to other images in the same dataset. So, these rates are asymmetrical as they require calculations in only one direction. In addition, the image domain in which these computations take place substantially affects their values. The presented scatter diagram plots illustrate how these directional repeatability rates vary in relation to the size of the neighboring region in each pair of images. Therefore, both directional repeatability rates for the same image pair must be included when comparing different keypoint detectors. This paper, firstly, examines several commonly utilized repeatability rate measures for keypoint detector evaluations. The researcher then suggests computing a twofold repeatability rate to assess keypoint detector performance on similar scene images. Next, the symmetric mean repeatability rate metric is computed using the given twofold repeatability rates. Finally, these measurements are validated using well-known keypoint detectors on different image groups with various geometric and photometric attributes.
Article
Nowadays, feature based 3D reconstruction and 1 tracking technology have been widely used in the medical field. 2 Feature matching is the most important step in feature-based 3 3D reconstruction process, as the accuracy of feature matching 4 directly affects the accuracy of subsequent 3D point cloud 5 coordinates. However, the matching performance of traditional 6 feature matching methods is poor. To overcome this limitation, 7 a method of matching based on convolutional neural network is 8 presented. The convolutional neural network is trained by collecting 9 a training set on the video sequence of a certain length from starting 10 frame. The matched feature points in different endoscopic video 11 frames are treated as the same category. The feature points in 12 subsequent frames are matched by network classification. The 13 proposed method is validated using the silicone simulation heart 14 video and the endoscope video of the vivo beating heart obtained by 15 Da Vinci's surgical robot. Compared with SURF and ORB algorithms, 16 as well as other methods, the experimental results show that the 17 feature matching algorithm based on convolutional neural network 18 is effective in the feature matching effect, rotation invariance, and 19 scale invariance. For the first 200 frames of the video, the matching 20 accuracy reached 90%. c 2023 Society for Imaging Science and 21 Technology.
Article
Full-text available
Keypoint detection and matching algorithms are frequently compared in the literature using datasets of real-world images that have a range of geometric and non-geometric variations; these include viewpoints, illuminations, visual content, and distortions. Homography (H) matrices often describe geometric variations when utilizing these image datasets. However, models for non-geometric differences between these images are rarely offered, resulting in inaccurate and misleading comparisons. This study presents a methodology for objectively comparing classical keypoint detection and matching algorithms by eliminating implicit non-geometric influences from assessments, therefore, offering a step towards limiting the comparison between an image pair to the geometric transformations between them. This proposed technique uses the H matrix provided by the image dataset to generate an augmented image that resembles one of the images in each image group. The performance of the proposed technique was evaluated using several traditional keypoint detections and matching techniques using image groups from well-known datasets to determine the impact of excluding non-geometric changes. The assessments are conducted using the performance measures of repeatability, precision, and recall rates.
Article
Full-text available
In this work, we develop a monocular SLAM-aware object recognition system that is able to achieve considerably stronger recognition performance, as compared to classical object recognition systems that function on a frame-by-frame basis. By incorporating several key ideas including multi-view object proposals and efficient feature encoding methods, our proposed system is able to detect and robustly recognize objects in its environment using a single RGB camera in near-constant time. Through experiments, we illustrate the utility of using such a system to effectively detect and recognize objects, incorporating multiple object viewpoint detections into a unified prediction hypothesis. The performance of the proposed recognition system is evaluated on the UW RGB-D Dataset, showing strong recognition performance and scalable run-time performance compared to current state-of-the-art recognition systems.
Article
This paper presents ORB-SLAM, a feature-based monocular SLAM system that operates in real time, in small and large, indoor and outdoor environments. The system is robust to severe motion clutter, allows wide baseline loop closing and relocalization, and includes full automatic initialization. Building on excellent algorithms of recent years, we designed from scratch a novel system that uses the same features for all SLAM tasks: tracking, mapping, relocalization, and loop closing. A survival of the fittest strategy that selects the points and keyframes of the reconstruction leads to excellent robustness and generates a compact and trackable map that only grows if the scene content changes, allowing lifelong operation. We present an exhaustive evaluation in 27 sequences from the most popular datasets. ORB-SLAM achieves unprecedented performance with respect to other state-of-the-art monocular SLAM approaches. For the benefit of the community, we make the source code public.
Article
We present a novel dataset captured from a VW station wagon for use in mobile robotics and autonomous driving research. In total, we recorded 6 hours of traffic scenarios at 10–100 Hz using a variety of sensor modalities such as high-resolution color and grayscale stereo cameras, a Velodyne 3D laser scanner and a high-precision GPS/IMU inertial navigation system. The scenarios are diverse, capturing real-world traffic situations, and range from freeways over rural areas to inner-city scenes with many static and dynamic objects. Our data is calibrated, synchronized and timestamped, and we provide the rectified and raw image sequences. Our dataset also contains object labels in the form of 3D tracklets, and we provide online benchmarks for stereo, optical flow, object detection and other tasks. This paper describes our recording platform, the data format and the utilities that we provide.
Article
Feature-based and holistic methods present two fundamentally different approaches to relative-pose estimation from pairs of camera images. Until now, there has been a lack of direct comparisons between these methods in the literature. This makes it difficult to evaluate their relative merits for their many applications in mobile robotics. In this work, we compare a selection of such methods in the context of an autonomous domestic cleaning robot. We find that the holistic Min-Warping method gives good and fast results. Some of the feature-based methods can provide excellent and robust results, but at much slower speeds. Other such methods also achieve high speeds, but at reduced robustness to illumination changes. We also provide novel image databases and supporting data for public use.
Article
We present ORB-SLAM2, a complete SLAM system for monocular, stereo and RGB-D cameras, including map reuse, loop closing and relocalization capabilities. The system works in real time on standard CPUs in a wide variety of environments, from small hand-held indoor sequences, to drones flying in industrial environments and cars driving around a city. Our backend, based on Bundle Adjustment with monocular and stereo observations, allows for accurate trajectory estimation with metric scale. Our system includes a lightweight localization mode that leverages visual odometry tracks for unmapped regions and matches to map points that allow for zero-drift localization. The evaluation in 29 popular public sequences shows that our method achieves state-of-the-art accuracy, being in most cases the most accurate SLAM solution. We publish the source code, not only for the benefit of the SLAM community, but with the aim of being an out-of-the-box SLAM solution for researchers in other fields.
Conference Paper
Feature matching is at the base of the target detection problem. Current methods rely on costly descriptors for detection and matching. This paper presents an improved feature descriptor based on ORB, called Gravity-ORB, for target detection on mobile devices. Compared with traditional descriptors such as SIFT or ORB, the proposed design performs fast feature matching while preserving robustness, even on mobile devices with limited computational capacity. Specifically, Gravity-ORB reduces the complexity of feature computation by exploiting the device's gravity acceleration sensor. Finally, experiments conducted on smartphones and tablets demonstrate the effectiveness and real-time performance of the proposed method.
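Matching ORB-style binary descriptors, as this line of work relies on, boils down to brute-force Hamming distance plus a ratio test to reject ambiguous matches. A minimal sketch, assuming 16-bit toy descriptors and a 0.8 ratio threshold (both illustrative choices, not values from the paper):

```python
import numpy as np

def hamming(a, b):
    """Hamming distance between two binary descriptors stored as uint8 arrays."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

def match_ratio_test(desc_a, desc_b, ratio=0.8):
    """Brute-force matching of ORB-style binary descriptors with a ratio
    test: keep a match only if the best distance is clearly smaller
    than the second best, which discards ambiguous correspondences."""
    matches = []
    for i, da in enumerate(desc_a):
        dists = sorted((hamming(da, db), j) for j, db in enumerate(desc_b))
        if len(dists) > 1 and dists[0][0] < ratio * dists[1][0]:
            matches.append((i, dists[0][1]))
    return matches

# Toy 2-byte (16-bit) descriptors
desc_a = np.array([[0b10101010, 0b11110000]], dtype=np.uint8)
desc_b = np.array([[0b10101010, 0b11110001],   # 1 bit away -> best match
                   [0b01010101, 0b00001111]],  # 16 bits away
                  dtype=np.uint8)
print(match_ratio_test(desc_a, desc_b))        # [(0, 0)]
```

Because Hamming distance reduces to XOR and popcount, binary descriptors like ORB and Gravity-ORB are far cheaper to match on mobile hardware than the floating-point distances SIFT or SURF require.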
Article
Augmented reality (AR) allows virtual objects to be seamlessly inserted into an image sequence. In order to accomplish this goal, it is important that synthetic elements are rendered and aligned in the scene in an accurate and visually acceptable way. The solution of this problem can be related to a pose estimation or, equivalently, a camera localization process. This paper aims at presenting a brief but almost self-contained introduction to the most important approaches dedicated to vision-based camera localization, along with a survey of several extensions proposed in recent years. For most of the presented approaches, we also provide links to the code of short examples. This should allow readers to easily bridge the gap between theoretical aspects and practical implementations.
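A building block common to the vision-based localization approaches surveyed there is estimating the projective transform between matched points, e.g. via the Direct Linear Transform (DLT). The following numpy-only sketch (the function name and the toy correspondences are illustrative assumptions) recovers a homography from four point correspondences:

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate the 3x3 homography mapping src -> dst (Nx2 arrays, N >= 4)
    with the Direct Linear Transform: each correspondence contributes two
    linear equations in the 9 entries of H, and h is the null vector of
    the stacked system (the smallest right singular vector)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the arbitrary projective scale

# Toy check: a pure scaling by 2 recovered from 4 correspondences
src = np.array([[0.0, 0], [1, 0], [1, 1], [0, 1]])
dst = 2.0 * src
H = homography_dlt(src, dst)
print(np.round(H, 6))  # diag(2, 2, 1)
```

In a planar AR scene, the camera pose can then be decomposed from such a homography given the camera intrinsics; non-planar scenes call for PnP-style solvers instead.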
Article
This article presents a novel scale- and rotation-invariant detector and descriptor, coined SURF (Speeded-Up Robust Features). SURF approximates or even outperforms previously proposed schemes with respect to repeatability, distinctiveness, and robustness, yet can be computed and compared much faster. This is achieved by relying on integral images for image convolutions; by building on the strengths of the leading existing detectors and descriptors (specifically, using a Hessian matrix-based measure for the detector, and a distribution-based descriptor); and by simplifying these methods to the essential. This leads to a combination of novel detection, description, and matching steps. The paper encompasses a detailed description of the detector and descriptor and then explores the effects of the most important parameters. We conclude the article with SURF's application to two challenging, yet converse goals: camera calibration as a special case of image registration, and object recognition. Our experiments underline SURF's usefulness in a broad range of topics in computer vision.
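The integral image trick that gives SURF its speed is simple to demonstrate: after one pass to build a summed-area table, the sum of any axis-aligned box takes four lookups regardless of box size, which is what makes SURF's box-filter approximation of the Hessian cheap at every scale. A minimal numpy sketch (function names are illustrative):

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero border: ii[y, x] holds the sum of
    img over rows 0..y-1 and columns 0..x-1, so box sums below need
    no boundary checks."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def box_sum(ii, y0, x0, y1, x1):
    """Sum of img over rows y0..y1-1 and columns x0..x1-1 in four
    lookups -- O(1) regardless of box size."""
    return int(ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0])

img = np.arange(16).reshape(4, 4)   # values 0..15
ii = integral_image(img)
print(box_sum(ii, 0, 0, 4, 4))      # 120, the total sum
print(box_sum(ii, 1, 1, 3, 3))      # 5 + 6 + 9 + 10 = 30
```

SURF evaluates its Hessian responses with differences of such box sums, so enlarging the filter for coarser scales costs no more than the smallest filter does.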