Evaluation of Image Feature Detection and Matching Algorithms
Yiwen Ou
School of Information Science and Engineering
Fujian University of Technology
Fuzhou, China
e-mail: 1172704941@qq.com
Zhiming Cai*
National Demonstration Center for Experimental
Electronic Information and Electrical Technology
Education,
Fujian University of Technology
Fuzhou, China
Corresponding author, e-mail: caizm@fjut.edu.cn
Jian Lu
School of Information Science and Engineering
Fujian University of Technology
Fuzhou, China
e-mail: 573843470@qq.com
Jian Dong
School of Information Science and Engineering
Fujian University of Technology,
Fuzhou, China
e-mail: 2711713088@qq.com
Yufeng Ling
School of Information Science and Engineering
Fujian University of Technology,
Fuzhou, China
e-mail: 1504662829@qq.com
Abstract—Image feature detection and matching algorithms play an important role in the field of machine vision. The computational efficiency and robustness of the feature detector and descriptor chosen by an algorithm have a great impact on the accuracy and time consumption of image matching. This paper comprehensively evaluates six typical algorithms: SIFT, SURF, ORB, BRISK, KAZE, and AKAZE. The Oxford dataset is used to compare the robustness of these algorithms under illumination, rotation, scale, blur, and viewpoint transformations. Jittery video is also used to compare their anti-jitter ability. The indicators compared include: feature detection time, image matching time, total running time, number of detected feature points, accuracy, number of repeated feature points, and repetition rate. Experimental results show that, under different transformations, each algorithm has its own advantages and disadvantages.
Keywords-feature detection and matching; comprehensive evaluation; robustness
I. INTRODUCTION
Feature point detection and matching algorithms have been widely used in many machine vision fields, such as real-time localization and 3D reconstruction [1], pose estimation [2], object recognition [3], intelligent device applications [4], SLAM (simultaneous localization and mapping) [5, 6], automatic driving, robot navigation [7], and AR [8].
The algorithms can be classified into two categories.
A. Algorithms Based on Blob Detection
The scale-invariant feature transform (SIFT) algorithm [9] was proposed by David G. Lowe in 1999 and improved in 2004 [10]. The SURF (Speeded-Up Robust Features) algorithm, a robust local feature detection algorithm, was first proposed by Bay et al. [11] in 2006 and improved in 2008 [12]. KAZE [13], a feature detection algorithm more stable than SIFT, appeared at ECCV 2012. In 2013, P. F. Alcantarilla et al. presented the Accelerated-KAZE (AKAZE) algorithm [14], which adopts nonlinear diffusion filtering. AKAZE improves repeatability and distinctiveness compared with SIFT and SURF.
B. Algorithms Based on Corner Detection
ORB (Oriented FAST and Rotated BRIEF) was proposed by Rublee et al. [15] in 2011. The BRISK (Binary Robust Invariant Scalable Keypoints) method was proposed by Leutenegger et al. [16] in 2011; it realizes the detection, description, and matching of image feature points.
Image feature detection and matching algorithms usually involve the following steps: 1) detection and description of feature points; 2) matching of feature points; 3) rough matching of feature points, followed by the RANSAC method to "purify" the matches (that is, remove outliers); 4) final matching of the features obtained from step 3 (good matches).

978-1-7281-6136-5/20/$31.00 ©2020 IEEE
In this paper, the SIFT, SURF, ORB, BRISK, KAZE, and AKAZE algorithms are evaluated with the Oxford dataset, which provides image sequences for rotation, scale, illumination, blur, and viewpoint transformations. We also use a jittery video stream to verify anti-jitter robustness.
II. FUNDAMENTALS OF EVALUATION
A. Experimental Setup
OpenCV 3.4 was used for the experiments presented in this paper. The computer system used has an Intel(R) Core(TM) i5-4210U CPU @ 1.70 GHz (up to 2.40 GHz) and 4.00 GB RAM.
B. Datasets
Two groups of experiments are performed: evaluating the robustness of the algorithms under different transformations, and evaluating their anti-jitter ability.
The Oxford dataset (http://www.robots.ox.ac.uk/~vgg/research/affine/) is used as the first set of experimental data to evaluate the robustness of each algorithm under illumination, blur, scale, rotation, and viewpoint transformations. The leuven and graf image packages are used as-is. For the other image packages, the first image is kept and the others are deleted. For the bikes package, the first image is filtered with 5 × 5 mean, Gaussian, and median filters, and all resulting images are placed in one package, denoted the updated bikes package. For the boats package, the first image is downsampled by factors of 0.2 and 0.5 and upsampled by factors of 1.5 and 2. For the bark package, the first image is rotated by 15°, 30°, 45°, 60°, 90°, and 180°, respectively. All resulting images are placed in the corresponding packages.
The second set of experimental data is a video suffering from strong rolling-shutter artifacts (http://web.cecs.pdx.edu/~fliu/project/subspace_stabilization/). Two frames of the video are extracted to verify the anti-jitter performance of each algorithm.
This paper uses the following indicators to describe the performance of the algorithms: 1) feature detection time; 2) image matching time; 3) total running time; 4) number of detected feature points; 5) accuracy (the number of feature points retained by RANSAC divided by the number of feature points after rough matching); 6) correspondence (the number of repeated feature point pairs in feature point detection); 7) repeatability (correspondence divided by the minimum number of feature points detected in the two pictures).
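Indicators 5) and 7) reduce to simple ratios; a minimal sketch (function and variable names are our own, not the paper's):

```python
def accuracy(ransac_inliers, rough_matches):
    """Indicator 5): matches surviving RANSAC over rough matches."""
    return ransac_inliers / rough_matches

def repeatability(correspondence, detected_1, detected_2):
    """Indicator 7): repeated pairs over the smaller detected count."""
    return correspondence / min(detected_1, detected_2)

# E.g. 160 of 200 rough matches survive RANSAC; 90 repeated pairs
# between images with 300 and 450 detected feature points.
acc = accuracy(160, 200)           # 0.8
rep = repeatability(90, 300, 450)  # 0.3
```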
III. EVALUATION OF ROBUSTNESS OF VARIOUS
ALGORITHMS
To evaluate the robustness of each algorithm, the first image in each updated package is matched with the remaining images one by one. As some algorithms may fail under certain transformations, this paper takes the average over the successfully matched image pairs as the experimental result. To obtain reasonable results, each pair of images is matched five times, and the detection and matching times are averaged.
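The five-run averaging could be implemented as follows (a hedged sketch; `mean_time` and the timed callable are illustrative, not the authors' code):

```python
import time

def mean_time(fn, repeats=5):
    """Run fn `repeats` times; return the average wall-clock seconds."""
    total = 0.0
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        total += time.perf_counter() - t0
    return total / repeats

# E.g. average a detection step over five runs (detector/img assumed):
# detect_time = mean_time(lambda: detector.detectAndCompute(img, None))
```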
A. Evaluate the Indicators of Each Algorithm under Each
Transformation
1) Number of features detected and correspondence
As shown in Table I, the numbers of feature points detected by SIFT, SURF, and BRISK are several times those of the other three algorithms, and ORB detects the fewest.
Except under scale transformation, the number of repeated feature points detected by SURF is the highest; ORB has the fewest repeated feature points.
TABLE I. THE NUMBER OF FEATURES DETECTED AND CORRESPONDENCE OF VARIOUS ALGORITHMS
[Table I lists, for each of SIFT, SURF, ORB, BRISK, KAZE, and AKAZE, (a) the number of features detected and (b) the correspondence, under six conditions: illumination transformation (original leuven package), blur transformation (updated bikes package), scale transformation (updated boats package), rotation transformation (updated bark package), viewpoint transformation (original graf package), and jitter video. The numeric entries are not recoverable from this extraction.]
Note: a—number of features detected; b—correspondence.
2) Feature detection time, matching time, and total time
The time indicators of all algorithms evaluated on the different datasets are shown in Figures 1-5. The figures show that, under every transformation, ORB and AKAZE take much less time to detect and match features than the other algorithms, while KAZE consumes the most time under every transformation. BRISK is similar to SIFT. For all tests, ORB spends the least time on detecting and matching features. In most situations SIFT outperforms SURF, except under scale transformation. The figures also show that detection costs more time than matching, except under scale transformation. KAZE and AKAZE have nearly equal matching times. Overall, the ORB and AKAZE algorithms have the best detection and matching times.
Figure 1. Illumination transformation.
Figure 2. Blur transformation.
Figure 3. Scale transformation.
Figure 4. Rotate transformation.
Figure 5. Viewpoint transformation.
3) Accuracy and repeatability
The accuracy and repeatability evaluated on the different datasets are shown in Figures 6-10. They show that the accuracy of each algorithm is very high under blur transformation. However, the accuracy is generally low under viewpoint transformation, where KAZE is the lowest. Under illumination transformation, the accuracy of SIFT is the lowest, while that of AKAZE is the highest. Under rotation transformation, KAZE has the lowest accuracy.
It can be concluded from Figures 6-10 that: 1) under the illumination, scale, and blur transformations, the repeatability of AKAZE is the highest; 2) under the rotation and viewpoint transformations, KAZE has the highest repeatability; 3) under the illumination and scale transformations, ORB achieves the lowest repeatability.
Based on the above performance indicators, we find that some algorithms are robust under certain transformations. For instance, SURF performs well under the illumination, blur, scale, and rotation transformations, and SIFT performs well under the scale and rotation transformations. In addition, under viewpoint transformation, the BRISK and AKAZE algorithms also perform well.
Figure 6. Illumination transformation.
Figure 7. Blur transformation.
Figure 8. Scale transformation.
Figure 9. Rotate transformation.
Figure 10. Viewpoint transformation.
4) Anti-jitter performance
The jitter-video rows of Table I and Figure 11 show the experimental results of each algorithm under video jitter.
a) Number of features detected: BRISK > SIFT > SURF > KAZE > ORB > AKAZE;
b) Feature detection time: KAZE > BRISK > SURF > SIFT > ORB > AKAZE;
c) Feature matching time: BRISK > SIFT > SURF ≈ KAZE > ORB > AKAZE;
d) Total time: KAZE > BRISK > SURF > SIFT > ORB > AKAZE;
e) Accuracy: excluding failed matches, all the algorithms achieve superior results, with the accuracy reaching 1;
f) Repeatability: KAZE > ORB > BRISK > SURF > AKAZE > SIFT;
g) Correspondence: BRISK > KAZE > SIFT > SURF > ORB > AKAZE.
From the above anti-jitter results, we can see that the KAZE algorithm takes the most time but has the highest repetition rate, while the ORB algorithm takes less time but detects fewer feature points and fewer repeated feature points. The SIFT algorithm detects more feature points, but its repetition rate is the lowest. The BRISK algorithm takes longer, but its number of feature points, repetition rate, and number of repeated feature points are all high. Although the AKAZE algorithm spends less time detecting feature points, its number of feature points is small and its repetition rate is low. In terms of the number of detected feature points, time consumption, repetition rate, and correspondence, the SURF algorithm has only average performance.
In summary, although the BRISK algorithm is time-consuming, it performs well on the other indicators. The anti-jitter performance of the BRISK and SURF algorithms is therefore relatively good.
(a) Feature detection and matching time
(b) Accuracy and repeatability of all algorithms
Figure 11. Anti-jitter performance.
IV. CONCLUSION
In this paper, a large number of experiments are performed to evaluate several feature detection and matching algorithms (SIFT, SURF, ORB, BRISK, KAZE, AKAZE). Several robustness indicators are used to measure the performance of the algorithms. The experimental results show that under the illumination and blur transformations, the SURF algorithm is more robust; under the scale and rotation transformations, the SIFT algorithm performs better; and under the viewpoint transformation, the BRISK and AKAZE algorithms perform better. The second set of experiments shows that the BRISK and SURF algorithms have better anti-jitter performance.
We can see from the experimental data that some algorithms are more robust than others under certain transformations. But they share a common trait: either the accuracy is lower when less time is taken, or the accuracy is higher when more time is taken. In other words, none of these algorithms suits applications that demand both low time consumption and high accuracy, and achieving both at the same time remains a challenge. In future work, we should therefore pay attention not only to the time consumption of the algorithms but also to improving their accuracy.
REFERENCES
[1] Mouragnon E, Dekeyser F, Sayd P, et al. Real Time Localization and
3D Reconstruction. IEEE Computer Society Conference on
Computer Vision & Pattern Recognition, 2006. 363-370.
[2] Fleer D, Möller R. Comparing holistic and feature-based visual
methods for estimating the relative pose of mobile robots. Robotics
and Autonomous Systems, 2017, 89: 51-74.
[3] Pillai S, Leonard J. Monocular SLAM Supported Object Recognition.
Computer Science, 2015.
[4] Hu Z, Jiang Y. An improved ORB, gravity-ORB for target detection
on mobile devices. 2016 12th World Congress on Intelligent Control
and Automation (WCICA); 12-15 June 2016, 2016. 1708-1713.
[5] Mur-Artal R, Montiel J M M, Tardos J D. ORB-SLAM: A Versatile
and Accurate Monocular SLAM System. IEEE Transactions on
Robotics, 2015, 31(5): 1147-1163.
[6] Mur-Artal R, Tardos J D. ORB-SLAM2: An Open-Source SLAM
System for Monocular, Stereo, and RGB-D Cameras. IEEE
Transactions on Robotics, 2017, 33(5): 1255-1262.
[7] Geiger A, Lenz P, Stiller C, et al. Vision meets robotics: the KITTI
dataset. The International Journal of Robotics Research, 2013, 32:
1231-1237.
[8] Marchand E, Uchiyama H, Spindler F. Pose estimation for
augmented reality: a hands-on survey. IEEE Transactions on
Visualization & Computer Graphics, 2016, 22(12): 2633-2651.
[9] Lowe D G. Object recognition from local scale-invariant features. Proceedings of the Seventh IEEE International Conference on Computer Vision; 20-27 Sept. 1999. 1150-1157, vol. 2.
[10] Lowe D G. Distinctive Image Features from Scale-Invariant
Keypoints. International Journal of Computer Vision, 2004, 60(2):
91-110.
[11] Bay H, Tuytelaars T, Van Gool L. SURF: Speeded Up Robust
Features. Berlin, Heidelberg: Springer Berlin Heidelberg, 2006. 404-
417.
[12] Bay H, Ess A, Tuytelaars T, et al. Speeded-Up Robust Features
(SURF). Computer Vision and Image Understanding, 2008, 110(3):
346-359.
[13] Alcantarilla P F, Bartoli A, Davison A J. KAZE Features. European
Conference on Computer Vision, 2012. 214-227.
[14] Alcantarilla P F, Nuevo J, Bartoli A. Fast Explicit Diffusion for Accelerated Features in Nonlinear Scale Spaces. British Machine Vision Conference (BMVC), 2013.
[15] Rublee E, Rabaud V, Konolige K, et al. ORB: an efficient alternative to SIFT or SURF. International Conference on Computer Vision, 2011.
[16] Leutenegger S, Chli M, Siegwart R Y. BRISK: Binary Robust Invariant Scalable Keypoints. International Conference on Computer Vision, 2011.
www.engineeringvillage.com
Detailed results: 1
Downloaded: 7/29/2020
Content provided by Engineering Village. Copyright 2020 Page 1 of 1
1. Evaluation of image feature detection and matching Algorithms
Accession number: 20202808915403
Authors: Ou, Yiwen (1); Cai, Zhiming (2); Lu, Jian (1); Dong, Jian (1); Ling, Yufeng (1)
Author affiliation: (1) Fujian University of Technology, School of Information Science and Engineering, Fuzhou,
China; (2) Fujian University of Technology, Natl. Demonstration Ctr. for Experimental Electronic Information and
Electrical Technology Education, Fuzhou, China
Corresponding author: Cai, Zhiming(caizm@fjut.edu.cn)
Source title: 2020 5th International Conference on Computer and Communication Systems, ICCCS 2020
Abbreviated source title: Int. Conf. Comput. Commun. Syst., ICCCS
Part number: 1 of 1
Issue title: 2020 5th International Conference on Computer and Communication Systems, ICCCS 2020
Issue date: May 2020
Publication year: 2020
Pages: 220-224
Article number: 9118480
Language: English
ISBN-13: 9781728161365
Document type: Conference article (CA)
Conference name: 5th International Conference on Computer and Communication Systems, ICCCS 2020
Conference date: May 15, 2020 - May 18, 2020
Conference location: Shanghai, China
Conference code: 161227
Publisher: Institute of Electrical and Electronics Engineers Inc.
Abstract: Image features detection and matching algorithms play an important role in the field of machine vision.
Among them, the computational efficiency and robust performance of the features detector descriptor selected by the
algorithm have a great impact on the accuracy and time consumption of image matching. This paper comprehensively
evaluates typical SIFT, SURF, ORB, BRISK, KAZE, AKAZE algorithms. The Oxford dataset is used to compare the
robustness of various algorithms under illumination transformation, rotation transformation, scale transformation,
blur transformation, and viewpoint transformation. Jitter video is also used to compare the anti-jitter ability for these
algorithms. The indicators compared include: time of detecting features, time of matching images, total running time,
number of detected feature points, accuracy, number of repeated feature points, and repetition rate. Experimental
results show that, Under different transformations, each algorithm has its own advantages and disadvantages. © 2020
IEEE.
Number of references: 16
Main heading: Feature extraction
Controlled terms: Computational efficiency - Jitter
Uncontrolled terms: Image features - Matching algorithm - Repetition rate - Robust performance - Rotation
transformation - Scale transformation - Time consumption - Viewpoint transformation
DOI: 10.1109/ICCCS49078.2020.9118480
Compendex references: YES
Database: Compendex
Compilation and indexing terms, Copyright 2020 Elsevier Inc.
Data Provider: Engineering Village
... The BRISK has several advantages such as fast keypoint detection, description, matching, rotation invariant, scale-invariant, high quality, and reduce computational cost [9,10]. The bisecting K-means is an extended version of K-means clustering algorithm. ...
Article
Full-text available
Due to the exponential growth of video data, aided by rapid advancements in multimedia technologies. It became difficult for the user to obtain information from a large video series. The process of providing an abstract of the entire video that includes the most representative frames is known as static video summarization. This method resulted in rapid exploration, indexing, and retrieval of massive video libraries. We propose a framework for static video summary based on a Binary Robust Invariant Scalable Keypoint (BRISK) and bisecting K-means clustering algorithm. The current method effectively recognizes relevant frames using BRISK by extracting keypoints and the descriptors from video sequences. The video frames’ BRISK features are clustered using a bisecting K-means, and the keyframe is determined by selecting the frame that is most near the cluster center. Without applying any clustering parameters, the appropriate clusters number is determined using the silhouette coefficient. Experiments were carried out on a publicly available open video project (OVP) dataset that contained videos of different genres. The proposed method’s effectiveness is compared to existing methods using a variety of evaluation metrics, and the proposed method achieves a trade-off between computational cost and quality.
... In Figure 5.23 I exemplify the differences in features detected if we use different diffusion functions. From the evaluation done in [236], [237], [252]- [257] we can observe that A-KAZE features spends less time to detect feature points but on the down side the number of feature points is small and the repetition rate is low. One important positive aspect is that A-KAZE features have proven to be more rotation invariant than other compared feature detectors. ...
Thesis
Full-text available
Landmarks are typically defined from two perspectives: one as an object or structure that is easy visible and to recognize, and the second as a building or place that has an important historical importance. Landmarks in an urban area serve as “spatial magnet” in which cultural, civic, or economical activities take place. In this sense they have become an important aspect in multiple domains related to tourism and culture. Identifying and locating of an urban landmark is an activity that naturally blends several research domains like image signal processing (ISP), computer vision (CV), augmented reality (AR). This blending of multiple domains was the first trigger that caused me to choose this research topic for the thesis. As a result of this thesis, I wish to offer an urban landmark detection solution, from street view perspective, that can be utilized in a mobile solution for an AR tourism application. This direction desires to exploit the continuous development of user applications aimed for Timişoara European Capital of Culture 2023. In this thesis I will attempt to answer the following research questions: 1. What is the state of the art in urban landmark detection using mobile cameras imaging? 2. What should a simulation framework offer to be considered as a suitable solution for processing systems of this nature? 3. What ISP algorithms enhance the image to obtain a better detection in this case? 4. What are the challenges in creating an urban landmark detection solution tailored for the Timişoara use-case? The thesis is structured in several chapters that are described below. Chapter 1 is an exposition of my motivation towards choosing the subject of this thesis. With the brief exposure I wish to explain the interconnections of multiple domains that founded the decision of choosing this research topic. 
Chapter 2 offers an overview of the urban landmark detection domain, from general aspects focusing on the end to a specific sub-domain of content-based image retrieval system. The chapter focuses on presenting the domain ecosystem with all the challenges and solutions that literature has to offer. Chapter 3 aims to present my chosen simulation system. The capability of offline simulating a system is an important one with considerable benefits in the development direction. End-to-End Computer Vision Framework (EECVF) is an open source, python-based framework with the goal to offer a flexible and dynamic tool for researching. Chapter 4 presents a proposed image sharpening algorithm that is low computational and based on dilated filters. The proposed algorithm is evaluated on several use-cases that can appear in landmark detection system to better understand the benefits. Chapter 5 presents the proposed landmark detection algorithm with a deep dive in each constructing block of it. I tried for each architectural decision inside the algorithm to explain and justify it in our given use-case context. The evaluation of the proposed landmark detection algorithm using popular dataset, presented in Chapter 2, plus the Timişoara specific dataset that was created for this scope. Chapter 6 is the concluding part of the thesis. I start with some general conclusions regarding the research that I have done. Afterwards, I continue with enumerating theoretical and practical contributions that this thesis brings in the scientific fields. My thesis can be summarized as a proposal for a landmark detection scheme tailored for Timişoara’s urban environment. From the evaluations presented in the thesis we observe a performance of a value of 99.13% Top1 on ZuBuD dataset and 92.05% on TMBuD v3_N dataset. This complex algorithm can be integrated in a mobile application that can offer tourists the chance to better discover the urban scenario of our city.
... In many applications based on the use of artificial vision, the process of image pairing is common, which includes five main stages: first, feature recognition and description; and second, determining the correspondence between the features of the images, rejecting atypical features, deriving the transformation function, and reconstructing the images [25]. For the detection and description process, the following algorithms can be used: SIFT [26], SURF [27], KAZE [28], AZAKE [29], ORB [30], and BRISK [31]. ...
Article
Full-text available
Rice grain production is important for the world economy. Determining the moisture content of the grains, at several stages of production, is crucial for controlling the quality, safety, and storage of the grain. This work inspects how well rice images from global and local descriptors work for determining the moisture content of the grains using artificial vision and intelligence techniques. Three sets of images of rice grains from the INIAP 12 variety (National Institute of Agricultural Research of Ecuador) were captured with a mobile camera. The first one with natural light and the other ones with a truncated pyramid-shaped structure. Then, a set of global descriptors (color, texture) and a set of local descriptors (AZAKE, BRISK, ORB, and SIFT) in conjunction with the dominate technique bag of visual words (BoVW) were used to analyze the content of the image with classification and regression algorithms. The results show that detecting humidity through images with classification and regression algorithms is possible. Finally, f1-score values of at least 0.9 were accomplished for global color descriptors and of 0.8 for texture descriptors, in contrast to the local descriptors (AKAZE, BRISK, and SIFT) that reached up to an f1-score of 0.96.
... Image2 Image3 Image4 Image5 Image6 Image1 Image2 Image3 Image4 Image5 Image6 [5,28,29]. However, there is no consensus on a universally optimal detector for all possible image geometrical and photometric variations [23]. ...
Article
Full-text available
The repeatability rate is an important measure for evaluating and comparing the performance of keypoint detectors. Several repeatability rate measurements were used in the literature to assess the effectiveness of keypoint detectors. While these repeatability rates are calculated for pairs of images, the general assumption is that the reference image is often known and unchanging compared to other images in the same dataset. So, these rates are asymmetrical as they require calculations in only one direction. In addition, the image domain in which these computations take place substantially affects their values. The presented scatter diagram plots illustrate how these directional repeatability rates vary in relation to the size of the neighboring region in each pair of images. Therefore, both directional repeatability rates for the same image pair must be included when comparing different keypoint detectors. This paper, firstly, examines several commonly utilized repeatability rate measures for keypoint detector evaluations. The researcher then suggests computing a twofold repeatability rate to assess keypoint detector performance on similar scene images. Next, the symmetric mean repeatability rate metric is computed using the given twofold repeatability rates. Finally, these measurements are validated using well-known keypoint detectors on different image groups with various geometric and photometric attributes.
Article
Nowadays, feature based 3D reconstruction and 1 tracking technology have been widely used in the medical field. 2 Feature matching is the most important step in feature-based 3 3D reconstruction process, as the accuracy of feature matching 4 directly affects the accuracy of subsequent 3D point cloud 5 coordinates. However, the matching performance of traditional 6 feature matching methods is poor. To overcome this limitation, 7 a method of matching based on convolutional neural network is 8 presented. The convolutional neural network is trained by collecting 9 a training set on the video sequence of a certain length from starting 10 frame. The matched feature points in different endoscopic video 11 frames are treated as the same category. The feature points in 12 subsequent frames are matched by network classification. The 13 proposed method is validated using the silicone simulation heart 14 video and the endoscope video of the vivo beating heart obtained by 15 Da Vinci's surgical robot. Compared with SURF and ORB algorithms, 16 as well as other methods, the experimental results show that the 17 feature matching algorithm based on convolutional neural network 18 is effective in the feature matching effect, rotation invariance, and 19 scale invariance. For the first 200 frames of the video, the matching 20 accuracy reached 90%. c 2023 Society for Imaging Science and 21 Technology.
Article
Full-text available
Keypoint detection and matching algorithms are frequently compared in the literature using datasets of real-world images that have a range of geometric and non-geometric variations; these include viewpoints, illuminations, visual content, and distortions. Homography (H) matrices often describe geometric variations when utilizing these image datasets. However, models for non-geometric differences between these images are rarely offered, resulting in inaccurate and misleading comparisons. This study presents a methodology for objectively comparing classical keypoint detection and matching algorithms by eliminating implicit non-geometric influences from assessments, therefore, offering a step towards limiting the comparison between an image pair to the geometric transformations between them. This proposed technique uses the H matrix provided by the image dataset to generate an augmented image that resembles one of the images in each image group. The performance of the proposed technique was evaluated using several traditional keypoint detections and matching techniques using image groups from well-known datasets to determine the impact of excluding non-geometric changes. The assessments are conducted using the performance measures of repeatability, precision, and recall rates.
Article
Full-text available
In this work, we develop a monocular SLAM-aware object recognition system that is able to achieve considerably stronger recognition performance, as compared to classical object recognition systems that function on a frame-by-frame basis. By incorporating several key ideas including multi-view object proposals and efficient feature encoding methods, our proposed system is able to detect and robustly recognize objects in its environment using a single RGB camera in near-constant time. Through experiments, we illustrate the utility of using such a system to effectively detect and recognize objects, incorporating multiple object viewpoint detections into a unified prediction hypothesis. The performance of the proposed recognition system is evaluated on the UW RGB-D Dataset, showing strong recognition performance and scalable run-time performance compared to current state-of-the-art recognition systems.
Article
This paper presents ORB-SLAM, a feature-based monocular SLAM system that operates in real time, in small and large, indoor and outdoor environments. The system is robust to severe motion clutter, allows wide baseline loop closing and relocalization, and includes full automatic initialization. Building on excellent algorithms of recent years, we designed from scratch a novel system that uses the same features for all SLAM tasks: tracking, mapping, relocalization, and loop closing. A survival of the fittest strategy that selects the points and keyframes of the reconstruction leads to excellent robustness and generates a compact and trackable map that only grows if the scene content changes, allowing lifelong operation. We present an exhaustive evaluation in 27 sequences from the most popular datasets. ORB-SLAM achieves unprecedented performance with respect to other state-of-the-art monocular SLAM approaches. For the benefit of the community, we make the source code public.
Article
We present a novel dataset captured from a VW station wagon for use in mobile robotics and autonomous driving research. In total, we recorded 6 hours of traffic scenarios at 10–100 Hz using a variety of sensor modalities such as high-resolution color and grayscale stereo cameras, a Velodyne 3D laser scanner and a high-precision GPS/IMU inertial navigation system. The scenarios are diverse, capturing real-world traffic situations, and range from freeways over rural areas to inner-city scenes with many static and dynamic objects. Our data is calibrated, synchronized and timestamped, and we provide the rectified and raw image sequences. Our dataset also contains object labels in the form of 3D tracklets, and we provide online benchmarks for stereo, optical flow, object detection and other tasks. This paper describes our recording platform, the data format and the utilities that we provide.
Article
Feature-based and holistic methods present two fundamentally different approaches to relative-pose estimation from pairs of camera images. Until now, there has been a lack of direct comparisons between these methods in the literature. This makes it difficult to evaluate their relative merits for their many applications in mobile robotics. In this work, we compare a selection of such methods in the context of an autonomous domestic cleaning robot. We find that the holistic Min-Warping method gives good and fast results. Some of the feature-based methods can provide excellent and robust results, but at much slower speeds. Other such methods also achieve high speeds, but at reduced robustness to illumination changes. We also provide novel image databases and supporting data for public use.
Article
We present ORB-SLAM2, a complete SLAM system for monocular, stereo and RGB-D cameras, including map reuse, loop closing and relocalization capabilities. The system works in real time on standard CPUs in a wide variety of environments, from small hand-held indoor sequences, to drones flying in industrial environments and cars driving around a city. Our backend, based on Bundle Adjustment with monocular and stereo observations, allows for accurate trajectory estimation with metric scale. Our system includes a lightweight localization mode that leverages visual odometry tracks for unmapped regions and matches to map points that allow for zero-drift localization. The evaluation in 29 popular public sequences shows that our method achieves state-of-the-art accuracy, being in most cases the most accurate SLAM solution. We publish the source code, not only for the benefit of the SLAM community, but with the aim of being an out-of-the-box SLAM solution for researchers in other fields.
Conference Paper
Feature matching is at the base of the target detection problem. Current methods rely on costly descriptors for detection and matching. This paper presents an improved feature descriptor based on ORB, called Gravity-ORB, for target detection on mobile devices. Compared with traditional descriptors such as SIFT or ORB, the proposed design performs fast feature matching while preserving robustness, even on mobile devices with limited computational capacity. Specifically, Gravity-ORB reduces the complexity of feature computation by exploiting the device's gravity acceleration sensor. Finally, experiments conducted on smartphones and tablets demonstrate the effectiveness and real-time performance of the proposed method.
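Matching ORB-style binary descriptors, as this line of work relies on, boils down to brute-force Hamming distance plus a ratio test to reject ambiguous matches. A minimal sketch, assuming 16-bit toy descriptors and a 0.8 ratio threshold (both illustrative choices, not values from the paper):

```python
import numpy as np

def hamming(a, b):
    """Hamming distance between two binary descriptors stored as uint8 arrays."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

def match_ratio_test(desc_a, desc_b, ratio=0.8):
    """Brute-force matching of ORB-style binary descriptors with a ratio
    test: keep a match only if the best distance is clearly smaller
    than the second best, which discards ambiguous correspondences."""
    matches = []
    for i, da in enumerate(desc_a):
        dists = sorted((hamming(da, db), j) for j, db in enumerate(desc_b))
        if len(dists) > 1 and dists[0][0] < ratio * dists[1][0]:
            matches.append((i, dists[0][1]))
    return matches

# Toy 2-byte (16-bit) descriptors
desc_a = np.array([[0b10101010, 0b11110000]], dtype=np.uint8)
desc_b = np.array([[0b10101010, 0b11110001],   # 1 bit away -> best match
                   [0b01010101, 0b00001111]],  # 16 bits away
                  dtype=np.uint8)
print(match_ratio_test(desc_a, desc_b))        # [(0, 0)]
```

Because Hamming distance reduces to XOR and popcount, binary descriptors like ORB and Gravity-ORB are far cheaper to match on mobile hardware than the floating-point distances SIFT or SURF require.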
Article
Augmented reality (AR) allows virtual objects to be seamlessly inserted into an image sequence. In order to accomplish this goal, it is important that synthetic elements are rendered and aligned in the scene in an accurate and visually acceptable way. The solution of this problem can be related to a pose estimation or, equivalently, a camera localization process. This paper aims at presenting a brief but almost self-contained introduction to the most important approaches dedicated to vision-based camera localization, along with a survey of several extensions proposed in recent years. For most of the presented approaches, we also provide links to the code of short examples. This should allow readers to easily bridge the gap between theoretical aspects and practical implementations.
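A building block common to the vision-based localization approaches surveyed there is estimating the projective transform between matched points, e.g. via the Direct Linear Transform (DLT). The following numpy-only sketch (the function name and the toy correspondences are illustrative assumptions) recovers a homography from four point correspondences:

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate the 3x3 homography mapping src -> dst (Nx2 arrays, N >= 4)
    with the Direct Linear Transform: each correspondence contributes two
    linear equations in the 9 entries of H, and h is the null vector of
    the stacked system (the smallest right singular vector)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the arbitrary projective scale

# Toy check: a pure scaling by 2 recovered from 4 correspondences
src = np.array([[0.0, 0], [1, 0], [1, 1], [0, 1]])
dst = 2.0 * src
H = homography_dlt(src, dst)
print(np.round(H, 6))  # diag(2, 2, 1)
```

In a planar AR scene, the camera pose can then be decomposed from such a homography given the camera intrinsics; non-planar scenes call for PnP-style solvers instead.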
Article
This article presents a novel scale- and rotation-invariant detector and descriptor, coined SURF (Speeded-Up Robust Features). SURF approximates or even outperforms previously proposed schemes with respect to repeatability, distinctiveness, and robustness, yet can be computed and compared much faster. This is achieved by relying on integral images for image convolutions; by building on the strengths of the leading existing detectors and descriptors (specifically, using a Hessian matrix-based measure for the detector, and a distribution-based descriptor); and by simplifying these methods to the essential. This leads to a combination of novel detection, description, and matching steps. The paper encompasses a detailed description of the detector and descriptor and then explores the effects of the most important parameters. We conclude the article with SURF's application to two challenging, yet converse goals: camera calibration as a special case of image registration, and object recognition. Our experiments underline SURF's usefulness in a broad range of topics in computer vision.
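The integral image trick that gives SURF its speed is simple to demonstrate: after one pass to build a summed-area table, the sum of any axis-aligned box takes four lookups regardless of box size, which is what makes SURF's box-filter approximation of the Hessian cheap at every scale. A minimal numpy sketch (function names are illustrative):

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero border: ii[y, x] holds the sum of
    img over rows 0..y-1 and columns 0..x-1, so box sums below need
    no boundary checks."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def box_sum(ii, y0, x0, y1, x1):
    """Sum of img over rows y0..y1-1 and columns x0..x1-1 in four
    lookups -- O(1) regardless of box size."""
    return int(ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0])

img = np.arange(16).reshape(4, 4)   # values 0..15
ii = integral_image(img)
print(box_sum(ii, 0, 0, 4, 4))      # 120, the total sum
print(box_sum(ii, 1, 1, 3, 3))      # 5 + 6 + 9 + 10 = 30
```

SURF evaluates its Hessian responses with differences of such box sums, so enlarging the filter for coarser scales costs no more than the smallest filter does.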