This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING 1
A New Sample Consensus Based on Sparse Coding
for Improved Matching of SIFT Features on
Remote Sensing Images
Pouriya Etezadifar and Hassan Farsi
Abstract—In this article, a new method is proposed for feature matching of remote sensing images using sample consensus based on sparse coding (SCSC) to improve the image registration technique. To this end, scale-invariant feature transform (SIFT) features are used to select interesting points for image matching. The extracted points contain differences and similarities in two images captured from the same area (but differing in sensor resolution, azimuth, elevation, contrast, illumination, etc.); in such a case, the similar points should be retained and the dissimilar ones eliminated. In this article, we greatly improve the matching between two images using the SCSC by checking the points jointly. Moreover, the proposed method is shown to yield better results than standard alternatives such as random sample consensus (RANSAC) when the number of feature points is large or the points are noisy. However, it should be noted that for low noise and distortion rates, the proposed method and the RANSAC yield similar results. In general, the proposed method using sparse coding achieves a higher correct match rate than the SIFT algorithm. In order to illustrate this, the proposed method is compared with other up-to-date matching and registration methods based on the SIFT algorithm. The obtained results confirm this claim and show that the proposed algorithm is between 0.48% and 7.68% more accurate than the SVD-RANSAC, Hoge, Stone, Foroosh, Leprince, Nagashima, Guizar, Youkyung, Lowe, Preregistration, IS-SIFT, SPSA, Gong, Standard SIFT, UR-SIFT, Sourabh, and Han methods.
Index Terms—Image matching, image registration, random
sample consensus (RANSAC), scale-invariant feature transform
(SIFT), sparse coding.
I. INTRODUCTION
IMAGE matching using SIFT features to find corresponding points in two or more images of the same scene has been used in many applications, such as aligning images from different camera sources [1]–[3], video summarization for selecting informative frames of a video [4], and change detection by observing land-feature differences at different times in remote sensing images [5]. Moreover, these methods are used extensively in commercial fields for face recognition, the study of drought trends, and water resources assessment. One of the most important requirements of these methods is that the matched points be as correct as possible so that the error is minimal, because errors in the aligned points reduce the efficiency of the matching methods.

Manuscript received September 4, 2019; revised June 30, 2019 and September 11, 2019; accepted December 3, 2019. This work was supported by the University of Birjand. (Corresponding author: Hassan Farsi.)
P. Etezadifar is with the Department of Electrical Engineering, Imam Hussein University (IHU), Tehran 1698715461, Iran (e-mail: petezadifar@ihu.ac.ir).
H. Farsi is with the Department of Electrical and Computer Engineering, University of Birjand, Birjand 9717434765, Iran (e-mail: hfarsi@birjand.ac.ir).
Color versions of one or more of the figures in this article are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TGRS.2019.2959606
In most of the methods proposed for image matching, the key challenge is to find the correct matching points. Accordingly, new approaches have been proposed for selecting similar points, which have led to the improvement of image registration methods. Image matching techniques are divided into two distinct categories: feature-based and area-based. Feature-based image matching is further divided into two main groups: the Bag of Words (BoW) model and feature descriptors [6]. The BoW model categorizes features in the image using codebook generation. Feature-descriptor-based methods use feature extraction for image matching. In this article, we propose a feature-based method for image matching that combines SIFT features and sparse coding.
The rest of this article is organized as follows. In Section II, we give a brief review of feature-based and area-based techniques. In Section III, we propose our matching model. In Section III-A, the sparse coding problem is analyzed briefly. In Section III-B, the proposed image matching model is described and solved. In Section III-C, our proposed algorithm is presented as pseudocode that is straightforward to implement. In Section III-D, the features used in this article are described. In Section IV, the experimental results are presented. Finally, in Section V, we draw the conclusion.
II. RELATED WORKS
Lu et al. [7] proposed a method for categorization based on the BoW model, which can be used to match objects across images. Many methods, such as the SIFT [8] and speeded up robust features (SURF), have been proposed to extract image features and are popular in image matching. Joglekar et al. [9] proposed a new method that uses SIFT features and relaxation indexing for remote sensing image matching. In that work, a probabilistic
0196-2892 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
neural-network-based feature-matching algorithm is proposed
for stereo images. These methods were very sensitive to
image noise and distortion. Li et al. [10] proposed a method
based on the Harris method that combines adaptive threshold
and random sample consensus (RANSAC). First, the Harris
feature points are selected based on the adaptive threshold,
and second, the normalized cross correlation matching and
the RANSAC are applied to precisely match the detected
Harris corners. Tran et al. [11] proposed a system that integrates compact image retrieval to estimate the location in extensive city regions. They also proposed a new hashing-based cascade search for fast computation of 2-D–3-D correspondences, as well as a new one-to-many RANSAC for accurate pose estimation. Jiayi et al. [12]
proposed an image matching method that was performed using
vector field consensus and maximum a posteriori (MAP)
estimator. In this method, first, all correct and false points
of the feature were considered. Then, false points were
removed based on the MAP estimator and correct feature
points remained. Our proposed method is relatively close to
Jiayi’s proposed method. However, in our method, sparse
coding was used, which led to an improvement in the method
as compared to Jiayi’s method. Youkyung et al. [13] proposed
a method to enhance the accuracy of automatic high-resolution
image registration [13], which used SIFT features. In their
method, by optimizing the target function, they selected correct
feature points. Both this method and our proposed method
are based on the optimization. However, in our proposed
method, we used affine transform to select correct points,
which results in better performance and efficiency. Tong et al. [14] proposed a phase-correlation method using SVD and a unified RANSAC, in which the theoretically unified RANSAC algorithm acts as a robust estimator for line fitting. In [15], He and his colleagues used an improved SIFT technique for matching sequences of images taken from a line-scanning ophthalmoscope (LSO); they also proposed a novel SIFT descriptor that reduces calculation time compared with the original SIFT method. Yu et al. [16] proposed a hierarchical
image matching method using the CNN feature pyramid. Yu’s
method's advantage is the complementarity of different layers, using guidance from higher layers to lower layers. Gonçalves et al. [17] improved the performance of remote sensing image matching; Sedaghat's method was based on a new strategy for choosing features of the SIFT algorithm. We implement our proposed method using the features extracted from Sedaghat's method, which leads to more matching points, an improved matching rate, and reduced matching errors. In [18], a subpixel phase correlation method was proposed by Tong using the singular value decomposition (SVD) and RANSAC methods. He used the SVD method and converted the estimation problem to 1-D space, which made it simpler and more efficient. Li
et al. [19] proposed a new method based on spatial–spectral
SIFT for HSI matching and classification using a spatial–
spectral model of spectral value and gradient change to analyze
information. In [20], a modified SIFT version was proposed; in the next step, a bivariate histogram and the RANSAC algorithm were used to correct the matching points, and finally, a method was proposed to maximize the number of matching points. Our method improves the correction of matching points, which leads to better results compared to Paul's method. Paul and Pati [21] proposed a new method that improves matching by using SIFT features. To this end, Kupfer selected close feature points using the nearest neighbor and a Hough-like voting scheme. However, the runtime of our proposed method is faster. In [22], a taxonomy of image matching
methods based on the dense disparity map was studied. Our
proposed method is feature-based; therefore, we do not study area-based algorithms in this article.
III. PROPOSED MATCHING MODEL
In this section, we attempt to solve the matching problem
by proposing a method based on sparse coding. The basis of
this article is classical equations and optimization. Therefore,
the general method is first obtained using the sparse coding
for a Laplacian distribution based on the MAP estimation
model. In the next step, the generalized equation obtained
in Section III-A is reformulated for remote sensing image
matching. After deriving the solution, an algorithm is proposed.
Finally, the proposed method is presented for easy implemen-
tation in a pseudocode format.
A. General Problem Formulation Based on Sparse Coding
In Section III-B, the aim is to select the inlier data and remove the outliers by dictionary training and sparse dictionary selection.
Thus, in the first step, we discuss the problem. In this case, we assume that the observation vector $x$ is related to $y$ by

$$y = Ax + n + o, \quad y, n, o \in \mathbb{R}^{m}, \quad A \in \mathbb{R}^{m \times n}, \quad x \in \mathbb{R}^{n}. \tag{1}$$

In (1), $y$ represents the reference image observations, $n$ denotes noise, and $o$ indicates the outlier data. Moreover, $A$ is known as the dictionary matrix, and $x$ is an observation vector of the image to be matched. The prior distribution of each element of $x$ and $o$ is a Laplacian distribution with zero mean, and the elements are independent and identically distributed. The reason for using the Laplacian distribution is that it is sharp around zero compared to other distributions such as the Gaussian, which makes it more appropriate for a sparse solution. Moreover, the noise has a Gaussian distribution with zero mean and variance $\sigma^2 I$. On that basis, we calculate the best estimates of the $x$ and $o$ vectors. In this problem, since the prior distributions of the random variables $x$ and $o$ are available, the best estimator is the MAP [23]. Therefore, the estimated $x$ and $o$ vectors can be obtained by

$$\{\hat{o}_{\mathrm{map}}, \hat{x}_{\mathrm{map}}\} = \arg\max_{o,x}\,\{\ln(P(y|x,o)) + \ln(P(x)) + \ln(P(o))\}. \tag{2}$$

According to the assumptions of the problem, we know that the conditional distribution of $y|x,o$ is equal to the noise distribution with mean $Ax + o$ and variance $\sigma^2 I$, i.e., the normal distribution $\mathcal{N}(Ax + o, \sigma^2 I)$. Equation (3) is obtained from (2) based on the conditional probability of $y|x,o$ and
eliminating fixed values, according to the dictionary modeling used in Group Lasso [24], as follows:

$$\{\hat{O}, \hat{X}\} = \arg\min_{O,X}\,\{\|Y - AX - O\|_F^2 + \lambda_1\|X\|_1 + \lambda_2\|O\|_1\}. \tag{3}$$
We know that the values of $X$ are available as observations. Therefore, the term $\|X\|_1$ has a fixed value and is ineffective in the optimization process. Moreover, obtaining the values of the dictionary $A \in \mathbb{R}^{d \times k}$ is called learning the dictionary. Here, $Y \in \mathbb{R}^{d \times n}$ is the reference image feature point coordinates, and $X \in \mathbb{R}^{k \times n}$ indicates the feature point coordinates of the image to be matched. Furthermore, $O \in \mathbb{R}^{d \times n}$ represents the outliers and is named the pursuit coefficient matrix. According to the above explanation, (3) can be reordered as

$$\{\hat{A}, \hat{O}\} = \arg\min_{A,O}\,\{\|Y - AX - O\|_F^2 + \lambda\|O\|_1\}. \tag{4}$$

In (4), $\|X\|_F$ denotes the Frobenius norm, defined as $\|X\|_F = (\sum_{i,j} X_{ij}^2)^{1/2}$. Moreover, $\|O\|_1 = \sum_{i,j}|O_{ij}|$ is known as
the $l_1$ norm. If $\lambda$ has a small value, then the solution of (4) tends in the direction that all the reference and matching points are mapped to each other with the least error, and solving the equation proceeds to select all the feature points as matching points. When $\lambda$ increases, the number of zero elements in the matrix $O$ tends to increase. This means that solving the equation proceeds toward a minimum number of matching points, which leads to an increase in the undesirable matching error rate. The second term of (4) measures the sparsity of the matrix using the $l_1$ norm and attempts to make many columns of the matrix $O$ equal to zero, selecting the matching points from the nonzero columns. The problem with using the $l_1$ norm is that points with a slight error are eliminated. To solve this problem, the $l_{2,1}$ norm can be used instead of the $l_1$ norm [25]; it is defined as $\|O\|_{2,1} = \sum_i \|o_i\|_2$, where $\|o_i\|_2$ denotes the $l_2$ norm of the $i$th column of the matrix $O$. Another problem in (4) is that the appropriate value of $\lambda$ must be selected in the range $[0, \infty)$. However, by reordering (4), the value of $\lambda$ can be selected in the range $[0, 1]$ as follows:

$$\{\hat{A}, \hat{O}\} = \arg\min_{A,O}\left\{\frac{\lambda}{2}\|Y - AX - O\|_F^2 + \frac{1-\lambda}{2}\|O\|_{2,1}\right\}. \tag{5}$$
In (5), for a value of $\lambda = 1$, the reconstruction error is reduced to its lowest value, but the sparsity condition on the matrix $O$ is not enforced. Conversely, for a small value $\lambda = \varepsilon$, the sparsity of the matrix $O$ dominates, but the condition for minimizing the reconstruction error is not enforced; in this case, the matrix $O$ becomes the zero matrix.
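To make the role of $\lambda$ in (5) concrete, the following is a minimal numerical sketch of the weighted objective. This is our own illustration rather than part of the published method, and `l21_norm` and `objective` are hypothetical helper names:

```python
import numpy as np

def l21_norm(O):
    # Sum of the l2 norms of the columns of O: the ||O||_{2,1} term in (5).
    return np.linalg.norm(O, axis=0).sum()

def objective(Y, A, X, O, lam):
    # Weighted cost from (5): lam/2 * reconstruction error + (1-lam)/2 * sparsity term.
    recon = np.linalg.norm(Y - A @ X - O, "fro") ** 2
    return 0.5 * lam * recon + 0.5 * (1.0 - lam) * l21_norm(O)

# Toy data: one column of O carries an outlier correction.
rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
X = rng.standard_normal((3, 4))
O = np.zeros((2, 4))
O[:, 1] = 5.0                        # column 1 marks an outlier point
Y = A @ X + O                        # exact model plus one outlier column

# lam near 1 weights the reconstruction error; lam near 0 pushes O toward zero.
print(objective(Y, A, X, O, 0.99))
print(objective(Y, A, X, np.zeros_like(O), 0.99))
```

With the true `O`, the reconstruction term vanishes and only the small sparsity penalty remains; with `O = 0`, the full outlier energy appears as reconstruction error, so the cost is much larger.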
B. Solution of the Formulated Problem for Remote Sensing
Image Matching
In this section, we first review the proposed matching scheme. Then, we rewrite the general formulation (5) for our problem and solve it. Finally, the proposed method is presented as an optimization problem in the form of a pseudocode algorithm.
1) Definition of the Proposed Matching Scheme: In the
image matching process, after extracting features and select-
ing the corresponding points in the two test/reference
images, outlier points should be deleted. The RANSAC algo-
rithm [26] or its modification is generally used for this
goal [27]. The RANSAC is a general method for finding
the geometric model parameters by using a set of sample
data. It is assumed that $x = (x_1, x_2)$ is a feature point coordinate of the test image and $y = (y_1, y_2)$ is the feature point coordinate of the reference image found as the corresponding point of $x$ in the matching process. Given that the images have variations in translation [28], the affine transform is an appropriate model for matching. In the following, we assume that for a pair of $(x, y)$ sample points that satisfies the geometric model, the following can be written:

$$y = A x_a, \quad A = \begin{pmatrix} a & b & c \\ d & e & f \end{pmatrix}, \quad x_a = (x_1, x_2, 1)^T. \tag{6}$$
As the number of unknown parameters (the number of matrix entries) is equal to 6, three feature points are sufficient to determine these parameters. Using these three points, the matrix $A$ is obtained by solving the following problem:

$$A = \arg\min_{A}\,\sum_{i=1}^{3}\|y_i - A x_{ai}\|_2^2 = Y X_a^T\left(X_a X_a^T\right)^{-1}. \tag{7}$$
In (7), the columns of the matrices $X_a$ and $Y$ denote the $x_{ai}$ and $y_i$ feature point coordinates, respectively. Now, the question is how this geometric model (the transform parameters) can be found. According to the obtained matrix $A$, inlier points are selected and outlier points are removed. The RANSAC algorithm has a random nature and, in the limit, it should select and analyze $\binom{N}{3}$ triples of feature point coordinates (from here on, in this article, the number of feature points is assumed to be $N$). In other words, the complexity of this algorithm is of order $\Theta(N^3)$, which is not acceptable in many applications. In this section, we propose an algorithm that has much less computational complexity than the RANSAC algorithm. This claim is supported by a comparison of the runtime between our proposed method and the other image matching methods
based on RANSAC and modified methods. By putting (6) into (5) and defining the vector $o = \{o_i^T\}_{i=1}^{N}$, $o_i = (o_{ix}, o_{iy})$, the problem proposed in this article is obtained. Before proposing our problem, it should be noted that, first, for the feature points mapped without errors in the geometric model, the vector $O$ should be zero; this means that these feature points are inliers. However, for the other feature points, the vector has a nonzero value, meaning that these feature points have a large deviation from the calculated model. Second, the number of outlier feature points is much lower than the number of feature points that satisfy the geometric model (the inliers). Therefore, the vector $O$ has values close to zero. Moreover, $X_a = \{x_{ai}^T\}_{i=1}^{N}$, $x_{ai} = (x_{1i}, x_{2i}, 1)$, contains the measured values of the test image point coordinates and has no effect on the optimization. In the following, the optimization problem should be carried out iteratively, alternating between selecting the sparse
dictionary and updating the dictionary. Therefore, the problem is rewritten as follows:

$$\{\hat{A}^{(k+1)}, \hat{O}^{(k+1)}\} = \arg\min_{A,O}\left\{\frac{\lambda}{2}\|Y - A^{(k)}X_a - O^{(k)}\|_F^2 + \frac{1-\lambda}{2}\|O^{(k)}\|_{2,1}\right\}. \tag{8}$$

Because the proposed method is iterative, in (8), the variable $k$ represents the iteration number, which is updated after training to its $(k+1)$th value.
2) Solution of the Proposed Matching Problem: We solve the proposed problem in Sections III-B2a and III-B2b. In Section III-B2a, we solve the sparse dictionary selection problem, and in Section III-B2b, the dictionary learning is discussed.
a) Sparse dictionary selection: In this section, the sparse dictionary selection is performed using the matching model. For this purpose, the matrix $A$ is initialized with random values and is treated as fixed data at this stage. Therefore, it does not affect the optimization process, and (8) is rewritten as

$$\{\hat{o}_l^{(k+1)}\} = \arg\min_{O}\left\{\frac{\lambda}{2}\|Y - A^{(k)}X_a - o_l^{(k)}\|_F^2 + \frac{1-\lambda}{2}\|o_l^{(k)}\|_{2,1}\right\}. \tag{9}$$
As shown in (9), to calculate the $l$th row of the matrix $O$ in the $(k+1)$th iteration, the rows of the dictionary $A$ in the $k$th iteration are used. To solve (9), the gradient is calculated with respect to $O$ and set to zero. Therefore, the value of $\hat{o}_l$ is calculated from (9) by

$$-\frac{\lambda}{2}\left(Y - A^{(k)}X_a - o_l^{(k)}\right) + \frac{1-\lambda}{2}\,\frac{o_l^{(k)}}{\|o_l^{(k)}\|_2} = 0$$
$$\Rightarrow\; -\frac{\lambda}{2}\left(Y - A^{(k)}X_a\right) + o_l^{(k)}\left(\frac{\lambda}{2} + \frac{1-\lambda}{2\|o_l^{(k)}\|_2}\right) = 0$$
$$\Rightarrow\; Y - A^{(k)}X_a = o_l^{(k)}\left(1 + \frac{1-\lambda}{\lambda\|o_l^{(k)}\|_2}\right)$$

and, defining $\beta = \left(1 + \frac{1-\lambda}{\lambda\|o_l^{(k)}\|_2}\right)^{-1}$,

$$o_l^{(k)} = \beta\left(Y - A^{(k)}X_a\right). \tag{10}$$

By substituting $o_l^{(k)} = \beta(Y - A^{(k)}X_a)$ into $o_l^{(k)}\left[1 + \frac{1-\lambda}{\lambda\|o_l^{(k)}\|_2}\right] = Y - A^{(k)}X_a$, (10) can be rewritten as

$$\beta\left(1 + \frac{1-\lambda}{\beta\lambda\|Y - A^{(k)}X_a\|_2}\right) = 1 \;\Rightarrow\; \beta + \frac{1-\lambda}{\lambda\|Y - A^{(k)}X_a\|_2} = 1 \;\Rightarrow\; \beta = 1 - \frac{1-\lambda}{\lambda\|Y - A^{(k)}X_a\|_2}. \tag{11}$$

Using (10) and (11), $o_l^{(k)}$ can be defined as

$$o_l^{(k)} = \begin{cases} \left(1 - \dfrac{1-\lambda}{\lambda\|Y - A^{(k)}X_a\|_2}\right)\left(Y - A^{(k)}X_a\right), & \lambda > \gamma \\ 0, & \text{otherwise} \end{cases} \tag{12}$$

where $\gamma$ is defined as $1/(1 + \|Y - A^{(k)}X_a\|_2)$. In (12), the boundary condition is obtained by satisfying $\beta > 0 \Leftrightarrow 1 - \frac{1-\lambda}{\lambda\|Y - A^{(k)}X_a\|_2} > 0$. The value of each row of the matrix $O$ is computed using (12), and the value of $o_l^{(k)}$ minimizing (9) is selected as the sparse solution in the $k$th iteration.
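The thresholded selection rule (12) can be sketched per feature point as follows. This is our own per-column reading of (12), where each column of $O$ corresponds to one correspondence; the function and variable names are hypothetical:

```python
import numpy as np

def select_sparse_outliers(Y, A, Xa, lam):
    """Sparse dictionary selection step, per feature point, following (12):
    points whose residual is small keep a zero column in O (inliers), while
    large residuals get a shrunk copy of the residual (outliers)."""
    R = Y - A @ Xa                       # residuals, one column per point
    O = np.zeros_like(R)
    for i in range(R.shape[1]):
        r = R[:, i]
        norm_r = np.linalg.norm(r)
        gamma = 1.0 / (1.0 + norm_r)     # threshold gamma from (12)
        # Zero residual gives gamma = 1, so the branch is skipped (inlier).
        if lam > gamma:
            O[:, i] = (1.0 - (1.0 - lam) / (lam * norm_r)) * r
    return O

# Hypothetical example: one gross outlier among four correspondences.
A = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]])
Xa = np.array([[0.0, 1.0, 2.0, 3.0],
               [0.0, 1.0, 2.0, 3.0],
               [1.0, 1.0, 1.0, 1.0]])
Y = A @ Xa
Y[:, 2] += 20.0                          # corrupt the third correspondence
O = select_sparse_outliers(Y, A, Xa, lam=0.32)
print(np.nonzero(np.linalg.norm(O, axis=0))[0])   # indices flagged as outliers
```

Only the corrupted column exceeds the threshold, so only its column of `O` becomes nonzero; the shrinkage factor keeps points with a slight error from being eliminated, which is the motivation for the $l_{2,1}$ norm above.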
b) Dictionary learning: After selecting the sparse dictionary, the outlier data are roughly determined. Then, using dictionary learning, we obtain an accurate model to reduce the matching error rate. The initial outliers are eliminated in the sparse dictionary selection step since they are far from the other feature points. Therefore, applying dictionary learning to the data with fewer errors leads to an improved matching model.
In accordance with that, we focus on the dictionary learning process to improve the matching model. At this stage, (13) can be written via the replacement $o^{(k)} \rightarrow o^{(k+1)}$ for dictionary learning as

$$\hat{A}^{(k+1)} = \arg\min_{A}\left\{\frac{\lambda}{2}\|Y - A^{(k)}X_a - O^{(k+1)}\|_F^2 + \frac{1-\lambda}{2}\|O^{(k+1)}\|_{2,1}\right\}$$

and, with $Z = Y - O^{(k+1)}$ and $\frac{1-\lambda}{2}\|O^{(k+1)}\|_{2,1} = \text{const}$,

$$\hat{A}^{(k+1)} = \arg\min_{A}\left\{\frac{\lambda}{2}\|Z - A^{(k)}X_a\|_F^2 + \text{const}\right\} \tag{13}$$
where $o^{(k+1)}$ is the newly obtained value and $o^{(k)}$ is the previous value of the variable $o$.

Algorithm 1 Image Matching Using Sample Consensus Based on Sparse Coding
Input: the feature position sets of the test and reference remote sensing images $X_a$, $Y$; the matrices $O$ and $A$
Initialization: $O = 0$, $A = \mathrm{Rand}(2,3)$, $\lambda = 0.32$
Output: the selected features, which are used for the matching process
repeat
  for $l = 1, 2, \ldots, N$ do
    if $\lambda > \dfrac{1}{1 + \|Y - A^{(k)}X_a\|_2}$ then
      $o_l^{(k)} = \left(1 - \dfrac{1-\lambda}{\lambda\|Y - A^{(k)}X_a\|_2}\right)\left(Y - A^{(k)}X_a\right)$
    else
      $o_l^{(k)} = 0$
    end if
  end for
  $O \leftarrow O^{(k+1)}$
  $A^{*} = (Y - O)\,X_a^T\left(X_a X_a^T\right)^{-1}$
  $\mathrm{Support}(O) = \sum_{h=1}^{N}\left(\lim_{p \to 0}\sum_{k=1}^{m}|o_{k,h}|^p\right) = \#\{i : o_i \neq 0\}$
  $\mu(A) = \max_{1 \le i,j \le m,\, i \neq j}\dfrac{|a_i^T a_j|}{\|a_i\|_2 \cdot \|a_j\|_2}$
  $\mathrm{Er}\_A = \|A - A^{*}\|_2^2$, $\quad A \leftarrow A^{*}$
until $\mathrm{Support}(O) < \frac{1}{2}\left(1 + 1/\mu(A)\right)$ OR $\mathrm{Er}\_A \le 0.001$

TABLE I
ZY-3 IMAGING SENSOR FEATURES

At this stage, considering that the optimization is performed with respect to the dictionary (matrix $A$), the matrix $O$ is treated as constant and has no effect on the optimization. In order to update the
dictionary, the gradient of (13) is calculated with respect to $A$ and set to zero. Therefore, it can be derived as

$$\frac{\partial}{\partial A}\left(\frac{\lambda}{2}\|Z - A^{(k)}X_a\|_F^2 + \text{const}\right) = 0
\;\Rightarrow\; -\left(Z - A^{(k)}X_a\right)X_a^T = 0
\;\Rightarrow\; Z X_a^T = A^{(k)} X_a X_a^T$$
$$\Rightarrow\; A = Z X_a^T\left(X_a X_a^T\right)^{-1}
\;\xrightarrow{\;Z \,=\, Y - O^{(k+1)}\;}\;
A = \left(Y - O^{(k+1)}\right)X_a^T\left(X_a X_a^T\right)^{-1}. \tag{14}$$
c) Pseudocode algorithm for proposed image matching: The pseudocode for image matching using sample consensus based on sparse coding is given in Algorithm 1.
As shown in Algorithm 1 (pseudocode), the initial input values for implementing the proposed method are the feature point coordinates of the reference image, indicated as $Y$, and $X_a$, which denotes the test image coordinates, extracted by the SIFT feature algorithm. The matrix $O$ represents the outlier data and is initialized as the zero matrix in the first step. The affine transformation matrix $A$ is initialized randomly to prevent reaching a local optimum, and $\lambda$ corresponds to a tradeoff between the sparsity of the matrix $O$ and the reconstruction error $(Y - AX - O)$. After examining different values of $\lambda$ on several images, the optimum value of $\lambda$ for image matching using the proposed algorithm is 0.27.
As shown in Algorithm 1 (pseudocode), and as detailed and proved in Section III-A, the proposed method consists of two parts: sparse dictionary selection and dictionary learning. In the dictionary selection step, the element values of the matrix $O$ are selected using the dictionary. In the next step, the dictionary is trained using the matrix $O$. This iterative operation runs until one of two convergence conditions is satisfied. In the following, we describe the convergence conditions. In terms of convergence, the parameter $\mathrm{Support}(\cdot)$ is defined as the number of nonzero elements of a matrix, known as $\|\cdot\|_0$. In addition, the parameter $\mu(\cdot)$ is introduced as the mutual coherence, which indicates the correlation of the columns of the matrix $A$ [29].
In terms of convergence, we consider two criteria: first, whether the matrix $O$ is sufficiently sparse; second, if the matrix $O$ does not achieve sparsity, whether the matrix $A$ (denoting the affine transform) has converged. The reason for using two conditions is that the sparsity condition is very strict and may require many iterations; however, when the convergence condition on the matrix $A$ is met, an acceptable solution can still be obtained.
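The two stopping criteria can be sketched as follows. This is our own reading of Algorithm 1; the function names and the zero tolerance are assumptions:

```python
import numpy as np

def support(O, tol=1e-12):
    # Number of nonzero columns of O: the || . ||_0 count used for convergence.
    return int((np.linalg.norm(O, axis=0) > tol).sum())

def mutual_coherence(A):
    # Largest normalized inner product between distinct columns of A.
    An = A / np.linalg.norm(A, axis=0)
    G = np.abs(An.T @ An)
    np.fill_diagonal(G, 0.0)
    return G.max()

def converged(O, A_new, A_old, tol=1e-3):
    # Stop when O is sparse enough, or when the affine model stops changing.
    sparse_enough = support(O) < 0.5 * (1.0 + 1.0 / mutual_coherence(A_new))
    model_stable = np.linalg.norm(A_new - A_old) ** 2 <= tol
    return sparse_enough or model_stable

A_example = np.array([[1.0, 0.0, 1.0],
                      [0.0, 1.0, 1.0]])
print(mutual_coherence(A_example))   # max normalized column correlation
```

For a highly coherent dictionary the sparsity bound $\frac{1}{2}(1 + 1/\mu(A))$ approaches 1, which is why the text calls the sparsity condition strict and falls back on the stability of $A$.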
C. SIFT
The SIFT features are suitable for image matching because they are invariant to scale, rotation, and changes in brightness [8].
D. Improved SIFT Features
In this article, the improved SIFT features are also used to
compare with other methods including IS-SIFT [17], UR-SIFT
[18], MS-SIFT [21], and A2-SIFT [30].
IV. PROPOSED METHOD EVALUATION AND
EXPERIMENTAL RESULTS
In this section, we first implement our proposed method on a group of remote sensing images, which is described in detail, and compare the matching rate with those of other methods. In the next step, only the results of comparing our proposed method with other methods on further groups of remote sensing images are reported. All these experiments show the improvements of our proposed method compared to other remote sensing image matching methods.
A. Experimental Results for ZY-3 Remote Sensing Images
In the first experiment, our proposed method is compared, on images obtained from the ZiYuan-3 (ZY-3) remote sensing satellite, with the method proposed by Tong [14], introduced as SVD-RANSAC. The ZY-3 imaging sensor features are shown in Table I. On that basis, the experimental conditions and their implementation are reviewed.
Before comparing our proposed method with the SVD-
RANSAC, it is necessary to simulate the images as described
in the reference. After that, the results of the proposed method
are compared with the method described in this reference,
along with the Hoge [31], Stone [32], Foroosh [33], Leprince
[34], Nagashima [35], and Guizar [36] methods. In order to
correctly compare our proposed method with the reference
methods, the process similar to the implementation process in
[14] is performed on the images. The ZY-3 remote sensing
satellite images whose specifications are shown in Table II
are used to compare the results of our proposed method with
those of the reference methods.
As shown in Table II, the images were obtained from different areas of the Earth with different regional features. In Fig. 1, the images used in this experiment are shown. As described in [14], the reference images are converted into new images by adding noise, constructing 450 images from these six reference images. Aliasing occurs when the sensor sampling rate does not satisfy the Nyquist criterion [36]. The mentioned cases are considered to be negative factors in signal processing [37].
TABLE II
ZY-3 REMOTE SENSING SATELLITE IMAGES SPECIFICATIONS
Fig. 1. ZY-3 satellite imagery presented in Table II.
TABLE III
RUNTIME AND MEAN ABSOLUTE PIXEL ERROR COMPARISON FOR THE DATABASE SIMULATED WITH ALIASING ERROR
1) Making Images With Aliasing Error: To add aliasing errors to the reference images, subpixel shifts are used based on low-pass filtering and downsampling. For the aliasing error, the results are compared to the Hoge, Stone, Foroosh, Leprince, Nagashima, and Guizar methods. For all 150 simulated images, the calculated values are reported in Fig. 2.
As shown in Fig. 2, our SCSC method performs well in the presence of the aliasing error and achieves a clear improvement in the results. Fig. 2 also demonstrates that the Foroosh method has the highest sensitivity to the aliasing error, followed by the Nagashima and Guizar methods, compared to the other methods. In contrast, the SCSC has the highest resistance to the aliasing error, followed by the SVD-RANSAC, Stone, and Hoge methods. However, the results for the SCSC and SVD-RANSAC are relatively close to each other.
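The aliasing-error simulation described above (shift, low-pass filter, then downsample) can be sketched as follows. This is our hypothetical reading of the protocol in [14], not the authors' exact pipeline, and all names are ours:

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    # 1-D Gaussian kernel, normalized to sum to one.
    if radius is None:
        radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def simulate_aliased(image, sigma=1.0, factor=2, shift=1):
    """Hypothetical sketch of the test-image generation: shift by a few
    pixels, low-pass filter with a Gaussian, then downsample. Too weak a
    filter for the chosen factor leaves aliasing in the result."""
    shifted = np.roll(image, shift, axis=(0, 1))
    k = gaussian_kernel(sigma)
    # Separable blur: filter rows, then columns.
    blurred = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, shifted)
    blurred = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, blurred)
    return blurred[::factor, ::factor]

img = np.tile(np.eye(8), (4, 4))          # toy high-frequency test pattern
small = simulate_aliased(img, sigma=1.0, factor=2)
print(small.shape)
```

Varying `sigma` relative to the downsampling `factor` controls how much aliasing survives, mirroring the sensitivity sweep reported for the simulated database.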
Fig. 2. Comparison of the calculated error values including (a) mean value,
(b) rms, (c) max value, and (d) standard deviation.
Fig. 3. Comparison of the calculated error values including (a) mean value,
(b) rms, (c) max value, and (d) standard deviation. The output of the Hoge
method was cut off for better values display.
Fig. 4. Comparison of the methods in terms of (a) SVD-RANSAC, (b)
Guizar, and (c) SCSC for three images with aliasing error and Gaussian filter
with σ=1.
2) Making Images With Additive Noise: In this section,
a new image database is created by additive white Gaussian
noise (AWGN) with a zero mean and different variances.
For this purpose, the second three images (images 4, 5,
and 6 of the ZY-3 images) are used. The results of this
TABLE IV
COMPARISON OF THE CORRECT MATCHING RATE BETWEEN THE SCSC METHOD AND THE LOWE AND YOUKYUNG METHODS, EXTRACTED FROM [17]
TABLE V
COMPARISON OF THE BIAS AND STANDARD DEVIATION VALUES BETWEEN THE SCSC AND THE LOWE AND YOUKYUNG METHODS
experiment are reported in Fig. 3. As can be seen in Fig. 3, the SCSC is robust against noisy images and provides an appropriate improvement in the results compared to all the other methods. Moreover, Fig. 3 shows that the Foroosh, Stone, Hoge, and Nagashima methods are more sensitive to noise than the other methods; Hoge, unlike its good behavior against the aliasing error, has the highest sensitivity to noise. However, the results for the SCSC, SVD-RANSAC, and Leprince methods are relatively close to each other; the resistance of the SVD-RANSAC and Leprince methods fluctuates in a ripple pattern as the noise level varies.
All the experiments are run in MATLAB on a computer with an Intel Core i3-2120 CPU at 3.3 GHz. The runtime and mean absolute pixel error are calculated using the database generated with the aliasing error for the SCSC, SVD-RANSAC, Hoge, Stone, Foroosh, Leprince, Nagashima, and Guizar methods, in which the variance value σ varied between 1 and 5 with a unit step. It should be noted that the runtime is accumulated over the 150 images with five different σ values, which, in total, amounts to 750 repetitions of the algorithm. The comparison of the runtimes is shown in Table III.
As shown in Table III, the SCSC runtime is relatively low: roughly half that of the SVD-RANSAC method, which is the closest to the proposed method in terms of the simulation results. The Foroosh, Hoge, and Nagashima methods have lower runtimes than the SCSC, but, as shown in Table III and Figs. 2 and 3, they are far more sensitive than the SCSC method on the databases simulated with added noise and aliasing distortion. The comparison between the proposed method and the SVD-RANSAC and Guizar methods is shown in Fig. 4, where the numbers of correct and false matches are counted through visual inspection. This experiment uses three images with aliasing error and σ = 1.
TABLE VI
COMPARISON OF MATCHING ACCURACY VALUES BETWEEN THE SCSC WITH LOWE AND YOUKYUNG METHODS
TABLE VII
COMPARISON OF THE IMAGE MATCHING BETWEEN THE SCSC METHOD WITH THE PREREGISTRATION, IS-SIFT, SPSA, AND GONG METHODS USING [18]
B. Experimental Results and Evaluation of the Proposed
Method for QuickBird-2, IKONOS-2, and
KOMPSAT-2 Remote Sensing Images
In this section, we compare our proposed method with that of Youkyung et al. [13], who selected the correct points by an optimization that uses both the distribution of the matching points and the reliability of the transformation model, based on SIFT features. For testing, images of the Daejeon region of South Korea are obtained from the QuickBird-2, IKONOS-2, and KOMPSAT-2 satellites. Further details are reported in [13, Table I]. In the first evaluation, the SCSC method is compared to the Youkyung and Lowe methods, as reported in Table IV.
TABLE VIII
COMPARISON OF THE IMAGE MATCHING BETWEEN THE SCSC METHOD WITH THE STANDARD SIFT, IS-SIFT, UR-SIFT, AND SOURABH METHODS FROM TABLE II [21]
TABLE IX
COMPARISON OF THE MATCHING RATE BETWEEN THE SIFT, HAN, AND SCSC METHODS USING THE OBTAINED INFORMATION FROM TABLE II [34]
As shown in Table IV, the proposed method (SCSC) provides more correct matching points than the Lowe and Youkyung methods, and Lowe's method is by far the weakest. Moreover, the SCSC method extracts fewer matching points than the other two methods, which leads to a significantly higher matching rate. In another evaluation, we compare the bias and standard deviation of the extracted feature points along the x- and y-axes; the results are reported in Table V. Furthermore, the RMSE, computed from the information in Table V, is reported in Table VI. The values for the Lowe and Youkyung methods are taken from [13, Table 4].
As shown in Tables V and VI, the SCSC method improves the matching rate. The matching accuracy between the IKONOS-2 and KOMPSAT-2 images shows the least improvement, which is due to the large difference in imaging angle between these two images (28.02°).
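For reference, the bias, standard deviation, and RMSE statistics reported in Tables V and VI can be computed from the matched-point residuals as in the following minimal Python sketch (the sample residuals are made up for illustration; per axis, RMSE² = bias² + std² when the population standard deviation is used):

```python
import numpy as np

def residual_statistics(dx, dy):
    """Bias, standard deviation, and RMSE of matched-point residuals.

    dx, dy: residuals of the matched points along the x- and y-axes.
    The RMSE is computed from the raw residuals; for each axis it also
    satisfies rmse_axis**2 == bias**2 + std**2 (population std).
    """
    dx, dy = np.asarray(dx, float), np.asarray(dy, float)
    return {
        "bias_x": dx.mean(), "bias_y": dy.mean(),
        "std_x": dx.std(),   "std_y": dy.std(),
        "rmse": np.sqrt(np.mean(dx**2 + dy**2)),
    }

# Toy residuals (illustrative only, not values from the paper's tables).
s = residual_statistics([0.1, -0.3, 0.2], [0.0, 0.4, -0.1])
```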
C. Experimental Results for Landsat TM and SPOT
In this experiment, the results of the SCSC method are compared to the method proposed by Gong et al. [20], which uses the Marquardt–Levenberg search strategy [39]. The results are reported in Table VII.
As can be seen from Table VII, the proposed method (SCSC) has the lowest error rate among the compared methods. The SCSC runtime is close to that of the preregistration method, but the SCSC achieves a much better matching rate.
D. Experimental Results for Landsat TM, ETM+, and EO-1
In this experiment, we compare the SCSC method to the method proposed by Sourabh et al. [21], an improved SIFT variant that builds matching features with a uniform distribution. Sourabh used three pairs of images; the specifications of the images and sensors can be found in [21]. To evaluate the proposed method, we use several evaluation criteria similar to [21]: the rms error over all extracted matching points, RMSall; the rms error of the matched-point residuals based on the leave-one-out method [41], RMSLOO; the statistical measure of the residual distribution over quadrants, pquad; the bad-point proportion with norm 1, BPP(1); the statistical measure of the presence of residuals along a preferred axis, Skew; and the statistical measure of the feature-point distribution, Scat [42]. The experimental results are reported in Table VIII, where dratio denotes the ratio of the distances to the first and second nearest neighbors; its values equal the dratio values in [21], which are compared in Table VIII.
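Lowe's ratio criterion underlying dratio can be sketched as follows (a Python illustration with toy descriptors; the 0.8 threshold is an assumed example value, not the setting used in [21]):

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, d_ratio=0.8):
    """Nearest-neighbor matching with Lowe's ratio test: a descriptor in
    image A is matched to its nearest neighbor in image B only when the
    distance to the first neighbor is at most d_ratio times the distance
    to the second neighbor. Requires at least two descriptors in desc_b.
    """
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        first, second = np.argsort(dists)[:2]
        if dists[first] <= d_ratio * dists[second]:
            matches.append((i, int(first)))
    return matches

# Toy 2-D "descriptors" (real SIFT descriptors are 128-dimensional).
desc_a = np.array([[0.0, 0.0], [5.0, 5.0]])
desc_b = np.array([[0.0, 0.1], [3.0, 3.0], [5.0, 5.0]])
matches = ratio_test_matches(desc_a, desc_b)
```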
As shown in Table VIII, our proposed method improves the values of RMSall, RMSLOO, pquad, and Scat compared to the other methods.
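Under the usual definitions, RMSall and RMSLOO can be computed from a transformation model fitted to the matches, as in this hedged Python sketch (the 2-D affine model and the least-squares fit are illustrative choices; [21] and [41] define the exact procedure):

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2-D affine transform mapping src -> dst."""
    A = np.hstack([src, np.ones((len(src), 1))])      # n x 3 design matrix
    params, *_ = np.linalg.lstsq(A, dst, rcond=None)  # 3 x 2 parameters
    return params

def rms_all(src, dst):
    """RMS residual of a model fitted to all matching points."""
    params = fit_affine(src, dst)
    pred = np.hstack([src, np.ones((len(src), 1))]) @ params
    return np.sqrt(np.mean(np.sum((pred - dst) ** 2, axis=1)))

def rms_loo(src, dst):
    """Leave-one-out RMS: each point is scored by a model fit without it."""
    n = len(src)
    errs = []
    for i in range(n):
        keep = np.arange(n) != i
        params = fit_affine(src[keep], dst[keep])
        pred = np.append(src[i], 1.0) @ params
        errs.append(np.sum((pred - dst[i]) ** 2))
    return np.sqrt(np.mean(errs))

# Synthetic check: points related by an exact affine map give ~zero error.
src = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.], [2., 1.]])
dst = src @ np.array([[2., 0.], [0., 3.]]) + np.array([1., -1.])
```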
E. Experimental Results for QuickBird-Pan, QuickBird-Multi, and Ikonos-Pan Remote Sensing Images
As the last experiment, we compare the method proposed by Han [34] with our method on remote sensing images obtained from the QuickBird pan and multi sensors and the Ikonos pan sensor. In this experiment, six images of the Daejeon region of South Korea are used, captured on six
TABLE X
COMPARISON OF THE MATCHING BETWEEN THE SIFT, HAN, AND SCSC METHODS FOR THREE PAIRS OF QUICKBIRD AND IKONOS REMOTE SENSING IMAGES USING THE OBTAINED INFORMATION FROM TABLE III [43]
dates and at different angles, and grouped as Site1, Site2, and Site3. The features of these images are given in [43, Table I]. In Table IX, the results of the SCSC method are compared to the SIFT and Han methods (values from [43, Table II]). As shown in Table IX, the SCSC method extracts fewer matching points than the Han method but more correct matching points than the Han and other comparable methods, which yields a better matching rate than the Han and SIFT methods. Finally, the RMSE of our proposed method is compared to that of the Han and SIFT methods in Table X; the SCSC achieves the lowest values.
V. CONCLUSION
In this article, a new method was proposed for remote sensing image matching using sparse coding. In the first step, the image features were extracted using the SIFT algorithm. In the next step, affine transformation and sparse coding were combined into a model for removing outlier points and choosing correct matching points. The outliers were then removed using optimization and MAP estimation. The algorithm was implemented iteratively in two parts: sparse dictionary selection and dictionary learning. The results of the proposed method on several remote sensing images obtained from different satellites (ZY-3, QuickBird-2, IKONOS-2, KOMPSAT-2, SPOT, TM, ETM+, EO-1, QuickBird-pan, QuickBird-multi, and Ikonos-pan) were compared to several recent image matching methods (SVD-RANSAC, Hoge, Stone, Foroosh, Leprince, Nagashima, Guizar, Youkyung, Lowe, Preregistration, IS-SIFT, SPSA, Gong, standard SIFT, UR-SIFT, Sourabh, and Han). In the majority of cases, our proposed method (the SCSC method) outperformed the other methods.
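To make the fit-and-prune structure of such consensus schemes concrete, the following Python sketch alternates an affine fit with removal of the worst match; it is a deliberately simplified stand-in, not the SCSC's actual sparse-coding/MAP formulation:

```python
import numpy as np

def prune_matches(src, dst, threshold=1.0):
    """Greedy stand-in for an outlier-removal loop: repeatedly fit a 2-D
    affine model to the kept matches and drop the worst match until all
    residuals fall below `threshold`. (The actual SCSC method alternates
    sparse dictionary selection and dictionary learning under a MAP
    criterion; only the fit/prune alternation is mirrored here.)
    """
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    keep = np.ones(len(src), dtype=bool)
    while keep.sum() > 3:                      # an affine fit needs 3 points
        A = np.hstack([src[keep], np.ones((keep.sum(), 1))])
        params, *_ = np.linalg.lstsq(A, dst[keep], rcond=None)
        resid = np.linalg.norm(A @ params - dst[keep], axis=1)
        if resid.max() < threshold:
            break                              # all kept matches agree
        worst = np.flatnonzero(keep)[np.argmax(resid)]
        keep[worst] = False                    # discard the worst match
    return keep

# Synthetic check: five matches follow an exact shift; one is a gross outlier.
src = np.array([[0., 0.], [4., 0.], [0., 4.], [4., 4.], [2., 0.], [2., 2.]])
dst = src + np.array([3., -2.])
dst[5] += np.array([10., -10.])                # corrupt the last match
keep = prune_matches(src, dst)
```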
REFERENCES
[1] W. Ma et al., “Remote sensing image registration with modified sift and
enhanced feature matching,” IEEE Geosci. Remote Sens. Lett., vol. 14,
no. 1, pp. 3–7, Jan. 2017.
[2] Z. Yang, Y. Yang, K. Yang, and Z.-Q. Wei, “Non-rigid image registration with dynamic Gaussian component density and space curvature preservation,” IEEE Trans. Image Process., vol. 28, no. 5, pp. 2584–2598, May 2019.
[3] Q. Zeng, J. Adu, J. Liu, J. Yang, Y. Xu, and M. Gong, “Real-time adaptive visible and infrared image registration based on morphological gradient and C_SIFT,” J. Real-Time Image Process., Mar. 2019, doi: 10.1007/s11554-019-00858-x.
[4] P. Etezadifar and H. Farsi, “Scalable video summarization via sparse dictionary learning and selection simultaneously,” Multimedia Tools Appl., vol. 76, no. 6, Mar. 2017.
[5] G. Liu, Y. Gousseau, and F. Tupin, “A contrario comparison of local
descriptors for change detection in very high spatial resolution satellite
images of urban areas,” IEEE Trans. Geosci. Remote Sens., vol. 57,
no. 6, pp. 3904–3918, Jun. 2019.
[6] C. Huo, C. Pan, L. Huo, and Z. Zhou, “Multilevel SIFT matching for
large-size VHR image registration,” IEEE Geosci. Remote Sens. Lett.,
vol. 9, no. 2, pp. 171–175, Mar. 2012.
[7] L. Wu, S. C. H. Hoi, and N. Yu, “Semantics-preserving bag-of-words
models and applications,” IEEE Trans. Image Process., vol. 19, no. 7,
pp. 1908–1920, Jul. 2010.
[8] D. G. Lowe, “Object recognition from local scale-invariant features,” in
Proc. IEEE Int. Conf. Comput. Vis., vol. 2, Sep. 1999, pp. 1150–1157.
[9] J. Joglekar, S. S. Gedam, and B. K. Mohan, “Image matching using SIFT features and relaxation labeling technique—A constraint initializing method for dense stereo matching,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 9, pp. 5643–5652, Sep. 2014.
[10] H. Li, J. Qin, X. Xiang, L. Pan, W. Ma, and N. N. Xiong, “An efficient
image matching algorithm based on adaptive threshold and RANSAC,”
IEEE Access, vol. 6, pp. 66963–66971, 2018.
[11] N.-T. Tran et al., “On-device scalable image-based localization via prioritized cascade search and fast one-many RANSAC,” IEEE Trans. Image Process., vol. 28, no. 4, pp. 1675–1690, Apr. 2019.
[12] J. Ma, J. Zhao, J. Tian, A. L. Yuille, and Z. Tu, “Robust point matching
via vector field consensus,” IEEE Trans. Image Process., vol. 23, no. 4,
pp. 1706–1721, Apr. 2014.
[13] Y. Han, J. Choi, Y. Byun, and Y. Kim, “Parameter optimization for
the extraction of matching points between high-resolution multisensor
images in urban areas,” IEEE Trans. Geosci. Remote Sens., vol. 52,
no. 9, pp. 5612–5621, Sep. 2014.
[14] X. Tong et al., “A novel subpixel phase correlation method using
singular value decomposition and unified random sample consensus,”
IEEE Trans. Geosci. Remote Sens., vol. 53, no. 8, pp. 4143–4156,
Aug. 2015.
[15] Y. He et al., “Optimization of SIFT algorithm for fast-image feature extraction in line-scanning ophthalmoscope,” Optik, vol. 152, pp. 21–28, Jan. 2018.
[16] W. Yu et al., “Hierarchical semantic image matching using CNN
feature pyramid,” Comput. Vis. Image Understand., vol. 169,
pp. 40–51, Apr. 2018.
[17] H. Gonçalves, L. Corte-Real, and A. Gonçalves, “Automatic image registration through image segmentation and SIFT,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 7, pp. 2589–2600, Jul. 2011.
[18] A. Sedaghat, M. Mokhtarzade, and H. Ebadi, “Uniform robust scale-invariant feature matching for optical remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 11, pp. 4516–4527, Nov. 2011.
[19] Y. Li et al., “A spatial-spectral SIFT for hyperspectral image matching
and classification,” Pattern Recognit. Lett., to be published.
[20] M. Gong, S. Zhao, L. Jiao, D. Tian, and S. Wang, “A novel coarse-
to-fine scheme for automatic image registration based on SIFT and
mutual information,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 7,
pp. 4328–4338, Jul. 2014.
[21] S. Paul and U. C. Pati, “Remote sensing optical image registration using modified uniform robust SIFT,” IEEE Geosci. Remote Sens. Lett., vol. 13, no. 9, pp. 1300–1304, Sep. 2016.
[22] B. Kufer, N. S. Netanyahu, and I. Shimshoni, “An efficient SIFT-based
mode-seeking algorithm for sub-pixel registration of remotely sensed
images,” IEEE Geosci. Remote Sens. Lett., vol. 12, no. 2, pp. 379–383,
Feb. 2015.
[23] D. Scharstein, R. Szeliski, and R. Zabih, “A taxonomy and evaluation of dense two-frame stereo correspondence algorithms,” Int. J. Comput. Vis., vol. 47, no. 1, pp. 5–45, 2002.
[24] M. Yuan and Y. Lin, “Model selection and estimation in regression with
grouped variables,” J. Roy. Statist. Soc. B, Statist. Methodol., vol. 68,
no. 1, pp. 49–67, 2006.
[25] R. Tibshirani, “Regression shrinkage and selection via the lasso,” J. Roy.
Statist. Soc. B, Methodol., vol. 58, no. 1, pp. 267–288, 1996.
[26] Y. Cong et al., “Sparse reconstruction cost for abnormal event detection,”
in Proc. IEEE Conf. CVPR, Jun. 2011, pp. 3449–3456.
[27] V. Rodehorst and O. Hellwich, “Genetic algorithm SAmple consensus
(GASAC)—A parallel strategy for robust parameter estimation,” in Proc.
IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPRW),
Jun. 2006, p. 103.
[28] M. Berger, Problems in Geometry. New York, NY, USA:
Springer-Verlag, 1984.
[29] M. Elad, Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing. New York, NY, USA: Springer-Verlag, 2010.
[30] A. Lingua, D. Marenchino, and F. Nex, “Performance analysis of the
SIFT operator for automatic feature extraction and matching in pho-
togrammetric applications,” Sensors, vol. 9, pp. 3745–3766, May 2009.
[31] W. S. Hoge, “A subspace identification extension to the phase corre-
lation method,” IEEE Trans. Med. Imag., vol. 22, no. 2, pp. 277–280,
Feb. 2003.
[32] H. S. Stone, M. T. Orchard, E.-C. Chang, and S. A. Martucci, “A fast
direct Fourier-based algorithm for subpixel registration of images,” IEEE
Trans. Geosci. Remote Sens., vol. 39, no. 10, pp. 2235–2243, Oct. 2001.
[33] H. Foroosh, J. B. Zerubia, and M. Berthod, “Extension of phase cor-
relation to subpixel registration,” IEEE Trans. Image Process., vol. 11,
no. 3, pp. 188–200, Mar. 2002.
[34] S. Leprince, S. Barbot, F. Ayoub, and J.-P. Avouac, “Automatic and pre-
cise orthorectification, coregistration, and subpixel correlation of satellite
images, application to ground deformation measurements,” IEEE Trans.
Geosci. Remote Sens., vol. 45, no. 6, pp. 1529–1558, Jun. 2007.
[35] S. Nagashima, T. Aoki, T. Higuchi, and K. Kobayashi, “A subpixel image
matching technique using phase-only correlation,” in Proc. IEEE Int.
Symp. Intell. Signal Process. Commun. Syst., Dec. 2006, pp. 701–704.
[36] M. Guizar-Sicairos, S. T. Thurman, and J. R. Fienup, “Efficient subpixel
image registration algorithms,” Opt. Lett., vol. 33, no. 2, pp. 156–158,
2008.
[37] H. Nyquist, “Certain topics in telegraph transmission theory,” Trans.
Amer. Inst. Elect. Eng., vol. 47, no. 2, pp. 617–644, Apr. 1928.
[38] X. Dong et al., “Noise estimation of hyperspectral remote sensing image
based on multiple linear regression and wavelet transform,” Boletim de
Ciências Geodésicas, vol. 19, no. 4, pp. 639–652, 2013.
[39] P. Thévenaz and M. Unser, “Optimization of mutual information for multiresolution image registration,” IEEE Trans. Image Process., vol. 9, no. 12, pp. 2083–2099, Dec. 2000.
[40] M. Schneider et al., “Matching of high-resolution optical data to a
shaded DEM,” Int. J. Image Data Fusion, vol. 3, no. 2, pp. 111–127,
2012.
[41] E. M. Mikhail, J. S. Bethel, and J. C. McGlone, Introduction to Modern
Photogrammetry. New York, NY, USA: Wiley, 2001.
[42] H. Gonçalves, J. A. Gonçalves, and L. Corte-Real, “Measures for an objective evaluation of the geometric correction process quality,” IEEE Geosci. Remote Sens. Lett., vol. 6, no. 2, pp. 292–296, Apr. 2009.
[43] Y. K. Han et al., “Automatic registration of high-resolution images using local properties of features,” Photogramm. Eng. Remote Sens., vol. 78, no. 3, pp. 211–221, 2012.
Pouriya Etezadifar received the B.S., M.S., and Ph.D. degrees in communication engineering from the University of Birjand, Birjand, Iran, in 2011, 2013, and 2017, respectively.
Since 2017, he has been a Faculty Member with the Electrical Engineering Department, IHU University, Tehran, Iran, where he is an Assistant Professor. His main research areas are sparse signal processing, dictionary learning for sparse representation, machine learning for signal processing, blind source separation (BSS), statistical signal processing, information theory, and digital speech/video/image processing.
Hassan Farsi received the B.Sc. and M.Sc. degrees
from the Sharif University of Technology, Tehran,
Iran, in 1992 and 1995, respectively, and the Ph.D.
degree from the Centre of Communications Systems
Research (CCSR), University of Surrey, Guildford,
U.K., in 2004.
He currently works as a Professor of communication engineering with the Department of Electrical and Computer Engineering, University of Birjand, Birjand, Iran. He is interested in speech, image, and video processing over wireless communications.
... Because of its flexibility, high efficiency, and low cost, UAV (Unmanned Aerial Vehicle) remote sensing technology has gradually emerged in many fields, such as accurate agriculture, resources investigation, environment management, and disaster monitoring [1][2][3]. How to yield high-precision registered UAV images quickly has become an inevitable challenge to the wide application of UAV technology [4][5][6][7][8][9]. However, the high resolutions of UAV images have a great influence on the detection and matching of image feature points. ...
Article
Full-text available
Image registration plays a vital role in the mosaic process of multiple UAV (Unmanned Aerial Vehicle) images acquired from different spatial positions of the same scene. Aimed at the problem that many fast registration methods cannot provide both high speed and accuracy simultaneously for UAV visible light images, this work proposes a novel registration framework based on a popular baseline registration algorithm, ORB—the Oriented FAST (Features from Accelerated Segment Test) and Rotated BRIEF (Binary Robust Independent Elemental Features) algorithm. First, the ORB algorithm is utilized to extract image feature points fast. On this basis, two bidirectional matching strategies are presented to match obtained feature points. Then, the PROSRC (Progressive Sample Consensus) algorithm is applied to remove false matches. Finally, the experiments are carried out on UAV image pairs about different scenes including urban, road, building, farmland, and forest. Compared with the original version and other state-of-the-art registration methods, the bi-matching ORB algorithm exhibits higher accuracy and faster speed without any training or prior knowledge. Meanwhile, its complexity is quite low for on-board realization.
... The normalization function used to be designed in different ways, based on the application domain [45]. The efficiency of using sparse representation yields researchers to use it in different tasks such as image, speech, and video processing [8,30,48]. ...
Article
Full-text available
Due to the rapid increase of using surveillance cameras, it has become more important to re-identify persons on different non-overlapped cameras. Person re-identification is an important and challenging topic on machine vision and media processing. Few data for training, low quality of surveillance videos and varying position of persons among different cameras lead re-identification problems to be solved difficultly. This paper aims to introduce a new model which tries to overcome these challenges and to increase the person re-identification efficiency. The proposed method uses both hand-crafted and learned features by combining Convolutional Neural Network with Gaussian of Gaussian descriptor. Also, an arbitrary data augmentation is considered to train CNN more efficiently. After that, the person re-identification problem is modeled as a sparse problem which aims to find the best similar persons, avoiding in this way the need of metric learning algorithms. The proposed method is evaluated on three databases, namely CUHK01, CUHK03 and GRID. Experimental results show that the proposed method achieves better precision in most ranks compared to the some recent studies.
... SIFT algorithm was first proposed by Lowe [3], and the key points found by SIFT are some prominent points that do not change due to illumination, affine transformation, and noise, such as corner points, edge points, bright points in dark areas and dark points in bright areas. Because of such advantages, SIFT has been widely used to select interesting points for matching of remote sensing images [4][5][6]. However, SIFT requires the image to have enough textures, otherwise the constructed 128-dimensional vectors are not too differentiated. ...
Article
Full-text available
Extracting spatial objects and their key points from remote sensing images has attracted great attention of worldwide researchers in intelligent machine perception of the Earth’s surface. However, the key points of spatial objects (KPSOs) extracted by the conventional mask region-convolution neural network model are difficult to be sorted reasonably, which is a key obstacle to enhance the ability of machine intelligent perception of spatial objects. The widely distributed artificial structures with stable morphological and spectral characteristics, such as sports fields, cross-river bridges, and urban intersections, are selected to study how to extract their key points with a multihot cross-entropy loss function. First, the location point in KPSOs is selected as one category individually to distinguish morphological feature points. Then, the two categories of key points are arranged in order while maintaining internal disorder, and the mapping relationship between KPSOs and the prediction heat map is improved to one category rather than a single key point. Therefore, the predicted heat map of each category can predict all the corresponding key points at one time. The experimental results demonstrate that the prediction accuracy of KPSOs extracted by the new method is 80.6%, taking part area of Huai’an City for example. It is reasonable to believe that this method will greatly promote the development of intelligent machine perception of the Earth’s surface.
Article
Obtaining the earth-fixed coordinates is a fundamental requirement for long-distance unmanned aerial vehicle (UAV) flight. Global navigation satellite systems are the most common location model, but their signals are susceptible to interference from obstacles and complex electromagnetic environments. To solve this issue, a visual localization framework based on multi-source image feature learning (VL-MFL) is proposed. In the proposed framework, the UAV is located by mapping airborne images to the satellite images with absolute coordinate positions. Firstly, for the heterogeneity issues caused by the different imaging environments of drone and satellite images, a lightweight Siamese network based on 3-D attention mechanism is proposed to extract the consistent features from the multi-source images. Secondly, to overcome the problem of inaccurate localization caused by the large receptive field of traditional convolutional neural networks, the cell-divided strategy is imported to strengthen the position mapping relationship of multi-source images features. Finally, based on similarity measurement, a confidence evaluation mechanism is established and a search region prediction method is proposed, which is effectively improved the accuracy and efficiency in matching localization. To evaluate the location performance of the proposed framework, several related methods are compared and analysed in details. The results on the real-world datasets indicate that the proposed method has achieved outstanding location accuracy and real-time performance.
Chapter
In order to satisfy the demand of the number of matching points of satellite remote sensing images in the 3D reconstruction of geographic information space, a dense matching method for satellite remote sensing images based on multiple matching primitives is proposed in this paper, in which the matching algorithm of SIFT(Scale Invariant Feature Transform), the matching algorithm based on object space geometry constraint and the region matching point growing algorithm based on affine transformation are comprehensively used with multiple primitives. The experimental results show that the proposed method achieves the dense matching of satellite remote sensing images, which can meet the need of 3D reconstruction of geographic information space.KeywordsRemote sensingImage matchingMultiple primitivesObject space geometry constraintAffine transformation
Article
Scene matching navigation system (SMNS) remains challenging in many navigation tasks, which rely heavily on accuracy, computational efficiency, and robustness. Due to the different generation conditions of the matching images, it is difficult for traditional methods to cover every aspect of the three navigation performances. This paper aims at developing an accurate, fast, and robust SMNS based on vision/inertial fusion to provide complete navigation information for unmanned aerial vehicles (UAVs). Utilizing the mechanization results of the low-cost MEMS, the proposed system first completes the georeferencing of the real-time aerial images, in which the projection errors are reduced greatly by introducing an optimized factor to the homography matrix. Then, applying a robust noise processing strategy, an improved feature extraction algorithm is designed to eliminate most of the features that vary with climate, time, and season, which lays a solid foundation for the accuracy of the following matching procedure. Under the framework of the SMNS, a novel matching strategy based on logic graphs is designed, which can facilitate the matching procedure. Eventually, by combining the mechanization results of the MEMS and the matching results of the SMNS, the proposed system can provide complete navigation results. Experiments in typical and complex scenarios are carried out respectively to verify the effectiveness and robustness of the proposed system. Experimental results demonstrate that the proposed SMNS possesses accuracy, computational efficiency, and robustness, which outperforms the state-of-the-art strategies(i.e., HOPC, CFOG, PC) in terms of matching aerial and satellite images.
Article
Full-text available
Since the visible and infrared images have different imaging mechanisms, the difficulty of image registration has greatly increased. The grayscale difference between visible and infrared images is very disadvantageous for extracting feature points in homogenous region, but they both retain the obvious contour edge in the scene. After using the morphological gradient method, the grayscale edge of visible and infrared images can be obtained and their similarity is greatly improved, and their difference may be seen as the difference in brightness or grayscale. Therefore, we proposed a novel algorithm to realise real-time adaptive registration of visible and infrared images using morphological gradient and C_SIFT. Firstly, the morphological gradient method is used to extract the rough edges of visible and infrared images for aligning their visual features as a single similar type. Secondly, the C_SIFT feature detection operator is used to detect and extract feature points from the extracted edges. The C_SIFT uses the centroid method to describe the main direction of feature points, makes rotation invariance feasible. Finally, to verify the effectiveness of the proposed algorithm, we carried out a series of experiments in eight various scenarios. The experimental results show that the proposed algorithm has achieved good experimental results. The registration of visible and infrared images can be completed quickly by the proposed algorithm, and the registration accuracy is satisfactory.
Article
Full-text available
Change detection is a key problem for many remote sensing applications. In this paper, we present a novel unsupervised method for change detection between two high-resolution remote sensing images possibly acquired by two different sensors. This method is based on keypoints matching, evaluation, and grouping, and does not require any image co-registration. It consists of two main steps. First, global and local mapping functions are estimated through keypoints extraction and matching. Second, based on these mappings, keypoint matchings are used to detect changes and then grouped to extract regions of changes. Both steps are defined through an a contrario framework, simplifying the parameter setting and providing a robust pipeline. The proposed approach is evaluated on synthetic and real data from different optic sensors with different resolutions, incidence angles, and illumination conditions.
Article
Full-text available
We present the design of an entire on-device system for large-scale urban localization using images. The proposed design integrates compact image retrieval and 2D-3D correspondence search to estimate the location in extensive city regions. Our design is GPS agnostic and does not require network connection. In order to overcome the resource constraints of mobile devices, we propose a system design that leverages the scalability advantage of image retrieval and accuracy of 3D model-based localization. Furthermore, we propose a new hashing-based cascade search for fast computation of 2D-3D correspondences. In addition, we propose a new one-many RANSAC for accurate pose estimation. The new one-many RANSAC addresses the challenge of repetitive building structures (e.g. windows, balconies) in urban localization. Extensive experiments demonstrate that our 2D-3D correspondence search achieves state-of-the-art localization accuracy on multiple benchmark datasets. Furthermore, our experiments on a large Google Street View (GSV) image dataset show the potential of large-scale localization entirely on a typical mobile device.
Article
Full-text available
The education plays more and more important role in disseminating knowledge because of the explosive growth of knowledge. As one kind of carrier delivering knowledge, image also presents an explosive growth trend and plays an increasingly important role in education, medical, advertising, entertainment, etc. Aiming at the long time of massive image feature extraction in the construction of smart campus, the traditional Harris corner has such problems as low detection efficiency and many non-maximal pseudo-corner points etc. This paper proposes a Harris image matching method that combines adaptive threshold and RANSAC (Random Sample Consensus). Firstly, the Harris feature points are selected based on the adaptive threshold and the Forstner algorithm in this method. On the one hand, candidate points are filtered based on the adaptive threshold. On the other hand, the Forstner algorithm is used to further select the corner points. Secondly, the NCC (Normalized Cross Correlation matching) and the RANSAC are applied to precisely match the detected Harris corners. The experimental results show that compared with existing algorithms, the proposed method not only obtains a matching accuracy higher than 20% of Cui’s algorithm, but also saves more than 30% detection time of corner detection and image matching. Further more, the proposed method obtains a matching accuracy higher than 50% of the Cui’s algorithm, and saves more than 50% detection time of corner detection and image matching.
Article
Full-text available
We present the scalable design of an entire on-device system for large-scale urban localization. The proposed design integrates compact image retrieval and 2D-3D correspondence search to estimate the camera pose in a city region of extensive coverage. Our design is GPS agnostic and does not require the network connection. The system explores the use of an abundant dataset: Google Street View (GSV). In order to overcome the resource constraints of mobile devices, we carefully optimize the system design at every stage: we use state-of-the-art image retrieval to quickly locate candidate regions and limit candidate 3D points; we propose a new hashing-based approach for fast computation of 2D-3D correspondences and new one-many RANSAC for accurate pose estimation. The experiments are conducted on benchmark datasets for 2D-3D correspondence search and on a database of over 227K Google Street View (GSV) images for the overall system. Results show that our 2D-3D correspondence search achieves state-of-the-art performance on some benchmark datasets and our system can accurately and quickly localize mobile images; the median error is less than 4 meters and the processing time is averagely less than 10s on a typical mobile device.
Article
Image registration plays an important role in military and civilian applications such as natural disaster damage assessment, environmental monitoring, ground change detection, and military damage assessment. This work presents a new feature-based non-rigid image registration method. Its main contributions are: (i) a dynamic Gaussian component density designed to better exploit the available image information and provide sufficient inlier pairs for image transformation; and (ii) a spatial-structure preservation scheme, consisting of a curvature preservation term on the image transformation space and a local spatial-structure constraint, which bounds the image transformation cost and preserves the local structure of feature points during feature point set registration. The proposed method is tested on multi-spectral natural images, low-altitude aerial images, and medical images against nine state-of-the-art methods of four types, and it shows the best performance in most scenarios.
Article
We propose a new method for estimation in linear models. The ‘lasso’ minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant. Because of the nature of this constraint it tends to produce some coefficients that are exactly 0 and hence gives interpretable models. Our simulation studies suggest that the lasso enjoys some of the favourable properties of both subset selection and ridge regression. It produces interpretable models like subset selection and exhibits the stability of ridge regression. There is also an interesting relationship with recent work in adaptive function estimation by Donoho and Johnstone. The lasso idea is quite general and can be applied in a variety of statistical models: extensions to generalized regression models and tree‐based models are briefly described.
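The constrained objective described above has an equivalent penalized (Lagrangian) form, minimize (1/2)||y − Xw||² + λ||w||₁, which can be solved by cyclic coordinate descent with soft-thresholding; the exact zeros the abstract mentions arise from the threshold. The sketch below is a minimal numpy illustration of that well-known solver, not code from the paper; the names are ours.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iters=500):
    """Lasso via cyclic coordinate descent:
    minimize (1/2) * ||y - X w||^2 + lam * ||w||_1.
    """
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)  # per-column squared norms
    for _ in range(n_iters):
        for j in range(p):
            # Partial residual: remove every contribution except feature j.
            r_j = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r_j
            # Closed-form 1D update with soft-thresholding.
            w[j] = soft_threshold(rho, lam) / col_sq[j]
    return w
```

Coefficients of irrelevant features are driven exactly to zero, which is what makes the resulting model interpretable, as the abstract notes.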
Article
The scale-invariant feature transform (SIFT) is known as one of the most robust local invariant features and is widely applied to image matching and classification. However, there are few studies of SIFT for hyperspectral images (HSI). An HSI embraces spectral information reflecting material radiation properties together with the geometric relationships of objects, and thus provides much more information than gray-scale or color images. This paper therefore puts forward a spatial-spectral SIFT for HSI matching and classification, using geometric algebra as its mathematical tool. It extracts and describes the spatial-spectral SIFT feature in the spatial-spectral domain to exploit both the spectral and spatial information of the HSI. First, a spatial-spectral unified model of spectral value and gradient change (UMSGC for short) is built to analyze the spectral and spatial information of an HSI synthetically. Second, a scale space for HSI based on the UMSGC is designed. Finally, a new detector and a new descriptor for the spatial-spectral SIFT that comprehensively consider spectral and spatial information are proposed. Experimental results show that the proposed algorithm demonstrates excellent performance in HSI matching and classification.
Article
Image matching remains an important and challenging problem in computer vision, especially for dense correspondence estimation between images with high category-level similarity. The effectiveness of image matching largely depends on advances in image descriptors. Inspired by the success of the Convolutional Neural Network (CNN), we propose a hierarchical image matching method using the CNN feature pyramid, named CNN Flow. The feature maps output by different layers of a CNN tend to encode different information about the input image, such as semantic information in higher layers and structural information in lower layers. This property of the CNN feature pyramid suits a hierarchical image matching framework, which detects patterns of different levels in an implicit coarse-to-fine manner. In particular, we exploit the complementarity of different layers by using guidance from higher layers to lower layers. The high-layer features present semantic patterns that cope with intra-class variations, and the guidance from high layers can resist the semantic ambiguity of low-layer features caused by their small receptive fields. The bottom-level matching then utilizes the low-layer features, which carry more structural information, to achieve finer matching. On one hand, extensive experiments and analysis demonstrate the superiority of CNN Flow for dense image matching under challenging variations. On the other hand, CNN Flow is demonstrated through various applications, such as fine alignment of intra-class objects, scene label transfer, and facial expression transfer.
Article
The Scale-Invariant Feature Transform (SIFT) algorithm is used broadly in image registration to improve image quality. However, the algorithm's complexity limits its efficiency in biological studies, which usually require real-time performance. In this article, we present an improved SIFT technique for matching sequences of images taken from a line-scanning ophthalmoscope (LSO). The method generates the Gaussian scale-space pyramid in the frequency domain to complete the SIFT feature detection more quickly, and a novel SIFT descriptor, invariant to rotation and illumination, is then created to reduce computation time. We implemented the original SIFT method, our improved SIFT method, and a graphics processing unit (GPU) version of the improved method. Experiments show that the improved SIFT is almost 2–3 times faster than the original while remaining more robust, and that the GPU implementation is 20 times faster than the central processing unit (CPU) implementation, achieving real-time performance as expected. Although tested on an LSO system, the improved SIFT method does not depend on the acquisition setup; as a result, it can be applied to other imaging instruments, e.g., adaptive optics systems, to improve their resolution.
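The frequency-domain construction of the Gaussian scale space mentioned in the last abstract can be sketched as follows: since the Fourier transform of a Gaussian is again a Gaussian, the image is transformed once and each pyramid level costs only one per-scale multiply plus an inverse FFT. This is an illustrative numpy sketch under that standard identity, not the authors' implementation.

```python
import numpy as np

def gaussian_pyramid_fft(image, sigmas):
    """Build Gaussian scale-space levels by filtering in the frequency
    domain: the FFT of the image is multiplied by the analytic Fourier
    transform of a Gaussian, exp(-2*pi^2*sigma^2*|f|^2), one multiply
    per scale, so the image itself is transformed only once.
    """
    h, w = image.shape
    F = np.fft.fft2(image)
    fy = np.fft.fftfreq(h).reshape(-1, 1)  # cycles per sample, rows
    fx = np.fft.fftfreq(w).reshape(1, -1)  # cycles per sample, cols
    freq_sq = fy ** 2 + fx ** 2
    levels = []
    for sigma in sigmas:
        # Transfer function of a unit-gain Gaussian with std sigma (pixels).
        H = np.exp(-2.0 * (np.pi ** 2) * (sigma ** 2) * freq_sq)
        levels.append(np.real(np.fft.ifft2(F * H)))
    return levels
```

Each level preserves the image mean (the filter's DC gain is 1) while larger sigmas suppress more high-frequency content, exactly the behavior a spatial-domain Gaussian blur would give.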