This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING 1
A New Sample Consensus Based on Sparse Coding
for Improved Matching of SIFT Features on
Remote Sensing Images
Pouriya Etezadifar and Hassan Farsi
Abstract—In this article, a new method is proposed for feature matching of remote sensing images using sample consensus based on sparse coding (SCSC) to improve the image registration technique. To this end, scale-invariant feature transform (SIFT) features are used to select interesting points for image matching. The extracted points contain differences and similarities in two images captured from the same area (but differing in sensor resolution, azimuth, elevation, contrast, illumination, etc.); in such a case, the similar points should be retained and the dissimilar ones eliminated. In this article, we greatly improve the matching between two images using the SCSC by checking the points jointly. Moreover, the proposed method is shown to yield better results than standard alternatives such as random sample consensus (RANSAC) when the number of feature points is large or the points are noisy. However, it should be noted that for low noise and distortion rates, the proposed method and the RANSAC yield similar results. In general, the proposed method using sparse coding achieves a higher correct match rate than the SIFT algorithm. In order to illustrate this, the proposed method is compared with other up-to-date matching and registration methods based on the SIFT algorithm. The obtained results confirm this claim and show that the proposed algorithm is between 0.48% and 7.68% more accurate than the SVD-RANSAC, Hoge, Stone, Foroosh, Leprince, Nagashima, Guizar, Youkyung, Lowe, Preregistration, IS-SIFT, SPSA, Gong, Standard SIFT, UR-SIFT, Sourabh, and Han methods.
Index Terms—Image matching, image registration, random
sample consensus (RANSAC), scale-invariant feature transform
(SIFT), sparse coding.
I. INTRODUCTION
IMAGE matching using SIFT features to find corresponding points in two or more images of the same scene has been used in many applications, such as aligning images from different camera sources [1]–[3], video summarization for selecting informative frames of a video [4], and change detection by observing land-feature differences at different times in remote sensing images [5]. Moreover, these methods are used extensively in commercial fields for face recognition, the study of drought trends, and water resources assessment. One of the most important requirements of these methods is that the matched points be as correct as possible so that the error is minimal, because errors in the aligned points reduce the efficiency of the matching methods.

Manuscript received September 4, 2019; revised June 30, 2019 and September 11, 2019; accepted December 3, 2019. This work was supported by the University of Birjand. (Corresponding author: Hassan Farsi.)
P. Etezadifar is with the Department of Electrical Engineering, Imam Hussein University (IHU), Tehran 1698715461, Iran (e-mail: petezadifar@ihu.ac.ir).
H. Farsi is with the Department of Electrical and Computer Engineering, University of Birjand, Birjand 9717434765, Iran (e-mail: hfarsi@birjand.ac.ir).
Color versions of one or more of the figures in this article are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TGRS.2019.2959606
In most of the methods proposed for image matching, the key challenge is to find the correct matching points. Accordingly, new approaches have been proposed for selecting similar points, which have led to the improvement of image registration methods. Image matching techniques are divided into two distinct categories: feature-based and area-based. Feature-based image matching is further divided into two main groups: the Bag of Words (BoW) model and feature descriptors [6]. The BoW model categorizes features in the image using codebook generation. Feature-descriptor-based methods use feature extraction for image matching. In this article, we propose a feature-based method for image matching that combines SIFT features and sparse coding.
The rest of this article is organized as follows. In Section II, we give a brief review of feature-based and area-based techniques. In Section III, we propose our matching model. In Section III-A, the sparse coding problem is analyzed briefly. In Section III-B, the proposed image matching model is described and solved. In Section III-C, our proposed algorithm is presented as pseudocode that is straightforward to implement. In Section III-D, the features used in this article are described. In Section IV, the experimental results are presented. Finally, in Section V, we draw the conclusion.
II. RELATED WORKS
Lu et al. [7] proposed a method for categorization based on the BoW model, which can be used to match objects across images. Many methods, such as the SIFT [8] and speeded up robust features (SURF), have been proposed to extract image features and are popular in image matching. Joglekar et al. [9] proposed a new method that uses SIFT features and relaxation indexing for remote sensing image matching. In that work, a probabilistic
0196-2892 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
neural-network-based feature-matching algorithm is proposed
for stereo images. These methods were very sensitive to
image noise and distortion. Li et al. [10] proposed a method
based on the Harris method that combines adaptive threshold
and random sample consensus (RANSAC). First, the Harris
feature points are selected based on the adaptive threshold,
and second, the normalized cross correlation matching and
the RANSAC are applied to precisely match the detected
Harris corners. Tran et al. [11] proposed a system that integrates compact image retrieval to estimate the location in extensive city regions. They also proposed a new hashing-based cascade search for fast computation of 2-D–3-D correspondences, as well as a new one-to-many RANSAC for accurate pose estimation. Jiayi et al. [12]
proposed an image matching method that was performed using
vector field consensus and maximum a posteriori (MAP)
estimator. In this method, first, all correct and false points
of the feature were considered. Then, false points were
removed based on the MAP estimator and correct feature
points remained. Our proposed method is relatively close to
Jiayi’s proposed method. However, in our method, sparse
coding was used, which led to an improvement in the method
as compared to Jiayi’s method. Youkyung et al. [13] proposed
a method to enhance the accuracy of automatic high-resolution
image registration [13], which used SIFT features. In their
method, by optimizing the target function, they selected correct
feature points. Both this method and our proposed method
are based on the optimization. However, in our proposed
method, we used affine transform to select correct points,
which results in better performance and efficiency. Tong et al. [14] proposed a phase-correlation method using SVD and a unified RANSAC, in which the theoretically unified RANSAC algorithm acts as a robust estimator for line fitting. In [15], He and his colleagues used an improved SIFT technique for matching sequences of images taken from a line-scanning ophthalmoscope (LSO); they also proposed a novel SIFT descriptor that reduces calculation time compared with the original SIFT method. Yu et al. [16] proposed a hierarchical
image matching method using the CNN feature pyramid. Yu’s
method's advantage is the complementarity of different layers, using guidance from higher layers to lower layers. Gonçalves et al. [17] improved the performance of remote sensing image matching; Sedaghat's method was based on a new strategy for choosing features of the SIFT algorithm. We implement our proposed method using the features extracted from Sedaghat's method, which leads to more matching points, an improved matching rate, and reduced matching errors. In [18], a subpixel phase correlation method was proposed by Tong using the singular value decomposition (SVD) and RANSAC methods. He used the SVD method and converted the estimation problem to 1-D space, which made it simpler and more efficient. Li
et al. [19] proposed a new method based on spatial–spectral
SIFT for HSI matching and classification using a spatial–
spectral model of spectral value and gradient change to analyze
information. In [20], a modified SIFT version was proposed; in the next step, a bivariate histogram and the RANSAC algorithm were used to correct the matching points, and finally, a method was proposed to maximize the number of matching points. Our method improves the correction of matching points, which leads to better results compared to Paul's method. Paul and Pati [21] proposed a new method that improves matching by using SIFT features. To this end, Kupfer selected close feature points using the nearest neighbor and a Hough-like voting scheme. However, the runtime of our proposed method is faster. In [22], a taxonomy of image matching
methods based on the dense disparity map was studied. Our
proposed method is feature-based; therefore, we do not study area-based algorithms in this article.
III. PROPOSED MATCHING MODEL
In this section, we attempt to solve the matching problem
by proposing a method based on sparse coding. The basis of
this article is classical equations and optimization. Therefore,
the general method is first obtained using the sparse coding
for a Laplacian distribution based on the MAP estimation
model. In the next step, the generalized equation obtained
in Section III-A is reformulated for remote sensing image
matching. After deriving the solution, an algorithm is proposed.
Finally, the proposed method is presented for easy implemen-
tation in a pseudocode format.
A. General Problem Formulation Based on Sparse Coding
In Section III-B, the aim is to select the inlier data and remove the outliers by dictionary training and sparse dictionary selection.
Thus, in the first step, we discuss the problem. In this case, we assume that the observation vector $x$ is related to $y$ by

$$y = Ax + n + o, \quad y, n, o \in \mathbb{R}^{m}, \quad A \in \mathbb{R}^{m \times n}, \quad x \in \mathbb{R}^{n}. \tag{1}$$

In (1), $y$ represents the reference image observations, $n$ denotes noise, and $o$ indicates the outlier data. Moreover, $A$ is known as the dictionary matrix, and $x$ is an observation vector of the image to be matched. The prior distribution of each element of $x$ and $o$ is a Laplacian distribution with zero mean, and the elements are independent and identically distributed. The reason for using the Laplacian distribution is that it is sharp around zero compared to other distributions such as the Gaussian, which makes it more appropriate for a sparse solution. Moreover, the noise has a Gaussian distribution with zero mean and variance $\sigma^2 I$. On that basis, we calculate the best estimates of the $x$ and $o$ vectors. In this problem, since the prior distributions of the random variables $x$ and $o$ are available, the best estimator is the MAP [23]. Therefore, the estimated $x$ and $o$ vectors can be obtained by

$$\{\hat{o}_{\mathrm{map}}, \hat{x}_{\mathrm{map}}\} = \arg\max_{o,x}\,\{\ln(P(y|x,o)) + \ln(P(x)) + \ln(P(o))\}. \tag{2}$$

According to the assumptions of the problem, we know that the conditional distribution of $y|x,o$ is equal to the noise distribution with mean $Ax + o$ and variance $\sigma^2 I$, i.e., the normal distribution $\mathcal{N}(Ax + o, \sigma^2 I)$. Equation (3) is obtained from (2) based on the conditional probability of $y|x,o$ and
eliminating fixed values, according to the dictionary modeling used in Group Lasso [24], as follows:

$$\{\hat{O}, \hat{X}\} = \arg\min_{O,X}\,\{\|Y - AX - O\|_F^2 + \lambda_1\|X\|_1 + \lambda_2\|O\|_1\}. \tag{3}$$
We know that the values of $X$ are available as observations. Therefore, the term $\|X\|_1$ has a fixed value and is ineffective in the optimization process. Moreover, obtaining the values of the dictionary $A \in \mathbb{R}^{d \times k}$ is called learning the dictionary. Here, $Y \in \mathbb{R}^{d \times n}$ is the reference image feature point coordinates, and $X \in \mathbb{R}^{k \times n}$ indicates the feature point coordinates of the image to be matched. Furthermore, $O \in \mathbb{R}^{d \times n}$ represents the outliers and is named the pursuit coefficient matrix. According to the above explanation, (3) can be reordered as

$$\{\hat{A}, \hat{O}\} = \arg\min_{A,O}\,\{\|Y - AX - O\|_F^2 + \lambda\|O\|_1\}. \tag{4}$$

In (4), $\|X\|_F$ denotes the Frobenius norm, defined as $\|X\|_F = (\sum_{i,j} X_{ij}^2)^{1/2}$. Moreover, $\|O\|_1 = \sum_{i,j}|O_{ij}|$ is known as
the $l_1$ norm. If $\lambda$ has a small value, then the solution of (4) tends in the direction that all the reference and matching points are mapped to each other with the least error, and solving the equation proceeds to select all the feature points as matching points. When $\lambda$ increases, the number of zero elements in the matrix $O$ tends to increase. This means that solving the equation proceeds toward a minimum number of matching points, which leads to an increase in the undesirable matching error rate. The second term of (4) measures the sparsity of the matrix using the $l_1$ norm and attempts to make many columns of the matrix $O$ equal to zero, selecting the matching points from the nonzero columns. The problem with using the $l_1$ norm is that points with a slight error are eliminated. To solve this problem, the $l_{2,1}$ norm can be used instead of the $l_1$ norm [25]; it is defined as $\|O\|_{2,1} = \sum_i \|o_i\|_2$, where $\|o_i\|_2$ denotes the $l_2$ norm of the $i$th column of the matrix $O$. Another problem in (4) is that the appropriate value of $\lambda$ must be selected in the range $[0, \infty)$. However, by reordering (4), the value of $\lambda$ can be selected in the range $[0, 1]$ as follows:

$$\{\hat{A}, \hat{O}\} = \arg\min_{A,O}\left\{\frac{\lambda}{2}\|Y - AX - O\|_F^2 + \frac{1-\lambda}{2}\|O\|_{2,1}\right\}. \tag{5}$$
In (5), for a value of $\lambda = 1$, the reconstruction error is reduced to its lowest value, but the sparsity condition on the matrix $O$ is not enforced. Conversely, for a small value $\lambda = \varepsilon$, the sparsity of the matrix $O$ dominates, but the condition for minimizing the reconstruction error is not enforced; in this case, the matrix $O$ becomes the zero matrix.
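To make the role of $\lambda$ in (5) concrete, the following is a minimal numerical sketch of the weighted objective. This is our own illustration rather than part of the published method, and `l21_norm` and `objective` are hypothetical helper names:

```python
import numpy as np

def l21_norm(O):
    # Sum of the l2 norms of the columns of O: the ||O||_{2,1} term in (5).
    return np.linalg.norm(O, axis=0).sum()

def objective(Y, A, X, O, lam):
    # Weighted cost from (5): lam/2 * reconstruction error + (1-lam)/2 * sparsity term.
    recon = np.linalg.norm(Y - A @ X - O, "fro") ** 2
    return 0.5 * lam * recon + 0.5 * (1.0 - lam) * l21_norm(O)

# Toy data: one column of O carries an outlier correction.
rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
X = rng.standard_normal((3, 4))
O = np.zeros((2, 4))
O[:, 1] = 5.0                        # column 1 marks an outlier point
Y = A @ X + O                        # exact model plus one outlier column

# lam near 1 weights the reconstruction error; lam near 0 pushes O toward zero.
print(objective(Y, A, X, O, 0.99))
print(objective(Y, A, X, np.zeros_like(O), 0.99))
```

With the true `O`, the reconstruction term vanishes and only the small sparsity penalty remains; with `O = 0`, the full outlier energy appears as reconstruction error, so the cost is much larger.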
B. Solution of the Formulated Problem for Remote Sensing
Image Matching
In this section, we first review the proposed matching scheme. Then, we rewrite the general formulation (5) for our problem and solve it. Finally, the proposed method is presented as an optimization problem in the form of a pseudocode algorithm.
1) Definition of the Proposed Matching Scheme: In the
image matching process, after extracting features and select-
ing the corresponding points in the two test/reference
images, outlier points should be deleted. The RANSAC algo-
rithm [26] or its modification is generally used for this
goal [27]. The RANSAC is a general method for finding
the geometric model parameters by using a set of sample
data. It is assumed that $x = (x_1, x_2)$ is a feature point coordinate of the test image and $y = (y_1, y_2)$ is the feature point coordinate of the reference image found as the corresponding point of $x$ in the matching process. Given that the images have variations in translation [28], the affine transform is an appropriate model for matching. In the following, we assume that for a pair of $(x, y)$ sample points that satisfies the geometric model, the following can be written:

$$y = A x_a, \quad A = \begin{pmatrix} a & b & c \\ d & e & f \end{pmatrix}, \quad x_a = (x_1, x_2, 1)^T. \tag{6}$$
As the number of unknown parameters (the number of matrix entries) is equal to 6, three feature points are sufficient to determine these parameters. Using these three points, the matrix $A$ is obtained by solving the following problem:

$$A = \arg\min_{A}\,\sum_{i=1}^{3}\|y_i - A x_{ai}\|_2^2 = Y X_a^T\left(X_a X_a^T\right)^{-1}. \tag{7}$$
In (7), the columns of the matrices $X_a$ and $Y$ denote the $x_{ai}$ and $y_i$ feature point coordinates, respectively. Now, the question is how this geometric model (the transform parameters) can be found. According to the obtained matrix $A$, inlier points are selected and outlier points are removed. The RANSAC algorithm has a random nature and, in the limit, it should select and analyze $\binom{N}{3}$ triples of feature point coordinates (from here on, in this article, the number of feature points is assumed to be $N$). In other words, the complexity of this algorithm is of order $\Theta(N^3)$, which is not acceptable in many applications. In this section, we propose an algorithm that has much less computational complexity than the RANSAC algorithm. This claim is supported by a comparison of the runtime between our proposed method and the other image matching methods
based on RANSAC and modified methods. By putting (6) into (5) and defining the vector $o = \{o_i^T\}_{i=1}^{N}$, $o_i = (o_{ix}, o_{iy})$, the problem proposed in this article is obtained. Before proposing our problem, it should be noted that, first, for the feature points mapped without errors in the geometric model, the vector $O$ should be zero; this means that these feature points are inliers. However, for the other feature points, the vector has a nonzero value, meaning that these feature points have a large deviation from the calculated model. Second, the number of outlier feature points is much lower than the number of feature points that satisfy the geometric model (the inliers). Therefore, the vector $O$ has values close to zero. Moreover, $X_a = \{x_{ai}^T\}_{i=1}^{N}$, $x_{ai} = (x_{1i}, x_{2i}, 1)$, contains the measured values of the test image point coordinates and has no effect on the optimization. In the following, the optimization problem should be carried out iteratively, alternating between selecting the sparse
dictionary and updating the dictionary. Therefore, the problem is rewritten as follows:

$$\{\hat{A}^{(k+1)}, \hat{O}^{(k+1)}\} = \arg\min_{A,O}\left\{\frac{\lambda}{2}\|Y - A^{(k)}X_a - O^{(k)}\|_F^2 + \frac{1-\lambda}{2}\|O^{(k)}\|_{2,1}\right\}. \tag{8}$$

Because the proposed method is iterative, in (8), the variable $k$ represents the iteration number, which is updated after training to its $(k+1)$th value.
2) Solution of the Proposed Matching Problem: We solve the proposed problem in Sections III-B2a and III-B2b. In Section III-B2a, we solve the sparse dictionary selection problem, and in Section III-B2b, the dictionary learning is discussed.
a) Sparse dictionary selection: In this section, the sparse dictionary selection is performed using the matching model. For this purpose, the matrix $A$ is initialized with random values and is treated as fixed data at this stage. Therefore, it does not affect the optimization process, and (8) is rewritten as

$$\{\hat{o}_l^{(k+1)}\} = \arg\min_{O}\left\{\frac{\lambda}{2}\|Y - A^{(k)}X_a - o_l^{(k)}\|_F^2 + \frac{1-\lambda}{2}\|o_l^{(k)}\|_{2,1}\right\}. \tag{9}$$
As shown in (9), to calculate the $l$th row of the matrix $O$ in the $(k+1)$th iteration, the rows of the dictionary $A$ in the $k$th iteration are used. To solve (9), the gradient is calculated with respect to $O$ and set to zero. Therefore, the value of $\hat{o}_l$ is calculated from (9) by

$$-\frac{\lambda}{2}\left(Y - A^{(k)}X_a - o_l^{(k)}\right) + \frac{1-\lambda}{2}\,\frac{o_l^{(k)}}{\|o_l^{(k)}\|_2} = 0$$
$$\Rightarrow\; -\frac{\lambda}{2}\left(Y - A^{(k)}X_a\right) + o_l^{(k)}\left(\frac{\lambda}{2} + \frac{1-\lambda}{2\|o_l^{(k)}\|_2}\right) = 0$$
$$\Rightarrow\; Y - A^{(k)}X_a = o_l^{(k)}\left(1 + \frac{1-\lambda}{\lambda\|o_l^{(k)}\|_2}\right)$$

and, defining $\beta = \left(1 + \frac{1-\lambda}{\lambda\|o_l^{(k)}\|_2}\right)^{-1}$,

$$o_l^{(k)} = \beta\left(Y - A^{(k)}X_a\right). \tag{10}$$

By substituting $o_l^{(k)} = \beta(Y - A^{(k)}X_a)$ into $o_l^{(k)}\left[1 + \frac{1-\lambda}{\lambda\|o_l^{(k)}\|_2}\right] = Y - A^{(k)}X_a$, (10) can be rewritten as

$$\beta\left(1 + \frac{1-\lambda}{\beta\lambda\|Y - A^{(k)}X_a\|_2}\right) = 1 \;\Rightarrow\; \beta + \frac{1-\lambda}{\lambda\|Y - A^{(k)}X_a\|_2} = 1 \;\Rightarrow\; \beta = 1 - \frac{1-\lambda}{\lambda\|Y - A^{(k)}X_a\|_2}. \tag{11}$$

Using (10) and (11), $o_l^{(k)}$ can be defined as

$$o_l^{(k)} = \begin{cases} \left(1 - \dfrac{1-\lambda}{\lambda\|Y - A^{(k)}X_a\|_2}\right)\left(Y - A^{(k)}X_a\right), & \lambda > \gamma \\ 0, & \text{otherwise} \end{cases} \tag{12}$$

where $\gamma$ is defined as $1/(1 + \|Y - A^{(k)}X_a\|_2)$. In (12), the boundary condition is obtained by satisfying $\beta > 0 \Leftrightarrow 1 - \frac{1-\lambda}{\lambda\|Y - A^{(k)}X_a\|_2} > 0$. The value of each row of the matrix $O$ is computed using (12), and the value of $o_l^{(k)}$ minimizing (9) is selected as the sparse solution in the $k$th iteration.
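The thresholded selection rule (12) can be sketched per feature point as follows. This is our own per-column reading of (12), where each column of $O$ corresponds to one correspondence; the function and variable names are hypothetical:

```python
import numpy as np

def select_sparse_outliers(Y, A, Xa, lam):
    """Sparse dictionary selection step, per feature point, following (12):
    points whose residual is small keep a zero column in O (inliers), while
    large residuals get a shrunk copy of the residual (outliers)."""
    R = Y - A @ Xa                       # residuals, one column per point
    O = np.zeros_like(R)
    for i in range(R.shape[1]):
        r = R[:, i]
        norm_r = np.linalg.norm(r)
        gamma = 1.0 / (1.0 + norm_r)     # threshold gamma from (12)
        # Zero residual gives gamma = 1, so the branch is skipped (inlier).
        if lam > gamma:
            O[:, i] = (1.0 - (1.0 - lam) / (lam * norm_r)) * r
    return O

# Hypothetical example: one gross outlier among four correspondences.
A = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]])
Xa = np.array([[0.0, 1.0, 2.0, 3.0],
               [0.0, 1.0, 2.0, 3.0],
               [1.0, 1.0, 1.0, 1.0]])
Y = A @ Xa
Y[:, 2] += 20.0                          # corrupt the third correspondence
O = select_sparse_outliers(Y, A, Xa, lam=0.32)
print(np.nonzero(np.linalg.norm(O, axis=0))[0])   # indices flagged as outliers
```

Only the corrupted column exceeds the threshold, so only its column of `O` becomes nonzero; the shrinkage factor keeps points with a slight error from being eliminated, which is the motivation for the $l_{2,1}$ norm above.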
b) Dictionary learning: After selecting the sparse dictionary, the outlier data are roughly determined. Then, using dictionary learning, we obtain an accurate model to reduce the matching error rate. The initial outliers are eliminated in the sparse dictionary selection step since they are far from the other feature points. Therefore, applying dictionary learning to the data with fewer errors leads to an improved matching model.
In accordance with that, we focus on the dictionary learning process to improve the matching model. At this stage, (13) can be written via the replacement $o^{(k)} \rightarrow o^{(k+1)}$ for dictionary learning as

$$\hat{A}^{(k+1)} = \arg\min_{A}\left\{\frac{\lambda}{2}\|Y - A^{(k)}X_a - O^{(k+1)}\|_F^2 + \frac{1-\lambda}{2}\|O^{(k+1)}\|_{2,1}\right\}$$

and, with $Z = Y - O^{(k+1)}$ and $\frac{1-\lambda}{2}\|O^{(k+1)}\|_{2,1} = \text{const}$,

$$\hat{A}^{(k+1)} = \arg\min_{A}\left\{\frac{\lambda}{2}\|Z - A^{(k)}X_a\|_F^2 + \text{const}\right\} \tag{13}$$
where $o^{(k+1)}$ is the newly obtained value and $o^{(k)}$ is the previous value of the variable $o$.

Algorithm 1 Image Matching Using Sample Consensus Based on Sparse Coding
Input: the feature position sets of the test and reference remote sensing images $X_a$, $Y$; the matrices $O$ and $A$
Initialization: $O = 0$, $A = \mathrm{Rand}(2,3)$, $\lambda = 0.32$
Output: the selected features, which are used for the matching process
repeat
  for $l = 1, 2, \ldots, N$ do
    if $\lambda > \dfrac{1}{1 + \|Y - A^{(k)}X_a\|_2}$ then
      $o_l^{(k)} = \left(1 - \dfrac{1-\lambda}{\lambda\|Y - A^{(k)}X_a\|_2}\right)\left(Y - A^{(k)}X_a\right)$
    else
      $o_l^{(k)} = 0$
    end if
  end for
  $O \leftarrow O^{(k+1)}$
  $A^{*} = (Y - O)\,X_a^T\left(X_a X_a^T\right)^{-1}$
  $\mathrm{Support}(O) = \sum_{h=1}^{N}\left(\lim_{p \to 0}\sum_{k=1}^{m}|o_{k,h}|^p\right) = \#\{i : o_i \neq 0\}$
  $\mu(A) = \max_{1 \le i,j \le m,\, i \neq j}\dfrac{|a_i^T a_j|}{\|a_i\|_2 \cdot \|a_j\|_2}$
  $\mathrm{Er}\_A = \|A - A^{*}\|_2^2$, $\quad A \leftarrow A^{*}$
until $\mathrm{Support}(O) < \frac{1}{2}\left(1 + 1/\mu(A)\right)$ OR $\mathrm{Er}\_A \le 0.001$

TABLE I
ZY-3 IMAGING SENSOR FEATURES

At this stage, considering that the optimization is performed with respect to the dictionary (matrix $A$), the matrix $O$ is treated as constant and has no effect on the optimization. In order to update the
dictionary, the gradient of (13) is calculated with respect to $A$ and set to zero. Therefore, it can be derived as

$$\frac{\partial}{\partial A}\left(\frac{\lambda}{2}\|Z - A^{(k)}X_a\|_F^2 + \text{const}\right) = 0
\;\Rightarrow\; -\left(Z - A^{(k)}X_a\right)X_a^T = 0
\;\Rightarrow\; Z X_a^T = A^{(k)} X_a X_a^T$$
$$\Rightarrow\; A = Z X_a^T\left(X_a X_a^T\right)^{-1}
\;\xrightarrow{\;Z \,=\, Y - O^{(k+1)}\;}\;
A = \left(Y - O^{(k+1)}\right)X_a^T\left(X_a X_a^T\right)^{-1}. \tag{14}$$
c) Pseudocode algorithm for proposed image matching: The pseudocode for image matching using sample consensus based on sparse coding is given in Algorithm 1.
As shown in Algorithm 1 (pseudocode), the initial input values for implementing the proposed method are the feature point coordinates of the reference image, indicated as $Y$, and $X_a$, which denotes the test image coordinates, extracted by the SIFT feature algorithm. The matrix $O$ represents the outlier data and is initialized as the zero matrix in the first step. The affine transformation matrix $A$ is initialized randomly to prevent reaching a local optimum, and $\lambda$ corresponds to a tradeoff between the sparsity of the matrix $O$ and the reconstruction error $(Y - AX - O)$. After examining different values of $\lambda$ on several images, the optimum value of $\lambda$ for image matching using the proposed algorithm is 0.27.
As shown in Algorithm 1 (pseudocode), and as detailed and proved in Section III-A, the proposed method consists of two parts: sparse dictionary selection and dictionary learning. In the dictionary selection step, the element values of the matrix $O$ are selected using the dictionary. In the next step, the dictionary is trained using the matrix $O$. This iterative operation runs until one of two convergence conditions is satisfied. In the following, we describe the convergence conditions. In terms of convergence, the parameter $\mathrm{Support}(\cdot)$ is defined as the number of nonzero elements of a matrix, known as $\|\cdot\|_0$. In addition, the parameter $\mu(\cdot)$ is introduced as the mutual coherence, which indicates the correlation of the columns of the matrix $A$ [29].
In terms of convergence, we consider two criteria: first, whether the matrix $O$ is sufficiently sparse; second, if the matrix $O$ does not achieve sparsity, whether the matrix $A$ (denoting the affine transform) has converged. The reason for using two conditions is that the sparsity condition is very strict and may require many iterations; however, when the convergence condition on the matrix $A$ is met, an acceptable solution can still be obtained.
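The two stopping criteria can be sketched as follows. This is our own reading of Algorithm 1; the function names and the zero tolerance are assumptions:

```python
import numpy as np

def support(O, tol=1e-12):
    # Number of nonzero columns of O: the || . ||_0 count used for convergence.
    return int((np.linalg.norm(O, axis=0) > tol).sum())

def mutual_coherence(A):
    # Largest normalized inner product between distinct columns of A.
    An = A / np.linalg.norm(A, axis=0)
    G = np.abs(An.T @ An)
    np.fill_diagonal(G, 0.0)
    return G.max()

def converged(O, A_new, A_old, tol=1e-3):
    # Stop when O is sparse enough, or when the affine model stops changing.
    sparse_enough = support(O) < 0.5 * (1.0 + 1.0 / mutual_coherence(A_new))
    model_stable = np.linalg.norm(A_new - A_old) ** 2 <= tol
    return sparse_enough or model_stable

A_example = np.array([[1.0, 0.0, 1.0],
                      [0.0, 1.0, 1.0]])
print(mutual_coherence(A_example))   # max normalized column correlation
```

For a highly coherent dictionary the sparsity bound $\frac{1}{2}(1 + 1/\mu(A))$ approaches 1, which is why the text calls the sparsity condition strict and falls back on the stability of $A$.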
C. SIFT
The SIFT features are suitable for image matching because they are invariant to scale, rotation, and changes in brightness [8].
D. Improved SIFT Features
In this article, the improved SIFT features are also used to
compare with other methods including IS-SIFT [17], UR-SIFT
[18], MS-SIFT [21], and A2-SIFT [30].
IV. PROPOSED METHOD EVALUATION AND
EXPERIMENTAL RESULTS
In this section, we first implement our proposed method on a group of remote sensing images, which is described in detail, and compare the matching rate with those of other methods. In the next step, only the results of comparing our proposed method with other methods on further groups of remote sensing images are reported. All these experiments show the improvements of our proposed method compared to other remote sensing image matching methods.
A. Experimental Results for ZY-3 Remote Sensing Images
In the first experiment, our proposed method is compared, on images obtained from the ZiYuan-3 (ZY-3) remote sensing satellite, with the method proposed by Tong [14], introduced as SVD-RANSAC. The ZY-3 imaging sensor features are shown in Table I. On that basis, the experimental conditions and their implementation are reviewed.
Before comparing our proposed method with the SVD-
RANSAC, it is necessary to simulate the images as described
in the reference. After that, the results of the proposed method
are compared with the method described in this reference,
along with the Hoge [31], Stone [32], Foroosh [33], Leprince
[34], Nagashima [35], and Guizar [36] methods. In order to
correctly compare our proposed method with the reference
methods, the process similar to the implementation process in
[14] is performed on the images. The ZY-3 remote sensing
satellite images whose specifications are shown in Table II
are used to compare the results of our proposed method with
those of the reference methods.
As shown in Table II, the images were obtained from different areas of the Earth with different regional features. In Fig. 1, the images used in this experiment are shown. As described in [14], the reference images are converted into new images by adding noise, constructing 450 images from these six reference images. Aliasing occurs when the sensor sampling rate does not satisfy the Nyquist criterion [36]. The mentioned cases are considered to be negative factors in signal processing [37].
TABLE II
ZY-3 REMOTE SENSING SATELLITE IMAGES SPECIFICATIONS
Fig. 1. ZY-3 satellite imagery presented in Table II.
TABLE III
RUNTIME AND MEAN ABSOLUTE PIXEL ERROR COMPARISON FOR THE DATABASE SIMULATED WITH ALIASING ERROR
1) Making Images With Aliasing Error: To add aliasing errors to the reference images, subpixel shifts are used based on low-pass filtering and downsampling. For the aliasing error, the results are compared to the Hoge, Stone, Foroosh, Leprince, Nagashima, and Guizar methods. For all 150 simulated images, the calculated values are reported in Fig. 2.
As shown in Fig. 2, our SCSC method performs well in the presence of the aliasing error and achieves a clear improvement in the results. Fig. 2 also demonstrates that the Foroosh method has the highest sensitivity to the aliasing error, followed by the Nagashima and Guizar methods, compared to the other methods. In contrast, the SCSC has the highest resistance to the aliasing error, followed by the SVD-RANSAC, Stone, and Hoge methods. However, the results for the SCSC and SVD-RANSAC are relatively close to each other.
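The aliasing-error simulation described above (shift, low-pass filter, then downsample) can be sketched as follows. This is our hypothetical reading of the protocol in [14], not the authors' exact pipeline, and all names are ours:

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    # 1-D Gaussian kernel, normalized to sum to one.
    if radius is None:
        radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def simulate_aliased(image, sigma=1.0, factor=2, shift=1):
    """Hypothetical sketch of the test-image generation: shift by a few
    pixels, low-pass filter with a Gaussian, then downsample. Too weak a
    filter for the chosen factor leaves aliasing in the result."""
    shifted = np.roll(image, shift, axis=(0, 1))
    k = gaussian_kernel(sigma)
    # Separable blur: filter rows, then columns.
    blurred = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, shifted)
    blurred = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, blurred)
    return blurred[::factor, ::factor]

img = np.tile(np.eye(8), (4, 4))          # toy high-frequency test pattern
small = simulate_aliased(img, sigma=1.0, factor=2)
print(small.shape)
```

Varying `sigma` relative to the downsampling `factor` controls how much aliasing survives, mirroring the sensitivity sweep reported for the simulated database.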
Fig. 2. Comparison of the calculated error values including (a) mean value,
(b) rms, (c) max value, and (d) standard deviation.
Fig. 3. Comparison of the calculated error values including (a) mean value,
(b) rms, (c) max value, and (d) standard deviation. The output of the Hoge
method was cut off for better values display.
Fig. 4. Comparison of the methods in terms of (a) SVD-RANSAC, (b)
Guizar, and (c) SCSC for three images with aliasing error and Gaussian filter
with σ=1.
2) Making Images With Additive Noise: In this section,
a new image database is created by additive white Gaussian
noise (AWGN) with a zero mean and different variances.
For this purpose, the second three images (images 4, 5,
and 6 of the ZY-3 images) are used. The results of this
TABLE IV
COMPARISON OF THE CORRECT MATCHING RATE BETWEEN THE SCSC METHOD AND THE LOWE AND YOUKYUNG METHODS, EXTRACTED FROM [17]
TABLE V
COMPARISON OF THE BIAS AND STANDARD DEVIATION VALUES BETWEEN THE SCSC AND THE LOWE AND YOUKYUNG METHODS
experiment are reported in Fig. 3. As can be seen in Fig. 3, the SCSC is robust against noisy images and provides an appropriate improvement in the results compared to all the other methods. Moreover, Fig. 3 shows that the Foroosh, Stone, Hoge, and Nagashima methods are more sensitive to noise than the other methods; Hoge, unlike its good behavior against the aliasing error, has the highest sensitivity to noise. However, the results for the SCSC, SVD-RANSAC, and Leprince methods are relatively close to each other; the resistance of the SVD-RANSAC and Leprince methods fluctuates in a ripple pattern as the noise level varies.
All the experiments are run in MATLAB on a computer with an Intel Core i3-2120 CPU at 3.3 GHz. The runtime and mean absolute pixel error are calculated using the database generated with the aliasing error for the SCSC, SVD-RANSAC, Hoge, Stone, Foroosh, Leprince, Nagashima, and Guizar methods, in which the variance value σ varied between 1 and 5 with a unit step. It should be noted that the runtime is accumulated over the 150 images with five different σ values, which, in total, amounts to 750 repetitions of the algorithm. The comparison of the runtimes is shown in Table III.
As shown in Table III, the SCSC runtime is relatively low: roughly half that of the SVD-RANSAC method, which is the closest to the proposed method in terms of the simulation results. The Foroosh, Hoge, and Nagashima methods have lower runtimes than the SCSC, but, as shown in Table III and Figs. 2 and 3, they are far more sensitive than the SCSC method on the databases simulated with added noise and aliasing distortion. The comparison between the proposed method and the SVD-RANSAC and Guizar methods is shown in Fig. 4, where the numbers of correct and false matches are counted through visual inspection. This experiment uses three images with aliasing error and σ = 1.
TABLE VI
COMPARISON OF MATCHING ACCURACY VALUES BETWEEN THE SCSC WITH LOWE AND YOUKYUNG METHODS
TABLE VII
COMPARISON OF THE IMAGE MATCHING BETWEEN THE SCSC METHOD WITH THE PREREGISTRATION, IS-SIFT, SPSA, AND GONG METHODS USING [18]
B. Experimental Results and Evaluation of the Proposed
Method for QuickBird-2, IKONOS-2, and
KOMPSAT-2 Remote Sensing Images
In this section, we compare our proposed method with that of Youkyung et al. [13], who selected the correct points by an optimization that uses both the distribution of the matching points and the reliability of the transformation model, based on SIFT features. For testing, images of the Daejeon region of South Korea are obtained from the QuickBird-2, IKONOS-2, and KOMPSAT-2 satellites. Further details are reported in [13, Table I]. In the first evaluation, the SCSC method is compared to the Youkyung and Lowe methods, as reported in Table IV.
TABLE VIII
COMPARISON OF THE IMAGE MATCHING BETWEEN THE SCSC METHOD WITH THE STANDARD SIFT, IS-SIFT, UR-SIFT, AND SOURABH METHODS FROM TABLE II [21]
TABLE IX
COMPARISON OF THE MATCHING RATE BETWEEN THE SIFT, HAN, AND SCSC METHODS USING THE OBTAINED INFORMATION FROM TABLE II [34]
As shown in Table IV, the proposed method (SCSC) provides more correct matching points than the Lowe and Youkyung methods, and Lowe's method is by far the weakest. Moreover, the SCSC method extracts fewer matching points than the other two methods, which leads to a significantly higher matching rate. In another evaluation, we compare the bias and standard deviation of the extracted feature points along the x- and y-axes; the results are reported in Table V. Furthermore, the RMSE, computed from the information in Table V, is reported in Table VI. The values for the Lowe and Youkyung methods are taken from [13, Table 4].
As shown in Tables V and VI, the SCSC method improves the matching rate. The matching accuracy between the IKONOS-2 and KOMPSAT-2 images shows the least improvement, which is due to the large difference in imaging angle between these two images (28.02°).
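For reference, the bias, standard deviation, and RMSE statistics reported in Tables V and VI can be computed from the matched-point residuals as in the following minimal Python sketch (the sample residuals are made up for illustration; per axis, RMSE² = bias² + std² when the population standard deviation is used):

```python
import numpy as np

def residual_statistics(dx, dy):
    """Bias, standard deviation, and RMSE of matched-point residuals.

    dx, dy: residuals of the matched points along the x- and y-axes.
    The RMSE is computed from the raw residuals; for each axis it also
    satisfies rmse_axis**2 == bias**2 + std**2 (population std).
    """
    dx, dy = np.asarray(dx, float), np.asarray(dy, float)
    return {
        "bias_x": dx.mean(), "bias_y": dy.mean(),
        "std_x": dx.std(),   "std_y": dy.std(),
        "rmse": np.sqrt(np.mean(dx**2 + dy**2)),
    }

# Toy residuals (illustrative only, not values from the paper's tables).
s = residual_statistics([0.1, -0.3, 0.2], [0.0, 0.4, -0.1])
```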
C. Experimental Results for Landsat TM and SPOT
In this experiment, the results of the SCSC method are compared to the method proposed by Gong et al. [20], which uses the Marquardt–Levenberg search strategy [39]. The results are reported in Table VII.
As can be seen from Table VII, the proposed method (SCSC) has the lowest error rate among the compared methods. The SCSC runtime is close to that of the preregistration method, but the SCSC achieves a much better matching rate.
D. Experimental Results for Landsat TM, ETM+, and EO-1
In this experiment, we compare the SCSC method to the method proposed by Sourabh et al. [21], an improved SIFT variant that builds matching features with a uniform distribution. Sourabh used three pairs of images; the specifications of the images and sensors can be found in [21]. To evaluate the proposed method, we use several evaluation criteria similar to [21]: the rms error over all extracted matching points, RMSall; the rms error of the matched-point residuals based on the leave-one-out method [41], RMSLOO; the statistical measure of the residual distribution over quadrants, pquad; the bad-point proportion with norm 1, BPP(1); the statistical measure of the presence of residuals along a preferred axis, Skew; and the statistical measure of the feature-point distribution, Scat [42]. The experimental results are reported in Table VIII, where dratio denotes the ratio of the distances to the first and second nearest neighbors; its values equal the dratio values in [21], which are compared in Table VIII.
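Lowe's ratio criterion underlying dratio can be sketched as follows (a Python illustration with toy descriptors; the 0.8 threshold is an assumed example value, not the setting used in [21]):

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, d_ratio=0.8):
    """Nearest-neighbor matching with Lowe's ratio test: a descriptor in
    image A is matched to its nearest neighbor in image B only when the
    distance to the first neighbor is at most d_ratio times the distance
    to the second neighbor. Requires at least two descriptors in desc_b.
    """
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        first, second = np.argsort(dists)[:2]
        if dists[first] <= d_ratio * dists[second]:
            matches.append((i, int(first)))
    return matches

# Toy 2-D "descriptors" (real SIFT descriptors are 128-dimensional).
desc_a = np.array([[0.0, 0.0], [5.0, 5.0]])
desc_b = np.array([[0.0, 0.1], [3.0, 3.0], [5.0, 5.0]])
matches = ratio_test_matches(desc_a, desc_b)
```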
As shown in Table VIII, our proposed method improves the values of RMSall, RMSLOO, pquad, and Scat compared to the other methods.
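Under the usual definitions, RMSall and RMSLOO can be computed from a transformation model fitted to the matches, as in this hedged Python sketch (the 2-D affine model and the least-squares fit are illustrative choices; [21] and [41] define the exact procedure):

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2-D affine transform mapping src -> dst."""
    A = np.hstack([src, np.ones((len(src), 1))])      # n x 3 design matrix
    params, *_ = np.linalg.lstsq(A, dst, rcond=None)  # 3 x 2 parameters
    return params

def rms_all(src, dst):
    """RMS residual of a model fitted to all matching points."""
    params = fit_affine(src, dst)
    pred = np.hstack([src, np.ones((len(src), 1))]) @ params
    return np.sqrt(np.mean(np.sum((pred - dst) ** 2, axis=1)))

def rms_loo(src, dst):
    """Leave-one-out RMS: each point is scored by a model fit without it."""
    n = len(src)
    errs = []
    for i in range(n):
        keep = np.arange(n) != i
        params = fit_affine(src[keep], dst[keep])
        pred = np.append(src[i], 1.0) @ params
        errs.append(np.sum((pred - dst[i]) ** 2))
    return np.sqrt(np.mean(errs))

# Synthetic check: points related by an exact affine map give ~zero error.
src = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.], [2., 1.]])
dst = src @ np.array([[2., 0.], [0., 3.]]) + np.array([1., -1.])
```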
E. Experimental Results for QuickBird-Pan, QuickBird-Multi, and Ikonos-Pan Remote Sensing Images
As the last experiment, we compare the method proposed by Han [34] with our method on remote sensing images obtained from the QuickBird pan and multi sensors and the Ikonos pan sensor. In this experiment, six images of the Daejeon region of South Korea are used, captured on six
TABLE X
COMPARISON OF THE MATCHING BETWEEN THE SIFT, HAN, AND SCSC METHODS FOR THREE PAIRS OF QUICKBIRD AND IKONOS REMOTE SENSING IMAGES USING THE OBTAINED INFORMATION FROM TABLE III [43]
dates and at different angles, and grouped as Site1, Site2, and Site3. The features of these images are given in [43, Table I]. In Table IX, the results of the SCSC method are compared to the SIFT and Han methods (values from [43, Table II]). As shown in Table IX, the SCSC method extracts fewer matching points than the Han method but more correct matching points than the Han and other comparable methods, which yields a better matching rate than the Han and SIFT methods. Finally, the RMSE of our proposed method is compared to that of the Han and SIFT methods in Table X; the SCSC achieves the lowest values.
V. CONCLUSION
In this article, a new method was proposed for remote sensing image matching using sparse coding. In the first step, the image features were extracted using the SIFT algorithm. In the next step, affine transformation and sparse coding were combined into a model for removing outlier points and choosing correct matching points. The outliers were then removed using optimization and MAP estimation. The algorithm was implemented iteratively in two parts: sparse dictionary selection and dictionary learning. The results of the proposed method on several remote sensing images obtained from different satellites (ZY-3, QuickBird-2, IKONOS-2, KOMPSAT-2, SPOT, TM, ETM+, EO-1, QuickBird-pan, QuickBird-multi, and Ikonos-pan) were compared to several recent image matching methods (SVD-RANSAC, Hoge, Stone, Foroosh, Leprince, Nagashima, Guizar, Youkyung, Lowe, Preregistration, IS-SIFT, SPSA, Gong, standard SIFT, UR-SIFT, Sourabh, and Han). In the majority of cases, our proposed method (the SCSC method) outperformed the other methods.
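To make the fit-and-prune structure of such consensus schemes concrete, the following Python sketch alternates an affine fit with removal of the worst match; it is a deliberately simplified stand-in, not the SCSC's actual sparse-coding/MAP formulation:

```python
import numpy as np

def prune_matches(src, dst, threshold=1.0):
    """Greedy stand-in for an outlier-removal loop: repeatedly fit a 2-D
    affine model to the kept matches and drop the worst match until all
    residuals fall below `threshold`. (The actual SCSC method alternates
    sparse dictionary selection and dictionary learning under a MAP
    criterion; only the fit/prune alternation is mirrored here.)
    """
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    keep = np.ones(len(src), dtype=bool)
    while keep.sum() > 3:                      # an affine fit needs 3 points
        A = np.hstack([src[keep], np.ones((keep.sum(), 1))])
        params, *_ = np.linalg.lstsq(A, dst[keep], rcond=None)
        resid = np.linalg.norm(A @ params - dst[keep], axis=1)
        if resid.max() < threshold:
            break                              # all kept matches agree
        worst = np.flatnonzero(keep)[np.argmax(resid)]
        keep[worst] = False                    # discard the worst match
    return keep

# Synthetic check: five matches follow an exact shift; one is a gross outlier.
src = np.array([[0., 0.], [4., 0.], [0., 4.], [4., 4.], [2., 0.], [2., 2.]])
dst = src + np.array([3., -2.])
dst[5] += np.array([10., -10.])                # corrupt the last match
keep = prune_matches(src, dst)
```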
REFERENCES
[1] W. Ma et al., “Remote sensing image registration with modified sift and
enhanced feature matching,” IEEE Geosci. Remote Sens. Lett., vol. 14,
no. 1, pp. 3–7, Jan. 2017.
[2] Z. Yang, Y. Yang, K. Yang, and Z.-Q. Wei, “Non-rigid image registration with dynamic Gaussian component density and space curvature preservation,” IEEE Trans. Image Process., vol. 28, no. 5, pp. 2584–2598, May 2019.
[3] Q. Zeng, J. Adu, J. Liu, J. Yang, Y. Xu, and M. Gong, “Real-time adaptive visible and infrared image registration based on morphological gradient and C_SIFT,” J. Real-Time Image Process., Mar. 2019, doi: 10.1007/s11554-019-00858-x.
[4] P. Etezadifar and H. Farsi, “Scalable video summarization via sparse dictionary learning and selection simultaneously,” Multimedia Tools Appl., vol. 76, no. 6, Mar. 2017.
[5] G. Liu, Y. Gousseau, and F. Tupin, “A contrario comparison of local
descriptors for change detection in very high spatial resolution satellite
images of urban areas,” IEEE Trans. Geosci. Remote Sens., vol. 57,
no. 6, pp. 3904–3918, Jun. 2019.
[6] C. Huo, C. Pan, L. Huo, and Z. Zhou, “Multilevel SIFT matching for
large-size VHR image registration,” IEEE Geosci. Remote Sens. Lett.,
vol. 9, no. 2, pp. 171–175, Mar. 2012.
[7] L. Wu, S. C. H. Hoi, and N. Yu, “Semantics-preserving bag-of-words
models and applications,” IEEE Trans. Image Process., vol. 19, no. 7,
pp. 1908–1920, Jul. 2010.
[8] D. G. Lowe, “Object recognition from local scale-invariant features,” in
Proc. IEEE Int. Conf. Comput. Vis., vol. 2, Sep. 1999, pp. 1150–1157.
[9] J. Joglekar, S. S. Gedam, and B. K. Mohan, “Image matching using SIFT features and relaxation labeling technique—A constraint initializing method for dense stereo matching,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 9, pp. 5643–5652, Sep. 2014.
[10] H. Li, J. Qin, X. Xiang, L. Pan, W. Ma, and N. N. Xiong, “An efficient
image matching algorithm based on adaptive threshold and RANSAC,”
IEEE Access, vol. 6, pp. 66963–66971, 2018.
[11] N.-T. Tran et al., “On-device scalable image-based localization via prioritized cascade search and fast one-many RANSAC,” IEEE Trans. Image Process., vol. 28, no. 4, pp. 1675–1690, Apr. 2019.
[12] J. Ma, J. Zhao, J. Tian, A. L. Yuille, and Z. Tu, “Robust point matching
via vector field consensus,” IEEE Trans. Image Process., vol. 23, no. 4,
pp. 1706–1721, Apr. 2014.
[13] Y. Han, J. Choi, Y. Byun, and Y. Kim, “Parameter optimization for
the extraction of matching points between high-resolution multisensor
images in urban areas,” IEEE Trans. Geosci. Remote Sens., vol. 52,
no. 9, pp. 5612–5621, Sep. 2014.
[14] X. Tong et al., “A novel subpixel phase correlation method using
singular value decomposition and unified random sample consensus,”
IEEE Trans. Geosci. Remote Sens., vol. 53, no. 8, pp. 4143–4156,
Aug. 2015.
[15] Y. He et al., “Optimization of SIFT algorithm for fast-image feature extraction in line-scanning ophthalmoscope,” Optik, vol. 152, pp. 21–28, Jan. 2018.
[16] W. Yu et al., “Hierarchical semantic image matching using CNN
feature pyramid,” Comput. Vis. Image Understand., vol. 169,
pp. 40–51, Apr. 2018.
[17] H. Gonçalves, L. Corte-Real, and A. Gonçalves, “Automatic image registration through image segmentation and SIFT,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 7, pp. 2589–2600, Jul. 2011.
[18] A. Sedaghat, M. Mokhtarzade, and H. Ebadi, “Uniform robust scale-invariant feature matching for optical remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 11, pp. 4516–4527, Nov. 2011.
[19] Y. Li et al., “A spatial-spectral SIFT for hyperspectral image matching
and classification,” Pattern Recognit. Lett., to be published.
[20] M. Gong, S. Zhao, L. Jiao, D. Tian, and S. Wang, “A novel coarse-
to-fine scheme for automatic image registration based on SIFT and
mutual information,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 7,
pp. 4328–4338, Jul. 2014.
[21] S. Paul and U. C. Pati, “Remote sensing optical image registration using modified uniform robust SIFT,” IEEE Geosci. Remote Sens. Lett., vol. 13, no. 9, pp. 1300–1304, Sep. 2016.
[22] B. Kufer, N. S. Netanyahu, and I. Shimshoni, “An efficient SIFT-based
mode-seeking algorithm for sub-pixel registration of remotely sensed
images,” IEEE Geosci. Remote Sens. Lett., vol. 12, no. 2, pp. 379–383,
Feb. 2015.
[23] D. Scharstein, R. Szeliski, and R. Zabih, “A taxonomy and evaluation of dense two-frame stereo correspondence algorithms,” Int. J. Comput. Vis., vol. 47, no. 1, pp. 5–45, 2002.
[24] M. Yuan and Y. Lin, “Model selection and estimation in regression with
grouped variables,” J. Roy. Statist. Soc. B, Statist. Methodol., vol. 68,
no. 1, pp. 49–67, 2006.
[25] R. Tibshirani, “Regression shrinkage and selection via the lasso,” J. Roy.
Statist. Soc. B, Methodol., vol. 58, no. 1, pp. 267–288, 1996.
[26] Y. Cong et al., “Sparse reconstruction cost for abnormal event detection,”
in Proc. IEEE Conf. CVPR, Jun. 2011, pp. 3449–3456.
[27] V. Rodehorst and O. Hellwich, “Genetic algorithm SAmple consensus
(GASAC)—A parallel strategy for robust parameter estimation,” in Proc.
IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPRW),
Jun. 2006, p. 103.
[28] M. Berger, Problems in Geometry. New York, NY, USA:
Springer-Verlag, 1984.
[29] M. Elad, Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing. New York, NY, USA: Springer-Verlag, 2010.
[30] A. Lingua, D. Marenchino, and F. Nex, “Performance analysis of the
SIFT operator for automatic feature extraction and matching in pho-
togrammetric applications,” Sensors, vol. 9, pp. 3745–3766, May 2009.
[31] W. S. Hoge, “A subspace identification extension to the phase corre-
lation method,” IEEE Trans. Med. Imag., vol. 22, no. 2, pp. 277–280,
Feb. 2003.
[32] H. S. Stone, M. T. Orchard, E.-C. Chang, and S. A. Martucci, “A fast
direct Fourier-based algorithm for subpixel registration of images,” IEEE
Trans. Geosci. Remote Sens., vol. 39, no. 10, pp. 2235–2243, Oct. 2001.
[33] H. Foroosh, J. B. Zerubia, and M. Berthod, “Extension of phase cor-
relation to subpixel registration,” IEEE Trans. Image Process., vol. 11,
no. 3, pp. 188–200, Mar. 2002.
[34] S. Leprince, S. Barbot, F. Ayoub, and J.-P. Avouac, “Automatic and pre-
cise orthorectification, coregistration, and subpixel correlation of satellite
images, application to ground deformation measurements,” IEEE Trans.
Geosci. Remote Sens., vol. 45, no. 6, pp. 1529–1558, Jun. 2007.
[35] S. Nagashima, T. Aoki, T. Higuchi, and K. Kobayashi, “A subpixel image
matching technique using phase-only correlation,” in Proc. IEEE Int.
Symp. Intell. Signal Process. Commun. Syst., Dec. 2006, pp. 701–704.
[36] M. Guizar-Sicairos, S. T. Thurman, and J. R. Fienup, “Efficient subpixel
image registration algorithms,” Opt. Lett., vol. 33, no. 2, pp. 156–158,
2008.
[37] H. Nyquist, “Certain topics in telegraph transmission theory,” Trans.
Amer. Inst. Elect. Eng., vol. 47, no. 2, pp. 617–644, Apr. 1928.
[38] X. Dong et al., “Noise estimation of hyperspectral remote sensing image
based on multiple linear regression and wavelet transform,” Boletim de
Ciências Geodésicas, vol. 19, no. 4, pp. 639–652, 2013.
[39] P. Thévenaz and M. Unser, “Optimization of mutual information for multiresolution image registration,” IEEE Trans. Image Process., vol. 9, no. 12, pp. 2083–2099, Dec. 2000.
[40] M. Schneider et al., “Matching of high-resolution optical data to a
shaded DEM,” Int. J. Image Data Fusion, vol. 3, no. 2, pp. 111–127,
2012.
[41] E. M. Mikhail, J. S. Bethel, and J. C. McGlone, Introduction to Modern
Photogrammetry. New York, NY, USA: Wiley, 2001.
[42] H. Gonçalves, J. A. Gonçalves, and L. Corte-Real, “Measures for an objective evaluation of the geometric correction process quality,” IEEE Geosci. Remote Sens. Lett., vol. 6, no. 2, pp. 292–296, Apr. 2009.
[43] Y. K. Han et al., “Automatic registration of high-resolution images using local properties of features,” Photogramm. Eng. Remote Sens., vol. 78, no. 3, pp. 211–221, 2012.
Pouriya Etezadifar received the B.S., M.S., and Ph.D. degrees in communication engineering from the University of Birjand, Birjand, Iran, in 2011, 2013, and 2017, respectively.
Since 2017, he has been a Faculty Member with the Electrical Engineering Department, IHU University, Tehran, Iran, where he is an Assistant Professor. His main research areas are sparse signal processing, dictionary learning for sparse representation, machine learning for signal processing, blind source separation (BSS), statistical signal processing, information theory, and digital speech/video/image processing.
Hassan Farsi received the B.Sc. and M.Sc. degrees
from the Sharif University of Technology, Tehran,
Iran, in 1992 and 1995, respectively, and the Ph.D.
degree from the Centre of Communications Systems
Research (CCSR), University of Surrey, Guildford,
U.K., in 2004.
He currently works as a Professor of communication engineering with the Department of Electrical and Computer Engineering, University of Birjand, Birjand, Iran. He is interested in speech, image, and video processing over wireless communications.
... Because of its flexibility, high efficiency, and low cost, UAV (Unmanned Aerial Vehicle) remote sensing technology has gradually emerged in many fields, such as accurate agriculture, resources investigation, environment management, and disaster monitoring [1][2][3]. How to yield high-precision registered UAV images quickly has become an inevitable challenge to the wide application of UAV technology [4][5][6][7][8][9]. However, the high resolutions of UAV images have a great influence on the detection and matching of image feature points. ...
Article
Full-text available
Image registration plays a vital role in the mosaic process of multiple UAV (Unmanned Aerial Vehicle) images acquired from different spatial positions of the same scene. Aimed at the problem that many fast registration methods cannot provide both high speed and accuracy simultaneously for UAV visible light images, this work proposes a novel registration framework based on a popular baseline registration algorithm, ORB—the Oriented FAST (Features from Accelerated Segment Test) and Rotated BRIEF (Binary Robust Independent Elemental Features) algorithm. First, the ORB algorithm is utilized to extract image feature points fast. On this basis, two bidirectional matching strategies are presented to match obtained feature points. Then, the PROSRC (Progressive Sample Consensus) algorithm is applied to remove false matches. Finally, the experiments are carried out on UAV image pairs about different scenes including urban, road, building, farmland, and forest. Compared with the original version and other state-of-the-art registration methods, the bi-matching ORB algorithm exhibits higher accuracy and faster speed without any training or prior knowledge. Meanwhile, its complexity is quite low for on-board realization.
... The normalization function used to be designed in different ways, based on the application domain [45]. The efficiency of using sparse representation yields researchers to use it in different tasks such as image, speech, and video processing [8,30,48]. ...
Article
Full-text available
Due to the rapid increase of using surveillance cameras, it has become more important to re-identify persons on different non-overlapped cameras. Person re-identification is an important and challenging topic on machine vision and media processing. Few data for training, low quality of surveillance videos and varying position of persons among different cameras lead re-identification problems to be solved difficultly. This paper aims to introduce a new model which tries to overcome these challenges and to increase the person re-identification efficiency. The proposed method uses both hand-crafted and learned features by combining Convolutional Neural Network with Gaussian of Gaussian descriptor. Also, an arbitrary data augmentation is considered to train CNN more efficiently. After that, the person re-identification problem is modeled as a sparse problem which aims to find the best similar persons, avoiding in this way the need of metric learning algorithms. The proposed method is evaluated on three databases, namely CUHK01, CUHK03 and GRID. Experimental results show that the proposed method achieves better precision in most ranks compared to the some recent studies.
... SIFT algorithm was first proposed by Lowe [3], and the key points found by SIFT are some prominent points that do not change due to illumination, affine transformation, and noise, such as corner points, edge points, bright points in dark areas and dark points in bright areas. Because of such advantages, SIFT has been widely used to select interesting points for matching of remote sensing images [4][5][6]. However, SIFT requires the image to have enough textures, otherwise the constructed 128-dimensional vectors are not too differentiated. ...
Article
Full-text available
Extracting spatial objects and their key points from remote sensing images has attracted great attention of worldwide researchers in intelligent machine perception of the Earth’s surface. However, the key points of spatial objects (KPSOs) extracted by the conventional mask region-convolution neural network model are difficult to be sorted reasonably, which is a key obstacle to enhance the ability of machine intelligent perception of spatial objects. The widely distributed artificial structures with stable morphological and spectral characteristics, such as sports fields, cross-river bridges, and urban intersections, are selected to study how to extract their key points with a multihot cross-entropy loss function. First, the location point in KPSOs is selected as one category individually to distinguish morphological feature points. Then, the two categories of key points are arranged in order while maintaining internal disorder, and the mapping relationship between KPSOs and the prediction heat map is improved to one category rather than a single key point. Therefore, the predicted heat map of each category can predict all the corresponding key points at one time. The experimental results demonstrate that the prediction accuracy of KPSOs extracted by the new method is 80.6%, taking part area of Huai’an City for example. It is reasonable to believe that this method will greatly promote the development of intelligent machine perception of the Earth’s surface.
Article
Obtaining the earth-fixed coordinates is a fundamental requirement for long-distance unmanned aerial vehicle (UAV) flight. Global navigation satellite systems are the most common location model, but their signals are susceptible to interference from obstacles and complex electromagnetic environments. To solve this issue, a visual localization framework based on multi-source image feature learning (VL-MFL) is proposed. In the proposed framework, the UAV is located by mapping airborne images to the satellite images with absolute coordinate positions. Firstly, for the heterogeneity issues caused by the different imaging environments of drone and satellite images, a lightweight Siamese network based on 3-D attention mechanism is proposed to extract the consistent features from the multi-source images. Secondly, to overcome the problem of inaccurate localization caused by the large receptive field of traditional convolutional neural networks, the cell-divided strategy is imported to strengthen the position mapping relationship of multi-source images features. Finally, based on similarity measurement, a confidence evaluation mechanism is established and a search region prediction method is proposed, which is effectively improved the accuracy and efficiency in matching localization. To evaluate the location performance of the proposed framework, several related methods are compared and analysed in details. The results on the real-world datasets indicate that the proposed method has achieved outstanding location accuracy and real-time performance.
Chapter
In order to satisfy the demand of the number of matching points of satellite remote sensing images in the 3D reconstruction of geographic information space, a dense matching method for satellite remote sensing images based on multiple matching primitives is proposed in this paper, in which the matching algorithm of SIFT(Scale Invariant Feature Transform), the matching algorithm based on object space geometry constraint and the region matching point growing algorithm based on affine transformation are comprehensively used with multiple primitives. The experimental results show that the proposed method achieves the dense matching of satellite remote sensing images, which can meet the need of 3D reconstruction of geographic information space.KeywordsRemote sensingImage matchingMultiple primitivesObject space geometry constraintAffine transformation
Article
Scene matching navigation system (SMNS) remains challenging in many navigation tasks, which rely heavily on accuracy, computational efficiency, and robustness. Due to the different generation conditions of the matching images, it is difficult for traditional methods to cover every aspect of the three navigation performances. This paper aims at developing an accurate, fast, and robust SMNS based on vision/inertial fusion to provide complete navigation information for unmanned aerial vehicles (UAVs). Utilizing the mechanization results of the low-cost MEMS, the proposed system first completes the georeferencing of the real-time aerial images, in which the projection errors are reduced greatly by introducing an optimized factor to the homography matrix. Then, applying a robust noise processing strategy, an improved feature extraction algorithm is designed to eliminate most of the features that vary with climate, time, and season, which lays a solid foundation for the accuracy of the following matching procedure. Under the framework of the SMNS, a novel matching strategy based on logic graphs is designed, which can facilitate the matching procedure. Eventually, by combining the mechanization results of the MEMS and the matching results of the SMNS, the proposed system can provide complete navigation results. Experiments in typical and complex scenarios are carried out respectively to verify the effectiveness and robustness of the proposed system. Experimental results demonstrate that the proposed SMNS possesses accuracy, computational efficiency, and robustness, which outperforms the state-of-the-art strategies(i.e., HOPC, CFOG, PC) in terms of matching aerial and satellite images.
Article
Full-text available
Since the visible and infrared images have different imaging mechanisms, the difficulty of image registration has greatly increased. The grayscale difference between visible and infrared images is very disadvantageous for extracting feature points in homogenous region, but they both retain the obvious contour edge in the scene. After using the morphological gradient method, the grayscale edge of visible and infrared images can be obtained and their similarity is greatly improved, and their difference may be seen as the difference in brightness or grayscale. Therefore, we proposed a novel algorithm to realise real-time adaptive registration of visible and infrared images using morphological gradient and C_SIFT. Firstly, the morphological gradient method is used to extract the rough edges of visible and infrared images for aligning their visual features as a single similar type. Secondly, the C_SIFT feature detection operator is used to detect and extract feature points from the extracted edges. The C_SIFT uses the centroid method to describe the main direction of feature points, makes rotation invariance feasible. Finally, to verify the effectiveness of the proposed algorithm, we carried out a series of experiments in eight various scenarios. The experimental results show that the proposed algorithm has achieved good experimental results. The registration of visible and infrared images can be completed quickly by the proposed algorithm, and the registration accuracy is satisfactory.
Article
Full-text available
Change detection is a key problem for many remote sensing applications. In this paper, we present a novel unsupervised method for change detection between two high-resolution remote sensing images possibly acquired by two different sensors. This method is based on keypoints matching, evaluation, and grouping, and does not require any image co-registration. It consists of two main steps. First, global and local mapping functions are estimated through keypoints extraction and matching. Second, based on these mappings, keypoint matchings are used to detect changes and then grouped to extract regions of changes. Both steps are defined through an a contrario framework, simplifying the parameter setting and providing a robust pipeline. The proposed approach is evaluated on synthetic and real data from different optic sensors with different resolutions, incidence angles, and illumination conditions.
Article
Full-text available
We present the design of an entire on-device system for large-scale urban localization using images. The proposed design integrates compact image retrieval and 2D-3D correspondence search to estimate the location in extensive city regions. Our design is GPS agnostic and does not require network connection. In order to overcome the resource constraints of mobile devices, we propose a system design that leverages the scalability advantage of image retrieval and accuracy of 3D model-based localization. Furthermore, we propose a new hashing-based cascade search for fast computation of 2D-3D correspondences. In addition, we propose a new one-many RANSAC for accurate pose estimation. The new one-many RANSAC addresses the challenge of repetitive building structures (e.g. windows, balconies) in urban localization. Extensive experiments demonstrate that our 2D-3D correspondence search achieves state-of-the-art localization accuracy on multiple benchmark datasets. Furthermore, our experiments on a large Google Street View (GSV) image dataset show the potential of large-scale localization entirely on a typical mobile device.
Article
Full-text available
The education plays more and more important role in disseminating knowledge because of the explosive growth of knowledge. As one kind of carrier delivering knowledge, image also presents an explosive growth trend and plays an increasingly important role in education, medical, advertising, entertainment, etc. Aiming at the long time of massive image feature extraction in the construction of smart campus, the traditional Harris corner has such problems as low detection efficiency and many non-maximal pseudo-corner points etc. This paper proposes a Harris image matching method that combines adaptive threshold and RANSAC (Random Sample Consensus). Firstly, the Harris feature points are selected based on the adaptive threshold and the Forstner algorithm in this method. On the one hand, candidate points are filtered based on the adaptive threshold. On the other hand, the Forstner algorithm is used to further select the corner points. Secondly, the NCC (Normalized Cross Correlation matching) and the RANSAC are applied to precisely match the detected Harris corners. The experimental results show that compared with existing algorithms, the proposed method not only obtains a matching accuracy higher than 20% of Cui’s algorithm, but also saves more than 30% detection time of corner detection and image matching. Further more, the proposed method obtains a matching accuracy higher than 50% of the Cui’s algorithm, and saves more than 50% detection time of corner detection and image matching.
Article
Full-text available
We present the scalable design of an entire on-device system for large-scale urban localization. The proposed design integrates compact image retrieval and 2D-3D correspondence search to estimate the camera pose in a city region of extensive coverage. Our design is GPS agnostic and does not require the network connection. The system explores the use of an abundant dataset: Google Street View (GSV). In order to overcome the resource constraints of mobile devices, we carefully optimize the system design at every stage: we use state-of-the-art image retrieval to quickly locate candidate regions and limit candidate 3D points; we propose a new hashing-based approach for fast computation of 2D-3D correspondences and new one-many RANSAC for accurate pose estimation. The experiments are conducted on benchmark datasets for 2D-3D correspondence search and on a database of over 227K Google Street View (GSV) images for the overall system. Results show that our 2D-3D correspondence search achieves state-of-the-art performance on some benchmark datasets and our system can accurately and quickly localize mobile images; the median error is less than 4 meters and the processing time is averagely less than 10s on a typical mobile device.
Article
Image registration plays an important role in military and civilian applications such as natural disaster damage assessment, environmental monitoring, ground change detection, and military damage assessment. This work presents a new feature-based non-rigid image registration method. Its main contributions are: (i) a dynamic Gaussian component density designed to better exploit the available image information and provide sufficient inlier pairs for image transformation; and (ii) a spatial-structure preservation scheme, consisting of a curvature preservation term on the image transformation space and a local spatial-structure constraint, which bounds the image transformation cost and preserves the local structure of feature points during feature point set registration. The proposed method is tested on multi-spectral natural images, low-altitude aerial images, and medical images against nine state-of-the-art methods of four types, and it shows the best performance in most scenarios.
Article
We propose a new method for estimation in linear models. The ‘lasso’ minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant. Because of the nature of this constraint it tends to produce some coefficients that are exactly 0 and hence gives interpretable models. Our simulation studies suggest that the lasso enjoys some of the favourable properties of both subset selection and ridge regression. It produces interpretable models like subset selection and exhibits the stability of ridge regression. There is also an interesting relationship with recent work in adaptive function estimation by Donoho and Johnstone. The lasso idea is quite general and can be applied in a variety of statistical models: extensions to generalized regression models and tree‐based models are briefly described.
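The constrained objective described above has an equivalent penalized (Lagrangian) form, minimize (1/2)||y − Xw||² + λ||w||₁, which can be solved by cyclic coordinate descent with soft-thresholding; the exact zeros the abstract mentions arise from the threshold. The sketch below is a minimal numpy illustration of that well-known solver, not code from the paper; the names are ours.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iters=500):
    """Lasso via cyclic coordinate descent:
    minimize (1/2) * ||y - X w||^2 + lam * ||w||_1.
    """
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)  # per-column squared norms
    for _ in range(n_iters):
        for j in range(p):
            # Partial residual: remove every contribution except feature j.
            r_j = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r_j
            # Closed-form 1D update with soft-thresholding.
            w[j] = soft_threshold(rho, lam) / col_sq[j]
    return w
```

Coefficients of irrelevant features are driven exactly to zero, which is what makes the resulting model interpretable, as the abstract notes.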
Article
The scale-invariant feature transform (SIFT) is known as one of the most robust local invariant features and is widely applied to image matching and classification. However, there are few studies of SIFT for hyperspectral images (HSI). An HSI embraces spectral information reflecting material radiation properties together with the geometric relationships of objects, and thus provides much more information than gray-scale or color images. This paper therefore puts forward a spatial-spectral SIFT for HSI matching and classification, using geometric algebra as its mathematical tool. It extracts and describes the spatial-spectral SIFT feature in the spatial-spectral domain to exploit both the spectral and spatial information of the HSI. First, a spatial-spectral unified model of spectral value and gradient change (UMSGC for short) is built to analyze the spectral and spatial information of an HSI synthetically. Second, a scale space for HSI based on the UMSGC is designed. Finally, a new detector and a new descriptor for the spatial-spectral SIFT that comprehensively consider spectral and spatial information are proposed. Experimental results show that the proposed algorithm demonstrates excellent performance in HSI matching and classification.
Article
Image matching remains an important and challenging problem in computer vision, especially for dense correspondence estimation between images with high category-level similarity. The effectiveness of image matching largely depends on advances in image descriptors. Inspired by the success of the Convolutional Neural Network (CNN), we propose a hierarchical image matching method using the CNN feature pyramid, named CNN Flow. The feature maps output by different layers of a CNN tend to encode different information about the input image, such as semantic information in higher layers and structural information in lower layers. This property of the CNN feature pyramid suits a hierarchical image matching framework, which detects patterns of different levels in an implicit coarse-to-fine manner. In particular, we exploit the complementarity of different layers by using guidance from higher layers to lower layers. The high-layer features present semantic patterns that cope with intra-class variations, and the guidance from high layers can resist the semantic ambiguity of low-layer features caused by their small receptive fields. The bottom-level matching then utilizes the low-layer features, which carry more structural information, to achieve finer matching. On one hand, extensive experiments and analysis demonstrate the superiority of CNN Flow for dense image matching under challenging variations. On the other hand, CNN Flow is demonstrated through various applications, such as fine alignment of intra-class objects, scene label transfer, and facial expression transfer.
Article
The Scale-Invariant Feature Transform (SIFT) algorithm is used broadly in image registration to improve image quality. However, the algorithm's complexity limits its efficiency in biological studies, which usually require real-time performance. In this article, we present an improved SIFT technique for matching sequences of images taken from a line-scanning ophthalmoscope (LSO). The method generates the Gaussian scale-space pyramid in the frequency domain to complete the SIFT feature detection more quickly, and a novel SIFT descriptor, invariant to rotation and illumination, is then created to reduce computation time. We implemented the original SIFT method, our improved SIFT method, and a graphics processing unit (GPU) version of the improved method. Experiments show that the improved SIFT is almost 2–3 times faster than the original while remaining more robust, and that the GPU implementation is 20 times faster than the central processing unit (CPU) implementation, achieving real-time performance as expected. Although tested on an LSO system, the improved SIFT method does not depend on the acquisition setup; as a result, it can be applied to other imaging instruments, e.g., adaptive optics systems, to improve their resolution.
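The frequency-domain construction of the Gaussian scale space mentioned in the last abstract can be sketched as follows: since the Fourier transform of a Gaussian is again a Gaussian, the image is transformed once and each pyramid level costs only one per-scale multiply plus an inverse FFT. This is an illustrative numpy sketch under that standard identity, not the authors' implementation.

```python
import numpy as np

def gaussian_pyramid_fft(image, sigmas):
    """Build Gaussian scale-space levels by filtering in the frequency
    domain: the FFT of the image is multiplied by the analytic Fourier
    transform of a Gaussian, exp(-2*pi^2*sigma^2*|f|^2), one multiply
    per scale, so the image itself is transformed only once.
    """
    h, w = image.shape
    F = np.fft.fft2(image)
    fy = np.fft.fftfreq(h).reshape(-1, 1)  # cycles per sample, rows
    fx = np.fft.fftfreq(w).reshape(1, -1)  # cycles per sample, cols
    freq_sq = fy ** 2 + fx ** 2
    levels = []
    for sigma in sigmas:
        # Transfer function of a unit-gain Gaussian with std sigma (pixels).
        H = np.exp(-2.0 * (np.pi ** 2) * (sigma ** 2) * freq_sq)
        levels.append(np.real(np.fft.ifft2(F * H)))
    return levels
```

Each level preserves the image mean (the filter's DC gain is 1) while larger sigmas suppress more high-frequency content, exactly the behavior a spatial-domain Gaussian blur would give.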