Research Article
Hilbert–Schmidt Independence Criterion Subspace Learning on Hybrid Region Covariance Descriptor for Image Classification

Xi Liu,1 Peng Yang,1 Zengrong Zhan,1 and Zhengming Ma2

1School of Information Engineering, Guangzhou Panyu Polytechnic, Guangzhou 511483, China
2School of Electronics and Information Technology, Sun Yat-Sen University, Guangzhou 510006, China
Correspondence should be addressed to Peng Yang; citystars@163.com
Received 10 December 2020; Revised 14 May 2021; Accepted 14 July 2021; Published 21 July 2021
Academic Editor: Muhammad Haroon Yousaf
Copyright © 2021 Xi Liu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
The region covariance descriptor (RCD), which takes the form of a symmetric positive definite (SPD) matrix, is commonly used in image representation. As SPD manifolds have a non-Euclidean geometry, Euclidean machine learning methods are not directly applicable to them. In this work, an improved covariance descriptor called the hybrid region covariance descriptor (HRCD) is proposed. The HRCD incorporates the mean feature information into the RCD to improve the latter's discriminative power. To address the non-Euclidean properties of SPD manifolds, this study also proposes an algorithm called Hilbert-Schmidt independence criterion subspace learning (HSIC-SL) for SPD manifolds, aimed at improving classification accuracy. The algorithm uses a kernel function to embed SPD matrices into a reproducing kernel Hilbert space and further maps them to a linear space. To make the mapping account for the correlation between the SPD matrices and the linear projection, the method introduces global HSIC maximization into the model. The proposed method is compared with existing methods in classification experiments with the HRCD and HSIC-SL on the COIL-20, ETH-80, QMUL, FERET face, and Brodatz datasets, which demonstrate its accuracy and validity.
1. Introduction
A growing amount of non-Euclidean data, such as symmetric positive definite (SPD) manifolds [1] and Grassmann manifolds [2], is encountered in vision recognition tasks. In particular, SPD manifolds have attracted increased attention in the form of the region covariance descriptor (RCD) [3, 4], the Gaussian mixture model (GMM) [5], tensors [6–9], etc. In this work, we mainly discuss image classification on SPD manifolds.
The RCD has been proved to be an effective descriptor in a variety of applications [10–12]. It captures the correlation between different features of an image and represents the image with a covariance matrix. However, the mean vector of the features has also been proved to be significant in image recognition tasks [13, 14]. In this work, we construct a new image descriptor by directly incorporating the mean feature information into the RCD. The new image descriptor is called the hybrid region covariance descriptor (HRCD). The HRCD inherits the advantages of the RCD and is more discriminative than the RCD. The images represented by the HRCD are also SPD matrices that lie on SPD manifolds. Most classical machine learning algorithms are constructed on linear spaces. Given the non-Euclidean geometry of Riemannian manifolds, directly using most conventional machine learning methods on Riemannian manifolds is inadequate [15, 16]. Therefore, the classification of points on Riemannian manifolds has become a hot research topic.
Two main approaches are generally adopted to cope with the nonlinearity of Riemannian manifolds. The first approach is to construct learning methods by directly considering the Riemannian geometry; one such method is the widely used tangent approximation [17, 18]. Most existing SPD classification methods make use of Riemannian metrics [15, 16] or matrix divergences [19, 20] as the distance measure for SPD matrices [21–23]. The other approach is to project the SPD matrices to another space, such as a high-dimensional reproducing
kernel Hilbert space (RKHS) [24] or another low-dimensional SPD manifold [25]. Classification algorithms can then be constructed on the projection space. Benefiting from the success of kernel methods in Euclidean spaces, the kernel-based classification scheme is a good choice for the analysis of SPD manifolds and has shown promising performance [26, 27]. Kernel-based methods embed manifolds into RKHSs and further project these manifolds to Euclidean spaces via an explicit mapping. Hence, algorithms designed for linear spaces can be extended to Riemannian manifolds. However, the mapping from RKHSs to Euclidean spaces in existing methods is based on a linear assumption. Moreover, the intrinsic connections between the SPD matrices and their low-dimensional projections are ignored.
To circumvent this limitation of kernel-based methods, we propose introducing the Hilbert–Schmidt independence criterion (HSIC) into the kernel trick and refer to the resulting method as the HSIC subspace learning (HSIC-SL) algorithm. Specifically, we derive the log-linear and log-Gaussian kernels to embed SPD matrices into a high-dimensional RKHS and then project these points into a low-dimensional vector space of the RKHS. To align the low-dimensional representation with the intrinsic features of the input data, we introduce statistical dependence between the SPD matrices and the low-dimensional representation. In this work, the explicit mapping is obtained on the basis of subspace learning and HSIC maximization, where the HSIC is used to characterize the statistical correlation between two datasets.

The main contributions of this study are as follows:

(1) We propose a novel covariance descriptor called the HRCD. The proposed descriptor explores discriminative information effectively.

(2) The HSIC is applied for the first time to the kernel framework on SPD Riemannian manifolds, and a novel subspace learning algorithm called HSIC-SL is proposed. The proposed method achieves effective classification on the basis of global HSIC maximization.

(3) We identify two simple kernel functions for the HSIC-SL algorithm. The diversity of kernels improves the flexibility of HSIC-SL.

The rest of the paper is organized as follows. We provide a review of previous work in Section 2. A brief description of the RCD, RKHS, and HSIC is presented in Section 3. We derive the proposed descriptor and algorithm in detail in Section 4. The experimental results are presented in Section 5 to demonstrate the effectiveness of the HRCD and HSIC-SL. Conclusions and future research directions are given in Section 6.
2. Literature Review
This section presents a brief review of RCDs, as well as recent manifold classification methods constructed on SPD manifolds.
The RCD was first introduced by Tuzel et al. [28]. It represents an image region with a nonsingular SPD matrix by extracting the covariance matrix of multiple features. The covariance matrix does not carry any information about size and ordering, which implies a certain scale and rotation independence. The RCD is used not only in image recognition but also in image set recognition tasks, in which an image set is modeled with its natural second-order statistic [4, 29]. The GMM can also serve as the SPD descriptor of an image set. Under the assumption of a multi-Gaussian distribution of an image set [30], hundreds of images in the image set are assigned to a small number of Gaussian components. Each Gaussian component is represented as an SPD matrix [31]. Thus, the image set is described by multiple SPD matrices. As mentioned previously, mean vectors have also been proved to be important in recognition tasks. In [32], the mean information was utilized in an improved log-Euclidean Gaussian kernel. However, this approach is limited to a specific algorithm and lacks generality. In the current work, we propose to incorporate the feature mean information and the covariance matrix into a new SPD matrix, thereby introducing first-order statistical information into the image RCD to improve the discriminant ability of the descriptor.
When the manifold under consideration is an SPD manifold, the tangent space at a particular point is a linear space. Most works map SPD matrices onto the tangent space of a particular point so that traditional linear classifiers can be applied. Under this framework, dimensionality reduction and clustering methods, such as Laplacian eigenmaps, local linear embedding (LLE), and Hessian LLE, have been extended to Riemannian manifolds [17]. Tuzel et al. introduced LogitBoost for classification on Riemannian manifolds [18]. The classifier has been generalized to multiclass classification [33]. Sparse coding by embedding manifolds into identity tangent spaces to identify the Lie algebra of SPD manifolds was considered in [34]. Such tangent space approximations can preserve manifold-valued data and eliminate the swelling effect. However, flattening a manifold through tangent spaces may result in inaccurate modeling, especially for regions far away from the tangent pole.
Beyond tangent approximation, many efforts have been devoted to distance measures on SPD manifolds that capture the true SPD manifold geometry; examples include the log-Euclidean Riemannian metric (LERM) [15] and the affine invariant Riemannian metric (AIRM) [16]. Although matrix divergences are not true Riemannian metrics, they provide fast, approximate distance computation. Sivalingam et al. proposed tensor sparse coding (TSC) for positive definite matrices [35], which utilizes the Burg divergence to perform sparse coding and dictionary learning on SPD manifolds. Riemannian dictionary learning and sparse coding (DLSC) [36] represents data as sparse combinations of SPD dictionary atoms via a Riemannian geometric approach and characterizes the loss of the DLSC optimization via the affine invariant Riemannian metric. However, these methods cannot be applied to other Riemannian manifolds because of the specificity of the metrics used. Embedding discriminant analysis (EDA) [37] identifies a bilinear isometric mapping such that the resulting representation maximally preserves the Riemannian geodesic distance.
As for kernel methods proposed for SPD manifolds, Riemannian locality preserving projections (RLPPs) [38] embed Riemannian manifolds into low-dimensional vector spaces by defining Riemannian kernels; however, their computational complexity is high, and the kernel is not always positive definite. Jayasumana et al. [39] presented a framework on Riemannian manifolds to establish the positive definiteness of Gaussian RBF kernels and utilized the log-Euclidean Gaussian kernel in kernel principal component analysis (KPCA) for a recognition task. Caseiro et al. proposed a heat kernel mean shift on Riemannian manifolds [40]. In [41], kernel DLSC based on the LERM was introduced. Harandi et al. proposed to perform sparse coding by embedding the space of SPD matrices into Hilbert spaces through two types of Bregman matrix divergences [42]. Covariance discriminative learning (CDL) [4] utilizes a matrix logarithm operator to define kernel functions and then explicitly maps the covariance matrices from a Riemannian manifold to a Euclidean space. Zhuang et al. proposed a data-dependent kernel learning framework on the basis of kernel learning and the Riemannian metric (KLRM) [43]. In [44], multi-kernel SSCR (MKSSCR) created a linear combination of a set of Riemannian kernels. Considerable results have been achieved within the kernel framework. To improve the performance of the kernel trick, we introduce a statistical dependence constraint between SPD matrices and projections and measure the statistical dependence with the HSIC.
3. Related Work
In this section, we briefly review the RCD and the properties
of the RKHS and HSIC.
3.1. Region Covariance Descriptor. The region covariance descriptor (RCD), as a special case of SPD matrices, provides a natural way of fusing multiple features. Suppose R is an image region of size h × w; we can extract multiple features for every pixel in R. The features could be the location, grey value, and gradients. We denote the feature vector of the k-th pixel as

$$z_k = \big[x,\ y,\ I,\ |I_x|,\ |I_y|\big], \quad (1)$$

where x and y denote the location, I is the grey value, and I_x and I_y are the gradients with respect to x and y. The RCD of R is defined as

$$\Sigma = \frac{1}{n-1}\sum_{k=1}^{n}(z_k - \mu)(z_k - \mu)^T, \quad (2)$$

where n = h × w and μ = (1/n) Σ_{k=1}^{n} z_k ∈ R^d denotes the mean of the feature vectors. Then, the image region can be represented by a d × d SPD matrix, where d depends on the number of features.
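As a minimal illustration of equations (1) and (2), the following Python/NumPy sketch computes the RCD of a small grey-level region; the helper name region_covariance and the toy feature layout are our own choices, not part of the original formulation.

```python
import numpy as np

def region_covariance(Z):
    """RCD of an image region (equation (2)).

    Z : (n, d) array whose rows are the per-pixel feature vectors z_k,
        e.g., [x, y, I, |I_x|, |I_y|] as in equation (1).
    """
    mu = Z.mean(axis=0)                      # mean feature vector
    Zc = Z - mu                              # center the features
    return Zc.T @ Zc / (Z.shape[0] - 1)      # unbiased d x d covariance

# toy usage: a 4 x 4 grey region with 5 features per pixel
h = w = 4
I = np.random.rand(h, w)
Iy, Ix = np.gradient(I)                      # first-order gradients
xs, ys = np.meshgrid(np.arange(w), np.arange(h))
Z = np.stack([xs.ravel(), ys.ravel(), I.ravel(),
              np.abs(Ix).ravel(), np.abs(Iy).ravel()], axis=1)
Sigma = region_covariance(Z)                 # 5 x 5, SPD for non-degenerate data
```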
3.2. RKHS. The reproducing kernel Hilbert space (RKHS) is the theoretical basis of kernel methods. After projecting the data into an RKHS, various machine learning methods can be implemented in the RKHS.

Let S(Ω) be a function space, and let ⟨·,·⟩ be an inner product defined on S(Ω). The complete inner product space H = (S(Ω), ⟨·,·⟩) induced by ⟨·,·⟩ is a Hilbert space. For all x ∈ Ω and f ∈ S(Ω), if the function k satisfies f(x) = ⟨f, k(·, x)⟩, then k is the reproducing kernel of the RKHS H. We denote the mapping defined by the reproducing kernel as φ(x) = k(·, x) = k_x ∈ H. It follows that

$$\langle \phi(x), \phi(y)\rangle = \langle k_x, k(\cdot, y)\rangle = k_x(y) = k(y, x) = k(x, y). \quad (3)$$

The function k can serve as a kernel function only if the kernel matrix K is symmetric positive definite, where

$$K = \begin{bmatrix} k(x_1, x_1) & \cdots & k(x_1, x_n) \\ \vdots & \ddots & \vdots \\ k(x_n, x_1) & \cdots & k(x_n, x_n) \end{bmatrix}.$$

According to Mercer's theorem [45], once a valid reproducing kernel is defined, we can generate a unique Hilbert space.
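As a small sanity check of this condition, one can assemble the kernel matrix for a handful of samples and inspect its symmetry and smallest eigenvalue; the snippet below uses an ordinary Gaussian kernel on vectors purely for illustration, and all names are ours.

```python
import numpy as np

def kernel_matrix(X, k):
    """K[i, j] = k(x_i, x_j) for the rows of X."""
    n = X.shape[0]
    return np.array([[k(X[i], X[j]) for j in range(n)] for i in range(n)])

# illustrative kernel on ordinary vectors
gauss = lambda x, y: np.exp(-np.linalg.norm(x - y) ** 2 / 2.0)

X = np.random.rand(6, 3)
K = kernel_matrix(X, gauss)
print(np.allclose(K, K.T))            # symmetry
print(np.linalg.eigvalsh(K).min())    # smallest eigenvalue should be (numerically) >= 0
```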
3.3. Hilbert–Schmidt Independence Criterion (HSIC). The HSIC [46] is usually used to characterize the statistical correlation of two datasets. The mathematical theory of the HSIC has been studied for a long time, and there are many achievements [47–51]. In the computation of the HSIC, the two datasets are first embedded into two RKHSs, and then the HSIC of the two sets of data is measured through the Hilbert–Schmidt (HS) operators between these two RKHSs.

Let X be a random variable/vector defined on Ω_X and Y be a random variable/vector defined on Ω_Y, let H_X and H_Y be two separate Hilbert spaces, and let φ_X : Ω_X → H_X and φ_Y : Ω_Y → H_Y be the kernel mappings defined by the respective reproducing kernels.
3.3.1. Hilbert–Schmidt (HS) Operators. Let T : H_X → H_Y be a compact operator and {e_i^X | i ∈ I} be an orthonormal basis of H_X; if Σ_{i∈I} ‖T e_i^X‖_Y² < +∞, then T is called a Hilbert–Schmidt (HS) operator [52]. If, for all T, S ∈ HS(H_X → H_Y), Σ_{i∈I} |⟨T e_i^X, S e_i^X⟩_Y| < +∞, then (HS(H_X → H_Y), ⟨·,·⟩_HS) is a Hilbert space, where the inner product ⟨·,·⟩_HS is defined as ⟨T, S⟩_HS = Σ_{i∈I} ⟨T e_i^X, S e_i^X⟩_Y. For f_0 ∈ H_X and g_0 ∈ H_Y, the tensor product of f_0 and g_0 is denoted as f_0 ⊗ g_0. Since (f_0 ⊗ g_0)(f) = ⟨f_0, f⟩_X g_0 ∈ H_Y, then f_0 ⊗ g_0 ∈ HS(H_X → H_Y) [53].
3.3.2. Mean Functions and Cross-Covariance Operators. Let Φ_X : H_X → R be the continuous linear functional over H_X defined, for all f ∈ H_X, by

$$\Phi_X(f) = E_X\big[\langle \varphi_X(X), f\rangle_X\big]. \quad (4)$$

According to the Riesz representation theorem, there must be a unique element μ_X ∈ H_X such that Φ_X(f) = ⟨f, μ_X⟩_X for all f ∈ H_X; then μ_X is called the mean function of φ_X(X). The mean function μ_Y of φ_Y(Y) is defined in the same way.
Let Φ be the continuous linear functional over HS(H_X → H_Y) defined, for all T ∈ HS(H_X → H_Y), by

$$\Phi(T) = E_{XY}\big[\langle \varphi_X(X) \otimes \varphi_Y(Y), T\rangle_{HS}\big]. \quad (5)$$

Then, according to the Riesz representation theorem, there must be a unique HS operator C_XY ∈ HS(H_X → H_Y) such that, for all T ∈ HS(H_X → H_Y),

$$\Phi(T) = \langle T, C_{XY}\rangle_{HS}, \quad (6)$$

where C_XY is called the cross-covariance operator between φ_X(X) and φ_Y(Y).

The relationship between C_XY, μ_X, and μ_Y is illustrated in Figure 1. The two datasets Ω_X and Ω_Y are embedded into H_X and H_Y by the kernel mappings φ_X : Ω_X → H_X and φ_Y : Ω_Y → H_Y, respectively, and μ_X and μ_Y are the corresponding mean functions. The HSIC of Ω_X and Ω_Y is given by the Hilbert–Schmidt (HS) operator C_XY between H_X and H_Y.
3.3.3. HSIC. The HSIC of two random variables/vectors is defined as follows:

$$\mathrm{HSIC}(X, Y) = E_{XY}\Big[\big\|(\varphi_X(X) - \mu_X) \otimes (\varphi_Y(Y) - \mu_Y)\big\|_{HS}^2\Big]. \quad (7)$$

It can be seen from the definition of HSIC(X, Y) that, instead of directly calculating the covariance of X and Y, i.e., E_XY[(X − E_X[X])(Y − E_Y[Y])], the HSIC first transforms X and Y into H_X and H_Y, respectively, and then calculates the covariance of φ_X(X) and φ_Y(Y) by using HS operators between H_X and H_Y. In practice, H_X and H_Y are generated from kernel functions k_X and k_Y.

If the joint probability distribution of X and Y is given or known, HSIC(X, Y) can be calculated as follows:

$$\mathrm{HSIC}(X, Y) = E_{XY}\Big[\big\|(\varphi_X(X) - \mu_X) \otimes (\varphi_Y(Y) - \mu_Y)\big\|_{HS}^2\Big] = \big\|C_{XY} - \mu_X \otimes \mu_Y\big\|_{HS}^2 = \langle C_{XY}, C_{XY}\rangle_{HS} - 2\langle C_{XY}, \mu_X \otimes \mu_Y\rangle_{HS} + \langle \mu_X \otimes \mu_Y, \mu_X \otimes \mu_Y\rangle_{HS}, \quad (8)$$

where

$$\langle C_{XY}, C_{XY}\rangle_{HS} = E_{XY}E_{X'Y'}\big[k_X(X, X')\,k_Y(Y, Y')\big], \quad (9)$$

$$\langle C_{XY}, \mu_X \otimes \mu_Y\rangle_{HS} = E_{XY}\big[E_{X'}[k_X(X, X')]\,E_{Y'}[k_Y(Y, Y')]\big], \quad (10)$$

$$\langle \mu_X \otimes \mu_Y, \mu_X \otimes \mu_Y\rangle_{HS} = \langle \mu_X, \mu_X\rangle_X\,\langle \mu_Y, \mu_Y\rangle_Y. \quad (11)$$
Generally speaking, the joint probability distribution of X and Y is unknown, and only some samples of X and Y are given: X = {x_1, ..., x_N} ⊂ Ω_X and Y = {y_1, ..., y_N} ⊂ Ω_Y. In this case, the statistical average can be approximated by the sample average. Moreover, it is assumed that when i ≠ j, the probability of the random event {X = x_i; Y = y_j} is 0; then, the cross-covariance operator C_XY and the mean functions μ_X and μ_Y can be approximated as follows:
Figure 1: The sketch mapping of HSIC.
$$C_{XY} \approx \frac{1}{N}\sum_{i=1}^{N}\varphi_X(x_i)\otimes\varphi_Y(y_i), \qquad \mu_X \approx \frac{1}{N}\sum_{i=1}^{N}\varphi_X(x_i), \qquad \mu_Y \approx \frac{1}{N}\sum_{i=1}^{N}\varphi_Y(y_i). \quad (12)$$
Substituting equation (12) into equations (9)–(11) gives

$$\langle C_{XY}, C_{XY}\rangle_{HS} = \frac{1}{N^2}\mathrm{tr}\big(K_X K_Y\big), \qquad \langle C_{XY}, \mu_X \otimes \mu_Y\rangle_{HS} = \frac{1}{N^3}\Gamma_N^T K_X K_Y \Gamma_N, \qquad \langle \mu_X \otimes \mu_Y, \mu_X \otimes \mu_Y\rangle_{HS} = \frac{1}{N^4}\Gamma_N^T K_X \Gamma_N \Gamma_N^T K_Y \Gamma_N, \quad (13)$$

where Γ_N = [1, ..., 1]^T ∈ R^N is the N-dimensional vector with all elements equal to 1, and K_X and K_Y are the kernel matrices of X and Y, respectively.
Finally, HSIC(X, Y) can be computed as

$$\mathrm{HSIC}(X, Y) = \big\|C_{XY} - \mu_X \otimes \mu_Y\big\|_{HS}^2 = \langle C_{XY}, C_{XY}\rangle_{HS} + \langle \mu_X \otimes \mu_Y, \mu_X \otimes \mu_Y\rangle_{HS} - 2\langle C_{XY}, \mu_X \otimes \mu_Y\rangle_{HS} = \frac{1}{N^2}\mathrm{tr}\big(K_Y C_N K_X C_N\big), \quad (14)$$

where C_N = I_N − (1/N)Γ_N Γ_N^T is the centralizing matrix.
4. HRCD and HSIC Subspace Learning
The algorithm can be divided into four steps. First, we model the labeled training samples through the proposed HRCD, so that each training sample is described by an SPD matrix. Second, we embed the SPD matrices into a high-dimensional RKHS H with a defined kernel function and further project the elements of the RKHS into a vector space H′. The mapping f : H → H′ is found by solving an optimization problem. Third, we use the explicit map to project the training and test samples onto the low-dimensional and relatively discriminative space. Finally, the classification task is realized by executing a classifier on H′. An overall illustration of the algorithm is shown in Figure 2.

Figure 2: Framework of the proposed method. Each image is represented by an SPD matrix on the manifold M. The points on M are embedded into the RKHS H and are further projected to a low-dimensional and discriminative subspace H′. The map is optimized via the cost function f.
Given a set of training samples belonging to c classes, χ = {X_1, X_2, ..., X_N} ⊂ M, where each X_i ∈ R^{d×d} is an SPD matrix, let l = {l_1, l_2, ..., l_N} denote the corresponding labels. The representation of χ on the low-dimensional vector space H′ is denoted as Y = [y_1, y_2, ..., y_N], y_i ∈ R^m. In the framework of kernel analysis, the low-dimensional representation y_i of X_i is obtained by the mapping y_i = W K_i^Row, where K_i^Row = [k_{i1}, k_{i2}, ..., k_{iN}]^T.
4.1. Hybrid Region Covariance Descriptor. As mentioned previously, we propose incorporating the feature mean information into the RCD to improve the discrimination of the descriptor. We refer to the resulting descriptor as the HRCD.

Given an image region R, we extract multiple features at each point in R and then compute the mean vector and covariance matrix of the features. Suppose that the feature vector of the k-th pixel is z_k; the mean vector μ ∈ R^d and covariance matrix Σ ∈ R^{d×d} can then be computed as

$$\mu = \frac{1}{n}\sum_{k=1}^{n} z_k \in \mathbb{R}^d, \qquad \Sigma = \frac{1}{n-1}\sum_{k=1}^{n}(z_k - \mu)(z_k - \mu)^T. \quad (15)$$

Following information geometry theory [54], we combine the mean and the covariance matrix into a new matrix without additional computational complexity. The new matrix is constructed as

$$X = |\Sigma|^{-\frac{1}{d+1}} \begin{bmatrix} \Sigma + \mu\mu^T & \mu \\ \mu^T & 1 \end{bmatrix}. \quad (16)$$

Here, d is the dimensionality of the feature vector, and |·| is the determinant operator. The (d+1) × (d+1) SPD matrix X is the HRCD of the image. As a result of the inheritance from the covariance matrix, the HRCD is not only effective, robust, and low-dimensional but also more discriminable than the RCD.
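For illustration, the HRCD of equation (16) can be assembled directly from the feature mean and covariance; the sketch below assumes a nonsingular Σ, and the function name hrcd is ours.

```python
import numpy as np

def hrcd(mu, Sigma):
    """Hybrid region covariance descriptor of equation (16).

    mu    : (d,) mean feature vector
    Sigma : (d, d) covariance of the features (assumed nonsingular)
    """
    d = mu.shape[0]
    top = np.hstack([Sigma + np.outer(mu, mu), mu[:, None]])
    bottom = np.hstack([mu[None, :], np.ones((1, 1))])
    X = np.vstack([top, bottom])                         # (d+1) x (d+1) block matrix
    return np.linalg.det(Sigma) ** (-1.0 / (d + 1)) * X  # determinant normalization
```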
4.2. Kernel Functions in HSIC-SL. To define a valid RKHS, the kernel must be symmetric positive definite. Many discussions on the symmetric positive definiteness of kernel functions are based on vector spaces. In this section, we introduce two typical kernel functions on SPD Riemannian manifolds.

4.2.1. Log-Linear Kernel. The polynomial kernel is one of the most commonly used kernel functions in Euclidean spaces. The polynomial kernel function in a vector space is defined as

$$k(x_i, x_j) = \big(\alpha x_i^T x_j + \beta\big)^c, \quad (17)$$

where x_i, x_j ∈ R^n. If α = c = 1 and β = 0, then equation (17) reduces to the linear kernel

$$k(x_i, x_j) = x_i^T x_j. \quad (18)$$

When the linear kernel is extended to SPD Riemannian manifolds, it must be redefined appropriately. The linear kernel on SPD Riemannian manifolds can be defined as

$$k(X_i, X_j) = \mathrm{tr}\big((\log X_i)^T \log X_j\big) = \mathrm{tr}\big(\log(X_i)\log(X_j)\big). \quad (19)$$

We refer to this kernel as the log-linear kernel.
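A direct, illustrative rendering of equation (19) using SciPy's principal matrix logarithm (the function name is ours; in practice one would precompute log X_i once per matrix rather than inside every kernel evaluation):

```python
import numpy as np
from scipy.linalg import logm

def log_linear_kernel(Xi, Xj):
    """Log-linear kernel of equation (19): tr(log(Xi) log(Xj)).

    For SPD inputs, logm returns a (numerically) real symmetric matrix,
    so the transpose in equation (19) is harmless.
    """
    return float(np.trace(logm(Xi) @ logm(Xj)).real)
```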
4.2.2. Log-Gaussian Kernel. The Gaussian kernel is another popular kernel function in Euclidean spaces. The Gaussian kernel function is defined as

$$k(x_i, x_j) = \exp\!\left(-\frac{d^2(x_i, x_j)}{2\sigma^2}\right), \quad (20)$$

where d(x_i, x_j) = ‖x_i − x_j‖_F. A good effect can be achieved by replacing the Euclidean distance with the log-Euclidean distance. The log-Gaussian kernel is defined by

$$k_{LE}(X_i, X_j) = \exp\!\left(-\frac{d_{LE}^2(X_i, X_j)}{2\sigma^2}\right), \quad (21)$$

where d_LE(X_i, X_j) = ‖log(X_i) − log(X_j)‖_F is the log-Euclidean distance between X_i and X_j. The positive definiteness of k_LE was proved in [39].

The parameter σ is important in the Gaussian kernel. To make the log-Gaussian kernel sensitive to the distances involved, we suggest setting σ to the average value of the distances between the training samples.
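A sketch of the log-Gaussian kernel of equation (21), together with the suggested heuristic of setting σ to the average log-Euclidean distance between training samples; all names are ours, and the pairwise loop is purely illustrative:

```python
import numpy as np
from scipy.linalg import logm

def log_euclidean_dist(Xi, Xj):
    """d_LE(Xi, Xj) = ||log(Xi) - log(Xj)||_F."""
    return np.linalg.norm(logm(Xi) - logm(Xj), 'fro')

def mean_pairwise_distance(mats):
    """Suggested sigma: average log-Euclidean distance over the training set."""
    dists = []
    for i in range(len(mats)):
        for j in range(i + 1, len(mats)):
            dists.append(log_euclidean_dist(mats[i], mats[j]))
    return float(np.mean(dists))

def log_gaussian_kernel(Xi, Xj, sigma):
    """Log-Gaussian kernel of equation (21)."""
    return float(np.exp(-log_euclidean_dist(Xi, Xj) ** 2 / (2 * sigma ** 2)))
```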
4.3. HSIC Subspace Learning. After embedding the matrices into the RKHS, we further project the points into a vector space through an explicit mapping. We aim to find the explicit mapping from the RKHS to the vector space by maximizing the HSIC between the SPD matrices and the low-dimensional representation while preserving the local information. The proposed HSIC-SL thus includes global HSIC maximization and within-class information preservation.

We denote the HSIC of χ and the low-dimensional representation Y as HSIC(χ, Y). According to equation (14), HSIC(χ, Y) can be computed as

$$\mathrm{HSIC}(\chi, Y) = \frac{1}{N^2}\mathrm{tr}\big(K_Y C_N K_\chi C_N\big). \quad (22)$$

The input data χ and the projection Y are represented by K_χ and K_Y, respectively. To explicitly realize the low-dimensional representation, we define the kernel function of Y in HSIC(χ, Y) as k_Y : R^m × R^m → R; that is, for y, y′ ∈ R^m,

$$k_Y(y, y') = y^T y'. \quad (23)$$

We denote the kernel matrix of k_Y as K_Y. It can be computed by

$$K_Y = \begin{bmatrix} y_1^T y_1 & \cdots & y_1^T y_N \\ \vdots & \ddots & \vdots \\ y_N^T y_1 & \cdots & y_N^T y_N \end{bmatrix} = Y^T Y. \quad (24)$$

Substituting equation (24) into equation (22) yields

$$\mathrm{HSIC}(\chi, Y) = \frac{1}{N^2}\mathrm{tr}\big(K_Y C_N K_\chi C_N\big) = \frac{1}{N^2}\mathrm{tr}\big(Y^T Y C_N K_\chi C_N\big) = \frac{1}{N^2}\mathrm{tr}\big(Y C_N K_\chi C_N Y^T\big). \quad (25)$$

As N does not depend on Y, the coefficient 1/N² in equation (25) can be omitted. Recalling that y_i = W K_i^Row, i.e., Y = W K_χ, we have

$$\mathrm{HSIC}(\chi, Y) = \mathrm{tr}\big(W K_\chi C_N K_\chi C_N K_\chi^T W^T\big) = \mathrm{tr}\big(W L_H W^T\big), \quad (26)$$

where L_H = K_χ C_N K_χ C_N K_χ^T.
The within-class information is represented by the within-class scatter S_W, which is defined as

$$S_W = \mathrm{tr}\left(\sum_{i=1}^{c}\sum_{j=1}^{N_i}\big(y_j^{(i)} - m_i\big)\big(y_j^{(i)} - m_i\big)^T\right), \quad (27)$$

where N_i is the number of training samples of the i-th class, Σ_{i=1}^{c} N_i = N, y_j^{(i)} denotes the low-dimensional representation of the j-th sample of the i-th class, and m_i is the mean vector of the i-th class. According to the relationship between y_i and X_i, equation (27) can be further transformed into

$$S_W = \mathrm{tr}\left(W\left[\sum_{i=1}^{c}\sum_{j=1}^{N_i}\big(K_j^{Row} - K_{m_i}\big)\big(K_j^{Row} - K_{m_i}\big)^T\right]W^T\right) = \mathrm{tr}\big(W L_W W^T\big), \quad (28)$$

where K_{m_i} = (1/N_i) Σ_{j=1}^{N_i} k_j^i is the mean kernel row of the i-th class and L_W = Σ_{i=1}^{c} Σ_{j=1}^{N_i} (K_j^{Row} − K_{m_i})(K_j^{Row} − K_{m_i})^T.
In sum, the objective function is formulated as

$$J(W) = \arg\max_W \mathrm{HSIC}(\chi, Y) = \arg\max_W \mathrm{tr}\big(W L_H W^T\big) \quad \text{s.t.} \quad \mathrm{tr}\big(W L_W W^T\big) = \mathrm{tr}\big(L_W\big). \quad (29)$$

The Rayleigh quotient maximization problem is commonly used in optimization because it is fast and simple to compute, and the problem in equation (29) can be solved as such a maximization. To tackle singularity, we add a small perturbation ε to the diagonal elements of L_W. The optimal projection matrix W is then composed of the eigenvectors corresponding to the m largest eigenvalues of (L_W + εI_N)^{-1} L_H, where I_N is the identity matrix.

Hence, for a given test image, we first compute its HRCD and denote the result as X_t. The projection is obtained by y_t = W K_t^Row, where K_t^Row collects the kernel values between X_t and the training samples. Then, the class of the test image can be predicted with the nearest neighbor classifier.
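Putting the pieces together, the following sketch outlines the training and test stages described above: building L_H and L_W from a precomputed N × N training kernel matrix, solving the perturbed eigenproblem of equation (29), and classifying a projected test sample with the nearest neighbor rule. All function names and the NumPy-based solver are our own choices, not the authors' MATLAB implementation.

```python
import numpy as np

def hsic_sl_fit(K, labels, m, eps=1e-3):
    """Learn the projection W of HSIC-SL from the N x N training kernel matrix K.

    labels : length-N integer array of class labels
    m      : target subspace dimension
    """
    N = K.shape[0]
    CN = np.eye(N) - np.ones((N, N)) / N
    LH = K @ CN @ K @ CN @ K.T                 # global HSIC term, equation (26)

    LW = np.zeros((N, N))                      # within-class term, equation (28)
    for c in np.unique(labels):
        Kc = K[np.where(labels == c)[0], :]    # kernel rows K_jRow of class c
        D = Kc - Kc.mean(axis=0)               # subtract the class mean K_{m_i}
        LW += D.T @ D

    # eigenvectors of (L_W + eps I)^{-1} L_H for the m largest eigenvalues
    M = np.linalg.solve(LW + eps * np.eye(N), LH)
    vals, vecs = np.linalg.eig(M)
    order = np.argsort(-vals.real)[:m]
    return vecs[:, order].real.T               # W has shape (m, N)

def hsic_sl_project(W, k_row):
    """Project a sample given its kernel row [k(X, X_1), ..., k(X, X_N)]."""
    return W @ k_row

def nn_classify(y_test, Y_train, labels):
    """1-nearest-neighbour decision in the learned subspace."""
    d = np.linalg.norm(Y_train - y_test[None, :], axis=1)
    return labels[int(np.argmin(d))]
```

With K computed from the log-linear or log-Gaussian kernel on HRCDs, this sketch mirrors, in spirit, the pipeline of Figure 2.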
5. Experiment
The performance of the HRCD and the proposed algorithm is verified in this section. We considered five widely studied image datasets: the COIL-20 (Columbia Object Image Library) dataset [55], the ETH-80 dataset [56], the Queen Mary University of London (QMUL) dataset [57], the FERET face dataset [58], and the Brodatz dataset [59]. All of the compared methods were implemented in MATLAB R2014 and tested on an Intel(R) Core(TM) i5-4670K (3.40 GHz) machine.
5.1. Performance of HRCD. To verify that the HRCD is an effective image descriptor, we directly used the KNN classifier on the image feature space represented by the HRCD and the RCD without feature extraction. Adopting the Euclidean metric, LERM, AIRM, and Burg divergence as the distance measures, classification experiments were performed on COIL-20 and ETH-80. The COIL-20 dataset contains 20 objects, each of which has 72 images of size 128 × 128 taken from different viewing directions. Figure 3 shows sample pictures. Features including the grey value and the first- and second-order gradients were extracted to calculate the RCD and HRCD of an image. Hence, the RCD and HRCD of an image were a 5 × 5 SPD matrix and a 6 × 6 SPD matrix, respectively. The images were randomly split into a training set and a test set, with 10 pictures assigned to the training set and the remaining images assigned to the test set.
ETH-80 is an image-set dataset containing eight types of objects, such as apples, pears, cars, and dogs. Each object has 10 instances, and each instance contains images from 41 different viewpoints. The images in ETH-80 were resized to 128 × 128 (Figure 4). For the RCD and HRCD representations, we extracted the following features:

$$F(x, y) = \big[x,\ y,\ R_{x,y},\ G_{x,y},\ B_{x,y},\ I_{x,y},\ |I_x|,\ |I_y|,\ |I_{xx}|,\ |I_{yy}|\big], \quad (30)$$

where R_{x,y}, G_{x,y}, and B_{x,y} are the RGB color values of the pixel at position (x, y), I_{x,y} is the greyscale value, and |I_x|, |I_y|, |I_xx|, and |I_yy| are the first- and second-order gradient magnitudes of the intensity. The RCD and HRCD of an image were a 10 × 10 SPD matrix and an 11 × 11 SPD matrix, respectively. Half of the instances of every object were used for training, and the remaining instances were used for testing. Each object in the training and test sets comprised 100 random samples; therefore, the training and test sets each contained 800 images.
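For reference, the 10-dimensional per-pixel feature map of equation (30) might be assembled as follows; the greyscale conversion and the use of np.gradient for the image derivatives are our assumptions, not prescribed by the paper.

```python
import numpy as np

def eth80_features(rgb):
    """Per-pixel features of equation (30) for an h x w x 3 RGB image in [0, 1].

    Returns an (h*w, 10) array whose rows are
    [x, y, R, G, B, I, |I_x|, |I_y|, |I_xx|, |I_yy|].
    """
    h, w, _ = rgb.shape
    I = rgb.mean(axis=2)                      # simple greyscale; other conversions also work
    Iy, Ix = np.gradient(I)                   # first-order gradients
    Iyy, _ = np.gradient(Iy)                  # second-order gradient along y
    _, Ixx = np.gradient(Ix)                  # second-order gradient along x
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    feats = [xs, ys, rgb[..., 0], rgb[..., 1], rgb[..., 2],
             I, np.abs(Ix), np.abs(Iy), np.abs(Ixx), np.abs(Iyy)]
    return np.stack([f.ravel() for f in feats], axis=1)
```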
Table 1 lists the classification accuracies and runtimes
under different metrics. To eliminate the randomness of the
experiment, we obtained the average accuracy and runtime
for 20 tests.
5.2. Performance of HSIC-SL. The proposed HSIC-SL was compared with several recognition methods on SPD manifolds. The compared methods included RLPP [38], KSLR [32], CDL [4], KPCA using the log-Gaussian kernel [39], RSR [42], TSC [35], Riem-DLSC [36], logEuc-SC [34], Geometry-DR [25], KLRM-DL [43], EDA [37], and MKSSCR [44]. For brevity, we denote HSIC-SL with the log-linear kernel as HSIC-SL (log-linear) and that with the log-Gaussian kernel as HSIC-SL (log-Gaussian). HSIC-SL (log-linear) and HSIC-SL (log-Gaussian) were each combined with the RCD and the HRCD; thus, four combinations of the proposed HSIC-SL were tested. For fairness, the important parameters of the compared methods were set according to the suggestions in the original papers.
5.2.1. Experiments on QMUL Dataset. The QMUL dataset [57] is a set of images of human heads collected from airport terminal cameras. The dataset is composed of 20,005 images. It is divided into five classes according to the direction of the head images: back, front, left, right, and background. Samples from QMUL are shown in Figure 5. The dataset was divided into training and test sets in advance. Table 2 shows the number of training and test samples in every class. The extracted feature of each pixel is
$$F(x, y) = \Big[I_L(x, y),\ I_a(x, y),\ I_b(x, y),\ \sqrt{I_x^2 + I_y^2},\ \arctan\!\big(I_x^2 / I_y^2\big),\ G_1(x, y), \ldots, G_8(x, y)\Big], \quad (31)$$

where I_L(x, y), I_a(x, y), and I_b(x, y) are the three channel values of the CIELAB color space, I_x and I_y are the first-order gradients of I_L(x, y) in the x- and y-directions, respectively, and G_i(x, y), i = 1, ..., 8, is the response of eight difference-of-Gaussians filters. We thus obtained a 13 × 13 SPD matrix for the RCD and a 14 × 14 SPD matrix for the HRCD. The training data consisted of 200 randomly selected samples for each category, and the test set consisted of 100 randomly selected samples. A KNN (k = 12) search was used to construct the neighborhood graphs in RLPP and Geometry-DR. The parameter σ in the kernels of KPCA, RLPP, KSLR, and HSIC-SL was set to the average distance between training samples. The parameter c in KSLR was set to 0.3. The parameter ε in the proposed method was set to 0.001. We evaluated the performance of CDL, RLPP, KSLR, Geometry-DR, and HSIC-SL for various dimensions and report the maximum performance. In logEuc-SC, RSR, TSC, Riem-DLSC, and KLRM-DL, 50 dictionary atoms and the kernel parameters were learned from the training set. The kernel function in RSR and the basic kernel in KLRM-DL were the Stein kernel. The parameter alpha was set to 0.1, and the number of data samples was set to 30. The 1NN classifier was adopted in all the algorithms.
Figure 3: Sample images of COIL-20.

Figure 4: Sample images of ETH-80.

In Table 3, we show the recognition accuracy of HSIC-SL and the other existing algorithms. To eliminate the randomness of the experiment, we used the average recognition rate over 20 tests. HSIC-SL (log-Gaussian) + HRCD and HSIC-SL (log-linear) + HRCD achieved impressive performance, with HSIC-SL (log-Gaussian) + HRCD obtaining the highest classification accuracy. Moreover, the accuracy with the HRCD was greater than that with the RCD in this experiment. These results indicated that the HRCD was better than the RCD. Furthermore, HSIC-SL + HRCD was better than the other algorithms.
5.2.2. Experiments on FERET Dataset. To conduct the face recognition experiment, we used the "b" subset of the FERET dataset [58], which consists of 2,000 face images of 200 people. The images are those of 71 females and 129 males of diverse ethnicities and ages. The images were cropped and downsampled to 64 × 64. The training set was composed of images with "ba," "bc," "bh," and "bk" labels. Images marked as "bd," "be," "bf," and "bg" constituted the test set. The feature vector for computing the RCD and HRCD is described by

$$F(x, y) = \big[x,\ y,\ I(x, y),\ G_{0,0}(x, y), \ldots, G_{4,7}(x, y)\big], \quad (32)$$

where x and y denote the position, I(x, y) is the intensity, and G_{u,v}(x, y) is the response value of the Gabor filter. The direction u of the Gabor filter ranged from 0 to 4, and the scale v ranged from 0 to 7. Thus, the RCD and HRCD of each image were a 43 × 43 SPD matrix and a 44 × 44 SPD matrix, respectively. The neighborhood graphs constructed in RLPP and Geometry-DR used KNN (k = 3). The kernel functions with the Jeffrey and Stein divergences were adopted in RSR and are denoted, respectively, as RSR-J and RSR-S for brevity. In RSR, TSC, Riem-DLSC, KLRM-DL, and logEuc-SC, all training samples were regarded as dictionary atoms. The settings of the other parameters were the same as those for the QMUL dataset.
Table 4 shows the recognition rates of the compared algorithms. The proposed method was not the best algorithm on the FERET dataset overall; it achieved the highest recognition accuracy only in the "bd" test scenario. Nevertheless, the average recognition accuracies of HSIC-SL were still better than those of most of the other algorithms and only slightly worse than that of KLRM-DL. Hence, HSIC-SL remains a feasible algorithm for the FERET dataset. We also note that HSIC-SL (log-Gaussian) performed better than HSIC-SL (log-linear); therefore, the log-Gaussian kernel was more suitable than the log-linear kernel for this dataset.

Table 4: Comparison of classification accuracies on FERET dataset.

Methods                          bd     be     bf     bg     Average
CDL                              76.50  75.00  88.50  84.50  81.13
RLPP                             58.40  60.00  67.00  60.50  61.48
KSLR                             83.00  90.00  96.00  91.00  90.00
logEuc-SC                        74.00  94.00  97.50  80.50  86.50
RSR-S                            82.50  94.50  98.00  83.50  89.63
RSR-J                            79.50  96.50  97.50  86.00  89.88
TSC                              36.00  73.00  73.50  44.50  56.75
Riem-DLSC                        88.25  93.50  96.50  91.75  92.50
Geometry-DR                      80.50  78.00  86.50  83.00  82.00
KLRM-DL                          89.50  96.00  97.00  94.00  94.13
EDA                              86.00  90.00  95.50  92.00  90.88
MKSSCR                           88.50  92.00  96.00  94.50  92.75
HSIC-SL (log-linear) + RCD       83.50  87.50  94.00  91.00  89.00
HSIC-SL (log-linear) + HRCD      83.50  87.00  93.50  91.50  88.88
HSIC-SL (log-Gaussian) + RCD     88.00  88.50  95.00  93.00  91.13
HSIC-SL (log-Gaussian) + HRCD    90.00  90.50  96.50  93.50  92.63
5.2.3. Experiments on Brodatz Dataset. We performed two texture classification experiments on the Brodatz dataset [59]. Examples from the Brodatz dataset are shown in Figure 6.
Figure 5: Sample images of QMUL dataset.
Table 1: Comparison of RCD and HRCD in terms of classification accuracy (%) and runtime (seconds).

Metric            Descriptor   COIL-20 accuracy (%)   COIL-20 runtime (s)   ETH-80 accuracy (%)   ETH-80 runtime (s)
Euclidean         RCD          74.98                  2.946                 62.63                 5.234
Euclidean         HRCD         59.88                  2.908                 66.34                 5.299
LERM              RCD          84.81                  4.149                 71.03                 7.1
LERM              HRCD         88.99                  4.262                 72.04                 7.759
AIRM              RCD          87.10                  118.33                71.64                 482.47
AIRM              HRCD         91.06                  125.79                73.35                 509.15
Burg divergence   RCD          89.23                  10.102                72.07                 26.368
Burg divergence   HRCD         91.71                  10.492                73.63                 26.66
Table 2: Distribution of QMUL dataset.

Label      Back   Background   Front   Left   Right
Training   2256   2256         2256    2256   2256
Test       2096   1107         1772    1502   2248
Table 3: Comparison of classification accuracies on QMUL dataset.
Methods Accuracy (%)
KPCA 42.5
CDL 76
RLPP 58.4
KSLR 77.4
logEuc-SC 66.3
RSR 73.2
TSC 61.7
Riem-DLSC 36.6
Geometry-DR 69.2
KLRM-DL 70.74
EDA 75.85
MKSSCR 78.14
HSIC-SL (log-linear) + RCD 76.22
HSIC-SL (log-linear) + HRCD 78.29
HSIC-SL (log-Gaussian) + RCD 76.16
HSIC-SL (log-Gaussian) + HRCD 79.65
Figure 6. e first experiment was a grouping experiment
with selected textures, and the other was a classification
experiment for all texture images.
In the first experiment, we followed the test setup
designed in [35] and selected three of the test schemes. e
schemes included one of the 5-texture groups, one of the 10-
texture groups, and one of the 16-texture groups. e
number of classes selected in each test scheme is shown in
Table 5. Each image was resized to 256 ×256, and then 64
regions measuring 32 ×32 were extracted. e covariance
matrices were computed from a five-dimensional feature
vector, including intensity and the first- and second-order
gradients. In each test scheme, eight samples in one image
were randomly selected as the training data, and the
remaining samples were used for the test. Geometry-DR was
not suitable for the dataset because of the low dimension of
the SPD matrices in this experiment. e results shown in
Figure 7 are the average results for 20 tests.
HSIC-SL achieved the highest classification result on all
test schemes, except for the 5-texture test, in which the
recognition rates of most of the algorithms were relatively
close. In this dataset, HSIC-SL (log-linear) performed better
than HSIC-SL (log-Gaussian).
In the second experiment, 20 random samples were chosen as the training set, and 10 random samples were chosen as the test set from all the texture images. The average results over 20 tests are presented in Table 6. HSIC-SL (log-linear) and HSIC-SL (log-Gaussian) outperformed the other methods, with the latter being marginally better than the former when all texture pictures were classified. In addition, the accuracy of HSIC-SL modeled on the HRCD was much higher than that of HSIC-SL modeled on the RCD, which verifies the discriminative ability of the HRCD once again.
5.2.4. Experiments on COIL-20 and ETH-80 Datasets. In this experiment, we used the COIL-20 dataset [55] and the ETH-80 dataset in the object categorization task. The experimental procedure was the same as that described in Section 5.1. We compared the proposed method with KPCA [39], RLPP [38], KSLR [32], and CDL [4]. In addition, KPCA and RLPP were also run on the HRCD and are denoted, respectively, as KPCA + HRCD and RLPP + HRCD. The classifier adopted in all of the algorithms was the 1NN classifier.

Table 7 shows the classification accuracies of the methods on COIL-20 and ETH-80. First, HSIC-SL obtained the best accuracy on both datasets. This result indicated that the introduction of the HSIC improved the effectiveness of the recognition algorithm. Second, the classification accuracies of RLPP, KPCA, and HSIC-SL with the RCD were lower than those with the HRCD (i.e., RLPP + HRCD, KPCA + HRCD, and HSIC-SL + HRCD). This result proved once again that the HRCD has advantages over the RCD. Finally, the effectiveness of the log-linear kernel and the log-Gaussian kernel in HSIC-SL was demonstrated in these experiments.
5.3. Analysis of Dimensionality. The parameter m is the dimensionality of the vector space after feature extraction. The curves of the classification accuracies of the compared algorithms on COIL-20 [55], ETH-80 [56], and Brodatz versus m are shown in Figures 8 and 9. The experimental setups were the same as those described in the previous section.

As the dimensionality increased, the recognition accuracy curves showed an upward trend. Once the recognition accuracy reached a certain value, the recognition rate remained basically stable within a certain range of subspace dimensions.
5.4. Discussion. In the above experiments, the performance of the RCD and HRCD and the effectiveness of HSIC-SL and the other algorithms were compared. The following observations were made:

(1) The classification accuracy in the image feature space represented by the HRCD was better than that represented by the RCD regardless of which classifier was used (i.e., the KNN classifier without feature extraction or the proposed HSIC-SL). This result showed that the proposed image descriptor HRCD outperformed the RCD.
(2) When the RCD was used as the image descriptor, the HSIC-SL method was superior to most of the compared methods, except on the FERET and Brodatz datasets. On FERET, the performance of Riem-DLSC, MKSSCR, and KLRM-DL was slightly better than that of HSIC-SL + RCD (log-Gaussian kernel). On Brodatz, the performance of HSIC-SL + RCD was slightly worse than that of the other methods in the 5-texture, 10-texture, and 16-texture groups. Nevertheless, the recognition accuracy of HSIC-SL + RCD in the experiment on all texture images was higher than those of the other methods. The results showed that HSIC-SL was indeed an excellent algorithm on SPD manifolds, but it was inferior in the classification of datasets with subtle features, such as face recognition and texture recognition. At the same time, the HRCD makes up for this defect to a certain extent: the performance of HSIC-SL + HRCD was superior to that of almost all methods. However, on the FERET dataset, the average recognition accuracy of HSIC-SL was lower than that of KLRM-DL.
Figure 6: Sample images of Brodatz dataset.
Table 5: Images selected for each group.
Experiment number Images selected
5-texture D77, D84, D55, D53, D24
10-texture D4, D9, D19, D21, D24, D28, D29, D36, D37, D38
16-texture D3, D4, D5, D6, D9, D21, D24, D29, D32, D33, D54, D55, D57, D68, D77, D84
Figure 7: Comparison of algorithms for 1NN classification accuracy on the Brodatz dataset.
Table 6: Accuracy of all texture classifications of Brodatz.
Methods Accuracy (%)
CDL 85.25
RLPP 75.8
KSLR 85.55
logEuc-SC 63.24
RSR-S 78.32
RSR-J 76.7
Riem-DLSC 37.1
Geometry-DR 74.9
KLRM-DL 79.23
HSIC-SL (log-linear) + RCD 84.87
HSIC-SL (log-linear) + HRCD 88.87
HSIC-SL (log-Gaussian) + RCD 85.98
HSIC-SL (log-Gaussian) + HRCD 89.66
Table 7: Comparison of recognition rates (%) of different methods.
Methods COIL-20 ETH-80
KPCA 81.05 72.61
KPCA + HRCD 83.79 73.7
RLPP 85.89 74.08
RLPP + HRCD 88.79 75.63
CDL 94.54 79.92
KSLR 96.24 81.66
HSIC-SL (log-linear) + RCD 96.72 82.80
HSIC-SL (log-linear) + HRCD 97.75 84.60
HSIC-SL (log-Gaussian) + RCD 96.87 82.40
HSIC-SL (log-Gaussian) + HRCD 97.92 85.28
Figure 8: Recognition rates versus different dimensionalities on the COIL-20 database.
Figure 9: Recognition rates versus different dimensionalities on the ETH-80 database.
(3) In the experiments, we also compared the performance of the log-Gaussian kernel and the log-linear kernel. In general, the log-Gaussian kernel was better than the log-linear kernel. However, in the experiments on QMUL and Brodatz, the log-linear kernel obtained better results than the log-Gaussian kernel in some settings. This difference in performance indicates that the choice of kernel affects the performance of HSIC-SL, and the performance of HSIC-SL can be improved by selecting a suitable kernel function.
6. Conclusions
In this work, we propose an improved covariance descriptor called the HRCD, which represents images with SPD matrices. The HRCD inherits the advantages of the RCD and is more effective.

To address the classification problem on SPD Riemannian manifolds, we propose an efficient image classification method based on a kernel framework, which we refer to as HSIC-SL. Through the definition of the log-linear kernel and the log-Gaussian kernel, the input images represented by SPD matrices can be embedded into an RKHS. To seek an explicit mapping from the RKHS to the vector space, HSIC-SL constructs its objective function on the basis of subspace learning and HSIC maximization. HSIC-SL outperforms other representative methods in most cases without increasing computational complexity.

The proposed algorithm also has certain limitations. Its average classification accuracy is slightly worse than that of KLRM-DL on the FERET dataset. Hence, the covariance descriptor is not strong enough to handle the classification of subtle details, as in face recognition. In future work, we will employ other effective features to form the covariance matrices and explore other useful kernel functions suited to different types of datasets.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This study was supported by the National Natural Science
Foundation of China through the Project “Research on
Nonlinear Alignment Algorithm of Local Coordinates in
Manifold Learning” under grant no. 61773022, the Character
and Innovation Project of Education Department of
Guangdong Province under grant no. 2018GKTSCX081, the
Young Innovative Talents Project of Education Department
of Guangdong Province under grant no. 2020KQNCX191,
the Guangzhou Science and Technology Plan Project of
Bureau of Science and Technology of Guangzhou Munici-
pality under grant no. 202102020700, and the Educational
Big Data Enterprise Lab of Guangzhou Panyu Polytechnic
under grant no. 2021XQS05.
References
[1] V. Arsigny, P. Fillard, X. Pennec, and N. Ayache, “Geometric
means in a novel vector space structure on symmetric pos-
itive-definite matrices,” SIAM Journal on Matrix Analysis and
Applications, vol. 29, no. 1, pp. 328–347, 2007.
[2] M. Harandi, C. Sanderson, C. Shen, and B. Lovell, “Dictionary
learning and sparse coding on Grassmann manifolds: an
extrinsic solution,” in Proceedings of the 2013 IEEE Interna-
tional Conference on Computer Vision, pp. 3120–3127, Sydney,
Australia, 2013.
[3] F. Porikli, O. Tuzel, and P. Meer, “Covariance tracking using
model update based on Lie algebra,” in Proceedings of the 2006
IEEE Computer Society Conference on Computer Vision and
Pattern Recognition, New York, NY, USA, 2006.
[4] R. Wang, H. Guo, L. S. Davis, and Q. Dai, “Covariance
discriminative learning: a natural and efficient approach to
image set classification,” in Proceedings of the 2012 IEEE
Conference on Computer Vision and Pattern Recognition,
Providence, RI, USA, 2012.
[5] W. Wang, R. Wang, Z. Huang, S. Shan, and X. Chen,
“Discriminant analysis on Riemannian manifold of Gaussian
distributions for face recognition with image sets,” IEEE
Transactions on Image Processing, vol. 27, no. 1, pp. 151–163,
2017.
[6] I. L. Dryden, A. Koloydenko, and D. Zhou, “Non-Euclidean
statistics for covariance matrices, with applications to diffu-
sion tensor imaging,” The Annals of Applied Statistics, vol. 3,
no. 3, pp. 1102–1123, 2009.
[7] D. Le Bihan, J.-F. Mangin, C. Poupon et al., “Diffusion
tensor imaging: concepts and applications,” Journal of
Magnetic Resonance Imaging, vol. 13, no. 4, pp. 534–546, 2001.
[8] R. Caseiro, P. Martins, J. F. Henriques, and J. Batista, “A
nonparametric Riemannian framework on tensor field with
application to foreground segmentation,” Pattern Recogni-
tion, vol. 45, no. 11, pp. 3997–4017, 2012.
[9] M. G. omason and J. Gregor, “Higher order singular value
decomposition of tensors for fusion of registered images,”
Journal of Electronic Imaging, vol. 20, no. 1, pp. 13–23, 2011.
[10] Z. Gao, Y. Wu, M. Harandi, and Y. Jia, “A robust distance
measure for similarity-based classification on the SPD
manifold,” IEEE Transactions on Neural Networks and
Learning Systems, vol. 31, no. 9, pp. 3230–3244, 2020.
[11] Y. Wu, Y. Jia, P. Li, J. Zhang, and J. Yuan, “Manifold kernel
sparse representation of symmetric positive-definite matrices
and its applications,” IEEE Transactions on Image Processing,
vol. 24, no. 11, pp. 3729–3741, 2015.
[12] A. Sanin, C. Sanderson, M. T. Harandi, and B. C. Lovell,
“Spatio-temporal covariance descriptors for action and ges-
ture recognition,” in Proceedings of the 2013 IEEE Workshop
on Applications of Computer Vision (WACV), Clearwater
Beach, FL, USA, 2013.
[13] J. S´anchez, F. Perronnin, T. Mensink, and J. Verbeek, “Image
classification with the Fisher vector: theory and practice,”
International Journal of Computer Vision, vol. 105, no. 3,
pp. 222–245, 2013.
[14] H. Jegou, F. Perronnin, M. Douze, J. Sanchez, P. Perez, and
C. Schmid, “Aggregating local image descriptors into compact
codes,” IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 34, no. 9, pp. 1704–1716, 2012.
[15] X. Pennec, P. Fillard, and N. Ayache, “A Riemannian
framework for tensor computing,” International Journal of
Computer Vision, vol. 66, no. 1, pp. 41–66, 2006.
[16] V. Arsigny, P. Fillard, X. Pennec, and N. Ayache, “Log-Eu-
clidean metrics for fast and simple calculus on diffusion
tensors,” Magnetic Resonance in Medicine, vol. 56, no. 2,
pp. 411–421, 2006.
[17] A. Goh and R. Vidal, “Clustering and dimensionality re-
duction on Riemannian manifolds,” in Proceedings of the 2008
IEEE Conference on Computer Vision and Pattern Recognition,
Anchorage, AK, USA, 2008.
[18] O. Tuzel, F. Porikli, and P. Meer, “Pedestrian detection via
classification on Riemannian manifolds,” IEEE Transactions
on Pattern Analysis and Machine Intelligence, vol. 30, no. 10,
pp. 1713–1727, 2008.
[19] B. Kulis, M. Sustik, and I. Dhillon, “Learning low-rank kernel
matrices,” in Proceedings of the 23rd International Conference
on Machine Learning, Pittsburgh, PA, USA, 2006.
[20] S. Sra, “A new metric on the manifold of kernel matrices with
application to matrix geometric means,” in Proceedings of the
25th International Conference on Neural Information Pro-
cessing Systems, vol. 1, Lake Tahoe, NV, USA, 2012.
[21] D. Tosato, M. Farenzena, M. Spera, V. Murino, and
M. Cristani, “Multi-class classification on Riemannian
manifolds for video surveillance,” in Proceedings of the 2010
ECCV, Heraklion, Greece, 2010.
[22] J. Carreira, R. Caseiro, J. Batista, and C. Sminchisescu, “Se-
mantic segmentation with second-order pooling,” in Pro-
ceedings of the 2012 ECCV, Florence, Italy, 2012.
[23] A. Cherian, S. Sra, A. Banerjee, and N. Papanikolopoulos,
“Jensen-Bregman LogDet divergence with application to ef-
ficient similarity search for covariance matrices,” IEEE
Transactions on Pattern Analysis and Machine Intelligence,
vol. 35, no. 9, pp. 2161–2174, 2013.
[24] N. Aronszajn, “Theory of reproducing kernels,” Transactions
of the American Mathematical Society, vol. 68, no. 3,
pp. 337–404, 1950.
[25] M. Harandi, M. Salzmann, and R. Hartley, “Dimensionality
reduction on SPD manifolds: the emergence of geometry-
aware methods,” IEEE Transactions on Pattern Analysis and
Machine Intelligence, vol. 40, no. 1, pp. 48–62, 2018.
[26] M. T. Harandi, C. Sanderson, R. Hartley, and B. C. Lovell,
“Sparse coding and dictionary learning for symmetric positive
definite matrices: a kernel approach,” in Proceedings of the
ECCV 2012, Florence, Italy, 2012.
[27] R. Vemulapalli, J. K. Pillai, and R. Chellappa, “Kernel learning
for extrinsic classification of manifold features,” in Proceed-
ings of the 2013 IEEE Conference on Computer Vision and
Pattern Recognition, Portland, OR, USA, 2013.
[28] O. Tuzel, F. Porikli, and P. Meer, “Region covariance: a fast
descriptor for detection and classification,” in Proceedings of
the ECCV 2006, vol. 3952, Graz, Austria, 2006.
[29] H. Tan and Y. Gao, “Patch-based principal covariance dis-
criminative learning for image set classification,” IEEE Access,
vol. 5, pp. 15001–15012, 2017.
[30] O. Arandjelovic, G. Shakhnarovich, J. Fisher, R. Cipolla, and
T. Darrell, “Face recognition with image sets using manifold
density divergence,” in Proceedings of the 2005 IEEE Computer
Society Conference on Computer Vision and Pattern Recog-
nition (CVPR’05), San Diego, CA, USA, 2005.
[31] M. Lovrić, M. Min-Oo, and E. A. Ruh, “Multivariate normal
distributions parametrized as a Riemannian symmetric
space,” Journal of Multivariate Analysis, vol. 74, no. 1,
pp. 36–48, 2000.
[32] X. Liu and Z. Ma, “Kernel-based subspace learning on Rie-
mannian manifolds for visual recognition,” Neural Processing
Letters, vol. 51, no. 1, pp. 147–165, 2020.
[33] R. Caseiro, P. Martins, J. F. Henriques, F. S. Leite, and
J. Batista, “Rolling Riemannian manifolds to solve the multi-
class classification problem,” in Proceedings of the 2013 IEEE
Conference on Computer Vision and Pattern Recognition,
vol. 9, Portland, OR, USA, 2013.
[34] K. Guo, P. Ishwar, and J. Konrad, “Action recognition from
video using feature covariance matrices,” IEEE Transactions
on Image Processing, vol. 22, no. 6, pp. 2479–2494, 2013.
[35] R. Sivalingam, D. Boley, V. Morellas, and
N. Papanikolopoulos, “Tensor sparse coding for positive
definite matrices,” IEEE Transactions on Pattern Analysis and
Machine Intelligence, vol. 36, no. 3, pp. 592–605, 2014.
[36] A. Cherian and S. Sra, “Riemannian dictionary learning and
sparse coding for positive definite matrices,” IEEE Transac-
tions on Neural Networks and Learning Systems, vol. 28, no. 12,
pp. 2859–2871, 2017.
[37] X. Xie, Z. L. Yu, Z. Gu, and Y. Li, “Classification of symmetric
positive definite matrices based on bilinear isometric Rie-
mannian embedding,” Pattern Recognition, vol. 87, pp. 94–
105, 2019.
[38] M. T. Harandi, C. Sanderson, A. Wiliem, and B. C. Lovell,
“Kernel analysis over Riemannian manifolds for visual rec-
ognition of actions, pedestrians and textures,” in Proceedings
of the 2012 IEEE Workshop on the Applications of Computer
Vision (WACV), Breckenridge, CO, USA, 2012.
[39] S. Jayasumana, R. Hartley, M. Salzmann, H. Li, and
M. Harandi, “Kernel methods on Riemannian manifolds with
Gaussian RBF kernels,” IEEE Transactions on Pattern Analysis
and Machine Intelligence, vol. 37, no. 12, pp. 2464–2477, 2015.
[40] R. Caseiro, J. F. Henriques, P. Martins, and J. Batista, “Semi-
intrinsic mean shift on Riemannian manifolds,” in Proceed-
ings of the ECCV 2012, vol. 7572, Florence, Italy, 2012.
[41] P. Li, Q. Wang, W. Zuo, and L. Zhang, “Log-Euclidean kernels
for sparse representation and dictionary learning,” in Pro-
ceedings of the 2013 IEEE International Conference on Com-
puter Vision, Sydney, Australia, 2013.
[42] M. T. Harandi, R. Hartley, B. Lovell, and C. Sanderson,
“Sparse coding on symmetric positive definite manifolds
using Bregman divergences,” IEEE Transactions on Neural
Networks and Learning Systems, vol. 27, no. 6, pp. 1294–1306,
2016.
[43] R. Zhuang, Z. Ma, W. Feng, and Y. Lin, “SPD data dictionary
learning based on kernel learning and Riemannian metric,”
IEEE Access, vol. 8, pp. 61956–61972, 2020.
[44] S. Hechmi, A. Gallas, and E. Zagrouba, “Multi-kernel sparse
subspace clustering on the Riemannian manifold of sym-
metric positive definite matrices,” Pattern Recognition Letters,
vol. 125, pp. 21–27, 2019.
[45] J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern
Analysis, Cambridge University Press, New York, NY, USA,
2004.
[46] E. Kreyszig, Introductory Functional Analysis with Applica-
tions, Vol. 1, Wiley, New York, NY, USA, 1978.
[47] B. B. Damodaran, N. Courty, and S. Lefèvre, “Sparse Hilbert
Schmidt independence criterion and surrogate-kernel-based
feature selection for hyperspectral image classification,” IEEE
Transactions on Geoscience and Remote Sensing, vol. 55, no. 4,
pp. 2385–2398, 2017.
[48] K. Zhang, V. W. Zheng, Q. Wang, J. T. Kwok, Q. Yang, and
I. Marsic, “Covariate shift in Hilbert space: a solution via
surrogate kernels,” in Proceedings of the 30th International
Conference on Machine Learning, pp. 388–395, Atlanta, GA,
USA, 2013.
[49] M. J. Gangeh, H. Zarkoob, and A. Ghodsi, “Fast and scalable
feature selection for gene expression data using Hilbert-
Schmidt independence criterion,” IEEE/ACM Transactions on
Computational Biology and Bioinformatics, vol. 14, no. 1,
pp. 167–181, 2017.
[50] L. Song, A. Gretton, K. M. Borgwardt, and A. J. Smola,
“Colored maximum variance unfolding,” in Proceedings of the
20th International Conference on Neural Information Pro-
cessing Systems, Vancouver, Canada, 2007.
[51] L. Song, A. Smola, A. Gretton, J. Bedo, and K. Borgwardt,
“Feature selection via dependence maximization,” Journal of
Machine Learning Research, vol. 13, no. 1, pp. 1393–1434,
2007.
[52] J.-P. Aubin, “Hilbert-Schmidt operators and tensor products,”
in Applied Functional Analysis, pp. 283–308, John Wiley &
Sons, Hoboken, NJ, USA, 2nd edition, 2000.
[53] A. Gretton, O. Bousquet, A. Smola, and B. Schölkopf,
“Measuring statistical dependence with Hilbert-Schmidt
norms,” in Proceedings of the 2005 International Conference
on Algorithmic Learning Theory, pp. 63–77, Singapore, 2005.
[54] S. I. Amari and H. Nagaoka, Methods of Information Ge-
ometry, Oxford University Press, New York, NY, USA, 2000.
[55] S. Nene, S. Nayar, and H. Murase, “Columbia object image
library (COIL-20)," Technical report, Columbia University,
New York, NY, USA, 1996.
[56] B. Leibe and B. Schiele, “Analyzing appearance and contour
based methods for object categorization,” in Proceedings of the
2003 IEEE Computer Society Conference on Computer Vision
and Pattern Recognition, vol. 2, Madison, WI, USA, 2003.
[57] D. Tosato, M. Spera, M. Cristani, and V. Murino, “Charac-
terizing humans on Riemannian manifolds,” IEEE Transac-
tions on Pattern Analysis and Machine Intelligence, vol. 35,
no. 8, pp. 1972–1984, 2013.
[58] P. J. Phillips, H. Moon, S. A. Rizvi, and P. J. Rauss, "The
FERET evaluation methodology for face-recognition algo-
rithms,” IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 22, no. 10, pp. 1090–1104, 2000.
[59] T. Randen and J. H. Husoy, “Filtering for texture classifica-
tion: a comparative study,” IEEE Transactions on Pattern
Analysis and Machine Intelligence, vol. 21, no. 4, pp. 291–310,
1999.