The region covariance descriptor (RCD), which is known as a symmetric positive definite (SPD) matrix, is commonly used in image representation. As SPD manifolds have a non-Euclidean geometry, Euclidean machine learning methods are not directly applicable to them. In this work, an improved covariance descriptor called the hybrid region covariance descriptor (HRCD) is proposed. The HRCD incorporates the mean feature information into the RCD to improve the latter’s discriminative performance. To address the non-Euclidean properties of SPD manifolds, this study also proposes an algorithm called the Hilbert-Schmidt independence criterion subspace learning (HSIC-SL) for SPD manifolds. The HSIC-SL algorithm is aimed at improving classification accuracy. This algorithm is a kernel function that embeds SPD matrices into the reproducing kernel Hilbert space and further maps them to a linear space. To make the mapping consider the correlation between SPD matrices and linear projection, this method introduces global HSIC maximization to the model. The proposed method is compared with existing methods and is proved to be highly accurate and valid by classification experiments on the HRCD and HSIC-SL using the COIL-20, ETH-80, QMUL, face data FERET, and Brodatz datasets.
1. Introduction
A growing number of non-Euclidean data, such as sym-
metric positive definite (SPD) manifolds [1] and Grassmann
manifolds [2], are often encountered in vision recognition
tasks. In particular, SPD manifolds have attracted increased
attention in the form of the region covariance descriptor
(RCD) [3, 4], Gaussian mixed model (GMM) [5], tensors
[6–9], etc. In this work, we mainly discuss the image clas-
sification on SPD manifolds.
The RCD has been proved to be an effective descriptor in a
variety of applications [10–12]. It captures the correlation between
different features of an image and represents the image
with a covariance matrix. However, the mean vector of features
has been proved to be significant in image recognition tasks
[13, 14]. In this work, we construct a new image descriptor by
directly incorporating the mean feature information into the
RCD. e new image descriptor is called the hybrid region
covariance descriptor (HRCD). e HRCD inherits the
advantages of the RCD, and it is more discriminable than the
RCD. e images represented by the HRCD are also SPD
matrices that lie on SPD manifolds. Most classical machine
learning algorithms are constructed on linear spaces. Given the
non-Euclidean geometry of Riemannian manifolds, directly
using most of the conventional machine learning methods on
Riemannian manifolds is inadequate [15, 16]. Therefore, the
classification of the points on Riemannian manifolds has be-
come a hot research topic.
Two main approaches are generally adopted to cope with
the nonlinearity of Riemannian manifolds. The first approach
is to construct learning methods by directly considering
the Riemannian geometry; one such method is the
widely used tangent approximation [17, 18]. Most existing
SPD classification methods have been proposed by making
use of Riemannian metrics [15, 16] or matrix divergences
[19, 20] as the distance measure for SPD matrices [21–23].
The other approach is to project the SPD matrices to
another space, such as a high-dimensional reproducing
kernel Hilbert space (RKHS) [24] and another low-di-
mensional SPD manifold [25]. Classification algorithms can
be constructed on the projection space. Benefiting from the
success of kernel methods in Euclidean spaces, the kernel-
based classification scheme is a good choice for the analysis
of SPD manifolds and has shown promising performance
[26, 27]. Kernel-based methods embed manifolds into
RKHSs and further project these manifolds to Euclidean
spaces via an explicit mapping. Hence, algorithms designed
for linear spaces can be extended to Riemannian manifolds.
However, the mapping from RKHSs to Euclidean spaces
using existing methods is based on a linear assumption.
Moreover, the intrinsic connections of SPD matrices and
low-dimensional projections are ignored.
To circumvent this limitation of kernel-based methods,
we propose introducing the Hilbert–Schmidt independence
criterion (HSIC) to the kernel trick and refer to the resulting
method as the HSIC subspace learning (HSIC-SL) algo-
rithm. Specifically, we derive the log-linear and log-Gaussian
kernels to embed SPD matrices into a high-dimensional
RKHS and then project these points into a low-dimensional
vector space of the RKHS. To align the low-dimensional
representation with the intrinsic features of the input data,
we introduce statistical dependence between the SPD ma-
trices and the low-dimensional representation. In this work,
explicit mapping is obtained on the basis of subspace
learning and HSIC maximization. Here, HSIC can be used to
characterize the statistical correlation between two datasets.
The main contributions of this study are as follows:
(1) We propose a novel covariance descriptor called the
HRCD. The proposed descriptor exploits discriminative
information effectively.
(2) The HSIC is applied for the first time to the kernel framework on
SPD Riemannian manifolds, and a novel subspace
learning algorithm called the HSIC-SL is proposed.
The proposed method achieves effective classification
on the basis of global HSIC maximization.
(3) We identify two simple kernel functions involved in
the HSIC-SL algorithm. The diversity of kernels
improves the flexibility of the HSIC-SL.
The rest of the paper is organized as follows. We provide
a review of previous work in Section 2. A brief description
of the RCD, RKHS, and HSIC is presented in Section 3. We
derive the proposed descriptor and algorithm in detail in
Section 4. The experimental results are presented in Section 5
to demonstrate the effectiveness of the HRCD and HSIC-SL.
Conclusions and future research directions are presented in
Section 6.
2. Literature Review
This section presents a brief review of RCDs, as well as recent manifold classification methods constructed on SPD manifolds.
The RCD was first introduced by Tuzel et al. [28]. It represents an image region with a nonsingular SPD matrix by extracting the covariance matrix of multiple features. The covariance matrix does not have any information about size and ordering, which implies certain scale and rotation independence. The RCD is used not only in image recognition but also in image set recognition tasks, in which an image set is modeled with its natural second-order statistic [4, 29]. The GMM could also serve as the SPD descriptor of an image set. Under the assumption of the multi-Gaussian distribution of an image set [30], hundreds of images in the image set are assigned to a small number of Gaussian components. Each Gaussian component is represented as an SPD matrix [31]. Thus, the image set is described by
multiple SPD matrices. As mentioned previously, mean
vectors have also been proved to be important in recog-
nition tasks. In [32], the mean information was utilized in
an improved log-Euclidean Gaussian kernel. However, this
approach is limited to a specific algorithm and lacks
generality. In the current work, we propose to incorporate
the feature mean information and covariance matrix into a
new SPD matrix and introduce first-order statistic infor-
mation into the image RCD to improve the discriminant
ability of the descriptor.
When the manifold under consideration is an SPD
manifold, the tangent space of a particular point is a linear
space. Most works map SPD matrices onto the tangent space
of a particular point; thus, traditional linear classifiers can be
applied. Under this framework, dimensionality reduction
and clustering methods, such as Laplacian eigenmaps, local
linear embedding (LLE), and Hessian LLE, have been ex-
tended to Riemannian manifolds [17]. Tuzel et al. introduced
LogitBoost for classification on Riemannian manifolds [18].
The classifier has been generalized to multiclass classification
[33]. Sparse coding by embedding manifolds into identity
tangent spaces to identify the Lie algebra of SPD manifolds
was considered in [34]. Such tangent space approximations
could preserve manifold value data and eliminate the
swelling effect. However, flattening a manifold through
tangent spaces may generate inaccurate modeling, especially
for regions far away from the tangent pole.
Apart from tangent approximation, many efforts have
been devoted to distance measures on SPD manifolds that
reflect the true SPD manifold geometry; examples include
the log-Euclidean Riemannian metric (LERM) [15] and the
affine-invariant Riemannian metric (AIRM) [16]. Although
matrix divergences are not true Riemannian metrics, they
provide fast and approximate distance computation. Siva-
lingam et al. proposed tensor sparse coding (TSC) for
positive definite matrices [35] that utilizes the Burg diver-
gence to perform sparse coding and dictionary learning on
SPD manifolds. Riemannian dictionary learning and sparse
coding (DLSC) [36] represents data as sparse combinations
of SPD dictionary atoms via a Riemannian geometric ap-
proach and characterizes the loss of optimization for DLSC
via the affine-invariant Riemannian metric. However, these
methods cannot be applied to other Riemannian manifolds
because the metrics they rely on are specific to SPD manifolds.
Embedding discriminant analysis (EDA) [37] identifies a
bilinear isometric mapping such that the resulting
representation maximizes the preservation of Riemannian
geodesic distance.
As for the proposed kernel methods for SPD manifolds,
Riemannian locality preserving projections (RLPPs) [38]
embed Riemannian manifolds into low-dimensional vector
spaces by defining Riemannian kernels; however, their
computational cost is high, and the kernel is not
always positive definite. Jayasumana et al. [39] presented a
framework on Riemannian manifolds to identify the positive
definiteness of Gaussian RBF kernels and utilized the log-
Euclidean Gaussian kernel in kernel principal component
analysis (KPCA) for a recognition task. Caseiro et al. pro-
posed a heat kernel mean shift on Riemannian manifolds
[40]. In [41], kernel DLSC based on LERM was introduced.
Harandi et al. proposed to seek sparse coding by embedding
the space of SPD matrices into Hilbert spaces through two
types of Bregman matrix divergences [42]. Covariance
discriminative learning (CDL) [4] utilizes a matrix logarithm
operator to define kernel functions and then explicitly maps
the covariance matrices from a Riemannian manifold to a
Euclidean space. Zhuang et al. proposed a data-dependent
kernel learning framework on the basis of the kernel learning
and Riemannian metric (KLRM) [43]. In [44], multi-kernel
SSCR (MKSSCR) created a linear combination of a set of
Riemannian kernels. Promising results have been achieved in the
kernel framework. To improve the performance of the kernel
trick, we introduce a statistical dependence constraint be-
tween SPD matrices and projections and measure the sta-
tistical dependence with the HSIC.
3. Related Work
In this section, we briefly review the RCD and the properties
of the RKHS and HSIC.
3.1. Region Covariance Descriptor. The region covariance descriptor (RCD), as a special case of SPD matrices, provides a natural way of fusing multiple features. Suppose $R$ is an image region of size $h \times w$, and we can extract multiple features of every pixel in $R$. The features could be location, grey values, and gradients. We denote the feature vector of the $k$-th pixel as
$$z_k = [x, y, I, I_x, I_y]^{T}, \tag{1}$$
where $x$ and $y$ denote the location, $I$ is the grey value, and $I_x$ and $I_y$ are the gradients with respect to $x$ and $y$. The RCD of $R$ is defined as
$$C_R = \frac{1}{n-1} \sum_{k=1}^{n} (z_k - \mu)(z_k - \mu)^{T}, \tag{2}$$
where $n = h \times w$ and $\mu = (1/n) \sum_{k=1}^{n} z_k \in \mathbb{R}^{d}$ denotes the mean of the points. Then, the image region can be represented by a $d \times d$ SPD matrix, where $d$ depends on the number of features.
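As a rough illustration of the construction above, the following NumPy sketch builds the five-dimensional per-pixel features and their covariance for a grey image region. The function names `pixel_features` and `region_covariance` are hypothetical, `np.gradient` is only a stand-in for whatever gradient filter is actually used, and the unbiased $1/(n-1)$ normalization is an assumption.

```python
import numpy as np

def pixel_features(img):
    """Stack [x, y, I, I_x, I_y] for every pixel of a 2-D grey image (n x 5)."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]               # row (y) and column (x) coordinates
    gy, gx = np.gradient(img.astype(float))   # finite-difference gradients (assumed filter)
    feats = [xs, ys, img, gx, gy]
    return np.stack([f.ravel() for f in feats], axis=1).astype(float)

def region_covariance(Z):
    """Covariance of the n x d feature matrix Z, assuming 1/(n-1) normalization."""
    mu = Z.mean(axis=0)
    D = Z - mu
    return D.T @ D / (Z.shape[0] - 1)

rng = np.random.default_rng(0)
region = rng.random((16, 16))                 # synthetic 16 x 16 region
Z = pixel_features(region)
C = region_covariance(Z)                      # 5 x 5 symmetric matrix
```

The descriptor size depends only on the number of features, not on the size of the region, which is why regions of different sizes can be compared directly.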
3.2. RKHS. The reproducing kernel Hilbert space (RKHS) is the theoretical basis of kernel methods. After the data are projected into an RKHS, various machine learning methods can be implemented in the RKHS.
Let $S(\Omega)$ be a function space and $\langle\cdot,\cdot\rangle$ be an inner product defined on $S(\Omega)$. The complete inner product space $H = (S(\Omega), \langle\cdot,\cdot\rangle)$ induced by $\langle\cdot,\cdot\rangle$ is a Hilbert space. For all $x \in \Omega$ and $f \in S(\Omega)$, if the function $k$ satisfies $f(x) = \langle f, k(\cdot, x)\rangle$, then $k$ is the reproducing kernel of the RKHS $H$. We denote the mapping defined by the reproducing kernel as $\phi(x) = k(\cdot, x) = k_x \in H$. It follows that $\langle\phi(x), \phi(y)\rangle = \langle k_x, k(\cdot, y)\rangle = k_x(y) = k(y, x) = k(x, y)$.
The function $k$ can be a kernel function only if the kernel matrix $K$ is symmetric positive definite, where
$$K = \begin{bmatrix} k(x_1, x_1) & \cdots & k(x_1, x_n) \\ \vdots & \ddots & \vdots \\ k(x_n, x_1) & \cdots & k(x_n, x_n) \end{bmatrix}.$$
According to Mercer's theorem [45], once a valid reproducing kernel is defined, we can generate a unique Hilbert space.
3.3. Hilbert–Schmidt Independence Criterion (HSIC). The HSIC [46] is usually used to characterize the statistical correlation of two datasets. The mathematical theory of the HSIC has been studied extensively [47–51]. To compute the HSIC, the two datasets are first embedded into two RKHSs, and the HSIC of the two sets of data is then measured by the Hilbert–Schmidt (HS) operator of these two RKHSs.
Let $X$ be a random variable/vector defined on $\Omega_X$ and $Y$ be a random variable/vector defined on $\Omega_Y$, $H_X$ and $H_Y$ be two separate Hilbert spaces, and $\phi_X: \Omega_X \to H_X$ and $\phi_Y: \Omega_Y \to H_Y$ be the kernel mappings defined by the reproducing kernels, respectively.
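3.3.1. Hilbert–Schmidt (HS) Operators. Let $T: H_X \to H_Y$ be a compact operator and $\{e_i^X\}_{i \in I}$ be the orthonormal basis of $H_X$; if $\sum_{i \in I} \|T e_i^X\|_Y^2 < +\infty$, then $T$ is called a Hilbert–Schmidt (HS) operator [52]. If for all $T, S \in HS(H_X \to H_Y)$, $|\sum_{i \in I} \langle T e_i^X, S e_i^X\rangle_Y| < +\infty$, then $(HS(H_X \to H_Y), \langle\cdot,\cdot\rangle_{HS})$ is a Hilbert space. The inner product $\langle\cdot,\cdot\rangle_{HS}$ is defined as
$$\langle T, S\rangle_{HS} = \sum_{i \in I} \langle T e_i^X, S e_i^X\rangle_Y.$$
For $f_0 \in H_X$ and $g_0 \in H_Y$, the tensor product of $f_0$ and $g_0$ is denoted as $f_0 \otimes g_0$. Since $(f_0 \otimes g_0)(f) = \langle f_0, f\rangle_X\, g_0 \in H_Y$, then $f_0 \otimes g_0 \in HS(H_X \to H_Y)$ [53].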
3.3.2. Mean Functions and Cross Covariance Operators. Let $\Phi_X: H_X \to \mathbb{R}$ be a continuous linear functional over $H_X$; for all $f \in H_X$,
$$\Phi_X(f) = E_X\left[\langle \phi_X(X), f\rangle_X\right]. \tag{4}$$
According to the Riesz theorem, there must exist a unique element $\mu_X \in H_X$ such that for all $f \in H_X$, $\Phi_X(f) = \langle f, \mu_X\rangle_X$; then, $\mu_X$ is called the mean function of $\phi_X(X)$. The mean function $\mu_Y$ of $\phi_Y(Y)$ is defined in the same way.
Let $\Phi$ be a continuous linear functional over $HS(H_X \to H_Y)$; for all $T \in HS(H_X \to H_Y)$,
$$\Phi(T) = E_{XY}\left[\langle \phi_X(X) \otimes \phi_Y(Y), T\rangle_{HS}\right]. \tag{5}$$
Then, according to the Riesz theorem, there must exist a unique HS operator $C_{XY} \in HS(H_X \to H_Y)$ such that for all $T \in HS(H_X \to H_Y)$,
$$\Phi(T) = \langle T, C_{XY}\rangle_{HS}, \tag{6}$$
where $C_{XY}$ is called the cross covariance operator between $\phi_X(X)$ and $\phi_Y(Y)$.
The relationship between $C_{XY}$, $\mu_X$, and $\mu_Y$ is illustrated in Figure 1. The two datasets $\Omega_X$ and $\Omega_Y$ are embedded into $H_X$ and $H_Y$ by the kernel functions $\phi_X: \Omega_X \to H_X$ and $\phi_Y: \Omega_Y \to H_Y$, respectively. $\mu_X$ and $\mu_Y$ are the mean functions. The HSIC of $\Omega_X$ and $\Omega_Y$ is given by the Hilbert–Schmidt (HS) operator $C_{XY}$ of $H_X$ and $H_Y$.
3.3.3. HSIC. The HSIC of two random variables/vectors is defined as follows:
$$\mathrm{HSIC}(X, Y) = \left\| E_{XY}\left[(\phi_X(X) - \mu_X) \otimes (\phi_Y(Y) - \mu_Y)\right] \right\|_{HS}^{2}. \tag{7}$$
It can be seen from the definition of $\mathrm{HSIC}(X, Y)$ that instead of directly calculating the covariance of $X$ and $Y$, i.e., $E_{XY}[(X - E_X[X])(Y - E_Y[Y])]$, HSIC first transforms $X$ and $Y$ into $H_X$ and $H_Y$, respectively, and then calculates the covariance of $\phi_X(X)$ and $\phi_Y(Y)$ by using HS operators between $H_X$ and $H_Y$. In practice, $H_X$ and $H_Y$ are generated from kernel functions $k_X$ and $k_Y$.
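If the joint probability distribution of $X$ and $Y$ is given or known, $\mathrm{HSIC}(X, Y)$ can be calculated as follows:
$$\begin{aligned} \mathrm{HSIC}(X, Y) &= \left\| E_{XY}\left[(\phi_X(X) - \mu_X) \otimes (\phi_Y(Y) - \mu_Y)\right] \right\|_{HS}^{2} \\ &= \langle C_{XY} - \mu_X \otimes \mu_Y,\; C_{XY} - \mu_X \otimes \mu_Y\rangle_{HS} \\ &= \langle C_{XY}, C_{XY}\rangle_{HS} - 2\langle C_{XY}, \mu_X \otimes \mu_Y\rangle_{HS} + \langle \mu_X \otimes \mu_Y, \mu_X \otimes \mu_Y\rangle_{HS}, \end{aligned} \tag{8}$$
where, with $(X', Y')$ denoting an independent copy of $(X, Y)$,
$$\langle C_{XY}, C_{XY}\rangle_{HS} = E_{XY}\left[E_{X'Y'}\left[k_X(X, X')\, k_Y(Y, Y')\right]\right], \tag{9}$$
$$\langle C_{XY}, \mu_X \otimes \mu_Y\rangle_{HS} = E_{XY}\left[E_{X'}\left[k_X(X, X')\right] E_{Y'}\left[k_Y(Y, Y')\right]\right], \tag{10}$$
$$\langle \mu_X \otimes \mu_Y, \mu_X \otimes \mu_Y\rangle_{HS} = \langle \mu_X, \mu_X\rangle_X \langle \mu_Y, \mu_Y\rangle_Y. \tag{11}$$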
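Generally speaking, the joint probability distribution of $X$ and $Y$ is unknown, and only some samples of $X$ and $Y$ are given: $X = \{x_1, \ldots, x_N\} \subset \Omega_X$, $Y = \{y_1, \ldots, y_N\} \subset \Omega_Y$. In this case, the statistical average can be approximated by the sample average. Moreover, it is assumed that when $i \neq j$, the probability of the random event $\{X = x_i; Y = y_j\}$ is 0; then, the cross covariance operator $C_{XY}$ and the mean functions $\mu_X$ and $\mu_Y$ can be approximated as follows:
$$C_{XY} \approx \frac{1}{N} \sum_{i=1}^{N} \phi_X(x_i) \otimes \phi_Y(y_i), \quad \mu_X \approx \frac{1}{N} \sum_{i=1}^{N} \phi_X(x_i), \quad \mu_Y \approx \frac{1}{N} \sum_{i=1}^{N} \phi_Y(y_i). \tag{12}$$
Figure 1: The sketch mapping of HSIC.
Substituting equation (12) into equations (9), (10), and (11) yields
$$\begin{aligned} \langle C_{XY}, C_{XY}\rangle_{HS} &= \frac{1}{N^2} \operatorname{tr}(K_X K_Y), \\ \langle C_{XY}, \mu_X \otimes \mu_Y\rangle_{HS} &= \frac{1}{N^3} \Gamma_N^{T} K_X K_Y \Gamma_N, \\ \langle \mu_X \otimes \mu_Y, \mu_X \otimes \mu_Y\rangle_{HS} &= \frac{1}{N^4} \left(\Gamma_N^{T} K_X \Gamma_N\right)\left(\Gamma_N^{T} K_Y \Gamma_N\right), \end{aligned} \tag{13}$$
where $\Gamma_N = [1, \ldots, 1]^{T} \in \mathbb{R}^{N}$ is the $N$-dimensional vector with all elements being 1 and $K_X$ and $K_Y$ are the kernel matrices of $X$ and $Y$, respectively.
At last, the calculation formula of $\mathrm{HSIC}(X, Y)$ can be obtained by
$$\mathrm{HSIC}(X, Y) = \frac{1}{N^2} \operatorname{tr}(K_X C_N K_Y C_N), \tag{14}$$
where $C_N = I_N - (1/N)\Gamma_N \Gamma_N^{T}$ is the centralizing matrix.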
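The sample estimate can be checked numerically. The sketch below, with hypothetical function names `empirical_hsic` and `hsic_expanded`, computes the centered trace form $(1/N^2)\operatorname{tr}(K_X C_N K_Y C_N)$ and, for comparison, the three-term expansion obtained from the sample approximations of $C_{XY}$, $\mu_X$, and $\mu_Y$. Linear kernels on synthetic one-dimensional data are used purely for illustration.

```python
import numpy as np

def empirical_hsic(K_x, K_y):
    """Biased sample HSIC: (1/N^2) tr(K_x C_N K_y C_N)."""
    N = K_x.shape[0]
    C = np.eye(N) - np.ones((N, N)) / N       # centralizing matrix C_N
    return np.trace(K_x @ C @ K_y @ C) / N**2

def hsic_expanded(K_x, K_y):
    """Same quantity written as <C,C> - 2<C, mu x mu> + <mu x mu, mu x mu>."""
    N = K_x.shape[0]
    one = np.ones(N)
    t1 = np.trace(K_x @ K_y) / N**2
    t2 = one @ K_x @ K_y @ one / N**3
    t3 = (one @ K_x @ one) * (one @ K_y @ one) / N**4
    return t1 - 2 * t2 + t3

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
noise = rng.normal(size=(200, 1))             # drawn independently of x
K_x = x @ x.T                                 # linear kernel on x
K_dep = (2 * x) @ (2 * x).T                   # kernel of y = 2x (fully dependent)
K_ind = noise @ noise.T                       # kernel of independent data
hsic_dep = empirical_hsic(K_x, K_dep)
hsic_ind = empirical_hsic(K_x, K_ind)
```

With these toy inputs the dependent pair gives a much larger value than the independent pair, which is the behaviour the criterion is meant to capture.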
4. HRCD and HSIC Subspace Learning
The algorithm can be divided into four steps. First, we model the labeled training samples through the proposed HRCD. Each training sample is described by an SPD matrix. Second, we embed the SPD matrices into a high-dimensional RKHS $H$ with a defined kernel function and further project the elements in the RKHS into a vector space $H'$. The mapping $f: H \to H'$ is found by solving the optimization problem. Third, we use the explicit map to project the training and test samples onto the low-dimensional and relatively discriminative space. Finally, the classification task can be realized by executing a classifier on $H'$. An overall illustration of the algorithm is shown in Figure 2.
Given a set of training samples belonging to $c$ classes, $\chi = \{X_1, X_2, \ldots, X_N\} \subset M$, where each $X_i \in \mathbb{R}^{d \times d}$ is an SPD matrix, let $l = \{l_1, l_2, \ldots, l_N\}$ denote the corresponding labels. The representation of $\chi$ on the low-dimensional vector space $H'$ is denoted as $Y = [y_1, y_2, \ldots, y_N]$, $y_i \in \mathbb{R}^{m}$. In the framework of the kernel analysis, the low-dimensional representation $y_i$ of $X_i$ is obtained by the mapping $y_i = W K_{i\mathrm{Row}}$, where $K_{i\mathrm{Row}} = [k_{i1}, k_{i2}, \ldots, k_{iN}]^{T}$.
4.1. Hybrid Region Covariance Descriptor. As mentioned
previously, we propose incorporating the feature mean information into the RCD to improve the discrimination of the descriptor. We refer to the resulting descriptor as the HRCD.
Given an image region $R$, we extract multiple features of each point in $R$ and then compute the mean vector and covariance matrix of the features. Suppose that the feature vector of the $k$-th pixel is $z_k$; the mean vector $\mu \in \mathbb{R}^{d}$ and covariance matrix $\Sigma \in \mathbb{R}^{d \times d}$ can then be computed as
$$\mu = \frac{1}{n} \sum_{k=1}^{n} z_k, \qquad \Sigma = \frac{1}{n-1} \sum_{k=1}^{n} (z_k - \mu)(z_k - \mu)^{T}. \tag{15}$$
Following the information geometry theory [54], we combine the mean and the covariance matrix into a new matrix without additional computational complexity. The new matrix is constructed as
$$X = |\Sigma|^{-1/(d+1)} \begin{bmatrix} \Sigma + \mu\mu^{T} & \mu \\ \mu^{T} & 1 \end{bmatrix}. \tag{16}$$
Here, $d$ is the dimensionality of the feature vector, and $|\cdot|$ is the determinant operator. The $(d+1) \times (d+1)$ SPD matrix $X$ is the HRCD of the image. As a result of the inheritance from the covariance matrix, the HRCD is not only effective, robust, and low-dimensional but also more discriminable than the RCD.
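A minimal NumPy sketch of this construction is given below, assuming the block form $|\Sigma|^{-1/(d+1)}\,[\Sigma + \mu\mu^{T},\ \mu;\ \mu^{T},\ 1]$ and a $1/(n-1)$ covariance normalization. The function name `hrcd` and the synthetic feature matrix are illustrative only.

```python
import numpy as np

def hrcd(Z):
    """Hybrid descriptor from an n x d feature matrix Z (one row per pixel)."""
    n, d = Z.shape
    mu = Z.mean(axis=0)
    D = Z - mu
    Sigma = D.T @ D / (n - 1)                 # covariance (assumed normalization)
    X = np.empty((d + 1, d + 1))
    X[:d, :d] = Sigma + np.outer(mu, mu)      # top-left block
    X[:d, d] = mu                             # last column
    X[d, :d] = mu                             # last row
    X[d, d] = 1.0
    _, logdet = np.linalg.slogdet(Sigma)      # log|Sigma|, stable for small determinants
    return np.exp(-logdet / (d + 1)) * X

rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 4)) * [1.0, 2.0, 0.5, 3.0] + [1.0, -2.0, 0.3, 5.0]
X = hrcd(Z)                                   # 5 x 5 SPD matrix
```

By the Schur complement, the unscaled block matrix has determinant $|\Sigma|$, so the scaling factor makes the resulting descriptor have unit determinant.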
4.2. Kernel Function in HSIC-SL. In defining a valid RKHS, the kernel must be symmetric positive definite. Many discussions on the symmetric positive definiteness of kernel functions are based on vector spaces. In this section, we introduce two typical kernel functions on SPD Riemannian manifolds.
4.2.1. Log-Linear Kernel. The polynomial kernel is one of the commonly used kernel functions in Euclidean spaces. The polynomial kernel function in a vector space is defined as
$$k(x_i, x_j) = \left(\alpha x_i^{T} x_j + \beta\right)^{c}, \tag{17}$$
where $x_i, x_j \in \mathbb{R}^{n}$. If $\alpha = c = 1$ and $\beta = 0$, then equation (17) is a linear kernel:
$$k(x_i, x_j) = x_i^{T} x_j. \tag{18}$$
When the linear kernel is extended to SPD Riemannian manifolds, it must be redefined. The linear kernel on SPD Riemannian manifolds can be defined as
$$k(X_i, X_j) = \operatorname{tr}\left[\log(X_i)^{T} \log(X_j)\right] = \operatorname{tr}\left[\log(X_i) \log(X_j)\right]. \tag{19}$$
We refer to this kernel as the log-linear kernel.
4.2.2. Log-Gaussian Kernel. The Gaussian kernel is another popular kernel function in Euclidean spaces. The definition of a Gaussian kernel function is
$$k(x_i, x_j) = \exp\left(-\frac{d^{2}(x_i, x_j)}{2\sigma^{2}}\right), \tag{20}$$
where $d(x_i, x_j) = \|x_i - x_j\|_F$. Good performance can be achieved by replacing the Euclidean distance with the log-Euclidean distance. The log-Gaussian kernel is defined by
$$k_{LE}(X_i, X_j) = \exp\left(-\frac{d_{LE}^{2}(X_i, X_j)}{2\sigma^{2}}\right), \tag{21}$$
where $d_{LE}(X_i, X_j) = \|\log(X_i) - \log(X_j)\|_F$ is the log-Euclidean distance between $X_i$ and $X_j$. The positive definiteness of $k_{LE}$ was proved in [39].
The parameter $\sigma$ is an important parameter in the Gaussian kernel. To make the log-Gaussian kernel sensitive to distances, we suggest setting $\sigma$ to the average value of distances between the training samples.
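Both kernels reduce to ordinary vector operations once the matrix logarithms are computed. The sketch below is one way to build the two Gram matrices with NumPy; the helper names are hypothetical, the matrix logarithm is taken through an eigendecomposition (valid for SPD inputs), and the $2\sigma^{2}$ scaling in the exponent is an assumption about the exact normalization. When `sigma` is not given, it defaults to the average pairwise log-Euclidean distance, as suggested above.

```python
import numpy as np

def spd_log(X):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(X)
    return (V * np.log(w)) @ V.T

def log_linear_kernel(logs):
    """Gram matrix with entries tr(log X_i log X_j)."""
    L = np.stack([M.ravel() for M in logs])   # vectorized matrix logs, one per row
    return L @ L.T

def log_euclidean_sqdist(logs):
    """Pairwise squared log-Euclidean distances."""
    L = np.stack([M.ravel() for M in logs])
    sq = np.sum(L**2, axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2 * L @ L.T, 0.0)

def log_gaussian_kernel(logs, sigma=None):
    """Gram matrix exp(-d_LE^2 / (2 sigma^2)); the factor 2 is an assumed convention."""
    D2 = log_euclidean_sqdist(logs)
    if sigma is None:                         # average pairwise distance
        iu = np.triu_indices(len(logs), k=1)
        sigma = np.sqrt(D2)[iu].mean()
    return np.exp(-D2 / (2 * sigma**2))

rng = np.random.default_rng(0)
mats = []
for _ in range(6):                            # six random 3 x 3 SPD matrices
    A = rng.normal(size=(3, 3))
    mats.append(A @ A.T + np.eye(3))
logs = [spd_log(M) for M in mats]
K_lin = log_linear_kernel(logs)
K_gau = log_gaussian_kernel(logs)
```

Computing each matrix logarithm once and reusing it for every kernel entry keeps the cost of building the Gram matrix close to that of a Euclidean kernel.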
4.3. HSIC Subspace Learning. After embedding the matrices into the RKHS, we further project the points into a vector space through explicit mapping. We aim to find the explicit mapping from the RKHS to the vector space by maximizing the HSIC between the SPD matrices and the low-dimensional representation, as well as preserving the local information. The proposed HSIC-SL includes global HSIC maximization and within-class information preservation.
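We denote the HSIC of $\chi$ and the low-dimensional representation $Y$ as $\mathrm{HSIC}(\chi, Y)$. According to equation (14), $\mathrm{HSIC}(\chi, Y)$ can be computed as
$$\mathrm{HSIC}(\chi, Y) = \frac{1}{N^2} \operatorname{tr}(K_\chi C_N K_Y C_N). \tag{22}$$
The input data $\chi$ and projection $Y$ are represented by $K_\chi$ and $K_Y$, respectively. To explicitly realize the low-dimensional representation, we define the kernel function of $Y$ in $\mathrm{HSIC}(\chi, Y)$ as $k_Y: \mathbb{R}^{m} \times \mathbb{R}^{m} \to \mathbb{R}$, $y, y' \in \mathbb{R}^{m}$; that is,
$$k_Y(y, y') = y^{T} y'. \tag{23}$$
We denote the kernel matrix of $k_Y$ as $K_Y$. It can be computed by
$$K_Y = \begin{bmatrix} y_1^{T} y_1 & \cdots & y_1^{T} y_N \\ \vdots & \ddots & \vdots \\ y_N^{T} y_1 & \cdots & y_N^{T} y_N \end{bmatrix} = Y^{T} Y. \tag{24}$$
Substituting equation (24) into equation (22) yields
$$\mathrm{HSIC}(\chi, Y) = \frac{1}{N^2} \operatorname{tr}\left(K_\chi C_N Y^{T} Y C_N\right) = \frac{1}{N^2} \operatorname{tr}\left(Y C_N K_\chi C_N Y^{T}\right). \tag{25}$$
As $N$ is not related to $Y$, the coefficient $(1/N^2)$ in equation (25) can be omitted. Then, since $Y = W K_\chi$, we have
$$\mathrm{HSIC}(\chi, Y) \propto \operatorname{tr}\left(W L_H W^{T}\right), \tag{26}$$
where $L_H = K_\chi C_N K_\chi C_N K_\chi$.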
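The within-class information is represented by the within-class scatter $S_W$, which is defined as
$$S_W = \operatorname{tr}\left(\sum_{i=1}^{c} \sum_{j=1}^{N_i} \left(y_j^{(i)} - m_i\right)\left(y_j^{(i)} - m_i\right)^{T}\right), \tag{27}$$
where $N_i$ is the number of training samples of the $i$-th class, $\sum_{i=1}^{c} N_i = N$, $y_j^{(i)}$ is the representation of the $j$-th sample of the $i$-th class, and $m_i$ is the mean vector corresponding to the $i$-th class. According to the relationship between $y_i$ and $X_i$, equation (27) can be further transformed into
$$S_W = \operatorname{tr}\left(W \left[\sum_{i=1}^{c} \sum_{j=1}^{N_i} \left(K_{j\mathrm{Row}}^{(i)} - K_{m_i}\right)\left(K_{j\mathrm{Row}}^{(i)} - K_{m_i}\right)^{T}\right] W^{T}\right) = \operatorname{tr}\left(W L_W W^{T}\right), \tag{28}$$
where $K_{m_i} = (1/N_i) \sum_{j=1}^{N_i} K_{j\mathrm{Row}}^{(i)}$ is the mean kernel vector of the $i$-th class and $L_W = \sum_{i=1}^{c} \sum_{j=1}^{N_i} (K_{j\mathrm{Row}}^{(i)} - K_{m_i})(K_{j\mathrm{Row}}^{(i)} - K_{m_i})^{T}$.
Figure 2: Framework of the proposed method. Each image is represented by an SPD matrix on the manifold $M$. The points on $M$ are embedded into the RKHS $H$ and are further projected to a low-dimensional and discriminative subspace $H'$. The map is optimized via the cost function $f$.
Mathematical Problems in Engineering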
In sum, the objective function is formulated as
$$J(W) = \arg\max_{W} \frac{\mathrm{HSIC}(\chi, Y)}{S_W} = \arg\max_{W} \frac{\operatorname{tr}\left(W L_H W^{T}\right)}{\operatorname{tr}\left(W L_W W^{T}\right)}. \tag{29}$$
The problem shown in equation (29) is a Rayleigh quotient maximization, which is commonly used in optimization because it can be solved quickly and simply through an eigenvalue decomposition. To tackle the singularity of $L_W$, we add a small perturbation $\varepsilon$ to the diagonal elements of $L_W$. The optimal projection matrix $W$ is composed of the eigenvectors corresponding to the $m$ largest eigenvalues of $(L_W + \varepsilon I_N)^{-1} L_H$, where $I_N$ is the identity matrix.
Hence, for the given test image, we first compute its HRCD and denote the result as $X_t$. The projection can be obtained by $y_t = W K_{t\mathrm{Row}}$. Then, the class of the test image can be predicted through the nearest neighbor classifier.
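The training and test steps above can be sketched as follows. This is an illustrative reading rather than the authors' implementation: it assumes $L_H = K C_N K C_N K$ (obtained by substituting $Y = WK$ into the HSIC expression) and takes $L_W$ as the within-class scatter of the kernel columns. The function names are hypothetical, and a Gaussian kernel on toy two-dimensional points stands in for an SPD kernel so that the example stays self-contained.

```python
import numpy as np

def hsic_sl_fit(K, labels, m, eps=1e-3):
    """Return W (m x N) from the N x N training kernel matrix K."""
    N = K.shape[0]
    C = np.eye(N) - np.ones((N, N)) / N        # centralizing matrix C_N
    L_H = K @ C @ K @ C @ K                    # HSIC term (assumed form)
    L_W = np.zeros((N, N))                     # within-class term
    for c in np.unique(labels):
        Kc = K[:, labels == c]                 # kernel columns of class c
        Dc = Kc - Kc.mean(axis=1, keepdims=True)
        L_W += Dc @ Dc.T
    # Solve L_H w = lambda (L_W + eps I) w through a Cholesky whitening step,
    # which is equivalent to the eigenproblem of (L_W + eps I)^{-1} L_H.
    R = np.linalg.cholesky(L_W + eps * np.eye(N))
    R_inv = np.linalg.inv(R)
    S = R_inv @ L_H @ R_inv.T
    vals, U = np.linalg.eigh((S + S.T) / 2)
    top = np.argsort(vals)[::-1][:m]           # indices of the m largest eigenvalues
    return (R_inv.T @ U[:, top]).T

def nearest_neighbor(Y_train, labels, Y_test):
    """1-NN on column-wise representations (m x N and m x T)."""
    d2 = ((Y_test.T[:, None, :] - Y_train.T[None, :, :]) ** 2).sum(axis=-1)
    return labels[np.argmin(d2, axis=1)]

def gaussian_gram(A, B, sigma=1.0):
    """Toy stand-in kernel between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2 * sigma**2))

rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [4.0, 4.0]])
y_train = np.repeat([0, 1], 20)
y_test = np.repeat([0, 1], 10)
P_train = centers[y_train] + 0.3 * rng.normal(size=(40, 2))
P_test = centers[y_test] + 0.3 * rng.normal(size=(20, 2))

K_train = gaussian_gram(P_train, P_train)
K_test = gaussian_gram(P_train, P_test)        # column t plays the role of K_tRow
W = hsic_sl_fit(K_train, y_train, m=1)
pred = nearest_neighbor(W @ K_train, y_train, W @ K_test)
accuracy = (pred == y_test).mean()
```

The whitening step is a design choice made here for numerical convenience: it turns the non-symmetric matrix $(L_W + \varepsilon I_N)^{-1} L_H$ into a symmetric eigenproblem with the same eigenvalues, so a symmetric solver can be used.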
5. Experiment
The performance of the HRCD and the proposed algorithm
is verified in this section. We considered five widely studied
image datasets: COIL-20 (Columbia Object Image Library)
dataset [55], ETH-80 dataset [56], Queen Mary University of
London (QMUL) dataset [57], FERET face dataset [58],
and Brodatz dataset [59]. All of the compared methods were
implemented in MATLAB R2014 and tested on an Intel(R)
Core(TM) i5-4670K (3.40 GHz) machine.
5.1. Performance of HRCD. To verify that the HRCD is an effective image descriptor, we directly used the KNN classifier on the image feature space represented by the HRCD and RCD without feature extraction. By adopting the Euclidean metric, LERM, AIRM, and Burg divergence as the measurements, the classification experiments were performed on COIL-20 and ETH-80. The COIL-20 dataset contains 20 objects, each of which has 72 images measuring 128 × 128 taken from different directions. Figure 3 shows the sample pictures. Features including grey values and first- and second-order gradients were extracted to calculate the RCD and HRCD of an image. Hence, the RCD and HRCD of an image were a 5 × 5 SPD matrix and a 6 × 6 SPD matrix, respectively. The images were randomly split into the training set and test set, with 10 pictures assigned to the training set and the remaining images assigned to the test set.
ETH-80 is an image set containing eight types of objects, such as apples, pears, cars, and dogs. Each object has 10 instances, and each instance contains images from 41 different viewpoints. The images in ETH-80 were resized to 128 × 128 (Figure 4). For the RCD and HRCD representations, we extracted the following features:
$$F(x, y) = \left[x, y, R_{x,y}, G_{x,y}, B_{x,y}, I_{x,y}, |I_x|, |I_y|, |I_{xx}|, |I_{yy}|\right],$$
where $R_{x,y}$, $G_{x,y}$, $B_{x,y}$ are the RGB color values of a pixel at the position $(x, y)$, $I_{x,y}$ is the greyscale value, and $|I_x|$, $|I_y|$, $|I_{xx}|$, $|I_{yy}|$ are the first- and second-order gradients of intensities. The RCD and HRCD of the image were a 10 × 10 SPD matrix and an 11 × 11 SPD matrix, respectively. Half of the instances in every object were used for training, and the remaining instances were used for the test. Each instance in the training and test sets comprised 100 random samples. Therefore, the training and test sets each contained 800 samples.
Table 1 lists the classification accuracies and runtimes under different metrics. To eliminate the randomness of the experiment, we obtained the average accuracy and runtime for 20 tests.
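For reference, the four measurements compared in this subsection can be written compactly with NumPy. The sketch below uses the standard textbook forms of the Frobenius (Euclidean) distance, the LERM, the AIRM, and the Burg matrix divergence; the function names are hypothetical, and the code is not taken from the MATLAB implementation used in the experiments.

```python
import numpy as np

def _eig_fun(X, fun):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    w, V = np.linalg.eigh(X)
    return (V * fun(w)) @ V.T

def euclidean_dist(X, Y):
    return np.linalg.norm(X - Y, 'fro')

def lerm_dist(X, Y):
    """Log-Euclidean distance ||log X - log Y||_F."""
    return np.linalg.norm(_eig_fun(X, np.log) - _eig_fun(Y, np.log), 'fro')

def airm_dist(X, Y):
    """Affine-invariant distance ||log(X^{-1/2} Y X^{-1/2})||_F."""
    X_inv_sqrt = _eig_fun(X, lambda w: w ** -0.5)
    M = X_inv_sqrt @ Y @ X_inv_sqrt
    return np.linalg.norm(_eig_fun((M + M.T) / 2, np.log), 'fro')

def burg_div(X, Y):
    """Burg matrix divergence tr(X Y^{-1}) - log det(X Y^{-1}) - d."""
    M = X @ np.linalg.inv(Y)
    _, logdet = np.linalg.slogdet(M)
    return np.trace(M) - logdet - X.shape[0]

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3))
X = A @ A.T + np.eye(3)                        # two random SPD matrices
Y = B @ B.T + np.eye(3)
G = rng.normal(size=(3, 3)) + 3 * np.eye(3)    # an invertible transform
```

The AIRM requires an eigendecomposition per pair of matrices, whereas the LERM needs one matrix logarithm per matrix, which is consistent with the much longer AIRM runtimes reported in Table 1.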
5.2. Performance of HSIC-SL. The proposed HSIC-SL was compared with several recognition methods on SPD manifolds. The compared methods included RLPP [38], KSLR [32], CDL [4], KPCA using the log-Gaussian kernel [39], RSR [42], TSC [35], Riem-DLSC [36], logEuc-SC [34], Geometry-DR [25], KLRM-DL [43], EDA [37], and MKSSCR [44]. For brevity, we denote the HSIC-SL with the log-linear kernel as HSIC-SL (log-linear) and that with the log-Gaussian kernel as HSIC-SL (log-Gaussian). HSIC-SL (log-linear) and HSIC-SL (log-Gaussian) were combined with the RCD and HRCD. Thus, for the proposed HSIC-SL, four different combinations were tested. For fairness, the important parameters of the comparison methods were set according to the suggestions of the original papers.
5.2.1. Experiments on QMUL Dataset. The QMUL dataset [44] is a set of images of human heads collected from airport terminal cameras. The dataset is composed of 20,005 images. It is divided into five classes according to the direction of the head images: back, front, left, right, and background. The samples from QMUL are shown in Figure 5. The dataset was divided into the training and test sets in advance. Table 2 shows the number of training and test samples in every class. The extracted feature of any pixel is
$$F(x, y) = \left[I_L(x, y), I_a(x, y), I_b(x, y), \sqrt{I_x^{2} + I_y^{2}}, \arctan\frac{|I_x|}{|I_y|}, G_1(x, y), \ldots, G_8(x, y)\right],$$
where $I_L(x, y)$, $I_a(x, y)$, and $I_b(x, y)$ are the three channel values of the CIELAB color space, $I_x$ and $I_y$ are the first-order gradients in the $x$- and $y$-directions of $I_L(x, y)$, respectively, and $G_i(x, y)$, $i = 1, \ldots, 8$, is the response of eight difference-of-Gaussians filters. We obtained a 13 × 13 SPD matrix for the RCD and a 14 × 14 SPD matrix for the HRCD. The training data consisted of 200 randomly selected samples for each category, and the test set consisted of 100 randomly selected samples. The KNN ($k = 12$) search was used to construct the neighborhood graphs in the RLPP and Geometry-DR. The parameters ($\sigma$) in the kernels of the KPCA, RLPP, KSLR, and HSIC-SL were set to the average distances. The parameter $c$ in the KSLR was set to 0.3. The parameter $\varepsilon$ in the proposed method was set to 0.001. We evaluated the performance of the CDL, RLPP, KSLR, Geometry-DR, and HSIC-SL for various dimensions and reported the maximum performance. In logEuc-SC, RSR, TSC, Riem-DLSC, and KLRM-DL, 50 dictionaries and kernel parameters were learned from the training set. The kernel function in the RSR and the basic kernel in the KLRM-DL were the Stein kernel. The parameter alpha was set to 0.1, and the number of data samples was set to 30. The 1NN classifier was adopted in all the algorithms.
In Table 3, we show the recognition accuracy of the HSIC-SL and the other existing algorithms. To eliminate the randomness of the experiment, we used the average recognition rate for 20 tests. HSIC-SL (log-Gaussian) + HRCD and HSIC-SL (log-linear) + HRCD achieved impressive performance, with HSIC-SL (log-Gaussian) + HRCD obtaining the highest classification accuracy (79.65%). Moreover, the accuracy of the HRCD was greater than that of the RCD in the experiment. These results indicated that the HRCD was better than the RCD. Furthermore, HSIC-SL + HRCD was better than the other algorithms.
Figure 3: Sample images of COIL-20.
Figure 4: Sample images of ETH-80.
5.2.2. Experiments on FERET Dataset. To conduct the face recognition experiment, we used the “b” subset of the FERET dataset [56], which consists of 2,000 face images of 200 people. The images are those of 71 females and 129 males of diverse ethnicities, genders, and ages. The images were cropped and downsampled to 64 × 64. The training set was composed of images with “ba,” “bc,” “bh,” and “bk” labels. Images marked as “bd,” “be,” “bf,” and “bg” constituted the test set. The feature vector for computing the RCD and HRCD is described by
$$F(x, y) = \left[x, y, I(x, y), G_{00}(x, y), \ldots, G_{47}(x, y)\right],$$
where $x$ and $y$ denote the position, $I(x, y)$ is the intensity, and $G_{uv}(x, y)$ is the response value of the Gabor filter. The direction $u$ of the Gabor filter ranged from 0 to 4, and the scale $v$ ranged from 0 to 7. Thus, the RCD and HRCD of each image were a 43 × 43 SPD matrix and a 44 × 44 SPD matrix, respectively. The neighborhood graphs constructed in the RLPP and Geometry-DR were KNN ($k = 3$). The kernel functions with Jeffrey and Stein divergences were adopted in RSR and, respectively, denoted as RSR-J and RSR-S for brevity. In RSR, TSC, Riem-DLSC, KLRM-DL, and logEuc-SC, all training samples were regarded as dictionary atoms. The settings of the other parameters were the same as those for the QMUL dataset.
Table 4 shows the recognition rates of the compared algorithms. The proposed method was not the best algorithm for the FERET dataset; it achieved the highest recognition accuracy only in the “bd” test scenario. Nevertheless, the average recognition accuracies of HSIC-SL were better than those of most of the other algorithms and only slightly worse than those of KLRM-DL. Hence, HSIC-SL was still a feasible algorithm for the FERET dataset. We also noticed that HSIC-SL (log-Gaussian) performed better than HSIC-SL (log-linear). Therefore, the log-Gaussian kernel was more suitable than the log-linear kernel for this dataset.
5.2.3. Experiments on Brodatz Dataset. We performed two texture classification experiments on the Brodatz dataset [57]. Examples from the Brodatz dataset are shown in Figure 6. The first experiment was a grouping experiment with selected textures, and the other was a classification experiment for all texture images.
Figure 5: Sample images of QMUL dataset.
Table 1: Comparison of RCD and HRCD in terms of classification accuracy (%) and runtime (seconds).
Metric            Descriptor   COIL-20 accuracy (%)   COIL-20 runtime (s)   ETH-80 accuracy (%)   ETH-80 runtime (s)
Euclidean         RCD          74.98                  2.946                 62.63                 5.234
Euclidean         HRCD         59.88                  2.908                 66.34                 5.299
LERM              RCD          84.81                  4.149                 71.03                 7.1
LERM              HRCD         88.99                  4.262                 72.04                 7.759
AIRM              RCD          87.10                  118.33                71.64                 482.47
AIRM              HRCD         91.06                  125.79                73.35                 509.15
Burg divergence   RCD          89.23                  10.102                72.07                 26.368
Burg divergence   HRCD         91.71                  10.492                73.63                 26.66
Table 2: Distribution of QMUL dataset.
Label      Back   Background   Front   Left   Right
Training   2256   2256         2256    2256   2256
Test       2096   1107         1772    1502   2248
Table 3: Comparison of classification accuracies on QMUL dataset.
Methods                           Accuracy (%)
KPCA                              42.5
CDL                               76
RLPP                              58.4
KSLR                              77.4
logEuc-SC                         66.3
RSR                               73.2
TSC                               61.7
Riem-DLSC                         36.6
Geometry-DR                       69.2
KLRM_DL                           70.74
EDA                               75.85
MKSSCR                            78.14
HSIC-SL (log-linear) + RCD        76.22
HSIC-SL (log-linear) + HRCD       78.29
HSIC-SL (log-Gaussian) + RCD      76.16
HSIC-SL (log-Gaussian) + HRCD     79.65
In the first experiment, we followed the test setup
designed in [35] and selected three of the test schemes:
one of the 5-texture groups, one of the 10-texture groups,
and one of the 16-texture groups. The images selected in
each test scheme are listed in Table 5. Each image was
resized to 256 × 256, and then 64 regions measuring 32 × 32
were extracted. The covariance matrices were computed from
a five-dimensional feature vector, including intensity and
the first- and second-order gradients. In each test scheme,
eight samples from each image were randomly selected as
the training data, and the remaining samples were used for
testing. Geometry-DR was not suitable for this dataset
because of the low dimension of the SPD matrices in this
experiment. The results shown in Figure 7 are averages over
20 tests.
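
The region extraction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the 64 regions are taken as a non-overlapping 8 × 8 grid (which matches the stated count but is not confirmed by the text), gradients are taken as absolute finite differences via `np.gradient`, a small ridge `eps` is added for numerical stability, and a random array stands in for a resized Brodatz image.

```python
import numpy as np

def texture_features(img):
    """Per-pixel 5-D features: intensity plus first- and second-order gradients."""
    img = img.astype(float)
    iy, ix = np.gradient(img)            # first-order gradients along rows, columns
    iyy = np.gradient(iy, axis=0)        # second-order gradient along rows
    ixx = np.gradient(ix, axis=1)        # second-order gradient along columns
    # Assumption: absolute values of the gradients are used as features.
    return np.dstack([img, np.abs(ix), np.abs(iy), np.abs(ixx), np.abs(iyy)])

def region_covariances(img, size=32, eps=1e-6):
    """One 5 x 5 covariance matrix per non-overlapping size x size region."""
    feats = texture_features(img)
    h, w, d = feats.shape
    covs = []
    for r in range(0, h - size + 1, size):
        for c in range(0, w - size + 1, size):
            block = feats[r:r + size, c:c + size].reshape(-1, d)
            covs.append(np.cov(block, rowvar=False) + eps * np.eye(d))
    return np.stack(covs)

rng = np.random.default_rng(0)
image = rng.random((256, 256))           # stand-in for a resized texture image
covs = region_covariances(image)         # shape (64, 5, 5)

# Eight regions per image for training, the remaining 56 for testing.
perm = rng.permutation(len(covs))
train_idx, test_idx = perm[:8], perm[8:]
```

Repeating the random split 20 times and averaging the accuracies would mirror the evaluation protocol described in the text.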
HSIC-SL achieved the highest classification result on all
test schemes, except for the 5-texture test, in which the
recognition rates of most of the algorithms were relatively
close. In this dataset, HSIC-SL (log-linear) performed better
than HSIC-SL (log-Gaussian).
In the second experiment, 20 random samples were chosen
as the training set, and 10 random samples were chosen as the
test set from all the texture images. The average results for 20
tests are presented in Table 6. HSIC-SL (log-linear) and
HSIC-SL (log-Gaussian) outperformed the other methods, with the
latter being marginally better than the former when all texture
pictures were classified. In addition, the accuracy of HSIC-SL
with the HRCD was much higher than that of HSIC-SL with the
RCD. The discriminative ability of the HRCD was verified again.
5.2.4. Experiments on COIL-20 and ETH-80 Datasets. In this
experiment, we used the COIL-20 dataset [55] and ETH-80
dataset in the object categorization task. The experimental
procedure was the same as that described in Section 5.1. We
compared the proposed method with KPCA [39], RLPP [38],
KSLR [32], and CDL [4]. In addition, KPCA and RLPP were
conducted on the HRCD and denoted as KPCA + HRCD and
RLPP + HRCD, respectively. The classifier adopted
in all of the algorithms was the 1NN classifier.
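
For readers unfamiliar with nearest-neighbor classification on SPD matrices, the following sketch shows a 1NN classifier under the log-Euclidean metric (LERM), one of the metrics listed in Table 1. It is only an illustration on toy data: the helper names are hypothetical, and in the compared subspace methods the 1NN step would presumably operate on the extracted m-dimensional vectors rather than directly on the SPD matrices.

```python
import numpy as np

def spd_log(x):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(x)
    return (v * np.log(w)) @ v.T

def predict_1nn(train, train_labels, test):
    """Label each test matrix with the label of its nearest training matrix."""
    log_train = np.stack([spd_log(x) for x in train])
    log_test = np.stack([spd_log(x) for x in test])
    # Log-Euclidean distance: Frobenius norm of the difference of the logs.
    diff = log_test[:, None] - log_train[None]
    dists = np.linalg.norm(diff, axis=(2, 3))
    return train_labels[np.argmin(dists, axis=1)]

rng = np.random.default_rng(0)

def toy_spd(scale, d=4):
    """Toy SPD matrix near scale * I (illustrative data only)."""
    a = 0.1 * rng.normal(size=(d, d))
    return scale * np.eye(d) + a @ a.T

train = [toy_spd(1.0) for _ in range(5)] + [toy_spd(5.0) for _ in range(5)]
train_labels = np.array([0] * 5 + [1] * 5)
test = [toy_spd(1.0), toy_spd(5.0)]
pred = predict_1nn(train, train_labels, test)
```

The eigendecomposition-based logarithm is used because it stays real and symmetric for SPD inputs.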
Table 7 shows the classification accuracies of the
methods on COIL-20 and ETH-80. First, HSIC-SL obtained
the best accuracy on both datasets. This result indicated
that the introduction of the HSIC improved the effectiveness
of the recognition algorithm. Second, the classification
accuracies of RLPP, KPCA, and HSIC-SL with the RCD were
lower than those with the HRCD (i.e., RLPP + HRCD,
KPCA + HRCD, and HSIC-SL + HRCD). This result showed
once again that the HRCD had advantages over the RCD.
Finally, both the log-linear kernel and the log-Gaussian
kernel were shown to be effective in HSIC-SL.
5.3. Analysis of Dimensionality. The parameter m was
regarded as the dimensionality of the vector space after
feature extraction. The curves of the classification accuracies
of the compared algorithms on COIL-20 [55], ETH-80 [37],
and Brodatz versus m are shown in Figures 8 and 9. The
experimental setups were the same as those described in the
previous section.
As the dimensionality increased, the recognition accuracy
curves first showed an upward trend. Once the recognition
accuracy reached a certain value, it remained essentially
stable over a range of subspace dimensions.
5.4. Discussion. In the above experiments, the performance
of the RCD and HRCD and the effectiveness of HSIC-SL and
the other algorithms were compared. The following
observations were made:
(1) The classification accuracy in the image feature space
represented by the HRCD was generally better than that by
the RCD regardless of which classifier was used (i.e.,
KNN classifier without feature extraction or the
proposed HSIC-SL). The result showed that the
proposed image descriptor HRCD outperformed the RCD.
(2) When the RCD was used as the image descriptor,
HSIC-SL was superior to most of the methods on all
datasets except FERET and Brodatz.
In FERET, the performance of Riem-DLSC,
MKSSCR, and KLRM-DL was slightly better than
that of HSIC-SL + RCD (log-Gaussian kernel). In
Brodatz, the performance of HSIC-SL + RCD was
slightly worse than that of the other methods in the
5-texture group, 10-texture group, and 16-texture
group. Nevertheless, the recognition accuracy of
HSIC-SL + RCD in the experiment on all texture
images was higher than those of the other methods.
The results showed that HSIC-SL was indeed an
excellent algorithm on SPD manifolds, but it was
inferior in the classification of datasets with subtle
features, such as those for face recognition and texture
recognition. At the same time, the HRCD made up for
this defect to a certain extent: HSIC-SL + HRCD
outperformed almost all of the compared methods.
However, in the FERET dataset, the average
recognition accuracy of HSIC-SL was lower
than that of KLRM-DL.

Table 4: Comparison of classification accuracies on FERET dataset.
Methods                         bd     be     bf     bg     Average
CDL                             76.50  75.00  88.50  84.50  81.13
RLPP                            58.40  60.00  67.00  60.50  61.48
KSLR                            83.00  90.00  96.00  91.00  90.00
logEuc-SC                       74.00  94.00  97.50  80.50  86.50
RSR-S                           82.50  94.50  98.00  83.50  89.63
RSR-J                           79.50  96.50  97.50  86.00  89.88
TSC                             36.00  73.00  73.50  44.50  56.75
Riem-DLSC                       88.25  93.50  96.50  91.75  92.50
Geometry-DR                     80.50  78.00  86.50  83.00  82.00
KLRM-DL                         89.50  96.00  97.00  94.00  94.13
EDA                             86.00  90.00  95.50  92.00  90.88
MKSSCR                          88.50  92.00  96.00  94.50  92.75
HSIC-SL (log-linear) + RCD      83.50  87.50  94.00  91.00  89.00
HSIC-SL (log-linear) + HRCD     83.50  87.00  93.50  91.50  88.88
HSIC-SL (log-Gaussian) + RCD    88.00  88.50  95.00  93.00  91.13
HSIC-SL (log-Gaussian) + HRCD   90.00  90.50  96.50  93.50  92.63
Figure 6: Sample images of Brodatz dataset.
Table 5: Images selected for each group.
Experiment number  Images selected
5-texture          D77, D84, D55, D53, D24
10-texture         D4, D9, D19, D21, D24, D28, D29, D36, D37, D38
16-texture         D3, D4, D5, D6, D9, D21, D24, D29, D32, D33, D54, D55, D57, D68, D77, D84
Figure 7: Comparison of algorithms for 1NN classification accuracy in Brodatz dataset (grouped by the 5-texture, 10-texture, and 16-texture schemes).
Table 6: Accuracy of all texture classifications of Brodatz.
Methods                         Accuracy (%)
CDL                             85.25
RLPP                            75.8
KSLR                            85.55
logEuc-SC                       63.24
RSR-S                           78.32
RSR-J                           76.7
Riem-DLSC                       37.1
Geometry-DR                     74.9
KLRM-DL                         79.23
HSIC-SL (log-linear) + RCD      84.87
HSIC-SL (log-linear) + HRCD     88.87
HSIC-SL (log-Gaussian) + RCD    85.98
HSIC-SL (log-Gaussian) + HRCD   89.66
Table 7: Comparison of recognition rates (%) of different methods.
Methods                         COIL-20  ETH-80
KPCA                            81.05    72.61
KPCA + HRCD                     83.79    73.7
RLPP                            85.89    74.08
RLPP + HRCD                     88.79    75.63
CDL                             94.54    79.92
KSLR                            96.24    81.66
HSIC-SL (log-linear) + RCD      96.72    82.80
HSIC-SL (log-linear) + HRCD     97.75    84.60
HSIC-SL (log-Gaussian) + RCD    96.87    82.40
HSIC-SL (log-Gaussian) + HRCD   97.92    85.28
Figure 8: Recognition rates versus different dimensionalities on COIL-20 database (horizontal axis: dimensionality, 2 to 50; vertical axis: recognition accuracy).
Figure 9: Recognition rates versus different dimensionalities on ETH-80 database (horizontal axis: dimensionality, 2 to 102; vertical axis: recognition accuracy).
(3) In the experiments, we also compared the performance
of the log-Gaussian kernel and log-linear
kernel. In general, the log-Gaussian kernel was better
than the log-linear kernel. However, in the experiments
on QMUL and Brodatz, the log-linear kernel
obtained better results than the log-Gaussian kernel.
The difference in performance indicated that the
choice of kernel affected the performance of HSIC-SL.
We can improve the performance of HSIC-SL by
selecting a suitable kernel function.
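
To make the kernel comparison concrete, the two kernels can be sketched as Gram-matrix computations. The forms below are the commonly used log-Euclidean ones, k(X, Y) = tr(log X log Y) and k(X, Y) = exp(-||log X - log Y||_F^2 / (2σ^2)); they are assumed here for illustration, and the exact definitions and the bandwidth convention used by HSIC-SL are those given in the earlier sections of the paper. The function names and the toy data are hypothetical.

```python
import numpy as np

def spd_log(x):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(x)
    return (v * np.log(w)) @ v.T

def _vectorized_logs(mats):
    """Stack the matrix logarithms as flat row vectors."""
    return np.stack([spd_log(x).ravel() for x in mats])

def log_linear_gram(mats):
    """Gram matrix of k(X, Y) = tr(log X log Y)."""
    logs = _vectorized_logs(mats)
    # For symmetric matrices, tr(A B) equals the dot product of the flat vectors.
    return logs @ logs.T

def log_gaussian_gram(mats, sigma=1.0):
    """Gram matrix of k(X, Y) = exp(-||log X - log Y||_F^2 / (2 sigma^2))."""
    logs = _vectorized_logs(mats)
    sq = np.sum(logs ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * logs @ logs.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
mats = []
for _ in range(6):
    a = rng.normal(size=(4, 4))
    mats.append(a @ a.T + 0.5 * np.eye(4))   # toy SPD matrices

k_lin = log_linear_gram(mats)
k_gauss = log_gaussian_gram(mats, sigma=2.0)
```

Because both kernels act on the matrix logarithms, switching between them only changes how the same log-mapped data are compared, which is one reason the better choice can vary by dataset.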
6. Conclusions
In this work, we propose an improved covariance descriptor
called the HRCD, which represents images with SPD
matrices. The HRCD inherits the advantages of the RCD and is
more effective.
To address the classification problem on SPD Riemannian
manifolds, we propose an efficient image classification
method that is based on a kernel framework. We refer to it as
HSIC-SL. Through the definition of the log-linear kernel and
log-Gaussian kernel, the input images represented by SPD
matrices can be embedded into the RKHS. To seek explicit
mapping from the RKHS to the vector space, HSIC-SL
constructs the objective function on the basis of the
framework of subspace learning and HSIC maximization.
HSIC-SL outperforms the other representative methods on
most of the tested datasets without increasing computational
complexity.
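
As a reminder of the quantity behind HSIC maximization, the sketch below computes the standard biased empirical HSIC estimate, tr(KHLH) / (n - 1)^2 with centering matrix H = I - (1/n)11^T, for two Gram matrices K and L. It is not the HSIC-SL objective itself, whose exact form is defined in the earlier sections; the toy data, the linear kernel on the inputs, and the label-agreement kernel are assumptions made for illustration.

```python
import numpy as np

def empirical_hsic(k, l):
    """Biased empirical HSIC estimate: tr(K H L H) / (n - 1)^2."""
    n = k.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    return np.trace(k @ h @ l @ h) / (n - 1) ** 2

rng = np.random.default_rng(0)
n = 40
labels = np.repeat([0, 1], n // 2)
# Label kernel: 1 when two samples share a class, 0 otherwise.
l_gram = (labels[:, None] == labels[None, :]).astype(float)

x_dep = labels[:, None] + 0.1 * rng.normal(size=(n, 3))   # features tied to labels
x_ind = rng.normal(size=(n, 3))                           # features unrelated to labels

hsic_dep = empirical_hsic(x_dep @ x_dep.T, l_gram)
hsic_ind = empirical_hsic(x_ind @ x_ind.T, l_gram)
```

The label-dependent features yield a clearly larger value than the independent ones, which is the behavior a projection learned by maximizing HSIC is meant to exploit.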
The proposed algorithm also has certain limitations. The
average classification accuracy is slightly worse than that of
KLRM-DL on the FERET dataset. This suggests that the covariance
descriptor is not strong enough to handle classification tasks
that depend on small details, such as face recognition. In
future work, we will employ other effective features to form
the covariance matrices. We will also explore other useful
kernel functions to suit different types of datasets.
Data Availability
The data used to support the findings of this study are
available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This study was supported by the National Natural Science
Foundation of China through the Project “Research on
Nonlinear Alignment Algorithm of Local Coordinates in
Manifold Learning” under grant no. 61773022, the Character
and Innovation Project of Education Department of
Guangdong Province under grant no. 2018GKTSCX081, the
Young Innovative Talents Project of Education Department
of Guangdong Province under grant no. 2020KQNCX191,
the Guangzhou Science and Technology Plan Project of
Bureau of Science and Technology of Guangzhou Munici-
pality under grant no. 202102020700, and the Educational
Big Data Enterprise Lab of Guangzhou Panyu Polytechnic
under grant no. 2021XQS05.
