Research Article
Hilbert–Schmidt Independence Criterion Subspace Learning on Hybrid Region Covariance Descriptor for Image Classification

Xi Liu,1 Peng Yang,1 Zengrong Zhan,1 and Zhengming Ma2

1School of Information Engineering, Guangzhou Panyu Polytechnic, Guangzhou 511483, China
2School of Electronics and Information Technology, Sun Yat-Sen University, Guangzhou 510006, China
Correspondence should be addressed to Peng Yang; citystars@163.com
Received 10 December 2020; Revised 14 May 2021; Accepted 14 July 2021; Published 21 July 2021
Academic Editor: Muhammad Haroon Yousaf
Copyright © 2021 Xi Liu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
The region covariance descriptor (RCD), which takes the form of a symmetric positive definite (SPD) matrix, is commonly used in image representation. As SPD manifolds have a non-Euclidean geometry, Euclidean machine learning methods are not directly applicable to them. In this work, an improved covariance descriptor called the hybrid region covariance descriptor (HRCD) is proposed. The HRCD incorporates the mean feature information into the RCD to improve the latter's discriminative power. To address the non-Euclidean properties of SPD manifolds, this study also proposes an algorithm called Hilbert-Schmidt independence criterion subspace learning (HSIC-SL) for SPD manifolds, aimed at improving classification accuracy. The algorithm uses a kernel function to embed SPD matrices into a reproducing kernel Hilbert space and further maps them to a linear space. To make the mapping account for the correlation between the SPD matrices and the linear projection, the method introduces global HSIC maximization into the model. The proposed method is compared with existing methods in classification experiments with the HRCD and HSIC-SL on the COIL-20, ETH-80, QMUL, FERET face, and Brodatz datasets, which demonstrate its accuracy and validity.
1. Introduction
A growing amount of non-Euclidean data, such as symmetric positive definite (SPD) manifolds [1] and Grassmann manifolds [2], is encountered in vision recognition tasks. In particular, SPD manifolds have attracted increased attention in the form of the region covariance descriptor (RCD) [3, 4], the Gaussian mixture model (GMM) [5], tensors [6–9], etc. In this work, we mainly discuss image classification on SPD manifolds.
The RCD has been proved to be an effective descriptor in a variety of applications [10–12]. It captures the correlation between different features of an image and represents the image with a covariance matrix. However, the mean vector of the features has also been proved to be significant in image recognition tasks [13, 14]. In this work, we construct a new image descriptor by directly incorporating the mean feature information into the RCD. The new image descriptor is called the hybrid region covariance descriptor (HRCD). The HRCD inherits the advantages of the RCD and is more discriminative than the RCD. The images represented by the HRCD are also SPD matrices that lie on SPD manifolds. Most classical machine learning algorithms are constructed on linear spaces. Given the non-Euclidean geometry of Riemannian manifolds, directly using most conventional machine learning methods on Riemannian manifolds is inadequate [15, 16]. Therefore, the classification of points on Riemannian manifolds has become a hot research topic.
Two main approaches are generally adopted to cope with the nonlinearity of Riemannian manifolds. The first approach is to construct learning methods by directly considering the Riemannian geometry; one such method is the widely used tangent approximation [17, 18]. Most existing SPD classification methods make use of Riemannian metrics [15, 16] or matrix divergences [19, 20] as the distance measure for SPD matrices [21–23]. The other approach is to project the SPD matrices to another space, such as a high-dimensional reproducing
kernel Hilbert space (RKHS) [24] or another low-dimensional SPD manifold [25]. Classification algorithms can then be constructed on the projection space. Benefiting from the success of kernel methods in Euclidean spaces, the kernel-based classification scheme is a good choice for the analysis of SPD manifolds and has shown promising performance [26, 27]. Kernel-based methods embed manifolds into RKHSs and further project these manifolds to Euclidean spaces via an explicit mapping. Hence, algorithms designed for linear spaces can be extended to Riemannian manifolds. However, the mapping from RKHSs to Euclidean spaces in existing methods is based on a linear assumption. Moreover, the intrinsic connections between the SPD matrices and their low-dimensional projections are ignored.
To circumvent this limitation of kernel-based methods, we propose introducing the Hilbert–Schmidt independence criterion (HSIC) into the kernel trick and refer to the resulting method as the HSIC subspace learning (HSIC-SL) algorithm. Specifically, we derive the log-linear and log-Gaussian kernels to embed SPD matrices into a high-dimensional RKHS and then project these points into a low-dimensional vector space of the RKHS. To align the low-dimensional representation with the intrinsic features of the input data, we introduce statistical dependence between the SPD matrices and the low-dimensional representation. In this work, the explicit mapping is obtained on the basis of subspace learning and HSIC maximization, where the HSIC is used to characterize the statistical correlation between two datasets.

The main contributions of this study are as follows:

(1) We propose a novel covariance descriptor called the HRCD. The proposed descriptor explores discriminative information effectively.

(2) The HSIC is applied for the first time to the kernel framework on SPD Riemannian manifolds, and a novel subspace learning algorithm called HSIC-SL is proposed. The proposed method achieves effective classification on the basis of global HSIC maximization.

(3) We identify two simple kernel functions for the HSIC-SL algorithm. The diversity of kernels improves the flexibility of HSIC-SL.

The rest of the paper is organized as follows. We provide a review of previous work in Section 2. A brief description of the RCD, RKHS, and HSIC is presented in Section 3. We derive the proposed descriptor and algorithm in detail in Section 4. The experimental results are presented in Section 5 to demonstrate the effectiveness of the HRCD and HSIC-SL. Conclusions and future research directions are given in Section 6.
2. Literature Review
This section presents a brief review of RCDs, as well as recent manifold classification methods constructed on SPD manifolds.
The RCD was first introduced by Tuzel et al. [28]. It represents an image region with a nonsingular SPD matrix by extracting the covariance matrix of multiple features. The covariance matrix does not carry any information about size and ordering, which implies a certain scale and rotation independence. The RCD is used not only in image recognition but also in image set recognition tasks, in which an image set is modeled with its natural second-order statistic [4, 29]. The GMM can also serve as the SPD descriptor of an image set. Under the assumption of a multi-Gaussian distribution of an image set [30], hundreds of images in the image set are assigned to a small number of Gaussian components. Each Gaussian component is represented as an SPD matrix [31]. Thus, the image set is described by multiple SPD matrices. As mentioned previously, mean vectors have also been proved to be important in recognition tasks. In [32], the mean information was utilized in an improved log-Euclidean Gaussian kernel. However, this approach is limited to a specific algorithm and lacks generality. In the current work, we propose to incorporate the feature mean information and the covariance matrix into a new SPD matrix, thereby introducing first-order statistical information into the image RCD to improve the discriminant ability of the descriptor.
When the manifold under consideration is an SPD manifold, the tangent space at a particular point is a linear space. Most works map SPD matrices onto the tangent space of a particular point so that traditional linear classifiers can be applied. Under this framework, dimensionality reduction and clustering methods, such as Laplacian eigenmaps, local linear embedding (LLE), and Hessian LLE, have been extended to Riemannian manifolds [17]. Tuzel et al. introduced LogitBoost for classification on Riemannian manifolds [18]. The classifier has been generalized to multiclass classification [33]. Sparse coding by embedding manifolds into identity tangent spaces to identify the Lie algebra of SPD manifolds was considered in [34]. Such tangent space approximations can preserve manifold-valued data and eliminate the swelling effect. However, flattening a manifold through tangent spaces may result in inaccurate modeling, especially for regions far away from the tangent pole.
Beyond tangent approximation, many efforts have been devoted to distance measures on SPD manifolds that capture the true SPD manifold geometry; examples include the log-Euclidean Riemannian metric (LERM) [15] and the affine invariant Riemannian metric (AIRM) [16]. Although matrix divergences are not true Riemannian metrics, they provide fast, approximate distance computation. Sivalingam et al. proposed tensor sparse coding (TSC) for positive definite matrices [35], which utilizes the Burg divergence to perform sparse coding and dictionary learning on SPD manifolds. Riemannian dictionary learning and sparse coding (DLSC) [36] represents data as sparse combinations of SPD dictionary atoms via a Riemannian geometric approach and characterizes the loss of the DLSC optimization via the affine invariant Riemannian metric. However, these methods cannot be applied to other Riemannian manifolds because of the specificity of the metrics used. Embedding discriminant analysis (EDA) [37] identifies a bilinear isometric mapping such that the resulting representation maximally preserves the Riemannian geodesic distance.
As for kernel methods proposed for SPD manifolds, Riemannian locality preserving projections (RLPPs) [38] embed Riemannian manifolds into low-dimensional vector spaces by defining Riemannian kernels; however, their computational complexity is high, and the kernel is not always positive definite. Jayasumana et al. [39] presented a framework on Riemannian manifolds to establish the positive definiteness of Gaussian RBF kernels and utilized the log-Euclidean Gaussian kernel in kernel principal component analysis (KPCA) for a recognition task. Caseiro et al. proposed a heat kernel mean shift on Riemannian manifolds [40]. In [41], kernel DLSC based on the LERM was introduced. Harandi et al. proposed to perform sparse coding by embedding the space of SPD matrices into Hilbert spaces through two types of Bregman matrix divergences [42]. Covariance discriminative learning (CDL) [4] utilizes a matrix logarithm operator to define kernel functions and then explicitly maps the covariance matrices from a Riemannian manifold to a Euclidean space. Zhuang et al. proposed a data-dependent kernel learning framework on the basis of kernel learning and the Riemannian metric (KLRM) [43]. In [44], multi-kernel SSCR (MKSSCR) created a linear combination of a set of Riemannian kernels. Considerable results have been achieved within the kernel framework. To improve the performance of the kernel trick, we introduce a statistical dependence constraint between SPD matrices and projections and measure the statistical dependence with the HSIC.
3. Related Work
In this section, we briefly review the RCD and the properties
of the RKHS and HSIC.
3.1. Region Covariance Descriptor. The region covariance descriptor (RCD), as a special case of SPD matrices, provides a natural way of fusing multiple features. Suppose R is an image region of size h × w; we can extract multiple features for every pixel in R. The features could be the location, grey value, and gradients. We denote the feature vector of the k-th pixel as

$$z_k = \big[x,\ y,\ I,\ |I_x|,\ |I_y|\big], \quad (1)$$

where x and y denote the location, I is the grey value, and I_x and I_y are the gradients with respect to x and y. The RCD of R is defined as

$$\Sigma = \frac{1}{n-1}\sum_{k=1}^{n}(z_k - \mu)(z_k - \mu)^T, \quad (2)$$

where n = h × w and μ = (1/n) Σ_{k=1}^{n} z_k ∈ R^d denotes the mean of the feature vectors. Then, the image region can be represented by a d × d SPD matrix, where d depends on the number of features.
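As a minimal illustration of equations (1) and (2), the following Python/NumPy sketch computes the RCD of a small grey-level region; the helper name region_covariance and the toy feature layout are our own choices, not part of the original formulation.

```python
import numpy as np

def region_covariance(Z):
    """RCD of an image region (equation (2)).

    Z : (n, d) array whose rows are the per-pixel feature vectors z_k,
        e.g., [x, y, I, |I_x|, |I_y|] as in equation (1).
    """
    mu = Z.mean(axis=0)                      # mean feature vector
    Zc = Z - mu                              # center the features
    return Zc.T @ Zc / (Z.shape[0] - 1)      # unbiased d x d covariance

# toy usage: a 4 x 4 grey region with 5 features per pixel
h = w = 4
I = np.random.rand(h, w)
Iy, Ix = np.gradient(I)                      # first-order gradients
xs, ys = np.meshgrid(np.arange(w), np.arange(h))
Z = np.stack([xs.ravel(), ys.ravel(), I.ravel(),
              np.abs(Ix).ravel(), np.abs(Iy).ravel()], axis=1)
Sigma = region_covariance(Z)                 # 5 x 5, SPD for non-degenerate data
```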
3.2. RKHS. The reproducing kernel Hilbert space (RKHS) is the theoretical basis of kernel methods. After projecting the data into an RKHS, various machine learning methods can be implemented in the RKHS.

Let S(Ω) be a function space, and let ⟨·,·⟩ be an inner product defined on S(Ω). The complete inner product space H = (S(Ω), ⟨·,·⟩) induced by ⟨·,·⟩ is a Hilbert space. For all x ∈ Ω and f ∈ S(Ω), if the function k satisfies f(x) = ⟨f, k(·, x)⟩, then k is the reproducing kernel of the RKHS H. We denote the mapping defined by the reproducing kernel as φ(x) = k(·, x) = k_x ∈ H. It follows that

$$\langle \phi(x), \phi(y)\rangle = \langle k_x, k(\cdot, y)\rangle = k_x(y) = k(y, x) = k(x, y). \quad (3)$$

The function k can serve as a kernel function only if the kernel matrix K is symmetric positive definite, where

$$K = \begin{bmatrix} k(x_1, x_1) & \cdots & k(x_1, x_n) \\ \vdots & \ddots & \vdots \\ k(x_n, x_1) & \cdots & k(x_n, x_n) \end{bmatrix}.$$

According to Mercer's theorem [45], once a valid reproducing kernel is defined, we can generate a unique Hilbert space.
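As a small sanity check of this condition, one can assemble the kernel matrix for a handful of samples and inspect its symmetry and smallest eigenvalue; the snippet below uses an ordinary Gaussian kernel on vectors purely for illustration, and all names are ours.

```python
import numpy as np

def kernel_matrix(X, k):
    """K[i, j] = k(x_i, x_j) for the rows of X."""
    n = X.shape[0]
    return np.array([[k(X[i], X[j]) for j in range(n)] for i in range(n)])

# illustrative kernel on ordinary vectors
gauss = lambda x, y: np.exp(-np.linalg.norm(x - y) ** 2 / 2.0)

X = np.random.rand(6, 3)
K = kernel_matrix(X, gauss)
print(np.allclose(K, K.T))            # symmetry
print(np.linalg.eigvalsh(K).min())    # smallest eigenvalue should be (numerically) >= 0
```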
3.3. Hilbert–Schmidt Independence Criterion (HSIC). The HSIC [46] is usually used to characterize the statistical correlation of two datasets. The mathematical theory of the HSIC has been studied for a long time, and there are many achievements [47–51]. In the computation of the HSIC, the two datasets are first embedded into two RKHSs, and then the HSIC of the two sets of data is measured through the Hilbert–Schmidt (HS) operators between these two RKHSs.

Let X be a random variable/vector defined on Ω_X and Y be a random variable/vector defined on Ω_Y, let H_X and H_Y be two separate Hilbert spaces, and let φ_X : Ω_X → H_X and φ_Y : Ω_Y → H_Y be the kernel mappings defined by the respective reproducing kernels.
3.3.1. Hilbert–Schmidt (HS) Operators. Let T : H_X → H_Y be a compact operator and {e_i^X | i ∈ I} be an orthonormal basis of H_X; if Σ_{i∈I} ‖T e_i^X‖_Y² < +∞, then T is called a Hilbert–Schmidt (HS) operator [52]. If, for all T, S ∈ HS(H_X → H_Y), Σ_{i∈I} |⟨T e_i^X, S e_i^X⟩_Y| < +∞, then (HS(H_X → H_Y), ⟨·,·⟩_HS) is a Hilbert space, where the inner product ⟨·,·⟩_HS is defined as ⟨T, S⟩_HS = Σ_{i∈I} ⟨T e_i^X, S e_i^X⟩_Y. For f_0 ∈ H_X and g_0 ∈ H_Y, the tensor product of f_0 and g_0 is denoted as f_0 ⊗ g_0. Since (f_0 ⊗ g_0)(f) = ⟨f_0, f⟩_X g_0 ∈ H_Y, then f_0 ⊗ g_0 ∈ HS(H_X → H_Y) [53].
3.3.2. Mean Functions and Cross-Covariance Operators. Let Φ_X : H_X → R be the continuous linear functional over H_X defined, for all f ∈ H_X, by

$$\Phi_X(f) = E_X\big[\langle \varphi_X(X), f\rangle_X\big]. \quad (4)$$

According to the Riesz representation theorem, there must be a unique element μ_X ∈ H_X such that Φ_X(f) = ⟨f, μ_X⟩_X for all f ∈ H_X; then μ_X is called the mean function of φ_X(X). The mean function μ_Y of φ_Y(Y) is defined in the same way.
Let Φ be the continuous linear functional over HS(H_X → H_Y) defined, for all T ∈ HS(H_X → H_Y), by

$$\Phi(T) = E_{XY}\big[\langle \varphi_X(X) \otimes \varphi_Y(Y), T\rangle_{HS}\big]. \quad (5)$$

Then, according to the Riesz representation theorem, there must be a unique HS operator C_XY ∈ HS(H_X → H_Y) such that, for all T ∈ HS(H_X → H_Y),

$$\Phi(T) = \langle T, C_{XY}\rangle_{HS}, \quad (6)$$

where C_XY is called the cross-covariance operator between φ_X(X) and φ_Y(Y).

The relationship between C_XY, μ_X, and μ_Y is illustrated in Figure 1. The two datasets Ω_X and Ω_Y are embedded into H_X and H_Y by the kernel mappings φ_X : Ω_X → H_X and φ_Y : Ω_Y → H_Y, respectively, and μ_X and μ_Y are the corresponding mean functions. The HSIC of Ω_X and Ω_Y is given by the Hilbert–Schmidt (HS) operator C_XY between H_X and H_Y.
3.3.3. HSIC. The HSIC of two random variables/vectors is defined as follows:

$$\mathrm{HSIC}(X, Y) = E_{XY}\Big[\big\|(\varphi_X(X) - \mu_X) \otimes (\varphi_Y(Y) - \mu_Y)\big\|_{HS}^2\Big]. \quad (7)$$

It can be seen from the definition of HSIC(X, Y) that, instead of directly calculating the covariance of X and Y, i.e., E_XY[(X − E_X[X])(Y − E_Y[Y])], the HSIC first transforms X and Y into H_X and H_Y, respectively, and then calculates the covariance of φ_X(X) and φ_Y(Y) by using HS operators between H_X and H_Y. In practice, H_X and H_Y are generated from kernel functions k_X and k_Y.

If the joint probability distribution of X and Y is given or known, HSIC(X, Y) can be calculated as follows:

$$\mathrm{HSIC}(X, Y) = E_{XY}\Big[\big\|(\varphi_X(X) - \mu_X) \otimes (\varphi_Y(Y) - \mu_Y)\big\|_{HS}^2\Big] = \big\|C_{XY} - \mu_X \otimes \mu_Y\big\|_{HS}^2 = \langle C_{XY}, C_{XY}\rangle_{HS} - 2\langle C_{XY}, \mu_X \otimes \mu_Y\rangle_{HS} + \langle \mu_X \otimes \mu_Y, \mu_X \otimes \mu_Y\rangle_{HS}, \quad (8)$$

where

$$\langle C_{XY}, C_{XY}\rangle_{HS} = E_{XY}E_{X'Y'}\big[k_X(X, X')\,k_Y(Y, Y')\big], \quad (9)$$

$$\langle C_{XY}, \mu_X \otimes \mu_Y\rangle_{HS} = E_{XY}\big[E_{X'}[k_X(X, X')]\,E_{Y'}[k_Y(Y, Y')]\big], \quad (10)$$

$$\langle \mu_X \otimes \mu_Y, \mu_X \otimes \mu_Y\rangle_{HS} = \langle \mu_X, \mu_X\rangle_X\,\langle \mu_Y, \mu_Y\rangle_Y. \quad (11)$$
Generally speaking, the joint probability distribution of X and Y is unknown, and only some samples of X and Y are given: X = {x_1, ..., x_N} ⊂ Ω_X and Y = {y_1, ..., y_N} ⊂ Ω_Y. In this case, the statistical average can be approximated by the sample average. Moreover, it is assumed that when i ≠ j, the probability of the random event {X = x_i; Y = y_j} is 0; then, the cross-covariance operator C_XY and the mean functions μ_X and μ_Y can be approximated as follows:
Figure 1: The sketch mapping of HSIC.
$$C_{XY} \approx \frac{1}{N}\sum_{i=1}^{N}\varphi_X(x_i)\otimes\varphi_Y(y_i), \qquad \mu_X \approx \frac{1}{N}\sum_{i=1}^{N}\varphi_X(x_i), \qquad \mu_Y \approx \frac{1}{N}\sum_{i=1}^{N}\varphi_Y(y_i). \quad (12)$$
Substituting equation (12) into equations (9)–(11) gives

$$\langle C_{XY}, C_{XY}\rangle_{HS} = \frac{1}{N^2}\mathrm{tr}\big(K_X K_Y\big), \qquad \langle C_{XY}, \mu_X \otimes \mu_Y\rangle_{HS} = \frac{1}{N^3}\Gamma_N^T K_X K_Y \Gamma_N, \qquad \langle \mu_X \otimes \mu_Y, \mu_X \otimes \mu_Y\rangle_{HS} = \frac{1}{N^4}\Gamma_N^T K_X \Gamma_N \Gamma_N^T K_Y \Gamma_N, \quad (13)$$

where Γ_N = [1, ..., 1]^T ∈ R^N is the N-dimensional vector with all elements equal to 1, and K_X and K_Y are the kernel matrices of X and Y, respectively.
Finally, HSIC(X, Y) can be computed as

$$\mathrm{HSIC}(X, Y) = \big\|C_{XY} - \mu_X \otimes \mu_Y\big\|_{HS}^2 = \langle C_{XY}, C_{XY}\rangle_{HS} + \langle \mu_X \otimes \mu_Y, \mu_X \otimes \mu_Y\rangle_{HS} - 2\langle C_{XY}, \mu_X \otimes \mu_Y\rangle_{HS} = \frac{1}{N^2}\mathrm{tr}\big(K_Y C_N K_X C_N\big), \quad (14)$$

where C_N = I_N − (1/N)Γ_N Γ_N^T is the centralizing matrix.
4. HRCD and HSIC Subspace Learning
The algorithm can be divided into four steps. First, we model the labeled training samples through the proposed HRCD, so that each training sample is described by an SPD matrix. Second, we embed the SPD matrices into a high-dimensional RKHS H with a defined kernel function and further project the elements of the RKHS into a vector space H′. The mapping f : H → H′ is found by solving an optimization problem. Third, we use the explicit map to project the training and test samples onto the low-dimensional and relatively discriminative space. Finally, the classification task is realized by executing a classifier on H′. An overall illustration of the algorithm is shown in Figure 2.

Figure 2: Framework of the proposed method. Each image is represented by an SPD matrix on the manifold M. The points on M are embedded into the RKHS H and are further projected to a low-dimensional and discriminative subspace H′. The map is optimized via the cost function f.
Given a set of training samples belonging to c classes, χ = {X_1, X_2, ..., X_N} ⊂ M, where each X_i ∈ R^{d×d} is an SPD matrix, let l = {l_1, l_2, ..., l_N} denote the corresponding labels. The representation of χ on the low-dimensional vector space H′ is denoted as Y = [y_1, y_2, ..., y_N], y_i ∈ R^m. In the framework of kernel analysis, the low-dimensional representation y_i of X_i is obtained by the mapping y_i = W K_i^Row, where K_i^Row = [k_{i1}, k_{i2}, ..., k_{iN}]^T.
4.1. Hybrid Region Covariance Descriptor. As mentioned previously, we propose incorporating the feature mean information into the RCD to improve the discrimination of the descriptor. We refer to the resulting descriptor as the HRCD.

Given an image region R, we extract multiple features at each point in R and then compute the mean vector and covariance matrix of the features. Suppose that the feature vector of the k-th pixel is z_k; the mean vector μ ∈ R^d and covariance matrix Σ ∈ R^{d×d} can then be computed as

$$\mu = \frac{1}{n}\sum_{k=1}^{n} z_k \in \mathbb{R}^d, \qquad \Sigma = \frac{1}{n-1}\sum_{k=1}^{n}(z_k - \mu)(z_k - \mu)^T. \quad (15)$$

Following information geometry theory [54], we combine the mean and the covariance matrix into a new matrix without additional computational complexity. The new matrix is constructed as

$$X = |\Sigma|^{-\frac{1}{d+1}} \begin{bmatrix} \Sigma + \mu\mu^T & \mu \\ \mu^T & 1 \end{bmatrix}. \quad (16)$$

Here, d is the dimensionality of the feature vector, and |·| is the determinant operator. The (d+1) × (d+1) SPD matrix X is the HRCD of the image. As a result of the inheritance from the covariance matrix, the HRCD is not only effective, robust, and low-dimensional but also more discriminable than the RCD.
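For illustration, the HRCD of equation (16) can be assembled directly from the feature mean and covariance; the sketch below assumes a nonsingular Σ, and the function name hrcd is ours.

```python
import numpy as np

def hrcd(mu, Sigma):
    """Hybrid region covariance descriptor of equation (16).

    mu    : (d,) mean feature vector
    Sigma : (d, d) covariance of the features (assumed nonsingular)
    """
    d = mu.shape[0]
    top = np.hstack([Sigma + np.outer(mu, mu), mu[:, None]])
    bottom = np.hstack([mu[None, :], np.ones((1, 1))])
    X = np.vstack([top, bottom])                         # (d+1) x (d+1) block matrix
    return np.linalg.det(Sigma) ** (-1.0 / (d + 1)) * X  # determinant normalization
```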
4.2. Kernel Functions in HSIC-SL. To define a valid RKHS, the kernel must be symmetric positive definite. Many discussions on the symmetric positive definiteness of kernel functions are based on vector spaces. In this section, we introduce two typical kernel functions on SPD Riemannian manifolds.

4.2.1. Log-Linear Kernel. The polynomial kernel is one of the most commonly used kernel functions in Euclidean spaces. The polynomial kernel function in a vector space is defined as

$$k(x_i, x_j) = \big(\alpha x_i^T x_j + \beta\big)^c, \quad (17)$$

where x_i, x_j ∈ R^n. If α = c = 1 and β = 0, then equation (17) reduces to the linear kernel

$$k(x_i, x_j) = x_i^T x_j. \quad (18)$$

When the linear kernel is extended to SPD Riemannian manifolds, it must be redefined appropriately. The linear kernel on SPD Riemannian manifolds can be defined as

$$k(X_i, X_j) = \mathrm{tr}\big((\log X_i)^T \log X_j\big) = \mathrm{tr}\big(\log(X_i)\log(X_j)\big). \quad (19)$$

We refer to this kernel as the log-linear kernel.
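A direct, illustrative rendering of equation (19) using SciPy's principal matrix logarithm (the function name is ours; in practice one would precompute log X_i once per matrix rather than inside every kernel evaluation):

```python
import numpy as np
from scipy.linalg import logm

def log_linear_kernel(Xi, Xj):
    """Log-linear kernel of equation (19): tr(log(Xi) log(Xj)).

    For SPD inputs, logm returns a (numerically) real symmetric matrix,
    so the transpose in equation (19) is harmless.
    """
    return float(np.trace(logm(Xi) @ logm(Xj)).real)
```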
4.2.2. Log-Gaussian Kernel. The Gaussian kernel is another popular kernel function in Euclidean spaces. The Gaussian kernel function is defined as

$$k(x_i, x_j) = \exp\!\left(-\frac{d^2(x_i, x_j)}{2\sigma^2}\right), \quad (20)$$

where d(x_i, x_j) = ‖x_i − x_j‖_F. A good effect can be achieved by replacing the Euclidean distance with the log-Euclidean distance. The log-Gaussian kernel is defined by

$$k_{LE}(X_i, X_j) = \exp\!\left(-\frac{d_{LE}^2(X_i, X_j)}{2\sigma^2}\right), \quad (21)$$

where d_LE(X_i, X_j) = ‖log(X_i) − log(X_j)‖_F is the log-Euclidean distance between X_i and X_j. The positive definiteness of k_LE was proved in [39].

The parameter σ is important in the Gaussian kernel. To make the log-Gaussian kernel sensitive to the distances involved, we suggest setting σ to the average value of the distances between the training samples.
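A sketch of the log-Gaussian kernel of equation (21), together with the suggested heuristic of setting σ to the average log-Euclidean distance between training samples; all names are ours, and the pairwise loop is purely illustrative:

```python
import numpy as np
from scipy.linalg import logm

def log_euclidean_dist(Xi, Xj):
    """d_LE(Xi, Xj) = ||log(Xi) - log(Xj)||_F."""
    return np.linalg.norm(logm(Xi) - logm(Xj), 'fro')

def mean_pairwise_distance(mats):
    """Suggested sigma: average log-Euclidean distance over the training set."""
    dists = []
    for i in range(len(mats)):
        for j in range(i + 1, len(mats)):
            dists.append(log_euclidean_dist(mats[i], mats[j]))
    return float(np.mean(dists))

def log_gaussian_kernel(Xi, Xj, sigma):
    """Log-Gaussian kernel of equation (21)."""
    return float(np.exp(-log_euclidean_dist(Xi, Xj) ** 2 / (2 * sigma ** 2)))
```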
4.3. HSIC Subspace Learning. After embedding the matrices into the RKHS, we further project the points into a vector space through an explicit mapping. We aim to find the explicit mapping from the RKHS to the vector space by maximizing the HSIC between the SPD matrices and the low-dimensional representation while preserving the local information. The proposed HSIC-SL thus includes global HSIC maximization and within-class information preservation.

We denote the HSIC of χ and the low-dimensional representation Y as HSIC(χ, Y). According to equation (14), HSIC(χ, Y) can be computed as

$$\mathrm{HSIC}(\chi, Y) = \frac{1}{N^2}\mathrm{tr}\big(K_Y C_N K_\chi C_N\big). \quad (22)$$

The input data χ and the projection Y are represented by K_χ and K_Y, respectively. To explicitly realize the low-dimensional representation, we define the kernel function of Y in HSIC(χ, Y) as k_Y : R^m × R^m → R; that is, for y, y′ ∈ R^m,

$$k_Y(y, y') = y^T y'. \quad (23)$$

We denote the kernel matrix of k_Y as K_Y. It can be computed by

$$K_Y = \begin{bmatrix} y_1^T y_1 & \cdots & y_1^T y_N \\ \vdots & \ddots & \vdots \\ y_N^T y_1 & \cdots & y_N^T y_N \end{bmatrix} = Y^T Y. \quad (24)$$

Substituting equation (24) into equation (22) yields

$$\mathrm{HSIC}(\chi, Y) = \frac{1}{N^2}\mathrm{tr}\big(K_Y C_N K_\chi C_N\big) = \frac{1}{N^2}\mathrm{tr}\big(Y^T Y C_N K_\chi C_N\big) = \frac{1}{N^2}\mathrm{tr}\big(Y C_N K_\chi C_N Y^T\big). \quad (25)$$

As N does not depend on Y, the coefficient 1/N² in equation (25) can be omitted. Recalling that y_i = W K_i^Row, i.e., Y = W K_χ, we have

$$\mathrm{HSIC}(\chi, Y) = \mathrm{tr}\big(W K_\chi C_N K_\chi C_N K_\chi^T W^T\big) = \mathrm{tr}\big(W L_H W^T\big), \quad (26)$$

where L_H = K_χ C_N K_χ C_N K_χ^T.
The within-class information is represented by the within-class scatter S_W, which is defined as

$$S_W = \mathrm{tr}\left(\sum_{i=1}^{c}\sum_{j=1}^{N_i}\big(y_j^{(i)} - m_i\big)\big(y_j^{(i)} - m_i\big)^T\right), \quad (27)$$

where N_i is the number of training samples of the i-th class, Σ_{i=1}^{c} N_i = N, y_j^{(i)} denotes the low-dimensional representation of the j-th sample of the i-th class, and m_i is the mean vector of the i-th class. According to the relationship between y_i and X_i, equation (27) can be further transformed into

$$S_W = \mathrm{tr}\left(W\left[\sum_{i=1}^{c}\sum_{j=1}^{N_i}\big(K_j^{Row} - K_{m_i}\big)\big(K_j^{Row} - K_{m_i}\big)^T\right]W^T\right) = \mathrm{tr}\big(W L_W W^T\big), \quad (28)$$

where K_{m_i} = (1/N_i) Σ_{j=1}^{N_i} k_j^i is the mean kernel row of the i-th class and L_W = Σ_{i=1}^{c} Σ_{j=1}^{N_i} (K_j^{Row} − K_{m_i})(K_j^{Row} − K_{m_i})^T.
In sum, the objective function is formulated as

$$J(W) = \arg\max_W \mathrm{HSIC}(\chi, Y) = \arg\max_W \mathrm{tr}\big(W L_H W^T\big) \quad \text{s.t.} \quad \mathrm{tr}\big(W L_W W^T\big) = \mathrm{tr}\big(L_W\big). \quad (29)$$

The Rayleigh quotient maximization problem is commonly used in optimization because it is fast and simple to compute, and the problem in equation (29) can be solved as such a maximization. To tackle singularity, we add a small perturbation ε to the diagonal elements of L_W. The optimal projection matrix W is then composed of the eigenvectors corresponding to the m largest eigenvalues of (L_W + εI_N)^{-1} L_H, where I_N is the identity matrix.

Hence, for a given test image, we first compute its HRCD and denote the result as X_t. The projection is obtained by y_t = W K_t^Row, where K_t^Row collects the kernel values between X_t and the training samples. Then, the class of the test image can be predicted with the nearest neighbor classifier.
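Putting the pieces together, the following sketch outlines the training and test stages described above: building L_H and L_W from a precomputed N × N training kernel matrix, solving the perturbed eigenproblem of equation (29), and classifying a projected test sample with the nearest neighbor rule. All function names and the NumPy-based solver are our own choices, not the authors' MATLAB implementation.

```python
import numpy as np

def hsic_sl_fit(K, labels, m, eps=1e-3):
    """Learn the projection W of HSIC-SL from the N x N training kernel matrix K.

    labels : length-N integer array of class labels
    m      : target subspace dimension
    """
    N = K.shape[0]
    CN = np.eye(N) - np.ones((N, N)) / N
    LH = K @ CN @ K @ CN @ K.T                 # global HSIC term, equation (26)

    LW = np.zeros((N, N))                      # within-class term, equation (28)
    for c in np.unique(labels):
        Kc = K[np.where(labels == c)[0], :]    # kernel rows K_jRow of class c
        D = Kc - Kc.mean(axis=0)               # subtract the class mean K_{m_i}
        LW += D.T @ D

    # eigenvectors of (L_W + eps I)^{-1} L_H for the m largest eigenvalues
    M = np.linalg.solve(LW + eps * np.eye(N), LH)
    vals, vecs = np.linalg.eig(M)
    order = np.argsort(-vals.real)[:m]
    return vecs[:, order].real.T               # W has shape (m, N)

def hsic_sl_project(W, k_row):
    """Project a sample given its kernel row [k(X, X_1), ..., k(X, X_N)]."""
    return W @ k_row

def nn_classify(y_test, Y_train, labels):
    """1-nearest-neighbour decision in the learned subspace."""
    d = np.linalg.norm(Y_train - y_test[None, :], axis=1)
    return labels[int(np.argmin(d))]
```

With K computed from the log-linear or log-Gaussian kernel on HRCDs, this sketch mirrors, in spirit, the pipeline of Figure 2.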
5. Experiment
The performance of the HRCD and the proposed algorithm is verified in this section. We considered five widely studied image datasets: the COIL-20 (Columbia Object Image Library) dataset [55], the ETH-80 dataset [56], the Queen Mary University of London (QMUL) dataset [57], the FERET face dataset [58], and the Brodatz dataset [59]. All of the compared methods were implemented in MATLAB R2014 and tested on an Intel(R) Core(TM) i5-4670K (3.40 GHz) machine.
5.1. Performance of HRCD. To verify that the HRCD is an effective image descriptor, we directly used the KNN classifier on the image feature space represented by the HRCD and the RCD without feature extraction. Adopting the Euclidean metric, LERM, AIRM, and Burg divergence as the distance measures, classification experiments were performed on COIL-20 and ETH-80. The COIL-20 dataset contains 20 objects, each of which has 72 images of size 128 × 128 taken from different viewing directions. Figure 3 shows sample pictures. Features including the grey value and the first- and second-order gradients were extracted to calculate the RCD and HRCD of an image. Hence, the RCD and HRCD of an image were a 5 × 5 SPD matrix and a 6 × 6 SPD matrix, respectively. The images were randomly split into a training set and a test set, with 10 pictures assigned to the training set and the remaining images assigned to the test set.
ETH-80 is an image-set dataset containing eight types of objects, such as apples, pears, cars, and dogs. Each object has 10 instances, and each instance contains images from 41 different viewpoints. The images in ETH-80 were resized to 128 × 128 (Figure 4). For the RCD and HRCD representations, we extracted the following features:

$$F(x, y) = \big[x,\ y,\ R_{x,y},\ G_{x,y},\ B_{x,y},\ I_{x,y},\ |I_x|,\ |I_y|,\ |I_{xx}|,\ |I_{yy}|\big], \quad (30)$$

where R_{x,y}, G_{x,y}, and B_{x,y} are the RGB color values of the pixel at position (x, y), I_{x,y} is the greyscale value, and |I_x|, |I_y|, |I_xx|, and |I_yy| are the first- and second-order gradient magnitudes of the intensity. The RCD and HRCD of an image were a 10 × 10 SPD matrix and an 11 × 11 SPD matrix, respectively. Half of the instances of every object were used for training, and the remaining instances were used for testing. Each object in the training and test sets comprised 100 random samples; therefore, the training and test sets each contained 800 images.
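For reference, the 10-dimensional per-pixel feature map of equation (30) might be assembled as follows; the greyscale conversion and the use of np.gradient for the image derivatives are our assumptions, not prescribed by the paper.

```python
import numpy as np

def eth80_features(rgb):
    """Per-pixel features of equation (30) for an h x w x 3 RGB image in [0, 1].

    Returns an (h*w, 10) array whose rows are
    [x, y, R, G, B, I, |I_x|, |I_y|, |I_xx|, |I_yy|].
    """
    h, w, _ = rgb.shape
    I = rgb.mean(axis=2)                      # simple greyscale; other conversions also work
    Iy, Ix = np.gradient(I)                   # first-order gradients
    Iyy, _ = np.gradient(Iy)                  # second-order gradient along y
    _, Ixx = np.gradient(Ix)                  # second-order gradient along x
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    feats = [xs, ys, rgb[..., 0], rgb[..., 1], rgb[..., 2],
             I, np.abs(Ix), np.abs(Iy), np.abs(Ixx), np.abs(Iyy)]
    return np.stack([f.ravel() for f in feats], axis=1)
```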
Table 1 lists the classification accuracies and runtimes
under different metrics. To eliminate the randomness of the
experiment, we obtained the average accuracy and runtime
for 20 tests.
5.2. Performance of HSIC-SL. The proposed HSIC-SL was compared with several recognition methods on SPD manifolds. The compared methods included RLPP [38], KSLR [32], CDL [4], KPCA using the log-Gaussian kernel [39], RSR [42], TSC [35], Riem-DLSC [36], logEuc-SC [34], Geometry-DR [25], KLRM-DL [43], EDA [37], and MKSSCR [44]. For brevity, we denote HSIC-SL with the log-linear kernel as HSIC-SL (log-linear) and that with the log-Gaussian kernel as HSIC-SL (log-Gaussian). HSIC-SL (log-linear) and HSIC-SL (log-Gaussian) were each combined with the RCD and the HRCD; thus, four combinations of the proposed HSIC-SL were tested. For fairness, the important parameters of the compared methods were set according to the suggestions in the original papers.
5.2.1. Experiments on QMUL Dataset. The QMUL dataset [57] is a set of images of human heads collected from airport terminal cameras. The dataset is composed of 20,005 images. It is divided into five classes according to the direction of the head images: back, front, left, right, and background. Samples from QMUL are shown in Figure 5. The dataset was divided into training and test sets in advance. Table 2 shows the number of training and test samples in every class. The extracted feature of each pixel is
$$F(x, y) = \Big[I_L(x, y),\ I_a(x, y),\ I_b(x, y),\ \sqrt{I_x^2 + I_y^2},\ \arctan\!\big(I_x^2 / I_y^2\big),\ G_1(x, y), \ldots, G_8(x, y)\Big], \quad (31)$$

where I_L(x, y), I_a(x, y), and I_b(x, y) are the three channel values of the CIELAB color space, I_x and I_y are the first-order gradients of I_L(x, y) in the x- and y-directions, respectively, and G_i(x, y), i = 1, ..., 8, is the response of eight difference-of-Gaussians filters. We thus obtained a 13 × 13 SPD matrix for the RCD and a 14 × 14 SPD matrix for the HRCD. The training data consisted of 200 randomly selected samples for each category, and the test set consisted of 100 randomly selected samples. A KNN (k = 12) search was used to construct the neighborhood graphs in RLPP and Geometry-DR. The parameter σ in the kernels of KPCA, RLPP, KSLR, and HSIC-SL was set to the average distance between training samples. The parameter c in KSLR was set to 0.3. The parameter ε in the proposed method was set to 0.001. We evaluated the performance of CDL, RLPP, KSLR, Geometry-DR, and HSIC-SL for various dimensions and report the maximum performance. In logEuc-SC, RSR, TSC, Riem-DLSC, and KLRM-DL, 50 dictionary atoms and the kernel parameters were learned from the training set. The kernel function in RSR and the basic kernel in KLRM-DL were the Stein kernel. The parameter alpha was set to 0.1, and the number of data samples was set to 30. The 1NN classifier was adopted in all the algorithms.
Figure 3: Sample images of COIL-20.

Figure 4: Sample images of ETH-80.

In Table 3, we show the recognition accuracy of HSIC-SL and the other existing algorithms. To eliminate the randomness of the experiment, we used the average recognition rate over 20 tests. HSIC-SL (log-Gaussian) + HRCD and HSIC-SL (log-linear) + HRCD achieved impressive performance, with HSIC-SL (log-Gaussian) + HRCD obtaining the highest classification accuracy. Moreover, the accuracy with the HRCD was greater than that with the RCD in this experiment. These results indicated that the HRCD was better than the RCD. Furthermore, HSIC-SL + HRCD was better than the other algorithms.
5.2.2. Experiments on FERET Dataset. To conduct the face recognition experiment, we used the "b" subset of the FERET dataset [58], which consists of 2,000 face images of 200 people. The images are those of 71 females and 129 males of diverse ethnicities and ages. The images were cropped and downsampled to 64 × 64. The training set was composed of images with "ba," "bc," "bh," and "bk" labels. Images marked as "bd," "be," "bf," and "bg" constituted the test set. The feature vector for computing the RCD and HRCD is described by

$$F(x, y) = \big[x,\ y,\ I(x, y),\ G_{0,0}(x, y), \ldots, G_{4,7}(x, y)\big], \quad (32)$$

where x and y denote the position, I(x, y) is the intensity, and G_{u,v}(x, y) is the response value of the Gabor filter. The direction u of the Gabor filter ranged from 0 to 4, and the scale v ranged from 0 to 7. Thus, the RCD and HRCD of each image were a 43 × 43 SPD matrix and a 44 × 44 SPD matrix, respectively. The neighborhood graphs constructed in RLPP and Geometry-DR used KNN (k = 3). The kernel functions with the Jeffrey and Stein divergences were adopted in RSR and are denoted, respectively, as RSR-J and RSR-S for brevity. In RSR, TSC, Riem-DLSC, KLRM-DL, and logEuc-SC, all training samples were regarded as dictionary atoms. The settings of the other parameters were the same as those for the QMUL dataset.
Table 4 shows the recognition rates of the compared algorithms. The proposed method was not the best algorithm on the FERET dataset overall; it achieved the highest recognition accuracy only in the "bd" test scenario. Nevertheless, the average recognition accuracies of HSIC-SL were still better than those of most of the other algorithms and only slightly worse than that of KLRM-DL. Hence, HSIC-SL remains a feasible algorithm for the FERET dataset. We also note that HSIC-SL (log-Gaussian) performed better than HSIC-SL (log-linear); therefore, the log-Gaussian kernel was more suitable than the log-linear kernel for this dataset.

Table 4: Comparison of classification accuracies on FERET dataset.

Methods                          bd     be     bf     bg     Average
CDL                              76.50  75.00  88.50  84.50  81.13
RLPP                             58.40  60.00  67.00  60.50  61.48
KSLR                             83.00  90.00  96.00  91.00  90.00
logEuc-SC                        74.00  94.00  97.50  80.50  86.50
RSR-S                            82.50  94.50  98.00  83.50  89.63
RSR-J                            79.50  96.50  97.50  86.00  89.88
TSC                              36.00  73.00  73.50  44.50  56.75
Riem-DLSC                        88.25  93.50  96.50  91.75  92.50
Geometry-DR                      80.50  78.00  86.50  83.00  82.00
KLRM-DL                          89.50  96.00  97.00  94.00  94.13
EDA                              86.00  90.00  95.50  92.00  90.88
MKSSCR                           88.50  92.00  96.00  94.50  92.75
HSIC-SL (log-linear) + RCD       83.50  87.50  94.00  91.00  89.00
HSIC-SL (log-linear) + HRCD      83.50  87.00  93.50  91.50  88.88
HSIC-SL (log-Gaussian) + RCD     88.00  88.50  95.00  93.00  91.13
HSIC-SL (log-Gaussian) + HRCD    90.00  90.50  96.50  93.50  92.63
5.2.3. Experiments on Brodatz Dataset. We performed two texture classification experiments on the Brodatz dataset [59]. Examples from the Brodatz dataset are shown in Figure 6.
Figure 5: Sample images of QMUL dataset.
Table 1: Comparison of RCD and HRCD in terms of classification accuracy (%) and runtime (seconds).

Metric            Descriptor   COIL-20 accuracy (%)   COIL-20 runtime (s)   ETH-80 accuracy (%)   ETH-80 runtime (s)
Euclidean         RCD          74.98                  2.946                 62.63                 5.234
Euclidean         HRCD         59.88                  2.908                 66.34                 5.299
LERM              RCD          84.81                  4.149                 71.03                 7.1
LERM              HRCD         88.99                  4.262                 72.04                 7.759
AIRM              RCD          87.10                  118.33                71.64                 482.47
AIRM              HRCD         91.06                  125.79                73.35                 509.15
Burg divergence   RCD          89.23                  10.102                72.07                 26.368
Burg divergence   HRCD         91.71                  10.492                73.63                 26.66
Table 2: Distribution of QMUL dataset.

Label      Back   Background   Front   Left   Right
Training   2256   2256         2256    2256   2256
Test       2096   1107         1772    1502   2248
Table 3: Comparison of classification accuracies on QMUL dataset.
Methods Accuracy (%)
KPCA 42.5
CDL 76
RLPP 58.4
KSLR 77.4
logEuc-SC 66.3
RSR 73.2
TSC 61.7
Riem-DLSC 36.6
Geometry-DR 69.2
KLRM-DL 70.74
EDA 75.85
MKSSCR 78.14
HSIC-SL (log-linear) + RCD 76.22
HSIC-SL (log-linear) + HRCD 78.29
HSIC-SL (log-Gaussian) + RCD 76.16
HSIC-SL (log-Gaussian) + HRCD 79.65
Figure 6. e first experiment was a grouping experiment
with selected textures, and the other was a classification
experiment for all texture images.
In the first experiment, we followed the test setup
designed in [35] and selected three of the test schemes. e
schemes included one of the 5-texture groups, one of the 10-
texture groups, and one of the 16-texture groups. e
number of classes selected in each test scheme is shown in
Table 5. Each image was resized to 256 ×256, and then 64
regions measuring 32 ×32 were extracted. e covariance
matrices were computed from a five-dimensional feature
vector, including intensity and the first- and second-order
gradients. In each test scheme, eight samples in one image
were randomly selected as the training data, and the
remaining samples were used for the test. Geometry-DR was
not suitable for the dataset because of the low dimension of
the SPD matrices in this experiment. e results shown in
Figure 7 are the average results for 20 tests.
HSIC-SL achieved the highest classification result on all
test schemes, except for the 5-texture test, in which the
recognition rates of most of the algorithms were relatively
close. In this dataset, HSIC-SL (log-linear) performed better
than HSIC-SL (log-Gaussian).
In the second experiment, 20 random samples were chosen as the training set, and 10 random samples were chosen as the test set from all the texture images. The average results over 20 tests are presented in Table 6. HSIC-SL (log-linear) and HSIC-SL (log-Gaussian) outperformed the other methods, with the latter being marginally better than the former when all texture pictures were classified. In addition, the accuracy of HSIC-SL modeled on the HRCD was much higher than that of HSIC-SL modeled on the RCD, which verifies the discriminative ability of the HRCD once again.
5.2.4. Experiments on COIL-20 and ETH-80 Datasets. In this experiment, we used the COIL-20 dataset [55] and the ETH-80 dataset in the object categorization task. The experimental procedure was the same as that described in Section 5.1. We compared the proposed method with KPCA [39], RLPP [38], KSLR [32], and CDL [4]. In addition, KPCA and RLPP were also run on the HRCD and are denoted, respectively, as KPCA + HRCD and RLPP + HRCD. The classifier adopted in all of the algorithms was the 1NN classifier.

Table 7 shows the classification accuracies of the methods on COIL-20 and ETH-80. First, HSIC-SL obtained the best accuracy on both datasets. This result indicated that the introduction of the HSIC improved the effectiveness of the recognition algorithm. Second, the classification accuracies of RLPP, KPCA, and HSIC-SL with the RCD were lower than those with the HRCD (i.e., RLPP + HRCD, KPCA + HRCD, and HSIC-SL + HRCD). This result proved once again that the HRCD has advantages over the RCD. Finally, the effectiveness of the log-linear kernel and the log-Gaussian kernel in HSIC-SL was demonstrated in these experiments.
5.3. Analysis of Dimensionality. The parameter m is the dimensionality of the vector space after feature extraction. The curves of the classification accuracies of the compared algorithms on COIL-20 [55], ETH-80 [56], and Brodatz versus m are shown in Figures 8 and 9. The experimental setups were the same as those described in the previous section.

As the dimensionality increased, the recognition accuracy curves showed an upward trend. Once the recognition accuracy reached a certain value, the recognition rate remained basically stable within a certain range of subspace dimensions.
5.4. Discussion. In the above experiments, the performance of the RCD and HRCD and the effectiveness of HSIC-SL and the other algorithms were compared. The following observations were made:

(1) The classification accuracy in the image feature space represented by the HRCD was better than that represented by the RCD regardless of which classifier was used (i.e., the KNN classifier without feature extraction or the proposed HSIC-SL). This result showed that the proposed image descriptor HRCD outperformed the RCD.
(2) When the RCD was used as the image descriptor, the HSIC-SL method was superior to most of the compared methods, except on the FERET and Brodatz datasets. On FERET, the performance of Riem-DLSC, MKSSCR, and KLRM-DL was slightly better than that of HSIC-SL + RCD (log-Gaussian kernel). On Brodatz, the performance of HSIC-SL + RCD was slightly worse than that of the other methods in the 5-texture, 10-texture, and 16-texture groups. Nevertheless, the recognition accuracy of HSIC-SL + RCD in the experiment on all texture images was higher than those of the other methods. The results showed that HSIC-SL was indeed an excellent algorithm on SPD manifolds, but it was inferior in the classification of datasets with subtle features, such as face recognition and texture recognition. At the same time, the HRCD makes up for this defect to a certain extent: the performance of HSIC-SL + HRCD was superior to that of almost all methods. However, on the FERET dataset, the average recognition accuracy of HSIC-SL was lower than that of KLRM-DL.
Figure 6: Sample images of Brodatz dataset.
Table 5: Images selected for each group.
Experiment number Images selected
5-texture D77, D84, D55, D53, D24
10-texture D4, D9, D19, D21, D24, D28, D29, D36, D37, D38
16-texture D3, D4, D5, D6, D9, D21, D24, D29, D32, D33, D54, D55, D57, D68, D77, D84
Figure 7: Comparison of algorithms for 1NN classification accuracy on the Brodatz dataset.
Table 6: Accuracy of all texture classifications of Brodatz.
Methods Accuracy (%)
CDL 85.25
RLPP 75.8
KSLR 85.55
logEuc-SC 63.24
RSR-S 78.32
RSR-J 76.7
Riem-DLSC 37.1
Geometry-DR 74.9
KLRM-DL 79.23
HSIC-SL (log-linear) + RCD 84.87
HSIC-SL (log-linear) + HRCD 88.87
HSIC-SL (log-Gaussian) + RCD 85.98
HSIC-SL (log-Gaussian) + HRCD 89.66
Table 7: Comparison of recognition rates (%) of different methods.
Methods COIL-20 ETH-80
KPCA 81.05 72.61
KPCA + HRCD 83.79 73.7
RLPP 85.89 74.08
RLPP + HRCD 88.79 75.63
CDL 94.54 79.92
KSLR 96.24 81.66
HSIC-SL (log-linear) + RCD 96.72 82.80
HSIC-SL (log-linear) + HRCD 97.75 84.60
HSIC-SL (log-Gaussian) + RCD 96.87 82.40
HSIC-SL (log-Gaussian) + HRCD 97.92 85.28
Figure 8: Recognition rates versus different dimensionalities on the COIL-20 database.
Figure 9: Recognition rates versus different dimensionalities on the ETH-80 database.
(3) In the experiments, we also compared the performance of the log-Gaussian kernel and the log-linear kernel. In general, the log-Gaussian kernel was better than the log-linear kernel. However, in the experiments on QMUL and Brodatz, the log-linear kernel obtained better results than the log-Gaussian kernel in some settings. This difference in performance indicates that the choice of kernel affects the performance of HSIC-SL, and the performance of HSIC-SL can be improved by selecting a suitable kernel function.
6. Conclusions
In this work, we propose an improved covariance descriptor called the HRCD, which represents images with SPD matrices. The HRCD inherits the advantages of the RCD and is more effective.

To address the classification problem on SPD Riemannian manifolds, we propose an efficient image classification method based on a kernel framework, which we refer to as HSIC-SL. Through the definition of the log-linear kernel and the log-Gaussian kernel, the input images represented by SPD matrices can be embedded into an RKHS. To seek an explicit mapping from the RKHS to the vector space, HSIC-SL constructs its objective function on the basis of subspace learning and HSIC maximization. HSIC-SL outperforms other representative methods in most cases without increasing computational complexity.

The proposed algorithm also has certain limitations. Its average classification accuracy is slightly worse than that of KLRM-DL on the FERET dataset. Hence, the covariance descriptor is not strong enough to handle the classification of subtle details, as in face recognition. In future work, we will employ other effective features to form the covariance matrices and explore other useful kernel functions suited to different types of datasets.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This study was supported by the National Natural Science
Foundation of China through the Project “Research on
Nonlinear Alignment Algorithm of Local Coordinates in
Manifold Learning” under grant no. 61773022, the Character
and Innovation Project of Education Department of
Guangdong Province under grant no. 2018GKTSCX081, the
Young Innovative Talents Project of Education Department
of Guangdong Province under grant no. 2020KQNCX191,
the Guangzhou Science and Technology Plan Project of
Bureau of Science and Technology of Guangzhou Munici-
pality under grant no. 202102020700, and the Educational
Big Data Enterprise Lab of Guangzhou Panyu Polytechnic
under grant no. 2021XQS05.
References
[1] V. Arsigny, P. Fillard, X. Pennec, and N. Ayache, “Geometric
means in a novel vector space structure on symmetric pos-
itive-definite matrices,” SIAM Journal on Matrix Analysis and
Applications, vol. 29, no. 1, pp. 328–347, 2007.
[2] M. Harandi, C. Sanderson, C. Shen, and B. Lovell, “Dictionary
learning and sparse coding on Grassmann manifolds: an
extrinsic solution,” in Proceedings of the 2013 IEEE Interna-
tional Conference on Computer Vision, pp. 3120–3127, Sydney,
Australia, 2013.
[3] F. Porikli, O. Tuzel, and P. Meer, “Covariance tracking using
model update based on Lie algebra,” in Proceedings of the 2006
IEEE Computer Society Conference on Computer Vision and
Pattern Recognition, New York, NY, USA, 2006.
[4] R. Wang, H. Guo, L. S. Davis, and Q. Dai, “Covariance
discriminative learning: a natural and efficient approach to
image set classification,” in Proceedings of the 2012 IEEE
Conference on Computer Vision and Pattern Recognition,
Providence, RI, USA, 2012.
[5] W. Wang, R. Wang, Z. Huang, S. Shan, and X. Chen,
“Discriminant analysis on Riemannian manifold of Gaussian
distributions for face recognition with image sets,” IEEE
Transactions on Image Processing, vol. 27, no. 1, pp. 151–163,
2017.
[6] I. L. Dryden, A. Koloydenko, and D. Zhou, “Non-Euclidean
statistics for covariance matrices, with applications to diffu-
sion tensor imaging,” The Annals of Applied Statistics, vol. 3,
no. 3, pp. 1102–1123, 2009.
[7] D. Le Bihan, J.-F. Mangin, C. Poupon et al., “Diffusion
tensor imaging: concepts and applications,” Journal of
Magnetic Resonance Imaging, vol. 13, no. 4, pp. 534–546, 2001.
[8] R. Caseiro, P. Martins, J. F. Henriques, and J. Batista, “A
nonparametric Riemannian framework on tensor field with
application to foreground segmentation,” Pattern Recogni-
tion, vol. 45, no. 11, pp. 3997–4017, 2012.
[9] M. G. omason and J. Gregor, “Higher order singular value
decomposition of tensors for fusion of registered images,”
Journal of Electronic Imaging, vol. 20, no. 1, pp. 13–23, 2011.
[10] Z. Gao, Y. Wu, M. Harandi, and Y. Jia, “A robust distance
measure for similarity-based classification on the SPD
manifold,” IEEE Transactions on Neural Networks and
Learning Systems, vol. 31, no. 9, pp. 3230–3244, 2020.
[11] Y. Wu, Y. Jia, P. Li, J. Zhang, and J. Yuan, “Manifold kernel
sparse representation of symmetric positive-definite matrices
and its applications,” IEEE Transactions on Image Processing,
vol. 24, no. 11, pp. 3729–3741, 2015.
[12] A. Sanin, C. Sanderson, M. T. Harandi, and B. C. Lovell,
“Spatio-temporal covariance descriptors for action and ges-
ture recognition,” in Proceedings of the 2013 IEEE Workshop
on Applications of Computer Vision (WACV), Clearwater
Beach, FL, USA, 2013.
[13] J. S´anchez, F. Perronnin, T. Mensink, and J. Verbeek, “Image
classification with the Fisher vector: theory and practice,”
International Journal of Computer Vision, vol. 105, no. 3,
pp. 222–245, 2013.
[14] H. Jegou, F. Perronnin, M. Douze, J. Sanchez, P. Perez, and
C. Schmid, “Aggregating local image descriptors into compact
codes,” IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 34, no. 9, pp. 1704–1716, 2012.
[15] X. Pennec, P. Fillard, and N. Ayache, “A Riemannian
framework for tensor computing,” International Journal of
Computer Vision, vol. 66, no. 1, pp. 41–66, 2006.
[16] V. Arsigny, P. Fillard, X. Pennec, and N. Ayache, “Log-Eu-
clidean metrics for fast and simple calculus on diffusion
tensors,” Magnetic Resonance in Medicine, vol. 56, no. 2,
pp. 411–421, 2006.
[17] A. Goh and R. Vidal, “Clustering and dimensionality re-
duction on Riemannian manifolds,” in Proceedings of the 2008
IEEE Conference on Computer Vision and Pattern Recognition,
Anchorage, AK, USA, 2008.
[18] O. Tuzel, F. Porikli, and P. Meer, “Pedestrian detection via
classification on Riemannian manifolds,” IEEE Transactions
on Pattern Analysis and Machine Intelligence, vol. 30, no. 10,
pp. 1713–1727, 2008.
[19] B. Kulis, M. Sustik, and I. Dhillon, “Learning low-rank kernel
matrices,” in Proceedings of the 23rd International Conference
on Machine Learning, Pittsburgh, PA, USA, 2006.
[20] S. Sra, “A new metric on the manifold of kernel matrices with
application to matrix geometric means,” in Proceedings of the
25th International Conference on Neural Information Pro-
cessing Systems, vol. 1, Lake Tahoe, NV, USA, 2012.
[21] D. Tosato, M. Farenzena, M. Spera, V. Murino, and
M. Cristani, “Multi-class classification on Riemannian
manifolds for video surveillance,” in Proceedings of the 2010
ECCV, Heraklion, Greece, 2010.
[22] J. Carreira, R. Caseiro, J. Batista, and C. Sminchisescu, “Se-
mantic segmentation with second-order pooling,” in Pro-
ceedings of the 2012 ECCV, Florence, Italy, 2012.
[23] A. Cherian, S. Sra, A. Banerjee, and N. Papanikolopoulos,
“Jensen-Bregman LogDet divergence with application to ef-
ficient similarity search for covariance matrices,” IEEE
Transactions on Pattern Analysis and Machine Intelligence,
vol. 35, no. 9, pp. 2161–2174, 2013.
[24] N. Aronszajn, “Theory of reproducing kernels,” Transactions
of the American Mathematical Society, vol. 68, no. 3,
pp. 337–404, 1950.
[25] M. Harandi, M. Salzmann, and R. Hartley, “Dimensionality
reduction on SPD manifolds: the emergence of geometry-
aware methods,” IEEE Transactions on Pattern Analysis and
Machine Intelligence, vol. 40, no. 1, pp. 48–62, 2018.
[26] M. T. Harandi, C. Sanderson, R. Hartley, and B. C. Lovell,
“Sparse coding and dictionary learning for symmetric positive
definite matrices: a kernel approach,” in Proceedings of the
ECCV 2012, Florence, Italy, 2012.
[27] R. Vemulapalli, J. K. Pillai, and R. Chellappa, “Kernel learning
for extrinsic classification of manifold features,” in Proceed-
ings of the 2013 IEEE Conference on Computer Vision and
Pattern Recognition, Portland, OR, USA, 2013.
[28] O. Tuzel, F. Porikli, and P. Meer, “Region covariance: a fast
descriptor for detection and classification,” in Proceedings of
the ECCV 2006, vol. 3952, Graz, Austria, 2006.
[29] H. Tan and Y. Gao, “Patch-based principal covariance dis-
criminative learning for image set classification,” IEEE Access,
vol. 5, pp. 15001–15012, 2017.
[30] O. Arandjelovic, G. Shakhnarovich, J. Fisher, R. Cipolla, and
T. Darrell, “Face recognition with image sets using manifold
density divergence,” in Proceedings of the 2005 IEEE Computer
Society Conference on Computer Vision and Pattern Recog-
nition (CVPR’05), San Diego, CA, USA, 2005.
[31] M. Lovrić, M. Min-Oo, and E. A. Ruh, “Multivariate normal
distributions parametrized as a Riemannian symmetric
space,” Journal of Multivariate Analysis, vol. 74, no. 1,
pp. 36–48, 2000.
[32] X. Liu and Z. Ma, “Kernel-based subspace learning on Rie-
mannian manifolds for visual recognition,” Neural Processing
Letters, vol. 51, no. 1, pp. 147–165, 2020.
[33] R. Caseiro, P. Martins, J. F. Henriques, F. S. Leite, and
J. Batista, “Rolling Riemannian manifolds to solve the multi-
class classification problem,” in Proceedings of the 2013 IEEE
Conference on Computer Vision and Pattern Recognition,
vol. 9, Portland, OR, USA, 2013.
[34] K. Guo, P. Ishwar, and J. Konrad, “Action recognition from
video using feature covariance matrices,” IEEE Transactions
on Image Processing, vol. 22, no. 6, pp. 2479–2494, 2013.
[35] R. Sivalingam, D. Boley, V. Morellas, and
N. Papanikolopoulos, “Tensor sparse coding for positive
definite matrices,” IEEE Transactions on Pattern Analysis and
Machine Intelligence, vol. 36, no. 3, pp. 592–605, 2014.
[36] A. Cherian and S. Sra, “Riemannian dictionary learning and
sparse coding for positive definite matrices,” IEEE Transac-
tions on Neural Networks and Learning Systems, vol. 28, no. 12,
pp. 2859–2871, 2017.
[37] X. Xie, Z. L. Yu, Z. Gu, and Y. Li, “Classification of symmetric
positive definite matrices based on bilinear isometric Rie-
mannian embedding,” Pattern Recognition, vol. 87, pp. 94–
105, 2019.
[38] M. T. Harandi, C. Sanderson, A. Wiliem, and B. C. Lovell,
“Kernel analysis over Riemannian manifolds for visual rec-
ognition of actions, pedestrians and textures,” in Proceedings
of the 2012 IEEE Workshop on the Applications of Computer
Vision (WACV), Breckenridge, CO, USA, 2012.
[39] S. Jayasumana, R. Hartley, M. Salzmann, H. Li, and
M. Harandi, “Kernel methods on Riemannian manifolds with
Gaussian RBF kernels,” IEEE Transactions on Pattern Analysis
and Machine Intelligence, vol. 37, no. 12, pp. 2464–2477, 2015.
[40] R. Caseiro, J. F. Henriques, P. Martins, and J. Batista, “Semi-
intrinsic mean shift on Riemannian manifolds,” in Proceed-
ings of the ECCV 2012, vol. 7572, Florence, Italy, 2012.
[41] P. Li, Q. Wang, W. Zuo, and L. Zhang, “Log-Euclidean kernels
for sparse representation and dictionary learning,” in Pro-
ceedings of the 2013 IEEE International Conference on Com-
puter Vision, Sydney, Australia, 2013.
[42] M. T. Harandi, R. Hartley, B. Lovell, and C. Sanderson,
“Sparse coding on symmetric positive definite manifolds
using Bregman divergences,” IEEE Transactions on Neural
Networks and Learning Systems, vol. 27, no. 6, pp. 1294–1306,
2016.
[43] R. Zhuang, Z. Ma, W. Feng, and Y. Lin, “SPD data dictionary
learning based on kernel learning and Riemannian metric,”
IEEE Access, vol. 8, pp. 61956–61972, 2020.
[44] S. Hechmi, A. Gallas, and E. Zagrouba, “Multi-kernel sparse
subspace clustering on the Riemannian manifold of sym-
metric positive definite matrices,” Pattern Recognition Letters,
vol. 125, pp. 21–27, 2019.
[45] J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern
Analysis, Cambridge University Press, New York, NY, USA,
2004.
[46] E. Kreyszig, Introductory Functional Analysis with Applica-
tions, Vol. 1, Wiley, New York, NY, USA, 1978.
[47] B. B. Damodaran, N. Courty, and S. Lefèvre, “Sparse Hilbert
Schmidt independence criterion and surrogate-kernel-based
feature selection for hyperspectral image classification,” IEEE
Transactions on Geoscience and Remote Sensing, vol. 55, no. 4,
pp. 2385–2398, 2017.
[48] K. Zhang, V. W. Zheng, Q. Wang, J. T. Kwok, Q. Yang, and
I. Marsic, “Covariate shift in Hilbert space: a solution via
surrogate kernels,” in Proceedings of the 30th International
Conference on Machine Learning, pp. 388–395, Atlanta, GA,
USA, 2013.
[49] M. J. Gangeh, H. Zarkoob, and A. Ghodsi, “Fast and scalable
feature selection for gene expression data using Hilbert-
Schmidt independence criterion,” IEEE/ACM Transactions on
Computational Biology and Bioinformatics, vol. 14, no. 1,
pp. 167–181, 2017.
[50] L. Song, A. Gretton, K. M. Borgwardt, and A. J. Smola,
“Colored maximum variance unfolding,” in Proceedings of the
20th International Conference on Neural Information Pro-
cessing Systems, Vancouver, Canada, 2007.
[51] L. Song, A. Smola, A. Gretton, J. Bedo, and K. Borgwardt,
“Feature selection via dependence maximization,” Journal of
Machine Learning Research, vol. 13, no. 1, pp. 1393–1434,
2007.
[52] J.-P. Aubin, “Hilbert-Schmidt operators and tensor products,”
in Applied Functional Analysis, pp. 283–308, John Wiley &
Sons, Hoboken, NJ, USA, 2nd edition, 2000.
[53] A. Gretton, O. Bousquet, A. Smola, and B. Schölkopf,
“Measuring statistical dependence with Hilbert-Schmidt
norms,” in Proceedings of the 2005 International Conference
on Algorithmic Learning Theory, pp. 63–77, Singapore, 2005.
[54] S. I. Amari and H. Nagaoka, Methods of Information Ge-
ometry, Oxford University Press, New York, NY, USA, 2000.
[55] S. Nene, S. Nayar, and H. Murase, “Columbia object image
library (COIL-20)," Technical report, Columbia University,
New York, NY, USA, 1996.
[56] B. Leibe and B. Schiele, “Analyzing appearance and contour
based methods for object categorization,” in Proceedings of the
2003 IEEE Computer Society Conference on Computer Vision
and Pattern Recognition, vol. 2, Madison, WI, USA, 2003.
[57] D. Tosato, M. Spera, M. Cristani, and V. Murino, “Charac-
terizing humans on Riemannian manifolds,” IEEE Transac-
tions on Pattern Analysis and Machine Intelligence, vol. 35,
no. 8, pp. 1972–1984, 2013.
[58] P. J. Phillips, H. Moon, S. A. Rizvi, and P. J. Rauss, "The
FERET evaluation methodology for face-recognition algo-
rithms,” IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 22, no. 10, pp. 1090–1104, 2000.
[59] T. Randen and J. H. Husoy, “Filtering for texture classifica-
tion: a comparative study,” IEEE Transactions on Pattern
Analysis and Machine Intelligence, vol. 21, no. 4, pp. 291–310,
1999.