Jointly Learning Kernel Representation Tensor and
Affinity Matrix for Multi-View Clustering
Yongyong Chen, Xiaolin Xiao, and Yicong Zhou, Senior Member, IEEE
Abstract—Multi-view clustering refers to the task of partitioning
numerous unlabeled multimedia data into several distinct clusters
using multiple features. In this paper, we propose a novel nonlinear
method called joint learning multi-view clustering (JLMVC) to
jointly learn kernel representation tensor and affinity matrix.
The proposed JLMVC has three advantages: (1) unlike existing
low-rank representation-based multi-view clustering methods that
learn the representation tensor and affinity matrix in two separate
steps, JLMVC jointly learns them both; (2) using the “kernel trick”,
JLMVC can handle nonlinear data structures for various real
applications; and (3) different from most existing methods that
treat representations of all views equally, JLMVC automatically
learns a reasonable weight for each view. Based on the alternating
direction method of multipliers, an effective algorithm is designed
to solve the proposed model. Extensive experiments on eight
multimedia datasets demonstrate the superiority of the proposed
JLMVC over state-of-the-art methods.
Index Terms—Multi-view clustering, low-rank tensor represen-
tation, kernel trick, affinity matrix, adaptive weight.
I. INTRODUCTION
IN MANY real-world applications, multimedia data such as
images, videos, audio, and documents, are usually repre-
sented by different features or collected from various fields
(called multi-view data) [1]–[3]. For example, in multimedia
retrieval [2], images can be represented by color, textures, and
edges. In video surveillance [3], the same scene is monitored by
multiple cameras from different viewpoints. In natural language
processing [4], documents can be translated into multiple languages such as Chinese, English, and French. Considering
that multi-view data are greatly conducive to the performance
improvement, multi-view clustering has attracted great research
Manuscript received June 5, 2019; revised September 28, 2019; accepted
October 29, 2019. Date of publication November 11, 2019; date of current
version July 24, 2020. This work was supported in part by the Science and
Technology Development Fund, Macau SAR (File no. 189/2017/A3), and in part
by the Research Committee at University of Macau under Grants MYRG2016-
00123-FST and MYRG2018-00136-FST. The associate editor coordinating the
review of this manuscript and approving it for publication was Dr. Marco Carli.
(Corresponding author: Yicong Zhou.)
Y. Chen and Y. Zhou are with the Department of Computer and Information
Science, University of Macau, Macau 999078, China (e-mail: YongyongChen.
cn@gmail.com; yicongzhou@um.edu.mo).
X. Xiao is with the School of Computer Science and Engineering, South
China University of Technology, Guangzhou 510006, China, and also with the
Department of Computer and Information Science, University of Macau, Macau
999078, China (e-mail: shellyxiaolin@gmail.com).
Color versions of one or more of the figures in this article are available online
at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TMM.2019.2952984
interests in many fields including multimedia data mining, ma-
chine learning and pattern recognition communities [5]–[8].
Given multi-view features extracted from the original multi-
media data, they are used to partition all unlabeled multimedia
data into several distinct clusters. Numerous clustering approaches have been proposed. Whether single-view or multi-view, they usually follow two main steps: 1)
constructing a symmetric affinity matrix (also called similarity
matrix) to describe the pairwise relations between multimedia
data points and 2) performing the spectral clustering algo-
rithm [9] to obtain clustering results. The core of these methods
is construction of the affinity matrix. This means that the quality
of the learned affinity matrix heavily determines the clustering
performance. In the literature, two common schemes, using the raw multimedia features or computed representations [10], [11], are adopted to construct the affinity matrix, leading to the following
three categories: 1) graph-based methods [12]–[19], 2) sub-
space clustering-based methods [5]–[8], [11], [20]–[24], 3) their
combinations [10], [25], [26]. For example, due to simplicity
and effectiveness, k-Nearest Neighbor using cosine or heat
kernel distances [27] has become an intuitive way to construct
the affinity matrix. Following the idea that local connectivity of
multimedia data can be measured by the Euclidean distance, the
work in [12] constructed the affinity matrix by assigning adap-
tive neighbors to each multimedia data point. In [13], Nie et al.
adopted the l1-norm distance instead of the Euclidean distance
and proposed a graph clustering relaxation. Based on the fact
that the affinity matrix should obey the block diagonal property,
Nie et al. [14] imposed the rank constraint on the Laplacian
matrix for graph-based clustering. To well explore the com-
plementary information of multi-view features, the approaches
in [17] and [18] extended the adaptive neighbor strategy [12]
and the rank constraint [14] from the single-view setting into the
multi-view one, respectively. Following this, Wang et al. [19]
pursued a unified affinity matrix from the affinity matrices of all
views and the rank function was considered to partition multime-
dia data points into optimal number of clusters. However, these
graph-based approaches, e.g., [16], [18], [19], usually construct
the affinity matrix by directly using the raw multimedia features
which are often corrupted by noise and outliers. Thus, they may
obtain an unreliable and inaccurate affinity matrix [10], [26].
As the second category, subspace clustering-based methods
have become the mainstream due to their excellent interpretabil-
ity and performance. The goal of subspace clustering is to
simultaneously find low-dimensional subspaces and partition
multimedia data points into multiple subspaces. Specifically,
sparse subspace clustering (SSC) [21] and low-rank represen-
tation (LRR) [20] are two representative works, resulting in a
local representation matrix and a global one, respectively. Since SSC learns the representation matrix using the l1-norm, it imposes sparsity on all entries of the representation matrix. In contrast, LRR constructs the representation matrix using the low-rank regularizer, which imposes sparsity on the singular values. Beyond
the low-rankness and sparsity, some extra structures underlying
data, such as the local similarity structure and nonnegativity [28],
may not be fully considered. Instead of the fixed dictionary, i.e.,
the original multimedia feature, the work in [29] proposed to
learn a locality-preserving dictionary to capture the intrinsic ge-
ometric structure of the dictionary for LRR. Yin et al. [26] pro-
posed to integrate LRR and the graph construction in a unified
framework to learn an adaptive low-rank graph affinity matrix. A
similar idea was adopted in [10], [25]. A major challenge is that, when handling multi-view features, these methods may suffer a significant performance degradation since they focus on only a single-view feature.
Recently, considerable efforts based on deep neural network
have been expended for clustering. For example, Ji et al. [30]
proposed a deep neural network by introducing a self-expressive
layer into the auto-encoder framework for clustering. To construct a deep structure, the authors in [31] adopted semi-nonnegative matrix factorization for multi-view clustering. In [32], a highly-
economized scalable image clustering method was proposed
to cluster large-scale multi-view images. Besides, to deal with
multi-view clustering with missing features, Chao et al. [33]
presented an enhanced multi-view co-clustering method. For a
comprehensive survey on clustering, please refer to [34] and the
references therein.
A. Related Work
The existing low-rank-based approaches for multi-view clus-
tering can be roughly grouped into two categories: two-dimensional matrix-based low-rank methods [5], [23], [35]–[40] and three-dimensional tensor-based low-rank ones [6]–[8]. For
example, to deal with multiple multimedia features, the work
in [35] proposed to concatenate all heterogeneous features and
then perform LRR [20]. Xia et al. [36] exploited the low-rank
and sparse matrix decomposition to uncover a shared transition
probability matrix under the Markov chain method. Except for
consistency among multi-view features, the work in [38] took lo-
cal view-specific information into consideration for multi-view
clustering. Similarly, Tang et al. [5] proposed a multi-view clus-
tering method by learning a joint affinity graph. In [5], [38], the
consistency measures the common properties among all views
while the specificity captures the inherent difference in each
view. Different from these approaches that use the nuclear norm
to depict the low-rank property of the representation matrices,
Wang et al. [23] proposed to factorize each representation matrix
as the product of symmetric low-rank data-cluster matrices, such
that the singular value decomposition can be ignored. Following
this, Liu et al. [40] proposed to mine a consensus representation
of all views by multi-view non-negative matrix factorization.
Fig. 1. Comparison of existing low-rank tensor representation-based MVC
methods (the red dashed rectangle) and our proposed JLMVC (the blue dashed
rectangle). Existing methods construct the representation matrix (a) and the
affinity matrix (b) in two separate steps without considering their correlation.
JLMVC learns the representation tensor and the affinity matrix (d) in a unified
framework. Additionally, the kernel-induced mapping is adopted to map the
original multimedia data (usually not linearly separable) into a new linear space.
The most representative methods of the second category are
the tensor unfolding-based method (LT-MSC) [6] and t-singular
value decomposition (t-SVD)-based one (t-SVD-MSC) [7]. As
shown in Fig. 1(a), each representation matrix is stored as the
frontal slice of a tensor, resulting in a third-order tensor (called
representation tensor). The main difference between [6] and [7]
is the tensor rank approximation which aims to explore the
high order correlations among multi-views. By organizing all
multi-view features into a third-order tensor, the work in [41]
exploited the sparsity and tensor nuclear norm penalty with
self-expressiveness to construct the representation tensor.
Although these approaches have achieved a great advance for
multi-view clustering, they may suffer from the following chal-
lenges: 1) their performance may sharply degrade in real applica-
tions when the multimedia data come from nonlinear subspaces.
The intuitive reason is that they were originally designed to deal
with the data that lie within multiple linear subspaces [8], [42],
[43]. 2) the correlation between the representation tensor and
affinity matrix may not be fully exploited. They learn the rep-
resentation tensor via different low-rank tensor representations,
and then construct the affinity matrix as shown in Figs. 1(a) and
(b) in two separate steps. This means that the global optimal
affinity matrix cannot be ensured. 3) the importance of each
view in the construction of the affinity matrix is not considered.
For example, methods in [6], [7], [44] simply average all repre-
sentation matrices with the same weight. The approach in [44]
overcomes the first limitation, but fails to address the other two
challenges. To the best of our knowledge, no work has been done to
address these three challenges simultaneously.
B. Our Contributions
To address the above three challenges, we propose a unified model
to jointly learn the kernel representation tensor and affinity
matrix for multi-view clustering (JLMVC). JLMVC learns the
representation tensor and affinity matrix jointly such that their
correlations can be well exploited, handles the nonlinear mul-
timedia data using a kernel-induced mapping, and adopts the
adaptive weight strategy to form a unified affinity matrix. Fig. 1
compares the proposed JLMVC with two state-of-the-art low-
rank tensor representation-based MVC methods LT-MSC [6]
and t-SVD-MSC [7]. As can be observed, under the assump-
tion that the original data lie within multiple linear subspaces,
existing low-rank tensor representation-based MVC methods
learn the representation tensor from the original multimedia data.
However, this assumption may not be ensured in real applica-
tions. To achieve nonlinear multi-view clustering, JLMVC maps
the original multimedia data from the input data space into a
new feature space such that the mapped data points can reside in
multiple linear subspaces, as shown in the middle of Fig. 1(c).
JLMVC then learns the representation tensor and affinity matrix
simultaneously. Finally, the learned unified affinity matrix is fed
to the input of the spectral clustering algorithm [9] to obtain the
clustering results.
The contributions and novelty of this paper are summarized
as follows:
• We propose a joint learning multi-view clustering
(JLMVC) model to jointly learn kernel representation ten-
sor and affinity matrix for multi-view clustering. JLMVC
is able to well explore the correlation between the represen-
tation tensor and affinity matrix, handles the nonlinear data
using a kernel-induced mapping, and adopts the adaptive
weight strategy to form a unified affinity matrix.
• JLMVC uses the tensor nuclear norm to encode the low-rank property of the representation tensor and adaptively
learns different weights for different views’ representation
matrices. This greatly benefits the construction of the uni-
fied affinity matrix.
• An effective algorithm is designed to solve the JLMVC
model via the alternating direction method of multipli-
ers. Extensive experiments on eight popular multimedia
datasets are conducted and validate the superiority of
JLMVC over ten state-of-the-art approaches.
C. Organization of the Paper
The rest of this paper is structured as follows. Section II intro-
duces some notations and preliminaries, especially the t-SVD-
based tensor nuclear norm which is used to depict the low-rank
property of the representation tensor. In Section III, we introduce JLMVC and design an iterative algorithm under the alter-
nating direction method of multipliers framework. We evaluate
the performance of the proposed JLMVC on eight real-world
multi-view datasets in Section IV and conclude the whole paper
in Section V.
TABLE I
BASIC NOTATIONS AND THEIR DESCRIPTIONS
II. NOTATIONS AND PRELIMINARIES
In this section, we aim to introduce some notations used
throughout this paper and the t-SVD-based tensor nuclear norm
(see Definition 2.2) that will be used to depict the low-rank
property of the representation tensor. Some basic notations are
summarized in Table I.
Before the definition of the t-SVD [45], several operators are first introduced. For a tensor $\mathcal{X}\in\mathbb{R}^{n_1\times n_2\times n_3}$, its block circulant matrix $\mathrm{bcirc}(\mathcal{X})$ and block diagonal matrix $\mathrm{bdiag}(\mathcal{X})$ are defined as
\[
\mathrm{bcirc}(\mathcal{X})=\begin{bmatrix}
X^{(1)} & X^{(n_3)} & \cdots & X^{(2)}\\
X^{(2)} & X^{(1)} & \cdots & X^{(3)}\\
\vdots & \vdots & \ddots & \vdots\\
X^{(n_3)} & X^{(n_3-1)} & \cdots & X^{(1)}
\end{bmatrix},\qquad
\mathrm{bdiag}(\mathcal{X})=\begin{bmatrix}
X^{(1)} & & &\\
& X^{(2)} & &\\
& & \ddots &\\
& & & X^{(n_3)}
\end{bmatrix}.
\]
The block vectorization is defined as $\mathrm{bvec}(\mathcal{X})=[X^{(1)};\cdots;X^{(n_3)}]$. The inverse operations of $\mathrm{bvec}$ and $\mathrm{bdiag}$ are defined as $\mathrm{bvfold}(\mathrm{bvec}(\mathcal{X}))=\mathcal{X}$ and $\mathrm{bdfold}(\mathrm{bdiag}(\mathcal{X}))=\mathcal{X}$, respectively. Let $\mathcal{Y}\in\mathbb{R}^{n_2\times n_4\times n_3}$. The t-product $\mathcal{X}*\mathcal{Y}$ is an $n_1\times n_4\times n_3$ tensor, $\mathcal{X}*\mathcal{Y}=\mathrm{bvfold}(\mathrm{bcirc}(\mathcal{X})\,\mathrm{bvec}(\mathcal{Y}))$. The transpose of $\mathcal{X}$ is $\mathcal{X}^{T}\in\mathbb{R}^{n_2\times n_1\times n_3}$, obtained by transposing each frontal slice and then reversing the order of the transposed frontal slices 2 through $n_3$. The identity tensor $\mathcal{I}\in\mathbb{R}^{n_1\times n_1\times n_3}$ is a tensor whose first frontal slice is the $n_1\times n_1$ identity matrix and whose remaining frontal slices are zero. A tensor $\mathcal{X}\in\mathbb{R}^{n_1\times n_1\times n_3}$ is orthogonal if it satisfies $\mathcal{X}^{T}*\mathcal{X}=\mathcal{X}*\mathcal{X}^{T}=\mathcal{I}$.
Definition 2.1 (t-SVD): Given $\mathcal{X}$, its t-SVD is defined as
\[
\mathcal{X}=\mathcal{U}*\mathcal{G}*\mathcal{V}^{T},
\]
where $\mathcal{U}\in\mathbb{R}^{n_1\times n_1\times n_3}$ and $\mathcal{V}\in\mathbb{R}^{n_2\times n_2\times n_3}$ are orthogonal tensors, and $\mathcal{G}\in\mathbb{R}^{n_1\times n_2\times n_3}$ is an f-diagonal tensor, i.e., each of its frontal slices is a diagonal matrix.

Fig. 2 shows the t-SVD of a third-order tensor. The t-SVD-based tensor nuclear norm (t-SVD-TNN) is given as follows.
Fig. 2. The t-SVD of a tensor of size n1×n2×n3.
Definition 2.2 (t-SVD-TNN): The t-SVD-TNN of a tensor $\mathcal{X}\in\mathbb{R}^{n_1\times n_2\times n_3}$, denoted as $\|\mathcal{X}\|_{\circledast}$, is defined as the sum of the singular values of all the frontal slices of $\hat{\mathcal{X}}$, i.e.,
\[
\|\mathcal{X}\|_{\circledast}=\sum_{i=1}^{\min\{n_1,n_2\}}\sum_{k=1}^{n_3}\big|\hat{\mathcal{G}}(i,i,k)\big|, \quad (1)
\]
where $\hat{\mathcal{X}}$ denotes the tensor obtained by applying the discrete Fourier transform along the third dimension of $\mathcal{X}$, and $\hat{\mathcal{G}}$ is the Fourier-domain counterpart of the f-diagonal tensor $\mathcal{G}$ from the t-SVD $\mathcal{X}=\mathcal{U}*\mathcal{G}*\mathcal{V}^{T}$, so that $\hat{\mathcal{G}}(i,i,k)$ is the $i$-th singular value of the $k$-th frontal slice of $\hat{\mathcal{X}}$.
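As a concrete, non-authoritative illustration of Eq. (1), the following Python/NumPy sketch evaluates the t-SVD-TNN by taking the FFT along the third mode and summing the singular values of every Fourier-domain frontal slice; the function name tsvd_tnn and the NumPy setting are illustrative choices and not part of the paper (whose experiments are implemented in MATLAB).

import numpy as np

def tsvd_tnn(X):
    """Sum of the singular values of all frontal slices of the Fourier-transformed
    tensor, i.e., the t-SVD-based tensor nuclear norm of Eq. (1)."""
    n1, n2, n3 = X.shape
    X_hat = np.fft.fft(X, axis=2)           # DFT along the third mode
    total = 0.0
    for k in range(n3):
        s = np.linalg.svd(X_hat[:, :, k], compute_uv=False)
        total += s.sum()                     # singular values are real and non-negative
    return total

# toy usage
Z = np.random.randn(5, 5, 3)
print(tsvd_tnn(Z))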
III. JOINT LEARNING MULTI-VIEW CLUSTERING
In this section, we first elaborate the proposed JLMVC model
in Section III-A, and then solve this model by the alternating
direction method of multipliers (ADMM) in Section III-B. Con-
sidering that, in real world applications, the multimedia data
may be drawn from multiple nonlinear subspaces, JLMVC first
uses the kernel trick to solve the nonlinearity. Based on the
self-expression property [20], [21], JLMVC carries out joint
learning of the representation tensor and unified affinity matrix.
A. Problem Formulation
The existing multi-view clustering method t-SVD-MSC [7] learns the representation tensor $\mathcal{Z}$ by
\[
\min_{\mathcal{Z},\,\mathcal{E}}\ \|\mathcal{Z}\|_{\circledast}+\alpha\sum_{v=1}^{V}\|E^{(v)}\|_{2,1}
\quad \text{s.t.}\ X^{(v)}=X^{(v)}Z^{(v)}+E^{(v)},\ v=1,\ldots,V,\quad
\mathcal{Z}=\Phi(Z^{(1)},Z^{(2)},\ldots,Z^{(V)}), \quad (2)
\]
where $X^{(v)}\in\mathbb{R}^{d_v\times n}$ denotes the $v$-th view feature; $\alpha>0$ is the regularization parameter; $E^{(v)}$ denotes noise and outliers; and $\Phi(\cdot)$ is an operator that stacks all representation matrices $\{Z^{(v)}\}$ into a third-order tensor $\mathcal{Z}$, as shown in Fig. 1(a).
Once $\mathcal{Z}$ is yielded by Eq. (2), the affinity matrix $S$ is constructed by averaging all frontal slices of $\mathcal{Z}$. This means that, in the construction of $S$, the correlation between $S$ and $\mathcal{Z}$ is fixed. This scheme, however, may not ensure the optimal affinity matrix since different view features characterize specific and partly independent information of the dataset. Therefore, to address this issue, different weights should be assigned to different views. We then arrive at the following model:
\[
\min_{\mathcal{Z},S,\omega}\ \|\mathcal{Z}\|_{\circledast}+\sum_{v=1}^{V}\Big(\alpha\|X^{(v)}-X^{(v)}Z^{(v)}\|_{2,1}+\lambda\omega^{(v)}\|Z^{(v)}-S\|_F^2\Big)+\eta\|\omega\|_2^2
\]
\[
\text{s.t.}\ \mathcal{Z}=\Phi(Z^{(1)},Z^{(2)},\ldots,Z^{(V)}),\ \omega\ge 0,\ \textstyle\sum_v\omega^{(v)}=1, \quad (3)
\]
where α, λ, and η are three positive parameters that balance the contributions of all terms in the objective function; ω^(v) is the relative weight of the v-th view; and the last term smooths the weight distribution and avoids the trivial solution [46]. However, in model (3), the self-expression property is encoded in the original input data space (i.e., the second term), which usually exhibits nonlinear structure in real-world datasets. Here, we seek new feature spaces in which the multi-view data become linearly separable. Borrowing the idea of kernel methods [42], [43], for the v-th feature, let $\varphi^{(v)}:\mathbb{R}^{d_v}\rightarrow\mathcal{H}^{(v)}$ be a kernel mapping from the original data space to the kernel space. As stated in the following Eq. (6), $\varphi^{(v)}$ does not need to be defined explicitly. Let $K^{(v)}\in\mathbb{R}^{n\times n}$ be a positive semidefinite kernel Gram matrix, i.e.,
\[
K^{(v)}=\varphi^{(v)}(X^{(v)})^{T}\varphi^{(v)}(X^{(v)}). \quad (4)
\]
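Since the paper leaves the specific kernel unspecified at this point, the sketch below instantiates Eq. (4) with a Gaussian (RBF) kernel purely for illustration; the column-sample layout X ∈ R^{d_v×n} and the median-heuristic bandwidth are assumptions of this sketch, not choices taken from the paper.

import numpy as np

def rbf_gram(X, sigma=None):
    """Kernel Gram matrix K = phi(X)^T phi(X) for column-sample data X (d x n),
    using a Gaussian kernel as one possible choice of the implicit mapping phi."""
    sq_norms = np.sum(X**2, axis=0)                        # (n,)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X.T @ X
    d2 = np.maximum(d2, 0.0)                               # guard against tiny negatives
    if sigma is None:
        sigma = np.sqrt(np.median(d2[d2 > 0]))             # median heuristic (assumption)
    return np.exp(-d2 / (2.0 * sigma**2))

X_v = np.random.randn(40, 100)    # one view: 40-d features, 100 samples
K_v = rbf_gram(X_v)               # 100 x 100 positive semidefinite Gram matrix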
Then, we encode the self-expression property in the new feature space. This is also the reason why the proposed JLMVC can handle the nonlinearity problem. Based on the above analysis, model (3) can be reformulated as
\[
\min_{\mathcal{Z},S,\omega}\ \|\mathcal{Z}\|_{\circledast}+\sum_{v=1}^{V}\Big(\alpha\|\varphi(X^{(v)})-\varphi(X^{(v)})Z^{(v)}\|_{2,1}+\lambda\omega^{(v)}\|Z^{(v)}-S\|_F^2\Big)+\eta\|\omega\|_2^2
\]
\[
\text{s.t.}\ \mathcal{Z}=\Phi(Z^{(1)},Z^{(2)},\ldots,Z^{(V)}),\ \omega\ge 0,\ \textstyle\sum_v\omega^{(v)}=1. \quad (5)
\]
Note that the second term of Eq. (5) can be rewritten as
\[
\|\varphi(X^{(v)})-\varphi(X^{(v)})Z^{(v)}\|_{2,1}=\sum_{i=1}^{n}\Big(P_i^{(v)T}K^{(v)}P_i^{(v)}\Big)^{\frac{1}{2}}, \quad (6)
\]
where $P^{(v)}=I-Z^{(v)}$ and $P_i^{(v)}$ is the $i$-th column of $P^{(v)}$. From Eq. (6), it is easy to see that the kernel mapping $\varphi^{(v)}$ appears only in the form of the inner product, i.e., $\varphi^{(v)}(X^{(v)})^{T}\varphi^{(v)}(X^{(v)})$, leading to the kernel Gram matrix $K^{(v)}$. Therefore, $\varphi^{(v)}$ is defined only implicitly. For simplicity, we denote $g^{(v)}(P^{(v)})=\sum_{i=1}^{n}\big(P_i^{(v)T}K^{(v)}P_i^{(v)}\big)^{\frac{1}{2}}$ as the reconstruction error in the kernel space. Finally, the proposed JLMVC model can be formulated as
\[
\min_{\mathcal{Z},\mathcal{P},S,\omega}\ \|\mathcal{Z}\|_{\circledast}+\sum_{v=1}^{V}\Big(\alpha\,g^{(v)}(P^{(v)})+\lambda\omega^{(v)}\|Z^{(v)}-S\|_F^2\Big)+\eta\|\omega\|_2^2
\]
\[
\text{s.t.}\ \mathcal{Z}=\Phi(Z^{(1)},Z^{(2)},\ldots,Z^{(V)}),\ \mathcal{P}=\Phi(P^{(1)},P^{(2)},\ldots,P^{(V)}),\ \mathcal{P}=\mathcal{I}-\mathcal{Z},\ \omega\ge 0,\ \textstyle\sum_v\omega^{(v)}=1, \quad (7)
\]
where the first term, i.e., $\|\mathcal{Z}\|_{\circledast}$ defined in Eq. (1), is used to explore the low-rankness of $\mathcal{Z}$; the second term handles the nonlinear structures; and the third term, with the adaptive weight strategy, aims to learn a unified affinity matrix $S$.
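To make the kernel reconstruction error g^(v)(P^(v)) of Eq. (6) concrete, a minimal sketch of its evaluation is given below; kernel_recon_error is a hypothetical helper name, K is assumed to be the n × n Gram matrix of Eq. (4), and P = I − Z^(v).

import numpy as np

def kernel_recon_error(K, P):
    """g(P) = sum_i sqrt(P_i^T K P_i): column-wise reconstruction error
    measured in the kernel-induced feature space (Eq. (6))."""
    quad = np.einsum('in,ij,jn->n', P, K, P)    # P_i^T K P_i for every column i
    return np.sum(np.sqrt(np.maximum(quad, 0.0)))

n = 50
K = np.eye(n)                      # placeholder Gram matrix
Z = np.random.rand(n, n) * 0.01
print(kernel_recon_error(K, np.eye(n) - Z))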
B. Optimization
It is intractable to solve the proposed model in Eq. (7) directly, since it is not jointly convex and its terms are coupled with respect to the variable $\mathcal{Z}$. Therefore, we solve Eq. (7) under the ADMM framework. We can reformulate Eq. (7) as:
\[
\min_{\mathcal{Z},\mathcal{Y},\mathcal{P},S,\omega}\ \|\mathcal{Y}\|_{\circledast}+\sum_{v=1}^{V}\Big(\alpha\,g^{(v)}(P^{(v)})+\lambda\omega^{(v)}\|Z^{(v)}-S\|_F^2\Big)+\eta\|\omega\|_2^2
\]
\[
\text{s.t.}\ \mathcal{Z}=\Phi(Z^{(1)},Z^{(2)},\ldots,Z^{(V)}),\ \mathcal{P}=\Phi(P^{(1)},P^{(2)},\ldots,P^{(V)}),\ \mathcal{P}=\mathcal{I}-\mathcal{Z},\ \omega\ge 0,\ \textstyle\sum_v\omega^{(v)}=1,\ \mathcal{Z}=\mathcal{Y}. \quad (8)
\]
Following the idea of ADMM, we introduce one auxiliary variable $\mathcal{Y}$ to decouple $\mathcal{Z}$ in the objective function and then iteratively update each variable while fixing the others [47]. The augmented Lagrangian function is defined as the sum of the objective function of Eq. (8) and the penalty terms measured by the Frobenius norm. The augmented Lagrangian function of model (8) is given by:
\[
\mathcal{L}_{\rho}(\mathcal{Z},\mathcal{Y},\{P^{(v)}\},S,\omega)=\|\mathcal{Y}\|_{\circledast}+\sum_{v=1}^{V}\Big(\alpha\,g^{(v)}(P^{(v)})+\lambda\omega^{(v)}\|Z^{(v)}-S\|_F^2\Big)+\eta\|\omega\|_2^2
\]
\[
+\langle\Theta,\mathcal{I}-\mathcal{Z}-\mathcal{P}\rangle+\frac{\rho}{2}\|\mathcal{I}-\mathcal{Z}-\mathcal{P}\|_F^2+\langle\Pi,\mathcal{Z}-\mathcal{Y}\rangle+\frac{\rho}{2}\|\mathcal{Z}-\mathcal{Y}\|_F^2, \quad (9)
\]
where $\Theta$ and $\Pi$ are the Lagrange multipliers of size $n\times n\times V$; $\rho$ is the non-negative penalty parameter; and $\langle\cdot,\cdot\rangle$ denotes the inner product. Under the ADMM framework, we can solve Eq. (9) by optimizing one variable while keeping the other variables fixed as follows:
Step 1 Update $\mathcal{Z}$: Fixing the other variables, we can update $\mathcal{Z}$ by solving the following subproblem:
\[
\min_{\mathcal{Z}}\ \sum_{v=1}^{V}\lambda\omega_k^{(v)}\|Z^{(v)}-S_k\|_F^2+\frac{\rho_k}{2}\Big\|\mathcal{I}-\mathcal{Z}-\mathcal{P}_k+\frac{\Theta_k}{\rho_k}\Big\|_F^2+\frac{\rho_k}{2}\Big\|\mathcal{Z}-\mathcal{Y}_k+\frac{\Pi_k}{\rho_k}\Big\|_F^2. \quad (10)
\]
It is easy to see that updating each frontal slice $Z^{(v)}$ of $\mathcal{Z}$ is independent of the others. This means that the $Z^{(v)}$ can be updated in parallel. The $v$-th subproblem is
\[
\min_{Z^{(v)}}\ \lambda\omega_k^{(v)}\|Z^{(v)}-S_k\|_F^2+\frac{\rho_k}{2}\|Z^{(v)}-A_k^{(v)}\|_F^2+\frac{\rho_k}{2}\|Z^{(v)}-B_k^{(v)}\|_F^2, \quad (11)
\]
where $A_k^{(v)}=I-P_k^{(v)}+\frac{\Theta_k^{(v)}}{\rho_k}$ and $B_k^{(v)}=Y_k^{(v)}-\frac{\Pi_k^{(v)}}{\rho_k}$. By setting the derivative of Eq. (11) with respect to $Z^{(v)}$ to zero, the optimal solution $Z_{k+1}^{(v)}$ is
\[
Z_{k+1}^{(v)}=\frac{2\lambda\omega_k^{(v)}S_k+\rho_k A_k^{(v)}+\rho_k B_k^{(v)}}{2\lambda\omega_k^{(v)}+2\rho_k}. \quad (12)
\]

Fig. 3. Explanation of the rotation operation.
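A minimal sketch of the per-view closed-form update in Eq. (12) is given below; the variable names mirror the symbols A_k^(v) and B_k^(v) of Eq. (11) and are otherwise illustrative.

import numpy as np

def update_Z_view(S, P, Y, Theta, Pi, lam, omega_v, rho):
    """Closed-form solution of Eq. (12) for one frontal slice Z^(v),
    given S_k, P_k^(v), Y_k^(v), Theta_k^(v), Pi_k^(v)."""
    n = S.shape[0]
    A = np.eye(n) - P + Theta / rho         # A_k^(v) from Eq. (11)
    B = Y - Pi / rho                        # B_k^(v) from Eq. (11)
    return (2 * lam * omega_v * S + rho * A + rho * B) / (2 * lam * omega_v + 2 * rho)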
Step 2 Update $\mathcal{Y}$: When the other variables are fixed, $\mathcal{Y}$ can be updated by
\[
\min_{\mathcal{Y}}\ \|\mathcal{Y}\|_{\circledast}+\frac{\rho_k}{2}\|\mathcal{Y}-\mathcal{F}_k\|_F^2, \quad (13)
\]
where $\mathcal{F}_k=\mathcal{Z}_{k+1}+\frac{\Pi_k}{\rho_k}$. Following [7], we rotate $\mathcal{Y}$ from size $n\times n\times V$ to $n\times V\times n$, as shown in Fig. 3. The first reason is that, as in Eq. (1), t-SVD-TNN performs the SVD on each frontal slice of $\hat{\mathcal{Y}}$ to capture the “spatial-shifting” correlation [45], [48]; without the rotation, t-SVD-TNN would preserve only the intra-view low-rank property, whereas we hope to capture the inter-view low-rank property. The second reason is that the rotation operation can significantly reduce the computation cost [7]. After the rotation, each frontal slice of $\hat{\mathcal{Y}}$ represents a view-specific self-representation matrix.

The closed-form solution of Eq. (13) can be obtained by the tensor tubal-shrinkage operator [7], [49]:
\[
\mathcal{Y}_{k+1}=\mathcal{C}_{\frac{V}{\rho_k}}(\mathcal{F}_k)=\mathcal{U}*\mathcal{C}_{\frac{V}{\rho_k}}(\mathcal{G})*\mathcal{V}^{T}, \quad (14)
\]
where $\mathcal{F}_k=\mathcal{U}*\mathcal{G}*\mathcal{V}^{T}$ and $\mathcal{C}_{\frac{V}{\rho_k}}(\mathcal{G})=\mathcal{G}*\mathcal{J}$, in which $\mathcal{J}$ is an f-diagonal tensor whose diagonal elements in the Fourier domain are $\hat{\mathcal{J}}(i,i,k)=\max\{1-\frac{V/\rho_k}{\hat{\mathcal{G}}(i,i,k)},0\}$.
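The tubal-shrinkage operator of Eq. (14) can be sketched entirely in the Fourier domain: transform along the third mode, soft-threshold the singular values of each frontal slice by the amount V/ρ_k that Eq. (14) prescribes, and transform back. This is only an assumed implementation route, not the authors' code.

import numpy as np

def tubal_shrinkage(F, tau):
    """Tensor tubal-shrinkage of Eq. (14): soft-threshold the singular values of
    every frontal slice of fft(F, axis=2) by tau, then invert the FFT."""
    n1, n2, n3 = F.shape
    F_hat = np.fft.fft(F, axis=2)
    Y_hat = np.zeros_like(F_hat)
    for k in range(n3):
        U, s, Vh = np.linalg.svd(F_hat[:, :, k], full_matrices=False)
        s_shrunk = np.maximum(s - tau, 0.0)            # same effect as Eq. (14)'s J(i,i,k)
        Y_hat[:, :, k] = (U * s_shrunk) @ Vh
    return np.real(np.fft.ifft(Y_hat, axis=2))         # result is real for real input

# usage inside Step 2 (tau = V / rho_k, as in Eq. (14)):
# Y_next = tubal_shrinkage(Z_next + Pi / rho, tau=V / rho)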
Step 3 Update $\mathcal{P}$: With the other variables fixed, we minimize the augmented Lagrangian function in Eq. (9) with respect to $\mathcal{P}$:
\[
\min_{\mathcal{P}}\ \sum_{v=1}^{V}\alpha\,g^{(v)}(P^{(v)})+\frac{\rho_k}{2}\Big\|\mathcal{I}-\mathcal{Z}_{k+1}-\mathcal{P}+\frac{\Theta_k}{\rho_k}\Big\|_F^2. \quad (15)
\]
Similar to Eq. (10), updating each $P^{(v)}$ is also independent:
\[
\min_{P^{(v)}}\ \alpha\,g^{(v)}(P^{(v)})+\frac{\rho_k}{2}\|P^{(v)}-D_k^{(v)}\|_F^2, \quad (16)
\]
where $D_k^{(v)}=I-Z_{k+1}^{(v)}+\frac{\Theta_k^{(v)}}{\rho_k}$. Compared with the method in [42], which uses the $\ell_2$-norm to measure the reconstruction error, Eq. (16) is more difficult to solve since $g^{(v)}$ is convex but non-smooth. According to [43], the $i$-th column $p_i^{(v)}$ of the optimal solution of Eq. (16) is
\[
p_i^{(v)}=
\begin{cases}
\hat{p}^{(v)}, & \text{if } \tau\big\|\big[\tfrac{1}{\sigma_1^{(v)}},\ldots,\tfrac{1}{\sigma_r^{(v)}}\big]\odot t_u^{(v)}\big\|_2>1;\\[4pt]
c_i^{(v)}-V_K^{(v)}t_u^{(v)}, & \text{otherwise},
\end{cases} \quad (17)
\]
where $\tau=\frac{\rho_k}{\alpha}$; $\odot$ is the element-wise multiplication operator; $c_i^{(v)}$ denotes the $i$-th column of $D_k^{(v)}$; $K^{(v)}=V^{(v)}\Sigma^{(v)2}V^{(v)T}$ is the singular value decomposition of $K^{(v)}$ with $\Sigma^{(v)}=\mathrm{diag}(\sigma_1^{(v)},\ldots,\sigma_r^{(v)},0,\ldots,0)$ and $r$ the rank of $K^{(v)}$; $V_K^{(v)}$ is formed by the first $r$ columns of $V^{(v)}$; $t_u^{(v)}=V_K^{(v)T}c_i^{(v)}$; and $\hat{p}^{(v)}$ is defined as
\[
\hat{p}^{(v)}=c_i^{(v)}-V_K^{(v)}\Big(\Big[\tfrac{\sigma_1^{(v)2}}{\gamma^{(v)}+\sigma_1^{(v)2}},\ldots,\tfrac{\sigma_r^{(v)2}}{\gamma^{(v)}+\sigma_r^{(v)2}}\Big]^{T}\odot t_u^{(v)}\Big), \quad (18)
\]
where $\gamma^{(v)}>0$ is a scalar satisfying
\[
t_u^{(v)T}\,\mathrm{diag}\Big(\tfrac{\sigma_i^{(v)2}}{(\gamma^{(v)}+\sigma_i^{(v)2})^2},\ 1\le i\le r\Big)\,t_u^{(v)}=\frac{1}{\tau^2}. \quad (19)
\]
A unique root $\gamma^{(v)}$ can be obtained when $\tau\|[\tfrac{1}{\sigma_1^{(v)}},\ldots,\tfrac{1}{\sigma_r^{(v)}}]\odot t_u^{(v)}\|_2>1$.
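Because Eqs. (17)–(19) were reconstructed here from a garbled scan, the following per-column sketch should be read with that caveat. It solves min_p α(p^T K p)^{1/2} + (ρ/2)||p − d||^2 by testing the condition of Eq. (17) and, in the non-trivial branch, finding the root γ of Eq. (19) numerically with SciPy's brentq; the paper instead follows the procedure of [43], so this is only an assumed equivalent route.

import numpy as np
from scipy.optimize import brentq

def update_P_column(d, V_K, sig2, rho, alpha):
    """One column of Eq. (17): argmin_p alpha*sqrt(p^T K p) + rho/2*||p - d||^2,
    given the rank-r eigendecomposition K = V_K diag(sig2) V_K^T (sig2 > 0)."""
    tau = rho / alpha
    t_u = V_K.T @ d                                    # t_u = V_K^T c_i
    if tau * np.linalg.norm(t_u / np.sqrt(sig2)) <= 1.0:
        return d - V_K @ t_u                           # second branch of Eq. (17)
    # otherwise solve Eq. (19) for gamma > 0 and plug into Eq. (18)
    f = lambda g: np.sum(sig2 * t_u**2 / (g + sig2)**2) - 1.0 / tau**2
    hi = tau * np.sqrt(np.sum(sig2 * t_u**2)) + 1.0    # f(hi) < 0 by construction
    gamma = brentq(f, 0.0, hi)
    return d - V_K @ ((sig2 / (gamma + sig2)) * t_u)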
Step 4 Update $S$: Keeping the other variables fixed, we obtain the following optimization problem:
\[
S_{k+1}=\arg\min_{S}\ \sum_{v=1}^{V}\omega_k^{(v)}\|Z_{k+1}^{(v)}-S\|_F^2=\sum_{v=1}^{V}\omega_k^{(v)}Z_{k+1}^{(v)}. \quad (20)
\]
The last equality follows from the fact that $\sum_v\omega_k^{(v)}=1$.
Step 5 Update $\omega$: To obtain the adaptive weights $\omega_{k+1}$, we minimize the augmented Lagrangian function in Eq. (9) with respect to $\omega$:
\[
\omega_{k+1}=\arg\min_{\omega}\ \sum_{v=1}^{V}\omega^{(v)}\|Z_{k+1}^{(v)}-S_{k+1}\|_F^2+\eta\|\omega\|_2^2,\quad \text{s.t.}\ \omega\ge 0,\ \textstyle\sum_v\omega^{(v)}=1. \quad (21)
\]
Actually, Eq. (21) is a quadratic programming problem:
\[
\omega_{k+1}=\arg\min_{\omega}\ \Big\|\omega+\frac{g_k}{2\eta}\Big\|_2^2,\quad \text{s.t.}\ \omega\ge 0,\ \textstyle\sum_v\omega^{(v)}=1, \quad (22)
\]
where $g_k^{(v)}=\|Z_{k+1}^{(v)}-S_{k+1}\|_F^2$ forms the vector $g_k$. We adopt an off-the-shelf quadratic programming solver to solve this problem.
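The paper uses a generic QP solver for Eq. (22). As an equivalent alternative, shown here only as a hedged sketch, Eq. (22) is the Euclidean projection of −g_k/(2η) onto the probability simplex, which admits a standard sorting-based solution:

import numpy as np

def project_simplex(y):
    """Euclidean projection of y onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(y)[::-1]
    css = np.cumsum(u)
    ks = np.arange(1, len(y) + 1)
    rho = np.max(ks[u - (css - 1.0) / ks > 0])
    theta = (css[rho - 1] - 1.0) / rho
    return np.maximum(y - theta, 0.0)

def update_weights(Z_list, S, eta):
    """Adaptive weights of Eq. (22): omega = proj_simplex(-g / (2*eta))."""
    g = np.array([np.linalg.norm(Z - S, 'fro')**2 for Z in Z_list])
    return project_simplex(-g / (2.0 * eta))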
Step 6 Update $\Theta$, $\Pi$, and $\rho$: The Lagrange multipliers $\Theta$, $\Pi$ and the penalty parameter $\rho$ are updated by
\[
\begin{aligned}
\Theta_{k+1}&=\Theta_k+\rho_k(\mathcal{I}-\mathcal{Z}_{k+1}-\mathcal{P}_{k+1});\\
\Pi_{k+1}&=\Pi_k+\rho_k(\mathcal{Z}_{k+1}-\mathcal{Y}_{k+1});\\
\rho_{k+1}&=\min\{\beta\rho_k,\ \rho_{\max}\},
\end{aligned} \quad (23)
\]
where $\beta\in[0,\frac{\sqrt{5}+1}{2}]$ is a step length used to update the penalty parameter $\rho$ in each iteration [50], and $\rho_{\max}$ is the maximum value of the penalty parameter $\rho$.
Algorithm 1: JLMVC for multi-view clustering.
Input: multi-view features {X^(v)}; parameters α, λ.
Initialize: Y_1, Z_1, S_1, Θ_1, Π_1 initialized to 0; weights ω_1^(v) = 1/V; η = 500, ρ_1 = 10^{-3}, β = 1.5, tol = 10^{-7}, k = 1.
1: Calculate the v-th kernel matrix K^(v) by Eq. (4) (v = 1, ..., V);
2: while not converged do
3:   for v = 1 to V do
4:     Update Z_{k+1}^(v) by Eq. (12);
5:     Update P_{k+1}^(v) by Eq. (17);
6:   end for
7:   Update Y_{k+1} by Eq. (14);
8:   Update S_{k+1} by Eq. (20);
9:   Update ω_{k+1} by Eq. (22);
10:  Update Θ_{k+1}, Π_{k+1}, and ρ_{k+1} by Eq. (23);
11:  Check the convergence condition in Eq. (24);
12: end while
Output: Affinity matrix S_{k+1}.
The details of the proposed algorithm for solving the JLMVC model are summarized in Algorithm 1. Algorithm 1 is terminated when the following convergence condition is satisfied:
\[
\max\Big\{\max_{v=1,\ldots,V}\big\|I-Z_{k+1}^{(v)}-P_{k+1}^{(v)}\big\|_{\infty},\ \big\|\mathcal{Z}_{k+1}-\mathcal{Y}_{k+1}\big\|_{\infty}\Big\}\le tol, \quad (24)
\]
where $tol>0$ is a pre-defined tolerance.
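A minimal sketch of the stopping test in Eq. (24) is given below; the entry-wise (infinity-norm) residual is my reading of the criterion, since the scan does not show the norm explicitly.

import numpy as np

def converged(Z_list, P_list, Y, tol=1e-7):
    """Check Eq. (24): largest entry-wise residual of the two ADMM constraints."""
    n = Z_list[0].shape[0]
    r1 = max(np.max(np.abs(np.eye(n) - Z - P)) for Z, P in zip(Z_list, P_list))
    r2 = np.max(np.abs(np.stack(Z_list, axis=2) - Y))    # residual of Z = Y
    return max(r1, r2) <= tol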
Several notes regarding Algorithm 1 are given below to further explain the proposed JLMVC.
• The weights of different views are important to the construction of the affinity matrix. An intuitive way to initialize them is to set each weight to $\omega_1^{(v)}=\frac{1}{V}$. The weights are then updated in an adaptive manner by Eq. (22). The other variables $\mathcal{Y}_1$, $\mathcal{Z}_1$, $S_1$, $\Theta_1$, $\Pi_1$ are initialized to 0.
• Lines 3–6 of Algorithm 1 can be performed in parallel, as subproblems (11) and (16) are independent with respect to $Z^{(v)}$ and $P^{(v)}$, respectively.
• After performing Algorithm 1, we obtain the unified affinity matrix $S$, which well inherits the advantages of the representation tensor $\mathcal{Z}$. Finally, the learned affinity matrix $S$ serves as the input of the spectral clustering algorithm [9] to yield the clustering results, as illustrated by the sketch below.
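As a hedged illustration of this last step (not the authors' implementation), the learned affinity matrix can be handed to an off-the-shelf spectral clustering routine, e.g., scikit-learn's precomputed-affinity mode; the symmetrization step is an assumption of this sketch.

import numpy as np
from sklearn.cluster import SpectralClustering

def cluster_from_affinity(S, n_clusters, seed=0):
    """Run spectral clustering [9] on a learned affinity matrix S."""
    W = 0.5 * (np.abs(S) + np.abs(S).T)       # symmetrize, keep it non-negative
    model = SpectralClustering(n_clusters=n_clusters,
                               affinity='precomputed',
                               random_state=seed)
    return model.fit_predict(W)

# labels = cluster_from_affinity(S_learned, n_clusters=15)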
IV. EXPERIMENTAL RESULTS
In this section, we aim to evaluate the performance of JLMVC
on eight multimedia datasets. The model analysis is also re-
ported.
TABLE II
SUMMARY OF EIGHT MULTI-VIEW DATASETS
A. Experimental Settings
Our experiments select eight multimedia datasets for multi-
view clustering, including four face image datasets, two scene
datasets, one prokaryotic dataset, and one article dataset. The brief
description of these datasets is summarized in Table II. The de-
tails of each dataset are listed as follows:
Dataset descriptions: Yale:¹ it consists of 165 gray-scale images of 15 individuals with different facial expressions and configurations. Following [6], [7], 4096-d (d denotes the feature dimension) Intensity, 3304-d LBP, and 6750-d Gabor features are extracted as three views. Extended YaleB:² it contains 2414 face images of 38 individuals, each of which has 64 near-frontal images under different lighting conditions. Similar to [6], [7], the first 10 classes are selected and three types of features, including 2500-d Intensity, 3304-d LBP, and 6750-d Gabor, are extracted. ORL:³ it includes 400 face images from 40 clusters, taken at different times and with varying lighting, facial expressions, and facial details. Prokaryotic phyla: it contains 551 prokaryotic species described by textual data and different genomic representations. Wikipedia:⁴ it is an article dataset selected by Wikipedia editors since 2009. Following [46], 693 documents with 2 views are selected. COIL-20:⁵ it contains 1440 images of 20 object categories. Three view features, including 1024-d Intensity, 3304-d LBP, and 6750-d Gabor, are employed. CMU-PIE:⁶ it consists of 5440 facial images of 68 subjects. Each image is of size 64 × 64 with large variance. Following [51], three types of features, including 1024-d Intensity, 256-d LBP, and 496-d HOG, are used. Scene-15 [52]: it contains 4485 outdoor and indoor scene images from 15 categories. Following [7], three kinds of image features, including 1800-d PHOW, 1180-d PRI-CoLBP, and 1240-d CENTRIST, are extracted to represent Scene-15.
Baselines: Our proposed JLMVC is compared with twelve
state-of-the-art single-view and multi-view clustering methods.
The competing methods are listed as follows: SSCbest [21]:
single-view clustering using the sparse regularizer (l1-norm) to
construct the representation matrix; LRRbest [20]: single-view
clustering using the nuclear norm to construct the representation
matrix; MLAP [35]: multi-view clustering by concatenating
representation matrices of different views and imposing low-
rank constraint to explore the complementarity; DiMSC [53]:
¹ http://cvc.yale.edu/projects/yalefaces/yalefaces.html
² http://vision.ucsd.edu/~leekc/ExtYaleDatabase/ExtYaleB.html
³ http://www.uk.research.att.com/facedatabase.html
⁴ http://lig-membres.imag.fr/grimal/data.html
⁵ http://www.cs.columbia.edu/CAVE/software/softlib/
⁶ http://vasc.ri.cmu.edu/idb/html/face/
multi-view clustering with the Hilbert-Schmidt Independence
criterion; LT-MSC [6]: multi-view clustering with the low-rank
tensor constraint; MLAN [16]: multi-view clustering with adap-
tive neighbors; ECMSC [24]: multi-view clustering by simulta-
neously exploiting the representation exclusivity and indicator
consistency; t-SVD-MSC [7]: multi-view clustering via tensor
multi-rank minimization; HLR-M2VS [8]: multi-view clus-
tering via hyper-Laplacian regularized multilinear multiview
self-representations; Kt-SVD-MSC [44]: multi-view clustering
via robust kernelized multi-view self-representations; DMF-
MVC [31]: multi-view clustering via deep matrix factorization;
AWP [54]: multi-view clustering via adaptively weighted Procrustes.
Specifically, SSCbest and LRRbest are two representative
baselines for single-view clustering. Others are the multi-view
clustering baselines. LT-MSC, t-SVD-MSC, HLR-M2VS, and
Kt-SVD-MSC are low-rank tensor representation-based multi-
view clustering approaches. Kt-SVD-MSC is the kernelized ver-
sion of t-SVD-MSC. MLAN is a graph-based multi-view clustering method. The source codes of all competing methods are down-
loaded from the authors’ homepages. For single-view clustering
methods, we perform SSC and LRR on each feature matrix inde-
pendently and report the best clustering results. For multi-view
clustering ones, LT-MSC, t-SVD-MSC, HLR-M2VS, and Kt-
SVD-MSC are first performed to learn the representation tensor
Z, and then conduct the affinity matrix Sby averaging each
frontal slice of Z, that is, S=1
Vv|Z(v)|+|Z(v)T|.This
means that they are performed in two separate steps to obtain the
affinity matrix. After that, the spectral clustering algorithm [9]
is carried out to obtain the final clustering results. For fair com-
parison, our experiments follow the same parameter settings of
the original papers. For SSC and LRR, we select the regulariza-
tion parameter from the interval [0.01,10]; for MLAP, two free
parameters are searched from 0.001 to 1; for DiMSC, two free
parameters are chosen from [0.01,0.03] and [20 : 20 : 180],re-
spectively; the trade-off parameter of LT-MSC is selected from
0.01 to 100; for MLAN, one parameter is set to a random number
between 1 and 30; three free parameters of ECMSC are set in
[0.1,1], [0.1,1], and 1.2, respectively; the trade-off parameters of
t-SVD-MSC and Kt-SVD-MSC are set within the range [0.1,2]
and [0.001,0.6], respectively; for HLR-M2VS, two parameters
are located within the ranges [0.01,0.2] and [0.1,0.9], respec-
tively; DMF-MVC adopts {[100,50],[500,50],[500,200]}as
the sizes of the last layer and other parameters use the default
settings as recommended in [31]; AWP is parameter-free.
Evaluation metrics: Six widely used metrics are selected to
evaluate the clustering quality including accuracy (ACC), nor-
malized mutual information (NMI), adjusted Rand index (AR),
F-score, Precision, and Recall. For each evaluation metric, the
higher value indicates the better clustering performance. As we
know, the spectral clustering algorithm uses the K-means al-
gorithm to obtain the indicator matrix for all methods except
MLAN, and different initializations may yield different cluster-
ing results. Thus, we run 10 trials for each experiment on all
datasets and report their average performance with standard de-
viations. Although MLAN does not use the K-means algorithm,
there exists one random parameter. Thus, we also run the MLAN algorithm for 10 trials.
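As an illustrative sketch of the evaluation protocol (not the authors' code), clustering accuracy can be computed by a Hungarian matching between predicted and ground-truth labels, while NMI and the adjusted Rand index are available as standard library calls; integer-coded labels starting at 0 are assumed.

import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score, adjusted_rand_score

def clustering_accuracy(y_true, y_pred):
    """ACC: best one-to-one match between predicted clusters and true classes."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    D = max(y_true.max(), y_pred.max()) + 1
    cost = np.zeros((D, D), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cost[p, t] += 1
    rows, cols = linear_sum_assignment(-cost)            # maximize matched count
    return cost[rows, cols].sum() / y_true.size

# nmi = normalized_mutual_info_score(y_true, y_pred)
# ar  = adjusted_rand_score(y_true, y_pred)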
TABLE III
CLUSTERING RESULTS (MEAN ± STANDARD DEVIATION) ON THREE FACE IMAGE DATASETS
B. Clustering Performance Comparison
The clustering performance comparisons on all multimedia datasets are reported in Tables III, IV, and V. The best results
are highlighted in bold and the second-best ones are underlined
in each table. From the results in these tables, we reach the
following conclusions:
• Generally speaking, the proposed JLMVC achieves the best results on all datasets, except on the ORL data where JLMVC is the second best. These results verify the validity of the proposed JLMVC. This is mainly because the proposed JLMVC integrates three aspects into one unified model: (1) the high correlation between the representation tensor and affinity matrix; (2) the nonlinear structures in real applications; and (3) the different contributions of each view to the construction of the unified affinity matrix. (More details can be found in Section IV-C-(3).) Taking the Extended YaleB data as an example, the proposed JLMVC improves by around 1.4%, 0.4%, 2.1%, 1.7%, 1.6%, and 1.8% with respect to the six measures over the second-best method Kt-SVD-MSC, which also exploits the kernel trick to solve the nonlinear subspaces problem but learns the representation tensor and affinity matrix in two separate steps;
• The low-rank tensor representation-based MVC methods (LT-MSC, t-SVD-MSC, HLR-M2VS, Kt-SVD-MSC, and the proposed JLMVC) show better results than the single-view clustering methods (SSC and LRR) in most cases. This is mostly due to the fact that different features characterize different and partly independent information of the datasets. LRR and SSC exploit only partial information, leading to unsatisfactory results, especially when the multi-view features are heterogeneous.
In contrast, the low-rank tensor representation-based MVC methods can well explore the high-order correlations underlying multi-view features;
TABLE IV
CLUSTERING RESULTS (MEAN ± STANDARD DEVIATION) ON WIKIPEDIA AND PROKARYOTIC
(DMF-MVC crashed on these two databases.)
• The graph-based multi-view clustering method, MLAN, obtains unstable results. On the Prokaryotic data, MLAN achieves performance similar to that of our JLMVC. However, it performs worse than the single-view clustering methods on the other datasets. The reason may be that graph-based clustering approaches usually construct the affinity matrix from the raw multimedia features, which may be corrupted by noise and outliers;
• On the ORL data, HLR-M2VS achieves better results than the proposed JLMVC. The reason is that the manifold regularization may preserve the local geometrical structure of the ORL data better than the kernel trick when handling nonlinearity. However, HLR-M2VS is less robust on the Yale and Extended YaleB datasets. Specifically, in terms of ACC and NMI, the leading margins of our JLMVC over HLR-M2VS are 24.0% and 19.4% on Extended YaleB, respectively; on Yale, the improvements of JLMVC are 24.4% and 21.0%, respectively. Similar observations can be made on the Scene-15 and Prokaryotic datasets. This indicates that, compared to manifold-based methods, kernel-based methods may be a better way to handle nonlinear subspaces;
• The performance of MLAP degrades sharply on the Extended YaleB data. Its performance is even worse than that of the single-view clustering methods, i.e., LRR and SSC, although it performs better than them on the other datasets. As stated in [7], the LBP and Gabor features yield less discriminative representations than the intensity feature due to large variations of illumination, as shown in the first group of Fig. 4. This indicates that simply concatenating all features may fail to obtain a good affinity matrix to describe the relationships among all samples, especially when the features are heterogeneous. This is the direct motivation for our model to consider different contributions of different features when constructing the affinity matrix.
C. Model Analysis
In this section, we aim to give a comprehensive analysis of the
proposed JLMVC in Eq. (7), including the parameter analysis,
convergence analysis, and runtime.
1) Parameter Analysis: There are three parameters, i.e., α, λ, and η, in the proposed JLMVC. In all experiments, we set η = 500. Thus, there are two free parameters that need to be tuned. Actually, α and λ are used to balance the contributions of the low-rank tensor term, the noise term, and the consensus term. For example, when the noise level of the features is high, α may be set to a large value. α and λ are selected from the candidate sets {0.001, 0.005, 0.01, 0.05, 0.1, 0.3, 0.5, 0.7} and {0.001, 0.005, 0.01, 0.05, 0.1, 0.3, 0.5, 0.7, 0.9, 1}, respectively.
Here, the Yale and Extended YaleB datasets are selected as two
examples. Fig. 5 shows the ACC and NMI values with respect
to different combinations of αand λ. From this figure, we can
observe that when αis set to a relatively large value, JLMVC can
achieve the best results. An intuitive interpretation is that there
are large variations of illumination on the Extended YaleB data.
TABLE V
CLUSTERING RESULTS (MEAN ± STANDARD DEVIATION) ON COIL-20, CMU-PIE, AND SCENE-15
2) Computation Complexity and Empirical Convergence Analysis: The proposed JLMVC consists of six subproblems. The main computation cost of JLMVC lies in updating $\mathcal{Y}$ and $\mathcal{P}$, since updating the other variables involves only matrix addition and scalar-matrix multiplication. The total computation complexity of the $\mathcal{Y}$ subproblem is $O(2Vn^2\log(n)+V^2n^2)$, since it needs to compute the FFT, the inverse FFT, and singular value decompositions. Updating $\mathcal{P}$ includes $V$ independent subproblems, as shown in Eq. (16); each subproblem takes $O(rn^2)$ for the vector-matrix multiplications, where $r$ is the rank of $K^{(v)}$. Thus, the overall computation complexity of JLMVC is $O(2Vn^2\log(n)+V^2n^2+Vrn^2)$.
The empirical convergence of JLMVC on Extended YaleB
dataset is shown in Fig. 6. The x-axis denotes the number of
iterations, while the y-axis represents the errors defined in Eq. (24).

Fig. 4. ACC and NMI values of LRR with all features on (1) Extended YaleB, (2) Yale, and (3) ORL datasets.

We can see that, after several iterations, the errors drop quickly to a stable value. In all experiments, the proposed
JLMVC can reach the smallest residual within 50 iterations. To
further investigate the empirical convergence of JLMVC, Fig. 7
reports the ACC and NMI values versus iterations on the Extended YaleB dataset.
TABLE VI
PERFORMANCE (ACC/NMI) OF JLMVC AND ITS VARIANTS ON DIFFERENT DATASETS
Fig. 5. ACC and NMI values of JLMVC with different combinations of α and λ on Yale (two top figures) and Extended YaleB (two bottom figures) datasets.
Fig. 6. Empirical convergence versus iterations on Extended YaleB data.
Fig. 7. ACC and NMI values versus iterations on Extended YaleB.
During the first 10 iterations, JLMVC does not reach a meaningful accuracy, but after that, JLMVC
achieves promising ACC and NMI values higher than those of all
competing methods except Kt-SVD-MSC. This shows that the
proposed JLMVC is an excellent multi-view clustering method.
3) The Effect of $\mathcal{Z}$ and $S$: The proposed JLMVC achieves the joint learning of the representation tensor $\mathcal{Z}$ and affinity matrix $S$, whereas most existing MVC methods follow two separate steps to construct $\mathcal{Z}$ and $S$. To investigate the effect of jointly learning $\mathcal{Z}$ and $S$, we perform a test by setting λ = 0. In this test, we simply obtain $\mathcal{Z}$ and then construct $S=\frac{1}{V}\sum_{v}(|Z^{(v)}|+|Z^{(v)T}|)$. This simple variant of JLMVC is denoted as JLMVC-Z. Table VI reports the clustering results of JLMVC and JLMVC-Z. It is easy to see that JLMVC achieves superior clustering results over JLMVC-Z in all cases. The average improvement of JLMVC over JLMVC-Z is around 17.06% and 16.23% with respect to ACC and NMI, respectively, indicating that constructing $\mathcal{Z}$ and $S$ simultaneously can boost the clustering performance.
4) Ablation Study on the Kernel Trick: To investigate the ef-
fect of the kernel trick, we also carry out the model in Eq. (3),
denoted as JLMVC-nk. Like JLMVC, JLMVC-nk also learns the representation tensor and affinity matrix simultaneously, but without the kernel trick. This means that the affinity matrix is constructed from the original multimedia data (usually not linearly separable). The ACC and NMI values of JLMVC-nk are reported
in the last row of Table VI. One can see that JLMVC achieves
better clustering results than JLMVC-nk in all cases. A typical
example is the Extended YaleB dataset whose multiple features
are diverse as shown in Fig. 4. This indicates that the kernel trick
can handle the nonlinearity and boost the multi-view clustering
performance.
5) Runtime: Since the computation time of a method is also
an evaluation factor, we give a runtime comparison of the pro-
posed JLMVC and several competitors. Table VII reports the
runtime comparison results. All experiments are implemented
in Matlab 2016a on a workstation with 3.50 GHz CPU and 16 GB
RAM. From Table VII, the methods with the average time from
low to high are MLAN, t-SVD-MSC, HLR-M2VS, JLMVC,
LT-MSC, DiMSC, MLAP, and Kt-SVD-MSC. MLAN costs the
shortest processing time and the proposed JLMVC belongs to
the middle-ranking group. All methods except MLAN need to compute singular value decompositions and matrix inversions, which leads to a high computation cost. Although MLAN is the
most efficient one, it has an unstable performance. The reason is
that MLAN uses the raw data to learn the similarity matrix and
the raw data are easily contaminated by noise. Other methods
impose the low-rank constraint on the representation matrix (or
tensor) and use the sparse regularizer to remove noise. They can
construct a reliable similarity matrix.
TABLE VII
AVERAGE RUNNING TIME (IN SECONDS) ON ALL DATABASES
V. CONCLUSIONS
In this paper, we proposed a novel method called JLMVC to
solve the multi-view clustering problem, based on the low-rank
tensor representation and “kernel trick”. In JLMVC, instead of
capturing a low-rank representation matrix among all views, the
tensor singular value decomposition-based tensor nuclear norm
was used to learn the representation tensor so as to explore the
high order correlations among different views. Using the kernel
trick, the original multimedia data was implicitly mapped from
the input data space into a new feature space to overcome the dif-
ficulty of nonlinearity in real applications. To make full use of the
high correlation between the representation tensor and affinity
matrix, the proposed JLMVC achieved the joint learning of the
representation tensor and affinity matrix. Thus, the learned affin-
ity matrix has the potential to boost the clustering performance
which was demonstrated by extensive experiments on eight mul-
timedia datasets. Our future work will design a fast and efficient
multi-view clustering method. One possible solution is using the
Frank-Wolfe algorithm to reduce the computation complexity of
the singular value decomposition.
ACKNOWLEDGMENT
The authors would like to thank the editors and the anony-
mous reviewers for their constructive comments, which helped
to improve the quality of this article. The authors wish to grate-
fully acknowledge Prof. C. Zhang from Tianjin University and
Prof. Y. Xie from East China Normal University for sharing
multi-view datasets and codes.
REFERENCES
[1] S. Yang et al., “SkeletonNet: A hybrid network with a skeleton-embedding
process for multi-view image representation learning,” IEEE Trans. Mul-
timedia, vol. 21, no. 11, pp. 2916–2929, Nov. 2019.
[2] Z. Zhang, Y. Xie, W. Zhang, and Q. Tian, “Effective image retrieval via
multilinear multi-index fusion,” IEEE Trans. Multimedia, vol. 21, no. 11,
pp. 2878–2890, Nov. 2019.
[3] S. K. Kuanar, K. B. Ranga, and A. S. Chowdhury, “Multi-view video
summarization using bipartite matching constrained optimum-path for-
est clustering,” IEEE Trans. Multimedia, vol. 17, no. 8, pp. 1166–1173,
Aug. 2015.
[4] X. Wu, C.-W. Ngo, and A. G. Hauptmann, “Multimodal news story clus-
tering with pairwise visual near-duplicate constraint,” IEEE Trans. Multi-
media, vol. 10, no. 2, pp. 188–199, Feb. 2008.
[5] C. Tang et al., “Learning a joint affinity graph for multiview subspace clus-
tering,” IEEE Trans. Multimedia, vol. 21, no. 7, pp. 1724–1736, Jul. 2019.
[6] C. Zhang, H. Fu, S. Liu, G. Liu, and X. Cao, “Low-rank tensor constrained
multiview subspace clustering,” in Proc. IEEE Int. Conf. Comput. Vision,
2015, pp. 1582–1590.
[7] Y.Xie et al., “On unifying multi-view self-representations for clustering by
tensor multi-rank minimization,” Int. J. Comput. Vision, vol. 126, no. 11,
pp. 1157–1179, 2018.
[8] Y. Xie, W. Zhang, Y. Qu, L. Dai, and D. Tao, “Hyper-Laplacian regular-
ized multilinear multiview self-representations for clustering and semisu-
pervised learning,” IEEE Trans. Cybern., 2018, to be published.
[9] A. Y. Ng, M. I. Jordan, and Y. Weiss, “On spectral clustering: Analysis
and an algorithm,” in Proc. Neural Inf. Process. Syst., 2002, pp. 849–856.
[10] X. Guo, “Robust subspace segmentation by simultaneously learning data
representations and their affinity matrix,” in Proc. Joint Conf. Artif. Intell.,
2015, pp. 3547–3553.
[11] X. Peng, Z. Yu, Z. Yi, and H. Tang, “Constructing the l2-graph for robust
subspace learning and subspace clustering,” IEEE Trans. Cybern., vol. 47,
no. 4, pp. 1053–1066, Apr. 2017.
[12] F. Nie, X. Wang, and H. Huang, “Clustering and projected clustering with
adaptive neighbors,” in Proc. 20th ACM SIGKDD Int. Conf. Knowl. Dis-
covery Data Mining, 2014, pp. 977–986.
[13] F. Nie et al., “New l1-norm relaxations and optimizations for graph clus-
tering,” in Proc. AAAI Conf. Artif. Intell., 2016, pp. 1962–1968.
[14] F. Nie, X. Wang, M. I. Jordan, and H. Huang, “The constrained Laplacian
rank algorithm for graph-based clustering,” in Proc. AAAI Conf. Artif.
Intell., 2016, pp. 1969–1976.
[15] K. Zhan, C. Zhang, J. Guan, and J. Wang, “Graph learning for multi-
view clustering,” IEEE Trans. Cybern., vol. 48, no. 10, pp. 2887–2895,
Oct. 2017.
[16] F. Nie, G. Cai, J. Li, and X. Li, “Auto-weighted multi-view learning for
image clustering and semi-supervised classification,” IEEE Trans. Image
Process., vol. 27, no. 3, pp. 1501–1511, Mar. 2018.
[17] F. Nie, G. Cai, and X. Li, “Multi-view clustering and semi-supervised
classification with adaptive neighbours,” in Proc. AAAI Conf. Artif. Intell.,
2017, pp. 2408–2414.
[18] F. Nie et al., “Self-weighted multiview clustering with multiple graphs,”
in Proc. Joint Conf. Artif. Intell., 2017, pp. 2564–2570.
[19] H. Wang, Y. Yang, and B. Liu, “GMC: Graph-based multi-view clustering,”
IEEE Trans. Knowl. Data Eng., 2019, to be published.
[20] G. Liu et al., “Robust recovery of subspace structures by low-rank rep-
resentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 1,
pp. 171–184, Jan. 2013.
[21] E. Elhamifar and R. Vidal, “Sparse subspace clustering: Algorithm, theory,
and applications,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 11,
pp. 2765–2781, Nov. 2013.
[22] C. Lu, J. Feng, Z. Lin, T. Mei, and S. Yan, “Subspace clustering by block
diagonal representation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 41,
no. 2, pp. 487–501, Feb. 2019.
[23] Y. Wang, L. Wu, X. Lin, and J. Gao, “Multiview spectral clustering
via structured low-rank matrix factorization,” IEEE Trans. Neural Netw.
Learn. Syst., vol. 29, no. 10, pp. 4833–4843, Oct. 2018.
[24] X. Wang, X. Guo, Z. Lei, C. Zhang, and S. Z. Li, “Exclusivity-consistency
regularized multi-view subspace clustering,” in Proc. IEEE Conf. Comput.
Vision Pattern Recognit., 2017, pp. 923–931.
[25] Z. Kang, H. Pan, S. C. H. Hoi, and Z. Xu, “Robust graph learning from
noisy data,” IEEE Trans. Cybern., 2019, to be published.
[26] M. Yin, S. Xie, Z. Wu, Y. Zhang, and J. Gao, “Subspace clustering via
learning an adaptive low-rank graph,” IEEE Trans. Image Process., vol. 27,
no. 8, pp. 3716–3728, Aug. 2018.
[27] M. Belkin and P. Niyogi, “Laplacian eigenmaps for dimensionality reduc-
tion and data representation,” Neural Comput., vol. 15, no. 6, pp. 1373–
1396, 2003.
[28] L. Zhuang et al., “Constructing a nonnegative low-rank and sparse graph
with data-adaptive features,” IEEE Trans. Image Process., vol. 24, no. 11,
pp. 3717–3728, Nov. 2015.
[29] S. Yi et al., “Dual pursuit for subspace learning,” IEEE Trans. Multimedia,
vol. 21, no. 6, pp. 1399–1411, Jun. 2019.
[30] P. Ji, T. Zhang, H. Li, M. Salzmann, and I. Reid, “Deep subspace clustering
networks,” in Proc. Neural Inf. Process. Syst., 2017, pp. 24–33.
[31] H. Zhao, Z. Ding, and Y. Fu, “Multi-view clustering via deep matrix fac-
torization,” in Proc. AAAI Conf. Artif. Intell., 2017, pp. 2921–2927.
[32] Z. Zhang et al., “Highly-economized multi-view binary compression for
scalable image clustering,” in Proc. Eur. Conf. Comput. Vision, 2018,
pp. 717–732.
[33] G. Chao et al., “Multi-view cluster analysis with incomplete data to un-
derstand treatment effects,” Inf. Sci., vol. 494, pp. 278–293, 2019.
[34] G. Chao, S. Sun, and J. Bi, “A survey on multi-view clustering,” 2017,
arXiv:1712.06246.
[35] B. Cheng, G. Liu, J. Wang, Z. Huang, and S. Yan, “Multi-task low-rank
affinity pursuit for image segmentation,” in Proc. IEEE Int. Conf. Comput.
Vision, 2011, pp. 2439–2446.
[36] R. Xia, Y. Pan, L. Du, and J. Yin, “Robust multi-view spectral clustering
via low-rank and sparse decomposition,” in Proc. AAAI Conf. Artif. Intell.,
2014, pp. 2149–2155.
[37] C. Zhang, Q. Hu, H. Fu, P. Zhu, and X. Cao, “Latent multi-view subspace
clustering,” in Proc. IEEE Conf. Comput. Vision Pattern Recognit., 2017,
pp. 4279–4287.
[38] S. Luo, C. Zhang, W. Zhang, and X. Cao, “Consistent and specific
multi-view subspace clustering,” in Proc. AAAI Conf. Artif. Intell., 2018,
pp. 3730–3713.
[39] C. Zhang et al., “Generalized latent multi-view subspace clustering,” IEEE
Trans. Pattern Anal. Mach. Intell., 2018, to be published.
[40] J. Liu, C. Wang, J. Gao, and J. Han, “Multi-view clustering via joint non-
negative matrix factorization,” in Proc. SIAM Int. Conf. Data Min., 2013,
pp. 252–260.
[41] M. Yin, J. Gao, S. Xie, and Y. Guo, “Multiview subspace clustering via
tensorial t-product representation,” IEEE Trans. Neural Netw. Learn. Syst.,
vol. 30, no. 3, pp. 851–864, Mar. 2019.
[42] V. M. Patel and R. Vidal, “Kernel sparse subspace clustering,” in Proc.
IEEE Int. Conf. Image Process., 2014, pp. 2849–2853.
[43] S. Xiao, M. Tan, D. Xu, and Z. Y. Dong, “Robust kernel low-rank rep-
resentation,” IEEE Trans. Neural Netw. Learn. Syst., vol. 27, no. 11,
pp. 2268–2281, Nov. 2016.
[44] Y. Qu, J. Liu, Y. Xie, and W. Zhang, “Robust kernelized multi-view self-
representations for clustering by tensor multi-rank minimization,” 2017,
arXiv:1709.05083.
[45] M. E. Kilmer and C. D. Martin, “Factorization strategies for third-order
tensors,” Linear Algebra Appl., vol. 435, no. 3, pp. 641–658, 2011.
[46] H. Wang, Y. Yang, and T. Li, “Multi-view clustering via concept factor-
ization with local manifold regularization,” in Proc. IEEE Int. Conf. Data
Mining, 2016, pp. 1245–1250.
[47] Y. Chen et al., “Denoising of hyperspectral images using nonconvex low
rank matrix approximation,” IEEE Trans. Geosci. Remote Sens., vol. 55,
no. 9, pp. 5366–5380, Sep. 2017.
[48] Y. Chen, S. Wang, and Y. Zhou, “Tensor nuclear norm-based low-rank
approximation with total variation regularization,” IEEE J. Sel. Topics
Signal Process., vol. 12, no. 6, pp. 1364–1377, Dec. 2018.
[49] W. Hu, D. Tao, W. Zhang, Y. Xie, and Y. Yang, “The twist tensor nu-
clear norm for video completion,” IEEE Trans. Neural Netw. Learn. Syst.,
vol. 28, no. 12, pp. 2961–2973, Dec. 2017.
[50] Y. Chen, Y. Wang, M. Li, and G. He, “Augmented Lagrangian alternating
direction method for low-rank minimization via non-convex approxima-
tion,” Signal, Image Video Process., vol. 11, no. 7, pp. 1271–1278, 2017.
[51] T. Zhou, C. Zhang, C. Gong, H. Bhaskar, and J. Yang, “Multiview la-
tent space learning with feature redundancy minimization,” IEEE Trans.
Cybern., 2018, to be published.
[52] L. Fei-Fei and P. Perona, “A Bayesian hierarchical model for learning nat-
ural scene categories,” in Proc. IEEE Comput. Soc. Conf. Comput. Vision
Pattern Recognit., 2005, vol. 2, pp. 524–531.
[53] X. Cao, C. Zhang, H. Fu, S. Liu, and H. Zhang, “Diversity-induced multi-
view subspace clustering,” in Proc. IEEE Conf. Comput. Vision Pattern
Recognit., 2015, pp. 586–594.
[54] F. Nie, L. Tian, and X. Li, “Multiview clustering via adaptively weighted
procrustes,” in Proc. ACM SIGKDD Int. Conf. Knowl. Discovery Data
Mining, 2018, pp. 2022–2030.
Yongyong Chen received the B.S. and M.S. degrees
from the College of Mathematics and Systems Science,
Shandong University of Science and Technology,
Qingdao, China, and visited the National Key Lab
for Novel Software Technology, Nanjing University,
Nanjing, China, as an exchange student in 2017. He is
currently working toward the Ph.D. degree with the
Department of Computer and Information Science,
University of Macau, Macau, China. His research in-
terests include (non-convex) low-rank and sparse ma-
trix/tensor decomposition models, with applications
to image processing, data mining, and computer vision.
Xiaolin Xiao received the B.E. degree from Wuhan
University, Wuhan, China, in 2013, and the Ph.D. de-
gree from the University of Macau, Macau, China, in
2019. She is currently a Postdoctoral Fellow with the
School of Computer Science and Engineering, South
China University of Technology, Guangzhou, China.
Her research interests include superpixel segmenta-
tion, saliency detection, and color image processing
and understanding.
Yicong Zhou (M’07–SM’14) received the B.S. de-
gree in electrical engineering from Hunan University,
Changsha, China, and the M.S. and Ph.D. de-
grees in electrical engineering from Tufts University,
Medford, MA, USA. He is an Associate Professor
and the Director of the Vision and Image Processing
Laboratory, Department of Computer and Informa-
tion Science, University of Macau, Macau, China. His
research interests include image processing and un-
derstanding, computer vision, machine learning, and
multimedia security. Dr. Zhou is a Senior Member
of the International Society for Optical Engineering. He was a recipient of the
Third Prize of the Macau Natural Science Award in 2014. He is the Co-Chair of the
Technical Committee on Cognitive Computing in the IEEE Systems, Man, and
Cybernetics Society. He is an Associate Editor for the IEEE TRANSACTIONS
ON NEURAL NETWORKS AND LEARNING SYSTEMS, IEEE TRANSACTIONS ON
CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, IEEE TRANSACTIONS ON
GEOSCIENCE AND REMOTE SENSING, and four other journals.
Learning graphs from data automatically has shown encouraging performance on clustering and semisupervised learning tasks. However, real data are often corrupted, which may cause the learned graph to be inexact or unreliable. In this paper, we propose a novel robust graph learning scheme to learn reliable graphs from real-world noisy data by adaptively removing noise and errors in the raw data. We show that our proposed model can also be viewed as a robust version of manifold regularized robust PCA, where the quality of the graph plays a critical role. The proposed model is able to boost the performance of data clustering, semisupervised classification, and data recovery significantly, primarily due to two key factors: 1) enhanced low-rank recovery by exploiting the graph smoothness assumption, 2) improved graph construction by exploiting clean data recovered by robust PCA. Thus, it boosts the clustering, semi-supervised classification, and data recovery performance overall. Extensive experiments on image/document clustering, object recognition, image shadow removal, and video background subtraction reveal that our model outperforms the previous state-of-the-art methods.