Hierarchical Community Structure Preserving Network Embedding: A Subspace Approach

Qingqing Long, Key Laboratory of Machine Perception (Ministry of Education), Peking University, qingqinglong@pku.edu.cn
Yiming Wang, Key Laboratory of Machine Perception (Ministry of Education), Peking University, wangyiming17@pku.edu.cn
Lun Du∗†, Microsoft Research, lun.du@microsoft.com
Guojie Song, Key Laboratory of Machine Perception (Ministry of Education), Peking University, gjsong@pku.edu.cn
Yilun Jin, Key Laboratory of Machine Perception (Ministry of Education), Peking University, yljin@pku.edu.cn
Wei Lin, Alibaba Group, yangkun.lw@alibaba-inc.com
ABSTRACT
To depict ubiquitous relational data in the real world, network data are widely applied in modeling complex relationships. Projecting vertices into low-dimensional spaces, known as network embedding, thus makes networks applicable to diverse predictive tasks. However, while numerous works exploit pairwise proximities, one characteristic owned by real networks, the clustering property, namely that vertices are inclined to form communities of various ranges and hence a hierarchy consisting of communities, has barely received attention from researchers. In this paper, we propose our network embedding framework, abbreviated SpaceNE, which preserves hierarchies formed by communities through subspaces, manifolds with flexible dimensionality that are inherently hierarchical. Moreover, we show that subspaces are able to address further problems in representing hierarchical communities, including sparsity and space warps. Last but not least, we propose constraints on the dimensions of subspaces to denoise, which are further approximated by differentiable functions such that joint optimization is enabled, along with a layer-wise scheme to alleviate the overhead caused by the vast number of parameters. We conduct various experiments whose results demonstrate our model's effectiveness in addressing community hierarchies.
CCS CONCEPTS
• Networks → Network structure; • Information systems → Collaborative and social computing systems and tools; • Mathematics of computing → Graph theory.
These authors contributed equally to the work.
Work performed as a student of Peking University.
Corresponding Author.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
CIKM '19, November 3–7, 2019, Beijing, China
© 2019 Association for Computing Machinery.
ACM ISBN 978-1-4503-6976-3/19/11...$15.00
https://doi.org/10.1145/3357384.3357947
KEYWORDS
Network embedding; subspace; complex networks; community
structure; data mining
ACM Reference Format:
Qingqing Long, Yiming Wang, Lun Du, Guojie Song, Yilun Jin, and Wei Lin. 2019. Hierarchical Community Structure Preserving Network Embedding: A Subspace Approach. In The 28th ACM International Conference on Information and Knowledge Management (CIKM '19), November 3–7, 2019, Beijing, China. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3357384.3357947
1 INTRODUCTION
Network data are ubiquitous due to their precise depiction of relational data. With traditional network algorithms often inapplicable due to forbidding computational costs, network embedding algorithms, which project the vertices of a network into a lower-dimensional vector space while preserving general proximity between nodes, have proved to overcome such complexity and hence to be applicable to a wider range of prediction tasks, including link prediction and classification [14, 28, 37].
Apart from general proximity between pairwise nodes, a characteristic property that real-world networks possess is the clustering property: nodes tend to form communities of varying size and range at a far higher frequency than if they were randomly connected. In addition, community structures are highly informative in that they shed light on how the network is inclined to be organized, with connections within communities being considerably more probable than those across communities, and hence provide network structures with resistance to noisy links that occur randomly. Even more remarkably, not only do vertices form communities, but the communities thus formed are commonly organized in a hierarchical manner as well, which proves to be significantly indicative of functional components of the underlying networks [3, 4].
Ubiquitous and instrumental as community structures and their hierarchy are, relatively little attention has been devoted to such properties. Typical attempts at preserving communities within vector spaces are MNMF [34] and GNE [7]. MNMF, aiming at preserving community structures within networks and reflecting them in the embeddings, ignores the hierarchy across communities, while GNE, which preserves hierarchies across communities through spherical projection, suffers from the drawback that spherical projections tend to propel vertices across communities undesirably far from each other, resulting in incorrect modeling for vertices from different communities. In addition, communities lying deep within the hierarchy are treated by GNE with extremely small spheres that nonetheless possess the same dimensionality as their shallower counterparts, and hence the inclusion relationship between high-level and lower-level communities is not sufficiently exploited.
It is hence concluded that utilizing communities as well as their hierarchy is a field of pressing importance yet demanding challenges, with relatively few attempts devoted to it. Specifically, we summarize the following challenges that remain to be resolved before hierarchical community structures can be fully utilized.
(1) Sparsity. Intuitively, communities lying deep within the hierarchy are less inclusive than their shallower counterparts, and hence possess less variance within themselves. Therefore, the original space in which the whole network is embedded becomes increasingly inappropriate for embedding deeper communities, as they require far less variance to be encoded than the original space provides. Consequently, embeddings thus learned suffer from extreme sparsity and, as a result, noise arises, which undermines the representation of deeper communities. We hence conclude that the sparsity issue for low-level communities should be addressed, and that the dimensionality of the spaces in which communities reside should vary according to their depth, a requirement scarcely met by previous research works.
(2) Space Warps. As previously mentioned, the spherical projection used by GNE suffers from restrictions imposed by the radii of the spheres, which decay exponentially as the hierarchy deepens. Consequently, it is common for vertices across communities to be exponentially more distant than those within the same community, thus inappropriately underestimating the density of links across communities and condensing links within communities. It is hence deduced that a desirable figure onto which communities are projected should be designed to ensure the correct modeling of nodes within and across communities and to alleviate space warps.
(3) "Curse" of Depth. Just like the "Curse of Dimensionality", when extremely deep hierarchies are encountered, it is not trivial to maintain a sensible scale in which communities reside; a counter-example would be, again, GNE. As explained previously, the radii of the spheres onto which communities are projected shrink exponentially with increasing depth, resulting in unduly tiny radii which may cause practical problems including underflow.
To address these challenges, we propose Subspace Network Embedding, abbreviated SpaceNE, to model the community structures in networks along with their hierarchy. Specifically, we observe that subspaces within Euclidean space inherently follow a hierarchical organization. For example, in the three-dimensional space illustrated in Figure 1, planes and lines, which are subspaces of varying dimensions, reside within the 3-d space, and as lines reside within planes and planes within the 3-d space, an inherent hierarchy emerges. In addition, natural metrics for measuring distances between pairwise subspaces exist, such as angles between pairwise planes and lines, which can easily be adopted to measure similarities between communities dwelling within subspaces of the same dimensions. Both of the aforementioned factors, inherent hierarchy and handy metrics, contribute to our modeling of hierarchical community structure using subspaces.

In addition, we are delighted to find that subspaces possess other appealing properties that can further enhance our modeling of community structures and their hierarchy. On one hand, the dimensionality of subspaces is highly flexible, which corresponds to the inconsistency of variance within communities at different depths of the hierarchy and alleviates the sparsity issue, thereby filtering undesirable noise and enhancing our representation. On the other hand, subspaces are flat and possess a consistent scale of distance regardless of their dimensionality, which maintains distances within a community and across communities at comparable scales while allowing deep hierarchies to possess a similar scale of distance to their shallower counterparts, facilitating the modeling of arbitrarily deep community hierarchies.
Consequently, we extend DeepWalk [23], which preserves general proximity between pairwise vertices through co-occurrence in random walks. In addition to proximity between vertices, with subspaces modeling hierarchical community structures, we project vertices, according to their communities, into subspaces of corresponding dimensions, during which objectives preserving similarities within a community and across communities are adopted, such that representations for communities, i.e. subspaces, can be jointly optimized along with node representation vectors. What is more, we impose constraints on the dimensions of the subspaces used to represent communities, keeping them as low as possible, such that a minimal level of redundancy and noise is kept; these constraints are further approximated, using matrix algebra and convex optimization, by a differentiable term, such that they can be optimized simultaneously along with proximity between pairwise vertices and communities.
To summarize, we make the following contributions:
• We propose the Subspace Network Embedding model, abbreviated SpaceNE, introducing subspaces to the field of community-preserving network embedding, which, to the best of our knowledge, is the first attempt to introduce subspaces into network representation learning.
• We design elaborate objectives preserving proximity between pairwise nodes and across communities, along with constraints on subspace dimension which are approximated by a differentiable term, leading to efficient optimization of our model.
• We conduct extensive experiments on several real-world datasets, where the experimental results demonstrate that SpaceNE is significantly more competitive than its various counterparts on various applications.
2 RELATED WORK
Hierarchical Network. Many real-world systems can be mapped into networks with hierarchical community structure [19, 21]. Generally, a network community refers to a dense sub-network in which vertices are densely connected to one another [20]. Communities can be recursively divided into sub-communities; thus, communities at different scales are hierarchically organized like a tree. Exploring the hierarchical community structure of a network has proved to possess a wide range of applications, including scientific collaboration analysis [10], protein function prediction [26], and so on. Thus more and more works pay attention to networks containing a hierarchy of communities [3, 25]. [3] claims that knowing the hierarchical structure of complex networks is useful for link prediction tasks.

Figure 1: The correspondence between the community hierarchy and the subspace hierarchy. Panels: (a) network; (b) hierarchical community structure; (c) 3-d hierarchical subspace with d-, (d−1)-, and (d−2)-dimensional subspaces.
Network Embedding. Network embedding aims at embedding network data into a low-dimensional space [5, 8, 33]. DeepWalk [23] learns vertex representations by using truncated random walks and Skip-Gram. LINE [28] first proposes a method to preserve the first- and second-order proximity among nodes. Struc2Vec [24] defines a hierarchy to measure node similarity at different scales, and constructs a multilayer graph to describe structural similarities.

As an important property of networks, the community structure has been extensively studied in network embedding. Recently, several models have been proposed for preserving the hierarchical community structure of the network. [34] leverages matrix factorization to integrate community structure information into the node embeddings. [22] learns network representations in hyperbolic space; its limitation is that the learned representations are hyperbolic vectors, which cannot be applied to the majority of machine learning algorithms, whose inputs are usually Euclidean. Inspired by the structure of the Galaxy, [7] presents GNE, which embeds the communities at different scales onto the surfaces of spheres with different radii. However, this algorithm has obvious shortcomings. As the level increases, the radii of the spheres decrease quickly, which limits the representation spaces for the communities at deep levels. Therefore, an optimization algorithm with a constant learning rate cannot learn proper representations for them.
Subspace. A subspace is a subset of a topological space endowed with the subspace topology [27]. Subspaces, inherently of lower variance than the original space, can be used to approximate data of higher dimensions such that only the principal features are kept. Typical and widespread examples of utilizing subspaces are low-rank approximation and subspace clustering. Classical related works include [6, 15, 31], which have all achieved convincing results in their corresponding fields. Nonetheless, though widely explored in other fields, subspaces are largely overlooked in the field of network representation learning.
3 PRELIMINARIES AND PROBLEM STATEMENT

Definition 1 (Hierarchical Network). Let $G(V, E)$ denote an undirected network with $V$ as the vertex set and $E$ as the edge set. $T$ denotes the tree representing the hierarchy of communities within the network $G$, with depth $L$ and node set $C$. For a node $c \in C$, $ch(c)$ and $pa(c)$ denote its set of children and its parent node within $T$, respectively. We let $C^l_i \subseteq V$ denote the set of the $i$-th community within the $l$-th layer. Specifically, $C^1_1 = V$ is the original set of vertices of $G$.
Denition 2
(
Subspace
)
.
Let
U
be the vector space
Rn
.
Us
, a non-
empty subset of
U
, is called a subspace
1
if for any element pair
x,yUs
and any scalar
λR
,
x+yUs,λxUs
[
27
][
15
].
Subspaces have several important properties, listed below, that make them suitable for modeling hierarchical network embedding:
• Hierarchical structure. Subspaces within Euclidean space inherently follow a hierarchical organization. For example, in the three-dimensional space illustrated in Figure 1, communities can be vividly depicted as planes, i.e. 2-d subspaces of the 3-d space, and sub-communities can consequently be modeled as lines, i.e. 1-d subspaces, residing in their corresponding communities (planes). In addition, vertices, represented by points in Euclidean space, can reside in arbitrary subspaces of arbitrary dimension.
• Lower-dimensional representation. Subspaces have proved to be instrumental in extracting principal features from extremely high-dimensional data, demonstrated distinctively by the famous PCA [35] decomposition algorithm, which selects the subspaces in which the highest variances of the original data lie. Consequently, we anticipate that subspaces will serve as a sieve, preserving the principal characteristics of the original networks and communities.
Denition 3
(
Hierarchical Subspace Network Embedding
)
.
The
hierarchical subspace is denoted as
T
with a depth of
L
.
W=
1
There is no loss of generality in assuming that subspaces are linear, i.e., they all
contain the origin. For the ane subspaces that do not contain the origin, we can
always increase the dimension of the ambient space by one and identify each ane
subspace with the linear subspace that it spans. So we always use ‘subspace’ to denote
‘linear subspace’ and ‘ane subspace’ in this work.
Session: Long - Network Embedding I
CIKM ’19, November 3–7, 2019, Beijing, China
411
[w1,w2, ... , wn] ∈ Rn×m
denotes a basis of a certain subspace and
Wl
i
denotes the
i
-th subspace at the
l
-th level of
T
. The upper bound
of the dimension of subspace is a hyper parameter. The descending
trend of subspace dimension can be linear (eg.
ba·l
) or non-linear
(eg.
log(bl)⌋
), where
b
and
a
are positive integers. Community
cl
i
should be corresponding to
Wl
i
in a hierarchical subspace. The
objective of hierarchical community structure preserving network
embedding is to learn the subspace representations (corresponding
base vectors) of the input hierarchical network.
4 SUBSPACE BASED HIERARCHICAL NETWORK EMBEDDING
Subspaces naturally follow a hierarchy, which inspires us to devise a subspace-based, hierarchy-preserving network embedding approach. SpaceNE preserves pairwise proximity as well as proximities between hierarchical communities, including structural information within a community and among communities.

4.1 Preserving Pairwise Proximity Between Vertices
We propose that preserving pairwise proximity between vertices is the most fundamental requirement in general network embedding problems. Hence we first address the pairwise proximity between nodes similarly to DeepWalk, as mentioned before:
$$\min_{\vec{u}} \Phi_1 = \sum_{v_i \in V} \sum_{v_j \in N(v_i)} \log \sigma\big(\|\vec{u}_j - \vec{u}_i\|_2\big) + k \cdot \mathbb{E}_{v_n \sim P_n(v)}\big[\log \sigma\big(-\|\vec{u}_i - \vec{u}_n\|_2\big)\big], \quad (1)$$

where $\sigma(x) = 1/(1 + e^{-x})$ is the sigmoid function, $\vec{u}_i$ is the representation vector of $v_i$, $N(v_i)$ denotes the "neighborhood" of $v_i$, i.e. vertices co-occurring with $v_i$ in random walks, and $P_n(v)$ is the noise distribution for negative sampling.
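To make the objective concrete, the following is a minimal NumPy sketch, not the authors' TensorFlow implementation, that evaluates Eq. (1) for a given set of random-walk co-occurrence pairs; the pair list, noise distribution and number of negative samples k are assumed to be supplied by the caller.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pairwise_proximity_loss(U, pos_pairs, noise_dist, k=5, rng=np.random):
    """Evaluate the objective of Eq. (1) on a set of (i, j) co-occurrence pairs.

    U          : (|V|, d) matrix of node embeddings u_i.
    pos_pairs  : list of (i, j) pairs where v_j lies in the random-walk
                 neighborhood N(v_i).
    noise_dist : (|V|,) noise distribution P_n(v) for negative sampling.
    k          : number of negative samples per positive pair.
    """
    loss = 0.0
    for i, j in pos_pairs:
        # positive term: minimizing log sigma(d) pulls co-occurring vertices together
        loss += np.log(sigmoid(np.linalg.norm(U[j] - U[i])))
        # negative term: minimizing log sigma(-d) pushes sampled noise vertices apart
        negs = rng.choice(len(U), size=k, p=noise_dist)
        for n in negs:
            loss += np.log(sigmoid(-np.linalg.norm(U[i] - U[n])))
    return loss
```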
4.2 Preserving Proximity of Hierarchical Communities

4.2.1 Preservation of Structure within Individual Communities. Vertices from the same community should be closer to each other than those from different communities, so we project nodes from the same community into the same subspace. To integrate the hierarchical subspace information, for each community $c^l_i$ in layer $l$ we impose the following constraint:

$$\mathrm{rank}(U^l_i) \leq d_l, \quad (2)$$

where $d_l$ is the dimension of the subspace in layer $l$, a hyperparameter, and $U^l_i$ is a matrix each row of which is the representation vector of a vertex belonging to community $c^l_i$. Each vertex $v_i$ in its corresponding community has a new vector representation under the basis vectors of the corresponding subspace, denoted as $\vec{u}^l_i$. Hence the variable $\vec{u}_i$ in Eq. (1) becomes an alias of $\vec{u}^1_i$.
Considering that such constraints make the problem difficult to solve, we equivalently introduce them through vertex projections applied layer by layer. Specifically, we change the decision variables from $\vec{u}^1_i \in \mathbb{R}^{d_1}$ into $\vec{u}^L_i \in \mathbb{R}^{d_L}$, where $d_L < d_1$, and introduce auxiliary decision variables $S^l_j \in \mathbb{R}^{d_l \times d_{l-1}}$ denoting the projection matrices, i.e. the following relationship holds:

$$\vec{u}^l_i = S^l_j \vec{u}^{l-1}_i, \qquad \vec{u}^{l-1}_i = (S^l_j)^{\dagger} \vec{u}^l_i, \quad (3)$$

where $j$ is the index of the community $c^l_j$ that the vertex $v_i$ belongs to in the $l$-th layer, and $(S^l_j)^{\dagger}$ is the pseudo-inverse of the matrix $S^l_j$.

Figure 2: Illustration of relationship preservation among communities. Distances between node embeddings may change before and after projection, as there are angles among the projection subspaces.
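As a concrete illustration of Eq. (3), the snippet below is a small NumPy sketch that projects one community's layer-(l−1) vectors into a lower-dimensional subspace through a projection matrix S and maps them back with its Moore-Penrose pseudo-inverse; the dimensions and the random orthonormal S are placeholders, not learned quantities.

```python
import numpy as np

rng = np.random.default_rng(0)

d_prev, d_curr = 64, 16                    # d_{l-1} and d_l, with d_l < d_{l-1}
U_prev = rng.normal(size=(100, d_prev))    # rows: vectors u^{l-1}_i of one community

# S^l_j in R^{d_l x d_{l-1}} maps layer-(l-1) coordinates into the community's
# d_l-dimensional subspace (Eq. (3)); here it is a random orthonormal stand-in.
Q, _ = np.linalg.qr(rng.normal(size=(d_prev, d_curr)))
S = Q.T                                    # shape (d_l, d_{l-1}), orthonormal rows

U_curr = U_prev @ S.T                      # u^l_i = S^l_j u^{l-1}_i, applied row-wise
U_back = U_curr @ np.linalg.pinv(S).T      # u^{l-1}_i ~ (S^l_j)^+ u^l_i

print(U_curr.shape, np.linalg.norm(U_prev - U_back))   # new shape and reconstruction gap
```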
4.2.2 Preservation of Structure among Communities. However, the relationships among communities may be lost if only the hierarchical projection method above is used. In fact, after adding the subspace constraints, the relationships among different subspaces are able to reflect the relationships among communities. For example, as shown in Fig. 2, the projected position of node 3 in the grey plane is b, and c in the green plane; the distance between node 2 and node 3 varies with the plane onto which node 3 is projected. The main idea of this objective is to minimize the difference between the subspace similarity and the community similarity for every two communities $i, j$, namely,

$$\min \Phi^l_2 = \|\Psi^l - \Gamma^l\|_F, \quad (4)$$

where $\Psi^l$ is a matrix each entry $\Psi^l_{i,j}$ of which is the similarity of the two communities $c^l_i$ and $c^l_j$ in the original graph. Similarly, $\Gamma^l_{i,j}$ is the similarity of the two corresponding subspaces. The method of calculating $\Psi^l$ can be customized according to the datasets; for example, a community similarity based on PMI [13] or common neighbors [7], etc., is reasonable. The relationship between two subspaces can be measured by the inner product of the projection matrices [27]. Thus the subspace similarity $\Gamma^l_{i,j}$ can be defined as

$$\Gamma^l_{i,j} = \big\|S^l_j U^{l-1}_i \big(S^l_j U^{l-1}_i\big)^{T}\big\|_F. \quad (5)$$
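A hedged NumPy sketch of the community-level objective follows: it computes the subspace similarities of Eq. (5) for one layer and matches them against a precomputed community-similarity matrix via the Frobenius norm of Eq. (4). The orientation of the vertex matrices (columns as vertices) and the name Psi for the community-similarity matrix are assumptions made only for this sketch.

```python
import numpy as np

def subspace_similarity(S_j, U_i):
    """Gamma^l_{i,j} of Eq. (5): the Frobenius norm of
    (S^l_j U^{l-1}_i)(S^l_j U^{l-1}_i)^T.

    S_j : (d_l, d_{l-1}) projection matrix of community j.
    U_i : (d_{l-1}, n_i) matrix whose columns are vertex vectors of community i.
    """
    P = S_j @ U_i
    return np.linalg.norm(P @ P.T, ord="fro")

def community_matching_loss(Psi, S_list, U_list):
    """Phi^l_2 of Eq. (4): ||Psi^l - Gamma^l||_F for one layer.

    Psi    : (k, k) community-similarity matrix computed on the original graph
             (e.g. from PMI or common neighbors).
    S_list : the k projection matrices S^l_j of the layer.
    U_list : the k vertex matrices U^{l-1}_i of the layer.
    """
    k = len(S_list)
    Gamma = np.empty((k, k))
    for i in range(k):
        for j in range(k):
            Gamma[i, j] = subspace_similarity(S_list[j], U_list[i])
    return np.linalg.norm(Psi - Gamma, ord="fro")
```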
4.2.3 Low Rank Representation. As stated before, the inconsistent demand for dimensionality poses a significant threat to the representation of subspaces. On one hand, inappropriate dimensionality for modeling communities, especially those lying deep within the hierarchy, results in redundant dimensions and henceforth sparsity and noise, which compromise the quality of the representations of vertices and communities. On the other hand, in order to get rid of undesirable noise while conserving the principal features of the original networks, we prefer a slimmer subspace such that only the variances that are vital enough are kept, while the obscure ones are left behind.
Consequently, we are motivated to impose penalties on the dimensionality needed by the subspaces, which corresponds to the rank of the projection matrices. Therefore, we add the following constraint on the projection matrices:

$$\min_{S^l} \Phi_{reg} = \sum_{i=1}^{k} \mathrm{rank}(S^l_i), \quad (6)$$

where the rank of a matrix is the number of linearly independent columns it contains and $k$ is the number of communities at each layer. Since $S \in \mathbb{R}^{m \times n}$ is a projection matrix, the projected space lies within the column space of $S$. We can minimize the rank of $S$ with Eq. (6) to reduce the dimension of the projected subspace. Eq. (6) restricts $S^l_i$ to be of as low a rank as possible, resulting in desirable low-dimensional subspaces, which further improves the capability of denoising.
Considering all the objectives mentioned above, the overall objective function is:

$$\min_{U^L, S} \Phi_3 = \Phi_1 + \alpha \sum_{l=1}^{L} \Phi^l_2 + \lambda \sum_{l=1}^{L} \Phi^l_{reg}, \quad (7)$$

where $\alpha$ and $\lambda$ are the coefficients regulating the contributions of the two community-preserving terms.
5 LEARNING PROCEDURE
The optimization of Eq. (7) is extremely difficult due to the following two problems. First, it has a massive number of parameters that need to be optimized at a single iteration; thus directly optimizing it may lead to vanishing gradients [11]. Second, it has a discrete rank term, which cannot be handled directly by differentiation. Correspondingly, we propose two techniques to address these problems.
5.1 From Global to Layer-wise Optimization
The overall objective function obviously has a massive number of complicated parameters and operations, such as matrix inversion, which may lead to unacceptable inefficiency and vanishing gradients. Inspired by the layer-by-layer training of auto-encoders [12], we consider learning the parameters hierarchically in our method. Specifically, we first optimize the pairwise proximity $\Phi^{local}_i$ through Eq. (1), which is followed by adding the constraints to $\Phi^{local}_i$ for each layer in the subspace hierarchy. On the other hand, a unitary transformation does not change the relative positions of vectors [36], thus it is reasonable to learn the parameters layer by layer, alleviating the need to project nodes all the way down to the bottom of the hierarchy.

The hierarchical subspace constraints are essentially projections among spaces, while the other constraints are naturally disassembled in the layer-wise optimization. This projection is similar to the general idea of PCA [35], which selects the spaces that maintain the nearest reconstruction of the vertices before and after projection with high efficiency. Based on the following Lemma 1, we extend PCA to describe the projection of all inner-community nodes into the subspace:
$$\min \Phi^l_4 = \sum_{i=1}^{k} \mathrm{tr}\big((S^l_i)^T U^l_i (U^l_i)^T S^l_i\big), \quad (8)$$

where $\mathrm{tr}(M)$ is the trace of $M$. Eq. (8) constructs a low-dimensional representation of the data, which omits the complicated and inefficient matrix inversion and further improves the capability of denoising.
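The connection to PCA described above can be illustrated with a short NumPy sketch: choosing the projection matrix from the leading singular directions of a community's vertex matrix gives the reconstruction closest to the original vectors among all subspaces of that dimension, which is the property Lemma 1 below relies on. This is only a stand-in for the learned $S^l_i$, under the assumption that the rows of U are vertex vectors.

```python
import numpy as np

def top_subspace(U, d_target):
    """Return a (d_target, d) matrix with orthonormal rows spanned by the
    leading singular directions of U (rows = vertex vectors). Projecting onto
    this subspace and back yields the reconstruction closest to the original
    vectors among all d_target-dimensional subspaces (PCA without centering).
    """
    _, _, Vt = np.linalg.svd(U, full_matrices=False)
    return Vt[:d_target]

# toy usage on one community's vertex vectors
rng = np.random.default_rng(1)
U = rng.normal(size=(200, 64))
S = top_subspace(U, d_target=8)
U_proj = U @ S.T          # coordinates inside the 8-dimensional subspace
U_rec = U_proj @ S        # mapped back into the ambient 64-dimensional space
print(np.linalg.norm(U - U_rec))   # reconstruction error, minimal over all 8-d subspaces
```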
Lemma 1. If Eq. (8) achieves its desired optimal solution, the optimal result learned from the recursive optimization is the same as that of the joint optimization (Eq. (1)) with the constraints in Section 4.2.1.

Proof. Let $d(\vec{u}_i, \vec{u}_j)$ be the Euclidean distance between vertices $v_i$ and $v_j$. The vector $\vec{u}_i$ after projection is denoted as $\vec{u}'_i$. Based on the triangle inequality [1] in Euclidean space,

$$d(\vec{u}_i, \vec{u}_j) \leq d(\vec{u}_i, \vec{u}'_i) + d(\vec{u}'_i, \vec{u}'_j) + d(\vec{u}_j, \vec{u}'_j),$$
$$d(\vec{u}'_i, \vec{u}'_j) \leq d(\vec{u}_i, \vec{u}'_i) + d(\vec{u}_i, \vec{u}_j) + d(\vec{u}_j, \vec{u}'_j).$$

The PCA projection [35] maintains the nearest reconstruction of the vertices before and after projection, which means

$$\lim_{Eq.(8) \to 0} d(\vec{u}_i, \vec{u}'_i) \to 0, \quad d(\vec{u}_j, \vec{u}'_j) \to 0.$$

Thus,

$$d(\vec{u}'_i, \vec{u}'_j) \to d(\vec{u}_i, \vec{u}_j).$$

This means that the relationships among vertices before and after projection are maintained.
5.2 From Discrete to Differentiable Optimization
Rank minimization turns out to be an NP-hard combinatorial problem that is computationally intractable in practical cases. Meanwhile, the rank function is discontinuous, so the objective cannot be optimized with SGD. The tightest possible convex relaxation of the rank in Eq. (6) is to replace it with the nuclear norm, equal to the sum of the singular values [2]:

$$\mathrm{rank}(M) \approx \|M\|_*, \quad (9)$$

where $\|M\|_*$ is the nuclear norm of the matrix $M$.

As the nuclear norm is non-smooth, it still cannot be optimized conveniently with SGD. [18] gives a smooth approximation of the nuclear norm via its conjugate: the sum of Huber penalties applied to the singular values of $M$,

$$\|M\|_* \approx \sum_{i} \phi_{\mu}\big(\Theta_i(M)\big), \quad (10)$$

where $\Theta_i(M)$ are the singular values of $M$, and $\phi_{\mu}$ is

$$\phi_{\mu}(Z) = \begin{cases} \dfrac{Z^2}{2\mu}, & |Z| \leq \mu \\ |Z| - \dfrac{\mu}{2}, & |Z| > \mu \end{cases} \quad (11)$$

where $\mu$ controls accuracy and smoothness, and $Z$ is a singular value $\Theta_i(M)$.
Thus the approximated regularization term $\Phi_{reg}$ is Eq. (12), and the overall objective function in the $l$-th layer is approximated as Eq. (13):

$$\Phi^l_{reg} \approx \sum_{i=1}^{k} \sum_{m} \phi_{\mu}\big(\Theta_m(S^l_i)\big), \quad (12)$$

$$\min_{U^L, S} H^{(l)} = \Phi^l_4 + \alpha \Phi^l_2 + \lambda \Phi^l_{reg}. \quad (13)$$
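Below is a small NumPy sketch of the smoothed surrogate used in Eqs. (9)-(12): the Huber penalty applied to the singular values of each projection matrix. It only evaluates the regularizer (the paper optimizes it jointly with Adam); µ = 10 mirrors the default reported in Section 6.1.

```python
import numpy as np

def huber(z, mu):
    """phi_mu of Eq. (11): quadratic near zero, linear in the tails."""
    z = np.abs(z)
    return np.where(z <= mu, z ** 2 / (2.0 * mu), z - mu / 2.0)

def smoothed_rank(S, mu=10.0):
    """Differentiable surrogate for rank(S), Eqs. (9)-(12): the sum of Huber
    penalties applied to the singular values of the projection matrix S."""
    singular_values = np.linalg.svd(S, compute_uv=False)
    return huber(singular_values, mu).sum()

def layer_regularizer(S_list, mu=10.0):
    """Phi^l_reg of Eq. (12): sum over the k projection matrices of one layer."""
    return sum(smoothed_rank(S, mu) for S in S_list)
```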
It should be noted that, to ensure that the basis vectors of the subspace at each level are orthogonal, $S^l_i$ is orthogonalized [1]. Another advantage of the layer-by-layer learning is that it replaces the pseudo-inverse calculation in the original problem with a matrix orthogonalization of $S^l_i$ at each layer.
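A one-function sketch of this orthogonalization step follows, using a QR decomposition as a numerically stable stand-in for Gram-Schmidt [1]; the (d_l × d_{l−1}) row convention for S follows the shape assumed earlier and is an assumption of this sketch.

```python
import numpy as np

def orthogonalize(S):
    """Re-orthonormalize the rows of a projection matrix S of shape
    (d_l, d_{l-1}), playing the role of the orthogonalization applied to S^l_i."""
    Q, _ = np.linalg.qr(S.T)   # columns of Q are orthonormal
    return Q.T                 # hence the rows of the result are orthonormal
```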
Algorithm 1: The SpaceNE algorithm

function LearnFeatures(Network G, Hierarchical Clustering Tree T)
    U_local = minimize Φ_1 with Adam (see Eq. (1))
    U, S = RecursiveOptimization(c^1_1, T, U_local)
    return U, S

function RecursiveOptimization(Current node c^l_i, Hierarchical Clustering Tree T, Representation Set U_local)
    if l = L then return U, S
    {S^{l+1}_j} = minimize H^{(l)}_i with Adam (see Eq. (13))
    for all c^{l+1}_j ∈ ch(c^l_i) do
        S^{l+1}_j = Orthogonalize(S^{l+1}_j)
        U^{l+1}_j = S^{l+1}_j U^l_i
    for all c^{l+1}_j ∈ ch(c^l_i) do
        U, S = RecursiveOptimization(c^{l+1}_j, T, U_local)
    return U, S
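To clarify the control flow of Algorithm 1, here is a self-contained Python sketch of the top-down recursion over a toy two-level hierarchy. The Adam optimizations of Eqs. (1) and (13) are replaced by stand-ins (random embeddings and leading singular directions), and, for brevity, each child inherits all of its parent's rows rather than only its own community's vertices, so this illustrates the recursion only, not the real objectives.

```python
import numpy as np

# A toy hierarchy: community -> list of child communities,
# together with the target subspace dimension of every community.
children = {"root": ["c1", "c2"], "c1": [], "c2": []}
dims = {"root": 16, "c1": 8, "c2": 8}

rng = np.random.default_rng(0)

def minimize_phi1(num_vertices, d):
    # stand-in for minimizing Eq. (1) with Adam; returns random embeddings here
    return rng.normal(size=(num_vertices, d))

def fit_child_projection(U_parent, d_child):
    # stand-in for minimizing H^(l) of Eq. (13); here: leading singular directions
    _, _, Vt = np.linalg.svd(U_parent, full_matrices=False)
    return Vt[:d_child]

def orthogonalize(S):
    # re-orthonormalize the rows of S, as in Algorithm 1
    Q, _ = np.linalg.qr(S.T)
    return Q.T

def recursive_optimization(node, U, S):
    """Mirrors RecursiveOptimization in Algorithm 1: fit and orthogonalize the
    child projection matrices, project the parent coordinates down, recurse."""
    for child in children[node]:
        S[child] = orthogonalize(fit_child_projection(U[node], dims[child]))
        U[child] = U[node] @ S[child].T       # u^{l+1}_j = S^{l+1}_j u^l_i, row-wise
        recursive_optimization(child, U, S)
    return U, S

U = {"root": minimize_phi1(num_vertices=50, d=dims["root"])}
U, S = recursive_optimization("root", U, {})
print({name: emb.shape for name, emb in U.items()})
```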
5.3 The Subspace Algorithm
Algorithm 1 gives the pseudocode of SpaceNE. In the function LearnFeatures, we optimize the local objective Eq. (1) to learn node embedding vectors preserving local information, and then adjust the embedding vectors to add the subspace constraints from top to bottom. In the function RecursiveOptimization, the set of projection matrices {S^{l+1}_j} is obtained by optimizing objective (13) followed by orthogonalization. We obtain the node representations of the next layer, U^{l+1}_j, by matrix multiplication.
Time Complexity Analysis. Without loss of generality, we assume that $T$ is a $k$-ary tree and that the dimension of the subspaces in the $l$-th layer decreases with depth, i.e. $d_l = \lfloor \log_k(D - l) \rfloor$. SpaceNE is run on each tree node layer by layer, and the learning procedures on tree nodes of the same layer can be parallelized. For a certain node $c^l_i$ in $T$, the complexity of calculating the loss $\Phi^l_2$ is $O(k(\log_k(D - l))^2)$, and the complexity of the loss $\Phi^l_4$ is $O((k \log_k(D - l))^2)$. Therefore, the time complexity of optimizing the overall loss is $O(E(k \log_k(D - l))^2)$, where $E$ is the number of training epochs. Thus the total complexity of SpaceNE is $O(|V| E (k \log_k D)^2)$. The optimization algorithm is implemented on the TensorFlow platform and can thus be accelerated with GPUs.
6 EXPERIMENTS
6.1 Experiment Setup
Datasets. We employ real networks from the Facebook social networks collection [30]. The datasets used in our paper and their basic attributes are shown in Table 1. Moreover, a synthetic network containing a 6-layer hierarchy is used to evaluate the performance of hierarchical structure preservation. We generate a self-similar network with the hierarchical network generation algorithm introduced in [10]. Specifically, we first generate a 6-layer trident (3-ary) tree and use the leaves of the tree as the nodes of the network. Then we add edges to node pairs with probability proportional to the path length between them in the k-ary tree.
Table 1: Dataset statistics and properties.

Datasets       | Vertices | Edges  | Layers | Classes | Labels
Amherst        | 2314     | 96394  | 2      | 5       | ✓
Georgetown     | 9414     | 425639 | 2      | 5       | ✓
UC             | 16808    | 522148 | 2      | 5       | ✓
Sync_6         | 729      | 21735  | 6      | ×       | ×
Sync_show      | 125      | 982    | 4      | ×       | ×
Amherst_noise  | 2314     | 91409  | 2      | 5       | ✓
Relevant Algorithms. In the experiments, we compare SpaceNE against the following baselines:
• MNMF [34]: a single-layer community structure preserving baseline, which integrates the community information through matrix factorization.
• GNE [7]: a multi-layer community structure preserving baseline, which embeds communities onto the surfaces of spheres.
• DeepWalk [23]: a method that combines truncated random walks [9] with the Skip-Gram model [17] to learn vertex embeddings.
• LINE [28]: a popular baseline based on preserving the first- and second-order relational information among vertices.
• Struc2Vec [24]: measures node similarity at different scales and uses a multilayer graph to encode structural similarities.
• SpectralClustering [29]: learns the vertex representations by factorizing the Laplacian matrix.
Parameter Settings. We conducted many experiments, and the optimal default parameters are selected as follows: $\mu = 10$, $\alpha = 1$, $\lambda = 10^{-3}$. In our experiments, the embedding size $m$ of all models is 64. Besides, the parameter settings of the comparison models follow the recommended settings in the relevant code packages. Specifically, for DeepWalk, the walk length is set to 40 and the window size is set to 10. For GNE, the scaling radius is set to 3.0, $\lambda$ is set to 0.2, $\theta$ is set to 3, the initial radius is set to 105, the minimum radius is set to 0.05, and the maximum radius is set to 0.25. For LINE, we set the number of negative samples used in negative sampling to 5, the starting value of the learning rate to 0.025, and the total number of training samples to 100M; we consider both first- and second-order information in LINE. For Struc2Vec, the walk length is set to 10, the number of walks is set to 80, and the stay probability is set to 0.3. For MNMF, the number of clusters is set to 10, $\lambda$ is set to 0.2, $\beta$ is set to 0.05, $\eta$ is set to 5.0, and the parameter "lower-control" is set to $10^{-15}$. For SpectralClustering, we use PCA to reduce the dimension of the Laplacian matrix.
6.2 Alleviating Sparsity
To evaluate SpaceNE's ability to alleviate sparsity, we conduct experiments on node classification and on its resistance to randomly added edge noise, with respect to link prediction.
Table 2: The multi-label classification results (accuracy) on different percentages of training data.

Model              | Amherst                 | Georgetown              | UC
                   | 30%   50%   70%   90%   | 30%   50%   70%   90%   | 30%   50%   70%   90%
SpaceNE            | 92.52 93.11 93.74 95.09 | 56.12 56.42 56.92 56.54 | 88.69 89.02 89.23 90.07
GNE                | 93.17 93.33 93.26 93.52 | 52.19 53.53 53.75 53.12 | 87.78 88.42 88.42 87.57
MNMF               | 87.11 88.04 89.23 89.96 | 51.52 51.69 51.60 53.25 | 87.89 87.95 88.09 88.10
DeepWalk           | 91.09 91.26 91.71 92.03 | 51.45 53.25 53.76 54.03 | 88.35 88.42 88.51 88.63
LINE               | 91.11 91.53 91.89 91.67 | 51.35 51.93 52.18 52.38 | 87.71 87.88 87.95 87.53
Struc2Vec          | 72.72 73.35 73.92 77.23 | 46.85 47.44 48.33 47.59 | 87.96 87.89 88.11 88.25
SpectralClustering | 72.88 73.51 73.89 74.41 | 49.67 50.02 50.79 51.23 | 84.23 84.35 84.31 84.21
6.2.1 Node Classification. Three real-world social networks from the Facebook datasets, each with a four-layer hierarchical tree (including root and leaves), are used in the vertex classification experiment. The two intermediate layers are divided by enrollment year and major, respectively. For MNMF, we use enrollment year as the indicator for community division. The learned representations are used to classify the vertices into a set of labels. The classifier we use is Logistic Regression, and the evaluation metric is Accuracy. Different percentages of nodes are sampled randomly for evaluation, and the rest are used for training. The results are averaged over 10 different runs.

Table 2 shows that SpaceNE performs well in most cases. Although MNMF, GNE and SpaceNE all take community information into consideration, SpaceNE still performs better, which indicates that the low-rank representation of SpaceNE plays an important role. Although it considers some global network structure, Struc2Vec does not work very well. Compared with Amherst, our model performs better on the Georgetown and UC datasets, because our model is more stable on larger datasets after integrating the hierarchical community structure information.
6.2.2 Resistance to Random Noise. Besides preserving the deep hierarchical community structure, SpaceNE tends to be more resistant to random noise due to the inherent properties of subspaces, hence generating node embeddings that remain robust regardless of noise. On one hand, the reconstruction optimization term (Eq. (8)) of SpaceNE is similar to PCA, which maintains its noise reduction feature. On the other hand, the low-rank optimization term (Eq. (6)) of SpaceNE can learn the most important features of the network and thereby improve the resistance to noise [15]. In order to verify the noise resistance of SpaceNE, we conduct several experiments and compare the results of SpaceNE with other community- or global-structure preserving methods (MNMF, GNE and Struc2Vec). We construct the noisy data according to the method introduced by [32]. Specifically, the dataset we use is Amherst_noise (see Table 1): we contaminate Amherst by randomly adding edges amounting to 5% of the total number of edges and deleting 5% of the existing edges. Then, we conduct link prediction tasks using SpaceNE and its counterparts on the Amherst and Amherst_noise datasets for comparison.
The results are shown in Table 3. The values in Table 3 represent the reduction, in percentage, in precision and recall of the algorithms after adding the noise. "Gain of SpaceNE" is the improvement of SpaceNE with respect to the best of the other baseline methods. It can be seen from Table 3 that all the algorithms degrade after contamination, but SpaceNE still performs better than its counterparts. Moreover, we can find that, compared with the other community preserving methods (MNMF and GNE), SpaceNE not only preserves a deeper hierarchical community structure but is also less affected by noise in a complex network.

Table 3: The link prediction results on different percentages of the training dataset, Amherst_noise. The values are the reduction in precision (or recall) of the algorithms after adding the noise.

Model               | precision 70% | precision 90% | recall 70% | recall 90%
SpaceNE             | -0.54         | -0.24         | -0.01      | -0.01
GNE                 | -1.26         | -0.96         | -0.03      | -0.26
MNMF                | -2.71         | -2.05         | -0.02      | -0.06
Struc2Vec           | -1.01         | -0.99         | -0.02      | -0.02
Gain of SpaceNE [%] | 46.53         | 75            | 50         | 50
6.3 Alleviating Space Warps
To evaluate how SpaceNE alleviates space warps, we conduct experiments on link prediction, verifying whether it relieves the problem suffered by its community-aware counterparts. In addition, we present an illustration of how GNE and SpaceNE preserve distances between pairwise intra-community nodes as the depth increases.
6.3.1 Link Prediction. We conduct link prediction experiments on all three datasets. We sample a proportion of edges from the initial network, which are used as positive samples, along with an identical number of random negative edges. We take the inner product of the embedding vectors as the score for each sample; the samples are then ranked, and those with scores in the top 50% are predicted as "positive" edges. We report precision as the metric, which is equivalent to recall since we use identical numbers of positive and negative samples.
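The evaluation protocol just described can be sketched as follows; the edge sampling is omitted, and the embeddings and candidate edges are toy placeholders rather than the actual experimental data.

```python
import numpy as np

def link_prediction_precision(U, pos_edges, neg_edges):
    """Score every candidate edge by the inner product of its endpoint
    embeddings and predict the top-50% scored samples as positive, as in
    Section 6.3.1; with equally many positive and negative samples,
    precision equals recall."""
    edges = pos_edges + neg_edges
    labels = np.array([1] * len(pos_edges) + [0] * len(neg_edges))
    scores = np.array([U[i] @ U[j] for i, j in edges])
    predicted = scores > np.median(scores)       # keep the top half
    return (labels[predicted] == 1).mean()

# toy usage with random embeddings and hand-picked edge samples
rng = np.random.default_rng(0)
U = rng.normal(size=(30, 8))
pos = [(0, 1), (2, 3), (4, 5), (6, 7)]
neg = [(0, 9), (10, 20), (3, 25), (8, 29)]
print(link_prediction_precision(U, pos, neg))
```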
The results are reported in Table 4. Intuitively, community-aware methods are inherently compromised in link prediction tasks: compared to Skip-Gram based models, which primarily preserve proximity between pairwise nodes, their community-aware counterparts tend to sacrifice such pairwise properties for better modeling of clustering properties and hence warp the space where vertices reside, which is exemplified by GNE and MNMF being gravely defeated by Skip-Gram models including DeepWalk. In contrast, our model, SpaceNE, is able to address more complex community structures without compromising its performance compared with its Skip-Gram counterparts. It is hence concluded that our modeling of communities using subspaces possesses a wider range of applicable fields in that it does not enhance community structures at the expense of other key properties.
6.3.2 Distance between pairwise, intra-community nodes. In order to demonstrate why SpaceNE performs better on the link prediction task than other community preserving methods, we examine the average distance between pairwise intra-community nodes, which is affected most significantly by space warps in GNE.

As SpaceNE is able to keep the distances among nodes reasonable, without making them too close as GNE does, we calculate the average distance of intra-community node pairs on different layers. Specifically, the average distance of intra-community node pairs on layer $l$ is calculated as

$$\frac{1}{|M_l|} \sum_{i=1}^{|M_l|} A\_Dis(C^l_i), \quad (14)$$

where $M_l$ is the number of communities on the $l$-th layer and $A\_Dis$ is the average distance over all node pairs belonging to the $i$-th community on the $l$-th layer, $C^l_i$, which can be calculated by

$$A\_Dis(C^l_i) = \frac{1}{|C^l_i|\,(|C^l_i| - 1)} \sum_{x, y \in C^l_i,\, x \neq y} D(x, y). \quad (15)$$

$D(x, y)$ is the Euclidean distance between the vector representations of node $x$ and node $y$. We use Sync_6, introduced previously, an artificial network with a 6-layer community hierarchy. Fig. 3 shows the results with varying numbers of layers, where the y-coordinate scale is logarithmic. Fig. 3 shows that the average distance decays exponentially in GNE, while linearly in SpaceNE. This result partly explains why SpaceNE is superior to GNE in essence: in GNE, the nodes become too close to each other to be distinguished.
Table 4: The link prediction results on different datasets.

Model              | Amherst | Georgetown | UC
SpaceNE            | 85.61   | 89.28      | 91.32
GNE                | 62.07   | 68.97      | 51.25
MNMF               | 48.89   | 49.76      | 50.05
DeepWalk           | 86.40   | 89.16      | 91.39
LINE               | 74.37   | 76.58      | 71.22
Struc2Vec          | 51.77   | 49.94      | 46.83
SpectralClustering | 37.76   | 40.63      | 38.68
Figure 3: Average distance of intra-community node pairs with an increasing number of layers (logarithmic y-axis), for SpaceNE and GNE.

6.4 Eliminating the "Curse" of Depth
To evaluate the performance of modeling extremely deep hierarchies, experiments are conducted on a deep hierarchical community detection task. The basic properties of our dataset "Sync_6" are shown in Table 1; "Sync_6" is a 6-layer synthetic dataset. We apply K-means to the learned node representations and compare the differences between the clustering results and the prior community division at different layers of the hierarchical community tree. The Adjusted Rand Index (ARI), ranging from -1 to 1, reflects the similarity between two sets, and is thus applied as the index to evaluate the performance of community preservation. The closer the ARI is to 1, the better the performance of community detection is.
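A short scikit-learn sketch of this evaluation is given below: K-means is run on the learned embeddings with k set to the number of ground-truth communities at a given layer, and the ARI is computed against the prior community division. The toy data stand in for the actual SpaceNE embeddings of Sync_6.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def layer_ari(embeddings, true_labels, random_state=0):
    """Cluster the node embeddings with K-means (k = number of ground-truth
    communities at this layer) and report the Adjusted Rand Index against the
    prior community division, as done in Section 6.4."""
    k = len(set(true_labels))
    pred = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(embeddings)
    return adjusted_rand_score(true_labels, pred)

# toy usage: three well-separated clusters should give an ARI close to 1
rng = np.random.default_rng(0)
emb = np.vstack([rng.normal(c, 0.1, size=(20, 16)) for c in (0.0, 3.0, 6.0)])
labels = [0] * 20 + [1] * 20 + [2] * 20
print(layer_ari(emb, labels))
```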
Fig. 5 illustrates the effect of the number of layers on hierarchical community detection (excluding the root, as there is only one community in the root layer). The results show that the hierarchical community structure can be integrally preserved by SpaceNE, no matter how deep the hierarchical community is or how many communities exist within the network. Clearly, when the number of layers reaches 4, the radii of the spheres in GNE become unduly small, and the accuracy of community detection begins to decline, which is probably caused by underflow as the number of layers increases and the spheres shrink.

Besides, the experimental results illustrate that DeepWalk and LINE are incompetent at structure preservation. While DeepWalk can preserve part of the hierarchical information in the process of random walks, the results show that when the number of layers exceeds 4, the performance of DeepWalk drastically declines. LINE performs even worse than DeepWalk, as LINE only considers the local structure at most 2 steps away from each node and does not model the hierarchical community information. MNMF only preserves communities at some layers but not all, leading to poor performance after a few layers. It is worth mentioning that, as a global structure preserving algorithm, Struc2Vec performs the best at layer 5, while becoming worse at deeper layers.
6.5 Efficiency
To demonstrate the efficiency advantage of SpaceNE, we compare the running time of SpaceNE, GNE, Struc2Vec and MNMF, the methods capturing community or global structure. The three datasets, from small to large, are described in Table 1. All efficiency experiments were conducted on a single machine with a 12GB-memory GPU. Results are presented in Fig. 6. MNMF only considers a single-layer community, so its running time is shorter. Although Struc2Vec captures some global structural information, its optimization is too slow, which makes it unsuitable for large networks. Although GNE and SpaceNE have similar time complexity in theory, the convergence of SpaceNE is faster. The reason is that GNE preserves the community structure through projections onto spheres; compared with the linear matrix multiplication of subspaces, the optimization of GNE is more complex and requires more epochs. The result shows that SpaceNE is scalable to large networks.

Figure 4: The visualization of vertex representations in 2-D space from different models: (a) SpaceNE, (b) GNE, (c) MNMF, (d) Struc2Vec, (e) LINE, (f) DeepWalk.

Figure 5: The comparison of hierarchical community preservation (ARI versus number of layers) for different models on a generated 6-layer hierarchical network.
6.6 Visualization
We visualize the "Sync_show" network used in [7]. Fig. 4 shows the visualization experiments. For all methods in the comparison, we first embed the network into a low-dimensional space and then map the low-dimensional vectors of the vertices to a 2-D space with the t-SNE [16] package. Note that [7] maps GNE directly into two-dimensional space, which is different from the other methods; in order to unify the experimental standard, we do not use that approach in this work and instead use t-SNE to conduct dimensionality reduction for all methods.

Figure 6: The running time on different datasets (Amherst, Georgetown, UC) for MNMF, Struc2Vec, GNE and SpaceNE.
Fig. 4 shows that SpaceNE preserves both the relationships among communities and the relationships within each community. Although GNE keeps the relationships among communities, the nodes from the same community are too close to each other, which means the relationships among the nodes within a community are ignored. This demonstrates the space warp of GNE again, from a new perspective. Additionally, SpaceNE shows outstanding performance in clustering vertices compared with the other methods.
7 CONCLUSION AND FUTURE WORK
In this paper, we proposed SpaceNE, introducing subspaces to the field of community-preserving network embedding. To the best of our knowledge, this work is the first attempt to introduce subspaces into network representation learning. Specifically, we design elaborate objectives preserving proximity between pairwise nodes and across communities, along with constraints on subspace dimension which are approximated by a differentiable term, leading to efficient optimization of our model. Empirically, we verify SpaceNE on a variety of datasets and applications. Extensive experimental results demonstrate the advantages of SpaceNE, especially on link prediction and hierarchical community preservation.

Here we focus on hierarchical network embedding using subspace theory. Nevertheless, the theory of subspaces is still largely overlooked in the field of network representation learning. For future work, one intriguing direction is utilizing subspace theory to deal with the heterogeneity of complex networks. Also, in real-life scenarios such as neighborhood-based recommendation, when searching for the nearest neighbor of an item, the search engine only needs to search in the lower-dimensional subspace, which can greatly improve efficiency.
ACKNOWLEDGMENTS
We are thankful to Yizhou Zhang for his helpful suggestions. This
work was supported by the National Natural Science Foundation
of China (Grant No. 61876006 and No. 61572041).
REFERENCES
[1] Åke Björck. 1994. Numerics of Gram-Schmidt orthogonalization. Linear Algebra and Its Applications 197 (1994), 297–316.
[2] Emmanuel J. Candes and Benjamin Recht. 2009. Exact Matrix Completion via Convex Optimization. Foundations of Computational Mathematics 9, 6 (2009), 717–772.
[3] Aaron Clauset, Cristopher Moore, and Mark E. J. Newman. 2008. Hierarchical structure and the prediction of missing links in networks. Nature 453, 7191 (2008), 98.
[4] Aaron Clauset, Cristopher Moore, and M. E. J. Newman. 2006. Structural inference of hierarchies in networks. International Conference on Machine Learning (2006), 1–13.
[5] Peng Cui, Xiao Wang, Jian Pei, and Wenwu Zhu. 2018. A Survey on Network Embedding. IEEE Transactions on Knowledge and Data Engineering (2018), 1–1.
[6] Chris Ding, Ding Zhou, Xiaofeng He, and Hongyuan Zha. 2006. R1-PCA: rotational invariant L1-norm principal component analysis for robust subspace factorization. In Proceedings of the 23rd International Conference on Machine Learning. ACM, 281–288.
[7] Lun Du, Zhicong Lu, Yun Wang, Guojie Song, Yiming Wang, and Wei Chen. 2018. Galaxy network embedding: a hierarchical community structure preserving approach. In Proceedings of the 27th International Joint Conference on Artificial Intelligence. AAAI Press, 2079–2085.
[8] Lun Du, Yun Wang, Guojie Song, Zhicong Lu, and Junshan Wang. 2018. Dynamic network embedding: an extended approach for skip-gram based network embedding. In Proceedings of the 27th International Joint Conference on Artificial Intelligence. AAAI Press, 2086–2092.
[9] Francois Fouss, Alain Pirotte, Jean-Michel Renders, and Marco Saerens. 2007. Random-Walk Computation of Similarities between Nodes of a Graph with Application to Collaborative Recommendation. IEEE Transactions on Knowledge and Data Engineering 19, 3 (2007), 355–369.
[10] Michelle Girvan and Mark E. J. Newman. 2002. Community structure in social and biological networks. Proceedings of the National Academy of Sciences 99, 12 (2002), 7821–7826.
[11] Robert Hecht-Nielsen. 1988. Theory of the backpropagation neural network. Neural Networks 1 (1988), 445–448.
[12] Geoffrey E. Hinton and Ruslan R. Salakhutdinov. 2006. Reducing the dimensionality of data with neural networks. Science 313, 5786 (2006), 504–507.
[13] Omer Levy and Yoav Goldberg. 2014. Neural word embedding as implicit matrix factorization. In Advances in Neural Information Processing Systems. 2177–2185.
[14] Ziyao Li, Liang Zhang, and Guojie Song. 2019. SepNE: Bringing separability to network embedding. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 4261–4268.
[15] Guangcan Liu, Zhouchen Lin, and Yong Yu. 2010. Robust subspace segmentation by low-rank representation. In Proceedings of the 27th International Conference on Machine Learning (ICML-10). 663–670.
[16] Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing Data using t-SNE. Journal of Machine Learning Research 9 (2008), 2579–2605.
[17] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013).
[18] Yurii Nesterov. 2005. Smooth minimization of non-smooth functions. Mathematical Programming 103, 1 (2005), 127–152.
[19] Mark E. J. Newman. 2003. The structure and function of complex networks. SIAM Review 45, 2 (2003), 167–256.
[20] M. E. J. Newman. 2006. Finding community structure in networks using the eigenvectors of matrices. Physical Review E 74, 3 (2006), 036104.
[21] M. E. J. Newman and Michelle Girvan. 2004. Finding and evaluating community structure in networks. Physical Review E 69, 2 (2004), 026113.
[22] Maximillian Nickel and Douwe Kiela. 2017. Poincaré embeddings for learning hierarchical representations. In Advances in Neural Information Processing Systems. 6338–6347.
[23] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. DeepWalk: online learning of social representations. In Knowledge Discovery and Data Mining (2014), 701–710.
[24] Leonardo Filipe Rodrigues Ribeiro, Pedro H. P. Saverese, and Daniel R. Figueiredo. 2017. struc2vec: Learning Node Representations from Structural Identity. In Knowledge Discovery and Data Mining (2017), 385–394.
[25] Huawei Shen, Xueqi Cheng, Kai Cai, and Mao-Bin Hu. 2009. Detect overlapping and hierarchical community structure in networks. Physica A: Statistical Mechanics and Its Applications 388, 8 (2009), 1706–1712.
[26] Victor Spirin and Leonid A. Mirny. 2003. Protein complexes and functional modules in molecular networks. Proceedings of the National Academy of Sciences of the United States of America 100, 21 (2003), 12123–12128.
[27] Gilbert Strang. 1993. Introduction to Linear Algebra. Vol. 3. Wellesley-Cambridge Press, Wellesley, MA.
[28] Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. LINE: Large-scale Information Network Embedding. In International Conference on World Wide Web. 1067–1077.
[29] Lei Tang and Huan Liu. 2011. Leveraging social media networks for classification. Data Mining and Knowledge Discovery 23, 3 (2011), 447–478.
[30] Amanda L. Traud, Peter J. Mucha, and Mason A. Porter. 2012. Social structure of Facebook networks. Social Science Electronic Publishing 391, 16 (2012), 4165–4180.
[31] René Vidal. 2011. Subspace clustering. IEEE Signal Processing Magazine 28, 2 (2011), 52–68.
[32] Pascal Vincent, Hugo Larochelle, Isabelle Lajoie, Yoshua Bengio, and Pierre-Antoine Manzagol. 2010. Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion. Journal of Machine Learning Research 11 (2010), 3371–3408.
[33] Junshan Wang, Zhicong Lu, Guojie Song, Yue Fan, Lun Du, and Wei Lin. 2019. Tag2Vec: Learning Tag Representations in Tag Networks. In The World Wide Web Conference. ACM, 3314–3320.
[34] Xiao Wang, Peng Cui, Jing Wang, Jian Pei, Wenwu Zhu, and Shiqiang Yang. 2017. Community Preserving Network Embedding. In AAAI Conference on Artificial Intelligence.
[35] Svante Wold, Kim Esbensen, and Paul Geladi. 1987. Principal component analysis. Chemometrics and Intelligent Laboratory Systems 2, 1-3 (1987), 37–52.
[36] Zi Yin and Yuanyuan Shen. 2018. On the Dimensionality of Word Embedding. In Advances in Neural Information Processing Systems. 895–906.
[37] Yizhou Zhang, Guojie Song, Lun Du, Shuwen Yang, and Yilun Jin. 2019. DANE: Domain Adaptive Network Embedding. In Proceedings of the 28th International Joint Conference on Artificial Intelligence. AAAI Press.
Article
In this paper, we address the problem of unsuperised social network embedding, which aims to embed network nodes, including node attributes, into a latent low dimensional space. In recent methods, the fusion mechanism of node attributes and network structure has been proposed for the problem and achieved impressive prediction performance. However, the non-linear property of node attributes and network structure is not efficiently fused in existing methods, which is potentially helpful in learning a better network embedding. To this end, in this paper, we propose a novel model called ASM (Adaptive Specific Mapping) based on encoder-decoder framework. In encoder, we use the kernel mapping to capture the non-linear property of both node attributes and network structure. In particular, we adopt two feature mapping functions, namely an untrainable function for node attributes and a trainable function for network structure. By the mapping functions, we obtain the low dimensional feature vectors for node attributes and network structure, respectively. Then, we design an attention layer to combine the learning of both feature vectors and adaptively learn the node embedding. In encoder, we adopt the component of reconstruction for the training process of learning node attributes and network structure. We conducted a set of experiments on seven real-world social network datasets. The experimental results verify the effectiveness and efficiency of our method in comparison with state-of-the-art baselines.
Chapter
Graph representation learning has demonstrated improved performance in tasks such as link prediction and node classification across a range of domains. Research has shown that many natural graphs can be organized in hierarchical communities, leading to approaches that use these communities to improve the quality of node representations. However, these approaches do not take advantage of the learned representations to also improve the quality of the discovered communities and establish an iterative and joint optimization of representation learning and community discovery. In this work, we present Mazi, an algorithm that jointly learns the hierarchical community structure and the node representations of the graph in an unsupervised fashion. To account for the structure in the node representations, Mazi generates node representations at each level of the hierarchy, and utilizes them to influence the node representations of the original graph. Further, the communities at each level are discovered by simultaneously maximizing the modularity metric and minimizing the distance between the representations of a node and its community. Using multi-label node classification and link prediction tasks, we evaluate our method on a variety of synthetic and real-world graphs and demonstrate that Mazi outperforms other hierarchical and non-hierarchical methods.KeywordsNetworksNetwork embeddingUnsupervised learningGraph representation learningHierarchical clusteringCommunity detection
Article
Graph embedding, which aims to learn low-dimensional node representations to preserve original graph structures, has attracted extensive research interests. However, most existing graph embedding models represent nodes in Euclidean spaces, which cannot effectively preserve complex patterns, e.g., hierarchical structures. Very recently, several hyperbolic embedding models have been proposed to preserve the hierarchical information in negative curvature spaces. Nevertheless, existing hyperbolic models fail to model the asymmetric proximity between nodes. To address this, we investigate a new asymmetric hyperbolic network representation problem, which targets at jointly preserving the hierarchical structures and asymmetric proximity for general directed graphs. We solve this problem by proposing a novel Ro tated L orentzian E mbedding (ROLE) model, which yields two main benefits. First, our model can effectively capture both implicit and explicit hierarchical structures that come from the network topology and category information of nodes, respectively. Second, it can model the asymmetric proximity using rotation transformations. Specifically, we represent each node with a Lorentzian embedding vector, and learn two rotation matrices to reflect the direction of edges. We conduct extensive experiments on four real-world directed graph datasets. Empirical results demonstrate that the proposed approach consistently outperforms various state-of-the-art embedding models. In particular, ROLE achieves HR@1 scores up to 19.8% higher and NDCG@5 scores up to 11.3% higher than the best baselines on the task of node recommendation.
Article
Hidden community is a useful concept proposed recently for social network analysis. Hidden communities indicate some weak communities whose most members also belong to other stronger dominant communities. Dominant communities could form a layer that partitions all the individuals of a network, and hidden communities could form other layer(s) underneath. These layers could be natural structures in the real-world networks like students grouped by major, minor, hometown, etc. To handle the rapid growth of network scale, in this work, we explore the detection of hidden communities from the local perspective, and propose a new method that detects and boosts each layer iteratively on a subgraph sampled from the original network. We first expand the seed set from a single seed node based on our modified local spectral method and detect an initial dominant local community. Then we temporarily remove the members of this community as well as their connections to other nodes, and detect all the neighborhood communities in the remaining subgraph, including some “broken communities” that only contain a fraction of members in the original network. The local community and neighborhood communities form a dominant layer, and by reducing the edge weights inside these communities, we weaken this layer’s structure to reveal the hidden layers. Eventually, we repeat the whole process and all communities containing the seed node can be detected and boosted iteratively. We theoretically show that our method can avoid some situations that a broken community and the local community are regarded as one community in the subgraph, leading to the inaccuracy on detection which can be caused by global hidden community detection methods. Extensive experiments show that our method could significantly outperform the state-of-the-art baselines designed for either global hidden community detection or multiple local community detection.
Article
A Symmetric Non-negative Matrix Factorization (SNMF)-based network embedding model adopts a unique Latent Factor (LF) matrix for describing the symmetry of an undirected network, which reduces its representation ability to the target network and thus resulting in accuracy loss when performing community detection. To address this issue, this paper proposes a new undirected network embedding model, i.e., Alternating Direction Method of Multipliers ( A DMM)-based, M odularity, S ymmetry and N onnegativity-constrained E mbedding (AMSNE), which can be applicable to undirected, weighted or unweighted networks. It relies on two-fold ideas: a) Introducing the symmetry constraints into the model to correctly describe the symmetric of an undirected network without accuracy loss; and b) Adopting the ADMM principle to efficiently solve its constrained objective. Extensive experiments on eight real-world networks strongly evidence that the proposed AMSNE outperform several state-of-the-art models, making it suitable for real applications.
Conference Paper
Full-text available
Network embedding is a method to learn low-dimensional representation vectors for nodes in complex networks. In real networks, nodes may have multiple tags but existing methods ignore the abundant semantic and hierarchical information of tags. This information is useful to many network applications and usually very stable. In this paper, we propose a tag representation learning model, Tag2Vec, which mixes nodes and tags into a hybrid network. Firstly, for tag networks, we define semantic distance as the proximity between tags and design a novel strategy, parameterized random walk, to generate context with semantic and hierarchical information of tags adaptively. Then, we propose hyperbolic Skip-gram model to express the complex hierarchical structure better with lower output dimensions. We evaluate our model on the NBER U.S. patent dataset and WordNet dataset. The results show that our model can learn tag representations with rich semantic information and it outperforms other baselines.
Conference Paper
Full-text available
Network embedding, as an approach to learn low-dimensional representations of vertices, has been proved extremely useful in many applications. Lots of state-of-the-art network embedding methods based on Skip-gram framework are efficient and effective. However, these methods mainly focus on the static network embedding and cannot naturally generalize to the dynamic environment. In this paper, we propose a stable dynamic embedding framework with high efficiency. It is an extension for the Skip-gram based network embedding methods, which can keep the optimality of the objective in the Skip-gram based methods in theory. Our model can not only generalize to the new vertex representation, but also update the most affected original vertex representations during the evolvement of the network. Multi-class classification on three real-world networks demonstrates that, our model can update the vertex representations efficiently and achieve the performance of retraining simultaneously. Besides, the visualization experimental result illustrates that, our model is capable of avoiding the embedding space drifting.
Conference Paper
Full-text available
Network embedding is a method of learning a low-dimensional vector representation of network vertices under the condition of preserving different types of network properties. Previous studies mainly focus on preserving structural information of vertices at a particular scale, like neighbor information or community information, but cannot preserve the hierarchical community structure, which would enable the network to be easily analyzed at various scales. Inspired by the hierarchical structure of galaxies, we propose the Galaxy Network Embedding (GNE) model, which formulates an optimization problem with spherical constraints to describe the hierarchical community structure preserving network embedding. More specifically, we present an approach of embedding communities into a low dimensional spherical surface, the center of which represents the parent community they belong to. Our experiments reveal that the representations from GNE preserve the hierarchical community structure and show advantages in several applications such as vertex multi-class classification and network visualization. The source code of GNE is available online.
Article
Full-text available
Network embedding assigns nodes in a network to low-dimensional representations and effectively preserves the network structure. Recently, a significant amount of progresses have been made toward this emerging network analysis paradigm. In this survey, we focus on categorizing and then reviewing the current development on network embedding methods, and point out its future research directions. We first summarize the motivation of network embedding. We discuss the classical graph embedding algorithms and their relationship with network embedding. Afterwards and primarily, we provide a comprehensive overview of a large number of network embedding methods in a systematic manner, covering the structure- and property-preserving network embedding methods, the network embedding methods with side information and the advanced information preserving network embedding methods. Moreover, several evaluation approaches for network embedding and some useful online resources, including the network data sets and softwares, are reviewed, too. Finally, we discuss the framework of exploiting these network embedding methods to build an effective system and point out some potential future directions.
Article
Many successful methods have been proposed for learning low dimensional representations on large-scale networks, while almost all existing methods are designed in inseparable processes, learning embeddings for entire networks even when only a small proportion of nodes are of interest. This leads to great inconvenience, especially on super-large or dynamic networks, where these methods become almost impossible to implement. In this paper, we formalize the problem of separated matrix factorization, based on which we elaborate a novel objective function that preserves both local and global information. We further propose SepNE, a simple and flexible network embedding algorithm which independently learns representations for different subsets of nodes in separated processes. By implementing separability, our algorithm reduces the redundant efforts to embed irrelevant nodes, yielding scalability to super-large networks, automatic implementation in distributed learning and further adaptations. We demonstrate the effectiveness of this approach on several real-world networks with different scales and subjects. With comparable accuracy, our approach significantly outperforms state-of-the-art baselines in running times on large networks.
Conference Paper
Recent works reveal that network embedding techniques enable many machine learning models to handle diverse downstream tasks on graph structured data. However, as previous methods usually focus on learning embeddings for a single network, they can not learn representations transferable on multiple networks. Hence, it is important to design a network embedding algorithm that supports downstream model transferring on different networks, known as domain adaptation. In this paper, we propose a novel Domain Adaptive Network Embedding framework, which applies graph convolutional network to learn transferable embeddings. In DANE, nodes from multiple networks are encoded to vectors via a shared set of learnable parameters so that the vectors share an aligned embedding space. The distribution of embeddings on different networks are further aligned by adversarial learning regularization. In addition, DANE's advantage in learning transferable network embedding can be guaranteed theoretically. Extensive experiments reflect that the proposed framework outperforms other state-of-the-art network embedding baselines in cross-network domain adaptation tasks.
Conference Paper
We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks. We observe large improvements in accuracy at much lower computational cost, i.e. it takes less than a day to learn high quality word vectors from a 1.6 billion words data set. Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.
Article
Representation learning has become an invaluable approach for learning from symbolic data such as text and graphs. However, while complex symbolic datasets often exhibit a latent hierarchical structure, state-of-the-art methods typically learn embeddings in Euclidean vector spaces, which do not account for this property. For this purpose, we introduce a new approach for learning hierarchical representations of symbolic data by embedding them into hyperbolic space -- or more precisely into an n-dimensional Poincar\'e ball. Due to the underlying hyperbolic geometry, this allows us to learn parsimonious representations of symbolic data by simultaneously capturing hierarchy and similarity. We introduce an efficient algorithm to learn the embeddings based on Riemannian optimization and show experimentally that Poincar\'e embeddings outperform Euclidean embeddings significantly on data with latent hierarchies, both in terms of representation capacity and in terms of generalization ability.
Article
We analyze skip-gram with negative-sampling (SGNS), a word embedding method introduced by Mikolov et al., and show that it is implicitly factorizing a word-context matrix, whose cells are the pointwise mutual information (PMI) of the respective word and context pairs, shifted by a global constant. We find that another embedding method, NCE, is implicitly factorizing a similar matrix, where each cell is the (shifted) log conditional probability of a word given its context. We show that using a sparse Shifted Positive PMI word-context matrix to represent words improves results on two word similarity tasks and one of two analogy tasks. When dense low-dimensional vectors are preferred, exact factorization with SVD can achieve solutions that are at least as good as SGNS's solutions for word similarity tasks. On analogy questions SGNS remains superior to SVD. We conjecture that this stems from the weighted nature of SGNS's factorization.