Hierarchical Community Structure Preserving Network Embedding: A Subspace Approach

Qingqing Long, Key Laboratory of Machine Perception (Ministry of Education), Peking University, qingqinglong@pku.edu.cn
Yiming Wang, Key Laboratory of Machine Perception (Ministry of Education), Peking University, wangyiming17@pku.edu.cn
Lun Du∗†, Microsoft Research, lun.du@microsoft.com
Guojie Song, Key Laboratory of Machine Perception (Ministry of Education), Peking University, gjsong@pku.edu.cn
Yilun Jin, Key Laboratory of Machine Perception (Ministry of Education), Peking University, yljin@pku.edu.cn
Wei Lin, Alibaba Group, yangkun.lw@alibaba-inc.com
ABSTRACT
To depict ubiquitous relational data in the real world, network data are widely applied in modeling complex relationships. Projecting vertices into low-dimensional spaces, known as network embedding, thus makes networks applicable to diverse predictive tasks. However, while numerous works exploit pairwise proximities, one characteristic owned by real networks, the clustering property, namely that vertices are inclined to form communities of various ranges and hence a hierarchy consisting of communities, has barely received attention from researchers. In this paper, we propose our network embedding framework, abbreviated SpaceNE, which preserves hierarchies formed by communities through subspaces, manifolds with flexible dimensionality that are inherently hierarchical. Moreover, we show that subspaces are able to address further problems in representing hierarchical communities, including sparsity and space warps. Last but not least, we propose constraints on the dimensions of subspaces to denoise, which are further approximated by differentiable functions such that joint optimization is enabled, along with a layer-wise scheme to alleviate the overhead caused by the vast number of parameters. We conduct various experiments whose results demonstrate our model's effectiveness in addressing community hierarchies.
CCS CONCEPTS
• Networks → Network structure; • Information systems → Collaborative and social computing systems and tools; • Mathematics of computing → Graph theory.
These authors contributed equally to the work.
Work performed as a student of Peking University.
Corresponding Author.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
CIKM '19, November 3–7, 2019, Beijing, China
© 2019 Association for Computing Machinery.
ACM ISBN 978-1-4503-6976-3/19/11...$15.00
https://doi.org/10.1145/3357384.3357947
KEYWORDS
Network embedding; subspace; complex networks; community
structure; data mining
ACM Reference Format:
Qingqing Long, Yiming Wang, Lun Du, Guojie Song, Yilun Jin, and Wei Lin. 2019. Hierarchical Community Structure Preserving Network Embedding: A Subspace Approach. In The 28th ACM International Conference on Information and Knowledge Management (CIKM '19), November 3–7, 2019, Beijing, China. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3357384.3357947
1 INTRODUCTION
Network data are ubiquitous due to their precise depiction of relational data. With traditional network algorithms often inapplicable due to forbidding computational costs, network embedding algorithms, which project the vertices of a network into a lower-dimensional vector space while preserving general proximity between nodes, have proved to overcome such complexity and hence to be applicable to a wider range of prediction tasks, including link prediction and classification [14, 28, 37].
Apart from general proximity between pairwise nodes, a characteristic property that real-world networks possess is the clustering property: nodes tend to form communities of varying size and range at a far higher frequency than if they were randomly connected. In addition, community structures are highly informative in that they shed light on how the network is inclined to be organized, with connections within communities being considerably more probable than those across communities, and hence provide network structures with resistance to noisy links that occur randomly. Even more remarkably, not only do vertices form communities, but the communities thus formed are commonly organized in a hierarchical manner as well, which proves to be significantly indicative of functional components of the underlying networks [3, 4].
Ubiquitous and instrumental as community structures and their hierarchy are, relatively little attention has been devoted to such properties. Typical attempts at preserving communities within vector spaces are MNMF [34] and GNE [7]. MNMF, aiming at preserving community structures within networks and reflecting them in the embeddings, ignores the hierarchy across communities, while GNE, which preserves hierarchies across communities through spherical projection, suffers from the drawback that spherical projections tend to propel vertices across communities undesirably far from each other, resulting in incorrect modeling for vertices from different communities. In addition, communities lying deep within the hierarchy are treated by GNE with extremely small spheres that nonetheless possess the same dimensionality as their shallower counterparts, and hence the inclusion relationship between high-level and lower-level communities is not sufficiently exploited.
It is hence concluded that utilizing communities as well as their hierarchy is a field of pressing importance yet demanding challenges, with relatively few attempts devoted to it. Specifically, we summarize the following challenges that remain to be resolved before hierarchical community structures can be fully utilized.
(1) Sparsity. Intuitively, communities lying deep within the hierarchy are less inclusive than their shallower counterparts, and hence possess less variance within themselves. Therefore, the original space in which the whole network is embedded becomes increasingly inappropriate for embedding deeper communities, as they require far less variance to be encoded than the original space provides. Consequently, embeddings thus learned suffer from extreme sparsity and, as a result, noise arises, which undermines the representation of deeper communities. We hence conclude that the sparsity issue for low-level communities should be addressed, and that the dimensionality of the spaces in which communities reside should vary according to their depth, a requirement scarcely met by previous research works.
(2) Space Warps. As previously mentioned, the spherical projection used by GNE suffers from restrictions imposed by the radii of the spheres, which decay exponentially as the hierarchy deepens. Consequently, it is common for vertices across communities to be exponentially more distant than those within the same community, thus inappropriately underestimating the density of links across communities and condensing links within communities. It is hence deduced that a desirable figure onto which communities are projected should be designed to ensure the correct modeling of nodes within and across communities and to alleviate space warps.
(3) "Curse" of Depth. Just like the "Curse of Dimensionality", when extremely deep hierarchies are encountered, it is not trivial to maintain a sensible scale in which communities reside; a counter-example would be, again, GNE. As explained previously, the radii of the spheres onto which communities are projected shrink exponentially with increasing depth, resulting in unduly tiny radii which may cause practical problems including underflow.
To address these challenges, we propose Subspace Network Embedding, abbreviated SpaceNE, to model the community structures in networks along with their hierarchy. Specifically, we observe that subspaces within Euclidean space inherently follow a hierarchical organization. For example, in the three-dimensional space illustrated in Figure 1, planes and lines, which are subspaces of varying dimensions, reside within the 3-d space, and as lines reside within planes and planes within the 3-d space, an inherent hierarchy emerges. In addition, natural metrics for measuring distances between pairwise subspaces exist, such as angles between pairwise planes and lines, which can easily be adopted to measure similarities between communities dwelling within subspaces of the same dimensions. Both of the aforementioned factors, inherent hierarchy and handy metrics, contribute to our modeling of hierarchical community structure using subspaces.

In addition, we are delighted to find that subspaces possess other appealing properties that can further enhance our modeling of community structures and their hierarchy. On one hand, the dimensionality of subspaces is highly flexible, which corresponds to the inconsistency of variance within communities at different depths of the hierarchy and alleviates the sparsity issue, thereby filtering undesirable noise and enhancing our representation. On the other hand, subspaces are flat and possess a consistent scale of distance regardless of their dimensionality, which maintains distances within a community and across communities at comparable scales while allowing deep hierarchies to possess a similar scale of distance to their shallower counterparts, facilitating the modeling of arbitrarily deep community hierarchies.
Consequently, we extend DeepWalk [23], which preserves general proximity between pairwise vertices through co-occurrence in random walks. In addition to proximity between vertices, with subspaces modeling hierarchical community structures, we project vertices, according to their communities, into subspaces of corresponding dimensions, during which objectives preserving similarities within a community and across communities are adopted, such that representations for communities, i.e. subspaces, can be jointly optimized along with node representation vectors. What is more, we impose constraints on the dimensions of the subspaces used to represent communities, keeping them as low as possible, such that a minimal level of redundancy and noise is kept; these constraints are further approximated, using matrix algebra and convex optimization, by a differentiable term, such that they can be optimized simultaneously along with proximity between pairwise vertices and communities.
To summarize, we make the following contributions:
• We propose the Subspace Network Embedding model, abbreviated SpaceNE, introducing subspaces to the field of community-preserving network embedding, which, to the best of our knowledge, is the first attempt to introduce subspaces into network representation learning.
• We design elaborate objectives preserving proximity between pairwise nodes and across communities, along with constraints on subspace dimension which are approximated by a differentiable term, leading to efficient optimization of our model.
• We conduct extensive experiments on several real-world datasets, where the experimental results demonstrate that SpaceNE is significantly more competitive than its various counterparts on various applications.
2 RELATED WORK
Hierarchical Network. Many real-world systems can be mapped into networks with hierarchical community structure [19, 21]. Generally, a network community refers to a dense sub-network in which vertices are densely connected to one another [20]. Communities can be recursively divided into sub-communities; thus, communities at different scales are hierarchically organized like a tree. Exploring the hierarchical community structure of a network has proved to possess a wide range of applications, including scientific collaboration analysis [10], protein function prediction [26], and so on. Thus more and more works pay attention to networks containing a hierarchy of communities [3, 25]. [3] claims that knowing the hierarchical structure of complex networks is useful for link prediction tasks.

Figure 1: The correspondence between the community hierarchy and the subspace hierarchy. Panels: (a) network; (b) hierarchical community structure; (c) 3-d hierarchical subspace with d-, (d−1)-, and (d−2)-dimensional subspaces.
Network Embedding. Network embedding aims at embedding network data into a low-dimensional space [5, 8, 33]. DeepWalk [23] learns vertex representations by using truncated random walks and Skip-Gram. LINE [28] first proposes a method to preserve the first- and second-order proximity among nodes. Struc2Vec [24] defines a hierarchy to measure node similarity at different scales, and constructs a multilayer graph to describe structural similarities.

As an important property of networks, the community structure has been extensively studied in network embedding. Recently, several models have been proposed for preserving the hierarchical community structure of the network. [34] leverages matrix factorization to integrate community structure information into the node embeddings. [22] learns network representations in hyperbolic space; its limitation is that the learned representations are hyperbolic vectors, which cannot be applied to the majority of machine learning algorithms, whose inputs are usually Euclidean. Inspired by the structure of the Galaxy, [7] presents GNE, which embeds the communities at different scales onto the surfaces of spheres with different radii. However, this algorithm has obvious shortcomings. As the level increases, the radii of the spheres decrease quickly, which limits the representation spaces for the communities at deep levels. Therefore, an optimization algorithm with a constant learning rate cannot learn proper representations for them.
Subspace. A subspace is a subset of a topological space endowed with the subspace topology [27]. Subspaces, inherently of lower variance than the original space, can be used to approximate data of higher dimensions such that only the principal features are kept. Typical and widespread examples of utilizing subspaces are low-rank approximation and subspace clustering. Classical related works include [6, 15, 31], which have all achieved convincing results in their corresponding fields. Nonetheless, though widely explored in other fields, subspaces are largely overlooked in the field of network representation learning.
3 PRELIMINARIES AND PROBLEM STATEMENT

Definition 1 (Hierarchical Network). Let $G(V, E)$ denote an undirected network with $V$ as the vertex set and $E$ as the edge set. $T$ denotes the tree representing the hierarchy of communities within the network $G$, with depth $L$ and node set $C$. For a node $c \in C$, $ch(c)$ and $pa(c)$ denote its set of children and its parent node within $T$, respectively. We let $C^l_i \subseteq V$ denote the set of the $i$-th community within the $l$-th layer. Specifically, $C^1_1 = V$ is the original set of vertices of $G$.
Denition 2
(
Subspace
)
.
Let
U
be the vector space
Rn
.
Us
, a non-
empty subset of
U
, is called a subspace
1
if for any element pair
x,yUs
and any scalar
λR
,
x+yUs,λxUs
[
27
][
15
].
Subspaces have several important properties, listed below, that make them suitable for modeling hierarchical network embedding:
• Hierarchical structure. Subspaces within Euclidean space inherently follow a hierarchical organization. For example, in the three-dimensional space illustrated in Figure 1, communities can be vividly depicted as planes, i.e. 2-d subspaces of the 3-d space, and sub-communities can consequently be modeled as lines, i.e. 1-d subspaces, residing in their corresponding communities (planes). In addition, vertices, represented by points in Euclidean space, can reside in arbitrary subspaces of arbitrary dimension.
• Lower-dimensional representation. Subspaces have proved to be instrumental in extracting principal features from extremely high-dimensional data, demonstrated distinctively by the famous PCA [35] decomposition algorithm, which selects the subspaces in which the highest variances of the original data lie. Consequently, we anticipate that subspaces will serve as a sieve, preserving the principal characteristics of the original networks and communities.
Denition 3
(
Hierarchical Subspace Network Embedding
)
.
The
hierarchical subspace is denoted as
T
with a depth of
L
.
W=
1
There is no loss of generality in assuming that subspaces are linear, i.e., they all
contain the origin. For the ane subspaces that do not contain the origin, we can
always increase the dimension of the ambient space by one and identify each ane
subspace with the linear subspace that it spans. So we always use ‘subspace’ to denote
‘linear subspace’ and ‘ane subspace’ in this work.
Session: Long - Network Embedding I
CIKM ’19, November 3–7, 2019, Beijing, China
411
[w1,w2, ... , wn] ∈ Rn×m
denotes a basis of a certain subspace and
Wl
i
denotes the
i
-th subspace at the
l
-th level of
T
. The upper bound
of the dimension of subspace is a hyper parameter. The descending
trend of subspace dimension can be linear (eg.
ba·l
) or non-linear
(eg.
log(bl)⌋
), where
b
and
a
are positive integers. Community
cl
i
should be corresponding to
Wl
i
in a hierarchical subspace. The
objective of hierarchical community structure preserving network
embedding is to learn the subspace representations (corresponding
base vectors) of the input hierarchical network.
4 SUBSPACE BASED HIERARCHICAL NETWORK EMBEDDING
Subspaces naturally follow a hierarchy, which inspires us to devise a subspace-based, hierarchy-preserving network embedding approach. SpaceNE preserves pairwise proximity as well as proximities between hierarchical communities, including structural information within a community and among communities.

4.1 Preserving Pairwise Proximity Between Vertices
We propose that preserving pairwise proximity between vertices is the most fundamental requirement in general network embedding problems. Hence we first address the pairwise proximity between nodes similarly to DeepWalk, as mentioned before:
$$\min_{\vec{u}} \Phi_1 = \sum_{v_i \in V} \sum_{v_j \in N(v_i)} \log \sigma\big(\|\vec{u}_j - \vec{u}_i\|_2\big) + k \cdot \mathbb{E}_{v_n \sim P_n(v)}\big[\log \sigma\big(-\|\vec{u}_i - \vec{u}_n\|_2\big)\big], \quad (1)$$

where $\sigma(x) = 1/(1 + e^{-x})$ is the sigmoid function, $\vec{u}_i$ is the representation vector of $v_i$, $N(v_i)$ denotes the "neighborhood" of $v_i$, i.e. vertices co-occurring with $v_i$ in random walks, and $P_n(v)$ is the noise distribution for negative sampling.
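To make the objective concrete, the following is a minimal NumPy sketch, not the authors' TensorFlow implementation, that evaluates Eq. (1) for a given set of random-walk co-occurrence pairs; the pair list, noise distribution and number of negative samples k are assumed to be supplied by the caller.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pairwise_proximity_loss(U, pos_pairs, noise_dist, k=5, rng=np.random):
    """Evaluate the objective of Eq. (1) on a set of (i, j) co-occurrence pairs.

    U          : (|V|, d) matrix of node embeddings u_i.
    pos_pairs  : list of (i, j) pairs where v_j lies in the random-walk
                 neighborhood N(v_i).
    noise_dist : (|V|,) noise distribution P_n(v) for negative sampling.
    k          : number of negative samples per positive pair.
    """
    loss = 0.0
    for i, j in pos_pairs:
        # positive term: minimizing log sigma(d) pulls co-occurring vertices together
        loss += np.log(sigmoid(np.linalg.norm(U[j] - U[i])))
        # negative term: minimizing log sigma(-d) pushes sampled noise vertices apart
        negs = rng.choice(len(U), size=k, p=noise_dist)
        for n in negs:
            loss += np.log(sigmoid(-np.linalg.norm(U[i] - U[n])))
    return loss
```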
4.2 Preserving Proximity of Hierarchical Communities

4.2.1 Preservation of Structure within Individual Communities. Vertices from the same community should be closer to each other than those from different communities, so we project nodes from the same community into the same subspace. To integrate the hierarchical subspace information, for each community $c^l_i$ in layer $l$ we impose the following constraint:

$$\mathrm{rank}(U^l_i) \leq d_l, \quad (2)$$

where $d_l$ is the dimension of the subspace in layer $l$, a hyperparameter, and $U^l_i$ is a matrix each row of which is the representation vector of a vertex belonging to community $c^l_i$. Each vertex $v_i$ in its corresponding community has a new vector representation under the basis vectors of the corresponding subspace, denoted as $\vec{u}^l_i$. Hence the variable $\vec{u}_i$ in Eq. (1) becomes an alias of $\vec{u}^1_i$.
Considering that such constraints make the problem difficult to solve, we equivalently introduce them through vertex projections applied layer by layer. Specifically, we change the decision variables from $\vec{u}^1_i \in \mathbb{R}^{d_1}$ into $\vec{u}^L_i \in \mathbb{R}^{d_L}$, where $d_L < d_1$, and introduce auxiliary decision variables $S^l_j \in \mathbb{R}^{d_l \times d_{l-1}}$ denoting the projection matrices, i.e. the following relationship holds:

$$\vec{u}^l_i = S^l_j \vec{u}^{l-1}_i, \qquad \vec{u}^{l-1}_i = (S^l_j)^{\dagger} \vec{u}^l_i, \quad (3)$$

where $j$ is the index of the community $c^l_j$ that the vertex $v_i$ belongs to in the $l$-th layer, and $(S^l_j)^{\dagger}$ is the pseudo-inverse of the matrix $S^l_j$.

Figure 2: Illustration of relationship preservation among communities. Distances between node embeddings may change before and after projection, as there are angles among the projection subspaces.
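As a concrete illustration of Eq. (3), the snippet below is a small NumPy sketch that projects one community's layer-(l−1) vectors into a lower-dimensional subspace through a projection matrix S and maps them back with its Moore-Penrose pseudo-inverse; the dimensions and the random orthonormal S are placeholders, not learned quantities.

```python
import numpy as np

rng = np.random.default_rng(0)

d_prev, d_curr = 64, 16                    # d_{l-1} and d_l, with d_l < d_{l-1}
U_prev = rng.normal(size=(100, d_prev))    # rows: vectors u^{l-1}_i of one community

# S^l_j in R^{d_l x d_{l-1}} maps layer-(l-1) coordinates into the community's
# d_l-dimensional subspace (Eq. (3)); here it is a random orthonormal stand-in.
Q, _ = np.linalg.qr(rng.normal(size=(d_prev, d_curr)))
S = Q.T                                    # shape (d_l, d_{l-1}), orthonormal rows

U_curr = U_prev @ S.T                      # u^l_i = S^l_j u^{l-1}_i, applied row-wise
U_back = U_curr @ np.linalg.pinv(S).T      # u^{l-1}_i ~ (S^l_j)^+ u^l_i

print(U_curr.shape, np.linalg.norm(U_prev - U_back))   # new shape and reconstruction gap
```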
4.2.2 Preservation of Structure among Communities. However, the relationships among communities may be lost if only the hierarchical projection method above is used. In fact, after adding the subspace constraints, the relationships among different subspaces are able to reflect the relationships among communities. For example, as shown in Fig. 2, the projected position of node 3 in the grey plane is b, and c in the green plane; the distance between node 2 and node 3 varies with the plane onto which node 3 is projected. The main idea of this objective is to minimize the difference between the subspace similarity and the community similarity for every two communities $i, j$, namely,

$$\min \Phi^l_2 = \|\Psi^l - \Gamma^l\|_F, \quad (4)$$

where $\Psi^l$ is a matrix each entry $\Psi^l_{i,j}$ of which is the similarity of the two communities $c^l_i$ and $c^l_j$ in the original graph. Similarly, $\Gamma^l_{i,j}$ is the similarity of the two corresponding subspaces. The method of calculating $\Psi^l$ can be customized according to the datasets; for example, a community similarity based on PMI [13] or common neighbors [7], etc., is reasonable. The relationship between two subspaces can be measured by the inner product of the projection matrices [27]. Thus the subspace similarity $\Gamma^l_{i,j}$ can be defined as

$$\Gamma^l_{i,j} = \big\|S^l_j U^{l-1}_i \big(S^l_j U^{l-1}_i\big)^{T}\big\|_F. \quad (5)$$
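A hedged NumPy sketch of the community-level objective follows: it computes the subspace similarities of Eq. (5) for one layer and matches them against a precomputed community-similarity matrix via the Frobenius norm of Eq. (4). The orientation of the vertex matrices (columns as vertices) and the name Psi for the community-similarity matrix are assumptions made only for this sketch.

```python
import numpy as np

def subspace_similarity(S_j, U_i):
    """Gamma^l_{i,j} of Eq. (5): the Frobenius norm of
    (S^l_j U^{l-1}_i)(S^l_j U^{l-1}_i)^T.

    S_j : (d_l, d_{l-1}) projection matrix of community j.
    U_i : (d_{l-1}, n_i) matrix whose columns are vertex vectors of community i.
    """
    P = S_j @ U_i
    return np.linalg.norm(P @ P.T, ord="fro")

def community_matching_loss(Psi, S_list, U_list):
    """Phi^l_2 of Eq. (4): ||Psi^l - Gamma^l||_F for one layer.

    Psi    : (k, k) community-similarity matrix computed on the original graph
             (e.g. from PMI or common neighbors).
    S_list : the k projection matrices S^l_j of the layer.
    U_list : the k vertex matrices U^{l-1}_i of the layer.
    """
    k = len(S_list)
    Gamma = np.empty((k, k))
    for i in range(k):
        for j in range(k):
            Gamma[i, j] = subspace_similarity(S_list[j], U_list[i])
    return np.linalg.norm(Psi - Gamma, ord="fro")
```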
4.2.3 Low Rank Representation. As stated before, the inconsistent demand for dimensionality poses a significant threat to the representation of subspaces. On one hand, inappropriate dimensionality for modeling communities, especially those lying deep within the hierarchy, results in redundant dimensions and henceforth sparsity and noise, which compromise the quality of the representations of vertices and communities. On the other hand, in order to get rid of undesirable noise while conserving the principal features of the original networks, we prefer a slimmer subspace such that only the variances that are vital enough are kept, while the obscure ones are left behind.
Consequently, we are motivated to impose penalties on the dimensionality needed by the subspaces, which corresponds to the rank of the projection matrices. Therefore, we add the following constraint on the projection matrices:

$$\min_{S^l} \Phi_{reg} = \sum_{i=1}^{k} \mathrm{rank}(S^l_i), \quad (6)$$

where the rank of a matrix is the number of linearly independent columns it contains and $k$ is the number of communities at each layer. Since $S \in \mathbb{R}^{m \times n}$ is a projection matrix, the projected space lies within the column space of $S$. We can minimize the rank of $S$ with Eq. (6) to reduce the dimension of the projected subspace. Eq. (6) restricts $S^l_i$ to be of as low a rank as possible, resulting in desirable low-dimensional subspaces, which further improves the capability of denoising.
Considering all the objectives mentioned above, the overall objective function is:

$$\min_{U^L, S} \Phi_3 = \Phi_1 + \alpha \sum_{l=1}^{L} \Phi^l_2 + \lambda \sum_{l=1}^{L} \Phi^l_{reg}, \quad (7)$$

where $\alpha$ and $\lambda$ are the coefficients regulating the contributions of the two community-preserving terms.
5 LEARNING PROCEDURE
The optimization of Eq. (7) is extremely difficult due to the following two problems. First, it has a massive number of parameters that need to be optimized at a single iteration; thus directly optimizing it may lead to vanishing gradients [11]. Second, it has a discrete rank term, which cannot be handled directly by differentiation. Correspondingly, we propose two techniques to address these problems.
5.1 From Global to Layer-wise Optimization
The overall objective function obviously has a massive number of complicated parameters and operations, such as matrix inversion, which may lead to unacceptable inefficiency and vanishing gradients. Inspired by the layer-by-layer training of auto-encoders [12], we consider learning the parameters hierarchically in our method. Specifically, we first optimize the pairwise proximity $\Phi^{local}_i$ through Eq. (1), which is followed by adding the constraints to $\Phi^{local}_i$ for each layer in the subspace hierarchy. On the other hand, a unitary transformation does not change the relative positions of vectors [36], thus it is reasonable to learn the parameters layer by layer, alleviating the need to project nodes all the way down to the bottom of the hierarchy.

The hierarchical subspace constraints are essentially projections among spaces, while the other constraints are naturally disassembled in the layer-wise optimization. This projection is similar to the general idea of PCA [35], which selects the spaces that maintain the nearest reconstruction of the vertices before and after projection with high efficiency. Based on the following Lemma 1, we extend PCA to describe the projection of all inner-community nodes into the subspace:
$$\min \Phi^l_4 = \sum_{i=1}^{k} \mathrm{tr}\big((S^l_i)^T U^l_i (U^l_i)^T S^l_i\big), \quad (8)$$

where $\mathrm{tr}(M)$ is the trace of $M$. Eq. (8) constructs a low-dimensional representation of the data, which omits the complicated and inefficient matrix inversion and further improves the capability of denoising.
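The connection to PCA described above can be illustrated with a short NumPy sketch: choosing the projection matrix from the leading singular directions of a community's vertex matrix gives the reconstruction closest to the original vectors among all subspaces of that dimension, which is the property Lemma 1 below relies on. This is only a stand-in for the learned $S^l_i$, under the assumption that the rows of U are vertex vectors.

```python
import numpy as np

def top_subspace(U, d_target):
    """Return a (d_target, d) matrix with orthonormal rows spanned by the
    leading singular directions of U (rows = vertex vectors). Projecting onto
    this subspace and back yields the reconstruction closest to the original
    vectors among all d_target-dimensional subspaces (PCA without centering).
    """
    _, _, Vt = np.linalg.svd(U, full_matrices=False)
    return Vt[:d_target]

# toy usage on one community's vertex vectors
rng = np.random.default_rng(1)
U = rng.normal(size=(200, 64))
S = top_subspace(U, d_target=8)
U_proj = U @ S.T          # coordinates inside the 8-dimensional subspace
U_rec = U_proj @ S        # mapped back into the ambient 64-dimensional space
print(np.linalg.norm(U - U_rec))   # reconstruction error, minimal over all 8-d subspaces
```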
Lemma 1. If Eq. (8) achieves its desired optimal solution, the optimal result learned from the recursive optimization is the same as that of the joint optimization (Eq. (1)) with the constraints in Section 4.2.1.

Proof. Let $d(\vec{u}_i, \vec{u}_j)$ be the Euclidean distance between vertices $v_i$ and $v_j$. The vector $\vec{u}_i$ after projection is denoted as $\vec{u}'_i$. Based on the triangle inequality [1] in Euclidean space,

$$d(\vec{u}_i, \vec{u}_j) \leq d(\vec{u}_i, \vec{u}'_i) + d(\vec{u}'_i, \vec{u}'_j) + d(\vec{u}_j, \vec{u}'_j),$$
$$d(\vec{u}'_i, \vec{u}'_j) \leq d(\vec{u}_i, \vec{u}'_i) + d(\vec{u}_i, \vec{u}_j) + d(\vec{u}_j, \vec{u}'_j).$$

The PCA projection [35] maintains the nearest reconstruction of the vertices before and after projection, which means

$$\lim_{Eq.(8) \to 0} d(\vec{u}_i, \vec{u}'_i) \to 0, \quad d(\vec{u}_j, \vec{u}'_j) \to 0.$$

Thus,

$$d(\vec{u}'_i, \vec{u}'_j) \to d(\vec{u}_i, \vec{u}_j).$$

This means that the relationships among vertices before and after projection are maintained.
5.2 From Discrete to Differentiable Optimization
Rank minimization turns out to be an NP-hard combinatorial problem that is computationally intractable in practical cases. Meanwhile, the rank function is discontinuous, so the objective cannot be optimized with SGD. The tightest possible convex relaxation of the rank in Eq. (6) is to replace it with the nuclear norm, equal to the sum of the singular values [2]:

$$\mathrm{rank}(M) \approx \|M\|_*, \quad (9)$$

where $\|M\|_*$ is the nuclear norm of the matrix $M$.

As the nuclear norm is non-smooth, it still cannot be optimized conveniently with SGD. [18] gives a smooth approximation of the nuclear norm via its conjugate: the sum of Huber penalties applied to the singular values of $M$,

$$\|M\|_* \approx \sum_{i} \phi_{\mu}\big(\Theta_i(M)\big), \quad (10)$$

where $\Theta_i(M)$ are the singular values of $M$, and $\phi_{\mu}$ is

$$\phi_{\mu}(Z) = \begin{cases} \dfrac{Z^2}{2\mu}, & |Z| \leq \mu \\ |Z| - \dfrac{\mu}{2}, & |Z| > \mu \end{cases} \quad (11)$$

where $\mu$ controls accuracy and smoothness, and $Z$ is a singular value $\Theta_i(M)$.
Thus the approximated regularization term $\Phi_{reg}$ is Eq. (12), and the overall objective function in the $l$-th layer is approximated as Eq. (13):

$$\Phi^l_{reg} \approx \sum_{i=1}^{k} \sum_{m} \phi_{\mu}\big(\Theta_m(S^l_i)\big), \quad (12)$$

$$\min_{U^L, S} H^{(l)} = \Phi^l_4 + \alpha \Phi^l_2 + \lambda \Phi^l_{reg}. \quad (13)$$
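Below is a small NumPy sketch of the smoothed surrogate used in Eqs. (9)-(12): the Huber penalty applied to the singular values of each projection matrix. It only evaluates the regularizer (the paper optimizes it jointly with Adam); µ = 10 mirrors the default reported in Section 6.1.

```python
import numpy as np

def huber(z, mu):
    """phi_mu of Eq. (11): quadratic near zero, linear in the tails."""
    z = np.abs(z)
    return np.where(z <= mu, z ** 2 / (2.0 * mu), z - mu / 2.0)

def smoothed_rank(S, mu=10.0):
    """Differentiable surrogate for rank(S), Eqs. (9)-(12): the sum of Huber
    penalties applied to the singular values of the projection matrix S."""
    singular_values = np.linalg.svd(S, compute_uv=False)
    return huber(singular_values, mu).sum()

def layer_regularizer(S_list, mu=10.0):
    """Phi^l_reg of Eq. (12): sum over the k projection matrices of one layer."""
    return sum(smoothed_rank(S, mu) for S in S_list)
```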
It should be noted that, to ensure that the basis vectors of the subspace at each level are orthogonal, $S^l_i$ is orthogonalized [1]. Another advantage of the layer-by-layer learning is that it replaces the pseudo-inverse calculation in the original problem with a matrix orthogonalization of $S^l_i$ at each layer.
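A one-function sketch of this orthogonalization step follows, using a QR decomposition as a numerically stable stand-in for Gram-Schmidt [1]; the (d_l × d_{l−1}) row convention for S follows the shape assumed earlier and is an assumption of this sketch.

```python
import numpy as np

def orthogonalize(S):
    """Re-orthonormalize the rows of a projection matrix S of shape
    (d_l, d_{l-1}), playing the role of the orthogonalization applied to S^l_i."""
    Q, _ = np.linalg.qr(S.T)   # columns of Q are orthonormal
    return Q.T                 # hence the rows of the result are orthonormal
```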
Algorithm 1: The SpaceNE algorithm

function LearnFeatures(Network G, Hierarchical Clustering Tree T)
    U_local = minimize Φ_1 with Adam (see Eq. (1))
    U, S = RecursiveOptimization(c^1_1, T, U_local)
    return U, S

function RecursiveOptimization(Current node c^l_i, Hierarchical Clustering Tree T, Representation Set U_local)
    if l = L then return U, S
    {S^{l+1}_j} = minimize H^{(l)}_i with Adam (see Eq. (13))
    for all c^{l+1}_j ∈ ch(c^l_i) do
        S^{l+1}_j = Orthogonalize(S^{l+1}_j)
        U^{l+1}_j = S^{l+1}_j U^l_i
    for all c^{l+1}_j ∈ ch(c^l_i) do
        U, S = RecursiveOptimization(c^{l+1}_j, T, U_local)
    return U, S
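To clarify the control flow of Algorithm 1, here is a self-contained Python sketch of the top-down recursion over a toy two-level hierarchy. The Adam optimizations of Eqs. (1) and (13) are replaced by stand-ins (random embeddings and leading singular directions), and, for brevity, each child inherits all of its parent's rows rather than only its own community's vertices, so this illustrates the recursion only, not the real objectives.

```python
import numpy as np

# A toy hierarchy: community -> list of child communities,
# together with the target subspace dimension of every community.
children = {"root": ["c1", "c2"], "c1": [], "c2": []}
dims = {"root": 16, "c1": 8, "c2": 8}

rng = np.random.default_rng(0)

def minimize_phi1(num_vertices, d):
    # stand-in for minimizing Eq. (1) with Adam; returns random embeddings here
    return rng.normal(size=(num_vertices, d))

def fit_child_projection(U_parent, d_child):
    # stand-in for minimizing H^(l) of Eq. (13); here: leading singular directions
    _, _, Vt = np.linalg.svd(U_parent, full_matrices=False)
    return Vt[:d_child]

def orthogonalize(S):
    # re-orthonormalize the rows of S, as in Algorithm 1
    Q, _ = np.linalg.qr(S.T)
    return Q.T

def recursive_optimization(node, U, S):
    """Mirrors RecursiveOptimization in Algorithm 1: fit and orthogonalize the
    child projection matrices, project the parent coordinates down, recurse."""
    for child in children[node]:
        S[child] = orthogonalize(fit_child_projection(U[node], dims[child]))
        U[child] = U[node] @ S[child].T       # u^{l+1}_j = S^{l+1}_j u^l_i, row-wise
        recursive_optimization(child, U, S)
    return U, S

U = {"root": minimize_phi1(num_vertices=50, d=dims["root"])}
U, S = recursive_optimization("root", U, {})
print({name: emb.shape for name, emb in U.items()})
```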
5.3 The Subspace Algorithm
Algorithm 1 gives the pseudocode of SpaceNE. In the function LearnFeatures, we optimize the local objective Eq. (1) to learn node embedding vectors preserving local information, and then adjust the embedding vectors to add the subspace constraints from top to bottom. In the function RecursiveOptimization, the set of projection matrices {S^{l+1}_j} is obtained by optimizing objective (13) followed by orthogonalization. We obtain the node representations of the next layer, U^{l+1}_j, by matrix multiplication.
Time Complexity Analysis. Without loss of generality, we assume that $T$ is a $k$-ary tree and that the dimension of the subspaces in the $l$-th layer decreases with depth, i.e. $d_l = \lfloor \log_k(D - l) \rfloor$. SpaceNE is run on each tree node layer by layer, and the learning procedures on tree nodes of the same layer can be parallelized. For a certain node $c^l_i$ in $T$, the complexity of calculating the loss $\Phi^l_2$ is $O(k(\log_k(D - l))^2)$, and the complexity of the loss $\Phi^l_4$ is $O((k \log_k(D - l))^2)$. Therefore, the time complexity of optimizing the overall loss is $O(E(k \log_k(D - l))^2)$, where $E$ is the number of training epochs. Thus the total complexity of SpaceNE is $O(|V| E (k \log_k D)^2)$. The optimization algorithm is implemented on the TensorFlow platform and can thus be accelerated with GPUs.
6 EXPERIMENTS
6.1 Experiment Setup
Datasets. We employ real networks from the Facebook social networks collection [30]. The datasets used in our paper and their basic attributes are shown in Table 1. Moreover, a synthetic network containing a 6-layer hierarchy is used to evaluate the performance of hierarchical structure preservation. We generate a self-similar network with the hierarchical network generation algorithm introduced in [10]. Specifically, we first generate a 6-layer trident (3-ary) tree and use the leaves of the tree as the nodes of the network. Then we add edges to node pairs with probability proportional to the path length between them in the k-ary tree.
Table 1: Dataset statistics and properties.

Datasets       | Vertices | Edges  | Layers | Classes | Labels
Amherst        | 2314     | 96394  | 2      | 5       | ✓
Georgetown     | 9414     | 425639 | 2      | 5       | ✓
UC             | 16808    | 522148 | 2      | 5       | ✓
Sync_6         | 729      | 21735  | 6      | ×       | ×
Sync_show      | 125      | 982    | 4      | ×       | ×
Amherst_noise  | 2314     | 91409  | 2      | 5       | ✓
Relevant Algorithms. In the experiments, we compare SpaceNE against the following baselines:
• MNMF [34]: a single-layer community structure preserving baseline, which integrates the community information through matrix factorization.
• GNE [7]: a multi-layer community structure preserving baseline, which embeds communities onto the surfaces of spheres.
• DeepWalk [23]: a method that combines truncated random walks [9] with the Skip-Gram model [17] to learn vertex embeddings.
• LINE [28]: a popular baseline based on preserving the first- and second-order relational information among vertices.
• Struc2Vec [24]: measures node similarity at different scales and uses a multilayer graph to encode structural similarities.
• SpectralClustering [29]: learns the vertex representations by factorizing the Laplacian matrix.
Parameter Settings. We conducted many experiments, and the optimal default parameters are selected as follows: $\mu = 10$, $\alpha = 1$, $\lambda = 10^{-3}$. In our experiments, the embedding size $m$ of all models is 64. Besides, the parameter settings of the comparison models follow the recommended settings in the relevant code packages. Specifically, for DeepWalk, the walk length is set to 40 and the window size is set to 10. For GNE, the scaling radius is set to 3.0, $\lambda$ is set to 0.2, $\theta$ is set to 3, the initial radius is set to 105, the minimum radius is set to 0.05, and the maximum radius is set to 0.25. For LINE, we set the number of negative samples used in negative sampling to 5, the starting value of the learning rate to 0.025, and the total number of training samples to 100M; we consider both first- and second-order information in LINE. For Struc2Vec, the walk length is set to 10, the number of walks is set to 80, and the stay probability is set to 0.3. For MNMF, the number of clusters is set to 10, $\lambda$ is set to 0.2, $\beta$ is set to 0.05, $\eta$ is set to 5.0, and the parameter "lower-control" is set to $10^{-15}$. For SpectralClustering, we use PCA to reduce the dimension of the Laplacian matrix.
6.2 Alleviating Sparsity
To evaluate SpaceNE's ability to alleviate sparsity, we conduct experiments on node classification and on its resistance to randomly added edge noise, with respect to link prediction.
Table 2: The multi-label classification results (accuracy) on different percentages of training data.

Model              | Amherst                 | Georgetown              | UC
                   | 30%   50%   70%   90%   | 30%   50%   70%   90%   | 30%   50%   70%   90%
SpaceNE            | 92.52 93.11 93.74 95.09 | 56.12 56.42 56.92 56.54 | 88.69 89.02 89.23 90.07
GNE                | 93.17 93.33 93.26 93.52 | 52.19 53.53 53.75 53.12 | 87.78 88.42 88.42 87.57
MNMF               | 87.11 88.04 89.23 89.96 | 51.52 51.69 51.60 53.25 | 87.89 87.95 88.09 88.10
DeepWalk           | 91.09 91.26 91.71 92.03 | 51.45 53.25 53.76 54.03 | 88.35 88.42 88.51 88.63
LINE               | 91.11 91.53 91.89 91.67 | 51.35 51.93 52.18 52.38 | 87.71 87.88 87.95 87.53
Struc2Vec          | 72.72 73.35 73.92 77.23 | 46.85 47.44 48.33 47.59 | 87.96 87.89 88.11 88.25
SpectralClustering | 72.88 73.51 73.89 74.41 | 49.67 50.02 50.79 51.23 | 84.23 84.35 84.31 84.21
6.2.1 Node Classification. Three real-world social networks from the Facebook datasets, each with a four-layer hierarchical tree (including root and leaves), are used in the vertex classification experiment. The two intermediate layers are divided by enrollment year and major, respectively. For MNMF, we use enrollment year as the indicator for community division. The learned representations are used to classify the vertices into a set of labels. The classifier we use is Logistic Regression, and the evaluation metric is Accuracy. Different percentages of nodes are sampled randomly for evaluation, and the rest are used for training. The results are averaged over 10 different runs.

Table 2 shows that SpaceNE performs well in most cases. Although MNMF, GNE and SpaceNE all take community information into consideration, SpaceNE still performs better, which indicates that the low-rank representation of SpaceNE plays an important role. Although it considers some global network structure, Struc2Vec does not work very well. Compared with Amherst, our model performs better on the Georgetown and UC datasets, because our model is more stable on larger datasets after integrating the hierarchical community structure information.
6.2.2 Resistance to Random Noise. Besides preserving the deep hierarchical community structure, SpaceNE tends to be more resistant to random noise due to the inherent properties of subspaces, hence generating node embeddings that remain robust regardless of noise. On one hand, the reconstruction optimization term (Eq. (8)) of SpaceNE is similar to PCA, which maintains its noise reduction feature. On the other hand, the low-rank optimization term (Eq. (6)) of SpaceNE can learn the most important features of the network and thereby improve the resistance to noise [15]. In order to verify the noise resistance of SpaceNE, we conduct several experiments and compare the results of SpaceNE with other community- or global-structure preserving methods (MNMF, GNE and Struc2Vec). We construct the noisy data according to the method introduced by [32]. Specifically, the dataset we use is Amherst_noise (see Table 1): we contaminate Amherst by randomly adding edges amounting to 5% of the total number of edges and deleting 5% of the existing edges. Then, we conduct link prediction tasks using SpaceNE and its counterparts on the Amherst and Amherst_noise datasets for comparison.
The results are shown in Table 3. The values in Table 3 represent the reduction, in percentage, in precision and recall of the algorithms after adding the noise. "Gain of SpaceNE" is the improvement of SpaceNE with respect to the best of the other baseline methods. It can be seen from Table 3 that all the algorithms degrade after contamination, but SpaceNE still performs better than its counterparts. Moreover, we can find that, compared with the other community preserving methods (MNMF and GNE), SpaceNE not only preserves a deeper hierarchical community structure but is also less affected by noise in a complex network.

Table 3: The link prediction results on different percentages of the training dataset, Amherst_noise. The values are the reduction in precision (or recall) of the algorithms after adding the noise.

Model               | precision 70% | precision 90% | recall 70% | recall 90%
SpaceNE             | -0.54         | -0.24         | -0.01      | -0.01
GNE                 | -1.26         | -0.96         | -0.03      | -0.26
MNMF                | -2.71         | -2.05         | -0.02      | -0.06
Struc2Vec           | -1.01         | -0.99         | -0.02      | -0.02
Gain of SpaceNE [%] | 46.53         | 75            | 50         | 50
6.3 Alleviating Space Warps
To evaluate how SpaceNE alleviates space warps, we conduct experiments on link prediction, verifying whether it relieves the problem suffered by its community-aware counterparts. In addition, we present an illustration of how GNE and SpaceNE preserve distances between pairwise intra-community nodes as the depth increases.
6.3.1 Link Prediction. We conduct link prediction experiments on all three datasets. We sample a proportion of edges from the initial network, which are used as positive samples, along with an identical number of random negative edges. We take the inner product of the embedding vectors as the score for each sample; the samples are then ranked, and those with scores in the top 50% are predicted as "positive" edges. We report precision as the metric, which is equivalent to recall since we use identical numbers of positive and negative samples.
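The evaluation protocol just described can be sketched as follows; the edge sampling is omitted, and the embeddings and candidate edges are toy placeholders rather than the actual experimental data.

```python
import numpy as np

def link_prediction_precision(U, pos_edges, neg_edges):
    """Score every candidate edge by the inner product of its endpoint
    embeddings and predict the top-50% scored samples as positive, as in
    Section 6.3.1; with equally many positive and negative samples,
    precision equals recall."""
    edges = pos_edges + neg_edges
    labels = np.array([1] * len(pos_edges) + [0] * len(neg_edges))
    scores = np.array([U[i] @ U[j] for i, j in edges])
    predicted = scores > np.median(scores)       # keep the top half
    return (labels[predicted] == 1).mean()

# toy usage with random embeddings and hand-picked edge samples
rng = np.random.default_rng(0)
U = rng.normal(size=(30, 8))
pos = [(0, 1), (2, 3), (4, 5), (6, 7)]
neg = [(0, 9), (10, 20), (3, 25), (8, 29)]
print(link_prediction_precision(U, pos, neg))
```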
The results are reported in Table 4. Intuitively, community-aware methods are inherently compromised in link prediction tasks: compared to Skip-Gram based models, which primarily preserve proximity between pairwise nodes, their community-aware counterparts tend to sacrifice such pairwise properties for better modeling of clustering properties and hence warp the space where vertices reside, which is exemplified by GNE and MNMF being gravely defeated by Skip-Gram models including DeepWalk. In contrast, our model, SpaceNE, is able to address more complex community structures without compromising its performance compared with its Skip-Gram counterparts. It is hence concluded that our modeling of communities using subspaces possesses a wider range of applicable fields in that it does not enhance community structures at the expense of other key properties.
6.3.2 Distance between pairwise, intra-community nodes. In order to demonstrate why SpaceNE performs better on the link prediction task than other community preserving methods, we examine the average distance between pairwise intra-community nodes, which is affected most significantly by space warps in GNE.

As SpaceNE is able to keep the distances among nodes reasonable, without making them too close as GNE does, we calculate the average distance of intra-community node pairs on different layers. Specifically, the average distance of intra-community node pairs on layer $l$ is calculated as

$$\frac{1}{|M_l|} \sum_{i=1}^{|M_l|} A\_Dis(C^l_i), \quad (14)$$

where $M_l$ is the number of communities on the $l$-th layer and $A\_Dis$ is the average distance over all node pairs belonging to the $i$-th community on the $l$-th layer, $C^l_i$, which can be calculated by

$$A\_Dis(C^l_i) = \frac{1}{|C^l_i|\,(|C^l_i| - 1)} \sum_{x, y \in C^l_i,\, x \neq y} D(x, y). \quad (15)$$

$D(x, y)$ is the Euclidean distance between the vector representations of node $x$ and node $y$. We use Sync_6, introduced previously, an artificial network with a 6-layer community hierarchy. Fig. 3 shows the results with varying numbers of layers, where the y-coordinate scale is logarithmic. Fig. 3 shows that the average distance decays exponentially in GNE, while linearly in SpaceNE. This result partly explains why SpaceNE is superior to GNE in essence: in GNE, the nodes become too close to each other to be distinguished.
Table 4: The link prediction results on different datasets.

Model              | Amherst | Georgetown | UC
SpaceNE            | 85.61   | 89.28      | 91.32
GNE                | 62.07   | 68.97      | 51.25
MNMF               | 48.89   | 49.76      | 50.05
DeepWalk           | 86.40   | 89.16      | 91.39
LINE               | 74.37   | 76.58      | 71.22
Struc2Vec          | 51.77   | 49.94      | 46.83
SpectralClustering | 37.76   | 40.63      | 38.68
Figure 3: Average distance of intra-community node pairs with an increasing number of layers (logarithmic y-axis), for SpaceNE and GNE.

6.4 Eliminating the "Curse" of Depth
To evaluate the performance of modeling extremely deep hierarchies, experiments are conducted on a deep hierarchical community detection task. The basic properties of our dataset "Sync_6" are shown in Table 1; "Sync_6" is a 6-layer synthetic dataset. We apply K-means to the learned node representations and compare the differences between the clustering results and the prior community division at different layers of the hierarchical community tree. The Adjusted Rand Index (ARI), ranging from -1 to 1, reflects the similarity between two sets, and is thus applied as the index to evaluate the performance of community preservation. The closer the ARI is to 1, the better the performance of community detection is.
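A short scikit-learn sketch of this evaluation is given below: K-means is run on the learned embeddings with k set to the number of ground-truth communities at a given layer, and the ARI is computed against the prior community division. The toy data stand in for the actual SpaceNE embeddings of Sync_6.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def layer_ari(embeddings, true_labels, random_state=0):
    """Cluster the node embeddings with K-means (k = number of ground-truth
    communities at this layer) and report the Adjusted Rand Index against the
    prior community division, as done in Section 6.4."""
    k = len(set(true_labels))
    pred = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(embeddings)
    return adjusted_rand_score(true_labels, pred)

# toy usage: three well-separated clusters should give an ARI close to 1
rng = np.random.default_rng(0)
emb = np.vstack([rng.normal(c, 0.1, size=(20, 16)) for c in (0.0, 3.0, 6.0)])
labels = [0] * 20 + [1] * 20 + [2] * 20
print(layer_ari(emb, labels))
```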
Fig. 5 illustrates the effect of the number of layers on hierarchical community detection (excluding the root, as there is only one community in the root layer). The results show that the hierarchical community structure can be integrally preserved by SpaceNE, no matter how deep the hierarchical community is or how many communities exist within the network. Clearly, when the number of layers reaches 4, the radii of the spheres in GNE become unduly small, and the accuracy of community detection begins to decline, which is probably caused by underflow as the number of layers increases and the spheres shrink.

Besides, the experimental results illustrate that DeepWalk and LINE are incompetent at structure preservation. While DeepWalk can preserve part of the hierarchical information in the process of random walks, the results show that when the number of layers exceeds 4, the performance of DeepWalk drastically declines. LINE performs even worse than DeepWalk, as LINE only considers the local structure at most 2 steps away from each node and does not model the hierarchical community information. MNMF only preserves communities at some layers but not all, leading to poor performance after a few layers. It is worth mentioning that, as a global structure preserving algorithm, Struc2Vec performs the best at layer 5, while becoming worse at deeper layers.
6.5 Efficiency
To demonstrate the efficiency advantage of SpaceNE, we compare the running time of SpaceNE, GNE, Struc2Vec and MNMF, the methods capturing community or global structure. The three datasets, from small to large, are described in Table 1. All efficiency experiments were conducted on a single machine with a 12GB-memory GPU. Results are presented in Fig. 6. MNMF only considers a single-layer community, so its running time is shorter. Although Struc2Vec captures some global structural information, its optimization is too slow, which makes it unsuitable for large networks. Although GNE and SpaceNE have similar time complexity in theory, the convergence of SpaceNE is faster. The reason is that GNE preserves the community structure through projections onto spheres; compared with the linear matrix multiplication of subspaces, the optimization of GNE is more complex and requires more epochs. The result shows that SpaceNE is scalable to large networks.

Figure 4: The visualization of vertex representations in 2-D space from different models: (a) SpaceNE, (b) GNE, (c) MNMF, (d) Struc2Vec, (e) LINE, (f) DeepWalk.

Figure 5: The comparison of hierarchical community preservation (ARI versus number of layers) for different models on a generated 6-layer hierarchical network.
6.6 Visualization
We visualize the "Sync_show" network used in [7]. Fig. 4 shows the visualization experiments. For all methods in the comparison, we first embed the network into a low-dimensional space and then map the low-dimensional vectors of the vertices to a 2-D space with the t-SNE [16] package. Note that [7] maps GNE directly into two-dimensional space, which is different from the other methods; in order to unify the experimental standard, we do not use that approach in this work and instead use t-SNE to conduct dimensionality reduction for all methods.

Figure 6: The running time on different datasets (Amherst, Georgetown, UC) for MNMF, Struc2Vec, GNE and SpaceNE.
Fig. 4 shows that SpaceNE preserves both the relationships among communities and the relationships within each community. Although GNE keeps the relationships among communities, the nodes from the same community are too close to each other, which means the relationships among the nodes within a community are ignored. This demonstrates the space warp of GNE again, from a new perspective. Additionally, SpaceNE shows outstanding performance in clustering vertices compared with the other methods.
7 CONCLUSION AND FUTURE WORK
In this paper, we proposed SpaceNE, introducing subspaces to the field of community-preserving network embedding. To the best of our knowledge, this work is the first attempt to introduce subspaces into network representation learning. Specifically, we design elaborate objectives preserving proximity between pairwise nodes and across communities, along with constraints on subspace dimension which are approximated by a differentiable term, leading to efficient optimization of our model. Empirically, we verify SpaceNE on a variety of datasets and applications. Extensive experimental results demonstrate the advantages of SpaceNE, especially on link prediction and hierarchical community preservation.

Here we focus on hierarchical network embedding using subspace theory. Nevertheless, the theory of subspaces is still largely overlooked in the field of network representation learning. For future work, one intriguing direction is utilizing subspace theory to deal with the heterogeneity of complex networks. Also, in real-life scenarios such as neighborhood-based recommendation, when searching for the nearest neighbor of an item, the search engine only needs to search in the lower-dimensional subspace, which can greatly improve efficiency.
ACKNOWLEDGMENTS
We are thankful to Yizhou Zhang for his helpful suggestions. This
work was supported by the National Natural Science Foundation
of China (Grant No. 61876006 and No. 61572041).
REFERENCES
[1] Åke Björck. 1994. Numerics of Gram-Schmidt orthogonalization. Linear Algebra and Its Applications 197 (1994), 297–316.
[2] Emmanuel J. Candes and Benjamin Recht. 2009. Exact Matrix Completion via Convex Optimization. Foundations of Computational Mathematics 9, 6 (2009), 717–772.
[3] Aaron Clauset, Cristopher Moore, and Mark E. J. Newman. 2008. Hierarchical structure and the prediction of missing links in networks. Nature 453, 7191 (2008), 98.
[4] Aaron Clauset, Cristopher Moore, and M. E. J. Newman. 2006. Structural inference of hierarchies in networks. International Conference on Machine Learning (2006), 1–13.
[5] Peng Cui, Xiao Wang, Jian Pei, and Wenwu Zhu. 2018. A Survey on Network Embedding. IEEE Transactions on Knowledge and Data Engineering (2018), 1–1.
[6] Chris Ding, Ding Zhou, Xiaofeng He, and Hongyuan Zha. 2006. R1-PCA: rotational invariant L1-norm principal component analysis for robust subspace factorization. In Proceedings of the 23rd International Conference on Machine Learning. ACM, 281–288.
[7] Lun Du, Zhicong Lu, Yun Wang, Guojie Song, Yiming Wang, and Wei Chen. 2018. Galaxy network embedding: a hierarchical community structure preserving approach. In Proceedings of the 27th International Joint Conference on Artificial Intelligence. AAAI Press, 2079–2085.
[8] Lun Du, Yun Wang, Guojie Song, Zhicong Lu, and Junshan Wang. 2018. Dynamic network embedding: an extended approach for skip-gram based network embedding. In Proceedings of the 27th International Joint Conference on Artificial Intelligence. AAAI Press, 2086–2092.
[9] Francois Fouss, Alain Pirotte, Jean-Michel Renders, and Marco Saerens. 2007. Random-Walk Computation of Similarities between Nodes of a Graph with Application to Collaborative Recommendation. IEEE Transactions on Knowledge and Data Engineering 19, 3 (2007), 355–369.
[10] Michelle Girvan and Mark E. J. Newman. 2002. Community structure in social and biological networks. Proceedings of the National Academy of Sciences 99, 12 (2002), 7821–7826.
[11] Robert Hecht-Nielsen. 1988. Theory of the backpropagation neural network. Neural Networks 1 (1988), 445–448.
[12] Geoffrey E. Hinton and Ruslan R. Salakhutdinov. 2006. Reducing the dimensionality of data with neural networks. Science 313, 5786 (2006), 504–507.
[13] Omer Levy and Yoav Goldberg. 2014. Neural word embedding as implicit matrix factorization. In Advances in Neural Information Processing Systems. 2177–2185.
[14] Ziyao Li, Liang Zhang, and Guojie Song. 2019. SepNE: Bringing separability to network embedding. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 4261–4268.
[15] Guangcan Liu, Zhouchen Lin, and Yong Yu. 2010. Robust subspace segmentation by low-rank representation. In Proceedings of the 27th International Conference on Machine Learning (ICML-10). 663–670.
[16] Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing Data using t-SNE. Journal of Machine Learning Research 9 (2008), 2579–2605.
[17] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013).
[18] Yurii Nesterov. 2005. Smooth minimization of non-smooth functions. Mathematical Programming 103, 1 (2005), 127–152.
[19] Mark E. J. Newman. 2003. The structure and function of complex networks. SIAM Review 45, 2 (2003), 167–256.
[20] M. E. J. Newman. 2006. Finding community structure in networks using the eigenvectors of matrices. Physical Review E 74, 3 (2006), 036104.
[21] M. E. J. Newman and Michelle Girvan. 2004. Finding and evaluating community structure in networks. Physical Review E 69, 2 (2004), 026113.
[22] Maximillian Nickel and Douwe Kiela. 2017. Poincaré embeddings for learning hierarchical representations. In Advances in Neural Information Processing Systems. 6338–6347.
[23] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. DeepWalk: online learning of social representations. In Knowledge Discovery and Data Mining (2014), 701–710.
[24] Leonardo Filipe Rodrigues Ribeiro, Pedro H. P. Saverese, and Daniel R. Figueiredo. 2017. struc2vec: Learning Node Representations from Structural Identity. In Knowledge Discovery and Data Mining (2017), 385–394.
[25] Huawei Shen, Xueqi Cheng, Kai Cai, and Mao-Bin Hu. 2009. Detect overlapping and hierarchical community structure in networks. Physica A: Statistical Mechanics and Its Applications 388, 8 (2009), 1706–1712.
[26] Victor Spirin and Leonid A. Mirny. 2003. Protein complexes and functional modules in molecular networks. Proceedings of the National Academy of Sciences of the United States of America 100, 21 (2003), 12123–12128.
[27] Gilbert Strang. 1993. Introduction to Linear Algebra. Vol. 3. Wellesley-Cambridge Press, Wellesley, MA.
[28] Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. LINE: Large-scale Information Network Embedding. In International Conference on World Wide Web. 1067–1077.
[29] Lei Tang and Huan Liu. 2011. Leveraging social media networks for classification. Data Mining and Knowledge Discovery 23, 3 (2011), 447–478.
[30] Amanda L. Traud, Peter J. Mucha, and Mason A. Porter. 2012. Social structure of Facebook networks. Social Science Electronic Publishing 391, 16 (2012), 4165–4180.
[31] René Vidal. 2011. Subspace clustering. IEEE Signal Processing Magazine 28, 2 (2011), 52–68.
[32] Pascal Vincent, Hugo Larochelle, Isabelle Lajoie, Yoshua Bengio, and Pierre-Antoine Manzagol. 2010. Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion. Journal of Machine Learning Research 11 (2010), 3371–3408.
[33] Junshan Wang, Zhicong Lu, Guojie Song, Yue Fan, Lun Du, and Wei Lin. 2019. Tag2Vec: Learning Tag Representations in Tag Networks. In The World Wide Web Conference. ACM, 3314–3320.
[34] Xiao Wang, Peng Cui, Jing Wang, Jian Pei, Wenwu Zhu, and Shiqiang Yang. 2017. Community Preserving Network Embedding. In AAAI Conference on Artificial Intelligence.
[35] Svante Wold, Kim Esbensen, and Paul Geladi. 1987. Principal component analysis. Chemometrics and Intelligent Laboratory Systems 2, 1-3 (1987), 37–52.
[36] Zi Yin and Yuanyuan Shen. 2018. On the Dimensionality of Word Embedding. In Advances in Neural Information Processing Systems. 895–906.
[37] Yizhou Zhang, Guojie Song, Lun Du, Shuwen Yang, and Yilun Jin. 2019. DANE: Domain Adaptive Network Embedding. In Proceedings of the 28th International Joint Conference on Artificial Intelligence. AAAI Press.
Article
In this paper, we address the problem of unsuperised social network embedding, which aims to embed network nodes, including node attributes, into a latent low dimensional space. In recent methods, the fusion mechanism of node attributes and network structure has been proposed for the problem and achieved impressive prediction performance. However, the non-linear property of node attributes and network structure is not efficiently fused in existing methods, which is potentially helpful in learning a better network embedding. To this end, in this paper, we propose a novel model called ASM (Adaptive Specific Mapping) based on encoder-decoder framework. In encoder, we use the kernel mapping to capture the non-linear property of both node attributes and network structure. In particular, we adopt two feature mapping functions, namely an untrainable function for node attributes and a trainable function for network structure. By the mapping functions, we obtain the low dimensional feature vectors for node attributes and network structure, respectively. Then, we design an attention layer to combine the learning of both feature vectors and adaptively learn the node embedding. In encoder, we adopt the component of reconstruction for the training process of learning node attributes and network structure. We conducted a set of experiments on seven real-world social network datasets. The experimental results verify the effectiveness and efficiency of our method in comparison with state-of-the-art baselines.
Chapter
Graph representation learning has demonstrated improved performance in tasks such as link prediction and node classification across a range of domains. Research has shown that many natural graphs can be organized in hierarchical communities, leading to approaches that use these communities to improve the quality of node representations. However, these approaches do not take advantage of the learned representations to also improve the quality of the discovered communities and establish an iterative and joint optimization of representation learning and community discovery. In this work, we present Mazi, an algorithm that jointly learns the hierarchical community structure and the node representations of the graph in an unsupervised fashion. To account for the structure in the node representations, Mazi generates node representations at each level of the hierarchy, and utilizes them to influence the node representations of the original graph. Further, the communities at each level are discovered by simultaneously maximizing the modularity metric and minimizing the distance between the representations of a node and its community. Using multi-label node classification and link prediction tasks, we evaluate our method on a variety of synthetic and real-world graphs and demonstrate that Mazi outperforms other hierarchical and non-hierarchical methods.KeywordsNetworksNetwork embeddingUnsupervised learningGraph representation learningHierarchical clusteringCommunity detection
Article
Graph embedding, which aims to learn low-dimensional node representations to preserve original graph structures, has attracted extensive research interests. However, most existing graph embedding models represent nodes in Euclidean spaces, which cannot effectively preserve complex patterns, e.g., hierarchical structures. Very recently, several hyperbolic embedding models have been proposed to preserve the hierarchical information in negative curvature spaces. Nevertheless, existing hyperbolic models fail to model the asymmetric proximity between nodes. To address this, we investigate a new asymmetric hyperbolic network representation problem, which targets at jointly preserving the hierarchical structures and asymmetric proximity for general directed graphs. We solve this problem by proposing a novel Ro tated L orentzian E mbedding (ROLE) model, which yields two main benefits. First, our model can effectively capture both implicit and explicit hierarchical structures that come from the network topology and category information of nodes, respectively. Second, it can model the asymmetric proximity using rotation transformations. Specifically, we represent each node with a Lorentzian embedding vector, and learn two rotation matrices to reflect the direction of edges. We conduct extensive experiments on four real-world directed graph datasets. Empirical results demonstrate that the proposed approach consistently outperforms various state-of-the-art embedding models. In particular, ROLE achieves HR@1 scores up to 19.8% higher and NDCG@5 scores up to 11.3% higher than the best baselines on the task of node recommendation.
Article
Hidden community is a useful concept proposed recently for social network analysis. Hidden communities indicate some weak communities whose most members also belong to other stronger dominant communities. Dominant communities could form a layer that partitions all the individuals of a network, and hidden communities could form other layer(s) underneath. These layers could be natural structures in the real-world networks like students grouped by major, minor, hometown, etc. To handle the rapid growth of network scale, in this work, we explore the detection of hidden communities from the local perspective, and propose a new method that detects and boosts each layer iteratively on a subgraph sampled from the original network. We first expand the seed set from a single seed node based on our modified local spectral method and detect an initial dominant local community. Then we temporarily remove the members of this community as well as their connections to other nodes, and detect all the neighborhood communities in the remaining subgraph, including some “broken communities” that only contain a fraction of members in the original network. The local community and neighborhood communities form a dominant layer, and by reducing the edge weights inside these communities, we weaken this layer’s structure to reveal the hidden layers. Eventually, we repeat the whole process and all communities containing the seed node can be detected and boosted iteratively. We theoretically show that our method can avoid some situations that a broken community and the local community are regarded as one community in the subgraph, leading to the inaccuracy on detection which can be caused by global hidden community detection methods. Extensive experiments show that our method could significantly outperform the state-of-the-art baselines designed for either global hidden community detection or multiple local community detection.
Article
A Symmetric Non-negative Matrix Factorization (SNMF)-based network embedding model adopts a unique Latent Factor (LF) matrix for describing the symmetry of an undirected network, which reduces its representation ability to the target network and thus resulting in accuracy loss when performing community detection. To address this issue, this paper proposes a new undirected network embedding model, i.e., Alternating Direction Method of Multipliers ( A DMM)-based, M odularity, S ymmetry and N onnegativity-constrained E mbedding (AMSNE), which can be applicable to undirected, weighted or unweighted networks. It relies on two-fold ideas: a) Introducing the symmetry constraints into the model to correctly describe the symmetric of an undirected network without accuracy loss; and b) Adopting the ADMM principle to efficiently solve its constrained objective. Extensive experiments on eight real-world networks strongly evidence that the proposed AMSNE outperform several state-of-the-art models, making it suitable for real applications.
Conference Paper
Full-text available
Network embedding is a method to learn low-dimensional representation vectors for nodes in complex networks. In real networks, nodes may have multiple tags but existing methods ignore the abundant semantic and hierarchical information of tags. This information is useful to many network applications and usually very stable. In this paper, we propose a tag representation learning model, Tag2Vec, which mixes nodes and tags into a hybrid network. Firstly, for tag networks, we define semantic distance as the proximity between tags and design a novel strategy, parameterized random walk, to generate context with semantic and hierarchical information of tags adaptively. Then, we propose hyperbolic Skip-gram model to express the complex hierarchical structure better with lower output dimensions. We evaluate our model on the NBER U.S. patent dataset and WordNet dataset. The results show that our model can learn tag representations with rich semantic information and it outperforms other baselines.
Conference Paper
Full-text available
Network embedding, as an approach to learn low-dimensional representations of vertices, has been proved extremely useful in many applications. Lots of state-of-the-art network embedding methods based on Skip-gram framework are efficient and effective. However, these methods mainly focus on the static network embedding and cannot naturally generalize to the dynamic environment. In this paper, we propose a stable dynamic embedding framework with high efficiency. It is an extension for the Skip-gram based network embedding methods, which can keep the optimality of the objective in the Skip-gram based methods in theory. Our model can not only generalize to the new vertex representation, but also update the most affected original vertex representations during the evolvement of the network. Multi-class classification on three real-world networks demonstrates that, our model can update the vertex representations efficiently and achieve the performance of retraining simultaneously. Besides, the visualization experimental result illustrates that, our model is capable of avoiding the embedding space drifting.
Conference Paper
Full-text available
Network embedding is a method of learning a low-dimensional vector representation of network vertices under the condition of preserving different types of network properties. Previous studies mainly focus on preserving structural information of vertices at a particular scale, like neighbor information or community information, but cannot preserve the hierarchical community structure, which would enable the network to be easily analyzed at various scales. Inspired by the hierarchical structure of galaxies, we propose the Galaxy Network Embedding (GNE) model, which formulates an optimization problem with spherical constraints to describe the hierarchical community structure preserving network embedding. More specifically, we present an approach of embedding communities into a low dimensional spherical surface, the center of which represents the parent community they belong to. Our experiments reveal that the representations from GNE preserve the hierarchical community structure and show advantages in several applications such as vertex multi-class classification and network visualization. The source code of GNE is available online.
Article
Full-text available
Network embedding assigns nodes in a network to low-dimensional representations and effectively preserves the network structure. Recently, a significant amount of progresses have been made toward this emerging network analysis paradigm. In this survey, we focus on categorizing and then reviewing the current development on network embedding methods, and point out its future research directions. We first summarize the motivation of network embedding. We discuss the classical graph embedding algorithms and their relationship with network embedding. Afterwards and primarily, we provide a comprehensive overview of a large number of network embedding methods in a systematic manner, covering the structure- and property-preserving network embedding methods, the network embedding methods with side information and the advanced information preserving network embedding methods. Moreover, several evaluation approaches for network embedding and some useful online resources, including the network data sets and softwares, are reviewed, too. Finally, we discuss the framework of exploiting these network embedding methods to build an effective system and point out some potential future directions.
Article
Many successful methods have been proposed for learning low dimensional representations on large-scale networks, while almost all existing methods are designed in inseparable processes, learning embeddings for entire networks even when only a small proportion of nodes are of interest. This leads to great inconvenience, especially on super-large or dynamic networks, where these methods become almost impossible to implement. In this paper, we formalize the problem of separated matrix factorization, based on which we elaborate a novel objective function that preserves both local and global information. We further propose SepNE, a simple and flexible network embedding algorithm which independently learns representations for different subsets of nodes in separated processes. By implementing separability, our algorithm reduces the redundant efforts to embed irrelevant nodes, yielding scalability to super-large networks, automatic implementation in distributed learning and further adaptations. We demonstrate the effectiveness of this approach on several real-world networks with different scales and subjects. With comparable accuracy, our approach significantly outperforms state-of-the-art baselines in running times on large networks.
Conference Paper
Recent works reveal that network embedding techniques enable many machine learning models to handle diverse downstream tasks on graph structured data. However, as previous methods usually focus on learning embeddings for a single network, they can not learn representations transferable on multiple networks. Hence, it is important to design a network embedding algorithm that supports downstream model transferring on different networks, known as domain adaptation. In this paper, we propose a novel Domain Adaptive Network Embedding framework, which applies graph convolutional network to learn transferable embeddings. In DANE, nodes from multiple networks are encoded to vectors via a shared set of learnable parameters so that the vectors share an aligned embedding space. The distribution of embeddings on different networks are further aligned by adversarial learning regularization. In addition, DANE's advantage in learning transferable network embedding can be guaranteed theoretically. Extensive experiments reflect that the proposed framework outperforms other state-of-the-art network embedding baselines in cross-network domain adaptation tasks.
Conference Paper
We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks. We observe large improvements in accuracy at much lower computational cost, i.e. it takes less than a day to learn high quality word vectors from a 1.6 billion words data set. Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.
Article
Representation learning has become an invaluable approach for learning from symbolic data such as text and graphs. However, while complex symbolic datasets often exhibit a latent hierarchical structure, state-of-the-art methods typically learn embeddings in Euclidean vector spaces, which do not account for this property. For this purpose, we introduce a new approach for learning hierarchical representations of symbolic data by embedding them into hyperbolic space -- or more precisely into an n-dimensional Poincar\'e ball. Due to the underlying hyperbolic geometry, this allows us to learn parsimonious representations of symbolic data by simultaneously capturing hierarchy and similarity. We introduce an efficient algorithm to learn the embeddings based on Riemannian optimization and show experimentally that Poincar\'e embeddings outperform Euclidean embeddings significantly on data with latent hierarchies, both in terms of representation capacity and in terms of generalization ability.
Article
We analyze skip-gram with negative-sampling (SGNS), a word embedding method introduced by Mikolov et al., and show that it is implicitly factorizing a word-context matrix, whose cells are the pointwise mutual information (PMI) of the respective word and context pairs, shifted by a global constant. We find that another embedding method, NCE, is implicitly factorizing a similar matrix, where each cell is the (shifted) log conditional probability of a word given its context. We show that using a sparse Shifted Positive PMI word-context matrix to represent words improves results on two word similarity tasks and one of two analogy tasks. When dense low-dimensional vectors are preferred, exact factorization with SVD can achieve solutions that are at least as good as SGNS's solutions for word similarity tasks. On analogy questions SGNS remains superior to SVD. We conjecture that this stems from the weighted nature of SGNS's factorization.