Yixiang Fang
The University of Hong Kong | HKU · Department of Computer Science

About

Publications

8,157

Reads

2,078

Citations

Skills and Expertise

Graphs

Algorithms

Algorithm Analysis

Algorithm Development

Graph Theory

Publications

In-depth Analysis of Densest Subgraph Discovery in a Unified Framework

Preprint

Full-text available

Jun 2024

As a fundamental topic in graph mining, Densest Subgraph Discovery (DSD) has found a wide spectrum of real applications. Several DSD algorithms, including exact and approximation algorithms, have been proposed in the literature. However, these algorithms have not been systematically and comprehensively compared under the same experimental settings....

Efficient Historical Butterfly Counting in Large Temporal Bipartite Networks via Graph Structure-aware Index

Preprint

Jun 2024

Bipartite graphs are ubiquitous in many domains, e.g., e-commerce platforms, social networks, and academia, by modeling interactions between distinct entity sets. Within these graphs, the butterfly motif, a complete 2*2 biclique, represents the simplest yet significant subgraph structure, crucial for analyzing complex network patterns. Counting the...

On Efficient Large Sparse Matrix Chain Multiplication

Article

May 2024

Sparse matrices are often used to model the interactions among different objects and they are prevalent in many areas including e-commerce, social networks, and biology. As one of the fundamental matrix operations, the sparse matrix chain multiplication (SMCM) aims to efficiently multiply a chain of sparse matrices, which has found various real-worl...

A Counting-based Approach for Efficient k-Clique Densest Subgraph Discovery

Article

May 2024

Densest subgraph discovery (DSD) is a fundamental topic in graph mining. It has been extensively studied in the literature and has found many real applications in a wide range of fields, such as biology, finance, and social networks. As a typical problem of DSD, the k-clique densest subgraph (CDS) problem aims to detect a subgraph from a graph, suc...

Conference Paper

May 2024

c-core-based pruning in flow-based algos

Efficient and effective algorithms for densest subgraph discovery and maintenance

Article

Full-text available

May 2024

The densest subgraph problem (DSP) is of great significance due to its wide applications in different domains. Meanwhile, diverse requirements in various applications lead to different density variants for DSP. Unfortunately, existing DSP algorithms cannot be easily extended to handle those variants efficiently and accurately. To fill this gap, we...

Efficient Distributed Hop-Constrained Path Enumeration on Large-Scale Graphs

Article

Mar 2024

The enumeration of hop-constrained simple paths is a building block in many graph-based areas. Due to the enormous search spaces in large-scale graphs, a single machine can hardly satisfy the requirements of both efficiency and memory, which causes an urgent need for efficient distributed methods. In practice, it is inevitable to produce plenty of...

Influential Exemplar Replay for Incremental Learning in Recommender Systems

Article

Mar 2024

Personalized recommender systems have found widespread applications for effective information filtering. Conventional models engage in knowledge mining within the static setting to reconstruct singular historical data. Nonetheless, the dynamics of real-world environments are in a constant state of flux, rendering acquired model knowledge inadequate...

Deep Structural Knowledge Exploitation and Synergy for Estimating Node Importance Value on Heterogeneous Information Networks

Article

Mar 2024

The classic problem of node importance estimation has been conventionally studied with homogeneous network topology analysis. To deal with practical network heterogeneity, a few recent methods employ graph neural models to automatically learn diverse sources of information. However, the major concern revolves around that their fully adaptive learni...

GSim: A Graph Neural Network Based Relevance Measure for Heterogeneous Graphs

Article

Dec 2023

Heterogeneous graphs, which contain nodes and edges of multiple types, are prevalent in various domains, including bibliographic networks, social media, and knowledge graphs. As a fundamental task in analyzing heterogeneous graphs, relevance measure aims to calculate the relevance between two objects of different types, which has been used in many...

Efficient Core Maintenance in Large Bipartite Graphs

Article

Nov 2023

As an important cohesive subgraph model in bipartite graphs, the (α, β)-core (a.k.a. bi-core) has found a wide spectrum of real-world applications, such as product recommendation, fraudster detection, and community search. In these applications, the bipartite graphs are often large and dynamic, where vertices and edges are inserted and deleted freq...

Accelerating directed densest subgraph queries with software and hardware approaches

Article

Full-text available

Jul 2023

Given a directed graph G, the directed densest subgraph (DDS) problem refers to finding a subgraph from G, whose density is the highest among all subgraphs of G. The DDS problem is fundamental to a wide range of applications, such as fake follower detection and community mining. Theoretically, the DDS problem closely connects to other essential gra...

Influential Community Search over Large Heterogeneous Information Networks

Article

Jun 2023

Recently, the topic of influential community search has gained much attention. Given a graph, it aims to find communities of vertices with high importance values from it. Existing works mainly focus on conventional homogeneous networks, where vertices are of the same type. Thus, they cannot be applied to heterogeneous information networks (HINs) li...

On Querying Connected Components in Large Temporal Graphs

Article

Jun 2023

In this paper, for the first time, we introduce the concepts of window-CCs and window-SCCs on undirected and directed temporal graphs, respectively. We then study the queries of window-CC and window-SCC by developing several efficient index-based query solutions. The space costs of the best indices are linear to the sizes of the temporal graphs. Th...

WISK: A Workload-aware Learned Index for Spatial Keyword Queries

Article

Jun 2023

Spatial objects often come with textual information, such as Points of Interest (POIs) with their descriptions, which are referred to as geo-textual data. To retrieve such data, spatial keyword queries that take into account both spatial proximity and textual relevance have been extensively studied. Existing indexes designed for spatial keyword que...

Efficient and Effective Algorithms for Generalized Densest Subgraph Discovery

Article

Jun 2023

A Survey of Densest Subgraph Discovery on Large Graphs

Preprint

Jun 2023

With the prevalence of graphs for modeling complex relationships among objects, the topic of graph mining has attracted a great deal of attention from both academic and industrial communities in recent years. As one of the most fundamental problems in graph mining, the densest subgraph discovery (DSD) problem has found a wide spectrum of real appli...

Bipartite Graph Convolutional Hashing for Effective and Efficient Top-N Search in Hamming Space

Conference Paper

Apr 2023

Bipartite Graph Convolutional Hashing for Effective and Efficient Top-N Search in Hamming Space

Preprint

Apr 2023

Searching on bipartite graphs is basal and versatile to many real-world Web applications, e.g., online recommendation, database retrieval, and query-document searching. Given a query node, the conventional approaches rely on the similarity matching with the vectorized node embeddings in the continuous Euclidean space. To efficiently manage intensiv...

Finding Top-k Important Edges on Bipartite Graphs: Ego-betweenness Centrality-based Approaches

Conference Paper

Apr 2023

Scalable Algorithms for Densest Subgraph Discovery

Conference Paper

Apr 2023

Figure 3: The typical RL learning framework subset of D that includes...

Figure 6: An example of the state representation

WISK: A Workload-aware Learned Index for Spatial Keyword Queries

Preprint

Full-text available

Feb 2023

ABLE: Meta-Path Prediction in Heterogeneous Information Networks

Article

Aug 2022

Given a heterogeneous information network (HIN) H, a head node h, a meta-path P, and a tail node t, the meta-path prediction aims at predicting whether h can be linked to t by an instance of P. Most existing solutions either require predefined meta-paths, which limits their scalability to schema-rich HINs and long meta-paths, or do not aim at pre...

Fig. 1: An example bibliographic information network and relevance measure.

Fig. 2: Context paths between a_1 and a_2.

Fig. 3: The overall framework of the CP-GNN.

Fig. 4: The illustration of relation message passing.

Statistics of datasets. The labeled node types are highlighted by *.

GSim: A Graph Neural Network based Relevance Measure for Heterogeneous Graphs

Preprint

Full-text available

Aug 2022

Densest subgraph discovery on large graphs: applications, challenges, and techniques

Article

Aug 2022

As one of the most fundamental problems in graph data mining, the densest subgraph discovery (DSD) problem has found a broad spectrum of real applications, such as social network community detection, graph index construction, regulatory motif discovery in DNA, fake follower detection, and so on. Theoretically, DSD closely relates to other fundament...

The Social Technology and Research (STAR) Lab in the University of Hong Kong

Article

Jul 2022

The main goal of the Social Technology and Research Laboratory (STAR Lab) in the University of Hong Kong (https://star.hku.hk) is to develop novel IT technologies for serving the society. Our team has more than three years of experience in project development, web, app, and game design, photography, and video production. We are interested in Data S...

Effective community search over large star-schema heterogeneous information networks

Article

Jul 2022

Community search (CS) enables personalized community discovery and has found a wide spectrum of emerging applications such as setting up social events and friend recommendation. While CS has been extensively studied for conventional homogeneous networks, the problem for heterogeneous information networks (HINs) has received attention only recently....

A Convex-Programming Approach for Efficient Directed Densest Subgraph Discovery

Conference Paper

Jun 2022

Constrained Path Search with Submodular Function Maximization

Conference Paper

May 2022

Estimating Node Importance Values in Heterogeneous Information Networks

Conference Paper

May 2022

Comparison Analysis

Chapter

Feb 2022

In the previous two chapters, we have extensively introduced the CSMs and solutions for the bipartite networks and other general HINs, such as core-, truss-, clique-, connectivity-, and density-based models and solutions. These works focus on different types of specific HINs and formulate CSMs in different manners, so a natural question is which CS...

CSS on Other General HINs

Chapter

Feb 2022

In the era of big data, most of data or informational objects, individual agents, groups, or components are interconnected or interact with each other, forming numerous, large, interconnected, and sophisticated networks, which are often called heterogeneous information networks (HINs) in the literature. For instance, Twitter contains 326 million mo...

Future Work and Conclusion

Chapter

Jan 2022

Although much research effort has been devoted to CSS over large HINs over the past several decades, there are still many issues that are not well addressed, thus there is still much room to perform further study on CSS over large HINs in the future, from the perspectives of effective CSMs, computational efficiency, parameter optimization, tools, e...

CSS on Bipartite Networks

Chapter

Jan 2022

In many real-world applications, relationships between two different types of entities (e.g., user-item, people-location, and author-paper) are naturally modeled as bipartite networks. When analyzing bipartite networks, CSMs and CSS techniques play an important role in many aspects including network measurement, dense region discovering, and network...

Cohesive Subgraph Search Over Large Heterogeneous Information Networks

Book

Jan 2022

Preliminaries

Chapter

Jan 2022

To formally introduce the research problem and solutions of CSS over HINs, people often follow some commonly-used notations and models to facilitate the presentation. Before reviewing the specific models and solutions in the following chapters, in this chapter we first formally introduce the data models of HINs and bipartite networks, and then revi...

Related Work on CSMs and Solutions

Chapter

Jan 2022

In the literature, the topic of CSS on graphs has received tremendous research attention and it has been extensively studied in the past several decades, and most existing research works focus on conventional homogeneous networks. However, the models and solutions of these works are highly related to those of CSS over large HINs. In this cha...

Figure 2: Attention matrix of 3-length context path on ACM.

CP-GNN: A Software for Community Detection in Heterogeneous Information Networks

Article

Full-text available

Nov 2021

Recently, the topic of community detection (CD) in heterogeneous information networks (HINs), which contain multiple types of nodes and edges, has received much attention. However, existing CD methods could not well exploit the high-order relationship among the nodes to detect communities. To alleviate this issue, we propose to use the concept of c...

Relation Prediction via Graph Neural Network in Heterogeneous Information Networks with Missing Type Information

Conference Paper

Oct 2021

Detecting Communities from Heterogeneous Graphs: A Context Path-based Graph Neural Network Model

Conference Paper

Oct 2021

Detecting Communities from Heterogeneous Graphs: A Context Path-based Graph Neural Network Model

Preprint

Sep 2021

Community detection, aiming to group the graph nodes into clusters with dense inner-connection, is a fundamental graph mining task. Recently, it has been studied on the heterogeneous graph, which contains multiple types of nodes and edges, posing great challenges for modeling the high-order relationship between nodes. With the surge of graph embedd...

Efficient Directed Densest Subgraph Discovery

Article

Jun 2021

Given a directed graph G, the directed densest subgraph (DDS) problem refers to the finding of a subgraph from G, whose density is the highest among all the subgraphs of G. The DDS problem is fundamental to a wide range of applications, such as fraud detection, community mining, and graph compression. However, existing DDS solutions suffer from eff...

Cohesive Subgraph Search over Big Heterogeneous Information Networks: Applications, Challenges, and Solutions

Conference Paper

Jun 2021

Efficient bi-triangle counting for large bipartite networks

Article

Feb 2021

A bipartite network is a network with two disjoint vertex sets and its edges only exist between vertices from different sets. It has received much interest since it can be used to model the relationship between two different sets of objects in many applications (e.g., the relationship between users and items in E-commerce). In this paper, we study...

Correction: A survey of community search over big graphs

Article

Full-text available

Sep 2020

In the original article, Table 1 was published with incorrect figures. The correct Table 1 is given below.

Inductive Link Prediction for Nodes Having Only Attribute Information

Preprint

Full-text available

Jul 2020

Predicting the link between two nodes is a fundamental problem for graph data analytics. In attributed graphs, both the structure and attribute information can be utilized for link prediction. Most existing studies focus on transductive link prediction where both nodes are already in the graph. However, many real-world applications require inductiv...

Optimal Region Search with Submodular Maximization

Conference Paper

Jul 2020

Region search is an important problem in location-based services due to its wide applications. In this paper, we study the problem of optimal region search with submodular maximization (ORS-SM). This problem considers a region as a connected subgraph. We compute an objective value over the locations in the region using a submodular function and a b...

Inductive Link Prediction for Nodes Having Only Attribute Information

Conference Paper

Full-text available

Jul 2020

Efficient Community Search over Large Directed Graph: An Augmented Index-based Approach

Conference Paper

Jul 2020

Given a graph G and a query vertex q, the topic of community search (CS), aiming to retrieve a dense subgraph of G containing q, has gained much attention. Most existing works focus on undirected graphs which overlooks the rich information carried by the edge directions. Recently, the problem of community search over directed graphs (or CSD problem...

Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs

Conference Paper

Jun 2020

MC-Explorer: Analyzing and Visualizing Motif-Cliques on Large Networks

Conference Paper

Apr 2020

Effective and Efficient Truss Computation over Large Heterogeneous Information Networks

Conference Paper

Apr 2020

Tree organization of all the k-influential communities (the ICP-Index)...

Partial graphs in the memory (k=2...

An example of a multi-valued graph [126]

A survey of community search over big graphs

Article

Full-text available

Jan 2020

With the rapid development of information technologies, various big graphs are prevalent in many real applications (e.g., social media and knowledge bases). An important component of these graphs is the network community. Essentially, a community is a group of vertices which are densely connected internally. Community retrieval can be used in many...

Evaluating pattern matching queries for spatial databases

Article

Full-text available

Oct 2019

In this paper, we study the spatial pattern matching (SPM) query. Given a set D of spatial objects (e.g., houses and shops), each with a textual description, we aim at finding all combinations of objects from D that match a user-defined spatial pattern P. A pattern P is a graph whose vertices represent spatial objects, and edges denote distance rela...

LINC: a motif counting algorithm for uncertain graphs

Article

Oct 2019

In graph applications (e.g., biological and social networks), various analytics tasks (e.g., clustering and community search) are carried out to extract insight from large and complex graphs. Central to these tasks is the counting of the number of motifs, which are graphs with a few nodes. Recently, researchers have developed several fast motif co...

Spatial pattern matching: a new direction for finding spatial objects

Article

Aug 2019

In this paper, we study the spatial pattern matching (SPM) query. Given a set D of spatial objects (e.g., houses and shops), each with a textual description, we aim at finding all combinations of objects from D that match a user-defined spatial pattern P. A pattern P is a graph whose vertices represent spatial objects, and edges denote distance rel...

Efficient algorithms for densest subgraph discovery

Article

Jul 2019

Densest subgraph discovery (DSD) is a fundamental problem in graph mining. It has been studied for decades, and is widely used in various areas, including network science, biological analysis, and graph databases. Given a graph G, DSD aims to find a subgraph D of G with the highest density (e.g., the number of edges over the number of vertices in D...

Efficient Algorithms for Densest Subgraph Discovery

Preprint

Jun 2019

LABIN: Balanced Min Cut for Large-Scale Data

Article

May 2019

Although many spectral clustering algorithms have been proposed during the past decades, they are not scalable to large-scale data due to their high computational complexities. In this paper, we propose a novel spectral clustering method for large-scale data, namely, large-scale balanced min cut (LABIN). A new model is proposed to extend the self-b...

A Survey of Community Search Over Big Graphs

Preprint

Apr 2019

Structured Spectral Clustering of PurTree Data

Chapter

Apr 2019

Recently, a “Purchase Tree” data structure is proposed to compress the customer transaction data and a local PurTree Spectral clustering method is proposed to recover the cluster structure from the purchase trees. However, in the PurTree distance, the node weights for the children nodes of a parent node are set as equal and the difference between d...

Exploring Communities in Large Profiled Graphs (Extended Abstract)

Conference Paper

Apr 2019

Discovering Maximal Motif Cliques in Large Heterogeneous Information Networks

Conference Paper

Apr 2019

Effective and Efficient Community Search Over Large Directed Graphs (Extended Abstract)

Conference Paper

Apr 2019

Exploring Communities in Large Profiled Graphs

Article

Nov 2018

Given a graph G and a vertex $q \in G$, the community search (CS) problem aims to efficiently find a subgraph of G whose vertices are closely related to q. Communities are prevalent in social and biological networks, and can be used in product advertisement and social event recommendation. In this paper, we study profiled community search (P...

Effective and Efficient Community Search Over Large Directed Graphs

Article

Oct 2018

Communities are prevalent in social networks, knowledge graphs, and biological networks. Recently, the topic of community search (CS), extracting a dense subgraph containing a query vertex q from a graph, has received great attention. However, existing CS solutions are designed for undirected graphs, and overlook directions of edges which potential...

On Spatial-Aware Community Search

Article

Jun 2018

Communities are prevalent in social networks, knowledge graphs, and biological networks. Recently, the topic of community search (CS) has received plenty of attention. The CS problem aims to look for a dense subgraph that contains a query vertex. Existing CS solutions do not consider the spatial extent of a community. They can yield communities who...

STEM: a suffix tree-based method for web data records extraction

Article

Full-text available

May 2018

To automatically extract data records from Web pages, the data record extraction algorithm is required to be robust and efficient. However, most existing algorithms are not robust enough to cope with rich information or noisy data. In this paper, we propose a novel suffix tree-based extraction method (STEM) for this challenging task. First, we e...

Scalable Evaluation of k-NN Queries on Large Uncertain Graphs

Article

Jan 2018

Large graphs are prevalent in social networks, traffic networks, and biology. These graphs are often inexact. For example, in a friendship network, an edge between two nodes u and v indicates that users u and v have a close relationship. This edge may only exist with a probability. To model such information, the uncertain graph model has been propo...

On Attributed Community Search

Chapter

Jan 2018

The process of splitting nodes in a path

Comparing with community detection method. a Keyword (CMF), b Keyword...

Effective and efficient attributed community search

Article

Full-text available

Dec 2017

Given a graph G and a vertex $q \in G$, the community search query returns a subgraph of G that contains vertices related to q. Communities, which are prevalent in attributed graphs such as social networks and knowledge bases, can be used in emerging applications such as product advertisement and setting up of social events. In this paper, we inv...

On Embedding Uncertain Graphs

Conference Paper

Full-text available

Nov 2017

Graph data are prevalent in communication networks, social media, and biological networks. These data, which are often noisy or inexact, can be represented by uncertain graphs, whose edges are associated with probabilities to indicate the chances that they exist. Recently, researchers have studied various algorithms (e.g., clustering, classificatio...

PurTreeClust: A Clustering Algorithm for Customer Segmentation from Massive Customer Transaction Data

Article

Oct 2017

Clustering of customer transaction data is an important procedure to analyze customer behaviors in retail and e-commerce companies. Note that products from companies are often organized as a product tree, in which the leaf nodes are goods to sell, and the internal nodes (except root node) could be multiple product categories. Based on this tree, we...

C-explorer: browsing communities in large graphs

Article

Aug 2017

Community retrieval (CR) algorithms, which enable the extraction of subgraphs from large social networks (e.g., Facebook and Twitter), have attracted tremendous interest. Various CR solutions, such as k-core and codicil, have been proposed to obtain graphs whose vertices are closely related. In this paper, we propose the C-Explorer system to assist...

On Minimal Steiner Maximum-Connected Subgraph Queries

Article

Full-text available

Jul 2017

Given a graph $G$ and a set $Q$ of query nodes, we examine the Steiner Maximum-Connected Subgraph (SMCS) problem. The SMCS, or $G$'s induced subgraph that contains $Q$ with the largest connectivity, can be useful for customer prediction, product promotion, and team assembling. Despite its importance, the SMCS problem has only been recently studied...

Effective community search over large spatial graphs

Article

Full-text available

Feb 2017

Communities are prevalent in social networks, knowledge graphs, and biological networks. Recently, the topic of community search (CS) has received plenty of attention. Given a query vertex, CS looks for a dense subgraph that contains it. Existing CS solutions do not consider the spatial extent of a community. They can yield communities whose locati...

Querying Minimal Steiner Maximum-Connected Subgraphs in Large Graphs

Conference Paper

Oct 2016

Given a graph G and a set Q of query nodes, we examine the Steiner Maximum-Connected Subgraph (SMCS). The SMCS, or G's induced subgraph that contains Q with the largest connectivity, can be useful for customer prediction, product promotion, and team assembling. Despite its importance, the SMCS problem has only been recently studied. Existing soluti...

Effective community search for large attributed graphs

Article

Aug 2016

Given a graph G and a vertex q ∈ G, the community search query returns a subgraph of G that contains vertices related to q. Communities, which are prevalent in attributed graphs such as social networks and knowledge bases, can be used in emerging applications such as product advertisement and setting up of social events. In this paper, we investiga...

Scalable algorithms for nearest-neighbor joins on big trajectory data

Conference Paper

May 2016

Walking in the Cloud: Parallel SimRank at Scale

Article

Full-text available

Sep 2015

Despite its popularity, SimRank is computationally costly, in both time and space. In particular, its recursive nature poses a great challenge in using modern distributed computing power, and also prevents querying similarities individually. Existing solutions suffer greatly from these practical issues. In this paper, we break such dependency for m...

Scalable Algorithms for Nearest-Neighbor Joins on Big Trajectory Data

Article

Full-text available

Jan 2015

Trajectory data are prevalent in systems that monitor the locations of moving objects. In a location-based service, for instance, the positions of vehicles are continuously monitored through GPS; the trajectory of each vehicle describes its movement history. We study joins on two sets of trajectories, generated by two sets M and R of moving objects...

Detecting hot topics from Twitter: A multiview approach

Article

Full-text available

Oct 2014

Twitter is widely used all over the world, and a huge number of hot topics are generated by Twitter users in real time. These topics are able to reflect almost every aspect of people's daily lives. Therefore, the detection of topics in Twitter can be used in many real applications, such as monitoring public opinion, hot product recommendation and i...

Back-buy prediction based on TriFG

Article

Full-text available

Aug 2012

Reciprocal relationships on Twitter can be predicted by the TriFG model. Based on this model, we study the extent to which the formation of a two-way relationship can be predicted in a dynamic e-commerce website composed of products and customers, especially the back-buy behavior. Back-buy behavior represents a more stable interest direction o...

Extracting data records from web using suffix tree

Article

Full-text available

Aug 2012

There are many automatic methods that can extract lists of objects from the Web, but they often fail to handle multi-type pages automatically. This paper introduces a new method for record extraction using a suffix tree, which can find repeated substrings. Our method transfers a distinct group of tag paths appearing repeatedly in the DOM tree of t...