Yixiang Fang

The University of Hong Kong | HKU · Department of Computer Science

About

85
Publications
8,157
Reads
2,078
Citations


Publications (85)
Preprint
Full-text available
As a fundamental topic in graph mining, Densest Subgraph Discovery (DSD) has found a wide spectrum of real applications. Several DSD algorithms, including exact and approximation algorithms, have been proposed in the literature. However, these algorithms have not been systematically and comprehensively compared under the same experimental settings....
Preprint
Bipartite graphs are ubiquitous in many domains, e.g., e-commerce platforms, social networks, and academia, by modeling interactions between distinct entity sets. Within these graphs, the butterfly motif, a complete 2×2 biclique, represents the simplest yet significant subgraph structure, crucial for analyzing complex network patterns. Counting the...
Article
Sparse matrices are often used to model the interactions among different objects and they are prevalent in many areas including e-commerce, social network, and biology. As one of the fundamental matrix operations, the sparse matrix chain multiplication (SMCM) aims to efficiently multiply a chain of sparse matrices, which has found various real-worl...
Article
Densest subgraph discovery (DSD) is a fundamental topic in graph mining. It has been extensively studied in the literature and has found many real applications in a wide range of fields, such as biology, finance, and social networks. As a typical problem of DSD, the k-clique densest subgraph (CDS) problem aims to detect a subgraph from a graph, suc...
Article
Full-text available
The densest subgraph problem (DSP) is of great significance due to its wide applications in different domains. Meanwhile, diverse requirements in various applications lead to different density variants for DSP. Unfortunately, existing DSP algorithms cannot be easily extended to handle those variants efficiently and accurately. To fill this gap, we...
Article
The enumeration of hop-constrained simple paths is a building block in many graph-based areas. Due to the enormous search spaces in large-scale graphs, a single machine can hardly satisfy the requirements of both efficiency and memory, which causes an urgent need for efficient distributed methods. In practice, it is inevitable to produce plenty of...
Article
Personalized recommender systems have found widespread applications for effective information filtering. Conventional models engage in knowledge mining within the static setting to reconstruct singular historical data. Nonetheless, the dynamics of real-world environments are in a constant state of flux, rendering acquired model knowledge inadequate...
Article
The classic problem of node importance estimation has been conventionally studied with homogeneous network topology analysis. To deal with practical network heterogeneity, a few recent methods employ graph neural models to automatically learn diverse sources of information. However, the major concern revolves around that their fully adaptive learni...
Article
Heterogeneous graphs, which contain nodes and edges of multiple types, are prevalent in various domains, including bibliographic networks, social media, and knowledge graphs. As a fundamental task in analyzing heterogeneous graphs, relevance measure aims to calculate the relevance between two objects of different types, which has been used in many...
Article
As an important cohesive subgraph model in bipartite graphs, the (α, β)-core (a.k.a. bi-core) has found a wide spectrum of real-world applications, such as product recommendation, fraudster detection, and community search. In these applications, the bipartite graphs are often large and dynamic, where vertices and edges are inserted and deleted freq...
Article
Full-text available
Given a directed graph G, the directed densest subgraph (DDS) problem refers to finding a subgraph from G, whose density is the highest among all subgraphs of G. The DDS problem is fundamental to a wide range of applications, such as fake follower detection and community mining. Theoretically, the DDS problem closely connects to other essential gra...
Article
Recently, the topic of influential community search has gained much attention. Given a graph, it aims to find communities of vertices with high importance values from it. Existing works mainly focus on conventional homogeneous networks, where vertices are of the same type. Thus, they cannot be applied to heterogeneous information networks (HINs) li...
Article
In this paper, for the first time, we introduce the concepts of window-CCs and window-SCCs on undirected and directed temporal graphs, respectively. We then study the queries of window-CC and window-SCC by developing several efficient index-based query solutions. The space costs of the best indices are linear to the sizes of the temporal graphs. Th...
Article
Spatial objects often come with textual information, such as Points of Interest (POIs) with their descriptions, which are referred to as geo-textual data. To retrieve such data, spatial keyword queries that take into account both spatial proximity and textual relevance have been extensively studied. Existing indexes designed for spatial keyword que...
Article
The densest subgraph problem (DSP) is of great significance due to its wide applications in different domains. Meanwhile, diverse requirements in various applications lead to different density variants for DSP. Unfortunately, existing DSP algorithms cannot be easily extended to handle those variants efficiently and accurately. To fill this gap, we...
Preprint
With the prevalence of graphs for modeling complex relationships among objects, the topic of graph mining has attracted a great deal of attention from both academic and industrial communities in recent years. As one of the most fundamental problems in graph mining, the densest subgraph discovery (DSD) problem has found a wide spectrum of real appli...
Preprint
Searching on bipartite graphs is basal and versatile to many real-world Web applications, e.g., online recommendation, database retrieval, and query-document searching. Given a query node, the conventional approaches rely on the similarity matching with the vectorized node embeddings in the continuous Euclidean space. To efficiently manage intensiv...
Preprint
Full-text available
Spatial objects often come with textual information, such as Points of Interest (POIs) with their descriptions, which are referred to as geo-textual data. To retrieve such data, spatial keyword queries that take into account both spatial proximity and textual relevance have been extensively studied. Existing indexes designed for spatial keyword que...
Article
Given a heterogeneous information network (HIN) H, a head node h, a meta-path P, and a tail node t, the meta-path prediction aims at predicting whether h can be linked to t by an instance of P. Most existing solutions either require predefined meta-paths, which limits their scalability to schema-rich HINs and long meta-paths, or do not aim at pre...
Preprint
Full-text available
Heterogeneous graphs, which contain nodes and edges of multiple types, are prevalent in various domains, including bibliographic networks, social media, and knowledge graphs. As a fundamental task in analyzing heterogeneous graphs, relevance measure aims to calculate the relevance between two objects of different types, which has been used in many...
Article
As one of the most fundamental problems in graph data mining, the densest subgraph discovery (DSD) problem has found a broad spectrum of real applications, such as social network community detection, graph index construction, regulatory motif discovery in DNA, fake follower detection, and so on. Theoretically, DSD closely relates to other fundament...
Article
The main goal of the Social Technology and Research Laboratory (STAR Lab) at the University of Hong Kong (https://star.hku.hk) is to develop novel IT technologies that serve society. Our team has more than three years of experience in project development, web, app, and game design, photography, and video production. We are interested in Data S...
Article
Community search (CS) enables personalized community discovery and has found a wide spectrum of emerging applications such as setting up social events and friend recommendation. While CS has been extensively studied for conventional homogeneous networks, the problem for heterogeneous information networks (HINs) has received attention only recently....
Chapter
In the previous two chapters, we have extensively introduced the CSMs and solutions for the bipartite networks and other general HINs, such as core-, truss-, clique-, connectivity-, and density-based models and solutions. These works focus on different types of specific HINs and formulate CSMs in different manners, so a natural question is which CS...
Chapter
In the era of big data, most data or informational objects, individual agents, groups, or components are interconnected or interact with each other, forming numerous, large, interconnected, and sophisticated networks, which are often called heterogeneous information networks (HINs) in the literature. For instance, Twitter contains 326 million mo...
Chapter
Although much research effort has been devoted to CSS over large HINs over the past several decades, there are still many issues that are not well addressed, thus there is still much room to perform further study on CSS over large HINs in the future, from the perspectives of effective CSMs, computational efficiency, parameter optimization, tools, e...
Chapter
In many real-world applications, relationships between two different types of entities (e.g., user-item, people-location, and author-paper) are naturally modeled as bipartite networks. When analyzing bipartite networks, CSMs and CSS techniques play an important role in many aspects including network measurement, dense region discovery, and network...
Chapter
To formally introduce the research problem and solutions of CSS over HINs, people often follow some commonly-used notations and models to facilitate the presentation. Before reviewing the specific models and solutions in the following chapters, in this chapter we first formally introduce the data models of HINs and bipartite networks, and then revi...
Chapter
In the literature, the topic of CSS on graphs has received tremendous research attention and has been extensively studied in the past several decades, and most existing research works focus on conventional homogeneous networks. However, the models and solutions of these works are highly related to those of CSS over large HINs. In this cha...
Article
Full-text available
Recently, the topic of community detection (CD) in heterogeneous information networks (HINs), which contain multiple types of nodes and edges, has received much attention. However, existing CD methods could not well exploit the high-order relationship among the nodes to detect communities. To alleviate this issue, we propose to use the concept of c...
Preprint
Community detection, aiming to group the graph nodes into clusters with dense inner-connection, is a fundamental graph mining task. Recently, it has been studied on the heterogeneous graph, which contains multiple types of nodes and edges, posing great challenges for modeling the high-order relationship between nodes. With the surge of graph embedd...
Article
Given a directed graph G, the directed densest subgraph (DDS) problem refers to the finding of a subgraph from G, whose density is the highest among all the subgraphs of G. The DDS problem is fundamental to a wide range of applications, such as fraud detection, community mining, and graph compression. However, existing DDS solutions suffer from eff...
Article
A bipartite network is a network with two disjoint vertex sets and its edges only exist between vertices from different sets. It has received much interest since it can be used to model the relationship between two different sets of objects in many applications (e.g., the relationship between users and items in E-commerce). In this paper, we study...
Article
Full-text available
In the original article, Table 1 was published with incorrect figures. The correct Table 1 is given below.
Preprint
Full-text available
Predicting the link between two nodes is a fundamental problem for graph data analytics. In attributed graphs, both the structure and attribute information can be utilized for link prediction. Most existing studies focus on transductive link prediction where both nodes are already in the graph. However, many real-world applications require inductiv...
Conference Paper
Region search is an important problem in location-based services due to its wide applications. In this paper, we study the problem of optimal region search with submodular maximization (ORS-SM). This problem considers a region as a connected subgraph. We compute an objective value over the locations in the region using a submodular function and a b...
Conference Paper
Full-text available
Predicting the link between two nodes is a fundamental problem for graph data analytics. In attributed graphs, both the structure and attribute information can be utilized for link prediction. Most existing studies focus on transductive link prediction where both nodes are already in the graph. However, many real-world applications require inductiv...
Conference Paper
Given a graph G and a query vertex q, the topic of community search (CS), aiming to retrieve a dense subgraph of G containing q, has gained much attention. Most existing works focus on undirected graphs which overlooks the rich information carried by the edge directions. Recently, the problem of community search over directed graphs (or CSD problem...
Conference Paper
Given a graph G and a query vertex q, the topic of community search (CS), aiming to retrieve a dense subgraph of G containing q, has gained much attention. Most existing works focus on undirected graphs which overlooks the rich information carried by the edge directions. Recently, the problem of community search over directed graphs (or CSD problem...
Article
Full-text available
With the rapid development of information technologies, various big graphs are prevalent in many real applications (e.g., social media and knowledge bases). An important component of these graphs is the network community. Essentially, a community is a group of vertices which are densely connected internally. Community retrieval can be used in many...
Article
Full-text available
In this paper, we study the spatial pattern matching (SPM) query. Given a set D of spatial objects (e.g., houses and shops), each with a textual description, we aim at finding all combinations of objects from D that match a user-defined spatial pattern P. A pattern P is a graph whose vertices represent spatial objects, and edges denote distance rela...
Article
In graph applications (e.g., biological and social networks), various analytics tasks (e.g., clustering and community search) are carried out to extract insight from large and complex graphs. Central to these tasks is the counting of the number of motifs, which are graphs with a few nodes. Recently, researchers have developed several fast motif co...
Article
In this paper, we study the spatial pattern matching (SPM) query. Given a set D of spatial objects (e.g., houses and shops), each with a textual description, we aim at finding all combinations of objects from D that match a user-defined spatial pattern P. A pattern P is a graph whose vertices represent spatial objects, and edges denote distance rel...
Article
Densest subgraph discovery (DSD) is a fundamental problem in graph mining. It has been studied for decades, and is widely used in various areas, including network science, biological analysis, and graph databases. Given a graph G, DSD aims to find a subgraph D of G with the highest density (e.g., the number of edges over the number of vertices in D...
Preprint
Densest subgraph discovery (DSD) is a fundamental problem in graph mining. It has been studied for decades, and is widely used in various areas, including network science, biological analysis, and graph databases. Given a graph G, DSD aims to find a subgraph D of G with the highest density (e.g., the number of edges over the number of vertices in D...
Article
Although many spectral clustering algorithms have been proposed during the past decades, they are not scalable to large-scale data due to their high computational complexities. In this paper, we propose a novel spectral clustering method for large-scale data, namely, large-scale balanced min cut (LABIN). A new model is proposed to extend the self-b...
Preprint
With the rapid development of information technologies, various big graphs are prevalent in many real applications (e.g., social media and knowledge bases). An important component of these graphs is the network community. Essentially, a community is a group of vertices which are densely connected internally. Community retrieval can be used in many...
Chapter
Recently, a “Purchase Tree” data structure was proposed to compress the customer transaction data and a local PurTree Spectral clustering method was proposed to recover the cluster structure from the purchase trees. However, in the PurTree distance, the node weights for the children nodes of a parent node are set as equal and the difference between d...
Article
Given a graph G and a vertex $q \in G$, the community search (CS) problem aims to efficiently find a subgraph of G whose vertices are closely related to q. Communities are prevalent in social and biological networks, and can be used in product advertisement and social event recommendation. In this paper, we study profiled community search (P...
Article
Communities are prevalent in social networks, knowledge graphs, and biological networks. Recently, the topic of community search (CS), extracting a dense subgraph containing a query vertex q from a graph, has received great attention. However, existing CS solutions are designed for undirected graphs, and overlook directions of edges which potential...
Article
Communities are prevalent in social networks, knowledge graphs, and biological networks. Recently, the topic of community search (CS) has received plenty of attention. The CS problem aims to look for a dense subgraph that contains a query vertex. Existing CS solutions do not consider the spatial extent of a community. They can yield communities who...
Article
Full-text available
To automatically extract data records from Web pages, the data record extraction algorithm is required to be robust and efficient. However, most existing algorithms are not robust enough to cope with rich information or noisy data. In this paper, we propose a novel suffix tree-based extraction method (STEM) for this challenging task. First, we e...
Article
Large graphs are prevalent in social networks, traffic networks, and biology. These graphs are often inexact. For example, in a friendship network, an edge between two nodes u and v indicates that users u and v have a close relationship. This edge may only exist with a probability. To model such information, the uncertain graph model has been propo...
Article
Full-text available
Given a graph G and a vertex \(q \in G\), the community search query returns a subgraph of G that contains vertices related to q. Communities, which are prevalent in attributed graphs such as social networks and knowledge bases, can be used in emerging applications such as product advertisement and setting up of social events. In this paper, we inv...
Conference Paper
Full-text available
Graph data are prevalent in communication networks, social media, and biological networks. These data, which are often noisy or inexact, can be represented by uncertain graphs, whose edges are associated with probabilities to indicate the chances that they exist. Recently, researchers have studied various algorithms (e.g., clustering, classificatio...
Article
Clustering of customer transaction data is an important procedure to analyze customer behaviors in retail and e-commerce companies. Note that products from companies are often organized as a product tree, in which the leaf nodes are goods to sell, and the internal nodes (except root node) could be multiple product categories. Based on this tree, we...
Article
Community retrieval (CR) algorithms, which enable the extraction of subgraphs from large social networks (e.g., Facebook and Twitter), have attracted tremendous interest. Various CR solutions, such as k-core and codicil, have been proposed to obtain graphs whose vertices are closely related. In this paper, we propose the C-Explorer system to assist...
Article
Full-text available
Given a graph $G$ and a set $Q$ of query nodes, we examine the Steiner Maximum-Connected Subgraph (SMCS) problem. The SMCS, or $G$'s induced subgraph that contains $Q$ with the largest connectivity, can be useful for customer prediction, product promotion, and team assembling. Despite its importance, the SMCS problem has only been recently studied...
Article
Full-text available
Communities are prevalent in social networks, knowledge graphs, and biological networks. Recently, the topic of community search (CS) has received plenty of attention. Given a query vertex, CS looks for a dense subgraph that contains it. Existing CS solutions do not consider the spatial extent of a community. They can yield communities whose locati...
Conference Paper
Given a graph G and a set Q of query nodes, we examine the Steiner Maximum-Connected Subgraph (SMCS). The SMCS, or G's induced subgraph that contains Q with the largest connectivity, can be useful for customer prediction, product promotion, and team assembling. Despite its importance, the SMCS problem has only been recently studied. Existing soluti...
Article
Given a graph G and a vertex q ∈ G, the community search query returns a subgraph of G that contains vertices related to q. Communities, which are prevalent in attributed graphs such as social networks and knowledge bases, can be used in emerging applications such as product advertisement and setting up of social events. In this paper, we investiga...
Article
Full-text available
Despite its popularity, SimRank is computationally costly, in both time and space. In particular, its recursive nature poses a great challenge in using modern distributed computing power, and also prevents querying similarities individually. Existing solutions suffer greatly from these practical issues. In this paper, we break such dependency for m...
Article
Full-text available
Trajectory data are prevalent in systems that monitor the locations of moving objects. In a location-based service, for instance, the positions of vehicles are continuously monitored through GPS; the trajectory of each vehicle describes its movement history. We study joins on two sets of trajectories, generated by two sets M and R of moving objects...
Article
Full-text available
Twitter is widely used all over the world, and a huge number of hot topics are generated by Twitter users in real time. These topics are able to reflect almost every aspect of people's daily lives. Therefore, the detection of topics in Twitter can be used in many real applications, such as monitoring public opinion, hot product recommendation and i...
Article
Full-text available
Reciprocal relationships on Twitter can be predicted by the TriFG model. Based on this model, we study the extent to which the formation of a two-way relationship can be predicted in a dynamic e-commerce web site which is composed of products and customers, especially the back-buy behavior. Back-buy behavior represents a more stable interest direction o...
Article
Full-text available
There are many automatic methods that can extract lists of objects from the Web, but they often fail to handle multi-type pages automatically. This paper introduces a new method for record extraction using a suffix tree, which can find repeated substrings. Our method transfers a distinct group of tag paths appearing repeatedly in the DOM tree of t...
