Figure 18 - uploaded by Zhengping Qian
Number of hot points under different thresholds.

Source publication
Article
Full-text available
As graph data is prevalent for an increasing number of Internet applications, continuously monitoring structural patterns in dynamic graphs in order to generate real-time alerts and trigger prompt actions becomes critical for many applications. In this paper, we present a new system GraphS to efficiently detect constrained cycles in a dynamic graph...

Similar publications

Article
Full-text available
Real-time neuron detection and neural activity extraction are critical components of real-time neural decoding. They are modeled effectively in dataflow graphs. However, these graphs and the components within them in general have many parameters, including hyper-parameters associated with machine learning sub-systems. The dataflow graph parameters...

Citations

... However, the field of incremental cycle detection and strongly connected component management contains extensive variations that improve the efficiency of these algorithms. Most of these approaches enhance the algorithms with topologically sorted vertices [11,13,45], two-way searches [45,81], or other methods to reduce the search space necessary to identify possible new cycles [14,81]. ...
Article
Full-text available
Collaborative model-driven development is a de facto practice to create software-intensive systems in several domains (e.g., aerospace, automotive, and robotics). However, when multiple engineers work concurrently, keeping all model artifacts synchronized and consistent is difficult. This is even harder when the engineering process relies on a myriad of tools and domains (e.g., mechanical, electronic, and software). Existing work tries to solve this issue from different perspectives, such as using trace links between different artifacts or computing change propagation paths. However, these solutions mainly provide additional information to engineers, still requiring manual work for propagating changes. Yet, most modeling tools are limited regarding the traceability between different domains, while also lacking the efficiency and granularity required during the development of software-intensive systems. Motivated by these limitations, in this work, we present a solution based on what we call “reactive links”, which are highly granular trace links that propagate change between property values across models in different domains, managed in different tools. Unlike traditional “passive links”, reactive links automatically propagate changes when engineers modify models, assuring the synchronization and consistency of the artifacts. The feasibility, performance, and flexibility of our solution were evaluated in three practical scenarios, from two partner organizations. Our solution is able to resolve all cases in which change propagation among models was required. We observed a great improvement in efficiency compared to performing the same propagation manually. The contribution of this work is to enhance the engineering of software-intensive systems by reducing the burden of manually keeping models synchronized and avoiding inconsistencies that can originate from collaborative engineering in a variety of tools from different domains.
... For instance, fraud detection applications [36,135,136] analyze the continuously generated transactions to prevent money or property from being obtained through false pretenses. System monitoring applications [46,170] derive real-time system metrics swiftly based on data logs. ...
Article
Full-text available
To provide applications with the ability to analyze fresh data and eliminate the time-consuming ETL workflow, hybrid transactional and analytical (HTAP) systems have been developed to serve online transaction processing and online analytical processing workloads in a single system. In recent years, HTAP systems have attracted considerable interest from both academia and industry. Several new architectures and technologies have been proposed. This paper provides a comprehensive overview of these HTAP systems. We review recently published papers and technical reports in this field and broadly classify existing HTAP systems into two categories based on their data formats: monolithic and hybrid HTAP. We further classify hybrid HTAP into four sub-categories based on their storage architecture: row-oriented, column-oriented, separated, and hybrid. Based on such a taxonomy, we outline each stream’s design challenges and performance issues (e.g., the contradictory format demand for monolithic HTAP). We then discuss potential solutions and their trade-offs by reviewing noteworthy research findings. Finally, we summarize emerging HTAP applications, benchmarks, future trends, and open problems.
... Continuous subgraph matching (CSM, for short) is an important problem over dynamic graphs, which constantly reports matches for each edge insertion/deletion. It has been widely utilized to detect specific subgraphs over real-world dynamic graphs, such as fraud patterns over payment network [1] and attack patterns over communication network [2], [3]. However, the scale of matching results could be exponential which may overwhelm analysts. ...
... All the codes and datasets are available on GitHub [39]. ¹ If $\tau_1$ ($\tau_2$, resp.) does not exist, we set $E_1$ ($E_2$, resp.) as $\emptyset$. ...
... Graphs as ubiquitous data structures are prevalently applied in a variety of domains such as social network analysis [1,2], knowledge representation [3][4][5], bioinformatics [6,7] and fraud detection [8][9][10]. Subgraph retrieval (also acknowledged as subgraph search) and subgraph matching are fundamental operations for analyzing these graphs. ...
Preprint
Full-text available
Graph index as an effective data structure is widely applied in subgraph retrieval and matching. It records and compares the frequencies of a set of specific features to detect subgraph containment on the fly, which is the foundation of the filtering techniques for subgraph retrieval and matching. However, due to the NP-hardness of the subgraph counting, current graph indices struggle to be built on large graphs. Even counting the simple path and cycle graphs is NP-hard. We observe that the monotone property of the counting process is crucial for the correctness and precision of the index. Therefore, we introduce an efficient graph indexing scheme by counting the path and cycle features monotonically in relaxed semantics. In addition to the filtering techniques, we propose to reorder the search candidates via our index. Experimental results reveal that our index can be constructed significantly faster than existing methods, by 1-3 orders of magnitude, and can handle graphs that are larger than previous work by 1-3 orders of magnitude. Our index-boosted filtering and ordering techniques are proven to be effective in optimizing the subgraph retrieval and matching process.
... Bipartite graphs are a natural fit when it comes to modeling the relationship between two different types of entities in real-world applications [1], [2]. For instance, Alibaba's ecommerce platform models relationships between users and products via bipartite graphs [3]. ...
... For instance, it is used to measure the butterfly clustering coefficient in a bipartite graph, which indicates how cohesive the graph is and can highlight how entities are clustered [6], [7], [8], [9]. This metric is important in many real-world applications, such as online recommendation systems, where it is used to identify similar items [10], [11], [12], [13], cluster users, and enhance collaborative filtering [14]; real-time anomaly detection [15], [16], [17]; and fraud detection [2]. Also, counting butterflies for each edge is required for the computation of k-bitrusses [18], [19], [20], which are used in a variety of applications, such as community and spam detection [21], [22], [23], [24], [25], [26]. ...
... Definition 1. A fully dynamic bipartite graph stream $\Pi$ is a sequence of elements $(e^{(1)}, e^{(2)}, \ldots)$. ...
Conference Paper
Full-text available
A bipartite graph extensively models relationships between real-world entities of two different types, such as user-product data in e-commerce. Such graph data are inherently becoming more and more streaming, entailing continuous insertions and deletions of edges. A butterfly (i.e., 2×2 bi-clique) is the smallest non-trivial cohesive structure that plays a crucial role. Counting such butterfly patterns in streaming bipartite graphs is a core problem in applications such as dense subgraph discovery and anomaly detection. Yet, existing approximate solutions consider insert-only streams and, thus, achieve very low accuracy in fully dynamic bipartite graph streams that involve both insertions and deletions of edges. Adapting them to consider deletions is not trivial either, because different sampling schemes and new accuracy analyses are required. We propose ABACUS, a novel approximate algorithm that counts butterflies in the presence of both insertions and deletions by utilizing sampling. We prove that ABACUS always delivers unbiased estimates of low variance. Furthermore, we extend ABACUS and devise a parallel mini-batch variant, namely, PARABACUS, which counts butterflies in parallel. PARABACUS counts butterflies in a load-balanced manner using versioned samples, which results in significant speedup and is thus ideal for critical applications in the streaming environment. We evaluate ABACUS/PARABACUS using a diverse set of real bipartite graphs and assess its performance in terms of accuracy, throughput, and speedup. The results indicate that our proposal is the first capable of efficiently providing accurate butterfly counts in the most generic setting, i.e., a fully dynamic graph streaming environment that entails both insertions and deletions. It does so without sacrificing throughput, and even improves it with the parallel version.
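To make the counted structure concrete, the following is a minimal sketch of an exact butterfly counter for a static bipartite graph. It is not ABACUS or PARABACUS (it uses no sampling and handles no deletions); the function name `count_butterflies` and the `(left, right)` edge-list input are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def count_butterflies(edges):
    """Exactly count butterflies (2x2 bi-cliques) in a bipartite graph.

    `edges` is an iterable of (left, right) pairs; duplicate pairs are ignored.
    This is an exhaustive baseline, not a streaming or sampling algorithm.
    """
    # Map each left vertex to the set of right vertices it is connected to.
    neighbours = defaultdict(set)
    for left, right in edges:
        neighbours[left].add(right)

    # Two left vertices sharing c right neighbours form C(c, 2) butterflies.
    total = 0
    for u, v in combinations(neighbours, 2):
        common = len(neighbours[u] & neighbours[v])
        total += common * (common - 1) // 2
    return total
```

The pairwise loop is quadratic in the number of left vertices, which is the kind of cost that motivates approximate, sampling-based counting on large streams.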
... A machine learning algorithm employs the extracted features to classify the trajectory. In [11], researchers detected newly generated cycles in a dynamic graph that is constantly changing. Regarding detour detection, in [12] for the same start and end point, a detour is defined as taking much time or driving long distances. ...
... For example, Alibaba's ecommerce activity graph is updated at 20,000 edges per second at the peak, and Twitter has about 100 million users logging in daily, with around 500 million tweets per day. Network traffic data averages about $10^9$ packets per hour per router in large data centers [1], [2], [3]. ...
... The expected numbers of irrelevant segments and associated segments are $s \cdot (1 - \frac{1}{s})^x$ and $s - s \cdot (1 - \frac{1}{s})^x$, respectively.³ According to the LPMA expansion and re-balancing process, only associated segments and their successor empty segments (in the new expanded layer of LPMA) participate in the rebalancing process. However, if an associated segment and its successor empty segment cannot accommodate the corresponding edge insertions, we may need to roll up the insertion and consider the consecutive four segments. ...
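The expectation quoted above, s · (1 − 1/s)^x untouched segments out of s, matches the standard result for x insertions that each land in one of s segments independently and uniformly at random. That uniformity is an assumption here, since the excerpt does not state the distribution. The sketch below evaluates the formula exactly and checks it against exhaustive enumeration for small inputs; both function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def expected_segments(s, x):
    """Return (irrelevant, associated) expected segment counts.

    Assumes x insertions, each hitting one of s segments uniformly at random.
    """
    irrelevant = s * (1 - Fraction(1, s)) ** x
    return irrelevant, s - irrelevant


def brute_force_irrelevant(s, x):
    """Average number of untouched segments over all s**x placements."""
    total = sum(s - len(set(placement))
                for placement in product(range(s), repeat=x))
    return Fraction(total, s ** x)
```

For example, with s = 3 segments and x = 2 insertions, both functions give 4/3 untouched segments on average, so 5/3 segments are expected to take part in rebalancing.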
Article
There is a growing interest in offloading dynamic graph computation to GPUs to exploit their high parallel processing ability and larger memory bandwidth compared with CPUs. Existing GPU graph systems usually use compressed sparse row (CSR) as the de facto structure. However, CSR has a critical weakness under dynamic change due to the large overhead of the re-balance process after updates. GPMA+ is a state-of-the-art dynamic structure that uses a PMA structure and a segment-oriented parallel update procedure to address the dynamic weakness of CSR, but it still has a bottleneck in array expansion. In this paper, we propose a leveled structure (called LPMA) instead of a continuous array to retain low time complexity and highly parallel updates and to lift the expansion bottleneck of GPMA+. More specifically, we propose a series of optimization techniques, including bottom-up update, top-down update, and on-demand hybrid update strategies, as well as consistency-guaranteed parallel processing for update-query mixed workloads. We theoretically analyze the benefits of LPMA in terms of re-balance cost during updates. Extensive experiments on four large real-life graphs demonstrate the superiority of LPMA over the state-of-the-art approaches.
... CSM is useful in a wide range of applications, such as recommendation systems [6,14,39], fraud detection [34,37], and cyber security [9,10], etc. ...
... For example, a cycle pattern can serve as a strong indication of a fake transaction in e-commerce platforms [34], where the accounts of users (buyers or sellers) are represented as vertices and online transactions, e.g., payment activities, are denoted as dynamic edges. With CSM, suspicious transactions would be detected to generate real-time alerts and trigger prompt actions. ...
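The cycle check described above can be sketched as follows: when a transaction edge src -> dst arrives, any simple path from dst back to src among the existing edges closes a cycle through the new edge. This is a plain depth-first search under a hop limit, not the GraphS algorithm or a CSM engine; the class name `CycleMonitor`, the `max_hops` parameter, and the account labels are assumptions made for illustration.

```python
from collections import defaultdict


class CycleMonitor:
    """Report simple cycles of bounded length closed by each new edge."""

    def __init__(self, max_hops):
        self.max_hops = max_hops            # maximum cycle length, in edges
        self.out_edges = defaultdict(set)   # account -> accounts it has paid

    def add_edge(self, src, dst):
        """Insert src -> dst and return the cycles that use this edge.

        Each cycle is a vertex list [dst, ..., src]; the new edge closes it.
        Self-loops are ignored in this sketch.
        """
        cycles = []
        if src != dst:
            # A cycle needs a simple path dst ~> src of at most
            # max_hops - 1 edges among the edges inserted so far.
            self._search(dst, src, [dst], cycles)
        self.out_edges[src].add(dst)
        return cycles

    def _search(self, node, target, path, cycles):
        if node == target:
            cycles.append(list(path))
            return
        if len(path) - 1 >= self.max_hops - 1:
            return  # hop budget for the path is exhausted
        for nxt in self.out_edges.get(node, ()):
            if nxt not in path:  # keep the path simple
                path.append(nxt)
                self._search(nxt, target, path, cycles)
                path.pop()
```

A caller would raise an alert whenever `add_edge` returns a non-empty list. The search is exponential in the worst case, which is why dedicated systems rely on indexing and pruning rather than a bare traversal.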
... GraphBolt [30] proposes a generalized incremental model to handle non-monotonic algorithms like Belief Propagation, but involves more overhead than KickStarter for monotonic algorithms. GraphS [34] is a real-time streaming system for cycle detection. RisGraph [12] targets per-update analysis to provide low latency and detailed information in comparison. ...
Preprint
Full-text available
Streaming graphs are drawing increasing attention in both academic and industrial communities as many graphs in real applications evolve over time. Continuous subgraph matching (abbreviated as CSM) aims to report the incremental matches of a query graph in such streaming graphs. Answering CSM involves two major steps, i.e., candidate maintenance and incremental match generation. Throughout the course of continuous subgraph matching, incremental match generation, which backtracks over the search space, dominates the total cost. However, most previous approaches focus on developing techniques for efficient candidate maintenance, while incremental match generation receives less attention despite its importance in CSM. Aiming to minimize the overall cost, we propose two techniques to reduce backtracking in this paper. We present a cost-effective index CaLiG that yields tighter candidate maintenance, shrinking the search space of backtracking. In addition, we develop a novel incremental matching paradigm KSS that decomposes the query vertices into conditional kernel vertices and shell vertices. With the matches of kernel vertices, the incremental matches can be produced immediately by joining the candidates of shell vertices without any backtracking. Benefiting from reduced backtracking, the elapsed time of CSM decreases significantly. Extensive experiments over real graphs show that our method runs faster than the state-of-the-art algorithm by orders of magnitude.
... In financial systems, transaction activities can be modeled as a directed graph, where each vertex represents a person or account, and each edge ( , ) represents a transaction from to . A simple cycle in such a graph is a strong indication of fraudulent activity or even a financial crime like money laundering [27,29,30,35]. For a certain transaction ( , ), by extracting vertices and edges in all simple cycles containing ( , ), all fraudsters and fraudulent transactions involved can be identified. ...
... Clearly, generating the hop-constrained simple path graph from to will immediately produce the target fraudsters and transactions. Similar to previous works [27,29,30,35], non-simple cycles are not considered here, since they may contain other cycles not related to the current edge ( , ) (i.e., not participating current fraud). Involving them may impose unnecessary repeated punishments and increase downstream workloads (e.g., monitoring and investigation). ...
Preprint
Full-text available
Graphs have been widely used in real-world applications, in which investigating relations between vertices is an important task. In this paper, we study the problem of generating the k-hop-constrained s-t simple path graph, i.e., the subgraph consisting of all simple paths from vertex s to vertex t of length no larger than k. To the best of our knowledge, we are the first to formalize this problem and prove its NP-hardness on directed graphs. To tackle this challenging problem, we propose an efficient algorithm named EVE, which exploits the paradigm of edge-wise examination rather than exhaustively enumerating all paths. Powered by essential vertices appearing in all simple paths between vertex pairs, EVE distinguishes the edges that are definitely (or definitely not) contained in the desired simple path graph, producing a tight upper-bound graph in $\mathcal{O}(k^2|E|)$ time. Each remaining undetermined edge is further verified to deliver the exact answer. Extensive experiments are conducted on 15 real networks. The results show that EVE significantly outperforms all baselines by several orders of magnitude. Moreover, by taking EVE as a built-in block, state-of-the-art methods for hop-constrained simple path enumeration can be accelerated by up to an order of magnitude.
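To illustrate the problem definition above, here is a minimal sketch of the exhaustive baseline that the abstract contrasts EVE against: enumerate every simple path from s to t with at most k edges and collect the edges those paths use. It is not EVE and uses no essential-vertex pruning; the function name `simple_path_graph` and the edge-list input are assumptions.

```python
def simple_path_graph(edges, s, t, k):
    """Return the edges lying on some simple s-t path with at most k edges.

    `edges` is an iterable of directed (u, v) pairs. This enumerates paths
    exhaustively, so its running time can be exponential in k.
    """
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)

    result = set()

    def extend(path):
        node = path[-1]
        if node == t:
            # Record every edge of the completed path.
            result.update(zip(path, path[1:]))
            return
        if len(path) - 1 == k:
            return  # hop budget used up without reaching t
        for nxt in adjacency.get(node, ()):
            if nxt not in path:  # simple paths never revisit a vertex
                path.append(nxt)
                extend(path)
                path.pop()

    extend([s])
    return result
```

On a graph with edges s->a, a->t, s->b, b->c, c->t, and a->b, calling the function with k = 2 keeps only s->a and a->t, while k = 4 keeps all six edges.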
... These most recent edges are always changing, which are defined as a sliding window [12]. The sliding window model is widely used in streaming graph algorithms and systems [13][14][15]. Therefore, a sliding window-based continuous triangle counting algorithm is desired. ...
... Count-based one can be seen as a simplified time-based sliding window where there is exactly one edge coming at each time point. Most previous algorithms and applications also use time-based sliding windows [13][14][15]. For simplicity, we use "sliding window" to denote the time-based sliding window in the following sections. ...
Article
Full-text available
Streaming graph analysis is gaining importance in various fields due to the natural dynamicity in many real graph applications. However, approximately counting triangles in real-world streaming graphs with duplicate edges and sliding window model remains an unsolved problem. In this paper, we propose SWTC algorithm to address approximate sliding-window triangle counting problem in streaming graphs. In SWTC, we propose a fixed-length slicing strategy that addresses both sample maintaining and cardinality estimation issues with a bounded memory usage. We theoretically prove the superiority of our method in sample graph size and estimation accuracy under given memory upper bound. To further improve the performance of our algorithm, we propose two optimization techniques, vision counting to avoid computation peaks, and asynchronous grouping to stabilize the accuracy. Extensive experiments also confirm that our approach has higher accuracy compared with the baseline method under the same memory usage.
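For contrast with the approximate approach described above, the following is a minimal sketch of an exact triangle counter over a time-based sliding window that tolerates duplicate edges. It is not SWTC: it stores every live edge rather than a bounded sample. The class name `WindowTriangleCounter`, the undirected-edge treatment, and the convention that an edge stays live for timestamps in (now − window, now] are assumptions, as is the requirement that edges arrive in timestamp order.

```python
from collections import Counter, defaultdict, deque


class WindowTriangleCounter:
    """Exact triangle count over edges seen in the last `window` time units."""

    def __init__(self, window):
        self.window = window
        self.arrivals = deque()            # (timestamp, edge) in arrival order
        self.multiplicity = Counter()      # edge -> live copies in the window
        self.adjacent = defaultdict(set)   # vertex -> live neighbours
        self.triangles = 0

    def add(self, u, v, timestamp):
        """Process undirected edge {u, v} and return the current count."""
        self._expire(timestamp)
        if u != v:  # self-loops cannot form triangles
            edge = frozenset((u, v))
            self.arrivals.append((timestamp, edge))
            self.multiplicity[edge] += 1
            if self.multiplicity[edge] == 1:  # first live copy of this edge
                self.triangles += len(self.adjacent[u] & self.adjacent[v])
                self.adjacent[u].add(v)
                self.adjacent[v].add(u)
        return self.triangles

    def _expire(self, now):
        # Drop arrivals whose timestamp is at or before now - window.
        while self.arrivals and self.arrivals[0][0] <= now - self.window:
            _, edge = self.arrivals.popleft()
            self.multiplicity[edge] -= 1
            if self.multiplicity[edge] == 0:  # last live copy has expired
                del self.multiplicity[edge]
                u, v = tuple(edge)
                self.adjacent[u].discard(v)
                self.adjacent[v].discard(u)
                self.triangles -= len(self.adjacent[u] & self.adjacent[v])
```

Memory here grows with the number of edges in the window, which is the cost that sampling-based methods with a fixed memory bound are designed to avoid.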