Number of hot points under different thresholds.

Source publication

Real-time constrained cycle detection in large dynamic graphs

Article

Aug 2018

As graph data becomes prevalent in an increasing number of Internet applications, continuously monitoring structural patterns in dynamic graphs to generate real-time alerts and trigger prompt actions has become critical. In this paper, we present a new system GraphS to efficiently detect constrained cycles in a dynamic graph...
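The abstract is truncated before it describes how GraphS works. This sketch therefore illustrates the problem, not the system. It is a naive baseline: when a new directed edge u -> v arrives, it finds every simple cycle of at most k edges that uses that edge. It does this with a depth-first search from v back to u, capped at k - 1 hops. The function name `cycles_closed_by` and the adjacency format (a dict from vertex to a set of out-neighbours) are hypothetical choices made for illustration.

```python
def cycles_closed_by(adj, u, v, k):
    """Return every simple cycle of at most k edges that uses the edge u -> v.

    Hypothetical naive baseline (not GraphS): depth-first search from v back
    to u with a hop budget of k - 1. adj maps a vertex to its out-neighbours
    and is assumed to already contain the new edge u -> v.
    """
    cycles = []
    path = [v]          # current simple path v ~> ...
    on_path = {v}       # vertices on the path, to keep it simple

    def dfs(x):
        if x == u:
            cycles.append([u] + path)   # u -> v -> ... -> u
            return
        if len(path) - 1 == k - 1:      # hop budget for v ~> u is used up
            return
        for y in adj.get(x, ()):
            if y not in on_path:
                path.append(y)
                on_path.add(y)
                dfs(y)
                path.pop()
                on_path.remove(y)

    dfs(v)
    return cycles
```

Take the edges a->b, b->c, c->a, c->d and d->a. For the new edge a->b with k = 4, the function returns two cycles: a->b->c->a and a->b->c->d->a. With k = 3 it returns only the first. The cost of this search grows quickly with k and with vertex degree. That is the kind of overhead a real-time detector has to avoid.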

FIGURE Pareto fronts derived from NEDECO-PSO-CellSort (solid lines) vs....

A parameter-optimization framework for neural decoding systems

Article

Feb 2023

Real-time neuron detection and neural activity extraction are critical components of real-time neural decoding, and they can be modeled effectively with dataflow graphs. However, these graphs, and the components within them, generally have many parameters, including hyper-parameters associated with machine learning subsystems. The dataflow graph parameters...
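The figure caption above compares Pareto fronts for this parameter-optimization problem, and the abstract is truncated. The sketch below is therefore generic. It keeps only the non-dominated configurations from a set of evaluated parameter settings. The two objectives are assumptions chosen for illustration: an accuracy score to maximize and a runtime to minimize. The function name is hypothetical, and the sketch does not reproduce the optimizer in the paper.

```python
def pareto_front(points):
    """Return the non-dominated (accuracy, runtime) pairs, in input order.

    Hypothetical objectives: accuracy is maximised, runtime is minimised.
    A point p dominates q if p is at least as good in both objectives and
    differs from q, which means it is strictly better in at least one.
    """
    def dominates(p, q):
        return p[0] >= q[0] and p[1] <= q[1] and p != q

    return [q for q in points if not any(dominates(p, q) for p in points)]
```

Consider the five configurations (0.9, 5.0), (0.8, 2.0), (0.85, 6.0), (0.7, 1.0) and (0.8, 3.0). The front is (0.9, 5.0), (0.8, 2.0) and (0.7, 1.0). The other two are dominated by a configuration that is at least as accurate and no slower. This quadratic check is fine for small candidate sets.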

Using reactive links to propagate changes across engineering models

Article

Jun 2024
SOFTW SYST MODEL

Collaborative model-driven development is a de facto practice for creating software-intensive systems in several domains (e.g., aerospace, automotive, and robotics). However, when multiple engineers work concurrently, keeping all model artifacts synchronized and consistent is difficult. This is even harder when the engineering process relies on a myriad of tools and domains (e.g., mechanical, electronic, and software). Existing work tackles this issue from different perspectives, such as using trace links between artifacts or computing change propagation paths. However, these solutions mainly provide additional information to engineers and still require manual work to propagate changes. Moreover, most modeling tools offer limited traceability across domains and lack the efficiency and granularity required during the development of software-intensive systems. Motivated by these limitations, we present a solution based on what we call “reactive links”: highly granular trace links that propagate changes between property values across models in different domains, managed in different tools. Unlike traditional “passive links”, reactive links automatically propagate changes when engineers modify models, ensuring that the artifacts stay synchronized and consistent. We evaluated the feasibility, performance, and flexibility of our solution in three practical scenarios from two partner organizations. Our solution resolved all cases in which change propagation among models was required, and it was considerably more efficient than performing the same propagation manually. The contribution of this work is to enhance the engineering of software-intensive systems by reducing the burden of manually keeping models synchronized and by avoiding inconsistencies that can originate from collaborative engineering across a variety of tools from different domains.
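Here is a minimal in-process sketch of the reactive-link idea: a link between two property values pushes every change from the source model to the target model. It is built on the observer pattern. The classes `Model` and `reactive_link`, and the property names in the example, are hypothetical. The paper's links span models managed in different tools, which this single-process toy does not attempt.

```python
class Model:
    """Toy engineering model: named property values plus change listeners."""

    def __init__(self, name):
        self.name = name
        self.props = {}
        self._listeners = {}  # property name -> list of callbacks

    def on_change(self, prop, callback):
        self._listeners.setdefault(prop, []).append(callback)

    def set(self, prop, value):
        if prop in self.props and self.props[prop] == value:
            return  # unchanged value: stops ping-pong between two-way links
        self.props[prop] = value
        for callback in self._listeners.get(prop, []):
            callback(value)


def reactive_link(src, src_prop, dst, dst_prop, transform=lambda x: x):
    """Propagate every change of src.src_prop to dst.dst_prop (hypothetical API)."""
    src.on_change(src_prop, lambda value: dst.set(dst_prop, transform(value)))
```

For example, link a mechanical model's `max_rpm` to a software model's `rpm_limit`, then call `mech.set("max_rpm", 3000)`. The software model is updated automatically, with no manual copy step. Adding the reverse link as well makes the pair bidirectional. The equality check in `set` ensures the propagation stops once both sides agree.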

A survey on hybrid transactional and analytical processing

Article

Jun 2024
VLDB J

To give applications the ability to analyze fresh data and to eliminate the time-consuming ETL workflow, hybrid transactional and analytical processing (HTAP) systems have been developed to serve online transaction processing (OLTP) and online analytical processing (OLAP) workloads in a single system. In recent years, HTAP systems have attracted considerable interest from both academia and industry, and several new architectures and technologies have been proposed. This paper provides a comprehensive overview of these HTAP systems. We review recently published papers and technical reports in this field and broadly classify existing HTAP systems into two categories based on their data formats: monolithic and hybrid HTAP. We further classify hybrid HTAP into four sub-categories based on their storage architecture: row-oriented, column-oriented, separated, and hybrid. Based on this taxonomy, we outline the design challenges and performance issues of each category (e.g., the contradictory format demands of monolithic HTAP). We then discuss potential solutions and their trade-offs by reviewing noteworthy research findings. Finally, we summarize emerging HTAP applications, benchmarks, future trends, and open problems.
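To make the row-versus-column tension concrete, here is a toy single-node table that keeps two copies of its data. Transactional writes go to a row store and are recorded in a delta log. Before each analytical scan, the delta is merged into a column copy, so analytics see fresh data without a separate ETL job. The class name, the merge-on-read policy, and the lack of deletes are all simplifying assumptions. The sketch is not meant to match any particular system or sub-category in the survey.

```python
class HybridStore:
    """Toy dual-format table: row format for OLTP, column format for OLAP.

    Hypothetical design: upserts touch only the row store and a delta log;
    analytical scans merge the delta into the columns first (merge-on-read).
    Deletes are not modelled.
    """

    def __init__(self, columns):
        self.columns = list(columns)
        self.rows = {}                                # key -> row tuple
        self.pos = {}                                 # key -> offset in columns
        self.col_data = {c: [] for c in self.columns}
        self.delta = []                               # keys written since merge

    def upsert(self, key, row):
        self.rows[key] = tuple(row)                   # cheap row-format write
        self.delta.append(key)

    def _merge_delta(self):
        for key in self.delta:
            row = self.rows[key]
            if key not in self.pos:                   # new row: append
                self.pos[key] = len(self.pos)
                for c, value in zip(self.columns, row):
                    self.col_data[c].append(value)
            else:                                     # updated row: overwrite
                i = self.pos[key]
                for c, value in zip(self.columns, row):
                    self.col_data[c][i] = value
        self.delta.clear()

    def column_sum(self, column):
        self._merge_delta()                           # freshness before analytics
        return sum(self.col_data[column])
```

In this design the write path stays cheap, and the cost of freshness moves onto the first analytical query after a burst of writes. That trade-off between OLTP throughput and OLAP freshness is the kind the survey's taxonomy is organized around.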

CSM-TopK: Continuous Subgraph Matching with TopK Density Constraints

Conference Paper

May 2024

Optimizing Subgraph Retrieval and Matching with an Efficient Indexing Scheme

Preprint

Apr 2024

As an effective data structure, the graph index is widely applied in subgraph retrieval and matching. It records and compares the frequencies of a set of specific features to detect subgraph containment on the fly, which is the foundation of the filtering techniques for subgraph retrieval and matching. However, because subgraph counting is NP-hard, current graph indices are difficult to build on large graphs; even counting simple paths and cycles is NP-hard. We observe that the monotonicity of the counting process is crucial for the correctness and precision of the index. Therefore, we introduce an efficient graph indexing scheme that counts path and cycle features monotonically under relaxed semantics. In addition to the filtering techniques, we propose to reorder the search candidates using our index. Experimental results show that our index can be constructed 1-3 orders of magnitude faster than existing methods and can handle graphs 1-3 orders of magnitude larger than previous work. Our index-boosted filtering and ordering techniques prove effective in optimizing the subgraph retrieval and matching process.
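The abstract does not define its "relaxed semantics". This sketch uses one common relaxation as an assumption: count walks, which may repeat vertices, instead of simple paths. Walk counts are cheap to compute by dynamic programming, and they are monotone. An edge-preserving injective embedding maps distinct walks from a query vertex u to distinct walks from its image v. So v can be a candidate for u only if v has at least as many walks of each length. The function names are hypothetical.

```python
def walk_counts(adj, length):
    """Number of walks with `length` edges starting at each vertex.

    adj maps every vertex to its set of neighbours (undirected, symmetric).
    Walks may revisit vertices, a relaxed stand-in for simple paths.
    """
    counts = {x: 1 for x in adj}                     # walks of length 0
    for _ in range(length):
        # walks of length L+1 from x = sum of walks of length L from x's neighbours
        counts = {x: sum(counts[y] for y in adj[x]) for x in adj}
    return counts


def candidates(q_adj, g_adj, length):
    """Data vertices that pass the (monotone) walk-count filter per query vertex."""
    wq, wg = walk_counts(q_adj, length), walk_counts(g_adj, length)
    return {u: {v for v in g_adj if wg[v] >= wq[u]} for u in q_adj}
```

Take the path query a-b-c and a data star with centre h and leaves l1 and l2. With walks of length 1, b can only map to h. With walks of length 2, a single data edge p-q leaves no candidates at all, correctly ruling out containment. The filter is safe, in that it never prunes a true match, but not exact. That is why such features serve for filtering and ordering rather than as a final answer.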

Counting Butterflies in Fully Dynamic Bipartite Graph Streams

Conference Paper

Dec 2023

Bipartite graphs are widely used to model relationships between two types of real-world entities, such as user-product data in e-commerce. Such graph data are increasingly streaming in nature, entailing continuous insertions and deletions of edges. A butterfly (i.e., a 2×2 biclique) is the smallest non-trivial cohesive structure in a bipartite graph and plays a crucial role in bipartite graph analysis. Counting butterflies in streaming bipartite graphs is a core problem in applications such as dense subgraph discovery and anomaly detection. Yet existing approximate solutions consider insert-only streams and thus achieve very low accuracy in fully dynamic bipartite graph streams that involve both insertions and deletions of edges. Adapting them to handle deletions is not trivial either, because different sampling schemes and new accuracy analyses are required. We propose ABACUS, a novel approximate algorithm that uses sampling to count butterflies in the presence of both insertions and deletions. We prove that ABACUS always delivers unbiased estimates with low variance. Furthermore, we extend ABACUS into a parallel mini-batch variant, PARABACUS, which counts butterflies in a load-balanced manner using versioned samples; this yields significant speedup, making it well suited to critical applications in streaming environments. We evaluate ABACUS/PARABACUS on a diverse set of real bipartite graphs and assess their accuracy, throughput, and speedup. The results indicate that our proposal is the first capable of efficiently providing accurate butterfly counts in the most generic setting, i.e., a fully dynamic graph streaming environment with both insertions and deletions. It does so without sacrificing throughput, and the parallel version even improves it.
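For reference, here is what an exact fully dynamic butterfly count looks like when the whole graph is stored. ABACUS approximates exactly this quantity from a bounded sample. The key step is that the butterflies created (or destroyed) by an edge (u, v) can be computed while that edge is absent. They are the pairs (u', v') with u' adjacent to v, v' adjacent to u, and (u', v') an edge. The class name is hypothetical, and this is a baseline, not the ABACUS algorithm.

```python
from collections import defaultdict


class ExactButterflyCounter:
    """Exact butterfly (2x2 biclique) count under edge insertions and deletions.

    Hypothetical exact baseline that keeps the full graph; ABACUS instead
    estimates the same count from a sample with bounded memory.
    """

    def __init__(self):
        self.left = defaultdict(set)    # left vertex  -> right neighbours
        self.right = defaultdict(set)   # right vertex -> left neighbours
        self.count = 0

    def _butterflies_through(self, u, v):
        # Called while (u, v) is absent, so w != u and shared vertices != v.
        # Each w adjacent to v and each right vertex shared by u and w
        # closes one butterfly u - v - w - v'.
        return sum(len(self.left[u] & self.left[w]) for w in self.right[v])

    def insert(self, u, v):
        if v in self.left[u]:
            return                      # duplicate edge: no change
        self.count += self._butterflies_through(u, v)
        self.left[u].add(v)
        self.right[v].add(u)

    def delete(self, u, v):
        if v not in self.left[u]:
            return                      # edge not present
        self.left[u].discard(v)
        self.right[v].discard(u)
        self.count -= self._butterflies_through(u, v)
```

Inserting all nine edges of the complete bipartite graph K(3,3) gives 3 × 3 = 9 butterflies, one for each choice of two left and two right vertices. Deleting one edge then removes the 2 × 2 = 4 butterflies that contained it, leaving 5. The per-update cost depends on neighbourhood sizes, which is why sampling-based estimators matter on large streams.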

Anomalous Behavior Detection in Trajectory Data of Older Drivers

Conference Paper

Dec 2023

An Efficient Data Structure for Dynamic Graph on GPUs

Article

Nov 2023
IEEE T KNOWL DATA EN

There is growing interest in offloading dynamic graph computation to GPUs to exploit their high parallel processing ability and larger memory bandwidth compared with CPUs. Existing GPU graph systems usually use compressed sparse row (CSR) as the de facto structure. However, CSR has a critical weakness for dynamic changes because of the large overhead of the rebalance process after updates. GPMA+ is a state-of-the-art dynamic structure based on the packed memory array (PMA) that uses a segment-oriented parallel update procedure to address this weakness of CSR, but it still has a bottleneck in array expansion. In this paper, we propose a leveled structure (called LPMA) in place of a contiguous array, which retains low time complexity and highly parallel updates while lifting the expansion bottleneck of GPMA+. More specifically, we propose a series of optimization techniques, including bottom-up, top-down, and on-demand hybrid update strategies, as well as consistency-guaranteed parallel processing for mixed update-query workloads. We theoretically analyze the benefits of LPMA in terms of rebalance cost during updates. Extensive experiments on four large real-life graphs demonstrate the superiority of LPMA over state-of-the-art approaches.
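GPMA+ and LPMA both build on the packed memory array: a sorted array that deliberately leaves gaps. Because of the gaps, an insertion only respreads elements inside a small aligned window rather than shifting everything behind it, as a CSR array would. This toy, single-threaded sketch shows only that core idea. The segment size, the single 0.75 density bound for windows larger than a segment, and the linear-scan lookup are simplifying assumptions. The sketch models neither GPU parallelism nor LPMA's leveled layout.

```python
import bisect


class PackedMemoryArray:
    """Toy packed memory array: sorted values stored with gaps (None).

    Simplified sketch: real PMAs use per-level density thresholds and a
    gap-aware binary search; here every window above segment size uses a
    0.75 bound and the insert position is found by a linear scan.
    """

    def __init__(self, segment_size=4):
        self.seg = segment_size
        self.slots = [None] * (2 * segment_size)

    def items(self):
        return [x for x in self.slots if x is not None]

    def _count(self, lo, hi):
        return sum(1 for x in self.slots[lo:hi] if x is not None)

    def _spread(self, lo, hi, values):
        # Rewrite slots[lo:hi] with `values` spaced evenly, gaps in between.
        width = hi - lo
        self.slots[lo:hi] = [None] * width
        for k, value in enumerate(values):
            self.slots[lo + k * width // len(values)] = value

    def insert(self, x):
        cap = len(self.slots)
        # Slot index of the first stored value greater than x (cap if none).
        pos = next((i for i, y in enumerate(self.slots)
                    if y is not None and y > x), cap)
        lo = (min(pos, cap - 1) // self.seg) * self.seg
        size = self.seg
        # Grow an aligned window until it can absorb one more element.
        while True:
            bound = 1.0 if size == self.seg else 0.75
            if self._count(lo, lo + size) + 1 <= bound * size:
                break
            if size == cap:
                # Whole array too dense: double the capacity and respread.
                values = self.items()
                bisect.insort(values, x)
                self.slots = [None] * (2 * cap)
                self._spread(0, 2 * cap, values)
                return
            size *= 2
            lo = (lo // size) * size
        values = [y for y in self.slots[lo:lo + size] if y is not None]
        bisect.insort(values, x)
        self._spread(lo, lo + size, values)
```

Most insertions touch a single segment. Only dense regions trigger a larger respread, and only a full array triggers the doubling step. GPMA+ performs that step as an expansion of one contiguous array, and LPMA's leveled structure is designed to avoid that bottleneck.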

Fast Continuous Subgraph Matching over Streaming Graphs via Backtracking Reduction

Preprint

Apr 2023

Streaming graphs are drawing increasing attention in both academic and industrial communities, as many graphs in real applications evolve over time. Continuous subgraph matching (CSM for short) aims to report the incremental matches of a query graph in such streaming graphs. Answering CSM involves two major steps: candidate maintenance and incremental match generation. Throughout continuous subgraph matching, incremental match generation, which backtracks over the search space, dominates the total cost. However, most previous approaches focus on developing techniques for efficient candidate maintenance, while incremental match generation receives less attention despite its importance in CSM. Aiming to minimize the overall cost, we propose two techniques to reduce backtracking in this paper. We present a cost-effective index, CaLiG, that yields tighter candidate maintenance and shrinks the search space of backtracking. In addition, we develop a novel incremental matching paradigm, KSS, that decomposes the query vertices into conditional kernel vertices and shell vertices. Given the matches of the kernel vertices, the incremental matches can be produced immediately by joining the candidates of the shell vertices without any backtracking. Benefiting from reduced backtracking, the elapsed time of CSM decreases significantly. Extensive experiments on real graphs show that our method runs orders of magnitude faster than the state-of-the-art algorithm.
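To show what "incremental match generation by backtracking" means, here is a naive baseline. After an edge (u, v) is inserted, it seeds every query edge onto (u, v) in both orientations. It then extends each partial match one query vertex at a time, keeping only injective, edge-preserving assignments. The result is exactly the new embeddings that use the inserted edge. It has neither the CaLiG index nor the KSS kernel/shell decomposition. The function name is hypothetical, and the query is assumed to be connected and undirected.

```python
def new_embeddings(q_adj, adj, u, v):
    """Embeddings of a connected query graph that use the new data edge (u, v).

    Hypothetical naive baseline. q_adj and adj map each vertex to its set of
    neighbours (undirected); adj is assumed to already contain (u, v). An
    embedding is an injective, edge-preserving map from query to data vertices.
    """
    found = set()

    def backtrack(mapping):
        if len(mapping) == len(q_adj):
            found.add(frozenset(mapping.items()))
            return
        # Extend via an unmapped query vertex adjacent to the partial match
        # (it always exists because the query is connected).
        q = next(x for x in q_adj
                 if x not in mapping and any(n in mapping for n in q_adj[x]))
        mapped_nbrs = [n for n in q_adj[q] if n in mapping]
        used = set(mapping.values())
        for cand in adj[mapping[mapped_nbrs[0]]]:
            # Keep cand only if it is unused and adjacent to every mapped neighbour.
            if cand not in used and all(mapping[n] in adj[cand] for n in mapped_nbrs):
                mapping[q] = cand
                backtrack(mapping)
                del mapping[q]

    # Seed every query edge (a, b), in both orientations, onto (u, v).
    for a in q_adj:
        for b in q_adj[a]:
            backtrack({a: u, b: v})
    return found
```

Take a triangle query. If inserting edge (1, 3) closes the triangle 1-2-3, the function returns all six automorphic embeddings, and every one of them uses the new edge. The backtracking inside `backtrack` is the cost that the paper's techniques aim to shrink or avoid.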

Towards Generating Hop-constrained s-t Simple Path Graphs

Preprint

Apr 2023

Graphs have been widely used in real-world applications, in which investigating relations between vertices is an important task. In this paper, we study the problem of generating the k-hop-constrained s-t simple path graph, i.e., the subgraph consisting of all simple paths from vertex s to vertex t of length no larger than k. To the best of our knowledge, we are the first to formalize this problem and prove its NP-hardness on directed graphs. To tackle this challenging problem, we propose an efficient algorithm named EVE, which exploits the paradigm of edge-wise examination rather than exhaustively enumerating all paths. Powered by essential vertices that appear in all simple paths between vertex pairs, EVE distinguishes the edges that are definitely (or definitely not) contained in the desired simple path graph, producing a tight upper-bound graph in $\mathcal{O}(k^2|E|)$ time. Each remaining undetermined edge is then verified to deliver the exact answer. Extensive experiments on 15 real networks show that EVE significantly outperforms all baselines by several orders of magnitude. Moreover, using EVE as a built-in block accelerates state-of-the-art hop-constrained simple path enumeration by up to an order of magnitude.
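This naive baseline shows the exhaustive approach that EVE is designed to avoid. It enumerates every simple s-t path with at most k edges by depth-first search and takes the union of their edges. The output is exactly the k-hop-constrained s-t simple path graph, but its cost can grow exponentially with k. The function name and the adjacency format (a dict from vertex to a list of out-neighbours) are assumptions for illustration.

```python
def simple_path_graph(adj, s, t, k):
    """Edges lying on at least one simple s -> t path with at most k edges.

    Hypothetical exhaustive baseline (not EVE): enumerate all such paths by
    depth-first search and return the union of their edges.
    """
    edges = set()
    path = [s]
    on_path = {s}

    def dfs(x):
        if x == t:
            edges.update(zip(path, path[1:]))  # record this path's edges
            return
        if len(path) - 1 == k:                  # hop limit reached
            return
        for y in adj.get(x, ()):
            if y not in on_path:                # keep the path simple
                path.append(y)
                on_path.add(y)
                dfs(y)
                path.pop()
                on_path.remove(y)

    dfs(s)
    return edges
```

Take two routes, s->a->t and s->b->c->t. With k = 2 the result contains only the two edges of the shorter route. With k = 3 it contains all five edges. An edge that lies only on walks revisiting a vertex is never reported.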

Sliding window-based approximate triangle counting with bounded memory usage

Article

Mar 2023
VLDB J

Streaming graph analysis is gaining importance in various fields due to the natural dynamicity of many real graph applications. However, approximately counting triangles in real-world streaming graphs with duplicate edges under the sliding-window model remains an unsolved problem. In this paper, we propose the SWTC algorithm to address the approximate sliding-window triangle counting problem in streaming graphs. In SWTC, we propose a fixed-length slicing strategy that addresses both sample maintenance and cardinality estimation with bounded memory usage. We theoretically prove the superiority of our method in sample graph size and estimation accuracy under a given memory upper bound. To further improve performance, we propose two optimization techniques: vision counting, to avoid computation peaks, and asynchronous grouping, to stabilize accuracy. Extensive experiments confirm that our approach achieves higher accuracy than the baseline method under the same memory usage.
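For reference, here is the exact quantity that SWTC approximates. It counts triangles over the distinct edges whose timestamps fall inside the last `window` time units, with duplicate arrivals collapsed by a multiplicity counter. This exact version stores every edge in the window, so its memory is unbounded. Bounding that memory is the point of SWTC's slicing and sampling. The class name, the timestamp semantics (edges with timestamp greater than now minus window are live) and the undirected-edge assumption are all hypothetical choices.

```python
from collections import defaultdict, deque


class SlidingWindowTriangles:
    """Exact triangle count over distinct edges in a time-based sliding window.

    Hypothetical exact reference (not SWTC): memory grows with the window.
    Edges are undirected; duplicate arrivals only raise a multiplicity.
    """

    def __init__(self, window):
        self.window = window
        self.arrivals = deque()          # (timestamp, edge key) in arrival order
        self.mult = defaultdict(int)     # edge key -> copies still in window
        self.adj = defaultdict(set)
        self.triangles = 0

    @staticmethod
    def _key(u, v):
        return (u, v) if u <= v else (v, u)

    def _expire(self, now):
        # Drop arrivals with timestamp <= now - window.
        while self.arrivals and self.arrivals[0][0] <= now - self.window:
            _, (u, v) = self.arrivals.popleft()
            self.mult[(u, v)] -= 1
            if self.mult[(u, v)] == 0:   # last copy expired: edge leaves graph
                del self.mult[(u, v)]
                self.adj[u].discard(v)
                self.adj[v].discard(u)
                self.triangles -= len(self.adj[u] & self.adj[v])

    def add(self, ts, u, v):
        self._expire(ts)
        if u == v:
            return                       # ignore self-loops
        key = self._key(u, v)
        self.arrivals.append((ts, key))
        if self.mult[key] == 0:          # first copy: edge enters graph
            self.triangles += len(self.adj[u] & self.adj[v])
            self.adj[u].add(v)
            self.adj[v].add(u)
        self.mult[key] += 1
```

With a window of 10, suppose edges a-b, b-c and a-c arrive at times 1, 2 and 3. That forms one triangle, and a duplicate a-b at time 4 leaves the count at 1. When a new edge arrives at time 12, the only copy of b-c expires and the count drops to 0. The second copy of a-b keeps that edge alive.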
