Conference Paper

Testing Flow Graph Reducibility

Authors:
Robert Endre Tarjan

Abstract

Many problems in program optimization have been solved by applying a technique called interval analysis to the flow graph of the program. A flow graph which is susceptible to this type of analysis is called reducible. This paper describes an algorithm for testing whether a flow graph is reducible. The algorithm uses depth-first search to reveal the structure of the flow graph and a good method for computing disjoint set unions to determine reducibility from the search information. When the algorithm is implemented on a random access computer, it requires O(E log* E) time to analyze a graph with E edges, where log* x = min{i | log^(i) x ≤ 1} (the iterated logarithm). The time bound compares favorably with the O(E log E) bound of a previously known algorithm.
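
Several of the citing passages below mention the T1 and T2 transformations, which give the textbook criterion behind this test: a flow graph is reducible exactly when repeatedly deleting self-loops (T1) and folding a vertex with a unique predecessor into that predecessor (T2) collapses the graph to a single vertex. The Python sketch below implements that naive check, not Tarjan's O(E log* E) algorithm; the adjacency-dict representation and names are assumptions made for illustration.

    def is_reducible(succ, entry):
        # Naive T1/T2 reducibility check (roughly quadratic), not
        # Tarjan's fast algorithm. succ: dict vertex -> set of
        # successors; entry: start vertex. Assumes every vertex is a
        # key of succ and is reachable from entry.
        succ = {v: set(ws) for v, ws in succ.items()}
        pred = {v: set() for v in succ}
        for v, ws in succ.items():
            for w in ws:
                pred[w].add(v)
        changed = True
        while changed and len(succ) > 1:
            changed = False
            for v in list(succ):
                succ[v].discard(v)              # T1: drop a self-loop
                pred[v].discard(v)
                if v != entry and len(pred[v]) == 1:
                    (u,) = pred[v]              # T2: fold v into its only predecessor
                    succ[u].discard(v)
                    for w in succ[v]:
                        pred[w].discard(v)
                        if w != u:              # skip the would-be self-loop on u
                            succ[u].add(w)
                            pred[w].add(u)
                    del succ[v], pred[v]
                    changed = True
        return len(succ) == 1

For example, is_reducible({1: {2, 3}, 2: {3}, 3: {1}}, 1) returns True, while the classic two-entry loop is_reducible({1: {2, 3}, 2: {3}, 3: {2}}, 1) returns False.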

... A reducible flow graph [Hecht and Ullman 1974; Tarjan 1974b] is one in which every strongly connected subgraph S has a single entry vertex v such that every path from the start vertex s to a vertex in S contains v. There are many equivalent characterizations of reducible flow graphs [Tarjan 1974b], and there are algorithms to test reducibility in near-linear [Tarjan 1974b] and truly linear [Buchsbaum et al. 2008] time. One notion of a "structured" program is that its flow graph is reducible. ...
Article
How does one verify that the output of a complicated program is correct? One can formally prove that the program is correct, but this may be beyond the power of existing methods. Alternatively, one can check that the output produced for a particular input satisfies the desired input--output relation by running a checker on the input--output pair. Then one only needs to prove the correctness of the checker. For some problems, however, even such a checker may be too complicated to formally verify. There is a third alternative: augment the original program to produce not only an output but also a correctness certificate, with the property that a very simple program (whose correctness is easy to prove) can use the certificate to verify that the input--output pair satisfies the desired input--output relation. We consider the following important instance of this general question: How does one verify that the dominator tree of a flow graph is correct? Existing fast algorithms for finding dominators are complicated, and even verifying the correctness of a dominator tree in the absence of additional information seems complicated. We define a correctness certificate for a dominator tree, show how to use it to easily verify the correctness of the tree, and show how to augment fast dominator-finding algorithms so that they produce a correctness certificate. We also relate the dominator certificate problem to the problem of finding divergent spanning trees in a flow graph, and we develop algorithms to find such trees. All our algorithms run in linear time. Previous algorithms apply just to the special case of only trivial dominators, and they take at least quadratic time.
... Node d is a dominator of node n if every path from the entry node of the CFG to n goes through d. This algorithm can only detect loops with single entries, i.e., it requires a reducible graph [89]. In practice, however, graphs can contain loops with multiple entries, which make the graph irreducible. ...
... As Figure 10(a) shows, Loop 2 has two entries: Nodes 5 and 6. In this case, we apply the node collapsing algorithm [52,89] to determine if a CFG contains a loop with multiple entries, and the node splitting algorithm [52,89] to break such a loop. We can then make an irreducible graph reducible. ...
Article
Buffer overflow attacks have been a computer security threat in software-based systems and applications for decades. The existence of buffer overflow vulnerabilities makes the system …
... Applications of sequential disjoint set union include storage allocation in compilers [32], finding minimum spanning trees using Kruskal's algorithm [31], maintaining the connected components of an undirected graph under edge additions [10,17,44], testing percolation [40], finding loops and dominators in flow graphs [12,42,43], and finding strong components in directed graphs. Some of these applications, notably finding connected components [24,27,33,37,39,41] and finding strong components, are on immense graphs and could potentially benefit from the use of concurrency to speed up the computation. ...
... In some applications of disjoint set union, such as computing flow graph information [42,43], each set has a name or some other associated value, such as the number of elements in the set. We can extend the compressed tree data structure to support set values by storing these in the set roots. ...
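
The "value at the root" idea in this passage is easy to make concrete. Below is a minimal sequential sketch (path compression plus union by size); the class name and the combine parameter are illustrative assumptions, not an API from the cited papers.

    class UnionFindWithValues:
        # Disjoint sets with an associated value kept at each root.
        def __init__(self, elements, value, combine):
            self.parent = {x: x for x in elements}
            self.size = {x: 1 for x in elements}
            self.value = {x: value(x) for x in elements}  # meaningful at roots only
            self.combine = combine                        # merges two set values

        def find(self, x):
            root = x
            while self.parent[root] != root:
                root = self.parent[root]
            while self.parent[x] != root:                 # path compression
                self.parent[x], x = root, self.parent[x]
            return root

        def union(self, x, y):
            rx, ry = self.find(x), self.find(y)
            if rx == ry:
                return rx
            if self.size[rx] < self.size[ry]:             # union by size
                rx, ry = ry, rx
            self.parent[ry] = rx
            self.size[rx] += self.size[ry]
            self.value[rx] = self.combine(self.value[rx], self.value.pop(ry))
            return rx

        def get_value(self, x):
            return self.value[self.find(x)]

For instance, UnionFindWithValues(range(5), value=lambda x: {x}, combine=set.union) keeps each set's membership as its value.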
Article
Full-text available
We develop and analyze concurrent algorithms for the disjoint set union (“union-find”) problem in the shared memory, asynchronous multiprocessor model of computation, with CAS (compare and swap) or DCAS (double compare and swap) as the synchronization primitive. We give a deterministic bounded wait-free algorithm that uses DCAS and has a total work bound of O(m·(log(np/m + 1) + α(n, m/(np)))) for a problem with n elements and m operations solved by p processes, where α is a functional inverse of Ackermann’s function. We give two randomized algorithms that use only CAS and have the same work bound in expectation. The analysis of the second randomized algorithm is valid even if the scheduler is adversarial. Our DCAS and randomized algorithms take O(log n) steps per operation, worst-case for the DCAS algorithm, high-probability for the randomized algorithms. Our work and step bounds grow only logarithmically with p, making our algorithms truly scalable. We prove that for a class of symmetric algorithms that includes ours, no better step or work bound is possible. Our work is theoretical, but Alistarh et al. (In search of the fastest concurrent union-find algorithm, 2019), Dhulipala et al. (A framework for static and incremental parallel graph connectivity algorithms, 2020) and Hong et al. (Exploring the design space of static and incremental graph connectivity algorithms on GPUs, 2020) have implemented some of our algorithms on CPUs and GPUs and experimented with them. On many realistic data sets, our algorithms run as fast or faster than all others.
... Applications of sequential disjoint set union include storage allocation in compilers [LA02], finding minimum spanning trees using Kruskal's algorithm [Kru56], maintaining the connected components of an undirected graph under edge additions [Tar75a], testing percolation [SW11], finding loops and dominators in flow graphs [Tar73,Tar74,FGMT13], and finding strong components in directed graphs. Some of these applications, notably finding connected components [Tar72, SV82, JM97, HZ01, RMCDS12, LT18] and finding strong components, are on immense graphs and could potentially benefit from the use of concurrency to speed up the computation. ...
... In some applications of disjoint set union, such as computing flow graph information [Tar74,Tar73], each set has a name or some other associated value, such as the number of elements in the set. We can extend the compressed tree data structure to support set values by storing these in the set roots. ...
Preprint
We develop and analyze concurrent algorithms for the disjoint set union (union-find) problem in the shared memory, asynchronous multiprocessor model of computation, with CAS (compare and swap) or DCAS (double compare and swap) as the synchronization primitive. We give a deterministic bounded wait-free algorithm that uses DCAS and has a total work bound of $O(m \cdot (\log(np/m + 1) + \alpha(n, m/(np))))$ for a problem with $n$ elements and $m$ operations solved by $p$ processes, where $\alpha$ is a functional inverse of Ackermann's function. We give two randomized algorithms that use only CAS and have the same work bound in expectation. The analysis of the second randomized algorithm is valid even if the scheduler is adversarial. Our DCAS and randomized algorithms take $O(\log n)$ steps per operation, worst-case for the DCAS algorithm, high-probability for the randomized algorithms. Our work and step bounds grow only logarithmically with $p$, making our algorithms truly scalable. We prove that for a class of symmetric algorithms that includes ours, no better step or work bound is possible.
... Consequently, the almost-linear time algorithm for WPO can also be used to compute WTO. The algorithm for WPO construction handles dependency graphs that are irreducible [Hecht and Ullman 1972; Tarjan 1973]. The key insight behind our approach is to adapt algorithms for computing loop nesting forests [Ramalingam 1999, 2002] to the problem of computing a concurrent iteration strategy for abstract interpretation. ...
... The map R maps a vertex to a set of edges, and is used to handle irreducible graphs [Hecht and Ullman 1972; Tarjan 1973]. Initially, R is set to ∅, and updated on Line 29. ...
Article
Full-text available
Abstract interpretation is a general framework for expressing static program analyses. It reduces the problem of extracting properties of a program to computing an approximation of the least fixpoint of a system of equations. The de facto approach for computing this approximation uses a sequential algorithm based on weak topological order (WTO). This paper presents a deterministic parallel algorithm for fixpoint computation by introducing the notion of weak partial order (WPO). We present an algorithm for constructing a WPO in almost-linear time. Finally, we describe Pikos, our deterministic parallel abstract interpreter, which extends the sequential abstract interpreter IKOS. We evaluate the performance and scalability of Pikos on a suite of 1017 C programs. When using 4 cores, Pikos achieves an average speedup of 2.06x over IKOS, with a maximum speedup of 3.63x. When using 16 cores, Pikos achieves a maximum speedup of 10.97x.
... A flow graph G = (V, E, r) is reducible if every strongly connected subgraph S has a single entry vertex v such that every path from s to a vertex in S contains v [28, 45]. Tarjan [45] gave a characterization of reducible flow graphs using dominators: a flow graph is reducible if and only if it becomes acyclic when every edge (v, w) such that w dominates v is deleted. The notion of reducibility is important because many programs have control flow graphs that are reducible, which simplifies many computations. ...
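
That characterization yields a direct test: delete every edge whose head dominates its tail and check that what remains is acyclic. Here is a hedged sketch, assuming dominator sets are already available (for example, from the iterative data-flow scheme sketched further down this page); dom maps each vertex to the set of its dominators, itself included.

    from collections import deque

    def is_reducible_by_dominators(succ, dom):
        # Tarjan's characterization: the graph is reducible iff deleting
        # each edge (v, w) with w dominating v leaves an acyclic graph.
        kept = {v: [w for w in ws if w not in dom[v]] for v, ws in succ.items()}
        indeg = {v: 0 for v in kept}
        for ws in kept.values():
            for w in ws:
                indeg[w] += 1
        queue = deque(v for v, d in indeg.items() if d == 0)
        seen = 0
        while queue:                     # Kahn's topological sort
            v = queue.popleft()
            seen += 1
            for w in kept[v]:
                indeg[w] -= 1
                if indeg[w] == 0:
                    queue.append(w)
        return seen == len(kept)         # all vertices ordered <=> acyclic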
Article
Full-text available
Motivated by recent applications of dominator computations, we consider the problem of dynamically maintaining the dominators of flow graphs through a sequence of insertions and deletions of edges. Our main theoretical contribution is a simple incremental algorithm that maintains the dominator tree of a flow graph with $n$ vertices through a sequence of $k$ edge insertions in $O(m\min\{n,k\}+kn)$ time, where $m$ is the total number of edges after all insertions. Moreover, we can test in constant time if a vertex $u$ dominates a vertex $v$, for any pair of query vertices $u$ and $v$. Next, we present a new decremental algorithm to update a dominator tree through a sequence of edge deletions. Although our new decremental algorithm is not asymptotically faster than repeated applications of a static algorithm, i.e., it runs in $O(mk)$ time for $k$ edge deletions, it performs well in practice. By combining our new incremental and decremental algorithms we obtain a fully dynamic algorithm that maintains the dominator tree through an intermixed sequence of insertions and deletions of edges. Finally, we present efficient implementations of our new algorithms as well as of existing algorithms, and conduct an extensive experimental study on real-world graphs taken from a variety of application areas.
... We employ a reducibility method, whose reduction operations iteratively yield smaller graphs. Reducibility methods of this kind have been applied in papers such as [16,21]. ...
... A flow graph in which each of its cycles is single-entry is called reducible. Reducible graphs were characterized by Hecht and Ullman [9,10], while efficient recognition has been described by Tarjan [21]. ...
Article
We revisit a concept that has been central in some early stages of computer science, that of structured programming: a set of rules that an algorithm must follow in order to acquire a structure that is desirable in many aspects. While much has been written about structured programming, an important issue has been left unanswered: given an arbitrary, compiled program, decide whether it is structured, that is, whether it conforms to the stated principles of structured programming. By employing graph-theoretic tools, we formulate an efficient algorithm for answering this question. To do so, we first introduce the class of graphs which correspond to structured programs, which we call Dijkstra Graphs. Our problem then becomes the recognition of such graphs, for which we present an $O(n^2)$-time algorithm. Furthermore, we describe an isomorphism algorithm for Dijkstra graphs presenting the same quadratic complexity.
... where E is the number of edges in the graph. Tarjan [114] presents an approach that performs a depth-first search over the graph and uses a set-union function to test whether T1 and T2 can be performed. ...
... As a result of the aforementioned flaws of TGSA construction, building the GDDG incorporates Tarjan's algorithm [114] for building the loop nesting tree, which identifies a unique header node for each loop in a reducible CFG. Once the loop nesting tree has been built, loop construction proceeds in the manner of TGSA form, augmenting loop entries and exits and placing µ- and η-nodes. ...
Article
A thesis submitted on July 23rd, 2011, in partial fulfillment of the requirements for the degree of Doctor of Philosophy (DPhil) in the School of Informatics at the University of Sussex.
... The dominance problem is an excellent example of the need to balance theory with practice. Ever since Lowry and Medlock's O(N^4) algorithm appeared in 1969 [23], researchers have steadily improved the time bound for this problem [7, 10, 17, 19, 22, 26, 29]. However, our results suggest that these improvements in asymptotic complexity may not help on realistically-sized examples, and that careful engineering makes the iterative scheme the clear method of choice. ...
... In 1974, Tarjan proposed an algorithm that uses depth-first search and union-find to achieve an asymptotic complexity of O(N log N + E) [29]. Five years later, Lengauer and Tarjan built on this work to produce an algorithm with almost linear complexity [22]. ...
Technical Report
Full-text available
The problem of finding the dominators in a control-flow graph has a long history in the literature. The original algorithms suffered from a large asymptotic complexity but were easy to understand. Subsequent work improved the time bound, but generally sacrificed both simplicity and ease of implementation. This paper returns to a simple formulation of dominance as a global data-flow problem. Some insights into the nature of dominance lead to an implementation of an O(N^2) algorithm that runs faster, in practice, than the classic Lengauer-Tarjan algorithm, which has a time bound of O(E log N). We compare the algorithm to Lengauer-Tarjan because it is the best known and most widely used of the fast algorithms for dominance. Working from the same implementation insights, we also rederive (from earlier work on control dependence by Ferrante, et al.) a method for calculating dominance frontiers that we show is faster than the original algorithm by Cytron, et al. The aim of this paper is not to present a new algorithm, but, rather, to make an argument based on empirical evidence that algorithms with discouraging asymptotic complexities can be faster in practice than those more commonly employed. This research was supported, in part, by Darpa through Usafrl contract F30602-97-2-298, and the State of Texas through its Advanced Technology Program, grant number 3604-0122-1999.
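
The data-flow formulation the report returns to is the fixed point of dom(v) = {v} ∪ ⋂_{p ∈ preds(v)} dom(p). A deliberately simple set-based Python sketch of that iteration follows; the engineered version the report advocates instead represents each dominator set implicitly as a path in the dominator tree and intersects those paths over reverse-postorder passes, which is what makes it fast in practice. Names are illustrative.

    def iterative_dominators(succ, entry):
        # Fixed-point iteration over dominator sets. Assumes every
        # vertex is a key of succ and is reachable from entry, so every
        # non-entry vertex has at least one predecessor.
        pred = {v: [] for v in succ}
        for v, ws in succ.items():
            for w in ws:
                pred[w].append(v)
        vertices = set(succ)
        dom = {v: set(vertices) for v in vertices}   # start from "everything"
        dom[entry] = {entry}
        changed = True
        while changed:
            changed = False
            for v in vertices:
                if v == entry:
                    continue
                new = {v} | set.intersection(*(dom[p] for p in pred[v]))
                if new != dom[v]:
                    dom[v] = new
                    changed = True
        return dom

The resulting dom map is exactly the input assumed by the dominator-based reducibility check sketched earlier on this page.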
... Different approaches are proposed to potentially reduce cycles [49,98,105], which can also be employed and adapted to manage existing cycles and prevent, where possible, endless propagation loops. Namely, reducible cycles [49] can be partitioned into two acyclic subgraphs. ...
Article
Full-text available
Collaborative model-driven development is a de facto practice to create software-intensive systems in several domains (e.g., aerospace, automotive, and robotics). However, when multiple engineers work concurrently, keeping all model artifacts synchronized and consistent is difficult. This is even harder when the engineering process relies on a myriad of tools and domains (e.g., mechanical, electronic, and software). Existing work tries to solve this issue from different perspectives, such as using trace links between different artifacts or computing change propagation paths. However, these solutions mainly provide additional information to engineers, still requiring manual work for propagating changes. Yet, most modeling tools are limited regarding the traceability between different domains, while also lacking the efficiency and granularity required during the development of software-intensive systems. Motivated by these limitations, in this work, we present a solution based on what we call “reactive links”, which are highly granular trace links that propagate changes between property values across models in different domains, managed in different tools. Unlike traditional “passive links”, reactive links automatically propagate changes when engineers modify models, ensuring the synchronization and consistency of the artifacts. The feasibility, performance, and flexibility of our solution were evaluated in three practical scenarios, from two partner organizations. Our solution is able to resolve all cases in which change propagation among models was required. We observed a great improvement in efficiency compared to performing the same propagation manually. The contribution of this work is to enhance the engineering of software-intensive systems by reducing the burden of manually keeping models synchronized and avoiding inconsistencies that can potentially originate from collaborative engineering in a variety of tools from different domains.
... In a function's CFG, basic blocks are defined as vertices that can be classified based on their functionality. We follow the taxonomy of vertices used by Tarjan [36] and Karamitas [9]. A vertex v in a function CFG G = (V, E) can be classified into the following seven categories. 1) Entry: Given a function CFG, the vertices which are the root nodes of the CFG, i.e., execution entry points, are entry vertices. ...
Article
Full-text available
Binary-binary function matching problem serves as a plinth in many reverse engineering techniques such as binary diffing, malware analysis, and code plagiarism detection. In literature, function matching is performed by first extracting function features (syntactic and semantic), and later these features are used as selection criteria to formulate an approximate 1:1 correspondence between binary functions. The accuracy of the approximation is dependent on the selection of efficient features. Although substantial research has been conducted on this topic, we have explored two major drawbacks in previous research. (i) The features are optimized only for a single architecture and their matching efficiency drops for other architectures. (ii) function matching algorithms mainly focus on the structural properties of a function, which are not inherently resilient against compiler optimizations. To resolve the architecture dependency and compiler optimizations, we benefit from the intermediate representation (IR) of function assembly and propose a set of syntactic and semantic (embedding-based) features which are efficient for multi-architectures, and sensitive to compiler-based optimizations. The proposed function matching algorithm employs one-shot encoding that is flexible to small changes and uses a KNN based approach to effectively map similar functions. We have evaluated proposed features and algorithms using various binaries, which were compiled for x86 and ARM architectures; and the prototype implementation is compared with Diaphora (an industry-standard tool), and other baseline research. Our proposed prototype has achieved a matching accuracy of approx. 96%, which is higher than the compared tools and consistent against optimizations and multi-architecture binaries.
... However, executing only what is needed is obviously more performant. For reducible programs [47], we can automatically generate a protocol which details how often to execute which sequence of basic blocks. Using well-known techniques, we analyze the CFG, inline functions, and dissect the program into the building blocks of structured programming: sequences, which are executed in-order, selections (if/else), for which both the true path and the false path are executed once, and iterations, for which we rely on the attacker for annotations in the DOP script to hint the number of repetitions. ...
Preprint
The wide-spread adoption of system defenses such as the randomization of code, stack, and heap raises the bar for code-reuse attacks. Thus, attackers utilize a scripting engine in target programs like a web browser to prepare the code-reuse chain, e.g., relocate gadget addresses or perform a just-in-time gadget search. However, many types of programs do not provide such an execution context that an attacker can use. Recent advances in data-oriented programming (DOP) explored an orthogonal way to abuse memory corruption vulnerabilities and demonstrated that an attacker can achieve Turing-complete computations without modifying code pointers in applications. As of now, constructing DOP exploits requires a lot of manual work. In this paper, we present novel techniques to automate the process of generating DOP exploits. We implemented a compiler called Steroids that compiles our high-level language SLANG into low-level DOP data structures driving malicious computations at run time. This enables an attacker to specify her intent in an application- and vulnerability-independent manner to maximize reusability. We demonstrate the effectiveness of our techniques and prototype implementation by specifying four programs of varying complexity in SLANG that calculate the Levenshtein distance, traverse a pointer chain to steal a private key, relocate a ROP chain, and perform a JIT-ROP attack. Steroids compiles each of those programs to low-level DOP data structures targeted at five different applications including GStreamer, Wireshark, and ProFTPd, which have vastly different vulnerabilities and DOP instances. Ultimately, this shows that our compiler is versatile, can be used for both 32- and 64-bit applications, works across bug classes, and enables highly expressive attacks without conventional code-injection or code-reuse techniques in applications lacking a scripting engine.
... Loop reconstruction is a well-researched topic, dating back to the 1970s when Tarjan [27,28] formulated his interval analysis algorithm capable of identifying loops in reducible control flow graphs. The algorithm creates a depth-first tree of the CFG and identifies loops in a bottom-up traversal from the inside out, by collapsing inner loops into single vertices [20]. ...
Conference Paper
On-stack replacement (OSR) is a common technique employed by dynamic compilers to reduce program warm-up time. OSR allows switching from interpreted to compiled code during the execution of this code. The main targets are long running loops, which need to be represented explicitly, with dedicated information about condition and body, to be optimized at run time. Bytecode interpreters, however, represent control flow implicitly via unstructured jumps and thus do not exhibit the required high-level loop representation. To enable OSR also for jump-based - often called unstructured - languages, we propose the partial reconstruction of loops in order to explicitly represent them in a bytecode interpreter. Besides an outline of the general idea, we implemented our approach in Sulong, a bytecode interpreter for LLVM bitcode, which allows the execution of C/C++. We conducted an evaluation with a set of C benchmarks, which showed speed-ups in warm-up of up to 9x for certain benchmarks. This facilitates execution of programs with long-running loops in rarely called functions, which would yield significant slowdown without OSR. While shown with a prototype implementation, the overall idea of our approach is generalizable for all bytecode interpreters.
... Hence we introduce a taxonomy of edges, similar to the one described by Tarjan [31]. Using DFS traversal, we classify edges into the following categories. ...
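
In Tarjan's taxonomy, each edge of a directed graph falls into one of four categories relative to a depth-first spanning forest: tree, back, forward, or cross. Preorder numbers plus a "currently on the DFS path" flag suffice to tell them apart, as in this illustrative sketch (the names and adjacency representation are assumptions):

    def classify_edges(succ, roots):
        # Tags each edge (v, w) as 'tree', 'back', 'forward', or 'cross'
        # relative to the DFS forest grown from roots, in order.
        pre = {}            # preorder (discovery) number per vertex
        active = set()      # vertices on the current DFS path
        kind = {}           # (v, w) -> category
        def dfs(root):
            stack = [(root, iter(succ.get(root, ())))]
            pre[root] = len(pre)
            active.add(root)
            while stack:
                v, it = stack[-1]
                w = next(it, None)
                if w is None:
                    active.discard(v)          # v's subtree is finished
                    stack.pop()
                elif w not in pre:
                    kind[(v, w)] = 'tree'
                    pre[w] = len(pre)
                    active.add(w)
                    stack.append((w, iter(succ.get(w, ()))))
                elif w in active:
                    kind[(v, w)] = 'back'      # w is an ancestor of v
                elif pre[w] > pre[v]:
                    kind[(v, w)] = 'forward'   # w is a proper descendant of v
                else:
                    kind[(v, w)] = 'cross'     # w lies in a finished subtree
        for r in roots:
            if r not in pre:
                dfs(r)
        return kind

For a flow graph searched from its entry, these back edges coincide with the edges whose head dominates their tail exactly when the graph is reducible, which ties this taxonomy to the reducibility tests discussed above.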
Article
Full-text available
Binary diffing consists in comparing syntactic and semantic differences of two programs in binary form, when source code is unavailable. It can be reduced to a graph isomorphism problem between the Control Flow Graphs, Call Graphs or other forms of graphs of the compared programs. Here we present REveal, a prototype tool which implements a binary diffing algorithm and an associated set of features, extracted from a binary’s CG and CFGs. Additionally, we explore the potential of applying Markov lumping techniques on function CFGs. The proposed algorithm and features are evaluated in a series of experiments on executables compiled for i386, amd64, arm and aarch64. Furthermore, the effectiveness of our prototype tool, code-named REveal, is assessed in a second series of experiments involving clustering of a corpus of 18 malware samples into 5 malware families. REveal’s results are compared against those produced by Diaphora, the most widely used binary diffing software of the public domain. We conclude that REveal improves the state-of-the-art in binary diffing by achieving higher matching scores, obtained at the cost of a slight running time increase, in most of the experiments conducted. Furthermore, REveal successfully partitions the malware corpus into clusters consisting of samples of the same malware family.
... Loop reconstruction is a well-researched topic, dating back to the 1970s when Tarjan [27,28] formulated his interval analysis algorithm capable of identifying loops in reducible control flow graphs. The algorithm creates a depth-first tree of the CFG and identifies loops in a bottom-up traversal from the inside out, by collapsing inner loops into single vertices [20]. ...
Preprint
On-stack replacement (OSR) is a common technique employed by dynamic compilers to reduce program warm-up time. OSR allows switching from interpreted to compiled code during the execution of this code. The main targets are long running loops, which need to be represented explicitly, with dedicated information about condition and body, to be optimized at run time. Bytecode interpreters, however, represent control flow implicitly via unstructured jumps and thus do not exhibit the required high-level loop representation. To enable OSR also for jump-based - often called unstructured - languages, we propose the partial reconstruction of loops in order to explicitly represent them in a bytecode interpreter. Besides an outline of the general idea, we implemented our approach in Sulong, a bytecode interpreter for LLVM bitcode, which allows the execution of C/C++. We conducted an evaluation with a set of C benchmarks, which showed speed-ups in warm-up of up to 9x for certain benchmarks. This facilitates execution of programs with long-running loops in rarely called functions, which would yield significant slowdown without OSR. While shown with a prototype implementation, the overall idea of our approach is generalizable for all bytecode interpreters.
... Map R is used when restoring the removed cross and forward edges. Function FindNestedSCCs relies on the assumption that the graph is reducible [Hecht and Ullman 1972; Tarjan 1973]. It follows the edges backwards to find nested SCCs using rep(p) instead of predecessor p, as on Lines 38 and 45, to skip the search inside the nested SCCs. ...
Preprint
Abstract interpretation is a general framework for expressing static program analyses. It reduces the problem of extracting properties of a program to computing a fixpoint of a system of equations. The de facto approach for computing this fixpoint uses a sequential algorithm based on weak topological order (WTO). This paper presents a deterministic parallel algorithm for fixpoint computation by introducing the notion of weak partial order (WPO). We present an algorithm for constructing a WPO in almost-linear time. Finally, we describe our deterministic parallel abstract interpreter PIKOS. We evaluate the performance of PIKOS on a suite of 207 C programs. We observe a maximum speedup of 9.38x when using 16 cores compared to the existing WTO-based sequential abstract interpreter.
... To solve this problem we adopted a taxonomy of edges, similar to the one originally described by Tarjan [33]. Briefly, using DFS traversal, we classify edges into the following categories: ...
... The smell rankings are viewed as a graph, where each node is a smell and each edge indicates an immediate precedence relation between two smells in one of the agents' rankings. In particular, Tarjan's algorithm [18] is applied, which finds cycles in a graph by searching for strongly connected components. If a cycle is found in the graph, it means that an inconsistency exists between the agents' rankings. ...
Article
Full-text available
A code smell is a symptom in the source code that helps to identify a design problem. Several tools for detecting and ranking code smells according to their criticality to the system have been developed. However, existing works assume a centralized development approach, which does not consider systems being developed in a distributed fashion. The main problem in a distributed group of developers is that a tool cannot always ensure a global vision of (smells of) the system, and thus inconsistencies among the rankings provided by each developer are likely to happen. These inconsistencies often cause unnecessary refactorings and might not focus the whole team on the critical smells system-wide. Along this line, this work proposes a multi-agent tool, called D-JSpIRIT, which helps individual developers to reach a consensus on their smell rankings by means of distributed optimization techniques.
... A major technical feature of the front-end is its use of Tarjan's fast interval finding algorithm [20] to implement the reaching-definition calculation of Allen and Cocke [1] for use-def chaining. Starting with the user-specified base types, and shapes or at least the rank (dimensions) of input parameters, base types and shapes are propagated via the u-d chains. ...
Conference Paper
ELI is a succinct array-based interactive programming language derived from APL. In this paper we present the overall design and implementation of a bootstrapped ELI-to-C compiler which is implemented in ELI. We provide a brief introduction to the ELI language, a high-level view of the code generation strategy, and a description of our bootstrapping process. We also provide a preliminary performance evaluation. Firstly, we use three existing C benchmarks to demonstrate the performance of the ELI-generated C code as compared with interpreted ELI and native C. Secondly, we use two benchmarks originally from APL to compare the ELI-generated C to interpreted ELI and a naive hand-generated C version. These preliminary results are encouraging, showing speedups over the interpreter and in many cases performance close to C. The results also show that some future optimizations, such as copy elimination/avoidance, would be beneficial.
... For the purpose of this chapter, the general pattern for a reducible loop is depicted in Figure 3.6. It is generally the case that optimizing compilers do not deal with irreducible loops, and we consequently do not account for this case [78]. Some loops test their condition after their loop bodies, for example do-while loops within the Java programming language. ...
... Note that this definition of loops covers only loops that do not have multiple entries; in other words, it excludes irreducible loops. A method for quickly testing whether a CFG is reducible is given by R. Tarjan in [84]. ...
... As can be observed in the table, the detected cycles account for most of the running time for each application. Algorithms for cycle detection in graphs have been studied for years [16,9,18]. Our graph visualization framework implements the algorithm in [20]. ...
Article
Full-text available
Event flow graphs used in the context of performance monitoring combine the scalability and low overhead of profiling methods with the lossless information recording of tracing tools. In other words, they capture statistics on the performance behavior of parallel applications while preserving the temporal ordering of events. Event flow graphs require significantly less storage than regular event traces and can still be used to recover the full ordered sequence of events performed by the application.
... The classical algorithm for identifying loops is Tarjan's interval analysis algorithm [29], which is restricted to reducible graphs. A reducible graph is one that can be collapsed to a single node by repeated application of the T1 and T2 transformations; irreducible graphs are graphs that cannot be classified as reducible graphs. ...
... [HU72]). Since the algorithm presented in [Tar73] and [BDL+86] recognizes only "reducible" loops, i.e., loops with a single entry, it cannot be used to validate a constructed EPC model as transformable. Instead, the algorithm (DJ-graph) presented in the article by [UM02]/[SGL96] should be adapted. ...
Article
For successful IT support of cross-departmental and cross-site application scenarios, the existing and future information systems must be integrated with one another. Because business processes change frequently, a flexible integration approach is required, one that can be adapted to changed workflows with little time and expense. This means that the existing, in some cases very heterogeneous, systems must be given a suitable standardized interface and encapsulated as services. The actual realization of the cross-application workflow is then carried out by a central workflow orchestration engine that invokes the individual services and connects them with one another. In this thesis, using a scenario from the e-government domain as an example, business processes modeled as event-driven process chains (EPCs) are mapped onto a suitable service-oriented architecture, where their execution is supported with the help of a suitable workflow orchestration language. BPEL (Business Process Execution Language) is the intended orchestration language.
... Computing the Loop-Nesting Forest The loop-nesting forest of a reducible CFG is unique and can be computed in O(|V|·log*(|E|)). Tarjan's algorithm [29] performs a bottom-up traversal in a depth-first search tree of the CFG, identifying inner (nested) loops first. A loop is defined as a set of nodes from which there is a path to the source of a back-edge that does not go through its target. ...
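
For reducible graphs, the bottom-up strategy just described can be written compactly: visit vertices in reverse DFS preorder so inner headers come first, grow each loop body backwards from the sources of its back edges, and collapse the body into its header with union-find. The sketch below follows that outline (a Havlak-style simplification, without the bookkeeping that yields the O(|V|·log*(|E|)) bound); it assumes a reducible CFG whose vertices are all reachable from entry, and all names are illustrative.

    def loop_nesting_forest(succ, entry):
        # Returns header[v] = the innermost loop header enclosing v, for
        # every vertex v that lies in some loop body. Assumes a reducible
        # CFG (every edge into a loop passes through its header); the
        # recursive DFS is kept for brevity.
        pre, order, active = {}, [], set()
        back_preds = {v: [] for v in succ}   # sources of back edges into v
        fwd_preds = {v: [] for v in succ}    # all other predecessors of v
        def dfs(v):
            pre[v] = len(order)
            order.append(v)
            active.add(v)
            for w in succ[v]:
                if w not in pre:
                    dfs(w)
                    fwd_preds[w].append(v)
                elif w in active:
                    back_preds[w].append(v)  # back edge v -> w
                else:
                    fwd_preds[w].append(v)
            active.discard(v)
        dfs(entry)
        parent = {v: v for v in pre}         # union-find over collapsed loops
        def find(v):
            while parent[v] != v:
                parent[v] = parent[parent[v]]
                v = parent[v]
            return v
        header = {}
        for h in reversed(order):            # reverse preorder: inner loops first
            body = set()
            work = [find(u) for u in back_preds[h] if u != h]
            while work:
                x = work.pop()
                if x != h and x not in body:
                    body.add(x)
                    work.extend(find(p) for p in fwd_preds[x])
            for x in body:
                header[x] = h                # h is x's innermost enclosing header
                parent[x] = h                # collapse the body into h
        return header

Loosely speaking, the map R mentioned in the WPO contexts above supplies the extra predecessor information needed for the irreducible cases this sketch deliberately excludes.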
Article
Full-text available
We revisit the problem of computing liveness sets, i.e., the set of variables live-in and live-out of basic blocks, for programs in strict SSA (static single assignment). Strict SSA is also known as SSA with dominance property because it ensures that the definition of a variable always dominates all its uses. This property can be exploited to optimize the computation of liveness sets. Our first contribution is the design of a fast data-flow algorithm, which, unlike traditional approaches, avoids the iterative calculation of a fixed point. Thanks to the properties of strict SSA form and the use of a loop-nesting forest, we show that two passes are sufficient. A first pass, similar to the initialization of iterative data-flow analysis, traverses the control-flow graph in postorder propagating liveness information backwards. A second pass then traverses the loop-nesting forest, updating liveness information within loops. Another approach is to propagate from uses to definition, one variable and one path at a time, instead of unioning sets as in standard data-flow analysis. Such a path-exploration strategy was proposed by Appel in his "Tiger book" and is also used in the LLVM compiler. Our second contribution is to show how to extend and optimize algorithms based on this idea to compute liveness sets one variable at a time using adequate data structures. Finally, we evaluate and compare the efficiency of the proposed algorithms using the SPECINT 2000 benchmark suite. The standard data-flow approach is clearly outperformed; all algorithms show substantial speed-ups of a factor of 2 on average. Depending on the underlying set implementation either the path-exploration approach or the loop-forest-based approach provides superior performance. Experiments show that our loop-forest-based algorithm provides superior performance (average speed-up of 43% over the fastest alternative) when sets are represented as bitsets and for optimized programs, i.e., when there are more variables and larger live-sets and live-ranges.
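
For contrast, the standard iterative data-flow baseline that these two-pass algorithms outperform fits in a few lines. In this sketch, the block names and the use (upward-exposed reads) and defs maps are illustrative assumptions:

    def liveness(succ, use, defs):
        # Standard iterative backward liveness analysis, the baseline
        # the paper's two-pass SSA-based algorithms are compared against.
        # succ: block -> successor blocks; use[b]: variables read in b
        # before any redefinition; defs[b]: variables defined in b.
        live_in = {b: set() for b in succ}
        live_out = {b: set() for b in succ}
        changed = True
        while changed:
            changed = False
            for b in succ:
                out = set().union(*(live_in[s] for s in succ[b]))
                new_in = use[b] | (out - defs[b])
                if out != live_out[b] or new_in != live_in[b]:
                    live_out[b], live_in[b] = out, new_in
                    changed = True
        return live_in, live_out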
... A reducible flow graph [15,16,21] is a directed graph G with a source s ∈ V(G), such that, for each cycle C of G, every directed path from s to C reaches C at the same vertex. It is well known that a reducible flow graph has at most one Hamiltonian cycle. ...
Article
Full-text available
Watermarking solutions have been regarded as a promising way to fight copyright violations and intellectual theft of software. They consist of techniques to embed authorship/ownership data into a computer program. In some graph-based watermarking schemes, the identification data is disguised as the control-flow graph of dummy code. Based on an idea of Collberg et al., Chroni and Nikolopoulos recently developed an ingenious such scheme whereby an integer is encoded into a particular kind of permutation graph. We extend the work of those authors as follows. First, we give a formal characterization of the class of graphs generated by their encoding function. Then, we provide a simpler decoding function and a robust polynomial-time algorithm which restores watermarks with a constant number of missing edges whenever at all possible, therefore providing a reasonable level of protection against distortive attacks which aim at disabling the watermark.
... The head of a reducible loop dominates all vertices in the loop. A flow graph is reducible [26, 35] if all its loops are reducible. If G is reducible, deletion of all its back arcs with respect to any spanning tree produces an acyclic graph with the same dominators as G. Thus Algorithm AD extends to find the dominators of any reducible graph. ...
Article
The problem of finding dominators in a directed graph has many important applications, notably in global optimization of computer code. Although linear and near-linear-time algorithms exist, they use sophisticated data structures. We develop an algorithm for finding dominators that uses only a "static tree" disjoint set data structure in addition to simple lists and maps. The algorithm runs in near-linear or linear time, depending on the implementation of the disjoint set data structure. We give several versions of the algorithm, including one that computes loop nesting information (needed in many kinds of global code optimization) and that can be made self-certifying, so that the correctness of the computed dominators is very easy to verify.
... They identified the base structure that is present in a flow graph if and only if it is a non-reducible graph. Tarjan then improved upon Hecht and Ullman's algorithm in 1973, publishing a method for determining the reducibility of a flow graph [55]. This method is based on his earlier work on depth-first search, and runs in a slightly better time bound than the T1-T2 method. ...
Article
This thesis demonstrates that careful selection of compiler transformations can improve the output and reduce the compile-time cost of adaptive compilation. Compiler effectiveness depends on the order of code transformations applied. Adaptive compilation, then, uses empirical search to tune the transformation sequence for each program. This method achieves higher performance than traditional compilers, but often requires large compilation times. Previous research reduces compilation time by tuning the search process. This thesis, instead, tunes the search space by adding a loop unroller, addressing a deficiency in our compiler. Despite increasing the search-space size, this change results in more effective and efficient searching. Averaged across nine benchmarks, the adaptive compiler produces code, more quickly, that is 10% faster. Unfortunately, implementing a loop unroller is non-trivial for our low-level intermediate language. Therefore, this thesis also contributes an algorithm for identifying and unrolling loops in the absence of high-level loop structures.
Chapter
A new paradigm for HPC Resource Management, called Elastic Computing, is under development at the Invasive Computing Transregional Collaborative Research Center. An extension to MPI for programming elastic applications and a resource manager were implemented. The resource manager is an extension of the SLURM batch scheduler. Resource elasticity allows the resource manager to dictate changes in the resource allocations of running applications based on scheduler decisions. These resource allocation changes are decided by the scheduler based on performance feedback from the applications. The collection of performance feedback from running applications poses unique challenges for the runtime system. In this document, our current performance feedback system is presented.
Conference Paper
Waddle is a research intermediate-form optimizer that strictly maintains a canonical form similar to the loop-simplify form used in LLVM. The properties of this canonical form simplify movement of instructions to the edges of loops and often localize the effect on variables to the loop in which they are defined. The guarantee of canonical form preservation allows program transformations to rely on the presence of certain program properties without a necessary sanity-check or recalculation pre-step and does not impose an order of transformations in which reconstruction passes must be inserted. In this paper, we present a form-preserving edge deletion operation, in which a provably unreachable branch between two basic blocks is removed from the control flow graph. Additionally, we show a distinct application of the block ejection operation, a core procedure used for loop body reconstruction, as utilized in a function inlining transformation.
Conference Paper
In test case generation methods based on symbolic testing and/or model checking, the primary emphasis is on covering code/model elements and not on optimising test sequence length. However, in certain domains, e.g. embedded systems, GUI, networking software, testing process may involve interaction with other physical subsystems, possibly remotely situated. Thus, test sequence length may have important implications on the cost of testing. In this paper, we present SymTest, a novel framework for test sequence generation for testing embedded systems. SymTest selects good control flow paths so as to generate shorter test sequences. In case of unsatisfiability, SymTest explores the neighbouring paths using backtracking and heuristics. SymTest is distinctive w.r.t. other related methods in its attempt to generate shorter test sequences while searching for feasible paths. The other novelty is that SymTest allows plugging in heuristics in a flexible way, a feature because of which we call SymTest a framework and not an algorithm. Part of SymTest's power is in its extensibility to seamlessly accommodate more heuristics, thus enhancing its ability to generate shorter test sequences with economy of effort. Our experiments with SymTest show that SymTest achieves significantly shorter test sequences, with comparatively higher test coverage, as compared with other methods.
Article
Pointer analysis is a critical compiler analysis used to disambiguate the indirect memory references that result from the use of pointers and pointer-based data structures. A conventional pointer analysis deduces for every pair of pointers, at any program point, whether a points-to relation between them (i) definitely exists, (ii) definitely does not exist, or (iii) maybe exists. Many compiler optimizations rely on accurate pointer analysis, and to ensure correctness cannot optimize in the maybe case. In contrast, recently-proposed speculative optimizations can aggressively exploit the maybe case, especially if the likelihood that two pointers alias can be quantified. This paper proposes a Probabilistic Pointer Analysis (PPA) algorithm that statically predicts the probability of each points-to relation at every program point. Building on simple control-flow edge profiling, our analysis is both one-level context and flow sensitive, yet can still scale to large programs including the SPEC 2000 integer benchmark suite. The key to our approach is to compute points-to probabilities through the use of linear transfer functions that are efficiently encoded as sparse matrices. We demonstrate that our analysis can provide accurate probabilities, even without edge-profile information. We also find that, even without considering probability information, our analysis provides an accurate approach to performing pointer analysis.
Conference Paper
The deployment of larger and larger HPC systems challenges the scalability of both applications and analysis tools. Performance analysis toolsets provide users with means to spot bottlenecks in their applications by either collecting aggregated statistics or generating lossless time-stamped traces. While obtaining detailed trace information is the best method to examine the behavior of an application in detail, it is infeasible at extreme scales due to the huge volume of data generated. In this context, knowing the application structure, and particularly the nesting of loops in iterative applications is of great importance as it allows, among other things, to reduce the amount of data collected by focusing on important sections of the code. In this paper we demonstrate how the loop nesting structure of an MPI application can be extracted on-line from its event flow graph without the need of any explicit source code instrumentation. We show how this knowledge on the application structure can be used to compute post-mortem statistics as well as to reduce the amount of redundant data collected. To that end, we present a usage scenario where this structure information is utilized on-line (while the application runs) to intelligently collect fine-grained data for only a few iterations of an application, considerably reducing the amount of data gathered.
Article
Full-text available
Since the earliest days of compilation, code quality has been recognized as an important problem [18]. A rich literature has developed around the issue of improving code quality. This paper surveys one part of that literature: code transformations intended to improve the running time of programs on uniprocessor machines. This paper emphasizes transformations intended to improve code quality rather than analysis methods. We describe analytical techniques and specific data-flow problems to the extent that they are necessary to understand the transformations. Other papers provide excellent summaries of the various sub-fields of program analysis. The paper is structured around a simple taxonomy that classifies transformations based on how they change the code. The taxonomy is populated with example transformations drawn from the literature. Each transformation is described at a depth that facilitates broad understanding; detailed references are provided for deeper study of individual transformations. The taxonomy provides the reader with a framework for thinking about code-improving transformations. It also serves as an organizing principle for the paper.
Article
This article defines a problem that involves merging nodes into trees while retaining the ability to determine the lowest common ancestor of any two nodes. An O(n log n) algorithm is offered to solve the problem on-line. It is shown how this algorithm provides a fast way of computing the dominator tree of a reducible flow graph.
Article
Full-text available
The development of services-based systems starts from defining business requirements to be implemented as high-level business processes. In this paper, we describe a scenario-driven approach for developing business processes specified as WS-BPEL descriptions. We aim for simplicity in the business level notation and leverage example-like modelling principles in order to enable process sketching. The first step in our approach is to identify the essential functional requirements for business processes. The requirements are modelled as simple scenarios, each of them defining a sample run through the process, i.e., required behaviour that the underlying service-based system should allow. The scenarios, specifying sent and received messages among the services, are synthesised into a state machine. The state machine is transformed into an initial process model given in UML activity model notation. To enable mapping into WS-BPEL code, the transformation exploits domain-specific rules, i.e., our target model consists of a subset of UML with WS-BPEL specific constraints and stereotypes. The initial process model can be further refined to enable generation of executable WS-BPEL descriptions. We apply the approach to two cases: a simple process for managing loan requests, and an industry case study from a logistics provider.
Article
With the increasing importance of Application Domain Specific Processor (ADSP) design, a significant challenge is to identify special-purpose operations for implementation as a customized instruction. While many methodologies have been proposed for this purpose, they all work for a single algorithm chosen from the target application domain. Such algorithm-specific approaches are not suitable for designing instruction sets applicable to a whole family of related algorithms. For an entire range of related algorithms, this paper develops a methodology for identifying compound operations, as a basis for designing “domain-specific” Instruction Set Architectures (ISAs) that can efficiently run most of the algorithms in a given domain. Our methodology combines three different static analysis techniques to identify instruction sequences common to several related algorithms: identification of (non-branching) instruction sequences that occur commonly across the algorithms; identification of instruction sequences nested within iterative constructs that are thus executed frequently; and identification of commonly-occurring instruction sequences that span basic blocks. Choosing different combinations of these results enables us to design domain-specific special operations with different desired characteristics, such as performance or suitability as a library function. To demonstrate our approach, case studies are carried out for a family of thirteen string matching algorithms. Finally, the validity of our static analysis results is confirmed through independent dynamic analysis experiments and performance improvement measurements.
Conference Paper
We consider problems related to dominators and independent spanning trees in flowgraphs and provide linear-time algorithms for their solutions. We introduce the notion of a directed bipolar order, generalizing a previous notion of Plein and Cheriyan and Reif. We show how to construct such an order from information computed by several known algorithms for finding dominators. We show how to concurrently verify the correctness of a dominator tree D and a directed bipolar order O very simply, and how to construct from D and O two spanning trees whose paths are disjoint except for common dominators. Finally, we describe alternative ways to verify dominators without using a directed bipolar order.
Conference Paper
Internet use by older people has increased dramatically during the past 10 years. According to different sources, the number of users over age 65 has more than doubled since 2000. Besides, the inevitable effect of younger users aging will increase the number of older people using the Internet in the next decades. Unfortunately, older people face several challenges when using the web due to diminishing capacities related to aging, such as vision decline, hearing loss, diminished motor skills, and cognitive issues. On the other hand, e-learning can be an opportunity in helping older people become integrated with the rest of society. In this context, Massive Open Online Courses (MOOC) bring great opportunities to enhance the quality of life of older people by enabling lifelong learning and inclusion in learning communities. However, MOOCs can present some barriers that could hamper full participation by elderly students. In order to avoid these barriers, MOOCs have to meet different user needs, skills and situations: MOOCs have to successfully address web accessibility challenges for elderly students. The purpose of this paper is to raise awareness towards a better understanding of the web accessibility challenges that elderly students of MOOCs face.
Conference Paper
The Caravela platform was developed for stream-based computing using Graphics Processing Units (GPUs) as the main processing elements. It provides a new pipeline-processing mechanism called meta-pipeline, which allows processing units in the Caravela platform to be connected and invokes an application in a pipeline manner. The processing units can be locally or remotely located, establishing a distributed processing environment. However, it is hard for a programmer to define the processing pipeline by directly using Caravela runtime functions. Thus, a GUI-based entry tool for meta-pipeline applications is proposed in this paper. This paper presents the design and the implementation of this entry tool. The tool addresses the main difficulties of programming meta-pipeline applications by providing methods for defining pipeline stages and the connections among them, detecting illegal connections, and debugging the pipelined processing using the Caravela runtime environment. Based on this tool, this paper also presents and discusses a case study of a meta-pipeline application.
Conference Paper
Dynamic Symbolic Execution (DSE) is a state-of-the-art test-generation approach that systematically explores program paths to generate high-covering tests. In DSE, the presence of loops (especially unbound loops) can cause an enormous or even infinite number of paths to be explored. There exist techniques (such as bounded iteration, heuristics, and summarization) that assist DSE in addressing loop problems. However, there exists no literature-survey or empirical work that shows the pervasiveness of loop problems or identifies challenges faced by these techniques on real-world open-source applications. To fill this gap, we provide characteristic studies to guide future research on addressing loop problems for DSE. Our proposed study methodology starts with conducting a literature-survey study to investigate how technical problems such as loop problems compromise automated software-engineering tasks such as test generation, and which existing techniques are proposed to deal with such technical problems. Then the study methodology continues with conducting an empirical study of applying the existing techniques on real-world software applications sampled based on the literature-survey results and major open-source project hosting sites. This empirical study investigates the pervasiveness of the technical problems and how well existing techniques can address such problems among real-world software applications. Based on such study methodology, our two-phase characteristic studies identify that bounded iteration and heuristics are effective in addressing loop problems when used properly. Our studies further identify challenges faced by these techniques and provide guidelines for effectively addressing these challenges.
Article
This paper presents an intermediate program representation called the Hierarchical Task Graph (HTG), and argues that it is not only suitable as the basis for program optimization and code generation, but it fully encapsulates program parallelism at all levels of granularity. As such, the HTG can be used as the basis for a variety of restructuring and optimization techniques, and hence as the target for front-end compilers as well as the input to source and code generators. Our implementation and testing of the HTG in the Parafrase-2 compiler has demonstrated its suitability and versatility as a potentially universal intermediate representation. In addition to encapsulating semantic information, data and control dependences, the HTG provides more information vital to efficient code generation and optimizations related to parallel code generation. In particular, we introduce the notion of precedence between nodes of the structure whose grain size can range from atomic operations to entire subprograms.