Conference Paper

Testing Flow Graph Reducibility

Authors:
Robert Endre Tarjan

Abstract

Many problems in program optimization have been solved by applying a technique called interval analysis to the flow graph of the program. A flow graph which is susceptible to this type of analysis is called reducible. This paper describes an algorithm for testing whether a flow graph is reducible. The algorithm uses depth-first search to reveal the structure of the flow graph and a good method for computing disjoint set unions to determine reducibility from the search information. When the algorithm is implemented on a random access computer, it requires O(E log* E) time to analyze a graph with E edges, where log* x = min{i | log^(i) x ≤ 1} (the iterated logarithm). The time bound compares favorably with the O(E log E) bound of a previously known algorithm.
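
Several of the citing passages below mention the T1 and T2 transformations, which give the textbook criterion behind this test: a flow graph is reducible exactly when repeatedly deleting self-loops (T1) and folding a vertex with a unique predecessor into that predecessor (T2) collapses the graph to a single vertex. The Python sketch below implements that naive check, not Tarjan's O(E log* E) algorithm; the adjacency-dict representation and names are assumptions made for illustration.

    def is_reducible(succ, entry):
        # Naive T1/T2 reducibility check (roughly quadratic), not
        # Tarjan's fast algorithm. succ: dict vertex -> set of
        # successors; entry: start vertex. Assumes every vertex is a
        # key of succ and is reachable from entry.
        succ = {v: set(ws) for v, ws in succ.items()}
        pred = {v: set() for v in succ}
        for v, ws in succ.items():
            for w in ws:
                pred[w].add(v)
        changed = True
        while changed and len(succ) > 1:
            changed = False
            for v in list(succ):
                succ[v].discard(v)              # T1: drop a self-loop
                pred[v].discard(v)
                if v != entry and len(pred[v]) == 1:
                    (u,) = pred[v]              # T2: fold v into its only predecessor
                    succ[u].discard(v)
                    for w in succ[v]:
                        pred[w].discard(v)
                        if w != u:              # skip the would-be self-loop on u
                            succ[u].add(w)
                            pred[w].add(u)
                    del succ[v], pred[v]
                    changed = True
        return len(succ) == 1

For example, is_reducible({1: {2, 3}, 2: {3}, 3: {1}}, 1) returns True, while the classic two-entry loop is_reducible({1: {2, 3}, 2: {3}, 3: {2}}, 1) returns False.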

... A reducible flow graph [Hecht and Ullman 1974; Tarjan 1974b] is one in which every strongly connected subgraph S has a single entry vertex v such that every path from the start vertex s to a vertex in S contains v. There are many equivalent characterizations of reducible flow graphs [Tarjan 1974b], and there are algorithms to test reducibility in near-linear [Tarjan 1974b] and truly linear [Buchsbaum et al. 2008] time. One notion of a "structured" program is that its flow graph is reducible. ...
Article
How does one verify that the output of a complicated program is correct? One can formally prove that the program is correct, but this may be beyond the power of existing methods. Alternatively, one can check that the output produced for a particular input satisfies the desired input--output relation by running a checker on the input--output pair. Then one only needs to prove the correctness of the checker. For some problems, however, even such a checker may be too complicated to formally verify. There is a third alternative: augment the original program to produce not only an output but also a correctness certificate, with the property that a very simple program (whose correctness is easy to prove) can use the certificate to verify that the input--output pair satisfies the desired input--output relation. We consider the following important instance of this general question: How does one verify that the dominator tree of a flow graph is correct? Existing fast algorithms for finding dominators are complicated, and even verifying the correctness of a dominator tree in the absence of additional information seems complicated. We define a correctness certificate for a dominator tree, show how to use it to easily verify the correctness of the tree, and show how to augment fast dominator-finding algorithms so that they produce a correctness certificate. We also relate the dominator certificate problem to the problem of finding divergent spanning trees in a flow graph, and we develop algorithms to find such trees. All our algorithms run in linear time. Previous algorithms apply just to the special case of only trivial dominators, and they take at least quadratic time.
... Node d is a dominator of node n if every path from the entry node of the CFG to n goes through d. This algorithm can only detect loops with single entries, i.e., it requires a reducible graph [89]. In practice, however, graphs can contain loops with multiple entries, which make the graph irreducible. ...
... As Figure 10(a) shows, Loop 2 has two entries: Nodes 5 and 6. In this case, we apply the node collapsing algorithm [52,89] to determine if a CFG contains a loop with multiple entries, and the node splitting algorithm [52,89] to break such a loop. We can then make an irreducible graph reducible. ...
Article
Buffer overflow attacks have been a computer security threat in software-based systems and applications for decades. The existence of buffer overflow vulnerabilities makes the system …
... Applications of sequential disjoint set union include storage allocation in compilers [32], finding minimum spanning trees using Kruskal's algorithm [31], maintaining the connected components of an undirected graph under edge additions [10,17,44], testing percolation [40], finding loops and dominators in flow graphs [12,42,43], and finding strong components in directed graphs. Some of these applications, notably finding connected components [24,27,33,37,39,41] and finding strong components, are on immense graphs and could potentially benefit from the use of concurrency to speed up the computation. ...
... In some applications of disjoint set union, such as computing flow graph information [42,43], each set has a name or some other associated value, such as the number of elements in the set. We can extend the compressed tree data structure to support set values by storing these in the set roots. ...
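
The "value at the root" idea in this passage is easy to make concrete. Below is a minimal sequential sketch (path compression plus union by size); the class name and the combine parameter are illustrative assumptions, not an API from the cited papers.

    class UnionFindWithValues:
        # Disjoint sets with an associated value kept at each root.
        def __init__(self, elements, value, combine):
            self.parent = {x: x for x in elements}
            self.size = {x: 1 for x in elements}
            self.value = {x: value(x) for x in elements}  # meaningful at roots only
            self.combine = combine                        # merges two set values

        def find(self, x):
            root = x
            while self.parent[root] != root:
                root = self.parent[root]
            while self.parent[x] != root:                 # path compression
                self.parent[x], x = root, self.parent[x]
            return root

        def union(self, x, y):
            rx, ry = self.find(x), self.find(y)
            if rx == ry:
                return rx
            if self.size[rx] < self.size[ry]:             # union by size
                rx, ry = ry, rx
            self.parent[ry] = rx
            self.size[rx] += self.size[ry]
            self.value[rx] = self.combine(self.value[rx], self.value.pop(ry))
            return rx

        def get_value(self, x):
            return self.value[self.find(x)]

For instance, UnionFindWithValues(range(5), value=lambda x: {x}, combine=set.union) keeps each set's membership as its value.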
Article
Full-text available
We develop and analyze concurrent algorithms for the disjoint set union (“union-find”) problem in the shared memory, asynchronous multiprocessor model of computation, with CAS (compare and swap) or DCAS (double compare and swap) as the synchronization primitive. We give a deterministic bounded wait-free algorithm that uses DCAS and has a total work bound of O(m·(log(np/m + 1) + α(n, m/(np)))) for a problem with n elements and m operations solved by p processes, where α is a functional inverse of Ackermann’s function. We give two randomized algorithms that use only CAS and have the same work bound in expectation. The analysis of the second randomized algorithm is valid even if the scheduler is adversarial. Our DCAS and randomized algorithms take O(log n) steps per operation, worst-case for the DCAS algorithm, high-probability for the randomized algorithms. Our work and step bounds grow only logarithmically with p, making our algorithms truly scalable. We prove that for a class of symmetric algorithms that includes ours, no better step or work bound is possible. Our work is theoretical, but Alistarh et al. (In search of the fastest concurrent union-find algorithm, 2019), Dhulipala et al. (A framework for static and incremental parallel graph connectivity algorithms, 2020) and Hong et al. (Exploring the design space of static and incremental graph connectivity algorithms on GPUs, 2020) have implemented some of our algorithms on CPUs and GPUs and experimented with them. On many realistic data sets, our algorithms run as fast or faster than all others.
... Applications of sequential disjoint set union include storage allocation in compilers [LA02], finding minimum spanning trees using Kruskal's algorithm [Kru56], maintaining the connected components of an undirected graph under edge additions [Tar75a], testing percolation [SW11], finding loops and dominators in flow graphs [Tar73,Tar74,FGMT13], and finding strong components in directed graphs. Some of these applications, notably finding connected components [Tar72, SV82, JM97, HZ01, RMCDS12, LT18] and finding strong components, are on immense graphs and could potentially benefit from the use of concurrency to speed up the computation. ...
... In some applications of disjoint set union, such as computing flow graph information [Tar74,Tar73], each set has a name or some other associated value, such as the number of elements in the set. We can extend the compressed tree data structure to support set values by storing these in the set roots. ...
Preprint
We develop and analyze concurrent algorithms for the disjoint set union (union-find) problem in the shared memory, asynchronous multiprocessor model of computation, with CAS (compare and swap) or DCAS (double compare and swap) as the synchronization primitive. We give a deterministic bounded wait-free algorithm that uses DCAS and has a total work bound of $O(m \cdot (\log(np/m + 1) + \alpha(n, m/(np))))$ for a problem with $n$ elements and $m$ operations solved by $p$ processes, where $\alpha$ is a functional inverse of Ackermann's function. We give two randomized algorithms that use only CAS and have the same work bound in expectation. The analysis of the second randomized algorithm is valid even if the scheduler is adversarial. Our DCAS and randomized algorithms take $O(\log n)$ steps per operation, worst-case for the DCAS algorithm, high-probability for the randomized algorithms. Our work and step bounds grow only logarithmically with $p$, making our algorithms truly scalable. We prove that for a class of symmetric algorithms that includes ours, no better step or work bound is possible.
... Consequently, the almost-linear time algorithm for WPO can also be used to compute WTO. The algorithm for WPO construction handles dependency graphs that are irreducible [Hecht and Ullman 1972; Tarjan 1973]. The key insight behind our approach is to adapt algorithms for computing loop nesting forests [Ramalingam 1999, 2002] to the problem of computing a concurrent iteration strategy for abstract interpretation. ...
... The map R maps a vertex to a set of edges, and is used to handle irreducible graphs [Hecht and Ullman 1972; Tarjan 1973]. Initially, R is set to ∅, and updated on Line 29. ...
Article
Full-text available
Abstract interpretation is a general framework for expressing static program analyses. It reduces the problem of extracting properties of a program to computing an approximation of the least fixpoint of a system of equations. The de facto approach for computing this approximation uses a sequential algorithm based on weak topological order (WTO). This paper presents a deterministic parallel algorithm for fixpoint computation by introducing the notion of weak partial order (WPO). We present an algorithm for constructing a WPO in almost-linear time. Finally, we describe Pikos, our deterministic parallel abstract interpreter, which extends the sequential abstract interpreter IKOS. We evaluate the performance and scalability of Pikos on a suite of 1017 C programs. When using 4 cores, Pikos achieves an average speedup of 2.06x over IKOS, with a maximum speedup of 3.63x. When using 16 cores, Pikos achieves a maximum speedup of 10.97x.
... A flow graph G = (V, E, r) is reducible if every strongly connected subgraph S has a single entry vertex v such that every path from s to a vertex in S contains v [28, 45]. Tarjan [45] gave a characterization of reducible flow graphs using dominators: a flow graph is reducible if and only if it becomes acyclic when every edge (v, w) such that w dominates v is deleted. The notion of reducibility is important because many programs have control flow graphs that are reducible, which simplifies many computations. ...
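
That characterization yields a direct test: delete every edge whose head dominates its tail and check that what remains is acyclic. Here is a hedged sketch, assuming dominator sets are already available (for example, from the iterative data-flow scheme sketched further down this page); dom maps each vertex to the set of its dominators, itself included.

    from collections import deque

    def is_reducible_by_dominators(succ, dom):
        # Tarjan's characterization: the graph is reducible iff deleting
        # each edge (v, w) with w dominating v leaves an acyclic graph.
        kept = {v: [w for w in ws if w not in dom[v]] for v, ws in succ.items()}
        indeg = {v: 0 for v in kept}
        for ws in kept.values():
            for w in ws:
                indeg[w] += 1
        queue = deque(v for v, d in indeg.items() if d == 0)
        seen = 0
        while queue:                     # Kahn's topological sort
            v = queue.popleft()
            seen += 1
            for w in kept[v]:
                indeg[w] -= 1
                if indeg[w] == 0:
                    queue.append(w)
        return seen == len(kept)         # all vertices ordered <=> acyclic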
Article
Full-text available
Motivated by recent applications of dominator computations, we consider the problem of dynamically maintaining the dominators of flow graphs through a sequence of insertions and deletions of edges. Our main theoretical contribution is a simple incremental algorithm that maintains the dominator tree of a flow graph with $n$ vertices through a sequence of $k$ edge insertions in $O(m\min\{n,k\}+kn)$ time, where $m$ is the total number of edges after all insertions. Moreover, we can test in constant time if a vertex $u$ dominates a vertex $v$, for any pair of query vertices $u$ and $v$. Next, we present a new decremental algorithm to update a dominator tree through a sequence of edge deletions. Although our new decremental algorithm is not asymptotically faster than repeated applications of a static algorithm, i.e., it runs in $O(mk)$ time for $k$ edge deletions, it performs well in practice. By combining our new incremental and decremental algorithms we obtain a fully dynamic algorithm that maintains the dominator tree through an intermixed sequence of insertions and deletions of edges. Finally, we present efficient implementations of our new algorithms as well as of existing algorithms, and conduct an extensive experimental study on real-world graphs taken from a variety of application areas.
... We employ a reducibility method, whose reduction operations iteratively yield smaller graphs. Reducibility methods of this kind have been applied in papers such as [16,21]. ...
... A flow graph in which each of its cycles is single-entry is called reducible. Reducible graphs were characterized by Hecht and Ullman [9,10], while efficient recognition has been described by Tarjan [21]. ...
Article
We revisit a concept that has been central in some early stages of computer science, that of structured programming: a set of rules that an algorithm must follow in order to acquire a structure that is desirable in many aspects. While much has been written about structured programming, an important issue has been left unanswered: given an arbitrary, compiled program, decide whether it is structured, that is, whether it conforms to the stated principles of structured programming. By employing graph-theoretic tools, we formulate an efficient algorithm for answering this question. To do so, we first introduce the class of graphs which correspond to structured programs, which we call Dijkstra Graphs. Our problem then becomes the recognition of such graphs, for which we present an $O(n^2)$-time algorithm. Furthermore, we describe an isomorphism algorithm for Dijkstra graphs presenting the same quadratic complexity.
... where E is the number of edges in the graph. Tarjan [114] presents an approach that performs a depth-first search over the graph and uses a set-union function to test whether T1 and T2 can be performed. ...
... As a result of the aforementioned flaws of TGSA construction, building the GDDG incorporates Tarjan's algorithm [114] for building the loop nesting tree, which identifies a unique header node for each loop in a reducible CFG. Once the loop nesting tree has been built, loop construction proceeds in the manner of TGSA form, augmenting loop entries and exits and placing µ- and η-nodes. ...
Article
A thesis submitted on July 23rd, 2011, in partial fulfillment of the requirements for the degree of Doctor of Philosophy (DPhil) in the School of Informatics at the University of Sussex.
... The dominance problem is an excellent example of the need to balance theory with practice. Ever since Lowry and Medlock's O(N^4) algorithm appeared in 1969 [23], researchers have steadily improved the time bound for this problem [7, 10, 17, 19, 22, 26, 29]. However, our results suggest that these improvements in asymptotic complexity may not help on realistically-sized examples, and that careful engineering makes the iterative scheme the clear method of choice. ...
... In 1974, Tarjan proposed an algorithm that uses depth-first search and union-find to achieve an asymptotic complexity of O(N log N + E) [29]. Five years later, Lengauer and Tarjan built on this work to produce an algorithm with almost linear complexity [22]. ...
Technical Report
Full-text available
The problem of finding the dominators in a control-flow graph has a long history in the literature. The original algorithms suffered from a large asymptotic complexity but were easy to understand. Subsequent work improved the time bound, but generally sacrificed both simplicity and ease of implementation. This paper returns to a simple formulation of dominance as a global data-flow problem. Some insights into the nature of dominance lead to an implementation of an O(N^2) algorithm that runs faster, in practice, than the classic Lengauer-Tarjan algorithm, which has a time bound of O(E log N). We compare the algorithm to Lengauer-Tarjan because it is the best known and most widely used of the fast algorithms for dominance. Working from the same implementation insights, we also rederive (from earlier work on control dependence by Ferrante, et al.) a method for calculating dominance frontiers that we show is faster than the original algorithm by Cytron, et al. The aim of this paper is not to present a new algorithm, but, rather, to make an argument based on empirical evidence that algorithms with discouraging asymptotic complexities can be faster in practice than those more commonly employed. This research was supported, in part, by Darpa through Usafrl contract F30602-97-2-298, and the State of Texas through its Advanced Technology Program, grant number 3604-0122-1999.
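
The data-flow formulation the report returns to is the fixed point of dom(v) = {v} ∪ ⋂_{p ∈ preds(v)} dom(p). A deliberately simple set-based Python sketch of that iteration follows; the engineered version the report advocates instead represents each dominator set implicitly as a path in the dominator tree and intersects those paths over reverse-postorder passes, which is what makes it fast in practice. Names are illustrative.

    def iterative_dominators(succ, entry):
        # Fixed-point iteration over dominator sets. Assumes every
        # vertex is a key of succ and is reachable from entry, so every
        # non-entry vertex has at least one predecessor.
        pred = {v: [] for v in succ}
        for v, ws in succ.items():
            for w in ws:
                pred[w].append(v)
        vertices = set(succ)
        dom = {v: set(vertices) for v in vertices}   # start from "everything"
        dom[entry] = {entry}
        changed = True
        while changed:
            changed = False
            for v in vertices:
                if v == entry:
                    continue
                new = {v} | set.intersection(*(dom[p] for p in pred[v]))
                if new != dom[v]:
                    dom[v] = new
                    changed = True
        return dom

The resulting dom map is exactly the input assumed by the dominator-based reducibility check sketched earlier on this page.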
... Different approaches are proposed to potentially reduce cycles [49,98,105], which can also be employed and adapted to manage existing cycles and prevent, where possible, endless propagation loops. Namely, reducible cycles [49] can be partitioned into two acyclic subgraphs. ...
Article
Full-text available
Collaborative model-driven development is a de facto practice to create software-intensive systems in several domains (e.g., aerospace, automotive, and robotics). However, when multiple engineers work concurrently, keeping all model artifacts synchronized and consistent is difficult. This is even harder when the engineering process relies on a myriad of tools and domains (e.g., mechanical, electronic, and software). Existing work tries to solve this issue from different perspectives, such as using trace links between different artifacts or computing change propagation paths. However, these solutions mainly provide additional information to engineers, still requiring manual work for propagating changes. Yet, most modeling tools are limited regarding the traceability between different domains, while also lacking the efficiency and granularity required during the development of software-intensive systems. Motivated by these limitations, in this work, we present a solution based on what we call “reactive links”, which are highly granular trace links that propagate changes between property values across models in different domains, managed in different tools. Unlike traditional “passive links”, reactive links automatically propagate changes when engineers modify models, ensuring the synchronization and consistency of the artifacts. The feasibility, performance, and flexibility of our solution were evaluated in three practical scenarios, from two partner organizations. Our solution is able to resolve all cases in which change propagation among models was required. We observed a great improvement in efficiency compared to performing the same propagation manually. The contribution of this work is to enhance the engineering of software-intensive systems by reducing the burden of manually keeping models synchronized and avoiding inconsistencies that can potentially originate from collaborative engineering in a variety of tools from different domains.
... In a function's CFG, basic blocks are defined as vertices that can be classified based on their functionality. We follow the taxonomy of vertices used by Tarjan [36] and Karamitas [9]. A vertex v in a function CFG G = (V, E) can be classified into the following seven categories. 1) Entry: Given a function CFG, the vertices which are the root nodes of the CFG, i.e., execution entry points, are entry vertices. ...
Article
Full-text available
Binary-binary function matching problem serves as a plinth in many reverse engineering techniques such as binary diffing, malware analysis, and code plagiarism detection. In literature, function matching is performed by first extracting function features (syntactic and semantic), and later these features are used as selection criteria to formulate an approximate 1:1 correspondence between binary functions. The accuracy of the approximation is dependent on the selection of efficient features. Although substantial research has been conducted on this topic, we have explored two major drawbacks in previous research. (i) The features are optimized only for a single architecture and their matching efficiency drops for other architectures. (ii) function matching algorithms mainly focus on the structural properties of a function, which are not inherently resilient against compiler optimizations. To resolve the architecture dependency and compiler optimizations, we benefit from the intermediate representation (IR) of function assembly and propose a set of syntactic and semantic (embedding-based) features which are efficient for multi-architectures, and sensitive to compiler-based optimizations. The proposed function matching algorithm employs one-shot encoding that is flexible to small changes and uses a KNN based approach to effectively map similar functions. We have evaluated proposed features and algorithms using various binaries, which were compiled for x86 and ARM architectures; and the prototype implementation is compared with Diaphora (an industry-standard tool), and other baseline research. Our proposed prototype has achieved a matching accuracy of approx. 96%, which is higher than the compared tools and consistent against optimizations and multi-architecture binaries.
... However, executing only what is needed is obviously more performant. For reducible programs [47], we can automatically generate a protocol which details how often to execute which sequence of basic blocks. Using well-known techniques, we analyze the CFG, inline functions, and dissect the program into the building blocks of structured programming: sequences, which are executed in-order, selections (if/else), for which both the true path and the false path are executed once, and iterations, for which we rely on the attacker for annotations in the DOP script to hint the number of repetitions. ...
Preprint
The wide-spread adoption of system defenses such as the randomization of code, stack, and heap raises the bar for code-reuse attacks. Thus, attackers utilize a scripting engine in target programs like a web browser to prepare the code-reuse chain, e.g., relocate gadget addresses or perform a just-in-time gadget search. However, many types of programs do not provide such an execution context that an attacker can use. Recent advances in data-oriented programming (DOP) explored an orthogonal way to abuse memory corruption vulnerabilities and demonstrated that an attacker can achieve Turing-complete computations without modifying code pointers in applications. As of now, constructing DOP exploits requires a lot of manual work. In this paper, we present novel techniques to automate the process of generating DOP exploits. We implemented a compiler called Steroids that compiles our high-level language SLANG into low-level DOP data structures driving malicious computations at run time. This enables an attacker to specify her intent in an application- and vulnerability-independent manner to maximize reusability. We demonstrate the effectiveness of our techniques and prototype implementation by specifying four programs of varying complexity in SLANG that calculate the Levenshtein distance, traverse a pointer chain to steal a private key, relocate a ROP chain, and perform a JIT-ROP attack. Steroids compiles each of those programs to low-level DOP data structures targeted at five different applications including GStreamer, Wireshark, and ProFTPd, which have vastly different vulnerabilities and DOP instances. Ultimately, this shows that our compiler is versatile, can be used for both 32- and 64-bit applications, works across bug classes, and enables highly expressive attacks without conventional code-injection or code-reuse techniques in applications lacking a scripting engine.
... Loop reconstruction is a well-researched topic, dating back to the 1970s when Tarjan [27,28] formulated his interval analysis algorithm capable of identifying loops in reducible control flow graphs. The algorithm creates a depth-first tree of the CFG and identifies loops in a bottom-up traversal from the inside out, by collapsing inner loops into single vertices [20]. ...
Conference Paper
On-stack replacement (OSR) is a common technique employed by dynamic compilers to reduce program warm-up time. OSR allows switching from interpreted to compiled code during the execution of this code. The main targets are long running loops, which need to be represented explicitly, with dedicated information about condition and body, to be optimized at run time. Bytecode interpreters, however, represent control flow implicitly via unstructured jumps and thus do not exhibit the required high-level loop representation. To enable OSR also for jump-based - often called unstructured - languages, we propose the partial reconstruction of loops in order to explicitly represent them in a bytecode interpreter. Besides an outline of the general idea, we implemented our approach in Sulong, a bytecode interpreter for LLVM bitcode, which allows the execution of C/C++. We conducted an evaluation with a set of C benchmarks, which showed speed-ups in warm-up of up to 9x for certain benchmarks. This facilitates execution of programs with long-running loops in rarely called functions, which would yield significant slowdown without OSR. While shown with a prototype implementation, the overall idea of our approach is generalizable for all bytecode interpreters.
... Hence we introduce a taxonomy of edges, similar to the one described by Tarjan [31]. Using DFS traversal, we classify edges into the following categories. ...
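
In Tarjan's taxonomy, each edge of a directed graph falls into one of four categories relative to a depth-first spanning forest: tree, back, forward, or cross. Preorder numbers plus a "currently on the DFS path" flag suffice to tell them apart, as in this illustrative sketch (the names and adjacency representation are assumptions):

    def classify_edges(succ, roots):
        # Tags each edge (v, w) as 'tree', 'back', 'forward', or 'cross'
        # relative to the DFS forest grown from roots, in order.
        pre = {}            # preorder (discovery) number per vertex
        active = set()      # vertices on the current DFS path
        kind = {}           # (v, w) -> category
        def dfs(root):
            stack = [(root, iter(succ.get(root, ())))]
            pre[root] = len(pre)
            active.add(root)
            while stack:
                v, it = stack[-1]
                w = next(it, None)
                if w is None:
                    active.discard(v)          # v's subtree is finished
                    stack.pop()
                elif w not in pre:
                    kind[(v, w)] = 'tree'
                    pre[w] = len(pre)
                    active.add(w)
                    stack.append((w, iter(succ.get(w, ()))))
                elif w in active:
                    kind[(v, w)] = 'back'      # w is an ancestor of v
                elif pre[w] > pre[v]:
                    kind[(v, w)] = 'forward'   # w is a proper descendant of v
                else:
                    kind[(v, w)] = 'cross'     # w lies in a finished subtree
        for r in roots:
            if r not in pre:
                dfs(r)
        return kind

For a flow graph searched from its entry, these back edges coincide with the edges whose head dominates their tail exactly when the graph is reducible, which ties this taxonomy to the reducibility tests discussed above.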
Article
Full-text available
Binary diffing consists in comparing syntactic and semantic differences of two programs in binary form, when source code is unavailable. It can be reduced to a graph isomorphism problem between the Control Flow Graphs, Call Graphs or other forms of graphs of the compared programs. Here we present REveal, a prototype tool which implements a binary diffing algorithm and an associated set of features, extracted from a binary’s CG and CFGs. Additionally, we explore the potential of applying Markov lumping techniques on function CFGs. The proposed algorithm and features are evaluated in a series of experiments on executables compiled for i386, amd64, arm and aarch64. Furthermore, the effectiveness of our prototype tool, code-named REveal, is assessed in a second series of experiments involving clustering of a corpus of 18 malware samples into 5 malware families. REveal’s results are compared against those produced by Diaphora, the most widely used binary diffing software of the public domain. We conclude that REveal improves the state-of-the-art in binary diffing by achieving higher matching scores, obtained at the cost of a slight running time increase, in most of the experiments conducted. Furthermore, REveal successfully partitions the malware corpus into clusters consisting of samples of the same malware family.
... Loop reconstruction is a well-researched topic, dating back to the 1970s when Tarjan [27,28] formulated his interval analysis algorithm capable of identifying loops in reducible control flow graphs. The algorithm creates a depth-first tree of the CFG and identifies loops in a bottom-up traversal from the inside out, by collapsing inner loops into single vertices [20]. ...
Preprint
On-stack replacement (OSR) is a common technique employed by dynamic compilers to reduce program warm-up time. OSR allows switching from interpreted to compiled code during the execution of this code. The main targets are long running loops, which need to be represented explicitly, with dedicated information about condition and body, to be optimized at run time. Bytecode interpreters, however, represent control flow implicitly via unstructured jumps and thus do not exhibit the required high-level loop representation. To enable OSR also for jump-based - often called unstructured - languages, we propose the partial reconstruction of loops in order to explicitly represent them in a bytecode interpreter. Besides an outline of the general idea, we implemented our approach in Sulong, a bytecode interpreter for LLVM bitcode, which allows the execution of C/C++. We conducted an evaluation with a set of C benchmarks, which showed speed-ups in warm-up of up to 9x for certain benchmarks. This facilitates execution of programs with long-running loops in rarely called functions, which would yield significant slowdown without OSR. While shown with a prototype implementation, the overall idea of our approach is generalizable for all bytecode interpreters.
... Map R is used when restoring the removed cross and forward edges. Function FindNestedSCCs relies on the assumption that the graph is reducible [Hecht and Ullman 1972; Tarjan 1973]. It follows the edges backwards to find nested SCCs using rep(p) instead of predecessor p, as on Lines 38 and 45, to skip the search inside the nested SCCs. ...
Preprint
Abstract interpretation is a general framework for expressing static program analyses. It reduces the problem of extracting properties of a program to computing a fixpoint of a system of equations. The de facto approach for computing this fixpoint uses a sequential algorithm based on weak topological order (WTO). This paper presents a deterministic parallel algorithm for fixpoint computation by introducing the notion of weak partial order (WPO). We present an algorithm for constructing a WPO in almost-linear time. Finally, we describe our deterministic parallel abstract interpreter PIKOS. We evaluate the performance of PIKOS on a suite of 207 C programs. We observe a maximum speedup of 9.38x when using 16 cores compared to the existing WTO-based sequential abstract interpreter.
... To solve this problem we adopted a taxonomy of edges, similar to the one originally described by Tarjan [33]. Briefly, using DFS traversal, we classify edges into the following categories: ...
... The smell rankings are viewed as a graph, where each node is a smell and each edge indicates an immediate precedence relation between two smells in one of the agents' rankings. In particular, Tarjan's algorithm [18] is applied, which finds cycles in a graph by searching for strongly connected components. If a cycle is found in the graph, it means that an inconsistency exists between the agents' rankings. ...
Article
Full-text available
A code smell is a symptom in the source code that helps to identify a design problem. Several tools for detecting and ranking code smells according to their criticality to the system have been developed. However, existing works assume a centralized development approach, which does not consider systems being developed in a distributed fashion. The main problem in a distributed group of developers is that a tool cannot always ensure a global vision of (smells of) the system, and thus inconsistencies among the rankings provided by each developer are likely to happen. These inconsistencies often cause unnecessary refactorings and might not focus the whole team on the critical smells system-wide. Along this line, this work proposes a multi-agent tool, called D-JSpIRIT, which helps individual developers to reach a consensus on their smell rankings by means of distributed optimization techniques.
... A major technical feature of the front-end is its use of Tarjan's fast interval finding algorithm [20] to implement the reaching-definition calculation of Allen and Cocke [1] for use-def chaining. Starting with the user-specified base types, and shapes or at least the rank (dimensions) of input parameters, base types and shapes are propagated via the u-d chains. ...
Conference Paper
ELI is a succinct array-based interactive programming language derived from APL. In this paper we present the overall design and implementation of a bootstrapped ELI-to-C compiler which is implemented in ELI. We provide a brief introduction to the ELI language, a high-level view of the code generation strategy, and a description of our bootstrapping process. We also provide a preliminary performance evaluation. Firstly, we use three existing C benchmarks to demonstrate the performance of the ELI-generated C code as compared with interpreted ELI and native C. Secondly, we use two benchmarks originally from APL to compare the ELI-generated C to interpreted ELI and a naive hand-generated C version. These preliminary results are encouraging, showing speedups over the interpreter and in many cases performance close to C. The results also show that some future optimizations, such as copy elimination/avoidance, would be beneficial.
... For the purpose of this chapter, the general pattern for a reducible loop is depicted in Figure 3.6. It is generally the case that optimizing compilers do not deal with irreducible loops, and we consequently do not account for this case [78]. Some loops test their condition after their loop bodies, for example do-while loops within the Java programming language. ...
... Note that this definition of loops covers only loops that do not have multiple entries; in other words, it excludes irreducible loops. A method for quickly testing whether a CFG is reducible is given by R. Tarjan in [84]. ...
... As can be observed in the table, the detected cycles account for most of the running time for each application. Algorithms for cycle detection in graphs have been studied for years [16,9,18]. Our graph visualization framework implements the algorithm in [20]. ...
Article
Full-text available
Event flow graphs used in the context of performance monitoring combine the scalability and low overhead of profiling methods with the lossless information recording of tracing tools. In other words, they capture statistics on the performance behavior of parallel applications while preserving the temporal ordering of events. Event flow graphs require significantly less storage than regular event traces and can still be used to recover the full ordered sequence of events performed by the application.
... The classical algorithm for identifying loops is Tarjan's interval analysis algorithm [29], which is restricted to reducible graphs. A reducible graph is one that can be collapsed to a single node by repeated application of the T1 and T2 transformations; irreducible graphs are graphs that cannot be classified as reducible graphs. ...
... [HU72]). Since the algorithm presented in [Tar73] and [BDL+86] recognizes only "reducible" loops, i.e., loops with a single entry, it cannot be used to validate a constructed EPC model as transformable. Instead, the algorithm (DJ-graph) presented in the article by [UM02]/[SGL96] should be adapted. ...
Article
For successful IT support of cross-departmental and cross-site application scenarios, the existing and future information systems must be integrated with one another. Because business processes change frequently, a flexible integration approach is required, one that can be adapted to changed workflows with little time and expense. This means that the existing, in some cases very heterogeneous, systems must be given a suitable standardized interface and encapsulated as services. The actual realization of the cross-application workflow is then carried out by a central workflow orchestration engine that invokes the individual services and connects them with one another. In this thesis, using a scenario from the e-government domain as an example, business processes modeled as event-driven process chains (EPCs) are mapped onto a suitable service-oriented architecture, where their execution is supported with the help of a suitable workflow orchestration language. BPEL (Business Process Execution Language) is the intended orchestration language.
... Computing the Loop-Nesting Forest The loop-nesting forest of a reducible CFG is unique and can be computed in O(|V|·log*(|E|)). Tarjan's algorithm [29] performs a bottom-up traversal in a depth-first search tree of the CFG, identifying inner (nested) loops first. A loop is defined as a set of nodes from which there is a path to the source of a back-edge that does not go through its target. ...
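
For reducible graphs, the bottom-up strategy just described can be written compactly: visit vertices in reverse DFS preorder so inner headers come first, grow each loop body backwards from the sources of its back edges, and collapse the body into its header with union-find. The sketch below follows that outline (a Havlak-style simplification, without the bookkeeping that yields the O(|V|·log*(|E|)) bound); it assumes a reducible CFG whose vertices are all reachable from entry, and all names are illustrative.

    def loop_nesting_forest(succ, entry):
        # Returns header[v] = the innermost loop header enclosing v, for
        # every vertex v that lies in some loop body. Assumes a reducible
        # CFG (every edge into a loop passes through its header); the
        # recursive DFS is kept for brevity.
        pre, order, active = {}, [], set()
        back_preds = {v: [] for v in succ}   # sources of back edges into v
        fwd_preds = {v: [] for v in succ}    # all other predecessors of v
        def dfs(v):
            pre[v] = len(order)
            order.append(v)
            active.add(v)
            for w in succ[v]:
                if w not in pre:
                    dfs(w)
                    fwd_preds[w].append(v)
                elif w in active:
                    back_preds[w].append(v)  # back edge v -> w
                else:
                    fwd_preds[w].append(v)
            active.discard(v)
        dfs(entry)
        parent = {v: v for v in pre}         # union-find over collapsed loops
        def find(v):
            while parent[v] != v:
                parent[v] = parent[parent[v]]
                v = parent[v]
            return v
        header = {}
        for h in reversed(order):            # reverse preorder: inner loops first
            body = set()
            work = [find(u) for u in back_preds[h] if u != h]
            while work:
                x = work.pop()
                if x != h and x not in body:
                    body.add(x)
                    work.extend(find(p) for p in fwd_preds[x])
            for x in body:
                header[x] = h                # h is x's innermost enclosing header
                parent[x] = h                # collapse the body into h
        return header

Loosely speaking, the map R mentioned in the WPO contexts above supplies the extra predecessor information needed for the irreducible cases this sketch deliberately excludes.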
Article
Full-text available
We revisit the problem of computing liveness sets, i.e., the set of variables live-in and live-out of basic blocks, for programs in strict SSA (static single assignment). Strict SSA is also known as SSA with dominance property because it ensures that the definition of a variable always dominates all its uses. This property can be exploited to optimize the computation of liveness sets. Our first contribution is the design of a fast data-flow algorithm, which, unlike traditional approaches, avoids the iterative calculation of a fixed point. Thanks to the properties of strict SSA form and the use of a loop-nesting forest, we show that two passes are sufficient. A first pass, similar to the initialization of iterative data-flow analysis, traverses the control-flow graph in postorder propagating liveness information backwards. A second pass then traverses the loop-nesting forest, updating liveness information within loops. Another approach is to propagate from uses to definition, one variable and one path at a time, instead of unioning sets as in standard data-flow analysis. Such a path-exploration strategy was proposed by Appel in his "Tiger book" and is also used in the LLVM compiler. Our second contribution is to show how to extend and optimize algorithms based on this idea to compute liveness sets one variable at a time using adequate data structures. Finally, we evaluate and compare the efficiency of the proposed algorithms using the SPECINT 2000 benchmark suite. The standard data-flow approach is clearly outperformed; all algorithms show substantial speed-ups of a factor of 2 on average. Depending on the underlying set implementation either the path-exploration approach or the loop-forest-based approach provides superior performance. Experiments show that our loop-forest-based algorithm provides superior performance (average speed-up of 43% over the fastest alternative) when sets are represented as bitsets and for optimized programs, i.e., when there are more variables and larger live-sets and live-ranges.
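
For contrast, the standard iterative data-flow baseline that these two-pass algorithms outperform fits in a few lines. In this sketch, the block names and the use (upward-exposed reads) and defs maps are illustrative assumptions:

    def liveness(succ, use, defs):
        # Standard iterative backward liveness analysis, the baseline
        # the paper's two-pass SSA-based algorithms are compared against.
        # succ: block -> successor blocks; use[b]: variables read in b
        # before any redefinition; defs[b]: variables defined in b.
        live_in = {b: set() for b in succ}
        live_out = {b: set() for b in succ}
        changed = True
        while changed:
            changed = False
            for b in succ:
                out = set().union(*(live_in[s] for s in succ[b]))
                new_in = use[b] | (out - defs[b])
                if out != live_out[b] or new_in != live_in[b]:
                    live_out[b], live_in[b] = out, new_in
                    changed = True
        return live_in, live_out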
... A reducible flow graph [15,16,21] is a directed graph G with a source s ∈ V(G), such that, for each cycle C of G, every directed path from s to C reaches C at the same vertex. It is well known that a reducible flow graph has at most one Hamiltonian cycle. ...
Article
Full-text available
Watermarking solutions have been regarded as a promising way to fight copyright violations and intellectual theft of software. They consist of techniques to embed authorship/ownership data into a computer program. In some graph-based watermarking schemes, the identification data is disguised as the control-flow graph of dummy code. Based on an idea of Collberg et al., Chroni and Nikolopoulos recently developed an ingenious such scheme whereby an integer is encoded into a particular kind of permutation graph. We extend the work of those authors as follows. First, we give a formal characterization of the class of graphs generated by their encoding function. Then, we provide a simpler decoding function and a robust polynomial-time algorithm which restores watermarks with a constant number of missing edges whenever at all possible, therefore providing a reasonable level of protection against distortive attacks which aim at disabling the watermark.
... The head of a reducible loop dominates all vertices in the loop. A flow graph is reducible [26, 35] if all its loops are reducible. If G is reducible, deletion of all its back arcs with respect to any spanning tree produces an acyclic graph with the same dominators as G. Thus Algorithm AD extends to find the dominators of any reducible graph. ...
Article
The problem of finding dominators in a directed graph has many important applications, notably in global optimization of computer code. Although linear and near-linear-time algorithms exist, they use sophisticated data structures. We develop an algorithm for finding dominators that uses only a "static tree" disjoint set data structure in addition to simple lists and maps. The algorithm runs in near-linear or linear time, depending on the implementation of the disjoint set data structure. We give several versions of the algorithm, including one that computes loop nesting information (needed in many kinds of global code optimization) and that can be made self-certifying, so that the correctness of the computed dominators is very easy to verify.
... They identified the base structure that is present in a flow graph if and only if it is a non-reducible graph. Tarjan then improved upon Hecht and Ullman's algorithm in 1973, publishing a method for determining the reducibility of a flow graph [55]. This method is based on his earlier work on depth-first search, and runs in a slightly better time bound than the T1-T2 method. ...
Article
This thesis demonstrates that careful selection of compiler transformations can improve the output and reduce the compile-time cost of adaptive compilation. Compiler effectiveness depends on the order of code transformations applied. Adaptive compilation, then, uses empirical search to tune the transformation sequence for each program. This method achieves higher performance than traditional compilers, but often requires large compilation times. Previous research reduces compilation time by tuning the search process. This thesis, instead, tunes the search space by adding a loop unroller, addressing a deficiency in our compiler. Despite increasing the search-space size, this change results in more effective and efficient searching. Averaged across nine benchmarks, the adaptive compiler produces code, more quickly, that is 10% faster. Unfortunately, implementing a loop unroller is non-trivial for our low-level intermediate language. Therefore, this thesis also contributes an algorithm for identifying and unrolling loops in the absence of high-level loop structures.
Chapter
A new paradigm for HPC Resource Management, called Elastic Computing, is under development at the Invasive Computing Transregional Collaborative Research Center. An extension to MPI for programming elastic applications and a resource manager were implemented. The resource manager is an extension of the SLURM batch scheduler. Resource elasticity allows the resource manager to dictate changes in the resource allocations of running applications based on scheduler decisions. These resource allocation changes are decided by the scheduler based on performance feedback from the applications. The collection of performance feedback from running applications poses unique challenges for the runtime system. In this document, our current performance feedback system is presented.
Conference Paper
Waddle is a research intermediate-form optimizer that strictly maintains a canonical form similar to the loop-simplify form used in LLVM. The properties of this canonical form simplify movement of instructions to the edges of loops and often localize the effect on variables to the loop in which they are defined. The guarantee of canonical form preservation allows program transformations to rely on the presence of certain program properties without a necessary sanity-check or recalculation pre-step and does not impose an order of transformations in which reconstruction passes must be inserted. In this paper, we present a form-preserving edge deletion operation, in which a provably unreachable branch between two basic blocks is removed from the control flow graph. Additionally, we show a distinct application of the block ejection operation, a core procedure used for loop body reconstruction, as utilized in a function inlining transformation.
Conference Paper
In test case generation methods based on symbolic testing and/or model checking, the primary emphasis is on covering code/model elements and not on optimising test sequence length. However, in certain domains, e.g. embedded systems, GUI, networking software, testing process may involve interaction with other physical subsystems, possibly remotely situated. Thus, test sequence length may have important implications on the cost of testing. In this paper, we present SymTest, a novel framework for test sequence generation for testing embedded systems. SymTest selects good control flow paths so as to generate shorter test sequences. In case of unsatisfiability, SymTest explores the neighbouring paths using backtracking and heuristics. SymTest is distinctive w.r.t. other related methods in its attempt to generate shorter test sequences while searching for feasible paths. The other novelty is that SymTest allows plugging in heuristics in a flexible way, a feature because of which we call SymTest a framework and not an algorithm. Part of SymTest's power is in its extensibility to seamlessly accommodate more heuristics, thus enhancing its ability to generate shorter test sequences with economy of effort. Our experiments with SymTest show that SymTest achieves significantly shorter test sequences, with comparatively higher test coverage, as compared with other methods.
Article
Pointer analysis is a critical compiler analysis used to disambiguate the indirect memory references that result from the use of pointers and pointer-based data structures. A conventional pointer analysis deduces for every pair of pointers, at any program point, whether a points-to relation between them (i) definitely exists, (ii) definitely does not exist, or (iii) maybe exists. Many compiler optimizations rely on accurate pointer analysis, and to ensure correctness cannot optimize in the maybe case. In contrast, recently-proposed speculative optimizations can aggressively exploit the maybe case, especially if the likelihood that two pointers alias can be quantified. This paper proposes a Probabilistic Pointer Analysis (PPA) algorithm that statically predicts the probability of each points-to relation at every program point. Building on simple control-flow edge profiling, our analysis is both one-level context and flow sensitive, yet can still scale to large programs including the SPEC 2000 integer benchmark suite. The key to our approach is to compute points-to probabilities through the use of linear transfer functions that are efficiently encoded as sparse matrices. We demonstrate that our analysis can provide accurate probabilities, even without edge-profile information. We also find that, even without considering probability information, our analysis provides an accurate approach to performing pointer analysis.
Conference Paper
The deployment of larger and larger HPC systems challenges the scalability of both applications and analysis tools. Performance analysis toolsets provide users with means to spot bottlenecks in their applications by either collecting aggregated statistics or generating lossless time-stamped traces. While obtaining detailed trace information is the best method to examine the behavior of an application in detail, it is infeasible at extreme scales due to the huge volume of data generated. In this context, knowing the application structure, and particularly the nesting of loops in iterative applications is of great importance as it allows, among other things, to reduce the amount of data collected by focusing on important sections of the code. In this paper we demonstrate how the loop nesting structure of an MPI application can be extracted on-line from its event flow graph without the need of any explicit source code instrumentation. We show how this knowledge on the application structure can be used to compute post-mortem statistics as well as to reduce the amount of redundant data collected. To that end, we present a usage scenario where this structure information is utilized on-line (while the application runs) to intelligently collect fine-grained data for only a few iterations of an application, considerably reducing the amount of data gathered.
Article
Full-text available
Since the earliest days of compilation, code quality has been recognized as an important problem [18]. A rich literature has developed around the issue of improving code quality. This paper surveys one part of that literature: code transformations intended to improve the running time of programs on uniprocessor machines. This paper emphasizes transformations intended to improve code quality rather than analysis methods. We describe analytical techniques and specific data-flow problems to the extent that they are necessary to understand the transformations. Other papers provide excellent summaries of the various sub-fields of program analysis. The paper is structured around a simple taxonomy that classifies transformations based on how they change the code. The taxonomy is populated with example transformations drawn from the literature. Each transformation is described at a depth that facilitates broad understanding; detailed references are provided for deeper study of individual transformations. The taxonomy provides the reader with a framework for thinking about code-improving transformations. It also serves as an organizing principle for the paper.
Article
This article defines a problem that involves merging nodes into trees while retaining the ability to determine the lowest common ancestor of any two nodes. An O(n log n) algorithm is offered to solve the problem on-line. It is shown how this algorithm provides a fast way of computing the dominator tree of a reducible flow graph.
Article
Full-text available
The development of services-based systems starts from defining business requirements to be implemented as high-level business processes. In this paper, we describe a scenario-driven approach for developing business processes specified as WS-BPEL descriptions. We aim for simplicity in the business level notation and leverage example-like modelling principles in order to enable process sketching. The first step in our approach is to identify the essential functional requirements for business processes. The requirements are modelled as simple scenarios, each of them defining a sample run through the process, i.e., required behaviour that the underlying service-based system should allow. The scenarios, specifying sent and received messages among the services, are synthesised into a state machine. The state machine is transformed into an initial process model given in UML activity model notation. To enable mapping into WS-BPEL code, the transformation exploits domain-specific rules, i.e., our target model consists of a subset of UML with WS-BPEL specific constraints and stereotypes. The initial process model can be further refined to enable generation of executable WS-BPEL descriptions. We apply the approach to two cases: a simple process for managing loan requests, and an industry case study from a logistics provider.
Article
With the increasing importance of Application Domain Specific Processor (ADSP) design, a significant challenge is to identify special-purpose operations for implementation as a customized instruction. While many methodologies have been proposed for this purpose, they all work for a single algorithm chosen from the target application domain. Such algorithm-specific approaches are not suitable for designing instruction sets applicable to a whole family of related algorithms. For an entire range of related algorithms, this paper develops a methodology for identifying compound operations, as a basis for designing “domain-specific” Instruction Set Architectures (ISAs) that can efficiently run most of the algorithms in a given domain. Our methodology combines three different static analysis techniques to identify instruction sequences common to several related algorithms: identification of (non-branching) instruction sequences that occur commonly across the algorithms; identification of instruction sequences nested within iterative constructs that are thus executed frequently; and identification of commonly-occurring instruction sequences that span basic blocks. Choosing different combinations of these results enables us to design domain-specific special operations with different desired characteristics, such as performance or suitability as a library function. To demonstrate our approach, case studies are carried out for a family of thirteen string matching algorithms. Finally, the validity of our static analysis results is confirmed through independent dynamic analysis experiments and performance improvement measurements.
Conference Paper
We consider problems related to dominators and independent spanning trees in flowgraphs and provide linear-time algorithms for their solutions. We introduce the notion of a directed bipolar order, generalizing a previous notion of Plein and Cheriyan and Reif. We show how to construct such an order from information computed by several known algorithms for finding dominators. We show how to concurrently verify the correctness of a dominator tree D and a directed bipolar order O very simply, and how to construct from D and O two spanning trees whose paths are disjoint except for common dominators. Finally, we describe alternative ways to verify dominators without using a directed bipolar order.
Conference Paper
Internet use by older people has increased dramatically during the past 10 years. According to different sources, the number of users over age 65 has more than doubled since 2000. Besides, the inevitable effect of younger users aging will increase the number of older people using the Internet in the next decades. Unfortunately, older people face several challenges when using the web due to diminishing capacities related to aging, such as vision decline, hearing loss, diminished motor skills, and cognitive issues. On the other hand, e-learning can be an opportunity in helping older people become integrated with the rest of society. In this context, Massive Open Online Courses (MOOC) bring great opportunities to enhance the quality of life of older people by enabling lifelong learning and inclusion in learning communities. However, MOOCs can present some barriers that could hamper full participation by elderly students. In order to avoid these barriers, MOOCs have to meet different user needs, skills and situations: MOOCs have to successfully address web accessibility challenges for elderly students. The purpose of this paper is to raise awareness towards a better understanding of the web accessibility challenges that elderly students of MOOCs face.
Conference Paper
The Caravela platform was developed for stream-based computing using Graphics Processing Units (GPUs) as the main processing elements. It provides a new pipeline-processing mechanism called meta-pipeline, which allows processing units in the Caravela platform to be connected and invokes an application in a pipeline manner. The processing units can be locally or remotely located, establishing a distributed processing environment. However, it is hard for a programmer to define the processing pipeline by directly using Caravela runtime functions. Thus, a GUI-based entry tool for meta-pipeline applications is proposed in this paper. This paper presents the design and the implementation of this entry tool. The tool addresses the main difficulties of programming meta-pipeline applications by providing methods for defining pipeline stages and the connections among them, detecting illegal connections, and debugging the pipelined processing using the Caravela runtime environment. Based on this tool, this paper also presents and discusses a case study of a meta-pipeline application.
Conference Paper
Dynamic Symbolic Execution (DSE) is a state-of-the-art test-generation approach that systematically explores program paths to generate high-covering tests. In DSE, the presence of loops (especially unbound loops) can cause an enormous or even infinite number of paths to be explored. There exist techniques (such as bounded iteration, heuristics, and summarization) that assist DSE in addressing loop problems. However, there exists no literature-survey or empirical work that shows the pervasiveness of loop problems or identifies challenges faced by these techniques on real-world open-source applications. To fill this gap, we provide characteristic studies to guide future research on addressing loop problems for DSE. Our proposed study methodology starts with conducting a literature-survey study to investigate how technical problems such as loop problems compromise automated software-engineering tasks such as test generation, and which existing techniques are proposed to deal with such technical problems. Then the study methodology continues with conducting an empirical study of applying the existing techniques on real-world software applications sampled based on the literature-survey results and major open-source project hosting sites. This empirical study investigates the pervasiveness of the technical problems and how well existing techniques can address such problems among real-world software applications. Based on such study methodology, our two-phase characteristic studies identify that bounded iteration and heuristics are effective in addressing loop problems when used properly. Our studies further identify challenges faced by these techniques and provide guidelines for effectively addressing these challenges.
Article
This paper presents an intermediate program representation called the Hierarchical Task Graph (HTG), and argues that it is not only suitable as the basis for program optimization and code generation, but it fully encapsulates program parallelism at all levels of granularity. As such, the HTG can be used as the basis for a variety of restructuring and optimization techniques, and hence as the target for front-end compilers as well as the input to source and code generators. Our implementation and testing of the HTG in the Parafrase-2 compiler has demonstrated its suitability and versatility as a potentially universal intermediate representation. In addition to encapsulating semantic information, data and control dependences, the HTG provides more information vital to efficient code generation and optimizations related to parallel code generation. In particular, we introduce the notion of precedence between nodes of the structure whose grain size can range from atomic operations to entire subprograms.