Conference Paper

Static Analysis of Low-level Synchronization.

Abstract

We describe a debugger that is being developed for distributed programs in Amoeba. A major goal in our work is to make the debugger independent of the Amoeba kernel. Our design integrates many facilities found in other debuggers, such as execution replay, ...
... Other similar approaches, all of them static, are [8,18]. ...
Article
Tracing is one of the most important techniques for program understanding and debugging. A trace gives the user access to otherwise hidden information about a computation. In the context of concurrent languages, computations are particularly complex due to the non-deterministic execution order of processes and to the restrictions imposed on this order by synchronizations; hence, a tracer is a powerful tool to explore, understand and debug concurrent computations. In CSP, traces are sequences of events that define a particular execution. This notion of trace is completely different from the one used in other paradigms, where traces are formed by the source code expressions evaluated during a particular execution. We refer to this second notion of trace as a track. In this work, we introduce the theoretical basis for tracking concurrent and explicitly synchronized computations in process algebras such as CSP. Tracking computations in this kind of system is a difficult task due to the subtleties of the underlying operational semantics, which combines concurrency, non-determinism and non-termination. We define an instrumented operational semantics that generates as a side effect an appropriate data structure (a track) which can be used to track computations. The formal definition of a tracking semantics not only improves the understanding of the tracking process but also allows us to formally prove the correctness of the computed tracks.
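As a hedged illustration of the trace/track distinction (our own toy example, not one taken from the paper), consider the CSP process

    P  =  (a -> STOP) [|{a}|] (a -> STOP)

where [|{a}|] denotes parallel composition synchronizing on the event a. The only non-empty trace of P is the event sequence <a>, because the two branches can only perform a together. A track of the same computation instead records the source expressions evaluated: both occurrences of the prefix a -> STOP, the two resulting STOP terms, and an arc linking the two a prefixes to record that they fired as a single synchronized event.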
Chapter
As part of the ESPRIT-II project EDS, a toolkit (called Delphi) is under development for debugging Lisp programs with explicit parallelism executed on a homogeneous distributed machine. It assists in the detection of functional and synchronisation errors. It also helps to detect unexpected nondeterminacy and sources of poor program performance. Specific mechanisms allow the user to effectively control several processes in a debugging session. The paper introduces the basic concepts behind the tools and how the user may benefit from them.
Article
A large body of data-flow analyses exists for analyzing and optimizing sequential code. Unfortunately, much of it cannot be directly applied to parallel code, for reasons of correctness. This paper presents a technique to automatically, aggressively, yet safely apply sequentially sound data-flow transformations, without change, on shared-memory programs. The technique is founded on the notion of program references being "siloed" on certain control-flow paths. Intuitively, siloed references are free of interference from other threads within the confines of such paths. Data-flow transformations can, in general, be unblocked on siloed references. The solution has been implemented in a widely used compiler. Results on benchmarks from SPLASH-2 show that performance improvements of up to 41% are possible, with an average improvement of 6% across all the tested programs over all thread counts.
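A minimal sketch (ours, not the paper's implementation) of why sequentially sound transformations are blocked on parallel code, and what siloing buys:

    int done;                        /* set by some other thread */

    void spin_until_done(void) {
        while (!done)                /* one load of `done` per iteration */
            ;                        /* hoisting this load out of the loop
                                        (redundant-load elimination) is legal
                                        sequentially, but wrong if another
                                        thread stores to `done` meanwhile;
                                        if analysis proves `done` is siloed on
                                        this path, the rewrite is safe again */
    }

A purely sequential compiler would rewrite the loop as `int t = done; while (!t) ;`, which never terminates under concurrent writers; a siloing proof re-enables exactly this kind of transformation where interference is impossible.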
Conference Paper
Precise dynamic race detectors report an error if and only if two or more threads concurrently perform conflicting accesses to the same memory location. They insert instrumentation at compile time to perform runtime checks on all memory accesses, ensuring that all races are captured and no spurious warnings are generated. However, a dynamic race check for a particular memory-access statement is guaranteed to be redundant if the statement can be statically identified as free of thread interference. Despite significant recent advances in dynamic detection techniques, redundant checks remain a critical factor behind the prohibitive overhead of dynamic race detection for multithreaded programs. In this paper, we present a new framework that eliminates redundant race checks and speeds up dynamic race detection by performing static optimizations on top of a series of thread-interference analysis phases. Our framework is implemented on top of LLVM 3.5.0 and evaluated with TSAN, an industrial dynamic race detector available as part of the LLVM tool chain. Eleven benchmarks from SPLASH-2 are used to evaluate the effectiveness of our approach in accelerating TSAN by eliminating redundant interference-free checks. The experimental results demonstrate that our approach achieves speedups over the original TSAN of 1.4x to 4.0x (2.4x on average) with 4 threads, and of 1.3x to 4.6x (2.6x on average) with 16 threads.
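A hedged sketch of the category of redundancy such a framework discharges statically (our own illustration, not taken from the paper; whether a given TSAN build already elides some of these accesses may vary):

    #include <pthread.h>
    #include <stdio.h>

    int shared;                                   /* touched by every thread */
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

    void *worker(void *arg) {
        int local = 0;                            /* non-escaping local: every
                                                     access is interference-free,
                                                     so a dynamic race check on
                                                     it is provably redundant  */
        for (int i = 0; i < 1000; i++)
            local++;
        pthread_mutex_lock(&m);
        shared += local;                          /* genuinely shared access:
                                                     the runtime check stays  */
        pthread_mutex_unlock(&m);
        return NULL;
    }

    int main(void) {
        pthread_t t[4];
        for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, worker, NULL);
        for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
        printf("%d\n", shared);                   /* prints 4000 */
        return 0;
    }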
Article
One important issue in parallel program debugging is the efficient detection of access anomalies caused by uncoordinated accesses to shared variables. On-the-fly detection of access anomalies has two advantages over static analysis or post-mortem trace analysis. First, it reports only actual anomalies that occur during execution. Second, it produces shorter traces for post-mortem analysis if an anomaly is detected, since generating further trace information after the detection of an anomaly is of dubious value. Existing methods for on-the-fly access anomaly detection suffer from performance penalties, since the execution of the program being debugged has to be interrupted on every access to a shared variable. In this paper, we propose an efficient cache-based access anomaly detection scheme that piggybacks on the overhead already paid by the underlying cache coherence protocol.
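For concreteness, the classic access anomaly such a detector targets (a sketch of ours, not an example from the paper):

    #include <pthread.h>
    #include <stdio.h>

    int counter;                         /* shared, no synchronization */

    void *inc(void *arg) {
        for (int i = 0; i < 100000; i++)
            counter++;                   /* unsynchronized read-modify-write:
                                            two threads racing here is exactly
                                            the anomaly to be detected        */
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, inc, NULL);
        pthread_create(&t2, NULL, inc, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("%d\n", counter);         /* typically < 200000: updates lost */
        return 0;
    }

An on-the-fly detector flags the first pair of conflicting accesses to counter as they happen, rather than reporting them after the run from a full trace.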
Article
Understanding synchronization is important for a parallel programming tool that uses dependence analysis as the basis for advising programmers on the correctness of parallel constructs. This paper discusses static analysis methods that can be applied to parallel programs with event-variable synchronization. The objective is to predict potential data races in a parallel program. The focus is on how dependences and synchronization statements inside loops can be used to analyze complete programs with parallel-loop and parallel-case style parallelism.
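The shape of the problem, as a hedged sketch: event_post/event_wait below are our minimal stand-ins (built on a condition variable) for post/wait-style event variables, not the paper's notation. The analysis question is whether the events order every cross-iteration dependence; delete the event_wait and a race on a[i-1] should be predicted.

    #include <pthread.h>
    #include <stdio.h>

    #define N 8

    typedef struct {                 /* minimal one-shot event variable */
        pthread_mutex_t m;
        pthread_cond_t  c;
        int posted;
    } event_t;

    static event_t ev[N];
    static double  a[N];
    static int     idx[N];

    static void event_post(event_t *e) {
        pthread_mutex_lock(&e->m);
        e->posted = 1;
        pthread_cond_broadcast(&e->c);
        pthread_mutex_unlock(&e->m);
    }

    static void event_wait(event_t *e) {
        pthread_mutex_lock(&e->m);
        while (!e->posted)
            pthread_cond_wait(&e->c, &e->m);
        pthread_mutex_unlock(&e->m);
    }

    /* body of iteration i of a parallel loop: a[i] reads a[i-1], so
       iteration i must wait for iteration i-1's post                 */
    static void *body(void *arg) {
        int i = *(int *)arg;
        if (i > 0) event_wait(&ev[i - 1]);
        a[i] = (i > 0 ? a[i - 1] : 0.0) + 1.0;
        event_post(&ev[i]);
        return NULL;
    }

    int main(void) {
        pthread_t t[N];
        for (int i = 0; i < N; i++) {
            pthread_mutex_init(&ev[i].m, NULL);
            pthread_cond_init(&ev[i].c, NULL);
            idx[i] = i;
        }
        for (int i = 0; i < N; i++) pthread_create(&t[i], NULL, body, &idx[i]);
        for (int i = 0; i < N; i++) pthread_join(t[i], NULL);
        printf("a[%d] = %g\n", N - 1, a[N - 1]);   /* prints a[7] = 8 */
        return 0;
    }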
Article
We propose a modest collection of primitives for synchronization and control in parallel numerical algorithms. These are phrased in a syntax that is compatible with FORTRAN, creating a publication language for parallel software. A preprocessor may be used to map code written in this extended FORTRAN into standard FORTRAN with calls to the run-time libraries of the various parallel systems now in use. We solicit the reader's comments on the clarity, as well as the adequacy, of the primitives we have proposed.
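We do not know the paper's actual primitive set; as a hedged sketch, this shows the kind of run-time-library synchronization such a preprocessor could target, written with POSIX barriers in C rather than the paper's extended FORTRAN:

    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4

    /* the run-time-library side of a hypothetical BARRIER primitive: a
       preprocessor would rewrite the publication-language construct into
       a call like pthread_barrier_wait in each generated worker          */
    static pthread_barrier_t bar;

    static void *phase_worker(void *arg) {
        long id = (long)arg;
        printf("thread %ld: phase 1\n", id);
        pthread_barrier_wait(&bar);      /* no thread starts phase 2 early */
        printf("thread %ld: phase 2\n", id);
        return NULL;
    }

    int main(void) {
        pthread_t t[NTHREADS];
        pthread_barrier_init(&bar, NULL, NTHREADS);
        for (long i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, phase_worker, (void *)i);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(t[i], NULL);
        pthread_barrier_destroy(&bar);
        return 0;
    }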
Article
This paper discusses the class of algorithms having global parallelism, i.e. those in which parallelism is introduced at the top of the program structure hierarchy. Such algorithms have performance advantages in a shared-memory, MIMD computational model. A programming environment consisting of FORTRAN, enhanced by some pre-processed macros, has been built to aid in writing programs for such algorithms for the Denelcor HEP multiprocessor. Applications of tens to hundreds of FORTRAN statements have been written and tested in this environment. A few parallelism constructs suffice to yield understandable programs with a high degree of parallelism. The automatic generation of programs with global parallelism seems to be a promising possibility.
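A hedged sketch of the global-parallelism shape (workers created once at the top of the program, each running the whole algorithm on its strip), written here with C and pthreads rather than the HEP FORTRAN macros:

    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4
    #define N        1000000

    static double x[N];
    static double partial[NTHREADS];

    /* parallelism is introduced once, at the top of the program,
       not inside individual inner loops                            */
    static void *whole_algorithm(void *arg) {
        long id = (long)arg;
        double s = 0.0;
        for (long i = id; i < N; i += NTHREADS)   /* interleaved strip */
            s += x[i];
        partial[id] = s;
        return NULL;
    }

    int main(void) {
        for (long i = 0; i < N; i++) x[i] = 1.0;
        pthread_t t[NTHREADS];
        for (long i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, whole_algorithm, (void *)i);
        double sum = 0.0;
        for (int i = 0; i < NTHREADS; i++) {
            pthread_join(t[i], NULL);
            sum += partial[i];
        }
        printf("sum = %g\n", sum);                /* prints 1e+06 */
        return 0;
    }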
Conference Paper
The first vector supercomputers appeared on the market in the early to mid seventies. Yet, because of the lag in developing supporting software, it is only recently that vectorizing compilers powerful enough to effectively utilize vector hardware have been developed. Since parallel programming is a much more complex task than vectorization, we expect the challenge of producing adequate programming support to be much greater. In the ParaScope project, we will be exploring the leverage to be gained through an integrated collection of tools in which each tool depends on the others for important information. For example, the editor will depend on the interprocedural analyzer, which itself depends on the results of editing other modules. The debugger uses dependence information to assist in the location of potential problems. The user interface permits abstract displays of the data-flow information within a program. We believe that it is essential to have this sort of cooperation to provide adequate support for programming on the evolving class of highly parallel machines.