Robert H. B. Netzer's research while affiliated with Brown University and other places
What is this page?
This page lists the scientific contributions of an author who either does not have a ResearchGate profile or has not yet added these contributions to their profile.
It was automatically created by ResearchGate to create a record of this author's body of work. We create such pages to advance our goal of creating and maintaining the most comprehensive scientific repository possible. In doing so, we process publicly available (personal) data relating to the author as a member of the scientific community.
If you're a ResearchGate member, you can follow this page to keep up with this author's work.
If you are this author, and you don't want us to display this page anymore, please let us know.
Publications (43)
Dynamic data race detection is a critical part of debugging shared-memory parallel programs. The races that can be detected must be refined to filter out false alarms and pinpoint only those that are direct manifestations of bugs. Most race detection methods can report false alarms because of imprecise run-time information and because some races ar...
We address the problem of detecting race conditions in programs that use semaphores for synchronization. Netzer and Miller showed that it is NP-complete to detect race conditions in programs that use many semaphores. We show in this paper that it remains NP-complete even if only two semaphores are used in the parallel programs. For the tractable ca...
The widespread adoption of distributed computing has accentuated the need for an effective set of support tools to facilitate debugging and monitoring of distributed programs. Unfortunately, for distributed programs this is not a trivial task: many distributed programs are inherently nondeterministic. Two runs of the same program with t...
To support incremental replay of message-passing applications, processes must periodically checkpoint and the content of some messages must be logged, to break dependencies of the current state of the execution on past events. This paper shows that known adaptive logging algorithms are likely to introduce deadlocks in replay, and we introduce a new...
A useless checkpoint is a local checkpoint that cannot be part of a consistent global checkpoint. This paper addresses the following problem. Given a set of processes that take (basic) local checkpoints in an independent and unknown way, the problem is to design communication-induced checkpointing protocols that direct processes to take additional...
We address a problem arising in debugging parallel programs, detecting race conditions in programs using semaphores for synchronization. It is NP-complete to detect race conditions in programs that use many semaphores [10]. We show in this paper that it remains NP-complete even if the programs are allowed to use only two semaphores. For the case of...
To support incremental replay of message-passing applications, processes must periodically checkpoint and the content of some messages must be logged, to break dependencies of the current state of the execution on past events. The paper presents a new adaptive logging algorithm that dynamically decides whether to log a message based on dependencies...
A global checkpoint is a set of local checkpoints, one per process. The traditional consistency criterion for global checkpoints states that a global checkpoint is consistent if it does not include messages received and not sent. The paper investigates other consistency criteria, transitlessness and strong consistency. A global checkpoint is trans...
To support incremental replay of message-passing applications, processes must periodically checkpoint and the content of some messages must be logged, to break dependencies of the current state of the execution on past events. The paper presents a new adaptive logging algorithm that dynamically decides whether to log a message based on dependencies...
A useless checkpoint is a local checkpoint that cannot be part of a consistent global checkpoint. The paper addresses the following important problem. Given a set of processes that take (basic) local checkpoints in an independent and unknown way, the problem is to design a communication induced checkpointing protocol that directs processes to take...
Debugging long program runs can be difficult because of the delays required to repeatedly re-run the execution. Even a moderately long run of five minutes can incur aggravating delays. To address this problem, techniques exist that allow re-executing a distributed program from intermediate points by using combinations of checkpointing and message l...
A useless checkpoint is a local checkpoint that cannot be part of a consistent global checkpoint. This paper addresses the following problem. Given a set of processes that take (basic) local checkpoints in an independent and unknown way, the problem is to design communication-induced checkpointing protocols that direct processes to take additional...
Consistent global checkpoints have many uses in distributed computations. A central question in applications that use consistent global checkpoints is to determine whether a consistent global checkpoint that includes a given set of local checkpoints can exist. Netzer and Xu (1995) presented the necessary and sufficient conditions under which such a...
A global checkpoint is a set of local checkpoints, one per process. The traditional consistency criterion for global checkpoints states that a global checkpoint is consistent iff it does not include messages received and not sent. This paper investigates other consistency criteria, transitlessness and strong consistency. A global checkpoint is tr...
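The traditional consistency criterion described in this abstract can be sketched in a few lines. The following is an illustrative check, not the paper's algorithm; all names (event counters per process, message tuples) are assumptions made for the example. A global checkpoint is consistent iff no message is received before the cut but sent after it.

```python
# Hypothetical sketch of the traditional consistency criterion above:
# a global checkpoint (one local checkpoint per process) is consistent iff
# it includes no message "received and not sent" -- i.e., no message whose
# receive event precedes the receiver's checkpoint while its send event
# follows the sender's checkpoint. Event counters are illustrative.

def is_consistent(checkpoint_times, messages):
    """checkpoint_times: {process: local event count at its checkpoint}
    messages: list of (sender, send_event, receiver, recv_event)."""
    for sender, send_ev, receiver, recv_ev in messages:
        received_before_cut = recv_ev <= checkpoint_times[receiver]
        sent_before_cut = send_ev <= checkpoint_times[sender]
        if received_before_cut and not sent_before_cut:
            return False  # message crosses the cut backwards: inconsistent
    return True

cut = {"p0": 3, "p1": 2}
msgs = [("p0", 4, "p1", 1)]  # sent after p0's checkpoint, received before p1's
print(is_consistent(cut, msgs))  # False
```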
Flowback analysis is a powerful technique for debugging programs. It allows the programmer to examine dynamic dependences in a program's execution history without having to re-execute the program. The goal is to present to the programmer a graphical view of the dynamic program dependences. We are building a system, called PPD, that performs flowbac...
A useless checkpoint is a local checkpoint that cannot be part of a consistent global checkpoint. This paper addresses the following important problem. Given a set of processes that take (basic) local checkpoints in an independent and unknown way, the problem is to design a communication-induced checkpointing protocol that directs processes to take...
For shared-memory systems, the most commonly assumed programmer's model of memory is sequential consistency. The weaker models of weak ordering, release consistency with sequentially consistent synchronization operations, data-race-free-0, and data-race-free-1 provide higher performance by guaranteeing sequential consistency to only a restricted cl...
We address a problem arising in debugging parallel programs, detecting race conditions in programs using semaphores for synchronization. It is NP-complete to detect race conditions in programs that use a polynomial number of semaphores [10]. We show in this paper that it remains NP-complete even if the programs are allowed to use only two semaphores,...
We address a problem arising in debugging parallel programs, detecting race conditions in programs using a single semaphore for synchronization. It is NP-complete to detect races in programs that use many semaphores. For the case of a single semaphore, we give an algorithm that takes O(n^1.5 p) time, where p is the number of processors and n is the...
In this paper we address the problem of dynamically locating unwanted nondeterminism (race conditions) in executions of explicitly parallel message-passing programs. We formally define what it means for a race to exist and show conceptually how to dynamically locate races. We also show the importance of accurate race detection as a starting point f...
We present a sender-based message logging protocol for supporting fault tolerance with checkpointing and rollback recovery in distributed systems. Our scheme achieves the benefits of both optimistic and pessimistic message logging. Experimental results show that the maximum rollback induced by our protocol, and the number of messages logged, can be...
As most important applications today are large-scale in nature, high-performance methods are becoming indispensable. Two promising computational paradigms for large-scale applications are dynamic and I/O-efficient computations. We give efficient dynamic data structures for several fundamental problems in computational geometry, including point loca...
The overhead of saving checkpoints to stable storage is the dominant performance cost in checkpointing systems. In this paper, we present a complete study of compressed differences, a new algorithm for fast incremental checkpointing. Compressed differences reduce the overhead of checkpointing by saving only the words that have changed in the curren...
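The incremental-checkpointing idea above (save only the words that changed since the last checkpoint) can be illustrated with a minimal sketch. This is not the paper's compressed-differences algorithm, only the word-level differencing step it builds on; memory is modeled as a list of words and the delta as (index, value) pairs.

```python
# Minimal sketch of word-level incremental checkpointing, in the spirit of
# the abstract above (illustrative, not the paper's exact algorithm):
# only words that differ from the previous checkpoint are saved.

def diff_checkpoint(prev, curr):
    """Return the (index, value) pairs where `curr` differs from `prev`."""
    return [(i, w) for i, (p, w) in enumerate(zip(prev, curr)) if p != w]

def restore(prev, delta):
    """Rebuild the checkpointed state from the previous checkpoint + delta."""
    state = list(prev)
    for i, w in delta:
        state[i] = w
    return state

prev = [0, 0, 7, 0]          # memory at the previous checkpoint
curr = [0, 5, 7, 9]          # memory now: two words changed
delta = diff_checkpoint(prev, curr)
print(delta)                 # [(1, 5), (3, 9)] -- only changed words saved
assert restore(prev, delta) == curr
```

A real implementation would additionally compress the delta before writing it to stable storage, which is where the "compressed" in compressed differences comes from.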
Consistent global snapshots are important in many distributed applications. We prove the exact conditions for an arbitrary checkpoint, or a set of checkpoints, to belong to a consistent global snapshot, a previously open problem. To describe the conditions, we introduce a generalization of Lamport's (1978) happened-before relation called a zigzag p...
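The zigzag-path relation mentioned above generalizes Lamport's happened-before relation. As background, happened-before between two events is commonly tested by comparing vector clocks; the sketch below is illustrative (vector clocks are not the paper's construction, and the names are assumptions).

```python
# Lamport's happened-before relation, tested via vector clocks.
# Event a happened before event b iff a's vector timestamp is
# component-wise <= b's and the two timestamps differ.

def happened_before(vc_a, vc_b):
    """True iff the event stamped vc_a happened before the event stamped vc_b."""
    return all(a <= b for a, b in zip(vc_a, vc_b)) and vc_a != vc_b

print(happened_before([1, 0], [2, 1]))  # True: a precedes b
print(happened_before([1, 2], [2, 1]))  # False: the events are concurrent
```

A zigzag path is strictly weaker: it may pass through messages sent before an earlier message in the chain was received, which is why checkpoints connected only by a zigzag path (but not happened-before) can still be prevented from appearing together in a consistent snapshot.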
A common debugging strategy involves re-executing a program (on a given input) over and over, each time gaining more information about bugs. Such techniques can fail on message-passing parallel programs. Because of nondeterminacy, different runs on the given input may produce different results. This non-repeatability is a serious debugging problem, s...
Debugging long-running, nondeterministic message-passing parallel programs requires incremental replay, the ability to exactly replay selected parts of an execution. To support incremental replay, we must log enough messages and checkpoint processes often enough to allow any requested replay to complete quickly. We present an adaptive tracing strat...
This paper presents an adaptive message logging algorithm that keeps time and space costs low by logging only a fraction of the messages. The algorithm dynamically tracks dependences among messages to determine which cause domino effects and must be traced. The domino effect can force a replay to start arbitrarily far back in the execution, and dom...
Adaptive message logging, which traces dependences between messages and checkpoints and selectively logs messages, letting users accurately and efficiently replay specific portions of parallel programs, is presented. Traces are reduced by logging only messages that cannot be quickly recomputed during replay. By restarting the execution at the right...
Execution replay is a debugging strategy where a program is run over and over on an input that manifests bugs. For explicitly parallel shared-memory programs, execution replay requires support of special tools --- because these programs can be nondeterministic, their executions can differ from run to run on the same input. For such programs, execut...
Execution replay is a crucial part of debugging. Because explicitly parallel shared-memory programs can be nondeterministic, a tool is required that traces executions so they can be replayed for debugging. We present an adaptive tracing strategy that is optimal and records the minimal number of shared-memory references required to exactly replay ex...
In shared-memory parallel programs that use explicit synchronization, race conditions result when accesses to shared memory are not properly synchronized. Race conditions are often considered to be manifestations of bugs since their presence can cause the program to behave unexpectedly. Unfortunately, there has been little agreement in the literatu...
For shared-memory parallel programs that use explicit synchronization, data race detection is an important part of debugging. A data race exists when concurrently executing sections of code access common shared variables. In programs intended to be data race free, they are sources of nondeterminism usually considered bugs. Previous methods for dete...
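The definition of a data race used in this line of work can be sketched directly: two accesses to the same shared variable race if at least one is a write and neither access is ordered before the other. The check below is an illustrative example, not the paper's detector; vector clocks stand in for the execution's ordering information, and all names are assumptions.

```python
# Illustrative data-race check: two accesses race iff they touch the same
# variable, at least one is a write, and neither happened before the other
# (i.e., they are concurrent). Accesses are (variable, kind, vector_clock).

def ordered(vc_a, vc_b):
    """Happened-before via vector clocks: component-wise <= and not equal."""
    return all(a <= b for a, b in zip(vc_a, vc_b)) and vc_a != vc_b

def is_data_race(acc_a, acc_b):
    var_a, kind_a, vc_a = acc_a
    var_b, kind_b, vc_b = acc_b
    return (var_a == var_b
            and 'w' in (kind_a, kind_b)           # at least one write
            and not ordered(vc_a, vc_b)           # concurrent in both
            and not ordered(vc_b, vc_a))          # directions

a = ("x", "w", [1, 0])   # write to x on one process
b = ("x", "r", [0, 1])   # concurrent read of x on another
print(is_data_race(a, b))  # True
```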
Flowback analysis is a powerful technique for debugging programs. It allows the programmer to examine dynamic dependences in a program's execution history without having to re-execute the program. The goal is to present to the programmer a graphical view of the dynamic program dependences. We are building a system, called PPD, that performs flowbac...
Several methods currently exist for detecting data races in an execution of a shared-memory parallel program. Although these methods address an important aspect of parallel program debugging, they do not precisely define the notion of a data race. As a result, it is not possible to precisely state which data races are detected, nor is the meaning o...
This paper presents results on the complexity of computing event orderings for shared-memory parallel program executions. Given a program execution, we formally define the problem of computing orderings that the execution must have exhibited or could have exhibited, and prove that computing such orderings is an intractable problem. We present a form...
A common debugging strategy involves reexecuting a program (on a given input) over and over, each time gaining more information about bugs. Such techniques can fail on message-passing parallel programs. Because of variations in message latencies and process scheduling, different runs on the given input may produce different results. This non-repeat...
Citations
... Since then, detection algorithms have been designed for many different classes of bugs such as: race conditions [11], predicates on single global states [1], predicates based on sequences of global states [5]. Research in replaying trace computations have focussed on reducing the size of the trace by determining which events are necessary for successful replaying [9]. Our approach focusses on adding a control mechanism to the debugging process to allow computations to be run under safety constraints. ...
... They divide their work into four sections, including modeling and design of the systems, data collection, analysis of the collected data, and dynamic performance controlling. Also, a number of bibliographies of parallel debugging tools were presented by Pancake et al. [12] [13] [14]. ...
... Deterministic replay of multithreaded programs has several important uses. First, determinism can help developers effectively debug multithreaded programs using cyclic debugging [23] because the erroneous executions can be repeated. Furthermore, determinism is also necessary in fault detection [30], fault recovery [15], and replay-based intrusion analysis [8]. ...
... In this proof, we have to prove that if v_a > v_b, then it is possible to conclude that m is receivable at a. To prove this, we use a theorem (Theorem 1 in [19]): m is receivable at a ⇔ a → b (the happened-before relation [13]) (*). Because v_a > v_b, there is no message sent from p_a to p_b received before b. ...
... Checkpointing is a well-known technique used to identify consistent global snapshots from local recorded states called checkpoints. Informally, a global snapshot is consistent if the set of checkpoints that compose it (one per process) accomplishes the following two constraints: first, all the local checkpoints in the snapshot are concurrent and no Z-path exists from one local checkpoint to another or itself [13]. This last case is called a Z − Cycle (these patterns are formally defined in Section 2.2). ...
... Starting programs from intermediate states can solve this problem. Such a solution is offered by Incremental Replay techniques [31]. They support to start a parallel program at intermediate points and investigate only a part of one process at a time. ...
... Further, there are associative operations which, while non-deterministic in their execution, are deterministic in their results (e.g. if you add n numbers which are on different processors, the order of the addition is unimportant and you could typically just add the values you receive in messages, without worrying about the order in which the messages were sent). Non-determinacy inherently stems from races and, depending on the kind of race, the behavior is either desirable or not [7]. Hence if a language does not allow the user to express desirable non-determinacy, it is sacrificing some performance. ...
... Task-based programs are susceptible to concurrency errors such as atomicity violations [81] and data races [4,23,63,64,76]. A data race occurs when two accesses, with at least one write, from different tasks are incorrectly synchronized [1]. The presence of data races in shared-memory programs often indicates the presence of other concurrency errors [24], and can affect an execution by crashing, hanging, or corrupting data [26,35]. ...
... This thesis is mainly concerned with the performance side of profiling. Numerous concurrent debugging tools, whose primary purpose is correctness, also exist [PN93]. ...
... (5) Preparing the execution environment and building the master shell script (run.sh), which will handle the whole ASR process. The shell script is executed sequentially, in order to achieve an incremental execution style, which will simplify code debugging (Netzer & Weaver, 1994). Another interesting reason for using this script file is to do parallel execution of some independent steps or some sub-steps of a complex step, such as training on a cluster of machines. ...