Detecting Data Races in Parallel Program Executions
Robert H. B. Netzer
netzer@cs.wisc.edu
Barton P. Miller
bart@cs.wisc.edu
Computer Sciences Department
University of Wisconsin-Madison
1210 W. Dayton Street
Madison, Wisconsin 53706
Research supported in part by National Science Foundation grant CCR-8815928, Office of Naval Research grant N00014-89-J-1222, and a
Digital Equipment Corporation External Research Grant.
Copyright 1989,1990 Robert H. B. Netzer, Barton P. Miller.
To appear in Languages and Compilers for Parallel Computing, D. Gelernter, T. Gross, A. Nicolau, and D. Padua, eds., MIT Press, 1991;
also appears in Proceedings of the 3rd Workshop on Programming Languages and Compilers for Parallel Computing, Irvine, CA, August 1990.
Abstract
Several methods currently exist for detecting data races in an execution of a
shared-memory parallel program. Although these methods address an important
aspect of parallel program debugging, they do not precisely define the notion of
a data race. As a result, it is not possible to precisely state which data races are
detected, nor is the meaning of the reported data races always clear. Furthermore,
these methods can sometimes generate false data race reports. They can
determine whether a data race was exhibited during an execution, but when
more than one data race is reported, only limited indication is given as to which
ones are real. This paper addresses these two issues. We first present a model
for reasoning about data races, and then present a two-phase approach to data
race detection that attempts to validate the accuracy of each detected data race.
Our model of data races distinguishes among those data races that actually
occurred during an execution (actual data races), those that could have occurred
because of timing variations (feasible data races), and those that appeared to
have occurred (apparent data races). The first phase of our two-phase approach
to data race detection is similar to previous methods and detects a set of data
race candidates (the apparent data races). We prove that this set always contains
all actual data races, although it may contain other data races, both feasible and
infeasible. Unlike previous methods, we then employ a second phase which
validates the apparent data races by attempting to determine which ones are
feasible. This second phase requires no more information than previous
methods collect, and involves making a conservative estimate of the data depen-
dences among the shared data to determine how these dependences may have
constrained alternate orderings potentially exhibited by the execution. Each
apparent data race can then be characterized as either being feasible, or as
belonging to a set of apparent data races where at least one is feasible.
1. Introduction
In shared-memory parallel programs, if accesses to shared data are not properly coordinated, a
data race can result, causing the program to behave in a way not intended by the programmer.
Detecting data races in a particular execution of a parallel program is an important part of
debugging. Several methods for data race detection have been developed[1,3,4,9,15,17].
Although these methods provide valuable tools for debugging, they do not precisely define the
notion of a data race. As a result, we cannot precisely state which data races are detected by
these methods. In addition, false data race reports can sometimes be generated. These
methods can determine whether or not a data race occurred, but when more than one data race
is reported, no indication is given as to which ones are real. Failure to characterize the detected
data races, and the generation of false data race reports, can make it difficult to use these
methods for debugging the program and locating the cause of the data races. This paper
addresses these two issues. We first present a formal model in which to reason about data
races, and then outline a two-phase approach to data race detection that is discussed entirely in
terms of the model. The first phase performs essentially the same type of analysis as previous
methods, and detects a set of candidate data races (which we call apparent data races). Unlike
previous methods, we then employ a second phase that validates each of the apparent data
races to determine which ones are real. By providing a model for reasoning about data races,
the correctness of our techniques can be convincingly argued, and the meaning of the data race
reports (generated by our methods or others) is made explicit. By validating the apparent data
races, the programmer can be provided with information crucial to debugging.
One purpose of adding explicit synchronization to shared-memory parallel programs is to
coordinate accesses to shared data. Some programs are intended to behave deterministically,
and for these programs synchronization is usually designed to force all shared-data accesses to
the same location to occur in a specific order (for some given program input). When the order
of two shared-memory accesses made by different processes (to the same location) is not
enforced, a race condition is said to exist[5,6], possibly resulting in a nondeterministic execu-
tion. In contrast, other programs are not intended to be deterministic, and for these programs
synchronization is usually added to ensure that some sections of code execute as if they were
atomic (i.e., to implement critical sections). For example, consider a section of code that adds
up a list of shared data representing the deposits to a bank account during a certain month. If
this section of code does not execute as if it were atomic (because, for example, another section
of concurrently executing code is debiting the account), the computed deposit total might not
be correct. A section of code is guaranteed to execute atomically if the shared variables it
reads and modifies are not modified by any other concurrently executing section of code[2]. If
these conditions are not met, a data race is said to exist. Since nondeterministic behavior can
result, a data race is a special case of the more general race condition. In this paper we focus
on data race detection.
To provide a mechanism for reasoning about data race detection, we present a model for
representing executions of shared-memory parallel programs, on sequentially consistent pro-
cessors, that use fork/join and counting semaphores. Our model distinguishes between the ord-
ering of events that actually occurred during execution and the ordering that could have
occurred. Given an actual execution of the program, we characterize alternate event orderings
that the execution could have exhibited. Possible orderings include those that could still allow
the original data dependences among the shared data to occur and that do not violate the
semantics of the explicit synchronization primitives used by the program. An execution exhi-
biting such an alternate ordering is called a feasible program execution. The characterization
of feasible program executions in general requires knowledge of which shared-data depen-
dences (if any) were exhibited between any two events performed by the execution. Since
recording this information is not practical in general, we characterize approximate information
in terms of our model. We show how the information recorded by previous methods can be
used to define an approximate program execution. We then distinguish between three types of
data races. Actual data races are those actually exhibited during an execution, feasible data
races are those that could have occurred because of nondeterministic timing variations, and
apparent data races are those that appeared to have occurred from analyzing the approximate
information. Previous methods detect apparent data races. We show that apparent data races
are not always actual or feasible, and show how a two-phase approach can be used to detect
and then validate the apparent data races. The first phase is essentially identical to previous
methods and simply detects the apparent data races. We prove that each actual data race is also
apparent. The approach employed by previous methods is therefore safe in the sense that no
actual data races are left undetected. We also employ a second phase that classifies each
apparent data race as either feasible or as belonging to a set of data races that contains at least
one feasible data race. Performing such a validation provides the programmer with some infor-
mation as to which of the apparent data races should be investigated for debugging.
2. Previous Data Race Detection Methods
All previous methods for dynamic data race detection operate by first instrumenting the pro-
gram so that information about its execution is recorded, and then executing the program and
analyzing the collected information. These methods all analyze essentially the same informa-
tion about the execution, but differ mainly in how and when that information is collected and
analyzed. Two approaches to this information collection and analysis have been proposed:
on-the-fly and post-mortem. On-the-fly techniques[4,9,17] detect data races by an on-going
analysis during execution that encodes information about the execution so it can be accessed
quickly and discarded as it becomes obsolete. Post-mortem techniques[1,3,15] detect data
races after execution ends by analyzing trace files that are produced during execution.
Although previous methods never fail to detect a data race actually exhibited during execution
(we prove this claim in Section 7), they do not precisely locate where these data races
occurred. We briefly describe the common characteristics of these methods below.
Previous methods instrument the program to collect the same information: which sections
of code executed, the set of shared variables read and written by each section of code, and the
relative execution order between some synchronization operations. To represent this relative
ordering, a DAG is constructed (either explicitly or in an encoded form), which we call the
ordering graph, in which each node represents an execution instance of either a synchroniza-
tion operation (a synchronization event) or the code executed between two synchronization
operations (a computation event). Edges are added from each event to the next event belong-
ing to the same process, and between some pairs of synchronization events (belonging to dif-
ferent processes) to indicate the order in which the synchronization events executed. The vari-
ous methods differ in the types of synchronization handled, but they all handle fork/join (in
one form or another). An edge is added from each fork event to the first event in each child
created by the fork, and from the last event in each child to the corresponding join event.
The crux of data race detection is the location of events that accessed a common shared
variable (that at least one wrote) and that either did or could have executed concurrently. Find-
ing events that accessed a common shared variable is straightforward, since the sets of shared
variables read and written by each event are recorded. To determine if two events could have
executed concurrently, all previous methods analyze the ordering graph. Two computation
events are assumed to have potentially executed concurrently if no path in the graph connects
the two events. Data races are therefore reported between pairs of events that accessed a com-
mon shared variable and that have no connecting path. However, this assumption is not always
true, and causes previous methods to generate potentially many false data race reports.
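To make the detection step concrete, the following sketch (ours, not from the paper) encodes a tiny ordering graph in Python; the event names, READ/WRITE sets, and edges are hypothetical stand-ins for the instrumentation data described above:

```python
from itertools import combinations

# Hypothetical events: name -> (READ set, WRITE set); "Q" is the shared
# queue and "A" the shared array, both stand-ins for illustration.
events = {
    "fork":  (set(), set()),
    "left":  ({"Q", "A"}, {"Q", "A"}),
    "right": ({"Q", "A"}, {"Q", "A"}),
    "join":  (set(), set()),
}
# Ordering-graph edges: fork precedes each child; each child precedes join.
edges = {"fork": {"left", "right"}, "left": {"join"},
         "right": {"join"}, "join": set()}

def path_exists(src, dst):
    """Depth-first search for a path from src to dst in the ordering graph."""
    stack, seen = [src], set()
    while stack:
        n = stack.pop()
        if n == dst:
            return True
        if n not in seen:
            seen.add(n)
            stack.extend(edges[n])
    return False

def conflict(a, b):
    """A data conflict: one event writes a variable the other reads or writes."""
    (ra, wa), (rb, wb) = events[a], events[b]
    return bool(wa & (rb | wb)) or bool(wb & (ra | wa))

def apparent_races():
    """Report conflicting event pairs with no connecting path."""
    return [(a, b) for a, b in combinations(events, 2)
            if conflict(a, b) and not path_exists(a, b) and not path_exists(b, a)]
```

On this graph the only pair reported is ("left", "right"): the two children conflict on the shared data and no path orders them, which is exactly the situation in which previous methods report a (possibly false) data race.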
To illustrate these false reports, consider the program fragment in Figure 1. This program
creates two identical children that remove from a shared queue the lower and upper bounds of a
region of a shared array to operate upon, perform some computation on that region of the
array, and loop until the queue is empty. The queue initially contains records representing dis-
joint regions of the array. A correct execution of this program should therefore exhibit no data
races, since only disjoint regions of the shared array should be accessed concurrently. How-
ever, assume that the ‘‘remove’’ operations do not properly synchronize their accesses to the
shared queue. An ordering graph for one possible execution of this program is also shown (the
dotted lines only illustrate the data races and are not part of the graph). In this execution, the
first ‘‘remove’’ operation performed by the left child completed before the first ‘‘remove’’ per-
formed by the right child began (the nodes are staggered horizontally to indicate this order).
The first two records were therefore correctly removed, and both children operated (correctly)
on disjoint regions of the array. However, during the next iteration, the ‘‘remove’’ operations
actually overlapped, and the right child correctly removed the fourth record, but the left child
removed the upper bound (100) from the third record and the lower bound (300) from the
fourth record. The left child therefore operated (erroneously) on region [100,299] of the array.
In this graph, no paths connect any nodes of the left child with any nodes of the right
child. Since both children accessed the same queue, previous methods would report four data
races between the ‘‘remove’’ operations (shown by the finely dotted lines). Similarly, since
both children accessed a common region of the array, a data race would also be reported
between these array accesses (shown by the coarsely dotted line). The latter data race report
can be misleading, however, since the accesses by the left child to region [100,299] did not,
and could never, execute concurrently with the accesses to region [200,299] made by the right
child. For these accesses to execute concurrently, the second ‘‘remove’’ operation performed
Some methods do not actually construct a node to represent a computation event but rather represent the
event by an edge connecting the two surrounding synchronization events[4,15].
Figure 1. Example program fragment and ordering graph
(the dotted lines only illustrate the reported data races)
[Figure omitted from this text version. The fragment forks two identical children; each loops while QueueNotEmpty, removes bounds (L,U) from the shared queue, and works on region [L,U-1] of the shared array before a final join. The queue initially holds [1,100], [100,200], [200,300], and [300,400]. In the depicted execution the removes return [100,200], [200,300], [100,300], and [300,400], and the children work on regions [100,199], [200,299], [100,299], and [300,399]; finely dotted lines mark the real data races between the ‘‘remove’’ operations, and a coarsely dotted line marks the false data race between the array accesses.]
by the left child would have to execute before the second ‘‘remove’’ operation performed by
the right child (with which it originally overlapped). If this were to happen, the erroneous
record [100,300] would not be removed by the left child, since it would no longer overlap with
the other ‘‘remove’’ operation, and a different region of the array would be accessed.
If the array accesses were more complex, perhaps creating other children, there might have
been many nodes in the graph representing these accesses. In such a case, many false data race
reports would be generated, instead of only one. In this example, the data races are caused by
lack of synchronization in the ‘‘remove’’ operations. The fact that non-disjoint regions of the
array were accessed is an artifact of this missing synchronization, and does not represent a bug
in the program. Reporting many false data races to the programmer, only one of which
involves events that did (or could) execute concurrently, complicates the job of debugging.
False data race reports can result whenever shared variables are used (either directly or
transitively) in conditional expressions or in expressions determining which shared locations
are accessed (e.g., shared array subscripts). Accurate data race detection involves examining
how shared data flowed through the execution and whether the execution might have changed
had a different ordering occurred. This paper presents results showing how to validate the
accuracy of each data race without recording additional information about the execution.
3. Program Execution Model
Before discussing data race detection, we first present a formal model to provide a mechanism
for reasoning about shared-memory parallel program executions. The model contains the
objects that represent a program execution (such as which statements were executed and in
what order), and axioms that characterize properties those objects must possess. This model is
useful as a notational device for describing behavior the execution actually exhibited. We later
show how it can also be used to speculate on behavior that the execution could have exhibited
(such as alternate event orderings) due to nondeterministic timing variations. Our model
describes programs that use counting semaphores and the fork/join construct.
3.1. General Model
Our model is based on Lamport’s theory of concurrent systems[13], which provides a formal-
ism for reasoning about concurrent systems that does not assume the existence of atomic opera-
tions. In Lamport’s formalism, a concurrent system execution is modeled as a collection of
operation executions. Two relations on operation executions, precedes (→) and can causally
affect (⇢), describe a system execution: a → b means that a completes before b begins (in
the sense that the last action of a precedes the first action of b), and a ⇢ b means that some
action of a precedes some action of b (so that a can causally affect b). We use Lamport’s theory, but restrict it to the class of
shared-memory parallel programs that execute on sequentially consistent processors[11].
When the underlying hardware guarantees sequential consistency, any two events that execute
concurrently can affect one another (i.e., a ↮ b ⟹ a ⇢ b ∧ b ⇢ a).† Given sequential
consistency, a single relation is sufficient to describe the temporal aspects of a system execution.
For this purpose we introduce →T, the temporal ordering relation among events: a →T b
means that a completes before b begins, and a ↮T b means that a and b execute concurrently
(i.e., neither completes before the other begins). We should emphasize that we are
defining the temporal ordering relation so it describes the order in which events actually executed
during a particular execution; e.g., a ↮T b means that a and b actually executed concurrently,
and does not mean that a and b could have executed in any order. In Section 5 we
show how to speculate on alternate temporal orderings that could have been exhibited.
In addition, we replace the ⇢ relation with the transitive shared-data dependence relation
(or just shared-data dependence relation for brevity), →D. This relation shows when one
event can causally affect another either because of a direct data dependence involving a single
shared variable, or because of a chain of direct dependences involving several different variables.
A direct shared-data dependence from a to b (denoted a →DD b) exists if a accesses a
shared variable that b later accesses (where at least one access modifies the variable); we also
say that a direct dependence exists if a precedes b in the same process, since data can in general
flow through non-shared variables which are local to the process. A transitive shared-data
dependence (a →D b) exists if there is a chain of direct dependences from a to b (possibly involving
events that access different shared variables); →D = (→DD)+, the irreflexive transitive
closure of the direct shared-data dependence relation‡. This definition of data dependence
is different from the standard ones[10] since we consider transitive dependences involving
flow-, anti-, and output-dependences, and do not explicitly state the variables involved.

† In Lamport’s terminology, we are considering the class of system executions that have global-time models.
Throughout this paper, we use superscripted arrows to denote relations, and write a ↛ b as a shorthand for ¬(a → b), and a ↮ b as a shorthand for ¬(a → b) ∧ ¬(b → a).
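A minimal sketch of this closure operation (our own helper, with events reduced to opaque labels) might look like:

```python
def transitive_closure(direct):
    """Irreflexive transitive closure of a set of (a, b) dependence pairs."""
    closure = set(direct)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                # Chain a ->DD b and b ->DD d into a ->D d, skipping a -> a.
                if b == c and a != d and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure
```

For example, direct dependences a →DD b and b →DD c yield the transitive dependence a →D c.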
We define a program execution, P, to be a triple ⟨E, →T, →D⟩, where E is a finite set
of events, and →T and →D are the relations over E described above. We refer to a given
program execution, P, as an actual program execution when P represents an execution that the
program at hand actually performed. Each event e ∈ E represents the execution of a set of program
statements, and possesses two attributes, READ(e) and WRITE(e), the sets of shared variables
read and written by those statements. A data conflict is said to exist between two events
if one event writes a shared variable that the other reads or writes. The temporal ordering and
shared-data dependence relations must satisfy the following axioms:
A1. →T is an irreflexive partial order.
A2. If a →T b ↮T c →T d then a →T d.
A3. If a →D b then b ↛T a.
No generality is lost by modeling each event, e, as having a unique start time (e_s) and
finish time (e_f)[12]. A total ordering on the start and finish times is called a global-time model.
Given a global-time model, the →T relation is defined as follows: a →T b iff a_f < b_s, and
a ↮T b iff a_s < b_f ∧ b_s < a_f. Axioms A1-A3 can also be written in terms of the start and
finish times. We will occasionally employ such a view in proofs of our results.
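Under a global-time model these definitions are easy to check mechanically. The following sketch (our own illustration, with arbitrary interval values) encodes events as (start, finish) pairs and tests axiom A2:

```python
# Events modeled as (start, finish) intervals in a global-time model.
def precedes(a, b):
    """a ->T b iff a finishes before b starts."""
    return a[1] < b[0]

def concurrent(a, b):
    """a and b overlap: neither completes before the other begins."""
    return a[0] < b[1] and b[0] < a[1]

# Axiom A2: if a ->T b, b and c overlap, and c ->T d, then a ->T d.
a, b, c, d = (0, 1), (2, 5), (4, 7), (8, 9)
assert precedes(a, b) and concurrent(b, c) and precedes(c, d)
assert precedes(a, d)
```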
3.2. Model Applied to Semaphores and Fork/Join
So far, the model does not describe any of the synchronization aspects of a program execution.
By imposing some structure on the set of events, E, and by adding axioms that describe the se-
mantics of synchronization operations, we extend the general model to describe programs that
use counting semaphores and the fork/join construct. Other types of synchronization can be
similarly accommodated.
We assume that a program execution consists of a number of processes, each of which ei-
ther exists when the program execution begins or is created during execution by a fork opera-
tion. Similarly, a process either continues to exist until the program execution ends or until the
process (and all others created by the same fork operation) is terminated by a join operation.
The set of events belonging to process p is denoted by E_p, and therefore E = ∪_p E_p, for all
‡ The transitive shared-data dependence relation is conservative in the sense that when data flows from a to b
it always shows a dependence from a to b, but also sometimes shows a dependence when in fact no data flow occurs.
A more precise characterization of causality would require examining the semantics of the individual actions performed by each event.
processes p that exist during the program execution. Each process is viewed as containing a totally
ordered sequence of events, and the term e_{p,i} will denote the i-th event in the execution of
process p. The following axiom describes the total ordering imposed on events belonging to
the same process:

A4. e_{p,i} →T e_{p,i+1} for all processes p and 1 ≤ i < |E_p|
To describe the presence of synchronization operations, we distinguish between different
types of events. A synchronization event is an instance of a semaphore operation (a P event or
a V event), a fork operation (a fork event), or a join operation (a join event). The sets of all P
and V operations on semaphore i are denoted by E_P(i) and E_V(i), respectively. A computation
event is an instance of a group of statements, belonging to the same process, that executed consecutively,
none of which are synchronization operations. Any arbitrary grouping of (consecutively
executed) statement instances that does not include a synchronization operation defines a
computation event.
To describe the semantics of synchronization operations, we add additional axioms to our
model. A fork event, Fork_{p,i}, is assumed to precede all events in the child processes which it
creates, and all events in these child processes are assumed to precede the subsequent join
event in process p, Join_{p,i+k}:

A5. For all child processes, c, created by each Fork_{p,i} and terminated at Join_{p,i+k},
    Fork_{p,i} →T e_{c,j} →T Join_{p,i+k}, 1 ≤ j ≤ |E_c|
We assume that in any program execution the semaphore invariant[7] is always maintained.
For counting semaphores, the semaphore invariant is maintained iff at each point in the execution,
the number of V operations that have either completed or have begun executing is greater
than or equal to the number of P operations that have completed. For each semaphore, S, this
invariant can be expressed by the following axiom:

A6. For every subset of P events, P ⊆ E_P(S),
    |{ v | v ∈ E_V(S) ∧ ∃p ∈ P (v →T p ∨ v ↮T p) }| ≥ |P|.

The above version of axiom A6 assumes that the initial value of each semaphore is zero. An
arbitrary initial value, m, for some semaphore could be described by creating an artificial process
that contains m V events that precede all other events.
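Axiom A6 can be checked directly on a global-time model. The sketch below (our illustration; the intervals are arbitrary) tests the invariant by enumerating subsets of completed P events:

```python
from itertools import combinations

def precedes(a, b):          # a ->T b iff a finishes before b starts
    return a[1] < b[0]

def concurrent(a, b):        # a and b overlap in time
    return a[0] < b[1] and b[0] < a[1]

def semaphore_invariant(p_events, v_events):
    """A6: every subset P of P events needs at least |P| V events that
    precede or overlap some member of P (initial semaphore value zero)."""
    for k in range(1, len(p_events) + 1):
        for subset in combinations(p_events, k):
            covering = [v for v in v_events
                        if any(precedes(v, p) or concurrent(v, p) for p in subset)]
            if len(covering) < len(subset):
                return False
    return True
```

With one V before each P the invariant holds; with two completed P operations but only one V it is violated.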
3.3. Higher-Level Views
It is useful to be able to view a program execution at different levels of abstraction, since information
about the execution may be collected at that level, and because sometimes it is useful to
abstract irrelevant details of part of an execution into a higher-level event. We can reason
about a program execution at any level of abstraction by following Lamport and defining a
higher-level view. A higher-level view of a program execution P = ⟨E, →T, →D⟩ is
P′ = ⟨E′, →T′, →D′⟩, where

(H1) E′ partitions E, and ∀e′ ∈ E′,
     READ(e′) = ∪_{e ∈ e′} READ(e), and WRITE(e′) = ∪_{e ∈ e′} WRITE(e),
(H2) A →T′ B ⟺ ∀a ∈ A, ∀b ∈ B (a →T b), and
(H3) A →D′ B ⟺ ∃a ∈ A, ∃b ∈ B (a →D b).

A higher-level view always obeys axioms A1-A3. Since axioms A4-A6 are defined in terms of
synchronization and computation events, they are also obeyed if each higher-level event consists
of either a single synchronization event from E, or only a set of computation events from
E. In such a case, each event e′ ∈ E′ inherits its type from the type of the events comprising e′.
When the higher-level events are defined to partition E in this way, P′ obeys axioms A1-A6 and
is then itself a program execution.
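Condition (H1) amounts to taking unions over each partition block. A small sketch (the helper name and event labels are our own, hypothetical):

```python
def lift(partition, read, write):
    """partition: higher-level event -> set of constituent events;
    read/write: per-event READ and WRITE sets.  Returns the READ and
    WRITE attributes of each higher-level event per condition (H1)."""
    READ = {h: set().union(*(read[e] for e in lo))
            for h, lo in partition.items()}
    WRITE = {h: set().union(*(write[e] for e in lo))
             for h, lo in partition.items()}
    return READ, WRITE
```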
4. Representing Actual Program Executions
The model we have presented so far captures complete information about a program execution
in the sense that →T shows the relative ordering in which any two events actually executed,
and →D shows the actual shared-data dependences between any two events. In practice,
recording such complete information is not practical, and we now discuss how to represent partial
information about a program execution in our model. Our intent is not to discuss details of
program instrumentation, but rather to outline one type of information that is sufficient for data
race detection, and show how this information is represented in our model. We will define approximate
counterparts to the →T and →D relations capturing partial information about a
program execution that is based on the type of information previous methods record. This information
can be recorded without tracing every shared-memory access and without introducing
a central bottleneck into the program. The resulting approximate relations, →T̂ and →D̂,
define an approximate program execution, P̂ = ⟨E, →T̂, →D̂⟩. In Section 7 we show how P̂
can be used for data race validation.
4.1. Approximate Temporal Ordering
As described in Section 2, previous data race detection methods record the temporal ordering
among only some synchronization events. For example, the order among fork and join operations
and their child processes is recorded, but the relative order of operations belonging to the
different child processes is not. Recording incomplete ordering information is desirable because
the required instrumentation can be embedded into the implementation of the synchronization
operations without introducing additional synchronization. Not introducing additional
synchronization ensures that the instrumentation will not create a central bottleneck which
could reduce the amount of parallelism achievable by the program. We assume that the program
is instrumented to record such incomplete ordering information, and that the ordering is
represented by constructing a graph, called the temporal ordering graph, similar to the ordering
graphs used by previous methods. For every event, e, this graph contains two nodes§, e_s and e_f
(corresponding to the start and finish of e), and an edge from e_s to e_f. The graph defines an
approximate temporal ordering relation, →T̂, as follows: a →T̂ b iff there is a path from a_f to
b_s, b →T̂ a iff there is a path from b_f to a_s, and a ↮T̂ b otherwise. We assume the program
instrumentation constructs a temporal ordering graph that gives →T̂ the following properties:

(1) If a →T̂ b then a →T b, and
(2) If a ↮T̂ b then the explicit synchronization performed by the execution did not
    prevent a and b from executing concurrently.

The first property states that the ordering of events given by →T̂ must be consistent with the
order in which they actually executed (i.e., →T̂ ⊆ →T). The second property means that any
linear ordering of the graph is a global-time model defining a temporal ordering that obeys the
synchronization axioms (i.e., axioms A4-A6). Recall that →T was defined to represent the actual
order in which events executed; a ↮T b means that a and b actually overlapped during
execution. Since →T̂ is an approximation of →T (it is a subset), a ↮T̂ b does not (necessarily)
mean that a and b actually overlapped. Instead, it means that a and b were not forced to
occur in a certain order by explicit synchronization (the graph does not contain enough information
to determine their actual execution order). As illustrated in Figure 1, such events may
nonetheless be ordered. The goal of data race validation (discussed in Section 7) is determining
which events that are not ordered by →T̂ could have indeed executed concurrently.

§ In practice, it is not always necessary to actually construct two nodes per event, but we use such a
representation here since it conceptually follows our model.
For programs using fork/join, the ordering graphs constructed by previous methods satis-
fy the above two properties. To accommodate semaphore synchronization, edges can be added
to the temporal ordering graph by recording the order of all operations on a given semaphore.
Such an ordering can be recorded without introducing additional synchronization into the pro-
gram (as mentioned above for fork/join operations). To reflect this ordering in the graph, an
edge can be drawn from each semaphore operation (on a given semaphore) to the next operation on the same semaphore. Since →T̂ only needs to obey the synchronization axioms, other approaches for adding edges (that result in more events being unordered by →T̂, allowing more data races to be detected) are possible. For example, previous methods that handle semaphores [3, 4, 15] construct edges only from a V operation to the P operation it allowed to proceed. More sophisticated approaches have also been investigated [8].
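To make this construction concrete, the edge-drawing rule just described can be sketched as follows. The trace format (an event id, its process, and the semaphore it operates on, if any) is a hypothetical one chosen only for illustration, not the instrumentation format of any particular method; each event e is modeled by a start node ("s", e) and a finish node ("f", e), as in the two-node-per-event model used throughout.

```python
from collections import defaultdict

def build_ordering_graph(trace):
    """trace: list of (event_id, process_id, semaphore_or_None), in the
    order the operations were recorded.  Each event e is modeled by a
    start node ("s", e) and a finish node ("f", e)."""
    edges = defaultdict(set)
    last_in_proc = {}  # most recent event in each process
    last_on_sem = {}   # most recent operation on each semaphore
    for eid, pid, sem in trace:
        edges[("s", eid)].add(("f", eid))        # start precedes finish
        if pid in last_in_proc:                  # program order
            edges[("f", last_in_proc[pid])].add(("s", eid))
        last_in_proc[pid] = eid
        if sem is not None:                      # semaphore order
            if sem in last_on_sem:
                edges[("f", last_on_sem[sem])].add(("s", eid))
            last_on_sem[sem] = eid
    return edges
```

The semaphore edges simply chain successive operations on each semaphore; as noted above, recording this order requires no additional synchronization in the traced program.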
4.2. Approximate Shared-Data Dependences
Determining the actual shared-data dependences exhibited by an execution would in general require the relative order of all shared-memory accesses to be recorded. However, in Section 2 we mentioned that previous methods record the READ and WRITE sets for each computation event. By using only these sets and the approximate temporal ordering, the actual shared-data dependences can be conservatively estimated. The approximate shared-data dependence relation, →D̂, is defined by speculating on what the actual shared-data dependences might have been. Consider two events, a and b, that both access a common shared variable (where at least one access is a modification). If a →T̂ b, then there is a direct shared-data dependence from a to b. When a ↮T̂ b, the direction of any direct dependence cannot be determined (since the actual temporal ordering between a and b is not known), and we make the conservative assumption that a direct dependence exists from a to b and from b to a. This assumption is conservative since it will always include the actual direct dependences, although it may introduce a dependence from b to a when in fact the only dependence was from a to b. The →D̂ relation is then defined as the irreflexive transitive closure (see Section 3.1) of this approximation of the direct dependences. As we will see, for data race validation →D̂ only needs to be computed between events a and b when a ↮T̂ b.
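The conservative estimate can be sketched as follows. The representation of events as (READ, WRITE) set pairs and the `ordered` predicate standing in for →T̂ are illustrative assumptions; the sketch produces only the direct dependences, of which →D̂ would be the irreflexive transitive closure.

```python
from itertools import combinations

def data_conflict(ev1, ev2):
    """Events are (READ, WRITE) set pairs; a conflict requires a common
    variable with at least one modification."""
    r1, w1 = ev1
    r2, w2 = ev2
    return bool(w1 & (r2 | w2)) or bool(w2 & r1)

def approx_dependences(events, ordered):
    """events: {id: (READ, WRITE)}; ordered(a, b) stands in for a ->T-hat b.
    Returns the direct dependences of the conservative estimate; D-hat is
    their irreflexive transitive closure."""
    dhat = set()
    for a, b in combinations(events, 2):
        if not data_conflict(events[a], events[b]):
            continue
        if ordered(a, b):
            dhat.add((a, b))        # direction known
        elif ordered(b, a):
            dhat.add((b, a))
        else:
            dhat.add((a, b))        # unordered: conservatively
            dhat.add((b, a))        # assume both directions
    return dhat
```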
5. Characterizing Alternate Temporal Orderings
An actual program execution describes aspects of how the program actually performed, and
does not contain any information regarding what the program might have done. For example,
→T is defined to represent the actual temporal ordering in which any two events were performed.
ways enforced by (explicit or implicit) synchronization, but sometimes occurs by chance. It is
possible that another execution of the program could perform exactly the same events, but ex-
hibit a different temporal ordering among these events. In this section we characterize such al-
ternate temporal orderings that an actual program execution, P, could have exhibited because
of nondeterministic timing variations. To determine how much the temporal ordering of P can
be disturbed without affecting the events performed, we consider the shared-data dependences
exhibited by P. Any execution exhibiting these same dependences is capable of performing the
same events. We later use this characterization of alternate orderings to distinguish between
data races actually exhibited by an execution and data races that could have been exhibited.
For a given execution of a program, consider a particular view of the execution (called a single-access view) in which each computation event is defined to comprise at most one shared-memory access. The program execution describing this view, P_S, shows the data dependences among the individual shared-memory accesses made by the execution. These single-access shared-data dependences uniquely characterize the events performed. Since the execution outcome of each statement instance depends only upon the values of the variables it reads [14], the single-access dependences uniquely determine the program state at each step in each process. (For this statement to hold, interactions with the external environment must be modeled as shared-data dependences.) Any temporal ordering that could still allow these data dependences to occur (and that would not violate the semantics of the synchronization operations) is an ordering the execution could have exhibited. Therefore, any other single-access program execution, P′_S, possessing the same events and (single-access) shared-data dependences as P_S represents an execution the program could actually exhibit, regardless of how its temporal ordering differs from that of P_S.
Similarly, this result also holds for higher-level views of the program execution. In
higher-level views, computation events can consist of many shared-memory accesses. In the
following theorem we show that any higher-level program execution possessing the same
events and (higher-level) shared-data dependences as P describes an execution the program
could actually exhibit, regardless of how its temporal ordering differs from that of P. We call a
program execution that could actually be exhibited a feasible program execution. The follow-
ing theorem gives sufficient conditions for a program execution to be feasible.
Theorem 5.1.
Let P = ⟨E, →T, →D⟩ be an actual program execution. P′ = ⟨E, →T′, →D′⟩ is a feasible program execution if
(F1) P′ is a valid program execution (axioms A1-A6 are satisfied), and
(F2) →D′ = →D.
Proof.
We will use the result mentioned above that any single-access program execution possessing the same (single-access) shared-data dependences as those that actually occurred represents an execution the program could exhibit [14]. This theorem extends the result to higher-level program executions. Since computation events in higher-level program executions can consist of more than one shared-memory access, there may be more than one single-access program execution for which P, the actual program execution, is a higher-level view. Therefore, given a higher-level view, we do not always know which shared-data dependences actually occurred at the single-access level. To show that P′ is a feasible program execution, we must show that it is a higher-level view of a single-access program execution possessing the actual single-access dependences. However, since these dependences are not known, we will show that the shared-data dependences exhibited by each single-access program execution described by P are also exhibited by some single-access execution described by P′. We will then be guaranteed that, no matter which single-access shared-data dependences were exhibited during P, an execution capable of exhibiting those same dependences is described by P′.
The single-access program executions that are described by P are given by the set { P_S = ⟨E_S, →T_S, →D_S⟩ | P is a higher-level view of P_S }, and the single-access executions described by P′ are given by { P′_S = ⟨E_S, →T′_S, →D′_S⟩ | P′ is a higher-level view of P′_S }. We must prove that each →D_S is equal to some →D′_S. We first show that for any pair of higher-level events, we can always find some →D′_S exhibiting the same shared-data dependences as any →D_S among the lower-level events comprising these events, and then show that this guarantees some →D′_S exists exhibiting the same dependences as any →D_S among all the lower-level events comprising the actual program execution (which shows →D_S = →D′_S).
First, consider any P_S, and its (single-access) shared-data dependences among the lower-level events a_S ∈ a and b_S ∈ b comprising any two higher-level events a and b. We now show a P′_S exists exhibiting these same dependences. Since each lower-level event comprises at most one shared-memory access, it suffices to show that some P′_S exists such that b_S ↛T′_S a_S (where x ↛ y denotes ¬(x → y)) whenever a_S →D_S b_S, and a_S ↛T′_S b_S whenever b_S →D_S a_S, for any a_S ∈ a and b_S ∈ b.
Case (1): a →T b and a →T′ b. In this case, →D_S can only contain shared-data dependences from some a_S to some b_S, and all P′_S have a_S →T′_S b_S for all a_S ∈ a and b_S ∈ b.
Case (2): a →T b and a ↮T′ b. As with case (1), →D_S can only contain shared-data dependences from some a_S to some b_S. Some P′_S must exist in which b_S ↛T′_S a_S for all a_S ∈ a and b_S ∈ b, since otherwise b_S →T′_S a_S for all a_S ∈ a and b_S ∈ b would imply b →T′ a, contradicting the assumption a ↮T′ b.
Case (3): a →T b and b →T′ a. In this case, →D_S can contain no shared-data dependences between any a_S and b_S (or else P′ would violate A3).
Case (4): a ↮T b and a →T′ b. Since →D_S can contain shared-data dependences only from some a_S to some b_S (or else P′ would violate A3), this case is analogous to case (1).
Case (5): a ↮T b and a ↮T′ b. In this case, →D_S can contain shared-data dependences in both directions between the a_S and b_S, and since the set of single-access program executions described by P′ contains all possible temporal orderings among the a_S and b_S that cause a ↮T′ b, some P′_S clearly exists with the desired properties.
Finally, we show that each →D_S equals some →D′_S. Notice that when there are events a and b that overlap, P describes more than one single-access program execution. These single-access program executions contain all possible (legal) temporal orderings among the lower-level events a_S ∈ a and b_S ∈ b that cause a and b to overlap. The set of all single-access program executions described by P can be constructed by choosing, for each pair of higher-level events a and b, one such temporal ordering among the lower-level events comprising a and b. We showed above that for any →D_S, some →D′_S exists exhibiting the same shared-data dependences among the lower-level events comprising any pair of higher-level events. Using this result, we can always find a →D′_S exhibiting the same shared-data dependences as any →D_S among all the lower-level events by independently considering each pair of higher-level events. Therefore, each →D_S is equal to some →D′_S, which proves the theorem.
Given an actual program execution, P, we are not attempting to predict the behavior of
the program had different shared-data dependences occurred. Instead, the above theorem
characterizes different program executions (performing the same events as P) that we can
guarantee the program is capable of exhibiting. Indeed, there may be program executions that
violate the above conditions but nevertheless perform the same events. However, characteriz-
ing such program executions requires analyzing the semantics of the program itself, to deter-
mine what effects different shared-data dependences would have on P.
6. Definition of Data Race
We can now characterize different types of data races in terms of our model. We distinguish
between an actual data race, which is a data race exhibited during an actual program execu-
tion, and a feasible data race, which is a data race that could have been exhibited because of
timing variations. We also characterize the apparent data races, those data races detected by
searching the ordering graph for data-conflicting events that are not ordered by the graph
(which are the data races reported by previous methods). As discussed in the next section, the
problem of data race validation is determining which apparent data races are feasible.
Definition 6.1
A data race under →T exists between a and b iff
(DR1) a data conflict exists between a and b, and
(DR2) a ↮T b.
Definition 6.2
An actual data race exists between a and b iff
(AR1) P = ⟨E, →T, →D⟩ is an actual program execution, and
(AR2) a data race under →T exists between a and b.
Definition 6.3
A feasible data race exists between a and b iff
(FR1) there exists some feasible program execution, P′ = ⟨E, →T′, →D′⟩, and
(FR2) a data race under →T′ exists between a and b.
Definition 6.4
An apparent data race exists between a and b iff
(AP1) P̂ = ⟨E, →T̂, →D̂⟩ is an approximate program execution, and
(AP2) a data race under →T̂ exists between a and b.
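Definition 6.1 is a single predicate that, applied to different temporal orderings, yields the three kinds of races above. A minimal sketch (the `conflicts` and `ordered` predicates are assumed inputs, not part of the model itself):

```python
def data_race(a, b, conflicts, ordered):
    """Definition 6.1 as a predicate: a data race under an ordering exists
    iff the events data-conflict (DR1) and neither precedes the other
    under that ordering (DR2)."""
    return conflicts(a, b) and not ordered(a, b) and not ordered(b, a)
```

Instantiated with the actual ordering →T, this is Definition 6.2's condition; with the ordering of any feasible execution, Definition 6.3's; and with the approximate ordering →T̂, Definition 6.4's.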
7. Detecting Data Races
We now present our two-phase approach to data race detection. In the first phase, the apparent
data races are located by using the approximate information collected about the execution to
construct and then analyze the temporal ordering graph. This first phase performs the same
type of analysis as previous data race detection methods. Unlike previous methods, we then
employ a second phase to validate each apparent data race by attempting to determine whether
or not the race is feasible. This determination is made by first augmenting the temporal order-
ing graph with additional edges representing a conservative estimate of the shared-data depen-
dences, and then analyzing the resulting graph for cycles. Such a two-phase approach has the
advantage that approximate information (such as that recorded by previous methods) can be
used, but the programmer can still be provided with information regarding the feasibility of the
reported data races. Throughout the remainder of this section, P̂ = ⟨E, →T̂, →D̂⟩ will denote an approximate program execution, and P = ⟨E, →T, →D⟩ will denote the actual program execution (which is unknown).
7.1. Phase I: Detecting Apparent Data Races
The first phase of our data race detection method identifies the apparent data races. The ap-
parent data races are located by first constructing the temporal ordering graph and then search-
ing the graph for pairs of data-conflicting events, a and b, whose nodes have no connecting
path (implying that a ↮T̂ b). In general, these data races include all actual data races, plus
additional races, both feasible and infeasible. This phase cannot distinguish among these types
of races since doing so would require knowledge of the complete temporal ordering.
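This search can be sketched as follows, assuming the two-node-per-event graph representation; the DFS reachability test stands in for whatever path-finding or transitive-closure computation an implementation might actually use.

```python
def reachable(edges, src, dst):
    """Iterative DFS: is there a path from src to dst in the graph?"""
    stack, seen = [src], set()
    while stack:
        n = stack.pop()
        if n == dst:
            return True
        if n not in seen:
            seen.add(n)
            stack.extend(edges.get(n, ()))
    return False

def apparent_races(edges, conflicting_pairs):
    """Report pairs with no connecting path in either direction:
    a precedes b under T-hat only if a path runs from a's finish
    node to b's start node."""
    return [(a, b) for a, b in conflicting_pairs
            if not reachable(edges, ("f", a), ("s", b))
            and not reachable(edges, ("f", b), ("s", a))]
```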
Because the apparent data races are detected using an approximate temporal ordering, not
all apparent data races are always actual or feasible. Figure 1 showed an example of an ap-
parent data race that was not feasible. However, we now prove that each actual data race is
also an apparent data race. The naive method of simply reporting all apparent data races to the
user (which is the approach of previous methods) is therefore safe in the sense that no actual
data races are left undetected.
Theorem 7.1.
Every actual data race is also an apparent data race.
Proof.
If there is an actual data race between a and b, then a ↮T b. To show that there is an apparent data race between a and b, we must show that a ↮T̂ b. By definition, the temporal ordering graph is constructed so that a →T̂ b ⟹ a →T b (see Section 3). We must show that this implies a ↮T b ⟹ a ↮T̂ b. Consider that the contrapositive of the assumption is a ↛T b ⟹ a ↛T̂ b, which is equivalent to a ↛T b ⟹ (b →T̂ a) ∨ (a ↮T̂ b). But if a →T̂ b ⟹ a →T b, then this becomes a ↛T b ⟹ (b →T a) ∨ (a ↮T̂ b), or a ↛T b ⟹ ¬(a ↮T b) ∨ (a ↮T̂ b), which is equivalent to a ↮T b ⟹ a ↮T̂ b.
Note that the proof of Theorem 7.1 does not make use of the specifics of how the temporal
ordering graph is constructed. Indeed, any approximate temporal ordering, →T̂, with the property a →T̂ b ⟹ a →T b is sufficient to allow all actual data races to be detected as apparent data races. However, the more exhaustively the program is traced, the more accurately the apparent data races can be validated, as is shown below. As we will also show below, apparent data races have the property that the presence of an apparent data race implies that there is a feasible data race somewhere in the program execution, implying that when no feasible data races exist, no apparent data races will be reported.
7.2. Phase II: Validating Apparent Data Races
The first phase of our data race detection method locates a set of apparent data races. We now
outline the second phase, which validates each apparent data race by attempting to determine
whether or not the race is feasible. This determination is made by first augmenting the tem-
poral ordering graph with edges representing a conservative estimate of the actual shared-data
dependences, and then searching the augmented graph for certain types of cycles. Each ap-
parent data race can be characterized either as being feasible, or as belonging to a set of ap-
parent data races where at least one is feasible.
To show that an apparent data race between a and b is feasible, we must guarantee that some feasible program execution, P′ = ⟨E, →T′, →D′⟩, exists such that a ↮T′ b. Determining the feasibility of a program execution requires knowledge of →D, the shared-data dependences exhibited by the observed execution. When only an approximate program execution is available, however, the exact shared-data dependences are not known. By using the conservative estimate of these dependences, →D̂, we can guarantee that some feasible program executions must exist. We augment the temporal ordering graph with edges, called shared-data dependence edges, representing this conservative estimate. Let G be the temporal ordering graph. We construct the augmented temporal ordering graph, G_AUG, by augmenting G with edges that ensure there is a path from a_s to b_f whenever a →D̂ b. These edges ensure that any possible shared-data dependence from a to b would be allowed to occur in certain program executions defined by G_AUG (shown below in the proof of Theorem 7.2). If a →T̂ b, then a path from a_s to b_f already exists. Edges are therefore added only when a ↮T̂ b. In this case, edges are added from a_s to b_f and from b_s to a_f if there is a data conflict between a and b, or if a has a data conflict with some other event c that also has a data conflict with b, and a ↮T̂ c and c ↮T̂ b.
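This augmentation step can be sketched as follows. The `conflicting` and `unordered` predicates are assumed inputs (standing in for the data-conflict test and ↮T̂), and event ids are assumed comparable only so that each pair is visited once; neither assumption comes from the paper's model.

```python
def augment(edges, events, conflicting, unordered):
    """Build G_AUG from the temporal ordering graph `edges`.  For each
    unordered pair (a, b) with a direct data conflict, or with a common
    data-conflicting event c unordered with both, add the dependence
    edges a_s -> b_f and b_s -> a_f."""
    aug = {n: set(v) for n, v in edges.items()}

    def add(x, y):
        aug.setdefault(x, set()).add(y)

    for a in events:
        for b in events:
            if a >= b or not unordered(a, b):
                continue  # visit each unordered pair once
            direct = conflicting(a, b)
            transitive = any(conflicting(a, c) and conflicting(c, b) and
                             unordered(a, c) and unordered(c, b)
                             for c in events if c not in (a, b))
            if direct or transitive:
                add(("s", a), ("f", b))
                add(("s", b), ("f", a))
    return aug
```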
In general, G_AUG may contain cycles, due to the conservative approximation made about the actual shared-data dependences. By classifying the apparent data races into those that participate in cycles and those that do not, some apparent data races can be guaranteed to be feasible. We say that two events, a and b, are tangled if either a_s and b_f, or b_s and a_f, belong to the same strongly connected component of G_AUG (a strongly connected component has the property that there is a path from every node in the component to every other node, but no path from a node in one component to a node in another component and back). A tangled data race is an apparent data race between two tangled events. Each strongly connected component defines a set of tangled data races, called a tangle. We now show (in Theorem 7.2) that any apparent data race between two events that are not tangled is guaranteed to be feasible. We then show (in Theorem 7.3) that in each tangle, at least one of the apparent data races is guaranteed to be feasible.
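The classification can be sketched as follows; the mutual-reachability test for strongly connected components is deliberately naive (a real implementation would compute the components once, e.g., with Tarjan's algorithm, rather than per pair).

```python
def _reach(edges, src, dst):
    """Iterative DFS reachability."""
    stack, seen = [src], set()
    while stack:
        n = stack.pop()
        if n == dst:
            return True
        if n not in seen:
            seen.add(n)
            stack.extend(edges.get(n, ()))
    return False

def tangled(aug, a, b):
    """a and b are tangled iff a_s and b_f, or b_s and a_f, lie in the
    same strongly connected component of G_AUG (checked here by mutual
    reachability)."""
    def same_scc(x, y):
        return _reach(aug, x, y) and _reach(aug, y, x)
    return same_scc(("s", a), ("f", b)) or same_scc(("s", b), ("f", a))

def validate(aug, apparent):
    """Split the apparent races: untangled ones are feasible (Theorem 7.2);
    of the tangled ones, at least one per tangle is feasible (Theorem 7.3)."""
    feasible = [r for r in apparent if not tangled(aug, *r)]
    tangles = [r for r in apparent if tangled(aug, *r)]
    return feasible, tangles
```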
Lemma 7.1.
For a given execution, assume P = ⟨E, →T, →D⟩ and P̂ = ⟨E, →T̂, →D̂⟩ are the associated complete and approximate program executions. Let G be the temporal ordering graph defining →T̂, and let G_AUGD be G augmented with edges representing the actual shared-data dependences, →D. Any linear ordering of the nodes of G_AUGD is a global-time model that defines a temporal ordering relation, →T′, such that P′ = ⟨E, →T′, →D⟩ is a feasible program execution.
Proof.
We introduce G_AUGD as a device for showing that certain feasible program executions must exist. G_AUGD is identical to G_AUG, except that edges representing shared-data dependences that did not actually occur do not appear (they were conservatively added to G_AUG so that no actual shared-data dependences were missed). Even though we do not have enough information to construct G_AUGD, it nonetheless exists, and in Theorems 7.2 and 7.3 we prove that it must possess certain properties. In this Lemma, we prove that any linear ordering of the nodes of this graph can be used to define a feasible program execution. Any such linear ordering defines a temporal ordering that satisfies the conditions for feasibility. The shared-data dependence constraint (axiom A3) is satisfied since G_AUGD contains shared-data dependence edges representing the actual shared-data dependences. The synchronization constraints (axioms A4-A6) are satisfied since G_AUGD contains at least as many edges as G, and by definition, any linear ordering of G obeys axioms A4-A6.
Let L be any linear ordering of the nodes of G_AUGD, and let →T′ be the temporal ordering defined by L. We first show that →T′ satisfies axioms A1-A6:
A1. The →T′ relation is irreflexive, since if a →T′ a for some event a, then a_f would have to appear before a_s in L, which is not possible, since by definition G contains an edge from b_s to b_f for every event b. By the same argument, the →T′ relation is asymmetric. The →T′ relation is transitive, since if a →T′ b and b →T′ c, then a_f appears before b_s in L, and b_f appears before c_s. Since an edge exists from b_s to b_f for every event b, it follows that a_f appears before c_s in L, implying that a →T′ c.
A2. Assume a, b, c, d exist such that a →T′ b, b ↮T′ c, and c →T′ d. By the definition of →T′, L must contain the nodes a_f, b_s, c_f, d_s, in this order, implying that a →T′ d.
A3. Since G_AUGD contains an edge from a_s to b_f whenever a →D b, a_s will precede b_f in L, implying that b ↛T′ a.
A4-A6. Because G_AUGD contains no fewer edges than G, axioms A4 through A6 are satisfied since, by definition, any linear ordering of G obeys these axioms.
Since →T′ satisfies axioms A1-A6, P′ = ⟨E, →T′, →D⟩ is a feasible program execution.
Theorem 7.2.
If there is an apparent data race between a and b, and a and b are not tangled, then the data
race is feasible.
Proof.
We first show that G_AUG contains no path from a_f to b_s, and no path from b_f to a_s, and then show that this implies the apparent data race between a and b is feasible.
To show that G_AUG contains no path from a_f to b_s, and no path from b_f to a_s, we will establish a contradiction by assuming there is a path from a_f to b_s, or a path from b_f to a_s. Since a and b are not tangled, only one such path can exist. Assume the path from a_f to b_s exists. Since an apparent data race exists between a and b, G_AUG contains shared-data dependence edges from a_s to b_f, and from b_s to a_f. But these edges create the path a_f → b_s → a_f in G_AUG, implying that a_f and b_s belong to the same strongly connected component, which cannot be true since a and b are not tangled. Therefore, there can be no path from a_f to b_s, and no path from b_f to a_s.
We finally show that the apparent data race between a and b is feasible. Consider the graph G_AUGD, constructed by augmenting the temporal ordering graph, G, with edges representing the shared-data dependences that were actually exhibited by the program execution (see the proof of Lemma 7.1). This graph contains no more edges than G_AUG, since the edges in G_AUG represent the conservative estimate of the actual shared-data dependences. Since G_AUG cannot contain a path from a_f to b_s, or a path from b_f to a_s, G_AUGD cannot contain such paths either. There is thus a linear ordering of the nodes of G_AUGD in which a_s appears before b_f, and b_s appears before a_f. This linear ordering is a global-time model that defines a temporal ordering, →T′, such that a ↮T′ b. By Lemma 7.1, P′ = ⟨E, →T′, →D⟩ is a feasible program execution. Therefore, the apparent data race between a and b is feasible.
The above theorem shows that the apparent data races between events that are not tangled
are guaranteed to be feasible. Not all of the remaining apparent data races are infeasible, how-
ever. We now show that, in each tangle, at least one tangled data race is guaranteed to be
feasible. Without more precise knowledge of the actual shared-data dependences (or without
examining the semantics of the program execution), we cannot determine exactly which tan-
gled data races are feasible.
Lemma 7.2.
Let G be a temporal ordering graph, let G_AUG be G augmented with edges representing the conservative estimate of the actual shared-data dependences, and let G_AUGD be G augmented with edges representing the actual shared-data dependences (see the proof of Lemma 7.1). Assume T is a set of tangled events defined by a strongly connected component of G_AUG. Then there exist two events a, b ∈ T such that an apparent data race exists between a and b, and no path from a_f to b_s, or from b_f to a_s, exists in G_AUGD.
Proof.
Let T′ be the set of nodes in G_AUGD representing the events in T. To establish a contradiction, assume that for all events a, b ∈ T such that there is an apparent data race between a and b, there is either a path from a_f to b_s, or a path from b_f to a_s, in G_AUGD. Since a path from a_f to b_s and a path from b_f to a_s cannot both exist (G_AUGD is acyclic), assume that the path from a_f to b_s exists. Since there is an apparent data race between a and b, no such path exists in G. The path in G_AUGD must therefore contain at least one shared-data dependence edge, which cannot emanate from a_f. This path must contain nodes for two events, c and d, such that there is a path from a_f to c_s, a shared-data dependence edge from c_s to d_f, and a path from d_f to b_s. Such a path implies that a →T c and d →T b. Furthermore, c and d must belong to T, since T is defined by a strongly connected component.
The shared-data dependence edge from c_s to d_f exists either because there is a data conflict between c and d (and therefore also an apparent data race), or because c data conflicts with some other event that data conflicts with d (a transitive data conflict). Assume that the edge exists because of an apparent data race between c and d. Since c and d belong to T, our contradiction assumption implies that there must be a path from c_f to d_s. By applying the above argument to c and d, we conclude that the path from c_f to d_s must contain nodes for two events, e, f ∈ T, such that there is a path from c_f to e_s, a shared-data dependence edge from e_s to f_f, and a path from f_f to d_s. Such a path implies that c →T e and f →T d. Since a →T c and d →T b, the events e and f must be different from c and d. By inductively applying the above argument, we find that we always need two more events, x and y, belonging to T, that are different from all other events in T. Since T is finite, we eventually arrive at a contradiction.
If the shared-data dependence edge from c_s to d_f exists because of a transitive data conflict between c and d, event c must participate in an apparent data race with some event e that has a (possibly transitive) data conflict with d. By applying an argument similar to the one above to c and e, we also arrive at a contradiction. Therefore, two events, a, b ∈ T, must exist such that there is an apparent data race between a and b, and there is no path from a_f to b_s, and no path from b_f to a_s, in G_AUGD.
Theorem 7.3.
Let G_AUG be an augmented temporal ordering graph, and let T be the set of tangled events defined by some strongly connected component of G_AUG. At least one of the apparent data races in T is feasible.
Proof.
Let G_AUGD be the temporal ordering graph augmented with edges representing the actual shared-data dependences (see the proof of Lemma 7.1). By Lemma 7.2, there exist two events a, b ∈ T such that there is an apparent data race between a and b, and there is no path from a_f to b_s, and no path from b_f to a_s, in G_AUGD. By the argument at the end of the proof of Theorem 7.2, there is a feasible program execution, P′ = ⟨E, →T′, →D⟩, such that a ↮T′ b, showing that the apparent data race between a and b is feasible. Therefore, at least one of the tangled data races is feasible.
For each tangle, the above theorem guarantees that at least one tangled data race in the
tangle is always feasible. As illustrated in Section 2, however, not all of the tangled data races
are always feasible. An infeasible tangled data race exists only when the outcome of one tan-
gled data race affects another tangled data race. A data race between a and b can affect a data
race between c and d if (1) a or b modifies a shared variable, V, and (2) either the shared loca-
tions accessed by c or d, or the presence of c or d, depend upon V. The presence of c or d can
depend upon V if the outcome of some conditional statement depends upon V, and the outcome
might either delay the execution of c or d, or cause c or d to not execute at all. This notion is
similar to the hides relation of Allen and Padua [1]. A future paper will describe how to employ
these ideas to locate tangled data races that can be guaranteed feasible.
8. Conclusion
This paper has addressed two issues regarding data race detection. We first presented a formal
model for reasoning about data races, and then presented a two-phase approach to data race
detection that validates the accuracy of each detected data race. Our model distinguished
among the data races that actually occurred (actual data races), that could have occurred (feasi-
ble data races), and that appeared to have occurred (apparent data races). Such a model al-
lowed us to characterize the type of data races detected by previous methods, and to develop
and argue the correctness of our two-phase approach. The first phase of this approach is essentially identical to previous methods and detects the apparent data races. We proved that all actual data races are detected by this phase. Unlike previous methods, we then employed a
second phase that validates the apparent data races. This phase augments the temporal order-
ing graph with edges representing a conservative estimate of the shared-data dependences. An
apparent data race is validated by determining whether the events involved in the race belong
to the same strongly connected component. We proved that each apparent data race involving
two events belonging to different strongly connected components (or none at all) is feasible,
and in each set of races belonging to a strongly connected component, at least one is feasible.
We are currently investigating several issues related to this work. First, we are develop-
ing more precise analyses for locating those apparent data races that are feasible. As men-
tioned in Section 7, tangled data races are infeasible only when one tangled data race affects
the outcome (or the existence) of another. By examining when one event can affect another,
the notion of a feasible program execution can be extended to characterize what an execution
could have done had different shared-data dependences occurred. Using this extended notion
of feasibility, certain tangled data races can be shown to be feasible. Second, we are examin-
ing different classes of feasible data races. We have proven that the problem of detecting all
feasible data races is NP-hard [16] (even when the complete program execution is known).
However, certain classes of feasible data races can be efficiently detected. Third, we are
developing techniques for providing efficient data race detection in practice. These techniques
include efficient program instrumentation, and algorithms for actually constructing, augment-
ing and analyzing the temporal ordering graph. For example, it is not necessary to model each
event with two nodes in the temporal ordering graph. By appropriately modeling the end of
one event as the start of the next event (in the same process), only one node per event is re-
quired. We are also investigating techniques for efficiently recording the READ and WRITE
sets for each computation event. In addition, even though we presented a two-phase scheme,
data race validation does not necessarily require a post-mortem approach. It may be possible to
perform the validation phase on-the-fly. Finally, the ideas presented in this paper can be ap-
plied to shared-memory parallel programs that use synchronization primitives other than sema-
phores, such as event variables, barriers, or rendezvous. To gain practical experience with
these ideas, we are currently incorporating them into a parallel program debugger [3, 15] under development at the University of Wisconsin-Madison.
Acknowledgements
This research was supported in part by National Science Foundation grant CCR-8815928,
Office of Naval Research grant N00014-89-J-1222, and a Digital Equipment Corporation
External Research Grant.
References
1. Allen, T. R. and D. A. Padua, ‘‘Debugging Fortran on a Shared Memory Machine,’’
Proc. of Intl. Conf. on Parallel Processing, pp. 721-727 St. Charles, IL, (Aug. 1987).
2. Bernstein, A. J., ‘‘Analysis of Programs for Parallel Processing,’’ IEEE Trans. on Elec-
tronic Computers EC-15(5) pp. 757-763 (Oct. 1966).
3. Choi, J.-D., B. P. Miller, and R. H. B. Netzer, ‘‘Techniques for Debugging Parallel Pro-
grams with Flowback Analysis,’’ Comp. Sci. Dept. Tech. Rep. #786, Univ. of
Wisconsin-Madison, (Aug. 1988).
4. Dinning, A. and E. Schonberg, ‘‘An Empirical Comparison of Monitoring Algorithms
for Access Anomaly Detection,’’ Proc. of ACM SIGPLAN Symp. on Principles and
Practice of Parallel Programming, pp. 1-10 Seattle, WA, (Mar. 1990).
5. Emrath, P. A. and D. A. Padua, ‘‘Automatic Detection Of Nondeterminacy in Parallel
Programs,’’ Proc. of the Workshop on Parallel and Distributed Debugging, pp. 89-99
Madison, WI, (May 1988). Also SIGPLAN Notices 24(1) (Jan. 1989).
6. Emrath, P. A., S. Ghosh, and D. A. Padua, ‘‘Event Synchronization Analysis for Debugging Parallel Programs,’’ Supercomputing ’89, pp. 580-588 Reno, NV, (Nov. 1989).
7. Habermann, A. N., ‘‘Synchronization of Communicating Processes,’’ Communications of the ACM 15(3) pp. 171-176 (Mar. 1972).
8. Helmbold, D. P., C. E. McDowell, and J.-Z. Wang, ‘‘Analyzing Traces with
Anonymous Synchronization,’’ Proc. of Intl. Conf. on Parallel Processing, St. Charles,
IL, (Aug. 1990).
9. Hood, R., K. Kennedy, and J. Mellor-Crummey, ‘‘Parallel Program Debugging with
On-the-fly Anomaly Detection,’’ Supercomputing ’90, New York, NY, (Nov. 1990).
10. Kuck, D. J., R. H. Kuhn, B. Leasure, D. A. Padua, and M. Wolfe, ‘‘Dependence Graphs
and Compiler Optimizations,’’ Conf. Record of the 8th ACM Symp. on Principles of
Programming Languages, pp. 207-218 Williamsburg, VA, (Jan. 1981).
11. Lamport, L., ‘‘How to Make a Multiprocessor Computer That Correctly Executes Mul-
tiprocess Programs,’’ IEEE Trans. on Computers C-28(9) pp. 690-691 (Sep. 1979).
12. Lamport, L., ‘‘Interprocess Communication,’’ SRI Technical Report, (Mar. 1985).
13. Lamport, L., ‘‘The Mutual Exclusion Problem: Part I – A Theory of Interprocess Communication,’’ Journal of the ACM 33(2) pp. 313-326 (Apr. 1986).
14. Mellor-Crummey, J. M., ‘‘Debugging and Analysis of Large-Scale Parallel Programs,’’
Ph.D. Thesis, also Comp. Sci. Dept. Tech. Rep. 312, Univ. of Rochester, (Sep. 1989).
15. Miller, B. P. and J.-D. Choi, ‘‘A Mechanism for Efficient Debugging of Parallel Pro-
grams,’’ Proc. of the Conf. on Programming Language Design and Implementation,
pp. 135-144 Atlanta, GA, (June 1988). Also SIGPLAN Notices 23(7) (July 1988).
16. Netzer, R. H. B. and B. P. Miller, ‘‘On the Complexity of Event Ordering for Shared-
Memory Parallel Program Executions,’’ Proc. of Intl. Conf. on Parallel Processing, St.
Charles, IL, (Aug. 1990).
17. Nudler, I. and L. Rudolph, ‘‘Tools for the Efficient Development of Efficient Parallel
Programs,’’ Proc. of 1st Israeli Conf. on Computer System Engineering, (1988).