Detecting Data Races in Parallel Program Executions
Robert H. B. Netzer
netzer@cs.wisc.edu
Barton P. Miller
bart@cs.wisc.edu
Computer Sciences Department
University of Wisconsin-Madison
1210 W. Dayton Street
Madison, Wisconsin 53706
Research supported in part by National Science Foundation grant CCR-8815928, Office of Naval Research grant N00014-89-J-1222, and a
Digital Equipment Corporation External Research Grant.
Copyright 1989,1990 Robert H. B. Netzer, Barton P. Miller.
To appear in Languages and Compilers for Parallel Computing, D. Gelernter, T. Gross, A. Nicolau, and D. Padua, eds., MIT Press, 1991;
also appears in Proceedings of the 3rd Workshop on Programming Languages and Compilers for Parallel Computing, Irvine, CA, August 1990.
Abstract
Several methods currently exist for detecting data races in an execution of a
shared-memory parallel program. Although these methods address an important
aspect of parallel program debugging, they do not precisely define the notion of
a data race. As a result, it is not possible to precisely state which data races are
detected, nor is the meaning of the reported data races always clear. Furthermore,
these methods can sometimes generate false data race reports. They can
determine whether a data race was exhibited during an execution, but when
more than one data race is reported, only limited indication is given as to which
ones are real. This paper addresses these two issues. We first present a model
for reasoning about data races, and then present a two-phase approach to data
race detection that attempts to validate the accuracy of each detected data race.
Our model of data races distinguishes among those data races that actually
occurred during an execution (actual data races), those that could have occurred
because of timing variations (feasible data races), and those that appeared to
have occurred (apparent data races). The first phase of our two-phase approach
to data race detection is similar to previous methods and detects a set of data
race candidates (the apparent data races). We prove that this set always contains
all actual data races, although it may contain other data races, both feasible and
infeasible. Unlike previous methods, we then employ a second phase which
validates the apparent data races by attempting to determine which ones are
feasible. This second phase requires no more information than previous
methods collect, and involves making a conservative estimate of the data depen-
dences among the shared data to determine how these dependences may have
constrained alternate orderings potentially exhibited by the execution. Each
apparent data race can then be characterized as either being feasible, or as
belonging to a set of apparent data races where at least one is feasible.
1. Introduction
In shared-memory parallel programs, if accesses to shared data are not properly coordinated, a
data race can result, causing the program to behave in a way not intended by the programmer.
Detecting data races in a particular execution of a parallel program is an important part of
debugging. Several methods for data race detection have been developed[1,3,4,9,15,17].
Although these methods provide valuable tools for debugging, they do not precisely define the
notion of a data race. As a result, we cannot precisely state which data races are detected by
these methods. In addition, false data race reports can sometimes be generated. These
methods can determine whether or not a data race occurred, but when more than one data race
is reported, no indication is given as to which ones are real. Failure to characterize the detected
data races, and the generation of false data race reports, can make it difficult to use these
methods for debugging the program and locating the cause of the data races. This paper
addresses these two issues. We first present a formal model in which to reason about data
races, and then outline a two-phase approach to data race detection that is discussed entirely in
terms of the model. The first phase performs essentially the same type of analysis as previous
methods, and detects a set of candidate data races (which we call apparent data races). Unlike
previous methods, we then employ a second phase that validates each of the apparent data
races to determine which ones are real. By providing a model for reasoning about data races,
the correctness of our techniques can be convincingly argued, and the meaning of the data race
reports (generated by our methods or others) is made explicit. By validating the apparent data
races, the programmer can be provided with information crucial to debugging.
One purpose of adding explicit synchronization to shared-memory parallel programs is to
coordinate accesses to shared data. Some programs are intended to behave deterministically,
and for these programs synchronization is usually designed to force all shared-data accesses to
the same location to occur in a specific order (for some given program input). When the order
of two shared-memory accesses made by different processes (to the same location) is not
enforced, a race condition is said to exist[5,6], possibly resulting in a nondeterministic execu-
tion. In contrast, other programs are not intended to be deterministic, and for these programs
synchronization is usually added to ensure that some sections of code execute as if they were
atomic (i.e., to implement critical sections). For example, consider a section of code that adds
up a list of shared data representing the deposits to a bank account during a certain month. If
this section of code does not execute as if it were atomic (because, for example, another section
of concurrently executing code is debiting the account), the computed deposit total might not
be correct. A section of code is guaranteed to execute atomically if the shared variables it
reads and modifies are not modified by any other concurrently executing section of code[2]. If
these conditions are not met, a data race is said to exist. Since nondeterministic behavior can
result, a data race is a special case of the more general race condition. In this paper we focus
on data race detection.
To provide a mechanism for reasoning about data race detection, we present a model for
representing executions of shared-memory parallel programs, on sequentially consistent pro-
cessors, that use fork/join and counting semaphores. Our model distinguishes between the ord-
ering of events that actually occurred during execution and the ordering that could have
occurred. Given an actual execution of the program, we characterize alternate event orderings
that the execution could have exhibited. Possible orderings include those that could still allow
the original data dependences among the shared data to occur and that do not violate the
semantics of the explicit synchronization primitives used by the program. An execution exhi-
biting such an alternate ordering is called a feasible program execution. The characterization
of feasible program executions in general requires knowledge of which shared-data depen-
dences (if any) were exhibited between any two events performed by the execution. Since
recording this information is not practical in general, we characterize approximate information
in terms of our model. We show how the information recorded by previous methods can be
used to define an approximate program execution. We then distinguish between three types of
data races. Actual data races are those actually exhibited during an execution, feasible data
races are those that could have occurred because of nondeterministic timing variations, and
apparent data races are those that appeared to have occurred from analyzing the approximate
information. Previous methods detect apparent data races. We show that apparent data races
are not always actual or feasible, and show how a two-phase approach can be used to detect
and then validate the apparent data races. The first phase is essentially identical to previous
methods and simply detects the apparent data races. We prove that each actual data race is also
apparent. The approach employed by previous methods is therefore safe in the sense that no
actual data races are left undetected. We also employ a second phase that classifies each
apparent data race as either feasible or as belonging to a set of data races that contains at least
one feasible data race. Performing such a validation provides the programmer with some infor-
mation as to which of the apparent data races should be investigated for debugging.
2. Previous Data Race Detection Methods
All previous methods for dynamic data race detection operate by first instrumenting the pro-
gram so that information about its execution is recorded, and then executing the program and
analyzing the collected information. These methods all analyze essentially the same informa-
tion about the execution, but differ mainly in how and when that information is collected and
analyzed. Two approaches to this information collection and analysis have been proposed:
on-the-fly and post-mortem. On-the-fly techniques[4,9,17] detect data races by an on-going
analysis during execution that encodes information about the execution so it can be accessed
quickly and discarded as it becomes obsolete. Post-mortem techniques[1,3,15] detect data
races after execution ends by analyzing trace files that are produced during execution.
Although previous methods never fail to detect a data race actually exhibited during execution
(we prove this claim in Section 7), they do not precisely locate where these data races
occurred. We briefly describe the common characteristics of these methods below.
Previous methods instrument the program to collect the same information: which sections
of code executed, the set of shared variables read and written by each section of code, and the
relative execution order between some synchronization operations. To represent this relative
ordering, a DAG is constructed (either explicitly or in an encoded form), which we call the
ordering graph, in which each node represents an execution instance of either a synchroniza-
tion operation (a synchronization event) or the code executed between two synchronization
operations (a computation event). Edges are added from each event to the next event belong-
ing to the same process, and between some pairs of synchronization events (belonging to dif-
ferent processes) to indicate the order in which the synchronization events executed. The vari-
ous methods differ in the types of synchronization handled, but they all handle fork/join (in
one form or another). An edge is added from each fork event to the first event in each child
created by the fork, and from the last event in each child to the corresponding join event.
The crux of data race detection is the location of events that accessed a common shared
variable (that at least one wrote) and that either did or could have executed concurrently. Find-
ing events that accessed a common shared variable is straightforward, since the sets of shared
variables read and written by each event are recorded. To determine if two events could have
executed concurrently, all previous methods analyze the ordering graph. Two computation
events are assumed to have potentially executed concurrently if no path in the graph connects
the two events. Data races are therefore reported between pairs of events that accessed a com-
mon shared variable and that have no connecting path. However, this assumption is not always
true, and causes previous methods to generate potentially many false data race reports.
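To make the detection step concrete, the following sketch (ours, not from the paper) encodes a tiny ordering graph in Python; the event names, READ/WRITE sets, and edges are hypothetical stand-ins for the instrumentation data described above:

```python
from itertools import combinations

# Hypothetical events: name -> (READ set, WRITE set); "Q" is the shared
# queue and "A" the shared array, both stand-ins for illustration.
events = {
    "fork":  (set(), set()),
    "left":  ({"Q", "A"}, {"Q", "A"}),
    "right": ({"Q", "A"}, {"Q", "A"}),
    "join":  (set(), set()),
}
# Ordering-graph edges: fork precedes each child; each child precedes join.
edges = {"fork": {"left", "right"}, "left": {"join"},
         "right": {"join"}, "join": set()}

def path_exists(src, dst):
    """Depth-first search for a path from src to dst in the ordering graph."""
    stack, seen = [src], set()
    while stack:
        n = stack.pop()
        if n == dst:
            return True
        if n not in seen:
            seen.add(n)
            stack.extend(edges[n])
    return False

def conflict(a, b):
    """A data conflict: one event writes a variable the other reads or writes."""
    (ra, wa), (rb, wb) = events[a], events[b]
    return bool(wa & (rb | wb)) or bool(wb & (ra | wa))

def apparent_races():
    """Report conflicting event pairs with no connecting path."""
    return [(a, b) for a, b in combinations(events, 2)
            if conflict(a, b) and not path_exists(a, b) and not path_exists(b, a)]
```

On this graph the only pair reported is ("left", "right"): the two children conflict on the shared data and no path orders them, which is exactly the situation in which previous methods report a (possibly false) data race.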
To illustrate these false reports, consider the program fragment in Figure 1. This program
creates two identical children that remove from a shared queue the lower and upper bounds of a
region of a shared array to operate upon, perform some computation on that region of the
array, and loop until the queue is empty. The queue initially contains records representing dis-
joint regions of the array. A correct execution of this program should therefore exhibit no data
races, since only disjoint regions of the shared array should be accessed concurrently. How-
ever, assume that the ‘‘remove’’ operations do not properly synchronize their accesses to the
shared queue. An ordering graph for one possible execution of this program is also shown (the
dotted lines only illustrate the data races and are not part of the graph). In this execution, the
first ‘‘remove’’ operation performed by the left child completed before the first ‘‘remove’’ per-
formed by the right child began (the nodes are staggered horizontally to indicate this order).
The first two records were therefore correctly removed, and both children operated (correctly)
on disjoint regions of the array. However, during the next iteration, the ‘‘remove’’ operations
actually overlapped, and the right child correctly removed the fourth record, but the left child
removed the upper bound (100) from the third record and the lower bound (300) from the
fourth record. The left child therefore operated (erroneously) on region [100,299] of the array.
In this graph, no paths connect any nodes of the left child with any nodes of the right
child. Since both children accessed the same queue, previous methods would report four data
races between the ‘‘remove’’ operations (shown by the finely dotted lines). Similarly, since
both children accessed a common region of the array, a data race would also be reported
between these array accesses (shown by the coarsely dotted line). The latter data race report
can be misleading, however, since the accesses by the left child to region [100,299] did not,
and could never, execute concurrently with the accesses to region [200,299] made by the right
child. For these accesses to execute concurrently, the second ‘‘remove’’ operation performed
Some methods do not actually construct a node to represent a computation event but rather represent the
event by an edge connecting the two surrounding synchronization events[4,15].
Figure 1. Example program fragment and ordering graph
(the dotted lines only illustrate the reported data races)
[Figure omitted from this text version. The fragment forks two identical children; each loops while QueueNotEmpty, removes bounds (L,U) from the shared queue, and works on region [L,U-1] of the shared array before a final join. The queue initially holds [1,100], [100,200], [200,300], and [300,400]. In the depicted execution the removes return [100,200], [200,300], [100,300], and [300,400], and the children work on regions [100,199], [200,299], [100,299], and [300,399]; finely dotted lines mark the real data races between the ‘‘remove’’ operations, and a coarsely dotted line marks the false data race between the array accesses.]
by the left child would have to execute before the second ‘‘remove’’ operation performed by
the right child (with which it originally overlapped). If this were to happen, the erroneous
record [100,300] would not be removed by the left child, since it would no longer overlap with
the other ‘‘remove’’ operation, and a different region of the array would be accessed.
If the array accesses were more complex, perhaps creating other children, there might have
been many nodes in the graph representing these accesses. In such a case, many false data race
reports would be generated, instead of only one. In this example, the data races are caused by
lack of synchronization in the ‘‘remove’’ operations. The fact that non-disjoint regions of the
array were accessed is an artifact of this missing synchronization, and does not represent a bug
in the program. Reporting many false data races to the programmer, only one of which
involves events that did (or could) execute concurrently, complicates the job of debugging.
False data race reports can result whenever shared variables are used (either directly or
transitively) in conditional expressions or in expressions determining which shared locations
are accessed (e.g., shared array subscripts). Accurate data race detection involves examining
how shared data flowed through the execution and whether the execution might have changed
had a different ordering occurred. This paper presents results showing how to validate the
accuracy of each data race without recording additional information about the execution.
3. Program Execution Model
Before discussing data race detection, we first present a formal model to provide a mechanism
for reasoning about shared-memory parallel program executions. The model contains the
objects that represent a program execution (such as which statements were executed and in
what order), and axioms that characterize properties those objects must possess. This model is
useful as a notational device for describing behavior the execution actually exhibited. We later
show how it can also be used to speculate on behavior that the execution could have exhibited
(such as alternate event orderings) due to nondeterministic timing variations. Our model
describes programs that use counting semaphores and the fork/join construct.
3.1. General Model
Our model is based on Lamport’s theory of concurrent systems[13], which provides a formal-
ism for reasoning about concurrent systems that does not assume the existence of atomic opera-
tions. In Lamport’s formalism, a concurrent system execution is modeled as a collection of
operation executions. Two relations on operation executions, precedes (→) and can causally
affect (⇢), describe a system execution: a → b means that a completes before b begins (in
the sense that the last action of a precedes the first action of b), and a ⇢ b means that some
action of a precedes some action of b (so that a can causally affect b). We use Lamport’s theory, but restrict it to the class of
shared-memory parallel programs that execute on sequentially consistent processors[11].
When the underlying hardware guarantees sequential consistency, any two events that execute
concurrently can affect one another (i.e., a ↮ b ⟹ a ⇢ b ∧ b ⇢ a).† Given sequential
consistency, a single relation is sufficient to describe the temporal aspects of a system execution.
For this purpose we introduce →T, the temporal ordering relation among events: a →T b
means that a completes before b begins, and a ↮T b means that a and b execute concurrently
(i.e., neither completes before the other begins). We should emphasize that we are
defining the temporal ordering relation so it describes the order in which events actually executed
during a particular execution; e.g., a ↮T b means that a and b actually executed concurrently,
and does not mean that a and b could have executed in any order. In Section 5 we
show how to speculate on alternate temporal orderings that could have been exhibited.
In addition, we replace the ⇢ relation with the transitive shared-data dependence relation
(or just shared-data dependence relation for brevity), →D. This relation shows when one
event can causally affect another either because of a direct data dependence involving a single
shared variable, or because of a chain of direct dependences involving several different variables.
A direct shared-data dependence from a to b (denoted a →DD b) exists if a accesses a
shared variable that b later accesses (where at least one access modifies the variable); we also
say that a direct dependence exists if a precedes b in the same process, since data can in general
flow through non-shared variables which are local to the process. A transitive shared-data
dependence (a →D b) exists if there is a chain of direct dependences from a to b (possibly involving
events that access different shared variables); →D = (→DD)+, the irreflexive transitive
closure of the direct shared-data dependence relation‡. This definition of data dependence
is different from the standard ones[10] since we consider transitive dependences involving
flow-, anti-, and output-dependences, and do not explicitly state the variables involved.

† In Lamport’s terminology, we are considering the class of system executions that have global-time models.
Throughout this paper, we use superscripted arrows to denote relations, and write a ↛ b as a shorthand for ¬(a → b), and a ↮ b as a shorthand for ¬(a → b) ∧ ¬(b → a).
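A minimal sketch of this closure operation (our own helper, with events reduced to opaque labels) might look like:

```python
def transitive_closure(direct):
    """Irreflexive transitive closure of a set of (a, b) dependence pairs."""
    closure = set(direct)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                # Chain a ->DD b and b ->DD d into a ->D d, skipping a -> a.
                if b == c and a != d and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure
```

For example, direct dependences a →DD b and b →DD c yield the transitive dependence a →D c.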
We define a program execution, P, to be a triple ⟨E, →T, →D⟩, where E is a finite set
of events, and →T and →D are the relations over E described above. We refer to a given
program execution, P, as an actual program execution when P represents an execution that the
program at hand actually performed. Each event e ∈ E represents the execution of a set of program
statements, and possesses two attributes, READ(e) and WRITE(e), the sets of shared variables
read and written by those statements. A data conflict is said to exist between two events
if one event writes a shared variable that the other reads or writes. The temporal ordering and
shared-data dependence relations must satisfy the following axioms:
A1. →T is an irreflexive partial order.
A2. If a →T b ↮T c →T d then a →T d.
A3. If a →D b then b ↛T a.
No generality is lost by modeling each event, e, as having a unique start time (e_s) and
finish time (e_f)[12]. A total ordering on the start and finish times is called a global-time model.
Given a global-time model, the →T relation is defined as follows: a →T b iff a_f < b_s, and
a ↮T b iff a_s < b_f ∧ b_s < a_f. Axioms A1-A3 can also be written in terms of the start and
finish times. We will occasionally employ such a view in proofs of our results.
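Under a global-time model these definitions are easy to check mechanically. The following sketch (our own illustration, with arbitrary interval values) encodes events as (start, finish) pairs and tests axiom A2:

```python
# Events modeled as (start, finish) intervals in a global-time model.
def precedes(a, b):
    """a ->T b iff a finishes before b starts."""
    return a[1] < b[0]

def concurrent(a, b):
    """a and b overlap: neither completes before the other begins."""
    return a[0] < b[1] and b[0] < a[1]

# Axiom A2: if a ->T b, b and c overlap, and c ->T d, then a ->T d.
a, b, c, d = (0, 1), (2, 5), (4, 7), (8, 9)
assert precedes(a, b) and concurrent(b, c) and precedes(c, d)
assert precedes(a, d)
```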
3.2. Model Applied to Semaphores and Fork/Join
So far, the model does not describe any of the synchronization aspects of a program execution.
By imposing some structure on the set of events, E, and by adding axioms that describe the se-
mantics of synchronization operations, we extend the general model to describe programs that
use counting semaphores and the fork/join construct. Other types of synchronization can be
similarly accommodated.
We assume that a program execution consists of a number of processes, each of which ei-
ther exists when the program execution begins or is created during execution by a fork opera-
tion. Similarly, a process either continues to exist until the program execution ends or until the
process (and all others created by the same fork operation) is terminated by a join operation.
The set of events belonging to process p is denoted by E_p, and therefore E = ∪_p E_p, for all
‡ The transitive shared-data dependence relation is conservative in the sense that when data flows from a to b
it always shows a dependence from a to b, but also sometimes shows a dependence when in fact no data flow occurs.
A more precise characterization of causality would require examining the semantics of the individual actions performed by each event.
processes p that exist during the program execution. Each process is viewed as containing a totally
ordered sequence of events, and the term e_{p,i} will denote the i-th event in the execution of
process p. The following axiom describes the total ordering imposed on events belonging to
the same process:

A4. e_{p,i} →T e_{p,i+1} for all processes p and 1 ≤ i < |E_p|
To describe the presence of synchronization operations, we distinguish between different
types of events. A synchronization event is an instance of a semaphore operation (a P event or
a V event), a fork operation (a fork event), or a join operation (a join event). The sets of all P
and V operations on semaphore i are denoted by E_P(i) and E_V(i), respectively. A computation
event is an instance of a group of statements, belonging to the same process, that executed consecutively,
none of which are synchronization operations. Any arbitrary grouping of (consecutively
executed) statement instances that does not include a synchronization operation defines a
computation event.
To describe the semantics of synchronization operations, we add additional axioms to our
model. A fork event, Fork_{p,i}, is assumed to precede all events in the child processes which it
creates, and all events in these child processes are assumed to precede the subsequent join
event in process p, Join_{p,i+k}:

A5. For all child processes, c, created by each Fork_{p,i} and terminated at Join_{p,i+k},
    Fork_{p,i} →T e_{c,j} →T Join_{p,i+k}, 1 ≤ j ≤ |E_c|
We assume that in any program execution the semaphore invariant[7] is always maintained.
For counting semaphores, the semaphore invariant is maintained iff at each point in the execution,
the number of V operations that have either completed or have begun executing is greater
than or equal to the number of P operations that have completed. For each semaphore, S, this
invariant can be expressed by the following axiom:

A6. For every subset of P events, P ⊆ E_P(S),
    |{ v | v ∈ E_V(S) ∧ ∃p ∈ P (v →T p ∨ v ↮T p) }| ≥ |P|.

The above version of axiom A6 assumes that the initial value of each semaphore is zero. An
arbitrary initial value, m, for some semaphore could be described by creating an artificial process
that contains m V events that precede all other events.
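Axiom A6 can be checked directly on a global-time model. The sketch below (our illustration; the intervals are arbitrary) tests the invariant by enumerating subsets of completed P events:

```python
from itertools import combinations

def precedes(a, b):          # a ->T b iff a finishes before b starts
    return a[1] < b[0]

def concurrent(a, b):        # a and b overlap in time
    return a[0] < b[1] and b[0] < a[1]

def semaphore_invariant(p_events, v_events):
    """A6: every subset P of P events needs at least |P| V events that
    precede or overlap some member of P (initial semaphore value zero)."""
    for k in range(1, len(p_events) + 1):
        for subset in combinations(p_events, k):
            covering = [v for v in v_events
                        if any(precedes(v, p) or concurrent(v, p) for p in subset)]
            if len(covering) < len(subset):
                return False
    return True
```

With one V before each P the invariant holds; with two completed P operations but only one V it is violated.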
3.3. Higher-Level Views
It is useful to be able to view a program execution at different levels of abstraction, since information
about the execution may be collected at that level, and because sometimes it is useful to
abstract irrelevant details of part of an execution into a higher-level event. We can reason
about a program execution at any level of abstraction by following Lamport and defining a
higher-level view. A higher-level view of a program execution P = ⟨E, →T, →D⟩ is
P′ = ⟨E′, →T′, →D′⟩, where

(H1) E′ partitions E, and ∀e′ ∈ E′,
     READ(e′) = ∪_{e ∈ e′} READ(e), and WRITE(e′) = ∪_{e ∈ e′} WRITE(e),
(H2) A →T′ B ⟺ ∀a ∈ A, ∀b ∈ B (a →T b), and
(H3) A →D′ B ⟺ ∃a ∈ A, ∃b ∈ B (a →D b).

A higher-level view always obeys axioms A1-A3. Since axioms A4-A6 are defined in terms of
synchronization and computation events, they are also obeyed if each higher-level event consists
of either a single synchronization event from E, or only a set of computation events from
E. In such a case, each event e′ ∈ E′ inherits its type from the type of the events comprising e′.
When the higher-level events are defined to partition E in this way, P′ obeys axioms A1-A6 and
is then itself a program execution.
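Condition (H1) amounts to taking unions over each partition block. A small sketch (the helper name and event labels are our own, hypothetical):

```python
def lift(partition, read, write):
    """partition: higher-level event -> set of constituent events;
    read/write: per-event READ and WRITE sets.  Returns the READ and
    WRITE attributes of each higher-level event per condition (H1)."""
    READ = {h: set().union(*(read[e] for e in lo))
            for h, lo in partition.items()}
    WRITE = {h: set().union(*(write[e] for e in lo))
             for h, lo in partition.items()}
    return READ, WRITE
```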
4. Representing Actual Program Executions
The model we have presented so far captures complete information about a program execution
in the sense that →T shows the relative ordering in which any two events actually executed,
and →D shows the actual shared-data dependences between any two events. In practice,
recording such complete information is not practical, and we now discuss how to represent partial
information about a program execution in our model. Our intent is not to discuss details of
program instrumentation, but rather to outline one type of information that is sufficient for data
race detection, and show how this information is represented in our model. We will define approximate
counterparts to the →T and →D relations capturing partial information about a
program execution that is based on the type of information previous methods record. This information
can be recorded without tracing every shared-memory access and without introducing
a central bottleneck into the program. The resulting approximate relations, →T̂ and →D̂,
define an approximate program execution, P̂ = ⟨E, →T̂, →D̂⟩. In Section 7 we show how P̂
can be used for data race validation.
4.1. Approximate Temporal Ordering
As described in Section 2, previous data race detection methods record the temporal ordering
among only some synchronization events. For example, the order among fork and join operations
and their child processes is recorded, but the relative order of operations belonging to the
different child processes is not. Recording incomplete ordering information is desirable because
the required instrumentation can be embedded into the implementation of the synchronization
operations without introducing additional synchronization. Not introducing additional
synchronization ensures that the instrumentation will not create a central bottleneck which
could reduce the amount of parallelism achievable by the program. We assume that the program
is instrumented to record such incomplete ordering information, and that the ordering is
represented by constructing a graph, called the temporal ordering graph, similar to the ordering
graphs used by previous methods. For every event, e, this graph contains two nodes§, e_s and e_f
(corresponding to the start and finish of e), and an edge from e_s to e_f. The graph defines an
approximate temporal ordering relation, →T̂, as follows: a →T̂ b iff there is a path from a_f to
b_s, b →T̂ a iff there is a path from b_f to a_s, and a ↮T̂ b otherwise. We assume the program
instrumentation constructs a temporal ordering graph that gives →T̂ the following properties:

(1) If a →T̂ b then a →T b, and
(2) If a ↮T̂ b then the explicit synchronization performed by the execution did not
    prevent a and b from executing concurrently.

The first property states that the ordering of events given by →T̂ must be consistent with the
order in which they actually executed (i.e., →T̂ ⊆ →T). The second property means that any
linear ordering of the graph is a global-time model defining a temporal ordering that obeys the
synchronization axioms (i.e., axioms A4-A6). Recall that →T was defined to represent the actual
order in which events executed; a ↮T b means that a and b actually overlapped during
execution. Since →T̂ is an approximation of →T (it is a subset), a ↮T̂ b does not (necessarily)
mean that a and b actually overlapped. Instead, it means that a and b were not forced to
occur in a certain order by explicit synchronization (the graph does not contain enough information
to determine their actual execution order). As illustrated in Figure 1, such events may
nonetheless be ordered. The goal of data race validation (discussed in Section 7) is determining
which events that are not ordered by →T̂ could have indeed executed concurrently.

§ In practice, it is not always necessary to actually construct two nodes per event, but we use such a
representation here since it conceptually follows our model.
For programs using fork/join, the ordering graphs constructed by previous methods satis-
fy the above two properties. To accommodate semaphore synchronization, edges can be added
to the temporal ordering graph by recording the order of all operations on a given semaphore.
Such an ordering can be recorded without introducing additional synchronization into the pro-
gram (as mentioned above for fork/join operations). To reflect this ordering in the graph, an
edge can be drawn from each semaphore operation (on a given semaphore) to the next operation on the same semaphore. Since →T̂ only needs to obey the synchronization axioms, other approaches for adding edges (that result in more events being unordered by →T̂, allowing more data races to be detected) are possible. For example, previous methods that handle semaphores [3, 4, 15] construct edges only from a V operation to the P operation it allowed to proceed. More sophisticated approaches have also been investigated [8].
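To make this construction concrete, the edge-drawing rule just described can be sketched as follows. The trace format (an event id, its process, and the semaphore it operates on, if any) is a hypothetical one chosen only for illustration, not the instrumentation format of any particular method; each event e is modeled by a start node ("s", e) and a finish node ("f", e), as in the two-node-per-event model used throughout.

```python
from collections import defaultdict

def build_ordering_graph(trace):
    """trace: list of (event_id, process_id, semaphore_or_None), in the
    order the operations were recorded.  Each event e is modeled by a
    start node ("s", e) and a finish node ("f", e)."""
    edges = defaultdict(set)
    last_in_proc = {}  # most recent event in each process
    last_on_sem = {}   # most recent operation on each semaphore
    for eid, pid, sem in trace:
        edges[("s", eid)].add(("f", eid))        # start precedes finish
        if pid in last_in_proc:                  # program order
            edges[("f", last_in_proc[pid])].add(("s", eid))
        last_in_proc[pid] = eid
        if sem is not None:                      # semaphore order
            if sem in last_on_sem:
                edges[("f", last_on_sem[sem])].add(("s", eid))
            last_on_sem[sem] = eid
    return edges
```

The semaphore edges simply chain successive operations on each semaphore; as noted above, recording this order requires no additional synchronization in the traced program.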
4.2. Approximate Shared-Data Dependences
Determining the actual shared-data dependences exhibited by an execution would in general require the relative order of all shared-memory accesses to be recorded. However, in Section 2 we mentioned that previous methods record the READ and WRITE sets for each computation event. By using only these sets and the approximate temporal ordering, the actual shared-data dependences can be conservatively estimated. The approximate shared-data dependence relation, →D̂, is defined by speculating on what the actual shared-data dependences might have been. Consider two events, a and b, that both access a common shared variable (where at least one access is a modification). If a →T̂ b, then there is a direct shared-data dependence from a to b. When a ↮T̂ b, the direction of any direct dependence cannot be determined (since the actual temporal ordering between a and b is not known), and we make the conservative assumption that a direct dependence exists from a to b and from b to a. This assumption is conservative since it will always include the actual direct dependences, although it may introduce a dependence from b to a when in fact the only dependence was from a to b. The →D̂ relation is then defined as the irreflexive transitive closure (see Section 3.1) of this approximation of the direct dependences. As we will see, for data race validation →D̂ only needs to be computed between events a and b when a ↮T̂ b.
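The conservative estimate can be sketched as follows. The representation of events as (READ, WRITE) set pairs and the `ordered` predicate standing in for →T̂ are illustrative assumptions; the sketch produces only the direct dependences, of which →D̂ would be the irreflexive transitive closure.

```python
from itertools import combinations

def data_conflict(ev1, ev2):
    """Events are (READ, WRITE) set pairs; a conflict requires a common
    variable with at least one modification."""
    r1, w1 = ev1
    r2, w2 = ev2
    return bool(w1 & (r2 | w2)) or bool(w2 & r1)

def approx_dependences(events, ordered):
    """events: {id: (READ, WRITE)}; ordered(a, b) stands in for a ->T-hat b.
    Returns the direct dependences of the conservative estimate; D-hat is
    their irreflexive transitive closure."""
    dhat = set()
    for a, b in combinations(events, 2):
        if not data_conflict(events[a], events[b]):
            continue
        if ordered(a, b):
            dhat.add((a, b))        # direction known
        elif ordered(b, a):
            dhat.add((b, a))
        else:
            dhat.add((a, b))        # unordered: conservatively
            dhat.add((b, a))        # assume both directions
    return dhat
```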
5. Characterizing Alternate Temporal Orderings
An actual program execution describes aspects of how the program actually performed, and
does not contain any information regarding what the program might have done. For example,
→T is defined to represent the actual temporal ordering in which any two events were performed.
ways enforced by (explicit or implicit) synchronization, but sometimes occurs by chance. It is
possible that another execution of the program could perform exactly the same events, but ex-
hibit a different temporal ordering among these events. In this section we characterize such al-
ternate temporal orderings that an actual program execution, P, could have exhibited because
of nondeterministic timing variations. To determine how much the temporal ordering of P can
be disturbed without affecting the events performed, we consider the shared-data dependences
exhibited by P. Any execution exhibiting these same dependences is capable of performing the
same events. We later use this characterization of alternate orderings to distinguish between
data races actually exhibited by an execution and data races that could have been exhibited.
For a given execution of a program, consider a particular view of the execution (called a single-access view) in which each computation event is defined to comprise at most one shared-memory access. The program execution describing this view, P_S, shows the data dependences among the individual shared-memory accesses made by the execution. These single-access shared-data dependences uniquely characterize the events performed. Since the execution outcome of each statement instance depends only upon the values of the variables it reads [14], the single-access dependences uniquely determine the program state at each step in each process. (For this statement to hold, interactions with the external environment must be modeled as shared-data dependences.) Any temporal ordering that could still allow these data dependences to occur (and that would not violate the semantics of the synchronization operations) is an ordering the execution could have exhibited. Therefore, any other single-access program execution, P′_S, possessing the same events and (single-access) shared-data dependences as P_S represents an execution the program could actually exhibit, regardless of how its temporal ordering differs from that of P_S.
Similarly, this result also holds for higher-level views of the program execution. In
higher-level views, computation events can consist of many shared-memory accesses. In the
following theorem we show that any higher-level program execution possessing the same
events and (higher-level) shared-data dependences as P describes an execution the program
could actually exhibit, regardless of how its temporal ordering differs from that of P. We call a
program execution that could actually be exhibited a feasible program execution. The follow-
ing theorem gives sufficient conditions for a program execution to be feasible.
Theorem 5.1.
Let P = ⟨E, →T, →D⟩ be an actual program execution. P′ = ⟨E, →T′, →D′⟩ is a feasible program execution if
(F1) P′ is a valid program execution (axioms A1-A6 are satisfied), and
(F2) →D′ = →D.
Proof.
We will use the result mentioned above that any single-access program execution possessing the same (single-access) shared-data dependences as those that actually occurred represents an execution the program could exhibit [14]. This theorem extends the result to higher-level program executions. Since computation events in higher-level program executions can consist of more than one shared-memory access, there may be more than one single-access program execution for which P, the actual program execution, is a higher-level view. Therefore, given a higher-level view, we do not always know which shared-data dependences actually occurred at the single-access level. To show that P′ is a feasible program execution, we must show that it is a higher-level view of a single-access program execution possessing the actual single-access dependences. However, since these dependences are not known, we will show that the shared-data dependences exhibited by each single-access program execution described by P are also exhibited by some single-access execution described by P′. We will then be guaranteed that, no matter which single-access shared-data dependences were exhibited during P, an execution capable of exhibiting those same dependences is described by P′.
The single-access program executions that are described by P are given by the set { P_S = ⟨E_S, →T_S, →D_S⟩ | P is a higher-level view of P_S }, and the single-access executions described by P′ are given by { P′_S = ⟨E_S, →T′_S, →D′_S⟩ | P′ is a higher-level view of P′_S }. We must prove that each →D_S is equal to some →D′_S. We first show that for any pair of higher-level events, we can always find some →D′_S exhibiting the same shared-data dependences as any →D_S among the lower-level events comprising these events, and then show that this guarantees some →D′_S exists exhibiting the same dependences as any →D_S among all the lower-level events comprising the actual program execution (which shows →D_S = →D′_S).
First, consider any P_S, and its (single-access) shared-data dependences among the lower-level events a_S ∈ a and b_S ∈ b comprising any two higher-level events a and b. We now show a P′_S exists exhibiting these same dependences. Since each lower-level event comprises at most one shared-memory access, it suffices to show that some P′_S exists such that b_S ↛T′_S a_S (where x ↛ y denotes ¬(x → y)) whenever a_S →D_S b_S, and a_S ↛T′_S b_S whenever b_S →D_S a_S, for any a_S ∈ a and b_S ∈ b.
Case (1): a →T b and a →T′ b. In this case, →D_S can only contain shared-data dependences from some a_S to some b_S, and all P′_S have a_S →T′_S b_S for all a_S ∈ a and b_S ∈ b.
Case (2): a →T b and a ↮T′ b. As with case (1), →D_S can only contain shared-data dependences from some a_S to some b_S. Some P′_S must exist in which b_S ↛T′_S a_S for all a_S ∈ a and b_S ∈ b, since otherwise b_S →T′_S a_S for all a_S ∈ a and b_S ∈ b would imply b →T′ a, contradicting the assumption a ↮T′ b.
Case (3): a →T b and b →T′ a. In this case, →D_S can contain no shared-data dependences between any a_S and b_S (or else P′ would violate A3).
Case (4): a ↮T b and a →T′ b. Since →D_S can contain shared-data dependences only from some a_S to some b_S (or else P′ would violate A3), this case is analogous to case (1).
Case (5): a ↮T b and a ↮T′ b. In this case, →D_S can contain shared-data dependences in both directions between the a_S and b_S, and since the set of single-access program executions described by P′ contains all possible temporal orderings among the a_S and b_S that cause a ↮T′ b, some P′_S clearly exists with the desired properties.
Finally, we show that each →D_S equals some →D′_S. Notice that when there are events a and b that overlap, P describes more than one single-access program execution. These single-access program executions contain all possible (legal) temporal orderings among the lower-level events a_S ∈ a and b_S ∈ b that cause a and b to overlap. The set of all single-access program executions described by P can be constructed by choosing, for each pair of higher-level events a and b, one such temporal ordering among the lower-level events comprising a and b. We showed above that for any →D_S, some →D′_S exists exhibiting the same shared-data dependences among the lower-level events comprising any pair of higher-level events. Using this result, we can always find a →D′_S exhibiting the same shared-data dependences as any →D_S among all the lower-level events by independently considering each pair of higher-level events. Therefore, each →D_S is equal to some →D′_S, which proves the theorem.
Given an actual program execution, P, we are not attempting to predict the behavior of
the program had different shared-data dependences occurred. Instead, the above theorem
characterizes different program executions (performing the same events as P) that we can
guarantee the program is capable of exhibiting. Indeed, there may be program executions that
violate the above conditions but nevertheless perform the same events. However, characteriz-
ing such program executions requires analyzing the semantics of the program itself, to deter-
mine what effects different shared-data dependences would have on P.
6. Definition of Data Race
We can now characterize different types of data races in terms of our model. We distinguish
between an actual data race, which is a data race exhibited during an actual program execu-
tion, and a feasible data race, which is a data race that could have been exhibited because of
timing variations. We also characterize the apparent data races, those data races detected by
searching the ordering graph for data-conflicting events that are not ordered by the graph
(which are the data races reported by previous methods). As discussed in the next section, the
problem of data race validation is determining which apparent data races are feasible.
Definition 6.1
A data race under →T exists between a and b iff
(DR1) a data conflict exists between a and b, and
(DR2) a ↮T b.
Definition 6.2
An actual data race exists between a and b iff
(AR1) P = ⟨E, →T, →D⟩ is an actual program execution, and
(AR2) a data race under →T exists between a and b.
Definition 6.3
A feasible data race exists between a and b iff
(FR1) there exists some feasible program execution, P′ = ⟨E, →T′, →D′⟩, and
(FR2) a data race under →T′ exists between a and b.
Definition 6.4
An apparent data race exists between a and b iff
(AP1) P̂ = ⟨E, →T̂, →D̂⟩ is an approximate program execution, and
(AP2) a data race under →T̂ exists between a and b.
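Definition 6.1 is a single predicate that, applied to different temporal orderings, yields the three kinds of races above. A minimal sketch (the `conflicts` and `ordered` predicates are assumed inputs, not part of the model itself):

```python
def data_race(a, b, conflicts, ordered):
    """Definition 6.1 as a predicate: a data race under an ordering exists
    iff the events data-conflict (DR1) and neither precedes the other
    under that ordering (DR2)."""
    return conflicts(a, b) and not ordered(a, b) and not ordered(b, a)
```

Instantiated with the actual ordering →T, this is Definition 6.2's condition; with the ordering of any feasible execution, Definition 6.3's; and with the approximate ordering →T̂, Definition 6.4's.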
7. Detecting Data Races
We now present our two-phase approach to data race detection. In the first phase, the apparent
data races are located by using the approximate information collected about the execution to
construct and then analyze the temporal ordering graph. This first phase performs the same
type of analysis as previous data race detection methods. Unlike previous methods, we then
employ a second phase to validate each apparent data race by attempting to determine whether
or not the race is feasible. This determination is made by first augmenting the temporal order-
ing graph with additional edges representing a conservative estimate of the shared-data depen-
dences, and then analyzing the resulting graph for cycles. Such a two-phase approach has the
advantage that approximate information (such as that recorded by previous methods) can be
used, but the programmer can still be provided with information regarding the feasibility of the
reported data races. Throughout the remainder of this section, P̂ = ⟨E, →T̂, →D̂⟩ will denote an approximate program execution, and P = ⟨E, →T, →D⟩ will denote the actual program execution (which is unknown).
7.1. Phase I: Detecting Apparent Data Races
The first phase of our data race detection method identifies the apparent data races. The ap-
parent data races are located by first constructing the temporal ordering graph and then search-
ing the graph for pairs of data-conflicting events, a and b, whose nodes have no connecting
path (implying that a ↮T̂ b). In general, these data races include all actual data races, plus
additional races, both feasible and infeasible. This phase cannot distinguish among these types
of races since doing so would require knowledge of the complete temporal ordering.
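This search can be sketched as follows, assuming the two-node-per-event graph representation; the DFS reachability test stands in for whatever path-finding or transitive-closure computation an implementation might actually use.

```python
def reachable(edges, src, dst):
    """Iterative DFS: is there a path from src to dst in the graph?"""
    stack, seen = [src], set()
    while stack:
        n = stack.pop()
        if n == dst:
            return True
        if n not in seen:
            seen.add(n)
            stack.extend(edges.get(n, ()))
    return False

def apparent_races(edges, conflicting_pairs):
    """Report pairs with no connecting path in either direction:
    a precedes b under T-hat only if a path runs from a's finish
    node to b's start node."""
    return [(a, b) for a, b in conflicting_pairs
            if not reachable(edges, ("f", a), ("s", b))
            and not reachable(edges, ("f", b), ("s", a))]
```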
Because the apparent data races are detected using an approximate temporal ordering, not
all apparent data races are always actual or feasible. Figure 1 showed an example of an ap-
parent data race that was not feasible. However, we now prove that each actual data race is
also an apparent data race. The naive method of simply reporting all apparent data races to the
user (which is the approach of previous methods) is therefore safe in the sense that no actual
data races are left undetected.
Theorem 7.1.
Every actual data race is also an apparent data race.
Proof.
If there is an actual data race between a and b, then a ↮T b. To show that there is an apparent data race between a and b, we must show that a ↮T̂ b. By definition, the temporal ordering graph is constructed so that a →T̂ b ⟹ a →T b (see Section 3). We must show that this implies a ↮T b ⟹ a ↮T̂ b. Consider that the contrapositive of the assumption is a ↛T b ⟹ a ↛T̂ b, which is equivalent to a ↛T b ⟹ (b →T̂ a) ∨ (a ↮T̂ b). But if a →T̂ b ⟹ a →T b, then this becomes a ↛T b ⟹ (b →T a) ∨ (a ↮T̂ b), or a ↛T b ⟹ ¬(a ↮T b) ∨ (a ↮T̂ b), which is equivalent to a ↮T b ⟹ a ↮T̂ b.
Note that the proof of Theorem 7.1 does not make use of the specifics of how the temporal
ordering graph is constructed. Indeed, any approximate temporal ordering, →T̂, with the property a →T̂ b ⟹ a →T b is sufficient to allow all actual data races to be detected as apparent data races. However, the more exhaustively the program is traced, the more accurately the apparent data races can be validated, as is shown below. As we will also show below, apparent data races have the property that the presence of an apparent data race implies that there is a feasible data race somewhere in the program execution, implying that when no feasible data races exist, no apparent data races will be reported.
7.2. Phase II: Validating Apparent Data Races
The first phase of our data race detection method locates a set of apparent data races. We now
outline the second phase, which validates each apparent data race by attempting to determine
whether or not the race is feasible. This determination is made by first augmenting the tem-
poral ordering graph with edges representing a conservative estimate of the actual shared-data
dependences, and then searching the augmented graph for certain types of cycles. Each ap-
parent data race can be characterized either as being feasible, or as belonging to a set of ap-
parent data races where at least one is feasible.
To show that an apparent data race between a and b is feasible, we must guarantee that some feasible program execution, P′ = ⟨E, →T′, →D′⟩, exists such that a ↮T′ b. Determining the feasibility of a program execution requires knowledge of →D, the shared-data dependences exhibited by the observed execution. When only an approximate program execution is available, however, the exact shared-data dependences are not known. By using the conservative estimate of these dependences, →D̂, we can guarantee that some feasible program executions must exist. We augment the temporal ordering graph with edges, called shared-data dependence edges, representing this conservative estimate. Let G be the temporal ordering graph. We construct the augmented temporal ordering graph, G_AUG, by augmenting G with edges that ensure there is a path from a_s to b_f whenever a →D̂ b. These edges ensure that any possible shared-data dependence from a to b would be allowed to occur in certain program executions defined by G_AUG (shown below in the proof of Theorem 7.2). If a →T̂ b, then a path from a_s to b_f already exists. Edges are therefore added only when a ↮T̂ b. In this case, edges are added from a_s to b_f and from b_s to a_f if there is a data conflict between a and b, or if a has a data conflict with some other event c that also has a data conflict with b, and a ↮T̂ c and c ↮T̂ b.
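This augmentation step can be sketched as follows. The `conflicting` and `unordered` predicates are assumed inputs (standing in for the data-conflict test and ↮T̂), and event ids are assumed comparable only so that each pair is visited once; neither assumption comes from the paper's model.

```python
def augment(edges, events, conflicting, unordered):
    """Build G_AUG from the temporal ordering graph `edges`.  For each
    unordered pair (a, b) with a direct data conflict, or with a common
    data-conflicting event c unordered with both, add the dependence
    edges a_s -> b_f and b_s -> a_f."""
    aug = {n: set(v) for n, v in edges.items()}

    def add(x, y):
        aug.setdefault(x, set()).add(y)

    for a in events:
        for b in events:
            if a >= b or not unordered(a, b):
                continue  # visit each unordered pair once
            direct = conflicting(a, b)
            transitive = any(conflicting(a, c) and conflicting(c, b) and
                             unordered(a, c) and unordered(c, b)
                             for c in events if c not in (a, b))
            if direct or transitive:
                add(("s", a), ("f", b))
                add(("s", b), ("f", a))
    return aug
```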
In general, G_AUG may contain cycles, due to the conservative approximation made about the actual shared-data dependences. By classifying the apparent data races into those that participate in cycles and those that do not, some apparent data races can be guaranteed to be feasible. We say that two events, a and b, are tangled if either a_s and b_f, or b_s and a_f, belong to the same strongly connected component of G_AUG (a strongly connected component has the property that there is a path from every node in the component to every other node, but no path from a node in one component to a node in another component and back). A tangled data race is an apparent data race between two tangled events. Each strongly connected component defines a set of tangled data races, called a tangle. We now show (in Theorem 7.2) that any apparent data race between two events that are not tangled is guaranteed to be feasible. We then show (in Theorem 7.3) that in each tangle, at least one of the apparent data races is guaranteed to be feasible.
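The classification can be sketched as follows; the mutual-reachability test for strongly connected components is deliberately naive (a real implementation would compute the components once, e.g., with Tarjan's algorithm, rather than per pair).

```python
def _reach(edges, src, dst):
    """Iterative DFS reachability."""
    stack, seen = [src], set()
    while stack:
        n = stack.pop()
        if n == dst:
            return True
        if n not in seen:
            seen.add(n)
            stack.extend(edges.get(n, ()))
    return False

def tangled(aug, a, b):
    """a and b are tangled iff a_s and b_f, or b_s and a_f, lie in the
    same strongly connected component of G_AUG (checked here by mutual
    reachability)."""
    def same_scc(x, y):
        return _reach(aug, x, y) and _reach(aug, y, x)
    return same_scc(("s", a), ("f", b)) or same_scc(("s", b), ("f", a))

def validate(aug, apparent):
    """Split the apparent races: untangled ones are feasible (Theorem 7.2);
    of the tangled ones, at least one per tangle is feasible (Theorem 7.3)."""
    feasible = [r for r in apparent if not tangled(aug, *r)]
    tangles = [r for r in apparent if tangled(aug, *r)]
    return feasible, tangles
```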
Lemma 7.1.
For a given execution, assume P = ⟨E, →T, →D⟩ and P̂ = ⟨E, →T̂, →D̂⟩ are the associated complete and approximate program executions. Let G be the temporal ordering graph defining →T̂, and let G_AUGD be G augmented with edges representing the actual shared-data dependences, →D. Any linear ordering of the nodes of G_AUGD is a global-time model that defines a temporal ordering relation, →T′, such that P′ = ⟨E, →T′, →D⟩ is a feasible program execution.
Proof.
We introduce G_AUGD as a device for showing that certain feasible program executions must exist. G_AUGD is identical to G_AUG, except that edges representing shared-data dependences that did not actually occur do not appear (they were conservatively added to G_AUG so that no actual shared-data dependences were missed). Even though we do not have enough information to construct G_AUGD, it nonetheless exists, and in Theorems 7.2 and 7.3 we prove that it must possess certain properties. In this Lemma, we prove that any linear ordering of the nodes of this graph can be used to define a feasible program execution. Any such linear ordering defines a temporal ordering that satisfies the conditions for feasibility. The shared-data dependence constraint (axiom A3) is satisfied since G_AUGD contains shared-data dependence edges representing the actual shared-data dependences. The synchronization constraints (axioms A4-A6) are satisfied since G_AUGD contains at least as many edges as G, and by definition, any linear ordering of G obeys axioms A4-A6.
Let L be any linear ordering of the nodes of G_AUGD, and let →T′ be the temporal ordering defined by L. We first show that →T′ satisfies axioms A1-A6:
A1. The →T′ relation is irreflexive, since if a →T′ a for some event a, then a_f would have to appear before a_s in L, which is not possible, since by definition G contains an edge from b_s to b_f for every event b. By the same argument, the →T′ relation is asymmetric. The →T′ relation is transitive, since if a →T′ b and b →T′ c, then a_f appears before b_s in L, and b_f appears before c_s. Since an edge exists from b_s to b_f for every event b, it follows that a_f appears before c_s in L, implying that a →T′ c.
A2. Assume a, b, c, d exist such that a →T′ b, b ↮T′ c, and c →T′ d. By the definition of →T′, L must contain the nodes a_f, b_s, c_f, d_s, in this order, implying that a →T′ d.
A3. Since G_AUGD contains an edge from a_s to b_f whenever a →D b, a_s will precede b_f in L, implying that b ↛T′ a.
A4-A6. Because G_AUGD contains no fewer edges than G, axioms A4 through A6 are satisfied since, by definition, any linear ordering of G obeys these axioms.
Since →T′ satisfies axioms A1-A6, P′ = ⟨E, →T′, →D⟩ is a feasible program execution.
Theorem 7.2.
If there is an apparent data race between a and b, and a and b are not tangled, then the data
race is feasible.
Proof.
We first show that G_AUG contains no path from a_f to b_s, and no path from b_f to a_s, and then show that this implies the apparent data race between a and b is feasible.
To show that G_AUG contains no path from a_f to b_s, and no path from b_f to a_s, we will establish a contradiction by assuming there is a path from a_f to b_s, or a path from b_f to a_s. Since a and b are not tangled, only one such path can exist. Assume the path from a_f to b_s exists. Since an apparent data race exists between a and b, G_AUG contains shared-data dependence edges from a_s to b_f, and from b_s to a_f. But these edges create the path a_f → b_s → a_f in G_AUG, implying that a_f and b_s belong to the same strongly connected component, which cannot be true since a and b are not tangled. Therefore, there can be no path from a_f to b_s, and no path from b_f to a_s.
We finally show that the apparent data race between a and b is feasible. Consider the graph G_AUGD, constructed by augmenting the temporal ordering graph, G, with edges representing the shared-data dependences that were actually exhibited by the program execution (see the proof of Lemma 7.1). This graph contains no more edges than G_AUG, since the edges in G_AUG represent the conservative estimate of the actual shared-data dependences. Since G_AUG cannot contain a path from a_f to b_s, or a path from b_f to a_s, G_AUGD cannot contain such paths either. There is thus a linear ordering of the nodes of G_AUGD in which a_s appears before b_f, and b_s appears before a_f. This linear ordering is a global-time model that defines a temporal ordering, →T′, such that a ↮T′ b. By Lemma 7.1, P′ = ⟨E, →T′, →D⟩ is a feasible program execution. Therefore, the apparent data race between a and b is feasible.
The above theorem shows that the apparent data races between events that are not tangled
are guaranteed to be feasible. Not all of the remaining apparent data races are infeasible, how-
ever. We now show that, in each tangle, at least one tangled data race is guaranteed to be
feasible. Without more precise knowledge of the actual shared-data dependences (or without
examining the semantics of the program execution), we cannot determine exactly which tan-
gled data races are feasible.
Lemma 7.2.
Let G be a temporal ordering graph, let G_AUG be G augmented with edges representing the conservative estimate of the actual shared-data dependences, and let G_AUGD be G augmented with edges representing the actual shared-data dependences (see the proof of Lemma 7.1). Assume T is a set of tangled events defined by a strongly connected component of G_AUG. Then there exist two events a, b ∈ T such that an apparent data race exists between a and b, and no path from a_f to b_s, or from b_f to a_s, exists in G_AUGD.
Proof.
Let T′ be the set of nodes in G_AUGD representing the events in T. To establish a contradiction, assume that for all events a, b ∈ T such that there is an apparent data race between a and b, there is either a path from a_f to b_s, or a path from b_f to a_s, in G_AUGD. Since a path from a_f to b_s and a path from b_f to a_s cannot both exist (G_AUGD is acyclic), assume that the path from a_f to b_s exists. Since there is an apparent data race between a and b, no such path exists in G. The path in G_AUGD must therefore contain at least one shared-data dependence edge, which cannot emanate from a_f. This path must contain nodes for two events, c and d, such that there is a path from a_f to c_s, a shared-data dependence edge from c_s to d_f, and a path from d_f to b_s. Such a path implies that a →T c and d →T b. Furthermore, c and d must belong to T, since T is defined by a strongly connected component.
The shared-data dependence edge from c_s to d_f exists either because there is a data conflict between c and d (and therefore also an apparent data race), or because c data conflicts with some other event that data conflicts with d (a transitive data conflict). Assume that the edge exists because of an apparent data race between c and d. Since c and d belong to T, our contradiction assumption implies that there must be a path from c_f to d_s. By applying the above argument to c and d, we conclude that the path from c_f to d_s must contain nodes for two events, e, f ∈ T, such that there is a path from c_f to e_s, a shared-data dependence edge from e_s to f_f, and a path from f_f to d_s. Such a path implies that c →T e and f →T d. Since a →T c and d →T b, the events e and f must be different from c and d. By inductively applying the above argument, we find that we always need two more events, x and y, belonging to T, that are different from all other events in T. Since T is finite, we eventually arrive at a contradiction.
If the shared-data dependence edge from c_s to d_f exists because of a transitive data conflict between c and d, event c must participate in an apparent data race with some event e that has a (possibly transitive) data conflict with d. By applying an argument similar to the one above to c and e, we also arrive at a contradiction. Therefore, two events, a, b ∈ T, must exist such that there is an apparent data race between a and b, and there is no path from a_f to b_s, and no path from b_f to a_s, in G_AUGD.
Theorem 7.3.
Let G_AUG be an augmented temporal ordering graph, and let T be the set of tangled events defined by some strongly connected component of G_AUG. At least one of the apparent data races in T is feasible.
Proof.
Let G_AUGD be the temporal ordering graph augmented with edges representing the actual shared-data dependences (see the proof of Lemma 7.1). By Lemma 7.2, there exist two events a, b ∈ T such that there is an apparent data race between a and b, and there is no path from a_f to b_s, and no path from b_f to a_s, in G_AUGD. By the argument at the end of the proof of Theorem 7.2, there is a feasible program execution, P′ = ⟨E, →T′, →D⟩, such that a ↮T′ b, showing that the apparent data race between a and b is feasible. Therefore, at least one of the tangled data races is feasible.
For each tangle, the above theorem guarantees that at least one tangled data race in the
tangle is always feasible. As illustrated in Section 2, however, not all of the tangled data races
are always feasible. An infeasible tangled data race exists only when the outcome of one tan-
gled data race affects another tangled data race. A data race between a and b can affect a data
race between c and d if (1) a or b modifies a shared variable, V, and (2) either the shared loca-
tions accessed by c or d, or the presence of c or d, depend upon V. The presence of c or d can
depend upon V if the outcome of some conditional statement depends upon V, and the outcome
might either delay the execution of c or d, or cause c or d to not execute at all. This notion is
similar to the hides relation of Allen and Padua [1]. A future paper will describe how to employ
these ideas to locate tangled data races that can be guaranteed feasible.
8. Conclusion
This paper has addressed two issues regarding data race detection. We first presented a formal
model for reasoning about data races, and then presented a two-phase approach to data race
detection that validates the accuracy of each detected data race. Our model distinguished
among the data races that actually occurred (actual data races), that could have occurred (feasi-
ble data races), and that appeared to have occurred (apparent data races). Such a model al-
lowed us to characterize the type of data races detected by previous methods, and to develop
and argue the correctness of our two-phase approach. The first phase of this approach is essentially identical to previous methods and detects the apparent data races. We proved that all actual data races are detected by this phase. Unlike previous methods, we then employed a
second phase that validates the apparent data races. This phase augments the temporal order-
ing graph with edges representing a conservative estimate of the shared-data dependences. An
apparent data race is validated by determining whether the events involved in the race belong
to the same strongly connected component. We proved that each apparent data race involving
two events belonging to different strongly connected components (or none at all) is feasible,
and in each set of races belonging to a strongly connected component, at least one is feasible.
We are currently investigating several issues related to this work. First, we are develop-
ing more precise analyses for locating those apparent data races that are feasible. As men-
tioned in Section 7, tangled data races are infeasible only when one tangled data race affects
the outcome (or the existence) of another. By examining when one event can affect another,
the notion of a feasible program execution can be extended to characterize what an execution
could have done had different shared-data dependences occurred. Using this extended notion
of feasibility, certain tangled data races can be shown to be feasible. Second, we are examin-
ing different classes of feasible data races. We have proven that the problem of detecting all
feasible data races is NP-hard [16] (even when the complete program execution is known).
However, certain classes of feasible data races can be efficiently detected. Third, we are
developing techniques for providing efficient data race detection in practice. These techniques
include efficient program instrumentation, and algorithms for actually constructing, augment-
ing and analyzing the temporal ordering graph. For example, it is not necessary to model each
event with two nodes in the temporal ordering graph. By appropriately modeling the end of
one event as the start of the next event (in the same process), only one node per event is re-
quired. We are also investigating techniques for efficiently recording the READ and WRITE
sets for each computation event. In addition, even though we presented a two-phase scheme,
data race validation does not necessarily require a post-mortem approach. It may be possible to
perform the validation phase on-the-fly. Finally, the ideas presented in this paper can be ap-
plied to shared-memory parallel programs that use synchronization primitives other than sema-
phores, such as event variables, barriers, or rendezvous. To gain practical experience with
these ideas, we are currently incorporating them into a parallel program debugger [3, 15] under development at the University of Wisconsin-Madison.
Acknowledgements
This research was supported in part by National Science Foundation grant CCR-8815928,
Office of Naval Research grant N00014-89-J-1222, and a Digital Equipment Corporation
External Research Grant.
References
1. Allen, T. R. and D. A. Padua, ‘‘Debugging Fortran on a Shared Memory Machine,’’
Proc. of Intl. Conf. on Parallel Processing, pp. 721-727 St. Charles, IL, (Aug. 1987).
2. Bernstein, A. J., ‘‘Analysis of Programs for Parallel Processing,’’ IEEE Trans. on Elec-
tronic Computers EC-15(5) pp. 757-763 (Oct. 1966).
3. Choi, J.-D., B. P. Miller, and R. H. B. Netzer, ‘‘Techniques for Debugging Parallel Pro-
grams with Flowback Analysis,’’ Comp. Sci. Dept. Tech. Rep. #786, Univ. of
Wisconsin-Madison, (Aug. 1988).
4. Dinning, A. and E. Schonberg, ‘‘An Empirical Comparison of Monitoring Algorithms
for Access Anomaly Detection,’’ Proc. of ACM SIGPLAN Symp. on Principles and
Practice of Parallel Programming, pp. 1-10 Seattle, WA, (Mar. 1990).
5. Emrath, P. A. and D. A. Padua, ‘‘Automatic Detection Of Nondeterminacy in Parallel
Programs,’’ Proc. of the Workshop on Parallel and Distributed Debugging, pp. 89-99
Madison, WI, (May 1988). Also SIGPLAN Notices 24(1) (Jan. 1989).
6. Emrath, P. A., S. Ghosh, and D. A. Padua, ‘‘Event Synchronization Analysis for Debugging Parallel Programs,’’ Supercomputing ’89, pp. 580-588 Reno, NV, (Nov. 1989).
7. Habermann, A. N., ‘‘Synchronization of Communicating Processes,’’ Communications of the ACM 15(3) pp. 171-176 (Mar. 1972).
8. Helmbold, D. P., C. E. McDowell, and J.-Z. Wang, ‘‘Analyzing Traces with
Anonymous Synchronization,’’ Proc. of Intl. Conf. on Parallel Processing, St. Charles,
IL, (Aug. 1990).
9. Hood, R., K. Kennedy, and J. Mellor-Crummey, ‘‘Parallel Program Debugging with
On-the-fly Anomaly Detection,’’ Supercomputing ’90, New York, NY, (Nov. 1990).
10. Kuck, D. J., R. H. Kuhn, B. Leasure, D. A. Padua, and M. Wolfe, ‘‘Dependence Graphs
and Compiler Optimizations,’’ Conf. Record of the 8th ACM Symp. on Principles of
Programming Languages, pp. 207-218 Williamsburg, VA, (Jan. 1981).
11. Lamport, L., ‘‘How to Make a Multiprocessor Computer That Correctly Executes Mul-
tiprocess Programs,’’ IEEE Trans. on Computers C-28(9) pp. 690-691 (Sep. 1979).
12. Lamport, L., ‘‘Interprocess Communication,’’ SRI Technical Report, (Mar. 1985).
13. Lamport, L., ‘‘The Mutual Exclusion Problem: Part I – A Theory of Interprocess Communication,’’ Journal of the ACM 33(2) pp. 313-326 (Apr. 1986).
14. Mellor-Crummey, J. M., ‘‘Debugging and Analysis of Large-Scale Parallel Programs,’’
Ph.D. Thesis, also Comp. Sci. Dept. Tech. Rep. 312, Univ. of Rochester, (Sep. 1989).
15. Miller, B. P. and J.-D. Choi, ‘‘A Mechanism for Efficient Debugging of Parallel Pro-
grams,’’ Proc. of the Conf. on Programming Language Design and Implementation,
pp. 135-144 Atlanta, GA, (June 1988). Also SIGPLAN Notices 23(7) (July 1988).
16. Netzer, R. H. B. and B. P. Miller, ‘‘On the Complexity of Event Ordering for Shared-
Memory Parallel Program Executions,’’ Proc. of Intl. Conf. on Parallel Processing, St.
Charles, IL, (Aug. 1990).
17. Nudler, I. and L. Rudolph, ‘‘Tools for the Efficient Development of Efficient Parallel
Programs,’’ Proc. of 1st Israeli Conf. on Computer System Engineering, (1988).