References

[AP87] T. R. Allen and D. A. Padua. Debugging Fortran on a shared memory machine. In Proc. International Conf. on Parallel Processing, pages 721-727, 1987.
[Dij65] E. W. Dijkstra. Solution of a problem in concurrent programming control. Communications of the ACM, 8(9), September 1965.
[EGP89] P. A. Emrath, S. Ghosh, and D. A. Padua. Event synchronization analysis for debugging parallel programs. In Supercomputing '89, November 1989. Reno, NV.
[EP88] P. A. Emrath and D. A. Padua. Automatic detection of nondeterminacy in parallel programs. In Proc. Workshop on Parallel and Distributed Debugging, pages 89-99, May 1988.
[Fid88] C. J. Fidge. Partial orders for parallel debugging. In Proc. Workshop on Parallel and Distributed Debugging, pages 183-194, May 1988.
[GPH*88] M. D. Guzzi, D. A. Padua, J. P. Hoeflinger, and D. H. Lawrie. Cedar Fortran and other vector and parallel Fortran dialects. In Proceedings Supercomputing '88, pages 114-121, 1988.
[IBM88] Parallel FORTRAN language and library reference. IBM, 1988.
[Lam78] L. Lamport. Time, clocks, and the ordering of events in a distributed system. CACM, 21(7):558-565, July 1978.
[Lam86] L. Lamport. The mutual exclusion problem: part I - a theory of interprocess communication. JACM, 33(2):290-312, April 1986.
[Mat88] F. Mattern. Virtual time and global states of distributed systems. In M. Cosnard, editor, Proceedings of Parallel and Distributed Algorithms, 1988.
[McD89] C. E. McDowell. A practical algorithm for static analysis of parallel programs. Journal of Parallel and Distributed Computing, June 1989.
[NM89] R. Netzer and B. P. Miller. Detecting Data Races in Parallel Program Executions. Technical Report 894, University of Wisconsin-Madison, November 1989.
[Tay84] R. N. Taylor. Debugging Real-Time Software in a Host-Target Environment. Technical Report 212, U.C. Irvine, 1984.
of events. We feel that this is misleading: an execution is more properly viewed as a partial ordering on the events. Fidge and Mattern have pioneered the use of time vectors to represent these partial orders. We have extended this approach by using time vectors to analyze sets of executions rather than just capturing a single execution.
After adding the virtual edge from BW1 to CW1, CW1 becomes the second wait on S1. Using Algorithm 5, S(BW1, CW1) is {(BW1,CW1), (BW1,CS1), (BW1,CS2), (BS1,CW1), (BS1,CS1), (BS1,CS2)}.

After adding the virtual edge from CS1 to BW1, BW1 becomes the second wait on S1. Again using Algorithm 5, S(CW1, BW1) is {(BW1,CW1), (BS1,CW1), (BS2,CW1), (BW1,CS1), (BS1,CS1), (BS2,CS1)}.

S(BW1, CW1) $\cap$ S(CW1, BW1) = {(BW1,CW1), (BW1,CS1), (BS1,CS1), (BS1,CW1)}

Figure 5.1: Detect Critical Regions
The problem is made even more difficult when there is no clear correspondence between the blocking and enabling events in the trace.

This paper contains a series of algorithms for extracting useful information from sequential traces with anonymous synchronization. The first algorithm is very similar to the vector timestamp methods of Fidge and Mattern [Fid88, Mat88]. The other algorithms systematically manipulate these vectors of timestamps in order to discover pairs of events that must be ordered in every execution which is consistent with the trace. In addition to presenting our algorithms, we have also proved their correctness.

Although our algorithms find many of these "must-be-ordered" relationships, we have been unable to prove that they find all of them. We are investigating additional procedures which can increase the number of "must-be-ordered" relationships found. We would also like to distinguish all pairs of events that are concurrent in some consistent execution from pairs of events which can happen in either order, but not concurrently.
Some parallel programming environments view a parallel execution as a linear sequence
If $s - w \ge 2$, then $e \parallel e'$: there are enough signals for both waits to proceed, so the two waits can happen concurrently.

If $s - w = 1$, then $\neg(e \parallel e')$: there is only one signal for a wait to proceed, so we can conclude that they cannot happen concurrently. The starting points of critical regions have been found. The following procedure is used to determine the unordered sequential event pairs in the critical region.

1. First, assume that event $e$ happened before $e'$. Thus $e'$ is the $(w+2)$nd wait for $S$. Use Algorithm 4 with $k = w + 2$ to calculate time vectors for event $e'$ and the other events.
Let $S(e, e') = \{(e_i, e_j) : (e_i, e_j) \in \mathrm{Conc},\ e_i \in E_i,\ e_j \in E_j$, and $\hat{\theta}(e_i)[i] \le \hat{\theta}(e_j)[i]$ or $\hat{\theta}(e_j)[j] \le \hat{\theta}(e_i)[j]\}$.
Undo the timestamp updating.

2. Similarly, assume that event $e'$ happened before $e$. Thus $e$ is the $(w+2)$nd wait for $S$. Use Algorithm 4 with $k = w + 2$ to calculate time vectors for event $e$ and the other events.
Let $S(e', e) = \{(e_i, e_j) : (e_i, e_j) \in \mathrm{Conc},\ e_i \in E_i,\ e_j \in E_j$, and $\hat{\theta}(e_i)[i] \le \hat{\theta}(e_j)[i]$ or $\hat{\theta}(e_j)[j] \le \hat{\theta}(e_i)[j]\}$.
Undo the timestamp updating.

Let $\mathrm{Seq}_t = S(e, e') \cap S(e', e)$. Notice that $\mathrm{Seq}_t$ maintains the set of unordered event pairs in the critical region. They are not concurrent in any execution, whether $e$ happened before $e'$ or $e'$ occurred before $e$.

3. Let $\mathrm{Seq} = \mathrm{Seq} \cup \mathrm{Seq}_t$. Let $\mathrm{Conc} = \mathrm{Conc} - \mathrm{Seq}_t$.

If $s - w \le 0$, neither of them can proceed. In this case, there is a deadlock.

End Algorithm 5.

Algorithm 5 generates two sets of event pairs. Conc contains those concurrent event pairs. Seq contains those unordered sequential event pairs. The remaining event pairs are ordered. Figure 5.1 shows the application of this algorithm to the trace from Figure 1.1.
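The bookkeeping at the end of this procedure, once $S(e, e')$ and $S(e', e)$ have been obtained from the two hypothetical re-runs of Algorithm 4, reduces to a set intersection followed by moving pairs from Conc to Seq. A short sketch with event pairs held in Python sets (the names are ours, not the paper's):

    def classify_pair(S_e_e2, S_e2_e, conc, seq):
        """Final bookkeeping for one pair (e, e2) in Algorithm 5.

        S_e_e2, S_e2_e : sets of event pairs from Conc that become ordered when e
                         (respectively e2) is assumed to happen first (steps 1 and 2)
        conc, seq      : current sets of concurrent and unordered sequential event pairs
        """
        seq_t = S_e_e2 & S_e2_e            # ordered under both assumptions: never concurrent
        return conc - seq_t, seq | seq_t   # new Conc, new Seq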
6 Conclusion

One of the most difficult tasks in debugging parallel programs is determining the timing relationships between the events performed by the parallel program. Although several parallel systems include facilities for creating a trace of the significant events, the sequential nature of the trace makes it difficult to determine which events could have happened in parallel.
From equation 4.6 we know that at most $k$ non-shadowed signals (excluding $e_i$) in $R(e)$ do not follow $e_i$ (i.e., the $(k+1)$st smallest and later always follow $e_i$). Therefore, in every execution, at least one of the $k+1$ non-shadowed signals preceding $e$ follows (or is equal to) $e_i$. By transitivity $e_i$ happens before $e$ in every execution consistent with the trace, so $e_i \prec e$.
5 Adjusting the Timestamps to Determine Concurrency

Up to now, we have computed a partial order that reflects a safe order relation between events from the trace $E$. Given any two events $e_i \in E_i$ and $e_j \in E_j$, if $\hat{\theta}(e_i)[i] \le \hat{\theta}(e_j)[i]$ or $\hat{\theta}(e_j)[j] \le \hat{\theta}(e_i)[j]$ then the two events are ordered. Otherwise, $e_i$ and $e_j$ are two unordered events. The unordered events are not necessarily concurrent events. They may have to occur sequentially. In this case, we call them unordered sequential events. For example, if the program has a properly implemented lock around a critical region, then different executions may have tasks entering the critical region in different orders. In no execution, however, do two tasks concurrently enter the critical region.
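The ordering test just described is a two-component comparison of the $\hat{\theta}$ vectors; a small sketch, with vectors as integer lists indexed by task and names of our own choosing:

    def ordered(theta_hat_ei, i, theta_hat_ej, j):
        """Safe ordering test: the events are ordered when theta_hat(e_i)[i] <= theta_hat(e_j)[i]
        (e_i must happen before e_j) or, symmetrically, theta_hat(e_j)[j] <= theta_hat(e_i)[j]."""
        return theta_hat_ei[i] <= theta_hat_ej[i] or theta_hat_ej[j] <= theta_hat_ei[j]

    def unordered(theta_hat_ei, i, theta_hat_ej, j):
        """Unordered events are either concurrent or unordered sequential, as discussed above."""
        return not ordered(theta_hat_ei, i, theta_hat_ej, j)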
When debugging parallel programs, we would like to distinguish those pairs of events that are concurrent in some consistent execution from pairs of events which can happen in either order, but not concurrently. Unfortunately, the concurrent relation cannot be determined immediately from the timestamps. We cannot necessarily say $e_i$ can happen concurrently with event $e_j$ even if we know $\hat{\theta}(e_i) \parallel \hat{\theta}(e_j)$. As an example, in Figure 4.2, even though $\hat{\theta}(\mathrm{BW1}) \parallel \hat{\theta}(\mathrm{CW1})$, the two W1 events cannot occur at the same time. It is, in general, a hard problem to determine whether two unordered events can really happen concurrently.

Let $e, e' \in E$ be a pair of events. Event $e$ may happen concurrently with $e'$ only if $\hat{\theta}(e) \parallel \hat{\theta}(e')$. The following procedure can be used to detect critical regions, and to determine the unordered sequential event pairs in critical regions. The algorithm calculates two sets. The set Conc contains concurrent event pairs, while the set Seq contains unordered sequential event pairs. Initially, we assume that all unordered events are potentially concurrent events. Once some critical regions have been detected, the algorithm moves those unordered sequential event pairs from Conc to Seq.
Algorithm 5:
Initially let $\mathrm{Conc} = \{\{e, e'\} : e, e' \in E$ and $\hat{\theta}(e) \parallel \hat{\theta}(e')\}$. Let $\mathrm{Seq} = \emptyset$.
Repeat the following procedure until no more changes are possible.
Pick any two unordered wait events $e$ and $e'$ for semaphore $S$ where $(e, e') \in \mathrm{Conc}$.
Let $G(e, e')$ be the set of wait events for semaphore $S$ which precede either event $e$ or $e'$ (based on the current timestamps $\hat{\theta}$).
Let $R(e, e') = \{e'' : e''$ is a signal event using $S$ and $e''$ precedes $e$ or $e'\} \cup \{e'' : e''$ is not shadowed with respect to either $e$ or $e'$, and $e''$ does not follow either $e$ or $e'\}$.
Let $s = |R(e, e')|$ and $w = |G(e, e')|$.
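The two bookkeeping sets can be computed directly from the current $\hat{\theta}$ vectors. The sketch below assumes the hypothetical event records used in the other sketches (fields op and sem), a precedes(a, b) test derived from $\hat{\theta}$, and a shadowed(signal, wait) predicate per Definitions 14 and 15; it also assumes that the second half of $R(e, e')$ ranges over signal events on the same semaphore and that $e$ and $e'$ themselves are excluded from $G(e, e')$, both of which the text leaves implicit.

    def counts_for_pair(e, e2, events, precedes, shadowed):
        """Bookkeeping for Algorithm 5: the sets G(e, e2) and R(e, e2) for two unordered
        wait events e and e2 on the same semaphore, and the counts w = |G| and s = |R|.

        precedes(a, b) : True when a precedes b according to the current theta_hat vectors
        shadowed(x, y) : True when signal x is shadowed with respect to wait y (Defs. 14-15)
        """
        sem = e.sem
        G = [x for x in events
             if x.op == "wait" and x.sem == sem and x is not e and x is not e2
             and (precedes(x, e) or precedes(x, e2))]
        signals = [x for x in events if x.op == "signal" and x.sem == sem]
        R = [x for x in signals if precedes(x, e) or precedes(x, e2)]
        R += [x for x in signals
              if x not in R
              and not shadowed(x, e) and not shadowed(x, e2)
              and not precedes(e, x) and not precedes(e2, x)]
        return G, R, len(G), len(R)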
Figure 4.2: Expanding the Safe Order Relation
Case 1: Assume $\hat{\theta}(e)[i] = \hat{\theta}(e_p)[i]$. Then
$\hat{\theta}(e_i)[i] \le \hat{\theta}(e_p)[i] \Rightarrow e_i \prec e_p$   by the induction hypothesis   (4.4)
$\Rightarrow e_i \prec e$   by transitivity   (4.5)

Case 2: $\hat{\theta}(e)[i] \ne \hat{\theta}(e_p)[i]$.
Event $e$ is a wait on semaphore $S$. Let $k$ be computed as specified in the algorithm; then
$\hat{\theta}(e_i)[i] \le \hat{\theta}(e)[i] = \min_{k+1}\{\hat{\theta}(e_s)[i] : e_s \in R(e)\}$   (4.6)
where the non-shadowed signal set $R(e)$ is computed according to the algorithm, and $\min_{k+1}$ selects the $(k+1)$st smallest value from the set.
In every execution, at least $k+1$ signal events precede $e$, since there are at least $k$ waits on the same semaphore that must happen before $e$.
In any arbitrary execution P, let $k_s$ be the number of shadowed signals (with respect to $e$) that precede $e$. By transitivity the corresponding $k_s$ shadowing waits precede $e$, and at least $k + k_s$ waits on $S$ precede the wait event $e$. Therefore, at least $k + k_s + 1$ signal events precede $e$ in the execution, and $k+1$ of them are non-shadowed signals.
Therefore, the signal event $e'_s$ is shadowed by some wait event $e'_w$ that lies between $e_s$ and $e'_s$ with respect to $e$. This forms a contradiction with the assumption that $e'_s$ is shadowed by $e_w$.

Algorithm 4 is based on the following observation. If $e$ is a wait event on semaphore $S$ and $k$ other wait events on $S$ must happen before $e$, then at least $k+1$ non-shadowed signal events happen before $e$ in every execution consistent with the trace.
Algorithm 4:
Initially $\hat{\theta}(e) = \theta'(e)$ for all events $e \in E$.
Repeat the following procedure until no more changes are possible.
Pick an event $e$. If $e$ is a wait event using semaphore $S$, let
$W(S)$ be the set of wait events on semaphore $S$,
$k$ be the number of wait events $e_w \in W(S)$ such that $e_w \ne e$ and, if $e_w \in E_i$, then $\hat{\theta}(e_w)[i] \le \hat{\theta}(e)[i]$,
$R(e) = \{\hat{e} : \hat{e}$ is a signal event on $S$, $e$ does not precede $\hat{e}$ as indicated by the $\hat{\theta}$ timestamps, and $\hat{e}$ is not shadowed with respect to $e\}$, and
$v_s$ = the $(k+1)$st component-wise minimum of $\hat{\theta}(\hat{e})$ for $\hat{e} \in R(e)$.
If $e$ is not a wait event, let $v_s$ be the 0 vector.
$\hat{\theta}(e) = \max(\hat{\theta}(e_p), \#(e), v_s)$
End Algorithm 4.
Figure 4.2 shows the new $\hat{\theta}$ timestamps generated when Algorithm 4 is executed starting with Figure 3.1.
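A Python sketch of the expand step follows. It reuses the hypothetical trace encoding of the other sketches, represents $\hat{\theta}$ as a list of vectors indexed by trace position, and takes the shadowing test of Definitions 14 and 15 as a caller-supplied predicate; this is our reading of Algorithm 4, not code from the paper.

    def kth_smallest_componentwise(vectors, k, n_tasks):
        """In each component, take the (k+1)-st smallest value among the given vectors."""
        return [sorted(v[i] for v in vectors)[k] for i in range(n_tasks)]

    def algorithm4(trace, theta_r, n_tasks, task_index, prev_in_task, is_shadowed):
        """Expand step: strengthen the rewound vectors theta' into theta_hat.

        theta_r     : vectors produced by the rewind step, indexed by trace position
        is_shadowed : (signal position, wait position, current vectors) -> bool,
                      the shadowing test of Definitions 14 and 15
        """
        zero = [0] * n_tasks
        hat = [list(v) for v in theta_r]              # theta_hat starts out equal to theta'
        changed = True
        while changed:
            changed = False
            for pos, e in enumerate(trace):
                i = task_index(e.task)
                local = list(zero)
                local[i] = e.seq                      # the vector #(e)
                p = prev_in_task[pos]
                v_t = hat[p] if p is not None else zero
                v_s = zero
                if e.op == "wait":
                    # k: other waits on the same semaphore that must precede e (per theta_hat).
                    k = sum(1 for j, w in enumerate(trace)
                            if j != pos and w.op == "wait" and w.sem == e.sem
                            and hat[j][task_index(w.task)] <= hat[pos][task_index(w.task)])
                    # R(e): signals on S that e does not precede and that are not shadowed.
                    R = [hat[j] for j, s in enumerate(trace)
                         if s.op == "signal" and s.sem == e.sem
                         and hat[pos][i] > hat[j][i]
                         and not is_shadowed(j, pos, hat)]
                    if len(R) > k:                    # the paper argues at least k+1 exist
                        v_s = kth_smallest_componentwise(R, k, n_tasks)
                new = [max(a, b, c) for a, b, c in zip(v_t, local, v_s)]
                if new != hat[pos]:
                    hat[pos] = new
                    changed = True
        return hat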
Theorem 5: Algorithm 4 generates only safe order relations, i.e., for any two events $e_i \in E_i$ and $e \in E$:
$\hat{\theta}(e_i)[i] \le \hat{\theta}(e)[i] \Rightarrow e_i \prec e$

Proof: The proof is by induction on the number of updates. As a base case, the theorem holds for the initial values of $\hat{\theta}$ from Theorem 4.
Assume the theorem holds before some update. Consider two events $e_i \in E_i$ and $e \in E$ where $\hat{\theta}(e_i)[i] > \hat{\theta}(e)[i]$ before the update, and $\hat{\theta}(e_i)[i] \le \hat{\theta}(e)[i]$ after the update. Because $\hat{\theta}(e_i)[i]$ never changes, $\hat{\theta}(e)[i]$ was updated. We consider two cases.
Figure 4.1: Shadowed Signal Event
Since for each shadowed signal there is only one corresponding shadowing wait (by Definition 15), we have $|R^i_s(e)| \ge |R^i_w(e)|$. We only need to show that $|R^i_s(e)| = |R^i_w(e)|$.

Assume to the contrary that $|R^i_s(e)| > |R^i_w(e)|$, which means that there are at least two signals $e_s$ and $e'_s$ in $R^i_s(e)$ shadowed by some $e_w \in R^i_w(e)$.

Assume $e_w$ precedes $e_s$, which precedes $e'_s$. Let $w_1$ and $s_1$ be the number of waits and signals on $S$ performed by $T_i$ between $e_w$ and $e_s$, and let $w_2$ and $s_2$ be the number of waits and signals on $S$ performed by $T_i$ between $e_s$ and $e'_s$. This is the local subsequence of events performed by the task, with time moving from left to right:

... $e_w$ ... ($w_1$ waits, $s_1$ signals) ... $e_s$ ... ($w_2$ waits, $s_2$ signals) ... $e'_s$ ...

Therefore,
$w_1 = s_1$   since $e_s$ is shadowed by $e_w$   (4.1)
$w_1 + w_2 = s_1 + s_2 + 1$   since $e'_s$ is shadowed by $e_w$   (4.2)
Combining equations 4.1 and 4.2 gives us
$w_2 = s_2 + 1$   (4.3)
However, equation 4.3 means that the subsequence between $e_s$ and $e'_s$ contains more waits on $S$ than signals.
add additional safe orderings into the partial order, using the fact that only some wait events in the trace can actually proceed immediately after each signal event. The partial order resulting from this final step will be represented by the time vectors $\hat{\theta}(e)$. Initially, $\hat{\theta}(e) = \theta'(e)$.
Denition 14:
Let
e
2
E
i
be a wait event and
e
s
2
E
j
be a signal event on the same
semaphore
S
where
^
(
e
)
k
^
(
e
s
)
. Let
E
(
e; e
s
)
be the subsequence of
E
j
containing every
event
e
j
where
e
j
e
s
and
^
(
e
j
)
k
^
(
e
)
. If any sux of
E
(
e; e
s
)
contains more wait events
on
S
than signal events on
S
, then the signal event
e
s
is
shadowed
with respect to
e
.
Denition 15:
Let
E
0
(
e; e
s
)
be the shortest sux of
E
(
e; e
s
)
which contains more wait
events than signal events on
S
, and let
e
w
be the rst event of
E
0
(
e; e
s
)
. We say
e
s
is
shadowed by event
e
w
with respect to
e
.
Lemma 2: Given a wait event $e$ and a signal event $e_s$ on the same semaphore $S$, if $e_s$ is shadowed by some event $e_w$ with respect to $e$, then:
1. Event $e_w$ is a wait event on semaphore $S$;
2. The event $e_w$, which shadows $e_s$ with respect to $e$, is unique. We define $e_w$ to be the shadowing wait event corresponding to $e_s$; and
3. The subsequence between $e_w$ and $e_s$ (in the same task) contains as many signal events as wait events on semaphore $S$.

Proof: The proof is straightforward from the definitions.
Denition 16:
For any wait event
e
2
E
, let
R
s
(
e
) =
f
e
s
:
e
s
is shadowed with respect to
e
g
, and
R
w
(
e
) =
f
e
w
:
9
e
s
2
R
s
(
e
)
s.t.
e
s
is shadowed by
e
w
with respect to
e
g
.
In the example shown in Figure 4.1, the signal event CS1 is shadowed by CW1 with
respect to two wait events performed by task B.
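Definitions 14 and 15 suggest a direct suffix-counting test for shadowing. The sketch below assumes the hypothetical event records used in the other sketches, the local event sequence of the signal's task, and a caller-supplied predicate hat_concurrent that checks whether two events' current $\hat{\theta}$ vectors are incomparable; whether $e_s$ itself belongs to $E(e, e_s)$ follows our reading of Definition 14.

    def shadowing_wait(e, e_s, task_events, hat_concurrent):
        """Return the wait event shadowing signal e_s with respect to wait e, or None.

        e              : a wait event on some semaphore S
        e_s            : a signal event on S, in another task, with theta_hat(e) || theta_hat(e_s)
        task_events    : the local event sequence E_j of e_s's task, in task order
        hat_concurrent : predicate testing whether two events' theta_hat vectors are incomparable
        """
        # E(e, e_s): events of E_j up to and including e_s whose theta_hat is concurrent with e.
        prefix = task_events[: task_events.index(e_s) + 1]
        subseq = [x for x in prefix if hat_concurrent(x, e)]
        # Grow suffixes from the right; the shortest suffix containing more waits than signals
        # on S starts at the shadowing wait (Definition 15).
        waits = signals = 0
        for x in reversed(subseq):
            if x.sem == e_s.sem:
                if x.op == "wait":
                    waits += 1
                else:
                    signals += 1
            if waits > signals:
                return x          # e_s is shadowed by this wait event with respect to e
        return None               # no such suffix: e_s is not shadowed with respect to e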
Lemma 3: For any wait event $e \in E$, the correspondence between shadowed signals and shadowing waits is one to one, i.e., $|R_s(e)| = |R_w(e)|$.

Proof: Let $e \in E$ be a wait event on semaphore $S$. From Definitions 14 and 15, we know that any pair of corresponding shadowed signal and shadowing wait belongs to the same task. Therefore, it is enough to show that the correspondence between shadowed signals and shadowing waits is one to one within each task $T_i$, where $1 \le i \le n$. Let $R^i_s(e)$ and $R^i_w(e)$ be the sets of shadowed signal events and shadowing wait events performed by task $T_i$ with respect to $e$.
Equation 3.7 implies that for some component $c$
$\max(\theta_P(e_p), \#(e), \theta_P(e_s))[c] < \max(\theta'(e_p), \#(e), \min(\theta'(e_{s_1}), \ldots, \theta'(e_{s_m})))[c]$.   (3.8)
Equations 3.5 and 3.8 imply
$\theta_P(e_s)[c] < \min(\theta'(e_{s_1}), \ldots, \theta'(e_{s_m}))[c]$   (3.9)
$\theta_P(e_s)[c] < \theta'(e_s)[c]$   (3.10)
$\theta_P(e_s) \not\ge \theta'(e_s)$   (3.11)
Again, 3.11 contradicts the assumption that $e$ is the first event in the topological order of the partial order P such that $\theta_P(e) \not\ge \theta'(e)$.
Therefore, there is no event $e$ in any execution P such that $\theta_P(e) \not\ge \theta'(e)$.
Theorem 4: After rewinding, we have a partial order that is a safe order relation, i.e.,
$\theta'(e_i) < \theta'(e) \Rightarrow e_i \prec e$.

Proof: Let $i$ be the task performing $e_i$, so $\theta'(e_i)[i] = \theta_P(e_i)[i] = \#(e_i)[i]$.
$\theta'(e_i)[i] \le \theta'(e)[i]$   from the hypothesis   (3.12)
$\theta'(e_i)[i] \le \theta_P(e)[i]$   by Lemma 1   (3.13)
$\theta_P(e_i)[i] \le \theta_P(e)[i]$   since $e_i \in E_i$   (3.14)
$e_i \xrightarrow{P} e$   for all $P$, and   (3.15)
$e_i \prec e$   from the definition of $\prec$.   (3.16)

The rewinding process is based on the fact that any signal event might enable any wait event on the same semaphore. We may have lost some safe order relations during rewinding. As an example, in Figure 3.1, the time vectors $\theta'$ say that the two W2 events and the W1 event in task A may happen concurrently with all of the events in tasks B and C. However, it is obvious that the W1 in task A must happen after the two S1 events in tasks B and C, and the second W2 in task A has to wait until all of the events in B and C have occurred. The final step in the algorithm will find some of the order relations lost during the rewinding procedure.

4 Expanding the Safe Order Relation

The result of the rewind step is a partial order that is a safe order relation. It is an overly conservative safe order relation because it assumed that any wait could happen immediately after any signal for the same semaphore. We now undertake a process to
From the inductive hypothesis
$\theta'(e) \le \min(\theta'(e_{S_1}), \ldots, \theta'(e_{S_k}))$,
and from the algorithm
$\min(\theta'(e_{S_1}), \ldots, \theta'(e_{S_k})) < \theta'(\hat{e})$.
Therefore $\theta'(e) < \theta'(\hat{e})$.

After rewinding, we have a partial order that is a safe order relation. If event $e_i$ has an earlier time vector than $e$, we can say $e_i$ will happen before $e$ in all executions that are consistent with the given trace. Before we prove this in Theorem 4, we first present one lemma used in the proof.
Lemma 1: For any execution $P$ consistent with a trace $E$ and for all events $e \in E$,
$\theta_P(e) \ge \theta'(e)$.

Proof: Assume to the contrary that there is an execution $P$ and some event $e$ such that $\theta_P(e) \not\ge \theta'(e)$. In any topological ordering (with respect to the partial order P) of the events in E, let $e$ be the first event in the topological ordering such that $\theta_P(e) \not\ge \theta'(e)$. We consider two cases.
Case 1: If $e$ is not a wait event, then from Algorithms 1 and 3:
$\theta'(e) = \max(\theta'(e_p), \#(e))$   (3.3)
$\theta_P(e) = \max(\theta_P(e_p), \#(e))$   (3.4)
Note that $\theta_P(e_p) \not\ge \theta'(e_p)$ by our choice of $e$. This contradicts the assumption that $e$ is the first event in the topological order of the partial order P such that $\theta_P(e) \not\ge \theta'(e)$.

Case 2: Event $e$ is a wait event. From the choice of $e$ we get
$\theta_P(e_p) \ge \theta'(e_p)$   (3.5)
$\theta_P(e) \not\ge \theta'(e)$   (3.6)
Substituting the definitions of $\theta_P$ and $\theta'$ into 3.6 gives:
$\max(\theta_P(e_p), \#(e), \theta_P(e_s)) \not\ge \max(\theta'(e_p), \#(e), \min(\theta'(e_{s_1}), \ldots, \theta'(e_{s_m})))$   (3.7)
where $e_s$ is the corresponding signal event of $e$ and thus appears before $e$ in the topological order, and each $e_{s_i}$ for $1 \le i \le m$ is one of the $m$ signal events for the semaphore waited on by $e$.
Figure 3.1: Rewinding the Time Vectors
Proof: It is enough to show that $\theta'(e)[i] \le \theta'(\hat{e})[i] \Rightarrow \theta'(e) < \theta'(\hat{e})$. The opposite direction follows directly from the definition of vector comparison.

The proof is by induction on the number of updates made by Algorithm 3. As the base case, from Theorem 2, the theorem holds for the initial $\theta'$ values.

Assume the theorem holds before some update. Consider two arbitrary events, $e \in E_i$ and $\hat{e} \in E_j$, after updating a single time vector. Since Algorithm 3 does not change $\theta'(e)[i]$ and never increases time vectors, updating $\theta'(e)$ cannot make $\theta'(e)[i] \le \theta'(\hat{e})[i] \Rightarrow \theta'(e) < \theta'(\hat{e})$ false. Therefore, we consider three cases when $\theta'(\hat{e})$ was updated.

Case 1: $i = j$. If $e = \hat{e}_p$ then from the algorithm $\theta'(e) < \theta'(\hat{e})$. Otherwise, $\theta'(e) < \theta'(\hat{e}_p)$, which implies $\theta'(e) < \theta'(\hat{e})$.

Case 2: $i \ne j$ and $\theta'(\hat{e})[i] = \theta'(\hat{e}_p)[i]$. This implies $\theta'(e)[i] \le \theta'(\hat{e}_p)[i]$. Since neither $\theta'(\hat{e}_p)$ nor $\theta'(e)$ changed, by the induction hypothesis $\theta'(e) < \theta'(\hat{e}_p)$, and the algorithm ensures that $\theta'(\hat{e}_p) < \theta'(\hat{e})$. Therefore $\theta'(e) < \theta'(\hat{e})$.

Case 3: $i \ne j$ and $\theta'(\hat{e})[i] \ne \theta'(\hat{e}_p)[i]$. This implies that $\hat{e}$ is a wait event for some semaphore $S$. Let $e_{S_1}, \ldots, e_{S_k}$ be the signal events for the semaphore S. From the algorithm definition and the assumption we know
$\theta'(e)[i] \le \theta'(\hat{e})[i] = \min(\theta'(e_{S_1}), \ldots, \theta'(e_{S_k}))[i]$.
3 Rewinding the Time Vectors

The result of the initialize step in the previous section is an unsafe order relation. It is unsafe because we assumed that the $k$th signal event for a particular semaphore was the one allowing the $k$th wait event to proceed. The next step is to rewind the time vectors to account for the fact that any signal event might be the one that allowed any wait event on the same semaphore to complete. We use $\theta'(e)$ to represent the new time vector assigned to event $e$ during and after the rewinding process. Initially $\theta'$ is the same as $\theta$.

Suppose $e$ is a wait event, and $e_1$ and $e_2$ are two signal events, either of which could have caused $e$ to complete. In this case, we only know that either $e_1$ or $e_2$ must have happened before $e$. The trace might be in any of the forms:
$\ldots, e_1, \ldots, e, \ldots, e_2, \ldots$;   $\ldots, e_2, \ldots, e, \ldots, e_1, \ldots$;   $\ldots, e_1, \ldots, e_2, \ldots, e, \ldots$;   or   $\ldots, e_2, \ldots, e_1, \ldots, e, \ldots$.
However, we can conclude that the common ancestors of $e_1$ and $e_2$ must occur before $e$. Therefore if $e_a \prec e_1$ and $e_a \prec e_2$ then $e_a \prec e$. The rewind step defined below uses this fact to obtain a safe order relation.
Algorithm 3:
Initially, $\forall e \in E$, $\theta'(e) = \theta(e)$.
Repeat the following procedure until no further changes are possible.
For all events $e \in E$, let
$\theta'(e) = \max(\theta'(e_p), \#(e), v_s)$
where, if $e$ is a wait event on semaphore S,
$v_s = \min(\theta'(e_{s_1}), \ldots, \theta'(e_{s_k}))$   (3.1)
where $e_{s_1}, \ldots, e_{s_k}$ are all the signal events for the semaphore S;   (3.2)
otherwise $v_s$ is the 0 vector.
End Algorithm 3.
Observe that the only difference between Algorithm 3 and Algorithm 2 (used to compute $\theta$) is that for wait events in Algorithm 3, $v_s$ is the minimum of a set of time vectors, a set which includes the time vector used for $v_s$ in computing $\theta$. Therefore the values of $\theta'$ will only get smaller as Algorithm 3 executes.
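A Python sketch of the rewind step, using the same list-of-integers time vectors as the other sketches; the parameters prev_in_task and signals_on are hypothetical helpers the caller is assumed to have precomputed from the trace.

    def algorithm3(trace, theta, n_tasks, task_index, prev_in_task, signals_on):
        """Rewind step: theta'(e) = max(theta'(e_p), #(e), v_s), where for a wait event
        v_s is the component-wise minimum over *all* signals on the same semaphore.

        trace        : event records with .op, .sem, .task, .seq (hypothetical encoding)
        theta        : initial time vectors from Algorithm 2, indexed by trace position
        task_index   : task id -> component index in 0..n_tasks-1
        prev_in_task : trace position -> position of the previous event in the same task, or None
        signals_on   : semaphore -> list of trace positions of all its signal events
        """
        zero = [0] * n_tasks
        theta_r = [list(v) for v in theta]         # theta' starts out equal to theta
        changed = True
        while changed:                             # repeat until no further changes are possible
            changed = False
            for k, e in enumerate(trace):
                p = prev_in_task[k]
                v_t = theta_r[p] if p is not None else zero
                local = list(zero)
                local[task_index(e.task)] = e.seq  # the vector #(e)
                if e.op == "wait":                 # signals_on[e.sem] is non-empty by Definition 2
                    v_s = [min(col) for col in zip(*(theta_r[s] for s in signals_on[e.sem]))]
                else:
                    v_s = zero
                new = [max(a, b, c) for a, b, c in zip(v_t, local, v_s)]
                if new != theta_r[k]:
                    theta_r[k] = new
                    changed = True
        return theta_r                             # values only shrink, per the observation above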
Theorem 3: For any two distinct events $e \in E_i$ and $\hat{e} \in E_j$,
$\theta'(e)[i] \le \theta'(\hat{e})[i] \iff \theta'(e) < \theta'(\hat{e})$.
2. If $e_l$ is the corresponding signal event and $e_i \to e_l$, then $\theta(e_i)[i] \le \theta(e_l)[i]$, and $e_l \to e \Rightarrow \theta(e_l)[i] \le \theta(e)[i]$, and the result follows.

3. If $e_l$ is the corresponding signal event and $e_i \not\to e_l$, then
$e_i \to e_p$   Property 8   (2.8)
$\theta(e_i)[i] \le \theta(e_p)[i]$   from the inductive hypothesis   (2.9)
$\theta(e_p)[i] \le \theta(e)[i]$   from the definition of $\theta$   (2.10)
and the result follows.
Theorem 2: For any two distinct events $e_i \in E_i$ and $e \in E$,
$\theta(e_i)[i] \le \theta(e)[i] \iff \theta(e_i) < \theta(e)$.

Proof: The $\Leftarrow$ direction is trivial. For the $\Rightarrow$ direction, assume to the contrary that there are two events, $e_i \in E_i$ and $e \in E$, where $\theta(e_i)[i] \le \theta(e)[i]$ but $\theta(e_i) \not< \theta(e)$. Thus there is some vector component $c$ such that
$\theta(e_i)[c] > \theta(e)[c]$.   (2.11)
Let $e_c$ be the event occurring in $E_c$ with sequence number $\theta(e_i)[c]$; then
$\theta(e_c)[c] = \theta(e_i)[c]$.   (2.12)
$e_c \to e_i$   from Theorem 1 and 2.12   (2.13)
$e_i \to e$   from the hypothesis and Theorem 1   (2.14)
$e_c \to e$   from the transitivity of $\to$   (2.15)
$\theta(e_c)[c] \le \theta(e)[c]$   from 2.15 and Theorem 1   (2.16)
Combining 2.12 and 2.16 forms a contradiction with 2.11.
Corollary 1: For any two distinct events $e \in E_i$ and $\hat{e} \in E_j$ with $i \ne j$:
$\theta(e)[i] > \theta(\hat{e})[i]$ and $\theta(\hat{e})[j] > \theta(e)[j] \Rightarrow e \parallel \hat{e}$.
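Read as a test on the initial time vectors, Corollary 1 is the familiar vector-clock concurrency check; a one-function sketch (argument names are ours):

    def concurrent_by_corollary1(theta_e, i, theta_f, j):
        """Corollary 1: events e (in task i) and f (in task j, i != j) are concurrent
        when theta(e)[i] > theta(f)[i] and theta(f)[j] > theta(e)[j]."""
        return theta_e[i] > theta_f[i] and theta_f[j] > theta_e[j]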
The initialization process creates a partial ordering of the events in the trace. This partial ordering corresponds to an execution which is strongly consistent with the trace. It describes the happened before relation for the canonical execution.

Unfortunately, this partial order only gives the happened before relationships between events for the canonical execution, i.e., it is an unsafe order relation. The $k$th signal event in one execution might not necessarily be the $k$th signal in some other execution. Therefore, event $e$ may not happen before $\hat{e}$ in some other execution even if it did in this execution. Even when $\theta(e) < \theta(\hat{e})$ we cannot say $e$ must happen before $\hat{e}$.
Proof: From Properties 3 and 7 we know that if either side holds then $e_i$ appears before $e$ in the trace. Therefore, it suffices to prove that whenever the algorithm assigns a time vector to some event $e$, and $e_i$ is any event appearing earlier in the trace (and thus already assigned a time vector by the algorithm), the two conditions are equivalent. We prove this by induction on the position of $e$ in the trace.

After the first event is assigned a time vector, the theorem trivially holds, as no distinct pairs of events have been assigned time vectors. We now show that the time vector assigned to the next event, $e$, satisfies the theorem, assuming that the time vectors assigned to all events appearing before $e$ in the trace satisfy the theorem.

We first show that $\theta(e_i)[i] \le \theta(e)[i]$ implies $e_i \to e$. If $e \in E_i$, so that the two events are in the same task, $T_i$, the implication follows because the selected vector component is the event count for task $T_i$. Otherwise the events occur in different tasks and
$\theta(e)[i] = \theta(\hat{e})[i]$
where $\hat{e}$ is either $e_p$, or possibly $e_j$ if $e$ is a wait event and $e_j$ is the corresponding signal event. In either case $\hat{e}$ has previously been assigned a time vector and
$\theta(e)[i] = \theta(\hat{e})[i]$   by the definition of $\theta$   (2.1)
$\theta(e_i)[i] \le \theta(\hat{e})[i]$   from the assumption   (2.2)
$\hat{e} \to e$   from the definition of $\hat{e}$   (2.3)
Either $e_i = \hat{e}$ and the theorem is proven, or by the induction hypothesis $e_i \to \hat{e}$, and by transitivity $e_i \to e$.

To prove that $e_i \to e \Rightarrow \theta(e_i)[i] \le \theta(e)[i]$ we consider three cases.

Case 1: If $e \in E_i$, so that the two events are in the same task, the result follows from Properties 3 and 4.

Case 2: If $e$ is not a wait event, then
$\theta(e)[i] = \theta(e_p)[i]$   (2.4)
$e_i \to e_p$   Property 8   (2.5)
$\theta(e_i)[i] \le \theta(e_p)[i]$   from the hypothesis   (2.6)
$\theta(e_i)[i] \le \theta(e)[i]$.   (2.7)

Case 3: If $e$ is a wait event then we have three subcases:
1. If $e_i$ is the corresponding signal event then the result trivially holds.
Algorithm 2: To compute the initial time vectors, $\theta(e_i)$, from a trace $E$, use Algorithm 1 with the following modifications.
The $k$th wait event on semaphore $S$ (in trace order) corresponds to the $k$th signal event on $S$.
The events are assigned time vectors in the order they appear in the trace.
End Algorithm 2.

For the given trace, Figure 1.1(a) shows the result of the initialization procedure.
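A sketch of Algorithm 2 in Python, assuming the hypothetical event records used in the other sketches (fields op, sem, task and seq): the k-th wait on each semaphore, in trace order, is paired with the k-th signal on that semaphore, and time vectors are then assigned in trace order using the same max rule as Algorithm 1.

    from collections import defaultdict

    def algorithm2(trace, n_tasks, task_index):
        """Compute the initial time vectors theta for the canonical execution (Algorithm 2).

        trace      : list of event records in trace order, each with .op ("wait"/"signal"),
                     .sem, .task and .seq attributes (hypothetical encoding)
        task_index : maps a task id to a component index in 0..n_tasks-1
        """
        zero = [0] * n_tasks
        theta = []                             # theta[k] is the time vector of trace[k]
        prev_in_task = {}                      # task id -> trace position of its latest event
        unmatched_signals = defaultdict(list)  # semaphore -> positions of signals not yet paired
        for k, e in enumerate(trace):
            i = task_index(e.task)
            local = list(zero)
            local[i] = e.seq                   # the vector #(e)
            v_t = theta[prev_in_task[e.task]] if e.task in prev_in_task else zero
            v_s = zero
            if e.op == "signal":
                unmatched_signals[e.sem].append(k)
            else:
                # The k-th wait on S (in trace order) corresponds to the k-th signal on S.
                v_s = theta[unmatched_signals[e.sem].pop(0)]
            theta.append([max(a, b, c) for a, b, c in zip(v_t, v_s, local)])
            prev_in_task[e.task] = k
        return theta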
The time vectors computed for the canonical execution have the following properties:

Property 4: If $e$ and $\hat{e}$ are two events in the same task $T_i$ and $e$ occurred before $\hat{e}$ in the trace, then $e \to \hat{e}$ and $\theta(e) < \theta(\hat{e})$.

Property 5: If $e$ and $\hat{e}$ are a corresponding signal/wait pair (the $k$th signal and the $k$th wait on the same semaphore $S$ in the trace), then $e \to \hat{e}$ and $\theta(e) < \theta(\hat{e})$.

Property 6: At any point in the trace, the maximum value of any time vector component is the number of events performed by the corresponding task up to that point.

Property 7: If $e \in E_i$ and $\theta(e)[i] \le \theta(\hat{e})[i]$, then either $e = \hat{e}$ or $e$ appears before $\hat{e}$ in the trace.
Because an event is only constrained to follow its predecessor in the same task and, in the case of wait events, the corresponding signal, the following property holds.

Property 8: If $e \to \hat{e}$ then one of the following is true:
1. $e = \hat{e}_p$,
2. $e = \hat{e}_s$ where $\hat{e}$ is a wait event and $\hat{e}_s$ is the corresponding signal event,
3. $e \to \hat{e}_p$, or
4. $e \to \hat{e}_s$ where $\hat{e}$ is a wait event and $\hat{e}_s$ is the corresponding signal event.
Given the correspondence between signal and wait events for execution P, events can be assigned time vectors by using Algorithm 1. Mattern [Mat88] has shown that the time vector $\theta_P$ correctly represents the partial order relation $\xrightarrow{P}$, i.e., for any pair of distinct events $e_i \in E_i$ and $e \in E$,
$\theta_P(e_i)[i] \le \theta_P(e)[i] \iff e_i \xrightarrow{P} e$.
For completeness, we now prove that the initial time vectors, $\theta$, correctly represent the happened before relation for the canonical execution.

Theorem 1: For any pair of distinct events $e_i \in E_i$ and $e \in E$,
$\theta(e_i)[i] \le \theta(e)[i] \iff e_i \to e$.
1.3 An Overview of the New Algorithms

In the following sections we will introduce a series of algorithms to calculate different time vectors for trace events. By comparing their final time vectors, we can distinguish many ordered events from the unordered, potentially concurrent, events. Our goal is a set of time vectors where, if event $e_1$ has an earlier time vector than $e_2$, then $e_1$ will happen before $e_2$ in all executions that are consistent with the given trace.³

The three phases of the algorithm are "initialize", "rewind", and "expand". The initialization uses Algorithm 1. The resulting partial order is similar to that computed by the algorithm of [Fid88]. This partial order is shown to be equivalent to the "happened before", $\xrightarrow{P}$, relation for a canonical execution $P$. Note that the canonical execution is in general not the same execution which generated the trace. The result of the rewinding phase is a partial order that is a subrelation of the $\prec$ relation. Unfortunately this safe order relation is overly conservative, in that there may be many "must happen before" relations that it does not include. The third and final phase results in a safe partial order that is closer to the "must happen before" relation.
2 Initializing the Vectors

Before giving the algorithm for computing the initial time vectors, we define a canonical execution that will be used to verify the "correctness" of the time vectors.

Definition 13: Given a trace $E$ with the total ordering of events $<_E$, the partial order $\xrightarrow{P}$ corresponding to the canonical execution $P$ is constructed by selecting and taking the transitive closure of the following subrelation of $<_E$:
1. If $e_i$ and $e_j$ are two events from the same task and $e_i <_E e_j$, then $e_i \xrightarrow{P} e_j$.
2. If $e_i$ and $e_j$ are the $k$th signal and wait events, respectively, on the same semaphore, then $e_i \xrightarrow{P} e_j$.

In the remainder of the paper we will use $\to$ to mean $\xrightarrow{P}$ where $P$ is the canonical execution defined above.
Property 3: If $e \to \hat{e}$ then $e$ appears before $\hat{e}$ in the trace.

³ Given a specific input and a trace, there are in general executions which are not consistent with that trace; however, any such execution will contain a race if and only if a race occurred in the execution that generated the trace [AP87].
length $n$, where $n$ is the total number of tasks.² Each task $T_i$ has its own vector component $C_i[i]$, which guarantees a strict temporal ordering of events occurring in that task. A local event counter, incremented each time an event occurs in the task, can be used as the local clock.

Before presenting the algorithms for computing time vectors from a trace, we need to define some notation.

Definition 9: For an event $e \in E_i$, $e_p$ is the previous event performed by the same task $T_i$, if such an event exists.

Definition 10: For an event $e \in E_i$, $\#(e)$ is the time vector containing the local event count for $e$ in the $i$th position and zeros elsewhere.
Denition 11:
For any two time vectors
u; v
in
Z
n
1.
u
v
() 8
i
(
u
[
i
]
v
[
i
])
2.
u<v
()
u
v
and
u
6
=
v
3.
u
k
v
() :
(
u < v
)
and
:
(
v < u
)
.
Denition 12:
For any k time vectors
v
1
;
...
; v
k
of
Z
n
min(
v
1
;
...
; v
k
)
is a vector of
Z
n
whose
i
th component is
min(
v
1
[
i
]
;
...
; v
k
[
i
])
, and
max(
v
1
;
...
; v
k
)
is a vector of
Z
n
whose
i
th component is
max(
v
1
[
i
]
;
...
; v
k
[
i
])
.
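The comparisons and the component-wise min/max of Definitions 11 and 12 translate directly into a few helper functions; a minimal Python sketch, with vectors represented as plain lists of integers indexed by task (the function names are ours):

    def vec_le(u, v):
        """u <= v iff every component of u is <= the matching component of v (Definition 11.1)."""
        return all(ui <= vi for ui, vi in zip(u, v))

    def vec_lt(u, v):
        """u < v iff u <= v and u != v (Definition 11.2)."""
        return vec_le(u, v) and u != v

    def vec_concurrent(u, v):
        """u || v iff neither u < v nor v < u (Definition 11.3)."""
        return not vec_lt(u, v) and not vec_lt(v, u)

    def vec_min(*vectors):
        """Component-wise minimum of the given time vectors (Definition 12)."""
        return [min(column) for column in zip(*vectors)]

    def vec_max(*vectors):
        """Component-wise maximum of the given time vectors (Definition 12)."""
        return [max(column) for column in zip(*vectors)]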
The following algorithm (derived from [Mat88, Fid88]) computes time vectors for the events in an execution. This algorithm requires the correspondence between signal and wait events. The time vectors produced reflect the execution's partial order.

Algorithm 1: Given the correspondence between signal and wait events for execution $P$, events are assigned time vectors, $\theta_P(e_i)$, in topological order:
$\theta_P(e_i) = \max(v_t, v_s, \#(e_i))$
where
$v_t = \theta_P(e_p)$ if $e_i$ has a predecessor $e_p$, and the 0 vector otherwise;
$v_s = \theta_P(\hat{e})$ if $e_i$ is a wait event and $\hat{e}$ is the corresponding signal event, and the 0 vector if $e_i$ is not a wait event.
End Algorithm 1.
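A minimal Python sketch of Algorithm 1 follows, under the assumptions that the events are supplied in a topological order of the execution and that a caller-provided mapping gives each wait event its corresponding signal event; the data layout and parameter names are ours, not the paper's.

    def algorithm1(events, task_of, seq_of, signal_of, n_tasks):
        """Assign a time vector theta_P to every event of an execution (Algorithm 1).

        events    : events listed in a topological order of the execution P
        task_of   : event -> task index in 0..n_tasks-1
        seq_of    : event -> local sequence number (1, 2, ...)
        signal_of : wait event -> its corresponding signal event; None for non-wait events
        """
        zero = [0] * n_tasks
        theta = {}
        last_in_task = [None] * n_tasks          # previous event of each task seen so far
        for e in events:
            i = task_of(e)
            local = list(zero)
            local[i] = seq_of(e)                 # the vector #(e)
            v_t = theta[last_in_task[i]] if last_in_task[i] is not None else zero
            sig = signal_of(e)
            v_s = theta[sig] if sig is not None else zero
            theta[e] = [max(a, b, c) for a, b, c in zip(v_t, v_s, local)]
            last_in_task[i] = e
        return theta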
² We use an integer valued clock in our discussion, although a real number valued clock can also be used.
Denition 4:
An execution is
strongly consistent
with a trace if it is consistent and
the total order specied by the trace is an extension of the partial order specied by the
execution.
For example, consider the trace
f
AS1, CW1, CS1, CS2, BW1, BS1, BS2, AW2, AW2,
AW1
g
. Event AS1 means task A performs a signal(
S
1
), AW1 means task A performs
wait(
S
1
) etc. Figure 1.1 shows the four executions which are consistent with this trace. In
addition, the executions (a) and (b) are strongly consistent with the trace.
Denition 5:
Consider the correspondence between signal and wait events in execution P
and two distinct events
e; e
0
. If
e
P
6!
e
0
and
e
0
P
6!
e
then events
e
and
e
0
are concurrent, and
thus can happen at the same time, in the execution.
Denition 6:
The symbol \
k
" is used to represent the concurrent relationship between
events. Two events
e
and
e
0
are concurrent, i.e.
e
k
e
0
, if they can happen at the same
time in some execution which is consistent with the trace.
Denition 7:
The symbol \
" is used to represent the
must happen b efore
relationship
between events. Given two events
e
and
e
0
, if
e
e
0
, then event
e
will happen before
e
0
in
all executions that are consistent with the given trace. Events
e
and
e
0
are
ordered
if
e
e
0
or
e
0
e
, otherwise, they are
unordered
.
Concurrent events are always unordered, but unordered events need not be concurrent.
For example, see events BW1 and CW1 in Figure 1.1.
Notice that, in general, $e \prec e'$ is different from the relation $e \xrightarrow{P} e'$ for any choice of $P$. The former relation tells us that $e$ must happen before $e'$ in all executions consistent with the trace being analyzed, while the latter says that $e$ happened before $e'$ in the execution represented by the partial order $P$. If $e \prec e'$ then $e \xrightarrow{P} e'$ for all consistent executions $P$. But the converse condition does not hold.
In Figure 1.1, CS1 $\xrightarrow{P}$ BW1 if $P$ is the execution (a). However, if $P$ is the execution (c), BS1 $\xrightarrow{P}$ CW1, and BW1 $\xrightarrow{P}$ CS1 by transitivity. Event AS1 happens before BW1 and CW1 in all executions consistent with the trace, therefore AS1 $\prec$ BW1 and AS1 $\prec$ CW1. There is no order relation between event CS2 and BW1 in execution (a). Therefore, they can happen concurrently, i.e., CS2 $\parallel$ BW1.
Denition 8:
A partial ordering R on the events is a
safe
order relation if
e
i
R
e
j
)
e
i
e
j
. If R is not safe, then R is
unsafe
.
1.2 Virtual Time

The concept of virtual time for distributed systems was introduced by Lamport in 1978 [Lam78]. The time vectors we compute in this paper are an extension of the time vectors of Fidge [Fid88] and Mattern [Mat88]. There, each task $T_i$ has a clock $C_i$ which is a vector of
Trace = {AS1, CW1, CS1, CS2, BW1, BS1, BS2, AW2, AW2, AW1}

Figure 1.1: Trace, Executions, and Time Vectors
and a (positive integer) sequence number equal to one plus the number of previous operations performed by the task.

In order to perform the final race analysis, it must be possible to determine from a trace what shared objects are referenced between any two synchronization events. This can be done by additionally associating with each event the source line number of the statement generating the event. From this the path between two adjacent events can be determined, and the variables referenced along the path can be computed [McD89].

Many other kinds of synchronization operations can be simulated by using counting semaphores. Consider, for example, the event "init task t", which creates a new task $t$, and the event "await task t", which blocks the running task until task $t$ has terminated. Given a trace containing these events, we can create an equivalent trace containing only semaphore events.

In each execution every wait event has a corresponding signal event. We use this correspondence to define a partial order representing that execution.

Definition 1: An execution of a parallel program is a partial ordering of the events performed. This partial order is the transitive closure of edges from each event to the next event performed by the same task and edges to each wait event from the corresponding signal event.
The relation dened by the partial order
P
representing an execution is called the
happened
before
relation and is denoted with the symbol
P
!
. Our denition of \happened before" is
consistent with that of Lamport[Lam78].
Denition 2:
A trace of an execution is an interleaving of the local sequences of events
E
i
for
1
i
n
where for every prex of the trace and every semaphore S, the prex
contains at least as many signal(S) events as wait(S) events.
Every trace must satisfy the following properties:

Property 1: No two events in the trace have both the same task id and the same sequence number.

Property 2: If there is an event with task id $t$ and sequence number $k$, then for every $1 \le i < k$, there is an event with task id $t$ and sequence number $i$ appearing earlier in the trace.
A single execution usually has many possible traces. Similarly, a single trace could have been generated by any one of a number of executions. (Figures 1.1(a) and 1.1(b) show two different executions for the same trace.)

Definition 3: An execution is consistent with a trace if the local sequence of trace events $E_i$ for each task, $1 \le i \le n$, is the same as in the execution.
occur" execution order. Our algorithms appear to be more ecient and may nd more
guaranteed order relations.
Netzer and Miller [NM89] present a formal model of a program execution based
on Lamport's model of concurrent systems [Lam86]. Their model includes fork/join
parallelism and synchronization using semaphores. They distinguish b etween an
actual
data race
, which is a data race exhibited by the particular program execution generating
the trace, and a
feasible data race
, which is a data race that could have been exhibited
due to timing variations. They show how to characterize each detected data race as either
being feasible, or as belonging to a set of data races such that at least one data race in
the set is feasible. They rely on the trace for their ordering information. As an example,
when two tasks try to enter some critical regions surrounded by some binary semaphore
S, their algorithm will say that these two tasks are ordered when accessing these regions.
Under their denitions there is neither an actual nor feasible data race even if two tasks
write to some shared variable in this case. We view the ordering relationships in the trace
with suspicion, and wish to generate race reports in this situation.
We believe that it is more helpful to analyze sets of executions rather than just one
specic execution based on some trace information. We feel that, in terms of detecting data
races by trace analysis, it is critical to distinguish the
ordered
events from the
unordered
,
potentially
concurrent
, events. In this paper we present a collection of algorithms that
extend previous work in computing partial orders. The algorithms presented compute a
partial order containing only \
must occur
" type orderings from a linearly ordered trace
containing anonymous synchronization. The algorithms presented in this paper make
few assumptions about specic trace features and can be adjusted to work with traces
generated by many parallel systems, including IBM Parallel Fortran [IBM88], and Cedar
Fortran [GPH*88].
1.1 Description of the Model

We view a parallel program as a finite set of tasks $T_1, \ldots, T_n$, where $n$ is the number of tasks in the system. These tasks perform synchronization and computation operations, including computation on shared data.¹ In an execution, each task $T_i$ is a sequential entity characterized by a local sequence $E_i$ of events. Different tasks may perform operations concurrently. We assume, for convenience, that each task has a unique identifier.

In our model, programs synchronize using only counting semaphores, which are assumed to be initialized to zero. Therefore, each event is a tuple containing:
the operation completed (wait or signal),
the semaphore affected,
the id of the task that performed the operation,

¹ Although operations on shared data can be used for synchronization [Dij65], we only consider explicit synchronization operations as capable of generating synchronization events.
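To make the algorithm sketches in these notes concrete, the following is one possible Python encoding of such an event tuple; the field names (op, sem, task, seq) are our own choices, not the paper's, and the seq field anticipates the sequence-number component described with the rest of the event tuple. The example trace of Figure 1.1 is written out in this form.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Event:
        op: str    # "wait" or "signal"
        sem: str   # the semaphore affected, e.g. "S1"
        task: str  # id of the task that performed the operation, e.g. "A"
        seq: int   # 1 + the number of previous operations performed by the same task

    # The trace {AS1, CW1, CS1, CS2, BW1, BS1, BS2, AW2, AW2, AW1} of Figure 1.1,
    # written out in this representation.
    TRACE = [
        Event("signal", "S1", "A", 1),
        Event("wait",   "S1", "C", 1),
        Event("signal", "S1", "C", 2),
        Event("signal", "S2", "C", 3),
        Event("wait",   "S1", "B", 1),
        Event("signal", "S1", "B", 2),
        Event("signal", "S2", "B", 3),
        Event("wait",   "S2", "A", 2),
        Event("wait",   "S2", "A", 3),
        Event("wait",   "S1", "A", 4),
    ]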
1 Introduction

One of the fundamental problems encountered when debugging a parallel program is determining the race conditions in the program. A race condition may exist when two or more parallel tasks access shared data in an unspecified order and at least one of the accesses is a write access. Notice that races include both accesses that may occur "at the same time" and accesses that must occur sequentially but in an unspecified order (e.g. accesses protected by a lock). One approach to determining potential races is based on computing all of the reachable concurrent states of the program [McD89, Tay84]. The major disadvantage of this approach is that the number of concurrent states may become prohibitively large. Another approach to determining potential races is based on analyzing a trace from an execution of the program [EP88, EGP89, NM89]. This approach has the disadvantage that a trace must be recorded, and is limited to determining races that can occur given the input data used. Even for the given data, it may not be possible to determine all races [AP87]. Nevertheless, this latter approach can provide important information to help in debugging parallel programs and is the subject of this paper.

A trace specifies a total ordering of the events performed by the program. For our purposes, the trace reflects only one of the orders in which the events could have occurred. A more restrictive definition that is difficult to achieve in practice would be for a trace to specify the exact order in which the events did occur. Since traces are only approximations of executions, there are usually several executions that are consistent with a given trace. What we want to compute is the orderings between pairs of events that must occur in all executions which are consistent with the trace. In general this will be a partial order. If the partial order contains all orderings that must occur, then a pair of events not ordered by this "must occur" partial ordering can potentially execute in either order.

Much research has been directed towards determining the partial ordering of events in parallel and distributed systems. Previous models have assumed point-to-point communication, which makes it very easy to determine which events were caused by which other events (e.g. "message received by B from A" is clearly caused by "message sent by A to B"). Unfortunately the synchronization models supported by several parallel programming languages allow for anonymous communication, where the partner is unknown. Examples of anonymous communication include locks, semaphores, and monitors.

Emrath, Ghosh, and Padua [EGP89] present a method for detecting non-determinacy in parallel programs that utilize fork/join and event style synchronization instructions with the Post, Wait, and Clear primitives. They construct a Task Graph from the given synchronization instructions and the sequential components of the program that is intended to show the guaranteed orderings between events. For each Wait event node, all Post nodes that might have triggered that Wait are identified. An edge is then added from the closest common ancestor of these Post events to the Wait event node. The idea of the algorithm is very simple, but it may be computationally complex. Also, some of the guaranteed order relations may be missed by their algorithm. Rather than repeatedly computing the common ancestor information, we use time vectors to calculate the guaranteed "must
Analyzing Traces with Anonymous Synchronization

David P. Helmbold
Charles E. McDowell
Jian-Zhong Wang

UCSC-CRL-89-42
December, 1989

Board of Studies in Computer and Information Sciences
University of California at Santa Cruz
Santa Cruz, CA 95064

Abstract

In a parallel system, events can occur concurrently. However, programmers are often forced to rely on misleading sequential traces for information about their program's behavior. We present a series of algorithms which extract ordering information from a sequential trace with anonymous semaphore-style synchronization.

We view a program execution as a partial ordering of events, and define which executions are consistent with a given trace. Although it is generally not possible to determine which of the consistent executions occurred, we define the notion of "safe orderings" which are guaranteed to occur in every execution which is consistent with the trace.

The main results of the paper are algorithms which determine many of the "safe orderings". The first algorithm starts from a sequential trace and creates a partially ordered canonical execution. The second algorithm strips away the ordering relationships particular to the canonical execution, so that the resulting partial order is safe. The third algorithm increases the amount of ordering information while maintaining a safe partial order. All three algorithms are accompanied by proofs of correctness.

Keywords: virtual time, program tracing, parallel processing, debugging

This work was supported by IBM under agreement SL 88096.